[SRILM User List] Constraining class building
Jonathan Mendoza
mrfox321 at gmail.com
Wed May 23 10:35:12 PDT 2018
SRILM community,
I am trying to work with ngram-class. More specifically, I want to
associate new vocabulary with semantically similar vocabulary that is
already in the corpus. For example, my language model has the class
$organization = {ibm, intel}, and I know that {google}, which is not in the
training corpus, will show up in the same contexts in some test corpus. The
corpus and language model that I am working with are quite simple: the
language is very much like a template (or Mad Libs).
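
To make the $organization example concrete, here is a minimal Python sketch of adding an unseen word to an existing class and writing the result out one member per line. The "CLASS probability word" line layout and the uniform membership probabilities are assumptions made for illustration, and add_to_class and to_class_lines are hypothetical helpers, not part of SRILM.

```python
def add_to_class(classes, class_name, new_word):
    """Return a copy of `classes` with `new_word` added to `class_name`."""
    updated = {name: list(words) for name, words in classes.items()}
    members = updated.setdefault(class_name, [])
    if new_word not in members:
        members.append(new_word)
    return updated


def to_class_lines(classes):
    """Render one 'CLASS prob word' line per member.

    Assumption: probabilities are uniform within a class; real
    membership probabilities would come from counts or prior knowledge.
    """
    lines = []
    for name, words in classes.items():
        if not words:
            continue  # skip empty classes to avoid dividing by zero
        prob = 1.0 / len(words)
        for word in words:
            lines.append(f"{name} {prob:.6f} {word}")
    return lines


classes = {"organization": ["ibm", "intel"]}
extended = add_to_class(classes, "organization", "google")
for line in to_class_lines(extended):
    print(line)
```

Giving google the same probability as ibm and intel is only a placeholder, since the training corpus contains no counts for it.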
As a result of the structure of the corpus I am working with, I am only
concerned with a few (2-5) multi-word clusters, while retaining
single-element classes for the rest of the vocabulary. This means that
-numclasses is going to be on the order of V - O(|C|), where V is the
vocabulary size and |C| is the expected cardinality of the set of clustered
words. I also plan on defining the initial clusters that ngram-class would
then extend during merging.
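
The class-count arithmetic above can be sketched as follows: every word outside the multi-word clusters keeps its own class, and each cluster contributes exactly one class. The function name and the example sizes are hypothetical, chosen only to illustrate the calculation.

```python
def target_num_classes(vocab_size, cluster_sizes):
    """Number of classes when only the given clusters hold several words.

    vocab_size    -- V, the total number of word types
    cluster_sizes -- sizes of the multi-word clusters (hypothetical values)
    """
    clustered_words = sum(cluster_sizes)
    if clustered_words > vocab_size:
        raise ValueError("clusters contain more words than the vocabulary")
    # Singleton classes for unclustered words, plus one class per cluster.
    return vocab_size - clustered_words + len(cluster_sizes)


# Example: V = 1000 with three clusters of 3, 4, and 2 words.
print(target_num_classes(1000, [3, 4, 2]))  # 994
```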
Does ngram-class support a way to constrain class merging so that it only
happens between single-word classes and the predefined multi-word classes?
My initial attempt at a solution would be to iterate over a range of
-numclasses values with the aforementioned base classes and see how classes
are formed from the initial conditions. My worry is that words not in the
initial multi-word classes will merge with one another, leading to a null
result.
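
One way to detect the unwanted merges described above is to scan the class definitions produced by each run and flag any multi-word class that contains none of the seed words. This is a minimal sketch that assumes every line has the form "CLASS probability word"; parse_class_lines, unintended_merges, and the sample data are hypothetical, not SRILM output.

```python
from collections import defaultdict


def parse_class_lines(lines):
    """Group words by class name, assuming 'CLASS prob word' lines."""
    members = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[0]
        word = " ".join(fields[2:])  # fields[1] is the probability
        members[name].add(word)
    return dict(members)


def unintended_merges(members, seed_words):
    """Return multi-word classes that share no word with the seeds."""
    return {
        name: words
        for name, words in members.items()
        if len(words) > 1 and not (words & seed_words)
    }


sample = [
    "C1 0.5 ibm",
    "C1 0.5 intel",
    "C2 0.5 the",
    "C2 0.5 a",
    "C3 1 hello",
]
seeds = {"ibm", "intel"}
print(unintended_merges(parse_class_lines(sample), seeds))
```

Running this check for each -numclasses value would show the point at which words outside the seed classes start merging.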
For the time being, I am going to use the -full flag to glean intuition
about word clusters, then plan my class initialization accordingly.
Best,
Jon