<div>SRILM community,</div><div><br></div><div>I am trying to work with ngram-class. More specifically, I want to connect new vocabulary to semantically similar vocabulary that already appears in the corpus. For example, my language model has the class $organization = {ibm, intel}, and I know that google, which does not appear in the training corpus, will occur in the same contexts in some test corpus. The corpus and language model I am working with are much simpler than typical text: the language is essentially a template (like Mad Libs).</div><div><br></div><div>Because of this structure, I only care about a few (2-5) multi-word clusters, and I want every other word in the vocabulary to stay in its own single-word class. This means -numclasses will be on the order of V - O(|C|), where V is the vocabulary size and |C| is the expected cardinality of the multi-word classes (exactly V - Σ<sub>i</sub>(|C<sub>i</sub>| - 1) for multi-word classes C<sub>1</sub>, ..., C<sub>k</sub>). I also plan to define the initial (seed) clusters myself, so that ngram-class would only add words to them during merging.</div><div><br></div><div>Does ngram-class support a way to constrain class merging so that merges only happen between single-word classes and the predefined multi-word classes?</div><div><br></div><div>My initial attempt at a solution would be to iterate over a range of -numclasses values, starting from the seed classes above, and see how classes form from those initial conditions. My worry is that words outside the initial multi-word classes will merge with each other, leading to a null result.</div><div><br></div><div>For the time being, I am going to use the -full option to get some intuition about the word clusters, and then plan my class initialization accordingly.</div><div><br></div><div>Best,</div><div>Jon</div>
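<div><br></div><div>P.S. To make the constraint concrete, here is a rough Python sketch of the merge rule I have in mind. It is not an SRILM feature, and every name in it (allowed_merges, merge, constrained_greedy, and the score callback) is my own hypothetical helper. Only merges that fold a single-word class into a predefined seed class are generated, so two words outside the seeds can never end up in the same class. The score function is just a placeholder for the real merging objective (for example, the loss in average mutual information used in Brown-style clustering).</div>

```python
# Hypothetical sketch (not an SRILM feature): greedy class merging where every
# merge must fold a single-word class into one of the predefined seed classes.
from itertools import product


def allowed_merges(classes, seed_names):
    """List (seed, singleton) pairs permitted by the constraint.

    classes: dict mapping class name -> set of member words
    seed_names: names of the predefined multi-word (seed) classes
    """
    seeds = [name for name in classes if name in seed_names]
    singletons = [name for name, words in classes.items()
                  if name not in seed_names and len(words) == 1]
    # Pairs of two singletons are never generated, so non-seed words
    # can never merge with each other.
    return list(product(seeds, singletons))


def merge(classes, seed, other):
    """Return a new class map with class `other` folded into `seed`."""
    merged = {name: set(words) for name, words in classes.items() if name != other}
    merged[seed] |= classes[other]
    return merged


def constrained_greedy(classes, seed_names, num_classes, score):
    """Apply the best-scoring allowed merge until num_classes remain or no
    allowed merge is left. `score(classes, seed, other)` is a placeholder
    for the real merging objective; higher means a better merge."""
    while len(classes) > num_classes:
        candidates = allowed_merges(classes, seed_names)
        if not candidates:
            break
        seed, other = max(candidates, key=lambda pair: score(classes, *pair))
        classes = merge(classes, seed, other)
    return classes


# Toy example: a score that simply prefers merging "google".
def toy_score(classes, seed, other):
    return 1.0 if other == "google" else 0.0


classes = {
    "$organization": {"ibm", "intel"},
    "google": {"google"},
    "works": {"works"},
}
result = constrained_greedy(classes, {"$organization"}, 2, toy_score)
print(result)
```

<div>With this toy score, google is folded into $organization and works keeps its own class; with no seed classes at all, no merge is allowed, which is the behavior I am hoping ngram-class can be made to respect.</div>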