[SRILM User List] linear interpolation of different vocabulary language models

Andreas Stolcke stolcke at icsi.berkeley.edu
Wed Jan 16 19:22:20 PST 2013


On 1/16/2013 6:00 PM, Marta Ruiz wrote:
> Hi Andreas,
>
> regarding this issue, I got the error
>
>  class definition has too many fields
That means your class definitions file contains a very long line.
Each line should hold a single class membership definition.
If a class has many members, write one member per line, for example:

NN    cat
NN    dog
NN    ball

etc.
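
If the existing file packs all members of a class onto one line, a small script can split it up. The following is only a sketch: it assumes every input line is a class name followed by whitespace-separated single-word members, with no probability field and no multi-word expansions. The function name is made up for illustration.

```python
def split_class_members(lines):
    """Turn 'TAG w1 w2 w3' lines into one 'TAG<TAB>w' line per member.

    Assumption: no probability field and no multi-word members in the input.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and classes listed without members
        tag, members = fields[0], fields[1:]
        for word in members:
            out.append(f"{tag}\t{word}")
    return out


# Example with the classes from this message plus one hypothetical class.
packed = ["NN cat dog ball", "VB run eat"]
print("\n".join(split_class_members(packed)))
```

Lines that already carry a probability or a multi-word expansion would be mangled by this, so check the input first.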

Andreas
>
> In fact, I wanted to expand a language model over PoS tags into words,
> and each PoS tag has many words associated with it.
>
>
> best regards,
> Marta
>
> On Wed, Jan 9, 2013 at 3:34 PM, Andreas Stolcke 
> <stolcke at icsi.berkeley.edu> wrote:
>
>     On 1/8/2013 6:07 PM, Marta Ruiz wrote:
>>     Thanks Andreas, two more questions
>>
>>
>>         1. Create a word-based version of each model.  For example,
>>         you can construct a POS-based LM and combine it with a class
>>         membership mapping (in classes-format, see man page) to get a
>>         word-level POS-based model.   Similar with lemma-based LMs
>>         (the lemmas are effectively word classes).
>>
>>
>>     What is the command to do this?
>
>     1. You create the class-to-word mapping file (in the format
>     described here
>     <http://www.speech.sri.com/projects/srilm/manpages/classes-format.5.html>)
>     to reflect either your POS-to-word or lemma-to-word mapping.
>     2. Process the training data to replace the words with POS or
>     lemmas, as appropriate.
>     3. Train the ngram portion of the LM by running ngram-count on the
>     training data represented as a sequence of POS tags / lemmas (from
>     step 2).
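
Steps 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not part of SRILM: the training data is assumed to be available as sentences of (word, tag) pairs, the membership probabilities are estimated by relative frequency (a choice made here, not something prescribed in this thread), and each output line follows the "class probability word" layout of classes-format(5). The function name is hypothetical.

```python
from collections import Counter


def build_class_lm_inputs(tagged_sentences):
    """Return (class_lines, tag_sentences) from sentences of (word, tag) pairs."""
    pair_counts = Counter()   # how often each (tag, word) pair occurs
    tag_counts = Counter()    # how often each tag occurs
    tag_sentences = []        # step 2: the corpus rewritten as tag sequences
    for sentence in tagged_sentences:
        tags = []
        for word, tag in sentence:
            pair_counts[(tag, word)] += 1
            tag_counts[tag] += 1
            tags.append(tag)
        tag_sentences.append(" ".join(tags))
    # Step 1: one "TAG p word" line per member, p = count(tag, word) / count(tag).
    class_lines = [
        f"{tag} {count / tag_counts[tag]:.6g} {word}"
        for (tag, word), count in sorted(pair_counts.items())
    ]
    return class_lines, tag_sentences


# Toy corpus, invented for the example.
corpus = [
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("the", "DT"), ("dog", "NN"), ("eats", "VB")],
]
class_lines, tag_sentences = build_class_lm_inputs(corpus)
print("\n".join(class_lines))
print("\n".join(tag_sentences))
```

For step 3, the tag sentences would be written to a file and passed to ngram-count through its -text option, with -lm naming the output model.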
>
>
>
>>         2. Then interpolate the models using
>>
>>             ngram -bayes 0 -lm LM1 -mix-lm LM2 -mix-lm2 LM3 .... -lambda ... -mix-lambda2 ... -classes CLASSES
>>
>>         where CLASSES is a classes-format(5) file defining the union
>>         of all the word classes used in the various component models.
>>
>>
>>     To find the lambdas I can use compute-best-mix, can't I?
>     Exactly.
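
For intuition about how such weights are found, here is a minimal sketch of expectation-maximization for linear interpolation weights. It is not the compute-best-mix code. It assumes you already have, for each held-out token, the probability each component model assigns to it (compute-best-mix gets these from "ngram -debug 2 -ppl" output). The function name and the numbers are made up.

```python
def em_mixture_weights(probs, iterations=50):
    """Estimate interpolation weights by EM.

    probs[i][k] is the probability model k assigns to held-out token i.
    Assumes every token gets nonzero probability from at least one model.
    """
    n_models = len(probs[0])
    lambdas = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * n_models
        for row in probs:
            # Probability of this token under the current mixture.
            mix = sum(lam * p for lam, p in zip(lambdas, row))
            # E-step: posterior share of each model for this token.
            for k in range(n_models):
                totals[k] += lambdas[k] * row[k] / mix
        # M-step: new weight = average posterior share over all tokens.
        lambdas = [t / len(probs) for t in totals]
    return lambdas


# Invented per-token probabilities for two models on four tokens.
held_out = [[0.20, 0.05], [0.01, 0.10], [0.30, 0.25], [0.02, 0.08]]
print(em_mixture_weights(held_out))
```

The resulting weights sum to 1 and would be supplied to ngram through -lambda, -mix-lambda2 and so on, as in the command quoted above.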
>
>     Andreas
>
>
>
>
> -- 
> Marta Ruiz Costa-jussà
> martaruizcostajussa at gmail.com
> http://gps-tsc.upc.es/veu/personal/mruiz/mruiz.php3
