Language Model output problem using FLM

Antoine Ghaoui Antoine.Ghaoui at
Thu Feb 15 00:09:39 PST 2007


I'm trying to use fngram-count to generate a language model. To start,
I'm trying to generate a trigram model in order to become familiar with
the tool.

The factor file is:

## word trigram
W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
W1W2    W2      kndiscount gtmin 1 interpolate
W1      W1      kndiscount gtmin 1 interpolate
0       0       kndiscount gtmin 1

The command line used is:
fngram-count -factor-file flm_spc.1 -text ntextfile_99.flm \
    -lm ntextfile_99.flm.lm -vocab ntextfile.vocab.flm

The generated LM file looks a little strange. Part of it is shown below:

ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198

-2.313375       </s>
-99     <s>
-0.9892201      <s> W-LTN       -1.629908

-0.9725394      <s> <s> W-LTN   -1.654503
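
To make the header lines easier to read, here is a small sketch that decodes them. It rests on an assumption of mine that the output itself does not confirm: that the hex value after "ngram" is a bitmask over the parents in the order they appear in the factor file, with bit 0 standing for W(-1) and bit 1 for W(-2). The names PARENTS and decode_header are made up for this illustration and are not part of SRILM.

```python
import re

# Assumed parent order, taken from the factor-file line "W : 2 W(-1) W(-2) ..."
PARENTS = ["W(-1)", "W(-2)"]

# Matches header lines of the form "ngram 0x3=6490198"
HEADER_RE = re.compile(r"^ngram\s+(0x[0-9A-Fa-f]+)=(\d+)$")


def decode_header(lines, parents=PARENTS):
    """Map each hex mask to (list of context parents, count)."""
    result = {}
    for line in lines:
        m = HEADER_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a header line
        mask = int(m.group(1), 16)
        count = int(m.group(2))
        # Assumption: bit i set means parents[i] is part of the context.
        context = [p for i, p in enumerate(parents) if mask & (1 << i)]
        result[mask] = (context, count)
    return result


header = [
    "ngram 0x0=18119",
    "ngram 0x1=2855740",
    "ngram 0x2=0",
    "ngram 0x3=6490198",
]

for mask, (context, count) in decode_header(header).items():
    label = ", ".join(context) if context else "(no parents)"
    print(f"0x{mask:X}: context = {label}; count = {count}")
```

If that reading is right, 0x2 would be the context made of W(-2) alone, which my factor file does not list as a backoff node, but I would like confirmation.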

Can you please help with this? Is it normal to have ngram 0x2=0? How
can I get the old format?

Thanks for your help


More information about the SRILM-User mailing list