Language Model output problem using FLM

Antoine Ghaoui Antoine.Ghaoui at jinny.ie
Thu Feb 15 00:09:39 PST 2007


Hello,

I'm trying to use fngram-count to generate a language model based on
morphology. As a first step, I'm generating a word trigram model in
order to become familiar with the tool.

The factor file is:

## word trigram
1
W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
W1W2    W2      kndiscount gtmin 1 interpolate
W1      W1      kndiscount gtmin 1 interpolate
0       0       kndiscount gtmin 1

The command line used is:
fngram-count -factor-file flm_spc.1 -text ntextfile_99.flm -lm ntextfile_99.flm.lm -vocab ntextfile.vocab.flm

The generated LM file looks a little strange. Part of it is shown
below:
\data\
ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198


\0x0-grams:
-2.313375       </s>
-99     <s>
.
.
\0x1-grams:
-0.9892201      <s> W-LTN       -1.629908
.
.
\0x2-grams:

\0x3-grams:
-0.9725394      <s> <s> W-LTN   -1.654503
.
.
\end\
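
One way to read the header above is sketched below in Python. It rests on an assumption that is not confirmed anywhere in this message: that each hexadecimal id is a bit mask over the parents listed in the factor file, with the lowest bit standing for the first parent, W(-1). The helper names (parse_data_header, parents_for) are hypothetical and are not part of SRILM.

```python
import re

# Parents in the order given on the factor file line
# "W : 2 W(-1) W(-2) ...". ASSUMPTION: bit 0 of the hex id is the
# first parent, bit 1 the second.
PARENTS = ["W(-1)", "W(-2)"]


def parse_data_header(text):
    """Return {node_id: count} from lines of the form 'ngram 0xN=COUNT'."""
    counts = {}
    pattern = r"^ngram\s+0x([0-9A-Fa-f]+)=(\d+)\s*$"
    for match in re.finditer(pattern, text, re.MULTILINE):
        counts[int(match.group(1), 16)] = int(match.group(2))
    return counts


def parents_for(node_id, parents=PARENTS):
    """Decode a node id into the parents whose bit is set (assumed encoding)."""
    return [p for i, p in enumerate(parents) if node_id & (1 << i)]


# The four count lines copied from the \data\ section above.
header = """ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198
"""

for node_id, count in sorted(parse_data_header(header).items()):
    print(hex(node_id), parents_for(node_id), count)
```

If that assumption holds, 0x2 would be the context W(-2) on its own. The factor file only lists the nodes W1W2, W1 and 0, so that context is never used, which would be consistent with a count of zero.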

Can you please help with this? Is it normal to have ngram 0x2=0? How
can I get the old format?

Thanks for your help

Antoine


