Language Model output problem using FLM
Antoine Ghaoui
Antoine.Ghaoui at jinny.ie
Thu Feb 15 00:09:39 PST 2007
Hello,
I'm trying to use fngram-count to build a morphology-based language
model. As a first step, I'm generating a plain word trigram model to
get familiar with the tool.
The factor file is:
## word trigram
1
W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
W1W2 W2 kndiscount gtmin 1 interpolate
W1 W1 kndiscount gtmin 1 interpolate
0 0 kndiscount gtmin 1
The command line used is:
fngram-count -factor-file flm_spc.1 -text ntextfile_99.flm -lm
ntextfile_99.flm.lm -vocab ntextfile.vocab.flm
The generated LM file looks a little strange. Part of it is shown
below:
\data\
ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198
\0x0-grams:
-2.313375 </s>
-99 <s>
.
.
\0x1-grams:
-0.9892201 <s> W-LTN -1.629908
.
.
\0x2-grams:
\0x3-grams:
-0.9725394 <s> <s> W-LTN -1.654503
.
.
\end\
Could you please help with this? Is it normal to have ngram 0x2=0?
And how can I get the old format?
Thanks for your help,
Antoine
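
A note on the hexadecimal labels in the excerpt above, offered as a reading rather than as confirmed tool behaviour: if each label is taken as a bitmask over the parents in the order they are declared in the factor file (bit 0 for W(-1), bit 1 for W(-2)), then 0x3 is the full trigram context, 0x1 the bigram context, and 0x0 the unigram. Under that assumption, 0x2 would be a context containing only W(-2), which the backoff path in the factor file (drop W2, then W1) never visits, so an empty 0x2 section would be expected. The Python sketch below only illustrates this reading; the names decode_node, node_label and PARENTS are hypothetical and not part of SRILM.

```python
# Assumption: node labels are bitmasks over the parents, in declaration order.
PARENTS = ["W(-1)", "W(-2)"]  # bit 0 -> W(-1), bit 1 -> W(-2)


def decode_node(label, parents=PARENTS):
    """Return the parents whose bit is set in a hex label such as '0x3'."""
    mask = int(label, 16)
    return [name for bit, name in enumerate(parents) if mask & (1 << bit)]


def node_label(present, parents=PARENTS):
    """Return the hex label for a set of parents that are still present."""
    mask = sum(1 << bit for bit, name in enumerate(parents) if name in present)
    return hex(mask)


# Backoff path from the factor file: start with both parents,
# drop W(-2) first, then drop W(-1).
path = [{"W(-1)", "W(-2)"}, {"W(-1)"}, set()]
visited = {node_label(context) for context in path}

for label in ["0x0", "0x1", "0x2", "0x3"]:
    status = "on backoff path" if label in visited else "not on backoff path"
    print(label, decode_node(label), status)
```

Under these assumptions, 0x2 is the only label the backoff path never reaches, which would be consistent with the zero count reported in the \data\ section.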
More information about the SRILM-User mailing list