read/write counts in FLMs
tanel.alumae at aqris.com
Fri Jun 10 06:46:50 PDT 2005
As far as I understand, you need both the FLM LM file and the FLM counts
file to actually use the FLM. So you should always use both the
-write-counts and the -lm options when building an FLM.
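
A minimal sketch of that single-pass invocation, under some assumptions:
the file names are hypothetical, and -factor-file and -text are the flag
names as I recall them from the fngram-count man page (check
"fngram-count -help" on your build). The counts and LM output files are
not given on the command line; they are named inside the FLM
specification file.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
FLM_SPEC=model.flm          # FLM specification (names the counts and LM files)
TRAIN=train.factored.txt    # factored (tagged) training text

# One pass: collect counts, write them out, and estimate the LM together.
FLM_CMD="fngram-count -factor-file $FLM_SPEC -text $TRAIN -write-counts -lm"

# Run only if SRILM and the spec file are present; otherwise just show the command.
if command -v fngram-count >/dev/null 2>&1 && [ -f "$FLM_SPEC" ]; then
    $FLM_CMD
else
    echo "would run: $FLM_CMD"
fi
```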
As for -read-counts, I believe that you could use a general counts file
there (i.e. one that counts the occurrences of tagged words rather than
the factors). You can get the general counts file from the tagged corpus
using the ngram-count program, just as for an untagged corpus.
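
For illustration, a sketch of producing such a general counts file with
ngram-count. The file names and the order of 3 are assumptions; whether
fngram-count then accepts this file through -read-counts is the untested
belief stated above, not something this sketch verifies.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
TAGGED=train.tagged.txt        # tagged corpus, one sentence per line
COUNTS=train.tagged.counts     # output: ordinary N-gram counts over tagged words
ORDER=3                        # assumed N-gram order

# Count N-grams of whole tagged tokens, exactly as for plain text.
NGRAM_CMD="ngram-count -text $TAGGED -order $ORDER -write $COUNTS"

# Run only if SRILM and the corpus are present; otherwise just show the command.
if command -v ngram-count >/dev/null 2>&1 && [ -f "$TAGGED" ]; then
    $NGRAM_CMD
else
    echo "would run: $NGRAM_CMD"
fi
```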
The FLM counts file uses a special format (look into it and you will
see), which probably confuses fngram-count when it is fed back in using
-read-counts.
Hope this helps,
On Wed, 2005-06-08 at 09:29 -0400, Shachi Dave wrote:
> I am trying to build a factored language model (FLM) using "fngram-count"
> in the SRILM toolkit.
> When I run it using "-write-counts" and "-lm" options together, it
> builds the FLM correctly. But when I try to break it down into two steps:
> (a) only the "-write-counts" option to write the counts file
> (b) the "-read-counts" and "-lm" options to build the FLM using the counts file
> it gives errors. I checked the debug output; it seems it is getting the
> count-of-counts for modified Kneser-Ney discounting wrong in the step
> (b) above. The counts file generated in step (a) is identical to
> the one generated using both "-write-counts" and "-lm" options together.
> I tried these steps using a couple of different FLM specifications and
> the error is the same. Has anyone faced this problem before? I would
> appreciate it if you could help me out here.