read/write counts in FLMs

Tanel Alumäe tanel.alumae at aqris.com
Fri Jun 10 06:46:50 PDT 2005


Hello,

As far as I understand, you need both the FLM LM file and the FLM counts
file to use the FLM, so you should always pass both the -write-counts
and the -lm options when building an FLM.
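
A minimal sketch of such a one-step build, written as a small Python
helper that only assembles and prints the command line. The file names
model.flm and train.factored.txt are hypothetical. I am also assuming
that -write-counts and -lm take no argument because the output file
names come from the FLM specification given with -factor-file; please
check this against your fngram-count man page.

```python
import shlex


def fngram_count_cmd(flm_spec, corpus):
    """Return argv for a one-step fngram-count run (counts and LM together)."""
    return [
        "fngram-count",
        "-factor-file", flm_spec,  # FLM spec; assumed to name the counts and LM files
        "-text", corpus,           # factored (tagged) training text
        "-write-counts",           # write the FLM counts file
        "-lm",                     # estimate and write the FLM LM in the same run
    ]


# Hypothetical file names, for illustration only.
cmd = fngram_count_cmd("model.flm", "train.factored.txt")

# Print the command for inspection; to execute it, pass the list to
# subprocess.run(cmd, check=True) on a machine with SRILM installed.
print(shlex.join(cmd))
```

Keeping both options in one invocation means the counts file and the LM
file always come from the same run.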

As for -read-counts, I believe that you could use a general counts file
there (i.e. one that counts the occurrences of tagged words rather than
of the factors). You can get the general counts file from the tagged
corpus using the ngram-count program, just as for an untagged corpus.
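
As a rough sketch of that step, the helper below assembles an
ngram-count command that treats every whitespace-separated token of the
tagged corpus as one word and writes plain N-gram counts. The file names
and the order of 3 are my assumptions, not something from your setup,
and whether fngram-count accepts the result via -read-counts is untested.

```python
import shlex


def ngram_count_cmd(corpus, counts_out, order=3):
    """Return argv for writing general N-gram counts with ngram-count."""
    return [
        "ngram-count",
        "-text", corpus,       # tagged corpus, each token counted as a whole word
        "-order", str(order),  # maximum N-gram length (assumed value)
        "-write", counts_out,  # output file for the general counts
    ]


# Hypothetical file names, for illustration only.
general_cmd = ngram_count_cmd("train.factored.txt", "train.general.counts")

# Print the command for inspection; to execute it, pass the list to
# subprocess.run(general_cmd, check=True) where SRILM is installed.
print(shlex.join(general_cmd))
```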

The FLM counts file uses a special format (look into it and you will
see), which probably confuses fngram-count when the file is fed back in
using -read-counts.

Hope this helps,

Tanel A.


On Wed, 2005-06-08 at 09:29 -0400, Shachi Dave wrote:
> Hi,
> 
> I am trying to build a factored language model (FLM) using "fngram-count"
> in the SRILM toolkit.
> 
> When I run it using "-write-counts" and "-lm" options together, it
> builds the FLM correctly. But when I try to break it down into two
> steps:
> (a) only "-write-counts" option to write the counts file
> (b) "-read-counts" and "-lm" options to build the FLM using the counts
> file
> 
> it gives errors. I checked the debug output; it seems it is getting the
> count-of-counts for modified Kneser-Ney discounting wrong in step
> (b) above. The counts file generated in step (a) is identical to
> the one generated using both "-write-counts" and "-lm" options together.
> I tried these steps using a couple of different FLM specifications and
> the error is the same. Has anyone faced this problem before? I would
> appreciate it if you could help me out here.
> 
> Thanks,
> Shachi
> 
> 
> 
> 



