</s> Backoff missing

Tolos, Marta tolos at sony.de
Wed Aug 21 02:06:22 PDT 2002


Hi all,

I have a problem using the toolkit. I create a language model using only the
ngram-count command:

ngram-count -text my.text -lm my.arpa -wbdiscount1 -wbdiscount2 -wbdiscount3


My text file contains the sentence markers <s> and </s>.

In the resulting ARPA file, the unigram </s> has no backoff weight, and
neither do any of the bigrams that have </s> as their second word.
Does anyone know how to get these backoff weights? My problem is that the
recognizer complains about the format of my language model: it skips every
bigram that has no backoff weight, and because there are so many of them it
eventually stops.

I also have another question about the format of the ARPA file that is
created. The probabilities and the words are not separated by a single space,
and this also causes problems with the recognizer I am using. To work around
it, I currently run a perl script that fixes the format and then use the
converted file, which has only single spaces. Is there an option to get
single spaces directly?
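
For illustration, the conversion amounts to something like the following
sketch. It is written in Python here rather than perl, the function names are
hypothetical, and it assumes the separators in the ARPA file are runs of
spaces and/or tabs:

```python
import re


def normalize_arpa_line(line: str) -> str:
    """Collapse every run of spaces/tabs in one line to a single space.

    Leading and trailing whitespace (including the newline) is dropped.
    """
    return re.sub(r"[ \t]+", " ", line.strip())


def normalize_arpa(text: str) -> str:
    """Apply the same normalization to a whole ARPA file held in a string.

    Blank lines are kept as blank lines, so the section layout of the
    file (\\data\\, \\1-grams:, ..., \\end\\) is preserved.
    """
    return "\n".join(normalize_arpa_line(l) for l in text.splitlines()) + "\n"
```

A whole file can be converted by reading it into a string, passing it to
normalize_arpa, and writing the result to a new file.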


Thanks a lot.

Best,

Marta



