LM missing back-off probabilities

Wed May 25 13:57:50 PDT 2005

I hope this message can help you.

To use CMU Sphinx with an LM estimated with SRILM, you have to use two 
tools provided with the SRILM toolkit:

- add-dummy-bows: this program adds the 'missing' back-off weights (in 
fact, when a weight equals 0, ngram-count doesn't print it).
- sort-lm: this program sorts the n-grams in lexical order (lm3gdmp works 
only if the n-grams are sorted; in fact, the 2-grams, 3-grams, ..., 
k-grams all have to be sorted in the same order).

These two tools are written in awk (awk or gawk has to be installed 
on your computer).
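
To illustrate what these two steps do to an ARPA-format LM file, here is 
a minimal Python sketch. It is not the SRILM awk code: the function names 
add_dummy_bows and sort_ngrams, and the SAMPLE model with its 
probabilities, are made up for illustration. It assumes 
whitespace-separated fields and sorts by plain code-point order, which 
may differ from the order the real sort-lm produces.

```python
import re

# Hypothetical toy model; the numbers are invented for illustration only.
SAMPLE = r"""\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.7 tricks
-0.5 accounting -0.3
-0.4 </s>

\2-grams:
-0.1 tricks </s>
-0.2 accounting tricks

\end\
"""


def add_dummy_bows(arpa_text):
    """Append a back-off weight of 0 to n-grams below the highest order
    that were written without one."""
    lines = arpa_text.splitlines()
    # The highest order is read from header lines such as "ngram 2=2".
    high = 0
    for line in lines:
        header = re.match(r"ngram\s+(\d+)=", line)
        if header:
            high = max(high, int(header.group(1)))
    out, current = [], 0
    for line in lines:
        section = re.match(r"\\(\d+)-grams:", line)
        if section:
            current = int(section.group(1))
        elif line.startswith("\\"):
            current = 0  # \data\ or \end\
        elif current and current < high and line.strip():
            # An entry is: logprob, n words, and optionally a back-off weight.
            if len(line.split()) == current + 1:
                line = line + "\t0"
        out.append(line)
    return "\n".join(out)


def sort_ngrams(arpa_text):
    """Sort the entries of every n-gram section by their word sequence."""
    out, block, current = [], [], 0

    def flush():
        # Sort the pending entries of the section that is ending.
        block.sort(key=lambda entry: entry.split()[1:1 + current])
        out.extend(block)
        block.clear()

    for line in arpa_text.splitlines():
        section = re.match(r"\\(\d+)-grams:", line)
        if section or line.startswith("\\") or not line.strip():
            flush()
            if section:
                current = int(section.group(1))
            elif line.startswith("\\"):
                current = 0
            out.append(line)
        elif current:
            block.append(line)
        else:
            out.append(line)
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    print(sort_ngrams(add_dummy_bows(SAMPLE)))
```

In the toy model, the unigrams "tricks" and "</s>" gain an explicit 
back-off weight of 0, the bigrams (the highest order here) are left 
without one, and both sections end up sorted the same way.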

-- Yannick

Goldee Udani wrote:

> Hi there,
>
> I am sorry if this problem has already been addressed before on this 
> forum.
>
> I am trying to generate a small LM for use in the Sphinx speech 
> recognition system, but the back-off probabilities for every n-gram 
> occurring at the end of a sentence are missing.
> For example -
>
> <s> we cannot afford to fight the war against poverty with accounting 
> tricks </s>
>
> For a trigram LM, it doesn't generate back-off probabilities for 
> "tricks" (unigram) and "accounting tricks" (bigram). This tends to 
> happen for all the sentences in the test set taken from the corpus.
>
> I am trying to use the "ngram-count" script with Witten-Bell 
> discounting applied to all n-grams in a trigram model.
>
> If any of you have faced a similar problem before, I would appreciate 
> it if you could help me out here.
>
> Thanks,
> Goldee
>
>