LM missing back-off probabilities
Yannick Estève - LIUM
yannick.esteve at lium.univ-lemans.fr
Wed May 25 13:57:50 PDT 2005
I hope this message helps.
To use CMU Sphinx with an LM estimated with SRILM, you have to run two tools
provided with the SRILM toolkit:
- add-dummy-bows: this program adds the 'missing' back-off weights (they
are not really missing: when a weight equals 0, in the log10 scale used by
the ARPA format, ngram-count simply doesn't print it)
- sort-lm: this program sorts the n-grams in lexical order (lm3gdmp works
only if the n-grams are sorted; more precisely, the 2-grams, 3-grams, ...,
k-grams all have to be sorted in the same order).
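To illustrate what the two scripts do, here is a minimal Python sketch (not the SRILM awk code) that performs both steps on an ARPA-format LM held in memory. It assumes the usual layout written by ngram-count (log10 probability, the words, then an optional log10 back-off weight, tab-separated), writes 0 wherever a lower-order n-gram has no weight, and sorts each section by the plain code-point order of its word tuples. Whether lm3gdmp expects exactly this collation is an assumption, and `fix_arpa` is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

# "\N-grams:" section headers of an ARPA-format LM.
SECTION_RE = re.compile(r"^\\(\d+)-grams:$")
# "ngram N=count" lines of the \data\ header.
COUNT_RE = re.compile(r"^ngram (\d+)=")


def fix_arpa(lines: List[str]) -> List[str]:
    """Add dummy back-off weights and sort every n-gram section.

    Hypothetical helper, not part of SRILM. Assumes a well-formed ARPA
    file whose entries look like "logprob<TAB>w1 ... wn[<TAB>bow]".
    """
    # The highest order k is the largest N announced in the \data\ header.
    max_order = 0
    for line in lines:
        m = COUNT_RE.match(line.strip())
        if m:
            max_order = max(max_order, int(m.group(1)))

    out: List[str] = []
    order = 0  # order of the current section; 0 outside \N-grams: sections
    entries: List[Tuple[Tuple[str, ...], str]] = []

    def flush() -> None:
        # Emit the buffered section sorted by its word tuple, so that the
        # 2-grams, 3-grams, ... all follow the same (code-point) order.
        if order:
            out.extend(text for _, text in sorted(entries, key=lambda e: e[0]))
            out.append("")  # blank line separating sections
        entries.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        m = SECTION_RE.match(stripped)
        if m or stripped == "\\end\\":
            flush()
            order = int(m.group(1)) if m else 0
            out.append(stripped)
            continue
        if order:
            if stripped:  # blank lines inside a section are re-added by flush()
                fields = stripped.split()
                prob, words = fields[0], tuple(fields[1:order + 1])
                bow: Optional[str] = fields[order + 1] if len(fields) > order + 1 else None
                if bow is None and order < max_order:
                    bow = "0"  # dummy weight: log10(1) = 0, lower order unchanged
                text = prob + "\t" + " ".join(words)
                if bow is not None:
                    text += "\t" + bow
                entries.append((words, text))
            continue
        out.append(line)  # \data\ header, counts and other lines pass through
    flush()
    return out
```

To process a file, pass it the LM's lines (for example from `open(path).read().splitlines()`) and write `"\n".join(result)` back out. Highest-order n-grams never get a weight, since nothing backs off from them.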
Both tools are written in awk, so awk or gawk must be installed on your
computer.
-- Yannick
Goldee Udani wrote:
> Hi there,
>
> I am sorry if this problem has already been addressed before on this
> forum.
>
> I am trying to generate a small LM for use in the Sphinx speech
> recognition system, but the back-off probabilities for every n-gram
> occurring at the end of a sentence are missing.
> For example -
>
> <s> we cannot afford to fight the war against poverty with accounting
> tricks </s>
>
> For a trigram LM, it doesn't generate back-off probabilities for
> "tricks" (unigram) and "accounting tricks" (bigram). This tends to
> happen for all the sentences in the test set taken from the corpus.
>
> I am trying to use the "ngram-count" script with Witten-Bell
> discounting applied to all n-grams of a trigram model.
>
> If any of you have faced a similar problem before, I would appreciate
> it if you could help me out here.
>
> Thanks,
> Goldee
>
>
More information about the SRILM-User mailing list