0-grams
Andreas Stolcke
stolcke at speech.sri.com
Mon Jul 8 11:40:41 PDT 2002
There are no 0-gram models, mostly because the DARPA format does not
support them. For that reason, SRILM handles the backoff probability mass
at the unigram level in a special way: it is distributed over all unobserved
words. This is equivalent to backing off to a 0th-order distribution.
In practical terms, you use
ngram-count -vocab VOCAB -order 1 -lm LM
Since no ngram counts or text data are supplied, the mechanism that
distributes backoff probability mass for unigrams will spread all
probability uniformly over the entire vocabulary (which you have to
supply, of course).
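
To make the arithmetic concrete, here is a rough Python sketch of the idea,
not SRILM's actual code. The function name unigram_probs and the toy
vocabulary are made up for illustration, and the sketch assumes the leftover
mass is split evenly and ignores special tokens such as sentence boundary
markers. With no observed words at all, every vocabulary entry ends up
with probability 1/|V|.

```python
import math

def unigram_probs(vocab, observed=None):
    """Give unobserved words an equal share of the leftover probability mass.

    `observed` maps words to probabilities already assigned from counts
    (hypothetical input; empty when no counts or text are supplied).
    """
    observed = observed or {}
    unseen = [w for w in vocab if w not in observed]
    leftover = 1.0 - sum(observed.values())
    probs = dict(observed)
    for w in unseen:
        # Each unobserved word receives the same share of the leftover mass.
        probs[w] = leftover / len(unseen)
    return probs

# No counts at all: all of the mass is "backoff" mass, so the result is uniform.
vocab = ["a", "b", "c", "d"]
uniform = unigram_probs(vocab)
for w in vocab:
    # Print the probability and its log10 value.
    print(w, uniform[w], math.log10(uniform[w]))
```

If some words do have estimates, for example unigram_probs(vocab, {"a": 0.4,
"b": 0.3}), the remaining 0.3 is split between "c" and "d".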
Of course, -order 0 should not make the program core dump -- I'll fix that.
--Andreas
In message <3D2947A2.7040304 at ei.tum.de> you wrote:
> Hello,
>
> I'd like to create 0-grams as well as higher-order n-grams, but when I
> call ngram-count with option -order 0 I get a segmentation fault (SRI LM
> 1.3.1).
>
> Regards
> Matthias
>