0-grams

Mon Jul 8 11:40:41 PDT 2002

There are no 0-gram models, mostly because the DARPA format does not
support them.  Because of that, SRILM handles the backoff probability mass
at the unigram level in a special way:  it is distributed over all unobserved
words.  This is equivalent to backing off to a 0th-order distribution.
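A minimal sketch of the idea, for illustration only: it assumes the observed words already have discounted unigram estimates (the discounting method itself is not shown), and it splits whatever mass is left over evenly across vocabulary words that were never seen. The function name `distribute_leftover_mass` is hypothetical and is not part of SRILM.

```python
from typing import Dict, Iterable


def distribute_leftover_mass(discounted: Dict[str, float],
                             vocab: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: spread leftover unigram mass over unseen words.

    `discounted` holds (assumed) discounted probabilities of observed words;
    their sum is below 1, and the remainder is the backoff mass.
    """
    vocab = list(vocab)
    # Start from the discounted estimates; unseen words begin at zero.
    probs = {w: discounted.get(w, 0.0) for w in vocab}
    leftover = 1.0 - sum(probs.values())
    unseen = [w for w in vocab if w not in discounted]
    # Share the leftover mass equally among unseen vocabulary words.
    # (If every word was observed, this sketch leaves the mass unassigned.)
    if unseen and leftover > 0.0:
        share = leftover / len(unseen)
        for w in unseen:
            probs[w] = share
    return probs


if __name__ == "__main__":
    # Made-up estimates for two observed words out of a 4-word vocabulary.
    p = distribute_leftover_mass({"a": 0.5, "b": 0.3}, ["a", "b", "c", "d"])
    # "c" and "d" each receive about (1 - 0.8) / 2 = 0.1.
    print(p)
```

With an empty `discounted` dictionary every vocabulary word is unseen, so each gets an equal share of the full probability mass.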

In practical terms, to get the equivalent of a 0-gram model, you use

	ngram-count -vocab VOCAB -order 1 -lm LM 

Since no N-gram counts or text data are supplied, the mechanism that
distributes backoff probability mass for unigrams will spread all of the
probability uniformly over the entire vocabulary (which you have to
supply, of course).
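As a worked check of the uniform case, write V for the vocabulary and \hat p for the discounted estimates of seen words. With no counts the set of seen words is empty, so the whole probability mass is left over and shared equally. This ignores any special treatment SRILM may give to sentence-boundary tokens such as <s>, which is an assumption here; the DARPA format stores these values as base-10 logarithms.

```latex
p(w) = \frac{1 - \sum_{v \in \mathrm{seen}} \hat{p}(v)}{|V \setminus \mathrm{seen}|}
     = \frac{1 - 0}{|V|} = \frac{1}{|V|}, \qquad w \in V

\log_{10} p(w) = -\log_{10} |V| \qquad \text{(e.g. } |V| = 1000 \Rightarrow -3\text{)}
```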

Of course, -order 0 should not make the program dump core; I'll fix that.

--Andreas

In message <3D2947A2.7040304 at ei.tum.de> you wrote:
> Hello,
> 
> I'd like to create 0-grams as well as higher-order n-grams, but when I
> call ngram-count with option -order 0 I get a segmentation fault (SRI LM
> 1.3.1).
> 
> Regards
> Matthias
>