FW: A simple question about SRILM

Andreas Stolcke stolcke at speech.sri.com
Mon May 17 10:37:58 PDT 2004


In message <001701c43c3c$65fc62c0$34284484 at cs.technion.ac.il> you wrote:
> Hi,
> 
> I have the same problem. I want the LM to give maximum-likelihood estimates.
> That is, all the backoff weights should be zero.
> 
> I applied the solution below, but still I get backoff weights. 
> 
> For example, when I build the lm like this:
> ngram-count -order 3 -gt1max 0 -gt2max 0 -gt3max 0 -text corpus.tags -lm corpus.tags.lm
> 
> I found that the once-occurring trigrams DO NOT APPEAR in the lm, so
> probability mass is still discounted.

the default minimum occurrence count for trigrams is 2.  set it to 1 to
include all trigrams:

-gt3min 1 etc.

that's why you still get backoff.
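
putting the two pieces together, the full command would look roughly like
the sketch below.  the file names are simply the ones from the quoted
command, and the check for ngram-count on the PATH is only an illustrative
guard, not part of SRILM:

```shell
# File names taken from the quoted command; adjust to your own corpus.
corpus=corpus.tags
lm=corpus.tags.lm

# -gt1max/-gt2max/-gt3max 0 turn off Good-Turing discounting for each order.
# -gt3min 1 lowers the trigram cutoff from its default of 2, so trigrams
# seen only once are kept in the model.
cmd="ngram-count -order 3 -gt1max 0 -gt2max 0 -gt3max 0 -gt3min 1 -text $corpus -lm $lm"

# Illustrative guard: run only if SRILM is installed and the corpus exists,
# otherwise just print the command that would be run.
if command -v ngram-count >/dev/null 2>&1 && [ -f "$corpus" ]; then
    $cmd
else
    echo "would run: $cmd"
fi
```

for higher orders the same pattern applies (-gt4min 1 and so on).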

> 
> When I turned on the debug messages, I saw many messages like: 
> warning: 0 backoff probability mass left for "AT SCLN" -- incrementing denominator
> 
> Does it mean that smoothing is enforced here?
> 
> Is there a way to get a pure maximum-likelihood language model, without
> backoff weights at all, using ngram-count?

see above.

--Andreas 



