FW: A simple question about SRILM
Roy Bar Haim
barhaim at cs.technion.ac.il
Mon May 17 13:05:31 PDT 2004
Hi Andreas,
Thanks for your super-fast reply!
I tried it as you suggested:
ngram-count -order 3 -gt1max 0 -gt1min 1 -gt2max 0 -gt2min 1 -gt3max 0 -gt3min 1 -text corpus.tags -lm corpus.tags.lm2 -debug 1
Many of the backoff weights indeed became -99 (which is good), but many
remained non-zero (although small: -6, -7, -8...).
Is there a way to make them all -99?
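(As a sanity check, here is a minimal Python sketch that counts the
non-zero backoff weights in the resulting LM. It assumes the standard
ARPA layout, where a line in a \N-grams: section with N+2 fields
carries a trailing backoff weight:)

    import sys

    # Count non-zero backoff weights in an ARPA-format LM.
    order, nonzero, total = None, 0, 0
    for line in open(sys.argv[1]):
        line = line.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])    # entering the \N-grams: section
            continue
        if not line or line.startswith("\\") or order is None:
            continue                                # \data\, \end\, header, blank lines
        fields = line.split()
        if len(fields) == order + 2:                # logprob, N words, backoff weight
            total += 1
            if float(fields[-1]) != 0.0:
                nonzero += 1
    print(nonzero, "of", total, "backoff weights are non-zero")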
The debug messages I got are listed below.
Thanks a lot,
Roy.
------------------------------------------------------------------------
corpus.tags: line 1892: 1892 sentences, 48332 words, 0 OOVs
0 zeroprobs, logprob= 0 ppl= 1 ppl1= 1
Good-Turing discounting 1-grams
GT-count [0] = 0
GT-count [1] = 0
warning: no singleton counts
GT discounting disabled
Good-Turing discounting 2-grams
GT-count [0] = 0
GT-count [1] = 126
GT discounting disabled
Good-Turing discounting 3-grams
GT-count [0] = 0
GT-count [1] = 2142
GT discounting disabled
discarded 1 2-gram contexts containing pseudo-events
discarded 2 3-gram contexts containing pseudo-events
writing 41 1-grams
writing 800 2-grams
writing 5145 3-grams
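(Side note on the "GT discounting disabled" lines above: the textbook
Good-Turing estimate is r* = (r+1) * n_{r+1} / n_r, where n_r is the
number of distinct n-grams seen exactly r times, so it is undefined
whenever a needed count-of-counts is zero. A minimal sketch of that
formula -- not SRILM's actual code:)

    from collections import Counter

    def gt_discounts(ngram_counts, gtmax):
        # n[r] = number of distinct n-grams observed exactly r times
        n = Counter(ngram_counts.values())
        d = {}
        for r in range(1, gtmax + 1):
            if n[r] == 0 or n[r + 1] == 0:            # e.g. no singletons (n_1 = 0)
                return None                           # estimate undefined -> GT disabled
            d[r] = (r + 1) * n[r + 1] / (r * n[r])    # discount factor r*/r
        return d

With -gtNmax 0, the loop body never runs and nothing is discounted,
which is the point of the command above.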
> -----Original Message-----
> From: Andreas Stolcke [mailto:stolcke at speech.sri.com]
> Sent: Monday, May 17, 2004 7:38 PM
> To: Roy Bar Haim
> Cc: srilm-user at speech.sri.com
> Subject: Re: FW: A simple question about SRILM
>
>
>
> In message <001701c43c3c$65fc62c0$34284484 at cs.technion.ac.il> you wrote:
> > Hi,
> >
> > I have the same problem. I want the LM to give maximum-likelihood
> > estimates. That is, all the backoff weights should be zero.
> >
> > I applied the solution below, but still I get backoff weights.
> >
> > For example, when I build the lm like this:
> > ngram-count -order 3 -gt1max 0 -gt2max 0 -gt3max 0 -text corpus.tags -lm corpus.tags.lm
> >
> > I found that the once-occurring trigrams DO NOT APPEAR in the lm, so
> > probability mass is still discounted.
>
> The default minimum occurrence count for trigrams is 2. Set it to 1
> to include all trigrams:
>
> -gt3min 1 etc.
>
> That's why you still get backoff.
>
> >
> > When I turned on the debug messages, I saw many messages like:
> > warning: 0 backoff probability mass left for "AT SCLN" -- incrementing denominator
> >
> > Does it mean that smoothing is enforced here?
> >
> > Is there a way to get a pure maximum-likelihood language model,
> > without backoff weights at all, using ngram-count?
>
> see above.
>
> --Andreas
>
>
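(For completeness: the pure maximum-likelihood model being asked for is
just relative frequencies, P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2). A
minimal sketch, using the usual <s>/</s> sentence padding:)

    from collections import Counter

    def ml_trigram_probs(sentences):
        # Maximum-likelihood trigram estimates: no discounting, no backoff.
        tri, bi = Counter(), Counter()
        for words in sentences:
            toks = ["<s>", "<s>"] + words + ["</s>"]
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                tri[(a, b, c)] += 1
                bi[(a, b)] += 1                  # context count for each trigram
        return {g: tri[g] / bi[g[:2]] for g in tri}

    # e.g. ml_trigram_probs([["AT", "SCLN"]]) gives each seen trigram
    # probability 1.0, since every context here occurs exactly once.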