Factored LMs and interpolated models

Katrin Kirchhoff katrin at ssli-mail.ee.washington.edu
Fri May 7 10:25:20 PDT 2004


In order to emulate the exact behaviour of ngram with fngram,
you need to use:

-no-virtual-begin-sentence
-nonull

and make sure that the smoothing options (smoothing method, gtmin, gtmax etc.)
in your FLM file correspond to the same values that ngram uses.

E.g. for a standard trigram 

ngram -lm <non-factored LM> -ppl <text>

and

fngram -factor-file <factored LM> -ppl <text> -no-virtual-begin-sentence -nonull

should give exactly the same perplexities. Andreas might be able to
say whether these two options are also needed when using ngram with
the -factored option.
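
When comparing the two tools it can help to recompute the perplexities
from the summary lines, to confirm that a gap comes from the logprob
itself and not from the word or sentence counts. The sketch below is only
an illustration: it assumes the usual SRILM convention that ppl is
normalised by words - OOVs - zeroprobs + sentences (counting one
end-of-sentence token per sentence) and ppl1 by words - OOVs - zeroprobs.
The function name is made up for this example, and the numbers are the
ones quoted in the message below.

```python
def perplexities(logprob, sentences, words, oovs, zeroprobs=0):
    """Recompute (ppl, ppl1) from the counts on a -ppl summary line.

    Assumes logprob is base 10 and that OOVs and zeroprobs are
    excluded from the number of scored tokens.
    """
    scored = words - oovs - zeroprobs              # word tokens that were scored
    ppl = 10 ** (-logprob / (scored + sentences))  # includes end-of-sentence tokens
    ppl1 = 10 ** (-logprob / scored)               # words only
    return ppl, ppl1


# Word-trigram FLM: fngram vs. ngram -factored (figures from the quoted message)
fngram_ppl, fngram_ppl1 = perplexities(-2760.87, sentences=61, words=1009, oovs=26)
ngram_ppl, ngram_ppl1 = perplexities(-2761.16, sentences=61, words=1009, oovs=26)

print(f"fngram: ppl={fngram_ppl:.3f} ppl1={fngram_ppl1:.3f}")
print(f"ngram:  ppl={ngram_ppl:.3f} ppl1={ngram_ppl1:.3f}")
```

Since the sentence, word and OOV counts are identical in both runs, any
difference in ppl has to come from the per-word probabilities the two
tools assign, which is what the options above are meant to align.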

Katrin 


> Still, I noticed a strange thing with perplexity calculation. Namely,
> the perplexity figures calculated by fngram and ngram are slightly
> different.  I used the following options and got following results:
> 
> fngram -ppl <testtext> -factor-file tmp/fngram_m.conf
> 
> Result: 
> 61 sentences, 1009 words, 26 OOVs 
> 0 zeroprobs, logprob= -2760.87 ppl= 441.076 ppl1= 643.604
> 
> ngram -factored -ppl <testtext> -lm tmp/fngram_m.conf
> 
> Result:
> 61 sentences, 1009 words, 26 OOVs 
> 0 zeroprobs, logprob= -2761.16 ppl= 441.359 ppl1= 644.042
> 
> 
> -- 
> 
> The above is for a FLM that in fact is standard word trigram. The
> difference is very small.
> 
> However, when I test a FLM that is a word-given-two-previous-classes
> trigram, the difference is much larger:
> 
> fngram -ppl <testtext> -factor-file tmp/fngram_c.conf 
> 
> 61 sentences, 1009 words, 26 OOVs 
> 0 zeroprobs, logprob= -2826.73 ppl= 510.034 ppl1= 750.963
> 
> And the same with ngram:
> 
> ngram -factored -lm tmp/fngram_c.conf -ppl <testtext>
> 
> 61 sentences, 1009 words, 26 OOVs 
> 0 zeroprobs, logprob= -2863.71 ppl= 553.378 ppl1= 818.917
> 
> 
> As you see, here the difference (ppl1= 750 vs 818) is significant. Could
> this be a configuration issue or a bug, or have I misunderstood something?
> 
> Regards,
> 
> Tanel Alumäe

-- 
-----------------------------------------------------------------
Katrin Kirchhoff
Dept of Electrical Engineering, University of Washington
M422 EE/CS Building, Box 352500, Seattle, WA, 98195
Phone: (206) 616 5494
katrin at ee.washington.edu
-----------------------------------------------------------------



