Factored LMs and interpolated models
Andreas Stolcke
stolcke at speech.sri.com
Fri May 7 07:02:40 PDT 2004
There are a few known bugs in the FLM code as last released.
They will be fixed in the next release (1.4.1), which I expect to
be out in a couple of days.
--Andreas
In message <1083912755.8267.7.camel at NOOL2> you wrote:
>
>
> > Let me know if this helps or if I have misunderstood your question...
> >
>
> Hello,
>
> First, thanks to everybody for help.
>
> My goal was, as Katrin correctly assumed, "to interpolate a
> traditional class-based model and a standard n-gram model but you want
> to express this within a single FLM file". This is currently not
> possible, but it's not very important because I learned that I can
> use:
>
> ngram -factored -lm <FLM1> -mix-lm <FLM2>
>
> The above really works.
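For reference, the per-word computation behind -mix-lm is a plain linear interpolation of the two models' conditional probabilities. A minimal sketch (not SRILM code; the function name is illustrative), where lam plays the role of SRILM's -lambda weight for the first model, which defaults to 0.5:

```python
# Minimal sketch (not SRILM code) of what "ngram -lm A -mix-lm B" does
# per word: the two models' conditional probabilities are linearly
# interpolated. "lam" mirrors SRILM's -lambda option (default 0.5),
# which weights the first model (the one given with -lm).
def interpolate(p_lm1, p_lm2, lam=0.5):
    return lam * p_lm1 + (1.0 - lam) * p_lm2

print(interpolate(0.02, 0.08))  # ≈ 0.05 with the default weight
```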
>
> Still, I noticed a strange thing with perplexity calculation. Namely,
> the perplexity figures calculated by fngram and ngram are slightly
> different. I used the following options and got the following results:
>
> fngram -ppl <testtext> -factor-file tmp/fngram_m.conf
>
> Result:
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2760.87 ppl= 441.076 ppl1= 643.604
>
> ngram -factored -ppl <testtext> -lm tmp/fngram_m.conf
>
> Result:
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2761.16 ppl= 441.359 ppl1= 644.042
>
>
> --
>
> The above is for an FLM that is in fact a standard word trigram. The
> difference is very small.
>
> However, when I test an FLM that is a word-given-two-previous-classes
> trigram, the difference is much larger:
>
> fngram -ppl <testtext> -factor-file tmp/fngram_c.conf
>
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2826.73 ppl= 510.034 ppl1= 750.963
>
> And the same with ngram:
>
> ngram -factored -lm tmp/fngram_c.conf -ppl <testtext>
>
> 61 sentences, 1009 words, 26 OOVs
> 0 zeroprobs, logprob= -2863.71 ppl= 553.378 ppl1= 818.917
>
>
> As you can see, here the difference (ppl1 = 750 vs. 818) is significant.
> Could this be a configuration issue, a bug, or have I misunderstood
> something?
>
> Regards,
>
> Tanel Alumäe
>
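A note on the perplexity figures quoted above: SRILM derives ppl and ppl1 from the reported logprob. The ppl denominator counts one end-of-sentence event per sentence, ppl1 does not, and OOVs and zeroprobs are excluded from both. A minimal sketch (the helper name is illustrative, not SRILM's API), checked against the first fngram run above:

```python
def srilm_ppl(logprob, sentences, words, oovs, zeroprobs=0):
    """Reproduce SRILM's ppl/ppl1 from a "-ppl" summary line.

    ppl includes one end-of-sentence token per sentence in the
    denominator; ppl1 excludes them. OOVs and zeroprobs are
    excluded from both denominators.
    """
    denom_ppl = words - oovs - zeroprobs + sentences
    denom_ppl1 = words - oovs - zeroprobs
    ppl = 10 ** (-logprob / denom_ppl)
    ppl1 = 10 ** (-logprob / denom_ppl1)
    return ppl, ppl1

# Figures from the fngram run quoted above:
ppl, ppl1 = srilm_ppl(-2760.87, sentences=61, words=1009, oovs=26)
print(round(ppl, 1), round(ppl1, 1))  # ≈ 441.1 and 643.6, matching fngram
```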
More information about the SRILM-User mailing list