[SRILM User List] lattice rescoring with conventional LM and FLM

yuan liang yuan at ks.cs.titech.ac.jp
Tue Oct 16 17:33:03 PDT 2012


Hi Andreas,

Thank you very much!


>> 2) I used a Trigram in FLM format to rescore "Lattice_1":
>>
>>     First I converted all word nodes (HTk format) to FLM representation;
>>
>>     Then rescored with:
>>
>>   " lattice-tool  -in-lattice  Lattice_1  -unk  -vocab [voc_file]
>>  -read-htk  -no-nulls  -no-htk-nulls  -factored  -lm
>> [FLM_specification_file]  -htk-lmscale  15  -htk-logbase 2.71828183
>>  -posterior-scale  15  -write-htk  -out-lattice Lattice_3"
>>
>>    I think "Lattice_2" and "Lattice_3" should be the same, since the
>> perplexity of using Trigram and using Trigram in FLM format are same.
>> However, they are different. Did I miss something?
>>
>
> This is a question about the equivalent encoding of standard word-based
> LMs as FLMs, and I'm not an expert here.
> However, as a sanity check, I would first do a simple perplexity
> computation (ngram -debug 2 -ppl) with both models on some test set and
> make sure you get the same word-for-word conditional probabilities.  If
> not, you can spot where the differences are and present a specific case of
> different probabilities to the group for debugging.
>
>

Actually, I did run the perplexity test, on a test set of 6564 sentences
(72854 words). The total perplexity is the same with the standard
word-based trigram LM as with the FLM trigram. I also checked the
word-for-word conditional probabilities: of these 72854 words, only 442
have conditional probabilities that are not exactly identical, and for
those the difference is negligible (for example, 0.00531048 vs.
0.00531049, or 5.38809e-07 vs. 5.38808e-07). So I think we can say that
both models give the same word-for-word conditional probabilities.
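
A check of this kind can be scripted rather than done by eye. The following is only a minimal sketch: it assumes the per-word lines printed by "ngram -debug 2 -ppl" look like "p( word | context ) = [3gram] prob [ logprob ]" for both models (the FLM output may be laid out differently), and the function names, the sample words a, b, c, and the 1e-5 relative tolerance are hypothetical choices, not part of SRILM.

```python
import math
import re

# Assumed shape of an `ngram -debug 2 -ppl` probability line, e.g.
#   "\tp( b | a ...) \t= [3gram] 5.38809e-07 [ -6.26857 ]"
PROB_LINE = re.compile(r"p\(\s*(\S+)\s*\|[^)]*\)\s*=\s*\[[^\]]*\]\s*(\S+)")


def word_probs(debug_output):
    """Return (word, probability) pairs in the order they appear."""
    pairs = []
    for line in debug_output.splitlines():
        match = PROB_LINE.search(line)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


def mismatches(probs_a, probs_b, rel_tol=1e-5):
    """List positions whose probabilities differ by more than rel_tol."""
    if len(probs_a) != len(probs_b):
        raise ValueError("outputs cover a different number of words")
    return [
        (i, word, p_a, p_b)
        for i, ((word, p_a), (_, p_b)) in enumerate(zip(probs_a, probs_b))
        if not math.isclose(p_a, p_b, rel_tol=rel_tol)
    ]


# Illustrative excerpts only: the words are hypothetical, the first two
# probability pairs are the ones quoted in the message, and the third
# pair is made up to show what a real disagreement would look like.
STANDARD = (
    "\tp( a | <s> ) \t= [2gram] 0.00531048 [ -2.27487 ]\n"
    "\tp( b | a ...) \t= [3gram] 5.38809e-07 [ -6.26857 ]\n"
    "\tp( c | b ...) \t= [3gram] 0.25 [ -0.60206 ]\n"
)
FACTORED = (
    "\tp( a | <s> ) \t= [2gram] 0.00531049 [ -2.27487 ]\n"
    "\tp( b | a ...) \t= [3gram] 5.38808e-07 [ -6.26857 ]\n"
    "\tp( c | b ...) \t= [3gram] 0.2 [ -0.69897 ]\n"
)

for index, word, p_std, p_flm in mismatches(word_probs(STANDARD),
                                            word_probs(FACTORED)):
    print(index, word, p_std, p_flm)
```

With a relative tolerance, differences in the last printed digit (as in the two pairs quoted above) are treated as rounding noise, and only genuinely different probabilities are reported.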

I also wondered whether the FLM format itself is the cause: lattice
expansion with the standard trigram seems to differ from expansion with
the FLM trigram. With the FLM trigram, the expanded lattice is around 300
times larger than with the standard trigram, so the expansion procedure
may be different. I'm not sure; I still need to investigate further.
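
One way to put a number on the size difference is to read the node and link counts from the header of each output lattice. This is a rough sketch under the assumption that the HTK-format lattices carry a size line of the form "N=... L=..." (or the long form "NODES=... LINKS=..."); the helper name and the sample headers, including their counts, are made up for illustration.

```python
import re

# Size line of an HTK standard lattice file, short or long field names.
SIZE_LINE = re.compile(r"^(?:N|NODES)=(\d+)\s+(?:L|LINKS)=(\d+)", re.MULTILINE)


def lattice_size(slf_text):
    """Return (nodes, links) as declared in the lattice header."""
    match = SIZE_LINE.search(slf_text)
    if match is None:
        raise ValueError("no node/link count line found")
    return int(match.group(1)), int(match.group(2))


# Hypothetical headers; the counts are invented, not measured.
STANDARD_HEADER = "VERSION=1.0\nUTTERANCE=utt1\nN=40 L=65\n"
FLM_HEADER = "VERSION=1.0\nUTTERANCE=utt1\nNODES=12000 LINKS=19500\n"

std_nodes, std_links = lattice_size(STANDARD_HEADER)
flm_nodes, flm_links = lattice_size(FLM_HEADER)
print("node ratio:", flm_nodes / std_nodes)
print("link ratio:", flm_links / std_links)
```

In practice the header text would be read from the files written with -out-lattice; comparing the ratios per utterance could show whether the blow-up is uniform or concentrated in a few lattices.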


Thank you very much for your advice!

Regards,
Yuan