[SRILM User List] lattice rescoring with conventional LM and FLM
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Oct 16 21:52:44 PDT 2012
On 10/16/2012 5:33 PM, yuan liang wrote:
> Hi Andreas,
>
> Thank you very much!
>
>
> 2) I used a Trigram in FLM format to rescore "Lattice_1":
>
> First I converted all word nodes (HTK format) to FLM
> representation;
>
> Then rescored with:
>
> " lattice-tool -in-lattice Lattice_1 -unk -vocab
> [voc_file] -read-htk -no-nulls -no-htk-nulls -factored
> -lm [FLM_specification_file] -htk-lmscale 15 -htk-logbase
> 2.71828183 -posterior-scale 15 -write-htk -out-lattice
> Lattice_3"
>
> I think "Lattice_2" and "Lattice_3" should be the same,
> since the perplexities of the trigram and of the trigram in FLM
> format are the same. However, they are different. Did I miss
> something?
>
>
> This is a question about the equivalent encoding of standard
> word-based LMs as FLMs, and I'm not an expert here.
> However, as a sanity check, I would first do a simple perplexity
> computation (ngram -debug 2 -ppl) with both models on some test
> set and make sure you get the same word-for-word conditional
> probabilities. If not, you can spot where the differences are and
> present a specific case of different probabilities to the group
> for debugging.
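> For instance (just a sketch; the model and test file names below are
> placeholders for your own), something like
>
> ngram -order 3 -lm word3gram.lm -vocab [voc_file] -unk -ppl test.txt -debug 2 > ppl.word
> ngram -order 3 -factored -lm [FLM_specification_file] -vocab [voc_file] -unk -ppl test.txt -debug 2 > ppl.flm
>
> and then diff the two outputs to locate any word whose conditional
> probability differs between the two models.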
>
>
> Actually I did the perplexity test on a test set of 6564 sentences
> (72854 words). The total perplexity is the same with the standard
> word-based trigram LM as with the FLM trigram. I also checked the
> word-for-word conditional probabilities: of these 72854 words, only
> 442 have conditional probabilities that are not exactly the same,
> and even there the differences are negligible (e.g. 0.00531048 vs.
> 0.00531049, or 5.38809e-07 vs. 5.38808e-07). So I think we can say
> both models give the same word-for-word conditional probabilities.
>
> I also considered that it may be due to the FLM format: lattice
> expansion with the standard trigram seems to behave differently than
> with the FLM trigram. With the FLM trigram the expanded lattice is
> around 300 times larger than with the standard trigram, so maybe the
> expansion method is different. I'm not sure; I still need to
> investigate more.
The lattice expansion algorithm makes use of the backoff structure of
the standard LM to minimize the number of nodes that need to be
duplicated to correctly apply the probabilities. The FLM expansion is
more conservative: it always assumes you need two words of context,
leading to more nodes after expansion. That would explain the size
difference.
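A quick way to quantify this (a sketch, assuming the expanded lattices
are in HTK SLF format, whose header records the node and link counts)
is to compare the headers of the two lattices:

grep -m 1 "N=" Lattice_2
grep -m 1 "N=" Lattice_3

The N= and L= fields give the number of nodes and links after
expansion.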
You can also check the probabilities in expanded lattices. The command
lattice-tool -in-lattice LATTICE -ppl TEXT -debug 2 ...
will compute the probabilities assigned to the words in TEXT by
traversing the lattice. It is worth checking first that expansion with
FLMs yields the right probabilities.
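For example (a sketch only; the file names are placeholders), you could
run the same check on both expanded lattices with the same text, where
test.txt should contain word strings the lattices actually cover:

lattice-tool -read-htk -htk-logbase 2.71828183 -in-lattice Lattice_2 -ppl test.txt -debug 2 > ppl.lat2
lattice-tool -read-htk -htk-logbase 2.71828183 -in-lattice Lattice_3 -ppl test.txt -debug 2 > ppl.lat3

If the two expansions encode the same LM scores, the per-word
probabilities in the two outputs should agree up to rounding.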
You say that Viterbi decoding gives almost the same results (this
suggests the expansion works correctly), but posterior (confusion
network) decoding doesn't. It is possible there is a problem with
building CNs from lattices with factored vocabularies. I don't think I
ever tried that. It would help to find a minimal test case that shows
the problem.
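As a starting point (again just a sketch, with placeholder file names),
you could decode one small expanded lattice both ways and compare,
using something like

lattice-tool -read-htk -htk-lmscale 15 -htk-logbase 2.71828183 -posterior-scale 15 -in-lattice small_expanded.lat -viterbi-decode
lattice-tool -read-htk -htk-lmscale 15 -htk-logbase 2.71828183 -posterior-scale 15 -in-lattice small_expanded.lat -posterior-decode

If the 1-best hypotheses agree but the posterior (CN) decoding differs
only for the FLM-expanded lattice, that points to the CN construction.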
Andreas
>
>
> Thank you very much for your advice!
>
> Regards,
> Yuan