[SRILM User List] Factored Language Model - Backoff Weight Differs
Gebhardt, Jan
jan.gebhardt at student.kit.edu
Thu Aug 5 10:03:40 PDT 2010
Hello,
I am working with Factored Language Models and want to start with a Factored Language Model that is equivalent to a standard 4-gram language model. Therefore I use the following factor language model specification file:
1
W : 3 W(-1) W(-2) W(-3) trainW.count trainW.flm.lm 4
W1,W2,W3 W3 ukndiscount gtmin 0
W1,W2 W2 ukndiscount gtmin 0
W1 W1 ukndiscount gtmin 0
0 0 ukndiscount gtmin 0
When I build the factored language model and write it out using fngram-count -lm, I noticed that the backoff weights in the resulting language model differ significantly from the backoff weights in the standard n-gram. Both language models use ukndiscount and a cutoff of 0.
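For reference, the commands I use look roughly like this (the file names such as train.txt, train.factored.txt and trainW.flm are placeholders for my data and spec file, and the options are only meant to mirror the discounting and cutoff settings described above):

ngram-count -text train.txt -order 4 -ukndiscount1 -gt1min 0 -ukndiscount2 -gt2min 0 -ukndiscount3 -gt3min 0 -ukndiscount4 -gt4min 0 -lm trainW.lm

fngram-count -factor-file trainW.flm -text train.factored.txt -lm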
For example, while my normal 4-gram contains the following entries:
-2.401827 A BEAUTIFUL 0.01767567
-2.401827 A BETTER 0.01767567
the factored language model has:
-2.401827 W-A W-BEAUTIFUL -0.1628703
-2.401827 W-A W-BETTER
So both language models assign the same probabilities, but the backoff weights are different or even missing.
If I evaluate the language model written by fngram-count using ngram, I get a lot of warnings like:
trainWX.flm.lm: line 2678: warning: no bow for prefix of ngram "A BEAUTIFUL" .
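The evaluation command is roughly the following (test.txt stands for my test data):

ngram -order 4 -lm trainWX.flm.lm -ppl test.txt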
If I use the factored language model for decoding, I get a higher WER than with the standard 4-gram.
I would like to know how to get backoff weights for FLMs that match those of a standard n-gram. An explanation of why the backoff weights are missing or different in the FLM would also help.
Thank you for your help.
Jan