[SRILM User List] SRILM trigram worse than HTK bigram?
Dmytro Prylipko
dmytro.prylipko at ovgu.de
Thu Nov 22 05:06:38 PST 2012
Hi,
I found that the recognition accuracy obtained with HVite is about 5%
better than that of the hypotheses obtained by rescoring the lattices
with lattice-tool.
HVite does not really use an N-gram, it uses a word net, but I cannot
figure out why it works so much better than the SRILM models.
I use the following script to generate lattices (60-best):
HVite -A -T 1 \
-C GENLATTICES.conf \
-n 20 60 \
-l outLatDir \
-z lat \
-H hmmDefs \
-S test.list \
-i out.bigram.HLStats.mlf \
-w bigram.HLStats.lat \
-p 0.0 \
-s 8.0 \
lexicon \
hmm.mono.list
The lattices are then rescored with:
lattice-tool \
-read-htk \
-write-htk \
-htk-lmscale 10.0 \
-htk-words-on-nodes \
-order 3 \
-in-lattice-list srclat.list \
-out-lattice-dir rescoredLatDir \
-lm trigram.SRILM.lm \
-overwrite
find rescoredLatDir -name "*.lat" > rescoredLat.list
lattice-tool \
-read-htk \
-write-htk \
-htk-lmscale 10.0 \
-htk-words-on-nodes \
-order 3 \
-in-lattice-list rescoredLat.list \
-viterbi-decode \
-output-ctm | ctm2mlf_r > out.trigram.SRILM.mlf
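The first lattice-tool call reads srclat.list, whose creation is not shown above. A minimal sketch, assuming it is built from the HVite output directory (-l outLatDir) the same way rescoredLat.list is built from rescoredLatDir. Both helper functions are hypothetical, and the count check is an added sanity test rather than part of the original script:

```shell
# Hypothetical helper: write every .lat file found under directory $1
# to list file $2, sorted so the order is stable between runs.
make_lat_list() {
    find "$1" -name "*.lat" | sort > "$2"
}

# Hypothetical helper: succeed only if both list files are non-empty
# and contain the same number of lattice paths.
same_lattice_count() {
    n_src=$(( $(wc -l < "$1") ))
    n_out=$(( $(wc -l < "$2") ))
    [ "$n_src" -gt 0 ] && [ "$n_src" -eq "$n_out" ]
}

# Intended use with the directory names from the commands above:
#   make_lat_list outLatDir srclat.list
#   ... first lattice-tool call (rescoring) ...
#   make_lat_list rescoredLatDir rescoredLat.list
#   same_lattice_count srclat.list rescoredLat.list \
#       || echo "lattice count mismatch" >&2
```

If the counts differ, some utterances were never rescored and would be missing from the decoded MLF, which can distort an accuracy comparison.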
Decoded with HVite (92.86%):
LAB: <A> wie sieht es aus mit einem weiteren zweitaegigen mit einer
weiteren zweitaegigen arbeitssitzu
REC: <A> wie sieht es aus mit einem weiteren zweitaegigen in einer
weiteren zweitaegigen arbeitssitzu
... and with lattice-tool (64.29%):
LAB: <A> wie sieht es aus mit einem weiteren zweitaegigen mit einer
weiteren zweitaegigen arbeitssitzu
REC: <A> wie sieht es aus mit einen weiteren zweitaegigen dann bei
einem zweitaegigen arbeitssitzung
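Both percentages are consistent with HTK-style word accuracy, (N - D - S - I) / N, computed on this single utterance. A minimal sketch, assuming unit-cost edit distance over whitespace-separated tokens and using the strings exactly as printed above, including the truncated last word of LAB. The helper word_accuracy is hypothetical; HResults applies its own alignment penalties, so its counts can differ on other utterances:

```shell
# Hypothetical helper: word accuracy in percent for reference $1 and
# hypothesis $2, using unit-cost Levenshtein distance over words.
word_accuracy() {
    awk -v ref="$1" -v hyp="$2" 'BEGIN {
        n = split(ref, r, " ")
        m = split(hyp, h, " ")
        if (n == 0) exit 1
        for (i = 0; i <= n; i++) d[i, 0] = i    # i deletions
        for (j = 0; j <= m; j++) d[0, j] = j    # j insertions
        for (i = 1; i <= n; i++)
            for (j = 1; j <= m; j++) {
                c = d[i-1, j-1] + (r[i] != h[j])          # match or substitution
                if (d[i-1, j] + 1 < c) c = d[i-1, j] + 1  # deletion
                if (d[i, j-1] + 1 < c) c = d[i, j-1] + 1  # insertion
                d[i, j] = c
            }
        # (N - D - S - I) / N, with D + S + I equal to the edit distance
        printf "%.2f\n", 100 * (n - d[n, m]) / n
    }'
}

lab="<A> wie sieht es aus mit einem weiteren zweitaegigen mit einer weiteren zweitaegigen arbeitssitzu"
rec_hvite="<A> wie sieht es aus mit einem weiteren zweitaegigen in einer weiteren zweitaegigen arbeitssitzu"
rec_srilm="<A> wie sieht es aus mit einen weiteren zweitaegigen dann bei einem zweitaegigen arbeitssitzung"

word_accuracy "$lab" "$rec_hvite"   # prints 92.86 (1 error in 14 words)
word_accuracy "$lab" "$rec_srilm"   # prints 64.29 (5 errors in 14 words)
```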
The corresponding word nets and LMs were built using the same
vocabulary and training data. I should say that for some sentences SRILM
outperforms HTK, but in general it is roughly 5-7% behind.
Could you please suggest why this is so? Maybe some parameter values are
wrong?
Or is this the expected behavior?
I would greatly appreciate any help.
Yours,
Dmytro Prylipko.