ngram-count -read performance difference for different tokens
Ergun Bicici
ebicici at ku.edu.tr
Sat Dec 20 16:00:18 PST 2008
Dear SRILM List Members,
I was experimenting with the "-use-server" option of ngram and it appears to
work for "-ppl" calculations from text but I was receiving different numbers
when working with count files. With some debugging, I realized that this was
due to the server receiving <unk> tokens from the client.
I made the following modification:
line 352, LM.cc, version 1.5.7:
//vocab.getIndices(words, wids, order + 1, vocab.unkIndex());
vocab.addWords(words, wids, order + 1);
With this change, I get the same results with or without using a server.
I have not checked whether this will affect the "-cache-served-ngrams" policy
or whether it may have other impacts on the results.
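To illustrate why the two calls can behave differently, here is a minimal,
self-contained sketch. It does not use SRILM: ToyVocab and wordsSent are
hypothetical names made up for this example, and the assumption (not verified
against the SRILM sources) is that the client vocabulary does not yet contain
every word found in the count file. A lookup-only call then falls back to the
<unk> index, so the original word string is lost before anything is sent to
the server, whereas a lookup-or-insert call keeps it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy vocabulary for illustration only; this is NOT SRILM's Vocab class.
class ToyVocab {
public:
    ToyVocab() { unk_ = addWord("<unk>"); }

    int unkIndex() const { return unk_; }

    // Lookup only: an unknown word maps to `fallback`; the vocabulary is unchanged.
    int getIndex(const std::string& w, int fallback) const {
        auto it = ids_.find(w);
        return it == ids_.end() ? fallback : it->second;
    }

    // Lookup or insert: an unknown word is added and gets a fresh index.
    int addWord(const std::string& w) {
        auto it = ids_.find(w);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(words_.size());
        ids_.emplace(w, id);
        words_.push_back(w);
        return id;
    }

    // Map an index back to its word string.
    const std::string& getWord(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < words_.size());
        return words_[static_cast<std::size_t>(id)];
    }

private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> words_;
    int unk_ = 0;
};

// Hypothetical helper: convert an n-gram to indices on the "client" and back
// to the word strings that would be sent to a "server".
std::vector<std::string> wordsSent(ToyVocab& vocab,
                                   const std::vector<std::string>& ngram,
                                   bool addUnknown) {
    std::vector<std::string> out;
    for (const auto& w : ngram) {
        int id = addUnknown ? vocab.addWord(w)
                            : vocab.getIndex(w, vocab.unkIndex());
        out.push_back(vocab.getWord(id));
    }
    return out;
}
```

In this toy setup, calling wordsSent with addUnknown set to false turns any
word missing from the vocabulary into "<unk>", while addUnknown set to true
preserves it at the cost of growing the vocabulary. That growth is one
possible side effect of the change above that would be worth checking.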
Regards,
Ergun
Ergun Bicici
Koc University