[SRILM User List] Count-lm reference request
E
otheremailid at aol.com
Wed Oct 2 01:16:03 PDT 2013
Thanks for the pointers! Three questions:
1. The same number of bins is used for all n-gram orders, even though the number of n-grams of each order may differ. In web1T:
Number of unigrams: 13,588,391
Number of fivegrams: 1,176,470,663
Would it help to use more bins for fivegrams than for unigrams?
2. For a particular n-gram in the test data, the algorithm decides which bin weights wij to use based on how many times that n-gram occurred in the training data. Is this right?
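To make question 2 concrete, here is a minimal Python sketch of count-binned interpolation. It is not SRILM's NgramCountLM code. The bin mapping `min(count // countmodulus, max_bin)`, binning on the context (history) count rather than on the n-gram's own count, the uniform order-0 base case, and the names `bin_index` and `interpolated_prob` are all assumptions made for illustration. For each order j, the sketch mixes that order's maximum-likelihood estimate with the lower-order estimate, using the weight in row b (bin) and column j (order), which mirrors the rows and columns of the file at the bottom.

```python
from collections import Counter


def bin_index(count: int, countmodulus: int, max_bin: int) -> int:
    # Hypothetical mapping: bin = floor(count / countmodulus), capped at max_bin.
    return min(count // countmodulus, max_bin)


def interpolated_prob(ngram, counts, weights, countmodulus, vocabsize):
    """Count-binned interpolated probability of ngram[-1] given ngram[:-1].

    counts:  Counter of training counts keyed by word tuples; the empty
             tuple () must hold the total token count.
    weights: weights[b][j - 1] is the weight for order j in bin b
             (rows = bins, columns = orders, as in the file below).
    """
    max_bin = len(weights) - 1
    prob = 1.0 / vocabsize  # order-0 base case: uniform over the vocabulary
    for j in range(1, len(ngram) + 1):
        gram = ngram[-j:]    # the order-j n-gram ending in the predicted word
        context = gram[:-1]  # its (j-1)-word history
        # Assumption: the history's own n-gram count stands in for its context count.
        context_count = counts[context]
        lam = weights[bin_index(context_count, countmodulus, max_bin)][j - 1]
        ml = counts[gram] / context_count if context_count else 0.0
        prob = lam * ml + (1.0 - lam) * prob
    return prob


# Toy training data: "a b a b a c" (unigram and bigram counts only).
tokens = "a b a b a c".split()
counts = Counter()
for n in (1, 2):
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n])] += 1
counts[()] = len(tokens)

weights = [[0.5, 0.5], [0.5, 0.5]]
print(interpolated_prob(("a", "b"), counts, weights, countmodulus=2, vocabsize=3))
```

In this sketch, a weight of 0 in cell (b, j) makes the order-j term drop out for contexts in bin b, so the result falls back entirely on the lower orders.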
3. What does it mean when some weights are zero after tuning? I used just 10 sentences (5 repeated) in tune.txt and got the google.countlm shown at the bottom.
For example, w01 and w02 are non-zero but w03 is zero. Does this mean that the development set contained no trigrams whose counts fell into bin 0?
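One way to probe question 3 is to count how many trigram events in the tuning data fall into each bin. If no event reaches a bin, the tuning data carries no information about that bin's weights. Below is a minimal Python sketch. It assumes, hypothetically, that an event's bin is `min(count(context) // countmodulus, max_bin)` with the context being the preceding two words; `bin_occupancy`, the toy counts, and the unpadded sentences are illustrative only, not SRILM behavior.

```python
from collections import Counter


def bin_occupancy(sentences, context_counts, order, countmodulus, max_bin):
    """Histogram of bins hit by the order-`order` events in `sentences`.

    Hypothetical assumption: an event's bin is
    min(count(context) // countmodulus, max_bin), where the context is
    the preceding order-1 words. Sentences are not padded with <s>/</s>.
    """
    hist = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(order - 1, len(words)):
            context = tuple(words[i - order + 1:i])
            count = context_counts.get(context, 0)  # unseen contexts count as 0
            hist[min(count // countmodulus, max_bin)] += 1
    return hist


# Toy bigram-context counts from "training" and three tuning sentences.
context_counts = {("a", "b"): 85, ("b", "c"): 3}
tune = ["a b c", "b c d", "x y z"]
hist = bin_occupancy(tune, context_counts, order=3, countmodulus=40, max_bin=15)
print(sorted(hist.items()))
```

Bins absent from the histogram (here 1 and 3 through 15) receive no trigram events from this toy tuning set.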
order 5
mixweights 15
0.5 0.5 0 0 0
0.5 0.5 0 0 0
0.5 0.5 0 0 0
0.5 0.5 0.5 0.5 0.198641
0.5 0.5 0 0 0
0.5 0.5 0.5 0 0.5
0.5 0.5 0.5 0.5 0
0.5 0.5 0.5 0 0.5
0.5 0.5 0.5 0.5 0
0.5 0.5 0 0 0.5
0.5 0.5 0.054722 0 0.5
0.5 0.5 0.5 0.5 0.5
0.5 0.5 0.5 0.5 0
0.5 0.5 0.5 0.5 0.5
0.5 0.5 0.5 0 0.5
1 1.97997e-05 0.0844577 0.030065 3.44131e-06
countmodulus 40
vocabsize 13588391
totalcount 4294967295
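To inspect a tuned file like the one above programmatically, here is a minimal Python sketch that reads the keyword/value layout shown and lists the (bin, order) cells whose weight is exactly zero. It assumes, as in this file, that `mixweights M` is followed by M+1 rows with one column per order. `parse_countlm` and `zero_cells` are hypothetical helpers, and keywords other than `mixweights` are kept as single values. The sample is a two-row excerpt of the file above.

```python
def parse_countlm(text):
    """Parse the count-LM keyword layout shown above into a dict.

    Assumption: 'mixweights M' is followed by M + 1 rows of weights
    (bins 0..M), each with one column per order. Every other keyword
    is stored with its first value (int when it is all digits).
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    model, i = {}, 0
    while i < len(lines):
        key, rest = lines[i][0], lines[i][1:]
        if key == "mixweights":
            num_rows = int(rest[0]) + 1
            rows = lines[i + 1:i + 1 + num_rows]
            model["mixweights"] = [[float(x) for x in row] for row in rows]
            i += 1 + num_rows
        else:
            value = rest[0] if rest else ""
            model[key] = int(value) if value.isdigit() else value
            i += 1
    return model


def zero_cells(model):
    """Return (bin, order) pairs, order 1-based, whose weight is exactly 0."""
    return [(b, j + 1)
            for b, row in enumerate(model["mixweights"])
            for j, w in enumerate(row) if w == 0.0]


sample = """order 5
mixweights 1
0.5 0.5 0 0 0
1 1.97997e-05 0.0844577 0.030065 3.44131e-06
countmodulus 40
"""
model = parse_countlm(sample)
print(zero_cells(model))
```

With the full file pasted in as `sample`, cell (0, 3) corresponds to the zero w03 mentioned in question 3.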