[SRILM User List] Count-lm reference request

E otheremailid at aol.com
Wed Oct 2 01:16:03 PDT 2013


Thanks for the pointers! Three questions:


1. The same number of bins is used for every n-gram order, even though the number of n-grams differs from order to order. In Web 1T, for example:

Number of unigrams:         13,588,391
Number of fivegrams:     1,176,470,663


Would it help to use more bins for five-grams than for unigrams?

  
2. For a particular n-gram in the test data, the algorithm decides which bin's weights Wij to use based on how many times that n-gram occurred in the training data. Is this right?
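
To make question 2 concrete, here is a minimal Python sketch of the bin lookup as I understand it from the count-LM description in ngram-format(5). It assumes the bin index is the integer quotient of a count by countmodulus, capped at the last bin (the mixweights value). `bin_index` is a hypothetical name, not an SRILM function, and which count is passed in (the n-gram's own or that of its context) is an assumption I would like confirmed.

```python
def bin_index(count, countmodulus, last_bin):
    """Map a training-data count to a mixture-weight bin.

    Assumption: bins are `countmodulus` wide, and every count at or
    beyond last_bin * countmodulus shares the final bin.
    """
    return min(count // countmodulus, last_bin)


# Values taken from the google.countlm file in this message.
COUNTMODULUS = 40
LAST_BIN = 15  # the "mixweights 15" line, i.e. bins 0..15

for c in (0, 39, 40, 599, 600, 1000000):
    print(c, "-> bin", bin_index(c, COUNTMODULUS, LAST_BIN))
```

Under these assumptions, counts 0 to 39 fall in bin 0 and anything from 600 upward falls in bin 15.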


3. What does it mean when some weights are zero after tuning? I used just 10 sentences (5 repeated) in tune.txt and got the google.countlm shown at the bottom.


For example, w01 and w02 are non-zero but w03 is zero. Does this mean that the development set contained no trigrams corresponding to counts in bin 0?



order 5
mixweights 15
 0.5 0.5 0 0 0 
 0.5 0.5 0 0 0 
 0.5 0.5 0 0 0 
 0.5 0.5 0.5 0.5 0.198641
 0.5 0.5 0 0 0 
 0.5 0.5 0.5 0 0.5 
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0 0.5 
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0 0 0.5 
 0.5 0.5 0.054722 0 0.5 
 0.5 0.5 0.5 0.5 0.5 
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0.5 0.5 
 0.5 0.5 0.5 0 0.5 
 1 1.97997e-05 0.0844577 0.030065 3.44131e-06
countmodulus 40
vocabsize 13588391
totalcount 4294967295
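
To see at a glance which weights stayed at zero, a small Python sketch like the following lists them by bin and order. It assumes the layout shown above (an `order` line, then `mixweights M` followed by M+1 rows of weights). `parse_mixweights` and `zero_weights` are hypothetical helper names, not part of SRILM.

```python
COUNTLM_TEXT = """\
order 5
mixweights 15
 0.5 0.5 0 0 0
 0.5 0.5 0 0 0
 0.5 0.5 0 0 0
 0.5 0.5 0.5 0.5 0.198641
 0.5 0.5 0 0 0
 0.5 0.5 0.5 0 0.5
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0 0.5
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0 0 0.5
 0.5 0.5 0.054722 0 0.5
 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0 0.5
 1 1.97997e-05 0.0844577 0.030065 3.44131e-06
countmodulus 40
vocabsize 13588391
totalcount 4294967295
"""


def parse_mixweights(text):
    """Return (order, rows), where rows[i][j - 1] is the weight w_ij."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    order, rows = None, []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        if fields[0] == "order":
            order = int(fields[1])
        elif fields[0] == "mixweights":
            # Assumed layout: "mixweights M" is followed by M + 1 weight rows.
            num_rows = int(fields[1]) + 1
            for row_line in lines[i + 1 : i + 1 + num_rows]:
                rows.append([float(x) for x in row_line.split()])
            i += num_rows
        i += 1
    return order, rows


def zero_weights(rows):
    """List (bin, order) pairs whose weight is exactly zero."""
    return [(i, j + 1)
            for i, row in enumerate(rows)
            for j, w in enumerate(row)
            if w == 0.0]


order, rows = parse_mixweights(COUNTLM_TEXT)
zeros = zero_weights(rows)
for b, n in zeros:
    print(f"bin {b}, order {n}: weight is zero")
```

This only reports where the zeros are; it does not say why tuning left them there.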
