[SRILM User List] Count-lm reference request
    E 
    otheremailid at aol.com
       
    Wed Oct  2 01:16:03 PDT 2013
    
    
  
Thanks for the pointers! I have three questions:
1. The same number of bins is used for every n-gram order, even though the number of n-grams differs from one order to the next. In web1T:
Number of unigrams:         13,588,391
Number of fivegrams:     1,176,470,663
Would it help to use more bins for five-grams than for unigrams?
  
2. For a particular n-gram in the test data, the algorithm decides which bin's weights Wij to use based on how many times that n-gram occurred in the training data. Is this right?
3. What does it mean when some weights are zero after tuning? I used just 10 sentences (5 repeated) in tune.txt and got the google.countlm shown at the bottom.
For example, w01 and w02 are non-zero but w03 is zero. Does this mean that the development set contained no trigrams that corresponded to counts in bin 0?
order 5
mixweights 15
 0.5 0.5 0 0 0 
 0.5 0.5 0 0 0 
 0.5 0.5 0 0 0 
 0.5 0.5 0.5 0.5 0.198641
 0.5 0.5 0 0 0 
 0.5 0.5 0.5 0 0.5 
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0 0.5 
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0 0 0.5 
 0.5 0.5 0.054722 0 0.5 
 0.5 0.5 0.5 0.5 0.5 
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0.5 0.5 
 0.5 0.5 0.5 0 0.5 
 1 1.97997e-05 0.0844577 0.030065 3.44131e-06
countmodulus 40
vocabsize 13588391
totalcount 4294967295
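
To make questions 2 and 3 concrete, here is a small Python sketch of how I am reading the file above. It is not SRILM code, and the helper names (parse_countlm, bin_index, zero_weights) are made up for illustration. It rests on two assumptions: that "mixweights 15" means bins 0 through 15 (which matches the 16 rows shown), and that a bin is picked as the count divided by countmodulus, capped at the last bin. Which count is actually looked up is the open part of question 2, so the sketch just takes a count as input.

```python
COUNTLM_TEXT = """\
order 5
mixweights 15
 0.5 0.5 0 0 0
 0.5 0.5 0 0 0
 0.5 0.5 0 0 0
 0.5 0.5 0.5 0.5 0.198641
 0.5 0.5 0 0 0
 0.5 0.5 0.5 0 0.5
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0 0.5
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0 0 0.5
 0.5 0.5 0.054722 0 0.5
 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0
 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0 0.5
 1 1.97997e-05 0.0844577 0.030065 3.44131e-06
countmodulus 40
vocabsize 13588391
totalcount 4294967295
"""


def parse_countlm(text):
    """Return (params, weights) parsed from count-LM text like the above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    params, weights = {}, []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        key = fields[0]
        if key == "mixweights":
            max_bin = int(fields[1])
            params["mixweights"] = max_bin
            # Assumption: bins run 0..max_bin, so max_bin + 1 rows follow.
            for row in lines[i + 1 : i + 2 + max_bin]:
                weights.append([float(x) for x in row.split()])
            i += 2 + max_bin
            continue
        if key in ("order", "countmodulus", "vocabsize", "totalcount"):
            params[key] = int(fields[1])
        i += 1
    return params, weights


def bin_index(count, modulus, max_bin):
    """Assumed binning: integer-divide the count, cap at the last bin."""
    return min(count // modulus, max_bin)


def zero_weights(weights):
    """List (bin, order) pairs whose tuned weight is exactly zero."""
    return [
        (b, j + 1)
        for b, row in enumerate(weights)
        for j, w in enumerate(row)
        if w == 0.0
    ]


if __name__ == "__main__":
    params, weights = parse_countlm(COUNTLM_TEXT)
    # Under the assumed binning, a count of 85 with countmodulus 40 is bin 2.
    print(bin_index(85, params["countmodulus"], params["mixweights"]))
    for b, order in zero_weights(weights):
        print(f"w{b}{order} is zero")
```

Under this reading, w03 is the entry (bin 0, order 3) in the list of zero weights, which is the case I am asking about in question 3.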