incremental ngram counts

Alexy Khrabrov deliverable at
Tue Oct 23 09:47:00 PDT 2007

Greetings, I want to count ngrams at certain fractions of my corpus
by size, e.g. at 10%, 20%, etc.  Is there an alternative to
concocting separate lists of ad hoc subcorpora and running
ngram-count separately on each?  What if I want to track exactly how
many new ngrams each file contributes when the files are processed in
a certain order?
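
To make the second question concrete, here is a minimal sketch of the bookkeeping I have in mind, written in plain Python rather than with ngram-count. It is only an illustration under stated assumptions: the function names (`ngrams`, `new_ngrams_per_document`) are hypothetical, tokens are split on whitespace, ngrams do not cross line boundaries, and no sentence-boundary tags are added, so the numbers will not necessarily match what ngram-count reports.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Iterator[Ngram]:
    """Yield every contiguous n-token window of one line."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def new_ngrams_per_document(
    docs: Iterable[Tuple[str, Iterable[str]]], n: int
) -> List[Tuple[str, int, int]]:
    """Process documents in the given order, tracking distinct ngrams.

    `docs` yields (name, lines) pairs; `lines` may be an open file.
    Returns one (name, newly_seen, cumulative_distinct) row per document.
    Hypothetical helper, not part of any toolkit.
    """
    seen: Set[Ngram] = set()
    report: List[Tuple[str, int, int]] = []
    for name, lines in docs:
        before = len(seen)
        for line in lines:
            # Assumption: whitespace tokenization, one sentence per line.
            seen.update(ngrams(line.split(), n))
        report.append((name, len(seen) - before, len(seen)))
    return report
```

Each row gives the number of distinct ngrams a file adds beyond everything seen before it, plus the running total. Counts at 10%, 20%, etc. of the corpus could then be read off the running total at the file where the cumulative size crosses each threshold. The whole set of distinct ngrams is held in memory, which may not be practical for a large corpus.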


