Problems about srilm

Andreas Stolcke stolcke at speech.sri.com
Mon Apr 23 22:44:45 PDT 2007


大弘 wrote:
> Thank you very much for your answer.
> I have another question: if I only want to train a *google 3-gram
> language model*, what instructions should I use?
> I have referred to the pages and tried the instructions, but it still
> did not work.
> Is the reason the same, that there is not enough memory?
>   
Even just the google 3-grams will be way too big to fit entirely into memory.

> Could you give me an *example* of building a google 3-gram LM file,
> please?
>   
Again, this will require using the -count-lm option with some tricks
that are not documented yet. Please be patient (or read all the manual
pages carefully and figure it out yourself).
>
> I think there may be two methods to solve the problem:
> 1. Build the google 3-gram LM file by reading the google corpus in
> batches, and then assemble the complete google 3-gram LM file.
> But I need to know: is there an instruction to build the google
> 3-gram LM file by *reading the google corpus in batches*?
>   
This won't work because the smoothing methods for backoff LMs require
access to the entire ngram set to compute their discounting estimates.
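To see why global statistics matter, consider Good-Turing discounting (one of the smoothing methods SRILM supports): the discounted count for an ngram seen r times is r* = (r+1) * n_{r+1} / n_r, where n_r is the number of distinct ngrams occurring exactly r times over the *whole* data. A batch sees different n_r values, so it computes different discounts. A toy Python sketch (illustrative counts only, not the SRILM implementation):

```python
from collections import Counter

def good_turing_discounted(counts):
    """Good-Turing discounted counts: r* = (r+1) * n_{r+1} / n_r,
    where n_r is the number of distinct ngrams with count r."""
    n = Counter(counts.values())  # counts-of-counts over ALL ngrams seen
    return {g: (r + 1) * n[r + 1] / n[r] for g, r in counts.items()}

# Toy trigram counts: the full data vs. one batch of it.
full = {"a b c": 1, "b c d": 1, "c d e": 2, "d e f": 2, "e f g": 3}
batch = {"a b c": 1, "b c d": 1, "c d e": 2}

# The same ngram "a b c" (count 1 in both) gets a different discount,
# because n_1 and n_2 differ between the batch and the full data:
print(good_turing_discounted(full)["a b c"])   # n_1=2, n_2=2 -> 2.0
print(good_turing_discounted(batch)["a b c"])  # n_1=2, n_2=1 -> 1.0
```

So discounts estimated per batch would disagree with the discounts the full ngram set implies, which is why the estimation step cannot be done piecewise.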
> 2. Train small language models individually from the google files and
> then combine the pieces into a complete google 3-gram LM file.
> But I need to know: is there an instruction to *combine pieces of
> google 3-gram LM files*?
>   
Sorry, that won't work either, for the same reason as above.
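(The backoff weights have the same problem as the discounts. In a standard Katz-style backoff LM, the weight for a history h redistributes the probability mass left over after discounting:

    alpha(h) = (1 - sum over seen w of P(w | h)) / (1 - sum over seen w of P(w | h'))

where h' is the shortened history and the sums run over all words w observed after h anywhere in the data. A weight computed from only one piece of the ngram set normalizes the wrong mass, so piecewise-built LM files cannot simply be concatenated into a valid model.)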

Andreas






More information about the SRILM-User mailing list