[SRILM User List] A question about lattice-tool

Andreas Stolcke stolcke at icsi.berkeley.edu
Sat Apr 7 16:22:27 PDT 2012


On 4/7/2012 3:40 PM, Nizar Habash wrote:
> Hi Andreas, Jing,
>
> We use your lattice-tool for weight assignment and n-best decoding of a lattice including simple paraphrases.
> We're trying to process a huge number of sentences (lattices) with this tool. Our understanding is that if we
> want to handle many sentences with one command (loading the LM once), we should put each sentence in
> a file, and pass a list of these files with the -in-lattice-list option and expect the output as files in the directory
> passed with the -out-lattice-dir option. If we process millions of sentences (i.e. lattices) this way, where they
> end up being millions of files, we're afraid the file system may crash. Is there a way to pass all the source
> lattices to lattice-tool in one file and get the output in one file?
>
> Thanks
> Nizar and Wael
Sorry, there is no such provision in lattice-tool.  But usually the 
problem with flaky filesystems is having too many files within the same 
directory.
So I would recommend you split your file list into batches of a few 
thousand each, and direct the output for each batch into a separate 
directory.
Even on robust filesystems this is a good idea because very large 
directories have slow access times in many filesystem implementations.

Andreas
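
The batching suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of SRILM: the function names, the batch naming scheme, and the default batch size of 2000 are all hypothetical, and the only lattice-tool options used are the two named in the question (-in-lattice-list and -out-lattice-dir). Any LM or decoding options would have to be supplied by the caller through extra_args.

```python
from pathlib import Path


def make_batches(lattice_list, work_dir, batch_size=2000):
    """Split a lattice file list into batches.

    For each batch, write one list file and create one output
    directory under work_dir. Returns (list_file, out_dir) pairs.
    The naming scheme (batchNNNNN.list / batchNNNNN.out) is arbitrary.
    """
    work_dir = Path(work_dir)
    with open(lattice_list) as f:
        # One lattice path per line; skip blank lines.
        paths = [line.strip() for line in f if line.strip()]

    batches = []
    for start in range(0, len(paths), batch_size):
        n = start // batch_size
        list_file = work_dir / f"batch{n:05d}.list"
        out_dir = work_dir / f"batch{n:05d}.out"
        # Creating the output directory also creates work_dir if needed.
        out_dir.mkdir(parents=True, exist_ok=True)
        list_file.write_text("\n".join(paths[start:start + batch_size]) + "\n")
        batches.append((list_file, out_dir))
    return batches


def lattice_tool_command(list_file, out_dir, extra_args=()):
    """Build the argument list for one lattice-tool run over one batch.

    extra_args is a placeholder for whatever options the job needs;
    none are assumed here.
    """
    return [
        "lattice-tool",
        "-in-lattice-list", str(list_file),
        "-out-lattice-dir", str(out_dir),
        *extra_args,
    ]
```

Each command can then be run with, for example, subprocess.run(cmd, check=True). Note that every batch is a separate lattice-tool invocation, so the LM is loaded once per batch rather than once overall; with a few thousand lattices per batch that cost should be amortized reasonably well.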


