[SRILM User List] lines starting with ## skipped

Andreas Stolcke stolcke at icsi.berkeley.edu
Mon Dec 19 11:32:19 PST 2011


Deniz Yuret wrote:
> Hi,
>
> I was working on the reuters rcv1 corpus and while investigating a
> discrepancy in the language model output I realized that the ngram
> command skips lines in the test file that start with '##'.  Is this a
> documented feature or a bug?
>   
It's a feature, though an undocumented one: File::getline() treats lines
starting with '##' as comments and skips them.
In the API you can disable this by setting the skipComments variable in
the File object to false.
There is currently no way to do it at the command line (but it would be
easy to add an option).

A workaround is to insert a space character at the beginning of each 
input line.
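
For illustration, here is a minimal sketch of that workaround in Python. The
function names and file paths are hypothetical and not part of SRILM. It works
on raw bytes so that no assumption about the corpus encoding is needed.

```python
def pad_lines(lines):
    """Yield each input line with one leading space prepended.

    After padding, no line begins with '##', so the comment-skipping
    described above should no longer apply to it.
    """
    for line in lines:
        yield b" " + line


def pad_file(src_path, dst_path):
    """Copy src_path to dst_path, prepending a space to every line."""
    # Binary mode: bytes are passed through untouched apart from the
    # added leading space, so line endings and encoding are preserved.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(pad_lines(src))
```

Calling pad_file("test.txt", "test.padded.txt") (both names are placeholders)
writes a padded copy, which can then be given to ngram in place of the
original test file.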

Andreas



