[SRILM User List] lines starting with ## skipped
Andreas Stolcke
stolcke at icsi.berkeley.edu
Mon Dec 19 11:32:19 PST 2011
Deniz Yuret wrote:
> Hi,
>
> I was working on the reuters rcv1 corpus and while investigating a
> discrepancy in the language model output I realized that the ngram
> command skips lines in the test file that start with '##'. Is this a
> documented feature or a bug?
>
It's a feature of the File::getline() function, though an undocumented one.
In the API you can disable this by setting the skipComments variable in
the File object to false.
There is currently no way to disable it from the command line (though it
would be easy to add an option).
A workaround is to insert a space character at the beginning of each
input line.
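
The workaround above could be scripted along these lines. This is only a
minimal sketch in Python, not part of SRILM: the function names and the
file paths are made up for illustration, and it assumes the extra leading
space makes no difference to how the line is split into words.

```python
def prepend_space(lines):
    """Yield each line with one space in front, so none starts with '##'."""
    for line in lines:
        yield b" " + line


def protect_file(src_path, dst_path):
    """Copy src_path to dst_path, prefixing every line with a space.

    Files are handled as bytes so the corpus encoding does not matter.
    Both paths are hypothetical; substitute your own.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(prepend_space(src))
```

For example, protect_file("test.txt", "test.spaced.txt") writes the padded
copy, which can then be given to ngram -ppl in place of the original test
file.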
Andreas