[SRILM User List] Fwd: Batch no-sos and no-eos

Andreas Stolcke stolcke at icsi.berkeley.edu
Sun Jul 29 08:55:34 PDT 2012


On 7/29/2012 3:46 AM, Alex Tomescu wrote:
> Hello,
>
>> I don't see this behavior.  With make-big-lm -no-sos -no-eos it's true that <s> and </s> appear in the unigram section of the LM (they are still part of the vocabulary, similar to other words that might occur in your vocab file but don't occur in your training data), but there are no higher-order N-grams involving <s> or </s> in the resulting LM.
>
> These are the exact parameters I passed to make-big-lm, yet when I
> looked through the LM there were still N-grams containing </s>
> ("-0.0009011862   <PERIOD> </s>")
>
> make-big-lm -name biglm -read merge-iter9-1.ngrams.gz -lm gut.lm
> -no-eos -no-sos -prune 1e-8 -vocab ../gut.vocab -limit-vocab
That just means that those N-grams are in the input count file
(merge-iter9-1.ngrams.gz).  You also need to pass -no-eos -no-sos
when generating the counts (e.g., with make-batch-counts or directly
with ngram-count).

Andreas
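
One way to confirm the diagnosis above is to check the count file itself for
higher-order N-grams containing <s> or </s> before it is passed to make-big-lm.
The following is only a sketch: the function name count_boundary_ngrams is
hypothetical, and it assumes the usual text count layout of whitespace-separated
words followed by the count as the last field on each line.

```shell
# Hypothetical helper: reads a text N-gram count file on stdin and prints
# how many N-grams of order >= 2 contain <s> or </s>.
# Assumed line layout: w1 w2 ... wN <tab> count
count_boundary_ngrams() {
  awk '
    NF >= 3 {                      # two or more words plus the count field
      for (i = 1; i < NF; i++) {   # scan the words, skipping the count
        if ($i == "<s>" || $i == "</s>") { n++; break }
      }
    }
    END { print n + 0 }            # prints 0 when nothing matched
  '
}
```

For the file in this thread it could be used as
`gzip -dc merge-iter9-1.ngrams.gz | count_boundary_ngrams`. A nonzero result
would suggest the counts were generated without -no-sos -no-eos and need to be
regenerated with those options.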


