[SRILM User List] Fwd: Batch no-sos and no-eos
Andreas Stolcke
stolcke at icsi.berkeley.edu
Sun Jul 29 08:55:34 PDT 2012
On 7/29/2012 3:46 AM, Alex Tomescu wrote:
> Hello,
>
>> I don't see this behavior. With make-big-lm -no-sos -no-eos it's true that <s> and </s> appear in the unigram section of the LM (they are still part of the vocabulary, similar to other words that might occur in your vocab file but don't occur in your training data), but there are no higher-order N-grams involving <s> or </s> in the resulting LM.
>
> These are the exact parameters I passed to make-big-lm, and when I
> looked through the LM there were still ngrams containing </s>
> ("-0.0009011862 <PERIOD> </s>")
>
> make-big-lm -name biglm -read merge-iter9-1.ngrams.gz -lm gut.lm
> -no-eos -no-sos -prune 1e-8 -vocab ../gut.vocab -limit-vocab
That just means that those ngrams are in the input count file
(merge-iter9-1.ngrams.gz). You need to also include -no-eos -no-sos
when generating the counts (e.g., with make-batch-counts or directly
with ngram-count).
Andreas
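
One way to check the diagnosis above is to look for higher-order N-grams containing the sentence-boundary tokens in the count file itself, before any LM is built. The following is only a sketch: it assumes the count file is gzipped and uses the usual SRILM count layout (whitespace-separated words followed by a count, one N-gram per line), and the function name count_boundary_ngrams is made up for illustration.

```shell
# Hypothetical helper: print how many N-grams of order >= 2 in a gzipped
# count file contain <s> or </s>. Unigram entries are ignored, since the
# boundary tokens are expected to remain in the vocabulary.
count_boundary_ngrams() {
    gzip -dc "$1" | awk '
        NF >= 3 {                          # at least two words plus the count
            for (i = 1; i < NF; i++)       # last field is the count, skip it
                if ($i == "<s>" || $i == "</s>") { n++; break }
        }
        END { print n + 0 }'
}
```

Running count_boundary_ngrams merge-iter9-1.ngrams.gz and getting a nonzero number would suggest that the counts were generated without -no-sos -no-eos and need to be regenerated with those options before make-big-lm is rerun.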