[SRILM User List] Why are some trigrams lost?

Andreas Stolcke stolcke at speech.sri.com
Thu Jul 22 09:11:51 PDT 2010


By default, trigrams, 4-grams, and higher-order N-grams must occur at least
twice in the training data to be included in the model. To change that, use

ngram-count -gt3min 1 -gt4min 1 ...
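The cutoff can be checked by hand on the example from the question. Below is a minimal Python sketch, not SRILM code, that assumes each input line is padded with <s> and </s> and that the default minimum counts are 1 for unigrams and bigrams and 2 for trigrams (consistent with the reply above). The function names count_ngrams and apply_min_counts are hypothetical. It only counts distinct N-grams and does not compute any probabilities or discounts.

```python
from collections import Counter


def count_ngrams(words, order):
    """Count all N-grams of length 1..order in a single sentence.

    The sentence is padded with <s> and </s> (an assumption that
    mirrors treating each input line as one sentence).
    """
    padded = ["<s>"] + words + ["</s>"]
    counts = {n: Counter() for n in range(1, order + 1)}
    for n in range(1, order + 1):
        for i in range(len(padded) - n + 1):
            counts[n][tuple(padded[i:i + n])] += 1
    return counts


def apply_min_counts(counts, min_counts):
    """Keep only N-grams whose count reaches the per-order minimum."""
    return {
        n: {g: c for g, c in grams.items() if c >= min_counts.get(n, 1)}
        for n, grams in counts.items()
    }


if __name__ == "__main__":
    words = "a b a b a d a e".split()
    counts = count_ngrams(words, order=3)

    # Assumed defaults: minimum count 1 for orders 1-2, 2 for order 3.
    default = apply_min_counts(counts, {1: 1, 2: 1, 3: 2})
    # Lowered trigram cutoff, analogous to -gt3min 1.
    relaxed = apply_min_counts(counts, {1: 1, 2: 1, 3: 1})

    for n in (1, 2, 3):
        print(f"ngram {n}: default={len(default[n])} relaxed={len(relaxed[n])}")
    print("surviving trigrams:", sorted(" ".join(g) for g in default[3]))
```

On "a b a b a d a e" the sketch finds 6 distinct unigrams, 7 bigrams, and 7 trigrams, of which only "a b a" occurs twice; that matches the ngram 1=6, ngram 2=7, ngram 3=1 header of the quoted model. With the trigram cutoff lowered, all 7 trigrams are kept. For the real model the corresponding command would be `ngram-count -text char.txt -lm char.tri -order 3 -gt3min 1`.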

Andreas

PS. This has become a FAQ ...

王秋锋 wrote:
> Hi all,
> I'm trying to get trigrams from a text file with the command:
> "ngram-count -text char.txt -lm char.tri -order 3"
> The content of char.txt is: a b a b a d a e
> but there is only one trigram in the resulting file char.tri.
> Why are the other trigrams, like "b a b" and "b a d", lost?
> The content of char.tri is:
>
> \data\
> ngram 1=6
> ngram 2=7
> ngram 3=1
> \1-grams:
> -0.9208187 </s>
> -99 <s> -0.06445797
> -0.3767507 a -0.4313637
> -0.6575773 b -0.2405493
> -0.9208187 d -0.06445797
> -0.9208187 e -0.2455126
> \2-grams:
> -0.30103 <s> a
> -0.39794 a b 0
> -0.69897 a d
> -0.69897 a e
> -0.1760913 b a
> -0.30103 d a
> -0.30103 e </s>
> \3-grams:
> -0.1760913 a b a
> \end\
> 2010-07-22
> ------------------------------------------------------------------------
> 王秋锋
> ------------------------------------------------------------------------
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user


