[SRILM User List] How does the option "-gtmin" work in ngram-count?
bulusheva
bulusheva at speechpro.com
Tue Apr 10 00:21:04 PDT 2012
Hi, I have two questions:
1. If I generate the language model with Kneser-Ney smoothing (or
Modified Kneser-Ney), why does the parameter "-gtnmin" apply to the already
modified counts?
For example, if in the training data the 2-gram "markov model" occurs
only in the context "hidden markov model" and gt2min = 2, then the
modified count for "markov model" is n(* markov model) = 1 < gt2min, so
prob("markov model") = bow("markov")*prob("model")
instead of prob("markov model") = (n(* markov model) - D) / n(* markov *).
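To make the example concrete, here is a minimal sketch in Python (not SRILM code; the three-sentence toy corpus and all variable names are made up for illustration). It assumes that the modified count n(* markov model) is the number of distinct words that precede "markov model", and it ignores any special handling of sentence-initial n-grams.

```python
from collections import Counter

# Hypothetical toy corpus: "markov model" occurs only after "hidden".
sentences = [
    "hidden markov model",
    "hidden markov model",
    "hidden markov model",
]

GT2MIN = 2  # same value as gt2min in the example

raw_bigrams = Counter()   # ordinary occurrence counts
left_contexts = {}        # bigram -> set of distinct preceding words

for sentence in sentences:
    words = sentence.split()
    for i in range(len(words) - 1):
        bigram = (words[i], words[i + 1])
        raw_bigrams[bigram] += 1
        if i > 0:
            left_contexts.setdefault(bigram, set()).add(words[i - 1])

target = ("markov", "model")
raw_count = raw_bigrams[target]
# Modified (continuation) count n(* markov model): number of distinct left contexts.
modified_count = len(left_contexts.get(target, set()))

print("raw count:", raw_count)
print("modified count:", modified_count)
print("modified count below gt2min:", modified_count < GT2MIN)
```

On this toy corpus the raw count is 3 but the modified count is 1, so a cutoff of gt2min = 2 applied to the modified count would drop the 2-gram even though it occurs three times in the text.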
2. Let's say I use ngram-count to generate the language model as
follows:
ngram-count -text text.txt -vocab vocab.txt -gt1min 5 -lm sri.lm
Suppose the word "hello" exists in "vocab.txt" and occurs 4 times in
"text.txt". Then the probability of "hello" is calculated as
the probability of a zeroton. Is that correct?
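One way to check this empirically would be to compare the unigram entry for "hello" in the resulting model with that of a vocabulary word that never occurs in the text. The following is only a sketch in Python: it assumes the model is written in the usual ARPA text format, the function name is hypothetical, and the sample lines and numbers are invented for illustration (they are not ngram-count output). The word "zebra" stands in for a vocabulary word with zero count.

```python
# Hypothetical ARPA-format excerpt; the values are invented, not real output.
ARPA_LINES = [
    "\\data\\",
    "ngram 1=4",
    "",
    "\\1-grams:",
    "-0.5\tthe",
    "-1.2\tworld",
    "-2.7\thello",
    "-2.7\tzebra",
    "",
    "\\end\\",
]


def read_unigram_logprobs(lines):
    """Return {word: log10 probability} for the unigram section of an ARPA file."""
    logprobs = {}
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if line.startswith("\\"):
            # Any other section marker (\data\, \2-grams:, \end\) ends the unigram block.
            in_unigrams = False
            continue
        if in_unigrams and line:
            fields = line.split()
            # Fields are: log10 probability, word, optional backoff weight.
            logprobs[fields[1]] = float(fields[0])
    return logprobs


logprobs = read_unigram_logprobs(ARPA_LINES)
same_as_unseen = logprobs["hello"] == logprobs["zebra"]
print("hello:", logprobs["hello"], "zebra:", logprobs["zebra"])
print("hello has the same probability as the unseen word:", same_as_unseen)
```

For the real model, the same function could be applied to the lines of "sri.lm", for example with `read_unigram_logprobs(open("sri.lm"))`.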
Thanks
Anna Bulusheva