[SRILM User List] How does the option "-gtmin" work in ngram-count?

Tue Apr 10 00:21:04 PDT 2012

Hi, I have two questions:

1. If I build the language model with Kneser-Ney smoothing (or
modified Kneser-Ney), why does the "-gtnmin" cutoff apply to the
already modified counts?

    For example, suppose the bigram "markov model" occurs in the training
    data only in the context "hidden markov model", and gt2min = 2. Then
    the modified count of "markov model" is n(* markov model) = 1 < gt2min,
    so prob("markov model") = bow("markov") * prob("model"),
    instead of prob("markov model") = (n(* markov model) - D) / n(* markov *).

2. Let's say I use ngram-count to generate the language model as
follows:

    ngram-count -text text.txt -vocab vocab.txt -gt1min 5 -lm sri.lm

Suppose the word "hello" is in "vocab.txt" and occurs 4 times in
"text.txt". Then the probability of "hello" is calculated as the
probability of a zeroton (a vocabulary word with no training count).
Is that correct?
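To make the scenario in question 1 concrete, here is a toy Python sketch. It is not SRILM's implementation: the corpus, the discount D = 0.5, and the helper names (raw_bigram_counts, left_contexts, kn_bigram_decision) are hypothetical, and the sketch simply models the behavior the question assumes, namely that the cutoff is compared against the Kneser-Ney modified count n(* v w) rather than the raw count. In this corpus "markov model" occurs twice, so a cutoff on raw counts would keep it, but it always follows "hidden", so its modified count is 1 and gt2min = 2 removes it.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: "markov model" occurs twice, always after "hidden".
SENTENCES = [
    "we train a hidden markov model",
    "the hidden markov model is simple",
]


def padded(sentence):
    # Add sentence boundary tokens, as n-gram toolkits usually do.
    return ["<s>"] + sentence.split() + ["</s>"]


def raw_bigram_counts(sentences):
    # Ordinary (unmodified) bigram counts.
    counts = Counter()
    for s in sentences:
        toks = padded(s)
        counts.update(zip(toks, toks[1:]))
    return counts


def left_contexts(sentences):
    # For each bigram (v, w), collect the DISTINCT words u seen right before it.
    # The Kneser-Ney modified count n(* v w) is the size of that set.
    ctx = defaultdict(set)
    for s in sentences:
        toks = padded(s)
        for u, v, w in zip(toks, toks[1:], toks[2:]):
            ctx[(v, w)].add(u)
    return ctx


def kn_bigram_decision(sentences, v, w, gt2min, discount):
    # Hypothetical model of the question's assumption: the gt2min cutoff
    # is applied to the modified count, not the raw count.
    ctx = left_contexts(sentences)
    n_mod = len(ctx.get((v, w), set()))  # n(* v w)
    # n(* v *): modified counts summed over all bigrams starting with v
    total = sum(len(c) for (a, _), c in ctx.items() if a == v)
    if n_mod < gt2min:
        # Bigram is cut off: p(w | v) = bow(v) * p(w)
        return ("backoff", n_mod)
    # Otherwise use the discounted estimate (n(* v w) - D) / n(* v *)
    return ("discounted", (n_mod - discount) / total)


print(raw_bigram_counts(SENTENCES)[("markov", "model")])  # 2
print(kn_bigram_decision(SENTENCES, "markov", "model", gt2min=2, discount=0.5))
# ('backoff', 1)
print(kn_bigram_decision(SENTENCES, "markov", "model", gt2min=1, discount=0.5))
# ('discounted', 0.5)
```

The raw count (2) passes a cutoff of 2 while the modified count (1) does not, which is exactly the mismatch the question is about.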

Thanks
Anna Bulusheva
