[SRILM User List] Question about -prune-lowprobs and -text-has-weights

Meng Chen chenmengdx at gmail.com
Wed Aug 8 03:31:03 PDT 2012


Hi,

The -prune-lowprobs option in ngram will "prune N-gram
probabilities that are lower than the corresponding backed-off estimates".
This option seems especially useful when a back-off weight (bow) is
positive. However, could I simply replace the positive bow values with 0
instead of using -prune-lowprobs? Would there be any difference, or is
simple replacement just not correct?
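
To make the quoted definition concrete, here is a minimal sketch of the
comparison it describes, for a toy bigram model with ARPA-style log10
values. The function name, the dictionaries, and all numbers are made up
for illustration; this is not SRILM code and does not reproduce how ngram
removes entries or recomputes back-off weights afterwards.

```python
def lowprob_bigrams(unigram_logprob, backoff_weight, bigram_logprob):
    """Return bigrams whose explicit log10 probability is lower than
    the backed-off estimate bow(history) + log10 p(word)."""
    flagged = []
    for (history, word), logp in bigram_logprob.items():
        # Histories without a listed back-off weight use log10 bow = 0.
        backed_off = backoff_weight.get(history, 0.0) + unigram_logprob[word]
        if logp < backed_off:
            flagged.append((history, word))
    return sorted(flagged)


# Hypothetical toy model (log10 values, not taken from any real LM).
unigram_logprob = {"a": -0.3, "b": -0.6, "c": -1.0}
backoff_weight = {"a": 0.2}  # positive log10 bow, i.e. a weight above 1
bigram_logprob = {("a", "b"): -0.7, ("a", "c"): -0.5}

print(lowprob_bigrams(unigram_logprob, backoff_weight, bigram_logprob))
```

In this toy case only the bigram "a b" falls below its backed-off
estimate, so it is the only candidate the definition would single out.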

Another question:
When training an LM, we can use the -text-has-weights option for a corpus
annotated with sentence frequencies. What should we do with duplicated
sentences in a large corpus? Should I delete the duplicates? Or should I
compute the sentence frequencies first and use the -text-has-weights option
instead? Or do nothing and just use the whole corpus for training as-is?
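
The second option (computing sentence frequencies first) could look
roughly like the sketch below. It assumes the weighted format in which the
first field of each line is the weight that multiplies the N-gram counts
for that line; the helper name and the sample sentences are hypothetical.

```python
from collections import Counter


def to_weighted_lines(sentences):
    """Collapse duplicate sentences into 'count sentence' lines."""
    # Normalize whitespace so trivially different copies are merged,
    # and skip empty lines.
    counts = Counter(" ".join(s.split()) for s in sentences if s.strip())
    # Counter keeps first-seen order (Python 3.7+).
    return [f"{n} {sentence}" for sentence, n in counts.items()]


corpus = ["hello world", "good morning", "hello  world", ""]
for line in to_weighted_lines(corpus):
    print(line)
```

With integer weights equal to the duplicate counts, the resulting N-gram
counts should match those obtained from the original corpus, so the choice
between the two mainly affects file size, not the counts.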

Thanks!

Meng CHEN