[SRILM User List] Question about 3-gram Language Model with OOV triplets
Burkay Gur
burkay at mit.edu
Tue Oct 25 12:41:43 PDT 2011
Hi,
I have just started using SRILM, and it is a great tool, but I have run
into an issue. I have two corpora:
corpusA.txt
corpusB.txt
I want to build a separate 3-gram language model for each corpus, but I
want to make sure that if a triplet occurs in one corpus and not in the
other, the model for the corpus that lacks it still assigns it a smoothed
probability. For example:
if corpusA has triplet counts:
this is a 1
is a test 1
and corpusB has triplet counts:
that is a 1
is a test 1
then the final counts for corpusA should be:
this is a 1
is a test 1
that is a 0
because "that is a" occurs in B but not in A.
Similarly, the final counts for corpusB should be:
that is a 1
is a test 1
this is a 0
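
To make the intended counts concrete, here is a minimal Python sketch of the zero-filling step. It works outside SRILM on plain token lists, and the helper names (trigram_counts, zero_fill) are hypothetical, chosen only for illustration.

```python
from collections import Counter

def trigram_counts(tokens):
    """Count every run of three consecutive tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

def zero_fill(own, other):
    """Copy `own`, adding a 0 entry for each triplet seen only in `other`."""
    filled = dict(own)
    for trigram in other:
        filled.setdefault(trigram, 0)
    return filled

# Toy corpora that reproduce the counts in the example above.
counts_a = trigram_counts("this is a test".split())
counts_b = trigram_counts("that is a test".split())

filled_a = zero_fill(counts_a, counts_b)  # gains ("that", "is", "a") with count 0
filled_b = zero_fill(counts_b, counts_a)  # gains ("this", "is", "a") with count 0
```

Both filled tables end up with the same three triplets, and each differs from the other only in which triplet carries the zero.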
Once the counts are set up, some smoothing algorithm could be applied. I
have tried manually setting the triplet counts to 0, but that does not
seem to work, as those entries are omitted from the 3-grams.
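
As an illustration of what smoothing over such counts could look like, here is a small Python sketch using add-k (additive) smoothing. This is a stand-in picked for simplicity, not a method the post names and not a description of how SRILM behaves. The function name additive_prob and the toy vocabulary are assumptions.

```python
def additive_prob(counts, trigram, vocab_size, k=1.0):
    """Estimate P(w3 | w1 w2) with add-k smoothing over a triplet count table."""
    context = trigram[:2]
    # Total count of all triplets that share the same two-word context.
    context_total = sum(c for t, c in counts.items() if t[:2] == context)
    return (counts.get(trigram, 0) + k) / (context_total + k * vocab_size)

# Zero-filled counts for corpusA, as in the example above.
counts_a = {
    ("this", "is", "a"): 1,
    ("is", "a", "test"): 1,
    ("that", "is", "a"): 0,
}
# Assumed vocabulary: every word that appears in any listed triplet.
vocab = {word for trigram in counts_a for word in trigram}

p_seen = additive_prob(counts_a, ("this", "is", "a"), len(vocab))
p_unseen = additive_prob(counts_a, ("that", "is", "a"), len(vocab))
```

Under this particular scheme the zero-count triplet receives a nonzero probability, but a triplet missing from the table would receive the same value, so the explicit zero entry mainly affects which words enter the vocabulary.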
Can you recommend any other ways of doing this?
Thank you,
Burkay