[SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets
Andreas Stolcke
stolcke at icsi.berkeley.edu
Tue Oct 25 15:38:41 PDT 2011
Burkay Gur wrote:
> To follow up: basically, when I edit the .count file and add 0 counts
> for some trigrams, they are not included in the final .lm file when I
> read from the .count file and create the language model.
A zero count is completely equivalent to a non-existent count, so what
you're seeing is expected.
It is not clear what precisely you want to happen. As a result of
discounting and backing off, your LM, even without the unobserved
trigram, will already assign a non-zero probability to that trigram.
That's exactly what the ngram smoothing algorithms are for.
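For instance, here is a rough sketch (the output file names are just
placeholders) showing that an LM estimated from corpusA alone already
gives the unseen trigram a finite probability:

    # build a smoothed 3-gram LM from corpusA only
    # (default Good-Turing discounting)
    ngram-count -text corpusA.txt -order 3 -lm A.lm

    # score a probe sentence containing "that is a", which never
    # occurs in corpusA; -debug 2 prints the probability assigned
    # to each word in context
    echo "that is a test" > probe.txt
    ngram -lm A.lm -order 3 -ppl probe.txt -debug 2

The per-word log probabilities in the -debug 2 output are finite,
i.e., the backoff weights already cover the trigram that was never
observed in corpusA.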
If you want to inject some specific statistical information from another
dataset into your target LM, you could interpolate (mix) the two LMs to
obtain a third LM. See the description of the ngram -mix-lm option.
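A minimal sketch of that approach (again, the file names and the
mixture weight 0.8 are just illustrative):

    # build separate smoothed 3-gram LMs
    ngram-count -text corpusA.txt -order 3 -lm A.lm
    ngram-count -text corpusB.txt -order 3 -lm B.lm

    # interpolate the two; -lambda is the weight of the main model
    # given by -lm, so this writes out roughly 0.8*P_A + 0.2*P_B
    ngram -order 3 -lm A.lm -mix-lm B.lm -lambda 0.8 -write-lm A_mixed.lm

Swap the roles of A.lm and B.lm (or change -lambda) to get the
B-centric mixture; the weight can also be tuned on held-out data, e.g.
with compute-best-mix.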
Andreas
>
> On 10/25/11 3:41 PM, Burkay Gur wrote:
>> Hi,
>>
>> I have just started using SRILM, and it is a great tool. But I ran
>> across this issue. The situation is that I have:
>>
>> corpusA.txt
>> corpusB.txt
>>
>> What I want to do is create two different 3-gram language models, one
>> for each corpus. But I want to make sure that if a triplet is
>> non-existent in one corpus but present in the other, a smoothed
>> probability is still assigned to it. For example:
>>
>> if corpusA has triplet counts:
>>
>> this is a 1
>> is a test 1
>>
>> and corpusB has triplet counts:
>>
>> that is a 1
>> is a test 1
>>
>> then the final counts for corpusA should be:
>>
>> this is a 1
>> is a test 1
>> that is a 0
>>
>> because "that is a" is in B but not A.
>>
>> similarly corpusB should be:
>>
>> that is a 1
>> is a test 1
>> this is a 0
>>
>> After the counts are set up, some smoothing algorithm might be used. I
>> have manually tried to set the trigram counts to 0, but it does not
>> seem to work, as they are omitted from the 3-grams.
>>
>> Can you recommend any other ways of doing this?
>>
>> Thank you,
>> Burkay
>>
>
> _______________________________________________
> SRILM-User site list
> SRILM-User at speech.sri.com
> http://www.speech.sri.com/mailman/listinfo/srilm-user