[SRILM User List] Follow Up: Question about 3-gram Language Model with OOV triplets
Burkay Gur
burkay at mit.edu
Tue Oct 25 15:16:49 PDT 2011
To follow up: when I edit the .count file to add trigrams with a count of 0
and then build a language model by reading that .count file, those trigrams
are not included in the final .lm file.
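The zero-padding step described here can be sketched in Python. This is a minimal sketch that assumes count files hold one n-gram per line, with the words followed by a tab and the count. The helper names (`trigram_counts`, `pad_with_zeros`, `format_counts`) are hypothetical and are not part of SRILM.

```python
from collections import Counter

def trigram_counts(tokens):
    """Count every trigram (3 consecutive tokens) in a token list."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

def pad_with_zeros(own, other):
    """Copy `own`, adding every trigram seen only in `other` with count 0."""
    padded = dict(own)
    for gram in other:
        padded.setdefault(gram, 0)  # keeps existing counts untouched
    return padded

def format_counts(counts):
    """Render one n-gram per line: the words, a tab, then the count (assumed format)."""
    return "\n".join(" ".join(g) + "\t" + str(c) for g, c in sorted(counts.items()))

a = trigram_counts("this is a test".split())  # toy stand-in for corpusA
b = trigram_counts("that is a test".split())  # toy stand-in for corpusB
padded_a = pad_with_zeros(a, b)
print(format_counts(padded_a))
```

On the two toy sentences this reproduces the padded corpusA counts from the quoted message. As this follow-up reports, however, such 0-count trigrams do not end up in the final .lm file.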
On 10/25/11 3:41 PM, Burkay Gur wrote:
> Hi,
>
> I have just started using SRILM, and it is a great tool. However, I have
> run into an issue. The situation is that I have:
>
> corpusA.txt
> corpusB.txt
>
> What I want to do is create a separate 3-gram language model for each
> corpus. However, if a triplet occurs in one corpus but not in the other,
> I want the model for the corpus that lacks it to still assign it a
> smoothed probability. For example:
>
> if corpusA has triplet counts:
>
> this is a 1
> is a test 1
>
> and corpusB has triplet counts:
>
> that is a 1
> is a test 1
>
> then the final counts for corpusA should be:
>
> this is a 1
> is a test 1
> that is a 0
>
> because "that is a" is in B but not A.
>
> Similarly, the final counts for corpusB should be:
>
> that is a 1
> is a test 1
> this is a 0
>
> Once the counts are set up, a smoothing algorithm could be applied. I
> have tried manually adding these triplets with a count of 0, but it
> does not seem to work: they are omitted from the 3-grams.
>
> Can you recommend any other ways of doing this?
>
> Thank you,
> Burkay
>
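One possible reason the explicit zeros are unnecessary: in a back-off model, a trigram that is not listed still receives a nonzero probability by backing off to bigram and unigram estimates. The Python sketch below shows that generic lookup, assuming standard ARPA-style back-off semantics (a context with no listed back-off weight gets weight 1). All probabilities and weights are made up for illustration and do not come from either corpus. The dictionaries and the `prob` helper are hypothetical and are not SRILM's API.

```python
# Hypothetical toy back-off model. All numbers are made up for illustration;
# a real model would load them from the .lm file.
unigram_p = {"this": 0.2, "that": 0.1, "is": 0.3, "a": 0.2, "test": 0.2}
ngram_p = {  # bigrams and trigrams listed explicitly in the model
    ("this", "is"): 0.5,
    ("is", "a"): 0.4,
    ("a", "test"): 0.3,
    ("this", "is", "a"): 0.8,
    ("is", "a", "test"): 0.5,
}
backoff_w = {("this", "is"): 0.3, ("is", "a"): 0.6, ("is",): 0.5, ("a",): 0.7}

def prob(word, context=()):
    """P(word | context), where context is a tuple of up to 2 preceding words."""
    if not context:
        return unigram_p[word]  # a real model would map unknown words to <unk>
    gram = context + (word,)
    if gram in ngram_p:
        return ngram_p[gram]  # n-gram listed explicitly in the model
    # Not listed: back off to the shorter context, scaled by the context's
    # back-off weight (1.0 when no weight is listed for that context).
    return backoff_w.get(context, 1.0) * prob(word, context[1:])

p_listed = prob("test", ("is", "a"))     # trigram present in the model
p_unseen = prob("test", ("this", "is"))  # trigram absent: backs off twice
p_that = prob("a", ("that", "is"))       # "that is a" never listed as a trigram
print(p_listed, p_unseen, p_that)
```

In this toy model, "that is a" gets its probability from the bigram "is a" without any 0-count entry. The trade-off is that an unseen trigram's probability comes entirely from lower-order estimates scaled by back-off weights.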