beginning/end of sentence tags
Jachym Kolar
jachym at kky.zcu.cz
Wed Apr 23 13:53:48 PDT 2008
Dmitriy,
you can use the "continuous-ngram-count" script to generate counts
that do not contain sentence boundary tags. Its output can be piped
into ngram-count to estimate the model, for example:
'continuous-ngram-count order=3 train.txt | ngram-count -read - -lm lm3gram'
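For illustration only, here is a rough Python sketch of what counting
without sentence boundaries means: the text is treated as one continuous
token stream, so n-grams may span line breaks and no <s> or </s> tags are
ever inserted. This is not the script's actual code (the real script ships
with SRILM); the function name continuous_ngram_counts and the toy input
are made up for the example.

```python
from collections import Counter


def continuous_ngram_counts(lines, order=3):
    """Count all n-grams of length 1..order over one continuous token stream.

    Hypothetical helper for illustration; it is not part of SRILM.
    """
    counts = Counter()
    history = []  # the most recent `order` tokens, carried across line breaks
    for line in lines:
        for token in line.split():
            history.append(token)
            history = history[-order:]
            # Count every n-gram that ends at the current token.
            for n in range(1, len(history) + 1):
                counts[tuple(history[-n:])] += 1
    return counts


if __name__ == "__main__":
    text = ["a b", "c a b"]  # toy input: two lines, no boundary tags added
    counts = continuous_ngram_counts(text, order=2)
    for ngram, count in sorted(counts.items()):
        print(" ".join(ngram), count, sep="\t")
```

Note that the bigram "b c" is counted even though "b" ends one line and
"c" starts the next, which is exactly the difference from per-sentence
counting.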
Best,
Jachym
Quoting Dmitriy Dligach <Dmitriy.Dligach at colorado.edu>:
> Andreas,
>
> First of all I wanted to thank you for your SRILM toolkit; I find it
> extremely useful in my research!
>
> Also, I had a question about the beginning/end of sentence tags:
>
> I need to compute probabilities of strings that are *not* complete
> sentences. My understanding is that both 'ngram-count' and 'ngram' tools
> automatically add these tags if they are not explicitly present.
>
> Is there any way to prevent the 'ngram' tool from doing so?
>
> Perhaps the '-limit-vocab' option can somehow help by specifying all
> words in the vocabulary except for the <s> and </s>?
>
> Thanks,
>
>
> Dima