beginning/end-of-sentence tags

Jachym Kolar jachym at kky.zcu.cz
Wed Apr 23 13:53:48 PDT 2008


Dmitriy,
  you can use the "continuous-ngram-count" script to generate counts
that do not contain sentence boundary tags. Its output can be piped into
ngram-count, for example:

'continuous-ngram-count order=3 train.txt | ngram-count -read - -lm lm3gram'
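To make "counts not containing sentence boundary tags" concrete, here is a minimal Python sketch. It assumes that continuous-ngram-count treats the whole training text as one continuous word stream, ignoring line breaks, and emits every N-gram of order 1 to N without adding <s> or </s>. The function names, the script name continuous_counts.py, and the exact count-line layout (words, a tab, the count) are illustrative assumptions; this is not the SRILM script itself.

```python
import sys
from collections import Counter


def continuous_ngram_counts(lines, order=3):
    """Count every n-gram of order 1..order over the concatenated token stream.

    Line breaks are ignored and no <s> or </s> tags are added, so n-grams
    can span what would otherwise be sentence boundaries.
    """
    tokens = [w for line in lines for w in line.split()]
    counts = Counter()
    for i in range(len(tokens)):
        # Take n-grams of every order that start at position i.
        for n in range(1, order + 1):
            if i + n > len(tokens):
                break  # not enough tokens left for this order
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def format_counts(counts):
    """Yield one 'w1 w2 ... wN<TAB>count' line per n-gram (assumed count-file layout)."""
    for ngram, c in sorted(counts.items()):
        yield " ".join(ngram) + "\t" + str(c)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 continuous_counts.py train.txt | ngram-count -read - ...
    with open(sys.argv[1]) as f:
        for line in format_counts(continuous_ngram_counts(f, order=3)):
            print(line)
```

For input lines "a b" and "c", the sketch counts the trigram "a b c" and the bigram "b c" even though they cross a line break, and never produces <s> or </s>. Saved as the hypothetical continuous_counts.py, it could stand in for the script in the same pipeline: python3 continuous_counts.py train.txt | ngram-count -read - -lm lm3gram (check the count-file format against the ngram-count documentation before relying on it).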

Best,
  Jachym

Quoting Dmitriy Dligach <Dmitriy.Dligach at colorado.edu>:

> Andreas,
>
> First of all I wanted to thank you for your SRILM toolkit; I find it
> extremely useful in my research!
>
> Also, I had a question about the beginning/end of sentence tags:
>
> I need to compute probabilities of strings that are *not* complete
> sentences. My understanding is that both the 'ngram-count' and 'ngram'
> tools automatically add these tags if they are not explicitly present.
>
> Is there any way to prevent the 'ngram' tool from doing so?
>
> Perhaps the '-limit-vocab' option could help, by specifying all
> words in the vocabulary except <s> and </s>?
>
> Thanks,
>
>
> Dima





