Andreas Stolcke
stolcke at speech.sri.com
Mon Sep 8 09:50:11 PDT 2008
Christine de Bond wrote:
> Hello,
>
> I tried out:
>
> ngram-count -write-vocab vocab.txt -text input.txt
>
> and in the resulting file there is an entry " -pau- " which is not in my input.txt.
> Does anybody know where this pau comes from and what it means?
>
It's a predefined vocabulary item used to represent nonspeech (e.g., in
lattices).
This word does not take up any probability mass so it doesn't interfere
with the LM building.
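If the extra entry is unwanted in the written vocabulary file, it can be filtered out afterwards. Below is a minimal Python sketch, assuming the vocabulary file has one word per line and is named vocab.txt as in the command above. The helper names and the output file name are hypothetical and are not part of SRILM.

```python
# Sketch only: post-process a vocabulary file to drop the "-pau-" entry.
# Assumes one word per line; function names here are illustrative.
PAUSE_TOKEN = "-pau-"


def drop_pause_token(words):
    """Return the vocabulary entries with the pause token removed."""
    return [w for w in words if w.strip() != PAUSE_TOKEN]


def read_vocab(path):
    """Read non-empty lines from a vocabulary file, stripped of whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def write_vocab(path, words):
    """Write one vocabulary entry per line."""
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(w + "\n")


if __name__ == "__main__":
    # "vocab.clean.txt" is an assumed output name.
    write_vocab("vocab.clean.txt", drop_pause_token(read_vocab("vocab.txt")))
```

Since the word carries no probability mass, this step is cosmetic and is not needed for LM building.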
Andreas
> Best regards,
> Christine
>