Andreas Stolcke stolcke at speech.sri.com
Mon Sep 8 09:50:11 PDT 2008


Christine de Bond wrote:
> Hello,
>
> I tried out:
>
> ngram-count   -write-vocab vocab.txt   -text input.txt
>
> and in the resulting file there is an entry " -pau- " which is not in my input.txt.
> Does anybody know where this pau comes from and what it means?
>   
It's a predefined vocabulary item used to represent nonspeech (e.g., in 
lattices). This word does not take up any probability mass, so it doesn't 
interfere with LM building.
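
If the entry is unwanted in a word list passed to other tools, it can be 
filtered out after the fact. The following is a minimal sketch, not part of 
SRILM: the function name strip_pause_token is hypothetical, and it assumes 
the file written by -write-vocab has one word per line.

```python
PAUSE_TOKEN = "-pau-"  # the predefined nonspeech item discussed above


def strip_pause_token(lines, pause_token=PAUSE_TOKEN):
    """Return vocabulary entries with the pause token removed.

    `lines` is any iterable of strings, assumed to hold one word per line
    (an assumption about the vocab file layout). Blank lines are dropped.
    """
    words = (line.strip() for line in lines)
    return [w for w in words if w and w != pause_token]
```

For example, passing an open file handle for vocab.txt to strip_pause_token 
returns the remaining words as a list, which can then be written to a new file.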

Andreas

> Best regards,
> Christine
>   




