[SRILM User List] LM of phonemes strings

Fri Mar 29 09:06:18 PDT 2013

On 3/28/2013 12:40 PM, Ana Montalvo Bereau wrote:
> Hello all, my name is Ana, I'm a beginner with SRILM.
> My objective is to build a language model for spoken language 
> recognition.
> I'll use the output of a phoneme recognizer to train the LM, so my 
> question is whether SRILM allows me to build models that estimate the 
> prior probabilities of phoneme strings rather than word strings.
> If so, what should the procedure be?
> thx in advance
> ana
Ana,

There is nothing really different about building phone-based language 
models. In a phone recognizer the phone labels are treated just the 
same as the words in a word recognizer, and the same is true of the 
LM. You just prepare a corpus of phone labels separated by white 
space (don't forget the phone representing "pause" or nonspeech), then 
use ngram-count in the usual way to train an LM.
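
As a rough sketch of the corpus preparation step, the Python snippet below 
writes one utterance per line with phone labels separated by single spaces, 
then builds (without running) an ngram-count command line. The phone labels, 
the "sil" pause symbol, the file names and the helper functions are all 
made up for illustration; only the -order, -text and -lm options are 
actual ngram-count options.

```python
import os
import shlex
import tempfile

# Hypothetical recognizer output: one list of phone labels per utterance.
# The labels, including "sil" for pause/nonspeech, are illustrative only.
utterances = [
    ["sil", "h", "e", "l", "o", "sil"],
    ["sil", "o", "l", "a", "sil"],
]


def write_phone_corpus(utts, path):
    """Write one utterance per line, phone labels separated by single spaces."""
    lines = [" ".join(phones) for phones in utts]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines


def ngram_count_command(corpus_path, lm_path, order=3):
    """Build (but do not run) an ngram-count command line as an argument list."""
    return ["ngram-count", "-order", str(order),
            "-text", corpus_path, "-lm", lm_path]


workdir = tempfile.mkdtemp()
corpus_path = os.path.join(workdir, "phones.txt")  # hypothetical file name
lm_path = os.path.join(workdir, "phones.lm")       # hypothetical file name

corpus_lines = write_phone_corpus(utterances, corpus_path)
cmd = ngram_count_command(corpus_path, lm_path)
print(shlex.join(cmd))  # command to run in a shell where SRILM is installed
```

The trigram order is just a starting point; what works best will depend on 
the amount of phone data and on the task.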

One minor difference: because the phone vocabulary is small and closed, 
the smoothing method that works best may differ from the one you would 
pick for a word LM. In my experience, Witten-Bell smoothing is a good 
choice for phone LMs.
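
To give a feel for what Witten-Bell smoothing does, here is a minimal 
Python sketch of an interpolated Witten-Bell bigram estimate over a closed 
phone vocabulary. It is a toy under stated assumptions, not SRILM's 
implementation: it ignores sentence-boundary tokens, falls back to a 
uniform distribution below the unigram level, and uses invented phone 
labels and function names. Each history h gives its unseen continuations 
a share T(h) / (c(h) + T(h)) of the probability mass, where T(h) is the 
number of distinct phones seen after h.

```python
from collections import Counter, defaultdict


def train_witten_bell_bigram(sentences, vocab):
    """Return p(w | prev) using interpolated Witten-Bell smoothing.

    Assumes `vocab` is a closed set containing every phone in `sentences`.
    """
    unigrams = Counter()
    followers = defaultdict(Counter)  # followers[prev][w] = count of "prev w"
    for sent in sentences:
        unigrams.update(sent)
        for prev, cur in zip(sent, sent[1:]):
            followers[prev][cur] += 1

    total = sum(unigrams.values())  # number of phone tokens
    types = len(unigrams)           # number of distinct phones observed
    vocab_size = len(vocab)

    def p_unigram(w):
        # Witten-Bell unigram interpolated with a uniform distribution.
        return (unigrams[w] + types / vocab_size) / (total + types)

    def p_bigram(w, prev):
        seen = followers.get(prev)
        if not seen:
            # History never observed: use the smoothed unigram estimate.
            return p_unigram(w)
        n = sum(seen.values())  # c(prev): tokens observed after prev
        t = len(seen)           # T(prev): distinct phones observed after prev
        return (seen[w] + t * p_unigram(w)) / (n + t)

    return p_bigram


# Hypothetical toy data; "c" is in the vocabulary but never observed.
phone_vocab = ["sil", "a", "b", "c"]
training = [["sil", "a", "b", "sil"], ["sil", "a", "a", "sil"]]
p = train_witten_bell_bigram(training, phone_vocab)
print(p("a", "sil"), p("c", "sil"))
```

In SRILM itself no such code is needed: the -wbdiscount option of 
ngram-count selects Witten-Bell discounting.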

Andreas