[SRILM User List] LM of phonemes strings
stolcke at icsi.berkeley.edu
Fri Mar 29 09:06:18 PDT 2013
On 3/28/2013 12:40 PM, Ana Montalvo Bereau wrote:
> Hello all, my name is Ana, I'm a beginner with srilm.
> My objective is to construct a language model to make spoken language
> I'll use the output of a phoneme recognizer to train the LM, so my
> question is whether SRILM allows me to build models that estimate the prior
> probabilities of phoneme strings rather than word strings.
> If so, what would the procedure be?
> thx in advance
There is nothing really different about building phone-based language
models. In a phone recognizer, the phone labels are treated just the
same as the words in a word recognizer, and the same is true of the
LM. You just prepare a corpus of phone labels separated by white
space (don't forget the phone representing "pause" or nonspeech), then
use ngram-count in the usual way to train an LM.
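
As a rough illustration of the corpus preparation described above, here is a
minimal Python sketch. The phone labels, the "sil" pause symbol, the file
names, and the helper functions are all hypothetical; only the ngram-count
options -order, -text and -lm are standard SRILM options. The script writes
one utterance per line and prints the command rather than running it.

```python
import os
import tempfile

# Hypothetical phone-recognizer output, one list per utterance.
# "sil" is an assumed label for pause/nonspeech; use whatever your recognizer emits.
utterances = [
    ["sil", "h", "e", "l", "o", "sil"],
    ["sil", "a", "n", "a", "sil"],
]


def write_phone_corpus(utts, path):
    """Write one utterance per line, phone labels separated by single spaces."""
    lines = [" ".join(phones) for phones in utts]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


def ngram_count_command(corpus_path, lm_path, order=3):
    """Build the argument list for a plain ngram-count training run."""
    return ["ngram-count", "-order", str(order),
            "-text", corpus_path, "-lm", lm_path]


workdir = tempfile.mkdtemp()
corpus_path = os.path.join(workdir, "phones.txt")  # hypothetical file name
corpus_lines = write_phone_corpus(utterances, corpus_path)
cmd = ngram_count_command(corpus_path, os.path.join(workdir, "phones.lm"))

# Print the command; pass `cmd` to subprocess.run if SRILM is installed.
print(" ".join(cmd))
```

The n-gram order of 3 is only a placeholder; phone LMs are often trained with
other orders, so treat it as a parameter to tune.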
A minor difference is that, because the vocabulary is small and finite,
smoothing methods other than the usual ones might work best. For example, Witten-Bell
smoothing is a good choice for phone LMs in my experience.
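
In ngram-count, Witten-Bell smoothing is selected with the -wbdiscount
option. To give a feel for what the method does, here is a small Python
sketch of the textbook interpolated Witten-Bell estimate for a bigram phone
model. It is an illustration under simplifying assumptions (bigrams only,
maximum-likelihood unigrams, a toy corpus, hypothetical function names) and
is not meant to reproduce SRILM's exact computation.

```python
from collections import Counter, defaultdict


def train_witten_bell_bigram(sentences):
    """Return (prob, vocab) where prob(cur, prev) is an interpolated
    Witten-Bell bigram estimate backed by maximum-likelihood unigrams."""
    unigrams = Counter()
    followers = defaultdict(Counter)  # followers[prev][cur] = bigram count
    for phones in sentences:
        unigrams.update(phones)
        for prev, cur in zip(phones, phones[1:]):
            followers[prev][cur] += 1
    total = sum(unigrams.values())

    def prob(cur, prev):
        p_uni = unigrams[cur] / total
        seen = followers.get(prev)
        if not seen:
            # Context never observed: fall back to the unigram estimate.
            return p_uni
        n = sum(seen.values())  # tokens observed after prev
        t = len(seen)           # distinct phone types observed after prev
        # Weight t / (n + t) goes to the lower-order distribution.
        return (seen[cur] + t * p_uni) / (n + t)

    return prob, sorted(unigrams)


# Toy corpus with an assumed "sil" pause label.
sentences = [s.split() for s in ["sil a b a sil", "sil b a b sil"]]
prob, vocab = train_witten_bell_bigram(sentences)

for phone in vocab:
    print(phone, round(prob(phone, "a"), 4))
```

The more distinct phones that have been seen after a context, the more weight
is shifted to the lower-order estimate, which suits a small closed vocabulary
where most phones are eventually observed in most contexts.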