[SRILM User List] build a language model multiword
stolcke at icsi.berkeley.edu
Fri Mar 17 11:54:24 PDT 2017
On 3/17/2017 7:44 AM, Van Tuan MAI wrote:
> Now I have a text file that contains all the words in the story, and a vocab
> file that includes not only normal words but also wrong-pronunciation
> words (a, b(b1, b2), c(c1, c2, c3)). So can I add b1, b2, c1, c2 into
> the N-gram models?
I'm not sure I fully understand your notation (can you give examples of
what b, b1, b2, etc. stand for?), but you can train an LM on "normal"
or "wrong" words as you wish. The software makes no distinction between
the two.
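For example, the training commands are the same whichever words the text and vocabulary contain. A minimal sketch with SRILM's standard tools (the file names here are placeholders, not from the original message):

```shell
# Train a trigram LM on the story text, restricted to the given vocab;
# -unk maps any out-of-vocabulary word to the <unk> token.
ngram-count -order 3 -text story.txt -vocab vocab.txt -unk -lm story.lm

# Evaluate perplexity on held-out text to compare LM variants.
ngram -order 3 -lm story.lm -unk -ppl heldout.txt
```

Comparing perplexity on the same held-out set is one way to run the experiment suggested below.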
You have to experiment to find out if mapping "wrong" to "normal" words
(usually called "text normalization" or TN) would help the performance
of your overall system. The rationale for TN is that it reduces
the sparseness of your data and thereby improves generalization. Also, if
you have a postprocessing step that interprets the words, it might help
to deal only with "normal" words.
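As a concrete sketch of such a mapping step (the mapping file and word forms are assumptions, mirroring the b1 -> b, c1 -> c notation above), wrong-word variants can be rewritten to their normal forms before training:

```shell
# Hypothetical mapping file: one "wrong normal" pair per line.
printf 'b1 b\nb2 b\nc1 c\nc2 c\nc3 c\n' > map.txt

# Toy training text using the same placeholder words.
printf 'a b1 c2 a c3\n' > train.txt

# Apply whole-word replacements (GNU sed; \b is a word boundary).
while read wrong normal; do
  sed -i "s/\\b${wrong}\\b/${normal}/g" train.txt
done < map.txt

cat train.txt
```

The normalized text can then be used for LM training in place of the raw text.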