general help

Andreas Stolcke stolcke at speech.sri.com
Tue Sep 3 08:45:06 PDT 2002


Hongqin,

Two suggestions:

- interpolate your class-based LM with the word-based one
 (class-based LMs alone usually don't give an improvement over word-based ones
  except in very limited domains).

- use Kneser-Ney smoothing (with interpolation) for the 4gram LM:

   -kndiscount1 -interpolate1 -kndiscount2 -interpolate2 
   -kndiscount3 -interpolate3 -kndiscount4 -interpolate4

  You should see a perplexity reduction both over the 3gram and over
  Good-Turing (GT) discounting.  Of course you never know about WER...
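
To make the first suggestion concrete, here is a minimal Python sketch of
what linear interpolation of two LMs amounts to: each word's probability is
a weighted mix of the two models' probabilities, and the weight is chosen to
minimize perplexity on held-out data. This is only an illustration of the
arithmetic, not SRILM code. The function names and the probability values
are hypothetical, and the simple grid search is an assumption standing in
for whatever weight estimation you actually use.

```python
import math

def interpolate(p_word, p_class, lam):
    """Mix two per-word probability streams: lam*p_word + (1-lam)*p_class."""
    return [lam * pw + (1.0 - lam) * pc for pw, pc in zip(p_word, p_class)]

def perplexity(probs):
    """Perplexity is exp of the average negative log probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def best_lambda(p_word, p_class, steps=20):
    """Grid-search the mixture weight that minimizes held-out perplexity."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_word, p_class, lam)))

# Made-up probabilities that each model assigns to four held-out words.
p_word = [0.20, 0.01, 0.30, 0.02]
p_class = [0.05, 0.10, 0.10, 0.08]

lam = best_lambda(p_word, p_class)
print(lam, perplexity(interpolate(p_word, p_class, lam)))
```

Because the grid includes the weights 0 and 1, the mixture chosen this way
can never have higher held-out perplexity than either model alone. That is
the reason to interpolate rather than swap one model for the other.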

--Andreas


In message <3D74D4C4.2A5F65BB at inzigo.com> you wrote:
> Hi, Crouching tigers & hidden dragons:
> 
> I am using a word-based trigram (GT backoff) for an application, and
> am trying to make further improvements. I tried a class-based LM, but it
> did not seem as good as the word-based one. A higher-order model (4gram)
> also seems worse than the 3gram. The WER (word error rate) I get now is
> about 8-10%, so it seems there is still some room for improvement. Does
> anyone have good ideas, staying within ngram models? Thanks in advance.
> 
> Hongqin Liu
> 
> 



