[SRILM User List] class based language model

Shammur Absar Chowdhury shammurchowdhury at gmail.com
Wed Jun 6 03:36:50 PDT 2012


Thank you, sir, for your help.

I have another, probably very silly, question.
After obtaining the probability distribution over words, I built another
language model. When I compared it with my previous LM (where I used my
class definition with no [p] values), I found no difference between the
two.
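
For context, a class-based n-gram model is usually described as factoring a
word probability into a class n-gram term and a class expansion term,
P(w | h) = P(c | h) * P(w | c). The sketch below only illustrates that
factorization with made-up numbers; the names word_prob, class_ngram,
uniform, and estimated are hypothetical, and none of this is SRILM code or
SRILM output. It shows the same class n-gram combined with two different
expansion tables.

```python
# Illustrative only: toy numbers, not SRILM output.
# P(word | history) = P(class | history) * P(word | class)

def word_prob(word, history, class_ngram, expansions, word2class):
    """Probability of `word` after `history` under a class-based bigram."""
    cls = word2class[word]
    return class_ngram[(history, cls)] * expansions[cls][word]

# Hypothetical class membership: each word belongs to exactly one class.
word2class = {"boston": "CITY", "denver": "CITY"}

# N-gram over class tokens, as it would be estimated from class-tagged text.
class_ngram = {("to", "CITY"): 0.5}

# Two hypothetical expansion tables for the same class definition.
uniform = {"CITY": {"boston": 0.5, "denver": 0.5}}
estimated = {"CITY": {"boston": 0.75, "denver": 0.25}}

p_uniform = word_prob("boston", "to", class_ngram, uniform, word2class)
p_estimated = word_prob("boston", "to", class_ngram, estimated, word2class)
print(p_uniform, p_estimated)
```

Under this view, the class n-gram term is identical in both cases; any
difference would come only from the expansion table, and only once it is
applied to word-level text.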

I may have misunderstood the basic theory [I have only read about it in
books], or I may be doing something wrong in one of the steps.

The steps I am currently following:

[1] replace-words-with-classes classes=atis_sphinx.def addone=1 normalize=1 \
        outfile=countExpansion compound_LM.txt

[2] replace-words-with-classes classes=countExpansion compound_LM.txt \
        > output_text_with_classes

[3] ngram-count -text output_text_with_classes classes=countExpansion \
        -lm class_based_model_2.lm

I also tried:

    ngram-count -text output_text_with_classes -lm class_based_model_2.lm
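
As a rough illustration of what step [1] is meant to estimate, the sketch
below counts how often each class member occurs in the text, adds a constant
to every count, and normalizes within each class. It assumes single-word
expansions and that each word belongs to exactly one class. The function
estimate_expansions and the toy data are hypothetical; this does not
reproduce the script's actual behavior (for example, multi-word expansions
or tie-breaking between classes).

```python
from collections import Counter

def estimate_expansions(classes, tokens, addone=1):
    """Estimate P(word | class) with additive smoothing.

    classes: dict mapping a class name to its list of member words.
    tokens:  list of word tokens from the training text.
    addone:  constant added to every member's count before normalizing.
    """
    counts = Counter(tokens)
    probs = {}
    for cls, words in classes.items():
        # Counter returns 0 for unseen words, so every member gets >= addone.
        smoothed = {w: counts[w] + addone for w in words}
        total = sum(smoothed.values())
        probs[cls] = {w: c / total for w, c in smoothed.items()}
    return probs

# Toy class definition and text (assumptions for illustration only).
classes = {"CITY": ["boston", "denver", "dallas"]}
tokens = "flights from boston to denver and boston to dallas".split()

expansions = estimate_expansions(classes, tokens)
for word, p in expansions["CITY"].items():
    print(word, round(p, 4))
```

With addone=0 the estimates reduce to relative frequencies, and a class
whose members never occur in the text would then need separate handling to
avoid dividing by zero.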

Please tell me where I am going wrong.

And I am really sorry for the stupid question.

Thank you.

On Wed, Jun 6, 2012 at 12:26 AM, Andreas Stolcke
<stolcke at icsi.berkeley.edu> wrote:

>  You can build class-based LMs using your own class assignments.
>
> Step 2 works with a classfile with or without probabilities (the probs are
> optional in the format).
>
> For step 3, you need some probability distribution over the words to
> obtain a proper language model.
> For example, use the "uniform-classes" script to insert uniform
> probabilities for those class assignments that don't have any.
> If you have a large training set, you can run
>
>     replace-words-with-classes classes=<classfile> addone=1 normalize=1 \
>         outfile=OUTFILE TEXTFILE
>
> to count the number of times each word occurs and estimate class expansion
> probabilities (written to OUTFILE).
>
> Andreas
>
>
> On 6/5/2012 1:37 AM, Shammur Absar Chowdhury wrote:
>
> Hello
>
> I am new to SRILM and have only recently started learning about language
> models. My aim is to build a class-based language model from a given
> class definition.
>
> So far I have used the three commands below, taken from
> http://www.speech.sri.com/pipermail/srilm-user/2010q1/000843.html
>
>
> 1. ngram-class -vocab vocab.txt \
>             -text LM.txt \
>             -numclasses 16 \
>             -classes classfile
> 2. replace-words-with-classes classes=classfile LM.txt \
>             > Output_text_with_classes
> 3. ngram-count  -text Output_text_with_classes   -lm Class_based_model
>
>
> As far as I understand, the first command induces the classes. What if I
> want SRILM to use my own class tags and their word lists to build the
> class model; how would I do that? I tried formatting my classes in the
> class format and then running the second step, but the format expects a
> probability p for each entry, which I cannot assign in my manually
> created class file.
>
> Could anyone please help me, point me in the right direction, or suggest
> some reading?
> Thank you.
>
> Shammur Absar Chowdhury


-- 
Shammur Absar Chowdhury