[SRILM User List] disambig with FLM

Reham Al-Majed reham.imamu at gmail.com
Tue Mar 6 13:11:06 PST 2012


Thanks a lot for your reply.


I'm trying to build an FLM with the following FLM specification file:

## normal trigram LM
1
W : 2 W(-1) W(-2) FLMCount.count  FLMLM.lm  3
W1,W2 W2 wbdiscount  interpolate
W1 W1 wbdiscount  interpolate
0 0 wbdiscount
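
For reference, here is a minimal Python sketch of how the specification file above is laid out, assuming the usual FLM layout: an integer count of models first, then for each model a header line (child factor, number of parents, the parents, count file, LM file, number of backoff-graph nodes) followed by one line per node. The function name parse_flm_spec and the dictionary keys are hypothetical and chosen only for illustration; this is not SRILM's own parser.

```python
# Sketch only: a reading of the FLM spec layout, not SRILM's parser.
SPEC = """\
## normal trigram LM
1
W : 2 W(-1) W(-2) FLMCount.count  FLMLM.lm  3
W1,W2 W2 wbdiscount  interpolate
W1 W1 wbdiscount  interpolate
0 0 wbdiscount
"""


def parse_flm_spec(text):
    """Return a list of model descriptions found in an FLM spec string."""
    # Drop blank lines and comment lines (assumed to start with '#').
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    # The first remaining line is the integer number of models.
    n_models = int(lines[0])
    models, pos = [], 1
    for _ in range(n_models):
        child, rest = lines[pos].split(":", 1)
        fields = rest.split()
        n_parents = int(fields[0])
        parents = fields[1:1 + n_parents]
        count_file, lm_file, n_nodes = fields[1 + n_parents:4 + n_parents]
        n_nodes = int(n_nodes)

        # One line per backoff-graph node: node, factor to drop, options.
        nodes = []
        for ln in lines[pos + 1:pos + 1 + n_nodes]:
            node, drop, *options = ln.split()
            nodes.append({"node": node, "drop": drop, "options": options})

        models.append({
            "child": child.strip(),
            "parents": parents,
            "count_file": count_file,
            "lm_file": lm_file,
            "nodes": nodes,
        })
        pos += 1 + n_nodes
    return models


for model in parse_flm_spec(SPEC):
    print(model["child"], model["parents"], model["count_file"], model["lm_file"])
    for node in model["nodes"]:
        print("  ", node)
```

Under this reading, the "1" on the first non-comment line is the number of FLM specifications, and the three lines after the header are the backoff-graph nodes for the trigram, bigram, and unigram levels.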


I generate my FLM model using the following command:

fngram-count -factor-file FLMDes -debug 2 -text TrainFLM  -lm FLMLM .lm
-write-counts FLMcount.count -no-virtual-begin-sentence -nonull

It runs without errors. I then measure the perplexity (ppl) of the generated
FLM with the following command:

fngram -factor-file FLMDes -debug 2 -ppl FLMTest -nonull


Unfortunately, when I tried to test the main step, I got an error. I
searched the mailing list archive but didn't find a similar problem.

The command I used to test disambig with my FLM model was:

 disambig -text FLMTest -map 3.map -factored -lm FLMLM.lm

The output of this command was:

No known factors found in Aa
No known factors found in AA
No known factors found in aa
No known factors found in Bb
No known factors found in bb
No known factors found in BB
No known factors found in CC
No known factors found in cc
No known factors found in Cc
FLMLM.lm: line 2: Error: couldn't form int for number of factored LMs in
when reading FLM spec file


I don't know what is meant by "No known factors found in ......".

I also wonder about the error message "couldn't form int for number of
factored LMs in when reading FLM spec file". As you can see above in
my FLM specification file, I did specify the number of FLM specifications!



Some notes that may help you solve my problem:

-- I built my model to test disambig with an FLM before using it in my
project, so it was built with training data of only 28 sentences (138 words).

-- The mapping file (named 3.map) used to test disambig was:
W-aa Aa 0.5  AA 0.4 aa 0.1
W-bb Bb 0.6 bb 0.1 BB 0.3
W-cc CC 0.7 cc 0.1 Cc 0.2

-- The FLMTest file contains only one sentence:
<s>  W-aa W-bb W-cc </s>
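
To make the test setup above concrete, here is a small Python sketch that parses the 3.map contents, checks that each word's candidate probabilities sum to 1, and lists the candidate sequences for the test sentence. It is only an illustration under stated assumptions: every candidate is followed by an explicit probability (as in 3.map), tokens missing from the map (here <s> and </s>) are assumed to map to themselves, and hypotheses are ranked by map probabilities alone because no LM scores are available here. disambig itself combines map and LM scores and does not enumerate sequences exhaustively. The names parse_map and candidates are hypothetical.

```python
from itertools import product
from math import isclose, prod

MAP_TEXT = """\
W-aa Aa 0.5  AA 0.4 aa 0.1
W-bb Bb 0.6 bb 0.1 BB 0.3
W-cc CC 0.7 cc 0.1 Cc 0.2
"""
SENTENCE = "<s>  W-aa W-bb W-cc </s>"


def parse_map(text):
    """Parse lines of the form: word cand1 prob1 cand2 prob2 ..."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        word, rest = fields[0], fields[1:]
        if len(rest) % 2 != 0:
            # Assumption of this sketch: probabilities are always given.
            raise ValueError(f"expected candidate/probability pairs: {line!r}")
        table[word] = [(rest[i], float(rest[i + 1]))
                       for i in range(0, len(rest), 2)]
    return table


def candidates(mapping, token):
    """Candidates for a token; unmapped tokens map to themselves (assumed)."""
    return mapping.get(token, [(token, 1.0)])


mapping = parse_map(MAP_TEXT)

# Sanity check: the probabilities for each word should sum to 1.
for word, cands in mapping.items():
    total = sum(p for _, p in cands)
    print(word, cands, "sums to 1:", isclose(total, 1.0))

# Enumerate every candidate sequence and score it by map probabilities only.
tokens = SENTENCE.split()
hypotheses = []
for combo in product(*(candidates(mapping, t) for t in tokens)):
    words = tuple(w for w, _ in combo)
    score = prod(p for _, p in combo)
    hypotheses.append((score, words))
hypotheses.sort(reverse=True)

print(len(hypotheses), "candidate sequences")
for score, words in hypotheses[:3]:
    print(round(score, 4), " ".join(words))
```

With three candidates for each of the three mapped words, the search space for this sentence is 3 x 3 x 3 = 27 sequences.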



Am I doing something wrong?


Your help and support are greatly appreciated. I have a graduation
project that needs a disambiguator for a highly inflected language, and I'm
worried that I might not be able to use your disambig program with an FLM model.






Best regards,
Reham



On 5 March 2012 21:09, Andreas Stolcke <stolcke at icsi.berkeley.edu> wrote:

>  On 3/5/2012 7:17 AM, Reham Al-Majed wrote:
>
>
>
>> Hello,
>>
>> I've built a class-based n-gram model by:
>>
>> 1- defining my classes
>> 2- using replace-words-with-classes
>> 3- using ngram-count to estimate the LM
>>
>> I want to use this class-based n-gram model with the disambig tool. The
>> options -factored and -count-lm interpret the LMs as factored and
>> count-based LMs. What about class-based LMs? How do I tell disambig to
>> interpret the LM as class-based?
>>
>> I'm trying to use my class-based model as a regular n-gram model; however,
>> the output for a sample test seems strange: words in the test sample are
>> always disambiguated using the last word in the mapping file!
>>
>> Actually, I want the words to be disambiguated using the LM probabilities
>> only, without considering the probabilities in the mapping file. I use the
>> options -lmw 1 and -mapw 0, but the output is still the same.
>>
>>
>> In short, my questions are:
>>
>> 1- Is it possible to use a class-based n-gram model with the disambig tool?
>> Or should I build my own disambiguator using the output of the ngram tool?
>>
>
> Unfortunately disambig currently does not support the use of class-based
> ngram LMs (what is implemented by ngram -classes).
> Two workarounds are
> 1) if feasible, expand the class-ngram LM into a word-ngram LM (using
> ngram -expand-classes).
> 2) rewrite the class-ngram as a factored LM. This will require some
> investment into understanding the much more general FLM mechanism.
>
>
>
>
>> 2- How do I make the disambig tool use the LM probabilities ONLY?
>>
>
> disambig -mapw 0 will do that.
>
> Andreas
>
>
>