[SRILM User List] disambig with FLM
reham.imamu at gmail.com
Tue Mar 6 13:11:06 PST 2012
Thanks a lot for your reply.
I'm trying to build an FLM with the following FLM specification file:
## normal trigram LM
W : 2 W(-1) W(-2) FLMCount.count FLMLM.lm 3
W1,W2 W2 wbdiscount interpolate
W1 W1 wbdiscount interpolate
0 0 wbdiscount
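To make the layout of the spec above easier to check, here is a minimal Python sketch that parses a single spec body. It assumes the field order described in the SRILM FLM documentation: child factor, a colon, the number of parents, the parent list, the count file, the LM file, the number of backoff-graph nodes, and then one line per node (parents in the node, parents that may be dropped, then discounting options). `parse_flm_spec`, `FlmSpec` and `BackoffNode` are hypothetical names; this is not SRILM's own reader.

```python
from dataclasses import dataclass, field


@dataclass
class BackoffNode:
    parents: str        # parents present in this node, e.g. "W1,W2" ("0" = no parents)
    drop: str           # parent(s) that may be dropped next, e.g. "W2"
    options: list[str]  # discounting and other options, e.g. ["wbdiscount", "interpolate"]


@dataclass
class FlmSpec:
    child: str
    parents: list[str]
    count_file: str
    lm_file: str
    nodes: list[BackoffNode] = field(default_factory=list)


def parse_flm_spec(text: str) -> FlmSpec:
    """Parse one FLM spec (header line + node lines); lines starting with '##' are comments."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("##")]
    header = lines[0].split()
    # Assumed header layout: CHILD : NUM_PARENTS PARENT... COUNT_FILE LM_FILE NUM_NODES
    child, colon, n_parents = header[0], header[1], int(header[2])
    if colon != ":":
        raise ValueError("expected ':' after the child factor")
    parents = header[3:3 + n_parents]
    count_file = header[3 + n_parents]
    lm_file = header[4 + n_parents]
    n_nodes = int(header[5 + n_parents])
    spec = FlmSpec(child, parents, count_file, lm_file)
    for ln in lines[1:1 + n_nodes]:
        fields = ln.split()
        spec.nodes.append(BackoffNode(fields[0], fields[1], fields[2:]))
    if len(spec.nodes) != n_nodes:
        raise ValueError(f"header promises {n_nodes} nodes, found {len(spec.nodes)}")
    return spec
```

Run on the spec above, this yields child `W`, parents `W(-1)` and `W(-2)`, LM file `FLMLM.lm`, and three backoff nodes, which confirms the header's node count matches the node lines.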
I generate my FLM model using the following command :
fngram-count -factor-file FLMDes -debug 2 -text TrainFLM -lm FLMLM.lm -write-counts FLMcount.count -no-virtual-begin-sentence -nonull
It runs without errors. I then measured the perplexity of the generated FLM
with the following command:
fngram -factor-file FLMDes -debug 2 -ppl FLMTest -nonull
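For reading the -ppl output, a small sketch of how SRILM's ngram-family tools are commonly documented as deriving the two reported figures from the total log10 probability (fngram reports in the same format, as far as I know): `ppl` counts end-of-sentence tokens as predicted events, `ppl1` excludes them, and OOVs and zero-probability words are left out of both. The function name `srilm_ppl` is hypothetical and the numbers below are made up.

```python
def srilm_ppl(logprob10: float, sentences: int, words: int,
              oovs: int = 0, zeroprobs: int = 0) -> tuple[float, float]:
    """Return (ppl, ppl1) from a total log10 probability, SRILM-style (assumed convention)."""
    denom_ppl1 = words - oovs - zeroprobs  # scored words only
    denom_ppl = denom_ppl1 + sentences     # plus one </s> per sentence
    return 10 ** (-logprob10 / denom_ppl), 10 ** (-logprob10 / denom_ppl1)
```

For example, one sentence of three words with a total logprob of -4 gives ppl = 10^(4/4) = 10 and ppl1 = 10^(4/3), about 21.5.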
Unfortunately, when I tried the main step, I got an error :( I searched the
mailing list archive but didn't find a similar problem.
The command I used to test disambig with my FLM model was:
disambig -text FLMTest -map 3.map -factored -lm FLMLM.lm
The output of this command was:
No known factors found in Aa
No known factors found in AA
No known factors found in aa
No known factors found in Bb
No known factors found in bb
No known factors found in BB
No known factors found in CC
No known factors found in cc
No known factors found in Cc
FLMLM.lm: line 2: Error: couldn't form int for number of factored LMs in when reading FLM spec file
I don't know what "No known factors found in ..." means.
I also wonder about the error message "couldn't form int for number of
factored LMs in when reading FLM spec file". As you can see above, my FLM
specification file does specify the number of FLM specifications!
Some notes that may help you diagnose my problem:
-- I built this model only to test disambig with an FLM before using it in my
project, so it was trained on only 28 sentences (138 words).
-- The mapping file (named 3.map) used to test disambig was:
W-aa Aa 0.5 AA 0.4 aa 0.1
W-bb Bb 0.6 bb 0.1 BB 0.3
W-cc CC 0.7 cc 0.1 Cc 0.2
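A sketch of reading a map file like this one, assuming the layout shown here: the ambiguous word first, then its alternatives, each optionally followed by a probability. `parse_map` and `all_alternatives` are hypothetical helpers, not part of SRILM; one limitation is that an alternative that itself looks numeric would be misread as a probability. Listing the alternatives makes it easy to compare them against the words named in the "No known factors found" warnings.

```python
from typing import Dict, List, Optional, Set, Tuple

MapEntry = List[Tuple[str, Optional[float]]]


def _is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_map(text: str) -> Dict[str, MapEntry]:
    """Parse 'word alt1 [p1] alt2 [p2] ...' lines into {word: [(alt, prob or None), ...]}."""
    mapping: Dict[str, MapEntry] = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        word, rest = tokens[0], tokens[1:]
        alts: MapEntry = []
        i = 0
        while i < len(rest):
            prob = None
            # A numeric token right after an alternative is taken as its probability.
            if i + 1 < len(rest) and _is_number(rest[i + 1]):
                prob = float(rest[i + 1])
            alts.append((rest[i], prob))
            i += 1 if prob is None else 2
        mapping[word] = alts
    return mapping


def all_alternatives(mapping: Dict[str, MapEntry]) -> Set[str]:
    """Collect every alternative word that appears anywhere in the map."""
    return {alt for alts in mapping.values() for alt, _ in alts}
```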
-- The FLMTest contains only one sentence:
<s> W-aa W-bb W-cc </s>
Am I doing something wrong?
Your help and support are greatly appreciated. I have a graduation project
that needs a disambiguator for a highly inflected language, and I'm worried
that I won't be able to use your disambig program with an FLM :(
On 5 March 2012 21:09, Andreas Stolcke <stolcke at icsi.berkeley.edu> wrote:
> On 3/5/2012 7:17 AM, Reham Al-Majed wrote:
>> Hello,
>> I've built a class-based n-gram model by:
>> 1- define my classes
>> 2- use replace-words-with-classes
>> 3- use ngram-count to estimate the LM
>> I want to use this class-based n-gram model with the disambig tool. The
>> options -factored and -count-lm make disambig interpret the LM as a
>> factored or count-based LM. What about class-based LMs? How do I tell
>> disambig to interpret the LM as class-based?
>> I tried using my class-based LM as an ordinary n-gram model, but the
>> output for a sample test seems strange: words in the test sample are
>> always disambiguated using the last word in the mapping file!
>> Actually, I want the words to be disambiguated using the LM probabilities
>> only, without considering the probabilities in the mapping file. I used
>> the options -lmw 1 and -mapw 0, but the output is still the same.
>> In short, my questions are:
>> 1- Is it possible to use a class-based n-gram with the disambig tool? Or
>> should I build my own disambiguator using the output of the ngram tool?
> Unfortunately disambig currently does not support the use of class-based
> ngram LMs (what is implemented by ngram -classes).
> Two workarounds are
> 1) if feasible, expand the class-ngram LM into a word-ngram LM (using
> ngram -expand-classes).
> 2) rewrite the class-ngram as a factored LM. This will require some
> investment into understanding the much more general FLM mechanism.
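Workaround 1 rests on the fact that, when every word belongs to exactly one class, a class bigram assigns P(w2 | w1) = P(C(w2) | C(w1)) * P(w2 | C(w2)). A toy sketch of that expansion, assuming a fully specified class-bigram table with no backoff, single-word class members, and made-up classes and numbers; `expand_class_bigram` is a hypothetical helper, and ngram -expand-classes handles much more (backoff, higher orders, multiword expansions). This is not its implementation.

```python
from itertools import product


def expand_class_bigram(class_bigram, membership):
    """Expand a class bigram into a word bigram (toy version, no backoff).

    class_bigram: {(c_prev, c_next): P(c_next | c_prev)}
    membership:   {word: (class, P(word | class))}, each word in exactly one class
    Returns {(w_prev, w_next): P(w_next | w_prev)} for every pair of words.
    """
    word_bigram = {}
    for (w1, (c1, _)), (w2, (c2, p_w2_given_c2)) in product(membership.items(), repeat=2):
        # Class transition probability times the in-class word probability.
        p_class = class_bigram.get((c1, c2), 0.0)
        word_bigram[(w1, w2)] = p_class * p_w2_given_c2
    return word_bigram
```

With NOUN = {cat 0.6, dog 0.4}, VERB = {runs 1.0} and P(VERB | NOUN) = 0.9, P(NOUN | NOUN) = 0.1, this gives P(runs | cat) = 0.9 and P(dog | cat) = 0.04, and the expanded probabilities after "cat" still sum to 1.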
>> 2- How to make disambig tool use the probabilities of LM ONLY ?
> disambig -mapw 0 will do that.
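To make the effect of -mapw 0 concrete, here is a toy bigram Viterbi decoder. It assumes that disambig scores each hidden word sequence log-linearly as lmw * log P_LM + mapw * log P_map, which is what the option names suggest; `viterbi_disambig` and the data are hypothetical, the model has no smoothing or end-of-sentence handling, and this is not disambig's source.

```python
import math


def viterbi_disambig(observed, mapping, bigram, lmw=1.0, mapw=1.0, start="<s>"):
    """Toy bigram Viterbi over map alternatives (hypothetical helper).

    observed: list of ambiguous input words
    mapping:  {word: [(alternative, P_map), ...]}
    bigram:   {(prev_alternative, alternative): P_LM(alternative | prev_alternative)}
    Path score = sum over positions of lmw*log P_LM + mapw*log P_map.
    """
    best = {start: (0.0, [])}  # last hidden word -> (best score, best path)
    for word in observed:
        new_best = {}
        for alt, p_map in mapping[word]:
            # With mapw == 0 the map term vanishes, so P_map plays no role.
            map_term = mapw * math.log(p_map) if mapw else 0.0
            for prev, (score, path) in best.items():
                p_lm = bigram.get((prev, alt), 0.0)
                if p_lm <= 0.0:
                    continue  # toy model: unseen transitions are skipped
                s = score + lmw * math.log(p_lm) + map_term
                if alt not in new_best or s > new_best[alt][0]:
                    new_best[alt] = (s, path + [alt])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]
```

With a map entry W-aa -> Aa 0.9, aa 0.1 and an LM that prefers "aa" after <s> (0.8 vs 0.2), the default weights pick "Aa" because the map outweighs the LM, while mapw=0 picks "aa" because only the LM decides.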