Fw: From logproba on sentences to logproba on words
Amin Mantrach
amantrac at ulb.ac.be
Tue Jan 29 10:35:25 PST 2008
My earlier question apparently received no answer, so I will reformulate
it to make it clearer.
I want to create a language model with the command:
ngram-count -text textfile -lm lmfile
In my case, I already have the log-probability of occurrence of every
sentence, i.e. the same values that can be obtained from:
ngram -lm lm_file -debug 1 -ppl testfile
What do I want to do? Build a new LM file from the sentence
probabilities I have.
Current ideas:
1 / Produce a text file containing the sentences, in which each sentence
appears exactly n times, where
n = round(exp(log-proba of the sentence) * 1000).
Then simply run: ngram-count -text newtextsentences -lm new_lm
2 / Produce a count file (with only the counts needed, i.e. those of the
highest order, etc.) and, for each n-gram, multiply its number of
occurrences by the sum of the probabilities of the sentences it belongs to.
This method is clearly not fair.
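To make the two ideas concrete, here is a minimal Python sketch; it is not SRILM code. Everything in it is an assumption made for illustration: the function names, the `scale` and `base` parameters, and the input format (a dict mapping each sentence to its log-probability). The base defaults to 10 because, as far as I know, SRILM reports base-10 log-probabilities; set `base=math.e` if the values are natural logs, as the exp() in idea 1 suggests. For idea 2 the sketch computes expected counts, where each occurrence is weighted by the probability of its own sentence. This is a variant of what is described above, not the same thing. Idea 1 is then just a rounded version of these expected counts multiplied by the scale factor.

```python
import math
from collections import Counter


def replication_counts(sentence_logprobs, scale=1000, base=10.0):
    """Idea 1: number of copies of each sentence to write to the text file.

    sentence_logprobs: dict {sentence string: log-probability}.
    scale and base are illustrative assumptions (see the text above).
    """
    return {s: round(scale * base ** lp) for s, lp in sentence_logprobs.items()}


def expected_ngram_counts(sentence_logprobs, order=3, base=10.0):
    """Variant of idea 2: fractional n-gram counts.

    Each n-gram occurrence contributes the probability of the sentence it
    occurs in. <s> and </s> mark the sentence boundaries.
    """
    counts = Counter()
    for sentence, lp in sentence_logprobs.items():
        weight = base ** lp
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += weight
    return counts


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    data = {"a b": -1.0, "a c": -2.0}
    for sentence, copies in replication_counts(data).items():
        print(copies, "copies of:", sentence)
    for ngram, count in sorted(expected_ngram_counts(data, order=2).items()):
        print(" ".join(ngram), count)
```

Sentences whose scaled probability rounds to zero vanish entirely under idea 1, so the scale factor has to be chosen with the smallest probability in mind. The fractional counts avoid that rounding, but whether ngram-count can estimate a model from non-integer counts depends on the discounting method, which should be checked in its manual page.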
Can you tell me whether either of these ideas is correct? If not, how
should I proceed?
I hope the question is now clear enough.
Thanks a lot for your help.
Amin.
More information about the SRILM-User
mailing list