[SRILM User List] customizing the ngram output

Somayeh Bakhshaei s.bakhshaei at yahoo.com
Fri Jan 22 03:48:56 PST 2010


(I must have sent it with the wrong subject, excuse me!)

I am using this command:
 ./../srilm/bin/i686/ngram -lm my.lm -ppl test -debug 2 > out

but the output is not in the format I want. Is there a solution, or must I write post-processing code to change it?

1. I want each 3-gram of the test text with its probability and/or perplexity in this form:

0.000693052 p( bro | <s> ) 
0.000224209 p( <unk> | bro ...)

2. Also the same form for sentences:

392.676 bro mano dari mikesh

3. I don't want unknown words to be changed to <unk>. How can I keep the original word instead?

4. Is it possible to get the n-worth 3-grams (and sentences) of the test text?
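
To make the request concrete, here is a minimal post-processing sketch in Python, not an SRILM feature. It assumes the usual shape of `-ppl -debug 2` output: each sentence echoed on its own line with the original words, followed by indented lines such as `p( bro | <s> ) = [2gram] 0.000693052 [ -3.15923 ]`, then a statistics line and a line containing `ppl= 392.676`. The names `reformat` and `n_lowest` are hypothetical, and "n-worth" is read here as "the n lowest-scoring entries", which is only a guess.

```python
import re

# Assumed word line: "\tp( bro | <s> ) \t= [2gram] 0.000693052 [ -3.15923 ]"
WORD_LINE = re.compile(r"^\s*p\( (\S+) \| ([^)]*)\)\s*=\s*\[[^\]]*\]\s+(\S+)")
# Assumed statistics line: "1 sentences, 4 words, 0 OOVs" (optionally "file X: ...")
STATS_LINE = re.compile(r"^(file .*: )?\d+ sentences, \d+ words, \d+ OOVs")
# Assumed perplexity field: "... logprob= -12.9 ppl= 392.676 ppl1= ..."
PPL_FIELD = re.compile(r"\bppl= (\S+)")


def reformat(lines):
    """Turn debug-2 output lines into 'prob p( w | ctx )' and 'ppl sentence' rows."""
    out, sentence, tokens, idx = [], None, [], 0
    for raw in lines:
        line = raw.rstrip("\n")
        if STATS_LINE.match(line):
            continue
        m = WORD_LINE.match(line)
        if m:
            word, context, prob = m.groups()
            # Item 3: restore the original word by position in the echoed sentence.
            if word == "<unk>" and idx < len(tokens):
                word = tokens[idx]
            idx += 1
            out.append(f"{prob} p( {word} | {context})")
            continue
        m = PPL_FIELD.search(line)
        if m:
            # Item 2: emit the sentence with its perplexity; the file-level
            # summary has no pending sentence and is skipped.
            if sentence is not None:
                out.append(f"{m.group(1)} {sentence}")
                sentence = None
            continue
        if line.strip():
            # Anything else is taken to be the echoed sentence.
            sentence = line.strip()
            tokens, idx = sentence.split(), 0
    return out


def n_lowest(rows, n, ngrams=True, reverse=False):
    """Item 4 (one reading): the n rows with the smallest leading score."""
    chosen = [r for r in rows if (" p( " in r) == ngrams]
    # Assumes the leading field is numeric; a non-numeric value would raise here.
    chosen.sort(key=lambda r: float(r.split(" ", 1)[0]), reverse=reverse)
    return chosen[:n]
```

With the output saved in `out` as above, something like `rows = reformat(open("out"))` followed by `print("\n".join(rows))` would print the rows. Passing `reverse=True` to `n_lowest` selects the highest scores instead, which is probably what is wanted for sentence perplexities.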

best regards,

