[SRILM User List] Using SRILM for text classification

Ali Asghar Toraby Parizy aliasghar.toraby at gmail.com
Sun Jun 3 12:40:15 PDT 2012


Hi,
Thanks for your reply.
I'm trying to use the ngram program to compute perplexity for several files
in a directory. As you suggested, I'm building a simple shell script for
that. ngram prints a lot of output, but I only need the perplexity as a
number, so that I can save it for every model in a loop and then compare
the numbers. Something like this:

for j in $models
do
    echo "model: $j"
    ngram -lm "$j" -ppl "$i"    # $i is the test file being scored
done

How can I adjust ngram to print only a number instead of this kind of
output:

file testfiles/test.test: 427 sentences, 2433 words, 1184 OOVs
0 zeroprobs, logprob= -5075.52 ppl= 1067.47 ppl1= 11578.9

I need only the number 1067.47 in this case!
Thanks for your help in advance.
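
A minimal sketch of this kind of postprocessing, assuming the summary always has the form shown above, with the perplexity as the field right after "ppl=". The helper name extract_ppl is made up for illustration and is not part of SRILM. It keeps the last "ppl=" value it sees, so that a final summary wins over any earlier lines that also carry a "ppl=" field.

```shell
# extract_ppl (hypothetical helper, not an SRILM tool):
# read `ngram -ppl` output on stdin and print only the number that
# follows the last "ppl=" field.
extract_ppl() {
    awk '{
        # Scan every field; "ppl1=" does not match because the
        # comparison is against the whole field.
        for (k = 1; k < NF; k++)
            if ($k == "ppl=") ppl = $(k + 1)
    }
    END { if (ppl != "") print ppl }'
}

# Demonstration on the sample output quoted in this message.
printf '%s\n' \
    'file testfiles/test.test: 427 sentences, 2433 words, 1184 OOVs' \
    '0 zeroprobs, logprob= -5075.52 ppl= 1067.47 ppl1= 11578.9' |
    extract_ppl
```

Inside the loop above, the ngram call could then be piped through the helper, for example: ngram -lm "$j" -ppl "$i" | extract_ppl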

On Sat, Jun 2, 2012 at 2:39 AM, Andreas Stolcke
<stolcke at icsi.berkeley.edu> wrote:

> On 6/1/2012 6:04 AM, Ali Asghar Toraby Parizy wrote:
>
>> Hi
>> I want to use SRILM for text classification. I've successfully compiled
>> SRILM and can reach its classes and utilities from my own project by
>> including the header files in the include folder and adding the libraries
>> in the lib folder.
>> I'm also familiar with concepts of language modeling and text
>> categorization but I don't know where to start for using srilm in this
>> regard.
>> I need to create some language models from the corpus that I have and
>> then guess the best model for a new text file using perplexity.
>> Can anybody give me an overview of the classes and utilities, or possibly a
>> document that explains the class hierarchy? I don't have enough time to
>> explore all the code to find out how to use it!
>>
> You probably don't need to link into the C++ API to do what you want.
> Instead, you can operate at the command line, train your LMs, and
> postprocess the output of
>
> ngram -debug 1 -ppl ...
>
> to obtain the model likelihoods on your test data.
>
> The file $SRILM/doc/lm-intro should contain all the info you need to get
> that going.
>
> Andreas
>
>