[SRILM User List] Errors while trying to use fngram-count, fngram with Estonian Tagged data

Asooryampasya asooryampasya at gmail.com
Fri Feb 7 05:14:06 PST 2014


Dear fellow users

I am trying to build a factored language model for Estonian data that has
been morphologically tagged with TreeTagger. The fngram-count program seems
to run without issues. However, when I use the fngram program to compute the
perplexity of a test sample, I get an error.

I found the same question asked here before (
http://www.speech.sri.com/pipermail/srilm-user/2011q3/001088.html), but I
could not find any reply to that email, so I am posting the question to the
list again.

Below are the error I get when running the fngram program and the contents
of the factor file I used with both fngram-count and fngram. Please let me
know if you need any more information.

The error:

***
w_g4_w1w2m1m2.count.gz: line 14172: malformed N-gram count or more than 100 words per line
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
warning: no singleton counts
GT discounting disabled
s_g4_w1w2m1m2.lm.gz: line 21: error, ngram line has invalid number (1) of fields, expecting either 2 or 3
format error in lm file
*******
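Since both messages name a specific line, it may help to look at exactly what is on that line. Below is a minimal Python sketch for doing that; the script and function names are hypothetical, and it assumes the files are UTF-8 (Estonian text in another encoding, such as Latin-1, will show undecodable bytes as U+FFFD rather than failing). It reads the file in binary mode so that line numbers are counted on "\n" only.

```python
import gzip
import sys


def inspect_line(path, target):
    """Return (text, fields) for 1-based line `target` of a plain or gzipped file."""
    opener = gzip.open if path.endswith(".gz") else open
    # Binary iteration splits on b"\n" only; a stray "\r" stays in the text.
    with opener(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            if lineno == target:
                # Assumes UTF-8; invalid bytes become U+FFFD instead of raising.
                text = raw.rstrip(b"\n").decode("utf-8", errors="replace")
                return text, text.split()
    raise ValueError("%s has fewer than %d lines" % (path, target))


def describe(path, target):
    """Print the field count and an escaped view of one line."""
    text, fields = inspect_line(path, target)
    print("%s line %d: %d whitespace-separated fields" % (path, target, len(fields)))
    # repr() makes tabs, carriage returns and replacement characters visible.
    print(repr(text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python inspect_line.py w_g4_w1w2m1m2.count.gz 14172
    describe(sys.argv[1], int(sys.argv[2]))
```

Printing the repr() rather than the raw line is deliberate: invisible characters such as tabs or carriage returns are a common reason a line has a different number of fields than expected.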

I am still new to factored models, and for now I am only using the example
settings given in the tutorial by Kirchhoff, Bilmes, and Duh.

Here is what my factor file looks like:

******
##word given word-1 word-2 morph-1 morph-2
1
W : 4 W(-1) W(-2) M(-1) M(-2) w_g4_w1w2m1m2.count.gz s_g4_w1w2m1m2.lm.gz 5
0b0111 0b0010 wbdiscount gtmin 4 interpolate
0b1101 0b1000 wbdiscount gtmin 3 interpolate
0b0101 0b0001 wbdiscount gtmin 2 interpolate
0b0100 0b0100 wbdiscount gtmin 1 interpolate
0b0000 0b0000 wbdiscount gtmin 1
******
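The backoff-graph lines are easier to check when each bitmask is decoded into factor names. A minimal Python sketch, assuming that the rightmost bit of a mask refers to the first parent listed on the "W : 4 ..." line and that every set bit in the drop mask produces one child node with that parent removed; PARENTS, NODES and the helper names are illustrative only.

```python
# Parents in the order they appear on the "W : 4 ..." line of the factor file.
PARENTS = ["W(-1)", "W(-2)", "M(-1)", "M(-2)"]

# (parent mask, drop mask) pairs copied from the five backoff-graph lines.
NODES = [
    ("0b0111", "0b0010"),
    ("0b1101", "0b1000"),
    ("0b0101", "0b0001"),
    ("0b0100", "0b0100"),
    ("0b0000", "0b0000"),
]


def factors_in(mask, parents=PARENTS):
    """Names of the parents whose bit is set (assumes rightmost bit = first parent)."""
    value = int(mask, 2)  # int() accepts the 0b prefix when base=2
    return [name for i, name in enumerate(parents) if (value >> i) & 1]


def children(parent_mask, drop_mask, parents=PARENTS):
    """Child masks, assuming each droppable parent yields one child without it."""
    parent, drop = int(parent_mask, 2), int(drop_mask, 2)
    width = 2 + len(parents)  # room for "0b" plus one digit per parent
    return [
        format(parent & ~(1 << i), "#0%db" % width)
        for i in range(len(parents))
        if (parent >> i) & 1 and (drop >> i) & 1
    ]


for parent_mask, drop_mask in NODES:
    print(parent_mask, factors_in(parent_mask) or ["(no parents)"],
          "drop", factors_in(drop_mask), "->", children(parent_mask, drop_mask))
```

Listing each node's children this way makes it easy to see which nodes are reachable from the first line of the graph, under the bit-order assumption stated above.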

My training data look like this:
<s>
W-Eksamitöö:M-S.com.pl.nom
W-I.:M-Y.nominal.?
W-Pange:M-V.main.imper.pres
W-sulgudes:M-S.com.pl.in
W-olevad:M-A.pos.pl.nom
W-sõnad:M-S.com.pl.nom
W-õigesse:M-A.pos.sg.ill
W-vormi:M-S.com.sg.adit
W-!:M-Z.Exc
W-Piret:M-S.prop.sg.nom
W-Toomet:M-S.prop.sg.abl
W-on:M-V.main.indic.pres.ps3
W-ettevõtlik:M-A.pos.sg.nom
W-naine:M-S.com.sg.nom
W-.:M-Z.Fst
</s>
******
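To confirm that every token follows the same W-...:M-... pattern, a small Python sketch can split each token into its factors. It assumes that factors are separated by ':' and that each factor is a tag and a value joined at the first '-', which matches how the tokens above are written; the function names and the expected tag set ("W", "M") are illustrative. A word that itself contains ':' (a time such as 12:30, for example) would be split incorrectly, and the sketch reports such tokens.

```python
def parse_factored_token(token):
    """Split e.g. 'W-sõnad:M-S.com.pl.nom' into {'W': 'sõnad', 'M': 'S.com.pl.nom'}."""
    factors = {}
    for part in token.split(":"):
        # Split at the first '-' only, so hyphenated words keep their hyphens.
        tag, sep, value = part.partition("-")
        if not sep:
            raise ValueError("factor %r in %r has no 'TAG-' prefix" % (part, token))
        factors[tag] = value
    return factors


def find_bad_tokens(tokens, expected_tags=("W", "M")):
    """Return tokens that fail to parse or do not carry exactly the expected tags."""
    bad = []
    for token in tokens:
        if token in ("<s>", "</s>"):
            continue  # sentence-boundary markers carry no factors, so skip them here
        try:
            factors = parse_factored_token(token)
        except ValueError:
            bad.append(token)
            continue
        if set(factors) != set(expected_tags):
            bad.append(token)
    return bad
```

To check a whole training file, one could call find_bad_tokens(line.split()) on each line and print any non-empty results together with the line number.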

Thanks,
Pasya.