[SRILM User List] Assertion failed: (!Map_noKeyP(key)) in LHash.cc -Error when using segment

Eeva Nikkari eevanikkari at gmail.com
Wed Nov 2 06:22:47 PDT 2016


Hi,

I'm trying out the SRILM toolkit to build a language model for sentence
segmentation. The n-gram model I built to test the tools is

minicorpus.lm

\data\
ngram 1=10
ngram 2=18

\1-grams:
-0.5862657      </s>
-99     <s>     -99
-1.431364       bark    -99
-0.9542425      birds   -99
-0.8293038      cats    -7.050447
-0.8293038      chase   -7.129629
-1.431364       chirp   -99
-0.8293038      dogs    -7.84033
-1.431364       meow    -99
-1.130334       the     -7.351478

\2-grams:
-0.544068       <s> cats
-0.243038       <s> dogs
-0.845098       <s> the
0       bark </s>
-0.1760913      birds </s>
-0.4771213      birds chirp
-0.30103        cats </s>
-0.60206        cats chase
-0.60206        cats meow
-0.30103        chase birds
-0.60206        chase cats
-0.60206        chase the
0       chirp </s>
-0.60206        dogs bark
-0.1249387      dogs chase
0       meow </s>
-0.30103        the birds
-0.30103        the cats

\end\



I built it from the following text (the same text I use to test segment):


minicorpus.txt


dogs chase cats
dogs bark
cats meow
dogs chase birds
cats chase birds
dogs chase the cats
the birds chirp
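
The bigram log-probabilities in minicorpus.lm match plain relative
frequencies computed from the seven sentences above. A minimal Python sketch
of that check follows; it is not part of SRILM, the function name
mle_bigram_logprobs is made up for illustration, and it assumes whitespace
tokenization with <s> and </s> added around each line and no smoothing:

```python
import math
from collections import Counter


def mle_bigram_logprobs(sentences):
    """Return unsmoothed log10 P(w2 | w1) for every bigram in the sentences.

    Hypothetical helper for illustration only; not an SRILM function.
    """
    history = Counter()  # how often each word occurs as a bigram history
    bigram = Counter()   # how often each (w1, w2) pair occurs
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history[w1] += 1
            bigram[(w1, w2)] += 1
    return {
        pair: math.log10(count / history[pair[0]])
        for pair, count in bigram.items()
    }


corpus = [
    "dogs chase cats",
    "dogs bark",
    "cats meow",
    "dogs chase birds",
    "cats chase birds",
    "dogs chase the cats",
    "the birds chirp",
]

logp = mle_bigram_logprobs(corpus)
for (w1, w2), value in sorted(logp.items()):
    print(f"{value:.7g}\t{w1} {w2}")
```

The printed values can be compared line by line with the \2-grams: section
of the model; for example, "<s> dogs" occurs in 4 of 7 sentences, and
log10(4/7) is about -0.243038.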




When I run segment on this text, I get the following error:



$ segment -order 2 -lm minicorpus.lm -text minicorpus.txt -continuous -debug 5
reading 10 1-grams
reading 18 2-grams
warning: p(w1) < p(<s> w1))
0: p(NOS) = 0, P(S) = 0.148148
1: p(NOS) = 0.111111, P(S) = 0
2: p(NOS) = 0.0277778, P(S) = 6.10653e-10
3: p(NOS) = 3.66393e-10, P(S) = 0.00793651
4: p(NOS) = 0.00198413, P(S) = 0
5: p(NOS) = 0, P(S) = 0.000566893
6: p(NOS) = 0.000141723, P(S) = 0
7: p(NOS) = 0, P(S) = 8.09848e-05
8: p(NOS) = 6.07386e-05, P(S) = 0
9: p(NOS) = 3.03693e-05, P(S) = 0
10: p(NOS) = 0, P(S) = 5.78463e-06
11: p(NOS) = 1.44616e-06, P(S) = 0
12: p(NOS) = 7.23079e-07, P(S) = 0
13: p(NOS) = 0, P(S) = 2.75459e-07
14: p(NOS) = 2.06594e-07, P(S) = 0
15: p(NOS) = 5.16485e-08, P(S) = 5.67708e-16
16: p(NOS) = 2.58243e-08, P(S) = 1.70313e-16
17: p(NOS) = 1.70313e-16, P(S) = 1.84459e-09
18: p(NOS) = 9.22294e-10, P(S) = 0
19: p(NOS) = 3.07431e-10, P(S) = 0
Assertion failed: (!Map_noKeyP(key)), function locate, file ../../include/LHash.cc, line 275.
Abort trap: 6


I get the same assertion failure (function locate, ../../include/LHash.cc,
line 275, followed by "Abort trap: 6") every time I use segment. I've tried
different texts and language models (different orders, smoothing methods,
and corpora). Is my model missing something? The man page says to use a
"standard backoff N-gram model in ARPA ngram-format(5)
<http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html>,
modeling segmentation using the boundary tags <s> and </s>", which to my
understanding minicorpus.lm is. I'm on macOS Sierra, version 10.12.1.
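
One part of the "is my model missing something" question can be checked
mechanically: whether the counts declared in the \data\ header agree with
the entries actually listed in each section. A minimal Python sketch of such
a check follows; it is not part of SRILM, the function name
check_arpa_counts is made up for illustration, and it assumes one n-gram
entry per non-blank line:

```python
import re
from collections import Counter


def check_arpa_counts(lines):
    """Return (declared, found) n-gram counts per order for ARPA-format text.

    Hypothetical helper for illustration only; it counts entries and does
    not validate probabilities or backoff weights.
    """
    declared = {}
    found = Counter()
    order = None  # order of the \N-grams: section currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "\\end\\":
            break
        header = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        if header and order is None:
            # "ngram N=count" lines in the \data\ header
            declared[int(header.group(1))] = int(header.group(2))
            continue
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            continue
        if order is not None:
            # any other non-blank line inside a section is one n-gram entry
            found[order] += 1
    return declared, dict(found)
```

Calling check_arpa_counts(open("minicorpus.lm")) on the model above should
report 10 unigrams and 18 bigrams for both the declared and the found
counts, so the header and the sections agree.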



When I run 'make test', the only test that fails is:

*** Running test make-ngram-pfsg ***
real 0m0.056s
user 0m0.048s
sys 0m0.016s
sed: RE error: illegal byte sequence
make-ngram-pfsg: stdout output DIFFERS.
make-ngram-pfsg: stderr output IDENTICAL.



I'd be thankful for any advice you can provide,

Eeva