<div dir="ltr">Hi,<div><br></div><div>I'm trying out the SRILM toolkit</div><div><br></div><div>I'm trying to build a language model for sentence segmentation. The ngram model I built to test the functions is</div><div><br></div><div>minicorpus.lm</div><div><br></div><div><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">\data\</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">ngram 1=10</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">ngram 2=18</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0);min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">\1-grams:</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.5862657 </s></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-99 <s> -99</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-1.431364 bark -99</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.9542425 birds -99</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.8293038 cats -7.050447</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.8293038 chase -7.129629</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-1.431364 chirp -99</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.8293038 dogs -7.84033</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-1.431364 meow -99</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-1.130334 the -7.351478</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0);min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">\2-grams:</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.544068 <s> cats</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.243038 <s> dogs</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.845098 <s> the</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">0 bark </s></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.1760913 birds </s></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.4771213 birds chirp</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.30103 cats </s></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.60206 cats chase</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.60206 cats meow</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.30103 chase birds</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.60206 chase cats</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.60206 chase the</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">0 chirp </s></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.60206 dogs bark</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.1249387 dogs chase</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">0 meow </s></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.30103 the birds</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-0.30103 the cats</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0);min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">\end\</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">from the text (same used to test the segment function)</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">minicorpus.txt</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">dogs chase cats</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">dogs bark</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">cats meow</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">dogs chase birds</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">cats chase birds</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">dogs chase the cats</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">the birds chirp</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">When I try using the segment function I get the following error</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p></div><div><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">$ segment -order 2 -lm minicorpus.lm -text minicorpus.txt -continuous -debug 5</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">reading 10 1-grams</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">reading 18 2-grams</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">warning: p(w1) < p(<s> w1)) </span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">0: p(NOS) = 0, P(S) = 0.148148</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">1: p(NOS) = 0.111111, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">2: p(NOS) = 0.0277778, P(S) = 6.10653e-10</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">3: p(NOS) = 3.66393e-10, P(S) = 0.00793651</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">4: p(NOS) = 0.00198413, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">5: p(NOS) = 0, P(S) = 0.000566893</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">6: p(NOS) = 0.000141723, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">7: p(NOS) = 0, P(S) = 8.09848e-05</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">8: p(NOS) = 6.07386e-05, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">9: p(NOS) = 3.03693e-05, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">10: p(NOS) = 0, P(S) = 5.78463e-06</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">11: p(NOS) = 1.44616e-06, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">12: p(NOS) = 7.23079e-07, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">13: p(NOS) = 0, P(S) = 2.75459e-07</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">14: p(NOS) = 2.06594e-07, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">15: p(NOS) = 5.16485e-08, P(S) = 5.67708e-16</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">16: p(NOS) = 2.58243e-08, P(S) = 1.70313e-16</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">17: p(NOS) = 1.70313e-16, P(S) = 1.84459e-09</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">18: p(NOS) = 9.22294e-10, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">19: p(NOS) = 3.07431e-10, P(S) = 0</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">Assertion failed: (!Map_noKeyP(key)), function locate, file ../../include/LHash.cc, line 275.</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">
</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">Abort trap: 6</span></p></div><div><span style="font-variant-ligatures:no-common-ligatures"><br></span></div><div><br></div><div>I get the </div><div><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">Assertion failed: (!Map_noKeyP(key)), function locate, file ../../include/LHash.cc, line 275.</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">Abort trap: 6</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures">-error every time I use the segment function. I've tried with different texts and language models (different orders, smoothing and corpora). Is my model missing something? The man page says to use "</span><span style="font-family:-webkit-standard;font-size:medium">standard backoff N-gram model in ARPA </span><a href="http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html" style="font-family:-webkit-standard">ngram-format(5)</a><span style="font-family:-webkit-standard;font-size:medium">, modeling segmentation using the boundary tags <s> and </s></span>", which to my understanding minicorpus.lm is. I use macOS Sierra Version 10.12.1 </p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">When I run the 'make test' the only test that fails is </p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">*** Running test make-ngram-pfsg ***</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">real<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>0m0.056s</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">user<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>0m0.048s</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">sys<span class="gmail-Apple-tab-span" style="white-space:pre"> </span>0m0.016s</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">sed: RE error: illegal byte sequence</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">make-ngram-pfsg: stdout output DIFFERS.</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><span style="font-variant-ligatures:no-common-ligatures"></span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">make-ngram-pfsg: stderr output IDENTICAL.</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)"><br></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">I'm be thankful for any advise you can provide,</p><p style="margin:0px;font-size:11px;line-height:normal;font-family:menlo;color:rgb(0,0,0)">Eeva</p></div></div>