Where have all the 3-grams gone?
Karl Weilhammer
weilkar at phonetik.uni-muenchen.de
Tue Mar 18 14:37:13 PST 2003
Hi Andreas,
while experimenting a little with SRILM, I found that ngram-count does not
enter trigrams that occur only once into the language model, although it
does so for bigrams. The command
echo "the man hit the ball" | ngram-count -order 3 -text - \
    -cdiscount3 0.5 -cdiscount2 0.5 -cdiscount1 0.5 -unk -lm test_C3gram.lm
results in the following language model:
__________________________________________
\data\
ngram 1=7
ngram 2=6
ngram 3=0
\1-grams:
-1.079181 </s>
-99 <s> -0.1760913
-0.3802113 <unk>
-1.079181 ball -0.2632414
-1.079181 hit -0.1760913
-1.079181 man -0.2632414
-0.60206 the -0.2218487
\2-grams:
-0.30103 <s> the
-0.30103 ball </s>
-0.30103 hit the
-0.30103 man hit
-0.60206 the ball
-0.60206 the man
\3-grams:
\end\
_________________________________________
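As a sanity check on the listing above, the unigram and bigram values can be reproduced by hand. The Python sketch below assumes that -cdiscountN 0.5 means absolute discounting (0.5 is subtracted from every nonzero count), that all unigram mass freed by discounting goes to <unk> (the only zero-count word), and that backoff weights are computed in the usual Katz way. These are our assumptions rather than statements about SRILM's internals, but they reproduce the printed numbers.

```python
import math
from collections import Counter

D = 0.5  # the -cdiscount1 / -cdiscount2 value used in the command

tokens = ["<s>"] + "the man hit the ball".split() + ["</s>"]

# Unigram counts: <s> only ever serves as a context, so it is left out.
uni = Counter(tokens[1:])
total = sum(uni.values())  # 6 tokens
p_uni = {w: (c - D) / total for w, c in uni.items()}
# Assumption: the mass freed by discounting all goes to <unk>,
# the only vocabulary word with a zero count.
p_uni["<unk>"] = D * len(uni) / total

# Bigram counts and the total count of each context word h.
bi = Counter(zip(tokens, tokens[1:]))
ctx = Counter()
for (h, _w), c in bi.items():
    ctx[h] += c
p_bi = {(h, w): (c - D) / ctx[h] for (h, w), c in bi.items()}


def backoff_weight(h):
    """Left-over bigram mass of context h, renormalized over the
    unigram probabilities of the words NOT seen after h."""
    seen = [w for (hh, w) in p_bi if hh == h]
    left = 1.0 - sum(p_bi[(h, w)] for w in seen)
    return left / (1.0 - sum(p_uni[w] for w in seen))


for w in ["</s>", "<unk>", "ball", "hit", "man", "the"]:
    print(f"{math.log10(p_uni[w]):.7f} {w}")
for (h, w), p in sorted(p_bi.items()):
    print(f"{math.log10(p):.7f} {h} {w}")
for h in ["<s>", "ball", "hit", "man", "the"]:
    print(f"bow({h}) = {math.log10(backoff_weight(h)):.7f}")
```

Every trigram in this one-sentence corpus has a count of 1, so whatever removes singleton trigrams removes all of them, which is why the \3-grams: section comes out empty.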
The same command with "-order 2" results in basically the same language
model (only the lines "ngram 3=0" and "\3-grams:" are missing).
Using "-minprune 4" and "-prune 0" did not change the result.
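A likely explanation, which we have not verified against the SRILM source: ngram-count applies a minimum count per n-gram order before discounting (the -gtNmin options, e.g. -gt3min), and as far as we know the default cutoff is 1 for unigrams and bigrams but 2 for trigrams. If that holds, adding -gt3min 1 to the command should keep the singleton trigrams (check the ngram-count man page to confirm). It would also explain why -prune and -minprune had no effect, since those control entropy-based pruning of n-grams that already survived the count cutoff. The sketch below mimics such a cutoff with a hypothetical helper, apply_min_counts; it is an illustration of the assumed behaviour, not SRILM code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def apply_min_counts(counts_by_order, min_counts):
    """Hypothetical helper: keep only n-grams whose count reaches the
    cutoff for their order (mimicking the assumed -gtNmin behaviour)."""
    return {
        n: {g: c for g, c in counts.items() if c >= min_counts[n]}
        for n, counts in counts_by_order.items()
    }


tokens = ["<s>"] + "the man hit the ball".split() + ["</s>"]
counts = {n: Counter(ngrams(tokens, n)) for n in (1, 2, 3)}

# Assumed defaults: cutoff 1 for unigrams and bigrams, 2 for trigrams.
default = apply_min_counts(counts, {1: 1, 2: 1, 3: 2})
# Same counts with the trigram cutoff lowered to 1 (as with -gt3min 1).
relaxed = apply_min_counts(counts, {1: 1, 2: 1, 3: 1})

print("trigrams kept by default:", len(default[3]))    # 0
print("trigrams kept with cutoff 1:", len(relaxed[3]))  # 5
print("bigrams kept by default:", len(default[2]))     # 6
```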
Is there a way to get entries for singleton trigrams (trigrams that occur
only once) in the language model?
Karl
----------------------------------------------------------------------------
Karl Weilhammer
Institut fuer Phonetik und Sprachliche Kommunikation
Ludwig-Maximilians-Universitaet Muenchen Tel.: +49-(0)89-2180-2454
Schellingstr. 3 Fax : +49-(0)89-2800362
80799 Muenchen Email: weilkar at phonetik.uni-muenchen.de
GERMANY www : http://www.phonetik.uni-muenchen.de/
----------------------------------------------------------------------------