interpreting -order and -debug results
Andreas Stolcke
stolcke at speech.sri.com
Mon Dec 1 21:41:08 PST 2008
Alexy Khrabrov wrote:
> Greetings -- I've trained a Kneser-Ney model of a Russian corpus with
> -order 5 -kndiscount, and started it as a server with -order 5. Then,
> to see that indeed 5-grams are working, I feed it a sentence with (a)
> an existing first word present in the corpus, (b) a made-up first word
> not present in the Russian language. Then I run both 5-word sentences
> in two ways: (1) -order 5 -debug 2 (2) -order 0 -debug 3, both for
> -ppl. The results, which puzzle me, are below, followed by a
> description of the puzzlement.
>
> ~ echo c этим заявлением он выступил | ngram -use-server <badbox>
> -order 5 -debug 2 -ppl -
> server <badbox>: probserver ready
> c этим заявлением он выступил
> p( c | <s> ) = 3.67342e-06 [ -5.43493 ]
> p( этим | c ...) = 0.00102315 [ -2.99006 ]
> p( заявлением | этим ...) = 0.00151464 [ -2.81969 ]
> p( он | заявлением ...) = 0.0218172 [ -1.6612 ]
> p( выступил | он ...) = 0.000925487 [ -3.03363 ]
> p( </s> | выступил ...) = 0.00693155 [ -2.15917 ]
> 1 sentences, 5 words, 0 OOVs
> 0 zeroprobs, logprob= -18.0987 ppl= 1038.6 ppl1= 4166.16
>
> file -: 1 sentences, 5 words, 0 OOVs
> 0 zeroprobs, logprob= -18.0987 ppl= 1038.6 ppl1= 4166.16
> ~ echo жуемотничая этим заявлением он выступил | ngram -use-server
> <badbox> -order 5 -debug 2 -ppl -
> server <badbox>: probserver ready
> жуемотничая этим заявлением он выступил
> p( жуемотничая | <s> ) = 0 [ -inf ]
> p( этим | жуемотничая ...) = 0.00014788 [ -3.83009 ]
> p( заявлением | этим ...) = 0.00151464 [ -2.81969 ]
> p( он | заявлением ...) = 0.0218172 [ -1.6612 ]
> p( выступил | он ...) = 0.000925487 [ -3.03363 ]
> p( </s> | выступил ...) = 0.00693155 [ -2.15917 ]
> 1 sentences, 5 words, 0 OOVs
> 1 zeroprobs, logprob= -13.5038 ppl= 502.061 ppl1= 2376.54
>
> file -: 1 sentences, 5 words, 0 OOVs
> 1 zeroprobs, logprob= -13.5038 ppl= 502.061 ppl1= 2376.54
>
> == notice that from the 3rd line p(word | context ...), the
> conditional probs are the same, although we're using a 5-gram model
> and in the second batch the first word is non-existing! We also have
> 0 OOVs reported there (?).
The conditional probs can be the same because the N-gram probability
might not use the full context. In this case, it might just back off to
using one context word.
You can verify this by running ngram -ppl with the LM in a file. -debug
2 will display the length of the ngram used in each position.
You can also start the SERVER side with ngram -debug 2 to see this
information.
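To illustrate the backoff point, here is a toy lookup in Python. It is only a sketch of the general backoff idea, not SRILM code: the n-grams, probabilities, backoff weights and the function name are all made up for the example.

```python
import math

# Toy model (made-up values): n-gram tuple -> log10 probability.
LOGPROBS = {
    ("b",): -2.0,
    ("c",): -1.5,
    ("b", "c"): -0.7,
}
# Context tuple -> log10 backoff weight; a missing context counts as 0.0.
BACKOFFS = {
    ("b",): -0.3,
}

def backoff_logprob(word, context):
    """Return (log10 prob, length of the n-gram actually used)."""
    context = tuple(context)
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in LOGPROBS:
            return penalty + LOGPROBS[ngram], len(ngram)
        if not context:
            return -math.inf, 0  # word unknown to the toy model
        # Back off: add the context's backoff weight, drop its oldest word.
        penalty += BACKOFFS.get(context, 0.0)
        context = context[1:]

# Two histories that differ only in the oldest word give the same result,
# because the lookup ends up using only the bigram ("b", "c").
print(backoff_logprob("c", ["a", "b"]))
print(backoff_logprob("c", ["zzz", "b"]))
```

Both calls fall back to the same bigram, which is the situation in which a made-up first word leaves the later probabilities unchanged.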
About 0 OOVs: The LM client/server implementation has a few limitations
relative to evaluating the LM from a file. One such limitation is that
the client cannot tell the difference between an OOV and a word with zero
probability. Functionally they are the same (both are excluded from the
perplexity computation). You see the OOVs being reported as "zeroprob"
tokens, rather than OOVs.
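To make "excluded from the perplexity computation" concrete: the numbers in the quoted output are consistent with ppl = 10^(-logprob / (words - OOVs - zeroprobs + sentences)), and ppl1 being the same without the sentence count. The helper below is an illustrative sketch (the function name is made up, and it is not taken from the SRILM sources); it reproduces the two summaries from the -order 5 runs.

```python
def perplexity(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Return (ppl, ppl1) from the totals printed by ngram -ppl.

    OOVs and zeroprob tokens are left out of the token count, so it makes
    no difference to the result which of the two a word is reported as.
    """
    scored = words - oovs - zeroprobs
    ppl = 10 ** (-logprob / (scored + sentences))  # counts </s> tokens
    ppl1 = 10 ** (-logprob / scored)               # ignores </s> tokens
    return ppl, ppl1

# First run: 5 words, 0 zeroprobs, logprob= -18.0987
print(perplexity(-18.0987, words=5, sentences=1))
# Second run: 5 words, 1 zeroprob, logprob= -13.5038
print(perplexity(-13.5038, words=5, sentences=1, zeroprobs=1))
```

This also shows why the sentence with the made-up word gets a lower perplexity in the -order 5 runs: its -inf term is dropped from both the sum and the token count.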
>
> == Now, let's explore what "unlimited ngrams" mean with -order 0, and
> set -debug 3 too:
Note that -order 0 on the client side just means that no context
truncation happens in the CLIENT. So the full history of each ngram is
passed to the server, but there the effective history is of course
limited by the order of the LM.
So, if your SERVER was started with -order 5 then the -order 0 on the
client side should have no effect.
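The client/server interaction described above can be sketched as two truncation steps applied in sequence. This is a conceptual illustration only, with a made-up function name and placeholder words; it does not reflect how SRILM implements it.

```python
def truncate_history(history, order):
    """Keep the last (order - 1) words of history; order 0 keeps everything.

    history is a list of words with the most recent word last.
    """
    if order <= 0:
        return list(history)
    keep = order - 1
    return list(history[-keep:]) if keep > 0 else []

history = ["<s>", "w1", "w2", "w3", "w4", "w5"]

# Client with -order 0 sends the full history; the order-5 server trims it.
via_order0 = truncate_history(truncate_history(history, 0), 5)
# Client with -order 5 trims first; the server has nothing left to trim.
via_order5 = truncate_history(truncate_history(history, 5), 5)

print(via_order0 == via_order5)
```

Either way the server ends up looking at the same four context words, which is why the client-side setting should not change the result.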
>
> ~ echo с этим заявлением он выступил | ngram -use-server <badbox>
> -order 0 -debug 3 -ppl -
> server <badbox>: probserver ready
> с этим заявлением он выступил
>
> warning: word probs for this context sum to 0.00119158 != 1 : <s>
> p( с | <s> ) = 0.000113967 [ -3.94322 ] / 0.00119158
>
> warning: word probs for this context sum to 0.0248594 != 1 : с <s>
> p( этим | с ...) = 0.00614229 [ -2.21167 ] / 0.0248594
>
> warning: word probs for this context sum to 0.0135057 != 1 : этим с <s>
> p( заявлением | этим ...) = 0.0026996 [ -2.5687 ] /
> 0.0135057
>
> warning: word probs for this context sum to 0.136629 != 1 : заявлением
> этим с <s>
> p( он | заявлением ...) = 0.0191721 [ -1.71733 ] /
> 0.136629
>
> warning: word probs for this context sum to 0.00931138 != 1 : он
> заявлением этим с <s>
> p( выступил | он ...) = 0.000925487 [ -3.03363 ] / 0.00931138
>
> warning: word probs for this context sum to 0.243228 != 1 : выступил
> он заявлением этим с <s>
> p( </s> | выступил ...) = 0.00693155 [ -2.15917 ] /
> 0.243228
> 1 sentences, 5 words, 0 OOVs
> 0 zeroprobs, logprob= -15.6337 ppl= 403.293 ppl1= 1338.89
>
> file -: 1 sentences, 5 words, 0 OOVs
> 0 zeroprobs, logprob= -15.6337 ppl= 403.293 ppl1= 1338.89
>
> -----
>
> ~ echo жуемотничая этим заявлением он выступил | ngram -use-server
> <badbox> -order 0 -debug 3 -ppl -
> server <badbox>: probserver ready
> жуемотничая этим заявлением он выступил
>
> warning: word probs for this context sum to 0.00107762 != 1 : <s>
> p( жуемотничая | <s> ) = 0 [ -inf ] / 0.00107762
>
> warning: word probs for this context sum to 0.0136768 != 1 :
> жуемотничая <s>
> p( этим | жуемотничая ...) = 0.00014788 [ -3.83009 ] /
> 0.0136768
>
> warning: word probs for this context sum to 0.0105593 != 1 : этим
> жуемотничая <s>
> p( заявлением | этим ...) = 0.00151464 [ -2.81969 ] /
> 0.0105593
>
> warning: word probs for this context sum to 0.0891667 != 1 :
> заявлением этим жуемотничая <s>
> p( он | заявлением ...) = 0.0218172 [ -1.6612 ] /
> 0.0891667
>
> warning: word probs for this context sum to 0.00501918 != 1 : он
> заявлением этим жуемотничая <s>
> p( выступил | он ...) = 0.000925487 [ -3.03363 ] / 0.00501918
>
> warning: word probs for this context sum to 0.00712921 != 1 : выступил
> он заявлением этим жуемотничая <s>
> p( </s> | выступил ...) = 0.00693155 [ -2.15917 ] /
> 0.00712921
> 1 sentences, 5 words, 0 OOVs
> 1 zeroprobs, logprob= -13.5038 ppl= 502.061 ppl1= 2376.54
>
> file -: 1 sentences, 5 words, 0 OOVs
> 1 zeroprobs, logprob= -13.5038 ppl= 502.061 ppl1= 2376.54
>
> == Now we get more differences, the "real" example, the first one,
> differs from the "fake" second one in the first 4 lines, the p(|)'s
> are the same only for the last two lines, 5 and 6. However, the 4th
> line of the first "real" case has a *lower* p( он | заявлением ...)
> = 0.0191721 < p( он | заявлением ...) = 0.0218172 in 4th
> line of the second *fake* case!
>
> Again, we see 0 OOVs reported in both cases, despite "жуемотничая"
> being a fake word with 0 [-Inf] prob.
See explanation above.
>
> Although the final perplexities are higher for the fake case, I can't
> be certain, from these results, that the -order 5 option is being
> honored, and am not sure what -order 0 does here, as well as why some
> conditional probability can be higher for a fake word. Also, what
> exactly is the -debug 3 "word probs for this context", and why would
> they cause a warning for a rather large real corpus, and how should I
> interpret it?
>
> For the reference, here's the model building command I used:
>
> time make-batch-counts list/list-stok 100000 cat counts/5g -order 5 >
> /dev/null 2>&1; time merge-batch-counts counts/5g; time make-big-lm
> -name lm-ko-kn5 -lm lm-ko-kn5 -max-per-file 100000000 -kndiscount
> -order 5 -read counts/5g/*.ngrams.gz
>
> -- and here's how I launch the resulting LM server:
>
> ngram -server-port <badport> -lm /data/rupress/lm-ko-kn5 -order 5
I don't understand why -order 0 gives you anything different from
-order 5, given the explanation above.
I also cannot reproduce this discrepancy with a model I have.
So, I would suggest that you start your server ngram with the -debug 2
option and then pay attention to
- what ngrams get passed to the server
- what the ngram length found in the lm is
- what the returned probability is
The last two pieces of information should be identical with -order 0 or
5 on the client side. If not, please email me the output of a short
example and we can investigate further.
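For comparing two runs position by position, a small script along these lines may help. It is a sketch that assumes the output lines look like the "p( word | context ) = prob [ logprob ]" lines quoted in this thread; the function names are made up, and output in any other layout would need a different pattern.

```python
import re

# Matches lines such as: p( он | заявлением ...) = 0.0218172 [ -1.6612 ]
LINE = re.compile(
    r"p\(\s*(\S+)\s*\|\s*(.*?)\s*\)\s*=\s*(\S+)\s*\[\s*(\S+)\s*\]"
)

def parse_probs(text):
    """Return a list of (word, context, prob, logprob), one per matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            word, context, prob, logprob = m.groups()
            rows.append((word, context, float(prob), float(logprob)))
    return rows

def differing_positions(text_a, text_b):
    """Return the indices of word positions whose log probs differ."""
    a, b = parse_probs(text_a), parse_probs(text_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x[3] != y[3]]

# Two lines from each of the runs quoted above (-order 5, then -order 0).
RUN_ORDER5 = """\
p( он | заявлением ...) = 0.0218172 [ -1.6612 ]
p( выступил | он ...) = 0.000925487 [ -3.03363 ]
"""
RUN_ORDER0 = """\
p( он | заявлением ...) = 0.0191721 [ -1.71733 ] / 0.136629
p( выступил | он ...) = 0.000925487 [ -3.03363 ] / 0.00931138
"""

print(differing_positions(RUN_ORDER5, RUN_ORDER0))
```

The positions it reports are the ones worth checking against the server-side -debug 2 output.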
Andreas