lattice-tool -ppl not working for me
Andreas Stolcke
stolcke at speech.sri.com
Tue Oct 23 16:26:15 PDT 2007
In message <380533.58323.qm at web60616.mail.yahoo.com> you wrote:
> Hi,
>
> Please excuse the newbie question, but I have searched the archives and web
> for an answer, and have not been able to find one. I am running the following
> command:
>
> echo "HUGE WIN OVER RUTGERS" > sentence.txt
> lattice-tool -ppl sentence.txt -in-lattice footballPodcast.lat -read-htk -debug 2 -order 4
>
> and am getting the following results:
>
> p( HUGE | <s> ) = 0 [ -inf ]
> p( WIN | HUGE ...) = 0 [ -inf ]
> p( OVER | WIN ...) = 0 [ -inf ]
> p( RUTGERS | OVER ...) = 0 [ -inf ]
> p( </s> | RUTGERS ...) = 0 [ -inf ]
> Viterbi backtrace failed
> 1 sentences, 4 words, 0 OOVs
> 5 zeroprobs, logprob= 0 ppl= undefined ppl1= undefined
>
> Anyone know what might be going on? The original utterance from which the
> lattice was built is 2 minutes long, containing much more speech than just the
> four-word sentence I am testing on. Is that the problem?
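Yes, probably. lattice-tool -ppl only works for word sequences that exactly
correspond to a path through the lattice between the initial and final nodes.
Since your lattice covers the whole 2-minute utterance, a four-word string
will not form such a path, which would explain the zero probabilities and the
failed Viterbi backtrace.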
> Generally speaking, I am looking for a tool that can give me the highest
> probability location (along with the associated probability) of where a
> sequence of words was spoken in an audio file. I am using Sphinx 3.7 to
> generate lattices from the audio, and have been using various SRILM tools to
> examine these lattices. Is there a tool that does what I want, or will I need
> to make one?
What you are trying to do is a kind of word or phrase spotting.
lattice-tool -order 4 -write-ngrams OUTPUT
will write a list of all 4-grams occurring anywhere in the lattice, along
with their posterior probabilities accumulated over all positions.
You could use this to see if your string is SOMEWHERE in the lattice.
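As a rough sketch of the "is it SOMEWHERE in the lattice" check, the shell
function below looks up one exact N-gram in the -write-ngrams output and
prints its accumulated posterior. It assumes each output line holds the
N-gram words followed by the posterior as the last whitespace-separated field
(the usual SRILM counts-file layout); the function name phrase_posterior is
hypothetical, so check the format against your own output first.

```shell
#!/bin/sh
# phrase_posterior FILE WORD...
# Hypothetical helper: print the accumulated posterior of one exact N-gram
# from a file written by "lattice-tool -write-ngrams".
# ASSUMPTION: each line is "W1 ... WN<whitespace>POSTERIOR", i.e. the
# posterior is the last field (SRILM counts-file layout).
# Exits with status 1 if the N-gram does not occur in the file.
phrase_posterior() {
    file=$1
    shift
    awk -v phrase="$*" '
        {
            # Rebuild the N-gram from all fields except the last one.
            ngram = $1
            for (i = 2; i < NF; i++) ngram = ngram " " $i
            if (ngram == phrase) { print $NF; found = 1 }
        }
        END { if (!found) exit 1 }
    ' "$file"
}
# Example (options taken from the command quoted above):
#   lattice-tool -in-lattice footballPodcast.lat -read-htk -order 4 -write-ngrams ngrams.txt
#   phrase_posterior ngrams.txt HUGE WIN OVER RUTGERS
```

Comparing the whole rebuilt N-gram, rather than grepping for the phrase,
avoids false hits on longer N-grams that merely contain a shorter query.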
lattice-tool -order 4 -write-ngram-index OUTPUT
will generate an index of all 4-gram occurrences and their positions relative
to the start of the utterance, durations, and posterior probabilities
(without combining distinct instances that are separated in time).
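For the original goal of finding where the phrase was most likely spoken, a
similar sketch can scan the -write-ngram-index output and keep the single
occurrence with the highest posterior. The column layout used here
(utterance name, start time, duration, the N-gram words, then the posterior)
is an assumption, not something stated above, so verify it against a few
lines of your own index file and adjust the field numbers if needed. The
function name best_occurrence is hypothetical.

```shell
#!/bin/sh
# best_occurrence FILE WORD...
# Hypothetical helper: print "UTTERANCE START DURATION POSTERIOR" for the
# highest-posterior occurrence of one exact N-gram in an index written by
# "lattice-tool -write-ngram-index".
# ASSUMED line layout (verify against your own output):
#   UTTERANCE START DURATION W1 ... WN POSTERIOR
# Exits with status 1 if the N-gram never occurs.
best_occurrence() {
    file=$1
    shift
    awk -v phrase="$*" '
        {
            # Words occupy fields 4 .. NF-1 under the assumed layout.
            ngram = $4
            for (i = 5; i < NF; i++) ngram = ngram " " $i
            # Keep this occurrence if it is the first match or beats the best so far.
            if (ngram == phrase && (!found || $NF + 0 > best + 0)) {
                utt = $1; start = $2; dur = $3; best = $NF; found = 1
            }
        }
        END {
            if (!found) exit 1
            print utt, start, dur, best
        }
    ' "$file"
}
# Example:
#   lattice-tool -in-lattice footballPodcast.lat -read-htk -order 4 -write-ngram-index index.txt
#   best_occurrence index.txt HUGE WIN OVER RUTGERS
```

The "+ 0" forces a numeric comparison, so posteriors written in exponent
notation (e.g. 1e-05) are still ordered correctly.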
You might have to play with the -min-count option to limit output of
very low-probability ngrams, or -posterior-prune to make the lattices
smaller prior to processing (for speed/memory reasons).
Andreas
More information about the SRILM-User mailing list