lattice-tool -ppl not working for me

Tue Oct 23 16:26:15 PDT 2007

In message <380533.58323.qm at web60616.mail.yahoo.com> you wrote:
> Hi,
> 
> Please excuse the newbie question, but I have searched the archives and web
> for an answer, and have not been able to find one. I am running the following
> command:
> 
> echo "HUGE WIN OVER RUTGERS" > sentence.txt
> lattice-tool -ppl sentence.txt -in-lattice footballPodcast.lat -read-htk -debug 2 -order 4
> 
> and am getting the following results:
> 
>         p( HUGE | <s> )         =  0 [ -inf ]
>         p( WIN | HUGE ...)      =  0 [ -inf ]
>         p( OVER | WIN ...)      =  0 [ -inf ]
>         p( RUTGERS | OVER ...)  =  0 [ -inf ]
>         p( </s> | RUTGERS ...)  =  0 [ -inf ]
> Viterbi backtrace failed
> 1 sentences, 4 words, 0 OOVs
> 5 zeroprobs, logprob= 0 ppl= undefined ppl1= undefined
> 
> Anyone know what might be going on? The original utterance from which the
> lattice was built is 2 minutes long, containing much more speech than just the
> four word sentence I am testing on. Is that the problem?

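Yes, probably.  lattice-tool -ppl only works for word sequences that exactly
correspond to a complete path through the lattice, from the initial node to
the final node.  A four-word sentence covers only a fragment of a two-minute
utterance, so no complete path matches it and every word gets zero probability.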

> Generally speaking, I am looking for a tool that can give me the highest
> probability location (along with the associated probability) of where a
> sequence of words was spoken in an audio file. I am using Sphinx 3.7 to
> generate lattices from the audio, and have been using various SRILM tools to
> examine these lattices. Is there a tool that does what I want, or will I need
> to make one?

What you are trying to do is a kind of word or phrase spotting.

lattice-tool -order 4 -write-ngrams OUTPUT

will write a list of all 4-grams occurring anywhere in the lattice, along
with their posterior probabilities accumulated over all positions.
You could use this to see if your string is SOMEWHERE in the lattice.
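
Checking the -write-ngrams output for one particular string is easy to script.
The following is only a minimal sketch: it assumes OUTPUT follows the usual
SRILM counts layout (the words of the N-gram, then the count as the last
whitespace-separated field), and the function name find_ngram and the sample
lines are made up for illustration, not real tool output.

```python
def find_ngram(counts_lines, phrase):
    """Return the posterior count listed for `phrase`, or None if it is absent.

    ASSUMPTION: each line is "w1 w2 ... wN <count>", with the count as the
    last whitespace-separated field (the usual SRILM counts layout).
    """
    target = phrase.split()
    for line in counts_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        *words, count = fields
        if words == target:
            return float(count)
    return None


# Made-up sample lines, for illustration only.
sample_counts = [
    "HUGE WIN OVER RUTGERS\t0.82",
    "WIN OVER RUTGERS </s>\t0.4",
]

print(find_ngram(sample_counts, "HUGE WIN OVER RUTGERS"))
```

With a real file you would pass the open file object instead of the sample
list, e.g. find_ngram(open("OUTPUT"), "HUGE WIN OVER RUTGERS").  A result of
None means the string does not occur anywhere in the lattice.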

lattice-tool -order 4 -write-ngram-index OUTPUT

will generate an index of all 4-gram occurrences and their positions relative
to the start of the utterance, durations, and posterior probabilities
(without combining distinct instances that are separated in time).
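
Picking the highest-probability location of a phrase from such an index could
then look roughly like the sketch below.  Note that the column order used here
(start time, duration, posterior, then the words) is an assumption made for
the example and is not taken from the SRILM documentation; check a real index
file and adjust parse_hit accordingly.  The names Hit, parse_hit and best_hit
and the sample lines are likewise hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    start: float      # seconds from the start of the utterance
    duration: float   # seconds
    posterior: float
    words: Tuple[str, ...]


def parse_hit(line: str) -> Hit:
    # ASSUMED layout: "<start> <duration> <posterior> w1 w2 ... wN".
    # Verify against a real -write-ngram-index file before relying on this.
    fields = line.split()
    start, duration, posterior = (float(x) for x in fields[:3])
    return Hit(start, duration, posterior, tuple(fields[3:]))


def best_hit(index_lines, phrase: str) -> Optional[Hit]:
    """Return the occurrence of `phrase` with the highest posterior, if any."""
    target = tuple(phrase.split())
    hits = [parse_hit(line) for line in index_lines if line.strip()]
    matches = [h for h in hits if h.words == target]
    return max(matches, key=lambda h: h.posterior, default=None)


# Made-up sample lines, for illustration only.
sample_index = [
    "12.40 1.10 0.65 HUGE WIN OVER RUTGERS",
    "87.05 1.25 0.20 HUGE WIN OVER RUTGERS",
    "30.00 0.90 0.90 WIN OVER RUTGERS AGAIN",
]

best = best_hit(sample_index, "HUGE WIN OVER RUTGERS")
if best is not None:
    print(best.start, best.duration, best.posterior)
```

Because the index keeps instances that are separated in time apart, taking
the maximum over matching lines gives one location together with its
posterior, rather than a total accumulated over the whole utterance.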

You might have to play with the -min-count option to limit output of 
very low-probability ngrams, or -posterior-prune to make the lattices
smaller prior to processing (for speed/memory reasons).

Andreas