Problems finding best path (to choose synonym)

Wed Sep 7 09:49:40 PDT 2005

Ehud,

I just realized the latest version of SRILM (1.4.5) might solve your
problem.   

You can read word confusion networks directly into lattice-tool.

lattice-tool -read-mesh -in-lattice test.wlat -split-multiwords -lm ... -write-htk -out-lattice -

will convert your confusion network to HTK format, and rescore with an LM
(all the usual LM options apply).  The original scores are encoded as
"x1" scores.

You can then decode the lattices with Viterbi, using
a weighting specified by the -htk-lmscale and -htk-x1scale options.

I think you need to do these two steps in two separate invocations of
lattice-tool.
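To make the score combination concrete, here is a minimal Python sketch of Viterbi decoding over a confusion network, where each path is scored as lmscale times the LM log probability plus x1scale times the log preference weight. It only illustrates the kind of weighting that -htk-lmscale and -htk-x1scale control; it is not lattice-tool's code, and lattice-tool's exact scoring (log base, word penalties, other HTK score fields) is not reproduced. The function name `viterbi_mesh`, the toy bigram table and the `floor` value for unseen bigrams are hypothetical. Multiwords joined with "_" are split into separate words, roughly mimicking -split-multiwords, and *DELETE* emits no word.

```python
import math


def viterbi_mesh(columns, bigram, lmscale=1.0, x1scale=1.0, floor=1e-6):
    """Best path through a confusion network under a weighted log score.

    columns: list of columns; each column is a list of (token, preference)
             pairs, as on an "align" line of a wlat file.
    bigram:  dict mapping (previous_word, word) -> probability; a toy
             stand-in for a real n-gram LM (hypothetical values).
    Path score = lmscale * sum(log P_LM) + x1scale * sum(log preference).
    """
    # Hypotheses keyed by the last real word (the bigram LM state).
    # Each value is (score, list of words emitted so far).
    beams = {"<s>": (0.0, [])}
    for column in columns:
        new_beams = {}
        for prev, (score, words) in beams.items():
            for token, pref in column:
                s = score + x1scale * math.log(max(pref, floor))
                state, out = prev, list(words)
                if token != "*DELETE*":  # *DELETE* emits no words
                    for w in token.split("_"):  # like -split-multiwords
                        s += lmscale * math.log(bigram.get((state, w), floor))
                        state = w
                        out.append(w)
                # Viterbi: keep only the best hypothesis per LM state.
                if state not in new_beams or s > new_beams[state][0]:
                    new_beams[state] = (s, out)
        beams = new_beams
    # Add the end-of-sentence LM score and return the best (score, words).
    finals = [
        (score + lmscale * math.log(bigram.get((state, "</s>"), floor)), words)
        for state, (score, words) in beams.items()
    ]
    return max(finals, key=lambda t: t[0])


if __name__ == "__main__":
    columns = [[("you", 1.0)], [("answered", 0.2), ("got", 0.8)]]
    lm = {("<s>", "you"): 0.5, ("you", "answered"): 0.5,
          ("you", "got"): 0.01, ("answered", "</s>"): 0.5,
          ("got", "</s>"): 0.5}
    print(viterbi_mesh(columns, lm))                # LM outweighs preferences
    print(viterbi_mesh(columns, lm, x1scale=10.0))  # preferences dominate
```

With these made-up LM values the first call picks "answered" and the second picks "got": raising x1scale moves the decision toward the preference weights, raising lmscale moves it toward the LM. Because a bigram LM only looks at the previous word, merging hypotheses that end in the same word keeps the search exact.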

--Andreas

In message <6.0.1.1.1.20050907095331.036a0a60 at pigeon.csd.abdn.ac.uk> you wrote:
> Andreas - Thanks very much for suggesting the disambig tool,
> this works very well for simple cases when synonyms are the
> same length.  Unfortunately (as I didn't say in my first message)
> I also have cases where synonyms are of different lengths, eg
> 
>     you should not be OR[discouraged,put off] by
>     OR[the results of,NULL] this OR[assessment,test]
> 
> I can't see how to get disambig to handle such cases.  That's
> why the wlat format and lattice-tool seemed promising, since they handle
> the above with *DELETE* pseudo-words and the -split-multiwords option.
> 
> Is there any way to tell lattice-tool that the scores on the
> input lattice are purely "acoustic" scores (that is, my
> preference weights), and that it should
> compute language model scores separately and combine these with
> the existing "acoustic" scores in the lattice?
> 
> I suspect there is some simple thing I need to do, or some simple
> mistake I am making.
> 
> An example wlat file that I have created (for the above example) is
> 
> numaligns 9
> name ex9
> posterior 1
> align 0 you 1
> align 1 should 1
> align 2 not 1
> align 3 be 1
> align 4 discouraged 0.33 put_off 0.67
> align 5 by 1
> align 6 the_results_of 0.33 *DELETE* 0.67
> align 7 this 1
> align 8 assessment 0.25 test 0.75
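A file like the one above can be generated mechanically from the OR[...] notation. The Python sketch below is an illustration under stated assumptions: the function name `or_to_wlat` is hypothetical, the preference weights are passed in the order of the OR points, multiword alternatives are joined with "_" (so that -split-multiwords can split them again), NULL becomes *DELETE*, and only the header fields used in the example (numaligns, name, posterior) are written.

```python
import re


def or_to_wlat(name, text, weights):
    """Turn 'w ... OR[a,b c,NULL] ...' plus per-OR weights into wlat text.

    weights: one sequence of preference weights per OR[...] point, in order.
    """
    columns = []
    or_weights = iter(weights)
    # Each match is either an OR[...] group (first group) or a plain word.
    for alts, word in re.findall(r"OR\[([^\]]*)\]|(\S+)", text):
        if word:
            columns.append([(word, 1.0)])  # unambiguous word
            continue
        choices = [a.strip() for a in alts.split(",")]
        probs = next(or_weights)
        if len(probs) != len(choices):
            raise ValueError(f"{len(choices)} choices but {len(probs)} weights")
        column = []
        for choice, p in zip(choices, probs):
            # NULL means "say nothing"; multiwords are joined with "_".
            token = "*DELETE*" if choice == "NULL" else "_".join(choice.split())
            column.append((token, p))
        columns.append(column)
    lines = [f"numaligns {len(columns)}", f"name {name}", "posterior 1"]
    for i, column in enumerate(columns):
        entries = " ".join(f"{tok} {p:g}" for tok, p in column)
        lines.append(f"align {i} {entries}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = ("you should not be OR[discouraged,put off] by "
            "OR[the results of,NULL] this OR[assessment,test]")
    print(or_to_wlat("ex9", text, [(0.33, 0.67), (0.33, 0.67), (0.25, 0.75)]),
          end="")
```

Run on the example sentence with the weights above, this reproduces the ex9 file line for line.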
> 
> I convert the above to pfsg using wlat-to-pfsg, and then invoke
> 
>  lattice-tool -viterbi-decode -in-lattice ex9.pfsg -lm LM -split-multiwords
> 
> If there isn't a way to do the above, I'll try to create HTK
> format files as you suggest.
> 
> Thanks for your help!
>                                                 Ehud
> 
> At 17:24 06/09/2005, you wrote:
> 
> >In message <6.0.1.1.1.20050906134955.036d7238 at pigeon.csd.abdn.ac.uk> you wrote:
> >> I'm trying to use srilm for a Natural Language Generation
> >> application, to choose between synonyms of a word.  The input
> >> to a system is a structure such as
> >> 
> >>   you OR[answered,got] 4 questions OR[correctly,correct,right]
> >> 
> >> The system needs to make a choice at each OR point, with the
> >> goal of producing the easiest-to-read final sentence.  There are
> >> preference weights for the choices, for example, "answered"
> >> gets a preference weight of 0.2 and "got" gets 0.8; this reflects
> >> the fact that even ignoring LM issues we expect "got"
> >> to be easier to read (shorter, simple phoneme->letter mapping).
> >> 
> >> I represent the above as a "wlat" format file, which I convert
> >> to pfsg and then run lattice-tool on.  However, I can't get
> >> lattice-tool to find the best path through the mesh taking into
> >> account both the language model and the preference weights.
> >> If I specify -viterbi-decode I get the best path based on the
> >> LM (but ignoring the preference scores), while if I specify
> >> -posterior-decode I get the best path based on preference scores
> >> (but ignoring the LM).  I'd also like to see the actual scores;
> >> I thought I would get them with -nbest-decode, but the nbest file
> >> has 0 for all the scores.
> >> 
> >> Is there any way to find the best path taking both LM and
> >> preference weights into consideration, and giving actual
> >> scores?
> >
> >I think you would have to directly encode your problem as an HTK-style
> >lattice, where you can have a number of scores associated with each word.
> >The HTK format is not documented as part of SRILM, but as part of
> >the HTK documentation (which is available online at
> >http://htk.eng.cam.ac.uk/).
> >
> >That said, it seems like your problem is more straightforwardly encoded
> >as an HMM tagging problem.   Have a look at the disambig tool, especially
> >the -text-map option.  The preference values would be encoded in the
> >map file, and the unambiguous words are mapped to themselves.
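In the HMM formulation suggested here, disambig looks for the hidden word sequence with the highest probability given the observed tokens, combining an n-gram LM over the hidden words with the mapping probabilities. Roughly, writing o_i for the i-th input token (a fixed word or an OR point) and w_i for the word chosen for it, the objective has the form below. Treating the preference weights as P(o_i | w_i), with P(o_i | w_i) = 1 and w_i = o_i for unambiguous words, is an assumption about how the map file would be used, not a description of disambig's internals.

```latex
\hat{w}_1^N = \arg\max_{w_1^N} \; \prod_{i=1}^{N}
  P_{\mathrm{LM}}\!\left(w_i \mid w_{i-n+1}^{i-1}\right) \, P\!\left(o_i \mid w_i\right)
```

Because each token is paired with exactly one hidden word, this formulation fits synonyms of the same length, which matches the limitation Ehud describes for alternatives such as "discouraged" versus "put off".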
> >
> >--Andreas 
> >
> >> 
> >> Many thanks
> >>                                       Ehud Reiter
> >> 
>