Problems finding best path (to choose synonym)

Andreas Stolcke stolcke at speech.sri.com
Tue Sep 6 09:24:24 PDT 2005


In message <6.0.1.1.1.20050906134955.036d7238 at pigeon.csd.abdn.ac.uk> you wrote:
> I'm trying to use srilm for a Natural Language Generation
> application, to choose between synonyms of a word.  The input
> to a system is a structure such as
> 
>   you OR[answered,got] 4 questions OR[correctly,correct,right]
> 
> The system needs to make a choice at each OR point, with the
> goal of producing the easiest-to-read final sentence.  There are
> preference weights for the choices, for example, "answered"
> gets a preference weight of 0.2 and "got" gets 0.8. This reflects
> the fact that even ignoring LM issues we expect "got"
> to be easier to read (shorter, simple phoneme->letter mapping).
> 
> I represent the above as a "wlat" format file, which I convert
> to pfsg and then run lattice-tool on.  However, I can't get
> lattice-tool to find the best path through the mesh taking into
> account both the language model and the preference weights.
> If I specify -viterbi-decode, I get the best path based on the
> LM (but ignoring the preference scores), while if I specify
> -posterior-decode I get the best path based on preference scores
> (but ignoring the LM).  I'd also like to see the actual scores,
> I thought I would get this with -nbest-decode but the nbest file
> has 0 for all the scores.
> 
> Is there any way to find the best path taking both LM and
> preference weights into consideration, and giving actual
> scores?

I think you would have to directly encode your problem as an HTK-style
lattice, where you can have a number of scores associated with each word.
The HTK format is not documented as part of SRILM, but as part of
the HTK documentation (available online at
http://htk.eng.cam.ac.uk/).
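
A minimal sketch of such an encoding, in Python: each OR position becomes
a set of parallel links between two nodes, and each preference weight
(treated here as a probability, which is an assumption) is stored as a
natural-log score in the a= field, leaving l= free for language-model
scores. The field names follow HTK's Standard Lattice Format as I
understand it; the weights for correctly/correct/right and the helper
name to_slf are hypothetical. How lattice-tool reads such a file, weights
the a= and l= scores, and handles sentence-boundary tokens should be
checked against the HTK book and the lattice-tool man page.

```python
import math

# Hypothetical input: one list of (word, preference weight) per position.
# "answered"/"got" weights come from the question; the rest are made up.
SLOTS = [
    [("you", 1.0)],
    [("answered", 0.2), ("got", 0.8)],
    [("4", 1.0)],
    [("questions", 1.0)],
    [("correctly", 0.5), ("correct", 0.2), ("right", 0.3)],
]


def to_slf(slots, utterance="example"):
    """Encode a sequence of OR-choices as an HTK SLF-style lattice string.

    Node i sits before position i, so there are len(slots) + 1 nodes;
    each alternative word becomes one link from node i to node i+1.
    The preference weight is written as a natural-log score in the a=
    field (an assumed use of that field); l= is left at 0.0 so that a
    language model can supply it later.
    """
    n_nodes = len(slots) + 1
    links = []
    for i, alternatives in enumerate(slots):
        for word, weight in alternatives:
            links.append((i, i + 1, word, math.log(weight)))

    lines = [
        "VERSION=1.0",
        f"UTTERANCE={utterance}",
        f"N={n_nodes} L={len(links)}",
    ]
    # Dummy node times: one "frame" per position.
    lines += [f"I={i} t={i:.2f}" for i in range(n_nodes)]
    lines += [
        f"J={j} S={s} E={e} W={w} a={a:.4f} l=0.0"
        for j, (s, e, w, a) in enumerate(links)
    ]
    return "\n".join(lines) + "\n"


print(to_slf(SLOTS), end="")
```

Putting words on links rather than nodes keeps parallel synonyms
independent, so each alternative can carry its own score.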

That said, it seems like your problem is more straightforwardly encoded
as an HMM tagging problem. Have a look at the disambig tool, especially
the -text-map option. The preference values would be encoded in the
map file, and the unambiguous words are mapped to themselves.
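
To illustrate what the tagging formulation computes, here is a small
hand-written Viterbi decoder in Python (not disambig's own code): the
alternatives at each OR point are the hidden states, the preference
weight acts as the emission score, and a bigram LM supplies the
transition scores, so the best path reflects both and comes with an
explicit total score. The bigram log10 probabilities, the unseen-bigram
floor (no real backoff), the adverb weights and the pref_weight scaling
parameter are all hypothetical.

```python
import math

# Hypothetical toy bigram LM: log10 P(word | previous word).
BIGRAM_LOG10 = {
    ("<s>", "you"): -1.0,
    ("you", "answered"): -2.0, ("you", "got"): -1.5,
    ("answered", "4"): -1.2, ("got", "4"): -1.0,
    ("4", "questions"): -0.5,
    ("questions", "correctly"): -1.0, ("questions", "correct"): -2.5,
    ("questions", "right"): -1.3,
    ("correctly", "</s>"): -0.3, ("correct", "</s>"): -0.4,
    ("right", "</s>"): -0.3,
}
UNSEEN_LOG10 = -5.0  # crude floor for unseen bigrams, not a real backoff

# One list of (word, preference weight) per position; adverb weights made up.
SLOTS = [
    [("you", 1.0)],
    [("answered", 0.2), ("got", 0.8)],
    [("4", 1.0)],
    [("questions", 1.0)],
    [("correctly", 0.5), ("correct", 0.2), ("right", 0.3)],
]


def lm(prev, word):
    return BIGRAM_LOG10.get((prev, word), UNSEEN_LOG10)


def viterbi(slots, pref_weight=1.0):
    """Return (log10 score, words) of the best path under
    bigram LM score + pref_weight * log10(preference weight)."""
    # best[w] = (score of best partial path ending in w, that path)
    best = {"<s>": (0.0, [])}
    for alternatives in slots:
        new_best = {}
        for word, weight in alternatives:
            emit = pref_weight * math.log10(weight)
            new_best[word] = max(
                ((score + lm(prev, word) + emit, path + [word])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    # Add the end-of-sentence transition and pick the overall winner.
    return max(
        ((score + lm(prev, "</s>"), path) for prev, (score, path) in best.items()),
        key=lambda item: item[0],
    )


score, path = viterbi(SLOTS)
print(" ".join(path), round(score, 3))
```

Setting pref_weight=0.0 turns this into an LM-only decode; raising it
shifts the choice toward the preferred synonyms.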

--Andreas 

> 
> Many thanks
> 					Ehud Reiter
> 




More information about the SRILM-User mailing list