Problems finding best path (to choose synonym)
Andreas Stolcke
stolcke at speech.sri.com
Tue Sep 6 09:24:24 PDT 2005
In message <6.0.1.1.1.20050906134955.036d7238 at pigeon.csd.abdn.ac.uk> you wrote:
> I'm trying to use srilm for a Natural Language Generation
> application, to choose between synonyms of a word. The input
> to the system is a structure such as
>
> you OR[answered,got] 4 questions OR[correctly,correct,right]
>
> The system needs to make a choice at each OR point, with the
> goal of producing the easiest-to-read final sentence. There are
> preference weights for the choices. For example, "answered"
> gets a preference weight of 0.2 and "got" gets 0.8; this reflects
> the fact that, even ignoring LM issues, we expect "got"
> to be easier to read (shorter, simple phoneme->letter mapping).
>
> I represent the above as a "wlat" format file, which I convert
> to pfsg and then run lattice-tool on. However, I can't get
> lattice-tool to find the best path through the mesh taking into
> account both the language model and the preference weights.
> If I specify -viterbi-decode I get the best path based on the
> LM (but ignoring the preference scores), while if I specify
> -posterior-decode I get the best path based on preference scores
> (but ignoring the LM). I'd also like to see the actual scores.
> I thought I would get these with -nbest-decode, but the nbest file
> has 0 for all the scores.
>
> Is there any way to find the best path taking both LM and
> preference weights into consideration, and giving actual
> scores?
I think you would have to directly encode your problem as an HTK-style
lattice, where you can have a number of scores associated with each word.
The HTK format is not documented as part of SRILM, but as part of
the HTK documentation (which is available online at
http://htk.eng.cam.ac.uk/).
That said, it seems like your problem is more straightforwardly encoded
as an HMM tagging problem. Have a look at the disambig tool, especially
the -text-map option. The preference values would be encoded in the
map file, and the unambiguous words are mapped to themselves.
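
As a rough illustration of that encoding, here is a minimal Python sketch
that turns the OR[...] structure into map-style lines: one line per input
position, with the label first, followed by each candidate word and its
preference value. The helper name text_map_lines, the SLOTn labels, and the
weights for the second choice point are assumptions made up for this
example; only the 0.2/0.8 weights come from the question. The exact layout
that -text-map expects, including how sentence boundaries are marked, should
be checked against the disambig man page before relying on this.

```python
import re

# Matches a choice point such as OR[answered,got].
CHOICE = re.compile(r"^OR\[(.+)\]$")

SPEC = "you OR[answered,got] 4 questions OR[correctly,correct,right]"

# Preference weights. Only answered/got are taken from the question;
# the values for the second choice point are invented placeholders.
PREFS = {
    "answered": 0.2, "got": 0.8,
    "correctly": 0.5, "correct": 0.2, "right": 0.3,
}

def text_map_lines(spec, prefs):
    """Return one map-style line per token of the input structure.

    Unambiguous words map to themselves; each choice point gets a
    hypothetical SLOTn label followed by its candidates and their
    weights, normalized to sum to 1 within the slot.
    """
    lines = []
    slot = 0
    for token in spec.split():
        match = CHOICE.match(token)
        if match is None:
            lines.append(f"{token} {token}")
            continue
        slot += 1
        options = match.group(1).split(",")
        total = sum(prefs[o] for o in options)
        fields = [f"SLOT{slot}"]
        for option in options:
            fields += [option, f"{prefs[option] / total:.3f}"]
        lines.append(" ".join(fields))
    return lines

for line in text_map_lines(SPEC, PREFS):
    print(line)
```

The resulting file would then be passed to disambig with -text-map, together
with the language model via -lm, so that the decoder weighs the per-word
preference values against the LM probabilities of the surrounding words.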
--Andreas
>
> Many thanks
> Ehud Reiter
>