converting ngram format model to AT&T FSM format
stolcke at speech.sri.com
Tue Sep 13 18:50:30 PDT 2005
In message <20050909032901.81882.qmail at web40506.mail.yahoo.com> you wrote:
> I'm trying to convert an n-gram model (e.g., a.lm) into AT&T FSM format.
> I have first used make-ngram-pfsg (e.g., make-ngram-pfsg a.lm > a.pfsg), then
> I used pfsg-to-fsm (e.g., pfsg-to-fsm a.pfsg > a.fsm). I have some questions
> regarding the interpretation of the transition probabilities and labels:
> 1. words are represented as themselves in the n-gram format, but in the FSM
> format model, the transitions seem to have an index. Which word is represented
> with which index? Can it be extracted from the order of the unigrams in the
> ngram format file? Is 0 representing an epsilon?
Use
    pfsg-to-fsm symbolfile=FILE
to dump the index-to-word mapping to FILE. FILE can then be used with the
FSM tool options -i and -o (this is explained in the pfsg-scripts man page).
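
As a rough illustration of working with such a mapping file, the sketch below reads it into a Python dictionary. It assumes the usual AT&T FSM symbol-table layout of one whitespace-separated "word index" pair per line; the function name load_symbols and the sample words are made up for this example.

```python
import io

def load_symbols(lines):
    """Return {index: word} from lines of the form 'word index'."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, index = fields
        table[int(index)] = word
    return table

# Hypothetical file contents; a real file would be read with open(path).
sample = io.StringIO("hello 1\nworld 2\n")
symbols = load_symbols(sample)
```

Looking up symbols[1] then gives back the word that a transition labeled 1 stands for.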
> 2. Are the transition probabilities -10000.5*logprobs?
They are, because that's what make-ngram-pfsg outputs, and pfsg-to-fsm doesn't
change the scaling except changing the sign. But you can undo this scaling
by using the scale=S
option and setting S=1/-10000.5. Note this will give you back log-base-10,
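
The arithmetic behind that rescaling can be sketched as follows. This assumes, as stated above, that each FSM weight equals -10000.5 times the original log probability; weight_to_logprob and logprob_to_weight are hypothetical helpers, not part of SRILM.

```python
SCALE = -10000.5  # assumed relation: weight = SCALE * logprob

def logprob_to_weight(logprob, scale=SCALE):
    """Apply the scaling that the converted FSM is assumed to carry."""
    return logprob * scale

def weight_to_logprob(weight, scale=SCALE):
    """Undo the scaling by multiplying with 1/scale."""
    return weight * (1.0 / scale)
```

For instance, a transition weight of 20001.0 would map back to a log probability of about -2.0 under this assumption.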
> 3. What do the state potentials represent?
They are the costs of ending a path in a given state.
I don't think they're used in the encoding of PFSGs.
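
To make the role of such end-of-path costs concrete, here is a minimal sketch of how a path cost is typically accumulated in a weighted FSM: the sum of the arc costs plus the cost attached to the state where the path ends. The data layout and the name path_cost are assumptions for illustration only, not the AT&T FSM file format.

```python
def path_cost(arc_costs, final_costs, end_state):
    """Sum the arc costs along a path and add the cost of ending in end_state.

    States missing from final_costs are treated as non-final (infinite cost).
    """
    return sum(arc_costs) + final_costs.get(end_state, float("inf"))
```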
> Also, is there a better way of doing these?
Probably, but not in SRILM ;-)