converting ngram format model to AT&T FSM format

Andreas Stolcke stolcke at speech.sri.com
Tue Sep 13 18:50:30 PDT 2005


In message <20050909032901.81882.qmail at web40506.mail.yahoo.com> you wrote:
> Hi,
> I'm trying to convert an n-gram model (e.g., a.lm) into AT&T FSM format.
> I first used make-ngram-pfsg (e.g., make-ngram-pfsg a.lm > a.pfsg), then
> pfsg-to-fsm (e.g., pfsg-to-fsm a.pfsg > a.fsm). I have some questions
> regarding the interpretation of the transition probabilities and labels:
> 1. Words are represented as themselves in the n-gram format, but in the FSM
> format model, the transitions seem to have an index. Which word is
> represented by which index? Can it be extracted from the order of the
> unigrams in the ngram format file? Does 0 represent epsilon?

Use

	pfsg-to-fsm symbolfile=FILE

to dump the index-to-word mapping to FILE.  FILE can then be used with the 
FSM tool options -i and -o (this is explained in the pfsg-scripts man page).
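
If you want to look up indices yourself rather than through the FSM tools,
a minimal Python sketch follows. It assumes the symbol file uses the usual
AT&T layout of one "word index" pair per line; check this against your own
FILE. The function name load_symbols and the sample entries are made up for
illustration.

```python
def load_symbols(lines):
    """Build an index -> word table from 'word index' lines.

    Assumption: each non-empty line holds a symbol and an integer index
    separated by whitespace. Lines that do not fit are skipped.
    """
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        word, index = fields
        table[int(index)] = word
    return table


# Hypothetical sample entries, not taken from a real model.
sample = ["hello 1", "world 2"]
symbols = load_symbols(sample)
print(symbols[2])
```

With a real file, pass the open file object instead of the sample list,
e.g. load_symbols(open("FILE")).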

> 2. Are the transition probabilities -10000.5*logprobs?

They are, because that's what make-ngram-pfsg outputs, and pfsg-to-fsm doesn't
change the scaling except changing the sign.  But you can undo this scaling
by using the 

	pfsg-to-fsm scale=S 

option and setting S = -1/10000.5.  Note that this will give you back
log-base-10, not log-base-e.
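
The same arithmetic can be done after the fact on costs read from the FSM
file. This is only a sketch of the rescaling described above, taking it as
given that the rescaled value is a base-10 log probability; the function
names are made up for illustration.

```python
import math

SCALE = 10000.5  # scaling factor discussed above


def cost_to_log10(cost):
    """Undo the scaling and the sign flip: cost * (-1 / 10000.5)."""
    return cost / -SCALE


def log10_to_ln(log10_value):
    """Convert a base-10 log to a natural log."""
    return log10_value * math.log(10)


def cost_to_prob(cost):
    """Recover the plain probability from a scaled cost."""
    return 10 ** cost_to_log10(cost)


# A cost of 10000.5 corresponds to a log10 probability of -1.
print(cost_to_log10(10000.5), cost_to_prob(10000.5))
```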

> 3. What do the state potentials represent?

They are the costs of ending a path in a given state.
I don't think they're used in the encoding of PFSGs.

> Also, is there a better way of doing these?

Probably, but not in SRILM ;-)

--Andreas 



