pfsg-format

Andreas Stolcke stolcke at speech.sri.com
Thu Mar 25 09:58:57 PST 2004


Ciprian raises a good point.  Before comparing results, you should
process the LM with ngram -prune-lowprobs.  (Otherwise the PFSG may not
be an accurate representation of the LM.)
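
For example, a minimal sketch of that step (the file names, the n-gram
order, and the output paths below are just placeholders; check the
SRILM man pages for the exact options):

    # drop n-gram probabilities that fall below their backed-off estimates
    ngram -order 3 -lm LM.arpa -prune-lowprobs -write-lm LM.pruned.arpa

    # regenerate the PFSG from the pruned model before comparing scores
    make-ngram-pfsg LM.pruned.arpa > LM.pruned.pfsg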

--Andreas

In message <C7C4A42E1B9A2740B0BDF39D9BB22B4B037C2019 at
RED-MSG-30.redmond.corp.microsoft.com> you wrote:
> Hi Andreas,
> 
> I am following these threads since they sometimes contain useful
> information.
> 
> > > Because, when I use a language model made from an ARPA file (by
> > > using the NgramLM class) to compute the probability of a word (my
> > > language model is based on letters) and when I use a language model
> > > made from a PFSG file (I convert the ARPA thanks to the
> > > make-ngram-pfsg script and then by using the LatticeLM class), I
> > > don't have the same log-probability from both representations. Why
> > > is there a difference? Since I convert the ARPA file into a PFSG
> > > file, it should be the same.
> > 
> > How big are the differences?  There will be some discrepancy due to
> > rounding the scaled log probabilities to an integer, but it should
> > be a small error.
> 
> [Ciprian] I assume PFSG is Probabilistic Finite State Grammar. I do not
> know exactly how the conversion is done in the SRILM toolkit, but the
> difference could also come from the standard hack used in representing
> ARPA back-off models in FSM format --- having a common back-off state
> that forgets which higher-order n-gram state we arrived from. Am I
> wrong?
> 
> -Ciprian
> 
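
To illustrate why the scores can disagree before pruning, consider a
hypothetical case (the numbers are made up): the ARPA file lists an
explicit trigram log10 p(c | a b) = -3.0, while the back-off route gives
log10 bow(a b) + log10 p(c | b) = (-0.5) + (-2.0) = -2.5.  The exact LM
uses the explicit -3.0, but a finite-state network without failure-arc
semantics also contains the cheaper back-off path at -2.5, so a
best-path score through the PFSG comes out about 0.5 higher.  Running
ngram -prune-lowprobs removes such explicit n-grams, so both
representations use the same backed-off estimate of -2.5.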



