Language Model output problem using FLM
amittai e axelrod
amittai at mit.edu
Thu Feb 15 07:49:46 PST 2007
On 2/15/07, Antoine Ghaoui <Antoine.Ghaoui at jinny.ie> wrote:
> ## word trigram
> W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
> W1W2 W2 kndiscount gtmin 1 interpolate
> W1 W1 kndiscount gtmin 1 interpolate
> 0 0 kndiscount gtmin 1
> Can you please help on this? Is it normal to have ngram 0x2=0?
Yes (for a regular trigram LM in FLM format). The short answer is that
it means you have no histories that consist of W2 alone.
> How can I get the old format?
You can't. This is the standard FLM file format, but it's really
equivalent to the LM format; it's just labelled a bit differently.
Because an FLM allows you to select arbitrary combinations of factors
to use as the ngram history, the header of the FLM file will contain a
list of how many of each possible combination of factors you're using
for your history. However, as your FLM specification narrows down
which factor combinations are valid histories, some (or many) of the
entries in the FLM header will have a count of zero.
For example, an FLM header for a trigram model with 3 factors per
word might look something like this:
...and this is also normal. While in a normal trigram LM you'd see
"1-gram", "2-gram", etc., an FLM will just number all the nodes in the
possible backoff graph and use each node's label in the header rather
than write out which particular factor combination it represents. If
you want to figure out which particular factor combination each hex
label means, I think the counting mechanism is commented in the FLM
In the case of a trigram model, though, there's only one combination
of factors that's not used as a history and thus has zero entries
(namely that of W2 alone), and therefore that's the one labelled 0x2
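
To make the labelling concrete, here is a rough Python sketch of how
the node labels for the trigram spec quoted above could be worked out.
It is not taken from the SRILM source. It assumes each parent gets one
bit in the order it is listed on the "W : 2 W(-1) W(-2)" line (W1 is
bit 0, W2 is bit 1), which is consistent with W2 alone being 0x2. The
function names are made up for illustration.

```python
def node_label(parents, subset):
    """Return the hex label of a backoff-graph node.

    Assumption: bit i is set when the i-th listed parent is part of
    the history for that node.
    """
    mask = 0
    for i, parent in enumerate(parents):
        if parent in subset:
            mask |= 1 << i
    return hex(mask)


def backoff_path(parents, drop_order):
    """Labels of the nodes visited when parents are dropped in order."""
    current = set(parents)
    path = [node_label(parents, current)]
    for parent in drop_order:
        current.discard(parent)
        path.append(node_label(parents, current))
    return path


# Parents as listed in the quoted spec: W(-1) then W(-2).
parents = ["W1", "W2"]

# Every possible combination of parents gets a label.
all_nodes = [hex(mask) for mask in range(1 << len(parents))]

# The quoted spec drops W2 first, then W1.
used = backoff_path(parents, ["W2", "W1"])
unused = [label for label in all_nodes if label not in used]

print(used)    # ['0x3', '0x1', '0x0']
print(unused)  # ['0x2']
```

Under that assumption, 0x3, 0x1 and 0x0 play the role of the trigram,
bigram and unigram sections of a regular LM, and 0x2 is the only node
the backoff path never visits.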