Beginning and end of sentences tags
Andreas Stolcke
stolcke at speech.sri.com
Tue Oct 7 21:41:43 PDT 2008
Gwénolé Lecorvé wrote:
> Hi,
>
> I'm currently trying to rescore the language model scores of lattices
> generated using the HTK toolkit and personal tools.
> Here is an example of a lattice to be rescored:
>> VERSION=1.0
>> UTTERANCE=/path/to/one.spf
>> acscale=1.00
>> vocab=/path/to/dic
>> N=290 L=942
>> I=0 t=0.00 W=<s>
>> I=1 t=0.14 W=le v=1
>> I=2 t=0.33 W=chien v=1
>> I=3 t=0.83 W=miaule v=1
>> I=4 t=1.08 W=</s>
>> J=0 S=0 E=1 a=-55.36 l=-2973.43
>> J=1 S=1 E=2 a=-72.28 l=-48.43
>> J=2 S=2 E=3 a=-72.28 l=-87.30
>> J=3 S=3 E=4 a=-91.57 l=-145.72
> Note that the tags for beginning/end of sentence are present.
> My problem is that once I run lattice-tool (with
> -htk-words-on-nodes and -no-htk-nulls) on such a lattice, the result
> (HTK format) looks like this:
>> # Header (generated by SRILM)
>> VERSION=1.1
>> UTTERANCE=/path/to/one.spf
>> base=2.71828
>> dir=f
>> vocab=/path/to/di
>> start=0
>> end=1
>> NODES=6 LINKS=5
>> # Nodes
>> I=0 W=!NULL t=0
>> I=1 W=!NULL t=1.08
>> I=2 W=le t=0.14 v=1
>> I=3 W=chien t=0.33 v=1
>> I=4 W=miaule t=0.83 v=1
>> I=5 W=!NULL t=1.08
>> # Links
>> J=0 S=0 E=2 a=-55.36 l=-2.74741
>> J=1 S=2 E=3 a=-72.28 l=-9.61595
>> J=2 S=3 E=4 a=-72.28 l=-inf
>> J=3 S=4 E=5 a=-91.5701 l=-2.87136
>> J=4 S=5 E=1 l=-2.87136
> Something strange happens: the "bos" and "eos" tags disappear and
> !NULL tags are introduced instead.
> Why aren't the "bos" and "eos" tags printed anymore, and why are these
> !NULL tags used instead?
> Can't I just keep the same lattice structure as the one given in the input?
>
> I have been facing this problem for several months and still have not
> found any solution. I would be really grateful if you could help me.
<s> and </s> are replaced by !NULL because they are not necessary: the
start and end of the sentence are implicit in the lattice structure.
For example, when rescoring the lattice with an LM, the initial node is
implicitly treated as the <s> context.
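
To make "implicit in the lattice structure" concrete, here is a minimal Python sketch (not part of SRILM) that reads the node and link lines of an HTK-format lattice and locates its initial and final nodes. It assumes simple whitespace-separated key=value fields with no quoted values; the function names parse_slf and endpoints are hypothetical, and the sample is the SRILM output quoted above with the UTTERANCE, base, dir and vocab header lines left out.

```python
SAMPLE = """\
VERSION=1.1
start=0
end=1
NODES=6 LINKS=5
I=0 W=!NULL t=0
I=1 W=!NULL t=1.08
I=2 W=le t=0.14 v=1
I=3 W=chien t=0.33 v=1
I=4 W=miaule t=0.83 v=1
I=5 W=!NULL t=1.08
J=0 S=0 E=2 a=-55.36 l=-2.74741
J=1 S=2 E=3 a=-72.28 l=-9.61595
J=2 S=3 E=4 a=-72.28 l=-inf
J=3 S=4 E=5 a=-91.5701 l=-2.87136
J=4 S=5 E=1 l=-2.87136
"""


def parse_slf(text):
    """Split an HTK lattice into header fields, nodes, and (start, end) links."""
    header, nodes, links = {}, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Simplification: assumes no quoted values containing spaces.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "I" in fields:          # node definition
            nodes[int(fields["I"])] = fields
        elif "J" in fields:        # link definition
            links.append((int(fields["S"]), int(fields["E"])))
        else:                      # anything else is treated as header
            header.update(fields)
    return header, nodes, links


def endpoints(header, nodes, links):
    """Return (initial, final) node ids, from the header or from topology."""
    has_incoming = {e for _, e in links}
    has_outgoing = {s for s, _ in links}
    if "start" in header:
        start = int(header["start"])
    else:  # fall back to the first node that no link points to
        start = next(n for n in sorted(nodes) if n not in has_incoming)
    if "end" in header:
        end = int(header["end"])
    else:  # fall back to the first node that no link leaves
        end = next(n for n in sorted(nodes) if n not in has_outgoing)
    return start, end


header, nodes, links = parse_slf(SAMPLE)
start, end = endpoints(header, nodes, links)
print(start, end, nodes[start]["W"], nodes[end]["W"])  # prints: 0 1 !NULL !NULL
```

The initial and final nodes can be identified without any <s> or </s> label, either from the start= and end= header fields or from the link structure alone, which is why the tags carry no extra information for rescoring.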
However, I can see how you would want to preserve these tags for some
applications.
If you download the beta version of SRILM you will find a new option:
lattice-tool -print-sent-tags will output <s> and </s> in the lattice
output (both HTK and PFSG formats).
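
As a rough sketch of how the option might be combined with the ones from the original message, the Python snippet below only assembles and prints a command line; it does not run lattice-tool. The file names in.lat and out.lat are hypothetical, and the -read-htk, -write-htk, -in-lattice and -out-lattice options are taken from my reading of the lattice-tool man page, so check them against your installed version. -print-sent-tags requires an SRILM release that includes it.

```python
import shlex

# Hypothetical file names, for illustration only.
in_lattice = "in.lat"
out_lattice = "out.lat"

cmd = [
    "lattice-tool",
    "-read-htk",              # input is in HTK lattice format
    "-in-lattice", in_lattice,
    "-write-htk",             # write the result in HTK format as well
    "-out-lattice", out_lattice,
    "-htk-words-on-nodes",    # options used in the original message
    "-no-htk-nulls",
    "-print-sent-tags",       # keep <s> and </s> in the output lattice
]

# Print the command instead of executing it, so it can be inspected first.
print(shlex.join(cmd))
```

An LM for rescoring would be passed in addition to these options; it is left out here because the original message does not say which model was used.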
Andreas
>
> Regards,
> Gwénolé Lecorvé.