Beginning- and end-of-sentence tags
Gwénolé Lecorvé
gwenole.lecorve at irisa.fr
Tue Oct 7 04:55:08 PDT 2008
Hi,
I'm currently trying to rescore the language model scores of lattices
generated with the HTK toolkit and my own tools.
Here is an example of a lattice to be rescored:
> VERSION=1.0
> UTTERANCE=/path/to/one.spf
> acscale=1.00
> vocab=/path/to/dic
> N=290 L=942
> I=0 t=0.00 W=<s>
> I=1 t=0.14 W=le v=1
> I=2 t=0.33 W=chien v=1
> I=3 t=0.83 W=miaule v=1
> I=4 t=1.08 W=</s>
> J=0 S=0 E=1 a=-55.36 l=-2973.43
> J=1 S=1 E=2 a=-72.28 l=-48.43
> J=2 S=2 E=3 a=-72.28 l=-87.30
> J=3 S=3 E=4 a=-91.57 l=-145.72
Note that the beginning- and end-of-sentence tags (<s> and </s>) are present.
My problem is that when I run lattice-tool (with -htk-words-on-nodes
and -no-htk-nulls) on such a lattice, the result (in HTK format) looks like
this:
> # Header (generated by SRILM)
> VERSION=1.1
> UTTERANCE=/path/to/one.spf
> base=2.71828
> dir=f
> vocab=/path/to/di
> start=0
> end=1
> NODES=6 LINKS=5
> # Nodes
> I=0 W=!NULL t=0
> I=1 W=!NULL t=1.08
> I=2 W=le t=0.14 v=1
> I=3 W=chien t=0.33 v=1
> I=4 W=miaule t=0.83 v=1
> I=5 W=!NULL t=1.08
> # Links
> J=0 S=0 E=2 a=-55.36 l=-2.74741
> J=1 S=2 E=3 a=-72.28 l=-9.61595
> J=2 S=3 E=4 a=-72.28 l=-inf
> J=3 S=4 E=5 a=-91.5701 l=-2.87136
> J=4 S=5 E=1 l=-2.87136
Something strange happens: the "bos" and "eos" tags (<s> and </s>)
disappear, and !NULL nodes are introduced instead.
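To make the difference easy to check, here is a minimal Python sketch that
extracts the W= label of every node line (lines starting with I=) from the
two lattices above and reports which sentence tags were lost. The helper
names parse_fields and node_words are hypothetical and not part of SRILM or
HTK; the sketch only assumes the "key=value" field layout visible in the
examples, and it handles nothing beyond that.

```python
def parse_fields(line):
    """Split an SLF-style line such as 'I=0 t=0.00 W=<s>' into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # keep only well-formed key=value tokens
            fields[key] = value
    return fields


def node_words(lattice_text):
    """Return the W= label of each node line (I=...), in file order."""
    words = []
    for raw in lattice_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):  # drop the mail quoting prefix, if any
            line = line[1:].strip()
        if not line.startswith("I="):
            continue  # header, comment, or link (J=...) line
        words.append(parse_fields(line).get("W"))
    return words


# Node lines copied from the input lattice shown above.
input_lattice = """
> I=0 t=0.00 W=<s>
> I=1 t=0.14 W=le v=1
> I=2 t=0.33 W=chien v=1
> I=3 t=0.83 W=miaule v=1
> I=4 t=1.08 W=</s>
"""

# Node lines copied from the lattice-tool output shown above.
output_lattice = """
> # Nodes
> I=0 W=!NULL t=0
> I=1 W=!NULL t=1.08
> I=2 W=le t=0.14 v=1
> I=3 W=chien t=0.33 v=1
> I=4 W=miaule t=0.83 v=1
> I=5 W=!NULL t=1.08
"""

in_words = node_words(input_lattice)
out_words = node_words(output_lattice)

# Sentence tags present in the input but absent from the output.
missing = [w for w in ("<s>", "</s>") if w in in_words and w not in out_words]

print("input :", in_words)
print("output:", out_words)
print("lost  :", missing)
```

On the two examples this lists <s> and </s> as lost, while the output has
three !NULL nodes and one more node than the input.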
Why are the "bos" and "eos" tags no longer printed, and why are these !NULL
nodes used instead?
Can't I just keep the same lattice structure as the one given as input?
I have been facing this problem for several months and still have not found
a solution. I would be really grateful for any help.
Regards,
Gwénolé Lecorvé.