Beginning- and end-of-sentence tags

Gwénolé Lecorvé gwenole.lecorve at irisa.fr
Tue Oct 7 04:55:08 PDT 2008


Hi,

I'm currently trying to rescore the language model scores of lattices 
generated with the HTK toolkit and some personal tools.
Here is an example of a lattice to be rescored:
> VERSION=1.0
> UTTERANCE=/path/to/one.spf
> acscale=1.00
> vocab=/path/to/dic
> N=290  L=942
> I=0    t=0.00  W=<s>
> I=1    t=0.14  W=le                 v=1
> I=2    t=0.33  W=chien                 v=1
> I=3    t=0.83  W=miaule                 v=1
> I=4    t=1.08  W=</s>
> J=0     S=0    E=1    a=-55.36    l=-2973.43
> J=1     S=1    E=2    a=-72.28    l=-48.43
> J=2     S=2    E=3    a=-72.28    l=-87.30
> J=3     S=3    E=4    a=-91.57    l=-145.72
As you can see, the beginning- and end-of-sentence tags are present.
My problem is that when I run lattice-tool (with -htk-words-on-nodes 
and -no-htk-nulls) on such a lattice, the result (in HTK format) looks like 
this:
> # Header (generated by SRILM)
> VERSION=1.1
> UTTERANCE=/path/to/one.spf
> base=2.71828
> dir=f
> vocab=/path/to/di
> start=0
> end=1
> NODES=6 LINKS=5
> # Nodes
> I=0     W=!NULL t=0
> I=1     W=!NULL t=1.08
> I=2     W=le    t=0.14  v=1
> I=3     W=chien t=0.33  v=1
> I=4     W=miaule        t=0.83  v=1
> I=5     W=!NULL t=1.08
> # Links
> J=0     S=0     E=2     a=-55.36        l=-2.74741
> J=1     S=2     E=3     a=-72.28        l=-9.61595
> J=2     S=3     E=4     a=-72.28        l=-inf
> J=3     S=4     E=5     a=-91.5701      l=-2.87136
> J=4     S=5     E=1     l=-2.87136
Something strange happens: the <s> and </s> tags disappear and !NULL 
nodes are introduced instead.
Why are <s> and </s> no longer printed, and why are these !NULL 
nodes used in their place?
Can't I simply keep the same lattice structure as the input lattice?
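As a stop-gap, the boundary labels can at least be restored by post-processing the output. The sketch below is hypothetical: it is not an SRILM feature, and `relabel_boundaries` is an illustrative name. It assumes the SLF header lists `start=` and `end=` on their own lines, as in the output above, and it only renames the `W=!NULL` label on those two nodes. Link scores and any other !NULL nodes (such as node 5 above) are left untouched, so it does not recover the original lattice structure.

```python
import re

# Matches key=value fields in an HTK SLF line, e.g. "W=!NULL" -> ("W", "!NULL").
FIELD = re.compile(r"(\S+?)=(\S+)")


def relabel_boundaries(slf_text, bos="<s>", eos="</s>"):
    """Hypothetical helper: rename the !NULL start/end nodes of an SLF lattice.

    Only the W= label of the nodes named by the start= and end= header
    fields is changed; scores, links and other !NULL nodes are kept as-is.
    """
    lines = slf_text.splitlines()

    # Pass 1: read the start and end node indices from the header.
    start = end = None
    for line in lines:
        m = re.fullmatch(r"\s*(start|end)=(\d+)\s*", line)
        if m and m.group(1) == "start":
            start = m.group(2)
        elif m:
            end = m.group(2)

    # Pass 2: relabel those two nodes, but only if they carry W=!NULL.
    out = []
    for line in lines:
        fields = dict(FIELD.findall(line))
        if "I" in fields and fields.get("W") == "!NULL":
            if fields["I"] == start:
                line = line.replace("W=!NULL", "W=" + bos, 1)
            elif fields["I"] == end:
                line = line.replace("W=!NULL", "W=" + eos, 1)
        out.append(line)
    return "\n".join(out)
```

The function works on the whole lattice as a string, so it can be applied to the contents of each output file before passing them to the next tool.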

I have been facing this problem for several months and still have not 
found a solution. I would be really grateful for any help.

Regards,
Gwénolé Lecorvé.


More information about the SRILM-User mailing list