Beginning and end of sentences tags
Gwénolé Lecorvé
gwenole.lecorve at irisa.fr
Wed Oct 8 06:29:47 PDT 2008
Thank you for this quick and precise answer.
However, when I launch my command (see below), I still do not get back
the same lattice structure.
command :
> lattice-tool -in-lattice /path/to/input.lat -out-lattice
> /path/to/output.lat
> -lm $LM
> -htk-logbase 2.71828
> -write-htk
> -read-htk
> -print-sent-tags
> -htk-logzero '-99'
> -no-htk-nulls
> -htk-words-on-nodes
Then the result is as follows :
> # Header (generated by SRILM)
> VERSION=1.1
> UTTERANCE=/path/to/one.spf
> base=2.71828
> dir=f
> vocab=/path/to/dic
> start=0
> end=1
> NODES=6 LINKS=5
> # Nodes
> I=0 W=<s> t=0
> I=1 W=</s> t=1.08
> I=2 W=le t=0.14 v=1
> I=3 W=chien t=0.33 v=1
> I=4 W=miaule t=0.83 v=1
> I=5 W=</s> t=1.08
> # Links
> J=0 S=0 E=2 a=-55.36 l=-2.74741
> J=1 S=2 E=3 a=-72.28 l=-8.60446
> J=2 S=3 E=4 a=-72.28 l=-inf
> J=3 S=4 E=5 a=-91.5701 l=-2.87136
> J=4 S=5 E=1 l=-2.87136
I notice two things:
1/ Even if the !NULL tags are replaced by the sentence start/end tags, one more
"eos" tag is added at the end of the lattice. Isn't this a problem, since
P(</s>|</s>) would then be considered when computing the posteriors?
When writing words on edges, the problem is the same (although the "bos"
tag disappears).
2/ Despite the "-htk-logzero -99" option, "-inf" is still returned.
After a few additional experiments, it appears that the "-htk-logzero"
option works when, for example, no LM rescoring is applied or when the
"-no-expansion" option is enabled.
I may be misusing the lattice-tool command, but I do not see how to preserve
the original lattice structure (even though I know that SRILM converts
HTK lattices into its own format and that my goal may be unreachable
:-) ).
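
For reference, both observations can be checked (and the second one worked around) outside lattice-tool. The following is only a post-processing sketch, not an SRILM feature: the function names replace_neg_inf, parse_fields and eos_to_eos_links are hypothetical, and the code assumes every whitespace-separated field on a node or link line has the form key=value, as in the output shown above.

```python
import re

# Matches a score field such as "l=-inf" or "a=-inf" on an HTK lattice line.
_INF_FIELD = re.compile(r"\b([al])=-inf\b")


def replace_neg_inf(lines, logzero="-99"):
    """Return the lines with every -inf score replaced by logzero."""
    return [_INF_FIELD.sub(lambda m: m.group(1) + "=" + logzero, line)
            for line in lines]


def parse_fields(line):
    """Split a line such as 'I=5 W=</s> t=1.08' into a field dictionary."""
    return dict(field.split("=", 1) for field in line.split())


def eos_to_eos_links(lines, eos="</s>"):
    """Return (link, start, end) ids for links joining two eos nodes."""
    words, links = {}, []
    for line in lines:
        if line.startswith("I="):
            fields = parse_fields(line)
            words[fields["I"]] = fields.get("W")
        elif line.startswith("J="):
            fields = parse_fields(line)
            links.append((fields["J"], fields["S"], fields["E"]))
    return [(j, s, e) for j, s, e in links
            if words.get(s) == eos and words.get(e) == eos]


# Node and link lines copied from the lattice-tool output shown above.
lattice = """\
I=0 W=<s> t=0
I=1 W=</s> t=1.08
I=2 W=le t=0.14 v=1
I=3 W=chien t=0.33 v=1
I=4 W=miaule t=0.83 v=1
I=5 W=</s> t=1.08
J=0 S=0 E=2 a=-55.36 l=-2.74741
J=1 S=2 E=3 a=-72.28 l=-8.60446
J=2 S=3 E=4 a=-72.28 l=-inf
J=3 S=4 E=5 a=-91.5701 l=-2.87136
J=4 S=5 E=1 l=-2.87136
""".splitlines()

# Links whose start and end nodes both carry the eos tag.
print(eos_to_eos_links(lattice))

# The same lattice with -inf scores rewritten as -99.
for line in replace_neg_inf(lattice):
    print(line)
```

On this lattice the first call reports link J=4 (from node 5 to node 1), which is the extra "eos" transition mentioned in point 1. Whether that link actually affects the posteriors is the open question; the script only locates it.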
Best regards,
Gwénolé Lecorvé.
Andreas Stolcke wrote:
> Gwénolé Lecorvé wrote:
>> Hi,
>>
>> I'm currently trying to rescore the language model scores of lattices
>> generated using the HTK toolkit and personal tools.
>> Here is an example of a lattice to be rescored:
>>> VERSION=1.0
>>> UTTERANCE=/path/to/one.spf
>>> acscale=1.00
>>> vocab=/path/to/dic
>>> N=290 L=942
>>> I=0 t=0.00 W=<s>
>>> I=1 t=0.14 W=le v=1
>>> I=2 t=0.33 W=chien v=1
>>> I=3 t=0.83 W=miaule v=1
>>> I=4 t=1.08 W=</s>
>>> J=0 S=0 E=1 a=-55.36 l=-2973.43
>>> J=1 S=1 E=2 a=-72.28 l=-48.43
>>> J=2 S=2 E=3 a=-72.28 l=-87.30
>>> J=3 S=3 E=4 a=-91.57 l=-145.72
>> Note that the tags for beginning/end of sentence are present.
>> My problem is that once I launch lattice-tool (with
>> -htk-words-on-nodes and -no-htk-nulls) on such a lattice, the result
>> (HTK format) looks like this:
>>> # Header (generated by SRILM)
>>> VERSION=1.1
>>> UTTERANCE=/path/to/one.spf
>>> base=2.71828
>>> dir=f
>>> vocab=/path/to/di
>>> start=0
>>> end=1
>>> NODES=6 LINKS=5
>>> # Nodes
>>> I=0 W=!NULL t=0
>>> I=1 W=!NULL t=1.08
>>> I=2 W=le t=0.14 v=1
>>> I=3 W=chien t=0.33 v=1
>>> I=4 W=miaule t=0.83 v=1
>>> I=5 W=!NULL t=1.08
>>> # Links
>>> J=0 S=0 E=2 a=-55.36 l=-2.74741
>>> J=1 S=2 E=3 a=-72.28 l=-9.61595
>>> J=2 S=3 E=4 a=-72.28 l=-inf
>>> J=3 S=4 E=5 a=-91.5701 l=-2.87136
>>> J=4 S=5 E=1 l=-2.87136
>> Something strange happens: the "bos" and "eos" tags disappear and
>> !NULL tags are introduced instead.
>> Why aren't the "bos" and "eos" tags printed anymore, and why are these
>> !NULL tags used instead?
>> Can't I just keep the same lattice structure as the one given in input?
>>
>> I have been facing this problem for several months and still have not
>> found any solution. I would be really grateful if you could help me.
> <s> and </s> are replaced by !NULL because they are not necessary,
> since the start/end of sentence are implicit in the lattice structure.
> For example, when rescoring the lattice with an LM the initial node is
> implicitly treated as the <s> context.
>
> However, I can see how you would want to preserve these tags for some
> applications.
> If you download the beta version of srilm you will find a new option:
> lattice-tool -print-sent-tags will output <s> and </s> in the lattice
> format (both HTK and PFSG).
>
> Andreas
>>
>> Regards,
>> Gwénolé Lecorvé.
>
>