Beginning and end of sentences tags

Gwénolé Lecorvé gwenole.lecorve at irisa.fr
Wed Oct 8 06:29:47 PDT 2008


Thank you for this quick and precise answer.
However, when I launch my command (see below), I still do not get back 
the same lattice structure.
Command:
> lattice-tool -in-lattice /path/to/input.lat \
>     -out-lattice /path/to/output.lat \
>     -lm $LM \
>     -htk-logbase 2.71828 \
>     -write-htk \
>     -read-htk \
>     -print-sent-tags \
>     -htk-logzero '-99' \
>     -no-htk-nulls \
>     -htk-words-on-nodes

The result is then as follows:
> # Header (generated by SRILM)
> VERSION=1.1
> UTTERANCE=/path/to/one.spf
> base=2.71828
> dir=f
> vocab=/path/to/dic
> start=0
> end=1
> NODES=6 LINKS=5
> # Nodes
> I=0     W=<s>   t=0
> I=1     W=</s>  t=1.08
> I=2     W=le    t=0.14  v=1
> I=3     W=chien t=0.33  v=1
> I=4     W=miaule        t=0.83  v=1
> I=5     W=</s>  t=1.08
> # Links
> J=0     S=0     E=2     a=-55.36        l=-2.74741
> J=1     S=2     E=3     a=-72.28        l=-8.60446
> J=2     S=3     E=4     a=-72.28        l=-inf
> J=3     S=4     E=5     a=-91.5701      l=-2.87136
> J=4     S=5     E=1     l=-2.87136


I notice two things:
1/ Even if the !NULL nodes are replaced by the sentence start/end tags,
one more "eos" tag is added at the end of the lattice. Isn't that a
problem, since P(</s>|</s>) would then be taken into account when
computing the posteriors? When writing words on edges the problem is the
same (whereas the "bos" tag disappears).
2/ Despite the "-htk-logzero -99" option, "-inf" is still returned. 
After a few additional experiments, it appears that the "-htk-logzero" 
option works when, for example, no LM rescoring is applied or when the 
"-no-expansion" option is enabled.
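
Both observations can be checked mechanically on other output lattices.
The following is a minimal Python sketch, not part of SRILM: the function
names parse_htk_lattice and check_lattice are hypothetical, and it assumes
an HTK lattice written with words on nodes (as with -htk-words-on-nodes).
It only reports links whose language score "l" is infinite and links that
join two </s> nodes; it does not modify the lattice.

```python
import math

# The rescored lattice from this message (header lines omitted).
SAMPLE = """\
I=0 W=<s> t=0
I=1 W=</s> t=1.08
I=2 W=le t=0.14 v=1
I=3 W=chien t=0.33 v=1
I=4 W=miaule t=0.83 v=1
I=5 W=</s> t=1.08
J=0 S=0 E=2 a=-55.36 l=-2.74741
J=1 S=2 E=3 a=-72.28 l=-8.60446
J=2 S=3 E=4 a=-72.28 l=-inf
J=3 S=4 E=5 a=-91.5701 l=-2.87136
J=4 S=5 E=1 l=-2.87136
"""


def parse_htk_lattice(text):
    """Collect node (I=) and link (J=) lines; header and comments are skipped."""
    nodes, links = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each field has the form key=value, separated by whitespace.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if line.startswith("I="):
            nodes[int(fields["I"])] = fields
        elif line.startswith("J="):
            links.append(fields)
    return nodes, links


def check_lattice(text, eos="</s>"):
    """Return (ids of links with infinite l, ids of links joining two eos nodes)."""
    nodes, links = parse_htk_lattice(text)
    inf_links, eos_links = [], []
    for link in links:
        j, s, e = int(link["J"]), int(link["S"]), int(link["E"])
        # float() accepts the literal "-inf" found in the lattice file.
        if "l" in link and math.isinf(float(link["l"])):
            inf_links.append(j)
        if nodes[s].get("W") == eos and nodes[e].get("W") == eos:
            eos_links.append(j)
    return inf_links, eos_links


print(check_lattice(SAMPLE))  # ([2], [4])
```

On the lattice above it flags link J=2 (l=-inf) and link J=4 (from node 5
to node 1, both labelled </s>), which are the two points raised here.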

I may be misusing the lattice-tool command, but I do not see how to
preserve the original lattice structure (even though I know that SRILM
converts HTK lattices into its own format and that my goal may be
unreachable :-) ).

Best regards,
Gwénolé Lecorvé.

Andreas Stolcke wrote:
> Gwénolé Lecorvé wrote:
>> Hi,
>>
>> I'm currently trying to rescore the language model scores of lattices
>> generated using the HTK toolkit and personal tools.
>> Here is an example of a lattice to be rescored:
>>> VERSION=1.0
>>> UTTERANCE=/path/to/one.spf
>>> acscale=1.00
>>> vocab=/path/to/dic
>>> N=290  L=942
>>> I=0    t=0.00  W=<s>
>>> I=1    t=0.14  W=le                 v=1
>>> I=2    t=0.33  W=chien                 v=1
>>> I=3    t=0.83  W=miaule                 v=1
>>> I=4    t=1.08  W=</s>
>>> J=0     S=0    E=1    a=-55.36    l=-2973.43
>>> J=1     S=1    E=2    a=-72.28    l=-48.43
>>> J=2     S=2    E=3    a=-72.28    l=-87.30
>>> J=3     S=3    E=4    a=-91.57    l=-145.72
>> Note that the tags for beginning/end of sentence are present.
>> My problem is that once I launch lattice-tool (with
>> -htk-words-on-nodes and -no-htk-nulls) on such a lattice, the result
>> (in HTK format) looks like this:
>>> # Header (generated by SRILM)
>>> VERSION=1.1
>>> UTTERANCE=/path/to/one.spf
>>> base=2.71828
>>> dir=f
>>> vocab=/path/to/di
>>> start=0
>>> end=1
>>> NODES=6 LINKS=5
>>> # Nodes
>>> I=0     W=!NULL t=0
>>> I=1     W=!NULL t=1.08
>>> I=2     W=le    t=0.14  v=1
>>> I=3     W=chien t=0.33  v=1
>>> I=4     W=miaule        t=0.83  v=1
>>> I=5     W=!NULL t=1.08
>>> # Links
>>> J=0     S=0     E=2     a=-55.36        l=-2.74741
>>> J=1     S=2     E=3     a=-72.28        l=-9.61595
>>> J=2     S=3     E=4     a=-72.28        l=-inf
>>> J=3     S=4     E=5     a=-91.5701      l=-2.87136
>>> J=4     S=5     E=1     l=-2.87136
>> Something strange happens: the "bos" and "eos" tags disappear and
>> !NULL tags are introduced instead.
>> Why aren't the "bos" and "eos" tags printed anymore, and why are these
>> !NULL tags used instead?
>> Can't I just keep the same lattice structure as the one given as input?
>>
>> I have been facing this problem for several months and still have not
>> found any solution. I would be really grateful if you could help me.
> <s> and </s> are replaced by !NULL because they are not necessary, 
> since the start/end of sentence are implicit in the lattice structure.
> For example, when rescoring the lattice with an LM the initial node is
> implicitly treated as the <s> context.
>
> However, I can see how you would want to preserve these tags for some 
> applications.
> If you download the beta version of srilm you will find a new option: 
> lattice-tool -print-sent-tags will output <s> and </s> in the lattice 
> format (both HTK and PFSG).
>
> Andreas
>>
>> Regards,
>> Gwénolé Lecorvé.
>
>
