Question about hidden-ngram

Carmen Alvarez carmena at mailandnews.com
Fri Nov 21 05:32:17 PST 2003


Try the flag -force-event for hidden-ngram:

hidden-ngram -text test4.txt -lm lmfile -tolower -hidden-vocab tags -continuous -posteriors -force-event
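
To compare runs with and without the flag, it can help to reduce the
-posteriors output to the single most probable event per word. The script
below is only a minimal sketch, not part of SRILM. It assumes each output
line has the layout shown in the quoted message below (a word followed by
alternating event labels and posterior values), and the function names
parse_posterior_line and best_events are made up for this illustration.

```python
def parse_posterior_line(line):
    """Split one output line into (word, {event: posterior}).

    Assumed layout (taken from the quoted output, not from SRILM docs):
        word  event1 p1  event2 p2  ...
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    # The remaining fields alternate: event label, posterior probability.
    posteriors = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest) - 1, 2)}
    return word, posteriors


def best_events(lines):
    """Return (word, most probable event, its posterior) for each non-empty line."""
    result = []
    for line in lines:
        if not line.strip():
            continue
        word, posteriors = parse_posterior_line(line)
        best = max(posteriors, key=posteriors.get)
        result.append((word, best, posteriors[best]))
    return result


if __name__ == "__main__":
    # Two lines copied from the output quoted in the original message.
    sample = [
        "jednoho  *noevent* 0.999998 <com> 4.18691e-07 <per> 1.24419e-06 <qm> 8.63805e-11",
        "roku     *noevent* 0.197671 <com> 0.801881 <per> 0.000340206 <qm> 0.000107651",
    ]
    for word, event, prob in best_events(sample):
        print(word, event, prob)
```

The sketch splits on whitespace, so it will not handle words that themselves
contain spaces.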

Carmen


----- Original Message ----- 
From: "Jachym Kolar" <jachym at kky.zcu.cz>
To: <srilm-user at speech.sri.com>
Sent: Friday, November 21, 2003 7:15 AM
Subject: Question about hidden-ngram


> Hi,
> I've just tried the hidden-ngram tool to automatically punctuate an
> unpunctuated text, but I got some unexpected results: every word was
> tagged with *noevent*.
>
> I used a training text in the following form:
>
> ...
> for more than a century <COM> the fingerprint has been the quintessential piece
> of crime scene evidence <PER>
> but now the palm is getting its due <PER>
> ...
>
> Then I trained a 3-gram model with:
>
> ngram-count -write-vocab vocabulary -tolower -text trainingtext -write output -lm lmfile
>
> ... and then I ran the hidden-ngram tool with the following options:
>
> hidden-ngram -text test4.txt -lm lmfile -tolower -hidden-vocab tags -continuous -posteriors
>
> ... and received output like this:
>
> 6        *noevent* 0.998811 <com> 0.00117427 <per> 1.46659e-05 <qm> 7.92597e-10
> měsíců   *noevent* 0.999898 <com> 9.326e-05 <per> 9.07804e-06 <qm> 4.61643e-10
> do       *noevent* 1 <com> 4.19776e-09 <per> 5.76912e-09 <qm> 6.25918e-12
> jednoho  *noevent* 0.999998 <com> 4.18691e-07 <per> 1.24419e-06 <qm> 8.63805e-11
> roku     *noevent* 0.197671 <com> 0.801881 <per> 0.000340206 <qm> 0.000107651
> jak      *noevent* 0.99997 <com> 2.44243e-05 <per> 1.32587e-06 <qm> 4.09674e-06
> je       *noevent* 0.999857 <com> 0.000142836 <per> 2.47722e-07 <qm> 2.47757e-07
> to       *noevent* 0.972235 <com> 0.0266202 <per> 0.000937748 <qm> 0.000206936
> <unk>    *noevent* 0.979455 <com> 0.0205446 <per> 2.70218e-07 <qm> 1.33261e-07
> uvedeno  *noevent* 0.933133 <com> 0.0538742 <per> 0.0129924 <qm> 6.16205e-08
> na       *noevent* 0.999965 <com> 4.71218e-07 <per> 3.39777e-05 <qm> 1.57228e-07
> výrobku  *noevent* 0.736376 <com> 0.168451 <per> 0.0947272 <qm> 0.00044499
>
> Can somebody please tell me what I did wrong? Also, is there a tool in
> SRILM for obtaining a text-map from the training text?
>
> Thanks Jachym