-tagged option?
Gemma Boleda
gemma.boleda at upf.edu
Tue May 17 13:45:40 PDT 2005
Hi,
I am using the -tagged option of ngram-count and I am running into two
problems:
a) the slash is included in the n-gram counts: with the input "la/DT
nena/N5 és/V maca/JQ ./PT", the bigrams look as follows:
<s> la 1
<s> /DT 1
la nena 1
nena és 1
és maca 1
/N5 és 1
/N5 /V 1
/V maca 1
/V /JQ 1
/DT nena 1
/DT /N5 1
maca . 1
/JQ . 1
/JQ /PT 1
. </s> 1
/PT </s> 1
Why is the slash considered as part of the tag?
b) as the example shows, the n-grams with tags are only built
left-to-right: a tag can be followed by a word, but a word is never followed
by a tag. For instance, there is no bigram "la /N5", which I would have
expected (and needed).
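To make the expectation in (b) concrete, here is a minimal Python sketch of the counts I had in mind. It is not how ngram-count works internally; the function name mixed_bigrams is hypothetical, and I am assuming that each token is split at its last slash, that the slash itself is dropped from the tag, and that <s> and </s> carry no tag.

```python
from collections import Counter
from itertools import product


def mixed_bigrams(sentence):
    """Count bigrams over every word/tag combination of adjacent tokens.

    Hypothetical helper, not part of SRILM: it illustrates the expected
    counts, including word-to-tag bigrams such as ("la", "N5").
    """
    # Each position holds the alternatives for that token: (word, tag).
    positions = [("<s>",)]
    for token in sentence.split():
        # Assumption: the tag follows the last slash; the slash is dropped.
        word, _, tag = token.rpartition("/")
        positions.append((word, tag) if word else (tag,))
    positions.append(("</s>",))

    counts = Counter()
    for left, right in zip(positions, positions[1:]):
        # Pair every alternative on the left with every one on the right.
        for pair in product(left, right):
            counts[pair] += 1
    return counts


counts = mixed_bigrams("la/DT nena/N5 és/V maca/JQ ./PT")
for (first, second), n in counts.items():
    print(first, second, n)
```

With this scheme every pair of adjacent tokens contributes four bigrams (word word, word tag, tag word, tag tag), so the bigram "la N5" appears alongside "DT nena" and "DT N5".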
Can you help me?
Thanks a lot,
Gemma Boleda
Universitat Pompeu Fabra
Barcelona