Gemma Boleda gemma.boleda at upf.edu
Tue May 17 13:45:40 PDT 2005


I am using the -tagged option of ngram-count and I am running into two problems:

a) The slash is included in the n-gram counts. Given the input "la/DT 
nena/N5 és/V maca/JQ ./PT", the bigram counts look as follows:

<s> la	1
<s> /DT	1
la nena	1
nena és	1
és maca	1
/N5 és	1
/N5 /V	1
/V maca	1
/V /JQ	1
/DT nena	1
/DT /N5	1
maca .	1
/JQ .	1
/JQ /PT	1
. </s>	1
/PT </s>	1

Why is the slash treated as part of the tag?

b) As the example shows, the n-grams involving tags are only built 
left-to-right: a tag can precede a word, but a word never precedes a tag. 
For instance, there is no bigram "la /N5", which I would have expected 
(and which I need).
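
To make clear what I expected in (b), here is a minimal Python sketch of the 
counting I have in mind. It is not how ngram-count works internally and it 
uses no SRILM code. The function names split_token and mixed_bigrams are 
made up for illustration. It assumes each token has the form word/TAG, 
splits at the last slash, and counts every word/tag combination for each 
pair of adjacent positions, with tags written without the slash.

```python
from collections import Counter
from itertools import product


def split_token(token):
    """Split 'word/TAG' at the last slash; return (word, tag).

    Tokens without a usable word/tag split are returned as (token, None).
    """
    word, sep, tag = token.rpartition("/")
    if not sep or not word or not tag:
        return token, None
    return word, tag


def mixed_bigrams(line):
    """Count bigrams over all word/tag combinations of adjacent positions."""
    # Each position offers up to two alternatives: the word and its tag.
    # The sentence-boundary markers have a single form.
    positions = [("<s>",)]
    for token in line.split():
        word, tag = split_token(token)
        positions.append((word,) if tag is None else (word, tag))
    positions.append(("</s>",))

    counts = Counter()
    for left, right in zip(positions, positions[1:]):
        # Cross product: word-word, word-tag, tag-word, tag-tag.
        for a, b in product(left, right):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in mixed_bigrams("la/DT nena/N5 és/V maca/JQ ./PT").items():
        print(f"{a} {b}\t{n}")
```

With the example sentence this includes the bigram "la N5" that is missing 
from the output above. Note that once the slash is dropped, a tag that is 
spelled like a word can no longer be told apart from it, so some marker on 
the tags may still be needed.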

Can you help me?

Thanks a lot,

Gemma Boleda
Universitat Pompeu Fabra

