[SRILM User List] Using hidden events
dmytro.prylipko at ovgu.de
Sun Jan 22 11:19:36 PST 2012
I would like to use models with a hidden vocabulary for filled pauses,
but I am not sure what the right way is to train and test such models.
I have training and test data containing filled pauses (FPs) between words,
as well as 'clean' datasets where the FPs are removed.
The filled pauses are going to be modeled as hidden events, either with '-observed -omit' or with '-observed'.
The questions are:
- Should I train the model on the data containing the FPs or on the
'clean' data?
- Which vocabulary should I use during training and testing: with or
without the FP, given that the FP word is already included in the hidden vocabulary?
I am also trying to estimate the local perplexity of the words following
filled pauses. I extracted these words, together with their contexts, into
separate sentences, e.g.:
eine woche <FP> was
aus vom <FP> sonnabend
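The extraction step could look roughly like the following minimal Python sketch. It assumes whitespace-tokenized sentences, the literal token <FP>, and a left context of two words as in the examples; the function name extract_fp_chunks and the sample sentence are hypothetical.

```python
FP = "<FP>"  # assumed filled-pause token, as in the examples above


def extract_fp_chunks(tokens, left_context=2):
    """Return one chunk [left context..., <FP>, next word] per filled pause.

    `tokens` is a whitespace-tokenized sentence. A pause at the end of the
    sentence, or one directly followed by another pause, yields no chunk.
    """
    chunks = []
    for i, tok in enumerate(tokens):
        if tok != FP or i + 1 >= len(tokens) or tokens[i + 1] == FP:
            continue
        # Up to `left_context` tokens immediately preceding the pause.
        start = max(0, i - left_context)
        chunks.append(tokens[start:i] + [FP, tokens[i + 1]])
    return chunks


sentence = "wir haben eine woche <FP> was gemacht".split()
print(extract_fp_chunks(sentence))  # [['eine', 'woche', '<FP>', 'was']]
```

The left context only needs to be as long as the model order minus one; two words suffice for a trigram.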
and applied the trained LM to them. The total perplexity is calculated as
10^(-totalLogProb / N), where totalLogProb is the sum of the log10
probabilities of the words predicted after <FP> and N is the number of such words.
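As a sanity check of this formula, here is a small Python sketch. It assumes that only the log10 probabilities of the target words (the word right after <FP> in each chunk) are collected; the function name local_perplexity and the probability values in the example are hypothetical.

```python
def local_perplexity(log10_probs):
    """10^(-totalLogProb / N) over the words predicted after <FP>.

    log10_probs holds one base-10 log probability per target word
    (the word right after <FP>), so N == len(log10_probs).
    """
    if not log10_probs:
        raise ValueError("need at least one target word")
    total_log_prob = sum(log10_probs)
    return 10 ** (-total_log_prob / len(log10_probs))


# Hypothetical log10 probabilities for "was" and "sonnabend".
print(round(local_perplexity([-2.0, -3.0]), 1))  # 316.2
```

The log probabilities of the context words and of <FP> itself are deliberately left out, so the value reflects only the predictions made after the pause.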
The same value is then calculated on the same chunks with <FP>
removed from the context:
eine woche was
aus vom sonnabend
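The comparison is only meaningful if both test sets score exactly the same target words. A minimal Python sketch of building the paired sets, assuming chunks shaped as in the examples (ending in a non-FP word); the helper names strip_fp and paired_test_lines are hypothetical. The resulting lines could be written one per line to two files and scored separately, e.g. with ngram -ppl.

```python
FP = "<FP>"  # assumed filled-pause token


def strip_fp(chunk):
    """Drop filled-pause tokens, keeping the remaining words in order."""
    return [tok for tok in chunk if tok != FP]


def paired_test_lines(chunks):
    """Return (with_fp, clean): one text line per chunk, in the same order.

    Assuming each chunk ends in a non-FP word, both sets keep that word,
    so they score the same targets.
    """
    with_fp = [" ".join(chunk) for chunk in chunks]
    clean = [" ".join(strip_fp(chunk)) for chunk in chunks]
    return with_fp, clean


chunks = [["eine", "woche", FP, "was"], ["aus", "vom", FP, "sonnabend"]]
with_fp, clean = paired_test_lines(chunks)
print(clean)  # ['eine woche was', 'aus vom sonnabend']
```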
Is this right?
Which setup should I use to calculate the local perplexity
when I want to model FPs as hidden events with '-observed -omit'?
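To make the '-omit' case concrete, here is a toy Python illustration of the history an n-gram model would condition on when predicting the word after the pause. It rests on my understanding that '-omit' drops the FP from the context of the following words; it is not SRILM code, and the function history is hypothetical.

```python
FP = "<FP>"  # assumed filled-pause token


def history(chunk, order, omit_fp):
    """History used to predict the last word of `chunk` with an n-gram of `order`.

    If omit_fp is True, <FP> tokens are dropped from the history first
    (my reading of '-omit'); otherwise they count as ordinary context words.
    """
    context = chunk[:-1]
    if omit_fp:
        context = [tok for tok in context if tok != FP]
    return tuple(context[-(order - 1):]) if order > 1 else ()


chunk = ["eine", "woche", FP, "was"]
print(history(chunk, 3, omit_fp=False))                  # ('woche', '<FP>')
print(history(chunk, 3, omit_fp=True))                   # ('eine', 'woche')
print(history(["eine", "woche", "was"], 3, omit_fp=False))  # ('eine', 'woche')
```

Under this reading, the trigram history for "was" with '-omit' is the same as in the clean chunk, whereas plain '-observed' keeps <FP> in the history.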
Thanks in advance.