[SRILM User List] perplexity

Andreas Stolcke stolcke at icsi.berkeley.edu
Mon Mar 31 17:44:48 PDT 2014


On 3/31/2014 4:26 AM, Laatar Rim wrote:
> Dear Andreas,
>
> Please, I have a question.
> You said: "Knowing which words should be in a class should be
> considered part of the training process, or comes from prior knowledge.
> If your application gives you the class membership of the words in the
> test data then you can add it, otherwise it would be 'training on test
> data'."
>
> Do you mean that, in SRILM, my classes-format file (file format for
> word class definitions: class [p] word1 word2 ...) should contain
> words from both my training data and my test data, or should it
> contain only words from the training data?
You should only use words from the training data, plus any other
knowledge sources or databases that are independent of the test data.
In many application domains that involve semantic knowledge, you have
additional information about the task domain from which you can infer
class membership.
For example, if you are working in the air travel domain, you probably
have a list of all airport cities, and you can create a word class from
that.
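
As a rough sketch of the airport-city example, the snippet below turns a
list of cities into class definition lines of the form
"class [p] word1 word2 ...". It is not part of SRILM: the function name
build_class_lines, the class label CITY, the city list, and the uniform
expansion probabilities are all assumptions made for illustration. The
point is only that the members come from a domain list, not from the
test data.

```python
def build_class_lines(class_name, members):
    """Return class definition lines, one expansion per line."""
    # Drop empty entries and duplicates while keeping the original order.
    unique = list(dict.fromkeys(m.strip() for m in members if m.strip()))
    if not unique:
        return []
    # Assumption: every expansion is equally likely (uniform probability).
    p = 1.0 / len(unique)
    return [f"{class_name} {p:.6f} {member}" for member in unique]


# Hypothetical city list taken from a domain database, not from the test set.
airport_cities = ["boston", "denver", "san francisco", "boston"]
lines = build_class_lines("CITY", airport_cities)
print("\n".join(lines))
```

In practice the probabilities could instead be estimated from training
data counts; the uniform values here are only a placeholder.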

Andreas
