Problems with reading data from STDIN in: SRILM 1.3.3
Andreas Stolcke
stolcke at speech.sri.com
Tue Mar 11 23:48:04 PST 2003
In message <Pine.LNX.4.44.0303111624410.915-100000 at linux14.phonetik.uni-muenchen.de> you wrote:
> Hi Andreas,
>
> I installed the new version 1.3.3 of the SRI LM toolkit on a Linux
> machine (Linux 2.4.19, GNU libc 2.2.5, gcc version 2.95.3). I have
> problems with reading data from STDIN in ngram:
>
> With version 1.3.2 and older, this worked:
> cat 300classes | ngram -order 2 -ppl DEVTEST.sri -unk -lm 300Klassen.LM -classes -
> file DEVTEST.sri: 515 sentences, 13964 words, 0 OOVs
> 0 zeroprobs, logprob= -30572.9 ppl= 129.28 ppl1= 154.67
>
>
> This produces a warning with Version 1.3.3:
> cat 300classes | ngram -order 2 -ppl DEVTEST.sri -unk -lm 300Klassen.LM -classes -
> warning: '-' used multiple times for input
> file DEVTEST.sri: 515 sentences, 13964 words, 0 OOVs
> 0 zeroprobs, logprob= -78894.3 ppl= 281112 ppl1= 446516
>
> But this works perfectly well with Version 1.3.3:
> ngram -order 2 -ppl DEVTEST.sri -unk -lm 300Klassen.LM -classes 300classes
> file DEVTEST.sri: 515 sentences, 13964 words, 0 OOVs
> 0 zeroprobs, logprob= -30572.9 ppl= 129.28 ppl1= 154.67
>
> Is this problem due to my configuration?
>
> Regards,
> Karl
>
Karl,

what you see is an unfortunate byproduct of the new -limit-vocab
facility. It requires the class definition file to be read multiple
times to work correctly (at least in the current implementation).
However, the simple patch included below avoids the problem when the
-limit-vocab option is not being used (as in your case).

Note that another scenario where the classes file is read multiple times
is when you are mixing several models. The message

    warning: '-' used multiple times for input

at least warns you that something is trying to read stdin multiple times.
--Andreas
*** /tmp/T00o6hhK Tue Mar 11 23:40:05 2003
--- lm/src/ngram.cc Tue Mar 11 23:37:22 2003
***************
*** 369,377 ****
--- 369,379 ----
       * the class names (the first column of the class definitions)
       * into the vocabulary.
       */
+     if (limitVocab) {
          File file(classesFile, "r");
          classVocab->read(file);
      }
+     }

      ngramLM =
         decipherHack ? new DecipherNgram(*vocab, order, !decipherNoBackoff) :