wild cards

Andreas Stolcke stolcke at speech.sri.com
Thu Apr 17 11:24:05 PDT 2008


Sorry, no wildcards in SRILM!

What you need to do is collect the ngram counts yourself (simple, using 
a gawk or perl script), and structure them such that

n(a, b, c, d, e)

and 

n(a, b, d, e)

are given to ngram-count pretending to be for the ngrams "a b d e c" and
"a b d e", respectively.
ngram-count will then compute the desired conditional probabilities for you.
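Below is a minimal sketch of the counting step. The message suggests gawk or perl; this uses Python only for illustration. The function names (skip_counts, format_counts, mle_prob) are hypothetical. The sketch assumes whitespace-tokenized text, a five-word window with the wildcard in the third position, and plain-text count lines of the form "words<TAB>count". mle_prob computes the raw ratio n(a, b, c, d, e) / n(a, b, d, e). The probabilities ngram-count produces will differ from this because of discounting and backoff.

```python
from collections import Counter


def skip_counts(tokens, order=5, skip=2):
    """Collect counts for a wildcard at position `skip` in each `order`-word window.

    For a window "a b c d e" (skip=2) this increments
      full["a b d e c"]  -> n(a, b, c, d, e), with the skipped word moved last
      context["a b d e"] -> n(a, b, *, d, e), with the skipped word dropped
    """
    full, context = Counter(), Counter()
    for i in range(len(tokens) - order + 1):
        window = tokens[i:i + order]
        ctx = window[:skip] + window[skip + 1:]
        full[" ".join(ctx + [window[skip]])] += 1
        context[" ".join(ctx)] += 1
    return full, context


def format_counts(*counters):
    """Render counts as text lines: the n-gram words, a tab, then the count."""
    lines = []
    for counter in counters:
        for ngram, n in sorted(counter.items()):
            lines.append(f"{ngram}\t{n}")
    return "\n".join(lines) + "\n"


def mle_prob(full, context, ctx_words, word):
    """Unsmoothed estimate of P(word | ctx_words), the wildcard slot removed."""
    ctx = " ".join(ctx_words)
    if context[ctx] == 0:
        return 0.0
    return full[f"{ctx} {word}"] / context[ctx]


if __name__ == "__main__":
    tokens = "a b c d e a b x d e a b c d e".split()
    full, context = skip_counts(tokens)
    print(format_counts(full, context), end="")
    # "a b _ d e" occurs 3 times; the middle word is "c" in 2 of them.
    print(mle_prob(full, context, ["a", "b", "d", "e"], "c"))
```

The output of format_counts could then be saved to a file and passed to something like `ngram-count -order 5 -read counts.txt -lm skip.lm`. Check the exact options and count-file format against the ngram-count man page before relying on this.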

--Andreas

In message <Pine.LNX.4.64.0804171051500.26338 at cormorant.colorado.edu> you wrote:
> Hello,
> 
> I am new to language modeling and to SRILM, so I apologize if this
> question has already been discussed here:
> 
> SRILM can compute the probability of a string such as "a b c d e". I was 
> wondering if there is a way to compute the probability of a string where 
> one of the words is a wildcard. E.g., suppose I want to compute the 
> probability of "a b * d e" where "*" is any word.
> 
> I believe this probability P(a, b, *, d, e) can be approximated as
> P(a, b) * P(d, e), but I am still wondering whether there is a better way 
> to compute it (e.g. by passing the wildcard to SRILM).
> 
> I need it to compute a conditional probability (e.g. P(c | a, b, d, e)).
> 
> Thank you!
> 
> Dmitriy
More information about the SRILM-User mailing list