Counting occurrence of words in random texts
Tuesday 19 February 2008 at 12.30 PM by jcdubacq
Séminaire OCAD (Pierre Nicodème)
Tuesday 19 February 2008 at 12.30 PM
Location: B311, LIPN — Duration: one hour
Le séminaire OCAD accueille Pierre Nicodème (chargé de recherche CNRS, École polytechnique)
We consider random texts generated by a Bernoulli source. We want to count simultaneously the number of occurrences of words within a finite set of words in a random text of size n. The use of generating functions permits to construct a multivariate function counting the occurrences in texts of all size from 0 to infinity. From there several techniques give access to texts of size n. We compute the multivariate generating function:
- by the very elegant analytic inclusion-exclusion method that resorts to combinatorics.
- by constructing the Aho-Corasick automaton recognizing the words, and translating later to generating functions. We compare the complexity of these two methods.
This is a joint work with F. Bassino, J. Clément and J. Fayolle.