Counting occurrence of words in random texts
Tuesday 19 February 2008 at 12.30 PM by jcdubacq
Le séminaire OCAD accueille Pierre Nicodème (chargé de recherche CNRS, École polytechnique)
We consider random texts generated by a Bernoulli source. We want to count simultaneously the number of occurrences of words within a finite set of words in a random text of size n. The use of generating functions permits to construct a multivariate function counting the occurrences in texts of all size from 0 to infinity. From there several techniques give access to texts of size n. We compute the multivariate generating function:
- by the very elegant analytic inclusion-exclusion method that resorts to combinatorics.
- by constructing the Aho-Corasick automaton recognizing the words, and translating later to generating functions. We compare the complexity of these two methods.
This is a joint work with F. Bassino, J. Clément and J. Fayolle.
![[LIPN]](/blog-themes/lipn-automne/img/logo_lipn.png)
![[CNRS]](/blog-themes/lipn-automne/img/logo_cnrs.png)
![[Université Paris 13]](/blog-themes/lipn-automne/img/logo_paris13.png)
About the ICS format