Approximation and Randomised String Processing
(PARSe, ANR-20-CE48-0001)
In this project we aim to study the foundations of processing large-scale, noisy string data. Our goal is to understand the limit of computations, and to provide new ultra-efficient algorithms and data structures for processing such data, inspired by approaches in hashing and high-dimensional geometry. We will focus on three research directions: streaming pattern matching, probabilistic text indexing, and synopsis-based clustering of sequence data. Algorithms and data structures on strings have traditionally been exploited in such fields as Bioinformatics, Information Retrieval, and Digital Security, and we expect our project to have a significant impact on these fields.
Members
- P. Gawrychowski (U. of Wrocław, Assistant Professor, member)
- G. Kucherov (LIGM, Senior CNRS Researcher, external member)
- P. Peterlongo (IRISA, Inria Research Associate, member)
- T. Starikovskaya (ENS Paris, Assistant Professor, PI)
- K. Swenson (LIRMM, Junior CNRS Researcher, member)
PhDs and Postdocs
- G. Gourdel (PhD student, ENS Paris and U. Rennes)
- G. Bathie (PhD student, ENS Paris and U. Bordeaux)
- 2-years postdoc position to fill, deadline October 31, 2022, see offer
Publications
- Property Testing of Regular Languages with Applications to Streaming Property Testing of Visibly Pushdown Languages by G. Bathie, T. Starikovskaya. In ICALP 2021, pp. 119:1–119:17.
- Compressing and Indexing Aligned Readsets by T. Gagie, G. Gourdel, G. Manzini. In WABI 2021, pp. 13:1–13:21.
- Small space and streaming pattern matching with k edits by T. Kociumaka, E. Porat, and T. Starikovskaya. In FOCS 2021.
- An Improved Algorithm for The k-Dyck Edit Distance Problem by D. Fried, S. Golan, T. Kociumaka, T. Kopelowitz, E. Porat, T. Starikovskaya. In SODA 2022.
- Streaming Regular Expression Membership and Pattern Matching by B. Dudek, P. Gawrychowski, G. Gourdel, T. Starikovskaya. In SODA 2022.
Workshop
An integral part of the project is establishing a workshop on large-scale string processing co-organized by P. Gawrychoswki and T. Starikovskaya. The major goal of the workshop is to bridge the gap between researchers focusing on Algorithms on Strings and those working in the general area of algorithms and data structures for large-scale data processing. Details to follow.