Approximation and Randomised String Processing

(PARSe, ANR-20-CE48-0001)

In this project we aim to study the foundations of processing large-scale, noisy string data. Our goal is to understand the limits of computation and to provide new ultra-efficient algorithms and data structures for processing such data, inspired by approaches from hashing and high-dimensional geometry. We will focus on three research directions: streaming pattern matching, probabilistic text indexing, and synopsis-based clustering of sequence data. Algorithms and data structures on strings have traditionally been applied in fields such as Bioinformatics, Information Retrieval, and Digital Security, and we expect our project to have a significant impact on these fields.

Members

PhDs and Postdocs

Publications

Workshop

An integral part of the project is establishing a workshop on large-scale string processing, co-organized by P. Gawrychowski and T. Starikovskaya. The main goal of the workshop is to bridge the gap between researchers focusing on Algorithms on Strings and those working in the general area of algorithms and data structures for large-scale data processing. Details to follow.
