Novel approaches in index-based approximate search for large sequence collections

Prof. Dr. Knut Reinert

October 13, 2020

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. Bidirectional indices have opened new possibilities in this regard, allowing the search to start from anywhere within the pattern and extend in both directions. In particular, the use of search schemes (partitioning the pattern and searching the pieces in certain orders with given bounds on errors) can yield significant speed-ups. We have recently formulated a new combinatorial optimization problem and developed a method to compute optimal search schemes [1]. Building on this, we aim to advance these methods by a) finding search schemes for the newer, error-prone third-generation sequencing reads (Nanopore or PacBio), b) developing faster methods to compute good search schemes, and c) extending the methods to RNA hairpins. All of this is to be integrated into the DREAM framework [2]. The applicant will join the SeqAn team at the Reinert lab.
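
To make the notion of a search scheme more concrete, the following is a minimal Python sketch, not the method of [1] and not SeqAn code. It assumes a common way of writing a search as a triple (order, lower, upper): the order in which the pattern parts are searched, plus lower and upper bounds on the cumulative number of errors after each part. The names SCHEME_K1, is_connected, covers and is_valid_scheme are hypothetical and chosen only for illustration. The sketch checks by brute force that a small scheme for at most one error over two parts covers every possible distribution of errors, and that each search order can be followed by extending a contiguous block of parts, as a bidirectional index requires.

```python
from itertools import product

# A search is (order, lower, upper). "order" lists the 1-based part indices in
# the sequence they are searched; lower[i] and upper[i] bound the cumulative
# number of errors after the i-th searched part. Illustrative representation.
SCHEME_K1 = [
    ((1, 2), (0, 0), (0, 1)),  # part 1 exact, then part 2 with up to 1 error
    ((2, 1), (0, 1), (0, 1)),  # part 2 exact, then part 1 with exactly 1 error
]


def is_connected(order):
    """Return True if every prefix of the order is a contiguous block of parts."""
    for i in range(1, len(order) + 1):
        prefix = order[:i]
        if max(prefix) - min(prefix) + 1 != i:
            return False
    return True


def covers(search, errors):
    """Return True if the search finds a match with errors[p-1] errors in part p."""
    order, lower, upper = search
    total = 0
    for part, lo, hi in zip(order, lower, upper):
        total += errors[part - 1]
        if not lo <= total <= hi:
            return False
    return True


def is_valid_scheme(scheme, num_parts, max_errors):
    """Brute-force check that every error distribution is covered by some search."""
    if not all(is_connected(order) for order, _, _ in scheme):
        return False
    for errors in product(range(max_errors + 1), repeat=num_parts):
        if sum(errors) > max_errors:
            continue  # more errors than the scheme is meant to handle
        if not any(covers(search, errors) for search in scheme):
            return False
    return True
```

Under these assumptions, is_valid_scheme(SCHEME_K1, 2, 1) holds, while dropping the second search leaves the case of a single error in part 1 uncovered. Computing an optimal scheme, as opposed to merely validating one, is the optimization problem addressed in [1] and is not attempted here.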

[1] K. Reinert, C. Pockrandt, K. Kianfar, B. Torkamandi, and H. Luo: Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index. RECOMB-Seq, 2018, pp. 1–13.

[2] T. H. Dadi, E. Siragusa, V. C. Piro, A. Andrusch, E. Seiler, B. Y. Renard, and K. Reinert: DREAM-Yara: an exact read mapper for very large databases with short update time. Bioinformatics, vol. 34, no. 17, pp. 766–772, 2018.

For more detailed information visit the website of the Reinert Lab.
