Nanocourse: Genomic analysis using sketching techniques
- Start: May 22, 2025 10:00 AM (Local Time Germany)
- End: May 23, 2025 04:00 PM
- Location: MPIMG
- Room: Seminar room 2 and IT teaching room
- Host: Prof. Dr. Knut Reinert, Jonas Schulte-Mattler
Description
Today, many sequencing analyses are based on k-mers, or rather on submers, of the sequences to be analyzed.
Still, individual genomes contain millions or billions of different k-mers, which makes certain tasks, although trivial in principle, very time- and space-consuming at best and impossible at worst. For example, how do you determine how many different k-mers are in a genome collection?
Easy for 1 Gb, not so for 100 Gb.
In the nanocourse, we will give an introduction to the theory of a) probabilistic counting (Flajolet, Martin) and b) MinHash-based sketching.
The participants will then experience the power of these concepts by counting k-mers in files of different sizes and comparing different genomes using fractional MinHashing. Some code pieces will be provided; others have to be filled in by you.
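To give a flavor of the second topic, here is a minimal sketch of fractional MinHashing in Python. It is not the course material: the function names (`kmers`, `hash64`, `frac_minhash`, `jaccard`), the choice of BLAKE2b as hash function, and the default values for `k` and `scale` are assumptions made for illustration. For brevity it also ignores reverse complements, which a real genome comparison would usually handle by hashing canonical k-mers.

```python
import hashlib

MAX_HASH = 2**64  # hash64() below returns values in [0, 2**64)


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=21, scale=1000):
    """Keep only hashes below MAX_HASH / scale, i.e. roughly 1/scale of all k-mers."""
    threshold = MAX_HASH // scale
    return {h for h in map(hash64, kmers(seq, k)) if h < threshold}


def jaccard(sketch_a, sketch_b):
    """Jaccard index of two sketches, used as an estimate for the full k-mer sets."""
    union = sketch_a | sketch_b
    return len(sketch_a & sketch_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy input; real genomes would be read from FASTA files.
    a = frac_minhash("ACGTACGT", k=4, scale=1)
    b = frac_minhash("ACGTTTTT", k=4, scale=1)
    print(jaccard(a, b))
```

With `scale=1` every k-mer hash is kept, so the result is the exact Jaccard index of the two k-mer sets. Larger values of `scale` shrink the sketch and turn the result into an estimate, which is the trade-off that makes comparisons of very large collections feasible.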
Agenda
May 22nd
10:00-11:30 Lecture part 1
11:30-13:00 break
13:00-14:30 Lecture part 2
May 23rd
10:00-16:00 Practical session*
Target
Doctoral candidates with general programming knowledge and some knowledge of C/C++/Java
Registration
Please register for this course by emailing gindrat@molgen.mpg.de.
*The practical session is limited to 20 participants.
Before signing up for this nanocourse, please make sure that you can attend all the dates and write them in your calendar. If you do not show up or cancel at very short notice, other interested students will not be able to participate.