Classification of Correlated Mutations using Machine Learning

Dr. Peter Arndt

September 27, 2022

Keywords: NGS data analysis, genome evolution

The heritable information stored in DNA is subject to a multitude of mutagenic processes. These processes may change a single base pair, leading to a single nucleotide polymorphism (SNP), insert or delete a stretch of sequence (indels), or cause more complex changes such as inversions or ectopic rearrangements. It is important to understand all these processes and how they act along the DNA, since they play a role not only in evolution but also in the development of diseases such as cancer.
Recent advances in the statistical interpretation of single nucleotide polymorphisms allow us to extend the classification of DNA mutations, resolve complex patterns of DNA alterations, and reclassify many of these events. A better classification of mutations will ultimately improve our understanding of evolutionary events as well as of carcinogenesis.
Successful candidates will develop projects to investigate sequencing read data and identify correlated changes in the DNA sequence at two positions, model them computationally, and use tools from machine learning to (re)classify mutations in whole genome sequencing data. A strong background in statistical analysis, computational biology and machine learning is required.
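As a rough illustration of what "correlated changes at two positions" can mean at the level of individual reads, the following Python sketch counts reads that cover both positions by whether they carry the reference or a non-reference base at each site, and summarizes the 2x2 table with the phi coefficient. The read representation (a dict from reference position to observed base), the function names, and the choice of phi as the association measure are hypothetical choices for illustration, not the group's method. Real data would come from aligned reads (e.g. BAM files) and would need to account for base quality and sequencing errors.

```python
from math import sqrt

def two_site_table(reads, pos_a, ref_a, pos_b, ref_b):
    """Build a 2x2 table of reads covering both positions.

    Hypothetical read format: dict mapping reference position -> observed base.
    table[i][j]: i = 1 if read differs from reference at pos_a,
                 j = 1 if read differs from reference at pos_b.
    """
    table = [[0, 0], [0, 0]]
    for read in reads:
        # Only reads spanning both sites are informative about co-occurrence.
        if pos_a in read and pos_b in read:
            i = int(read[pos_a] != ref_a)
            j = int(read[pos_b] != ref_b)
            table[i][j] += 1
    return table

def phi_coefficient(table):
    """Phi coefficient of a 2x2 table (0.0 if any margin is empty)."""
    (n00, n01), (n10, n11) = table
    num = n11 * n00 - n10 * n01
    den = sqrt((n10 + n11) * (n00 + n01) * (n01 + n11) * (n00 + n10))
    return num / den if den else 0.0

# Toy example: three reference reads, two reads with changes at both sites,
# and one read that does not reach position 150 (it is ignored).
reads = [{100: "A", 150: "C"}] * 3 + [{100: "G", 150: "T"}] * 2 + [{100: "A"}]
table = two_site_table(reads, 100, "A", 150, "C")
print(table, phi_coefficient(table))
```

Here the two changes always occur together on the same reads, so phi is 1.0; values near 0 would indicate that the changes occur independently.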

For more information, visit the website of the Evolutionary Genomics Group.


Sequencing reads aligned to the human reference genome as seen in the Integrative Genomics Viewer (IGV). Mismatches and indels are highlighted in different colors.
