Applied Machine Learning SoSe 2017


Annalisa Marsico
Freie Universitaet Berlin, Max Planck Institute for Molecular Genetics

Bernhard Renard
Robert Koch Institute Berlin

Additional lecturers:

Thilo Muth
Robert Koch Institute Berlin

Robert Rentzsch
Robert Koch Institute Berlin

Roman Schulte-Sasse
Max Planck Institute for Molecular Genetics Berlin


Prerequisites: Attendance of the Statistics course of the Master in Bioinformatics at FU (or equivalent)

Maximum number of students: 15

Language: English


Lectures and Tutorials: Friday 10-14

Thanks for the enjoyable semester. Please note that there is no lecture on July 21st!

Location: 017/A6 (Arnimallee 6)



Students will be introduced to the basic statistical and algorithmic concepts of Machine Learning, especially in the context of current research in bioinformatics, biology and biotechnology. This course focuses on specific applications and data handling rather than on theoretical concepts. Topics include:

  • Regularization methods for variable selection and regression methods for feature decorrelation, with application to mass spectrometry data and cancer data
  • Support Vector Machines for tumor classification based on genomic data and clinical covariates
  • SVMs with string kernels to classify RNA sequences
  • Artificial Neural Networks (ANNs) and Deep Learning, with some recent applications in bioinformatics
  • Graphical models for signal cascade analysis and quasi-species identification
  • Active learning with Random Forests applied to mass spectrometry data
  • Unsupervised learning: model-based clustering of microRNA expression data



Students will be assigned weekly exercises, which they have to complete. They will work on several practical problems and implement or use the methods learned during the lectures to extract information from biological datasets in R. Exercises are mandatory; problem sets will be posted on this website on a weekly basis and are to be handed in at the end of the Wednesday lecture.

Completing 80% of the assignments correctly and presenting the results from the exercises in turns are prerequisites for passing the course.

More concretely, there are 13 tutorials, and 80% of 13 is 10.4. Every assignment will be graded as ok, ok(-) or fail/NA:

  • "ok" refers to a good assignment with only minor errors
  • "ok(-)" refers to a non-failed assignment with major errors
  • "fail" refers to a failed assignment; "NA" refers to a non-submitted assignment

Therefore the following scenarios are possible:

  • If you get 13 ok(-), you pass
  • If you get 12 ok(-) + 1 fail/NA, you pass
  • If you get 11 ok(-) + 2 fail/NA (and everything lower), you DON'T pass
  • If you get 10 ok(-) + 1 ok + 2 fail/NA, you pass
  • Everything higher than this: you pass!
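
The scenarios above can be summarised as a small decision rule. The following Python sketch is only an illustration: the page does not say how "ok(-)" is weighted against "ok", so the rule is inferred from the listed scenarios (at most one fail/NA always passes, exactly two fail/NA passes only with at least one "ok", three or more fail/NA leaves at most 10 completed assignments, which is below 10.4). The function name passes_course and the grade strings are assumptions made for this example, not part of the official grading scheme.

```python
from collections import Counter

TOTAL_ASSIGNMENTS = 13
VALID_GRADES = {"ok", "ok(-)", "fail", "NA"}


def passes_course(grades):
    """Return True if a list of 13 grades passes under the inferred rule.

    Assumption: 'fail' and 'NA' are treated identically, as on the page.
    """
    if len(grades) != TOTAL_ASSIGNMENTS:
        raise ValueError("expected exactly 13 grades")
    counts = Counter(grades)
    unknown = set(counts) - VALID_GRADES
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")

    missed = counts["fail"] + counts["NA"]
    if missed <= 1:
        # 13 ok(-) and 12 ok(-) + 1 fail/NA both pass, as does anything better.
        return True
    if missed == 2:
        # 11 ok(-) + 2 fail/NA does not pass; 10 ok(-) + 1 ok + 2 fail/NA does.
        return counts["ok"] >= 1
    # With 3 or more fail/NA at most 10 assignments remain, below 10.4.
    return False


if __name__ == "__main__":
    print(passes_course(["ok(-)"] * 11 + ["fail", "NA"]))
    print(passes_course(["ok(-)"] * 10 + ["ok", "fail", "NA"]))
```

For borderline cases the grading criteria slides remain the authoritative reference.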

Grading criteria slides



Hastie, Tibshirani & Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.

James, Witten, Hastie & Tibshirani. An Introduction to Statistical Learning (with Applications in R). Springer, 2013.

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006

Jon Shlens. A tutorial on Principal Component Analysis. 2003

Max Kuhn. Applied Predictive Modeling. Springer, 2013.

Chapelle, Schoelkopf, Zien. Semi-supervised Learning, 2006

Resources on string kernels pdf1 

Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning


Lecture Materials

00 Introduction slides_Thilo, slides_Annalisa

01 Overview slides

02 Practical challenges in any ML study: data and benchmarking slides

03 Partial Least Squares Regression (PLSR) slides

04 Support Vector Machines (SVMs) slides

05 String kernel SVMs slides

06 Graphical Models slides

07 Nested Effects Models slides

08 Quasispecies Reconstruction slides

09 Artificial Neural Networks (ANNs) slides, let's play with NNs

10 Deep Learning 1 slides

11 Deep Learning 2 slides

12 Active Learning slides

13 Regularisation slides



01 Assignment

02 Assignment

03 Assignment link1:pca_utils link2:plsq_var_imp

04 Assignment

05 Assignment

06 Assignment

07 Assignment

08 Assignment

09 Assignment

10 Assignment  ipynb

11 Assignment

12 Assignment

13 Assignment
