Applied-Machine-Learning
Contacts
Annalisa Marsico
Freie Universitaet Berlin, Max Planck Institute for Molecular Genetics
marsico@molgen.mpg.de, Annalisa.Marsico@fu-berlin.de
Bernhard Renard
Robert Koch Institute Berlin
RenardB@rki.de
Additional lecturers:
Carlus Deneke
Robert Koch Institute Berlin
DenekeC@rki.de
Robert Rentzsch
Robert Koch Institute Berlin
Rentzsch@rki.de
Stefan Budach
Max Planck Institute for Molecular Genetics Berlin
budach@molgen.mpg.de
Prerequisites: Completion of the Statistics course of the Master in Bioinformatics at FU (or an equivalent course)
Maximum number of students: 20
Language: English
Time:
Lectures: Wednesday 9-11
Tutorials: Wednesday 11-12
Location: 017/A6 (Arnimallee 6)
Goals
Students will be introduced to the basic statistical and algorithmic concepts of Machine Learning, especially in the context of current research in bioinformatics, biology and biotechnology. This course focuses on specific applications and data handling rather than on theoretical concepts. Topics include:
- Regularization methods for variable selection and regression methods for feature decorrelation, with application to mass spectrometry data and cancer data
- Support Vector Machines for tumor classification based on genomic data and clinical covariates
- SVMs with string kernels to classify RNA sequences
- Artificial Neural Networks (ANNs) and Deep Learning and some recent applications in Bioinformatics
- Graphical models for signal cascade analysis and quasi-species identification
- Active learning with Random Forests applied to mass spectrometry data
- Unsupervised learning: model-based clustering of microRNA expression data
Requirements
Students will be assigned weekly exercises, which they have to complete. They will work on several practical problems and implement or use the methods learned during the lectures to extract information from biological datasets in R. Exercises are mandatory. Problem sets will be posted on this website on a weekly basis and are to be handed in at the end of the Wednesday lecture.
Completing 80% of the assignments correctly and taking turns presenting the exercise results are prerequisites for passing the course.
More concretely, there are 13 tutorials, and 80% of 13 is 10.4. Every assignment will be graded as ok, ok(-), or fail/NA:
- "ok" refers to a good assignment with only minor errors
- "ok(-)" refers to a non-failed assignment with major errors
- "fail" refers to a failed assignment; "NA" refers to a non-submitted assignment
Therefore the following scenarios are possible:
- If you get 13 ok(-) you pass
- If you get 12 ok(-) + 1 fail/NA you pass
- If you get 11 ok(-) + 2 fail/NA (and everything lower) you DON'T pass
- If you get 10 ok(-) + 1 ok + 2 fail/NA you pass
- Everything higher than this: you pass!
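The scenarios above can be sketched as a small check. The page does not state an explicit formula, so the rule below is only one interpretation that reproduces the listed outcomes: 12 or more non-failed assignments pass, and exactly 11 non-failed assignments pass only if at least one of them is a full "ok". The function name `passes` and the grade strings are illustrative assumptions, not part of the official course rules.

```python
from collections import Counter


def passes(grades):
    """Return True if a list of 13 grades passes under the assumed rule.

    Each grade is one of "ok", "ok(-)", "fail", or "NA".
    Assumed rule (an interpretation of the listed scenarios):
      - 12 or more non-failed assignments: pass
      - exactly 11 non-failed assignments: pass only with at least one "ok"
      - fewer than 11 non-failed assignments: no pass
    """
    counts = Counter(grades)
    non_failed = counts["ok"] + counts["ok(-)"]
    if non_failed >= 12:
        return True
    return non_failed == 11 and counts["ok"] >= 1


# The four scenarios listed on the page
print(passes(["ok(-)"] * 13))                         # 13 ok(-)
print(passes(["ok(-)"] * 12 + ["fail"]))              # 12 ok(-) + 1 fail/NA
print(passes(["ok(-)"] * 11 + ["fail", "NA"]))        # 11 ok(-) + 2 fail/NA
print(passes(["ok(-)"] * 10 + ["ok", "fail", "NA"]))  # 10 ok(-) + 1 ok + 2 fail/NA
```

Cases not covered by the list, such as 10 "ok" plus 3 fail/NA, are decided here by the assumed rule; in doubt, ask the lecturers.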
Literature
Hastie, Tibshirani & Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
James, Witten, Hastie & Tibshirani. An Introduction to Statistical Learning (with Applications in R). Springer, 2013.
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006
Jon Shlens. A tutorial on Principal Component Analysis. 2003
Max Kuhn. Applied Predictive Modeling. Springer, 2013
Chapelle, Schoelkopf, Zien. Semi-supervised Learning, 2006
Lecture Materials
00 Introduction slides_Renard slides_Marsico
01 Overview slides
02 Protein Function Prediction slides
03 Graphical Models slides
04 Nested Effect Models slides
05 Quasispecies Reconstruction slides
06 Active Learning slides
07 Regularization slides
08 PCA and Partial Least Squares slides
09 Logistic Regression and Gradient Descent slides
10 Support Vector Machines slides
11 String kernels, classification of biological sequences slides
12 Introduction to Neural Networks slides
13 Introduction to Deep Learning - ConvNets slides
Tutorials
01 Assignment
02 Assignment
03 Assignment
04 Assignment
05 Assignment
06 Assignment
07 Assignment
08 Assignment
09 Assignment
10 Assignment
11 Assignment
12 Assignment