Max Planck Institute for Molecular Genetics

Vertebrate Genomics: Genetic Variation, Haplotypes & Genetics of Complex Disease Group


Projects


Our group works on the following projects:

MHC haplotype sequencing: An integrated approach to common disease (NGFN-Plus IG/ Central Research Project)

The human major histocompatibility complex (MHC) is recognized as the most important genetic region in relation to common human diseases, including inflammatory, infectious and autoimmune diseases, as well as to transplant medicine (Lechler and Warren, 2000). Major national and international genome research networks, including the largest whole-genome association study to date (WTCCC, Nature 447: 661, 2007), have now demonstrated associations between the MHC and numerous disease phenotypes of interest such as inflammatory bowel disease (IBD), psoriasis, sarcoidosis, atopic eczema, susceptibility to sepsis, asthma, type 1 diabetes, and many more. The question now is how to move from the regions of association to the underlying causal variants for functional analysis and translation into diagnostics and therapeutics.

While clinically highly informative, the complex nature of the MHC presents major challenges to genetic analysis: structural variation in the form of copy-number variations, insertions, deletions and inversions, coupled with unprecedented levels of single nucleotide polymorphisms and differing degrees of recombination and linkage disequilibrium, has made the MHC the most variable and plastic region in the human genome and has severely hampered the hunt for disease genes. Genomic sequencing is by far the most efficient and possibly the only means by which this extraordinary genetic complexity can be unravelled. The potential of this approach has already been demonstrated by sequencing a small number of haplotypes (Stewart et al., 2004, Horton et al., 2008), which led to the identification of a novel susceptibility locus for multiple sclerosis (Yeo et al., 2007). These studies also showed that, despite the large number (>70,000) of MHC variations already known, no variation plateau has yet been reached, suggesting that many more potentially disease-causing variants must exist. The recent advent of new sequencing platforms has now created the opportunity to capture and clinically harness the full variation content of the MHC by sequencing disease-associated haplotypes at the population level.

Our major goal therefore is to sequence complete MHC haplotypes conferring risk for specific diseases of prime interest to the National Genome Research Networks. This will deliver sets of ‘candidate causal variations’, the missing link between association and disease gene. This essential information will enable research groups to track down the causative variants. We are in a leading position worldwide to embark on such an effort because of our unique Haploid Reference Resource (HRR) and the availability of second-generation sequencing (NGS) technologies. The HRR consists of 100 fosmid libraries representing 200 haploid genomes, including 200 ‘homozygous’ MHC haplotypes. We have demonstrated the feasibility and validity of mapping and sequencing HRR MHC clones using NGS technology. In addition, Affymetrix 1000 K genotypic data are available for all 100 HRR DNA samples, allowing the mapping of clones into haplotypes. Moreover, four-digit HLA typing for all HRR samples confirms the presence of a broad spectrum of both risk and protective haplotypes for many common MHC-related diseases analyzed by the National Genome Research Networks and by national and international collaborators.

Haplotype approaches to disease gene discovery: A systematic investigation and establishment of reference resources (NGFN2 Systematic Methodological Platform DNA-Project)

Haplotype-based approaches to disease gene discovery have become a central theme. The ‘International HapMap Project’ has been launched (Nature 425: 758-9, 2003; Nature 426: 789-96, 2003), specifying strategies and resources to be made available to the international community. The HapMap Project relies on the assumption that the human genome can be resolved into ‘blocks’ of common haplotypes, with only a few haplotypes per block and a few SNPs necessary to tag each block, allowing genome-wide association and candidate gene studies at much higher efficiency. With a view to future lines of investigation, it has now been recognized that the evaluation of high resolution genetic variation data will be an important and necessary next step in order to 1) assist the optimisation of SNP selection, the analysis of LD and haplotype structures and the extraction of tags and 2) systematically assess the ‘completeness of the information’ (Nature 426: 793, 2003). In-depth knowledge of the amount and nature of the information that will be added will be indispensable. It will reflect critically on the power of current haplotype approaches to represent underlying LD and haplotype structures and on their validity as a tool to map causative variants. It will, moreover, critically guide the design of the ultimately successful approaches to haplotype-based disease gene discovery. Finally, it will provide the basis for informed decisions on meaningful investments in this line of research in the future.

We will perform a first systematic investigation in this direction, analysing high resolution genetic variation data in comparison to the data provided by the HapMap Project. We rely on the following prerequisites: 1) high resolution data sets obtained by the comparative sequence analysis of nuclear loci in an average of several hundred individuals including cases and controls, a significantly greater depth than achieved previously; 2) a novel, highly efficient haplotyping technology (CSH), which allows the genome-wide determination of the molecular haplotype structures of any gene or chromosomal region of interest; and 3) an (inter)national network of leading experts in haplotype analysis. We will establish a reference resource of haploid clone pools from a total of 250 individuals (500 haploid genomes) from a representative German population. We will type the same loci in the same sample of haploid genomes, using the high resolution-derived SNPs on the one hand and the HapMap-derived SNPs on the other. We will then systematically analyse and compare the LD and haplotype structures and tag SNPs derived by these two approaches. Major objectives are: 1) to evaluate to what extent the HapMap-derived SNPs and tag SNPs in fact capture the LD and haplotype structures given at the ultimate level of resolution, DNA sequence, and, in particular, candidate gene-related haplotype structures; 2) to assess which types of information will be added at increasing levels of resolution; 3) to test on given data sets whether the disease associations derived by high resolution analyses could have been captured by HapMap-derived SNPs and to what extent the evaluation of rare (disease-related) haplotypes may be of relevance. Moreover, the proposed haploid reference system provides the basis to systematically assess the correspondence of haplotype structures (both phase and block decomposition) predicted in silico with their molecular correlates. Thus, it will serve as a basis to comparatively evaluate, develop, optimise and validate algorithms, an issue of increasing importance in view of the increasingly complex data sets expected in the future.
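The comparison described above, whether a sparse set of tag SNPs captures the LD structure seen at sequence-level resolution, can be illustrated with a small sketch. It assumes phased (haploid) data coded as 0/1 alleles, uses the standard pairwise r^2 measure of LD, and applies an r^2 threshold of 0.8 as an illustrative cut-off. The function names, the toy data and the threshold are assumptions made for this example and are not taken from the project.

```python
def r_squared(haplotypes, i, j):
    """Pairwise LD (r^2) between SNP columns i and j of phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n          # allele-1 frequency at SNP i
    p_j = sum(h[j] for h in haplotypes) / n          # allele-1 frequency at SNP j
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ij - p_i * p_j                             # disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return 0.0 if denom == 0 else d * d / denom


def untagged_snps(haplotypes, tag_indices, threshold=0.8):
    """Return indices of SNPs that no tag SNP captures at r^2 >= threshold."""
    missed = []
    for s in range(len(haplotypes[0])):
        if s in tag_indices:
            continue
        best = max((r_squared(haplotypes, s, t) for t in tag_indices), default=0.0)
        if best < threshold:
            missed.append(s)
    return missed


# Toy data (hypothetical): six haploid genomes typed at four SNPs.
haps = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (0, 0, 0, 0),
]

# SNP 0 plays the role of a sparse tag set; SNPs 1-3 stand for variants
# that only appear at higher resolution.
print(untagged_snps(haps, tag_indices=[0]))
```

In this toy example SNP 1 is in complete LD with the tag, whereas the rarer SNPs 2 and 3 are not captured, which is the kind of information loss the project sets out to quantify.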

Undoubtedly, the proposed project will provide essential information for all present and future disease gene discovery projects in the NGFN that rely on genetic variation/haplotype approaches. The results will have important implications for the development of successful strategies and the investment of resources. Importantly, this project implies the establishment of a ‘community resource’, accessible to any collaborators from the national / international genome networks: a resource for the validation of haplotype structures, a reference system for the development of algorithms and a ‘permanent’ control group and reference resource for all NGFN2 SNP and haplotype-based disease association studies. Moreover, access to the molecular haplotypes of any gene/potential drug target in a population of substantial size will provide essential information to pharmaceutical and biotech companies, which will help elucidate individually different drug response and facilitate processes of drug target evaluation, prioritisation and clinical trials. Thus, it represents a key resource for pharmacogenomic approaches to drug development. The proposed haploid reference resource represents the basis 1) to test for the existence of numerous individually different forms of a gene and 2) to provide the templates for their in vitro functional characterisation. This represents a key step in the evaluation of gene function, dysfunction, the molecular basis of drug response and disease processes.

Comparative Candidate Gene Sequencing, Haplotype Analysis and Genetic Risk Profile Identification (longstanding line of research, recently funded by the NGFN Optimization Funds)

The identification of genes predisposing to human diseases is of paramount importance for understanding the molecular basis of the disease and individually different drug response, and will establish new routes to diagnosis and therapeutic advances of immense medical benefit. A key step in all strategies for disease gene identification is the comparative sequence analysis of candidate genes in patients and controls to identify those specific sequence variations (SNPs) associated with common, complex disease. The importance of haplotype-based analysis over single SNP scoring of disease gene candidates has at last been established.

The present work of the group relies on longstanding lines of research and development (since 1990, see http://www.molgen.mpg.de/~genetic-variation/ProjectProposal1990 and ‘Background’) that have focussed on the systematic analysis of human interindividual DNA sequence differences and their potential functional implications. The underlying, implicit concept that has been pursued is that of whole gene ‘causal haplotypes’: since it is the entire gene and its encoded protein that act as the units of function which potentially affect a phenotype and ultimately allow first conclusions on disease mechanisms, we will have to analyse the entire sequences of the individual genes, including their regulatory and critical intronic regions. It is therefore essential in diploid organisms to determine the specific combinations of given gene sequence variants for each of the chromosomes, defined here as haplotypes. Only the correct determination of the underlying haplotypes will allow the establishment of meaningful relationships between gene variants (SNPs), gene function and phenotype. It has now become evident that genes and the human genome are much more variable than previously thought. Our own and other studies that have systematically compared individual candidate gene sequences have revealed that single genes may contain multiple SNPs. The abundant gene variability presents major challenges to the analysis and establishment of complex genotype/haplotype-phenotype relationships against a background of high natural genome sequence diversity.

Past and present lines of research and development were/are:  

- The development of high throughput (HT) resequencing technologies to compare candidate gene sequence information in multiple individuals, specifically (automated) ‘Multiplex Sequence Comparison’, which allowed the simultaneous sequence analysis of 5 to 55 PCR products in one reaction tube; later, HT capillary sequencing was implemented.

- The prediction of haplotypes from numerous variants, first by development of a haplotype program (MULTIHAP) based on the EM algorithm, which predicts the most likely haplotype pair for each genotype in a given sample and can process a large number of variants. Additional available programs have been implemented. Recently, a program version has been developed that improves on popular methods by introducing a general complete-data-likelihood framework (Zhang J et al., in press).

- The validation of the genetic haplotypes by application of molecular genetic techniques, for instance by a combination of allele-specific PCR and generation of allele-specific products, or by (cosmid) cloning and subsequent DNA sequencing or DNA marker typing.

- The development of approaches to reduce haplotype complexity, for instance by classification of haplotypes into functionally related (or ideally functionally equivalent) groups. For instance, a hierarchical cluster analysis procedure has been applied for classification; additional clustering methods are available. Genotypes are being analysed/classified accordingly. 

- The development and application of approaches to perform haplotype-based association studies and identify those specific sequence variants, or combinations of variants (risk pattern(s)) that are associated with the disease phenotype or individually different drug response.
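The EM-based haplotype prediction mentioned in the list above can be sketched in a few lines. This is a generic textbook EM for haplotype frequencies under Hardy-Weinberg proportions, not the MULTIHAP program itself; the genotype coding (0/1/2 copies of the alternative allele per SNP), the function names and the toy data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype."""
    het = [k for k, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]      # 0 -> 0, 2 -> 1; het sites set below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for k, b in zip(het, bits):
            h1[k], h2[k] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_weight(pair, freq):
    """Probability of an unordered haplotype pair under Hardy-Weinberg proportions."""
    a, b = pair
    return (1 if a == b else 2) * freq.get(a, 0.0) * freq.get(b, 0.0)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-9):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}          # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior probability of each compatible pair
            weights = [pair_weight(p, freq) for p in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts become the new frequencies
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


def most_likely_pair(genotype, freq):
    """Most probable haplotype pair for one genotype, given frequencies."""
    return max(compatible_pairs(genotype), key=lambda p: pair_weight(p, freq))


# Toy sample (hypothetical): two SNPs, five individuals, one double heterozygote.
genotypes = [(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)]
freqs = em_haplotype_frequencies(genotypes)
print(most_likely_pair((1, 1), freqs))
```

The double heterozygote is ambiguous on its own; the homozygous individuals in the sample pull the estimate towards the pair of haplotypes that is already common. Because the number of compatible pairs doubles with every additional heterozygous site, a sketch like this is only practical for short SNP sets.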

At present, substantial data sets from the comparative candidate gene sequence analysis in an average of several hundred individuals are being tested and analysed in collaboration with international institutions with respect to given genetic variation, underlying haplotype structures and the extraction of phenotypically relevant patterns of variants.
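The haplotype classification step listed above (reducing haplotype complexity by hierarchical cluster analysis) might look roughly like the following sketch. It uses average-linkage agglomerative clustering on the Hamming distance between haplotype sequences. Both choices are assumptions for illustration: the source does not state which distance or linkage was used, and sequence similarity is only a stand-in for the functional relatedness the group aims at.

```python
def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_haplotypes(haplotypes, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of haplotype indices."""
    clusters = [[i] for i in range(len(haplotypes))]

    def avg_dist(c1, c2):
        total = sum(hamming(haplotypes[i], haplotypes[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (ties resolved by lowest indices).
        _, a, b = min(
            (avg_dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy haplotypes (hypothetical), written as alleles at five variant sites.
haps = ["AAGTC", "AAGTT", "CCGAA", "CCGAT", "AAGTC"]
print(cluster_haplotypes(haps, n_clusters=2))
```

Genotypes can then be described by the pair of groups their two haplotypes fall into, which reduces the number of categories entering an association analysis.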

Genetics and Pharmacogenomics of Obesity (BMBF ‘BioProfile Nutrigenomics Program’ Potsdam/ Berlin)

The World Health Organisation (WHO) has declared obesity the largest global chronic health problem in adults. The burden it imposes on public health and economy is enormous. Therefore, there is a tremendous need for more effective drugs and diagnostics that allow optimised therapy, prediction and prevention of this disease, as well as prediction of individual treatment outcome.

The major goals of this project are: 1) the identification of disease genes/key molecules involved in the development of obesity, specifically, its molecular phenotypes; 2) the identification of novel drug targets that allow development of innovative, highly efficient therapeutics and, in consequence, disease prevention; 3) the identification of genetic markers that allow prediction of specific molecular disease phenotypes and/or individual responsiveness to pharmaceutical drugs, dietary habits/nutrition and surgical measures for food restriction, which will provide the basis for the development of diagnostic procedures/chips; 4) the development of highly cost-efficient high throughput genotyping technologies that facilitate disease gene discovery and whole genome association studies. To achieve these goals, unique resources and expertise from investigators at major research and clinical institutions have been combined with complementary industrial resources, technologies and know-how. Competitive advantages include: 1) unique, high quality clinical material and a notable body of data to draw from; these point to specific molecules/functional pathways involved and promise innovative approaches to drug target identification and treatment, for instance the modulation of pre-adipocyte differentiation and adipocyte biology; 2) key/high throughput technologies in all relevant areas of molecular genetic variation analysis, functional genomics and bioinformatics. Importantly, the collaborators constituting this network have already shown themselves capable of generating positive results in terms of potential genetic risk factors, drug target candidates and genetic variants of predictive value. Thus, we anticipate that the necessary spectrum of marketable results, products and technologies can be generated within the funding period for the purpose of subsequent commercialisation.
The proposed lines of research and development are designed to prepare the ground for the establishment of a company that will integrate the following components: 1) a patient-oriented ‘Service/Competence Center’ delivering molecular phenotyping, genotyping and diagnostic services; 2) a ‘Medical Consulting Center’ for disease treatment and prevention; 3) an extensive clinical DNA resource as the basis for industrial co-operations and disease gene discovery; 4) a ‘Center for Clinical Development and Drug Trials’ and 5) a genotyping/sequencing/high throughput technology platform, serving industrial co-operations and in house disease gene discovery. First steps towards the foundation of an ‘Integrated Medical Genomics’ company have been taken. A strong interest in several of these components has already been expressed by both pharmaceutical and biotech companies.

External funding

BMBF/NGFN-Plus: MHC Haplotype Sequencing: An integrated approach to common disease.

BMBF/NGFN2: Haplotype approaches to disease gene discovery: A systematic investigation and establishment of reference resources.

BMBF/NGFN (Optimization Funds): Comparative Candidate Gene Sequencing, Haplotype Analysis and Genetic Risk Profile Identification.

BMBF BioProfile Potsdam Berlin: Genetics and Pharmacogenomics of Obesity.

German-Israeli Foundation (GIF): Haplotyping and Association Algorithms and their Applications to Model Disease Genes.

BMBF BioProfile Potsdam Berlin: Joint project ‘Innovation des Therapiekonzeptes für das Metabolische Syndrom’ (Innovation of the Therapy Concept for the Metabolic Syndrome), subproject: Haplotype analysis.

GlaxoSmithKline Award: Analysis of high-resolution genetic variation data with particular emphasis on haplotype structures and LD patterns.