Chung lab / Max Planck Research Group Epigenomics
Despite their constant genome sequence, cells of multicellular organisms have different morphologies and functions due to the execution of distinct gene expression programs. In this context, transcriptional regulation is very important, as it controls the production rate of mRNAs, which together with the degradation rate determines the steady-state level of mRNAs. Transcriptional control depends on the action of transcription factors, which bind to distinct DNA sequences in so-called cis-regulatory elements. These binding events in turn influence the recruitment and activity of RNA polymerases.
In eukaryotes, both the binding of transcription factors and transcription itself take place in the context of chromatin. The major repeating unit of chromatin is the nucleosome, which consists of two copies each of the four core histones H2A, H2B, H3 and H4 (histone octamer) and 147 base pairs of DNA, which is wrapped around the histones in a flat left-handed superhelix. Nucleosomes form every ~200 base pairs along the complete length of the chromosomal DNA. The mere presence of nucleosomes modulates the accessibility of specific DNA sequences, like promoters and other cis-regulatory sequences. Furthermore, histones are frequently modified by covalent addition of, for example, acetyl or methyl groups. These histone modifications can influence the stability of the DNA-histone complex and/or may serve as binding sites for protein complexes. Thus, histone modifications on the one hand may be read out during processes acting on chromatin, like transcription, or on the other hand may constitute a memory of past regulatory decisions. Hence, unraveling the cis- and trans-determinants of nucleosome positioning and the role of histone modifications in transcription are central questions of biology in the post-genomic era.
In my previous work, I have started to address these questions, namely (1) the impact of the DNA sequence on nucleosome positioning and (2) the role of histone modifications in transcription. In my newly founded group I would like to further investigate these questions theoretically and, most importantly, also experimentally.
Sequence preferences of the histone octamer
The central assumption in this (still ongoing) project is that the DNA sequence of transcription factor bound genomic regions (referred to as cis-regulatory elements) disfavours nucleosome formation, such that once the sequence preferences of histones are known, cis-regulatory regions may be readily identifiable from the DNA sequence alone. In a first study, we made use of publicly available data that measured nucleosome positions in yeast by means of chromatin immunoprecipitation followed by sequencing, where mono-nucleosomal-sized DNA fragments were generated by digestion with micrococcal nuclease (MNase). The analysis of this data revealed two signals that help to predict nucleosome positions determined in vivo by this method: (1) an overall enrichment of G or C bases in the nucleosomal DNA and (2) a periodic enrichment of A or T bases and an out-of-phase enrichment of C or G bases with a period of 10 bases. The analysis also showed that the DNA sequence directs nucleosome formation to a minor but significant degree.
Later we recognized that the preference for GC base pairs (signal 1) may be due to the experimental procedure used to obtain nucleosomal DNA fragments, i.e. the digestion of chromatin with MNase. MNase is well known to cut DNA almost exclusively at AT base pairs and is also known to cut nucleosomal DNA (albeit to a lesser degree than linker DNA). These properties together with the size-selection step may lead to an artificial increase in GC-rich DNA fragments, because those have a lower probability of internal cuts. In other words, it is much more likely to recover a GC-rich 150 base pair fragment than an AT-rich one, even in the absence of nucleosomes. To test this, we performed an experiment in which we digested naked yeast genomic DNA with MNase, selected ~150 base pair fragments and end-sequenced these using second-generation sequencing. In line with our hypothesis, we found that the recovered DNA fragments were indeed GC-rich. Moreover, the corresponding coverage profile was well correlated with the nucleosome occupancy profiles obtained both in vitro and in vivo, suggesting that these measurements are heavily biased by the sequence preferences of MNase.
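The size-selection argument above can be illustrated with a small sketch. It assumes, purely for illustration, that MNase cuts each base independently with a fixed probability that is much higher at A/T than at G/C; the function name and the two probabilities are hypothetical and not measured values.

```python
import math


def survival_probability(seq, p_at=0.02, p_gc=0.001):
    """Probability that a fragment with sequence `seq` receives no internal cut.

    Assumption (not from the report): every base is cut independently,
    with probability `p_at` at A/T and `p_gc` at any other base.
    Both values are illustrative placeholders.
    """
    log_p = 0.0
    for base in seq.upper():
        p_cut = p_at if base in "AT" else p_gc
        # Sum log(1 - p_cut) to avoid underflow on long fragments.
        log_p += math.log1p(-p_cut)
    return math.exp(log_p)


# Two artificial 150 bp fragments at the extremes of base composition.
gc_rich = "GC" * 75
at_rich = "AT" * 75

print(survival_probability(gc_rich))
print(survival_probability(at_rich))
```

Under these assumptions the GC-rich fragment is far more likely to survive digestion intact and pass the size selection, which is the bias described above; no nucleosome is involved in the calculation.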
Histone modifications and transcription
It is well established that the presence of certain histone modifications is correlated with transcriptional activity. To elucidate the relationship between histone modifications and gene expression in humans, we made use of a publicly available data set that measured the abundance of 38 histone modifications and one histone variant in human CD4+ T-cells. We derived very simple models that relate the levels of histone modifications present at a promoter-proximal region to the expression level. The analysis of this data revealed a stable relationship between histone modifications and gene expression, which allows gene expression to be predicted from the levels of histone modifications in one cell type using a model trained in another cell type, suggesting that the relationships we uncovered are general. Moreover, we showed that only a small number of histone modifications are necessary for accurate prediction. Starting from 39 modifications, we could model gene expression almost as accurately by using only three modifications. An over-representation analysis identified H3K27ac, H2BK5ac, H3K79me1 and H4K20me1 as most important. This result suggests that there is a lot of redundancy in the information contained in the histone modifications and that we have possibly identified modifications that are crucial during the transcriptional process. Finally, we found that the important histone modifications differ between two promoter types, namely CpG island promoters and non-CpG island promoters. While in CpG island promoters H3K27ac (and H2BK5ac) and H4K20me1 turned out to be most important, in non-CpG island promoters H3K4me3 and H3K79me1 were identified. This result suggests that these two promoter types are regulated by different mechanisms.
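The idea of training a simple model in one cell type and applying it to another can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares linear model on three modification levels; the data are synthetic, and the weights, noise level and function names are hypothetical rather than taken from the analysis described above.

```python
import numpy as np


def fit_linear_model(levels, expression):
    """Least-squares fit of expression on modification levels plus an intercept."""
    design = np.column_stack([np.ones(len(levels)), levels])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] the per-modification weights


def predict(coef, levels):
    """Predicted expression for a matrix of modification levels."""
    return coef[0] + levels @ coef[1:]


# Synthetic stand-in data: both "cell types" share the same hypothetical weights.
rng = np.random.default_rng(0)
true_weights = np.array([0.8, 0.5, -0.3])


def simulate_cell_type(n_promoters):
    levels = rng.normal(size=(n_promoters, 3))
    noise = rng.normal(scale=0.1, size=n_promoters)
    expression = 1.0 + levels @ true_weights + noise
    return levels, expression


levels_a, expr_a = simulate_cell_type(200)  # training cell type
levels_b, expr_b = simulate_cell_type(200)  # held-out cell type

coef = fit_linear_model(levels_a, expr_a)
r = np.corrcoef(predict(coef, levels_b), expr_b)[0, 1]
print(f"Pearson r in held-out cell type: {r:.3f}")
```

With real data, the columns of `levels` would be measured modification signals at promoter-proximal regions. A high correlation in the held-out cell type is what would indicate that the fitted relationship is general.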
Sequence preferences of the histone octamer
The current state-of-the-art technique to map nucleosomes on a genome-wide scale is based on the assumption that the presence of nucleosomes protects the underlying DNA from the cutting activity of micrococcal nuclease. In this scenario, the number of reads should correspond to the proportion of cells that have a nucleosome covering the protected DNA fragment. However, we have shown that the number of reads is highly correlated between digestions of chromatin and naked DNA via the GC content of the underlying DNA fragments. This high correlation can be explained by two scenarios: (i) the histone octamer prefers GC-rich regions and micrococcal nuclease has just the opposite specificity, or (ii) the histone octamer has no preference for GC-rich regions and what we observe is a bias due to the experimental technique.
The current method is not able to distinguish between these two scenarios. We have established that the enrichment of GC-rich DNA fragments is due to the size-selection step. Thus, we are currently developing a method to map nucleosome positions without size selection. This is accomplished by isolating the cut sites and comparing the cutting frequencies of micrococcal nuclease in digestions of chromatin with those in naked DNA. Genomic regions bound by the histone octamer will display a reduced cutting frequency compared to the naked DNA sample, while there is no difference in linker regions. In effect, this technique shifts the paradigm from indirect evidence of protection (the ability to recover a certain DNA fragment) to direct evidence (a reduced cutting frequency). To critically challenge the resulting nucleosome map, we will use other nucleases with different sequence specificities and compare the resulting maps. However, this technique requires a much higher coverage than the previous method. Thus, we will use yeast with its small genome as a model system.
Once the data has been generated, we will be able to (i) estimate the probability that a nucleosome is bound to a region, (ii) calculate the underlying potential energies from these probabilities using methods developed in statistical mechanics, and (iii) derive a model of the sequence preferences of the histone octamer independent of the effect of statistical positioning.
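The comparison of cutting frequencies described above might be scored along the following lines. This is only a sketch under stated assumptions: cut counts per position are already available for both samples, each sample is normalized by its total number of cuts, and a pseudocount guards against division by zero. The function name, the pseudocount and the toy counts are all hypothetical.

```python
import math


def protection_scores(chromatin_cuts, naked_cuts, pseudocount=1.0):
    """Per-position log2 ratio of naked-DNA to chromatin cutting frequency.

    Positive scores mean fewer cuts in chromatin than in naked DNA,
    i.e. candidate protection by a histone octamer.
    Assumption: normalizing by total cuts per sample is adequate.
    """
    if len(chromatin_cuts) != len(naked_cuts):
        raise ValueError("both samples must cover the same positions")
    n_pos = len(chromatin_cuts)
    chrom_total = sum(chromatin_cuts) + pseudocount * n_pos
    naked_total = sum(naked_cuts) + pseudocount * n_pos
    scores = []
    for chrom, naked in zip(chromatin_cuts, naked_cuts):
        chrom_freq = (chrom + pseudocount) / chrom_total
        naked_freq = (naked + pseudocount) / naked_total
        scores.append(math.log2(naked_freq / chrom_freq))
    return scores


# Toy counts: positions 3-6 are cut less often in chromatin than in naked DNA.
naked = [10] * 10
chromatin = [10, 10, 10, 1, 1, 1, 1, 10, 10, 10]
scores = protection_scores(chromatin, naked)
print([round(s, 2) for s in scores])
```

Because each sample is scaled by its own total, unprotected positions come out slightly negative in this toy example rather than exactly zero. A real analysis would need a normalization that leaves linker regions unchanged, for instance one calibrated on regions known to be nucleosome-free.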
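Step (ii) can be illustrated in its simplest form. The sketch below assumes an isolated two-state site (bound or unbound) in thermal equilibrium, so that the binding probability is p = 1 / (1 + exp(E / kT)) and the energy follows by inversion. This deliberately ignores steric exclusion between neighbouring nucleosomes, which is exactly the statistical-positioning effect the planned model has to separate out, so it is not the method referred to above. The function name is hypothetical.

```python
import math


def binding_energy_kt(p_bound):
    """Energy of the bound state in units of kT for a two-state site.

    Inverts p = 1 / (1 + exp(E / kT)), giving E / kT = ln((1 - p) / p).
    Negative values indicate favourable binding.
    Assumption: sites are independent (no nucleosome-nucleosome exclusion).
    """
    if not 0.0 < p_bound < 1.0:
        raise ValueError("p_bound must lie strictly between 0 and 1")
    return math.log((1.0 - p_bound) / p_bound)


for p in (0.1, 0.5, 0.9):
    print(p, binding_energy_kt(p))
```

A site occupied half of the time has zero energy in this convention, and higher occupancy maps to more negative energy.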
Towards a histone code of transcription
Our previous work has shown that histone modifications and transcription are very well correlated. In fact, one can use the levels of a few histone modifications measured at a promoter-proximal region to infer the expression level of the corresponding genes. This implies that the histone modifications are tightly connected to the regulatory network that controls the activity of RNA polymerase II (Pol II). However, we were not able to establish the precise relationships between histone modifications and Pol II, i.e. whether these histone modifications act up- and/or downstream of Pol II.
To gain insight into the changes that take place during the transcription cycle, we plan to measure the levels of histone modifications and RNA polymerase II phosphorylation states after the induction of transcription in a time-resolved manner by chromatin immunoprecipitation followed by sequencing. Here, we will make use of the model system Drosophila melanogaster. During embryogenesis of Drosophila there is a unique time point, the so-called maternal-to-zygotic transition, at which ~1500 genes are induced. This transition occurs in the interphase of cell cycle 14, during which cellularization also takes place, such that the degree of cellularization can be used as a proxy for time. The embryos will be collected in close collaboration with the group of Bodo Lange. We are currently developing methods to sort Drosophila embryos by morphological characteristics. Since the amount of starting material for chromatin immunoprecipitation is the main limiting factor, we plan to automate the sorting. This will be done by a microfluidic approach coupled to a microscope with a high-performance digital camera. The images will be used to classify the embryos and to sort them accordingly. Furthermore, we are exploring means to replace the conventional microscope with an optofluidic microscope, which can be built at low cost. The latter approach will allow the sorting to be parallelized to obtain even higher amounts of “pure” material. At the other end, we are testing experimental procedures to lower the amount of starting material needed for chromatin immunoprecipitation. Drosophila (as a model system) allows for testing the effect of the removal of certain chromatin modifiers as well as mutations in the histones themselves on the transcriptional process, such that hypotheses formulated from the time course data can be tested directly.
In case we are not able to gather enough starting material for the chromatin immunoprecipitation experiment in Drosophila, we plan to use human tissue culture cells that are synchronized in the cell cycle. During mitosis, transcription stops; it recommences after cell division. Thus, by isolating cells that have just completed cell division, we could effectively synchronize transcription (at least for some time).
With this approach we will be able to uncover the dynamics of histone modifications during transcription in relation to changes in the phosphorylation status of RNA polymerase II. Thus, we will be able to unravel the cause-effect relationship between histone modifications and transcription, which in the long run will establish a histone code of transcription.
German contribution to the International Human Epigenome Consortium
Our group is part of a BMBF (Federal Ministry for Education and Research) funded consortium entitled “Deutsches Epigenom Programm – DEEP”. Our part in this project is the generation of 34 histone modification maps for cells involved in inflammatory diseases, the downstream analysis of the data, and the integration with other data types such as DNase I hypersensitivity, DNA methylation etc., together with the group of Martin Vingron at the institute.
For the data generation, we will implement quality-controlled standard operating procedures for cell-type dependent chromatin isolation, chromatin immunoprecipitation and sequencing. Chromatin immunoprecipitation will be performed by a ChIP robot to ensure maximal reproducibility independent of the operator. The same robot will also prepare the libraries for sequencing. Sequencing will be performed in collaboration with the in-house sequencing unit headed by Bernd Timmermann.
The whole process of data generation will be monitored by an expert bioinformatician. The main emphasis will lie on the critical evaluation of data quality and reproducibility. Furthermore, we will implement (together with our project partners) methods to transfer the data to the data center at the DKFZ in Heidelberg. Finally, we will participate in the downstream analysis and data integration (i) to identify epigenetic markers for certain disease states and (ii) to unravel epigenetic mechanisms underlying the emergence of the disease state.