PhD Projects in the IMPRS-CBSC
Prediction of transcription factor co-occurrence using rank based statistics
- Alena van Bömmel, 2014
One of the key questions in molecular biology is how cells with the same genetic code are able to differentiate into a large variety of cell types. Cell differentiation is controlled through the regulation of gene expression and, more precisely, by transcription factors (TFs) that occur in pairs. With the vast amount of genomic data at hand, one can correlate the occurrence of TFs at certain regulatory DNA sequences. Based on the affinity of TFs to DNA sequences, probable TF pairs were determined and ranked with custom R scripts. The statistics based on Spearman’s correlation, Kendall’s τ, the R2KS score and Fisher’s exact test yielded similar results.
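To make the rank-based comparison concrete, the following is a minimal sketch of two of the named statistics, Spearman's correlation and a one-sided Fisher's exact test on the top-k overlap, applied to two affinity vectors. It is written in Python with the standard library only and is not the R code used in the thesis. The affinity values, the cutoff `k`, and all function names are hypothetical and chosen for illustration; Kendall's τ and the R2KS score are omitted.

```python
from math import comb


def ranks(values):
    """1-based ranks, highest value first; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top_k_table(x, y, k):
    """2x2 counts (both, only A, only B, neither) of regions ranked in the top k."""
    rx, ry = ranks(x), ranks(y)
    both = sum(1 for p, q in zip(rx, ry) if p <= k and q <= k)
    only_a = sum(1 for p, q in zip(rx, ry) if p <= k and q > k)
    only_b = sum(1 for p, q in zip(rx, ry) if p > k and q <= k)
    neither = len(x) - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test: probability of an overlap of at least a."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, i) * comb(n - row1, col1 - i) for i in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical affinities of two TFs to the same eight regulatory regions.
aff_a = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.6]
aff_b = [0.85, 0.9, 0.2, 0.6, 0.1, 0.4, 0.15, 0.5]

rho = spearman(aff_a, aff_b)
table = top_k_table(aff_a, aff_b, k=4)
p_value = fisher_greater(*table)
print(rho, table, p_value)
```

Under these assumptions, a TF pair with a high rank correlation and a small Fisher p-value would be a candidate for co-occurrence; ranking all pairs by either statistic gives the kind of ordered list described above.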
Of much greater interest is the tissue-specific or cell-type-specific co-occurrence of TFs. Including additional information about the tissue or cell-type specificity of genomic regions and the corresponding TFs introduced a third dimension for the association measure. Because conventional 2×2 contingency tables extend more easily to three dimensions, we preferred contingency tables for predicting co-occurring TFs in a tissue-specific manner.
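One way to picture the extension to three dimensions is a 2×2×K table with one 2×2 layer per tissue. The sketch below, in Python with the standard library only, counts regions into such a table and tests each layer separately with a one-sided Fisher's exact test. Testing each layer on its own is an assumption made for illustration and may differ from the association measure used in the thesis; the tissue labels, indicator vectors, and function names are hypothetical.

```python
from math import comb


def build_table(top_a, top_b, tissue_of_region, tissues):
    """Count regions into a 2x2xK table, indexed as table[tissue][in_a][in_b]."""
    table = {t: [[0, 0], [0, 0]] for t in tissues}
    for in_a, in_b, tissue in zip(top_a, top_b, tissue_of_region):
        table[tissue][int(in_a)][int(in_b)] += 1
    return table


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test: probability of an overlap of at least a."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, i) * comb(n - row1, col1 - i) for i in range(a, upper + 1))
    return tail / comb(n, col1)


def per_tissue_p_values(table):
    """Apply the test to each tissue layer of the 2x2xK table."""
    p_values = {}
    for tissue, layer in table.items():
        both, only_a = layer[1][1], layer[1][0]
        only_b, neither = layer[0][1], layer[0][0]
        p_values[tissue] = fisher_greater(both, only_a, only_b, neither)
    return p_values


# Hypothetical indicators: 1 if a region is top-ranked for the TF, else 0,
# together with the tissue in which each region is active.
top_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
top_b = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]
tissue_of_region = ["liver"] * 6 + ["brain"] * 6

table = build_table(top_a, top_b, tissue_of_region, ["liver", "brain"])
p_values = per_tissue_p_values(table)
print(table, p_values)
```

In this toy input the two TFs share all of their top-ranked regions in one tissue and almost none in the other, so the per-layer p-values differ, which is the kind of tissue-specific signal a two-dimensional table would average away.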
Our method might be applied in future studies to predict the cell-type-specific cooperation of other regulatory factors or other important players in gene regulation, such as microRNAs, long non-coding RNAs and others. In principle, our method requires only that these factors can be represented by a ranked list of genomic regions or other informative elements. Further, we hope that the rapid development of experimental techniques will produce reliable data on co-occurring TFs in a cell-type-specific manner, which would enable the statistical validation of our predictions.