Database searching with raw sequences
- Obtain the sequences with the accession numbers
AI557731 and BG570348 via SRS
(http://genius.embnet.dkfz-heidelberg.de/menu/srs/)
- Perform a BLAST search at the NCBI
(http://www.ncbi.nlm.nih.gov/BLAST/)
against the nr database
- Compare the results. Can we assign the sequences to a specific gene?
Prediction of a human gene from genomic sequence and ESTs
Sequence retrieval
Obtain the GenBank entry of the human chromosome 16 BAC clone by SRS
http://genius.embnet.dkfz-heidelberg.de/menu/srs/
Accession number: AF001549
The complete sequence consists of 202004 bp. Cut out the bases from positions
35101 to 80100 (inclusive) and save them to a file.
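The cut can also be done programmatically once the sequence is available as a plain string. The following is a minimal sketch, not part of the original exercise: the function names `cut_region` and `format_fasta` are hypothetical, and it assumes the positions are 1-based and inclusive, as in GenBank coordinates.

```python
def cut_region(sequence, start, end):
    """Return bases start..end, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates lie outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


def format_fasta(header, sequence, width=60):
    """Format a sequence as FASTA text with fixed-width lines."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence for illustration only; replace with the BAC sequence.
    toy = "ACGTACGTAC"
    region = cut_region(toy, 3, 6)
    print(format_fasta("toy:3-6", region), end="")
```

Applied to the full BAC sequence, `cut_region(bac, 35101, 80100)` would return 80100 - 35101 + 1 = 45000 bases.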
Screening for repeat sequences
Screen the genomic DNA for repetitive elements with
RepeatMasker at
http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker or http://woody.embl-heidelberg.de/repeatmask
by uploading the file saved above and setting the return format to HTML
in the submission form.
Inspect the masked sequence.
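When inspecting the masked sequence, it can help to quantify how much was masked and where. The sketch below assumes that masked bases appear as N or X in the returned sequence (lowercase masking would need a different check). The function names are hypothetical. Note that any N already present in the unmasked input would be counted as well.

```python
def masked_fraction(sequence, mask_chars="NX"):
    """Fraction of positions whose base is one of the masking characters."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence.upper() if base in mask_chars)
    return masked / len(sequence)


def masked_intervals(sequence, mask_chars="NX"):
    """List masked runs as 1-based inclusive (start, end) tuples."""
    intervals = []
    start = None
    for pos, base in enumerate(sequence.upper(), start=1):
        if base in mask_chars:
            if start is None:
                start = pos          # a new masked run begins here
        elif start is not None:
            intervals.append((start, pos - 1))
            start = None
    if start is not None:            # run extends to the end of the sequence
        intervals.append((start, len(sequence)))
    return intervals


if __name__ == "__main__":
    toy = "ACNNNGTNN"  # toy example, not real RepeatMasker output
    print(masked_intervals(toy))
    print(round(masked_fraction(toy), 3))
```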
EST-searching
Select a 40 kb genomic region from the Drosophila genome (chromosome 3R:25863269..25903268 at http://hdflyarray.zmbh.uni-heidelberg.de/cgi-bin/gbrowse).
- Perform a BLAST search at NCBI
(http://www.ncbi.nlm.nih.gov/BLAST/)
- against ESTs
- against nr
- Perform a BLAST search
against the EST clusters
(GeneNest:
http://genenest.molgen.mpg.de)
- Inspect the EST clusters. Obtain one or more consensus sequences of the contigs
(and save to a file).
What are the differences between the results? What are the advantages and disadvantages of each search?
Gene prediction
Run gene predictions on the selected (repeatmasked?)
40 kb of genomic sequence.
- Gene Prediction Programs
- GENSCAN (
http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.cgi)
- Genie (http://www.fruitfly.org/seq_tools/genie.html)
- FGeneSH
(http://www.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind)
- Align one or several of the consensus sequences from GeneNest against the human genomic sequence using the program
SIM4 (http://pbil.univ-lyon1.fr/sim4.php).
The input sequences must be in plain text format.
- Compare the resulting protein sequences by means of a dotplot
(use Dotlet:
http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html)
- Make a dotplot to compare the consensus sequence(s) of the EST clusters (see above)
and the predicted genes
- Make a dotplot to compare the gene predictions
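To see what a dotplot encodes, the idea can be sketched in a few lines. This is a simplified illustration using exact word matches, not the windowed scoring-matrix method that Dotlet applies. The function names `dotplot` and `render` and the word length are assumptions made for the example.

```python
def dotplot(seq_a, seq_b, word=3):
    """Return (i, j) pairs (0-based) where seq_a and seq_b share an exact word."""
    # Index every word of seq_b by its start position.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.append((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the hits as text: rows follow seq_a, columns follow seq_b."""
    grid = [["." for _ in seq_b] for _ in seq_a]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    # Toy peptides for illustration only.
    a = "MKTAYIAKQR"
    b = "MKTAYIAKQR"
    print(render(a, b, dotplot(a, b)))
```

Two identical sequences give an unbroken main diagonal. Missing exons or frame shifts between two gene predictions show up as gaps or offsets in the diagonal.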
Verification and functional assignment
- Verify the above gene predictions by
- performing a BLAST search against ESTs
- searching against EST clusters with GeneNest
- What is the function of the predicted gene?
Perform a BLAST search against proteins.
- Use InterPro (http://www.ebi.ac.uk/interpro/) or the Pfam database (http://www.sanger.ac.uk/Pfam/) to screen for protein domains.
Upstream regulating sequences
Check upstream regulating sequences at the CORG web site (e.g. genes BHMT, HNF4, DYPS)
Use the TRANSFAC database to search
for potential transcription factor binding sites (online registration required).
Which strategies are used?
Alternative Splicing prediction based on EST data
For the following human EST sequences find a homologous cluster in the
GeneNest database.
Look for alternative splicing in the alignments of the consensus sequences of a cluster with
the genomic sequence at
http://splicenest.molgen.mpg.de
Optional:
- Search for ESTs by BLAST at NCBI.
- Find genomic regions (NCBI, Celera, etc.?).
- Use different gene prediction programs (are gene predictions matched by ESTs?).
- Align ESTs versus the genomic sequence using SIM4.
Analysis of tissue-specificity based on EST data
Analyze the following GeneNest EST clusters.
Which EST clusters may reflect tissue-specific genes/transcripts?
Which tissues are most important for each cluster?
Is this prediction consistent with alternative data sources, e.g. Gene Expression Atlas, SOURCE, GeneCards?
- Hs75527
- Hs90005
- Hs1852
- Hs572
- Hs1815
- Hs2062
- Hs3132
- Hs21
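One way to make "tissue-specific" concrete is to tabulate the tissue annotation of every EST in a cluster and measure how unevenly the ESTs are distributed. The sketch below is only an illustration under stated assumptions. The tissue labels are invented, the function names are hypothetical, and the score (one minus the normalized Shannon entropy over the observed tissues) is one possible choice, not the measure used by GeneNest. It does not correct for the very different sizes of EST libraries.

```python
import math
from collections import Counter


def tissue_profile(est_tissues):
    """Map each tissue to the fraction of the cluster's ESTs derived from it."""
    counts = Counter(est_tissues)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}


def specificity(profile):
    """1.0 if all ESTs come from one tissue, 0.0 if evenly spread."""
    if len(profile) < 2:
        return 1.0
    entropy = -sum(p * math.log2(p) for p in profile.values() if p > 0)
    return 1.0 - entropy / math.log2(len(profile))


if __name__ == "__main__":
    # Invented tissue labels for a hypothetical cluster.
    ests = ["liver"] * 18 + ["kidney"] * 1 + ["brain"] * 1
    profile = tissue_profile(ests)
    top = max(profile, key=profile.get)
    print(top, round(profile[top], 2), round(specificity(profile), 2))
```

A cluster with few ESTs can look tissue-specific by chance, so the score should be read together with the absolute EST counts.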

Tissue-specific alternative splicing
Run a query at http://splicenest.molgen.mpg.de/cgi-bin/ESTbase/query.cgi?Hs7 (preliminary interface) to find tissue-specific isoforms.
Use the options "brain", "Skipped Exon" and "Best display for every cluster:".
Where are the preferential locations of these alternative exons?
Search for the gene LMO7. How reliable is the prediction?
Comparing tools
What are the differences between Ensembl, UniGene, GeneNest/SpliceNest, and the TIGR gene indices?