Database searching with raw sequences
- Obtain the sequence with accession number
AW951311 using SRS
(http://genius.embnet.dkfz-heidelberg.de/menu/srs/)
- Perform a blast search at the NCBI
(http://www.ncbi.nlm.nih.gov/BLAST/)
- against the EST database
- against the vector database
- Compare the results
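A hit against the vector database usually means that part of the EST is cloning-vector sequence rather than transcript. Below is a minimal sketch for removing such regions. It assumes the vector hits have already been read off the BLAST report as 1-based, inclusive query intervals; `trim_vector` and the example read are hypothetical.

```python
def trim_vector(seq: str, vector_hits: list[tuple[int, int]]) -> str:
    """Return the longest stretch of seq not covered by any vector hit.

    vector_hits holds (start, end) query coordinates, 1-based and inclusive,
    as printed in a BLAST report; reversed pairs are accepted.
    """
    covered = [False] * len(seq)
    for start, end in vector_hits:
        lo, hi = min(start, end), max(start, end)
        for i in range(lo - 1, min(hi, len(seq))):
            covered[i] = True

    best_start, best_len, run_start = 0, 0, None
    # A trailing True acts as a sentinel that closes the final uncovered run.
    for i, is_covered in enumerate(covered + [True]):
        if not is_covered and run_start is None:
            run_start = i
        elif is_covered and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return seq[best_start:best_start + best_len]


# Hypothetical read: 5 vector bases, 12 insert bases, 4 vector bases.
read = "GGGGG" + "ACGTACGTACGT" + "TTTT"
print(trim_vector(read, [(1, 5), (18, 21)]))  # ACGTACGTACGT
```

Keeping only the longest clean stretch is a deliberately simple policy. It discards any short insert fragment that lies on the far side of an internal vector hit.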
Prediction of a human gene from genomic sequence and ESTs
Sequence retrieval
Obtain the GenBank entry of the human chromosome 16 BAC clone using SRS
http://genius.embnet.dkfz-heidelberg.de/menu/srs/
Accession number: AF001549
The complete sequence consists of 202004 bp. Copy the bases from positions
35101 to 80100 (45 kb) and save them to a file.
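If you prefer not to copy the bases by hand, you can cut the region out of the GenBank flat file directly. Below is a sketch that assumes the entry was saved as text. It reads only the ORIGIN section, and the helper names are hypothetical. The toy record stands in for the real AF001549 entry.

```python
import re


def genbank_sequence(text: str) -> str:
    """Concatenate the bases listed after the ORIGIN line of a GenBank flat file."""
    in_origin, parts = False, []
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop the leading position number and the spaces between 10-base blocks.
            parts.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(parts).upper()


def subsequence(seq: str, start: int, end: int) -> str:
    """Bases start..end in 1-based, inclusive coordinates (GenBank convention)."""
    return seq[start - 1:end]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


toy = "LOCUS       TOY\nORIGIN\n        1 acgtacgtac gtacgtacgt\n       21 aaaa\n//\n"
full = genbank_sequence(toy)
print(subsequence(full, 3, 6))  # GTAC
# On the real entry, subsequence(full, 35101, 80100) yields 80100 - 35101 + 1 = 45,000 bp.
```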
Screening for repeat sequences
Screen the genomic DNA for repetitive elements using
RepeatMasker at
http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker or http://woody.embl-heidelberg.de/repeatmask
by uploading the above file and setting the return format to HTML
in the submission form.
Inspect the masked sequence.
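By default, RepeatMasker replaces repeat bases with N; lowercase soft-masking is an option. The sketch below assumes the hard-masked output has been saved as a plain sequence string. It lists the masked stretches and reports the masked fraction. Unsequenced gaps in the original entry are also N, so compare the result against the unmasked input. The function names are hypothetical.

```python
import re


def masked_intervals(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """1-based, inclusive intervals of runs of N, i.e. hard-masked bases."""
    return [(m.start() + 1, m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


def masked_fraction(seq: str) -> float:
    """Fraction of positions that are N."""
    return sum(base in "Nn" for base in seq) / len(seq) if seq else 0.0


masked = "ACGTNNNNNACGTNNA"
print(masked_intervals(masked))  # [(5, 9), (14, 15)]
print(masked_fraction(masked))   # 0.4375
```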
EST searching
- Perform a BLAST search against human ESTs at NCBI
(http://www.ncbi.nlm.nih.gov/BLAST/)
- with the unmasked BAC clone sequence
- with the repeat-masked BAC clone sequence
Discuss the differences between the two sets of hits.
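The two hit lists can be compared as sets. The sketch assumes each search was saved as a tab-separated hit table whose second column is the subject (EST) identifier, as in BLAST+ `-outfmt 6`. The function names and example rows are hypothetical. ESTs found only with the unmasked query typically share nothing but a repeat element with the BAC clone.

```python
def subject_ids(hit_table: str) -> set[str]:
    """Subject IDs from a tab-separated BLAST hit table; '#' lines are comments."""
    ids = set()
    for line in hit_table.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t")[1])  # column 2 = subject identifier
    return ids


def compare_hits(unmasked: str, masked: str) -> dict[str, set[str]]:
    a, b = subject_ids(unmasked), subject_ids(masked)
    return {"only_unmasked": a - b, "only_masked": b - a, "both": a & b}


unmasked = "bac\tEST1\t99.0\nbac\tEST2\t98.0\nbac\tEST3\t97.0\n"
masked = "# BLASTN hits\nbac\tEST3\t97.0\nbac\tEST4\t95.0\n"
print(sorted(compare_hits(unmasked, masked)["only_unmasked"]))  # ['EST1', 'EST2']
```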
- Perform a BLAST search with the repeat-masked BAC clone
against the EST clusters
(GeneNest: http://genenest.molgen.mpg.de)
- Inspect the EST clusters. Obtain one or more consensus sequences of the contigs
and save them to a file.
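GeneNest computes the contig consensus itself. To show what a consensus represents, here is a deliberately simplified majority-rule version. It works over reads that are already placed in common alignment coordinates, with '-' for gaps and ' ' for uncovered positions. This is an illustration, not GeneNest's algorithm. `majority_consensus` is hypothetical, and ties are resolved in favour of the first symbol seen.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Column-wise majority vote over reads in common alignment coordinates.

    '-' marks an alignment gap, ' ' marks a position a read does not cover.
    Columns where the gap wins are dropped; columns no read covers become N.
    """
    width = max(len(s) for s in aligned)
    consensus = []
    for col in range(width):
        symbols = [s[col] for s in aligned if col < len(s) and s[col] != " "]
        if not symbols:
            consensus.append("N")
            continue
        winner, _ = Counter(symbols).most_common(1)[0]
        if winner != "-":
            consensus.append(winner)
    return "".join(consensus)


print(majority_consensus(["ACGTA", "ACCTA", "ACGTA"]))  # ACGTA
```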
Gene prediction
Use GENSCAN and Genie to make gene predictions on the selected 45 kb of the BAC clone
(should the repeat-masked sequence be used?).
- Gene prediction programs:
- GENSCAN (http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.cgi)
- Genie (http://www.fruitfly.org/seq_tools/genie.html)
- FGenes (http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html)
- Align one or more of the GeneNest consensus sequences against the human genomic sequence using
SIM4 (http://pbil.univ-lyon1.fr/sim4.html); supply the input sequences in plain text format.
- Compare the predicted protein sequences by means of dotplots
(use Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html):
- make a dotplot comparing the consensus sequence(s) of the EST clusters (see above)
with the predicted genes
- make a dotplot comparing the gene predictions with each other
Verification and functional assignment
- Verify the above gene predictions by
- performing a BLAST search against ESTs
- searching against EST clusters with GeneNest
- What's the function of the predicted gene?
Perform a BLAST search against proteins.
- Use the Pfam database to screen for protein domains (http://www.sanger.ac.uk/Pfam/).
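GENSCAN already reports the predicted peptide. If you only have exon coordinates, you can assemble and translate the coding sequence before the protein BLAST and Pfam searches. The sketch uses the standard genetic code. It assumes forward-strand exons in 1-based, inclusive coordinates that begin at the first coding base. Minus-strand genes would need reverse-complementing first. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cds_from_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate forward-strand exons (1-based, inclusive) in genomic order."""
    return "".join(genomic[s - 1:e] for s, e in sorted(exons))


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate codon by codon; ambiguous codons (e.g. with N) become X."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


genomic = "CC" + "ATGG" + "TTTTT" + "CATAA" + "CC"   # hypothetical two-exon toy gene
print(translate(cds_from_exons(genomic, [(3, 6), (12, 16)])))  # MA
```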
Other sources (optional)
- Find the genomic region in the assembly of the public human genome project
(http://genome.ucsc.edu/). Use the accession number (AF001549) as a query.
- Find the genomic region in the Celera assembly (http://www.celera.com/)
(online registration required).
- Find the gene in the Ensembl project (http://www.ensembl.org).
- Compare the gene prediction with the mRNA with accession number
AJ272050.
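To put a number on the comparison, gene-finder evaluations commonly use nucleotide-level sensitivity and specificity. Sensitivity is the fraction of reference exon bases that were predicted. Specificity is the fraction of predicted exon bases that lie in reference exons; this field uses the word for what is elsewhere called precision. The sketch assumes both exon sets use the same genomic coordinates, for example mRNA exons obtained by aligning AJ272050 with SIM4. The exon coordinates below are hypothetical.

```python
def covered(exons: list[tuple[int, int]]) -> set[int]:
    """All genomic positions covered by a list of 1-based, inclusive exons."""
    positions = set()
    for start, end in exons:
        positions.update(range(min(start, end), max(start, end) + 1))
    return positions


def nucleotide_accuracy(predicted, reference) -> tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level."""
    p, r = covered(predicted), covered(reference)
    true_positives = len(p & r)
    sensitivity = true_positives / len(r) if r else 0.0
    specificity = true_positives / len(p) if p else 0.0
    return sensitivity, specificity


reference = [(1, 100), (201, 300)]   # e.g. exons of the mRNA
predicted = [(51, 100), (201, 300)]  # e.g. exons from one gene finder
print(nucleotide_accuracy(predicted, reference))  # (0.75, 1.0)
```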
Prediction of a mouse gene by homology
Sequence retrieval
- Retrieve the sequence
BB019265
- Find the sequence in the GeneNest database, either by
BLAST or by querying with the accession number.
- Screen the "nr" database at NCBI with
- the sequence of BB019265 (tip: also try repeat masking)
- the consensus sequence from GeneNest
- compare the results.
- Screen the genomic "htgs" database at NCBI with
the consensus sequence from GeneNest
Gene Prediction
- Run gene prediction on a genomic sequence found with the GeneNest consensus and compare the results with the
sequences of the genes retrieved from the "nr" search.
- Cut out a 10 kb region from the genomic sequence. Use the homologous rat gene for the
2-oxoglutarate carrier to run homology-based gene prediction with GeneWise
(http://www.sanger.ac.uk/Software/Wise2/genewiseform.shtml).
- Compare the different gene predictions.
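Beyond visual inspection, you can compare two predictions exon by exon. The sketch assumes each program's exons have been transcribed into 1-based, inclusive coordinates on the same 10 kb region. The GeneWise and GENSCAN coordinates below are invented for illustration, and `compare_exons` is hypothetical.

```python
def overlaps(x: tuple[int, int], y: tuple[int, int]) -> bool:
    return x[0] <= y[1] and y[0] <= x[1]


def compare_exons(pred_a, pred_b) -> dict[str, list[tuple[int, int]]]:
    """Exon-level comparison of two predictions in the same coordinates.

    identical:     same boundaries in both predictions
    overlapping:   exons of A overlapping an exon of B with different boundaries
    only_a/only_b: exons with no overlapping counterpart in the other prediction
    """
    a, b = set(pred_a), set(pred_b)
    result = {"identical": sorted(a & b), "overlapping": [], "only_a": [], "only_b": []}
    for ex in sorted(a - b):
        key = "overlapping" if any(overlaps(ex, other) for other in b) else "only_a"
        result[key].append(ex)
    for ex in sorted(b - a):
        if not any(overlaps(ex, other) for other in a):
            result["only_b"].append(ex)
    return result


genewise = [(100, 200), (300, 400), (600, 700)]
genscan = [(100, 200), (320, 400), (900, 950)]
print(compare_exons(genewise, genscan))
```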
Alternative splicing prediction based on EST data
For each of the following human EST sequences, find a homologous cluster in the
GeneNest database.
Look for alternative splicing in the alignments of a cluster's consensus sequences with
the genomic sequence at SpliceNest:
http://splicenest.molgen.mpg.de
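Alternative splicing shows up as introns that differ between transcripts aligned to the same genomic locus. The sketch assumes each transcript's exons are given as 1-based, inclusive genomic coordinates on the forward strand, and the example transcripts are hypothetical. An intron found in only one transcript flags an event. An exon lying entirely inside another transcript's intron is a candidate skipped exon.

```python
def intron_set(exons: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Introns between consecutive exons (1-based, inclusive, forward strand)."""
    exons = sorted(exons)
    return {(end1 + 1, start2 - 1) for (_, end1), (start2, _) in zip(exons, exons[1:])}


def alternative_introns(tx_a, tx_b):
    """Introns present in only one of the two transcripts."""
    ia, ib = intron_set(tx_a), intron_set(tx_b)
    return sorted(ia - ib), sorted(ib - ia)


def skipped_exons(tx_a, tx_b):
    """Exons of tx_a that fall entirely inside an intron of tx_b."""
    ib = intron_set(tx_b)
    return [ex for ex in sorted(tx_a) if any(s <= ex[0] and ex[1] <= e for s, e in ib)]


long_form = [(1, 100), (201, 300), (401, 500)]
short_form = [(1, 100), (401, 500)]
print(alternative_introns(long_form, short_form))  # ([(101, 200), (301, 400)], [(101, 400)])
print(skipped_exons(long_form, short_form))        # [(201, 300)]
```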
Optional:
- Search for ESTs by BLAST at NCBI.
- Find the genomic regions (NCBI, Celera, etc.).
- Use different gene prediction programs (do the gene predictions match the ESTs?).
- Align ESTs versus the genomic sequence using SIM4.
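SIM4 reports the exons of a spliced alignment as coordinate pairs on the EST and on the genomic sequence. The sketch below takes the genomic exon coordinates and checks each intron against the canonical GT...AG rule. It assumes 1-based, inclusive coordinates on the forward strand. The coordinates and sequence are hypothetical, not parsed from real SIM4 output. GC...AG introns also occur as a minor non-canonical class, so a mismatch is worth inspecting rather than discarding.

```python
def introns(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Introns between consecutive genomic exons (1-based, inclusive, forward strand)."""
    exons = sorted(exons)
    return [(end1 + 1, start2 - 1) for (_, end1), (start2, _) in zip(exons, exons[1:])]


def splice_sites(genomic: str, exons: list[tuple[int, int]]):
    """Per intron: (start, end, length, donor dinucleotide, acceptor dinucleotide, GT-AG?)."""
    report = []
    for start, end in introns(exons):
        intron = genomic[start - 1:end].upper()
        donor, acceptor = intron[:2], intron[-2:]
        report.append((start, end, end - start + 1, donor, acceptor,
                       donor == "GT" and acceptor == "AG"))
    return report


genomic = "AAAAA" + "GTCCCCAG" + "TTTTT" + "GCAAAAAG" + "CCCCC"
for row in splice_sites(genomic, [(1, 5), (14, 18), (27, 31)]):
    print(row)
```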
Analysis of tissue specificity based on EST data
Analyze the following GeneNest EST clusters.
Which EST clusters may reflect tissue-specific genes/transcripts?
Which tissues are most important for each cluster?
- Hs75527: 5 splice variants
- Hs90005: mainly expressed in brain; RNA is incomplete; internal poly(A)
- Hs1852: mainly expressed in breast and prostate; compare with the RNA annotation
- Hs572: mainly expressed in liver and bladder; alternative splicing?
- Hs1815: mainly expressed in eye and muscle; alternative splicing?
- Hs2062: expressed in several tissues; in thyroid only in tumor samples
- Hs3132: mainly expressed in brain and adrenal gland; alternative splicing
- Hs21: mainly expressed in pancreas, contig 4: muscle; the difference is based on a single EST (AW409991)
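The notes above are qualitative. One way to quantify how tissue-restricted a cluster is starts from a tally of the cluster's ESTs by library tissue. From that tally, take the fraction contributed by the top tissue and the Shannon entropy of the tissue distribution. Entropy is 0 bits when all ESTs come from one tissue and log2 k bits when they are spread evenly over k tissues. The counts below are hypothetical, and `tissue_profile` is a hypothetical helper. Raw EST counts also reflect how deeply each tissue was sequenced and whether libraries were normalised, so treat the scores as a rough guide.

```python
import math


def tissue_profile(counts: dict[str, int]) -> tuple[str, float, float]:
    """Return (top tissue, its fraction of the ESTs, Shannon entropy in bits)."""
    total = sum(counts.values())
    fractions = {t: n / total for t, n in counts.items() if n > 0}
    entropy = -sum(f * math.log2(f) for f in fractions.values())
    top = max(fractions, key=fractions.get)
    return top, fractions[top], entropy


print(tissue_profile({"brain": 18, "adrenal gland": 6, "liver": 0}))
```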
Promoter recognition (optional)
Retrieve the sequence of the human c-myc oncogene
(accession number: D10493);
the promoter is located around position 2300.
- Use the TRANSFAC database (http://transfac.gbf.de/TRANSFAC/)
to get the promoter description. Alternatively, SRS can be used. Use the search function to find the c-myc gene.
- Use the MatInspector program to search
a 1000 bp window around the promoter region for potential transcription factor binding sites.
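Below is a sketch of the 1000 bp window and a simple site search. MatInspector scores sites against weight matrices. This sketch is a much cruder stand-in: it matches a single IUPAC consensus string on both strands. Centring the window on position 2300 is an assumption. The motif TATAWAW, a commonly quoted TATA-box form, is only an illustrative choice. `window` and `scan` are hypothetical helpers.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def window(center: int, size: int = 1000) -> tuple[int, int]:
    """1-based, inclusive window of `size` bases roughly centred on `center`."""
    start = max(1, center - size // 2)
    return start, start + size - 1


def scan(seq: str, motif: str, offset: int = 1) -> list[tuple[int, str]]:
    """(position, strand) of every match; offset is the coordinate of seq[0]."""
    seq = seq.upper()
    hits = []
    # Search the motif itself (+) and its reverse complement (-) on the given strand.
    for strand, m in (("+", motif), ("-", motif.translate(COMPLEMENT)[::-1])):
        # The lookahead lets overlapping matches be reported.
        pattern = "(?=(" + "".join(IUPAC[c] for c in m) + "))"
        hits += [(x.start() + offset, strand) for x in re.finditer(pattern, seq)]
    return sorted(hits)


print(window(2300))                      # (1800, 2799)
print(scan("GGTATAAATGG", "TATAWAW"))    # [(3, '+')]
```

On the real sequence, you would slice the window with `seq[1799:2799]` and pass `offset=1800` so that hits are reported in D10493 coordinates.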