Berkeley MCELLBI 110 - Genomics and bioinformatics summary

Genomics and bioinformatics summary

1. Gene finding: computer searches, cDNAs, ESTs
2. Microarrays
3. Use BLAST to find homologous sequences
4. Multiple sequence alignments (MSAs)
5. Trees quantify sequence and evolutionary relationships
6. Protein sequences are evolutionary clocks
7. Some public databases and protein sequence analysis tools

Finding genes -- computer searches

Computer searches locate most genes in prokaryotes, Archaea, and yeast, but only ~1/3 of human genes are identified correctly.

Criteria
Protein start, stop signals, splicing signals . . .
Codon bias
Comparisons to other genomes (mouse, rat, fish, fly, mosquito, worm, yeast . . .)

Some hard problems: small genes, post-translational modifications, unique genes, spliced genes, alternative splicing, gene rearrangements (e.g. IgGs) . . .

Finding genes -- cDNA synthesis

Synthesizing “cDNA” (complementary DNA)
1. Extract RNA
2. Hybridize polyT primer
3. Synthesize DNA strand 1 using reverse transcriptase.
4. Fragment RNA strand using RNaseH.
5. Synthesize DNA strand 2 using DNA pol

Sequences of random cDNAs provide ESTs (Expressed Sequence Tags)

Microarrays quantify expressed genes by hybridization

1. Label cDNAs with red fluorophore in one condition and green fluorophore in another reference condition.
2. Mix red and green DNA and hybridize to a “microarray”.

Red genes enriched in reference
Yellow genes (green + red) =
Green genes enriched in experiment

Each spot is a different synthetic oligonucleotide complementary to a specific gene.

“Cluster analysis” identifies patterns of gene expression

1. Similar patterns of expression are placed next to each other. Groups of genes with similar patterns form a hierarchical “tree”. For example, the two major branches of the tree comprise activated (left, green) or repressed genes (right, red).
2. Genes with similar expression patterns (e.g. A-E) often function together.

[Figure: clustered expression map; axes labeled Genes and Conditions]

“Tiling” microarrays can find transcribed sequences

Each spot has a different synthetic oligonucleotide complementary to a different segment of the genome (e.g. every 100 bp). Spots that hybridize reveal transcribed regions.
Microarray coding capacity ~16 M bases

Find similar sequences (homologs) with BLAST

The most related human protein identified by a BLAST search of the human genome using the sequence of M. tuberculosis PknB Ser/Thr protein kinase is . . . ELKL motif kinase 1.

Query = the part of the PknB sequence that matches ELKL-1.
Subject = ELKL-1.
Expect = expectation value = the number of hits of this quality expected by chance in a database of this size (5e-24 = 5 x 10^-24; is this a big number or small?)
Identities = # of exact amino acid matches in the alignment.
Positives = # of exact matches plus conservative changes, as defined by the residues that tend to replace each other in homologous proteins.
NP_004945.2 = sequence ID for ELKL-1.

>ref|NP_004945.2| ELKL motif kinase 1 [Homo sapiens]
Length = 691
Score = 108 bits (270), Expect = 5e-24
Identities = 87/296 (29%), Positives = 135/296 (45%), Gaps = 21/296 (7%)

Query: 11  YELGEILGFGGMSEVHLARDLRLHRDVAVKVLRADLARDPSFYLRFRREAQNAAALNHPA 70
           Y L + +G G  ++V LAR +   ++VAVK++        S    FR E +    LNHP
Sbjct: 20  YRLLKTIGKGNFAKVKLARHILTGKEVAVKIIDKTQLNSSSLQKLFR-EVRIMKVLNHPN 78

Query: 71  IVAVYDTGEAETPAGPLPYIVMEYVDGVTLRDIVHTEGPMTPKRAIEVIADACQALNFSH 130
           IV +++  E E       Y+VMEY  G  + D +   G M  K A         A+ + H
Sbjct: 79  IVKLFEVIETEKTL----YLVMEYASGGEVFDYLVAHGRMKEKEARAKFRQIVSAVQYCH 134

Query: 131 QNGIIHRDVKPANIMISATNAVKVMDFGIARAIADSGNSVTQTAAVIGTAQYLSPEQARG 190
           Q  I+HRD+K  N+++ A   +K+ DFG +      GN +       G+  Y +PE  +G
Sbjct: 135 QKFIVHRDLKAENLLLDADMNIKIADFGFSNEFT-FGNKLD---TFCGSPPYAAPELFQG 190

Query: 191 DSVDA-RSDVYSLGCVLYEVLTGEPPFTGDSPVSVAYQHVREDPIPPSARHE-GLSADLD 248
              D    DV+SLG +LY +++G  PF G +      + +RE  +    R    +S D +
Sbjct: 191 KKYDGPEVDVWSLGVILYTLVSGSLPFDGQN-----LKELRERVLRGKYRIPFYMSTDCE 245

Query: 249 AVVLKALAKNPENRYQTAAEMRADLVRVHNGEPPEAPKV-----LTDAERTSLLSS 299
            ++ K L  NP  R      M+   + V + +    P V       D  RT L+ S
Sbjct: 246 NLLKKFLILNPSKRGTLEQIMKDRWMNVGHEDDELKPYVEPLPDYKDPRRTELMVS 301

Ser/Thr Protein kinases diverge rapidly

Multiple Sequence Alignment (MSA) of the N-terminal ~90 residues of M. tuberculosis PknB (bottom) and Ser/Thr protein kinases of known structure. The histogram at the bottom shows % identity at each position. Only a few residues are absolutely conserved (functional sites!). The MSA defines the beginning of the kinase domain. Insertions often occur in loops.

Histones evolve slowly

Core H3 proteins (that have the same function) are nearly identical in eukaryotes (left). Archaeal H3s and specialized H3 proteins that bind at centromeres show much more divergence (bottom sequences and tree branches, right).

[Figure: MSA (left) and tree (right). MSA = Multiple Sequence Alignment]

Protein sequences are evolutionary clocks

Assuming that organisms diverged from a common ancestor and sequence changes accumulate at constant rates, the number of changes in homologous proteins gives information about the time that each sequence has been evolving independently.

[Figure: average rate of change of proteins of different function, from fast to slow]

Tree of life (Sequences = biological clocks)

A tree derived by clustering sequences of a typical protein family (pterin-4a-hydroxylase) recapitulates the tree of life.
Evolutionary relationships are seen at the molecular level in virtually every shared protein and RNA!

Some web sites for bioinformatics

Nucleic acid sequences
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nucleotide
Protein sequences
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein
Structure Coordinates: Protein Data Bank
http://www.rcsb.org/pdb/

Programs
BLAST sequence similarity calculation
http://www.ncbi.nlm.nih.gov/BLAST/
BLAST bacterial genomes
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi
PHD secondary structure predictor and motif search
http://www.embl-heidelberg.de/predictprotein/predictprotein.html
PHYRE fold predictor
http://www.sbg.bio.ic.ac.uk/~phyre/
Multicoil: Coiled coil prediction
http://multicoil.lcs.mit.edu/cgi-bin/multicoil/
Many nucleic acid and protein sequence-analysis tools
http://au.expasy.org/
Predict transmembrane helices
http://www.cbs.dtu.dk/services/THMM-2.0/
Predict signal sequences
http://www.cbs.dtu.dk/services/SignalP/

Genomics and bioinformatics summary

1. Gene finding: computer searches, cDNAs, ESTs
2. Microarrays
3. Use BLAST to find homologous sequences
4. Multiple sequence alignments (MSAs)
5. Trees quantify sequence and evolutionary relationships
6. Protein sequences are evolutionary clocks
7. Lots of public
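To make the numbers in the BLAST report earlier in these notes more concrete, here is a minimal sketch that recomputes the percentages from the Identities/Positives/Gaps line and shows how an expectation value scales with bit score. The helper names are invented for this example. The relation E = m * n * 2^(-S') between bit score S' and search-space size is the standard one from BLAST statistics, but real BLAST uses effective (length-corrected) values of m and n, so the search-space sizes below are hypothetical and the sketch is not expected to reproduce the reported 5e-24 exactly.

```python
import re


def parse_counts(summary_line):
    """Return e.g. {'Identities': (87, 296), ...} from a BLAST summary line."""
    pattern = r"(\w+) = (\d+)/(\d+)"
    return {
        name: (int(count), int(total))
        for name, count, total in re.findall(pattern, summary_line)
    }


def percent(count, total):
    """Whole-number percentage, truncated toward zero."""
    return 100 * count // total


def expect_from_bits(bit_score, query_len, db_len):
    """Chance hits expected: E = m * n * 2**(-S') for bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    line = "Identities = 87/296 (29%), Positives = 135/296 (45%), Gaps = 21/296 (7%)"
    for name, (count, total) in parse_counts(line).items():
        print(f"{name}: {count}/{total} = {percent(count, total)}%")

    # Hypothetical search-space sizes, chosen only to show the scaling.
    m, n = 300, 10_000_000
    print(expect_from_bits(108, m, n))
    # One extra bit of score halves the expected number of chance hits.
    print(expect_from_bits(109, m, n) / expect_from_bits(108, m, n))
```

Truncating the percentages reproduces the values printed in the report (29%, 45%, 7%). On the question the notes pose, an Expect value far below 1 means that essentially no hits of this quality are expected by chance, so 5e-24 is a very small number.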

