CMSC423: Bioinformatic Algorithms, Databases and Tools
Mihai Pop
Molecular biology primer

Admin
- Have you tried your glue accounts?
- Issues, concerns, questions about class and policies?
- Reading assignment: Chapter 1 in the book

The tree of life
- Figure source: http://www.fossilmuseum.net (Tree of Life; Domains Archaea, Bacteria)

DNA: the code of life
- Purines: A, G (caffeine is also a purine)
- Pyrimidines: C, T
- Sugar backbone: the "ticker tape"
- Double stranded: allows replication
- Pictures from Wikipedia

DNA in the computer
>gi|110227054|gb|AE004091.2| Pseudomonas aeruginosa PAO1, complete genome
TTTAAAGAGACCGGCGATTCTAGTGAAATCGAACGGGCAGGTCAATTTCCAACCAGCGATGACGTAATAG
ATAGATACAAGGAAGTCATTTTTCTTTTAAAGGATAGAAACGGTTAATGCTCTTGGGACGGCGCTTTTCT
GTGCATAACTCGATGAAGCCCAGCAATTGCGTGTTTCTCCGGCAGGCAAAAGGTTGTCGAGAACCGGTGT
CGAGGCTGTTTCCTTCCTGAGCGAAGCCTGGGGATGAACGAGATGGTTATCCACAGCGGTTTTTTCCACA
CGGCTGTGCGCAGGGATGTACCCCCTTCAAAGCAAGGGTTATCCACAAAGTCCAGGACGACCGTCCGTCG
- FASTA / multi-FASTA file format
- Parsers are easy to write and are also available in a variety of software libraries

Central dogma
AGGTACGCGTACCTGACAGG
- Figure source: http://www.accessexcellence.org/RC/VL/GG/central.html

Genes: transcription and translation
- DNA -> RNA: Thymine is replaced by Uracil (T -> U)
- The transcribed segments are called genes
  ACCGUACCAUGUUA AUAGGCUGAGCA
- AUG: start codon (also the amino acid Methionine)
- UAA, UAG, UGA: stop codons
- Genes are read in sets of 3 nucleotides during translation: 4^3 = 64 possible combinations
- Each combination codes for one of 20 amino acids, the building blocks of proteins, or for a stop signal

Amino acid translation table

Genes/proteins in the computer
>gi|15596155|ref|NP_249649.1| basic amino acid
MKVMKWSAIALAVSAGSTQFAVADAFVSDQAEAKGFIEDSSLDLLLRNYYFNRDGKSGSGDRVDWTQGFL
TTYESGFTQGTVGFGVDAFGYLGLKLDGTSDKTGTGNLPVMNDGKPRDDYSRAGGAVKVRISKTMLKWGE
MQPTAPVFAAGGSRLFPQTATGFQLQSSEFEGLDLEAGHFTEGKEPTTVKSRGELYATYAGETAKSADFI
GGRYAITDNLSASLYGAELEDIYRQYYLNSNYTIPLASDQSLGFDFNIYRTNDEGKAKAGDISNTTWSLA
AAYTLDAHTFTLAYQKVHGDQPFDYIGFGRNGSGAGGDSIFLANSVQYSDFNGPGEKSWQARYDLNLASY
GVPGLTFMVRYINGKDIDGTKMSDNNVGYKNYGYGEDGKHHETNLEAKYVVQSGPAKDLSFRIRQAWHRA
NADQGEGDQNEFRLIVDYPLSIL
- Same FASTA / multi-FASTA format, but with a bigger alphabet

Genes/proteins in the computer
     gene            complement(1043983..1045314)
                     /gene="oprD"
                     /locus_tag="PA0958"
     CDS             complement(1043983..1045314)
                     /gene="oprD"
                     /locus_tag="PA0958"
                     /note="Product name confidence: Class 1 (Function
                     experimentally demonstrated in P. aeruginosa)"
                     /codon_start=1
                     /transl_table=11
                     /product="Basic amino acid, basic peptide and imipenem
                     outer membrane porin OprD precursor"
                     /protein_id="AAG04347.1"
                     /db_xref="GI:9946864"
- GenBank file format

Translation complications
- Alternative splicing (examples)

Protein structure
- Figure source: http://www.tulane.edu/~biochem/med/second.htm

Protein structure
- Primary structure: sequence
- Secondary structure: structure motifs
- Tertiary structure: 3D position of atoms
- Quaternary structure: docking of proteins

Protein structure data: PDB format
ATOM      1  N   MET A   1      20.020  28.662  42.801  1.00 51.80           N
ATOM      2  CA  MET A   1      20.598  29.950  42.438  1.00 52.13           C
ATOM      3  C   MET A   1      22.118  29.937  42.576  1.00 47.63           C
ATOM      4  O   MET A   1      22.660  29.623  43.636  1.00 49.97           O
ATOM      5  CB  MET A   1      20.009  31.073  43.293  1.00 51.36           C
ATOM      6  CG  MET A   1      20.331  32.468  42.765  1.00 51.13           C
ATOM      7  SD  MET A   1      21.406  33.373  43.921  1.00103.49           S
ATOM      8  CE  MET A   1      21.129  32.396  45.410  1.00 55.43           C
ATOM      9  N   LEU A   2      22.799  30.285  41.490  1.00 41.99           N
ATOM     10  CA  LEU A   2      24.249  30.178  41.424  1.00 37.25           C

RECAP
- DNA is a string formed with the letters A, C, T, G, called nucleotides or bases
- DNA is double stranded: this allows replication (transfer of genetic code from parents to offspring)
- DNA is naturally oriented from 5' to 3', and the two strands are anti-parallel
- If you know the sequence of one strand, you can obtain the sequence of the other by reverse complementation
  5' AGACCTAGTGCACGGCTACTACC 3'
  5' CCATCATCGGCACGTGATCCAGA 3'  (Reverse)
  5' GGTAGTAGCCGTGCACTAGGTCT 3'  (Complement)

RECAP: Central Dogma of molecular biology
- DNA -> RNA: transcription
- RNA -> Protein: translation
- The transcribed segments of DNA are called genes
- Translation occurs in sets of 3 nucleotides (codons)
- Each codon encodes one of 20 amino
  acids, or is one of the 3 stop codons
- In eukaryotes, genes may be split into multiple exons separated by introns (DNA segments that will not get translated)
- The protein is translated from an RNA representing the concatenation of the exons of the gene

The "new" biology
- DNA is not the only heritable information
- Epigenetic information:
  - RNA molecules
  - DNA methylation patterns (these affect the coiling of DNA on histones)
- Complex regulation patterns:
  - Genes turn on other genes
  - Genes inhibit other genes
  - RNA interference: small RNA molecules can destroy specific transcripts (down-regulate production)

Playing with DNA
Biologists can:
- Cut the DNA: restriction enzymes, often at palindromes (Nobel prize: Arber, Nathans, Smith)
    5' GAATTC 3'
    3' CTTAAG 5'
  after the cut:
    5' G       AATTC 3'
    3' CTTAA       G 5'
- Attach things to DNA (either single or double strand)
    TAGGCACGTTGCAACTACGGC
         TGCAACGT
- Amplify DNA: Polymerase Chain Reaction (Nobel prize: Mullis)

Polymerase chain reaction (PCR)
1. Denature
2. Anneal (attach primer)
3. Extend
4. Repeat

How does PCR work?
Cycle 1
- Start: 1 double-stranded molecule
- Denature: 2 single-stranded molecules
- Anneal: 2 single-stranded molecules with primers attached
- Extend: 2 double-stranded molecules, each with one long (L) strand and one short (S) strand terminated at a primer
Cycle 2
- Start: 2 double-stranded molecules (L-S, L-S)
- Denature: 2 x L strands, 2 x S strands
- Anneal: all strands with primers attached
- Extend: 2 double-stranded molecules (L-S, L-S) and 2 double-stranded molecules (S-SS, S-SS), where an SS strand is terminated at both ends with a primer

PCR: Recurrences
- $L_n$, $S_n$, $SS_n$ = # of strands of each type at the start of cycle $n$ (so $L_1 = 2$, $S_1 = 0$, $SS_1 = 0$)
- $L_n = L_{n-1} = 2$
- $S_n = S_{n-1} + L_{n-1} = S_{n-1} + 2 = 2(n-1) = O(n)$
- $SS_n = S_{n-1} + 2\,SS_{n-1} = O(2^n)$
- The sequence between the primers (SS) is amplified exponentially and will quickly dominate the solution

Quantitative PCR
- Measure the # of PCR cycles needed to reach a certain concentration of DNA: this depends on the initial # of molecules
- Used in diagnostics, e.g., is this a random Anthrax spore from the environment, or lots of spores from an attack?
- Figure source: http://www.dxsgenotyping.com (technology main page)

DNA sequencing
- Most techniques "trick" the polymerase
  into revealing the sequence
- The traditional method, Sanger sequencing, is based on terminator bases, which prevent the polymerase from extending the DNA
- Sanger sequencing is essentially PCR + terminator bases
- Other methods "spy" on the polymerase as it incorporates nucleotides

Sanger sequencing
Sanger F, Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94 (1975).
[Figure: gel lanes G, C, A, T; template AGATTATCTAACAGCTACCCTTCCATCA; TCTAATAGA (the complement of its first 9 bases); terminated fragments of increasing length: TCTAATT, TCTAATTA, TCTAATTAG, TCTAATTAGA, TCTAATTAGAT]
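
Sketch: a FASTA parser
The "DNA in the computer" slide notes that FASTA parsers are easy to write. A minimal sketch of one follows. The function name `parse_fasta` and the two toy records are assumptions made for illustration and are not from the course material. It keeps everything after `>` as the header and assumes headers are unique.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA or multi-FASTA input lines."""
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:]  # header line: drop the leading '>'
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)  # a sequence may wrap over many lines
    return {header: "".join(parts) for header, parts in chunks.items()}


# Toy multi-FASTA input (hypothetical records, not real database entries).
example = """>seq1 toy DNA record
TTTAAAGAGACC
GGCGATTC
>seq2 toy protein record
MKVMKWSAIALAV
"""
records = parse_fasta(example.splitlines())
for header, sequence in records.items():
    print(header, len(sequence))
```

The same function handles DNA and protein records, since the two differ only in alphabet.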
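
Sketch: reverse complementation
The RECAP slide obtains one strand from the other by reversing and then complementing. The sketch below does both steps in one function. The names `COMPLEMENT`, `reverse_complement`, and `is_palindromic_site` are illustrative assumptions. The input strand and the restriction site GAATTC are the ones shown on the slides.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the other strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """A DNA 'palindrome' reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq).upper()


forward = "AGACCTAGTGCACGGCTACTACC"
print(reverse_complement(forward))    # GGTAGTAGCCGTGCACTAGGTCT
print(is_palindromic_site("GAATTC"))  # True
```

The second function shows why restriction sites are called palindromes: GAATTC is its own reverse complement.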
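
Sketch: transcription and translation
The central dogma slides describe replacing T with U, then reading codons from the start codon AUG up to a stop codon. A minimal sketch under stated assumptions follows. The codon table is the standard genetic code written out compactly. The example RNA joins the two fragments shown on the slide end to end, which is an assumption, since the slide leaves a gap between them. The function names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the order U, C, A, G
# (first base varies slowest); '*' marks the three stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna: str) -> str:
    """DNA -> RNA: thymine is replaced by uracil."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: UAA, UAG or UGA
        protein.append(amino_acid)
    return "".join(protein)


# Assumption: the slide's two RNA fragments joined end to end.
rna = "ACCGUACCAUGUUA" + "AUAGGCUGAGCA"
print(translate(rna))  # MLIG
print(len(CODON_TABLE))  # 64 = 4**3 codons
```

Real genes also need a choice of reading frame and strand, which this sketch ignores.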
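
Sketch: iterating the PCR recurrences
The recurrences for L, S, and SS strands can be checked by iterating them directly. The sketch below assumes idealized PCR: every strand is copied in every cycle, and counts are taken at the start of each cycle, beginning with one double-stranded molecule (2 L strands). The function name `pcr_counts` is an illustrative assumption.

```python
from typing import List, Tuple


def pcr_counts(cycles: int) -> List[Tuple[int, int, int]]:
    """Return (L, S, SS) strand counts before cycle 1 and after each cycle."""
    long_, short, both = 2, 0, 0  # one double-stranded molecule
    history = [(long_, short, both)]
    for _ in range(cycles):
        # Each L strand templates a new S strand; each S strand templates a
        # new SS strand; each SS strand is copied into another SS strand.
        long_, short, both = long_, short + long_, short + 2 * both
        history.append((long_, short, both))
    return history


history = pcr_counts(10)
for cycle, (n_long, n_short, n_both) in enumerate(history):
    print(cycle, n_long, n_short, n_both, n_long + n_short + n_both)
```

As a sanity check, the total number of strands doubles every cycle, while L stays at 2 and S grows only linearly, so nearly all of the growth is in SS.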
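
Sketch: reading a PDB ATOM record
PDB ATOM records are fixed-width, so fields are best read by column position rather than by splitting on whitespace. In the sulfur atom's record on the PDB slide, the occupancy and temperature factor run together as `1.00103.49`. The sketch below is a minimal reader under that assumption. The `Atom` type and `parse_atom` function are illustrative names, and the column ranges follow the standard ATOM record layout.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom(line: str) -> Atom:
    """Slice a fixed-width ATOM record (0-based slices of 1-based columns)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Atom 7 from the slide, assembled field by field so the widths are explicit.
LINE = (
    "ATOM  " "    7" " " " SD " " " "MET" " " "A" "   1" "    "
    "  21.406" "  33.373" "  43.921" "  1.00" "103.49" + " " * 10 + " S"
)
atom = parse_atom(LINE)
print(atom)
print(LINE.split())  # whitespace splitting fuses occupancy and temp factor
```

For real work, an established structure-parsing library is a safer choice than hand-written slicing.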