CMSC 423: Introduction / Biology Basics
August 31, 2010
Carl Kingsford
Center for Bioinformatics and Computational Biology

What does biology have to do with computers?
- Huge amount of data: too much to analyze by hand.
- Requires clever algorithms to:
  - find interesting patterns
  - store / search / compare
  - predict missing or hard-to-observe features (like protein structure or evolutionary relationships)
- Nearly all molecular biology is now computational biology: biologists depend on computer scientists every day.

Algorithmic Techniques & Data Structures
Very general algorithmic techniques:
- Dynamic programming
- Hidden Markov Models
- Divide and conquer
- Branch and bound
- Linear programming
- Gibbs sampling / expectation maximization
Techniques widely used in string / sequence algorithms:
- Burrows-Wheeler transform
- Suffix arrays / suffix trees

Course work / Evaluation
- 2 exams during the semester: Sept. 30, 2010 and Nov. 11, 2010. These dates are fixed.
  - Exams are noncumulative (the 2nd exam will cover material since the first exam).
  - Each is 20% of the grade.
- Comprehensive final (30%): Tuesday, Dec. 14, 2010, 8am-10am. Covers everything in the class.
- Several homework assignments (10% of your grade). Neatness counts.
- Programming project (20% of your grade). Mostly in the second half of the semester.

Administrative Details
- Homeworks are due at the start of class on their due dates. No late homeworks will be accepted.
- You can discuss homeworks with your classmates.
  - You must list the names of the people with whom you collaborated at the top of the homework.
  - You must write up homework solutions on your own.
- Late programming assignments will lose 10% per day for up to 5 days, after which they will not be accepted.
- TA: Emre Sefer. TA office hours: TBD.
- Grades will be posted on http://grades.cs.umd.edu
- More details on the syllabus handout.
- Instructor office hours: Mondays 2:30-3:30pm or by appointment, in CBCB 3113.

Tentative Course Topics Outline
Before midterm 1:
- Sequence search & comparison
  - Dynamic programming
  - Local / global alignments
  - Aligning multiple sequences
  - RNA folding
  - Suffix trees
  - Suffix arrays
  - Burrows-Wheeler Transform
Before midterm 2 / before final:
- Hidden Markov Models
  - For gene finding
  - For sequence pattern finding
- Expectation Maximization for pattern finding
- Gene Expression
  - Clustering
- Phylogenetics
  - Algorithms for building trees
- Genome Rearrangements
- Protein Structure
  - Secondary structure prediction
  - Threading for structure prediction
  - Side chain positioning
- Spatial biology
  - Mouse brain structure
  - Genome shape

E. coli
E. coli is an example of a bacterium. Algorithms are used to understand these important components:

  Component   % total dry weight
  DNA          3.1
  RNA         20.5
  Protein     55.0
  Lipid        9.1

Image: Christos Savva (Microscopy & Imaging Center) and Thomas Wood (Dept. of Chemical Engineering), Texas A&M University.

Central Dogma of Biology
Genome (DNA) -> Transcription -> mRNA (T -> U) -> Translation -> proteins

DNA
- Double-stranded, linear molecule.
- Strands are complements of each other (A <-> T, C <-> G).
- Each strand is a string over {A, C, G, T}.
- Substrings encode genes, most of which encode proteins.

DNA Replication
[Figure from The Cartoon Guide to Genetics, Larry Gonick & Mark Wheelis, 1983]

Recent Genomics
- First genome sequenced in 1995: the bacterium H. influenzae, with a genome of 1,830,140 letters.
- 1st draft of the human genome finished in 2001 (3 billion letters).
- Now:
  - Over 1100 bacterial genomes
  - Hundreds of higher-order genomes done or in progress
  - Several complete human genomes finished

[Figure: example DNA sequence listing, shown as numbered rows of 60 bases starting at positions 1, 61, 121, and so on up to 4141]
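
The base-pairing and transcription rules from the Central Dogma slides (A pairs with T, C pairs with G; T becomes U in mRNA) can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names are hypothetical, the input is assumed to be an uppercase string over A, C, G, T, and the reversal step reflects the fact that the two strands run in opposite directions, a detail the slides do not spell out.

```python
# Illustrative helpers; the names are assumptions, not from the lecture.
# Base-pairing rule from the slides: A <-> T, C <-> G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Pair each base with its partner, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand as it would be read in its own direction."""
    return complement(strand)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the mRNA string for a coding strand: same letters, with T -> U."""
    return coding_strand.replace("T", "U")


print(complement("GACT"))          # CTGA
print(reverse_complement("GACT"))  # AGTC
print(transcribe("GACT"))          # GACU
```

Treating each strand as a plain string over {A, C, G, T} is what makes the string algorithms listed in the course outline applicable to genomes.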