
UAB MIC 753 - Data

Data
Sequences and Other Stuff

Sequence Data
Nucleic Acid and Protein Sequences
- Sources of Genetic Sequences
  - User
  - GCG supplied databases
    - Flat File
    - Oracle Relational Database
  - NCBI supplied databases
  - Other databases

Sequence Databases
- GenBank
- EMBL
- DDBJ
- NCBI
- PIR
- Swiss-Prot
- Swiss-Prot TrEMBL

GenBank
- Primary nucleic acid sequence database
- Maintained by NCBI
  - National Center for Biotechnology Information
  - http://www.ncbi.nlm.nih.gov
- Current Release 122, 2/2001
  - 11,720,120,326 bases
  - 10,896,781 sequences

How Many Organisms Are In The Sequence Databases? (April 1, 2001)

Species      1995   1996   1997   1998   1999   2000   2001   Increase (since 1995)   Increase (12 months)
all:        16109  23119  32880  43516  61952  87751  95168   490%                    40.9%
Viruses:     1845   2122   2678   2968   3573   4428   4857   163%                    32.4%
Bacteria:    2939   3847   6091   8711  14322  22758  24878   746%                    53.3%
Archaea:      162    235    385    555   1015   1709   1906   1076%                   68.8%
Eukaryota:  10366  15901  22596  29926  41420  56961  61571   493%                    37.4%

Other NCBI Databases
- HTGS
- EST
- STS
- GSS
- RefSeq
- UniGene
- Genomic

HTGS
High Throughput Genomic Sequences
- 'Unfinished' DNA sequences generated by the high-throughput sequencing centers
- Phase 0
  - Single-few pass reads of a single clone (not contigs)
- Phase 1
  - Unfinished, may be unordered, unoriented contigs, with gaps
- Phase 2
  - Unfinished, ordered, oriented contigs, with or without gaps
- Phase 3
  - Primary division (GenBank)
  - Finished, no gaps (with or without annotations)

EST
- Expressed Sequence Tags
- "Single-pass" cDNA sequences
- Generally representative of the 3' ends of cDNAs
- More "full-length" ESTs now available

STS
- Sequence Tagged Sites
- Sequence and mapping data
- Short genomic landmark sequences

GSS
- Genome Survey Sequences
- Similar to the EST division, except that its sequences are genomic in origin, rather than cDNA
- Random "single pass read" genome survey sequences
- Cosmid/BAC/YAC end sequences
- Exon trapped genomic sequences
- Alu PCR sequences

RefSeq
- NCBI Reference Sequence project
- Provides reference sequence standards for the naturally occurring molecules from chromosomes to mRNAs to proteins
- Stable reference point for:
  - mutation analysis
  - gene expression studies
  - polymorphism discovery

RefSeq…
- Curated RefSeq
  - transcripts and proteins
- Genome Annotation
  - contigs, transcripts, and proteins
- Complete Genomes
  - genomes, chromosomes, and proteins

UniGene
- Experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters
- Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location
- Includes EST and cDNA sequences
- Includes human, rat, mouse, cow and zebrafish

HomoloGene
- Curated and calculated orthologs and homologs for genes represented in UniGene and LocusLink
- Includes human, mouse, rat, zebrafish, cow and Drosophila

LocusLink
- Provides a single query interface to curated sequence and descriptive information about genetic loci
  - Nomenclature
  - Aliases
  - Sequence accessions
  - Phenotypes
  - EC numbers
  - MIM numbers
  - UniGene clusters
  - Homology
  - Map locations
  - Web sites

EMBL and DDBJ
- European Molecular Biology Laboratory
  - Hinxton, UK
  - http://www.ebi.ac.uk/
- DNA Data Bank of Japan
  - Mishima, Japan
  - http://www.ddbj.nig.ac.jp/

Coordination with GenBank
- Prevents duplication
- GenBank enters sequences from U.S. journals and researchers
- EMBL handles European data
- DDBJ handles Asian data
- Data exchanged daily

Sequence submissions
- Sequences entered from journals
- Sequences submitted by individual researchers
  - BankIt
    - NCBI WWW Site
  - Sequin
    - Multi-platform program

Sequence Names
- DO NOT rely on names to find particular sequences
- Few conventions
- Organism
  - Hum: Human
  - Mus: mouse
  - Eco: E. coli
  - Syn: synthetic

Last Letter(s)
- Sometimes gives useful information
- cg: Complete genome
  - Viruses

Other Letters
- Specifies a particular sequence
- vsvcg
  - Vesicular stomatitis virus (Indiana serotype) complete genome

EMBL File Names
- Ec: E. coli
- Hs: Human

Locus name
- Names are short, fairly non-descriptive, and can change from one release to another
- vsvcg
  - The complete sequence for the virus VSV
- Most "mnemonic" names already taken
- GenBank now using accession numbers as locus names

Accession Numbers
- Each sequence submitted to a database is assigned a unique primary accession number
- Accession numbers do not change
- If a sequence is merged with another, a new accession number is assigned, and the original number becomes a secondary accession number
- Accession numbers may include version numbers
  - A02428.2

Accession Numbers
- Using GCG to access sequences via their accession number
- Data Library:Accession Number
  - Flatfile - vi:J02428
  - RDB - gcgnuc:J02428

The Sequence Record
- Different for each database
- Locus (Name)
- Accession Number
- Keywords
- Description
- Properties
- References
- The Sequence

analyze% typedata ge:humcftrm
!!NA_SEQUENCE 1.0
LOCUS       HUMCFTRM     6129 bp    mRNA    PRI   15-DEC-1989
DEFINITION  Human cystic fibrosis mRNA, encoding a presumed transmembrane
            conductance regulator (CFTR).
ACCESSION   M28668
NID         g180331
KEYWORDS    cystic fibrosis; transmembrane conductance regulator.
SOURCE      Human, cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 6129)
  AUTHORS   Riordan,J.R., Rommens,J.M., Kerem,B., Alon,N., Rozmahel,R.,
            Grzelczak,Z., Zielenski,J., Lok,S., Plavsic,N., Chou,J.-L.,
            Drumm,M.L., Iannuzzi,M.C., Collins,F.S. and Tsui,L.-C.
  TITLE     Identification of the cystic fibrosis gene: Cloning and
            characterization of complementary DNA
  JOURNAL   Science 245, 1066-1073 (1989)
  MEDLINE   89368940
COMMENT     A three base-pair deletion spanning positions 1654-1656 is observed
            in cDNAs from cystic fibrosis patients.
FEATURES             Location/Qualifiers
     source          1..6129
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     CDS             133..4575
                     /note="cystic fibrosis transmembrane conductance
                     regulator"
                     /codon_start=1
                     /db_xref="PID:g180332"
                     /translation="MQRSPLEKASVVSKLFFSWTRPILRKGYR
                     QRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLINALRRCFFWRFMFYGIF
                     LYLGEVTKAVQPLLLLNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYI
                     FVATVPVIVAFIMLRAYFLQTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPY
                     FETLFHKALNLHTANWFLYLSTLRWFQMRIEMIFVIFFIAVTFISILTTGEGEGRVGI
                     ILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSK
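
The record layout shown above (a keyword in the first column, its value after it, and indented continuation lines) lends itself to simple line-based parsing. The following is a minimal sketch in Python using only the standard library, not a complete GenBank parser. The helper names `split_accession` and `parse_header` are hypothetical, the embedded sample is abridged from the HUMCFTRM record, and the `.2` version suffix in the last line is illustrative rather than taken from that record. Indented sub-keywords such as ORGANISM would simply be folded into their parent field by this sketch.

```python
# Abridged copy of the HUMCFTRM header lines, used only as sample input.
SAMPLE_RECORD = """\
LOCUS       HUMCFTRM     6129 bp    mRNA    PRI   15-DEC-1989
DEFINITION  Human cystic fibrosis mRNA, encoding a presumed transmembrane
            conductance regulator (CFTR).
ACCESSION   M28668
KEYWORDS    cystic fibrosis; transmembrane conductance regulator.
SOURCE      Human, cDNA to mRNA.
"""


def split_accession(text):
    """Split 'ACCESSION.VERSION' into (accession, version or None).

    Hypothetical helper: assumes at most one '.' followed by an integer.
    """
    accession, _, version = text.strip().partition(".")
    return accession, int(version) if version else None


def parse_header(record):
    """Collect top-level keyword fields, joining indented continuation lines."""
    fields = {}
    current = None
    for line in record.splitlines():
        if not line.strip():
            continue
        if line[0] != " ":
            # A keyword starts in the first column; the rest is its value.
            current, _, value = line.partition(" ")
            fields[current] = value.strip()
        elif current is not None:
            # Indented lines continue the most recent top-level field.
            fields[current] += " " + line.strip()
    return fields


header = parse_header(SAMPLE_RECORD)
# The LOCUS line carries the locus name first, then the length in bp.
locus_name, length_bp = header["LOCUS"].split()[:2]
print(locus_name, length_bp, header["ACCESSION"])

# Illustrative version suffix; not taken from the record above.
print(split_accession("M28668.2"))
```

Looking a sequence up by `header["ACCESSION"]` rather than by `locus_name` follows the advice in the slides: locus names can change between releases, while accession numbers do not.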

