DARTMOUTH BIOL 039 - BIOINFORMATICS DATABASES


Bioinformatics Databases:
Fundamental Concepts of Database Technology & Data Organization

Kristen Anton
Director of BioInformatics
Dartmouth Medical School

BioInformatics @ Dartmouth Medical School

How can data be organized?
• Paper (i.e. in notebooks)
• Flat files
  – Collection of data records
  – Minimal structure, no metadata
  – Application program must contain relationship information
• Database
  – Hierarchical
  – Network
  – Relational

What is a relational database?
A database composed of relations and conforming to a set of principles governing how such relations are supposed to behave (“Codd’s 12 Rules”). There are many database systems that use tables but don’t conform to all of the principles. These are often called “semirelational” systems.
from Understanding SQL, Martin Gruber

Practically speaking...
• A database is a body of information stored in two dimensions (rows and columns)
  – Rows are records
  – Columns are attributes of those record entities (usually!)
• The groups of rows and columns, or tables, are largely independent of each other
• The power of the database lies in the relationships that you construct among the tables
• A database is self-describing: it contains metadata, which is a description of its own structure

What is a Database Management System (DBMS)?
• A set of programs which define, administer and process databases and their associated applications
• A scalable DBMS can run on multiple platforms (varying sizes)
• A DBMS that supports interoperability uses industry-standard language and standard ways of exchanging data
Examples: Oracle, Sybase, 4D, MS Access …

Features of a Relational Database
• Rows (records) are in no particular order
• Columns (fields) are ordered, numbered and named; names should indicate the content of the field
• The primary key uniquely identifies each row: it ensures that no row is empty, and that every row is different from every other row
• Two-step commit process

Features of a Relational Database (continued)
• A view is a subset of the database that an application (or user) can process
• The database schema is the structure of the entire database
• A constraint is a condition you apply to an attribute of a table

Relationships between tables
• One-to-One, Many-to-One, Many-to-Many
• A “join” is an operation that combines data from multiple tables into a single result table
• An E-R (entity-relationship) diagram is the basic graphic used to describe the structure of a database

    SELECT Sequence.sname, KnownGenes.gname, KnownGenes.length
    FROM Sequence, KnownGenes
    WHERE KnownGenes.length = Sequence.length
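The join on the "Relationships between tables" slide can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table and column names (Sequence, KnownGenes, sname, gname, length) come from the slide's query, while the choice of sname and gname as primary keys, the column types, the ORDER BY clause (added so the output order is stable) and every data row are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Table and column names follow the slide's query. The primary keys,
# the column types and all rows below are hypothetical.
conn.executescript("""
    CREATE TABLE Sequence   (sname TEXT PRIMARY KEY, length INTEGER NOT NULL);
    CREATE TABLE KnownGenes (gname TEXT PRIMARY KEY, length INTEGER NOT NULL);
    INSERT INTO Sequence   VALUES ('seq1', 1200), ('seq2', 850), ('seq3', 400);
    INSERT INTO KnownGenes VALUES ('geneA', 850), ('geneB', 1200), ('geneC', 999);
""")

# The join from the slide: pair each sequence with every known gene
# that has the same length. ORDER BY is an addition for stable output.
rows = conn.execute("""
    SELECT Sequence.sname, KnownGenes.gname, KnownGenes.length
    FROM Sequence, KnownGenes
    WHERE KnownGenes.length = Sequence.length
    ORDER BY Sequence.sname
""").fetchall()

for sname, gname, length in rows:
    print(sname, gname, length)

# A primary key keeps every row distinct: inserting a second 'seq1'
# is rejected by the database itself, not by application code.
try:
    conn.execute("INSERT INTO Sequence VALUES ('seq1', 77)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("duplicate rejected:", duplicate_rejected)
```

With these sample rows, seq3 and geneC appear in no result row because nothing in the other table shares their length; the two tables stay independent, and the relationship exists only in the WHERE clause of the query.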
E-R Diagram

The tool for communicating with relational databases: SQL
• Structured Query Language (SQL)
• A query is a question you ask the database, and SQL retrieves the appropriate answer set
• Interactive SQL (command line) vs. RAD tool/GUI
• Standardization issue: ANSI (American National Standards Institute)

Data Types
• Types of data indicate functions that are possible between related fields
• Each field is assigned one data type (imposes structure on data)
• Examples: text (CHAR, VARCHAR), number (INT, DEC); date, time, money, binary
• Standardization issue: ANSI (American National Standards Institute)

A word about database design:
• Designing a database is not trivial
• The value is not in the data, but in the structure
• Design to facilitate the retrieval and interpretation of the data
• Relationships ease extraction and/or reporting of data from the system
• Redundancy
• Concept of attributes in rows instead of columns

Design database for data extraction: think it through
Subject ID | Address | City/Town | State | Zipcode | DOB | ...
Subject ID | Sibling ID | Sibling Cancer Type | Sibling Dx Date | ...

Example: BioInformatics Core Technology
• Reusable ‘core’ modules, with customizable components
• Standard business logic framework controls transactions (middle layer)
• Metadata-based back-end data storage (facilitates data sharing)

BioInformatics Core Technology
[Architecture diagram; labels recovered from the slide:]
• Sybase databases: Authntic.db, Events.db, Questions.db, Specimen.db, Others... (specific to study)
• Modules: Authntication, Event Track, Utilities, Questions, Spec Track
• Tools: Auth Tools, Spec Tools, Quest Tools, Event Tools, Utility Tools, HTML Tools
• Other boxes: Database Access, Web Apps, ISQL/Reports, HTML, GenericSQLMethod
• Specimen Tracking methods: define/create/edit/destroyItem (), define/create/edit/destroyPkg (), add/deleteItemFrmPkg (), send/receivePkg()
• Other methods shown: create/edit/destroyReportQuery, add/edit/deleteQueryParam, make/get/destroyConnection (), prepareTheCall/Statement(), executeQuery/Update(), create/edit/retireUser, grant/revokeUserPermissions

Data Security: High Priority
HIPAA, FIPS 140-2 (VA), IRB requirements …

Life science has become a field which generates an enormous amount of un-integrated data.
How can methods for data organization help to solve this problem?

What is Data Integration?
• Creating a system which allows the extraction of a piece or set of information (query
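The two column lists on the "think it through" slide (one starting Subject ID, Address, ... and one starting Subject ID, Sibling ID, ...) can be read as a one-to-many design: sibling attributes are stored as rows in a second table rather than as extra columns on the subject. The sketch below illustrates that reading with sqlite3. The table names (subject, sibling), the snake_case column names, the types, the composite primary key and all rows are hypothetical, and only a few of the slide's columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema loosely following the slide's two column lists.
conn.executescript("""
    CREATE TABLE subject (
        subject_id INTEGER PRIMARY KEY,
        city_town  TEXT,
        state      TEXT
    );
    CREATE TABLE sibling (
        subject_id          INTEGER NOT NULL REFERENCES subject(subject_id),
        sibling_id          INTEGER NOT NULL,
        sibling_cancer_type TEXT,
        sibling_dx_date     TEXT,
        PRIMARY KEY (subject_id, sibling_id)
    );
    -- Made-up rows: subject 1 has two siblings, subject 2 has one.
    INSERT INTO subject VALUES (1, 'Town A', 'XX'), (2, 'Town B', 'XX');
    INSERT INTO sibling VALUES
        (1, 1, 'breast', '1998-04-02'),
        (1, 2, 'colon',  '2001-11-15'),
        (2, 1, 'breast', '1995-07-30');
""")

# Because each sibling is a row, counting siblings per subject is a
# plain join plus GROUP BY, however many siblings a subject has.
sibling_counts = conn.execute("""
    SELECT subject.subject_id, subject.city_town, COUNT(*) AS n_siblings
    FROM subject
    JOIN sibling ON sibling.subject_id = subject.subject_id
    GROUP BY subject.subject_id
    ORDER BY subject.subject_id
""").fetchall()
print(sibling_counts)

# The foreign key constraint rejects a sibling whose subject does not exist.
try:
    conn.execute("INSERT INTO sibling VALUES (99, 1, 'lung', '2000-01-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("orphan rejected:", orphan_rejected)
```

Had the siblings been stored as columns on the subject row (sibling 1 type, sibling 2 type, and so on), the same count would need a fixed upper limit on siblings and a query that inspects every such column, which is one way to see why the structure is worth thinking through before the data arrives.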

