UMD BSCI 410 - NCBI Tutorial Notes


NCBI Tutorial Notes, BSCI 410, Spring 07/Liu

Ch 1: General introduction

The National Center for Biotechnology Information (NCBI): Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information.

DNA sequences are stored in three major banks: GenBank (USA), EMBL (Europe), and DDBJ (Japan).
For more information, see http://www.ncbi.nlm.nih.gov/About/index.html
Exponential increase in DNA sequences: http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Explore:
    PubMed: literature searches; try Hartwell, L
    Protein: search "Keratin AND Human"; get about 1000 entries
    BLAST:
    OMIM (Online Mendelian Inheritance in Man)
        Open the statistics link under OMIM Facts on the left
    Taxonomy: click on Arabidopsis
    Structure: 3D structure database for all nucleic acids and proteins whose shape has been determined by X-ray crystallography or nuclear magnetic resonance. The structure database is associated with the VAST program, which allows 3D structural comparisons among different proteins. It can also search the database on the basis of structure.
    Mapviewer (under hot spot): click on Homo sapiens
        Click on the Y chromosome, then click on a unigene

Ch 2: How to use Entrez (All Databases)

Entrez's strength is that it provides links between related types of information. For instance, a stored DNA sequence file is associated with the protein translation of the sequence, with the literature reference, and with links to similar genes or proteins in other organisms. It also provides a characterization of any notable features, such as conserved regions, and ultimately the chromosomal location and 3D structure of the gene product. The retrieval system moves relatively effortlessly among these various types of data.
The links are updated as new data are added to the databases (or new databases are developed), so the resource becomes richer over time.

Go to www.ncbi.nlm.nih.gov
Click on "All Databases" on the top black bar

Search Dystrophin (underlying Duchenne muscular dystrophy) in PubMed
    Type "Dystrophin" in the search box and hit Go
    4393 entries, in reverse chronological order
    Click on some of the entries to explore; get the abstract

Let's call up a paper by "Hoffman, Brown and Kunkel (1987)"
    Click "Limit" on the light blue bar above the display button
    Enter "jan 1, 1987 to Dec 31, 1987" and press Go
    Now you see the "Hoffman Brown and Kunkel" paper
    Click on the authors to get the abstract, related articles, etc.
    Click Link (upper right) and use the drop-down menu to click OMIM
    Click on Xp21.2; this leads to a table
    Click Xp21.2 again to get the human map (genomic map)
    Click DMD to get to LocusLink (all the info about DMD)

Go back to "All Databases"
Click Protein and then search for "Dystrophin" in the search box
    Get 3359 entries
    Click on P11532 (accession number)
    Click on BLink on the upper right-hand side
        The BLink program displays a graphic of an alignment of our gene with all related sequences in the database.
    Click on any entry
        The display you see is the standard output format for GenBank. It has a specific list of sections, each with a particular type of information.
        It also includes highlighted links to other aspects of the databases.
    Check different sections for information
    What is the protein sequence?
        Get the "FASTA" format by clicking the "Display" drop-down menu in the upper left
        Select FASTA from the drop-down menu

Restart the search by going back to "All Databases"
Click "Protein" and then search for
    "Dystrophin and chicken" in the search box; get 83 entries
    "Dystrophin not human" in the search box; get about 2081 entries

Ch 3: How to use Blast (Basic Local Alignment Search Tool)

Blast: the most important single software tool for searching sequence databases.
Query: the sequence used to initiate searches

Go to www.ncbi.nlm.nih.gov
Click BLAST
Click on 'Protein-protein BLAST [blastp]' under the Protein BLAST heading

1. Use the following INSULIN sequence from zebra fish to search
    Copy the following sequence in FASTA format; paste the sequence into the search box

    >gi|12053668|emb|CAC20109.1| insulin [Danio rerio]
    MAVWIQAGALLVLLVVSSVSTNPGTPQHLCGSHLVDALYLVCGPTGFFYNPKRDVEPLLGFLPPKSAQETEVADFAFKDHAELIRKRGIVEQCCHKPCSIFELQNYCN

    Alternatively, you can type the accession number in the window
    "Set subsequence" allows you to search with a particular portion of the sequence. Leave it blank so that the entire sequence will be used in the search.
    "Choose database" has a drop-down menu: choose nr (non-redundant). It includes one copy of each gene or protein in several of the main databases and excludes multiple copies of each record.
    "Do CD-Search" allows a comparison of the query sequence to a database of conserved domain patterns. This is a powerful tool for finding functional domains in genes. Leave it toggled on.
    Hit the submit button
    At the top, you will see the conserved domain (red bar); click on it to see the CD
    Click "format" to get the Blast result
    You will see a detailed list of hits ordered by their alignment scores. They correspond to the ones displayed graphically.
    Note that each line gives the identification information for the protein, followed by the alignment score and the E value. The entries are ranked from the lowest to the highest E value, which can be interpreted as from most similar to more distant. The top line, not surprisingly, is the record for zebra fish insulin itself. If you click on the gene identifier link, it will call up the sequence from Entrez. When you click on the Score link, it will show you the particular alignment from lower down in the output file.

    XXXX is a low-complexity, or repetitive, region that is masked out in the query and ignored in the database search. Such regions may interfere with the alignment. Because the hit is the same protein, you can see the actual masked sequence in the subject line (LLVLLVVSSVS); it is a mostly hydrophobic, repetitive sequence.

    Alignment score (S): indicates how strong the match was (higher is better).
    E value: a statistical measure of the significance of the match; the number of matches with a score at least this high expected to be found in the database by chance alone (lower is better).

    Click on score 91.3: see imperfect matches
        + means similar but not identical
        How do you calculate % identity vs % similarity?
        There are gaps (deletions and insertions) in the alignment
        Note the query vs subject
        Note that the full length of the protein is not shown. BLAST is a local alignment tool and only displays the most
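
The question above about % identity versus % similarity can be worked through with a short Python sketch. It is only an illustration under stated assumptions: the three alignment rows are invented for this example (they are not taken from the BLAST output described in these notes), the function name identity_and_similarity is hypothetical, and the percentages are taken over the full alignment length including gap columns, which is how BLAST reports its "Identities" and "Positives" counts.

```python
def identity_and_similarity(query, midline, subject):
    """Return (% identity, % similarity) for one BLAST-style alignment block.

    The three rows must have equal length. In the midline, a letter marks
    an identical pair, '+' marks a similar (but not identical) pair, and a
    space marks a mismatch or a gap column.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    length = len(query)
    if length == 0:
        raise ValueError("empty alignment")

    identical = sum(1 for m in midline if m.isalpha())
    similar = sum(1 for m in midline if m == "+")
    # Similarity ("Positives") counts identical pairs plus similar pairs.
    positives = identical + similar
    return 100.0 * identical / length, 100.0 * positives / length


# Hypothetical alignment, invented for illustration only.
query = "MAVWLQAG-LLVL"
midline = "MA+W+QA  LL+L"
subject = "MALWMQASALLIL"

pct_id, pct_sim = identity_and_similarity(query, midline, subject)
print(f"identity {pct_id:.1f}%, similarity {pct_sim:.1f}%")
```

In this made-up block there are 8 identical columns and 3 similar ones out of 13, so similarity is always at least as large as identity; the gap column counts toward the length but toward neither total.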

