
Database Searching

Searching for Data
- Text patterns
  - LookUp
- Sequence patterns
  - FindPatterns
  - ProfileSearch
- Sequence similarity
  - FastA, TFastA
  - BLAST, NetBLAST

Introduction to Database Searching
What are you looking for?
- "Exact" matches
  - "Have I cloned something that someone else has already worked on?"
- "Related" sequences
  - Is there something similar to my sequence?
  - Evolutionary relationships
  - Convergent function

Search Program Considerations
- Sensitivity
- Stringency
- Speed
- Cost

Speed and Cost
- The time and cost of a search depend on the size of the database and the size of the query.
- Restrict the size of the database.
- Use the -batch qualifier to save money.
- Use GenBank's services.

Results
- Histogram: a plot of "match scores" vs. number of sequences, which lets you distinguish background noise from significant matches
- Sequence names
- Alignments

FindPatterns
- Locates short sequence patterns in sequences
- Works on nucleic acid or protein
- Searches both strands of a nucleic acid sequence

Pattern Definitions
- FindPatterns, Map, MapSort, MapPlot, and Motifs all let you search with ambiguous expressions.
- Expressions can include any legal GCG sequence character.
- Expressions can also specify:
  - OR and NOT matching
  - Begin and end constraints
  - Repeat counts

Repeats
- Parentheses () enclose one or more symbols that can be repeated.
- Braces {} enclose numbers that tell how many times the symbol(s) must be found.
  (GA){2,10}   GA repeated 2 to 10 times
  G{2,}        G repeated 2 to 350,000 times
  (GAT){,10}   GAT repeated 0 to 10 times

TAATA(N){20,30}ATG
- TAATA, followed by 20 to 30 of any base, followed by ATG

OR Matching
- Enclose the different choices in parentheses and separate the choices with commas.
  RGF(Q,A)S           RGF, followed by either Q or A, followed by S
  GAT(TG,T,G){1,4}A   GAT, followed by any combination of TG, T, or G repeated 1 to 4 times, followed by A

NOT Matching
- Use the ~ symbol.
  GC~CAT       GC, followed by any symbol except C, followed by AT
  GC~(A,T)CC   GC, followed by any symbol except A or T, followed by CC

BEGIN and END Constraints
- The pattern <GACCAT can only be found at the beginning of the sequence.
- The pattern GACCAT> can only be found at the end of the sequence.

    analyze% findpatterns -check

     FindPatterns identifies sequences that contain short patterns like
     GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow
     mismatches. You can provide the patterns in a file or simply type them
     in from the terminal.
     Minimal Syntax: % findpatterns [-INfile=]Genbank:Humig* -Default

     Prompted Parameters:

     -PATterns=GAATTC,RGGAY         patterns to be found
     [-OUTfile=]findpatterns.find   the output file name

     Local Data Files:

     -DATa=pattern.dat              a file with a set of patterns

     Optional Parameters:

     -MISmatch=1      allows mismatches in the search for your subsequence
     -NAMes           writes the output as a list file
     -ONEstrand       searches only the top strand of nucleotide sequences
     -SIXbase         searches only for patterns with six or more symbols
     -CIRcular        searches all sequences as if they were circular
     -ALL             does an "overlapping-set" search in nucleotide sequences
     -PERFect         looks only for perfect matches
     -APPend          appends the pattern data file to the output file
     -SHOw            shows every file searched even if there are no finds
     -TERminal        writes output to the terminal screen instead of a file
     -NOMONitor       suppresses the screen trace showing each file
     -ONCe            limits finds to patterns found a maximum of 1 time
     -MINCuts=1       limits finds to patterns found a minimum of 1 time
     -MAXCuts=3       limits finds to patterns found a maximum of 3 times
     -EXCLude=n1,n2   excludes patterns found between positions n1 and n2
     -SINce=6.90      limits search to sequences dated on or after June 1990
     -BATch           submits the program to run in the batch queue

     Add what to the command line ?

    FINDPATTERNS in what sequence(s) ?  swp:*

    Enter patterns individually, one per line.
    End the list with a blank line.

    Pattern 1: ygdd
    Pattern 2:

    What should I call the output file (* findpatterns.find *) ?  ygdd.find

    ** findpatterns will run as a batch or at job.

    ** findpatterns was submitted using the command: " atnow "

    Job class000.894911339.a will be run at Mon May 11 13:28:59 CDT 1998.

    analyze%

    ! FINDPATTERNS on swp:* allowing 0 mismatches

    ! 1 YGDD

    May 11, 1998 11:02 ..

    AAC1_PSEAE   ck: 7052  len: 177
    ! P23181 pseudomonas aeruginosa. gentamicin 3'-acetyltransferase (ec 2.3.1.61
            YGDD      148: YVQAD YGDD PAVAL

    AMDZ_YEAST   ck: 8601  len: 464
    ! Q03557 saccharomyces cerevisiae (baker's yeast). probable amidase ymr293c 1
            YGDD      450: QVVGQ YGDD STVLD

    AMOB_NITEU   ck: 4649  len: 420
    ! Q04508 nitrosomonas europaea. ammonia monooxygenase (ec 1.13.12.-). 2/961
            YGDD      227: RVLLA YGDD LLMDP

    AMYM_BACST   ck: 5976  len: 717
    ! P19531 bacillus stearothermophilus. maltogenic alpha-amylase precursor (ec

    POLG_HRV1B   VPSGCSGTSI FNTMINNIII RTLVLDAYKN IDLDKLKIIA YGDDVIFSYK
    POLG_HRV2    VPSGCSGTSI FNTMINNIII RTLVLDAYKN IDLDKLKIIA YGDDVIFSYI
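The pattern syntax from the Pattern Definitions, Repeats, OR, NOT, and BEGIN/END slides maps fairly directly onto regular expressions. The Python below is a minimal sketch of that correspondence. It is not how GCG implements FindPatterns, and gcg_to_regex is a hypothetical helper. It rests on several assumptions: nucleic acid symbols follow the IUPAC ambiguity codes, protein letters are matched literally, ~(...) lists only single symbols, nested groups are not supported, and an open-ended repeat such as G{2,} is left unbounded rather than capped at 350,000.

```python
import re

# IUPAC nucleotide ambiguity codes (assumed meaning of ambiguous pattern symbols).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _symbols(sym: str, nucleic: bool) -> str:
    """Residues a single pattern symbol stands for (protein letters are literal)."""
    sym = sym.upper()
    return IUPAC[sym] if nucleic else sym


def _symbol_regex(sym: str, nucleic: bool) -> str:
    chars = _symbols(sym, nucleic)
    return chars if len(chars) == 1 else f"[{chars}]"


def gcg_to_regex(pattern: str, nucleic: bool = True) -> str:
    """Translate a FindPatterns-style expression into a Python regex (sketch)."""
    out, i, n = [], 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "<" and i == 0:              # begin constraint
            out.append("^")
            i += 1
        elif c == ">" and i == n - 1:        # end constraint
            out.append("$")
            i += 1
        elif c == "~":                       # NOT: any symbol except the listed ones
            i += 1
            if pattern[i] == "(":
                j = pattern.index(")", i)
                excluded = pattern[i + 1:j].split(",")
                i = j + 1
            else:
                excluded = [pattern[i]]
                i += 1
            out.append("[^" + "".join(_symbols(s, nucleic) for s in excluded) + "]")
        elif c == "(":                       # group; commas separate OR choices
            j = pattern.index(")", i)
            choices = pattern[i + 1:j].split(",")
            alts = ["".join(_symbol_regex(s, nucleic) for s in ch) for ch in choices]
            out.append("(?:" + "|".join(alts) + ")")
            i = j + 1
        elif c == "{":                       # repeat count: {m,n}, {m,}, {,n} or {m}
            j = pattern.index("}", i)
            body = pattern[i + 1:j]
            if "," in body:
                lo, hi = body.split(",")
                out.append(f"{{{lo or 0},{hi}}}")  # empty upper bound = open-ended
            else:
                out.append(f"{{{body}}}")
            i = j + 1
        else:                                # ordinary, possibly ambiguous, symbol
            out.append(_symbol_regex(c, nucleic))
            i += 1
    return "".join(out)


regex = gcg_to_regex("TAATA(N){20,30}ATG")
print(regex)  # TAATA(?:[ACGT]){20,30}ATG
print(bool(re.search(regex, "GGTAATA" + "C" * 25 + "ATGCC")))  # True
```

Protein patterns such as RGF(Q,A)S need nucleic=False so that letters like Q are taken literally instead of being looked up as nucleotide codes.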

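Two FindPatterns behaviors noted above are the default search of both strands of a nucleic acid and the -MISmatch parameter. A simple sliding window can illustrate both. This is a hypothetical sketch and not the FindPatterns algorithm. It handles fixed-length patterns only, with no repeats or OR choices. A mismatch is counted whenever a base falls outside the IUPAC set of the corresponding pattern symbol. Bottom-strand hits are reported by their 1-based start on the top strand, so a palindromic site such as GAATTC appears once per strand.

```python
# IUPAC nucleotide ambiguity codes: the set of bases each pattern symbol accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_with_mismatches(seq: str, pattern: str, max_mismatches: int = 0):
    """Return (strand, start, window) for every window within max_mismatches.

    start is the 1-based position of the hit's leftmost base on the top strand.
    """
    seq, pattern = seq.upper(), pattern.upper()
    k, n = len(pattern), len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - k + 1):
            window = s[i:i + k]
            mismatches = sum(base not in IUPAC[sym] for base, sym in zip(window, pattern))
            if mismatches <= max_mismatches:
                # Map a bottom-strand offset back to top-strand coordinates.
                start = i + 1 if strand == "+" else n - i - k + 1
                hits.append((strand, start, window))
    return hits


print(find_with_mismatches("AAGAATTCAA", "GAATTC"))
# [('+', 3, 'GAATTC'), ('-', 3, 'GAATTC')]
```

Raising max_mismatches to 1 (the equivalent of -MISmatch=1) on a sequence such as AAGATTTCAA turns a miss into one near-hit on each strand.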

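Each hit in the ygdd.find output gives a position and five residues of context on either side of the match, for example "148: YVQAD YGDD PAVAL". A hypothetical formatter for lines in that style follows. The assumption is that the number is the 1-based start of the match. The test sequence is made up around the AAC1_PSEAE context shown above. Because re.finditer returns non-overlapping matches, overlapping occurrences would be missed.

```python
import re


def format_hits(seq: str, pattern: str, label: str = "", flank: int = 5) -> list[str]:
    """Format regex hits as 'LABEL POS: LEFT MATCH RIGHT' (hypothetical layout)."""
    seq = seq.upper()
    label = label or pattern.upper()
    lines = []
    # re.finditer scans left to right and yields non-overlapping matches.
    for m in re.finditer(pattern, seq, flags=re.IGNORECASE):
        left = seq[max(0, m.start() - flank):m.start()]
        right = seq[m.end():m.end() + flank]
        lines.append(f"{label} {m.start() + 1}: {left} {m.group()} {right}")
    return lines


for line in format_hits("MKYVQADYGDDPAVALR", "ygdd"):
    print(line)  # YGDD 8: YVQAD YGDD PAVAL
```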

UAB MIC 753 - Database Searching
