
Pattern Recognition

Primary Sequences with Functional Significance

Gene Finding

Recognition of Coding Regions
- What type of information is present within a primary genomic nucleotide sequence which might provide a hint as to which genomic sequences code for proteins?

Gene Signals
- Promoters
- Terminators
- Ribosome binding sites
- Start/Stop codons

Context Searches
- Open reading frames
- Codon usage preference
- Species-specific preferences
- Non-random nucleotide patterns
- GC base composition
- Species-specificity GC percentage
- Third position GC bias
- General base frequency
- Non-random triplet organization
- Hidden Markov Models
- Neural Networks

Reading Frame Prediction and Codon Analysis
- Frames
- CodonFrequency
- Correspond

Gene Finding Programs
- CodonPreference
- TestCode
- Genemark
- Glimmer
- GrailPro
- Many Others

Translation
- Translate
- BackTranslate
- PepData

Sequence Patterns
- Composition
- Terminator
- Repeat
- Window
- StatPlot

Finding Open Reading Frames

Frames
- Identification of Reading Frames
- Open frames only
- All start and stop codons
- Necessary for viruses and eukaryotic genes
- Rare codon display
- Mark boxes

analyze% frames -check humhbb.gb_pr1

Frames shows open reading frames for the six translation frames of a
DNA sequence. Frames can superimpose the pattern of rare codon
choices if you provide it with a codon frequency table.

Minimal Syntax: % frames [-INfile1=]Bacteria:EcoOmpa -Default

Prompted Parameters:

-BEGin=1 -END=2270          range of interest

Local Data Files:

-TRANSlate=translate.txt    defines start and stop codons
-MARk1=ecoompa.mrk          marks regions of interest on the plot

Optional Parameters:

-ALL                  shows all start and stop signals, not just open frames
-RARe=ecohigh.cod     mark rare codons according to codon frequency table
-THReshold=0.0        sets threshold at or below which rare codons are shown
-DENsity=2270         sets the number of bases per 100 platen units

All GCG graphics programs accept these and other switches. See the Using
Graphics chapter of the USERS GUIDE for descriptions.

-FIGure[=FileName]    stores plot in a file for later input to FIGURE

Add what to the command line ?

Process set to plot with VT340 attached to term
using the regd graphic interface.

Begin (* 1 *) ?
End (* 73308 *) ?

When your VT340 attached to tty is ready, press <Return>.

E. coli ompA Gene

LOCUS       ECOOMPA      2270 bp ds-DNA      BCT      15-SEP-1989
DEFINITION  E.coli sulA and ompA genes coding for sulA protein (lon suppressor)
            and outer membrane protein II.
FEATURES             Location/Qualifiers
     CDS             172..681
                     /note="sulA protein"
                     /gene="sulA"
                     /codon_start=1
     CDS             1036..2076
                     /note="outer membrane protein II"
                     /gene="ompA"
                     /codon_start=1

E. coli ompA Gene /rare

β-Actin gene ORF

β-Actin gene /all

Human Fetal β-Globin Gamma Gene ORF

Human Fetal β-Globin Gamma Gene /all

CodonPreference
- Preference for particular codons within a synonymous group
- Due to utilization of specific isoaccepting tRNA species
- GC genomic biases
- Organism and gene differences
- High and low expressers

Codon Frequency Tables
- Table of codon usage frequencies
- Organisms
- Classes of genes
- Particular genes

analyze% more ecohigh.cod

!!CODON_FREQUENCY 1.0

Codon usage for enteric bacterial (highly expressed) genes 7/19/83

AmAcid   Codon    Number     /1000    Fraction   ..

Gly      GGG       13.00      1.89      0.02
Gly      GGA        3.00      0.44      0.00
Gly      GGU      365.00     52.99      0.59
Gly      GGC      238.00     34.55      0.38

Glu      GAG      108.00     15.68      0.22
Glu      GAA      394.00     57.20      0.78

Asp      GAU      149.00     21.63      0.33
Asp      GAC      298.00     43.26      0.67

Val      GUG       93.00     13.50      0.16
Val      GUA      146.00     21.20      0.26
Val      GUU      289.00     41.96      0.51
Val      GUC       38.00      5.52      0.07

Ala      GCG      161.00     23.37      0.26
Ala      GCA      173.00     25.12      0.28
Ala      GCU      212.00     30.78      0.35
Ala      GCC       62.00      9.00      0.10

Available Tables - GCG

analyze% to genrundata
/usr1/gcg/gcgcore/data/rundata
analyze% ls *.cod
- ecohigh.cod

analyze% to genmoredata
/usr1/gcg/gcgcore/data/moredata
analyze% ls *.cod
- celegans_high.cod
- drosophila_high.cod
- ecolow.cod
- maize_high.cod
- celegans_low.cod
- human_high.cod
- yeast_high.cod

Codon Preference Statistic
- Measure of how likely the use of a particular codon is, in comparison to its frequency in a random sequence of the same composition
- p = F/R
- F: Expected frequency of a codon's occurrence
  - from the .cod file
- R: Frequency of a codon's occurrence in a random sequence of the same composition

Output
- Codon preference statistic is averaged over a window of 25 codons (default)
- Output is a curve of the statistic vs. nucleotide position

Third Position Nucleotide Bias
- Preference for particular nucleotides in the third (wobble) position of the codon
- Based upon overall G+C content of the genome (?)
- /bias=GC
- (or AT, or any desired nucleotides)

analyze% codonpreference -check bg.seq

CodonPreference is a frame-specific gene finder that tries to
recognize protein coding sequences by virtue of the similarity of their
codon usage to a codon frequency table or by the bias of their
composition (usually GC) in the third position of each codon.

Minimal Syntax: % codonpreference [-INfile1=]Bacterial:EcoOmpa -Default

Prompted Parameters:

-BEGin=1 -END=2270          range of interest
-REVerse                    use the back strand
-FREQuency[=ecohigh.cod]    codon frequency table
-PWINdow=25                 preference window in codons
-RARe=0.1                   rare codon display threshold (-NORARe suppresses)
-DENsity=74.48              density in bases per centimeter (11 x 17 paper)

Local Data Files:

-TRANSlate=translate.txt    defines the start and stop codons
-MARk=ecoompa.mrk           defines regions of known interest

Optional Parameters:

-BIAS=GC            shows third position bias curves for G+C (-NOBIAS suppresses)
-NOPREFerence       suppresses the codon preference curves
-BWINdow=25         bias window in codons
-FILe[=FName]       makes an output file of the preference curve values
-TABle[=FName]      creates a table with the statistics for each codon
-ALLFrames          shows all start and stop codons
-NOFRAmes           suppresses the reading frame part of the plot
-NOPLOt             suppresses the whole plot
-PHEIght=77.0       sets the height of the vertical axis in platen units
-PLENgth=120.0      sets the length of the horizontal axis in platen units
-PSCAlemax=2.2      sets the maximum value on the codon preference scale
-BSCAlemax=1.1      sets the maximum value on the third position bias scale

All GCG graphics programs accept these and other switches. See the Using
Graphics chapter of the USERS GUIDE for descriptions.

-FIGure[=FileName]  stores plot in a file for later input to FIGURE

Add what to the command line ?

Process set to plot with VT340 attached to term
using the regd graphic interface.

Begin (* 1 *) ?
End (* 5000 *) ?
Reverse (* No *) ?
What codon frequency file (* GenRunData:ecohigh.cod *) ?
What codon preference window size (in codons) (* 25 *) ?

The minimum density for a one-page plot is 164.04 bases/cm.

What density would
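
The codon preference statistic from the slides (p = F/R, averaged over a window of codons) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GCG implementation: the function names `preference` and `preference_curve` are hypothetical, F is taken from the Fraction column of the ecohigh.cod excerpt shown earlier (five amino acids only), R is computed from the base composition of the input and normalised within each synonymous family, the window average is a geometric mean, and a small floor stands in for fractions listed as 0.00.

```python
import math
from collections import Counter

# Fraction column from the ecohigh.cod excerpt in the slides.
# Only the five amino acids shown there are covered; any other
# codon in the input raises KeyError.
FRACTION = {
    "Gly": {"GGG": 0.02, "GGA": 0.00, "GGU": 0.59, "GGC": 0.38},
    "Glu": {"GAG": 0.22, "GAA": 0.78},
    "Asp": {"GAU": 0.33, "GAC": 0.67},
    "Val": {"GUG": 0.16, "GUA": 0.26, "GUU": 0.51, "GUC": 0.07},
    "Ala": {"GCG": 0.26, "GCA": 0.28, "GCU": 0.35, "GCC": 0.10},
}
FAMILY = {codon: aa for aa, fam in FRACTION.items() for codon in fam}
FLOOR = 0.01  # assumption: avoids log(0) for a fraction listed as 0.00


def preference(codon, base_freq):
    """Return p = F/R for one codon within its synonymous family."""
    fam = FRACTION[FAMILY[codon]]
    f = max(fam[codon], FLOOR)
    # R: chance of this codon in a random sequence with the same base
    # composition, normalised over the synonymous codons (assumption).
    rand = {c: math.prod(base_freq[b] for b in c) for c in fam}
    r = rand[codon] / sum(rand.values())
    return f / r


def preference_curve(seq, window=25):
    """Windowed statistic for reading frame 1 of seq (DNA or RNA)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq)
    base_freq = {b: counts[b] / len(seq) for b in "ACGU"}
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(preference(c, base_freq)) for c in codons]
    # Geometric mean over each window (assumption: the slides only
    # say the statistic is "averaged").
    return [math.exp(sum(logs[i:i + window]) / window)
            for i in range(len(logs) - window + 1)]
```

With this sketch, a stretch built from the codons that ecohigh.cod favours gives window values above 1, and a stretch built from the disfavoured synonyms gives values below 1. Calling `preference_curve` on the sequence shifted by one or two bases gives the curves for the other two forward frames.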

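
The open-frame search that Frames performs over the six translation frames can be illustrated with a similar sketch. It assumes ATG as the only start codon and TAA, TAG and TGA as stop codons, whereas the real program reads these from translate.txt. The names `reverse_complement` and `open_frames` are hypothetical, and coordinates are reported on the strand being read.

```python
STOPS = {"TAA", "TAG", "TGA"}   # assumption: standard stop codons
START = "ATG"                   # assumption: single start codon
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def open_frames(seq, min_codons=1):
    """List (frame, start, end) for every ATG...stop stretch.

    frame is +1..+3 for the forward strand and -1..-3 for the reverse
    strand. start and end are 0-based offsets on the strand being
    read; end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    found = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i            # first start codon opens a frame
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        found.append((strand * (offset + 1), start, i + 3))
                    start = None         # frame closed; look for next start
    return found
```

For example, `open_frames("CCATGAAATTTTAGCC")` reports one open frame in forward frame 3. Raising `min_codons` filters out short frames that are unlikely to be genes.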

UAB MIC 753 - Pattern Recognition
