Protein Analysis
- Peptide Mapping
- Structural/Functional Motifs
- Secondary Structure Prediction

Motifs: Identification of Functional Domains

Identifying Functional Protein Domains
- Search protein sequences with a database of defined functional motifs
- Motifs are derived by aligning peptide regions which have been shown to have common function
- A sequence specification is derived from the alignment which can be used to search for similar motifs in other protein sequences

Motif Sequence Specifications
- The sequence specification is the same as for FindPatterns. This is used as a consensus pattern in a search
  - Motifs
- The sequence specification may also be defined as a profile constructed from a set of aligned sequences and used as a part of a library of profiles in a search
  - ProfileScan

Pattern Definitions
- FindPatterns, Map, Mapsort, Mapplot, and Motifs all let you search with ambiguous expressions
- Expressions can include any legal GCG sequence character
- Expressions can also specify:
  - OR and NOT matching
  - Begin and end constraints
  - Repeat counts

TAATA(N){20,30}ATG
- TAATA, followed by 20 to 30 of any base, followed by ATG

Repeats
- Parentheses () enclose one or more symbols that can be repeated
- Braces {} enclose numbers that tell how many times the symbol(s) must be found
- (GA){2,10} - GA repeated 2 to 10 times
- G{2,} - G repeated 2 to 350,000 times
- (GAT){,10} - GAT repeated 0 to 10 times

OR Matching
- Enclose the different choices in parentheses and separate the choices with commas
- RGF(Q,A)S
  - RGF followed by either Q or A followed by S.
- GAT(TG,T,G){1,4}A means
  - GAT followed by any combination of TG, T, or G repeated from 1 to 4 times followed by A

NOT Matching
- Use the ~ symbol
- GC~CAT
  - GC, followed by any symbol except C followed by AT
- GC~(A,T)CC
  - GC followed by any symbol except A or T, followed by CC.

BEGIN AND END Constraints
- The pattern <GACCAT can only be found at the beginning of the sequence
- The pattern GACCAT> can only be found at the end of the sequence

Motifs
- Uses the Prosite dictionary of peptide motifs to search for occurrences of each motif in a query sequence

Prosite
- Dictionary of protein sites and patterns
- http://www.expasy.ch/prosite/
- Distributed by EMBL and maintained by Dr. Amos Bairoch at the University of Geneva
- Release 16.35; 13-Apr-2001
- 1,462 motif descriptions
- GCG at release 16, 7/1999

Prosite Files
- Site name
- Site Description
- The sequence motif in FindPatterns format
- An abstract file describing the motif along with references

Restrictions
- Patterns are limited to 350 characters
- Motifs does not introduce gaps
- Mismatches can be tolerated with /Mis=n

analyze% more prosite.seqcat

11s_Seed_Storage  11-S plant seed storage proteins signature.  11/19 (0284.pdoc)  NGx(D,E)2x(L,I,V,M,F...
1433_1  14-3-3 proteins signature 1.  11/19 (0633.pdoc)  RNL(L,I)SV(G,A)YKN(I...
1433_2  14-3-3 proteins signature 2.  11/19 (0633.pdoc)  YK(D,E)STLIMQLL(R,H)...
25a_Synth_1  2'-5'-oligoadenylate synthetases signature 1.  11/19 (0653.pdoc)  GGSx(A,G)(K,R)xTxL(K...
25a_Synth_2  2'-5'-oligoadenylate synthetases signature 2.  11/19 (0653.pdoc)  RPVILDPx(D,E)PT
2fe2s_Ferredoxin  2Fe-2S ferredoxins, iron-sulfur binding region signature.  11/19 (0175.pdoc)  C~(C)~(C)(G,A)~(C)C(...
3_Hydroxyisobut_Dh  3-hydroxyisobutyrate dehydrogenase signature.  11/19 (0697.pdoc)  F(L,I,V,M)GLGxMGxPM(...
3hcdh  3-hydroxyacyl-CoA dehydrogenase signature.  11/19 (0065.pdoc)  (D,N,E)x2GF(L,I,V,M,...
43_Kd_Postsynaptic  43 Kd postsynaptic protein signature.  11/19 (0339.pdoc)  GQDQTKQQI
4_Disulfide_Core  WAP-type 'four-disulfide core' domain signature.  11/19 (0026.pdoc)  Cx~(C)(D,N)x2Cx5CC
4fe4s_Ferredoxin  4Fe-4S ferredoxins, iron-sulfur binding region signature.  11/19 (0176.pdoc)  Cx2Cx2Cx3C(P,E,G)
5_Nucleotidase_1  5'-nucleotidase signature 1.  11/19 (0627.pdoc)  (L,I,V,M)x(L,I,V,M)2...
5_Nucleotidase_2  5'-nucleotidase signature 2.  11/19 (0627.pdoc)  (F,Y)x4(L,I,V,M)GNHE...
6pgd  6-phosphogluconate dehydrogenase signature.  11/19 (0390.pdoc)  (L,I,V,M)xDx2(G,A)(N...
;Amidation  Amidation site.  4/19 (0009.pdoc)  xG(R,K)(R,K)
;Asn_Glycosylation  N-glycosylation site.  4/19 (0001.pdoc)  N~(P)(S,T)~(P)

analyze% fetch 0001.pdoc
analyze% cat 0001.pdoc

************************
* N-glycosylation site *
************************

It has been known for a long time [1] that potential N-glycosylation sites are
specific to the consensus sequence Asn-Xaa-Ser/Thr. It must be noted that the
presence of the consensus tripeptide is not sufficient to conclude that an
asparagine residue is glycosylated, due to the fact that the folding of the
protein plays an important role in the regulation of N-glycosylation [2]. It
has been shown [3] that the presence of proline between Asn and Ser/Thr will
inhibit N-glycosylation; this has been confirmed by a recent [4] statistical
analysis of glycosylation sites, which also shows that about 50% of the sites
that have a proline C-terminal to Ser/Thr are not glycosylated.

It must also be noted that there are a few reported cases of glycosylation
sites with the pattern Asn-Xaa-Cys; an experimentally demonstrated occurrence
of such a non-standard site is found in the plasma protein C [5].

-Consensus pattern: N-{P}-[ST]-{P}
                    [N is the glycosylation site]
-Last update: May 1991 / Text revised.

[ 1] Marshall R.D.
     Annu. Rev. Biochem. 41:673-702(1972).
[ 2] Pless D.D., Lennarz W.J.
     Proc. Natl. Acad. Sci. U.S.A. 74:134-138(1977).
[ 3] Bause E.
     Biochem. J. 209:331-336(1983).

analyze% motifs -check

Motifs looks for sequence motifs by searching through proteins for the
patterns defined in the PROSITE Dictionary of Protein Sites and
Patterns. Motifs can display an abstract of the current literature on
each of the motifs it finds.

Minimal Syntax: % motifs [-INfile=]SW:Kad1_Human -Default

Prompted Parameters:

[-OUTfile=]kad1_human.motifs  the output file name

Local Data Files:

-DATa=prosite.patterns  file of protein sequence patterns

Optional Parameters:

-NOREFerence    suppresses the PROSITE abstract for each pattern found
-FREquent       shows motifs that are frequently found in proteins
-MISmatch=1     allows one mismatch
-NAMes          writes the output as a list file
-APPend         appends the pattern data file to your output file
-SHOw           shows every file searched, even if no pattern was found
-ONCe           limits finds to patterns found only once
-MINCuts=2      limits finds to patterns found a minimum of 2 times
-MAXCuts=3      limits finds to patterns found a maximum of 3 times
-EXCLude=n1,n2  excludes patterns found between positions n1 and n2
-NOMONitor      suppresses the screen trace showing each file
-NOSUMmary      suppresses the screen summary at
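
The pattern syntax from the earlier slides (parentheses with commas for OR, ~ for NOT, braces for repeat counts, < and > for begin and end constraints) maps fairly directly onto regular expressions. The following is a minimal Python sketch of that mapping, not GCG's own implementation. The names gcg_to_regex and find_sites, the wildcards parameter, and the test sequence are hypothetical and introduced only for illustration. The sketch assumes no nested parentheses, negates single symbols only, and does not handle mismatches or the bare numeric repeat counts (such as x2) visible in the pattern file listing.

```python
import re


def _symbols(text, wildcards):
    """Escape literal sequence symbols, expanding caller-supplied wildcards."""
    return "".join(wildcards.get(ch, re.escape(ch)) for ch in text)


def gcg_to_regex(pattern, wildcards=None):
    """Translate a FindPatterns-style pattern into a Python regex string.

    wildcards is a hypothetical helper argument: a dict mapping an ambiguity
    symbol to a regex fragment, e.g. {"N": "[ACGT]"} for nucleotide searches.
    It is left empty for proteins, where N means asparagine.
    """
    wildcards = wildcards or {}
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "<" and i == 0:
            out.append("^")              # begin constraint
            i += 1
        elif ch == ">" and i == n - 1:
            out.append("$")              # end constraint
            i += 1
        elif ch == "~":
            # NOT: negate the next symbol, or a parenthesised list of symbols
            if i + 1 < n and pattern[i + 1] == "(":
                close = pattern.index(")", i)
                choices = pattern[i + 2:close].split(",")
                i = close + 1
            else:
                choices = [pattern[i + 1:i + 2]]
                i += 2
            if any(len(c) != 1 for c in choices):
                raise ValueError("this sketch negates single symbols only")
            out.append("[^" + "".join(re.escape(c) for c in choices) + "]")
        elif ch == "(":
            # Grouping; commas separate OR choices
            close = pattern.index(")", i)
            choices = pattern[i + 1:close].split(",")
            out.append("(?:" + "|".join(_symbols(c, wildcards) for c in choices) + ")")
            i = close + 1
        elif ch == "{":
            # Repeat count; a missing lower bound means 0
            close = pattern.index("}", i)
            low, comma, high = pattern[i + 1:close].partition(",")
            out.append("{" + (low or "0") + comma + high + "}")
            i = close + 1
        else:
            out.append(_symbols(ch, wildcards))
            i += 1
    return "".join(out)


def find_sites(pattern, sequence, wildcards=None):
    """Return (1-based position, matched text) pairs, including overlaps."""
    regex = re.compile("(?=(" + gcg_to_regex(pattern, wildcards) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation pattern from the listing, run on a made-up peptide
    print(gcg_to_regex("N~(P)(S,T)~(P)"))                   # N[^P](?:S|T)[^P]
    print(find_sites("N~(P)(S,T)~(P)", "MKNGSAPNPTWNCTQ"))  # [(3, 'NGSA'), (12, 'NCTQ')]
    print(gcg_to_regex("TAATA(N){20,30}ATG", {"N": "[ACGT]"}))
```

In the made-up peptide, the NPT stretch is skipped because proline follows the asparagine, which is the behavior the N-glycosylation abstract describes. The wildcards argument is passed per search because the same letter can mean different things: N is any base in a nucleotide pattern but asparagine in a protein pattern.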


UAB MIC 753 - Protein Analysis
