Name: Rachel Patterson

BLAST Assignment Answer Sheet

Please read the instructions found in the BLAST handout and record all answers on this worksheet ELECTRONICALLY. If you choose to handwrite your answers, you will need to create some space for some of the answers. This answer sheet MUST be emailed to your TA by your discussion TIME on the due day listed on the discussion syllabus. REMEMBER THAT ALL ANSWERS ARE SUPPOSED TO BE IN YOUR OWN WORDS.

PART ONE: BLASTN

6a. Briefly explain what information is contained within this summary.
The summary shows how well the query sequence matches each result sequence.

6b. Given your knowledge of sequence similarity measures, which color is least desirable? Most desirable? Explain.
Black is least desirable because it indicates an alignment score of less than 40. Red is the most desirable because it marks the highest-scoring alignments, which are the most likely to be close or exact matches.

7a. Record the score: 1548 and the E-value: 0.0
Alignment:

7b. How many base pairs long is the alignment?
858

7c. In which organism is this similar sequence found? Is there any variation in different strains of this species?
Vibrio cholerae. Yes, there is variation among the different strains of this species.

7d. Is there any information about what protein the gene encodes?
Not in the nucleotide search.

7e. Look further down the list for 2 other species of organisms. Collect the scores, E-values, and alignments of another organism with similar sequence.
Escherichia coli has scores of 654 to 645 and E-values of 0.0 to 1e-180. The alignment has fewer matching bases.

8a. Match after drop in scores to less than 100: the score: 59.0 and the E-value: 4e-04
Alignment:
Organism: Felis catus isolate Cinnamon breed

8b. Compare these values to those of the first alignment. How do the values differ? Why is the first alignment a better match?
More base pairs match. The score is higher, indicating that the organism is a better match to the sequence. The E-values (expect values) of the first alignment are 0.0, meaning that essentially no match this good would be expected by chance, whereas the larger E-value of the later match means that result is much less reliable.
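To illustrate why a higher score goes with a lower E-value in the answers above, here is a minimal sketch of the standard relation between a bit score and its expect value, E = m * n * 2^(-S), where S is the bit score, m the query length, and n the total database length. The database length used here (1e11 bases) is a made-up placeholder, not the size of the database actually searched, so the printed numbers are only meant to show the trend and are not expected to reproduce the E-values recorded on this sheet.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S) with S in bits. Real BLAST uses
    "effective" lengths, which this sketch ignores.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 858          # alignment length recorded in 7b
HYPOTHETICAL_DB = 1e11   # assumption: placeholder database size in bases

best = expect_value(1548, QUERY_LEN, HYPOTHETICAL_DB)   # score from 7a
weak = expect_value(59.0, QUERY_LEN, HYPOTHETICAL_DB)   # score from 8a

print("best match E:", best)   # underflows to 0.0 in floating point
print("weak match E:", weak)   # small, but clearly larger than the best match
```

Each additional bit of score halves the expected number of chance hits, which is why a score in the thousands is reported as an E-value of 0.0 while a score near 59 is not.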
PART TWO: BLASTX

5a. What is the conserved motif found in a translation of your sequence?
Enterotoxin a

6a. What are the main differences between this summary and the one you saw earlier?
It's for proteins, so the 20 amino acid letters are shown instead of the four bases. It shows family and superfamily because of the conserved motif. It only looks at 285 identities instead of the 858 base pairs.

6b. Look at the numbers in the query that matched. Does it look like all the DNA is in a coding region, or is only part of it matching a protein once your query is translated?
Not all of the DNA is in a coding region; only part of it seems to match proteins when translated.

6c. Best match: the score: 564 and the E-value: 0.0
Alignment:
Paste the best alignment that includes a best description of the protein's function.
It doesn't say anything about cholera enterotoxin A's function, but the toxin is known to cause the diarrhea symptoms residents complained of.

6d. In which organism is this amino acid sequence found?
Vibrio cholerae O1 str. 2010EL-1786
What protein is encoded?
Enterotoxin A
Does this protein's sequence differ much between different strains of this species?
Not too much; it seems pretty conserved.

6e. Find two different species with matches in the Descriptions list that you found in BLASTN above. Report on these two matches: score, E-values, and paste the alignments. What do they encode? Is the match any better (use score and E-values) when matching amino acids than when matching nucleotides?
Escherichia coli: score 426, E-value 4e-147. Codes for heat-labile enterotoxin A. Scores are better, with lower E-values, using BLASTN.
Vibrio phage: score 375 (1584 in BLASTN) with an E-value of 8e-129 (0 in BLASTN). It codes for cholera toxin A. Scores are better, with lower E-values, in BLASTN.

BLASTN vs BLASTX comparison

Which form of comparison (nucleotide to nucleotide OR translated nucleotide to protein) gives the best idea of what is the closest match?
BLASTN gives the best match.

Based upon all the information you have discovered above, draw some conclusions about your DNA sequence. What is the likely organism of origin? How much confidence do you have in both the nucleotide BLAST and BLASTX search results? Are the second and third best matching organisms possible contaminants in the drinking water, or is it clearly the best matching organism?
The likely culprit is Vibrio cholerae O1. I have a lot of confidence in this answer, tempered only by the variance among strains of Vibrio cholerae: it could easily be another strain, but it is definitely Vibrio cholerae. The second and third best matching organisms are possible but not probable, given the superiority of the Vibrio cholerae matches to any other matches.

PART THREE: BLASTP

13a. The conserved motif found in a translation of your sequence:
None detected

14a. Considering just the first match graphics bars, how do these results with the short peptide compare to the BLASTX with the long stretch of translated nucleotide sequence? Note the colors.
These are green (50-80), which is not terrible but nowhere near as strong as the matches with the long stretch of translated sequence.

14b. What is the highest score and E-value for the best alignment listed? Paste the best alignment that includes a good description of the gene.
Best match: score: 75.7 and the E-value: 2e-14
Alignment:

14c. In which organism is this amino acid sequence found?
Vibrio cholerae
What protein is encoded?
CtxB

14d. Find another species with matches in the Descriptions list. Report on this match: score, E-values, and paste the alignments.
Vibrio phage CTX: score: 72.3 and the E-value: 3e-13
Alignment:
What does that protein function as? Is it a likely alternative to the best match?
It's another enterotoxin, but it is not a likely alternative to the best match, given that its score is lower and it does not match the amino acid sequence as well.

15. Full sequence of the protein based upon the fragment found by your protein colleagues:

16. BLASTN results from using 23 nucleotides of your gene's sequence. Can you find the same gene as you did before in the list?
Yes

Possible number of bases in a 23 nt DNA sequence is: 4^23 = 7.036874 x 10^13
Possible number of amino acids in a 23 aa protein sequence is: 20^23 = 8.388608 x 10^29

Why was it so much harder to find a match with only 23 bases of DNA sequence compared to 23 amino acids of protein sequence?
Because protein sequences have far more possible combinations (20 choices per position instead of 4), a match over 23 amino acids is much more specific. DNA has so few bases in comparison that any given run of 23 is far more likely to recur in other sequences.
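The two counts in the last answer can be checked with a few lines of Python. This is only a sketch of the arithmetic (alphabet size raised to the sequence length); the variable names are illustrative and are not part of the assignment.

```python
DNA_ALPHABET = 4        # A, C, G, T
PROTEIN_ALPHABET = 20   # the 20 standard amino acids
LENGTH = 23             # length of the short query used in step 16

# Number of distinct sequences of the given length over each alphabet.
dna_sequences = DNA_ALPHABET ** LENGTH
protein_sequences = PROTEIN_ALPHABET ** LENGTH

print(f"DNA:     {dna_sequences:.6e}")      # 7.036874e+13
print(f"Protein: {protein_sequences:.6e}")  # 8.388608e+29

# How many times larger the protein sequence space is (equals 5**23).
ratio = protein_sequences // dna_sequences
print(f"Ratio:   {ratio:.3e}")
```

The ratio is about 10^16, which is why a 23-residue peptide pins down its source far more tightly than a 23-base DNA fragment.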