CORNELL BIOMG 1350 - Section 3 - Worksheet - Full (Answers)

This preview shows page 1-2 out of 5 pages.


BIOMG 1350 – Section 3 Worksheet
DO NOT WRITE ON THIS INSTRUCTION SHEET

Databases and Protein Domains Activity

In this activity you will learn how to use BLAST to find homologs of, and conserved domains in, an unknown anemone protein.

On the desktop of your computer there is a folder labeled BioMG1350. In this folder is a text file named Morgan_unknownsequence. Morgan is one of the former TAs for this course. She generated this sequence from the anemone Aiptasia pallida (a Cnidarian). Morgan found that sick anemones expressed much more of this gene in their cells than healthy anemones did, so she really wanted to figure out what this gene does. This is your job…

Q1: Double click on the text file to open it. Is this an amino acid sequence, an RNA sequence, or a DNA sequence?
A: DNA

Q2: In principle, should we be able to easily convert between these three forms?
A: It is easy to convert from DNA to protein if one knows where the ORF is. One cannot convert from protein back to DNA because of redundancy in the genetic code, although it is often possible to know the first two nucleotides in a codon.

Now go to http://blast.ncbi.nlm.nih.gov/Blast.cgi. For our purposes we will only use the nucleotide blast, the protein blast, and blastx.

First we will do a nucleotide blast on Morgan’s sequence:
- Click on ‘nucleotide blast’ and it will take you to a new page.
- Paste Morgan’s sequence into the search box. (You can include the first line, because the > sign tells the program that what is on that line is the title of the search.)
- Under ‘Choose Search Set’ you can change a number of the settings. (You can choose to search different databases, to search particular organisms, or to exclude particular organisms.)
- Use the pull down menu to change the search database from the entire ‘nucleotide collection (nr/nt)’ to just ‘Reference RNA sequences (refseq_rna)’. Remember we want to know what this protein does, so known reference RNA sequences are our best way to find out!
- Press the BLAST button and it will start searching GenBank for similar sequences. Give it some time to finish searching.

Q3: What result does it give you? (hint: don’t wait too long)
A: No matches

Q4: Remember that Cnidarians (anemones) are not organisms that most people study, and Cnidarians are very distantly related to most organisms that people do study. So what does this result from BLAST likely tell us about the sequence that we searched? (1 sentence max!)
A: There are no known nucleotide sequences similar to this cnidarian gene, probably because we have few sequences from cnidarians in general.

OK, let’s try something different. Morgan’s sequence could code for the amino acid sequence of a protein. So in theory we could translate Morgan’s sequence and BLAST the amino acid sequence, which may be more conserved across taxa than the nucleotide sequence! However, to do this we need to figure out (1) whether this sequence codes for a protein, and (2) which part of Morgan’s sequence codes for a protein.

It turns out that there are a number of programs available on the web that will search nucleotide sequences for start codons and stop codons. These programs will find portions of nucleotide sequences that could be protein-coding regions. These putative protein-coding regions are called ‘Open Reading Frames’ (ORF for short).

One thing to keep in mind is that these programs will identify open reading frames that may or may not be actual translation sites. For example, if we took a random sequence and put it into these programs, we would still expect to find some ‘open reading frames’ by chance alone. However, we know that for eukaryotes each mRNA generally codes for only a single protein! So when we see multiple open reading frames in an mRNA molecule, we know that only one is actually translated. In general this tends to be the longest open reading frame.
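The ORF search described above can be sketched in a few lines of Python. This is only an illustration of the idea (scan each reading frame on both strands for a start codon followed by an in-frame stop codon, then keep the longest hit); it is not how the NCBI tool is implemented. The function names and the toy sequence are made up for this example, and the sketch assumes the standard genetic code with ATG as the only start codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def read_fasta(text):
    """Return (title, sequence) from a single-record FASTA string.

    A first line starting with '>' is treated as the title, as in BLAST.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    title = ""
    if lines and lines[0].startswith(">"):
        title = lines[0][1:].strip()
        lines = lines[1:]
    # Upper-case and read RNA as DNA so one codon table covers both.
    return title, "".join(lines).upper().replace("U", "T")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(dna):
    """Yield (strand, offset, start, end, protein) for every ATG...stop run.

    Positions are 0-based on the strand being read; 'end' includes the stop
    codon, while the protein string does not.
    """
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            start = None
            for i in range(offset, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon in frame
                elif start is not None and codon in STOP_CODONS:
                    yield strand, offset, start, i + 3, translate(seq[start:i])
                    start = None                   # look for the next ORF


def longest_orf(dna):
    """Return the ORF with the longest protein, or None if there is none."""
    return max(find_orfs(dna), key=lambda orf: len(orf[4]), default=None)


if __name__ == "__main__":
    # Toy FASTA record, invented for illustration (not Morgan's sequence).
    toy = ">toy_sequence\nCCATGAAATTT\nGGGTAACC\n"
    title, dna = read_fasta(toy)
    print(title, longest_orf(dna))
```

The sketch keeps only the first ATG in each frame and ignores candidate ORFs that run off the end of the sequence without a stop codon. Real ORF finders offer more options (minimum length, alternative start codons, other genetic codes), so treat the output as a rough guide only.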
- Go to the NCBI Open Reading Frame Finder: http://www.ncbi.nlm.nih.gov/projects/gorf/
- Copy and paste Morgan’s sequence into the box and click on the ‘OrfFind’ button. This website will now search for start and stop codons to identify all open reading frames (i.e. potential translation sites) in Morgan’s sequence. Every time it finds a start codon followed by a stop codon in the same reading frame, it marks this area as an ORF (i.e. a possible translation site).
- Look at the results: the bars that you see are Morgan’s sequence, and the cyan colored areas are possible translation sites within that sequence. Each cyan bar is an ORF.
- Click on the longest open reading frame; this is almost certainly the actual translation site for Morgan’s sequence. Notice that after you click on it, the nucleotide sequence and the translated amino acid sequence for this portion of Morgan’s unknown gene appear below. Aha… we have now identified the amino acid sequence for Morgan’s unknown gene!

Q5: How long in amino acids is Morgan’s translated sequence? (hint: it says on your screen.)
A: 606 amino acids

If you look at the top of the page you will notice a pink bar where you can run a BLAST directly from the page you are currently on. In fact, it is asking us whether we would like to run a protein BLAST (blastp) on the amino acid sequence that we just identified. This is exactly what we want to do!
- Under the ‘Database’ pull down menu change the database to Swissprot. SwissProt/Uniprot is a large, comprehensive database of proteins that is curated by scientists. The Swissprot database currently includes more than 500,000 proteins with known function! (Uniprot has its own website, http://www.uniprot.org/, where you can also run a blast. We will not be using Uniprot here, but feel free to look at the Uniprot website on your own sometime.)
- Press the BLAST button and it will take you to a new page.
- On this new page press the ‘view report’ button to start your search… give it some time to finish… Aha! Now we get a very different result.
- First, look (don’t click) at the thin box on the top of the screen that is labeled: ‘Putative conserved domains have been detected, click on the image below for details’. Blastp has searched your query sequence and looked for portions of this sequence that are similar to known functional domains in other proteins. The grey bar represents the length of your query sequence and

