UCSD CSE 182 - Assignment 2

Biological Data Analysis (CSE 182): Assignment 2

Logistics

Submit a zip archive of your code, written answers, output and graphs by email.

Sequence Alignment and Gap Penalties

The first question is simply meant to help you understand the space-saving alignment algorithm. Hopefully, the implementation will clarify some of the questions you may have had.

Also, we have had quite a bit of discussion about the impact of different scoring functions on the alignment. In this assignment (Q2-4), you will look at the impact of scoring functions on the length of the optimal alignment. If you can explain the behaviour theoretically, you get extra credit, and I hope some of you will attempt Q4.

Finally, we go back to the biology. In Q5, you will find that running BLAST on two sequences of similar lengths has very different outcomes, both in running time and in what you see as the output. Why is that? This assignment requires a bit of biology and a bit of computer science, so you should feel free to collaborate with colleagues from 'the other side'. Also feel free to use the class mailing list for general queries.

0.1 Problems

1. (40 pts.) Implement a space-efficient program locAL to align two long DNA sequences with a linear gap penalty. The program should take two DNA sequences as input, along with user-defined values for match score, mismatch score, and indel. It should be invoked as follows:

   locAL <seqfile1> <seqfile2> -m <match> -s <mismatch> -d <indel>

   and should output the following:
   • Score of the best local alignment.
   • Length of the best local alignment.
   • The alignment itself.

   Apply the program to the two pairs of sequences (available on the course web-site) with the following parameters: match 1, mismatch -3, indel -2. Submit the code and the output.

2. (30 pts.) Write a program to generate random DNA sequences (Pr[A] = Pr[C] = Pr[G] = Pr[T] = 0.25) of a specified length. Generate 500 pairs of sequences, where each pair consists of two sequences of length 1000bp. Align them using locAL with two sets of parameters:
   • parameters P1: match 1, mismatch 0, indel 0
   • parameters P2: match 1, mismatch -30, indel -20

   Plot the lengths of the local alignments obtained with parameters P1 and P2. Are the lengths of the optimal local alignments different? If so, why? Try the same experiment with random pairs of different lengths. Define l_P(n) as the expected length of the optimal local alignment for a pair of random sequences of length n under parameter set P. Your computations should give you estimates of l_P1(n) and l_P2(n) for different values of n. Can you guess the form of l_P1(n) and l_P2(n) as functions of n?

   Your program should be invoked as follows:

   randomDNA.pl <number of seq> <size of seq>

   It should output the random sequences, one per line. After the sequences, the program should calculate and output a summary of the observed nucleotide frequencies in the set of sequences.

3. (20 pts.) Phase Transition of Local Alignments: Define l_P(n) as in Problem 2. Clearly, the parameter set P can change l_P(n).
   (a) Plot the values of l_P(n) for a variety of parameter settings ranging from mismatch = -30 to mismatch = 0. For example, you can choose mismatch = indel from {−30, −20, −10, −1, −0.5, −0.33, −0.25, 0}.
   (b) Is there an abrupt change in the value of l_P(n)? If so, can you give the parameters at which the change happens?

4. (Extra credit: 10 pts.) Can you give a theoretical justification of your answers in Problems 2 and 3?

5. (8 pts.) Go to the NCBI web-site and BLAST the two sequences (available on the course web-page), after switching OFF all filters. What is the number of hits for each sequence? Why is the number different for the two sequences?

6. (2 pts.) What language did you use? How much time did you take to do the assignment? Who did you discuss your homework
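
For orientation on Problems 1 and 2, here is a minimal Python sketch, not a reference solution. It assumes the standard local-alignment recurrence with a linear gap penalty and keeps only two rows of the dynamic-programming table, so memory grows with the length of one sequence rather than with the product of both. It computes the best score only; reporting the alignment and its length in limited space needs further bookkeeping (for example, a Hirschberg-style divide and conquer), which is not shown. The function names local_alignment_score and random_dna are hypothetical and are not part of the assignment.

```python
import random


def local_alignment_score(a, b, match=1, mismatch=-3, indel=-2):
    """Return the best local-alignment score of a and b (linear gap penalty).

    Only the previous and the current DP rows are stored.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0 (restart here).
            curr[j] = max(0, diag, prev[j] + indel, curr[j - 1] + indel)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


def random_dna(length, rng=random):
    """Return a random DNA string with Pr[A] = Pr[C] = Pr[G] = Pr[T] = 0.25."""
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    x, y = random_dna(1000), random_dna(1000)
    print("P1 score:", local_alignment_score(x, y, 1, 0, 0))
    print("P2 score:", local_alignment_score(x, y, 1, -30, -20))
```

Under P1 (match 1, mismatch 0, indel 0) the score returned by this sketch equals the length of a longest common subsequence of the two inputs, which can serve as a sanity check on small examples.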

