CMSC423 Project 1
Handed out: 10/2/2008
Due: 10/28/2008

For this project you will have to create a program that takes as input a sequence and searches for all good alignments of this sequence inside a database, using the Smith-Waterman dynamic programming algorithm described in class.

Inputs:
• one query sequence in FASTA format
• multiple sequence database in FASTA format
• similarity matrix (see format on syllabus page)
• minimum % identity
• gap opening (creation) and extension penalty (i.e. using affine gaps)

Outputs:
• precise local alignment for all hits over % identity

Sample inputs:
See syllabus page: http://www.cbcb.umd.edu/confcour/CMSC423-syllabus.shtml

Output format:
See syllabus page: http://www.cbcb.umd.edu/confcour/CMSC423-syllabus.shtml

>gi|90423415|ref|YP_531785.1| Gene info cytochrome c, class I [Rhodopseudomonas palustris BisB18]
Score = 184, Identities = 49/152 (32%), Gaps = 6/152 (3%)
Query 23 LPKTRTKALLTALTLAAAAAAAPALADVEFRHAL---DDSALDLSPIKGEEITDAVKSFR 79
          P         A       A   AL    FRH     D      S   G   T AV  F
Text   3 MPSFNRSIAISATLAVGLLAPVVALGQEVFRHTVTGEDLKIMETSQPSGRD-TEAVRNFL 61

Note:
The output must include, for each database sequence matching the query:
● the header of the database sequence
● aggregate information:
  ○ Smith-Waterman score, number of identities and percentage (w.r.t. the length of the aligned range within the query sequence), and number and percentage of gaps.
● the full alignment information including:
  ○ the identifiers for the aligned sequences
  ○ coordinates along these sequences
  ○ gaps within the sequences
Furthermore, identical amino acids are highlighted by repeating the identical letter in between the aligned sequences.

Input format:
Your program must accept all parameters from the command line, e.g.:
myprogram -m BLOSUM.matrix -d sequence.database -i sequence.input

Submission:
Use the submit program - this project should be submitted as assignment 3:
submit 2008 fall cmsc 423 0101 3 <submission_file>

Grading:
We will grade all aspects of the code, including how “pretty” it looks. Specifically, pay attention to the following aspects:
1. Please make sure that your code works as advertised in the README file you provided. If your code doesn’t work as indicated in the README file you will automatically lose 50% of the grade for this assignment.
2. Please provide copious comments and format your code so that it is easy to read. Part of your grade will be based on the formatting of the code.
3. Fastest programs get additional credit:
   a. 20 points for fastest
   b. 12 points for second fastest
   c. 5 points for third fastest
4. If your code does not implement affine gap penalties, the maximum score you will receive is 75 and your program will not be part of the speed competition.

Please contact me and Mohammad as soon as possible if you have any questions regarding this assignment, or if you “get stuck” and might not be able to complete the assignment on time. Once the assignment is due I will no longer accept any excuses.

Important: Please copy all email to both myself and Mohammad if you want a quick reply!

Good
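
The handout asks for Smith-Waterman local alignment with affine gaps but leaves the recurrence to the lecture notes. The block below is a minimal, score-only sketch of one common formulation (the three-state recurrence usually attributed to Gotoh), not the course's reference solution. Several things in it are assumptions rather than part of the assignment: the function names smith_waterman_affine and simple_score are hypothetical, penalties are taken as positive numbers that are subtracted, a gap of length k is assumed to cost gap_open + (k - 1) * gap_extend, and a simple match/mismatch function stands in for the similarity matrix read from the -m file. It does not perform the traceback needed to print the alignment, coordinates, or identity counts.

```python
NEG_INF = float("-inf")


def simple_score(a, b):
    """Hypothetical stand-in for a similarity-matrix lookup."""
    return 2 if a == b else -3


def smith_waterman_affine(query, target, score, gap_open, gap_extend):
    """Return the best local alignment score under affine gap penalties.

    Assumed convention: penalties are positive, and a gap of length k
    costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(query), len(target)
    # prev_h[j]: best score of an alignment ending at (i - 1, j)
    # prev_f[j]: same, but forced to end with a gap in the target
    prev_h = [0] * (m + 1)
    prev_f = [NEG_INF] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        cur_h = [0] * (m + 1)
        cur_f = [NEG_INF] * (m + 1)
        e = NEG_INF  # best score ending at (i, j) with a gap in the query
        for j in range(1, m + 1):
            # Open a new gap from the match state, or extend an existing one.
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + score(query[i - 1], target[j - 1])
            # The 0 lets a local alignment restart anywhere.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

Under these assumptions, smith_waterman_affine("AAATTT", "AAACCTTT", simple_score, 3, 1) aligns six matching residues around one gap of length two, giving 6 * 2 - (3 + 1) = 8. Keeping only two rows of each matrix saves memory, but a full solution has to retain enough state to trace the alignment back.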



UMD CMSC 423 - Project 1
