BYU BIO 465 - VelvetManual

Velvet Manual - version 0.7
Daniel Zerbino
August 29, 2008

1 For impatient people

> make
> ./velveth
> ./velvetg
> ./velveth sillyDirectory 21 -shortPaired data/test_reads.fa
> ./velvetg sillyDirectory
(Final graph has 16 nodes and n50 of 24184 max 44966)
> less sillyDirectory/stats.txt
> ./velvetg sillyDirectory -cov_cutoff 5 -read_trkg yes -amos_file yes
(Final graph has 1 nodes and n50 of 99975 max 99975)
> less sillyDirectory/velvet_asm.afg
> ./velvetg sillyDirectory -exp_cov 19 -ins_length 100
(Final graph has 12 nodes and n50 of 99975 max 99975)
> ./velveth sillyDirectory 21 -short data/test_reads.fa -long data/test_long.fa
> ./velvetg sillyDirectory -exp_cov 19
(Final graph has 2 nodes and n50 of 99893 max 99893)

2 Installation

2.1 Requirements

Velvet should function on any standard 64-bit Linux environment with gcc. A good amount of physical memory (12GB to start with; more is no luxury) is recommended.

It can in theory function on a 32-bit environment, but such systems have memory limitations which might ultimately be a constraint for assembly.

2.2 Compiling instructions

From a GNU environment, simply type:

> make

3 Running instructions

3.1 Running velveth

Velveth helps you construct the dataset for the following program, velvetg, and indicate to the system what each sequence file represents.

If you forget the syntax on the command line, you can print out a short help message:

> ./velveth

Velveth takes in a number of sequence files, produces a hashtable, then writes two files, Sequences and Roadmaps, which velvetg requires, to an output directory (creating the directory if necessary). The syntax is as follows:

> ./velveth output_directory hash_length [[-file_format][-read_type] filename]

The hash length, also known as k-mer length, corresponds to the length, in base pairs, of the words being hashed. See ??
for a detailed explanation of how to choose the hash length.

Supported file formats are:

  fasta (default)
  fastq
  fasta.gz
  fastq.gz
  eland
  gerald

Read categories are:

  short (default)
  shortPaired
  short2 (same as short, but for a separate insert-size library)
  shortPaired2 (see above)
  long (for Sanger, 454 or even reference sequences)
  longPaired

For concision, options are stable. In other words, they remain in effect until contradicted by another operator. This allows you to write as many filenames as you wish without having to re-type identical descriptors. For example:

> ./velveth output_directory/ 21 -fasta -short solexa1.fa solexa2.fa solexa3.fa -long capillary.fa

In this example, all the files are considered to be in FASTA format; only the read category changes. However, the default options are “fasta” and “short”, so the previous example can also be written as:

> ./velveth output_directory/ 21 solexa*.fa -long capillary.fa

3.2 Running velvetg

Velvetg is the core of Velvet, where the de Bruijn graph is built then manipulated. Note that although velvetg saves some files during the process to avoid useless recalculations, the parameters are not saved from one run to the next. Therefore, running:

> ./velvetg output_directory/ -cov_cutoff 4
> ./velvetg output_directory/ -min_contig_lgth 100
...
is different from:

> ./velvetg output_directory/ -cov_cutoff 4 -min_contig_lgth 100

In the first case, the second command does not remember the -cov_cutoff value passed to the first. The upside is that you can freely play around with parameters without redoing most of the calculations:

> ./velvetg output_directory/ -cov_cutoff 4
> ./velvetg output_directory/ -cov_cutoff 3.8
> ./velvetg output_directory/ -cov_cutoff 7
> ./velvetg output_directory/ -cov_cutoff 10
> ./velvetg output_directory/ -cov_cutoff 2

On the other hand, within a single velvetg command, the order of parameters is not important.

Finally, if you have any doubt at the command line, you can obtain a short help message by typing:

> ./velvetg

3.2.1 Single reads

Initially, you simply run:

> ./velvetg output_directory/

This will produce a FASTA file of contigs and output some statistics. Experience shows that there are many short, low-coverage nodes left over from the initial correction. Choose a coverage cutoff value as you see fit (cf. ??), say 5.2x, then type:

> ./velvetg output_directory/ -cov_cutoff 5.2

On the other hand, if you want to exclude highly covered data from your assembly (e.g. plasmid, mitochondrial, and chloroplast sequences), you can use a maximum coverage cutoff:

> ./velvetg output_directory/ -max_coverage 300 (... other parameters ...)

3.2.2 Adding long reads

Reminder: you must have flagged your long reads as such when running velveth (cf. ??).

If you have sufficient coverage of short reads, and any quantity of long reads (obviously, the higher the coverage and the longer the reads, the better), you can use the long reads to resolve repeats in a greedy fashion.

To do this, Velvet needs a reasonable estimate of the expected coverage in short reads of unique sequence (see ?? for a definition of k-mer coverage). The simplest way to obtain this value is to observe the distribution of contig coverages (as described in ??) and see around which value the coverages of nodes seem to cluster (especially the longer nodes in your dataset).
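
The manual leaves the choice of the expected coverage to you. As a rough aid, the Python sketch here shows one possible way to turn the "see where the longer nodes cluster" advice into a number: a length-weighted median of node coverages, so that long nodes dominate while the many short, low-coverage leftovers (and collapsed repeats at roughly twice the coverage) have little pull. The function name weighted_median_coverage, its min_length filter and the (length, coverage) input pairs are hypothetical illustrations, not part of Velvet; the node values are toy numbers, not output from a real assembly, and you would still need to extract such pairs from stats.txt yourself.

```python
from typing import Iterable, Tuple


def weighted_median_coverage(nodes: Iterable[Tuple[int, float]],
                             min_length: int = 0) -> float:
    """Length-weighted median of node coverages (hypothetical helper).

    `nodes` is an iterable of (length, coverage) pairs, e.g. values you
    have extracted yourself from velvetg's stats.txt. Each node counts
    with a weight equal to its length, so long nodes dominate.
    Nodes shorter than `min_length` are ignored.
    """
    # Sort by coverage, keeping each node's length as its weight.
    kept = sorted((cov, length) for length, cov in nodes if length >= min_length)
    if not kept:
        raise ValueError("no nodes left after length filtering")

    total = sum(length for _, length in kept)
    running = 0
    for cov, length in kept:
        running += length
        # First coverage at which at least half of the total length is reached.
        if 2 * running >= total:
            return cov
    return kept[-1][0]  # not reached: the last node always satisfies the test


# Toy (length, coverage) pairs for illustration only, not real Velvet output.
nodes = [
    (12000, 19.4),
    (8000, 18.7),
    (300, 3.1),    # short, low-coverage leftover
    (150, 2.0),    # short, low-coverage leftover
    (5000, 38.5),  # repeat collapsed into one node: about twice the coverage
    (9500, 19.9),
]

print(weighted_median_coverage(nodes, min_length=1000))  # 19.4
```

With these toy values the estimate is 19.4, which you would then pass to velvetg through -exp_cov. The sketch uses a weighted median rather than a mean because a mean would be dragged upward by repeat nodes; this is a design choice of the sketch, not something the manual prescribes.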

If the expected coverage is 19x, you indicate it with the -exp_cov marker:

> ./velvetg output_directory/ -exp_cov 19 (... other parameters ...)

3.2.3 Paired-end reads

Reminder: you must have flagged your reads as being paired-end when running velveth (cf. ??).

To activate the use of read pairs, you must specify two parameters: the expected (i.e. average) insert length (or at least a rough estimate), and the expected short-read k-mer coverage (see ?? for more information). If you expect your insert length to be around 400bp and your coverage to be around 21.3x, you would type:

> ./velvetg output_directory/ -ins_length 400 -exp_cov 21.3 (... other parameters ...)

If you happen to have hashed paired long reads and you ordered them as explained in ??, you can also tell Velvet to use this information for scaffolding by indicating the corresponding insert length (remember that you still need to indicate the short-read k-mer coverage):

> ./velvetg output_directory/ -exp_cov 21 -ins_length_long 40000 (... other parameters ...)

Standard deviations. This is a more subtle point, which you can ignore if you have only one dataset of paired-end reads or if the standard deviation (SD) of the insert lengths is roughly proportional to the expected length (e.g. if the insert lengths are described as length ± p%).

Velvet does not use the absolute values of the insert-length SDs, but their relative values. Therefore, you do not need to spend too much time on the estimation of the SDs, as

