Bloomberg School BIO 751 - data from Illumina


Analysing data from Illumina BeadArrays
Matt Ritchie
Department of Oncology
University of Cambridge, UK
24th September 2008

The bead
• Each silica bead is 3 microns in diameter
• 700,000 copies of the same probe sequence are covalently attached to each bead for hybridisation and decoding
• There can be more than one bead for a particular gene

Beads in wells
• Bead pools are produced containing 384 to 24,000 bead types
• Wells are created in either a fibre-optic bundle (hexagon) or a chip (rectangle)
• Beads self-assemble into the wells to form a randomly arranged array of beads
• Average of 30 beads of each type (adds robustness)
• Each array is produced separately

[Figure: bead preparation and array production]

Decoding process
• Beads are assigned random locations on each array, so the type of every bead has to be determined
• Decoding is achieved by a series of sequential hybridisations*
• Each bead type is defined by a unique DNA sequence that is recognised by a complementary decoder
• The process is highly effective (error rate < 1 x 10^-4 per bead)
[Figure: example decoding sequence 1 1 0 1 2 2 0 2]
* Gunderson et al. Decoding randomly ordered DNA arrays. Genome Research, May 2004

SAM - Sentrix Array Matrix
• 96 arrays processed in parallel
• Each array contains ~1,500 bead types

RefSeq BeadChip
• 8 arrays per chip: 1 strip = 1 array
• 24,000 bead types from the RefSeq database x 30 replicates on each array

Whole Genome BeadChip
• 6 arrays per chip: 2 strips = 1 array
• 48,000 bead types on each array (24,000 RefSeq + 24,000 supplementary)

Advantages
• Random placement of beads
• High number of within-array replicates (robustness)
• High-throughput studies possible (e.g. HapMap)
• Expression, genotyping and methylation variants possible
• Lower cost

Data analysis
[Figure: analysis workflow: array → Scanning (BeadScan) → TIFFs → Image analysis (BeadScan) → .idat, .locs and .txt files → Summarisation (BeadScan) → summary .txt → Quality assessment, Normalisation and Downstream analysis (BeadStudio)]

Raw data
Illumina's scanning software (BeadScan) produces files (.idat, .locs, etc.) in a proprietary format, which are read by Illumina's BeadStudio software. However, with modifications to BeadScan, you can obtain standard (readable) files for each array on a SAM or each strip on a BeadChip:
- Text file giving the identity and location of each individual bead (~50,000 rows for each array on a SAM, ~1.1 million for each strip/array on a BeadChip) [required]
- TIFF images [optional]
We refer to the text and TIFF files as the bead-level data for an array.

Bead-level text files
Example columns: ProbeID | Background-corrected intensity | Bead centre
• Information for all* beads on an array
* Sometimes outliers or non-decoded beads are removed

TIFF images
• Bead centres can be used to re-calculate intensities

Image analysis
• Foreground signal = average over 9 pixels for each bead
• Background calculated using a 17 x 17 window around each bead
• Local background = average of the 5 minimum pixels within each window
• Background correction carried out by BeadScan: Foreground - Background
• For two-colour spotted arrays, background correction in this way can be disastrous!

Spike-in data set
• Mouse 6 version 1 BeadChips
• Arrays include 33 spike probes
• Each array hybridised with the same RNA sample + spike
[Figure: spike concentrations 1000, 300, 100, 30, 10, 3 pM (x 4) and 1, 0.3, 0.1, 0.03, 0.01, 0 pM (x 4)]

Concentration series
[Figure: raw data, foreground and background log2(intensity); comparisons 3pM vs 1pM and 1pM vs 0.3pM]

Quality assessment - image plot
• Scanning problem: values down one edge censored at 0
• 4-7% of beads affected; these were successfully detected as outliers
• Question: how many outliers can Illumina's summarisation method handle before we need to think about removing an array?

Simulation of corrupted data
[Figure: bias and variance against % outliers simulated]

Background normalisation
[Figure: 3pM vs 1pM, log-odds]

Use of variability in DE analysis
• Summarise data from replicate beads on the log2 scale (mean and variance)
• Use the inverse of the bead variances as weights in differential expression (DE) analysis
• Assess DE with and without weights (without weights, all observations are treated equally)

Recommendations
• Illumina's local background correction and summarisation methods perform well
• We do not currently recommend Illumina's background normalisation (now referred to as the 'subtract background' option in BeadStudio; not to be confused with the local background subtraction referred to above, which is not optional)
• We do not recommend exporting 'gene' summary data from BeadStudio, as the results are averaged over (sometimes) mis-annotated probes. We prefer 'probe' summary output.
• Access to bead-level data enables
  - more detailed quality assessment
  - calculation of means and variances on the appropriate scale
• The beadarray package from Bioconductor can process bead-level or summary-level data in R. We have used limma for DE analysis.

Current and future work
• Improved annotation (for the next Bioconductor release)
• Automatic artefact detection and removal tool (BASH)
• Exploring data from two-colour technologies
• Applying crlmm to Illumina Infinium II genotyping data

Acknowledgements
Tavaré Lab: Mark Dunning, Andy Lynch, Nuno Barbosa-Morais, Jonathan Cairns, Mike Smith, Simon Tavaré
Genomics Core: James Hadfield, Michelle Osborne
Illumina (San Diego): Semyon Kruglyak, Gary Nunn
UCSD (San Diego): Roman Sasik
WEHI (Melbourne): Wei Shi, Gordon Smyth

References
1. KL Gunderson, S Kruglyak, et al. Decoding randomly ordered DNA arrays. Genome Res, 14(5):870-877, 2004
2. K Kuhn, SC Baker, et al. A novel, high-performance random array platform for quantitative gene expression profiling. Genome Res, 14(11):2347-2356, 2004
3. M Barnes, J Freudenberg, et al. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res, 33(18):5914-5923, 2005
4. MJ Dunning, ML Smith, et al. beadarray: R classes and methods for Illumina bead-based arrays. Bioinformatics, 23(16):2183-2184, 2007
5. MJ Dunning, NL Barbosa-Morais, et al. Statistical issues in the analysis of Illumina data. BMC Bioinformatics, 9:85, 2008
6. MJ Dunning, ME Ritchie, et al. Spike-in validation of an Illumina-specific variance-stabilizing transformation. BMC Research Notes, 1:18, 2008
7. Illumina probe reannotation: http://www.compbio.group.cam.ac.uk/Resources/Annotation/index.html

Accessing bead-level data
Bead-level text and TIFF files can usually be obtained by modifying the following lines in the settings.xml file used by BeadScan (the entries below are typically set to their opposites, i.e. true -> false, and false ->
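The local background correction described on the image analysis slide (foreground from 9 pixels, background from the 5 dimmest pixels in a 17 x 17 window, then Foreground - Background) can be sketched in Python as below. This is a minimal illustration, not BeadScan's implementation. It assumes the following:
- the image is a list of rows of pixel values;
- bead centres have already been rounded to integer pixel coordinates;
- the 9 foreground pixels form the 3 x 3 block around the centre;
- the window is clipped at the image border.

All function names are hypothetical.

```python
def foreground(image, row, col):
    """Mean of the 9 pixels in the 3 x 3 block centred on (row, col).

    Assumption: the bead centre has been rounded to an integer pixel.
    """
    if not (1 <= row < len(image) - 1 and 1 <= col < len(image[0]) - 1):
        raise ValueError("bead centre too close to the image edge")
    pixels = [image[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    return sum(pixels) / len(pixels)


def local_background(image, row, col, window=17, n_min=5):
    """Mean of the n_min dimmest pixels in a window x window box around the bead.

    The box is clipped at the image border (an assumption of this sketch).
    """
    half = window // 2
    pixels = [image[r][c]
              for r in range(max(0, row - half), min(len(image), row + half + 1))
              for c in range(max(0, col - half), min(len(image[0]), col + half + 1))]
    dimmest = sorted(pixels)[:n_min]
    return sum(dimmest) / len(dimmest)


def corrected_intensity(image, row, col):
    """Foreground minus local background for one bead."""
    return foreground(image, row, col) - local_background(image, row, col)


# Usage: a flat background of 10 with one bright 3 x 3 bead at (10, 10)
image = [[10.0] * 20 for _ in range(20)]
for r in range(9, 12):
    for c in range(9, 12):
        image[r][c] = 110.0
print(corrected_intensity(image, 10, 10))  # 100.0
```

Because the background is the mean of the dimmest pixels, it is robust to neighbouring bright beads inside the window. For dim beads the difference can still be negative, so any later log transform has to deal with non-positive values.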

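Summarising replicate beads on the log2 scale (mean and variance), as recommended on the "Use of variability in DE analysis" slide, might look like the sketch below. It makes these assumptions:
- the bead-level text file has already been parsed into (ProbeID, background-corrected intensity) pairs;
- non-positive intensities are dropped before taking logs;
- beads more than 3 scaled median absolute deviations (MADs) from the probe median are removed as outliers.

The 3-MAD rule is how Illumina's default outlier removal is commonly described. Here, however, it is applied on the log2 scale, which may differ from what BeadStudio does. All names are hypothetical.

```python
import math
import statistics
from collections import defaultdict

# Consistency constant used by R's mad(); assumed here so that the MAD
# estimates a standard deviation for normally distributed data.
MAD_SCALE = 1.4826


def remove_outliers(values, n_mads=3.0):
    """Keep values within n_mads scaled MADs of the median."""
    med = statistics.median(values)
    mad = MAD_SCALE * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; keep everything rather
        # than discarding every bead that differs from the median.
        return list(values)
    return [v for v in values if abs(v - med) <= n_mads * mad]


def summarise_beads(records):
    """Summarise bead-level records for one array.

    records: iterable of (probe_id, intensity) pairs, assumed to come from a
    parsed bead-level text file. Returns {probe_id: (mean, variance, n)} on
    the log2 scale, after outlier removal.
    """
    by_probe = defaultdict(list)
    for probe_id, intensity in records:
        if intensity > 0:  # log2 is undefined otherwise; such beads are dropped here
            by_probe[probe_id].append(math.log2(intensity))
    summary = {}
    for probe_id, logs in by_probe.items():
        kept = remove_outliers(logs)
        # Sample variance needs at least two beads.
        var = statistics.variance(kept) if len(kept) > 1 else float("nan")
        summary[probe_id] = (statistics.mean(kept), var, len(kept))
    return summary


records = [("A", 256), ("A", 256), ("A", 512), ("A", 256), ("A", 512), ("A", 65536)]
print(summarise_beads(records)["A"])
```

In this example the bead at 65536 (log2 = 16) lies far from the other five beads and is removed. That leaves five beads with a log2 mean of 8.4 and a variance of 0.3. Keeping the per-probe variance alongside the mean is what makes the weighted DE analysis on the same slide possible.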

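The weighting idea on the "Use of variability in DE analysis" slide can be illustrated with a per-probe comparison between two groups of arrays. This is a deliberately simplified sketch, not limma's linear-model fit (limma is what the slides say was used for the actual DE analysis).
- Each array contributes its log2 bead mean, weighted by the inverse of its bead variance.
- Setting use_weights=False treats all observations equally.

Function names and the input layout are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def log_fold_change(group1, group2, use_weights=True):
    """Difference in (weighted) mean log2 expression for one probe.

    Each group is a list of (log2_mean, bead_variance) pairs, one per array,
    e.g. from a per-array bead summary. With use_weights=True each array is
    weighted by 1 / bead_variance; otherwise all arrays count equally.
    """
    def group_mean(group):
        means = [m for m, _ in group]
        weights = [1.0 / v if use_weights else 1.0 for _, v in group]
        return weighted_mean(means, weights)

    return group_mean(group1) - group_mean(group2)


# Usage: the third array in group1 is much noisier than the others
group1 = [(10.0, 0.1), (10.0, 0.1), (14.0, 1.0)]
group2 = [(8.0, 0.2), (8.0, 0.2)]
print(log_fold_change(group1, group2))                     # weighted
print(log_fold_change(group1, group2, use_weights=False))  # unweighted
```

The weighted estimate is 46/21 (about 2.19), which stays close to the two precisely measured arrays. The unweighted estimate is 10/3 (about 3.33), pulled up by the noisy third array.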