Applications of Affymetrix SNP chips

Rafael A. Irizarry
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health

Acknowledgements
• Benilton Carvalho, JHU Biostat
• Wenyi Wang, UC Berkeley
• Terry Speed, UC Berkeley
• Shin Lin, UPenn
• Simon Cawley, Affymetrix
• Aravinda Chakravarti, JHU IGM
• Dan Arking, JHU IGM
• Dave Cutler, JHU IGM
• Seth Falcon, Robert Gentleman and Bioconductor Team

Genotyping

What are SNPs?
Genomic DNA: TAGCCATCGGTANGTACTCAATGAT
SNP (position N): A or G
A person can be AA, AG or GG

Affymetrix SNP chip terminology
Genomic DNA: TACATAGCCATCGGTANGTACTCAATGATGATA
SNP (position N): A or G
PM probe for Allele A: ATCGGTAGCCATTCATGAGTTACTA
PM probe for Allele B: ATCGGTAGCCATCCATGAGTTACTA
Genotyping: answering the question about the two copies of the chromosome on which the SNP is located: is a person AA, AG or GG at this Single Nucleotide Polymorphism?

Probe effect

Affymetrix SNP chip terminology
Same genomic DNA and SNP, with the probe pair shifted by four bases:
PM probe for Allele A: GTAGCCATTCATGAGTTACTACTCT
PM probe for Allele B: GTAGCCATCCATGAGTTACTACTCT

Probe Intensities
Sample 1: Genotype = AA
Sample 2: Genotype = AB
Sample 3: Genotype = BB
Fake (idealized) image for 3 samples on one SNP
Fake, as the probes are not all adjacent on the chip.
Idealized, as all the probes are high or low as they should be.

Notation
• Once we are done with the first part of preprocessing we have the following: θA and θB, proportional to the log of the amount of fragments from allele A and B respectively
In principle these can only be (log of) 0, x, or 2x, but we know better than to believe this. In fact we know not to expect the same cut-off to work for all SNPs.

It's not easy
This picture shows that most of the information is in the left-right diagonal direction, i.e. in the log-ratios.

CRLMM
Carvalho et al. (2007) Biostatistics

Further difficulties

Accuracy versus Drop Rate

Examples of why CRLMM better

Big Shifts
[Figure panels: BRLMM, CRLMM]

"Room for improvement" Probes
[Figure panels: BRLMM, CRLMM]

Different hybes, different quality

Bad Hybes

Copy Number

Copy Number
[Figure: Chr 21]
Now we want absolutes: probe effect a problem!

Copy Number
[Figure: Chr 21]

Thanks!

Supplemental Slides

Lab Effect

Why is this?
• Our guess is that the PCR step introduces a lot of SNP-to-SNP variation
• We have proxies for measuring PCR effect: fragment sequence and fragment length
• We can examine the fragment sequence via the probe sequence

Log-ratio biases persist

Different hybes, different quality

Length effect on M

Intensity effect on M

Normalization
• We normalize/summarize using RMA (no BG correction) after correcting for sequence and length effects on the log intensities
• We then examine log-ratios
• We keep sense and antisense separate

Use mixture model to fix this
• SNP denoted with i
• Z is the true genotype, so k = AA, AB or BB
• X are covariates that cause bias
• We later use SNR = Median(f1)^2 / Var(ε) as measure of quality

Preprocessing model motivates genotype algorithm
• Array denoted with j
• Shift in cluster center denoted with m
• Assume m are bivariate normal with covariance V and the variance of the measurement error is inverse chi-squared
• Use training data to estimate
• Use empirical Bayes approach for cases with few data points

Single primer assay: overview
• 250 ng Genomic DNA
• RE digestion (Xba)
• Adapter ligation
• Single Primer Amplification
• Fragmentation and Labeling
• Hyb & Scan on Standard Hardware

Single primer assay: