Classical Test Theory and Reliability
Cal State Northridge
Psy 427
Andrew Ainsworth, PhD

Basics of Classical Test Theory
• Theory and Assumptions
• Types of Reliability
• Example

Classical Test Theory
• Classical Test Theory (CTT) is often called the “true score model”
• Called “classical” relative to Item Response Theory (IRT), which is a more modern approach
• CTT describes a set of psychometric procedures used to test item and scale reliability, difficulty, discrimination, etc.

Classical Test Theory
• CTT analyses are the easiest and most widely used form of analysis. The statistics can be computed with readily available statistical packages (or even by hand)
• CTT analyses are performed on the test as a whole rather than on the item; although item statistics can be generated, they apply only to that group of students on that collection of items

Classical Test Theory
• Assumes that every person has a true score on an item or a scale, if only we could measure it directly without error
• CTT analyses assume that a person’s test score is composed of their “true” score plus some measurement error.
• This is the common true score model:

$$X = T + E$$

Classical Test Theory
• Based on the expected values of each component for each person, we can see that

$$\varepsilon(X_i) = t_i$$
$$E_i = X_i - t_i$$
$$\varepsilon(X_i - t_i) = \varepsilon(X_i) - \varepsilon(t_i) = t_i - t_i = 0$$

• Here $\varepsilon(\cdot)$ denotes expected value; $E_i$ and $X_i$ are random variables, $t_i$ is constant
• However, this is theoretical and not done at the individual level.

Classical Test Theory
• If we assume that people are randomly selected, then t becomes a random variable as well and we get:

$$X = T + E$$

• Therefore, in CTT we assume that the error:
  ○ Is normally distributed
  ○ Is uncorrelated with the true score
  ○ Has a mean of zero

[Figure: T and X = T + E, labeled “Without $\sigma_{meas}$” and “With $\sigma_{meas}$”]

True Scores
• Measurement error around a T can be large or small

[Figure: true scores T1, T2, T3]

Domain Sampling Theory
• Another central component of CTT
• Another way of thinking about populations and samples
• Domain: the population or universe of all possible items measuring a single concept or trait (theoretically infinite)
• Test: a sample of items from that universe

Domain Sampling Theory
• A person’s true score would be obtained by having them respond to all items in the “universe” of items
• We only see responses to the sample of items on the test
• So, reliability is the proportion of variance in the “universe” explained by the test variance

Domain Sampling Theory
• A universe is made up of a (possibly infinitely) large number of items
• So, as tests get longer they represent the domain better; therefore longer tests should have higher reliability
• Also, if we take multiple random samples from the population, we can have a distribution of sample scores that represent the population

Domain Sampling Theory
• Each random sample from the universe would be “randomly parallel” to each other
• Unbiased estimate of reliability = squared correlation between the test and the true score = average correlation between the test and all other randomly parallel tests

$$r_{1t}^2 = \bar{r}_{1j}$$

• $r_{1t}$ is the correlation between test 1 and the true score; $\bar{r}_{1j}$ is the average correlation between test 1 and the other randomly parallel tests $j$

Classical Test Theory Reliability
• Reliability is theoretically the squared correlation between a test score and the true score
• Essentially the proportion of the variance of X that is due to T
• This can’t be measured
directly, so we use other methods to estimate it:

$$\rho_{XT}^2 = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$$

CTT: Reliability Index
• Reliability can be viewed as a measure of consistency, or how well a test “holds together”
• Reliability is measured on a scale of 0 to 1. The greater the number, the higher the reliability.

CTT: Reliability Index
• The approach to estimating reliability depends on
  ○ Estimation of the “true” score
  ○ Source of measurement error
• Types of reliability
  ○ Test-retest
  ○ Parallel Forms
  ○ Split-half
  ○ Internal Consistency

CTT: Test-Retest Reliability
• Evaluates the error associated with administering a test at two different times.
• Time Sampling Error
• How-To:
  ○ Give test at Time 1
  ○ Give SAME TEST at Time 2
  ○ Calculate r for the two scores
• Easy to do; one test does it all.

CTT: Test-Retest Reliability
• Assume 2 administrations, $X_1$ and $X_2$, with

$$\varepsilon(X_{1i}) = \varepsilon(X_{2i})$$
$$\sigma_{E_{1i}}^2 = \sigma_{E_{2i}}^2$$

• The correlation between the 2 administrations is the reliability

$$\therefore \rho_{X_1 X_2} = \frac{\sigma_{X_1 X_2}}{\sigma_{X_1} \sigma_{X_2}} = \frac{\sigma_T^2}{\sigma_X^2} = \rho_{XT}^2$$

CTT: Test-Retest Reliability
• Sources of error
  ○ Random fluctuations in performance
  ○ Uncontrolled testing conditions
    ○ extreme changes in weather
    ○ sudden noises / chronic noise
    ○ other distractions
  ○ Internal factors
    ○ illness, fatigue, emotional strain, worry
    ○ recent experiences

CTT: Test-Retest Reliability
• Generally used to evaluate constant traits.
  ○ Intelligence, personality
• Not appropriate for qualities that change rapidly over time.
  ○ Mood, hunger
• Problem: Carryover Effects
  ○ Exposure to the test at time #1 influences scores on the test at time #2
  ○ Only a problem when the effects are random.
  ○ If everybody goes up 5 points, you still have the same variability

CTT: Test-Retest Reliability
• Practice effects
  ○ A type of carryover effect
  ○ Some skills improve with practice
    ○ Manual dexterity, ingenuity or creativity
  ○ Practice effects may not benefit everybody in the same way.
• Carryover and practice effects are more of a problem with short inter-test intervals (ITI).
• But longer ITIs have other problems
  ○ developmental change, maturation, exposure to historical events

CTT: Parallel Forms Reliability
• Evaluates the error
associated with selecting a particular set of items.
• Item Sampling Error
• How-To:
  ○ Develop a large pool of items (i.e., the domain) of varying difficulty.
  ○ Choose equal distributions of difficult / easy items to produce multiple forms of the same test.
  ○ Give both forms close in time.
  ○ Calculate r for the two administrations.

CTT: Parallel Forms Reliability
• Also known as: Alternative Forms or Equivalent Forms
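
As a rough illustration of the true score model and of the “calculate r for the two administrations” step, the sketch below simulates X = T + E for two administrations and compares three quantities that CTT says should agree: the variance ratio $\sigma_T^2 / \sigma_X^2$, the squared correlation between X and T, and the correlation between the two administrations. It is not part of the original slides. The function names, the sample size, the true-score mean of 50, and the standard deviations (10 for T, 5 for E) are all assumptions chosen for illustration, and the errors are drawn to satisfy the CTT assumptions exactly (normal, mean zero, uncorrelated with T, equal variance on both occasions).

```python
import numpy as np


def simulate_two_administrations(n_people=100_000, sd_true=10.0, sd_error=5.0, seed=0):
    """Simulate X = T + E for two administrations (hypothetical values)."""
    rng = np.random.default_rng(seed)
    # T: true scores, random across people (assumed mean 50)
    true_scores = rng.normal(50.0, sd_true, size=n_people)
    # E: normal, mean zero, drawn independently of T and of each other
    error_1 = rng.normal(0.0, sd_error, size=n_people)
    error_2 = rng.normal(0.0, sd_error, size=n_people)
    return true_scores, true_scores + error_1, true_scores + error_2


def reliability_estimates(true_scores, x1, x2):
    """Return three estimates of reliability that CTT says should agree."""
    variance_ratio = np.var(true_scores, ddof=1) / np.var(x1, ddof=1)
    squared_corr_xt = np.corrcoef(x1, true_scores)[0, 1] ** 2
    corr_x1_x2 = np.corrcoef(x1, x2)[0, 1]
    return {
        "var(T) / var(X)": variance_ratio,
        "corr(X, T) squared": squared_corr_xt,
        "corr(X1, X2)": corr_x1_x2,
    }


# Reliability implied by the assumed standard deviations: 100 / (100 + 25)
theoretical = 10.0**2 / (10.0**2 + 5.0**2)

t, x1, x2 = simulate_two_administrations()
estimates = reliability_estimates(t, x1, x2)

print(f"theoretical reliability: {theoretical:.3f}")
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

With a large simulated sample, all three estimates should land close to the theoretical value. In real data only the last one is available, because T is never observed; that is why the test-retest and parallel forms procedures rely on the correlation between two sets of observed scores.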