Sampling

Why Sample?
• Why not study everyone?
• Debate about Census vs. sampling

Problems in Sampling?
• What problems do you know about?
• What issues are you aware of?
• What questions do you have?

Key Sampling Concepts
Copyright ©2002, William M.K. Trochim, All Rights Reserved

Sampling Process
[Diagram. Boxes: Target Population (population of interest); Sampling Frame (list or rule defining the population); Actual Population to Which Generalizations Are Made (defined/listed by sampling frame); Target Sample (list of target sample; units of analysis, people); Sample (the people actually studied). Link labels: List or Procedure; Method of selection; Response Rate; Generalization.]

Key Ideas
• Distinction between the population of interest and the actual population defined by the sampling frame
• Generalizations can be made only to the actual population
• Understand crucial role of the sampling frame

Sampling Frame
• The list or procedure defining the POPULATION (from which the sample will be drawn).
• Distinguish sampling frame from sample.
• Examples:
– Telephone book
– Voter list
– Random digit dialing
• Essential for probability sampling, but can be defined for nonprobability sampling

Types of Samples
• Probability
– Simple Random
– Systematic Random
– Stratified Random
– Random Cluster
– Stratified Cluster
– Complex Multi-stage Random (various kinds)
• Non-Probability
– Convenience
– Purposive
– Quota

Probability Samples
• A probability sample is one in which each element of the population has a known non-zero probability of selection.
• Not a probability sample if some elements of the population cannot be selected (have zero probability).
• Not a probability sample if probabilities of selection are not known.

Probability Sampling
• Cannot guarantee “representativeness” on all traits of interest
• A sampling plan with known statistical properties
• Permits statements like: “The probability is .99 that the true population correlation falls between .46 and .56.”

Sampling Frame is Crucial in Probability Sampling
• If the sampling frame is a poor fit to the population of interest, random sampling from that frame cannot fix the problem
• The sampling frame is non-randomly chosen. Elements not in the sampling frame have zero probability of selection.
• Generalizations can be made ONLY to the actual population defined by the sampling frame

Types of Probability Samples
• Simple Random
• Systematic Random
• Stratified Random
• Random Cluster
• Stratified Cluster
• Complex Multi-stage Random (various kinds)

Simple Random Sampling
• Each element in the population has an equal probability of selection AND each combination of elements has an equal probability of selection
• Names drawn out of a hat
• Random numbers to select elements from an ordered list

Stratified Random Sampling-1
• Divide population into groups that differ in important ways
• Basis for grouping must be known before sampling
• Select random sample from within each group

Stratified Random Sampling-2
• For a given sample size, reduces error compared to simple random sampling IF the groups are different from each other
• Tradeoff between the cost of doing the stratification and smaller sample size needed for same error
• Probabilities of selection may be different for different groups, as long as they are known
• Oversampling small groups improves inter-group comparisons

Systematic Random Sampling-1
• Each element has an equal probability of selection, but combinations of elements have different probabilities.
• Population size N, desired sample size n, sampling interval k = N/n.
• Randomly select a number j between 1 and k, sample element j and then every kth element thereafter: j+k, j+2k, etc.
• Example: N=64, n=8, k=64/8=8. Random j=3.

Systematic Random Sampling-2
• Has same error rate as simple random sample if the list is in random or haphazard order
• Provides the benefits of implicit stratification if the list is grouped

Systematic Random Sampling-3
• Runs the risk of error if periodicity in the list matches the sampling interval
• This is rare.
• In this example, every 4th element is red, and red never gets sampled.
If j had been 4 or 8, ONLY reds would be sampled.

Random Cluster Sampling - 1
• Done correctly, this is a form of random sampling
• Population is divided into groups, usually geographic or organizational
• Some of the groups are randomly chosen
• In pure cluster sampling, whole cluster is sampled.
• In simple multistage cluster, there is random sampling within each randomly chosen cluster

Random Cluster Sampling - 2
• Population is divided into groups
• Some of the groups are randomly selected
• For given sample size, a cluster sample has more error than a simple random sample
• Cost savings of clustering may permit larger sample
• Error is smaller if the clusters are similar to each other

Random Cluster Sampling - 3
• Cluster sampling has very high error if the clusters are different from each other
• Cluster sampling is NOT desirable if the clusters are different
• It IS random sampling: you randomly choose the clusters
• But you will tend to omit some kinds of subjects

Stratified Cluster Sampling
• Reduce the error in cluster sampling by creating strata of clusters
• Sample one cluster from each stratum
• The cost-savings of clustering with the error reduction of stratification
[Figure: Strata]

Stratification vs. Clustering
Stratification
• Divide population into groups different from each other: sexes, races, ages
• Sample randomly from each group
• Less error compared to simple random
• More expensive to obtain stratification information before sampling
Clustering
• Divide population into comparable groups: schools, cities
• Randomly sample some of the groups
• More error compared to simple random
• Reduces costs to sample only some areas or organizations

Stratified Cluster Sampling
• Combines elements of stratification and clustering
• First you define the clusters
• Then you group the clusters into strata of clusters, putting similar clusters together in a stratum
• Then you randomly pick one (or more) cluster from each of the strata of clusters
• Then you sample the subjects within the sampled clusters (either all the subjects, or a simple random sample of them)

Multi-stage Probability Samples –1
• Large national probability samples involve several stages of stratified cluster sampling
• The whole country is divided into geographic clusters, metropolitan and rural
• Some large metropolitan areas are selected with certainty (certainty is a non-zero
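
Sketch: selection procedures in code
The selection rules from the earlier slides (simple random, systematic, stratified, and pure cluster sampling) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original handout: the function names are hypothetical, the sampling frame is assumed to be an in-memory list, n is assumed to be no larger than N, and k is taken as N // n. The demo reuses the systematic sampling example (N=64, n=8, k=8, j=3) and the case where every 4th element is red.

```python
import random


def simple_random_sample(population, n, rng=None):
    """Draw n elements so that every combination of n is equally likely."""
    rng = rng if rng is not None else random.Random()
    return rng.sample(population, n)


def systematic_sample(population, n, rng=None, start=None):
    """Take a random start j in 1..k, then every kth element (k = N // n)."""
    rng = rng if rng is not None else random.Random()
    k = len(population) // n  # sampling interval
    j = rng.randint(1, k) if start is None else start
    # Slide positions are 1-based; subtract 1 for Python's 0-based indexing.
    return [population[pos - 1] for pos in range(j, j + n * k, k)]


def stratified_sample(strata, sizes, rng=None):
    """Simple random sample of sizes[name] elements within each stratum."""
    rng = rng if rng is not None else random.Random()
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}


def cluster_sample(clusters, m, rng=None):
    """Pure cluster sampling: choose m whole clusters at random."""
    rng = rng if rng is not None else random.Random()
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}


# Systematic example from the slides: N=64, n=8, k=8, j=3.
population = list(range(1, 65))
print(systematic_sample(population, 8, start=3))
# positions 3, 11, 19, 27, 35, 43, 51, 59

# Periodicity: every 4th element is "red" and the interval is 8.
colors = ["red" if pos % 4 == 0 else "other" for pos in population]
print(set(systematic_sample(colors, 8, start=3)))  # no reds sampled
print(set(systematic_sample(colors, 8, start=4)))  # only reds sampled
```

Passing a seeded random.Random instance as rng makes a draw repeatable, which helps when a sample has to be documented or audited. The stratified and cluster helpers differ only in what is randomized: elements within every group versus the groups themselves.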