149 Cards in this Set
Front | Back |
---|---|
data editing
|
the inspection and correction of the data received from each element of the sample
|
Primary tasks in editing process
|
- convert all responses into consistent units
- assess degree of non response
- check for consistency across responses
- look for evidence that the respondent wasn't really thinking about the answers
- verify that the branching questions were followed correctly
- add any needed codes…
|
data coding
|
the process of transforming raw data into symbols
|
how to code close ended items: check all that apply questions
|
1 if checked, 0 if not
|
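A minimal sketch of the 1/0 coding rule for a check-all-that-apply item. The option names and the respondent's answers are made up for illustration; each option becomes its own 0/1 variable.

```python
# Hypothetical answer options for one check-all-that-apply question.
OPTIONS = ["email", "phone", "mail"]

def code_check_all(checked, options=OPTIONS):
    """Return one indicator per option: 1 if checked, 0 if not."""
    return {opt: 1 if opt in checked else 0 for opt in options}

# One (invented) respondent who ticked "email" and "mail".
row = code_check_all({"email", "mail"})
print(row)
```

Each option gets its own column, so a respondent who checks nothing is still coded (all zeros) rather than left blank.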
how to code factual open ended items
|
code the numerical variables
|
how to code exploratory open ended items
|
1. identify useable response
2. develop categories for response
3. sort responses into categories using multiple codes
4. assess the degree of agreement between coders
|
nominal can be used for
|
mode, frequency distribution
|
ordinal can be used for
|
mode, median, frequency distribution, range
|
interval/ ratio can be used for
|
mode, median, mean, frequency distribution, range, standard deviation
|
chi square analysis
|
test for significance between the frequency distributions of 2 or more nominally scaled variables to determine if there is an association between the variables
|
This term defines how well the observed frequencies fit the pattern of expected frequencies
|
chi square
|
crosstabs
|
way to organize data by groups or categories, thus facilitating comparisons; joint frequency distribution of observations on two or more sets of variables
|
this term defines how certain variables differ among various subgroups of the total sample
|
crosstabs
|
when is a chi square test appropriate?
|
the type of measurement is nominal and/or you are testing for a difference between 2 independent groups
|
t-tests require what sort of data?
|
interval or ratio
|
t-tests determine
|
if the difference between the 2 sample means occurred by chance
|
what test should you use when the sample size is less than 30 and standard deviation is unknown
|
t-test
|
null hypothesis for t-test says
|
group means are equal
|
independent samples
|
2 or more groups of responses tested as if they came from different populations
|
related samples
|
2 or more groups of responses that originate from the same population
|
paired sample t-test
|
tests the difference in means between two variables measured on the same sample
|
ANOVAs
|
determines if 3 or more means are different from each other
|
null hypothesis for ANOVA says
|
all means are equal
|
the dependent variable in an ANOVA must be
|
measurable
|
the independent variable in an ANOVA must be
|
nominal
|
one-way ANOVAs have only one
|
independent variable
|
what test do ANOVAs use
|
f-test
|
f-tests are used to
|
evaluate the differences between the group means in ANOVA
|
how to determine significance using f-test
|
larger f-ratio value = reject null = group mean differences are significant
|
ANOVAs do NOT tell us
|
where the difference is
|
ANOVAs ONLY tell us
|
a difference exists
|
how to find where the difference is in an ANOVA
|
use a follow up/ post hoc test
|
a follow up/ post hoc test conducts
|
multiple pairwise comparisons of means to determine where the differences lie
|
what CAN you determine from the mean
|
if numbers are above or below the mean
|
what CAN NOT be determined by mean
|
- if there will be outliers
- what individual scores were
|
outliers substantially distort
|
mean
|
what is usually the best choice to describe data without outliers
|
mean
|
what CAN you determine from median
|
- if numbers are above or below the halfway point
|
what CAN NOT be determined by the median
|
- if there were outliers
- what individual scores were
|
what is the best choice to describe data when there are outliers
|
median
|
what CAN you determine from mode
|
what are the most frequent numbers
|
what CAN NOT be determined by mode
|
- where the number is in the group of data
- what all of the other numbers are
|
this is the best choice to describe data if you want to select the most popular value
|
mode
|
what CAN you determine from range
|
how far apart the numbers are
|
what CAN NOT be determined by range
|
what the numbers are in the data
|
what is the best choice to describe the spread of the data
|
range
|
this is the best choice to show how much a typical number in the set differs from the mean
|
standard deviation
|
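A short sketch of the mean, median, mode, range, and standard deviation cards above, using Python's standard `statistics` module. The scores are invented; the second list adds one extreme value to show how an outlier pulls the mean but barely moves the median.

```python
import statistics

scores = [4, 5, 5, 6, 7]         # made-up data with no outlier
with_outlier = scores + [60]     # same data plus one extreme value

def describe(data):
    """Summarize a list of numbers with the measures from the cards."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

plain = describe(scores)
skewed = describe(with_outlier)
print(plain)
print(skewed)
```

In this example the mean jumps from 5.4 to 14.5 once the outlier is added, while the median only moves from 5 to 5.5.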
null hypothesis
|
claims a population value is equal to some stated value
|
alt hypothesis
|
claims a value is different from null value
|
the p-value is a measure of
|
significance
|
if p-value is small
|
there is strong evidence for alt hypothesis, reject null
|
if p-value is large
|
there is insufficient evidence, do not reject null
|
what is considered a small p-value
|
less than or equal to .05
|
null hypothesis assumes
|
no difference, association or relationship between variables
|
alt hypothesis assumes
|
a difference, association or relationship between variables
|
if p-value is ≤ .05
|
reject null
|
if p-value is > .05
|
fail to reject null
|
chi square test determines
|
if there is a significant difference between expected and observed frequencies in one or more categories
|
chi square test requirements
|
1. one or more categories (nominal data)
2. adequate sample size (at least 10)
3. simple random sample
4. data in frequency form
5. all observations must be used
|
What test determines whether the observed frequencies differ from the expected ones?
|
chi square test
|
null hypothesis for chi square claims
|
no significant difference between expected and observed
|
alt hypothesis for chi square claims
|
there IS a significant difference between expected and observed
|
(detail) what do you assume when rejecting the null hypothesis in a chi square test
|
there is a difference but it is NOT by chance or sample error. There is a REAL difference between expected and observed frequencies
|
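A rough sketch of how the chi square statistic compares observed and expected frequencies in a crosstab. The 2x2 table of counts is invented for illustration, and the expected counts are computed under the null hypothesis of no association as (row total × column total) / grand total.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab
    given as a list of rows of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical counts: rows = two groups, columns = yes / no.
table = [[30, 20],
         [20, 30]]
stat, df = chi_square(table)
print(stat, df)
```

Here every expected count is 25, so the statistic is 4.0 with 1 degree of freedom, which exceeds the usual .05 critical value of 3.841 and would lead to rejecting the null.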
you want to know if the mean from one population is larger than the mean for another. what do you use?
|
independent sample t-test
|
what does independent samples mean?
|
you have DIFFERENT individuals in your two sample groups
|
examples of independent sample t-test
|
- compare sales volume for stores that advertise vs. stores that don't
- compare speed of survey programming for students that have completed some type of training vs no training
|
the null hypothesis in an independent sample claims
|
difference between the 2 means is 0
|
the alt hypothesis in an independent sample claims
|
difference between the 2 means is above, below, or not equal to 0
|
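A minimal sketch of an independent sample t statistic, assuming equal variances in the two groups (the pooled-variance form; other forms exist). The two groups of numbers are invented, for example sales at stores that advertise vs. stores that do not.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two
    independent samples (assumes roughly equal variances)."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    std_err = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean_a - mean_b) / std_err, na + nb - 2

advertisers = [10, 12, 14, 16, 18]      # hypothetical data
non_advertisers = [8, 9, 10, 11, 12]    # hypothetical data
t_stat, df = independent_t(advertisers, non_advertisers)
print(round(t_stat, 3), df)
```

The null hypothesis is that the difference between the two means is 0; the larger the absolute t value for the given degrees of freedom, the smaller the p-value.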
what does paired sample t-test mean?
|
you have the SAME individuals in your two sample groups
|
what do paired sample t-tests compare?
|
compares the mean difference of values to 0
|
what is needed for paired sample t-tests to be valid?
|
difference between paired values should be approximately normally distributed
|
examples of paired sample t-test
|
- compare the weight of people on the show before the season begins and after the show ends.
- are workers more productive 6 months after they attend training vs. before training?
|
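A small sketch of the paired sample t statistic, which compares the mean of the within-person differences to 0. The before/after weights are invented for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for paired observations.
    Each position in the two lists must belong to the same individual."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1

weight_before = [200, 180, 220, 190, 210]   # hypothetical data
weight_after = [190, 175, 205, 188, 200]    # same five people, later
t_stat, df = paired_t(weight_before, weight_after)
print(round(t_stat, 3), df)
```

Because the test works on the differences, it is those differences (not the raw scores) that should be approximately normally distributed.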
ANOVAs are used to compare
|
3 or more means
|
ANOVAs ask us
|
"do all our groups come from populations with the same mean?"
|
A one-way ANOVA compares 3 or more means with?
|
only one independent variable
|
example of ANOVA
|
- comparing light, medium, and heavy consumers of Starbucks' attitudes towards an advertisement.
- comparing light, medium, and heavy users of paper coupons with their likelihood to use mobile coupons
|
f-test is used to compare
|
variances from two normal populations.
|
the peak of any F distribution is close to what number?
|
1
|
What values provide evidence against the null in an f-test?
|
values far from 1
|
crosstabs are what sort of variate technique?
|
multivariate
|
what multivariate technique studies the relationship between 2 or more categorical variables?
|
crosstabs
|
this technique constructs joint distributions of sample elements across variables
|
crosstabs
|
the independent variable is also known as?
|
the causal variable
|
the dependent variable is also known as?
|
the outcome variable
|
what is a banner?
|
a series of crosstabs between an outcome and several explanatory variables in a single tab
|
What does the "Pearson chi square test of independence" test for?
|
tests for significance between the frequency distributions of 2 or more nominal variables to determine if there is any association between variables
|
what tests the null hypothesis claiming categorical variables are independent of each other?
|
Pearson chi square test of independence
|
the same proportion of variable X make up each of the response categories for variable Y
|
null hypothesis of Pearson chi square test for independence
|
independent sample t-tests for the mean determine?
|
- determine whether 2 groups differ on some characteristic assessed on a continuous measure
|
what test is used to compare means of 2 groups to see if they are significantly DIFFERENT?
|
independent sample t-test
|
μ1 = μ2. this test's null hypothesis states that the two means of (interval/ratio variable x) are equal
|
independent sample t-test
|
example of independent sample t-test?
|
- satisfaction rating of men vs. women
- age in years, customers vs. non customers
|
paired sample t-tests are used to?
|
used to compare 2 means when scores for both variables are provided by the same sample
|
paired sample t-tests are good for measuring?
|
good for measuring "before and after" results
|
this test is good for applying same measures to different objects
|
paired sample t-tests
|
what test would we use to "compare light, medium, and heavy users of crystal meth on their attitudes towards the show 'Breaking Bad'"?
|
ANOVA
|
an ANOVA's independent variable must be?
|
nominal
|
an ANOVA's dependent variable must be?
|
interval/ ratio
|
An ANOVA's null hypothesis claims?
|
all means are equal. μ1 = μ2 = μ3
|
what does a larger number mean in the F-ratio?
|
reject null, group means are significantly different
|
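A minimal sketch of the one-way ANOVA F-ratio: variation between the group means divided by variation within the groups. The three groups of ratings are invented, for example light, medium, and heavy users.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups (lists of numbers)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

light, medium, heavy = [1, 2, 3], [2, 3, 4], [5, 6, 7]   # hypothetical ratings
f_value = f_ratio([light, medium, heavy])
print(f_value)
```

An F-ratio near 1 means the group means differ about as much as chance alone would produce; a large value, as in this example, is evidence against the null that all means are equal. It still does not say which groups differ, which is what the post hoc test is for.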
covariation is?
|
the amount of change in one variable in relation to the amount of change in another
|
scatter diagram is?
|
graphical plot of the relative position of 2 variables
|
scatterplots/ scattergrams/ scatter diagrams are?
|
a graph of 2 numerical variables (x,y)
|
what is a 2 dimensional graph representing 2 variable measures from the same set of subject elements?
|
a scatterplot
|
if the relationship of 2 variables forms a straight line (linear), variables are considered?
|
correlated
|
what variable is typically placed on the x axis?
|
the more controlled variable
|
what variable is typically placed on the y axis?
|
the response variable
|
Pearson correlation coefficient (r) measures?
|
measures the strength and direction of a linear relationship between 2 variables
|
Pearson correlation coefficient varies between?
|
varies between -1.00 & +1.00
|
a higher (r) value means what?
|
it means a stronger level of association
|
(r) can be ______ or _______ ?
|
can be positive or negative
|
what range of coefficient values depicts a very strong relationship?
|
±0.81 to ±1.00
|
what range of coefficient values depicts NO relationship?
|
±0.00 to ±0.20
|
what range of coefficient values depicts a moderate relationship?
|
±0.41 to ±0.60
|
Properties of Pearson correlation consist of?
|
- values of r that don't depend on the units of measurement
- values of r that don't depend on which variable is labeled x or y (x&y = y&x)
|
r=+1 is what type of linear relationship?
|
a perfect POSITIVE linear relationship
|
r=-1 is what type of linear relationship?
|
a perfect NEGATIVE linear relationship
|
-1≤r≤+1. positive value of r means? negative value means?
|
- positive means positive linear.
- negative means negative linear.
|
Value of r close to Zero means what?
|
means no linear relation
|
"no LINEAR relation" does NOT mean what?
|
it does not mean there is "NO relation AT ALL"
|
r ONLY measures what sort of relations?
|
this value only measures LINEAR relationships
|
there still may be non-linear relations if r is close to?
|
close to Zero
|
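A short sketch of the Pearson correlation coefficient, computed directly from its definition. The data are invented: one perfectly linear relationship and one perfectly curved one (y = x²), to illustrate that r near zero only rules out a LINEAR relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_line = [2 * v + 1 for v in x]    # perfect positive linear relationship
y_curve = [v ** 2 for v in x]      # perfect, but non-linear, relationship

r_line = pearson_r(x, y_line)
r_curve = pearson_r(x, y_curve)
print(r_line, r_curve)
```

The curved data give r = 0 even though y is completely determined by x, which is why a scatterplot is worth checking before concluding there is no relationship.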
Assumptions in Pearson correlation coefficient
|
- both variables are measured using interval or ratio scales
- nature of relationship is linear
- Both variables come from a bivariate, normally distributed population
|
Causation is?
|
∆ in x CAUSES ∆ in y
|
common response is?
|
both x and y respond to changes in some unobserved variable
|
confounding is?
|
the effect of x on y is mixed up with the effects of other explanatory variables on y
|
example of confounding is?
|
tylenol and placebo effect
|
examples of common response are?
|
- ice cream sales and # of shark attacks are positively correlated (because ice cream sales and swimming both occur in the same season)
- # of cavities in elementary school children and vocabulary size are positively correlated (because # of cavities and vocabulary both increase with age, but they do…
|
example of causation is?
|
football weekends CAUSE heavier traffic, more food sales, etc. (because the game directly CAUSES this)
|
Spearman Rank Order Correlation measures?
|
measures the linear association between the ranks of 2 ORDINALLY scaled (RANK ORDER) variables
|
Key differences between Spearman and Pearson?
|
- Spearman = ordinal variables
- Pearson = interval/ ratio variables
|
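A minimal sketch of the Spearman rank order correlation using the shortcut formula 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item. This form assumes there are no tied ranks; the two sets of rankings are invented.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank order correlation (no-ties shortcut formula)."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five brands by two respondents.
respondent_a = [1, 2, 3, 4, 5]
respondent_b = [2, 1, 4, 3, 5]
rho = spearman_rho(respondent_a, respondent_b)
print(rho)
```

Because only the ranks are used, the same function works on ordinal data where the distances between scale points are not meaningful.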
Regression analysis is used to?
|
used to derive an equation representing the influence of a single or multiple independent variables on a continuous dependent variable
|
what is used to predict the value of a Dependent variable based on the value of at least 1 Independent variable
|
you use regression analysis
|
what explains the impact of changes in an independent variable, on the dependent variable?
|
regression analysis explains...
|
in a regression analysis, what variable do we wish to explain?
|
dependent variable
|
this variable is used to explain the other variable in a regression analysis
|
independent variable
|
what describes relationships in a linear function?
|
regression analysis
|
what assumes ∆ dependent variables are CAUSED by ∆ in independent variables?
|
regression analysis assumes
|
R^2 is also called ?
|
coefficient of multiple determination
|
R^2 is a measure representing what?
|
it is a measure representing the proportion of total variation in the dependent variable that can be explained or accounted for by a fitted regression equation
|
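A rough sketch of a bivariate (one predictor) least-squares regression and its R², computed as 1 − (unexplained variation / total variation). The x and y values are invented for illustration.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared gaps between actual and predicted y.
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation of y around its own mean.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_error / ss_total

x = [1, 2, 3, 4, 5]    # hypothetical independent variable
y = [2, 4, 5, 4, 5]    # hypothetical dependent variable
slope, intercept, r_squared = simple_regression(x, y)
print(slope, intercept, r_squared)
```

For these numbers the fitted line is roughly y = 2.2 + 0.6x and R² is about 0.6, meaning about 60% of the variation in y is accounted for by x.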
what is R^2 called when there is only ONE predictor variable?
|
referred to as the "coefficient of determination"
|
key terms for R
|
- correlation coefficient, indicating strength and direction of relationship
|
key terms for R^2
|
- coefficient of determination, % of variation in one variable accounted for by another variable
|
Adjusted R^2 adjusts the statistic based on what?
|
adjusts based on the number of independent variables in the model
|
how does Adjusted R^2 adjust R^2?
|
adjusts R^2 such that
an independent variable that HAS a correlation to Y increases adjusted R^2,
and any variable WITHOUT a strong correlation makes adjusted R^2 decrease.
|
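A small sketch of the usual adjusted R² formula, 1 − (1 − R²)(n − 1) / (n − k − 1), where n is the number of observations and k the number of independent variables. The R², n, and k values are invented to show the penalty for extra predictors.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same hypothetical R^2 of 0.80 on 30 observations, with 1 vs. 5 predictors.
one_predictor = adjusted_r2(0.80, 30, 1)
five_predictors = adjusted_r2(0.80, 30, 5)
print(round(one_predictor, 4), round(five_predictors, 4))
```

With R² held fixed, the model with five predictors gets the lower adjusted value, so a new variable only raises adjusted R² if it improves R² by enough to offset the penalty.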
use R^2 for what variate regression?
|
use for bivariate regression
|
use Adjusted R^2 for what variate regression?
|
use for multiple regression
|
unstandardized regression coefficients are used in what?
|
used in simple/ bivariate regressions
|
large coefficients are good predictors for what type of regression coefficients?
|
good for unstandardized regression coefficients
|