Ch 11: Testing a Claim
Things to know before starting:
Part 1: basics of significance tests
in this chapter, assume we already know the population standard deviation (σ)
significance test: a procedure for comparing observed data with a hypothesis whose truth we want to assess
hypothesis: statement about the population
uses a probability to express how well the data and the hypothesis agree
the probability says what would happen in the long run, if we took many samples
outline for significance tests: state a claim that is the opposite of what we are trying to prove (this is the null hypothesis), then test that claim
if we want evidence that something is faster, our claim will be that it is not faster
take the mean stated in the claim, then calculate the standard deviation of x-bar as (known population stan. dev.) / sqrt(sample size)
make a Normal curve centered at the claimed mean with this standard deviation, then see where the mean from the data falls on that curve
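The outline above can be sketched in Python with the standard library's statistics.NormalDist (Python 3.8+). The numbers (claimed mean 100, σ = 15, n = 25, observed x-bar = 106) and the helper name null_sampling_distribution are hypothetical, chosen only for illustration; the sketch assumes σ is known and that x-bar is approximately Normal.

```python
from math import sqrt
from statistics import NormalDist


def null_sampling_distribution(mu0: float, sigma: float, n: int) -> NormalDist:
    """Normal curve for x-bar if the claimed mean mu0 (the null hypothesis) is true.

    Assumes the population standard deviation sigma is known and that
    x-bar is approximately Normal.
    """
    return NormalDist(mu=mu0, sigma=sigma / sqrt(n))


# Hypothetical numbers: claimed mean 100, sigma = 15, SRS of n = 25, observed x-bar = 106
model = null_sampling_distribution(mu0=100, sigma=15, n=25)
x_bar = 106

print(model.stdev)           # standard deviation of x-bar: 15 / sqrt(25) = 3.0
print(1 - model.cdf(x_bar))  # chance of an x-bar at least this large if the claim were true
```

A small tail area means the observed mean would be unusual if the claim were true.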
stating hypotheses
identify the two claims, also known as hypotheses
tests look for evidence against a claim, so start with a claim that is the opposite of the claim we want evidence for: the null hypothesis, symbolized by H0 (usually a statement of “no effect” or “no difference”)
alternative hypothesis: the claim about the population that we are trying to find evidence for
symbolized by Ha
use a one-sided alternative hypothesis if a direction (specific conclusion) is stated; examples of direction: “is increasing” or “is decreasing”, etc.
use a two-sided alternative hypothesis if no direction is stated; examples: there is a change, but it isn’t described as a change for the better or worse, or as an increase or decrease, etc.
DON’T look at the data before stating the alternative hypothesis!!!
hypotheses describe a population, so write them in terms of a population parameter (such as μ), not a sample statistic (such as x-bar)
conditions needed for significance tests: SRS, Normality, and Independence
test statistic
compares the value of the parameter stated in the null hypothesis with an estimate of that parameter from the data
if the estimate from the data is far away from the value stated in the null hypothesis, that is evidence against the null hypothesis
test statistic: the standardized estimate, (estimate - value stated in H0) / (standard deviation of the estimate)
p-values
determines whether the estimate from the data is far enough from the H0 value, in the direction of Ha, to count as evidence against H0
p-value: the probability, computed assuming H0 is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed
small >>> good evidence against H0; large >>> not good evidence against H0
one-sided Ha >>> find the tail area in the direction of Ha; two-sided Ha >>> add the tail areas in both directions (for a Normal curve, double the one-tail area), then decide if the total p-value is small enough to give evidence against H0
use the z-score (test statistic) and the standard Normal distribution to calculate it
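Here is a minimal sketch of turning a z test statistic into a p-value with the standard Normal curve, using Python's statistics.NormalDist. The function name p_value and the labels "greater", "less", and "two-sided" are hypothetical names chosen for this sketch, not part of any textbook or library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z: float, alternative: str) -> float:
    """P-value for a z test statistic.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    if alternative == "greater":
        return 1 - STANDARD_NORMAL.cdf(z)  # area to the right of z
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)      # area to the left of z
    if alternative == "two-sided":
        # add the two equal tail areas beyond |z| and -|z|
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


print(p_value(2.0, "greater"))    # about 0.0228
print(p_value(2.0, "two-sided"))  # about 0.0455
```

The two-sided p-value is twice the one-sided one because the Normal curve is symmetric.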
statistical significance
compare the p-value with a fixed value, chosen in advance, that is considered decisive; in other words, a value that says how much evidence we need against H0
the significance level is symbolized by the Greek letter alpha (α) and is written as a decimal
if α = 0.01, we are requiring evidence so strong that it would happen by chance no more than 1% of the time when H0 is true
a result is statistically significant at level α if the p-value is as small as or smaller than α
most common is α = 0.05, but α = 0.01 or 0.10 is sometimes used
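The comparison with α is a one-line rule. This sketch uses a hypothetical helper is_significant and a hypothetical p-value of 0.0228 to show that the same p-value can be significant at one level and not at another.

```python
def is_significant(p: float, alpha: float = 0.05) -> bool:
    """Statistically significant at level alpha when the p-value is <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return p <= alpha


p = 0.0228  # hypothetical p-value
for alpha in (0.10, 0.05, 0.01):
    verdict = "significant" if is_significant(p, alpha) else "not significant"
    print(f"alpha = {alpha}: {verdict}")
```

Reporting the p-value itself lets a reader make this check at whatever α they prefer.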
inference
the p-value is the best indication of the strength of the evidence, since it lets us judge significance at any α we choose
Part 2: Carrying out significance tests
How to carry out tests
Step 1: identify the population and parameter, then state the null and alternative hypotheses
Step 2: check the conditions: SRS, Normality, and Independence
Step 3: if the conditions are met, calculate the test statistic and p-value
Step 4: interpret the p-value or make a decision about H0. If you make a decision about H0, use statistical significance (compare the p-value with α)
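test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where $\mu_0$ is the value stated in H0 and $\sigma$ is the known population standard deviation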
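The one-sample z statistic can be computed directly. This is a minimal sketch assuming σ is known; the function name z_statistic and the numbers (H0 mean 100, σ = 15, n = 25, x-bar = 106) are hypothetical.

```python
from math import sqrt


def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """One-sample z statistic: (x-bar - mu0) / (sigma / sqrt(n)).

    Assumes sigma (the population standard deviation) is known.
    """
    return (x_bar - mu0) / (sigma / sqrt(n))


# Hypothetical data: H0 says mu = 100, sigma = 15, n = 25, observed x-bar = 106
print(z_statistic(106, 100, 15, 25))  # 2.0
```

A z of 2.0 says the sample mean is 2 standard deviations of x-bar above the H0 value; the p-value then comes from the standard Normal curve.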
interpretation sentences
p-value: More/less than ___% of the time, an SRS of size ___ from the ___ (population, assuming H0 is true) ___ would have a ___ (measure: mean or proportion) ___ at least as far from ___ (value from H0) ___ as that of our sample, ___ (observed sample value) ___. The observed ___ (sample value) ___ therefore is/is not good evidence that ___ (what Ha states) ___
Decision about H0: Since our p-value, _____, is less/greater than α = ____, this result is/is not statistically significant. We reject/fail to reject H0 and conclude that there is/is not convincing evidence that the ___ (what we are trying to measure) ___ among ___ (population) ___ has increased/decreased/changed (whatever Ha states)
if the test is one-sided, state the direction: whether the parameter is greater than or less than the value stated in H0
If there is no evidence against H0, that only means the data are consistent with H0. We can’t say that we have clear evidence that H0 is true.
Finding evidence for Ha does not by itself show that the factor described in Ha caused the observed change
Tests from confidence intervals
duality: a two-sided test at significance level α can be carried out with a 1 - α confidence interval for the population mean μ. If the interval includes the mean stated in the null hypothesis, fail to reject the null hypothesis. If that value is not within the 1 - α confidence interval, reject the null hypothesis at level α
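The duality can be checked numerically. This sketch assumes σ is known and uses hypothetical helper names (z_interval, two_sided_test_from_interval) and hypothetical numbers; it builds a 95% z interval and rejects H0 at α = 0.05 exactly when the H0 mean falls outside it.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float) -> tuple[float, float]:
    """Confidence interval x-bar +/- z* * sigma / sqrt(n), sigma known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def two_sided_test_from_interval(x_bar: float, mu0: float, sigma: float, n: int, alpha: float) -> bool:
    """Return True (reject H0: mu = mu0) when mu0 is outside the 1 - alpha interval."""
    low, high = z_interval(x_bar, sigma, n, 1 - alpha)
    return not (low <= mu0 <= high)


# Hypothetical data: x-bar = 106, sigma = 15, n = 25
low, high = z_interval(106, 15, 25, 0.95)
print(round(low, 2), round(high, 2))                         # roughly 100.12 111.88
print(two_sided_test_from_interval(106, 100, 15, 25, 0.05))  # 100 is outside, so reject H0
print(two_sided_test_from_interval(106, 102, 15, 25, 0.05))  # 102 is inside, so fail to reject
```

This only matches a two-sided test; a one-sided test at level α does not line up with a 1 - α interval in the same way.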
Part 3: Use and Abuse of Tests
choosing level of significance
Significance vs. practical importance
With very large samples, even small deviations from the null hypothesis can be statistically significant; the same small deviations would not be significant in small samples
a statistically significant result is not always important in practice; confidence intervals help here, since they give a range where the true mean plausibly lies and show how much the true mean differs from the null mean (how big the effect actually is)
always check for shape and outliers; outliers can make a result significant or hide a real effect
confidence intervals are really useful because they estimate the plausible values of the true mean instead of only determining whether the observed result is too unlikely to occur by chance
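To see why sample size matters, this sketch holds a hypothetical 0.5-point deviation from a null mean of 100 (σ = 15) fixed and computes the two-sided p-value at several sample sizes. The helper name two_sided_p and all numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided p-value of the one-sample z test (sigma known)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny deviation (x-bar = 100.5 vs mu0 = 100), different sample sizes
for n in (25, 1000, 10000):
    print(f"n = {n:5d}: p-value = {two_sided_p(100.5, 100, 15, n):.4f}")
```

The deviation is identical in every row; only n changes, and with n = 10000 the result becomes highly significant even though half a point may not matter in practice.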
don’t ignore a lack of significance
sometimes people report that they fail to reject H0 even though the confidence interval is very wide, which means a bigger sample size is needed before concluding anything
in large samples, even tiny deviations from the null that only large samples can detect are significant, while small samples may miss real differences; be sure that the test you use can detect what you want to find out
statistical inference is not valid for every set of data
always make sure that the conclusion came from a sample that satisfies the conditions (SRS, Normality, and Independence) before trusting it and drawing inferences from it
Hawthorne effect: when a change in the environment, or the subjects’ knowledge of something (such as knowing that someone is observing them), changes the subjects’ behavior and therefore the data
beware of bias and of data from uncontrolled situations in which other variables can affect the results
Always follow the four-step inference procedure to determine whether you have real evidence
Step 1: identify the parameter and state the null and alternative hypotheses
Step 2: check that the required conditions are met (SRS, Normality, and Independence)
Step 3: find the test statistic and p-value
Step 4: state and interpret the conclusion in context, and connect the conclusion back to the p-value or test statistic
Part 4: Using inference to make decisions
decision making is different from measuring strength of evidence
when we have to make a decision based on data, we can make one of two types of error
Type I: happens when we reject H0 when H0 is actually true
Type II: happens when we fail to reject H0 when H0 is actually false
when solving problems, describe these errors and the consequences of these errors in context
the seriousness of each of these two errors depends on the conditions of the problem or event
probability of error
we can’t eliminate all error when working with a sample; doing so would require examining the whole population
by asking what would happen if we repeated the test many times, we can find the probability of making a Type I or Type II error
to do this, first graph the sampling distribution assuming H0 is true, then find the critical value (the cutoff for rejecting H0 at level α). The area under the H0 curve beyond the critical value, in the direction of Ha (the rejection region), is the probability of a Type I error, which equals α
Graph the sampling distribution assuming a specific alternative value (one that belongs to Ha) is true, then shade the area under that curve on the other side of the critical value (the fail-to-reject side); this area is the probability of a Type II error
a high probability of a Type II error means the test is likely to miss Ha even when Ha is true; this means the test is not good
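The two-curve picture above can be computed directly. This is a sketch for a one-sided test of H0: μ = μ0 vs Ha: μ > μ0 with σ known; the function name error_probabilities and the numbers (μ0 = 100, specific alternative 105, σ = 15, n = 25, α = 0.05) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(mu0: float, mu_a: float, sigma: float, n: int, alpha: float):
    """Critical value, P(Type I), and P(Type II) for H0: mu = mu0 vs Ha: mu > mu0."""
    se = sigma / sqrt(n)
    null_curve = NormalDist(mu0, se)  # sampling distribution of x-bar if H0 is true
    alt_curve = NormalDist(mu_a, se)  # sampling distribution if the specific alternative is true
    critical = null_curve.inv_cdf(1 - alpha)  # reject H0 when x-bar >= critical
    type_1 = 1 - null_curve.cdf(critical)     # H0-curve area in the rejection region (= alpha)
    type_2 = alt_curve.cdf(critical)          # alternative-curve area on the fail-to-reject side
    return critical, type_1, type_2


critical, p1, p2 = error_probabilities(mu0=100, mu_a=105, sigma=15, n=25, alpha=0.05)
print(f"critical x-bar = {critical:.2f}, P(Type I) = {p1:.3f}, P(Type II) = {p2:.3f}")
```

Here P(Type II) is close to 0.5, so this test would miss a true mean of 105 about half the time.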
What is power
power: the probability that the test rejects H0 when a particular alternative value (a value that belongs in Ha) is true; power = 1 - P(Type II error)
power depends on which specific alternative value of the mean is true
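Because power depends on the specific alternative, it is easiest to see by computing it for several values. This sketch assumes the same hypothetical one-sided z test (H0: μ = 100 vs Ha: μ > 100, σ = 15, n = 25, α = 0.05); the function name power is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power(mu0: float, mu_a: float, sigma: float, n: int, alpha: float) -> float:
    """Power of the one-sided z test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is the specific alternative mu_a (sigma known)."""
    se = sigma / sqrt(n)
    critical = NormalDist(mu0, se).inv_cdf(1 - alpha)  # reject H0 when x-bar >= critical
    return 1 - NormalDist(mu_a, se).cdf(critical)     # chance of landing in the rejection region


for mu_a in (102, 105, 110):
    print(f"true mean {mu_a}: power = {power(100, mu_a, 15, 25, 0.05):.3f}")
```

The farther the true mean is from the H0 value, the higher the power; at μ_a = μ0 the "power" is just α.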
p-value vs. power: what do they state?
p-value: states the probability, assuming H0 is true, of getting a test statistic as extreme as or more extreme than the one actually observed
power: states the test’s ability to reject H0 when H0 is false
increasing power
low power = very likely to commit a Type II error
to maximize power, choose a large sample size and as high a significance level as you are willing to use
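The effect of sample size and significance level on power can be tabulated. This sketch reuses a hypothetical one-sided z test (H0: μ = 100 vs Ha: μ > 100, true mean 105, σ = 15); the function name power and all numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power(mu0: float, mu_a: float, sigma: float, n: int, alpha: float) -> float:
    """Power of the one-sided z test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu_a (sigma known)."""
    se = sigma / sqrt(n)
    critical = NormalDist(mu0, se).inv_cdf(1 - alpha)  # reject H0 when x-bar >= critical
    return 1 - NormalDist(mu_a, se).cdf(critical)


for n in (25, 100):
    for alpha in (0.01, 0.05, 0.10):
        print(f"n = {n:3d}, alpha = {alpha:.2f}: power = {power(100, 105, 15, n, alpha):.3f}")
```

A larger n shrinks σ/√n so the two curves overlap less, and a larger α moves the critical value toward μ0; both raise power, but a larger α also raises the chance of a Type I error.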
Websites I found useful