Ch 11: Testing a Claim
Things to know before starting:
- confidence interval used to estimate population parameter
- significance test assesses evidence provided by data about a claim about the population
- basic idea: if there is an outcome that is highly improbable if a claim is true, then the highly improbable outcome would be good evidence that the claim is not true
Part 1: basics of significance tests
- going to say that we already know the standard deviation here
- significance test: procedure for comparing observed data with hypothesis we want to assess
- hypothesis: statement about the population
- there are two: null hypothesis and alternative hypothesis
- uses probability that shows how well data and hypothesis agree
- says what would happen in the long run
- outline for significance tests: make and test a claim that is the opposite of what we are trying to prove (this claim is the null hypothesis)
- if the data says something is faster, our claim will be that it is actually not faster (no difference)
- choose a mean that satisfies the claim, then calculate the standard deviation of the sample mean using reported standard deviation / sqrt(sample size)
- or σ / sqrt(n)
- make a Normal curve based on this info, then compare the mean reported by data to the mean chosen in claim
- if the two means are close, then there is no convincing evidence against the null hypothesis; if the two means are far apart, then there is convincing evidence against the null hypothesis (and in favor of the alternative hypothesis)
- use p-values to determine if means are close or far (going to discuss later)
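The "what would happen in the long run" idea from the outline above can be sketched with a small simulation. This is only an illustration: the numbers (null mean 100, σ = 15, n = 36, observed mean 104.5) and the function name are made up, and the population is assumed to be Normal.

```python
import math
import random
from statistics import NormalDist, fmean

def simulated_tail_fraction(mu0, sigma, n, observed_mean, trials=5000, seed=1):
    """Fraction of simulated sample means, drawn with H0 true, at or above the observed mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the population described by H0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if fmean(sample) >= observed_mean:
            hits += 1
    return hits / trials

# Hypothetical numbers, not taken from the notes.
mu0, sigma, n, observed_mean = 100.0, 15.0, 36, 104.5

fraction = simulated_tail_fraction(mu0, sigma, n, observed_mean)

# Same tail area from the Normal curve with standard deviation sigma / sqrt(n).
theory = 1 - NormalDist(mu0, sigma / math.sqrt(n)).cdf(observed_mean)

print(f"simulated: {fraction:.4f}, Normal curve: {theory:.4f}")
```

If sample means this far from the null mean almost never show up when H0 is true, the observed mean counts as "far away".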
- stating hypothesis
- identify the two claims, also known as hypotheses
- tests look for evidence against a claim, so start by making a claim that is against the claim we want evidence for
- called null hypothesis
- symbolized by H0
- usually says no change, no effect, or no difference
- alternative hypothesis: claim about population that we are trying to find evidence for
- symbolized by Ha
- do a one-sided alternative hypothesis if direction (specific conclusion) stated; examples of direction: “is increasing” or “is decreasing”, etc.
- do a two-sided alternative hypothesis if direction is not stated: examples of no specific direction: there is change, but it doesn’t say change for the better or worse, or doesn’t say increase or decrease, etc.
- DON’T look at data before making alternative hypothesis!!!
- hypotheses describe population, so write them in terms of population parameter
- conditions needed for significance tests
- Normality, independence, and sample made from SRS
- Normality for means: consider Normal if sample size is at least 30
- Normality for proportions: consider Normal if np and n(1-p) are both at least 10
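The two Normality rules of thumb above can be written as quick checks. This is a minimal sketch: the function names are made up, the cutoff for proportions is assumed to be "at least 10", and the option to skip the n ≥ 30 rule when the population itself is Normal is an added assumption.

```python
def normality_ok_for_mean(n: int, population_is_normal: bool = False) -> bool:
    """Rule of thumb for means: sample size of at least 30 (or a Normal population)."""
    return population_is_normal or n >= 30

def normality_ok_for_proportion(n: int, p0: float) -> bool:
    """Rule of thumb for proportions: n*p0 and n*(1 - p0) both at least 10 (assumed cutoff)."""
    return n * p0 >= 10 and n * (1 - p0) >= 10

print(normality_ok_for_mean(45))               # True: 45 >= 30
print(normality_ok_for_proportion(200, 0.3))   # True: 60 and 140
print(normality_ok_for_proportion(50, 0.05))   # False: 50 * 0.05 = 2.5
```

These checks only cover Normality; independence and SRS still have to be judged from how the data was collected.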
- test statistics
- compares value of parameter stated in null hypothesis with the estimate calculated from data
- if the value of data is far away from value of parameter (null hypothesis), then evidence is against null hypothesis
- calculate how far away estimate is from parameter >>> need to standardize estimate
- test statistic: the standardized estimate
- (estimate from data - value from null hypothesis) / (standard deviation of estimate)
- standard deviation of estimate = σ / sqrt(n)
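A minimal sketch of the standardized estimate, assuming σ is known (as these notes do). The function name and the numbers are hypothetical.

```python
import math

def z_statistic(estimate, null_value, sigma, n):
    """(estimate - value from H0) / (sigma / sqrt(n))."""
    standard_deviation_of_estimate = sigma / math.sqrt(n)
    return (estimate - null_value) / standard_deviation_of_estimate

# Hypothetical numbers: sample mean 103 from n = 25, H0 says mean = 100, sigma = 15.
z = z_statistic(estimate=103.0, null_value=100.0, sigma=15.0, n=25)
print(z)  # 1.0, because sigma / sqrt(n) = 15 / 5 = 3 and (103 - 100) / 3 = 1
```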
- p-values
- determines if estimate from data is far enough away from H0 in the direction of Ha to be considered against H0
- p-value: probability, computed assuming H0 is true, of getting an outcome as extreme as or more extreme than the actual observed outcome
- small >>> good evidence against H0; large >>> not good evidence against H0
- one-sided Ha >>> calculate the tail probability in the direction of Ha; two-sided Ha >>> add the tail probabilities in both directions (for a Normal curve, double the one-tail probability), then decide if the total p-value is small enough to give evidence against H0
- use z-score to calculate
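Going from a z-score to a p-value might look like the sketch below. The function name and the labels "greater", "less", and "two-sided" are made up for this example; `NormalDist` comes from Python's standard library.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """Tail area of the standard Normal curve in the direction(s) named by Ha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: parameter > null value
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: parameter < null value
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: parameter != null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(p_value_from_z(1.8, "greater"), 4))
print(round(p_value_from_z(1.8, "two-sided"), 4))
```

For the same z, the two-sided p-value is twice the one-sided one, so a two-sided Ha needs a more extreme z to reach the same p-value.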
- statistical significance
- compare p-value with a fixed value that is considered decisive; in other words, a given value that shows how much evidence we need against H0
- significance level expressed by Greek letter alpha, and is expressed in decimals
- if alpha = 0.01, then we require evidence against H0 so strong that it would happen no more than 1% of the time when H0 is true
- considered statistically significant if p value is as small as or smaller than alpha
- if statistically significant, then say data is statistically significant at level alpha; means that a result this extreme is not likely to happen just by chance if H0 is true
- most common is alpha = 0.05, but can sometimes use alpha = 0.01 or 0.10
- inference
- write conclusion in context and that clearly connects to calculations
- can be based on p values or statistical significance
- p-value: ask if it is small enough
- statistical significance: can we reject H0, or will we fail to reject H0?
- check whether the p-value is as small as or smaller than alpha
- reject if yes; fail to reject if no
- alpha must be stated before data is produced, or else the conclusion can't be trusted
- p value is the best indication
- identifies minimum significance level at which data is significant
- makes it easier to determine whether or not to reject H0 at a given significance level
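The reject / fail-to-reject rule, and the idea that the p-value is the smallest significance level at which the data is significant, can be sketched like this. The function name and the p-value of 0.03 are hypothetical.

```python
def decide(p_value, alpha):
    """Reject H0 when the p-value is as small as or smaller than alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p = 0.03  # hypothetical p-value
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p, alpha)}")
```

With p = 0.03 the result is significant at alpha = 0.10 and 0.05 but not at 0.01, which is why reporting the p-value itself tells the reader more than reporting only one reject/fail-to-reject decision.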
Part 2: Carrying out significance tests
- How to carry out tests
- Step 1: identify population and parameter, then alternative hypothesis and null hypothesis
- Step 2: do conditions fit? Conditions are: Normality, independence, and sample from SRS
- Step 3: If the conditions fit, calculate the test statistic and p-value
- Step 4: Interpret p-value or H0. If you use H0, then analyze using statistical significance
- test statistic: (x-bar - value stated in H0) / (σ / sqrt(n))
- interpretation sentences
- p-value: More/less than ___% of the time, an SRS the size of ___ from the ____(thing you are comparing population to)_____ would have a __ (measure (mean or proportion)) ___ at least as far from __(value from H0)__as that of the sample of __(sample of Ha) ___. The observed __(value from Ha)__ therefore is/is not good evidence that __what Ha states___
- H0: Since our P-value, _____, is less/more than “alpha” = ____, this result is/is not statistically significant. We reject/fail to reject H0 and conclude that there is/is not convincing evidence that the __(what we are trying to measure)___ among __(population)_ is positive (has increased)/negative (has decreased)
- if the test is a one-direction test, then state whether the subjects in Ha have decreased or increased compared to the subjects in H0
- If there is no evidence against H0 that only means that the data is consistent with H0. We can’t say that we have clear evidence that H0 is true.
- Evidence for Ha does not mean that the treatment or condition described in Ha caused the observed result
- only a randomized experiment can do that
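The four steps above can be pulled together for a one-sample z test of a mean. This is a sketch under stated assumptions: σ is known, the conditions in Step 2 have already been checked by hand, and the names and numbers (H0: mean = 100, Ha: mean > 100, x-bar = 104.5, σ = 15, n = 36, alpha = 0.05) are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

@dataclass
class ZTestResult:
    z: float
    p_value: float
    reject_h0: bool

def one_sample_z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Step 3 (test statistic and p-value) plus the Step 4 decision."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        p = upper_tail
    elif alternative == "less":
        p = 1 - upper_tail
    elif alternative == "two-sided":
        p = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return ZTestResult(z=z, p_value=p, reject_h0=(p <= alpha))

# Step 1 (hypothetical): H0: mean = 100, Ha: mean > 100.
# Step 2: Normality, independence, and SRS are assumed to hold here.
alpha = 0.05
result = one_sample_z_test(xbar=104.5, mu0=100.0, sigma=15.0, n=36,
                           alpha=alpha, alternative="greater")

# Step 4: conclusion in the style of the interpretation sentences.
comparison = "less" if result.reject_h0 else "more"
verdict = "reject" if result.reject_h0 else "fail to reject"
print(f"z = {result.z:.2f}, p-value = {result.p_value:.4f}")
print(f"Since our p-value is {comparison} than alpha = {alpha}, we {verdict} H0.")
```

The printed decision is only the calculation part of Step 4; the conclusion still has to be written in the context of the problem.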
- Tests from confidence intervals
- duality: a two-sided test at significance level alpha matches a 1 - alpha confidence interval for the mean (built around the sample mean). If the confidence interval includes the mean stated in the null hypothesis, you fail to reject the null hypothesis. If the mean stated in the null hypothesis is not within the 1 - alpha confidence interval, then you can reject the null hypothesis
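The duality can be checked with a short sketch: build the confidence interval around the sample mean, then see whether the null mean falls inside. It assumes σ is known and a two-sided Ha; the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha):
    """(1 - alpha) confidence interval for the mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def reject_by_interval(xbar, mu0, sigma, n, alpha):
    """Two-sided test at level alpha: reject H0 when mu0 is outside the interval."""
    low, high = z_confidence_interval(xbar, sigma, n, alpha)
    return not (low <= mu0 <= high)

low, high = z_confidence_interval(xbar=104.5, sigma=15.0, n=36, alpha=0.05)
print(f"95% interval: ({low:.2f}, {high:.2f})")
print(reject_by_interval(104.5, 100.0, 15.0, 36, 0.05))  # 100 is inside, so False
print(reject_by_interval(104.5, 99.0, 15.0, 36, 0.05))   # 99 is outside, so True
```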
Part 3: Use and Abuse of Tests
- choosing level of significance
- no real clear border between a p-value that is significant and p-value that is insignificant
- determine how small of a p-value you want by considering
- how plausible is H0 : if many people believed in it, or H0 is assumed as true for many years, then must get strong evidence >>> significance level must be very small
- consequences of rejecting H0 : if rejecting H0 means you will go through a big loss, then significance level must be very small to convince people to reject H0
- Significance vs. practical importance
- With very big samples, even small deviations from the null hypothesis come out statistically significant; the same small deviations would not be significant in small samples
- a result that is not statistically significant can still be useful in practice when reported with a confidence interval; confidence intervals give the range where the true mean plausibly lies and show how much the real mean may differ from the null mean
- if the confidence interval is too large, then a larger sample size is needed to accurately determine conclusion
- always check for shape and outliers; outliers can make the data not significant
- confidence intervals really useful b/c it estimates the possible places of the real mean instead of determining if the mean is too large to occur by chance
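To see how sample size alone changes significance, the sketch below keeps the same small deviation (a sample mean of 100.5 against a null mean of 100, with σ = 15) and only changes n. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value for a z test of a mean with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

mu0, sigma, xbar = 100.0, 15.0, 100.5  # same small deviation every time
for n in (25, 400, 10000):
    print(n, round(two_sided_p_value(xbar, mu0, sigma, n), 4))
```

The deviation of 0.5 is nowhere near significant at n = 25 but is highly significant at n = 10000, even though its practical importance has not changed.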
- pay attention to whether there is significance
- sometimes, people will say that they fail to reject the hypothesis even though the confidence interval is very big and a bigger sample size is needed
- small deviations from the null can only be detected (and come out significant) in large samples; be sure that the test you use can detect what you want to find out
- sometimes you can’t infer information from data
- always make sure that the conclusion came from sample that satisfies conditions SRS, Normality, and Independence before trusting it and inferring information on it
- Hawthorne effect: when some change in the environment or a knowledge of something, such as knowing someone is observing them, changes the subjects’ behaviors, and therefore the data
- beware of bias and data from uncontrolled situations with variables that can affect data
- Always follow the inference guideline to determine if you have real evidence or not
- step 1: determine parameter and the null and alternative hypotheses
- step 2: are the required conditions present? (Normality, SRS, and Independence)
- Step 3: find test statistic and p-value
- step 4: state and interpret conclusion in context, and connect conclusions back to calculations of p-value or test statistic
Part 4: Using inference to make decisions
- decision making is different from measuring strength of evidence
- significance test: measure strength of evidence
- decision: based on result of significance test and many other factors
- when we need to make a decision that is mostly based on data, two errors can come up
- Type I: happens when we reject H0 when H0 is actually true
- Type II: happens when we fail to reject H0 when H0 is actually false
- when solving problems, describe these errors and the consequences of these errors in context
- the seriousness of each of these two errors depends on the conditions of the problem or event
- possibility of error
- can’t eliminate all error through sample; to do so will require us to go through the whole population
- by asking what would happen if we did this many times, we can find the probability of having a Type I or Type II error
- to do this, first graph the distribution of the sample mean under H0, then find the critical value. The area under the H0 curve on the rejection side of the critical value (the side where small p-values fall) represents the probability of a Type I error; only the part that belongs to H0 counts
- therefore, the probability of a Type I error is the same as the significance level
- Graph the distribution under Ha (at a specific alternative value), then shade the area under the Ha curve that is on the other side of the critical value of H0 (the fail-to-reject side)
- that area is the probability of a Type II error; use a graphing calculator to calculate the area
- high probability of Type II means that the test is unlikely to detect Ha when Ha is true; this means that the test is not good
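The two shaded areas described above can be computed instead of read off a graph. This sketch assumes a one-sided test with Ha: mean > null mean and known σ; the function name and the numbers (null mean 100, alternative value 105, σ = 15, n = 36, alpha = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist

def error_probabilities(mu0, mu_alt, sigma, n, alpha):
    """Type I and Type II error probabilities for Ha: mean > mu0."""
    se = sigma / math.sqrt(n)
    h0_curve = NormalDist(mu0, se)     # distribution of x-bar if H0 is true
    ha_curve = NormalDist(mu_alt, se)  # distribution of x-bar at the alternative value
    critical_mean = h0_curve.inv_cdf(1 - alpha)  # reject H0 when x-bar is above this
    type_i = 1 - h0_curve.cdf(critical_mean)     # H0 area on the rejection side
    type_ii = ha_curve.cdf(critical_mean)        # Ha area on the fail-to-reject side
    return critical_mean, type_i, type_ii

critical_mean, type_i, type_ii = error_probabilities(
    mu0=100.0, mu_alt=105.0, sigma=15.0, n=36, alpha=0.05)
print(f"critical value: {critical_mean:.2f}")
print(f"P(Type I) = {type_i:.3f}, P(Type II) = {type_ii:.3f}")
```

The Type I probability comes out equal to alpha by construction, while the Type II probability depends on which alternative value is chosen.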
- What is power
- power: probability that H0 is rejected when a particular alternative value (a value that belongs in Ha) is true
- must determine significance level and alternative value before calculating power
- calculated by 1 - probability of Type II error
- depends on what specific mean value of Ha is
- p-value vs. power : what do they state?
- p-value: states the probability of getting a value of the test statistic as extreme as or more extreme than the observed value if H0 is true
- power: states the test’s ability to reject H0 when H0 is false
- increasing power
- 80% power is the standard
- to increase power,
- increase significance level (alpha)
- consider an alternative value further from the value of H0
- increase sample size (will provide more info about mean of sample)
- decrease standard deviation; can do this by...
- restricting the sample to a more uniform subpopulation
- improving measuring process
- low power = very likely to commit a Type II error
- to maximize power, choose a large sample size and as high a significance level as you are willing to use
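Each way of increasing power listed above can be tried out numerically. This is a sketch for a one-sided test with Ha: mean > null mean and known σ; the function name and the baseline numbers (null mean 100, alternative value 105, σ = 15, n = 36, alpha = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist

def power_upper_tail(mu0, mu_alt, sigma, n, alpha):
    """Power = 1 - P(Type II error) for Ha: mean > mu0."""
    se = sigma / math.sqrt(n)
    critical_mean = NormalDist(mu0, se).inv_cdf(1 - alpha)
    return 1 - NormalDist(mu_alt, se).cdf(critical_mean)

baseline = dict(mu0=100.0, mu_alt=105.0, sigma=15.0, n=36, alpha=0.05)
scenarios = {
    "baseline": baseline,
    "higher significance level": {**baseline, "alpha": 0.10},
    "alternative further from H0": {**baseline, "mu_alt": 110.0},
    "larger sample size": {**baseline, "n": 100},
    "smaller standard deviation": {**baseline, "sigma": 10.0},
}
powers = {name: power_upper_tail(**settings) for name, settings in scenarios.items()}
for name, value in powers.items():
    print(f"{name}: {value:.3f}")
```

With these made-up numbers the baseline falls short of the 80% standard, and each change on its own raises the power.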