AP Statistics Ch 10: Estimating with Confidence
Before starting, you should know...
- statistical inference: methods for drawing conclusions about a population based on data from a sample
- since diff. samples can lead to diff. conclusions, we can’t be sure our conclusion is correct; can only use probability to say how trustworthy the conclusion is
- two types of statistical inference
- confidence intervals (ch 10)
- estimate value of population parameter
- significance tests (ch 11)
- assess evidence for claim about population
- both types report probabilities that describe what would happen if we used the inference method many times
- these probabilities rely on the predictable long-run behavior that results from many trials
- most reliable if sample or experiment is randomized
Part 1: Confidence intervals: the basics
- sample mean will vary slightly if we take different samples of the same size from the same population
- confidence interval: uses mean of sample to tell us about mean of population
- rare for mean of sample (x-bar) to be exactly equal to mean of population (mu), so there will be some error in the estimation; the sampling distribution of x-bar tells us how big that error is likely to be
- sampling distribution of x-bar gets close to Normal in large samples, so use the 68-95-99.7 rule; find the standard deviation of x-bar, which is o-- / sqrt(n); in 95% of samples, x-bar is within 2 of these standard deviations of mu; in those same samples, mu is within 2 standard deviations of x-bar too
- in 95% of samples, mu is between x-bar - 2(o-- / sqrt(n)) and x-bar + 2(o-- / sqrt(n))
- this interval is called a 95% confidence interval; a confidence interval: range of numbers calculated from a sample by a method that captures the parameter in a certain percentage of all samples
- very likely that any one sample has mu between x-bar - 2(o-- / sqrt(n)) and x-bar + 2(o-- / sqrt(n))
- gives an interval that contains mu in 95% of all samples; use to estimate the error we get when estimating mu
- 5% of samples will not have mu in the confidence interval, so we are 95% confident that mu is in the interval, aka we used a method that captures mu in 95% of all samples
- don’t know whether sample we get is one of the 5% or the 95%
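A minimal simulation sketch of what "95% of samples" means, assuming a Normal population. The population values (mu = 50, o-- = 10), the sample size, and the function name `coverage` are all made up for illustration. It draws many samples, builds x-bar + / - 2 standard deviations of x-bar for each one, and counts how often the interval contains mu.

```python
import random
from statistics import mean


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated samples whose interval
    x-bar +/- 2 * sigma / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    moe = 2 * sigma / n ** 0.5         # 2 standard deviations of x-bar
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - moe <= mu <= x_bar + moe:
            hits += 1
    return hits / trials


# Hypothetical population: mu = 50, sigma = 10; samples of size 25.
rate = coverage(mu=50, sigma=10, n=25, trials=2000)
print(rate)  # should land near 0.95
```

For any single sample we still can't tell whether its interval is one of the hits or one of the misses; the 95% describes the method over many samples.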
- Level C confidence interval
- two parts: confidence interval and confidence level C
- confidence interval: has the form estimate + / - margin of error
- + / - = plus or minus
- confidence level C: usually 90% or higher
- expressed in decimal form when finding z* (shown below) or t* (shown in part 2)
- calculating confidence interval when know standard deviation but not mean
- only construct confidence intervals when data is from SRS, sampling distribution is Normal, and individual observations are independent
- SRS: data comes from SRS of size n
- Normal: sampling distribution of sample mean approximately Normal; true if population is Normal or if sample is large (at least 30)
- Independence: is required to use standard deviation equation o-- / sqrt(n); observations independent if sampling w/ replacement (rare), or sampling w/o replacement when population at least ten times as large as sample
- first, use Normal curve and z-scores, which are called z* in this situation
- determine the confidence level C, then mark off area (1-C)/2 in each tail of the curve
- look up z* in table A; mark it on the Normal curve; since Normal curve is symmetric, the second critical value will be the negative of the first one
- second critical value = -z*, which is drawn to the left of the mean
- the area between -z* and z* will be the confidence level C
- you now have the critical values: z* and -z*
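A short sketch of the z* lookup done with Python's standard library instead of table A. The function name `z_critical` is made up for illustration; `NormalDist().inv_cdf` gives the z-score with a given area to its left.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z* such that the area between -z* and z* under the
    standard Normal curve equals the confidence level C."""
    tail = (1 - confidence) / 2              # area left in each tail
    return NormalDist().inv_cdf(1 - tail)    # z-score with area 1 - tail to its left


for c in (0.90, 0.95, 0.99):
    print(c, round(z_critical(c), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```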
- now Level C confidence interval for mu is
- from x-bar - z* x o-- / sqrt(n) to x-bar + z* x o-- / sqrt(n)
- z* determines area between -z* and z*
- exact if population is exactly Normal; only approximate if not (then sample needs to be big enough for sampling distribution of x-bar to be close to Normal)
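The interval formula as a small sketch. The name `z_interval` and the numbers in the example (x-bar = 100, o-- = 15, n = 36) are hypothetical, and the code assumes the SRS, Normal, and Independence conditions were already checked.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Level C confidence interval for mu when sigma is known:
    x-bar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z_star * sigma / sqrt(n)     # margin of error
    return x_bar - moe, x_bar + moe


# Hypothetical data summary: x-bar = 100, sigma = 15, n = 36, C = 95%.
low, high = z_interval(x_bar=100, sigma=15, n=36, confidence=0.95)
print(round(low, 2), round(high, 2))   # 95.1 104.9
```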
- how to calculate Level C confidence interval based on data
- know what the population is and what you want to know about the population
- based on the conditions, decide what method you will use to determine the confidence interval, then carry out the calculations
- interpret the results
- sentence structure: We are ___ % confident that the true mean of the ______ is between ___ and ___
- behavior of confidence intervals
- margin of error (MOE) changes as choice of confidence level changes
- MOE gets smaller when...
- z* is smaller, which means less confidence
- o-- gets smaller, but very difficult to do
- n gets larger; however, must take four times as many observations just to cut MOE in half (because of the square root)
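A quick numeric check of the three ways to shrink the margin of error. The function name and the starting values (o-- = 10, n = 25, C = 95%) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """MOE = z* * sigma / sqrt(n) for a level C interval with sigma known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)


base = margin_of_error(sigma=10, n=25, confidence=0.95)
lower_conf = margin_of_error(sigma=10, n=25, confidence=0.90)   # smaller z*
smaller_sd = margin_of_error(sigma=5, n=25, confidence=0.95)    # smaller sigma
four_times_n = margin_of_error(sigma=10, n=100, confidence=0.95)

print(round(base, 3), round(lower_conf, 3),
      round(smaller_sd, 3), round(four_times_n, 3))
```

Quadrupling n gives exactly half the starting MOE, which is the square-root effect in the last bullet.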
- determining sample size
- w/ enough observations, can get both large confidence and small MOE
- to find the sample size necessary, set (desired z*) x (o--)/(sqrt(n)) to be less than or equal to (specified margin of error m), then solve for n
- z* x o-- / sqrt(n) is less than or equal to m, which gives n is at least (z* x o-- / m)^2
- always round up when dealing with sample size
- sample size determines margin of error while population size doesn’t
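A sketch of the sample-size calculation for a mean, solving z* x o-- / sqrt(n) <= m for n and rounding up. The function name and the example values (o-- = 15, m = 3) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin,
    i.e. n >= (z* * sigma / margin) ** 2, always rounded up."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)


# Hypothetical goal: 95% confidence, margin of error at most 3, sigma = 15.
n = sample_size_for_mean(sigma=15, margin=3, confidence=0.95)
print(n)  # 97
```

Population size never appears in the calculation, which matches the last bullet above.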
- remember
- data must be from SRS of population formed by random selection (applies to random observations too); the method discussed here can’t be used on samples formed by more complicated methods than SRS
- formulas cannot fix badly sampled data and can’t produce trustworthy conclusions from that data; outliers can also change the results, so look for them and correct them or justify removing them before calculating
- shape of population distribution can affect results; with skewed and other non-Normal shapes, the true confidence level will be different from the one you calculate; level C confidence interval only depends on distribution of x-bar; however, when sample size is at least 15, confidence level is not really affected by non-Normal shapes except when there are very strong outliers or skewness
- standard deviation must be known
- MOE (margin of error) tells amount of error to expect b/c of chance variation; it only covers random sampling error and will not fix bias, non-response, and other practical errors
- every method of inference has some kind of warning and condition
- conditions rarely fully met when dealing with these problems in real life; data should be judged and analyzed first
- just because we are x% confident, we cannot say that there is an x% chance that mu is in the interval we calculated; mu is a fixed number, so either it is or is not in the interval, and the probability that mu is in the interval is 0 or 1
- after we get an interval, no randomness remains; the x% describes how often the method works, not this one interval
Part 2: Estimating population mean
- do not know o-- but still want to calculate confidence interval
- will need to estimate o-- first; use sample’s standard deviation s to estimate
- o-- is around s
- change (o--) / (sqrt(n)) to s / sqrt(n)
- called standard error of sample mean; standard error: the standard deviation of a statistic when it is estimated from the data
- still need to meet the same three conditions to use this method: data from SRS, have Normality, and outcomes are independent
- for Normality: if sample size is larger than 30, then count it as Normal; if not, then it can either be stated in problem or determined by graphing data then looking for the shape
- t distribution
- when we substitute s / sqrt(n) for o-- / sqrt(n), the standardized statistic (x-bar - mu) / (s / sqrt(n)) does not follow a Normal distribution anymore, especially if the sample size is really small
- gets closer and closer to Normal as sample size (and therefore df) grows
- follows a t distribution instead, and t* will be our critical value
- similar to Normal in shape, but spread and area of tails of t distribution greater than that of Normal distribution
- extra info: a z-distribution is the distribution when we use z* as our critical value
- there is a different t distribution for each sample size
- identified with degrees of freedom, or df; for these procedures df = n - 1
- use the t-distribution table (Table B, critical values t*) to find t*
- df on the side and confidence level C on the bottom; upper-tail probability (area above the critical value) is on top
- one-sample t confidence interval
- like the level C confidence interval
- equation: x-bar + / - t* x s / sqrt(n), where t* comes from the t distribution with df = n - 1
- + / - means plus or minus
- interval is exact if population distribution is exactly Normal and approximately correct for large samples otherwise
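A minimal sketch of the one-sample t interval. Python's standard library has no t-distribution lookup, so this version assumes t* is read off the t table and passed in. The function name `t_interval` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_star):
    """One-sample t interval: x-bar +/- t* * s / sqrt(n).
    t_star must be looked up for df = n - 1 and the chosen confidence level."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / sqrt(n)     # standard error: s / sqrt(n)
    moe = t_star * se
    return x_bar - moe, x_bar + moe


# Hypothetical sample of n = 10, so df = 9.
# Table value for 95% confidence and df = 9 is t* = 2.262.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
low, high = t_interval(sample, t_star=2.262)
print(round(low, 3), round(high, 3))
```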
- still use the same steps to calculate interval for data:
- know what the population is and what you want to know about the population
- based on the conditions, decide what method you will use to determine the confidence interval, then carry out the calculations
- interpret the results
- sentence structure: We are ___ % confident that the true mean of the ______ is between ___ and ___
- form: estimate + / - t* x SE of the estimate
- SE = standard error
- only need to know level C and df
- if df does not show on the t table, choose the greatest df on the table that is less than the actual df >>> gives us wider confidence interval than we need
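A sketch of the "use the next smaller df" rule. The dictionary holds a few 95% critical values as they appear in a standard printed t table; the names `T_STAR_95` and `conservative_t_star` are made up for illustration.

```python
# A few rows of a standard t table, 95% confidence column (df -> t*).
T_STAR_95 = {
    30: 2.042,
    40: 2.021,
    50: 2.009,
    60: 2.000,
    80: 1.990,
    100: 1.984,
    1000: 1.962,
}


def conservative_t_star(df, table):
    """Return t* for the greatest table df that does not exceed the actual df.
    A smaller df means a larger t*, so the interval is a bit wider than needed."""
    usable = [row for row in table if row <= df]
    if not usable:
        raise ValueError("df is smaller than every row in this table")
    return table[max(usable)]


print(conservative_t_star(45, T_STAR_95))   # 2.021 (uses the df = 40 row)
print(conservative_t_star(60, T_STAR_95))   # 2.0 (exact row exists)
```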
- one-sample t problems not as common as paired t problems b/c a study without a comparison is not as convincing
- paired t procedures
- compare observations of two treatments in matched pairs design or of before-and-after measurements on same subjects
- uses one-sample t procedure
- parameter is the population mean difference: the mean of the differences between
- responses to 2 treatments in matched-pair subjects in population
- responses to 2 treatments given to the same individual in a population
- before-after measurements of all individuals in population (both measurements carried out on same individual)
- so calculate the difference for each pair, then use the mean of those differences for x-bar and their standard deviation for s
- sentence structure: I am ___% confident that the actual mean difference in ______ for the population is between ___ and ___
- positive and negative differences do make a difference! keep the sign of each difference and subtract in the same order every time
- since many paired t problems do not have samples from SRS, can only say data shows evidence of an effect, but can’t extend the conclusion to the population
- diff between random selection and random assignment
- random selection >>> conclusion about population
- random assignment >>> shows whether there is evidence treatment caused effect
- DO NOT calculate as if there are two samples, b/c the two measurements in a pair are not independent of each other, and two-sample methods treat the samples as if they were chosen independently
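A sketch of a paired t interval: take the difference within each pair first, then run the one-sample t calculation on the differences. The before/after values are made up, and t* is assumed to come from the t table (df = number of pairs - 1).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_interval(before, after, t_star):
    """Paired t interval for the mean difference (after - before).
    Works on the list of within-pair differences, not on two separate samples."""
    if len(before) != len(after):
        raise ValueError("each 'before' value needs a matching 'after' value")
    diffs = [a - b for b, a in zip(before, after)]   # keep the sign of each difference
    n = len(diffs)
    d_bar = mean(diffs)
    moe = t_star * stdev(diffs) / sqrt(n)
    return d_bar - moe, d_bar + moe


# Hypothetical before/after scores for 5 subjects, so df = 4.
# Table value for 95% confidence and df = 4 is t* = 2.776.
before = [72, 68, 75, 80, 66]
after = [75, 70, 74, 85, 70]
low, high = paired_t_interval(before, after, t_star=2.776)
print(round(low, 3), round(high, 3))   # -0.258 5.458
```

The differences here are 3, 2, -1, 5, 4; the one negative difference is what pulls the lower end of the interval below zero.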
- robust t procedures
- since no sample in real life is exactly Normal, t confidence interval not exact
- procedures’ usefulness depends on how resistant they are to lack of Normality
- robust: when an inference procedure’s results remain fairly accurate even when one condition for use is violated >>> confidence level still about right
- not robust against outliers if small sample >>> can’t claim the stated confidence level
- if no outliers, can be pretty robust against non-Normality
- if skewed, large sample size can fix (b/c Central Limit Theorem)
- small samples: always check shape and outliers
- rules for t procedures
- unless there is a small sample size, SRS is more important than Normality
- sample size less than 15 >>> only use t procedures if data close to Normal and no outliers
- sample size at least 15 >>> can use t procedures except if have strong outliers or strong skewness
- sample size at least 30 >>> can use t procedures even if distribution is clearly skewed
- if sample is biased (not randomly selected) or is actually the whole population, don’t use t procedures
Part 3: Estimating population proportion
- interested in proportion of population that fits some requirement, which we will call success
- conditions for inference
- based on sampling distribution of statistic
- in large samples, sample proportion p-hat is close to population proportion p
- standard error of p-hat will be sqrt((p-hat x (1 - p-hat))/n)
- confidence interval becomes
- estimate + / - z* SE
- to use z procedures: data obtained by SRS, sampling distribution of p-hat is approximately Normal (counted as Normal when n(p-hat) and n(1 - p-hat) are both at least ten), and outcomes are independent
- Normal condition is stated in terms of p-hat b/c p is unknown
- z procedures for proportions
- level C confidence interval = p-hat + / - z* x sqrt((p-hat x (1 - p-hat))/n)
- z* still the critical value that leaves area (1-C)/2 in each tail of the standard Normal curve
- interpretation sentence structure: I am ___% confident that the proportion of _____ lies between ___ and ___.
- margin of error only describes random sampling error
- there are other sources of error that can’t be accounted for by margin of error
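A sketch of the z interval for a proportion, with the Normal condition checked through the counts of successes and failures (the same thing as n(p-hat) and n(1 - p-hat)). The function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Level C interval for p: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("Normal condition not met: need at least 10 successes and 10 failures")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)     # standard error of p-hat
    moe = z_star * se
    return p_hat - moe, p_hat + moe


# Hypothetical SRS: 120 successes out of n = 400, C = 95%.
low, high = proportion_interval(successes=120, n=400, confidence=0.95)
print(round(low, 3), round(high, 3))   # 0.255 0.345
```

The code can only check the Normal condition; whether the data came from an SRS and whether outcomes are independent still has to be judged from how the data were collected.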
- inference toolbox summary
- step 1: determine population and parameter of interest
- step 2: do they pass the conditions? (Normality, SRS, Independence?)
- step 3: decide on method and then calculate
- step 4: interpret what you found out
- sample size
- estimating parameter to certain confidence and margin of error
- set z* x sqrt((p* x (1 - p*))/n) to be less than or equal to the margin of error you desire, then solve for n
- to get p*, you can base estimate on experience... try several values of p* to cover the range where you expect p-hat to fall if you are doing this
- can also set p* to 0.5 b/c margin of error largest at p* = 0.5 >>> real margin of error for any p-hat value will be no larger than planned
- if p* is from 0.3 to 0.7, use p* = 0.5; if not, then using p* = 0.5 will give larger sample than needed, so estimate based on experience
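A sketch of the sample-size calculation for a proportion, solving z* x sqrt(p*(1 - p*)/n) <= m for n and rounding up. The function name, the default guess p* = 0.5, and the 3% margin in the example are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence, p_guess=0.5):
    """Smallest n with z* * sqrt(p*(1 - p*)/n) <= margin,
    i.e. n >= (z* / margin) ** 2 * p* * (1 - p*), always rounded up."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


# 95% confidence, margin of error at most 0.03.
print(sample_size_for_proportion(0.03, 0.95))               # 1068 (p* = 0.5)
print(sample_size_for_proportion(0.03, 0.95, p_guess=0.2))  # 683
```

The second call shows why p* = 0.5 is wasteful when the true proportion is far from 0.5: the guess p* = 0.2 needs far fewer observations for the same margin.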
Websites I found helpful: