AP Statistics: February 2014

AP Statistics Ch 10: Estimating with Confidence

Before starting, you should know...

statistical inference: method for drawing conclusions about a population based on data from a sample
since diff. samples can lead to diff. conclusions, we can’t be sure our conclusion is correct; can only use probability to describe how strong or weak the conclusion is
two types of statistical inference

confidence intervals (ch 10)

estimate value of population parameter

significance tests (ch 11)

assess evidence for claim about population

shows probabilities about what would happen if we used the inference method many times
requires the predictable long-run behavior that comes from chance over many trials
only trustworthy if the data come from a random sample or a randomized experiment

Part 1: Confidence intervals: the basics

sample mean varies from sample to sample when we take different samples of the same size from the same population
confidence interval: what the sample mean tells us about the population mean

rare for the sample mean (x-bar) to be exactly equal to the population mean (mu), so there is some error in the estimate; the sampling distribution of x-bar helps us find how big that error is likely to be

the sampling distribution of x-bar is close to Normal for large samples, so use the 68-95-99.7 rule: the standard deviation of x-bar is $\sigma/\sqrt{n}$, and in 95% of samples x-bar is within 2 standard deviations of mu; this means that in those samples mu is also within 2 standard deviations of x-bar

in 95% of samples, mu is between $\bar{x} - 2\sigma/\sqrt{n}$ and $\bar{x} + 2\sigma/\sqrt{n}$

this interval is called a 95% confidence interval; in general, a confidence interval is a range of values computed from the sample by a method that captures the parameter in a stated percentage of all samples
so for any one sample, it is very likely (but not certain) that mu is between $\bar{x} - 2\sigma/\sqrt{n}$ and $\bar{x} + 2\sigma/\sqrt{n}$

narrows the estimate down to an interval that contains mu in 95% of all samples; use it to describe how much error to expect when estimating mu

5% of samples will give an interval that misses mu, so we say we are 95% confident that mu is in the interval, i.e. we used a method that captures mu in 95% of samples

don’t know whether sample we get is one of the 5% or the 95%
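To see what "95% confident" means in practice, here is a small Python simulation. It is only a sketch: the population values (mu = 50, σ = 10) and the sample size n = 25 are made up. It draws many samples from a Normal population, builds $\bar{x} \pm z^* \sigma/\sqrt{n}$ for each one, and counts how often the interval captures mu.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, conf=0.95, reps=10_000, seed=1):
    """Fraction of simulated samples whose level-C z interval captures mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    moe = z_star * sigma / n ** 0.5                     # same MOE every time, since sigma is known
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))  # one random sample from a Normal population
        if xbar - moe <= mu <= xbar + moe:
            hits += 1
    return hits / reps

print(coverage())            # close to 0.95
print(coverage(conf=0.90))   # close to 0.90
```

Roughly 95% of the intervals capture mu, but for any single interval we still can't tell whether it is one of the hits or one of the misses.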

Level C confidence interval

two parts: confidence interval and confidence level C

confidence interval: has the form estimate + / - margin of error

+ / - = plus or minus

confidence level C: usually 90% or higher

expressed in decimal form when finding z* (shown below) or t* (shown in part 2)

calculating a confidence interval when we know the population standard deviation σ but not the mean

only construct these confidence intervals when the data come from an SRS, the sampling distribution of x-bar is Normal, and individual observations are independent

SRS: data comes from SRS of size n
Normal: sampling distribution of the sample mean approximately Normal; this holds if the population is Normal, or if the sample is large (at least 30) by the Central Limit Theorem
Independence: required to use the standard deviation formula $\sigma/\sqrt{n}$; observations are independent if sampling with replacement (rare), or sampling without replacement when the population is at least ten times as large as the sample

first, use Normal curve and z-scores, which are called z* in this situation

determine the confidence level C, then leave an area of (1-C)/2 in each tail
find z* from Table A (it has area 1 - (1-C)/2 to its left); mark it on the Normal curve; since the Normal curve is symmetric, the second critical value is the negative of the first one

second z* = -z*, which is generally drawn to the left of mean
the area contained within z* and -z* will be the confidence level C

you now have the critical values: z* and -z*
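Instead of reading Table A, z* can also be computed. A minimal Python sketch using the standard library's `statistics.NormalDist` (the function name `z_star` is just for illustration):

```python
from statistics import NormalDist

def z_star(conf):
    """Critical value z* for a level-C interval: leaves (1 - C)/2 in each tail."""
    upper_tail = (1 - conf) / 2
    return NormalDist().inv_cdf(1 - upper_tail)   # positive critical value; the other one is -z*

for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```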

now the Level C confidence interval for mu is

$\bar{x} - z^* \sigma/\sqrt{n}$ to $\bar{x} + z^* \sigma/\sqrt{n}$

z* is chosen so that the area under the standard Normal curve between -z* and z* is C

exact when the population is exactly Normal; only approximately correct otherwise (the sample still needs to be large enough for the sampling distribution of x-bar to be approximately Normal)
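A short Python sketch of the z interval for mu when σ is known. The numbers (x-bar = 100, σ = 15, n = 25) are hypothetical and only there to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Level-C z interval for mu when the population standard deviation sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z*
    moe = z * sigma / sqrt(n)                      # margin of error
    return xbar - moe, xbar + moe

# Hypothetical numbers: x-bar = 100, sigma = 15, n = 25
low, high = z_interval(100, 15, 25)
print(round(low, 2), round(high, 2))  # 94.12 105.88
```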

how to calculate a Level C confidence interval from data

identify the population and the parameter you want to know about
based on the conditions, decide which method to use to construct the interval, then carry out the calculations
interpret the results

sentence structure: We are ___ % confident that the true mean of the ______ is between ___ and ___

behavior of confidence intervals

margin of error (MOE) changes as choice of confidence level changes
MOE gets smaller when...

z* is smaller, which means less confidence
σ gets smaller, but this is very difficult to do
n gets larger; however, it takes four times as many observations to cut the MOE in half (because of the square root)
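A quick Python check of these rules, with made-up values (σ = 10): quadrupling n from 25 to 100 halves the MOE, while raising the confidence (z* from 1.96 to 2.576) makes it larger.

```python
from math import sqrt

def margin_of_error(z_star, sigma, n):
    """MOE = z* * sigma / sqrt(n)."""
    return z_star * sigma / sqrt(n)

m_25 = margin_of_error(1.96, 10, 25)    # 95% confidence, n = 25
m_100 = margin_of_error(1.96, 10, 100)  # four times the sample size
m_99 = margin_of_error(2.576, 10, 25)   # 99% confidence, n = 25
print(round(m_25, 2), round(m_100, 2), round(m_99, 3))  # 3.92 1.96 5.152
```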

determining sample size

w/ enough observations, can get both large confidence and small MOE
to find the necessary sample size, require the margin of error for the desired z* to be less than or equal to the specified margin of error m

simplifies to $z^* \sigma/\sqrt{n} \le m$, which solves to $n \ge (z^* \sigma / m)^2$

always round up when dealing with sample size
sample size determines margin of error while population size doesn’t
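The sample-size rule can be sketched in Python as follows (hypothetical values: σ = 15, desired margin of error m = 3, 95% confidence). `math.ceil` does the "always round up" step.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, m, conf=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= m, i.e. n >= (z* sigma / m)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / m) ** 2)

print(sample_size_for_mean(sigma=15, m=3))  # 97  ((1.96 * 15 / 3)^2 is about 96.04, so round up)
```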

remember

data must come from an SRS of the population (the same goes for random observations); the method here can’t be used on samples from designs more complicated than an SRS
formulas can’t fix badly sampled data and can’t produce trustworthy conclusions from that data; outliers can also change the results a lot, so look for them and try to correct them or justify removing them
the shape of the population distribution can affect results: for skewed or other non-Normal populations, the actual confidence level can differ from the one you calculate; the Level C interval only depends on the sampling distribution of x-bar, so when the sample size is 15 or more, the confidence level is not much affected by non-Normal shapes unless there are very strong outliers or strong skewness
standard deviation must be known

MOE (margin of error) tells how much error to expect because of chance variation in sampling; it only covers sampling error and does not fix bias, nonresponse, or other practical errors
every method of inference has its own conditions and cautions

conditions rarely fully met when dealing with these problems in real life; data should be judged and analyzed first

just because we are x% confident, we cannot say there is an x% chance that mu is in the interval; once the interval is computed, mu either is or is not in it, so that probability is 0 or 1 (x-bar is always in the interval, since it is the center)

after we have calculated an interval, no randomness remains

Part 2: Estimating population mean

we do not know σ but still want to calculate a confidence interval for mu

need to estimate σ first; use the sample standard deviation s to estimate it

σ is estimated by s
replace $\sigma/\sqrt{n}$ with $s/\sqrt{n}$
$s/\sqrt{n}$ is called the standard error of the sample mean; a standard error is the standard deviation of a statistic estimated from the data

still need the same three conditions to use this method: data from an SRS, Normality, and independent observations

for Normality: if the sample size is 30 or more, count it as Normal; if not, Normality can be stated in the problem or checked by graphing the data and looking at the shape

t distribution

when we substitute $s/\sqrt{n}$ for $\sigma/\sqrt{n}$, the standardized statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$ is no longer Normal, especially when the sample size is really small

its distribution gets more and more Normal as the sample size (and therefore df) grows
it follows a t distribution, and t* will be our critical value

similar in shape to the Normal (symmetric, bell-shaped, centered at 0), but the t distribution has more spread and more area in its tails than the Normal distribution

extra info: a z-distribution is the distribution when we use z* as our critical value

a different t distribution for each sample size
identified by its degrees of freedom, df = n - 1
use Table B (t distribution critical values) to find t*

df down the side and confidence level C along the bottom; the upper-tail probability (area above the critical value) along the top
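In practice t* comes from Table B (or a library function such as SciPy's `scipy.stats.t.ppf`). The standard-library-only sketch below just shows what the table contains: it integrates the t density numerically (Simpson's rule) and searches for the t* that leaves area C between -t* and t*. The function names are made up for illustration.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (uses symmetry about 0)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(conf, df):
    """Critical value t* with area conf between -t* and t*, found by bisection.

    Assumes t* < 100, which holds for the usual 90/95/99% levels.
    """
    target = 1 - (1 - conf) / 2      # area to the left of t*
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_star(0.95, 9), 3))  # 2.262, the Table B entry for df = 9
```

As df grows, t* moves down toward the z* value for the same C, which matches the t distribution getting closer to Normal.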

one-sample t confidence interval

like the level C confidence interval
equation: $\bar{x} \pm t^* \, s/\sqrt{n}$

+ / - means plus or minus

interval is exact if the population distribution is exactly Normal, and approximately correct for large samples otherwise
still use the four steps to calculate the interval from data:

identify the population and the parameter you want to know about
based on the conditions, decide which method to use to construct the interval, then carry out the calculations
interpret the results

sentence structure: We are ___ % confident that the true mean of the ______ is between ___ and ___

form: $\text{estimate} \pm t^* \cdot SE_{\text{estimate}}$

SE = standard error

to find t*, only need to know level C and df

if your df is not in Table B, use the greatest df in the table that is less than the actual df >>> gives a slightly wider (more conservative) confidence interval than needed
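A minimal Python sketch of the one-sample t interval from raw data. It assumes t* has already been read from Table B for df = n - 1. The ten measurements are hypothetical; with df = 9 and 95% confidence, Table B gives t* = 2.262.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t_interval(data, t_star):
    """x-bar +/- t* * s / sqrt(n); t* must match df = n - 1 and the chosen C."""
    n = len(data)
    xbar = fmean(data)
    se = stdev(data) / sqrt(n)   # standard error of the sample mean (stdev divides by n - 1)
    return xbar - t_star * se, xbar + t_star * se

# Hypothetical sample of n = 10 measurements
data = [12, 14, 15, 11, 13, 16, 14, 12, 15, 13]
low, high = one_sample_t_interval(data, t_star=2.262)
print(round(low, 3), round(high, 3))  # 12.369 14.631
```

Here x-bar = 13.5 and s/√n = 0.5, so the margin of error is 2.262 × 0.5 = 1.131.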

these problems not as common as paired t procedures b/c not as convincing

paired t procedures

compare observations of two treatments in matched pairs design or of before-and-after measurements on same subjects
uses one-sample t procedure
parameter mu is the population mean of the differences in observations between

responses to 2 treatments in matched-paired subjects in population
responses to 2 treatments in single individuals in a population
before-after measurements of all individuals in population (one set of measurements carried out on same individual)
so calculate the difference for each pair, then use the mean of the differences as x-bar (and their standard deviation as s)

sentence structure: I am ___% confident that the actual mean difference in ______ for the population is between ___ and ___

positive and negative differences do make a difference! (subtract in the same order for every pair)

since many paired t studies do not use an SRS, we can only say the data show evidence (e.g. that a treatment had an effect), but we can’t generalize the conclusion to a population
diff between random selection and random assignment

random selection >>> conclusion about population
random assignment >>> shows whether there is evidence treatment caused effect

DO NOT analyze as if there are two independent samples: the pairing means the two sets of observations are not independent of each other, and two-sample methods assume they are
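A Python sketch of the paired t procedure: take the difference for each pair (after minus before, same order every time) and run a one-sample t interval on the differences. The before/after scores are hypothetical; with 5 pairs, df = 4, and Table B gives t* = 2.776 for 95% confidence.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_interval(before, after, t_star):
    """t interval for the mean difference (after - before), treating the differences as ONE sample.

    t* must match df = (number of pairs) - 1.
    """
    if len(before) != len(after):
        raise ValueError("paired data needs the same number of observations in each list")
    diffs = [a - b for b, a in zip(before, after)]   # same subtraction order for every pair
    n = len(diffs)
    d_bar = fmean(diffs)
    se = stdev(diffs) / sqrt(n)
    return d_bar - t_star * se, d_bar + t_star * se

# Hypothetical before/after scores for 5 subjects
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 15]
low, high = paired_t_interval(before, after, t_star=2.776)
print(round(low, 2), round(high, 2))  # 0.92 2.28
```

The whole interval lies above 0, which is consistent with the scores going up on average.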

robust t procedures

since no population in real life is exactly Normal, t confidence intervals are not exact
procedures’ usefulness depends on how robust they are to a lack of Normality

robust: an inference procedure is robust if its results remain accurate and are not much affected when one condition for its use is violated >>> confidence interval still accurate

t procedures are not robust against outliers in small samples >>> can’t trust the stated confidence level
if no outliers, can be pretty robust against non-Normality

if skewed, large sample size can fix (b/c Central Limit Theorem)

small samples: always check shape and outliers

rules for t procedures

unless the sample size is small, having an SRS is more important than Normality
sample size less than 15 >>> only use t procedures if data are close to Normal with no outliers
sample size at least 15 >>> can use t procedures except with strong outliers or strong skewness
sample size at least 30 >>> can use t procedures even if the data are clearly skewed
if the sample is biased (not randomly selected) or is actually the whole population, don’t use t procedures
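The rules of thumb above can be written as a small decision function. This is a hypothetical study aid that encodes only these notes' rules; it is not a replacement for actually plotting the data.

```python
def t_procedures_ok(n, random_sample, close_to_normal, strong_outliers_or_skew,
                    is_population=False):
    """Hypothetical helper encoding the rules of thumb in these notes.

    close_to_normal: a plot of the sample looks roughly Normal with no outliers
    strong_outliers_or_skew: a plot shows strong outliers or strong skewness
    """
    if not random_sample or is_population:
        return False                          # biased sample or whole population: don't use t
    if n < 15:
        return close_to_normal                # small n: data must look close to Normal, no outliers
    if n < 30:
        return not strong_outliers_or_skew    # 15 <= n < 30: OK unless strong outliers or skewness
    return True                               # n >= 30: OK even for skewed data (Central Limit Theorem)

print(t_procedures_ok(12, True, close_to_normal=True, strong_outliers_or_skew=False))   # True
print(t_procedures_ok(20, True, close_to_normal=False, strong_outliers_or_skew=True))   # False
print(t_procedures_ok(45, True, close_to_normal=False, strong_outliers_or_skew=True))   # True (large n)
```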

Part 3: Estimating population proportion

interested in proportion of population that fits some requirement, which we will call success
conditions for inference

based on the sampling distribution of p-hat
in large samples, the sample proportion p-hat is close to the population proportion p
standard error of p-hat: $SE = \sqrt{\hat{p}(1 - \hat{p})/n}$
confidence interval becomes

estimate + / - z* SE

to use z procedures: data obtained by an SRS; Normal: n(p-hat) and n(1 - p-hat) are both at least 10; Independent: outcomes independent (population at least 10 times as large as the sample when sampling without replacement)

conditions are stated in terms of p-hat because p is unknown

z procedures for proportions

level C confidence interval: $\hat{p} \pm z^* \sqrt{\hat{p}(1 - \hat{p})/n}$
z* is still the critical value that leaves (1-C)/2 in each tail
interpretation sentence structure: I am ___% confident that the proportion of _____ lies between ___ and ___.
margin of error only describes random sampling error

there are other sources of error that can’t be accounted for by margin of error
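A minimal Python sketch of the one-proportion z interval, including the Normal-condition check from these notes. The survey counts (540 "yes" out of 1000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf=0.95):
    """One-proportion z interval: p-hat +/- z* sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    # Normal condition: n * p-hat and n * (1 - p-hat) both at least 10
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("too few successes or failures for the Normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 540 of 1000 sampled people answer "yes"
low, high = proportion_interval(540, 1000)
print(round(low, 3), round(high, 3))  # 0.509 0.571
```

This margin of error (about 0.031) only covers random sampling error, not nonresponse or other practical problems.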

inference toolbox summary

step 1: identify the population and the parameter of interest
step 2: do the data pass the conditions? (SRS, Normality, Independence)
step 3: decide on a method and then calculate
step 4: interpret what you found out

sample size

estimating parameter to certain confidence and margin of error
set $z^* \sqrt{p^*(1 - p^*)/n} \le m$, where m is the margin of error you want, then solve for n

to get p*, you can base the guess on experience; if you do this, try several values of p* to cover the range of likely p-hat values
can also set p* = 0.5, because the margin of error is largest at p* = 0.5 >>> the real margin of error for any p-hat will be no larger than planned
if p is expected to be from 0.3 to 0.7, use p* = 0.5; if not, using p* = 0.5 gives a larger sample than needed, so estimate p* based on experience
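Solving $z^* \sqrt{p^*(1 - p^*)/n} \le m$ for n gives $n \ge (z^*/m)^2 \, p^*(1 - p^*)$. Here is a Python sketch with a hypothetical target of m = 0.03 at 95% confidence. It compares the conservative p* = 0.5 with a guess of p* = 0.2.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(m, conf=0.95, p_star=0.5):
    """Smallest n with z* sqrt(p*(1 - p*)/n) <= m; p* = 0.5 is the conservative guess."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / m) ** 2 * p_star * (1 - p_star))

print(sample_size_for_proportion(0.03))               # 1068 with the conservative p* = 0.5
print(sample_size_for_proportion(0.03, p_star=0.2))   # 683: smaller n if p-hat is expected near 0.2
```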

Websites I found helpful:

http://quizlet.com/16802319/ap-statistics-ch-10-flash-cards/

Wednesday, February 5, 2014