AP Statistics Ch 9: Sampling Distributions
Things to know before starting this chapter...
- if a chance process can’t be repeated, then statistical inference can’t be made about it; data analysis, on the other hand, can be done on any data
Part 1: Sampling Distributions
- vocab and symbols
- parameter: a number that describes the population; usually unknown in practice, since finding it would mean examining the entire population
- statistic: a number that describes a sample; we can calculate it without knowing the parameter, and we use it to estimate the parameter
- mean of the population is symbolized by the Greek letter mu; it is a parameter, and it usually stays unknown; we estimate it with the mean of a sample
- mean of the sample is symbolized by x-bar (an x with a line over the top); it is a statistic
- sampling variability: the fact that statistics will vary with different samples
- population proportion symbolized with p
- sample proportion symbolized with p-hat
- sampling distribution: the distribution of values taken by the statistic in all possible samples of the same size from the population
- this is the ideal pattern
- more accurate than an approximate distribution built from the statistic in only a limited number of trials (see the sketch right below)
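- (my own aside, not from the chapter) a small Python sketch with a completely made-up population, just to see what an approximate sampling distribution looks like when you only simulate a limited number of samples:
```python
import random

# Made-up population: 10,000 hypothetical household sizes (not real data)
random.seed(1)
population = [random.choice([1, 2, 2, 3, 3, 3, 4, 5, 6]) for _ in range(10_000)]

# Take many same-sized samples and record the statistic (here the sample mean).
# The list of x-bars is only an APPROXIMATE sampling distribution; the ideal
# one would use every possible sample of this size.
n = 25          # sample size
trials = 2000   # number of simulated samples
xbars = [sum(random.sample(population, n)) / n for _ in range(trials)]

print("population mean:", sum(population) / len(population))
print("mean of the simulated x-bars:", round(sum(xbars) / trials, 3))
```
- the more trials you simulate, the closer this gets to the ideal pattern described above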
- how to describe sampling distributions
- can be described like any other distributions
- describe shape, center, spread, and outliers
- the appearance of a simulated sampling distribution depends on the sampling being random and done correctly; bad sampling = bad results (not accurate)
- as always, more repetitions/individuals = more accuracy >>> more predictable pattern and behavior >>> the sampling distribution has a more definite shape; few repetitions/individuals = very inaccurate
- bias of statistic
- sampling distributions let us judge whether a conclusion is trustworthy, by comparing the observed statistic to the statistic’s usual sampling distribution
- when we say “bias”, we are talking about bias as that of a statistic, not that of a sampling method
- unbiased if the mean of the sampling distribution = the true value of the parameter
- statistic often called unbiased estimator of parameter; will sometimes be above or below true value, but is still centered at true value (no systematic tendency to make these slight errors)
- using the statistic as an unbiased estimator means we can say p is around p-hat and mu is around x-bar
- sample size does not affect whether p-hat is biased; bias and variability are separate issues
- as long as sample distribution is centered at true value (mean) of population, then it is considered unbiased
- high bias = the statistic’s values are centered away from the true value of the parameter, low bias = the values are centered on or near the true value (quick simulated check below)
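- (not from the notes, my own sketch with a made-up p) checking that the simulated sampling distribution of p-hat is centered at the true parameter, i.e. that p-hat behaves like an unbiased estimator:
```python
import random

random.seed(2)
p = 0.37        # made-up true population proportion
n = 100         # sample size
trials = 5000   # simulated samples

# For each sample, count successes and compute p-hat = X / n.
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]

# Unbiased: the center (mean) of the sampling distribution should sit at p.
print("true p:", p)
print("mean of simulated p-hats:", round(sum(phats) / trials, 4))
```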
- variability of statistic
- less variability in large samples than in small samples
- spread does not depend on size of population as long as population is at least 10 times as large as sample
- only depends on sampling design and size
- high variability = the statistic’s values are all over the place, low variability = the values are close together (see the sample-size comparison below)
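- (another made-up sketch, not from the book) comparing the spread of p-hat for different sample sizes; the population proportion p = 0.5 here is just an assumption for the demo:
```python
import random, statistics

random.seed(3)
p = 0.5   # assumed population proportion, just for the demo

def simulated_sd_of_phat(n, trials=3000):
    """Standard deviation of p-hat across many simulated samples of size n."""
    phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]
    return statistics.stdev(phats)

# Larger samples -> less variability in the statistic.
for n in (25, 100, 400):
    print(f"n = {n:4d}  simulated SD of p-hat ≈ {simulated_sd_of_phat(n):.4f}")
```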
Part 2: Sample proportions
- usually appears in questions that involve categorical variables
- proportions are expressed as decimals (values between 0 and 1), not as counts or percents
- p-hat = number of successes in sample / size of sample = X / n
- p-hat is not always equal to p, but the mean of its sampling distribution is exactly p, so p-hat is an unbiased estimator of p
- how well p-hat estimates p depends on sampling distribution of p-hat
- describing sampling distribution of p-hat
- X and p-hat vary from sample to sample, so they are random variables
- mean: the same as p
- standard deviation: sqrt(p(1-p)/n)
- you can’t and don’t use this formula if the sample is a large part of the population; only use it when the population is at least 10x as large as the sample (numeric example below)
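- (my own numeric example, made-up values) plugging p = 0.6 and n = 150 into the mean and standard deviation formulas above, assuming the population has at least 1,500 people so the 10x condition holds:
```python
import math

p = 0.6    # made-up population proportion
n = 150    # made-up sample size

mean_phat = p                          # mean of the sampling distribution of p-hat
sd_phat = math.sqrt(p * (1 - p) / n)   # sqrt(p(1-p)/n), only valid when population >= 10n

print("mean of p-hat:", mean_phat)        # 0.6
print("SD of p-hat:", round(sd_phat, 4))  # 0.04
```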
- normal approximation and p-hat
- the sampling distribution of p-hat is approximately Normal, and the larger the sample is, the more accurate the Normal approximation is
- normal curve most accurate when p is close to 0.5, and least accurate when close to 0 or 1
- only use the Normal approximation if np is greater than or equal to 10 and n(1-p) is greater than or equal to 10
- how likely is getting a sample in which p-hat is close to p or any other value?
- we are most likely going to work with a normal curve; if the normal curve works... work with z-scores and table A
- first, find the value(s) of p-hat you are trying to look for, and then find the z-scores
- using Table A, find the corresponding percentages; if you want the percentage between two values, subtract the two percentages so the result covers just the range you are looking for (a Python version of this is sketched below)
- for more information about working with z-scores, table A, and normal curves, look back to Ch 2 notes
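- (my own sketch, same made-up numbers as above) letting Python play the role of Table A; NormalDist from the standard library does the area lookup, and the subtraction step is the same "between two values" idea from the notes:
```python
import math
from statistics import NormalDist

p, n = 0.6, 150   # made-up population proportion and sample size

# Check the Normal-approximation conditions first: np >= 10 and n(1-p) >= 10.
assert n * p >= 10 and n * (1 - p) >= 10

mean = p
sd = math.sqrt(p * (1 - p) / n)

# P(0.57 <= p-hat <= 0.63): turn each endpoint into a z-score, then subtract
# the two Normal-curve areas (same as looking both up in Table A).
lo, hi = 0.57, 0.63
z_lo, z_hi = (lo - mean) / sd, (hi - mean) / sd
prob = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)
print(f"z-scores {z_lo:.2f} to {z_hi:.2f}  ->  probability ≈ {prob:.3f}")
```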
Part 3: Sample Means
- very common
- distributions of means are less variable and more Normal than distributions of individual observations
- sampling distribution of x-bar: the distribution of the values of x-bar from all possible same-sized samples drawn from the population you are interested in
- if x-bar is the mean of an SRS of a certain size from a large population with mean mu and standard deviation sigma...
- mean of the distribution of x-bar = mean of the population (mu)
- standard deviation = (standard deviation of population) / (sqrt(size of sample))
- = sigma / sqrt(n)
- these facts about the mean and standard deviation of x-bar hold no matter what shape the population distribution is...
- like p-hat, x-bar is an unbiased estimator, this time, of mu
- only use the standard deviation formula when the population is at least 10 times as large as the sample (quick numeric check below)
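- (made-up numbers again, not a book problem) the same kind of plug-in for a sample mean, assuming a population mean of 70, a population standard deviation of 12, and an SRS of size 36:
```python
import math

mu, sigma, n = 70, 12, 36   # assumed population mean, population SD, and sample size

mean_xbar = mu                   # x-bar is an unbiased estimator of mu
sd_xbar = sigma / math.sqrt(n)   # sigma / sqrt(n), valid when population >= 10n

print("mean of x-bar:", mean_xbar)   # 70
print("SD of x-bar:", sd_xbar)       # 12 / 6 = 2
```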
- shape of distribution of x-bar
- depends on shape of population distribution
- is exactly normal if the population distribution is exactly normal
- central limit theorem
- no matter what shape the population distribution has, if it has mean mu and standard deviation sigma, then the larger the sample size n gets, the closer the sampling distribution of x-bar gets to a Normal distribution, which can be summarized as N(mu, sigma / sqrt(n))
- the less the population distribution looks like a Normal distribution, the larger the n needs to be to make distribution of x-bar look like a Normal distribution
- mean doesn’t change as sample gets larger, but the standard deviation does get smaller and the curve does get more Normal
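- (my own rough central limit theorem check, not from the chapter) sampling from a strongly right-skewed made-up population (exponential with mean 1, so sigma is also 1) and watching the simulated x-bars get less skewed and less spread out as n grows:
```python
import random, statistics

random.seed(4)

for n in (2, 10, 50):
    # 4000 simulated samples of size n from an exponential population (mean 1, SD 1)
    xbars = [statistics.mean(random.expovariate(1.0) for _ in range(n))
             for _ in range(4000)]
    skew_hint = statistics.mean(xbars) - statistics.median(xbars)  # > 0 means still right-skewed
    print(f"n = {n:2d}  mean ≈ {statistics.mean(xbars):.3f}  "
          f"SD ≈ {statistics.stdev(xbars):.3f}  mean - median ≈ {skew_hint:.3f}")
```
- the mean stays near mu = 1 for every n, while the spread shrinks roughly like 1/sqrt(n), matching the bullets above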
Websites I found helpful