Tuesday, January 28, 2014

Yay, finally! Chapter 9! I hope this website has been helpful so far. I know making it has been helpful for me for finals, and it will probably be helpful for the AP Exam as well. However, I want it to be helpful for you too. If you have any problems, leave a comment and I will try to answer to the best of my ability. It has been a while since I said this, so I will say it again: I do not own any of the pictures you see here (unless otherwise stated), and I do not own any of the information found in the websites posted in these notes. The pictures are ones I thought would be helpful for showing you what is going on, and the websites are ones I found helpful for understanding the material. I am posting them here because I hope you find them helpful as well. Happy studying!

AP Statistics Ch 9: Sampling Distributions

Things to know before starting this chapter...
  • if an event can’t be repeated, then statistical inference can’t be made about it (inference asks what would happen in many repetitions); data analysis, though, can be done on any data

Part 1: Sampling Distributions

  • vocab and symbols
    • parameter: a number that describes the population; usually unknown in practice, because we would need to examine the entire population to find it
    • statistic: a number that describes a sample; can be calculated without knowing the parameter, and is used to estimate the parameter
    • mean of the population is symbolized by the Greek letter mu; it is a parameter, and since it is usually unknown, we estimate it with the mean of a sample
    • mean of the sample is symbolized by x-bar (an x with a line over the top); it is a statistic
    • sampling variability: the fact that statistics will vary with different samples
    • population proportion symbolized with p
    • sample proportion symbolized with p-hat
    • sampling distribution: the distribution of values taken by a statistic in all possible same-sized samples from the population (see the simulation sketch after this list)
      • this is the ideal pattern that would emerge from endless repeated sampling
      • it is more accurate than a distribution built from the statistics of only a limited number of samples
  • how to describe sampling distributions
    • can be described like any other distributions
    • describe shape, center, spread, and outliers
    • what a sampling distribution based on actual samples looks like depends on whether the sampling was random and how it was done; bad sampling = bad (inaccurate) results
    • as always, more repetitions/individuals = more accurate >>> more predictable pattern and behavior >>> the sampling distribution takes a more definite shape; few repetitions/individuals = very inaccurate
  • bias of statistic
    • sampling distributions let us judge whether a conclusion is trustworthy, based on how the statistic usually behaves from sample to sample
    • when we say “bias”, we are talking about bias as that of a statistic, not that of a sampling method
    • unbiased if the mean of the sampling distribution = the true value of the parameter
      • statistic often called unbiased estimator of parameter; will sometimes be above or below true value, but is still centered at true value (no systematic tendency to make these slight errors)
    • using the statistic as an unbiased estimator means we can say p is around p-hat and mu is around x-bar
    • sample size does not affect bias: p-hat is centered at p whether the sample is large or small
    • as long as the sampling distribution is centered at the true value (mean) of the population, the statistic is considered unbiased
    • high bias = the data’s center and data points are not on or near the true value of the parameter; low bias = the data’s center and points are on or near the true value of the parameter
  • variability of statistic
    • less variability in large samples than in small samples
    • spread does not depend on size of population as long as population is at least 10 times as large as sample
    • spread depends only on the sampling design and the size of the sample
    • high = data points are all over the place, low = data points are close together
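
A minimal simulation sketch of the sampling distribution ideas above (Python with numpy, which these notes don't otherwise use; the "true" proportion 0.37 and the sample sizes are made-up values for illustration). It draws many SRSs, computes p-hat for each, and shows that p-hat is centered at p (low bias) while its spread shrinks as the sample size grows (lower variability):

    import numpy as np

    rng = np.random.default_rng(42)
    p = 0.37          # hypothetical "true" population proportion
    reps = 10_000     # number of simulated samples

    for n in (25, 100, 400):                  # made-up sample sizes
        # each row is one SRS of size n; p-hat = proportion of successes in the row
        samples = rng.random((reps, n)) < p
        p_hats = samples.mean(axis=1)
        print(f"n={n:4d}  mean of p-hats={p_hats.mean():.4f}  "
              f"sd of p-hats={p_hats.std():.4f}")

    # the mean of the p-hats stays near 0.37 (unbiased), while the spread
    # shrinks roughly like 1/sqrt(n) (less variability in larger samples)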

Part 2: Sample proportions

  • usually appears in questions that involve categorical variables
  • when you express a sample proportion, express it as a decimal
  • p-hat = number of successes in sample / size of sample = X / n
    • the mean of the sampling distribution of p-hat equals p, so p-hat is an unbiased estimator of p (an individual p-hat will usually not equal p exactly)
  • how well p-hat estimates p depends on sampling distribution of p-hat
  • describing sampling distribution of p-hat
    • X and p-hat vary from random sample to random sample, so they are random variables
    • mean: the same as p
    • standard deviation: sqrt(p(1-p)/n)
      • you can’t use this formula if the sample is a large part of the population; only use it when the population is at least 10 times as large as the sample
  • normal approximation and p-hat
    • the sampling distribution of p-hat is approximately Normal, and the larger the sample, the more accurate the Normal approximation
    • the Normal approximation is most accurate when p is close to 0.5, and least accurate when p is close to 0 or 1
    • only use the Normal approximation if np is at least 10 and n(1-p) is at least 10
  • how likely is getting a sample in which p-hat is close to p or any other value?
    • we are most likely going to work with a Normal curve; if the Normal approximation applies, work with z-scores and Table A
    • first, identify the value(s) of p-hat you are interested in, and then convert them to z-scores using the mean and standard deviation above
    • using Table A, find the corresponding proportions; if you want the probability between two values, subtract the two table proportions so that the result represents the area between them (a worked sketch follows this list)
    • for more information about working with z-scores, table A, and normal curves, look back to Ch 2 notes
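
A worked sketch of the steps above (plain Python, standard library only; the population proportion p = 0.35, sample size n = 500, and the range 0.33 to 0.37 are made-up values). It checks the conditions, finds the mean and standard deviation of p-hat, and uses the Normal approximation in place of Table A:

    from math import sqrt
    from statistics import NormalDist

    p, n = 0.35, 500                        # hypothetical population proportion and sample size

    # conditions for the formulas above
    assert n * p >= 10 and n * (1 - p) >= 10      # Normal approximation condition
    # (also assume the population has at least 10 * n = 5000 individuals)

    mean_phat = p                           # p-hat is an unbiased estimator of p
    sd_phat = sqrt(p * (1 - p) / n)         # sqrt(p(1-p)/n)

    # P(0.33 <= p-hat <= 0.37): convert both endpoints to z-scores, then subtract
    z_low = (0.33 - mean_phat) / sd_phat
    z_high = (0.37 - mean_phat) / sd_phat
    prob = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
    print(round(z_low, 2), round(z_high, 2), round(prob, 4))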

Part 3: Sample Means

  • very common
  • distributions of means are less variable and more Normal than distributions of individual observations
  • sampling distribution of the sample mean (x-bar): the distribution of the values of x-bar from all possible same-sized samples drawn from the population you are interested in
  • if x-bar is the mean of an SRS of size n from a large population that has mean mu and standard deviation sigma...
    • mean of the distribution of x-bar = mean of the population (mu)
    • standard deviation of x-bar = (standard deviation of population) / sqrt(size of sample)
      • = sigma / sqrt(n)
  • no matter what shape, size, etc. population distribution is...
    • like p-hat, x-bar is an unbiased estimator, this time, of mu
    • only use the standard deviation formula when the population is at least 10 times as large as the sample
  • shape of distribution of x-bar
    • depends on shape of population distribution
    • is exactly normal if the population distribution is exactly normal
  • central limit theorem
    • no matter what shape the population distribution has, as long as it has mean mu and standard deviation sigma, the sampling distribution of x-bar gets closer to a Normal distribution as the sample size grows; it is approximately N(mu, sigma/sqrt(n)) (see the sketch after this list)
    • the less the population distribution looks like a Normal distribution, the larger the n needs to be to make distribution of x-bar look like a Normal distribution
    • mean doesn’t change as sample gets larger, but the standard deviation does get smaller and the curve does get more Normal
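
A minimal simulation sketch of the x-bar facts and the central limit theorem (Python with numpy; the skewed exponential population with mean 10 and the sample sizes are made-up choices). Even though the population is far from Normal, the mean of the x-bars stays at mu, their standard deviation matches sigma/sqrt(n), and their distribution looks more Normal as n grows:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 10.0, 10.0      # an exponential population with mean 10 also has sd 10
    reps = 20_000               # number of simulated samples

    for n in (2, 10, 50):       # made-up sample sizes
        x_bars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
        print(f"n={n:3d}  mean of x-bars={x_bars.mean():6.3f}  "
              f"sd of x-bars={x_bars.std():6.3f}  sigma/sqrt(n)={sigma / np.sqrt(n):6.3f}")

    # the mean stays near mu = 10, the spread matches sigma/sqrt(n), and a
    # histogram of x_bars looks more and more Normal as n increases (CLT)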

Websites I found helpful


Saturday, January 25, 2014

AP Statistics Ch 8: Binomial and geometric distributions

Part 1: Binomial distributions

  • it is vital to know when a situation is a binomial setting and when it is a geometric setting (covered in part 2)
  • what makes a setting a binomial setting
    • outcomes can only be success and failure
    • there is a fixed number n of observations (trials)
    • the observations are independent, and knowing one observation won’t help you know the next observation
    • probability of success is the same for every trial
  • if the situation is a binomial setting
    • binomial random variable: the count of successes, described by X
    • binomial distribution: the probability distribution of X, the number of successes in the n trials
      • has parameters of n (or number of trials) and p (or the probability of success for each trial)
      • X can be any number from 0 to n
        • can be abbreviated B(n, p)
      • is a discrete probability distribution
      • important to know when the binomial distributions apply
  • binomial distributions in statistical sampling
    • used to find out about the probability of success p in a population
      • even though simple random sampling (sampling without replacement) technically makes the trials dependent, if the population is much larger than the sample, the count of successes X in an SRS of size n behaves almost exactly like a binomial count with the same n and p
      • in other words, in this situation, the distribution of the count from an SRS of size n is approximately B(n, p)
  • the equations for finding probability for a binomial distribution, called binomial probability
    • P(X = k) = nCk * p^k * (1-p)^(n-k)
      • k = the number of successes, n = the number of trials
      • nCk is the binomial coefficient, which is the number of ways of arranging the successes among the observations
        • can use the calculator to calculate, or use (n!)/(k!(n-k)!)
    • using pdf
      • called the probability distribution function
      • given a specific number of successes, the pdf tells you the probability that X equals exactly that number
    • using cdf
      • we can use this if we want to calculate probabilities for more than one X
      • for example, if we want P(X less than or equal to 3), we can use the cdf function
      • called the cumulative distribution function
      • if you want to calculate P(X greater than or equal to #), subtract P(X less than or equal to #-1) from 1
        • the idea here is that cdf only calculates probabilities of the form P(X less than or equal to ___), not P(X greater than or equal to ___)
  • mean and standard deviation of binomial distribution
    • in general, the mean is np, where n is the total number of trials and p is the probability of success on each trial
    • the standard deviation is sqrt(np(1-p)) (the sketch after this list works through these formulas in code)
    • IMPORTANT! These formulas for mean and standard deviation can only be used for binomial distributions!
  • Normal approximations for binomial distributions
    • as the number of trials increase, the binomial distribution will get close to a Normal distribution, so we can then use Normal probability calculations to find probabilities
      • notation will be N(np, sqrt(np(1-p)))
    • we can use this when np is greater or equal to 10 and when n(1-p) is greater or equal to 10
    • accuracy improves as n increases, and is most accurate when p is equal to 0.5
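
A short sketch tying the binomial formulas above together (plain Python, standard library only; n = 100 trials and p = 0.25 are made-up values). It computes a binomial probability with the formula, builds a cdf by summing the pdf, uses the complement rule, and compares the mean and standard deviation with the Normal approximation:

    from math import comb, sqrt
    from statistics import NormalDist

    n, p = 100, 0.25                  # made-up number of trials and probability of success

    def binom_pdf(k):
        """P(X = k) = nCk * p^k * (1-p)^(n-k)"""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def binom_cdf(k):
        """P(X <= k): add up the pdf from 0 through k"""
        return sum(binom_pdf(i) for i in range(k + 1))

    print(binom_pdf(25))              # P(X = 25)
    print(binom_cdf(20))              # P(X <= 20)
    print(1 - binom_cdf(19))          # P(X >= 20) = 1 - P(X <= 19)

    mean = n * p                      # mean of a binomial distribution = np
    sd = sqrt(n * p * (1 - p))        # standard deviation = sqrt(np(1-p))

    # Normal approximation N(np, sqrt(np(1-p))): ok here since np = 25 >= 10 and n(1-p) = 75 >= 10
    print(NormalDist(mean, sd).cdf(20))   # close to the exact P(X <= 20) above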

Part 2: Geometric Distributions

  • unlike binomial distributions, where the number of trials is known and fixed, we want to know when (what trial #) we will get our first success in geometric distributions
    • takes place in geometric settings
  • requirements of geometric settings
    • only two categories: fail and success
    • observations are independent
    • probability p for success is the same for every observation
    • we want to know the number of trials required to obtain the first success
        • so X = number of trials until the first success occurs
  • equation for geometric distributions
    • the probability that the first success occurs on the nth trial = (1-p)^(n-1) * p
      • in symbols, P(X = n) = (1-p)^(n-1) * p
      • n = trial number
    • the probability that the first success will take more than n trials to occur = (1-p)^n (see the sketch after this list)
  • the probability distribution table for a geometric random variable never ends (X can be 1, 2, 3, ...)
  • properties of geometric random variable
    • the expected value is just another way to say the mean; it is the expected number of trials required to get the first success
    • mean = 1/p
    • variance = (1-p)/p^2, and the standard deviation is just the square root of that
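
A minimal sketch of the geometric formulas above (plain Python; the success probability p = 0.2 is a made-up value):

    p = 0.2                               # made-up probability of success on each trial

    def geom_pdf(n):
        """P(first success occurs on trial n) = (1-p)^(n-1) * p"""
        return (1 - p) ** (n - 1) * p

    def geom_tail(n):
        """P(first success takes more than n trials) = (1-p)^n"""
        return (1 - p) ** n

    print(geom_pdf(3))                    # 0.8^2 * 0.2 = 0.128
    print(geom_tail(5))                   # 0.8^5 = about 0.328

    mean = 1 / p                          # expected number of trials to get the first success = 5
    variance = (1 - p) / p**2             # = 20
    print(mean, variance, variance**0.5)  # standard deviation is the square root of the variance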

Websites I found: