AP Statistics

Hi! Sorry for the late update...
...
Anyways, here are the Ch 5 notes I was talking about.

AP Statistics Ch 5: Producing Data

Part 1: Designing Samples

we want to get info about a large population, but due to inconvenience, we will gather info about only a part of the group, and use that info to make conclusions about the whole population
population: all the individuals we want info about
sample: a part of the population we actually collect info from
sampling: studying only a part of the population to get info about the whole population
census: an attempt to contact every individual in the population
carefully conducted sample often more accurate than census
bad sampling methods, however, produce inaccurate data

voluntary response sampling: people choose by themselves whether or not to take a survey; biased because it is more likely for people with strong feelings to respond, especially people with negative feelings
convenience sampling: interviewing only the individuals who are the easiest to reach
these methods result in bias, which means they systematically favor certain outcomes

to fix this, don’t let personal choice (either the interviewer’s or the individuals’) decide who ends up in the sample

right ways to sample: simple random sample (SRS), stratified random sample, and cluster sample

these are all probability samples, meaning samples chosen by chance; using chance to select the sample is an essential principle of sampling

Simple Random Sample

every individual has an equal chance of getting selected, and so does every possible sample of the same size
choose individuals using random digits either from Table B or from calculator

random digits are just strings of the digits 0 to 9, with every digit equally likely at each position
when assigning numbers to people, make sure that all the labels have the same number of digits

how to choose an SRS

label every person in the population

again, make sure that all the labels have the same number of digits, and use the shortest possible labels

then, use Table B or calculator to find a series of random digits

don’t start from the same row of Table B for every sample

stop picking numbers when you have enough for the sample (skip repeats and any numbers that aren’t a label), and then write down the people in the sample
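The steps above can be sketched in Python. This is only a rough sketch: the function name choose_srs, the seed argument, and the made-up class list are my own assumptions, and Python's random module stands in for Table B.

```python
import random

def choose_srs(population, sample_size, seed=None):
    """Pick an SRS by reading random digits, like going across a row of Table B."""
    if not 0 <= sample_size <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # seed is only here so the example can be repeated
    # step 1: give everyone a label with the same number of digits (00, 01, ...)
    width = len(str(len(population) - 1))
    labels = {str(i).zfill(width): person for i, person in enumerate(population)}
    # step 2: read random digits in groups of `width`
    picked_labels = []
    while len(picked_labels) < sample_size:
        group = "".join(str(rng.randint(0, 9)) for _ in range(width))
        # skip groups that are not a label or that were already picked
        if group in labels and group not in picked_labels:
            picked_labels.append(group)
    # step 3: stop once the sample is full and write the people down
    return [labels[label] for label in picked_labels]

students = [f"student_{i}" for i in range(30)]  # made-up class list
sample = choose_srs(students, 5, seed=1)
print(sample)
```

Changing the seed plays the same role as starting from a different row of the table.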

systematic random sampling

does not give every possible sample a chance at getting picked, so it is not an SRS
n = number of individuals in population / wanted number of individuals in sample
then, starting from a random person among the first n, pick every nth person as you go down the list
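A quick sketch of systematic sampling in Python. The function name and the roster are made up, and I'm assuming the random start is taken from the first n people so the list doesn't run out before the sample is full.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Start at a random spot in the first n people, then take every nth person."""
    if not 1 <= sample_size <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    step = len(population) // sample_size  # the "n" from the notes
    start = rng.randrange(step)            # random start among the first n people
    return [population[start + i * step] for i in range(sample_size)]

roster = list(range(100))  # made-up list of 100 people, labeled 0 to 99
picked = systematic_sample(roster, 10, seed=3)
print(picked)
```

Only 10 different samples can ever come out of this roster (one per starting spot), which is why it is not an SRS.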

stratified random sample

divide a population into groups with individuals that have similar characteristics in each group; these groups are called strata
pick an SRS in each stratum separately and then combine the SRSs to get the sample
can provide info about each stratum as well as info about the population
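A minimal sketch of a stratified random sample in Python. The names here (stratified_sample, per_stratum, the grade levels) are my own, and taking the same number of people from every stratum is an assumption; the notes don't say how many to take from each.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate SRS inside every stratum, then combine them."""
    rng = random.Random(seed)
    by_stratum = {}
    for name, members in strata.items():
        # random.sample picks without replacement, so this is an SRS of the stratum
        by_stratum[name] = rng.sample(members, per_stratum)
    combined = [person for group in by_stratum.values() for person in group]
    return by_stratum, combined

# made-up strata: students grouped by grade level
strata = {
    "freshmen": [f"fr_{i}" for i in range(40)],
    "seniors": [f"sr_{i}" for i in range(25)],
}
by_stratum, combined = stratified_sample(strata, 5, seed=4)
print(by_stratum)
```

Keeping by_stratum around is what lets us say something about each stratum, not just the whole population.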

cluster sample

divide population into groups (clusters), then use SRS to select groups, and choose everyone inside those groups
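A rough sketch of a cluster sample in Python, with made-up homerooms as the clusters. The function name and the data are assumptions for illustration only.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Take an SRS of whole clusters, then keep everyone inside the chosen ones."""
    rng = random.Random(seed)
    # the SRS is over cluster names, not over individuals
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# made-up clusters: 6 homerooms of 4 students each
homerooms = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(4)] for r in range(6)}
chosen, sample = cluster_sample(homerooms, 2, seed=5)
print(chosen, len(sample))
```

Nobody from the non-selected homerooms can end up in the sample.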

stratified sampling vs. cluster sampling

stratified: take a random sample of individuals from every stratum
cluster: study all of the individuals in chosen clusters and none of the individuals in the non-selected clusters

multistage sampling

choose the sample in stages: divide the total area into groups of areas, then sort these groups into strata, and take an SRS of the groups within each stratum. Divide the chosen groups down into smaller groups, and take another SRS

caution with samples

if the populations contain human beings, we need more than good sampling
need accurate complete list of population to make sure that there is no undercoverage, but this list is rarely available, so most samples do have some undercoverage
undercoverage and nonresponse can both lead to bias (nonresponse b/c the people who can’t be contacted or who refuse to participate may be diff. from the people substituted for them)

undercoverage: some groups in pop. left out of sample
nonresponse: when chosen individual can’t be contacted or won’t participate

response bias

caused by behavior of respondent or interviewer
can be caused by wording

to decide if we can trust a survey on a complicated issue in a large human population, we should know the exact questions asked, the rate of nonresponse, and the date and method of the survey

Inferring info about population

unlikely that results from a sample are exactly the same as those of the entire population, so can only estimate info about the population
can make estimates more accurate by taking larger samples

large sample = can be confident that sample results are very close to the population’s
only true with probability samples
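To see why larger samples help, here is a small simulation sketch in Python. Everything in it is made up for illustration: a population of 10,000 where 60% would answer "yes", and a function that takes many SRSs of one size and measures how much the sample results bounce around.

```python
import random
import statistics

def spread_of_sample_proportions(population, sample_size, repeats=200, seed=0):
    """Take many SRSs of one size and see how much the sample proportion varies."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)   # one SRS
        proportions.append(sum(sample) / sample_size)  # share of "yes" answers
    return statistics.stdev(proportions)

# made-up population: 1 = "yes", 0 = "no", true proportion is 0.6
population = [1] * 6000 + [0] * 4000
small = spread_of_sample_proportions(population, 25)
large = spread_of_sample_proportions(population, 1000)
print(small, large)
```

The spread for samples of 1000 should come out much smaller than for samples of 25, meaning the big samples land closer to the true 0.6.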

Part 2: Designing Experiments

experiment: when you do something to the individuals to observe the response

experimental units: individuals that the experiment is done on
subjects: when the experimental units are humans
treatment: the specific experimental condition applied to the units

important to know which variable is an explanatory variable and which is response variable

explanatory variables are also called factors
many experiments have results that are because of the effects of two or more factors combined

using more than one factor in an experiment can make results that are different from the results from only one factor

level: a specific value (ex. amount) of a factor; a treatment is made by combining one level of each factor

advantage: experiments can give good evidence for causation
in experiment, we study specific factors that we are interested in while controlling the environment to prevent lurking variables

control

there are experiments that have only one treatment; we then apply the treatment and observe the response
no matter the experiment, we will need to rely on controlled environment to prevent lurking variables

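a lurking variable can still affect the results of an experiment; one example is the placebo effect, where subjects respond favorably to any treatment, even a dummy one

placebo: a dummy/false treatment with no active ingredient
the placebo effect can be confounded with the effect of the real treatment >>> results are misleading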

can prevent falling for the placebo effect by using a control group, the group of subjects that receives a placebo
Control is the first basic principle of designing an experiment; comparing different treatments in the same setting is the simplest form of control.
no control >>> bias (systematically favoring an outcome)

control vs. control group

control: effort to minimize variability in the way experimental units are obtained and treated
control group: group of experimental units that are given a placebo treatment

replication

there is natural variability among experimental units

that means that variation in the responses can be caused by the treatments, by natural variability, or by both

to tell the two apart, we want to see units in a treatment group responding similarly to the other units in the same group but differently from units in other groups
if each group contains many experimental units, then the natural variation averages out

so replication in experiments in statistics means using enough experimental units in order to reduce natural variation; it is the second principle of experiments

could also mean repeating the whole experiment again and again, but that isn’t the principle we mean here; here, we mean using enough experimental units

randomization

used to assign experimental units to treatments
comparison of the effects of treatments is only valid if the treatments are applied to similar experimental units
use chance to assign the units so that it doesn’t involve the experimenter’s judgement

can be combined with matching the units

we should make sure that the treatment groups are around the same size
only used to select which person gets which treatment, not who participates in experiment; people decide if they will be in the experiment voluntarily

randomization makes the treatment groups similar to each other before the treatments are given; control makes sure that all variables other than the treatments act equally on all the groups

summary of principles of experimenting

control: get rid of the possible effects of lurking variables
replicate: use many units so natural variations are minimized
randomize: use chance to assign units to treatment groups

want to see difference in responses so large that it isn’t just because of natural variation

then we can say that the effect of the treatment is statistically significant
means that there is good evidence for the effect they are looking for
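One way to check whether a difference is bigger than natural variation (this method is not from the chapter notes, just a sketch I'm adding) is to reshuffle the responses many times and count how often chance alone gives a difference as big as the one observed. The function name and the scores below are made up.

```python
import random

def randomization_p_value(group_a, group_b, shuffles=2000, seed=0):
    """Estimate how often random reshuffling gives a difference in means
    at least as big as the observed one."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    as_extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # pretend the treatment labels don't matter
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        # tiny tolerance guards against floating-point rounding
        if abs(mean(new_a) - mean(new_b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / shuffles

treated = [12, 14, 15, 13, 16]  # made-up responses
control = [5, 6, 4, 7, 5]
print(randomization_p_value(treated, control))
```

A small result means chance alone rarely produces a gap this large, which is the idea behind calling an effect statistically significant.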

completely randomized design

when units are assigned to groups randomly, with no matching or blocks
can be used to compare any number of treatments, including treatments made from more than one level or factor
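A minimal sketch of a completely randomized design in Python. The function name, subject names, and treatment names are all made up; shuffling and then dealing the units out like cards is just one simple way to keep the groups around the same size.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all the units, then deal them out to the treatments like cards."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the original list is left alone
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = [f"subject_{i}" for i in range(30)]  # made-up volunteers
groups = completely_randomized(subjects, ["drug A", "drug B", "placebo"], seed=6)
print({t: len(members) for t, members in groups.items()})
```

No matching or blocking happens here; chance alone decides who lands in which group.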

blocking

block: a group of experimental units that are similar in some way that can affect the response to the treatments (ex. gender, age, etc.)
randomization is carried out separately within each block
blocks can be any size, and are another form of control

control effects of lurking variables by making the lurking variables blocks

lets us draw conclusions about each block
we should form blocks based on the most important unavoidable sources of variability among the units to get rid of the effects of those lurking variables, then randomize within each block to even out the remaining natural variation
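A rough sketch of a block design in Python: form the blocks first, then randomize separately inside each one. The names (randomized_block, block_of) and the subjects are assumptions for illustration, with gender as the blocking variable.

```python
import random
from collections import defaultdict

def randomized_block(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomly assign treatments inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)  # block_of says which block a unit is in
    assignment = {}
    for block_name, members in blocks.items():
        rng.shuffle(members)  # randomization happens separately within each block
        for i, unit in enumerate(members):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# made-up subjects as (name, gender): 12 women and 8 men
subjects = [(f"s{i}", "F" if i < 12 else "M") for i in range(20)]
plan = randomized_block(subjects, lambda s: s[1], ["drug", "placebo"], seed=7)
print(plan[("s0", "F")])
```

Because each block gets its own random assignment, both treatments show up evenly among the women and among the men.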

matched pair designs

matching units in various ways can produce more precise results
matched pairs is the simplest form of matching

compares only two treatments

randomization still important, but it happens within each pair (which member gets which treatment, or which treatment comes first); the pairs stay matched
goal: try to get pairs that are as similar to each other as possible

comparing similar individuals produces more usable and efficient info than comparing individuals who are not similar

pair can either be two people, each taking different treatment; or one person, taking both treatments at different times
matched pairs are a kind of block design
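A small sketch of randomizing a matched pairs design in Python, reading it as a coin flip inside each pair that decides who gets which treatment. The function name, the pairs, and the treatment names are made up.

```python
import random

def assign_matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Flip a coin inside each pair to decide which member gets which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:  # the "coin flip"
            plan.append({first: treatments[0], second: treatments[1]})
        else:
            plan.append({first: treatments[1], second: treatments[0]})
    return plan

# made-up pairs of similar subjects (for example, twins)
pairs = [("twin_1a", "twin_1b"), ("twin_2a", "twin_2b"), ("twin_3a", "twin_3b")]
plan = assign_matched_pairs(pairs, seed=8)
print(plan)
```

When the "pair" is one person taking both treatments, the same coin flip can decide which treatment comes first.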

cautions about experimentation

units must be treated identically in every way except for the actual treatments being compared
good way to do this: double-blind

neither the unit nor the person giving the treatment knows which treatment is given
avoids bias because knowing the treatment might affect how the treatment is given and how the unit responds

environments can affect outcomes

so evidence for causation usually requires many studies, with experiments done in different places with different details, that still get the same results

experiments can have lack of realism

means the subjects, treatments, or setting of the experiment don’t realistically reproduce the conditions we want to study >>> results may not carry over to the real setting >>> will have to generalize, but the analysis of the experiment can’t tell us how far we can generalize the conclusion into other settings

Friday, January 17, 2014
