...
Anyways, here are the Ch 5 notes I was talking about.
AP Statistics Ch 5: Producing Data
Part 1: Designing Samples
- we want to get info about a large population, but due to inconvenience, we will gather info about only a part of the group, and use that info to make conclusions about the whole population
- population: all the individuals we want info about
- sample: a part of the population we actually collect info from
- sampling: studying only a part of the population to get info about the whole population
- census: when you contact everyone in a population
- carefully conducted sample often more accurate than census
- bad sampling methods, however, produce inaccurate data
- voluntary response sampling: people choose by themselves whether or not to take a survey; biased because it is more likely for people with strong feelings to respond, especially people with negative feelings
- convenience sampling: interviewing only the individuals who are the easiest to reach
- these methods result in bias, which means they systematically favor certain outcomes
- to fix this, avoid the samples being made out of personal choices (of either the interviewers’ or the individuals’)
- right ways to sample: simple random sample (SRS), stratified random sample, and cluster sample
- these are all probability samples, or a sample chosen by chance; using chance to select samples is an essential principle of sampling
- Simple Random Sample
- all individuals and possible samples have equal chances of getting selected
- choose individuals using random digits either from Table B or from calculator
- random digits are just strings of numbers from 0 to 9
- when assigning numbers to people, make sure that all the numbers have the same number of digits
- how to choose an SRS
- label every person in the population
- again, make sure that all the labels have the same number of digits, and use the shortest possible labels
- then, use Table B or calculator to find a series of random digits
- don’t use the same row for every sample
- stop picking numbers when you have enough for the sample (skip repeats and any numbers that aren't assigned to anyone), and then write down the people in the sample
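The SRS steps above can be sketched in Python. This is only a rough sketch: it assumes a made-up roster of 30 students, and it uses Python's `random` module as a stand-in for Table B or the calculator. The function name `choose_srs` and the roster are hypothetical.

```python
import random

def choose_srs(population, n, seed=None):
    """Return an SRS of size n, picked by reading random digits."""
    if n > len(population):
        raise ValueError("sample can't be bigger than the population")
    rng = random.Random(seed)
    width = len(str(len(population)))  # same number of digits for every label
    # label everyone 01, 02, ... (padded to the same width)
    labels = {str(i).zfill(width): person
              for i, person in enumerate(population, start=1)}
    chosen = []
    while len(chosen) < n:
        # read `width` random digits at a time, like going along a row of Table B
        label = "".join(str(rng.randint(0, 9)) for _ in range(width))
        # skip labels that belong to nobody and labels already picked
        if label in labels and label not in chosen:
            chosen.append(label)
    return [labels[label] for label in chosen]

# hypothetical roster of 30 students (labels 01 to 30)
roster = [f"student_{i}" for i in range(1, 31)]
print(choose_srs(roster, 5, seed=1))
```

Changing `seed` plays the same role as starting on a different row of the table.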
- systematic random sampling
- does not give every possible sample a chance at getting picked
- number of individuals in population / wanted number of individuals in sample = k (the step size)
- then, starting from a random person among the first k, pick every kth person as you go down the list
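A small sketch of systematic sampling, assuming the population is already in a list and the step size (population size divided by sample size, called `k` here) is rounded down. The function name and the example list are made up for illustration.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick every kth individual after a random start among the first k."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // sample_size  # step between picks
    start = rng.randrange(k)            # random starting position in 0..k-1
    return [population[start + i * k] for i in range(sample_size)]

# hypothetical list of 100 people, numbered 0 to 99
people = list(range(100))
print(systematic_sample(people, 10, seed=3))
```

Only `k` different samples are possible here (one per starting position), which is why this method does not give every possible sample a chance.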
- stratified random sample
- divide a population into groups with individuals that have similar characteristics in each group; these groups are called strata
- pick an SRS in each stratum separately and then combine the SRSs to get the sample
- can provide info about each stratum as well as info about the population
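Here is a minimal sketch of a stratified random sample, assuming the strata are already known and stored in a dictionary. The strata names, sizes, and the function name are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS inside each stratum.

    strata: dict mapping stratum name -> list of individuals
    n_per_stratum: dict mapping stratum name -> SRS size for that stratum
    """
    rng = random.Random(seed)
    # one SRS per stratum, kept separate so each stratum can be studied alone
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# hypothetical school with two strata
school = {
    "freshmen": [f"fr_{i}" for i in range(200)],
    "seniors": [f"sr_{i}" for i in range(150)],
}
by_stratum = stratified_sample(school, {"freshmen": 20, "seniors": 15}, seed=2)
# combine the separate SRSs to get the full sample
full_sample = [person for group in by_stratum.values() for person in group]
print(len(full_sample))
```

Keeping the result grouped by stratum is what lets you report on each stratum as well as on the whole population.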
- cluster sample
- divide population into groups (clusters), then use SRS to select groups, and choose everyone inside those groups
- stratified sampling vs. cluster sampling
- stratified: random sample of individuals from every stratum
- cluster: study all of the individuals in chosen clusters and none of the individuals in the non-selected clusters
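The cluster side of this comparison, as a rough sketch: take an SRS of whole clusters, then keep everyone inside them. The homerooms and the function name are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of clusters and include every individual in the chosen ones.

    clusters: dict mapping cluster name -> list of individuals
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    # everyone in a chosen cluster is in; everyone else is out
    people = [person for name in chosen for person in clusters[name]]
    return chosen, people

# hypothetical school with 6 homerooms of 4 students each
homerooms = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(4)]
             for r in range(6)}
chosen_rooms, sampled_students = cluster_sample(homerooms, 2, seed=5)
print(chosen_rooms, len(sampled_students))
```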
- multistage sampling
- choose the sample in stages: divide the total area into groups of areas, sort these groups into strata, and take an SRS of groups within each stratum. Then divide the chosen groups into smaller groups, and take another SRS
- caution with samples
- if the populations contain human beings, we need more than good sampling
- need accurate complete list of population to make sure that there is no undercoverage, but this list is rarely available, so most samples do have some undercoverage
- undercoverage and nonresponse can both lead to bias (nonresponse b/c if the people who do respond, or who get substituted in, are diff. from the people who can't be contacted or who refuse to participate, bias can occur)
- undercoverage: some groups in pop. left out of sample
- nonresponse: when chosen individual can’t be contacted or won’t participate
- response bias
- caused by behavior of respondent or interviewer
- can be caused by wording
- when seeing if we can trust surveys on complicated issues on large human populations, we should know what questions are asked, rate of nonresponse, and the date and the method of the survey
- Inferring info about population
- unlikely that results from sample are exactly the same as that of entire population, so can only estimate info of population
- can make estimates more accurate by taking larger samples
- large sample = can be confident that sample results are very close to the population's
- only true with probability samples
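A quick simulation can show why larger samples give more accurate estimates. This is a sketch under made-up assumptions: a hypothetical population of 10,000 people where 60% would answer "yes", and repeated SRSs of two different sizes. It measures how much the sample proportion bounces around from sample to sample.

```python
import random
import statistics

def spread_of_sample_proportions(population, n, repeats=500, seed=0):
    """Std. deviation of the sample proportion over many SRSs of size n."""
    rng = random.Random(seed)
    proportions = [sum(rng.sample(population, n)) / n for _ in range(repeats)]
    return statistics.pstdev(proportions)

# hypothetical population: 1 = "yes", 0 = "no", true proportion is 0.6
population = [1] * 6000 + [0] * 4000

small = spread_of_sample_proportions(population, 25)
large = spread_of_sample_proportions(population, 400)
print(f"spread with n=25:  {small:.3f}")
print(f"spread with n=400: {large:.3f}")
```

The spread for n=400 should come out clearly smaller than for n=25, meaning the larger samples land closer to the true 0.6 more consistently. This only works because each sample is an SRS; a biased method stays off target no matter how big the sample is.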
Part 2: Designing Experiments
- experiment: when you do something to the individuals to observe the response
- experimental units: individuals that the experiment is done on
- subjects: when the experimental units are humans
- treatment: the specific experimental condition done on the subjects
- important to know which variable is an explanatory variable and which is response variable
- explanatory variables are also called factors
- many experiments have results that are because of the effects of two or more factors combined
- using more than one factor in an experiment can make results that are different from the results from only one factor
- level: a specific value of a factor (ex. the amount of a drug used); each treatment is a combination of one level of each factor
- advantage: experiments can give good evidence for causation
- in experiment, we study specific factors that we are interested in while controlling the environment to prevent lurking variables
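To make the factor/level vocabulary concrete, here is a small sketch that lists every treatment for an experiment with two factors. The factors ("dose" and "schedule") and their levels are made up for illustration; each treatment is treated as one level of each factor.

```python
from itertools import product

# hypothetical factors and their levels
factors = {
    "dose": ["low", "high"],
    "schedule": ["morning", "evening", "both"],
}

# every combination of one level per factor is a treatment
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
print(len(treatments))  # 2 levels x 3 levels = 6 treatments
```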
- control
- there are experiments that have only one treatment; we then perform the experiment and observe the response
- no matter the experiment, we will need to rely on controlled environment to prevent lurking variables
- if subjects respond just because they are getting some treatment, even a dummy one, that is the placebo effect; it acts like a lurking variable on the results of the experiment
- placebo: a dummy/false treatment with no active ingredient
- can lead to confounding >>> results are misleading
- can prevent falling for the placebo effect by using a control group, or the group of subjects that receive a placebo
- Control is the first basic principle of designing an experiment; comparing different treatments in the same setting is the simplest form of control.
- no control >>> bias (systematically favoring an outcome)
- control vs. control group
- control: effort to minimize variability in the way experimental units are obtained and treated
- control group: group of experimental units that are given a placebo treatment
- replication
- there is natural variability among experimental units
- that means that variation in an experiment can be caused by either the treatments or natural variability
- to minimize natural variability, we want to see units from a treatment group responding similarly to the units in the same group but differently to units in different groups
- if we include many experimental units in each group, then the natural variation averages out and matters less
- so replication in experiments in statistics means using enough experimental units in order to reduce natural variation; it is the second principle of experiments
- also could mean repeat the experiment again and again, but that isn’t the principle we mean here; here, we mean using enough experimental units
- randomization
- used to assign experimental units to treatments
- comparison of the effects of treatments is only good, useful, and true if the treatments are applied to similar experimental units
- use chance to assign the units so that it doesn't involve the experimenter's judgements
- can be combined with matching the units
- we should make sure that the treatment groups are around the same size
- only used to select which person gets which treatment, not who participates in experiment; people decide if they will be in the experiment voluntarily
- randomization makes the treatment groups similar to each other before the treatments are given; control makes sure that all variables other than the treatments act equally on all groups
- summary of principles of experimenting
- control: get rid of the possible effects of lurking variables
- replicate: use many units so natural variations are minimized
- randomize: use chance to assign units to treatment groups
- want to see difference in responses so large that it isn’t just because of natural variation
- then we can say that the effect of the treatment is statistically significant
- means that there is good evidence for the effect they are looking for
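One way to get a feel for "too large to be just natural variation" is to reshuffle the group labels many times and see how often chance alone gives a difference as big as the one observed. This is only an illustration and goes a bit beyond these notes: the function name and the response numbers are made up, and the result is an approximate proportion, not an exact test.

```python
import random
import statistics

def chance_of_difference(group_a, group_b, repeats=2000, seed=0):
    """Share of random regroupings with a mean difference at least as large
    as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_big = 0
    for _ in range(repeats):
        rng.shuffle(pooled)  # pretend the treatment labels mean nothing
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        if abs(statistics.mean(new_a) - statistics.mean(new_b)) >= observed:
            at_least_as_big += 1
    return at_least_as_big / repeats

# hypothetical responses for a treatment group and a control group
treated = [20, 21, 22, 23, 24, 25, 26, 27]
control = [1, 2, 3, 4, 5, 6, 7, 8]
print(chance_of_difference(treated, control))
```

A very small result means chance alone rarely produces a gap that big, which is the idea behind calling an effect statistically significant.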
- completely randomized design
- when units are assigned to groups randomly, with no matching or blocks
- can be used to compare any number of treatments, and treatments with more than one level or factor
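A completely randomized design can be sketched as: shuffle all the units, then deal them out to the treatments. This is a minimal sketch with made-up subject names and treatment labels; dealing them out in turn keeps the groups about the same size.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign every unit to one treatment, with near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance, not the experimenter, decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # deal units out in turn so group sizes differ by at most one
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# hypothetical volunteers and treatments
subjects = [f"subject_{i}" for i in range(20)]
groups = completely_randomized(subjects, ["new drug", "placebo"], seed=4)
print({t: len(members) for t, members in groups.items()})
```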
- blocking
- block: a group of experimental units that are similar in some way that is expected to affect the response to the treatments (ex. gender, age, etc.)
- randomization is carried out separately within each block
- blocks can be any size, and are another form of control
- control effects of lurking variables by making the lurking variables blocks
- lets us draw conclusions about each block
- we should form blocks based on the most important unavoidable sources of variability among the units to control the effects of those lurking variables, then randomize within the blocks to even out the remaining natural variation
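A block design differs from a completely randomized one in where the shuffling happens: separately inside each block. A rough sketch, assuming the blocks are already formed and using made-up block names, units, and treatments:

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    blocks: dict mapping block name -> list of units
    returns: dict mapping unit -> (block name, treatment)
    """
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens inside this block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# hypothetical blocks formed by gender
blocks = {
    "men": [f"m{i}" for i in range(6)],
    "women": [f"w{i}" for i in range(4)],
}
plan = randomized_block(blocks, ["drug", "placebo"], seed=7)
for unit, (block, treatment) in sorted(plan.items()):
    print(unit, block, treatment)
```

Because each block gets its own balanced split, the treatments can be compared within men and within women separately.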
- matched pair designs
- matching units in various ways can produce more precise results
- matched pairs is the simplest form of matching
- compares only two treatments
- randomization still important, but it is done within each pair (which member gets which treatment, or which treatment comes first), not across all individuals; the pairs stay matched
- goal: try to get pairs that are as similar to each other as possible
- comparing similar individuals produces more usable and efficient info than comparing individuals who are not similar
- pair can either be two people, each taking different treatment; or one person, taking both treatments at different times
- matched pairs are a kind of block design
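A matched pairs assignment can be sketched as a coin flip inside each pair. This assumes the pairs have already been matched and that the randomization decides which member of each pair gets which of the two treatments; the names and treatment labels are made up.

```python
import random

def assign_matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each matched pair, randomly decide who gets which treatment."""
    rng = random.Random(seed)
    assignment = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # the coin flip for this pair
        assignment.append({first: order[0], second: order[1]})
    return assignment

# hypothetical pairs, already matched on similar characteristics
pairs = [("ann", "bea"), ("cal", "dan"), ("eve", "fay")]
for pair_plan in assign_matched_pairs(pairs, seed=6):
    print(pair_plan)
```

For the one-person version of a pair, the same coin flip would decide which treatment that person takes first.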
- cautions about experimentation
- units must be treated identically in every way except for the actual treatments being compared
- good way to do this: double-blind
- neither the subject nor the person giving the treatment knows which treatment is given
- avoids bias because knowing the treatment might affect how the treatment is given and how the unit responds
- environments can affect outcomes
- so evidence for causation usually requires many studies, done in different places with different details, that still get the same results
- experiments can have lack of realism
- means the subjects, treatments, or setting of the experiment can't realistically duplicate the conditions we want to study >>> we have to generalize from the experiment to the real setting, but the analysis of the experiment can't tell us how far we can generalize the conclusion into other settings