Chapter 4 up! I couldn't find any links that are helpful for me in the chapter. Hopefully, you could have better luck. Anyways, I hope this is the last one of these kinds of messages I will post for a while, but any other links you see here are just info that I find helpful for understanding the material. I did not create the info in those links; they are just there to help...
AP Statistics Ch 4: More relationships between two variables
Part 1: Achieving Linearity
- if a relationship is not linear, then we can make it linear by applying a function to a variable
- called transforming or reexpressing data
- what are we doing?
- we are doing nonlinear transformations (logs, sq roots, etc.)
- ultimately changing scale of data
- the transformation you need depends on the model... the two most common are exponential models and power models
- exponential model/growth
- function: y = a(b^x)
- growth occurs because the variable is multiplied by a fixed constant in each equal time period (see the sketch below)
- in contrast to linear growth, in which constant is added
- the graph is a curve that rises more and more steeply [graph omitted]
- can use logarithms to transform the data
- logarithm transformation
- log_b(x) = y if and only if x = b^y
- log_b(mn) = log_b(m) + log_b(n)
- log_b(m/n) = log_b(m) - log_b(n)
- log_b(m^p) = p · log_b(m)
- b must be positive and can't equal 1
- names of logs: log_10(m) is the common log, and log_e(m) = ln(m) is the natural log (the rules above are checked numerically in the sketch below)
- transforming exponential models and predicting y from it
- so to transform into linear... plot ln(y) against x
- becomes ln(y) = ln(a) + (ln b)x, which is linear in x
- we can use log base 10 instead of ln... it doesn't matter which base we use...
- to predict the response variable: fit a LSRL using ln(y) and x, plug the stated x into that equation to get ln(y-hat), then exponentiate (undo the log) to get y-hat (see the sketch below)
- if asked to find x instead, take the LSRL (with ln(y) and x), plug in ln of the stated y, and solve for x
- round AFTER you reach the final answer; otherwise you introduce roundoff error (error produced by rounding too early)
- what about power models?
- function: y = a(x^p)
- to transform into linear, take the logs of both x and y
- becomes log(y) = log(a) + p(log(x))
- p becomes the slope
- to predict the response variable: plug log(x) into the LSRL to get log(y-hat), then undo the log to get y-hat (see the sketch below)
- as always...
- extrapolation leads to inaccuracy, and always use y-hat when predicting!
Part 2: Relationships between categorical variables
- measured by using counts or percentages of individuals that fall into certain categories
- we record the categorical variables on two-way tables
- two kinds of distributions for describing categorical variables: marginal and conditional
- marginal distribution
- looks only at the totals of one variable (I'm going to call this the "section variable"), so each marginal distribution is a distribution for only one of the categorical variables
- counts: the total number of individuals of each section
- percentages: more informative than counts, great to use for making comparisons
- calculate by dividing the section total by the total of all the individuals interviewed (see the sketch below)
- (total number of individuals in the section you want to know about)/(the total number of individuals interviewed)
- don’t tell how variables are related; conditional distributions do that
- conditional distribution
- use percentages
- compares the two variables
- ex.: What percentage of women are 30-45 yrs old?
- to calculate: (number of women who are 30-45 yrs old)/(total number of individuals who are women)
- ex: What percentages of 30-45 yrs olds are women?
- to calculate: (number of women who are 30-45 yrs old)/(total number of individuals who are 30-45 yrs old)
- as shown, wording does affect how you calculate the conditional distributions! (see the sketch below)
- no graph shows the form of the relationship between categorical variables, and no numerical measurement shows the strength of the association
- bar graphs can be helpful, but be sure you are displaying the variables that you want to compare
- Simpson's paradox: a situation in which introducing a new variable reverses the conclusions and associations/comparisons in a group of data; this new variable is a lurking variable (see the numeric demo below)
- the lurking variable is categorical
Part 3: Establishing Causation
- there must be a clear association between the explanatory and response variables before we can start speculating about causation
- causation: direct cause and effect between the explanatory and response variables
- even with direct causation, the causation is rarely a complete explanation of an association
- even if causation is 100% true in one setting, it may not hold in other settings
- ex. x causes y in one species, but there is no data that proves x causes y in another species
- common response: another pattern that can produce an association; here, both the explanatory and response variables change because of a lurking variable
- can happen even when there is not direct causation between the explanatory and response variables
- confounding: occurs when two variables' effects on the response variable can't be distinguished from each other; either lurking or explanatory variables can be confounded
- prevents us from making conclusions about causation
- we can't say how strong x's direct effect on y is, or even whether it exists
- both common response and confounding mean there is a lurking variable
- how to make sure that there is causation between two variables
- best method is conducting carefully controlled experiments, but in many situations, we can’t conduct these kinds of experiments
- when such experiments aren't possible, the criteria are:
- first make sure association is strong and consistent
- check that larger values of the explanatory variable go with stronger responses (a dose-response relationship)
- the suspected cause always comes before the effect (response variable)
- the cause seems reasonable or possible