Sunday, January 5, 2014

Chapter 4 up! I couldn't find any links that are helpful for me in the chapter. Hopefully, you could have better luck. Anyways, I hope this is the last one of these kinds of messages I will post for a while, but any other links you see here are just info that I find helpful for understanding the material. I did not create the info in those links; they are just there to help...

AP Statistics Ch 4: More relationships between two variables

Part 1: Achieving Linearity
  • if a relationship is not linear, then we can make it linear by applying a function to a variable
    • called transforming or reexpressing data
  • what are we doing?
    • we are doing nonlinear transformations (logs, sq roots, etc.)
    • ultimately changing scale of data
    • which transformation is needed depends on the model: there are exponential models and there are power models
  • exponential model/growth
    • function: y = a(b^x)
    • growth occurs because a variable is multiplied by a constant once every certain amount of time
      • in contrast to linear growth, in which a fixed constant is added each time
    • the graph curves upward, rising faster and faster as x increases (for b > 1)
    • can use logarithms to transform the data
  • logarithm transformation
    • log_b(x) = y if and only if x = b^y
    • log_b(mn) = log_b(m) + log_b(n)
    • log_b(m/n) = log_b(m) - log_b(n)
    • log_b(m^p) = p · log_b(m)
    • the base b must be positive and can't equal 1
    • names of logs: log_10(m) (base 10) is the common log, and log_e(m) = ln(m) is the natural log
  • transforming exponential models and predicting y from it
    • to make the relationship linear, plot ln(y) against x
      • taking logs of y = a(b^x) gives log(y) = log(a) + (log(b))x, which is linear in x with slope log(b) and y-intercept log(a)
    • we can use log(y) instead of ln(y); it doesn't matter which base we use
    • to predict the response variable, fit the LSRL of ln(y) on x, plug the given x into that equation to get ln(y-hat), then undo the log (raise e to that value) to get y-hat (a short sketch of this procedure follows this list)
      • if given a y-value instead, plug ln of that y into the LSRL and solve for x
    • round AFTER you reach the final answer; otherwise, we will have a roundoff error (errors produced by rounding)
  • what about power models?
    • function: y = a(x^p)
    • to transform into linear, take the logs of both x and y
      • becomes log(y) = log(a) + p(log(x))
        • p becomes the slope
    • to predict the response variable, plug log of the given x into the LSRL to get log(y-hat), then undo the log of y to get y-hat (a sketch follows this list)
  • as always...
    • extrapolation leads to inaccuracy, and always use y-hat when predicting!
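
Here is a minimal sketch of the exponential-model procedure in Python, using made-up data that grow roughly like y = 3(1.5^x); the numbers are hypothetical, and the point is just the sequence of steps: take ln of y, fit the LSRL of ln(y) on x, plug in the new x, and undo the log only at the end.

    import math
    from statistics import linear_regression  # Python 3.10+

    # hypothetical data that grow roughly exponentially (about y = 3 * 1.5**x)
    xs = [1, 2, 3, 4, 5, 6]
    ys = [4.4, 6.9, 10.0, 15.4, 22.6, 34.5]

    # transform: take ln of y, then fit the LSRL of ln(y) on x
    ln_ys = [math.log(y) for y in ys]
    fit = linear_regression(xs, ln_ys)     # ln(y-hat) = intercept + slope * x

    # predict y for a new x: plug in x, then undo the log as the very last step
    x_new = 7
    ln_y_hat = fit.intercept + fit.slope * x_new
    y_hat = math.exp(ln_y_hat)             # round only after this final step
    print(round(y_hat, 1))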
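
And a similar sketch for a power model, again with made-up data (roughly y = 2x^1.5): here logs of both x and y are taken, and the slope of the LSRL estimates the power p.

    import math
    from statistics import linear_regression  # Python 3.10+

    # hypothetical data that follow roughly a power model (about y = 2 * x**1.5)
    xs = [1, 2, 4, 8, 16]
    ys = [2.1, 5.5, 16.2, 44.9, 129.0]

    # transform: take logs of BOTH variables, then fit the LSRL of log(y) on log(x)
    log_xs = [math.log10(x) for x in xs]
    log_ys = [math.log10(y) for y in ys]
    fit = linear_regression(log_xs, log_ys)   # log(y-hat) = log(a) + p * log(x)

    print("estimated power p:", round(fit.slope, 2))   # the slope estimates p

    # predict y for a new x: plug in log(x), then undo the log of y
    x_new = 10
    log_y_hat = fit.intercept + fit.slope * math.log10(x_new)
    y_hat = 10 ** log_y_hat
    print(round(y_hat, 1))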

Part 2: Relationships between categorical variables

  • measured by using counts or percentages of individuals that fall into certain categories
  • we record the categorical variables on two-way tables
  • two kinds of distributions for describing categorical variables: marginal and conditional
  • marginal distribution
    • looks only at the totals for one of the two variables (the row totals or the column totals; I'm going to call each of these totals a "section"), so each marginal distribution is a distribution for only one of the categorical variables
    • counts: the total number of individuals of each section
    • percentages: more informative than counts, great to use for making comparisons
      • calculate by dividing each section total by the total number of individuals interviewed
        • (total number of individuals in the section you want to know about)/(the total number of individuals interviewed)
    • don’t tell how variables are related; conditional distributions do that
  • conditional distribution
    • use percentages
    • compares the two variables
    • ex.: What percentage of women are 30-45 yrs old?
      • to calculate: (number of women who are 30-45 yrs old)/(total number of individuals who are women)
    • ex: What percentages of 30-45 yrs olds are women?
      • to calculate: (number of women who are 30-45 yrs old)/(total number of individuals who are 30-45 yrs old)
    • as shown, wording does affect how you calculate the conditional distributions! (a worked sketch with a small two-way table follows this list)
  • no graph shows the form of the relationship between categorical variables, and no numerical measurement shows the strength of the association
    • bar graphs can be helpful, but be sure you are displaying the variables that you want to compare
  • Simpson's paradox: a situation in which bringing in a third variable reverses the direction of an association or comparison seen in a group of data; this new variable is a lurking variable (a tiny numerical illustration follows this list)
    • the lurking variable is categorical
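
A minimal sketch of the marginal and conditional calculations above, using a hypothetical two-way table of age group by sex (all counts are made up):

    # hypothetical two-way table: rows = age group, columns = sex
    table = {
        "18-29": {"women": 60, "men": 40},
        "30-45": {"women": 75, "men": 55},
        "46+":   {"women": 45, "men": 25},
    }

    grand_total = sum(sum(row.values()) for row in table.values())
    total_women = sum(row["women"] for row in table.values())

    # marginal distribution of age group (uses only the row totals)
    for age, row in table.items():
        row_total = sum(row.values())
        print(f"{age}: {100 * row_total / grand_total:.1f}% of everyone interviewed")

    # conditional distribution: what percentage of women are 30-45 yrs old?
    print(100 * table["30-45"]["women"] / total_women)

    # the other wording: what percentage of 30-45 yr olds are women?
    print(100 * table["30-45"]["women"] / sum(table["30-45"].values()))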
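
And a tiny made-up illustration of Simpson's paradox: within each department the acceptance rate for women is higher than for men, but once the department (the lurking variable) is ignored and the counts are pooled, the comparison reverses.

    # hypothetical admissions counts: (admitted, applied)
    depts = {
        "A": {"men": (60, 80), "women": (16, 20)},   # easy department, mostly men apply
        "B": {"men": (4, 20),  "women": (20, 80)},   # hard department, mostly women apply
    }

    def rate(admitted, applied):
        return 100 * admitted / applied

    # within each department, women have the higher acceptance rate...
    for name, d in depts.items():
        print(name, {g: round(rate(*d[g]), 1) for g in d})

    # ...but pooling the departments (ignoring the lurking variable) reverses it
    for g in ("men", "women"):
        admitted = sum(depts[name][g][0] for name in depts)
        applied = sum(depts[name][g][1] for name in depts)
        print(g, round(rate(admitted, applied), 1))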

Part 3: Establishing Causation

  • there must be a clear association between the explanatory and response variables before we can start speculating about causation
  • causation: direct cause and effect between the explanatory and response variables
    • even with direct causation, the causation is rarely a complete explanation of an association
    • even if causation holds in one setting, that does not guarantee it holds in other settings
      • ex. x causes y in one species, but there is no data that proves x causes y in another species
  • common response: another way an association can arise; here, both the explanatory and response variables change because of a lurking variable (a small simulation follows this list)
    • can happen even when there is no direct causation between the explanatory and response variables
  • confounding: occurs when the effects of two variables on the response variable are mixed together and can't be distinguished from each other; either lurking or explanatory variables can be confounded
    • prevents us from making conclusions about causation
    • can’t say how strong or even if there is x’s direct effect on y
  • both common response and confounding involve a lurking variable
  • how to make sure that there is causation between two variables
    • best method is conducting carefully controlled experiments, but in many situations, we can’t conduct these kinds of experiments
    • when an experiment isn't possible, check these criteria:
      • first make sure association is strong and consistent
      • make sure that larger values of the explanatory variable are associated with stronger responses (a dose-response pattern)
      • the suspected cause always comes before the effect (response variable)
      • the cause seems reasonable or possible
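
A small simulation of common response (all names and numbers are made up): a lurking variable z drives both x and y, so x and y end up strongly correlated even though neither one causes the other.

    import random
    from statistics import correlation  # Python 3.10+

    random.seed(1)

    # the lurking variable z drives both x and y; x never touches y directly
    z = [random.gauss(0, 1) for _ in range(500)]
    x = [zi + random.gauss(0, 0.5) for zi in z]
    y = [2 * zi + random.gauss(0, 0.5) for zi in z]

    # x and y show a strong association purely through their common response to z
    print(round(correlation(x, y), 2))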
