Chapter 1
Part 1: Types of Graphs
- Graphs for categorical variables
- Pie charts: must include all the categories that make up the whole; use only when you want to emphasize each category’s relation to the whole
- Bar graphs: can be used to compare quantities with the same units
- Graphs for quantitative variables
- stemplot: also called stem-and-leaf plot
- shows shape and actual numerical values
- looks like: [figure: example stemplot]
- the stem is on the left (or in the middle when leaves are drawn on both sides) and can contain as many digits as needed
- the leaves are on the right (or on the right and left of a middle stem), but each leaf can only contain one digit
- written in increasing order out from the stem
- stemplots might not work as well for large data sets in which stem must hold a lot of leaves; can solve this by
- splitting stems: two stems with the same digits, one with leaves 0-4, the other with leaves 5-9
- trimming: removing last digit or digits of value before plotting them
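The stemplot rules above (one-digit leaves, leaves in increasing order, optional split stems and trimming) can be sketched in a few lines of Python. This is only an illustration: the `stemplot` function, its `split` flag, and the `scores` data are made up for these notes, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stemplot(values, split=False):
    """Return the lines of a stemplot for non-negative integers.

    The last digit of each value is the leaf; the remaining digits are the stem.
    With split=True each stem appears twice: leaves 0-4 first, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):                  # sorting keeps leaves in increasing order
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    lo, hi = min(values) // 10, max(values) // 10
    halves = (0, 1) if split else (0,)
    width = len(str(hi))
    lines = []
    for stem in range(lo, hi + 1):            # keep empty stems so gaps stay visible
        for half in halves:
            leaves = "".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem:>{width}} | {leaves}")
    return lines

scores = [62, 67, 71, 73, 73, 78, 80, 84, 85, 85, 89, 91, 104]  # made-up data
plain = stemplot(scores)
split_rows = stemplot(scores, split=True)

# Trimming: drop the last digit of each value before plotting.
trimmed = stemplot([v // 10 for v in [623, 671, 718, 845]])

print("\n".join(plain))
```

Comparing `plain` with `split_rows` shows how splitting spreads a crowded stem over two rows.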
- Histograms: break the range of values into classes and display the count or percent of observations that fall into each class
- always choose classes of equal width
- Histograms vs. Bargraphs
- histogram: distribution of counts or percentages among values of a single quantitative variable; drawn with bars touching
- bargraphs: distribution of categorical variable; drawn with spaces between bars
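As a rough sketch of what a histogram counts, the snippet below sorts made-up scores into classes of equal width and reports counts and percents. The function name `class_counts` and the class boundaries (60-69, 70-79, ...) are assumptions chosen for the example, not something from the chapter.

```python
def class_counts(values, start, width, n_classes):
    """Count observations in equal-width classes.

    Class k covers start + k*width up to (but not including) start + (k+1)*width.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1
    return counts

scores = [62, 67, 71, 73, 73, 78, 80, 84, 85, 85, 89, 91]  # made-up data
counts = class_counts(scores, start=60, width=10, n_classes=4)
percents = [100 * c / len(scores) for c in counts]

# Text version of the histogram: one row per class, bars with no gaps between classes.
for k, c in enumerate(counts):
    lo = 60 + 10 * k
    print(f"{lo}-{lo + 9}: {'#' * c}")
```

Every class has the same width (10), so the bar lengths can be compared directly.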
- examining distribution
- look for overall patterns and obvious deviations from pattern
- determine pattern through shape, center, and spread
- determine deviations by looking for outliers
- spread: giving the smallest and largest values
- shape: unimodal (one mode), bimodal (two modes), or skewed
- state values with modes if it is uni/bimodal
- skewed left: tail to the left; skewed right: tail to the right
- relative and cumulative frequency
- Ogive: a cumulative frequency graph
- teacher said don’t need to know how to construct, but need to know how to read
- read in either direction: start at a value and locate its percentile (to find the percentile of a value), or start at a percentile and locate the value (to find the value at that percentile)
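The two ways of reading an ogive can be mimicked numerically. The sketch below is a simplification: it works from the raw data in steps instead of interpolating along a drawn curve, and the helper names `percentile_of` and `value_at`, as well as the `ages` data, are invented for illustration.

```python
def percentile_of(values, x):
    """Percent of observations less than or equal to x (value -> percentile)."""
    return 100 * sum(v <= x for v in values) / len(values)

def value_at(values, pct):
    """Smallest observation whose cumulative percent reaches pct (percentile -> value)."""
    ordered = sorted(values)
    for v in ordered:
        if percentile_of(ordered, v) >= pct:
            return v
    return ordered[-1]

ages = [21, 25, 25, 30, 34, 38, 41, 47, 52, 60]  # made-up data

print(percentile_of(ages, 30))  # start at a value, read off its percentile
print(value_at(ages, 50))       # start at a percentile, read off the value
```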
- time plots: plot each observation of a variable against the time at which it was measured
- time on horizontal scale, variable on vertical scale
- useful because distributions that ignore time order can be misleading when there is systematic change over time
- can reveal trends
- positive/negative or upwards/downwards secular trends: when plots have positive/negative slope
- variation: looks like series of pointed mountains and valleys
- seasonal: variations that repeat at regular time intervals, so they can be predicted (the same thing happens at the same point in every period)
- cyclic: has peaks and troughs, but unpredictable
- random: no obvious patterns of peaks and troughs; can have trend, though
- more info about trends: http://www.youtube.com/watch?v=ca0rDWo7IpI
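One way to put a number on "positive or negative slope" is to fit a least-squares line to the values against their time order. This is a technique swapped in for illustration (the notes only ask you to judge the slope by eye), and `trend_slope` and the `sales` series are made up.

```python
def trend_slope(series):
    """Least-squares slope of the values against the time index 0, 1, 2, ...

    Needs at least two observations.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

sales = [10, 12, 11, 14, 15, 17]  # made-up data with an upward trend
slope = trend_slope(sales)
print("upward trend" if slope > 0 else "downward or flat trend")
```

A positive slope suggests an upward trend and a negative slope a downward one; seasonal or cyclic variation around the line still has to be judged from the plot itself.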
Part 2: Describing Distributions with Numbers
- description should include what the distribution is about, its shape, center, and spread
- center: mean, median, or mode
- use mean when distribution is normal, median when distribution is skewed, and mode when distribution is uni/bimodal
- mean is nonresistant, median is resistant
- can calculate the place that median is located at by using (n+1)/2; formula doesn’t give median, just the place median is...
- mean vs. median
- Typical value of skewed distribution: median
- average: mean
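A small sketch of the (n+1)/2 location rule and of why the median is resistant while the mean is not. The `median` function and the `incomes` numbers are illustrative only.

```python
def median(values):
    """Median found through its location (n + 1) / 2 in the sorted list."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                  # 1-based location, not the median itself
    if position.is_integer():
        return ordered[int(position) - 1]
    # Location ends in .5: average the two observations on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

def mean(values):
    return sum(values) / len(values)

incomes = [30, 32, 35, 38, 40]            # made-up data
with_outlier = [30, 32, 35, 38, 400]      # same data, one extreme value

print(mean(incomes), median(incomes))
print(mean(with_outlier), median(with_outlier))
```

The single extreme value pulls the mean far upward while the median stays where it was.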
- spread: uses quartiles; is the variability
- reporting only center can be misleading
- simple useful description: has measure of center and spread
- range: max - min
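- quartiles: location of the 1st quartile is (n+1)/4, location of the 3rd is 3(n+1)/4; as with the median, the formula gives the place in the ordered list, not the value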
- median: 50th percentile, Q1: 25th percentile, Q3: 75th percentile
- five-number summary: not very common; describe the min., Q1, median, Q3, and max.
- used to make a box-plot, or box-and-whiskers
- outliers: calculated by the 1.5 x IQR rule
- range misleading b/c of outliers, so use IQR = Q3 - Q1
- is outlier if more than Q3 + 1.5 x IQR or less than Q1 - 1.5 x IQR
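The five-number summary and the 1.5 x IQR rule can be checked with Python's standard library. The sketch assumes `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the (n+1)/4 and 3(n+1)/4 locations and interpolates when the location is not a whole number; other textbooks and calculators use slightly different quartile conventions. The function names and data are made up.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, med, q3, max(values)

def outliers(values):
    """Observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40]  # made-up data, n = 11
summary = five_number_summary(data)
flagged = outliers(data)
print(summary)
print(flagged)
```

With n = 11 the quartile locations are 3, 6, and 9, so Q1, the median, and Q3 are the 3rd, 6th, and 9th ordered values.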
- Standard deviation: the square root of variance
- variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the mean
- sum of the deviations (x-mean) will always be 0
- should only be used if mean is the center
- 0 if there is no spread/variability (all values the same)
- not resistant, even more sensitive than the mean
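A minimal sketch of the variance and standard deviation calculation described above, dividing by n - 1. The function name `sample_variance` and the data are made up for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # these always sum to 0
    return sum(d ** 2 for d in deviations) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data, mean is 5
variance = sample_variance(data)
std_dev = math.sqrt(variance)                # standard deviation = square root of variance

no_spread = sample_variance([3, 3, 3])       # all values equal, so no variability
print(variance, std_dev, no_spread)
```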
- five number summary great to use if distribution is skewed
- graph gives best overall picture of distribution, while reporting just center and spread won’t give you any idea of shape; always plot data
- Changing the unit
- can change units with a linear transformation, $x_{\text{new}} = a + bx$
- a: shifts all values of x up or down, b changes size of unit
- how the spread and center change
- IQR and standard deviation do not change when we add the same number a to all observations
- multiplying each observation by b multiplies the measures of center (mean and median) by b and the measures of spread (standard deviation and IQR) by |b|
- adding number a to each observation adds a to measure of center and to quartiles, but doesn’t change spread
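These rules can be checked on a familiar change of units, Celsius to Fahrenheit, where a = 32 and b = 1.8. The temperatures and the helper names `iqr` and `summarize` are made up, and the quartiles again assume the `method="exclusive"` convention of `statistics.quantiles`.

```python
import statistics

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

def summarize(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": iqr(values),
    }

celsius = [10, 15, 20, 25, 30]               # made-up data
a, b = 32, 1.8
fahrenheit = [a + b * x for x in celsius]    # x_new = a + b*x

before = summarize(celsius)
after = summarize(fahrenheit)

# Adding a constant alone: center shifts, spread stays the same.
shifted = summarize([x + 100 for x in celsius])

for name in before:
    print(name, before[name], after[name])
```

The mean and median come out as a + b times the old values, while the standard deviation and IQR are only multiplied by b.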
- Data Analysis Toolbox
- To answer statistical question of interest involving data sets....
- organize the data and identify who are the individuals, what are the variables and units, why the data is collected, and when, where, how, and by whom the data is produced
- graph the data
- do the summaries
- interpret the data and what it means, then answer the question
More vocab and concepts for Ch 1