Statistical Analysis and Probability: Data Analysis and Hypothesis Testing, Exams of Statistics

New England College (NEC), Statistics

A comprehensive guide to statistical analysis, focusing on categorical and quantitative data, data visualization, and hypothesis testing. Topics covered include histograms, skewed data, crf plots, statistical analysis, standard deviation, bivariate data, data transformation, simpson's paradox, experimental and observational studies, sampling methods, treatments, confounding variables, lurking variables, placebo effect, relative frequencies, random variables, discrete random variables, geometric distribution, probabilities, expected value, variance, standard deviation, chi-square independence or homogeneity test, one-sample mean z test, two-sample mean t test, proportion z interval, t-interval for slope, confidence intervals, and properties of the normal curve.

Typology: Exams

2023/2024

Available from 05/08/2024

AP Statistics Exam Review

Questions with Answers

2024.

What is a dotplot?
A graphical display which shows a dot for each data point. It's good for categorical data (i.e., data classified into categories).

What's the difference between categorical and quantitative data?
Categorical data fits into various categories, whereas quantitative data has numerical values associated with it.

What is a bar chart?
A display for categorical data which indicates frequencies or percents for each category.

What are histograms?
Histograms are good for large quantitative data sets. The horizontal axis has numbers either at the left/right edges of each bar, to show the amount of data in between each pair of values, or at the center of each bar, to show the amount of data at a certain value. Sometimes the vertical axis is just the frequency, but often it is the relative frequency (i.e., amount/total).

What do relative areas in histograms mean?
Relative areas correspond to relative frequencies (i.e., if 10% of the area of a histogram is between 25 and 26, that means that 10% of the data falls between 25 and 26).
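As a rough illustration of relative frequency in a histogram, the sketch below bins a small data set into bars of width 1 and divides each count by the total. The data values are invented for the example, and the bin width is an assumption.

```python
from collections import Counter

# Invented data values, for illustration only.
data = [23.1, 24.5, 24.9, 25.2, 26.3, 26.8, 27.0, 27.4, 28.6, 29.9]
bin_width = 1  # assumed bar width

# Label each bar by its left edge, e.g. 25 means "25 up to (not including) 26".
counts = Counter(int(x // bin_width) * bin_width for x in data)

# Relative frequency = amount in the bar / total amount of data.
relative_freq = {left: count / len(data) for left, count in sorted(counts.items())}

for left, share in relative_freq.items():
    print(f"{left}-{left + bin_width}: {share:.0%}")
```

With these values, one of the ten points falls between 25 and 26, so that bar holds 10% of the data and, with equal-width bars, 10% of the histogram's area.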

What's a stemplot/stem and leaf plot?
It has stems, which are some digit(s), and leaves, which are the other part of the number. For example, depending on context, 5|7 could be 57, 5.7, or some other variant, which is why a key must always be included. It's good for looking at individual data in small data sets.

What is important in analyzing visual data displays?
SOCS (Shape, Outliers, Center, Spread):
Shape: How is the data shaped (skewed left/right, symmetric, bimodal, etc.)? Are there any clusters (subgroups which the data falls into)? Are there any gaps in the data set?
Outliers: Are there any outliers within the data set?
Center: Give the mean/median, the value which is the approximate midpoint of the data.
Spread: What is the range OR IQR (if it's easy to find) of the data set?

What is a mode? How do modes relate to unimodal/bimodal data sets?
A mode is a major peak in the data (most repeated value). A unimodal data set has just one mode, whereas a bimodal data set has two modes.

What are some possible descriptions of shapes within distributions?

Descriptive statistics means summarizing averages, the shape of a distribution, etc., while statistical analysis means drawing inferences from limited data.

What are the two main ways of measuring center?
The median (the middle number of a set when arranged in order) and the mean (the sum of the values in a set divided by the number of values in that set).

When does it make more sense to use the median over the mean?
When there are outliers whose influence we want to minimize. We say the median is RESISTANT to outliers (which means it's barely affected by them).

What are the notations for mean of a population and mean of a sample?
The population mean is written μ and the sample mean is written x̄ (x bar). The sample mean usually assumes a simple random sample. The mean is computed by ∑x/n.

What are the ways of describing variability/dispersion of the measurements?

Range: the difference between the largest and smallest values.
IQR: the difference between the largest and smallest values after removing the lower and upper quarters. There are two ways of computing this:
1) Simply take out the upper and lower quarters of the data and subtract.
2) Find Q1 by taking the median of the lower half and Q3 by taking the median of the upper half (median itself must be included if there are an odd number of points). Then do Q3 - Q1 to get the IQR.
These should be equivalent if there are many data points.
Variance: an average of squared distances from the mean.
Standard deviation: the square root of the variance.

What is the rule for designating outliers?
Outliers are considered to be any value above Q3 + 1.5 × IQR OR any value below Q1 - 1.5 × IQR.

How is the variance calculated for a population? How is it calculated for a sample?
For a population, you sum up all of the squares of the deviations from the mean and divide by the number of terms. You do the same thing for a sample but divide by (number of terms - 1) due to degrees of freedom.
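The measures of spread above can be checked on a small example. This is only a sketch: the data values are invented, and an even number of points is used so that the lower and upper halves split cleanly without deciding what to do with the median.

```python
import statistics

# Invented data values, sorted; an even count so the halves split cleanly.
data = sorted([2, 4, 4, 5, 7, 8, 9, 30])
n = len(data)
half = n // 2

data_range = data[-1] - data[0]

# Q1 and Q3 as the medians of the lower and upper halves.
q1 = statistics.median(data[:half])
q3 = statistics.median(data[half:])
iqr = q3 - q1

# 1.5 x IQR rule for designating outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

# Population variance divides by n; sample variance divides by n - 1.
pop_var = statistics.pvariance(data)
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)  # square root of the sample variance

print("range:", data_range, "IQR:", iqr, "outliers:", outliers)
print("population variance:", pop_var, "sample variance:", sample_var)
```

Here the value 30 lies above the upper fence and is flagged as an outlier. The sample variance comes out larger than the population variance because the same sum of squared deviations is divided by n - 1 instead of n.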

The deciles have ranks of 10% and 90%.

What is the formula for a z score?
z = (x - mean) / (standard deviation). This shows the number of standard deviations a value lies away from the mean. Also, if you're given a z score, the mean, and the standard deviation, you can solve for an x value.

What is the empirical rule?
The empirical rule says that for symmetric, bell-shaped data, 68% of the data lies within 1 standard deviation of the mean, 95% lies within 2 standard deviations of the mean, and 99.7% lies within 3 standard deviations of the mean.

How is the empirical rule related to range?
The empirical rule can help flag arithmetic errors, as the range should be somewhere between 4 times the standard deviation and 6 times the standard deviation.

How does skewed data affect how the mean compares to the median?
If data is skewed to the left, the mean is usually lower than the median. If data is skewed to the right, the mean is usually higher than the median.

What is a boxplot?

It displays the 5 number summary, with a whisker out to the highest value, a line at Q3, a line for the median, a line at Q1, and a whisker out to the lowest value. Alternatively, outliers can be depicted as dots on the boxplot, and the whiskers just go to the highest/lowest values not considered to be outliers.

What is the effect on mean, median, range, and standard deviation of adding a certain amount or multiplying by a certain amount to every value in the data set?
Adding: Changes the mean & median by that amount but doesn't change the range or standard deviation.
Multiplying: Changes mean, median, range, and standard deviation all by that same factor.

What are some graphical methods of comparing distributions?

1) Dotplots either above or next to each other for each distribution.
2) Double bar charts with bars next to each other to make the comparison.
3) Back to back stemplots with leaves going out to either side.
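Two of the earlier answers, the z score and the effect of adding to or multiplying every value, are easy to verify numerically. The sketch below uses invented scores and the population standard deviation; the sample standard deviation would behave the same way under shifting and scaling.

```python
import statistics

def summary(values):
    """Return the center and spread measures discussed above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),  # population SD, an assumption for this sketch
    }

scores = [60, 70, 80, 90, 100]  # invented values

base = summary(scores)
shifted = summary([x + 5 for x in scores])  # add 5 to every value
scaled = summary([x * 2 for x in scores])   # multiply every value by 2

# z score: number of standard deviations a value lies from the mean.
x = 100
z = (x - base["mean"]) / base["sd"]

# Solving back for x from z, the mean, and the standard deviation.
x_back = base["mean"] + z * base["sd"]

print("base:   ", base)
print("shifted:", shifted)
print("scaled: ", scaled)
print("z for x = 100:", round(z, 3))
```

Adding 5 moves the mean and median by 5 and leaves the range and standard deviation alone, while doubling every value doubles all four.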

What is r²?
r² is called the coefficient of determination and gives the percentage of variation in y explained by x. One must be careful when finding r from r²: r takes the same sign as the slope of the regression line.

What is the least squares regression line?
It's the best fitting line in the sense that it minimizes the sum of the squares of the residuals. Its equation can be determined because it goes through the mean of x (x bar) and the mean of y (y bar), and its slope is b1 = r × (sy/sx), where sy is the standard deviation of y and sx is the standard deviation of x.

What is the equation for the line comparing z scores of y to z scores of x?
zy = r × zx

What's the difference between interpolation and extrapolation?
Interpolation is predicting inside the range of your data, which is good. Extrapolation is predicting outside the range of your data and is risky, as you don't know whether the linear trend will continue.

What does y hat really indicate?

The mean prediction for each x value (there could be a variety of y values at a given x, so it simply gives the mean).

What is a residual plot?
Observed value minus predicted value gives the residual. A residual plot puts the residuals on the y axis and the x values on the x axis.

What is the mean and standard deviations of residuals?
The mean of the residuals is always 0. The standard deviation of the residuals is given by the following formula: s = sqrt( ∑(y - y hat)² / (n - 2) ). The standard deviation of residuals indicates a typical residual value. In computer output, it's given by S.

What are you looking for in a residual plot?
Small, balanced residuals which don't show any kind of curve/pattern.

What are outliers and influential points in regression?
Outliers deviate from the overall pattern. Influential points sharply change the slope of the regression line.

How do you transform data to make it linear?
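Returning to the regression answers above (the slope formula, residuals, and r²), here is a minimal sketch that builds the least squares line from r, sx, and sy rather than calling a fitting routine. The x and y values are invented, and the variable names are only illustrative.

```python
import math
import statistics

# Invented bivariate data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Correlation r: the average product of z scores (dividing by n - 1).
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

# Slope b1 = r * (sy / sx); the line passes through (x bar, y bar).
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y (y hat).
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

# Standard deviation of the residuals, dividing by n - 2.
s_resid = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

r_squared = r ** 2

print(f"y hat = {b0:.3f} + {b1:.3f} x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, s = {s_resid:.4f}")
print("sum of residuals:", round(sum(residuals), 10))
```

The residuals sum to zero up to rounding, which matches the statement that their mean is always 0, and r² equals one minus the ratio of residual variation to total variation in y.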

attack severity). This information can be displayed in a bar chart.

What are conditional relative frequencies?
Each value divided by the marginal frequency of its row or column. For example, you could divide the number of nonfatal heart attacks with low cholesterol by the total number of nonfatal heart attacks. This information can be displayed in side by side bar charts or, alternatively, in segmented bar charts in order to gauge association.

What is perfect independence in two way contingency tables?
Perfect independence is when the conditional relative frequencies all match up. However, even if two variables are completely independent, sample data may not necessarily show perfect independence.

What is Simpson's paradox?
Simpson's paradox is when the results from a combined grouping contradict the results for the individual groups (due to lurking variables). For example, if there are two doctors and you're comparing survival rates, you may initially conclude that one doctor is better than the other (based on combined survival rate). However, if you split these groups into good & bad condition of the patients that they're treating, you may come to the opposite conclusion.

What is a census? What are the advantages/disadvantages of a census?
A census is a complete enumeration of the population. It's ideal because you manage to capture everybody. However, it can be very time consuming and costly. Also, it would be far better to take a sample and do it well than to conduct a poorly run census.

What is a sample survey?
A sample survey takes just a part of the whole population to survey.

What's necessary for a good sample survey?
Avoiding bias, which is frequently achieved by randomization. Also, a large sample size gives more validity to the results (NOTE: It's the actual size that matters, not the percentage of the population. A group of 500 in a population of 100,000 is just as good as a group of 500 in a population of 1,000,000).

What is an experiment?
The researchers divide subjects into appropriate groups. Most often there is a treatment group which receives the treatment and a control group which does not (often receiving a placebo).

What are the facets of a well designed experiment?
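Returning to Simpson's paradox above, the two-doctor example can be made concrete with numbers. The counts below are entirely invented to produce the reversal; they are not from the source.

```python
# (survived, treated) per doctor and patient condition. Invented counts.
counts = {
    "Doctor A": {"good": (18, 20), "bad": (40, 80)},
    "Doctor B": {"good": (68, 80), "bad": (8, 20)},
}

def rate(survived, treated):
    """Survival rate as a fraction of patients treated."""
    return survived / treated

# Survival rate within each patient condition.
by_condition = {
    doctor: {cond: rate(*pair) for cond, pair in groups.items()}
    for doctor, groups in counts.items()
}

# Survival rate after combining both conditions.
combined = {
    doctor: rate(
        sum(s for s, _ in groups.values()),
        sum(t for _, t in groups.values()),
    )
    for doctor, groups in counts.items()
}

for doctor in counts:
    print(doctor, by_condition[doctor], "combined:", combined[doctor])
```

Doctor A has the higher survival rate among patients in good condition and among patients in bad condition, yet Doctor B has the higher combined rate, because Doctor A treats far more patients in bad condition. Patient condition is the lurking variable.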

What is a simple random sample? What are some ways to get a simple random sample?
In a simple random sample, every possible sample of the chosen size (and therefore every participant) has an equal chance of being selected. The best ways to generate a simple random sample are via random digit tables or having a computer generate random samples. One thing you have to be careful of is that you might not have a complete listing of the population, in which case randomness is not ensured.

Are other sampling techniques (stratified, cluster, etc.) just subsets of simple random sampling?
NO!!! In these techniques, not every possible group of participants has an equal chance of being selected.

What is sampling error?
No matter how well designed a survey is, it still gives a sample statistic as an estimate of a population parameter, so we're always bound to have some error. Generally, the error tends to be smaller when the sample size is larger, unless the survey was badly conducted.

What are some common types of biases?
Bias is defined as a tendency to favor certain members of a population. The following are the main types of bias:
Household bias: only one member of a household responds, so large households are underrepresented.
Nonresponse bias: people don't respond to surveys or are too difficult to contact, thus creating a source of bias.
Quota sampling bias: interviewers are at liberty to pick people (i.e., a specific percentage Catholic, a specific percentage African-American, etc.).
Response bias: people may lie/be untruthful when responding, especially when they're not anonymous and their views are unsavory.
Selection bias: for example, a newspaper interviewed just people with cars and telephones in a presidential election and predicted a landslide victory for the wrong person, because the people owning cars and telephones were wealthy and tended to vote Republican.
Size bias: for instance, if you have a student pick a coin out of a bag to estimate the monetary value, throw a dart at a map, etc. This favors large states, large coins, etc.
Undercoverage bias: inadequate representation. For instance, there were phone surveys to landlines which left out people who only had cell phones. Another instance of this is convenience samples,

school classes to survey.
Multistage sampling: there are two or more steps, each of which involves any of the other sampling techniques. For instance, some organizations randomly select nationwide locations, then randomly pick neighborhoods in each of these locations, then randomly pick households in each of these neighborhoods.

What is an experiment vs. an observational study vs. a survey?
An experiment is when a treatment or change is assigned. An observational study is when we observe or measure something which is occurring. A sample survey is a particular type of observational study in which we look at a sample.

What are explanatory and response variables? What are treatments?
Explanatory variables (called factors) are what is being changed/tested and are believed to have an effect on the response variable (which is what is being measured). Treatments consist of factor-level combinations (for instance, you could have two factors and 3 levels of each factor for a total of 3 × 3 = 9 treatments).
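To illustrate how treatments arise as factor-level combinations, the sketch below enumerates them for two hypothetical factors with 3 levels each and then randomly assigns subjects to treatments. The factor names, level names, subject labels, and seed are all assumptions made for the example.

```python
import itertools
import random

# Two hypothetical factors with 3 levels each (names are invented).
factors = {
    "dose": ["low", "medium", "high"],
    "exercise": ["none", "moderate", "intense"],
}

# Every factor-level combination is one treatment: 3 x 3 = 9.
treatments = list(itertools.product(*factors.values()))

# Randomly assign 18 hypothetical subjects, 2 per treatment.
subjects = [f"subject_{i}" for i in range(1, 19)]
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
rng.shuffle(subjects)
assignment = {
    treatment: subjects[i::len(treatments)]
    for i, treatment in enumerate(treatments)
}

print("number of treatments:", len(treatments))
for treatment, group in assignment.items():
    print(treatment, group)
```

Shuffling the subject list before dealing it out is one simple way to randomize the assignment, so that no subject's characteristics influence which treatment they receive.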

What is confounding? What are lurking variables? How can both of these effects be overcome?
Confounding is when there's uncertainty with regard to which variable is causing a given set of results (for instance, if two or more variables are being altered). A lurking variable is a variable driving two other variables (for instance, those with higher shoe sizes have higher reading levels not because of their shoe size but because of the lurking variable of age). This can also be described as a common response, in that the lurking variable and the measured variable seem to be producing the same response.

What is a control group? What is the placebo effect? How can the placebo effect be minimized?
A control group is one which doesn't receive the treatment, and the treatment group receives the treatment. People can be randomly assigned to control & treatment groups in order to minimize confounding/lurking variables. The placebo effect is when people respond to any treatment (for instance, they might report that a sugar pill makes them feel much better). This can be overcome by either single-blinding, in which the subjects don't know what they're receiving, or double-blinding, in which neither subjects nor
