COMM 291 – Final Formula Sheet

Chapter 5: Displaying and Describing Quantitative Data

Measures of Centre (Excel function in parentheses):
Mean: x̄ = Σxᵢ / n  (=AVERAGE)
Median: the middle value, at position (n + 1)/2 of the ordered data  (=MEDIAN)
Mode: the highest point in the histogram  (=MODE.SNGL)

Measures of Spread:
Range: maximum value − minimum value
Standard deviation: s = √( Σ(xᵢ − x̄)² / (n − 1) )  (=STDEV.S)
Percentiles: 75th percentile = 3rd or upper quartile = Q3; 25th percentile = 1st or lower quartile = Q1  (=QUARTILE)
Interquartile range: IQR = Q3 − Q1
Outliers: greater than Q3 + 1.5(IQR), or less than Q1 − 1.5(IQR)
Extreme outliers: lower outer fence = Q1 − 3(IQR); upper outer fence = Q3 + 3(IQR)
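As a quick numerical check of these summary measures, here is a minimal Python sketch (assumes NumPy; the `data` values are hypothetical):

```python
import numpy as np

data = np.array([12, 15, 15, 18, 21, 22, 25, 30, 41])  # hypothetical sample

mean = data.mean()                       # x-bar = sum(x_i) / n
median = np.median(data)                 # middle value of the ordered data
s = data.std(ddof=1)                     # sample standard deviation, n - 1 in the denominator

q1, q3 = np.percentile(data, [25, 75])   # lower and upper quartiles
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr             # values below this are outliers
upper_fence = q3 + 1.5 * iqr             # values above this are outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(mean, median, s, q1, q3, iqr, outliers)
```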

Chapter 6: Correlation and Linear Regression

Measure, formula, and Excel:
Correlation coefficient: r = [1/(n − 1)] Σ [ (xᵢ − x̄)/Sx ][ (yᵢ − ȳ)/Sy ]  (=CORREL)
Residuals: eᵢ = yᵢ − ŷᵢ  (observed y − predicted y)
Least squares regression: ŷ = b₀ + b₁x  (Data Analysis ToolPak, 'Regression')
b₁ = r·Sy / Sx  (Data Analysis ToolPak, 'Regression')
b₀ = ȳ − b₁x̄  (Data Analysis ToolPak, 'Regression')
R-squared: r²  (Data Analysis ToolPak, 'Regression'; also (=CORREL)²)
Standard deviation of the residuals: sₑ = √( Σeᵢ² / (n − 2) )
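The same simple-regression quantities in a short Python sketch (assumes NumPy; the x and y values are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.7, 12.3])
n = len(x)

r = np.corrcoef(x, y)[0, 1]              # correlation coefficient (=CORREL)
b1 = r * y.std(ddof=1) / x.std(ddof=1)   # slope: b1 = r * Sy / Sx
b0 = y.mean() - b1 * x.mean()            # intercept: b0 = y-bar - b1 * x-bar

y_hat = b0 + b1 * x                      # predicted values
e = y - y_hat                            # residuals: observed - predicted
s_e = np.sqrt((e ** 2).sum() / (n - 2))  # standard deviation of the residuals
r_squared = r ** 2

print(b0, b1, r_squared, s_e)
```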

Chapter 8: Random Variables and Probability Models

COMM 290 review: if X is a random variable:

Expected value: E(X) = Σ x·P(x)  (=SUMPRODUCT); for a binomial, E(X) = μ = np
Variance: Var(X) = Σ (x − μ)²·P(x); for a binomial, σ² = npq, where q = 1 − p
Standard deviation: σ = √Var(X)  (=STDEV.S)

Linear transformation: a + bX
Expected value: E(a + bX) = a + b·E(X)
Variance: Var(a + bX) = b²·Var(X)
Standard deviation: SD(a + bX) = |b|·SD(X)

Sum of two INDEPENDENT random variables: X + Y
Expected value: E(X + Y) = E(X) + E(Y)  (mean of a sum = sum of the means)
Variance: Var(X + Y) = Var(X) + Var(Y)
Standard deviation: SD(X + Y) = √( Var(X) + Var(Y) )

Difference of two INDEPENDENT random variables: X − Y
Expected value: E(X − Y) = E(X) − E(Y)
Variance: Var(X − Y) = Var(X) + Var(Y)  (the variances still add)
Standard deviation: SD(X − Y) = √( Var(X) + Var(Y) )

Linear combination of two INDEPENDENT random variables: aX + bY
Expected value: E(aX + bY) = a·E(X) + b·E(Y)
Variance: Var(aX + bY) = a²·Var(X) + b²·Var(Y)
Standard deviation: SD(aX + bY) = √( a²·Var(X) + b²·Var(Y) )
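A minimal sketch of these rules in Python (assumes NumPy; the distribution of X, the constants a and b, and E(Y)/Var(Y) are all hypothetical):

```python
import numpy as np

# Hypothetical discrete random variable X with outcomes and probabilities
x_vals = np.array([0, 1, 2, 3])
p_x    = np.array([0.1, 0.3, 0.4, 0.2])

ex  = (x_vals * p_x).sum()               # E(X) = sum of x * P(x)   (=SUMPRODUCT)
var = ((x_vals - ex) ** 2 * p_x).sum()   # Var(X) = sum of (x - mu)^2 * P(x)
sd  = np.sqrt(var)                       # SD(X) = sqrt(Var(X))

# Linear transformation a + bX
a, b = 5.0, 2.0
e_lin   = a + b * ex                     # E(a + bX) = a + b E(X)
var_lin = b ** 2 * var                   # Var(a + bX) = b^2 Var(X)

# Linear combination aX + bY with an INDEPENDENT Y (hypothetical E(Y), Var(Y))
ey, var_y = 1.5, 0.8
e_comb   = a * ex + b * ey               # E(aX + bY) = a E(X) + b E(Y)
var_comb = a ** 2 * var + b ** 2 * var_y # Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)
sd_comb  = np.sqrt(var_comb)

print(ex, var, sd, e_lin, var_lin, e_comb, var_comb, sd_comb)
```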

Inference and Sampling Distribution for Proportions

When to use: when the variable of interest is binary (success/failure), i.e., when making inferences about a single proportion.

Where p = the proportion of successes and q = 1 − p = the proportion of failures:

One-sample z-test (Excel):
Parameter: p
Estimate: p̂ = X / n  (X = number of successes, n = sample size)
Model: one-sample z-test
SE (estimate of the standard deviation): SD(p̂) = √( p₀q₀ / n )
Null hypothesis: H₀: p = p₀
Alternative hypothesis: Hₐ: p ≠ p₀  (can be one-sided: Hₐ: p > p₀ or Hₐ: p < p₀)
Test statistic: z = (p̂ − p₀) / √( p₀q₀ / n )
P-value:
  Hₐ: p > p₀: P = Pr(Z > z-stat) = 1 − NORM.S.DIST(z-stat, TRUE)
  Hₐ: p < p₀: P = Pr(Z < z-stat) = NORM.S.DIST(z-stat, TRUE)
  Hₐ: p ≠ p₀: P = 2 × Pr(Z < −|z-stat|) = 2 × NORM.S.DIST(−ABS(z-stat), TRUE)
Critical values (z), if you cannot find the p-value*  (=NORM.S.INV(prob)):
  Sig level 5%: one-sided 1.645, two-sided 1.96
  Sig level 1%: one-sided 2.33, two-sided 2.58
  Sig level 0.1%: one-sided 3.09, two-sided 3.29
Confidence interval (always two-sided): p̂ ± z*·√( p̂(1 − p̂) / n )
Sample size for a certain confidence interval: n = (z*)²·p̂(1 − p̂) / (ME)²; as an approximation with a 95% CI, n = 1 / (ME)²
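A minimal one-sample proportion z-test in Python (assumes NumPy and SciPy; the counts and p₀ are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data: 58 successes out of 100, testing H0: p = 0.5
x, n, p0 = 58, 100, 0.5
p_hat = x / n

se0 = np.sqrt(p0 * (1 - p0) / n)         # SD(p-hat) under H0
z = (p_hat - p0) / se0                   # test statistic

p_two_sided = 2 * stats.norm.sf(abs(z))  # 2 * Pr(Z > |z|)
p_upper = stats.norm.sf(z)               # Pr(Z > z), for HA: p > p0

z_star = stats.norm.ppf(0.975)           # critical value for a 95% CI (1.96)
me = z_star * np.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - me, p_hat + me)            # p-hat +/- z* sqrt(p-hat (1 - p-hat) / n)

print(z, p_two_sided, p_upper, ci)
```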

Two-Sample z-test

When to use: when a binary predictor (2 groups) is crossed with a binary outcome (2 categories), i.e., when comparing two proportions.

(Excel)
Parameter: p₁ − p₂, the mean of (p̂₁ − p̂₂)
Estimate: p̂₁ − p̂₂, where p̂₁ = X₁ / n₁ and p̂₂ = X₂ / n₂
Model: two-sample z-test
SE (estimate of the standard deviation): √( p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂ )
Null hypothesis: H₀: p₁ − p₂ = Δ₀
Alternative hypothesis: Hₐ: p₁ − p₂ ≠ Δ₀
Test statistic: z = [ (p̂₁ − p̂₂) − Δ₀ ] / √( p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂ )
  If Δ₀ = 0, we use a pooled p̂: since p̂₁ = X₁/n₁ and p̂₂ = X₂/n₂, the pooled estimate is p̂ = (X₁ + X₂)/(n₁ + n₂), and
  z = (p̂₁ − p̂₂) / √( p̂q̂ (1/n₁ + 1/n₂) ), where q̂ = 1 − p̂
P-value:
  Hₐ: p₁ − p₂ > Δ₀: P = Pr(Z > z-stat) = 1 − NORM.S.DIST(z-stat, TRUE)
  Hₐ: p₁ − p₂ < Δ₀: P = Pr(Z < z-stat) = NORM.S.DIST(z-stat, TRUE)
  Hₐ: p₁ − p₂ ≠ Δ₀: P = 2 × Pr(Z < −|z-stat|) = 2 × NORM.S.DIST(−ABS(z-stat), TRUE)
Critical values (z), if you cannot find the p-value*  (=NORM.S.INV(prob)):
  Sig level 5%: one-sided 1.645, two-sided 1.96
  Sig level 1%: one-sided 2.33, two-sided 2.58
  Sig level 0.1%: one-sided 3.09, two-sided 3.29
Confidence interval (always two-sided): (p̂₁ − p̂₂) ± z*·√( p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂ )
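A minimal two-proportion z-test in Python (assumes NumPy and SciPy; the counts are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical counts: group 1 has 45/80 successes, group 2 has 30/75
x1, n1 = 45, 80
x2, n2 = 30, 75
p1, p2 = x1 / n1, x2 / n2

# Pooled proportion, used because we test H0: p1 - p2 = 0
p_pool = (x1 + x2) / (n1 + n2)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pool
p_value = 2 * stats.norm.sf(abs(z))      # two-sided p-value

# The confidence interval uses the unpooled SE
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = stats.norm.ppf(0.975)
ci = ((p1 - p2) - z_star * se, (p1 - p2) + z_star * se)

print(z, p_value, ci)
```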

Two-sample t-test of Independent Means

Each characteristic below is given for the unequal (or separate) variance version and the equal (or pooled) variance version, with Excel notes:

Parameter: μ₁ − μ₂
Estimate: x̄₁ − x̄₂
SE:
  Unequal: √( s₁²/n₁ + s₂²/n₂ )  (you cannot add standard deviations; add the variances)
  Pooled: sₚ·√( 1/n₁ + 1/n₂ ), where sₚ² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2) is the pooled variance
Null hypothesis: H₀: μ₁ − μ₂ = Δ₀
Alternative hypothesis: Hₐ: μ₁ − μ₂ ≠ 0  (can be one-sided)
When to use the pooled version: when the population variances can be assumed equal (the sample standard deviations are similar)
Test statistic:
  Unequal: t = [ (x̄₁ − x̄₂) − Δ₀ ] / √( s₁²/n₁ + s₂²/n₂ )
  Pooled: t = [ (x̄₁ − x̄₂) − Δ₀ ] / [ sₚ·√( 1/n₁ + 1/n₂ ) ]  (Excel: Regression)
Distribution: approximate (unequal) vs. exact (pooled)
Degrees of freedom:
  Unequal:
  • Simplest choice: df = minimum of (n₁ − 1, n₂ − 1).
  • Use the sum of (n₁ − 1) and (n₂ − 1).
  • A lengthy formula that we will only use if we do the computations via Excel software.
  Pooled: df = n₁ + n₂ − 2
Both: if n₁ = n₂, the two test statistics are the same. The P-value, however, is not.
  • In the final, they should be the same.
P-value (Excel): =T.DIST.2T(t-stat, degrees of freedom) for a two-sided test; =T.DIST.RT(t-stat, degrees of freedom) for a one-sided (right-tail) test
Critical value: =T.INV.2T(sig level, df)  (Excel: Regression)
Confidence interval:
  Unequal: (x̄₁ − x̄₂) ± t*·√( s₁²/n₁ + s₂²/n₂ ), with df = min(n₁ − 1, n₂ − 1)
  Pooled: (x̄₁ − x̄₂) ± t*·sₚ·√( 1/n₁ + 1/n₂ ), with df = n₁ + n₂ − 2  (Excel: Regression)
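A minimal Python sketch of both versions (assumes NumPy and SciPy; the two samples are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical samples
x1 = np.array([23.1, 25.4, 22.8, 26.0, 24.5, 23.9])
x2 = np.array([21.0, 22.3, 20.8, 23.1, 21.9])
n1, n2 = len(x1), len(x2)

# Unequal-variance version: SE adds the variances, never the standard deviations
se_unequal = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
t_unequal = (x1.mean() - x2.mean()) / se_unequal

# Pooled version: assumes equal population variances
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_pooled = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=n1 + n2 - 2)

# SciPy can run both tests directly
t_w, p_w = stats.ttest_ind(x1, x2, equal_var=False)  # unequal (Welch)
t_p, p_p = stats.ttest_ind(x1, x2, equal_var=True)   # pooled

print(t_unequal, t_pooled, p_pooled, (t_w, p_w), (t_p, p_p))
```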

Two-sample t-test of dependent means (paired t-test)

When to use: the same sample measured in two different situations. Example: the same person's midterm and final results.

(Excel)
Parameter: μd, the mean of the differences dᵢ = x₁ᵢ − x₂ᵢ
Estimate: the sample mean of the differences, d̄
Model: paired t-test
sd (estimate of the standard deviation): compute the differences and then find their standard deviation in Excel
SE: sd / √n
Null hypothesis: H₀: μd = 0
Alternative hypothesis: Hₐ: μd ≠ 0, Hₐ: μd > 0, or Hₐ: μd < 0
Test statistic: t = d̄ / ( sd / √n ), with n − 1 degrees of freedom
P-value:
  Hₐ: μd > 0: Pr(t > t-stat) = T.DIST.RT(t-stat, df)
  Hₐ: μd < 0: Pr(t < t-stat) = T.DIST(t-stat, df, TRUE)
  Hₐ: μd ≠ 0: 2 × Pr(t > |t-stat|) = T.DIST.2T(ABS(t-stat), df)
Critical values (t, for the CI)*: =T.INV.2T(prob, df)
Confidence interval (always two-sided): d̄ ± t*·sd / √n, with n − 1 degrees of freedom
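A minimal paired t-test in Python (assumes NumPy and SciPy; the midterm/final scores are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical paired data: the same students' midterm and final scores
midterm = np.array([68, 75, 82, 59, 90, 71, 77])
final   = np.array([72, 74, 88, 65, 93, 70, 83])

d = final - midterm                      # differences d_i = x1_i - x2_i
n = len(d)
d_bar = d.mean()                         # sample mean of the differences
s_d = d.std(ddof=1)                      # standard deviation of the differences

t_stat = d_bar / (s_d / np.sqrt(n))      # t = d-bar / (s_d / sqrt(n)), df = n - 1
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)

t_star = stats.t.ppf(0.975, df=n - 1)    # critical value for a 95% CI
ci = (d_bar - t_star * s_d / np.sqrt(n), d_bar + t_star * s_d / np.sqrt(n))

# SciPy equivalent
t_check, p_check = stats.ttest_rel(final, midterm)

print(t_stat, p_two_sided, ci, (t_check, p_check))
```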

Types of Errors in Hypothesis Testing

Decision vs. true state:
H₀ true: Accept H₀ is correct; Reject H₀ is a Type I error (false positive).
H₀ false: Accept H₀ is a Type II error (false negative); Reject H₀ is correct.

Chi-square test of two way tables (See summary for steps)

When to use:
• Test of independence
• Test of homogeneity

A 2×2 table can be tested using either a two-sample z-test for two proportions or a χ² test.
• The chi-square test does not have a confidence interval.
• The only time the z-test HAS to be the one chosen is when you need to later compute a confidence interval.

(Excel)
Parameter: NO PARAMETERS
Model: chi-square test
Null hypothesis: H₀: the row and column classifications are independent (or unassociated, or unrelated)
Alternative hypothesis: Hₐ: the row and column classifications are dependent (or associated, or related)
Expected counts: Exp = (row total × column total) / grand total
Test statistic: χ² = Σ (Obs − Exp)² / Exp, where the sum is taken over all the cells in the table
P-value (always one-sided): P = Pr(χ² > χ²-stat), with degrees of freedom = (r − 1)(c − 1), where r = number of rows and c = number of columns  (=CHISQ.DIST.RT(test statistic, degrees of freedom))
Critical values (χ²)*: =CHISQ.INV.RT(prob, df)
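A minimal chi-square test of a two-way table in Python (assumes NumPy and SciPy; the observed counts are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 two-way table of observed counts
obs = np.array([[30, 45, 25],
                [20, 35, 45]])

row_tot = obs.sum(axis=1, keepdims=True)
col_tot = obs.sum(axis=0, keepdims=True)
grand = obs.sum()

exp = row_tot * col_tot / grand               # expected = row total * column total / grand total
chi2 = ((obs - exp) ** 2 / exp).sum()         # chi-square statistic, summed over all cells
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)  # (r - 1)(c - 1)
p_value = stats.chi2.sf(chi2, df)             # always one-sided: Pr(chi-square > stat)

# SciPy equivalent
chi2_check, p_check, df_check, exp_check = stats.chi2_contingency(obs, correction=False)

print(chi2, df, p_value, (chi2_check, p_check, df_check))
```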

Chapter 16: Introduction to Statistical Models

The simple linear regression model is Y = β₀ + β₁X + ε, where:
Y is the unknown dependent variable (outcome)
X is the known independent variable (predictor)
ε is a random variable representing random error
β₀ and β₁ are parameters

Inference for the two parameters (slope β₁ and intercept β₀):
Parameter: β₁ | β₀
Estimate: b₁ = r·Sy / Sx | b₀ = ȳ − b₁x̄
Mean of the sampling distribution: β₁ | β₀
SE: SE(b₁) = sₑ / √( Σ(xᵢ − x̄)² ) = sₑ / ( sx·√(n − 1) )
    SE(b₀) = sₑ·√( 1/n + x̄² / ((n − 1)sx²) ) = sₑ·√( 1/n + x̄² / Σ(xᵢ − x̄)² )
sₑ²: sₑ² = Σeᵢ² / (n − 2)
Sampling distribution: Normal
Null hypothesis: H₀: β₁ = 0, i.e.:
  1. No linear association/trend between X and Y.
  2. X is not a useful predictor of Y.
  3. Knowing X does not tell you anything about Y.
  4. The model is not worthwhile.
  (For the intercept: H₀: β₀ = 0, not often tested.)
Alternative hypothesis: Hₐ: β₁ ≠ 0 (always two-sided for this case); Hₐ: β₀ ≠ 0
Test statistics: t = b₁ / SE(b₁) and t = b₀ / SE(b₀), each with n − 2 degrees of freedom
P-value: P = 2 × Pr(t > |t-stat|) in both cases
Confidence intervals:
  β₁: b₁ ± t*·sₑ / ( sx·√(n − 1) ), with n − 2 degrees of freedom
  β₀: b₀ ± t*·sₑ·√( 1/n + x̄² / ((n − 1)sx²) ), with n − 2 degrees of freedom
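A minimal sketch of the slope inference in Python (assumes NumPy and SciPy; the x and y values are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.9])
n = len(x)

b1 = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)
s_e = np.sqrt((e ** 2).sum() / (n - 2))             # standard deviation of the residuals

sxx = ((x - x.mean()) ** 2).sum()                   # sum of (x_i - x-bar)^2
se_b1 = s_e / np.sqrt(sxx)                          # SE(b1)
se_b0 = s_e * np.sqrt(1 / n + x.mean() ** 2 / sxx)  # SE(b0)

t_b1 = b1 / se_b1                                   # test of H0: beta1 = 0
p_b1 = 2 * stats.t.sf(abs(t_b1), df=n - 2)          # always two-sided
t_star = stats.t.ppf(0.975, df=n - 2)
ci_b1 = (b1 - t_star * se_b1, b1 + t_star * se_b1)

print(b1, se_b1, t_b1, p_b1, ci_b1)
```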

First Hypothesis Test for Correlation

When to use:
• Testing whether β₁ = 0 is equivalent to testing whether ρ = 0.
• To see whether the model has predictive value.

(Excel)
Parameter: the correlation coefficient of the population (ρ). Estimate: r
Null hypothesis: H₀: ρ = 0
Alternative hypothesis: Hₐ: ρ ≠ 0
Test statistic: t = r·√(n − 2) / √(1 − r²), with df = n − 2
P-value (two-sided): P = 2 × Pr(t > |t-stat|) = T.DIST.2T(ABS(t-stat), n − 2)
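A minimal version of this test in Python (assumes NumPy and SciPy; the r and n values are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical: test H0: rho = 0 given r = 0.62 from a sample of n = 30 pairs
r, n = 0.62, 30
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)   # t = r sqrt(n - 2) / sqrt(1 - r^2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-sided p-value

print(t_stat, p_value)
```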

Another Test for β1: Analysis of Variance for Regression (ANOVA TABLE)

When to use: to answer whether the X variable is a useful predictor of Y, i.e., whether the model is worthwhile.

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²
Sum of Squares Total (SST) = Sum of Squares Model (SSM, "Regression") + Sum of Squares Error (SSE, "Residual")

ANOVA table (source of variation, sum of squares, degrees of freedom, mean square, F-stat):
Model: SSM, df = 1, MSM = SSM/1, F = MSM/MSE
Error: SSE, df = n − 2, MSE = SSE/(n − 2)
Total: SST, df = n − 1

SSE = Σ(yᵢ − ŷᵢ)² = Σeᵢ²
MSE = SSE/(n − 2) = sₑ², so sₑ = √MSE
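A minimal sketch that builds this ANOVA table for a simple regression in Python (assumes NumPy and SciPy; the data are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data; build the ANOVA table for a simple regression
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.9])
n = len(x)

b1 = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()      # total variation
ssm = ((y_hat - y.mean()) ** 2).sum()  # explained by the model (regression)
sse = ((y - y_hat) ** 2).sum()         # residual; SST = SSM + SSE

msm = ssm / 1
mse = sse / (n - 2)                    # = s_e^2, so s_e = sqrt(MSE)
f_stat = msm / mse
p_value = stats.f.sf(f_stat, 1, n - 2) # always one-sided: Pr(F > F-stat)

print(ssm, sse, sst, f_stat, p_value, ssm / sst)  # last value is R^2 = SSM/SST
```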

Analysis of Variance to Test the Worthiness of the Model

(Excel)
Parameter: β₁
sₑ²: sₑ² = Σeᵢ² / (n − 2)  (the "Standard Error" in Excel's Regression output is sₑ, the standard error of the residuals)
Null hypothesis: H₀: β₁ = 0 (the model is not worthwhile; x has no predictive information about y)
Alternative hypothesis: Hₐ: β₁ ≠ 0 (the model is worthwhile; x has some predictive information about y)
Test statistic (overall F-test): F = MSM / MSE  (=F.DIST.RT(F-stat, df1 = k, df2 = n − k − 1); for simple regression, equivalently =T.DIST.2T(SQRT(F-stat), n − k − 1))
F-stat and t-stat (for simple regression only): t²(n−2) = F(1, n−2), so t = √F
P-value (always one-sided): P = Pr(F > F-stat) = F.DIST.RT(F-stat, k, n − k − 1)

R² = SSM / SST = proportion of the total variation explained by the model
r = √( SSM / SST )  (Excel reports this as "Multiple R", but you have to take the sign from the sign of the slope)

ANOVA table (the same, with minor changes in the degrees-of-freedom column):
Model/Regression: SSM, df = k, MSM = SSM/k, F = MSM/MSE
Error/Residual: SSE, df = n − k − 1, MSE = SSE/(n − k − 1)
Total: SST, df = n − 1

R-squared is now formally called the Coefficient of Multiple Determination, but it is defined the same way:
R² = SSM / SST = proportion of the total variation explained by the model
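A minimal overall F-test for a multiple regression in Python (assumes NumPy and SciPy; the simulated data, seed, and coefficients are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical multiple regression with k = 2 predictors
rng = np.random.default_rng(0)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])          # add an intercept column
beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares coefficients
y_hat = design @ beta

sst = ((y - y.mean()) ** 2).sum()
ssm = ((y_hat - y.mean()) ** 2).sum()
sse = ((y - y_hat) ** 2).sum()

f_stat = (ssm / k) / (sse / (n - k - 1))   # F = MSM / MSE with df = k and n - k - 1
p_value = stats.f.sf(f_stat, k, n - k - 1) # always one-sided
r_squared = ssm / sst                      # coefficient of multiple determination

print(beta, f_stat, p_value, r_squared)
```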

F-test to Compare Full and Reduced Models

When to use: to see whether all of the predictor variables currently in the model are relevant, and to avoid multicollinearity.
Full model: use all available predictor variables.
Reduced model: a subset of the full model; drop q of the predictor variables.

Null hypothesis: H₀: the reduced model is adequate (the extra variables in the full model that are not in the reduced model do not provide a significant improvement in the model's predictive ability).
Alternative hypothesis: Hₐ: the reduced model is not adequate (at least one of the extra predictors is worthwhile, so the reduced model discards some worthwhile predictors; therefore, it is better to keep the full model rather than this reduced model).
Test statistic (F-test to compare full and reduced models):
  F = [ ( R²(full) − R²(reduced) ) / q ] / [ ( 1 − R²(full) ) / ( n − k − 1 ) ]
  where k = number of X-variables in the full model and q = number of variables dropped from the full model to get the reduced model.
  EXCEL WILL NOT DO IT FOR YOU.
P-value (always one-sided): P = Pr(F > F-stat) = F.DIST.RT(F-stat, q, n − k − 1)
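Since Excel will not compute this test directly, here is a minimal Python sketch of the comparison (assumes NumPy and SciPy; the simulated data, seed, and the helper r_squared are hypothetical):

```python
import numpy as np
from scipy import stats

def r_squared(X, y):
    """R^2 from a least-squares fit of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Hypothetical data: the full model has k = 3 predictors, the reduced model drops q = 2
rng = np.random.default_rng(1)
n, k, q = 40, 3, 2
X_full = rng.normal(size=(n, k))
y = 2.0 + 1.5 * X_full[:, 0] + rng.normal(size=n)
X_reduced = X_full[:, :k - q]

r2_full = r_squared(X_full, y)
r2_reduced = r_squared(X_reduced, y)

f_stat = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k - 1))
p_value = stats.f.sf(f_stat, q, n - k - 1)   # always one-sided

print(r2_full, r2_reduced, f_stat, p_value)
```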

Regression Output

(The cell numbers below refer to an annotated Excel regression output in the original sheet; the image itself is not reproduced here.)

Other formulas:
(3): Adjusted R²: 1 − (1 − R²)[(n − 1)/(n − 2)]
(15): Pr[ F(1, n−2) > (14) ]
(18): sₑ·√( 1/n + x̄² / Σ(xᵢ − x̄)² )
(19): sₑ / ( sx·√(n − 1) )
(22): 2 × Pr[ t(n−2) > |(20)| ]
(23): 2 × Pr[ t(n−2) > |(21)| ]
(24): (16) − t*(n−2) × (18)
(25): (17) − t*(n−2) × (19)
(26): (16) + t*(n−2) × (18)
(27): (17) + t*(n−2) × (19)