


In this cheat sheet you find basic principles of Statistics for introductory courses.
Typology: Cheat Sheet
STATISTICS - A set of tools for collecting, organizing, presenting, and analyzing numerical facts or observations.
I. Descriptive Statistics - Procedures used to organize and present data in a convenient, usable, and communicable form.
STATISTIC - A number describing a sample characteristic. Results from the manipulation of sample data according to certain specified procedures.
DATA - Characteristics or numbers that are collected by observation.
POPULATION - A complete set of actual or potential observations.
PARAMETER - A number describing a population characteristic; typically inferred from a sample statistic.
SAMPLE - A subset of the population selected according to some scheme.
RANDOM SAMPLE - A subset selected in such a way that each member of the population has an equal opportunity to be selected. Ex. lottery numbers in a fair lottery.
VARIABLE - A phenomenon that may take on different values.
MEAN - The point in a distribution of measurements about which the summed deviations are equal to zero. Average value of a sample or population.
POPULATION MEAN: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
SAMPLE MEAN: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Note: The mean is very sensitive to extreme measurements that are not balanced on both sides.
WEIGHTED MEAN - Sum of a set of observations multiplied by their respective weights, divided by the sum of the weights:
WEIGHTED MEAN: $\bar{x}_w = \frac{\sum_{i=1}^{G} w_i x_i}{\sum_{i=1}^{G} w_i}$
where $w_i$ = weight; $x_i$ = observation; $G$ = number of observation groups. Calculated from a population, sample, or groupings in a frequency distribution. Ex. In the Frequency Distribution below, the mean is 80.3, calculated by using frequencies for the $w_i$'s. When grouped, use class midpoints for the $x_i$'s.
MEDIAN - Observation or potential observation in a set that divides the set so that the same number of observations lie on each side of it. For an odd number of values, it is the middle value; for an even number, it is the average of the middle two. Ex. In the Frequency Distribution table below, the median is 79.5.
MODE - Observation that occurs with the greatest frequency. Ex. In the Frequency Distribution table below, the mode is 88.
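The three measures of central tendency above can be sketched in Python. This is a minimal illustration, not part of the original chart: the scores and frequencies are hypothetical, and `weighted_mean` is a helper name introduced here.

```python
from statistics import median, mode

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical scores and how often each occurred (frequencies act as weights).
scores = [70, 80, 90]
freqs = [1, 3, 1]

wm = weighted_mean(scores, freqs)   # (70*1 + 80*3 + 90*1) / 5

# Expand the frequency table back into raw observations for median and mode.
raw = [s for s, f in zip(scores, freqs) for _ in range(f)]
med = median(raw)                   # middle value of the sorted observations
mod = mode(raw)                     # most frequent observation

print(wm, med, mod)
```

Using the frequencies as weights gives the same result as averaging the expanded raw observations.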
STANDARD DEVIATION - The square root of the variance. Ex. Pop. S.D.: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
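A minimal Python sketch of the population standard deviation, assuming the usual definition (square root of the mean squared deviation from the population mean). The data are hypothetical and `population_sd` is a name introduced for illustration. It also checks the property stated for the mean, that deviations about it sum to zero.

```python
import math
from statistics import pstdev

def population_sd(xs):
    """Square root of the population variance."""
    mu = sum(xs) / len(xs)
    variance = sum((x - mu) ** 2 for x in xs) / len(xs)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical population, mean 5
mu = sum(data) / len(data)
sum_of_deviations = sum(x - mu for x in data)
sigma = population_sd(data)

print(sum_of_deviations, sigma, pstdev(data))
```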
BAR GRAPH - A form of graph that uses bars to indicate the frequency of occurrence of observations.
• Histogram - A form of bar graph used with interval or ratio-scaled variables.
GROUPING OF DATA
Shows the number of times each observation occurs when the values of a variable are arranged in order according to their magnitudes.
[Frequency distribution table: columns x (score) and f (tally of occurrences); the individual entries are not legible in this copy.]
CUMULATIVE FREQUENCY DISTRIBUTION

Class interval     f    Cum f    Cum %
65-67              3      3       4.84
68-70              8     11      17.74
71-73              5     16      25.81
74-76              9     25      40.32
77-79              6     31      50.00
80-82              4     35      56.45
83-85              8     43      69.35
86-88              8     51      82.26
89-91              6     57      91.94
92-94              1     58      93.55
95-97              2     60      96.77
98-100             2     62     100.00
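The cumulative columns can be rebuilt from the class frequencies alone. The sketch below assumes the twelve class intervals and frequencies as read from the table above (the variable names are illustrative). It also computes a grouped mean from class midpoints, which is an approximation and need not equal the value obtained from the ungrouped frequency distribution.

```python
from itertools import accumulate

# Class intervals and frequencies as read from the table (assumed legible values).
classes = [(65, 67), (68, 70), (71, 73), (74, 76), (77, 79), (80, 82),
           (83, 85), (86, 88), (89, 91), (92, 94), (95, 97), (98, 100)]
freqs = [3, 8, 5, 9, 6, 4, 8, 8, 6, 1, 2, 2]

cum_f = list(accumulate(freqs))                    # running total of f
n = cum_f[-1]                                      # total number of observations
cum_pct = [round(100 * c / n, 2) for c in cum_f]   # cumulative percentage

# Grouped mean: class midpoints weighted by class frequencies.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

for (lo, hi), f, c, pct in zip(classes, freqs, cum_f, cum_pct):
    print(f"{lo}-{hi}\t{f}\t{c}\t{pct:.2f}")
print(round(grouped_mean, 2))
```

The 50% cumulative mark falls at the end of the 77-79 class, which agrees with the median of 79.5 quoted for the table.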
[Figure: frequency curves, including a SKEWED CURVE]
Probability of occurrence of Event A: $p(A) = \frac{\text{Number of outcomes favoring Event A}}{\text{Total number of outcomes}}$
• Exhaustive - Two or more events are said to be exhaustive if all possible outcomes are considered. Symbolically, p(A or B or ...) = 1.
• Non-Exhaustive - Two or more events are said to be non-exhaustive if they do not exhaust all possible outcomes.
• Mutually Exclusive - Events that cannot occur simultaneously: p(A and B) = 0; and p(A or B) = p(A) + p(B). Ex. males, females.
• Non-Mutually Exclusive - Events that can occur simultaneously: p(A or B) = p(A) + p(B) - p(A and B). Ex. males, brown eyes.
• Independent - Events whose probability is unaffected by occurrence or nonoccurrence of each other: p(A|B) = p(A); p(B|A) = p(B); and p(A and B) = p(A) p(B). Ex. gender and eye color.
• Dependent - Events whose probability changes depending upon the occurrence or non-occurrence of each other: p(A|B) differs from p(A); p(B|A) differs from p(B); and p(A and B) = p(A) p(B|A) = p(B) p(A|B). Ex. race and eye color.
CONDITIONAL PROBABILITIES - Probability of A given the existence of S, written p(A|S).
EXAMPLE - Given the numbers 1 to 9 as observations in a sample space:
• Events mutually exclusive and exhaustive. Example: p(all odd numbers); p(all even numbers).
• Events mutually exclusive but not exhaustive. Example: p(an even number); p(the numbers 7 and 5).
• Events neither mutually exclusive nor exhaustive. Example: p(an even number or a 2).
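The example with the numbers 1 to 9 can be checked by counting outcomes. The following is an illustrative sketch: the helper `p` and the event names are introduced here, and each outcome is assumed equally likely.

```python
from fractions import Fraction

space = set(range(1, 10))           # sample space: the numbers 1 to 9

def p(event):
    """Favorable outcomes divided by total outcomes (equally likely outcomes assumed)."""
    return Fraction(len(event & space), len(space))

odd = {x for x in space if x % 2 == 1}
even = space - odd
seven_and_five = {5, 7}
two = {2}
at_most_four = {1, 2, 3, 4}

# Mutually exclusive and exhaustive: odd vs. even.
exclusive_exhaustive = p(odd & even) == 0 and p(odd | even) == 1

# Mutually exclusive but not exhaustive: even vs. {5, 7}.
exclusive_not_exhaustive = p(even & seven_and_five) == 0 and p(even | seven_and_five) < 1

# Not mutually exclusive: general addition rule for "even or a 2".
addition_rule = p(even | two) == p(even) + p(two) - p(even & two)

# Conditional probability p(A|S) = p(A and S) / p(S).
p_even_given_small = p(even & at_most_four) / p(at_most_four)

print(p(odd), p(even), p_even_given_small)
```

Here p(even | at most 4) is 1/2 while p(even) is 4/9, so these two events are dependent in the sense defined above.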
given size from some population.
THE STANDARD ERROR OF THE MEAN
A theoretical standard deviation of sample means of a given sample size, drawn from some specified population.
• When based on a very large, known population, the standard error is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• When estimated from a sample drawn from a very large population, the standard error is: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
• The dispersion of sample means decreases as sample size is increased.
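A short Python sketch of the standard error of the mean, assuming the usual form (standard deviation divided by the square root of the sample size). The numbers are hypothetical and `standard_error` is an illustrative name.

```python
import math
from statistics import stdev

def standard_error(sd, n):
    """Standard deviation of sample means for samples of size n."""
    return sd / math.sqrt(n)

# Known population standard deviation (hypothetical value of 10).
se_25 = standard_error(10, 25)
se_100 = standard_error(10, 100)    # quadrupling n halves the standard error

# Estimated from a sample: use the sample standard deviation s in place of sigma.
sample = [12, 15, 11, 14, 13, 16, 10, 13, 12]
se_est = standard_error(stdev(sample), len(sample))

print(se_25, se_100, se_est)
```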
RANDOM VARIABLES
A mapping or function that assigns one and only one numerical value to each outcome in an experiment.
Binomial mean: $\mu = n\pi$
Binomial variance: $\sigma^2 = n\pi(1 - \pi)$
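The binomial mean and variance can be checked numerically against the probability function. This sketch assumes the standard binomial probability function with success probability π per trial; `binomial_pmf`, `n`, and `pi_` are illustrative names and values.

```python
from math import comb

def binomial_pmf(x, n, pi_):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi_ ** x * (1 - pi_) ** (n - x)

n, pi_ = 10, 0.3                    # hypothetical number of trials and success probability
probs = [binomial_pmf(x, n, pi_) for x in range(n + 1)]

mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(mean, n * pi_)                # both approximately 3.0
print(variance, n * pi_ * (1 - pi_))  # both approximately 2.1
```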
As n increases, the Binomial approaches the Normal distribution.
HYPERGEOMETRIC DISTRIBUTION - A model for the sum of a series of n trials where each trial results in a 0 or 1 and is drawn from a small population with N elements split between N1 successes and N2 failures. Then the probability of splitting the n trials between x1 successes and x2 failures is:
$p(x_1 \text{ and } x_2) = \frac{\frac{N_1!}{x_1!\,(N_1 - x_1)!}\cdot\frac{N_2!}{x_2!\,(N_2 - x_2)!}}{\frac{N!}{n!\,(N - n)!}}$
Hypergeometric mean: $\mu_1 = E(x_1) = \frac{n N_1}{N}$ and variance: $\sigma^2 = \left[\frac{N - n}{N - 1}\right]\left[\frac{n N_1 N_2}{N^2}\right]$
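A small Python check of the hypergeometric probabilities, mean, and variance, assuming the standard forms of these formulas. The population sizes are hypothetical, and exact fractions are used so the comparison has no rounding error.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x1, n, N1, N2):
    """Probability of x1 successes (and n - x1 failures) when drawing n without replacement."""
    return Fraction(comb(N1, x1) * comb(N2, n - x1), comb(N1 + N2, n))

N1, N2, n = 4, 6, 3                 # hypothetical: 4 successes, 6 failures, draw 3
N = N1 + N2
probs = {x1: hypergeom_pmf(x1, n, N1, N2) for x1 in range(min(n, N1) + 1)}

mean = sum(x * p for x, p in probs.items())
variance = sum((x - mean) ** 2 * p for x, p in probs.items())

formula_mean = Fraction(n * N1, N)
formula_var = Fraction(N - n, N - 1) * Fraction(n * N1 * N2, N ** 2)

print(probs, mean, variance)
```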
POISSON DISTRIBUTION - A model for the number of occurrences of an event, x = 0, 1, 2, ..., when the probability of occurrence is small but the number of opportunities for the occurrence is large:
$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$ for x = 0, 1, 2, 3, ... and $\lambda > 0$; otherwise P(x) = 0.
Poisson mean and variance: $\lambda$.
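The claim that the Poisson mean and variance both equal λ can be verified numerically. This sketch assumes the standard Poisson probability function and truncates the infinite sum at 60 terms, which is ample for the hypothetical λ used here.

```python
import math

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0                           # hypothetical rate
xs = range(60)                      # truncation point; remaining tail is negligible for lam = 2
probs = [poisson_pmf(x, lam) for x in xs]

mean = sum(x * p for x, p in zip(xs, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))

print(round(mean, 6), round(variance, 6))
```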
For continuous variables, frequencies are expressed in terms of areas under a curve.
CONTINUOUS RANDOM VARIABLES
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$
where $f(x)$ = frequency at a given value; $\sigma$ = standard deviation of the distribution
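A minimal sketch of the normal density, assuming the standard form with mean μ and standard deviation σ. The function name `normal_pdf` and the example parameters are illustrative; the result is compared with `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

peak_standard = normal_pdf(0, 0, 1)                 # height at the mean of the standard normal
ours = normal_pdf(110, 100, 15)                     # hypothetical mu = 100, sigma = 15
library = NormalDist(mu=100, sigma=15).pdf(110)

print(peak_standard, ours, library)
```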
factors. Common significance levels are 1%, 5%, 10%. Alpha (α) level: the lowest level for which the null hypothesis can be rejected. The significance level determines the critical region.
NULL HYPOTHESIS ($H_0$) - A statement that specifies hypothesized value(s) for one or more of the population parameters. [Ex. $H_0$: a coin is unbiased. That is, p = 0.5.]
ALTERNATIVE HYPOTHESIS ($H_1$) - A statement that specifies that the population parameter is some value other than the one specified under the null hypothesis. [Ex. $H_1$: a coin is biased. That is, p ≠ 0.5.]
I. NONDIRECTIONAL HYPOTHESIS - An alternative hypothesis ($H_1$) that states only that the population parameter is different from the one specified under $H_0$. Ex. $H_1$: $\mu \neq \mu_0$. Two-Tailed Probability Value is employed when the alternative hypothesis is non-directional.
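The coin example can be turned into a small two-tailed test. The sketch below is one possible approach, an exact binomial test, which the chart itself does not prescribe. The observed count of 60 heads in 100 tosses is hypothetical, and `two_tailed_binomial_p` is a name introduced for illustration.

```python
from math import comb

def two_tailed_binomial_p(heads, n, p0=0.5):
    """Probability, under H0: p = p0, of a count at least as far from n*p0 as the one observed."""
    expected = n * p0
    observed_gap = abs(heads - expected)
    return sum(
        comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(n + 1)
        if abs(k - expected) >= observed_gap
    )

alpha = 0.05                        # chosen significance level
p_value = two_tailed_binomial_p(60, 100)
reject_h0 = p_value < alpha         # reject only if the p-value falls below alpha

print(round(p_value, 4), reject_h0)
```

With these hypothetical data the two-tailed probability is slightly above 0.05, so the null hypothesis of an unbiased coin is not rejected at the 5% level, although it would be at the 10% level.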
(for sample mean $\bar{x}$)
If $x_1, x_2, x_3, \ldots, x_n$ is a simple random sample of n elements from a large (infinite) population, with mean $\mu$ and standard deviation $\sigma$, then the distribution of $\bar{x}$ takes on the bell-shaped distribution of a normal random variable as n increases, and the distribution of the ratio $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ approaches the standard normal distribution as n goes to infinity. In practice, a normal approximation is acceptable for samples of 30 or larger.
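A small simulation illustrates the statement above. It is a sketch under assumed settings (a uniform population on [0, 1), samples of n = 30, 2000 replications, a fixed random seed), none of which come from the chart.

```python
import math
import random
from statistics import mean, pstdev

random.seed(0)                      # fixed seed so the run is repeatable

n = 30                              # sample size
reps = 2000                         # number of samples drawn
mu = 0.5                            # mean of the uniform(0, 1) population
sigma = math.sqrt(1 / 12)           # its standard deviation

sample_means = [mean([random.random() for _ in range(n)]) for _ in range(reps)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

# Standardized ratios (x-bar - mu) / (sigma / sqrt(n)); roughly 95% should lie within +/-1.96.
ratios = [(m - mu) / theoretical_se for m in sample_means]
share_within = sum(abs(z) <= 1.96 for z in ratios) / reps

print(round(mean_of_means, 4), round(sd_of_means, 4), round(theoretical_se, 4), share_within)
```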
Percentage Cumulative Distribution for selected Z values under a normal curve

Z-value             -3      -2      -1       0      +1      +2      +3
Percentile score   0.13    2.28   15.87   50.00   84.13   97.72   99.87
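The percentile scores in this table are values of the standard normal cumulative distribution. A short sketch using `statistics.NormalDist` from the Python standard library reproduces them; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()      # mean 0, standard deviation 1
z_values = [-3, -2, -1, 0, 1, 2, 3]

# Cumulative area to the left of each z, expressed as a percentage.
percentiles = [round(100 * standard_normal.cdf(z), 2) for z in z_values]

for z, pct in zip(z_values, percentiles):
    print(f"{z:+d}\t{pct:.2f}")
```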
CORRELATION
Definition - Correlation refers to the relationship between two variables. The Correlation Coefficient is a measure that expresses the extent to which two variables are related.
where r is the Pearson correlation
• By matching samples on a variable correlated with the criterion variable, the magnitude of the standard error of the difference can be reduced.
• The higher the correlation, the greater the reduction in the standard error of the difference.
DIFFERENCE BETWEEN MEANS - If a number of pairs of samples were taken from the same population or from two different populations, then:
• The distribution of differences between pairs of sample means tends to be normal (z-distribution).
• The mean of these differences between means, $\mu_{\bar{x}_1 - \bar{x}_2}$, is equal to the difference between the population means, that is, $\mu_1 - \mu_2$.
Z-DISTRIBUTION: $\sigma_1$ and $\sigma_2$ are known
• The standard error of the difference between means: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• Where $(\mu_1 - \mu_2)$ represents the hypothesized difference in means, the following statistic can be used for hypothesis tests: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$
• When $n_1$ and $n_2$ are > 30, substitute $s_1$ and $s_2$ for $\sigma_1$ and $\sigma_2$, respectively.
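A worked sketch of the z statistic for a difference between means, assuming known population standard deviations and the usual standard-error formula. All numbers are hypothetical and `z_for_difference` is an illustrative name.

```python
import math
from statistics import NormalDist

def z_for_difference(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z = ((x1bar - x2bar) - (mu1 - mu2)) / standard error of the difference."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / se

# Hypothetical samples: means 52 and 50, sigmas 4 and 3, sizes 64 and 36.
z = z_for_difference(52, 50, 4, 3, 64, 36)

# Two-tailed probability of a z at least this extreme under H0: mu1 - mu2 = 0.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_two_tailed, 4))
```

Here the standard error is the square root of 16/64 + 9/36 = 0.5, so z is 2 divided by about 0.707.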
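(To obtain sum of squares (SS), see Measures of Central Tendency on page 1.)
POOLED t-TEST
• Distribution is normal
• n < 30
• $\sigma_1$ and $\sigma_2$ are not known but assumed equal
• To determine $s_{\bar{x}_1 - \bar{x}_2}$, use the formula below for estimating $\sigma_{\bar{x}_1 - \bar{x}_2}$:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$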
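A sketch of the pooled standard error and the resulting t statistic, assuming the usual pooled-variance form with n1 + n2 - 2 degrees of freedom. The sample sizes, variances, and means below are hypothetical, and `pooled_standard_error` is an illustrative name.

```python
import math

def pooled_standard_error(n1, var1, n2, var2):
    """Standard error of the difference between means using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2)), pooled_var

# Hypothetical small samples: n1 = 10 with s1^2 = 4, n2 = 12 with s2^2 = 6.
n1, var1, n2, var2 = 10, 4.0, 12, 6.0
se, pooled_var = pooled_standard_error(n1, var1, n2, var2)

mean1, mean2 = 20.0, 18.0           # hypothetical sample means
t = (mean1 - mean2) / se            # tests H0: mu1 - mu2 = 0
df = n1 + n2 - 2                    # degrees of freedom for the pooled test

print(round(pooled_var, 3), round(se, 4), round(t, 3), df)
```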
• $F = \frac{s^2 \text{ (larger variance)}}{s^2 \text{ (smaller variance)}}$
Top row=.05,Bottom row=. points for distribution of F
$\bar{x}_i$ = mean of the ith treatment group and $\bar{x}$ = mean of all n values across all k treatment groups
$z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$
• Most widely-used non-parametric test.
• The $\chi^2$ mean = its degrees of freedom.
• The $\chi^2$ variance = twice its degrees of freedom.
• Can be used to test one or two independent samples.
• The square of a standard normal variable is a chi-square variable.
• Like the t-distribution, it has different distributions depending on the degrees of freedom.
DEGREES OF FREEDOM (d.f.) COMPUTATION
• If chi-square tests for the goodness-of-fit to a hypothesized distribution, d.f. = g - 1 - m, where
g = number of groups, or classes, in the frequency distribution.
m = number of population parameters that must be estimated from
$\bar{r} = \frac{2 n_1 n_2}{n_1 + n_2} + 1$
$s_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
Where
$\bar{r}$ = mean number of runs
$n_1$ = number of outcomes of one type
$n_2$ = number of outcomes of the other type
$s_r$ = standard deviation of the distribution of the number of runs.
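A sketch of a runs calculation in Python, assuming the standard formulas for the expected number of runs and its standard deviation in a two-outcome sequence. The sequence below is hypothetical, and `count_runs` and `runs_mean_and_sd` are names introduced for illustration.

```python
import math
from itertools import groupby

def count_runs(sequence):
    """Number of maximal blocks of identical consecutive outcomes."""
    return sum(1 for _ in groupby(sequence))

def runs_mean_and_sd(n1, n2):
    """Expected number of runs and its standard deviation under randomness."""
    mean_runs = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return mean_runs, math.sqrt(variance)

sequence = "AABBBABBAA"             # hypothetical sequence of two outcome types
runs = count_runs(sequence)         # AA | BBB | A | BB | AA
n1 = sequence.count("A")
n2 = sequence.count("B")

mean_runs, sd_runs = runs_mean_and_sd(n1, n2)
z = (runs - mean_runs) / sd_runs    # how far the observed run count is from expectation

print(runs, mean_runs, round(sd_runs, 4), round(z, 4))
```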
quickstudy.com
NOTICE TO STUDENT: This QuickStudy chart covers the basics of Introductory Statistics. Due to its condensed format, however, use it as a Statistics guide and not as a replacement for assigned course work. BarCharts, Inc., Boca Raton, FL