



This article addresses the confusion between standard deviation (SD) and standard error (SE) in biomedical research. The author, George W. Brown, MD, highlights the importance of clarifying the usage and meaning of these terms in biomedical reports. The article offers insights into the differences between SD and SE, their sources, and their meanings, and suggests ways to avoid confusion in reporting research data.
and SE in several authoritative medical
SE in biomedical reports.
similarities; yet, they are conceptually so different that we must wonder why they are used almost interchangeably in the medical literature. Both are usually preceded by a plus-minus symbol (±), suggesting that they define a symmetric interval or range of some sort. They both appear almost always with a mean (average) of a set of measurements or counts of something.

The medical literature is replete with statements like, "The serum cholesterol measurements were distributed with a mean of 180 ± 30 mg/dL (SD)." In the same journal, perhaps in the same article, a different statement may appear: "The weight gains of the subjects averaged 720 (mean) ± 32 g/mo (SE)." Sometimes, as discussed further, the summary data are presented as the "mean of 120 mg/dL ± 12" without the "12" being defined as SD or SE, or as some other index of dispersion. Eisenhart¹ warned against this "peril of shorthand expression" in 1968; Feinstein² later again warned about the fatuity and confusion contained in any a ± b statements where b is not defined. Warnings notwithstanding, a glance through almost any medical journal will show examples of this usage.

Medical journals seldom state why SD or SE is selected to summarize data in a given report. A search of the three major pediatric journals for 1981 (American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics) failed to turn up a single article in which the selection of SD or SE was explained.

From the Los Lunas Hospital and Training School, New Mexico, and the Department of Pediatrics, University of New Mexico School of Medicine, Albuquerque.
Reprint requests to Los Lunas Hospital and Training School, Box 1269, Los Lunas, NM 87031 (Dr Brown).
There seems to be no uniformity in the use of SD or SE in these journals or in The Journal of the American Medical Association (JAMA), the New England Journal of Medicine, or Science. The use of SD and SE in the journals will be discussed further.

If these respected, well-edited journals do not demand consistent use of either SD or SE, are there really any important differences between them? Yes, they are remarkably different, despite their superficial similarities. They are so different in fact that some authorities have recommended that SE should rarely or never be used to summarize medical research data. Feinstein² noted the following:
communication of scientific data. The concept is an
Glantz³ also has made the following recommendation:
data with the standard error because it is
It makes their data look better
... data should never be summarized with the standard error of the mean.³(p25)

A closer look at the source and meaning of SD and SE may clarify why medical investigators, journal reviewers, and editors should scrutinize their usage with considerable care.

DISPERSION

An essential function of "descriptive statistics" is the presentation of condensed, shorthand symbols that epitomize the important features of a collection of data. The idea of a central value is intuitively satisfactory to anyone who needs to summarize a group of measurements, or counts. The traditional indicators of a central tendency are the mode (the most frequent value), the median (the value midway between the lowest and the highest value), and the mean (the average). Each has its special uses,
flexibility for many purposes.

The dispersion of a collection of values can be shown in several ways; some are simple and concise, and others are complex and esoteric. The range is a simple, direct way to indicate the spread of a collection of values, but it does not tell how the values are distributed. Knowledge of the mean adds considerably to the information carried by the range.

Another index of dispersion is provided by the differences (deviations) of each value from the mean of the values. The trouble with this approach is that some deviations will be positive, and some will be negative, and their sum will be zero. We could ignore the sign of each deviation, ie, use the "absolute mean deviation," but mathematicians tell us that working with absolute numbers is extremely difficult and fraught with technical disadvantages.

A neglected method for summarizing the dispersion of data is the calculation of percentiles (or deciles, or quartiles). Percentiles are used more frequently in pediatrics than in other branches of medicine, usually in growth charts or in other data arrays that are clearly not symmetric or bell shaped. In the general medical literature, percentiles are sparsely used, apparently because of a common, but erroneous, assumption that the mean ± SD or SE is satisfactory for summarizing central tendency and dispersion of all sorts of data.
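The indices of dispersion described above can be compared on a small data set. The following Python sketch is an illustration only and is not part of the original article; the ten values are hypothetical, chosen to be skewed, and the variable names are assumptions made for the example.

```python
import statistics

# Hypothetical, right-skewed measurements (illustrative values only).
values = [2, 3, 3, 4, 4, 5, 6, 8, 12, 23]

mean = statistics.mean(values)
median = statistics.median(values)

# The range is simple and direct, but says nothing about how the
# values are distributed between the two extremes.
value_range = (min(values), max(values))

# Signed deviations from the mean cancel each other out.
deviations = [v - mean for v in values]
sum_of_deviations = sum(deviations)

# Ignoring the sign of each deviation gives the mean absolute deviation.
mean_abs_deviation = sum(abs(d) for d in deviations) / len(values)

# Quartiles (25th, 50th, and 75th percentiles) describe the spread
# without assuming a symmetric or bell-shaped distribution.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("range:", value_range)
print("mean:", mean, "median:", median)
print("sum of deviations:", sum_of_deviations)
print("mean absolute deviation:", mean_abs_deviation)
print("quartiles:", q1, q2, q3)
```

With skewed values such as these, the mean sits well above the median, which is the kind of asymmetry that percentiles reveal and that a mean with a single ± figure conceals.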
The generally accepted answer to the need for a concise expression for the
difference of each value from the group mean, giving all positive values. When these
then divided by the number of values in the group, the result is the variance. The variance is always a positive number, but it is in different units than the mean. The way around this inconvenience is to use the square root of the variance, which is the population standard deviation (σ), which for convenience will be called SD. Thus, the SD is
deviations from the mean. The SD is sometimes called by the shorthand term, "root-mean-square."

The SD, calculated in this way, is in the same units as the original values and the mean. The SD has additional properties that make it attractive for summarizing dispersion, especially if the
in the revered bell-shaped, gaussian curve. Although there are an infinite number of gaussian curves, the one for the data at hand is described completely by the mean and SD. For example, the mean ± 1.96 SD will enclose 95% of the values; the mean ± 2.58 SD will enclose 99% of the values. It is this symmetry and elegance that contribute to our admiration of the gaussian curve.

The bad news, especially for biologic data, is that many collections of measurements or counts are not symmetric or bell shaped. Biologic data tend to be skewed or double humped, J shaped, U shaped, or flat on top. Regardless of the shape of the distribution,
tic to calculate an SD although it may be inappropriate and misleading.

For example, one can imagine throwing a six-sided die several hundred times and recording the score at each throw. This would generate a
distribution, with about the same number of counts for each score, 1 through 6. The mean of the scores would be 3.5 and the SD would be about 1.7. The trouble is that the collection of scores is not bell
summary statement of the true form of the
SD of Population:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

μ = Mean of Population; N = Number in Population

Estimate of Population SD From Sample:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

X̄ = Mean of Sample; n = Number in Sample

SEM:

$$SEM = \frac{SD}{\sqrt{n}}$$

SD = Estimate of Population SD; n = Sample Size
that no matter how many times the die is thrown, it will never show its average score of 3.5.
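The die example can be reproduced in a few lines. The Python sketch below is an illustration added for this purpose, not the author's calculation; it assumes an idealized run in which every face turns up equally often, and it uses the descriptive (divide-by-N) SD.

```python
import statistics

FACES = [1, 2, 3, 4, 5, 6]

# Assumption: an idealized run of 600 throws, each face appearing 100 times.
scores = FACES * 100

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # descriptive SD, denominator N

# Reading the summary as if the scores were gaussian:
low = mean - 1.96 * sd
high = mean + 1.96 * sd

# The mean itself is a score the die can never show.
mean_is_possible_score = mean in FACES

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"mean ± 1.96 SD spans {low:.2f} to {high:.2f}")
print("mean is a possible score:", mean_is_possible_score)
```

The interval implied by the mean ± 1.96 SD reaches below 1 and above 6, scores that a die cannot produce, which is why the SD is a poor summary of a flat distribution.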
The SD wears two hats. So far, we have looked at its role as a descriptive statistic for measurements or counts that are representative only of themselves, ie, the data being summarized are not a sample representing a larger (and itself unmeasurable) universe or population.

The second hat involves the use of SD from a random sample as an estimate of the population standard deviation (σ). The formal statistical language says that the sample statistic, SD, is an
parameter, the population standard deviation, σ.

This "estimator SD" is calculated differently than the SD used to describe data that represent only themselves. When a sample is used to make estimates about the population standard deviation, the calculations require two changes, one in concept and the other in arithmetic. First, the mean used to determine the deviations is conceptualized as an estimate of the mean, x̄, rather than as a true and exact population mean (μ). Both means are calculated in the same way, but a population mean, μ, stands for itself and is a parameter; a sample mean, x̄, is an estimate
and is a statistic.

The second change in calculation is in the arithmetic: the sum of the squared
makes sense intuitively when we recall that a sample would not show as great a spread of values as the source population. Reducing the denominator [by one] produces an estimate slightly larger than the sample SD. This "correction" has more impact when the sample is small than when n is large.)

Formulas for the two versions of SD are shown in Fig 1. The formulas follow
ters for sample statistics. The number
sample means), regardless of the shape of the population distribution.

These elegant features of the SEM are embodied in a statistical principle called the Central Limit Theorem, which says, among other things:
taken.

The theorem also says that the collection of sample means from large samples will be better in estimating the population mean than means from small samples.
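The behavior of sample means can be explored by simulation. The Python sketch below is an illustration under stated assumptions, not a reproduction of anything in the article: the population is the flat, nongaussian distribution of die scores, the sample sizes and the number of repeated samples are arbitrary choices, and the seed is fixed only to make the run repeatable. It compares the observed spread of many sample means with the value SD/√n, and then computes the estimator SD (denominator n − 1) and the SEM from one sample.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

FACES = [1, 2, 3, 4, 5, 6]
population_mean = statistics.mean(FACES)
population_sd = statistics.pstdev(FACES)  # denominator N


def sample_means(sample_size, n_samples=2000):
    """Return the means of repeated random samples of die throws."""
    means = []
    for _ in range(n_samples):
        throws = [random.choice(FACES) for _ in range(sample_size)]
        means.append(statistics.mean(throws))
    return means


# Spread of the sample means for small and large samples.
spread = {}
for n in (4, 25, 100):
    means = sample_means(n)
    observed = statistics.pstdev(means)        # SD of the sample means
    predicted = population_sd / math.sqrt(n)   # SD / sqrt(n)
    spread[n] = (observed, predicted)
    print(f"n = {n:3d}: observed {observed:.3f}, predicted {predicted:.3f}")

# From a single sample: estimator SD (denominator n - 1) and its SEM.
sample = [random.choice(FACES) for _ in range(25)]
estimator_sd = statistics.stdev(sample)
sem = estimator_sd / math.sqrt(len(sample))
print(f"one sample of 25: mean {statistics.mean(sample):.2f}, "
      f"SD {estimator_sd:.2f}, SEM {sem:.2f}")
```

The spread of the sample means shrinks as the sample size grows, even though the spread of the individual scores does not change. The SEM describes the precision of the estimated mean, not the dispersion of the data.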
of SEs in inferential statistics, it is no wonder that some form of the SE, especially the SEM, is used so frequently
flaw occurs, however, when a confidence interval based on the SEM is used to replace the SD as a descriptive statistic; if a description of data spread is needed, the SD should be used. As Feinstein² has observed, the reader of
the span or range of the data, but the author of the report instead displays
An absolute prohibition against the
not desirable. There are situations in which the investigator is using a truly random sample for estimation purposes. Random samples of children have been used, for example, to estimate population parameters of
the investigator (and editor) recognize when descriptive statistics should be used, and when inferential (estimation)
As mentioned previously, every sample statistic has its SE. With every statistic, there is a confidence interval that can be estimated. Despite the widespread use of SE (unspecified) and of SEM in medical journals and books, there is a noticeable neglect of one important SE, the SE of the proportion.

The discussion so far has dealt with measurement data or counts of elements
such as, "Six of the ten patients with zymurgy syndrome had so-and-so." From this, it is an easy step to say, "Sixty percent of our patients with zymurgy syndrome had so-and-so." The implication of such a statement may be that the author wishes to alert other clinicians, who may encounter samples from the universe of patients with zymurgy syndrome that they may see so-and-so in about 60% of them.

The proportion, six of ten, has an SE of the proportion. As shown in Fig 2,
the SE of the proportion is the square root of (0.6 × 0.4) divided by ten, which equals 0.155. The true proportion of so-and-so in the universe of patients with zymurgy syndrome is in the confidence interval that falls symmetrically on both sides of six of ten. To estimate the interval, we start with 0.6 or 60% as the midpoint of the interval. At the 95% level of confidence, the interval is 0.6 ± (1.96 × 0.155), or from 0.3 to 0.9.

If the sample shows six of ten, the 95% confidence interval is between 30% and 90%.
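The arithmetic for the SE of the proportion can be written out as a short Python sketch. The function name `proportion_interval` is an assumption made for this illustration. The second call, with 60 of 100, is a hypothetical sample added only to show how the interval narrows with sample size. The calculation is the simple normal approximation used in the text, which is rough for a sample as small as ten.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Return (p, SE of p, lower limit, upper limit) using p ± z × SE."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, p - z * se, p + z * se


# Six of ten patients, as in the text.
p, se, low, high = proportion_interval(6, 10)
print(f"p = {p:.2f}, SE = {se:.3f}, 95% interval {low:.2f} to {high:.2f}")

# Hypothetical larger sample with the same proportion.
p_big, se_big, low_big, high_big = proportion_interval(60, 100)
print(f"p = {p_big:.2f}, SE = {se_big:.3f}, "
      f"95% interval {low_big:.2f} to {high_big:.2f}")
```

The same observed proportion yields a much narrower interval when it comes from a larger sample, because the SE shrinks with the square root of the sample size.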
This is not a very narrow interval. The expanse of the interval may explain the
medical reports, even in journals where the SEM and SD are used abundantly. Investigators may be dismayed by the dimensions of the confidence interval
small samples available in clinical situations.

Of course, as in the measurement of self-contained data, the investigator may not think of his clinical material as a sample from a larger universe. But often, it is clear that the purpose of publication is to suggest to other investigators or clinicians that, when they see patients of a certain type, they might expect to encounter certain characteristics in some estimated proportion of such patients.

JOURNAL USE OF SD AND SE

To get empiric information about pediatric journal standards on descriptive statistics, especially the use of SD and SE, I examined every issue of the three major pediatric journals published in 1981: American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics. In a less systematic way, I perused several issues of JAMA, the
Science.

Every issue of the three pediatric journals had articles, reports, or letters in which SD was mentioned, without specification of whether it was the descriptive SD or the estimate SD. Every issue of the Journal of Pediatrics contained articles using SE (unspecified) and articles using SEM. Pediatrics used SEM in every issue and the SE in every issue except one. Eight of the 12 issues of the American Journal of Diseases of Children used SE or SEM or both. All the journals used SE as if SE and SEM were synonymous.

Every issue of the three journals contained articles that stated the mean and range, without other indication of dispersion. Every journal contained re-
ber), with no explanation of what the number after the plus-minus symbol represented.

Every issue of the pediatric journals
thought of as samples without indicating
proportion) might be informative.
in one place, but SD is used in another place in the same article, sometimes in the same paragraph, with no explanation of the reason for each use. The use of percentiles to describe nongaussian distributions was infrequent. Similar examples of stylistic inconsistency were seen in the haphazard survey of JAMA, the New England Journal of Medicine, and Science.

A peculiar graphic device (seen in several journals) is the use, in illustrations that summarize data, of a point and vertical bars, with no indication of
A prevalent and unsettling practice is the use of the mean ± SD for data that are clearly not gaussian or not symmetric. Whenever data are reported with the SD as large or larger than the mean, the inference must be that several values are zero or negative. The mean ± 2 SDs should embrace about 95% of the values in a gaussian distribution. If the SD is as large as the mean, then the lower tail of the bell-shaped curve will go below zero. For many
biologic data, there can be no negative values; blood chemicals, serum enzymes, and cellular elements cannot exist in negative amounts.

An article by Fletcher and Fletcher entitled "Clinical Research in General Medical Journals" in a leading publication demonstrates the problem of ± SD in real life. The article states that in 1976 certain medical articles had an average of 4.9 authors ± 7.3 (SD).
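What a mean ± SD summary implies, if it is read as gaussian, can be checked numerically. The Python sketch below is an illustration and not part of the original article; the function name `gaussian_reading` is an assumption, and the figures 4.9 and 7.3 are those of the authorship example discussed in this section.

```python
from statistics import NormalDist


def gaussian_reading(mean, sd):
    """Return the 95% limits and the fraction of values at or below zero
    that a gaussian curve with this mean and SD would imply."""
    curve = NormalDist(mu=mean, sigma=sd)
    low95 = mean - 1.96 * sd
    high95 = mean + 1.96 * sd
    fraction_at_or_below_zero = curve.cdf(0)
    return low95, high95, fraction_at_or_below_zero


# Check that ±1.96 SD does enclose about 95% of a gaussian curve.
standard = NormalDist()
coverage_95 = standard.cdf(1.96) - standard.cdf(-1.96)
print(f"coverage of ±1.96 SD: {coverage_95:.3f}")

# Authors per article: mean 4.9, SD 7.3.
low, high, frac = gaussian_reading(4.9, 7.3)
print(f"95% limits: {low:.1f} to {high:.1f}")
print(f"implied fraction with zero or fewer authors: {frac:.1%}")
```

When the SD is larger than the mean, a gaussian reading places a substantial share of the values below zero, which is impossible for counts such as authors per article.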
If the authorship distribution is gaussian, which is necessary for ± SD to make sense, this statement means that 95% of the articles had 4.9 ± (1.96 × 7.3) authors, or from −9.4 to +19.2. Or stated another way, more than 25% of the articles had zero or fewer authors.

In such a situation, the SD is not good as a descriptive statistic. A mean and range would be better; percentiles would be logical and meaningful.

Deinard et al⁵ summarized some mental measurement scores using the mean ± SD and the range. They vividly showed two dispersions for the same data. For example, one set of values was 120.8 ± 15.2 (SD); the range was 63 to 140. The SD implies gaussian data, so 99% of the values should be within ± 2.58 SDs of the mean or between 81.6 and 160. Which dispersion should we believe, 63 to 140 or 81.6 to 160?

ADVICE OF AUTHORITIES

There may be a ground swell of interest among research authorities to help improve statistical use in the medical literature. Friedman and Phillips pointed out the embarrassing uncertainty that pediatric residents have with P values and correlation coefficients. Berwick and colleagues,⁷ using a questionnaire, reported considerable vagueness
many physicians in training, in academic medicine, and in practice. However,
attention given to the interesting but confusing properties of SD and SE.
urge that we be wary when comparative trials are reported as not statistically significant. Comparisons are vulnerable to the error of rejecting results that look negative, especially with small samples, but may not be. These authorities remind us of the error of failing to detect a real difference, eg, between controls and treated subjects, when such a difference exists. This failure is called the "error of the second kind," the Type II error, or the beta error. In laboratory language, this error is called the false-negative result, in which the test result says "normal" but nature reveals "abnormal" or "disease present." (The Type I error, the alpha error, is a more familiar one; it is the error of saying that two groups differ in some important way when they do not. The Type I error is like a false-positive laboratory test in that the test suggests that the subject is abnormal, when in
In comparative trials, calculation of the Type II error requires knowledge of the SEs, whether the comparisons are of group means (requiring SEM) or comparisons of group proportions (requiring
are advised²,³ to describe clinical data using means and the SD (for bell-shaped
SE. On the other hand, we are urged to examine clinical data for interesting confidence intervals,"¹² searching for latent scientific value and avoiding a too hasty pronouncement of not significant. To avoid this hasty fall into the Type II error (the false-negative decision), we must increase sample sizes; in this way, a worthwhile treatment or intervention may be sustained rather than wrongly discarded.

It may be puzzling that some authorities seem to be urging that the SE should rarely be used, but others are urging that more attention be paid to confidence intervals, which depend on the SE. This polarity is more apparent than real. If the investigator's aim is description of data, he should avoid the use of the SE; if his aim is to estimate population parameters or to test hypotheses, ie, inferential statistics, then some version of the SE is required.

WHO IS RESPONSIBLE?

It is not clear who should be held responsible for data displays and sum-
Does the responsibility lie at the door of the investigator-author and his statistical advisors, with the journal referees and reviewers, or with the editors? When I ask authors about their statistical style, the reply often is, "The editors made me do it."

An articulate defender of good statistical practice and usage is Feinstein,² who has regularly and effectively urged the appropriate application of biostatistics, including SD and SE. In his book, Clinical Biostatistics, he devotes an entire chapter (chap 23, pp 335-352) to "problems in the summary and display of statistical data." He offers some advice to readers who wish to improve the statistics seen in medical publications: "And the best person to help re-orient the editors is you, dear reader, you. Make yourself a one-person vigilante committee."²(p349)

Either the vigilantes are busy in other enterprises or the editors are not listening, because we continue to see the kind of inconsistent and confusing statistical practices that Eisenhart¹ and Feinstein² have been warning about for many years. I can only echo what others have said: When one sees medical publications with inappropriate, confusing, or wrong statistical presentation, one should write to the editors. Editors are, after all, the assigned defenders of the elegance and accuracy of our medical archives.

References