Understanding Standard Deviation and Standard Error: Uses in Biomedical Research

This document addresses the confusion between standard deviation (SD) and standard error (SE) in biomedical research. The author, George W. Brown, MD, highlights the importance of clarifying the usage and meaning of these terms in biomedical reports, offers insights into the differences between SD and SE, their sources, and their meanings, and suggests ways to avoid confusion in reporting research data.

What you will learn

  • Why are standard deviation and standard error used interchangeably in biomedical research?
  • What is the difference between standard deviation and standard error?

Typology: Slides

2021/2022

Uploaded on 09/12/2022

parvini
parvini 🇺🇸

4.5

(15)

243 documents

1 / 5

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!



Standard Deviation, Standard Error: Which 'Standard' Should We Use?

George W. Brown, MD
Standard deviation (SD) and standard error (SE) are quietly but extensively used in biomedical publications. These terms and notations are used as descriptive statistics (summarizing numerical data), and they are used as inferential statistics (estimating population parameters from samples). I review the use and misuse of SD and SE in several authoritative medical journals and make suggestions to help clarify the usage and meaning of SD and SE in biomedical reports.

(Am J Dis Child 1982;136:937-941)

From the Los Lunas Hospital and Training School, New Mexico, and the Department of Pediatrics, University of New Mexico School of Medicine, Albuquerque. Reprint requests to Los Lunas Hospital and Training School, Box 1269, Los Lunas, NM 87031 (Dr Brown).

Standard deviation (SD) and standard error (SE) have surface similarities; yet, they are conceptually so different that we must wonder why they are used almost interchangeably in the medical literature. Both are usually preceded by a plus-minus symbol (±), suggesting that they define a symmetric interval or range of some sort. They both appear almost always with a mean (average) of a set of measurements or counts of something. The medical literature is replete with statements like, "The serum cholesterol measurements were distributed with a mean of 180±30 mg/dL (SD)." In the same journal, perhaps in the same article, a different statement may appear: "The weight gains of the subjects averaged 720 (mean) ±32 g/mo (SE)." Sometimes, as discussed further, the summary data are presented as the "mean of 120 mg/dL ±12" without the "12" being defined as SD or SE, or as some other index of dispersion. Eisenhart1 warned against this "peril of shorthand expression" in 1968; Feinstein2 later again warned about the fatuity and confusion contained in any a ± b statements where b is not defined. Warnings notwithstanding, a glance through almost any medical journal will show examples of this usage.

Medical journals seldom state why SD or SE is selected to summarize data in a given report. A search of the three major pediatric journals for 1981 (American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics) failed to turn up a single article in which the selection of SD or SE was explained. There seems to be no uniformity in the use of SD or SE in these journals or in The Journal of the American Medical Association (JAMA), the New England Journal of Medicine, or Science. The use of SD and SE in the journals will be discussed further.

If these respected, well-edited journals do not demand consistent use of either SD or SE, are there really any important differences between them? Yes, they are remarkably different, despite their superficial similarities. They are so different in fact that some authorities have recommended that SE should rarely or never be used to summarize medical research data. Feinstein2 noted the following:

A standard error has nothing to do with standards, with errors, or with the communication of scientific data. The concept is an abstract idea, spawned by the imaginary world of statistical inference and pertinent only when certain operations of that imaginary world are met in scientific reality.2(p336)

Glantz3 also has made the following recommendation:

Most medical investigators summarize their data with the standard error because it is always smaller than the standard deviation. It makes their data look better ... data should never be summarized with the standard error of the mean.3(p25)

A closer look at the source and meaning of SD and SE may clarify why medical investigators, journal reviewers, and editors should scrutinize their usage with considerable care.

DISPERSION

An essential function of "descriptive statistics" is the presentation of condensed, shorthand symbols that epitomize the important features of a collection of data. The idea of a central value is intuitively satisfactory to anyone who needs to summarize a group of measurements, or counts. The traditional indicators of a central tendency are the mode (the most frequent value), the median (the value midway between the lowest and the highest value), and the mean (the average). Each has its special uses, but the mean has great convenience and flexibility for many purposes.

The dispersion of a collection of values can be shown in several ways; some are simple and concise, and others are complex and esoteric. The range is a simple, direct way to indicate the spread of a collection of values, but it does not tell how the values are distributed. Knowledge of the mean adds considerably to the information carried by the range.

Another index of dispersion is provided by the differences (deviations) of each value from the mean of the values. The trouble with this approach is that some deviations will be positive, and some will be negative, and their sum will be zero. We could ignore the sign of each deviation, ie, use the "absolute mean deviation," but mathematicians tell us that working with absolute numbers is extremely difficult and fraught with technical disadvantages.

A neglected method for summarizing the dispersion of data is the calculation of percentiles (or deciles, or quartiles). Percentiles are used more frequently in pediatrics than in other branches of medicine, usually in growth charts or in other data arrays that are clearly not symmetric or bell shaped. In the general medical literature, percentiles are sparsely used, apparently because of a common, but erroneous, assumption that the mean ± SD or SE is satisfactory for summarizing central tendency and dispersion of all sorts of data.
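These indices are straightforward to compute. The following minimal Python sketch is illustrative only (the values and the percentile helper are invented, not taken from the article); it shows the range, the zero-sum property of raw deviations, the absolute mean deviation, and quartiles.

```python
# Minimal sketch (invented data): simple indices of dispersion.
values = [112, 118, 120, 121, 125, 126, 130, 134, 140, 163]

mean = sum(values) / len(values)
value_range = (min(values), max(values))            # range: shows spread, but not shape

deviations = [v - mean for v in values]
sum_of_deviations = sum(deviations)                  # always (essentially) zero, hence unhelpful

abs_mean_deviation = sum(abs(d) for d in deviations) / len(values)

def percentile(data, p):
    """Crude percentile by linear interpolation between sorted values."""
    s = sorted(data)
    k = (len(s) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

quartiles = [percentile(values, p) for p in (25, 50, 75)]
print(mean, value_range, round(sum_of_deviations, 10), abs_mean_deviation, quartiles)
```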

STANDARD DEVIATION

The generally accepted answer to the need for a concise expression for the dispersion of data is to square the difference of each value from the group mean, giving all positive values. When these squared deviations are added up and then divided by the number of values in the group, the result is the variance. The variance is always a positive number, but it is in different units than the mean. The way around this inconvenience is to use the square root of the variance, which is the population standard deviation (σ), which for convenience will be called SD. Thus, the SD is the square root of the averaged squared deviations from the mean. The SD is sometimes called by the shorthand term, "root-mean-square."

The SD, calculated in this way, is in the same units as the original values and the mean. The SD has additional properties that make it attractive for summarizing dispersion, especially if the data are distributed symmetrically in the revered bell-shaped, gaussian curve. Although there are an infinite number of gaussian curves, the one for the data at hand is described completely by the mean and SD. For example, the mean ±1.96 SD will enclose 95% of the values; the mean ±2.58 SD will enclose 99% of the values. It is this symmetry and elegance that contribute to our admiration of the gaussian curve.

The bad news, especially for biologic data, is that many collections of measurements or counts are not symmetric or bell shaped. Biologic data tend to be skewed or double humped, J shaped, U shaped, or flat on top. Regardless of the shape of the distribution, it is still possible by rote arithmetic to calculate an SD, although it may be inappropriate and misleading.

For example, one can imagine throwing a six-sided die several hundred times and recording the score at each throw. This would generate a flat-topped, ie, rectangular, distribution, with about the same number of counts for each score, 1 through 6. The mean of the scores would be 3.5 and the SD would be about 1.7. The trouble is that the collection of scores is not bell shaped, so the SD is not a good summary statement of the true form of the data. (It is mildly upsetting to some that no matter how many times the die is thrown, it will never show its average score of 3.5.)
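As a quick check of the die example, here is a small Python simulation (a sketch; the 500-throw count and the random seed are arbitrary choices, not from the article). It produces a roughly rectangular distribution whose mean is near 3.5 and whose SD is near 1.7, even though no single throw can equal 3.5.

```python
import random
from collections import Counter

# Sketch: simulate several hundred throws of a fair six-sided die.
random.seed(0)                       # arbitrary seed, for reproducibility
throws = [random.randint(1, 6) for _ in range(500)]

mean = sum(throws) / len(throws)
variance = sum((t - mean) ** 2 for t in throws) / len(throws)   # descriptive variance (divide by N)
sd = variance ** 0.5

print(Counter(throws))               # roughly equal counts for scores 1..6 (flat, not bell shaped)
print(f"mean = {mean:.2f}, SD = {sd:.2f}")                        # close to 3.5 and 1.7
```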

"V<

( -μ)' SD =^ - ) SD of (^) Population μ =^ Mean^ of^ Population = (^) Number in (^) Population Estimate of^ Population SD^ From^ Sample X =^ Mean of (^) Sample = (^) Number in (^) Sample

Fig 1.—Standard^ deviation^ (SD) of^ population is^ shown^ at^ left.^ Estimate^ of^ population^ SD^ derived
from sample is shown at^ right.
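The two calculations in Fig 1 can be sketched directly in Python (the data and function names below are mine, for illustration): the descriptive SD divides by N, the estimator SD divides by n − 1, so the estimator comes out slightly larger. Python's statistics.pstdev and statistics.stdev implement the same pair.

```python
import math

def descriptive_sd(values):
    """SD of a population (or of data that represent only themselves): divide by N."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def estimator_sd(sample):
    """Estimate of the population SD from a random sample: divide by n - 1."""
    xbar = sum(sample) / len(sample)
    return math.sqrt(sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1))

data = [3.1, 2.7, 3.6, 2.9, 3.3, 3.8, 2.5, 3.0]    # invented example values
print(descriptive_sd(data), estimator_sd(data))     # the estimator SD is slightly larger
```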

Fig 2.—Standard error of mean (SEM) is shown at left; standard error of proportion is shown at right. Note that SD is the estimate of population SD (not σ, the actual SD of the population); the sample size used to calculate SEM is n.

SEM = SD / √n, where SD = estimate of population SD, n = sample size.

SEp = √( pq / n ), where p = proportion estimated from sample, q = (1 − p), n = sample size.
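The two formulas in Fig 2 translate directly into code; this is a minimal illustrative sketch (the function names and example numbers are mine, not from the article).

```python
import math

def sem(sample_sd, n):
    """Standard error of the mean: estimated population SD divided by the square root of n."""
    return sample_sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion estimated from a sample of size n."""
    return math.sqrt(p * (1 - p) / n)

print(sem(15.0, 25))             # e.g., an SD estimate of 15.0 from a sample of 25 -> SEM 3.0
print(se_proportion(0.6, 10))    # the six-of-ten example discussed below -> about 0.155
```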


The SD wears two hats. So far, we have looked at its role as a descriptive statistic for measurements or counts that are representative only of themselves, ie, the data being summarized are not a sample representing a larger (and itself unmeasurable) universe or population. The second hat involves the use of SD from a random sample as an estimate of the population standard deviation (σ). The formal statistical language says that the sample statistic, SD, is an unbiased estimate of a population parameter, the population standard deviation, σ.

This "estimator SD" is calculated differently than the SD used to describe data that represent only themselves. When a sample is used to make estimates about the population standard deviation, the calculations require two changes, one in concept and the other in arithmetic. First, the mean used to determine the deviations is conceptualized as an estimate of the mean, x̄, rather than as a true and exact population mean (μ). Both means are calculated in the same way, but a population mean, μ, stands for itself and is a parameter; a sample mean, x̄, is an estimate of the mean of a larger population and is a statistic.

The second change in calculation is in the arithmetic: the sum of the squared deviations from the (estimated) mean is divided by n − 1, rather than by N. (This makes sense intuitively when we recall that a sample would not show as great a spread of values as the source population. Reducing the denominator [by one] produces an estimate slightly larger than the sample SD. This "correction" has more impact when the sample is small than when n is large.)

Formulas for the two versions of SD are shown in Fig 1. The formulas follow the customary use of Greek letters for population parameters and English letters for sample statistics. The number in a sample is indicated by the lowercase n.

…sample means), regardless of the shape of the population distribution. These elegant features of the SEM are embodied in a statistical principle called the Central Limit Theorem, which says, among other things:

The mean of the collection of many sample means is a good estimate of the mean of the population, and the distribution of the sample means (if n = 30 or larger) will be nearly gaussian regardless of the distribution of the population from which the samples are taken.

The theorem also says that the collection of sample means from large samples will be better in estimating the population mean than means from small samples.
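A small simulation can make the theorem concrete. The sketch below is illustrative only (the exponential source population, the sample size of 30, and the number of repeated samples are my assumptions): the means of repeated samples center on the population mean and have a spread close to SD/√n, even though the source population is strongly skewed.

```python
import random
import statistics

random.seed(1)                                     # arbitrary seed
population_mean = 10.0
n = 30                                             # sample size
num_samples = 2000                                 # number of repeated samples (arbitrary)

# Skewed source population: exponential with mean 10 (its SD is also 10).
def draw_sample():
    return [random.expovariate(1 / population_mean) for _ in range(n)]

sample_means = [statistics.mean(draw_sample()) for _ in range(num_samples)]

print(statistics.mean(sample_means))               # close to the population mean, 10
print(statistics.stdev(sample_means))              # close to SD/sqrt(n) = 10/sqrt(30), about 1.83
```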

Given the symmetry and usefulness of SEs in inferential statistics, it is no wonder that some form of the SE, especially the SEM, is used so frequently in technical publications. A flaw occurs, however, when a confidence interval based on the SEM is used to replace the SD as a descriptive statistic; if a description of data spread is needed, the SD should be used. As Feinstein2 has observed, the reader of a research report may be interested in the span or range of the data, but the author of the report instead displays an estimated zone of the mean (SEM).

An absolute prohibition against the use of the SEM in medical reports is not desirable. There are situations in which the investigator is using a truly random sample for estimation purposes. Random samples of children have been used, for example, to estimate population parameters of growth. The essential element is that the investigator (and editor) recognize when descriptive statistics should be used, and when inferential (estimation) statistics are required.

SE OF PROPORTION

As mentioned previously, every sample statistic has its SE. With every statistic, there is a confidence interval that can be estimated. Despite the widespread use of SE (unspecified) and of SEM in medical journals and books, there is a noticeable neglect of one important SE, the SE of the proportion. The discussion so far has dealt with measurement data or counts of elements. Equally important are data reported in proportions or percentages, such as, "Six of the ten patients with zymurgy syndrome had so-and-so." From this, it is an easy step to say, "Sixty percent of our patients with zymurgy syndrome had so-and-so." The implication of such a statement may be that the author wishes to alert other clinicians, who may encounter samples from the universe of patients with zymurgy syndrome, that they may see so-and-so in about 60% of them.

The proportion—six of ten—has an SE of the proportion. As shown in Fig 2, the SEp in this situation is the square root of (0.6 × 0.4) divided by ten, which equals 0.155. The true proportion of so-and-so in the universe of patients with zymurgy syndrome is in the confidence interval that falls symmetrically on both sides of six of ten. To estimate the interval, we start with 0.6 or 60% as the midpoint of the interval. At the 95% level of confidence, the interval is 0.6 ± 1.96 SEp, which is 0.6 ± (1.96 × 0.155), or from 0.3 to 0.9.

If the sample shows six of ten, the 95% confidence interval is between 30% (three of ten) and 90% (nine of ten). This is not a very narrow interval. The expanse of the interval may explain the almost total absence of the SEp in medical reports, even in journals where the SEM and SD are used abundantly. Investigators may be dismayed by the dimensions of the confidence interval when the SEp is calculated from the small samples available in clinical situations. Of course, as in the measurement of self-contained data, the investigator may not think of his clinical material as a sample from a larger universe. But often, it is clear that the purpose of publication is to suggest to other investigators or clinicians that, when they see patients of a certain type, they might expect to encounter certain characteristics in some estimated proportion of such patients.
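The six-of-ten arithmetic is easy to reproduce; here is a minimal sketch using the normal-approximation interval described in the text (not an exact binomial method).

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion, as in the text's example."""
    p = successes / n
    se_p = math.sqrt(p * (1 - p) / n)
    return p - z * se_p, p + z * se_p

low, high = proportion_ci(6, 10)
print(f"SEp = {math.sqrt(0.6 * 0.4 / 10):.3f}")    # about 0.155
print(f"95% CI: {low:.2f} to {high:.2f}")           # about 0.30 to 0.90
```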

JOURNAL USE OF SD AND SE

To get empiric information about pediatric journal standards on descriptive statistics, especially the use of SD and SE, I examined every issue of the three major pediatric journals published in 1981: American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics. In a less systematic way, I perused several issues of JAMA, the New England Journal of Medicine, and Science.

Every issue of the three pediatric journals had articles, reports, or letters in which SD was mentioned, without specification of whether it was the descriptive SD or the estimate SD. Every issue of the Journal of Pediatrics contained articles using SE (unspecified) and articles using SEM. Pediatrics used SEM in every issue and the SE in every issue except one. Eight of the 12 issues of the American Journal of Diseases of Children used SE or SEM or both. All the journals used SE as if SE and SEM were synonymous.

Every issue of the three journals contained articles that stated the mean and range, without other indication of dispersion. Every journal contained reports with a number ± (another number), with no explanation of what the number after the plus-minus symbol represented. Every issue of the pediatric journals presented proportions of what might be thought of as samples without indicating that the SEp (standard error of the proportion) might be informative.

In several reports, SE or SEM is used in one place, but SD is used in another place in the same article, sometimes in the same paragraph, with no explanation of the reason for each use. The use of percentiles to describe nongaussian distributions was infrequent. Similar examples of stylistic inconsistency were seen in the haphazard survey of JAMA, the New England Journal of Medicine, and Science. A peculiar graphic device (seen in several journals) is the use, in illustrations that summarize data, of a point and vertical bars, with no indication of what the length of the bars signifies.

A prevalent and unsettling practice is the use of the mean ± SD for data that are clearly not gaussian or not symmetric. Whenever data are reported with the SD as large as or larger than the mean, the inference must be that several values are zero or negative. The mean ±2 SDs should embrace about 95% of the values in a gaussian distribution. If the SD is as large as the mean, then the lower tail of the bell-shaped curve will go below zero. For many biologic data, there can be no negative values; blood chemicals, serum enzymes, and cellular elements cannot exist in negative amounts.

An article by Fletcher and Fletcher4 entitled "Clinical Research in General Medical Journals" in a leading publication demonstrates the problem of ± SD in real life. The article states that in 1976 certain medical articles had an average of 4.9 authors ±7.3 (SD)! If the authorship distribution is gaussian, which is necessary for ± SD to make sense, this statement means that 95% of the articles had 4.9 ± (1.96 × 7.3) authors, or from −9.4 to +19.2. Or stated another way, more than 25% of the articles had zero or fewer authors. In such a situation, the SD is not good as a descriptive statistic. A mean and range would be better; percentiles would be logical and meaningful.

Deinard et al5 summarized some mental measurement scores using the mean ± SD and the range. They vividly showed two dispersions for the same data. For example, one set of values was 120.8 ± 15.2 (SD); the range was 63 to 140. The SD implies gaussian data, so 99% of the values should be within ±2.58 SDs of the mean, or between 81.6 and 160. Which dispersion should we believe, 63 to 140 or 81.6 to 160?
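The arithmetic behind both examples above can be checked in a few lines; this sketch uses only the numbers quoted in the text.

```python
# Sketch: the implied gaussian intervals from the two examples in the text.

# Fletcher and Fletcher: mean 4.9 authors, SD 7.3 -> implied 95% interval
mean_authors, sd_authors = 4.9, 7.3
print(mean_authors - 1.96 * sd_authors, mean_authors + 1.96 * sd_authors)   # about -9.4 to +19.2

# Deinard et al: mean 120.8, SD 15.2 -> implied 99% interval vs the reported range of 63 to 140
mean_score, sd_score = 120.8, 15.2
print(mean_score - 2.58 * sd_score, mean_score + 2.58 * sd_score)           # about 81.6 to 160
```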

ADVICE OF AUTHORITIES

There may be a ground swell of interest among research authorities to help improve statistical use in the medical literature. Friedman and Phillips6 pointed out the embarrassing uncertainty that pediatric residents have with p values and correlation coefficients. Berwick and colleagues,7 using a questionnaire, reported considerable vagueness about statistical concepts among many physicians in training, in academic medicine, and in practice. However, in neither of these reports is attention given to the interesting but confusing properties of SD and SE.

In several reports,8-10 the authors urge that we be wary when comparative trials are reported as not statistically significant. Comparisons are vulnerable to the error of rejecting results that look negative, especially with small samples, but may not be. These authorities remind us of the error of failing to detect a real difference, eg, between controls and treated subjects, when such a difference exists. This failure is called the "error of the second kind," the Type II error, or the beta error. In laboratory language, this error is called the false-negative result, in which the test result says "normal" but nature reveals "abnormal" or "disease present." (The Type I error, the alpha error, is a more familiar one; it is the error of saying that two groups differ in some important way when they do not. The Type I error is like a false-positive laboratory test in that the test suggests that the subject is abnormal, when in truth he is normal.)

In comparative trials, calculation of the Type II error requires knowledge of the SEs, whether the comparisons are of group means (requiring SEM) or comparisons of group proportions (requiring SEp).
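To illustrate how an SE enters a Type II error (power) calculation, here is a rough normal-approximation sketch for comparing two group means; the function, its parameters, and the example numbers are illustrative assumptions, not a method given in the article.

```python
import math
from statistics import NormalDist

def approx_power_two_means(true_difference, sem1, sem2, alpha=0.05):
    """Rough normal-approximation power for detecting a difference between two group means."""
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)       # SE of the difference between the means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value (about 1.96)
    z_effect = abs(true_difference) / se_diff
    return 1 - NormalDist().cdf(z_crit - z_effect)    # probability of detecting the difference

# Example: a true difference of 10 units, with an SEM of 4 in each group.
print(round(approx_power_two_means(10, 4, 4), 2))     # about 0.42, so beta (Type II error) is about 0.58
```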

At the outset, I mentioned that we are advised2,3 to describe clinical data using means and the SD (for bell-shaped distributions) and to eschew use of the SE. On the other hand, we are urged to examine clinical data for interesting confidence intervals,11,12 searching for latent scientific value and avoiding a too hasty pronouncement of not significant. To avoid this hasty fall into the Type II error (the false-negative decision), we must increase sample sizes; in this way, a worthwhile treatment or intervention may be sustained rather than wrongly discarded.

It may be puzzling that some authorities seem to be urging that the SE should rarely be used, but others are urging that more attention be paid to confidence intervals, which depend on the SE. This polarity is more apparent than real. If the investigator's aim is description of data, he should avoid the use of the SE; if his aim is to estimate population parameters or to test hypotheses, ie, inferential statistics, then some version of the SE is required.

WHO IS RESPONSIBLE?

It is not clear who should be held responsible for data displays and summary methods in medical reports. Does the responsibility lie at the door of the investigator-author and his statistical advisors, with the journal referees and reviewers, or with the editors? When I ask authors about their statistical style, the reply often is, "The editors made me do it."

An articulate defender of good statistical practice and usage is Feinstein,2 who has regularly and effectively urged the appropriate application of biostatistics, including SD and SE. In his book, Clinical Biostatistics, he devotes an entire chapter (chap 23, pp 335-352) to "problems in the summary and display of statistical data." He offers some advice to readers who wish to improve the statistics seen in medical publications: "And the best person to help re-orient the editors is you, dear reader, you. Make yourself a one-person vigilante committee."2(p349)

Either the vigilantes are busy in other enterprises or the editors are not listening, because we continue to see the kind of inconsistent and confusing statistical practices that Eisenhart1 and Feinstein2 have been warning about for many years. I can only echo what others have said: When one sees medical publications with inappropriate, confusing, or wrong statistical presentation, one should write to the editors. Editors are, after all, the assigned defenders of the elegance and accuracy of our medical archives.

References

  1. Eisenhart C: Expression of the uncertainties of final results. Science 1968;160:1201-1204.
  2. Feinstein AR: Clinical Biostatistics. St Louis, CV Mosby Co, 1977.
  3. Glantz SA: Primer of Biostatistics. New York, McGraw-Hill Book Co, 1981.
  4. Fletcher R, Fletcher S: Clinical research in general medical journals: A 30-year perspective. N Engl J Med 1979;301:180-183.
  5. Deinard A, Gilbert A, Dodd M, et al: Iron deficiency and behavioral deficits. Pediatrics 1981;68:828-833.
  6. Friedman SB, Phillips S: What's the difference?: Pediatric residents and their inaccurate concepts regarding statistics. Pediatrics 1981;68:644-646.
  7. Berwick DM, Fineberg HV, Weinstein MC: When doctors meet numbers. Am J Med 1981;71:991-998.
  8. Freiman JA, Chalmers TC, Smith H Jr, et al: The importance of beta, the Type II error and sample size in the design and interpretation of the randomized control trial: Survey of 71 'negative' trials. N Engl J Med 1978;299:690-694.
  9. Berwick DM: Experimental power: The other side of the coin. Pediatrics 1980;65:1043-1045.
  10. Pascoe JM: Was it a Type II error? Pediatrics 1981;68:149-150.
  11. Rothman KJ: A show of confidence. N Engl J Med 1978;299:1362-1363.
  12. Guess H: Lack of predictive indices in kernicterus—or lack of power? Pediatrics 1982;69:383.