

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS
Wisconsin Univ-Madison Mathematics Research Center
W. Loh, Apr 83
MRC-TSR-2506, Contract DAAG29-80-C-0041, Unclassified


UNIVERSITY OF WISCONSIN - MADISON
MATHEMATICS RESEARCH CENTER

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS

Wei-Yin Loh

Technical Summary Report #2506
April 1983

ABSTRACT

Csiszar (1975) has shown that Kullback-Leibler information numbers possess some geometrical properties much like those in Euclidean geometry. This paper extends these results by characterizing the shortest line between two distributions as well as the mid-point of the line. It turns out that the distributions comprising the line have applications to the problem of testing separate families of hypotheses.

AMS (MOS) Subject Classifications: Primary - 60E05, 62B; Secondary - 62F

Key Words: Kullback-Leibler information, geometry of probability distributions, minimax, embedding

Work Unit Number 4 - Statistics and Probability

Department of Statistics and Mathematics Research Center, University of Wisconsin, Madison, WI 53705. Sponsored by the United States Army under Contract No. DAAG29-80-C-0041. This material is based upon work supported by the National Science Foundation under Grant No. MCS-7927062, Mod. 2 and Nos. MCS-7825301 and MCS-7903716.

SIGNIFICANCE AND EXPLANATION

The Kullback-Leibler information number is a well-known measure of statistical distance between probability distributions. Previous authors have shown that when endowed with this distance measure, the space of probability distributions possesses geometrical properties analogous to Euclidean geometry. This paper proves a new geometrical property by showing that one can in fact define the shortest line between two probability distributions as well as its mid-point.

It turns out that the probability distributions comprising this line have long ago been used as a tool in the important problem of testing statistical hypotheses involving nuisance parameters. Apart from pure mathematical convenience, there has been little justification for their use. The results in this paper are the first attempt at such an explanation.


The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the author of this report.


2. Notations and definitions

Recall that if F and G are two distributions on the same measurable space, the Kullback-Leibler information number K(F,G) is defined as

K(F,G) = ∫ log(dF/dG) dF if F << G, and +∞ otherwise,

where F << G means that F is absolutely continuous with respect to G. It is well known that K(F,G) is well-defined, nonnegative, and is equal to zero if and only if F(B) = G(B) for all measurable sets B. We need the following definitions in the rest of this paper.

Definition 2.1. A distribution P is closer to F and G than Q is if K(P,F) ≤ K(Q,F) and K(P,G) ≤ K(Q,G), with at least one inequality being strict. In symbols we write P <_{F,G} Q (or P < Q if it is clear from the context what F and G are).

Definition 2.2. P is a mid-point of F and G if K(P,F) = K(P,G) and there does not exist Q for which Q < P.

Definition 2.3. P is minimax for F and G if max{K(P,F), K(P,G)} = min{max{K(Q,F), K(Q,G)}}, where the min is taken over the space of all distributions.

Throughout this paper, μ denotes a measure that dominates both F and G, and f(x), g(x) are their respective densities relative to μ. For convenience, we let A denote the set

(2.1) A = {x : f(x)g(x) ≠ 0},

and let P_λ (0 ≤ λ ≤ 1) be the distribution with density (with respect to μ) given by

(2.2) p_λ(x) = k_λ f^{1-λ}(x) g^λ(x) on A, and 0 otherwise,

where k_λ^{-1} = ∫_A f^{1-λ} g^λ dμ, and

(2.3) P = {P_λ : 0 < λ < 1} ∪ {F, G}

(note that if F and G are mutually absolutely continuous, P_0 = F, P_1 = G, and P is an exponential family). Finally we need the function

(2.4) J(λ) = ∫_A p_λ log(g/f) dμ.

We will often abbreviate K(P_λ,F) to K(λ,F), etc.
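The objects defined above are easy to experiment with on a finite sample space. The following sketch (my own illustration, not part of the report; the function names kl, p_lambda and J are ad hoc) implements K(F,G), the densities (2.2), and the function (2.4) for distributions given as probability vectors:

```python
import math

def kl(p, q):
    """Kullback-Leibler information number K(P,Q) for distributions
    on a finite set, given as lists of probabilities."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf   # P << Q fails, so K(P,Q) = +infinity
            total += pi * math.log(pi / qi)
    return total

def p_lambda(f, g, lam):
    """Density (2.2): proportional to f^(1-lam) g^lam on A = {fg != 0}."""
    w = [fi ** (1 - lam) * gi ** lam if fi > 0 and gi > 0 else 0.0
         for fi, gi in zip(f, g)]
    s = sum(w)                    # s = 1/k_lambda, the normalizing constant
    return [wi / s for wi in w]

def J(f, g, lam):
    """The function (2.4): integral of p_lambda * log(g/f) over A."""
    p = p_lambda(f, g, lam)
    return sum(pi * math.log(gi / fi)
               for pi, fi, gi in zip(p, f, g) if pi > 0.0)

f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
print(kl(f, g), kl(g, f), J(f, g, 0.5))
```

Note that K is not symmetric: kl(f, g) and kl(g, f) generally differ.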

Proof. The first assertion follows from the preceding lemma and the relations

(3.2) K(λ,F) = log k_λ + λ J(λ),

(3.3) K(λ,G) = log k_λ − (1−λ) J(λ).

Differentiation yields, for 0 < λ < 1,

(3.4) λ^{-1} (d/dλ)K(λ,F) = −(1−λ)^{-1} (d/dλ)K(λ,G) = Var_{P_λ}{log(g(X)/f(X))} ≥ 0.

This proves the second assertion. It is easy to see that strict inequality holds in (3.4) for some 0 < λ < 1 if and only if it holds for all
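Identities (3.2) and (3.3) can be checked numerically on a finite sample space. A minimal sketch (my own, not from the report), comparing the direct Kullback-Leibler numbers against the right-hand sides of (3.2) and (3.3):

```python
import math

f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
lam = 0.4

def kl(p, q):
    # K(P,Q) for strictly positive finite distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# k_lambda^{-1} = sum of f^(1-lam) g^lam, so log k_lambda = -log(sum)
w = [fi ** (1 - lam) * gi ** lam for fi, gi in zip(f, g)]
log_k = -math.log(sum(w))
p = [wi / sum(w) for wi in w]          # density p_lambda of (2.2)
Jlam = sum(pi * math.log(gi / fi) for pi, fi, gi in zip(p, f, g))

# (3.2): K(lam,F) = log k_lam + lam * J(lam)
print(kl(p, f), log_k + lam * Jlam)
# (3.3): K(lam,G) = log k_lam - (1 - lam) * J(lam)
print(kl(p, g), log_k - (1 - lam) * Jlam)
```

The two numbers on each line agree to floating-point precision.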

Lemma 3.3. Suppose that μ(A) ≠ 0. Let Q be such that K(Q,F) and K(Q,G) are both finite, and define

(3.5) r(λ) = ∫ log(p_λ/f) dQ,  s(λ) = ∫ log(p_λ/g) dQ.

Then (i) r(λ) and s(λ) are finite and continuous in [0,1], and (ii) if for some 0 ≤ λ ≤ 1,

(3.6) r(λ) = K(λ,F),

then s(λ) = K(λ,G).

Proof. The finiteness of K(Q,F) and K(Q,G) means that Q is absolutely continuous with respect to P_λ for all λ in [0,1]. Therefore we may write

r(λ) = log k_λ + λ(K(Q,F) − K(Q,G)),

s(λ) = log k_λ − (1−λ)(K(Q,F) − K(Q,G)).

Assertion (i) now follows from Lemma 3.1. To get (ii), use the fact that

K(λ,F) = log k_λ + λ(K(λ,F) − K(λ,G))

and

K(λ,G) = log k_λ − (1−λ)(K(λ,F) − K(λ,G)).


The proof of the next lemma is trivial. A more general version appears in Csiszar (1975).

Lemma 3.4. Let P, Q, R be three distinct distributions such that P << R and K(Q,P) < ∞. Then

∫ log(dP/dR) dQ = K(P,R)

if and only if

K(Q,R) = K(Q,P) + K(P,R).

A similar result holds if both equality signs are replaced with inequality signs.
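The lemma rests on a simple decomposition: when Q << P << R, K(Q,R) − K(Q,P) = ∫ log(dP/dR) dQ, so the left side equals K(P,R) exactly when the integral does. A numerical check of the decomposition on a finite space (my own illustration, not from the report):

```python
import math

def kl(p, q):
    # K(P,Q) for strictly positive finite distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
R = [0.4, 0.4, 0.2]

# integral of log(dP/dR) with respect to Q
integral = sum(qi * math.log(pi / ri) for qi, pi, ri in zip(Q, P, R))

# K(Q,R) = K(Q,P) + integral, so the integral equals K(P,R)
# if and only if K(Q,R) = K(Q,P) + K(P,R).
print(kl(Q, R), kl(Q, P) + integral)
```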

Corollary 4.1. (i) If F and G are mutually absolutely continuous, N exists and equals P_λ for some unique λ in (0,1). (ii) If F and G are mutually singular, N does not exist. (iii) N is unique whenever it exists.

Proof. Assertion (i) follows from the fact that if F and G are mutually absolutely continuous and distinct from each other, then K(0,F) = K(1,G) = 0, and both K(λ,F) and K(λ,G) are strictly monotone for 0 ≤ λ ≤ 1. Assertion (ii) is immediate from Theorem 4.1 since P = {F,G} if F and G are mutually singular. To prove assertion (iii), suppose that F and G are not mutually singular and N exists. If there are λ_1 ≠ λ_2 in [0,1] such that P_{λ_1} and P_{λ_2} are both mid-points of F and G, then

K(λ_1,F) = K(λ_1,G) = K(λ_2,F) = K(λ_2,G)

and it follows from (3.4) that g(x)/f(x) is constant a.e. (μ) on A. This implies that P_λ = P_0 for all 0 ≤ λ ≤ 1 and hence that N is unique.

Corollary 4.2. Suppose F and G are not mutually singular. Then the mid-point N exists if and only if

(4.2) J(λ) = 0 for some 0 ≤ λ ≤ 1,

in which case N = P_λ.

Proof. According to Theorem 4.1, N exists if and only if

(4.3) K(λ,F) = K(λ,G) < ∞ for some 0 ≤ λ ≤ 1.

It is clear from (3.2) and (3.3) that this is equivalent to (4.2).

Corollary 4.1 states that mutual singularity of F and G is a sufficient condition for the non-existence of the mid-point. The following example shows that the condition is not necessary.

Example 4.1. Let F be the uniform distribution on (0,3) and G be uniform on (1,2). Then P_λ = G for all 0 < λ ≤ 1, and (4.3) does not hold for any λ. There is thus no mid-point.
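Corollary 4.2 suggests a simple computational recipe for finding a mid-point on a finite sample space: by (3.4), J is nondecreasing in λ, so a root of J(λ) = 0 can be located by bisection. A sketch under these assumptions (my own, not part of the report):

```python
import math

def kl(p, q):
    # K(P,Q) for finite distributions with common support
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def p_lambda(f, g, lam):
    # density (2.2), proportional to f^(1-lam) g^lam on A = {fg != 0}
    w = [fi ** (1 - lam) * gi ** lam if fi > 0 and gi > 0 else 0.0
         for fi, gi in zip(f, g)]
    s = sum(w)
    return [wi / s for wi in w]

def J(f, g, lam):
    p = p_lambda(f, g, lam)
    return sum(pi * math.log(gi / fi)
               for pi, fi, gi in zip(p, f, g) if pi > 0)

def midpoint(f, g, tol=1e-12):
    """Bisect for J(lam) = 0; J is nondecreasing in lam by (3.4)."""
    lo, hi = 0.0, 1.0
    if J(f, g, lo) > 0 or J(f, g, hi) < 0:
        return None               # no root in [0,1]: no mid-point
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if J(f, g, mid) < 0:
            lo = mid
        else:
            hi = mid
    return p_lambda(f, g, 0.5 * (lo + hi))

f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
N = midpoint(f, g)
print(kl(N, f), kl(N, g))         # equal, as Definition 2.2 requires
```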


The P_λ (= G) in this example is "minimax" according to Definition 2.3. It turns out that minimax distributions always exist. Uniqueness may be lost, but only in trivial cases. This is made explicit in the next corollary.

Corollary 4.3. (i) A minimax distribution always exists. (ii) If F and G are not mutually singular, the minimax distribution is unique. (iii) If F and G are mutually singular, every distribution is minimax. (iv) Every mid-point is the unique minimax.

Proof. Since every mid-point is minimax by definition, assertion (iv) is immediate from Corollary 4.1. It remains to prove assertions (i) - (iii) only for the case when the mid-point does not exist. To prove assertion (ii), suppose that F and G are not mutually singular. It is clear from (4.3) that the mid-point does not exist if and only if the graphs of K(λ,F) and K(λ,G) fail to intersect in [0,1]. But from (3.4) either

(4.4) (d/dλ)K(λ,F) > 0 and (d/dλ)K(λ,G) < 0 for all 0 < λ < 1,

or

(4.5) (d/dλ)K(λ,F) = (d/dλ)K(λ,G) = 0 for all 0 < λ < 1.

Therefore either K(0,F) > K(0,G) or K(1,F) < K(1,G). Assume, without loss of generality, that K(0,F) > K(0,G). First suppose that (4.4) holds. Then for all 0 < λ ≤ 1,

(4.6) max{K(0,F), K(0,G)} < max{K(λ,F), K(λ,G)}.

If K(G,F) < ∞, then G << F, P_1 = G, and (4.6) yields

(4.7) max{K(0,F), K(0,G)} < max{K(G,F), K(G,G)}.

Clearly (4.7) is trivially true also if K(G,F) = ∞. A similar argument shows that

(4.8) max{K(0,F), K(0,G)} ≤ max{K(F,F), K(F,G)}

with equality if and only if P_0 = F. Now (4.6) - (4.8) show that P_0 uniquely minimizes max{K(P,F), K(P,G)} over all P ∈ P. We conclude from
5. Examples. We end this discussion with two examples.

Example 5.1 (Binomial). Let F be Bin(n,p_1) (binomial with n trials and success probability p_1) and G be Bin(n,p_2). Write q_i = 1 − p_i. Then every member of P is binomial, and the mid-point N is Bin(n,p) where

p = log(q_2/q_1)/log(p_1 q_2/p_2 q_1).

This formula applies and yields p between p_1 and p_2 only when neither p_1 nor p_2 is 0 or 1. If p_1 = 0 and 0 < p_2 < 1, for example, the formula gives p = 0. The reason for this strange result is that here there is no mid-point since P = {F,G}. It can be shown that if both p_1 and p_2 are neither 0 nor 1, then p lies strictly between p_1 and p_2, with p = .5 when p_2 = q_1, as expected. The formula for p suggests a new way of "scaling" the binomial family.
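The formula is easy to check numerically (my own sketch, not from the report; kl_binomial is a hypothetical helper using the standard closed form for the Kullback-Leibler number between two binomials with the same n):

```python
import math

def kl_binomial(n, a, b):
    """K(Bin(n,a), Bin(n,b)) = n*[a log(a/b) + (1-a) log((1-a)/(1-b))]."""
    return n * (a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b)))

n, p1, p2 = 10, 0.2, 0.7
q1, q2 = 1 - p1, 1 - p2

# mid-point success probability from Example 5.1
p = math.log(q2 / q1) / math.log(p1 * q2 / (p2 * q1))

print(p)                                              # about 0.44, between p1 and p2
print(kl_binomial(n, p, p1), kl_binomial(n, p, p2))   # equal: Bin(n,p) is the mid-point
```

With p_2 = 1 − p_1 (say p1 = 0.2, p2 = 0.8) the formula returns p = 0.5, matching the symmetric case noted above.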
Example 5.2 (Normal). Let F be N(θ_1, σ_1²) (normal with mean θ_1 and variance σ_1²) and G be N(θ_2, σ_2²). Then the members of P are also normal distributions. If σ_1 = σ_2 = σ, N is N(½(θ_1 + θ_2), σ²); and if θ_1 = θ_2 = θ, N is N(θ, σ²) where

σ² = σ_1² σ_2² log(σ_2²/σ_1²)/(σ_2² − σ_1²).

It can be verified that σ always lies between σ_1 and σ_2.
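Again the claim can be checked numerically, using the standard closed form K(N(θ,σ²), N(θ,σ_i²)) = ½[log(σ_i²/σ²) + σ²/σ_i² − 1] for equal means (my own sketch, not part of the report):

```python
import math

def kl_normal_same_mean(v, vi):
    """K(N(theta, v), N(theta, vi)) for variances v, vi and a common mean."""
    return 0.5 * (math.log(vi / v) + v / vi - 1.0)

v1, v2 = 1.0, 4.0                              # sigma_1^2 and sigma_2^2
v = v1 * v2 * math.log(v2 / v1) / (v2 - v1)    # mid-point variance from Example 5.2

print(v)                                       # lies between v1 and v2
print(kl_normal_same_mean(v, v1), kl_normal_same_mean(v, v2))  # equal
```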
Acknowledgment. The author is grateful to E. L. Lehmann for many helpful comments.

REFERENCES

[1] Atkinson, A. C. (1970). A method for discriminating between models. J. Roy. Statist. Soc. (B) 32, 323-353.

[2] Brown, L. D. (1971). Non-local asymptotic optimality of appropriate likelihood ratio tests. Ann. Math. Statist. 42, 1206-1240.

[3] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc. Fourth Berkeley Symp. 1, 105-123.

[4] Csiszar, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probability 3, 146-158.

[5] Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
