

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS
Wisconsin Univ-Madison Mathematics Research Center
W. Loh, Apr 83
MRC-TSR-2506, Contract DAAG29-80-C-0041, Unclassified


UNIVERSITY OF WISCONSIN - MADISON
MATHEMATICS RESEARCH CENTER

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS

Wei-Yin Loh

Technical Summary Report #2506
April 1983

ABSTRACT

Csiszar (1975) has shown that Kullback-Leibler information numbers possess some geometrical properties much like those in Euclidean geometry. This paper extends these results by characterizing the shortest line between two distributions as well as the mid-point of the line. It turns out that the distributions comprising the line have applications to the problem of testing separate families of hypotheses.

AMS (MOS) Subject Classifications: Primary - 60E05, 62B; Secondary - 62F

Key Words: Kullback-Leibler information, geometry of probability distributions, minimax, embedding

Work Unit Number 4 - Statistics and Probability

Department of Statistics and Mathematics Research Center, University of Wisconsin, Madison, WI 53705. Sponsored by the United States Army under Contract No. DAAG29-80-C-0041. This material is based upon work supported by the National Science Foundation under Grant No. MCS-7927062, Mod. 2 and Nos. MCS-7825301 and MCS-7903716.

SIGNIFICANCE AND EXPLANATION

The Kullback-Leibler information number is a well-known measure of statistical distance between probability distributions. Previous authors have shown that when endowed with this distance measure, the space of probability distributions possesses geometrical properties analogous to Euclidean geometry. This paper proves a new geometrical property by showing that one can in fact define the shortest line between two probability distributions as well as its mid-point.

It turns out that the probability distributions comprising this line have long ago been used as a tool in the important problem of testing statistical hypotheses involving nuisance parameters. Apart from pure mathematical convenience, there has been little justification for their use. The results in this paper are the first attempt at such an explanation.


The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the author of this report.


2. Notations and definitions

Recall that if F and G are two distributions on the same measurable space, the Kullback-Leibler information number K(F,G) is defined as

K(F,G) = ∫ log(dF/dG) dF if F << G, and +∞ otherwise,

where F << G means that F is absolutely continuous with respect to G. It is well known that K(F,G) is well-defined, nonnegative, and is equal to zero if and only if F(B) = G(B) for all measurable sets B. We need the following definitions in the rest of this paper.

Definition 2.1. A distribution P is closer to F and G than Q is if K(P,F) ≤ K(Q,F) and K(P,G) ≤ K(Q,G), with at least one inequality being strict. In symbols we write P <_{F,G} Q (or P < Q if it is clear from the context what F and G are).

Definition 2.2. P is a mid-point of F and G if K(P,F) = K(P,G) and there does not exist Q for which Q < P.

Definition 2.3. P is minimax for F and G if max{K(P,F), K(P,G)} = min{max{K(Q,F), K(Q,G)}}, where the min is taken over the space of all distributions.

Throughout this paper, μ denotes a measure that dominates both F and G, and f(x), g(x) are their respective densities relative to μ. For convenience, we let A denote the set

(2.1) A = {x : f(x)g(x) ≠ 0},

and let P_λ (0 ≤ λ ≤ 1) be the distribution with density (with respect to μ) given by

(2.2) p_λ(x) = k_λ f^{1-λ}(x) g^λ(x) on A, and 0 otherwise,

where k_λ^{-1} = ∫_A f^{1-λ} g^λ dμ, and

(2.3) P = {P_λ : 0 < λ < 1} ∪ {F, G}

(note that if F and G are mutually absolutely continuous, P_0 = F, P_1 = G, and P is an exponential family). Finally we need the function

(2.4) J(λ) = ∫_A p_λ log(g/f) dμ.

We will often abbreviate K(P_λ,F) to K(λ,F), etc.
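The objects defined above are easy to experiment with on a finite sample space. The following sketch (my own illustration, not part of the report; the function names kl, p_lambda and J are ad hoc) implements K(F,G), the densities (2.2), and the function (2.4) for distributions given as probability vectors:

```python
import math

def kl(p, q):
    """Kullback-Leibler information number K(P,Q) for distributions
    on a finite set, given as lists of probabilities."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf   # P << Q fails, so K(P,Q) = +infinity
            total += pi * math.log(pi / qi)
    return total

def p_lambda(f, g, lam):
    """Density (2.2): proportional to f^(1-lam) g^lam on A = {fg != 0}."""
    w = [fi ** (1 - lam) * gi ** lam if fi > 0 and gi > 0 else 0.0
         for fi, gi in zip(f, g)]
    s = sum(w)                    # s = 1/k_lambda, the normalizing constant
    return [wi / s for wi in w]

def J(f, g, lam):
    """The function (2.4): integral of p_lambda * log(g/f) over A."""
    p = p_lambda(f, g, lam)
    return sum(pi * math.log(gi / fi)
               for pi, fi, gi in zip(p, f, g) if pi > 0.0)

f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
print(kl(f, g), kl(g, f), J(f, g, 0.5))
```

Note that K is not symmetric: kl(f, g) and kl(g, f) generally differ.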

Proof. The first assertion follows from the preceding lemma and the relations

(3.2) K(λ,F) = log k_λ + λ J(λ),

(3.3) K(λ,G) = log k_λ − (1−λ) J(λ).

Differentiation yields, for 0 < λ < 1,

(3.4) λ^{-1} (d/dλ)K(λ,F) = −(1−λ)^{-1} (d/dλ)K(λ,G) = Var_{P_λ}{log(g(X)/f(X))} ≥ 0.

This proves the second assertion. It is easy to see that strict inequality holds in (3.4) for some 0 < λ < 1 if and only if it holds for all
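Identities (3.2) and (3.3) can be checked numerically on a finite sample space. A minimal sketch (my own, not from the report), comparing the direct Kullback-Leibler numbers against the right-hand sides of (3.2) and (3.3):

```python
import math

f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
lam = 0.4

def kl(p, q):
    # K(P,Q) for strictly positive finite distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# k_lambda^{-1} = sum of f^(1-lam) g^lam, so log k_lambda = -log(sum)
w = [fi ** (1 - lam) * gi ** lam for fi, gi in zip(f, g)]
log_k = -math.log(sum(w))
p = [wi / sum(w) for wi in w]          # density p_lambda of (2.2)
Jlam = sum(pi * math.log(gi / fi) for pi, fi, gi in zip(p, f, g))

# (3.2): K(lam,F) = log k_lam + lam * J(lam)
print(kl(p, f), log_k + lam * Jlam)
# (3.3): K(lam,G) = log k_lam - (1 - lam) * J(lam)
print(kl(p, g), log_k - (1 - lam) * Jlam)
```

The two numbers on each line agree to floating-point precision.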

Lemma 3.3. Suppose that μ(A) ≠ 0. Let Q be such that K(Q,F) and K(Q,G) are both finite, and define

(3.5) r(λ) = ∫ log(p_λ/f) dQ,  s(λ) = ∫ log(p_λ/g) dQ.

Then (i) r(λ) and s(λ) are finite and continuous in [0,1], and (ii) if for some 0 ≤ λ ≤ 1,

(3.6) r(λ) = K(λ,F),

then s(λ) = K(λ,G).

Proof. The finiteness of K(Q,F) and K(Q,G) means that Q is absolutely continuous with respect to P_λ for all λ in [0,1]. Therefore we may write

r(λ) = log k_λ + λ(K(Q,F) − K(Q,G)),

s(λ) = log k_λ − (1−λ)(K(Q,F) − K(Q,G)).

Assertion (i) now follows from Lemma 3.1. To get (ii), use the fact that

K(λ,F) = log k_λ + λ(K(λ,F) − K(λ,G))

and

K(λ,G) = log k_λ − (1−λ)(K(λ,F) − K(λ,G)).


The proof of the next lemma is trivial. A more general version appears in Csiszar (1975).

Lemma 3.4. Let P, Q, R be three distinct distributions such that P << R and K(Q,P) < ∞. Then

∫ log(dP/dR) dQ = K(P,R)

if and only if

K(Q,R) = K(Q,P) + K(P,R).

A similar result holds if both equality signs are replaced with inequality signs.
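The lemma rests on a simple decomposition: when Q << P << R, K(Q,R) − K(Q,P) = ∫ log(dP/dR) dQ, so the left side equals K(P,R) exactly when the integral does. A numerical check of the decomposition on a finite space (my own illustration, not from the report):

```python
import math

def kl(p, q):
    # K(P,Q) for strictly positive finite distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
R = [0.4, 0.4, 0.2]

# integral of log(dP/dR) with respect to Q
integral = sum(qi * math.log(pi / ri) for qi, pi, ri in zip(Q, P, R))

# K(Q,R) = K(Q,P) + integral, so the integral equals K(P,R)
# if and only if K(Q,R) = K(Q,P) + K(P,R).
print(kl(Q, R), kl(Q, P) + integral)
```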

Corollary 4.1. (i) If F and G are mutually absolutely continuous, N exists and equals P_λ for some unique λ in (0,1). (ii) If F and G are mutually singular, N does not exist. (iii) N is unique whenever it exists.

Proof. Assertion (i) follows from the fact that if F and G are mutually absolutely continuous and distinct from each other, then K(0,F) = K(1,G) = 0, and both K(λ,F) and K(λ,G) are strictly monotone for 0 ≤ λ ≤ 1. Assertion (ii) is immediate from Theorem 4.1 since P = {F,G} if F and G are mutually singular. To prove assertion (iii), suppose that F and G are not mutually singular and N exists. If there are λ_1 ≠ λ_2 in [0,1] such that P_{λ_1} and P_{λ_2} are both mid-points of F and G, then

K(λ_1,F) = K(λ_1,G) = K(λ_2,F) = K(λ_2,G)

and it follows from (3.4) that g(x)/f(x) is constant a.e. (μ) on A. This implies that P_λ = P_0 for all 0 ≤ λ ≤ 1 and hence that N is unique.

Corollary 4.2. Suppose F and G are not mutually singular. Then the mid-point N exists if and only if

(4.2) J(λ) = 0 for some 0 ≤ λ ≤ 1,

in which case N = P_λ.

Proof. According to Theorem 4.1, N exists if and only if

(4.3) K(λ,F) = K(λ,G) < ∞ for some 0 ≤ λ ≤ 1.

It is clear from (3.2) and (3.3) that this is equivalent to (4.2).

Corollary 4.1 states that mutual singularity of F and G is a sufficient condition for the non-existence of the mid-point. The following example shows that the condition is not necessary.

Example 4.1. Let F be the uniform distribution on (0,3) and G be uniform on (1,2). Then P_λ = G for all 0 < λ ≤ 1, and (4.3) does not hold for any λ. There is thus no mid-point.
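Corollary 4.2 suggests a simple computational recipe for finding a mid-point on a finite sample space: by (3.4), J is nondecreasing in λ, so a root of J(λ) = 0 can be located by bisection. A sketch under these assumptions (my own, not part of the report):

```python
import math

def kl(p, q):
    # K(P,Q) for finite distributions with common support
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def p_lambda(f, g, lam):
    # density (2.2), proportional to f^(1-lam) g^lam on A = {fg != 0}
    w = [fi ** (1 - lam) * gi ** lam if fi > 0 and gi > 0 else 0.0
         for fi, gi in zip(f, g)]
    s = sum(w)
    return [wi / s for wi in w]

def J(f, g, lam):
    p = p_lambda(f, g, lam)
    return sum(pi * math.log(gi / fi)
               for pi, fi, gi in zip(p, f, g) if pi > 0)

def midpoint(f, g, tol=1e-12):
    """Bisect for J(lam) = 0; J is nondecreasing in lam by (3.4)."""
    lo, hi = 0.0, 1.0
    if J(f, g, lo) > 0 or J(f, g, hi) < 0:
        return None               # no root in [0,1]: no mid-point
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if J(f, g, mid) < 0:
            lo = mid
        else:
            hi = mid
    return p_lambda(f, g, 0.5 * (lo + hi))

f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
N = midpoint(f, g)
print(kl(N, f), kl(N, g))         # equal, as Definition 2.2 requires
```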


The P_λ (= G) in this example is "minimax" according to Definition 2.3. It turns out that minimax distributions always exist. Uniqueness may be lost, but only in trivial cases. This is made explicit in the next corollary.

Corollary 4.3. (i) A minimax distribution always exists. (ii) If F and G are not mutually singular, the minimax distribution is unique. (iii) If F and G are mutually singular, every distribution is minimax. (iv) Every mid-point is the unique minimax.

Proof. Since every mid-point is minimax by definition, assertion (iv) is immediate from Corollary 4.1. It remains to prove assertions (i) - (iii) only for the case when the mid-point does not exist. To prove assertion (ii), suppose that F and G are not mutually singular. It is clear from (4.3) that the mid-point does not exist if and only if the graphs of K(λ,F) and K(λ,G) fail to intersect in [0,1]. But from (3.4) either

(4.4) (d/dλ)K(λ,F) > 0 and (d/dλ)K(λ,G) < 0 for all 0 < λ < 1,

or

(4.5) (d/dλ)K(λ,F) = (d/dλ)K(λ,G) = 0 for all 0 < λ < 1.

Therefore either K(0,F) > K(0,G) or K(1,F) < K(1,G). Assume, without loss of generality, that K(0,F) > K(0,G). First suppose that (4.4) holds. Then for all 0 < λ ≤ 1,

(4.6) max{K(0,F), K(0,G)} < max{K(λ,F), K(λ,G)}.

If K(G,F) < ∞, then G << F, P_1 = G, and (4.6) yields

(4.7) max{K(0,F), K(0,G)} < max{K(G,F), K(G,G)}.

Clearly (4.7) is trivially true also if K(G,F) = ∞. A similar argument shows that

(4.8) max{K(0,F), K(0,G)} ≤ max{K(F,F), K(F,G)}

with equality if and only if P_0 = F. Now (4.6) - (4.8) show that P_0 uniquely minimizes max{K(P,F), K(P,G)} over all P ∈ P. We conclude from
5. Examples. We end this discussion with two examples.

Example 5.1 (Binomial). Let F be Bin(n,p_1) (binomial with n trials and success probability p_1) and G be Bin(n,p_2). Write q_i = 1 − p_i. Then every member of P is binomial, and the mid-point N is Bin(n,p) where

p = log(q_2/q_1)/log(p_1 q_2/p_2 q_1).

This formula applies and yields p between p_1 and p_2 only when neither p_1 nor p_2 is 0 or 1. If p_1 = 0 and 0 < p_2 < 1, for example, the formula gives p = 0. The reason for this strange result is that here there is no mid-point since P = {F,G}. It can be shown that if both p_1 and p_2 are neither 0 nor 1, then p lies strictly between p_1 and p_2, with p = .5 when p_2 = q_1, as expected. The formula for p suggests a new way of "scaling" the binomial family.
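The formula is easy to check numerically (my own sketch, not from the report; kl_binomial is a hypothetical helper using the standard closed form for the Kullback-Leibler number between two binomials with the same n):

```python
import math

def kl_binomial(n, a, b):
    """K(Bin(n,a), Bin(n,b)) = n*[a log(a/b) + (1-a) log((1-a)/(1-b))]."""
    return n * (a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b)))

n, p1, p2 = 10, 0.2, 0.7
q1, q2 = 1 - p1, 1 - p2

# mid-point success probability from Example 5.1
p = math.log(q2 / q1) / math.log(p1 * q2 / (p2 * q1))

print(p)                                              # about 0.44, between p1 and p2
print(kl_binomial(n, p, p1), kl_binomial(n, p, p2))   # equal: Bin(n,p) is the mid-point
```

With p_2 = 1 − p_1 (say p1 = 0.2, p2 = 0.8) the formula returns p = 0.5, matching the symmetric case noted above.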
Example 5.2 (Normal). Let F be N(θ_1, σ_1²) (normal with mean θ_1 and variance σ_1²) and G be N(θ_2, σ_2²). Then the members of P are also normal distributions. If σ_1 = σ_2 = σ, N is N(½(θ_1 + θ_2), σ²); and if θ_1 = θ_2 = θ, N is N(θ, σ²) where

σ² = σ_1² σ_2² log(σ_2²/σ_1²)/(σ_2² − σ_1²).

It can be verified that σ always lies between σ_1 and σ_2.
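Again the claim can be checked numerically, using the standard closed form K(N(θ,σ²), N(θ,σ_i²)) = ½[log(σ_i²/σ²) + σ²/σ_i² − 1] for equal means (my own sketch, not part of the report):

```python
import math

def kl_normal_same_mean(v, vi):
    """K(N(theta, v), N(theta, vi)) for variances v, vi and a common mean."""
    return 0.5 * (math.log(vi / v) + v / vi - 1.0)

v1, v2 = 1.0, 4.0                              # sigma_1^2 and sigma_2^2
v = v1 * v2 * math.log(v2 / v1) / (v2 - v1)    # mid-point variance from Example 5.2

print(v)                                       # lies between v1 and v2
print(kl_normal_same_mean(v, v1), kl_normal_same_mean(v, v2))  # equal
```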
Acknowledgment. The author is grateful to E. L. Lehmann for many helpful comments.

REFERENCES

[1] Atkinson, A. C. (1970). A method for discriminating between models. J. Roy. Statist. Soc. (B) 32, 323-353.

[2] Brown, L. D. (1971). Non-local asymptotic optimality of appropriate likelihood ratio tests. Ann. Math. Statist. 42, 1206-1240.

[3] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc. Fourth Berkeley Symp. 1, 105-123.

[4] Csiszar, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probability 3, 146-158.

[5] Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
