











A Note on the Geometry of Kullback-Leibler Information Numbers. University of Wisconsin-Madison Mathematics Research Center. W. Loh, Apr 83. MRC-TSR-2506. Contract DAAG29-80-C-0041. Unclassified.
UNIVERSITY OF WISCONSIN - MADISON
MATHEMATICS RESEARCH CENTER

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS

Wei-Yin Loh

Technical Summary Report #2506
April 1983

ABSTRACT

Csiszar (1975) has shown that Kullback-Leibler information numbers possess some geometrical properties much like those in Euclidean geometry. This paper extends these results by characterizing the shortest line between two distributions as well as the midpoint of the line. It turns out that the distributions comprising the line have applications to the problem of testing separate families of hypotheses.
Secondary - 62F

Key Words: Kullback-Leibler information, geometry of probability distributions, minimax, embedding
Work Unit Number 4 - Statistics and Probability
Department of Statistics and Mathematics Research Center, University of Wisconsin, Madison, WI 53705. Sponsored by the United States Army under Contract No. DAAG29-80-C-0041. This material is based upon work supported by the National Science Foundation under Grant No. MCS-7927062, Mod. 2 and Nos. MCS-7825301 and MCS-7903716.
SIGNIFICANCE AND EXPLANATION
The Kullback-Leibler information number is a well-known measure of statistical distance between probability distributions. Previous authors have shown that when endowed with this distance measure, the space of probability distributions possesses geometrical properties analogous to Euclidean geometry. This paper proves a new geometrical property by showing that one can in fact define the shortest line between two probability distributions as well as its mid-point.

It turns out that the probability distributions comprising this line have long ago been used as a tool in the important problem of testing statistical hypotheses involving nuisance parameters. Apart from pure mathematical convenience, there has been little justification for its use. The results in this paper are the first attempt at such an explanation.
The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the author of this report.
Recall that if F and G are two distributions on the same measurable space, the Kullback-Leibler information number K(F,G) is defined as

K(F,G) = \int \log(dF/dG)\, dF  if F \ll G, and K(F,G) = +\infty otherwise,
where F \ll G means that F is absolutely continuous with respect to G. It is well known that K(F,G) is well-defined, nonnegative, and equal to zero if and only if F(B) = G(B) for all measurable sets B. We need the following definitions in the rest of this paper.

Definition 2.1. A distribution P is closer to F and G than Q is if K(P,F) \le K(Q,F) and K(P,G) \le K(Q,G), with at least one inequality being strict. In symbols we write P <_{F,G} Q (or P < Q if it is clear from the context what F and G are).

Definition 2.2. P is a mid-point of F and G if K(P,F) = K(P,G) and there does not exist Q for which Q < P.

Definition 2.3. P is minimax for F and G if max(K(P,F), K(P,G)) = min{max(K(Q,F), K(Q,G))}, where the min is taken over the space of all distributions.

Throughout this paper, \mu denotes a measure that dominates both F and G, and f(x), g(x) are their respective densities relative to \mu. For convenience, we let A denote the set
(2.1)  A = \{x : f(x)g(x) > 0\} ,

and let P_\lambda (0 \le \lambda \le 1) be the distribution with density (with respect to \mu) given by
(2.2)  p_\lambda(x) = k_\lambda\, g^\lambda(x)\, f^{1-\lambda}(x)  on A, and p_\lambda(x) = 0 otherwise,

where k_\lambda is the normalizing constant.
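As a quick numerical illustration (not part of the report), the information number K and the family P_lambda of (2.2) can be sketched for discrete distributions; the probability vectors f and g below are arbitrary examples:

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler information number K(P, Q) for discrete
    distributions given as probability vectors; +infinity when P
    is not absolutely continuous with respect to Q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):   # absolute continuity fails
        return np.inf
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def p_lambda(f, g, lam):
    """Density of P_lambda: proportional to g^lam * f^(1-lam) on
    A = {f*g > 0} and zero elsewhere, as in (2.2)."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    A = (f > 0) & (g > 0)
    u = np.where(A, g**lam * f**(1.0 - lam), 0.0)
    return u / u.sum()   # dividing by the sum supplies k_lambda

f = np.array([0.5, 0.3, 0.2])   # hypothetical example distributions
g = np.array([0.2, 0.3, 0.5])
assert kl(f, f) == 0.0                            # K = 0 iff P = Q
assert abs(kl(p_lambda(f, g, 0.0), f)) < 1e-12    # P_0 = F here
assert abs(kl(p_lambda(f, g, 1.0), g)) < 1e-12    # P_1 = G here
```

Here absolute continuity reduces to a support condition, and the division by `u.sum()` plays the role of the constant k_lambda.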
(Note that if F and G are mutually absolutely continuous, then P_0 = F, P_1 = G, and \{P_\lambda : 0 \le \lambda \le 1\} is an exponential family.) Finally we need the notation
K(\lambda, F) = K(P_\lambda, F) and K(\lambda, G) = K(P_\lambda, G), together with the function

(3.1)  J(\lambda) = \int_A p_\lambda \log(g/f)\, d\mu .

A direct computation then gives

(3.2)  K(\lambda, F) = \log k_\lambda + \lambda J(\lambda) , and
(3.3)  K(\lambda, G) = \log k_\lambda - (1 - \lambda) J(\lambda) .

Differentiation yields, for 0 < \lambda < 1,
This proves the second assertion. It is easy to see that strict inequality holds in (3.4) for some 0 < \lambda < 1 if and only if it holds for all
K(Q,F) and K(Q,G) are both finite, and define
(3.5)  r(\lambda) = \int \log(p_\lambda/f)\, dQ  and  s(\lambda) = \int \log(p_\lambda/g)\, dQ .
Then (i) r(\lambda) and s(\lambda) are finite and continuous on [0,1], and (ii) if for some 0 \le \lambda \le 1,

(3.6)  r(\lambda) = K(\lambda, F) ,

then s(\lambda) = K(\lambda, G).

Proof. The finiteness of K(Q,F) and K(Q,G) means that Q is absolutely continuous with respect to P_\lambda for all \lambda in [0,1]. Therefore we may write
s(\lambda) = \log k_\lambda - (1-\lambda)\{K(Q,F) - K(Q,G)\} ,
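The identity s(lambda) = log k_lambda - (1-lambda){K(Q,F) - K(Q,G)} can be verified numerically; a small sketch with arbitrary discrete F, G, and Q, all mutually absolutely continuous so that A is the whole space:

```python
import numpy as np

f = np.array([0.5, 0.3, 0.2])   # hypothetical densities (counting measure)
g = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
lam = 0.3

u = g**lam * f**(1.0 - lam)
k = 1.0 / u.sum()               # normalizing constant k_lambda of (2.2)
p = k * u                       # density of P_lambda

def kl(a, b):
    """K(A, B) for strictly positive probability vectors."""
    return float(np.sum(a * np.log(a / b)))

s = float(np.sum(q * np.log(p / g)))                      # s(lambda)
rhs = float(np.log(k) - (1 - lam) * (kl(q, f) - kl(q, g)))
assert abs(s - rhs) < 1e-12     # matches the displayed identity
```

The identity is exact algebra (expand log p_lambda and use log(g/f) = log(g/q) + log(q/f)), so the two sides agree up to floating-point error.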
\int \log(dP/dG)\, dQ = K(P,G) .
K(Q,R) \ge K(Q,P) + K(P,R) .
Corollary 4.1. (i) If F and G are mutually absolutely continuous, N exists and equals P_\lambda for some unique \lambda in (0,1). (ii) If F and G are mutually singular, N does not exist. (iii) N is unique whenever it exists.

Proof. Assertion (i) follows from the fact that if F and G are mutually absolutely continuous and distinct from each other, then K(0,F) = K(1,G) = 0, and both K(\lambda,F) and K(\lambda,G) are strictly monotone for 0 \le \lambda \le 1. Assertion (ii) is immediate from Theorem 4.1 since \{P_\lambda\} = \{F,G\} if F and G are mutually singular. To prove assertion (iii), suppose that F and G are not mutually singular and N exists. If there are \lambda_1 \ne \lambda_2 in [0,1] such that P_{\lambda_1} and P_{\lambda_2} are both mid-points of F and G, then K(\lambda_1,F) = K(\lambda_1,G) = K(\lambda_2,F) = K(\lambda_2,G), and it follows from (3.4) that g(x)/f(x) is constant a.e. (\mu) on A. This implies that P_\lambda = P_0 for all 0 \le \lambda \le 1 and hence that N is unique.

Corollary 4.2. Suppose F and G are not mutually singular. Then the mid-point N exists if and only if

(4.2)  J(\lambda) = 0  for some 0 \le \lambda \le 1 ,

in which case N = P_\lambda.

Proof. According to Theorem 4.1, N exists if and only if

(4.3)  K(\lambda,F) = K(\lambda,G) < \infty  for some 0 \le \lambda \le 1 .

It is clear from (3.2) and (3.3) that this is equivalent to (4.2).

Corollary 4.1 states that mutual singularity of F and G is a sufficient condition for the non-existence of the mid-point. The following example shows that the condition is not necessary.

Example 4.1. Let F be the uniform distribution on (0,3) and G be uniform on (1,2). Then P_\lambda = G for all 0 \le \lambda \le 1 and (4.3) does not hold for any \lambda. There is thus no mid-point.
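Example 4.1 (F uniform on (0,3), G uniform on (1,2)) can be checked on a grid; a rough numerical sketch showing that K(lambda,F) stays near log 3 while K(lambda,G) stays near 0, so the two curves never intersect and no mid-point exists:

```python
import numpy as np

# Discretize (0,3): F is uniform on (0,3), G is uniform on (1,2).
x = np.linspace(0.0005, 2.9995, 3000)
dx = x[1] - x[0]
f = np.full_like(x, 1.0 / 3.0)
g = np.where((x > 1.0) & (x < 2.0), 1.0, 0.0)

def p_lam(lam):
    """Density of P_lambda on the grid, supported on A = {f*g > 0}."""
    u = np.where((f > 0) & (g > 0), g**lam * f**(1.0 - lam), 0.0)
    return u / (u.sum() * dx)

def kl(p, q):
    """Riemann-sum approximation to K(P, Q)."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])) * dx)

for lam in (0.1, 0.5, 0.9):
    p = p_lam(lam)                              # equals G for every lambda
    assert abs(kl(p, f) - np.log(3.0)) < 1e-2   # K(lambda, F) ~ log 3
    assert abs(kl(p, g)) < 1e-2                 # K(lambda, G) ~ 0
```

Because f is constant on A = (1,2), the density g^lam * f^(1-lam) is constant there for every lambda, so the whole family collapses to G and the two information curves can never meet.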
G are not mutually singular, the minimax distribution is unique. (ii) If
Suppose that F and G are not mutually singular. It is clear from (4.3) that the mid-point does not exist if and only if the graphs of K(\lambda,F) and K(\lambda,G) fail to intersect in [0,1]. But from (3.4), either

(4.4)  (d/d\lambda)K(\lambda,F) > 0 and (d/d\lambda)K(\lambda,G) < 0 for all 0 < \lambda < 1 ,

or

(4.5)  (d/d\lambda)K(\lambda,F) = (d/d\lambda)K(\lambda,G) = 0 for all 0 < \lambda < 1 .

Therefore either K(0,F) > K(0,G) or K(1,F) < K(1,G). Assume, without loss of generality, that K(0,F) > K(0,G). First suppose that (4.4) holds. Then for all 0 < \lambda \le 1,

(4.6)  max(K(0,F), K(0,G)) < max(K(\lambda,F), K(\lambda,G)) .

If K(G,F) < \infty, then G \ll F, P_1 = G, and (4.6) yields

(4.7)  max(K(0,F), K(0,G)) < max(K(G,F), K(G,G))
that

(4.8)  max(K(0,F), K(0,G)) \le max(K(Q,F), K(Q,G))
distributions. If F is N(\theta_1, \sigma^2) and G is N(\theta_2, \sigma^2), the mid-point is N(\tfrac{1}{2}(\theta_1 + \theta_2), \sigma^2), and if
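The normal-location case can be verified directly; a sketch assuming F = N(t1, s^2) and G = N(t2, s^2) (the values t1 = 0, t2 = 2, s = 1 below are arbitrary), for which the lambda = 1/2 member of the family (2.2) is the normal at the average mean and is equidistant from F and G:

```python
import numpy as np

def kl_normal(m1, m2, s):
    """K(N(m1, s^2), N(m2, s^2)) = (m1 - m2)^2 / (2 s^2)."""
    return (m1 - m2) ** 2 / (2.0 * s**2)

t1, t2, s = 0.0, 2.0, 1.0
mid = 0.5 * (t1 + t2)
# The mid-point N((t1 + t2)/2, s^2) is equidistant from F and G:
assert kl_normal(mid, t1, s) == kl_normal(mid, t2, s)

# The lambda = 1/2 geometric mixture of (2.2) is exactly that normal:
x = np.linspace(-8.0, 10.0, 2001)
dx = x[1] - x[0]
phi = lambda m: np.exp(-(x - m) ** 2 / (2.0 * s**2))
mix = np.sqrt(phi(t1) * phi(t2))          # g^(1/2) f^(1/2), unnormalized
mix /= mix.sum() * dx                     # normalization supplies k_{1/2}
target = phi(mid) / (phi(mid).sum() * dx)
assert np.max(np.abs(mix - target)) < 1e-9
```

Completing the square in the exponent of g^{1/2} f^{1/2} shows why: the quadratic terms average to (x - (t1+t2)/2)^2 plus a constant absorbed into k_{1/2}.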
REFERENCES
[1] Atkinson, A. C. (1970). A method for discriminating between models. J.
[2] Brown, L. D. (1971). Non-local asymptotic optimality of appropriate likelihood ratio tests. Ann. Math. Statist. 42, 1206-1240.
[3] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc. Fourth Berkeley Symp. 1, 105-123.
[4] Csiszar, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probability 3, 146-158.
[5] Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.