Course Notes for Math 162: Mathematical Statistics
The Cramér-Rao Inequality
Adam Merberg and Steven J. Miller
May 8, 2008

Abstract

The Cramér-Rao Inequality provides a lower bound for the variance of an unbiased estimator of a parameter. It allows us to conclude that an unbiased estimator is a minimum variance unbiased estimator for a parameter. In these notes we prove the Cramér-Rao Inequality and examine some applications. We conclude with a discussion of a probability distribution for which the Cramér-Rao Inequality provides no useful information.

Contents

1 Description of the Problem
2 The Cramér-Rao Inequality
3 Examples and Exercises
A Interchanging Integration and Differentiation
B The Cauchy-Schwarz Inequality
C The Exponential Density
D When the Cramér-Rao Inequality Provides No Information

1 Description of the Problem

Point estimation is the use of a statistic to estimate the value of some parameter of a population having a particular type of density. The statistic we use is called the point estimator and its value is the point estimate. A desirable property for a point estimator Θ̂ for a parameter θ is that the expected value of Θ̂ is θ. If Θ̂ is a random variable with density f and values θ̂, this is equivalent to saying

E[Θ̂] = ∫_{-∞}^{∞} θ̂ f(θ̂) dθ̂ = θ.

An estimator having this property is said to be unbiased.

Often in the process of making a point estimate, we must choose among several unbiased estimators for a given parameter. Thus we need to consider additional criteria to select one of the estimators for use. For example, suppose that X_1, X_2, ..., X_m are a random sample from a normal population of mean μ and variance σ^2, with m = 2n + 1 an odd integer. Let the density of this population be given by f(x; μ, σ^2). Suppose we wish to estimate the mean, μ, of this population. It is well known that both the sample mean and the sample median are unbiased estimators of the mean (cf. [MM]).

Often, we will take the unbiased estimator having the smallest variance. The variance of Θ̂ is, as for any random variable, the second moment about the mean:

var(Θ̂) = ∫_{-∞}^{∞} (θ̂ − μ_{Θ̂})^2 f(θ̂) dθ̂.

Here, μ_{Θ̂} is the mean of the random variable Θ̂, which is θ in the case of an unbiased estimator. Choosing the estimator with the smaller variance is a natural thing to do, but by no means is it the only possible choice. If two estimators have
the same expected value, then while their average values will be equal, the estimator with greater variance will have larger fluctuations about this common value. An estimator with a smaller variance is said to be relatively more efficient because it will tend to have values that are concentrated more closely about the correct value of the parameter; thus it allows us to be more confident that our estimate will be as close to the actual value as we would like. Furthermore, the quantity

var(Θ̂_1) / var(Θ̂_2)

is used as a measure of the efficiency of Θ̂_2 relative to Θ̂_1 [MM]. We hope to maximize efficiency by minimizing variance.

In our example, the sample mean has variance σ^2/m = σ^2/(2n + 1). If the population median is μ̃, that is, μ̃ is such that

∫_{-∞}^{μ̃} f(x; μ, σ^2) dx = 1/2,

then, according to [MM], the sampling distribution of the median is approximately normal with mean μ̃ and variance

1 / (8n · f(μ̃)^2).

Since the normal distribution of our example is symmetric, we must have μ̃ = μ, which makes it easy to show that f(μ̃) = 1/√(2πσ^2). The variance of the sample median is therefore πσ^2/(4n). Certainly, in our example, the mean has the smaller variance of the two estimators, but we would like to know whether an estimator with smaller variance exists. More precisely, it would be very useful to have a lower bound on the variance of an unbiased estimator. Clearly, the variance must be non-negative¹, but it would be useful to have a less trivial lower bound. The Cramér-Rao Inequality is a theorem that provides such a bound under very general conditions. It does not, however, provide any assurance that any estimator exists that has the minimum variance allowed by this bound.
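As a numerical sanity check of these two variance expressions, the following sketch (an added illustration; the parameter values and sample sizes are arbitrary choices, not from the notes) simulates repeated samples of size m = 2n + 1 from a normal population and compares the empirical variances of the sample mean and the sample median with σ^2/m and πσ^2/(4n).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0          # population mean and standard deviation (arbitrary)
n = 50                        # m = 2n + 1 observations per sample
m = 2 * n + 1
trials = 100_000              # number of simulated samples

samples = rng.normal(mu, sigma, size=(trials, m))
means = samples.mean(axis=1)            # sample mean of each sample
medians = np.median(samples, axis=1)    # sample median of each sample

print("var(sample mean):   %.5f  (theory sigma^2/m       = %.5f)"
      % (means.var(), sigma**2 / m))
print("var(sample median): %.5f  (theory pi*sigma^2/(4n) = %.5f)"
      % (medians.var(), np.pi * sigma**2 / (4 * n)))
```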

2 The Cramér-Rao Inequality

The Cramér-Rao Inequality provides us with a lower bound on the variance of an unbiased estimator for a parameter.

Cramér-Rao Inequality. Let f(x; θ) be a probability density with continuous parameter θ. Let X_1, ..., X_n be independent random variables with density f(x; θ), and let Θ̂(X_1, ..., X_n) be an unbiased estimator of θ. Assume that f(x; θ) satisfies two conditions:

1. We have

∂/∂θ [ ∫···∫ Θ̂(x_1, ..., x_n) ∏_{i=1}^n f(x_i; θ) dx_i ] = ∫···∫ Θ̂(x_1, ..., x_n) ∂[∏_{i=1}^n f(x_i; θ)]/∂θ dx_1 ··· dx_n.   (2.1)

Conditions under which this holds are reproduced from [HH] in Appendix A.

2. For each θ, the variance of Θ̂(X_1, ..., X_n) is finite.

Then

var(Θ̂) ≥ 1 / ( n E[ (∂ log f(x; θ)/∂θ)^2 ] ),   (2.2)

where E denotes the expected value with respect to the probability density function f(x; θ).
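As an added illustration of the bound (2.2) (not part of the original notes), the sketch below uses the fact that for a normal density with known σ the score is ∂ log f(x; θ)/∂θ = (x − θ)/σ^2; it estimates the Fisher information by Monte Carlo, forms the bound 1/(n E[(∂ log f/∂θ)^2]) = σ^2/n, and compares it with the simulated variance of the sample mean, which attains the bound. All numerical values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n = 1.5, 2.0, 25        # true mean, known sd, sample size (arbitrary)

# Monte Carlo estimate of E[(d/d theta log f(X; theta))^2] for f = N(theta, sigma^2),
# whose score is (x - theta) / sigma^2.
x = rng.normal(theta, sigma, size=1_000_000)
info_one_obs = np.mean(((x - theta) / sigma**2) ** 2)   # should be close to 1/sigma^2
cramer_rao_bound = 1.0 / (n * info_one_obs)             # right-hand side of (2.2)

# Simulated variance of the unbiased estimator "sample mean of n observations".
samples = rng.normal(theta, sigma, size=(200_000, n))
var_sample_mean = samples.mean(axis=1).var()

print("estimated Fisher information per observation:", info_one_obs)
print("Cramer-Rao bound 1/(n*I(theta)):             ", cramer_rao_bound)
print("simulated var(sample mean):                  ", var_sample_mean)
```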

Proof. We prove the theorem as in [CaBe]. Let Θ̂(X⃗) = Θ̂(X_1, ..., X_n). We assume that our estimator depends only on the sample values X_1, ..., X_n and is independent of θ. Since Θ̂(X⃗) is unbiased as an estimator for θ, we have E[Θ̂] = θ. From this we have:

0 = E[Θ̂ − θ] = ∫···∫ (Θ̂(x_1, ..., x_n) − θ) f(x_1; θ) ··· f(x_n; θ) dx_1 ··· dx_n.

¹It is possible for the variance of an estimator to be zero. Consider the following case: we always estimate the mean to be 0, no matter what sample values we observe. This is a terrific estimate if the mean happens to be 0, and is a poor estimate otherwise. Note, however, that the variance of our estimator is zero!

We square both sides of (2.7) (here φ(x⃗; θ) = ∏_{i=1}^n f(x_i; θ) denotes the joint density of the sample), obtaining

1 = ( ∫···∫ [ (Θ̂(x⃗) − θ) · φ(x⃗; θ)^{1/2} ] · [ φ(x⃗; θ)^{1/2} · ∑_{i=1}^n ∂ log f(x_i; θ)/∂θ ] dx⃗ )^2.   (2.8)

We now apply the Cauchy-Schwarz Inequality to (2.8). Thus

1 ≤ ∫···∫ (Θ̂(x⃗) − θ)^2 φ(x⃗; θ) dx⃗ · ∫···∫ ( ∑_{i=1}^n ∂ log f(x_i; θ)/∂θ )^2 φ(x⃗; θ) dx⃗.   (2.9)

There are two multiple integrals to evaluate on the right hand side. The first multiple integral is just the definition of the variance of the estimator Θ̂, which we denote by var(Θ̂). Thus (2.8) becomes

1 ≤ var(Θ̂) · ∫···∫ ( ∑_{i=1}^n ∂ log f(x_i; θ)/∂θ )^2 φ(x⃗; θ) dx⃗.   (2.10)

To finish the proof of the Cramér-Rao Inequality, it suffices to show

∫···∫ ( ∑_{i=1}^n ∂ log f(x_i; θ)/∂θ )^2 φ(x⃗; θ) dx⃗ = n E[ (∂ log f(x; θ)/∂θ)^2 ].   (2.11)

This is because if we can prove (2.11), simple division will yield the Cramér-Rao Inequality from (2.10). We now prove (2.11). We have

∫···∫ ( ∑_{i=1}^n ∂ log f(x_i; θ)/∂θ )^2 φ(x⃗; θ) dx⃗
  = ∫···∫ ∑_{i=1}^n ∑_{j=1}^n (∂ log f(x_i; θ)/∂θ)(∂ log f(x_j; θ)/∂θ) φ(x⃗; θ) dx⃗
  = ∑_{i=1}^n ∑_{j=1}^n ∫···∫ (∂ log f(x_i; θ)/∂θ)(∂ log f(x_j; θ)/∂θ) φ(x⃗; θ) dx⃗
  = I_1 + I_2,   (2.12)

where

I_1 = ∫···∫ ∑_{i=1}^n ( ∂ log f(x_i; θ)/∂θ )^2 φ(x⃗; θ) dx⃗

I_2 = ∑_{1≤i,j≤n, i≠j} ∫···∫ (∂ log f(x_i; θ)/∂θ)(∂ log f(x_j; θ)/∂θ) φ(x⃗; θ) dx⃗.   (2.13)

The proof is completed by showing I_1 = n E[ (∂ log f(x; θ)/∂θ)^2 ] and I_2 = 0. We have

I_1 = ∫···∫ ∑_{i=1}^n ( ∂ log f(x_i; θ)/∂θ )^2 φ(x⃗; θ) dx⃗
  = ∑_{i=1}^n ∫ ( ∂ log f(x_i; θ)/∂θ )^2 f(x_i; θ) dx_i · ∏_{ℓ=1, ℓ≠i}^n ∫ f(x_ℓ; θ) dx_ℓ
  = ∑_{i=1}^n ∫ ( ∂ log f(x_i; θ)/∂θ )^2 f(x_i; θ) dx_i · 1^{n−1}
  = ∑_{i=1}^n E[ ( ∂ log f(x_i; θ)/∂θ )^2 ]
  = n E[ ( ∂ log f(x_i; θ)/∂θ )^2 ].

In the above calculation, we used the fact that f(x_i; θ) is a probability density, and therefore integrates to one. In the final expected values, x_i is a dummy variable, and we may denote these n expected values with a common symbol. We now turn to the analysis of I_2. In obvious notation, we may write

I_2 = ∑_{1≤i,j≤n, i≠j} I_2(i, j).   (2.15)

To show I_2 = 0 it suffices to show each I_2(i, j) = 0, which we now proceed to do. Note

I_2(i, j) = ∫···∫ (∂ log f(x_i; θ)/∂θ)(∂ log f(x_j; θ)/∂θ) φ(x⃗; θ) dx⃗
  = ∫ (∂ log f(x_i; θ)/∂θ) f(x_i; θ) dx_i · ∫ (∂ log f(x_j; θ)/∂θ) f(x_j; θ) dx_j · ∏_{ℓ=1, ℓ≠i,j}^n ∫ f(x_ℓ; θ) dx_ℓ
  = ∫ (∂ log f(x_i; θ)/∂θ) f(x_i; θ) dx_i · ∫ (∂ log f(x_j; θ)/∂θ) f(x_j; θ) dx_j · 1^{n−2}
  = E[ ∂ log f(x_i; θ)/∂θ ] · E[ ∂ log f(x_j; θ)/∂θ ];

however, each of these two expected values is zero! To see this, note

1 = ∫ f(x; θ) dx.   (2.17)

If we differentiate both sides of (2.17) with respect to θ, we find

0 = ∫ ∂f(x; θ)/∂θ dx = ∫ (1/f(x; θ)) (∂f(x; θ)/∂θ) f(x; θ) dx = ∫ (∂ log f(x; θ)/∂θ) f(x; θ) dx = E[ ∂ log f(x; θ)/∂θ ].

This shows I_2(i, j) = 0, which completes the proof.

An estimator for which equality holds in (2.2) is called a minimum variance unbiased estimator or simply a best unbiased estimator. The expected value in the Cramér-Rao Inequality is called the information number or the Fisher information of the sample. We notice that the theorem makes no statement about whether equality holds for any particular estimator Θ̂. Indeed, in Appendix D, we give an example in which the information is infinite, and the bound provided is therefore var(Θ̂) ≥ 0, which is trivial.

3 Examples and Exercises

Example 3.1. We first consider estimating the parameter of an exponential population based on a sample of size m = 2n + 1. This population has density

f(x; θ) = (1/θ) e^{−x/θ} if x ≥ 0, and f(x; θ) = 0 if x < 0.

We consider two estimators, one based on the sample mean and the other on the sample median. We know from the Central Limit Theorem that for large m, the sample mean will have approximately a normal distribution whose mean is θ, the population mean, and whose variance is θ^2/m = θ^2/(2n + 1), where θ^2 is the variance computed from the exponential density (the mean and variance are computed in Appendix C). For large n, the sample median Y_{n+1} has approximately a normal distribution with mean equal to μ̃, the population median, and variance 1/(8n · f(μ̃)^2) [MM]. By definition, the population median satisfies

∫_{-∞}^{μ̃} f(x) dx = ∫_0^{μ̃} (1/θ) e^{−x/θ} dx = 1/2.
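As an added numerical sketch for this example (the parameter values are arbitrary and the function names are introduced only for this illustration), the code below solves the median equation above for μ̃, which should agree with θ log 2, and evaluates the two variance expressions quoted in the example: θ^2/m for the sample mean and 1/(8n · f(μ̃)^2) for the sample median.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.integrate import quad

theta = 2.0          # exponential parameter (arbitrary choice)
n = 50               # sample size m = 2n + 1
m = 2 * n + 1

f = lambda x: np.exp(-x / theta) / theta          # exponential density for x >= 0
cdf = lambda t: quad(f, 0.0, t)[0]                # integral of f from 0 to t

# Solve the median equation: integral_0^mu_tilde f(x) dx = 1/2.
mu_tilde = brentq(lambda t: cdf(t) - 0.5, 1e-9, 50 * theta)
print("numerical median:", mu_tilde, "  theta*log(2):", theta * np.log(2))

# Variance expressions quoted in Example 3.1.
var_mean = theta**2 / m                        # sample mean (CLT approximation)
var_median = 1.0 / (8 * n * f(mu_tilde)**2)    # sample median (asymptotic formula)
print("approx var(sample mean):  ", var_mean)
print("approx var(sample median):", var_median)
```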

On the right side, we have

∫ Θ̂(x) ∂f(x; θ)/∂θ dx = ∫_0^θ 2x · ∂(1/θ)/∂θ dx = −(1/θ^2) ∫_0^θ 2x dx = −1.   (3.27)

It is therefore clear that condition (2.1) does not hold, so we cannot assume that the Cramér-Rao Inequality holds. Indeed, we will show that it does not. We first compute the information of the sample (here n = 1, since our estimator is based on a single observation):

n E[ (∂ log f(x; θ)/∂θ)^2 ] = E[ (∂ log(1/θ)/∂θ)^2 ] = E[ (∂(−log θ)/∂θ)^2 ] = E[ (−1/θ)^2 ] = 1/θ^2.

Therefore, if applicable, the Cramér-Rao Inequality would tell us that var(Θ̂) ≥ θ^2. We now compute the variance of Θ̂ = 2X:

var(2X) = E[(2X)^2] − E[2X]^2 = ∫_0^θ (2x)^2 · (1/θ) dx − θ^2 = 4θ^2/3 − θ^2 = θ^2/3.

We therefore see that the Cramér-Rao Inequality need not be satisfied when condition (2.1) is not satisfied. We note that this example has the property that the region in which the density function is nonzero depends on the parameter that we are estimating. In such cases we must be particularly careful, as condition (2.1) will often not be satisfied.
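To make the failure concrete, the following added sketch simulates Θ̂ = 2X from a single uniform observation on [0, θ] (θ is an arbitrary choice) and checks that its variance is close to θ^2/3, below the value θ^2 that (2.2) would give if condition (2.1) held.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 3.0                                   # true parameter (arbitrary)

x = rng.uniform(0.0, theta, size=2_000_000)   # one observation per trial
estimator = 2 * x                             # Theta-hat = 2X, unbiased for theta

print("mean of 2X:", estimator.mean(), "  (theta =", theta, ")")
print("var  of 2X:", estimator.var(), "  (theta^2/3 =", theta**2 / 3, ")")
print("'bound' theta^2 if (2.2) applied:", theta**2)
```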

Exercise 3.3. Show that the sample mean is a minimum variance unbiased estimator for the mean of a normal population.

Exercise 3.4. Let X be a random variable with a binomial distribution with parameters n and θ. Is n · (X/n) · (1 − X/n) a minimum variance unbiased estimator for the variance of X?

A Interchanging Integration and Differentiation

Theorem A.1 (Differentiating under the integral sign). Let f(x, t): ℝ^{n+1} → ℝ be a function such that for each fixed t the integral

F(t) = ∫_{ℝ^n} f(t, x) dx_1 ··· dx_n   (A.30)

exists. For all x, suppose that ∂f/∂t exists³, and that there is a continuous Riemann integrable function⁴ g(x) such that

| (f(s, x) − f(t, x)) / (s − t) | ≤ g(x)   (A.31)

for all s ≠ t. Then F is differentiable, and

dF/dt = ∫_{ℝ^n} (∂f/∂t)(t, x) dx_1 ··· dx_n.   (A.32)

³Technically, all we need is that ∂f/∂t exists for almost all x, i.e., except for a set of measure zero.
⁴This condition can be weakened; it suffices for g(x) to be a Lebesgue integrable function.

The above statement is modified from that of Theorem 4.11.22 of [HH]. See page 518 of [HH] for a proof. We have stated a slightly weaker version (and commented in the footnotes on the most general statement) because these weaker cases often suffice for our applications.

Exercise A.2. It is not always the case that one can interchange orders of operations. We saw in Example 3.2 a case where we cannot interchange the integration and differentiation. We give an example which shows that we cannot always interchange orders of integration. For simplicity, we give a sequence a_{m,n} such that ∑_m (∑_n a_{m,n}) ≠ ∑_n (∑_m a_{m,n}). For m, n ≥ 0 let

a_{m,n} = 1 if n = m, −1 if n = m + 1, and 0 otherwise.   (A.33)

Show that the two different orders of summation yield different answers. The reason the Fubini Theorem is not applicable here is that ∑_n ∑_m |a_{m,n}| = ∞.
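The two iterated sums are easy to evaluate directly. The sketch below (an added illustration, reading (A.33) as a_{m,n} = 1 when n = m, −1 when n = m + 1, and 0 otherwise) computes both orders over ranges large enough that every inner sum is exact; summing over n first gives 0, while summing over m first leaves the unmatched 1 in the column n = 0.

```python
def a(m, n):
    """The array from (A.33): 1 on the diagonal n = m, -1 just above it (n = m + 1)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

M = 200  # any cutoff large enough that every inner sum below is exact

# Sum over n first, then over m: each row sums to 1 - 1 = 0.
sum_n_inside = sum(sum(a(m, n) for n in range(M + 2)) for m in range(M))

# Sum over m first, then over n: column n = 0 keeps an unmatched +1.
sum_m_inside = sum(sum(a(m, n) for m in range(M + 2)) for n in range(M))

print(sum_n_inside, sum_m_inside)   # prints: 0 1
```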

B The Cauchy-Schwarz Inequality

The Cauchy-Schwarz Inequality is a general result from linear algebra pertaining to inner product spaces. Here we will consider only an application to Riemann integrable functions. For a more thorough treatment of the general form of the inequality, we refer the reader to Chapter 8 of [HK].

Cauchy-Schwarz Inequality. Let f, g be Riemann integrable real-valued functions on ℝ^n. Then

( ∫···∫ f(x_1, ..., x_n) g(x_1, ..., x_n) dx_1 ··· dx_n )^2 ≤ ∫···∫ f(x_1, ..., x_n)^2 dx_1 ··· dx_n · ∫···∫ g(x_1, ..., x_n)^2 dx_1 ··· dx_n.

Proof. The proof given here is a special case of that given in [HK] (page 377). For notational convenience, we define

I(f, g) = ∫···∫ f(x_1, ..., x_n) g(x_1, ..., x_n) dx_1 ··· dx_n.

The statement of the theorem is then I(f, g)^2 ≤ I(f, f) I(g, g).

The following are results of basic properties of integrals, and we leave it as an exercise for the reader to show that they hold:

  1. I(f + g, h) = I(f, h) + I(g, h)
  2. I(f, g) = I(g, f )
  3. I(c · f, g) = c · I(f, g)
  4. I(f, f ) ≥ 0 for all f.

In the case that I(f, f) = 0 we must also have I(f, g) = 0, so the inequality holds in this case. Otherwise, we let

h = g − (I(g, f)/I(f, f)) · f.

We consider I(h, h), noting by property 4 above that this number must be nonnegative. Using the properties verified by the reader, we have

0 ≤ I(h, h) = I( g − (I(g, f)/I(f, f)) · f, g − (I(g, f)/I(f, f)) · f )
  = I(g, g) − (I(g, f)/I(f, f)) · I(f, g) − (I(g, f)/I(f, f)) · I(g, f) + (I(g, f)^2/I(f, f)^2) · I(f, f)
  = I(g, g) − I(g, f)^2/I(f, f).   (B.34)

It thus follows that

I(f, g)^2 ≤ I(f, f) I(g, g).   (B.35)
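As a quick added check of (B.35), the sketch below evaluates I(f, g), I(f, f), and I(g, g) by quadrature for two arbitrarily chosen functions of one variable and confirms that I(f, g)^2 ≤ I(f, f) I(g, g).

```python
import numpy as np
from scipy.integrate import quad

# Two arbitrary Riemann integrable functions on [0, 1].
f = lambda x: np.exp(-x)
g = lambda x: np.sin(3 * x) + 0.5

def I(u, v, a=0.0, b=1.0):
    """I(u, v) = integral of u*v over [a, b], the inner product used in Appendix B."""
    return quad(lambda x: u(x) * v(x), a, b)[0]

lhs = I(f, g) ** 2
rhs = I(f, f) * I(g, g)
print("I(f,g)^2      =", lhs)
print("I(f,f)*I(g,g) =", rhs)
print("Cauchy-Schwarz holds:", lhs <= rhs)
```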

D.1 An Almost Pareto Density

Consider

f(x; θ) = a_θ / (x^θ log^3 x) if x ≥ e, and f(x; θ) = 0 otherwise,   (D.43)

where a_θ is chosen so that f(x; θ) is a probability density function. Thus

a_θ ∫_e^{∞} dx / (x^θ log^3 x) = 1.   (D.44)

We chose to have log^3 x in the denominator to ensure that the above integral converges, as does log x times the integrand; however, the expected value (in the expectation in (2.2)) will not converge. For example, 1/(x log x) diverges (its integral looks like log log x) but 1/(x log^2 x) converges (its integral looks like 1/log x); see pages 62–63 of [Rud] for more on close sequences where one converges but the other does not. This distribution is close to the Pareto distribution (or a power law). Pareto distributions are very useful in describing many natural phenomena; see for example [DM, Ne, NM]. The inclusion of the factor of log^{−3} x allows us to have the exponent of x in the density function equal 1 and have the density function defined for arbitrarily large x; it is also needed in order to apply the Dominated Convergence Theorem to justify some of the arguments below. If we remove the logarithmic factors, then we obtain a probability distribution only if the density vanishes for large x. As log^3 x is a very slowly varying function, our distribution f(x; θ) may be of use in modeling data from an unbounded distribution where one wants to allow a power law with exponent 1, but cannot as the resulting probability integral would diverge. Such a situation occurs frequently in the Benford Law literature; see [Hi, Rai] for more details.

We study the variance bounds for unbiased estimators Θ̂ of θ, and in particular we show that when θ = 1 the Cramér-Rao inequality yields a useless bound. Note that it is not uncommon for the variance of an unbiased estimator to depend on the value of the parameter being estimated. For example, consider the uniform distribution on [0, θ]. Let X denote the sample mean of n independent observations, and Y_n = max_{1≤i≤n} X_i be the largest observation. The expected values of 2X and ((n + 1)/n) Y_n are both θ (implying each is an unbiased estimator for θ); however, Var(2X) = θ^2/(3n) and Var(((n + 1)/n) Y_n) = θ^2/(n(n + 2)) both depend on θ, the parameter being estimated (see, for example, page 324 of [MM] for these calculations).
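Both variance formulas quoted above are easy to confirm by simulation. The sketch below (an added illustration; θ, n, and the number of trials are arbitrary choices) compares the empirical variances of 2X and ((n + 1)/n) · Y_n with θ^2/(3n) and θ^2/(n(n + 2)).

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, trials = 5.0, 20, 200_000   # parameter, sample size, repetitions (arbitrary)

samples = rng.uniform(0.0, theta, size=(trials, n))
est_mean = 2 * samples.mean(axis=1)             # 2 * sample mean
est_max = (n + 1) / n * samples.max(axis=1)     # (n+1)/n * largest observation

print("Var(2 X-bar):       %.5f  (theory theta^2/(3n)     = %.5f)"
      % (est_mean.var(), theta**2 / (3 * n)))
print("Var((n+1)/n * Y_n): %.5f  (theory theta^2/(n(n+2)) = %.5f)"
      % (est_max.var(), theta**2 / (n * (n + 2))))
```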

Lemma D.1. As a function of θ ∈ [1, ∞), a_θ is a strictly increasing function and a_1 = 2. It has a one-sided derivative at θ = 1, and da_θ/dθ|_{θ=1} ∈ (0, ∞).

Proof. We have

a_θ ∫_e^{∞} dx / (x^θ log^3 x) = 1.   (D.45)

When θ = 1 we have

a_1 = [ ∫_e^{∞} dx / (x log^3 x) ]^{−1},   (D.46)

which is clearly positive and finite. In fact, a_1 = 2 because the integral is

∫_e^{∞} dx / (x log^3 x) = ∫_e^{∞} log^{−3} x · (d log x / dx) dx = −(1 / (2 log^2 x)) |_e^{∞} = 1/2;   (D.47)

though all we need below is that a_1 is finite and non-zero, we have chosen to start integrating at e to make a_1 easy to compute. It is clear that a_θ is strictly increasing with θ, as the integral in (D.46) is strictly decreasing with increasing θ (because the integrand is decreasing with increasing θ). We are left with determining the one-sided derivative of a_θ at θ = 1, as the derivative at any other point is handled similarly (but with easier convergence arguments). It is technically easier to study the derivative of 1/a_θ, as

d/dθ (1/a_θ) = −(1/a_θ^2) · da_θ/dθ   (D.48)

and

1/a_θ = ∫_e^{∞} dx / (x^θ log^3 x).   (D.49)

The reason we consider the derivative of 1/a_θ is that this avoids having to take the derivative of the reciprocals of integrals. As a_1 is finite and non-zero, it is easy to pass to da_θ/dθ|_{θ=1}. Thus we have

d/dθ (1/a_θ) |_{θ=1} = lim_{h→0⁺} (1/h) [ ∫_e^{∞} dx / (x^{1+h} log^3 x) − ∫_e^{∞} dx / (x log^3 x) ]
  = lim_{h→0⁺} ∫_e^{∞} ( (1 − x^h) / (h x^h) ) · dx / (x log^3 x).   (D.50)

We want to interchange the integration with respect to x and the limit with respect to h above. This interchange is permissible by the Dominated Convergence Theorem (see Appendix D.3 for details of the justification). Note

lim_{h→0⁺} (1 − x^h) / (h x^h) = −log x;   (D.51)

one way to see this is to use that the limit of a product is the product of the limits, and then use L'Hospital's rule, writing x^h as e^{h log x}. Therefore

d/dθ (1/a_θ) |_{θ=1} = −∫_e^{∞} dx / (x log^2 x);   (D.52)

as this is finite and non-zero, this completes the proof and shows da_θ/dθ|_{θ=1} ∈ (0, ∞).

Remark D.2. We see now why we chose f(x; θ) = a_θ/(x^θ log^3 x) instead of f(x; θ) = a_θ/(x^θ log^2 x). If we only had two factors of log x in the denominator, then the one-sided derivative of a_θ at θ = 1 would be infinite.

Remark D.3. Though the actual value of da_θ/dθ|_{θ=1} does not matter, we can compute it quite easily. By (D.52) we have

d/dθ (1/a_θ) |_{θ=1} = −∫_e^{∞} dx / (x log^2 x) = −∫_e^{∞} log^{−2} x · (d log x / dx) dx = (1 / log x) |_e^{∞} = −1.   (D.53)

Thus by (D.48), and the fact that a_1 = 2 (Lemma D.1), we have

da_θ/dθ |_{θ=1} = −a_1^2 · d/dθ (1/a_θ) |_{θ=1} = 4.   (D.54)
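Both constants can be checked numerically. The added sketch below evaluates a_θ by quadrature (after the substitution u = log x) to confirm a_1 = 2, and approximates the one-sided derivative at θ = 1 by a small forward difference, which should come out close to 4.

```python
import numpy as np
from scipy.integrate import quad

def a(theta):
    """a_theta from (D.44), computed with the substitution u = log x:
    integral_e^inf dx/(x^theta log^3 x) = integral_1^inf exp(-(theta-1)*u) / u^3 du."""
    integral, _ = quad(lambda u: np.exp(-(theta - 1.0) * u) / u**3, 1.0, np.inf)
    return 1.0 / integral

print("a_1 =", a(1.0))                                      # should equal 2
h = 1e-4
print("(a_{1+h} - a_1) / h =", (a(1.0 + h) - a(1.0)) / h)   # approximately 4
```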

D.2 Computing the Information

We now compute the expected value, E[ (∂ log f(x; θ)/∂θ)^2 ]; showing it is infinite when θ = 1 completes the proof of our main result. Note

log f(x; θ) = log a_θ − θ log x + log log^{−3} x

∂ log f(x; θ)/∂θ = (1/a_θ) · da_θ/dθ − log x.   (D.55)

By Lemma D.1 we know that da_θ/dθ is finite for each θ ≥ 1. Thus

E[ (∂ log f(x; θ)/∂θ)^2 ] = E[ ( (1/a_θ) · da_θ/dθ − log x )^2 ] = ∫_e^{∞} ( (1/a_θ) · da_θ/dθ − log x )^2 · a_θ dx / (x^θ log^3 x).   (D.56)

If θ > 1 then the expectation is finite and non-zero. We are left with the interesting case when θ = 1. As da_θ/dθ|_{θ=1} is finite and non-zero, for x sufficiently large (say x ≥ x_1 for some x_1, though by Remark D.3 we see that we may take any x_1 ≥ e^4) we have

| (1/a_1) · da_θ/dθ |_{θ=1} | ≤ (log x) / 2.   (D.57)
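The divergence at θ = 1 can also be seen numerically: the integrand in (D.56) behaves like a_1/(x log x) for large x, whose integral grows like log log x. The added sketch below (using a_1 = 2 and da_θ/dθ|_{θ=1} = 4 from Lemma D.1 and Remark D.3) evaluates the truncated integral, in the variable u = log x, for increasing cutoffs and shows that it keeps growing.

```python
import numpy as np
from scipy.integrate import quad

a1, da_dtheta = 2.0, 4.0   # a_1 and da_theta/d theta at theta = 1 (Lemma D.1, Remark D.3)

def integrand(u):
    """Integrand of (D.56) at theta = 1 after substituting u = log x:
    ((1/a_1)*da/dtheta - u)^2 * a_1 / u^3."""
    return (da_dtheta / a1 - u) ** 2 * a1 / u**3

for upper in [1e2, 1e4, 1e6, 1e8]:
    value, _ = quad(integrand, 1.0, upper, limit=200)
    print("integral truncated at log x = %.0e : %.4f" % (upper, value))
# The truncated values grow roughly like 2*log(cutoff), so the expectation in (2.2)
# is infinite and the Cramer-Rao bound degenerates to var >= 0.
```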

References

[CaBe] G. Casella and R. Berger, Statistical Inference, 2nd edition, Duxbury Advanced Series, Pacific Grove, CA, 2002.

[DM] D. Devoto and S. Martinez, Truncated Pareto Law and oresize distribution of ground rocks, Mathematical Geology 30 (1998), no. 6, 661–673.

[Hi] T. Hill, A statistical derivation of the significant-digit law, Statistical Science 10 (1996), 354–363.

[HK] K. Hoffman and R. Kunze, Linear Algebra, second edition, Prentice-Hall, Englewood Cliffs, NJ.

[HH] J. H. Hubbard and B. B. Hubbard, Vector Calculus, Linear Algebra, and Differential Forms, second edition, Prentice Hall, Upper Saddle River, NJ, 2002.

[MM] I. Miller and M. Miller, John E. Freund’s Mathematical Statistics with Applications, seventh edition, Prentice Hall, 2004.

[Ne] M. E. J. Newman, Power laws, Pareto distributions and Zipf's law, Contemporary Physics 46 (2005), no. 5, 323–351.

[NM] M. Nigrini and S. J. Miller, Benford’s Law applied to hydrology data – results and relevance to other geophysical data, preprint.

[Rai] R. A. Raimi, The first digit problem, Amer. Math. Monthly 83 (1976), no. 7, 521–538.

[Rud] W. Rudin, Principles of Mathematical Analysis, third edition, International Series in Pure and Applied Mathematics, McGraw-Hill Inc., New York, 1976.

[SS] E. Stein and R. Shakarchi, Real Analysis: Measure Theory, Integration, and Hilbert Spaces, Princeton University Press, Princeton, NJ, 2005.