Degrees of Freedom in Best Subset Selection and Lasso: A Comparative Study | Study Guides, Projects, Research Statistics

Degrees of Freedom and Model Search

Ryan J. Tibshirani

Abstract

Degrees of freedom is a fundamental concept in statistical modeling, as it provides a quan-

titative description of the amount of fitting performed by a given procedure. But, despite this

fundamental role in statistics, its behavior is not completely well-understood, even in somewhat

basic settings. For example, it may seem intuitively obvious that the best subset selection fit

with subset size khas degrees of freedom larger than k, but this has not been formally verified,

nor has is been precisely studied. At large, the current paper is motivated by this problem, and

we derive an exact expression for the degrees of freedom of best subset selection in a restricted

setting (orthogonal predictor variables). Along the way, we develop a concept that we name

“search degrees of freedom”; intuitively, for adaptive regression procedures that perform vari-

able selection, this is a part of the (total) degrees of freedom that we attribute entirely to the

model selection mechanism. Finally, we establish a modest extension of Stein’s formula to cover

discontinuous functions, and discuss its potential role in degrees of freedom and search degrees

of freedom calculations.

Keywords: degrees of freedom, model search, lasso, best subset selection, Stein’s formula

1 Introduction

Suppose that we are given observations y∈Rnfrom the model

y=µ+, with E()=0,Cov() = σ2I, (1)

where µ∈Rnis some fixed, true mean parameter of interest, and ∈Rnare uncorrelated errors,

with zero mean and common marginal variance σ2>0. For a function f:Rn→Rn, thought of as

a procedure for producing fitted values, ˆµ=f(y), recall that the degrees of freedom of fis defined

as (Efron 1986, Hastie & Tibshirani 1990):

df(f) = 1

σ2

i=1

Covfi(y), yi.(2)

Intuitively, the quantity df(f) reflects the effective number of parameters used by fin producing

the fitted output ˆµ. Consider linear regression, for example, where f(y) is the least squares fit of y

onto predictor variables x1,...xp∈Rn: for this procedure f, our intuition gives the right answer, as

its degrees of freedom is simply p, the number of estimated regression coefficients.1This, e.g., leads

to an unbiased estimate of the risk of the linear regression fit, via Mallows’s Cpcriterion (Mallows

1973).

In general, characterizations of degrees of freedom are highly relevant for purposes like model

comparisons and model selection; see, e.g., Efron (1986), Hastie & Tibshirani (1990), Tibshirani &

Taylor (2012), and Section 1.2, for more motivation. Unfortunately, however, counting degrees of

freedom can become quite complicated for nonlinear, adaptive procedures. (By nonlinear, we mean

fbeing nonlinear as a function of y.) Even for many basic adaptive procedures, explicit answers are

1This is assuming linear independence of x1,...xp; in general, it is the dimension of span{x1,...xp}.

Degrees of Freedom in Best Subset Selection and Lasso: A Comparative Study, Study Guides, Projects, Research of Statistics

Related documents

Partial preview of the text

Download Degrees of Freedom in Best Subset Selection and Lasso: A Comparative Study and more Study Guides, Projects, Research Statistics in PDF only on Docsity!

Degrees of Freedom and Model Search

Ryan J. Tibshirani

1 Introduction

1.1 A motivating example

1.2 Degrees of freedom and optimism

1.3 Lagrange versus constrained problem forms

1.4 Assumptions, notation, and outline

2.2 Example: null signal

2.3 Example: sparse signal

[

)]

3.1 Best subset selection

3.2 Ridge regression

5.1 An extension of Stein’s univariate lemma

[

]

[

]

[

]

[

]

[

)]

[

)]

5.2 An extension of Stein’s multivariate lemma

5.4 Connection to Theorem 2 of Hansen & Sokol (2014)