

Lecture 6: Bias, variance and overfitting

TTIC 31020: Introduction to Machine Learning

Instructor: Greg Shakhnarovich

TTI–Chicago

October 8, 2010 (revised October 11, 2010)

Review: error decomposition

E_{p(x,y)}\left[(y - \hat{w}_0 - \hat{w}_1 x)^2\right]
  = \underbrace{E_{p(x,y)}\left[(y - w_0^* - w_1^* x)^2\right]}_{\text{structural error}}
  + \underbrace{E_{p(x,y)}\left[(w_0^* + w_1^* x - \hat{w}_0 - \hat{w}_1 x)^2\right]}_{\text{estimation error}}

best regression: f^* = E[y \mid x]; best linear regression: w^*; estimate: \hat{w}

w^*: parameters of the best linear predictor
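One way to make the two terms concrete is a Monte Carlo sketch: pick a known distribution p(x, y), approximate the best linear predictor w^* with a fit on a very large sample, fit ŵ on a small training set, and estimate both expectations on fresh data. Everything below (the distribution, noise level, and sample sizes) is made up for illustration and is not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw(n):
    # Hypothetical data distribution p(x, y): y = sin(x) + Gaussian noise
    x = rng.uniform(-3, 3, size=n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)
    return x, y

# Stand-in for the best linear predictor w*: least-squares fit on a very large sample
x_big, y_big = draw(200_000)
w_star = np.polyfit(x_big, y_big, deg=1)

# Estimate w_hat from a small training set
x_tr, y_tr = draw(20)
w_hat = np.polyfit(x_tr, y_tr, deg=1)

# Monte Carlo estimates of the two error terms on fresh data
x_te, y_te = draw(200_000)
structural = np.mean((y_te - np.polyval(w_star, x_te)) ** 2)
estimation = np.mean((np.polyval(w_star, x_te) - np.polyval(w_hat, x_te)) ** 2)
print(f"structural error ~ {structural:.3f}, estimation error ~ {estimation:.3f}")
```

The structural term stays fixed no matter how much training data we use; only the estimation term shrinks as the training set grows.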

Plan for today

  • More on overfitting
  • Model complexity
  • Model selection; cross-validation
  • Estimation theory; bias-variance tradeoff

Reminder: polynomial regression

f(x; w) = w_0 + \sum_{j=1}^{m} w_j x^j.

Define \tilde{x} = [1, x, x^2, \ldots, x^m]^T.

Then f(x; w) = w^T \tilde{x} and we are back to the familiar simple linear regression. The least squares solution:

\hat{w} = \left(\tilde{X}^T \tilde{X}\right)^{-1} \tilde{X}^T y,
\quad \text{where} \quad
\tilde{X} =
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots &        & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^m
\end{bmatrix}
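The least-squares solution above translates into a few lines of NumPy. A minimal sketch (the helper names design_matrix, fit_polynomial, and predict are mine, not from the lecture), which builds \tilde{X} and solves for ŵ with a least-squares solver rather than forming the inverse explicitly:

```python
import numpy as np

def design_matrix(x, m):
    """Rows [1, x_i, x_i^2, ..., x_i^m] -- the matrix X-tilde above."""
    return np.vander(x, N=m + 1, increasing=True)

def fit_polynomial(x, y, m):
    """Least-squares solution w_hat = (X^T X)^{-1} X^T y, solved via lstsq for stability."""
    w_hat, *_ = np.linalg.lstsq(design_matrix(x, m), y, rcond=None)
    return w_hat

def predict(x, w_hat):
    """Evaluate f(x; w) = w^T x-tilde at each point in x."""
    return design_matrix(x, len(w_hat) - 1) @ w_hat
```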

Model complexity and overfitting

Data drawn from 3rd order model:

[Figure: least-squares polynomial fits of order m = 1, m = 3, and m = 5 to the same data]
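To see the same effect numerically rather than in a plot, here is a small simulation sketch. The cubic coefficients, noise level, and sample size are made up for illustration; NumPy's built-in polyfit plays the role of the least-squares fit from the previous slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3rd-order model with additive Gaussian noise
x = rng.uniform(-5, 5, size=20)
y = 0.1 * x**3 - 0.2 * x**2 + x + 0.5 + rng.normal(scale=2.0, size=x.shape)

for m in (1, 3, 5):
    w_hat = np.polyfit(x, y, deg=m)                      # least-squares polynomial fit
    train_mse = np.mean((y - np.polyval(w_hat, x)) ** 2)  # error on the training data itself
    print(f"m = {m}: training MSE = {train_mse:.3f}")
```

The training error always decreases as m grows, but the high-order fit starts chasing the noise rather than the underlying 3rd-order model.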

Cross-validation

The basic idea: if a model overfits (is too sensitive to the data) it will be unstable, i.e., removing part of the data will change the fit significantly.

We can hold out part of the data, fit the model to the rest, and then test on the held-out set.

What are the problems of this approach?


  • If the held-out set is too small, we are susceptible to chance.
  • If it's too large, we get an overly pessimistic estimate (training on too little data).

Cross-validation

The improved holdout method: k-fold cross-validation

  • Partition data into k roughly equal parts;
  • Train on all but the j-th part, test on the j-th part.

[Diagram: data x_1, ..., x_N partitioned into k folds, with a different fold held out each time]


An extreme case: leave-one-out cross-validation

\hat{L}_{\mathrm{cv}} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - f(x_i; \hat{w}_{-i})\right)^2

where \hat{w}_{-i} is fit to all the data but the i-th example.
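A minimal sketch of both estimators, using NumPy's polyfit/polyval as the underlying least-squares fit. The function names and the choice of polynomial models are illustrative, not from the lecture:

```python
import numpy as np

def loo_cv(x, y, m):
    """Leave-one-out CV estimate L_cv for an m-th order polynomial fit."""
    N = len(x)
    errs = []
    for i in range(N):
        mask = np.arange(N) != i
        w_minus_i = np.polyfit(x[mask], y[mask], deg=m)   # fit to all data but the i-th example
        errs.append((y[i] - np.polyval(w_minus_i, x[i])) ** 2)
    return np.mean(errs)

def kfold_cv(x, y, m, k=5, seed=0):
    """k-fold variant: partition into k roughly equal parts; test on each part in turn."""
    idx = np.random.default_rng(seed).permutation(len(x))
    errs = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        w_hat = np.polyfit(x[train_idx], y[train_idx], deg=m)
        errs.append(np.mean((y[test_idx] - np.polyval(w_hat, x[test_idx])) ** 2))
    return np.mean(errs)
```

On data like the simulated cubic above, comparing loo_cv(x, y, m=3) against loo_cv(x, y, m=5) should favor the 3rd-order model, even though m = 5 has the lower training error.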

Cross-validation: example

[Figure: cross-validation example comparing fits of order m = 3 and m = 5]

This is a very good estimate, although expensive to compute

  • Need to run N estimation problems each on N − 1 examples!
  • An important research area: devising tricks for efficiently computing cross-validation estimates (by taking advantage of overlap between folds).

Understanding overfitting

Cross-validation provides some means of dealing with overfitting.

What is the source of overfitting? Why do some models overfit more than others?

We can try to get some insight by thinking about the estimation process for model parameters

A bit of estimation theory

An estimator θ̂ of a parameter θ is a function that, for data X = {x_1, ..., x_N}, produces an estimate (value) θ̂.

Examples: the ML estimator for a Gaussian mean, given X, produces an estimate (vector) μ̂; the ML estimator for linear regression parameters w under a Gaussian noise model produces ŵ.

The estimate θ̂ is a random variable since it is based on a randomly drawn set X.

We can talk about E[θ̂] and var(θ̂), taken over the random draw of X. (When θ is a vector, we have Cov(θ̂).)

  • Analysis done assuming that the data is distributed according to p(x; θ)!
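Since θ̂ is a random variable, E[θ̂] and var(θ̂) can be approximated by repeatedly drawing data sets from p(x; θ) and recomputing the estimate. A minimal simulation sketch for the Gaussian example (the parameter values, sample size, and number of trials are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

theta_mu, theta_sigma = 2.0, 3.0    # hypothetical "true" Gaussian parameters
N, trials = 10, 5000                # small N makes the effects easy to see

mu_hats, var_hats = [], []
for _ in range(trials):
    X = rng.normal(theta_mu, theta_sigma, size=N)   # a fresh data set drawn from p(x; theta)
    mu_hats.append(X.mean())                        # ML estimate of the mean
    var_hats.append(X.var())                        # ML estimate of the variance (divides by N)

print("E[mu_hat]   ~", np.mean(mu_hats))    # close to 2.0: the ML mean estimator is unbiased
print("var(mu_hat) ~", np.var(mu_hats))     # close to sigma^2 / N = 0.9
print("E[var_hat]  ~", np.mean(var_hats))   # close to (N-1)/N * sigma^2 = 8.1, not 9: biased
```

The simulated E[μ̂] matches the true mean, while the ML variance estimate is systematically low for small N; this is the kind of bias the next part of the lecture analyzes.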