
Lecture 20: Nonparametric methods

TTIC 31020: Introduction to Machine Learning

Instructor: Greg Shakhnarovich

TTI–Chicago

November 10, 2010

Review


Parametric vs. nonparametric methods

So far, we have seen parametric methods

  • Learning = inferring (fitting) parameters.

Is the SVM classifier

sign( w0 + Σ_{i: αi > 0} αi yi K(xi, x) )

parametric?

  • In general, we cannot summarize it in a simple parametric form.
  • Need to keep around some (possibly all!) of the training data.
  • The Lagrange multipliers α are a kind of parameter.

In nonparametric methods the training examples are explicitly used as parameters.

Nonparametric density estimation

The problem of probability density estimation: infer p(x0) given a set of samples x1, ..., xN.

Parametric estimation: assume a parametric form p(x; θ)

Estimate θ using ML, MAP etc.

  • We have seen examples for Gaussian and Bernoulli densities.

The idea behind nonparametric estimation: directly evaluate how dense the vicinity of x0 is.

Kernel density estimation

Consider a kernel that is also a pdf.

  • e.g., the Gaussian kernel K(x0, xi) = N(x0 − xi; 0, σ²I).

Estimator:

p̂(x0) = (1/N) Σ_{i=1}^{N} K(x0, xi).


An example’s contribution depends on the distance from x0.
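A minimal NumPy sketch of this estimator for 1-D data with a Gaussian kernel; the function name and the synthetic sample are illustrative, not from the lecture.

    import numpy as np

    def gaussian_kde_estimate(x0, X, sigma):
        """p_hat(x0) = (1/N) * sum_i N(x0 - x_i; 0, sigma^2), for 1-D data."""
        d = x0 - np.asarray(X, dtype=float)          # x0 - x_i for every training point
        k = np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        return k.mean()                              # average of the kernel contributions

    # Illustrative 1-D sample (roughly in the range plotted on the slide)
    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-3, 1, 100), rng.normal(1, 0.5, 50)])
    print(gaussian_kde_estimate(0.0, X, sigma=0.4))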

Choice of kernel width

p̂(x0) = (1/N) Σ_{i=1}^{N} N(x0 − xi; 0, σ²I)

[Figure: kernel density estimates of the same 1-D sample with σ = 1, σ = 0.4, and σ = 0.05]

Choice of the kernel width σ is crucial.

  • Similar to the overfitting effect in supervised learning!
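A small self-contained check of this effect at a single query point, using the three bandwidths shown above; the sample is synthetic and chosen only for illustration.

    import numpy as np

    def kde(x0, X, sigma):
        # Gaussian kernel density estimate at x0
        d = x0 - np.asarray(X, dtype=float)
        return np.mean(np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi)))

    rng = np.random.default_rng(0)
    X = rng.normal(-2.0, 1.0, size=200)              # illustrative 1-D sample

    for sigma in (1.0, 0.4, 0.05):
        # Large sigma oversmooths; tiny sigma puts a narrow spike on every training point.
        print(f"sigma={sigma}: p_hat(-2.0) = {kde(-2.0, X, sigma):.3f}")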

Nearest neighbor methods

When σ is sufficiently small, the contribution of the xi that are far from x0 vanishes.

  • The result depends only on the nearest neighbors of x0.

We can make this explicit by ignoring the kernel and simply expressing inference in terms of the neighbors’ labels.

Example: nearest neighbor classification.

  • Training data (x1, y1), ..., (xN, yN) are simply stored.
  • Given x0, let iNN = argmin_i ‖x0 − xi‖.
  • Nearest neighbor prediction: ŷ0 = y_{iNN}.

No parametric/probabilistic assumptions whatsoever!
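A direct NumPy sketch of this rule; the array names and the toy data are placeholders.

    import numpy as np

    def nn_predict(x0, X, y):
        """1-NN prediction: return the label of the training point closest to x0."""
        dists = np.linalg.norm(X - x0, axis=1)   # ||x0 - x_i|| for all i
        i_nn = np.argmin(dists)                  # index of the nearest neighbor
        return y[i_nn]

    # Toy 2-D example
    X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    y = np.array([0, 0, 1])
    print(nn_predict(np.array([4.0, 4.5]), X, y))   # -> 1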

Nearest neighbor classifier

Let R_N be the expected risk of the NN classifier with N training examples drawn from p(x, y).

A famous result due to Cover and Hart ’67: under mild assumptions on p(x, y), the asymptotic risk of the NN classifier, R∞ = lim_{N→∞} R_N, satisfies

R∗ ≤ R∞ ≤ 2R∗(1 − R∗),

where R∗ is the Bayes risk.
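A quick numeric illustration of the upper bound; the Bayes-risk values below are arbitrary, chosen only to show the "at most about twice the Bayes risk" behavior.

    # Asymptotic NN risk is at most 2 * R_star * (1 - R_star), i.e. less than 2x the Bayes risk.
    for r_star in (0.01, 0.05, 0.10):
        upper = 2 * r_star * (1 - r_star)
        print(f"R* = {r_star:.2f}  ->  R_inf <= {upper:.4f}")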

Less famous result (Cover, ’68): the rate of convergence to the bound can be arbitrarily slow!

Nonetheless, in practice NN is often very accurate – and slow.

Example: k-NN for handwritten digits

Take 16×16 grayscale images (8-bit) of handwritten digits.

Use Euclidean distance in raw pixel space, k = 7.

Classification error (leave-one-out): 4.85%.

Examples: [figure omitted]
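A sketch of this experiment in NumPy, assuming the digits are already available as flattened pixel vectors; the variable names and data loading are hypothetical, since the lecture does not give code.

    import numpy as np
    from collections import Counter

    def loo_knn_error(X, y, k=7):
        """Leave-one-out error of k-NN with Euclidean distance in raw pixel space."""
        n = X.shape[0]
        sq = np.sum(X ** 2, axis=1)
        D = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
        np.fill_diagonal(D, np.inf)                   # exclude each point from its own neighbors
        errors = 0
        for i in range(n):
            nn = np.argsort(D[i])[:k]                 # indices of the k nearest neighbors
            vote = Counter(y[nn]).most_common(1)[0][0]
            errors += (vote != y[i])
        return errors / n

    # Hypothetical usage: images of shape (n, 16, 16) flattened to 256-dimensional vectors
    # X = images.reshape(len(images), -1).astype(float); y = labels
    # print(loo_knn_error(X, y, k=7))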

Nearest neighbor: extensions

We can use k > 1 nearest neighbors ⇒ k-NN classifier

  • Label for x0 predicted by majority voting among its k nearest neighbors.

What about regression? Simplest k-NN regression: let x′1, ..., x′k be the neighbors of x0 and y′1, ..., y′k their labels.

  • Predict ŷ = (1/k) Σ_{j=1}^{k} y′j. What kind of functions can we estimate in this way?

What is the effect of k?
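A minimal k-NN regression sketch in NumPy; the function name and the toy sine data are illustrative.

    import numpy as np

    def knn_regress(x0, X, y, k=5):
        """Predict y_hat at x0 as the average label of the k nearest training points."""
        dists = np.linalg.norm(X - x0, axis=1)
        nn = np.argsort(dists)[:k]               # indices of the k nearest neighbors
        return y[nn].mean()                      # y_hat = (1/k) * sum_j y'_j

    # Toy 1-D example: noisy samples of y = sin(x)
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
    print(knn_regress(np.array([0.5]), X, y, k=5))

Note that this estimator is piecewise constant: the prediction only changes when the set of k nearest neighbors changes.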

Geometry of nearest neighbor

NN induces a Voronoi tessellation of the space: [figure omitted]
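For a 2-D training set, these cells can be computed with SciPy; a sketch with random points, plotting omitted.

    import numpy as np
    from scipy.spatial import Voronoi

    rng = np.random.default_rng(2)
    points = rng.uniform(0, 1, size=(10, 2))     # 10 training points in the plane
    vor = Voronoi(points)

    # Each training point owns one Voronoi cell: the set of locations that the
    # 1-NN rule would assign to that point's label.
    print(len(vor.point_region), "cells for", len(points), "points")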

Parametric locally weighted regression

Idea 2: bring back the parameters.

Fit a (simple) parametric model to the neighbors of x0.

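A sketch of one common instantiation of this idea: locally weighted linear regression, where a line is fit by weighted least squares with Gaussian weights centered at x0. The choice of kernel weights (rather than a hard k-neighbor cutoff) and all names here are my assumptions, not specified by the slide.

    import numpy as np

    def locally_weighted_linear(x0, X, y, sigma=0.5):
        """Fit a weighted least-squares line around x0 and return its prediction at x0."""
        x = np.asarray(X, dtype=float).ravel()
        w = np.exp(-0.5 * ((x - x0) / sigma) ** 2)       # Gaussian weights by distance to x0
        A = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
        sw = np.sqrt(w)
        # Weighted least squares via rescaling rows by sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        return coef[0] + coef[1] * x0

    # Toy example: noisy samples of y = sin(x)
    rng = np.random.default_rng(3)
    X = rng.uniform(-3, 3, size=100)
    y = np.sin(X) + 0.1 * rng.normal(size=100)
    print(locally_weighted_linear(0.5, X, y))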