


These notes show how to find the optimal hyperplane for a support vector machine (SVM) by maximizing the margin, the minimum distance between the hyperplane and the data points. They derive the quadratic program (QP) that can be handed to an off-the-shelf QP solver, and discuss handling non-separable data by introducing slack variables and penalties.
dist(X, W) = WᵀX / ‖Ŵ‖
is the signed distance of a point X to the hyperplane defined by W, so
di = yi WᵀXi / ‖Ŵ‖
gives the absolute distance of Xi to W. (Note that the signed distance is < 0 only for yi = −1.)
The margin C is the smallest of these distances:
C = min_i yi WᵀXi / ‖Ŵ‖
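As a concrete illustration (this sketch is not part of the original notes; numpy and the toy data are assumptions), the signed distances, the absolute distances di, and the margin C for a candidate W can be computed directly:

    import numpy as np

    # Toy separable data in augmented form: Xi = [1, x1, x2], labels yi in {-1, +1}.
    X = np.array([[1.,  2.,  2.],
                  [1.,  3.,  3.],
                  [1., -1., -1.],
                  [1., -2., -1.]])
    y = np.array([+1, +1, -1, -1])

    W = np.array([0., 1., 1.])          # W = [w0, W_hat]; the plane is W^T X = 0
    W_hat_norm = np.linalg.norm(W[1:])  # ||W_hat|| excludes the bias w0

    signed = X @ W / W_hat_norm         # signed distance of each Xi to the plane
    d = y * signed                      # absolute distance when Xi is correctly classified
    C = d.min()                         # the margin of this particular W

Here every di is positive, so this W separates the data, and C works out to 2/√2 ≈ 1.41.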
We want the W that maximizes this margin:
max_W   C
Subject to   yi WᵀXi / ‖Ŵ‖ ≥ C   ∀i
Multiplying the constraints through by ‖Ŵ‖, this is the same problem as
max_W   C
Subject to   yi WᵀXi ≥ ‖Ŵ‖C   ∀i
An off-the-shelf QP solver wants a problem of the form
max_W   some quadratic function of W
Subject to   a bunch of linear constraints
But what we have is something of the form
max_W   min_i yi WᵀXi / ‖Ŵ‖
Subject to   a bunch of linear constraints
The fix is to note that rescaling W does not change the plane, so we can choose the scale of W so that the closest points satisfy yi WᵀXi = 1. The constraints then become
Subject to   yi WᵀXi ≥ 1   ∀i
with equality holding for the closest points on each side:
ŴᵀXi + w0 = +1 for the closest points “above” the plane
ŴᵀXj + w0 = −1 for the closest points “below” the plane
The first equation defines a plane (1 − w0)/‖Ŵ‖ units from the origin (measured along Ŵ), the second a plane (−1 − w0)/‖Ŵ‖ units from the origin. Subtracting, we find that the margin, C, is 2/‖Ŵ‖ units wide. Therefore, we can maximize the margin simply by maximizing 2/‖Ŵ‖, or, equivalently, minimizing ŴᵀŴ.
min_W   ŴᵀŴ
Subject to   yi WᵀXi ≥ 1   ∀i
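As a concrete illustration, here is a minimal sketch of handing that QP to an off-the-shelf solver. It is not part of the original notes: cvxopt, numpy, the function name hard_margin_svm, and the tiny ridge on w0 (added only so the solver sees a strictly positive definite P) are all assumptions.

    import numpy as np
    from cvxopt import matrix, solvers

    def hard_margin_svm(X, y):
        """Solve  min_W W_hat^T W_hat  subject to  yi (W_hat^T Xi + w0) >= 1.

        X: (n, d) points (not augmented), y: (n,) labels in {-1, +1}.
        Returns (w0, W_hat). Assumes the data are linearly separable.
        """
        n, d = X.shape
        # Stack the variables as v = [w0, W_hat]; cvxopt minimizes (1/2) v^T P v + q^T v.
        P = np.zeros((d + 1, d + 1))
        P[1:, 1:] = 2.0 * np.eye(d)   # (1/2) v^T P v = W_hat^T W_hat
        P[0, 0] = 1e-8                # tiny ridge on w0 so P is strictly positive definite
        q = np.zeros(d + 1)
        # yi (w0 + W_hat^T Xi) >= 1, rewritten in cvxopt's standard form  G v <= h
        G = -y[:, None] * np.hstack([np.ones((n, 1)), X])
        h = -np.ones(n)
        sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
        v = np.array(sol['x']).ravel()
        return v[0], v[1:]            # w0, W_hat

On separable data the optimum satisfies min_i yi(ŴᵀXi + w0) = 1, so the plane it finds has margin 1/‖Ŵ‖ on each side, i.e. the 2/‖Ŵ‖-wide margin derived above.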
(No, it’s not obvious, and I told you I wouldn’t prove it to you. Go look up the Burges tutorial on SVMs if you really want to understand where that came from... )
K(Xi, Xj) = exp( −(Xi − Xj)ᵀ(Xi − Xj) / (2σ²) )
which you should recognize as, essentially, being a Gaussian. That’s a K equivalent to an infinite-dimensional Φ (again, I won’t prove this to you). Another example is:
K(Xi, Xj) = (XiᵀXj)^p
for some integer p > 1. It turns out that if your original Xi ∈ Rᵈ, then this corresponds to a Φ that lives in a space of dimension
( d + p − 1  choose  p ).
If you’re looking at, say, 16 × 16 images (d = 256) and p = 4 (a degree-4 polynomial expansion), then Φ lives in a 183,181,376-dimensional space. Wow!
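Both kernels are straightforward to compute directly, and the dimension count above can be checked with a binomial coefficient. A minimal sketch (not part of the original notes; numpy and the function names are assumptions):

    import numpy as np
    from math import comb

    def gaussian_kernel(Xi, Xj, sigma):
        """K(Xi, Xj) = exp(-(Xi - Xj)^T (Xi - Xj) / (2 sigma^2))."""
        diff = Xi - Xj
        return np.exp(-(diff @ diff) / (2.0 * sigma**2))

    def poly_kernel(Xi, Xj, p):
        """K(Xi, Xj) = (Xi^T Xj)^p, an implicit degree-p polynomial expansion."""
        return (Xi @ Xj) ** p

    # Dimension of the implicit feature space Phi for the polynomial kernel:
    d, p = 256, 4                 # 16x16 images, degree-4 expansion
    print(comb(d + p - 1, p))     # -> 183181376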