Linear classifiers, focusing on the perceptron algorithm, its cost function, and support vector machines (SVMs). The perceptron algorithm trains a linear classifier when the classes are linearly separable. The cost function measures the classifier's error and is minimized to find a solution. Support vector machines (SVMs) are linear classifiers that find the hyperplane with the maximum margin between the classes.
Typology: Slides
1
Consider a two-class task with classes $\omega_1$, $\omega_2$ and the linear discriminant function
$$g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0 = w_1x_1 + w_2x_2 + \dots + w_lx_l + w_0 = 0$$
Assume $\mathbf{x}_1$, $\mathbf{x}_2$ are two points on the decision hyperplane:
$$0 = g(\mathbf{x}_1) = g(\mathbf{x}_2) \;\Rightarrow\; \mathbf{w}^T\mathbf{x}_1 + w_0 = \mathbf{w}^T\mathbf{x}_2 + w_0 \;\Rightarrow\; \mathbf{w}^T(\mathbf{x}_1 - \mathbf{x}_2) = 0$$
LINEAR CLASSIFIERS
2
Since $\mathbf{x}_1 - \mathbf{x}_2$ lies on the hyperplane, $\mathbf{w}$ is perpendicular to the decision hyperplane. The distance $z$ of a point $\mathbf{x}$ from the hyperplane $g(\mathbf{x}) = 0$, and the distance $d$ of the origin from it, are
$$z = \frac{|g(\mathbf{x})|}{\sqrt{w_1^2 + w_2^2}}, \qquad d = \frac{|w_0|}{\sqrt{w_1^2 + w_2^2}}$$
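These point-to-hyperplane distance formulas are easy to sanity-check in code. A minimal sketch in plain Python; the hyperplane $x_1 + x_2 - 0.5 = 0$ is just an illustrative choice:

```python
import math

def g(w, w0, x):
    """Linear discriminant g(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def distance_to_hyperplane(w, w0, x):
    """z = |g(x)| / ||w||: distance of point x from the hyperplane g(x) = 0."""
    return abs(g(w, w0, x)) / math.sqrt(sum(wi * wi for wi in w))

# Illustrative hyperplane x1 + x2 - 0.5 = 0, i.e. w = [1, 1], w0 = -0.5.
w, w0 = [1.0, 1.0], -0.5
d_origin = distance_to_hyperplane(w, w0, [0.0, 0.0])   # |w0| / ||w|| = 0.5 / sqrt(2)
d_on = distance_to_hyperplane(w, w0, [0.25, 0.25])     # a point on the hyperplane: 0
```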
4
Our goal: compute a solution, i.e., a hyperplane $\mathbf{w}$, so that
$$\mathbf{w}^T\mathbf{x} > 0 \;\; \forall\,\mathbf{x}\in\omega_1, \qquad \mathbf{w}^T\mathbf{x} < 0 \;\; \forall\,\mathbf{x}\in\omega_2$$
The steps: define a cost function to be minimized; choose an algorithm to minimize the cost function; the minimum corresponds to a solution.
5
The cost function: choose
$$J(\mathbf{w}) = \sum_{\mathbf{x}\in Y} \delta_x\,\mathbf{w}^T\mathbf{x}$$
where $Y$ is the subset of the vectors wrongly classified by $\mathbf{w}$. When $Y = \varnothing$ (empty set), a solution is achieved and $J(\mathbf{w}) = 0$. Define
$$\delta_x = -1 \ \text{if}\ \mathbf{x}\in\omega_1, \qquad \delta_x = +1 \ \text{if}\ \mathbf{x}\in\omega_2$$
With this choice, $J(\mathbf{w}) \ge 0$.
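The cost can be evaluated directly from this definition. A minimal sketch in plain Python; the toy samples are invented for illustration:

```python
def perceptron_cost(w, w0, samples):
    """J(w) = sum over misclassified x of delta_x * (w^T x + w0),
    with delta_x = -1 for class omega_1 and +1 for class omega_2.
    Each sample is (x, label) with label in {1, 2}."""
    J = 0.0
    for x, label in samples:
        g = sum(wi * xi for wi, xi in zip(w, x)) + w0
        delta = -1.0 if label == 1 else +1.0
        # x is misclassified when it falls on the wrong side of g(x) = 0:
        # omega_1 requires g(x) > 0, omega_2 requires g(x) < 0.
        if (label == 1 and g < 0) or (label == 2 and g > 0):
            J += delta * g
    return J

samples = [([2.0, 2.0], 1), ([-1.0, -1.0], 2), ([-0.5, 0.0], 1)]
J = perceptron_cost([1.0, 1.0], 0.0, samples)  # only the last sample is misclassified
```

Note that each misclassified term contributes $\delta_x\,g(\mathbf{x}) > 0$, which is why $J(\mathbf{w}) \ge 0$.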
7
old)(
) (old)(
(new)
w
w w w J
w
w
w
w
x
x w
w
w w J
Y x^
Y x
x
T x
^
)
(
) (
x
t w
t w
Y x
x
t
) (
) 1
(
w
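This update rule can be wrapped in a small training loop. A sketch under the conventions above ($\delta_x = -1$ for $\omega_1$, $+1$ for $\omega_2$, bias $w_0$ absorbed by extending each vector with a 1); the constant learning rate and the toy data are our own choices:

```python
def perceptron_train(samples, rho=0.1, max_iter=1000):
    """w(t+1) = w(t) - rho * sum_{x in Y} delta_x * x, with x extended by 1
    so that w0 is absorbed into w. Stops when Y is empty."""
    l = len(samples[0][0])
    w = [0.0] * (l + 1)                      # [w_1, ..., w_l, w_0]
    for _ in range(max_iter):
        grad = [0.0] * (l + 1)
        for x, label in samples:
            xe = list(x) + [1.0]             # extended vector [x; 1]
            g = sum(wi * xi for wi, xi in zip(w, xe))
            delta = -1.0 if label == 1 else +1.0
            # Treat points exactly on the hyperplane as misclassified too,
            # so the zero initial w still produces an update.
            if (label == 1 and g <= 0) or (label == 2 and g >= 0):
                grad = [gi + delta * xi for gi, xi in zip(grad, xe)]
        if all(gi == 0.0 for gi in grad):    # Y empty: solution found
            return w
        w = [wi - rho * gi for wi, gi in zip(w, grad)]
    return w

samples = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -1.0], 2), ([-2.0, -0.5], 2)]
w = perceptron_train(samples)
```

For this linearly separable toy set the loop terminates with every sample on the correct side of the hyperplane.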
8
x
x t t
x
t w
x
t w
t w
c^ t
t
t k
k
t
t k
k
t
:
e.g.,
lim ,
lim
0
2
0
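The suggested choice $\rho_t = c/t$ can be checked against both conditions numerically; a sketch with $c = 1$ and a finite horizon standing in for the limits:

```python
# For rho_t = c/t, the partial sums of rho_t diverge (like c*ln t),
# while sum(rho_t^2) converges, so the ratio sum(rho^2)/(sum rho)^2 -> 0.
c = 1.0
T = 100_000
rhos = [c / t for t in range(1, T + 1)]
s1 = sum(rhos)                  # grows without bound as T -> infinity
s2 = sum(r * r for r in rhos)   # converges (to c^2 * pi^2 / 6 ~ 1.645)
ratio = s2 / s1 ** 2            # tends to 0
```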
10
It is a learning machine that learns from the training vectors via the perceptron algorithm. The network is called a perceptron or neuron.
11
Example: at some stage $t$ the perceptron algorithm results in $w_1 = 1$, $w_2 = 1$, $w_0 = -0.5$. The corresponding hyperplane is
$$x_1 + x_2 - 0.5 = 0$$
With $\rho = 0.7$ and the misclassified vectors $[-0.2, 0.75]^T$ (from $\omega_2$) and $[0.4, 0.05]^T$ (from $\omega_1$), the update gives
$$\mathbf{w}(t+1) = \begin{bmatrix}1\\1\\-0.5\end{bmatrix} - 0.7\,(+1)\begin{bmatrix}-0.2\\0.75\\1\end{bmatrix} - 0.7\,(-1)\begin{bmatrix}0.4\\0.05\\1\end{bmatrix} = \begin{bmatrix}1.42\\0.51\\-0.5\end{bmatrix}$$
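This single update can be reproduced step by step; a sketch assuming the two misclassified vectors are $[-0.2, 0.75]^T$ from $\omega_2$ ($\delta_x = +1$) and $[0.4, 0.05]^T$ from $\omega_1$ ($\delta_x = -1$):

```python
rho = 0.7
w = [1.0, 1.0, -0.5]                 # [w1, w2, w0] at stage t
# Misclassified set Y; each vector is extended by 1 to absorb w0.
Y = [([-0.2, 0.75, 1.0], +1.0),      # belongs to omega_2, but g(x) > 0
     ([0.4, 0.05, 1.0], -1.0)]       # belongs to omega_1, but g(x) < 0
grad = [sum(d * x[i] for x, d in Y) for i in range(3)]
w_next = [wi - rho * gi for wi, gi in zip(w, grad)]   # expected: [1.42, 0.51, -0.5]
```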
13
SMALL, in the mean square error sense, means to chooseso that the cost function:
w
2
w
T
14
Minimizing
$$J(\mathbf{w}) = E\!\left[\big(y - \mathbf{x}^T\mathbf{w}\big)^2\right]$$
w.r.t. $\mathbf{w}$ results in
$$\frac{\partial}{\partial\mathbf{w}}\,E\!\left[\big(y - \mathbf{x}^T\mathbf{w}\big)^2\right] = -2\,E\!\left[\mathbf{x}\big(y - \mathbf{x}^T\mathbf{w}\big)\right] = 0 \;\Rightarrow\; \hat{\mathbf{w}} = R_x^{-1}E[\mathbf{x}y]$$
where
$$R_x \equiv E[\mathbf{x}\mathbf{x}^T] =
\begin{bmatrix}
E[x_1x_1] & E[x_1x_2] & \dots & E[x_1x_l]\\
\vdots & \vdots & & \vdots\\
E[x_lx_1] & E[x_lx_2] & \dots & E[x_lx_l]
\end{bmatrix}$$
is the autocorrelation matrix and
$$E[\mathbf{x}y] = \big[E[x_1y],\; E[x_2y],\; \dots,\; E[x_ly]\big]^T$$
is the crosscorrelation vector.
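With sample averages standing in for the expectations, the closed-form solution can be demonstrated on synthetic data; a sketch assuming NumPy is available (the true weights and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
N, l = 10_000, 3
w_true = np.array([0.5, -1.0, 2.0])       # hypothetical "true" weights
X = rng.normal(size=(N, l))               # N training vectors, one per row
y = X @ w_true + 0.01 * rng.normal(size=N)

R_hat = (X.T @ X) / N                     # sample estimate of R_x = E[x x^T]
p_hat = (X.T @ y) / N                     # sample estimate of E[x y]
w_hat = np.linalg.solve(R_hat, p_hat)     # w_hat = R_x^(-1) E[x y]
```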
16
Multi-class generalization: the task is divided into $M$ MSE minimization problems. That is, design each $\mathbf{w}_i$ so that its desired output is 1 for $\mathbf{x}\in\omega_i$ and 0 for any other class:
$$\hat{W} = \arg\min_W E\!\left[\big\|\mathbf{y} - W^T\mathbf{x}\big\|^2\right] = \arg\min_W \sum_{i=1}^{M} E\!\left[\big(y_i - \mathbf{w}_i^T\mathbf{x}\big)^2\right]$$
Remark: the MSE criterion belongs to a more general class of cost functions with the following important property: $g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x}$ is an estimate, in the MSE sense, of the a-posteriori probability $P(\omega_i\,|\,\mathbf{x})$, provided that the desired responses used during training are $y_i = 1$ for $\mathbf{x}\in\omega_i$ and 0 otherwise.
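Since the joint minimization decouples, solving the $M$ problems one column at a time gives the same $\hat{W}$ as solving them jointly; a sketch assuming NumPy is available (labels and data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
N, l, M = 500, 3, 4
X = rng.normal(size=(N, l))     # N training vectors, one per row
labels = rng.integers(0, M, size=N)
Y = np.eye(M)[labels]           # desired responses: y_i = 1 for the true class, 0 otherwise

# min_W ||Y - X W||_F^2 decouples into M independent MSE problems,
# one per column w_i of W.
W_joint = np.linalg.lstsq(X, Y, rcond=None)[0]
W_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, i], rcond=None)[0] for i in range(M)])
```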
17
estimate the value of
In the pattern recognition framework, given
one wants
to estimate the respective label
of
given
is defined as:
The above is known as the regression of
given
and
it is, in general, a non-linear function of
. If
is
Gaussian the MSE regressor is linear.
M
x
y
ˆy
y
x
2
~
y^
y
19
Define the $N \times l$ matrix of training vectors and the vector of corresponding desired responses:
$$X = \begin{bmatrix}\mathbf{x}_1^T\\ \mathbf{x}_2^T\\ \vdots\\ \mathbf{x}_N^T\end{bmatrix}, \qquad \mathbf{y} = [y_1, y_2, \dots, y_N]^T$$
Then
$$X^TX = \sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T, \qquad X^T\mathbf{y} = \sum_{i=1}^{N}\mathbf{x}_i y_i$$
20
Thus
Assume
N=l
square and invertible.
Then
y
X
y
X
X X
w
y
X
w X X
y x
w x x
T
T
T
T N i
N i
i i
i T i
^
1
1
1
)
(
ˆ
ˆ )
(
)
(
ˆ )
(
T
T^
1 )
^
Pseudoinverse of
1
1
1
1 )
(
X
X
X X X X X X X
T
T
T
T
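Both the pseudoinverse solution and the square-invertible special case can be verified numerically; a sketch assuming NumPy is available (the matrices are invented examples):

```python
import numpy as np

rng = np.random.default_rng(2)

# Overdetermined case N > l: w_hat = (X^T X)^(-1) X^T y, the pseudoinverse solution.
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Square, invertible case N = l: (X^T X)^(-1) X^T reduces to X^(-1).
Xs = np.array([[2.0, 1.0, 0.0],
               [0.0, 3.0, 1.0],
               [1.0, 0.0, 2.0]])         # invertible (det = 13)
X_plus = np.linalg.inv(Xs.T @ Xs) @ Xs.T
```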