Brief visual explanations of machine learning concepts with diagrams, code examples and links to resources for learning more.
Warning: This document is under early stage development. If you find errors, please raise an issue or contribute a better definition!
Basics

1 Linear Regression
1.1 Introduction
Linear Regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It's used to predict values within a continuous range (e.g. sales, price) rather than trying to classify them into categories (e.g. cat, dog). There are two main types:
Simple regression
Simple linear regression uses traditional slope-intercept form, where 𝑚 and 𝑏 are the variables our algorithm will try to “learn” to produce the most accurate predictions. 𝑥 represents our input data and 𝑦 represents our prediction.
𝑦 = 𝑚𝑥 + 𝑏
Multivariable regression
A more complex, multi-variable linear equation might look like this, where 𝑤 represents the coefficients, or weights, our model will try to learn.
f(x, y, z) = w₁x + w₂y + w₃z
The variables 𝑥, 𝑦, 𝑧 represent the attributes, or distinct pieces of information, we have about each observation. For sales predictions, these attributes might include a company’s advertising spend on radio, TV, and newspapers.
Sales = w₁·Radio + w₂·TV + w₃·News
1.2 Simple regression
Let’s say we are given a dataset with the following columns (features): how much a company spends on Radio advertising each year and its annual Sales in terms of units sold. We are trying to develop an equation that will let us predict units sold based on how much a company spends on radio advertising. The rows (observations) represent companies.
Company     Radio ($)   Sales
Amazon      37.8        22.
Google      39.3        10.
Facebook    45.9        18.
Apple       41.3        18.
Our prediction function outputs an estimate of sales given a company’s radio advertising spend and our current values for Weight and Bias.
Sales = Weight · Radio + Bias
Weight: the coefficient for the Radio independent variable. In machine learning we call coefficients weights.
Radio: the independent variable. In machine learning we call these variables features.
Bias: the intercept, where our line crosses the y-axis. In machine learning we call intercepts bias. Bias offsets all predictions that we make.
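In code, this prediction function might look like the following minimal sketch (the helper name predict_sales is ours):

def predict_sales(radio, weight, bias):
    # Estimate units sold from radio spend with the current weight and bias
    return weight * radio + bias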
To measure how well the line fits the data, we use Mean Squared Error (MSE) as our cost function, the average squared difference between the actual and predicted sales:

    MSE = (1/N) Σᵢ₌₁ⁿ (yᵢ − (m·xᵢ + b))²

Note: (1/N) Σᵢ₌₁ⁿ is the mean.
Code
def cost_function(radio, sales, weight, bias):
    companies = len(radio)
    total_error = 0.0
    for i in range(companies):
        # Squared difference between actual and predicted sales
        total_error += (sales[i] - (weight*radio[i] + bias))**2
    return total_error / companies
To minimize MSE we use Gradient Descent to calculate the gradient of our cost function. Gradient descent consists of looking at the error our current weight gives us, using the derivative of the cost function to find the gradient (the slope of the cost function at our current weight), and then updating our weight in the direction opposite the gradient. Since the gradient points up the slope rather than down it, moving against it decreases our error.
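As a toy illustration of this idea (our own example, not from the text), here is gradient descent applied to a single parameter w to minimize f(w) = (w − 3)², whose derivative is 2(w − 3):

# Minimize f(w) = (w - 3)**2 by repeatedly stepping opposite the gradient
w = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (w - 3)          # derivative of (w - 3)**2 at the current w
    w -= learning_rate * gradient   # move against the gradient to reduce the error
print(w)  # converges toward 3, the minimum of f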
Math
There are two parameters (coefficients) in our cost function we can control: weight 𝑚 and bias 𝑏. Since we need to consider the impact each one has on the final prediction, we use partial derivatives. To find the partial derivatives, we use the Chain rule. We need the chain rule because (𝑦 − (𝑚𝑥 + 𝑏))^2 is really 2 nested functions: the inner function 𝑦 − (𝑚𝑥 + 𝑏) and the outer function 𝑥^2.
Returning to our cost function:

    f(m, b) = (1/N) Σᵢ₌₁ⁿ (yᵢ − (m·xᵢ + b))²

Using the following:

    (yᵢ − (m·xᵢ + b))² = A(B(m, b))

We can split the derivative into

    A(x) = x²
    df/dx = A′(x) = 2x

and

    B(m, b) = yᵢ − (m·xᵢ + b) = yᵢ − m·xᵢ − b
    dx/dm = B′(m) = 0 − xᵢ − 0 = −xᵢ
    dx/db = B′(b) = 0 − 0 − 1 = −1

And then using the Chain rule, which states:

    df/dm = (df/dx) · (dx/dm)
    df/db = (df/dx) · (dx/db)

We then plug in each of the parts to get the following derivatives:

    df/dm = A′(B(m, b)) · B′(m) = 2(yᵢ − (m·xᵢ + b)) · (−xᵢ)
    df/db = A′(B(m, b)) · B′(b) = 2(yᵢ − (m·xᵢ + b)) · (−1)

We can calculate the gradient of this cost function as:

    f′(m, b) = [ df/dm, df/db ]
             = [ (1/N) Σᵢ₌₁ⁿ −2·xᵢ·(yᵢ − (m·xᵢ + b)),
                 (1/N) Σᵢ₌₁ⁿ −2·(yᵢ − (m·xᵢ + b)) ]
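As a sanity check on this derivation (our own illustration, using made-up numbers), the analytic partial derivatives can be compared with numerical finite differences of the MSE:

import numpy as np

# Made-up data and parameters for the check
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m, b = 0.5, 0.1

def mse(m, b):
    return np.mean((y - (m*x + b))**2)

# Analytic gradient from the derivation above
df_dm = np.mean(-2 * x * (y - (m*x + b)))
df_db = np.mean(-2 * (y - (m*x + b)))

# Central finite-difference approximation
eps = 1e-6
df_dm_num = (mse(m + eps, b) - mse(m - eps, b)) / (2 * eps)
df_db_num = (mse(m, b + eps) - mse(m, b - eps)) / (2 * eps)

print(df_dm, df_dm_num)  # the two values agree to several decimal places
print(df_db, df_db_num)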
Code
To solve for the gradient, we iterate through our data points using our new weight and bias values and take the average of the partial derivatives. The resulting gradient tells us the slope of our cost function at our current position (i.e. weight and bias) and the direction we should update to reduce our cost function (we move in the direction opposite the gradient). The size of our update is controlled by the learning rate.
def update_weights(radio, sales, weight, bias, learning_rate):
    weight_deriv = 0
    bias_deriv = 0
    companies = len(radio)

    for i in range(companies):
        # Partial derivative with respect to the weight: -2x(y - (mx + b))
        weight_deriv += -2*radio[i] * (sales[i] - (weight*radio[i] + bias))

        # Partial derivative with respect to the bias: -2(y - (mx + b))
        bias_deriv += -2*(sales[i] - (weight*radio[i] + bias))

    # We subtract because the derivatives point in the direction of steepest ascent
    weight -= (weight_deriv / companies) * learning_rate
    bias -= (bias_deriv / companies) * learning_rate

    return weight, bias
Training a model is the process of iteratively improving your prediction equation by looping through the dataset multiple times, each time updating the weight and bias values in the direction indicated by the slope of the cost function (gradient). Training is complete when we reach an acceptable error threshold, or when subsequent training iterations fail to reduce our cost.
Before training we need to initialize our weights (set default values), set our hyperparameters (learning rate and number of iterations), and prepare to log our progress over each iteration.
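A minimal training loop along these lines, assuming the cost_function and update_weights defined above, might look like this sketch:

def train(radio, sales, weight, bias, learning_rate, iters):
    cost_history = []

    for i in range(iters):
        weight, bias = update_weights(radio, sales, weight, bias, learning_rate)

        # Log progress at each iteration so we can plot the cost later
        cost = cost_function(radio, sales, weight, bias)
        cost_history.append(cost)

        if i % 10 == 0:
            print("iter: {}  weight: {:.2f}  bias: {:.4f}  cost: {:.2f}".format(i, weight, bias, cost))

    return weight, bias, cost_history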
Visualizing
Cost history
By learning the best values for weight (.46) and bias (.25), we now have an equation that predicts future sales based on radio advertising investment.
Sales = .46·Radio + .025
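For example, plugging Amazon's radio spend of 37.8 into this equation gives roughly .46 · 37.8 + .025 ≈ 17.4 units.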
How would our model perform in the real world? I’ll let you think about it :)
1.3 Multivariable regression
Let’s say we are given data on TV, radio, and newspaper advertising spend for a list of companies, and our goal is to predict sales in terms of units sold.
Company     TV      Radio   News    Units
Amazon      230.1   37.8    69.1    22.
Google      44.5    39.3    23.1    10.
Facebook    17.2    45.9    34.7    18.
Apple       151.5   41.3    13.2    18.
To help gradient descent converge, we first normalize each feature by subtracting its mean and dividing by its range (the difference between its max and min values):

import numpy as np

def normalize(features):
    '''
    features     -   (200, 3)
    features.T   -   (3, 200)

    We transpose the input matrix, swapping
    cols and rows to make vector math easier
    '''
    for feature in features.T:
        fmean = np.mean(feature)
        frange = np.amax(feature) - np.amin(feature)

        # Vector subtraction
        feature -= fmean

        # Vector division
        feature /= frange

    return features
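For instance, applying normalize to the four rows from the table above (a toy-sized matrix rather than the (200, 3) one assumed in the docstring, reusing the numpy import from above) centers each column near zero:

X = np.array([[230.1, 37.8, 69.1],
              [ 44.5, 39.3, 23.1],
              [ 17.2, 45.9, 34.7],
              [151.5, 41.3, 13.2]])

X_norm = normalize(X)
print(X_norm.mean(axis=0))  # each column mean is now approximately 0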
Note: Matrix math. Before we continue, it’s important to understand basic Linear Algebra concepts as well as numpy functions like numpy.dot().
Our predict function outputs an estimate of sales given our current weights (coefficients) and a company’s TV, radio, and newspaper spend. Our model will try to identify weight values that most reduce our cost function.
Sales = W₁·TV + W₂·Radio + W₃·Newspaper
def predict(features, weights):
    '''
    features - (200, 3)
    weights - (3, 1)
    predictions - (200, 1)
    '''
    predictions = np.dot(features, weights)
    return predictions
W1 = 0.0
W2 = 0.0
W3 = 0.0
weights = np.array([
    [W1],
    [W2],
    [W3]
])
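With features of shape (200, 3) and weights of shape (3, 1), np.dot(features, weights) multiplies a (200, 3) matrix by a (3, 1) column vector, producing a (200, 1) column of predictions, one per company.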
Now we need a cost function to audit how our model is performing. The math is the same, except we swap the mx + b expression for W₁x₁ + W₂x₂ + W₃x₃. We also divide the expression by 2 to make derivative calculations simpler.
    MSE = (1/(2N)) Σᵢ₌₁ⁿ (yᵢ − (W₁x₁ + W₂x₂ + W₃x₃))²
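To see why the extra factor of 1/2 helps, differentiate a single squared error with respect to any weight:

    d/dW₁ [ ½ (y − (W₁x₁ + W₂x₂ + W₃x₃))² ] = −x₁ (y − (W₁x₁ + W₂x₂ + W₃x₃))

The 2 produced by the power rule cancels the ½, which matches the gradient expressions given below.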
def cost_function(features, targets, weights):
    '''
    features: (200, 3)
    targets: (200, 1)
    weights: (3, 1)
    returns average squared error among predictions
    '''
    N = len(targets)

    predictions = predict(features, weights)

    # Matrix math lets us square every error at once
    sq_error = (predictions - targets)**2

    # Return the average squared error (halved to simplify the derivative)
    return 1.0/(2*N) * sq_error.sum()
Again using the Chain rule, we can compute the gradient: a vector of partial derivatives describing the slope of the cost function for each weight.
    f′(W₁) = −x₁(y − (W₁x₁ + W₂x₂ + W₃x₃))
    f′(W₂) = −x₂(y − (W₁x₁ + W₂x₂ + W₃x₃))
    f′(W₃) = −x₃(y − (W₁x₁ + W₂x₂ + W₃x₃))
def update_weights(features, targets, weights, lr):
    '''
    Features: (200, 3)
    Targets: (200, 1)
    Weights: (3, 1)
    '''
    predictions = predict(features, weights)

    # Extract our features
    x1 = features[:,0]
    x2 = features[:,1]
    x3 = features[:,2]

    # Partial derivative of the cost with respect to each weight
    d_w1 = -x1*(targets - predictions)
    d_w2 = -x2*(targets - predictions)
    d_w3 = -x3*(targets - predictions)
(continues on next page)