Docsity

Machine Learning: Supervised Learning and Regression Analysis, Slides of Data Mining

An in-depth exploration of machine learning, focusing on supervised learning and regression analysis. It explains the concept of machine learning, its applications in various fields such as banking, insurance, email filtering, and advertising, and the difference between supervised and unsupervised learning. The document delves into the process of supervised learning, including the steps to develop a model, the types of supervised learning tasks, and the use of linear regression for regression tasks. It also discusses the concept of goodness of fit and the mean squared error as a performance metric. The document concludes with a simple linear regression model and its estimation using ordinary least squares.

Typology: Slides

2022/2023

Uploaded on 04/03/2024

rasha-alzahrani


Machine Learning

Introduction

Why Machine Learning?

Machine learning is a set of tools that are used to analyze data and use data to generate predictions.

Using data to generate predictions is an incredibly powerful tool. It is used all around us:

◦ Used by banks to make decisions about approving loans.
◦ Used by insurance companies to choose whom to cover.
◦ Used by your email provider to weed out spam emails.
◦ Used by YouTube to generate suggested videos.
◦ Used by companies to know whom to market their products to.
◦ Used by advertisers to know which ads to show, where, and when.

It is also a rapidly growing field:

◦ It can be used to generate suggested medical treatments.

Supervised Learning

Supervised Learning is possible when you have historic data that include values of the dependent variable / output / label. For example:

◦ You want to develop a machine learning algorithm that will predict the sale price of a house based on the characteristics of that house.

◦ The characteristics are the independent variables / features / characteristics.

◦ The sale price is the dependent variable / output / label.

◦ Supervised learning is analysis of data in which you observe the dependent variable / output / label.

◦ You can conduct supervised learning if you have historic data on sale prices.

Supervised Learning

Supervised Learning is possible when you have historic data that include values of the output. In supervised learning, you develop a model that generates predicted values of the dependent variable / output / label using the following steps:

  1. Estimate a model using a subset of the data.
  2. Ensure the model fits the data well (more on this later).
  3. Use the model to generate predicted values of the dependent variable / output / label from new (novel) feature / independent variable / characteristic data.

Unsupervised Learning

Unsupervised Learning is used when you DO NOT have data for the dependent variable. For example:

◦ You want to develop a machine learning algorithm that will predict the sale price of a house based on the characteristics of that house.

◦ The characteristics are the independent variables / features.

◦ The sale price is the dependent variable / output / label.

◦ You can conduct unsupervised learning if you do not have historic data on sale prices.

This class…

We will focus on supervised learning in this class. We will cover both regression and classification tasks.

Example of a Model

Say we want to predict a person's income (dependent variable: $Y$). We think we can use characteristics of a person (independent variables) to help us predict their income, for example: age, education level, number of years of experience, … A function takes inputs and converts them to some output:

$$Y = f(\text{Age}, \text{Education}, \text{Experience})$$

Our task will be to create good models of economic variables.

A Simple Model

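A function has a functional form: the actual function itself. The most common is a linear functional form:

$$Y = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Education} + \beta_3 \text{Experience}$$

$\beta_0, \beta_1, \beta_2, \beta_3$ are called parameters. For example, $\beta_2$ is the slope on years of education: it tells us how much income changes when education increases by one year, holding the other independent variables constant. Together, $\beta_1$, $\beta_2$, and $\beta_3$ tell us how income changes as age, education, and experience change. When we write down this model, we do not know the values of the $\beta$s. In order to use information on people's age, education, and experience to predict their income, we must estimate them.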
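A minimal Python sketch of the linear functional form, assuming hypothetical parameter values (`beta`) that in practice are unknown and would have to be estimated. The function name `income_model` is illustrative only. It shows the slope interpretation: raising education by one year while holding age and experience fixed changes the output by exactly $\beta_2$.

```python
def income_model(age, education, experience, beta):
    """Linear functional form: Y = b0 + b1*Age + b2*Education + b3*Experience.

    `beta` is a hypothetical tuple (b0, b1, b2, b3); real parameters are
    unknown and must be estimated from data.
    """
    b0, b1, b2, b3 = beta
    return b0 + b1 * age + b2 * education + b3 * experience


# Hypothetical parameter values, chosen only to illustrate the slope interpretation.
beta = (10_000, 200, 3_000, 1_500)

# Hold age (40) and experience (10) fixed; change education from 12 to 13 years.
base = income_model(40, 12, 10, beta)
more_educ = income_model(40, 13, 10, beta)
print(more_educ - base)  # 3000, i.e. b2
```

Because the form is linear, the change is the same whatever fixed values of age and experience you choose.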

A Simple Model

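When we estimate the values of the $\beta$s, we will have our estimated function:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 \text{Age} + \hat{\beta}_2 \text{Education} + \hat{\beta}_3 \text{Experience}$$

Note: the symbol $\hat{\ }$ is called "hat", and it denotes that something is estimated. Once we have the estimated function, we can plug in a person's age, education, and experience to generate their predicted income, $\hat{Y}$.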
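The sketch below, in pure Python, estimates the parameters by ordinary least squares (OLS) on a small hypothetical dataset and then plugs a new person's inputs into the estimated function. The data, the "true" parameter values used to generate them, and the helper names (`solve`, `ols`, `predict`) are assumptions for illustration. Because the incomes are generated without noise, OLS recovers the hypothetical parameters up to rounding; with real data the estimates would only approximate the unknown parameters.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Estimate beta-hat by OLS, solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept b0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, age, education, experience):
    """Plug inputs into a linear function: b0 + b1*Age + b2*Education + b3*Experience."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * age + b2 * education + b3 * experience


# Hypothetical training rows: (age, years of education, years of experience).
rows = [(25, 12, 3), (32, 16, 5), (45, 12, 20), (51, 18, 22), (38, 14, 10), (29, 16, 2)]
# Hypothetical "true" parameters, used only to generate noise-free incomes.
true_beta = (10_000, 200, 3_000, 1_500)
incomes = [predict(true_beta, *r) for r in rows]

beta_hat = ols(rows, incomes)          # estimated parameters (the "hats")
y_hat = predict(beta_hat, 40, 16, 12)  # predicted income for a new person
print([round(b, 2) for b in beta_hat], round(y_hat))
```

Solving the normal equations by hand is fine for a tiny example like this; for real data, a library least-squares routine such as `numpy.linalg.lstsq` is numerically safer.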

Machine Learning Overview

  1. SPLIT DATA. We split our dataset into two groups: a Training Set and a Test Set.
     ◦ Training Set: A subset of the dataset that you use to develop your model.
     ◦ Test Set: A subset of the dataset that is not used to develop the model. This set is used to evaluate the performance of the model.
  2. TRAIN ALGORITHM. Develop a model / machine learning algorithm using only the Training Set.
  3. GENERATE PREDICTIONS. Using the model / machine learning algorithm, generate predicted values of the dependent variable from the independent variables in the Test Set.
  4. MEASURE PERFORMANCE. Measure the performance of your model by comparing the predicted values for the Test Set to the observed dependent variable values in the Test Set. How well the predictions fit the dependent variable in the Test Set is your measure of the performance of the machine learning model.
  5. MODIFY MODEL. If the performance is poor, you may adjust the model in the TRAIN ALGORITHM step and then repeat steps 3 and 4. Once satisfied, you can deploy the algorithm.
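The workflow above can be sketched in pure Python. This is a minimal sketch under several assumptions: the data are synthetic (income generated from years of education plus random noise), the model is a single-feature linear regression fit by OLS, and MSE on the Test Set is the performance measure. All function names are hypothetical.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Step 1: shuffle a copy of the data and split it into a Training Set and a Test Set."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, test set)


def fit_simple_ols(train):
    """Step 2: estimate b0-hat and b1-hat by ordinary least squares using ONLY the Training Set."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum((x - x_bar) ** 2 for x in xs)
    b0 = y_bar - b1 * x_bar
    return b0, b1


def mse(actual, predicted):
    """Step 4: mean squared error between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical synthetic data: (years of education, income) pairs.
rng = random.Random(42)
data = []
for _ in range(100):
    educ = rng.randint(8, 20)
    data.append((educ, 20_000 + 3_000 * educ + rng.gauss(0, 5_000)))

train, test = train_test_split(data)            # 1. SPLIT DATA
b0_hat, b1_hat = fit_simple_ols(train)          # 2. TRAIN ALGORITHM
preds = [b0_hat + b1_hat * x for x, _ in test]  # 3. GENERATE PREDICTIONS
test_mse = mse([y for _, y in test], preds)     # 4. MEASURE PERFORMANCE
print(f"b0_hat={b0_hat:.0f}, b1_hat={b1_hat:.0f}, test MSE={test_mse:.0f}")
# 5. MODIFY MODEL: if the test MSE is too high, adjust the model in step 2 and repeat steps 3 and 4.
```

The key design choice is that the Test Set never touches `fit_simple_ols`, so the test MSE reflects performance on data the model has not seen.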

Machine Learning Methods

◦ Linear Regression
◦ Linear Classification
◦ Linear Model Selection
◦ Non-Linear Models
◦ Tree-Based Methods
◦ Support Vector Machines
◦ Artificial Intelligence / Deep Learning / Neural Networks

Goodness of Fit

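We need a measure of the performance of a model. For regression tasks, we often use the Mean Squared Error (MSE), computed over the $n$ observations being evaluated:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

We want the MSE to be low. A closely related measure is the Root Mean Squared Error, $RMSE = \sqrt{MSE}$, which is in the same units as $Y$.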
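As a worked illustration with hypothetical numbers (three people, incomes in thousands of dollars), the MSE averages the squared prediction errors:

```latex
\begin{aligned}
Y &= (50,\ 60,\ 70), \qquad \hat{Y} = (52,\ 57,\ 70) \\
MSE &= \tfrac{1}{3}\left[(50-52)^2 + (60-57)^2 + (70-70)^2\right]
     = \tfrac{1}{3}\,(4 + 9 + 0) = \tfrac{13}{3} \approx 4.33
\end{aligned}
```

Because the errors are squared, this MSE is in squared units (thousands of dollars, squared), which makes it hard to judge by intuition alone.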

Goodness of Fit

What is a good MSE or RMSE? The answer always depends on context. The RMSE is easier to evaluate because it is in the same units as the outcome variable $Y$, so you can use intuition to judge whether the RMSE is large relative to the size of $Y$.
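A minimal Python sketch of that intuition, assuming hypothetical observed incomes and predictions: it computes the RMSE (the square root of the MSE) and compares it to the mean of $Y$. Using the mean as the yardstick is an assumption; any sensible measure of the typical size of $Y$ in your context would serve.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE, in the same units as Y."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical observed incomes and model predictions (dollars), for illustration only.
actual = [50_000, 60_000, 70_000, 80_000]
predicted = [51_000, 55_000, 75_000, 73_000]

error = rmse(actual, predicted)
mean_y = sum(actual) / len(actual)
print(f"RMSE = {error:.0f}, about {error / mean_y:.1%} of mean income {mean_y:.0f}")
# RMSE = 5000, about 7.7% of mean income 65000
```

Whether a typical miss of about 5,000 dollars on incomes near 65,000 is acceptable is exactly the context-dependent judgment the slide describes.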

Linear Regression