Machine Learning: Supervised Learning and Regression Analysis, Slides of Data Mining

The Catholic University of America (CUA), Data Mining

An in-depth exploration of machine learning, focusing on supervised learning and regression analysis. It explains the concept of machine learning, its applications in various fields such as banking, insurance, email filtering, and advertising, and the difference between supervised and unsupervised learning. The document delves into the process of supervised learning, including the steps to develop a model, the types of supervised learning tasks, and the use of linear regression for regression tasks. It also discusses the concept of goodness of fit and the mean squared error as a performance metric. The document concludes with a simple linear regression model and its estimation using ordinary least squares.

Typology: Slides

2022/2023

Uploaded on 04/03/2024

rasha-alzahrani 🇺🇸

1 document


Machine Learning

Introduction

Why Machine Learning?

Machine learning is a set of tools that are used to analyze data and use data to generate predictions.

Using data to generate predictions is an incredibly powerful tool. It is used all around us:

◦ Used by banks to make decisions about approving loans.
◦ Used by insurance companies to choose who to cover.
◦ Used by your email provider to weed out spam emails.
◦ Used by YouTube to generate suggested videos.
◦ Used by companies to know who to market their product to.
◦ Used by advertisers to know what ads to show where and when.

It is also a rapidly growing field:

◦ Can be used to generate suggested medical treatments.

Supervised Learning

Supervised Learning is possible when you have historic data in which you have data for the dependent variable / output / label. For example:

◦ You want to develop a machine learning algorithm that will predict the sale price of a house based on the characteristics of that house.
◦ The characteristics are the independent variables / features.
◦ The sale price is the dependent variable / output / label.
◦ Supervised learning is analysis with data in which you have data for the dependent variable / output / label.
◦ You can conduct supervised learning if you have historical data for sale prices.
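As a minimal sketch of what labeled historical data might look like, the Python snippet below separates a few house records into features and labels. The field names (`sqft`, `bedrooms`, `sale_price`) and all values are hypothetical, invented only for illustration.

```python
# Hypothetical historical sales records; every value here is invented for illustration.
houses = [
    {"sqft": 1400, "bedrooms": 3, "sale_price": 250_000},
    {"sqft": 2000, "bedrooms": 4, "sale_price": 340_000},
    {"sqft": 900, "bedrooms": 2, "sale_price": 160_000},
]

feature_names = ["sqft", "bedrooms"]

# Features (independent variables): one row of characteristics per house.
X = [[house[name] for name in feature_names] for house in houses]

# Labels (dependent variable): the observed sale price of each house.
y = [house["sale_price"] for house in houses]

print(X)
print(y)
```

Because every row in `X` has a matching entry in `y`, this data would support supervised learning.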

Supervised Learning

Supervised Learning is possible when you have historic data in which you have data for the output. In supervised learning, you develop a model that will generate predicted values of the dependent variable / output / label using the following steps:

1. Estimate a model using a subset of the data.
2. Ensure the model fits the data well (more on this later).
3. Use the model to generate predicted values of the dependent variable / output / label from new independent variable / feature data.

Unsupervised Learning

Unsupervised Learning is used when you DO NOT have data for the dependent variable. For example:

◦ You want to develop a machine learning algorithm that will predict the sale price of a house based on the characteristics of that house.
◦ The characteristics are the independent variables / features.
◦ The sale price is the dependent variable / output / label.
◦ You can conduct unsupervised learning if you do not have historical data for sale prices.

This class…

We will focus on supervised learning in this class. We will cover both regression and classification tasks.

Example of a Model

Say we want to predict a person's income (dependent variable: $y$). We think we can use characteristics about a person (independent variables) to help us predict their income. For example: age ($x_1$), education level ($x_2$), number of years of experience ($x_3$), and so on. A function takes inputs and converts them to some output:

$$y = f(x_1, x_2, x_3)$$

Our task will be to create good models of economic variables.

A Simple Model

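A function has a functional form: the specific shape of the function itself. The most common is a linear functional form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

where $y$ is income, $x_1$ is age, $x_2$ is years of education, and $x_3$ is years of experience. $\beta_0, \beta_1, \beta_2, \beta_3$ are called parameters. For example, $\beta_2$ is the slope on years of education. The slopes tell us how much income changes as age, education, and experience change, holding the other independent variables constant. When we write this model down, we do not know what the parameters are. To use information on people's age, education, and experience to predict their income, we must estimate the values of $\beta_0, \beta_1, \beta_2, \beta_3$.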

A Simple Model

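When we estimate the values of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$, we will have our estimated function:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$$

Note: the symbol $\hat{\ }$ is called "hat" and it denotes that something is estimated. Once we have the estimated function, we can plug in the inputs to generate the predictions $\hat{y}$.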
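To illustrate plugging inputs into an estimated function, here is a small Python sketch. The coefficient values are hypothetical placeholders, not estimates from any real data, and the function name `predict_income` is made up for this example.

```python
def predict_income(age, education, experience, coefficients):
    """Plug inputs into an estimated linear function and return the prediction."""
    b0, b1, b2, b3 = coefficients
    return b0 + b1 * age + b2 * education + b3 * experience


# Hypothetical estimated parameters: intercept, then slopes on age,
# years of education, and years of experience.
estimated = (10_000.0, 200.0, 3_000.0, 1_000.0)

prediction = predict_income(age=40, education=16, experience=15, coefficients=estimated)
print(prediction)  # 10000 + 200*40 + 3000*16 + 1000*15 = 81000.0
```

With these made-up coefficients, one more year of education raises predicted income by 3,000 when age and experience are held constant.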

Machine Learning Overview

1. SPLIT DATA. We split our dataset into two groups: Training Set and Test Set.

◦ Training Set – A subset of the dataset that you use to develop your model.
◦ Test Set – A subset of the dataset not used to develop the model. This set is used to evaluate the performance of the model.

2. TRAIN ALGORITHM. Develop a model / machine learning algorithm using just the Training Set.
3. GENERATE PREDICTIONS. Using the model / machine learning algorithm, generate predicted values of the dependent variable using independent variables from the Test Set.
4. MEASURE PERFORMANCE. Measure the performance of your model by comparing the predicted values from the Test Set to the observed dependent variable values in the Test Set. How well the model generates predictions that fit the dependent variable in the Test Set is your measure of the performance of the machine learning model.
5. MODIFY MODEL. If the performance is poor, you may make adjustments to the model in the TRAIN ALGORITHM step and then repeat steps 3 and 4. Once satisfied, you can deploy the algorithm.
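The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are synthetic (generated from $y = 2 + 3x$ plus noise), the model is a one-variable linear regression fit by ordinary least squares, and the 80/20 split ratio and the helper names are choices made for this example.

```python
import random


def fit_simple_ols(x, y):
    """Estimate intercept and slope of y = b0 + b1 * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (an assumption for illustration): y = 2 + 3x plus noise.
x_all = [rng.uniform(0, 10) for _ in range(200)]
y_all = [2 + 3 * xi + rng.gauss(0, 1) for xi in x_all]

# 1. SPLIT DATA: shuffle the row indices, then hold out 20% as the test set.
indices = list(range(len(x_all)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
x_train = [x_all[i] for i in train_idx]
y_train = [y_all[i] for i in train_idx]
x_test = [x_all[i] for i in test_idx]
y_test = [y_all[i] for i in test_idx]

# 2. TRAIN ALGORITHM: estimate the model using only the training set.
b0, b1 = fit_simple_ols(x_train, y_train)

# 3. GENERATE PREDICTIONS: apply the estimated model to test-set inputs.
y_pred = [b0 + b1 * xi for xi in x_test]

# 4. MEASURE PERFORMANCE: compare predictions with observed test values.
test_mse = mean_squared_error(y_test, y_pred)
print(f"intercept={b0:.2f} slope={b1:.2f} test MSE={test_mse:.2f}")

# 5. MODIFY MODEL: if test_mse were too high, change the model in step 2
#    and repeat steps 3 and 4.
```

The test set plays no part in step 2, so the MSE computed in step 4 reflects how the model performs on data it has not seen.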

Machine Learning Methods

◦ Linear Regression
◦ Linear Classification
◦ Linear Model Selection
◦ Non-Linear Models
◦ Tree-Based Methods
◦ Support Vector Machines
◦ Artificial Intelligence / Deep Learning / Neural Networks

Goodness of Fit

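We need a measure of the performance of a model. For regression tasks, we often use the Mean Squared Error (MSE):

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the observed value and $\hat{y}_i$ is the predicted value for observation $i$. We want a low MSE: in general, the lower the MSE, the closer the predictions are to the observed values.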
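A small worked example may help make the Mean Squared Error concrete. The observed and predicted values below are made up for illustration.

```python
# Made-up observed values and model predictions for three observations.
observed = [3, 5, 7]
predicted = [2, 5, 9]

# Squared error for each observation: (3-2)^2 = 1, (5-5)^2 = 0, (7-9)^2 = 4.
squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]

# MSE is the average of the squared errors: (1 + 0 + 4) / 3.
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)
print(mse)
```

Squaring keeps positive and negative errors from cancelling out and penalizes large misses more heavily than small ones.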

Goodness of Fit

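What is a good MSE or RMSE (root mean squared error, the square root of the MSE)? The answer is always based on context. RMSE is easier to evaluate because it is in the same units as the outcome variable, so you can use intuition to know if the RMSE is large relative to the size of the outcome variable $y$.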
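One possible way to judge the size of the error in context is to compare the square root of the MSE with the typical size of the outcome. The sketch below uses made-up income figures; comparing against the mean outcome is an assumption of this example, not a rule from the slides.

```python
import math

# Made-up observed incomes and model predictions, in dollars.
observed = [50_000, 60_000, 70_000]
predicted = [52_000, 59_000, 67_000]

mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# The square root puts the error back in dollars, the units of the outcome.
rmse = math.sqrt(mse)

# Compare the typical error with the typical outcome.
mean_income = sum(observed) / len(observed)
relative_error = rmse / mean_income

print(f"MSE  = {mse:,.0f} (squared dollars)")
print(f"RMSE = {rmse:,.0f} dollars, about {relative_error:.1%} of mean income")
```

An error of roughly two thousand dollars is easy to weigh against incomes of around sixty thousand, whereas an MSE in the millions of squared dollars is hard to interpret directly.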

Linear Regression