






























An in-depth exploration of machine learning, focusing on supervised learning and regression analysis. It explains the concept of machine learning, its applications in various fields such as banking, insurance, email filtering, and advertising, and the difference between supervised and unsupervised learning. The document delves into the process of supervised learning, including the steps to develop a model, the types of supervised learning tasks, and the use of linear regression for regression tasks. It also discusses the concept of goodness of fit and the mean squared error as a performance metric. The document concludes with a simple linear regression model and its estimation using ordinary least squares.
Typology: Slides
◦ Used by banks to make decisions about approving loans.
◦ Used by insurance companies to choose who to cover.
◦ Used by your email provider to weed out spam emails.
◦ Used by YouTube to generate suggested videos.
◦ Used by companies to know who to market their product to.
◦ Used by advertisers to know what ads to show where and when.
◦ Can be used to generate suggested medical treatments.
Supervised Learning is possible when you have historical data that includes values of the dependent variable (also called the output or label). For example:
In supervised learning, you develop a model that generates predicted values of the dependent variable / output / label using the following steps:
Unsupervised Learning is used when you DO NOT have data for the dependent variable. For example:
We will focus on Supervised learning in this class. We will learn both regression and classification tasks.
Say we want to predict a person's income (dependent variable: $y$). We think we can use characteristics of a person (independent variables) to help us predict their income, for example age, education level, and number of years of experience.
A function takes inputs and converts them to some output: $y = f(\text{age}, \text{educ}, \text{exper})$.
Our task will be to create good models of economic variables.
A function has a functional form, the actual form of the function itself. The most common is a linear functional form: $y = \beta_0 + \beta_1 \text{age} + \beta_2 \text{educ} + \beta_3 \text{exper}$.
$\beta_0, \beta_1, \beta_2, \beta_3$ are called parameters. $\beta_2$ is the slope of years of education. The slopes tell us how much income changes as age, education, and experience change, holding the other independent variables constant.
When we write this model we don't know what $\beta_0, \beta_1, \beta_2, \beta_3$ are. In order to use information on people's age, education, and experience to predict their income, we must estimate their values.
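To make the "holding the other independent variables constant" idea concrete, here is a minimal sketch of a linear income function. The parameter values and the names `BETA_0` through `BETA_3` and `income` are made up for illustration only; they are not estimates and do not come from the slides.

```python
# Hypothetical parameter values (income measured in thousands of dollars).
# These are illustrative assumptions, not estimated from any data.
BETA_0 = 5.0  # intercept
BETA_1 = 0.2  # slope of age
BETA_2 = 3.0  # slope of years of education
BETA_3 = 1.0  # slope of years of experience


def income(age, educ, exper):
    """Linear functional form: beta_0 + beta_1*age + beta_2*educ + beta_3*exper."""
    return BETA_0 + BETA_1 * age + BETA_2 * educ + BETA_3 * exper


# Same person, but with one more year of education; age and experience are held constant.
base = income(age=30, educ=16, exper=5)
more_educ = income(age=30, educ=17, exper=5)

# The difference equals the education slope (up to floating-point rounding).
educ_effect = more_educ - base
print(educ_effect)
```

Changing only `educ` by one year moves predicted income by the education slope, which is what the parameter is meant to capture.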
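When we estimate the values of $\beta_0, \beta_1, \beta_2, \beta_3$ we will have our estimated function: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \text{age} + \hat{\beta}_2 \text{educ} + \hat{\beta}_3 \text{exper}$.
Note: the symbol $\hat{\ }$ is called "hat" and it denotes that something is estimated.
Once we have the estimated function, we can plug in the inputs to generate the predictions $\hat{y}$.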
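The estimate-then-predict workflow can be sketched as follows, assuming the parameters are estimated by ordinary least squares (the method the document description mentions). The data are synthetic and noise-free, generated from made-up "true" parameters, so the estimates should recover them almost exactly. The helper names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical, and the sketch uses plain Python only to stay dependency-free.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b] so row operations apply to both sides.
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Estimate beta_hat by solving the normal equations (X'X) beta = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def predict(beta_hat, x):
    """Plug inputs into the estimated function to get y_hat."""
    return beta_hat[0] + sum(b * v for b, v in zip(beta_hat[1:], x))


# Synthetic (age, educ, exper) inputs; income in thousands, from made-up parameters.
people = [(25, 12, 3), (32, 16, 8), (41, 14, 20), (29, 18, 2),
          (50, 12, 25), (38, 16, 10), (45, 20, 15), (23, 10, 5)]
incomes = [5.0 + 0.2 * a + 3.0 * e + 1.0 * x for a, e, x in people]

beta_hat = fit_ols(people, incomes)
y_hat = predict(beta_hat, (30, 16, 5))
print(beta_hat, y_hat)
```

With real data the outcome would contain noise, so the estimates would only approximate the underlying parameters. In practice a numerical library would normally do the fitting.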
◦ Training Set – A subset of our dataset that you use to develop your model.
◦ Test Set – A subset of the dataset not used to develop the model. This set is used to evaluate the performance of the model.
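One common way to form the two sets is a random split. The sketch below is an illustration under assumptions: the function name `train_test_split`, the 25% test share, and the fixed seed are all choices made here, not something the slides specify.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly split rows into (training set, test set).

    The test rows are held out and never used to develop the model.
    """
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in dataset: 20 observation IDs.
dataset = list(range(20))
train_set, test_set = train_test_split(dataset)
print(len(train_set), len(test_set))
```

The model would be developed on `train_set` only, and its performance evaluated on `test_set`.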
◦ Linear Regression
◦ Linear Classification
◦ Linear Model Selection
◦ Non-Linear Models
◦ Tree-Based Methods
◦ Support Vector Machines
◦ Artificial Intelligence / Deep Learning / Neural Networks
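We need a measure of the performance of a model. For regression tasks, we often use the Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for observation $i$. We want a low MSE. In general,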
What is a good MSE or RMSE (the square root of the MSE)? The answer is always based on context. RMSE is easier to evaluate because it is in the same units as the outcome, so you can use intuition to know if the RMSE is large relative to the size of the outcome variable $y$.
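A small worked example of these metrics, assuming the second metric discussed above is the root mean squared error (RMSE, the square root of the MSE). The actual and predicted incomes are made-up numbers in thousands of dollars, and the function name `mean_squared_error` is chosen here for illustration.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Made-up actual and predicted incomes (thousands of dollars).
y_true = [52.0, 61.0, 75.0, 48.0]
y_pred = [50.0, 65.0, 70.0, 49.0]

mse = mean_squared_error(y_true, y_pred)  # in squared units of y
rmse = math.sqrt(mse)                     # back in the units of y

# Compare the typical error with the typical size of the outcome.
mean_y = sum(y_true) / len(y_true)
relative_error = rmse / mean_y
print(mse, rmse, relative_error)
```

Here the RMSE is a few thousand dollars against an average income near 59 thousand, which is the kind of context-based comparison the text describes.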