
ISyE7406 – HOMEWORK 4

Spring 2023

1. Introduction

In this homework, we will examine how various local smoothing methods estimate the "Mexican hat function" at various points in the interval [-2π, 2π]. Using these methods, we can better understand how the bias, variance, and mean squared error of these estimators behave under different spans, bandwidths, and local smoothing parameters.

2. Exploratory Data Analysis

The Mexican hat function is defined on the range [-2π, 2π] as

f(x) = (1 - x^2) * exp(-0.5 * x^2)

The true function is visualized below:

It is worth noting that we will add random noise to this function in our analysis to see how our smoothers respond. Otherwise, the main thing to note is that this function holds a fairly steady value at the ends of the interval, then shows a sharp set of wavelets near the center, and then steadies again.
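
For reference, below is a minimal R sketch that evaluates and plots the true, noise-free Mexican hat function on an equidistant grid. The grid of 101 points matches the design used later in this report; the plotting details are illustrative only.

## True Mexican hat function, f(x) = (1 - x^2) * exp(-0.5 * x^2)
f <- function(x) (1 - x^2) * exp(-0.5 * x^2)

## Evaluate on an equidistant grid over [-2*pi, 2*pi] and plot it
x <- 2 * pi * seq(-1, 1, length = 101)
plot(x, f(x), type = "l", ylab = "f(x)", main = "True Mexican hat function")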

3. Methodology

We will explore a variety of local smoothing techniques for this assignment. These include:

1. Loess Smoothing

2. Nadaraya-Watson (NW) Kernel Smoothing

3. Spline Smoothing

In addition to testing these methods, it is worth mentioning that the 'y' values we use to train the models have added errors that are independent and identically distributed normal, with mean 0 and standard deviation 0.2. So instead of the smooth function shown in the Data Exploration section, we get something a little more jittery, as sketched in the code below.
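
To make this concrete, here is a minimal sketch (illustration only) of one noisy training sample and the three smoothers fit to it. The span of 0.75 and bandwidth of 0.2 match the settings used in the code appendix; the smoothing spline uses its default smoothing parameter, and the seed is only for reproducibility of the illustration.

set.seed(1)                                 # illustration only
f <- function(x) (1 - x^2) * exp(-0.5 * x^2)
n <- 101
x <- 2 * pi * seq(-1, 1, length = n)        # equidistant design (part one)
y <- f(x) + rnorm(n, mean = 0, sd = 0.2)    # iid N(0, 0.2^2) noise added to the true function

## Fit the three local smoothers to the single noisy sample
fit.lp <- predict(loess(y ~ x, span = 0.75), newdata = x)                    # loess
fit.nw <- ksmooth(x, y, kernel = "normal", bandwidth = 0.2, x.points = x)$y  # NW kernel
fit.ss <- predict(smooth.spline(x, y), x = x)$y                              # smoothing spline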

It's also worth noting that there are two different designs used in parts one and two. In part one, the 'x' values in the core function are evenly spaced over [-2π, 2π], but in part two we use non-equidistant points to train the smoothers. This will have an impact on the smoothing we see in the results.
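
The exact non-equidistant design used in part two is not reproduced in this extract. Purely as a hypothetical illustration of the idea, the design points could be drawn at random rather than placed on an even grid, for example:

## Hypothetical illustration only: a non-equidistant design drawn uniformly at random
## (this is NOT necessarily the design used in part two of this assignment)
set.seed(2)
x.uneven <- sort(runif(101, min = -2 * pi, max = 2 * pi))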

Finally, we use a Monte Carlo simulation: we generate the y values for each 'x' value 1,000 times, apply the smoothing methods to each simulated data set, and use the resulting fitted values to calculate the bias, variance, and mean squared error at each point, as sketched below.
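
Below is a minimal sketch of this Monte Carlo procedure for a single method (loess), with the pointwise bias, variance, and MSE taken relative to the true function f(x_i). The full code in the appendix stores the fitted values for all three smoothers; note that the appendix code measures bias and MSE against a simulated y realization rather than the true f.

## Monte Carlo sketch for one smoother (loess); m runs, n design points
f <- function(x) (1 - x^2) * exp(-0.5 * x^2)
m <- 1000
n <- 101
x  <- 2 * pi * seq(-1, 1, length = n)
fv <- matrix(0, nrow = n, ncol = m)   # fitted values: one column per Monte Carlo run

for (j in 1:m) {
  y <- f(x) + rnorm(n, sd = 0.2)
  fv[, j] <- predict(loess(y ~ x, span = 0.75), newdata = x)
}

## Empirical bias, variance, and MSE at each design point x_i
bias <- apply(fv, 1, mean) - f(x)   # pointwise bias
vari <- apply(fv, 1, var)           # pointwise variance
mse  <- rowMeans((fv - f(x))^2)     # pointwise MSE relative to the true f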

4. Results

Part 1 – Using Equidistant X-Values in the Smoothers. Figures (not reproduced in this extract) show the variance at each point x_i and the MSE at each point x_i.

Part 2 – Using Non-equidistant X-Values in the Smoothers. Figures (not reproduced in this extract) show the variance at each point x_i and the MSE at each point x_i. Note: black lines indicate the loess method, red indicates NW smoothing, and blue indicates spline smoothing.

5. Conclusions

As a whole for part one, we can see that the loess smoothing method shows very little variance in its predictions at the data points. Correspondingly, loess smoothing shows a bias of over- and under-predicting the core function at points where the function sways away from a flat line. This occurs most dramatically between -π and π on the x-axis.

While the NW and spline smoothing show very similar bias and MSE in part one, we see a fairly large difference in variance, with NW being significantly higher than spline and loess smoothing (the low loess variance makes sense, since the mean f(x_i) values stay within a fairly tight range). It is fair to ask why NW shows such high variance when its mean estimates are very similar to the spline method's; to answer that, we can look at the minimum and maximum fitted values at each x_i in the function domain for part one. In the figure below (produced by the min/max plotting code in the appendix), we can see that the NW fits span a much wider range at each x_i than the spline fits, which is consistent with NW's higher variance.

## Part 1: deterministic equidistant design
## Generate n = 101 equidistant points in [-2*pi, 2*pi]
m <- 1000
n <- 101
x <- 2 * pi * seq(-1, 1, length = n)

## Initialize the matrices of fitted values for the three methods
fvlp <- fvnw <- fvss <- matrix(0, nrow = n, ncol = m)

## Generate data, fit the smoothers, and store the fitted values
for (j in 1:m) {
  ## Simulate y-values; f(x) is the Mexican hat function defined above
  xlocal <- c()
  for (i in 1:n) {
    xlocal[i] <- (2 * pi) * (-1 + 2 * ((i - 1) / (n - 1)))
  }
  f <- function() {
    yi <- c()
    for (i in 1:n) {
      yi[i] <- (1 - xlocal[i]^2) * exp(-0.5 * xlocal[i]^2)
    }
    return(yi)
  }
  y <- f() + rnorm(length(xlocal), sd = 0.2)

  ## Get the estimates and store them
  fvlp[, j] <- predict(loess(y ~ x, span = 0.75), newdata = x)
  fvnw[, j] <- ksmooth(x, y, kernel = "normal", bandwidth = 0.2, x.points = x)$y
  fvss[, j] <- predict(smooth.spline(x, y), x = x)$y
}

## Plot the mean of the three estimators in a single plot
meanlp <- apply(fvlp, 1, mean)
meannw <- apply(fvnw, 1, mean)
meanss <- apply(fvss, 1, mean)
dmin <- min(meanlp, meannw, meanss)
dmax <- max(meanlp, meannw, meanss)
matplot(x, meanlp, "l", ylim = c(dmin, dmax), ylab = "Response")  # black: loess means
matlines(x, meannw, col = "red")                                  # red: NW kernel means
matlines(x, meanss, col = "blue")                                 # blue: spline means
points(x, y)                                                      # last simulated sample

## Plot one noisy realization of the data
matplot(x, y, "l", ylim = c(min(y), max(y)))
points(x, y)

## Variance of the three estimators at each point x_i
varlp <- apply(fvlp, 1, var)
varnw <- apply(fvnw, 1, var)
varss <- apply(fvss, 1, var)
dmin <- min(varlp, varnw, varss)
dmax <- max(varlp, varnw, varss)
matplot(x, varlp, "l", ylim = c(dmin, dmax), ylab = "Variance")
matlines(x, varnw, col = "red")
matlines(x, varss, col = "blue")

# Double check the variances with apply()
varlp2 <- apply(fvlp, 1, var)
varnw2 <- apply(fvnw, 1, var)
varss2 <- apply(fvss, 1, var)
dmin <- min(varlp2, varnw2, varss2)
dmax <- max(varlp2, varnw2, varss2)
matplot(x, varlp2, "l", ylim = c(dmin, dmax), ylab = "Variance")
matlines(x, varnw2, col = "red")
matlines(x, varss2, col = "blue")

# MSE at each point x_i (computed against the last simulated y)
mselp <- replicate(101, 0)
msenw <- replicate(101, 0)
msess <- replicate(101, 0)
for (i in 1:n) {
  for (j in 1:m) {
    mselp[i] <- mselp[i] + (fvlp[i, j] - y[i])^2
    msenw[i] <- msenw[i] + (fvnw[i, j] - y[i])^2
    msess[i] <- msess[i] + (fvss[i, j] - y[i])^2
  }
}
mselp <- mselp / m
msenw <- msenw / m
msess <- msess / m
dmin <- min(mselp, msenw, msess)
dmax <- max(mselp, msenw, msess)
matplot(x, mselp, "l", ylim = c(dmin, dmax), ylab = "MSE")
matlines(x, msenw, col = "red")
matlines(x, msess, col = "blue")

# Plot the min and max fitted values of NW and SS at each point x_i
minsv1nw <- apply(fvnw, 1, min)
maxsv1nw <- apply(fvnw, 1, max)
minsv1ss <- apply(fvss, 1, min)
maxsv1ss <- apply(fvss, 1, max)
dmin <- min(minsv1nw, minsv1ss)
dmax <- max(maxsv1nw, maxsv1ss)
matplot(x, minsv1nw, "l", ylim = c(dmin, dmax), ylab = "Min/Max", col = "red", lty = "dashed")
matlines(x, maxsv1nw, col = "red", lty = "dashed")
matlines(x, minsv1ss, col = "blue", lty = "dashed")
matlines(x, maxsv1ss, col = "blue", lty = "dashed")

## Mean of the three estimators at each point x_i
meanlp <- apply(fvlp, 1, mean)
meannw <- apply(fvnw, 1, mean)
meanss <- apply(fvss, 1, mean)
dmin <- min(meanlp, meannw, meanss)
dmax <- max(meanlp, meannw, meanss)
matplot(x, meanlp, "l", ylim = c(dmin, dmax), ylab = "Response")
matlines(x, meannw, col = "red")
matlines(x, meanss, col = "blue")
points(x, y)

## Plot one noisy realization of the data
matplot(x, y, "l", ylim = c(min(y), max(y)))
points(x, y)

## Empirical bias at each point x_i (computed against the last simulated y)
biaslp <- meanlp - y
biasnw <- meannw - y
biasss <- meanss - y
dmin <- min(biaslp, biasnw, biasss)
dmax <- max(biaslp, biasnw, biasss)
matplot(x, biaslp, "l", ylim = c(dmin, dmax), ylab = "Bias")
matlines(x, biasnw, col = "red")
matlines(x, biasss, col = "blue")

# Double check the variances with apply()
varlp2 <- apply(fvlp, 1, var)
varnw2 <- apply(fvnw, 1, var)
varss2 <- apply(fvss, 1, var)
dmin <- min(varlp2, varnw2, varss2)
dmax <- max(varlp2, varnw2, varss2)
matplot(x, varlp2, "l", ylim = c(dmin, dmax), ylab = "Variance")
matlines(x, varnw2, col = "red")
matlines(x, varss2, col = "blue")

# MSE at each point x_i (computed against the last simulated y)
mselp <- replicate(101, 0)
msenw <- replicate(101, 0)
msess <- replicate(101, 0)
for (i in 1:n) {
  for (j in 1:m) {
    mselp[i] <- mselp[i] + (fvlp[i, j] - y[i])^2
    msenw[i] <- msenw[i] + (fvnw[i, j] - y[i])^2
    msess[i] <- msess[i] + (fvss[i, j] - y[i])^2
  }
}
mselp <- mselp / m
msenw <- msenw / m
msess <- msess / m

dmin <- min(mselp, msenw, msess)
dmax <- max(mselp, msenw, msess)
matplot(x, mselp, "l", ylim = c(dmin, dmax), ylab = "MSE")
matlines(x, msenw, col = "red")
matlines(x, msess, col = "blue")