Package ‘caret’

April 19, 2022

Title Classification and Regression Training

Version 6.0-92

Description Misc functions for training and plotting classification and regression models.

License GPL (>= 2)

URL https://github.com/topepo/caret/

BugReports https://github.com/topepo/caret/issues

Depends ggplot2, lattice (>= 0.20), R (>= 3.2.0)

Imports e1071, foreach, grDevices, methods, ModelMetrics (>= 1.2.2.2), nlme, plyr, pROC, recipes (>= 0.1.10), reshape2, stats, stats4, utils, withr (>= 2.0.0)

Suggests BradleyTerry2, covr, Cubist, dplyr, earth (>= 2.2-3), ellipse, fastICA, gam (>= 1.15), ipred, kernlab, klaR, knitr, MASS, Matrix, mda, mgcv, mlbench, MLmetrics, nnet, pamr, party (>= 0.9-99992), pls, proxy, randomForest, RANN, rmarkdown, rpart, spls, subselect, superpc, testthat (>= 0.9.1), themis (>= 0.1.3)

VignetteBuilder knitr

Encoding UTF-8

RoxygenNote 7.1.2

NeedsCompilation yes

Author Max Kuhn [aut, cre] (https://orcid.org/0000-0003-2402-136X), Jed Wing [ctb], Steve Weston [ctb], Andre Williams [ctb], Chris Keefer [ctb], Allan Engelhardt [ctb], Tony Cooper [ctb], Zachary Mayer [ctb], Brenton Kenkel [ctb], R Core Team [ctb], Michael Benesty [ctb], Reynald Lescarbeau [ctb], Andrew Ziem [ctb], Luca Scrucca [ctb], Yuan Tang [ctb], Can Candan [ctb], Tyler Hunt [ctb]

Maintainer Max Kuhn mxkuhn@gmail.com

Repository CRAN

Date/Publication 2022-04-19 06:52:35 UTC

R topics documented:

as.matrix.confusionMatrix, avNNet, bag, bagEarth, bagFDA, BloodBrain, BoxCoxTrans, calibration, caretSBF, cars, classDist, confusionMatrix, confusionMatrix.train, cox2, createDataPartition, defaultSummary, densityplot.rfe, dhfr, diff.resamples, dotPlot, dotplot.diff.resamples, downSample, dummyVars, extractPrediction, featurePlot, filterVarImp, findCorrelation, findLinearCombos, format.bagEarth, gafs.default, gafsControl, gafs_initial, GermanCredit, getSamplingInfo


segmentationData, SLC14_1, spatialSign, summary.bagEarth, tecator, thresholder, train, trainControl, train_model_list, update.safs, update.train, varImp, varImp.gafs, var_seq, xyplot.resamples


as.matrix.confusionMatrix Confusion matrix as a table

Description

Conversion functions for class confusionMatrix

Usage

S3 method for class 'confusionMatrix'

as.matrix(x, what = "xtabs", ...)

Arguments

x     an object of class confusionMatrix
what  data to convert to matrix. Either "xtabs", "overall" or "classes"
...   not currently used

Details

For as.table, the cross-tabulations are saved. For as.matrix, the three object types are saved in matrix format.

Value

A matrix or table

Author(s)

Max Kuhn


Examples

###################
## 2 class example

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)), levels = rev(lvs))
pred <- factor(c(rep(lvs, times = c(54, 32)),
                 rep(lvs, times = c(27, 231))),
               levels = rev(lvs))

xtab <- table(pred, truth)

results <- confusionMatrix(xtab)
as.table(results)
as.matrix(results)
as.matrix(results, what = "overall")
as.matrix(results, what = "classes")

###################
## 3 class example

xtab <- confusionMatrix(iris$Species, sample(iris$Species))
as.matrix(xtab)

avNNet Neural Networks Using Model Averaging

Description

Aggregate several neural network models

Usage

avNNet(x, ...)

S3 method for class 'formula'

avNNet( formula, data, weights, ..., repeats = 5, bag = FALSE, allowParallel = TRUE, seeds = sample.int(1e+05, repeats),

avNNet 7

newdata  matrix or data frame of test examples. A vector is considered to be a row vector
         comprising a single case.
type     type of output, either: raw for the raw outputs, class for the predicted class or
         prob for the class probabilities.

Details

Following Ripley (1996), the same neural network model is fit using different random number seeds. All the resulting models are used for prediction. For regression, the outputs from each network are averaged. For classification, the model scores are first averaged, then translated to predicted classes. Bagging can also be used to create the models. If a parallel backend is registered, the foreach package is used to train the networks in parallel.
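As a brief illustration (not taken from this manual), the sketch below registers a doParallel backend so that the repeated nnet fits can run in parallel and turns on bagging; the cluster size, number of repeats and nnet settings are arbitrary choices made here for the BloodBrain data.

library(caret)
library(doParallel)

cl <- makePSOCKcluster(2)   # two worker processes; adjust to taste
registerDoParallel(cl)      # foreach backend picked up by avNNet

data(BloodBrain)
## five networks, each with a different seed and fit to a bootstrap sample
fit <- avNNet(bbbDescr, logBBB, size = 5, repeats = 5, bag = TRUE,
              linout = TRUE, trace = FALSE)

stopCluster(cl)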

Value

For avNNet, an object of class "avNNet" or "avNNet.formula". Items of interest in the output are:

model    a list of the models generated from nnet
repeats  an echo of the model input
names    if any predictors had only one distinct value, this is a character string of the
         remaining columns. Otherwise a value of NULL

Author(s)

These are heavily based on the nnet code from Brian Ripley.

References

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.

See Also

nnet, preProcess

Examples

data(BloodBrain)

## Not run:
modelFit <- avNNet(bbbDescr, logBBB, size = 5, linout = TRUE, trace = FALSE)
modelFit

predict(modelFit, bbbDescr)

## End(Not run)


bag A General Framework For Bagging

Description

bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

Usage

bag(x, ...)

bagControl(
  fit = NULL,
  predict = NULL,
  aggregate = NULL,
  downSample = FALSE,
  oob = TRUE,
  allowParallel = TRUE
)

Default S3 method:

bag(x, y, B = 10, vars = ncol(x), bagControl = NULL, ...)

S3 method for class 'bag'

predict(object, newdata = NULL, ...)

S3 method for class 'bag'

print(x, ...)

S3 method for class 'bag'

summary(object, ...)

S3 method for class 'summary.bag'

print(x, digits = max(3, getOption("digits") - 3), ...)

ldaBag

plsBag

nbBag

ctreeBag

svmBag


Details

The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, svmBag and nnetBag. Each has elements fit, pred and aggregate. One note: when vars is not NULL, the sub-setting occurs prior to when the fit and predict functions are called. In this way, the user probably does not need to account for the change in predictors in their functions. When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work. If a parallel backend is registered, the foreach package is used to train the models in parallel.
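As a hedged sketch (not part of the manual text), the example below plugs a plain linear model into bag(). The helper names lmFit, lmPred and lmAggregate are invented here, and the aggregation step simply averages the B sets of predictions.

library(caret)

## user-supplied fit/predict/aggregate functions for bagControl()
lmFit <- function(x, y, ...) lm(y ~ ., data = as.data.frame(x))
lmPred <- function(object, x) predict(object, as.data.frame(x))
lmAggregate <- function(x, type = "raw") {
  ## x is a list holding one vector of predictions per bagged model
  rowMeans(do.call("cbind", x))
}

data(BloodBrain)
lmBag <- bag(bbbDescr[, 1:10], logBBB, B = 10,
             bagControl = bagControl(fit = lmFit,
                                     predict = lmPred,
                                     aggregate = lmAggregate))
predict(lmBag, bbbDescr[1:5, 1:10])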

Value

bag produces an object of class bag with elements

fits     a list with two sub-objects: the fit object has the actual model fit for that bagged
         sample and the vars object is either NULL or a vector of integers corresponding to
         which predictors were sampled for that model
control  a mirror of the arguments passed into bagControl
call     the call
B        the number of bagging iterations
dims     the dimensions of the training set

Author(s)

Max Kuhn

Examples

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)

treebag <- bag(bbbDescr, logBBB, B = 10,
               bagControl = bagControl(fit = ctreeBag$fit,
                                       predict = ctreeBag$pred,
                                       aggregate = ctreeBag$aggregate))

## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove some zero variance predictors and linear dependencies
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

basicLDA <- train(mdrrDescr, mdrrClass, "lda")

bagLDA2 <- train(mdrrDescr, mdrrClass,
                 "bag",
                 B = 10,
                 bagControl = bagControl(fit = ldaBag$fit,
                                         predict = ldaBag$pred,
                                         aggregate = ldaBag$aggregate),
                 tuneGrid = data.frame(vars = c((1:10)*10, ncol(mdrrDescr))))

bagEarth Bagged Earth

Description

A bagging wrapper for multivariate adaptive regression splines (MARS) via the earth function

Usage

bagEarth(x, ...)

Default S3 method:

bagEarth(x, y, weights = NULL, B = 50, summary = mean, keepX = TRUE, ...)

S3 method for class 'formula'

bagEarth(formula, data = NULL, B = 50, summary = mean, keepX = TRUE, ...,
         subset, weights = NULL, na.action = na.omit)

S3 method for class 'bagEarth'

print(x, ...)

Arguments

x        matrix or data frame of 'x' values for examples.
...      arguments passed to the earth function
y        matrix or data frame of numeric outcome values.
weights  (case) weights for each example - if missing defaults to 1.
B        the number of bootstrap samples


fit2 <- bagEarth(x = trees[,-3], y = trees[,3], B = 10)

## End(Not run)

bagFDA Bagged FDA

Description

A bagging wrapper for flexible discriminant analysis (FDA) using multivariate adaptive regression splines (MARS) basis functions

Usage

bagFDA(x, ...)

Default S3 method:

bagFDA(x, y, weights = NULL, B = 50, keepX = TRUE, ...)

S3 method for class 'formula'

bagFDA(formula, data = NULL, B = 50, keepX = TRUE, ...,
       subset, weights = NULL, na.action = na.omit)

S3 method for class 'bagFDA'

print(x, ...)

Arguments

x        matrix or data frame of 'x' values for examples.
...      arguments passed to the mars function
y        matrix or data frame of numeric outcome values.
weights  (case) weights for each example - if missing defaults to 1.
B        the number of bootstrap samples
keepX    a logical: should the original training data be kept?
formula  A formula of the form y ~ x1 + x2 + ...
data     Data frame from which variables specified in 'formula' are preferentially to be taken.


subset     An index vector specifying the cases to be used in the training sample. (NOTE: If
           given, this argument must be named.)
na.action  A function to specify the action to be taken if 'NA's are found. The default action
           is for the procedure to fail. An alternative is na.omit, which leads to rejection of
           cases with missing values on any required variable. (NOTE: If given, this argument
           must be named.)

Details

The function computes an FDA model for each bootstrap sample.

Value

A list with elements

fit   a list of B FDA fits
B     the number of bootstrap samples
call  the function call
x     either NULL or the value of x, depending on the value of keepX
oob   a matrix of performance estimates for each bootstrap sample

Author(s)

Max Kuhn (bagFDA.formula is based on Ripley’s nnet.formula)

References

J. Friedman, "Multivariate Adaptive Regression Splines" (with discussion) (1991). Annals of Statistics, 19/1, 1-141.

See Also

fda, predict.bagFDA

Examples

library(mlbench)
library(earth)
data(Glass)

set.seed(36)
inTrain <- sample(1:dim(Glass)[1], 150)

trainData <- Glass[ inTrain, ]
testData  <- Glass[-inTrain, ]

set.seed(3577)
baggedFit <- bagFDA(Type ~ ., trainData)
confusionMatrix(data = predict(baggedFit, testData[, -10]),
                reference = testData$Type)


S3 method for class 'BoxCoxTrans'

print(x, newdata, digits = 3, ...)

S3 method for class 'BoxCoxTrans'

predict(object, newdata, ...)

Arguments

y          a numeric vector of data to be transformed. For BoxCoxTrans, the data must be
           strictly positive.
...        for BoxCoxTrans: options to pass to boxcox. plotit should not be passed through.
           For predict.BoxCoxTrans, additional arguments are ignored.
x          an optional dependent variable to be used in a linear model.
fudge      a tolerance value: lambda values within +/- fudge will be coerced to 0 and within
           1 +/- fudge will be coerced to 1.
numUnique  how many unique values should y have to estimate the transformation?
na.rm      a logical value indicating whether NA values should be stripped from y and x before
           the computation proceeds.
newdata    a numeric vector of values to transform.
digits     minimal number of significant digits.
object     an object of class BoxCoxTrans or expoTrans.

Details

The BoxCoxTrans function is basically a wrapper for the boxcox function in the MASS library. It can be used to estimate the transformation and apply it to new data. expoTrans estimates the exponential transformation of Manly (1976) but assumes a common mean for the data. The transformation parameter is estimated by directly maximizing the likelihood. If any(y <= 0) or if length(unique(y)) < numUnique, lambda is not estimated and no transformation is applied.
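A small sketch (not in the manual) of the rule above: lambda is estimated only for strictly positive data with enough distinct values; otherwise the object carries no estimated lambda and predict() hands the data back untransformed. The vectors used here are invented.

library(caret)

set.seed(1)
y <- rexp(50) + 0.1        # strictly positive, many unique values
BoxCoxTrans(y)$lambda      # lambda is estimated

y2 <- c(-1, y)             # any(y <= 0), so no transformation is estimated
bc2 <- BoxCoxTrans(y2)
head(predict(bc2, y2))     # values come back unchanged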

Value

Both functions return a list of class BoxCoxTrans or expoTrans with elements

lambda    estimated transformation value
fudge     value of fudge
n         number of data points used to estimate lambda
summary   the results of summary(y)
ratio     max(y)/min(y)
skewness  sample skewness statistic

BoxCoxTrans also returns:

fudge value of fudge

The predict functions return numeric vectors of transformed values


Author(s)

Max Kuhn

References

Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations (with discussion). Journal of the Royal Statistical Society B, 26, 211-252. Manly, B. L. (1976) Exponential data transformations. The Statistician, 25, 37 - 42.

See Also

boxcox, preProcess, optim

Examples

data(BloodBrain)

ratio <- exp(logBBB)
bc <- BoxCoxTrans(ratio)
bc

predict(bc, ratio[1:5])

ratio[5] <- NA
bc2 <- BoxCoxTrans(ratio, bbbDescr$tpsa, na.rm = TRUE)
bc2

manly <- expoTrans(ratio)
manly

calibration Probability Calibration Plot

Description

For classification models, this function creates a ’calibration plot’ that describes how consistent model probabilities are with observed event rates.

Usage

calibration(x, ...)

Default S3 method:

calibration(x, ...)

S3 method for class 'formula'

calibration(


Details

calibration.formula is used to process the data and xyplot.calibration is used to create the plot. To construct the calibration plot, the following steps are used for each model:

  1. The data are split into cuts - 1 roughly equal groups by their class probabilities
  2. the number of samples with true results equal to class are determined
  3. the event rate is determined for each bin

xyplot.calibration produces a plot of the observed event rate by the mid-point of the bins. This implementation uses the lattice function xyplot, so plot elements can be changed via panel functions, trellis.par.set or other means. calibration uses the panel function panel.calibration by default, but it can be changed by passing that argument into xyplot.calibration. The following elements are set by default in the plot but can be changed by passing new values into xyplot.calibration: xlab = "Bin Midpoint", ylab = "Observed Event Percentage", type = "o", ylim = extendrange(c(0, 100)), xlim = extendrange(c(0, 100)) and panel = panel.calibration. For the ggplot method, confidence intervals on the estimated proportions (from binom.test) are also shown.
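To make the binning above concrete, here is a hand-rolled sketch (not from the manual) of the calculation for a single model; probs, obs and cuts are invented stand-ins for real model output.

set.seed(10)
probs <- runif(200)                       # predicted probabilities of the event class
obs   <- factor(ifelse(runif(200) < probs, "event", "nonevent"))

cuts <- 11                                # gives cuts - 1 = 10 bins
breaks <- seq(0, 1, length.out = cuts)
bins <- cut(probs, breaks = breaks, include.lowest = TRUE)

## observed event percentage per bin, plotted against the bin midpoints
eventRate <- tapply(obs == "event", bins, mean) * 100
midpoints <- 100 * (head(breaks, -1) + tail(breaks, -1)) / 2
data.frame(midpoint = midpoints, event_pct = as.vector(eventRate))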

Value

calibration.formula returns a list with elements:

data       the data used for plotting
cuts       the number of cuts
class      the event class
probNames  the names of the model probabilities

xyplot.calibration returns a lattice object

Author(s)

Max Kuhn, some lattice code and documentation by Deepayan Sarkar

See Also

xyplot, trellis.par.set

Examples

## Not run:
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .5)]

inTrain <- createDataPartition(mdrrClass)
trainX <- mdrrDescr[ inTrain[[1]], ]
trainY <- mdrrClass[ inTrain[[1]]]
testX  <- mdrrDescr[-inTrain[[1]], ]
testY  <- mdrrClass[-inTrain[[1]]]

library(MASS)

ldaFit <- lda(trainX, trainY)
qdaFit <- qda(trainX, trainY)

testProbs <- data.frame(obs = testY,
                        lda = predict(ldaFit, testX)$posterior[,1],
                        qda = predict(qdaFit, testX)$posterior[,1])

calibration(obs ~ lda + qda, data = testProbs)

calPlotData <- calibration(obs ~ lda + qda, data = testProbs)
calPlotData

xyplot(calPlotData, auto.key = list(columns = 2))

## End(Not run)

caretSBF Selection By Filtering (SBF) Helper Functions

Description

Ancillary functions for univariate feature selection

Usage

caretSBF

anovaScores(x, y)

gamScores(x, y)

Arguments

x  a matrix or data frame of numeric predictors
y  a numeric or factor vector of outcomes

Format

An object of class list of length 5.
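As a hedged sketch of how these helpers are typically used (not part of this entry), anovaScores() can be called directly on one predictor, and the caretSBF list can be handed to sbfControl() for use with sbf(); the data set, resampling settings and model choice below are arbitrary.

library(caret)

data(mdrr)

## filter score (a p-value) for a single predictor against the two-class outcome
anovaScores(mdrrDescr[, 1], mdrrClass)

## plug the helper list into selection by filtering
filterCtrl <- sbfControl(functions = caretSBF, method = "cv", number = 5)

## Not run:
sbfFit <- sbf(mdrrDescr, mdrrClass,
              sbfControl = filterCtrl,
              ## additional arguments such as method = "lda" are forwarded to the model fit
              method = "lda")
sbfFit
## End(Not run)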