Package ‘caret’

April 19, 2022

Title Classification and Regression Training

Version 6.0-92

Description Misc functions for training and plotting classification and regression models.

License GPL (>= 2)

URL https://github.com/topepo/caret/

BugReports https://github.com/topepo/caret/issues

Depends ggplot2, lattice (>= 0.20), R (>= 3.2.0)

Imports e1071, foreach, grDevices, methods, ModelMetrics (>= 1.2.2.2), nlme, plyr, pROC, recipes (>= 0.1.10), reshape2, stats, stats4, utils, withr (>= 2.0.0)

Suggests BradleyTerry2, covr, Cubist, dplyr, earth (>= 2.2-3), ellipse, fastICA, gam (>= 1.15), ipred, kernlab, klaR, knitr, MASS, Matrix, mda, mgcv, mlbench, MLmetrics, nnet, pamr, party (>= 0.9-99992), pls, proxy, randomForest, RANN, rmarkdown, rpart, spls, subselect, superpc, testthat (>= 0.9.1), themis (>= 0.1.3)

VignetteBuilder knitr

Encoding UTF-8

RoxygenNote 7.1.2

NeedsCompilation yes

Author Max Kuhn [aut, cre] (https://orcid.org/0000-0003-2402-136X), Jed Wing [ctb], Steve Weston [ctb], Andre Williams [ctb], Chris Keefer [ctb], Allan Engelhardt [ctb], Tony Cooper [ctb], Zachary Mayer [ctb], Brenton Kenkel [ctb], R Core Team [ctb], Michael Benesty [ctb], Reynald Lescarbeau [ctb], Andrew Ziem [ctb], Luca Scrucca [ctb], Yuan Tang [ctb], Can Candan [ctb], Tyler Hunt [ctb]

Maintainer Max Kuhn mxkuhn@gmail.com

Repository CRAN

Date/Publication 2022-04-19 06:52:35 UTC

R topics documented:

as.matrix.confusionMatrix, avNNet, bag, bagEarth, bagFDA, BloodBrain, BoxCoxTrans, calibration, caretSBF, cars, classDist, confusionMatrix, confusionMatrix.train, cox2, createDataPartition, defaultSummary, densityplot.rfe, dhfr, diff.resamples, dotPlot, dotplot.diff.resamples, downSample, dummyVars, extractPrediction, featurePlot, filterVarImp, findCorrelation, findLinearCombos, format.bagEarth, gafs.default, gafsControl, gafs_initial, GermanCredit, getSamplingInfo


segmentationData, SLC14_1, spatialSign, summary.bagEarth, tecator, thresholder, train, trainControl, train_model_list, update.safs, update.train, varImp, varImp.gafs, var_seq, xyplot.resamples


as.matrix.confusionMatrix Confusion matrix as a table

Description

Conversion functions for class confusionMatrix

Usage

S3 method for class 'confusionMatrix'

as.matrix(x, what = "xtabs", ...)

Arguments

x     an object of class confusionMatrix
what  data to convert to matrix. Either "xtabs", "overall" or "classes"
...   not currently used

Details

For as.table, the cross-tabulations are saved. For as.matrix, the three object types are saved in matrix format.

Value

A matrix or table

Author(s)

Max Kuhn


Examples

###################
## 2 class example

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)), levels = rev(lvs))
pred <- factor(c(rep(lvs, times = c(54, 32)),
                 rep(lvs, times = c(27, 231))),
               levels = rev(lvs))

xtab <- table(pred, truth)

results <- confusionMatrix(xtab)
as.table(results)
as.matrix(results)
as.matrix(results, what = "overall")
as.matrix(results, what = "classes")

###################
## 3 class example

xtab <- confusionMatrix(iris$Species, sample(iris$Species))
as.matrix(xtab)

avNNet Neural Networks Using Model Averaging

Description

Aggregate several neural network models

Usage

avNNet(x, ...)

S3 method for class 'formula'

avNNet( formula, data, weights, ..., repeats = 5, bag = FALSE, allowParallel = TRUE, seeds = sample.int(1e+05, repeats),

avNNet 7

newdata  matrix or data frame of test examples. A vector is considered to be a row vector
         comprising a single case.
type     type of output, either: raw for the raw outputs, class for the predicted class or
         prob for the class probabilities.

Details

Following Ripley (1996), the same neural network model is fit using different random number seeds. All the resulting models are used for prediction. For regression, the outputs from each network are averaged. For classification, the model scores are first averaged, then translated to predicted classes. Bagging can also be used to create the models. If a parallel backend is registered, the foreach package is used to train the networks in parallel.
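As a brief illustration (not taken from this manual), the sketch below registers a doParallel backend so that the repeated nnet fits can run in parallel and turns on bagging; the cluster size, number of repeats and nnet settings are arbitrary choices made here for the BloodBrain data.

library(caret)
library(doParallel)

cl <- makePSOCKcluster(2)   # two worker processes; adjust to taste
registerDoParallel(cl)      # foreach backend picked up by avNNet

data(BloodBrain)
## five networks, each with a different seed and fit to a bootstrap sample
fit <- avNNet(bbbDescr, logBBB, size = 5, repeats = 5, bag = TRUE,
              linout = TRUE, trace = FALSE)

stopCluster(cl)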

Value

For avNNet, an object of class "avNNet" or "avNNet.formula". Items of interest in the output are:

model    a list of the models generated from nnet
repeats  an echo of the model input
names    if any predictors had only one distinct value, this is a character string of the
         remaining columns. Otherwise a value of NULL

Author(s)

These are heavily based on the nnet code from Brian Ripley.

References

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.

See Also

nnet, preProcess

Examples

data(BloodBrain)

## Not run:
modelFit <- avNNet(bbbDescr, logBBB, size = 5, linout = TRUE, trace = FALSE)
modelFit

predict(modelFit, bbbDescr)

## End(Not run)


bag A General Framework For Bagging

Description

bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

Usage

bag(x, ...)

bagControl(
  fit = NULL,
  predict = NULL,
  aggregate = NULL,
  downSample = FALSE,
  oob = TRUE,
  allowParallel = TRUE
)

Default S3 method:

bag(x, y, B = 10, vars = ncol(x), bagControl = NULL, ...)

S3 method for class 'bag'

predict(object, newdata = NULL, ...)

S3 method for class 'bag'

print(x, ...)

S3 method for class 'bag'

summary(object, ...)

S3 method for class 'summary.bag'

print(x, digits = max(3, getOption("digits") - 3), ...)

ldaBag

plsBag

nbBag

ctreeBag

svmBag


Details

The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, svmBag and nnetBag. Each has elements fit, pred and aggregate. One note: when vars is not NULL, the sub-setting occurs prior to when the fit and predict functions are called. In this way, the user probably does not need to account for the change in predictors in their functions. When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work. If a parallel backend is registered, the foreach package is used to train the models in parallel.
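As a hedged sketch (not part of the manual text), the example below plugs a plain linear model into bag(). The helper names lmFit, lmPred and lmAggregate are invented here, and the aggregation step simply averages the B sets of predictions.

library(caret)

## user-supplied fit/predict/aggregate functions for bagControl()
lmFit <- function(x, y, ...) lm(y ~ ., data = as.data.frame(x))
lmPred <- function(object, x) predict(object, as.data.frame(x))
lmAggregate <- function(x, type = "raw") {
  ## x is a list holding one vector of predictions per bagged model
  rowMeans(do.call("cbind", x))
}

data(BloodBrain)
lmBag <- bag(bbbDescr[, 1:10], logBBB, B = 10,
             bagControl = bagControl(fit = lmFit,
                                     predict = lmPred,
                                     aggregate = lmAggregate))
predict(lmBag, bbbDescr[1:5, 1:10])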

Value

bag produces an object of class bag with elements

fits     a list with two sub-objects: the fit object has the actual model fit for that bagged
         sample and the vars object is either NULL or a vector of integers corresponding to
         which predictors were sampled for that model
control  a mirror of the arguments passed into bagControl
call     the call
B        the number of bagging iterations
dims     the dimensions of the training set

Author(s)

Max Kuhn

Examples

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)

treebag <- bag(bbbDescr, logBBB, B = 10,
               bagControl = bagControl(fit = ctreeBag$fit,
                                       predict = ctreeBag$pred,
                                       aggregate = ctreeBag$aggregate))

## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove some zero variance predictors and linear dependencies
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

basicLDA <- train(mdrrDescr, mdrrClass, "lda")

bagLDA2 <- train(mdrrDescr, mdrrClass,
                 "bag",
                 B = 10,
                 bagControl = bagControl(fit = ldaBag$fit,
                                         predict = ldaBag$pred,
                                         aggregate = ldaBag$aggregate),
                 tuneGrid = data.frame(vars = c((1:10)*10, ncol(mdrrDescr))))

bagEarth Bagged Earth

Description

A bagging wrapper for multivariate adaptive regression splines (MARS) via the earth function

Usage

bagEarth(x, ...)

Default S3 method:

bagEarth(x, y, weights = NULL, B = 50, summary = mean, keepX = TRUE, ...)

S3 method for class 'formula'

bagEarth(formula, data = NULL, B = 50, summary = mean, keepX = TRUE, ...,
         subset, weights = NULL, na.action = na.omit)

S3 method for class 'bagEarth'

print(x, ...)

Arguments

x        matrix or data frame of 'x' values for examples.
...      arguments passed to the earth function
y        matrix or data frame of numeric outcome values.
weights  (case) weights for each example - if missing defaults to 1.
B        the number of bootstrap samples


fit2 <- bagEarth(x = trees[,-3], y = trees[,3], B = 10)

## End(Not run)

bagFDA Bagged FDA

Description

A bagging wrapper for flexible discriminant analysis (FDA) using multivariate adaptive regression splines (MARS) basis functions

Usage

bagFDA(x, ...)

Default S3 method:

bagFDA(x, y, weights = NULL, B = 50, keepX = TRUE, ...)

S3 method for class 'formula'

bagFDA(formula, data = NULL, B = 50, keepX = TRUE, ...,
       subset, weights = NULL, na.action = na.omit)

S3 method for class 'bagFDA'

print(x, ...)

Arguments

x        matrix or data frame of 'x' values for examples.
...      arguments passed to the mars function
y        matrix or data frame of numeric outcome values.
weights  (case) weights for each example - if missing defaults to 1.
B        the number of bootstrap samples
keepX    a logical: should the original training data be kept?
formula  A formula of the form y ~ x1 + x2 + ...
data     Data frame from which variables specified in 'formula' are preferentially to be taken.


subset     An index vector specifying the cases to be used in the training sample. (NOTE: If
           given, this argument must be named.)
na.action  A function to specify the action to be taken if 'NA's are found. The default action
           is for the procedure to fail. An alternative is na.omit, which leads to rejection of
           cases with missing values on any required variable. (NOTE: If given, this argument
           must be named.)

Details

The function computes an FDA model for each bootstrap sample.

Value

A list with elements

fit   a list of B FDA fits
B     the number of bootstrap samples
call  the function call
x     either NULL or the value of x, depending on the value of keepX
oob   a matrix of performance estimates for each bootstrap sample

Author(s)

Max Kuhn (bagFDA.formula is based on Ripley’s nnet.formula)

References

J. Friedman, "Multivariate Adaptive Regression Splines" (with discussion) (1991). Annals of Statistics, 19/1, 1-141.

See Also

fda, predict.bagFDA

Examples

library(mlbench)
library(earth)
data(Glass)

set.seed(36)
inTrain <- sample(1:dim(Glass)[1], 150)

trainData <- Glass[ inTrain, ]
testData  <- Glass[-inTrain, ]

set.seed(3577)
baggedFit <- bagFDA(Type ~ ., trainData)
confusionMatrix(data = predict(baggedFit, testData[, -10]),
                reference = testData$Type)


S3 method for class 'BoxCoxTrans'

print(x, newdata, digits = 3, ...)

S3 method for class 'BoxCoxTrans'

predict(object, newdata, ...)

Arguments

y          a numeric vector of data to be transformed. For BoxCoxTrans, the data must be
           strictly positive.
...        for BoxCoxTrans: options to pass to boxcox. plotit should not be passed through.
           For predict.BoxCoxTrans, additional arguments are ignored.
x          an optional dependent variable to be used in a linear model.
fudge      a tolerance value: lambda values within +/- fudge will be coerced to 0 and within
           1 +/- fudge will be coerced to 1.
numUnique  how many unique values should y have to estimate the transformation?
na.rm      a logical value indicating whether NA values should be stripped from y and x before
           the computation proceeds.
newdata    a numeric vector of values to transform.
digits     minimal number of significant digits.
object     an object of class BoxCoxTrans or expoTrans.

Details

The BoxCoxTrans function is basically a wrapper for the boxcox function in the MASS library. It can be used to estimate the transformation and apply it to new data. expoTrans estimates the exponential transformation of Manly (1976) but assumes a common mean for the data. The transformation parameter is estimated by directly maximizing the likelihood. If any(y <= 0) or if length(unique(y)) < numUnique, lambda is not estimated and no transformation is applied.
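A small sketch (not in the manual) of the rule above: lambda is estimated only for strictly positive data with enough distinct values; otherwise the object carries no estimated lambda and predict() hands the data back untransformed. The vectors used here are invented.

library(caret)

set.seed(1)
y <- rexp(50) + 0.1        # strictly positive, many unique values
BoxCoxTrans(y)$lambda      # lambda is estimated

y2 <- c(-1, y)             # any(y <= 0), so no transformation is estimated
bc2 <- BoxCoxTrans(y2)
head(predict(bc2, y2))     # values come back unchanged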

Value

Both functions return a list of class BoxCoxTrans or expoTrans with elements

lambda    estimated transformation value
fudge     value of fudge
n         number of data points used to estimate lambda
summary   the results of summary(y)
ratio     max(y)/min(y)
skewness  sample skewness statistic

BoxCoxTrans also returns:

fudge value of fudge

The predict functions return numeric vectors of transformed values


Author(s)

Max Kuhn

References

Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations (with discussion). Journal of the Royal Statistical Society B, 26, 211-252. Manly, B. L. (1976) Exponential data transformations. The Statistician, 25, 37 - 42.

See Also

boxcox, preProcess, optim

Examples

data(BloodBrain)

ratio <- exp(logBBB)
bc <- BoxCoxTrans(ratio)
bc

predict(bc, ratio[1:5])

ratio[5] <- NA
bc2 <- BoxCoxTrans(ratio, bbbDescr$tpsa, na.rm = TRUE)
bc2

manly <- expoTrans(ratio)
manly

calibration Probability Calibration Plot

Description

For classification models, this function creates a ’calibration plot’ that describes how consistent model probabilities are with observed event rates.

Usage

calibration(x, ...)

Default S3 method:

calibration(x, ...)

S3 method for class 'formula'

calibration(


Details

calibration.formula is used to process the data and xyplot.calibration is used to create the plot. To construct the calibration plot, the following steps are used for each model:

  1. The data are split into cuts - 1 roughly equal groups by their class probabilities
  2. the number of samples with true results equal to class are determined
  3. the event rate is determined for each bin

xyplot.calibration produces a plot of the observed event rate by the mid-point of the bins. This implementation uses the lattice function xyplot, so plot elements can be changed via panel functions, trellis.par.set or other means. calibration uses the panel function panel.calibration by default, but it can be changed by passing that argument into xyplot.calibration. The following elements are set by default in the plot but can be changed by passing new values into xyplot.calibration: xlab = "Bin Midpoint", ylab = "Observed Event Percentage", type = "o", ylim = extendrange(c(0, 100)), xlim = extendrange(c(0, 100)) and panel = panel.calibration. For the ggplot method, confidence intervals on the estimated proportions (from binom.test) are also shown.
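To make the binning above concrete, here is a hand-rolled sketch (not from the manual) of the calculation for a single model; probs, obs and cuts are invented stand-ins for real model output.

set.seed(10)
probs <- runif(200)                       # predicted probabilities of the event class
obs   <- factor(ifelse(runif(200) < probs, "event", "nonevent"))

cuts <- 11                                # gives cuts - 1 = 10 bins
breaks <- seq(0, 1, length.out = cuts)
bins <- cut(probs, breaks = breaks, include.lowest = TRUE)

## observed event percentage per bin, plotted against the bin midpoints
eventRate <- tapply(obs == "event", bins, mean) * 100
midpoints <- 100 * (head(breaks, -1) + tail(breaks, -1)) / 2
data.frame(midpoint = midpoints, event_pct = as.vector(eventRate))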

Value

calibration.formula returns a list with elements:

data       the data used for plotting
cuts       the number of cuts
class      the event class
probNames  the names of the model probabilities

xyplot.calibration returns a lattice object

Author(s)

Max Kuhn, some lattice code and documentation by Deepayan Sarkar

See Also

xyplot, trellis.par.set

Examples

## Not run:
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .5)]

inTrain <- createDataPartition(mdrrClass)
trainX <- mdrrDescr[ inTrain[[1]], ]
trainY <- mdrrClass[ inTrain[[1]]]
testX  <- mdrrDescr[-inTrain[[1]], ]
testY  <- mdrrClass[-inTrain[[1]]]

library(MASS)

ldaFit <- lda(trainX, trainY)
qdaFit <- qda(trainX, trainY)

testProbs <- data.frame(obs = testY,
                        lda = predict(ldaFit, testX)$posterior[,1],
                        qda = predict(qdaFit, testX)$posterior[,1])

calibration(obs ~ lda + qda, data = testProbs)

calPlotData <- calibration(obs ~ lda + qda, data = testProbs)
calPlotData

xyplot(calPlotData, auto.key = list(columns = 2))

## End(Not run)

caretSBF Selection By Filtering (SBF) Helper Functions

Description

Ancillary functions for univariate feature selection

Usage

caretSBF

anovaScores(x, y)

gamScores(x, y)

Arguments

x  a matrix or data frame of numeric predictors
y  a numeric or factor vector of outcomes

Format

An object of class list of length 5.
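As a hedged sketch of how these helpers are typically used (not part of this entry), anovaScores() can be called directly on one predictor, and the caretSBF list can be handed to sbfControl() for use with sbf(); the data set, resampling settings and model choice below are arbitrary.

library(caret)

data(mdrr)

## filter score (a p-value) for a single predictor against the two-class outcome
anovaScores(mdrrDescr[, 1], mdrrClass)

## plug the helper list into selection by filtering
filterCtrl <- sbfControl(functions = caretSBF, method = "cv", number = 5)

## Not run:
sbfFit <- sbf(mdrrDescr, mdrrClass,
              sbfControl = filterCtrl,
              ## additional arguments such as method = "lda" are forwarded to the model fit
              method = "lda")
sbfFit
## End(Not run)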