




























































































Title Classification and Regression Training
Version 6.0-
Description Misc functions for training and plotting classification and regression models.
License GPL (>= 2)
URL https://github.com/topepo/caret/
BugReports https://github.com/topepo/caret/issues
Depends ggplot2, lattice (>= 0.20), R (>= 3.2.0)
Imports e1071, foreach, grDevices, methods, ModelMetrics (>= 1.2.2.2), nlme, plyr, pROC, recipes (>= 0.1.10), reshape2, stats, stats4, utils, withr (>= 2.0.0)
Suggests BradleyTerry2, covr, Cubist, dplyr, earth (>= 2.2-3), ellipse, fastICA, gam (>= 1.15), ipred, kernlab, klaR, knitr, MASS, Matrix, mda, mgcv, mlbench, MLmetrics, nnet, pamr, party (>= 0.9-99992), pls, proxy, randomForest, RANN, rmarkdown, rpart, spls, subselect, superpc, testthat (>= 0.9.1), themis (>= 0.1.3)
VignetteBuilder knitr
Encoding UTF-
RoxygenNote 7.1.
NeedsCompilation yes
Author Max Kuhn [aut, cre] (https://orcid.org/0000-0003-2402-136X), Jed Wing [ctb], Steve Weston [ctb], Andre Williams [ctb], Chris Keefer [ctb], Allan Engelhardt [ctb], Tony Cooper [ctb], Zachary Mayer [ctb], Brenton Kenkel [ctb], R Core Team [ctb], Michael Benesty [ctb], Reynald Lescarbeau [ctb], Andrew Ziem [ctb], Luca Scrucca [ctb], Yuan Tang [ctb], Can Candan [ctb], Tyler Hunt [ctb]
Maintainer Max Kuhn <mxkuhn@gmail.com>
Repository CRAN
Date/Publication 2022-04-19 06:52:35 UTC
R topics documented:
as.matrix.confusionMatrix
avNNet
bag
bagEarth
bagFDA
BloodBrain
BoxCoxTrans
calibration
caretSBF
cars
classDist
confusionMatrix
confusionMatrix.train
cox2
createDataPartition
defaultSummary
densityplot.rfe
dhfr
diff.resamples
dotPlot
dotplot.diff.resamples
downSample
dummyVars
extractPrediction
featurePlot
filterVarImp
findCorrelation
findLinearCombos
format.bagEarth
gafs.default
gafsControl
gafs_initial
GermanCredit
getSamplingInfo
segmentationData
SLC14_1
spatialSign
summary.bagEarth
tecator
thresholder
train
trainControl
train_model_list
update.safs
update.train
varImp
varImp.gafs
var_seq
xyplot.resamples
Index
as.matrix.confusionMatrix Confusion matrix as a table
Description
Conversion functions for class confusionMatrix
Usage
as.matrix(x, what = "xtabs", ...)
Arguments
x       an object of class confusionMatrix
what    data to convert to matrix. Either "xtabs", "overall" or "classes"
...     not currently used
Details
For as.table, the cross-tabulations are saved. For as.matrix, the three object types are saved in matrix format.
Value
A matrix or table
Author(s)
Max Kuhn
Examples
###################
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))

xtab <- table(pred, truth)

results <- confusionMatrix(xtab)
as.table(results)
as.matrix(results)
as.matrix(results, what = "overall")
as.matrix(results, what = "classes")

###################

xtab <- confusionMatrix(iris$Species, sample(iris$Species))
as.matrix(xtab)
avNNet Neural Networks Using Model Averaging
Description
Aggregate several neural network models
Usage
avNNet(x, ...)
avNNet(
  formula,
  data,
  weights,
  ...,
  repeats = 5,
  bag = FALSE,
  allowParallel = TRUE,
  seeds = sample.int(1e+05, repeats),
newdata    matrix or data frame of test examples. A vector is considered to be a row vector comprising a single case.
type       Type of output, either: raw for the raw outputs, code for the predicted class or prob for the class probabilities.
Details
Following Ripley (1996), the same neural network model is fit using different random number seeds. All the resulting models are used for prediction. For regression, the outputs from the networks are averaged. For classification, the model scores are first averaged, then translated to predicted classes. Bagging can also be used to create the models.
If a parallel backend is registered, the foreach package is used to train the networks in parallel.
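The averaging described above can be sketched directly with nnet. This is only an illustration of the idea for the regression case, not the avNNet implementation: the simulated data, the three seed values and the network size are assumptions made for the example.

```r
library(nnet)  # recommended package shipped with R

# Simulated regression data (an assumption for this sketch)
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100, sd = 0.1)

# Fit the same network several times, changing only the random seed
seeds <- c(11, 22, 33)  # arbitrary illustrative seeds
fits <- lapply(seeds, function(s) {
  set.seed(s)
  nnet(x, y, size = 3, linout = TRUE, trace = FALSE)
})

# One column of predictions per network, then average across networks
preds <- sapply(fits, function(f) predict(f, x)[, 1])
avg_pred <- rowMeans(preds)
head(avg_pred)
```

For classification, the analogous step would average the class scores across networks before picking the class with the largest averaged score.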
Value
For avNNet, an object of class "avNNet" or "avNNet.formula". Items of interest in the output are:
model      a list of the models generated from nnet
repeats    an echo of the model input
names      if any predictors had only one distinct value, this is a character string of the remaining columns. Otherwise a value of NULL
Author(s)
These are heavily based on the nnet code from Brian Ripley.
References
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.
See Also
nnet, preProcess
Examples
data(BloodBrain)
modelFit <- avNNet(bbbDescr, logBBB, size = 5, linout = TRUE, trace = FALSE)
modelFit
predict(modelFit, bbbDescr)
bag A General Framework For Bagging
Description
bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).
Usage
bag(x, ...)
bagControl( fit = NULL, predict = NULL, aggregate = NULL, downSample = FALSE, oob = TRUE, allowParallel = TRUE )
bag(x, y, B = 10, vars = ncol(x), bagControl = NULL, ...)
predict(object, newdata = NULL, ...)
print(x, ...)
summary(object, ...)
print(x, digits = max(3, getOption("digits") - 3), ...)
ldaBag
plsBag
nbBag
ctreeBag
svmBag
Details
The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, svmBag and nnetBag. Each has elements fit, pred and aggregate.
One note: when vars is not NULL, the sub-setting occurs before the fit and predict functions are called. In this way, the user probably does not need to account for the change in predictors in their functions.
When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
If a parallel backend is registered, the foreach package is used to train the models in parallel.
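The plug-in structure can be illustrated in base R. The sketch below is not the code behind bag: fit_fun, pred_fun, agg_fun, simple_bag and predict_bag are hypothetical names, and a linear model on the built-in trees data stands in for a user-supplied model. It shows rows being bootstrapped, predictors optionally sub-set before fitting, and per-model predictions being aggregated.

```r
# Hypothetical plug-in functions mirroring the fit / pred / aggregate elements
fit_fun  <- function(x, y) lm(y ~ ., data = data.frame(x, y = y, row.names = NULL))
pred_fun <- function(object, x) predict(object, newdata = data.frame(x))
agg_fun  <- function(preds) rowMeans(do.call(cbind, preds))

# Hypothetical bagging loop: B bootstrap fits, each on 'vars' sampled predictors
simple_bag <- function(x, y, B = 10, vars = ncol(x)) {
  lapply(seq_len(B), function(i) {
    rows <- sample(nrow(x), replace = TRUE)  # bootstrap sample of rows
    cols <- sort(sample(ncol(x), vars))      # predictor sub-setting before the fit
    list(fit = fit_fun(x[rows, cols, drop = FALSE], y[rows]), vars = cols)
  })
}

# Predict with every fit on its own predictor subset, then aggregate
predict_bag <- function(fits, newdata) {
  preds <- lapply(fits, function(m) {
    pred_fun(m$fit, newdata[, m$vars, drop = FALSE])
  })
  agg_fun(preds)
}

set.seed(42)
x <- as.matrix(trees[, c("Girth", "Height")])
y <- trees$Volume
bagged <- simple_bag(x, y, B = 25)
bag_pred <- predict_bag(bagged, x)
head(bag_pred)
```

Because the sub-setting happens inside the loop, the plug-in functions never need to know which predictors were sampled.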
Value
bag produces an object of class bag with elements
fits       a list with two sub-objects: the fit object has the actual model fit for that bagged sample and the vars object is either NULL or a vector of integers corresponding to which predictors were sampled for that model
control    a mirror of the arguments passed into bagControl
call       the call
B          the number of bagging iterations
dims       the dimensions of the training set
Author(s)
Max Kuhn
Examples
data(BloodBrain)
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]
bagEarth Bagged Earth
Description
A bagging wrapper for multivariate adaptive regression splines (MARS) via the earth function
Usage
bagEarth(x, ...)
bagEarth(x, y, weights = NULL, B = 50, summary = mean, keepX = TRUE, ...)
bagEarth( formula, data = NULL, B = 50, summary = mean, keepX = TRUE, ..., subset, weights = NULL, na.action = na.omit )
print(x, ...)
Arguments
x          matrix or data frame of 'x' values for examples.
...        arguments passed to the earth function
y          matrix or data frame of numeric values outcomes.
weights    (case) weights for each example - if missing defaults to 1.
B          the number of bootstrap samples
fit2 <- bagEarth(x = trees[,-3], y = trees[,3], B = 10)
bagFDA Bagged FDA
Description
A bagging wrapper for flexible discriminant analysis (FDA) using multivariate adaptive regression splines (MARS) basis functions
Usage
bagFDA(x, ...)
bagFDA(x, y, weights = NULL, B = 50, keepX = TRUE, ...)
bagFDA( formula, data = NULL, B = 50, keepX = TRUE, ..., subset, weights = NULL, na.action = na.omit )
print(x, ...)
Arguments
x            matrix or data frame of 'x' values for examples.
...          arguments passed to the mars function
y            matrix or data frame of numeric values outcomes.
weights      (case) weights for each example - if missing defaults to 1.
B            the number of bootstrap samples
keepX        a logical: should the original training data be kept?
formula      A formula of the form y ~ x1 + x2 + ...
data         Data frame from which variables specified in 'formula' are preferentially to be taken.
subset       An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
na.action    A function to specify the action to be taken if 'NA's are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
Details
The function computes an FDA model for each bootstrap sample.
Value
A list with elements
fit     a list of B FDA fits
B       the number of bootstrap samples
call    the function call
x       either NULL or the value of x, depending on the value of keepX
oob     a matrix of performance estimates for each bootstrap sample
Author(s)
Max Kuhn (bagFDA.formula is based on Ripley’s nnet.formula)
References
J. Friedman, “Multivariate Adaptive Regression Splines” (with discussion) (1991). Annals of Statistics, 19/1, 1-141.
See Also
fda, predict.bagFDA
Examples
library(mlbench)
library(earth)
data(Glass)

set.seed(36)
inTrain <- sample(1:dim(Glass)[1], 150)

trainData <- Glass[ inTrain, ]
testData <- Glass[-inTrain, ]

set.seed(3577)
baggedFit <- bagFDA(Type ~ ., trainData)
confusionMatrix(data = predict(baggedFit, testData[, -10]),
print(x, newdata, digits = 3, ...)
predict(object, newdata, ...)
Arguments
y            a numeric vector of data to be transformed. For BoxCoxTrans, the data must be strictly positive.
...          for BoxCoxTrans: options to pass to boxcox. plotit should not be passed through. For predict.BoxCoxTrans, additional arguments are ignored.
x            an optional dependent variable to be used in a linear model.
fudge        a tolerance value: lambda values within +/-fudge will be coerced to 0 and within 1+/-fudge will be coerced to 1.
numUnique    how many unique values should y have to estimate the transformation?
na.rm        a logical value indicating whether NA values should be stripped from y and x before the computation proceeds.
newdata      a numeric vector of values to transform.
digits       minimal number of significant digits.
object       an object of class BoxCoxTrans or expoTrans.
Details
BoxCoxTrans is essentially a wrapper for the boxcox function in the MASS package. It can be used to estimate the transformation and apply it to new data.
expoTrans estimates the exponential transformation of Manly (1976) but assumes a common mean for the data. The transformation parameter is estimated by directly maximizing the likelihood.
If any(y <= 0) or if length(unique(y)) < numUnique, lambda is not estimated and no transformation is applied.
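The estimate-then-apply workflow, including the fudge coercion, can be sketched with MASS alone. This is an illustration under stated assumptions rather than the BoxCoxTrans source: the simulated data, the lambda grid, the fudge value of 0.2 and the helper name box_cox are all chosen for the example.

```r
library(MASS)  # provides boxcox()

set.seed(7)
y <- rlnorm(200)  # strictly positive, right-skewed data (simulated)

# Profile the log-likelihood over a grid of lambda values; keep the maximizer
bc <- boxcox(y ~ 1, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]

# Coerce values near 0 or 1, as described for the fudge argument
fudge <- 0.2
if (abs(lambda) < fudge) lambda <- 0      # treated as a log transformation
if (abs(lambda - 1) < fudge) lambda <- 1  # treated as no transformation

# Hypothetical helper applying the usual Box-Cox formula
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else if (lambda == 1) y else (y^lambda - 1) / lambda
}

y_trans <- box_cox(y, lambda)
summary(y_trans)
```

Keeping the estimated lambda separate from the helper is what allows the same transformation to be applied later to new data.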
Value
Both functions return a list of class either BoxCoxTrans or expoTrans with elements
lambda      estimated transformation value
fudge       value of fudge
n           number of data points used to estimate lambda
summary     the results of summary(y)
ratio       max(y)/min(y)
skewness    sample skewness statistic
BoxCoxTrans also returns:
fudge       value of fudge
The predict functions return numeric vectors of transformed values
Author(s)
Max Kuhn
References
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations (with discussion). Journal of the Royal Statistical Society B, 26, 211-252.
Manly, B. L. (1976) Exponential data transformations. The Statistician, 25, 37-42.
See Also
boxcox, preProcess, optim
Examples
data(BloodBrain)
ratio <- exp(logBBB)
bc <- BoxCoxTrans(ratio)
bc

predict(bc, ratio[1:5])

ratio[5] <- NA
bc2 <- BoxCoxTrans(ratio, bbbDescr$tpsa, na.rm = TRUE)
bc2

manly <- expoTrans(ratio)
manly
calibration Probability Calibration Plot
Description
For classification models, this function creates a ’calibration plot’ that describes how consistent model probabilities are with observed event rates.
Usage
calibration(x, ...)
calibration(
Details
calibration.formula is used to process the data and xyplot.calibration is used to create the plot. To construct the calibration plot, the following steps are used for each model:
xyplot.calibration produces a plot of the observed event rate by the mid-point of the bins.
This implementation uses the lattice function xyplot, so plot elements can be changed via panel functions, trellis.par.set or other means. calibration uses the panel function panel.calibration by default, but it can be changed by passing that argument into xyplot.calibration.
The following elements are set by default in the plot but can be changed by passing new values into xyplot.calibration: xlab = "Bin Midpoint", ylab = "Observed Event Percentage", type = "o", ylim = extendrange(c(0,100)), xlim = extendrange(c(0,100)) and panel = panel.calibration.
For the ggplot method, confidence intervals on the estimated proportions (from binom.test) are also shown.
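The quantity being plotted, the observed event percentage against the bin midpoint, can be computed by hand in base R. The sketch below is an assumption-laden illustration and not the calibration source: it simulates probabilities and outcomes, assumes equal-width bins on [0, 1], and uses 11 cut points to obtain 10 bins.

```r
set.seed(10)
prob  <- runif(500)                          # simulated class probabilities
event <- rbinom(500, size = 1, prob = prob)  # 1 = event observed

cuts   <- 11                                 # 11 cut points give 10 bins
breaks <- seq(0, 1, length.out = cuts)
bin    <- cut(prob, breaks = breaks, include.lowest = TRUE)

# One row per bin: midpoint, observed event percentage, number of samples
cal <- data.frame(
  midpoint = 100 * (head(breaks, -1) + tail(breaks, -1)) / 2,
  percent  = 100 * as.vector(tapply(event, bin, mean)),
  count    = as.vector(table(bin))
)
cal
```

For a well-calibrated model the percent column tracks the midpoint column, which is the diagonal on the calibration plot.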
Value
calibration.formula returns a list with elements:
data         the data used for plotting
cuts         the number of cuts
class        the event class
probNames    the names of the model probabilities
xyplot.calibration returns a lattice object
Author(s)
Max Kuhn, some lattice code and documentation by Deepayan Sarkar
See Also
xyplot, trellis.par.set
Examples
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .5)]

inTrain <- createDataPartition(mdrrClass)
trainX <- mdrrDescr[inTrain[[1]], ]
trainY <- mdrrClass[inTrain[[1]]]
testX <- mdrrDescr[-inTrain[[1]], ]
testY <- mdrrClass[-inTrain[[1]]]
library(MASS)
ldaFit <- lda(trainX, trainY)
qdaFit <- qda(trainX, trainY)

testProbs <- data.frame(obs = testY,
                        lda = predict(ldaFit, testX)$posterior[,1],
                        qda = predict(qdaFit, testX)$posterior[,1])
calibration(obs ~ lda + qda, data = testProbs)
calPlotData <- calibration(obs ~ lda + qda, data = testProbs)
calPlotData
xyplot(calPlotData, auto.key = list(columns = 2))
caretSBF Selection By Filtering (SBF) Helper Functions
Description
Ancillary functions for univariate feature selection
Usage
caretSBF
anovaScores(x, y)
gamScores(x, y)
Arguments
x    a matrix or data frame of numeric predictors
y    a numeric or factor vector of outcomes
Format
An object of class list of length 5.