




The 'ConfusionTableR' package is a toolset designed to work with the outputs of machine learning classification models in R. It provides functions to convert confusion matrix outputs into lists, allowing for easy storage in databases and tracking of ML model performance over time. The package supports binary and multiclassification problems and offers record-level conversion of confusion matrix outputs. Traditionally, this approach has been used for highlighting model representation and feature slippage.
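The idea of record-level conversion can be illustrated without the package. The following is a minimal base R sketch and not ConfusionTableR's own implementation: it flattens a 2 x 2 confusion matrix into a single-row data frame that could be appended to a database table after each model run. The labels, the `pred_*_ref_*` column naming scheme, and the `run_date` column are assumptions made purely for illustration.

```r
# Illustrative labels standing in for model predictions and ground truth
lvls      <- c("no", "yes")
predicted <- factor(c("yes", "no", "yes", "yes", "no", "no"), levels = lvls)
truth     <- factor(c("yes", "no", "no", "yes", "no", "yes"), levels = lvls)

# Cross-tabulate predictions against the reference labels
tab <- table(Prediction = predicted, Reference = truth)

# Turn each cell of the matrix into one named column (hypothetical naming scheme)
counts     <- as.data.frame(tab, stringsAsFactors = FALSE)
cell_names <- paste0("pred_", counts$Prediction, "_ref_", counts$Reference)
record     <- as.data.frame(as.list(setNames(counts$Freq, cell_names)))

# Add a summary metric and a run date so that rows can be compared over time
record$accuracy <- sum(diag(tab)) / sum(tab)
record$run_date <- Sys.Date()

print(record)
```

Because every run produces exactly one row with the same columns, successive rows can be stacked and queried to follow model performance over time.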
Type Package
Title Confusion Matrix Toolset
Version 1.0.
Maintainer Gary Hutson <hutsons-hacks@outlook.com>
Description Takes the outputs of a 'caret' confusion matrix and allows for the quick conversion of these list items to lists. The intended usage is to allow the tool to work with the outputs of machine learning classification models. This tool works with classification problems for binary and multi-classification problems and allows for the record level conversion of the confusion matrix outputs. This is useful, as it allows quick conversion of these objects for storage in database systems and to track ML model performance over time. Traditionally, this approach has been used for highlighting model representation and feature slippage.
License MIT + file LICENSE
Encoding UTF-
RoxygenNote 7.1.
Imports dplyr, tidyr, magrittr, caret, purrr, furrr
Suggests knitr, rmarkdown, e1071, randomForest, scales, mlbench, FeatureTerminatoR
VignetteBuilder knitr
NeedsCompilation no
Repository CRAN
Collate 'MultiFramer.R' 'SingleFramer.R' 'binaryVisualiseR.R' 'dummycoder.R' 'globals.R'
Language en-US
Author Gary Hutson [aut, cre] (https://orcid.org/0000-0003-3534-6143)
Date/Publication 2021-12-01 16:30:01 UTC
R topics documented:
binary_class_cm
binary_visualiseR
dummy_encoder
multi_class_cm
binary_class_cm Binary Confusion Matrix data frame
Description
A confusion matrix object for binary classification machine learning problems.
Usage
binary_class_cm(train_labels, truth_labels, ...)
Arguments
train_labels    the classification labels from the training set
truth_labels    the testing set ground truth labels for comparison
...             function forwarding for additional 'caret' confusion matrix parameters to be passed, such as mode="everything" and positive="class label"
Value
A list containing the outputs highlighted hereunder:
Examples
library(dplyr)
library(ConfusionTableR)
library(caret)
library(tidyr)
library(mlbench)
data("BreastCancer", package = "mlbench")
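The usage above can be sketched with a few hand-made labels in place of a fitted model. This is a minimal illustration under stated assumptions: the label vectors are invented (they are not derived from the BreastCancer data), and choosing "malignant" as the positive class is only an example. The `mode` and `positive` arguments are the ones the Arguments section names as being forwarded through `...`.

```r
library(ConfusionTableR)

# Hand-made labels standing in for model predictions and test-set truth
# (illustrative values only)
lvls <- c("benign", "malignant")
predicted <- factor(
  c("benign", "malignant", "benign", "malignant",
    "benign", "malignant", "benign", "benign"),
  levels = lvls
)
truth <- factor(
  c("benign", "malignant", "malignant", "malignant",
    "benign", "benign", "benign", "benign"),
  levels = lvls
)

# mode and positive are forwarded through ... to the 'caret' confusion matrix
cm <- ConfusionTableR::binary_class_cm(
  predicted, truth,
  mode = "everything",
  positive = "malignant"
)

# Inspect the top-level elements of the returned list
str(cm, max.level = 1)
```

Using `str()` with `max.level = 1` lists the elements of the returned object without assuming their names in advance.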
binary_visualiseR
Arguments
train_labels        the classification labels from the training set
truth_labels        the testing set ground truth labels for comparison
class_label1        classification label 1, e.g. readmission into hospital
class_label2        classification label 2, e.g. not a readmission into hospital
quadrant_col1       colour of the first quadrant, specified as hexadecimal
quadrant_col2       colour of the second quadrant, specified as hexadecimal
custom_title        title of the confusion matrix plot
info_box_title      title of the confusion matrix statistics box
text_col            the colour of the text
round_dig           rounding options
cm_stat_size        the cex size of the statistics box label
cm_stat_lbl_size    the cex size of the label in the statistics box
...                 function forwarding to the confusion matrix object to pass additional args, such as positive = "Class label"
Value
Returns a plot of the confusion matrix output.
Examples
library(dplyr)
library(ConfusionTableR)
library(caret)
library(tidyr)
library(mlbench)

data("BreastCancer", package = "mlbench")
breast <- BreastCancer[complete.cases(BreastCancer), ] #Create a copy
breast <- breast[, -1]
breast <- breast[1:100,]
breast$Class <- factor(breast$Class) # Create as factor
for(i in 1:9) {
  breast[, i] <- as.numeric(as.character(breast[, i]))
}

#Perform train / test split on the data
train_split_idx <- caret::createDataPartition(breast$Class, p = 0.75, list = FALSE)
train <- breast[train_split_idx, ]
test <- breast[-train_split_idx, ]
rf_fit <- caret::train(Class ~ ., data=train, method="rf")
#Make predictions to expose class labels
preds <- predict(rf_fit, newdata=test, type="raw")
predicted <- cbind(data.frame(class_preds=preds), test)

ConfusionTableR::binary_visualiseR(predicted$class_preds, predicted$Class)
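The optional arguments listed above can be combined as in the following sketch. It uses hand-made labels rather than a fitted model so that it runs on its own. The label text, hexadecimal colours, titles, and the choice of "malignant" as the positive class are all illustrative assumptions; only the argument names are taken from the Arguments section.

```r
library(ConfusionTableR)

# Hand-made labels standing in for model predictions and test-set truth
# (illustrative values only)
lvls <- c("benign", "malignant")
predicted <- factor(
  c("benign", "malignant", "benign", "malignant",
    "benign", "malignant", "benign", "benign"),
  levels = lvls
)
truth <- factor(
  c("benign", "malignant", "malignant", "malignant",
    "benign", "benign", "benign", "benign"),
  levels = lvls
)

# All values below are example choices. Which quadrant each label and
# colour is attached to is best confirmed against the resulting plot.
ConfusionTableR::binary_visualiseR(
  train_labels   = predicted,
  truth_labels   = truth,
  class_label1   = "Benign",
  class_label2   = "Malignant",
  quadrant_col1  = "#28ACB4",
  quadrant_col2  = "#4397D2",
  custom_title   = "Breast cancer confusion matrix",
  info_box_title = "Model statistics",
  text_col       = "black",
  round_dig      = 2,
  positive       = "malignant"  # forwarded through ... to the confusion matrix
)
```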
dummy_encoder Dummy Encoder function to encode multiple columns at once
Description
This function has been designed to encode multiple columns at once and allows the user to specify whether to drop the reference columns or retain them in the data.
Usage
dummy_encoder(df, columns, map_fn = furrr::future_map, remove_original = TRUE)
Arguments
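df - data.frame object to pass to the function
columns - vector of columns to be encoded for dummy encoding
map_fn - choice of mapping function; purrr::map or furrr::future_map accepted
remove_original - whether to drop the reference (original) columns after encoding or retain them in the data; defaults to TRUE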
Value
A tibble containing the dummy encodings
Examples
library(dplyr) # provides %>% and the select helpers
library(ConfusionTableR)

#Use the NHSR stranded dataset
df <- NHSRdatasets::stranded_data
#Create a function to select categorical variables
sep_categorical <- function(df){
  cats <- df %>%
    dplyr::select_if(is.character)
  return(cats)
}
cats <- sep_categorical(df) %>%
  dplyr::select(-c(admit_date))
#Dummy encoding
columns_vector <- c(names(cats))
dummy_encodings <- dummy_encoder(cats, columns_vector)
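The two optional arguments in the usage can be exercised on a small hand-made data frame, as sketched below. The `toy` data and its column names are hypothetical and exist only for this illustration. The sketch passes `purrr::map`, which the Arguments section lists as an accepted mapping function, so that no parallel backend is involved, and sets `remove_original = FALSE` on the assumption that this keeps the source columns alongside the encodings.

```r
library(ConfusionTableR)

# Hypothetical toy data with two categorical (character) columns
toy <- data.frame(
  ward = c("A", "B", "A", "C"),
  sex  = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Encode both columns sequentially with purrr::map and keep the originals
encoded <- dummy_encoder(
  toy,
  columns         = c("ward", "sex"),
  map_fn          = purrr::map,
  remove_original = FALSE
)

# Inspect the resulting columns
str(encoded)
```

Swapping `purrr::map` for the default `furrr::future_map` should only change how the per-column work is dispatched, not the shape of the result.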
multi_class_cm
rf_model <- caret::train(Species ~ ., data = df, method = "rf", metric = "Accuracy")

rf_class <- predict(rf_model, newdata = test, type = "raw")
predictions <- cbind(data.frame(train_preds=rf_class, test$Species))

cm <- ConfusionTableR::multi_class_cm(predictions$train_preds, predictions$test.Species)

cm_rl <- cm$record_level_cm
print(cm_rl)
#Expose the original confusion matrix list
cm_orig <- cm$confusion_matrix
print(cm_orig)
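The example above relies on `df` and `test` objects that are defined earlier in the manual and are not part of this copy. A self-contained sketch of the same call is given here, assuming the built-in `iris` data and a few deliberately flipped labels in place of random forest predictions. The flipped rows are arbitrary; the `record_level_cm` and `confusion_matrix` elements are the ones accessed in the example above.

```r
library(ConfusionTableR)

# Ground truth: the three-level Species factor from the built-in iris data
truth <- iris$Species

# Stand-in "predictions": copy the truth and flip a few labels
# so that the confusion matrix is not perfectly diagonal
preds <- truth
preds[c(51, 52, 101)] <- c("virginica", "virginica", "versicolor")

cm <- ConfusionTableR::multi_class_cm(preds, truth)

# Record level output, suitable for storing one row per model run
cm_rl <- cm$record_level_cm
print(cm_rl)

# The original confusion matrix list
cm_orig <- cm$confusion_matrix
print(cm_orig)
```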