Data Transformation with Data Table Cheat Sheet | Cheat Sheet Data Structures and Algorithms

Basics

CC BY SA Erik Petrovski • Updated: 2018-09

Data Transformation with data.table

CHEAT SHEET

Manipulate columns with j

Functions for data.tables

data.table is an extremely fast and memory-efficient package for transforming data in R. It works by converting R’s native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

dt[i, j, by]

Take data.table dt, subset rows using i, and manipulate columns with j, grouped according to by.
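A minimal sketch of the three parts working together. It assumes the data.table package is installed and uses a hypothetical toy table with columns a and b:

```r
library(data.table)

# Hypothetical toy data: a is the grouping column, b is numeric
dt <- data.table(a = c("x", "x", "y", "y"), b = c(1, 2, 3, 4))

# i keeps rows with b > 1, j sums b, by produces one result row per value of a
res <- dt[b > 1, .(total = sum(b)), by = a]
print(res)
```

Groups appear in the order they are first met in the filtered rows, so x comes before y.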

data.tables are also data frames, so functions that work with data frames also work with data.tables.
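Because a data.table inherits from data.frame, base functions written for data frames accept it directly. A small check, assuming a hypothetical table with columns a and b:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c("p", "q", "r"))

# The class vector lists "data.frame" after "data.table"
print(class(dt))          # "data.table" "data.frame"

# Ordinary data frame functions keep working
print(is.data.frame(dt))
print(nrow(dt))
print(names(dt))
```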

Create a data.table

data.table(a = c(1, 2), b = c("a", "b")) – create a data.table from scratch. Analogous to data.frame().

setDT(df)* or as.data.table(df) – convert a data frame or a list to a data.table.
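A sketch of the three creation routes, with hypothetical objects df and dt_copy. as.data.table() returns a new object that must be assigned, while setDT() converts its argument in place:

```r
library(data.table)

# From scratch, analogous to data.frame()
dt <- data.table(a = c(1, 2), b = c("a", "b"))

# From a list: as.data.table() returns a new data.table
dt_from_list <- as.data.table(list(a = c(1, 2), b = c("a", "b")))

# From a data frame: as.data.table() makes a converted copy ...
df <- data.frame(a = c(1, 2), b = c("a", "b"))
dt_copy <- as.data.table(df)

# ... whereas setDT() converts df itself, with no "<-"
setDT(df)
print(class(df))
```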

Subset rows using i

dt[1:2, ] – subset rows based on row numbers.

dt[a > 5, ] – subset rows based on values in one or more columns.

LOGICAL OPERATORS TO USE IN i

<   <=   is.na()    %in%   |   %like%

>   >=   !is.na()   !      &   %between%
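A sketch of row subsetting with several of the operators above, assuming a hypothetical table with one missing value in a. In i, a logical NA is treated as FALSE, so rows where the condition cannot be evaluated are dropped:

```r
library(data.table)

# Hypothetical data with one missing value in a
dt <- data.table(a = c(3, 6, 8, NA), b = c("apple", "banana", "cherry", "date"))

first_two <- dt[1:2, ]                      # rows by position
big       <- dt[a > 5, ]                    # NA comparisons count as FALSE
either    <- dt[a > 5 | b == "apple"]       # combine conditions with | (or &)
no_a      <- dt[is.na(a)]                   # rows where a is missing
in_range  <- dt[a %between% c(3, 6)]        # 3 <= a <= 6, bounds included
picked    <- dt[b %in% c("apple", "date")]  # membership test
matched   <- dt[b %like% "an"]              # pattern match, like grepl()
```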

EXTRACT

dt[, c(2)] – extract column(s) by number. Prefix column numbers with “-” to drop.

dt[, .(b, c)] – extract column(s) by name.
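A sketch of column extraction on a hypothetical three-column table. Selecting by number or with .() returns a data.table; naming a single column without .() returns a plain vector instead:

```r
library(data.table)

dt <- data.table(a = 1:2, b = c("x", "y"), c = c(TRUE, FALSE))

by_number <- dt[, c(2)]     # data.table holding column 2 (b)
dropped   <- dt[, -c(2)]    # every column except column 2
by_name   <- dt[, .(b, c)]  # data.table holding columns b and c
as_vector <- dt[, b]        # without .(), one column comes back as a vector
```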

dt[, .(x = sum(a))] – create a data.table with new columns based on the summarized values of rows.

Summary functions like mean(), median(), min(), max(), etc. may be used to summarize rows.
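A minimal sketch, assuming a hypothetical numeric column a: wrapping several summaries in .() yields a one-row data.table with one column per summary.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3, 4))

# Each named element of .() becomes a column of the one-row result
summ <- dt[, .(x = sum(a), avg = mean(a), med = median(a), lo = min(a), hi = max(a))]
print(summ)
```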

COMMON GROUPED OPERATIONS

dt[, .(c = sum(b)), by = a] – summarize rows within groups.

dt[, c := sum(b), by = a] – create a new column computed within groups; every row receives its group’s result.

dt[, .SD[1], by = a] – extract first row of groups.

dt[, .SD[.N], by = a] – extract last row of groups.
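A sketch of the four grouped operations on a hypothetical table with two groups. .SD is the subset of data for the current group (excluding the grouping column) and .N is its row count. The := step runs last so that the new column does not show up in the .SD results:

```r
library(data.table)

dt <- data.table(a = c(1, 1, 2, 2, 2), b = c(10, 20, 30, 40, 50))

sums   <- dt[, .(c = sum(b)), by = a]  # one summary row per group
firsts <- dt[, .SD[1], by = a]         # first row of each group
lasts  <- dt[, .SD[.N], by = a]        # last row of each group

# Adds column c to dt in place; each row gets its group's total
dt[, c := sum(b), by = a]
print(dt)
```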

COMPUTE COLUMNS*

dt[, c := 1 + 2] – compute a column based on an expression.
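A minimal sketch, assuming a hypothetical column a. Since := modifies dt in place, neither line needs "<-". A length-one result is recycled to every row, and the expression may refer to existing columns:

```r
library(data.table)

dt <- data.table(a = c(2, 1))

dt[, c := 1 + 2]    # constant 3 recycled to every row
dt[, d := a * 10]   # computed from an existing column
print(dt)
```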

REORDER

setorder(dt, a, -b) – reorder a data.table according to specified columns. Prefix column names with “-” for descending order.
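A sketch with three hypothetical rows: setorder() sorts by a ascending, breaks ties by b descending, and reorders dt in place rather than returning a sorted copy.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 1), b = c(2, 2, 1))

setorder(dt, a, -b)   # a ascending, then b descending
print(dt)
```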

dt[a == 1, c := 1 + 2] – compute a column based on an expression but only for a subset of rows; the other rows get NA.

a  c
2  NA
1  3
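A sketch of the conditional form, assuming a hypothetical table whose a column holds 2 and 1. Only rows matching i receive the computed value, and a newly created column is NA everywhere else:

```r
library(data.table)

dt <- data.table(a = c(2, 1))

dt[a == 1, c := 1 + 2]   # only the row with a == 1 gets 3
print(dt)
```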

Group according to by

dt[, j, by = .(a)] – group rows by values in specified column(s).

dt[, j, keyby = .(a)] – group rows by values in specified column(s), then sort the result by those column(s) and set them as its key.
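A sketch contrasting the two, with a hypothetical grouping column a. by keeps groups in order of first appearance; keyby sorts them and marks the result as keyed:

```r
library(data.table)

dt <- data.table(a = c("y", "x", "y"), b = c(1, 2, 3))

grp  <- dt[, .(total = sum(b)), by = .(a)]     # groups in order seen: y, x
kgrp <- dt[, .(total = sum(b)), keyby = .(a)]  # groups sorted: x, y

print(key(grp))    # NULL: by does not set a key
print(key(kgrp))   # "a"
```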

Chaining

dt[…][…] – perform a sequence of data.table operations by chaining multiple “[]”.
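A sketch of a three-step chain on hypothetical data: summarize by group, keep groups whose total exceeds 7, then sort by total descending. Each "[]" operates on the result of the previous one:

```r
library(data.table)

dt <- data.table(a = c(1, 1, 2, 3), b = c(5, 6, 7, 8))

res <- dt[, .(total = sum(b)), by = a][total > 7][order(-total)]
print(res)
```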

*SET FUNCTIONS AND :=

data.table’s functions prefixed with “set” and the operator “:=” modify data in place: they work without “<-” and do not make copies in memory. For example, the more efficient “setDT(df)” is analogous to “df <- as.data.table(df)”.
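A sketch of the in-place behavior, using a hypothetical data frame df. Because nothing is copied, a plain "<-" only creates a second name for the same table, so changes through either name affect both. data.table's copy() gives an independent object:

```r
library(data.table)

df <- data.frame(a = 1:3)
setDT(df)                 # df itself is now a data.table
df[, b := a * 2L]         # adds column b to df in place

alias <- df               # NOT a copy: both names point to one table
alias[, c := 0L]          # so df gains column c as well

independent <- copy(df)   # a genuinely separate table
independent[, d := 1L]    # df is unaffected
print(names(df))
```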

dt[, `:=`(c = 1, d = 2)] – compute multiple columns based on separate expressions.
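A sketch with a hypothetical column a, showing the functional `:=`() form from the sheet alongside the equivalent form that puts a character vector of new names on the left of := and a list of values on the right:

```r
library(data.table)

dt <- data.table(a = c(1, 2))

dt[, `:=`(c = 1, d = 2)]                 # several columns in one call
dt[, c("e", "f") := list(a + 1, a * 2)]  # names on the left, values on the right
print(dt)
```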

DELETE COLUMN

dt[, c := NULL] – delete a column.
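A sketch on a hypothetical four-column table. Assigning NULL removes a column in place; several columns can be removed at once by naming them in a character vector:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

dt[, c := NULL]              # remove one column
dt[, c("b", "d") := NULL]    # remove several columns
print(names(dt))
```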

CONVERT COLUMN TYPE

dt[, b := as.integer(b)] – convert the type of a column using as.integer(), as.numeric(), as.character(), as.Date(), etc.
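A sketch with hypothetical columns b and when. Without i, := replaces the whole column, so its type can change. Note that as.integer() truncates toward zero rather than rounding:

```r
library(data.table)

dt <- data.table(b = c(1.5, 2.6), when = c("2018-09-01", "2018-09-15"))

dt[, b := as.integer(b)]      # 1.5, 2.6 become 1L, 2L
dt[, when := as.Date(when)]   # character strings become Date values
str(dt)
```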
