Summarise Cases

group_by(.data, ..., add = FALSE) Returns copy of table grouped by …

g_iris <- group_by(iris, Species)

ungroup(x, …) Returns ungrouped copy of table.

ungroup(g_iris)

Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results.

mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))
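
As a minimal sketch of the grouped workflow described above, the following uses the built-in mtcars data. The column names avg and n and the object names by_cyl and g_mtcars are illustrative choices, not part of the cheat sheet.

```r
library(dplyr)

# Group the built-in mtcars data by number of cylinders, then
# compute one summary row per group.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg), n = n())

# One row per distinct value of cyl (4, 6 and 8).
nrow(by_cyl)

# ungroup() drops the grouping, so later verbs act on the whole table.
g_mtcars <- group_by(mtcars, cyl)
group_vars(g_mtcars)           # "cyl"
group_vars(ungroup(g_mtcars))  # character(0)
```

Without the group_by() step, summarise() would collapse the whole table into a single row.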

These apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).

VARIATIONS

summarise_all() - Apply funs to every column.

summarise_at() - Apply funs to specific columns.

summarise_if() - Apply funs to all cols of one type.

summarise(.data, …) Compute table of summaries.

summarise(mtcars, avg = mean(mpg))

count(x, ..., wt = NULL, sort = FALSE) Count number of rows in each group defined by the variables in … Also tally().

count(iris, Species)
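
To make the relationship between count() and the grouped summarise() pattern concrete, here is a small sketch on the built-in iris and mtcars data. The object names counts and manual are hypothetical.

```r
library(dplyr)

# count() is shorthand for grouping and then counting rows.
counts <- count(iris, Species)

# The same counts written out with group_by() and summarise().
manual <- iris %>%
  group_by(Species) %>%
  summarise(n = n())

counts$n  # 50 50 50
manual$n  # 50 50 50

# wt sums a column within each group instead of counting rows.
count(mtcars, cyl, wt = hp)
```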

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2019-08

dplyr functions work with pipes and expect tidy data. In tidy data:

Each variable is in its own column

Each observation, or case, is in its own row

With pipes, x %>% f(y) becomes f(x, y)

filter(.data, …) Extract rows that meet logical criteria.

filter(iris, Sepal.Length > 7)
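
The pipe rule can be checked directly with filter(). This is only a sketch on the built-in iris data, and the object names a and b are hypothetical.

```r
library(dplyr)

# Ordinary call: the data frame is the first argument.
a <- filter(iris, Sepal.Length > 7)

# Piped call: iris is passed as the first argument of filter().
b <- iris %>% filter(Sepal.Length > 7)

identical(a, b)  # TRUE

# Pipes chain steps left to right without nesting calls.
iris %>%
  filter(Sepal.Length > 7) %>%
  distinct(Species)
```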

distinct(.data, ..., .keep_all = FALSE) Remove rows with duplicate values.

distinct(iris, Species)

sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select fraction of rows.

sample_frac(iris, 0.5, replace = TRUE)

sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select size rows.

sample_n(iris, 10, replace = TRUE)

slice(.data, …) Select rows by position.

slice(iris, 10:15)

top_n(x, n, wt) Select and order top n entries (by group if grouped data).

top_n(iris, 5, Sepal.Width)

Row functions return a subset of rows as a new table.

See ?base::Logic and ?Comparison for help.

!is.na()

is.na()

%in%

xor()
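
The operators listed here, together with the comparison and logical operators documented in ?Comparison and ?base::Logic, can be combined inside filter(). The sketch below uses the built-in iris, mtcars and airquality data, and the object names are hypothetical.

```r
library(dplyr)

# %in% tests membership; & requires both conditions to hold.
wide <- filter(iris, Species %in% c("setosa", "virginica") & Sepal.Width > 3.5)

# | keeps rows where at least one condition is TRUE.
fast_or_frugal <- filter(mtcars, mpg > 30 | hp > 300)

# xor() keeps rows where exactly one condition is TRUE.
one_of_two <- filter(mtcars, xor(cyl == 4, am == 1))

# !is.na() drops rows with a missing value in the given column.
ozone_known <- filter(airquality, !is.na(Ozone))
```

Note that filter() also drops rows where the condition evaluates to NA, so the explicit !is.na() test mainly documents intent.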

arrange(.data, …) Order rows by values of a column or columns (low to high), use with desc() to order from high to low.

arrange(mtcars, mpg)

arrange(mtcars, desc(mpg))
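
When several columns are given, arrange() sorts by the first and breaks ties with the next. A brief sketch on mtcars, with sorted as a hypothetical object name:

```r
library(dplyr)

# Sort by cyl ascending, then by mpg descending within each cyl value.
sorted <- arrange(mtcars, cyl, desc(mpg))

# Inspect the two sort columns of the first few rows.
head(sorted[, c("cyl", "mpg")])
```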

add_row(.data, ..., .before = NULL, .after = NULL) Add one or more rows to a table.

add_row(faithful, eruptions = 1, waiting = 1)

Group Cases

Manipulate Cases

EXTRACT VARIABLES

ADD CASES

ARRANGE CASES

Logical and boolean operators to use with filter()

Column functions return a set of columns as a new vector or table.

contains(match)

ends_with(match)

matches(match)

:, e.g. mpg:cyl

-, e.g. -Species

num_range(prefix, range)

one_of(…)

starts_with(match)

pull(.data, var = -1) Extract column values as a vector. Choose by name or index.

pull(iris, Sepal.Length)

Manipulate Variables

Use these helpers with select(), e.g. select(iris, starts_with("Sepal"))
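
A few of the helpers in action, as a sketch on the built-in iris and mtcars data. The regular expression passed to matches() is an illustrative choice.

```r
library(dplyr)

names(select(iris, starts_with("Petal")))  # "Petal.Length" "Petal.Width"
names(select(iris, ends_with("Width")))    # "Sepal.Width" "Petal.Width"
names(select(iris, contains("Length")))    # "Sepal.Length" "Petal.Length"
names(select(mtcars, mpg:disp))            # "mpg" "cyl" "disp"

# - drops a column; matches() takes a regular expression.
no_species <- select(iris, -Species)
sepal_cols <- select(iris, matches("^Sepal\\."))
```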

These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back).

mutate(.data, …) Compute new column(s).

mutate(mtcars, gpm = 1/mpg)

transmute(.data, …) Compute new column(s), drop others.

transmute(mtcars, gpm = 1/mpg)
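
The difference between mutate() and transmute() shows up in the shape of the result. The following sketch assumes the built-in mtcars data; the object names and the extra columns prev_mpg and running_hp are hypothetical.

```r
library(dplyr)

# 1 / mpg is vectorized: one output value per input row.
with_gpm <- mutate(mtcars, gpm = 1 / mpg)
only_gpm <- transmute(mtcars, gpm = 1 / mpg)

dim(with_gpm)  # 32 12 (all original columns plus gpm)
dim(only_gpm)  # 32 1  (only gpm is kept)

# Vectorized helpers such as lag() and cumsum() also keep the row count.
mtcars %>%
  mutate(prev_mpg = lag(mpg), running_hp = cumsum(hp)) %>%
  head(3)
```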

mutate_all(.tbl, .funs, …) Apply funs to every column. Use with funs(). Also mutate_if().

mutate_all(faithful, funs(log(.), log2(.)))

mutate_if(iris, is.numeric, funs(log(.)))

mutate_at(.tbl, .cols, .funs, …) Apply funs to specific columns. Use with funs(), vars() and the helper functions for select().

mutate_at(iris, vars(-Species), funs(log(.)))

add_column(.data, ..., .before = NULL, .after = NULL) Add new column(s). Also add_count(), add_tally().

add_column(mtcars, new = 1:32)

rename(.data, …) Rename columns.

rename(iris, Length = Sepal.Length)

MAKE NEW VARIABLES

EXTRACT CASES

summary function

vectorized function

Data Transformation with dplyr : : CHEAT SHEET

select(.data, …) Extract columns as a table. Also select_if().

select(iris, Sepal.Length, Species)

