R Programming: A Beginner's Guide to Data Manipulation and Analysis | Study notes Advanced Computer Programming

Transform and Tidy (Wrangle) Data with R - Required Read

Bojan Duric

This notebook is a selection from “A (very) short introduction to R” by Paul Torfs & Claudia Brauer and “R for Data Science” by Hadley Wickham and Garrett Grolemund.

1 Introduction

What is R?

R is a powerful language and environment for statistical computing and graphics. It is a free software (so-called “GNU”) project, similar to the commercial S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered a different implementation of S or, in language terms, a different dialect of S. The main advantages of R are that it is free and that a lot of help is available online. It is quite similar to other programming packages such as MATLAB (not free), but more user-friendly than programming languages such as C++ or Fortran. You can use R as it is, but for educational purposes we prefer to use R in combination with the RStudio interface (also free), which has an organized layout and several extra options.

The R language came into use quite a bit after S had been developed. One key limitation of the S language was that it was only available in a commercial package, S-PLUS. In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland. In 1993 the first announcement of R was made to the public.

In 1995, Martin Mächler made an important contribution by convincing Ross and Robert to use the GNU General Public License to make R free software. This was critical because it allowed for the source code for the entire R system to be accessible to anyone who wanted to tinker with it (more on free software later).

In 1996, public mailing lists were created (the R-help and R-devel lists) and in 1997 the R Core Group was formed, containing some people associated with S and S-PLUS. Currently, the core group controls the source code for R and is solely able to check in changes to the main R source tree. Finally, in 2000 R version 1.0.0 was released to the public.

Limitations of R

No programming language or statistical analysis system is perfect, and R certainly has a number of drawbacks. For starters, R is essentially based on almost 50-year-old technology, going back to the original S system developed at Bell Labs. There was originally little built-in support for dynamic or 3-D graphics (but things have improved greatly since the “old days”).

Another commonly cited limitation of R is that objects must generally be stored in physical memory. This is in part due to the scoping rules of the language, but R generally is more of a memory hog than other statistical packages. However, there have been a number of advancements to deal with this, both in the R core and in a number of packages developed by contributors. Also, computing power and capacity have continued to grow over time, and the amount of physical memory that can be installed on even a consumer-level laptop is substantial. While we will likely never have enough physical memory on a computer to handle the increasingly large datasets that are being generated, the situation has gotten quite a bit easier over time.
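To make the point about in-memory storage concrete, here is a minimal sketch using base R's `object.size()` to see how much memory a single vector occupies. The vector length of one million is an arbitrary choice for illustration, and exact byte counts vary slightly by platform, so no output is shown.

```r
# A numeric (double) vector needs 8 bytes per element, all held in RAM.
x <- numeric(1e6)                        # one million zeros (arbitrary size)
size_bytes <- as.numeric(object.size(x)) # size of the object in bytes

# Report the size in a human-readable unit chosen automatically.
print(object.size(x), units = "auto")

# Removing the object and running the garbage collector releases the memory.
rm(x)
invisible(gc())
```

Checking object sizes this way is a quick first step when a dataset seems too large for the available memory.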


2.1 Install R

2.2 Install RStudio

2.3 RStudio layout

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

filter, lag

The following objects are masked from 'package:base':

intersect, setdiff, setequal, union

3.1 Calculator

[1] 136

use a hashtag (#) to comment your code

write your code:_

3.2 Workspace

[1] 4

[1] 20

[1] 14
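The commands that produced the three values above are not preserved in this copy. A minimal sketch of one sequence consistent with that output, assuming a single variable named `a` (the name is an assumption), might look like this:

```r
a <- 4          # store the value 4 in the workspace under the name a
print(a)        # 4
print(a * 5)    # 20; a itself is unchanged
a <- a + 10     # overwrite a with a new value
print(a)        # 14
```

Typing `ls()` afterwards lists the variables currently stored in the workspace.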

3.3 Scalars, vectors and matrices

3.4 Functions

[Figure: plot of x against Index]

Practice (optional)

5.1 Vectors

[1] 1 4 6 8 10

[1] 10

[1] 1 4 12 8 10

[1] 0.00 0.25 0.50 0.75 1.

[1] 35

[1] 1.00 4.25 12.50 8.75 11.
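The code behind the vector output above is missing from this copy. The following sketch, with the assumed names `vec1` and `vec2`, shows one way such results can arise: building a vector with `c()`, indexing and replacing an element, generating a sequence with `seq()`, and doing element-wise arithmetic.

```r
vec1 <- c(1, 4, 6, 8, 10)                 # combine values into a vector
print(vec1)
print(vec1[5])                            # fifth element: 10
vec1[3] <- 12                             # replace the third element
print(vec1)                               # 1 4 12 8 10
vec2 <- seq(from = 0, to = 1, by = 0.25)  # 0, 0.25, 0.5, 0.75, 1
print(vec2)
print(sum(vec1))                          # 1 + 4 + 12 + 8 + 10 = 35
print(vec1 + vec2)                        # element-wise addition
```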

| Hi! I see that you have some variables saved in your workspace. To keep

| things running smoothly, I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type

| rm(list=ls()) to clear your workspace.

| Type swirl() when you are ready to begin.

## [1] 8.

5.4 Lists

$one

[1] 1

$two

[1] 1 2

$five

[1] 0.00 0.25 0.50 0.75 1.

[1] 1

[1] 1 2

[1] 0.00 0.25 0.50 0.75 1.

[1] "one" "two" "five"

[1] 10.00 10.25 10.50 10.75 11.
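The list output above can be reproduced with a sketch along these lines. The object name `mylist` is an assumption; the element names `one`, `two` and `five` are taken from the printed output. It illustrates that a list can hold elements of different lengths and that elements can be reached by name or by position.

```r
mylist <- list(one = 1,
               two = c(1, 2),
               five = seq(0, 1, length.out = 5))
print(mylist)             # prints each element under its $name

print(mylist$one)         # access by name with $
print(mylist[["two"]])    # access by name with [[ ]]
print(mylist[[3]])        # access by position

print(names(mylist))      # "one" "two" "five"
print(mylist$five + 10)   # arithmetic on a single element
```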

[Figure: plot of rnorm(100) against Index]

check help for plot and change some arguments

re-run_

[Figure: plot of t$x against Index]

a b

1 3 12

2 4 43

3 5 54

a b

1 3 12

2 4 43

3 5 54
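The same table appearing twice above suggests a data frame that was written to a file and read back. The original commands are not preserved, so the following is only a sketch under that assumption. The names `d`, `d2` and `path` are hypothetical, and a temporary file is used so that nothing is left behind.

```r
d <- data.frame(a = c(3, 4, 5), b = c(12, 43, 54))
print(d)

path <- tempfile(fileext = ".txt")              # hypothetical temporary file
write.table(d, file = path, row.names = FALSE)  # write with a header row
d2 <- read.table(file = path, header = TRUE)    # read it back
print(d2)

unlink(path)                                    # clean up the temporary file
```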

[1] NA

[1] 2
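An `NA` followed by `2` is the kind of result that appears when a summary function meets a missing value. As a sketch, assuming a vector named `j` (the name and values are assumptions chosen to match the output):

```r
j <- c(1, 2, NA)              # NA marks a missing value
print(max(j))                 # NA: the maximum is unknown
print(max(j, na.rm = TRUE))   # 2: missing values are dropped first
```

Many base functions such as `sum()`, `mean()` and `min()` accept the same `na.rm` argument.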

10.1 If-statement

[1] 2

[1] 1 4
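The statements that produced the two results above are missing here. A minimal sketch consistent with them, using assumed variable names, shows an if/else statement followed by logical subsetting of a vector:

```r
w <- 3
if (w < 5) {
  d <- 2        # this branch runs because 3 < 5
} else {
  d <- 10
}
print(d)        # 2

a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)
f <- a[b == 5 | b == 8]   # keep elements of a where b is 5 or 8
print(f)                  # 1 4
```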

10.2 For-loop (optional)

[1] NA 20 30 40 50 60 70 80 NA NA
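The pattern of `NA` values in the output above is what a loop produces when it skips the first position and runs past the end of its input. One sketch that yields this vector, with the assumed names `h` and `s`:

```r
h <- seq(from = 1, to = 8)   # input vector with 8 elements
s <- c()                     # empty result, grown inside the loop
for (i in 2:10) {
  s[i] <- h[i] * 10          # h[9] and h[10] do not exist, so they give NA
}
print(s)                     # NA 20 30 40 50 60 70 80 NA NA
```

Position 1 is `NA` because the loop starts at 2, and positions 9 and 10 are `NA` because indexing beyond the length of `h` returns `NA`.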

10.3 Writing your own functions (optional)

[1] 14

[1] 14
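The function definition behind the two identical results above is not included in this copy. As an illustration only, with a hypothetical function `fun1` and argument names `arg1` and `arg2`, a user-defined function called once with named and once with positional arguments could look like this:

```r
fun1 <- function(arg1, arg2) {
  w <- arg1^2          # intermediate result, local to the function
  return(arg2 + w)     # value handed back to the caller
}

print(fun1(arg1 = 3, arg2 = 5))   # 3^2 + 5 = 14
print(fun1(3, 5))                 # same call with positional arguments: 14
```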

11.1 R Functions

11.2 Keyboard shortcuts