R code, on its own, is just text. We can write R code in a new script within R or RStudio, or in any text editor. However, just writing the code will not do the whole job – in order for our code to be executed (or, interpreted), we need to send it to the command-line interpreter. In RStudio, the command-line interpreter is called the Console.
In R, the command-line interpreter starts with the > symbol. This is called the prompt. It operates on the idea of a “Read, evaluate, print” loop: we type in commands, R tries to execute them, and then returns a result.
The fastest way to have R evaluate code is to type our R code directly into the command-line interpreter. For example,
1+1
## [1] 2
The simplest thing we can do with R is do arithmetic:
1+100
## [1] 101
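When using R as a calculator, the order of operations is the same as we learned in school. From highest to lowest precedence: parentheses ( ), exponents (^ or **), multiplication (*) and division (/), then addition (+) and subtraction (-).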
3+5*2
## [1] 13
# But
(3+5)*2
## [1] 16
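As a further sketch of how precedence works (the numbers are chosen only for illustration), exponentiation is evaluated before multiplication, which is evaluated before addition:

```r
# Exponent first (3^2 = 9), then multiply (9 * 4 = 36), then add (2 + 36)
2 + 3^2 * 4
## [1] 38
# Parentheses override the default order: (2 + 3) = 5, 5^2 = 25, 25 * 4
(2 + 3)^2 * 4
## [1] 100
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.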
We can also do comparison in R:
1 == 1 # equality (note two equals signs, read as "is equal to")
## [1] TRUE
1 != 2 # inequality (read as "is not equal to")
## [1] TRUE
1 < 2 # less than
## [1] TRUE
1 <= 1 # less than or equal to
## [1] TRUE
1 > 0 # greater than
## [1] TRUE
1 >= -9 # greater than or equal to
## [1] TRUE
A word of warning about comparing numbers: we should never use == to compare two numbers unless they are integers (a data type which can specifically represent only whole numbers).
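To see why, here is a small sketch with decimal values chosen for illustration. Most decimal fractions cannot be stored exactly, so two results that look equal may differ by a tiny rounding error. One common alternative, assuming approximate equality is what we want, is the base R function all.equal(), which compares numbers within a small tolerance:

```r
# 0.1 + 0.2 is not stored as exactly 0.3, so == reports a difference
0.1 + 0.2 == 0.3
## [1] FALSE
# all.equal() allows for a tiny rounding error;
# wrapping it in isTRUE() gives a plain TRUE or FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```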
There are certainly many cases where it makes sense to type code
directly into the console. For example, to open a help menu for a new
function with the ? command, to take a quick look at a
dataset with the head() function, or to do simple
calculations like 1+1, we should type directly into the
console. However, the problem with writing all our code in the console
is that nothing that we write will be saved. So in case of an error, or
if we want to make a change to some earlier code, we have to type it all
over again.
For this (and many more reasons), we should write any important code in the Source window and save it as an R script (an R script is a collection of R code in a single file).
We can write an R script in any text editor, but we should save it with the .R suffix to make it clear that it contains R code.
To start writing a new R script in RStudio, click
File – New File – R Script. When we open a new script, we
see a blank page.
When we type code into an R script, we will notice that, unlike typing code into the Console, nothing happens. In order for R to interpret the code, we need to send it from the Source to the Console.
For example, we write the following code into the Source:
# Create variables
x<-23
y<-36
z<-89
#Do some calculations
x+y+z
(x-z)+y
log(x)
The first thing we do is save this piece of code.
We can do this by clicking File – Save As....
We can type as much code as we like in the Source. R will not execute it until we send it to the Console. The three most common ways to do this are: copy the code from the Source and paste it into the Console; highlight the code we want to run, then click the Run button; or place the cursor on a single line, then click the Run button to run just that line.
The operation of R revolves around two things: objects and functions. Almost everything in R is either an object or a function.
An object is a thing – like a number, a dataset, a summary statistic like a mean or standard deviation, or a statistical test. Objects come in many different shapes and sizes in R. There are simple objects like scalars which represent single numbers, vectors which represent several numbers, more complex objects like dataframes which represent tables of data, and even more complex objects like hypothesis tests or regressions which contain all sorts of statistical information. Objects in R are things, and different objects have different attributes.
By now we know that R can be used to do simple calculations. But to really take advantage of R, we need to know how to create and manipulate objects. All of the data, analyses, and even plots that we use and create are, or can be, saved as objects in R. Once an object is loaded, we can use it to calculate descriptive statistics, run hypothesis tests, and create plots.
To create new objects, we need to do object
assignment. Object assignment is our way of storing
information, such as a number or a statistical test, into something we
can easily refer to later. To do an assignment, we use the almighty
<- operator called assign. To assign
something to a new object (or to change an existing object), we use the
notation object <- ..., where object is the
new (or updated) object, and ... is whatever we want to
store in that object.
For example, we create two objects x and
y, and store the values 1/40 and
1/50 in them:
x<-1/40
y<-1/50
# Notice that assignment does not print a value. Instead,
# we stored it for later in something called a variable.
# x now contains the value 0.025.
x
## [1] 0.025
y
## [1] 0.02
In the Environment tab, we will see that x and y, along with their values, have appeared.
Our variable x can be used in place of a number in any calculation that expects a number:
log(x)
## [1] -3.688879
sum(x,y)
## [1] 0.045
We can also use these objects to create a new object. For example:
z<-x+y
z
## [1] 0.045
To change an object, we need to assign it again. For example:
# We have an object a with a value of 0.
# We would like to add 1 to a in order to make it 1.
a<-0
a+1 # let us try first
## [1] 1
a
## [1] 0
# the value of a is still 0! What went wrong?
a<-0
a<-a+1 # Now we are REALLY changing a
a
## [1] 1
We can also store strings in variables:
sentence <- "the cat sat on the mat"
# Note that we need to put strings of characters inside quotes.
sentence
## [1] "the cat sat on the mat"
But the type of data that is stored in a variable affects what we can do with it:
x+1
## [1] 1.025
sentence+1
## Error in sentence + 1 : non-numeric argument to binary operator
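One way to check what kind of data an object holds before using it is the base R function class(). The following is a minimal sketch that redefines x and sentence so that it runs on its own:

```r
x <- 1/40
sentence <- "the cat sat on the mat"
# class() reports the type of data stored in an object
class(x)
## [1] "numeric"
class(sentence)
## [1] "character"
# is.numeric() tests whether an object holds numbers
is.numeric(sentence)
## [1] FALSE
```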
Name objects
We can create object names using any combination of letters, numbers, and a few
special characters (like . and _). Here are
some valid object names:
group.mean <- 10.21
my.age <- 10
FavoriteFood <- "Sweet"
sum.1.to.5 <- 1 + 2 + 3 + 4 + 5
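Not every combination is valid, though: a name cannot begin with a number or an underscore, and names are case-sensitive. As a rough check, the base R function make.names() returns a name unchanged when it is already valid and alters it otherwise. The candidate names below are made up for illustration:

```r
# A valid name comes back unchanged
make.names("group.mean")
## [1] "group.mean"
# A name starting with a digit is not valid, so make.names() prefixes an X
make.names("5.groups")
## [1] "X5.groups"
# Names are case-sensitive: total and Total are two different objects
total <- 1
Total <- 2
total == Total
## [1] FALSE
```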
The simplest object type in R is a scalar. A scalar object is just a single value like a number or a name. For example:
# Examples of numeric scalars
m <- 100
n <- 3 / 100
o <- (m + n) / n
# Examples of character scalars
d <- "ship"
e <- "cannon"
f <- "Do any modern armies still use cannons?"
A vector object is just a combination of several scalars stored as a single object. For example, the numbers from one to ten could be a vector of length 10, and the characters in the English alphabet could be a vector of length 26. Like scalars, vectors can be either numeric or character (but not both!).
There are many ways to create vectors in R. One is the c()
function. The c here stands for concatenate, which means
“bring them together”. The c() function takes several
scalars as arguments, and returns a vector containing those objects.
When using c(), we need to place a comma in between the
objects (scalars or vectors) we want to combine:
# Create an object a with the integers from 1 to 5
a <- c(1, 2, 3, 4, 5)
# Print the result
a
## [1] 1 2 3 4 5
char.vec <- c("Ceci", "nest", "pas", "une", "pipe")
char.vec
## [1] "Ceci" "nest" "pas" "une" "pipe"
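Two further points about c() can be sketched with made-up values: it accepts existing vectors as well as scalars, and because a vector cannot be both numeric and character, mixing the two turns every element into a character string:

```r
# c() can combine an existing vector with more scalars
first <- c(1, 2, 3)
second <- c(first, 4, 5)
second
## [1] 1 2 3 4 5
# Mixing numbers and text: the numbers are converted to character
mixed <- c(1, 2, "three")
class(mixed)
## [1] "character"
```

This silent conversion is worth remembering, since arithmetic on mixed would then fail in the same way as sentence + 1.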
The a:b function takes two numeric scalars
a and b as arguments, and returns a vector of
numbers from the starting point a to the ending point
b in steps of 1:
a<-1:10
a
## [1] 1 2 3 4 5 6 7 8 9 10
b<-2.5:8.5
b
## [1] 2.5 3.5 4.5 5.5 6.5 7.5 8.5
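Two details of a:b are easy to trip over, shown here with illustrative numbers: the sequence counts down when the starting point is larger than the ending point, and the colon is evaluated before arithmetic operators such as +:

```r
# If a is larger than b, the sequence counts down
5:1
## [1] 5 4 3 2 1
# The colon is evaluated before +, so this adds 1 to each of 1, 2, 3
1:3 + 1
## [1] 2 3 4
# Parentheses give the numbers from 1 to 4 instead
1:(3 + 1)
## [1] 1 2 3 4
```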
The seq() function is a more flexible version of
a:b. Like a:b, seq() allows us to
create a sequence from a starting number to an ending number. However,
seq() has additional arguments that allow us to specify
either the size of the steps between numbers, or the total length of the
sequence: by and length.out. If we use the
by argument, the sequence will be in steps of the input to
the by argument:
# Create the numbers from 1 to 10 in steps of 1
seq(from = 1, to = 10, by = 1)
## [1] 1 2 3 4 5 6 7 8 9 10
# Create the numbers from 0 to 100 in steps of 10
seq(from = 0, to = 100, by = 10)
## [1] 0 10 20 30 40 50 60 70 80 90 100
If we use the length.out argument, the sequence will
have length equal to length.out.
# Create 10 numbers from 1 to 5
seq(from = 1, to = 5, length.out = 10)
## [1] 1.0 1.4 1.9 2.3 2.8 3.2 3.7 4.1 4.6 5.0
# 3 numbers from 0 to 100
seq(from = 0, to = 100, length.out = 3)
## [1] 0 50 100
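As a further sketch with illustrative values, by can be negative to count downwards, and length.out works with fractional steps as well:

```r
# Count down from 10 to 1 in steps of 3 (by must be negative here)
seq(from = 10, to = 1, by = -3)
## [1] 10  7  4  1
# 5 evenly spaced numbers from 0 to 1
seq(from = 0, to = 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00
```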
The rep() function allows us to repeat a
scalar (or vector) a specified number of times, or to a desired
length:
rep(x = 3, times = 10) #with scalar
## [1] 3 3 3 3 3 3 3 3 3 3
rep(x = c(1, 2), each = 3) #with vector
## [1] 1 1 1 2 2 2
rep(x = 1:3, length.out = 10) #with vector
## [1] 1 2 3 1 2 3 1 2 3 1
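The each and times arguments can also be used together, and rep() works on character vectors too. In this small sketch (values chosen for illustration), each is applied first, then the whole result is repeated times times:

```r
# Each element is repeated twice, then the whole pattern is repeated twice
rep(x = c("a", "b"), each = 2, times = 2)
## [1] "a" "a" "b" "b" "a" "a" "b" "b"
# rep() results can be combined with other vectors using c()
c(rep(x = 0, times = 3), 1:3)
## [1] 0 0 0 1 2 3
```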
A function is a procedure that typically takes one or more objects as arguments (or, inputs), does something with those objects, then returns a new object.
R has many built-in mathematical functions. To call a function, we simply type its name, followed by opening and closing parentheses. The values we type inside the parentheses are called the function’s arguments:
log(1) # natural logarithm
## [1] 0
exp(0.5) # e^(1/2)
## [1] 1.648721
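Many functions take more than one argument, and arguments can be given by name. A brief sketch using two base R functions, log() with its base argument and round() with its digits argument (the input values are illustrative):

```r
# Logarithm of 100 in base 10 instead of the default natural logarithm
log(100, base = 10)
## [1] 2
# Round to two decimal places
round(3.14159, digits = 2)
## [1] 3.14
# When arguments are named, their order does not matter
round(digits = 2, x = 3.14159)
## [1] 3.14
```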
We have no need to remember every function in R. We can simply look
them up on Google, or if we remember the start of the function’s name,
we can type the start of it, then press the tab key. This
will show a list of functions whose name matches what we have typed so
far; this is known as tab completion, and can save a
lot of typing (and reduce the risk of typing errors). Tab completion
works in R (i.e. running it outside of RStudio), and in RStudio. In RStudio
this feature is even more useful; an extract of the function’s help file
will be shown alongside the function name. This is one advantage that
RStudio has over R on its own: it has auto-completion abilities that
allow us to more easily look up functions, their arguments, and the
values that they take.
When we use R, we do three basic things: 1) Define objects, 2) Apply functions to those objects, and 3) Repeat! Take the following as an example:
# 1: Create a vector object called runs
runs <- c(4, 67, 23, 4, 10, 35)
# 2: Apply the mean() function to the runs object
mean(runs)
## [1] 23.83333
The mean() function we used above takes a vector object,
runs, of numeric data as an argument, calculates the arithmetic mean
of those data, then returns a single number (a scalar) as a result.
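The "repeat" step can be sketched by applying a few more base R functions to the same object. The example redefines runs so that it runs on its own, and shows that the mean is just the sum divided by the number of elements:

```r
runs <- c(4, 67, 23, 4, 10, 35)
# Number of elements in the vector
length(runs)
## [1] 6
# Total of all elements
sum(runs)
## [1] 143
# The arithmetic mean computed by hand matches mean(runs)
sum(runs) / length(runs)
## [1] 23.83333
# Largest value
max(runs)
## [1] 67
```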
We will talk more about vector functions in the next section.