R can build the plot in different locations. The default location for
plots is in a temporary plotting window within our R programming
environment. In RStudio, plots will show up in the Plots
window (Pane 4, refer to Introduction to R_2). In base R, plots will
show up in a separate graphics window (a Quartz window on macOS).
These plotting locations are like canvases. We can only have one
canvas active at any given time, and any plotting command we run will
put more plotting elements on this active canvas. Certain high-level
plotting functions like plot() and hist()
create brand new canvases, while other low-level plotting functions like
points() and segments() place elements on top
of existing canvases.
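The canvas idea can be sketched in a few lines. This is only an illustration with made-up coordinates: the first call opens a new canvas, the next two draw on top of it, and the final high-level call starts over with a fresh canvas.

```r
# High-level call: opens a brand new canvas
plot(x = 1:10, y = 1:10, main = "One canvas, several layers")

# Low-level calls: draw on top of the canvas that plot() just created
# (the coordinates below are arbitrary illustration values)
points(x = c(2, 4, 6), y = c(8, 6, 4), pch = 16, col = "red")
segments(x0 = 2, y0 = 8, x1 = 6, y1 = 4, col = "blue")

# Another high-level call replaces the previous canvas with a new one
hist(c(1, 2, 2, 3, 3, 3), main = "A new canvas")
```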
Let’s start by looking at a basic scatterplot in R using the
plot() function.
plot(x = 1:10,
y = 1:10,
xlab = "X Axis label",
ylab = "Y Axis label",
main = "Main Title")
We see an x-axis, a y-axis, 10 data points, an x-axis label, a y-axis label, and a main plot title.
Some of these items, like the labels and data points, were entered as
arguments to the function. For example, the main arguments x and y are
vectors indicating the x and y coordinates of the (in this case, 10)
data points. The arguments xlab, ylab, and
main set the labels and title to the plot.
We will soon discover that we can change all of these elements by
specifying additional arguments to the plot() function.
Here, R used the default values – values that R uses unless we tell it
to use something else.
Let’s take another example: the iris dataset.
data("iris")
plot(x=iris$Sepal.Width,
y=iris$Petal.Width,
xlab = "Sepal Width",
ylab = "Petal Width",
main = "Iris Sepal vs Petal")
Aside from the x and y arguments, all of the arguments are optional. If we don’t specify an argument, R will use a default value, or try to come up with a value that makes sense. Let’s start creating plots with some of these arguments and high-level plotting functions.
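To see the defaults at work, we can drop every optional argument from the previous call. In this minimal sketch R has to pick the axis labels itself (it falls back to the expressions we passed as x and y) and leaves the title empty. The second part peeks at one of the documented defaults of plot.default(), the function that plot() uses for plain numeric vectors.

```r
data("iris")

# Only x and y are given; every other element uses R's defaults
plot(x = iris$Sepal.Width,
     y = iris$Petal.Width)

# Look up a default value directly: the default plot type is "p" (points)
default_type <- formals(graphics::plot.default)$type
default_type
```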
The most common high-level plotting function is
plot(x, y). The plot()
function makes a scatterplot from two vectors x and
y, where the x vector indicates the
x (horizontal) values of the points, and the y
vector indicates the y (vertical) values.
Most plotting functions have a color argument (usually
col) that allows us to specify the color of the plot. There
are many ways to specify colors in R.
The easiest way to specify a color is to enter its name as a string.
For example, col = "red" is R’s default version of the color
red. Of course, all the basic colors are there, but R also
has tons of quirky colors. Running colors() prints the full list of built-in color names.
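A short sketch for exploring the named colors: colors() returns every built-in name as a character vector, so we can search it like any other vector. The search term "pink" is just an example.

```r
# All built-in color names as a character vector
all_colors <- colors()
head(all_colors)

# Search the names for a pattern, e.g. everything containing "pink"
grep("pink", all_colors, value = TRUE)

# Use one of the names in a plot
plot(x = 1:10, y = 1:10, pch = 16, col = "hotpink",
     main = "A named color")
```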
When we create a plot with plot(), we can specify the
type of symbol with the pch argument. This can be done in
one of two ways: with an integer, or with a string.

- If we use a string (like “p”), R will use that text as the plotting symbol.
- If we use an integer value, we will get the symbol that corresponds to that number.

Let’s have a look at the figure below for
all the symbol types that we can specify with an integer.

Symbols differ in their shape and how they are colored. Symbols 1
through 14 only have borders and are always empty, while symbols 15
through 20 don’t have a border and are always filled. Symbols 21 through
25 have both a border and a filling. To specify the color
of symbols 1 through 20, we use the col
argument. For symbols 21 through 25, we can set the color of the border
with col, and the color of the background using
bg.
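The symbol types can also be drawn directly in R. The following is a minimal sketch that plots symbols 1 through 25 in a row, with a black border color (col) and a gold background (bg), so the three groups described above are easy to tell apart. The layout values (y positions, sizes, colors) are arbitrary choices for the illustration.

```r
pch_values <- 1:25

# One point per symbol type; bg only has an effect for symbols 21 to 25
plot(x = pch_values,
     y = rep(1, length(pch_values)),
     pch = pch_values,
     col = "black",
     bg = "gold",
     cex = 2,
     ylim = c(0.5, 1.5),
     yaxt = "n",
     xlab = "pch value",
     ylab = "",
     main = "Integer plotting symbols 1 to 25")

# Print the pch number under each symbol
text(x = pch_values, y = 0.8, labels = pch_values, cex = 0.7)

# A string can be used as the plotting symbol as well
points(x = c(5, 15), y = c(1.3, 1.3), pch = c("p", "q"), cex = 1.5)
```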
- cex - A numeric vector. It specifies the size of the
symbols (from 0 to Inf). The default size is 1. For example,
cex = 4 will make the points very large, while
cex = .5 will make them very small.
- xlim and ylim - Numeric vectors of length 2. They set the limits of the axes. For
example, xlim = c(0, 100) will set the minimum and maximum
of the x-axis to 0 and 100.
- main, xlab, and ylab -
Strings. They give labels to the plot title, and the x and y axes,
respectively.
Now, let’s plot the same iris data with these arguments:
plot(x=iris$Sepal.Width,
y=iris$Petal.Width,
col='salmon', # Colors
xlab = "Sepal Width", # x-axis label
ylab = "Petal Width", # y-axis label
main = "Iris Sepal vs Petal", # title of the plot
pch = 2, # type of symbol
cex = 1, # size of symbol
xlim = c(2,5), # limit of x-axis
ylim = c(0,4) # limit of y-axis
)
Histograms are the most common way to plot a vector of numeric data.
To create a histogram we use the hist()
function.
The main argument to hist() is x,
a vector of numeric data.
If we want to specify how the histogram bins are created, we can
use the breaks argument. To change the fill color
or border color of the bins, we can use the col and
border arguments, respectively.
| Argument | Description |
|---|---|
| x | Vector of values |
| breaks | How should the bin sizes be calculated? Can be specified in many ways. |
| freq | Should frequencies or probabilities be plotted? freq = TRUE shows frequencies, freq = FALSE shows probabilities. |
| col and border | Colors of the bin filling (col) and border (border) |
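As a sketch of how breaks and freq interact, the example below passes explicit bin edges (one of the many ways to specify breaks) and sets freq = FALSE. The bin edges from 4 to 8 in steps of 0.5 are an assumption chosen to cover the sepal length values. With freq = FALSE the bar heights are densities, so the bar areas add up to 1.

```r
data("iris")

# Explicit bin edges, and densities instead of counts
h <- hist(x = iris$Sepal.Length,
          breaks = seq(4, 8, by = 0.5),
          freq = FALSE,
          col = "steelblue",
          border = "white",
          main = "Sepal length (densities)",
          xlab = "Length")

# hist() also returns the numbers behind the plot
h$breaks                                  # the bin edges we asked for
total_area <- sum(h$density * diff(h$breaks))
total_area                                # bar areas add up to 1
```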
Let’s create a histogram of values for iris sepal length:
hist(x=iris$Sepal.Length,
col='steelblue',
main='Histogram',
xlab='Length',
ylab='Frequency')
We can use additional arguments like breaks,
col, and border to make it a bit more
colorful.
hist(x=iris$Sepal.Length,
main='Histogram',
xlab='Length',
ylab='Frequency',
breaks = 4,
xlim = c(3, 9),
col = "papayawhip", # Filling Color
border = "hotpink") # Border Color
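One detail worth checking: when breaks is a single number, R treats it as a suggestion and picks nearby "pretty" break points, so the number of bins we get may differ from the number we asked for. A quick way to inspect the result without drawing anything is plot = FALSE, as in this small sketch.

```r
data("iris")

# Compute the histogram without plotting it
h_suggested <- hist(x = iris$Sepal.Length, breaks = 4, plot = FALSE)

h_suggested$breaks                           # the break points R actually chose
n_bins <- length(h_suggested$breaks) - 1     # the resulting number of bins
n_bins
```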
A barplot typically shows summary statistics for different groups.
The primary argument to a barplot is height: a vector of
numeric values corresponding to the height of each bar.
To create a barplot we use the
barplot() function. To add names below the
bars, we use the names.arg argument. For example:
barplot(height = 1:5, # A vector of heights
names.arg = c("G1", "G2", "G3", "G4", "G5"), # A vector of names
main = "Example Barplot",
xlab = "Group",
ylab = "Height")
Now, let’s say we want to create a barplot of the mean
Petal Length for each species of the iris dataset.
# Calculating mean for each Species
df = aggregate(iris[,1:4], by = list(iris$Species), FUN = mean)
# Here we are creating a dataframe, called 'df'. The 'aggregate()' function is used to get the summary statistics of the data by group. Here the data is present from column 1 to column 4 of the 'iris' dataset. We want to get the statistics 'by' the group 'Species', which is present in the 'iris' dataset. The statistic is mean.
df
#The result looks like this:
Now we can plot the mean Petal Length for each
species:
barplot(Petal.Length~Group.1,
data = df,
main = "Mean Petal Length",
xlab = c('Species'),
ylab = c('Petal Length'))
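The formula interface used above is, as far as we know, only available in newer versions of R (4.0.0 and later). A sketch of an equivalent call that relies only on the height and names.arg arguments introduced earlier:

```r
data("iris")

# Same summary table as before
df <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean)

# Pass the bar heights and the bar names explicitly
barplot(height = df$Petal.Length,
        names.arg = as.character(df$Group.1),
        main = "Mean Petal Length",
        xlab = "Species",
        ylab = "Petal Length")
```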
So far we have seen barplots for one variable, for example, mean petal length for the three species groups of the iris dataset. Now, let’s say we want to plot both mean sepal length and mean petal length for the three species groups in a single plot. In that case, for each species, there will be two bars: one for mean sepal length and another for mean petal length. Let’s try this one:
mean_length <- cbind(df$Sepal.Length, df$Petal.Length) # create the data subset (named 'mean_length' so it does not mask R's length() function)
colnames(mean_length) <- c("Sepal.Length", "Petal.Length") # attach column name
rownames(mean_length) <- c("setosa", "versicolor","virginica") # attach row name
barplot(height = t(mean_length), # 't' is to transpose the rows and columns
beside = TRUE, # put the bars next to each other
legend.text = TRUE, # add a legend
main = "Mean Length",
ylab = "Length",
xlab = "Species")
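Why the transpose? When height is a matrix and beside = TRUE, barplot() draws one group of bars per column. We want one group per species, so the species have to be the columns. The sketch below builds the same kind of matrix (under the hypothetical name mean_mat, with row names taken from df instead of typed by hand) and compares the dimensions before and after t().

```r
data("iris")
df <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean)

# Rows = species, columns = the two length variables
mean_mat <- as.matrix(df[, c("Sepal.Length", "Petal.Length")])
rownames(mean_mat) <- as.character(df$Group.1)

dim(mean_mat)      # 3 rows (species) by 2 columns (variables)
dim(t(mean_mat))   # 2 rows (variables) by 3 columns (species)

# One group of two bars per species
barplot(height = t(mean_mat),
        beside = TRUE,
        legend.text = TRUE,
        main = "Mean Length",
        ylab = "Length",
        xlab = "Species")
```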
Low-level plotting functions allow us to add elements, like points or lines, to an existing plot. We will discuss three low-level plotting functions here: points(), text(), and legend().
To add new points to an existing plot, we use the
points() function. The points function has many similar
arguments to the plot() function, like x (for
the x-coordinates), y (for the y-coordinates), and
parameters like col (border color), cex (point
size), and pch (symbol type).
Let’s use points() to create a plot with different
colors for different data. We will use the iris dataset and plot
the relationship between Sepal.Length and
Sepal.Width. We will add separate points for two of the
three species.
# Create a blank plot
plot(x = 1,
type = "n",
xlim = c(4,8),
ylim = c(0,6),
pch = 16,
xlab = "Sepal.Length",
ylab = "Sepal.Width",
main = "Adding points to a plot with points()")
# Add points for species `setosa`
points(x = iris$Sepal.Length[iris$Species == "setosa"],
y = iris$Sepal.Width[iris$Species == "setosa"],
pch = 16,
col = "coral2")
# Add points for species `versicolor`
points(x = iris$Sepal.Length[iris$Species == "versicolor"],
y = iris$Sepal.Width[iris$Species == "versicolor"],
pch = 16,
col = "steelblue3")
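When there are several groups, repeating the points() call by hand gets tedious. One possible way to shorten it is a loop over the species names, sketched below for all three species. The third color ("darkseagreen4") and the variable names are our own choices for the illustration.

```r
data("iris")

species_names <- levels(iris$Species)
species_cols <- c("coral2", "steelblue3", "darkseagreen4")

# Create a blank plot
plot(x = 1,
     type = "n",
     xlim = c(4, 8),
     ylim = c(0, 6),
     xlab = "Sepal.Length",
     ylab = "Sepal.Width",
     main = "Adding points in a loop")

# Add one layer of points per species
for (i in seq_along(species_names)) {
  is_species <- iris$Species == species_names[i]
  points(x = iris$Sepal.Length[is_species],
         y = iris$Sepal.Width[is_species],
         pch = 16,
         col = species_cols[i])
}
```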
With text(), we can add text to a plot. We can use
text() to highlight specific points of interest in the
plot, or to add information (like a third variable) for every point in a
plot.
| Argument | Description |
|---|---|
| x, y | Coordinates of the labels |
| labels | Labels to be plotted |
| cex | Size of the labels |
| adj | Horizontal text adjustment. adj = 0 is left justified, adj = .5 is centered, and adj = 1 is right-justified |
| pos | Position of the labels relative to the coordinates. pos = 1 puts the label below the coordinates, while 2, 3, and 4 put it to the left, top and right of the coordinates respectively |
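The four pos values are easiest to remember after seeing them side by side. This small sketch draws a single point at (1, 1) and places one label at each position around it. The label texts are made up for the illustration.

```r
# A single point in the middle of the canvas
plot(x = 1, y = 1,
     pch = 16,
     xlim = c(0, 2),
     ylim = c(0, 2),
     xlab = "x",
     ylab = "y",
     main = "Label positions with pos")

pos_labels <- c("pos = 1 (below)", "pos = 2 (left)",
                "pos = 3 (above)", "pos = 4 (right)")

# Put one label at each of the four positions around the point
for (i in 1:4) {
  text(x = 1, y = 1, labels = pos_labels[i], pos = i)
}
```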
For example, let’s use the data from the mean table df.
We will create a scatterplot of sepal length and sepal width, and add
species names as data labels above each point.
# Plot data
plot(x = df$Sepal.Length,
y = df$Sepal.Width,
xlim = c(5,8),
ylim = c(2.5,5),
xlab = "Sepal.Length",
ylab = "Sepal.Width",
pch = 16)
# Add id labels
text(x = df$Sepal.Length,
y = df$Sepal.Width,
labels = df$Group.1,
pos = 3) # Put labels above the points
legend() adds a legend to a plot.
| Argument | Description |
|---|---|
| x, y | Coordinates of the legend. For example, x = 0, y = 0 will put the legend at the coordinates (0, 0). Alternatively, we can enter a string indicating where to put the legend (e.g., "topright", "topleft"). For example, "bottomright" will always put the legend at the bottom right corner of the plot. |
| legend | A string vector specifying the text in the legend. For example, legend = c("Males", "Females") will create two groups with names Males and Females. |
| pch, lty, lwd, col, pt.bg, ... | Additional arguments specifying symbol types (pch), line types (lty), line widths (lwd), background color of symbol types 21 through 25 (pt.bg) and several other optional arguments. |
For example, we will draw a scatterplot of sepal width vs petal width for two species (setosa and virginica) from the iris dataset, and add a legend to it to identify the species.
# Create plot with data from one species 'setosa'
plot(x=iris$Sepal.Width[iris$Species=="setosa"],
y=iris$Petal.Width[iris$Species=="setosa"],
col='salmon', # Colors
xlab = "Sepal Width", # x-axis label
ylab = "Petal Width", # y-axis label
main = "Sepal vs Petal", # title of the plot
pch = 16, # type of symbol
cex = 1, # size of symbol
xlim = c(2,5), # limit of x-axis
ylim = c(0,4) # limit of y-axis
)
# Add data from another species 'virginica'
points(x = iris$Sepal.Width[iris$Species=="virginica"],
y = iris$Petal.Width[iris$Species=="virginica"],
pch = 16, col = "blue")
# Add legend
legend("bottomright",
legend = c("setosa", "virginica"),
col = c('salmon', 'blue'),
pch = c(16, 16),
bg = "white")
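The pt.bg argument matters only for the filled symbols 21 through 25. A minimal sketch with made-up data and group names: the points are drawn with a black border (col) and a colored filling (bg), and the legend repeats the same styling through col and pt.bg.

```r
# Two made-up groups drawn with filled symbols
plot(x = 1:5, y = 1:5,
     pch = 21, col = "black", bg = "salmon", cex = 2,
     xlab = "x", ylab = "y",
     main = "Legend with pt.bg")
points(x = 1:5, y = 5:1,
       pch = 24, col = "black", bg = "steelblue", cex = 2)

# In the legend, col sets the border and pt.bg sets the filling
legend_info <- legend("top",
                      legend = c("Group A", "Group B"),
                      pch = c(21, 24),
                      col = "black",
                      pt.bg = c("salmon", "steelblue"),
                      bg = "white")
```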
Once we create a plot in R, we often want to save it to a file so that we
can use it in another document. To do this, we use the
pdf(), png(), or jpeg()
function.
We need to follow 3 steps:

1. Call the pdf(), png(), or jpeg() function with the file, width, and height arguments.
2. Run the plotting code (e.g., plot(x = 1:10, y = 1:10)).
3. Close the file by running dev.off(). This tells R that we are done creating the file.

For example:
# Step 1: Call the pdf command to start the plot
pdf(file = "plot1.pdf", # The directory and file name where we want to save the file
width = 4, # The width of the plot in inches
height = 4) # The height of the plot in inches
# Step 2: Create the plot with R code
# We are keeping the last code
plot(x=iris$Sepal.Width[iris$Species=="setosa"],
y=iris$Petal.Width[iris$Species=="setosa"],
col='salmon', # Colors
xlab = "Sepal Width", # x-axis label
ylab = "Petal Width", # y-axis label
main = "Sepal vs Petal", # title of the plot
pch = 16, # type of symbol
cex = 1, # size of symbol
xlim = c(2,5), # limit of x-axis
ylim = c(0,4) # limit of y-axis
)
# Add data from another species 'virginica'
points(x = iris$Sepal.Width[iris$Species=="virginica"],
y = iris$Petal.Width[iris$Species=="virginica"],
pch = 16, col = "blue")
# Add legend
legend("bottomright",
legend = c("setosa", "virginica"),
col = c('salmon', 'blue'),
pch = c(16, 16),
bg = "white")
# Step 3: Run dev.off() to create the file!
dev.off()
After we close the plot with dev.off(), a message like
“null device” appears in the prompt. That is just R telling us that we
can now create plots in the main R plotting window again.
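The functions pdf(), jpeg(), and
png() all work the same way; they just produce different
file types. One difference to keep in mind: width and height are measured in inches for pdf(), but in pixels by default for png() and jpeg().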
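To confirm that a plot really ended up on disk, we can check for the file after dev.off(). The sketch below writes to R's temporary directory so that it does not clutter the working directory. The file name plot_check.pdf is a hypothetical name chosen for this example.

```r
# Step 1: open the file (here inside the session's temporary directory)
out_file <- file.path(tempdir(), "plot_check.pdf")
pdf(file = out_file, width = 4, height = 4)

# Step 2: run the plotting code
plot(x = 1:10, y = 1:10, main = "Saved plot")

# Step 3: close the file
closed <- dev.off()

# Check that the file exists and is not empty
file.exists(out_file)
file.info(out_file)$size
```

If step 3 is skipped, the file stays open and later plotting commands keep going into it instead of the plotting window.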