R can build the plot in different locations, called graphics devices. The default location for plots is a plotting window within our R programming environment. In RStudio, plots will show up in the Plots pane (Pane 4; refer to Introduction to R_2). In the base R GUI, plots will show up in a separate graphics window (a Quartz window on macOS).

These plotting locations are like canvases. Only one canvas is active at any given time, and every plotting command we run adds more plotting elements to this active canvas. High-level plotting functions like plot() and hist() create brand new canvases, while low-level plotting functions like points() and segments() place elements on top of an existing canvas.
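A minimal sketch of the canvas idea, using only base R graphics functions named above (the coordinates are arbitrary illustration values): plot() opens a canvas, points() and segments() add to it, and the next high-level call, hist(), starts over on a fresh canvas.

```r
# High-level call: plot() starts a brand new canvas
plot(x = 1:10, y = 1:10, main = "One canvas")

# Low-level calls: these draw on the canvas that is already active
points(x = c(2, 5, 8), y = c(8, 5, 2), col = "red", pch = 16)
segments(x0 = 1, y0 = 10, x1 = 10, y1 = 1, col = "blue")  # line from (1, 10) to (10, 1)

# Another high-level call replaces the canvas: the red points and blue line are gone
hist(x = c(1, 2, 2, 3, 3, 3), main = "A new canvas")
```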

Let’s start by looking at a basic scatterplot in R using the plot() function.

plot(x = 1:10,
     y = 1:10,
     xlab = "X Axis label",
     ylab = "Y Axis label",
     main = "Main Title")

We see an x-axis, a y-axis, 10 data points, an x-axis label, a y-axis label, and a main plot title.

Some of these items, like the labels and data points, were entered as arguments to the function. The main arguments x and y are vectors giving the x and y coordinates of the data points (in this case, 10 of them). The arguments xlab, ylab, and main set the axis labels and the title of the plot.

For everything we did not specify, such as the symbol type, size, and color of the points, R used default values: values that R uses unless we tell it to use something else. We will soon discover that we can change all of these elements by specifying additional arguments to the plot() function.
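To see the defaults R fell back on, we can query them with par(). A sketch: the values in the comments are R's standard defaults on a freshly opened plotting window, and spelling them out in plot() should draw the same figure as leaving them unspecified.

```r
# Query the current defaults for symbol type, symbol size, and color
par("pch")  # 1, an open circle
par("cex")  # 1, the normal size
par("col")  # "black"

# Spelling the defaults out gives the same plot as leaving them unspecified
plot(x = 1:10, y = 1:10,
     pch = 1, cex = 1, col = "black",
     xlab = "X Axis label", ylab = "Y Axis label", main = "Main Title")
```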

Let’s take another example: the iris dataset.

data("iris")
plot(x=iris$Sepal.Width,
     y=iris$Petal.Width,
     xlab = "Sepal Width",
     ylab = "Petal Width",
     main = "Iris Sepal vs Petal")

Apart from x (and, for a scatterplot like this one, y), all of the arguments are optional. If we don’t specify an argument, R will use a default value, or try to come up with a value that makes sense. Let’s start creating plots with some of these arguments and high-level plotting functions.


1 Scatterplot

The most common high-level plotting function is plot(x, y). The plot() function makes a scatterplot from two vectors x and y, where the x vector indicates the x (horizontal) values of the points, and the y vector indicates the y (vertical) values.

1.1 Argument 1 - Colors

Most plotting functions have a color argument (usually col) that allows us to specify the color of the plotted elements. There are many ways to specify colors in R.

The easiest way to specify a color is to enter its name as a string. For example, col = "red" is R’s default version of the color red. All the basic colors are there, but R also has tons of quirky named colors, such as "papayawhip" and "hotpink"; running colors() lists all of them.
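Besides names, R accepts colors as hexadecimal codes ("#RRGGBB") and as the output of rgb(), which can also add transparency through its alpha argument. A small sketch comparing these forms (the point positions and sizes are arbitrary):

```r
# Three ways to write the same red, plus a transparent version
plot(x = 1:4, y = rep(1, 4),
     pch = 16, cex = 4,
     col = c("red",                       # a color name
             "#FF0000",                   # a hex code: red, green, blue from 00 to FF
             rgb(1, 0, 0),                # rgb() with values between 0 and 1
             rgb(1, 0, 0, alpha = 0.3)),  # same red, mostly transparent
     xlim = c(0, 5),
     main = "Specifying colors")

# The first few built-in color names, and how many there are
head(colors())
length(colors())
```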

1.2 Argument 2 - Symbol types

When we create a plot with plot(), we can specify the type of symbol with the pch argument. This can be done in one of two ways: with an integer, or with a string.

  • If we use a single-character string (like “p”), R will use that character as the plotting symbol.
  • If we use an integer value, we will get the symbol that corresponds to that number. Let’s have a look at the figure below for all the symbol types that we can specify with an integer.

Symbols differ in their shape and how they are colored. Symbols 0 through 14 only have borders and are always empty, while symbols 15 through 20 don’t have a border and are always filled. Symbols 21 through 25 have both a border and a filling. For symbols 0 through 20, the col argument sets the color of the whole symbol. For symbols 21 through 25, col sets the color of the border and bg sets the color of the filling.
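A chart like the symbol figure can be drawn directly in R. A sketch using plot() and text(), where "blue" (col) and "gold" (bg) are arbitrary choices that make the difference between symbols 0 to 20 and 21 to 25 visible:

```r
pch_values <- 0:25

plot(x = pch_values, y = rep(1, length(pch_values)),
     pch = pch_values,
     col = "blue",   # whole symbol for 0-20, border only for 21-25
     bg = "gold",    # filling, used only by symbols 21-25
     cex = 2,
     ylim = c(0.5, 1.5),
     yaxt = "n",     # hide the y-axis, which carries no information here
     xlab = "pch value", ylab = "",
     main = "Plotting symbols (pch = 0 to 25)")

# Write each pch number under its symbol
text(x = pch_values, y = 0.75, labels = pch_values, cex = 0.8)
```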

1.3 Argument 3 - Symbol size

cex - A number (or numeric vector) that sets the size of the symbols relative to the default size of 1 (any value from 0 to Inf). For example, cex = 4 makes the points four times the default size, while cex = 0.5 makes them half the default size.
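Because cex accepts a vector, each point can get its own size. A sketch that scales symbol size by a third variable; using Petal.Length as the size variable and 3 as the largest size are illustrative choices:

```r
data("iris")

# Map petal length onto symbol size: the longest petal gets cex = 3
sizes <- 3 * iris$Petal.Length / max(iris$Petal.Length)

plot(x = iris$Sepal.Width,
     y = iris$Petal.Width,
     cex = sizes,  # one size per point
     xlab = "Sepal Width", ylab = "Petal Width",
     main = "Symbol size scaled by Petal Length")
```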

1.4 Argument 4 - Limits of the axes

xlim and ylim set the limits of the axes, each given as a vector c(minimum, maximum). For example, xlim = c(0, 100) sets the minimum and maximum of the x-axis to 0 and 100.
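Even with xlim and ylim set, R by default extends each axis by about 4% of its range at both ends, so the outermost points do not sit on the plot border. A sketch comparing that default with xaxs = "i" / yaxs = "i", which use the limits exactly; range() supplies the data's own minimum and maximum, and par(mfrow = c(1, 2)) places the two plots side by side:

```r
data("iris")

# range() returns the smallest and largest values: a handy starting point for limits
x_limits <- range(iris$Sepal.Width)  # 2.0 and 4.4
y_limits <- range(iris$Petal.Width)  # 0.1 and 2.5

old_par <- par(mfrow = c(1, 2))  # two plots side by side; remember the old setting

# Default: each axis is padded slightly beyond the limits
plot(iris$Sepal.Width, iris$Petal.Width,
     xlim = x_limits, ylim = y_limits,
     main = "Padded (default)")

# xaxs = "i" and yaxs = "i": the axes span exactly the limits
plot(iris$Sepal.Width, iris$Petal.Width,
     xlim = x_limits, ylim = y_limits,
     xaxs = "i", yaxs = "i",
     main = "Exact limits")

par(old_par)  # restore one plot per canvas
```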

1.5 Argument 5 - Labels of the axes

main, xlab, and ylab - Strings that give the plot title and the x- and y-axis labels, respectively.

Now, let’s plot the same iris data using these arguments:

plot(x=iris$Sepal.Width,
     y=iris$Petal.Width,
     col='salmon', # Colors
     xlab = "Sepal Width", # x-axis label
     ylab = "Petal Width", # y-axis label
     main = "Iris Sepal vs Petal", # title of the plot
     pch = 2, # type of symbol
     cex = 1, # size of symbol
     xlim = c(2,5), # limit of x-axis
     ylim = c(0,4) # limit of y-axis
     )


2 Histogram

Histograms are one of the most common ways to plot the distribution of a vector of numeric data. To create a histogram, we use the hist() function.

The main argument to hist() is x, a vector of numeric data.

If we want to specify how the histogram bins are created, we can use the breaks argument. To change the fill color or the border color of the bins, we can use the col and border arguments, respectively.

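hist() function arguments

  • x - Vector of numeric values
  • breaks - How the bins should be calculated. It can be a single number (a suggested number of bins; R may adjust it to get tidy break points), a vector of break points, or the name of an algorithm such as "Sturges" (the default).
  • freq - Should frequencies or densities be plotted? freq = TRUE shows frequencies (counts); freq = FALSE shows probability densities, so the total area of the bars is 1.
  • col and border - Colors of the bin filling (col) and border (border)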
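A sketch of the breaks and freq arguments together: the break points are given as a vector (bins 0.5 wide from 4 to 8, which covers all sepal lengths in iris), and freq = FALSE draws densities. hist() also returns the numbers behind the bars, which is a handy way to check what breaks did:

```r
data("iris")

# Explicit break points: bins of width 0.5 from 4 to 8
h <- hist(x = iris$Sepal.Length,
          breaks = seq(4, 8, by = 0.5),
          freq = FALSE,  # y-axis shows density, not counts
          col = "steelblue", border = "white",
          main = "Histogram with explicit breaks", xlab = "Length")

# The object returned by hist() holds the numbers behind the bars
h$breaks                          # the bin edges we asked for
h$counts                          # how many flowers fall in each bin
sum(h$density * diff(h$breaks))   # total bar area: 1 when freq = FALSE
```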

Let’s create a histogram of the iris sepal length values:

hist(x=iris$Sepal.Length,
     col='steelblue',
     main='Histogram',
     xlab='Length',
     ylab='Frequency')

We can use additional arguments like breaks, xlim, col, and border to make it a bit more colorful.

hist(x=iris$Sepal.Length,
     main='Histogram',
     xlab='Length',
     ylab='Frequency',
     breaks = 4,
     xlim = c(3, 9),
     col = "papayawhip", # Filling Color
     border = "hotpink") # Border Color


3 Barplot

A barplot typically shows summary statistics for different groups. The primary argument to a barplot is height: a vector of numeric values corresponding to the height of each bar.

To create a barplot we use the barplot() function. To add names below the bars, we use the names.arg argument. For example:

barplot(height = 1:5,  # A vector of heights
        names.arg = c("G1", "G2", "G3", "G4", "G5"), # A vector of names
        main = "Example Barplot", 
        xlab = "Group", 
        ylab = "Height")

Now, let’s say we want to create a barplot of the mean Petal Length for each species of the iris dataset.

# Calculate the mean of each measurement for each species
# aggregate() computes a summary statistic by group:
#   - iris[, 1:4] selects the four numeric columns (sepal and petal measurements)
#   - by = list(iris$Species) groups the rows by species
#   - FUN = mean is the statistic computed for each group
# The grouping column is named 'Group.1' because the list given to 'by' has no names
df = aggregate(iris[,1:4], by = list(iris$Species), FUN = mean)

df

# The result looks like this:
#      Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1     setosa        5.006       3.428        1.462       0.246
# 2 versicolor        5.936       2.770        4.260       1.326
# 3  virginica        6.588       2.974        5.552       2.026
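aggregate() also has a formula interface, where . ~ Species means "every other column, grouped by Species". A sketch; with this form, the grouping column keeps the name Species instead of Group.1, so later code would refer to Species:

```r
data("iris")

# '.' stands for all remaining columns; each is averaged within each Species
df2 <- aggregate(. ~ Species, data = iris, FUN = mean)
df2
```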

Now we can plot the mean Petal Length for each species:

barplot(Petal.Length~Group.1, 
        data = df,
        main = "Mean Petal Length",
        xlab = 'Species', 
        ylab = 'Petal Length')
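The same bars can also be drawn from a named vector, without the formula interface. A sketch using tapply(), which computes the mean petal length for each species and names each value after its species; barplot() uses those names as bar labels (the fill color is an arbitrary choice):

```r
data("iris")

# One mean per species; the names come from the Species factor levels
mean_petal <- tapply(iris$Petal.Length, iris$Species, mean)
mean_petal

barplot(height = mean_petal,  # names(mean_petal) label the bars
        main = "Mean Petal Length",
        xlab = "Species", ylab = "Petal Length",
        col = "steelblue")
```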

3.1 Clustered barplot

So far we have seen barplots for one variable, for example, the mean petal length for the three species of the iris dataset. Now, let’s say we want to plot both the mean sepal length and the mean petal length for the three species in a single plot. In that case, each species gets two bars: one for the mean sepal length and another for the mean petal length. Let’s try this one:

# Named 'mean_lengths' rather than 'length', which is also a base R function
mean_lengths <- cbind(df$Sepal.Length, df$Petal.Length) # combine the two columns into a matrix
colnames(mean_lengths) <- c("Sepal.Length", "Petal.Length") # name the columns
rownames(mean_lengths) <- c("setosa", "versicolor", "virginica") # name the rows

barplot(height = t(mean_lengths), # t() transposes rows and columns: one cluster per species
        beside = TRUE, # put the bars next to each other
        legend.text = TRUE, # add a legend
        main = "Mean Length",
        ylab = "Length",
        xlab = "Species")
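The matrix passed to barplot() can also be built straight from df. A sketch: after t(), each row is one measurement (one bar per cluster) and each column is one species (one cluster). It also shows beside = FALSE, the default, which stacks the two bars of each species into one:

```r
data("iris")
df <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean)

# Rows = measurements, columns = species
mean_lengths <- t(as.matrix(df[, c("Sepal.Length", "Petal.Length")]))
colnames(mean_lengths) <- as.character(df$Group.1)
dim(mean_lengths)  # 2 rows, 3 columns

# beside = FALSE stacks the two measurements into one bar per species
barplot(height = mean_lengths,
        beside = FALSE,
        legend.text = TRUE,  # legend entries come from the row names
        main = "Mean Length (stacked)",
        ylab = "Length", xlab = "Species")
```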


4 Low level plotting functions

Low-level plotting functions allow us to add elements, like points or lines, to an existing plot. Here we will discuss three low-level plotting functions: points(), text(), and legend().

4.1 points()

To add new points to an existing plot, we use the points() function. points() shares many arguments with plot(), like x (the x-coordinates), y (the y-coordinates), col (symbol color), cex (symbol size), and pch (symbol type).

Let’s use points() to create a plot with different symbol types for different data. We will use the iris dataset and plot the relationship between Sepal.Length and Sepal.Width. We will create separate points for three different species.

# Create a blank plot
plot(x = 1,
     type = "n",
     xlim = c(4,8), 
     ylim = c(0,6),
     pch = 16,
     xlab = "Sepal.Length", 
     ylab = "Sepal.Width",
     main = "Adding points to a plot with points()")

# Add points for species `setosa`
points(x = iris$Sepal.Length[iris$Species == "setosa"],
       y = iris$Sepal.Width[iris$Species == "setosa"],
       pch = 16,
       col = "coral2")

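# Add points for species `versicolor`
points(x = iris$Sepal.Length[iris$Species == "versicolor"],
       y = iris$Sepal.Width[iris$Species == "versicolor"],
       pch = 16,
       col = "steelblue3")

# Add points for species `virginica`
points(x = iris$Sepal.Length[iris$Species == "virginica"],
       y = iris$Sepal.Width[iris$Species == "virginica"],
       pch = 16,
       col = "darkolivegreen3")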
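Coloring points by group can also be done in a single plot() call, by indexing a vector of colors with the Species factor. A factor is stored as integer codes (here 1, 2, and 3, in the order setosa, versicolor, virginica), so each flower picks up the color of its species. A sketch; the three colors are arbitrary:

```r
data("iris")
species_colors <- c("coral2", "steelblue3", "darkolivegreen3")

plot(x = iris$Sepal.Length,
     y = iris$Sepal.Width,
     pch = 16,
     col = species_colors[iris$Species],  # one color per flower, chosen by species
     xlab = "Sepal.Length", ylab = "Sepal.Width",
     main = "Coloring points by group in one call")
```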


4.2 text()

With text(), we can add text to a plot. We can use text() to highlight specific points of interest in the plot, or to add information (like a third variable) for every point in a plot.

Arguments to text()

  • x, y - Coordinates of the labels
  • labels - Labels to be plotted
  • cex - Size of the labels
  • adj - Horizontal text adjustment: adj = 0 is left-justified, adj = 0.5 is centered, and adj = 1 is right-justified
  • pos - Position of the labels relative to the coordinates: pos = 1 puts the label below the coordinates, while 2, 3, and 4 put it to the left of, above, and to the right of the coordinates, respectively. If pos is given, it overrides adj.

For example, let’s use the data from the mean table df. We will create a scatterplot of sepal length and sepal width, and add species names as data labels above each point.

# Plot data
plot(x = df$Sepal.Length, 
     y = df$Sepal.Width,
     xlim = c(5,8),
     ylim = c(2.5,5),
     xlab = "Sepal.Length",
     ylab = "Sepal.Width",
     pch = 16)

# Add species names as labels
text(x = df$Sepal.Length, 
     y = df$Sepal.Width,
     labels = df$Group.1, 
     pos = 3)            # Put labels above the points
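text() can also write a third variable next to every point. A sketch for the setosa flowers, labeling each point with its petal length to the right of the point (pos = 4); the small cex keeps the labels readable, although some may still overlap:

```r
data("iris")
setosa <- iris[iris$Species == "setosa", ]

plot(x = setosa$Sepal.Length,
     y = setosa$Sepal.Width,
     pch = 16, col = "grey60",
     xlab = "Sepal.Length", ylab = "Sepal.Width",
     main = "Petal Length written next to each point")

# Write each flower's petal length to the right of its point
text(x = setosa$Sepal.Length,
     y = setosa$Sepal.Width,
     labels = setosa$Petal.Length,
     pos = 4,
     cex = 0.6)
```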


4.3 legend()

legend() adds a legend to a plot.

Arguments to legend()

  • x, y - Coordinates of the legend. For example, x = 0, y = 0 puts the top-left corner of the legend at the coordinates (0, 0). Alternatively, we can enter a string indicating where to put the legend (e.g., "topright", "topleft"). For example, "bottomright" will always put the legend in the bottom right corner of the plot.
  • legend - A string vector specifying the text in the legend. For example, legend = c("Males", "Females") will create two entries named Males and Females.
  • pch, lty, lwd, col, pt.bg, ... - Additional arguments specifying symbol types (pch), line types (lty), line widths (lwd), colors (col), background color of symbol types 21 through 25 (pt.bg), and several other optional arguments.

For example, we will draw a scatterplot of sepal width vs. petal width for two species from the iris dataset (setosa and virginica), and add a legend to identify the species.

# Create plot with data from one species 'setosa'
plot(x=iris$Sepal.Width[iris$Species=="setosa"],
     y=iris$Petal.Width[iris$Species=="setosa"],
     col='salmon', # Colors
     xlab = "Sepal Width", # x-axis label
     ylab = "Petal Width", # y-axis label
     main = "Sepal vs Petal", # title of the plot
     pch = 16, # type of symbol
     cex = 1, # size of symbol
     xlim = c(2,5), # limit of x-axis
     ylim = c(0,4) # limit of y-axis
     )
# Add data from another species 'virginica'
points(x = iris$Sepal.Width[iris$Species=="virginica"], 
       y = iris$Petal.Width[iris$Species=="virginica"],
       pch = 16, col = "blue")

# Add legend
legend("bottomright",
       legend = c("setosa", "virginica"),
       col = c('salmon', 'blue'),
       pch = c(16, 16),
       bg = "white")
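legend() can also be placed by coordinates, and it can mix symbol and line entries. A sketch with illustrative data: the first entry uses a filled symbol (pch = 21, whose filling comes from pt.bg), the second a line drawn with segments(), and NA turns off the symbol or line for an entry:

```r
x_vals <- 1:10

# Filled symbols (pch = 21) use col for the border and bg for the filling
plot(x = x_vals, y = x_vals,
     pch = 21, col = "black", bg = "gold", cex = 1.5,
     xlim = c(0, 11), ylim = c(0, 11),
     xlab = "x", ylab = "y",
     main = "Legend placed at coordinates")

# A reference line from (0, 1) to (10, 11)
segments(x0 = 0, y0 = 1, x1 = 10, y1 = 11, col = "grey40", lty = 1)

# By default, the top-left corner of the legend box sits at (x, y)
leg <- legend(x = 0, y = 11,
              legend = c("Observations", "Reference line"),
              pch = c(21, NA),        # NA: no symbol for the line entry
              pt.bg = c("gold", NA),  # filling for pch 21 in the legend
              col = c("black", "grey40"),
              lty = c(NA, 1))         # NA: no line for the symbol entry

# legend() invisibly returns where it drew the box
leg$rect
```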

5 Save plots

Once we create a plot in R, we often want to save it to a file so that we can use it in another document. To do this, we use a file graphics device function such as pdf(), png(), or jpeg().

We need to follow 3 steps:

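  1. Execute the pdf(), png(), or jpeg() function with the file name and the width and height of the plot. For pdf(), width and height are in inches; for png() and jpeg(), they are in pixels by default.
  2. Execute all plotting code (e.g., plot(x = 1:10, y = 1:10))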
  3. Complete the file by executing the command dev.off(). This tells R that we are done creating the file.

For example:

# Step 1: Call the pdf command to start the plot
pdf(file = "plot1.pdf",   # The directory and file name where we want to save the file
    width = 4, # The width of the plot in inches
    height = 4) # The height of the plot in inches

# Step 2: Create the plot with R code
# (the same code as in the legend() example above)
plot(x=iris$Sepal.Width[iris$Species=="setosa"],
     y=iris$Petal.Width[iris$Species=="setosa"],
     col='salmon', # Colors
     xlab = "Sepal Width", # x-axis label
     ylab = "Petal Width", # y-axis label
     main = "Sepal vs Petal", # title of the plot
     pch = 16, # type of symbol
     cex = 1, # size of symbol
     xlim = c(2,5), # limit of x-axis
     ylim = c(0,4) # limit of y-axis
     )
# Add data from another species 'virginica'
points(x = iris$Sepal.Width[iris$Species=="virginica"], 
       y = iris$Petal.Width[iris$Species=="virginica"],
       pch = 16, col = "blue")

# Add legend
legend("bottomright",
       legend = c("setosa", "virginica"),
       col = c('salmon', 'blue'),
       pch = c(16, 16),
       bg = "white")

# Step 3: Run dev.off() to create the file!
dev.off()

After we close the file with dev.off(), a message such as “null device 1” may appear in the console. That is just R reporting which graphics device is now active; new plots will again go to the main R plotting window.
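If the plotting code stops with an error between pdf() and dev.off(), the file stays open and later plots keep going into it. One way to guard against that is to close the file with on.exit(), which runs when a function exits, even after an error. A sketch with a hypothetical helper, save_plot_pdf() (not part of base R); the output path in tempdir() is a placeholder:

```r
# Hypothetical helper: open a PDF, run the plotting code, always close the file
save_plot_pdf <- function(file, plot_code, width = 4, height = 4) {
  pdf(file = file, width = width, height = height)  # Step 1: open the file
  on.exit(dev.off())  # Step 3: runs when the function exits, even after an error
  plot_code()         # Step 2: draw everything on the PDF canvas
  invisible(file)
}

out_file <- file.path(tempdir(), "plot1.pdf")
save_plot_pdf(out_file, function() {
  plot(x = 1:10, y = 1:10, main = "Saved with save_plot_pdf()")
  points(x = 5, y = 5, col = "red", pch = 16)
})
file.exists(out_file)
```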

The functions pdf(), jpeg(), and png() all work the same way; they just produce different file types.
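A sketch of the same three steps with png(). Here the first argument is called filename, width and height are in pixels by default, and res sets the resolution in pixels per inch. The file path is a placeholder in tempdir(), and the capabilities() check skips the example on R builds without PNG support:

```r
out_file <- file.path(tempdir(), "plot1.png")  # placeholder path

if (capabilities("png")) {
  # Step 1: open an 800 x 600 pixel PNG file at 120 pixels per inch
  png(filename = out_file, width = 800, height = 600, res = 120)

  # Step 2: plotting code
  plot(x = 1:10, y = 1:10, main = "Saved as PNG")

  # Step 3: close the file
  dev.off()
}
file.exists(out_file)
```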