In this chapter, we will cover the basics of R object management. We will see how to load new objects like external datasets into R, how to manage the objects that we already have, and how to export objects from R into external files that we can share with other people or store for our own future use.

Why manage the workspace? Our computer is a maze of files and folders. Outside of R, when we want to open a specific file, we visually search through the folders on our computer. This visual search does not work in R. When we program in R, we need to specify every step of our analysis in a way that can be easily replicated by others and by our future selves. We need to tell R exactly where to find the required files, either on our computer or on the web. To make this job easier, R uses working directories.

1 Working directory

The working directory is just a file path on our computer that sets the default location of any files we read into R, or save out of R.

We can only have one working directory active at any given time. The active working directory is called the current working directory.

# To see our current working directory
getwd()

## [1] "/Users/achyutkumarbanerjee/Desktop/r_course/src"

As the output shows, my working directory is a folder called src, which sits inside another folder called r_course on my Desktop. This means that when I read new files into R, or write files out of R, R will assume that I want to use this folder.

# To change our current working directory
# For example, if I want to change my working directory to another folder on the Desktop called `code`,
setwd(dir = "/Users/achyutkumarbanerjee/Desktop/code")
# setwd() prints nothing itself, so we confirm the change with getwd()
getwd()

## [1] "/Users/achyutkumarbanerjee/Desktop/code"
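As a small sketch of the same idea, the snippet below switches to a temporary folder and back again. It relies on setwd() invisibly returning the previous working directory. The temporary folder from tempdir() is only a stand-in for a real project folder, and the name old_wd is made up for this illustration.

```r
# setwd() invisibly returns the directory we were in before the change
old_wd <- setwd(tempdir())

# We are now inside the temporary folder (a stand-in for a project folder)
getwd()

# Go back to where we started and confirm
setwd(old_wd)
getwd()
```

Storing the old directory this way makes it easy to undo a change without typing the full path again.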

2 Workspace

The workspace (i.e., the working environment) represents all of the objects and functions that we have either defined in the current session, or have loaded from a previous session. When we start RStudio for the first time, the workspace is empty, because we have not created any new objects or functions. However, as we start defining new objects and functions using the assignment operator <-, these new objects are stored in the workspace.

2.1 ls()

If we want to see all objects defined in our current workspace, we use the ls() function.

# For example, let's create and load some data first.
# First, we create a simple dataframe called `survey` with a mixture of text and numeric columns. The text column records the sex of the participants and the numeric columns record their index and age.
survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23))

# Second, we calculate the average petal width of the iris dataframe
iris.average <- mean(iris$Petal.Width)

# Now, let's check the names of the objects we just created
ls()

## [1] "iris.average" "survey"

The result above says that I have these 2 objects in my workspace.

2.2 rm()

If we want to remove objects from our current workspace, we use the rm() function. We can remove specific objects by name, or all objects at once.

# To remove specific objects, enter the objects as arguments:
# For example, to remove the `survey` dataframe, we use:
rm(survey)
# Now check the objects present in the workspace:
ls()

## [1] "iris.average"

# To remove all objects from the workspace:
rm(list = ls())
# Now check the objects present in the workspace:
ls()

## character(0)

Note that once we remove an object, we cannot get it back without running the code that originally generated the object! If our R code is complete and well-documented, we should easily be able to either re-create a lost object, or re-load it from an external file.
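Between removing one object and removing everything, there is a middle ground. The sketch below removes only the objects whose names match a pattern, using the pattern argument of ls(). The object names (tmp_a, tmp_b, keep_me) are invented for this example.

```r
# Create two throwaway objects and one we want to keep
tmp_a <- 1:3
tmp_b <- c("x", "y")
keep_me <- mean(iris$Petal.Width)

# ls(pattern = ) lists only the names matching a regular expression
ls(pattern = "^tmp_")

# Remove just those objects; keep_me is untouched
rm(list = ls(pattern = "^tmp_"))
exists("tmp_a")    # FALSE
exists("keep_me")  # TRUE
```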

3 Dealing with .RData

A convenient way to store objects from R is with .RData files. .RData files are specific to R and can store as many objects as we like within a single file.

3.1 save()

# Let us create three dataframes that we will save
# These dataframes contain initials and average scores of five random students 
# from three different disciplines 
study1.science <- data.frame(id = 1:5, 
                        name = c("AK", "CK", "NY", "GH", "LT"), 
                        score = c(51, 20, 67, 52, 42))

study2.arts <- data.frame(id = 1:5, 
                        name = c("HG", "JU", "CL", "YS", "VS"), 
                        score = c(39, 68, 59, 29, 92))

study3.commerce <- data.frame(id = 1:5,
                        name = c("JH", "OP", "TY", "SL", "AU"), 
                        score = c(90, 49, 20, 10, 82))

# Now we save these objects as a new .RData file
# in our current working directory
save(study1.science, study2.arts, study3.commerce,
     file = "study.RData")

# Once it is saved, let us remove the three objects from the workspace
rm(list = ls())

If we check the working directory “/Users/achyutkumarbanerjee/Desktop/r_course/src”, we will find the file study.RData there.

If we check the workspace with ls(), we now have an empty workspace.

3.2 load()

To load the three specific objects that we saved earlier in the .RData file named study.RData, we will run the following:

load(file = "study.RData")

If we now check the workspace with ls(), the three objects are available again.
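The whole save, remove, load cycle can be tried safely with a temporary file. This is only a sketch: the two small data frames are cut-down versions of the ones above, and rdata_file and restored are names chosen for this example.

```r
# Two small example objects (shortened for this sketch)
study1.science <- data.frame(id = 1:2, score = c(51, 20))
study2.arts <- data.frame(id = 1:2, score = c(39, 68))

# Save to a temporary file so nothing in the working directory is touched
rdata_file <- tempfile(fileext = ".RData")
save(study1.science, study2.arts, file = rdata_file)
rm(study1.science, study2.arts)

# load() restores the objects and returns their names as a character vector
restored <- load(file = rdata_file)
restored
study1.science$score
```

Capturing the return value of load() is a handy way to see exactly which objects a file added to the workspace.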

4 Import data

While .RData files are great for saving R objects, sometimes we need to import data from other programs (e.g., Excel). Similarly, sometimes we need to export data as a simple text file (.txt) that other programs can also use.

One of R’s most powerful features is its ability to deal with tabular data, such as we may already have in a spreadsheet, a text file or a CSV file. For example, the data/ folder of the project contains iris_data in two formats: .csv and .txt.

We can view the contents of the file by selecting it from the “Files” window in RStudio, and selecting “View File”. This will display the contents of the file in a new window in RStudio. For the .csv file, we can see that the variable names are given in the first line of the file, and that the remaining lines contain the data itself. Each observation is on a separate line, and variables are separated by commas. Note that viewing the file doesn’t make its contents available to R; to do this we need to import the data.

We can import data into R using various functions. Some functions are available with base R, while some come with packages. Each function has certain parameters that are mentioned below.

4.1 Functions with base R

4.1.1 read.delim()

This method is used for reading “tab-separated value” files (“.txt”).

Parameters:

  1. file: the path to the file containing the data to be read into R.
  2. header: a logical value. If TRUE, read.delim() assumes that our file has a header row, so row 1 is the name of each column. If that’s not the case, we can add the argument header = FALSE.
  3. sep: the field separator character. "\t" is used for a tab-delimited file.
  4. dec: the character used in the file for decimal points.

Example:

# Read a text file using read.delim()
setwd("/Users/achyutkumarbanerjee/Desktop/r_course/data")
myData <- read.delim("iris_data.txt", header = TRUE)
print(myData)
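To see the sep and dec parameters in action without needing the course files, the sketch below writes a tiny tab-separated file to a temporary location and reads it back. The file contents are made up, and the decimal comma is used only to illustrate dec.

```r
# Write a small tab-separated file that uses a comma as the decimal mark
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("species\tpetal_width",
             "setosa\t0,2",
             "virginica\t2,5"),
           tmp_txt)

# sep = "\t" is already the default for read.delim(); dec tells R how decimals are written
demo <- read.delim(tmp_txt, header = TRUE, sep = "\t", dec = ",")
str(demo)
```

Without dec = ",", the petal_width column would be read as text rather than as numbers.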

4.1.2 read.table()

Another popular way to store data is in a tabular text format. R provides several functions for reading such files. read.table() is a general function that can be used to read a file in table format. The data will be imported as a data frame.

Parameters:

  1. file: the path to the file containing the data to be read into R.
  2. header: a logical value. If TRUE, read.table() assumes that our file has a header row, so row 1 is the name of each column. If that’s not the case, we can add the argument header = FALSE.
  3. sep: the field separator character.
  4. dec: the character used in the file for decimal points.

Example:

# Read a text file using read.table()
setwd("/Users/achyutkumarbanerjee/Desktop/r_course/data")
myData1 <- read.table("iris_data.txt", header = TRUE)
print(myData1)
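The effect of header and sep is easiest to see on a small file. The sketch below uses a made-up semicolon-separated file written to a temporary location; the object names are chosen for this example only.

```r
# A tiny semicolon-separated file
tmp_tab <- tempfile(fileext = ".txt")
writeLines(c("id;name;score",
             "1;AK;51",
             "2;CK;20"),
           tmp_tab)

# With header = TRUE the first line supplies the column names
with_header <- read.table(tmp_tab, header = TRUE, sep = ";")
names(with_header)

# With header = FALSE the first line is treated as data and
# R invents the column names V1, V2, V3
no_header <- read.table(tmp_tab, header = FALSE, sep = ";")
names(no_header)
nrow(no_header)
```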

4.1.3 read.csv()

This function is used for reading “comma separated value” files (“.csv”). Here too, the data will be imported as a data frame.

Parameters:

  1. file: the path to the file containing the data to be read into R.
  2. header: a logical value. If TRUE, read.csv() assumes that our file has a header row, so row 1 is the name of each column. If that’s not the case, we can add the argument header = FALSE.
  3. sep: the field separator character.
  4. dec: the character used in the file for decimal points.

Example:

# Read a CSV file using read.csv()
setwd("/Users/achyutkumarbanerjee/Desktop/r_course/data")
myData2 <- read.csv("iris_data.csv", header = TRUE)
print(myData2)
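An alternative to calling setwd() before every import is to build the full path with file.path() and pass it to the file argument. The sketch below is one possible way to do this; it uses a temporary folder and a copy of the built-in iris data as a stand-in for the course's data folder, and the names data_dir, csv_path and iris_copy are assumptions for this example.

```r
# A stand-in for the project's data folder
data_dir <- tempdir()

# file.path() joins folder and file name with the right separator
csv_path <- file.path(data_dir, "iris_demo.csv")

# Create a CSV to read, using the built-in iris data
write.csv(iris, csv_path, row.names = FALSE)

# Read it back without changing the working directory
iris_copy <- read.csv(csv_path, header = TRUE)
dim(iris_copy)
head(iris_copy, 3)
```

Because the working directory never changes, the same script keeps working regardless of which folder it is run from, as long as data_dir points to the right place.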

4.1.4 file.choose()

In R, it is also possible to choose a file interactively using the function file.choose().

Example:

# Choose a file interactively and read it using read.csv()
setwd("/Users/achyutkumarbanerjee/Desktop/r_course/data")
myData3 <- read.csv(file.choose(), header = TRUE)
print(myData3)

4.1.5 Functions with R packages

4.1.5.1 read_csv() function

We can import the data into R using the read_csv() function; this is part of the readr package, which is part of the tidyverse.

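We first install the tidyverse and readr packages (this is only needed once), and then load them. Because readr is part of the tidyverse, installing and loading tidyverse alone would also be enough.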

install.packages("tidyverse")
install.packages("readr")
library(tidyverse)
library(readr)

We then use the read_csv() function to import the data, which we will store in the object named myData3:

setwd("/Users/achyutkumarbanerjee/Desktop/r_course/data")
myData3 <- read_csv(file = "iris_data.csv")

## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr (1): species
## dbl (4): sepal_length, sepal_width, petal_length, petal_width
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We see that the read_csv() function reports a “column specification”. The spec() function will show us the full column specification:

spec(myData3)

## cols(
##  sepal_length = col_double(),
##  sepal_width = col_double(),
##  petal_length = col_double(),
##  petal_width = col_double(),
##  species = col_character()
## )
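For readers without the course files, the following sketch reproduces these steps on a temporary copy of the built-in iris data. It assumes readr version 2.0 or later is installed (for the show_col_types argument) and skips itself otherwise. The column names here are those of the built-in dataset (Sepal.Length and so on), not the ones in iris_data.csv.

```r
# Only run if readr is available; the guard is part of this sketch
if (requireNamespace("readr", quietly = TRUE)) {
  # Write the built-in iris data to a temporary CSV file
  tmp_csv <- tempfile(fileext = ".csv")
  write.csv(iris, tmp_csv, row.names = FALSE)

  # show_col_types = FALSE suppresses the column specification message
  iris_tbl <- readr::read_csv(tmp_csv, show_col_types = FALSE)

  # The specification is still stored and can be retrieved on demand
  print(readr::spec(iris_tbl))
  print(dim(iris_tbl))
}
```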

4.1.6 Preview the imported data

When we enter myData3, R prints the contents of the data frame. We see that it is a 150 by 5 tibble.

myData3

A tibble is a way of storing tabular data, which is part of the tidyverse. We see the variable names, and an (abbreviated) string indicating what type of data is stored in each variable.

You may notice while typing the command that RStudio auto-suggests read.csv() as a function to load a comma separated value file. This function is included as part of base R, and performs a similar job to read_csv(). We use read_csv() here because it is part of the tidyverse and so works well with other parts of the tidyverse, it is usually faster than read.csv(), and it handles strings in a way that is often more useful than read.csv().

5 Data types

Every piece of data in R is built from a small set of basic types; the ones we will meet most often are double, integer, complex, logical and character.
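The typeof() function reports how a value is stored. The sketch below applies it to one made-up value of each type; the list names are only labels for this example.

```r
# One example value for each of the five types named above
examples <- list(a_double = 3.14,
                 an_integer = 42L,     # the L suffix makes an integer
                 a_complex = 2+3i,
                 a_logical = TRUE,
                 a_character = "iris")

# Apply typeof() to each value
types <- sapply(examples, typeof)
types

# Columns of a data frame have types too
typeof(iris$Petal.Width)  # "double"
```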

When we read the data into R using read_csv(), it tries to work out what data type each variable is, which it does by looking at the data contained in the first 1000 rows of the data file. We can see from the displayed message that read_csv() has treated the species variable as a character variable, and all other 4 variables as double variables.

Now suppose we add one more column, occurrence, to the iris dataset, recording whether the species is present (represented by 1) or absent (represented by 0) at a given location. We want to tell read_csv() to treat the occurrence column as a logical variable, which we do with the col_types argument.

setwd("/Users/achyutkumarbanerjee/Desktop/r_course/data")
myData4 <- read_csv("iris_data_1.csv", col_types = cols(
  species = col_character(),
  sepal_length = col_double(),
  sepal_width = col_double(),
  petal_length = col_double(),
  petal_width = col_double(),
  occurrence = col_logical()
) )

# Now, if we see the specification of the dataframe, 
# `occurrence` is a logical variable.
spec(myData4)

## cols(
##  sepal_length = col_double(),
##  sepal_width = col_double(),
##  petal_length = col_double(),
##  petal_width = col_double(),
##  species = col_character(),
##  occurrence = col_logical()
## )
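If readr is not available, a similar result can be reached in base R by reading the 0/1 column as numbers and converting it afterwards with as.logical(). This is an alternative sketch, not the approach used above; the three-row file is made up for the example.

```r
# A tiny CSV with a 0/1 occurrence column
tmp_occ <- tempfile(fileext = ".csv")
writeLines(c("species,occurrence",
             "setosa,1",
             "versicolor,0",
             "virginica,1"),
           tmp_occ)

occ <- read.csv(tmp_occ, header = TRUE)
class(occ$occurrence)   # "integer": read.csv() sees plain numbers

# as.logical() turns 0 into FALSE and any non-zero number into TRUE
occ$occurrence <- as.logical(occ$occurrence)
occ$occurrence
```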

5.1 Exploring tibble

Tibbles are used to represent tabular data in the tidyverse. In contrast, base R uses data frames to represent tabular data. One of the differences between these two types of object is what is returned when we extract a subset of rows/columns. In contrast to a tibble, taking a subset of a data frame doesn’t always return another data frame.

# Return a vector containing the values of a variable using the dollar symbol $
myData4$petal_width

## [1] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 0.2 ....

# We can also use the subsetting operator [] directly on tibbles. 
# In contrast to a vector, a tibble is two dimensional. 
# We pass two arguments to the [] operator; the first indicates the row(s) 
# and the second indicates the columns. 
# So to return rows 1 and 2, and columns 4 and 5 we can use:
myData4[1:2,4:5]

Subsetting a tibble with [] returns another tibble; using $ to extract a variable returns a vector.
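The contrasting behaviour of base R data frames can be seen with the built-in iris data frame. In this sketch, selecting a single column with [] silently drops down to a vector unless we ask R not to with drop = FALSE. The object names are chosen for this example.

```r
# One column from a data frame: the result is a plain vector
one_col <- iris[1:2, 4]
is.data.frame(one_col)   # FALSE

# drop = FALSE keeps the data frame structure
kept <- iris[1:2, 4, drop = FALSE]
is.data.frame(kept)      # TRUE

# Two or more columns always give a data frame
two_cols <- iris[1:2, 4:5]
is.data.frame(two_cols)  # TRUE
```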

6 Export data

We can save a tibble (or data frame) to a csv file, using the write_csv() function of the readr package. For example, to save the myData4 data to myData.csv:

write_csv(myData4, "myData.csv")
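Base R offers counterparts for exporting as well. As a sketch, the snippet below writes a small data frame to a tab-separated text file with write.table() and reads it back with read.delim(). The data frame is a shortened version of the survey example, and the file goes to a temporary location rather than the working directory.

```r
# A small data frame to export
survey <- data.frame(index = 1:3,
                     sex = c("m", "f", "f"),
                     age = c(46, 54, 23))

out_file <- tempfile(fileext = ".txt")

# sep = "\t" gives a tab-separated file; row.names = FALSE omits the
# row numbers and quote = FALSE leaves text values unquoted
write.table(survey, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Inspect the raw lines, then read the file back in
readLines(out_file)
survey_back <- read.delim(out_file, header = TRUE)
survey_back
```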