Programming refers to a technological process for telling a computer which tasks to perform in order to solve problems. We can think of programming as a collaboration between humans and computers, in which humans create instructions for a computer in a language that the computer can understand and follow.
This set of well-defined instructions to solve a particular problem is called an algorithm. It takes a set of inputs and produces the desired output. For example, an algorithm to add 2 numbers could look like this: take 2 numbers as inputs > add the numbers using the + operator > display the result.
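This is a minimal sketch of the three steps of the addition algorithm, expressed as a function in R (the language used in the rest of these notes). The name add_two_numbers is hypothetical, and the inputs are fixed values rather than values typed by a user.

```r
# Hypothetical function implementing the "add 2 numbers" algorithm
add_two_numbers <- function(a, b) {
  # Step 2: add the numbers using the + operator
  result <- a + b
  # Step 3: display the result
  print(result)
  # Also return the result (invisibly) so it can be reused
  invisible(result)
}

# Step 1: take 2 number inputs (hard-coded here for illustration)
add_two_numbers(3, 4)   # prints [1] 7
```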
How does computer programming work?
At its most basic, programming tells a computer what to do.
Some examples
Most used programming languages
According to a survey, the top five programming languages that developers use as of 2023 are:
Qualities of a good algorithm
R is a programming language for statistical computing and graphics. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinformaticians and statisticians for data analysis and developing statistical software. The core R language is enhanced by a large number of extension packages containing reusable code and documentation.
To use R, we will need two software packages: Base-R and RStudio. Base-R is the basic software which contains the R programming language. RStudio is software that makes R programming easier.
Base R can be installed by downloading it from one of the links below (depending on the operating system) and following the instructions:
Note: R and RStudio are constantly being updated with new features and bug fixes. As of 19 December 2023, the latest version of Base-R is 4.3.2 “Eye Holes”, and the latest version of RStudio is 2023.12.0+369. We should update R and RStudio to the newest versions periodically; otherwise, some code and packages may not work.
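To check which version of R we are running, we can ask R directly. This short sketch uses the base R variable R.version.string and the function getRversion(). The comparison against "4.3.2" simply mirrors the version quoted in the note.

```r
# The full version string of the running R (it starts with "R version")
R.version.string

# getRversion() returns a version object that can be compared with a string;
# TRUE means we are on version 4.3.2 or newer
getRversion() >= "4.3.2"
```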
While we can do pretty much everything within base R, we will do our R programming in an application called RStudio. RStudio is an integrated development environment (IDE) that provides a graphical interface for R and makes programming in R a bit easier. Once we have installed RStudio, we will likely never need to open the base R application again.
RStudio can be downloaded from here and installed by following the on-screen instructions.
When we open RStudio, we can see the following four windows (also called panes):
Pane 1: Source
This is where we write our code.
When we open RStudio, it will automatically start a new Untitled script. We should always save it with a new file name (e.g., “script1.R”). That way, if something happens while we are working, our code will be waiting for us when we re-open RStudio.
Note that when we type code in the Source pane, R does not actually run it. To run the code, we first need to ‘send’ it to the Console (Pane 2).
The fastest way to send code from the Source to the Console is to highlight it and click the “Run” button at the top right of the Source pane. The keyboard shortcut for this is “Command + Return” on Mac and “Control + Enter” on PC.
Pane 2: Console
The console is the heart of R. Here is where R actually evaluates code.
At the beginning of the console, we can see the character >. This is a prompt that tells us that R is ready for new code. We can type code directly into the console after the prompt and get an immediate response. For example, if we type 1+1 into the console and press enter, we will have an output of 2. We can also type 1+1 into the source, then select the code, and use the “Run” button to get the result.
So we can execute code either by running it from the Source or by typing it directly into the Console.
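For instance, each line below can be typed after the > prompt in the Console, or written in the Source pane and sent with “Run”. Either way, R evaluates it and prints the result. The expressions are arbitrary examples.

```r
1 + 1      # R prints [1] 2
10 / 4     # R prints [1] 2.5
sqrt(16)   # R prints [1] 4
```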
However, most of the time, we should use the Source rather than the Console. The reason for this is:
Therefore, it is better to write all our code in the Source. When we are ready to execute it, we can then use “Run” and send it to the Console.
Pane 3: Environment
The Environment tab of this pane lists the names of all the data objects (like vectors, matrices, and data frames) that we are using in our current R session. It also shows information such as the number of observations and variables in each data frame.
The tab also has a few clickable actions like “Import Dataset”, which opens a graphical user interface (GUI) for importing data into R. Clicking the “Broom” icon removes all objects from the current environment, which clears the contents of this pane.
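This small sketch shows how objects end up in the Environment tab; the object names are made up for illustration. ls() lists the objects in the current environment, roughly what the Environment tab shows, and rm() removes them. Clicking the Broom icon has an effect similar to rm(list = ls()).

```r
# Create a few objects; each one appears in the Environment tab
v <- c(1.5, 2.5, 3.5)                        # a numeric vector
m <- matrix(1:6, nrow = 2)                   # a 2 x 3 matrix
scores <- data.frame(id = 1:3, score = v)    # a data frame

# List the names of the objects in the current environment
ls()

# Summarise the data frame: number of observations, variables and their types
str(scores)

# Remove these objects again (the Environment tab updates accordingly)
rm(v, m, scores)
```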
The History tab of this pane shows all the code we have previously evaluated in the Console. As we progress further with R, we will find this tab useful. For now, let us set it aside and move on to the 4th pane of RStudio.
Pane 4: Files/Plots/…
This panel shows us lots of helpful information. Let us go through each tab in detail:
When we perform an analysis, we typically use many files: input data, files containing code to perform the analysis, and results. Creating a project in RStudio makes these files easier to manage.
Let us start the course by making a new project in RStudio and copying into it some data that we will use later.
Recommendations:
Keep documentation in a doc directory, input data in a data directory, results in a results directory, and code in a src directory. We will create these 4 directories (data, doc, results and src) in our project directory. The directory should look like this in Pane 4.
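The same four directories can also be created from code. This is a minimal sketch that assumes the working directory is already the project directory. dir.create() and dir.exists() are base R functions.

```r
# Folder names recommended for the project
project_dirs <- c("data", "doc", "results", "src")

for (d in project_dirs) {
  # showWarnings = FALSE: stay quiet if the folder already exists
  dir.create(d, showWarnings = FALSE)
}

# Check that every folder now exists (one TRUE/FALSE per name)
dir.exists(project_dirs)
```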
Note that the path (“Home Library …”) will vary according to where we created the project. Now when we start R in this project directory, or open this project with RStudio, all of our work on this project will be entirely self-contained in this directory.
The working directory determines where files will be loaded from and saved to by default. The current working directory is shown above the console.
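We can also query the working directory from code. In this minimal sketch, getwd() returns the same path shown above the Console, and file.path() builds a path inside it. The file name "my_data.csv" is hypothetical. setwd() changes the working directory; the path in the commented-out line is only a placeholder.

```r
# Print the current working directory (the path shown above the Console)
getwd()

# Relative paths are interpreted relative to the working directory;
# "my_data.csv" is a hypothetical file name used only for illustration
file.path(getwd(), "data", "my_data.csv")

# Change the working directory (placeholder path, so left commented out)
# setwd("path/to/our/project")
```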
If this is not our project’s directory, we can set our working directory as follows:
Click the Session menu at the top, then choose Set Working Directory > To Project Directory, or choose Choose Directory to manually identify the directory.
When we download and install R for the first time, we are installing the Base R software. Base R contains most of the functions we use on a daily basis, like mean() and hist(). However, only functions written by the R core developers appear here. If we want to access data and code written by other people, we need to install it as a package.
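To illustrate that functions such as mean() and hist() work right after installing R, with no extra packages, here is a short sketch on a made-up vector of numbers. Strictly speaking, hist() comes from the graphics package, which is installed and loaded together with base R.

```r
# A made-up numeric vector
x <- c(4, 8, 15, 16, 23, 42)

# Arithmetic mean: (4 + 8 + 15 + 16 + 23 + 42) / 6
mean(x)   # 18

# Draw a histogram; hist() also returns the bin counts invisibly
h <- hist(x, main = "Histogram of x")
h$counts
```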
An R package is simply a bunch of data, functions, help menus, and vignettes (examples), stored in one neat place.
Installing a package simply means downloading the package code onto our personal computer. Packages are usually downloaded from the Comprehensive R Archive Network (CRAN).
CRAN is the central repository for R packages. To install a new R package from CRAN, we can simply run the code install.packages("name"), where “name” is the name of the package. If everything works fine, we should see some information about where the package is being downloaded from, in addition to a progress bar.
For example,
install.packages("ggplot2")
Once we have installed a package on our computer, we never need to install it again (unless we want to install a new version of the package). However, every time we want to use it, we need to turn it on by loading it.
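Because installing is needed only once, a common pattern is to install a package only when it is missing. This is a minimal sketch with a hypothetical helper named install_if_missing(). It relies on the base functions requireNamespace() and install.packages().

```r
# Hypothetical helper: install a CRAN package only if it is not installed yet
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE (without an error) when pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Return the package name invisibly so the call prints nothing
  invisible(pkg)
}
```

Calling install_if_missing("ggplot2") would then download ggplot2 the first time and do nothing on later runs. Loading the package is still a separate step.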
To load a package, we use the library() function. For example, now that we have installed the ggplot2 package, we can load it with library("ggplot2"). library() also accepts the package name without quotes:
library(ggplot2)
Now that we have loaded the ggplot2 package, we can use any of its functions!
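Loading works the same way for every package. To see it without downloading anything, here is a sketch using the tools package, which ships with every R installation. library() attaches the package to the search path (listed by search()), after which its functions, such as file_ext(), can be called directly.

```r
# Is tools attached yet? (Usually FALSE in a fresh session)
"package:tools" %in% search()

# Load (attach) the package for this session
library(tools)
"package:tools" %in% search()  # now TRUE

# Its functions can now be called directly
file_ext("script1.R")          # "R"
```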
Here, R code is presented in a separate gray box like the one below. Lines that begin with one or more # characters are comments.
Note: The comments starting with single # are comments that I write directly to explain code. Lines starting with ## are the output from the previous line(s) of code. When you run the code yourself, you should see the same output in your console.
# Define a vector a as the integers from 1 to 5
a <- 1:5
# Print a
a
## [1] 1 2 3 4 5
# What is the mean of a?
mean(a)
## [1] 3
The output we see will often start with one or more numbers in brackets, such as [1]. Each bracketed number is the position (index) in the vector of the first value on that line, which helps us keep track of where we are in long outputs.
# Generate a long vector containing the multiples of 2 from 0 to 100
seq(from = 0, to = 100, by = 2)
## [1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
## [24] 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90
## [47] 92 94 96 98 100
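To see what the bracketed numbers mean, we can store the same sequence and look up those positions directly using square-bracket indexing. Where the line breaks fall, and so which indices appear in brackets, depends on the width of the Console.

```r
# Store the sequence of even numbers from 0 to 100
x <- seq(from = 0, to = 100, by = 2)

length(x)   # 51 values in total
x[24]       # 46, the first value on the line labelled [24] above
x[47]       # 92, the first value on the line labelled [47] above
```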