1  Setting up your workspace

1.1 Create R Project

R and Python both work best when each project has its own directory, and this is a good habit to get into. RStudio lets you manage an entire project intuitively and conveniently through R Project (.Rproj) files. Using R Project files comes with several perks, for example:

  • Organization: An R project helps you keep all relevant files for a particular project in one place.
  • Reproducibility: An R project makes it easier to reproduce your analysis by keeping all the relevant files and settings in one place.
  • Version control and dependency management: An R project can be integrated with a version control system such as Git, making it easier to track changes to your code and collaborate with others. It also pairs well with the renv package, which installs packages into a library that is local to the project rather than shared with other projects or the rest of the R system. This is helpful when several projects depend on different versions of the same packages.
  • Working directory: When you open an R project, the working directory is automatically set to the project directory, making it easier to use relative file paths in your code.
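
To make the working-directory point concrete, here is a minimal sketch using only base R. The folder name my-analysis and the file name survey.csv are hypothetical, and a temporary directory stands in for the project root that RStudio would normally set for you when you open the project.

```r
# A temporary directory stands in for the project root (an assumption for this
# sketch); opening an .Rproj file sets the working directory for you.
project_root <- file.path(tempdir(), "my-analysis")
dir.create(file.path(project_root, "DATA"), recursive = TRUE, showWarnings = FALSE)

old_wd <- setwd(project_root)  # mimics what opening the project does

# Write and read a small file using a path relative to the project root.
rel_path <- file.path("DATA", "survey.csv")  # hypothetical file name
write.csv(data.frame(id = 1:3, score = c(10, 12, 9)), rel_path, row.names = FALSE)
survey <- read.csv(rel_path)

# The same relative path resolves to a full path under the project root.
normalizePath(rel_path)

setwd(old_wd)  # restore the previous working directory
```

Because the code never mentions anything above the DATA folder, it should keep working when the project folder is moved or shared.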

There isn’t much of a learning curve to using an R project. In fact, in my experience, it often makes things a little easier: it gets students into the good habit of saving files in the project directory, and it makes paths in code easier to keep track of.

Tip 1.1: Use R Projects

We highly recommend using R projects, one for each project or analysis you are working on.

At times, a workload will actually require multiple projects. For instance, when designing a Shiny app, we might want one project for the package that performs the data wrangling and analysis functions and a separate project dedicated to the ui and server functions that run the application.

1.1.1 Create an R Project in RStudio

To create an R Project in RStudio, you need to perform the following steps:

  1. Select File > New Project... from the menu bar.
  2. If you do not already have a folder, select New Directory from the popup window. If you already have a folder in mind, select Existing Directory.
  3. If you chose New Directory then pick a meaningful name for your project folder, i.e., the Directory Name.
  4. Ensure this project folder is created in the right place.
  5. You have the option to Create a git repository. This is only relevant if you already have Git installed and wish to use version control.
  6. Next, tick Open in new session. This will open your R Project in a new RStudio window.
  7. Once you are happy with your choices, you can click Create Project.

1.1.2 Create an R Project in Code

This can also be accomplished with the usethis package, which automates many of the tasks associated with creating an R package or project. You can install it by running install.packages('usethis'); the install.packages() function installs packages, or libraries, that extend base R functionality. The create_project() function in usethis creates a new project in a new directory.

Code
usethis::create_project(
  path,
  rstudio = rstudioapi::isAvailable(),
  open = rlang::is_interactive()
)
  • path - A path. If it exists, it is used. If it does not exist, it is created, provided that the parent path exists.

  • rstudio - If TRUE, calls use_rstudio() to make the new package or project into an RStudio Project. If FALSE and a non-package project, a sentinel .here file is placed so that the directory can be recognized as a project by the here or rprojroot packages.

  • open

    • If TRUE, activates the new project:
      • If RStudio desktop, the package is opened in a new session.
      • If on RStudio server, the current RStudio project is activated.
      • Otherwise (outside RStudio), the working directory and active project are changed.
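
As a concrete illustration of the arguments above, the call below is a minimal sketch that creates a project in a temporary folder. The folder name my-analysis is hypothetical, and open = FALSE is chosen here only so that the current session is left as it is.

```r
# Hypothetical location: a throwaway folder inside the session's temp directory.
proj_path <- file.path(tempdir(), "my-analysis")

usethis::create_project(
  path    = proj_path,  # created because it does not exist yet
  rstudio = TRUE,       # add an .Rproj file so RStudio recognises the project
  open    = FALSE       # do not activate the new project
)

# Inspect what was created.
list.files(proj_path, all.files = TRUE, no.. = TRUE)
```

For a real analysis you would replace proj_path with a permanent location and would usually leave open at its default.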

1.2 Use Git?

While not necessary, Git, a distributed version control system, offers many advantages for software development. If you are here just to get started with R, you may prefer to skip this section on a first read and return to it later, so that you can focus your attention on other key aspects. If you do want to use Git, this section gives a brief introduction and a good idea of how to get started. There is also an important distinction between Git and GitHub: Git is a version control system that allows you to track changes in your code, while GitHub is a web-based platform that allows you to store your code in the cloud and collaborate with others. You can download Git from https://git-scm.com/. Creating a GitHub account is free and can be done at https://github.com/.
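
Before going further, it can help to check from R whether Git is already installed. The snippet below is a small sketch using only base R; the variable names are illustrative.

```r
# Look up the Git executable on the system PATH; an empty string means not found.
git_path <- Sys.which("git")
git_installed <- nzchar(git_path)[[1]]

if (git_installed) {
  # Print the installed Git version to the console.
  system2("git", "--version")
} else {
  message("Git was not found; it can be downloaded from https://git-scm.com/.")
}
```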

Some of the key benefits of using Git include:

  • Branching and merging: Git’s branching capabilities are one of its biggest advantages. Unlike centralized version control systems, Git branches are cheap and easy to merge. This facilitates the feature branch workflow popular with many Git users.
  • Distributed development: Git is a distributed version control system, which means that each developer gets their own local repository, complete with a full history of commits. Having a full local history makes Git fast and allows developers to work offline or remotely.
  • Security: Git uses the SHA-1 cryptographic hash function (Secure Hash Algorithm 1) to name and identify objects in its database, so any change to the content of a file changes its identifier.
  • Flexibility: Git is flexible and works on all major operating systems (Windows, macOS, and Linux).
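  • Small and quick: Git is lightweight, and most operations run locally, so they are fast.
  • Non-linear development: Git supports non-linear development through branches that can diverge and later be merged.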
  • Widespread acceptance: Git has widespread acceptance and a large user community.

However, nothing is perfect. Some of the disadvantages of using Git include:

  • Learning curve: Git has a steep learning curve due to the non-intuitive nature of its commands.
  • Binary files: Git does not excel at dealing with binary files. If files with non-text information are changed or utilized often, Git can slow down.
  • Working in the wrong area: It is possible to accidentally work in the wrong area of a repository.
  • Forced collaboration: Git can force collaboration on developers.
  • Permissions for users: Managing permissions for users can be difficult.
  • Large files: Git is not particularly well suited to handling large files. Options such as Git Large File Storage (LFS, https://git-lfs.com/) exist, but they add another layer of complexity when getting started.
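
Since large files are a common stumbling block, one option is to look for them before committing anything. The helper below is a sketch in base R; the function name find_large_files and the 50 MB default threshold are illustrative choices, not part of Git or any package.

```r
# List files under `dir` that are larger than `threshold_mb` megabytes.
# The function name and the default threshold are illustrative choices.
find_large_files <- function(dir = ".", threshold_mb = 50) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  size_mb <- file.size(files) / 1024^2
  keep <- !is.na(size_mb) & size_mb > threshold_mb
  result <- data.frame(file = files[keep], size_mb = round(size_mb[keep], 1))
  # Largest files first.
  result[order(-result$size_mb), , drop = FALSE]
}

# Check the current project folder.
find_large_files(".", threshold_mb = 50)
```

Files that show up here are candidates for leaving out of the repository or for tracking with Git LFS.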

For additional information about using Git we refer you to Chapter 14.

1.3 Download Data for this book

To download the data we will be using, you must first ensure that the googledrive and here packages are installed, since the code below uses both. You can install them by running install.packages(c('googledrive', 'here')).

Code
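library('googledrive')

# Create DATA/ and DATA/sql_db/ in one step; no warning if they already exist.
dir.create(here::here('DATA', 'sql_db'), recursive = TRUE, showWarnings = FALSE)

# Download the NHANES archive by its Google Drive file ID.
# The `verbose` argument is deprecated as of googledrive 2.0.0, so it is omitted.
drive_download(as_id("1ojkjNZ-upjcOyU3Pa3p-4A46NXJSeRzn"),
               path = here::here("DATA/NHANES.zip"),
               overwrite = TRUE)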

At this point, a prompt will appear in the console with the following message: “The googledrive package is requesting access to your Google account. Enter ‘1’ to start a new auth process or select a pre-authorized account.”

If you have previously used one of the Google-related R packages, your account will be listed. Otherwise, you can use the “Send me to the browser for a new auth process.” option.

Code
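# Extract the archive into DATA/sql_db.
unzip(here::here("DATA/NHANES.zip"),
      exdir = here::here("DATA/sql_db"))

# Delete the zip file once its contents have been extracted.
unlink(here::here("DATA/NHANES.zip"))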
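
As a quick sanity check, you can list what ended up in the data folder. This is a minimal sketch; the variable names are illustrative, and it makes no assumption about which files the archive contains.

```r
# List everything that was extracted into DATA/sql_db.
data_dir <- here::here("DATA", "sql_db")
extracted <- list.files(data_dir, recursive = TRUE)

if (length(extracted) == 0) {
  warning("No files found in ", data_dir,
          "; check that the download and unzip steps succeeded.")
} else {
  print(extracted)
}
```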

1.4 Using renv

The renv package helps you create reproducible environments for your R projects. Use renv to make your R projects more:

  • Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because renv gives each project its own private package library.

  • Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.

  • Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

To install renv you can use install.packages('renv').

For renv to start working within your project, you first need to make sure that you are in the correct project workspace. The function usethis::proj_get() retrieves the active project and, if necessary, attempts to set it first. Then run renv::init(). This initializes renv for the project and records the packages and package versions it uses in a lockfile called renv.lock.
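
Put together, the setup might look like the sketch below. It assumes that the usethis and renv packages are installed and that the code is run from inside the project you want to initialize.

```r
# Confirm which project is active before changing anything.
usethis::proj_get()

# Set up a project-specific library and write renv.lock in the project root.
renv::init()
```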

If you install a new package or update an existing package within a project, the change is not recorded in renv.lock until you run renv::snapshot() from within the R project. The snapshot records each package, its version, and, if known, the external source from which that package can be retrieved.

You can use renv::status() to report any differences between the project’s lockfile and the current state of the project’s library.

Finally, to restore or reproduce a project that used renv, navigate to the project and run renv::restore(). This restores the project’s dependencies from the lockfile previously generated by renv::snapshot().
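
The day-to-day cycle described in this section can be sketched as follows. It assumes that renv::init() has already been run in the project and that the code is executed from inside that project.

```r
# After installing or updating packages, record the library state in renv.lock.
renv::snapshot()

# Report any differences between renv.lock and the project library.
renv::status()

# On another machine, or after copying the project, reinstall the recorded versions.
renv::restore()
```

In practice, snapshot() and status() are run by whoever is developing the project, while restore() is run by whoever receives it.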