03: Research compendia with rrtools

By Paola Corrales and Elio Campitelli

Of course, you can always create a file structure that suits you and your work by hand. But if you are going to do that a lot, it might be nice to automate some of the work. We are going to use the rrtools package to create a “research compendium” with a couple of R functions, and combine it with what we learned in the Git and GitHub section.

The first thing you need to know is that an rrtools compendium is also an R package. If you don’t have experience with R packages, don’t worry! Here’s why:

  • An R package is, in practice, just a folder with some special text files and a few sub-folders.
  • You don’t need to learn much about packages to use rrtools and take advantage of it.
  • Here we’ll cover the necessary bits to work inside a compendium.
  • If you want to learn more about R packages, there are wonderful resources, for example here.

The following instructions are adapted from the rrtools GitHub repository.

0. Create a Git-managed directory linked to an online repository

It is possible to use rrtools without Git, but usually you will want your research compendium to be managed by the version control software Git.

The name of the compendium and the project should be the same. Keep in mind that this will also be the package name, so it has to follow some rules for everything to work properly. Your project name must:

  • contain only ASCII letters, numbers, and ‘.’
  • have at least two characters
  • start with a letter (not a number)
  • not end with ‘.’
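If you want to check a candidate name before creating the repository, the four rules above fit in a single regular expression. The sketch below uses a hypothetical helper, is_valid_pkg_name(), which is not part of rrtools or usethis and only covers the rules listed here.

```r
# Hypothetical helper (not part of rrtools or usethis): checks a proposed
# compendium name against the four rules listed above.
is_valid_pkg_name <- function(name) {
  # ^[A-Za-z]       must start with an ASCII letter
  # [A-Za-z0-9.]*   then only ASCII letters, numbers or dots
  # [A-Za-z0-9]$    must not end with a dot
  # Requiring a first and a last character also enforces "at least two characters".
  # perl = TRUE keeps the letter ranges strictly ASCII regardless of locale.
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

is_valid_pkg_name(c("pkgname", "my.study2", "2study", "study.", "a", "my_study"))
#> [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
```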

For convenience we will use pkgname as a placeholder for this package’s name.

  1. Create a repository on GitHub. Remember to pick a valid R package name.
  2. In RStudio, start a new Project:
  • File > New Project > Version Control > Git. In “Repository URL”, paste the URL of your new GitHub repository. It will look something like https://github.com/yourusername/pkgname.git.
  • Make sure you know where you are creating the project.
  • Choose “Open in new session”.
  • Click on “Create Project”.

1. rrtools::use_compendium()

This uses usethis::create_package() to create a basic R package in the pkgname directory. The function will also tell you what the next steps are.

Create the compendium

  1. In the new project, run rrtools::use_compendium(simple = FALSE) to create the compendium.
    You will see a message. At the end it will ask you:
Overwrite pre-existing file 'testrrtools.Rproj'?

1: Not now
2: Yes
3: No

Select the “Yes” option.

(It’s possible that a new RStudio window opens at this point. Close it and continue in the original session.)

  2. Edit the DESCRIPTION file (located in your pkgname directory) to include accurate metadata, e.g. your ORCID. This is one of the files that turns a regular folder into an “R package”.

  3. Periodically update the Imports: section of the DESCRIPTION file with the names of the packages used in the code you write in the .Rmd document(s) by running rrtools::add_dependencies_to_description().
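The DESCRIPTION file is plain text in the Debian Control File (DCF) format, so base R can read it with read.dcf(). The sketch below writes a made-up DESCRIPTION to a temporary file and pulls out the package names listed under Imports:. It only illustrates the format; rrtools::add_dependencies_to_description() is what actually updates that field for you.

```r
# Made-up DESCRIPTION content, for illustration only.
desc_text <- c(
  "Package: pkgname",
  "Title: What the Compendium Does",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr,",
  "    ggplot2 (>= 3.4.0)"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_text, desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)

# Split Imports on commas, trim spaces/newlines, drop version requirements.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
imports <- sub("\\s*\\(.*\\)$", "", imports)
imports <- imports[imports != ""]
imports
#> [1] "dplyr"   "ggplot2"
```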

2. usethis::use_mit_license(copyright_holder = "My Name")

This adds a reference to the MIT license in the DESCRIPTION file and generates a LICENSE file listing the name provided as the copyright holder.

To use a different license, replace this call with one of the other license functions listed in ?usethis::use_mit_license.

You can read more about licenses here.

3. rrtools::use_readme_qmd()

This generates a README.qmd that renders to README.md, ready to display on the repository home page on GitHub.

It contains:

  • A template citation to show others how to cite your project. Edit this to include the correct title and DOI when it gets published!
  • License information for the text, figures, code and data in your compendium.

This function also adds:

  • Two other markdown files: a code of conduct for users CONDUCT.md, and basic instructions for people who want to contribute to your project CONTRIBUTING.md, including for first-timers to git and GitHub.
  • A .binder/Dockerfile file that makes Binder work, if your compendium is hosted online (e.g. GitHub, Zenodo, Figshare, Dataverse, etc.)

4. rrtools::use_analysis()

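This function’s location = argument has three options: top_level, to create a top-level analysis/ directory; inst, to create an inst/ directory (so that all the sub-directories are available after the package is installed); and vignettes, to create a vignettes/ directory. The last two are more closely associated with R packages. The default is a top-level analysis/.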

This folder will have the following structure:

analysis/
│
├── paper/
│   ├── paper.Rmd       # this is the main document to edit
│   └── references.bib  # this contains the reference list information
│
├── figures/            # location of the figures produced by the Rmd
│
├── data/
│   ├── raw_data/       # data obtained from elsewhere
│   └── derived_data/   # data generated during the analysis
│
└── templates/
    ├── journal-of-archaeological-science.csl
    │                   # this sets the style of citations & reference list
    ├── template.docx   # used to style the output of the paper.Rmd
    └── template.Rmd

  • the paper.Rmd is ready to write in and render with bookdown. It includes:

    • a YAML header that identifies the references.bib file and the supplied csl file (to style the reference list).
    • a colophon that adds some git commit details to the end of the document. This means that the output file (HTML/PDF/Word) is always traceable to a specific state of the code.
  • the references.bib file has just one item to demonstrate the format. You can add more references to it or replace it with a .bib file exported from a reference manager such as Zotero.

  • you can replace the supplied csl file with a different citation style from https://github.com/citation-style-language/
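As mentioned at the start, you could also build this layout by hand. A minimal base-R sketch that recreates only the empty folders (not paper.Rmd, references.bib or the templates that rrtools fills in) inside a temporary directory might look like this; pkgname is the same placeholder used throughout.

```r
# Recreate only the folder skeleton of analysis/ in a temporary directory.
# This does not write any of the files that rrtools::use_analysis() provides.
root <- file.path(tempdir(), "pkgname")
dirs <- file.path(root, "analysis", c(
  "paper",
  "figures",
  "data/raw_data",
  "data/derived_data",
  "templates"
))

# recursive = TRUE also creates the intermediate analysis/ and data/ folders.
invisible(lapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE))

# Folder paths relative to analysis/ ("" is analysis/ itself).
list.dirs(file.path(root, "analysis"), full.names = FALSE)
```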

Remember that the Imports: field in the DESCRIPTION file must include the names of all packages used in analysis documents (e.g. paper.Rmd). rrtools has a helper function, rrtools::add_dependencies_to_description(), that scans the Rmd file, identifies the packages used in it, and adds them to the DESCRIPTION file.
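The idea behind this kind of scan can be sketched in a few lines of base R. find_packages_used() below is a hypothetical, simplified helper and not how rrtools implements add_dependencies_to_description(): it only looks for library()/require() calls and pkg::function references in the lines of an Rmd file, so it will miss other ways of using a package.

```r
# Hypothetical, simplified scanner; not the rrtools implementation.
find_packages_used <- function(lines) {
  name <- "[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]"  # a valid package name
  # library(pkg) / require(pkg), with or without quotes around pkg
  calls <- unlist(regmatches(
    lines,
    gregexpr(paste0("\\b(library|require)\\(\\s*['\"]?", name), lines, perl = TRUE)
  ))
  attached <- sub("^(library|require)\\(\\s*['\"]?", "", calls, perl = TRUE)
  # pkg::fun and pkg:::fun references (the lookahead keeps only the name)
  namespaced <- unlist(regmatches(
    lines,
    gregexpr(paste0(name, "(?=:::?)"), lines, perl = TRUE)
  ))
  sort(unique(c(attached, namespaced)))
}

rmd <- c(
  "```{r setup}",
  "library(ggplot2)",
  "library('here')",
  "```",
  "Some text: with a colon.",
  "```{r}",
  "dplyr::filter(mtcars, cyl == 4) |> nrow()",
  "```"
)
find_packages_used(rmd)  # returns c("dplyr", "ggplot2", "here")

# On a real compendium:
# find_packages_used(readLines("analysis/paper/paper.Rmd"))
```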

When you create this file structure with this function, you can choose whether to add your data to the repository. With data_in_git = FALSE, files in the data/ directory are excluded from git tracking and won’t appear on GitHub. You should set data_in_git = FALSE if your data files are large (GitHub blocks files larger than 100 MiB) or if you do not want to make them publicly accessible on GitHub. More about sharing data in the next section.
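Before deciding on data_in_git, it can help to check whether any of your data files are over GitHub’s limit. find_large_files() below is a hypothetical base-R helper, not part of rrtools; it assumes the limit is 100 MiB (100 * 1024^2 bytes) and lets you pass a different threshold.

```r
# Hypothetical helper: list files under `dir` larger than `limit_bytes`.
# Default threshold assumes GitHub's 100 MiB per-file limit.
find_large_files <- function(dir, limit_bytes = 100 * 1024^2) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(files)
  files[!is.na(sizes) & sizes > limit_bytes]
}

# e.g. from the compendium root (returns character(0) if nothing is too big):
find_large_files("analysis/data")
```

If this returns any paths, that is a strong hint to use data_in_git = FALSE and share the data some other way.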

Working with your own code

You can add small snippets of code right inside your Rmd document, but this is not optimal for code that you might want to reuse across documents in the same project or in various R scripts. It can also lead to a very long Rmd document that is hard to understand.

For complex functions, it’s better to put them in the R/ subfolder and include devtools::load_all(".") at the top of your paper.Rmd. In an interactive session, you can use Ctrl + Shift + L (Cmd + Shift + L on macOS), or run devtools::load_all(".") in the console.
This will make all functions and objects in that folder available in your environment.
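To get a feel for what “making the functions in R/ available” means, here is a rough base-R stand-in that sources every .R file in an R/ folder into a new environment. This is a swapped-in illustration, not how devtools works: devtools::load_all() roughly simulates installing and loading the whole compendium as a package, so keep using load_all() in practice. The folder and file names are made up for the demo.

```r
# Rough stand-in for illustration only: source every .R file in r_dir
# into a fresh environment and return that environment.
source_r_folder <- function(r_dir, envir = new.env()) {
  for (f in list.files(r_dir, pattern = "\\.[Rr]$", full.names = TRUE)) {
    sys.source(f, envir = envir)
  }
  envir
}

# Demo: a throwaway R/ folder containing one function file.
r_dir <- file.path(tempdir(), "demo-compendium", "R")
dir.create(r_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("add <- function(x, y) {", "  x + y", "}"),
  file.path(r_dir, "add-numbers.R")
)

env <- source_r_folder(r_dir)
env$add(2, 3)
#> [1] 5
```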

A stricter approach is to build your code as a package, install it on your system and load it with library(pkgname), or call its functions with pkgname::. But to do this, you need to know about documenting code.

Documenting code

Whether you choose to write code in chunks, script files or functions, it is important for your future self and others to document your code. If you are not used to doing it, the easiest way to start is by commenting everything without overthinking it. With time, you can decide whether you need to comment every single line or only the key decisions you made when you were writing the code.

If you decide to work with functions, there are a few tools that can help you document them and make that documentation available in the Help pane (remember, a compendium is also a package, so you can take advantage of that!). Each function will live in a .R file (there are some exceptions, but we won’t worry about that at this point) and will have a header with its title, a description and other information. You can use plain # comments or roxygen2 comments, which start with #'. The latter allow you to generate the package documentation.

The following instructions to document a function are adapted from the R packages book.

The documentation workflow

  1. Create a dummy function that adds two numbers and save it to R/add-numbers.R.

  2. Add roxygen comments to your function:

#' Add together two numbers
#' 
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' add(10, 1)
#' @export
add <- function(x, y) {
  x + y
}
  • The first line is the title.
  • You can also add a description after the title.
  • @param tags describe the function arguments, in this case x and y.
  • @examples starts the examples section; R runs this code when the package is checked (e.g. with R CMD check).
  • The @export tag is important to make the function accessible via pkgname::add() or library(pkgname).
  3. Run devtools::document() (or press Ctrl/Cmd + Shift + D in RStudio) to build the documentation from these roxygen comments.

  4. Preview the documentation with ?add.

  5. Rinse and repeat until the documentation looks the way you want.
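One thing R CMD check warns about is a mismatch between the @param tags and the function’s actual arguments. The sketch below mimics that single comparison in base R so you can see what is being checked. check_params() is a hypothetical helper, not part of roxygen2 or devtools, and it assumes one argument name per @param line.

```r
# Hypothetical helper (not part of roxygen2 or devtools): compare @param tags
# in a roxygen block with the formal arguments of the documented function.
check_params <- function(roxygen_lines, fun) {
  # Name right after "@param " (assumes one name per tag)
  tags <- regmatches(
    roxygen_lines,
    regexpr("(?<=@param )[A-Za-z._][A-Za-z0-9._]*", roxygen_lines, perl = TRUE)
  )
  args <- names(formals(fun))
  list(
    undocumented = setdiff(args, tags),     # arguments without @param
    not_in_function = setdiff(tags, args)   # @param with no matching argument
  )
}

roxygen_block <- c(
  "#' Add together two numbers",
  "#'",
  "#' @param x A number.",
  "#' @param y A number.",
  "#' @return The sum of \\code{x} and \\code{y}.",
  "#' @export"
)
add <- function(x, y) {
  x + y
}

str(check_params(roxygen_block, add))
```

Adding an argument z to add() without a matching @param line would make z show up under undocumented.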

Resources

A guide to modern reproducible data science with R