Introduction
This is another entry in my series of R + C based posts (you can see a full list here). This article focuses on a somewhat esoteric skill: constructing a global R object at the C level in a persistent way. By “persistent”, I mean that this object will be created only once (at package load time) and will be reusable throughout the life of the R session. You’ll be able to use it from other C files, and can even return the object to the R side. The other “trick” that will be used is a way to run arbitrary C code on R package load, using .onLoad() + .Call(). This is actually much more general than what we will use it for in this article, so it is worth paying attention to in case you have other uses for it. Along the way, I’ll also use C header files to share C functions/objects between files, and discuss a bit about how I set up my R packages that use C code.
vctrs and rlang.
Why are persistent R objects callable from C useful? I can think of two reasons.
- The first is performance. You might have a simple R object (for instance, an integer vector holding 1) that generally takes a small amount of time to create, but is generated and destroyed thousands of times across your C code base. To save a little bit of time, you might want to make this a persistent, unchangeable, global variable.
- The other is just for readability. Rather than having to deal with PROTECT()ing and UNPROTECT()ing common variables like int_one in the partial example below:
SEXP int_one = PROTECT(Rf_ScalarInteger(1));

// Create an R list of length 1, put `int_one` in it
SEXP result = PROTECT(Rf_allocVector(VECSXP, 1));
SET_VECTOR_ELT(result, 0, int_one);

UNPROTECT(2); // unprotect `int_one` and `result`
return result;
You can instead declare int_one as a global variable with a more permanent, meaningful name, like shared_int_one, and use it without worrying about protection:
SEXP result = PROTECT(Rf_allocVector(VECSXP, 1));

// can use `shared_int_one` without creating a new one
SET_VECTOR_ELT(result, 0, shared_int_one);

UNPROTECT(1); // only have to care about `result` protection
return result;
When you have a large C based R package, these kinds of things really pay off by making the package more readable and cohesive, especially if the global variable takes a few lines of C code to create each time. Additionally, if naming conventions for these kinds of variables are used consistently, you’ll immediately be able to recognize what shared_empty_dbl is without having to look it up in the code base. This makes reading over C code a more pleasant experience.
The rest of this post will focus on creating a package that constructs some of these global variables. Specifically, we will look at creating a shared empty integer vector and a shared character vector, and then we will see how to return them to the R side. One thing to keep in mind is that the first object takes a lot of setup on the C side, but adding subsequent objects is much simpler.
If you haven’t read Now You C Me, and you aren’t too familiar with working on an R package with C code in it, you might want to go check out that post before continuing. It will teach you the basics of working with an R package containing C code.
The final product is an R package called cshared. It contains one R function, get_shared_objects(). I’ll discuss the bits and pieces of the package throughout the post, but the package itself will be the ultimate reference for the end result.
Setup
First, some setup. We’ll leverage {usethis} and {devtools} to get our new package up and running. I’m assuming you are working in RStudio for this. The Now You C Me post describes these steps in much greater detail.
# Create a new R package, cshared
usethis::create_package("~/path/to/location/for/the/package/cshared")

# Use roxygen2
usethis::use_roxygen_md()

# As prompted by use_roxygen_md()
devtools::document()

# Set up `cshared-package.R`, which also gives usethis a place to add extra
# roxygen namespace tags, which is used by `use_c()` later on.
usethis::use_package_doc()

# Create a `src/shared.c` file, and add the all important registration info
# to `cshared-package.R`
usethis::use_c("shared")

# Initialize the C DLL, otherwise document() will complain
devtools::load_all(".")

# As prompted by use_c()
devtools::document()
Header Files
At this point you should be in an R package, and if you’ve opened shared.c you should see this staring at you:
#define R_NO_REMAP
#include <R.h>
#include <Rinternals.h>
I actually like to move these defines / includes into a package API header file that I can #include in all of my .c files, so I’m going to create a cshared.h file next and move them over there. There’s no shortcut for this, so in RStudio do File -> New File -> C++ File, then save it as cshared.h in the src/ folder. Copy those three lines to that file, and remove them from shared.c, replacing them with the following single include statement, which will have the same effect:
#include "cshared.h"
To prevent cshared.h from accidentally being included twice in the same file, we should also add some header include guards:
#ifndef CSHARED_H
#define CSHARED_H
#define R_NO_REMAP
#include <R.h>
#include <Rinternals.h>
#endif
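To see what the guard buys us, here is a small single-file sketch that pastes the same guarded "header" body twice, which is roughly what the compiler sees when a header ends up included twice in one file. The names DEMO_H, demo_pair, and demo_pair_sum are made up for illustration and are not part of cshared.

```c
/* First "inclusion": DEMO_H is not defined yet, so this body is kept. */
#ifndef DEMO_H
#define DEMO_H
struct demo_pair {
  int first;
  int second;
};
#endif

/* Second "inclusion": DEMO_H is now defined, so the preprocessor skips
   this copy. Without the guard, the repeated definition of
   `struct demo_pair` would be a compile error. */
#ifndef DEMO_H
#define DEMO_H
struct demo_pair {
  int first;
  int second;
};
#endif

/* Hypothetical helper that uses the struct, which was defined only once. */
int demo_pair_sum(struct demo_pair x) {
  return x.first + x.second;
}
```

If you delete the two pairs of guard lines, the file should stop compiling, which is exactly the failure the guards in cshared.h protect against.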
C -> R
Okay, now we have the basic structure set up, so let’s wire up a C function to be callable from the R side. For now, it will create a list containing an empty integer vector and a character vector holding "tidyverse", and return it to the R side. Later it will return the same list but holding the shared versions of these objects. Add the following function to shared.c:
#include "cshared.h"

SEXP cshared_get_shared_objects() {
  // An empty integer vector
  SEXP empty_int = PROTECT(Rf_allocVector(INTSXP, 0));

  // Character vector of size 1, containing "tidyverse"
  SEXP tidyverse = PROTECT(Rf_allocVector(STRSXP, 1));
  SET_STRING_ELT(tidyverse, 0, Rf_mkChar("tidyverse"));

  // Initialize the output list, then insert our objects into it
  SEXP out = PROTECT(Rf_allocVector(VECSXP, 2));
  SET_VECTOR_ELT(out, 0, empty_int);
  SET_VECTOR_ELT(out, 1, tidyverse);

  // Must unprotect 3 PROTECT() calls before exiting!
  UNPROTECT(3);
  return out;
}
To call this from R, we need an init.c file that registers the C routine with R. We’ve done something like this in the other blog post, so create init.c and fill it with:
#include <R.h>
#include <Rinternals.h>
#include <stdlib.h> // for NULL
#include <R_ext/Rdynload.h>
/* .Call calls */
extern SEXP cshared_get_shared_objects();
static const R_CallMethodDef CallEntries[] = {
{"cshared_get_shared_objects", (DL_FUNC) &cshared_get_shared_objects, 0},
{NULL, NULL, 0}
};
void R_init_cshared(DllInfo *dll) {
  R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
  R_useDynamicSymbols(dll, FALSE);
}
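The {NULL, NULL, 0} row at the end of CallEntries is a sentinel that marks the end of the table, and the last column of each row is the number of arguments the routine takes. As a rough, R-free sketch of how a sentinel-terminated table like this can be searched by name (this is not R's actual lookup code, and demo_fn, demo_entry, demo_entries, demo_find, and the two toy routines are all hypothetical names):

```c
#include <stddef.h> /* NULL */
#include <string.h> /* strcmp */

/* Generic function pointer, playing the role that DL_FUNC plays in R. */
typedef void (*demo_fn)(void);

/* One row of the table: routine name, function pointer, argument count. */
typedef struct {
  const char *name;
  demo_fn fun;
  int n_args;
} demo_entry;

/* Two toy routines to register. */
int demo_zero(void) { return 0; }
int demo_add(int x, int y) { return x + y; }

const demo_entry demo_entries[] = {
  {"demo_zero", (demo_fn) demo_zero, 0},
  {"demo_add", (demo_fn) demo_add, 2},
  {NULL, NULL, 0} /* sentinel row: marks the end of the table */
};

/* Walk the table until the sentinel; return the matching row or NULL. */
const demo_entry *demo_find(const demo_entry *table, const char *name) {
  for (const demo_entry *entry = table; entry->name != NULL; ++entry) {
    if (strcmp(entry->name, name) == 0) {
      return entry;
    }
  }
  return NULL;
}
```

Because every routine is stored behind the same generic pointer type, a caller has to cast the pointer back to the routine's real signature before calling it, which is why recording the argument count next to the name is useful.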
Over on the R side, we now need an R function that calls this cshared_get_shared_objects routine. Call usethis::use_r("shared") and fill the resulting R file with:
#' Get the shared objects
#'
#' @examples
#'
#' get_shared_objects()
#'
#' @export
get_shared_objects <- function() {
  .Call(cshared_get_shared_objects)
}
Lastly, run devtools::load_all() and devtools::document() to recompile the package and ensure that the shiny new get_shared_objects() is exported.
You should now be able to call:
get_shared_objects()
#> [[1]]
#> integer(0)
#>
#> [[2]]
#> [1] "tidyverse"
The tidyverse string
The final step is to make the tidyverse string global and shared. Now that we have the infrastructure set up, this is much more straightforward. Update utils.h with a strings_tidyverse variable:
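#ifndef CSHARED_UTILS_H
#define CSHARED_UTILS_H

#include "cshared.h"

// `extern` makes these declarations only; the definitions live in utils.c.
// Without it, compilers that default to -fno-common (GCC 10 and later)
// report duplicate definitions at link time.
extern SEXP cshared_shared_empty_int;
extern SEXP strings_tidyverse;

#endif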
Update utils.c with:
#include "cshared.h"
#include "utils.h"

SEXP cshared_shared_empty_int = NULL;

// This is new
SEXP strings_tidyverse = NULL;

SEXP cshared_init_utils() {
  cshared_shared_empty_int = Rf_allocVector(INTSXP, 0);
  R_PreserveObject(cshared_shared_empty_int);
  MARK_NOT_MUTABLE(cshared_shared_empty_int);

  // This is new
  strings_tidyverse = Rf_allocVector(STRSXP, 1);
  R_PreserveObject(strings_tidyverse);
  SET_STRING_ELT(strings_tidyverse, 0, Rf_mkChar("tidyverse"));
  MARK_NOT_MUTABLE(strings_tidyverse);

  return R_NilValue;
}
This does much the same as what we did with cshared_shared_empty_int. It creates a character vector of size 1 to overwrite the NULL global variable, preserves it, sets the first element value to "tidyverse", then marks it as immutable.
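Stripped of the R API, this is an ordinary C idiom: declare the global in a header, define it as NULL in exactly one .c file, and fill it in from an init function that runs once. Here is a single-file sketch of that idea. The names demo_shared_greeting, demo_init, and demo_get_greeting are hypothetical stand-ins, plain malloc() takes the place of Rf_allocVector(), and since there is no garbage collector here, nothing corresponds to R_PreserveObject().

```c
#include <stdlib.h> /* malloc */
#include <string.h> /* strcpy */

/* What would live in the header: `extern` makes this a declaration only. */
extern const char *demo_shared_greeting;

/* What would live in exactly one .c file: the definition, starting as NULL. */
const char *demo_shared_greeting = NULL;

/* Meant to run once at "load time".
   Returns 0 on success and 1 if the allocation fails. */
int demo_init(void) {
  if (demo_shared_greeting != NULL) {
    return 0; /* already initialized, keep the existing object */
  }

  char *buffer = malloc(sizeof "tidyverse"); /* includes the trailing '\0' */
  if (buffer == NULL) {
    return 1;
  }
  strcpy(buffer, "tidyverse");

  /* Publishing through a pointer-to-const discourages callers from
     modifying the shared value, loosely like MARK_NOT_MUTABLE(). */
  demo_shared_greeting = buffer;
  return 0;
}

/* Any other function can now read the global without creating a copy. */
const char *demo_get_greeting(void) {
  return demo_shared_greeting;
}
```

Calling demo_init() a second time leaves the pointer untouched, so every caller of demo_get_greeting() sees the same object for the life of the program.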
Finally, we can go back to shared.c and use strings_tidyverse.
#include "cshared.h"
#include "utils.h" // To access `cshared_shared_empty_int` and `strings_tidyverse`

SEXP cshared_get_shared_objects() {
  SEXP out = PROTECT(Rf_allocVector(VECSXP, 2));
  SET_VECTOR_ELT(out, 0, cshared_shared_empty_int);
  SET_VECTOR_ELT(out, 1, strings_tidyverse);

  UNPROTECT(1);
  return out;
}
One thing that I hope is clear is how much more focused cshared_get_shared_objects() is. It’s much easier to see what the purpose of the function is when you don’t have to worry about creating these common shared objects. Additionally, you only have to UNPROTECT() 1 value, out, which makes things slightly easier to keep track of. I also appreciate the fact that we can give our global objects evocative names like strings_tidyverse. If I had another string object I wanted to make into a global variable, I could call it strings_dplyr. When I come across other C code that uses this variable, I immediately know what its value is because of this consistent naming convention.
Conclusion
These global variables are a neat trick for making code clearer, more internally consistent, and occasionally a bit faster. Additionally, being able to call arbitrary C code on R package load is a useful tool in more ways than just global variable initialization (which we didn’t get to explore in this post). In a later post, I hope to show how to use this trick to initialize a variable holding a call object that lets you efficiently call an R function from C.