rde Tutorial

Stefan Kloppenborg

2025-12-12

When sharing R Notebooks with others, it’s not uncommon for the notebook to reference data that is only available on your machine. It could be that the recipient does not have access to a certain database, or it could be as simple as you forgetting to email them a CSV file with the data. In either of these cases, the analysis in the notebook is not self-contained. The package rde solves this problem by allowing you to embed the data directly in the notebook.

If you’re running on an X11 system (e.g. Linux or similar), please read the section “Installing on X11 Systems” below before proceeding.

Let’s take an example. Say that we have a spreadsheet of the populations of the ten most populous countries (data originally taken from [1]). Somewhere near the top of our R Notebook, we have a code chunk that looks like the following:

library(knitr)  # provides kable()

fname <- "country_pop.csv"
pop.data <- read.csv(fname, stringsAsFactors = FALSE)
kable(pop.data)
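
| Country       | Population |
|:--------------|-----------:|
| China         | 1384688986 |
| India         | 1296834042 |
| United States |  329256465 |
| Indonesia     |  262787403 |
| Brazil        |  208846892 |
| Pakistan      |  207862518 |
| Nigeria       |  195300343 |
| Bangladesh    |  159453001 |
| Russia        |  142122776 |
| Japan         |  126168156 |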

Now, if you send your notebook to someone else and don’t send along the file country_pop.csv, that person can look at your notebook, but they won’t be able to re-run it.

If you want to include the data directly in the notebook, you can use rde to do so.

rde provides two functions: load_rde_var and copy_rde_var. You’ll use load_rde_var in your notebook, and you’ll use copy_rde_var to create one of the arguments that load_rde_var needs.

The function load_rde_var takes three arguments. The first argument is a boolean (we’ll come back to this). The second argument is load.fcn. This is a piece of code that loads data from a source of your choosing (a CSV file, a database, etc.). This is the code that needs to work on your computer; it does not need to work on the computer of the notebook recipient. The third argument is cache. This argument is an encoded copy of the data.

When you call load_rde_var, the function will first try to load the data using the code in the load.fcn argument. If this fails, it will fall back on using the cache. In the latter case, it will give you a message to say that it used the cache instead of loading new data. This is what the recipient of your notebook would see if you neglected to send them the data file.

If load_rde_var succeeds in loading the data using the code in load.fcn, it will then compare this data with the data in cache. If there’s a difference, it will give you a warning. If you expected the data to change, you can go ahead and update the third argument (again using copy_rde_var); if you didn’t expect the data to change, well, now you know that it did change.

Now we’ll come back to that first argument of load_rde_var. This argument is a boolean called use.cache. It allows you to force load_rde_var to load data from the cache instead of running the code in load.fcn. Under most circumstances, this should be FALSE. However, sometimes it may take a very long time to load your data from its original source (maybe the code executes a very long-running database query, or scrapes a million webpages and just gives you a summary statistic). If you don’t want to wait around while the data loads from its original source again, you can set that first argument to TRUE and just use the cached data.
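The three behaviours described above (forced use of the cache, fall-back on failure, and a warning on mismatch) can be summarised in a small base R sketch. This is not rde’s implementation: load_or_fallback is a hypothetical helper, its cached argument is a plain R object rather than rde’s encoded string, and the message text is only illustrative.

```r
# Hypothetical helper, NOT part of rde: it only mirrors the behaviour
# described in the text. `cached` is a plain R object here, whereas rde
# stores an encoded string.
load_or_fallback <- function(use.cache, load.expr, cached) {
  if (use.cache) {
    return(cached)  # skip the (possibly slow) load entirely
  }
  # `load.expr` is evaluated lazily, so any error it raises is caught here.
  # (A load that legitimately returns NULL is treated as a failure.)
  loaded <- tryCatch(load.expr, error = function(e) NULL)
  if (is.null(loaded)) {
    message("Loading failed; using the cached data instead")
    return(cached)
  }
  if (!isTRUE(all.equal(loaded, cached))) {
    warning("Cached data is different from loaded data")
  }
  loaded
}

cached <- data.frame(
  Country = c("China", "India"),
  Population = c(1384688986, 1296834042),
  stringsAsFactors = FALSE
)

# This file does not exist, so read.csv() fails and the cache is returned.
# read.csv() also emits its own "cannot open file" warning along the way.
missing.file <- file.path(tempdir(), "no_such_file.csv")
res <- load_or_fallback(
  use.cache = FALSE,
  load.expr = read.csv(missing.file, stringsAsFactors = FALSE),
  cached = cached
)
```

Because R evaluates function arguments lazily, the loading code is never run when use.cache is TRUE, which is what makes skipping a slow query possible in a design like this.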

Continuing with our example of loading the populations of the ten most populous countries, we would start by wrapping our existing code inside the second argument of load_rde_var. It would now look like this:

library(rde)

pop.data <- load_rde_var(
  use.cache = FALSE,
  load.fcn = {
    read.csv(fname, stringsAsFactors = FALSE)
  },
  cache = NULL  # We'll fill this in shortly
)
#> Cache is empty or not a string
#> Warning in doTryCatch(return(expr), name, parentenv, handler): Cached data is
#> different from loaded data

If we run that code as is, it will raise a warning. We would expect this since there is nothing in the cache argument, so of course the data loaded by load.fcn and the data in cache are different. We’ll need to fill in the cache argument of load_rde_var.

You’d start by loading your data into memory as you normally would (the code above would work fine). Once the data pop.data is in memory, you’re going to copy it into the cache argument of load_rde_var. You can use copy_rde_var to do so.

In the console, you would type:

copy_rde_var(pop.data)

When you execute this, your clipboard will contain an encoded string from which the variable can be recreated. Your clipboard will look like this:

rde1QlpoOTFBWSZTWQy+/kYAAIB3/v//6EJABRg/WlQv797wYkAAAMQiABBAACAAAZGwANk0RTKejU9TRoBoGgGjTRoBoGgaGymE0Kp+qemmkDNQ0YmJk0AA0xNADQNPUaA0JRhDTJoANAAAAAAAAEJx2Eja7QBKMKPPkRAx63wSAWt31AABs1zauhwHifs5WlltyIyQKAAAZEAZGQYMIZEA6ZAPHVMEB71jSCqdlsiR/eSYkzQkRq5RoXgvNNZnB5RSOvKaTGFtc/SXc74AhzqhMEJvdisEGVfo7UYngc0AwGqTvTHx8CBZTzE9OQZZVY8KAhHAhrG4RCeilM0rXKkdpjGqyNgJwAkmnPQOMYrLlQ4YTIv0WyxfYdkd9WSWUsvggC/i7kinChIBl9/IwA==

You can go ahead and paste that into the cache argument of load_rde_var. Make sure that you paste it inside a pair of quotes. The code at the top of your notebook will now look like the following. Line breaks and spaces within the cache argument don’t matter, so don’t worry about indenting to make your code pretty.

library(rde)

pop.data <- load_rde_var(
  use.cache = FALSE,
  load.fcn = {
    fname <- system.file("extdata", "country_pop.csv", package = "rde")
    read.csv(fname, stringsAsFactors = FALSE)
  },
  cache = "
    rde1QlpoOTFBWSZTWQy+/kYAAIB3/v//6EJABRg/WlQv797wYkAAAMQiABBAACAAAZGwANk0RTKejU9T
    RoBoGgGjTRoBoGgaGymE0Kp+qemmkDNQ0YmJk0AA0xNADQNPUaA0JRhDTJoANAAAAAAAAEJx2Eja7QBK
    MKPPkRAx63wSAWt31AABs1zauhwHifs5WlltyIyQKAAAZEAZGQYMIZEA6ZAPHVMEB71jSCqdlsiR/eSY
    kzQkRq5RoXgvNNZnB5RSOvKaTGFtc/SXc74AhzqhMEJvdisEGVfo7UYngc0AwGqTvTHx8CBZTzE9OQZZ
    VY8KAhHAhrG4RCeilM0rXKkdpjGqyNgJwAkmnPQOMYrLlQ4YTIv0WyxfYdkd9WSWUsvggC/i7kinChIB
    l9/IwA==
  "
)

Now, when we run this, it won’t raise a warning because the data loaded by load.fcn and the data in cache are the same.
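To see why line breaks and indentation inside the pasted cache string are harmless, consider the following base R sketch. It assumes, as the text implies, that whitespace carries no information in the encoded string, so removing it recovers the single-line form. The fragment is only the beginning of the string shown earlier, the variable names are illustrative, and this is not rde’s own decoding code.

```r
# First part of the encoded string, pasted across two indented lines
pasted <- "
    rde1QlpoOTFBWSZTWQy+/kYAAIB3
    /v//6EJABRg/WlQv797wYkAAAMQi
"

# The same fragment as a single unbroken line
single.line <- "rde1QlpoOTFBWSZTWQy+/kYAAIB3/v//6EJABRg/WlQv797wYkAAAMQi"

# Dropping every whitespace character makes the two forms identical
compact <- gsub("[[:space:]]+", "", pasted)
identical(compact, single.line)
#> [1] TRUE
```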

If you send this notebook to someone else, but neglect to send the data file, they can still play around with the data because it’s now directly in the code. They will, however, get a message indicating that the data has been loaded from cache.

What if you inadvertently change the data file? Or if you’re reading the data from a database that changes? Well, if that happens, the data loaded by load.fcn and the data in cache won’t match. In this case, you’ll get a warning. This can be useful: maybe you didn’t expect the data to change, or maybe you need to update some of the text in your notebook, since some of your conclusions or explanations may need to change. Assuming that the change in the data file (or database) isn’t some sort of mistake, make sure that you update the value of the cache argument with the new data (again, you’ll use the copy_rde_var function to do so).

Installing on X11 Systems

If you’re on an X11 system (like Linux), you’ll need to install some additional software. You should not have to do this on Windows or Mac. On X11 systems, you’ll need to install either xsel or xclip. Depending on the distribution that you use, you will probably install it using a command like:

sudo apt-get install xsel
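Before calling copy_rde_var on an X11 system, it may help to check that one of these utilities is actually installed. The following is a small sketch using base R’s Sys.which, which returns an empty string for any program it cannot find on the PATH. It is not part of rde, and the variable names are only illustrative.

```r
# Look up both clipboard utilities on the PATH; missing ones come back as ""
tools <- Sys.which(c("xsel", "xclip"))
available <- names(tools)[nzchar(tools)]

if (length(available) == 0) {
  message("Neither xsel nor xclip was found; install one of them first")
} else {
  message("Clipboard utility found: ", paste(available, collapse = ", "))
}
```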

References

[1] U.S. Census Bureau, “Current Population.” [Online]. Available: https://www.census.gov/popclock/print.php?component=counter. [Accessed: 13-Mar-2018]
