Movatterモバイル変換


[0]ホーム

URL:


Using etl

Ben Baumer

2025-07-24

etl is an R package to facilitateExtract -Transform - Load (ETL) operations formedium data.The end result is generally a populated SQL database, but the userinteraction takes place solely within R.

Usingetl

Instantiate anetl object using a string that determinesthe class of the resulting object, and the package that provides accessto that data. The trivialmtcars database is built intoetl.

library(etl)cars<-etl("mtcars")
## No database was specified so I created one for you at:
## /tmp/Rtmp00cuuZ/file276e72f83f738.sqlite3
class(cars)
## [1] "etl_mtcars"           "etl"                  "src_SQLiteConnection"## [4] "src_dbi"              "src_sql"              "src"

Pay careful attention to where the SQLite database is stored. Thedefault location is a temporary directory, but you will want to movethis to a more secure location if you want this storage to bepersistent. Seefile.copy() for examples on how to move afile.

Connect to a local or remote database

etl works with a local or remote database to store yourdata. Everyetl object extends adplyr::src_dbi object. If, as in the example above, you donot specify a SQL source, a localRSQLite database will becreated for you. However, you can also specify any source that inheritsfromdplyr::src_dbi.

Note: If you want to use a database other than a local RSQLite, youmust create themtcars database and have permission towrite to it first!

# For PostgreSQLlibrary(RPostgreSQL)db<-src_postgres(dbname ="mtcars",user ="postgres",host ="localhost")# Alternatively, for MySQLlibrary(RMariaDB)db<-src_mysql(dbname ="mtcars",user ="r-user",password ="mypass",host ="localhost")cars<-etl("mtcars", db)

At the heart ofetl are three functions:etl_extract(),etl_transform(), andetl_load().

Extract

The first step is to acquire data from an online source.

cars|>etl_extract()
## Extracting raw data...

This creates a local store of raw data.

Transform

These data may need to be transformed from their raw form to filessuitable for importing into SQL (usually CSVs).

cars|>etl_transform()

Load

Populate the SQL database with the transformed data.

cars|>etl_load()
## Loading 12 file(s) into the database...

Do it all at once

To populate the whole database from scratch, useetl_create.

cars|>etl_create()

You can also update an existing database without re-initializing, butwatch out for primary key collisions.

cars|>etl_update()

Step-by-step

Under the hood, there are three functions thatetl_update chains together:

getS3method("etl_update","default")
## function (obj, ...) ## {##     obj <- etl_load(etl_transform(etl_extract(obj, ...), ...), ##         ...)##     invisible(obj)## }## <bytecode: 0x5c2a32066318>## <environment: namespace:etl>

etl_create is simply a call toetl_updatethat forces the SQL database to be written from scratch.

getS3method("etl_create","default")
## function (obj, ...) ## {##     obj <- etl_cleanup(etl_update(etl_init(obj, ...), ...), ...)##     invisible(obj)## }## <bytecode: 0x5c2a340dc080>## <environment: namespace:etl>

Do Your Analysis

Now that your database is populated, you can work with it as asrc data table just like any otherdplyrsource.

cars|>tbl("mtcars")|>group_by(cyl)|>summarise(N =n(),mean_mpg =mean(mpg))
## Warning: Missing values are always removed in SQL aggregation functions.## Use `na.rm = TRUE` to silence this warning## This warning is displayed once every 8 hours.
## # Source:   SQL [?? x 3]## # Database: sqlite 3.50.1 [/tmp/Rtmp00cuuZ/file276e72f83f738.sqlite3]##     cyl     N mean_mpg##   <int> <int>    <dbl>## 1     4    11     26.7## 2     6     7     19.7## 3     8    14     15.1

Extendingetl

Create your own ETL packages

Suppose you want to create your own ETL package calledpkgname. All you have to do is write a package thatrequiresetl, and then you have to writeone S3methods:

etl_extract.etl_pkgname()

You may also wish to write

etl_transform.etl_pkgname()etl_load.etl_pkgname()

All of these functions must take and return an object of classetl_pkgname that inherits frometl. Please seethe “Extending etl” vignette for moreinformation.

Use other ETL packages

Packages that use theetl framework are available onCRAN and/or GitHub:

tools::dependsOnPkgs("etl")
## [1] "airlines"

[8]ページ先頭

©2009-2025 Movatter.jp