
Introduction for developers

Source: vignettes/developing.Rmd

If you’re interested in contributing to arrow, this article explains our approach at a high level. At the end of the article we have included links to articles that expand on this in various ways.

Package structure and conventions

It helps to first outline the structure of the package.

C++ is an object-oriented language, so the core logic of the Arrow C++ library is encapsulated in classes and methods. In the arrow R package, these classes are implemented as R6 classes, most of which are exported from the namespace.

In order to match the C++ naming conventions, the R6 classes are named in “TitleCase”, e.g. RecordBatch. This makes it easy to look up the relevant C++ implementations in the code or documentation. To simplify things in R, the C++ library namespaces are generally dropped or flattened; that is, where the C++ library has arrow::io::FileOutputStream, it is just FileOutputStream in the R package. One exception is for the file readers, where the namespace is necessary to disambiguate. So arrow::csv::TableReader becomes CsvTableReader, and arrow::json::TableReader becomes JsonTableReader.
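As a quick check of the naming convention, the sketch below confirms that the flattened class names mentioned above are available as R6 class generators once the package is attached. It assumes only that the arrow package is installed; nothing else is introduced.

```r
library(arrow)

# The flattened names from the paragraph above, looked up in the
# attached arrow namespace.
generators <- list(
  FileOutputStream = FileOutputStream,  # arrow::io::FileOutputStream in C++
  CsvTableReader   = CsvTableReader,    # arrow::csv::TableReader in C++
  JsonTableReader  = JsonTableReader    # arrow::json::TableReader in C++
)

# Each one should be an R6 class generator rather than a plain function.
is_r6 <- vapply(generators, inherits, logical(1), what = "R6ClassGenerator")
print(is_r6)
```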

Some of these classes are not meant to be instantiated directly; they may be base classes or other kinds of helpers. For those that you should be able to create, use the $create() method to instantiate an object. For example, rb <- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10)) will create a RecordBatch. Many of these factory methods that an R user might most often encounter also have a “snake_case” alias, in order to be more familiar for contemporary R users. So record_batch(int = 1:10, dbl = as.numeric(1:10)) would do the same as RecordBatch$create() above.
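The two construction styles can be compared side by side. This is a minimal sketch that assumes the arrow package is installed; the variable names rb_r6 and rb_alias are only illustrative.

```r
library(arrow)

# R6 factory method, named after the C++ class
rb_r6 <- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))

# "snake_case" alias for the same factory
rb_alias <- record_batch(int = 1:10, dbl = as.numeric(1:10))

# Both calls return a RecordBatch object with the same contents.
inherits(rb_alias, "RecordBatch")
rb_r6$Equals(rb_alias)
```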

The typical user of the arrow R package may never deal directly with the R6 objects. We provide more R-friendly wrapper functions as a higher-level interface to the C++ library. An R user can call read_parquet() without knowing or caring that they’re instantiating a ParquetFileReader object and calling the $ReadTable() method on it. The classes are there and available to the advanced programmer who wants fine-grained control over how the C++ library is used.
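To make the two levels concrete, the sketch below reads the same file once through the wrapper and once through the R6 class. It assumes an arrow build with Parquet support and uses a throwaway temporary file; the low-level path shown here goes through the reader's $ReadTable() method and is an illustration, not a copy of what read_parquet() does internally.

```r
library(arrow)

# Write a small Parquet file to a temporary location.
tf <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), tf)

# High-level interface: a single call that returns a data frame.
df <- read_parquet(tf)

# Lower-level interface: create the reader object explicitly,
# then ask it for an Arrow Table.
reader <- ParquetFileReader$create(tf)
tab <- reader$ReadTable()

# Converting the Table to a data frame gives the same columns and rows.
df_from_table <- as.data.frame(tab)

unlink(tf)
```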

Approach to implementing functionality

Our general philosophy when implementing functionality is to match existing R function signatures which may be familiar to users, whilst exposing any additional functionality available via Arrow. The intention is to let users run their existing code with minimal changes, without having to learn new code or approaches.

There are a number of ways in which we do this:

  • When implementing a function with an R equivalent, support the arguments available in the R version as much as possible: use the original parameter names and translate them to the arrow parameter names inside the function.

  • If there are arrow parameters which do not exist in the R function, allow the user to pass in those options too.

  • Where necessary, add extra arguments to the function signature for a feature that doesn’t exist in R but does in Arrow (e.g., passing in a schema when reading a CSV dataset).
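The first and third points can be seen from the user's side with read_csv_arrow(). This is a minimal sketch, assuming the arrow package is installed: col_names and skip follow argument names familiar from readr, while schema is an extra argument for a feature that exists in Arrow. The file and column names are made up for the example.

```r
library(arrow)

# A small CSV file with a header row.
tf <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(1, 3), y = c(2, 4)), file = tf, row.names = FALSE)

# Familiar R-style arguments: supply column names and skip the header row.
df_r_style <- read_csv_arrow(tf, col_names = c("x", "y"), skip = 1)

# Arrow-specific argument: a schema fixes both column names and types,
# so y is read as a string column here instead of a number.
df_schema <- read_csv_arrow(
  tf,
  schema = schema(x = int32(), y = utf8()),
  skip = 1
)

unlink(tf)
```

Existing code that already uses the familiar argument names needs no changes, and the schema argument is only required when the caller wants Arrow-level control over types.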

Further Reading

