arrow 22.0.0

New features

Minor improvements and fixes

arrow 21.0.0.1

Minor improvements and fixes

arrow 21.0.0

New features

Minor improvements and fixes

arrow 20.0.0.2

Minor improvements and fixes

arrow 20.0.0

Minor improvements and fixes

arrow 19.0.1.1

Minor improvements and fixes

arrow 19.0.1

This release primarily updates the underlying Arrow C++ version used by the package to version 19.0.1 and includes all changes from the 19.0.0 and 19.0.1 releases. For what’s changed in Arrow C++ 19.0.0, please see the blog post and changelog. For what’s changed in Arrow C++ 19.0.1, please see the blog post and changelog.

arrow 18.1.0

Minor improvements and fixes

arrow 17.0.0

New features

Minor improvements and fixes

arrow 16.1.0

New features

Minor improvements and fixes

arrow 15.0.1

New features

Minor improvements and fixes

arrow 14.0.2.1

Minor improvements and fixes

arrow 14.0.2

Minor improvements and fixes

arrow 14.0.0.2

Minor improvements and fixes

Installation

arrow 14.0.0.1

Minor improvements and fixes

arrow 14.0.0

New features

Minor improvements and fixes

Installation

arrow 13.0.0.1

arrow 13.0.0

Breaking changes

New features

Minor improvements and fixes

Installation

Docs

arrow 12.0.1.1

arrow 12.0.1

arrow 12.0.0

New features

Installation

Minor improvements and fixes

arrow 11.0.0.3

Minor improvements and fixes

arrow 11.0.0.2

Breaking changes

New features

Docs

Reading/writing data

dplyr compatibility

Function bindings

Arrow object creation

Installation

Minor improvements and fixes

arrow 10.0.1

Minor improvements and fixes

arrow 10.0.0

Arrow dplyr queries

Several new functions can be used in queries:

The package now has documentation that lists all dplyr methods and R function mappings that are supported on Arrow data, along with notes about any differences in functionality between queries evaluated in R versus in Acero, the Arrow query engine. See ?acero.

A few new features and bugfixes were implemented for joins:

Some changes to improve the consistency of the API:

Finally, long-running queries can now be cancelled and will abort their computation immediately.

Arrays and tables

as_arrow_array() can now take blob::blob and ?vctrs::list_of, which convert to binary and list arrays, respectively. Also fixed an issue where as_arrow_array() ignored the type argument when passed a StructArray.

The unique() function works on ?Table, ?RecordBatch, ?Dataset, and ?RecordBatchReader.
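A minimal sketch of the type fix and of unique(), assuming a recent arrow release (so that arrow_table() and as_arrow_array() exist) with dplyr installed; the column names and values are made up for illustration. Since this entry does not say whether unique() hands back a Table or a lazy query, the sketch finishes with dplyr::collect(), which yields an R data frame in either case.

```r
library(arrow, warn.conflicts = FALSE)

# Ask for an explicit Arrow type instead of the inferred int32.
arr <- as_arrow_array(1:3, type = float64())
print(arr$type)

# unique() on a Table drops duplicate rows; collect() returns a data frame.
tbl <- arrow_table(x = c(1, 1, 2), y = c("a", "a", "b"))
deduped <- dplyr::collect(unique(tbl))
print(deduped)
```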

Reading and writing

write_feather() can take compression = FALSE to write uncompressed files.

Also, a breaking change for IPC files in write_dataset(): passing "ipc" or "feather" to format will now write files with the .arrow extension instead of .ipc or .feather.
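A short sketch of both reading-and-writing changes, assuming arrow 10.0.0 or later; the data frame and the temporary paths are illustrative only.

```r
library(arrow, warn.conflicts = FALSE)

df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# write_feather(): compression = FALSE requests an uncompressed file.
feather_path <- tempfile(fileext = ".feather")
write_feather(df, feather_path, compression = FALSE)
back <- read_feather(feather_path)

# write_dataset(): format = "feather" (or "ipc") now yields *.arrow files.
out_dir <- tempfile("ds_")
write_dataset(df, out_dir, format = "feather")
written <- list.files(out_dir)
print(written)
```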

Installation

As of version 10.0.0, arrow requires C++17 to build. This means that:

arrow 9.0.0

Arrow dplyr queries

Reading and writing

Arrays and tables

Packaging

arrow 8.0.0

Enhancements to dplyr and datasets

Enhancements to date and time support

Extensibility

Concatenation Support

Arrow arrays and tables can be easily concatenated:
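A sketch, assuming arrow 8.0.0 or later, of three ways to concatenate: concat_arrays() copies its inputs into one contiguous Array, c() on ChunkedArrays keeps the existing chunks, and concat_tables() stacks Tables row-wise. The inputs are toy vectors.

```r
library(arrow, warn.conflicts = FALSE)

# concat_arrays() copies the inputs into a single Array.
a <- concat_arrays(Array$create(1:2), Array$create(3:4))

# c() on ChunkedArrays avoids the copy by keeping each input as a chunk.
ca <- c(chunked_array(1:2), chunked_array(3:4))

# concat_tables() appends the rows of one Table to another.
stacked <- concat_tables(arrow_table(x = 1:2), arrow_table(x = 3:4))
print(stacked)
```

When a single contiguous buffer is not actually needed, the ChunkedArray route is usually the cheaper choice because no data is copied.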

Other improvements and fixes

arrow 7.0.0

Enhancements to dplyr and datasets

CSV

Other improvements and fixes

Installation

Under-the-hood changes

arrow 6.0.1

arrow 6.0.0

There are now two ways to query Arrow data:

1. Expanded Arrow-native queries: aggregation and joins

dplyr::summarize(), both grouped and ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches. Because data is scanned in chunks, you can aggregate over larger-than-memory datasets backed by many files. Supported aggregation functions include n(), n_distinct(), min(), max(), sum(), mean(), var(), sd(), any(), and all(). median() and quantile() with one probability are also supported and currently return approximate results using the t-digest algorithm.

Along with summarize(), you can also call count(), tally(), and distinct(), which effectively wrap summarize().

This enhancement does change the behavior of summarize() and collect() in some cases: see “Breaking changes” below for details.
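A minimal sketch of grouped aggregation evaluated by Arrow, assuming a recent arrow release plus dplyr. The in-memory mtcars Table stands in for a larger Dataset, and the same pipeline should apply to one opened with open_dataset(). arrange() is included because the row order of grouped output should not be relied on.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

tbl <- arrow_table(mtcars)

# Grouped summarize() runs in Arrow; collect() brings the result into R.
by_cyl <- tbl %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg)) %>%
  arrange(cyl) %>%
  collect()
print(by_cyl)

# count() goes through the same summarize() machinery.
counts <- tbl %>% count(gear) %>% arrange(gear) %>% collect()
print(counts)
```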

In addition to summarize(), mutating and filtering equality joins (inner_join(), left_join(), right_join(), full_join(), semi_join(), and anti_join()) are also supported natively in Arrow.
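A sketch of one mutating join and two filtering joins on small Arrow Tables, assuming a recent arrow release plus dplyr; the people and scores tables are hypothetical. Joined rows come back in no guaranteed order, so the results are sorted with arrange() before collecting.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical tables sharing an integer key column "id".
people <- arrow_table(id = 1:3, name = c("ann", "bob", "cy"))
scores <- arrow_table(id = c(2L, 3L, 4L), score = c(10, 20, 30))

# Mutating join: matching rows, with the columns of both tables.
inner <- people %>% inner_join(scores, by = "id") %>% arrange(id) %>% collect()

# Filtering joins: rows of `people` with / without a match in `scores`.
kept <- people %>% semi_join(scores, by = "id") %>% arrange(id) %>% collect()
dropped <- people %>% anti_join(scores, by = "id") %>% collect()
print(inner)
```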

Grouped aggregation and (especially) joins should be considered somewhat experimental in this release. We expect them to work, but they may not be well optimized for all workloads. To help us focus our efforts on improving them in the next release, please let us know if you encounter unexpected behavior or poor performance.

New non-aggregating compute functions include string functions like str_to_title() and strftime() as well as compute functions for extracting date parts (e.g. year(), month()) from dates. This is not a complete list of additional compute functions; for an exhaustive list of available compute functions see list_compute_functions().

We’ve also worked to fill in support for all data types, such as Decimal, for functions added in previous releases. All type limitations mentioned in previous release notes should no longer be valid, and if you find a function that is not implemented for a certain data type, please report an issue.

2. DuckDB integration

If you have the duckdb package installed, you can hand off an Arrow Dataset or query object to DuckDB for further querying using the to_duckdb() function. This allows you to use duckdb’s dbplyr methods, as well as its SQL interface, to aggregate data. Filtering and column projection done before to_duckdb() are evaluated in Arrow, and duckdb can push down some predicates to Arrow as well. This handoff does not copy the data; instead, it uses Arrow’s C interface (just like passing Arrow data between R and Python). This means that no serialization or data-copying costs are incurred.

You can also take a duckdb tbl and call to_arrow() to stream data to Arrow’s query engine. This means that in a single dplyr pipeline, you could start with an Arrow Dataset, evaluate some steps in DuckDB, then evaluate the rest in Arrow.

Breaking changes

Installation on Linux

Other enhancements and fixes

Internals

arrow 5.0.0.2

This patch version contains fixes for some sanitizer and compiler warnings.

arrow 5.0.0

More dplyr

CSV writing

C interface

Other enhancements

arrow 4.0.1

arrow 4.0.0.1

arrow 4.0.0

dplyr methods

Many more dplyr verbs are supported on Arrow objects:

Over 100 functions can now be called on Arrow objects inside a dplyr verb:

Datasets

Other improvements

Installation and configuration

arrow 3.0.0

Python and Flight

Enhancements

Bug fixes

Packaging and installation

arrow 2.0.0

Datasets

AWS S3 support

Flight RPC

Flight is a general-purpose client-server framework for high-performance transport of large datasets over network interfaces. The arrow R package now provides methods for connecting to Flight RPC servers to send and receive data. See vignette("flight", package = "arrow") for an overview.

Computation

Packaging and installation

Bug fixes and other enhancements

arrow 1.0.1

Bug fixes

arrow 1.0.0

Arrow format conversion

Datasets

Other enhancements

Bug fixes and deprecations

Installation and packaging

arrow 0.17.1

arrow 0.17.0

Feather v2

This release includes support for version 2 of the Feather file format. Feather v2 features full support for all Arrow data types, fixes the 2GB per-column limitation for large amounts of string data, and allows files to be compressed using either lz4 or zstd. write_feather() can write either version 2 or version 1 Feather files, and read_feather() automatically detects which file version it is reading.
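A sketch of writing both Feather versions and reading them back without naming the version, assuming a current arrow release. Which compression codecs are available depends on how the C++ library was built, so the zstd step is guarded with codec_is_available(). The data are synthetic.

```r
library(arrow, warn.conflicts = FALSE)

df <- data.frame(word = rep(c("alpha", "beta"), 500))

# Version 2 is the default; version = 1 writes the legacy layout.
v2_path <- tempfile(fileext = ".feather")
v1_path <- tempfile(fileext = ".feather")
write_feather(df, v2_path)
write_feather(df, v1_path, version = 1)

# read_feather() detects the file version on its own.
from_v2 <- read_feather(v2_path)
from_v1 <- read_feather(v1_path)

# Optional zstd compression (version 2 only), if this build supports it.
if (codec_is_available("zstd")) {
  zstd_path <- tempfile(fileext = ".feather")
  write_feather(df, zstd_path, compression = "zstd")
  from_zstd <- read_feather(zstd_path)
}
```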

Related to this change, several functions around reading and writing data have been reworked. read_ipc_stream() and write_ipc_stream() have been added to facilitate writing data to the Arrow IPC stream format, which is slightly different from the IPC file format (Feather v2 is the IPC file format).

Behavior has been standardized: all read_<format>() functions return an R data.frame (default) or a Table if the argument as_data_frame = FALSE; all write_<format>() functions return the data object, invisibly. To facilitate some workflows, a special write_to_raw() function is added to wrap write_ipc_stream() and return the raw vector containing the buffer that was written.

To achieve this standardization, read_table(), read_record_batch(), read_arrow(), and write_arrow() have been deprecated.
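A sketch of the standardized behavior described above, assuming a current arrow release: the writer returns its input invisibly, the reader returns a data frame unless as_data_frame = FALSE, and write_to_raw() produces a raw vector that read_ipc_stream() can read directly. The ".arrows" file extension is just a naming choice here.

```r
library(arrow, warn.conflicts = FALSE)

# write_ipc_stream() writes the IPC stream format and returns its input.
stream_path <- tempfile(fileext = ".arrows")
returned <- write_ipc_stream(mtcars, stream_path)

# Data frame by default, Table when as_data_frame = FALSE.
as_df <- read_ipc_stream(stream_path)
as_tbl <- read_ipc_stream(stream_path, as_data_frame = FALSE)

# write_to_raw() keeps the stream bytes in memory as an R raw vector.
buf <- write_to_raw(mtcars)
from_raw <- read_ipc_stream(buf)
```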

Python interoperability

The 0.17 Apache Arrow release includes a C data interface that allows exchanging Arrow data in-process at the C level without copying and without libraries having a build or runtime dependency on each other. This enables us to use reticulate to share data between R and Python (pyarrow) efficiently.

See vignette("python", package = "arrow") for details.

Datasets

Installation

Other bug fixes and enhancements

arrow 0.16.0.2

arrow 0.16.0

Multi-file datasets

This release includes a dplyr interface to Arrow Datasets, which let you work efficiently with large, multi-file datasets as a single entity. Explore a directory of data files with open_dataset() and then use dplyr methods to select(), filter(), etc. Work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. dplyr methods are conditionally loaded if you have dplyr available; it is not a hard dependency.

See vignette("dataset", package = "arrow") for details.
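A sketch of the open_dataset() workflow, assuming a current arrow release with Parquet support plus dplyr. It first creates a small Hive-partitioned directory with write_dataset(), which arrived in a later release than this one, so that there is something to open. The filter() and select() steps run in Arrow and collect() pulls only the matching rows into R.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Build a toy multi-file dataset partitioned by cyl (one directory per value).
ds_dir <- tempfile("mtcars_ds_")
write_dataset(mtcars, ds_dir, partitioning = "cyl")

# Treat the whole directory as a single dataset.
ds <- open_dataset(ds_dir)

# Query in Arrow, then materialize the result as an R data frame.
efficient <- ds %>%
  filter(mpg > 25) %>%
  select(mpg, cyl) %>%
  collect()
print(efficient)
```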

Linux installation

A source package installation (as from CRAN) will now handle its C++ dependencies automatically. For common Linux distributions and versions, installation will retrieve a prebuilt static C++ library for inclusion in the package; where this binary is not available, the package executes a bundled script that should build the Arrow C++ library with no system dependencies beyond what R requires.

See vignette("install", package = "arrow") for details.

Data exploration

Compression

Other fixes and improvements

arrow 0.15.1

arrow 0.15.0

Breaking changes

New features

Other upgrades

arrow 0.14.1

Initial CRAN release of the arrow package. Key features include:
