
Metadata

Source: vignettes/metadata.Rmd

This article describes the various data and metadata object types supplied by arrow, and documents how these objects are structured.

Arrow metadata classes

The arrow package defines the following classes for representing metadata:

  • A Schema is a list of Field objects used to describe the structure of a tabular data object; where
  • A Field specifies a character string name and a DataType; and
  • A DataType is an attribute controlling how values are represented

Consider the schema of a Table created from a small data frame:

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- arrow_table(df)
tb$schema

## Schema
## x: int32
## y: string
##
## See $metadata for additional Schema metadata
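To make the relationship between the three classes concrete, the following sketch pulls a single Field out of an inferred Schema and inspects its name and DataType. It uses only accessors from the arrow R package (`$names`, `$fields`, `$GetFieldByName()`, `$name`, `$type`, `$ToString()`). The variable names are illustrative.

```r
library(arrow)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- arrow_table(df)

# A Schema holds one Field per column
sch <- tb$schema
field_names <- sch$names          # character vector of column names
n_fields <- length(sch$fields)    # list of Field objects

# Each Field pairs a name with a DataType
fld <- sch$GetFieldByName("y")
fld_name <- fld$name
fld_type <- fld$type$ToString()   # textual form of the DataType

field_names
n_fields
fld_name
fld_type
```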

The schema that has been automatically inferred could also be manually created:

schema(
  field(name = "x", type = int32()),
  field(name = "y", type = utf8())
)

## Schema
## x: int32
## y: string

The schema() function allows the following shorthand to define fields:

schema(x = int32(), y = utf8())

## Schema
## x: int32
## y: string
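As a quick check that the explicit form and the shorthand describe the same structure, the two schemas can be compared with the Schema `$Equals()` method. This is a minimal sketch, and the variable names `explicit`, `shorthand` and `same` are illustrative.

```r
library(arrow)

# Schema built from explicit Field objects
explicit <- schema(
  field(name = "x", type = int32()),
  field(name = "y", type = utf8())
)

# Schema built with the name = type shorthand
shorthand <- schema(x = int32(), y = utf8())

# Compare field names and types
same <- explicit$Equals(shorthand)
same
```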

Sometimes it is important to specify the schema manually, particularly if you want fine-grained control over the Arrow data types:

arrow_table(df, schema = schema(x = int64(), y = utf8()))

## Table
## 3 rows x 2 columns
## $x <int64>
## $y <string>
##
## See $metadata for additional Schema metadata

arrow_table(df, schema = schema(x = float64(), y = utf8()))

## Table
## 3 rows x 2 columns
## $x <double>
## $y <string>
##
## See $metadata for additional Schema metadata

R object attributes

Arrow supports custom key-value metadata attached to Schemas. When we convert a data.frame to an Arrow Table or RecordBatch, the package stores any attributes() attached to the columns of the data.frame in the Arrow object Schema. Attributes added to objects in this fashion are stored under the r key, as shown below:

# data frame with custom metadata
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
attr(df, "df_meta") <- "custom data frame metadata"
attr(df$y, "col_meta") <- "custom column metadata"

# when converted to a Table, the metadata is preserved
tb <- arrow_table(df)
tb$metadata

## $r
## $r$attributes
## $r$attributes$df_meta
## [1] "custom data frame metadata"
##
##
## $r$columns
## $r$columns$x
## NULL
##
## $r$columns$y
## $r$columns$y$attributes
## $r$columns$y$attributes$col_meta
## [1] "custom column metadata"
##
##
## $r$columns$y$columns
## NULL

It is also possible to assign additional string metadata under any other key you wish, using a command like this:

tb$metadata$new_key <- "new value"
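The short sketch below reads the newly assigned key back, both from the Table and from its Schema, to show that the value is stored as Schema-level key-value metadata. The key name `new_key` comes from the command above. The variable names `val` and `on_schema` are illustrative.

```r
library(arrow)

tb <- arrow_table(data.frame(x = 1:3))

# Attach a custom string value under an arbitrary key
tb$metadata$new_key <- "new value"

# Read it back from the Table...
val <- tb$metadata$new_key

# ...and from the underlying Schema, where it is actually stored
on_schema <- tb$schema$metadata$new_key

val
on_schema
```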

Metadata attached to a Schema is preserved when writing the Table to Arrow/Feather or Parquet formats. When reading those files into R, or when calling as.data.frame() on a Table or RecordBatch, the column attributes are restored to the columns of the resulting data.frame. This means that custom data types, including haven::labelled, vctrs annotations, and others, are preserved when doing a round-trip through Arrow.
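A minimal sketch of both round trips described above, using a plain column attribute rather than a haven or vctrs type. It assumes a writable temporary directory; the `col_meta` attribute name is only illustrative.

```r
library(arrow)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
attr(df$y, "col_meta") <- "custom column metadata"

# In-memory round trip: data.frame -> Table -> data.frame
via_table <- as.data.frame(arrow_table(df))

# On-disk round trip through a Feather file
path <- tempfile(fileext = ".feather")
write_feather(df, path)
via_file <- read_feather(path)

# The column attribute should be restored in both cases
attr(via_table$y, "col_meta")
attr(via_file$y, "col_meta")
```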

Note that the attributes stored in $metadata$r are only understood by R. If you write a data.frame with haven columns to a Feather file and read that in Pandas, the haven metadata won’t be recognized there. Similarly, Pandas writes its own custom metadata, which the R package does not consume. You are free, however, to define custom metadata conventions for your application and assign any (string) values you want to other metadata keys.
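As an illustration of an application-defined convention, the sketch below stores a plain string under a hypothetical key, `my_app_units`, writes the Table to a Feather file, and reads the file back as a Table to confirm the key survives. The key name and its value are assumptions made for this example, not a convention defined by arrow.

```r
library(arrow)

tb <- arrow_table(data.frame(x = 1:3))

# "my_app_units" is a hypothetical, application-specific key
tb$metadata$my_app_units <- "metres"

# Write to Feather and read back as a Table rather than a data frame
path <- tempfile(fileext = ".feather")
write_feather(tb, path)
tb2 <- read_feather(path, as_data_frame = FALSE)

# The custom key is stored as ordinary Schema metadata in the file
units <- tb2$metadata$my_app_units
units
```

Because the value is stored as an ordinary string outside the r key, it is not tied to R's attribute encoding.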

Further reading

