
Define Partitioning for a Dataset

Source: R/dataset-partition.R
Partitioning.Rd

Pass a Partitioning object to a FileSystemDatasetFactory's $create() method to indicate how the files' paths should be interpreted to define partitioning.
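
The sketch below shows a Partitioning object in use. It assumes the arrow R package is installed with Parquet support. Rather than calling FileSystemDatasetFactory$create() directly, it passes the object to open_dataset(), which (as we understand it) builds the file-system dataset factory internally and forwards its partitioning argument. The temporary directory and the column names are illustrative only.

```r
library(arrow, warn.conflicts = FALSE)

# Illustrative data, laid out as <year>/<month>/part-0.parquet.
# hive_style = FALSE writes bare values ("2019/1") instead of "year=2019/month=1".
root <- tempfile("sales_")
df <- data.frame(
  year = c(2019L, 2019L, 2020L),
  month = c(1L, 2L, 1L),
  amount = c(10.5, 20, 7.25)
)
write_dataset(df, root, partitioning = c("year", "month"), hive_style = FALSE)

# The Partitioning object says how to read the path segments:
# first segment -> year (int16), second segment -> month (int8).
part <- DirectoryPartitioning$create(schema(year = int16(), month = int8()))

# open_dataset() hands `partitioning` on to the dataset factory it creates.
ds <- open_dataset(root, partitioning = part)
print(ds$schema)

# Materialize the rows; year and month come from the paths, not the files.
tbl <- Scanner$create(ds)$ToTable()
print(as.data.frame(tbl))
```

The partition columns take the types declared in the schema, not the int32 types of the original data frame.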

DirectoryPartitioning describes how to interpret raw path segments, in order. For example, schema(year = int16(), month = int8()) would define partitions for file paths like "2019/01/file.parquet", "2019/02/file.parquet", etc. In this scheme NULL values are skipped: in the previous example, if month was NA (or NULL) when writing a dataset, the files would be placed in "2019/file.parquet". When reading, the rows in "2019/file.parquet" return NA for the month column. An error is raised if an outer directory is NULL but an inner directory is not.
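
To illustrate the read side of the NULL-skipping rule, this sketch builds the two layouts from the text by hand ("2019/01/file.parquet" and "2019/file.parquet") with write_parquet() and reads them back with a DirectoryPartitioning. It assumes arrow is installed with Parquet support; the temporary directory and the amount column are hypothetical.

```r
library(arrow, warn.conflicts = FALSE)

# Hand-built layout from the text:
#   2019/01/file.parquet  -> month segment present
#   2019/file.parquet     -> month segment skipped (NULL)
root <- tempfile("dirpart_")
dir.create(file.path(root, "2019", "01"), recursive = TRUE)
write_parquet(data.frame(amount = 10.5), file.path(root, "2019", "01", "file.parquet"))
write_parquet(data.frame(amount = 3.0), file.path(root, "2019", "file.parquet"))

part <- DirectoryPartitioning$create(schema(year = int16(), month = int8()))
ds <- open_dataset(root, partitioning = part)

# Sort by amount so the row order does not depend on file discovery order.
result <- as.data.frame(Scanner$create(ds)$ToTable())
result <- result[order(result$amount), ]
print(result)
```

The row read from "2019/file.parquet" gets year 2019 from its only segment and NA for month.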

HivePartitioning is for Hive-style partitioning, which embeds field names and values in path segments, such as "/year=2019/month=2/data.parquet". Because fields are named in the path segments, order does not matter. This partitioning scheme allows NULL values: when writing, they are replaced by a configurable null_fallback, which defaults to the string "__HIVE_DEFAULT_PARTITION__". When reading, the null_fallback string is replaced with NA as appropriate.
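
A small round trip can make both properties visible: an NA month written under the default null_fallback directory name, and a read schema whose field order differs from the path order. This is a sketch assuming arrow is installed with Parquet support and relying on write_dataset() producing Hive-style paths by default. The data and directory are illustrative.

```r
library(arrow, warn.conflicts = FALSE)

root <- tempfile("hive_")
df <- data.frame(
  year = c(2019L, 2019L),
  month = c(2L, NA),
  amount = c(1.5, 2.5)
)
# Hive-style paths name each segment: year=2019/month=2/...
# The NA month should land under the default null_fallback directory name.
write_dataset(df, root, partitioning = c("year", "month"))

files <- list.files(root, recursive = TRUE)
print(files)

# month is listed before year here, the reverse of the path order;
# this works because every segment carries its field name.
ds <- open_dataset(root, partitioning = hive_partition(month = int8(), year = int16()))
result <- as.data.frame(Scanner$create(ds)$ToTable())
result <- result[order(result$amount), ]
print(result)
```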

PartitioningFactory subclasses instruct the DatasetFactory to detect partition features from the file paths.
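
As a sketch of detection, the example below passes a factory instead of a concrete Partitioning, so names and types are discovered from the paths. It assumes arrow is installed with Parquet support. It also assumes that inferred partition values that all parse as integers get type int32, which is our understanding of the default inference rather than something stated on this page.

```r
library(arrow, warn.conflicts = FALSE)

root <- tempfile("detect_")
df <- data.frame(year = c(2019L, 2020L), month = c(1L, 2L), amount = c(1, 2))
write_dataset(df, root, partitioning = c("year", "month"))  # Hive-style paths

# No schema is given: hive_partition() with no arguments is a factory,
# and the dataset factory discovers field names and types from the
# "key=value" path segments.
ds <- open_dataset(root, partitioning = hive_partition())
print(ds$schema)
```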

Factory

Both DirectoryPartitioning$create() and HivePartitioning$create() methods take a Schema as a single input argument. The helper function hive_partition(...) is shorthand for HivePartitioning$create(schema(...)).
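
One way to check the shorthand is to read the same Hive-style data with both spellings and compare the results. This is a sketch assuming arrow is installed with Parquet support. The helper read_sorted() is hypothetical, written only for this comparison.

```r
library(arrow, warn.conflicts = FALSE)

root <- tempfile("shorthand_")
df <- data.frame(year = c(2019L, 2020L), amount = c(1, 2))
write_dataset(df, root, partitioning = "year")  # year=2019/..., year=2020/...

# Two spellings of the same Hive partitioning.
p_long <- HivePartitioning$create(schema(year = int16()))
p_short <- hive_partition(year = int16())

# Hypothetical helper: open the dataset with a given partitioning and
# return its rows sorted by amount.
read_sorted <- function(p) {
  ds <- open_dataset(root, partitioning = p)
  out <- as.data.frame(Scanner$create(ds)$ToTable())
  out[order(out$amount), ]
}

a <- read_sorted(p_long)
b <- read_sorted(p_short)
print(a)
```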

With DirectoryPartitioningFactory$create(), you can provide just the names of the path segments (in our example, c("year", "month")), and the DatasetFactory will infer the data types for those partition variables. HivePartitioningFactory$create() takes no arguments: both variable names and their types can be inferred from the file paths. hive_partition() with no arguments returns a HivePartitioningFactory.
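
The following sketch supplies only segment names through DirectoryPartitioningFactory$create() and lets the types be inferred. It also checks that hive_partition() with no arguments yields a factory object. It assumes arrow is installed with Parquet support and that all-integer segment values are inferred as int32 (our understanding of the default). The directory and data are illustrative.

```r
library(arrow, warn.conflicts = FALSE)

root <- tempfile("dirfactory_")
df <- data.frame(year = c(2019L, 2019L), month = c(1L, 2L), amount = c(1, 2))
write_dataset(df, root, partitioning = c("year", "month"), hive_style = FALSE)

# Only the segment names are given; their types are inferred from the paths.
factory <- DirectoryPartitioningFactory$create(c("year", "month"))
ds <- open_dataset(root, partitioning = factory)
print(ds$schema)

# Called with no arguments, hive_partition() returns a factory rather
# than a concrete Partitioning.
hive_factory <- hive_partition()
print(class(hive_factory))
```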

