This release includes :
* `{parquetize}` now has a new `get_parquet_info` function for retrieving metadata from parquet files. This function is particularly useful for checking row group sizes (added by @nbc). A short sketch follows below.
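A minimal sketch of how `get_parquet_info()` might be used; the example writes a small parquet file first, and the exact metadata columns returned (row counts, row group counts, …) are assumptions here:

```r
library(parquetize)

# Write a small parquet file for illustration
path <- file.path(tempdir(), "iris.parquet")
arrow::write_parquet(iris, path)

# Retrieve metadata about the file, including row group information
get_parquet_info(path)
```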
This release includes :

* `{parquetize}` now requires a minimal version (2.4.0) of the `{haven}` dependency package to ensure that conversions are performed correctly from SAS files compressed in BINARY mode #46
* `csv_to_parquet` now has a `read_delim_args` argument, allowing passing of arguments to `read_delim` (added by …); see the sketch after this list
* `table_to_parquet` can now convert files with uppercase extensions (.SAS7BDAT, .SAV, .DTA)
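As an illustration of the new `read_delim_args` argument, a hedged sketch: the input file name and delimiter are placeholders, and `read_delim_args` is assumed to take a named list of arguments forwarded to `read_delim()`:

```r
library(parquetize)

# Forward a custom delimiter to read_delim() while converting to parquet
csv_to_parquet(
  path_to_file = "my_file.csv",        # hypothetical input file
  path_to_parquet = tempdir(),
  read_delim_args = list(delim = ";")
)
```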
This release includes :

* use of `@inheritParams` to simplify documentation of functions arguments #38. This leads to some renaming of arguments (e.g. `path_to_csv` -> `path_to_file`…)
* `compression` and `compression_level` are now passed to the `write_parquet_at_once` and `write_parquet_by_chunk` functions and are now available in the main conversion functions of `parquetize` #36 (a short sketch follows this list)
* all `@importFrom` tags are now grouped in a single file to facilitate their maintenance #37
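For instance, a minimal sketch of passing compression settings through one of the main conversion functions; the input file name is only a placeholder, and the argument names follow the renaming mentioned above:

```r
library(parquetize)

# Convert an rds file, writing zstd-compressed parquet
rds_to_parquet(
  path_to_file = "my_table.rds",   # hypothetical input file
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10
)
```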
This release includes :

* A new function `dbi_to_parquet` : you can convert to parquet any query you want on any DBI compatible RDBMS :
```r
dbi_connection <- DBI::dbConnect(RSQLite::SQLite(),
  system.file("extdata", "iris.sqlite", package = "parquetize"))

# Reading iris table from local sqlite database
# and conversion to one parquet file :
dbi_to_parquet(
  conn = dbi_connection,
  sql_query = "SELECT * FROM iris",
  path_to_parquet = tempdir(),
  parquetname = "iris"
)
```

You can find more information in the `dbi_to_parquet` documentation.
Two arguments are deprecated to avoid confusion with arrow concepts and to keep consistency :
* `chunk_size` is replaced by `max_rows` (chunk size is an arrow concept).
* `chunk_memory_size` is replaced by `max_memory` for consistency (see the sketch after this list).
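A minimal sketch of the renamed arguments, assuming they are drop-in replacements for the deprecated ones in `table_to_parquet()`:

```r
library(parquetize)

# Chunk the output by approximate in-memory size using the new argument name
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  max_memory = 5000  # replaces chunk_memory_size
)
```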
This release includes :

… parquetize ! Due to these numerous contributions, …
After a big refactoring, three arguments are deprecated :
* `by_chunk` : `table_to_parquet` will automatically chunk if you use one of `chunk_memory_size` or `chunk_size`.
* `csv_as_a_zip` : `csv_to_table` will detect if the file is a zip from the extension.
* `url_to_csv` : use `path_to_csv` instead, `csv_to_table` will detect if the file is remote from the file path.

They will raise a deprecation warning for the moment.
The possibility to chunk parquet output by memory size with `table_to_parquet()` : `table_to_parquet()` takes a `chunk_memory_size` argument to convert an input file into parquet files of roughly `chunk_memory_size` Mb when data are loaded in memory.
Argument `by_chunk` is deprecated (see above).
Example of use of the argument `chunk_memory_size` :
```r
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  chunk_memory_size = 5000 # this will create files of around 5Gb when loaded in memory
)
```

Passing arguments to `write_parquet` when chunking : users can now pass arguments to `write_parquet()` when chunking (in the ellipsis). This can be used for example to pass `compression` and `compression_level`.
Example:
```r
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10,
  chunk_memory_size = 5000
)
```

New function `download_extract` : this function is added to … download and unzip a file if needed.
```r
file_path <- download_extract(
  "https://www.nomisweb.co.uk/output/census/2021/census2021-ts007.zip",
  filename_in_zip = "census2021-ts007-ctry.csv"
)

csv_to_parquet(
  file_path,
  path_to_parquet = tempdir()
)
```

Under the hood, this release has hardened tests.
This release fixes an error when converting a SAS file by chunk.
This release includes :
* Examples with `table_to_parquet()` and `csv_to_parquet()` functions #20
* New parquet files in the `inst/extdata` directory.
This release includes :

* The `table_to_parquet()` function has been fixed when the argument `by_chunk` is TRUE.

This release removes the `duckdb_to_parquet()` function on the advice of Brian Ripley from CRAN.
Indeed, the storage format of DuckDB is not yet stable; it will be stabilized when version 1.0 is released.
This release includes corrections for CRAN submission.
This release includes an important feature :
The `table_to_parquet()` function can now convert tables to parquet format with less memory consumption. This is useful for huge tables and for computers with little RAM (#15). A vignette has been written about it. See here.
* The `nb_rows` argument of the `table_to_parquet()` function is removed and replaced by `by_chunk`, `chunk_size` and `skip` (see documentation and the sketch after this list)
* New `duckdb_to_parquet()` function to convert duckdb files to parquet format.
* New `sqlite_to_parquet()` function to convert sqlite files to parquet format.
* New `rds_to_parquet()` function to convert rds files to parquet format.
* New `json_to_parquet()` function to convert json and ndjson files to parquet format.
* Check if `path_to_parquet` exists in functions `csv_to_parquet()` or `table_to_parquet()` (…)
* New `table_to_parquet()` function to convert SAS, SPSS and Stata files to parquet format.
* New `csv_to_parquet()` function to convert csv files to parquet format.
* New `parquetize_example()` function to get the path to package data examples.
* Added a `NEWS.md` file to track changes to the package.
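As an illustration of the chunked conversion described above, a minimal sketch using the arguments introduced in this release (`by_chunk`, `chunk_size`, `skip`), which later releases deprecate (see above):

```r
library(parquetize)

# Convert a SAS file to parquet in chunks of 50 rows
# (by_chunk / chunk_size / skip are the arguments of this release;
# later versions deprecate them)
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  by_chunk = TRUE,
  chunk_size = 50,
  skip = 0
)
```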