This release includes :
* `{parquetize}` now has a new `get_parquet_info` function for retrieving metadata from parquet files. This function is particularly useful for checking row group sizes (added by @nbc). A short sketch follows below.
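A minimal sketch of how `get_parquet_info()` might be used; the example writes a small parquet file first, and the exact metadata columns returned (row counts, row group counts, …) are assumptions here:

```r
library(parquetize)

# Write a small parquet file for illustration
path <- file.path(tempdir(), "iris.parquet")
arrow::write_parquet(iris, path)

# Retrieve metadata about the file, including row group information
get_parquet_info(path)
```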
This release includes :

* `{parquetize}` now requires a minimal version (2.4.0) of the `{haven}` dependency package to ensure that conversions are performed correctly from SAS files compressed in BINARY mode #46
* `csv_to_parquet` now has a `read_delim_args` argument, allowing passing of arguments to `read_delim` (added by …); see the sketch after this list
* `table_to_parquet` can now convert files with uppercase extensions (.SAS7BDAT, .SAV, .DTA)
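As an illustration of the new `read_delim_args` argument, a hedged sketch: the input file name and delimiter are placeholders, and `read_delim_args` is assumed to take a named list of arguments forwarded to `read_delim()`:

```r
library(parquetize)

# Forward a custom delimiter to read_delim() while converting to parquet
csv_to_parquet(
  path_to_file = "my_file.csv",        # hypothetical input file
  path_to_parquet = tempdir(),
  read_delim_args = list(delim = ";")
)
```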
This release includes :

* use of `@inheritParams` to simplify documentation of functions arguments #38. This leads to some renaming of arguments (e.g. `path_to_csv` -> `path_to_file`…)
* `compression` and `compression_level` are now passed to the `write_parquet_at_once` and `write_parquet_by_chunk` functions and are now available in the main conversion functions of `parquetize` #36 (a short sketch follows this list)
* all `@importFrom` tags are now grouped in a single file to facilitate their maintenance #37
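For instance, a minimal sketch of passing compression settings through one of the main conversion functions; the input file name is only a placeholder, and the argument names follow the renaming mentioned above:

```r
library(parquetize)

# Convert an rds file, writing zstd-compressed parquet
rds_to_parquet(
  path_to_file = "my_table.rds",   # hypothetical input file
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10
)
```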
This release includes :

* A new function `dbi_to_parquet` : you can convert to parquet any query you want on any DBI compatible RDBMS :
```r
dbi_connection <- DBI::dbConnect(RSQLite::SQLite(),
  system.file("extdata", "iris.sqlite", package = "parquetize"))

# Reading iris table from local sqlite database
# and conversion to one parquet file :
dbi_to_parquet(
  conn = dbi_connection,
  sql_query = "SELECT * FROM iris",
  path_to_parquet = tempdir(),
  parquetname = "iris"
)
```

You can find more information in the `dbi_to_parquet` documentation.
Two arguments are deprecated to avoid confusion with arrow concepts and to keep consistency :
* `chunk_size` is replaced by `max_rows` (chunk size is an arrow concept).
* `chunk_memory_size` is replaced by `max_memory` for consistency (see the sketch after this list).
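A minimal sketch of the renamed arguments, assuming they are drop-in replacements for the deprecated ones in `table_to_parquet()`:

```r
library(parquetize)

# Chunk the output by approximate in-memory size using the new argument name
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  max_memory = 5000  # replaces chunk_memory_size
)
```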
This release includes :

… parquetize ! Due to these numerous contributions, …
After a big refactoring, three arguments are deprecated :
* `by_chunk` : `table_to_parquet` will automatically chunk if you use one of `chunk_memory_size` or `chunk_size`.
* `csv_as_a_zip` : `csv_to_table` will detect if the file is a zip from the extension.
* `url_to_csv` : use `path_to_csv` instead, `csv_to_table` will detect if the file is remote from the file path.

They will raise a deprecation warning for the moment.
The possibility to chunk parquet output by memory size with `table_to_parquet()` : `table_to_parquet()` takes a `chunk_memory_size` argument to convert an input file into parquet files of roughly `chunk_memory_size` Mb when data are loaded in memory.
Argument `by_chunk` is deprecated (see above).
Example of use of the argument `chunk_memory_size` :
```r
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  chunk_memory_size = 5000 # this will create files of around 5Gb when loaded in memory
)
```

Passing arguments to `write_parquet` when chunking : users can now pass arguments to `write_parquet()` when chunking (in the ellipsis). This can be used for example to pass `compression` and `compression_level`.
Example:
```r
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10,
  chunk_memory_size = 5000
)
```

New function `download_extract` : this function is added to … download and unzip a file if needed.
```r
file_path <- download_extract(
  "https://www.nomisweb.co.uk/output/census/2021/census2021-ts007.zip",
  filename_in_zip = "census2021-ts007-ctry.csv"
)

csv_to_parquet(
  file_path,
  path_to_parquet = tempdir()
)
```

Under the hood, this release has hardened tests.
This release fixes an error when converting a SAS file by chunk.
This release includes :
* Examples with `table_to_parquet()` and `csv_to_parquet()` functions #20
* New parquet files in the `inst/extdata` directory.
This release includes :

* The `table_to_parquet()` function has been fixed when the argument `by_chunk` is TRUE.

This release removes the `duckdb_to_parquet()` function on the advice of Brian Ripley from CRAN.
Indeed, the storage format of DuckDB is not yet stable; it will be stabilized when version 1.0 is released.
This release includes corrections for CRAN submission.
This release includes an important feature :
The `table_to_parquet()` function can now convert tables to parquet format with less memory consumption. This is useful for huge tables and for computers with little RAM (#15). A vignette has been written about it. See here.
* The `nb_rows` argument of the `table_to_parquet()` function is removed and replaced by `by_chunk`, `chunk_size` and `skip` (see documentation and the sketch after this list)
* New `duckdb_to_parquet()` function to convert duckdb files to parquet format.
* New `sqlite_to_parquet()` function to convert sqlite files to parquet format.
* New `rds_to_parquet()` function to convert rds files to parquet format.
* New `json_to_parquet()` function to convert json and ndjson files to parquet format.
* Check if `path_to_parquet` exists in functions `csv_to_parquet()` or `table_to_parquet()` (…)
* New `table_to_parquet()` function to convert SAS, SPSS and Stata files to parquet format.
* New `csv_to_parquet()` function to convert csv files to parquet format.
* New `parquetize_example()` function to get the path to package data examples.
* Added a `NEWS.md` file to track changes to the package.
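As an illustration of the chunked conversion described above, a minimal sketch using the arguments introduced in this release (`by_chunk`, `chunk_size`, `skip`), which later releases deprecate (see above):

```r
library(parquetize)

# Convert a SAS file to parquet in chunks of 50 rows
# (by_chunk / chunk_size / skip are the arguments of this release;
# later versions deprecate them)
table_to_parquet(
  path_to_table = system.file("examples", "iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  by_chunk = TRUE,
  chunk_size = 50,
  skip = 0
)
```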