Movatterモバイル変換

[0]ホーム

RAthena 2.6.3

Bug Fix:

Unload option returns null results when s3_staging_dir is a bucketonly (#214) thanks to@dfsnow for implementing solution
Add support for describe (noctua # 218)thanks to@tyner forraising issue
Add support forEXPLAIN (TYPE validate) (noctua # 225)thanks to@tyner forraising issue
Fix link for documentdbSendQuery (noctua # 223)thanks to@tyner forraising issue
Return athena option environment fromRAthena_options(noctua #226) thanks to@tyner for raising issue
SupportSelectedEngineVersion inupdate_work_group (noctua # 224)thanks to@tyner forraising issue

RAthena 2.6.2

Feature:

Add catalog support (#194)
fixdbExistsTable to catch update AWS errormessage.
add support todbplyr 2.3.3.9000+

RAthena 2.6.1

Bug Fix:

Prevent assuming role fromAWS_ROLE_ARN. This causedconfusing when connecting through web identity (#177)
Supportdbplyr::in_catalog when working withdplyr::tbl (#178)

RAthena 2.6.0

Bug Fix:

Delay Python to R conversion to prevent 64 bit integer mapping toR’s base 32 bit integer (#168) causing the follow bug in Data Scan infomessage. Thanks to@juhoautio for identifying issue.

INFO: (Data scanned: -43839744 Bytes)

Feature:

Addclear_s3_resource parameter toRAthena_options to prevent AWS Athena output AWS S3resource being cleared up bydbClearResult (#168). Thanksto@juhoautio forthe request.
Support extra boto3 parameters forboto3.session.Session class andclient method(#169)
Supportendpoint_override parameter allow defaultendpoints for each service to be overridden accordingly (#169). Thanksto@aoyh for the requestand checking the package in development.

RAthena 2.5.1

Bug Fix:

Fixed unit test helper functiontest_data to usesize parameter explicitly.

RAthena 2.5.0

Feature:

Allow all information messages to be turned off (noctua #178).
AllowRAthena_options to change 1 parameter at a timewithout affecting other pre-configured settings
Return warning message for deprecatedretry_quietparameter inRAthena_options function.

RAthena 2.4.0

Feature:

Add supportdbplyr 2.0.0 backend API.
Add method to set unload on a package level to allowdplyr to benefit fromAWS Athena unloadmethods (noctua #174).

Bug Fix:

EnsuredbGetQuery,dbExecute,dbSendQuery,dbSendStatement work on olderversions ofR (noctua #170). Thanks to@tyner for identifying issue.
Caching would fail when statement wasn’t a character (noctua #171). Thanks to@ramnathv for identifying issue.

RAthena 2.3.0

Feature:

Add support toAWS Athena UNLOAD(noctua: #160). This is to take advantage of read/write speedparquet has to offer.

import awswrangleras wrimport getpassbucket= getpass.getpass()path=f"s3://{bucket}/data/"if"awswrangler_test"notin wr.catalog.databases().values:    wr.catalog.create_database("awswrangler_test")cols= ["id","dt","element","value","m_flag","q_flag","s_flag","obs_time"]df= wr.s3.read_csv(    path="s3://noaa-ghcn-pds/csv/189",    names=cols,    parse_dates=["dt","obs_time"])# Read 10 files from the 1890 decade (~1GB)wr.s3.to_parquet(    df=df,    path=path,    dataset=True,    mode="overwrite",    database="awswrangler_test",    table="noaa");wr.catalog.table(database="awswrangler_test", table="noaa")

library(DBI)con<-dbConnect(RAthena::athena())# Query ran using CSV outputsystem.time({  df=dbGetQuery(con,"SELECT * FROM awswrangler_test.noaa")})# Info: (Data scanned: 80.88 MB)#    user  system elapsed#  57.004   8.430 160.567RAthena::RAthena_options(cache_size =1)# Query ran using UNLOAD Parquet outputsystem.time({  df=dbGetQuery(con,"SELECT * FROM awswrangler_test.noaa",unload = T)})# Info: (Data scanned: 80.88 MB)#    user  system elapsed#  21.622   2.350  39.232# Query ran using cachesystem.time({  df=dbGetQuery(con,"SELECT * FROM awswrangler_test.noaa",unload = T)})# Info: (Data scanned: 80.88 MB)#    user  system elapsed#  13.738   1.886  11.029

RAthena 2.2.0

Bug Fix:

sql_translate_env correctly translates R functionsquantile andmedian toAWS Athenaequivalents (noctua #153). Thanks to@ellmanj for spotting issue.

Feature:

SupportAWS Athenatimestamp with time zone data type.
Properly support data typelist when converting data toAWS AthenaSQL format.

library(data.table)library(DBI)x=5dt=data.table(var1 =sample(LETTERS,size = x, T),var2 =rep(list(list("var3"=1:3,"var4"=list("var5"= letters[1:5]))), x))con<-dbConnect(RAthena::athena())#> Version: 2.2.0sqlData(con, dt)# Registered S3 method overwritten by 'jsonify':#   method     from#   print.json jsonlite# Info: Special characters "\t" has been converted to " " to help with Athena reading file format tsv#    var1                                                   var2# 1:    1 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}# 2:    2 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}# 3:    3 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}# 4:    4 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}# 5:    5 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}#> Version: 2.1.0sqlData(con, dt)# Info: Special characters "\t" has been converted to " " to help with Athena reading file format tsv#    var1                                        var2# 1:    1 1:3|list(var5 = c("a", "b", "c", "d", "e"))# 2:    2 1:3|list(var5 = c("a", "b", "c", "d", "e"))# 3:    3 1:3|list(var5 = c("a", "b", "c", "d", "e"))# 4:    4 1:3|list(var5 = c("a", "b", "c", "d", "e"))# 5:    5 1:3|list(var5 = c("a", "b", "c", "d", "e"))

v-2.2.0 now converts lists into json lines format so that AWS Athenacan parse withsqlarray/mapping/json functions.Small down side a s3 method conflict occurs whenjsonify iscalled to convert lists into json lines.jsonify was choosein favor tojsonlite due to the performance improvements(noctua #156).

RAthena 2.1.0

Bug Fix:

dbIsValid wrongly stated connection is valid for resultclass when connection class was disconnected.
sql_translate_env.paste broke with latest version ofdbplyr. New method is compatible withdbplyr>=1.4.3 (noctua #149).

Feature:

sql_translate_env: add support forstringr/lubridate style functions, similar toPostgresbackend.
dbConnect addtimezone parameter so thattime zone betweenR andAWS Athena isconsistent (noctua #149).

RAthena 2.0.1

Bug Fix:

Fix issue of keyboard interrupt failing to raise interrupterror.

RAthena 2.0.0

API Change

AthenaConnection class:ptr andinfo slots changed fromlist toenvironment with inAthenaConnect class.Allows class to be updated by reference. Simplifies notation whenviewing class from RStudio environment tab.
AthenaResult class:info slot changed fromlist toenvironment. Allows class to beupdated by reference.

By utilising environments forAthenaConnection andAthenaResult, allAthenaResult classes createdfromAthenaConnection will point to the sameptr andinfo environments for it’s connection.Previouslyptr andinfo would make a copy.This means if it was modified it would not affect the child or parentclass for example:

# Old Methodlibrary(DBI)con<-dbConnect(RAthena::athena(),rstudio_conn_tab = F)res<-dbExecute(con,"select 'helloworld'")# modifying parent class to influence childcon@info$made_up<-"helloworld"# nothing happenedres@connection@info$made_up# > NULL# modifying child class to influence parentres@connection@info$made_up<-"oh no!"# nothing happenedcon@info$made_up# > "helloworld"# New Methodlibrary(DBI)con<-dbConnect(RAthena::athena(),rstudio_conn_tab = F)res<-dbExecute(con,"select 'helloworld'")# modifying parent class to influence childcon@info$made_up<-"helloworld"# picked up changeres@connection@info$made_up# > "helloworld"# modifying child class to influence parentres@connection@info$made_up<-"oh no!"# picked up changecon@info$made_up# > "oh no!"

New Feature

Added support toAWS Athena data types[array, row, map, json, binary, ipaddress] (noctua: #135). Conversion types can be changed throughdbConnectandRAthena_options.

library(DBI)library(RAthena)# default conversion methodscon<-dbConnect(RAthena::athena())# change json conversion methodRAthena_options(json ="character")RAthena:::athena_option_env$json# [1] "character"# change json conversion to custom methodRAthena_options(json = jsonify::from_json)RAthena:::athena_option_env$json# function (json, simplify = TRUE, fill_na = FALSE, buffer_size = 1024)# {#   json_to_r(json, simplify, fill_na, buffer_size)# }# <bytecode: 0x7f823b9f6830>#   <environment: namespace:jsonify># change bigint conversion without affecting custom json conversion methodsRAthena_options(bigint ="numeric")RAthena:::athena_option_env$json# function (json, simplify = TRUE, fill_na = FALSE, buffer_size = 1024)# {#   json_to_r(json, simplify, fill_na, buffer_size)# }# <bytecode: 0x7f823b9f6830>#   <environment: namespace:jsonify>RAthena:::athena_option_env$bigint# [1] "numeric"# change binary conversion without affect, bigint or json methodsRAthena_options(binary ="character")RAthena:::athena_option_env$json# function (json, simplify = TRUE, fill_na = FALSE, buffer_size = 1024)# {#   json_to_r(json, simplify, fill_na, buffer_size)# }# <bytecode: 0x7f823b9f6830>#   <environment: namespace:jsonify>RAthena:::athena_option_env$bigint# [1] "numeric"RAthena:::athena_option_env$binary# [1] "character"# no conversion for json objectscon2<-dbConnect(RAthena::athena(),json ="character")# use custom json parsercon<-dbConnect(RAthena::athena(),json = jsonify::from_json)

Allow users to turn off RStudio Connection Tab when working inRStudio (noctua: #136). This can be done through parameterrstudio_conn_tab withindbConnect.

Bug Fix:

AWS Athena usesfloat data type for theDDL only,RAthena was wrongly parsingfloatdata type back to R. InsteadAWS Athena uses data typereal in SQL functions likeselect casthttps://docs.aws.amazon.com/athena/latest/ug/data-types.html.RAthena now correctly parsesreal to R’s datatypedouble (noctua: #133)
Iterate through each tokenAWS returns to get allresults fromAWS Glue catalogue (noctua: #137)

RAthena 1.12.0

New Feature

Added optional formatting todbGetPartition. Thissimply tidies up the default AWS Athena partition format.

library(DBI)library(RAthena)con<-dbConnect(athena())dbGetPartition(con,"test_df2",.format = T)# Info: (Data scanned: 0 Bytes)#    year month day# 1: 2020    11  17dbGetPartition(con,"test_df2")# Info: (Data scanned: 0 Bytes)#                    partition# 1: year=2020/month=11/day=17

Support different formats for returningbigint, this isto align with other DBI interfaces i.e. RPostgres. Nowbigint can be return in the possible formats: [“integer64”,“integer”, “numeric”, “character”]

library(DBI)con <- dbConnect(RAthena::athena(), bigint = "numeric")

When switching between the different file parsers thebigint to be represented according to the file parseri.e. data.table: “integer64” ->vroom:“I”.

Bug Fix:

dbRemoveTable: Check if key has “.” or ends with “/”before adding “/” to the end (noctua: #125)
Added uuid minimum version to fix issue (noctua: #128)

Documentation:

Added note to dbRemoveTable doc string around aws athena tableLocation in Amazon S3.

RAthena 1.11.1

Note:

Added package checks to unit tests when testing a suggesteddependency. This is to fix “CRAN Package Check Results for PackageRAthena” for operating system “r-patched-solaris-x86”. Errormessage:

Error: write_parquet requires the arrow package, please install it first and try again

RAthena 1.11.0

New Feature

Movesql_escape_date intodplyr_integration.R backend (#121). Thanks to@OssiLehtinen fordeveloping Athena date translation.
AllowRAthena to append to a static AWS s3 locationusing uuid

Bug Fix

parquet file.types now use parameteruse_deprecated_int96_timestamps set toTRUE.This puts POSIXct data type in tojava.sql.Timestampcompatible format, such asyyyy-MM-dd HH:mm:ss[.f...].Thanks to Christian N Wolz for highlight this issue.
s3_upload_location simplified how s3 location is built.Now s3.location parameter isn’t affected and instead only additionalcomponents e.g. name, schema and partition.
dbplyr v-2.0.0 functionin_schema nowwraps strings in quotes, this breaksdb_query_fields.AthenaConnection. Nowdb_query_fields.AthenaConnection removes any quotation fromthe string so that it can searchAWS GLUE for tablemetadata. (noctua: #117)

RAthena 1.10.1

Bug Fix

Do not abort if a glue::get_tables api call fails (e.g., due tomissing permissions to a specific database or an orphaned Lake Formationresource link) when retrieving a list of database tables withdbListTables, dbGetTables or in Rstudio’s Connections pane. Thanks to@OssiLehtinencreating solution (noctua: #106)
Allowed cache_size to equal 100

RAthena 1.10.0

New Feature

RAthena now supports Keyboard Interrupt and will stop AWS Athenarunning the query when the query has been interrupted. To keep thefunctionality of AWS Athena running whenR has beeninterrupt a new parameter has been added todbConnect,keyboard_interrupt. Example:

# Stop AWS Athena when R has been interrupted:con<-dbConnect(RAthena::athena())# Let AWS Athena keep running when R has been interrupted:con<-dbConnect(RAthena::athena(),keyboard_interrupt = F)

RAthena 1.9.1

Minor Change

Fixed issue whereRAthena would return adata.frame for utilitySQL queries regardlessof backend file parser. This is due toAWS AthenaoutputtingSQL UTILITY queries as a text file that requiredto be read in line by line. NowRAthena will return thecorrect data format based on file parser set inRAthena_options for example:RAthena_options("vroom") will returntibbles.

Documentation:

Added documentation to highlight behaviourdbClearResult when user doesn’t have permission to deleteAWS S3 objects (noctua: #96)

RAthena 1.9.0

New Feature

functions that collect or push to AWS S3 now have a retrycapability. Meaning if API call fails then the call is retried (noctua: #79)
RAthena_options contains 2 new parameters to controlhowRAthena handles retries.
dbFetch is able to return data from AWS Athena inchunk. This has been achieved by passingNextToken toAthenaResult s4 class. This method won’t be as fastn = -1 as each chunk will have to be process into dataframe format.

library(DBI)con<-dbConnect(RAthena::athena())res<-dbExecute(con,"select * from some_big_table limit 10000")dbFetch(res,5000)

When creating/appending partitions to a table,dbWriteTable opts to usealter table insteadof standardmsck repair table. This is to improveperformance when appending to tables with high number of existingpartitions.
dbWriteTable now allows json to be appended to jsonddls created with the Openx-JsonSerDe library.
dbConvertTable bringsdplyr::computefunctionality to base package, allowingRAthena to use thepower of AWS Athena to convert tables and queries to more efficient fileformats in AWS S3 (#37).
Extendeddplyr::compute to give same functionality ofdbConvertTable
The error message for python’sboto3 not being detectedhas been updated. This is due to several users not sure how to getRAthena set-up.

stop("Boto3 is not detected please install boto3 using either: `pip install boto3 numpy` in terminal or `install_boto()`.",     "\nIf this doesn't work please set the python you are using with `reticulate::use_python()` or `reticulate::use_condaenv()`",     call. = FALSE)

Addedregion_name check before making a connection toAWS Athena (#110)

Bug Fix

dbWriteTable would throwthrottling errorevery now and again,retry_api_call as been built to handlethe parsing of data between R and AWS S3.
dbWriteTable did not clear down all metadata whenuploading toAWS Athena

Documentation

dbWriteTable added support ddl structures for user whohave created ddl’s outside ofRAthena
added vignette around how to useRAthena retryfunctionality
Moved all examples requiring credentials to\dontrun(#108)

RAthena 1.8.0

New Feature

Inspired bypyathena,RAthena_options nowhas a new parametercache_size. This implements localcaching in R environments instead of using AWSlist_query_executions. This is down todbClearResult clearing S3’s Athena output when cachingisn’t disabled
RAthena_options now hasclear_cacheparameter to clear down all cached data.
dbRemoveTable now utiliseAWS Glue toremove tables fromAWS Glue catalogue. This has aperformance enhancement:

library(DBI)con=dbConnect(RAthena::athena())# upload iris dataframe for removal testdbWriteTable(con,"iris2", iris)# Athena methodsystem.time(dbRemoveTable(con,"iris2",confirm = T))# user  system elapsed# 0.131   0.037   2.404# upload iris dataframe for removal testdbWriteTable(con,"iris2", iris)# Glue methodsystem.time(dbRemoveTable(con,"iris2",confirm = T))# user  system elapsed# 0.065   0.009   1.303

dbWriteTable now supports uploading json lines(http://jsonlines.org/) format up toAWS Athena (#88).

library(DBI)con=dbConnect(RAthena::athena())dbWriteTable(con,"iris2", iris,file.type ="json")dbGetQuery(con,"select * from iris2")

Bug Fix

dbWriteTable appending to existing table compress filetype was incorrectly return.
install_boto addednumpy toRAthena environment install asreticulateappears to favour environments withnumpy(https://github.com/rstudio/reticulate/issues/216)
Rstudio connection tab comes into an issue when GlueTable isn’t stored correctly (#92)

Documentation

Added supported environmental variableAWS_REGION intodbConnect
Vignettes added:
- AWS Athena Query Cache
- AWS S3 backend
- Changing Backend File Parser
- Getting Started

Unit tests:

Increase coverage to + 80%

RAthena 1.7.1

Bug Fix

Dependency data.table now restricted to (>=1.12.4) due to filecompression being added tofwrite (>=1.12.4)https://github.com/Rdatatable/data.table/blob/master/NEWS.md
Thanks to@OssiLehtinen for fixing date variablesbeing incorrectly translated bysql_translate_env(#44)

# Beforedbplyr::translate_sql("2019-01-01",con = con)# '2019-01-01'# Nowdbplyr::translate_sql("2019-01-01",con = con)# DATE '2019-01-01'

R functionspaste/paste0 would use defaultdplyr:sql-translate-env (concat_ws).paste0 now uses Presto’sconcat function andpaste now uses pipes to get extra flexibility for customseparating values.

# R code:paste("hi","bye",sep ="-")# SQL translation:('hi'||'-'||'bye')

If table exists and parameterappend set toTRUE then existing s3.location will be utilised (#73)
db_compute returned table name, however when a userwished to write table to another location (#74). An error would beraised:Error: SYNTAX_ERROR: line 2:6: Table awsdatacatalog.default.temp.iris does not existThis has now been fixed with db_compute returningdbplyr::in_schema.

library(DBI)library(dplyr)con<-dbConnect(RAthena::athena())tbl(con,"iris")%>%compute(name ="temp.iris")

dbListFields didn’t display partitioned columns. Thishas now been fixed with the call to AWS Glue being altered to includemore metadata allowing for column names and partitions to bereturned.
RStudio connections tab didn’t display any partitioned columns, thishas been fixed in the same manner asdbListFields

New Feature

RAthena_options
- Now checks if desired file parser is installed before changedfile_parser method
- File parservroom has been restricted to >= 1.2.0due to integer64 support and changes tovroom api
dbStatistics is a wrapper aroundboto3get_query_execution to return statistics forRAthena::dbSendQuery results (#67)
dbGetQuery has new parameterstatistics toprint outdbStatistics before returning Athena results(#67)
s3.location now follows new syntaxs3://bucket/{schema}/{table}/{partition}/{table_file} toalign withPyathena and to allow tables with same name butin different schema to be uploaded to s3 (#73).
Thanks to@OssiLehtinen for improving the speedofdplyr::tbl when calling Athena when using the identmethod (noctua# 64):

library(DBI)library(dplyr)con<-dbConnect(RAthena::athena())# ident method:t1<-system.time(tbl(con,"iris"))# sub query method:t2<-system.time(tbl(con,sql("select * from iris")))# ident method# user  system elapsed# 0.082   0.012   0.288# sub query method# user  system elapsed# 0.993   0.138   3.660

Unit test

dplyr sql_translate_env: expected results have now beenupdated to take into account bug fix with date fields
S3 upload location: Test if the created s3 location is in thecorrect location

RAthena 1.7.0

New Feature

Added integration into Rstudio connections tab
Added information message of amount of data scanned by AWSAthena
Added method to change backend file parser so user can change fileparser fromdata.table tovroom. From now onit is possible to change file parser usingRAthena_optionsfor example:

library(RAthena)RAthena_options("vroom")

new functiondbGetTables that returns Athena hierarchyas a data.frame

Unit tests

Added data transfer unit test for backend file parservroom

Documentation

Updated R documentation toroxygen2 7.0.2

RAthena 1.6.0

Major Change

Default delimited file uploaded to AWS Athena changed from “csv” to“tsv” this is due to separating value “,” in character variables. Byusing “tsv” file type JSON/Array objects can be passed to Athena throughcharacter types. To prevent this becoming a breaking changedbWriteTableappend parameter checks and usesexisting AWS Athena DDL file type. Iffile.type doesn’tmatch Athena DDL file type then user will receive a warningmessage:

warning('Appended `file.type` is not compatible with the existing Athena DDL file type and has been converted to "', File.Type,'".',call. =FALSE)

Minor Change

Added AWS_ATHENA_WORK_GROUP environmental variable support
Removedtolower conversion due to request #41

New Feature

Due to help from@OssiLehtinen,dbRemoveTable can now remove S3 files for AWS Athena tablebeing removed.

Bug fix

Due to issue highlighted by@OssiLehtinen in #50, specialcharacters have issue being processed when using flat file in thebackend.
Fixed issue whereas.character was getting wronglytranslated #45
Fixed issue where row.names not being correctly catered andreturning NA in column names #41
Fixed issue withINTEGER being incorrectly translatedinsql_translate_env.R

Unit Tests

Special characters have been added to unit testdata-transfer
dbRemoveTable new parameters are added in unittest
Added row.names to unit test data transfer
Updated dplyrsql_translate_env until test to cater bugfix

RAthena 1.5.0

Major Change

dbWriteTable now will splitgzipcompressed files to improve AWS Athena performance. By defaultgzip compressed files will be split into 20.

Performance results

library(DBI)X<-1e8df<-data.frame(w =runif(X),x =1:X,y =sample(letters, X,replace = T),z =sample(c(TRUE,FALSE), X,replace = T))con<-dbConnect(RAthena::athena())# upload dataframe with different splitsdbWriteTable(con,"test_split1", df,compress = T,max.batch =nrow(df),overwrite = T)# no splitsdbWriteTable(con,"test_split2", df,compress = T,max.batch =0.05*nrow(df),overwrite = T)# 20 splitsdbWriteTable(con,"test_split3", df,compress = T,max.batch =0.1*nrow(df),overwrite = T)# 10 splits

AWS Athena performance results from AWS console (query executed:select count(*) from .... ):

test_split1: (Run time: 38.4 seconds, Data scanned: 1.16 GB)
test_split2: (Run time: 3.73 seconds, Data scanned: 1.16 GB)
test_split3: (Run time: 5.47 seconds, Data scanned: 1.16 GB)

library(DBI)X<-1e8df<-data.frame(w =runif(X),x =1:X,y =sample(letters, X,replace = T),z =sample(c(TRUE,FALSE), X,replace = T))con<-dbConnect(RAthena::athena())dbWriteTable(con,"test_split1", df,compress = T,overwrite = T)# default will now split compressed file into 20 equal size files.

Added information message to inform user about what files have beenadded to S3 location if user is overwriting an Athena table.

Minor Change

copy_to method now supports compress and max_batch, toalign withdbWriteTable

Bug Fix

Fixed bug in regards to Athena DDL being created incorrectly whenpassed fromdbWriteTable
Thanks to@OssiLehtinen for identifying issuearound uploading classPOSIXct to Athena. This class wasconvert incorrectly and AWS Athena would return NA instead.RAthena will now correctly convertPOSIXct totimestamp but will also correct read in timestamp intoPOSIXct
Thanks to@OssiLehtinen for discovering an issuewithNA in string format. BeforeRAthena wouldreturnNA in string class as"" this has nowbeen fixed.
When returning a single column data.frame from Athena,RAthena would translate output into a vector with currentthe methoddbFetch n = 0.
Thanks to@OssiLehtinen for identifying issuearoundsql_translate_env. PreviouslyRAthenawould take the defaultdplyr::sql_translate_env, nowRAthena has a custom method that uses Data types from:https://docs.aws.amazon.com/athena/latest/ug/data-types.html and windowfunctions from:https://docs.aws.amazon.com/athena/latest/ug/functions-operators-reference-section.html

Unit tests

POSIXct class has now been added to data transfer unittest
dplyr sql_translate_env tests if R functions arecorrect translated in to Athenasql syntax.

RAthena 1.4.1

New Features:

Parquet file type can now be compress using snappy compression whenwriting data to S3.

Bug Fix

Older versions of R are returning errors when functiondbWriteTable is called. The bug is due to functionsqlCreateTable whichdbWriteTable calls.Parameterstable andfields were set toNULL. This has now been fixed.

RAthena 1.4.0

Minor Change

s3.location parameter isdbWriteTable cannow be made nullable
sqlCreateTable info message will now only inform userif colnames have changed and display the column name that havechanged

Backend Change

helper functionupload_data has been rebuilt andremoved the old “horrible” if statement withpaste now thefunction relies onsprintf to construct the s3 locationpath. This method now is a lot clearer in how the s3 location is createdplus it enables adbWriteTable to be simplified.dbWriteTable can now upload data to the default s3_stagingdirectory created indbConnect this simplifiesdbWriteTable to :

library(DBI)con<-dbConnect(RAthena::athena())dbWrite(con,"iris", iris)

New Feature

GZIP compression is now supported for “csv” and “tsv” file format indbWriteTable

Bug Fix

Info message wasn’t being return when colnames needed changing forAthena DDL

Unit Tests

data transfer test now tests compress, and defaults3.location when transferring data

RAthena 1.3.0

Major Changes

RAthena now defaults in using data.table to read and write fileswhen transferring data to and from AWS Athena. Reason for change:
- Increase speed in data transfer
- Data types from AWS Athena can be passed todata.table::fread. This enables data types to be read incorrectly and not required a second stage to convert data types oncedata has been read into R

Minor Change

Progress bar fromdata.table::fread anddata.table::fwrite have been disabled
Removedutil functions from namespace:write.table,read.csv
Addeddata.table to namespace

New Features:

AWS Athenabigint are convert into Rbit64::integer64 and visa versa

Unit tests

Addedbigint tointeger64 in data.transferunit test

RAthena 1.2.0

Minor Changes

Removed old s3_staging_dir validation check fromdbConnect method
ImproveddbFetch with chunk sizes between 0 - 999.Fixed error wherefor loop would return error instead ofbreaking.
simplifiedpy_error function, setcall.parameter toFALSE
AthenaQuery s4 class changed toAthenaResult
dbFetch added datatype collection
dbFetch replaced S3 search for query key with outputlocation from Athena
dbClearResult changed error, to return python error aswarning to warn user doesn’t have permission to delete S3 resource
dbClearResult replaced S3 search for query key with outlocation from Athena
dbListTables now returns vector of tables fromaws glue instead of using anAWS Athena query.This method increases speed of call of query
dbListFields now returns column names fromaws glue instead of using anAWS Athenaquery.. This method increases speed of call of query
dbExistsTable now returns boolean fromaws glue instead of using anAWS Athenaquery.. This method increases speed of call of query

New Features

new lower level api to work with Athena work groups:
- create_work_group: Creates a workgroup with thespecified name.
- delete_work_group: Deletes the workgroup with thespecified name.
- list_work_group: Lists available workgroups for theaccount.
- get_work_group: Returns information about the workgroupwith the specified name.
- update_work_group: Updates the workgroup with thespecified name. The workgroup’s name cannot be changed.
Added lower level api functionget_session_token tocreate temporary session credentials
Added lower level api functionassume_role to assumeAWS ARN Role
Added support for AWS ARN Role when connecting to Athena usingdbConnect
Created helper functionset_aws_env to set aws tokensto environmental variables
Created helper functionget_aws_env to return expectedresults from system variables
Created a helper functiontag_options to create tagoptions forcreate_work_group
Created a helper functionwork_group_config andwork_group_config_update to create config of workgroup
Added extra feature to get work group output location in connectionfunctionAthenaConnection
createddbColumnInfo method: returns data.framecontainingfield_name andtype
Created helper functiontime_check to check how long isleft on the Athena Connection, if less than 15 minutes a warning messageis outputted to notify user
Created s3 method for functiondb_collect for betterintegration with dplyr
Created s3 method for functiondb_save_query for betterintegration with dplyr
Created s3 method for functiondb_copy_to for betterintegration with dplyr

Bug Fix

dbFetch Athena data type miss alignment
Added Athena classes and names to file readers to prevent missclassification
Fixed Athena ddl and underlying data in s3 being miss aligned.Causing parquet files being read by Athena to fail.

Unit tests

ARN Connection
Athena Work Groups
Athena Metadata
dplyr compute
dplyr copy_to

RAthena 1.1.0

New Features

Added new features inAthenaConnection:
- poll_interval: Amount of time took when checking query executionstate.
- work_group: allows users to assign work groups to Athenaresource
- encryption_option: Athena encryption at rest
- kms_key:AWS Key Management Service
New helper functionrequest build Athena queryrequest
Created s3 method for functiondb_desc
Changed to default polling value from 1 to be random intervalbetween 0 - 1
Added parameter validation ondbConnect

Unit tests

Athena Request

RAthena 1.0.3

New Feature

Addedstop_query_execution todbClearResult if the query is still running

Bug Fix

Fixed bug of miss-alignment of s3 location and Athena table on lowerlevel folder structures, when writing data.frame to Athena (usingdbWriteTable)

Unit tests

Added logical variable type in data transfer unit test

RAthena 1.0.2

CRAN Requirement Changes

Added explanation to DBI alias (Database Interface) to descriptionfile due to cran feedback
split functions out of overall r documentation
- Added extra examples to each function
- Added return values
- Added examples that allowed users not to require aws keys
- Added note for user on some example that require an AWS account torun example

Bug Fixes

fixed bug with field names containing “.”, replace with “_” forAthena tables.

Unit tests

Athena DDL
Athena Classes
Data Transfer
Disconnect
Exist/Remove

RAthena 1.0.1

CRAN Requirement Changes

Fixes to description file due to cran submission feedback
- Added link to AWS Athena
- Added description of aliases “AWS”, and “SDK” to descriptionsection
- Removed generated MIT license from github
- Formatted LICENSE file

Minor Change

changed helper functionwaiter topoll, toalign with python’s polling

RAthena 1.0.0

Initial RAthena release

[8]ページ先頭