API Reference
Amazon S3
| Copy a list of S3 objects to another S3 directory. |
| Delete Amazon S3 objects from a received S3 prefix or list of S3 object paths. |
| Describe Amazon S3 objects from a received S3 prefix or list of S3 object paths. |
| Check if an object exists on S3. |
| Download a file from a received S3 path to a local file. |
| Get the bucket region name. |
| List Amazon S3 buckets. |
| List Amazon S3 objects from a prefix. |
| List Amazon S3 objects from a prefix. |
| Merge a source dataset into a target dataset. |
| Read CSV file(s) from a received S3 prefix or list of S3 object paths. |
| Read Excel file(s) from a received S3 path. |
| Read fixed-width formatted file(s) from a received S3 prefix or list of S3 object paths. |
| Read JSON file(s) from a received S3 prefix or list of S3 object paths. |
| Read Parquet file(s) from an S3 prefix or list of S3 object paths. |
| Read Apache Parquet file(s) metadata from an S3 prefix or list of S3 object paths. |
| Read an Apache Parquet table registered in the AWS Glue Catalog. |
| Read ORC file(s) from an S3 prefix or list of S3 object paths. |
| Read Apache ORC file(s) metadata from an S3 prefix or list of S3 object paths. |
| Read an Apache ORC table registered in the AWS Glue Catalog. |
| Load Delta Lake table data from an S3 path. |
| Filter contents of Amazon S3 objects based on an SQL statement. |
| Get the size (ContentLength) in bytes of Amazon S3 objects from a received S3 prefix or list of S3 object paths. |
| Infer and store Parquet metadata in the AWS Glue Catalog. |
| Write a CSV file or dataset on Amazon S3. |
| Write an Excel file on Amazon S3. |
| Write a JSON file on Amazon S3. |
| Write a Parquet file or dataset on Amazon S3. |
| Write an ORC file or dataset on Amazon S3. |
| Write a DataFrame to S3 as a Delta Lake table. |
| Upload a local file to a received S3 path. |
| Wait until Amazon S3 objects exist. |
| Wait until Amazon S3 objects no longer exist. |
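Taken together, the write and read helpers above are typically used as a round trip. A minimal sketch, assuming awswrangler is installed and AWS credentials are configured; the bucket name and column schema are placeholders:

```python
def s3_round_trip(bucket: str):
    """Write a small DataFrame as a Parquet dataset, then read it back.

    Sketch only: the bucket name is hypothetical and real AWS
    credentials are required before calling this.
    """
    import pandas as pd
    import awswrangler as wr  # lazy import so the sketch loads without AWS deps

    df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
    path = f"s3://{bucket}/example/"

    # dataset=True enables dataset semantics (partitioning, mode="overwrite"/"append")
    wr.s3.to_parquet(df=df, path=path, dataset=True, mode="overwrite")
    return wr.s3.read_parquet(path=path, dataset=True)
```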
AWS Glue Catalog
| Add a column to an AWS Glue Catalog table. |
| Add partitions (metadata) to a CSV Table in the AWS Glue Catalog. |
| Add partitions (metadata) to a Parquet Table in the AWS Glue Catalog. |
| Create a CSV Table (Metadata Only) in the AWS Glue Catalog. |
| Create a database in AWS Glue Catalog. |
| Create a JSON Table (Metadata Only) in the AWS Glue Catalog. |
| Create a Parquet Table (Metadata Only) in the AWS Glue Catalog. |
| Get a Pandas DataFrame with all listed databases. |
| Delete a column from an AWS Glue Catalog table. |
| Delete a database in AWS Glue Catalog. |
| Delete specified partitions in an AWS Glue Catalog table. |
| Delete all partitions in an AWS Glue Catalog table. |
| Delete a Glue table if it exists. |
| Check if the table exists. |
| Drop all repeated columns (duplicated names). |
| Extract column and partition types (Amazon Athena) from a Pandas DataFrame. |
| Get all column comments. |
| Get all column parameters. |
| Get all partitions from a Table in the AWS Glue Catalog. |
| Get an iterator of databases. |
| Get all partitions from a Table in the AWS Glue Catalog. |
| Get all partitions from a Table in the AWS Glue Catalog. |
| Get table description. |
| Get table's location on Glue catalog. |
| Get total number of versions. |
| Get all parameters. |
| Get all columns and types from a table. |
| Get all versions. |
| Get an iterator of tables. |
| Overwrite all existing parameters. |
| Convert the column name to be compatible with Amazon Athena and the AWS Glue Catalog. |
| Normalize all columns names to be compatible with Amazon Athena. |
| Convert the table name to be compatible with Amazon Athena and the AWS Glue Catalog. |
| Get Pandas DataFrame of tables filtered by a search string. |
| Get table details as Pandas DataFrame. |
| Get a DataFrame with tables filtered by a search term, prefix, suffix. |
| Insert or Update the received parameters. |
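The create/sanitize/exists helpers above usually appear together when registering a dataset's metadata. A hedged sketch (the database, table, path, and column schema are hypothetical; requires awswrangler and AWS credentials at call time):

```python
def register_parquet_table(database: str, table: str, path: str) -> None:
    """Register a Parquet dataset in the Glue Catalog if not already present.

    Sketch only: all names and the column schema are placeholders.
    """
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    table = wr.catalog.sanitize_table_name(table)  # Athena-compatible name
    if not wr.catalog.does_table_exist(database=database, table=table):
        wr.catalog.create_parquet_table(
            database=database,
            table=table,
            path=path,  # e.g. "s3://bucket/prefix/"
            columns_types={"id": "bigint", "value": "string"},
        )
```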
Amazon Athena
| Create the default Athena bucket if it doesn't exist. |
| Create session and wait until ready to accept calculations. |
| Create a new table populated with the results of a SELECT query. |
| Generate the query that created a table (EXTERNAL_TABLE) or a view (VIRTUAL_TABLE). |
| Get the data type of all columns queried. |
| Fetch query execution details. |
| From specified query execution IDs, return a DataFrame of query execution details. |
| Get AWS Athena SQL query results as a Pandas DataFrame. |
| Get the named query statement string from a query ID. |
| Return information about the workgroup with the specified name. |
| Fetch the query execution IDs run in the specified workgroup, or in the primary workgroup if none is specified. |
| Execute any SQL query on AWS Athena and return the results as a Pandas DataFrame. |
| Extract a full table from AWS Athena and return the results as a Pandas DataFrame. |
| Run the Hive's metastore consistency check: 'MSCK REPAIR TABLE table;'. |
| Execute Spark Calculation and wait for completion. |
| Generate the query that created it: 'SHOW CREATE TABLE table;'. |
| Start a SQL Query against AWS Athena. |
| Stop a query execution. |
| Insert into an Athena Iceberg table using INSERT INTO. |
| Delete rows from an Iceberg table. |
| Write query results from a SELECT statement to the specified data format using UNLOAD. |
| Wait for the query end. |
| Create a SQL statement with the name statement_name to be run at a later time. |
| List the prepared statements in the specified workgroup. |
| Delete the prepared statement with the specified name from the specified workgroup. |
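Most of the Athena entries above are wrapped by a single high-level call. A sketch of the common path (database name hypothetical; requires awswrangler, AWS credentials, and an Athena-readable table):

```python
def query_athena(sql: str, database: str):
    """Run SQL on Athena and return the result as a Pandas DataFrame (sketch)."""
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    # ctas_approach=True (the default) stages results as Parquet via a CTAS
    # query, which is usually faster for large result sets.
    return wr.athena.read_sql_query(sql=sql, database=database, ctas_approach=True)
```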
Amazon Redshift
| Return a redshift_connector connection from a Glue Catalog or Secret Manager. |
| Return a redshift_connector temporary connection (No password required). |
| Load a Pandas DataFrame as a table on Amazon Redshift, using Parquet files on S3 as a stage. |
| Load files from S3 into a table on Amazon Redshift (through the COPY command). |
| Return a DataFrame corresponding to the result set of the query string. |
| Return a DataFrame corresponding to the table. |
| Write records stored in a DataFrame into Redshift. |
| Load a Pandas DataFrame from an Amazon Redshift query result, using Parquet files on S3 as a stage. |
| Unload Parquet files to S3 from a Redshift query result (through the UNLOAD command). |
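The COPY-based loader above is the usual bulk path into Redshift. A sketch (the Glue connection, staging path, schema, and table names are hypothetical; requires awswrangler and network access to the cluster):

```python
def load_to_redshift(df, glue_connection: str, staging_path: str,
                     schema: str, table: str) -> None:
    """Bulk-load a DataFrame into Redshift via Parquet staging on S3 (sketch)."""
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    con = wr.redshift.connect(connection=glue_connection)
    try:
        # Writes df to staging_path as Parquet, then issues a COPY command.
        wr.redshift.copy(df=df, path=staging_path, con=con,
                         table=table, schema=schema)
    finally:
        con.close()
```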
PostgreSQL
| Return a pg8000 connection from a Glue Catalog Connection. |
| Return a DataFrame corresponding to the result set of the query string. |
| Return a DataFrame corresponding to the table. |
| Write records stored in a DataFrame into PostgreSQL. |
MySQL
| Return a pymysql connection from a Glue Catalog Connection or Secrets Manager. |
| Return a DataFrame corresponding to the result set of the query string. |
| Return a DataFrame corresponding to the table. |
| Write records stored in a DataFrame into MySQL. |
Microsoft SQL Server
| Return a pyodbc connection from a Glue Catalog Connection. |
| Return a DataFrame corresponding to the result set of the query string. |
| Return a DataFrame corresponding to the table. |
| Write records stored in a DataFrame into Microsoft SQL Server. |
Oracle
| Return an oracledb connection from a Glue Catalog Connection. |
| Return a DataFrame corresponding to the result set of the query string. |
| Return a DataFrame corresponding to the table. |
| Write records stored in a DataFrame into Oracle Database. |
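The four relational-database modules above (PostgreSQL, MySQL, Microsoft SQL Server, Oracle) share the same connect / read-query / read-table / write shape. A sketch of the PostgreSQL variant (connection and table names hypothetical; requires awswrangler and a reachable database):

```python
def read_postgres_table(glue_connection: str, schema: str, table: str):
    """Read one PostgreSQL table into a DataFrame; the other SQL modules mirror this.

    Sketch only: all names are placeholders.
    """
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    con = wr.postgresql.connect(connection=glue_connection)
    try:
        return wr.postgresql.read_sql_table(table=table, schema=schema, con=con)
    finally:
        con.close()  # pg8000 connections should be closed explicitly
```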
Data API Redshift
| Provides access to a Redshift cluster via the Data API. |
| Create a Redshift Data API connection. |
| Run an SQL query on a RedshiftDataApi connection and return the result as a DataFrame. |
Data API RDS
| Provides access to the RDS Data API. |
| Create an RDS Data API connection. |
| Run an SQL query on an RdsDataApi connection and return the result as a DataFrame. |
| Insert data using an SQL query on a Data API connection. |
AWS Glue Data Quality
| Create recommendation Data Quality ruleset. |
| Create Data Quality ruleset. |
| Evaluate Data Quality ruleset. |
| Get a Data Quality ruleset. |
| Update Data Quality ruleset. |
OpenSearch
| Create a secure connection to the specified Amazon OpenSearch domain. |
| Create Amazon OpenSearch Serverless collection. |
| Create an index. |
| Delete an index. |
| Index all documents from a CSV file to OpenSearch index. |
| Index all documents to OpenSearch index. |
| Index all documents from a DataFrame to OpenSearch index. |
| Index all documents from JSON file to OpenSearch index. |
| Return results matching a query DSL as a pandas DataFrame. |
| Return results matching an SQL query as a pandas DataFrame. |
Amazon Neptune
| Create a connection to a Neptune cluster. |
| Return results of a Gremlin traversal as pandas DataFrame. |
| Return results of a openCypher traversal as pandas DataFrame. |
| Return results of a SPARQL query as pandas DataFrame. |
| Flatten the lists and dictionaries of the input data frame. |
| Write records stored in a DataFrame into Amazon Neptune. |
| Write records stored in a DataFrame into Amazon Neptune. |
| Write records into Amazon Neptune using the Neptune Bulk Loader. |
| Load files from S3 into Amazon Neptune using the Neptune Bulk Loader. |
DynamoDB
| Delete all items in the specified DynamoDB table. |
| Run a PartiQL statement against a DynamoDB table. |
| Get DynamoDB table object for specified table name. |
| Write all items from a CSV file to a DynamoDB table. |
| Write all items from a DataFrame to a DynamoDB table. |
| Insert all items into the specified DynamoDB table. |
| Write all items from a JSON file to a DynamoDB table. |
| Read items from a given DynamoDB table. |
| Read data from a DynamoDB table via a PartiQL query. |
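A sketch combining the DataFrame writer and reader above (the table name is hypothetical; the table and its key schema must already exist, and AWS credentials are required):

```python
def dynamodb_round_trip(table_name: str):
    """Put a small DataFrame into DynamoDB, then scan it back (sketch).

    Sketch only: assumes the table's partition key is named "pk".
    """
    import pandas as pd
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    df = pd.DataFrame({"pk": ["a", "b"], "value": [1, 2]})
    wr.dynamodb.put_df(df=df, table_name=table_name)
    # allow_full_scan=True permits reading without specifying keys.
    return wr.dynamodb.read_items(table_name=table_name, allow_full_scan=True)
```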
Amazon Timestream
| Batch load a Pandas DataFrame into an Amazon Timestream table. |
| Batch load files from S3 into an Amazon Timestream table. |
| Create a new Timestream database. |
| Create a new Timestream table. |
| Delete a given Timestream database. |
| Delete a given Timestream table. |
| List all databases in Timestream. |
| List all tables in Timestream. |
| Run a query and retrieve the result as a Pandas DataFrame. |
| Wait for the Timestream batch load task to complete. |
| Store a Pandas DataFrame into an Amazon Timestream table. |
| Unload query results to Amazon S3. |
| Unload query results to Amazon S3 and read the results as Pandas Data Frame. |
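The write and query entries above pair naturally. A sketch (database and table names hypothetical; the DataFrame is assumed to carry the named timestamp, measure, and dimension columns):

```python
def timestream_write_and_query(df, database: str, table: str):
    """Store measures into Timestream, then query the table back (sketch)."""
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    wr.timestream.write(
        df=df,
        database=database,
        table=table,
        time_col="time",           # timestamp column (assumed name)
        measure_col="cpu_pct",     # measure column (assumed name)
        dimensions_cols=["host"],  # dimension columns (assumed names)
    )
    return wr.timestream.query(f'SELECT * FROM "{database}"."{table}" LIMIT 10')
```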
AWS Clean Rooms
| Execute Clean Rooms Protected SQL query and return the results as a Pandas DataFrame. |
| Wait for the Clean Rooms protected query to end. |
Amazon EMR
| Build the Step structure (dictionary). |
| Build the Step structure (dictionary). |
| Create an EMR cluster with instance fleets configuration. |
| Get the EMR cluster state. |
| Get EMR step state. |
| Update internal ECR credentials. |
| Submit Spark Step. |
| Submit new job in the EMR Cluster. |
| Submit a list of steps. |
| Terminate EMR cluster. |
Amazon EMR Serverless
| Create an EMR Serverless application. |
| Run an EMR serverless job. |
| Wait for the EMR Serverless job to finish. |
Amazon CloudWatch Logs
| Run a query against AWS CloudWatchLogs Insights and convert the results to Pandas DataFrame. |
| Run a query against AWS CloudWatchLogs Insights and wait for the results. |
| Run a query against AWS CloudWatchLogs Insights. |
| Wait until the query ends. |
| List the log streams for the specified log group, return results as a Pandas DataFrame. |
| List log events from the specified log group. |
Amazon QuickSight
| Cancel an ongoing ingestion of data into SPICE. |
| Create a QuickSight data source pointing to an Athena/Workgroup. |
| Create a QuickSight dataset. |
| Create and start a new SPICE ingestion on a dataset. |
| Delete all dashboards. |
| Delete all data sources. |
| Delete all datasets. |
| Delete all templates. |
| Delete a dashboard. |
| Delete a data source. |
| Delete a dataset. |
| Delete a template. |
| Describe a QuickSight dashboard by name or ID. |
| Describe a QuickSight data source by name or ID. |
| Describe a QuickSight data source permissions by name or ID. |
| Describe a QuickSight dataset by name or ID. |
| Describe a QuickSight ingestion by ID. |
| Get the QuickSight dashboard ID for a given name, failing if more than one ID is associated with that name. |
| Get QuickSight dashboard IDs given a name. |
| Get the QuickSight data source ARN for a given name, failing if more than one ARN is associated with that name. |
| Get QuickSight Data source ARNs given a name. |
| Get the QuickSight data source ID for a given name, failing if more than one ID is associated with that name. |
| Get QuickSight data source IDs given a name. |
| Get the QuickSight dataset ID for a given name, failing if more than one ID is associated with that name. |
| Get QuickSight dataset IDs given a name. |
| Get the QuickSight template ID for a given name, failing if more than one ID is associated with that name. |
| Get QuickSight template IDs given a name. |
| List dashboards in an AWS account. |
| List all QuickSight data source summaries. |
| List all QuickSight dataset summaries. |
| List all QuickSight Groups. |
| List all QuickSight Group memberships. |
| List IAM policy assignments in the current Amazon QuickSight account. |
| List all the IAM policy assignments. |
| List the history of SPICE ingestions for a dataset. |
| List all QuickSight templates. |
| Return a list of all of the Amazon QuickSight users belonging to this account. |
| List the Amazon QuickSight groups that an Amazon QuickSight user is a member of. |
AWS STS
| Get Account ID. |
| Get current user/role ARN. |
| Get current user/role name. |
AWS Secrets Manager
| Get secret value. |
| Get JSON secret value. |
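A sketch of the JSON variant, e.g. for database credentials stored as a JSON secret (the secret name is hypothetical; requires awswrangler and AWS credentials):

```python
def get_db_credentials(secret_name: str) -> dict:
    """Fetch a JSON secret such as {"user": ..., "password": ...} (sketch)."""
    import awswrangler as wr  # lazy import; needs AWS credentials at call time

    return wr.secretsmanager.get_secret_json(name=secret_name)
```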
Amazon Chime
| Send a message to an existing Chime chat room. |
Typing
| Typed dictionary defining the settings for the Glue table. |
| Typed dictionary defining the settings for using CTAS (Create Table As Statement). |
| Typed dictionary defining the settings for using UNLOAD. |
| Typed dictionary defining the settings for using cached Athena results. |
| Typed dictionary defining the settings for Athena Partition Projection. |
| Report configuration for a batch load task. |
| Configuration for Arrow file decryption. |
| Configuration for Arrow file encryption. |
| Typed dictionary defining the settings for distributing calls using Ray. |
| Typed dictionary defining the settings for distributing reading calls using Ray. |
| Typed dictionary defining the dictionary returned by S3 write functions. |
| Named tuple defining the return value of the |
Global Configurations
| Reset one or all (if None is received) configuration values. |
| Load all configurations into a Pandas DataFrame. |
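A sketch of inspecting and resetting the global configuration object (assumes awswrangler's `wr.config`; no AWS calls are made, but the library must be installed before calling):

```python
def show_and_reset_config():
    """Dump the current configuration to a DataFrame, then reset it (sketch)."""
    import awswrangler as wr  # lazy import so the sketch loads without the library

    df = wr.config.to_pandas()  # one row per configurable item
    wr.config.reset()           # pass an item name to reset just that value
    return df
```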
Engine and Memory Format
| Execution engine configuration class. |
| Memory format configuration class. |
Distributed - Ray
| Connect to an existing Ray cluster or start one and connect to it. |