Athena
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. Analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data sources or other cloud systems, using SQL or Python. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.
This notebook goes over how to load documents from AWS Athena.
Setting up
Follow the instructions to set up an AWS account.
Install the required Python library:
! pip install boto3
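The examples in this notebook pass a `profile_name`, which refers to a named profile in the local AWS configuration. As a minimal sketch, assuming credentials are kept in the standard shared credentials file (`~/.aws/credentials`, an INI-style file with one section per profile), the helper below lists the profiles defined there so the name can be checked before running a query. The function name `list_aws_profiles` is hypothetical and is not part of boto3 or LangChain.

```python
import configparser
from pathlib import Path


def list_aws_profiles(credentials_path=None):
    """Return the profile names found in an AWS shared credentials file.

    Hypothetical helper for illustration; defaults to ~/.aws/credentials.
    """
    if credentials_path is None:
        path = Path.home() / ".aws" / "credentials"
    else:
        path = Path(credentials_path)

    parser = configparser.ConfigParser()
    # read() silently skips a missing file, so the result is then an empty list.
    parser.read(path)
    # Each section header, e.g. [my_profile], is one named profile.
    return parser.sections()
```

Calling `list_aws_profiles()` with no argument reads the default location. Profiles defined only in `~/.aws/config` or supplied through environment variables are not covered by this sketch.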
Example
from langchain_community.document_loaders.athena import AthenaLoader
API Reference: AthenaLoader
database_name = "my_database"
# S3 location where Athena writes the query results
s3_output_path = "s3://my_bucket/query_results/"
query = "SELECT * FROM my_table"
# Named AWS credentials profile used to run the query
profile_name = "my_profile"

loader = AthenaLoader(
    query=query,
    database=database_name,
    s3_output_uri=s3_output_path,
    profile_name=profile_name,
)

documents = loader.load()
print(documents)
Example with metadata columns
database_name = "my_database"
# S3 location where Athena writes the query results
s3_output_path = "s3://my_bucket/query_results/"
query = "SELECT * FROM my_table"
# Named AWS credentials profile used to run the query
profile_name = "my_profile"
# Result columns to treat as document metadata
metadata_columns = ["_row", "_created_at"]

loader = AthenaLoader(
    query=query,
    database=database_name,
    s3_output_uri=s3_output_path,
    profile_name=profile_name,
    metadata_columns=metadata_columns,
)

documents = loader.load()
print(documents)
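To make the effect of `metadata_columns` more concrete, here is a minimal sketch of how a query result row could be split into document content and metadata. It is an illustration only and not the loader's actual implementation: the `SimpleDocument` class, the `rows_to_documents` helper, the `column: value` content layout, and the sample rows are all assumptions, and the code runs without AWS access.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(rows, metadata_columns=None):
    """Split each row (a dict) into content columns and metadata columns."""
    metadata_columns = list(metadata_columns or [])
    documents = []
    for row in rows:
        # Columns named in metadata_columns go into the metadata dict.
        metadata = {col: row[col] for col in metadata_columns if col in row}
        # Every other column is rendered as a "column: value" line of text.
        content = "\n".join(
            f"{col}: {value}"
            for col, value in row.items()
            if col not in metadata_columns
        )
        documents.append(SimpleDocument(page_content=content, metadata=metadata))
    return documents


# Made-up rows standing in for an Athena query result.
rows = [
    {"_row": 1, "_created_at": "2024-01-01", "title": "First", "body": "Hello"},
    {"_row": 2, "_created_at": "2024-01-02", "title": "Second", "body": "World"},
]
docs = rows_to_documents(rows, metadata_columns=["_row", "_created_at"])
print(docs[0].page_content)
print(docs[0].metadata)
```

In this sketch, omitting `metadata_columns` leaves every column in the page content and the metadata empty, which corresponds to the first example in this notebook.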
Related
- Document loader conceptual guide
- Document loader how-to guides