Spark

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Document loaders

PySpark

It loads data from a PySpark DataFrame.

See a usage example.

from langchain_community.document_loaders import PySparkDataFrameLoader

API Reference: PySparkDataFrameLoader
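
As a rough sketch of how the loader might be used, the helper below builds a small DataFrame and loads it as documents. The helper name `load_rows_as_documents`, the sample rows, and the column names are hypothetical and chosen only for illustration. The imports sit inside the function so the snippet can be defined without Spark installed; it assumes `pyspark` and `langchain-community` are available when the function is actually called.

```python
# Hypothetical sample data: (text, source) pairs used only for illustration.
SAMPLE_ROWS = [
    ("Spark is a unified analytics engine.", "intro"),
    ("MLlib provides machine learning tools.", "mllib"),
]
SAMPLE_COLUMNS = ["text", "source"]


def load_rows_as_documents(rows=SAMPLE_ROWS, columns=SAMPLE_COLUMNS,
                           page_content_column="text"):
    """Create a PySpark DataFrame from rows and load it as LangChain documents."""
    # Imported lazily: both packages are assumed to be installed at call time.
    from pyspark.sql import SparkSession
    from langchain_community.document_loaders import PySparkDataFrameLoader

    # Reuse an existing session if one is running, otherwise start a local one.
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(rows, schema=columns)

    # page_content_column names the column whose value becomes the document text.
    loader = PySparkDataFrameLoader(
        spark, df, page_content_column=page_content_column
    )
    return loader.load()
```

Calling `load_rows_as_documents()` should return one document per row, with the `text` column as the page content and the remaining columns carried as metadata.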

Tools/Toolkits

Spark SQL toolkit

Toolkit for interacting with Spark SQL.

See a usage example.

from langchain_community.agent_toolkits import SparkSQLToolkit, create_spark_sql_agent
from langchain_community.utilities.spark_sql import SparkSQL

API Reference: SparkSQLToolkit | create_spark_sql_agent | SparkSQL
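
The three imports above are typically combined as in the following minimal sketch. The helper name `build_spark_sql_agent` is hypothetical, and the `llm` argument is assumed to be any LangChain-compatible language model supplied by the caller. The imports are deferred into the function body, so `pyspark` and `langchain-community` are only needed when it is called.

```python
def build_spark_sql_agent(llm, schema=None, verbose=False):
    """Wire a SparkSQL wrapper, the toolkit, and an agent together.

    `llm` is assumed to be a LangChain-compatible language model.
    `schema` optionally names the Spark database (schema) to work in.
    """
    # Imported lazily: both packages are assumed to be installed at call time.
    from langchain_community.agent_toolkits import (
        SparkSQLToolkit,
        create_spark_sql_agent,
    )
    from langchain_community.utilities.spark_sql import SparkSQL

    # With no session passed in, SparkSQL is expected to obtain one itself.
    spark_sql = SparkSQL(schema=schema)

    # The toolkit bundles the Spark SQL tools around the same database wrapper.
    toolkit = SparkSQLToolkit(db=spark_sql, llm=llm)

    return create_spark_sql_agent(llm=llm, toolkit=toolkit, verbose=verbose)
```

The returned agent executor can then be given a natural-language question about the tables in the chosen schema, for example through its `invoke` method.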

Spark SQL individual tools

You can use individual tools from the Spark SQL Toolkit:

InfoSparkSQLTool: tool for getting metadata about Spark SQL tables
ListSparkSQLTool: tool for getting table names
QueryCheckerTool: tool that uses an LLM to check whether a query is correct
QuerySparkSQLTool: tool for running Spark SQL queries

from langchain_community.tools.spark_sql.tool import InfoSparkSQLTool
from langchain_community.tools.spark_sql.tool import ListSparkSQLTool
from langchain_community.tools.spark_sql.tool import QueryCheckerTool
from langchain_community.tools.spark_sql.tool import QuerySparkSQLTool

API Reference: InfoSparkSQLTool | ListSparkSQLTool | QueryCheckerTool | QuerySparkSQLTool
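
If only some of these tools are needed, they can be constructed directly rather than through the toolkit. The sketch below is one possible way to do that; the helper name `build_spark_sql_tools` is hypothetical, `db` is assumed to be a `SparkSQL` instance, and `llm` is assumed to be a LangChain-compatible language model, which only the query checker needs.

```python
def build_spark_sql_tools(db, llm):
    """Instantiate the four Spark SQL tools around one SparkSQL wrapper.

    `db` is assumed to be a langchain_community SparkSQL instance.
    `llm` is assumed to be a LangChain-compatible language model.
    """
    # Imported lazily: langchain-community is assumed to be installed at call time.
    from langchain_community.tools.spark_sql.tool import (
        InfoSparkSQLTool,
        ListSparkSQLTool,
        QueryCheckerTool,
        QuerySparkSQLTool,
    )

    return [
        ListSparkSQLTool(db=db),           # lists table names
        InfoSparkSQLTool(db=db),           # returns metadata for given tables
        QueryCheckerTool(db=db, llm=llm),  # asks the LLM to review a query
        QuerySparkSQLTool(db=db),          # executes a Spark SQL query
    ]
```

The resulting list can be passed to any agent constructor that accepts LangChain tools, or trimmed to the subset a given application needs.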
