
Spark Session

The entry point to programming Spark with the Dataset and DataFrame API. To create a Spark session, use the SparkSession.builder attribute. See also SparkSession.

SparkSession.active()

Returns the active or default SparkSession for the current thread, returned by the builder.

SparkSession.builder.appName(name)

Sets a name for the application, which will be shown in the Spark web UI.

SparkSession.builder.config([key, value, ...])

Sets a config option.

SparkSession.builder.enableHiveSupport()

Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions.

SparkSession.builder.getOrCreate()

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.

SparkSession.builder.master(master)

Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.

SparkSession.builder.remote(url)

Sets the Spark remote URL to connect to, such as "sc://host:port" to run it via Spark Connect server.

SparkSession.addArtifact(*path[, pyfile, ...])

Add artifact(s) to the client session.

SparkSession.addArtifacts(*path[, pyfile, ...])

Add artifact(s) to the client session.

SparkSession.addTag(tag)

Add a tag to be assigned to all the operations started by this thread in this session.

SparkSession.catalog

Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.

SparkSession.clearTags()

Clear the current thread's operation tags.

SparkSession.conf

Runtime configuration interface for Spark.

SparkSession.createDataFrame(data[, schema, ...])

Creates a DataFrame from an RDD, a list, a pandas.DataFrame, a numpy.ndarray, or a pyarrow.Table.

SparkSession.dataSource

Returns a DataSourceRegistration for data source registration.

SparkSession.getActiveSession()

Returns the active SparkSession for the current thread, returned by the builder.

SparkSession.getTags()

Get the tags that are currently set to be assigned to all the operations started by this thread.

SparkSession.interruptAll()

Interrupt all operations of this session currently running on the connected server.

SparkSession.interruptOperation(op_id)

Interrupt an operation of this session with the given operationId.

SparkSession.interruptTag(tag)

Interrupt all operations of this session with the given operation tag.

SparkSession.newSession()

Returns a new SparkSession that has separate SQLConf, registered temporary views, and UDFs, but a shared SparkContext and table cache.

SparkSession.profile

Returns a Profile for performance/memory profiling.

SparkSession.removeTag(tag)

Remove a tag previously added to be assigned to all the operations started by this thread in this session.

SparkSession.range(start[, end, step, ...])

Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.

SparkSession.read

Returns a DataFrameReader that can be used to read data in as a DataFrame.

SparkSession.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.

SparkSession.sparkContext

Returns the underlying SparkContext.

SparkSession.sql(sqlQuery[, args])

Returns a DataFrame representing the result of the given query.

SparkSession.stop()

Stop the underlying SparkContext.

SparkSession.streams

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.

SparkSession.table(tableName)

Returns the specified table as a DataFrame.

SparkSession.tvf

Returns a tvf.TableValuedFunction that can be used to call a table-valued function (TVF).

SparkSession.udf

Returns a UDFRegistration for UDF registration.

SparkSession.udtf

Returns a UDTFRegistration for UDTF registration.

SparkSession.version

The version of Spark on which this application is running.

is_remote()

Returns whether the current running environment is for Spark Connect.

Spark Connect Only

SparkSession.builder.create()

Creates a new SparkSession.

SparkSession.clearProgressHandlers()

Clear all registered progress handlers.

SparkSession.client

Gives access to the Spark Connect client.

SparkSession.copyFromLocalToFs(local_path, ...)

Copy a file from the local filesystem to the cloud storage file system.

SparkSession.registerProgressHandler(handler)

Register a progress handler to be called when a progress update is received from the server.

SparkSession.removeProgressHandler(handler)

Remove a progress handler that was previously registered.

