
Spark Session

The entry point to programming Spark with the Dataset and DataFrame API. To create a Spark session, use the SparkSession.builder attribute. See also SparkSession.

SparkSession.active()

Returns the active or default SparkSession for the current thread, returned by the builder.

SparkSession.builder.appName(name)

Sets a name for the application, which will be shown in the Spark web UI.

SparkSession.builder.config([key, value, ...])

Sets a config option.

SparkSession.builder.enableHiveSupport()

Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions.

SparkSession.builder.getOrCreate()

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.

SparkSession.builder.master(master)

Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.

SparkSession.builder.remote(url)

Sets the Spark remote URL to connect to, such as "sc://host:port" to run it via Spark Connect server.

SparkSession.addArtifact(*path[, pyfile, ...])

Add artifact(s) to the client session.

SparkSession.addArtifacts(*path[, pyfile, ...])

Add artifact(s) to the client session.

SparkSession.addTag(tag)

Add a tag to be assigned to all the operations started by this thread in this session.

SparkSession.catalog

Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.

SparkSession.clearTags()

Clear the current thread's operation tags.

SparkSession.conf

Runtime configuration interface for Spark.

SparkSession.createDataFrame(data[, schema, ...])

Creates a DataFrame from an RDD, a list, a pandas.DataFrame, a numpy.ndarray, or a pyarrow.Table.

SparkSession.dataSource

Returns a DataSourceRegistration for data source registration.

SparkSession.getActiveSession()

Returns the active SparkSession for the current thread, returned by the builder.

SparkSession.getTags()

Get the tags that are currently set to be assigned to all the operations started by this thread.

SparkSession.interruptAll()

Interrupt all operations of this session currently running on the connected server.

SparkSession.interruptOperation(op_id)

Interrupt an operation of this session with the given operationId.

SparkSession.interruptTag(tag)

Interrupt all operations of this session with the given operation tag.

SparkSession.newSession()

Returns a new SparkSession that has separate SQLConf, registered temporary views, and UDFs, but a shared SparkContext and table cache.

SparkSession.profile

Returns a Profile for performance/memory profiling.

SparkSession.removeTag(tag)

Remove a tag previously added to be assigned to all the operations started by this thread in this session.

SparkSession.range(start[, end, step, ...])

Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.

SparkSession.read

Returns a DataFrameReader that can be used to read data in as a DataFrame.

SparkSession.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.

SparkSession.sparkContext

Returns the underlying SparkContext.

SparkSession.sql(sqlQuery[, args])

Returns a DataFrame representing the result of the given query.

SparkSession.stop()

Stop the underlying SparkContext.

SparkSession.streams

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.

SparkSession.table(tableName)

Returns the specified table as a DataFrame.

SparkSession.tvf

Returns a tvf.TableValuedFunction that can be used to call a table-valued function (TVF).

SparkSession.udf

Returns a UDFRegistration for UDF registration.

SparkSession.udtf

Returns a UDTFRegistration for UDTF registration.

SparkSession.version

The version of Spark on which this application is running.

is_remote()

Returns whether the current running environment is for Spark Connect.

Spark Connect Only

SparkSession.builder.create()

Creates a new SparkSession.

SparkSession.clearProgressHandlers()

Clear all registered progress handlers.

SparkSession.client

Gives access to the Spark Connect client.

SparkSession.copyFromLocalToFs(local_path, ...)

Copy a file from the local filesystem to the cloud storage file system.

SparkSession.registerProgressHandler(handler)

Register a progress handler to be called when a progress update is received from the server.

SparkSession.removeProgressHandler(handler)

Remove a progress handler that was previously registered.

