pyspark.TaskContext#
- classpyspark.TaskContext[source]#
Contextual information about a task which can be read or mutated duringexecution. To access the TaskContext for a running task, use:
TaskContext.get().New in version 2.2.0.
Examples
>>>frompysparkimportTaskContext
Get a task context instance from
RDD.>>>spark.sparkContext.setLocalProperty("key1","value")>>>taskcontext=spark.sparkContext.parallelize([1]).map(lambda_:TaskContext.get()).first()>>>isinstance(taskcontext.attemptNumber(),int)True>>>isinstance(taskcontext.partitionId(),int)True>>>isinstance(taskcontext.stageId(),int)True>>>isinstance(taskcontext.taskAttemptId(),int)True>>>taskcontext.getLocalProperty("key1")'value'>>>isinstance(taskcontext.cpus(),int)True
Get a task context instance from a dataframe via Python UDF.
>>>frompyspark.sqlimportRow>>>frompyspark.sql.functionsimportudf>>>@udf("STRUCT<anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>")...deftaskcontext_as_row():...taskcontext=TaskContext.get()...returnRow(...anum=taskcontext.attemptNumber(),...partid=taskcontext.partitionId(),...stageid=taskcontext.stageId(),...taskaid=taskcontext.taskAttemptId(),...prop=taskcontext.getLocalProperty("key2"),...cpus=taskcontext.cpus())...>>>spark.sparkContext.setLocalProperty("key2","value")>>>[(anum,partid,stageid,taskaid,prop,cpus)]=(...spark.range(1).select(taskcontext_as_row()).first()...)>>>isinstance(anum,int)True>>>isinstance(partid,int)True>>>isinstance(stageid,int)True>>>isinstance(taskaid,int)True>>>prop'value'>>>isinstance(cpus,int)True
Get a task context instance from a dataframe via Pandas UDF.
>>>importpandasaspd>>>frompyspark.sql.functionsimportpandas_udf>>>@pandas_udf("STRUCT<"..."anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>")...deftaskcontext_as_row(_):...taskcontext=TaskContext.get()...returnpd.DataFrame({..."anum":[taskcontext.attemptNumber()],..."partid":[taskcontext.partitionId()],..."stageid":[taskcontext.stageId()],..."taskaid":[taskcontext.taskAttemptId()],..."prop":[taskcontext.getLocalProperty("key3")],..."cpus":[taskcontext.cpus()]...})...>>>spark.sparkContext.setLocalProperty("key3","value")>>>[(anum,partid,stageid,taskaid,prop,cpus)]=(...spark.range(1).select(taskcontext_as_row("id")).first()...)>>>isinstance(anum,int)True>>>isinstance(partid,int)True>>>isinstance(stageid,int)True>>>isinstance(taskaid,int)True>>>prop'value'>>>isinstance(cpus,int)True
Methods
How many times this task has been attempted.
cpus()CPUs allocated to the task.
get()Return the currently active
TaskContext.getLocalProperty(key)Get a local property set upstream in the driver, or None if it is missing.
The ID of the RDD partition that is computed by this task.
Resources allocated to the task.
stageId()The ID of the stage that this task belong to.
An ID that is unique to this task attempt (within the same
SparkContext, no two task attempts will share the same attempt ID).