Spark Streaming (Legacy)#
Core Classes#
| Class | Description |
| --- | --- |
| `StreamingContext` | Main entry point for Spark Streaming functionality. |
| `DStream` | A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data. |
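The "continuous sequence of RDDs" idea can be illustrated with a small pure-Python model (no Spark required): each batch interval contributes one RDD, modeled here as a plain list, and a DStream operation applies the same per-RDD operation to every batch. The `dstream_map` helper below is an illustration of the abstraction only, not part of the pyspark API.

```python
# Pure-Python model of a DStream: a list of batches, where each batch
# stands in for one RDD generated during one batch interval.
def dstream_map(batches, f):
    """Apply f to every element of every batch, as DStream.map does per RDD."""
    return [[f(x) for x in batch] for batch in batches]

batches = [[1, 2], [3], [4, 5, 6]]   # three batch intervals
doubled = dstream_map(batches, lambda x: x * 2)
print(doubled)  # [[2, 4], [6], [8, 10, 12]]
```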
Streaming Management#
| Method | Description |
| --- | --- |
| `StreamingContext.addStreamingListener` | Add a StreamingListener object for receiving system events related to streaming. |
| `StreamingContext.awaitTermination` | Wait for the execution to stop. |
| `StreamingContext.awaitTerminationOrTimeout` | Wait for the execution to stop, or until the given timeout elapses. |
| `StreamingContext.checkpoint` | Set the context to periodically checkpoint the DStream operations for master fault-tolerance. |
| `StreamingContext.getActive` | Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None. |
| `StreamingContext.getActiveOrCreate` | Either return the active StreamingContext (i.e., one started but not stopped), or recreate a StreamingContext from checkpoint data, or create a new StreamingContext. |
| `StreamingContext.getOrCreate` | Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. |
| `StreamingContext.remember` | Set each DStream in this context to remember the RDDs it generated in the last given duration. |
| `StreamingContext.sparkContext` | Return the SparkContext associated with this StreamingContext. |
| `StreamingContext.start` | Start the execution of the streams. |
| `StreamingContext.stop` | Stop the execution of the streams, with the option of ensuring that all received data has been processed. |
| `StreamingContext.transform` | Create a new DStream in which each RDD is generated by applying a function on the RDDs of the given DStreams. |
| `StreamingContext.union` | Create a unified DStream from multiple DStreams of the same type and same slide duration. |
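The checkpoint and `getOrCreate` methods combine into the usual recover-or-create lifecycle. The sketch below assumes a local Spark installation; the app name and checkpoint directory are illustrative, and the Spark-dependent code is wrapped in functions that are defined but not called here, so the file can be read (and imported) without a running cluster.

```python
# Sketch of the checkpoint-and-recover lifecycle (illustrative names/paths).
CHECKPOINT_DIR = "/tmp/streaming-checkpoint"

def create_context():
    """Build a fresh StreamingContext and enable periodic checkpointing."""
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    sc = SparkContext("local[2]", "ManagementExample")
    ssc = StreamingContext(sc, 1)     # 1-second batch interval
    ssc.checkpoint(CHECKPOINT_DIR)    # checkpoint for master fault-tolerance
    return ssc

def run():
    """Not executed here; call it from a spark-submit job to try it out."""
    from pyspark.streaming import StreamingContext
    # Recreate from checkpoint data if present, otherwise call create_context.
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()                        # start the execution of the streams
    ssc.awaitTerminationOrTimeout(60)  # wait for the execution to stop
    ssc.stop(stopSparkContext=True, stopGraceFully=True)
```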
Input and Output#
| Method | Description |
| --- | --- |
| `StreamingContext.binaryRecordsStream` | Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length. |
| `StreamingContext.queueStream` | Create an input stream from a queue or list of RDDs. |
| `StreamingContext.socketTextStream` | Create an input stream from a TCP source hostname:port. |
| `StreamingContext.textFileStream` | Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files. |
| `DStream.pprint` | Print the first num elements of each RDD generated in this DStream. |
| `DStream.saveAsTextFiles` | Save each RDD in this DStream as a text file, using the string representation of its elements. |
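The fixed-length record format consumed by `binaryRecordsStream` can be shown with a pure-Python helper (runnable without Spark), followed by a hedged sketch of the socket-input/print/save API. The sketch assumes a local Spark installation; the host, port, and output prefix are illustrative, and the function is defined but not called here.

```python
# Pure-Python illustration of fixed-length binary records, the format
# binaryRecordsStream reads from flat binary files.
def split_fixed_records(data, record_length):
    """Split a byte string into records of exactly record_length bytes."""
    if record_length <= 0 or len(data) % record_length != 0:
        raise ValueError("data length must be a positive multiple of record_length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]

print(split_fixed_records(b"aabbcc", 2))  # [b'aa', b'bb', b'cc']

def run_socket_stream(host="localhost", port=9999):
    """Sketch of the input/output API; run it via spark-submit to try it out."""
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    sc = SparkContext("local[2]", "IOExample")
    ssc = StreamingContext(sc, 1)             # 1-second batches
    lines = ssc.socketTextStream(host, port)  # TCP source hostname:port
    lines.pprint(10)                          # first 10 elements of each RDD
    lines.saveAsTextFiles("out/lines")        # illustrative output prefix
    ssc.start()
    ssc.awaitTerminationOrTimeout(60)
    ssc.stop(stopSparkContext=True, stopGraceFully=True)
```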
Transformations and Actions#
| Method | Description |
| --- | --- |
| `DStream.cache` | Persist the RDDs of this DStream with the default storage level (MEMORY_ONLY). |
| `DStream.checkpoint` | Enable periodic checkpointing of RDDs of this DStream. |
| `DStream.cogroup` | Return a new DStream by applying 'cogroup' between RDDs of this DStream and 'other' DStream. |
| `DStream.combineByKey` | Return a new DStream by applying combineByKey to each RDD. |
| `DStream.context` | Return the StreamingContext associated with this DStream. |
| `DStream.count` | Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream. |
| `DStream.countByValue` | Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream. |
| `DStream.countByValueAndWindow` | Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream. |
| `DStream.countByWindow` | Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream. |
| `DStream.filter` | Return a new DStream containing only the elements that satisfy the predicate. |
| `DStream.flatMap` | Return a new DStream by applying a function to all elements of this DStream, and then flattening the results. |
| `DStream.flatMapValues` | Return a new DStream by applying a flatMap function to the value of each key-value pair in this DStream without changing the key. |
| `DStream.foreachRDD` | Apply a function to each RDD in this DStream. |
| `DStream.fullOuterJoin` | Return a new DStream by applying 'full outer join' between RDDs of this DStream and 'other' DStream. |
| `DStream.glom` | Return a new DStream in which each RDD is generated by applying glom() to each RDD of this DStream. |
| `DStream.groupByKey` | Return a new DStream by applying groupByKey on each RDD. |
| `DStream.groupByKeyAndWindow` | Return a new DStream by applying groupByKey over a sliding window. |
| `DStream.join` | Return a new DStream by applying 'join' between RDDs of this DStream and 'other' DStream. |
| `DStream.leftOuterJoin` | Return a new DStream by applying 'left outer join' between RDDs of this DStream and 'other' DStream. |
| `DStream.map` | Return a new DStream by applying a function to each element of this DStream. |
| `DStream.mapPartitions` | Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDD of this DStream. |
| `DStream.mapPartitionsWithIndex` | Return a new DStream in which each RDD is generated by applying mapPartitionsWithIndex() to each RDD of this DStream. |
| `DStream.mapValues` | Return a new DStream by applying a map function to the value of each key-value pair in this DStream without changing the key. |
| `DStream.partitionBy` | Return a copy of the DStream in which each RDD is partitioned using the specified partitioner. |
| `DStream.persist` | Persist the RDDs of this DStream with the given storage level. |
| `DStream.reduce` | Return a new DStream in which each RDD has a single element generated by reducing each RDD of this DStream. |
| `DStream.reduceByKey` | Return a new DStream by applying reduceByKey to each RDD. |
| `DStream.reduceByKeyAndWindow` | Return a new DStream by applying incremental reduceByKey over a sliding window. |
| `DStream.reduceByWindow` | Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream. |
| `DStream.repartition` | Return a new DStream with an increased or decreased level of parallelism. |
| `DStream.rightOuterJoin` | Return a new DStream by applying 'right outer join' between RDDs of this DStream and 'other' DStream. |
| `DStream.slice` | Return all the RDDs between 'begin' and 'end' (both inclusive). |
| `DStream.transform` | Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream. |
| `DStream.transformWith` | Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream and 'other' DStream. |
| `DStream.union` | Return a new DStream by unifying data of another DStream with this DStream. |
| `DStream.updateStateByKey` | Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key. |
| `DStream.window` | Return a new DStream in which each RDD contains all the elements seen in a sliding window of time over this DStream. |
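The stateful-update semantics described for `updateStateByKey` can be modeled in pure Python (no Spark required): for each batch of (key, value) pairs, the update function receives the new values for a key together with that key's previous state. The helpers below are a model of the semantics, not the pyspark implementation.

```python
# Pure-Python model of updateStateByKey semantics (illustration only).
def running_count(new_values, last_state):
    """Update function: add new counts to the previous state (None at first)."""
    return sum(new_values) + (last_state or 0)

def update_state_by_key(batches, update_func):
    """Fold batches of (key, value) pairs into per-key state, batch by batch."""
    state = {}
    for batch in batches:
        grouped = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        for key, values in grouped.items():
            state[key] = update_func(values, state.get(key))
    return state

batches = [[("a", 1), ("b", 1)], [("a", 1)]]
print(update_state_by_key(batches, running_count))  # {'a': 2, 'b': 1}
```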
Kinesis#
| Method | Description |
| --- | --- |
| `KinesisUtils.createStream` | Create an input stream that pulls messages from a Kinesis stream. |
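A hedged sketch of wiring up the Kinesis receiver follows. It assumes the `spark-streaming-kinesis-asl` artifact is on the classpath and AWS credentials are available in the environment; the app name, stream name, and region are illustrative, and the endpoint-URL helper is an assumption about standard AWS endpoints, not part of the pyspark API. The Spark-dependent function is defined but not called here.

```python
# Illustrative helper for standard AWS Kinesis endpoints (an assumption
# about AWS URL conventions, not part of the pyspark API).
def kinesis_endpoint(region):
    return "https://kinesis.%s.amazonaws.com" % region

def run_kinesis_stream():
    """Sketch of the Kinesis receiver; submit it with spark-submit to try it."""
    # Imported inside the function so the sketch loads without Spark installed.
    from pyspark import SparkContext, StorageLevel
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream
    sc = SparkContext("local[2]", "KinesisExample")
    ssc = StreamingContext(sc, 1)
    records = KinesisUtils.createStream(
        ssc,
        "kinesisAppName",                 # illustrative Kinesis application name
        "myStream",                       # illustrative Kinesis stream name
        kinesis_endpoint("us-east-1"),
        "us-east-1",
        InitialPositionInStream.LATEST,
        2,                                # checkpoint interval in seconds
        StorageLevel.MEMORY_AND_DISK_2)
    records.pprint()
    ssc.start()
    ssc.awaitTerminationOrTimeout(60)
    ssc.stop(stopSparkContext=True, stopGraceFully=True)
```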