Hackers and Slackers


Learn Apache Spark

Build enterprise-grade data pipelines to clean and manipulate data on a massive scale. Use PySpark to clean & transform tabular data and build live data streams that continuously ingest and process information. Take advantage of Spark's horizontally scalable infrastructure to run pipelines across multiple machines simultaneously.

What You'll Learn

  • Interact with Spark via a convenient notebook interface
  • Sanitize & transform tabular data
  • Get more from your data with complex joins and aggregations
  • Create a structured stream from a remote data source
  • Become familiar with Spark's underlying RDD data structure

This series is for people who:

  • Have a basic understanding of Python
  • Are current or aspiring Data Engineers
  • Need better tools to handle large volumes of data

1. Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.
13 min read
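
A minimal sketch of the setup this first part walks through, assuming a local PySpark install (on Databricks, every notebook already provides a ready-made `spark` session) and a hypothetical CSV path:

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession named `spark` already exists in every
# notebook; running locally, you build one yourself.
spark = SparkSession.builder.appName("spark-intro").getOrCreate()

# Hypothetical file path, purely for illustration.
df = spark.read.csv("/tmp/example.csv", header=True, inferSchema=True)
df.printSchema()
df.show(5)
```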
2. Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames such as creating new columns, filtering rows, or modifying string & number values.
15 min read
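
As a taste of the transformations covered here, a minimal sketch using invented toy data and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformations").getOrCreate()

df = spark.createDataFrame(
    [("alice", 34, 120.5), ("bob", 17, 98.0)],
    ["name", "age", "score"],
)

df = (
    df.withColumn("name", F.upper(F.col("name")))            # modify a string column
      .withColumn("score", F.round(F.col("score") * 1.1, 1))  # modify a number column
      .withColumn("is_adult", F.col("age") >= 18)             # create a new column
      .filter(F.col("is_adult"))                              # filter rows
)
df.show()
```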
3. Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques ranging from dropping rows to selecting important data.
18 min read
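
A minimal sketch of the kind of cleaning these techniques cover, again with invented toy data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cleaning").getOrCreate()

df = spark.createDataFrame(
    [("alice", 34), ("alice", 34), (None, 29), ("carol", None)],
    ["name", "age"],
)

cleaned = (
    df.dropDuplicates()           # drop exact duplicate rows
      .dropna(subset=["name"])    # drop rows missing a name
      .fillna({"age": 0})         # fill remaining null ages with a default
      .select("name", "age")      # keep only the columns that matter
)
cleaned.show()
```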
4. Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.
8 min read
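
A minimal structured-streaming sketch, substituting Spark's built-in `rate` test source for the remote data source the article works with:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming").getOrCreate()

# The "rate" source continuously emits (timestamp, value) rows,
# standing in here for a real remote feed.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

evens = stream.filter(F.col("value") % 2 == 0)

query = (
    evens.writeStream
         .format("console")     # print each micro-batch to stdout
         .outputMode("append")
         .start()
)
query.awaitTermination(10)  # let the stream run for ~10 seconds
query.stop()
```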
5. Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.
8 min read
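
A minimal RDD sketch showing the lazy transformation/action pattern this part covers, using the classic word count:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdds").getOrCreate()
sc = spark.sparkContext

# Transformations (flatMap, map, reduceByKey) are lazy;
# nothing executes until the collect() action runs.
lines = sc.parallelize(["spark is fast", "spark is scalable"])
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
print(counts.collect())
```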
6. Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.
7 min read
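
A minimal sketch of a SQL-like join followed by an aggregation, with invented orders/customers tables:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("joins").getOrCreate()

orders = spark.createDataFrame(
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 10.0)],
    ["order_id", "customer", "total"],
)
customers = spark.createDataFrame(
    [("alice", "US"), ("bob", "UK")],
    ["customer", "country"],
)

# Left join on the shared key, then aggregate per country.
report = (
    orders.join(customers, on="customer", how="left")
          .groupBy("country")
          .agg(
              F.count("order_id").alias("orders"),
              F.sum("total").alias("revenue"),
          )
)
report.show()
```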
