spark-dataframes
Here are 48 public repositories matching this topic.
PySpark-Tutorial provides basic algorithms using PySpark
- Updated May 26, 2025 - Jupyter Notebook
Plain Stock Close-Price Prediction via Graves LSTM RNNs
- Updated Feb 15, 2021 - Java
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
- Updated Dec 4, 2025 - HTML
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
- Updated May 19, 2021 - Scala
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample programs for Spark in Scala.
- Updated Nov 16, 2022 - Scala
This repository contains Spark, MLlib, PySpark, and DataFrames projects
- Updated Oct 22, 2017 - Jupyter Notebook
Various stream and batch data processing demos with Apache Spark in Scala 🚀
- Updated Feb 28, 2020 - Scala
Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
- Updated Oct 10, 2019 - Python
Apache Spark Basics - Java Examples
- Updated Sep 9, 2016 - Java
A library of Java and Scala examples for Spark 2.x
- Updated Dec 29, 2016 - Java
Spark BigQuery Parallel
- Updated Jan 24, 2019 - Scala
This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying the movies with the most shared viewers, the system recommends 10 movies similar to a given target movie, in line with users’ preferences.
- Updated Jun 29, 2024 - Jupyter Notebook
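The similarity step in that description can be illustrated without a cluster. The following is a minimal plain-Python sketch, not that project's code: the `(user, movie, rating)` tuple format, the helper names `item_cosine_similarities` and `similar_movies`, and the choice to compute cosine similarity only over users who rated both movies are all assumptions. In PySpark the same pairwise aggregation would typically be expressed as a self-join on the user column followed by a group-by on the movie pair.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_cosine_similarities(ratings):
    """Return {(movie_a, movie_b): (cosine, shared_viewers)} for every pair of
    movies rated by at least one common user. `ratings` is an iterable of
    (user, movie, rating) tuples (an assumed input format)."""
    # Collect each user's ratings; this plays the role of a self-join on user.
    by_user = defaultdict(dict)
    for user, movie, rating in ratings:
        by_user[user][movie] = rating

    dot = defaultdict(float)
    norm_a = defaultdict(float)
    norm_b = defaultdict(float)
    shared = defaultdict(int)
    for movies in by_user.values():
        # Sorted pairs give each unordered movie pair a single key.
        for a, b in combinations(sorted(movies), 2):
            ra, rb = movies[a], movies[b]
            dot[(a, b)] += ra * rb
            norm_a[(a, b)] += ra * ra
            norm_b[(a, b)] += rb * rb
            shared[(a, b)] += 1

    sims = {}
    for pair, num in dot.items():
        denom = math.sqrt(norm_a[pair]) * math.sqrt(norm_b[pair])
        sims[pair] = (num / denom if denom else 0.0, shared[pair])
    return sims


def similar_movies(ratings, target, n=10, min_shared=1):
    """Top-n movies most similar to `target`, as (movie, score, shared) tuples."""
    candidates = []
    for (a, b), (score, count) in item_cosine_similarities(ratings).items():
        if count < min_shared or target not in (a, b):
            continue
        other = b if a == target else a
        candidates.append((other, score, count))
    # Highest similarity first; more shared viewers breaks ties.
    candidates.sort(key=lambda c: (-c[1], -c[2]))
    return candidates[:n]


# Hypothetical toy ratings for illustration only.
ratings = [
    ("u1", "A", 5), ("u1", "B", 5),
    ("u2", "A", 4), ("u2", "B", 4), ("u2", "C", 1),
    ("u3", "A", 1), ("u3", "C", 5),
]
top = similar_movies(ratings, "A")
```

Here "B" ranks first for target "A" because both users who rated the two movies scored them identically, while "C" was rated very differently. Raising `min_shared` is one way to discard pairs backed by too few common viewers.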
Use this project to join data from multiple CSV files. It currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.
- Updated Jul 1, 2022 - Java
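The difference between one-to-one and one-to-many joins comes down to how many right-hand rows share each key. A minimal sketch using only Python's standard library makes this concrete; it is not that project's Java code, and the sample CSV text and the `read_csv` / `inner_join` helpers are hypothetical. In Spark itself the equivalent would usually be `left_df.join(right_df, on="id", how="inner")`.

```python
import csv
import io
from collections import defaultdict


def read_csv(text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left_rows, right_rows, key):
    """Hash-based inner join on `key`.

    A left row with one matching right row yields one output row (one-to-one);
    a left row with several matches yields one output row per match
    (one-to-many). Left rows with no match are dropped.
    """
    # Index the right side by key so each lookup is O(1) on average.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            merged = dict(left)
            merged.update(right)  # assumption: right-hand values win on name clashes
            joined.append(merged)
    return joined


customers = read_csv("id,name\n1,Ada\n2,Lin\n")
emails = read_csv("id,email\n1,ada@example.com\n2,lin@example.com\n")
orders = read_csv("id,item\n1,book\n1,pen\n2,lamp\n")

one_to_one = inner_join(customers, emails, "id")   # one output row per customer
one_to_many = inner_join(customers, orders, "id")  # Ada appears once per order
```

Indexing the smaller side in a hash table mirrors what Spark does in a broadcast hash join; for two large inputs Spark would instead shuffle both sides by key.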
Big Data - Split a large CSV file into N smaller ones and save them to the local disk
- Updated Nov 3, 2018 - Scala
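One way to perform such a split without loading the whole file into memory is to stream the rows and deal them round-robin across N output files, repeating the header in each. The sketch below assumes that approach and a hypothetical `part-<i>.csv` naming scheme; both are illustrative choices, not necessarily what the listed Scala project does. With Spark, `df.repartition(n).write.csv(path, header=True)` produces a comparable result as a directory of part files.

```python
import csv
import os
import tempfile
from contextlib import ExitStack


def split_csv(src_path, out_dir, n):
    """Stream src_path into n files named part-0.csv ... part-{n-1}.csv.

    Rows are dealt round-robin, so part sizes differ by at most one row,
    and every part repeats the header. Returns the list of output paths.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"part-{i}.csv") for i in range(n)]

    # ExitStack closes the source and every output file, even on error.
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, newline=""))
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            raise ValueError("source CSV is empty")
        writers = []
        for path in paths:
            out = stack.enter_context(open(path, "w", newline=""))
            writer = csv.writer(out)
            writer.writerow(header)
            writers.append(writer)
        # Only one data row is held in memory at a time.
        for i, row in enumerate(reader):
            writers[i % n].writerow(row)
    return paths


# Usage: split a 7-row file into 3 parts inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "big.csv")
    with open(src_file, "w", newline="") as fh:
        csv.writer(fh).writerows([["id", "square"]] + [[i, i * i] for i in range(7)])

    row_counts = []
    headers = []
    for part in split_csv(src_file, os.path.join(tmp, "out"), 3):
        with open(part, newline="") as fh:
            rows = list(csv.reader(fh))
        headers.append(rows[0])
        row_counts.append(len(rows) - 1)  # exclude the header row
```

Round-robin keeps the parts balanced but does not preserve contiguous row ranges; if each part must hold a consecutive slice, a first pass to count rows would be needed.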
Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)
- Updated Jan 2, 2023 - Jupyter Notebook
Collection of PySpark programs and projects demonstrating the use of Apache Spark's Python API for big data processing and analysis. It includes practical implementations such as logistic regression classification, data analysis on the Iris dataset, and basic PySpark operations like temperature conversion.
- Updated Sep 29, 2025 - Jupyter Notebook
This is our final project for SFU's CMPT 353, taught by Greg Baker in Summer 2023
- Updated Aug 23, 2023 - Python