#spark-dataframes

Here are 48 public repositories matching this topic...

PySpark-Tutorial provides basic algorithms using PySpark

  • Updated May 26, 2025
  • Jupyter Notebook

Plain Stock Close-Price Prediction via Graves LSTM RNNs

  • Updated Feb 15, 2021
  • Java

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.

  • Updated Nov 16, 2022
  • Scala

This repository contains Spark, MLlib, PySpark and Dataframes projects

  • Updated Oct 22, 2017
  • Jupyter Notebook

Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster

  • Updated Oct 10, 2019
  • Python

This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying the movies with the highest number of shared viewers, the system recommends 10 movies similar to a given target movie, in line with users' preferences.

  • Updated Jun 29, 2024
  • Jupyter Notebook
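
The cosine-similarity approach mentioned in the description above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than PySpark, and it is not taken from the repository: the function name `top_similar_movies`, the `(user, movie, rating)` tuple layout, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def top_similar_movies(ratings, target, k=10):
    """Return up to k (movie, cosine_similarity) pairs for `target`.

    `ratings` is an iterable of (user, movie, rating) tuples; this layout
    is a hypothetical one chosen for the sketch.
    """
    # Build one sparse rating vector per movie: movie -> {user: rating}.
    by_movie = defaultdict(dict)
    for user, movie, rating in ratings:
        by_movie[movie][user] = rating

    target_vec = by_movie.get(target, {})
    target_norm = sqrt(sum(r * r for r in target_vec.values()))

    scores = []
    for movie, vec in by_movie.items():
        if movie == target:
            continue
        # Only users who rated both movies contribute to the dot product.
        shared = target_vec.keys() & vec.keys()
        if not shared:
            continue
        norm = sqrt(sum(r * r for r in vec.values()))
        if target_norm == 0 or norm == 0:
            continue  # all-zero vectors have no defined cosine similarity
        dot = sum(target_vec[u] * vec[u] for u in shared)
        scores.append((movie, dot / (target_norm * norm)))

    # Highest similarity first, truncated to the k best matches.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical sample data: movies "A" and "B" are rated identically.
sample_ratings = [
    ("u1", "A", 5), ("u1", "B", 5),
    ("u2", "A", 4), ("u2", "B", 4), ("u2", "C", 1),
    ("u3", "C", 5),
]
similar_to_a = top_similar_movies(sample_ratings, "A", k=2)
```

With real data the same idea would typically be expressed as a self-join on the user column of a ratings DataFrame, so that the pairwise dot products are computed in a distributed fashion.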

Getting started with Apache Spark

  • Updated Feb 16, 2024

BCG GAMMA CASE STUDY

  • Updated Jan 27, 2023
  • Jupyter Notebook

Use this project to join data from multiple CSV files. The project currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.

  • Updated Jul 1, 2022
  • Java

Big Data - Split a large CSV file into N smaller ones and save them to the local disk

  • Updated Nov 3, 2018
  • Scala

Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)

  • Updated Jan 2, 2023
  • Jupyter Notebook

Collection of PySpark programs and projects demonstrating the use of Apache Spark's Python API for big data processing and analysis. It includes practical implementations such as logistic regression classification, data analysis on the Iris dataset, and basic PySpark operations like temperature conversion.

  • Updated Sep 29, 2025
  • Jupyter Notebook

This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023

  • Updated Aug 23, 2023
  • Python
