emr-cluster
Here are 101 public repositories matching this topic...
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated Mar 9, 2020 - Python
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Updated Aug 6, 2021 - Jupyter Notebook
Reference Architectures for Datalakes on AWS
Updated May 13, 2020 - HTML
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Updated Dec 12, 2023 - Jupyter Notebook
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
Updated Feb 5, 2025 - HCL
Bits of code I use during live demos
Updated Dec 19, 2024 - Jupyter Notebook
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can put to use quickly, so you can focus on writing PySpark code.
Updated Jun 13, 2022 - Python
This is an ETL application on AWS that uses general open sales and customer data, available here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. The data is a zipped file containing several CSV files, to which the application applies transformations.
Updated Feb 7, 2022 - Smarty
Apache Spark TPC-DS benchmark setup with EMR launch setup
Updated Jul 11, 2022 - Smarty
An end-to-end data pipeline for building a Data Lake and supporting reporting using Apache Spark.
Updated Jan 31, 2023 - Python
A Cassandra Architecture for GDELT Database 🌍
Updated Mar 7, 2019 - Shell
Uses EMR clusters to export DynamoDB tables to S3 and generates import steps
Updated Sep 16, 2022 - Shell
A boilerplate for Spark projects with Docker support for local development and scripts for EMR support.
Updated Dec 2, 2017 - Scala
Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed Airflow: extract data from S3, transform it using Spark, and load the transformed data back to S3.
Updated Jul 12, 2021 - Python
A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.
Updated Mar 7, 2020 - TSQL
This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
Updated Nov 12, 2023 - Python
Updated Jan 5, 2021 - HCL
Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)
Updated Dec 8, 2022 - Python
Event-driven EMR via Serverless
Updated Nov 22, 2017 - Python