DataCody/pvoutput-streaming-monitoring
Designed and implemented a complete data pipeline for solar panel performance monitoring using real-time web scraping, data cleaning (bronze/silver/gold layers), and interactive dashboards.
🔧 Technologies:
Python, Selenium, BeautifulSoup, pandas, dbt, Airflow, Databricks, PostgreSQL, Plotly/Dash
✨ Key Features:
- 📅 Automated daily data extraction from pvoutput.org
- 🧹 Data transformation into bronze → silver → gold layers using dbt
- ⏱️ Scheduled workflows and job orchestration via Airflow
- 📊 Interactive dashboards to visualize:
  - System efficiency
  - Power generation trends
  - Anomalies and system health
- 🔁 Modular design with support for multiple solar systems (multi-SID)
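As a rough illustration of what the bronze → silver → gold layering does, here is a minimal plain-Python sketch. It is not the project's dbt code (dbt models are written in SQL), and the field names (`sid`, `date`, `generated`), the sample rows, and the helper names `to_silver` and `to_gold` are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Bronze: hypothetical raw rows as a scraper might store them (all strings).
bronze = [
    {"sid": "1001", "date": "20240601", "generated": "12.4kWh"},
    {"sid": "1001", "date": "20240601", "generated": "12.4kWh"},  # duplicate row
    {"sid": "1001", "date": "20240602", "generated": "-"},        # missing reading
    {"sid": "2002", "date": "20240601", "generated": "8.1kWh"},
]


def to_silver(rows):
    """Clean bronze rows: parse types, drop unparseable readings and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["sid"], row["date"])
        if key in seen:
            continue
        raw = row["generated"].replace("kWh", "").strip()
        try:
            energy_kwh = float(raw)
        except ValueError:
            continue  # skip rows without a numeric reading
        seen.add(key)
        d = row["date"]
        cleaned.append({
            "sid": int(row["sid"]),
            "date": date(int(d[:4]), int(d[4:6]), int(d[6:8])),
            "energy_kwh": energy_kwh,
        })
    return cleaned


def to_gold(rows):
    """Aggregate silver rows into one dashboard-ready summary per SID."""
    totals = defaultdict(float)
    days = defaultdict(int)
    for row in rows:
        totals[row["sid"]] += row["energy_kwh"]
        days[row["sid"]] += 1
    return {
        sid: {
            "days": days[sid],
            "total_kwh": round(totals[sid], 3),
            "avg_kwh": round(totals[sid] / days[sid], 3),
        }
        for sid in totals
    }


silver = to_silver(bronze)
gold = to_gold(silver)
```

In this sketch the silver layer holds typed, deduplicated daily readings, and the gold layer holds per-system summaries that a dashboard could read directly.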
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Leveraged PySpark in Databricks to aggregate multi-site solar energy data across 30,000+ records, enabling system-wide performance benchmarking, anomaly detection, and cross-SID comparisons. Data stored as Delta tables for efficient downstream dashboard consumption.
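The cross-SID comparison and anomaly detection described above can be sketched in a few lines. This is a standard-library stand-in rather than the PySpark job itself, and it assumes a simple z-score rule; the source does not say which anomaly method is used. The sample readings, the threshold, and the function names `rank_sites` and `flag_anomalies` are hypothetical.

```python
from statistics import mean, pstdev


def rank_sites(daily_kwh_by_sid):
    """Return SIDs ordered by mean daily energy, highest first."""
    averages = {sid: mean(values) for sid, values in daily_kwh_by_sid.items()}
    return sorted(averages, key=averages.get, reverse=True)


def flag_anomalies(values, threshold=2.0):
    """Return indexes of readings more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all readings identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily generation (kWh) per system ID.
daily_kwh_by_sid = {
    1001: [10.0, 12.0],
    2002: [8.0, 8.0],
}
ranking = rank_sites(daily_kwh_by_sid)

# Hypothetical series for one system with a single bad day at index 4.
readings = [10.0, 10.5, 9.8, 10.2, 2.0, 10.1]
anomalies = flag_anomalies(readings)
```

In Spark the same idea would typically be expressed as a group-by aggregation over the SID column, with the results written to a Delta table for the dashboards.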
About
End-to-End PV Monitoring & Streaming Pipeline with Delta Lake