DataOps
DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.
Here are 173 public repositories matching this topic...
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Apr 10, 2025 - Python
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Updated Apr 11, 2025 - Go
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
Updated Apr 11, 2025 - Rust
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.
Updated Apr 11, 2025 - TypeScript
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Updated Jan 10, 2025 - Jupyter Notebook
Scalable and efficient data transformation framework - backwards compatible with dbt.
Updated Apr 13, 2025 - Python
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Updated Apr 10, 2025 - HTML
Kafka Docker for development. Kafka, ZooKeeper, Schema Registry, Kafka-Connect, 20+ connectors
Updated Mar 26, 2025 - Shell
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Updated Apr 13, 2025 - Python
Cloud Native DataOps & AIOps Platform
Updated Apr 11, 2024 - Java
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI.
Updated Apr 13, 2025 - Java
📙 Awesome Data Catalogs and Observability Platforms.
Updated Apr 2, 2025
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Updated Jun 8, 2024 - Go
Tenzir is the data pipeline engine for security teams.
Updated Apr 11, 2025 - C++
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
Updated Apr 10, 2025 - Shell
A list of tools for annotating data, managing annotations, etc.
Updated Aug 1, 2024
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Updated Apr 7, 2025 - Python
Titan Core - Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
Updated Mar 13, 2025 - Python
One framework to develop, deploy and operate data workflows with Python and SQL.
Updated Apr 7, 2025 - Python
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel-on-Flink engine, the Flink Kubernetes Operator, and the Doris operator.
Updated Jan 10, 2025 - Java