DataOps

DataOps is an automated, process-oriented methodology used by analytics and data teams to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has matured into a new and independent approach to data analytics. DataOps applies to the entire data lifecycle, from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.

Here are 172 public repositories matching this topic...

Languages: Python (48), Go (16), TypeScript (14), Jupyter Notebook (9), Java (8), Shell (8), HTML (7), JavaScript (4), R (4), Ruby (4)

cleanlab/cleanlab

Stars: 10.2k

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

data-science annotation data-validation exploratory-data-analysis weak-supervision dataops outlier-detection labeling datasets data-cleaning active-learning data-quality data-profiling data-curation dataquality noisy-labels out-of-distribution-detection data-labeling data-centric-ai llms

Updated Mar 12, 2025
Python

flyteorg/flyte

Stars: 6.1k

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

python kubernetes golang workflow data-science data machine-learning scale production declarative grpc dataops data-analysis kubernetes-operator hacktoberfest fine-tuning flyte mlops orchestration-engine llm

Updated Mar 17, 2025
Go

lancedb/lance

Stars: 4.3k

Modern columnar data format for ML and LLMs, implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.

python rust data-science machine-learning computer-vision deep-learning embeddings dataops data-analytics data-analysis data-format apache-arrow data-centric mlops duckdb llms

Updated Mar 16, 2025
Rust

redpanda-data/console

Stars: 3.9k

Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.

react go typescript kafka web-ui dataops apache-kafka kafka-ui kafka-gui

Updated Mar 14, 2025
TypeScript

whylabs/whylogs

Stars: 2.7k

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Jan 10, 2025
Jupyter Notebook

TobikoData/sqlmesh

Stars: 2.2k

Efficient data transformation and modeling framework that is backwards compatible with dbt.

python sql etl dataops dbt elt transformation dataengineering

Updated Mar 15, 2025
Python

lensesio/fast-data-dev

Stars: 2k

Kafka Docker image for development: Kafka, Zookeeper, Schema Registry, Kafka-Connect, 20+ connectors.

docker kafka schema-registry dataops kafka-rest-proxy

Updated Jan 30, 2025
Shell

elementary-data/elementary

Stars: 2k

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Mar 16, 2025
HTML

meltano/meltano

Stars: 2k

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

open-source tap data opensource integration pipelines data-engineering target dataops loaders elt extract-data data-pipelines singer connectors targets taps meltano dataops-platform meltano-sdk

Updated Mar 15, 2025
Python

alibaba/SREWorks

Stars: 1.9k

Cloud Native DataOps & AIOps Platform

kubernetes engineering application devops ops saas dataops maintenance k8s sre flink cloudnative operation aiops oam

Updated Apr 11, 2024
Java

datavane/tis

Stars: 1.1k

Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI.

java etl dataops flink cdc datax flink-streaming chunjun

Updated Mar 11, 2025
Java

opendatadiscovery/awesome-data-catalogs

Stars: 803

📙 Awesome Data Catalogs and Observability Platforms.

open-source metadata awesome opensource oss big-data opendata ml data-engineering dataops data-catalog data-discovery awesome-list observability data-quality metadata-management datacatalog datadiscovery

Updated Jul 27, 2024

raystack/optimus

Stars: 746

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

golang bigquery airflow automation etl analytics data-transformation data-warehouse business-intelligence dataops elt workflows data-pipelines data-modelling analytics-engineering

Updated Jun 8, 2024
Go

tenzir/tenzir

Stars: 670

Tenzir is the data pipeline engine for security teams.

security netflow pcap incident-response pipelines dataops suricata siem sigma soc hacktoberfest zeek investigation threathunting secdataops

Updated Mar 17, 2025
C++

Azure-Samples/modern-data-warehouse-dataops

Stars: 625

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

devops data fabric azure dataops cicd databricks datafactory automatedtesting

Updated Mar 16, 2025
Shell

taivop/awesome-data-annotation

Stars: 593

A list of tools for annotating data, managing annotations, etc.

nlp awesome computer-vision image-annotation annotations dataops awesome-list data-annotation

Updated Aug 1, 2024

polyaxon/traceml

Stars: 516

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

tracking data-science statistics spark tensorflow plotly pandas data-visualization pytorch dataops matplotlib dask data-exploration pandas-summary dataframes data-quality-checks data-quality data-profiling explainable-ai mlops

Updated Jan 5, 2025
Python

Titan-Systems/titan

Stars: 463

Titan Core - Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.

devops snowflake data-warehouse secops data-engineering dataops rbac data-governance compliance-as-code

Updated Mar 13, 2025
Python

vmware/versatile-data-kit

Stars: 443

One framework to develop, deploy and operate data workflows with Python and SQL.

python data-science data sql database pipeline etl analytics snowflake data-warehouse data-structures data-engineering dataops warehouse elt data-pipelines data-engineer trino data-lineage data-engineering-pipeline

Updated Mar 14, 2025
Python

flowerfine/scaleph

Stars: 377

Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel-on-Flink engine, the Flink Kubernetes Operator, and the Doris operator.

dataops data-platform flink dag doris flink-sql seatunnel flink-sql-gateway flink-kubernetes flink-kubernetes-operator doris-operator doris-manager

Updated Jan 10, 2025
Java

Followers: 46