data-profiling
Here are 88 public repositories matching this topic...
One line of code for data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
- Updated Mar 21, 2025 - Python
Always know what to expect from your data.
- Updated Mar 21, 2025 - Python
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
- Updated Mar 12, 2025 - Python
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
- Updated Mar 21, 2025 - TypeScript
Visualize and compare datasets, target values and associations, with one line of code.
- Updated Aug 6, 2024 - Python
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
- Updated Mar 17, 2025 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
- Updated Dec 2, 2024 - Python
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
- Updated Feb 19, 2025 - Java
Automatically find issues in image datasets and practice data-centric computer vision.
- Updated Apr 23, 2024 - Python
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
- Updated Feb 20, 2025 - Java
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
- Updated Jan 5, 2025 - Python
Monitor the stability of a Pandas or Spark dataframe ⚙︎
- Updated Jan 24, 2025 - Python
Code review for data in dbt
- Updated Jan 3, 2025 - Python
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
- Updated Mar 2, 2025 - Python
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
- Updated Mar 15, 2025 - C++
Databricks framework to validate the data quality of PySpark DataFrames
- Updated Mar 19, 2025 - Python
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
- Updated Jul 15, 2023 - Vue
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.
- Updated Jan 14, 2025 - Java
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
- Updated Mar 19, 2025 - Python
Swiple enables you to easily observe, understand, validate and improve the quality of your data
- Updated Mar 20, 2025 - Python