data-centric
Here are 44 public repositories matching this topic...
Low-code framework for building custom LLMs, neural networks, and other AI models
- Updated Jan 19, 2026 - Python
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
- Updated Feb 20, 2026 - Rust
A curated, but incomplete, list of data-centric AI resources.
- Updated Jun 26, 2024
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
- Updated Dec 9, 2024 - Python
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
- Updated May 23, 2025 - Python
Rust implementation of the Data Distribution Service (DDS)
- Updated Jan 21, 2026 - Rust
DataCLUE: A data-centric NLP benchmark and toolkit
- Updated May 11, 2022 - Python
Simulator framework for analysis of performance, energy consumption, area and cost of multi-node multi-chiplet tile-based manycore designs
- Updated Jun 30, 2024 - C++
🔥🔥🔥 KDD2024 Best Student Paper
- Updated Feb 21, 2025 - Python
[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"
- Updated Jun 23, 2023 - Python
A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective
- Updated Feb 19, 2025
A data-centric NER annotation tool for your named entity recognition projects
- Updated Apr 10, 2024
Vue form with Laravel-inspired validation and a simple, enjoyable error messages API (Form API, Validator API, Rules API, Error Messages API).
- Updated Mar 5, 2022 - JavaScript
Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
- Updated Oct 18, 2021 - Jupyter Notebook
An observer is a wrapper over JSON data that provides an interface for detecting when the data changes, with a focus on performance and memory efficiency.
- Updated Aug 27, 2021 - JavaScript
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
- Updated Mar 20, 2023 - Jupyter Notebook
Jaehyung Kim et al.'s ACL 2023 paper, "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"
- Updated Jun 28, 2023 - Python
From local functions to cloud deployed pipelines
- Updated Mar 18, 2023 - Python
The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈
- Updated Feb 20, 2026 - Python
Open-source Data Backend written in Java and based on PostgreSQL & GraphQL.
- Updated Jul 30, 2018 - JavaScript