data-centric
Here are 39 public repositories matching this topic...
Low-code framework for building custom LLMs, neural networks, and other AI models
- Updated
Mar 3, 2025 - Python
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
- Updated
Mar 20, 2025 - Rust
A curated, but incomplete, list of data-centric AI resources.
- Updated
Jun 26, 2024
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
- Updated
Dec 9, 2024 - Python
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
- Updated
Feb 6, 2025 - Python
DataCLUE: A data-centric NLP benchmark and toolkit
- Updated
May 11, 2022 - Python
Rust implementation of the Data Distribution Service (DDS)
- Updated
Mar 14, 2025 - Rust
Simulator framework for analysis of performance, energy consumption, area and cost of multi-node multi-chiplet tile-based manycore designs
- Updated
Jun 30, 2024 - C++
[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"
- Updated
Jun 23, 2023 - Python
🔥🔥🔥 KDD2024 Best Student Paper
- Updated
Feb 21, 2025 - Python
A data-centric NER annotation tool for your named entity recognition projects
- Updated
Apr 10, 2024
Vue form with Laravel-inspired validation and a simply enjoyable error messages API. (Form API, Validator API, Rules API, Error Messages API)
- Updated
Mar 5, 2022 - JavaScript
A list of data-efficient and data-centric LLM (Large Language Model) papers. Our survey paper: "Towards Efficient LLM Post Training: A Data-centric Perspective"
- Updated
Feb 19, 2025
An observer is a wrapper over JSON data that provides an interface for detecting when the data changes, with a focus on performance and memory efficiency.
- Updated
Aug 27, 2021 - JavaScript
Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
- Updated
Oct 18, 2021 - Jupyter Notebook
From local functions to cloud deployed pipelines
- Updated
Mar 18, 2023 - Python
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
- Updated
Mar 20, 2023 - Jupyter Notebook
Jaehyung Kim et al.'s ACL 2023 paper, "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"
- Updated
Jun 28, 2023 - Python
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
- Updated
Mar 8, 2023 - Jupyter Notebook
The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈
- Updated
Mar 19, 2025 - Python