data-selection
Here are 45 public repositories matching this topic...
Official Repository of "LLM × DATA" Survey Paper
- Updated Nov 2, 2025
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
- Updated Oct 20, 2024 (Jupyter Notebook)
DSIR large-scale data selection framework for language model training
- Updated Apr 7, 2024 (Python)
A Survey on Data Selection for Language Models
- Updated Apr 29, 2025
⛔ [DEPRECATED] Adapt Transformer-based language models to new text domains
- Updated Feb 21, 2024 (Jupyter Notebook)
🔥[VLDB'26] Official repository for the paper "LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning".
- Updated Jun 3, 2025 (Python)
Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning".
- Updated Aug 4, 2025 (Python)
InstructionGPT-4
- Updated Dec 29, 2023 (Python)
[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
- Updated Aug 6, 2025 (Python)
DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.
- Updated Dec 17, 2025 (Python)
[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
- Updated Aug 30, 2025 (Python)
[ACL 2023] Code for the ACL'23 paper "Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach"
- Updated Jun 1, 2024 (Python)
Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific and task-specific training data to improve LLM finetuning and instruction tuning.
- Updated Dec 25, 2024 (Python)
This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).
- Updated Oct 26, 2023 (Python)
Enhancing Efficiency in Multidevice Federated Learning through Data Selection
- Updated Apr 15, 2024 (Python)
Enhanced spatio-temporal electric load forecasts with less data using active deep learning
- Updated Feb 7, 2023 (Jupyter Notebook)
Keras sentence classification
- Updated Apr 27, 2018 (Python)
Dynamic Transfer Learning for Low-Resource Neural Machine Translation
- Updated Aug 4, 2020 (Python)
Repository for the experiments in my paper accepted to the CLIN Journal: "Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts"
- Updated Nov 28, 2025 (Python)
Code for NeurIPS 2023 Paper (Imitation Learning from Imperfection: Theoretical Justifications and Algorithms)
- Updated Sep 22, 2023 (Python)