# data-selection

Here are 45 public repositories matching this topic...

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

  • Updated Oct 20, 2024
  • Jupyter Notebook

DSIR large-scale data selection framework for language model training

  • Updated Apr 7, 2024
  • Python

A Survey on Data Selection for Language Models

  • Updated Apr 29, 2025

⛔ [DEPRECATED] Adapt Transformer-based language models to new text domains

  • Updated Feb 21, 2024
  • Jupyter Notebook

🔥[VLDB'26] Official repository for the paper "LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning".

  • Updated Jun 3, 2025
  • Python

Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning".

  • Updated Aug 4, 2025
  • Python

[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models

  • Updated Aug 6, 2025
  • Python

DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

  • Updated Dec 17, 2025
  • Python

[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space

  • Updated Aug 30, 2025
  • Python

[ACL 2023] Code for our ACL'23 paper "Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach"

  • Updated Jun 1, 2024
  • Python

Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific and task-specific training data to improve LLM finetuning and instruction tuning.

  • Updated Dec 25, 2024
  • Python

This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).

  • Updated Oct 26, 2023
  • Python

Enhanced spatio-temporal electric load forecasts with less data using active deep learning

  • Updated Feb 7, 2023
  • Jupyter Notebook

Dynamic Transfer Learning for Low-Resource Neural Machine Translation

  • Updated Aug 4, 2020
  • Python

Repository for the experiments in my paper accepted to the CLIN Journal: "Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts"

  • Updated Nov 28, 2025
  • Python

Code for the NeurIPS 2023 paper "Imitation Learning from Imperfection: Theoretical Justifications and Algorithms"

  • Updated Sep 22, 2023
  • Python
