data-synthesis
Here are 60 public repositories matching this topic...
List of useful data augmentation resources. You will find here some less common techniques, libraries, links to GitHub repos, papers, and more.
- Updated Aug 14, 2024
Easy Data Preparation with the latest LLM-based Operators and Pipelines.
- Updated Sep 30, 2025 - Python
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
- Updated Mar 9, 2025
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.
- Updated Mar 26, 2025 - Python
[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis
- Updated Sep 24, 2024 - Python
Computer vision utils for Blender (generate instance annotation, depth, and 6D pose with one line of code)
- Updated Aug 5, 2025 - Python
Official Repository of "LLM × DATA" Survey Paper
- Updated Oct 4, 2025
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
- Updated Sep 30, 2025 - Python
[CVPR 2023] Label-Free Liver Tumor Segmentation
- Updated Sep 12, 2025 - Python
[CVPR 2024] Generalizable Tumor Synthesis - Realistic Synthetic Tumors in Liver, Pancreas, and Kidney
- Updated Aug 9, 2025 - Python
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
- Updated Aug 31, 2025 - Jupyter Notebook
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
- Updated Sep 17, 2025 - Python
[ICLR 2025] Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
- Updated Apr 9, 2025 - C++
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
- Updated Aug 25, 2025 - Python
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
- Updated Jan 24, 2025 - Python
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
- Updated Jun 21, 2023 - Jupyter Notebook
Official Code for “EarthSynth: Generating Informative Earth Observation with Diffusion Models”
- Updated Sep 25, 2025 - Python
Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
- Updated May 29, 2024 - C++
Source code for LDPTrace: Locally Differentially Private Trajectory Synthesis. VLDB 2023.
- Updated Nov 13, 2023 - Python
[Preprint] Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis
- Updated Sep 21, 2025 - Python