# large-scale-data-processing
Here are 2 public repositories matching this topic...
Scalable data preprocessing and curation toolkit for LLMs
Topics: python, data, data-processing, data-preparation, deduplication, data-quality, data-curation, data-prep, fine-tuning, fast-data-processing, data-processing-pipelines, datacuration, large-language-models, llm, llmapps, large-scale-data-processing, datarecipes, semantic-deduplication, llm-data-quality
Updated Jul 18, 2025 · Python
Open-source project for data preparation for GenAI applications
Topics: python, data, spark, malware, code-quality, data-preprocessing, ray, data-preparation, deduplication, data-prep, finetuning, data-preprocessing-pipelines, datacuration, large-language-models, llm, llmapps, large-scale-data-processing, datarecipes
Updated Jul 17, 2025 · HTML