- Notifications
You must be signed in to change notification settings - Fork58
Dingo: A Comprehensive AI Data Quality Evaluation Tool
License
MigoXLab/dingo
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
If you like Dingo, please give us a ⭐ on GitHub!
Dingo is a data quality evaluation tool that helps you automatically detect data quality issues in your datasets. Dingo provides a variety of built-in rules and model evaluation methods, and also supports custom evaluation methods. Dingo supports commonly used text datasets and multimodal datasets, including pre-training datasets, fine-tuning datasets, and evaluation datasets. In addition, Dingo supports multiple usage methods, including local CLI and SDK, making it easy to integrate into various evaluation platforms, such asOpenCompass.
pip install dingo-python
fromdingo.config.input_argsimportEvaluatorLLMArgsfromdingo.io.inputimportDatafromdingo.model.llm.llm_text_quality_model_baseimportLLMTextQualityModelBasefromdingo.model.rule.rule_commonimportRuleEnterAndSpacedata=Data(data_id='123',prompt="hello, introduce the world",content="Hello! The world is a vast and diverse place, full of wonders, cultures, and incredible natural beauty.")defllm():LLMTextQualityModelBase.dynamic_config=EvaluatorLLMArgs(key='YOUR_API_KEY',api_url='https://api.openai.com/v1/chat/completions',model='gpt-4o', )res=LLMTextQualityModelBase.eval(data)print(res)defrule():res=RuleEnterAndSpace().eval(data)print(res)
fromdingo.configimportInputArgsfromdingo.execimportExecutor# Evaluate a dataset from Hugging Faceinput_data= {"input_path":"tatsu-lab/alpaca",# Dataset from Hugging Face"dataset": {"source":"hugging_face","format":"plaintext"# Format: plaintext },"executor": {"eval_group":"sft",# Rule set for SFT data"result_save": {"bad":True# Save evaluation results } }}input_args=InputArgs(**input_data)executor=Executor.exec_map["local"](input_args)result=executor.execute()print(result)
python -m dingo.run.cli --input test/env/local_plaintext.json
python -m dingo.run.cli --input test/env/local_json.json
After evaluation (withresult_save.bad=True), a frontend page will be automatically generated. To manually start the frontend:
python -m dingo.run.vsl --input output_directory
Whereoutput_directory contains the evaluation results with asummary.json file.
Try Dingo on our online demo:(Hugging Face)🤗
Try Dingo in local:
cd app_gradiopython app.pyExperience Dingo interactively with Google Colab notebook:
Dingo includes an experimental Model Context Protocol (MCP) server. For details on running the server and integrating it with clients like Cursor, please see the dedicated documentation:
To help you get started quickly with Dingo MCP, we've created a video walkthrough:
mcp_demo.mp4
This video demonstrates step-by-step how to use Dingo MCP server with Cursor.
Dingo provides comprehensive data quality assessment through both rule-based and prompt-based evaluation metrics. These metrics cover multiple quality dimensions including effectiveness, completeness, similarity, security, and more.
📊View Complete Metrics Documentation →
Our evaluation system includes:
- Pretrain Text Quality Assessment Metrics: Pre-training data quality evaluation using DataMan methodology and enhanced multi-dimensional assessment
- SFT Data Assessment Metrics: Honest, Helpful, Harmless evaluation for supervised fine-tuning data
- Classification Metrics: Topic categorization and content classification
- Multimodality Assessment Metrics: Image classification and relevance evaluation
- Rule-Based Quality Metrics: Automated quality checks using heuristic rules for effectiveness and similarity detection
- Factuality Assessment Metrics: Two-stage factuality evaluation based on GPT-5 System Card
- etc
Most metrics are backed by academic sources to ensure objectivity and scientific rigor.
To use these assessment prompts in your evaluations, specify them in your configuration:
input_data= {# Other parameters..."executor": {"prompt_list": ["QUALITY_BAD_SIMILARITY"],# Specific prompt to use },"evaluator": {"llm_config": {"LLMTextQualityPromptBase": {# LLM model to use"model":"gpt-4o","key":"YOUR_API_KEY","api_url":"https://api.openai.com/v1/chat/completions" } } }}
You can customize these prompts to focus on specific quality dimensions or to adapt to particular domain requirements. When combined with appropriate LLM models, these prompts enable comprehensive evaluation of data quality across multiple dimensions.
For detailed guidance on using Dingo's hallucination detection capabilities, including HHEM-2.1-Open local inference and LLM-based evaluation:
📖View Hallucination Detection Guide →
For comprehensive guidance on using Dingo's two-stage factuality evaluation system:
📖View Factuality Assessment Guide →
Dingo provides pre-configured rule groups for different types of datasets:
| Group | Use Case | Example Rules |
|---|---|---|
default | General text quality | RuleColonEnd,RuleContentNull,RuleDocRepeat, etc. |
sft | Fine-tuning datasets | Rules fromdefault plusRuleHallucinationHHEM for hallucination detection |
rag | RAG system evaluation | RuleHallucinationHHEM,PromptHallucination for response consistency |
hallucination | Hallucination detection | PromptHallucination with LLM-based evaluation |
pretrain | Pre-training datasets | Comprehensive set of 20+ rules includingRuleAlphaWords,RuleCapitalWords, etc. |
To use a specific rule group:
input_data= {"executor": {"eval_group":"sft",# Use "default", "sft", "rag", "hallucination", or "pretrain" }# other parameters...}
- Data Sources: Local files, Hugging Face datasets, S3 storage
- Data Types: Pre-training, fine-tuning, and evaluation datasets
- Data Modalities: Text and image
- Built-in Rules: 20+ general heuristic evaluation rules
- LLM Integration: OpenAI, Kimi, and local models (e.g., Llama3)
- Hallucination Detection: HHEM-2.1-Open local model and GPT-based evaluation
- RAG System Evaluation: Response consistency and context alignment assessment
- Custom Rules: Easily extend with your own rules and models
- Security Evaluation: Perspective API integration
- Interfaces: CLI and SDK options
- Integration: Easy integration with other platforms
- Execution Engines: Local and Spark
- Quality Metrics: 7-dimensional quality assessment
- Traceability: Detailed reports for anomaly tracking
If the built-in rules don't meet your requirements, you can create custom ones:
fromdingo.modelimportModelfromdingo.model.rule.baseimportBaseRulefromdingo.config.input_argsimportEvaluatorRuleArgsfromdingo.ioimportDatafromdingo.model.modelresimportModelRes@Model.rule_register('QUALITY_BAD_RELEVANCE', ['default'])classMyCustomRule(BaseRule):"""Check for custom pattern in text"""dynamic_config=EvaluatorRuleArgs(pattern=r'your_pattern_here')@classmethoddefeval(cls,input_data:Data)->ModelRes:res=ModelRes()# Your rule implementation herereturnres
fromdingo.modelimportModelfromdingo.model.llm.base_openaiimportBaseOpenAI@Model.llm_register('my_custom_model')classMyCustomModel(BaseOpenAI):# Custom implementation herepass
See more examples in:
fromdingo.configimportInputArgsfromdingo.execimportExecutorinput_args=InputArgs(**input_data)executor=Executor.exec_map["local"](input_args)result=executor.execute()# Get resultssummary=executor.get_summary()# Overall evaluation summarybad_data=executor.get_bad_info_list()# List of problematic datagood_data=executor.get_good_info_list()# List of high-quality data
fromdingo.configimportInputArgsfromdingo.execimportExecutorfrompyspark.sqlimportSparkSession# Initialize Sparkspark=SparkSession.builder.appName("Dingo").getOrCreate()spark_rdd=spark.sparkContext.parallelize([...])# Your data as Data objectsinput_data= {"executor": {"eval_group":"default","result_save": {"bad":True} }}input_args=InputArgs(**input_data)executor=Executor.exec_map["spark"](input_args,spark_session=spark,spark_rdd=spark_rdd)result=executor.execute()
After evaluation, Dingo generates:
- Summary Report (
summary.json): Overall metrics and scores - Detailed Reports: Specific issues for each rule violation
Report Description:
- score:
num_good/total - type_ratio: The count of type / total, such as:
QUALITY_BAD_COMPLETENESS/total - name_ratio: The count of name / total, such as:
QUALITY_BAD_COMPLETENESS-RuleColonEnd/total
Example summary:
{"task_id":"d6c922ec-981c-11ef-b723-7c10c9512fac","task_name":"dingo","eval_group":"default","input_path":"test/data/test_local_jsonl.jsonl","output_path":"outputs/d6c921ac-981c-11ef-b723-7c10c9512fac","create_time":"20241101_144510","score":50.0,"num_good":1,"num_bad":1,"total":2,"type_ratio": {"QUALITY_BAD_COMPLETENESS":0.5,"QUALITY_BAD_RELEVANCE":0.5 },"name_ratio": {"QUALITY_BAD_COMPLETENESS-RuleColonEnd":0.5,"QUALITY_BAD_RELEVANCE-RuleSpecialCharacter":0.5 }}- Richer graphic and text evaluation indicators
- Audio and video data modality evaluation
- Small model evaluation (fasttext, Qurating)
- Data diversity evaluation
The current built-in detection rules and model methods focus on common data quality problems. For specialized evaluation needs, we recommend customizing detection rules.
We appreciate all the contributors for their efforts to improve and enhanceDingo. Please refer to theContribution Guide for guidance on contributing to the project.
This project uses theApache 2.0 Open Source License.
This project uses fasttext for some functionality including language detection. fasttext is licensed under the MIT License, which is compatible with our Apache 2.0 license and provides flexibility for various usage scenarios.
If you find this project useful, please consider citing our tool:
@misc{dingo, title={Dingo: A Comprehensive AI Data Quality Evaluation Tool for Large Models}, author={Dingo Contributors}, howpublished={\url{https://github.com/MigoXLab/dingo}}, year={2024}}About
Dingo: A Comprehensive AI Data Quality Evaluation Tool
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.




