Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
Hello-SimpleAI/chatgpt-comparison-detection
Official repository of the paper "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Please star, watch, and fork our repo for active updates!
See also → (📢 Feedback Space for Detectors: please feel free to leave your feedback here!)
Yes, we propose the first Human vs. ChatGPT comparison corpus, named HC3.
The first version of the HC3 dataset is now available on 🤗 Hugging Face Datasets:
For the Chinese community, the HC3 dataset is also available on ModelScope:
For the train/test splits and filtered versions used in the paper, refer to the Google Drive links in HC3/README.md.
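As a rough illustration of how the corpus might be consumed, the sketch below flattens one comparison record into labeled (question, answer, label) triples. The field names `question`, `human_answers`, and `chatgpt_answers`, the Hub id `Hello-SimpleAI/HC3`, the config name `all`, and the label ids are assumptions made for this example and should be checked against the dataset card. The sample record is hand-written and is not real HC3 data.

```python
from typing import Dict, Iterator, Tuple

# Label ids chosen for this sketch only; the released data may use others.
HUMAN, CHATGPT = 0, 1


def flatten_record(record: Dict) -> Iterator[Tuple[str, str, int]]:
    """Yield (question, answer, label) triples from one comparison record.

    Assumes the record has a "question" string plus "human_answers" and
    "chatgpt_answers" lists (assumed field names).
    """
    question = record["question"]
    for answer in record.get("human_answers", []):
        yield question, answer, HUMAN
    for answer in record.get("chatgpt_answers", []):
        yield question, answer, CHATGPT


def load_hc3(config: str = "all"):
    """Download the corpus from the Hub (requires `pip install datasets`).

    The repository id and config name are assumptions; depending on the
    installed `datasets` version, extra arguments may be needed.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset("Hello-SimpleAI/HC3", config)


# Hand-written record for illustration, not taken from HC3.
example = {
    "question": "Why is the sky blue?",
    "human_answers": ["Air scatters blue light more.", "Rayleigh scattering."],
    "chatgpt_answers": ["The sky appears blue because of scattering."],
}
pairs = list(flatten_record(example))
```

Keeping the question next to each answer lets the same triples feed either a single-text classifier (answer only) or a question-answer classifier.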
If a source dataset used in this corpus has a specific license that is stricter than CC-BY-SA, our products follow that license. Otherwise, they follow the CC-BY-SA license.
English Split | Source | Source License | Note |
---|---|---|---|
reddit_eli5 | ELI5 | BSD License | |
open_qa | WikiQA | PWC Custom | |
wiki_csai | Wikipedia | CC-BY-SA | |
medicine | Medical Dialog | Unknown | Asking |
finance | FiQA | Unknown | Asking by 📧 |
Chinese Split | Source | Source License | Note |
---|---|---|---|
open_qa | WebTextQA & BaikeQA | MIT license | |
baike | Baidu Baike | None | |
nlpcc_dbqa | NLPCC-DBQA | Unknown | Asking |
medicine | Chinese Medical Dialogue | CC-BY-NC 4.0 | |
finance | FinanceZhidao | CC-BY 4.0 | |
psychology | On Baidu AI Studio | CC0 | |
law | LegalQA | Unknown | Asking |
(Hosted on 🤗 Hugging Face Spaces)
We provide three kinds of detectors, all bilingual (English and Chinese):
- QA version: detects whether an answer to a given question was generated by ChatGPT, using PLM-based classifiers;
- Single-text version: detects whether a piece of text was generated by ChatGPT, using PLM-based classifiers;
- Linguistic version: detects whether a piece of text was generated by ChatGPT, using linguistic features.
All three detector versions are also available on the ModelScope Chinese community platform:
The model weights are all available at 🤗 Hugging Face Models:
Model Checkpoints | Comment |
---|---|
chatgpt-detector-roberta | To detect a single piece of text |
chatgpt-qa-detector-roberta | To detect a question-answer pair |
chatgpt-detector-roberta-chinese | To detect a single piece of text (Chinese version) |
chatgpt-qa-detector-roberta-chinese | To detect a question-answer pair (Chinese version) |
The English models are based on roberta-base. The Chinese models are based on hfl/chinese-roberta-wwm-ext.
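The checkpoints above could be called roughly as follows with the `transformers` text-classification pipeline. This is a sketch under stated assumptions, not the authors' inference code: the Hub ids are the checkpoint names from the table prefixed with the `Hello-SimpleAI` organization, and passing the question as `text` with the answer as `text_pair` is a guess at the input order the QA models expect. The helper names are hypothetical.

```python
from typing import Dict, Optional, Union

# Assumed Hub ids: checkpoint names from the table, prefixed with the org name.
CHECKPOINTS = {
    ("en", "single"): "Hello-SimpleAI/chatgpt-detector-roberta",
    ("en", "qa"): "Hello-SimpleAI/chatgpt-qa-detector-roberta",
    ("zh", "single"): "Hello-SimpleAI/chatgpt-detector-roberta-chinese",
    ("zh", "qa"): "Hello-SimpleAI/chatgpt-qa-detector-roberta-chinese",
}


def pick_checkpoint(lang: str, question: Optional[str] = None) -> str:
    """Choose the QA checkpoint when a question is given, else single-text."""
    mode = "qa" if question is not None else "single"
    return CHECKPOINTS[(lang, mode)]


def build_input(answer: str, question: Optional[str] = None) -> Union[str, Dict[str, str]]:
    """Format the pipeline input: a plain string, or a text/text_pair dict.

    The question-first ordering is an assumption of this sketch.
    """
    if question is None:
        return answer
    return {"text": question, "text_pair": answer}


def detect(answer: str, question: Optional[str] = None, lang: str = "en"):
    """Run a detector (requires `pip install transformers torch`).

    Downloads the model weights on first use.
    """
    from transformers import pipeline  # lazy import: heavy optional dependency

    classifier = pipeline("text-classification", model=pick_checkpoint(lang, question))
    # truncation=True keeps long inputs within the model's maximum length.
    return classifier(build_input(answer, question), truncation=True)
```

For example, `detect("Some answer text")` would use the single-text English checkpoint, while `detect("Some answer text", question="Some question?")` would switch to the QA checkpoint.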
Events | Dates |
---|---|
Project Launch | 2022-12-09 ✅ |
Comparison Data Collection | 2022-12-11 to Now 🏎️ |
Release ChatGPT Detector (Demo) | 2023-01-11 ✅ |
Models Release | 2023-01-18 ✅ |
Comparison Corpus Release | 2023-01-18 ✅ |
Research Paper | 2023-01-19 ✅ |
... | ... |
Check out this paper: arXiv:2301.07597
@article{guo-etal-2023-hc3,
  title = "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection",
  author = "Guo, Biyang and Zhang, Xin and Wang, Ziyuan and Jiang, Minqi and Nie, Jinran and Ding, Yuxuan and Yue, Jianwei and Wu, Yupeng",
  journal = {arXiv preprint arxiv:2301.07597},
  year = "2023",
}
On December 9, 2022, 10 days after the launch of ChatGPT, we started this project for two purposes:
- To create some open-source models for efficiently detecting ChatGPT-generated content;
- To collect a valuable bilingual (English and Chinese) human-ChatGPT comparison Q&A corpus to facilitate related research.
You are welcome to follow our project! We have released a preview of our ChatGPT detectors, and the models and dataset will be open-sourced in about a week. We look forward to receiving feedback from the community to help improve the models and contribute to open academic research together :)
We are a group of insignificant researchers (in the shadow of ChatGPT) hoping to do some significant work for the community. The team for this project consists of PhD students and engineers from 6 universities/companies.
Biyang Guo | Minqi Jiang | Ziyuan Wang | Xin Zhang |
Jinran Nie | Yuxuan Ding | Jianwei Yue | Yupeng Wu |