Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

Hello-SimpleAI/chatgpt-comparison-detection

Official repository of the paper "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Please star, watch, and fork our repo to follow active updates!

See also: 📢 Feedback Space for Detectors. Please feel free to leave your feedback there!


Human ChatGPT Comparison Corpus (HC3)

Yes, we propose the first Human vs. ChatGPT comparison corpus, named HC3.

The first version of the HC3 dataset is now available on 🤗 Hugging Face Datasets.

For the Chinese community, the HC3 dataset is also available on ModelScope.

For the train/test splits and filtered versions used in the paper, refer to the Google Drive links in HC3/README.md.
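As a rough illustration of how a comparison corpus like this can feed a detector, the sketch below flattens question-level records into labeled (question, answer, label) samples. The field names `question`, `human_answers`, and `chatgpt_answers`, and the 0/1 label convention, are assumptions made for this example; check the dataset card for the actual schema.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record layout (not confirmed by this README): one question with a
# list of human-written answers and a list of ChatGPT-written answers.
Record = Dict[str, object]

HUMAN, CHATGPT = 0, 1  # illustrative label convention


def to_labeled_pairs(records: Iterable[Record]) -> List[Tuple[str, str, int]]:
    """Flatten records into (question, answer, label) triples."""
    samples: List[Tuple[str, str, int]] = []
    for rec in records:
        question = str(rec["question"])
        # Every human answer becomes one negative (human) sample.
        for answer in rec.get("human_answers", []):
            samples.append((question, str(answer), HUMAN))
        # Every ChatGPT answer becomes one positive (ChatGPT) sample.
        for answer in rec.get("chatgpt_answers", []):
            samples.append((question, str(answer), CHATGPT))
    return samples


if __name__ == "__main__":
    demo = [
        {
            "question": "What is HC3?",
            "human_answers": ["A human vs. ChatGPT comparison corpus."],
            "chatgpt_answers": ["HC3 is a dataset of paired answers."],
        }
    ]
    for row in to_labeled_pairs(demo):
        print(row)
```

Dropping the question from each triple gives samples suited to a single-text detector, while keeping it gives samples suited to a question-answer detector.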

Dataset Copyright

If a source dataset used in this corpus has a specific license that is stricter than CC-BY-SA, our products follow the same license. Otherwise, they follow the CC-BY-SA license.

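| English Split | Source | Source License | Note |
| --- | --- | --- | --- |
| reddit_eli5 | ELI5 | BSD License | |
| open_qa | WikiQA | PWC Custom | |
| wiki_csai | Wikipedia | CC-BY-SA | |
| medicine | Medical Dialog | Unknown | Asking |
| finance | FiQA | Unknown | Asking by 📧 |

| Chinese Split | Source | Source License | Note |
| --- | --- | --- | --- |
| open_qa | WebTextQA & BaikeQA | MIT license | |
| baike | Baidu Baike | None | |
| nlpcc_dbqa | NLPCC-DBQA | Unknown | Asking |
| medicine | Chinese Medical Dialogue | CC-BY-NC 4.0 | |
| finance | FinanceZhidao | CC-BY 4.0 | |
| psychology | On Baidu AI Studio | CC0 | |
| law | LegalQA | Unknown | Asking |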

ChatGPT detectors

(Hosted on 🤗 Hugging Face Spaces)

We provide three kinds of detectors, all bilingual (English and Chinese):

  • QA version: detects whether an answer to a given question is generated by ChatGPT, using classifiers based on pre-trained language models (PLMs);
  • Single-text version: detects whether a piece of text is generated by ChatGPT, using PLM-based classifiers;
  • Linguistic version: detects whether a piece of text is generated by ChatGPT, using a model built on linguistic features.

All three detector versions are also available on the ModelScope Chinese community platform.

The model weights are all available at 🤗 Hugging Face Models:

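| Model Checkpoints | Comment |
| --- | --- |
| chatgpt-detector-roberta | To detect a single piece of text |
| chatgpt-qa-detector-roberta | To detect a question-answer pair |
| chatgpt-detector-roberta-chinese | To detect a single piece of text, Chinese version |
| chatgpt-qa-detector-roberta-chinese | To detect a question-answer pair, Chinese version |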

The English models are based on roberta-base. The Chinese models are based on hfl/chinese-roberta-wwm-ext.
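The difference between the single-text and question-answer checkpoints can be sketched with the `transformers` text-classification pipeline. This is a minimal sketch under stated assumptions, not the authors' reference code: it assumes the checkpoints are published under the `Hello-SimpleAI` organization on Hugging Face, and that the QA detector takes the question as the first segment and the answer as the second. The helper names `build_input`, `choose_model`, and `detect` are hypothetical.

```python
from typing import Dict, Optional, Union

# Assumption: the checkpoint names from the table above live under the
# "Hello-SimpleAI" organization on Hugging Face.
SINGLE_TEXT_MODEL = "Hello-SimpleAI/chatgpt-detector-roberta"
QA_MODEL = "Hello-SimpleAI/chatgpt-qa-detector-roberta"


def build_input(answer: str, question: Optional[str] = None) -> Union[str, Dict[str, str]]:
    """Return the pipeline input: plain text, or a question/answer pair."""
    if question is None:
        return answer
    # Assumed segment order: question first, answer second.
    return {"text": question, "text_pair": answer}


def choose_model(question: Optional[str] = None) -> str:
    """Pick the single-text checkpoint unless a question is supplied."""
    return SINGLE_TEXT_MODEL if question is None else QA_MODEL


def detect(answer: str, question: Optional[str] = None):
    """Classify one text (or question-answer pair); downloads the model."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=choose_model(question))
    # truncation=True keeps long inputs within the model's maximum length.
    return classifier(build_input(answer, question), truncation=True)
```

Calling `detect("some text")` would use the single-text checkpoint, and `detect("some answer", question="some question")` would use the QA checkpoint. Both calls download model weights on first use, and the label names in the result depend on each checkpoint's configuration.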


Important Dates:

| Events | Dates |
| --- | --- |
| Project Launch | 2022-12-09 ✅ |
| Comparison Data Collection | 2022-12-11 to now 🏎️ |
| Release ChatGPT Detector (Demo) | 2023-01-11 ✅ |
| Models Release | 2023-01-18 ✅ |
| Comparison Corpus Release | 2023-01-18 ✅ |
| Research Paper | 2023-01-19 ✅ |
| ... | ... |

Citation

Check out this paper: arXiv:2301.07597

@article{guo-etal-2023-hc3,
    title = "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection",
    author = "Guo, Biyang  and
      Zhang, Xin  and
      Wang, Ziyuan  and
      Jiang, Minqi  and
      Nie, Jinran  and
      Ding, Yuxuan  and
      Yue, Jianwei  and
      Wu, Yupeng",
    journal = {arXiv preprint arxiv:2301.07597},
    year = "2023",
}

Our Story...

On December 9, 2022, the 10th day since the launch of ChatGPT, we started this project with two goals:

  1. To create open-source models for efficiently detecting ChatGPT-generated content;
  2. To collect a valuable human-ChatGPT comparison Q&A corpus, in both English and Chinese, to facilitate related research.

You are welcome to follow our project! We have released a preview of our ChatGPT detectors, and the models and dataset will be open-sourced in about a week. We look forward to receiving feedback from the community to help improve the models and to contribute to open academic research together :)

About Us

We are a group of insignificant researchers (in the shadow of ChatGPT) hoping to do some significant work for the community. The team for this project consists of PhD students and engineers from 6 universities/companies.

Biyang Guo, Minqi Jiang, Ziyuan Wang, Xin Zhang
Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
