# KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors
KokoroChat is the largest human-collected Japanese psychological counseling dialogue dataset to date (as of June 2025). It was created through role-playing between trained counselors and includes rich, long-form dialogues and detailed client feedback on counseling quality. The dataset supports research on empathetic response generation, dialogue evaluation, and mental health-oriented language modeling.
This work has been accepted to the main conference of ACL 2025. 📄 View Paper (arXiv)
- 6,589 dialogues, collected between 2020 and 2024
- Avg. 91.2 utterances per dialogue
- 480 trained counselors simulating online text-based counseling sessions
- 20-dimension Likert-scale client feedback for every session
- Broad topic coverage: mental health, school, family, workplace, romantic issues, etc.
| Category | Total | Counselor | Client |
|---|---|---|---|
| # Dialogues | 6,589 | - | - |
| # Speakers | 480 | 424 | 463 |
| # Utterances | 600,939 | 306,495 | 294,444 |
| Avg. utterances/dialogue | 91.20 | 46.52 | 44.69 |
| Avg. length/utterance | 28.39 | 35.84 | 20.63 |
Each sample contains:
- A full counseling dialogue with role labels (counselor / client) and message timestamps
- Structured client feedback on 20 dimensions (0–5 Likert scale)
- Flags for ethical concern checks (optional)
- Predicted topic label (automatically annotated by GPT-4o-mini)
👉 See the `kokorochat_dialogues` folder for the complete dataset.
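As a sketch of how the per-sample fields listed above fit together, the snippet below parses one hypothetical record and computes per-role utterance counts and a mean feedback score. The field names (`messages`, `role`, `text`, `timestamp`, `feedback`, `topic`) and the two feedback dimensions shown are illustrative assumptions, not the dataset's actual schema; consult the files in `kokorochat_dialogues` for the real format.

```python
import json

# Hypothetical record mirroring the fields described above; the actual
# key names in kokorochat_dialogues may differ.
sample = json.loads("""
{
  "messages": [
    {"role": "counselor", "text": "...", "timestamp": "2023-04-01T10:00:00"},
    {"role": "client", "text": "...", "timestamp": "2023-04-01T10:01:12"}
  ],
  "feedback": {"empathy": 5, "clarity": 4},
  "topic": "workplace"
}
""")

# Count utterances per role, as in the statistics table.
counts = {}
for m in sample["messages"]:
    counts[m["role"]] = counts.get(m["role"], 0) + 1

# Aggregate the 0-5 Likert feedback dimensions into a single mean score.
mean_feedback = sum(sample["feedback"].values()) / len(sample["feedback"])
print(counts, mean_feedback)  # {'counselor': 1, 'client': 1} 4.5
```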
You can also access our full dataset and fine-tuned models via Hugging Face:
- 📁 Dataset: KokoroChat-dataset
We fine-tuned three counseling dialogue models based on Llama-3.1-Swallow-8B-Instruct-v0.3, using different subsets of the KokoroChat dataset filtered by client feedback score:
- 🔵 Llama-3.1-KokoroChat-Low: Fine-tuned on 3,870 dialogues with feedback scores < 70
- 🟢 Llama-3.1-KokoroChat-High: Fine-tuned on 2,601 dialogues with feedback scores between 70 and 98
- ⚫ Llama-3.1-KokoroChat-Full: Fine-tuned on 6,471 dialogues with feedback scores ≤ 98
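The subset definitions above imply a simple score-based filter, sketched below. The function name and the assumption that each dialogue has a single total feedback score (with scores above 98 excluded from all subsets) are illustrative, not taken from the released code.

```python
def feedback_subset(total_score: int) -> list[str]:
    """Return which fine-tuning subsets (by model suffix) a dialogue
    with the given total client-feedback score would fall into:
    Low (< 70), High (70-98), Full (<= 98)."""
    subsets = []
    if total_score < 70:
        subsets.append("Low")
    elif total_score <= 98:
        subsets.append("High")
    if total_score <= 98:
        subsets.append("Full")
    return subsets

print(feedback_subset(65))   # ['Low', 'Full']
print(feedback_subset(85))   # ['High', 'Full']
print(feedback_subset(100))  # []
```

Note that the Low and High subsets partition the Full subset (3,870 + 2,601 = 6,471 dialogues), matching the counts listed above.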
If you use this dataset, please cite the following paper:
```bibtex
@inproceedings{qi2025kokorochat,
  title     = {KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors},
  author    = {Zhiyang Qi and Takumasa Kaneko and Keiko Takamizo and Mariko Ukiyo and Michimasa Inaba},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
  year      = {2025},
  url       = {https://github.com/UEC-InabaLab/KokoroChat}
}
```
KokoroChat is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.