J-Moshi: A Japanese Full-duplex Spoken Dialogue System

📑Paper | 🤗Model | 🖥️Demo | 🔧Training Code

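J-Moshi is a full-duplex spoken dialogue system for Japanese. It is built on Moshi, a 7B-parameter full-duplex spoken dialogue model for English, by further training it on Japanese spoken dialogue data. It produces natural turn-taking in real time, including overlapping speech and backchannels, much like conversation between humans. For details, see our paper.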

This repository provides the trained J-Moshi models and explains how to talk with them. Samples of speech generated by J-Moshi and the codebase used to train J-Moshi are also publicly available.

Note

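J-Moshi is a prototype, and its responses may sometimes be unnatural. In addition, most of J-Moshi's training data consists of casual chit-chat, so it cannot generate responses that follow user instructions.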

Models

The following two J-Moshi models are available:

nu-dialogue/j-moshi
- Based on kyutai/moshiko-pytorch-bf16 and trained on large-scale Japanese spoken dialogue data.
nu-dialogue/j-moshi-ext
- Based on kyutai/moshiko-pytorch-bf16 and trained on large-scale Japanese spoken dialogue data plus augmented data synthesized with multi-stream TTS.

Each repository contains the following three model files:

model.safetensors
- Weights of J-Moshi itself.
tokenizer_spm_32k_3.model
- Text tokenizer: the Japanese SentencePiece model from rinna/japanese-gpt2-medium.
tokenizer-e351c8d8-checkpoint125.safetensors
- Audio tokenizer: the Mimi model from kyutai/moshiko-pytorch-bf16.
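The three files listed above can also be downloaded individually, for example to populate the local Hugging Face cache ahead of time. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. `J_MOSHI_REPOS`, `MODEL_FILES`, and `download_j_moshi` are hypothetical helpers written for illustration and are not part of J-Moshi or moshi. Only the repository IDs and file names come from this README.

```python
"""Hypothetical helper: download the three J-Moshi model files from the Hub."""
from pathlib import Path
from typing import Dict

# Repository IDs and file names as listed in this README.
J_MOSHI_REPOS = ("nu-dialogue/j-moshi", "nu-dialogue/j-moshi-ext")
MODEL_FILES = (
    "model.safetensors",                             # J-Moshi weights
    "tokenizer_spm_32k_3.model",                     # text tokenizer (SentencePiece)
    "tokenizer-e351c8d8-checkpoint125.safetensors",  # audio tokenizer (Mimi)
)


def download_j_moshi(repo_id: str) -> Dict[str, Path]:
    """Download every model file of one J-Moshi repo into the local HF cache.

    Returns a mapping from file name to its local path.
    """
    if repo_id not in J_MOSHI_REPOS:
        raise ValueError(f"unknown J-Moshi repository: {repo_id!r}")
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return {
        name: Path(hf_hub_download(repo_id=repo_id, filename=name))
        for name in MODEL_FILES
    }


if __name__ == "__main__":
    for name, path in download_j_moshi("nu-dialogue/j-moshi-ext").items():
        print(f"{name}: {path}")
```

An unknown repository ID raises `ValueError` before any network access happens.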

Interactive Demo

You can talk with J-Moshi using Kyutai's official PyTorch implementation of Moshi. For implementation details, see the original Moshi repository, kyutai-labs/moshi.

Installation

Python 3.10 or later is required.

pip install "moshi<=0.2.2"
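A quick pre-flight check can catch the two version requirements stated here (Python 3.10 or later, and moshi 0.2.2 or earlier) before you launch the demo. This is a hypothetical standard-library sketch. `check_environment` and `parse_release` are illustrative names, and the version parsing is deliberately cruder than the `packaging` library: it ignores suffixes such as `rc1` or `.post1`.

```python
"""Hypothetical pre-flight check for the installation requirements above."""
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 10)   # "Python 3.10 or later"
MAX_MOSHI = (0, 2, 2)  # "moshi<=0.2.2"


def parse_release(ver: str) -> Tuple[int, ...]:
    """Turn '0.2.2' or '0.2.2rc1' into (0, 2, 2).

    Only the leading numeric release segment is kept; pre-, post-, and
    dev-release suffixes are ignored (cruder than packaging.version).
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"cannot parse version: {ver!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_environment(
    python_version: Optional[Tuple[int, int]] = None,
    moshi_version: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed.

    Arguments default to the running interpreter and the installed moshi.
    """
    problems = []
    py = tuple(python_version or sys.version_info[:2])
    if py < MIN_PYTHON:
        problems.append(f"Python {py[0]}.{py[1]} found; 3.10 or later is required")
    if moshi_version is None:
        try:
            moshi_version = version("moshi")
        except PackageNotFoundError:
            problems.append("moshi is not installed")
            return problems
    if parse_release(moshi_version) > MAX_MOSHI:
        problems.append(f"moshi {moshi_version} found; moshi<=0.2.2 is expected")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```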

Usage

Running moshi.server launches a web UI for the dialogue. Use the --hf-repo option to specify a J-Moshi 🤗Hugging Face Hub repository (nu-dialogue/j-moshi or nu-dialogue/j-moshi-ext).

python -m moshi.server --hf-repo nu-dialogue/j-moshi-ext
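If you prefer to launch the web UI from a Python script, you can wrap the command above with `subprocess`. This is a minimal sketch. `build_server_command` is a hypothetical helper, and the only server option it passes is `--hf-repo`, as documented above.

```python
"""Hypothetical launcher wrapping `python -m moshi.server --hf-repo ...`."""
import subprocess
import sys
from typing import List

J_MOSHI_REPOS = ("nu-dialogue/j-moshi", "nu-dialogue/j-moshi-ext")


def build_server_command(repo_id: str = "nu-dialogue/j-moshi-ext") -> List[str]:
    """Return the argv for `python -m moshi.server --hf-repo <repo_id>`."""
    if repo_id not in J_MOSHI_REPOS:
        raise ValueError(f"expected one of {J_MOSHI_REPOS}, got {repo_id!r}")
    # sys.executable runs the server with the same interpreter (and thus the
    # same environment) in which this script, and presumably moshi, is installed.
    return [sys.executable, "-m", "moshi.server", "--hf-repo", repo_id]


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "nu-dialogue/j-moshi-ext"
    # Blocks until the server process exits; raises if it exits with an error.
    subprocess.run(build_server_command(repo), check=True)
```

Using `sys.executable` rather than a bare `python` avoids accidentally starting the server from a different environment than the one where moshi was installed.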

Tips

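Running J-Moshi requires a Linux machine with a GPU that has at least 24 GB of VRAM. macOS is not supported.
Use earphones or headphones rather than speakers during the dialogue so that the model's speech does not echo back into the microphone. You can select audio devices in the browser when you open the web UI.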
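The hardware requirements in these tips can be checked with a short script before starting the server. This is a sketch under stated assumptions. It assumes PyTorch is importable and falls back to reporting "no CUDA GPU" if it is not. It only inspects CUDA device 0. It also reads "24GB" as 24 × 10^9 bytes, because the total memory reported for cards sold as 24 GB is usually somewhat below 24 GiB. `query_vram_bytes` and `hardware_problems` are hypothetical names.

```python
"""Hypothetical check for the Linux + 24 GB VRAM requirement in the tips."""
import platform
from typing import List, Optional

# Assumption: "24GB" means 24 * 10**9 bytes (cards sold as 24 GB typically
# report a total slightly below 24 GiB).
MIN_VRAM_BYTES = 24 * 10**9


def query_vram_bytes() -> Optional[int]:
    """Total memory of CUDA device 0 via PyTorch, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


def hardware_problems(system: str, vram_bytes: Optional[int]) -> List[str]:
    """Return the ways a machine falls short of the stated requirements."""
    problems = []
    if system != "Linux":  # platform.system() reports macOS as "Darwin"
        problems.append(f"{system} detected; a Linux GPU machine is required")
    if vram_bytes is None:
        problems.append("no CUDA GPU detected")
    elif vram_bytes < MIN_VRAM_BYTES:
        gb = vram_bytes / 10**9
        problems.append(f"only {gb:.1f} GB of VRAM; 24 GB or more is required")
    return problems


if __name__ == "__main__":
    for problem in hardware_problems(platform.system(), query_vram_bytes()):
        print("WARNING:", problem)
```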

Training Details

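J-Moshi was trained on the spoken dialogue corpora listed below. J-Moshi-ext was additionally trained on augmented data created by synthesizing speech from text dialogue corpora. The corpora used are:

Spoken dialogue corpora
- J-CHAT
- Japanese Callhome
- CSJ
- Travel agency dialogue corpus
- Chit-chat dialogue corpus (in-house)
- Consultation dialogue corpus (in-house)
Text dialogue corpora

Training was performed on 128 NVIDIA V100 32GB GPUs.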

Terms of Use

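J-Moshi is released under CC BY-NC 4.0 and is intended for research use. The model is not intended for any malicious purpose, such as impersonation or fraud. Its outputs may also contain biased, inaccurate, or offensive content originating from the training data. We accept no responsibility for any damage arising from its use.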

Acknowledgments

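This work was supported by the JST Moonshot R&D Program, Grant Number JPMJMS2011. The chit-chat dialogue corpus and the consultation dialogue corpus were built in joint research with AISIN Corporation. This research also used the supercomputer "Flow" (不老) at Nagoya University. Finally, we thank Kyutai Labs for releasing the Moshi technical paper and models.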

Citation

@inproceedings{ohashi2025jmoshi,
  title={Towards a Japanese Full-duplex Spoken Dialogue System},
  author={Ohashi, Atsumoto and Iizuka, Shinya and Jiang, Jingjing and Higashinaka, Ryuichiro},
  booktitle={Proceedings of the 26th Interspeech Conference},
  year={2025},
}

@inproceedings{ohashi2025jmoshi,
  title = "日本語 {F}ull-duplex 音声対話システムの試作",
  author = "大橋 厚元 and 飯塚 慎也 and 姜 菁菁 and 東中 竜一郎",
  booktitle = "言語処理学会 第31回年次大会 発表論文集",
  pages = "3164--3169",
  year = "2025",
  url = "https://www.anlp.jp/proceedings/annual_meeting/2025/pdf_dir/D8-6.pdf"
}

About

nu-dialogue.github.io/j-moshi
