This is a repository for the AI LAB article "係り受けに基づく日本語単語埋込 (Dependency-based Japanese Word Embeddings)" (https://ai-lab.lapras.com/nlp/japanese-word-embedding/).
# lapras-inc/dependency-based-japanese-word-embeddings
This is a repository to share dependency-based Japanese word embeddings, which we trained for the experiments in the article 係り受けに基づく日本語単語埋込 (Dependency-based Japanese Word Embeddings).
We applied the method proposed in the paper Dependency-based Word Embeddings to Japanese.
To prepare the training data, we first extracted sentences from Japanese Wikipedia dumps.
Then, we parsed them with the NLP framework GiNZA.
Finally, we trained the embeddings with the script provided on the page of the paper's first author.
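Concretely, the dependency-based method replaces the linear-window contexts of ordinary Word2Vec with contexts derived from dependency arcs: each arc (head, relation, modifier) yields the pair (head, modifier/relation) and the inverse pair (modifier, head/relation⁻¹), and such pairs are what the training script consumes. Below is a minimal sketch of this extraction, with a hand-written parse of 「犬が走る」 ("the dog runs") standing in for GiNZA output; the paper's extra transformations (e.g. collapsing function-word arcs) are omitted here, and the exact pair format expected by the training script may differ in detail.

```python
# Each token: (surface form, head index, dependency relation);
# head index -1 marks the root. This parse is hardcoded for
# illustration; in the actual pipeline it would come from GiNZA.
sentence = [
    ("犬", 2, "nsubj"),   # 犬 depends on 走る as subject
    ("が", 0, "case"),    # が depends on 犬 as case marker
    ("走る", -1, "root"),
]

def dependency_contexts(tokens):
    """Emit (word, context) pairs in the style of Levy & Goldberg (2014):
    each arc gives the head a context modifier/relation, and the
    modifier an inverse context head/relation-1."""
    pairs = []
    for form, head, rel in tokens:
        if head < 0:          # skip the root; it has no incoming arc
            continue
        head_form = tokens[head][0]
        pairs.append((head_form, f"{form}/{rel}"))
        pairs.append((form, f"{head_form}/{rel}-1"))
    return pairs

for word, ctx in dependency_contexts(sentence):
    print(word, ctx)
```

For this toy sentence, 走る receives the context 犬/nsubj and 犬 receives 走る/nsubj-1 and が/case, so syntactically related words share contexts even when they are far apart in the surface string.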
The parameter settings for the experiments are as follows, where DIM is the number of dimensions given in each file name.
```
-size DIM -negative 15 -threads 20
```
You can download the data from the links below.
The download begins soon after you click a link.
- dep-ja-100dim (85.4 MB)
  - 100-dimensional word vectors
- dep-ja-200dim (169.9 MB)
  - 200-dimensional word vectors
- dep-ja-300dim (254.5 MB)
  - 300-dimensional word vectors
You can use the embeddings in the same way as embeddings trained with the original implementation of Word2Vec.
Here is example code to load them from a Python script:
```python
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("path/to/embeddings")
```
If you use these embeddings in a paper, please cite the following BibTeX entry:
```bibtex
@misc{matsuno2019dependencybasedjapanesewordembeddings,
  title       = {Dependency-based Japanese Word Embeddings},
  author      = {Matsuno, Tomoki},
  affiliation = {LAPRAS inc.},
  url         = {https://github.com/lapras-inc/dependency-based-japanese-word-embeddings},
  year        = {2019}
}
```
- 松田寛, 大村舞, 浅原正幸. 短単位品詞の用法曖昧性解決と依存関係ラベリングの同時学習, 言語処理学会 第 25 回年次大会 発表論文集, 2019.
- Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- Levy, O. & Goldberg, Y. (2014). Dependency-Based Word Embeddings. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 302–308), June, Baltimore, Maryland: Association for Computational Linguistics.