scriptin/jmdict-simplified


JMdict, JMnedict, Kanjidic, and Kradfile/Radkfile in JSON format
with more comprehensible structure and beginner-friendly documentation

Download JSON files | Format docs

NPM package: @scriptin/jmdict-simplified-types
NPM package: @scriptin/jmdict-simplified-loader


Why?

Original XML files are less than ideal in terms of format. (My opinion only; the JMdict/JMnedict project in general is absolutely awesome!) This project provides the following changes and improvements:

  1. JSON instead of XML (or the custom text format of RADKFILE/KRADFILE). Because the original format uses some "advanced" XML features, such as entities and DOCTYPE, it can be quite difficult to use in some tech stacks, e.g. when your programming language of choice has no library that parses that syntax
  2. Regular structure for every item in every collection, with no implicit "same as in previous" values. These are a problem in the original XML files because users' code has to keep track of various pieces of state while traversing collections. In this project, I tried to make every item of every collection "self-contained," with all fields carrying all their values, so there is no need to refer to preceding items
  3. Avoiding null (with few exceptions) and missing fields, preferring empty arrays. See http://thecodelesscode.com/case/6 for the inspiration for this
  4. Human-readable names for fields instead of cryptic abbreviations with no explanations
  5. Documentation in a single file instead of browsing obscure pages across multiple sites. In my opinion, the documentation is the weakest part of the JMdict/JMnedict project
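To illustrate points 2 and 3, here is a minimal sketch of what "self-contained" items mean in practice. It is not the project's converter code: the function name `make_self_contained` and the field names `partOfSpeech` and `gloss` are assumptions chosen for the example, so check the format documentation for the real structure.

```python
from copy import deepcopy


def make_self_contained(items, inherited_fields):
    """Return copies of `items` in which every inherited field is explicit.

    A field missing from an item is copied from the preceding item
    ("same as in previous"); if there is no preceding value, it becomes
    an empty list rather than None or a missing key.
    """
    result = []
    previous = {}
    for item in items:
        filled = deepcopy(item)
        for field in inherited_fields:
            if field not in filled:
                filled[field] = deepcopy(previous.get(field, []))
        result.append(filled)
        previous = filled
    return result


# Hypothetical input: the second item relies on the first one's value.
senses = [
    {"partOfSpeech": ["n"], "gloss": ["dog"]},
    {"gloss": ["hound"]},
]
normalized = make_self_contained(senses, ["partOfSpeech"])
```

After this kind of normalization, a consumer can read any single item without tracking state from earlier ones, and can iterate over every list field without a null check.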

Format

See the Format documentation or TypeScript types.

Please also read the original documentation if you have more questions:

There are also Kotlin types, although they contain some methods and annotations you might not need.

Full, "common-only", with examples, and language-specific versions

There are three main types of JSON files for the JMdict dictionary:

  • full - same as original files, with no omissions of entries
  • "common-only" - containing only dictionary entries considered "common": an entry qualifies if any of its /k_ele/ke_pri or /r_ele/re_pri elements in the XML files contains one of these markers: "news1", "ichi1", "spec1", "spec2", "gai1". One such element is enough for the whole word to be considered common. This corresponds to how online dictionaries such as https://jisho.org classify words as "common". Common-only distributions are much smaller. They are marked with the "common" keyword in file names; see the latest release
  • with example sentences (built from the JMdict_e_examp.xml source file) - English-only version with example sentences from the Tanaka corpus maintained by https://tatoeba.org. This version is not fully supported in this project: the NPM libraries do not provide parsers or type definitions for it
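The "common" rule above can be expressed as a short predicate. This is a sketch of the rule as described, not the project's actual implementation; the function name `is_common` and the input shape (one list of priority markers per kanji or reading element) are assumptions made for illustration.

```python
# Markers that make an entry "common", as listed above.
COMMON_MARKERS = frozenset({"news1", "ichi1", "spec1", "spec2", "gai1"})


def is_common(priority_lists):
    """Return True if any element carries at least one common marker.

    `priority_lists` holds one list of ke_pri/re_pri values per kanji or
    reading element of a single entry (hypothetical input shape).
    """
    return any(
        marker in COMMON_MARKERS
        for priorities in priority_lists
        for marker in priorities
    )
```

For example, `is_common([["news2"], ["ichi1", "nf12"]])` is true because one reading carries "ichi1", while an entry whose elements only carry other markers is not common.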

Also, JMdict and Kanjidic have language-specific versions with language codes (3-letter ISO 639-2 codes for JMdict, 2-letter ISO 639-1 codes for Kanjidic) in file names:

  • all - all languages, i.e. no language filter was applied
  • eng/en - English
  • ger/de - German
  • rus/ru - Russian
  • hun/hu - Hungarian
  • dut/nl - Dutch
  • spa/es - Spanish
  • fre/fr - French
  • swe/sv - Swedish
  • slv/sl - Slovenian

JMnedict and JMdict with examples have only one respective version each, since they are both English-only, and JMnedict has no "common" indicators on entries.
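If you handle both dictionaries in one program, the code pairs listed above can be kept in a small lookup table. The sketch below only mirrors that list; the names `JMDICT_TO_KANJIDIC` and `kanjidic_code` are hypothetical, and it makes no assumption about the exact file name pattern, which you should take from the release page.

```python
# 3-letter JMdict codes mapped to 2-letter Kanjidic codes, as listed above.
JMDICT_TO_KANJIDIC = {
    "eng": "en",
    "ger": "de",
    "rus": "ru",
    "hun": "hu",
    "dut": "nl",
    "spa": "es",
    "fre": "fr",
    "swe": "sv",
    "slv": "sl",
}


def kanjidic_code(jmdict_code):
    """Translate a JMdict language code into the matching Kanjidic code."""
    if jmdict_code == "all":
        # "all" means no language filter and is spelled the same in both.
        return "all"
    try:
        return JMDICT_TO_KANJIDIC[jmdict_code]
    except KeyError:
        raise ValueError(f"unknown JMdict language code: {jmdict_code!r}") from None
```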

Requirements for running the conversion script

You don't need to install Gradle, just use the Gradle wrapper provided in this repository: gradlew (for Linux/macOS) or gradlew.bat (for Windows)

Converting XML dictionaries

NOTE: You can grab the pre-built JSON files in the latest release

Use the included scripts: gradlew (for Linux/macOS) or gradlew.bat (for Windows).

Tasks to convert dictionary files and create distribution archives:

  • ./gradlew clean - clean all build artifacts to start a fresh build, for cases when you need to re-download and convert from scratch
  • ./gradlew download - download and extract the original dictionary XML files into build/dict-xml
  • ./gradlew convert - convert all dictionaries to JSON and place them into build/dict-json
  • ./gradlew archive - create distribution archives (zip, tar+gzip) in build/distributions
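If you drive the build from a script, the tasks above can be chained so that a failure stops the run. This is a minimal sketch rather than part of the repository: `gradle_commands` and `run_pipeline` are hypothetical helper names, and it assumes you run it from the repository root (pass `gradlew.bat` as the wrapper on Windows).

```python
import subprocess

# Task order used by the build: download, then convert, then archive.
PIPELINE = ("download", "convert", "archive")


def gradle_commands(wrapper="./gradlew", clean=False):
    """Build the argument lists for the Gradle tasks, in the required order."""
    tasks = (("clean",) if clean else ()) + PIPELINE
    return [[wrapper, task] for task in tasks]


def run_pipeline(wrapper="./gradlew", clean=False, run=subprocess.run):
    """Run each task in turn; check=True aborts on the first failing task."""
    for command in gradle_commands(wrapper, clean):
        run(command, check=True)
```

Calling `run_pipeline(clean=True)` corresponds to a fresh build from scratch; the `run` parameter exists only so the function can be exercised without invoking Gradle.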

Utility tasks (for CI/CD workflows):

  • ./gradlew --quiet jmdictHasChanged, ./gradlew --quiet jmnedictHasChanged, and ./gradlew --quiet kanjidicHasChanged - check whether dictionary files have changed by comparing checksums of the downloaded files with those stored in the checksums directory. Each outputs YES or NO. Run these only after the download task! The --quiet flag silences Gradle logs, e.g. when you need to put the values into environment variables.
  • ./gradlew updateChecksums - update checksum files in the checksums directory. Run it after creating distribution archives and commit the checksum files into the repository, so that the next CI/CD run knows whether it needs to rebuild anything.
  • ./gradlew uberJar - create an Uber JAR for standalone use (i.e. without Gradle). The JAR program shows help messages and should be intuitive to use if you know how to run it.
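In a CI/CD script, the YES/NO answer of the change-detection tasks can be turned into a boolean. The sketch below is an assumption-laden example, not code from this repository: `has_changed` is a hypothetical helper, and it assumes that with --quiet the task prints nothing but YES or NO on standard output.

```python
import subprocess


def has_changed(task, wrapper="./gradlew", run=subprocess.run):
    """Run a *HasChanged task quietly and interpret its YES/NO output.

    Assumes the download task has already been run, as required above.
    """
    completed = run(
        [wrapper, "--quiet", task],
        check=True,           # fail loudly if Gradle itself fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output as str
    )
    answer = completed.stdout.strip()
    if answer == "YES":
        return True
    if answer == "NO":
        return False
    raise ValueError(f"unexpected output from {task}: {answer!r}")
```

A workflow could call `has_changed("jmdictHasChanged")` and skip the convert and archive steps when it returns false. Raising on any other output avoids silently treating a Gradle warning as "no change".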

For the full list of available tasks, run ./gradlew tasks

Troubleshooting

  • Make sure to run tasks in order: download -> convert -> archive
  • If running Gradle fails, make sure java is available on your $PATH
  • Run Gradle with the --stacktrace, --info, or --debug arguments to see more details if you get an error

License

JMdict and JMnedict

The original XML files - JMdict.xml, JMdict_e.xml, JMdict_e_examp.xml, and JMnedict.xml - are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's license. The project was started in 1991 by Jim Breen.

All derived files are distributed under the same license, as the original license requires it.

Kanjidic

The original kanjidic2.xml file is released under the Creative Commons Attribution-ShareAlike License v4.0. See the Copyright and Permissions section on the Kanjidic wiki for details.

All derived files are distributed under the same license, as the original license requires it.

RADKFILE/KRADFILE

The RADKFILE and KRADFILE files are copyrighted and available under the EDRDG Licence. The copyright of the RADKFILE2 and KRADFILE2 files is held by Jim Rose.

NPM packages

The NPM packages @scriptin/jmdict-simplified-types and @scriptin/jmdict-simplified-loader are available under the MIT license.

Other files

The source code and other files of this project, excluding the files and packages mentioned above, are available under the Creative Commons Attribution-ShareAlike License v4.0. See LICENSE.txt

