New function convert_kanji for universal conversion between kanji formats.
New function sedist for computing the stroke edit distance by Lars Yencken.
compare_neighborhoods gave obscure errors when stroke edit distances involved kanji with index > 2133. Fixed by returning an explicit error if the key kanji has such an index and setting the corresponding return value to NA if any of the closest kanji in the kanji distance has such an index.
kanjidist with approx = "pc" or approx = "pcweighted" now runs only for kanjivec objects generated with kanjistat 0.13.0 or newer.
The structure of kanjivec objects has been extended. Each stroke in the stroketree component now has an additional attribute "beziermat" which describes the Bézier curves of the stroke in a standardized 2 x (1+3n) matrix format (n = number of curves). The new structure is fully backward compatible. Whether a given kanjivec object kan follows the new structure can be tested by attr(kan, "kanjistat_version") >= "0.13.0". The kvecjoyo dataset on https://github.com/dschuhmacher/kanjistat.data has been updated accordingly.
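To make the 2 x (1+3n) layout concrete, here is a small base R sketch. It assumes that the first column holds the start point of the stroke and that each following block of three columns holds the two control points and the end point of one cubic Bézier curve. This column ordering is an assumption made for illustration and should be checked against the package documentation. The matrix bm and the helper bezier_point are hypothetical and not part of kanjistat.

```r
# Hypothetical "beziermat" for a stroke made of n = 2 cubic Bézier curves.
# Assumed layout: column 1 is the start point; each following block of three
# columns holds control point 1, control point 2 and the end point of a curve.
bm <- matrix(c(10, 10,                     # start point
               20, 10,  30, 20,  40, 20,   # curve 1
               50, 20,  60, 30,  60, 40),  # curve 2
             nrow = 2)

# Recover the number of curves from the 1 + 3n columns.
n_curves <- (ncol(bm) - 1) / 3

# Evaluate curve number i (1-based) at parameter u in [0, 1].
# Under the assumed layout, curve i starts at the end point of curve i - 1.
bezier_point <- function(bm, i, u) {
  p <- bm[, (3 * i - 2):(3 * i + 1)]  # the four points defining curve i
  (1 - u)^3 * p[, 1] + 3 * (1 - u)^2 * u * p[, 2] +
    3 * (1 - u) * u^2 * p[, 3] + u^3 * p[, 4]
}

start_of_stroke <- bezier_point(bm, 1, 0)
end_of_stroke <- bezier_point(bm, n_curves, 1)
```

Sharing the end point of one curve with the start point of the next is what gives 1 + 3n columns rather than 4n.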
New function compare_neighborhoods, which currently compares stroke edit distances and kanji distances in a dstrokedit neighborhood of a given kanji and optionally extends the comparison to nearest neighbors in the kanji distance. This function is still somewhat experimental.
kanjidist and kanjidistmat have a new parameter minor_warnings which toggles any warnings that can be ignored by most users. These warnings usually point to issues in the underlying kanjivec data or the kanjidist computation that are currently addressed by workarounds.
approx = "pc" or approx = "pcweighted" runs considerably faster with the new kanjivec objects, because the inefficient (multiple) parsing of d attributes from previous versions is now avoided.
kanjivec objects. Fixed in the internal functions. Both kanjivec with non-default parameter bezier_discr and kanjidist with approx = "pc" or approx = "pcweighted" should run now in all cases without problems (tested for Jouyou kanji).
Function kanjidist has a new argument approx, which specifies how the strokes are to be approximated for computing component distances. The three options “grid”, “pc” or “pcweighted” work in any combination with the three options for the type argument (which now strictly specifies the type of distance used for the components).
Function kanjivec has a new argument bezier_discr, which may be any of “svgparser”, “eqtimed” and “eqspaced”. It specifies which code is used to discretize the strokes in the stroketree component and according to which strategy the points are placed.
Data set pooled_similarity contains the human similarity judgements of kanji from Yencken and Baldwin (2008).
Point cloud approximations (“pc” and “pcweighted”) now use (approximately) equispaced points on the Bézier curves.
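One common way to obtain approximately equispaced points on a cubic Bézier curve is to sample it densely at equal parameter steps and then resample by cumulative arc length. The base R sketch below illustrates that idea only. It is not the code kanjistat uses, and the function names cubic_bezier and equispaced_points are made up for this example.

```r
# Evaluate a cubic Bézier curve whose four defining points are the columns of
# the 2 x 4 matrix p, at parameter values u; returns a length(u) x 2 matrix.
cubic_bezier <- function(p, u) {
  cbind((1 - u)^3, 3 * (1 - u)^2 * u, 3 * (1 - u) * u^2, u^3) %*% t(p)
}

# Return k points that are (approximately) equally spaced along the curve.
equispaced_points <- function(p, k, n_fine = 1000) {
  fine <- cubic_bezier(p, seq(0, 1, length.out = n_fine))
  seg <- sqrt(rowSums(diff(fine)^2))   # lengths of the small polyline segments
  s <- c(0, cumsum(seg))               # cumulative arc length at each sample
  target <- seq(0, s[n_fine], length.out = k)
  # Linear interpolation of x and y as functions of arc length.
  cbind(x = approx(s, fine[, 1], xout = target, rule = 2)$y,
        y = approx(s, fine[, 2], xout = target, rule = 2)$y)
}

# Example curve (arbitrary control points chosen for illustration).
p <- matrix(c(0, 0,  0, 1,  1, 1,  1, 0), nrow = 2)
pts <- equispaced_points(p, k = 11)
gaps <- sqrt(rowSums(diff(pts)^2))     # distances between consecutive points
```

Sampling at equal parameter steps instead would place points more densely where the curve is traversed slowly, which is the effect that arc-length resampling removes.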
Various speed improvements to options “pc” and “pcweighted”.
kanjidist for compo_seg_depth1 >= 5 returned an error. Fixed.
Function kanjidist accepts two new type arguments “pc” and “pcweighted” for computing component distances based on (weighted) point clouds rather than bitmap images.
Data sets dstrokedit and dyehli added with stroke edit and Yeh-Li (bag-of-radicals) distances between Jouyou kanji and (usually a bit more than) their closest ten neighbors. Based on the PhD thesis by Lars Yencken (2010).
kanjimat cut off part of the kanji under the default setting margin = 0 on Windows. The algorithm for setting the effective margin in the bitmap representation has been improved.
read_kanjidic2, which reads a KANJIDIC2 file and converts it to a list. All kanji information in the original file is retained, but the structure is simplified.
cjk_escape, which replaces CJK characters by their Unicode escape sequences in files.
More extensive readme file and main package vignette.
Add package website using pkgdown.
plotkanji. This function now plots several kanji in possibly different fonts. A parameter filename was added for devices that plot to a file.
print.kanjivec() to package exports.