New function convert_kanji for universal conversion between kanji formats.
New function sedist for computing the stroke edit distance by Lars Yencken.
compare_neighborhoods gave obscure errors when stroke edit distances involved kanji with index > 2133. Fixed by returning an explicit error if the key kanji has such an index and setting the corresponding return value to NA if any of the closest kanji in the kanji distance has such an index.
kanjidist with approx = "pc" or approx = "pcweighted" now runs only for kanjivec objects generated with kanjistat 0.13.0 or newer.
The structure of kanjivec objects has been extended. Each stroke in the stroketree component now has an additional attribute "beziermat", which describes the Bézier curves of the stroke in a standardized 2 x (1+3n) matrix format (n = number of curves). The new structure is fully backward compatible. Whether a given kanjivec object kan follows the new structure can be tested by attr(kan, "kanjistat_version") >= 0.13.0. The kvecjoyo dataset on https://github.com/dschuhmacher/kanjistat.data has been updated accordingly.
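The 2 x (1+3n) shape and the version test can be illustrated with a small base R sketch. It rests on assumptions that are not confirmed by this changelog: that the columns hold the start point followed by two control points and an end point per cubic curve, and that the "kanjistat_version" attribute is a version string or package_version object. The object kan below is a mock, not a real kanjivec object, and curve_controls is a hypothetical helper. Since the literal 0.13.0 is not a valid R number, the comparison is written with package_version().

```r
# Mock "beziermat" for a stroke made of n = 2 cubic Bézier curves.
# ASSUMED column layout: start point, then (control 1, control 2, end point) per curve.
beziermat <- matrix(c(10, 20, 30, 40, 50, 60, 70,
                      10, 30, 30, 10,  5, 20, 40),
                    nrow = 2, byrow = TRUE)

# Recover the number of curves from the 2 x (1 + 3n) shape.
n_curves <- (ncol(beziermat) - 1) / 3

# Hypothetical helper: the 2 x 4 control matrix of curve i.
# Consecutive curves share one column (end point of i = start point of i + 1).
curve_controls <- function(bm, i) {
  first <- 3 * (i - 1) + 1
  bm[, first:(first + 3), drop = FALSE]
}

# Mock object carrying only the version attribute (not a real kanjivec).
kan <- structure(list(), kanjistat_version = package_version("0.13.0"))

# Compare as versions, not as numbers or plain strings.
kan_version <- package_version(as.character(attr(kan, "kanjistat_version")))
has_beziermat <- kan_version >= package_version("0.13.0")
```

Comparing versions rather than strings matters here: as plain strings, "0.9.0" would sort after "0.13.0".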
New function compare_neighborhoods, which currently compares stroke edit distances and kanji distances in a dstrokedit neighborhood of a given kanji and optionally extends the comparison to nearest neighbors in the kanji distance. This function is still somewhat experimental.
kanjidist and kanjidistmat have a new parameter minor_warnings which toggles any warnings that can be ignored by most users. These warnings usually point to issues in the underlying kanjivec data or the kanjidist computation that are currently addressed by workarounds.
kanjidist with approx = "pc" or approx = "pcweighted" runs considerably faster with the new kanjivec objects, because the inefficient (multiple) parsing of d attributes from previous versions is now avoided.
kanjivec objects. Fixed in the internal functions. Both kanjivec with non-default parameter bezier_discr and kanjidist with approx = "pc" or approx = "pcweighted" should now run without problems in all cases (tested for Jouyou kanji).
Function kanjidist has a new argument approx, which specifies how the strokes are to be approximated for computing component distances. The three options "grid", "pc" and "pcweighted" work in any combination with the three options for the type argument (which now strictly specifies the type of distance used for the components).
Function kanjivec has a new argument bezier_discr, which may be any of "svgparser", "eqtimed" and "eqspaced". It specifies which code is used to discretize the strokes in the stroketree component and according to which strategy the points are placed.
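The difference between placing points at equal parameter steps and at (approximately) equal distances along a curve can be sketched in base R. This is only an illustration of the two general strategies, under the assumption that "eqtimed" refers to equal steps in the curve parameter and "eqspaced" to equal arc length; it is not the code used inside kanjivec, and all function names below are hypothetical.

```r
# Evaluate a cubic Bézier curve with 2 x 4 control matrix `ctrl` at parameters t.
bezier_point <- function(ctrl, t) {
  w <- rbind((1 - t)^3, 3 * (1 - t)^2 * t, 3 * (1 - t) * t^2, t^3)  # 4 x length(t)
  ctrl %*% w                                                        # 2 x length(t)
}

# Equal steps in the curve parameter t.
sample_eqtimed <- function(ctrl, m) {
  bezier_point(ctrl, seq(0, 1, length.out = m))
}

# Approximately equal arc length: oversample the curve, accumulate chord
# lengths, then interpolate the parameter at equally spaced length targets.
sample_eqspaced <- function(ctrl, m, fine = 1000) {
  t_fine <- seq(0, 1, length.out = fine)
  pts <- bezier_point(ctrl, t_fine)
  seg <- sqrt(colSums((pts[, -1, drop = FALSE] - pts[, -fine, drop = FALSE])^2))
  arclen <- c(0, cumsum(seg))
  targets <- seq(0, arclen[fine], length.out = m)
  t_eq <- stats::approx(arclen, t_fine, xout = targets)$y
  bezier_point(ctrl, t_eq)
}

# Distances between consecutive sample points.
spacing <- function(p) sqrt(colSums((p[, -1, drop = FALSE] - p[, -ncol(p), drop = FALSE])^2))

# Example curve whose speed varies strongly along the parameter.
ctrl <- matrix(c(0, 90, 100, 100,
                 0,  0,  10, 100), nrow = 2, byrow = TRUE)
timed  <- sample_eqtimed(ctrl, 10)
spaced <- sample_eqspaced(ctrl, 10)
```

On this example curve, sd(spacing(spaced)) is much smaller than sd(spacing(timed)), which is the property a point cloud approximation benefits from.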
Data set pooled_similarity contains the human similarity judgements of kanji from Yencken and Baldwin (2008).
Point cloud approximations ("pc" and "pcweighted") now use (approximately) equispaced points on the Bézier curves.
Various speed improvements to options "pc" and "pcweighted".
kanjidist for compo_seg_depth1 >= 5 returned an error. Fixed.
Function kanjidist accepts two new type arguments "pc" and "pcweighted" for computing component distances based on (weighted) point clouds rather than bitmap images.
Data sets dstrokedit and dyehli added with stroke edit and Yeh-Li (bag-of-radicals) distances between Jouyou kanji and (usually a bit more than) their closest ten neighbors. Based on the PhD thesis by Lars Yencken (2010).
kanjimat cut off part of the kanji under the default setting margin = 0 on Windows. The algorithm for setting the effective margin in the bitmap representation has been improved.
New function read_kanjidic2, which reads a KANJIDIC2 file and converts it to a list. All kanji information in the original file is retained, but the structure is simplified.
New function cjk_escape, which replaces CJK characters by their Unicode escape sequences in files.
More extensive readme file and main package vignette.
Add package website using pkgdown.
plotkanji now plots several kanji, possibly in different fonts. A parameter filename was added for devices that plot to a file.
Added print.kanjivec() to package exports.