Movatterモバイル変換
[0]ホーム
rtika 3.2.3 (2025-10-10)
* Updated the bundled Apache Tika configuration to use the 3.2.3 release. Details are at https://tika.apache.org/3.2.3/index.html .* Refreshed the default SHA512 checksum that `install_tika()` uses to validate the download.* Raised the minimum supported Java runtime to 11 to match Apache Tika requirements.
rtika 2.7.0 (2023-04-30)
* Updated Tika to 2.4.1. Details are found at https://tika.apache.org/2.4.1/index.html .* Use tools::R_user_dir() instead of rappdirs, thanks to Maëlle Salmon (rOpenSci)
rtika 2.4.1 (2022-08-25)
* Updated Tika to 2.4.1. Details are found at https://tika.apache.org/2.4.1/index.html .* Use tools::R_user_dir() instead of rappdirs, thanks to Maëlle Salmon (rOpenSci)
rtika 2.0.0 (2021-08-05)
* Updated Tika to 2.0.0. Details are found at https://tika.apache.org/2.0.0/index.html .
rtika 1.23 (2020-04-24)
MINOR IMPROVEMENTS
* Updated Tika to 1.24.1. Details are found at https://tika.apache.org/1.24.1/index.html .
rtika 1.23 (2019-12-12)
MINOR IMPROVEMENTS
* Updated Tika to 1.23. Details are found at https://tika.apache.org/1.23/index.html .
rtika 1.22 (2019-08-01)
MINOR IMPROVEMENTS
* Updated Tika to 1.22
rtika 1.20 (2019-02-26)
MINOR IMPROVEMENTS
* Updated Tika to 1.20* Includes two config files to either turn on or off OCR. This is only relevant on Linux variants that have the Tesseract OCR engine installed.
BUG FIX
* Created a workaround because normalizePath() on Windows produced inconsistent results.
rtika 1.19.1 (2018-07-08)
MINOR IMPROVEMENTS
* Updated Tika to 1.19.1.* Updated the 'sys' package integration, and Jeroen informed the Tika team of an unexpected method of interprocess communication in the batch processor.
rtika 1.1.19 (2018-07-08)
NEW FEATURES
- The new java() function is used get the command to invoke Java forall tika() functions, and allows the option of changing its value acrosssessions. If you want to use a particular installation of Java, set theJAVA_HOME variable using the Sys.setenv(JAVA_HOME = ‘my path’). Thejava() function will check for this variable, and if found return itinstead of the default ‘java’ invocation.
- Updated to Tika version 1.19.
MINOR IMPROVEMENTS
- The tika_check function now uses the more advance SHA512 checksuminstead of the MD5. To implement this, the ‘digest’ package is now adependency.
rtika 0.1.8 (2018-04-25)
MINOR IMPROVEMENTS
- The install_tika() function now gets the Tika 1.18 release that cameout 2018-04-24, instead of the 1.18 development version.
rtika 0.1.7 (2018-03-08)
MINOR IMPROVEMENTS
- The new install_tika() function allows this package to bedistributed on CRAN. The Tika App jar was too large to go on CRANdirectly. The .jar is installed in the directory determined by therappdirs::user_data_dir() function.
- The .onLoad() function now gives various installation advice whenstarting up.
DEPRECATED AND DEFUNCT
- Removed the .jar in favor of the install_tika() function.
rtika 0.1.6 (2018-03-01)
NEW FEATURES
- tika(), tika_xml(), tika_json(), tika_text(), and tika_html() have anew downloader, which preserve the server’s content-type encoding as afile extension when possible. This should help Tika identify and parsedownloaded files more reliably. It depends on the ‘curl’ package.
- Added tika_fetch(), which is a stand alone function to downloadfiles and append a file extension matching the content type declared bythe server. Additional features for this function include specifying thenumber of download retries. The output of tika_fetch() can be pipeddirectly into other tika functions.
- New introductory vignette covers how to use the functions andsurveys several applications.
- tika(), tika_xml(), tika_json(), tika_text(), and tika_html() cannow be set to return=FALSE, which does not return any R character vectorbut invisibly returns NULL. This would be most useful in massive fileconversion jobs with hundreds of thousands of files.
- Used pkgdown to create a website for github pages.
- New tika_json_text() function gets metadata in .json with plain textcontent.
DEPRECATED AND DEFUNCT
- Previous vignette has been removed in favor of new one.
- The tikajar package is not required in this version. Moved the .jarfile back into this package to ease installation until I hear fromCRAN.
rtika 0.1.5 (2018-02-15)
MINOR IMPROVEMENTS
- Added dependency on ‘sys’ package because the ‘system2’ function wascausing intermittent errors by ending tika in mid process.
- Added startup check of the java version, using .onLoad() call to‘java -version’
- Removed redundant conversion to UTF-8, because the Tika batchroutine is already outputting UTF-8.
- Increased the speed of building packages (fewer downloads needed fortesting, and the examples do not run).
- Added Code of Conduct to CONDUCT.md file
- Set default ‘cleanup’ attribute to TRUE.
rtika 0.1.4 (2018-02-15)
MINOR IMPROVEMENTS
- Because it is too big for CRAN, removed the Tika .jar file.
- Added the Tika .jar to a new tikajar package on github.
- Put the ropensci review badge on the tikajar package also, since itsan essential component of this package.
- Updated DESCRIPTION, documentation and .travis.yml to reflect thenew installation routine.
rtika 0.1.3 (2018-02-04)
NEW FEATURES
- added convenience functions that advertise output format:tika_xml(), tika_json(), tika_text(), tika_html().
MINOR IMPROVEMENTS
- README examples use magrittr pipe.
rtika 0.1.2 (2018-01-30)
NEW FEATURES
- added many tests to increase code coverage dramatically.
- integrated the covr package.
MINOR IMPROVEMENTS
- for Windows users, the curl package is recommended to prevent base Rdownload.file from corrupting files.
DEPRECATED AND DEFUNCT
- removed the n_chars parameter in favor of using file.size()internally.
- removed onLoad.R that checked java version in order to speed packageloading and simplify code coverage testing.
rtika 0.1.1 (2018-01-23)
NEW FEATURES
- allows the user to input the URLs and file paths of documents. URLswill be downloaded first to a temporary directory. The previousinterface has been changed.
MINOR IMPROVEMENTS
- the Tika License file is included in the source.
- added a vignette on basic text processing using the library.
rtika 0.1.0 (2018-01-19)
NEW FEATURES
- Initial release.
- R interface to Apache Tika batch processing CLI, found to be themost efficient CLI option.
- tika function returns processing results as a character vector.
- includes the Tika App .jar. Tika source is available at:https://github.com/apache/tika
[8]ページ先頭