
idiolect

The idiolect R package is designed to provide a comprehensive suite of tools for performing comparative authorship analysis within a forensic context using the Likelihood Ratio Framework (e.g. Ishihara 2021; Nini 2023). The package contains a set of authorship analysis functions that take a set of texts as input and output scores that can then be calibrated into likelihood ratios. The package is dependent on quanteda (Benoit et al. 2018) for all Natural Language Processing functions.

Installation

You can install idiolect from CRAN:

install.packages("idiolect")

Workflow

The main functions contained in the package reflect the typical workflow for authorship analysis for forensic problems:

1. Input data using create_corpus();
2. Optionally mask the content/topic of the texts using contentmask();
3. Launch an analysis (e.g. delta(), ngram_tracing(), impostors());
4. Test the performance of the method on ground truth data using performance();
5. Finally, apply the method to the questioned text and generate a likelihood ratio with calibrate_LLR().
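To give a rough idea of what calibrating scores into likelihood ratios involves, the sketch below uses base R only and simulated scores. It illustrates one common approach, logistic regression calibration, and is not a reproduction of calibrate_LLR(). The data, the helper score_to_llr(), and all variable names are hypothetical.

```r
# Simulated similarity scores for illustration only (not real casework data).
set.seed(1)
same_scores <- rnorm(50, mean = 1, sd = 1)    # same-author comparisons
diff_scores <- rnorm(50, mean = -1, sd = 1)   # different-author comparisons

train <- data.frame(
  score  = c(same_scores, diff_scores),
  target = rep(c(1, 0), each = 50)            # 1 = same author, 0 = different
)

# Fit a logistic regression of the ground truth on the score.
fit <- glm(target ~ score, data = train, family = binomial)

# Hypothetical helper: convert scores into log10 likelihood ratios.
# The fitted log-odds are posterior log-odds, so the prior log-odds implied
# by the training proportions are subtracted (zero here: balanced classes).
score_to_llr <- function(score, fit, n_same, n_diff) {
  log_odds <- predict(fit, newdata = data.frame(score = score))
  (log_odds - log(n_same / n_diff)) / log(10)
}

# Higher scores should map to higher log-likelihood ratios.
score_to_llr(c(-2, 0, 2), fit, n_same = 50, n_diff = 50)
```

A positive value supports the same-author hypothesis and a negative value the different-author hypothesis. For the package's own calibration procedure and its arguments, consult the documentation of calibrate_LLR().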

Check the website and the vignette for examples.

References

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. 2018. “Quanteda: An R Package for the Quantitative Analysis of Textual Data.” Journal of Open Source Software 3 (30). https://doi.org/10.21105/joss.00774.

Ishihara, Shunichi. 2021. “Score-Based Likelihood Ratios for Linguistic Text Evidence with a Bag-of-Words Model.” Forensic Science International 327: 110980. https://doi.org/10.1016/j.forsciint.2021.110980.

Nini, Andrea. 2023. A Theory of Linguistic Individuality for Authorship Analysis. Elements in Forensic Linguistics. Cambridge, UK: Cambridge University Press.
