
NaileR

Prep, polish, top coat: this vanity case is exactly what you need to put the finishing touch on your statistical analysis.

Overview

Thanks to the Ollama API, which makes it possible to run Large Language Models (LLMs) locally, we developed a small package designed for interpreting continuous or categorical latent variables. You provide a data set containing a latent variable you want to understand, along with some explanatory variables. The package returns a description of the latent variable based on the explanatory variables, and also proposes a name for it. ‘NaileR’ is an R package that uses convenience functions offered by the ‘FactoMineR’ package (condes(), catdes(), descfreq()) in conjunction with the ‘ollamar’ package.

Its two main goals are to:

* generate descriptions of latent variables with the help of AI
* offer similarity measure tools for textual data

Installation (from GitHub)

If needed, install the devtools package.

install.packages('devtools')

Install and load the ‘NaileR’ package from GitHub.

devtools::install_github('Nelhe/NaileR')
library(NaileR)
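Because ‘NaileR’ relies on a locally running Ollama server reached through ‘ollamar’, it can help to check that setup before going further. The lines below are a minimal sketch and not part of the official installation steps: they assume Ollama is installed and running at its default local address, and the model name "llama3" is only an illustrative choice.

```r
# Illustrative model name (an assumption); replace it with the model you intend to use.
model_name <- "llama3"

# Only run the checks if the 'ollamar' package is installed.
if (requireNamespace("ollamar", quietly = TRUE)) {
  # Reports whether the local Ollama server can be reached.
  try(ollamar::test_connection())

  # Lists the models already downloaded (fails if no server is running).
  try(print(ollamar::list_models()))

  # Uncomment to download the model; this can be a large download.
  # try(ollamar::pull(model_name))
}
```

If the server cannot be reached, the functions described below that query the LLM will not be able to return a response.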

Usage

‘NaileR’ currently features 15 datasets and 9 functions.

Datasets

agri_studies: contains the results of a Q method-like survey on agribusiness studies
beard, beard_cont and beard_wide: contain the results of a sensometrics experiment on beards
boss: contains the results of a Q method-like survey on the ideal boss
glossophobia: contains the results of a Q method-like survey on feelings about speaking in public
local_food: contains the results of a Q method-like survey on sustainable food systems
quality: contains the results of a survey on French food certification logos
waste: contains the results of a survey on food waste
rorschach: this dataset was initially collected to understand the perception of the Rorschach test
fabric: this dataset was initially collected to understand the free jar data
atomic_habit, car_alone, atomic_habit_clust: a survey for understanding atomic habits
nutriscore: these data were collected after a survey on the nutri-score

Functions

nail_catdes(): performs a catdes analysis on a dataset and describes each category
nail_condes(): performs a condes analysis on a dataset and describes the chosen continuous variable
nail_descfreq(): performs a descfreq analysis on a contingency table and describes the rows
nail_textual(): generates an LLM response to analyze a categorical latent variable, based on answers to open-ended questions
nail_qda(): performs a decat analysis on QDA data and describes the stimuli
nail_sort(): performs clustering on textual data from sensometrics experiments
sim_llm(): computes the similarity between texts
dist_mat_llm(): computes a distance matrix based on sim_llm
dist_ref_llm(): computes a distance vector based on sim_llm
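To get a feel for the kind of statistical output that nail_catdes() builds on, one can run FactoMineR's catdes() directly. The sketch below uses the built-in iris data purely as an illustration (it is not one of the ‘NaileR’ datasets), and it does not show how ‘NaileR’ turns this output into a prompt.

```r
library(FactoMineR)

# iris ships with base R; Species (column 5) plays the role of the
# categorical variable to be described.
res_catdes <- catdes(iris, num.var = 5)

# One table per category, listing the quantitative variables that
# characterise it (v.test, means, standard deviations, p-value).
names(res_catdes$quanti)
res_catdes$quanti$setosa
```

With only four short variable names this table is easy to read; the LLM step becomes useful when the variables are numerous or have long labels.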

Example

For complete case studies and a showcase of the main functions of the ‘NaileR’ package, see the documentation.

Let’s have a look at how we can interpret HCPC clusters:

library(FactoMineR)

data(local_food)

set.seed(1) # for consistency

res_mca <- MCA(local_food, quali.sup = 46:63, ncp = 100, level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca, choix = "ind", invisible = c("var", "quali.sup"), label = "none")

res_hcpc <- HCPC(res_mca, nb.clust = 3, graph = FALSE)
plot.HCPC(res_hcpc, choice = "map", draw.tree = FALSE, ind.names = FALSE)

don_clust <- res_hcpc$data.clust

Due to the very long and explicit variable names, the category description result is practically illegible. Let’s provide clear context and see how an LLM can make sense of it:

res <- nail_catdes(don_clust, ncol(don_clust),
                   introduction = 'A study on sustainable food systems was led on several French participants. This study had 2 parts.
                   In the first part, participants had to rate how acceptable "a food system that..." (e.g, "a food system that only uses renewable energy") was to them.
                   In the second part, they had to say if they agreed or disagreed with some statements.',
                   request = 'I will give you the answers from one group.
                   Please explain who the individuals of this group are, what their beliefs are. Then, give this group a new name, and explain why you chose this name.',
                   isolate.groups = TRUE, drop.negative = TRUE)

Out comes a list of results, for each group.

In the same fashion, nail_condes() can be used to interpret the axes of a PCA, although a bit more work is needed to bind the original data frame with the coordinates on the PCA axes.
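The binding step mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the package's documented workflow: it uses the decathlon data shipped with ‘FactoMineR’ (not a ‘NaileR’ dataset), the introduction and request texts are made up for the example, and nail_condes() is assumed to accept introduction and request arguments in the same way as nail_catdes().

```r
library(FactoMineR)

data(decathlon)  # example data shipped with FactoMineR

# PCA on the ten events; Rank and Points are supplementary quantitative
# variables, Competition is a supplementary categorical variable.
res_pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Bind the coordinates on the first axis to the original data frame.
don_pca <- cbind(decathlon, Dim.1 = res_pca$ind$coord[, "Dim.1"])

# The continuous latent variable to interpret is now the last column.
num_axis <- ncol(don_pca)

# The LLM call needs NaileR and a running Ollama server, so it is
# switched off by default in this sketch.
run_llm <- FALSE
if (run_llm) {
  res_axis <- NaileR::nail_condes(
    don_pca, num_axis,
    introduction = "These data describe the performances of athletes in the ten decathlon events.",
    request = "Please explain what opposes athletes with high and low values on this variable, then give the variable a name."
  )
}
```

The same pattern applies to any other axis: bind the corresponding column of res_pca$ind$coord and point nail_condes() at its position in the data frame.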

Roadmap

Implement a validation function to test the consistency of a response
Implement a function to generate multiple responses and pick the most “central”
Add a ~~nail_textual~~ nail_sort for textual data
Consider adding a nail_decat
Implement a way to generate reports (pptx)

License

This package is under the GPL (>= 2) License. Details can be found here.

Contact

Sébastien Lê - sebastien.le@institut-agro.fr

Project link: https://github.com/Nelhe/NaileR

Acknowledgements

This work has benefited from a government grant managed by the Agence Nationale de la Recherche under the France 2030 programme, under the reference ANR-23-PESA-0005.

