Using Transformers from Hugging Face in R
An R-package for analyzing natural language with transformers-based large language models. The text package is part of the R Language Analysis Suite, which includes:
* talk: a package that transforms voice recordings into text, audio features, or embeddings.
* text: a package that provides tools for many language tasks, such as converting digital text into word embeddings. talk and text offer access to Large Language Models from Hugging Face.
* topics: a package with tools for visualizing language patterns into topics.
* the L-BAM Library: a library that provides pre-trained models for different psychological assessments, such as mental health issues, personality, and related behaviours.
The R Language Analysis Suite is created through a collaboration between psychology and computer science to address research needs and ensure state-of-the-art techniques. The suite is continuously tested on Ubuntu, macOS, and Windows using the latest stable R version.
The text-package has two main objectives:
* First, to serve R-users as a point solution for transforming text to state-of-the-art word embeddings that are ready to be used for downstream tasks. The package provides a user-friendly link to language models based on transformers from Hugging Face.
* Second, to serve as an end-to-end solution that provides state-of-the-art AI techniques tailored for social and behavioral scientists.
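As an illustration of the first objective, the following minimal sketch turns a few sentences into word embeddings with textEmbed(). It assumes the package and its Python backend have been installed and initialized (e.g. with textrpp_install() and textrpp_initialize()); the example sentences are made up for illustration.

```r
# Minimal sketch: transform text into word embeddings with the text package.
# Assumes the Python/Hugging Face backend has been set up once via
# textrpp_install() and textrpp_initialize().
library(text)

texts <- c("I feel great today.",
           "This has been a difficult week.")

# Retrieve contextual embeddings from a transformer model; the model
# argument takes a Hugging Face model id such as "bert-base-uncased".
embeddings <- textEmbed(texts, model = "bert-base-uncased")

# The returned object includes text-level embeddings that are ready
# to be used in downstream analyses (e.g., textTrain()).
embeddings
```

The first call downloads the model from Hugging Face, so it can take a while; subsequent calls use the cached model.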
Please reference our tutorial article when using the text package: The text-package: An R-package for Analyzing and Visualizing Human Language Using Natural Language Processing and Deep Learning.
Recent significant advances in NLP research have resulted in improved representations of human language (i.e., language models). These language models have produced large performance gains in tasks related to understanding human language. The text package makes these state-of-the-art models easily accessible through an interface to Hugging Face in Python.
The text package provides many of the contemporary state-of-the-art language models that are based on deep learning to model word order and context. Multilingual language models can also represent several languages; multilingual BERT comprises 104 different languages.
Table 1. Some of the available language models

| Models | References | Layers | Dimensions | Language |
|---|---|---|---|---|
| ‘bert-base-uncased’ | Devlin et al., 2019 | 12 | 768 | English |
| ‘roberta-base’ | Liu et al., 2019 | 12 | 768 | English |
| ‘distilbert-base-cased’ | Sanh et al., 2019 | 6 | 768 | English |
| ‘bert-base-multilingual-cased’ | Devlin et al., 2019 | 12 | 768 | 104 top languages at Wikipedia |
| ‘xlm-roberta-large’ | Conneau et al., 2020 | 24 | 1024 | 100 languages |
See Hugging Face for a more comprehensive list of models.
The text package also provides functions to analyze the word embeddings with well-tested machine learning algorithms and statistics. The focus is to analyze and visualize texts and their relation to other texts or numerical variables. For example, the textTrain() function is used to examine how well the word embeddings from a text can predict a numeric or categorical variable. Another example is functions plotting statistically significant words in the word embedding space.
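A brief sketch of this training workflow is shown below. It assumes a working text installation and uses the example dataset Language_based_assessment_data_8 shipped with the package; the column names (harmonytexts, hilstotal) follow the package's tutorial data and should be checked against your installed version.

```r
# Sketch: predict a numeric rating scale from word embeddings with textTrain().
# Assumes the text package and its Python backend are installed, and uses the
# example dataset bundled with the package.
library(text)

# Embed the harmony-related free-text responses
word_embeddings <- textEmbed(Language_based_assessment_data_8$harmonytexts)

# Train a model predicting Harmony in Life scale scores from the embeddings
model <- textTrain(
  x = word_embeddings$texts,
  y = Language_based_assessment_data_8$hilstotal
)

# Inspect how well predicted scores correlate with observed scores
model$results
```

Training uses cross-validation under the hood, so the reported correlation estimates out-of-sample predictive performance rather than in-sample fit.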