
LDAvis

R package for interactive topic model visualization.

LDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

Installing the package

Stable version on CRAN:

install.packages("LDAvis")

Development version on GitHub (with devtools):

devtools::install_github("cpsievert/LDAvis")

Getting started

Once installed, we recommend a visit to the main help page:

library(LDAvis)
help(createJSON, package = "LDAvis")

The documentation and example at the bottom of that page should provide a quick sense of how to create (and share) your own visualizations. If you want more details about the technical specifications of the visualization, see the vignette:

vignette("details", package="LDAvis")
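One specification worth knowing up front is the lambda (λ) setting, which also appears in shared URLs. LDAvis ranks the terms within a topic by relevance, defined as λ·log(φ) + (1 − λ)·log(φ / p_w), where φ is the topic's probability for the term and p_w is the term's share of all tokens in the corpus. The snippet below is a minimal sketch of that formula, not code taken from LDAvis; the helper names `term_relevance` and `top_terms` and the toy matrix are hypothetical.

```r
# Minimal sketch (hypothetical helpers, not part of LDAvis) of term relevance:
#   relevance[k, w] = lambda * log(phi[k, w]) + (1 - lambda) * log(phi[k, w] / p[w])
# phi: K x W matrix of topic-term probabilities (rows sum to 1)
# term.frequency: length-W vector of corpus term counts
term_relevance <- function(phi, term.frequency, lambda = 0.6) {
  stopifnot(lambda >= 0, lambda <= 1, ncol(phi) == length(term.frequency))
  p_w <- term.frequency / sum(term.frequency)  # marginal term probabilities
  lift <- sweep(phi, 2, p_w, "/")              # phi[k, w] / p_w[w]
  lambda * log(phi) + (1 - lambda) * log(lift)
}

# Return the n most relevant terms for one topic at a given lambda
top_terms <- function(phi, term.frequency, vocab, topic, lambda = 0.6, n = 5) {
  rel <- term_relevance(phi, term.frequency, lambda)[topic, ]
  head(vocab[order(rel, decreasing = TRUE)], n)
}

# Toy example: 2 topics over a 4-term vocabulary
phi <- rbind(c(0.5, 0.3, 0.1, 0.1),
             c(0.1, 0.1, 0.4, 0.4))
vocab <- c("film", "cop", "music", "song")
term.frequency <- c(100, 10, 50, 40)

top_terms(phi, term.frequency, vocab, topic = 1, lambda = 1, n = 2)  # "film" "cop"
top_terms(phi, term.frequency, vocab, topic = 1, lambda = 0, n = 2)  # "cop" "film"
```

With lambda = 1 terms are ranked purely by their probability within the topic; with lambda = 0 they are ranked by lift, which favors rarer terms that are specific to the topic.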

Note that LDAvis itself does not provide facilities for fitting the model (only for visualizing a fitted model). If you want to perform LDA in R, there are several packages, including mallet, lda, and topicmodels.

If you want to perform LDA with the R package lda and visualize the result with LDAvis, our example of a 20-topic model fit to 2,000 movie reviews may be helpful.
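A collapsed Gibbs sampler reports topic-assignment counts rather than probabilities, so the counts have to be turned into phi and theta before visualizing. The sketch below shows one way to do that: add the Dirichlet priors and normalize each row. It assumes you already have a K x W matrix of topic-term counts (with the vocabulary as column names), a K x D matrix of topic-document counts, and the hyperparameters `alpha` and `eta` used for fitting; to our understanding these counts correspond to the `topics` and `document_sums` elements returned by the lda package's `lda.collapsed.gibbs.sampler()`. The helper `counts_to_ldavis` and the toy counts are hypothetical.

```r
# Hypothetical helper: convert Gibbs-sampler count matrices into the five
# inputs LDAvis works with. Assumed layouts:
#   topic_term_counts: K x W (topics x terms), column names are the vocabulary
#   topic_doc_counts:  K x D (topics x documents)
# alpha, eta: Dirichlet priors on document-topic and topic-term distributions
counts_to_ldavis <- function(topic_term_counts, topic_doc_counts, alpha, eta) {
  # Smooth counts with the prior, then normalize each topic's row to sum to 1
  phi <- topic_term_counts + eta
  phi <- phi / rowSums(phi)
  # Transpose to documents x topics, smooth, normalize each document's row
  theta <- t(topic_doc_counts) + alpha
  theta <- theta / rowSums(theta)
  list(
    phi = phi,
    theta = theta,
    # Every token has exactly one topic assignment in the final sample,
    # so column sums recover document lengths and corpus term counts
    doc.length = colSums(topic_doc_counts),
    vocab = colnames(topic_term_counts),
    term.frequency = colSums(topic_term_counts)
  )
}

# Toy counts: 2 topics, 3 terms, 2 documents (8 tokens in total)
topic_term_counts <- matrix(c(3, 0, 1, 2, 0, 2), nrow = 2,
                            dimnames = list(NULL, c("cop", "film", "song")))
topic_doc_counts <- matrix(c(3, 1, 1, 3), nrow = 2)

ldavis_inputs <- counts_to_ldavis(topic_term_counts, topic_doc_counts,
                                  alpha = 0.1, eta = 0.01)
rowSums(ldavis_inputs$phi)  # each topic's row sums to 1
```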

LDAvis does not limit you to topic modeling facilities in R. If you use other tools (MALLET and gensim are popular), we recommend that you visit our TwentyNewsgroups example to quickly understand which components LDAvis will need.

Sharing a Visualization

To share a visualization that you created using LDAvis, you can encode the state of the visualization into the URL by appending a string of the form:

“#topic=k&lambda=l&term=s”

to the end of the URL, where “k”, “l”, and “s” are strings indicating the desired values of the selected topic, the value of lambda, and the selected term, respectively. For more details, see the last section of our MovieReviews example, or for a quick example, see the link here:

http://cpsievert.github.io/LDAvis/reviews/vis/#topic=3&lambda=0.6&term=cop
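If you generate many such links (say, one per topic), a small helper keeps the fragment format consistent. This is a minimal sketch that assumes only the `#topic=k&lambda=l&term=s` format described above; the function name `ldavis_state_url` is hypothetical, and percent-encoding the term with `utils::URLencode()` is a precaution for terms containing characters such as `&`.

```r
# Hypothetical helper: append LDAvis state (topic, lambda, term) to a
# visualization URL using the "#topic=k&lambda=l&term=s" fragment format
ldavis_state_url <- function(base_url, topic, lambda, term = "") {
  stopifnot(lambda >= 0, lambda <= 1)
  base_url <- sub("#.*$", "", base_url)  # drop any existing fragment
  fragment <- sprintf("#topic=%d&lambda=%s&term=%s",
                      as.integer(topic), format(lambda),
                      utils::URLencode(term, reserved = TRUE))
  paste0(base_url, fragment)
}

# Reproduce the example link above
ldavis_state_url("http://cpsievert.github.io/LDAvis/reviews/vis/", 3, 0.6, "cop")

# One link per topic, all at the same lambda and with no term selected
urls <- vapply(1:3, function(k)
  ldavis_state_url("http://cpsievert.github.io/LDAvis/reviews/vis/", k, 0.6),
  character(1))
```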

Video demos

Additional data

We included one data set in LDAvis, ‘TwentyNewsgroups’, which consists of a list with 5 elements:

- phi, a matrix with the topic-term distributions
- theta, a matrix with the document-topic distributions
- doc.length, a numeric vector with token counts for each document
- vocab, a character vector containing the terms
- term.frequency, a numeric vector of observed term frequencies
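These five elements are the first five arguments of `createJSON()`, so a quick consistency check can catch shape mistakes before building a visualization, which is especially useful when the model was fit outside R (for example with MALLET or gensim). The function below is a hypothetical sketch; the dimension rules it checks follow from the descriptions above (phi is topics x terms, theta is documents x topics), and the toy list stands in for real model output.

```r
# Hypothetical checker for the five LDAvis inputs, stored as a named list
check_ldavis_inputs <- function(x, tol = 1e-8) {
  K <- nrow(x$phi)    # number of topics
  W <- ncol(x$phi)    # vocabulary size
  D <- nrow(x$theta)  # number of documents
  checks <- c(
    "theta has one column per topic"     = ncol(x$theta) == K,
    "phi rows sum to 1"                  = all(abs(rowSums(x$phi) - 1) < tol),
    "theta rows sum to 1"                = all(abs(rowSums(x$theta) - 1) < tol),
    "one doc.length per document"        = length(x$doc.length) == D,
    "one vocab entry per phi column"     = length(x$vocab) == W,
    "one term.frequency per vocab entry" = length(x$term.frequency) == W
  )
  failed <- names(checks)[!checks]
  if (length(failed) > 0) {
    stop("invalid LDAvis inputs: ", paste(failed, collapse = "; "))
  }
  invisible(TRUE)
}

# Toy inputs: 2 topics, 3 terms, 2 documents
toy <- list(
  phi = rbind(c(0.5, 0.5, 0.0), c(0.2, 0.3, 0.5)),
  theta = rbind(c(0.9, 0.1), c(0.4, 0.6)),
  doc.length = c(10, 20),
  vocab = c("a", "b", "c"),
  term.frequency = c(12, 9, 9)
)
check_ldavis_inputs(toy)  # passes silently
```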

We also created a second data-only package called LDAvisData to hold additional example data sets. Currently there are three more examples available there:

- Movie Reviews (a 20-topic model fit to 2,000 movie reviews)
- AP (a 40-topic model fit to approximately 2,246 news articles)
- Jeopardy (a 100-topic model fit to approximately 20,000 Jeopardy questions)
