
LDA

class pyspark.mllib.clustering.LDA

Train a Latent Dirichlet Allocation (LDA) model.

New in version 1.5.0.

Methods

train(rdd[, k, maxIterations, ...])

Train an LDA model.

Methods Documentation

classmethod train(rdd, k=10, maxIterations=20, docConcentration=-1.0, topicConcentration=-1.0, seed=None, checkpointInterval=10, optimizer='em')

Train an LDA model.

New in version 1.5.0.

Parameters
rdd : pyspark.RDD

RDD of documents, which are tuples of document IDs and term (word) count vectors. The term count vectors are “bags of words” with a fixed-size vocabulary (where the vocabulary size is the length of the vector). Document IDs must be unique and >= 0.

k : int, optional

Number of topics to infer, i.e., the number of soft cluster centers. (default: 10)

maxIterations : int, optional

Maximum number of iterations allowed. (default: 20)

docConcentration : float, optional

Concentration parameter (commonly named “alpha”) for the prior placed on documents’ distributions over topics (“theta”). A value of -1.0 means the value is set automatically. (default: -1.0)

topicConcentration : float, optional

Concentration parameter (commonly named “beta” or “eta”) for the prior placed on topics’ distributions over terms. A value of -1.0 means the value is set automatically. (default: -1.0)

seed : int, optional

Random seed for cluster initialization. Set to None to generate a seed based on the system time. (default: None)

checkpointInterval : int, optional

Period (in iterations) between checkpoints. (default: 10)

optimizer : str, optional

LDAOptimizer used to perform the actual calculation. Currently “em” and “online” are supported. (default: “em”)
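The `rdd` parameter expects each document as an ID plus a term-count vector over a fixed vocabulary. The following pure-Python sketch prepares that shape of input from tokenized text. `build_vocabulary` and `to_count_rows` are hypothetical helpers, not part of PySpark. Sorting the vocabulary alphabetically is an arbitrary choice that only serves to make the column order deterministic.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocabulary(docs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct term a fixed column index (sorted for determinism)."""
    terms = sorted({term for doc in docs for term in doc})
    return {term: i for i, term in enumerate(terms)}


def to_count_rows(docs: Sequence[Sequence[str]], vocab: Dict[str, int]) -> List[list]:
    """Return [doc_id, counts] pairs where counts has length len(vocab).

    Document IDs are list positions, so they are unique and >= 0.
    Terms that are not in vocab are ignored.
    """
    rows = []
    for doc_id, doc in enumerate(docs):
        counts = [0.0] * len(vocab)  # one slot per vocabulary term
        for term, n in Counter(doc).items():
            if term in vocab:
                counts[vocab[term]] += float(n)
        rows.append([doc_id, counts])
    return rows
```

Before calling `LDA.train`, each `counts` list would still need to be wrapped with `Vectors.dense` (or `Vectors.sparse` for large vocabularies) from `pyspark.mllib.linalg`, and the rows passed through `sc.parallelize`. Spark expects vector objects here rather than plain Python lists.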
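The -1.0 default for `docConcentration` and `topicConcentration` is a sentinel, and this docstring does not state what it resolves to. Spark's Scala `LDA` documentation describes optimizer-specific automatic values. For EM, alpha = 50/k + 1 and beta = 0.1 + 1, and explicit values must be > 1.0. For online, both default to 1/k, and explicit values must be >= 0. The sketch below restates those documented rules so the effective priors can be checked for a given `k`. `effective_concentrations` is a hypothetical helper, not a PySpark function, and it does not call into Spark.

```python
from typing import Tuple

AUTO = -1.0  # sentinel meaning "set automatically"


def effective_concentrations(
    k: int,
    optimizer: str = "em",
    doc_concentration: float = AUTO,
    topic_concentration: float = AUTO,
) -> Tuple[float, float]:
    """Return (alpha, beta) after resolving the -1.0 sentinel.

    Hypothetical helper: mirrors the defaults documented for Spark's Scala
    LDA class; it is an illustration, not the library's code path.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if optimizer == "em":
        # EM: alpha defaults to 50/k + 1, beta to 0.1 + 1; values must exceed 1.0.
        alpha = 50.0 / k + 1.0 if doc_concentration == AUTO else doc_concentration
        beta = 1.1 if topic_concentration == AUTO else topic_concentration
        if alpha <= 1.0 or beta <= 1.0:
            raise ValueError("EM requires concentrations > 1.0")
    elif optimizer == "online":
        # Online: both default to 1/k; values must be >= 0.
        alpha = 1.0 / k if doc_concentration == AUTO else doc_concentration
        beta = 1.0 / k if topic_concentration == AUTO else topic_concentration
        if alpha < 0 or beta < 0:
            raise ValueError("online requires concentrations >= 0")
    else:
        raise ValueError(f"unsupported optimizer: {optimizer!r}")
    return alpha, beta
```

For example, with the default `k=10` and the EM optimizer this gives alpha = 6.0 and beta = 1.1.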

