IDFModel
- class pyspark.mllib.feature.IDFModel(java_model)
Represents an IDF model that can transform term frequency vectors.
New in version 1.2.0.
Methods
call(name, *a)    Call method of java_model.
docFreq()         Returns the document frequency.
idf()             Returns the current IDF vector.
numDocs()         Returns the number of documents evaluated to compute the IDF.
transform(x)      Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation
- call(name, *a)
Call method of java_model
- transform(x)
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, the terms which occur in fewer than minDocFreq documents will have an entry of 0.
New in version 1.2.0.
- Parameters
  - x : pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of term frequency vectors or a term frequency vector
- Returns
  pyspark.mllib.linalg.Vector or pyspark.RDD
  an RDD of TF-IDF vectors or a TF-IDF vector
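To make the effect of minDocFreq concrete, the following is a minimal pure-Python sketch of the weighting, not Spark's implementation. It assumes the smoothed formula given in Spark's MLlib feature extraction guide, idf = log((m + 1) / (d(t) + 1)), where m is the number of documents and d(t) is the number of documents containing term t. The helper names `idf_weights` and `to_tfidf` are hypothetical and exist only for illustration.

```python
import math


def idf_weights(tf_vectors, min_doc_freq=0):
    """Illustrative IDF weights (assumed formula: log((m + 1) / (d(t) + 1)))."""
    num_docs = len(tf_vectors)
    num_terms = len(tf_vectors[0])
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = [sum(1 for v in tf_vectors if v[j] > 0) for j in range(num_terms)]
    # Terms seen in fewer than min_doc_freq documents get a weight of 0.
    return [
        math.log((num_docs + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


def to_tfidf(tf_vector, idf):
    """Scale each term frequency by the IDF weight of that term."""
    return [tf * w for tf, w in zip(tf_vector, idf)]


# Three documents over a vocabulary of four terms (dense term frequencies).
tf_vectors = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
]

idf = idf_weights(tf_vectors, min_doc_freq=2)
tfidf = [to_tfidf(v, idf) for v in tf_vectors]
```

In this sketch, term 2 occurs in only one document, so with `min_doc_freq=2` its entry is 0 in every output vector even though its term frequency in the second document is 2.0.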
Notes
In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.