Normalizer
- class pyspark.mllib.feature.Normalizer(p=2.0)
Normalizes samples individually to unit L^p norm.
For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p)^(1/p) as the norm.
For p = float('inf'), max(abs(vector)) is used as the norm for normalization.
New in version 1.2.0.
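The norm definitions above can be checked without a Spark cluster. The following is a minimal pure-Python sketch, assuming a vector is a plain list of floats. The function name `lp_normalize` is hypothetical, raising `ValueError` for p < 1 is an assumption of this sketch, and this is not how PySpark implements `Normalizer` (the real class delegates the work to the JVM). It applies the L^p formula for finite p, the max-absolute-value rule for p = float('inf'), and returns zero-norm input unchanged, as described under Returns for `transform`.

```python
import math
from typing import List, Sequence


def lp_normalize(vector: Sequence[float], p: float = 2.0) -> List[float]:
    """Scale `vector` to unit L^p norm (hypothetical sketch, not Spark's code)."""
    if p < 1:
        # Assumption of this sketch: reject p outside the documented range.
        raise ValueError("p must be >= 1")
    if math.isinf(p):
        # L^inf norm: the largest absolute component.
        norm = max((abs(x) for x in vector), default=0.0)
    else:
        # L^p norm: (sum of |x|^p) raised to 1/p.
        norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    if norm == 0.0:
        # Mirror the documented behavior: zero-norm input is returned as-is.
        return list(vector)
    return [x / norm for x in vector]


v = [0.0, 1.0, 2.0]
print(lp_normalize(v, 1.0))           # L1 norm is 3, so roughly [0.0, 0.3333, 0.6667]
print(lp_normalize(v, float("inf")))  # max norm is 2: [0.0, 0.5, 1.0]
```

Dividing [0.0, 1.0, 2.0] by its L1 norm of 3 gives the values that the Examples section shows rounded to four decimals (0.3333, 0.6667).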
- Parameters
- p : float, optional
Normalization in L^p space; p = 2 by default.
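With the default p = 2, the norm is the ordinary Euclidean length. The following is a short plain-Python illustration of that arithmetic (not a PySpark call), using the example vector [3.0, 4.0], which is chosen here only for illustration:

```python
import math

# Default p = 2: the norm is the ordinary Euclidean length.
v = [3.0, 4.0]
norm = math.sqrt(sum(x * x for x in v))  # sqrt(9 + 16) = 5.0
unit = [x / norm for x in v]             # divide each component by the norm
print(norm, unit)  # 5.0 [0.6, 0.8]
```

By the definition above, Normalizer() with its default p should therefore map [3.0, 4.0] to [0.6, 0.8].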
Examples
>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])

>>> rdd = sc.parallelize([v])
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]

>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])
Methods
transform(vector): Applies unit length normalization to a vector.
Methods Documentation
- transform(vector)
Applies unit length normalization to a vector.
New in version 1.2.0.
- Parameters
- vector : pyspark.mllib.linalg.Vector or pyspark.RDD
Vector or RDD of vectors to be normalized.
- Returns
pyspark.mllib.linalg.Vector or pyspark.RDD
Normalized vector(s). If the norm of the input is zero, the input vector is returned.
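`transform` accepts either a single vector or an RDD of vectors and returns the same kind of object. The sketch below mimics only that input/output contract in plain Python, assuming a list of lists stands in for an RDD and using the default p = 2. The names `transform_sketch` and `_normalize_one` are hypothetical, and real RDD inputs are processed by Spark, not by a Python loop like this one.

```python
from typing import List, Sequence, Union

Vector = List[float]


def _normalize_one(vector: Sequence[float]) -> Vector:
    # L2 norm, matching the default p = 2.
    norm = sum(x * x for x in vector) ** 0.5
    # Documented contract: a zero-norm input is returned unchanged.
    return list(vector) if norm == 0.0 else [x / norm for x in vector]


def transform_sketch(data: Union[Sequence[float], List[Sequence[float]]]):
    """Hypothetical stand-in for transform(): one vector in gives one vector
    out; a list of vectors (playing the role of an RDD) in gives a list out."""
    if data and isinstance(data[0], (list, tuple)):
        return [_normalize_one(v) for v in data]
    return _normalize_one(data)


print(transform_sketch([0.0, 0.0]))                # [0.0, 0.0]
print(transform_sketch([[3.0, 4.0], [0.0, 0.0]]))  # [[0.6, 0.8], [0.0, 0.0]]
```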