
RandomRDDs

class pyspark.mllib.random.RandomRDDs

Generator methods for creating RDDs comprised of i.i.d. samples from some distribution.

New in version 1.1.0.
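
As a quick orientation (a minimal sketch rather than part of the reference itself; it assumes a live SparkContext bound to sc, exactly as the doctests below do), every generator is a static method that returns a plain RDD:

>>> from pyspark.mllib.random import RandomRDDs
>>> u = RandomRDDs.uniformRDD(sc, 100, seed=1)         # RDD of 100 floats drawn from U(0.0, 1.0)
>>> m = RandomRDDs.normalVectorRDD(sc, 10, 5, seed=1)  # RDD of 10 Vectors, each with 5 N(0.0, 1.0) samples
>>> u.count()
100
>>> len(m.first())
5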

Methods

exponentialRDD(sc, mean, size[, ...])

Generates an RDD comprised of i.i.d. samples from the Exponential distribution with the input mean.

exponentialVectorRDD(sc, mean, numRows, numCols)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Exponential distribution with the input mean.

gammaRDD(sc, shape, scale, size[, ...])

Generates an RDD comprised of i.i.d. samples from the Gamma distribution with the input shape and scale.

gammaVectorRDD(sc, shape, scale, numRows, ...)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Gamma distribution.

logNormalRDD(sc, mean, std, size[, ...])

Generates an RDD comprised of i.i.d. samples from the log normal distribution with the input mean and standard deviation.

logNormalVectorRDD(sc, mean, std, numRows, ...)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the log normal distribution.

normalRDD(sc, size[, numPartitions, seed])

Generates an RDD comprised of i.i.d. samples from the standard normal distribution.

normalVectorRDD(sc, numRows, numCols[, ...])

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the standard normal distribution.

poissonRDD(sc, mean, size[, numPartitions, seed])

Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.

poissonVectorRDD(sc, mean, numRows, numCols)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Poisson distribution with the input mean.

uniformRDD(sc, size[, numPartitions, seed])

Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).

uniformVectorRDD(sc, numRows, numCols[, ...])

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the uniform distribution U(0.0, 1.0).

Methods Documentation

static exponentialRDD(sc, mean, size, numPartitions=None, seed=None)

Generates an RDD comprised of i.i.d. samples from the Exponential distribution with the input mean.

New in version 1.3.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

mean : float

Mean, or 1 / lambda, for the Exponential distribution.

size : int

Size of the RDD.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of float comprised of i.i.d. samples ~ Exp(mean).

Examples

>>> mean = 2.0
>>> x = RandomRDDs.exponentialRDD(sc, mean, 1000, seed=2)
>>> stats = x.stats()
>>> stats.count()
1000
>>> abs(stats.mean() - mean) < 0.5
True
>>> from math import sqrt
>>> bool(abs(stats.stdev() - sqrt(mean)) < 0.5)
True
static exponentialVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Exponential distribution with the input mean.

New in version 1.3.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

mean : float

Mean, or 1 / lambda, for the Exponential distribution.

numRows : int

Number of Vectors in the RDD.

numCols : int

Number of elements in each Vector.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of Vector with vectors containing i.i.d. samples ~ Exp(mean).

Examples

>>> import numpy as np
>>> mean = 0.5
>>> rdd = RandomRDDs.exponentialVectorRDD(sc, mean, 100, 100, seed=1)
>>> mat = np.asmatrix(rdd.collect())
>>> mat.shape
(100, 100)
>>> bool(abs(mat.mean() - mean) < 0.5)
True
>>> from math import sqrt
>>> bool(abs(mat.std() - sqrt(mean)) < 0.5)
True
static gammaRDD(sc, shape, scale, size, numPartitions=None, seed=None)

Generates an RDD comprised of i.i.d. samples from the Gamma distribution with the input shape and scale.

New in version 1.3.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

shape : float

shape (> 0) parameter for the Gamma distribution

scale : float

scale (> 0) parameter for the Gamma distribution

size : int

Size of the RDD.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of float comprised of i.i.d. samples ~ Gamma(shape, scale).

Examples

>>> from math import sqrt
>>> shape = 1.0
>>> scale = 2.0
>>> expMean = shape * scale
>>> expStd = sqrt(shape * scale * scale)
>>> x = RandomRDDs.gammaRDD(sc, shape, scale, 1000, seed=2)
>>> stats = x.stats()
>>> stats.count()
1000
>>> bool(abs(stats.mean() - expMean) < 0.5)
True
>>> bool(abs(stats.stdev() - expStd) < 0.5)
True
static gammaVectorRDD(sc, shape, scale, numRows, numCols, numPartitions=None, seed=None)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Gamma distribution.

New in version 1.3.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

shape : float

Shape (> 0) of the Gamma distribution

scale : float

Scale (> 0) of the Gamma distribution

numRows : int

Number of Vectors in the RDD.

numCols : int

Number of elements in each Vector.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of Vector with vectors containing i.i.d. samples ~ Gamma(shape, scale).

Examples

>>> import numpy as np
>>> from math import sqrt
>>> shape = 1.0
>>> scale = 2.0
>>> expMean = shape * scale
>>> expStd = sqrt(shape * scale * scale)
>>> mat = np.matrix(RandomRDDs.gammaVectorRDD(sc, shape, scale, 100, 100, seed=1).collect())
>>> mat.shape
(100, 100)
>>> bool(abs(mat.mean() - expMean) < 0.1)
True
>>> bool(abs(mat.std() - expStd) < 0.1)
True
static logNormalRDD(sc, mean, std, size, numPartitions=None, seed=None)

Generates an RDD comprised of i.i.d. samples from the log normal distribution with the input mean and standard deviation.

New in version 1.3.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

mean : float

mean for the log Normal distribution

std : float

std for the log Normal distribution

size : int

Size of the RDD.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of float comprised of i.i.d. samples ~ log N(mean, std).

Examples

>>> from math import sqrt, exp
>>> mean = 0.0
>>> std = 1.0
>>> expMean = exp(mean + 0.5 * std * std)
>>> expStd = sqrt((exp(std * std) - 1.0) * exp(2.0 * mean + std * std))
>>> x = RandomRDDs.logNormalRDD(sc, mean, std, 1000, seed=2)
>>> stats = x.stats()
>>> stats.count()
1000
>>> bool(abs(stats.mean() - expMean) < 0.5)
True
>>> from math import sqrt
>>> bool(abs(stats.stdev() - expStd) < 0.5)
True
static logNormalVectorRDD(sc, mean, std, numRows, numCols, numPartitions=None, seed=None)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the log normal distribution.

New in version 1.3.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

mean : float

Mean of the log normal distribution

std : float

Standard Deviation of the log normal distribution

numRows : int

Number of Vectors in the RDD.

numCols : int

Number of elements in each Vector.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of Vector with vectors containing i.i.d. samples ~ log N(mean, std).

Examples

>>> import numpy as np
>>> from math import sqrt, exp
>>> mean = 0.0
>>> std = 1.0
>>> expMean = exp(mean + 0.5 * std * std)
>>> expStd = sqrt((exp(std * std) - 1.0) * exp(2.0 * mean + std * std))
>>> m = RandomRDDs.logNormalVectorRDD(sc, mean, std, 100, 100, seed=1).collect()
>>> mat = np.matrix(m)
>>> mat.shape
(100, 100)
>>> bool(abs(mat.mean() - expMean) < 0.1)
True
>>> bool(abs(mat.std() - expStd) < 0.1)
True
static normalRDD(sc, size, numPartitions=None, seed=None)

Generates an RDD comprised of i.i.d. samples from the standard normal distribution.

To transform the distribution in the generated RDD from standard normal to some other normal N(mean, sigma^2), use RandomRDDs.normalRDD(sc, n, p, seed).map(lambda v: mean + sigma * v)
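
For example (a minimal sketch in the same doctest style, assuming a live SparkContext bound to sc; mean and sigma are illustrative values chosen for the example, not parameters of this method):

>>> mean, sigma = 5.0, 2.0
>>> # Shift and scale the standard normal samples to N(mean, sigma^2)
>>> shifted = RandomRDDs.normalRDD(sc, 1000, seed=1).map(lambda v: mean + sigma * v)
>>> stats = shifted.stats()
>>> bool(abs(stats.mean() - mean) < 0.5)
True
>>> bool(abs(stats.stdev() - sigma) < 0.5)
True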

New in version 1.1.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

size : int

Size of the RDD.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of float comprised of i.i.d. samples ~ N(0.0, 1.0).

Examples

>>> x = RandomRDDs.normalRDD(sc, 1000, seed=1)
>>> stats = x.stats()
>>> stats.count()
1000
>>> bool(abs(stats.mean() - 0.0) < 0.1)
True
>>> bool(abs(stats.stdev() - 1.0) < 0.1)
True
static normalVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the standard normal distribution.

New in version 1.1.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

numRows : int

Number of Vectors in the RDD.

numCols : int

Number of elements in each Vector.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of Vector with vectors containing i.i.d. samples ~ N(0.0, 1.0).

Examples

>>> import numpy as np
>>> mat = np.matrix(RandomRDDs.normalVectorRDD(sc, 100, 100, seed=1).collect())
>>> mat.shape
(100, 100)
>>> bool(abs(mat.mean() - 0.0) < 0.1)
True
>>> bool(abs(mat.std() - 1.0) < 0.1)
True
static poissonRDD(sc, mean, size, numPartitions=None, seed=None)

Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.

New in version 1.1.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

mean : float

Mean, or lambda, for the Poisson distribution.

size : int

Size of the RDD.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of float comprised of i.i.d. samples ~ Pois(mean).

Examples

>>> mean = 100.0
>>> x = RandomRDDs.poissonRDD(sc, mean, 1000, seed=2)
>>> stats = x.stats()
>>> stats.count()
1000
>>> abs(stats.mean() - mean) < 0.5
True
>>> from math import sqrt
>>> bool(abs(stats.stdev() - sqrt(mean)) < 0.5)
True
static poissonVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the Poisson distribution with the input mean.

New in version 1.1.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

mean : float

Mean, or lambda, for the Poisson distribution.

numRows : int

Number of Vectors in the RDD.

numCols : int

Number of elements in each Vector.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of Vector with vectors containing i.i.d. samples ~ Pois(mean).

Examples

>>> import numpy as np
>>> mean = 100.0
>>> rdd = RandomRDDs.poissonVectorRDD(sc, mean, 100, 100, seed=1)
>>> mat = np.asmatrix(rdd.collect())
>>> mat.shape
(100, 100)
>>> bool(abs(mat.mean() - mean) < 0.5)
True
>>> from math import sqrt
>>> bool(abs(mat.std() - sqrt(mean)) < 0.5)
True
static uniformRDD(sc, size, numPartitions=None, seed=None)

Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).

To transform the distribution in the generated RDD from U(0.0, 1.0) to U(a, b), use RandomRDDs.uniformRDD(sc, n, p, seed).map(lambda v: a + (b - a) * v)
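
For example (a minimal sketch in the same doctest style, assuming a live SparkContext bound to sc; a and b are illustrative bounds chosen for the example, not parameters of this method):

>>> a, b = -1.0, 1.0
>>> # Rescale the U(0.0, 1.0) samples to U(a, b)
>>> scaled = RandomRDDs.uniformRDD(sc, 100, seed=1).map(lambda v: a + (b - a) * v)
>>> vals = scaled.collect()
>>> bool(min(vals) >= a and max(vals) <= b)
True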

New in version 1.1.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

size : int

Size of the RDD.

numPartitions : int, optional

Number of partitions in the RDD (default: sc.defaultParallelism).

seed : int, optional

Random seed (default: a random long integer).

Returns
pyspark.RDD

RDD of float comprised of i.i.d. samples ~ U(0.0, 1.0).

Examples

>>> x = RandomRDDs.uniformRDD(sc, 100).collect()
>>> len(x)
100
>>> max(x) <= 1.0 and min(x) >= 0.0
True
>>> RandomRDDs.uniformRDD(sc, 100, 4).getNumPartitions()
4
>>> parts = RandomRDDs.uniformRDD(sc, 100, seed=4).getNumPartitions()
>>> parts == sc.defaultParallelism
True
static uniformVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)

Generates an RDD comprised of vectors containing i.i.d. samples drawn from the uniform distribution U(0.0, 1.0).

New in version 1.1.0.

Parameters
sc : pyspark.SparkContext

SparkContext used to create the RDD.

numRows : int

Number of Vectors in the RDD.

numCols : int

Number of elements in each Vector.

numPartitions : int, optional

Number of partitions in the RDD.

seed : int, optional

Seed for the RNG that generates the seed for the generator in each partition.

Returns
pyspark.RDD

RDD of Vector with vectors containing i.i.d. samples ~ U(0.0, 1.0).

Examples

>>> import numpy as np
>>> mat = np.matrix(RandomRDDs.uniformVectorRDD(sc, 10, 10).collect())
>>> mat.shape
(10, 10)
>>> bool(mat.max() <= 1.0 and mat.min() >= 0.0)
True
>>> RandomRDDs.uniformVectorRDD(sc, 10, 10, 4).getNumPartitions()
4
