make_regression

sklearn.datasets.make_regression(n_samples=100, n_features=100, *, n_informative=10, n_targets=1, bias=0.0, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=False, random_state=None)

Generate a random regression problem.

The input set can either be well conditioned (by default) or have a low-rank, fat-tail singular profile. See make_low_rank_matrix for more details.
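The difference between the two input regimes shows up in the singular values of X. The sketch below compares the share of squared singular values carried by the five largest ones, first for the default well-conditioned input and then with effective_rank=5. The helper top_k_energy is defined here for illustration only and is not part of scikit-learn; the sizes and seed are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression


def top_k_energy(X, k):
    """Hypothetical helper: fraction of squared singular values in the top k."""
    s = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)


# Default input: well conditioned, centered Gaussian features.
X_dense, _ = make_regression(n_samples=200, n_features=50, random_state=0)

# Low-rank input with a fat tail (tail_strength left at its default of 0.5).
X_low, _ = make_regression(
    n_samples=200, n_features=50, effective_rank=5, random_state=0
)

k = 5
energy_dense = top_k_energy(X_dense, k)
energy_low = top_k_energy(X_low, k)
print(f"top-{k} energy, well conditioned: {energy_dense:.2f}")
print(f"top-{k} energy, effective_rank=5: {energy_low:.2f}")
```

With a low effective rank, a few directions dominate the spectrum, so the second share should come out noticeably larger than the first.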

The output is generated by applying a (potentially biased) random linear regression model with n_informative nonzero regressors to the previously generated input, then adding centered Gaussian noise with an adjustable scale.
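One way to see this generative model at work is to switch the noise off and rebuild y from the returned coefficients. This minimal sketch relies only on the signature documented on this page. The sizes, seed and bias value are arbitrary illustration choices.

```python
import numpy as np
from sklearn.datasets import make_regression

# Noise-free problem with a nonzero bias, so the linear model can be checked exactly.
X, y, coef = make_regression(
    n_samples=50,
    n_features=20,
    n_informative=4,
    bias=3.0,
    noise=0.0,
    coef=True,
    random_state=0,
)

# With noise=0.0 the targets equal X @ coef + bias, up to floating-point rounding.
print(np.allclose(y, X @ coef + 3.0))  # True

# Only n_informative of the n_features coefficients are nonzero.
print(np.count_nonzero(coef))  # 4
```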

Read more in the User Guide.

Parameters:
n_samples : int, default=100

The number of samples.

n_features : int, default=100

The number of features.

n_informative : int, default=10

The number of informative features, i.e., the number of features used to build the linear model used to generate the output.

n_targets : int, default=1

The number of regression targets, i.e., the dimension of the y output vector associated with a sample. By default, the output is a scalar.

bias : float, default=0.0

The bias term in the underlying linear model.

effective_rank : int, default=None
If not None:

The approximate number of singular vectors required to explain most of the input data by linear combinations. Using this kind of singular spectrum in the input allows the generator to reproduce the correlations often observed in practice.

If None:

The input set is well conditioned, centered and Gaussian with unit variance.

tail_strength : float, default=0.5

The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None. It should be between 0 and 1.

noise : float, default=0.0

The standard deviation of the Gaussian noise applied to the output.

shuffle : bool, default=True

Shuffle the samples and the features.

coef : bool, default=False

If True, the coefficients of the underlying linear model are returned.

random_state : int, RandomState instance or None, default=None

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.

Returns:
X : ndarray of shape (n_samples, n_features)

The input samples.

y : ndarray of shape (n_samples,) or (n_samples, n_targets)

The output values.

coef : ndarray of shape (n_features,) or (n_features, n_targets)

The coefficients of the underlying linear model. Returned only if coef is True.
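The returned shapes depend on n_targets: with a single target, y and coef are one-dimensional, and with several targets they gain a second axis of length n_targets. A short check, with arbitrary sizes (n_informative is left at its default and is capped at n_features here):

```python
from sklearn.datasets import make_regression

# Single target (default): y and coef are 1-D.
X, y, coef = make_regression(n_samples=40, n_features=6, coef=True, random_state=0)
print(X.shape, y.shape, coef.shape)  # (40, 6) (40,) (6,)

# Several targets: y and coef gain a second axis of length n_targets.
X, Y, W = make_regression(
    n_samples=40, n_features=6, n_targets=3, coef=True, random_state=0
)
print(X.shape, Y.shape, W.shape)  # (40, 6) (40, 3) (6, 3)
```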

Examples

>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=5, n_features=2, noise=1, random_state=42)
>>> X
array([[ 0.4967, -0.1382 ],
       [ 0.6476,  1.523],
       [-0.2341, -0.2341],
       [-0.4694,  0.5425],
       [ 1.579,  0.7674]])
>>> y
array([  6.737,  37.79, -10.27,   0.4017,   42.22])
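Because noise is the standard deviation of the Gaussian noise added to the output, subtracting the noise-free linear part from y should leave residuals whose spread is close to noise. This sketch uses a large sample so the estimate is stable; the sample size, bias and seed are arbitrary, and the printed value is only expected to be near 2.0, not exactly equal to it.

```python
import numpy as np
from sklearn.datasets import make_regression

noise = 2.0
bias = 1.5
X, y, coef = make_regression(
    n_samples=10_000, n_features=5, bias=bias, noise=noise,
    coef=True, random_state=0,
)

# Whatever the linear model does not explain is the added Gaussian noise.
residuals = y - (X @ coef + bias)
print(f"residual mean: {residuals.mean():.3f}")
print(f"residual std:  {residuals.std():.3f}")  # should be close to `noise`
```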

Gallery examples

Prediction Latency

Effect of transforming the targets in regression model

Comparing Linear Bayesian Regressors

Fitting an Elastic Net with a precomputed Gram Matrix and Weighted Samples

HuberRegressor vs Ridge on dataset with strong outliers

Lasso on dense and sparse data

Robust linear model estimation using RANSAC

Ridge coefficients as a function of the L2 Regularization

Effect of model regularization on training and test error

Release Highlights for scikit-learn 0.23

Release Highlights for scikit-learn 1.4