LearningCurveDisplay

class sklearn.model_selection.LearningCurveDisplay(*, train_sizes, train_scores, test_scores, score_name=None)

Learning Curve visualization.

It is recommended to use from_estimator to create a LearningCurveDisplay instance. All parameters are stored as attributes.

Read more in the User Guide for general information about the visualization API and detailed documentation regarding the learning curve visualization.

Added in version 1.2.

Parameters:

train_sizes : ndarray of shape (n_unique_ticks,): Numbers of training examples that have been used to generate the learning curve.
train_scores : ndarray of shape (n_ticks, n_cv_folds): Scores on training sets.
test_scores : ndarray of shape (n_ticks, n_cv_folds): Scores on test sets.
score_name : str, default=None: The name of the score used in learning_curve. It will override the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name. We replace _ by spaces and capitalize the first letter. We remove neg_ and replace it by "Negative" if negate_score is False or just remove it otherwise.

Attributes:

ax_ : matplotlib Axes: Axes with the learning curve.
figure_ : matplotlib Figure: Figure containing the learning curve.
errorbar_ : list of matplotlib Artist or None: When the std_display_style is "errorbar", this is a list of matplotlib.container.ErrorbarContainer objects. If another style is used, errorbar_ is None.
lines_ : list of matplotlib Artist or None: When the std_display_style is "fill_between", this is a list of matplotlib.lines.Line2D objects corresponding to the mean train and test scores. If another style is used, lines_ is None.
fill_between_ : list of matplotlib Artist or None: When the std_display_style is "fill_between", this is a list of matplotlib.collections.PolyCollection objects. If another style is used, fill_between_ is None.

See also

sklearn.model_selection.learning_curve: Compute the learning curve.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import LearningCurveDisplay, learning_curve
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> tree = DecisionTreeClassifier(random_state=0)
>>> train_sizes, train_scores, test_scores = learning_curve(
...     tree, X, y)
>>> display = LearningCurveDisplay(train_sizes=train_sizes,
...     train_scores=train_scores, test_scores=test_scores, score_name="Score")
>>> display.plot()
<...>
>>> plt.show()


classmethod from_estimator(estimator, X, y, *, groups=None, train_sizes=array([0.1, 0.33, 0.55, 0.78, 1.]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=None, pre_dispatch='all', verbose=0, shuffle=False, random_state=None, error_score=nan, fit_params=None, ax=None, negate_score=False, score_name=None, score_type='both', std_display_style='fill_between', line_kw=None, fill_between_kw=None, errorbar_kw=None)

Create a learning curve display from an estimator.

Read more in the User Guide for general information about the visualization API and detailed documentation regarding the learning curve visualization.

Parameters:

estimator : object type that implements the “fit” and “predict” methods

An object of that type which is cloned for each validation.

X : array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,) or (n_samples, n_outputs) or None

Target relative to X for classification or regression; None for unsupervised learning.

groups : array-like of shape (n_samples,), default=None

Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).

train_sizes : array-like of shape (n_ticks,), default=np.linspace(0.1, 1.0, 5)

Relative or absolute numbers of training examples that will be used to generate the learning curve. If the dtype is float, it is regarded as a fraction of the maximum size of the training set (that is determined by the selected validation method), i.e. it has to be within (0, 1]. Otherwise it is interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually has to be big enough to contain at least one sample from each class.

cv : int, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

None, to use the default 5-fold cross validation,
int, to specify the number of folds in a (Stratified)KFold,
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.

Refer to the User Guide for the various cross-validation strategies that can be used here.

scoring : str or callable, default=None

The scoring method to use when calculating the learning curve. Options:

str: see String name scorers for options.
callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See Callable scorers for details.
None: the estimator’s default evaluation criterion is used.

exploit_incremental_learning : bool, default=False

If the estimator supports incremental learning, this will be used to speed up fitting for different training set sizes.

n_jobs : int, default=None

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the different training and test sets. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

pre_dispatch : int or str, default='all'

Number of predispatched jobs for parallel execution (default is all). The option can reduce the allocated memory. The str can be an expression like '2*n_jobs'.

verbose : int, default=0

Controls the verbosity: the higher, the more messages.

shuffle : bool, default=False

Whether to shuffle training data before taking prefixes of it based on train_sizes.

random_state : int, RandomState instance or None, default=None

Used when shuffle is True. Pass an int for reproducible output across multiple function calls. See Glossary.

error_score : 'raise' or numeric, default=np.nan

Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, FitFailedWarning is raised.

fit_params : dict, default=None

Parameters to pass to the fit method of the estimator.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes are created.

negate_score : bool, default=False

Whether or not to negate the scores obtained through learning_curve. This is particularly useful when using the error denoted by neg_* in scikit-learn.

score_name : str, default=None

The name of the score used to decorate the y-axis of the plot. It will override the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name. We replace _ by spaces and capitalize the first letter. We remove neg_ and replace it by "Negative" if negate_score is False or just remove it otherwise.

score_type : {"test", "train", "both"}, default="both"

The type of score to plot. Can be one of "test", "train", or "both".

std_display_style : {"errorbar", "fill_between"} or None, default="fill_between"

The style used to display the score standard deviation around the mean score. If None, no representation of the standard deviation is displayed.

line_kw : dict, default=None

Additional keyword arguments passed to the plt.plot used to draw the mean score.

fill_between_kw : dict, default=None

Additional keyword arguments passed to the plt.fill_between used to draw the score standard deviation.

errorbar_kw : dict, default=None

Additional keyword arguments passed to the plt.errorbar used to draw mean score and standard deviation score.

Returns:

display : LearningCurveDisplay: Object that stores computed values.

Examples

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import LearningCurveDisplay
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> tree = DecisionTreeClassifier(random_state=0)
>>> LearningCurveDisplay.from_estimator(tree, X, y)
<...>
>>> plt.show()


plot(ax=None, *, negate_score=False, score_name=None, score_type='both', std_display_style='fill_between', line_kw=None, fill_between_kw=None, errorbar_kw=None)

Plot visualization.

Parameters:

ax : matplotlib Axes, default=None: Axes object to plot on. If None, a new figure and axes are created.
negate_score : bool, default=False: Whether or not to negate the scores obtained through learning_curve. This is particularly useful when using the error denoted by neg_* in scikit-learn.
score_name : str, default=None: The name of the score used to decorate the y-axis of the plot. It will override the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name. We replace _ by spaces and capitalize the first letter. We remove neg_ and replace it by "Negative" if negate_score is False or just remove it otherwise.
score_type : {"test", "train", "both"}, default="both": The type of score to plot. Can be one of "test", "train", or "both".
std_display_style : {"errorbar", "fill_between"} or None, default="fill_between": The style used to display the score standard deviation around the mean score. If None, no standard deviation representation is displayed.
line_kw : dict, default=None: Additional keyword arguments passed to the plt.plot used to draw the mean score.
fill_between_kw : dict, default=None: Additional keyword arguments passed to the plt.fill_between used to draw the score standard deviation.
errorbar_kw : dict, default=None: Additional keyword arguments passed to the plt.errorbar used to draw mean score and standard deviation score.

Returns: