4. Metadata Routing

Note

The Metadata Routing API is experimental, and is not yet implemented for all estimators. Please refer to the list of supported and unsupported models for more information. It may change without the usual deprecation cycle. By default this feature is not enabled. You can enable it by setting the enable_metadata_routing flag to True:

>>> import sklearn
>>> sklearn.set_config(enable_metadata_routing=True)

Note that the methods and requirements introduced in this document are only relevant if you want to pass metadata (e.g. sample_weight) to a method. If you’re only passing X and y and no other parameter / metadata to methods such as fit, transform, etc., then you don’t need to set anything.

This guide demonstrates how metadata can be routed and passed between objects in scikit-learn. If you are developing a scikit-learn compatible estimator or meta-estimator, you can check our related developer guide: Metadata Routing.

Metadata is data that an estimator, scorer, or CV splitter takes into account if the user explicitly passes it as a parameter. For instance, KMeans accepts sample_weight in its fit() method and considers it to calculate its centroids. classes are consumed by some classifiers and groups are used in some splitters, but any data that is passed into an object’s methods apart from X and y can be considered as metadata. Prior to scikit-learn version 1.3, there was no single API for passing metadata like that if these objects were used in conjunction with other objects, e.g. a scorer accepting sample_weight inside a GridSearchCV.

With the Metadata Routing API, we can transfer metadata to estimators, scorers, and CV splitters using meta-estimators (such as Pipeline or GridSearchCV) or functions such as cross_validate which route data to other objects. In order to pass metadata to a method like fit or score, the object consuming the metadata must request it. This is done via set_{method}_request() methods, where {method} is substituted by the name of the method that requests the metadata. For instance, estimators that use the metadata in their fit() method would use set_fit_request(), and scorers would use set_score_request(). These methods allow us to specify which metadata to request, for instance set_fit_request(sample_weight=True).

For grouped splitters such as GroupKFold, a groups parameter is requested by default. This is best demonstrated by the following examples.

4.1. Usage Examples

Here we present a few examples to show some common use-cases. Our goal is to pass sample_weight and groups through cross_validate, which routes the metadata to LogisticRegressionCV and to a custom scorer made with make_scorer, both of which can use the metadata in their methods. In these examples we want to individually set whether to use the metadata within the different consumers.

The examples in this section require the following imports and data:

>>> import numpy as np
>>> from sklearn.metrics import make_scorer, accuracy_score
>>> from sklearn.linear_model import LogisticRegressionCV, LogisticRegression
>>> from sklearn.model_selection import cross_validate, GridSearchCV, GroupKFold
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.pipeline import make_pipeline
>>> n_samples, n_features = 100, 4
>>> rng = np.random.RandomState(42)
>>> X = rng.rand(n_samples, n_features)
>>> y = rng.randint(0, 2, size=n_samples)
>>> my_groups = rng.randint(0, 10, size=n_samples)
>>> my_weights = rng.rand(n_samples)
>>> my_other_weights = rng.rand(n_samples)

4.1.1. Weighted scoring and fitting

The splitter used internally in LogisticRegressionCV, GroupKFold, requests groups by default. However, we need to explicitly request sample_weight for it and for our custom scorer by specifying sample_weight=True in LogisticRegressionCV’s set_fit_request() method and in make_scorer’s set_score_request() method. Both consumers know how to use sample_weight in their fit() or score() methods. We can then pass the metadata in cross_validate which will route it to any active consumers:

>>> weighted_acc = make_scorer(accuracy_score).set_score_request(sample_weight=True)
>>> lr = LogisticRegressionCV(
...     cv=GroupKFold(),
...     scoring=weighted_acc
... ).set_fit_request(sample_weight=True)
>>> cv_results = cross_validate(
...     lr,
...     X,
...     y,
...     params={"sample_weight": my_weights, "groups": my_groups},
...     cv=GroupKFold(),
...     scoring=weighted_acc,
... )

Note that in this example, cross_validate routes my_weights to both the scorer and LogisticRegressionCV.

If we passed sample_weight in the params of cross_validate, but did not set any object to request it, UnsetMetadataPassedError would be raised, hinting to us that we need to explicitly set where to route it. The same applies if params={"sample_weights": my_weights, ...} were passed (note the typo, i.e. weights instead of weight), since sample_weights was not requested by any of its underlying objects.

4.1.2. Weighted scoring and unweighted fitting

When passing metadata such as sample_weight into a router (a meta-estimator or routing function), all sample_weight consumers require weights to be either explicitly requested or explicitly not requested (i.e. True or False). Thus, to perform an unweighted fit, we need to configure LogisticRegressionCV to not request sample weights, so that cross_validate does not pass the weights along:

>>> weighted_acc = make_scorer(accuracy_score).set_score_request(sample_weight=True)
>>> lr = LogisticRegressionCV(
...     cv=GroupKFold(), scoring=weighted_acc,
... ).set_fit_request(sample_weight=False)
>>> cv_results = cross_validate(
...     lr,
...     X,
...     y,
...     cv=GroupKFold(),
...     params={"sample_weight": my_weights, "groups": my_groups},
...     scoring=weighted_acc,
... )

If linear_model.LogisticRegressionCV.set_fit_request had not been called, cross_validate would raise an error because sample_weight is passed but LogisticRegressionCV would not be explicitly configured to recognize the weights.

4.1.3. Unweighted feature selection

Routing metadata is only possible if the object’s method knows how to use the metadata, which in most cases means it has the metadata as an explicit parameter. Only then can we set request values for metadata using set_fit_request(sample_weight=True), for instance. This makes the object a consumer.

Unlike LogisticRegressionCV, SelectKBest can’t consume weights and therefore no request value for sample_weight on its instance is set and sample_weight is not routed to it:

>>> weighted_acc = make_scorer(accuracy_score).set_score_request(sample_weight=True)
>>> lr = LogisticRegressionCV(
...     cv=GroupKFold(), scoring=weighted_acc,
... ).set_fit_request(sample_weight=True)
>>> sel = SelectKBest(k=2)
>>> pipe = make_pipeline(sel, lr)
>>> cv_results = cross_validate(
...     pipe,
...     X,
...     y,
...     cv=GroupKFold(),
...     params={"sample_weight": my_weights, "groups": my_groups},
...     scoring=weighted_acc,
... )

4.1.4. Different scoring and fitting weights

Despite make_scorer and LogisticRegressionCV both expecting the key sample_weight, we can use aliases to pass different weights to different consumers. In this example, we pass scoring_weight to the scorer, and fitting_weight to LogisticRegressionCV:

>>> weighted_acc = make_scorer(accuracy_score).set_score_request(
...     sample_weight="scoring_weight"
... )
>>> lr = LogisticRegressionCV(
...     cv=GroupKFold(), scoring=weighted_acc,
... ).set_fit_request(sample_weight="fitting_weight")
>>> cv_results = cross_validate(
...     lr,
...     X,
...     y,
...     cv=GroupKFold(),
...     params={
...         "scoring_weight": my_weights,
...         "fitting_weight": my_other_weights,
...         "groups": my_groups,
...     },
...     scoring=weighted_acc,
... )

4.2. API Interface

A consumer is an object (estimator, meta-estimator, scorer, splitter) which accepts and uses some metadata in at least one of its methods (for instance fit, predict, inverse_transform, transform, score, split). Meta-estimators which only forward the metadata to other objects (child estimators, scorers, or splitters) and don’t use the metadata themselves are not consumers.

(Meta-)Estimators which route metadata to other objects are routers. A(n) (meta-)estimator can be a consumer and a router at the same time.

(Meta-)Estimators and splitters expose a set_{method}_request method for each method which accepts at least one metadata. For instance, if an estimator supports sample_weight in fit and score, it exposes estimator.set_fit_request(sample_weight=value) and estimator.set_score_request(sample_weight=value). Here value can be:

  • True: method requests a sample_weight. This means if the metadata is provided, it will be used, otherwise no error is raised.

  • False: method does not request a sample_weight.

  • None: router will raise an error if sample_weight is passed. This is in almost all cases the default value when an object is instantiated and ensures the user sets the metadata requests explicitly when a metadata is passed. The only exceptions are Group*Fold splitters.

  • "param_name": alias for sample_weight if we want to pass different weights to different consumers. If aliasing is used the meta-estimator should not forward "param_name" to the consumer, but sample_weight instead, because the consumer will expect a param called sample_weight. This means the mapping between the metadata required by the object, e.g. sample_weight, and the variable name provided by the user, e.g. my_weights, is done at the router level, and not by the consuming object itself.

Metadata are requested in the same way for scorers using set_score_request.

If metadata such as sample_weight is passed by the user, the metadata request for all objects which potentially can consume sample_weight should be set by the user, otherwise an error is raised by the router object. For example, the following code raises an error, since it hasn’t been explicitly specified whether sample_weight should be passed to the estimator’s scorer or not:

>>> param_grid = {"C": [0.1, 1]}
>>> lr = LogisticRegression().set_fit_request(sample_weight=True)
>>> try:
...     GridSearchCV(
...         estimator=lr, param_grid=param_grid
...     ).fit(X, y, sample_weight=my_weights)
... except ValueError as e:
...     print(e)
[sample_weight] are passed but are not explicitly set as requested or not
requested for LogisticRegression.score, which is used within GridSearchCV.fit.
Call `LogisticRegression.set_score_request({metadata}=True/False)` for each metadata
you want to request/ignore. See the Metadata Routing User guide
<https://scikit-learn.org/stable/metadata_routing.html> for more information.

The issue can be fixed by explicitly setting the request value:

>>> lr = LogisticRegression().set_fit_request(
...     sample_weight=True
... ).set_score_request(sample_weight=False)

At the end of the Usage Examples section, we disable the configuration flag for metadata routing:

>>> sklearn.set_config(enable_metadata_routing=False)

4.3. Metadata Routing Support Status

All consumers (i.e. simple estimators which only consume metadata and don’t route them) support metadata routing, meaning they can be used inside meta-estimators which support metadata routing. However, development of support for metadata routing for meta-estimators is in progress, and here is a list of meta-estimators and tools which support and don’t yet support metadata routing.

Meta-estimators and functions supporting metadata routing:

Meta-estimators and tools not supporting metadata routing yet: