Intercept
Added in version 2.0.0.
Since 2.0.0, XGBoost supports estimating the model intercept (named `base_score`) automatically based on targets upon training. The behavior can be controlled by setting `base_score` to a constant value. The following snippet disables the automatic estimation:
```python
import xgboost as xgb

clf = xgb.XGBClassifier(n_estimators=10)
clf.set_params(base_score=0.5)
```
```r
library(xgboost)

# Load built-in dataset
data(agaricus.train, package = "xgboost")

# Set base_score parameter directly
model <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  base_score = 0.5,
  nrounds = 10
)
```
In addition, here 0.5 represents the value after applying the inverse link function. See the end of the document for a description.
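To make the probability-versus-margin distinction concrete, here is a minimal sketch (not part of the original snippets) using SciPy's `logit` and `expit`:

```python
from scipy.special import expit, logit

# base_score = 0.5 is on the probability scale, i.e. after the inverse link.
# Internally, XGBoost works with the raw margin logit(0.5) = 0.
raw = logit(0.5)   # 0.0, the raw (margin) scale
prob = expit(raw)  # 0.5, back on the probability scale
```
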
Other than the `base_score`, users can also provide global bias via the data field `base_margin`, which is a vector or a matrix depending on the task. With multi-output and multi-class, the `base_margin` is a matrix with size `(n_samples, n_targets)` or `(n_samples, n_classes)`.
```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification()
clf = xgb.XGBClassifier()
clf.fit(X, y)
# Request the raw prediction
m = clf.predict(X, output_margin=True)

clf_1 = xgb.XGBClassifier()
# Feed the prediction into the next model.
# Using base margin overrides the base score, see below sections.
clf_1.fit(X, y, base_margin=m)
clf_1.predict(X, base_margin=m)
```
```r
library(xgboost)

# Load built-in dataset
data(agaricus.train, package = "xgboost")

# Train first model
model_1 <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  nrounds = 10
)

# Request the raw prediction
m <- predict(model_1, agaricus.train$data, type = "raw")

# Feed the prediction into the next model using base_margin.
# Using base margin overrides the base score, see below sections.
model_2 <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  base_margin = m,
  nrounds = 10
)

# Make predictions with base_margin
pred <- predict(model_2, agaricus.train$data, base_margin = m)
```
It specifies the bias for each sample and can be used for stacking an XGBoost model on top of other models, see Demo for boosting from prediction for a worked example. When `base_margin` is specified, it automatically overrides the `base_score` parameter. If you are stacking XGBoost models, then the usage should be relatively straightforward, with the previous model providing the raw prediction and the new model using that prediction as bias. For more customized inputs, users need to take extra care of the link function. Let \(F\) be the model and \(g\) be the link function. Since `base_score` is overridden when a sample-specific `base_margin` is available, we omit it here:

\[g(E[y_i]) = F(x_i)\]
When base margin \(b\) is provided, it's added to the raw model output \(F\):

\[g(E[y_i]) = F(x_i) + b_i\]
and the output of the final model is:

\[g^{-1}(F(x_i) + b_i)\]
Using the gamma deviance objective `reg:gamma` as an example, which has a log link function, hence:

\[\ln{(E[y_i])} = F(x_i) + b_i\]
\[E[y_i] = \exp{(F(x_i) + b_i)}\]
As a result, if you are feeding outputs from models like GLM with a corresponding objective function, make sure the outputs are not yet transformed by the inverse link (activation).
In the case of `base_score` (intercept), it can be accessed through `save_config()` after estimation. Unlike the `base_margin`, the returned value represents a value after applying the inverse link. With logistic regression and the logit link function as an example, given a `base_score` of 0.5, \(g(intercept) = logit(0.5) = 0\) is added to the raw model output:

\[E[y_i] = g^{-1}{(F(x_i) + g(intercept))}\]
and 0.5 is the same as \(base\_score = g^{-1}(0) = 0.5\). This is more intuitive if you remove the model and consider only the intercept, which is estimated before the model is fitted:

\[E[y] = g^{-1}{(g(intercept))}\]
\[E[y] = intercept\]
For some objectives like MAE, there are closed-form solutions, while for others it's estimated with a one-step Newton method.
Offset
The `base_margin` is a form of offset in GLM. Using the Poisson objective as an example, we might want to model the rate instead of the count:

\[rate = \frac{count}{exposure}\]
And the offset is defined as the log link applied to the exposure variable: \(\ln{(exposure)}\). Let \(c\) be the count and \(\gamma\) be the exposure. Substituting the response \(y\) in our previous formulation of base margin:

\[g\left(\frac{E[c_i]}{\gamma_i}\right) = F(x_i)\]
Substitute \(g\) with \(\ln\) for Poisson regression:

\[\ln{\left(\frac{E[c_i]}{\gamma_i}\right)} = F(x_i)\]
We have:

\[E[c_i] = \gamma_i \exp{(F(x_i))} = \exp{(F(x_i) + \ln{\gamma_i})}\]
As you can see, we can use the `base_margin` for modeling with an offset, similar to GLMs.
Example
The following example shows the relationship between `base_score` and `base_margin` using binary logistic regression with a logit link function:
```python
import numpy as np
from scipy.special import logit
from sklearn.datasets import make_classification

import xgboost as xgb

X, y = make_classification(random_state=2025)
```
```r
library(xgboost)

# Load built-in dataset
data(agaricus.train, package = "xgboost")
X <- agaricus.train$data
y <- agaricus.train$label
```
The intercept is a valid probability (0.5). It's used as the initial estimate of the probability of obtaining a positive sample.
```python
intercept = 0.5
```
```r
intercept <- 0.5
```
First we use the intercept to train a model:
```python
booster = xgb.train(
    {"base_score": intercept, "objective": "binary:logistic"},
    dtrain=xgb.DMatrix(X, y),
    num_boost_round=1,
)
predt_0 = booster.predict(xgb.DMatrix(X, y))
```
```r
# First model with base_score
model_0 <- xgboost(
  x = X,
  y = factor(y),
  base_score = intercept,
  objective = "binary:logistic",
  nrounds = 1
)
predt_0 <- predict(model_0, X)
```
Apply `logit()` to obtain the "margin":
```python
# Apply the logit function to obtain the "margin"
margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
Xy = xgb.DMatrix(X, y, base_margin=margin)

# Second model with base_margin.
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
booster = xgb.train(
    {"base_score": 0.2, "objective": "binary:logistic"},
    dtrain=Xy,
    num_boost_round=1,
)
predt_1 = booster.predict(Xy)
```
```r
# Apply the logit function to obtain the "margin"
logit_intercept <- log(intercept / (1 - intercept))
margin <- rep(logit_intercept, length(y))

# Second model with base_margin.
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
model_1 <- xgboost(
  x = X,
  y = factor(y),
  base_margin = margin,
  base_score = 0.2,
  objective = "binary:logistic",
  nrounds = 1
)
predt_1 <- predict(model_1, X, base_margin = margin)
```
Compare the results:
```python
np.testing.assert_allclose(predt_0, predt_1)
```
```r
all.equal(predt_0, predt_1, tolerance = 1e-6)
```