Intercept

Added in version 2.0.0.

Since 2.0.0, XGBoost supports estimating the model intercept (named base_score) automatically based on targets upon training. The behavior can be controlled by setting base_score to a constant value. The following snippet disables the automatic estimation:

import xgboost as xgb

clf = xgb.XGBClassifier(n_estimators=10)
clf.set_params(base_score=0.5)

In addition, here 0.5 represents the value after applying the inverse link function. See the end of the document for a description.

Other than the base_score, users can also provide global bias via the data field base_margin, which is a vector or a matrix depending on the task. With multi-output and multi-class, the base_margin is a matrix with size (n_samples, n_targets) or (n_samples, n_classes).

import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification()

clf = xgb.XGBClassifier()
clf.fit(X, y)
# Request for raw prediction
m = clf.predict(X, output_margin=True)

clf_1 = xgb.XGBClassifier()
# Feed the prediction into the next model
# Using base margin overrides the base score, see below sections.
clf_1.fit(X, y, base_margin=m)
clf_1.predict(X, base_margin=m)

The base_margin specifies the bias for each sample and can be used for stacking an XGBoost model on top of other models; see Demo for boosting from prediction for a worked example. When base_margin is specified, it automatically overrides the base_score parameter. If you are stacking XGBoost models, the usage is relatively straightforward: the previous model provides the raw prediction and the new model uses that prediction as bias. For more customized inputs, users need to take extra care with the link function. Let \(F\) be the model and \(g\) be the link function. Since base_score is overridden when a sample-specific base_margin is available, we omit it here:

\[g(E[y_i]) = F(x_i)\]

When base margin \(b\) is provided, it’s added to the raw model output \(F\):

\[g(E[y_i]) = F(x_i) + b_i\]

and the output of the final model is:

\[g^{-1}(F(x_i) + b_i)\]

Taking the gamma deviance objective reg:gamma as an example, which has a log link function, we have:

\[\begin{split}\ln{(E[y_i])} = F(x_i) + b_i \\E[y_i] = \exp{(F(x_i) + b_i)}\end{split}\]

As a result, if you are feeding outputs from models like GLM with a corresponding objective function, make sure the outputs are not yet transformed by the inverse link (activation).
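To make the point about the link scale concrete, here is a minimal NumPy sketch for a log link. The arrays `glm_mean` and `raw_output` are made-up values standing in for the mean-scale predictions of a hypothetical upstream model and for \(F(x_i)\); they are not outputs of a real model.

```python
import numpy as np

# Hypothetical mean-scale predictions from an upstream log-link model.
glm_mean = np.array([0.5, 2.0, 10.0])
# Hypothetical raw outputs F(x_i) of the boosted model.
raw_output = np.array([0.1, -0.2, 0.3])

# Correct: the base margin lives on the link scale, so apply g = ln first.
base_margin = np.log(glm_mean)
final_mean = np.exp(raw_output + base_margin)

# Incorrect: the mean itself is used as the margin, so a value that is
# already on the mean scale is treated as if it were on the log scale.
wrong_mean = np.exp(raw_output + glm_mean)

print(final_mean)
print(wrong_mean)
```

With a log link, a correctly specified margin acts as a multiplicative factor on the mean, so `final_mean` equals `glm_mean * np.exp(raw_output)`.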

In the case of base_score (intercept), it can be accessed through save_config() after estimation. Unlike the base_margin, the returned value represents a value after applying the inverse link. With logistic regression and the logit link function as an example, given a base_score of 0.5, \(g(intercept) = logit(0.5) = 0\) is added to the raw model output:

\[E[y_i] = g^{-1}{(F(x_i) + g(intercept))}\]

and 0.5 is the same as \(base\_score = g^{-1}(0) = 0.5\). This is more intuitive if you remove the model and consider only the intercept, which is estimated before the model is fitted:

\[\begin{split}E[y] = g^{-1}{(g(intercept))} \\E[y] = intercept\end{split}\]

For some objectives like MAE, there are closed-form solutions, while for others the intercept is estimated with a one-step Newton method.
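The two estimation styles can be illustrated in plain NumPy. This is a sketch of the underlying math only, not XGBoost's internal code: it assumes the Newton step starts from a raw score of zero and uses the standard gradient and Hessian of the logistic loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Closed form: the constant that minimizes mean absolute error is the median.
y_reg = rng.exponential(size=1001)
mae_intercept = np.median(y_reg)


def mae(constant):
    """Mean absolute error of predicting `constant` for every sample."""
    return np.abs(y_reg - constant).mean()


# One Newton step for the logistic loss, starting from a raw score of 0.
y_bin = (rng.random(1000) < 0.3).astype(np.float64)
p0 = 0.5  # sigmoid(0)
grad = p0 - y_bin  # first derivative of the loss w.r.t. the raw score
hess = np.full_like(y_bin, p0 * (1.0 - p0))  # second derivative
raw_intercept = -grad.sum() / hess.sum()

# Map back through the inverse link to get a probability-scale intercept.
newton_intercept = 1.0 / (1.0 + np.exp(-raw_intercept))
print(mae_intercept, newton_intercept, y_bin.mean())
```

A single step does not reproduce the label mean exactly, but it lands close to it, which is usually enough for a starting point.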

Offset

The base_margin is a form of offset in GLM. Using the Poisson objective as an example, we might want to model the rate instead of the count:

\[rate = \frac{count}{exposure}\]

The offset is defined as the log link applied to the exposure variable: \(\ln{exposure}\). Let \(c\) be the count and \(\gamma\) be the exposure. Substituting the rate for the response \(y\) in our previous formulation of the base margin gives:

\[g(\frac{E[c_i]}{\gamma_i}) = F(x_i)\]

Substitute \(g\) with \(\ln\) for Poisson regression:

\[\ln{\frac{E[c_i]}{\gamma_i}} = F(x_i)\]

We have:

\[\begin{split}E[c_i] &= \exp{(F(x_i) + \ln{\gamma_i})} \\E[c_i] &= g^{-1}(F(x_i) + g(\gamma_i))\end{split}\]

As you can see, we can use the base_margin for modeling with an offset, similar to GLMs.

Example

The following example shows the relationship between base_score and base_margin using binary logistic with a logit link function:

import numpy as np
from scipy.special import logit
from sklearn.datasets import make_classification

import xgboost as xgb

X, y = make_classification(random_state=2025)

The intercept is a valid probability (0.5). It’s used as the initial estimation of the probability of obtaining a positive sample.

intercept = 0.5

First we use the intercept to train a model:

booster = xgb.train(
    {"base_score": intercept, "objective": "binary:logistic"},
    dtrain=xgb.DMatrix(X, y),
    num_boost_round=1,
)
predt_0 = booster.predict(xgb.DMatrix(X, y))

Apply logit() to obtain the “margin”:

# Apply logit function to obtain the "margin"
margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
Xy = xgb.DMatrix(X, y, base_margin=margin)
# Second model with base_margin
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
booster = xgb.train(
    {"base_score": 0.2, "objective": "binary:logistic"},
    dtrain=Xy,
    num_boost_round=1,
)
predt_1 = booster.predict(Xy)

Compare the results:

np.testing.assert_allclose(predt_0, predt_1)