Prediction

There are a number of prediction functions in XGBoost with various parameters. This document attempts to clarify some of the confusion around prediction, with a focus on the Python binding. The R package behaves similarly when strict_shape is specified (see below).

Prediction Options

There are a number of different prediction options for the xgboost.Booster.predict() method, ranging from pred_contribs to pred_leaf. The output shape depends on the type of prediction. Also, for multi-class classification problems, XGBoost builds one tree for each class, and the trees for each class are called a “group” of trees, so the output dimension may change depending on the model used. Since the 1.4 release there is a parameter called strict_shape; set it to True to request a more restricted, consistent output shape. Assuming you are using xgboost.Booster, here is a list of possible returns:

  • When using normal prediction with strict_shape set to True:

    Output is a 2-dim array with the first dimension as rows and the second as groups. For regression/survival/ranking/binary classification this is equivalent to a column vector with shape[1] == 1. But for multi-class with multi:softprob the number of columns equals the number of classes. If strict_shape is set to False, XGBoost might output a 1- or 2-dim array.

  • When using output_margin to avoid transformation and strict_shape is set to True:

    Similar to the previous case, output is a 2-dim array, except that multi:softmax has the same output shape as multi:softprob because the transformation is dropped. If strict_shape is set to False, the output can have 1 or 2 dims depending on the model used.

  • When using pred_contribs with strict_shape set to True:

    Output is a 3-dim array with (rows, groups, columns + 1) as shape; the extra column holds the bias term. Whether approx_contribs is used does not change the output shape. If strict_shape is not set, the output can be a 2- or 3-dim array depending on whether a multi-class model is being used.

  • When using pred_interactions with strict_shape set to True:

    Output is a 4-dim array with (rows, groups, columns + 1, columns + 1) as shape. As in the prediction contribution case, whether approx_contribs is used does not change the output shape. If strict_shape is set to False, the output can have 3 or 4 dims depending on the underlying model.

  • When using pred_leaf with strict_shape set to True:

    Output is a 4-dim array with (n_samples, n_iterations, n_classes, n_trees_in_forest) as shape. n_trees_in_forest is specified by num_parallel_tree during training. When strict_shape is set to False, output is a 2-dim array with the last 3 dims concatenated into 1. Also, the last dimension is dropped if it equals 1. When using the apply method in the scikit-learn interface, strict_shape is set to False by default.

For the R package, when strict_shape is specified, an array is returned with the same values as in Python, except that R arrays are column-major while Python NumPy arrays are row-major, so all the dimensions are reversed. For example, a Python pred_leaf output obtained with strict_shape=True has 4 dimensions: (n_samples, n_iterations, n_classes, n_trees_in_forest), while R with strict_shape=TRUE outputs (n_trees_in_forest, n_classes, n_iterations, n_samples).
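Why reversing the dimensions preserves the values can be checked with NumPy alone. This sketch does not call R; it only simulates an R-style (column-major) reading of the same buffer, using a made-up stand-in array in place of a real pred_leaf result.

```python
import numpy as np

# Stand-in for a Python pred_leaf result with strict_shape=True:
# (n_samples, n_iterations, n_classes, n_trees_in_forest) = (2, 3, 4, 1).
py_out = np.arange(2 * 3 * 4 * 1).reshape(2, 3, 4, 1)  # row-major (C order)

# Reinterpret the same flat buffer column-major (Fortran order) with the
# dimensions reversed, which is how an R array with dim = c(1, 4, 3, 2) reads it.
r_like = py_out.ravel(order="C").reshape(py_out.shape[::-1], order="F")

print(py_out.shape, "->", r_like.shape)
# Element [i, j, k, l] in Python is element [l, k, j, i] in the R-style layout
# (R itself indexes from 1, so add 1 to each index there).
i, j, k, l = 1, 2, 3, 0
print(py_out[i, j, k, l], r_like[l, k, j, i])
```

The R-style view is exactly the NumPy transpose of the original array, so no data is copied or reordered, only the indexing convention changes.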

Other than these prediction types, there is also a parameter called iteration_range, which is similar to model slicing. But instead of actually splitting up the model into multiple stacks, it simply returns the prediction formed by the trees within the range. The range is half-open, so (0, 2) covers iterations 0 and 1. The number of trees created in each iteration equals \(trees_i = num\_class \times num\_parallel\_tree\). So if you are training a boosted random forest with size 4 on a 3-class classification dataset and want to use the first 2 iterations of trees for prediction, you need to provide iteration_range=(0, 2). Then the first \(2 \times 3 \times 4 = 24\) trees will be used in this prediction.

Early Stopping

When a model is trained with early stopping, there is an inconsistency in behavior between the native Python interface and the sklearn/R interfaces. By default, the R and sklearn interfaces automatically use best_iteration, so predictions come from the best model. But the native Python interface, xgboost.Booster.predict() and xgboost.Booster.inplace_predict(), uses the full model. Users can combine the best_iteration attribute with the iteration_range parameter, i.e. iteration_range=(0, best_iteration + 1), to achieve the same behavior. The save_best parameter of xgboost.callback.EarlyStopping might also be useful.

Base Margin

There is a training parameter in XGBoost called base_score, and a metadata field for DMatrix called base_margin (which can be set in the fit method if you are using the scikit-learn interface). They specify the global bias for the boosted model. If the latter is supplied, the former is ignored. base_margin can be used to train an XGBoost model on top of other models. See the demos on boosting from predictions.

Staged Prediction

Using the native interface with DMatrix, prediction can be staged (or cached). For example, one can first predict with the first 4 trees and then run prediction with 8 trees. After the first prediction, the results from the first 4 trees are cached, so when you run the prediction with 8 trees XGBoost can reuse the result from the previous prediction. The cache expires automatically upon the next prediction, training, or evaluation if the cached DMatrix object has expired (for example, by going out of scope and being collected by the garbage collector in your language environment).

In-place Prediction

Traditionally XGBoost accepts only DMatrix for prediction; with wrappers like the scikit-learn interface, the construction happens internally. We added support for in-place prediction to bypass the construction of a DMatrix, which is slow and memory-consuming. The new predict function has limited features but is often sufficient for simple inference tasks. It accepts some commonly found data types in Python like numpy.ndarray, scipy.sparse.csr_matrix and cudf.DataFrame instead of xgboost.DMatrix. You can call xgboost.Booster.inplace_predict() to use it. Be aware that the output of in-place prediction depends on the input data type: when the input is GPU data, the output is a cupy.ndarray; otherwise a numpy.ndarray is returned.

Thread Safety

Since the 1.4 release, all prediction functions, including normal predict with various parameters like SHAP value computation and inplace_predict, are thread-safe when the underlying booster is gbtree or dart. This means that as long as a tree model is used, prediction itself should be thread-safe. But safety is only guaranteed for prediction. If one tries to train a model in one thread and provide predictions from another thread using the same model, the behaviour is undefined. This happens more easily than one might expect; for instance, we might accidentally call clf.set_params() inside a predict function:

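```python
from concurrent.futures import ThreadPoolExecutor

import xgboost as xgb


def predict_fn(clf: xgb.XGBClassifier, X):
    X = preprocess(X)  # preprocess() stands in for your own (hypothetical) preprocessing step
    clf.set_params(n_jobs=1)  # NOT safe!
    return clf.predict_proba(X, iteration_range=(0, 10))


with ThreadPoolExecutor(max_workers=10) as e:
    e.submit(predict_fn, ...)
```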

Privacy-Preserving Prediction

Concrete ML is a third-party open-source library developed by Zama that provides gradient boosting classes similar to ours, but predicts directly over encrypted data, thanks to Fully Homomorphic Encryption. A simple example would be as follows:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import XGBClassifier

x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=10, random_state=42
)

# Train in the clear and quantize the weights
model = XGBClassifier()
model.fit(X_train, y_train)

# Simulate the predictions in the clear
y_pred_clear = model.predict(X_test)

# Compile in FHE
model.compile(X_train)

# Generate keys
model.fhe_circuit.keygen()

# Run the inference on encrypted inputs!
y_pred_fhe = model.predict(X_test, fhe="execute")

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean() * 100)}%")
```

More information and examples are given in the Concrete ML documentation.