C API Tutorial

In this tutorial, we will install the XGBoost library and configure the CMakeLists.txt file of a C/C++ application to link it against XGBoost. Later, we will cover some useful tips for using the C API, along with code snippets showing how to use various C API functions to perform basic tasks such as loading data, training a model, and predicting on a test dataset. For the API reference, please visit XGBoost C Package.

Requirements

- Install CMake - follow the CMake installation documentation for instructions.
- Install Conda - follow the Conda installation documentation for instructions.

Install XGBoost in a Conda environment

Run the following commands in your terminal. They clone the XGBoost repository, build it, and install it into your active Conda environment.

```shell
# Clone the XGBoost repository & its submodules
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost

# Activate the Conda environment, into which we'll install XGBoost
conda activate [env_name]

# Build the compiled version of XGBoost inside the build folder
cmake -B build -S . -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX

# Install XGBoost in your Conda environment
# (usually under [your home directory]/miniconda3)
cmake --build build --target install
```

Configure the CMakeLists.txt file of your application to link with XGBoost

Here, we assume that your C++ application is using CMake for builds.

Use find_package() and target_link_libraries() in your application's CMakeLists.txt to link with the XGBoost library:

```cmake
cmake_minimum_required(VERSION 3.18)
project(your_project_name LANGUAGES C CXX VERSION your_project_version)
find_package(xgboost REQUIRED)
add_executable(your_project_name /path/to/project_file.c)
target_link_libraries(your_project_name xgboost::xgboost)
```

To ensure that CMake can locate the XGBoost library, supply the -DCMAKE_PREFIX_PATH=$CONDA_PREFIX argument when invoking CMake. This option instructs CMake to look for the XGBoost library under $CONDA_PREFIX, which is where your Conda environment is located.

```shell
# Activate the Conda environment where we previously installed XGBoost
conda activate [env_name]

# Invoke CMake with CMAKE_PREFIX_PATH
cmake -B build -S . -DCMAKE_PREFIX_PATH=$CONDA_PREFIX

# Build your application
cmake --build build
```

Useful Tips To Remember

Below are some useful tips for using the C API:

  1. Error handling: Always check the return value of the C API functions.

     - In a C application: use the following macro to guard all calls to XGBoost's C API functions. The macro prints any error or exception that occurred:

```c
#define safe_xgboost(call) {                                                  \
  int err = (call);                                                           \
  if (err != 0) {                                                             \
    fprintf(stderr, "%s:%d: error in %s: %s\n", __FILE__, __LINE__,           \
            #call, XGBGetLastError());                                        \
    exit(1);                                                                  \
  }                                                                           \
}
```

In your application, wrap all C API function calls with the macro as follows:

```c
DMatrixHandle train;
int silent = 0;
safe_xgboost(XGDMatrixCreateFromFile("/path/to/training/dataset/", silent, &train));
```
     - In a C++ application: modify the macro safe_xgboost to throw an exception upon an error:

```cpp
#define safe_xgboost(call) {                                                  \
  int err = (call);                                                           \
  if (err != 0) {                                                             \
    throw std::runtime_error(std::string(__FILE__) + ":" +                    \
                             std::to_string(__LINE__) + ": error in " +       \
                             #call + ":" + XGBGetLastError());                \
  }                                                                           \
}
```
  2. Assertion technique: assert() works in both C and C++. If the expression evaluates to 0 (false), the expression, the source filename, and the line number are written to standard error, and then abort() is called. It can be used to test assumptions made in your code:

```c
DMatrixHandle dmat;
assert(XGDMatrixCreateFromFile("training_data.libsvm", 0, &dmat) == 0);
```
  3. Always remember to free the memory allocated for a BoosterHandle or DMatrixHandle appropriately:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <xgboost/c_api.h>

int main(int argc, char **argv) {
  int silent = 0;

  BoosterHandle booster;

  // do something with booster

  // free the memory
  XGBoosterFree(booster);

  DMatrixHandle DMatrixHandle_param;

  // do something with DMatrixHandle_param

  // free the memory
  XGDMatrixFree(DMatrixHandle_param);

  return 0;
}
```
  4. For tree models, it is important to use consistent data formats during training and scoring/prediction; otherwise it will produce wrong outputs. For example, if your training data is in dense matrix format, then your prediction dataset should also be a dense matrix; if you train on libsvm-format data, then the dataset used for prediction should also be in libsvm format.
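One way to keep the dense path consistent is to funnel both training and prediction data through the same conversion routine. The sketch below uses a hypothetical helper, to_dense_row_major() (not part of the C API), that produces the flat row-major float buffer XGDMatrixCreateFromMat() expects; the XGBoost calls are shown only in comments:

```c
#include <stddef.h>

/* Hypothetical helper (not part of the XGBoost C API): convert a table of
 * doubles into the flat row-major float buffer that XGDMatrixCreateFromMat()
 * expects. Using the same helper for training and prediction data guarantees
 * both DMatrix objects share the same dense layout and column order. */
static void to_dense_row_major(const double *src, size_t rows, size_t cols,
                               float *dst) {
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j)
      dst[i * cols + j] = (float)src[i * cols + j];
}

/* Usage sketch (assuming the safe_xgboost macro from the tips above):
 *   float train_buf[2 * 3], test_buf[1 * 3];
 *   to_dense_row_major(&train_rows[0][0], 2, 3, train_buf);
 *   to_dense_row_major(&test_rows[0][0], 1, 3, test_buf);
 *   safe_xgboost(XGDMatrixCreateFromMat(train_buf, 2, 3, -1, &dtrain));
 *   safe_xgboost(XGDMatrixCreateFromMat(test_buf, 1, 3, -1, &dtest));
 */
```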

  5. Always use strings when setting parameter values on the booster handle object. A parameter value can be of any data type (e.g. int, char, float, double), but it should always be encoded as a string:

```c
BoosterHandle booster;
// Assume booster has been created with XGBoosterCreate()
XGBoosterSetParam(booster, "parameter_name", "0.1");
```

Sample code snippets for using C API functions

  1. If the dataset is available in a file, it can be loaded into a DMatrix object using the XGDMatrixCreateFromFile() function:

```c
DMatrixHandle data;  // handle to DMatrix
int silent = 0;
// Load the data from file & store it in the data variable of DMatrixHandle type
safe_xgboost(XGDMatrixCreateFromFile("/path/to/file/filename", silent, &data));
```
  2. You can also create a DMatrix object from an in-memory matrix using the XGDMatrixCreateFromMat() function:

```c
// 1D matrix (XGDMatrixCreateFromMat expects float data)
const float data1[] = {0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,
                       1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0};

// 2D matrix
#define ROWS 6
#define COLS 3
const float data2[ROWS][COLS] = {{1,2,3},{2,4,6},{3,-1,9},{4,8,-1},{2,5,1},{0,1,5}};
DMatrixHandle dmatrix1, dmatrix2;
// Pass the matrix, the number of rows & columns it contains, and the value
// that marks missing entries; the created DMatrix is returned through the
// last argument.
// Here '0' represents the missing value in the matrix dataset
safe_xgboost(XGDMatrixCreateFromMat(data1, 1, 50, 0, &dmatrix1));
// Here '-1' represents the missing value in the matrix dataset
safe_xgboost(XGDMatrixCreateFromMat(data2[0], ROWS, COLS, -1, &dmatrix2));
```
  3. Create a Booster object for training & testing on the dataset using XGBoosterCreate():

```c
BoosterHandle booster;
const int eval_dmats_size = 2;
// We assume that training and test data have been loaded into 'train' and 'test'
DMatrixHandle eval_dmats[2] = {train, test};
safe_xgboost(XGBoosterCreate(eval_dmats, eval_dmats_size, &booster));
```
  4. For each DMatrix object, set the labels using XGDMatrixSetFloatInfo(). Later you can access the labels using XGDMatrixGetFloatInfo():

```c
#define ROWS 6
#define COLS 3
const float data[ROWS][COLS] = {{1,2,3},{2,4,6},{3,-1,9},{4,8,-1},{2,5,1},{0,1,5}};
DMatrixHandle dmatrix;

safe_xgboost(XGDMatrixCreateFromMat(data[0], ROWS, COLS, -1, &dmatrix));

// Variable to store labels for the dataset created from the matrix above
float labels[ROWS];

for (int i = 0; i < ROWS; i++) {
  labels[i] = i;
}

// Loading the labels
safe_xgboost(XGDMatrixSetFloatInfo(dmatrix, "label", labels, ROWS));

// Reading the labels: result_len stores the length of the result
bst_ulong result_len;

// Labels result
const float *result;

safe_xgboost(XGDMatrixGetFloatInfo(dmatrix, "label", &result_len, &result));

for (unsigned int i = 0; i < result_len; i++) {
  printf("label[%u] = %f\n", i, result[i]);
}
```
  5. Set the parameters on the Booster object as required using XGBoosterSetParam(). Check out the full list of available parameters here:

```c
BoosterHandle booster;
// Assume booster has been created with XGBoosterCreate()
safe_xgboost(XGBoosterSetParam(booster, "booster", "gblinear"));
// Default max_depth = 6
safe_xgboost(XGBoosterSetParam(booster, "max_depth", "3"));
// Default eta = 0.3
safe_xgboost(XGBoosterSetParam(booster, "eta", "0.1"));
```
  6. Train & evaluate the model using XGBoosterUpdateOneIter() and XGBoosterEvalOneIter() respectively:

```c
int num_of_iterations = 20;
const char *eval_names[2] = {"train", "test"};
const char *eval_result = NULL;

for (int i = 0; i < num_of_iterations; ++i) {
  // Update the model for this iteration
  safe_xgboost(XGBoosterUpdateOneIter(booster, i, train));

  // Report the learner's error on the training & test datasets after each iteration
  safe_xgboost(XGBoosterEvalOneIter(booster, i, eval_dmats, eval_names,
                                    eval_dmats_size, &eval_result));
  printf("%s\n", eval_result);
}
```

Note

For a customized loss function, use XGBoosterBoostOneIter() instead and manually specify the first and second order gradients.
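As a minimal sketch of what "manually specify the gradients" means (plain C; the commented call assumes the safe_xgboost macro plus booster and dtrain handles from the surrounding examples), a squared-error objective would compute a per-row gradient of pred - label and a constant second-order gradient of 1:

```c
#include <stddef.h>

/* Gradient pair for a squared-error objective:
 * first-order gradient  g_i = pred_i - label_i
 * second-order gradient h_i = 1 */
static void squared_error_gradients(const float *preds, const float *labels,
                                    size_t n, float *grad, float *hess) {
  for (size_t i = 0; i < n; ++i) {
    grad[i] = preds[i] - labels[i];
    hess[i] = 1.0f;
  }
}

/* A boosting step with these custom gradients would then look like (sketch):
 *   float grad[n], hess[n];
 *   squared_error_gradients(preds, labels, n, grad, hess);
 *   safe_xgboost(XGBoosterBoostOneIter(booster, dtrain, grad, hess, n));
 */
```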

  7. Predict the result on a test set using XGBoosterPredictFromDMatrix():

```c
char const config[] =
    "{\"training\": false, \"type\": 0, "
    "\"iteration_begin\": 0, \"iteration_end\": 0, \"strict_shape\": false}";
/* Shape of output prediction */
uint64_t const *out_shape;
/* Dimension of output prediction */
uint64_t out_dim;
/* Pointer to a thread-local contiguous array, assigned by the prediction function. */
float const *out_result = NULL;
safe_xgboost(
    XGBoosterPredictFromDMatrix(booster, dmatrix, config, &out_shape, &out_dim, &out_result));

/* The total number of predictions is the product of the output shape dimensions */
uint64_t output_length = 1;
for (uint64_t i = 0; i < out_dim; ++i) {
  output_length *= out_shape[i];
}

for (uint64_t i = 0; i < output_length; i++) {
  printf("prediction[%lu] = %f\n", (unsigned long)i, out_result[i]);
}
```
  8. Get the number of features in your dataset using XGBoosterGetNumFeature():

```c
bst_ulong num_of_features = 0;

// Assuming the booster variable of type BoosterHandle is already declared,
// the dataset is loaded, and the booster is trained on it;
// the result is stored in the num_of_features variable
safe_xgboost(XGBoosterGetNumFeature(booster, &num_of_features));

// Print the number of features, converting num_of_features from bst_ulong to unsigned long
printf("num_feature: %lu\n", (unsigned long)(num_of_features));
```
  9. Save the model using XGBoosterSaveModel():

```c
BoosterHandle booster;
// Assume booster has been created and trained
const char *model_path = "/path/of/model.json";
safe_xgboost(XGBoosterSaveModel(booster, model_path));
```
  10. Load the model using XGBoosterLoadModel():

```c
BoosterHandle booster;
const char *model_path = "/path/of/model.json";

// Create the booster handle first
safe_xgboost(XGBoosterCreate(NULL, 0, &booster));

// Set the model parameters here

// Load the model
safe_xgboost(XGBoosterLoadModel(booster, model_path));

// Predict with the model here
```
  11. Free all the internal structures used in your code using XGDMatrixFree() and XGBoosterFree(). This step is important to prevent memory leaks:

```c
safe_xgboost(XGDMatrixFree(dmatrix));
safe_xgboost(XGBoosterFree(booster));
```