XGBoost C Package

XGBoost implements a C API designed for use by various language bindings; we maintain the stability of both the API and the CMake/make build interface. See the C API Tutorial for an introduction and demo/c-api/ for related examples. You can also generate the doxygen documentation by passing -DBUILD_C_DOC=ON to CMake during the build, or simply read the function comments in include/xgboost/c_api.h. The reference below is exported to Sphinx with the help of breathe; it doesn’t contain links to examples but might be easier to read. The original doxygen pages are also available online.

C API Reference

Library

group Library

These functions are used to obtain general information about XGBoost including version, build info and current global configuration.

Typedefs

typedef void *DMatrixHandle

handle to DMatrix

typedef void *BoosterHandle

handle to Booster

Functions

void XGBoostVersion(int *major, int *minor, int *patch)

Return the version of the XGBoost library.

The output variable is only written if it’s not NULL.

Parameters:
  • major – Store the major version number.

  • minor – Store the minor version number.

  • patch – Store the patch (revision) number.

int XGBuildInfo(char const **out)

Get compile information of the shared XGBoost library.

Parameters:

out – string encoded JSON object containing build flags and dependency versions.

Returns:

0 when success, -1 when failure happens

const char *XGBGetLastError()

Get the string message of the last error.

Most functions in XGBoost return 0 on success and a non-zero value when an error occurred. In the case of an error, XGBGetLastError can be used to retrieve the error message.

This function is thread safe.

Returns:

The error message from the last error.
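The return-code convention described above is often wrapped in a small checking helper. The following is a minimal sketch under stated assumptions, not part of the API: xgb_check and fake_last_error are hypothetical names, and the error getter is passed as a function pointer so the snippet compiles without the XGBoost header. In real code the third argument would be XGBGetLastError.

```c
#include <assert.h>
#include <stdio.h>

/* Same shape as XGBGetLastError: no arguments, returns a message. */
typedef const char *(*last_error_fn)(void);

/*
 * Hypothetical helper: report a failed call on stderr and pass the return
 * code through. `rc` is the value returned by an XGBoost C function.
 */
static int xgb_check(int rc, const char *what, last_error_fn last_error) {
  assert(what != NULL && last_error != NULL);
  if (rc != 0) {
    fprintf(stderr, "%s failed (code %d): %s\n", what, rc, last_error());
  }
  return rc;
}

/* Stand-in for XGBGetLastError so the sketch runs without the library. */
static const char *fake_last_error(void) { return "example error"; }
```

A call site could then read `if (xgb_check(XGDMatrixFree(h), "XGDMatrixFree", XGBGetLastError) != 0) { /* clean up */ }`.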

int XGBRegisterLogCallback(void (*callback)(const char *))

Register a callback function for LOG(INFO) messages, i.e. helpful messages that are not errors.

Note

This function can be called by multiple threads. The callback function will run on the thread that registered it.

Returns:

0 when success, -1 when failure happens

int XGBSetGlobalConfig(char const *config)

Set global configuration (collection of parameters that apply globally). This function accepts the list of key-value pairs representing the global-scope parameters to be configured. The list of key-value pairs is passed in as a JSON string.

Parameters:

config – a JSON string representing the list of key-value pairs. The JSON object shall be flat: no value can be a JSON object or an array.

Returns:

0 when success, -1 when failure happens
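As an illustration of the flat JSON object expected here, the sketch below formats a one-key configuration string. format_global_config is a hypothetical helper; verbosity is used as the example key on the assumption that it is one of the global parameters.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a flat JSON object such as {"verbosity": 2} into `buf`.
 * Returns the number of characters written, or -1 if `buf` is too small.
 */
static int format_global_config(char *buf, size_t len, int verbosity) {
  assert(buf != NULL);
  int n = snprintf(buf, len, "{\"verbosity\": %d}", verbosity);
  return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string is what would be passed as the config argument of XGBSetGlobalConfig.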

int XGBGetGlobalConfig(char const **out_config)

Get current global configuration (collection of parameters that apply globally).

Parameters:

out_config – pointer that receives the returned global configuration, represented as a JSON string.

Returns:

0 when success, -1 when failure happens

DMatrix

group DMatrix

DMatrix is the basic data storage for XGBoost, used by all XGBoost algorithms for training, prediction, and explanation. There are a few variants of DMatrix: the normal DMatrix, which is a CSR matrix; QuantileDMatrix, which is used by histogram-based tree methods to save memory; and the experimental external-memory-based DMatrix, which reads data in batches during training. For the last two variants, see the Streaming group.

Functions

int XGDMatrixCreateFromFile(const char *fname, int silent, DMatrixHandle *out)

load a data matrix

Deprecated:

since 2.0.0

Parameters:
  • fname – the name of the file

  • silent – whether to print messages during loading

  • out – a loaded data matrix

Returns:

0 when success, -1 when failure happens

int XGDMatrixCreateFromURI(char const *config, DMatrixHandle *out)

load a data matrix

Parameters:
  • config – JSON encoded parameters for DMatrix construction. Accepted fields are:

    • uri: The URI of the input file. The URI parameter format is required when loading text data.

      See Text Input Format of DMatrix for more info.

    • silent (optional): Whether to print message during loading. Default to true.

    • data_split_mode (optional): Whether the file was split by row or column beforehand for distributed computing. Default to row.

  • out – a loaded data matrix

Returns:

0 when success, -1 when failure happens

int XGDMatrixCreateFromColumnar(char const *data, char const *config, DMatrixHandle *out)

Create a DMatrix from columnar (table) data.

A special type of input to the DMatrix is the columnar format, which refers to column-based dataframes. XGBoost accepts numeric data types like integers and floats, along with the categorical type, called dictionary in Arrow’s terms. Support for the categorical type was introduced in 3.1.0. The dataframe is represented by a list of array interfaces, with one object for each column.

A categorical type is represented by 3 buffers: the validity mask, the names of the categories (called the index in most dataframe implementations), and the codes used to represent the categories in the rows. XGBoost consumes a categorical column by accepting two JSON-encoded Arrow arrays in a list. The first item in the list is a JSON object with {"offsets": IntegerArray, "values": StringArray} representing the string names as defined by the Arrow columnar format. The second item is a masked integer array that stores the categorical codes along with the validity mask:

    [
      // categorical column, represented as an array (list)
      [
        {
          'offsets': {'data': (129412626415808, True), 'typestr': '<i4', 'version': 3, 'strides': None, 'shape': (3,), 'mask': None},
          'values': {'data': (129412626416000, True), 'typestr': '<i1', 'version': 3, 'strides': None, 'shape': (7,), 'mask': None}
        },
        {'data': (106200854378448, True), 'typestr': '<i1', 'version': 3, 'strides': None, 'shape': (2,), 'mask': None}
      ],
      // numeric column, represented as an object, same number of rows as the previous column (2)
      {'data': (106200854378448, True), 'typestr': '<f4', 'version': 3, 'strides': None, 'shape': (2,), 'mask': None}
    ]

Numeric inputs use the same representation as a dense array.

Parameters:
  • data – A list of JSON-encoded array interfaces.

  • config – See XGDMatrixCreateFromDense for details.

  • out – The created DMatrix.

Returns:

0 when success, -1 when failure happens
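To make the offsets/values pair for category names more concrete, the sketch below packs a list of names into the two buffers described for XGDMatrixCreateFromColumnar. It is an illustration only: pack_category_names is a hypothetical helper, and the layout follows the Arrow variable-size string convention (offsets has one more entry than there are names, values holds the concatenated bytes without terminators).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack `n` category names into Arrow-style string buffers.
 * `offsets` needs n + 1 slots; `values` needs room for the bytes of all names.
 * Name i occupies values[offsets[i]] up to (not including) values[offsets[i + 1]].
 * Returns the total number of bytes written to `values`.
 */
static int32_t pack_category_names(const char *const *names, size_t n,
                                   int32_t *offsets, char *values) {
  assert(names != NULL && offsets != NULL && values != NULL);
  int32_t total = 0;
  offsets[0] = 0;
  for (size_t i = 0; i < n; ++i) {
    size_t len = strlen(names[i]);
    memcpy(values + total, names[i], len); /* no terminator is stored */
    total += (int32_t)len;
    offsets[i + 1] = total;
  }
  return total;
}
```

For the two names "cat" and "bird" this yields 3 offsets and 7 value bytes, which matches the shapes (3,) and (7,) in the example above. Each buffer would then be described by its own JSON-encoded array interface.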

int XGDMatrixCreateFromCSR(char const *indptr, char const *indices, char const *data, bst_ulong ncol, char const *config, DMatrixHandle *out)

Create a DMatrix from CSR matrix.

Parameters:
  • indptr – JSON encoded array_interface to row pointers in CSR.

  • indices – JSON encoded array_interface to column indices in CSR.

  • data – JSON encoded array_interface to values in CSR.

  • ncol – The number of columns.

  • config – See XGDMatrixCreateFromDense for details.

  • out – The created dmatrix

Returns:

0 when success, -1 when failure happens
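For readers less familiar with CSR, the sketch below shows how the three arrays relate to a dense matrix. dense_to_csr is a hypothetical helper, and the element types chosen here (uint64_t row pointers, uint32_t column indices) are illustrative; the typestr of each array interface must describe whatever types are actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a row-major dense matrix into CSR arrays, dropping entries equal
 * to `missing`. `indptr` needs nrow + 1 slots; `indices` and `values` need
 * room for every kept entry. Returns the number of stored entries.
 */
static size_t dense_to_csr(const float *dense, size_t nrow, size_t ncol,
                           float missing, uint64_t *indptr, uint32_t *indices,
                           float *values) {
  assert(dense != NULL && indptr != NULL && indices != NULL && values != NULL);
  size_t nnz = 0;
  indptr[0] = 0;
  for (size_t i = 0; i < nrow; ++i) {
    for (size_t j = 0; j < ncol; ++j) {
      float v = dense[i * ncol + j];
      if (v != missing) {
        indices[nnz] = (uint32_t)j; /* column of the stored value */
        values[nnz] = v;
        ++nnz;
      }
    }
    indptr[i + 1] = nnz; /* row i occupies [indptr[i], indptr[i + 1]) */
  }
  return nnz;
}
```

The three output arrays correspond to the indptr, indices and data arguments, and the matrix width is what the ncol argument carries.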

int XGDMatrixCreateFromDense(char const *data, char const *config, DMatrixHandle *out)

Create a DMatrix from dense array.

The array interface is defined in https://numpy.org/doc/2.1/reference/arrays.interface.html. We encode the interface as a JSON object.

Parameters:
  • data – JSON encoded array_interface to array values.

  • config – JSON encoded configuration. Required values are:

    • missing: Which value to represent missing value.

    • nthread (optional): Number of threads used for initializing DMatrix.

    • data_split_mode (optional): Whether the data was split by row or column beforehand. Default to row.

  • out – The created DMatrix

Returns:

0 when success, -1 when failure happens
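The JSON-encoded array interface can be produced with plain string formatting. The sketch below is one way to do it for a row-major float matrix; make_dense_interface is a hypothetical helper, and the "<f4" typestr assumes a little-endian host with 32-bit floats.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Encode a row-major float matrix as a JSON array interface.
 * The "data" entry holds the buffer address and a read-only flag.
 * Returns the string length, or -1 if `buf` is too small.
 */
static int make_dense_interface(const float *data, size_t nrow, size_t ncol,
                                char *buf, size_t len) {
  assert(data != NULL && buf != NULL);
  int n = snprintf(buf, len,
                   "{\"data\": [%" PRIuPTR ", true], \"shape\": [%zu, %zu], "
                   "\"typestr\": \"<f4\", \"version\": 3}",
                   (uintptr_t)data, nrow, ncol);
  return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string is what the data argument expects. Only the address is encoded, so the underlying buffer has to stay alive for the duration of the call that consumes it.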

int XGDMatrixCreateFromCSC(char const *indptr, char const *indices, char const *data, bst_ulong nrow, char const *config, DMatrixHandle *out)

Create a DMatrix from a CSC matrix.

Parameters:
  • indptr – JSON encoded array_interface to column pointers in CSC.

  • indices – JSON encoded array_interface to row indices in CSC.

  • data – JSON encoded array_interface to values in CSC.

  • nrow – The number of rows in the matrix.

  • config – See XGDMatrixCreateFromDense for details.

  • out – The created dmatrix.

Returns:

0 when success, -1 when failure happens

int XGDMatrixCreateFromMat(const float *data, bst_ulong nrow, bst_ulong ncol, float missing, DMatrixHandle *out)

create matrix content from dense matrix

Parameters:
  • data – pointer to the data space

  • nrow – number of rows

  • ncol – number of columns

  • missing – which value to represent missing value

  • out – created dmatrix

Returns:

0 when success, -1 when failure happens

int XGDMatrixCreateFromMat_omp(const float *data, bst_ulong nrow, bst_ulong ncol, float missing, DMatrixHandle *out, int nthread)

create matrix content from dense matrix

Parameters:
  • data – pointer to the data space

  • nrow – number of rows

  • ncol – number of columns

  • missing – which value to represent missing value

  • out – created dmatrix

  • nthread – number of threads (up to maximum cores available, if <=0 use all cores)

Returns:

0 when success, -1 when failure happens

int XGDMatrixCreateFromCudaColumnar(char const *data, char const *config, DMatrixHandle *out)

Create DMatrix from CUDA columnar format. (cuDF)

See XGDMatrixCreateFromColumnar for a brief description of the columnar format.

Parameters:
  • data – A list of JSON-encoded array interfaces.

  • config – See XGDMatrixCreateFromDense for details.

  • out – Created dmatrix

Returns:

0 when success, -1 when failure happens

int XGDMatrixCreateFromCudaArrayInterface(char const *data, char const *config, DMatrixHandle *out)

Create DMatrix from CUDA array.

Parameters:
  • data – JSON encoded cuda_array_interface for array data.

  • config – JSON encoded configuration. Required values are:

    • missing: Which value to represent missing value.

    • nthread (optional): Number of threads used for initializing DMatrix.

    • data_split_mode (optional): Whether the data was split by row or column beforehand. Default to row.

  • out – created dmatrix

Returns:

0 when success, -1 when failure happens

int XGDMatrixSliceDMatrix(DMatrixHandle handle, const int *idxset, bst_ulong len, DMatrixHandle *out)

create a new dmatrix from sliced content of existing matrix

Parameters:
  • handle – instance of data matrix to be sliced

  • idxset – index set

  • len – length of index set

  • out – a sliced new matrix

Returns:

0 when success, -1 when failure happens

int XGDMatrixSliceDMatrixEx(DMatrixHandle handle, const int *idxset, bst_ulong len, DMatrixHandle *out, int allow_groups)

create a new dmatrix from sliced content of existing matrix

Parameters:
  • handle – instance of data matrix to be sliced

  • idxset – index set

  • len – length of index set

  • out – a sliced new matrix

  • allow_groups – allow slicing of an array with groups

Returns:

0 when success, -1 when failure happens

int XGDMatrixFree(DMatrixHandle handle)

Free a DMatrix object.

Returns:

0 when success, -1 when failure happens

int XGDMatrixSaveBinary(DMatrixHandle handle, const char *fname, int silent)

Save the DMatrix object into a file. QuantileDMatrix and external memory DMatrix are not supported.

Parameters:
  • handle – an instance of data matrix

  • fname – File name

  • silent – print statistics when saving

Returns:

0 when success, -1 when failure happens

int XGDMatrixSetInfoFromInterface(DMatrixHandle handle, char const *field, char const *data)

Set content in array interface to a content in info.

Parameters:
  • handle – An instance of data matrix

  • field – Field name.

  • data – JSON encoded array_interface to values in the dense matrix/vector.

Returns:

0 when success, -1 when failure happens

int XGDMatrixSetFloatInfo(DMatrixHandle handle, const char *field, const float *array, bst_ulong len)

set float vector to a content in info

Parameters:
  • handle – an instance of data matrix

  • field – field name, can be label, weight

  • array – pointer to float vector

  • len – length of array

Returns:

0 when success, -1 when failure happens

int XGDMatrixSetUIntInfo(DMatrixHandle handle, const char *field, const unsigned *array, bst_ulong len)

Deprecated:

since 2.1.0

Use XGDMatrixSetInfoFromInterface instead.

int XGDMatrixSetStrFeatureInfo(DMatrixHandle handle, const char *field, const char **features, const bst_ulong size)

Set string encoded information of all features.

Accepted fields are:

  • feature_name

  • feature_type

    char const *feat_names[] = {"feat_0", "feat_1"};
    XGDMatrixSetStrFeatureInfo(handle, "feature_name", feat_names, 2);

    // i for integer, q for quantitative, c for categorical.  Similarly "int" and "float"
    // are also recognized.
    char const *feat_types[] = {"i", "q"};
    XGDMatrixSetStrFeatureInfo(handle, "feature_type", feat_types, 2);

Parameters:
  • handle – An instance of data matrix

  • field – Field name

  • features – Pointer to array of strings.

  • size – Size of features pointer (number of strings passed in).

Returns:

0 when success, -1 when failure happens

int XGDMatrixGetStrFeatureInfo(DMatrixHandle handle, const char *field, bst_ulong *size, const char ***out_features)

Get string encoded information of all features.

Accepted fields are:

  • feature_name

  • feature_type

Caller is responsible for copying out the data, before next call to any API function of XGBoost.

    char const **c_out_features = NULL;
    bst_ulong out_size = 0;

    // Assuming the feature names are already set by `XGDMatrixSetStrFeatureInfo`.
    XGDMatrixGetStrFeatureInfo(handle, "feature_name", &out_size, &c_out_features);

    for (bst_ulong i = 0; i < out_size; ++i) {
      // Here we are simply printing the string.  Copy it out if the feature name is
      // useful after printing.
      printf("feature %lu: %s\n", (unsigned long)i, c_out_features[i]);
    }

Parameters:
  • handle – An instance of data matrix

  • field – Field name

  • size – Size of output pointer features (number of strings returned).

  • out_features – Address of a pointer to array of strings. Result is stored in thread local memory.

Returns:

0 when success, -1 when failure happens

int XGDMatrixSetDenseInfo(DMatrixHandle handle, const char *field, void const *data, bst_ulong size, int type)

Deprecated:

since 2.1.0

Use XGDMatrixSetInfoFromInterface instead.

int XGDMatrixGetFloatInfo(const DMatrixHandle handle, const char *field, bst_ulong *out_len, const float **out_dptr)

get float info vector from matrix.

Parameters:
  • handle – an instance of data matrix

  • field – field name

  • out_len – used to set result length

  • out_dptr – pointer to the result

Returns:

0 when success, -1 when failure happens

int XGDMatrixGetUIntInfo(const DMatrixHandle handle, const char *field, bst_ulong *out_len, const unsigned **out_dptr)

get uint32 info vector from matrix

Parameters:
  • handle – an instance of data matrix

  • field – field name

  • out_len – The length of the field.

  • out_dptr – pointer to the result

Returns:

0 when success, -1 when failure happens

int XGDMatrixNumRow(DMatrixHandle handle, bst_ulong *out)

Get the number of rows from a DMatrix.

Parameters:
  • handle – the handle to the DMatrix

  • out – The address to hold number of rows.

Returns:

0 when success, -1 when failure happens

int XGDMatrixNumCol(DMatrixHandle handle, bst_ulong *out)

Get the number of columns from a DMatrix.

Parameters:
  • handle – the handle to the DMatrix

  • out – The output of number of columns

Returns:

0 when success, -1 when failure happens

int XGDMatrixNumNonMissing(DMatrixHandle handle, bst_ulong *out)

Get number of valid values from a DMatrix.

Parameters:
  • handle – the handle to the DMatrix

  • out – The output of number of non-missing values

Returns:

0 when success, -1 when failure happens

int XGDMatrixDataSplitMode(DMatrixHandle handle, bst_ulong *out)

Get the data split mode from DMatrix.

Parameters:
  • handle – the handle to the DMatrix

  • out – The output of the data split mode

Returns:

0 when success, -1 when failure happens

int XGDMatrixGetDataAsCSR(DMatrixHandle const handle, char const *config, bst_ulong *out_indptr, unsigned *out_indices, float *out_data)

Get the predictors from DMatrix as CSR matrix for testing. If this is a quantized DMatrix, quantized values are returned instead.

Unlike most XGBoost C functions, the caller of XGDMatrixGetDataAsCSR is required to allocate the memory for the return buffers instead of using thread local memory from XGBoost. This avoids allocating a huge memory buffer that cannot be freed until the thread exits.

Since

1.7.0

Parameters:
  • handle – the handle to the DMatrix

  • config – JSON configuration string. At the moment it should be an empty document, preserved for future use.

  • out_indptr – indptr of output CSR matrix.

  • out_indices – Column index of output CSR matrix.

  • out_data – Data value of CSR matrix.

Returns:

0 when success, -1 when failure happens
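Because the caller owns the output buffers here, they have to be sized before the call. The sketch below assumes the usual CSR layout: one row pointer per row plus one, and one index and one value per non-missing entry, with the counts taken from XGDMatrixNumRow and XGDMatrixNumNonMissing. CsrBuffers and its functions are hypothetical names, and uint64_t stands in for bst_ulong as an assumption so the snippet compiles without the header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Caller-owned output buffers for a CSR export. */
typedef struct {
  uint64_t *indptr;  /* n_rows + 1 entries */
  unsigned *indices; /* one entry per non-missing value */
  float *data;       /* one entry per non-missing value */
} CsrBuffers;

/* Release all buffers and reset the pointers. */
static void csr_buffers_free(CsrBuffers *b) {
  assert(b != NULL);
  free(b->indptr);
  free(b->indices);
  free(b->data);
  b->indptr = NULL;
  b->indices = NULL;
  b->data = NULL;
}

/* Allocate the buffers; returns 0 on success, -1 if an allocation fails. */
static int csr_buffers_alloc(CsrBuffers *b, uint64_t n_rows,
                             uint64_t n_nonmissing) {
  assert(b != NULL);
  /* Request at least one element so malloc(0) is never relied upon. */
  size_t n_values = n_nonmissing > 0 ? (size_t)n_nonmissing : 1;
  b->indptr = malloc(((size_t)n_rows + 1) * sizeof *b->indptr);
  b->indices = malloc(n_values * sizeof *b->indices);
  b->data = malloc(n_values * sizeof *b->data);
  if (b->indptr == NULL || b->indices == NULL || b->data == NULL) {
    csr_buffers_free(b);
    return -1;
  }
  return 0;
}
```

The three pointers would then be passed as out_indptr, out_indices and out_data, and released with csr_buffers_free once the exported data is no longer needed.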

int XGDMatrixGetQuantileCut(DMatrixHandle const handle, char const *config, char const **out_indptr, char const **out_data)

Export the quantile cuts used for training histogram-based models like hist and approx. Useful for model compression.

Since

2.0.0

Parameters:
  • handle – the handle to the DMatrix

  • config – JSON configuration string. At the moment it should be an empty document, preserved for future use.

  • out_indptr – indptr of output CSC matrix represented by a JSON encoded __(cuda_)array_interface__.

  • out_data – Data value of CSC matrix represented by a JSON encoded __(cuda_)array_interface__.

Streaming

group Streaming

Quantile DMatrix and external memory DMatrix can be created from batches of data.

There are 2 sets of data callbacks for DMatrix. The first one is currently used exclusively by JVM packages. It uses XGBoostBatchCSR to accept batches of CSR-formatted input and concatenates them into one final big CSR. The related functions are:

Another set is used by the external data iterator. It accepts foreign data iterators as callbacks. There are 2 different scenarios where users might want to pass in callbacks instead of raw data. The first is the Quantile DMatrix used by the hist and GPU-based hist tree methods. In this case, the data is first compressed by quantile sketching and then merged. This is particularly useful in distributed settings as it eliminates 2 copies of data: one made by a concat from an external library to turn the data into a blob for normal DMatrix initialization, and another made by the internal CSR copy of the DMatrix.

The second use case is external memory support, where users can pass a custom data iterator into XGBoost for loading data in batches. In both cases, the iterator is only used during the construction of the DMatrix and can be safely freed after construction finishes. There are short notes on each use case in the respective DMatrix factory functions.

Related functions are:

Factory functions

Proxy that callers can use to pass data to XGBoost

Typedefs

typedef void *DataIterHandle

handle to an external data iterator

typedef void *DataHolderHandle

handle to an internal data holder.

typedef int XGBCallbackSetData(DataHolderHandle handle, XGBoostBatchCSR batch)

Callback to set the data to handle.

Param handle:

The handle to the callback.

Param batch:

The data content to be set.

typedef int XGBCallbackDataIterNext(DataIterHandle data_handle, XGBCallbackSetData *set_function, DataHolderHandle set_function_handle)

The data reading callback function. The iterator is able to return a subset (batch) of the data.

If there is data, the function will call set_function to set the data.

Param data_handle:

The handle to the callback.

Param set_function:

The batch returned by the iterator

Param set_function_handle:

The handle to be passed to set function.

Return:

0 if we are reaching the end and batch is not returned.

typedef int XGDMatrixCallbackNext(DataIterHandle iter)

Callback function prototype for getting next batch of data.

Param iter:

A handler to the user defined iterator.

Return:

0 when success, -1 when failure happens.

typedef void DataIterResetCallback(DataIterHandle handle)

Callback function prototype for resetting the external iterator.

Functions

int XGDMatrixCreateFromDataIter(DataIterHandle data_handle, XGBCallbackDataIterNext *callback, const char *cache_info, float missing, DMatrixHandle *out)

Create a DMatrix from a data iterator.

Parameters:
  • data_handle – The handle to the data.

  • callback – The callback to get the data.

  • cache_info – Additional information about cache file, can be null.

  • missing – Which value to represent missing value.

  • out – The created DMatrix

Returns:

0 when success, -1 when failure happens.

int XGProxyDMatrixCreate(DMatrixHandle *out)

Create a DMatrix proxy for setting data, can be freed by XGDMatrixFree.

This is part of the second set of callback functions, used for constructing a Quantile DMatrix or an external memory DMatrix with a custom iterator.

The DMatrix proxy is only a temporary reference (wrapper) to the actual user data. For instance, if a dense matrix (like a numpy array) is passed into the proxy DMatrix via the XGProxyDMatrixSetDataDense method, then the proxy DMatrix holds only a reference and the input array cannot be freed until the next iteration starts, signaled by a call to XGDMatrixCallbackNext by XGBoost. It’s called ProxyDMatrix because it reuses the interface of the DMatrix class in XGBoost, but it’s just an intermediate interface for XGDMatrixCreateFromCallback and related constructors to consume various user input types.

User inputs -> Proxy DMatrix (wrapper) -> Actual DMatrix

Parameters:

out – The created Proxy DMatrix.

Returns:

0 when success, -1 when failure happens.

int XGDMatrixCreateFromCallback(DataIterHandle iter, DMatrixHandle proxy, DataIterResetCallback *reset, XGDMatrixCallbackNext *next, char const *config, DMatrixHandle *out)

Create an external memory DMatrix with data iterator.

Short note for how to use second set of callback for external memory data support:

  • Step 0: Define a data iterator with 2 methods, reset and next.

  • Step 1: Create a DMatrix proxy by XGProxyDMatrixCreate and hold the handle.

  • Step 2: Pass the iterator handle, proxy handle and 2 methods into XGDMatrixCreateFromCallback, along with other parameters encoded as a JSON object.

  • Step 3: Call appropriate data setters in next functions.
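Step 0 can be sketched as a small struct plus two callbacks. Everything below is illustrative: BatchIter and its functions are hypothetical, the DataIterHandle typedef is repeated so the snippet compiles without the header, and the proxy setter call is only indicated in a comment. The return values of next (1 while a batch is available, 0 at the end) follow the convention used by the external memory demo and the language bindings as far as can be told; treat that as an assumption, since the typedef description above documents a 0 / -1 convention.

```c
#include <assert.h>
#include <stddef.h>

/* Repeated from c_api.h so the sketch compiles on its own. */
typedef void *DataIterHandle;

/* Hypothetical iterator state: a fixed list of in-memory batches. */
typedef struct {
  const float *const *batches; /* one pointer per batch */
  size_t n_batches;
  size_t cur;          /* index of the next batch to hand out */
  const float *active; /* batch currently exposed through the proxy */
} BatchIter;

/* Same shape as XGDMatrixCallbackNext. */
static int batch_iter_next(DataIterHandle handle) {
  BatchIter *it = (BatchIter *)handle;
  assert(it != NULL);
  if (it->cur == it->n_batches) {
    return 0; /* no more batches */
  }
  it->active = it->batches[it->cur++];
  /* Real code would now encode `it->active` as an array interface and call
   * a setter such as XGProxyDMatrixSetDataDense on the proxy handle. */
  return 1; /* a batch was provided */
}

/* Same shape as DataIterResetCallback. */
static void batch_iter_reset(DataIterHandle handle) {
  BatchIter *it = (BatchIter *)handle;
  assert(it != NULL);
  it->cur = 0;
  it->active = NULL;
}
```

A pointer to the BatchIter would be passed as iter, with batch_iter_reset and batch_iter_next as the reset and next callbacks. The active batch must stay valid until the following call to next, as described for the proxy DMatrix.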

Parameters:
  • iter – A handle to external data iterator.

  • proxy – A DMatrix proxy handle created by XGProxyDMatrixCreate.

  • reset – Callback function resetting the iterator state.

  • next – Callback function yielding the next batch of data.

  • config – JSON encoded parameters for DMatrix construction. Accepted fields are:

    • missing: Which value to represent missing value

    • cache_prefix: The path of cache file, caller must initialize all the directories in this path.

    • nthread (optional): Number of threads used for initializing DMatrix.

  • out – The created external memory DMatrix

Returns:

0 when success, -1 when failure happens

int XGQuantileDMatrixCreateFromCallback(DataIterHandle iter, DMatrixHandle proxy, DataIterHandle ref, DataIterResetCallback *reset, XGDMatrixCallbackNext *next, char const *config, DMatrixHandle *out)

Create a Quantile DMatrix with a data iterator.

Short note for how to use the second set of callback for (GPU)Hist tree method:

  • Step 0: Define a data iterator with 2 methods, reset and next.

  • Step 1: Create a DMatrix proxy by XGProxyDMatrixCreate and hold the handle.

  • Step 2: Pass the iterator handle, proxy handle and 2 methods into XGQuantileDMatrixCreateFromCallback.

  • Step 3: Call appropriate data setters in next functions.

See test_iterative_dmatrix.cu or Python interface for examples.

Parameters:
  • iter – A handle to external data iterator.

  • proxy – A DMatrix proxy handle created by XGProxyDMatrixCreate.

  • ref – Reference DMatrix for providing quantile information.

  • reset – Callback function resetting the iterator state.

  • next – Callback function yielding the next batch of data.

  • config – JSON encoded parameters for DMatrix construction. Accepted fields are:

    • missing: Which value to represent missing value

    • nthread (optional): Number of threads used for initializing DMatrix.

    • max_bin (optional): Maximum number of bins for building histogram. Must be consistent with the corresponding booster training parameter.

    • max_quantile_blocks (optional): For GPU-based inputs, XGBoost handles incoming batches with multiple growing substreams. This parameter sets the maximum number of batches before XGBoost can cut the sub-stream and create a new one. This can help bound the memory usage. By default, XGBoost grows new sub-streams exponentially until batches are exhausted. Only used for the training dataset and the default is None (unbounded).

  • out – The created Quantile DMatrix.

Returns:

0 when success, -1 when failure happens

int XGExtMemQuantileDMatrixCreateFromCallback(DataIterHandle iter, DMatrixHandle proxy, DataIterHandle ref, DataIterResetCallback *reset, XGDMatrixCallbackNext *next, char const *config, DMatrixHandle *out)

Create a Quantile DMatrix backed by external memory.

See Using XGBoost External Memory Version for more info.

Since

3.0.0

Note

This is experimental and subject to change.

Parameters:
  • iter – A handle to external data iterator.

  • proxy – A DMatrix proxy handle created by XGProxyDMatrixCreate.

  • ref – Reference DMatrix for providing quantile information.

  • reset – Callback function resetting the iterator state.

  • next – Callback function yielding the next batch of data.

  • config – JSON encoded parameters for DMatrix construction. Accepted fields are:

    • missing: Which value to represent missing value

    • cache_prefix: The path of cache file, caller must initialize all the directories in this path.

    • nthread (optional): Number of threads used for initializing DMatrix.

    • max_bin (optional): Maximum number of bins for building histogram. Must be consistent with the corresponding booster training parameter.

    • on_host (optional): Whether the data should be placed on host memory. Used by GPU inputs.

    • min_cache_page_bytes (optional): The minimum number of bytes for each internal GPU page. Set to 0 to disable page concatenation. Automatic configuration if the parameter is not provided or set to None.

    • max_quantile_blocks (optional): For GPU-based inputs, XGBoost handles incoming batches with multiple growing substreams. This parameter sets the maximum number of batches before XGBoost can cut the sub-stream and create a new one. This can help bound the memory usage. By default, XGBoost grows new sub-streams exponentially until batches are exhausted. Only used for the training dataset and the default is None (unbounded).

    • cache_host_ratio (optional): For GPU-based inputs, XGBoost can split the cache into host and device partitions to reduce the data transfer overhead. This parameter specifies the size of the host cache compared to the size of the entire cache: host / (host + device).

  • out – The created Quantile DMatrix.

Returns:

0 when success, -1 when failure happens

int XGProxyDMatrixSetDataCudaArrayInterface(DMatrixHandle handle, const char *data)

Set data on a DMatrix proxy.

Parameters:
  • handle – A DMatrix proxy created by XGProxyDMatrixCreate

  • data – Null terminated JSON document string representation of CUDA array interface.

Returns:

0 when success, -1 when failure happens

int XGProxyDMatrixSetDataColumnar(DMatrixHandle handle, char const *data)

Set columnar (table) data on a DMatrix proxy.

Parameters:
Returns:

0 when success, -1 when failure happens

int XGProxyDMatrixSetDataCudaColumnar(DMatrixHandle handle, const char *data)

Set CUDA-based columnar (table) data on a DMatrix proxy.

Parameters:
Returns:

0 when success, -1 when failure happens

int XGProxyDMatrixSetDataDense(DMatrixHandle handle, char const *data)

Set data on a DMatrix proxy.

Parameters:
  • handle – A DMatrix proxy created by XGProxyDMatrixCreate

  • data – Null terminated JSON document string representation of array interface.

Returns:

0 when success, -1 when failure happens

int XGProxyDMatrixSetDataCSR(DMatrixHandle handle, char const *indptr, char const *indices, char const *data, bst_ulong ncol)

Set data on a DMatrix proxy.

Parameters:
  • handle – A DMatrix proxy created by XGProxyDMatrixCreate

  • indptr – JSON encoded array_interface to row pointer in CSR.

  • indices – JSON encoded array_interface to column indices in CSR.

  • data – JSON encoded array_interface to values in CSR.

  • ncol – The number of columns of input CSR matrix.

Returns:

0 when success, -1 when failure happens

struct XGBoostBatchCSR
#include <c_api.h>

Mini batch used in XGBoost Data Iteration.

Booster

group Booster

The Booster class is the gradient-boosted model for XGBoost.

During training, the booster object has many caches for improved performance. In addition to gradient and prediction, it also includes runtime buffers like leaf partitions. These buffers persist with the Booster object until either XGBoosterReset() is called or the booster is deleted by XGBoosterFree().

Functions

int XGBoosterCreate(const DMatrixHandle dmats[], bst_ulong len, BoosterHandle *out)

Create an XGBoost learner (booster)

Parameters:
  • dmats – matrices that are set to be cached by the booster.

  • len – length of dmats

  • out – handle to the result booster

Returns:

0 when success, -1 when failure happens

int XGBoosterFree(BoosterHandle handle)

Delete the booster.

Parameters:

handle – The handle to be freed.

Returns:

0 when success, -1 when failure happens

int XGBoosterReset(BoosterHandle handle)

Reset the booster object to release data caches used for training.

Since

3.0.0

Returns:

0 when success, -1 when failure happens

int XGBoosterSlice(BoosterHandle handle, int begin_layer, int end_layer, int step, BoosterHandle *out)

Slice a model using boosting index. The slice m:n indicates taking all trees that were fit during the boosting rounds m, (m+1), (m+2), …, (n-1).

Parameters:
  • handle – Booster to be sliced.

  • begin_layer – start of the slice

  • end_layer – end of the slice; end_layer=0 is equivalent to end_layer=num_boost_round

  • step – step size of the slice

  • out – Sliced booster.

Returns:

0 when success, -1 when failure happens, -2 when index is out of bound.
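The slice semantics can be illustrated by listing which boosting rounds a given begin/end/step selects. This is only a model of the description above: slice_rounds is a hypothetical helper, and the treatment of step (every step-th round starting at begin) is an assumption based on conventional slice notation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * List the boosting rounds selected by a slice begin:end:step.
 * end == 0 stands for `total_rounds`. Writes at most `cap` round indices
 * to `out` and returns how many rounds the slice selects.
 */
static size_t slice_rounds(int begin, int end, int step, int total_rounds,
                           int *out, size_t cap) {
  assert(out != NULL && step > 0 && begin >= 0);
  if (end == 0) {
    end = total_rounds;
  }
  size_t count = 0;
  for (int r = begin; r < end; r += step) {
    if (count < cap) {
      out[count] = r;
    }
    ++count;
  }
  return count;
}
```

With 10 boosted rounds, begin_layer=2, end_layer=0 and step=3 would select rounds 2, 5 and 8 under this reading.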

int XGBoosterBoostedRounds(BoosterHandle handle, int *out)

Get number of boosted rounds from gradient booster. When process_type is update, this number might drop due to removed tree.

Parameters:
  • handle – Handle to booster.

  • out – Pointer to output integer.

Returns:

0 when success, -1 when failure happens

int XGBoosterSetParam(BoosterHandle handle, const char *name, const char *value)

set parameters

Parameters:
  • handle – handle

  • name – parameter name

  • value – value of parameter

Returns:

0 when success, -1 when failure happens

int XGBoosterGetNumFeature(BoosterHandle handle, bst_ulong *out)

get number of features

Parameters:
  • handle – Handle to booster.

  • out – number of features

Returns:

0 when success, -1 when failure happens

int XGBoosterUpdateOneIter(BoosterHandle handle, int iter, DMatrixHandle dtrain)

update the model in one round using dtrain

Parameters:
  • handle – handle

  • iter – current iteration rounds

  • dtrain – training data

Returns:

0 when success, -1 when failure happens

int XGBoosterBoostOneIter(BoosterHandle handle, DMatrixHandle dtrain, float *grad, float *hess, bst_ulong len)

Deprecated:

since 2.1.0

int XGBoosterTrainOneIter(BoosterHandle handle, DMatrixHandle dtrain, int iter, char const *grad, char const *hess)

Update a model with gradient and Hessian. This is used for training with a custom objective function.

Since

2.0.0

Parameters:
  • handle – handle

  • dtrain – The training data.

  • iter – The current iteration round. When training continuation is used, the count should restart.

  • grad – Json encoded __(cuda)_array_interface__ for gradient.

  • hess – Json encoded __(cuda)_array_interface__ for Hessian.

Returns:

0 when success, -1 when failure happens
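For a custom objective, the caller computes the gradient and Hessian arrays before each round. The sketch below does this for a plain squared-error loss; squared_error_grad_hess is a hypothetical helper, and how the predictions are obtained is outside its scope.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Squared-error objective, loss = 0.5 * (pred - label)^2 per sample:
 * gradient = pred - label, Hessian = 1.
 */
static void squared_error_grad_hess(const float *preds, const float *labels,
                                    size_t n, float *grad, float *hess) {
  assert(preds != NULL && labels != NULL && grad != NULL && hess != NULL);
  for (size_t i = 0; i < n; ++i) {
    grad[i] = preds[i] - labels[i];
    hess[i] = 1.0f;
  }
}
```

The two output arrays would then each be described by a JSON-encoded array interface and passed as the grad and hess arguments.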

int XGBoosterEvalOneIter(BoosterHandle handle, int iter, DMatrixHandle dmats[], const char *evnames[], bst_ulong len, const char **out_result)

get evaluation statistics for xgboost

Parameters:
  • handle – handle

  • iter – current iteration rounds

  • dmats – pointers to data to be evaluated

  • evnames – pointers to names of each data

  • len – length of dmats

  • out_result – the string containing evaluation statistics

Returns:

0 when success, -1 when failure happens

int XGBoosterDumpModel(BoosterHandle handle, const char *fmap, int with_stats, bst_ulong *out_len, const char ***out_dump_array)

Dump the model and return an array of strings representing the model dump.

Parameters:
  • handle – handle

  • fmap – name of the feature map file; can be an empty string

  • with_stats – whether to dump with statistics

  • out_len – length of output array

  • out_dump_array – pointer to hold the string dump of each model

Returns:

0 when success, -1 when failure happens

int XGBoosterDumpModelEx(BoosterHandle handle, const char *fmap, int with_stats, const char *format, bst_ulong *out_len, const char ***out_dump_array)

Dump the model and return an array of strings representing the model dump.

Parameters:
  • handle – handle

  • fmap – name of the feature map file; can be an empty string

  • with_stats – whether to dump with statistics

  • format – the format to dump the model in

  • out_len – length of output array

  • out_dump_array – pointer to hold the string dump of each model

Returns:

0 when success, -1 when failure happens

int XGBoosterDumpModelWithFeatures(BoosterHandle handle, int fnum, const char **fname, const char **ftype, int with_stats, bst_ulong *out_len, const char ***out_models)

Dump the model and return an array of strings representing the model dump.

Parameters:
  • handle – handle

  • fnum – number of features

  • fname – names of features

  • ftype – types of features

  • with_stats – whether to dump with statistics

  • out_len – length of output array

  • out_models – pointer to hold the string dump of each model

Returns:

0 when success, -1 when failure happens

int XGBoosterDumpModelExWithFeatures(BoosterHandle handle, int fnum, const char **fname, const char **ftype, int with_stats, const char *format, bst_ulong *out_len, const char ***out_models)

Dump the model and return an array of strings representing the model dump.

Parameters:
  • handle – handle

  • fnum – number of features

  • fname – names of features

  • ftype – types of features

  • with_stats – whether to dump with statistics

  • format – the format to dump the model in

  • out_len – length of output array

  • out_models – pointer to hold the string dump of each model

Returns:

0 when success, -1 when failure happens

int XGBoosterGetAttr(BoosterHandle handle, const char *key, const char **out, int *success)

Get string attribute from Booster.

Parameters:
  • handle – handle

  • key – The key of the attribute.

  • out – The result attribute; can be NULL if the attribute does not exist.

  • success – Whether the result is contained in out.

Returns:

0 when success, -1 when failure happens

int XGBoosterSetAttr(BoosterHandle handle, const char *key, const char *value)

Set or delete string attribute.

Parameters:
  • handle – handle

  • key – The key of the attribute.

  • value – The value to be saved. If NULL, the attribute is deleted.

Returns:

0 when success, -1 when failure happens

int XGBoosterGetAttrNames(BoosterHandle handle, bst_ulong *out_len, const char ***out)

Get the names of all attributes from Booster.

Parameters:
  • handle – handle

  • out_len – the argument to hold the output length

  • out – pointer to hold the output attribute strings

Returns:

0 when success, -1 when failure happens

int XGBoosterSetStrFeatureInfo(BoosterHandle handle, const char *field, const char **features, const bst_ulong size)

Set string encoded feature info in Booster, similar to the feature info in DMatrix.

Accepted fields are:

  • feature_name

  • feature_type

Parameters:
  • handle – An instance of Booster

  • field – Field name

  • features – Pointer to array of strings.

  • size – Size of the features pointer (number of strings passed in).

Returns:

0 when success, -1 when failure happens

int XGBoosterGetStrFeatureInfo(BoosterHandle handle, const char *field, bst_ulong *len, const char ***out_features)

Get string encoded feature info from Booster, similar to the feature info in DMatrix.

Accepted field names are:

  • feature_name

  • feature_type

Caller is responsible for copying out the data, before the next call to any API function of XGBoost.

Parameters:
  • handle – An instance of Booster

  • field – Field name

  • len – Size of the output pointer features (number of strings returned).

  • out_features – Address of a pointer to array of strings. Result is stored in thread local memory.

Returns:

0 when success, -1 when failure happens
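Since the returned strings live in thread-local memory and must be copied out before the next API call, a caller needs some form of deep copy. The following is one possible sketch; copy_string_array and free_string_array are hypothetical helper names, not part of the XGBoost API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: deep-copy `n` NUL-terminated strings into memory
 * owned by the caller. Returns NULL on allocation failure. */
char **copy_string_array(const char *const *src, size_t n) {
  assert(src != NULL || n == 0);
  char **dst = calloc(n ? n : 1, sizeof *dst);
  if (dst == NULL) return NULL;
  for (size_t i = 0; i < n; ++i) {
    size_t len = strlen(src[i]) + 1; /* include the terminator */
    dst[i] = malloc(len);
    if (dst[i] == NULL) {
      /* Roll back the strings copied so far. */
      while (i > 0) free(dst[--i]);
      free(dst);
      return NULL;
    }
    memcpy(dst[i], src[i], len);
  }
  return dst;
}

/* Release an array produced by copy_string_array(). */
void free_string_array(char **arr, size_t n) {
  if (arr == NULL) return;
  for (size_t i = 0; i < n; ++i) free(arr[i]);
  free(arr);
}
```

After a successful XGBoosterGetStrFeatureInfo call, passing out_features and len to such a helper yields names that remain valid across later XGBoost calls.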

int XGBoosterFeatureScore(BoosterHandle handle, const char *config, bst_ulong *out_n_features, char const ***out_features, bst_ulong *out_dim, bst_ulong const **out_shape, float const **out_scores)

Calculate feature scores for tree models. When used on a linear model, only the weight importance type is defined, and the output scores form a row-major matrix with shape [n_features, n_classes] for a multi-class model. For a tree model, out_n_features is always equal to the number of output scores, and multiple definitions of importance type are available.

Parameters:
  • handle – An instance of Booster

  • config – Parameters for computing scores encoded as JSON. Accepted JSON keys are:

    • importance_type: A JSON string with following possible values:

      • ’weight’: the number of times a feature is used to split the data across all trees.

      • ’gain’: the average gain across all splits the feature is used in.

      • ’cover’: the average coverage across all splits the feature is used in.

      • ’total_gain’: the total gain across all splits the feature is used in.

      • ’total_cover’: the total coverage across all splits the feature is used in.

    • feature_map: An optional JSON string with URI or path to the feature map file.

    • feature_names: An optional JSON array with string names for each feature.

  • out_n_features – Length of output feature names.

  • out_features – An array of string as feature names, ordered the same as output scores.

  • out_dim – Dimension of output feature scores.

  • out_shape – Shape of output feature scores with length of out_dim.

  • out_scores – An array of floating point as feature scores with shape of out_shape.

Returns:

0 when success, -1 when failure happens

Prediction

group Prediction

These functions are used for running prediction and explanation algorithms.

Functions

int XGBoosterPredict(BoosterHandle handle, DMatrixHandle dmat, int option_mask, unsigned ntree_limit, int training, bst_ulong *out_len, const float **out_result)

Make prediction based on dmat.

Deprecated:

Use XGBoosterPredictFromDMatrix instead.

Parameters:
  • handle – handle

  • dmat – data matrix

  • option_mask – bit-mask of options taken in prediction. Possible values:

    • 0: normal prediction

    • 1: output margin instead of transformed value

    • 2: output leaf index of trees instead of leaf value; note that the leaf index is unique per tree

    • 4: output feature contributions to individual predictions

  • ntree_limit – limit on the number of trees used for prediction; this is only valid for boosted trees. When the parameter is set to 0, all the trees are used.

  • training – Whether the prediction function is used as part of a training loop. Prediction can be run in 2 scenarios:

    1. Given data matrix X, obtain prediction y_pred from the model.

    2. Obtain the prediction for computing gradients. For example, DART booster performs dropout during training, and the prediction result will be different from the one obtained by normal inference step due to dropped trees.

    Set training=false for the first scenario. Set training=true for the second scenario. The second scenario applies when you are defining a custom objective function.

  • out_len – used to store length of returning result

  • out_result – used to set a pointer to array

Returns:

0 when success, -1 when failure happens

int XGBoosterPredictFromDMatrix(BoosterHandle handle, DMatrixHandle dmat, char const *config, bst_ulong const **out_shape, bst_ulong *out_dim, float const **out_result)

Make prediction from DMatrix, replacing XGBoosterPredict.

The config parameter is a JSON object with the following fields:

  • "type": int in the range [0, 6]

    • 0: normal prediction

    • 1: output margin

    • 2: predict contribution

    • 3: predict approximated contribution

    • 4: predict feature interaction

    • 5: predict approximated feature interaction

    • 6: predict leaf

  • "training": bool. Whether the prediction function is used as part of a training loop. Not used for inplace prediction.

    Prediction can be run in 2 scenarios:

    1. Given data matrix X, obtain prediction y_pred from the model.

    2. Obtain the prediction for computing gradients. For example, DART booster performs dropout during training, and the prediction result will be different from the one obtained by normal inference step due to dropped trees.

    Set training=false for the first scenario. Set training=true for the second scenario. The second scenario applies when you are defining a custom objective function.

  • "iteration_begin": int. Beginning iteration of prediction.

  • "iteration_end": int. End iteration of prediction. When set to 0, this becomes the size of the tree model (all the trees).

  • "strict_shape": bool. Whether to reshape the output with stricter rules. If set to true, normal/margin/contrib/interaction prediction outputs a consistent shape regardless of whether a multi-class model is used, and leaf prediction outputs a 4-dim array representing: (n_samples, n_iterations, n_classes, n_trees_in_forest)

Example JSON input for running a normal prediction with strict output shape, 2 dim for softprob, 1 dim for others.

{"type":0,"training":false,"iteration_begin":0,"iteration_end":0,"strict_shape":true}

Parameters:
  • handle – Booster handle

  • dmat – DMatrix handle

  • config – String encoded predict configuration in JSON format; the available fields of the JSON object are described above.

  • out_shape – Shape of output prediction (copy before use).

  • out_dim – Dimension of output prediction.

  • out_result – Buffer storing prediction value (copy before use).

Returns:

0 when success, -1 when failure happens
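Two small pieces of glue are typically needed around this function: building the JSON config string, and working out how many floats out_result holds from out_shape and out_dim so the result can be copied. The sketch below shows one way to do both. The helper names make_predict_config and prediction_size are hypothetical, and bst_ulong is assumed to be a 64-bit unsigned integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the prediction config as JSON.
 * Returns the number of characters written, or -1 when `out` is too small. */
int make_predict_config(int type, int training, int iteration_begin,
                        int iteration_end, int strict_shape,
                        char *out, size_t out_size) {
  assert(out != NULL);
  assert(type >= 0 && type <= 6); /* range documented for "type" */
  int n = snprintf(out, out_size,
                   "{\"type\":%d,\"training\":%s,\"iteration_begin\":%d,"
                   "\"iteration_end\":%d,\"strict_shape\":%s}",
                   type, training ? "true" : "false", iteration_begin,
                   iteration_end, strict_shape ? "true" : "false");
  return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Hypothetical helper: number of floats in the result, i.e. the product of
 * the first `dim` entries of `shape`. */
uint64_t prediction_size(const uint64_t *shape, uint64_t dim) {
  assert(shape != NULL || dim == 0);
  uint64_t total = 1;
  for (uint64_t i = 0; i < dim; ++i) total *= shape[i];
  return total;
}
```

With the arguments (0, 0, 0, 0, 1) the first helper reproduces the example JSON shown above. Since out_shape and out_result are both marked "copy before use", the element count from prediction_size can size the destination buffer for a memcpy.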

int XGBoosterPredictFromDense(BoosterHandle handle, char const *values, char const *config, DMatrixHandle m, bst_ulong const **out_shape, bst_ulong *out_dim, const float **out_result)

Inplace prediction from CPU dense matrix.

Note

If the booster is configured to run on a CUDA device, XGBoost falls back to run prediction with DMatrix with a performance warning.

Returns:

0 when success, -1 when failure happens

int XGBoosterPredictFromColumnar(BoosterHandle handle, char const *values, char const *config, DMatrixHandle m, bst_ulong const **out_shape, bst_ulong *out_dim, const float **out_result)

Inplace prediction from CPU columnar data (table).

Note

If the booster is configured to run on a CUDA device, XGBoost falls back to run prediction with DMatrix with a performance warning.

Returns:

0 when success, -1 when failure happens

int XGBoosterPredictFromCSR(BoosterHandle handle, char const *indptr, char const *indices, char const *values, bst_ulong ncol, char const *config, DMatrixHandle m, bst_ulong const **out_shape, bst_ulong *out_dim, const float **out_result)

Inplace prediction from CPU CSR matrix.

Note

If the booster is configured to run on a CUDA device, XGBoost falls back to run prediction with DMatrix with a performance warning.

Parameters:
  • handle – Booster handle.

  • indptr – JSON encoded __array_interface__ to row pointer in CSR.

  • indices – JSON encoded __array_interface__ to column indices in CSR.

  • values – JSON encoded __array_interface__ to values in CSR.

  • ncol – Number of features in data.

  • config – See XGBoosterPredictFromDMatrix for more info. Additional fields for inplace prediction are:

    • "missing": float

  • m – An optional (NULL if not available) proxy DMatrix instance storing meta info.

  • out_shape – See XGBoosterPredictFromDMatrix for more info.

  • out_dim – See XGBoosterPredictFromDMatrix for more info.

  • out_result – See XGBoosterPredictFromDMatrix for more info.

Returns:

0 when success, -1 when failure happens
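For readers less familiar with the CSR layout behind indptr, indices and values, the sketch below builds the three arrays from a small row-major dense matrix. dense_to_csr is a hypothetical helper, and the integer widths are illustrative choices. Zeros are dropped here purely to show the layout; whether an absent entry should mean "zero" or "missing" depends on the data and on the "missing" config field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a row-major dense matrix to CSR, skipping
 * exact zeros. `indptr` must hold n_rows + 1 entries; `indices` and `values`
 * must hold up to n_rows * n_cols entries. Returns the number of stored
 * values. */
size_t dense_to_csr(const float *dense, size_t n_rows, size_t n_cols,
                    uint64_t *indptr, uint32_t *indices, float *values) {
  assert(dense != NULL && indptr != NULL);
  assert(indices != NULL && values != NULL);
  size_t nnz = 0;
  indptr[0] = 0;
  for (size_t r = 0; r < n_rows; ++r) {
    for (size_t c = 0; c < n_cols; ++c) {
      float v = dense[r * n_cols + c];
      if (v != 0.0f) {
        indices[nnz] = (uint32_t)c; /* column of this stored value */
        values[nnz] = v;
        ++nnz;
      }
    }
    /* Row r occupies entries [indptr[r], indptr[r + 1]). */
    indptr[r + 1] = nnz;
  }
  return nnz;
}
```

Each of the three arrays would then be described to XGBoosterPredictFromCSR through its own JSON-encoded array interface string.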

int XGBoosterPredictFromCudaArray(BoosterHandle handle, char const *values, char const *config, DMatrixHandle proxy, bst_ulong const **out_shape, bst_ulong *out_dim, const float **out_result)

Inplace prediction from CUDA dense matrix (CuPy in Python).

Note

If the booster is configured to run on a CPU, XGBoost falls back to run prediction with DMatrix with a performance warning.

Returns:

0 when success, -1 when failure happens

int XGBoosterPredictFromCudaColumnar(BoosterHandle handle, char const *data, char const *config, DMatrixHandle proxy, bst_ulong const **out_shape, bst_ulong *out_dim, const float **out_result)

Inplace prediction from CUDA dense dataframe (cuDF in Python).

Note

If the booster is configured to run on a CPU, XGBoost falls back to run prediction with DMatrix with a performance warning.

Returns:

0 when success, -1 when failure happens

Serialization

group Serialization

There are multiple ways to serialize a Booster object depending on the use case.

A short note on the serialization APIs: there are 3 different sets.

  • Functions with the term “Model” handle saving/loading an XGBoost model such as trees or linear weights, stripping out parameter configuration like training algorithms or CUDA device ID. These functions are designed to let users reuse the trained model for different tasks; examples are prediction, training continuation, and model interpretation.

  • Functions with the term “Config” handle saving/loading configuration. They help users study the internals of XGBoost. Users can also use the load method to specify parameters in a structured way. These functions were introduced in 1.0.0.

  • Functions with the term “Serialization” are a combination of the above two. They are used in situations like checkpointing, or continuing a training task in a distributed environment. In these cases the task must be carried out without any user intervention.

Functions

int XGBoosterLoadModel(BoosterHandle handle, const char *fname)

Load the model from an existing file.

Parameters:
  • handle – handle

  • fname – File name. The string must be UTF-8 encoded.

Returns:

0 when success, -1 when failure happens

int XGBoosterSaveModel(BoosterHandle handle, const char *fname)

Save the model into an existing file.

Parameters:
  • handle – handle

  • fname – File name. The string must be UTF-8 encoded.

Returns:

0 when success, -1 when failure happens

int XGBoosterLoadModelFromBuffer(BoosterHandle handle, const void *buf, bst_ulong len)

Load the model from an in-memory buffer.

Parameters:
  • handle – handle

  • buf – pointer to the buffer

  • len – the length of the buffer

Returns:

0 when success, -1 when failure happens

int XGBoosterSaveModelToBuffer(BoosterHandle handle, char const *config, bst_ulong *out_len, char const **out_dptr)

Save the model into raw bytes and return the header of the array. The user must copy the result out before the next XGBoost call.

Parameters:
  • handle – handle

  • config – JSON encoded string storing parameters for the function. Following keys are expected in the JSON document:

    • "format": str

      • json: Output booster will be encoded as JSON.

      • ubj: Output booster will be encoded as Universal binary JSON.

      • deprecated: Output booster will be encoded as old custom binary format. Do not use this format except for compatibility reasons.

  • out_len – The argument to hold the output length

  • out_dptr – The argument to hold the output data pointer

Returns:

0 when success, -1 when failure happens
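Because the returned bytes must be copied out before the next XGBoost call, and a serialized model may contain embedded NUL bytes, the copy should be length-based rather than string-based. A minimal sketch, where copy_model_bytes is a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `len` raw bytes into caller-owned memory.
 * memcpy is used instead of strcpy because the data is not a C string and
 * may contain NUL bytes. Returns NULL on allocation failure; release the
 * result with free(). */
char *copy_model_bytes(const char *dptr, size_t len) {
  assert(dptr != NULL || len == 0);
  char *copy = malloc(len ? len : 1);
  if (copy != NULL && len > 0) {
    memcpy(copy, dptr, len);
  }
  return copy;
}
```

The same pattern would apply to the buffer returned by XGBoosterSerializeToBuffer.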

int XGBoosterSerializeToBuffer(BoosterHandle handle, bst_ulong *out_len, const char **out_dptr)

Memory snapshot based serialization method. Saves all state into a buffer.

Parameters:
  • handle – handle

  • out_len – the argument to hold the output length

  • out_dptr – the argument to hold the output data pointer

Returns:

0 when success, -1 when failure happens

int XGBoosterUnserializeFromBuffer(BoosterHandle handle, const void *buf, bst_ulong len)

Memory snapshot based serialization method. Loads the buffer returned from XGBoosterSerializeToBuffer.

Parameters:
  • handle – handle

  • buf – pointer to the buffer

  • len – the length of the buffer

Returns:

0 when success, -1 when failure happens

int XGBoosterSaveJsonConfig(BoosterHandle handle, bst_ulong *out_len, char const **out_str)

Save XGBoost’s internal configuration into a JSON document. Support is currently experimental; the function signature may change in the future without notice.

Parameters:
  • handle – handle to Booster object.

  • out_len – length of output string

  • out_str – A valid pointer to array of characters. The characters array is allocated and managed by XGBoost, while pointer to that array needs to be managed by caller.

Returns:

0 when success, -1 when failure happens

int XGBoosterLoadJsonConfig(BoosterHandle handle, char const *config)

Load XGBoost’s internal configuration from a JSON document. Support is currently experimental; the function signature may change in the future without notice.

Parameters:
  • handle – handle to Booster object.

  • config – string representation of a JSON document.

Returns:

0 when success, -1 when failure happens

Collective

group Collective

Experimental support for exposing internal communicator in XGBoost.

The collective communicator in XGBoost evolved from the rabit project of dmlc but has changed significantly since its adoption. It consists of a tracker and a set of workers. The tracker is responsible for bootstrapping the communication group and handling centralized tasks like logging. The workers are actual communicators performing collective tasks like allreduce.

To use the collective implementation, one needs to first create a tracker with corresponding parameters, then get the arguments for workers using XGTrackerWorkerArgs(). The obtained arguments can then be passed to the XGCommunicatorInit() function. A call to XGCommunicatorInit() must be accompanied by an XGCommunicatorFinalize() call for cleanup. Please note that the communicator uses std::thread in C++, which has undefined behavior in a C++ destructor due to the runtime shutdown sequence. It’s preferable to call XGCommunicatorFinalize() before the runtime shuts down. This requirement is similar to a Python thread or socket, which should not be relied upon in a __del__ function.

Since it’s used as a part of XGBoost, errors will be returned when an XGBoost function is called; for instance, training a booster might return a connection error.

Note

This is still under development.

Typedefs

typedef void *TrackerHandle

Handle to the tracker.

There are currently two types of tracker in XGBoost: rabit and federated. rabit is used for normal collective communication, while federated is used for federated learning.

Functions

int XGTrackerCreate(char const *config, TrackerHandle *handle)

Create a new tracker. Accepted JSON keys in config are:

  • dmlc_communicator: String, the type of tracker to create. Available options are rabit and federated. See TrackerHandle for more info.

  • n_workers: Integer, the number of workers.

  • port: (Optional) Integer, the port this tracker should listen to.

  • timeout: (Optional) Integer, timeout in seconds for various networking operations. Default is 300 seconds.

Some configurations are rabit specific:

  • host: (Optional) String, used by the rabit tracker to specify the address of the host. This can be useful when the communicator cannot reliably obtain the host address.

  • sortby: (Optional) Integer.

    • 0: Sort workers by their host name.

    • 1: Sort workers by task IDs.

Some configurations are federated specific:

  • federated_secure: Boolean, whether this is a secure server. False for testing.

  • server_key_path: Path to the server key. Used only if this is a secure server.

  • server_cert_path: Path to the server certificate. Used only if this is a secure server.

  • client_cert_path: Path to the client certificate. Used only if this is a secure server.

Parameters:
  • config – JSON encoded parameters.

  • handle – The handle to the created tracker.

Returns:

0 when success, -1 when failure happens
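The tracker is configured through a JSON string. As a rough sketch, the helper below formats a config that uses only the dmlc_communicator, n_workers and timeout keys listed above. make_tracker_config is a hypothetical helper, and it assumes the communicator name is a plain identifier such as rabit that needs no JSON escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a tracker config as JSON.
 * Assumption: `communicator` contains no characters that need escaping.
 * Returns the number of characters written, or -1 when `out` is too small. */
int make_tracker_config(const char *communicator, int n_workers,
                        int timeout_sec, char *out, size_t out_size) {
  assert(communicator != NULL && out != NULL);
  assert(n_workers > 0 && timeout_sec > 0);
  int n = snprintf(out, out_size,
                   "{\"dmlc_communicator\": \"%s\", \"n_workers\": %d, "
                   "\"timeout\": %d}",
                   communicator, n_workers, timeout_sec);
  return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The resulting string would be passed as the config argument of XGTrackerCreate; optional keys such as port or host could be appended in the same way.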

int XGTrackerWorkerArgs(TrackerHandle handle, char const **args)

Get the arguments needed for running workers. This should be called after XGTrackerRun().

Parameters:
  • handle – The handle to the tracker.

  • args – The arguments returned as a JSON document.

Returns:

0 when success, -1 when failure happens

int XGTrackerRun(TrackerHandle handle, char const *config)

Start the tracker. The tracker runs in the background and this function returns once the tracker is started.

Parameters:
  • handle – The handle to the tracker.

  • config – Unused at the moment, preserved for the future.

Returns:

0 when success, -1 when failure happens

int XGTrackerWaitFor(TrackerHandle handle, char const *config)

Wait for the tracker to finish. This should be called after XGTrackerRun(). This function will block until the tracker task is finished or the timeout is reached.

Parameters:
  • handle – The handle to the tracker.

  • config – JSON encoded configuration. No argument is required yet, preserved for the future.

Returns:

0 when success, -1 when failure happens

int XGTrackerFree(TrackerHandle handle)

Free a tracker instance. This should be called after XGTrackerWaitFor(). If the tracker has not been properly waited for, this function will shut down all connections with the tracker, potentially leading to undefined behavior.

Parameters:

handle – The handle to the tracker.

Returns:

0 when success, -1 when failure happens

int XGCommunicatorInit(char const *config)

Initialize the collective communicator.

The communicator API is currently experimental; function signatures may change in the future without notice.

Call this once in the worker process before using anything. Please make sure XGCommunicatorFinalize() is called after use. The initialized communicator is a global thread-local variable.

Only applicable to the rabit communicator:

  • dmlc_tracker_uri: Hostname or IP address of the tracker.

  • dmlc_tracker_port: Port number of the tracker.

  • dmlc_task_id: ID of the current task, can be used to obtain deterministic rank assignment.

  • dmlc_retry: The number of retries for connection failure.

  • dmlc_timeout: Timeout in seconds.

  • dmlc_nccl_path: Path to the nccl shared library libnccl.so.

Only applicable to the federated communicator (use upper case for environment variables, use lower case for runtime configuration):

  • federated_server_address: Address of the federated server.

  • federated_world_size: Number of federated workers.

  • federated_rank: Rank of the current worker.

  • federated_server_cert_path: Server certificate file path. Only needed for the SSL mode.

  • federated_client_key_path: Client key file path. Only needed for the SSL mode.

  • federated_client_cert_path: Client certificate file path. Only needed for the SSL mode.

Parameters:

config – JSON encoded configuration. Accepted JSON keys are:

  • dmlc_communicator: The type of the communicator, this should match the tracker type.

    • rabit: Use Rabit. This is the default if the type is unspecified.

    • federated: Use the gRPC interface for Federated Learning.

Returns:

0 when success, -1 when failure happens

int XGCommunicatorFinalize(void)

Finalize the collective communicator.

Call this function after you have finished all jobs.

Returns:

0 when success, -1 when failure happens

int XGCommunicatorGetRank(void)

Get rank of the current process.

Returns:

Rank of the worker.

int XGCommunicatorGetWorldSize(void)

Get the total number of processes.

Returns:

Total world size.

int XGCommunicatorIsDistributed(void)

Check whether the communicator is distributed.

Returns:

True if the communicator is distributed.

int XGCommunicatorPrint(char const *message)

Print the message to the tracker.

This function can be used to communicate the information of the progress to the user who monitors the tracker.

Parameters:

message – The message to be printed.

Returns:

0 when success, -1 when failure happens

int XGCommunicatorGetProcessorName(const char **name_str)

Get the name of the processor.

Parameters:

name_str – Pointer to receive the returned processor name.

Returns:

0 when success, -1 when failure happens

int XGCommunicatorBroadcast(void *send_receive_buffer, size_t size, int root)

Broadcast a memory region to all others from root. This function is NOT thread-safe.

Example:

    int a = 1;
    Broadcast(&a, sizeof(a), root);

Parameters:
  • send_receive_buffer – Pointer to the send or receive buffer.

  • size – Size of the data in bytes.

  • root – The process rank to broadcast from.

Returns:

0 when success, -1 when failure happens

int XGCommunicatorAllreduce(void *send_receive_buffer, size_t count, int data_type, int op)

Perform in-place allreduce. This function is NOT thread-safe.

Example usage: the following code gives the sum of the result.

    enum class Op { kMax = 0, kMin = 1, kSum = 2, kBitwiseAND = 3, kBitwiseOR = 4, kBitwiseXOR = 5 };
    std::vector<int> data(10);
    ...
    Allreduce(data.data(), data.size(), DataType::kInt32, Op::kSum);
    ...

Parameters:
  • send_receive_buffer – Buffer for both sending and receiving data.

  • count – Number of elements to be reduced.

  • data_type – Enumeration of data type, see xgboost::collective::DataType in communicator.h.

  • op – Enumeration of operation type, see xgboost::collective::Operation in communicator.h.

Returns:

0 when success, -1 when failure happens
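To make the in-place semantics concrete, the sketch below models a sum allreduce inside a single process: each "worker" is just a local buffer, and after the call every buffer holds the element-wise sum. This is only an illustration of the expected result. It involves no networking and does not call the XGBoost communicator; simulate_allreduce_sum is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single-process model of an in-place sum allreduce.
 * `buffers` holds one pointer per simulated worker, each to `count` int32
 * values. Afterwards every buffer contains the element-wise sum. */
void simulate_allreduce_sum(int32_t **buffers, size_t n_workers,
                            size_t count) {
  assert(buffers != NULL || n_workers == 0);
  for (size_t i = 0; i < count; ++i) {
    int32_t total = 0;
    /* Reduce: combine element i from every worker. */
    for (size_t w = 0; w < n_workers; ++w) total += buffers[w][i];
    /* Broadcast: write the combined value back to every worker. */
    for (size_t w = 0; w < n_workers; ++w) buffers[w][i] = total;
  }
}
```

In the real API each worker passes only its own buffer, and the same buffer serves as both input and output, which is why the parameter is named send_receive_buffer.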