3.0.3 Patch Release (Jul 30 2025)
Fix NDCG metric with non-exp gain. (#11534)
Avoid using mean intercept for rmsle. (#11588)
[jvm-packages] add setNumEarlyStoppingRounds API (#11571)
Avoid implicit synchronization in GPU evaluation. (#11542)
Remove CUDA check in the array interface handler (#11386)
Fix check in GPU histogram. (#11574)
Support Rapids 25.06 (#11504)
Adding enable_categorical to the sklearn apply method (#11550)
Make xgboost.testing compatible with scikit-learn 1.7 (#11502)
Add support for building xgboost wheels on Win-ARM64 (#11572,#11597,#11559)
3.0.2 Patch Release (May 25 2025)
3.0.1 Patch Release (May 13 2025)
Use nvidia-smi to detect the driver version and handle old drivers that don’t support virtual memory. (#11391)
Optimize deep trees for GPU external memory. (#11387)
Small fix for page concatenation with external memory (#11338)
Build xgboost-cpu for manylinux_2_28_x86_64 (#11406)
Workaround for different Dask versions (#11436)
Output models now use denormal floating-point values instead of nan. (#11428)
Fix aarch64 CI. (#11454)
3.0.0 (2025 Feb 27)
3.0.0 is a milestone for XGBoost. This note will summarize some general changes and then list package-specific updates. The bump in the major version is for a reworked R package along with a significant update to the JVM packages.
External Memory Support
This release features a major update to the external memory implementation, with improved performance, a new ExtMemQuantileDMatrix for more efficient data initialization, and new feature coverage including categorical data support and quantile regression support. Additionally, GPU-based external memory is reworked to support using CPU memory as a data cache. Last but not least, we worked on distributed training using external memory, along with initial support in the Spark package.
A new ExtMemQuantileDMatrix class for fast data initialization with the hist tree method. The new class supports both CPU and GPU training; a minimal iterator-based example is sketched at the end of this section. (#10689,#10682,#10886,#10860,#10762,#10694,#10876)
External memory now supports distributed training (#10492,#10861). In addition, the Spark package can use external memory (the host memory) when the device is GPU. The default package on Maven doesn’t support RMM yet. For better performance, one needs to compile XGBoost from source for now. (#11186,#11238,#11219)
Improved performance with new optimizations for both the hist-specific training and the approx (DMatrix) method. (#10529,#10980,#10342)
New demos and documents for external memory, including distributed training. (#11234,#10929,#10916,#10426,#11113)
Reduced binary cache size and memory allocation overhead by not writing the cut matrix. (#10444)
More feature coverage, including categorical data and all objective functions (including quantile regression). In addition, various prediction types like SHAP values are supported. (#10918,#10820,#10751,#10724)
Significant updates for the GPU-based external memory training implementation. (#10924,#10895,#10766,#10544,#10677,#10615,#10927,#10608,#10711)
GPU-based external memory supports both batch-based and sampling-based training. Before the 3.0 release, XGBoost concatenated the data during training and stored the cache on disk. In 3.0, XGBoost can now stage the data on the host and fetch it by batch. (#10602,#10595,#10606,#10549,#10488,#10766,#10765,#10764,#10760,#10753,#10734,#10691,#10713,#10826,#10811,#10810,#10736,#10538,#11333)
XGBoost can now utilize NVLink-C2C for GPU-based external memory training and can handle up to terabytes of data.
Support prediction cache (#10707).
Automatic page concatenation for improved GPU utilization (#10887).
Improved quantile sketching algorithm for batch-based inputs. See the Features section below for more info.
Optimization for nearly-dense inputs; see the Optimization section below for more info.
See our latest document, Using XGBoost External Memory Version, for details. The PyPI package (pip install) doesn’t have RMM support, which is required by the GPU external memory implementation. To experiment, you can compile XGBoost from source or wait for the RAPIDS conda package to be available.
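To make the new interface concrete, here is a minimal sketch of iterator-based construction with ExtMemQuantileDMatrix. The Batches iterator and the synthetic in-memory data are illustrative only; a real workload would load each batch from disk so that only one batch is resident at a time.

    import numpy as np
    import xgboost

    class Batches(xgboost.DataIter):
        """Yield pre-generated batches; a real setup would load each batch from disk."""

        def __init__(self, batches):
            self._batches = batches
            self._it = 0
            # cache_prefix controls where the external-memory cache is written.
            super().__init__(cache_prefix="cache")

        def next(self, input_data):
            if self._it == len(self._batches):
                return False  # no more batches
            X, y = self._batches[self._it]
            input_data(data=X, label=y)
            self._it += 1
            return True

        def reset(self):
            self._it = 0

    rng = np.random.default_rng(0)
    batches = [(rng.normal(size=(2048, 16)), rng.normal(size=2048)) for _ in range(4)]

    # Quantile-based DMatrix backed by the external-memory cache; used with the hist tree method.
    Xy = xgboost.ExtMemQuantileDMatrix(Batches(batches))
    booster = xgboost.train({"tree_method": "hist", "device": "cpu"}, Xy, num_boost_round=8)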
Networking
Continuing the work from the previous release, we updated the network module to improve reliability. (#10453,#10756,#11111,#10914,#10828,#10735,#10693,#10676,#10349,#10397,#10566,#10526)
The timeout option is now supported for NCCL using the NCCL asynchronous mode (#10850,#10934,#10945,#10930).
In addition, a new Config class is added for users to specify various options, including timeout and tracker port, for distributed training. Both the Dask interface and the PySpark interface support the new configuration. (#11003,#10281,#10983,#10973)
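As a rough illustration, the snippet below sketches how the new configuration might be passed through the Dask interface. The field names (retry, timeout, tracker_port) and the coll_cfg keyword are assumptions based on the 3.0 API and should be checked against the collective and Dask documentation.

    from dask import array as da
    from dask.distributed import Client
    from xgboost import collective, dask as dxgb

    # Field and keyword names below are assumptions; consult the collective.Config docs.
    cfg = collective.Config(retry=2, timeout=300, tracker_port=9099)

    client = Client(n_workers=2)
    X = da.random.random((100_000, 16), chunks=(10_000, 16))
    y = da.random.random(100_000, chunks=(10_000,))
    dtrain = dxgb.DaskDMatrix(client, X, y)

    # Pass the collective configuration alongside the usual training arguments.
    out = dxgb.train(client, {"tree_method": "hist"}, dtrain, num_boost_round=20, coll_cfg=cfg)
    booster = out["booster"]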
SYCL
Continuing the work on the SYCL integration, there are significant improvements in the feature coverage for this release, from more training parameters and more objectives to distributed training, along with various optimizations (#10884,#10883).
Starting with 3.0, the SYCL plugin is close to feature-complete, and users can start working on SYCL devices for in-core training and inference; a minimal usage sketch follows this list. Newly introduced features include:
Dask support for distributed training (#10812)
Various training procedures, including split evaluation (#10605,#10636), grow policy (#10690,#10681), and cached prediction (#10701).
Updates for objective functions. (#11029,#10931,#11016,#10993,#11064,#10325)
Ongoing work for float32-only devices. (#10702)
Other related PRs (#10842,#10543,#10806,#10943,#10987,#10548,#10922,#10898,#10576)
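Assuming an XGBoost build with the SYCL plugin enabled, device selection is expected to go through the device parameter, much like CUDA. The sketch below is illustrative only; the exact device strings are listed in the parameter reference.

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 10)).astype(np.float32)
    y = (X[:, 0] > 0).astype(np.float32)
    dtrain = xgb.DMatrix(X, label=y)

    # "sycl" selects the SYCL plugin; "sycl:cpu" / "sycl:gpu" pick a specific device type.
    params = {"device": "sycl", "tree_method": "hist", "objective": "binary:logistic"}
    booster = xgb.train(params, dtrain, num_boost_round=10)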
Features
This section describes new features in the XGBoost core. For language-specific features, please visit the corresponding sections.
A new initialization method for objectives that are derived from GLM. The new method is based on the mean value of the input labels and changes the result of the estimated base_score. (#10298,#11331)
The xgboost.QuantileDMatrix can be used with all prediction types for both CPU and GPU.
In prior releases, XGBoost made a copy of the booster to release memory held by internal tree methods. We formalized the procedure into a new booster method, reset()/XGBoosterReset(); a short sketch appears at the end of this section. (#11042)
OpenMP thread setting is exposed to the XGBoost global configuration. Users can use it to work around hardcoded OpenMP environment variables. (#11175)
We improved learning to rank tasks for better hyper-parameter configuration and for distributed training.
In 3.0, all three distributed interfaces, including Dask, Spark, and PySpark, support sorting the data based on query ID. The option for the DaskXGBRanker is true by default and can be opted out. (#11146,#11007,#11047,#11012,#10823,#11023)
Also for learning to rank, a new parameter lambdarank_score_normalization is introduced to make one of the normalizations optional. (#11272)
The lambdarank_normalization now uses the number of pairs when normalizing the mean pair strategy. Previously, the gradient was used for both topk and mean. (#11322)
We have improved GPU quantile sketching to reduce memory usage. The improvement helps the construction of the QuantileDMatrix and the new ExtMemQuantileDMatrix. A new multi-level sketching algorithm is employed to reduce the overall memory usage with batched inputs.
In addition to algorithmic changes, internal memory usage estimation and the quantile container are also updated. (#10761,#10843)
The change introduces two more parameters for the QuantileDMatrix and DataIter, namely max_quantile_batches and min_cache_page_bytes.
More work is needed to improve the support of categorical features. This release supports plotting trees with stat for categorical nodes (#11053). In addition, some preparation work is ongoing for auto re-coding categories. (#11094,#11114,#11089) These are feature enhancements rather than blocking issues.
Implement weight-based feature importance for vector-leaf. (#10700)
Reduced logging in the DMatrix construction. (#11080)
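Below is a short sketch of the two booster-level additions mentioned above: releasing training caches with reset() and setting the OpenMP thread count through the global configuration. The nthread key used with config_context() is an assumption and should be verified against the global configuration documentation.

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4096, 32))
    y = X[:, 0] + rng.normal(size=4096)
    dtrain = xgb.DMatrix(X, label=y)

    # "nthread" as a global configuration key is an assumption for illustration.
    with xgb.config_context(nthread=4):
        booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=20)

    # Release memory held by internal training caches instead of copying the booster.
    booster.reset()
    preds = booster.predict(dtrain)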
Optimization
In addition to the external memory and quantile sketching improvements, we have a number of optimizations and performance fixes.
GPU tree methods now use significantly less memory for both dense and near-dense inputs. (#10821,#10870)
For near-dense inputs, GPU training is much faster for both hist (about 2x) and approx.
Quantile regression on CPU can now handle imbalanced trees much more efficiently. (#11275)
Small optimization for DMatrix construction to reduce latency. Also, C users can now reuse the ProxyDMatrix for multiple inference calls. (#11273)
CPU prediction performance for QuantileDMatrix has been improved (#11139) and is now on par with the normal DMatrix.
Fixed a performance issue when running inference on CPU with an extremely sparse QuantileDMatrix (#11250).
Optimize CPU training memory allocation for improved performance. (#11112)
Improved RMM (RAPIDS Memory Manager) integration. Now, with the help of config_context(), all memory allocated by XGBoost should be routed to RMM. As a bonus, all thrust algorithms now use the async policy. (#10873,#11173,#10712,#10562) A minimal sketch for enabling RMM follows this list.
When used without RMM, XGBoost is more careful with its use of the caching allocator to avoid holding too much device memory. (#10582)
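A minimal sketch of routing XGBoost's device allocations through RMM with config_context(); the pool-based RMM setup shown here is one common configuration, not the only one.

    import cupy as cp
    import rmm
    from rmm.allocators.cupy import rmm_cupy_allocator
    import xgboost as xgb

    # Use an RMM pool for both RMM and CuPy allocations.
    rmm.reinitialize(pool_allocator=True)
    cp.cuda.set_allocator(rmm_cupy_allocator)

    X = cp.random.normal(size=(100_000, 32), dtype=cp.float32)
    y = cp.random.normal(size=(100_000,), dtype=cp.float32)
    dtrain = xgb.QuantileDMatrix(X, y)

    # use_rmm asks XGBoost to route its device memory allocations through RMM.
    with xgb.config_context(use_rmm=True):
        booster = xgb.train({"device": "cuda", "tree_method": "hist"}, dtrain, num_boost_round=20)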
Breaking Changes
This section lists breaking changes that affect all packages.
Bug Fixes
Documentation
A new tutorial for advanced usage with custom objective functions. (#10283,#10725)
The new online document site now shows documents for all packages, including Python, R, and JVM-based packages. (#11240,#11216,#11166)
Lots of enhancements. (#10822,#11137,#11138,#11246,#11266,#11253,#10731,#11222,#10551,#10533)
Consistent use of cmake in documents. (#10717)
Add a brief description for using the offset from the GLM setting (like Poisson). (#10996)
Clean up the document for building from source. (#11145)
Various fixes. (#10412,#10405,#10353,#10464,#10587,#10350,#11131,#10815)
Python Package
The feature_weights parameter in the sklearn interface is now defined as a scikit-learn parameter. (#9506)
Initial support for polars; categorical features are not yet supported. (#11126,#11172,#11116)
Reduce pandas dataframe overhead and overhead for various imports. (#11058,#11068)
Better xlabel in plot_importance() (#11009)
Validate the reference dataset for training. The train() function now throws an error if a QuantileDMatrix is used as a validation dataset without a reference. (#11105) A sketch of the required pattern appears at the end of this section.
Fix misleading errors when feature names are missing during inference (#10814)
Add Stacklevel to Python warning callback. The change helps improve the error message for the Python package. (#10977)
Remove circular reference in DataIter. It helps reduce memory usage. (#11177)
Add checks for invalid inputs for cv. (#11255)
Support doc link for the sklearn module. Users can now find links to documents in a Jupyter notebook. (#10287)
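To illustrate the new validation check, here is a minimal sketch of the required pattern: the validation QuantileDMatrix must reference the training matrix so that both share the same quantile cuts.

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(8192, 16)), rng.normal(size=8192)
    X_valid, y_valid = rng.normal(size=(1024, 16)), rng.normal(size=1024)

    dtrain = xgb.QuantileDMatrix(X_train, y_train)
    # Without ref=dtrain, train() now raises an error instead of silently using new cuts.
    dvalid = xgb.QuantileDMatrix(X_valid, y_valid, ref=dtrain)

    booster = xgb.train(
        {"tree_method": "hist"},
        dtrain,
        num_boost_round=20,
        evals=[(dtrain, "train"), (dvalid, "valid")],
    )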
Dask
Prevent training from hanging due to aborted workers. (#10985) This helps Dask XGBoost be robust against errors. When a worker is killed, the training will fail with an exception instead of hanging.
Optional support for client-side logging. (#10942)
Fix LTR with empty partition and NCCL error. (#11152)
Update to work with the latest Dask. (#11291)
See the Features section for changes to ranking models.
See the Networking section for changes with the communication module.
PySpark
Expose Training and Validation Metrics. (#11133)
Add barrier before initializing the communicator. (#10938)
Extend support for columnar input to CPU (GPU-only previously). (#11299)
See the Features section for changes to ranking models.
See the Networking section for changes with the communication module.
Document updates (#11265).
Maintenance. (#11071,#11211,#10837,#10754,#10347,#10678,#11002,#10692,#11006,#10972,#10907,#10659,#10358,#11149,#11178,#11248)
Breaking changes
Remove the deprecated feval. (#11051)
Remove dask from the default import. (#10935) Users are now required to import the XGBoost Dask module through:
    from xgboost import dask as dxgb
instead of:
    import xgboost as xgb
    xgb.dask
The change helps avoid introducing dask into the default import set.
Bump Python requirement to 3.10. (#10434)
Drop support for datatable. (#11070)
R Package
We have been reworking the R package for a few releases now. In 3.0, we will start publishing a new R package on R-universe, before moving toward a CRAN update. The new package features a much more ergonomic interface, which is also more idiomatic to R speakers. In addition, a range of new features are introduced to the package. To name a few, the new package includes categorical feature support, QuantileDMatrix, and an initial implementation of external memory training. To test the new package:
    install.packages('xgboost', repos = c('https://dmlc.r-universe.dev', 'https://cloud.r-project.org'))
Also, we finally have an online documentation site for the R package featuring both vignettes and API references (#11166,#11257). A good starting point for the new interface is the new xgboost() function. We won’t list all the feature gains here, as there are too many! Please visit the XGBoost R Package page for more info. There’s a migration guide (#11197) there if you use a previous XGBoost R package version.
Support for the MSVC build was dropped due to incompatibility with R headers. (#10355,#11150)
Maintenance (#11259)
Related PRs. (#11171,#11231,#11223,#11073,#11224,#11076,#11084,#11081,#11072,#11170,#11123,#11168,#11264,#11140,#11117,#11104,#11095,#11125,#11124,#11122,#11108,#11102,#11101,#11100,#11077,#11099,#11074,#11065,#11092,#11090,#11096,#11148,#11151,#11159,#11204,#11254,#11109,#11141,#10798,#10743,#10849,#10747,#11022,#10989,#11026,#11060,#11059,#11041,#11043,#11025,#10674,#10727,#10745,#10733,#10750,#10749,#10744,#10794,#10330,#10698,#10687,#10688,#10654,#10456,#10556,#10465,#10337)
JVM Packages
The XGBoost 3.0 release features a significant update to the JVM packages, and in particular, the Spark package. There are breaking changes in packaging and some parameters. Please visit the migration guide for related changes. The work brings new features and a more unified feature set between the CPU and GPU implementations. (#10639,#10833,#10845,#10847,#10635,#10630,#11179,#11184)
Automatic partitioning for distributed learning to rank. See the Features section above (#11023).
Resolve spark compatibility issue (#10917)
Support missing value when constructing dmatrix with iterator (#10628)
Fix transform performance issue (#10925)
Honor skip.native.build option in xgboost4j-gpu (#10496)
Support array features type for CPU (#10937)
Change default missing value to NaN for better alignment (#11225)
Don’t cast to float if it’s already float (#10386)
Maintenance. (#10982,#10979,#10978,#10673,#10660,#10835,#10836,#10857,#10618,#10627)
Maintenance
Code maintenance includes both refactoring (#10531,#10573,#11069), cleanups (#11129,#10878,#11244,#10401,#10502,#11107,#11097,#11130,#10758,#10923,#10541,#10990), and improvements for tests (#10611,#10658,#10583,#11245,#10708), along with fixing various warnings in compilers and test dependencies (#10757,#10641,#11062,#11226). Also, miscellaneous updates, including some dev scripts and profiling annotations (#10485,#10657,#10854,#10718,#11158,#10697,#11276).
Lastly, dependency updates (#10362,#10363,#10360,#10373,#10377,#10368,#10369,#10366,#11032,#11037,#11036,#11035,#11034,#10518,#10536,#10586,#10585,#10458,#10547,#10429,#10517,#10497,#10588,#10975,#10971,#10970,#10949,#10947,#10863,#10953,#10954,#10951,#10590,#10600,#10599,#10535,#10516,#10786,#10859,#10785,#10779,#10790,#10777,#10855,#10848,#10778,#10772,#10771,#10862,#10952,#10768,#10770,#10769,#10664,#10663,#10892,#10979,#10978).
CI
The CI is reworked to use RunsOn to integrate custom CI pipelines with GitHub Actions. The migration helps us reduce the maintenance burden and makes the CI configuration more accessible to others. (#11001,#11079,#10649,#11196,#11055,#10483,#11078,#11157)
Other maintenance work includes various small fixes, enhancements, and tooling updates. (#10877,#10494,#10351,#10609,#11192,#11188,#11142,#10730,#11066,#11063,#10800,#10995,#10858,#10685,#10593,#11061)