Getting Started with XGBoost4J
This tutorial introduces the Java API for XGBoost.
Data Interface
Like the XGBoost Python module, XGBoost4J uses DMatrix to handle data. LIBSVM text format files, sparse matrices in CSR/CSC format, and dense matrices are supported.
The first step is to import DMatrix:
import ml.dmlc.xgboost4j.java.DMatrix;
Use the DMatrix constructor to load data from a LIBSVM text format file:
DMatrix dmat = new DMatrix("train.svm.txt");
Pass arrays to the DMatrix constructor to load data from a sparse matrix.
Suppose we have a sparse matrix
1 0 2 0
4 0 0 3
3 1 2 0
We can express the sparse matrix in Compressed Sparse Row (CSR) format:
long[] rowHeaders = new long[]{0, 2, 4, 7};
float[] data = new float[]{1f, 2f, 4f, 3f, 3f, 1f, 2f};
int[] colIndex = new int[]{0, 2, 0, 3, 0, 1, 2};
int numColumn = 4;
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR, numColumn);
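CSR indexing is easy to get wrong by hand. As a sanity check, here is a small sketch (plain Java, no XGBoost dependency; `CsrCheck` and `csrToDense` are hypothetical names, not part of XGBoost4J) that expands the CSR arrays back into the dense matrix above:

```java
// Expand CSR arrays back into a dense matrix to verify that they
// encode the intended values (hypothetical helper, not XGBoost4J API).
public class CsrCheck {
  static float[][] csrToDense(long[] rowHeaders, int[] colIndex,
                              float[] data, int numColumn) {
    int numRow = rowHeaders.length - 1;
    float[][] dense = new float[numRow][numColumn];
    for (int r = 0; r < numRow; r++) {
      // Entries of row r live in data[rowHeaders[r] .. rowHeaders[r + 1])
      for (long i = rowHeaders[r]; i < rowHeaders[r + 1]; i++) {
        dense[r][colIndex[(int) i]] = data[(int) i];
      }
    }
    return dense;
  }

  public static void main(String[] args) {
    long[] rowHeaders = {0, 2, 4, 7};
    float[] data = {1f, 2f, 4f, 3f, 3f, 1f, 2f};
    int[] colIndex = {0, 2, 0, 3, 0, 1, 2};
    float[][] dense = csrToDense(rowHeaders, colIndex, data, 4);
    for (float[] row : dense) {
      System.out.println(java.util.Arrays.toString(row));
    }
    // Prints the three rows of the matrix shown above.
  }
}
```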
… or in Compressed Sparse Column (CSC) format:
long[] colHeaders = new long[]{0, 3, 4, 6, 7};
float[] data = new float[]{1f, 4f, 3f, 1f, 2f, 2f, 3f};
int[] rowIndex = new int[]{0, 1, 2, 2, 0, 2, 1};
int numRow = 3;
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC, numRow);
You may also load your data from a dense matrix. Let’s assume we have a matrix of the form
1 2
3 4
5 6
Using row-major layout, we specify the dense matrix as follows:
float[] data = new float[]{1f, 2f, 3f, 4f, 5f, 6f};
int nrow = 3;
int ncol = 2;
float missing = 0.0f;
DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
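If your data starts out as a 2-D array, it has to be flattened into this row-major layout first. A minimal sketch (plain Java; `RowMajor` and `flatten` are hypothetical helper names, not XGBoost4J API):

```java
// Flatten a 2-D matrix into the row-major float[] layout expected by
// the dense DMatrix constructor (hypothetical helper, not XGBoost4J API).
public class RowMajor {
  static float[] flatten(float[][] matrix) {
    int nrow = matrix.length;
    int ncol = matrix[0].length;
    float[] flat = new float[nrow * ncol];
    for (int r = 0; r < nrow; r++) {
      // Row r occupies flat[r * ncol .. (r + 1) * ncol)
      System.arraycopy(matrix[r], 0, flat, r * ncol, ncol);
    }
    return flat;
  }

  public static void main(String[] args) {
    float[][] matrix = {{1f, 2f}, {3f, 4f}, {5f, 6f}};
    System.out.println(java.util.Arrays.toString(flatten(matrix)));
    // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
  }
}
```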
To set weight:
float[] weights = new float[]{1f, 2f, 1f};
dmat.setWeight(weights);
Setting Parameters
Parameters are specified as a Map:
Map<String, Object> params = new HashMap<String, Object>() {
  {
    put("eta", 1.0);
    put("max_depth", 2);
    put("objective", "binary:logistic");
    put("eval_metric", "logloss");
  }
};
Training Model
With parameters and data, you are able to train a booster model.
Import Booster and XGBoost:
import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoost;
Training
DMatrix trainMat = new DMatrix("train.svm.txt");
DMatrix validMat = new DMatrix("valid.svm.txt");
// Specify a watch list to see model accuracy on data sets
Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
  {
    put("train", trainMat);
    put("valid", validMat);
  }
};
int nround = 2;
Booster booster = XGBoost.train(trainMat, params, nround, watches, null, null);
Saving model
After training, you can save the model and dump it out.
booster.saveModel("model.json");
Generating model dump with feature map
// dump without feature map
String[] model_dump = booster.getModelDump(null, false);
// dump with feature map
String[] model_dump_with_feature_map = booster.getModelDump("featureMap.txt", false);
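The feature map is a plain text file mapping feature indices to readable names. A sketch of generating one (`FeatureMapWriter` and `writeFeatureMap` are hypothetical names; the assumed line format `<index>\t<name>\t<type>`, with type `q` for quantitative, `i` for indicator, and `int` for integer, follows the upstream XGBoost feature map convention):

```java
import java.io.IOException;
import java.io.PrintWriter;

// Write a feature map file usable with getModelDump (hypothetical helper).
// Assumed line format: <index>\t<name>\t<type>, where type is
// q (quantitative), i (indicator), or int (integer).
public class FeatureMapWriter {
  static void writeFeatureMap(String path, String[] names, String[] types)
      throws IOException {
    try (PrintWriter out = new PrintWriter(path)) {
      for (int i = 0; i < names.length; i++) {
        out.println(i + "\t" + names[i] + "\t" + types[i]);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    String[] names = {"age", "income", "is_member"};
    String[] types = {"int", "q", "i"};
    writeFeatureMap("featureMap.txt", names, types);
  }
}
```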
Load a model
Booster booster = XGBoost.loadModel("model.json");
Prediction
After training and loading a model, you can use it to make predictions on other data. The result will be a two-dimensional float array of shape (nsample, nclass); for predictLeaf(), the result would be of shape (nsample, nclass * ntrees).
DMatrix dtest = new DMatrix("test.svm.txt");
// predict
float[][] predicts = booster.predict(dtest);
// predict leaf
float[][] leafPredicts = booster.predictLeaf(dtest, 0);
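With the binary:logistic objective used above, each prediction row holds a single probability for the positive class. A minimal sketch of turning those probabilities into 0/1 labels (plain Java operating on the predict() output; `Threshold` and `toLabels` are hypothetical helper names):

```java
// Convert binary:logistic probabilities into 0/1 labels by thresholding
// (hypothetical helper operating on Booster.predict() output).
public class Threshold {
  static int[] toLabels(float[][] predicts, float cutoff) {
    int[] labels = new int[predicts.length];
    for (int i = 0; i < predicts.length; i++) {
      // Each row holds one probability for the positive class.
      labels[i] = predicts[i][0] >= cutoff ? 1 : 0;
    }
    return labels;
  }

  public static void main(String[] args) {
    float[][] predicts = {{0.91f}, {0.08f}, {0.55f}};
    System.out.println(java.util.Arrays.toString(toLabels(predicts, 0.5f)));
    // [1, 0, 1]
  }
}
```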