Getting Started with XGBoost4J

This tutorial introduces the Java API for XGBoost.

Data Interface

Like the XGBoost Python module, XGBoost4J uses DMatrix to handle data. LIBSVM text format files, sparse matrices in CSR/CSC format, and dense matrices are supported.

  • The first step is to import DMatrix:

    import ml.dmlc.xgboost4j.java.DMatrix;
  • Use the DMatrix constructor to load data from a LIBSVM text format file:

    DMatrix dmat = new DMatrix("train.svm.txt");
  • Pass arrays to the DMatrix constructor to load from a sparse matrix.

    Suppose we have a sparse matrix

    1 0 2 0
    4 0 0 3
    3 1 2 0

    We can express the sparse matrix in Compressed Sparse Row (CSR) format:

    long[] rowHeaders = new long[] {0, 2, 4, 7};
    float[] data = new float[] {1f, 2f, 4f, 3f, 3f, 1f, 2f};
    int[] colIndex = new int[] {0, 2, 0, 3, 0, 1, 2};
    int numColumn = 4;
    DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR, numColumn);

    … or in Compressed Sparse Column (CSC) format:

    long[] colHeaders = new long[] {0, 3, 4, 6, 7};
    float[] data = new float[] {1f, 4f, 3f, 1f, 2f, 2f, 3f};
    int[] rowIndex = new int[] {0, 1, 2, 2, 0, 2, 1};
    int numRow = 3;
    DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC, numRow);
  • You may also load your data from a dense matrix. Let’s assume we have a matrix of the form

    1    2
    3    4
    5    6

    Using row-major layout, we specify the dense matrix as follows:

    float[] data = new float[] {1f, 2f, 3f, 4f, 5f, 6f};
    int nrow = 3;
    int ncol = 2;
    float missing = 0.0f;
    DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
  • To set instance weights:

    float[] weights = new float[] {1f, 2f, 1f};
    dmat.setWeight(weights);
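As a sanity check on the CSR encoding above, the following sketch (plain Java, no XGBoost dependency; the class and helper names are ours, not part of the library) expands the rowHeaders/colIndex/data arrays back into the dense matrix:

```java
public class CsrCheck {
    // Expand a CSR-encoded matrix back into dense form.
    // rowHeaders[i]..rowHeaders[i+1] delimit the non-zero entries of row i;
    // colIndex gives the column of each stored value.
    static float[][] csrToDense(long[] rowHeaders, int[] colIndex,
                                float[] data, int numColumn) {
        int numRow = rowHeaders.length - 1;
        float[][] dense = new float[numRow][numColumn];
        for (int i = 0; i < numRow; i++) {
            for (long j = rowHeaders[i]; j < rowHeaders[i + 1]; j++) {
                dense[i][colIndex[(int) j]] = data[(int) j];
            }
        }
        return dense;
    }

    public static void main(String[] args) {
        long[] rowHeaders = new long[] {0, 2, 4, 7};
        float[] data = new float[] {1f, 2f, 4f, 3f, 3f, 1f, 2f};
        int[] colIndex = new int[] {0, 2, 0, 3, 0, 1, 2};
        float[][] dense = csrToDense(rowHeaders, colIndex, data, 4);
        // First row of the example matrix: 1 0 2 0
        System.out.println(java.util.Arrays.toString(dense[0])); // [1.0, 0.0, 2.0, 0.0]
    }
}
```

Running the same check against the CSC arrays (transposing the roles of rows and columns) recovers the identical matrix, which is a quick way to catch off-by-one errors in hand-built headers.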

Setting Parameters

Parameters are specified as a Map:

Map<String, Object> params = new HashMap<String, Object>() {
  {
    put("eta", 1.0);
    put("max_depth", 2);
    put("objective", "binary:logistic");
    put("eval_metric", "logloss");
  }
};
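The double-brace initializer above is concise but creates an anonymous HashMap subclass that retains a reference to its enclosing instance. An equivalent sketch using plain puts avoids that (the class and method names here are our own, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class Params {
    // Build the same parameter map with ordinary puts instead of
    // the double-brace idiom; behavior is identical for XGBoost.
    static Map<String, Object> buildParams() {
        Map<String, Object> params = new HashMap<>();
        params.put("eta", 1.0);
        params.put("max_depth", 2);
        params.put("objective", "binary:logistic");
        params.put("eval_metric", "logloss");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildParams().get("objective")); // binary:logistic
    }
}
```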

Training Model

With parameters and data, you are able to train a booster model.

  • Import Booster and XGBoost:

    import ml.dmlc.xgboost4j.java.Booster;
    import ml.dmlc.xgboost4j.java.XGBoost;
  • Training

    DMatrix trainMat = new DMatrix("train.svm.txt");
    DMatrix validMat = new DMatrix("valid.svm.txt");
    // Specify a watch list to see model accuracy on data sets
    Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
      {
        put("train", trainMat);
        put("valid", validMat);
      }
    };
    int nround = 2;
    Booster booster = XGBoost.train(trainMat, params, nround, watches, null, null);
  • Saving model

    After training, you can save the model and dump it out.

    booster.saveModel("model.json");
  • Generating model dump with feature map

    // dump without feature map
    String[] modelDump = booster.getModelDump(null, false);
    // dump with feature map
    String[] modelDumpWithFeatureMap = booster.getModelDump("featureMap.txt", false);
  • Load a model

    Booster booster = XGBoost.loadModel("model.json");

Prediction

After training and loading a model, you can use it to make predictions on other data. The result will be a two-dimensional float array of shape (nsamples, nclass); for predictLeaf(), the result would be of shape (nsamples, nclass * ntrees).

DMatrix dtest = new DMatrix("test.svm.txt");
// predict
float[][] predicts = booster.predict(dtest);
// predict leaf
float[][] leafPredicts = booster.predictLeaf(dtest, 0);
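For a binary:logistic objective, predict() returns one probability per row. A common follow-up step is thresholding those probabilities into 0/1 labels; a minimal sketch (the class name, helper, and the 0.5 cutoff are our own assumptions, not part of the XGBoost4J API):

```java
public class Threshold {
    // Convert (nsamples, 1) probability output from a binary:logistic
    // model into 0/1 class labels using a fixed cutoff.
    static int[] toLabels(float[][] predicts, float cutoff) {
        int[] labels = new int[predicts.length];
        for (int i = 0; i < predicts.length; i++) {
            labels[i] = predicts[i][0] >= cutoff ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Stand-in for booster.predict(dtest) output
        float[][] predicts = {{0.12f}, {0.87f}, {0.50f}};
        int[] labels = toLabels(predicts, 0.5f);
        System.out.println(java.util.Arrays.toString(labels)); // [0, 1, 1]
    }
}
```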