NULL.
any_near0().
NA_real to FBM type integer on new Macs.
big_randomSVD() and big_crossprodSelf() (#52).
backingfile to big_crossprodSelf() and big_cor() (#170).
big_univLogReg() (#137).
ind.col in big_prodMat() (#154).
FBM.dir (that defaults to tempdir() as before). This can be used to change the default directory used to create FBMs when calling either FBM(), FBM.code256(), as_FBM(), big_copy(), or big_transpose(). Note that, if not using the temporary directory anymore, you must clean up the files you do not want to keep.
ARMA_64BIT_WORD.
$add_columns().
as_scaling_fun() to create your own fun.scaling parameters.
pcor() (with a warning).
pcor() now returns NAs (instead of 0s) for singular systems.
big_prodVec(), big_cprodVec(), big_colstats() and big_univLinReg() have been recoded.
pcor() for singular systems, e.g. when x has all the same values.
summary() and plot() for old (<v1.3) big_sp_list models.
pcor() to compute partial correlations.
Add two options in big_spLinReg() and big_spLogReg(): power_scale for using a different scaling for LASSO and power_adaptive for using adaptive LASSO (where larger marginal effects are penalized less). See documentation for details.
big_(c)prodVec() and big_(c)prodMat() (re)gain an ncores parameter. Note that for big_(c)prodMat(), it might be beneficial to use the BLAS parallelism (with bigparallelr::set_blas_ncores()) instead of this parameter, especially when the matrix A is large-ish.
big_colstats() can now be run in parallel (added parameter ncores).
Functions big_(c)prodMat() and big_(t)crossprodSelf() now use much less memory, and may be faster.
Add covar_from_df() to convert a data frame with factors/characters to a numeric matrix using one-hot encoding.
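A minimal sketch of how covar_from_df() might be called, assuming {bigstatsr} is installed. The data frame and its column names are invented for illustration; the exact columns produced by the encoding are not asserted here.

```r
library(bigstatsr)

# Hypothetical covariates: one numeric column, one factor, one character column.
df <- data.frame(
  age  = c(31, 45, 27, 52),
  sex  = factor(c("F", "M", "F", "M")),
  site = c("A", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Convert to a numeric matrix (factors/characters are one-hot encoded).
covar <- covar_from_df(df)

dim(covar)       # one row per row of df
colnames(covar)  # inspect which indicator columns were created
```

The resulting matrix can then be passed wherever a numeric covariate matrix is expected.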
Add a new column $all_conv to output of summary() for big_spLinReg() and big_spLogReg() to check whether all models have stopped because of “no more improvement”. Also add a new parameter sort to summary().
Now warn (enabled by default) if some models may not have reached a minimum when using big_spLinReg() and big_spLogReg().
In .self$nrow * .self$ncol : NAs produced by integer overflow.
Make two different memory-mappings: one that is read-only (using $address) and one where it is possible to write (using $address_rw). This makes it possible to use file permissions to prevent modifying data.
Also add a new field $is_read_only to be used to prevent modifying data (at least with <-) even when you have write permissions to it. Functions creating an FBM now gain a parameter is_read_only.
Make vector accessors (e.g. X[1:10]) faster.
Move some code to new packages {bigassertr} and {bigparallelr}.
big_randomSVD() gains arguments related to matrix-vector multiplication.
assert_noNA() is faster.
big_increment().
In plot.big_SVD(),
Can now plot many PCA scores (more than two) at once.
Use coord_fixed() when plotting PCA scores because it is good practice.
Use log-scale in scree plot to better see small differences in singular values.
Reexport cowplot::plot_grid() to merge multiple ggplots.
AUCBoot() is now 6-7 times faster.
center and scale to products.
big_univLogReg() for variables with no variation. IRLS was not converging, so glm() was used instead. The problem is that glm() drops dimensions causing singularities so that Z-score of the first covariate (or intercept) was used instead of a missing value.
Use mio instead of boost for memory-mapping.
Add a parameter base.row to predict.big_sp_list() and automatically detect if needed (as well as for covar.row).
Possibility to subset a big_sp_list without losing attributes, so that one can access one model (corresponding to one alpha) even if it is not the ‘best’.
Add parameters pf.X and pf.covar in big_sp***Reg() to provide different penalization for each variable (possibly no penalization at all).
Add %*%, crossprod and tcrossprod operations for ‘double’ FBMs.
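As a rough illustration of these operators, the sketch below multiplies a small ‘double’ FBM by ordinary matrices and compares the results with the in-memory equivalents. It assumes {bigstatsr} is installed and that the FBM-by-matrix forms of %*% and crossprod() are available; the object names and dimensions are arbitrary.

```r
library(bigstatsr)

set.seed(1)
# A small 'double' FBM (the default type) filled with random values.
X <- FBM(5, 3, init = rnorm(15))
A <- matrix(rnorm(6),  nrow = 3)  # 3 x 2, conformable with X on the right
B <- matrix(rnorm(10), nrow = 5)  # 5 x 2, conformable with t(X)

XA  <- X %*% A          # FBM times matrix
XtB <- crossprod(X, B)  # t(X) times matrix, without forming t(X)

# X[] reads the whole FBM into memory, which gives a reference result.
all.equal(XA,  X[] %*% A)
all.equal(XtB, crossprod(X[], B))
```

Reading the full matrix with X[] is only sensible for small examples like this one.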
Now also returns the number of non-zero variables ($nb_active) and the number of candidate variables ($nb_candidate) for each step of the regularization paths of big_spLinReg() and big_spLogReg().
warn and return.all of big_spLinReg() and big_spLogReg() are deprecated; now always return the maximum information. Now provide two methods (summary and plot) to get a quick assessment of the fitted models.
Check of missing values for input vectors (indices and targets) and matrices (covariables).
AUC() is now stricter: it accepts only 0s and 1s for target.
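A short sketch of what this means in practice, assuming {bigstatsr} is installed. The prediction and target vectors are toy values; the second call shows a target coded as 1/2, which is expected to be rejected under the stricter check.

```r
library(bigstatsr)

pred   <- c(0.1, 0.4, 0.35, 0.8)
target <- c(0, 0, 1, 1)  # coded as 0s and 1s, as now required

# 3 of the 4 (case, control) pairs are ranked correctly.
auc <- AUC(pred, target)
auc

# A target coded as 1/2 should be refused; tryCatch() keeps the script
# running and records whether an error was raised.
res <- tryCatch(AUC(pred, c(1, 1, 2, 2)), error = function(e) "rejected")
res
```

Recoding a two-level target with something like as.integer(y == "case") before calling AUC() avoids the error.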
$bm() and $bm.desc() have been added in order to get an FBM as a filebacked.big.matrix. This enables using {bigmemory} functions.
float added.
big_write added.
big_read now has a filter argument to filter rows, and argument nrow has been removed because it is now determined when reading the first block of data.
Removed the save argument from FBM (and others); now, you must use FBM(...)$save() instead of FBM(..., save = TRUE).
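The replacement pattern can be sketched as follows, assuming {bigstatsr} is installed. The dimensions are arbitrary and the backing file is created in the default (temporary) directory.

```r
library(bigstatsr)

# Create a small FBM and save its descriptor to the matching .rds file
# in a single chained call.
X <- FBM(10, 5, init = 0)$save()

X$is_saved  # whether the object has been stored in X$rds
X$rds       # path to the .rds file next to the backing file

# The saved object can be re-attached later, e.g. in another session.
X2 <- big_attach(X$rds)
dim(X2)
```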
You can now fill an FBM using a data frame. Note that factors will be used as integers.
Package {bigreadr} has been developed and is now used by big_read.
options(bigstatsr.downcast.warning = FALSE), or you can use without_downcast_warning() to disable this warning for one call.
big_read so that it is faster (corresponding vignette updated).
possibility to add a “base predictor” for big_spLinReg and big_spLogReg.
don’t store the whole regularization path (as a sparse matrix) in big_spLinReg and big_spLogReg anymore because it caused major slowdowns.
directly average the K predictions in predict.big_sp_best_list.
only use the “PSOCK” type of cluster because “FORK” can leave zombies behind. You can change this with options(bigstatsr.cluster.type = "PSOCK").
Fix a bug in big_spLinReg related to the computation of summaries.
Now provides function plus to be used as the combine argument in big_apply and big_parallelize instead of '+'.
options(bigstatsr.cluster.type = "PSOCK"). Uses “PSOCK” in 0.4.0.
big_spLinReg and big_spLogReg. One will be chosen by grid-search.
big_prodMat when using a dimension of 1 or 0.
big_crossprod, big_tcrossprod, big_SVD and big_randomSVD (before, there was no default at all)
Integrate Cross-Model Selection and Averaging (CMSA) directly in big_spLinReg and big_spLogReg, a procedure that automatically chooses the value of the
Speed up big_spLinReg and big_spLogReg (issue #12)
big.matrix format of package bigmemory