Type:

Package

Title:

Parallel GLM

Version:

0.1.7

Description:

Provides a parallel estimation method for generalized linear models without compiling with a multithreaded LAPACK or BLAS.

License:

GPL-2

Encoding:

UTF-8

URL:

https://github.com/boennecd/parglm

BugReports:

https://github.com/boennecd/parglm/issues

LinkingTo:

Rcpp, RcppArmadillo

Imports:

Rcpp, Matrix

SystemRequirements:

C++11

Suggests:

testthat, SuppDists, knitr, rmarkdown, speedglm, microbenchmark, R.rsp

RoxygenNote:

6.1.1

VignetteBuilder:

R.rsp

NeedsCompilation:

yes

Packaged:

2021-10-14 14:55:16 UTC; boennecd

Author:

Benjamin Christoffersen [cre, aut], Anthony Williams [cph], Boost developers [cph]

Maintainer:

Benjamin Christoffersen <boennecd@gmail.com>

Repository:

CRAN

Date/Publication:

2021-10-14 15:10:02 UTC

Fitting Generalized Linear Models in Parallel

Description

A function like glm that can perform the computation in parallel. The function supports most families listed in family. See vignette("parglm", "parglm") for run time examples.

Usage

parglm(formula, family = gaussian, data, weights, subset, na.action,
  start = NULL, offset, control = list(...), contrasts = NULL,
  model = TRUE, x = FALSE, y = TRUE, ...)

parglm.fit(x, y, weights = rep(1, NROW(x)), start = NULL,
  etastart = NULL, mustart = NULL, offset = rep(0, NROW(x)),
  family = gaussian(), control = list(), intercept = TRUE, ...)

Arguments

formula

an object of class formula.

family

a family object.

data

an optional data frame, list or environment containing the variables in the model.

weights

an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs.

start

starting values for the parameters in the linear predictor.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting.

control

a list of parameters for controlling the fitting process. For parglm.fit this is passed to parglm.control.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

model

a logical value indicating whether the model frame should be included as a component of the returned value.

x, y
 
For parglm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.
 
For parglm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.
 
...
 
For parglm: arguments to be used to form the default control argument if it is not supplied directly.
 
For parglm.fit: unused.

etastart

starting values for the linear predictor. Not supported.

mustart

starting values for the vector of means. Not supported.

intercept

logical. Should an intercept be included in the null model?

Details

The current implementation uses min(as.integer(n / p), nthreads) threads, where n is the number of observations, p is the number of covariates, and nthreads is the nthreads element of the list returned by parglm.control. Thus, there is likely little (if any) reduction in computation time if p is almost equal to n. The current implementation cannot handle p > n.
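
The thread rule above can be checked by hand with a few lines of base R. This is only a sketch of the stated formula: effective_threads is a hypothetical helper name that is not part of parglm, and the p <= n check simply mirrors the restriction mentioned in the text.

```r
# Hypothetical helper (not part of parglm): the number of threads that the
# rule min(as.integer(n / p), nthreads) yields for n observations and
# p covariates.
effective_threads <- function(n, p, nthreads) {
  stopifnot(p <= n)  # the text states that p > n is not handled
  min(as.integer(n / p), nthreads)
}

# plenty of observations per covariate: all requested threads are used
effective_threads(n = 100000, p = 50, nthreads = 4L)

# p close to n: the ratio n / p caps the thread count below nthreads
effective_threads(n = 12, p = 5, nthreads = 4L)
```

In the second call the ratio as.integer(12 / 5) is 2, so requesting four threads would not help under this rule.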

Value

A glm object as returned by glm, which differs mainly in the qr element. The qr element in the object returned by parglm(.fit) only has the R matrix from the QR decomposition.
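
To see what can still be computed from the R matrix alone, the following base R sketch uses an ordinary glm fit (not a parglm object, whose internal layout is not inspected here) and recovers the unscaled covariance matrix of the coefficients from R only. It assumes a full-rank design, so that no column pivoting takes place.

```r
# Illustration with stats::glm only; this does not touch a parglm object.
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18))
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# keep only the upper triangular factor of the final weighted QR
R <- qr.R(fit$qr)

# (R'R)^{-1} equals the unscaled covariance matrix (X'WX)^{-1}
cov_from_R <- chol2inv(R)
all.equal(cov_from_R, summary(fit)$cov.unscaled, check.attributes = FALSE)
```

Quantities that need the Q factor (for example via qr.Q or qr.qty) cannot be obtained from R alone.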

Examples

# small example from `help('glm')`. Fitting this model in parallel does
# not matter as the data set is small
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
f1 <- glm   (lot1 ~ log(u), data = clotting, family = Gamma)
f2 <- parglm(lot1 ~ log(u), data = clotting, family = Gamma,
             control = parglm.control(nthreads = 2L))
all.equal(coef(f1), coef(f2))

Auxiliary for Controlling GLM Fitting in Parallel

Description

Auxiliary function for parglm fitting.

Usage

parglm.control(epsilon = 1e-08, maxit = 25, trace = FALSE,
  nthreads = 1L, block_size = NULL, method = "LINPACK")

Arguments

epsilon

positive convergence tolerance.

maxit

integer giving the maximal number of IWLS iterations.

trace

logical indicating if output should be produced during estimation.

nthreads

number of cores to use. You may get the best performance by using your number of physical CPUs/cores if your data set is sufficiently large (check your number e.g., by calling parallel::detectCores(logical = FALSE)).

block_size

number of observations to include in each parallel block.

method

string specifying which method to use. Either "LINPACK", "LAPACK", or "FAST".
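
As a small sketch of how these arguments might be combined, the code below picks nthreads from the number of physical cores and builds a control list. The fallback to one thread is an assumption added here because parallel::detectCores can return NA on some platforms, and the call to parglm.control is guarded so the snippet also runs where parglm is not installed.

```r
# number of physical cores; may be NA on some platforms
n_phys <- parallel::detectCores(logical = FALSE)

# assumed fallback (not from the package): use one thread if unknown
nthreads <- if (is.na(n_phys)) 1L else as.integer(n_phys)

if (requireNamespace("parglm", quietly = TRUE)) {
  # arguments as listed in the Usage section above
  ctrl <- parglm::parglm.control(nthreads = nthreads, method = "LINPACK")
  str(ctrl)
}
```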

Details

The LINPACK method uses the same QR method as glm.fit for the final QR decomposition. This is the dqrdc2 method described in qr. All other QR decompositions but the last are made with DGEQP3 from LAPACK. See Wood, Goude, and Shaw (2015) for details on the QR method.
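
The general idea of combining QR decompositions computed on separate blocks of rows can be sketched in base R. This is an illustration only, not the package's C++ implementation: it uses qr from base R for every step, a fixed split into three blocks, simulated full-rank data, and unweighted least squares instead of an IWLS step.

```r
set.seed(2)
n <- 120; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# three row blocks that could be processed by separate threads
blocks <- split(seq_len(n), rep(1:3, each = n / 3))

# per block: keep the p x p R factor and the first p elements of Q'y
parts <- lapply(blocks, function(idx) {
  qr_b <- qr(X[idx, , drop = FALSE])
  list(R = qr.R(qr_b), f = qr.qty(qr_b, y[idx])[seq_len(p)])
})

# stack the small pieces and do one final QR on the stacked R factors
R_stack <- do.call(rbind, lapply(parts, `[[`, "R"))
f_stack <- unlist(lapply(parts, `[[`, "f"))
qr_c <- qr(R_stack)
beta_block <- backsolve(qr.R(qr_c), qr.qty(qr_c, f_stack)[seq_len(p)])

# reference: a single QR on the full design matrix
beta_full <- unname(qr.coef(qr(X), y))
all.equal(beta_block, beta_full)
```

Only the final decomposition works on the combined factors, which is where the text says the dqrdc2 routine is used.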

The FAST method computes the Fisher information and then solves the normal equations. This is faster but less numerically stable.
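
The difference between the two approaches can be illustrated for a single weighted least squares step in base R. This is a sketch under assumptions: the working weights w and working response z are simulated rather than taken from a real IWLS iteration, and the code is not the package's implementation.

```r
set.seed(1)
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
w <- runif(n, 0.5, 2)  # simulated working weights
z <- rnorm(n)          # simulated working response

# normal equations: form X'WX and X'Wz, then solve the p x p system
XtWX <- crossprod(X, w * X)
XtWz <- crossprod(X, w * z)
beta_ne <- drop(solve(XtWX, XtWz))

# QR route: decompose sqrt(W) X and never form X'WX
beta_qr <- unname(qr.coef(qr(sqrt(w) * X), sqrt(w) * z))

all.equal(beta_ne, beta_qr)
```

Forming X'WX squares the condition number of the problem, which is the usual reason the normal equations are considered less stable than a QR decomposition on ill-conditioned designs.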

Value

A list with components named as the arguments.

References

Wood, S.N., Goude, Y. & Shaw, S. (2015) Generalized additive models for large datasets. Journal of the Royal Statistical Society, Series C, 64(1): 139-155.

Examples

# use one core
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
f1 <- parglm(lot1 ~ log(u), data = clotting, family = Gamma,
             control = parglm.control(nthreads = 1L))

# use two cores
f2 <- parglm(lot1 ~ log(u), data = clotting, family = Gamma,
             control = parglm.control(nthreads = 2L))
all.equal(coef(f1), coef(f2))