Title: Modelling Multivariate Data with Additive Bayesian Networks
Version: 3.1.12
Date: 2025-11-18
Description: The 'abn' R package facilitates Bayesian network analysis, a probabilistic graphical model that derives from empirical data a directed acyclic graph (DAG). This DAG describes the dependency structure between random variables. The R package 'abn' provides routines to help determine optimal Bayesian network models for a given data set. These models are used to identify statistical dependencies in messy, complex data. Their additive formulation is equivalent to multivariate generalised linear modelling, including mixed models with independent and identically distributed (iid) random effects. The core functionality of the 'abn' package revolves around model selection, also known as structure discovery. It supports both exact and heuristic structure learning algorithms and does not restrict the data distribution of parent-child combinations, providing flexibility in model creation and analysis. The 'abn' package uses Laplace approximations for metric estimation and includes wrappers to the 'INLA' package. It also employs 'JAGS' for data simulation purposes. For more resources and information, visit the 'abn' website.
License: GPL (≥ 3)
URL: https://r-bayesian-networks.org/, https://github.com/furrer-lab/abn
BugReports: https://github.com/furrer-lab/abn/issues
Depends: R (≥ 4.0.0)
Imports: doParallel, foreach, glmmTMB, graph, jsonlite, lme4, mclogit, methods, nnet, Rcpp, Rgraphviz, rjags, stringi
Suggests: bookdown, boot, brglm, devtools (≥ 2.4.5), ggplot2, gridExtra, INLA, knitr, Matrix, MatrixModels (≥ 0.5.3), microbenchmark, R.rsp, RhpcBLASctl, rmarkdown, testthat (≥ 3.0.0), entropy, moments, R6
LinkingTo: Rcpp, RcppArmadillo
VignetteBuilder: knitr
Additional_repositories: https://inla.r-inla-download.org/R/stable/
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: TRUE
RoxygenNote: 7.3.3
SystemRequirements: pkg-config, cmake, gsl, jpeg, gdal, geos, proj, udunits-2, openssl, libcurl, jags
NeedsCompilation: yes
Packaged: 2025-11-25 09:06:48 UTC; matteo
Author: Matteo Delucchi [aut, cre], Reinhard Furrer [aut], Gilles Kratzer [aut], Fraser Iain Lewis [aut], Jonas I. Liechti [ctb], Marta Pittavino [ctb], Kalina Cherneva [ctb]
Maintainer: Matteo Delucchi <matteo.delucchi@math.uzh.ch>
Repository: CRAN
Date/Publication: 2025-11-25 11:20:09 UTC

abn Package

Description

abn is a collection of functions for fitting, selecting/learning, analyzing, and reporting Additive Bayesian Networks.

Value

nothing.

General overview

What is abn: Bayesian network modeling is a data analysis technique that is ideally suited to messy, highly correlated, and complex data sets. This methodology is somewhat distinct from other forms of statistical modeling in that its focus is on structure discovery - determining an optimal graphical model that describes the inter-relationships in the underlying processes which generated the data. It is a multivariate technique and can be used for one or many dependent variables. This is a data-driven approach, as opposed to relying only on subjective expert opinion to determine how variables of interest are inter-related (for example, structural equation modeling).

The R package abn is designed to fit additive Bayesian models to observational data sets. It contains routines to score Bayesian networks based on a Bayesian or information-theoretic formulation of generalized linear models. It is equipped with exact search and greedy search algorithms to select the best network. The Bayesian implementation supports random effects to control for one layer of clustering. It supports a possible mixture of continuous, discrete, and count data and inputs of prior knowledge at a structural level.

The R package abn requires the R package Rgraphviz to work well. It is stored outside of CRAN; see ‘Examples’ for the code to install the latest version.

The web page r-bayesian-networks.org provides further case studies. See also the files provided in the package directories inst/bootstrapping_example and inst/old_vignette for more details.

Author(s)

Maintainer: Matteo Delucchi matteo.delucchi@math.uzh.ch (ORCID)

Authors:

Reinhard Furrer, Gilles Kratzer, Fraser Iain Lewis

Other contributors:

Jonas I. Liechti, Marta Pittavino, Kalina Cherneva

References

Kratzer G, Lewis F, Comin A, Pittavino M, Furrer R (2023). “Additive Bayesian Network Modeling with the R Package abn.” Journal of Statistical Software, 105(8), 1-41. https://doi.org/10.18637/jss.v105.i08.

Delucchi M, Liechti J, Spinner G, Furrer R (2024). “abn: Additive Bayesian Networks.” Journal of Open Source Software, Accepted for publication. R package version 3.1.3, https://joss.theoj.org/papers/1bbc43a2be86f5d3f831cedb5cf81812.

Lewis, F. I., and Ward, M. P. (2013). "Improving epidemiologic data analyses through multivariate regression modeling". Emerging themes in epidemiology, 10(1), 4.

Kratzer, G., Lewis, F.I., Willi, B., Meli, M.L., Boretti, F.S., Hofmann-Lehmann, R., Torgerson, P., Furrer, R. and Hartnack, S. (2020). "Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland". Frontiers in Veterinary Science, 7, 73.

Delucchi M, Furrer R, Kratzer G, Lewis F, Liechti J, Pittavino M, Cherneva K (2024). "abn: Modelling Multivariate Data with Additive Bayesian Networks". R package version 3.1.3, https://CRAN.R-project.org/package=abn.

See Also

Useful links:

Examples

## Citations:
print(citation('abn'), bibtex=TRUE)

## Installing the R package Rgraphviz:
# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("Rgraphviz")

## README.md in the directory `bootstrapping_example/`:
# edit(file=paste0( path.package('abn'),'/bootstrapping_example/README.md'))

Prints start up message

Description

Prints start up message

Usage

.onAttach(lib, pkg)

Value

Prints startup message to the console

Examples

library(abn)

Print AIC of objects of class abnFit

Description

Print AIC of objects of class abnFit

Usage

## S3 method for class 'abnFit'
AIC(object, digits = 3L, verbose = TRUE, ...)

Arguments

object

Object of class abnFit

digits

number of digits of the results.

verbose

print additional output.

...

additional parameters. Not used at the moment.

Value

prints the AIC of the fitted model.
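
For illustration, a minimal usage sketch (the toy data, object names, and workflow below are hypothetical and follow the pattern of the buildScoreCache examples later in this manual):

# toy data with two Gaussian nodes (hypothetical)
mydists <- list(a = "gaussian", b = "gaussian")
mydf <- data.frame(a = rnorm(100), b = rnorm(100))
# score cache, most probable DAG and model fit via the MLE route
mycache <- buildScoreCache(data.df = mydf, data.dists = mydists,
                           method = "mle", max.parents = 1)
mydag <- mostProbable(score.cache = mycache, max.parents = 1)
myfit <- fitAbn(object = mydag, method = "mle", max.parents = 1)
# print the AIC of the fitted abnFit object
AIC(myfit)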


Print BIC of objects of class abnFit

Description

Print BIC of objects of class abnFit

Usage

## S3 method for class 'abnFit'
BIC(object, digits = 3L, verbose = TRUE, ...)

Arguments

object

Object of class abnFit

digits

number of digits of the results.

verbose

print additional output.

...

additional parameters. Not used at the moment.

Value

prints the BIC of the fitted model.


Documentation of C Functions

Description

This is mainly to circumvent issues in R CMD check.

Value

nothing.


Dataset related to Feline calicivirus infection among cats in Switzerland.

Description

The dataset is about the Feline calicivirus (FCV) infection among cats in Switzerland. FCV is a virus that occurs worldwide in domestic cats but also in exotic felids. FCV is a highly contagious virus that is the major cause of upper respiratory disease or cat flu that affects felids. This is a complex disease caused by different viral and bacterial pathogens, i.e., FCV, FHV-1, Mycoplasma felis, Chlamydia felis and Bordetella bronchiseptica. It can be aggravated by retrovirus infections such as FeLV and FIV. This composite dynamic makes it very interesting for a BN modeling approach. The data were collected between September 2012 and April 2013.

Usage

FCV

Format

An adapted data frame of the original dataset, which consists of 300 observations of 15 variables.

FCV

Feline Calici Virus status (0/1).

FHV_1

Feline Herpes Virus 1 status (0/1).

C_felis

Chlamydia felis status (0/1).

M_felis

Mycoplasma felis status (0/1).

B_bronchiseptica

Bordetella bronchiseptica status (0/1).

FeLV

feline leukosis virus status (0/1).

FIV

feline immunodeficiency virus status (0/1).

Gingivostomatitis

gingivostomatitis complex status (0/1).

URTD

URTD complex (upper respiratory complex) (0/1).

Vaccinated

vaccination status (0/1).

Pedigree

pedigree (0/1).

Outdoor

outdoor access (0/1).

Sex

sex and castrated status (M, MN, F, FS).

GroupSize

number of cats in the group (counts).

Age

age in years (continuous).

References

Berger, A., Willi, B., Meli, M. L., Boretti, F. S., Hartnack, S., Dreyfus, A., ... and Hofmann-Lehmann, R. (2015). Feline calicivirus and other respiratory pathogens in cats with Feline calicivirus-related symptoms and in clinically healthy cats in Switzerland. BMC Veterinary Research, 11(1), 282.


abn Version Information

Description

abn.version() provides detailed information about the running version of abn or the abn components.

Usage

abn.version(what = c("abn", "system"))

Arguments

what

detailed information about the version of abn or the system (see returns).

Value

abn.version(what = "system") is a list with character-string components

R

R.version.string

abn

essentially abn.version$version.string

GSL, JAGS, INLA

version numbers thereof

abn.version(what = "abn") is a list with character-string components

status

the status of the version (e.g.,"beta")

major

the major version number

minor

the minor version number

year

the year the version was released

month

the month the version was released

day

the day the version was released

version.string

a character string concatenating the info above, useful for plotting, etc.

abn.version is a list of class "simple.list" which has a print method.

See Also

R.version

Examples

abn.version()$version.string
## Not run:
abn.version("system")
## End(Not run)

Dataset related to average daily growth performance and abattoir findings in commercial pig production.

Description

The case study dataset is about growth performance and abattoir findings in commercial pig production, based on a selected set of 15 Canadian farms with data collected in March 1987.

Usage

adg

Format

An adapted data frame of the original dataset which consists of 341 observations of 8 variables and a grouping variable (farm).

AR

presence of atrophic rhinitis.

pneumS

presence of moderate to severe pneumonia.

female

sex of the pig (1=female, 0=castrated).

livdam

presence of liver damage (parasite-induced white spots).

eggs

presence of fecal/gastrointestinal nematode eggs at time of slaughter.

wormCount

count of nematodes in small intestine at time of slaughter.

age

days elapsed from birth to slaughter (days).

adg

average daily weight gain (grams).

farm

farm ID.

Details

When using the data to fit an additive Bayesian network, the variables AR, pneumS, female, livdam, and eggs are considered binomial, wormCount Poisson, and age and adg Gaussian. The variable farm can be used to adjust for grouping.
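
A minimal sketch of the corresponding distribution list and a grouped score cache is given below (the call is illustrative; argument values other than the distributions and the grouping variable are assumptions):

mydists <- list(AR = "binomial", pneumS = "binomial", female = "binomial",
                livdam = "binomial", eggs = "binomial", wormCount = "poisson",
                age = "gaussian", adg = "gaussian")
# adjust for the grouping structure via the farm variable (illustrative settings)
mycache <- buildScoreCache(data.df = adg, data.dists = mydists,
                           group.var = "farm", method = "mle", max.parents = 2)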

References

Kratzer, G., Lewis, F.I., Comin, A., Pittavino, M. and Furrer, R. (2019). "Additive Bayesian Network Modelling with the R Package abn". arXiv preprint arXiv:1911.09006.

Dohoo, Ian Robert, Wayne Martin, and Henrik Stryhn. Veterinary Epidemiologic Research. No. V413 DOHv. Charlottetown, Canada: AVC Incorporated, 2003.


Transform the adjacency matrix representation of a DAG to a data.frame with columns "from" and "to" representing directed edges.

Description

Transform the adjacency matrix representation of a DAG to a data.frame with columns "from" and "to" representing directed edges.

Usage

## S3 method for class 'abnDag'
as.data.frame(x, ...)

Arguments

x

An object of class abnDag

...

Additional arguments (currently unused)

Details

The adjacency matrix in abnDag objects has parents in columns and children in rows. A value of 1 at position (i,j) indicates an arc from parent j to child i.

Value

A data.frame with columns "from" and "to" representing directed edges from parent nodes (from) to child nodes (to).

Examples

# Create example DAG matrix
mydag <- createAbnDag(dag = ~a+b|a, data.df = data.frame("a"=1, "b"=1))
# Convert to edge list
as.data.frame(mydag)

Bugs code for Bernoulli response

Description

Bugs model for a Binomial response X in a single trial: X \sim \mathcal{B}(n=1, p) = \mathrm{Bernoulli}(p).

Usage

bern_bugs(nodename, nodesintercept, parentnames, parentcoefs)

bern_bugsGroup(nodename, nodesintercept, parentnames, parentcoefs, sigma_alpha)

Arguments

nodename

character string of response variable name.

nodesintercept

overall mean of response. Parameter from fixed-effects intercept.

parentnames

single character string (for one parent) or vector of characters (for multiple parent nodes) with parent node (predictor variables) names.

parentcoefs

overall slope for each predictor (parent node) variable (fixed-effects).

sigma_alpha

between-group variance. Parameter from random-effects intercept.

Value

Bugs model returned as stdout.

Functions

See Also

makebugs, simulateAbn

Examples

bern_bugs(nodename = "a",
          parentnames = c("b", "c"),
          nodesintercept = c(0.318077),
          parentcoefs = list("b"=c(b=0.3059395),
                             "c"=c(c=0.5555)))

Control the iterations in buildScoreCache

Description

Allow the user to set restrictions in the buildScoreCache for both the Bayesian and the MLE approach. Control function similar to fit.control.

Usage

build.control(
  method = "bayes",
  max.mode.error = 10,
  mean = 0,
  prec = 0.001,
  loggam.shape = 1,
  loggam.inv.scale = 5e-05,
  max.iters = 100,
  epsabs = 1e-07,
  error.verbose = FALSE,
  trace = 0L,
  epsabs.inner = 1e-06,
  max.iters.inner = 100,
  finite.step.size = 1e-07,
  hessian.params = c(1e-04, 0.01),
  max.iters.hessian = 10,
  max.hessian.error = 0.5,
  factor.brent = 100,
  maxiters.hessian.brent = 100,
  num.intervals.brent = 100,
  n.grid = 250,
  ncores = 1,
  cluster.type = "FORK",
  max.irls = 100,
  tol = 1e-08,
  tolPwrss = 1e-07,
  check.rankX = "message+drop.cols",
  check.scaleX = "message+rescale",
  check.conv.grad = "message",
  check.conv.singular = "message",
  check.conv.hess = "message",
  xtol_abs = 1e-06,
  ftol_abs = 1e-06,
  trace.mblogit = FALSE,
  catcov.mblogit = "free",
  epsilon = 1e-06,
  only_glmmTMB_poisson = FALSE,
  seed = 9062019L
)

Arguments

method

a character that takes one of two values: "bayes" or "mle". Overrides the method argument from buildScoreCache.

max.mode.error

if the estimated modes from INLA differ by a factor of max.mode.error or more from those computed internally, then results from INLA are replaced by those computed internally. To force INLA always to be used, set max.mode.error=100; to force INLA never to be used, set max.mode.error=0. See also fitAbn.

mean

the prior mean for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(mean.intercept=...) and control.fixed=list(mean=...).

prec

the prior precision (\tau = \frac{1}{\sigma^2}) for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(prec.intercept=...) and control.fixed=list(prec=...).

loggam.shape

the shape parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argument control.family=list(hyper = list(prec = list(prior="loggamma", param=c(loggam.shape, loggam.inv.scale)))).

loggam.inv.scale

the inverse scale parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argument control.family=list(hyper = list(prec = list(prior="loggamma", param=c(loggam.shape, loggam.inv.scale)))).

max.iters

total number of iterations allowed when estimating the modes in the Laplace approximation. Passed to .Call("fit_single_node", ...).

epsabs

absolute error when estimating the modes in the Laplace approximation for models with no random effects. Passed to .Call("fit_single_node", ...).

error.verbose

logical, additional output in the case of errors occurring in the optimization. Passed to .Call("fit_single_node", ...).

trace

Non-negative integer. If positive, tracing information on the progress of the "L-BFGS-B" optimization is produced. Higher values may produce more tracing information. (There are six levels of tracing. To understand exactly what these do see the source code.) Passed to .Call("fit_single_node", ...).

epsabs.inner

absolute error in the maximization step in the (nested) Laplace approximation for each random effect term. Passed to .Call("fit_single_node", ...).

max.iters.inner

total number of iterations in the maximization step in the nested Laplace approximation. Passed to .Call("fit_single_node", ...).

finite.step.size

suggested step length used in finite difference estimation of the derivatives for the (outer) Laplace approximation when estimating modes. Passed to .Call("fit_single_node", ...).

hessian.params

a numeric vector giving parameters for the adaptive algorithm, which determines the optimal step size in the finite-difference estimation of the Hessian. First entry is the initial guess, second entry the absolute error. Passed to .Call("fit_single_node", ...).

max.iters.hessian

integer, maximum number of iterations to use when determining an optimal finite difference approximation (Nelder-Mead). Passed to .Call("fit_single_node", ...).

max.hessian.error

if the estimated log marginal likelihood when using an adaptive 5pt finite-difference rule for the Hessian differs by more than max.hessian.error from when using an adaptive 3pt rule, then continue to minimize the local error by switching to the Brent-Dekker root bracketing method. Passed to .Call("fit_single_node", ...).

factor.brent

if using the Brent-Dekker root bracketing method, then define the outermost interval end points from the best estimate of h (stepsize) from the Nelder-Mead as (h/factor.brent, h*factor.brent). Passed to .Call("fit_single_node", ...).

maxiters.hessian.brent

maximum number of iterations allowed in the Brent-Dekker method. Passed to .Call("fit_single_node", ...).

num.intervals.brent

the number of initial different bracket segments to try in the Brent-Dekker method. Passed to .Call("fit_single_node", ...).

n.grid

recompute density on an equally spaced grid with n.grid points.

ncores

The number of cores to parallelize to, see ‘Details’. If >0, the number of CPU cores to be used; -1 uses all available cores but one. Only for method="mle".

cluster.type

The type of cluster to be used, see ?parallel::makeCluster. abn then defaults to "PSOCK" on Windows and "FORK" on Unix-like systems. With "FORK" the child processes are started with rscript_args = "--no-environ" to avoid loading the whole workspace into each child.

max.irls

total number of iterations for estimating network scores using an Iterative Reweighted Least Squares algorithm. Is this DEPRECATED?

tol

real number giving the minimal tolerance expected to terminate the Iterative Reweighted Least Squares algorithm used to estimate the network score. Passed to irls_binomial_cpp_fast_br and irls_poisson_cpp_fast.

tolPwrss

numeric scalar passed to glmerControl - the tolerance for declaring convergence in the penalized iteratively weighted residual sum-of-squares step. Similar to tol.

check.rankX

character passed to lmerControl and glmerControl - specifying if rankMatrix(X) should be compared with ncol(X) and if columns from the design matrix should possibly be dropped to ensure that it has full rank. Defaults to message+drop.cols.

check.scaleX

character passed to lmerControl and glmerControl - check for problematic scaling of columns of the fixed-effect model matrix, e.g. parameters measured on very different scales. Defaults to message+rescale.

check.conv.grad

character passed to lmerControl and glmerControl - checking the gradient of the deviance function for convergence. Defaults to message but can be one of "ignore" - skip the test; "warning" - warn if test fails; "message" - print a message if test fails; "stop" - throw an error if test fails.

check.conv.singular

character passed to lmerControl and glmerControl - checking for a singular fit, i.e. one where some parameters are on the boundary of the feasible space (for example, random effects variances equal to 0 or correlations between random effects equal to +/- 1.0). Defaults to message but can be one of "ignore" - skip the test; "warning" - warn if test fails; "message" - print a message if test fails; "stop" - throw an error if test fails.

check.conv.hess

character passed to lmerControl and glmerControl - checking the Hessian of the deviance function for convergence. Defaults to message but can be one of "ignore" - skip the test; "warning" - warn if test fails; "message" - print a message if test fails; "stop" - throw an error if test fails.

xtol_abs

Defaults to 1e-6, stop on small change of parameter value. Only for method='mle', group.var=.... The default convergence tolerance for fitted (g)lmer models is reduced to the value provided here if the default values did not fit. This value is passed to the optCtrl argument of (g)lmer (see help of lme4::convergence()).

ftol_abs

Defaults to 1e-6, stop on small change in deviance. Similar to xtol_abs.

trace.mblogit

logical indicating if output should be produced for each iteration. Directly passed to the trace argument in mclogit.control. Is independent of verbose.

catcov.mblogit

Defaults to "free", meaning that there are no restrictions on the covariances of random effects between the logit equations. Set to "diagonal" if random effects pertinent to different categories are uncorrelated, or "single" if random effect variances pertinent to all categories are identical.

epsilon

Defaults to 1e-8. Positive convergence tolerance \epsilon that is directly passed to the control argument of mclogit::mblogit as mclogit.control. Only for method='mle', group.var=....

only_glmmTMB_poisson

logical, if TRUE only use glmmTMB to fit Poisson nodes with random effects. This is useful if glmer fails due to convergence issues. Default is FALSE.

seed

a non-negative integer which sets the seed in set.seed(seed).

Details

Parallelization over all children is possible via the function foreach of the package doParallel. ncores=0 or ncores=1 use single-threaded foreach. ncores=-1 uses all available cores but one.

Value

Named list according the provided arguments.

See Also

fit.control.

Other buildScoreCache: buildScoreCache()

Examples

ctrlmle <- abn::build.control(method = "mle",
                        ncores = 0,
                        cluster.type = "PSOCK",
                        max.irls = 100,
                        tol = 10^-11,
                        tolPwrss = 1e-7,
                        check.rankX = "message+drop.cols",
                        check.scaleX = "message+rescale",
                        check.conv.grad = "message",
                        check.conv.singular = "message",
                        check.conv.hess = "message",
                        xtol_abs = 1e-6,
                        ftol_abs = 1e-6,
                        trace.mblogit = FALSE,
                        catcov.mblogit = "free",
                        epsilon = 1e-6,
                        only_glmmTMB_poisson = FALSE,
                        seed = 9062019L)

ctrlbayes <- abn::build.control(method = "bayes",
                           max.mode.error = 10,
                           mean = 0, prec = 0.001,
                           loggam.shape = 1,
                           loggam.inv.scale = 5e-05,
                           max.iters = 100,
                           epsabs = 1e-07,
                           error.verbose = FALSE,
                           epsabs.inner = 1e-06,
                           max.iters.inner = 100,
                           finite.step.size = 1e-07,
                           hessian.params = c(1e-04, 0.01),
                           max.iters.hessian = 10,
                           max.hessian.error = 0.5,
                           factor.brent = 100,
                           maxiters.hessian.brent = 100,
                           num.intervals.brent = 100,
                           tol = 10^-8,
                           seed = 9062019L)

Build a cache of goodness of fit metrics for each node in a DAG, possibly subject to user-defined restrictions

Description

Iterates over all valid parent combinations - subject to ban, retain, and max.parent limits - for each node, or a subset of nodes, and computes a cache of scores (AIC, BIC, log marginal likelihood). This cache can then be used in different DAG structural search algorithms.

Usage

buildScoreCache(
  data.df = NULL,
  data.dists = NULL,
  method = "bayes",
  group.var = NULL,
  adj.vars = NULL,
  cor.vars = NULL,
  dag.banned = NULL,
  dag.retained = NULL,
  max.parents = NULL,
  which.nodes = NULL,
  defn.res = NULL,
  centre = TRUE,
  dry.run = FALSE,
  control = NULL,
  verbose = FALSE,
  debugging = FALSE,
  ...
)

buildScoreCache.bayes(
  data.df = NULL,
  data.dists = NULL,
  group.var = NULL,
  cor.vars = NULL,
  dag.banned = NULL,
  dag.retained = NULL,
  max.parents = NULL,
  which.nodes = NULL,
  defn.res = NULL,
  dry.run = FALSE,
  centre = TRUE,
  force.method = NULL,
  mylist = NULL,
  grouped.vars = NULL,
  group.ids = NULL,
  control = build.control(method = "bayes"),
  verbose = FALSE,
  debugging = FALSE
)

forLoopContentBayes(
  row.no = NULL,
  children = NULL,
  node.defn = NULL,
  dag.m = NULL,
  force.method = NULL,
  data.df = NULL,
  data.dists = NULL,
  var.types = NULL,
  control = NULL,
  grouped.vars = NULL,
  group.ids = NULL,
  verbose = FALSE
)

forLoopContent(
  row.num,
  mycache,
  data.dists,
  data.df.multi,
  adj.vars,
  data.df,
  data.df.lvl,
  group.var,
  group.ids,
  control,
  n,
  verbose
)

buildScoreCache.mle(
  data.df = NULL,
  data.dists = NULL,
  max.parents = NULL,
  adj.vars = NULL,
  cor.vars = NULL,
  dag.banned = NULL,
  dag.retained = NULL,
  which.nodes = NULL,
  centre = TRUE,
  defn.res = NULL,
  dry.run = FALSE,
  verbose = FALSE,
  debugging = FALSE,
  force.method = NULL,
  group.var = NULL,
  grouped.vars = NULL,
  group.ids = NULL,
  control = build.control(method = "mle")
)

Arguments

data.df

a data frame containing the data used for learning each node. Binary variables must be declared as factors.

data.dists

a named list giving the distribution for each node in the network, see ‘Details’.

method

should a "Bayes" or "mle" approach be used, see ‘Details’.

group.var

variable name for nodes to be fitted with a varying intercept as in a mixed-effects model ("Bayes" and "mle"); gives the column name in data.df of the grouping variable, which must be a factor denoting group membership.

adj.vars

a character vector giving the column names in data.df for which the network score has to be adjusted, see ‘Details’.

cor.vars

a character vector giving the column names in data.df for which a mixed model should be used to adjust for within-group correlation or pure adjustment ("bayes" only).

dag.banned

a matrix or a formula statement (see ‘Details’ for format) defining which arcs are not permitted - banned. Note that colnames and rownames must be set; otherwise the same row/column names as data.df will be assumed. If set as NULL an empty matrix is assumed.

dag.retained

a matrix or a formula statement (see ‘Details’ for format) defining which arcs must be retained in any model search. Note that colnames and rownames must be set; otherwise the same row/column names as data.df will be assumed. If set as NULL an empty matrix is assumed.

max.parents

a constant or named list giving the maximum number of parents allowed; the list version allows this to vary per node (only for method="bayes"). A constant can be a single integer, a numeric vector of the length of the variables with the same integer for all variables (e.g. c(2,2)), or a named list with all values being the same (e.g. list("A"=2, "B"=2)).

which.nodes

a vector giving the column indices of the variables to be included, if ignored all variables are included. This is used to subset data.df.

defn.res

an optional user-supplied list of child and parent combinations, see ‘Details’.

centre

should the observations in each Gaussian node first be standardized to mean zero and standard deviation one, defaults to TRUE.

dry.run

if TRUE then a list of the child nodes and parent combinations are returned but without estimation of node scores (log marginal likelihoods).

control

a list of control parameters. See build.control for the names of the settable control values and their effect.

verbose

if TRUE then provides some additional output.

debugging

if TRUE and method = 'mle', this enables stepping into the for-loop.

...

additional arguments passed for optimization.

force.method

"notset", "INLA" or "C". This is specified in buildScoreCache(control=list(max.mode.error=...)).

mylist

result returned from check.valid.data.

grouped.vars

result returned from check.valid.groups.

group.ids

result returned from check.valid.groups.

row.no

The row number of the child-parent combination to be processed.

children

vector of child node integers.

node.defn

child-parent combination table.

dag.m

Empty adjacency matrix.

var.types

vector of numeric encoding of distribution types. See get.var.types(data.dists)

row.num

number of child-node (mostly corresponds to child node index e.g. in dag).

mycache

prepared cache.

data.df.multi

extended data.df for one-hot-encoded multinomial variables.

data.df.lvl

copy of original data.df.

n

corresponds to nvars, number of variables in data.dists.

Details

The function computes a cache of scores based on possible restrictions (maximum complexity, retained and banned arcs). This function is very similar to fitAbn - see that help page for details of the type of models used and in particular the data.dists specification - but rather than fit a single complete DAG, buildScoreCache iterates over all different parent combinations for each node, creating a cache of scores. This cache of scores can be used to select the optimal network in other functions such as searchHeuristic or mostProbable. ‘dag.banned’ and ‘dag.retained’ specify which arcs are forced to be absent or present in the DAG, respectively. If provided as a matrix, rows represent child nodes and columns their parents for elements with a value of 1.

Two very different approaches are implemented: a Bayesian and a frequentist one. They can be selected using the method argument.

If method="bayes":

This function is used to calculate all individual node scores (log marginal likelihoods). Internal code is used by default for numerical estimation in nodes without random effects, and INLA is the default for nodes with random effects. This default behavior can be overridden using control=list(max.mode.error=...). The default is max.mode.error=10, which means that the modes estimated from the INLA output must be within 10 percent of those estimated using the internal code; otherwise, the internal code is used rather than INLA. To force the use of INLA on all nodes, use max.mode.error=100, which then ignores this check; to force the use of internal code, use max.mode.error=0. For more details, see fitAbn. The variable which.nodes allows the computation to be separated by node, for example over different CPUs using, say, R CMD BATCH. This may be useful and indeed likely essential with larger problems or those with random effects. Note that in this case, the results must then be combined back into a list of identical format to that produced by an individual call to buildScoreCache, comprising all nodes (in the same order as the columns in data.df), before sending it to any search routines. Using dry.run can be useful here.

If method="mle":

This function is used to calculate all individual information-theoretic node scores. The possible information-theoretic based network scores computed in buildScoreCache are the maximum likelihood (mlik, called marginal likelihood in this context as it is computed node-wise), the Akaike Information Criterion (aic), the Bayesian Information Criterion (bic) and the Minimum distance Length (mdl). The classical definitions of those metrics are given in Kratzer and Furrer (2018). This function computes a cache that can be fed into a model search algorithm. The numerical routines used here are identical to those in fitAbn; see that help page for further details and also the quality assurance section on the r-bayesian-networks.org abn website for more details.

Value

A named list of classabnCache.

children

a vector of the child node indexes (from 1) corresponding to the columns in data.df (ignoring any grouping variable)

node.defn

a matrix giving the parent combination

mlik

log marginal likelihood value for each node combination. If the model cannot be fitted then NA is returned.

error.code

if non-zero then either the root finding algorithm (glm nodes) or the maximisation algorithm (glmm nodes) terminated in an unusual way suggesting a possible unreliable result, or else the finite difference Hessian estimation produced an error or warning (glmm nodes). NULL if method="mle".

error.code.desc

a textual description of the error.code. NULL if method="mle"

hessian.accuracy

An estimate of the error in the final mlik value for each parent combination - this is the absolute difference between two different adaptive finite difference rules where each computes the mlik value. NULL if method="mle"

data.df

a version of the original data (for internal use only in other functions such as mostProbable).

data.dists

the named list of nodes distributions (for internal use only in other functions such as mostProbable).

max.parents

the maximum number of parents (for internal use only in other functions such as mostProbable).

dag.retained

the matrix encoding the retained arcs (for internal use only in other functions such as searchHeuristic).

dag.banned

the matrix encoding the banned arcs (for internal use only in other functions such as searchHeuristic).

aic

aic value for each node combination. If the model cannot be fitted then NaN is returned. NULL if method="bayes".

bic

bic value for each node combination. If the model cannot be fitted then NaN is returned. NULL if method="bayes".

mdl

mdl value for each node combination. If the model cannot be fitted then NaN is returned. NULL if method="bayes".

Named vector of results from one child-parent combination subject to the row.no. The names are:

childParentCombNo

The row number of the child-parent combination in the node.defn table. This must be the same as the row number in node.defn: careful if buildScoreCache.bayes() is run in parallel!

mlik

The marginal log-likelihood of the child-parent combination.

error.code

The error code returned by inla().

hessian.accuracy

The accuracy of the Hessian matrix returned by inla().

used.INLA

A logical value indicating whether inla() was used to fit the model.

A list that will be passed to buildScoreCache.mle.

Functions

References

Kratzer, Gilles, Fraser Lewis, Arianna Comin, Marta Pittavino, and Reinhard Furrer. “Additive Bayesian Network Modeling with the R Package Abn.” Journal of Statistical Software 105 (January 28, 2023): 1–41. https://doi.org/10.18637/jss.v105.i08.

Kratzer, G., Lewis, F.I., Comin, A., Pittavino, M., and Furrer, R. (2019). "Additive Bayesian Network Modelling with the R Package abn". arXiv:1911.09006.

Kratzer, G., and Furrer, R., (2018). "Information-Theoretic Scoring Rules to Learn Additive Bayesian Network Applied to Epidemiology". arXiv:1808.01126.

Lewis, F. I., and McCormick, B. J. J. (2012). "Revealing the complexity of health determinants in resource-poor settings". American Journal of Epidemiology. doi:10.1093/aje/kws183.

Further information about abn can be found at: r-bayesian-networks.org.

See Also

fitAbn

Other buildScoreCache: build.control()

Other Bayes: calc.node.inla.glm(), calc.node.inla.glmm(), fitAbn(), getmarginals()

Examples

## Simple example
# Generate data
N <- 1e6
mydists <- list(a="gaussian",
                b="gaussian",
                c="gaussian")
a <- rnorm(n = N, mean = 0, sd = 1)
b <- 1 + 2*rnorm(n = N, mean = 5, sd = 1)
c <- 2 + 1*a + 2*b + rnorm(n = N, mean = 2, sd = 1)
mydf <- data.frame("a" = scale(a),
                   "b" = scale(b),
                   "c" = scale(c))

# ABN with MLE
mycache.mle <- buildScoreCache(data.df = mydf,
                               data.dists = mydists,
                               method = "mle",
                               max.parents = 2)
dag.mle <- mostProbable(score.cache = mycache.mle,
                        max.parents = 2)
myfit.mle <- fitAbn(object = dag.mle,
                    method = "mle",
                    max.parents = 2)
plot(myfit.mle)

## Not run:
# ABN with Bayes
if(requireNamespace("INLA", quietly = TRUE)){
  # Run only if INLA is available
  mycache.bayes <- buildScoreCache(data.df = mydf,
                                   data.dists = mydists,
                                   method = "bayes",
                                   max.parents = 2)
  dag.bayes <- mostProbable(score.cache = mycache.bayes,
                            max.parents = 2)
  myfit.bayes <- fitAbn(object = dag.bayes,
                        method = "bayes",
                        max.parents = 2)
  plot(myfit.bayes)
}

# Compare MLE and Bayes with lm
mymod.lm <- lm(c ~ a + b, data = mydf)
summary(mymod.lm)

##################################################################
## Example 1 - "mle" vs. "bayes" and the latter using the internal
## C routine compared to INLA
##################################################################

# Subset of the built-in dataset, see ?ex0.dag.data
mydat <- ex0.dag.data[,c("b1","b2","g1","g2","b3","g3")] ## take a subset of cols

# setup distribution list for each node
mydists <- list(b1="binomial", b2="binomial", g1="gaussian",
                g2="gaussian", b3="binomial", g3="gaussian")

# Structural constraints
## ban arc from b2 to b1
## always retain arc from g2 to g1
## parent limits - can be specified for each node separately
max.par <- list("b1"=2, "b2"=2, "g1"=2, "g2"=2, "b3"=2, "g3"=2)

# now build the cache of pre-computed scores accordingly to the structural constraints
res.c <- buildScoreCache(data.df=mydat,
                         data.dists=mydists,
                         dag.banned= ~b1|b2,
                         dag.retained= ~g1|g2,
                         max.parents=max.par,
                         method="bayes")

# repeat but using R-INLA. The mlik's should be virtually identical.
# Force using of INLA build.control(max.mode.error=100)
if(requireNamespace("INLA", quietly = TRUE)){
  res.inla <- buildScoreCache(data.df=mydat,
                              data.dists=mydists,
                              dag.banned= ~b1|b2, # ban arc from b2 to b1
                              dag.retained= ~g1|g2, # always retain arc from g2 to g1
                              max.parents=max.par,
                              method="bayes",
                              control=build.control(max.mode.error=100))

  ## comparison - very similar
  difference <- res.c$mlik - res.inla$mlik
  summary(difference)
}

# Comparison Bayes with MLE (unconstrained):
res.mle <- buildScoreCache(data.df=mydat, data.dists=mydists,
                           max.parents=3, method="mle")
res.abn <- buildScoreCache(data.df=mydat, data.dists=mydists,
                           max.parents=3, method="bayes")
# of course different, but same order:
plot(-res.mle$bic, res.abn$mlik)

##################################################################
## Example 2 - mle with several cores
##################################################################

## Many variables, few observations
mydat <- ex0.dag.data
mydists <- as.list(rep(c("binomial", "gaussian", "poisson"), each=10))
names(mydists) <- names(mydat)

system.time({
  res.mle1 <- buildScoreCache(data.df=mydat,
                              data.dists=mydists,
                              max.parents=2,
                              method="mle",
                              control = build.control(method = "mle",
                                                      ncores=1))
})
system.time({
  res.mle2 <- buildScoreCache(data.df=mydat,
                              data.dists=mydists,
                              max.parents=2,
                              method="mle",
                              control = build.control(method = "mle",
                                                      ncores=2))
})

##################################################################
## Example 3 - grouped data - random effects example e.g. glmm
##################################################################

## this data comes with abn see ?ex3.dag.data
mydat <- ex3.dag.data[,c("b1","b2","b3","b4","b5","b6","b7",
                         "b8","b9","b10","b11","b12","b13", "group")]
mydists <- list(b1="binomial", b2="binomial", b3="binomial",
                b4="binomial", b5="binomial", b6="binomial", b7="binomial",
                b8="binomial", b9="binomial", b10="binomial", b11="binomial",
                b12="binomial", b13="binomial")
max.par <- 2

## in this example INLA is used as default since these are glmm nodes
## when running this at node-parent combination 71 the default accuracy check on the
## INLA modes is exceeded (default is a max. of 10 percent difference from
## modes estimated using internal code) and a message is given that internal code
## will be used in place of INLA's results.
mycache.bayes <- buildScoreCache(data.df=mydat,
                                 data.dists=mydists,
                                 group.var="group",
                                 method = "bayes",
                                 max.parents=max.par)
dag.bayes <- mostProbable(score.cache=mycache.bayes)
plot(dag.bayes)

mycache.mle <- buildScoreCache(data.df=mydat,
                               data.dists=mydists,
                               group.var="group",
                               method = "mle",
                               max.parents=max.par)
dag.mle <- mostProbable(score.cache=mycache.mle)
plot(dag.mle)
## End(Not run)

Fit a given regression using INLA

Description

Internal wrapper to INLA, called from fitAbn.bayes and buildScoreCache.bayes.

Usage

calc.node.inla.glm(
  child.loc = NULL,
  dag.m.loc = NULL,
  data.df.loc = NULL,
  data.dists.loc = NULL,
  ntrials.loc = NULL,
  exposure.loc = NULL,
  compute.fixed.loc = NULL,
  mean.intercept.loc = NULL,
  prec.intercept.loc = NULL,
  mean.loc = NULL,
  prec.loc = NULL,
  loggam.shape.loc = NULL,
  loggam.inv.scale.loc = NULL,
  verbose.loc = FALSE,
  nthreads = NULL
)

Arguments

child.loc

index of current child node.

dag.m.loc

dag as matrix.

data.df.loc

data df,

data.dists.loc

list of distributions.

ntrials.loc

rep(1,dim(data.df)[1]).

exposure.loc

rep(1,dim(data.df)[1]).

compute.fixed.loc

TRUE.

mean.intercept.loc

the prior mean for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(mean.intercept=...) and control.fixed=list(mean=...).

prec.intercept.loc

the prior precision for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(prec.intercept=...) and control.fixed=list(prec=...).

mean.loc

the prior mean for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(mean.intercept=...) and control.fixed=list(mean=...). Same as mean.intercept.loc.

prec.loc

the prior precision for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(prec.intercept=...) and control.fixed=list(prec=...). Same as prec.intercept.loc.

loggam.shape.loc

the shape parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argument control.family=list(hyper = list(prec = list(prior="loggamma", param=c(loggam.shape, loggam.inv.scale)))).

loggam.inv.scale.loc

the inverse scale parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argument control.family=list(hyper = list(prec = list(prior="loggamma", param=c(loggam.shape, loggam.inv.scale)))).

verbose.loc

FALSE to not print additional output.

nthreads

number of threads to use for INLA. Default is fit.control[["ncores"]] or build.control[["ncores"]], which is the number of cores specified in control and defaults to 1.

Value

If INLA failed, FALSE or an error is returned. Otherwise, the direct output from INLA is returned.

See Also

Other Bayes: buildScoreCache(), calc.node.inla.glmm(), fitAbn(), getmarginals()


Fit a given regression using INLA

Description

Internal wrapper to INLA, called from fitAbn.bayes and buildScoreCache.bayes.

Usage

calc.node.inla.glmm(
  child.loc = NULL,
  dag.m.loc = NULL,
  data.df.loc = NULL,
  data.dists.loc = NULL,
  ntrials.loc = NULL,
  exposure.loc = NULL,
  compute.fixed.loc = NULL,
  mean.intercept.loc = NULL,
  prec.intercept.loc = NULL,
  mean.loc = NULL,
  prec.loc = NULL,
  loggam.shape.loc = NULL,
  loggam.inv.scale.loc = NULL,
  verbose.loc = FALSE,
  nthreads = NULL
)

Arguments

child.loc

index of current child node.

dag.m.loc

dag as matrix.

data.df.loc

data df,

data.dists.loc

list of distributions.

ntrials.loc

rep(1,dim(data.df)[1]).

exposure.loc

rep(1,dim(data.df)[1]).

compute.fixed.loc

TRUE.

mean.intercept.loc

the prior mean for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(mean.intercept=...) and control.fixed=list(mean=...).

prec.intercept.loc

the prior precision for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(prec.intercept=...) and control.fixed=list(prec=...).

mean.loc

the prior mean for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(mean.intercept=...) and control.fixed=list(mean=...). Same as mean.intercept.loc.

prec.loc

the prior precision for all the Gaussian additive terms for each node. INLA arguments control.fixed=list(prec.intercept=...) and control.fixed=list(prec=...). Same as prec.intercept.loc.

loggam.shape.loc

the shape parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argument control.family=list(hyper = list(prec = list(prior="loggamma", param=c(loggam.shape, loggam.inv.scale)))).

loggam.inv.scale.loc

the inverse scale parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argument control.family=list(hyper = list(prec = list(prior="loggamma", param=c(loggam.shape, loggam.inv.scale)))).

verbose.loc

FALSE to not print additional output.

nthreads

number of threads to use for INLA. Default is fit.control[["ncores"]] or build.control[["ncores"]], which is the number of cores specified in control and defaults to 1.

Value

If INLA failed, FALSE or an error is returned. Otherwise, the direct output from INLA is returned.

See Also

Other Bayes: buildScoreCache(), calc.node.inla.glm(), fitAbn(), getmarginals()


Bugs code for Categorical response

Description

Bugs code for Categorical response

Usage

categorical_bugs(
  nodename,
  nodesCatIdx,
  parentnames,
  nodesintercepts,
  parentcoefs
)

categorical_bugsGroup(
  nodename,
  nodesCatIdx,
  nodesintercepts,
  parentnames,
  parentcoefs,
  sigma,
  sigma_alpha
)

Arguments

nodename

character string of response variable name.

nodesCatIdx

integer vector of length |K-1| and starting at k+1 (see Examples).

parentnames

single character string (for one parent) or vector of characters (for multiple parent nodes) with parent node (predictor variables) names.

nodesintercepts

overall mean of response. Parameter from fixed-effects intercept.

parentcoefs

overall slope for each predictor (parent node) variable (fixed-effects).

sigma

within-group variance. Parameter from random-effects residual.

sigma_alpha

between-group variance-covariance matrix. Parameters from random-effects intercept.

Details

The output of fitAbn with method = "mle" is based on the output of logistic regression models fit with either lm, glm, glmer, multinom, mblogit or internal irls methods. They all use the first factor level as reference level. Therefore, nodesCatIdx starts with index 2 and not 1. nodesintercepts and parentcoefs refer to the values of (Intercept) and Estimate of the respective model output. Predictor names build the keys in parentcoefs.

Value

Bugs model returned as stdout.

Functions

See Also

makebugs, simulateAbn

Examples

# A -> B
# Where B is a categorical variable with 4 levels.
categorical_bugs(nodename = "b",
                 nodesCatIdx = c(2, 3, 4),
                 parentnames = "a",
                 nodesintercepts = c(2.188650, 3.133928, 3.138531),
                 parentcoefs = list("a"=c(a=1.686432, a=3.134161, a=5.052104)))

Simple check on the control parameters

Description

Simple check on the control parameters

Usage

check.valid.buildControls(control, method = "bayes", verbose = FALSE)

Arguments

control

list of control arguments with new parameters supplied to buildScoreCache or fitAbn.

method

"bayes" or "mle" strategy from argumentmethod=... inbuildScoreCache orfitAbn. Defaults to "bayes".

verbose

when TRUE additional information is printed. Defaults to FALSE.

Value

list with all control arguments with respect to the method but with new values.


Set of simple commonsense validity checks on the directed acyclic graph definition matrix

Description

Set of simple commonsense validity checks on the directed acyclic graph definition matrix

Usage

check.valid.dag(
  dag = NULL,
  data.df = NULL,
  is.ban.matrix = FALSE,
  group.var = NULL
)

Arguments

dag

Named square matrix or a formula statement specifying a directed acyclic graph. If NULL an empty network is assumed.

data.df

data frame with names corresponding to variable/node names.

is.ban.matrix

Diagonals and cycles are not checked for ban-matrices. Defaults to FALSE.

group.var

not yet implemented

Value

dag as named square matrix


Set of simple commonsense validity checks on the data.df and data.dists arguments

Description

Set of simple commonsense validity checks on the data.df and data.dists arguments

Usage

check.valid.data(data.df = NULL, data.dists = NULL, group.var = NULL)

Arguments

data.df

data frame with names corresponding to variable/node names.

data.dists

list specifying each columns distribution type. Names correspond to column names and values must be one of "gaussian", "binomial", "poisson", "multinomial".

group.var

not yet implemented

Value

a list of the indexes for each distribution type


Simple check on the control parameters

Description

Simple check on the control parameters

Usage

check.valid.fitControls(control, method = "bayes", verbose = FALSE)

Arguments

control

list of control arguments with new parameters supplied to buildScoreCache or fitAbn.

method

"bayes" or "mle" strategy from argumentmethod=... inbuildScoreCache orfitAbn. Defaults to "bayes".

verbose

when TRUE additional information is printed. Defaults to FALSE.

Value

list with all control arguments with respect to the method but with new values.


Simple check on the grouping variable

Description

Simple check on the grouping variable

Usage

check.valid.groups(
  group.var = NULL,
  data.df = NULL,
  cor.vars = NULL,
  verbose = FALSE
)

Arguments

group.var

Name of grouping variable.

data.df

data frame of all variables including the variable specified in group.var as factor.

cor.vars

Name(s) of variables to which the grouping should be applied.

verbose

when TRUE additional information is printed. Defaults to FALSE.

Value

list with data.df, the indexes of variables to which the grouping should be applied, and the associated group for each observation as integer.


Set of simple checks on the given parent limits

Description

Set of simple checks on the given parent limits

Usage

check.valid.parents(data.df = NULL, max.parents = NULL, group.var = NULL)

Arguments

data.df

data frame

max.parents

single integer for one overall max parent limit. A list with names corresponding to variable/column names of data.df and individual parent limits. NULL for no parent limit restriction(s).

group.var

not yet implemented

Value

numeric vector of max number of parents per variable


Set of simple checks on the list given as parent limits

Description

Set of simple checks on the list given as parent limits

Usage

check.which.valid.nodes(data.df = NULL, which.nodes = NULL, group.var = NULL)

Arguments

data.df

data frame

which.nodes

a vector giving the column indices of the variables to be included, if ignored all variables are included.

group.var

not yet implemented

Value

integer vector of column indexes of valid nodes in data.df


Print coefficients of objects of class abnFit

Description

Print coefficients of objects of class abnFit

Usage

## S3 method for class 'abnFit'
coef(object, digits = 3L, verbose = TRUE, ...)

Arguments

object

Object of class abnFit

digits

number of digits of the results.

verbose

print additional output.

...

additional parameters. Not used at the moment.

Value

prints the coefficients of the fitted model.


Compare two DAGs or EGs

Description

Function that returns multiple graph metrics to compare two DAGs or essential graphs, known as confusion matrix or error matrix.

Usage

compareDag(ref, test, node.names = NULL, checkDAG = TRUE)

Arguments

ref

a matrix or a formula statement (see details for format) defining the reference network structure, a directed acyclic graph (DAG). Note that row names must be set or given in node.names if the DAG is given via a formula statement.

test

a matrix or a formula statement (see details for format) defining the test network structure, a directed acyclic graph (DAG). Note that row names must be set or given in node.names if the DAG is given via a formula statement.

node.names

a vector of names if the DAGs are given via formula, see details.

checkDAG

should the inputs be tested for being DAGs (default TRUE).

Details

This R function returns standard Directed Acyclic Graph comparison metrics. In statistical classification, those metrics are known as a confusion matrix or error matrix.

Those metrics allow visualization of the difference between different DAGs. When comparing the truth to a learned structure, or two learned structures, those metrics allow the user to estimate the performance of the learning algorithm. In order to compute the metrics, a contingency table is computed from a weighted difference of the adjacency matrices of the two graphs.

The ref or test can be provided using a formula statement (similar to GLM input). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2). node2 and node3 have the same parent3. The parent names have to exactly match those given in node.names. : is the separator between either children or parents, | separates children (left side) and parents (right side), + separates terms, . replaces all the variables in node.names.

To test for essential graphs (or graphs in general), the test for DAGs needs to be switched off with checkDAG=FALSE. The function compareEG() is a wrapper to compareDag(, checkDAG=FALSE).

Value

TP

True Positive

TN

True Negative

FP

False Positive

FN

False Negative

CP

Condition Positive (ref)

CN

Condition Negative (ref)

PCP

Predicted Condition Positive (test)

PCN

Predicted Condition Negative (test)

True Positive Rate

=\frac{\sum TP}{\sum CP}

False Positive Rate

=\frac{\sum FP}{\sum CN}

Accuracy

=\frac{\sum TP + \sum TN}{Total population}

G-measure

\sqrt {{\frac {TP}{TP+FP}}\cdot {\frac {TP}{TP+FN}}}

F1-Score

\frac{2 \sum TP}{2 \sum TP + \sum FN + \sum FP}

Positive Predictive Value

\frac{\sum TP}{\sum PCP}

False Omission Rate

\frac{\sum FN}{\sum PCN}

Hamming-Distance

Number of changes needed to match the matrices.

References

Sammut, Claude, and Geoffrey I. Webb. (2017). Encyclopedia of machine learning and data mining. Springer.

Examples

test.m <- matrix(data = c(0,1,0,
                          0,0,0,
                          1,0,0), nrow = 3, ncol = 3)
ref.m <- matrix(data = c(0,0,0,
                         1,0,0,
                         1,0,0), nrow = 3, ncol = 3)
colnames(test.m) <- rownames(test.m) <- colnames(ref.m) <- rownames(ref.m) <- c("a", "b", "c")
unlist(compareDag(ref = ref.m, test = test.m))
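
The same two DAGs can also be sketched with formula statements instead of matrices (node.names supplies the labels; the formulas below should encode the matrices above, with node a having parents b and c in the reference, and b|a plus a|c in the test):

ref.f  <- ~a|b:c          # node a has parents b and c
test.f <- ~b|a + a|c      # node b has parent a, node a has parent c
unlist(compareDag(ref = ref.f, test = test.f, node.names = c("a", "b", "c")))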

Compare two DAGs or EGs

Description

Function that returns multiple graph metrics to compare two DAGs or essential graphs, known as confusion matrix or error matrix.

Usage

compareEG(ref, test)

Arguments

ref

a matrix or a formula statement (see details for format) defining the reference network structure, a directed acyclic graph (DAG). Note that row names must be set or given in node.names if the DAG is given via a formula statement.

test

a matrix or a formula statement (see details for format) defining the test network structure, a directed acyclic graph (DAG). Note that row names must be set or given in node.names if the DAG is given via a formula statement.

Details

This R function returns standard Directed Acyclic Graph comparison metrics.In statistical classification, those metrics are known as aconfusion matrix or error matrix.

Those metrics allows visualization of the difference between different DAGs.In the case where comparing TRUTH to learned structure or two learned structures,those metrics allow the user to estimate the performance of the learning algorithm.In order to compute the metrics, a contingency table is computed of apondered difference of the adjacency matrices od the two graphs.

The ref or test arguments can be provided using a formula statement (similar to GLM input). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent3. The parent names have to exactly match those given in node.names. : is the separator between either children or parents, | separates children (left side) from parents (right side), + separates terms, and . replaces all the variables in node.names.

To compare essential graphs (or graphs in general), the test for a DAG needs to be switched off with checkDAG=FALSE. The function compareEG() is a wrapper for compareDag(..., checkDAG=FALSE).
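
A minimal sketch of comparing two partially directed graphs: the a-b edge is left undirected (symmetric entries), which compareDag() would reject as a DAG but compareEG() accepts.

eg1 <- matrix(c(0,1,0, 1,0,0, 1,0,0), nrow = 3, ncol = 3)
eg2 <- matrix(c(0,1,0, 1,0,0, 0,0,0), nrow = 3, ncol = 3)
colnames(eg1) <- rownames(eg1) <- colnames(eg2) <- rownames(eg2) <- c("a", "b", "c")
unlist(compareEG(ref = eg1, test = eg2))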

Value

TP

True Positive

TN

True Negative

FP

False Positive

FN

False Negative

CP

Condition Positive (ref)

CN

Condition Negative (ref)

PCP

Predicted Condition Positive (test)

PCN

Predicted Condition Negative (test)

True Positive Rate

=\frac{\sum TP}{\sum CP}

False Positive Rate

=\frac{\sum FP}{\sum CN}

Accuracy

=\frac{\sum TP + \sum TN}{Total population}

G-measure

\sqrt {{\frac {TP}{TP+FP}}\cdot {\frac {TP}{TP+FN}}}

F1-Score

\frac{2 \sum TP}{2 \sum TP + \sum FN + \sum FP}

Positive Predictive Value

\frac{\sum TP}{\sum PCP}

False Omission Rate

\frac{\sum FN}{\sum PCN}

Hamming-Distance

Number of changes needed to match the matrices.

References

Sammut, Claude, and Geoffrey I. Webb. (2017). Encyclopedia of machine learning and data mining. Springer.

Examples

test.m <- matrix(data = c(0,1,0,
                          0,0,0,
                          1,0,0), nrow = 3, ncol = 3)
ref.m <- matrix(data = c(0,0,0,
                         1,0,0,
                         1,0,0), nrow = 3, ncol = 3)
colnames(test.m) <- rownames(test.m) <- colnames(ref.m) <- rownames(ref.m) <- c("a", "b", "c")
unlist(compareDag(ref = ref.m, test = test.m))

Make DAG of class "abnDag"

Description

Make DAG of class "abnDag"

Usage

createAbnDag(dag, data.df = NULL, data.dists = NULL, ...)

Arguments

dag

a matrix or a formula specifying a DAG, see ‘Details’.

data.df

a named data frame.

data.dists

a named list giving the distribution for each node in the network. If not provided, it will be sampled and returned.

...

further arguments passed to or from other methods.

Details

An object of class abnDag contains a named matrix describing the DAG and possibly additional objects such as the associated distributions of the nodes. If the dag is specified with a formula, either data.df or data.dists is required if the . quantifier is used. If the dag is specified with an unnamed matrix and both data.df and data.dists are missing, lower-case letters of the Roman alphabet are used for the node names.
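
A minimal sketch of the default naming rule described above (the matrix is illustrative): an unnamed 2 x 2 adjacency matrix supplied without data.df or data.dists is accepted, and the two nodes are labelled with lower-case letters.

dagDefaultNames <- createAbnDag(dag = matrix(c(0, 1, 0, 0), nrow = 2, ncol = 2))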

Value

An abnDag object: a list with components dag, data.df, and data.dists.

Create a legitimate DAG

Create a legitimate DAG in the abn format.

An object of class abnDag containing a named matrix and a named list giving the distribution for each node.

Examples

dagFromFormula <- createAbnDag(dag = ~a+b|a,
                               data.df = data.frame("a"=1, "b"=1),
                               data.dists = list(a="binomial", b="gaussian"))
dagFromMatrix <- createAbnDag(dag = matrix(c(0,1,0,0), 2, 2),
                              data.df = data.frame("a"=1, "b"=1),
                              data.dists = list(a="binomial", b="gaussian"))
plot(dagFromMatrix)

Discretization of a Possibly Continuous Data Frame of Random Variables based on their distribution

Description

This function discretizes a data frame of possibly continuous random variables using rules for discretization. The discretization algorithms are unsupervised and univariate. See details for the complete list of discretization rules (the number of states of each random variable can also be provided).

Usage

discretization(data.df = NULL,
               data.dists = NULL,
               discretization.method = "sturges",
               nb.states = FALSE)

Arguments

data.df

a data frame containing the data to discretize; binary and multinomial variables must be declared as factors, others as numeric vectors. The data frame must be named.

data.dists

a named list giving the distribution for each node in the network.

discretization.method

a character vector giving the discretization method to use; see details. If a number is provided, the variable will be discretized by equal binning.

nb.states

logical variable to select the output. If set to TRUE, a list with the discretized data frame and the number of states of each variable is returned. If set to FALSE, only the discretized data frame is returned.

Details

fd Freedman-Diaconis rule. IQR() stands for the interquartile range. The number of bins is given by

\frac{range(x) * n^{1/3}}{2 * IQR(x)}

The Freedman-Diaconis rule is known to be less sensitive to outliers than Scott's rule.

doane Doane's rule. The number of bins is given by

1 + \log_{2}{n} + \log_{2}{1+\frac{|g|}{\sigma_{g}}}

This is a modification of Sturges' formula, which attempts to improve its performance with non-normal data.

sqrt The number of bins is given by:

\sqrt{n}

cencov Cencov's rule. The number of bins is given by:

n^{1/3}

rice Rice's rule. The number of bins is given by:

2 n^{1/3}

terrell-scott Terrell-Scott's rule. The number of bins is given by:

(2 n)^{1/3}

The Cencov, Rice, and Terrell-Scott rules are known to over-estimate k compared to other rules, due to their simplicity.

sturges Sturges's rule. The number of bins is given by:

1 + \log_2(n)

scott Scott's rule. The number of bins is given by:

\frac{range(x)}{\sigma(x) n^{-1/3}}
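
A minimal sketch comparing two of the rules above on the same simulated Gaussian variable (the variable name rv is illustrative):

x.df <- data.frame(rv = rnorm(n = 200, mean = 5, sd = 2))
dists <- list(rv = "gaussian")
d.sturges <- discretization(data.df = x.df, data.dists = dists,
                            discretization.method = "sturges", nb.states = TRUE)
d.fd <- discretization(data.df = x.df, data.dists = dists,
                       discretization.method = "fd", nb.states = TRUE)
str(d.sturges)  # a list: the discretized data frame plus the number of states per variable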

Value

The discretized data frame, or a list containing the discretized data frame and the table of counts for each bin of the discretized data frame.

References

Garcia, S., et al. (2013). A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning.IEEE Transactions on Knowledge and Data Engineering, 25.4, 734-750.

Cebeci, Z. and Yildiz, F. (2017). Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset.Turkish Journal of Agriculture-Food Science and Technology, 5.4, 315-320.

Examples

## Generate random variable
rv <- rnorm(n = 100, mean = 5, sd = 2)
dist <- list("gaussian")
names(dist) <- c("rv")

## Compute the entropy through discretization
entropyData(freqs.table = discretization(data.df = rv, data.dists = dist,
                                         discretization.method = "sturges",
                                         nb.states = FALSE))

Computes an Empirical Estimation of the Entropy from a Table of Counts

Description

This function empirically estimates the Shannon entropy from a table of counts using the observed frequencies.

Usage

entropyData(freqs.table)

Arguments

freqs.table

a table of counts.

Details

The general concept of entropy is defined for probability distributions. The entropyData() function estimates empirical entropy from data. The probabilities are estimated from the data using frequency tables, and these estimates are then plugged into the definition of entropy to return the so-called empirical entropy. A well-known problem of empirical entropy is that the estimates are biased due to sampling noise. The bias decreases as the sample size increases.
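
A minimal sketch on a hand-built table of counts: for a fair two-state variable the empirical entropy should approach log(2), about 0.693 on the natural-logarithm scale, as the sample size grows.

counts <- table(sample(c("heads", "tails"), size = 1000, replace = TRUE))
entropyData(freqs.table = counts)  # close to log(2) ~ 0.693 for large samples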

Value

Shannon's entropy estimate on natural logarithm scale.

integer

References

Cover, Thomas M, and Joy A Thomas. (2012). "Elements of Information Theory". John Wiley & Sons.

See Also

discretization

Examples

## Generate random variable
rv <- rnorm(n = 100, mean = 5, sd = 2)
dist <- list("gaussian")
names(dist) <- c("rv")

## Compute the entropy through discretization
entropyData(freqs.table = discretization(data.df = rv, data.dists = dist,
                                         discretization.method = "sturges",
                                         nb.states = FALSE))

Construct the essential graph

Description

Constructs different versions of the essential graph from a given DAG. External function that computes the essential graph of a DAG. Minimal PDAG: the only directed edges are those that participate in a v-structure. Completed PDAG: every directed edge corresponds to a compelled edge, and every undirected edge corresponds to a reversible edge.

Usage

essentialGraph(dag, node.names = NULL, PDAG = "minimal")

Arguments

dag

a matrix or a formula statement (see ‘Details’ for format) defining the network structure, a directed acyclic graph (DAG).

node.names

a vector of names if the DAG is given via formula, see ‘Details’.

PDAG

a character value that can be: minimal or complete, see ‘Details’.

Details

This function returns an essential graph from a DAG, also known as an acyclic partially directed graph (PDAG). This can be useful if the learning procedure is defined only up to a Markov equivalence class. In a minimal PDAG, the only directed edges are those that participate in a v-structure, whereas in the completed PDAG every directed edge corresponds to a compelled edge and every undirected edge corresponds to a reversible edge.

The dag can be provided using a formula statement (similar to glm). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent3. The parent names have to exactly match those given in node.names. : is the separator between either children or parents, | separates children (left side) from parents (right side), + separates terms, and . replaces all the variables in node.names.
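
A minimal sketch of the formula interface described above (node names are illustrative): the v-structure a -> b <- c is supplied as a formula together with node.names.

essentialGraph(dag = ~ b|a:c, node.names = c("a", "b", "c"))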

Value

A matrix giving the PDAG.

References

West, D. B. (2001). Introduction to Graph Theory. Vol. 2. Upper Saddle River: Prentice Hall.Chickering, D. M. (2013) A Transformational Characterization of Equivalent Bayesian Network Structures, arXiv:1302.4938.

Examples

dag <- matrix(c(0,0,0, 1,0,0, 1,1,0), nrow = 3, ncol = 3)
dist <- list(a="gaussian", b="gaussian", c="gaussian")
colnames(dag) <- rownames(dag) <- names(dist)
essentialGraph(dag)

Function to get the marginal across an equally spaced grid

Description

Function to get the marginal across an equally spaced grid.

Usage

eval.across.grid(mylist, n.grid, single)

Arguments

mylist

a list of matrices with two columns, x and y.

n.grid

grid size

single

NULL or TRUE if only a single node and parameter

Value

list


Synthetic validation data set for use with abn library examples

Description

300 observations simulated from a DAG with 10 binary variables, 10 Gaussian variables and 10 Poisson variables.

Usage

ex0.dag.data

Format

A data frame; binary variables are factors. The relevant formulas are given below (note these do not give parameter estimates, just the form of the relationships, e.g. logit()=1 means a logit link function comprising only an intercept term).

b1

binary, logit()=1

b2

binary, logit()=1

b3

binary, logit()=1

b4

binary, logit()=1

b5

binary, logit()=1

b6

binary, logit()=1

b7

binary, logit()=1

b8

binary, logit()=1

b9

binary, logit()=1

b10

binary, logit()=1

g1

gaussian, identity()=1

g2

gaussian, identity()=1

g3

gaussian, identity()=1

g4

gaussian, identity()=1

g5

gaussian, identity()=1

g6

gaussian, identity()=1

g7

gaussian, identity()=1

g8

gaussian, identity()=1

g9

gaussian, identity()=1

g10

gaussian, identity()=1

p1

poisson, log()=1

p2

poisson, log()=1

p3

poisson, log()=1

p4

poisson, log()=1

p5

poisson, log()=1

p6

poisson, log()=1

p7

poisson, log()=1

p8

poisson, log()=1

p9

poisson, log()=1

p10

poisson, log()=1

Examples

## Not run: 
## The dataset was (essentially) generated using the following code:
datasize <- 300
tmp <- c(rep("y", as.integer(datasize/2)), rep("n", as.integer(datasize/2)))
set.seed(1)
ex0.dag.data <- data.frame(b1=sample(tmp, size=datasize, replace=TRUE),
                           b2=sample(tmp, size=datasize, replace=TRUE),
                           b3=sample(tmp, size=datasize, replace=TRUE),
                           b4=sample(tmp, size=datasize, replace=TRUE),
                           b5=sample(tmp, size=datasize, replace=TRUE),
                           b6=sample(tmp, size=datasize, replace=TRUE),
                           b7=sample(tmp, size=datasize, replace=TRUE),
                           b8=sample(tmp, size=datasize, replace=TRUE),
                           b9=sample(tmp, size=datasize, replace=TRUE),
                           b10=sample(tmp, size=datasize, replace=TRUE),
                           g1=rnorm(datasize, mean=0, sd=1),
                           g2=rnorm(datasize, mean=0, sd=1),
                           g3=rnorm(datasize, mean=0, sd=1),
                           g4=rnorm(datasize, mean=0, sd=1),
                           g5=rnorm(datasize, mean=0, sd=1),
                           g6=rnorm(datasize, mean=0, sd=1),
                           g7=rnorm(datasize, mean=0, sd=1),
                           g8=rnorm(datasize, mean=0, sd=1),
                           g9=rnorm(datasize, mean=0, sd=1),
                           g10=rnorm(datasize, mean=0, sd=1),
                           p1=rpois(datasize, lambda=10),
                           p2=rpois(datasize, lambda=10),
                           p3=rpois(datasize, lambda=10),
                           p4=rpois(datasize, lambda=10),
                           p5=rpois(datasize, lambda=10),
                           p6=rpois(datasize, lambda=10),
                           p7=rpois(datasize, lambda=10),
                           p8=rpois(datasize, lambda=10),
                           p9=rpois(datasize, lambda=10),
                           p10=rpois(datasize, lambda=10))
## End(Not run)

Synthetic validation data set for use with abn library examples

Description

10000 observations simulated from a DAG with 10 variables from Poisson, Bernoulli and Gaussian distributions.

Usage

ex1.dag.data

Format

A data frame; binary variables are factors. The relevant formulas are given below (note these do not give parameter estimates, just the form of the relationships, like in glm(); e.g. logit()=1+p1 means a logit link function comprising an intercept term and a term involving p1).

b1

binary, logit()=1

p1

poisson, log()=1

g1

gaussian, identity()=1

b2

binary, logit()=1

p2

poisson, log()=1+b1+p1

b3

binary, logit()=1+b1+g1+b2

g2

gaussian, identity()=1+p1+g1+b2

b4

binary, logit()=1+g1+p2

b5

binary, logit()=1+g1+g2

g3

gaussian, identity()=1+g1+b2

Examples

## The data is one realisation from the underlying DAG:
ex1.true.dag <- matrix(data=c(
  0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,
  1,1,0,0,0,0,0,0,0,0,
  1,0,1,1,0,0,0,0,0,0,
  0,1,1,1,0,0,0,0,0,0,
  0,0,1,0,1,0,0,0,0,0,
  0,0,1,0,0,0,1,0,0,0,
  0,0,1,1,0,0,0,0,0,0), ncol=10, byrow=TRUE)
colnames(ex1.true.dag) <- rownames(ex1.true.dag) <-
  c("b1","p1","g1","b2","p2","b3","g2","b4","b5","g3")

Synthetic validation data set for use with abn library examples

Description

10000 observations simulated from a DAG with 18 variables, six each from the Poisson, Bernoulli and Gaussian distributions.

Usage

ex2.dag.data

Format

A data frame; binary variables are factors. The relevant formulas are given below (note these do not give parameter estimates, just the form of the relationships, e.g. logit()=1 means a logit link function comprising only an intercept term).

b1

binary,logit()=1+g1+b2+b3+p3+b4+g4+b5

g1

gaussian,identity()=1

p1

poisson,log()=1+g6

b2

binary,logit()=1+p3+b4+p6

g2

gaussian, identity()=1+b2

p2

poisson,log()=1+b2

b3

binary,logit()=1+g1+g2+p2+g3+p3+g4

g3

gaussian, identity()=1+g1+p3+b4

p3

poisson,log()=1

b4

binary,logit()=1+g1+p3+p5

g4

gaussian, identity()=1+b4

p4

poisson,log()=1+g1+b2+g2+b5

b5

binary,logit()=1+b2+g2+b3+p3+g4

g5

gaussian, identity()=1

p5

poisson,log()=1+g1+g5+b6+g6

b6

binary,logit()=1

g6

gaussian, identity()=1

p6

poisson,log()=1+g5

Examples

## The true underlying stochastic model has DAG - this data is a single realisation.
ex2.true.dag <- matrix(data = c(
  0,1,0,1,0,0,1,0,1,1,1,0,1,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,
  0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,
  0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,1,0,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,
  0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,
  0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,
  0,1,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,
  0,0,0,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0), ncol = 18, byrow = TRUE)
colnames(ex2.true.dag) <- rownames(ex2.true.dag) <- c("b1","g1","p1","b2",
                                                      "g2","p2","b3","g3",
                                                      "p3","b4","g4","p4",
                                                      "b5","g5","p5","b6",
                                                      "g6","p6")

Validation data set for use with abn library examples

Description

1000 observations of 13 binary variables and one grouping variable. Real (anonymised) data of unknown structure.

Usage

ex3.dag.data

Format

A data frame with 14 columns, where b1, b2, ..., b13 are binary variables encoded as factors and group is a factor with 100 levels defining the sampling groups (10 observations each).


Validation data set for use with abn library examples

Description

2000 observations of 10 binary variables and one grouping variable. Real (anonymised) data of unknown structure.

Usage

ex4.dag.data

Format

A data frame with eleven columns: group, a factor with 85 levels defining sampling groups; b1, ..., b10, binary variables encoded as factors.


Validation data set for use with abn library examples

Description

434 observations of 18 variables (6 binary and 12 continuous) and one grouping variable. Real (anonymised) data of unknown structure.

Usage

ex5.dag.data

Format

A data frame with 19 columns: b1, ..., b6, binary variables encoded as factors; g1, ..., g12, continuous variables. Finally, the column group defines sampling groups (also encoded as a factor).


Validation data set for use with abn library examples

Description

800 observations of 8 variables (1 count, 2 binary and 4 continuous) and 1 grouping variable. Real (anonymised) data of unknown structure.

Usage

ex6.dag.data

Format

A data frame with eight columns; binary variables are factors.

p1

count

g1

continuous

g2

continuous

b1

binary

b2

binary

g3

continuous

g4

continuous

group

factor, defines sampling groups


Validation data set for use with abn library examples

Description

10648 observations of 3 variables: 2 binary and 1 grouping variable. Real (anonymised) data of unknown structure.

Usage

ex7.dag.data

Format

A data frame, binary variables are factors

b1

binary

b2

binary

group

factor, defines sampling groups


expit of proportions

Description

See also the C implementation, ?abn::expit_cpp().

Usage

expit(x)

Arguments

x

numeric with values in [0,1].

Value

numeric vector of the same length as x.


expit function

Description

Transforms x either via the logit or the expit function.

Usage

expit_cpp(x)

Arguments

x

a numeric vector

Value

a numeric vector


Export abnFit object to structured JSON format

Description

Exports a fitted Additive Bayesian Network (ABN) model to a structured JSONformat suitable for storage, sharing, and interoperability with other analysistools. The export includes network structure (variables and arcs) and modelparameters (coefficients, variances, and their associated metadata).

Usage

export_abnFit(
  object,
  format = "json",
  include_network = TRUE,
  file = NULL,
  pretty = TRUE,
  scenario_id = NULL,
  label = NULL,
  ...
)

Arguments

object

An object of classabnFit, typically created byfitAbn.

format

Character string specifying the export format. Currently, only"json" is supported.

include_network

Logical, whether to include network structure (variablesand arcs). Default isTRUE.

file

Optional character string specifying a file path to save the JSONoutput. IfNULL (default), the JSON string is returned.

pretty

Logical, whether to format the JSON output with indentation forreadability. Default isTRUE. Set toFALSE for more compact output.

scenario_id

Optional character string or numeric identifier for the modelrun or scenario. Useful for tracking multiple model versions or experiments.Default isNULL.

label

Optional character string providing a descriptive name or labelfor the scenario. Default isNULL.

...

Additional export options (currently unused, reserved for future extensions).

Details

This function provides a standardized way to export fitted ABN models to JSON,facilitating model sharing, archiving, and integration with external tools ordatabases. The JSON structure is designed to be both human-readable andmachine-parseable, following a flat architecture to avoid deep nesting.

Supported Model Types

The function handles different model fitting methods:

JSON Structure Overview

The exported JSON follows a three-component structure:

Additionally, optional top-level fieldsscenario_id andlabelcan be used to identify and describe the model.

Value

Iffile isNULL, returns a character string containing the JSONrepresentation of the model. Iffile is provided, writes the JSON tothe specified file and invisibly returns the file path.

JSON Schema

Top-Level Fields

scenario_id

Optional string or numeric identifier for the modelrun. Can benull.

label

Optional descriptive name for the model. Can benull.

variables

Array of variable objects (see Variables section).

parameters

Array of parameter objects (see Parameters section).

arcs

Array of arc objects (see Arcs section).

Variables Array

Each variable object contains:

variable_id

Unique identifier for the variable (string). This IDis used throughout the JSON to reference this variable in parameters'sourcefields and in arcs'source_variable_id/target_variable_id fields.

attribute_name

Original attribute name from the data (string).

model_type

Distribution type:"gaussian","binomial","poisson", or"multinomial".

states

Array of state objects for multinomial variables only.Each state hasstate_id (used to reference specific categories inparameters),value_name (the category label), andis_baseline(whether this is the reference category).NULL for continuous variables.

Parameters Array

Each parameter object contains:

parameter_id

Unique identifier for the parameter (string).

name

Parameter name (e.g., "intercept", "prob_2", a coefficient name, "sigma", "sigma_alpha").

link_function_name

Link function:"identity" (Gaussian),"logit" (Binomial, Multinomial), or"log" (Poisson).

source

Object identifying which variable and state this parameterbelongs to. Containsvariable_id (required, references a variable fromthe variables array) and optionalstate_id (references a specific statefor category-specific parameters in multinomial models).

coefficients

Array of coefficient objects (typically of length 1), each with value, stderr (or NULL for mixed models), condition_type, and a conditions array.

Coefficient Condition Types

Arcs Array

Each arc object contains:

source_variable_id

Identifier of the parent/source node.

target_variable_id

Identifier of the child/target node.

Design Rationale

The JSON structure uses a flat architecture with three parallel arrays rather than deeply nested objects. This design offers several advantages:

Parameters are linked to variables through the source.variable_id field, with an optional source.state_id for category-specific parameters in multinomial models. Parent dependencies are encoded in the conditions array within each coefficient.
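
A minimal sketch (assuming a JSON string json_export produced by export_abnFit(), as in the Examples below) of how the flat arrays can be re-joined in R; only fields documented in the schema above are used.

parsed <- jsonlite::fromJSON(json_export, simplifyVector = FALSE)
## variable_id -> model_type lookup built from the variables array
types <- vapply(parsed$variables, function(v) v$model_type, character(1))
names(types) <- vapply(parsed$variables, function(v) v$variable_id, character(1))
## distribution of the node each parameter belongs to, via source$variable_id
vapply(parsed$parameters, function(p) types[[p$source$variable_id]], character(1))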

See Also

Examples

## Not run: 
# Load example data and fit a model
library(abn)
data(ex1.dag.data)

# Define distributions
mydists <- list(b1 = "binomial", p1 = "poisson", g1 = "gaussian",
                b2 = "binomial", p2 = "poisson", g2 = "gaussian",
                b3 = "binomial", g3 = "gaussian")

# Build score cache
mycache <- buildScoreCache(data.df = ex1.dag.data,
                           data.dists = mydists,
                           method = "mle",
                           max.parents = 2)

# Find most probable DAG
mp_dag <- mostProbable(score.cache = mycache)

# Fit the model
myfit <- fitAbn(object = mp_dag, method = "mle")

# Export to JSON string with metadata
json_export <- export_abnFit(myfit,
                             scenario_id = "example_model_v1",
                             label = "Example ABN Model")

# View the structure
library(jsonlite)
parsed <- fromJSON(json_export)
str(parsed, max.level = 2)

# Export to file
export_abnFit(myfit,
              file = "my_abn_model.json",
              scenario_id = "example_model_v1",
              label = "Example ABN Model",
              pretty = TRUE)

# Export with compact formatting
compact_json <- export_abnFit(myfit, pretty = FALSE)

# ---
# Mixed-effects model example
# (Requires data with grouping structure)

# Add grouping variable
ex1.dag.data$group <- rep(1:5, length.out = nrow(ex1.dag.data))

# Build cache with grouping
mycache_grouped <- buildScoreCache(data.df = ex1.dag.data,
                                   data.dists = mydists,
                                   method = "mle",
                                   group.var = "group",
                                   max.parents = 2)

# Fit grouped model
myfit_grouped <- fitAbn(object = mp_dag,
                        method = "mle",
                        group.var = "group")

# Export grouped model (includes random effects)
json_grouped <- export_abnFit(myfit_grouped,
                              scenario_id = "grouped_model_v1",
                              label = "Mixed Effects ABN")
## End(Not run)

Export abnFit object fitted with Bayesian methods

Description

Export abnFit object fitted with Bayesian methods

Usage

export_abnFit_bayes(
  object,
  format,
  include_network,
  scenario_id = NULL,
  label = NULL,
  ...
)

Arguments

object

An object of classabnFit, typically created byfitAbn.

format

Character string specifying the export format. Currently, only"json" is supported.

include_network

Logical, whether to include network structure (variablesand arcs). Default isTRUE.

scenario_id

Optional character string or numeric identifier for the modelrun or scenario. Useful for tracking multiple model versions or experiments.Default isNULL.

label

Optional character string providing a descriptive name or labelfor the scenario. Default isNULL.

...

Additional export options (currently unused, reserved for future extensions).

Details

This function handles abnFit objects fitted using Bayesian methods.It will extract the posterior distributions and other Bayesian-specific information.

The structure will follow the same variables/parameters/arcs format, but parameterswill include posterior summaries (mean, median, credible intervals) instead ofpoint estimates and standard errors.

TODO: Implement the full extraction logic for Bayesian models, including:

Value

A named list with components: scenario_id, label, variables, parameters, arcs.


Export abnFit object fitted with MLE (non-mixed effects)

Description

Export abnFit object fitted with MLE (non-mixed effects)

Usage

export_abnFit_mle(
  object,
  format,
  include_network,
  scenario_id = NULL,
  label = NULL,
  ...
)

Arguments

object

An object of classabnFit, typically created byfitAbn.

format

Character string specifying the export format. Currently, only"json" is supported.

include_network

Logical, whether to include network structure (variablesand arcs). Default isTRUE.

scenario_id

Optional character string or numeric identifier for the modelrun or scenario. Useful for tracking multiple model versions or experiments.Default isNULL.

label

Optional character string providing a descriptive name or labelfor the scenario. Default isNULL.

...

Additional export options (currently unused, reserved for future extensions).

Details

This function handles abnFit objects fitted using Maximum Likelihood Estimation (MLE)without mixed-effects. It extracts variables metadata, arc details, and node parameterisations.

Value

A named list with components: scenario_id, label, variables, parameters, arcs.


Export arc information from abnFit objects fitted with MLE

Description

Export arc information from abnFit objects fitted with MLE

Usage

export_abnFit_mle_arcs(object, ...)

Arguments

object

An object of classabnFit, typically created byfitAbn.

...

Additional export options (currently unused, reserved for future extensions).

Details

This function extracts arc information from abnFit objects fitted using MLE.It retrieves the source and target nodes for each arc in the directed acyclic graph (DAG).Currently, frequency, significance, and constraint information are not included in the export.

Value

An array containing arc details: source_variable_id and target_variable_id for each arc.


Export node information from abnFit objects fitted with MLE (mixed effects)

Description

Export node information from abnFit objects fitted with MLE (mixed effects)

Usage

export_abnFit_mle_grouped_nodes(object, ...)

Arguments

object

An object of class abnFit fitted with method = "mle" and group.var specified.

...

Additional arguments (currently unused)

Details

This function extracts node parameterisation information from abnFit objects that were fitted using the Maximum Likelihood Estimation (MLE) approach WITH mixed effects (i.e., group.var was specified).

For mixed-effects models, the structure includes:

The export format follows the same variables/parameters structure, but parameters will include both fixed-effect coefficients and random-effect variance components.

Value

A named list with two components: variables and parameters.Variables is an array where each element represents a variable with its metadata.Parameters is an array where each element represents a parameter, including bothfixed-effect coefficients and random-effect variance components.


Export node information from abnFit objects fitted with MLE (non-mixed effects)

Description

Export node information from abnFit objects fitted with MLE (non-mixed effects)

Usage

export_abnFit_mle_nodes(object, ...)

Arguments

object

An object of class abnFit fitted with method = "mle"

...

Additional arguments (currently unused)

Details

This function extracts node parameterisation information from abnFit objects that were fitted using the Maximum Likelihood Estimation (MLE) approach without mixed effects (i.e., no group.var specified). The function processes the coefficients and standard errors stored in the abnFit object.

The coef component contains the estimated regression coefficients for each node, stored as a matrix where column names indicate the parameter names (e.g., "g2", "m11", "b1|intercept"). These represent the linear model coefficients from the generalized linear model fitted to each node given its parents in the DAG.

The Stderror component contains the corresponding standard errors for each coefficient, providing a measure of uncertainty in the parameter estimates. The structure mirrors that of the coef component.
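
A minimal sketch of inspecting the two components this helper consumes; the object myfit and the component names coef and Stderror are assumptions taken from the description above, for a model fitted with fitAbn(..., method = "mle").

str(myfit$coef)      # per-node regression coefficient matrices (assumed component name)
str(myfit$Stderror)  # matching standard errors (assumed component name)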

For different distribution types:

Value

A named list with two components: variables and parameters.Variables is an array where each element represents a variable with its metadata.Parameters is an array where each element represents a parameter with its coefficients.


Helper function to convert export list to JSON

Description

Helper function to convert export list to JSON

Usage

export_to_json(export_list, format, file = NULL, pretty = TRUE)

Arguments

export_list

The list to convert to JSON. Must contain variables, parameters, and arcs components, see details.

format

Character string specifying the export format. Currently, only"json" is supported.

file

Optional character string specifying a file path to save the JSONoutput. IfNULL (default), the JSON string is returned.

pretty

Logical, whether to format the JSON output with indentation forreadability. Default isTRUE. Set toFALSE for more compact output.

Details

The export_list must be a named list with the following components:


Helper function to extract parameters based on distribution type

Description

Helper function to extract parameters based on distribution type

Usage

extract_parameters_by_distribution(
  coef_vec,
  se_vec,
  distribution,
  node_id,
  parent_nodes,
  start_counter,
  link_function
)

Helper function to extract parameters for mixed-effects models

Description

Helper function to extract parameters for mixed-effects models

Usage

extract_parameters_mixed_effects(
  mu,
  betas,
  sigma,
  sigma_alpha,
  distribution,
  node_id,
  parent_nodes,
  start_counter,
  link_function
)

Arguments

mu

Fixed-effect intercept(s) from mu component

betas

Fixed-effect coefficients from betas component

sigma

Residual variance from sigma component

sigma_alpha

Random-effect variance/covariance from sigma_alpha component

distribution

Node distribution type

node_id

Node identifier

parent_nodes

Parent node identifiers

start_counter

Starting parameter counter

link_function

Link function name

Details

Extracts parameters from mixed-effects models following the new JSON structure.

For each node, creates parameters for:


Extract states for categorical variables from data

Description

Extract states for categorical variables from data

Usage

extract_states_from_data(object, node_id)

Print family of objects of classabnFit

Description

Print family of objects of classabnFit

Usage

## S3 method for class 'abnFit'
family(object, ...)

Arguments

object

Object of classabnFit

...

additional parameters. Not used at the moment.

Value

prints the distributions for each variable of the fitted model.


Find next X evaluation Point

Description

Attempt to find the next x evaluation point using spline extrapolation, traversing left (or right, for find.next.right.x) from the mode.

Usage

find.next.left.x(mat.xy, g.max, g.factor, x.delta, max.fact.delta)
find.next.right.x(mat.xy, g.max, g.factor, x.delta, max.fact.delta)

Arguments

mat.xy

matrix

g.max

integer. See Details.

g.factor

integer. See Details.

x.delta

integer. See Details.

max.fact.delta

integer. See Details.

Details

If the new x point is more than a factor max.fact.delta (e.g. 0.2) away from the last evaluated point, the evaluation stops there. Otherwise, the function searches for the next x value to the left (or right) which differs from the maximum gvalue by at least a factor of g.factor, searching in step sizes of x.delta, subject to the constraint that if the move exceeds max.fact.delta*last.x the function evaluates at that point. This avoids big steps.

Value

integer

integer

Functions


Control the iterations infitAbn

Description

Allows the user to set restrictions in fitAbn for both the Bayesian and the MLE approach. Control function similar to build.control.

Usage

fit.control(
  method = "bayes",
  max.mode.error = 10,
  mean = 0,
  prec = 0.001,
  loggam.shape = 1,
  loggam.inv.scale = 5e-05,
  max.iters = 100,
  epsabs = 1e-07,
  error.verbose = FALSE,
  trace = 0L,
  epsabs.inner = 1e-06,
  max.iters.inner = 100,
  finite.step.size = 1e-07,
  hessian.params = c(1e-04, 0.01),
  max.iters.hessian = 10,
  max.hessian.error = 1e-04,
  factor.brent = 100,
  maxiters.hessian.brent = 10,
  num.intervals.brent = 100,
  min.pdf = 0.001,
  n.grid = 250,
  std.area = TRUE,
  marginal.quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975),
  max.grid.iter = 1000,
  marginal.node = NULL,
  marginal.param = NULL,
  variate.vec = NULL,
  ncores = 1,
  cluster.type = "FORK",
  max.irls = 100,
  tol = 1e-11,
  tolPwrss = 1e-07,
  check.rankX = "message+drop.cols",
  check.scaleX = "message+rescale",
  check.conv.grad = "message",
  check.conv.singular = "message",
  check.conv.hess = "message",
  xtol_abs = 1e-06,
  ftol_abs = 1e-06,
  trace.mblogit = FALSE,
  catcov.mblogit = "free",
  epsilon = 1e-06,
  only_glmmTMB_poisson = FALSE,
  seed = 9062019L
)

Arguments

method

a character that takes one of two values: "bayes" or "mle". Overrides the method argument from buildScoreCache.

max.mode.error

if the estimated modes from INLA differ by a factor of max.mode.error or more from those computed internally, then the results from INLA are replaced by those computed internally. To force INLA always to be used, set max.mode.error=100; to force INLA never to be used, set max.mode.error=0. See also fitAbn.

mean

the prior mean for all the Gaussian additive terms for each node. INLA argumentcontrol.fixed=list(mean.intercept=...) andcontrol.fixed=list(mean=...).

prec

the prior precision (\tau = \frac{1}{\sigma^2}) for all the Gaussian additive terms for each node. INLA argument control.fixed=list(prec.intercept=...) and control.fixed=list(prec=...).

loggam.shape

the shape parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argumentcontrol.family=list(hyper = list(prec = list(prior="loggamma",param=c(loggam.shape, loggam.inv.scale)))).

loggam.inv.scale

the inverse scale parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argumentcontrol.family=list(hyper = list(prec = list(prior="loggamma",param=c(loggam.shape, loggam.inv.scale)))).

max.iters

total number of iterations allowed when estimating the modes in Laplace approximation. passed to.Call("fit_single_node", ...).

epsabs

absolute error when estimating the modes in Laplace approximation for models with no random effects. Passed to.Call("fit_single_node", ...).

error.verbose

logical, additional output in the case of errors occurring in the optimization. Passed to.Call("fit_single_node", ...).

trace

Non-negative integer. If positive, tracing information on the progress of the "L-BFGS-B" optimization is produced. Higher values may produce more tracing information. (There are six levels of tracing. To understand exactly what these do see the source code.). Passed to.Call("fit_single_node", ...).

epsabs.inner

absolute error in the maximization step in the (nested) Laplace approximation for each random effect term. Passed to.Call("fit_single_node", ...).

max.iters.inner

total number of iterations in the maximization step in the nested Laplace approximation. Passed to.Call("fit_single_node", ...).

finite.step.size

suggested step length used in finite difference estimation of the derivatives for the (outer) Laplace approximation when estimating modes. Passed to.Call("fit_single_node", ...).

hessian.params

a numeric vector giving parameters for the adaptive algorithm, which determines the optimal stepsize in the finite-difference estimation of the hessian. First entry is the initial guess, second entry absolute error. Passed to.Call("fit_single_node", ...).

max.iters.hessian

integer, maximum number of iterations to use when determining an optimal finite difference approximation (Nelder-Mead). Passed to.Call("fit_single_node", ...).

max.hessian.error

if the estimated log marginal likelihood when using an adaptive 5pt finite-difference rule for the Hessian differs by more thanmax.hessian.error from when using an adaptive 3pt rule then continue to minimize the local error by switching to the Brent-Dekker root bracketing method. Passed to.Call("fit_single_node", ...).

factor.brent

if using Brent-Dekker root bracketing method then define the outer most interval end points as the best estimate ofh (stepsize) from the Nelder-Mead ash/factor.brent,h*factor.brent). Passed to.Call("fit_single_node", ...).

maxiters.hessian.brent

maximum number of iterations allowed in the Brent-Dekker method. Passed to.Call("fit_single_node", ...).

num.intervals.brent

the number of initial different bracket segments to try in the Brent-Dekker method. Passed to.Call("fit_single_node", ...).

min.pdf

the value of the posterior density function below which we stop the estimation only used when computing marginals, see details.

n.grid

recompute density on an equally spaced grid withn.grid points.

std.area

logical, should the area under the estimated posterior density be standardized to exactly one, useful for error checking.

marginal.quantiles

vector giving quantiles at which to compute the posterior marginal distribution at.

max.grid.iter

gives number of grid points to estimate posterior density at when not explicitly specifying a grid used to avoid excessively long computation.

marginal.node

used in conjunction withmarginal.param to allow bespoke estimate of a marginal density over a specific grid. value from 1 to the number of nodes.

marginal.param

used in conjunction withmarginal.node. value of 1 is for intercept, see modes entry in results for the appropriate number.

variate.vec

a vector containing the places to evaluate the posterior marginal density, must be supplied ifmarginal.node is not null.

ncores

The number of cores to parallelize to, see ‘Details’. If > 0, the number of CPU cores to be used; -1 uses all available cores but one. Only for method="mle".

cluster.type

The type of cluster to be used, see ?parallel::makeCluster. abn defaults to "PSOCK" on Windows and "FORK" on Unix-like systems. With "FORK" the child processes are started with rscript_args = "--no-environ" to avoid loading the whole workspace into each child.

max.irls

total number of iterations for estimating network scores using an Iteratively Reweighted Least Squares (IRLS) algorithm (possibly deprecated).

tol

real number giving the minimal tolerance expected to terminate the Iteratively Reweighted Least Squares algorithm used to estimate the network score. Passed to irls_binomial_cpp_fast_br and irls_poisson_cpp_fast.

tolPwrss

numeric scalar passed toglmerControl - the tolerance for declaring convergence in the penalized iteratively weighted residual sum-of-squares step. Similar totol.

check.rankX

character passed tolmerControl andglmerControl - specifying ifrankMatrix(X) should be compared withncol(X) and if columns from the design matrix should possibly be dropped to ensure that it has full rank. Defaults tomessage+drop.cols.

check.scaleX

character passed tolmerControl andglmerControl - check for problematic scaling of columns of fixed-effect model matrix, e.g. parameters measured on very different scales. Defaults tomessage+rescale.

check.conv.grad

character passed tolmerControl andglmerControl - checking the gradient of the deviance function for convergence. Defaults tomessage but can be one of "ignore" - skip the test; "warning" - warn if test fails; "message" - print a message if test fails; "stop" - throw an error if test fails.

check.conv.singular

character passed tolmerControl andglmerControl - checking for a singular fit, i.e. one where some parameters are on the boundary of the feasible space (for example, random effects variances equal to 0 or correlations between random effects equal to +/- 1.0). Defaults tomessage but can be one of "ignore" - skip the test; "warning" - warn if test fails; "message" - print a message if test fails; "stop" - throw an error if test fails.

check.conv.hess

character passed tolmerControl andglmerControl - checking the Hessian of the deviance function for convergence. Defaults tomessage but can be one of "ignore" - skip the test; "warning" - warn if test fails; "message" - print a message if test fails; "stop" - throw an error if test fails.

xtol_abs

Defaults to 1e-6; stop on a small change of parameter value. Only for method='mle', group.var=.... The default convergence tolerance for fitted (g)lmer models is reduced to the value provided here if the default values did not fit. This value is passed to the optCtrl argument of (g)lmer (see the help of lme4::convergence()).

ftol_abs

Defaults to 1e-6; stop on a small change in deviance. Similar to xtol_abs.

trace.mblogit

logical indicating if output should be produced for each iteration. Directly passed totrace argument inmclogit.control. Is independent ofverbose.

catcov.mblogit

Defaults to "free" meaning that there are no restrictions on the covariances of random effects between the logit equations. Set to "diagonal" if random effects pertinent to different categories are uncorrelated or "single" if random effect variances pertinent to all categories are identical.

epsilon

Defaults to 1e-8. Positive convergence tolerance\epsilon that is directly passed to thecontrol argument ofmclogit::mblogit asmclogit.control. Only formethod='mle', group.var=....

only_glmmTMB_poisson

logical, if TRUE only useglmmTMB to fit Poisson nodes with random effects. This is useful ifglmer fails due to convergence issues. Default is FALSE.

seed

a non-negative integer which sets the seed inset.seed(seed).

Details

Parallelization over all children is possible via the function foreach of the package doParallel. ncores=0 or ncores=1 uses a single-threaded foreach. ncores=-1 uses all available cores but one.

Value

a list of control parameters for the fitAbn function.

See Also

build.control.

Other fitAbn:fitAbn()

Examples

ctrlmle <- abn::fit.control(method = "mle",
                            max.irls = 100,
                            tol = 10^-11,
                            tolPwrss = 1e-7,
                            xtol_abs = 1e-6,
                            ftol_abs = 1e-6,
                            epsilon = 1e-6,
                            ncores = 2,
                            cluster.type = "PSOCK",
                            only_glmmTMB_poisson = FALSE,
                            seed = 9062019L)
ctrlbayes <- abn::fit.control(method = "bayes",
                              mean = 0,
                              prec = 0.001,
                              loggam.shape = 1,
                              loggam.inv.scale = 5e-05,
                              max.mode.error = 10,
                              max.iters = 100,
                              epsabs = 1e-07,
                              error.verbose = FALSE,
                              epsabs.inner = 1e-06,
                              max.iters.inner = 100,
                              finite.step.size = 1e-07,
                              hessian.params = c(1e-04, 0.01),
                              max.iters.hessian = 10,
                              max.hessian.error = 1e-04,
                              factor.brent = 100,
                              maxiters.hessian.brent = 10,
                              num.intervals.brent = 100,
                              min.pdf = 0.001,
                              n.grid = 100,
                              std.area = TRUE,
                              marginal.quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975),
                              max.grid.iter = 1000,
                              marginal.node = NULL,
                              marginal.param = NULL,
                              variate.vec = NULL,
                              ncores = 1,
                              cluster.type = NULL,
                              seed = 9062019L)

Fit an additive Bayesian network model

Description

Fits an additive Bayesian network to observed data and is equivalent to Bayesian or information-theoretic multi-dimensional regression modeling. Two numerical options are available in the Bayesian setting: a standard Laplace approximation, or an integrated nested Laplace approximation provided via a call to the R INLA library (see r-inla.org; this is not hosted on CRAN).

Usage

fitAbn(object = NULL,
       dag = NULL,
       data.df = NULL,
       data.dists = NULL,
       method = NULL,
       group.var = NULL,
       adj.vars = NULL,
       cor.vars = NULL,
       centre = TRUE,
       compute.fixed = FALSE,
       control = NULL,
       verbose = FALSE,
       debugging = FALSE,
       ...)

fitAbn.bayes(
  dag = NULL,
  data.df = NULL,
  data.dists = NULL,
  group.var = NULL,
  cor.vars = NULL,
  centre = TRUE,
  compute.fixed = FALSE,
  control = fit.control(method = "bayes"),
  mylist = NULL,
  grouped.vars = NULL,
  group.ids = NULL,
  force.method = NULL,
  verbose = FALSE,
  debugging = FALSE
)

fitAbn.mle(
  dag = NULL,
  data.df = NULL,
  data.dists = NULL,
  group.var = NULL,
  grouped.vars = NULL,
  group.ids = NULL,
  adj.vars = NULL,
  cor.vars = NULL,
  centre = TRUE,
  control = fit.control(method = "mle"),
  verbose = FALSE,
  debugging = FALSE
)

regressionLoop(
  i = NULL,
  dag = NULL,
  data.df = NULL,
  data.df.multi = NULL,
  data.dists = NULL,
  group.var = NULL,
  grouped.vars = NULL,
  group.ids = NULL,
  control = NULL,
  nvars = NULL,
  nobs = NULL,
  dag.multi = NULL,
  verbose = NULL,
  only_glmmTMB_poisson = FALSE
)

Arguments

object

an object of classabnLearned produced bymostProbable,searchHeuristic orsearchHillClimber.

dag

a matrix or a formula statement (see details) defining the network structure, a directed acyclic graph (DAG), see details for format. Note that column names and row names must be set up.

data.df

a data frame containing the data used for learning the network; binary variables must be declared as factors, and no missing values are allowed in any variable.

data.dists

a named list giving the distribution for each node in the network, see details.

method

if NULL, takes the method of object; otherwise "bayes" or "mle" for the method to be used, see details.

group.var

only applicable for mixed models and gives the column name indata.df of the grouping variable (which must be a factor denoting group membership).

adj.vars

a character vector giving the column names indata.df for which the network score has to be adjusted for, see details.

cor.vars

a character vector giving the column names in data.df for which a mixed model should be used (method = 'bayes' only).

centre

should the observations in each Gaussian node first be standardised to mean zero and standard deviation one.

compute.fixed

a logical flag, set toTRUE for computation of marginal posterior distributions, see details.

control

a list of control parameters. Seefit.control for the names of the settable control values and their effect.

verbose

ifTRUE then provides some additional output, in particular the code used to call INLA, if applicable.

debugging

if TRUE and method = 'mle', this enables stepping into the for-loop.

...

additional arguments passed for optimization.

mylist

result returned fromcheck.valid.data.

grouped.vars

result returned from check.valid.groups. Column indexes of all variables which are affected by the grouping effect.

group.ids

result returned fromcheck.valid.groups. Vector of group allocation for each observation (row) in 'data.df'.

force.method

"notset", "INLA" or "C". This is specified inbuildScoreCache(control=list(max.mode.error=...)).

i

number of child-node (mostly corresponds to child node index e.g. in dag).

data.df.multi

extended data.df for one-hot-encoded multinomial variables.

nvars

number of variables in data.dists.

nobs

number of observations in data.df.

dag.multi

extended dag for one-hot-encoded multinomial variables.

only_glmmTMB_poisson

logical, if TRUE only useglmmTMB to fit Poisson nodes with random effects. This is useful ifglmer fails due to convergence issues. Default is FALSE.

Details

If method="bayes":

The procedure fitAbn fits an additive Bayesian network model to data where each node (variable - a column in data.df) can be either: presence/absence (Bernoulli); continuous (Gaussian); or an unbounded count (Poisson). Multinomial distributions are only supported with method = "mle" (see below). The model comprises a set of conditionally independent generalized linear regressions with or without random effects. Internal code is used by default for numerical estimation in nodes without random effects, and INLA is the default for nodes with random effects. This default behavior can be overridden using control=list(max.mode.error=...). The default is max.mode.error=10, which means that the modes estimated from the INLA output must be within 10% of those computed internally; otherwise, the internal code is used rather than INLA. To force the use of INLA on all nodes, use max.mode.error=100, which ignores this check; to force the use of the internal code, use max.mode.error=0. For the numerical reliability and performance of abn see https://r-bayesian-networks.org/. Generally speaking, INLA can be swift and accurate, but in several cases it can perform very poorly, so some care is required (which is why there is an internal check on the modes). Binary variables must be declared as factors with two levels, and the argument data.dists must be a list with named arguments, one for each of the variables in data.df (except a grouping variable, if present), where each entry is either "poisson", "binomial", "multinomial" or "gaussian", see examples below. The "poisson" and "binomial" distributions use log and logit link functions, respectively. Note that "binomial" here actually means only binary, one Bernoulli trial per row in data.df.
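
A minimal sketch of overriding the INLA/internal-code check described above; mydag, mydata and mydists are placeholders for a DAG, a data frame and a matching named distribution list.

## 'mydag', 'mydata' and 'mydists' are hypothetical placeholders
myfit.internal <- fitAbn(dag = mydag,
                         data.df = mydata,
                         data.dists = mydists,
                         method = "bayes",
                         control = fit.control(method = "bayes", max.mode.error = 0))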

If the data are grouped into correlated blocks - where in a standard regression context a mixed model might be used - then a network can be fitted in which one or more nodes use a generalized linear mixed model (but limited to only a single random effect). This is achieved by specifying the parameters group.var and cor.vars, where the former defines the group membership variable, which should be a factor indicating which observations belong to the same grouping. The parameter cor.vars is a character vector that contains the names of the nodes for which a mixed model should be used. This is not yet implemented with method = 'mle'. For example, in some problems it may be appropriate for all variables (except group.var) in data.df to be parametrized as a mixed model, while in others it may only be a single variable for which grouping adjustment is required (as the remaining variables are covariates measured at group level).

In the network structure definition, dag, each row represents a node in the network, and the columns in each row define the parents for that particular node, see the example below for the specific format. The dag can be provided using a formula statement (similar to GLM). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent3. The parent names must match those given in data.df. : is the separator between either children or parents, | separates children (left side) from parents (right side), + separates terms, and . replaces all the variables in data.df.
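
A minimal sketch of the formula interface (the variable names follow the ex1.dag.data example data; the chosen arcs are illustrative only, not the true structure):

mydists <- list(b1 = "binomial", p1 = "poisson", g1 = "gaussian")
## illustrative arcs only: b1 and p1 each have parent g1
myfit <- fitAbn(dag = ~ b1|g1 + p1|g1,
                data.df = ex1.dag.data[, c("b1", "p1", "g1")],
                data.dists = mydists,
                method = "mle")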

If compute.fixed=TRUE then the marginal posterior distributions for all parameters are computed. Note that the current algorithm used to determine the evaluation grid is rather crude and may need to be manually refined using variate.vec (one parameter at a time) for publication-quality density estimates. Note that a manual grid can only be used with the internal code and not INLA (which uses its own grid). The end points are defined as where the value of the marginal density drops below a given threshold pdf.min. When estimating the log marginal likelihood in models with random effects (using the internal code rather than INLA), an attempt is made to minimize the error by comparing the estimates given by a 3pt and a 5pt rule when estimating the Hessian in the Laplace approximation. The modes used in each case are identical. The first derivatives are computed using gsl's adaptive finite-difference function, and this is embedded inside the standard 3pt and 5pt rules for the second derivatives. In all cases, a central difference approximation is tried first, with a forward difference as a fall back (as the precision parameters are strictly positive). The error is minimized by choosing an optimal step size using gsl's Nelder-Mead optimization, and if this fails (e.g., the error is larger than max.hessian.error) then the Brent-Dekker root bracketing method is used as a fallback. If the error cannot be reduced to below max.hessian.error, then the step size which gave the lowest error during the searches (across potentially many different initial bracket choices) is used for the final Hessian evaluations in the Laplace approximation.

If method="mle":

The procedure fitAbn with the argument method = "mle" fits an additive Bayesian network model to data where each node (variable - a column in data.df) can be either: presence/absence (Bernoulli); continuous (Gaussian); an unbounded count (Poisson); or a discrete variable (Multinomial). The model comprises a set of conditionally independent generalized linear regressions with or without adjustment. Binary and discrete variables must be declared as factors, and the argument data.dists must be a list with named arguments, one for each of the variables in data.df, where each entry is either "poisson", "binomial", "multinomial" or "gaussian", see examples below. The "poisson" and "binomial" distributions use log and logit link functions, respectively. Note that "binomial" here actually means only binary, one Bernoulli trial per row in data.df.

If the data are grouped into correlated blocks - where in a standard regression context a mixed-effects model might be used - then a network can be fitted in which one or more nodes use a generalized linear mixed model (but limited to only a single random intercept). This is achieved by specifying the parameter group.var (cor.vars, as with method = "bayes", is not yet implemented with method = "mle"). The parameter group.var defines the group membership variable, which should be a factor indicating which observations belong to the same grouping. This corresponds to "1|group.var" in the formula notation of e.g. lme4.
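
A minimal sketch of a grouped (random-intercept) fit under method = "mle", using the ex3.dag.data example data (binary nodes b1 and b2 and the grouping factor group); the single arc is illustrative only.

dists.grp <- list(b1 = "binomial", b2 = "binomial")
myfit.grp <- fitAbn(dag = ~ b2|b1,
                    data.df = ex3.dag.data[, c("b1", "b2", "group")],
                    data.dists = dists.grp,
                    method = "mle",
                    group.var = "group")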

In the context of fitAbn, adjustment means that, irrespective of the adjacency matrix, the adjustment variables (adj.vars) are added as covariates to every node defined by cor.vars.

In the network structure definition, dag, each row represents a node in the network, and the columns in each row define the parents for that particular node; see the example below for the specific format. The dag can be provided using a formula statement (similar to GLM). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent3. The parent names have to exactly match those given in data.df. : is the separator between either children or parents, | separates children (left side) from parents (right side), + separates terms, and . replaces all the variables in data.df.

The information-theoretic network scores used in fitAbn with argument method="mle" are the maximum likelihood (mlik, called marginal likelihood in this context as it is computed node-wise), the Akaike Information Criterion (aic), the Bayesian Information Criterion (bic) and the Minimum Description Length (mdl). The classical definitions of these metrics are given in Kratzer and Furrer (2018).
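On the fitted object these scores can then be inspected; a hedged sketch (the element names $mlik, $aic, $bic and $mdl mirror the metrics listed above, but the exact layout of the returned list may differ between package versions):

myres_mle <- fitAbn(dag = mydag, data.df = mydat,
                    data.dists = mydists, method = "mle")
myres_mle$mlik   # node-wise (marginal) log-likelihood based score
myres_mle$aic    # Akaike Information Criterion
myres_mle$bic    # Bayesian Information Criterion
myres_mle$mdl    # Minimum Description Length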

The numerical routine is based on an iterative scheme to estimate the regression coefficients: an Iteratively Reweighted Least Squares (IRLS) algorithm programmed using Rcpp/RcppArmadillo. One hard-coded feature of fitAbn with argument method="mle" is the conditional use of a bias-reduced binomial regression when a classical Generalized Linear Model (GLM) fails to estimate the maximum likelihood of the given model accurately. Additionally, a QR decomposition is performed to check for rank deficiency. If the model is rank deficient and the bias-reduced GLM also fails to estimate it, then predictors are sequentially removed. This feature aims at better estimating network scores in the presence of data sparsity.
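For intuition only, a bare-bones R version of an IRLS update for a logistic (binomial) node is sketched below; this is a didactic illustration, not the Rcpp/RcppArmadillo routine used internally, and the function name irls_logistic is made up for this example:

irls_logistic <- function(X, y, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    mu  <- plogis(eta)                 # inverse logit link
    w   <- mu * (1 - mu)               # working weights
    z   <- eta + (y - mu) / w          # working response
    beta_new <- solve(crossprod(X, w * X), crossprod(X, w * z))
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}
## e.g. irls_logistic(cbind(1, x1, x2), y) should approximate
## coef(glm(y ~ x1 + x2, family = binomial))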

Special care should be taken when interpreting, or even displaying, p-values computed with fitAbn, because the full model has already been selected using goodness-of-fit metrics based on the (same) full dataset.

The control argument is a list with separate arguments for the Bayesian and MLE implementation. See fit.control for details.

Value

An object of class abnFit. A named list: one entry for each of the variables in data.df (excluding the grouping variable, if present), which contains an estimate of the log marginal likelihood for that individual node. An entry "mlik", which is the total log marginal likelihood for the full ABN model. A vector error.codes - non-zero if a numerical error or warning occurred - and a vector error.code.desc giving a text description of the error. A list modes, which contains all the mode estimates for each parameter at each node. A vector called hessian.accuracy, which is the estimated local error in the log marginal likelihood for each node. If compute.fixed=TRUE, a list entry called marginals which contains a named entry for every parameter in the ABN; each entry in this list is a two-column matrix where the first column is the value of the marginal parameter, say x, and the second column is the respective density value, pdf(x). Also, a list called marginal.quantiles is produced, giving the quantiles for each marginal posterior distribution.

list

Functions

Author(s)

Fraser Iain Lewis and Gilles Kratzer.

References

Kratzer, G., Lewis, F.I., Comin, A., Pittavino, M. and Furrer, R. (2019). "Additive Bayesian Network Modelling with the R Package abn". arXiv preprint arXiv:1911.09006.

Kratzer, G., and Furrer, R., 2018. Information-Theoretic Scoring Rules to Learn Additive Bayesian Network Applied to Epidemiology. Preprint; Arxiv: stat.ML/1808.01126.

Lewis, F. I., and McCormick, B. J. J. (2012). Revealing the complexity of health determinants in resource poor settings.American Journal Of Epidemiology. DOI:10.1093/aje/KWS183.

Further information about abn can be found at: r-bayesian-networks.org.

See Also

buildScoreCache

Other fitAbn: fit.control()

Other Bayes: buildScoreCache(), calc.node.inla.glm(), calc.node.inla.glmm(), getmarginals()

Examples

## Not run: 
## Built-in dataset with a subset of cols
mydat <- ex0.dag.data[, c("b1", "b2", "b3", "g1", "b4", "p2", "p4")]

## setup distribution list for each node
mydists <- list(b1 = "binomial",
                b2 = "binomial",
                b3 = "binomial",
                g1 = "gaussian",
                b4 = "binomial",
                p2 = "poisson",
                p4 = "poisson")

## Null model - all independent variables
mydag_empty <- matrix(0, nrow = 7, ncol = 7)
colnames(mydag_empty) <- rownames(mydag_empty) <- names(mydat)

## Now fit the model to calculate its goodness-of-fit
myres <- fitAbn(dag = mydag_empty,
                data.df = mydat,
                data.dists = mydists)

## Log-marginal likelihood goodness-of-fit for complete DAG
print(myres$mlik)

## fitAbn accepts also the formula statement
myres <- fitAbn(dag = ~ b1 | b2 + b2 | p4:g1 + g1 | p2 + b3 | g1 + b4 | b1 + p4 | g1,
                data.df = mydat,
                data.dists = mydists)
print(myres$mlik) # a much weaker fit than full independence DAG

# Plot the DAG via Rgraphviz
plot(myres)

## Or equivalently using the formula statement, with plotting
## Now repeat but include some dependencies first
mydag <- mydag_empty
mydag["b1", "b2"] <- 1 # b1<-b2 and so on
mydag["b2", "p4"] <- mydag["b2", "g1"] <- mydag["g1", "p2"] <- 1
mydag["b3", "g1"] <- mydag["b4", "b1"] <- mydag["p4", "g1"] <- 1
myres_alt <- fitAbn(dag = mydag,
                    data.df = mydat,
                    data.dists = mydists)
plot(myres_alt)

## -----------------------------------------------------------------------------
## This function contains an MLE implementation accessible through a method
## parameter use built-in simulated data set
## -----------------------------------------------------------------------------
myres_mle <- fitAbn(dag = ~ b1 | b2 + b2 | p4 + g1 + g1 | p2 + b3 | g1 + b4 | b1 + p4 | g1,
                    data.df = mydat,
                    data.dists = mydists,
                    method = "mle")

## Print the output for mle first then for Bayes:
print(myres_mle)
plot(myres_mle)
print(myres)
plot(myres)

## This is a basic plot of some posterior densities. The algorithm used for
## selecting density points is quite straightforward, but it might result
## in a sparse distribution. Therefore, we also recompute the density over
## an evenly spaced grid of 50 points between the two endpoints that had
## a minimum PDF at f = min.pdf.
## Setting max.mode.error = 0 forces the use of the internal C code.
myres_c <- fitAbn(dag = mydag,
                  data.df = mydat,
                  data.dists = mydists,
                  compute.fixed = TRUE,
                  control = list(max.mode.error = 0))

print(names(myres_c$marginals)) # gives all the different parameter names

## Repeat but use INLA for the numerics using max.mode.error = 100
## as using internal code is the default here rather than INLA
myres_inla <- fitAbn(dag = mydag,
                     data.df = mydat,
                     data.dists = mydists,
                     compute.fixed = TRUE,
                     control = list(max.mode.error = 100))

## Plot posterior densities
default_par <- par(no.readonly = TRUE) # save default par settings
par(mfrow = c(2, 2), mai = c(.7, .7, .2, .1))
plot(myres_c$marginals$b1[["b1 | (Intercept)"]], type = "l", xlab = "b1 | (Intercept)")
lines(myres_inla$marginals$b1[["b1 | (Intercept)"]], col = "blue")
plot(myres_c$marginals$b2[["b2 | p4"]], type = "l", xlab = "b2 | p4")
lines(myres_inla$marginals$b2[["b2 | p4"]], col = "blue")
plot(myres_c$marginals$g1[["g1 | precision"]], type = "l", xlab = "g1 | precision")
lines(myres_inla$marginals$g1[["g1 | precision"]], col = "blue")
plot(myres_c$marginals$b4[["b4 | b1"]], type = "l", xlab = "b4 | b1")
lines(myres_inla$marginals$b4[["b4 | b1"]], col = "blue")
par(default_par) # reset par settings

## An elementary mixed model example using built-in data specify DAG,
## only two variables using a subset of variables from ex3.dag.data
## both variables are assumed to need (separate) adjustment for the
## group variable, i.e., a binomial GLMM at each node
mydists <- list(b1 = "binomial",
                b2 = "binomial")

## Compute marginal likelihood - use internal code via max.mode.error=0
## as using INLA is the default here.
## Model where b1 <- b2
myres_c <- fitAbn(dag = ~b1 | b2,
                  data.df = ex3.dag.data[, c(1, 2, 14)],
                  data.dists = mydists,
                  group.var = "group",
                  cor.vars = c("b1", "b2"),
                  control = list(max.mode.error = 0))
print(myres_c) # show all the output

## compare mode for node b1 with glmer(), lme4::glmer is automatically attached.

## Now for marginals - INLA is strongly preferable for estimating marginals for
## nodes with random effects as it is far faster, but may not be reliable
## see https://r-bayesian-networks.org/

## INLA's estimates of the marginals, using high n.grid = 500
## as this makes the plots smoother - see below.
myres_inla <- fitAbn(dag = ~b1 | b2,
                     data.df = ex3.dag.data[, c(1, 2, 14)],
                     data.dists = mydists,
                     group.var = "group",
                     cor.vars = c("b1", "b2"),
                     compute.fixed = TRUE,
                     n.grid = 500,
                     control = list(max.mode.error = 100,
                                    max.hessian.error = 10E-02))

## this is NOT recommended - marginal density estimation using fitAbn in
## mixed models is really just for diagnostic purposes, better to use
## fitAbn.inla() here; but here goes... be patient
myres_c <- fitAbn(dag = ~b1 | b2,
                  data.df = ex3.dag.data[, c(1, 2, 14)],
                  data.dists = mydists,
                  group.var = "group",
                  cor.vars = c("b1", "b2"),
                  compute.fixed = TRUE,
                  control = list(max.mode.error = 0,
                                 max.hessian.error = 10E-02))

## compare marginals between internal and INLA.
default_par <- par(no.readonly = TRUE) # save default par settings
par(mfrow = c(2, 3))
# 5 parameters - two intercepts, one slope, two group level precisions
plot(myres_inla$marginals$b1[[1]], type = "l", col = "blue")
lines(myres_c$marginals$b1[[1]], col = "brown", lwd = 2)
plot(myres_inla$marginals$b1[[2]], type = "l", col = "blue")
lines(myres_c$marginals$b1[[2]], col = "brown", lwd = 2)
# the precision of group-level random effects
plot(myres_inla$marginals$b1[[3]], type = "l", col = "blue", xlim = c(0, 2))
lines(myres_c$marginals$b1[[3]], col = "brown", lwd = 2)
plot(myres_inla$marginals$b2[[1]], type = "l", col = "blue")
lines(myres_c$marginals$b2[[1]], col = "brown", lwd = 2)
plot(myres_inla$marginals$b2[[1]], type = "l", col = "blue")
lines(myres_c$marginals$b2[[1]], col = "brown", lwd = 2)
# the precision of group-level random effects
plot(myres_inla$marginals$b2[[2]], type = "l", col = "blue", xlim = c(0, 2))
lines(myres_c$marginals$b2[[2]], col = "brown", lwd = 2)
par(default_par) # reset par settings

### these are very similar although not exactly identical

## use internal code but only to compute a single parameter over a specified
## grid.
## This can be necessary if the simple auto grid finding functions does
## a poor job.
myres_c <- fitAbn(dag = ~b1 | b2,
                  data.df = ex3.dag.data[, c(1, 2, 14)],
                  data.dists = mydists,
                  group.var = "group",
                  cor.vars = c("b1", "b2"),
                  centre = FALSE,
                  compute.fixed = TRUE,
                  control = list(marginal.node = 1,
                                 marginal.param = 3, # precision term in node 1
                                 variate.vec = seq(0.05, 1.5, len = 25),
                                 max.hessian.error = 10E-02))

default_par <- par(no.readonly = TRUE) # save default par settings
par(mfrow = c(1, 2))
plot(myres_c$marginals$b1[[1]], type = "l", col = "blue") # still fairly sparse
# An easy way is to use spline to fill in the density without recomputing other
# points provided the original grid is not too sparse.
plot(spline(myres_c$marginals$b1[[1]], n = 100), type = "b", col = "brown")
par(default_par) # reset par settings

## End(Not run)

Regress each node on its parents.

Description

Regress each node on its parents.

Usage

forLoopContentFitBayes(child = NULL, dag = NULL, data.df = NULL, var.types = NULL,
                       grouped.vars = NULL, group.ids = NULL, control = NULL,
                       INLA.marginals = NULL, verbose = NULL, force.method = NULL,
                       data.dists = NULL, mymodes = NULL, error.code = NULL,
                       hessian.accuracy = NULL, mymargs = NULL)

Arguments

child

integer of node to be regressed

var.types

vector of numeric encoding of distribution types. Seeget.var.types(data.dists)

INLA.marginals

vector of logicals indicating which nodes are to be fitted using INLA

mymodes

Empty list of modes for each node

error.code

Empty element of error codes for each node

hessian.accuracy

Empty element of hessian accuracies for each node

mymargs

Empty list of marginals for each node

Value

list of mlik, modes, marginals, error codes, hessian accuracies and a logical if INLA was used for each node.


Formula to adjacency matrix

Description

Internal function that produces a square matrix of dimension length(name) with entries 0 or 1, depending on f. f has to start with ~. Terms are entries of name and are separated by +. term1 | term2 puts a 1 in the entry given by col(term1), row(term2). In term1 | term2:term3, : is used as a separator between multiple parents. . stands for all terms in name.

Usage

formula_abn(f, name)

Value

A square matrix


Toy Data Set for Examples in README

Description

1000 observations with 5 variables: 2 continuous, 2 binary and 1 categorical.

Usage

g2b2c_data

Format

A data frame with five columns. Binary and categorical variables are factors.

G1

gaussian

B1

binomial

B2

binomial

C

categorical

G2

gaussian


Toy Data Set for Examples in README

Description

10000 observations with 6 variables: 2 continuous, 1 binary, 1 count, 1 categorical and 1 grouping factor.

Usage

g2pbcgrp

Format

A data frame with six columns. Binary and categorical variables are factors.

G1

gaussian

P

poisson

B

binomial

C

categorical

G2

gaussian

group

categorical


Bugs code for Gaussian response

Description

Bugs model for a normally distributed response variable X \sim \mathcal{N}(\mu,\,\sigma^{2}).

Usage

gauss_bugs(nodename, nodesintercept, parentnames, parentcoefs, std)

gauss_bugsGroup(nodename, nodesintercept, parentnames, parentcoefs, sigma, sigma_alpha)

Arguments

nodename

character string of response variable name.

nodesintercept

overall mean of response. Parameter from fixed-effects intercept.

parentnames

single character string (for one parent) or vector of characters (for multiple parent nodes) with parent node (predictor variables) names.

parentcoefs

overall slope for each predictor (parent node) variable (fixed-effects).

std

standard deviation of the response variable, which will be converted to a precision (see Details).

sigma

within-group variance. Parameter from random-effects residual.

sigma_alpha

between-group variance. Parameter from random-effects intercept.

Details

The variance of the normal distribution is \frac{1}{\tau}.

Value

Bugs model returned as stdout.

Functions

See Also

makebugs, simulateAbn

Examples

gauss_bugs(nodename = "a",           parentnames = c("b", "c"),           nodesintercept = c(0.318077),           parentcoefs = list("b"=c(b=0.3059395),                              "c"=c(c=0.5555)),           std = c(0.05773503))

function to extract quantiles from INLA output

Description

Function to extract quantiles from the INLA output.

Usage

get.quantiles(mylist, quantiles, single)

get.ind.quantiles(outmat, inmat)

Arguments

mylist

list of matrices of two cols x, y

quantiles

vector with the desired quantiles

single

NULL or TRUE if only a single node and parameter

outmat

matrix whose first column holds the desired quantiles; the corresponding estimates are written into the second column.

inmat

is the actual x,f(x) matrix

Value

list

matrix

Functions


Create ordered vector with integers denoting the distribution

Description

gaussian = 1, binomial = 2, poisson = 3, multinomial = 4

Usage

get.var.types(data.dists = NULL)

Arguments

data.dists

list specifying each column's distribution type. Names correspond to column names and values must be one of "gaussian", "binomial", "poisson" or "multinomial".

Value

numeric encoding of the distributions, corresponding to their list element numbers in data.dists.
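A hedged illustration of this encoding, assuming the internal (non-exported) helper is reached via the ::: operator:

dists <- list(g1 = "gaussian", b1 = "binomial", p1 = "poisson", m1 = "multinomial")
abn:::get.var.types(dists)
## expected, following the encoding above: 1 2 3 4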


Extract Standard Deviations from all Gaussian Nodes

Description

Extract Standard Deviations from all Gaussian Nodes

Usage

getMSEfromModes(modes, dists)

Arguments

modes

list of modes.

dists

list of distributions.

Value

named numeric vector. Names correspond to the node names; values are the standard deviations.


function to extract marginals from INLA output

Description

function to extract marginals from INLA output

Usage

getMargsINLA(list.fixed, list.hyper)

Arguments

list.fixed

list of matrices of two cols x, y

list.hyper

list of hyperparameters

Value

vector


function to extract the mode from INLA output

Description

function to extract the mode from INLA output

Usage

getModeVector(list.fixed, list.hyper)

Arguments

list.fixed

list of matrices of two cols x, y

list.hyper

list of hyperparameters

Value

vector


Description

Helper function to determine link function from distribution

Usage

get_link_function(distribution)

Internal function called byfitAbn.bayes.

Description

Function for computing marginal posterior densities using C; it is called from fit.dag() and is only to be called internally.

Usage

getmarginals(res.list, data.df, dag.m, var.types, max.parents,
             mean, prec, loggam.shape, loggam.inv.scale,
             max.iters, epsabs, verbose, error.verbose, trace,
             grouped.vars, group.ids, epsabs.inner, max.iters.inner,
             finite.step.size, hessian.params, max.iters.hessian,
             min.pdf, marginal.node, marginal.param, variate.vec,
             n.grid, INLA.marginals, iter.max, max.hessian.error,
             factor.brent, maxiters.hessian.brent, num.intervals.brent)

Arguments

res.list

rest of arguments as for call to C fitabn

data.df

a data frame containing the data used for learning the network; binary variables must be declared as factors, and no missing values are allowed in any variable.

dag.m

adjacency matrix

var.types

distributions in terms of a numeric code

max.parents

max number of parents over all nodes in dag (different from othermax.parents definitions).

mean

the prior mean for all the Gaussian additive terms for each node. INLA argumentcontrol.fixed=list(mean.intercept=...) andcontrol.fixed=list(mean=...).

prec

the prior precision (\tau = \frac{1}{\sigma^2}) for all the Gaussian additive term for each node. INLA argumentcontrol.fixed=list(prec.intercept=...) andcontrol.fixed=list(prec=...).

loggam.shape

the shape parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argumentcontrol.family=list(hyper = list(prec = list(prior="loggamma",param=c(loggam.shape, loggam.inv.scale)))).

loggam.inv.scale

the inverse scale parameter in the Gamma distribution prior for the precision in a Gaussian node. INLA argumentcontrol.family=list(hyper = list(prec = list(prior="loggamma",param=c(loggam.shape, loggam.inv.scale)))).

max.iters

total number of iterations allowed when estimating the modes in Laplace approximation. passed to.Call("fit_single_node", ...).

epsabs

absolute error when estimating the modes in Laplace approximation for models with no random effects. Passed to.Call("fit_single_node", ...).

verbose

ifTRUE then provides some additional output, in particular the code used to call INLA, if applicable.

error.verbose

logical, additional output in the case of errors occurring in the optimization. Passed to.Call("fit_single_node", ...).

trace

Non-negative integer. If positive, tracing information on the progress of the "L-BFGS-B" optimization is produced. Higher values may produce more tracing information. (There are six levels of tracing. To understand exactly what these do see the source code.). Passed to.Call("fit_single_node", ...).

grouped.vars

result returned fromcheck.valid.groups. Column indexes of all variables which are affected from grouping effect.

group.ids

result returned fromcheck.valid.groups. Vector of group allocation for each observation (row) in 'data.df'.

epsabs.inner

absolute error in the maximization step in the (nested) Laplace approximation for each random effect term. Passed to.Call("fit_single_node", ...).

max.iters.inner

total number of iterations in the maximization step in the nested Laplace approximation. Passed to.Call("fit_single_node", ...).

finite.step.size

suggested step length used in finite difference estimation of the derivatives for the (outer) Laplace approximation when estimating modes. Passed to.Call("fit_single_node", ...).

hessian.params

a numeric vector giving parameters for the adaptive algorithm, which determines the optimal stepsize in the finite-difference estimation of the hessian. First entry is the initial guess, second entry absolute error. Passed to.Call("fit_single_node", ...).

max.iters.hessian

integer, maximum number of iterations to use when determining an optimal finite difference approximation (Nelder-Mead). Passed to.Call("fit_single_node", ...).

min.pdf

the value of the posterior density function below which the estimation is stopped; only used when computing marginals, see Details.

marginal.node

used in conjunction withmarginal.param to allow bespoke estimate of a marginal density over a specific grid. value from 1 to the number of nodes.

marginal.param

used in conjunction withmarginal.node. value of 1 is for intercept, see modes entry in results for the appropriate number.

variate.vec

a vector containing the places to evaluate the posterior marginal density, must be supplied ifmarginal.node is not null.

n.grid

recompute density on an equally spaced grid withn.grid points.

INLA.marginals

logical vector: TRUE if INLA was used for a node, FALSE otherwise.

iter.max

same asmax.iters infit.control. Total number of iterations allowed when estimating the modes in Laplace approximation. Passed to .Call("fit_single_node", ...).

max.hessian.error

if the estimated log marginal likelihood when using an adaptive 5pt finite-difference rule for the Hessian differs by more thanmax.hessian.error from when using an adaptive 3pt rule then continue to minimize the local error by switching to the Brent-Dekker root bracketing method. Passed to.Call("fit_single_node", ...).

factor.brent

if using the Brent-Dekker root bracketing method, the outermost interval end points are defined from the best estimate of h (step size) from the Nelder-Mead search as (h/factor.brent, h*factor.brent). Passed to .Call("fit_single_node", ...).

maxiters.hessian.brent

maximum number of iterations allowed in the Brent-Dekker method. Passed to.Call("fit_single_node", ...).

num.intervals.brent

the number of initial different bracket segments to try in the Brent-Dekker method. Passed to.Call("fit_single_node", ...).

Value

A named list with "modes", "error.code", "hessian.accuracy", "error.code.desc", "mliknode", "mlik", "mse", "coef", "used.INLA", "marginals".

See Also

Other Bayes: buildScoreCache(), calc.node.inla.glm(), calc.node.inla.glmm(), fitAbn()


Compute standard information for a DAG.

Description

This function returns standard metrics for DAG description: a list that contains the number of nodes, the number of arcs, the average Markov blanket size, the neighborhood average set size, the parent average set size and the children average set size.

Usage

infoDag(object, node.names = NULL)

Arguments

object

an object of class abnLearned or abnFit. Alternatively, a matrix or a formula statement defining the network structure, a directed acyclic graph (DAG). Note that row names must be set or given in node.names.

node.names

a vector of names if the DAG is given via formula, see details.

Details

This function returns a named list with the following entries: the number of nodes, the number of arcs, the average Markov blanket size, the neighborhood average set size, the parent average set size, and the children's average set size.

The dag can be provided using a formula statement (similar to glm). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent3. The parent names have to exactly match those given in node.names. : is the separator between either children or parents, | separates children (left side) from parents (right side), + separates terms, and . replaces all the variables in node.names.

Value

A named list that contains the following entries: the number of nodes, the number of arcs, the average Markov blanket size, the neighborhood average set size, the parent average set size and the children average set size.

References

West, D. B. (2001). Introduction to graph theory. Vol. 2. Upper Saddle River: Prentice Hall.

Examples

## Creating a dag:
dag <- matrix(c(0,0,0,0, 1,0,0,0, 1,1,0,1, 0,1,0,0), nrow = 4, ncol = 4)
dist <- list(a = "gaussian", b = "gaussian", c = "gaussian", d = "gaussian")
colnames(dag) <- rownames(dag) <- names(dist)

infoDag(dag)
plot(createAbnDag(dag = dag, data.dists = dist))

Fast Iteratively Reweighted Least Squares algorithm for Binomials

Description

IRLS to estimate network score of Binomial nodes.

Usage

irls_binomial_cpp_fast(A, b, maxit, tol)

Value

a list


Fast bias-reduced (br) Iteratively Reweighted Least Squares algorithm for Binomials

Description

IRLS to estimate network score of Binomial nodes.

Usage

irls_binomial_cpp_fast_br(A, b, maxit, tol)

Value

a list


Fast Iteratively Reweighted Least Squares algorithm for Gaussians

Description

IRLS to estimate network score of Gaussian nodes.

Usage

irls_gaussian_cpp_fast(A, b, maxit, tol)

Fast Iteratively Reweighted Least Squares algorithm for Poissons

Description

IRLS to estimate network score of Poisson nodes.

Usage

irls_poisson_cpp_fast(A, b, maxit, tol)

Value

a list


Returns the strengths of the edge connections in a Bayesian Network learned from observational data

Description

A flexible implementation of multiple proxies for strength measures, useful for visualizing the edge connections in a Bayesian network learned from observational data.

Usage

linkStrength(dag,
             data.df = NULL,
             data.dists = NULL,
             method = c("mi.raw", "mi.raw.pc", "mi.corr",
                        "ls", "ls.pc", "stat.dist"),
             discretization.method = "doane")

Arguments

dag

a matrix or a formula statement (see details for format) defining the network structure, a directed acyclic graph (DAG). Note that row names must be set or given in data.dists if the DAG is given via a formula statement.

data.df

a data frame containing the data used for learning each node; binary variables must be declared as factors.

data.dists

a named list giving the distribution for each node in thenetwork, see ‘Details’.

method

the method to be used. See ‘Details’.

discretization.method

a character vector giving the discretization method to use. See discretization.

Details

This function returns multiple proxies for estimating the connection strength of the edges of a possibly discretized Bayesian network's data set. The returned connection strength measures are: the Raw Mutual Information (mi.raw), the Percentage Mutual Information (mi.raw.pc), the Raw Mutual Information computed via correlation (mi.corr), the link strength (ls), the percentage link strength (ls.pc) and the statistical distance (stat.dist).

The general concept of entropy is defined for probability distributions. Here the probabilities are estimated from data using frequency tables and plugged into the definition of the entropy to obtain the so-called empirical entropy. A well-known problem of the empirical entropy is that the estimates are biased due to sampling noise; it is also known that this bias decreases as the sample size increases. The mutual information estimate is computed from the observed frequencies through a plug-in estimator based on entropy. In the following, consider an arc going from node X to node Y, and denote the remaining set of parents of Y by Z.

The mutual information is defined as I(X, Y) = H(X) + H(Y) - H(X, Y),where H() is the entropy.

The Percentage Mutual information is defined as PI(X,Y) = I(X,Y)/H(Y|Z).

The Mutual Information computed via correlation is defined as MI(X,Y) = -0.5 log(1 - cor(X,Y)^2).

The link strength is defined as LS(X->Y) = H(Y|Z)-H(Y|X,Z).

The percentage link strength is defined as PLS(X->Y) = LS(X->Y) / H(Y|Z).

The statistical distance is defined as SD(X,Y) = 1- MI(X,Y) / max(H(X),H(Y)).
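For intuition, a small self-contained sketch of the raw mutual information I(X, Y) = H(X) + H(Y) - H(X, Y), computed from a discretized pair of continuous variables via empirical (plug-in) entropies; this mirrors the idea behind the mi.raw measure but is not the package's internal code, and the helper empirical_entropy is defined only for this illustration:

empirical_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)   # observed frequencies
  -sum(p * log(p))
}
set.seed(1)
x <- rnorm(500); y <- 0.8 * x + rnorm(500)
xb <- cut(x, breaks = 8); yb <- cut(y, breaks = 8)   # crude discretization
H_x  <- empirical_entropy(table(xb))
H_y  <- empirical_entropy(table(yb))
H_xy <- empirical_entropy(table(xb, yb))
mi_raw <- H_x + H_y - H_xy
mi_raw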

Value

The function returns a named matrix with the requested metric.

References

Boerlage, B. (1992). Link strength in Bayesian networks. Diss. University of British Columbia.

Ebert-Uphoff, I. (2009). Tutorial on how to measure link strengths in discrete Bayesian networks.

Examples

# Gaussian
N <- 1000
mydists <- list(a = "gaussian",
                b = "gaussian",
                c = "gaussian")
a <- rnorm(n = N, mean = 0, sd = 1)
b <- 1 + 2*rnorm(n = N, mean = 5, sd = 1)
c <- 2 + 1*a + 2*b + rnorm(n = N, mean = 2, sd = 1)
mydf <- data.frame("a" = a,
                   "b" = b,
                   "c" = c)

mycache.mle <- buildScoreCache(data.df = mydf,
                               data.dists = mydists,
                               method = "mle",
                               max.parents = 2)
mydag.mp <- mostProbable(score.cache = mycache.mle, verbose = FALSE)

linkstr <- linkStrength(dag = mydag.mp$dag,
                        data.df = mydf,
                        data.dists = mydists,
                        method = "ls",
                        discretization.method = "sturges")

Print logLik of objects of classabnFit

Description

Print logLik of objects of classabnFit

Usage

## S3 method for class 'abnFit'
logLik(object, digits = 3L, verbose = TRUE, ...)

Arguments

object

Object of classabnFit

digits

number of digits of the results.

verbose

print additional output.

...

additional parameters. Not used at the moment.

Value

prints the logLik of the fitted model.


Logit of proportions

Description

See also the C implementation ?abn::logit_cpp().

Usage

logit(x)

Arguments

x

numeric with values in [0,1].

Value

numeric vector of the same length as x.
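A brief usage sketch, assuming the usual definition logit(x) = log(x / (1 - x)):

logit(c(0.1, 0.5, 0.9))
## should agree with qlogis(c(0.1, 0.5, 0.9)), i.e. approximately -2.197, 0, 2.197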


logit functions

Description

Transform x either via the logit or the expit.

Usage

logit_cpp(x)

Arguments

x

a numeric vector

Value

a numeric vector


Make BUGS model from fitted DAG

Description

Make BUGS model from fitted DAG

Usage

makebugs(dag, data.dists, coefs, stderrors)

Arguments

dag

named adjacency matrix representing the DAG. Names correspond to node names.

data.dists

list of node distributions.

coefs

a list named by the node names containing for each element a matrix with the nodes' coefficients.

stderrors

a list named by the node names containing for each element a matrix with the nodes' standard errors

Value

Bugs model returned as stdout.

See Also

simulateAbn, gauss_bugs, bern_bugs, categorical_bugs, pois_bugs

Examples

## Prepare data and arguments
mydists <- list(a = "gaussian",
                b = "multinomial",
                c = "binomial",
                d = "poisson")
mydag <- matrix(0, 4, 4, byrow = TRUE,
                dimnames = list(c("a", "b", "c", "d"),
                                c("a", "b", "c", "d")))
mydag[2,1] <- mydag[3,2] <- mydag[4,3] <- 1
# plotAbn(mydag, data.dists = mydists)

mycoefs <- list("a" = matrix(-6.883383e-17, byrow = TRUE,
                             dimnames = list(NULL,
                                             "a|intercept")),
                "b" = matrix(c(2.18865, 3.133928, 3.138531, 1.686432, 3.134161, 5.052104),
                             nrow = 1, byrow = TRUE,
                             dimnames = list(c(NULL),
                                             c("b|intercept.2", "b|intercept.3", "b|intercept.4",
                                               "a.2", "a.3", "a.4"))),
                "c" = matrix(c(1.11, 2.22, 3.33, 4.44, 5.55),
                             nrow = 1, byrow = TRUE,
                             dimnames = list(c(NULL),
                                             c("c|intercept", "b1", "b2", "b3", "b4"))),
                "d" = matrix(c(3.33, 4.44),
                             nrow = 1, byrow = TRUE,
                             dimnames = list(c(NULL),
                                             c("d|intercept", "c"))))
mymse <- c("a" = 0, "b" = 1, "c" = 2, "d" = 3)

## Make BUGS model
makebugs(dag = mydag, data.dists = mydists, coefs = mycoefs, stderrors = mymse)

Make BUGS model from fitted DAG with grouping

Description

Make BUGS model from fitted DAG with grouping

Usage

makebugsGroup(  dag,  data.dists,  stderrors,  group.var,  mu,  betas,  sigma,  sigma_alpha)

Arguments

dag

named adjacency matrix representing the DAG. Names correspond to node names.

data.dists

list of node distributions.

stderrors

a list named by the node names containing for each element a matrix with the nodes' standard errors

group.var

only applicable for mixed models and gives the column name indata.df of the grouping variable (which must be a factor denoting group membership).

mu

Standard deviation of fixed effects.

betas

Coefficients/slopes of fixed effects .

sigma

variance of random effects.

sigma_alpha

variance-covariance matrix corresponding to covariances output frommblogit.

Value

Bugs model returned as stdout.

See Also

simulateAbn, gauss_bugsGroup, bern_bugsGroup, categorical_bugsGroup, pois_bugsGroup


Compute the Markov blanket

Description

This function computes the Markov blanket of a set of nodes given a DAG (Directed Acyclic Graph).

Usage

mb(dag, node, data.dists = NULL, data.df = NULL)

Arguments

dag

a matrix or a formula statement (see details for format) defining the network structure, a directed acyclic graph (DAG).

node

a character vector of the nodes for which the Markov Blanket should be returned.

data.dists

a named list giving the distribution for each node in the network, see details.

data.df

a data frame containing the data for the nodes in the network. Only needed ifdag is a formula statement.

Details

This function returns the Markov Blanket of a set of nodes given a DAG.

The dag can be provided as a matrix where the rows and columns are the node names. The matrix must be binary, where 1 indicates an edge from the column node (parent) to the row node (child). The diagonal of the matrix must be 0 and the matrix must be acyclic. The node names must be the same as the names of the distributions in data.dists.

Alternatively, the dag can be provided using a formula statement (similar to glm). This requires the data.dists and data.df arguments to be provided. A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 are children of the same parent (parent3). The parent names have to exactly match those given in name. : is the separator between either children or parents, | separates children (left side) from parents (right side), + separates terms, and . replaces all the variables in name.

Value

character vector of node names from the Markov blanket.

Examples

## Defining distribution and dag
dist <- list(a = "gaussian", b = "gaussian", c = "gaussian", d = "gaussian",
             e = "binomial", f = "binomial")
dag <- matrix(c(0,1,1,0,1,0,
                0,0,1,1,0,1,
                0,0,0,0,0,0,
                0,0,0,0,0,0,
                0,0,0,0,0,1,
                0,0,0,0,0,0), nrow = 6L, ncol = 6L, byrow = TRUE)
colnames(dag) <- rownames(dag) <- names(dist)

mb(dag, node = "b", data.dists = dist)
mb(dag, node = c("b", "e"), data.dists = dist)

Empirical Estimation of the Mutual Information from a Table of Counts

Description

This function empirically estimates the Mutual Information from a table of counts using the observed frequencies.

Usage

miData(freqs.table, method = c("mi.raw", "mi.raw.pc"))

Arguments

freqs.table

a table of counts.

method

a character determining if the Mutual Information should be normalized.

Details

The mutual information estimation is computed from the observed frequencies through a plugin estimator based on entropy.

The plug-in estimator is I(X, Y) = H(X) + H(Y) - H(X, Y), where H() is the entropy computed with entropyData.

Value

Mutual information estimate.

integer

References

Cover, Thomas M, and Joy A Thomas. (2012). "Elements of Information Theory". John Wiley & Sons.

See Also

discretization

Examples

## Generate random variable
Y <- rnorm(n = 100, mean = 0, sd = 2)
X <- rnorm(n = 100, mean = 5, sd = 2)
dist <- list(Y = "gaussian", X = "gaussian")

miData(discretization(data.df = cbind(X, Y), data.dists = dist,
                      discretization.method = "fd", nb.states = FALSE),
       method = "mi.raw")

Mutual Information

Description

Calculates the mutual information.

Usage

mi_cpp(joint_dist)

Value

a double


Convert modes to fitAbn.mle$coefs structure

Description

Convert modes to fitAbn.mle$coefs structure

Usage

modes2coefs(modes)

Arguments

modes

list of modes.

Value

list of matrix arrays.


Find most probable DAG structure

Description

Find most probable DAG structure using exact order based approach due to Koivisto and Sood, 2004.

Usage

mostProbable(score.cache, score="bic", prior.choice=1, verbose=TRUE, ...)

Arguments

score.cache

object of class abnCache, typically the output of buildScoreCache().

score

which score should be used to score the network. Possible choices are aic, bic, mdl, mlik.

prior.choice

an integer, 1 or 2, where 1 is a uniform structural prior and 2 uses a weighted prior, see details

verbose

if TRUE then provides some additional output.

...

further arguments passed to or from other methods.

Details

The procedure runs the exact order-based structure discovery approach of Koivisto and Sood (2004) to find the most probable posterior network (DAG). The local score is the node cache, as created using buildScoreCache (or provided manually, as long as the same format is used). Note that the scope of this search is given by the options used to create the score cache, for example, by restricting the number of parents or via the ban and retain constraints given there.

This routine can take a long time to complete and is highly sensitive to the number of nodes in the network. It is recommended to use this on a reduced data set to get an idea as to the computational practicality of this approach. In particular, memory usage can quickly increase to beyond what may be available. For additive models, problems comprising up to 20 nodes are feasible on most machines. Memory requirements can increase considerably after this, but then so does the run time, making this less practical. It is recommended that some form of over-modeling adjustment is performed on this resulting DAG (unless dealing with vast numbers of observations), for example, using parametric bootstrapping, which is straightforward to implement in MCMC engines such as JAGS or WinBUGS. See the case studies at https://r-bayesian-networks.org/ or the files provided in the package directories inst/bootstrapping_example and inst/old_vignette for details.

The parameter prior.choice determines the prior used within each node for a given choice of parent combination. In Koivisto and Sood (2004), p.554, a form of prior is used which assumes that the prior probability for parent combinations comprising the same number of parents is equal. Specifically, the prior probability for a parent set G with cardinality |G| is proportional to 1/[n-1 choose |G|], where there are n total nodes. Note that this favors parent combinations with either very low or very high cardinality, which may not be appropriate. This prior is used when prior.choice=2. When prior.choice=1, an uninformative prior is used where parent combinations of all cardinalities are equally likely. The latter is equivalent to the structural prior used in the heuristic searches, e.g., searchHillClimber or searchHeuristic.
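As a small numerical illustration of the prior used when prior.choice=2, the (unnormalized) prior weight for a parent set of cardinality |G| in a network of n nodes follows the formula quoted above; prior_weight below is a throwaway helper written only for this sketch:

prior_weight <- function(n, card) 1 / choose(n - 1, card)
n <- 8
sapply(0:3, function(g) prior_weight(n, g))
## weights: 1, 1/7, 1/21, 1/35 - largest for very small (or very large) parent sets,
## whereas prior.choice=1 gives every cardinality the same weight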

Note that the network score (log marginal likelihood) of the most probable DAG is not returned, as it can easily be computed using fitAbn, see examples below.

Value

An object of classabnMostprobable, which is a list containing: a matrix giving the DAG definition of the most probable posterior structure, the cache of pre-computed scores and the score used for selection.

References

Koivisto, M. and Sood, K. (2004). Exact Bayesian Structure Discovery in Bayesian Networks. Journal of Machine Learning Research, 5, 549-573.

Examples

## Not run: 
################################
## Example 1
################################
## This data comes with 'abn' see ?ex1.dag.data
mydat <- ex1.dag.data[1:5000, c(1:7, 10)]

## Setup distribution list for each node:
mydists <- list(b1 = "binomial",
                p1 = "poisson",
                g1 = "gaussian",
                b2 = "binomial",
                p2 = "poisson",
                b3 = "binomial",
                g2 = "gaussian",
                g3 = "gaussian")

## Parent limits, for speed purposes quite specific here:
max_par <- list("b1" = 0,
                "p1" = 0,
                "g1" = 1,
                "b2" = 1,
                "p2" = 2,
                "b3" = 3,
                "g2" = 3,
                "g3" = 2)

## Now build cache (no constraints in ban nor retain)
mycache <- buildScoreCache(data.df = mydat,
                           data.dists = mydists,
                           max.parents = max_par)

## Find the globally best DAG:
mp_dag <- mostProbable(score.cache = mycache)
myres <- fitAbn(object = mp_dag,
                create.graph = TRUE)
plot(myres) # plot the best model

## Fit the known true DAG (up to variables 'b4' and 'b5'):
true_dag <- matrix(data = 0, ncol = 8, nrow = 8)
colnames(true_dag) <- rownames(true_dag) <- names(mydists)
true_dag["p2", c("b1", "p1")] <- 1
true_dag["b3", c("b1", "g1", "b2")] <- 1
true_dag["g2", c("p1", "g1", "b2")] <- 1
true_dag["g3", c("g1", "b2")] <- 1

fitAbn(dag = true_dag,
       data.df = mydat,
       data.dists = mydists)$mlik

#################################################################
## Example 2 - models with random effects
#################################################################
## This data comes with abn see ?ex3.dag.data
mydat <- ex3.dag.data[, c(1:4, 14)]
mydists <- list(b1 = "binomial",
                b2 = "binomial",
                b3 = "binomial",
                b4 = "binomial")

## This takes a few seconds and requires INLA:
mycache_mixed <- buildScoreCache(data.df = mydat,
                                 data.dists = mydists,
                                 group.var = "group",
                                 max.parents = 2)

## Find the most probable DAG:
mp_dag <- mostProbable(score.cache = mycache_mixed)

## and get goodness of fit:
fitAbn(object = mp_dag,
       group.var = "group")$mlik

## End(Not run)

Print number of observations of objects of classabnFit

Description

Print number of observations of objects of classabnFit

Usage

## S3 method for class 'abnFit'
nobs(object, ...)

Arguments

object

Object of classabnFit

...

additional parameters. Not used at the moment.

Value

prints the number of observations of the fitted model.


Probability to odds

Description

Probability to odds

Usage

odds(x)

Arguments

x

numeric vector of probabilities with values in [0,1].

Value

numeric vector of the same length as x.
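A brief sketch, assuming the usual definition odds(x) = x / (1 - x):

odds(c(0.2, 0.5, 0.8))
## expected: 0.25, 1, 4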


Odds Ratio from a matrix

Description

Compute the odds ratio from a contingency table or a matrix.

Usage

or(x)

Arguments

x

a 2x2 table or matrix.

Value

A real value.
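A brief sketch of the usual cross-product calculation on a 2x2 table, which or() is expected to reproduce; the counts are illustrative only:

tab <- matrix(c(20, 10,
                 5, 15), nrow = 2, byrow = TRUE)
or(tab)
## cross-product ratio: (20 * 15) / (10 * 5) = 6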


Dataset related to diseases present in ‘finishing pigs’, animals about to enter the human food chain at an abattoir.

Description

The data considered here comprise a randomly chosen batch of 50 pigs from each of 500 randomly chosen pig producers in the UK. The dataset consists of 25000 observations, 10 binary variables, and a grouping variable. These are 'finishing pigs', animals about to enter the human food chain at an abattoir. A further description of the data set is given in the vignette.

Usage

pigs.vienna

Format

A data frame with a mixture of 10 discrete variables, each of which is set as a factor, and a grouping variable.

PC

Binary.

PT

Binary.

MS

Binary.

HS

Binary.

TAIL

Binary.

Abscess

Binary.

Pyaemia

Binary.

EPcat

Binary.

PDcat

Binary.

plbinary

Binary.

batch

Group variable, corresponding to the 500 different pig producers

Details

This dataset was used in an older version of the vignette. The files provided in the package directories inst/bootstrapping_example and inst/old_vignette give a detailed analysis of the dataset and provide more details for a bootstrapping example thereof.

References

Hartnack, S., et al. (2016) "Attitudes of Austrian veterinarians towards euthanasia in small animal practice: impacts of age and gender on views on euthanasia." BMC Veterinary Research 12.1: 26.


Plots DAG from an object of classabnDag

Description

Plots DAG from an object of classabnDag

Usage

## S3 method for class 'abnDag'
plot(x, ...)

Arguments

x

Object of classabnDag

...

additional parameters. Not used at the moment.

Value

Rgraphviz::plot

Examples

mydag <- createAbnDag(dag = ~ a + b | a,
                      data.df = data.frame("a" = 1, "b" = 1),
                      data.dists = list(a = "binomial", b = "gaussian"))
plot(mydag)

Plot objects of classabnFit

Description

Plot objects of classabnFit

Usage

## S3 method for class 'abnFit'
plot(x, ...)

Arguments

x

Object of classabnFit

...

additional parameters. Not used at the moment.

Value

a plot of the fitted model.


Plot objects of classabnHeuristic

Description

Plot objects of classabnHeuristic

Usage

## S3 method for class 'abnHeuristic'
plot(x, ...)

Arguments

x

Object of classabnHeuristic

...

additional parameters. Not used at the moment.

Value

plot of the scores of the heuristic search.


Plot objects of classabnHillClimber

Description

Plot objects of classabnHillClimber

Usage

## S3 method for class 'abnHillClimber'
plot(x, ...)

Arguments

x

Object of classabnHillClimber

...

additional parameters. Not used at the moment.

Value

plot of the consensus DAG.


Plot objects of classabnMostprobable

Description

Plot objects of classabnMostprobable

Usage

## S3 method for class 'abnMostprobable'
plot(x, ...)

Arguments

x

Object of classabnMostprobable

...

additional parameters. Not used at the moment.

Value

plot of the mostprobable consensus DAG.


Plot an ABN graphic

Description

Plot an ABN DAG, specified by a formula statement or a matrix, using Rgraphviz through the graphAM class.

Usage

plotAbn(dag, data.dists=NULL, markov.blanket.node=NULL, fitted.values=NULL,
        digits=2, edge.strength=NULL, edge.strength.lwd=5, edge.direction="pc",
        edge.color="black", edge.linetype="solid", edge.arrowsize=0.6,
        edge.fontsize=node.fontsize, node.fontsize=12,
        node.fillcolor=c("lightblue", "brown3", "chartreuse3"),
        node.fillcolor.list=NULL,
        node.shape=c("circle", "box", "ellipse", "diamond"),
        plot=TRUE,
        data.df=NULL, ...)

Arguments

dag

a matrix or a formula statement (see details for format) defining the network structure, a Directed Acyclic Graph (DAG). Note that row names must be set or given in data.dists.

data.dists

a named list giving the distribution for each node in the network, see details.

markov.blanket.node

name of the node(s) for which the Markov blanket should be displayed.

fitted.values

modes or coefficients output from fitAbn.

digits

number of digits to display thefitted.values.

edge.strength

a named matrix containing evaluations of edge strength which will change the arc widths (this could be mutual information, p-values, the number of bootstrap samples retrieved, or the outcome of linkStrength).

edge.strength.lwd

maximum line width foredge.strength.

edge.direction

character giving the direction in which arcs should be plotted: pc (parent to child), cp (child to parent) or undirected.

edge.color

the colour of the edge.

edge.linetype

the linetype of the edge. Defaults to "solid". Valid values are the same as for R's base graphics parameter lty.

edge.arrowsize

the thickness of the arrows. Not relevant ifarc.strength is provided.

edge.fontsize

the font size of the arcs fitted values.

node.fontsize

the font size of the nodes names.

node.fillcolor

the colour of the node. The second and third elements are used for the Markov blanket and the node of the Markov blanket, respectively.

node.fillcolor.list

the list of nodes that should be coloured.

node.shape

the shape of the nodes according to the Gaussian, binomial, Poisson and multinomial distributions.

plot

logical variable, if set toTRUE then the graph is plotted.

data.df

NULL or a data frame containing the data for the nodes in the network. Only needed if dag is a formula statement. If dag is an object of class abnFit, then data.df is taken from there.

...

arguments passed to the plotting function.

Details

By default, binomial nodes are squares, multinomial nodes are empty, Gaussian nodes are circles and Poisson nodes are ellipses.

The dag can be provided using a formula statement (similar to glm). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3.

The construction is based on the graph package. Properties of the graph can be changed after the construction, see ‘Examples’.

Value

A rendered graph, if plot=TRUE. The graphAM object is returned invisibly.

See Also

graphAM-class, edgeRenderInfo

Examples

# Define distribution list
dist <- list(a = "gaussian",
             b = "gaussian",
             c = "gaussian",
             d = "gaussian",
             e = "binomial",
             f = "binomial")

# Define a matrix formulation
edge_strength <- matrix(c(0, 0.5, 0.5, 0.7, 0.1, 0,
                          0, 0, 0.3, 0.1, 0, 0.8,
                          0, 0, 0, 0.35, 0.66, 0,
                          0, 0, 0, 0, 0.9, 0,
                          0, 0, 0, 0, 0, 0.8,
                          0, 0, 0, 0, 0, 0),
                        nrow = 6L,
                        ncol = 6L,
                        byrow = TRUE)

## Naming of the matrix
colnames(edge_strength) <- rownames(edge_strength) <- names(dist)

## Random Data
df <- data.frame(a = rnorm(100),
                 b = rnorm(100),
                 c = rnorm(100),
                 d = rnorm(100),
                 e = rbinom(100, 1, 0.5),
                 f = rbinom(100, 1, 0.5))

## Plot from a matrix
plotAbn(dag = edge_strength,
        data.dists = dist)

## Edge strength
plotAbn(dag = ~ a | b:c:d:e + b | c:d:f + c | d:e + d | e + e | f,
        data.dists = dist,
        edge.strength = edge_strength,
        data.df = df)

## Plot from a formula for a different DAG!
plotAbn(dag = ~ a | b:c:e + b | c:d:f + e | f,
        data.dists = dist,
        data.df = df)

## Markov blanket
plotAbn(dag = ~ a | b:c:e + b | c:d:f + e | f,
        data.dists = dist,
        markov.blanket.node = "e",
        data.df = df)

## Change col for all edges
tmp <- plotAbn(dag = ~ a | b:c:e + b | c:d:f + e | f,
               data.dists = dist,
               plot = FALSE,
               data.df = df)
graph::edgeRenderInfo(tmp) <- list(col = "blue")
Rgraphviz::renderGraph(tmp)

## Change lty for individual ones. Named vector is necessary
tmp <- plotAbn(dag = ~ a | b:c:e + b | c:d:f + e | f,
               data.dists = dist,
               plot = FALSE,
               data.df = df)
edgelty <- rep(c("solid", "dotted"), c(6, 1))
names(edgelty) <- names(graph::edgeRenderInfo(tmp, "col"))
graph::edgeRenderInfo(tmp) <- list(lty = edgelty)
Rgraphviz::renderGraph(tmp)

Bugs code for Poisson response

Description

Bugs model for a count response variable X \sim \mathcal{Pois}(\lambda).

Usage

pois_bugs(nodename, nodesintercept, parentnames, parentcoefs)

pois_bugsGroup(nodename, nodesintercept, parentnames, parentcoefs, sigma_alpha)

Arguments

nodename

character string of response variable name.

nodesintercept

overall mean of response. Parameter from fixed-effects intercept.

parentnames

single character string (for one parent) or vector of characters (for multiple parent nodes) with parent node (predictor variables) names.

parentcoefs

overall slope for each predictor (parent node) variable (fixed-effects).

sigma_alpha

between-group variance. Parameter from random-effects intercept.

Value

Bugs model returned as stdout.

Functions

See Also

makebugs, simulateAbn

Examples

pois_bugs(nodename = "a",          parentnames = c("b", "c"),          nodesintercept = c(0.318077),          parentcoefs = list("b"=c(b=0.3059395),                             "c"=c(c=0.5555)))

Print objects of classabnCache

Description

Print objects of classabnCache

Usage

## S3 method for class 'abnCache'
print(x, digits = 3, ...)

Arguments

x

Object of classabnCache

digits

number of digits of the results.

...

additional parameters. Not used at the moment.

Value

summary statement of the class ofabnCache.

Examples

## Subset of the built-in dataset, see ?ex0.dag.data
mydat <- ex0.dag.data[, c("b1", "b2", "g1", "g2", "b3", "g3")] ## take a subset of cols

## setup distribution list for each node
mydists <- list(b1 = "binomial", b2 = "binomial", g1 = "gaussian",
                g2 = "gaussian", b3 = "binomial", g3 = "gaussian")

# Structural constraints
# ban arc from b2 to b1
# always retain arc from g2 to g1

## parent limits
max.par <- list("b1" = 2, "b2" = 2, "g1" = 2, "g2" = 2, "b3" = 2, "g3" = 2)

## now build the cache of pre-computed scores according to the structural constraints
if (requireNamespace("INLA", quietly = TRUE)) { # Run only if INLA is available
  res.c <- buildScoreCache(data.df = mydat, data.dists = mydists,
                           dag.banned = ~ b1 | b2, dag.retained = ~ g1 | g2,
                           max.parents = max.par)
  print(res.c)
}

Print objects of classabnDag

Description

Print objects of classabnDag

Usage

## S3 method for class 'abnDag'
print(x, digits = 3L, ...)

Arguments

x

Object of classabnDag

digits

number of digits of the adjacency matrix.

...

additional parameters. Not used at the moment.

Value

outputs adjacency matrix and statement of the class ofx.

Examples

mydag <- createAbnDag(dag = ~ a + b | a, data.df = data.frame("a" = 1, "b" = 1))
print(mydag)

Print objects of classabnFit

Description

Print objects of classabnFit

Usage

## S3 method for class 'abnFit'
print(x, digits = 3L, ...)

Arguments

x

Object of classabnFit

digits

number of digits of the results.

...

additional parameters. Not used at the moment.

Value

prints the parameters of the fitted model.


Print objects of classabnHeuristic

Description

Print objects of classabnHeuristic

Usage

## S3 method for class 'abnHeuristic'
print(x, digits = 2L, ...)

Arguments

x

Object of classabnHeuristic

digits

number of digits of the results.

...

additional parameters. Not used at the moment.

Value

prints the best score found and the distribution of the scores.


Print objects of classabnHillClimber

Description

Print objects of classabnHillClimber

Usage

## S3 method for class 'abnHillClimber'
print(x, digits = 3L, ...)

Arguments

x

Object of classabnHillClimber

digits

number of digits of the results.

...

additional parameters. Not used at the moment.

Value

prints the consensus DAG and the class of the object.


Print objects of classabnMostprobable

Description

Print objects of classabnMostprobable

Usage

## S3 method for class 'abnMostprobable'
print(x, digits = 3L, ...)

Arguments

x

Object of classabnMostprobable

digits

number of digits of the results.

...

additional parameters. Not used at the moment.

Value

prints the mostprobable consensus DAG.


Rank of a matrix

Description

similar tobase::rank

Usage

rank_cpp(A)

Value

an integer


Compute the score's contribution in a network of each observation.

Description

This function computes the score's contribution of each observation to the total network score.

Usage

scoreContribution(object = NULL,
                  dag = NULL, data.df = NULL, data.dists = NULL,
                  verbose = FALSE)

Arguments

object

an object of class 'abnLearned'produced bymostProbable,searchHeuristic orsearchHillClimber.

dag

a matrix or a formula statement (see details) defining the network structure,a directed acyclic graph (DAG), see details for format.Note that colnames and rownames must be set.

data.df

a data frame containing the data used for learning the network; binary variables must be declared as factors and no missing values are allowed in any variable.

data.dists

a named list giving the distribution for each node in the network, see details.

verbose

if TRUE then provides some additional output.

Details

This function computes the score contribution of each observation to the total network score. It is available only in the mle setting. Internally it uses the glm and predict functions. This function is an attempt to provide diagnostics for an ABN analysis.

Value

A named list that contains the score contributions: maximum likelihood, aic, bic, mdl, and the diagonal values of the hat matrix.

Examples

## Not run: 
## Use a subset of a built-in simulated data set
mydat <- ex1.dag.data[,c("b1","g1","p1")]

## setup distribution list for each node
mydists <- list(b1="binomial", g1="gaussian", p1="poisson")

## now build cache
mycache <- buildScoreCache(data.df = mydat, data.dists = mydists,
                           max.parents = 2, method = "mle")

## Find the globally best DAG
mp.dag <- mostProbable(score.cache=mycache, score="bic", verbose = FALSE)
out <- scoreContribution(object = mp.dag)

## Observations contribution per network node
boxplot(out$bic)
## End(Not run)

A family of heuristic algorithms that aims to find high-scoring directed acyclic graphs

Description

A flexible implementation of multiple greedy search algorithms to find high-scoring network structures (DAGs).

Usage

searchHeuristic(score.cache, score = "mlik",
                num.searches = 1, seed = 42L, start.dag = NULL,
                max.steps = 100,
                algo = "hc", tabu.memory = 10, temperature = 0.9,
                verbose = FALSE, ...)

Arguments

score.cache

output from buildScoreCache().

score

which score should be used to score the network. Possible choices are aic, bic, mdl, mlik.

num.searches

a positive integer giving the number of different searches to run, see details.

seed

a non-negative integer which sets the seed.

start.dag

a DAG given as a matrix, see details for format, which can be used to explicitly provide a starting point for the structural search.

max.steps

a constant giving the number of search steps per search, see details.

algo

which heuristic algorithm should be used. Possible choices are: hc, tabu, sa.

tabu.memory

a non-negative integer setting the memory of the tabu search.

temperature

a real number giving the update in temperature for the sa (simulated annealing) search algorithm.

verbose

if TRUE then provides some additional output.

...

further arguments passed to or from other methods.

Details

This function is a flexible implementation of multiple greedy heuristic algorithms, particularly well adapted to the abn framework. It targets multi-random-restart heuristic algorithms. The user can select the number of restarts with num.searches and the maximum number of steps within each search with max.steps. The optimization algorithm within each search is relatively rudimentary.

The function searchHeuristic differs from searchHillClimber in that it is fully written in R, whereas searchHillClimber is written in C and is thus expected to be faster. At each hill-climbing step, searchHillClimber considers all information from the pre-computed score cache, while searchHeuristic performs a local stepwise optimization: it chooses a structural move (or edge move) and computes the change in score. In this respect it is closer to the MCMCMC algorithm from Madigan and York (1995) and Giudici and Castelo (2003), with a single edge move.

If the user selects random, then a random valid DAG is selected as the starting point. The routine used favours low-density structures. The function implements three algorithms, selected with the parameter algo: hc, tabu, or sa.

If algo=hc: The hill-climber algorithm (hc) is a single-move algorithm. At each hill-climbing step within a search, an attempt is made to add an arc. The new score is computed and compared to the previous network's score.

If algo=tabu: The same algorithm as with hc is used, but a list of banned moves is maintained. The parameter tabu.memory controls the length of the tabu list. The idea is that the classical hill-climber algorithm is inefficient when it has to cross low-probability regions to escape a local maximum and reach a higher-scoring peak. Forcing the algorithm to choose moves that have not already been used helps it escape the local maximum.

If algo=sa: This variant of the heuristic search algorithm is based on simulated annealing as described by Metropolis et al. (1953). Some accepted moves may decrease the network score. The acceptance rate can be tuned with the parameter temperature.

Value

An object of class abnHeuristic (which extends the class abnLearned), a list with entries:

dags

a list of DAGs

scores

a vector giving the network score for the locally optimal network for each search

detailed.score

a vector giving the evolution of the network score over all the random restarts

score

a number giving the network score for the locally optimal network

score.cache

the pre-computed cache of scores

num.searches

a numeric giving the number of random restarts

max.steps

a numeric giving the maximal number of optimization steps within each search

algorithm

a character indicating the algorithm used

References

Heckerman, D., Geiger, D. and Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 197-243.

Madigan, D. and York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.

Giudici, P. and Castelo, R. (2003). Improving Markov chain Monte Carlo model search for data mining. Machine Learning, 50, 127-158.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087-1092.

Examples

## Not run: 
##############################################
## example: use built-in simulated data set
##############################################
mydat <- ex1.dag.data ## this data comes with abn see ?ex1.dag.data

## setup distribution list for each node
mydists <- list(b1="binomial", p1="poisson", g1="gaussian", b2="binomial",
                p2="poisson", b3="binomial", g2="gaussian", b4="binomial",
                b5="binomial", g3="gaussian")

mycache <- buildScoreCache(data.df = mydat, data.dists = mydists, max.parents = 2)

## Now perform 10 greedy searches
heur.res <- searchHeuristic(score.cache = mycache, data.dists = mydists,
                            start.dag = "random", num.searches = 10,
                            max.steps = 50)

## Plot (one) dag
plotAbn(heur.res$dags[[1]], data.dists = mydists)
## End(Not run)

Find high scoring directed acyclic graphs using heuristic search.

Description

Find high-scoring network (DAG) structures using a random-restart greedy hill-climber heuristic search.

Usage

searchHillClimber(score.cache, score = "mlik", num.searches = 1, seed = 42,
                  start.dag = NULL, support.threshold = 0.5, timing.on = TRUE,
                  dag.retained = NULL, verbose = FALSE, ...)

Arguments

score.cache

output from buildScoreCache().

score

character giving which network score should be used to select the structure. Currently 'mlik' only.

num.searches

number of times to run the search.

seed

non-negative integer which sets the seed in the GSL random number generator.

start.dag

a DAG given as a matrix, see details for format, which can be used to provide a starting point for the structural search explicitly.

support.threshold

the proportion of search results - each locally optimal DAG - in which each arc must appear to be a part of the consensus network.

timing.on

extra output reporting the computation time.

dag.retained

a DAG given as a matrix, see details for format. This is necessary if the score.cache was created using an explicit retain matrix, and the same retain matrix should be used here. dag.retained is used by the algorithm which generates the initial random DAG to ensure that the necessary arcs are retained.

verbose

extra output.

...

further arguments passed to or from other methods.

Details

The procedure runs a greedy hill-climbing search similar, but not identical, to the method initially presented in Heckerman et al. 1995 (Machine Learning, 20, 197-243). Each search begins with a randomly chosen DAG structure, constructed in such a way as to attempt to choose a DAG uniformly from the vast landscape of possible structures. The algorithm used is as follows: given a node cache (from buildScoreCache), a random combination is chosen for each node from the set of all allowed local parent combinations. These are then combined into a full DAG, which is checked for cycles, where this check iterates over the nodes in a random order. If all nodes have at least one child (i.e., at least one cycle is present), then the first node examined has all its children removed, and the check for cycles is repeated. If this has removed the only cycle present, then this DAG is used as the starting point for the search; if not, a second node is chosen (randomly) and the process is repeated until a DAG is obtained.

The hill-climbing algorithm itself differs slightly from that presented in Heckerman et al., as a full cache of all possible local combinations is available. At each hill-climbing step, everything in the node cache is considered: in other words, all possible single swaps between members of the cache currently present in the DAG and those in the full cache. The single swap which provides the greatest increase in goodness of fit is chosen. A single swap here is the removal or addition of any one node-parent combination present in the cache while avoiding a cycle. This means that, as well as all single arc changes (addition or removal), multiple arc changes are also considered at each step; note, however, that arc reversal in this scheme takes two steps (as this requires first removing a parent arc from one node and then adding a parent arc to a different node). The original algorithm perturbed the current DAG by only a single arc at each step but also included arc reversal. The current implementation may not be any more efficient than the original but is arguably more natural given a pre-computed cache of local scores.

A start DAG may be provided, in which case num.searches must equal 1; this option is really just to provide a local search around a previously identified optimal DAG.

This function is designed for two different purposes: i) interactive visualization; and ii) longer batch processing. The former provides easy visual "eyeballing" of data in terms of its majority consensus network (or similar threshold), a graphical structure comprising all arcs which feature in a given proportion (support.threshold) of the locally optimal DAGs identified during the run. The general hope is that this structure will stabilize (become fixed) relatively quickly, at least for problems with smaller numbers of nodes.

Value

A list with entries:

init.score

a vector giving the network score for the initial network from which each search commenced

final.score

a vector giving the network score for the locally optimal network

init.dag

list of matrices, initial DAGs

final.dag

list of matrices, locally optimal DAGs

consensus

a matrix holding a binary graph, not necessarily a DAG!

support.threshold

percentage support used to create the consensus matrix

References

Lewis, F. I., and McCormick, B. J. J. (2012). Revealing the complexity of health determinants in resource-poor settings. American Journal of Epidemiology. DOI: 10.1093/aje/KWS183.
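
Examples

A minimal sketch, assuming the built-in ex1.dag.data dataset and a default (mlik) score cache; the call pattern mirrors searchHeuristic above:

## Not run: 
mydat <- ex1.dag.data[, c("b1", "b2", "g1", "p1")]
mydists <- list(b1 = "binomial", b2 = "binomial", g1 = "gaussian", p1 = "poisson")
mycache <- buildScoreCache(data.df = mydat, data.dists = mydists, max.parents = 2)

## run 5 random-restart hill-climbing searches
hc.res <- searchHillClimber(score.cache = mycache, score = "mlik",
                            num.searches = 5, support.threshold = 0.5,
                            verbose = FALSE)

## binary matrix of the consensus network across the locally optimal DAGs
hc.res$consensus
## End(Not run)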


Simulate data from a fitted additive Bayesian network.

Description

Simulate data from a fitted additive Bayesian network.

Usage

simulateAbn(
  object = NULL,
  run.simulation = TRUE,
  bugsfile = NULL,
  n.chains = 10L,
  n.adapt = 1000L,
  n.thin = 100L,
  n.iter = 10000L,
  seed = 42L,
  verbose = FALSE,
  debug = FALSE
)

Arguments

object

of type abnFit.

run.simulation

call JAGS to simulate data (default is TRUE).

bugsfile

A path to a valid file or NULL (default) to delete the bugs file after simulation.

n.chains

number of parallel chains for the model.

n.adapt

number of iterations for adaptation. If n.adapt is set to zero, then no adaptation takes place.

n.thin

thinning interval for monitors.

n.iter

number of iterations to monitor.

seed

by default set to 42.

verbose

if TRUE prints additional output

debug

if TRUE prints the bugs file content to stdout and does not run simulations.

Value

data.frame

See Also

makebugs

Examples

df <- FCV[, c(12:15)]
mydists <- list(Outdoor="binomial",
                Sex="multinomial",
                GroupSize="poisson",
                Age="gaussian")

## buildScoreCache -> mostProbable() -> fitAbn()
suppressWarnings({
  mycache.mle <- buildScoreCache(data.df = df, data.dists = mydists, method = "mle",
                                 adj.vars = NULL, cor.vars = NULL,
                                 dag.banned = NULL, dag.retained = NULL,
                                 max.parents = 1,
                                 which.nodes = NULL, defn.res = NULL)
}) # ignore non-convergence warnings
mp.dag.mle <- mostProbable(score.cache = mycache.mle, verbose = FALSE)
myres.mle <- fitAbn(object = mp.dag.mle, method = "mle")

myres.sim <- simulateAbn(object = myres.mle,
                         run.simulation = TRUE,
                         bugsfile = NULL,
                         verbose = FALSE)
str(myres.sim)
prop.table(table(myres.sim$Outdoor))
prop.table(table(df$Outdoor))

Simulate a DAG with arbitrary arc density

Description

Provided with node names, returns an abnDag. Arc density refers to the chance of a node being connected to the node before it.

Usage

simulateDag(node.name, data.dists = NULL, edge.density = 0.5, verbose = FALSE)

Arguments

node.name

a vector of character giving the names of the nodes. It gives the size of the simulated DAG.

data.dists

named list corresponding to node.name specifying the distribution for each node. If not provided, arbitrary distributions are assigned to the nodes.

edge.density

number in [0,1] specifying the edge probability in the DAG.

verbose

print more information on the run.

Details

This function generates DAGs by sampling triangular matrices and reordering columns and rows randomly. The network density (edge.density) is used column-wise as the binomial sampling probability. The matrix is then named using the user-provided names.

Value

object of class abnDag consisting of a named matrix, a named list giving the distribution for each node, and an empty element for the data.

Examples

simdag <- simulateDag(node.name = c("a", "b", "c", "d"),
                      edge.density = 0.5,
                      data.dists = list(a = "gaussian",
                                        b = "binomial",
                                        c = "poisson",
                                        d = "multinomial"))

## Example using Ozone entries:
dist <- list(Ozone="gaussian",  Solar.R="gaussian",  Wind="gaussian",
             Temp="gaussian",   Month="gaussian",    Day="gaussian")
out <- simulateDag(node.name = names(dist), data.dists = dist, edge.density = 0.8)
plot(out)

Computes skewness of a distribution

Description

Computes skewness of a distribution

Usage

skewness(x)

Arguments

x

a numeric vector

Value

a numeric value, the skewness of x.
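
Examples

A small usage sketch, assuming the function is exported; a roughly symmetric sample should give a value near zero:

set.seed(1)
skewness(rnorm(1000))   # close to 0 for a symmetric sample
skewness(rexp(1000))    # positive for a right-skewed sample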


Standard Area Under the Marginal

Description

Function to standardize the area under the marginal density to exactly unity. It should be very close to unity, but in some cases, due to numerical accuracy differences (since each point is a separate estimate), it might be a little adrift; turn this option off to see how reliable the original estimation is.

Usage

std.area.under.grid(mylist, single)

Arguments

mylist

list of matrices with two columns, x and y

single

NULL or TRUE if only a single node and parameter

Value

list


Recursive string splitting

Description

Internal function that calls strsplit() multiple times and removes spaces.

Usage

strsplits(x, splits, ...)

Value

A vector of strings


Prints summary statistics from an object of class abnDag

Description

Prints summary statistics from an object of class abnDag

Usage

## S3 method for class 'abnDag'
summary(object, ...)

Arguments

object

an object of class abnLearned or abnFit. Alternatively, a matrix or a formula statement defining the network structure, a directed acyclic graph (DAG). Note that row names must be set up or given in node.names.

...

additional parameters. Not used at the moment.

Value

List with summary statistics of the DAG.

Examples

mydag <- createAbnDag(dag = ~a+b|a, data.df = data.frame("a"=1, "b"=1))
summary(mydag)

Print summary of objects of class abnFit

Description

Print summary of objects of class abnFit

Usage

## S3 method for class 'abnFit'
summary(object, digits = 3L, ...)

Arguments

object

Object of class abnFit

digits

number of digits of the results.

...

additional parameters. Not used at the moment.

Value

prints summary statistics of the fitted model.
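
Examples

A minimal sketch, assuming the built-in ex0.dag.data dataset and an mle fit as shown elsewhere in this manual:

mydists <- list(b1 = "binomial", g1 = "gaussian")
mycache <- buildScoreCache(data.df = ex0.dag.data[, c("b1", "g1")],
                           data.dists = mydists, max.parents = 1, method = "mle")
myfit <- fitAbn(object = mostProbable(score.cache = mycache, verbose = FALSE),
                method = "mle")
## summary statistics of the fitted model
summary(myfit, digits = 3L)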


Print summary of objects of class abnMostprobable

Description

Print summary of objects of class abnMostprobable

Usage

## S3 method for class 'abnMostprobable'
summary(object, ...)

Arguments

object

Object of class abnMostprobable

...

additional parameters. Not used at the moment.

Value

prints the most probable consensus DAG and the number of observations used to calculate it.


tidy up cache

Description

tidy up cache

Usage

tidy.cache(thecache)

Value

list of the cache with error indexes removed


Convert a DAG into graphviz format

Description

Given a matrix defining a DAG create a text file suitable for plotting with graphviz.

Usage

toGraphviz(dag,
           data.df=NULL,
           data.dists=NULL,
           group.var=NULL,
           outfile=NULL,
           directed=TRUE,
           verbose=FALSE)

Arguments

dag

a matrix defining a DAG.

data.df

a data frame containing the data used for learning the network.

data.dists

a list with named arguments matching the names of the data frame which gives the distribution family for each variable. See fitAbn for details.

group.var

only applicable for mixed models and gives the column name in data.df of the grouping variable (which must be a factor denoting group membership). See fitAbn for details.

outfile

a character string giving the filename which will contain the graphviz graph.

directed

logical; if TRUE, a directed acyclic graph is produced, otherwise an undirected graph.

verbose

if TRUE more output is printed. If TRUE and 'outfile=NULL' the '.dot' file is printed to console.

Details

Graphviz (https://www.graphviz.org) is visualisation software developed by AT&T and freely available. This function creates a text representation of the DAG, or the undirected graph, so it can be plotted using Graphviz. The R package Rgraphviz (available through the Bioconductor project, https://www.bioconductor.org/) interfaces R with a working installation of Graphviz.

Binary nodes will appear as squares, Gaussian nodes as ovals, and Poisson nodes as diamonds in the resulting Graphviz network diagram. There are many other possible node shapes and numerous other visual enhancements; see the online Graphviz documentation.

Bespoke refinements can be added by editing the raw outfile produced. For full manual editing, particularly of the layout, or for adding annotations, one easy solution is to convert a PostScript format graph (produced in Graphviz using the -Tps switch) into a vector format using a tool such as pstoedit (http://www.pstoedit.net/), and then edit it using a vector drawing tool like xfig. This can then be re-saved as PostScript or PDF, thus retaining full vector quality.

Value

Nothing is returned, but the file outfile is written.

Author(s)

Fraser Iain Lewis

Marta Pittavino

Examples

## On a typical linux system the following code constructs a nice
## looking pdf file 'graph.pdf'.
## Not run: 
## Subset of a built-in dataset
mydat <- ex0.dag.data[,c("b1","b2","b3","g1","b4","p2","p4")]

## setup distribution list for each node
mydists <- list(b1="binomial", b2="binomial", b3="binomial",
                g1="gaussian", b4="binomial", p2="poisson",
                p4="poisson")

## specify DAG model
mydag <- matrix(c(0,1,0,0,1,0,0,
                  0,0,0,0,0,0,0,
                  0,1,0,0,1,0,0,
                  1,0,0,0,0,0,1,
                  0,0,0,0,0,0,0,
                  0,0,0,1,0,0,0,
                  0,0,0,0,1,0,0), byrow=TRUE, ncol=7)
colnames(mydag) <- rownames(mydag) <- names(mydat)

## create file for processing with graphviz
outfile <- paste(tempdir(), "graph.dot", sep="/")
toGraphviz(dag=mydag, data.df=mydat, data.dists=mydists, outfile=outfile)

## and then process using graphviz tools e.g. on linux
if(Sys.info()[["sysname"]] == "Linux" && interactive()) {
  system(paste("dot -Tpdf -o graph.pdf", outfile))
  system("evince graph.pdf")
}

## Example using data with a group variable where b1<-b2
mydag <- matrix(c(0,1, 0,0), byrow=TRUE, ncol=2)
colnames(mydag) <- rownames(mydag) <- names(ex3.dag.data[,c(1,2)])

## specific distributions
mydists <- list(b1="binomial", b2="binomial")

## create file for processing with graphviz
outfile <- paste0(tempdir(), "/graph.dot")
toGraphviz(dag=mydag, data.df=ex3.dag.data[,c(1,2,14)], data.dists=mydists,
           group.var="group",
           outfile=outfile, directed=FALSE)

## and then process using graphviz tools e.g. on linux:
if(Sys.info()[["sysname"]] == "Linux" && interactive()) {
  pdffile <- paste0(tempdir(), "/graph.pdf")
  system(paste("dot -Tpdf -o ", pdffile, outfile))
  system(paste("evince ", pdffile, " &"))   ## or some other viewer
}
## End(Not run)

Check for valid distribution

Description

The distribution names must match inla() family=''. Similar to get.var.types(), mainly differing in output.

Usage

validate_dists(data.dists, returnDists = TRUE, ...)

Arguments

data.dists

list of variable distributions.

returnDists

if TRUE (default) returns the same list as provided.

...

additional arguments.

Value

either TRUE/FALSE or the list of variable distributions as provided in the input.
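
Examples

A brief sketch, assuming validate_dists() is exported; the distribution names follow those used throughout this manual:

dists <- list(a = "gaussian", b = "binomial", c = "poisson")
## returns the list unchanged when all distribution names are valid
validate_dists(data.dists = dists, returnDists = TRUE)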


Simulated dataset from a DAG comprising 33 variables

Description

250 observations simulated from a DAG with 17 binary variables and 16 continuous variables. A DAG of these data features in the vignette. Note that the conditional dependence relations given are those in the population and may differ in the realization of 250 observations.

Usage

var33

Format

A data frame with a mixture of discrete variables, each of which is set as a factor, and continuous variables. The joint distribution structure used to generate the data is:

v1

Binary, independent.

v2

Gaussian, conditionally dependent upon v1.

v3

Binary, independent.

v4

Binary, conditionally dependent upon v3.

v5

Gaussian, conditionally dependent upon v6.

v6

Binary, conditionally dependent upon v4 and v7.

v7

Gaussian, conditionally dependent upon v8.

v8

Gaussian, conditionally dependent upon v9.

v9

Binary, conditionally dependent upon v10.

v10

Binary, independent.

v11

Binary, conditionally dependent upon v10, v12 and v19.

v12

Binary, independent.

v13

Gaussian, independent.

v14

Gaussian, conditionally dependent upon v13.

v15

Binary, conditionally dependent upon v14 and v21.

v16

Gaussian, independent.

v17

Gaussian, conditionally dependent upon v16 and v20.

v18

Binary, conditionally dependent upon v20.

v19

Binary, conditionally dependent upon v20.

v20

Binary, independent.

v21

Binary, conditionally dependent upon v20.

v22

Gaussian, conditionally dependent upon v21.

v23

Gaussian, conditionally dependent upon v21.

v24

Gaussian, conditionally dependent upon v23.

v25

Gaussian, conditionally dependent upon v23 and v26.

v26

Binary, conditionally dependent upon v20.

v27

Binary, independent.

v28

Binary, conditionally dependent upon v27, v29 and v31.

v29

Gaussian, independent.

v30

Gaussian, conditionally dependent upon v29.

v31

Gaussian, independent.

v32

Binary, conditionally dependent upon v21, v29 and v31.

v33

Gaussian, conditionally dependent upon v31.

Examples

## Constructing the DAG of the dataset:
dag33 <- matrix(0, 33, 33)
dag33[2,1] <- 1
dag33[4,3] <- 1
dag33[6,4] <- 1; dag33[6,7] <- 1
dag33[5,6] <- 1
dag33[7,8] <- 1
dag33[8,9] <- 1
dag33[9,10] <- 1
dag33[11,10] <- 1; dag33[11,12] <- 1; dag33[11,19] <- 1
dag33[14,13] <- 1
dag33[17,16] <- 1; dag33[17,20] <- 1
dag33[15,14] <- 1; dag33[15,21] <- 1
dag33[18,20] <- 1
dag33[19,20] <- 1
dag33[21,20] <- 1
dag33[22,21] <- 1
dag33[23,21] <- 1
dag33[24,23] <- 1
dag33[25,23] <- 1; dag33[25,26] <- 1
dag33[26,20] <- 1
dag33[33,31] <- 1
dag33[32,21] <- 1; dag33[32,31] <- 1; dag33[32,29] <- 1
dag33[30,29] <- 1
dag33[28,27] <- 1; dag33[28,29] <- 1; dag33[28,31] <- 1
