Movatterモバイル変換


[0]ホーム

URL:


Type:Package
Title:Capture-Recapture Estimation using Bayesian Model Averaging
Version:2.0.2
Date:2025-09-21
Description:Performs Bayesian model averaging for capture-recapture. This includes code to stratify records, check the strata for suitable overlap to be used for capture-recapture, and some functions to plot the estimated population size.
License:GPL-2 |GPL-3 [expanded from: GPL (≥ 2)]
LazyLoad:yes
NeedsCompilation:yes
ByteCompile:yes
Imports:chron, Rcpp
RoxygenNote:7.1.1
LinkingTo:Rcpp, RcppArmadillo
Suggests:testthat
Packaged:2025-09-21 17:51:49 UTC; olivier.binette
Author:James Johndrow [aut], Kristian Lum [aut], Patrick Ball [aut], Olivier Binette [ctb, cre]
Maintainer:Olivier Binette <olivier.binette@gmail.com>
Repository:CRAN
Date/Publication:2025-09-21 18:00:02 UTC

Capture-Recapture Estimation using Bayesian Model Averaging

Description

Performs Bayesian model averaging over all decomposable graphical models,each representing a different list dependence structure, with the goal ofestimating the number of uncounted elements from the universe of elementsthat could have been recorded or “captured". The code here is animplementation of the model described in Madigan and York (1997).

Details

Package: dga
Type: Package
Version:1.2
Date: 2015-04-16
License: GPL >=2

Author(s)

James Johndrowjames.johndrow@gmail.com, Kristian Lumkl@hrdag.org, and Patrick Ballpball@hrdag.org

Maintainer: Kristian Lumkl@hrdag.org

References

Madigan, David, and Jeremy C. York. "Bayesian methods forestimation of the size of a closed population." Biometrika 84.1 (1997):19-31.

Examples

### This simulated example goes through the whole process for 3 lists.###library(chron)# Simulate some dataN <- 1000 # true population sizenum.lists <- 3 # number of lists# simulate 3 lists independently 'capturing'l1 <- rbinom(N, 1, .2)l2 <- rbinom(N, 1, .25)l3 <- rbinom(N, 1, .3)overlaps <- data.frame(l1, l2, l3)# simulate dates of recordingdates <- paste(  rep(2015, N), "-", sample(1:4, N, replace = TRUE), "-",  sample(1:28, N, replace = TRUE))dates <- chron(dates, format = c(dates = "y-m-d"))# save true number in each stratum for comparison latertruth <- table((months(dates)))[1:4]# remove dates of unrecorded elementsdates <- dates[apply(overlaps, 1, sum) > 0]# remove individuals not recorded on any listoverlaps <- overlaps[apply(overlaps, 1, sum) > 0, ]# stratify by datestrata <- make.strata(overlaps, dates = dates, date.defs = "monthly")# check to make sure that all strata are OKcheck <- check.strata(strata)# look at strata, just to make surepar(mfrow = c(2, 2), mar = rep(1, 4))for (i in 1:nrow(strata$overlap.counts)) {  venn3(strata$overlap.counts[i, ],    main = rownames(strata$overlap.counts)[i],    cex.main = 1  )}# load the graphs to make the estimatesdata(graphs3)# select expansion factor defining the largest number of unrecorded elements.# this makes Nmissing <- 0:(sum(Y)*fac)fac <- 5# set priordelta <- 1 / 2^num.lists# loop over strata to calculate posterior distributions of# the total population size for each stratumpar(mfrow = c(2, 2), mar = rep(1.75, 4))# if using Rstudio, make sure your plot window is pretty big here!for (i in 1:nrow(strata$overlap.counts)) {  Nmissing <- 0:(sum(strata$overlap.counts[i, ]) * fac)  Y <- array(strata$overlap.counts[i, ], dim = rep(2, num.lists))  weights <- bma.cr(Y, Nmissing, delta, graphs3)  plotPosteriorN(weights, Nmissing + sum(strata$overlap.counts[i, ]),    main = rownames(strata$overlap.counts)[i]  )  points(truth[i], .5 * max(weights), col = "red", pch = 16, cex = 2)}

Computes Marginal Likelihoods for Each Clique and Value of Nmissing

Description

Assembles all of the pieces of the marginal likelihoods to be used tocalculate the posterior probability of each model/value of Nmissing.

Usage

CompLogML(D, Nmissing, delta)

Arguments

D

A marginal array of the list overlap counts.

Nmissing

The vector of possible values for the missing cell.

delta

The prior hyper parameter for the Dirichlet distribution.

Value

The log marginal likelihood of the marginal table.

Author(s)

James Johndrowjames.johndrow@gmail.com and Kristian Lumkl@hrdag.org

References

Madigan, David, and Jeremy C. York. "Bayesian methods forestimation of the size of a closed population." Biometrika 84.1 (1997):19-31.

Examples

Y <- c(0, 27, 37, 19, 4, 4, 1, 1, 97, 22, 37, 25, 2, 1, 3, 5,       83, 36, 34, 18, 3, 5, 0, 2, 30, 5, 23, 8, 0, 3, 0, 2)Y <- array(Y, dim = c(2, 2, 2, 2, 2))# Compute marginal array over lists 1 and 3D <- apply(Y, c(1, 3), sum)dga:::CompLogML(D, 1:300, 0.5)

Component-wise Matrix of Log Marginal Likelihoods

Description

Calls CompLogML to create a matrix of number of possible components bylength(Nmissing) log marginal likelihoods. Calculates the log marginallikehood of each possible marginal table for every value of Nmissing.

Usage

MakeCompMatrix(p, delta, Y, Nmissing)

Arguments

p

Number of lists

delta

Prior hyperparameter of the Dirichlet distribution.

Y

The2^k matrix of list intersection counts.

Nmissing

The vector of possible values for the missing cell.

Value

A matrix of log marginal likelihoods.

Author(s)

James Johndrowjames.johndrow@gmail.com and Kristian Lumkl@hrdag.org


Bayesiam Model Averaging for Capture-Recapture

Description

This function averages over all decomposable graphical models for p lists.

Usage

bma.cr(  Y,  Nmissing,  delta,  graphs,  logprior = NULL,  log.prior.model.weights = NULL)

Arguments

Y

a2^p array of list intersection counts. See details.

Nmissing

A vector of all possible values for the number ofindividuals that appear on no list.

delta

The hyper-parameter for the hyper-Dirichlet prior distributionon list intersection probabilities. A smaller value indicates fewer priorobservations per cell. A suggested default is2^-p

graphs

A pre-computed list of all decomposable graphical models forp lists. These should be loaded using data(graphsp); see example.Currently, this package includes a list of graphs for three, four, or fivelists.

logprior

The log of the prior probability of each value in Nmissing.If left blank, this will default to the -log(Nmissing).

log.prior.model.weights

Prior weights on the graphs. This should be avector of length length(graphs).

Details

This is the main function in this package. It performs capture-recapture(or multiple systems estimation) using Bayesian model averaging as outlinedin Madigan and York (1997).

Y can be created by the array() command from a vector that is orderedlexigraphically by the cell names, e.g., c(x000, x001, x010, x011, x100,x101, x110, x111).

Value

This function returns a matrix of weights, where rows correspond tomodels and columns correspond to values of Nmissing. Thus, theijthentry of the matrix is the posterior probability of theith model andthejth entry of Nmissing. Row sums return posterior probabilities bymodel.Column sums return posterior probabilities by value of Nmissing.

Note

This function is pretty robust relative to the more common log-linearmodel approach to capture-recapture. It will not fail (or issue a numericalwarning) even if there are no overlaps among the lists. The user should takecare that there is adequate list overlap and that there are sufficient casesin the stratum.

Author(s)

James Johndrowjames.johndrow@gmail.com and Kristian Lum(kl@hrdag.org)

References

Madigan, David, and Jeremy C. York. "Bayesian methods forestimation of the size of a closed population." Biometrika 84.1 (1997):19-31.

Examples

#### 5 list example from M & Y ##########delta <- .5Y <- c(0, 27, 37, 19, 4, 4, 1, 1, 97, 22, 37, 25, 2, 1, 3, 5,       83, 36, 34, 18, 3, 5, 0, 2, 30, 5, 23, 8, 0, 3, 0, 2)Y <- array(Y, dim = c(2, 2, 2, 2, 2))Nmissing <- 1:300N <- Nmissing + sum(Y)data(graphs5)weights <- bma.cr(Y, Nmissing, delta, graphs5)plotPosteriorN(weights, N)##### 3 list example from M & Y #######Y <- c(0, 60, 49, 4, 247, 112, 142, 12)Y <- array(Y, dim = c(2, 2, 2))delta <- 1a <- 13.14b <- 55.17Nmissing <- 1:300N <- Nmissing + sum(Y)logprior <- N * log(b) - (N + a) * log(1 + b) + lgamma(N + a) - lgamma(N + 1) - lgamma(a)data(graphs3)weights <- bma.cr(Y, Nmissing, delta, graphs3, logprior)plotPosteriorN(weights, N)

A Helper Function for make.strata

Description

A helper function used in make.strata to make list overlap counts.

Usage

cfunction(x, nlist)

Arguments

x

capture histories, transformed from binary to decimal

nlist

the number of lists

Value

a table of the number of records with each capture history

Author(s)

Kristian Lumkl@hrdag.org

Examples

## The function is currently defined ascfunction <- function(x, nlist) {  out <- table(c(x, 0:(2^nlist - 1))) - 1}

Checks Each Stratum for Suitability for Capture-Recapture

Description

Takes in list intersection counts and source list totals as produced bymake.strata. It then checks whether there are between three and five lists,whether all of the lists are non-empty, and whether all of the lists overlapwith some other list.

Usage

check.strata(strata)

Arguments

strata

A list of list overlaps and source countsin the format of theoutput of make.strata. list.overlaps contains a data frame of list overlapsby stratum. source.counts contains the number of records by source andstratum.

Value

A boolean indicating whether any serious problems have been foundwith the strata.

Note

This does not issue a warning for cases where some subset of lists isnot connected to the others, e.g. Lists A and B have overlap with eachother, lists C and D have overlap with each other, but no records from A orB overlap with lists C or D. We suggest that you examine the listintersection counts manually as well.

Author(s)

Kristian Lumkl@hrdag.org

Examples

library(chron)N <- 1000overlaps <- data.frame(l1 = rbinom(N, 1, .5), l2 = rbinom(N, 1, .5), l3 = rbinom(N, 1, .5))dates <- paste(  rep(2015, N), "-", sample(1:12, N, replace = TRUE), "-",  sample(1:28, N, replace = TRUE))dates <- chron(dates, format = c(dates = "y-m-d"))locations <- sample(c("A", "B", "C", "D"), N, replace = TRUE)# Aggregate only by week:strata <- make.strata(overlaps, dates, date.def = "weekly")check <- check.strata(strata)

A Helper Function Used in venn3

Description

Takes the parameters of a circle and returns points on its perimeter to beplotted to make circles for a venn diagram.

Usage

circle(x, y, r)

Arguments

x

the x coordinate of the center of the circle.

y

the y coordinate of the center of the circle.

r

the radius of the circle

Value

inds

the x,y coordinates of the periphery of a circle, to beused in venn3.

Author(s)

Kristian Lumkl@hrdag.org

Examples

plot(dga:::circle(0, 0, 1), type = "l")

A Helper Function for venn3

Description

Used in venn3 to tell whether proposed points are inside of the givencircle.

Usage

circle.ind(ps, x, y, r)

Arguments

ps

a n x 2 matrix of coordinates.

x

the x coordinate of the center of the circle.

y

the y coordinate of the center of the circle.

r

the radius of the circle.

Value

a length n vector telling whether each row of ps is inside the givencircle.

Author(s)

Kristian Lumkl@hrdag.org

Examples

ps <- cbind(runif(100), runif(100))plot(dga:::circle(0, 0, 1), type = "l")inds <- dga:::circle.ind(ps, 0, 0, 1)points(inds)

A Helper Function Used by Venn4 to Define the Perimeter of an Ellipse

Description

Draws the ellipses used in venn4.

Usage

ellipse(x, y, a, b, alpha)

Arguments

x

the x coordinate of the center of the ellipse.

y

the y coordinate of the center of the ellipse.

a

the x-direction radius.

b

the y-direction radius.

alpha

the angle of rotation of the ellipse

Value

points that define the perimeter of an ellipse.

Author(s)

Kristian Lumkl@hrdag.org

Examples

plot(dga:::ellipse(0, 0, .5, .2, 1))

A Helper Function Used by Venn4

Description

Takes potential points to be plotted in the venn diagrams and returnswhether the point is inside or outside of the ellipse described by x, y, a,b, and alpha.

Usage

ellipse.ind(ps, x, y, a, b, alpha)

Arguments

ps

a n x 2 matrix of coordinates.

x

the x coordinate of the center of the ellipse.

y

the y coordinate of the center of the ellipse.

a

the x-radius of the ellipse.

b

the y-radius of the ellipse.

alpha

the angle of rotation of the ellipse

Value

a length n vector indicating whether each point is inside theellipse.

Author(s)

Kristian Lumkl@hrdag.org

Examples

## The function is currently defined asps <- cbind(runif(100), runif(100))plot(dga:::ellipse(0, 0, .5, .3, 0), type = "l")inds <- dga:::ellipse.ind(ps, 0, 0, .5, .3, 0)points(inds)

All Decomposable Graphical Models on Three Lists

Description

This dataset contains all of the cliques and separators for each of thedecomposable graphical models on three lists. On three lists, this is all ofthe models.

Usage

data(graphs3)

Format

A list of lists. graphs3[[i]] is theith model underconsideration. This consists of graphs3[[i]]$C, all of the cliques in thatmodel, and graphs3[[i]]$S, the separators.


All Decomposable Graphical Models on Four Lists

Description

This dataset contains all of the cliques and separators for each of thedecomposable graphical models on four lists.

Usage

data(graphs4)

Format

A list of lists. graphs4[[i]] is theith model underconsideration. This consists of graphs4[[i]]$C, all of the cliques in thatmodel, and graphs4[[i]]$S, the separators.


All Decomposable Graphical Models on Five Lists

Description

This dataset contains all of the cliques and separators for each of thedecomposable graphical models on five lists.

Usage

data(graphs5)

Format

A list of lists. graphs5[[i]] is theith model underconsideration. This consists of graphs5[[i]]$C, all of the cliques in thatmodel, and graphs5[[i]]$S, the separators.


Base Converter

Description

Takes a decimal number and converts it to base b.

Usage

integer.base.b(x, b = 2)

Arguments

x

A number.

b

The desired base.

Details

This was harvested from the internet here:https://stat.ethz.ch/pipermail/r-help/2003-September/038978.html. Posted bySpencer Graves.

Value

A number in base b.

Author(s)

Spencer Graves

References

https://stat.ethz.ch/pipermail/r-help/2003-September/038978.html


Transforms Records to List Intersection Counts by Stratum

Description

Helps you to create list overlaps in the correct order to be used in bma.cr.This function also does some of the heavy lifting to stratify records bytime (date, etc.) and other variables.

Usage

make.strata(  overlaps,  dates = NULL,  locations = NULL,  demographics = NULL,  date.defs = "monthly",  loc.defs = NULL,  demog.defs = NULL,  start.date = NULL,  end.date = NULL)

Arguments

overlaps

a data frame that tells whether the i'th record appears onthe j'th lists, where n is the total number of sampled elements and p is thenumber of lists. For example, if the [3,2] entry is 1, then the thirdelement appeared on the second list. If it is zero, then the third elementdid NOT appear on the second list.

dates

record dates, in identical row oder to overlaps. This must be achron object. Do not include this if you don't want to stratify by time.

locations

record locations, though unlike the dates, there is nothingspecial about the type that would prevent you from using any other variabletype to stratify by here. Do not include this unless you want to stratify bythe factor you include here.

demographics

record demographic variables. Like locations, there isnothing specific to this that requires this be demographic. This should be afactor. Do not incude this unless you want to stratify by this factor.

date.defs

how you'd like to stratify by date. This defaults to"monthly". Other options are "weekly", "daily", and "yearly". If you enteran integer (k) instead of one of these options, the data will be stratifiedinto blocks of size k days.

loc.defs

How to divide up all of the levels of locations into groups.e.g. if locations has levels A, B, and C, and you'd like to stratify so thatA and B are one strata and C is another, input loc.defs = list(g1 = c('A','B'), g2 = c('C')). If this is left as NULL, each level will be put into itsown stratum.

demog.defs

Similar to loc.defs. Same format. Including both justallows you to stratify along two dimensions.

start.date

A chron object of one date. This gives the date ofearliest record we want to include. If NULL, this defaults to the earliestrecord in the dataset.

end.date

a chron object of one date. This gives the date of thelatest record to be included. If NULL, this defaults to the latest record inthe dataset. This can only be included if dates are given.

Value

overlap.counts

a data frame where each row gives the listintersection counts that can be used in bma.cr

source.counts

a dataframe that gives the total number of records by each data source andstratum.

Author(s)

Kristian Lumkl@hrdag.org

Examples

require(chron)N <- 10000overlaps <- data.frame(l1 = rbinom(N, 1, .5), l2 = rbinom(N, 1, .5), l3 = rbinom(N, 1, .5))dates <- paste(  rep(2015, N), "-", sample(1:12, N, replace = TRUE), "-",  sample(1:28, N, replace = TRUE))dates <- chron(dates, format = c(dates = "y-m-d"))locations <- sample(c("A", "B", "C", "D"), N, replace = TRUE)# Aggregate only by week:make.strata(overlaps, dates, date.def = "weekly")# Aggregate by year and location, where locations are not grouped:make.strata(overlaps, dates, date.def = "yearly", locations)# Aggregate by 2 day increments and location, where there are unique location levels#       A, B, C, and D and locations A and B are in group 1#       and locations C and D are in group 2.loc.defs <- list("g1" = c("A", "B"), "g2" = c("C", "D"))make.strata(overlaps, dates, date.def = 2, locations, loc.defs = loc.defs)# Aggregate by demographic (sex) only, where sex takes values M, F, A, NA, and U#       and we would like to group these as M, F, and other.sex <- sample(c("M", "F", "A", NA, "U"),  prob = c(.4, .4, .1, .05, .05),  N, replace = TRUE)demog.defs <- list("M" = "M", "F" = "F", "Other" = c("A", NA, "U"))make.strata(overlaps, demographics = sex, demog.defs = demog.defs)

Plots Posterior Distribution of Nmissing

Description

Plots the model averaged posterior distribution of the total number ofelements (the solid line) and the contribution to the posterior of each ofthe models (dotted lines)

Usage

plotPosteriorN(weights, N, main = NULL)

Arguments

weights

The output of BMAfunction.

N

N + Nmissing. Or, if you prefer, just Nmissing. The former showsthe posterior distribution of the total population size; the latter showsthe posterior distribution of the number of missing elements.

main

the title of the plot

Value

A plot.

Author(s)

Kristian Lumkl@hrdag.org

Examples

##### 5 list example from M & Y #######delta <- .5Y <- c(0, 27, 37, 19, 4, 4, 1, 1, 97, 22, 37, 25, 2, 1, 3, 5,       83, 36, 34, 18, 3, 5, 0, 2, 30, 5, 23, 8, 0, 3, 0, 2)Y <- array(Y, dim = c(2, 2, 2, 2, 2))Nmissing <- 1:300N <- Nmissing + sum(Y)data(graphs5)weights <- bma.cr(Y, Nmissing, delta, graphs5)plotPosteriorN(weights, N)##### 3 list example from M & Y #######Y <- c(0, 60, 49, 4, 247, 112, 142, 12)Y <- array(Y, dim = c(2, 2, 2))delta <- 1a <- 13.14b <- 55.17Nmissing <- 1:300N <- Nmissing + sum(Y)logprior <- N * log(b) - (N + a) * log(1 + b) + lgamma(N + a) - lgamma(N + 1) - lgamma(a)data(graphs3)weights <- bma.cr(Y, Nmissing, delta, graphs3, logprior)plotPosteriorN(weights, N)

A Helper Function to Tell Which Points Are Near the Boundary of a Circle

Description

Used in venn3 to tell which of the potential points to be plotted are nearthe boundary of the circle defned by x, y, and r.

Usage

remove.close(ps, x, y, r)

Arguments

ps

an n x 2 matrix of potential points.

x

the x coordinate of the center of the circle.

y

the y coordinate of the center of the circle.

r

the radius of the circle

Value

inds

tells which points are too close to the edge of thecircle.

Author(s)

Kristian Lumkl@hrdag.org

Examples

ps <- cbind(runif(100), runif(100))inds <- dga:::remove.close(ps, .5, .5, .1)

A Helper Function to Tell Which Points are Near the Boundary of the Ellipse

Description

A helper function.

Usage

remove.close.ellipse(ps, x, y, a, b, alpha)

Arguments

ps

an n x 2 matrix of potential points.

x

the x coordinate of the center of the ellipse.

y

the y coordinate of the center of the ellipse.

a

the x-radius of the ellipse.

b

the y-radius of the ellipse.

alpha

the angle of rotation of the ellipse.

Value

inds

a vector of length nrow(ps) that tells whether each rowof ps is near the border of the ellipse defined by x,y,a,b, and alpha.

Author(s)

Kristian Lumkl@hrdag.org

Examples

## The function is currently defined asps <- cbind(runif(100), runif(100))inds <- dga:::remove.close.ellipse(ps, .5, .5, .1, .3, 1)

A Helper Function for make.strata.

Description

This is the simplest function ever. It's just an apply to sum acrosscolumns.

Usage

sfunction(x)

Arguments

x

capture histories, as numbers

Value

apply(x, 2, sum)

Author(s)

Kristian Lumkl@hrdag.org

Examples

## The function is currently defined assfunction <- function(x) {  out <- apply(x, 2, sum)}

Three List Venn Diagram

Description

A function that plots a venn diagram of 3 lists. One point is plotted ineach region for each record that falls into the corresponding list overlap.

Usage

venn3(  overlap.counts,  main = NULL,  num.test.points = 1e+05,  p.cex = 0.75,  write_numbers = FALSE,  t.cex = 1.25,  cex.main = 1)

Arguments

overlap.counts

A vector of length2^3 that gives the number ofrecords in each overlap in lexicographic order, i.e. 001, 010, 011, 100,etc.

main

the title of the graph

num.test.points

how many test points to generate as potentials to beplotted in the circles.

p.cex

the size of the points to be plotted

write_numbers

indicates whether to print the number of points in eachregion.

t.cex

the size of the text to write the numbers.

cex.main

the size of the title

Value

a 3-way venn diagram with points inside of each segment representingthe number of records on each list overlap.

Author(s)

Kristian Lumkl@hrdag.org

Examples

overlap.counts <- rpois(8, 30)venn3(overlap.counts, main = "example diagram")

Four List Venn Diagram

Description

A function that plots a venn diagram of 4 lists. One point is plotted ineach region for each record that falls into the corresponding list overlap.

Usage

venn4(  overlap.counts,  main = NULL,  num.test.points = 1e+05,  p.cex = 0.75,  cex.main = 1)

Arguments

overlap.counts

A vector of length2^4 that gives the number ofrecords in each overlap in lexicographic order, i.e. 0000, 0001, 0010, 0011,0100, etc.

main

the title of the graph

num.test.points

how many test points to generate as potentials to beplotted in the circles.

p.cex

the size of the points to be plotted

cex.main

the size of the title

Value

A venn diagram of the list overlap structure for four lists. Eachregion of the plot contains points representing each record in that listintersection.

Author(s)

Kristian Lumkl@hrdag.org

Examples

overlap.counts <- rpois(16, 50)venn4(overlap.counts, main = "example diagram")

[8]ページ先頭

©2009-2025 Movatter.jp