| Type: | Package |
| Title: | Functional Labeled Optimal Partitioning |
| Version: | 2024.6.19 |
| Description: | Provides efficient 'C++' code for computing an optimal segmentation model with Poisson loss, up-down constraints, and label constraints, as described by Kaufman et al. (2024) <doi:10.1080/10618600.2023.2293216>. |
| License: | GPL-3 |
| RoxygenNote: | 7.3.1 |
| Suggests: | testthat, PeakError, knitr, markdown, ggplot2 |
| Depends: | R (≥ 2.10) |
| LinkingTo: | Rcpp |
| Imports: | Rcpp, data.table |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2024-06-20 11:05:36 UTC; tdhock |
| Author: | Toby Dylan Hocking [aut, cre] |
| Maintainer: | Toby Dylan Hocking <toby.hocking@r-project.org> |
| Repository: | CRAN |
| Date/Publication: | 2024-06-20 21:30:10 UTC |
Functional Labeled Optimal Partitioning
Description
Main function for computing an optimal segmentation model with Poisson loss, up-down constraints, and label constraints.
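The Poisson loss of a single segment can be illustrated in base R. This is a minimal sketch assuming the usual weighted convention in this line of work, loss = sum(weight * (mu - count * log(mu))) with the log(count!) constant dropped; it is not read from the package's C++ source, and the helper names poisson_loss() and optimal_mean() are hypothetical. It also shows why the best constant mean for an unconstrained segment is the weighted mean of the counts.

```r
# Hypothetical helper: weighted Poisson loss of one segment with constant
# mean mu, dropping the log(count!) term because it does not depend on mu.
poisson_loss <- function(count, weight, mu) {
  stopifnot(mu > 0)
  sum(weight * (mu - count * log(mu)))
}

# Setting the derivative sum(weight) - sum(weight * count) / mu to zero
# gives the weighted mean of the counts as the optimal segment mean.
optimal_mean <- function(count, weight) sum(weight * count) / sum(weight)

count <- c(3, 5, 4)
weight <- c(2, 1, 1)
mu_hat <- optimal_mean(count, weight)  # (6 + 5 + 4) / 4 = 3.75
poisson_loss(count, weight, mu_hat) < poisson_loss(count, weight, mu_hat + 0.5)
```

Because the loss is convex in mu, any other mean gives a larger value; the up-down and label constraints are what make the full segmentation problem harder than this per-segment calculation.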
Usage
FLOPART(coverage, label, penalty)
Arguments
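coverage | data frame of coverage data with columns chromStart, chromEnd, count |
label | data frame of labels with columns chromStart, chromEnd, annotation |
penalty | non-negative penalty constant (larger values yield fewer peaks) |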
Value
List with named elements:
coverage_dt | data table with columns chromStart, chromEnd, count, weight |
label_dt | data table with columns chromStart, chromEnd, annotation, type, firstRow, lastRow |
cost_mat | N x 2 numeric matrix (N data points) of optimal penalized Poisson loss values up to each data point and in each state |
intervals_mat | N x 2 integer matrix of the number of intervals used to store the optimal cost function, useful for analyzing time/space complexity |
segments_dt | data table with columns chromStart, chromEnd, status, mean |
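The up-down constraint can be checked on a segments table. The sketch below uses made-up rows shaped like segments_dt (not real FLOPART output) and assumes that rows are sorted by position and that status takes the values "peak" and "background", as in the plotting example. The helper obeys_up_down() is hypothetical: it tests that segments are contiguous, that statuses alternate, and that the mean does not decrease when entering a peak nor increase when leaving one.

```r
# Made-up segments in the shape of segments_dt; not real FLOPART output.
segs <- data.frame(
  chromStart = c(0, 100, 150, 300),
  chromEnd   = c(100, 150, 300, 350),
  status     = c("background", "peak", "background", "peak"),
  mean       = c(2, 20, 3, 15))

# Hypothetical checker for the up-down constraint on sorted segments.
obeys_up_down <- function(segs) {
  n <- nrow(segs)
  if (n < 2) return(TRUE)
  contiguous <- all(segs$chromEnd[-n] == segs$chromStart[-1])
  alternates <- all(segs$status[-1] != segs$status[-n])
  jump <- diff(segs$mean)
  into_peak <- segs$status[-1] == "peak"
  # Entering a peak must not go down; leaving a peak must not go up.
  directions <- all(ifelse(into_peak, jump >= 0, jump <= 0))
  contiguous && alternates && directions
}

obeys_up_down(segs)
```

Applied to fit[["segments_dt"]] from the example, a check like this would be a quick sanity test, assuming the column conventions above hold.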
Author(s)
Toby Dylan Hocking
Examples
library(data.table)
data("Mono27ac.simple", package="FLOPART")
Mono27ac.simple
label.pen <- 1400
fit <- with(Mono27ac.simple, FLOPART::FLOPART(coverage, label, label.pen))
lapply(fit, head)

## Plot data and model.
## linewidth requires ggplot2 >= 3.4.0 (older versions use size).
ann.colors <- c(
  noPeaks="orange",
  peakStart="#efafaf",
  peakEnd="#ff4c4c")
model.color <- "blue"
(peaks.dt <- fit[["segments_dt"]][status=="peak"][, peak.y := -2][])
if(require("ggplot2")){
  ggplot()+
    ggtitle("Model with label constraints (FLOPART)")+
    scale_fill_manual("label", values=ann.colors)+
    geom_rect(aes(
      xmin=chromStart, xmax=chromEnd,
      ymin=-Inf, ymax=Inf,
      fill=annotation),
      alpha=0.5,
      color="grey",
      data=Mono27ac.simple[["label"]])+
    geom_step(aes(
      chromStart, count),
      data=Mono27ac.simple[["coverage"]],
      color="grey50")+
    geom_step(aes(
      chromStart, mean),
      data=fit[["segments_dt"]],
      color=model.color)+
    geom_segment(aes(
      chromStart, peak.y,
      xend=chromEnd, yend=peak.y),
      color=model.color,
      linewidth=1,
      data=peaks.dt)+
    geom_point(aes(
      chromEnd, peak.y),
      color=model.color,
      shape=21,
      fill="white",
      data=peaks.dt)+
    theme_bw()+
    theme(panel.spacing=grid::unit(0, "lines"))
}

## To analyze computational complexity, plot the number of intervals
## stored in the cost function versus data point, for each cost status.
imat <- fit[["intervals_mat"]]
interval.dt <- data.table(
  intervals=as.integer(imat),
  status=c("peak", "background")[as.integer(col(imat))],
  data.i=as.integer(row(imat)))
if(require("ggplot2")){
  ggplot()+
    scale_fill_manual("label", values=ann.colors)+
    geom_rect(aes(
      xmin=firstRow-0.5, xmax=lastRow+0.5,
      ymin=-Inf, ymax=Inf,
      fill=annotation),
      alpha=0.5,
      color="grey",
      data=fit[["label_dt"]])+
    geom_line(aes(
      data.i, intervals, color=status),
      linewidth=1,
      data=interval.dt)
}
Convert data for input to FLOPART
Description
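FLOPART requires at most one label per coverage data row, which does not hold in general for arbitrary coverage and label data. This function splits coverage rows where needed so that every label boundary coincides with a coverage row boundary.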
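To see when conversion is needed, one can count how many labels overlap each coverage row. This is a minimal base R sketch, assuming half-open intervals [chromStart, chromEnd) and the toy values from the example below; the helper labels_per_row() is hypothetical and not part of the package. Any count greater than 1 violates the one-label-per-row requirement.

```r
# Toy coverage runs and labels (the values from the FLOPART_data example).
cov.df <- data.frame(chromStart = c(0, 10), chromEnd = c(10, 20),
                     count = c(53, 124))
lab.df <- data.frame(chromStart = c(2, 8), chromEnd = c(7, 15),
                     annotation = c("noPeaks", "peakStart"))

# Hypothetical helper: number of labels overlapping each coverage row,
# treating positions as half-open intervals [chromStart, chromEnd).
labels_per_row <- function(cov.df, lab.df) {
  vapply(seq_len(nrow(cov.df)), function(i) {
    sum(lab.df$chromStart < cov.df$chromEnd[i] &
          cov.df$chromStart[i] < lab.df$chromEnd)
  }, integer(1))
}

labels_per_row(cov.df, lab.df)  # 2 1: the first row overlaps both labels
```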
Usage
FLOPART_data(coverage, label)
Arguments
coverage | data frame of coverage with columns chromStart, chromEnd, count |
label | data frame of labels with columns chromStart, chromEnd, annotation |
Value
Named list: coverage_dt is a data table representing a run-length encoding of the input coverage data, with additional rows if there are label chromStart/chromEnd values not in the set of coverage positions; label_dt is a data table with one row per label, and additional columns firstRow/lastRow which refer to row numbers of coverage_dt, 0-based for passing to C++ code.
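The splitting idea can be sketched in base R: collect every coverage and label boundary, cut the coverage into runs at those positions, and record 0-based first/last run indices for each label. This is a hypothetical helper (split_at_labels()), not the package's implementation; it assumes contiguous coverage, non-overlapping labels inside the coverage range, and weight = chromEnd - chromStart, and unlike the real function it does not merge adjacent runs with equal counts.

```r
cov.df <- data.frame(chromStart = c(0, 10), chromEnd = c(10, 20),
                     count = c(53, 124))
lab.df <- data.frame(chromStart = c(2, 8), chromEnd = c(7, 15),
                     annotation = c("noPeaks", "peakStart"))

# Hypothetical sketch of the conversion described above.
split_at_labels <- function(cov.df, lab.df) {
  # Every position where a coverage run or a label starts or ends.
  breaks <- sort(unique(c(cov.df$chromStart, cov.df$chromEnd,
                          lab.df$chromStart, lab.df$chromEnd)))
  start <- head(breaks, -1)
  end <- tail(breaks, -1)
  # Index of the original coverage row containing each new run.
  src <- findInterval(start, cov.df$chromStart)
  coverage_dt <- data.frame(chromStart = start, chromEnd = end,
                            count = cov.df$count[src], weight = end - start)
  # 0-based indices of the first and last run covered by each label.
  label_dt <- lab.df
  label_dt$firstRow <- match(lab.df$chromStart, start) - 1L
  label_dt$lastRow <- match(lab.df$chromEnd, end) - 1L
  list(coverage_dt = coverage_dt, label_dt = label_dt)
}

out <- split_at_labels(cov.df, lab.df)
out$label_dt
```

Here the two input rows become six runs (boundaries 0, 2, 7, 8, 10, 15, 20), and each run lies inside at most one label.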
Author(s)
Toby Dylan Hocking
Examples
library(data.table)
d <- function(chromStart, chromEnd, count){
  data.table(chromStart, chromEnd, count)
}
(cov.dt <- rbind(
  d(0, 10, 53),
  d(10, 20, 124)))
l <- function(chromStart, chromEnd, annotation){
  data.table(chromStart, chromEnd, annotation)
}
lab.dt <- rbind(
  l(2, 7, "noPeaks"),
  l(8, 15, "peakStart"))
FLOPART::FLOPART_data(cov.dt)
FLOPART::FLOPART_data(cov.dt, lab.dt)
data("Mono27ac", package="FLOPART")
sapply(Mono27ac, dim)
converted <- with(Mono27ac, FLOPART::FLOPART_data(coverage, labels))
sapply(converted, dim)
Interface to FLOPART C++ code
Description
Low-level interface to the FLOPART C++ code, which takes pre-processed vectors rather than coverage and label data frames.
Usage
FLOPART_interface(
  data_vec,
  weight_vec,
  penalty,
  label_type_vec,
  label_start_vec,
  label_end_vec
)
Arguments
data_vec | Integer vector of non-negative count data |
weight_vec | Numeric vector of positive weights (same size as data_vec) |
penalty | non-negative real-valued penalty (larger for fewer peaks) |
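label_type_vec | Integer vector of label type codes (see get_label_code) |
label_start_vec | Integer vector of 0-based indices of the first data point in each label |
label_end_vec | Integer vector of 0-based indices of the last data point in each label |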
Value
List with named elements: cost_mat and intervals_mat (one row for each data point; first column for the up/peak state, second for the down/background state), segments_df (one row for each segment in the optimal model).
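The arguments of this interface map naturally onto the columns of converted data. The sketch below gathers them from data frames shaped like FLOPART_data() output; the helper make_interface_args() is hypothetical, and the label codes used in the example are placeholders, not the package's values (those come from get_label_code()).

```r
# Hypothetical helper: gather FLOPART_interface() arguments from data
# frames shaped like FLOPART_data() output. label_codes should be a named
# integer vector such as the one returned by get_label_code().
make_interface_args <- function(coverage_dt, label_dt, penalty, label_codes) {
  stopifnot(penalty >= 0, all(coverage_dt$weight > 0))
  list(
    data_vec = as.integer(coverage_dt$count),
    weight_vec = as.numeric(coverage_dt$weight),
    penalty = penalty,
    # as.character() guards against factor columns indexing by level number.
    label_type_vec = unname(label_codes[as.character(label_dt$annotation)]),
    label_start_vec = as.integer(label_dt$firstRow),
    label_end_vec = as.integer(label_dt$lastRow))
}

# Toy inputs; the codes 10L/20L are placeholders for illustration only.
coverage_dt <- data.frame(count = c(53, 53, 124), weight = c(2, 8, 10))
label_dt <- data.frame(annotation = "peakStart", firstRow = 1L, lastRow = 2L)
args <- make_interface_args(coverage_dt, label_dt, penalty = 1400,
                            label_codes = c(noPeaks = 10L, peakStart = 20L))
str(args)
```

Since the list names match the argument names in the Usage section, the list could be passed with do.call() to FLOPART_interface(), assuming the package is installed and real codes from get_label_code() are used.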
H3K27ac ChIP-seq data from one Monocyte sample
Description
Raw coverage data and labels, provided for testing the FLOPART algorithm.
Usage
data("Mono27ac")
Format
List of 2 data.tables, coverage and labels.
Source
https://github.com/tdhock/feature-learning-benchmark
Smaller H3K27ac ChIP-seq data from one Monocyte sample
Description
Raw coverage data and labels, provided for testing the FLOPART algorithm.
Usage
data("Mono27ac.simple")
Format
List of two data tables, coverage and label.
Source
https://github.com/tdhock/feature-learning-benchmark
Look up the integer values used to represent different label types
Description
Look up the integer values used to represent different label types.
Usage
get_label_code()
Value
Integer vector with names corresponding to supported label types
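A named integer vector like this supports both encoding and decoding of label types. The sketch below uses hypothetical helpers encode_labels() and decode_labels() with placeholder codes (not the package's values; in practice they would come from get_label_code()), and fails loudly on label types that are not in the lookup.

```r
# Hypothetical helper: turn label names into integer codes, failing
# loudly on types missing from the lookup vector.
encode_labels <- function(annotation, label_codes) {
  annotation <- as.character(annotation)
  unknown <- setdiff(unique(annotation), names(label_codes))
  if (length(unknown) > 0) {
    stop("unsupported label type(s): ", paste(unknown, collapse = ", "))
  }
  unname(label_codes[annotation])
}

# Hypothetical helper: map integer codes back to label names.
decode_labels <- function(code, label_codes) {
  names(label_codes)[match(code, label_codes)]
}

# Placeholder codes for illustration; use get_label_code() for real ones.
codes <- c(noPeaks = 10L, peakStart = 20L, peakEnd = 30L)
x <- encode_labels(c("peakStart", "noPeaks", "peakEnd"), codes)
decode_labels(x, codes)
```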