Type:Package
Title:Smooth Regression - The Gamma Test and Tools
Version:0.1.0
Description:Finds causal connections in precision data, finds lags and embeddings in time series, guides training of neural networks and other smooth models, evaluates their performance, gives a mathematically grounded answer to the over-training problem. Smooth regression is based on the Gamma test, which measures smoothness in a multivariate relationship. Causal relations are smooth, noise is not. 'sr' includes the Gamma test and search techniques that use it. References: Evans & Jones (2002) <doi:10.1098/rspa.2002.1010>, AJ Jones (2004) <doi:10.1007/s10287-003-0006-1>.
License:GPL (≥ 3)
Encoding:UTF-8
Language:en-US
LazyData:true
RoxygenNote:7.2.3
Config/testthat/edition:3
VignetteBuilder:knitr
Depends:R (≥ 3.5.0)
Imports:ggplot2, dplyr, progress, RANN, stats, vdiffr
Suggests:knitr, magrittr, nnet, rmarkdown, testthat (≥ 3.0.0)
URL:https://smoothregression.com,https://github.com/haythorn/sr/
BugReports:https://github.com/haythorn/sr/issues
NeedsCompilation:no
Packaged:2023-03-09 20:56:16 UTC; wayne
Author:Wayne Haythorn [aut, cre], Antonia Jones [aut] (Principal creator of the Gamma test), Sam Kemp [ctb] (Wrote the original code for the Gamma test in R)
Maintainer:Wayne Haythorn <support@smoothregression.com>
Repository:CRAN
Date/Publication:2023-03-10 08:00:03 UTC

sr: Smooth Regression - The Gamma Test and Tools

Description

Finds causal connections in precision data, finds lags and embeddings in time series, guides training of neural networks and other smooth models, evaluates their performance, gives a mathematically grounded answer to the over-training problem. Smooth regression is based on the Gamma test, which measures smoothness in a multivariate relationship. Causal relations are smooth, noise is not. 'sr' includes the Gamma test and search techniques that use it. References: Evans & Jones (2002) <doi:10.1098/rspa.2002.1010>, AJ Jones (2004) <doi:10.1007/s10287-003-0006-1>.

Author(s)

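Maintainer: Wayne Haythorn <support@smoothregression.com>

Authors:

Antonia Jones (Principal creator of the Gamma test)

Other contributors:

Sam Kemp (Wrote the original code for the Gamma test in R) [contributor]

See Also

Useful links:

https://smoothregression.com
https://github.com/haythorn/sr/
Report bugs at https://github.com/haythorn/sr/issues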


Full Embedding Search

Description

Calculates Gamma for all combinations of a set of input predictors

Usage

fe_search(predictors, target, prog_bar = TRUE, n_neighbors = 10, eps = 0)

Arguments

predictors

A vector or matrix whose columns are proposed inputs to a predictive function

target

A vector of double, the output variable that is to be predicted

prog_bar

Logical, set this to FALSE if you don't want the progress bar displayed

n_neighbors

Integer number of near neighbors to use in RANN search, passed to gamma_test

eps

The error limit for the approximate near neighbor search. This will be passed to gamma_test, which will pass it on to the ANN near neighbor search. Setting this greater than zero can significantly reduce search time for large data sets.

Details

Given a set of predictors and a target that is to be predicted, this search will run the gamma test on every combination of the inputs. It returns the results in order of increasing gamma, so the best combinations of inputs for prediction will be at the beginning of the list. As this is a fully combinatoric search, it will start to get slow beyond about 16 inputs. By default, fe_search will display a progress bar showing the time to completion.
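To see why a fully combinatoric search slows down, it helps to count the subsets involved. The base R sketch below assumes that every non-empty subset of n inputs is tested, which gives 2^n - 1 combinations. The helper name n_combinations is hypothetical and is not part of 'sr'.

```r
# Hypothetical helper (not part of 'sr'): number of non-empty subsets of
# n inputs, assuming the search tries every non-empty combination.
n_combinations <- function(n) 2^n - 1

# Tabulate the growth for a few input counts
growth <- data.frame(inputs = c(6, 12, 16, 20),
                     combinations = n_combinations(c(6, 12, 16, 20)))
print(growth)
```

Each extra input doubles the number of Gamma tests, which is consistent with the slowdown described above once the input count passes the mid-teens.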

fe_search() returns a data.frame with two columns: Gamma, a sorted vector of Gamma values, and mask, an integer column containing the masks representing the inputs used to calculate each Gamma. To reconstruct the predictor set for a Gamma, use its mask with int_to_intMask and select_by_mask as shown in their examples.

Value

An invisible data frame with two columns: mask, an integer mask representing a subset of the predictors, and Gamma, the value of Gamma using those predictors. The rows are sorted from lowest to highest Gamma. The return value also has an attribute named target_v, the target variance. To get the vratio (estimated fraction of target variance due to noise), divide any of the Gammas by target_v.

Examples

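# Column 1 of the embedding is the target; columns 2 to 7 are the lagged inputs
e6 <- embed(mgls, 7)
t <- e6[ ,1]
p <- e6[ ,2:7]
full_search <- fe_search(predictors = p, target = t)
# Add the vratio by dividing each Gamma by the target variance attribute
full_search <- dplyr::mutate(full_search,
                             vratio = Gamma / attr(full_search, "target_v"))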

Plot Histogram of Gammas

Description

Produces a histogram showing the distribution in a population of Gamma values, used to examine the result of a full embedding search. Pass the result of fe_search() to this function to look for structure in the predictors. For example, if this histogram is bimodal, there is probably one input variable which is absolutely required for a good predictive function, so the histogram divides into the subsets that contain that variable and the subsets that don't.

Usage

gamma_histogram(fe_results, bins = 100, caption = "")

Arguments

fe_results

The result of fe_search or full_embedding_search. A matrix containing a column labeled Gamma, of Numeric Gamma values. It also contains an integer column of masks, but that is not used by this function.

bins

Numeric, number of bins in the histogram

caption

Character string caption for the plot

Value

A ggplot object, a histogram showing the distribution of Gamma values in the full embedding search output

Examples

e6 <- embed(mgls, 7)
t <- e6[ ,1]
p <- e6[ ,2:7]
full_search <- fe_search(predictors = p, target = t)
gamma_histogram(full_search, caption = "my data")

Estimate Smoothness in an Input/output Dataset

Description

The gamma test measures mean squared error in an input/output data set, relative to an arbitrary, unknown smooth function. This can usually be interpreted as testing for the existence of a causal relationship, and estimating the expected error of the best smooth model that could be built on that relationship.

Usage

gamma_test(
  predictors,
  target,
  n_neighbors = 10,
  eps = 0,
  plot = FALSE,
  caption = "",
  verbose = FALSE
)

Arguments

predictors

A Numeric vector or matrix whose columns are proposed inputs to a predictive function.

target

A Numeric vector, the output variable that is to be predicted

n_neighbors

An Integer, the number of near neighbors to use in calculating gamma

eps

The error term passed to the approximate near neighbor search. The default value of zero means that exact near neighbors will be found, but time will be O(M^2), whereas an approximate search can run in O(M*log(M)).

plot

A Logical variable, whether to plot the delta/gamma graph.

caption

A character string which will be the caption for the plot if plot = TRUE

verbose

A Logical variable, whether to return details of the computation

Value

If verbose == FALSE, a list containing Gamma and the vratio. If verbose == TRUE, that list plus the distances from each point to its near neighbors, the average of squared distances, and the value returned by lm on the delta and gamma averages. Gamma is Coefficient 1 of lm (the intercept).
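The computation behind these return values can be sketched in base R. This is a minimal brute-force illustration of the Gamma statistic as described in the references (near-neighbor delta and gamma averages, then a linear fit whose intercept is Gamma). It is not the package's implementation: gamma_sketch is a hypothetical name, it uses dist() instead of RANN, and the test data are synthetic.

```r
# Hypothetical brute-force sketch of the Gamma statistic (not the 'sr' code).
gamma_sketch <- function(predictors, target, n_neighbors = 10) {
  x <- as.matrix(predictors)
  d2 <- as.matrix(dist(x))^2       # squared Euclidean distances, M x M
  diag(d2) <- Inf                  # a point is not its own neighbor
  # Row i of nn holds the indices of the nearest neighbors of point i
  nn <- t(apply(d2, 1, order))[, seq_len(n_neighbors), drop = FALSE]
  delta <- numeric(n_neighbors)
  gam <- numeric(n_neighbors)
  for (k in seq_len(n_neighbors)) {
    idx <- nn[, k]
    # Mean squared input distance to the k-th nearest neighbor
    delta[k] <- mean(d2[cbind(seq_len(nrow(x)), idx)])
    # Half the mean squared difference of the corresponding outputs
    gam[k] <- mean((target[idx] - target)^2) / 2
  }
  fit <- lm(gam ~ delta)
  Gamma <- unname(coef(fit)[1])    # intercept: estimated noise variance
  list(Gamma = Gamma, vratio = Gamma / var(target))
}

# Synthetic data: a smooth function plus noise with variance 0.01
set.seed(1)
x <- runif(500, 0, 2 * pi)
y <- sin(x) + rnorm(500, sd = 0.1)
res <- gamma_sketch(x, y)
print(res)
```

On data like this the estimate should land near the injected noise variance, and the vratio should be small because most of the target variance comes from the smooth function.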

References

https://royalsocietypublishing.org/doi/10.1098/rspa.2002.1010
https://link.springer.com/article/10.1007/s10287-003-0006-1
https://smoothregression.com

Examples

he <- embed(henon_x, 3)
t <- he[ , 1]
p <- he[ ,2:3]
gamma_test(predictors = p, target = t)

Discover how Gamma varies with sample size

Description

Investigates the effect of sample size by calculating Gamma on larger and larger samples. Gamma will converge on the true noise in the relationship as sampling density on the function increases. get_Mlist produces a list showing M values (sample sizes) and the associated Gammas and vratios. It produces a graph by default, and also returns an invisible data.frame. The successive samples are taken starting at the beginning of the inputs. There is no option to sort the input data; if you want the data to be randomized, do that before calling get_Mlist. The graph will become stable when the sample size is large enough. If the M list does not become stable, there is not enough data for either the Gamma test or a successful smooth model.
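Because the samples are taken from the start of the data, any randomization has to happen beforehand. The base R sketch below shows one way to shuffle predictors and target together so that each row stays paired with its target value. The sine series is a synthetic stand-in, not package data, and the variable names are illustrative.

```r
set.seed(42)                             # make the shuffle reproducible
x <- sin(seq(0, 20, length.out = 200))   # stand-in series, not package data
e <- embed(x, 3)
target <- e[, 1]
predictors <- e[, 2:3]

# Draw one permutation of the row indices and apply it to both objects,
# so every predictor row keeps its own target value
idx <- sample(nrow(predictors))
predictors_shuffled <- predictors[idx, , drop = FALSE]
target_shuffled <- target[idx]
```

The shuffled objects can then be passed to get_Mlist in place of the originals.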

Usage

get_Mlist(
  predictors,
  target,
  plot = TRUE,
  caption = "",
  show = "Gamma",
  from = 20,
  to = length(target),
  by = 20
)

Arguments

predictors

A Numeric vector or matrix whose columns are proposed inputs to a predictive relationship

target

A Numeric vector, the output variable that is to be predicted

plot

A logical, set this to FALSE if you don't want the plot

caption

Character string to be used as caption for the plot

show

Character string, if it equals "vratio", vratios will be plotted, otherwise Gamma is plotted

from

Integer length of the first data sample, as passed to seq

to

Integer maximum length of sample to test, passed to seq

by

Integer increment in lengths of successive windows, passed to seq

Value

An invisible data frame with three columns: M (a sample size), Gamma and the associated vratio. This is ordered by increasing M.

Examples

he <- embed(henon_x, 13)
t <- he[ , 1]
p <- he[ ,2:13]
get_Mlist(p, t, by = 2, caption = "this data")

Henon Map

Description

1000 x data points from the Henon Map

Usage

henon_x

Format

An object of class numeric of length 1000.

References

See Wikipedia entry on "Henon map"

Examples

henon_embedded <- embed(as.matrix(henon_x), 3)
targets <- henon_embedded[ ,1]
predictors <- henon_embedded[ ,2:3]
gamma_test(predictors, targets)

Increasing Embedding Search

Description

Adds variables one at a time to the input set, to see how many are needed for prediction.

Usage

increasing_search(
  predictors,
  target,
  plot = TRUE,
  caption = "",
  show = "Gamma"
)

Arguments

predictors

A vector or matrix whose columns are proposed inputs to a predictive function

target

A vector of double, the output variable that is to be predicted

plot

Logical, set plot = FALSE if you don't want the plot

caption

Character string to identify plot, for example, data being plotted

show

Character string, if it equals "vratio", vratios will be plotted, otherwise Gamma is plotted

Details

An increasing embedding search is appropriate when the input variables are ordered, most commonly in analyzing time series, when it's useful to know how many previous time steps or lags should be examined to build a model. Starting with lag 1, the search adds previous values one at a time, and saves the resulting gammas. These results can be examined using plot_increasing_search().
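The examples in this manual build their lagged inputs with embed() from the stats package. A tiny base R illustration of how that layout maps onto "target" and "lags" may help; the series 1:6 is made up for readability.

```r
x <- 1:6
e <- embed(x, 3)    # each row holds: current value, lag 1, lag 2
print(e)

target <- e[, 1]    # the value to be predicted
lags <- e[, 2:3]    # column 1 of lags is lag 1, column 2 is lag 2
```

Because the lags are ordered from most recent to oldest, taking columns 1:Depth of the predictor matrix corresponds to using the first Depth lags.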

Value

An invisible data frame with three columns: Depth of search, from 1 to ncol(predictors), Gamma calculated using columns 1:Depth as predictors, and vratio corresponding to that Gamma (Gamma / var(target))

Examples

he <- embed(henon_x, 13)
t <- he[ , 1]
p <- he[ ,2:13]
increasing_search(p, t, caption = "henon data embedded 13")
df <- increasing_search(predictors=p, target=t, plot = FALSE)

Integer to Vector Bitmask

Description

Converts the bit representation of an integer into a vector of integers

Usage

int_to_intMask(i, length)

Arguments

i

A 32 bit integer

length

Integer length of the bitmask to produce, must be <= 32

Details

Converts an integer to a vector of ones and zeroes. Used as a helper function for full_embedding_search, it allows more compact storage of bit masks. The result reads left to right, so the one bit will have index of one in the vector corresponding to lag 1 in an embedding. Works for masks up to 32 bits.
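The conversion described here can be approximated in base R with intToBits(), which returns the 32 bits of an integer with the least significant bit first. The function below is a hypothetical stand-in written for illustration, not the package's own code.

```r
# Hypothetical stand-in for illustration (not the 'sr' implementation):
# decode the lowest `length` bits of i, least significant bit first.
int_to_bits_sketch <- function(i, length) {
  stopifnot(length >= 1, length <= 32)
  as.integer(intToBits(i))[seq_len(length)]
}

# 7 is binary 111, so the first three positions are set
mask <- int_to_bits_sketch(7L, 6L)
print(mask)
```

Reading the result left to right, position 1 corresponds to the one bit, which matches the lag 1 convention described above.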

Value

A vector of integer containing 1 or 0

Examples

he <- embed(henon_x, 17)
t <- he[ , 1]
p <- he[ ,2:17]
mask <- int_to_intMask(7, 16)     # pick out the first three columns
pn <- select_by_mask(p, mask)
gamma_test(predictors = pn, target = t)

Mask Histogram

Description

Display a histogram of mask bits.

Usage

mask_histogram(fe_result, dimension, tick_step = 2, caption = "")

Arguments

fe_result

Output data frame from fe_search. Normally you would filter this by, for example, selecting the top 100 results from that output. If the whole fe_search result was passed in, all of the mask bits would have the same frequency and the histogram would be flat.

dimension

Integer number of effective columns in a mask, ncol of the predictors given to the search

tick_step

Integer, where to put ticks on the x axis

caption

A character string you can use to identify this graph

Details

After a full embedding search, it is sometimes useful to see which bits appear in a subset of the masks, for example, the masks with the lowest Gamma values. Filtering of the search results should be done before calling this function, which uses whatever it is given. The histogram can show which predictors are generally useful. For selecting an effective mask it isn't as useful as you might think: it doesn't show interactions between predictors, so for mask selection it would only work for linear combinations of inputs.
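The bit counting behind such a histogram, and the reason an unfiltered result looks flat, can be sketched in base R. The sketch assumes a full search covers every mask from 1 to 2^n - 1; decode_mask and the four "top" masks are hypothetical and chosen only for illustration.

```r
# Hypothetical decoder: lowest n bits of mask m, least significant first
decode_mask <- function(m, n) as.integer(intToBits(m))[seq_len(n)]

n <- 6
all_masks <- 1:(2^n - 1)    # assumed: every non-empty subset of 6 inputs

# One row per mask, one column per predictor bit
all_bits <- t(sapply(all_masks, decode_mask, n = n))
flat_counts <- colSums(all_bits)
print(flat_counts)          # every bit occurs equally often

# A filtered subset (made-up masks) gives an informative profile instead
top_masks <- c(3, 7, 11, 15)
top_counts <- colSums(t(sapply(top_masks, decode_mask, n = n)))
print(top_counts)
```

In the unfiltered case each bit appears in exactly half of the 64 possible bit patterns, so the counts carry no information; only a filtered subset can favor particular predictors.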

Value

A ggplot object, a histogram showing the mask bits used in the fe_search results that are passed to it

Examples

e6 <- embed(mgls, 7)
t <- e6[ ,1]
p <- e6[ ,2:7]
full_search <- fe_search(predictors = p, target = t)
# Rows are sorted by Gamma, so head() gives the best masks and tail() the worst
goodies <- head(full_search, 20)
mask_histogram(goodies, 6, caption = "mask bits in top 20 Gammas")
baddies <- tail(full_search, 20)
mask_histogram(baddies, 6, caption = "bits appearing in 20 worst Gammas")

Mackey-Glass time delayed differential equation

Description

4999 data points

Usage

mgls

Format

An object of class numeric of length 4999.

References

See Wikipedia entry on "Mackey-Glass equations"

Examples

mgls_embedded <- embed(as.matrix(mgls), 25)
targets <- mgls_embedded[ ,1]
predictors <- mgls_embedded[ ,2:25]

Moving Window Search

Description

Calculate Gamma values for a window moving through the data.

Usage

moving_window_search(
  predictors,
  target,
  window_size = 40,
  by = 1,
  plot = TRUE,
  caption = "",
  show = "Gamma"
)

Arguments

predictors

A Numeric vector or matrix whose columns are proposed inputs to a predictive function

target

A Numeric vector, the output variable that is to be predicted

window_size

Integer width of the window that will move through the data

by

The increment between successive window starts

plot

Logical, set this to FALSE if you don't want the plot

caption

Character string, caption for plot

show

Character string, if it equals "vratio", vratios will be plotted, otherwise Gamma is plotted

Details

This is used for data sets that are ordered on one or more dimensions, such as time series or spatial data. The search slides a window across the data set, calculating gamma for the data at each step. A change in causal dynamics will appear as a spike in gamma when the causal discontinuity is in the window.
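The sliding itself amounts to generating a sequence of start and end positions. The base R sketch below shows one plausible layout for a data set of 100 rows; how the package handles a final partial window is not stated in this manual, so the sketch simply assumes that only full windows are used.

```r
M <- 100            # number of rows in the data (illustrative)
window_size <- 40
by <- 5

# Assumption: only full windows are kept, so the last start is M - window_size + 1
starts <- seq(1, M - window_size + 1, by = by)
ends <- starts + window_size - 1
windows <- data.frame(start = starts, end = ends)
print(windows)
```

A Gamma test would then be run on the rows start:end of each window, and the resulting Gammas plotted against window position to look for spikes.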

Value

An invisible data frame containing starting and ending positions of each window with its associated gamma

Examples

he <- embed(henon_x, 13)
t <- he[ , 1]
p <- he[ ,2:13]
moving_window_search(p, t, by = 5, caption = "my data")

Select by Mask

Description

Select columns from a matrix using an integer bitmap

Usage

select_by_mask(data, intMask)

Arguments

data

A numeric matrix in tidy form

intMask

An Integer vector whose length equals number of columns in data

Details

Selects columns from a matrix. A column is included in the output when the corresponding mask value is 1.
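The selection rule is equivalent to logical column indexing in base R. The function below is a hypothetical stand-in that mirrors the described behavior on a small made-up matrix; it is not the package's code.

```r
# Hypothetical stand-in (not the 'sr' implementation): keep the columns
# of `data` whose mask entry is 1.
select_sketch <- function(data, intMask) {
  stopifnot(length(intMask) == ncol(data))
  data[, intMask == 1, drop = FALSE]   # drop = FALSE keeps a matrix result
}

m <- matrix(1:12, nrow = 3)            # 3 rows, 4 columns
msk <- c(1L, 0L, 1L, 0L)               # keep columns 1 and 3
sel <- select_sketch(m, msk)
print(sel)
```

Using drop = FALSE matters when the mask selects a single column, since the result then remains a one-column matrix rather than collapsing to a vector.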

Value

A matrix containing the columns of data for which intMask is 1

Examples

e12 <- embed(mgls, 13)
tn <- e12[ , 1]
pn <- e12[ ,2:13]
msk <- integer(12)
msk[c(1,2,3,4,6,7,9)] <- 1  # select these columns
p <- select_by_mask(pn, msk)
gamma_test(predictors = p, target = tn)
msk <- int_to_intMask(15, 12)     # pick out the first four columns
p <- select_by_mask(pn, msk)
gamma_test(predictors = p, target = tn)
