nuggets: Get Started

Introduction

Package nuggets searches for patterns that can be expressed as formulae in the form of elementary conjunctions, referred to in this text as conditions. Conditions are constructed from predicates, which correspond to data columns. The interpretation of conditions depends on the choice of underlying logic: in crisp (Boolean) logic, every predicate is either satisfied or not in a given row, whereas in fuzzy logic, predicates are satisfied to a degree between 0 and 1.

Before applying nuggets, data columns intended as predicates must be prepared either by dichotomization (conversion into dummy logical variables) or by transformation into fuzzy sets. The package provides functions for both transformations. See the section Data Preparation below for a quick overview, or the Data Preparation vignette for a comprehensive guide.

nuggets implements functions to search for pre-defined types of patterns or to discover patterns of a user-defined type. For example, the package provides dig_associations() for association rules, dig_correlations() for conditional correlations, and dig_baseline_contrasts(), dig_complement_contrasts(), and dig_paired_baseline_contrasts() for various types of contrast patterns.

To provide custom evaluation functions for conditions and to search for user-defined types of patterns, the package offers two general functions: dig() and dig_grid().

See the section Pre-defined Patterns below for examples and details on using the pre-defined pattern discovery functions, and the section Advanced Use for examples of custom pattern discovery.

Discovered rules and patterns can be post-processed, visualized, and explored interactively. That part is covered in the section Post-processing and Visualization below.

Data Preparation

Before applying nuggets, data columns intended as predicates must be prepared either by dichotomization (conversion into dummy variables) or by transformation into fuzzy sets. The package provides the partition() function for both transformations.

This section gives a quick overview of data preparation with nuggets. For a detailed guide, including information about all available functions and advanced techniques, please see the Data Preparation Vignette.

Crisp (Boolean) Predicates Example

For crisp patterns, numeric columns are transformed to logical (TRUE/FALSE) columns. To show the process, we start with the built-in mtcars dataset, which we first slightly modify by converting the cyl column to a factor:

library(nuggets)
library(dplyr)

# For demonstration, convert 'cyl' column of the mtcars dataset to a factor
mtcars <- mtcars |>
  mutate(cyl = factor(cyl,
                      levels = c(4, 6, 8),
                      labels = c("four", "six", "eight")))
head(mtcars, n = 3)
#>                mpg  cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4     21.0  six  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag 21.0  six  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8 four  108  93 3.85 2.320 18.61  1  1    4    1

Now we can use the partition() function to transform all columns into crisp predicates:

# Transform the whole dataset to crisp predicates
crisp_mtcars <- mtcars |>
  partition(cyl, vs:gear, .method = "dummy") |>
  partition(mpg, .method = "crisp", .breaks = c(-Inf, 15, 20, 30, Inf)) |>
  partition(disp:carb, .method = "crisp", .breaks = 3)
head(crisp_mtcars, n = 3)
#> # A tibble: 3 × 32
#>   `cyl=four` `cyl=six` `cyl=eight` `vs=0` `vs=1` `am=0` `am=1` `gear=3` `gear=4`
#>   <lgl>      <lgl>     <lgl>       <lgl>  <lgl>  <lgl>  <lgl>  <lgl>    <lgl>
#> 1 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE
#> 2 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE
#> 3 TRUE       FALSE     FALSE       FALSE  TRUE   FALSE  TRUE   FALSE    TRUE
#>   `gear=5` `mpg=(-Inf;15]` `mpg=(15;20]` `mpg=(20;30]` `mpg=(30;Inf]`
#>   <lgl>    <lgl>           <lgl>         <lgl>         <lgl>
#> 1 FALSE    FALSE           FALSE         TRUE          FALSE
#> 2 FALSE    FALSE           FALSE         TRUE          FALSE
#> 3 FALSE    FALSE           FALSE         TRUE          FALSE
#>   `disp=(-Inf;205]` `disp=(205;338]` `disp=(338;Inf]` `hp=(-Inf;146]`
#>   <lgl>             <lgl>            <lgl>            <lgl>
#> 1 TRUE              FALSE            FALSE            TRUE
#> 2 TRUE              FALSE            FALSE            TRUE
#> 3 TRUE              FALSE            FALSE            TRUE
#>   `hp=(146;241]` `hp=(241;Inf]` `drat=(-Inf;3.48]` `drat=(3.48;4.21]`
#>   <lgl>          <lgl>          <lgl>              <lgl>
#> 1 FALSE          FALSE          FALSE              TRUE
#> 2 FALSE          FALSE          FALSE              TRUE
#> 3 FALSE          FALSE          FALSE              TRUE
#>   `drat=(4.21;Inf]` `wt=(-Inf;2.82]` `wt=(2.82;4.12]` `wt=(4.12;Inf]`
#>   <lgl>             <lgl>            <lgl>            <lgl>
#> 1 FALSE             TRUE             FALSE            FALSE
#> 2 FALSE             FALSE            TRUE             FALSE
#> 3 FALSE             TRUE             FALSE            FALSE
#>   `qsec=(-Inf;17.3]` `qsec=(17.3;20.1]` `qsec=(20.1;Inf]` `carb=(-Inf;3.33]`
#>   <lgl>              <lgl>              <lgl>             <lgl>
#> 1 TRUE               FALSE              FALSE             FALSE
#> 2 TRUE               FALSE              FALSE             FALSE
#> 3 FALSE              TRUE               FALSE             TRUE
#>   `carb=(3.33;5.67]` `carb=(5.67;Inf]`
#>   <lgl>              <lgl>
#> 1 TRUE               FALSE
#> 2 TRUE               FALSE
#> 3 FALSE              FALSE

As seen above, the "dummy" method can be used to create logical columns for each category of processed variables. Here, it was applied to create dummy variables for the factor variable cyl as well as for the numeric variables vs, am, and gear.

The method "crisp" creates logical columns representing intervals for numeric variables. In the example, it was used to create intervals for mpg based on specified breakpoints (-Inf, 15, 20, 30, Inf), and for disp, hp, drat, wt, qsec, and carb using equal-width intervals (3 intervals each).
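To see where those equal-width cut points come from, the following small base R check (not a nuggets function) recomputes them for disp. The two interior points of an equal-width split of the observed range appear to correspond to the 205 and 338 in the column names above, with the outermost bounds widened to -Inf and Inf:

# Equal-width cut points for 3 intervals of 'disp'; compare with the column
# names `disp=(-Inf;205]`, `disp=(205;338]`, `disp=(338;Inf]` shown above
seq(min(mtcars$disp), max(mtcars$disp), length.out = 4)
# approximately 71.1, 204.7, 338.4, 472.0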

Now all columns are logical and can be used as predicates in crisp conditions.

Fuzzy Predicates Example

Fuzzy predicates express the degree to which a condition is satisfied, with values in the interval \([0,1]\). This allows modeling of smooth transitions between categories:

# Start with fresh mtcars and transform to fuzzy predicates
fuzzy_mtcars <- mtcars |>
  partition(cyl, vs:gear, .method = "dummy") |>
  partition(mpg, .method = "triangle", .breaks = c(-Inf, 15, 20, 30, Inf)) |>
  partition(disp:carb, .method = "triangle", .breaks = 3)
head(fuzzy_mtcars, n = 3)
#> # A tibble: 3 × 31
#>   `cyl=four` `cyl=six` `cyl=eight` `vs=0` `vs=1` `am=0` `am=1` `gear=3` `gear=4`
#>   <lgl>      <lgl>     <lgl>       <lgl>  <lgl>  <lgl>  <lgl>  <lgl>    <lgl>
#> 1 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE
#> 2 FALSE      TRUE      FALSE       TRUE   FALSE  FALSE  TRUE   FALSE    TRUE
#> 3 TRUE       FALSE     FALSE       FALSE  TRUE   FALSE  TRUE   FALSE    TRUE
#>   `gear=5` `mpg=(-Inf;15;20)` `mpg=(15;20;30)` `mpg=(20;30;Inf)`
#>   <lgl>                 <dbl>            <dbl>             <dbl>
#> 1 FALSE                     0             0.9               0.1
#> 2 FALSE                     0             0.9               0.1
#> 3 FALSE                     0             0.72              0.28
#>   `disp=(-Inf;71.1;272)` `disp=(71.1;272;472)` `disp=(272;472;Inf)`
#>                    <dbl>                 <dbl>                <dbl>
#> 1                  0.557                 0.443                    0
#> 2                  0.557                 0.443                    0
#> 3                  0.816                 0.184                    0
#>   `hp=(-Inf;52;194)` `hp=(52;194;335)` `hp=(194;335;Inf)`
#>                <dbl>             <dbl>              <dbl>
#> 1              0.592             0.408                  0
#> 2              0.592             0.408                  0
#> 3              0.711             0.289                  0
#>   `drat=(-Inf;2.76;3.84)` `drat=(2.76;3.84;4.93)` `drat=(3.84;4.93;Inf)`
#>                     <dbl>                   <dbl>                  <dbl>
#> 1                       0                   0.945                0.0550
#> 2                       0                   0.945                0.0550
#> 3                       0                   0.991                0.00917
#>   `wt=(-Inf;1.51;3.47)` `wt=(1.51;3.47;5.42)` `wt=(3.47;5.42;Inf)`
#>                   <dbl>                 <dbl>                <dbl>
#> 1                 0.434                 0.566                    0
#> 2                 0.304                 0.696                    0
#> 3                 0.587                 0.413                    0
#>   `qsec=(-Inf;14.5;18.7)` `qsec=(14.5;18.7;22.9)` `qsec=(18.7;22.9;Inf)`
#>                     <dbl>                   <dbl>                  <dbl>
#> 1                  0.533                    0.467                      0
#> 2                  0.4                      0.6                        0
#> 3                  0.0214                   0.979                      0
#>   `carb=(-Inf;1;4.5)` `carb=(1;4.5;8)` `carb=(4.5;8;Inf)`
#>                 <dbl>            <dbl>              <dbl>
#> 1               0.143            0.857                  0
#> 2               0.143            0.857                  0
#> 3               1                0                      0

Similar to the crisp example, the "dummy" method creates logical columns for categorical variables (cyl, vs, am, gear).

The "triangle" method creates fuzzy predicates with triangular membership functions. For mpg, it uses specified breakpoints to define fuzzy intervals. For the remaining numeric variables (disp through carb), it automatically creates 3 overlapping fuzzy sets with smooth transitions between intervals.
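To illustrate how these membership degrees arise, here is a minimal sketch of a triangular membership function. It is not the package's internal implementation; infinite endpoints are treated as open shoulders, which is consistent with the output above. Applied to the first car (Mazda RX4, mpg = 21), it reproduces the degrees 0.9 and 0.1 shown in the first row:

# Minimal sketch of a triangular membership over breakpoints (lo, peak, hi);
# an infinite `lo` or `hi` is treated as an open shoulder (degree 1 on that side)
triangle <- function(x, lo, peak, hi) {
  up   <- if (is.finite(lo)) (x - lo) / (peak - lo) else 1   # rising edge
  down <- if (is.finite(hi)) (hi - x) / (hi - peak) else 1   # falling edge
  pmax(0, pmin(up, down, 1))
}

triangle(21, 15, 20, 30)    # degree in `mpg=(15;20;30)`  -> 0.9
triangle(21, 20, 30, Inf)   # degree in `mpg=(20;30;Inf)` -> 0.1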

Note that the cyl, vs, am, and gear columns are still represented by dummy logical columns, while the numeric columns are now represented by fuzzy sets. This combination allows both crisp and fuzzy predicates to be used together in pattern discovery.

Advanced Data Preparation Capabilities

The nuggets package provides powerful and flexible data preparation tools. The Data Preparation vignette covers these capabilities in depth.

For example, you can use quantile-based partitioning to ensure balanced predicates, or use raised-cosine fuzzy sets with custom labels to create meaningful linguistic terms like “very_low”, “low”, “medium”, “high”, and “very_high”. These preparation choices significantly impact the interpretability and usefulness of patterns discovered in subsequent analyses.
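As a rough illustration of the quantile-based idea (a sketch only, with the illustrative names mpg_breaks and quantile_mtcars; the Data Preparation vignette documents the built-in options, which are not reproduced here), one can derive breakpoints from the quartiles of a column and pass them to partition():

# Derive breakpoints from the quartiles of 'mpg' so that the resulting fuzzy
# sets are roughly balanced; .method = "triangle" is used as in the examples above
mpg_breaks <- unname(quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75)))

quantile_mtcars <- mtcars |>
  partition(mpg, .method = "triangle",
            .breaks = c(-Inf, mpg_breaks, Inf))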

Pre-defined Patterns

The package nuggets provides a set of functions for discovering some of the best-known pattern types. These functions can process Boolean data, fuzzy data, or both. Each function returns a tibble, where every row represents one detected pattern.

Note: This section assumes that the data have already been preprocessed — i.e., transformed into a binarized or fuzzified form. See the previous section Data Preparation for details on how to prepare your dataset (for example, crisp_mtcars and fuzzy_mtcars).

For more advanced workflows — such as defining custom pattern types or computing user-defined measures — see the section Advanced Use.

Search for Association Rules

Association rules identify conditions (antecedents) under which a specific feature (consequent) is present very often.

\[A \Rightarrow C\]

If condition A is satisfied, then the feature C tends to be present.

For example,

university_edu & middle_age & IT_industry => high_income

can be read as:

People in middle age with university education working in the IT industry are very likely to have a high income.

In practice, the antecedent A is a set of predicates, and the consequent C is usually a single predicate.

For a set of predicates \(I\), let \(\text{supp}(I)\) denote the support — the relative frequency (for logical data) or the mean truth degree (for fuzzy data) of rows satisfying all predicates in \(I\). Using this notation, the following properties and quality measures of a rule \(A \Rightarrow C\) may be defined: the coverage \(\text{supp}(A)\), the support \(\text{supp}(A \cup C)\), the confidence \(\text{supp}(A \cup C) / \text{supp}(A)\), and the lift, i.e. the confidence divided by \(\text{supp}(C)\).

Rules with high support are frequent in the data. Rules with high confidence indicate a strong association between antecedent and consequent. Rules with high lift suggest that the validity of the antecedent increases the likelihood of the consequent occurring.

Before searching for rules, it is recommended to create a vector of disjoints, which specifies predicates that must not appear together in the same condition. This vector should have the same length as the number of dataset columns.

For example, columns representing gear=3 and gear=4 are mutually exclusive, so their shared group label in disj prevents meaningless conditions like gear=3 & gear=4. You can conveniently generate this vector with var_names():

disj <- var_names(colnames(fuzzy_mtcars))
print(disj)
#>  [1] "cyl"  "cyl"  "cyl"  "vs"   "vs"   "am"   "am"   "gear" "gear" "gear"
#> [11] "mpg"  "mpg"  "mpg"  "disp" "disp" "disp" "hp"   "hp"   "hp"   "drat"
#> [21] "drat" "drat" "wt"   "wt"   "wt"   "qsec" "qsec" "qsec" "carb" "carb"
#> [31] "carb"

The dig_associations() function searches for association rules. Its main arguments are the dataset, tidyselect expressions selecting the antecedent and consequent columns, the disjoint vector, and quality thresholds such as min_support and min_confidence.

In the following example, we search for fuzzy association rules in the dataset fuzzy_mtcars, such that the antecedent may combine any predicates except those derived from am, the consequent is one of the am predicates, the minimum support is 0.02, the minimum confidence is 0.8, and a contingency table is computed for each rule:

result <- dig_associations(fuzzy_mtcars,
                           antecedent = !starts_with("am"),
                           consequent = starts_with("am"),
                           disjoint = disj,
                           min_support = 0.02,
                           min_confidence = 0.8,
                           contingency_table = TRUE)

The result is a tibble containing the discovered rules and their quality metrics. You can arrange them, for example, by decreasing support:

result <- arrange(result, desc(support))
print(result)
#> # A tibble: 526 × 13
#>    antecedent                     consequent support confidence coverage
#>    <chr>                          <chr>        <dbl>      <dbl>    <dbl>
#>  1 {gear=3}                       {am=0}       0.469      1        0.469
#>  2 {gear=3,vs=0}                  {am=0}       0.375      1        0.375
#>  3 {cyl=eight,gear=3,vs=0}        {am=0}       0.375      1        0.375
#>  4 {cyl=eight,vs=0}               {am=0}       0.375      0.857    0.438
#>  5 {cyl=eight,gear=3}             {am=0}       0.375      1        0.375
#>  6 {cyl=eight}                    {am=0}       0.375      0.857    0.438
#>  7 {mpg=(-Inf;15;20)}             {am=0}       0.327      0.847    0.387
#>  8 {drat=(-Inf;2.76;3.84)}        {am=0}       0.311      0.948    0.328
#>  9 {gear=3,mpg=(-Inf;15;20)}      {am=0}       0.309      1        0.309
#> 10 {drat=(-Inf;2.76;3.84),gear=3} {am=0}       0.307      1        0.307
#>    conseq_support  lift count antecedent_length    pp    pn    np    nn
#>             <dbl> <dbl> <dbl>             <int> <dbl> <dbl> <dbl> <dbl>
#>  1          0.594  1.68 15                    1 15    0      4     13
#>  2          0.594  1.68 12                    2 12    0      7     13
#>  3          0.594  1.68 12                    3 12    0      7     13
#>  4          0.594  1.44 12                    2 12    2      7     11
#>  5          0.594  1.68 12                    2 12    0      7     13
#>  6          0.594  1.44 12                    1 12    2      7     11
#>  7          0.594  1.43 10.5                  1 10.5  1.90   8.52  11.1
#>  8          0.594  1.60  9.96                 1  9.96 0.546  9.04  12.5
#>  9          0.594  1.68  9.88                 2  9.88 0      9.12  13.0
#> 10          0.594  1.68  9.82                 2  9.82 0      9.18  13
#> # ℹ 516 more rows

This example illustrates the typical workflow for mining association rules with nuggets. The same structure and arguments apply when analyzing either fuzzy or Boolean datasets.
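As a quick sanity check (not part of the package API), the measures of the top rule above, {gear=3} => {am=0}, can be recomputed directly from the prepared data. Both columns are logical dummies here, so the fuzzy conjunction reduces to an ordinary &:

supp_A  <- mean(fuzzy_mtcars$`gear=3`)                        # coverage, about 0.469
supp_AC <- mean(fuzzy_mtcars$`gear=3` & fuzzy_mtcars$`am=0`)  # support,  about 0.469
supp_C  <- mean(fuzzy_mtcars$`am=0`)                          # conseq_support, about 0.594

supp_AC / supp_A              # confidence = 1
(supp_AC / supp_A) / supp_C   # lift, about 1.68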

Conditional Correlations

Conditional correlations identify strong relationships between pairs of numeric variables under specific conditions.

The dig_correlations() function searches for pairs of variables that are significantly correlated within sub-data satisfying generated conditions. This is useful for discovering context-dependent relationships.

In the following example, we search for correlations between different numeric variables in the original mtcars data under conditions defined by the prepared predicates in crisp_mtcars:

# Prepare combined dataset with both condition predicates and numeric variables
combined_mtcars <- cbind(crisp_mtcars, mtcars[, c("mpg", "disp", "hp", "wt")])

# Extend disjoint vector for the new numeric columns
disj_combined <- c(var_names(colnames(crisp_mtcars)),
                   c("mpg", "disp", "hp", "wt"))

# Search for conditional correlations
corr_result <- dig_correlations(combined_mtcars,
                                condition = colnames(crisp_mtcars),
                                xvars = c("mpg", "hp"),
                                yvars = c("wt", "disp"),
                                disjoint = disj_combined,
                                min_length = 1,
                                max_length = 2,
                                min_support = 0.2,
                                method = "pearson")
print(corr_result)
#> # A tibble: 536 × 10
#>    condition               support xvar  yvar  estimate     p_value
#>    <chr>                     <dbl> <chr> <chr>    <dbl>       <dbl>
#>  1 {carb=(-Inf;3.33]}        0.625 mpg   wt      -0.887 0.000000183
#>  2 {carb=(-Inf;3.33]}        0.625 mpg   disp    -0.816 0.0000116
#>  3 {carb=(-Inf;3.33]}        0.625 hp    wt       0.791 0.0000326
#>  4 {carb=(-Inf;3.33]}        0.625 hp    disp     0.877 0.000000388
#>  5 {am=0,carb=(-Inf;3.33]}   0.375 mpg   wt      -0.632 0.0274
#>  6 {am=0,carb=(-Inf;3.33]}   0.375 mpg   disp    -0.633 0.0270
#>  7 {am=0,carb=(-Inf;3.33]}   0.375 hp    wt       0.755 0.00453
#>  8 {am=0,carb=(-Inf;3.33]}   0.375 hp    disp     0.813 0.00131
#>  9 {carb=(-Inf;3.33],vs=0}   0.25  mpg   wt      -0.823 0.0121
#> 10 {carb=(-Inf;3.33],vs=0}   0.25  mpg   disp    -0.585 0.128
#>    method                               alternative  rows condition_length
#>    <chr>                                <chr>       <int>            <int>
#>  1 Pearson's product-moment correlation two.sided      20                1
#>  2 Pearson's product-moment correlation two.sided      20                1
#>  3 Pearson's product-moment correlation two.sided      20                1
#>  4 Pearson's product-moment correlation two.sided      20                1
#>  5 Pearson's product-moment correlation two.sided      12                2
#>  6 Pearson's product-moment correlation two.sided      12                2
#>  7 Pearson's product-moment correlation two.sided      12                2
#>  8 Pearson's product-moment correlation two.sided      12                2
#>  9 Pearson's product-moment correlation two.sided       8                2
#> 10 Pearson's product-moment correlation two.sided       8                2
#> # ℹ 526 more rows

This example combines crisp predicates (from crisp_mtcars) with numeric variables from the original mtcars dataset. The function searches for conditions under which pairs of numeric variables show significant Pearson correlations. The disjoint vector is extended to include the new numeric columns, preventing conflicts in the search algorithm.

The result shows conditions under which specific pairs of variables exhibit strong correlations, along with correlation coefficients and p-values.
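To make the meaning of a condition concrete, here is a hedged cross-check of the first row above: restricting the data to rows satisfying {carb=(-Inf;3.33]} and running a plain Pearson correlation test should reproduce the reported estimate (about -0.887):

rows <- combined_mtcars$`carb=(-Inf;3.33]`   # logical predicate defining the condition
cor.test(combined_mtcars$mpg[rows],
         combined_mtcars$wt[rows],
         method = "pearson")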

Contrast Patterns

Contrast patterns identify conditions under which numeric variables show statistically significant differences. The nuggets package provides several functions for different types of contrasts.

Baseline Contrasts

Baseline contrasts identify conditions under which a variable is significantly different from a baseline value (typically zero) using a one-sample statistical test.

# Prepare combined dataset with predicates and numeric variables
combined_mtcars2 <- cbind(crisp_mtcars,
                          mtcars[, c("mpg", "hp", "wt")])

# Extend disjoint vector for the new numeric columns
disj_combined2 <- c(var_names(colnames(crisp_mtcars)),
                    c("mpg", "hp", "wt"))

# Search for baseline contrasts
baseline_result <- dig_baseline_contrasts(combined_mtcars2,
                                          condition = colnames(crisp_mtcars),
                                          vars = c("mpg", "hp", "wt"),
                                          disjoint = disj_combined2,
                                          min_length = 1,
                                          max_length = 2,
                                          min_support = 0.2,
                                          method = "t")
head(baseline_result)
#> # A tibble: 6 × 15
#>   condition               support var   estimate statistic    df  p_value     n
#>   <chr>                     <dbl> <chr>    <dbl>     <dbl> <dbl>    <dbl> <int>
#> 1 {carb=(-Inf;3.33]}        0.625 mpg      22.5       17.1    19 5.45e-13    20
#> 2 {carb=(-Inf;3.33]}        0.625 hp      116.        11.5    19 5.16e-10    20
#> 3 {carb=(-Inf;3.33]}        0.625 wt        2.88      15.9    19 1.97e-12    20
#> 4 {am=0,carb=(-Inf;3.33]}   0.375 mpg      18.8       20.9    11 3.33e-10    12
#> 5 {am=0,carb=(-Inf;3.33]}   0.375 hp      138.        11.4    11 2.01e- 7    12
#> 6 {am=0,carb=(-Inf;3.33]}   0.375 wt        3.44      28.6    11 1.13e-11    12
#>   conf_lo conf_hi stderr alternative method            comment condition_length
#>     <dbl>   <dbl>  <dbl> <chr>       <chr>             <chr>              <int>
#> 1   19.8    25.3   1.32  two.sided   One Sample t-test ""                     1
#> 2   94.7   137.   10.0   two.sided   One Sample t-test ""                     1
#> 3    2.50    3.26  0.181 two.sided   One Sample t-test ""                     1
#> 4   16.8    20.8   0.900 two.sided   One Sample t-test ""                     2
#> 5  112.    165.   12.2   two.sided   One Sample t-test ""                     2
#> 6    3.18    3.71  0.120 two.sided   One Sample t-test ""                     2

This example tests whether the mean of numeric variables (mpg, hp, wt) significantly differs from zero under various conditions. The method = "t" parameter specifies a t-test. The results show which combinations of conditions lead to statistically significant deviations from the baseline.
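The same kind of hedged cross-check works here: a one-sample t-test of mpg against the baseline of zero, restricted to the rows satisfying {carb=(-Inf;3.33]}, should reproduce the estimate and statistic in the first row above (about 22.5 and 17.1):

rows <- combined_mtcars2$`carb=(-Inf;3.33]`
t.test(combined_mtcars2$mpg[rows], mu = 0)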

Complement Contrasts

Complement contrasts identify conditions under which a variable differs significantly between elements that satisfy the condition and those that don’t.

complement_result <- dig_complement_contrasts(combined_mtcars2,
                                              condition = colnames(crisp_mtcars),
                                              vars = c("mpg", "hp", "wt"),
                                              disjoint = disj_combined2,
                                              min_length = 1,
                                              max_length = 2,
                                              min_support = 0.15,
                                              method = "t")
head(complement_result)
#> # A tibble: 6 × 17
#>   condition                        support var   estimate_x estimate_y statistic
#>   <chr>                              <dbl> <chr>      <dbl>      <dbl>     <dbl>
#> 1 {carb=(-Inf;3.33]}                 0.625 mpg        22.5       16.0       3.80
#> 2 {carb=(-Inf;3.33]}                 0.625 hp        116.       198.       -3.60
#> 3 {carb=(-Inf;3.33]}                 0.625 wt          2.88       3.78     -2.61
#> 4 {carb=(-Inf;3.33],hp=(-Inf;146]}   0.406 mpg        25.6       16.3       6.04
#> 5 {carb=(-Inf;3.33],hp=(-Inf;146]}   0.406 hp         86.5      188.       -6.95
#> 6 {carb=(-Inf;3.33],hp=(-Inf;146]}   0.406 wt          2.45       3.74     -5.02
#>      df     p_value   n_x   n_y conf_lo conf_hi stderr alternative
#>   <dbl>       <dbl> <int> <int>   <dbl>   <dbl>  <dbl> <chr>
#> 1  29.9 0.000662       20    12    2.99   9.94   1.70  two.sided
#> 2  16.3 0.00233        20    12 -131.   -34.1   22.9   two.sided
#> 3  19.5 0.0171         20    12   -1.61  -0.178  0.343 two.sided
#> 4  18.5 0.00000929     13    19    6.06  12.5    1.54  two.sided
#> 5  24.3 0.000000318    13    19 -132.   -71.3   14.6   two.sided
#> 6  28.9 0.0000244      13    19   -1.82  -0.768  0.258 two.sided
#>   method                  comment condition_length
#>   <chr>                   <chr>              <int>
#> 1 Welch Two Sample t-test ""                     1
#> 2 Welch Two Sample t-test ""                     1
#> 3 Welch Two Sample t-test ""                     1
#> 4 Welch Two Sample t-test ""                     2
#> 5 Welch Two Sample t-test ""                     2
#> 6 Welch Two Sample t-test ""                     2

This example uses a two-sample t-test to compare the mean values of numeric variables between rows that satisfy a condition and rows that don’t. The results identify conditions where subgroups have significantly different characteristics compared to the rest of the data.
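Analogously, the first row above can be checked with Welch's two-sample t-test comparing mpg between the rows that satisfy {carb=(-Inf;3.33]} and the remaining rows; the group means should match estimate_x and estimate_y (about 22.5 and 16.0):

rows <- combined_mtcars2$`carb=(-Inf;3.33]`
t.test(combined_mtcars2$mpg[rows], combined_mtcars2$mpg[!rows])   # Welch test by default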

Paired Baseline Contrasts

Paired baseline contrasts identify conditions under which there is a significant difference between two paired numeric variables.

paired_result <- dig_paired_baseline_contrasts(combined_mtcars2,
                                               condition = colnames(crisp_mtcars),
                                               xvars = c("mpg", "hp"),
                                               yvars = c("wt", "wt"),
                                               disjoint = disj_combined2,
                                               min_length = 1,
                                               max_length = 2,
                                               min_support = 0.2,
                                               method = "t")
head(paired_result)
#> # A tibble: 6 × 16
#>   condition               support xvar  yvar  estimate statistic    df  p_value
#>   <chr>                     <dbl> <chr> <chr>    <dbl>     <dbl> <dbl>    <dbl>
#> 1 {carb=(-Inf;3.33]}        0.625 mpg   wt        19.6     13.3     19 4.73e-11
#> 2 {carb=(-Inf;3.33]}        0.625 hp    wt       113.      11.4     19 6.19e-10
#> 3 {am=0,carb=(-Inf;3.33]}   0.375 mpg   wt        15.4     15.7     11 7.18e- 9
#> 4 {am=0,carb=(-Inf;3.33]}   0.375 hp    wt       135.      11.2     11 2.41e- 7
#> 5 {carb=(-Inf;3.33],vs=0}   0.25  mpg   wt        14.4      9.96     7 2.20e- 5
#> 6 {carb=(-Inf;3.33],vs=0}   0.25  hp    wt       157.      14.7      7 1.63e- 6
#>       n conf_lo conf_hi stderr alternative method        comment
#>   <int>   <dbl>   <dbl>  <dbl> <chr>       <chr>         <chr>
#> 1    20    16.5    22.7  1.48  two.sided   Paired t-test ""
#> 2    20    92.1   134.   9.90  two.sided   Paired t-test ""
#> 3    12    13.2    17.5  0.980 two.sided   Paired t-test ""
#> 4    12   108.    161.  12.1   two.sided   Paired t-test ""
#> 5     8    11.0    17.9  1.45  two.sided   Paired t-test ""
#> 6     8   131.    182.  10.7   two.sided   Paired t-test ""
#>   condition_length
#>              <int>
#> 1                1
#> 2                1
#> 3                2
#> 4                2
#> 5                2
#> 6                2

This example performs paired t-tests to compare two variables within the same rows under specific conditions. Here, it tests whether mpg differs from wt (and hp from wt) in various subgroups. This is useful for detecting context-dependent relationships between paired measurements.

Post-processing and Visualization

After discovering patterns with nuggets, you’ll often want to manipulate, format, and visualize the results. The package provides several tools for these tasks.

Visualizing Association Rules with Diamond Plots

The geom_diamond() function provides a specialized visualization for association rules and their hierarchical structure. It displays rules as a lattice where broader (more general) conditions appear above their descendants:

# Search for rules with various confidence levels for visualization
vis_rules <- dig_associations(fuzzy_mtcars,
                              antecedent = starts_with(c("gear", "vs")),
                              consequent = "am=1",
                              disjoint = disj,
                              min_support = 0,
                              min_confidence = 0,
                              min_length = 0,
                              max_length = 3,
                              max_results = 50)
print(vis_rules)
#> # A tibble: 12 × 9
#>    antecedent    consequent support confidence coverage conseq_support  lift
#>    <chr>         <chr>        <dbl>      <dbl>    <dbl>          <dbl> <dbl>
#>  1 {}            {am=1}      0.406       0.406   1               0.406 1
#>  2 {vs=0}        {am=1}      0.188       0.333   0.562           0.406 0.821
#>  3 {gear=3,vs=0} {am=1}      0           0       0.375           0.406 0
#>  4 {gear=4,vs=0} {am=1}      0.0625      1       0.0625          0.406 2.46
#>  5 {gear=5,vs=0} {am=1}      0.125       1       0.125           0.406 2.46
#>  6 {gear=3}      {am=1}      0           0       0.469           0.406 0
#>  7 {gear=3,vs=1} {am=1}      0           0       0.0938          0.406 0
#>  8 {vs=1}        {am=1}      0.219       0.5     0.438           0.406 1.23
#>  9 {gear=4,vs=1} {am=1}      0.188       0.6     0.312           0.406 1.48
#> 10 {gear=5,vs=1} {am=1}      0.0312      1       0.0312          0.406 2.46
#> 11 {gear=4}      {am=1}      0.25        0.667   0.375           0.406 1.64
#> 12 {gear=5}      {am=1}      0.156       1       0.156           0.406 2.46
#>    count antecedent_length
#>    <dbl>             <int>
#>  1    13                 0
#>  2     6                 1
#>  3     0                 2
#>  4     2                 2
#>  5     4                 2
#>  6     0                 1
#>  7     0                 2
#>  8     7                 1
#>  9     6                 2
#> 10     1                 2
#> 11     8                 1
#> 12     5                 1

# Create diamond plot showing rule hierarchy
library(ggplot2)
ggplot(vis_rules) +
  aes(condition = antecedent,
      fill = confidence,
      linewidth = confidence,
      size = support,
      label = paste0(antecedent, "\nconf: ", round(confidence, 2))) +
  geom_diamond(nudge_y = 0.25) +
  scale_x_discrete(expand = expansion(add = 0.5)) +
  scale_y_discrete(expand = expansion(add = 0.25)) +
  labs(title = "Association Rules Hierarchy",
       subtitle = "consequent: am=1")

This example creates a hierarchical visualization of association rules. The geom_diamond() function arranges rules in a lattice structure where simpler rules (with fewer predicates) appear at the top and more complex rules below. Visual properties (fill color, edge width, node size) encode rule quality measures, making it easy to identify the most interesting patterns. The custom label merges the antecedent with the confidence value for better readability. Additional modifications (scale_x_discrete, scale_y_discrete) add padding.

The diamond plot helps identify how the quality measures change as an antecedent is extended with additional predicates.

Interactive Exploration

The explore() function launches an interactive Shiny application for exploring discovered patterns. This is particularly useful for association rules:

# Launch interactive explorer for association rules
rules <- dig_associations(fuzzy_mtcars,
                          antecedent = everything(),
                          consequent = everything(),
                          min_support = 0.05,
                          min_confidence = 0.7)

# Open interactive explorer
explore(rules, data = fuzzy_mtcars)

The interactive explorer lets you browse, filter, and inspect the discovered rules directly in the web browser.

Advanced Use

For advanced workflows, the nuggets package allows users to define custom pattern types and evaluation functions. This section demonstrates how to use the general dig() function with custom callbacks and the specialized dig_grid() wrapper.

Custom Patterns with dig()

The dig() function allows you to execute a user-defined callback function on each generated frequent condition. This enables searching for custom pattern types beyond the pre-defined functions.

The following example replicates the search for association rules using a custom callback function with the datasets prepared earlier:

# Define thresholds for custom association rules
min_support <- 0.02
min_confidence <- 0.8

# Define custom callback function
f <- function(condition, support, pp, pn) {
  # Calculate confidence for each focus (consequent)
  conf <- pp / support
  # Filter rules by confidence and support thresholds
  sel <- !is.na(conf) & conf >= min_confidence & !is.na(pp) & pp >= min_support
  conf <- conf[sel]
  supp <- pp[sel]
  # Return list of rules meeting criteria
  lapply(seq_along(conf), function(i) {
    list(antecedent = format_condition(names(condition)),
         consequent = names(conf)[[i]],
         support = supp[[i]],
         confidence = conf[[i]])
  })
}

# Search using custom callback
custom_result <- dig(fuzzy_mtcars,
                     f = f,
                     condition = !starts_with("am"),
                     focus = starts_with("am"),
                     disjoint = disj,
                     min_length = 1,
                     min_support = min_support)

# Flatten and format results
custom_result <- custom_result |>
  unlist(recursive = FALSE) |>
  lapply(as_tibble) |>
  do.call(rbind, args = _) |>
  arrange(desc(support))
print(custom_result)
#> # A tibble: 5,408 × 4
#>    antecedent              consequent support confidence
#>    <chr>                   <chr>        <dbl>      <dbl>
#>  1 {gear=3}                am=0          15         32
#>  2 {wt=(1.51;3.47;5.42)}   am=0          14.0       22.6
#>  3 {qsec=(14.5;18.7;22.9)} am=0          12.2       19.5
#>  4 {hp=(52;194;335)}       am=0          12.1       24.2
#>  5 {vs=0}                  am=0          12         21.3
#>  6 {gear=3,vs=0}           am=0          12         32
#>  7 {cyl=eight,gear=3,vs=0} am=0          12         32
#>  8 {cyl=eight,vs=0}        am=0          12         27.4
#>  9 {cyl=eight,gear=3}      am=0          12         32
#> 10 {cyl=eight}             am=0          12         27.4
#> # ℹ 5,398 more rows

The callback function f() receives information based on its argument names: here, condition is a named vector of predicates forming the current condition, support is its relative support, and pp and pn hold, for each focus, the aggregated truth degree (or row count, for logical data) of rows satisfying the condition together with the focus and with its negation, respectively.

This approach gives you full control over pattern evaluation and filtering logic.

Grid-Based Patterns with dig_grid()

The dig_grid() function is useful for patterns based on relationships between pairs of columns. It creates a grid of column combinations and evaluates a user-defined function for each condition and column pair.

Here’s an example that computes custom statistics for pairs of numeric variables:

# Define callback for grid-based patterns
grid_callback <- function(d, weights) {
  if (nrow(d) < 5)
    return(NULL)  # Skip if too few observations
  # Compute weighted correlation
  wcor <- cov.wt(d, wt = weights, cor = TRUE)$cor[1, 2]
  list(correlation = wcor,
       n_obs = sum(weights > 0.1),
       mean_x = weighted.mean(d[[1]], weights),
       mean_y = weighted.mean(d[[2]], weights))
}

# Prepare combined dataset
combined_fuzzy <- cbind(fuzzy_mtcars, mtcars[, c("mpg", "hp", "wt")])

# Extend disjoint vector for new numeric columns
combined_disj3 <- c(var_names(colnames(fuzzy_mtcars)),
                    c("mpg", "hp", "wt"))

# Search using grid approach
grid_result <- dig_grid(combined_fuzzy,
                        f = grid_callback,
                        condition = colnames(fuzzy_mtcars),
                        xvars = c("mpg", "hp"),
                        yvars = c("wt"),
                        disjoint = combined_disj3,
                        type = "fuzzy",
                        min_length = 1,
                        max_length = 2,
                        min_support = 0.15,
                        max_results = 20)

# Display results
print(grid_result)
#> # A tibble: 40 × 9
#>    condition                                     support xvar  yvar  correlation
#>    <chr>                                           <dbl> <chr> <chr>       <dbl>
#>  1 {qsec=(14.5;18.7;22.9)}                         0.627 mpg   wt         -0.894
#>  2 {qsec=(14.5;18.7;22.9)}                         0.627 hp    wt          0.849
#>  3 {qsec=(14.5;18.7;22.9),wt=(1.51;3.47;5.42)}     0.360 mpg   wt         -0.816
#>  4 {qsec=(14.5;18.7;22.9),wt=(1.51;3.47;5.42)}     0.360 hp    wt          0.710
#>  5 {am=0,qsec=(14.5;18.7;22.9)}                    0.383 mpg   wt         -0.810
#>  6 {am=0,qsec=(14.5;18.7;22.9)}                    0.383 hp    wt          0.759
#>  7 {drat=(2.76;3.84;4.93),qsec=(14.5;18.7;22.9)}   0.341 mpg   wt         -0.850
#>  8 {drat=(2.76;3.84;4.93),qsec=(14.5;18.7;22.9)}   0.341 hp    wt          0.770
#>  9 {qsec=(14.5;18.7;22.9),vs=0}                    0.294 mpg   wt         -0.865
#> 10 {qsec=(14.5;18.7;22.9),vs=0}                    0.294 hp    wt          0.791
#>    n_obs mean_x mean_y condition_length
#>    <int>  <dbl>  <dbl>            <int>
#>  1    29   20.7   3.19                1
#>  2    29  131.    3.19                1
#>  3    24   19.4   3.27                2
#>  4    24  135.    3.27                2
#>  5    18   17.0   3.83                2
#>  6    18  158.    3.83                2
#>  7    26   22.0   2.93                2
#>  8    26  118.    2.93                2
#>  9    16   16.4   3.88                2
#> 10    16  175.    3.88                2
#> # ℹ 30 more rows

The dig_grid() function is particularly useful for pattern types that evaluate a pair of variables under each generated condition, such as the conditional correlations computed above.

Summary

This vignette has introduced the core functionality of the nuggets package for discovering patterns in data through systematic exploration of conditions. Key takeaways:

  1. Data Preparation: Transform your data into predicates using partition().

  2. Pre-defined Pattern Discovery: The package provides specialized functions for common pattern types:

    • dig_associations() finds association rules (A → C)
    • dig_correlations() discovers conditional correlations between variable pairs
    • dig_baseline_contrasts() identifies when variables deviate from baseline under conditions
    • dig_complement_contrasts() finds subgroups differing from the rest
    • dig_paired_baseline_contrasts() compares paired variables within contexts
  3. Post-processing: Manipulate and visualize discovered patterns:

    • Create hierarchical visualizations with geom_diamond()
    • Launch interactive explorers with explore()
  4. Advanced Usage: Define custom pattern types:

    • Use dig() with custom callback functions for specialized analyses
    • Use dig_grid() for patterns based on variable pairs

Next Steps

The nuggets package provides a flexible framework for pattern discovery that scales from simple association rule mining to complex custom pattern searches, all while supporting both crisp and fuzzy logic approaches.

