kahlmeyer94/DAG_search

kahlmeyer94.github.io/DAG_search/

Symbolic DAG Search

Systematically searching the space of small directed, acyclic graphs (DAGs).

Published in the paper "Scaling Up Unbiased Search-based Symbolic Regression", where it is named UDFS (Unbiased DAG Frame Search).

For some reason, Google Search did not pick up the original GitHub repo. So if you found this repo via the github.io page, you can find the original repository here.

Installation

Option 1

Clone the repository and then run the following commands:

```
conda create --name testenv python=3.9.12
conda activate testenv
pip install -r requirements.txt
# ... do stuff here
conda deactivate
conda remove -n testenv --all
```

Option 2

Copy the install script install.sh and then run

```
bash install.sh
```

Usage

Let's consider a regression problem with N samples of inputs X (shape N x m) and outputs y (shape N).
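
To make these shapes concrete, here is a small synthetic problem built with NumPy. The target formula, the sample count and the variable ranges are made up for illustration and are not part of the library.

```python
import numpy as np

# Hypothetical toy problem: N = 100 samples, m = 2 input variables.
rng = np.random.default_rng(0)
N, m = 100, 2
X = rng.uniform(1.0, 5.0, size=(N, m))   # inputs, shape (N, m)

# Ground-truth formula chosen only for this example: y = x0 * x1 + sin(x0)
y = X[:, 0] * X[:, 1] + np.sin(X[:, 0])  # outputs, shape (N,)

print(X.shape, y.shape)
```

Arrays of this form can be passed as X and y to any of the regressors.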

Estimation of an expression can be done with three types of regressors:

UDFS

This is the base UDFS regressor.

```
from DAG_search import dag_search

est = dag_search.DAGRegressor()
est.fit(X, y)
```

UDFS + Aug

This is UDFS with augmentations as described in our paper. Here we wrap any symbolic regressor into an outer loop that detects variable augmentations.

```
from DAG_search import dag_search, augmentations

est = dag_search.DAGRegressor() # UDFS
est_aug = augmentations.AugmentationRegressor(est) # UDFS + Aug
est_aug.fit(X, y)
```

UDFS + Aug + Eliminations

Here we wrap any symbolic regressor into an outer loop that detects variable eliminations. This is especially useful for regression problems with many inputs.

```
from DAG_search import dag_search, substitutions

est = dag_search.DAGRegressor() # UDFS
est_sub = substitutions.SubstitutionRegressor(est) # UDFS + Substitutions
est_sub.fit(X, y)
```

Inference of the models

The fitted expression can then be accessed via

```
est.model()
```

Note that the model is returned as a sympy expression.

For prediction simply use

```
pred = est.predict(X)

# optionally also return the gradient
pred, grad = est.predict(X, return_grad = True)
```
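
Because the model is a sympy expression, it can also be processed with the usual sympy tools. The sketch below uses a hand-written stand-in expression instead of a fitted model. The symbol names x_0 and x_1 are an assumption made for this example; inspect the free_symbols of your fitted expression for the names it actually uses.

```python
import numpy as np
import sympy

# Stand-in for est.model(); the symbol names are assumed for illustration.
x_0, x_1 = sympy.symbols("x_0 x_1")
expr = x_0 * x_1 + sympy.sin(x_0)
symbols = [x_0, x_1]  # column i of X feeds symbols[i]

# Turn the symbolic expression into a vectorized NumPy function.
f = sympy.lambdify(symbols, expr, modules="numpy")

X = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = f(*X.T)  # one argument per input column

# Symbolic partial derivatives with respect to each input variable.
grad = [sympy.diff(expr, s) for s in symbols]
print(pred, grad)
```

This is handy when a fitted expression has to be evaluated outside of the regressor object, for example after simplifying or unscaling it.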

For advanced usage see the tutorial notebook tutorial.ipynb.

Parameters

UDFS

k... Number of constants allowed in the expression. Increasing it increases the time spent on constant optimization (Default = 1).
n_calc_nodes... Maximum number of intermediate calculation nodes in the expression DAG. Increasing it enlarges the search space but allows more complex expressions (Default = 5).
max_orders... Maximum number of expression skeletons in the search. If it is greater than the size of the search space, the search is truly exhaustive (Default = 1e6).
random_state... Set to a number for reproducibility, or to None to ignore (Default = None).
processes... Number of processes used for multiprocessing (Default = 1).
max_samples... Maximum number of data points at which expressions are evaluated; lower is faster. Set to None to disable the limit (Default = None).
stop_thresh... If the loss falls below this threshold, the search stops early (Default = 1e-20).
loss_fkt... Loss function to optimize. See dag_search.py for other examples (Default = dag_search.MSE_loss_fkt).
max_time... Maximum runtime in seconds (Default = 1800).
use_tan... If True, constants are searched using a tangent transformation, which essentially covers the interval [-inf, inf]. Otherwise the interval [-10, 10] is used (Default = False).
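
The idea behind use_tan can be pictured with a short sketch. This is only an illustration of a generic tangent reparametrization, not the library's actual code, and the function names are hypothetical: a bounded search variable t in (-pi/2, pi/2) is mapped to a constant c = tan(t), so a search over a bounded interval can still reach arbitrarily large constants.

```python
import numpy as np

def to_constant(t):
    """Map a bounded parameter t in (-pi/2, pi/2) to an unbounded constant."""
    return np.tan(t)

def to_parameter(c):
    """Inverse map: any real constant c back into (-pi/2, pi/2)."""
    return np.arctan(c)

# Even very large constants correspond to parameters inside the bounded interval.
for c in [-1e6, -10.0, 0.0, 10.0, 1e6]:
    t = to_parameter(c)
    assert -np.pi / 2 < t < np.pi / 2
    print(f"c = {c:>12.1f}  ->  t = {t:+.6f}")
```

The price of such a transformation is resolution: large constants are squeezed into a tiny region near the interval ends, which is one plausible reason for keeping the plain [-10, 10] range as the default.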

Augmentation Regressor

random_state... Set to a number for reproducibility, or to None to ignore (Default = None).
simpl_nodes... Number of intermediary nodes for possible augmentations (Default = 2).
topk... Number of augmentations to consider (Default = 1).
max_orders... Maximum number of expression skeletons in the search for augmentations (Default = 1e5).
max_degree... Maximum degree for polynomials (Default = 5).
max_tree_size... When selecting the best model, the best expression from the Pareto front with fewer than this number of nodes is returned (Default = 30).
max_samples... Maximum number of data points at which expressions are evaluated. Set to None to disable the limit (Default = None).
processes... Number of processes used for multiprocessing (Default = 1).
regr_search... Symbolic regressor used to search for solutions to the augmented problems. Set to None to use the default UDFS (Default = None).
fit_thresh... Models with an R2 score greater than this value are considered recovered. Set to > 1.0 to ignore (Default = 1-(1e-8)).
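
To illustrate how strict the default fit_thresh is, the following sketch computes the R2 score by its textbook definition and compares it to the threshold. The helper r2_score and the toy data are written for this example only; how the library computes the score internally is not shown here.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

fit_thresh = 1 - 1e-8  # default value from the parameter list

y = np.array([1.0, 2.0, 3.0, 4.0])
exact = y.copy()                               # perfect prediction
noisy = y + np.array([0.1, -0.1, 0.1, -0.1])   # small prediction error

print(r2_score(y, exact) > fit_thresh)  # exact model counts as recovered
print(r2_score(y, noisy) > fit_thresh)  # approximate model does not
```

Since the R2 score never exceeds 1, a threshold above 1.0 can never be passed, which is why such a value switches the check off.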

Substitution Regressor

symb_regr... Symbolic regressor used to tackle the reduced problems.

Rescaling Data

If your dependent variable contains very large values, consider fitting on a rescaled variable and unscaling the model afterwards. For example, you could fit on X, y/c and unscale your model with
```
c*regr.model()
```
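Similarly, you can rescale your independent variables and fit on X/c, y. The fitted model then expects scaled inputs, so in the final model every variable has to be replaced by its scaled version. This unscaling can be done via sympy's substitutions:
```
expr = regr.model()
# replace every variable s by s/c so that the model accepts unscaled inputs
expr = expr.subs([(s, s/c) for s in expr.free_symbols])
```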
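
Both unscaling steps can be checked on a toy expression. The sketch uses hand-written sympy expressions in place of regr.model(); the symbol name x_0 and the scale c are made up for illustration. A model g fitted on X/c sees scaled inputs, so in terms of the original variables it reads g(x/c), which corresponds to replacing every symbol s by s/c.

```python
import sympy

x_0 = sympy.symbols("x_0")
c = 1000.0

# Case 1: targets were scaled, i.e. the regressor saw (X, y / c).
g_y = 0.002 * x_0**2          # stand-in for regr.model() fitted on y / c
model_y = c * g_y             # unscaled model for the original y
print(model_y)

# Case 2: inputs were scaled, i.e. the regressor saw (X / c, y).
g_x = 3 * x_0 + 1             # stand-in for regr.model() fitted on X / c
model_x = g_x.subs([(s, s / c) for s in g_x.free_symbols])

# Check at one original-scale point: g_x evaluated at x / c must agree
# with model_x evaluated at x.
x_val = 2500.0
lhs = g_x.subs(x_0, x_val / c)
rhs = model_x.subs(x_0, x_val)
print(lhs, rhs)
```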

Dimensional Analysis

In case you have measurements with units, we provide the necessary tools to perform a dimensional analysis using Buckingham's Pi Theorem.

```
from DAG_search import dimensional_analysis as da

# collected data
# assume we collected the four quantities charge, permittivity, length and the electric field
X = ...

# Unit table m, s, kg, V
D = [
    [2, -2, 1, -1, 0], # charge
    [1, -2, 1, -2, 0], # permittivity
    [1, 0, 0, 0, 0], # length
    [-1, 0, 0, 1, 0], # electric field
]

# Analysis
dim_analysis = da.DA_Buckingham()
X_new, transl_dict = dim_analysis.fit(D, X)
```

Any expression in these dimensionless quantities can then be translated back into the original dimensions using

```
dim_analysis.translate(expr, transl_dict)
```
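
For intuition, the core of Buckingham's Pi Theorem is a null-space computation on the unit table. The sketch below is not the library's implementation. It uses a different, smaller example (a pendulum with period, length, gravitational acceleration and mass) and NumPy's SVD, and all names are chosen for illustration. A dimensionless product is described by a vector a of exponents, one per quantity, for which all unit exponents cancel, that is, D^T a = 0.

```python
import numpy as np

# Unit table: one row per quantity, columns are exponents of (m, s, kg).
D = np.array([
    [0, 1, 0],   # period T   [s]
    [1, 0, 0],   # length l   [m]
    [1, -2, 0],  # gravity g  [m / s^2]
    [0, 0, 1],   # mass       [kg]
], dtype=float)

# Exponent vectors a with D^T a = 0 give dimensionless products.
# The right singular vectors of D^T beyond its rank span that null space.
_, sing_vals, vh = np.linalg.svd(D.T)
rank = int(np.sum(sing_vals > 1e-10))
null_basis = vh[rank:]  # shape: (n_quantities - rank, n_quantities)

# Normalize so that the exponent of g is 1 for readability.
pi_group = null_basis[0] / null_basis[0][2]
print(np.round(pi_group, 6))  # exponents of (T, l, g, mass)
```

With four quantities and a unit table of rank three there is exactly one dimensionless group. Its exponents are proportional to (2, -1, 1, 0), which is the familiar quantity T^2 g / l; the mass drops out.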

Citation

To reference this work, please use the following citation:

```
@inproceedings{Kahlmeyer:IJCAI24,
  title     = {Scaling Up Unbiased Search-based Symbolic Regression},
  author    = {Kahlmeyer, Paul and Giesen, Joachim and Habeck, Michael and Voigt, Henrik},
  booktitle = {Proceedings of the Thirty-Third International Joint Conference on
               Artificial Intelligence, {IJCAI-24}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Kate Larson},
  pages     = {4264--4272},
  year      = {2024},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2024/471},
  url       = {https://doi.org/10.24963/ijcai.2024/471},
}
```
