
Symbolic DAG Search

Systematically searching the space of small directed, acyclic graphs (DAGs).

Published in the paper Scaling Up Unbiased Search-based Symbolic Regression, where it is named UDFS (Unbiased DAG Frame Search).

For some reason, Google Search did not pick up the original GitHub repository. So if you found this repo via the github.io page, you can find the original repository here.

Installation

Option 1

Clone the repository, then run:

conda create --name testenv python=3.9.12
conda activate testenv
pip install -r requirements.txt
# ... do stuff here
conda deactivate
conda remove -n testenv --all

Option 2

Copy the install script install.sh and then run

bash install.sh

Usage

Let's consider a regression problem with N samples of inputs X (shape N x m) and outputs y (shape N).
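For instance, such data could be generated synthetically (the target function below is only an illustration, not taken from the repository):

import numpy as np

# toy problem: N = 100 samples, m = 2 input variables
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))  # shape N x m
y = X[:, 0]**2 + np.sin(X[:, 1])           # shape N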

Estimation of an expression can be done with three types of regressors:

UDFS

This is the base UDFS regressor.

from DAG_search import dag_search

udfs = dag_search.DAGRegressor()
udfs.fit(X, y)

UDFS + Aug

This is UDFS with augmentations as described in our paper. Here we wrap any symbolic regressor into an outer loop that detects variable augmentations.

from DAG_search import dag_search

udfs = dag_search.DAGRegressor() # UDFS
udfs_aug = dag_search.AugRegressorPoly(regr_search = udfs) # UDFS + Aug
udfs_aug.fit(X, y, verbose = 0)

UDFS + Aug + Eliminations

Here we wrap any symbolic regressor into an outer loop that detects variable eliminations. This is especially useful for regression problems with many inputs.

from DAG_search import dag_search, eliminations

udfs = dag_search.DAGRegressor()
udfs_aug = dag_search.AugRegressorPoly(regr_search = udfs)
udfs_aug_elim = eliminations.EliminationRegressor(udfs_aug)
udfs_aug_elim.fit(X, y)

Inference of the models

The fitted expression can then be accessed via

est.model()

Note that the model is returned as a sympy expression.
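Since it is an ordinary sympy expression, the usual sympy tooling applies; for example (a sketch, assuming est has been fitted):

import sympy

expr = est.model()
print(sympy.simplify(expr))                   # algebraically simplified form
symbols = sorted(expr.free_symbols, key=str)  # input variables used by the expression
f = sympy.lambdify(symbols, expr)             # fast numerical callable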

For prediction simply use

pred = est.predict(X)

or

pred, grad = est.predict(X, return_grad = True)

For advanced usage, see the tutorial notebook tutorial.ipynb.

Parameters

UDFS

  • k... Number of constants allowed in the expression. Increasing this increases the time spent on constant optimization (Default = 1).
  • n_calc_nodes... Maximum number of intermediate calculation nodes in the expression DAG. Increasing this enlarges the search space but allows more complex expressions (Default = 5).
  • max_orders... Maximum number of expression skeletons in the search. If this exceeds the size of the search space, the search is truly exhaustive (Default = 1e6).
  • random_state... Set to a number for reproducibility, or to None to ignore (Default = None).
  • processes... Number of processes used for multiprocessing (Default = 1).
  • max_samples... Maximum number of datapoints at which expressions are evaluated; lower is faster. Set to None to ignore (Default = None).
  • stop_thresh... The search stops early once the loss falls below this threshold (Default = 1e-20).
  • loss_fkt... Loss function to optimize. See dag_search.py for other examples (Default = dag_search.MSE_loss_fkt).
  • max_time... Maximum runtime in seconds (Default = 1800).
  • use_tan... If True, constants are searched via a tangent transformation, which effectively covers the interval [-inf, inf]; otherwise the interval [-10, 10] is used (Default = False).
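As a minimal sketch, assuming these options are accepted as keyword arguments of DAGRegressor (the values below are arbitrary, not recommendations):

from DAG_search import dag_search

udfs = dag_search.DAGRegressor(
    k = 1,                  # one free constant
    n_calc_nodes = 5,       # up to 5 intermediate DAG nodes
    max_orders = int(1e6),  # cap on the number of expression skeletons
    random_state = 0,       # reproducible search
    processes = 4,          # parallel search
    max_time = 1800,        # stop after 30 minutes
)
udfs.fit(X, y)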

Augmentation Regressor

  • random_state... Set to a number for reproducibility, or to None to ignore (Default = None).
  • simpl_nodes... Number of intermediate nodes for possible augmentations (Default = 2).
  • topk... Number of augmentations to consider (Default = 1).
  • max_orders... Maximum number of expression skeletons in the search for augmentations (Default = 1e5).
  • max_degree... Maximum degree for polynomials (Default = 5).
  • max_tree_size... When selecting the best model, the best expression from the Pareto front with fewer than this number of nodes is returned (Default = 30).
  • max_samples... Maximum number of datapoints at which expressions are evaluated. Set to None to ignore (Default = None).
  • processes... Number of processes used for multiprocessing (Default = 1).
  • regr_search... Symbolic regressor used to search for solutions to the augmented problems. Set to None to use the default UDFS (Default = None).
  • fit_thresh... Models with an R2 score greater than this are considered recovered. Set to a value > 1.0 to ignore (Default = 1-(1e-8)).
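Analogously, a configuration sketch for the augmentation regressor, assuming the listed parameters are keyword arguments of AugRegressorPoly (values are only illustrative):

from DAG_search import dag_search

udfs = dag_search.DAGRegressor()
udfs_aug = dag_search.AugRegressorPoly(
    regr_search = udfs,  # regressor for the augmented problems
    simpl_nodes = 2,     # intermediate nodes for candidate augmentations
    topk = 1,            # number of augmentations to consider
    max_degree = 5,      # maximum polynomial degree
    random_state = 0,    # reproducibility
)
udfs_aug.fit(X, y)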

Elimination Regressor

  • symb_regr... Symbolic regressor used to tackle the reduced problems.
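As a sketch, and assuming the parameter is accepted by keyword, the wrapped regressor from the example above can be passed like this:

from DAG_search import eliminations

udfs_aug_elim = eliminations.EliminationRegressor(symb_regr = udfs_aug)
udfs_aug_elim.fit(X, y)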

Rescaling Data

  • If your dependent variable contains very large values, consider fitting on a rescaled variable and unscaling the model afterwards. For example, you could fit on X, y/c and unscale your model with
    c*regr.model()
  • Similarly, you can rescale your independent variables and fit on X/c, y. In the final model, the unscaling can be done via sympy's substitutions:
    expr = regr.model()
    expr = expr.subs((s, c*s) for s in expr.free_symbols)
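Putting the first bullet into a complete sketch (the choice of c below is just one possibility, and the regressor name regr is a placeholder):

import numpy as np
from DAG_search import dag_search

c = float(np.abs(y).max())  # scale factor for the dependent variable
regr = dag_search.DAGRegressor()
regr.fit(X, y / c)          # fit on the rescaled target
expr = c * regr.model()     # unscale the recovered sympy expression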

Dimensional Analysis

In case you have measurements with units, we provide the necessary tools to perform a dimensional analysis using Buckingham's Pi Theorem.

from DAG_search import dimensional_analysis as da

# collected data
# assume we collected the four quantities charge, permittivity, length and the electric field
X = ...

# Unit table m, s, kg, V
D = [
    [2, -2, 1, -1, 0], # charge
    [1, -2, 1, -2, 0], # permittivity
    [1, 0, 0, 0, 0], # length
    [-1, 0, 0, 1, 0], # electric field
]

# Analysis
dim_analysis = da.DA_Buckingham()
X_new, transl_dict = dim_analysis.fit(D, X)

Any expression in these dimensionless quantities can then be translated back into the original dimensions using

dim_analysis.translate(expr, transl_dict)
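A symbolic regressor can then be fitted on the dimensionless data and the result mapped back, roughly as follows (a sketch continuing the snippet above, with y denoting the measured target):

from DAG_search import dag_search

regr = dag_search.DAGRegressor()
regr.fit(X_new, y)                                        # search on the dimensionless quantities
expr = dim_analysis.translate(regr.model(), transl_dict)  # back to the original quantities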

Citation

To reference this work, please use the following citation:

@inproceedings{Kahlmeyer:IJCAI24,
  title     = {Scaling Up Unbiased Search-based Symbolic Regression},
  author    = {Kahlmeyer, Paul and Giesen, Joachim and Habeck, Michael and Voigt, Henrik},
  booktitle = {Proceedings of the Thirty-Third International Joint Conference on
               Artificial Intelligence, {IJCAI-24}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Kate Larson},
  pages     = {4264--4272},
  year      = {2024},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2024/471},
  url       = {https://doi.org/10.24963/ijcai.2024/471},
}
