Python implementation of iterative-random-forests
Yu-Group/iterative-Random-Forest
The algorithm details are available at:
Sumanta Basu, Karl Kumbier, James B. Brown, Bin Yu, Iterative Random Forests to detect predictive and stable high-order interactions, PNAS. https://www.pnas.org/content/115/8/1943
The implementation is a joint effort of several people at UC Berkeley. See Authors.md for the complete list. The weighted random forest implementation is based on the random forest source code and API design from scikit-learn; details can be found in API design for machine learning software: experiences from the scikit-learn project, Buitinck et al., 2013. The setup file is based on the setup file from skgarden.
Use conda to create a Python 3.6 environment, then run the following commands inside the package directory:
    pip install numpy cython==0.29 packaging
    pip install -e .
(outdated) To install, simply run pip install irf. If you run into any issues, see installation help.
To use irf, import it in Python:
    import numpy as np
    from irf import irf_utils
    from irf.ensemble import RandomForestClassifierWithWeights
Generate a simple data set with 10 features: every feature except the second is a noise feature with no power in predicting the labels, while the second feature (index 1) determines the label perfectly:
    n_samples = 1000
    n_features = 10
    X_train = np.random.uniform(low=0, high=1, size=(n_samples, n_features))
    y_train = np.random.choice([0, 1], size=(n_samples,), p=[.5, .5])
    X_test = np.random.uniform(low=0, high=1, size=(n_samples, n_features))
    y_test = np.random.choice([0, 1], size=(n_samples,), p=[.5, .5])
    # The second feature (which is indexed by 1) is very important
    X_train[:, 1] = X_train[:, 1] + y_train
    X_test[:, 1] = X_test[:, 1] + y_test
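As a quick sanity check on this construction, the sketch below regenerates the same kind of data and confirms that thresholding feature 1 at 1.0 recovers the labels, while a noise feature does no better than chance. The fixed seed and the variable names `acc_signal` and `acc_noise` are assumptions made for this illustration and are not part of irf.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed: an assumption, for reproducibility only
n_samples, n_features = 1000, 10

X = rng.uniform(low=0, high=1, size=(n_samples, n_features))
y = rng.choice([0, 1], size=(n_samples,), p=[.5, .5])
X[:, 1] = X[:, 1] + y  # same shift as in the data generation step

# Feature 1 lies in [0, 1) when y == 0 and in [1, 2) when y == 1,
# so a single threshold at 1.0 separates the two classes.
pred_signal = (X[:, 1] >= 1.0).astype(int)
acc_signal = np.mean(pred_signal == y)

# Feature 0 is pure noise: thresholding it should be close to coin flipping.
pred_noise = (X[:, 0] >= 0.5).astype(int)
acc_noise = np.mean(pred_noise == y)

print("accuracy using feature 1:", acc_signal)
print("accuracy using feature 0:", acc_noise)
```

A forest fitted to this data should therefore concentrate its weight on feature 1.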
Then run irf:

    all_rf_weights, all_K_iter_rf_data, \
        all_rf_bootstrap_output, all_rit_bootstrap_output, \
        stability_score = irf_utils.run_iRF(
            X_train=X_train,
            X_test=X_test,
            y_train=y_train,
            y_test=y_test,
            K=5,  # number of iterations
            rf=RandomForestClassifierWithWeights(n_estimators=20),
            B=30,
            random_state_classifier=2018,  # random seed
            propn_n_samples=.2,
            bin_class_type=1,
            M=20,
            max_depth=5,
            noisy_split=False,
            num_splits=2,
            n_estimators_bootstrap=5)
all_rf_weights stores all the weights for each iteration:
print(all_rf_weights['rf_weight5'])
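To read such a weight vector, it can help to rank the features by weight. The sketch below assumes each entry of `all_rf_weights` is a 1-D array with one weight per feature; the numbers are made up for illustration and are not output from `run_iRF`.

```python
import numpy as np

# Made-up stand-in for all_rf_weights['rf_weight5'] (assumption: one weight per feature)
weights = np.array([0.01, 0.90, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])

# Feature indices sorted from largest to smallest weight
ranking = np.argsort(weights)[::-1]

for idx in ranking[:3]:
    print("feature {}: weight {:.2f}".format(idx, weights[idx]))
```

With the data set above, one would expect feature 1 to come out on top.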
The proposed feature combinations and their stability scores:
print(stability_score)
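If `stability_score` is a dictionary mapping each proposed feature combination to its score (an assumption about the return type, as are the key format and values below), the combinations can be listed from most to least stable like this:

```python
# Hypothetical example values; real scores come from run_iRF
example_scores = {"1": 1.0, "1_3": 0.2, "0_1": 0.1}

# Sort (combination, score) pairs by score, highest first
ranked = sorted(example_scores.items(), key=lambda kv: kv[1], reverse=True)

for combination, score in ranked:
    print("{}: {:.2f}".format(combination, score))
```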