- Notifications
You must be signed in to change notification settings - Fork0
AdamPaternostro/Azure-Databricks-auto-sklearn
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
- Shows how to install auto-sklearn on an Azure Databricks cluster
- https://automl.github.io/auto-sklearn/stable/
- You can use an existing Databricks workspace, but make sure you do not have lots of Libraries attached to your cluster. It is usually best to test this in a clean workspace. You can click on a cluster and click on Libraries to see what is attached. To clean up lots of libraries on your cluster see the bottom of this page.
- https://docs.databricks.com/user-guide/dev-tools/databricks-cli.html
- I perfer using the CLI since you do not need a cluster running to upload init scripts. You can create init scripts with a notebook, but sometimes this apporach creates a malformed file.
- You do not need a cluster running, start a command prompt
databricks configure --token {PLACE YOUR TOKEN HERE}dbfs mkdirs dbfs:/databricks/init/{clusterName-case-sensitive}dbfs cp autosklearn.sh dbfs:/databricks/init/{clusterName-case-sensitive}/autosklearn.shdbfs ls dbfs:/databricks/init/{clusterName-case-sensitive}/
- Must match the name that you just uploaded the script {clusterName-case-sensitive}
- Select Python version 3
- Start the cluster
- The init script will run when the cluster starts. The cluster will take longer to provision since all the auto-sklearn dependencies will be installed.
- Once the cluster has started
- Go to your workspace and add the library
- Create a notebook (Python)
- In the first cell
import sklearn.model_selectionimport sklearn.datasetsimport sklearn.metricsimport autosklearn.regression
- In the second cell
X, y = sklearn.datasets.load_boston(return_X_y=True) feature_types = (['numerical'] * 3) + ['categorical'] + (['numerical'] * 9) X_train, X_test, y_train, y_test = \ sklearn.model_selection.train_test_split(X, y, random_state=1)
- In the thrid cell
automl = autosklearn.regression.AutoSklearnRegressor( time_left_for_this_task=120, per_run_time_limit=30, tmp_folder='autosklearn_regression_example_tmp', output_folder='autosklearn_regression_example_out', ) automl.fit(X_train, y_train, dataset_name='boston', feat_type=feature_types)
To cleanly install a Library
- If you install a Databricks Library the defualt says "Install automatically on all clusters". This can cause issues like slow startup times and libraries failing to install properly due to conflicts. You should only install libraries on the clusters that you need them.
- To cleanly install a library, start a cluster, then install the library and check-off the cluster. Do not use the "Install automatically on all clusters"
To clean up libraries
- Go to each Cluster
- Click on Libraries tab
- Check to see if you can uninstall the library (you would check-off the library then click Unistall)
- What if you cannot uninstall
- Go to the library
- If the library is missing (removed), you just need to add again and do not check "Install automatically on all clusters"
- Uncheck "Install automatically on all clusters"
- Go to the library
- Then go to EACH cluster and click on Libraries tab, select (checkbox) the library and click Uninstall
- If you no longer need the library then you can move to the trash
About
Shows how to install auto-sklearn on an Azure Databricks cluster
Topics
Resources
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Releases
No releases published
Packages0
No packages published