Databricks SDK for Python (Beta)
Beta: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs. Please file issues, and we will address them.

- See also the SDK for Java
- See also the SDK for Go
- See also the Terraform Provider
- See also cloud-specific docs (AWS, Azure, GCP)
- See also the API reference on readthedocs

The Databricks SDK for Python includes functionality to accelerate development with Python for the Databricks Lakehouse. It covers all public Databricks REST API operations. The SDK's internal HTTP client is robust and handles failures on different levels by performing intelligent retries.
- Getting started
- Code examples
- Authentication
- Long-running operations
- Paginated responses
- Single-sign-on with OAuth
- User Agent Request Attribution
- Error handling
- Logging
- Integration with dbutils
- Interface stability

Please install Databricks SDK for Python via pip install databricks-sdk and instantiate WorkspaceClient:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for c in w.clusters.list():
    print(c.cluster_name)
```
Databricks SDK for Python is compatible with Python 3.7 (until June 2023), 3.8, 3.9, 3.10, and 3.11.
Note: Databricks Runtime starting from version 13.1 includes a bundled version of the Python SDK.
It is highly recommended to upgrade to the latest version, which you can do by running the following in a notebook cell:
```python
%pip install --upgrade databricks-sdk
```
followed by
```python
dbutils.library.restartPython()
```
The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for various common use cases, including:
- Using the SDK with OAuth from a webserver
- Using long-running operations
- Authenticating a client app using OAuth
These examples and more are located in the examples/ directory of the GitHub repository.
Some other examples of using the SDK include:
- Unity Catalog Automated Migration relies heavily on the Python SDK for working with Databricks APIs.
- ip-access-list-analyzer checks & prunes invalid entries from IP Access Lists.
If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is the following code snippet, which instructs the Databricks SDK for Python to use its default authentication flow:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.  # press <TAB> for autocompletion
```

The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is w, which is shorthand for workspace.
- Default authentication flow
- Databricks native authentication
- Azure native authentication
- Overriding .databrickscfg
- Additional authentication configuration options
If you run the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, most likely they will all interoperate nicely together. By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until it succeeds:
- Databricks native authentication
- Azure native authentication
- If the SDK is unsuccessful at this point, it returns an authentication error and stops running.
You can instruct the Databricks SDK for Python to use a specific authentication method by setting the auth_type argument, as described in the following sections.
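For example, here is a minimal sketch that forces personal access token authentication; the auth_type value 'pat' is described under Databricks native authentication below:

```python
from databricks.sdk import WorkspaceClient

# Force personal access token authentication instead of the default discovery order.
# The host and token are still resolved from environment variables or a configuration profile.
w = WorkspaceClient(auth_type='pat')
```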
For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:
1. Credentials that are hard-coded into configuration arguments.

   ⚠️ Caution: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.

2. Credentials in Databricks-specific environment variables.
3. For Databricks native authentication, credentials in the .databrickscfg file's DEFAULT configuration profile from its default file location (~ for Linux or macOS, and %USERPROFILE% for Windows).
4. For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.
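For reference, a minimal .databrickscfg file with a DEFAULT profile might look like the following; the host and token values are placeholders:

```ini
[DEFAULT]
host  = https://<your-workspace-url>
token = <your-personal-access-token>
```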
Depending on the Databricks authentication method, the SDK uses the following information. Presented are the WorkspaceClient and AccountClient arguments (which have corresponding .databrickscfg file fields), their descriptions, and any corresponding environment variables.
By default, the Databricks SDK for Python initially tries Databricks token authentication (auth_type='pat' argument). If the SDK is unsuccessful, it then tries Workload Identity Federation (WIF). See Supported WIF for the supported JWT token providers.

- For Databricks token authentication, you must provide host and token; or their environment variable or .databrickscfg file field equivalents.
- For Databricks OIDC authentication, you must provide the host, client_id and token_audience (optional) either directly, through the corresponding environment variables, or in your .databrickscfg configuration file.
- For Azure DevOps OIDC authentication, the token_audience is irrelevant as the audience is always set to api://AzureADTokenExchange. Also, the System.AccessToken pipeline variable required for the OIDC request must be exposed as the SYSTEM_ACCESSTOKEN environment variable, following Pipeline variables.
| Argument | Description | Environment variable |
|---|---|---|
| host | (String) The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | DATABRICKS_HOST |
| account_id | (String) The Databricks account ID for the Databricks accounts endpoint. Only has effect when host is either https://accounts.cloud.databricks.com/ (AWS), https://accounts.azuredatabricks.net/ (Azure), or https://accounts.gcp.databricks.com/ (GCP). | DATABRICKS_ACCOUNT_ID |
| token | (String) The Databricks personal access token (PAT) (AWS, Azure, and GCP) or Azure Active Directory (Azure AD) token (Azure). | DATABRICKS_TOKEN |
| client_id | (String) The Databricks Service Principal Application ID. | DATABRICKS_CLIENT_ID |
| token_audience | (String) When using Workload Identity Federation, the audience to specify when fetching an ID token from the ID token supplier. | TOKEN_AUDIENCE |
For example, to use Databricks token authentication:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    token=input('Token: '))
```
By default, the Databricks SDK for Python first tries Azure client secret authentication (auth_type='azure-client-secret' argument). If the SDK is unsuccessful, it then tries Azure CLI authentication (auth_type='azure-cli' argument). See Manage service principals.
The Databricks SDK for Python picks up an Azure CLI token if you've previously authenticated as an Azure user by running az login on your machine. See Get Azure AD tokens for users by using the Azure CLI.
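For example, here is a minimal sketch that reuses an existing az login session; the workspace URL is supplied interactively:

```python
from databricks.sdk import WorkspaceClient

# Reuse the token from a prior `az login` on this machine.
w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    auth_type='azure-cli')
```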
To authenticate as an Azure Active Directory (Azure AD) service principal, you must provide one of the following. See also Add a service principal to your Azure Databricks account:

- azure_workspace_resource_id, azure_client_secret, azure_client_id, and azure_tenant_id; or their environment variable or .databrickscfg file field equivalents.
- azure_workspace_resource_id and azure_use_msi; or their environment variable or .databrickscfg file field equivalents.
| Argument | Description | Environment variable |
|---|---|---|
| azure_workspace_resource_id | (String) The Azure Resource Manager ID for the Azure Databricks workspace, which is exchanged for a Databricks host URL. | DATABRICKS_AZURE_RESOURCE_ID |
| azure_use_msi | (Boolean) true to use Azure Managed Service Identity passwordless authentication flow for service principals. This feature is not yet implemented in the Databricks SDK for Python. | ARM_USE_MSI |
| azure_client_secret | (String) The Azure AD service principal's client secret. | ARM_CLIENT_SECRET |
| azure_client_id | (String) The Azure AD service principal's application ID. | ARM_CLIENT_ID |
| azure_tenant_id | (String) The Azure AD service principal's tenant ID. | ARM_TENANT_ID |
| azure_environment | (String) The Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. Defaults to PUBLIC. | ARM_ENVIRONMENT |
For example, to use Azure client secret authentication:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    azure_workspace_resource_id=input('Azure Resource ID: '),
                    azure_tenant_id=input('AAD Tenant ID: '),
                    azure_client_id=input('AAD Client ID: '),
                    azure_client_secret=input('AAD Client Secret: '))
```
Please see more examples in this document.
By default, the Databricks SDK for Python first tries GCP credentials authentication (auth_type='google-credentials' argument). If the SDK is unsuccessful, it then tries Google Cloud Platform (GCP) ID authentication (auth_type='google-id' argument).

The Databricks SDK for Python picks up an OAuth token in the scope of the Google Default Application Credentials (DAC) flow. This means that if you have run gcloud auth application-default login on your development machine, or launched the application on compute that is allowed to impersonate the Google Cloud service account specified in google_service_account, authentication should work out of the box. See Creating and managing service accounts.
To authenticate as a Google Cloud service account, you must provide one of the following:
- host and google_credentials; or their environment variable or .databrickscfg file field equivalents.
- host and google_service_account; or their environment variable or .databrickscfg file field equivalents.
| Argument | Description | Environment variable |
|---|---|---|
| google_credentials | (String) GCP Service Account Credentials JSON or the location of these credentials on the local filesystem. | GOOGLE_CREDENTIALS |
| google_service_account | (String) The Google Cloud Platform (GCP) service account e-mail used for impersonation in the Default Application Credentials Flow that does not require a password. | DATABRICKS_GOOGLE_SERVICE_ACCOUNT |
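As a sketch, to authenticate with a service account key file via google_credentials (the path below is a placeholder):

```python
from databricks.sdk import WorkspaceClient

# google_credentials accepts either the credentials JSON itself or a path to the key file.
w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    google_credentials='/path/to/service-account-key.json')
```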
For example, to use Google ID authentication:
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    google_service_account=input('Google Service Account: '))
```
For Databricks native authentication, you can override the default behavior for using .databrickscfg as follows:
| Argument | Description | Environment variable |
|---|---|---|
| profile | (String) A connection profile specified within .databrickscfg to use instead of DEFAULT. | DATABRICKS_CONFIG_PROFILE |
| config_file | (String) A non-default location of the Databricks CLI credentials file. | DATABRICKS_CONFIG_FILE |
For example, to use a profile named MYPROFILE instead of DEFAULT:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(profile='MYPROFILE')
# Now call the Databricks workspace APIs as desired...
```
For all authentication methods, you can override the default behavior in client arguments as follows:
| Argument | Description | Environment variable |
|---|---|---|
| auth_type | (String) When multiple auth attributes are available in the environment, use the auth type specified by this argument. This argument also holds the currently selected auth. | DATABRICKS_AUTH_TYPE |
| http_timeout_seconds | (Integer) Number of seconds for HTTP timeout. Default is 60. | (None) |
| retry_timeout_seconds | (Integer) Number of seconds to keep retrying HTTP requests. Default is 300 (5 minutes). | (None) |
| debug_truncate_bytes | (Integer) Truncate JSON fields in debug logs above this limit. Default is 96. | DATABRICKS_DEBUG_TRUNCATE_BYTES |
| debug_headers | (Boolean) true to debug HTTP headers of requests made by the application. Default is false, as headers contain sensitive data, such as access tokens. | DATABRICKS_DEBUG_HEADERS |
| rate_limit | (Integer) Maximum number of requests per second made to the Databricks REST API. | DATABRICKS_RATE_LIMIT |
For example, here's how you can update the overall retry timeout:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.core import Config

w = WorkspaceClient(config=Config(retry_timeout_seconds=300))
# Now call the Databricks workspace APIs as desired...
```
When you invoke a long-running operation, the SDK provides a high-level API to trigger these operations and wait for the related entities to reach the correct state, or to return the error message in case of failure. All long-running operations return a generic Wait instance with a result() method that returns the result of the long-running operation once it has finished. The Databricks SDK for Python picks the most reasonable default timeouts for every method, but sometimes you may find yourself in a situation where you want to provide datetime.timedelta() as the value of the timeout argument to the result() method.
There are a number of long-running operations in Databricks APIs such as managing:
- Clusters,
- Command execution
- Jobs
- Libraries
- Delta Live Tables pipelines
- Databricks SQL warehouses.
For example, in the Clusters API, once you create a cluster, you receive a cluster ID, and the cluster is in the PENDING state. Meanwhile, Databricks takes care of provisioning virtual machines from the cloud provider in the background. The cluster is only usable in the RUNNING state, so you have to wait for that state to be reached.

Another example is the API for running a job or repairing the run: right after the run starts, the run is in the PENDING state. The job is only considered to be finished when it is in either the TERMINATED or SKIPPED state. Also, you would likely need the error message if the long-running operation times out and fails with an error code. Other times you may want to configure a custom timeout other than the default of 20 minutes.

In the following example, w.clusters.create returns ClusterInfo only once the cluster is in the RUNNING state; otherwise it will time out in 10 minutes:
```python
import datetime
import logging

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

info = w.clusters.create_and_wait(cluster_name='Created cluster',
                                  spark_version='12.0.x-scala2.12',
                                  node_type_id='m5d.large',
                                  autotermination_minutes=10,
                                  num_workers=1,
                                  timeout=datetime.timedelta(minutes=10))
logging.info(f'Created: {info}')
```
Please look at the examples/starting_job_and_waiting.py for a more advanced usage:
```python
import datetime
import logging
import time

from databricks.sdk import WorkspaceClient
import databricks.sdk.service.jobs as j

w = WorkspaceClient()

# create a dummy file on DBFS that just sleeps for 10 seconds
py_on_dbfs = f'/home/{w.current_user.me().user_name}/sample.py'
with w.dbfs.open(py_on_dbfs, write=True, overwrite=True) as f:
    f.write(b'import time; time.sleep(10); print("Hello, World!")')

# trigger one-time-run job and get waiter object
waiter = w.jobs.submit(run_name=f'py-sdk-run-{time.time()}',
                       tasks=[
                           j.RunSubmitTaskSettings(
                               task_key='hello_world',
                               new_cluster=j.BaseClusterInfo(
                                   spark_version=w.clusters.select_spark_version(long_term_support=True),
                                   node_type_id=w.clusters.select_node_type(local_disk=True),
                                   num_workers=1),
                               spark_python_task=j.SparkPythonTask(python_file=f'dbfs:{py_on_dbfs}'),
                           )
                       ])

logging.info(f'starting to poll: {waiter.run_id}')

# callback, that receives a polled entity between state updates
def print_status(run: j.Run):
    statuses = [f'{t.task_key}: {t.state.life_cycle_state}' for t in run.tasks]
    logging.info(f'workflow intermediate status: {", ".join(statuses)}')

# If you want to perform polling in a separate thread, process, or service,
# you can use w.jobs.wait_get_run_job_terminated_or_skipped(
#   run_id=waiter.run_id,
#   timeout=datetime.timedelta(minutes=15),
#   callback=print_status) to achieve the same results.
#
# Waiter interface allows for `w.jobs.submit(..).result()` simplicity in
# the scenarios, where you need to block the calling thread for the job to finish.
run = waiter.result(timeout=datetime.timedelta(minutes=15),
                    callback=print_status)

logging.info(f'job finished: {run.run_page_url}')
```
On the platform side, the Databricks APIs have different ways to deal with pagination:
- Some APIs follow the offset-plus-limit pagination
- Some start their offsets from 0 and some from 1
- Some use the cursor-based iteration
- Others just return all results in a single response
The Databricks SDK for Python hides this complexity under the Iterator[T] abstraction, where multi-page results yield items. Python typing helps to auto-complete the individual item fields.
```python
import logging
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for repo in w.repos.list():
    logging.info(f'Found repo: {repo.path}')
```
Please look at the examples/last_job_runs.py for a more advanced usage:
```python
import logging
from collections import defaultdict
from datetime import datetime, timezone
from databricks.sdk import WorkspaceClient

latest_state = {}
all_jobs = {}
durations = defaultdict(list)

w = WorkspaceClient()

for job in w.jobs.list():
    all_jobs[job.job_id] = job
    for run in w.jobs.list_runs(job_id=job.job_id, expand_tasks=False):
        durations[job.job_id].append(run.run_duration)
        if job.job_id not in latest_state:
            latest_state[job.job_id] = run
            continue
        if run.end_time < latest_state[job.job_id].end_time:
            continue
        latest_state[job.job_id] = run

summary = []
for job_id, run in latest_state.items():
    summary.append({
        'job_name': all_jobs[job_id].settings.name,
        'last_status': run.state.result_state,
        'last_finished': datetime.fromtimestamp(run.end_time / 1000, timezone.utc),
        'average_duration': sum(durations[job_id]) / len(durations[job_id])
    })

for line in sorted(summary, key=lambda s: s['last_finished'], reverse=True):
    logging.info(f'Latest: {line}')
```
For a regular web app running on a server, it's recommended to use the Authorization Code Flow to obtain an Access Token and a Refresh Token. This method is considered safe because the Access Token is transmitted directly to the server hosting the app, without passing through the user's web browser and risking exposure.
To enhance the security of the Authorization Code Flow, the PKCE (Proof Key for Code Exchange) mechanism can be employed. With PKCE, the calling application generates a secret called the Code Verifier, which is verified by the authorization server. The app also creates a transform value of the Code Verifier, called the Code Challenge, and sends it over HTTPS to obtain an Authorization Code. Even if a malicious attacker intercepts the Authorization Code, they cannot exchange it for a token without also possessing the Code Verifier.
The presented sample is a Python 3 script that uses the Flask web framework along with the Databricks SDK for Python to demonstrate how to implement the OAuth Authorization Code flow with PKCE security. It can be used to build an app where each user uses their identity to access Databricks resources. The script can be executed with or without client and secret credentials for a custom OAuth app.
Databricks SDK for Python exposes the oauth_client.initiate_consent() helper to acquire a user redirect URL and initiate PKCE state verification. Application developers are expected to persist RefreshableCredentials in the webapp session and restore it via the RefreshableCredentials.from_dict(oauth_client, session['creds']) helpers.
Works for both AWS and Azure. Not supported for GCP at the moment.
```python
from databricks.sdk.oauth import OAuthClient

oauth_client = OAuthClient(host='<workspace-url>',
                           client_id='<oauth client ID>',
                           redirect_url=f'http://host.domain/callback',
                           scopes=['clusters'])

import secrets
from flask import Flask, render_template_string, request, redirect, url_for, session

APP_NAME = 'flask-demo'
app = Flask(APP_NAME)
app.secret_key = secrets.token_urlsafe(32)


@app.route('/callback')
def callback():
    from databricks.sdk.oauth import Consent
    consent = Consent.from_dict(oauth_client, session['consent'])
    session['creds'] = consent.exchange_callback_parameters(request.args).as_dict()
    return redirect(url_for('index'))


@app.route('/')
def index():
    if 'creds' not in session:
        consent = oauth_client.initiate_consent()
        session['consent'] = consent.as_dict()
        return redirect(consent.auth_url)

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.oauth import SessionCredentials

    credentials_provider = SessionCredentials.from_dict(oauth_client, session['creds'])
    workspace_client = WorkspaceClient(host=oauth_client.host,
                                       product=APP_NAME,
                                       credentials_provider=credentials_provider)

    return render_template_string('...', w=workspace_client)
```
For applications that run on developer workstations, the Databricks SDK for Python provides the auth_type='external-browser' utility, which opens up a browser for the user to go through the SSO flow. Azure support is still in the early experimental stage.
```python
from databricks.sdk import WorkspaceClient

host = input('Enter Databricks host: ')

w = WorkspaceClient(host=host, auth_type='external-browser')
clusters = w.clusters.list()

for cl in clusters:
    print(f' - {cl.cluster_name} is {cl.state}')
```
In order to use OAuth with the Databricks SDK for Python, you should use the account_client.custom_app_integration.create API.
```python
import logging, getpass
from databricks.sdk import AccountClient

account_client = AccountClient(host='https://accounts.cloud.databricks.com',
                               account_id=input('Databricks Account ID: '),
                               username=input('Username: '),
                               password=getpass.getpass('Password: '))

logging.info('Enrolling all published apps...')
account_client.o_auth_enrollment.create(enable_all_published_apps=True)

status = account_client.o_auth_enrollment.get()
logging.info(f'Enrolled all published apps: {status}')

custom_app = account_client.custom_app_integration.create(
    name='awesome-app',
    redirect_urls=[f'https://host.domain/path/to/callback'],
    confidential=True)
logging.info(f'Created new custom app: '
             f'--client_id {custom_app.client_id} '
             f'--client_secret {custom_app.client_secret}')
```
The Databricks SDK for Python uses the User-Agent header to include request metadata along with each request. By default, this includes the version of the Python SDK, the version of the Python language used by your application, and the underlying operating system. To statically add additional metadata, you can use the with_partner() and with_product() functions in the databricks.sdk.useragent module. with_partner() can be used by partners to indicate that code using the Databricks SDK for Python should be attributed to a specific partner. Multiple partners can be registered at once. Partner names can contain letters, digits, ., -, _ or +.
```python
from databricks.sdk import useragent

useragent.with_partner("partner-abc")
useragent.with_partner("partner-xyz")
```
with_product() can be used to define the name and version of the product that is built with the Databricks SDK for Python. The product name has the same restrictions as the partner name above, and the product version must be a valid SemVer. Subsequent calls to with_product() replace the original product with the new user-specified one.
```python
from databricks.sdk import useragent

useragent.with_product("databricks-example-product", "1.2.0")
```
If both the DATABRICKS_SDK_UPSTREAM and DATABRICKS_SDK_UPSTREAM_VERSION environment variables are defined, these will also be included in the User-Agent header.
If additional metadata needs to be specified that isn't already supported by the above interfaces, you can use the with_user_agent_extra() function to register arbitrary key-value pairs to include in the user agent. Multiple values associated with the same key are allowed. Keys have the same restrictions as the partner name above. Values must be either as described above or SemVer strings.
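For example, a minimal sketch of registering an extra key-value pair; the key and value shown are arbitrary illustrations:

```python
from databricks.sdk import useragent

# Append an arbitrary key-value pair to the User-Agent header for all subsequent requests.
useragent.with_user_agent_extra('integration-example', '1.0.0')
```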
Additional User-Agent information can be associated with different instances of DatabricksConfig. To add metadata to a specific instance of DatabricksConfig, use the with_user_agent_extra() method.
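A minimal sketch, assuming the configuration class in question is databricks.sdk.core.Config and that it exposes the same with_user_agent_extra() method:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.core import Config

# Attach extra User-Agent metadata to this configuration instance only.
config = Config(profile='MYPROFILE')
config.with_user_agent_extra('integration-example', '1.0.0')

w = WorkspaceClient(config=config)
```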
The Databricks SDK for Python provides a robust error-handling mechanism that allows developers to catch and handle API errors. When an error occurs, the SDK will raise an exception that contains information about the error, such as the HTTP status code, error message, and error details. Developers can catch these exceptions and handle them appropriately in their code.
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import ResourceDoesNotExist

w = WorkspaceClient()

try:
    w.clusters.get(cluster_id='1234-5678-9012')
except ResourceDoesNotExist as e:
    print(f'Cluster not found: {e}')
```
The SDK handles inconsistencies in error responses amongst the different services, providing a consistent interface for developers to work with. Simply catch the appropriate exception type and handle the error as needed. The errors returned by the Databricks API are defined in databricks/sdk/errors/platform.py.
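For example, a minimal sketch that falls back to a generic handler when no more specific error type applies; this assumes DatabricksError is the base exception exported from databricks.sdk.errors:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import DatabricksError, ResourceDoesNotExist

w = WorkspaceClient()

try:
    w.clusters.get(cluster_id='1234-5678-9012')
except ResourceDoesNotExist:
    print('Cluster not found')
except DatabricksError as e:
    # Any other error returned by the Databricks API.
    print(f'API error: {e}')
```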
The Databricks SDK for Python seamlessly integrates with the standard Logging facility for Python. This allows developers to easily enable and customize logging for their Databricks Python projects. To enable debug logging in your Databricks Python project, you can follow the example below:
```python
import logging, sys

logging.basicConfig(stream=sys.stderr,
                    level=logging.INFO,
                    format='%(asctime)s [%(name)s][%(levelname)s] %(message)s')
logging.getLogger('databricks.sdk').setLevel(logging.DEBUG)

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(debug_truncate_bytes=1024, debug_headers=False)
for cluster in w.clusters.list():
    logging.info(f'Found cluster: {cluster.cluster_name}')
```
In the above code snippet, the logging module is imported and the basicConfig() method is used to set the logging level to DEBUG. This will enable logging at the debug level and above. Developers can adjust the logging level as needed to control the verbosity of the logging output. The SDK will log all requests and responses to standard error output, using the format > for requests and < for responses. In some cases, requests or responses may be truncated due to size considerations. If this occurs, the log message will include the text ... (XXX additional elements) to indicate that the request or response has been truncated. To increase the truncation limits, developers can set the debug_truncate_bytes configuration property or the DATABRICKS_DEBUG_TRUNCATE_BYTES environment variable. To protect sensitive data, such as authentication tokens, passwords, or any HTTP headers, the SDK will automatically replace these values with **REDACTED** in the log output. Developers can disable this redaction by setting the debug_headers configuration property to True.
```
2023-03-22 21:19:21,702 [databricks.sdk][DEBUG] GET /api/2.0/clusters/list
< 200 OK
< {
<   "clusters": [
<     {
<       "autotermination_minutes": 60,
<       "cluster_id": "1109-115255-s1w13zjj",
<       "cluster_name": "DEFAULT Test Cluster",
<       ... truncated for brevity
<     },
<     "... (47 additional elements)"
<   ]
< }
```

Overall, the logging capabilities provided by the Databricks SDK for Python can be a powerful tool for monitoring and troubleshooting your Databricks Python projects. Developers can use the various logging methods and configuration options provided by the SDK to customize the logging output to their specific needs.
You can use the client-side implementation of dbutils by accessing the dbutils property on the WorkspaceClient. Most of the dbutils.fs operations and dbutils.secrets are implemented natively in Python within the Databricks SDK. Non-SDK implementations still require a Databricks cluster, which you have to specify through the cluster_id configuration attribute or the DATABRICKS_CLUSTER_ID environment variable. Don't worry if the cluster is not running: internally, the Databricks SDK for Python calls w.clusters.ensure_cluster_is_running().
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
dbutils = w.dbutils

files_in_root = dbutils.fs.ls('/')
print(f'number of files in root: {len(files_in_root)}')
```
Alternatively, you can import dbutils from the databricks.sdk.runtime module, but you have to make sure that all configuration is already present in the environment variables:
```python
from databricks.sdk.runtime import dbutils

for secret_scope in dbutils.secrets.listScopes():
    for secret_metadata in dbutils.secrets.list(secret_scope.name):
        print(f'found {secret_metadata.key} secret in {secret_scope.name} scope')
```
Databricks is actively working on stabilizing the Databricks SDK for Python's interfaces. API clients for all services are generated from specification files that are synchronized from the main platform. You are highly encouraged to pin the exact dependency version and read the changelog, where Databricks documents the changes. Databricks may have minor documented backward-incompatible changes, such as renaming some type names to bring more consistency.