🦈 pyjaws¶
PyJaws enables declaring Databricks Jobs and Workflows as Python code, allowing for code linting, formatting, parameter validation, modularity, and reusability.
In addition, PyJaws provides some nice features out of the box, such as cycle detection (see the sketch after the sample job definition below).
Folks who have used Python-based orchestration tools such as Apache Airflow, Luigi, and Mage will be familiar with the concepts and the API of PyJaws.

Project Homepage¶
Getting Started¶
The first step is installing pyjaws:
pip install pyjaws
Once it’s installed, define your Databricks Workspace authentication variables:
export DATABRICKS_HOST=...
export DATABRICKS_TOKEN=...
Last, define your Workflow Tasks (see examples) and run:
pyjaws create path/to/your/workflow_definitions
Sample Job Definition¶
Below you can find a simple PyJaws job definition:
from pyjaws.api.base import Cluster, Runtime, Workflow
from pyjaws.api.tasks import PythonWheelTask

cluster = Cluster(
    job_cluster_key="ai_cluster",
    spark_version=Runtime.DBR_13_ML,
    num_workers=2,
    node_type_id="Standard_DS3_v2",
    cluster_log_conf={"dbfs": {"destination": "dbfs:/home/cluster_log"}},
)

# Create a Task object.
ingest_task = PythonWheelTask(
    key="ingest",
    cluster=cluster,
    entrypoint="iot",
    task_name="ingest",
    parameters=["my_parameter_value", "--output-table", "my_table"],
)

transform_task = PythonWheelTask(
    key="transform",
    cluster=cluster,
    entrypoint="iot",
    task_name="ingest",
    dependencies=[ingest_task],
    parameters=[
        "my_parameter_value2",
        "--input-table",
        "my_table",
        "--output-table",
        "output_table",
    ],
)

# Create a Workflow object to define dependencies
# between previously defined tasks.
workflow = Workflow(
    name="my_workflow",
    tasks=[ingest_task, transform_task],
)
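Because tasks declare their dependencies explicitly, PyJaws can validate the resulting task graph, including the cycle detection mentioned in the introduction. The snippet below is a minimal sketch rather than an excerpt from the PyJaws documentation: it assumes that a task's dependencies attribute can be reassigned after creation and that the cycle is rejected when the Workflow is built (or at the latest when pyjaws create processes it); the exact exception type is also an assumption.

from pyjaws.api.base import Cluster, Runtime, Workflow
from pyjaws.api.tasks import PythonWheelTask

cluster = Cluster(
    job_cluster_key="ai_cluster",
    spark_version=Runtime.DBR_13_ML,
    num_workers=2,
    node_type_id="Standard_DS3_v2",
)

# Two tasks that (incorrectly) depend on each other.
task_a = PythonWheelTask(
    key="task_a",
    cluster=cluster,
    entrypoint="iot",
    task_name="task_a",
    parameters=[],
)
task_b = PythonWheelTask(
    key="task_b",
    cluster=cluster,
    entrypoint="iot",
    task_name="task_b",
    dependencies=[task_a],
    parameters=[],
)

# Reassigning the dependency list after creation closes the loop:
# task_a -> task_b -> task_a. (Assumes the attribute is mutable.)
task_a.dependencies = [task_b]

try:
    # Assumption: the cyclic graph is rejected here instead of being
    # deployed as a broken job to Databricks.
    workflow = Workflow(name="cyclic_workflow", tasks=[task_a, task_b])
except Exception as exc:
    print(f"Cycle detected and rejected: {exc}")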
API Reference¶
If you are looking for information on a specific function, class, or method, this part of the documentation is for you.