🦈 pyjaws

  • PyJaws enables declaring Databricks Jobs and Workflows as Python code, allowing for code linting, formatting, parameter validation, modularity and reusability.

  • In addition, PyJaws provides some nice features out of the box, such as cycle detection.
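To make the idea of cycle detection concrete, here is a minimal sketch of how a cycle among task dependencies can be found with a depth-first search. It is not PyJaws' own implementation, which is not shown in this document; the function name `find_cycle` and the plain-dict representation of task keys are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task keys, or None if there is none.

    `dependencies` maps a task key to the keys of the tasks it depends on.
    Hypothetical helper for illustration; not part of the PyJaws API.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {task: WHITE for task in dependencies}
    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GRAY
        path.append(task)
        for upstream in dependencies.get(task, []):
            state = color.get(upstream, WHITE)
            if state == GRAY:
                # The upstream task is already on the current path: a cycle.
                return path[path.index(upstream):] + [upstream]
            if state == WHITE:
                cycle = visit(upstream)
                if cycle:
                    return cycle
        path.pop()
        color[task] = BLACK
        return None

    for task in list(dependencies):
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None
```

Calling `find_cycle({"a": ["b"], "b": ["a"]})` reports the loop between the two tasks, while a linear chain such as `{"ingest": [], "transform": ["ingest"]}` yields `None`.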

Folks who have used Python-based orchestration tools such as Apache Airflow, Luigi and Mage will be familiar with the concepts and the API of PyJaws.

PyJaws Mascot - A Shark on Steroids!

Project Homepage

Getting Started

  • First step is installing pyjaws:

    pip install pyjaws
  • Once it’s installed, define your Databricks Workspace authentication variables:

    export DATABRICKS_HOST=...
    export DATABRICKS_TOKEN=...

Last, define your Workflow Tasks (see examples) and run:

pyjaws create path/to/your/workflow_definitions
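Before running the command, it can help to confirm that both authentication variables are actually set in the current environment. The following is a minimal sketch using only the Python standard library; the helper name `missing_databricks_vars` is hypothetical and not part of PyJaws.

```python
import os
from typing import List, Mapping

# The two variables named in the setup steps above.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`.

    Hypothetical pre-flight check for illustration; not part of the PyJaws API.
    """
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both variables are present; otherwise the returned names tell you which `export` is still missing.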

Sample Job Definition

Below you can find a simple PyJaws job definition:

from pyjaws.api.base import (
    Cluster,
    Runtime,
    Workflow,
)
from pyjaws.api.tasks import PythonWheelTask

cluster = Cluster(
    job_cluster_key="ai_cluster",
    spark_version=Runtime.DBR_13_ML,
    num_workers=2,
    node_type_id="Standard_DS3_v2",
    cluster_log_conf={
        "dbfs": {
            "destination": "dbfs:/home/cluster_log"
        }
    },
)

# Create a Task object.
ingest_task = PythonWheelTask(
    key="ingest",
    cluster=cluster,
    entrypoint="iot",
    task_name="ingest",
    parameters=[
        "my_parameter_value",
        "--output-table", "my_table",
    ],
)

# A second task that runs after ingest_task, declared via `dependencies`.
transform_task = PythonWheelTask(
    key="transform",
    cluster=cluster,
    entrypoint="iot",
    task_name="ingest",
    dependencies=[ingest_task],
    parameters=[
        "my_parameter_value2",
        "--input-table", "my_table",
        "--output-table", "output_table",
    ],
)

# Create a Workflow object to define dependencies
# between previously defined tasks.
workflow = Workflow(
    name="my_workflow",
    tasks=[ingest_task, transform_task],
)
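The `dependencies` argument in the sample above implies an execution order: `transform` can only start once `ingest` has finished. As a rough illustration of that ordering, and not of how PyJaws or Databricks schedules tasks internally, the sketch below feeds the same two task keys into the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `run_order` helper and the dict layout are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Task key -> keys of the tasks it depends on, mirroring the sample job.
dependencies: Dict[str, Set[str]] = {
    "ingest": set(),
    "transform": {"ingest"},
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return the task keys in an order that respects every dependency.

    Hypothetical helper for illustration; raises graphlib.CycleError
    if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```

For the sample job, `run_order(dependencies)` places `ingest` before `transform`.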

API Reference

If you are looking for information on a specific function, class, or method, this part of the documentation is for you.