forcedotcom/datacloud-customcode-python-sdk
This package provides a development kit for creating custom data transformations in Data Cloud. It allows you to write your own data processing logic in Python while leveraging Data Cloud's infrastructure for data access and transformation execution, mapping your code onto Data Cloud data structures like Data Model Objects (DMOs) and Data Lake Objects (DLOs).
More specifically, this codebase gives you the ability to test code locally before pushing it to Data Cloud's remote execution engine, greatly reducing development time.
Use of this project with Salesforce is subject to the TERMS OF USE.
- Python 3.11 only (the currently supported version); if your system version is different, we recommend using pyenv to configure 3.11
- Azul Zulu OpenJDK 17.x
- Docker support, such as Docker Desktop
- A Salesforce org with some DLOs or DMOs containing data, and with this feature enabled (it is not GA)
- A connected app
The SDK can be downloaded directly from PyPI with pip:
```
pip install salesforce-data-customcode
```

You can verify it was properly installed via the CLI:
```
datacustomcode version
```

Ensure you have all the prerequisites prepared on your machine.
To get started, create a directory and initialize a new project with the CLI:
```
mkdir datacloud && cd datacloud
python3.11 -m venv .venv
source .venv/bin/activate
pip install salesforce-data-customcode
datacustomcode init my_package
```
This will yield all necessary files to get started:
```
.
├── Dockerfile
├── README.md
├── requirements-dev.txt
├── payload
│   ├── config.json
│   ├── entrypoint.py
├── jupyterlab.sh
└── requirements.txt
```

- `Dockerfile` (do not update) – Development container emulating the remote execution environment.
- `requirements-dev.txt` (do not update) – Dependencies for the development environment.
- `jupyterlab.sh` (do not update) – Helper script for setting up Jupyter.
- `requirements.txt` – Define the requirements your script needs here.
- `payload` – This folder will be compressed and deployed to the remote execution environment.
- `config.json` – Defines permissions on the backend and can be generated programmatically with the `scan` CLI command.
- `entrypoint.py` – The script that defines the data transformation logic.
A functional `entrypoint.py` is provided so you can run it once you've configured your connected app:
```
cd my_package
datacustomcode configure
datacustomcode run ./payload/entrypoint.py
```

Important
The example `entrypoint.py` requires an `Account_Home__dll` DLO to be present. In order to deploy the script (next step), the output DLO (`Account_Home_copy__dll` in the example `entrypoint.py`) also needs to exist and be in the same dataspace as `Account_Home__dll`.
After modifying `entrypoint.py` as needed, using any dependencies you add to the `.venv` virtual environment, you can run this script in Data Cloud:
To add new dependencies:
- Make sure your virtual environment is activated
- Add dependencies to `requirements.txt`
- Run `pip install -r requirements.txt`
- The SDK automatically packages all dependencies when you run `datacustomcode zip` (see the example `requirements.txt` below)
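For example, a `requirements.txt` that adds a couple of extra libraries might look like this (the package names and versions are only illustrative):

```
pandas==2.2.2
scikit-learn==1.5.0
```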
```
datacustomcode scan ./payload/entrypoint.py
datacustomcode deploy --path ./payload --name my_custom_script --cpu-size CPU_L
```
Tip
The `deploy` process can take several minutes. If you'd like more feedback on the underlying process, you can add `--debug` to the command, like `datacustomcode --debug deploy --path ./payload --name my_custom_script`.
Note
CPU Size: Choose the appropriate CPU/Compute Size based on your workload requirements:
- `CPU_L` / `CPU_XL` / `CPU_2XL` / `CPU_4XL`: Large, X-Large, 2X-Large, and 4X-Large CPU instances for data processing
- The default is `CPU_2XL`, which provides a good balance of performance and cost for most use cases
You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the Run Now button to run it. Once the Data Transform run is successful, check the DLO your script writes to and verify that the correct records were added.
The SDK automatically handles all dependency packaging for Data Cloud deployment. Here's how it works:
- Add dependencies to `requirements.txt` – List any Python packages your script needs
- Install locally – Use `pip install -r requirements.txt` in your virtual environment
- Automatic packaging – When you run `datacustomcode zip`, the SDK automatically:
  - Packages all dependencies from `requirements.txt`
  - Uses the correct platform and architecture for Data Cloud
No need to worry about platform compatibility - the SDK handles this automatically through the Docker-based packaging process.
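For instance, once your dependencies are installed you might package the project from its root directory with the defaults (add `--path` if your code lives elsewhere):

```
datacustomcode zip
```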
```
.
├── payload
│   ├── config.json
│   ├── entrypoint.py
├── files
│   ├── data.csv
```

Your Python dependencies can be packaged as .py files, .zip archives (containing multiple .py files or a Python package structure), or .egg files.
```
.
├── payload
│   ├── config.json
│   ├── entrypoint.py
├── py-files
│   ├── moduleA
│   │   ├── __init__.py
│   │   ├── moduleA.py
```

Your entrypoint script defines its logic using the `Client` object, which wraps the data access layers.
You should only need the following methods:
- `find_file_path(file_name)` – Returns a file path for a given file name
- `read_dlo(name)` – Read from a Data Lake Object by name
- `read_dmo(name)` – Read from a Data Model Object by name
- `write_to_dlo(name, spark_dataframe, write_mode)` – Write a Spark dataframe to a Data Lake Object by name
- `write_to_dmo(name, spark_dataframe, write_mode)` – Write a Spark dataframe to a Data Model Object by name
For example:
```python
from datacustomcode import Client

client = Client()
sdf = client.read_dlo('my_DLO')

# some transformations
# ...

# pass the transformed dataframe and a write mode (the value shown is illustrative)
client.write_to_dlo('output_DLO', sdf, 'overwrite')
```

Warning
Currently we only support reading from DMOs and writing to DMOs, or reading from DLOs and writing to DLOs; the two cannot be mixed.
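For completeness, here is a minimal sketch of the DMO-to-DMO path, assuming a source DMO named my_DMO and an existing target DMO named output_DMO (both names, and the write mode value, are placeholders):

```python
from datacustomcode import Client

client = Client()

# Read a Data Model Object into a Spark DataFrame
sdf = client.read_dmo('my_DMO')

# ... apply your transformations here ...

# Write the result to another DMO (the write mode value is illustrative)
client.write_to_dmo('output_DMO', sdf, 'overwrite')
```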
The Data Cloud Custom Code SDK provides a command-line interface (CLI) with the following commands:
- `--debug`: Enable debug-level logging
`datacustomcode version` – Display the current version of the package.
`datacustomcode configure` – Configure credentials for connecting to Data Cloud.
Options:
- `--profile TEXT`: Credential profile name (default: "default")
- `--username TEXT`: Salesforce username
- `--password TEXT`: Salesforce password
- `--client-id TEXT`: Connected App Client ID
- `--client-secret TEXT`: Connected App Client Secret
- `--login-url TEXT`: Salesforce login URL
- `--dataspace TEXT`: Dataspace name (optional, for non-default dataspaces)
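For example, a non-interactive configuration using these options might look like the following (all values are placeholders for your own credentials):

```
datacustomcode configure \
  --username me@example.com \
  --password "your-password" \
  --client-id "your-connected-app-client-id" \
  --client-secret "your-connected-app-client-secret" \
  --login-url https://login.salesforce.com
```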
`datacustomcode init` – Initialize a new development environment with a template.
Argument:
- `DIRECTORY`: Directory to create the project in (default: ".")
`datacustomcode scan` – Scan a Python file to generate a Data Cloud configuration.
Argument:
- `FILENAME`: Python file to scan
Options:
- `--config TEXT`: Path to save the configuration file (default: same directory as FILENAME)
- `--dry-run`: Preview the configuration without saving to a file
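For example, to preview the configuration that would be generated for the quick-start entrypoint without writing a file:

```
datacustomcode scan ./payload/entrypoint.py --dry-run
```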
`datacustomcode run` – Run an entrypoint file locally for testing.
Argument:
- `ENTRYPOINT`: Path to the entrypoint Python file
Options:
- `--config-file TEXT`: Path to configuration file
- `--dependencies TEXT`: Additional dependencies (can be specified multiple times)
- `--profile TEXT`: Credential profile name (default: "default")
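For example, to run the quick-start entrypoint locally against a non-default credential profile (the profile name is a placeholder):

```
datacustomcode run ./payload/entrypoint.py --profile my_profile
```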
`datacustomcode zip` – Zip a transformation job in preparation for upload to Data Cloud.
Options:
- `--path TEXT`: Path to the code directory (default: ".")
- `--network TEXT`: Docker network (default: "default")
`datacustomcode deploy` – Deploy a transformation job to Data Cloud.
Options:
- `--profile TEXT`: Credential profile name (default: "default")
- `--path TEXT`: Path to the code directory (default: ".")
- `--name TEXT`: Name of the transformation job [required]
- `--version TEXT`: Version of the transformation job (default: "0.0.1")
- `--description TEXT`: Description of the transformation job (default: "")
- `--network TEXT`: Docker network (default: "default")
- `--cpu-size TEXT`: CPU size for the deployment (default: "CPU_XL"). Available options: CPU_L (Large), CPU_XL (Extra Large), CPU_2XL (2X Large), CPU_4XL (4X Large)
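For example, a deployment that pins a version, adds a description, and selects a smaller CPU size (the name, version, and description values are placeholders):

```
datacustomcode deploy \
  --path ./payload \
  --name my_custom_script \
  --version 0.0.2 \
  --description "Copies Account_Home__dll records" \
  --cpu-size CPU_L
```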
The SDK provides Docker-based development options that allow you to test your code in an environment that closely resembles Data Cloud's execution environment.
When you initialize a project with `datacustomcode init my_package`, a `Dockerfile` is created automatically. This Dockerfile:
- Isn't used during local development with virtual environments
- Becomes active during packaging when you run `datacustomcode zip` or `deploy`
- Ensures compatibility by using the same base image as Data Cloud
- Handles dependencies automatically regardless of platform differences
Within your initialized package, you will find a `.devcontainer` folder, which allows you to run a Docker container and develop inside of it.
Read more about Dev Containers here: https://code.visualstudio.com/docs/devcontainers/containers.
- Install the VS Code extension "Dev Containers" by Microsoft.
- Open your package folder in VS Code, ensuring that the `.devcontainer` folder is at the root of the File Explorer
- Bring up the Command Palette (on Mac: Cmd + Shift + P) and select "Dev Containers: Rebuild and Reopen in Container"
- Allow the Docker image to build, and then you're ready to develop
Once inside the Dev Container:
- Terminal access: Open a terminal within the container
- Run your code: Execute `datacustomcode run ./payload/entrypoint.py`
- Environment consistency: Your code will run inside a Docker container that more closely resembles Data Cloud compute than your local machine
Tip
IDE Configuration: Use Cmd+Shift+P (or Ctrl+Shift+P on Windows/Linux), then select "Python: Select Interpreter" to configure the correct Python interpreter.
Important
Dev Containers get their own tmp file storage, so you'll need to re-run `datacustomcode configure` every time you "Rebuild and Reopen in Container".
Within your initialized package, you will find a `jupyterlab.sh` file that can open a Jupyter notebook for you. Jupyter notebooks, in combination with Data Cloud's Query Editor and Data Explorer, can be extremely helpful for data exploration. Instead of running an entire script, you can run one code cell at a time as you discover and experiment with the DLO or DMO data.
You can read more about Jupyter Notebooks here: https://jupyter.org/
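As a rough sketch, a single exploratory cell might read a DLO and preview a few rows (the DLO name is taken from the quick-start example; `show()` is a standard Spark DataFrame method):

```python
from datacustomcode import Client

client = Client()

# Load a DLO into a Spark DataFrame and preview the first rows
sdf = client.read_dlo('Account_Home__dll')
sdf.show(5)
```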
- From the root of your package folder, run `./jupyterlab.sh start`
- Double-click the "account.ipynb" file, which provides a starting point for a notebook
- Use Shift+Enter to execute each cell within the notebook. Add, edit, or delete cells of code as needed for your data exploration.
- Don't forget to run `./jupyterlab.sh stop` to stop the Docker container
Important
JupyterLab uses its own tmp file storage, so you'll need to re-run `datacustomcode configure` each time you run `./jupyterlab.sh start`.
- Log in to Salesforce as an admin. In the top right corner, click on the gear icon and go to Setup
- On the left hand side, search for "App Manager" and select App Manager underneath Apps
- Click on New Connected App in the upper right
- Fill in the required fields within the Basic Information section
- Under the API (Enable OAuth Settings) section:
  - Click the checkbox to Enable OAuth Settings
  - Provide a callback URL like http://localhost:55555/callback
  - In the Selected OAuth Scopes, make sure that refresh_token, api, cdp_query_api, and cdp_profile_api are selected
  - Click Save to save the connected app
- From the detail page that opens afterwards, click the "Manage Consumer Details" button to find your client ID and client secret
- Go back to Setup, then OAuth and OpenID Connect Settings, and enable the "Allow OAuth Username-Password Flows" option
You now have all the fields necessary for the `datacustomcode configure` command.
If you're working with a non-default dataspace in Salesforce Data Cloud, you can specify the dataspace during configuration:
```
datacustomcode configure --dataspace my-dataspace
```
For default dataspaces, you can omit the `--dataspace` parameter entirely; the SDK will connect to the default dataspace automatically.