
Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS


aws/aws-step-functions-data-science-sdk-python


AWS Step Functions Data Science SDK

The AWS Step Functions Data Science SDK is an open-source library that allows data scientists to easily create workflows that process and publish machine learning models using Amazon SageMaker and AWS Step Functions. You can create machine learning workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately.

  • Workflow - A sequence of steps designed to perform some work
  • Step - A unit of work within a workflow
  • ML Pipeline - A type of workflow used in data science to create and train machine learning models

The AWS Step Functions Data Science SDK enables you to do the following.

  • Easily construct and run machine learning workflows that use AWS infrastructure directly in Python
  • Instantiate common training pipelines
  • Create standard machine learning workflows in a Jupyter notebook from templates


Getting Started With Sample Jupyter Notebooks

The best way to quickly review how the AWS Step Functions Data Science SDK works is to review the related example notebooks. These notebooks provide code and descriptions for creating and running workflows in AWS Step Functions using the AWS Step Functions Data Science SDK.

Example Notebooks in SageMaker

In Amazon SageMaker, example Jupyter notebooks are available in the example notebooks portion of a notebook instance. To run the example notebooks, do the following.

  1. Either Create a Notebook Instance or Access an Existing notebook instance.
  2. Select the SageMaker Examples tab.
  3. Choose a notebook in the Step Functions Data Science SDK section and select Use.

For more information, see Example Notebooks in the Amazon SageMaker documentation.

Run Example Notebooks Locally

To run the AWS Step Functions Data Science SDK example notebooks locally, download the sample notebooks and open them in a working Jupyter instance.

  1. Install Jupyter: https://jupyter.readthedocs.io/en/latest/install.html
  2. Download the following files from https://github.com/awslabs/amazon-sagemaker-examples/tree/master/step-functions-data-science-sdk:
     • hello_world_workflow.ipynb
     • machine_learning_workflow_abalone.ipynb
     • training_pipeline_pytorch_mnist.ipynb
  3. Open the files in Jupyter.

Installing the AWS Step Functions Data Science SDK

The AWS Step Functions Data Science SDK is published on PyPI and can be installed with pip as follows.

pip install stepfunctions

You can install from source by cloning this repository and running a pip install command in the root directory of the repository:

git clone https://github.com/aws/aws-step-functions-data-science-sdk-python.git
cd aws-step-functions-data-science-sdk-python
pip install .
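
After either install route, a quick check can confirm that the package is visible to the interpreter you plan to use. The sketch below is an illustration, not part of the SDK: the helper name installed_version is hypothetical, and it relies on importlib.metadata, which assumes Python 3.8 or later.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(distribution="stepfunctions"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version or "stepfunctions is not installed in this environment")
```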

Supported Operating Systems

The AWS Step Functions Data Science SDK supports Unix/Linux and Mac.

Supported Python Versions

The AWS Step Functions Data Science SDK is tested on:

  • Python 2.7
  • Python 3.6

Overview of SDK

The AWS Step Functions Data Science SDK provides a Python API that enables you to create data science and machine learning workflows using AWS Step Functions and SageMaker directly in your Python code and Jupyter notebooks.

Using this SDK you can:

  1. Create steps that accomplish tasks.
  2. Chain those steps together into workflows.
  3. Include retry, succeed, or fail steps.
  4. Review a graphical representation and definition for your workflow.
  5. Create a workflow in AWS Step Functions.
  6. Start and review executions in AWS Step Functions.

For a detailed API reference of the AWS Step Functions Data Science SDK, be sure to view this documentation on Read the Docs.

AWS Step Functions

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Using Step Functions, you can design and run workflows that combine services such as Amazon SageMaker, AWS Lambda, and Amazon Elastic Container Service (Amazon ECS), into feature-rich applications. Workflows are made up of a series of steps, with the output of one step acting as input to the next.

The AWS Step Functions Data Science SDK provides access to AWS Step Functions so that you can easily create and run machine learning and data science workflows directly in Python, and inside your Jupyter Notebooks. Workflows are created locally in Python, but when they are ready for execution, the workflow is first uploaded to the AWS Step Functions service for execution in the cloud.

When you use the SDK to create, update, or execute workflows you are talking to the Step Functions service in the cloud. Your workflows live in AWS Step Functions and can be re-used.

You can execute a workflow as many times as you want, and you can optionally change the input each time. Each time you execute a workflow, it creates a new execution instance in the cloud. You can inspect these executions with SDK commands, or with the Step Functions management console. You can run more than one execution at a time.

Using this SDK you can create steps, chain them together to create a workflow, create that workflow in AWS Step Functions, and execute the workflow in the AWS cloud.

Create a workflow in AWS Step Functions

Once you have created your workflow in AWS Step Functions, you can execute that workflow in Step Functions, in the AWS cloud.

Start a workflow in AWS Step Functions

Step Functions creates workflows out of steps called States, and expresses that workflow in the Amazon States Language. When you create a workflow in the AWS Step Functions Data Science SDK, it creates a State Machine representing your workflow and steps in AWS Step Functions.

For more information about Step Functions concepts and use, see the Step Functions documentation.
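
To make the idea of states expressed in the Amazon States Language more concrete, here is a minimal hand-written sketch that links two illustrative states into a definition using only the standard library. The helper chain_states and the state names are assumptions made for this example; the SDK builds its definitions for you, and its exact output may differ in field order or detail.

```python
import json


def chain_states(states):
    """Link (name, body) pairs in order and return an ASL-style definition."""
    if not states:
        raise ValueError("a workflow needs at least one state")
    definition = {"StartAt": states[0][0], "States": {}}
    for index, (name, body) in enumerate(states):
        state = dict(body)  # copy so the caller's dict is not modified
        if index + 1 < len(states):
            state["Next"] = states[index + 1][0]  # point at the following state
        else:
            state["End"] = True  # the last state terminates the workflow
        definition["States"][name] = state
    return definition


definition = chain_states([
    ("MyPassState", {"Type": "Pass"}),
    ("Wait for 3 seconds", {"Type": "Wait", "Seconds": 3}),
])
print(json.dumps(definition, indent=2))
```

Each state names its successor with Next, and the final state carries End, which is how a linear sequence of steps is represented in the definition.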

Building a Workflow

Steps

You create steps using the SDK, and chain them together into sequential workflows. Then, you can create those workflows in AWS Step Functions and execute them in Step Functions directly from your Python code. For example, the following is how you define a pass step.

from stepfunctions.steps import Pass

start_pass_state = Pass(state_id="MyPassState")

The following is how you define a wait step.

from stepfunctions.steps import Wait

wait_state = Wait(state_id="Wait for 3 seconds", seconds=3)

The following example shows how to define a Lambda step,and then defines a Retry and a Catch.

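from stepfunctions.steps import Catch, Fail, LambdaStep, Retry

lambda_state = LambdaStep(
    state_id="Convert HelloWorld to Base64",
    parameters={
        "FunctionName": "MyLambda",  # replace with the name of your function
        "Payload": {
            "input": "HelloWorld"
        }
    }
)

# Retry a failed task up to twice, waiting longer before each attempt.
lambda_state.add_retry(Retry(
    error_equals=["States.TaskFailed"],
    interval_seconds=15,
    max_attempts=2,
    backoff_rate=4.0
))

# If the task still fails, route the workflow to a Fail state.
lambda_state.add_catch(Catch(
    error_equals=["States.TaskFailed"],
    next_step=Fail("LambdaTaskFailed")
))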
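
The retry settings above translate into a concrete wait schedule. In the Amazon States Language, the first retry waits for the interval, and each later retry multiplies the previous wait by the backoff rate. The sketch below works through that arithmetic; the helper retry_delays is a hypothetical name for this illustration and is not part of the SDK.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Return the wait, in seconds, before each retry attempt."""
    return [
        interval_seconds * backoff_rate ** attempt
        for attempt in range(max_attempts)
    ]


# Values taken from the Retry example: 15 s interval, 2 attempts, 4.0 backoff.
delays = retry_delays(interval_seconds=15, max_attempts=2, backoff_rate=4.0)
print(delays)       # [15.0, 60.0]
print(sum(delays))  # 75.0 seconds of waiting before the Catch takes over
```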

Workflows

After you define these steps, chain them together into a logical sequence.

from stepfunctions.steps import Chain

workflow_definition = Chain([start_pass_state, wait_state, lambda_state])

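Once the steps are chained together, you can define the workflow.

from stepfunctions.workflow import Workflow

# stepfunctions_execution_role is the ARN of the IAM role that Step Functions assumes.
workflow = Workflow(
    name="MyWorkflow_v1234",
    definition=workflow_definition,
    role=stepfunctions_execution_role
)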

Visualizing a Workflow

The following generates a graphical representation of your workflow.

workflow.render_graph(portrait=False)

Review a Workflow Definition

The following renders the JSON of theAmazon States Languagedefinition of the workflow you created.

print(workflow.definition.to_json(pretty=True))

Running a Workflow

Create Workflow on AWS Step Functions

The following creates the workflow in AWS Step Functions.

workflow.create()

Execute the Workflow

The following starts an execution of your workflow in AWS Step Functions.

execution = workflow.execute(inputs={"IsHelloWorldExample": True})
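
Execution input for Step Functions is a JSON document, so the dictionary passed as inputs should contain only JSON-serializable values. The following sketch, which uses a hypothetical helper named to_execution_input, shows one way to check an input locally before starting an execution. It is an illustration under that assumption and does not call the SDK.

```python
import json


def to_execution_input(inputs):
    """Serialize a dict to JSON text, failing early on unsupported values."""
    if not isinstance(inputs, dict):
        raise TypeError("execution inputs should be a dict")
    # json.dumps raises TypeError for values such as sets or open files.
    return json.dumps(inputs)


payload = to_execution_input({"IsHelloWorldExample": True})
print(payload)  # {"IsHelloWorldExample": true}
```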

Export an AWS CloudFormation Template

The following generates an AWS CloudFormation Template to deploy your workflow.

workflow.get_cloudformation_template()

The generated template contains only the StateMachine resource. To reuse the CloudFormation template in a different region, please make sure to update the region specific AWS resources (such as the Lambda ARN and Training Image) in the StateMachine definition.
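
Before reusing a template in another region, it can help to list the region-specific ARNs it contains. The sketch below scans template text with a regular expression; the helper regional_arns, the sample text, and the account ID are all hypothetical. It only finds ARNs that carry a region, so other region-specific values such as training image URIs still need a manual review.

```python
import re

# Matches ARNs with a non-empty region field, e.g. us-east-1 or us-gov-west-1.
REGIONAL_ARN = re.compile(
    r"arn:aws[a-z-]*:[a-z0-9-]+:(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+):[^\s\"']*"
)


def regional_arns(template_text):
    """Return (arn, region) pairs for every region-specific ARN in the text."""
    return [
        (match.group(0), match.group("region"))
        for match in REGIONAL_ARN.finditer(template_text)
    ]


sample = (
    '{"Resource": "arn:aws:lambda:us-east-1:123456789012:function:MyLambda", '
    '"Role": "arn:aws:iam::123456789012:role/MyRole"}'
)
for arn, region in regional_arns(sample):
    print(region, arn)
```

The IAM role ARN in the sample is skipped because IAM ARNs have an empty region field.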

AWS Permissions

As a managed service, AWS Step Functions performs operations on your behalf on AWS hardware that is managed by AWS Step Functions. AWS Step Functions can perform only operations that the user permits. You can read more about which permissions are necessary in the AWS Documentation.

The AWS Step Functions Data Science SDK should not require any additional permissions aside from what is required for using AWS Step Functions. However, if you are using an IAM role with a path in it, you should grant permission for iam:GetRole.
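
As an illustration of the iam:GetRole note, the sketch below builds a minimal IAM policy document as a Python dictionary. The helper name, the role ARN, and the account ID are hypothetical placeholders; adapt the resource to your own role and attach the policy through whatever process your account uses.

```python
import json


def get_role_policy(role_arn):
    """Build a policy document that allows iam:GetRole on a single role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:GetRole",
                "Resource": role_arn,
            }
        ],
    }


# Hypothetical role with a path ("service-role/") in its ARN.
policy = get_role_policy(
    "arn:aws:iam::123456789012:role/service-role/MyStepFunctionsRole"
)
print(json.dumps(policy, indent=2))
```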

Licensing

AWS Step Functions Data Science SDK is licensed under the Apache 2.0 License. It is copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/

Verifying the Signature

This section describes the recommended process of verifying the validity of the AWS Data Science Workflows Python SDK's compiled distributions on PyPI.

Whenever you download an application from the internet, we recommend that you authenticate the identity of the software publisher and check that the application is not altered or corrupted since it was published. This protects you from installing a version of the application that contains a virus or other malicious code.

If after running the steps in this topic, you determine that the distribution for the AWS Data Science Workflows Python SDK is altered or corrupted, do NOT install the package. Instead, contact AWS Support (https://aws.amazon.com/contact-us/).

AWS Data Science Workflows Python SDK distributions on PyPI are signed using GnuPG, an open source implementation of the Pretty Good Privacy (OpenPGP) standard for secure digital signatures. GnuPG (also known as GPG) provides authentication and integrity checking through a digital signature. For more information about PGP and GnuPG (GPG), see http://www.gnupg.org.

The first step is to establish trust with the software publisher. Download the public key of the software publisher, check that the owner of the public key is who they claim to be, and then add the public key to your keyring. Your keyring is a collection of known public keys. After you establish the authenticity of the public key, you can use it to verify the signature of the application.

Topics

  1. Installing the GPG Tools
  2. Authenticating and Importing the Public Key
  3. Verify the Signature of the Package

Installing the GPG Tools

If your operating system is Linux or Unix, the GPG tools are likely already installed. To test whether the tools are installed on your system, type gpg at a command prompt. If the GPG tools are installed, you see a GPG command prompt. If the GPG tools are not installed, you see an error stating that the command cannot be found. You can install the GnuPG package from a repository.

To install GPG tools on Debian-based Linux

From a terminal, run the following command:

apt-get install gnupg

To install GPG tools on Red Hat–based Linux

From a terminal, run the following command:

yum install gnupg

Authenticating and Importing the Public Key

The next step in the process is to authenticate the AWS Data Science Workflows Python SDK public key and add it as a trusted key in your GPG keyring.

To authenticate and import the AWS Data Science Workflows Python SDK public key

1. Copy the key from the following text and paste it into a file called data_science_workflows.key. Make sure to include everything that follows:

-----BEGIN PGP PUBLIC KEY BLOCK-----mQINBF27JXsBEAC18lOq7/SmynwuTJZdzoSaYzfPjt+3RN5oFLd9VY559sLb1aqVph+RPu35YOR0GbR76NQZV6p2OicunvjmvvOKXzud8nsV3gjcSCdxn22YwVDdFdx9N0dMOzo126kFIkubWNsBZDxzGsgIsku82+OKJbdSZyGEs7eOQCqieVpubnAk/pc5J4sqYDFhL2ijCIwAW6YUx4WEMq1ysVVcoNIo5J3+f1NzJZBvI9xwf+R2AnX06EZbFFIcX6kx5B8Sz6s4AI0EVFt9YOjtD+y6aBs3e63wx9etahq5No26NffNEve+pw3oFTU7sq6HxX/cE+ssJALAwV/3/1OiluZ/icePgYvsl8UWkkULsnHEImW2vZOe9UCw9CYb7lgqMCd9o14kQy0+SeTS3EdFH+ONRub4RMkdT7NV5wfzgD4WpSYban1YLJYxXLYRIopMzWuRLSUKMHzqsN48UlNwUVzvpPlcVIAotzQQbgFaeWlW1Fvv3awqaF7Qlnt0EBX5n71LJNDmpTRPtICnxcVsNXT1Uctk1mtzYwuMrxk0pDJZs06qPLwehwmO4A4bQCZ/1aVnXaauzshP7kzgPWG6kqOcSbn3VA/yhfDX/NBeY3Xg1ECDlFxmCrrVD7xqpZgVaztHbRIOr6ANKLMf72ZmqxiYayrFlLLOkJYtNCaC8igO5Baf2wARAQABtFBTdGVwZnVuY3Rpb25zLVB5dGhvbi1TREstU2lnbmluZyA8c3RlcGZ1bmN0aW9ucy1kZXZlbG9wZXItZXhwZXJpZW5jZUBhbWF6b24uY29tPokCVAQTAQgAPhYhBMwWBXe3v509bl1RxWDrEDrjFKgJBQJduyV7AhsDBQkUsSsABQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAAAoJEGDrEDrjFKgJq5IP/25LVDaA3itCICBP2/eu8KkUJ437oZDr+3z59z7p4mvispmEzi4OOb1lMGBH+MdhkgblrcSaj4XcIslTkfKD4gP/cMSl14hbX/OIxEXFXvTq4PmWUCgl5NtsyAbgB3pAxGUfNAXR2dV3MJFAHSOVUK5Es4/kAj4a5lra+1MwZZMDqhMTYuvTclIqPA/PXafkgL5g15JA5lFDyFQ2zuV1BgQlKh7o24Jwa1kDB0aSePkrh4gJHXAEoGDjX2mcGhEjlBvCH4ay7VGoG6l+rjcHnqSiVX0tg9dZIlc7RTR+1LX7jx8wdsYSUGekADy6wGTjk9HBTafh8Bl8sR2eNoH1qZuIn/YIHxkRJPH/74hG71pjS4FWPBbbPrdkC/G47mXMfLUrGpigcgkhePuA1BBW30U0ZZWWDHsfISxp8hcQkR5gFhU+37tsC06pwihhDWgx4kTfeTmNqkl03fTH5lwNsig0HSpUINWR+EWN0jXb8DtjMzZbiDhLxQX9U3HBEdw2g2/Ktsqv+MM1P1choEGNtzots3V9fqMYTxy7MkYLtRDYu+sX5DNob309vPzbI4b3KBv6hCRJdnICjBvgL6C8WHaLm6+FU+68rFRKw6WImWHyygdnv8Bzdq4h+MaTE6AhteYutd+ZTWpazfE1h0ngrEerQju2VLZPLAACxHBQNjT+uQINBF27JXsBEAC/PDJmWIkJBdnOmPU/W0SosOZRMvzs/KR89qeIebT8O0rNFeHR6Iql5ak6kGeDLwnzcOOwqamO+vwGmRScwPT6NF9+HDkXCzITOE2271zKVjGVf+tX5kHJzT8ZqQBxvnk5Cx/d7sr3kwLBhhygHLS/kn2K9fhYwbtsQTLEo9XvTBOip+DohHHJjZHcboeYnZ2g2b8Gnwe4cz75ogFNcuHZXusr8Y6enJX8wTBy/AvXPVUIyrHbrXcHaNS3UYKzbhkH6W1cfkV6Bb49FKYkxH0N1ZeooyS6zXyf0X4nTAbyCfoFYQ68KC17/pGMOXtR/UlqDeJe0sFeyyTHKjdS
TDpA+WKKJJZ5BSCYQ5Hqewy6mvaIcKURExIZyNqRHRhb4p/0BA7eXzMCryx1AZPcQnaMVQYJTi5e+HSnOxnKAB7jm2HHPHCRgO4qvavr5dIlEoKBM6qya1KVqoarw5hv8J8+R9ECn4kWZ8QjBlgOy65q/b3mwqK0rVA1w73BPWea/xLCLrqqVRGa/fB7dhTnPfn+BpaQ3qruLinIJatM8c2/p1LZ1nuWgrssSkSMn3TlffF0Lq9jtcbi7K11A082RiB2L0lu+j8r07RgVQvZ4UliS1Lklsp7Ixh+zoR712hKPQpNVLstEHTxQhXZTWAk/Ih7b9ukrL/1HJAnhZBeuBhDDQARAQABiQI8BBgBCAAmFiEEzBYFd7e/nT1uXVHFYOsQOuMUqAkFAl27JXsCGwwFCRSxKwAACgkQYOsQOuMUqAnJvA//SDQZxf0zbge8o9kGfrm7bnExz8a6sxEnurooUaSk3isbGFAUg+Q7rQ+ViG9gDG74F5liwwcKoBct/Z9tCi/7p3QI0BE0bM1jIHdm5dXaZAcMlUy6f0p3DO3qE2IjnNjEjvpm7Xzt6tKJu/scZQNdQxG/CDn5+ezmnIatgDV6ugDDv/2o0BXMyAZT008T/QLR2U5dEsbt9H3Bzl4Ska6gjak2ToJL0T611dZjfv/1UbeYRPFCO6CsLj9uEq+RoHAsvAS4rl9HyM3b2sVzr8CMsP6LVdqlA2Qz/nIBd+GuLofi3/PGvvS63ubfqSRGd5VvJXoiRl2WoE8lmyIB5UJfFfd8Zdn6j+hQc14VOp89mEfg57BiQXfZnzjFVNkl7T5I2g3X5O8StosncChqiJTSH5C731KUVqxOxYknFostioIVKmyis/Nwmwr6fIItYyYCwh5YCqAg0r4SLbhFEVXdannUbFPF6upOEbKlZP3Iyu/kYANMnq+9+GImrPrT/FCpM9RW1GFAnuVBt9Qjs+eRq4DQJl/EaIjZcgqz+e5TZNxDK9r2sHC4zGWy88/2GuhD8xh4FH5hBIDJPmHUtKh9XElq187VA4JgU0mbryduKMQIyuc6OLzfJUbVTMvKWaPASbGtvAAOwCFtAi33dZ8bOfjQLgOb9uDh/vQojRxttMc==ovUh-----END PGP PUBLIC KEY BLOCK-----

2. At a command prompt in the directory where you saved data_science_workflows.key, use the following command to import the AWS Data Science Workflows Python SDK public key into your keyring:

gpg --import data_science_workflows.key

The command returns results that are similar to the following:

gpg: key 60EB103AE314A809: public key "Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Make a note of the key value; you need it in the next step. In the preceding example, the key value is 60EB103AE314A809.

3. Verify the fingerprint by running the following command, replacing key-value with the value from the preceding step:

gpg --fingerprint <key-value>

This command returns results similar to the following:

pub   rsa4096 2019-10-31 [SC] [expires: 2030-10-31]
      CC16 0577 B7BF 9D3D 6E5D  51C5 60EB 103A E314 A809
uid           [ unknown] Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>
sub   rsa4096 2019-10-31 [E] [expires: 2030-10-31]

Additionally, the fingerprint string should be identical to CC16 0577 B7BF 9D3D 6E5D 51C5 60EB 103A E314 A809, as shown in the preceding example. Compare the key fingerprint that is returned to the one published on this page. They should match. If they don't match, don't install the AWS Data Science Workflows Python SDK package, and contact AWS Support.
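
Comparing 40 hexadecimal characters by eye is error-prone, particularly because gpg groups the digits with varying amounts of whitespace. The sketch below normalizes both strings before comparing them. The helper names are hypothetical, and the expected value is the fingerprint published on this page.

```python
EXPECTED_FINGERPRINT = "CC16 0577 B7BF 9D3D 6E5D 51C5 60EB 103A E314 A809"


def normalize_fingerprint(text):
    """Drop all whitespace and upper-case the hexadecimal digits."""
    return "".join(text.split()).upper()


def fingerprints_match(reported, expected=EXPECTED_FINGERPRINT):
    """Return True only if both fingerprints contain the same hex digits."""
    return normalize_fingerprint(reported) == normalize_fingerprint(expected)


# The string as gpg prints it, with a double space in the middle.
reported = "CC16 0577 B7BF 9D3D 6E5D  51C5 60EB 103A E314 A809"
print(fingerprints_match(reported))  # True
```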

Verify the Signature of the Package

After you install the GPG tools, authenticate and import the AWS Data Science Workflows Python SDK public key, and verify that the public key is trusted, you are ready to verify the signature of the package.

To verify the package signature, do the following.

1. Download the detached signature for the package from PyPI.

Go to the downloads section for the Data Science Workflows Python SDK on PyPI (https://pypi.org/project/stepfunctions/#files), right-click the SDK distribution link, and choose "Copy Link Location/Address".

Append the string ".asc" to the end of the link you copied, and paste this new link into your browser.

Your browser will prompt you to download a file, which is the detached signature associated with the respective distribution. Save the file on your local machine.

2. Verify the signature by running the following command at a command prompt in the directory where you saved the signature file and the AWS Data Science Workflows Python SDK installation file. Both files must be present.

gpg --verify <path-to-detached-signature-file>

The output should look something like the following:

gpg: Signature made Thu 31 Oct 12:14:53 2019 PDT
gpg:                using RSA key CC160577B7BF9D3D6E5D51C560EB103AE314A809
gpg: Good signature from "Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: CC16 0577 B7BF 9D3D 6E5D  51C5 60EB 103A E314 A809

If the output contains the phrase Good signature from "Stepfunctions-Python-SDK-Signing <stepfunctions-developer-experience@amazon.com>", it means that the signature has successfully been verified, and you can proceed to run the AWS Data Science Workflows Python SDK package.

If the output includes the phrase BAD signature, check whether you performed the procedure correctly. If you continue to get this response, don't run the installation file that you downloaded previously, and contact AWS Support.
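
The two manual steps above, appending ".asc" to the distribution link and reading the gpg output, can be sketched in Python as follows. Both helper names and the example URL are hypothetical, and the signer string is taken from the sample output on this page. Matching on human-readable gpg text is a simplification for illustration, so treat this as a convenience check and not as a replacement for reading the output yourself.

```python
from urllib.parse import urldefrag

EXPECTED_SIGNER = (
    "Stepfunctions-Python-SDK-Signing "
    "<stepfunctions-developer-experience@amazon.com>"
)


def signature_url(distribution_url):
    """Drop any #fragment from the copied link and append '.asc'."""
    url, _fragment = urldefrag(distribution_url)
    return url + ".asc"


def signature_is_good(gpg_output, signer=EXPECTED_SIGNER):
    """Return True if gpg reported a good signature from the expected signer."""
    if "BAD signature" in gpg_output:
        return False
    return 'Good signature from "{}"'.format(signer) in gpg_output


# Placeholder link; copy the real one from the PyPI downloads section.
print(signature_url("https://example.com/dist/stepfunctions-0.0.0.tar.gz#sha256=abc"))
```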

The following are details about the warnings you might see:

WARNING: This key is not certified with a trusted signature! There is no indication that the signature belongs to the owner.
This refers to your personal level of trust in your belief that you possess an authentic public key for AWS Data Science Workflows Python SDK. In an ideal world, you would visit an AWS office and receive the key in person. However, more often you download it from a website. In this case, the website is an AWS website.

gpg: no ultimately trusted keys found.
This means that the specific key is not "ultimately trusted" by you (or by other people whom you trust).

For more information, see http://www.gnupg.org.
