Solution accelerator to help build Machine Learning Lineage


microsoft/Purview-Machine-Learning-Lineage-Solution-Accelerator

page_type: sample
languages: python, bash
products: microsoft-purview, azure-synapse-analytics, azure-machine-learning

Purview Machine Learning Lineage Solution Accelerator


Microsoft Purview is a unified data governance service that helps you manage and govern data across different sources.

A machine learning project's life cycle involves many steps to transform raw data into insights. This process usually requires individuals with different roles and skill sets across multiple teams to collaborate effectively. Microsoft Purview helps simplify this complex process by providing end-to-end lineage of ML entities and processes, enabling better collaboration, auditing and debugging.

This solution accelerator provides developers with the resources needed to build end-to-end lineage in Purview for Machine Learning scenarios.
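Purview's data map builds on the Apache Atlas type system, in which lineage is expressed by a Process entity whose `inputs` and `outputs` attributes reference DataSet entities. The sketch below builds such a payload as a plain dictionary, roughly in the shape Atlas uses for creating a single entity. The type names (`custom_ml_experiment_step`, `custom_dataset`, `custom_ml_model`) and qualified names are hypothetical placeholders, not the types this accelerator registers.

```python
from typing import Dict, List


def object_id(type_name: str, qualified_name: str) -> Dict:
    """Reference an existing Atlas entity by its type and unique qualifiedName."""
    return {"typeName": type_name,
            "uniqueAttributes": {"qualifiedName": qualified_name}}


def process_entity(type_name: str, qualified_name: str, name: str,
                   inputs: List[Dict], outputs: List[Dict]) -> Dict:
    """Build an entity payload for a Process-derived type.

    The lineage edges are implied by the attributes:
    every input -> this process -> every output.
    """
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "qualifiedName": qualified_name,
                "name": name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


# Hypothetical type and qualified names, for illustration only.
payload = process_entity(
    type_name="custom_ml_experiment_step",
    qualified_name="credit-risk/train-step",
    name="Train credit risk model",
    inputs=[object_id("custom_dataset", "credit-risk/features")],
    outputs=[object_id("custom_ml_model", "credit-risk/model")],
)
```

Referencing inputs and outputs by `qualifiedName` rather than by GUID lets you describe lineage before you know the GUIDs the catalog assigned.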

Sample Credit Risk Prediction ML Process Flow

Purview Machine Learning Lineage Introduction

Purview ML Process Lineage

ML Lineage

Prerequisites

To use this solution accelerator, you will need access to an Azure subscription. While not required, prior familiarity with Microsoft Purview, Azure Synapse Analytics and Machine Learning will be helpful.

For additional training and support, please see:

  1. Microsoft Purview
  2. Azure Synapse Analytics
  3. Azure Machine Learning

Getting Started

Start by deploying the required resources to Azure. The button below will deploy Microsoft Purview, Azure Synapse Analytics, Azure Machine Learning and its related resources:

Deploy to Azure

If you prefer to set up the resources manually, deploy Microsoft Purview, Azure Synapse Analytics and Azure Machine Learning yourself.

Note: To minimize Azure costs, consider deleting the Purview instance at the end of this exercise if you do not plan to use this instance actively.

Step 1. Download Files

Clone or download this repository and navigate to the project's root directory.

Step 2. Purview Security Access

Step 2.1 Create a Service Principal for Purview REST API access

Create a service principal

Step 2.2 Configure your Purview catalog to trust the service principal

Configure your Purview catalog to trust the service principal
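Once the service principal exists and Purview trusts it, clients typically obtain a bearer token with the OAuth 2.0 client-credentials flow against the Microsoft identity platform. The sketch below uses only the standard library. It assumes the Purview data-plane scope is `https://purview.azure.net/.default`; the function names are illustrative, and the accelerator's notebooks may authenticate differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Purview data-plane tokens are requested for this scope.
PURVIEW_SCOPE = "https://purview.azure.net/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = PURVIEW_SCOPE):
    """Return (url, form_body) for an OAuth 2.0 client-credentials request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body


def fetch_token(tenant_id: str, client_id: str, client_secret: str,
                scope: str = PURVIEW_SCOPE) -> str:
    """POST the request and return the access token (requires network access)."""
    url, body = build_token_request(tenant_id, client_id, client_secret, scope)
    req = Request(url, data=body.encode("utf-8"),
                  headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["access_token"]
```

The returned token is sent on later REST calls as an `Authorization: Bearer <token>` header. Keeping request construction separate from the network call makes the request easy to inspect without credentials.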

Step 3. Azure Machine Learning Security Access

Step 3.1 Create a Service Principal for AML access

Create a service principal

Step 3.2 Configure your Azure Machine Learning to trust the service principal

  1. From the Azure portal, select your AML workspace.

  2. Select Access Control (IAM).

  3. Select Add > Add role assignment to open the Add role assignment page.

    3.1 For Role, type in Contributor.

    3.2 For Assign access to, leave the default, User, group, or service principal.

    3.3 For Select, enter the name of the service principal created in Step 3.1, then click its name in the results pane.

    3.4 Click Save.

You've now configured the service principal as a Contributor on the Azure Machine Learning resource.
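The same role assignment can be scripted with the Azure CLI command `az role assignment create`. The sketch below only builds the ARM resource ID of the workspace and the argument list; the placeholder values in angle brackets are yours to fill in, and the helper names are hypothetical.

```python
import shlex


def aml_workspace_scope(subscription_id: str, resource_group: str,
                        workspace: str) -> str:
    """ARM resource ID of an Azure Machine Learning workspace."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}")


def role_assignment_command(assignee_app_id: str, role: str, scope: str) -> list:
    """Argument list for `az role assignment create`."""
    return ["az", "role", "assignment", "create",
            "--assignee", assignee_app_id,
            "--role", role,
            "--scope", scope]


# Placeholder values; substitute your own before running.
scope = aml_workspace_scope("<subscription-id>", "<resource-group>", "<aml-workspace>")
cmd = role_assignment_command("<service-principal-app-id>", "Contributor", scope)
print(shlex.join(cmd))  # copy-pasteable shell form of the command
```

After `az login`, the list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.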

Step 4. Synapse Security Access

Step 4.1 Add your IP address to Synapse firewall

Before you can upload assets to the Synapse workspace, you will need to add your IP address to its firewall:

  1. Go to the Synapse resource you deployed earlier.
  2. Navigate to Firewalls under Security on the left-hand side of the page.
  3. At the top of the screen, click + Add client IP.
  4. Your IP address should now be visible in the IP list.
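As a scripted alternative to the portal steps above, a single-address rule can be created with `az synapse workspace firewall-rule create`, setting the start and end address to the same IP. This sketch validates the address and builds the argument list; the rule name and the example IP below are hypothetical.

```python
import ipaddress


def synapse_firewall_rule_command(rule_name: str, workspace: str,
                                  resource_group: str, client_ip: str) -> list:
    """Argument list for `az synapse workspace firewall-rule create`
    that allows exactly one client IPv4 address."""
    ip = ipaddress.IPv4Address(client_ip)  # raises ValueError if malformed
    return ["az", "synapse", "workspace", "firewall-rule", "create",
            "--name", rule_name,
            "--workspace-name", workspace,
            "--resource-group", resource_group,
            "--start-ip-address", str(ip),
            "--end-ip-address", str(ip)]


# Hypothetical values; 203.0.113.7 is a documentation-range address.
cmd = synapse_firewall_rule_command("ClientIp", "<synapse-workspace>",
                                    "<resource-group>", "203.0.113.7")
```

Validating the IP before calling the CLI turns a typo into an immediate local error rather than a failed deployment.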

Step 4.2: Update storage account permissions

To perform the necessary actions in the Synapse workspace, you will need to grant your account additional access to the storage account.

  1. Go to the Azure Data Lake Storage account created above.
  2. Go to Access Control (IAM) > + Add > Add role assignment.
  3. Click the Role dropdown and select Storage Blob Data Contributor.
    • Search for your username and add it.
  4. Click Save at the bottom.

Learn more
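The Storage Blob Data Contributor assignment can also be expressed as an `az role assignment create` call scoped to the storage account. This is a minimal sketch that only builds the scope and the argument list; the account, resource group and user names are hypothetical placeholders.

```python
def storage_account_scope(subscription_id: str, resource_group: str,
                          account_name: str) -> str:
    """ARM resource ID of a storage account (ADLS Gen2 accounts use the same type)."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{account_name}")


def blob_contributor_command(user_principal_name: str, scope: str) -> list:
    """Argument list granting a user Storage Blob Data Contributor on `scope`."""
    return ["az", "role", "assignment", "create",
            "--assignee", user_principal_name,
            # The role name contains spaces; as a list element it needs no quoting.
            "--role", "Storage Blob Data Contributor",
            "--scope", scope]


scope = storage_account_scope("<subscription-id>", "<resource-group>", "mydatalake")
cmd = blob_contributor_command("user@example.com", scope)
```

Scoping the role to the storage account, rather than the resource group, keeps the grant as narrow as the portal steps above.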

Step 5. Upload CreditRisk Sample Dataset

  1. Launch the Synapse workspace.
  2. Select the subscription and workspace name you are using for this solution accelerator.
  3. In Synapse Studio, navigate to the Data hub.
  4. Select Linked.
  5. Under the category Azure Data Lake Storage Gen2, you'll see an item with a name like xxxxx (xxxxx- Primary).
  6. Select the container named data (Primary).
  7. Create a new folder named creditriskdata.
  8. Select Upload and select the loan.csv and borrower.csv files from the Data folder of the downloaded repository.
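Spark code in Synapse usually addresses files in ADLS Gen2 with `abfss://<container>@<account>.dfs.core.windows.net/<path>` URIs. The sketch below builds those URIs for the two uploaded files; `<account_name>` stands for your storage account name (the same value the `account_name` variable holds in Step 7), and the helper name is hypothetical.

```python
def abfss_uri(account_name: str, container: str, path: str) -> str:
    """ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account_name}.dfs.core.windows.net/{path.strip('/')}"


# Files uploaded to the creditriskdata folder of the "data" container.
uploads = {name: abfss_uri("<account_name>", "data", f"creditriskdata/{name}")
           for name in ("loan.csv", "borrower.csv")}
```

Inside a Synapse notebook, a URI like these can be read with, for example, `spark.read.csv(uri, header=True)`.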

Step 6. Register and scan uploaded data in Purview

  1. Setting up authentication for a scan

  2. Register and scan adls gen2

Select only the creditriskdata folder when creating the scan.

ADLSGen2 Scanning folder selection

Wait for the scan run status to change to Completed before running the next step.
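If you automate this step, waiting for a status such as Completed is a plain polling loop. In the sketch below, `get_status` is a hypothetical callable you supply (for example, one that queries the scan run); the failure status names `Failed` and `Canceled` and the default intervals are assumptions.

```python
import time
from typing import Callable, Iterable


def wait_for_status(get_status: Callable[[], str],
                    target: str = "Completed",
                    failed: Iterable[str] = ("Failed", "Canceled"),
                    interval_s: float = 30.0,
                    timeout_s: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns `target`.

    Raises RuntimeError on a failure status and TimeoutError once
    `timeout_s` seconds have elapsed without reaching `target`.
    """
    failed = set(failed)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Run ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable without real delays.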

Step 7. Upload Assets and Run Notebooks

  1. Launch the Synapse workspace.

  2. Select the subscription and workspace name you are using for this solution accelerator.

  3. Go to the Manage tab in the Synapse workspace and click Apache Spark pools.

  4. Click ... on the deployed Spark pool and select Packages.

  5. Click Upload, select requirements.txt from the cloned repo, and click Apply.

  6. Go to Develop, click +, and click Import to select all notebooks from the repository's /SynapseNotebooks/ folder.

  7. For each notebook, select Attach to > spark1 in the top dropdown.

  8. Update the Purview tenant, client ID and secret from Step 2.1 in 01_Authenticate_to_Purview_AML.ipynb.

  9. Update the Azure Machine Learning tenant, client ID and secret from Step 3.1 in 01_Authenticate_to_Purview_AML.ipynb.

  10. Update the account_name variable to your ADLS account name in 04_Create_CreditRisk_Experiment.ipynb.

  11. Click Publish all to publish the notebook changes.

  12. Run the following notebook:

    • 04_Create_CreditRisk_Experiment.ipynb (this notebook runs the other notebooks you imported)
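Steps 7.8 and 7.9 have you paste a tenant, client ID and secret into the notebook. If you would rather keep secrets out of notebook source, one option (an assumption, not what the accelerator does) is to read them from environment variables or a secret store such as Azure Key Vault. The sketch reads hypothetical `<PREFIX>_TENANT_ID`, `<PREFIX>_CLIENT_ID` and `<PREFIX>_CLIENT_SECRET` variables and fails fast if any is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServicePrincipal:
    tenant_id: str
    client_id: str
    client_secret: str


def load_service_principal(prefix: str,
                           env: Optional[Mapping[str, str]] = None) -> ServicePrincipal:
    """Read <PREFIX>_TENANT_ID, <PREFIX>_CLIENT_ID and <PREFIX>_CLIENT_SECRET.

    The variable names are hypothetical; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    names = {key: f"{prefix}_{key.upper()}"
             for key in ("tenant_id", "client_id", "client_secret")}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return ServicePrincipal(**{key: env[var] for key, var in names.items()})


# e.g. purview = load_service_principal("PURVIEW"); aml = load_service_principal("AML")
```

Using one loader for both service principals keeps the Purview and AML settings symmetric and reports every missing value at once.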

Step 8. Check Machine Learning Lineage in Purview Studio

  1. Launch Purview Studio.
  2. Click Browse Assets.
  3. Click Custom Model and select the model created by running the notebooks in Step 7.
  4. Click Lineage to see the Machine Learning process lineage.

ML Lineage
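The lineage view can also be inspected programmatically. Atlas-style lineage responses contain a `relations` list of `fromEntityId`/`toEntityId` edges and a `guidEntityMap` of entity details. Assuming a response of that shape, the sketch walks upstream from a model's GUID with breadth-first search; the toy GUIDs in the test are made up.

```python
from collections import deque
from typing import Dict, List


def upstream_entities(lineage: Dict, start_guid: str) -> List[str]:
    """Breadth-first walk against the edge direction from start_guid,
    returning upstream entity GUIDs in discovery order."""
    parents: Dict[str, List[str]] = {}
    for rel in lineage.get("relations", []):
        parents.setdefault(rel["toEntityId"], []).append(rel["fromEntityId"])

    seen = {start_guid}
    order: List[str] = []
    queue = deque([start_guid])
    while queue:
        guid = queue.popleft()
        for parent in parents.get(guid, []):
            if parent not in seen:  # guard against revisits and cycles
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def display_names(lineage: Dict, guids: List[str]) -> List[str]:
    """Map GUIDs to their displayText, falling back to the GUID itself."""
    entities = lineage.get("guidEntityMap", {})
    return [entities.get(g, {}).get("displayText", g) for g in guids]
```

Walking upstream from the model answers the typical audit question "which datasets and processes fed this model?".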

Step 9. Upload Assets and Run Azure Machine Learning Notebooks (Optional)

  1. Launch Azure Machine Learning studio.
  2. Select the subscription and workspace name you are using for this solution accelerator.
  3. Go to the Notebooks tab in AML Studio and upload the notebooks and scripts in the AML Notebooks folder, including the Data folder.
  4. Go to the Compute tab in AML Studio and click Compute Instances.
  5. Click New and create a new compute instance.
  6. Click Jupyter to open the compute instance.
  7. In the browser window that opens, click through the folders to find the notebooks you uploaded in Step 9.3.
  8. Update the Purview tenant, client ID and secret from Step 2.1 in Authenticate_to_Purview_AML.py.
  9. Update the Azure Machine Learning tenant, client ID and secret from Step 3.1 in Authenticate_to_Purview_AML.py.
  10. Run the following notebooks in order:
    • 01_Create_CreditRisk_AML_Pipeline.ipynb (the pipeline run might take a few minutes, so wait for it to complete before running the next notebook)
    • 02_Create_CreditRisk_AML_Pipeline_Lineage.ipynb

ML Pipeline

Step 10. Check Machine Learning pipeline Lineage in Purview Studio (Optional)

  1. Launch Purview Studio.
  2. Click Browse Assets.
  3. Click Custom ML Experiment Step and select any step created by running the notebooks in Step 9.
  4. Click Lineage to see the Machine Learning pipeline lineage.
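Pipeline lineage links steps through the data they exchange: a step depends on another when it reads a dataset that the other writes. The sketch derives those step-to-step edges from a hypothetical `{step: {"inputs": [...], "outputs": [...]}}` description and orders the steps with Kahn's topological sort. It illustrates the idea only and is not how 02_Create_CreditRisk_AML_Pipeline_Lineage.ipynb is implemented.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Steps = Dict[str, Dict[str, List[str]]]


def step_edges(steps: Steps) -> Set[Tuple[str, str]]:
    """(producer, consumer) pairs: consumer reads a dataset the producer writes."""
    producer_of = {}
    for name, io in steps.items():
        for out in io.get("outputs", []):
            producer_of[out] = name
    return {(producer_of[inp], name)
            for name, io in steps.items()
            for inp in io.get("inputs", [])
            if inp in producer_of}


def topological_order(steps: Steps) -> List[str]:
    """Kahn's algorithm over step_edges; raises ValueError on a cycle."""
    indegree = {name: 0 for name in steps}
    children: Dict[str, List[str]] = {name: [] for name in steps}
    for src, dst in step_edges(steps):
        indegree[dst] += 1
        children[src].append(dst)
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):  # sorted for deterministic output
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(steps):
        raise ValueError("cycle detected among pipeline steps")
    return order
```

External inputs such as loan.csv have no producing step, so they add no step-to-step edge; in a catalog they would appear as dataset inputs to the first step.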

ML Pipeline Lineage

Architecture

The architecture diagram below details what you will be building for this solution accelerator.

Architecture

License

MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Note about Libraries with MPL-2.0 and LGPL-2.1 Licenses

The following libraries are not explicitly included in this repository, but users of this Solution Accelerator may need to install them locally and in Azure Synapse and Azure Machine Learning to fully utilize it. The actual binaries and files associated with these libraries are not part of this repository, but they are available for installation from PyPI using pip.

Libraries: chardet, certifi

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Data Collection

The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.
