Solution accelerator to help build Machine Learning Lineage
microsoft/Purview-Machine-Learning-Lineage-Solution-Accelerator
Microsoft Purview is a unified data governance service that helps you manage and govern data across different sources.
Machine Learning project life cycle involves many steps to transform raw data into insights. This process usually requires individuals with different roles/skillsets across multiple teams to collaborate effectively. Microsoft Purview helps simplify this complex process by providing an end-to-end lineage of ML entities and processes to enable better collaboration, auditing and debugging capabilities.
This solution accelerator helps developers with the resources needed to build an end-to-end lineage in Purview for Machine Learning scenarios.
To use this solution accelerator, you will need access to an Azure subscription. While not required, a prior understanding of Microsoft Purview, Azure Synapse Analytics and Machine Learning will be helpful.
For additional training and support, please see:
Start by deploying the required resources to Azure. The button below will deploy Microsoft Purview, Azure Synapse Analytics, Azure Machine Learning and its related resources:
If you prefer to set up manually, you need to deploy Microsoft Purview, Azure Synapse Analytics and Azure Machine Learning.
Note: To minimize Azure costs, consider deleting the Purview instance at the end of this exercise if you do not plan to use this instance actively.
Clone or download this repository and navigate to the project's root directory.
Configure your Purview catalog to trust the service principal
1. From the Azure portal, select your AML workspace
2. Select Access Control (IAM)
3. Select Add, Add Role Assignment to open the Add role assignment page
   - 3.1 For the `Role` type in `Contributor`
   - 3.2 For `Assign access to` leave the default, `User, group, or service principal`
   - 3.2 For `Select` enter the name of the previously created service principal in step 3.1 and then click on their name in the results pane
   - 3.3 Click on `Save`

You've now configured the service principal as a contributor on the Azure Machine Learning resource.
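For readers who would rather script this role assignment than click through the portal, the following is a minimal sketch of the request that the Azure Resource Manager role assignment REST API expects. It only builds the URL and JSON body and does not send anything. The function name `build_role_assignment` and all argument values are hypothetical; the `api-version` shown is one that is known to exist and may not be the one you want to pin.

```python
import json
import uuid

# Built-in role definition GUID for the "Contributor" role.
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def build_role_assignment(subscription_id, resource_group, workspace_name,
                          principal_object_id):
    """Return (url, body) for a PUT that grants Contributor on an AML workspace.

    principal_object_id is the object ID of the service principal
    (not its application/client ID).
    """
    # Scope of the assignment: the Azure Machine Learning workspace resource.
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )
    role_definition_id = (
        f"/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Authorization"
        f"/roleDefinitions/{CONTRIBUTOR_ROLE_ID}"
    )
    # Each role assignment is identified by a caller-chosen GUID.
    assignment_name = str(uuid.uuid4())
    url = (
        f"https://management.azure.com{scope}"
        f"/providers/Microsoft.Authorization/roleAssignments/{assignment_name}"
        f"?api-version=2022-04-01"
    )
    body = {
        "properties": {
            "roleDefinitionId": role_definition_id,
            "principalId": principal_object_id,
            "principalType": "ServicePrincipal",
        }
    }
    return url, body


# Example with placeholder values (assumptions, not real resources).
example_url, example_body = build_role_assignment(
    "my-subscription-id", "my-resource-group", "my-aml-workspace",
    "00000000-0000-0000-0000-000000000001",
)
print(example_url)
print(json.dumps(example_body, indent=2))
```

Sending the request additionally requires a bearer token for `https://management.azure.com/`, which is outside the scope of this sketch.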
Before you can upload assets to the Synapse Workspace you will need to add your IP address:
- Go to the Synapse resource you created in the previous step
- Navigate to `Firewalls` under `Security` on the left hand side of the page
- At the top of the screen click `+ Add client IP`
- Your IP address should now be visible in the IP list
In order to perform the necessary actions in the Synapse workspace, you will need to grant more access.
- Go to the Azure Data Lake Storage Account created above
- Go to the `Access Control (IAM) > + Add > Add role assignment`
- Now click the Role dropdown and select `Storage Blob Data Contributor`
- Search for your username and add
- Click `Save` at the bottom
- Launch the Synapse workspace
- Select the `subscription` and `workspace` name you are using for this solution accelerator
- In Synapse Studio, navigate to the `Data` Hub
- Select `Linked`
- Under the category `Azure Data Lake Storage Gen2` you'll see an item with a name like `xxxxx(xxxxx- Primary)`
- Select the container named `data (Primary)`
- Create a new folder `creditriskdata`
- Select `Upload` and select the `loan.csv` and `borrower.csv` files downloaded from the `Data` folder
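Before uploading, it can help to confirm that both files are present locally and readable. The sketch below is one way to do that with the standard library only. The helper names `summarize_csv` and `check_data_folder` are hypothetical, and the code assumes the files are UTF-8 encoded CSVs with a header row, which the repository does not state.

```python
import csv
from pathlib import Path

# File names taken from the upload step above.
EXPECTED_FILES = ("loan.csv", "borrower.csv")


def summarize_csv(path):
    """Return (header, data_row_count) for a CSV file with a header row."""
    # Assumption: UTF-8 encoding; adjust if the files use another encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


def check_data_folder(folder):
    """Verify that every expected file exists and return a per-file summary."""
    folder = Path(folder)
    report = {}
    for name in EXPECTED_FILES:
        path = folder / name
        if not path.is_file():
            raise FileNotFoundError(f"Expected file not found: {path}")
        report[name] = summarize_csv(path)
    return report
```

Calling `check_data_folder("Data")` from the repository root returns a dictionary keyed by file name, or raises `FileNotFoundError` if one of the two files is missing.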
Select only the `creditriskdata` folder while creating the scan.
Wait for the scan run status to change to `Completed` before running the next step.
1. Launch the Synapse workspace
2. Select the `subscription` and `workspace` name you are using for this solution accelerator
3. Go to the `Manage` tab in the Synapse workspace and click on the `Apache Spark pools`
4. Click `...` on the deployed Spark Pool and select `Packages`
5. Click `Upload` and select `requirements.txt` from the cloned repo and click `Apply`
6. Go to `Develop`, click the `+`, and click `Import` to select all notebooks from the repository's `/SynapseNotebooks/` folder
7. For each of the notebooks, select `Attach to > spark1` in the top dropdown
8. Update Purview Tenant, Client Id and Secret from step `2.1` in `01_Authenticate_to_Purview_AML.ipynb`
9. Update Azure Machine Learning Tenant, Client Id and Secret from step `3.1` in `01_Authenticate_to_Purview_AML.ipynb`
10. Update the `account_name` variable to your ADLS in `04_Create_CreditRisk_Experiment.ipynb`
11. Click `Publish all` to publish the notebook changes
12. Run the following notebook: `04_Create_CreditRisk_Experiment.ipynb` (This notebook runs other notebooks you imported)
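The notebook steps above ask for a tenant, client ID and secret for each of the two service principals. As an illustration only, the sketch below shows one way to keep those three values together and fail early when one is missing, instead of pasting literals into several cells. The `ServicePrincipal` class, the `load_service_principal` helper and the `PURVIEW_` / `AML_` key prefixes are hypothetical and are not part of the repository's notebooks.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePrincipal:
    """The three values each authentication step asks for."""
    tenant_id: str
    client_id: str
    client_secret: str


def load_service_principal(prefix, env=None):
    """Read <PREFIX>_TENANT_ID, <PREFIX>_CLIENT_ID and <PREFIX>_CLIENT_SECRET.

    env defaults to os.environ; pass a plain dict to read from elsewhere.
    Raises KeyError listing every missing or empty setting.
    """
    env = os.environ if env is None else env
    values = {}
    missing = []
    for field in ("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET"):
        key = f"{prefix}_{field}"
        value = str(env.get(key, "")).strip()
        if not value:
            missing.append(key)
        values[field.lower()] = value
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return ServicePrincipal(**values)


# Example with placeholder values (assumptions, not real credentials).
settings = {
    "PURVIEW_TENANT_ID": "tenant-placeholder",
    "PURVIEW_CLIENT_ID": "client-placeholder",
    "PURVIEW_CLIENT_SECRET": "secret-placeholder",
}
purview_sp = load_service_principal("PURVIEW", settings)
print(purview_sp.tenant_id, purview_sp.client_id)
```

Keeping the secret out of the published notebook also avoids committing it when the notebooks are exported.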
- Launch Purview Studio
- Click on `Browse Assets`
- Click on `Custom Model` and select the model we created from running notebooks in `Step 7`
- Click on `Lineage` to see Machine Learning process Lineage
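The lineage graph shown in Purview is built from Apache Atlas style entities: a process entity whose `inputs` and `outputs` attributes reference other catalog entities. The sketch below builds such a payload with the standard library only, to show the general shape. It is not the code used by the repository's notebooks, and the type names `custom_ml_model` and `custom_ml_training_process` as well as every qualified name are hypothetical placeholders.

```python
import json


def entity_ref(type_name, qualified_name):
    """Reference an existing entity by type and unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process_entity(name, qualified_name, inputs, outputs,
                         type_name="Process"):
    """Build an Atlas process entity linking input entities to outputs.

    inputs and outputs are lists of (type_name, qualified_name) pairs.
    """
    return {
        "typeName": type_name,
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,
            "inputs": [entity_ref(t, q) for t, q in inputs],
            "outputs": [entity_ref(t, q) for t, q in outputs],
        },
    }


# Hypothetical example: two data files feed a training process
# that produces one model.
training = build_process_entity(
    name="credit risk training",
    qualified_name="example://creditrisk/training",
    inputs=[
        ("azure_datalake_gen2_path", "example://creditriskdata/loan.csv"),
        ("azure_datalake_gen2_path", "example://creditriskdata/borrower.csv"),
    ],
    outputs=[("custom_ml_model", "example://creditrisk/model")],
    type_name="custom_ml_training_process",
)

# Shape accepted by the Atlas v2 bulk entity endpoint.
payload = {"entities": [training]}
print(json.dumps(payload, indent=2))
```

Each edge in the Lineage tab corresponds to one entry in `inputs` or `outputs`, so a missing edge usually means a reference whose `qualifiedName` does not match an existing asset.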
- Launch the Azure Machine Learning studio (AML Studio)
- Select the `subscription` and `workspace` name you are using for this solution accelerator
- Go to the `Notebooks` tab in the AML Studio and upload the notebooks and scripts in the `AML Notebooks` folder, including the `Data` folder
- Go to the `Compute` tab in the AML Studio and click on the `Compute Instances`
- Click `New` and create a new compute instance
- Click `Jupyter` and launch the compute instance
- In the browser window that opens, click the folders to see the notebooks you uploaded in step `9.3`
- Update Purview Tenant, Client Id and Secret from step `2.1` in `Authenticate_to_Purview_AML.py`
- Update Azure Machine Learning Tenant, Client Id and Secret from step `3.1` in `Authenticate_to_Purview_AML.py`
- Run the following notebooks in order:
  - `01_Create_CreditRisk_AML_Pipeline.ipynb` (The pipeline run might take a few minutes, so please wait for completion before running the next notebook)
  - `02_Create_CreditRisk_AML_Pipeline_Lineage.ipynb`
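To make clearer why the tenant, client ID and secret are needed, the sketch below constructs the OAuth 2.0 client credentials request that a service principal uses to obtain a token from Microsoft Entra ID (Azure AD). It is an illustration under assumptions and not the contents of `Authenticate_to_Purview_AML.py`: the function name `build_token_request` is hypothetical, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Default scope for the Purview data plane.
PURVIEW_SCOPE = "https://purview.azure.net/.default"


def build_token_request(tenant_id, client_id, client_secret,
                        scope=PURVIEW_SCOPE):
    """Build (but do not send) a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }
    return Request(
        url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Example with placeholder values (assumptions, not real credentials).
request = build_token_request("tenant-placeholder", "client-placeholder",
                              "secret-placeholder")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would return a JSON document whose `access_token` field is then sent as a bearer token. In practice an SDK credential class usually handles this exchange for you.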
- Launch Purview Studio
- Click on `Browse Assets`
- Click on `Custom ML Experiment Step` and select any step we created from running notebooks in `Step 9`
- Click on `Lineage` to see Machine Learning pipeline Lineage
The architecture diagram below details what you will be building for this Solution Accelerator.
MIT License
Copyright (c) Microsoft Corporation.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The following libraries are not explicitly included in this repository, but users of this Solution Accelerator may need to install them locally and in Azure Synapse and Azure Machine Learning to fully utilize it. The binaries and files associated with these libraries are not part of this repository; they are available from PyPI and can be installed with pip.
Libraries: chardet, certifi
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.

