This repository was archived by the owner on Jul 18, 2024. It is now read-only.

Walk through the data science life cycle with different tools, techniques, and algorithms. Use AIF360, pandas, and Jupyter notebooks to build and deploy a model on Watson Machine Learning.


IBM/employee-attrition-aif360


This code pattern is a high-level overview of what to expect in a data science pipeline and the tools that can be used along the way. It runs from framing the business question to building and deploying a data model. The pipeline is demonstrated through the employee attrition problem.

Employees are the backbone of any organization. An organization's performance depends heavily on the quality of its employees and on retaining them. With employee attrition, organizations face a number of challenges:

  1. Expensive in terms of both money and time to train new employees
  2. Loss of experienced employees
  3. Impact on productivity
  4. Impact on profit

The following solution is designed to help address the employee attrition problem. When the reader has completed this code pattern, they will understand:

  1. The process involved in solving a data science problem
  2. How to create and use a Watson Studio instance
  3. How to mitigate bias by transforming the original dataset through use of the AI Fairness 360 (AIF360) toolkit
  4. How to build and deploy the model in Watson Studio using various tools
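
Item 3 refers to transforming the dataset to mitigate bias. One pre-processing technique of this kind is reweighing, which gives each row a weight so that the protected attribute and the label become statistically independent in the weighted data. The sketch below is a dependency-free illustration of that idea and is not taken from the notebook; the toy rows, the group values, and the function names `reweighing_weights` and `weighted_rate` are assumptions made for the example.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Return one weight per row: P(group) * P(label) / P(group, label)."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Hypothetical toy data: a protected attribute and an attrition label (1 = left).
groups = ["F", "F", "F", "M", "M", "M", "M", "M"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
weights = reweighing_weights(groups, labels)

def weighted_rate(group):
    """Weighted share of rows in `group` whose label is 1."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total

# After reweighing, both groups show the same weighted attrition rate.
print(weighted_rate("F"), weighted_rate("M"))
```

In the raw toy data the two groups leave at different rates (2 of 3 versus 1 of 5); with the weights applied, both rates match the overall rate, which is the property the technique aims for.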

The dataset used in the code pattern is supplied by Kaggle and contains HR analytics data on employees who stay and employees who leave. The data includes metrics such as education level, job satisfaction, and commute distance.

The data is made available under the following license agreements:

Dataset License Details

Asset | License | Source Link
Employee Attrition Data - Database License | Open Database License (ODbL) | Kaggle
Employee Attrition Data - Content License | Database Content License (DbCL) | Kaggle

Flow

[Image: architecture]

  1. Create an IBM Watson Studio instance and log in.
  2. Upload the Jupyter notebook and start running it.
  3. The notebook downloads the dataset and imports the fairness toolkit (AIF360) and the Pygal data visualization library.
  4. Pandas is used for reading the data and performing initial data exploration.
  5. Matplotlib, Seaborn, Plotly, Bokeh and Pygal (from step 3) are used for visualizing the data.
  6. Scikit-Learn and AIF360 (from step-3) are used for model development.
  7. Use the IBM Watson Machine Learning feature to deploy and access the model to generate employee attrition classification.
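
As a rough illustration of the kind of initial exploration step 4 describes, the sketch below computes an attrition rate per group. The notebook uses pandas for this; the example uses only the standard library `csv` module and a small inline sample so that it runs without downloading anything. The column names `Department` and `Attrition`, the sample rows, and the helper `attrition_rate_by` are assumptions for the example, not values confirmed by this README.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inline sample standing in for the downloaded CSV file.
SAMPLE_CSV = """Department,Attrition
Sales,Yes
Sales,No
Research,No
Research,No
Research,Yes
HR,No
"""

def attrition_rate_by(column, csv_text):
    """Return {value of `column`: share of rows where Attrition is 'Yes'}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[column]] += 1
        if row["Attrition"] == "Yes":
            leavers[row[column]] += 1
    return {key: leavers[key] / totals[key] for key in totals}

rates = attrition_rate_by("Department", SAMPLE_CSV)
for department, rate in sorted(rates.items()):
    print(f"{department}: {rate:.0%}")
```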

Included Components

  • IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • IBM Watson Machine Learning: a set of REST APIs to develop applications that make smarter decisions, solve tough problems, and improve user outcomes.
  • Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.

Featured technologies

  • Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
  • Pandas: A Python library providing high-performance, easy-to-use data structures.
  • AIF360 Fairness toolkit: This extensible open source toolkit can help you examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle.
  • Scikit-Learn: Free software machine learning library for the Python programming language.
  • Data Visualization tools: Bokeh, Matplotlib, Seaborn, Pygal and Plotly.
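
To make "examine and report bias" more concrete, one widely used fairness measure is disparate impact: the rate of favorable outcomes for the unprivileged group divided by the rate for the privileged group, where a value near 1 suggests similar treatment. The plain-Python sketch below only illustrates the arithmetic; it does not use AIF360, and the outcome lists and function names are hypothetical.

```python
def favorable_rate(outcomes):
    """Share of favorable outcomes, where each outcome is 1 (favorable) or 0."""
    return sum(outcomes) / len(outcomes)

def disparate_impact(unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged over privileged."""
    return favorable_rate(unprivileged) / favorable_rate(privileged)

# Hypothetical outcomes, with 1 meaning the favorable outcome.
unprivileged = [1, 0, 0, 1, 0]  # 2 of 5 favorable
privileged = [1, 1, 0, 1, 1]    # 4 of 5 favorable

di = disparate_impact(unprivileged, privileged)
print(f"disparate impact: {di:.2f}")
```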

Steps

  1. Create a new Watson Studio project
  2. Create the notebook
  3. Run the notebook
  4. Save and Share

Note: if you would prefer to skip the following steps and just follow along by viewing the completed Notebook, simply:

  • View the completed notebook and its outputs, as is.
  • While viewing the notebook, you can optionally download it to store for future use.

1. Create a new Watson Studio project

  • Log into IBM's Watson Studio. Once in, you'll land on the dashboard.

  • Create a new project by clicking + New project and choosing Data Science:

    [Image: studio project]

  • Enter a name for the project and click Create.

NOTE: Creating a project in Watson Studio also creates a free-tier Object Storage service and a Watson Machine Learning service in your IBM Cloud account. Select the Free storage type to avoid fees.

[Image: studio-new-project]

  • Upon successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs; we'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM Cloud services.

[Image: studio-project-dashboard]

2. Create the Notebook

The notebook we'll be using can be viewed in notebooks/employee-attrition.ipynb, and a completed version can be found in examples/employee-attrition.ipynb.

  • From the new project Overview panel, click + Add to project on the top right and choose the Notebook asset type.

[Image: studio-project-dashboard]

3. Run the notebook

When running the notebook, you will come to a cell that requires you to enter your Watson Machine Learning instance credentials. These are required to complete the notebook. Refer to step #1 above for more details.

When a notebook is executed, each code cell in the notebook is executed in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • Blank: the cell has never been executed.
  • A number: the relative order in which this code step was executed.
  • A *: the cell is currently executing.
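
The three tag states listed above can be expressed as a small classifier. This is only an illustrative sketch of the convention; the function name `cell_state` and the returned strings are made up for the example.

```python
import re

def cell_state(tag):
    """Classify an input tag such as 'In [ ]:', 'In [3]:' or 'In [*]:'."""
    match = re.fullmatch(r"In \[(.*)\]:", tag.strip())
    if match is None:
        raise ValueError(f"not an input tag: {tag!r}")
    marker = match.group(1).strip()
    if marker == "":
        return "never executed"
    if marker == "*":
        return "executing"
    if marker.isdigit():
        return f"executed (step {marker})"
    raise ValueError(f"unexpected marker: {marker!r}")

for tag in ("In [ ]:", "In [12]:", "In [*]:"):
    print(tag, "->", cell_state(tag))
```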

There are several ways to execute the code cells in your notebook:

  • One cell at a time.
    • Select the cell, and then press the Play button in the toolbar.
  • Batch mode, in sequential order.
    • From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues executing all cells that follow.
  • At a scheduled time.
    • Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.

4. Save and Share

How to save your work:

Under the File menu, there are several ways to save your notebook:

  • Save will simply save the current state of your notebook, without any version information.
  • Save Version will save the current state of your notebook with a version tag that contains a date and time stamp. Up to 10 versions of your notebook can be saved, each one retrievable by selecting the Revert To Version menu item.

How to share your work:

You can share your notebook by selecting the Share button located in the top right section of your notebook panel. The end result of this action will be a URL link that will display a “read-only” version of your notebook. You have several options to specify exactly what you want shared from your notebook:

  • Only text and output: will remove all code cells from the notebook view.
  • All content excluding sensitive code cells: will remove any code cells that contain a sensitive tag. For example, # @hidden_cell is used to protect your credentials from being shared.
  • All content, including code: displays the notebook as is.
  • A variety of download as options are also available in the menu.
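
To illustrate the effect of the "excluding sensitive code cells" option, the sketch below drops code cells whose source contains the `# @hidden_cell` tag from a notebook held as a Python dict in the usual `.ipynb` JSON layout. Watson Studio does this filtering for you when you share; the code is not its implementation, and the function name `strip_hidden_cells` and the sample notebook are assumptions for the example.

```python
HIDDEN_TAG = "# @hidden_cell"

def strip_hidden_cells(notebook):
    """Return a copy of a notebook dict without code cells marked as hidden."""
    def is_hidden(cell):
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files store source as a list of lines
            source = "".join(source)
        return cell.get("cell_type") == "code" and HIDDEN_TAG in source

    cleaned = dict(notebook)
    cleaned["cells"] = [c for c in notebook.get("cells", []) if not is_hidden(c)]
    return cleaned

# Hypothetical three-cell notebook; the credential value is a placeholder.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Employee attrition"]},
        {"cell_type": "code", "source": ["# @hidden_cell\n", "api_key = '<placeholder>'"]},
        {"cell_type": "code", "source": ["print('hello')"]},
    ]
}

shared = strip_hidden_cells(notebook)
print(len(notebook["cells"]), "cells before,", len(shared["cells"]), "after")
```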

Sample output

View a copy of the notebook including output here.

Troubleshooting

  • Notebook error:

    [Image: library-error]

    This will occur if you run the notebook multiple times. The custom library NAME found in the structure below must be unique for each run. Change the value and run the cell again.

    library_metadata = {
        client.runtimes.LibraryMetaNames.NAME: "PipelineLabelEncoder-Custom",
        client.runtimes.LibraryMetaNames.DESCRIPTION: "label_encoder_sklearn",
        client.runtimes.LibraryMetaNames.FILEPATH: "Pipeline_LabelEncoder-0.1.zip",
        client.runtimes.LibraryMetaNames.VERSION: "1.0",
        client.runtimes.LibraryMetaNames.PLATFORM: {"name": "python", "versions": ["3.5"]},
    }
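
One way to avoid editing the name by hand on every run is to generate it. The sketch below appends a short random suffix to the base name from the structure above; the helper `unique_library_name` is hypothetical, and the resulting string would be used as the NAME value in `library_metadata`. No Watson Machine Learning client call is made here.

```python
import uuid

def unique_library_name(base="PipelineLabelEncoder-Custom"):
    """Append a short random suffix so each run gets a distinct name."""
    return f"{base}-{uuid.uuid4().hex[:8]}"

# Two calls are expected to give two different names.
first = unique_library_name()
second = unique_library_name()
print(first)
print(second)
```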

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
  • Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns.
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.
  • With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.
  • Watson Studio: Master the art of data science with IBM's Watson Studio.

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ
