This repository was archived by the owner on Jul 18, 2024. It is now read-only.

IBM/employee-attrition-aif360 (public archive)

Walk through the data science life cycle with different tools, techniques, and algorithms. Use AIF360, pandas, and Jupyter notebooks to build and deploy a model on Watson Machine Learning.

developer.ibm.com/patterns/data-science-life-cycle-in-action-to-solve-employee-attrition-problem/


Data Science Process Pipeline in action to solve Employee Attrition Problem

This code pattern is a high-level overview of what to expect in a data science pipeline and the tools that can be used along the way. It runs from framing the business question to building and deploying a data model. The pipeline is demonstrated through the employee attrition problem.

Employees are the backbone of any organization. An organization's performance depends heavily on the quality of its employees and on retaining them. Employee attrition presents organizations with a number of challenges:

Cost, in both money and time, of training new employees
Loss of experienced employees
Impact on productivity
Impact on profit

The following solution is designed to help address the employee attrition problem. When the reader has completed this code pattern, they will understand:

The process involved in solving a data science problem
How to create and use a Watson Studio instance
How to mitigate bias by transforming the original dataset through use of the AI Fairness 360 (AIF360) toolkit
How to build and deploy the model in Watson Studio using various tools

The dataset used in the code pattern is supplied by Kaggle and contains HR analytics data for employees who stay and employees who leave. The data includes metrics such as education level, job satisfaction, and commute distance.

The data is made available under the following license agreements:

Dataset License Details

Asset	License	Source Link
Employee Attrition Data - Database License	Open Database License (ODbL)	Kaggle
Employee Attrition Data - Content License	Database Content license (DbCL)	Kaggle

Flow

1. Create an account and log in to IBM Watson Studio.
2. Upload the Jupyter notebook and start running it.
3. The notebook downloads the dataset and imports the fairness toolkit (AIF360) and the Pygal data visualization library.
4. Pandas is used for reading the data and performing initial data exploration.
5. Matplotlib, Seaborn, Plotly, Bokeh, and Pygal (from step 3) are used for visualizing the data.
6. Scikit-Learn and AIF360 (from step 3) are used for model development.
7. Use the IBM Watson Machine Learning feature to deploy and access the model to generate employee attrition classifications.
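
The initial exploration step in the flow above can be sketched in a few lines. The notebook does this with pandas; the version below uses only the Python standard library so that it runs without extra installs. The sample rows are made up, and the column names (Attrition, Education, JobSatisfaction, DistanceFromHome) are assumptions for illustration, not a guaranteed match for the Kaggle file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the column names are assumptions for illustration only.
SAMPLE_CSV = """Attrition,Education,JobSatisfaction,DistanceFromHome
Yes,2,1,24
No,3,4,3
No,4,3,8
Yes,1,2,29
No,3,4,2
No,2,3,10
"""


def attrition_rate_by(rows, column):
    """Return {value of `column`: share of rows where Attrition == 'Yes'}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row["Attrition"] == "Yes":
            leavers[key] += 1
    return {key: leavers[key] / totals[key] for key in totals}


# csv.DictReader yields one dict per row, with every value read as a string.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

overall_rate = sum(r["Attrition"] == "Yes" for r in rows) / len(rows)
rate_by_satisfaction = attrition_rate_by(rows, "JobSatisfaction")

print(f"overall attrition rate: {overall_rate:.2f}")
print(rate_by_satisfaction)
```

Grouping the target by one feature at a time is a quick way to decide which features deserve a closer look in the visualization step.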

Included Components

IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
IBM Watson Machine Learning: a set of REST APIs to develop applications that make smarter decisions, solve tough problems, and improve user outcomes.
Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.

Featured technologies

Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
Pandas: A Python library providing high-performance, easy-to-use data structures.
AIF360 Fairness toolkit: This extensible open source toolkit can help you examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle.
Scikit-Learn: Free software machine learning library for the Python programming language.
Data Visualization tools: Bokeh, Matplotlib, Seaborn, Pygal and Plotly.
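
To make "examine and mitigate bias" more concrete, here is a minimal plain-Python sketch of one fairness metric (disparate impact) and one pre-processing mitigation (reweighing). AIF360 ships its own implementations of both; this sketch does not call the library, the records are made up, and whether the notebook uses this particular algorithm is an assumption.

```python
from collections import Counter

# Made-up records as (group, label) pairs.
# group: 1 = privileged, 0 = unprivileged (hypothetical protected attribute)
# label: 1 = favorable outcome, 0 = unfavorable outcome
records = [(1, 1), (1, 1), (1, 1), (1, 0),
           (0, 1), (0, 0), (0, 0), (0, 0)]


def favorable_rate(records, group, weights=None):
    """Weighted share of favorable labels within one group."""
    if weights is None:
        weights = [1.0] * len(records)
    total = sum(w for (g, _), w in zip(records, weights) if g == group)
    favorable = sum(w for (g, y), w in zip(records, weights)
                    if g == group and y == 1)
    return favorable / total


def disparate_impact(records, weights=None):
    """Unprivileged favorable rate divided by privileged favorable rate."""
    return (favorable_rate(records, 0, weights)
            / favorable_rate(records, 1, weights))


def reweighing_weights(records):
    """Weight each record by P(group) * P(label) / P(group, label)."""
    n = len(records)
    group_counts = Counter(g for g, _ in records)
    label_counts = Counter(y for _, y in records)
    pair_counts = Counter(records)
    return [group_counts[g] * label_counts[y] / (n * pair_counts[(g, y)])
            for g, y in records]


di_before = disparate_impact(records)
weights = reweighing_weights(records)
di_after = disparate_impact(records, weights)

print(f"disparate impact before reweighing: {di_before:.2f}")
print(f"disparate impact after reweighing:  {di_after:.2f}")
```

A disparate impact of 1.0 means both groups receive the favorable outcome at the same weighted rate. Reweighing leaves the feature values untouched and changes only the instance weights passed to the learner.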

Steps

Note: if you would prefer to skip the following steps and just follow along by viewing the completed Notebook, simply:
View the completed notebook and its outputs, as is.
While viewing the notebook, you can optionally download it to store for future use.

1. Create a new Watson Studio project

Log in to IBM's Watson Studio. Once in, you'll land on the dashboard.
Create a new project by clicking + New project and choosing Data Science:
Enter a name for the project and click Create.

NOTE: By creating a project in Watson Studio, a free tier Object Storage service and Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.

Upon successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs. We'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM Cloud services.

2. Create the Notebook

The notebook we'll be using can be viewed in notebooks/employee-attrition.ipynb, and a completed version can be found in examples/employee-attrition.ipynb.

From the new project Overview panel, click + Add to project on the top right and choose the Notebook asset type.

Fill in the following information:
- Select the From URL tab.
- Enter a Name for the notebook and optionally a description.
- Under Notebook URL, provide the following URL: https://github.com/IBM/employee-attrition-aif360/blob/master/notebooks/employee-attrition.ipynb
- For Runtime, select the Python 3.5 option.

TIP: Once successfully imported, the notebook should appear in the Notebooks section of the Assets tab.

3. Run the notebook

When running the notebook, you will come to the cell that requires you to enter your Watson Machine Learning instance credentials. These will be required to complete the notebook. Refer to step #1 above for more details.

When a notebook is executed, each code cell in the notebook is executed, in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

Blank: the cell has never been executed.
A number: the relative order in which this code cell was executed.
An asterisk (*): the cell is currently executing.

There are several ways to execute the code cells in your notebook:

One cell at a time.
- Select the cell, and then press the Play button in the toolbar.
Batch mode, in sequential order.
- From the Cell menu, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues executing all cells that follow.
At a scheduled time.
- Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.

4. Save and Share

How to save your work:

Under the File menu, there are several ways to save your notebook:

Save will simply save the current state of your notebook, without any version information.
Save Version will save the current state of your notebook with a version tag that contains a date and time stamp. Up to 10 versions of your notebook can be saved, each one retrievable by selecting the Revert To Version menu item.

How to share your work:

You can share your notebook by selecting the Share button located in the top right section of your notebook panel. The end result of this action will be a URL link that will display a "read-only" version of your notebook. You have several options to specify exactly what you want shared from your notebook:

Only text and output: will remove all code cells from the notebook view.
All content excluding sensitive code cells: will remove any code cells that contain a sensitive tag. For example, # @hidden_cell is used to protect your credentials from being shared.
All content, including code: displays the notebook as is.
A variety of download as options are also available in the menu.
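
As an illustration of the sensitive tag mentioned above, a credentials cell might look like the sketch below. The environment variable names, the placeholder values, and the dictionary keys are all assumptions made for this example; use the exact field names from your own Watson Machine Learning service credentials.

```python
# @hidden_cell
# The tag on the first line marks this cell as sensitive, so it is dropped
# when sharing with "All content excluding sensitive code cells".
import os

# Hypothetical key names and environment variables, for illustration only.
# Replace them with the fields from your own service credentials.
wml_credentials = {
    "url": os.environ.get("WML_URL", "<your-wml-url>"),
    "apikey": os.environ.get("WML_APIKEY", "<your-api-key>"),
    "instance_id": os.environ.get("WML_INSTANCE_ID", "<your-instance-id>"),
}

# Print only the key names, never the values.
print(sorted(wml_credentials))
```

Keeping credentials in a single tagged cell means one tag covers every secret, and the rest of the notebook can be shared unchanged.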

Sample output

View a copy of the notebook including output here.

Troubleshooting

Notebook error:

This will occur if you run the notebook multiple times. The custom library NAME found in the structure below must be unique for each run. Change the value and run the cell again.

library_metadata = {
    client.runtimes.LibraryMetaNames.NAME: "PipelineLabelEncoder-Custom",
    client.runtimes.LibraryMetaNames.DESCRIPTION: "label_encoder_sklearn",
    client.runtimes.LibraryMetaNames.FILEPATH: "Pipeline_LabelEncoder-0.1.zip",
    client.runtimes.LibraryMetaNames.VERSION: "1.0",
    client.runtimes.LibraryMetaNames.PLATFORM: {"name": "python", "versions": ["3.5"]}
}
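
One way to avoid editing the name by hand on every run is to generate it. The sketch below is a suggestion rather than part of the notebook: unique_library_name is a hypothetical helper, and it assumes the service accepts a name with a timestamp and a short random suffix appended.

```python
import uuid
from datetime import datetime, timezone

BASE_NAME = "PipelineLabelEncoder-Custom"


def unique_library_name(base=BASE_NAME):
    """Return `base` plus a UTC timestamp and a short random suffix.

    Hypothetical helper: the timestamp keeps names readable and sortable,
    and the random suffix avoids collisions between runs in the same second.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    suffix = uuid.uuid4().hex[:6]
    return f"{base}-{stamp}-{suffix}"


name_a = unique_library_name()
name_b = unique_library_name()
print(name_a)
```

The returned string can then be used as the value for client.runtimes.LibraryMetaNames.NAME in library_metadata in place of the fixed name.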

Learn more

Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns.
AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.
With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.
Watson Studio: Master the art of data science with IBM's Watson Studio.

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ
