wu haifeng edited this page Aug 2, 2020 · 80 revisions

Welcome to the Shifu wiki!

Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. It is designed for data scientists and simplifies the life cycle of building machine learning models. Although originally built for fraud modeling, Shifu has since been generalized to many other modeling domains.

Shifu provides a simple command-line interface for each step of the model-building process, including:

  • Statistics calculation and variable selection to determine the most predictive variables in your data
  • Variable normalization
  • Distributed variable selection based on sensitivity analysis
  • Distributed neural network model training
  • Distributed tree ensemble model training
  • Post-training analysis and model evaluation
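
As an illustration of what the variable normalization step in the list above can involve, here is a minimal Python sketch of z-score scaling with clipping. It is not Shifu's implementation: the function name `zscore_normalize`, the `cutoff` parameter, and the sample values are assumptions made for this example only.

```python
from statistics import mean, pstdev


def zscore_normalize(values, cutoff=4.0):
    """Scale values to zero mean and unit variance, clipping to +/- cutoff.

    Hypothetical helper for illustration; not part of Shifu.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no information; map every value to 0.
        return [0.0 for _ in values]
    # Clip each z-score so extreme outliers cannot dominate training.
    return [max(-cutoff, min(cutoff, (v - mu) / sigma)) for v in values]


# Hypothetical transaction amounts with one large outlier.
amounts = [10.0, 12.0, 11.0, 13.0, 500.0]
normalized = zscore_normalize(amounts)
```

Clipping bounds the influence of extreme values on downstream training. The default cutoff of 4 standard deviations used here is an arbitrary choice for the sketch.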

Shifu’s Hadoop-based distributed training of neural networks, logistic regression, and gradient boosted trees can reduce model training time from days to hours on terabyte-scale data sets. Shifu integrates with Pig/MapReduce workflows on Hadoop, and Shifu-trained models can be integrated into production code either in the standard PMML format or in Shifu’s native format through a simple Java API. Shifu leverages Hadoop, Pig, Akka, Encog, and other open-source projects.

Documents

Shifu: A Distributed Model Training Framework on Hadoop
