TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
salesforce/TransmogrifAI
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with an almost 100x reduction in time.
Use TransmogrifAI if you need a machine learning library to:
- Build production ready machine learning applications in hours, not months
- Build machine learning models without getting a Ph.D. in machine learning
- Build modular, reusable, strongly typed machine learning workflows
To understand the motivation behind TransmogrifAI check out these:
- Open Sourcing TransmogrifAI: Automated Machine Learning for Structured Data, a blog post by @snabar
- Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions, a talk by @tovbinm
- Low Touch Machine Learning, a talk by @leahmcguire
Skip to Quick Start and Documentation.
The Titanic dataset is an often-cited dataset in the machine learning community. The goal is to build a machine learnt model that will predict survivors from the Titanic passenger manifest. Here is how you would build the model using TransmogrifAI:
```scala
import com.salesforce.op._
import com.salesforce.op.readers._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

implicit val spark = SparkSession.builder.config(new SparkConf()).getOrCreate()
import spark.implicits._

// Read Titanic data as a DataFrame
val passengersData = DataReaders.Simple.csvCase[Passenger](path = pathToData).readDataset().toDF()

// Extract response and predictor Features
val (survived, predictors) = FeatureBuilder.fromDataFrame[RealNN](passengersData, response = "survived")

// Automated feature engineering
val featureVector = predictors.transmogrify()

// Automated feature validation and selection
val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true)

// Automated model selection
val pred = BinaryClassificationModelSelector().setInput(survived, checkedFeatures).getOutput()

// Setting up a TransmogrifAI workflow and training the model
val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(pred).train()

println("Model summary:\n" + model.summaryPretty())
```
Model summary:
```
Evaluated Logistic Regression, Random Forest models with 3 folds and AuPR metric.
Evaluated 3 Logistic Regression models with AuPR between [0.6751930383321765, 0.7768725281794376]
Evaluated 16 Random Forest models with AuPR between [0.7781671467343991, 0.8104798040316159]

Selected model Random Forest classifier with parameters:
|-----------------------|--------------|
| Model Param           | Value        |
|-----------------------|--------------|
| modelType             | RandomForest |
| featureSubsetStrategy | auto         |
| impurity              | gini         |
| maxBins               | 32           |
| maxDepth              | 12           |
| minInfoGain           | 0.001        |
| minInstancesPerNode   | 10           |
| numTrees              | 50           |
| subsamplingRate       | 1.0          |
|-----------------------|--------------|

Model evaluation metrics:
|-------------|--------------------|---------------------|
| Metric Name | Hold Out Set Value | Training Set Value  |
|-------------|--------------------|---------------------|
| Precision   | 0.85               | 0.773851590106007   |
| Recall      | 0.6538461538461539 | 0.6930379746835443  |
| F1          | 0.7391304347826088 | 0.7312186978297163  |
| AuROC       | 0.8821603927986905 | 0.8766642291593114  |
| AuPR        | 0.8225075757571668 | 0.850331080886535   |
| Error       | 0.1643835616438356 | 0.19682151589242053 |
| TP          | 17.0               | 219.0               |
| TN          | 44.0               | 438.0               |
| FP          | 3.0                | 64.0                |
| FN          | 9.0                | 97.0                |
|-------------|--------------------|---------------------|

Top model insights computed using correlation:
|-----------------------|----------------------|
| Top Positive Insights | Correlation          |
|-----------------------|----------------------|
| sex = "female"        | 0.5177801026737666   |
| cabin = "OTHER"       | 0.3331391338844782   |
| pClass = 1            | 0.3059642953159715   |
|-----------------------|----------------------|
| Top Negative Insights | Correlation          |
|-----------------------|----------------------|
| sex = "male"          | -0.5100301587292186  |
| pClass = 3            | -0.5075774968534326  |
| cabin = null          | -0.31463114463832633 |
|-----------------------|----------------------|

Top model insights computed using CramersV:
|-----------------------|----------------------|
| Top Insights          | CramersV             |
|-----------------------|----------------------|
| sex                   | 0.525557139885501    |
| embarked              | 0.31582347194683386  |
| age                   | 0.21582347194683386  |
|-----------------------|----------------------|
```
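The Titanic snippet above refers to a `Passenger` type without defining it. A minimal sketch of what such a case class might look like follows. The field names and types are assumptions modeled on the usual Titanic CSV columns (only `survived`, `sex`, `pClass`, `cabin`, `embarked` and `age` are confirmed by the summary above), so adjust them to match your data.

```scala
// Hypothetical schema for one row of the Titanic passenger manifest.
// Field names and types are assumptions; they are not taken from this README.
case class Passenger(
  id: Int,
  survived: Int,             // 1 if the passenger survived, 0 otherwise
  pClass: Option[Int],       // ticket class
  name: Option[String],
  sex: Option[String],
  age: Option[Double],
  sibSp: Option[Int],        // siblings / spouses aboard
  parCh: Option[Int],        // parents / children aboard
  ticket: Option[String],
  fare: Option[Double],
  cabin: Option[String],
  embarked: Option[String]   // port of embarkation
)
```

With this in scope, `pathToData` is simply a `String` pointing at the CSV file. Columns that can be missing are modeled as `Option`, which is how a value such as `cabin = null` in the insights above can be represented.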
This may seem a bit too magical. For those who want more control, TransmogrifAI also provides the flexibility to completely specify all the features being extracted and all the algorithms being applied in your ML pipeline. Visit our docs site for full documentation, getting started, examples, FAQ and other information.
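As a rough illustration of that extra control, the sketch below declares each feature by hand and restricts model selection to two algorithms. It is a sketch under assumptions, not the canonical API usage: the `SimplePassenger` record type and the object name are hypothetical, and the `FeatureBuilder` extractors and the `modelTypesToUse` parameter are written from memory of the project's Titanic example, so check them against the scaladoc for your version.

```scala
import com.salesforce.op._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification._
import com.salesforce.op.stages.impl.classification.BinaryClassificationModelsToTry._

// Hypothetical, trimmed-down record type used only for this sketch
case class SimplePassenger(survived: Int, pClass: Option[Int], sex: Option[String], age: Option[Double])

object ManualTitanicFeatures {
  // Explicitly declared response and predictor features, one per column
  val survived = FeatureBuilder.RealNN[SimplePassenger].extract(_.survived.toRealNN).asResponse
  val pClass = FeatureBuilder.PickList[SimplePassenger].extract(_.pClass.map(_.toString).toPickList).asPredictor
  val sex = FeatureBuilder.PickList[SimplePassenger].extract(_.sex.map(_.toString).toPickList).asPredictor
  val age = FeatureBuilder.Real[SimplePassenger].extract(_.age.toReal).asPredictor

  // Vectorize only the hand-picked predictors, then validate them against the response
  val featureVector = Seq(pClass, sex, age).transmogrify()
  val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true)

  // Restrict model selection to explicitly chosen algorithms (assumed parameter name)
  val pred = BinaryClassificationModelSelector
    .withCrossValidation(modelTypesToUse = Seq(OpLogisticRegression, OpRandomForestClassifier))
    .setInput(survived, checkedFeatures)
    .getOutput()
}
```

The resulting `pred` feature would be passed to an `OpWorkflow` through `setResultFeatures`, as in the Titanic example, together with a reader that produces the same record type.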
You can simply add TransmogrifAI as a regular dependency to an existing project. Start by picking the TransmogrifAI version that matches your project dependencies from the version matrix below (if unsure, take the stable version):
TransmogrifAI Version | Spark Version | Scala Version | Java Version |
---|---|---|---|
0.7.1 (unreleased, master), 0.7.0 (stable) | 2.4 | 2.11 | 1.8 |
0.6.1, 0.6.0, 0.5.3, 0.5.2, 0.5.1, 0.5.0 | 2.3 | 2.11 | 1.8 |
0.4.0, 0.3.4 | 2.2 | 2.11 | 1.8 |
For Gradle, in `build.gradle` add:

```groovy
repositories {
    jcenter()
    mavenCentral()
}
dependencies {
    // TransmogrifAI core dependency
    compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.7.0'

    // TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
    // compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.7.0'
}
```
For SBT, in `build.sbt` add:

```scala
scalaVersion := "2.11.12"

resolvers += Resolver.jcenterRepo

// TransmogrifAI core dependency
libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0"

// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.7.0"
```
Then import TransmogrifAI into your code:
```scala
// TransmogrifAI functionality: feature types, feature builders, feature dsl, readers, aggregators etc.
import com.salesforce.op._
import com.salesforce.op.aggregators._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.readers._

// Spark enrichments (optional)
import com.salesforce.op.utils.spark.RichDataset._
import com.salesforce.op.utils.spark.RichRDD._
import com.salesforce.op.utils.spark.RichRow._
import com.salesforce.op.utils.spark.RichMetadata._
import com.salesforce.op.utils.spark.RichStructType._
```
Visit our docs site for full documentation, getting started, examples, FAQ and other information.
See scaladoc for the programming API.
- Kevin Moore @jauntbox
- Kin Fai Kan @kinfaikan
- Leah McGuire @leahmcguire
- Matthew Tovbin @tovbinm
- Max Ovsiankin @maxov
- Michael Loh @mikeloh77
- Michael Weil @michaelweilsalesforce
- Shubha Nabar @snabar
- Vitaly Gordon @vitalyg
- Vlad Patryshev @vpatryshev
- Chris Rupley @crupley
- Chris Wu @cjwooo
- Eric Wayman @ericwayman
- Felipe Oliveira @feliperazeek
- Gera Shegalov @gerashegalov
- Jean-Marc Soumet @ajmssc
- Marco Vivero @marcovivero
- Mario Rodriguez @mrodriguezsfiq
- Mayukh Bhaowal @mayukhb
- Minh-An Quinn @minhanquinn
- Nicolas Drizard @nicodri
- Oleg Gusak @ogusak
- Patrick Framption @tricktrap
- Ryle Goehausen @ryleg
- Sanmitra Ijeri @sanmitra
- Sky Chen @almandsky
- Sophie Xiaodan Sun @sxd929
- Till Bergmann @tillbe
- Xiaoqian Liu @wingsrc
BSD 3-Clause © Salesforce.com, Inc.