Automation of feature engineering, machine learning, model evaluation, model interpretation, eda, forecasting, recommender systems and more.

AdrianAntico/RetroFit

Version: 0.1.7 | Python | Build: Passing | Maintenance | PRs Welcome | GitHub Stars

Table of Contents

Documentation + Code Examples

Quick Note

This package is currently in its beginning stages. I'll be working off a blueprint from my R package RemixAutoML so there should be minimal breakages upon new releases, only non-breaking enhancements and additions.

Installation

    # Most up-to-date
    pip install git+https://github.com/AdrianAntico/RetroFit.git#egg=retrofit

    # From pypi
    pip install retrofit==0.1.7

    # Check out R package RemixAutoML
    https://github.com/AdrianAntico/RemixAutoML

Feature Engineering Note

Feature Engineering: Some of the feature engineering functions can only be found in this package. I believe feature engineering is your best bet for improving model performance. The functions cover all feature types: numeric data, categorical data, text data, and date data. They are all designed to generate features for both training and scoring pipelines, and they run fast with low memory utilization. The Feature Engineering class lets the user generate features with datatable, polars, or pandas for all feature engineering and data wrangling methods. Every method collects its parameter settings, which are then reused in scoring pipelines without the user having to save them. This makes designing training and scoring pipelines much easier.
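To make the idea of collected parameter settings concrete, here is a minimal pure-Python sketch. It is not RetroFit's implementation: the class `SavedArgsFE`, the method `add_lag`, and the `Lag_1_Leads` column naming are hypothetical and exist only for illustration. The only thing borrowed from the examples in this README is the `use_saved_args` flag.

```python
class SavedArgsFE:
    """Hypothetical stand-in for a feature engineering class (not RetroFit code)."""

    def __init__(self):
        # Maps a method name to the settings used in its last training-time call
        self.saved_args = {}

    def add_lag(self, rows, column=None, periods=1, impute=-1, use_saved_args=False):
        if use_saved_args:
            # Scoring: replay the settings captured during training
            args = self.saved_args["add_lag"]
        else:
            # Training: capture the settings so scoring can reuse them
            args = {"column": column, "periods": periods, "impute": impute}
            self.saved_args["add_lag"] = args
        col, n, fill = args["column"], args["periods"], args["impute"]
        out = []
        for i, row in enumerate(rows):
            new_row = dict(row)
            # Rows without enough history receive the impute value
            new_row[f"Lag_{n}_{col}"] = rows[i - n][col] if i >= n else fill
            out.append(new_row)
        return out


fe = SavedArgsFE()
train = [{"Leads": 10}, {"Leads": 12}, {"Leads": 15}]
score = [{"Leads": 20}, {"Leads": 21}]

# Training call supplies the settings; the scoring call only sets the flag
train_out = fe.add_lag(train, column="Leads", periods=1, impute=-1)
score_out = fe.add_lag(score, use_saved_args=True)
```

The scoring call passes no column name or lag period, which is the convenience the note above describes.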

Machine Learning Note

Machine Learning Training: the goal here is to enable the data scientist or machine learning engineer to effortlessly build any number of models, with full optionality to tweak all available underlying parameters offered by the various algorithms. The underlying data can come from datatable or polars, which means you'll be able to model with bigger data than if you were using pandas. All models come with the ability to generate comprehensive evaluation metrics, evaluation plots, importances, and feature insights. Scoring should be seamless, from regenerating features for scoring to the actual scoring. The RetroFit class makes this easy and fast, with minimal memory utilization.

Feature Engineering

FE0 Feature Engineering Row Dependence

FE0_AutoLags()

Function Description

FE0_AutoLags() Automatically generate any number of lags, for any number of columns, by any number of By-Variables, using datatable.

Code Example

    # QA: Test FE0_AutoLags
    import pkg_resources
    import timeit
    import datatable as dt
    import polars as pl
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import PolarsFE as pfe

    # Path to the benchmark data shipped with the package
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')

    # No Group Example: datatable

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    data = dt.fread(FilePath)
    t_start = timeit.default_timer()
    data1 = FE.FE0_AutoLags(
        data=data, LagPeriods=1, LagColumnNames='Leads',
        DateColumnName='CalendarDateColumn', ByVariables=None,
        ImputeValue=-1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data1.names)

    # No Group Example: polars

    # Instantiate Feature Engineering Class
    FE = pfe.FE()

    # Run function
    data = pl.read_csv(FilePath)
    t_start = timeit.default_timer()
    data2 = FE.FE0_AutoLags(
        data=data, LagPeriods=1, LagColumnNames='Leads',
        DateColumnName='CalendarDateColumn', ByVariables=None,
        ImputeValue=-1.0, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data2.columns)

    # Group Example, Single Lag: datatable

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    data = dt.fread(FilePath)
    t_start = timeit.default_timer()
    data1 = FE.FE0_AutoLags(
        data=data, LagPeriods=1, LagColumnNames='Leads',
        DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        ImputeValue=-1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data1.names)

    # Group Example: polars

    # Instantiate Feature Engineering Class
    FE = pfe.FE()

    # Run function
    data = pl.read_csv(FilePath)
    t_start = timeit.default_timer()
    data2 = FE.FE0_AutoLags(
        data=data, LagPeriods=1, LagColumnNames='Leads',
        DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        ImputeValue=-1.0, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data2.columns)

    # Group and Multiple Periods and LagColumnNames: datatable

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    data = dt.fread(FilePath)
    t_start = timeit.default_timer()
    data1 = FE.FE0_AutoLags(
        data=data, LagPeriods=[1, 3, 5], LagColumnNames=['Leads', 'XREGS1'],
        DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        ImputeValue=-1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data1.names)

    # Group and Multiple Periods and LagColumnNames: polars

    # Instantiate Feature Engineering Class
    FE = pfe.FE()

    # Run function
    data = pl.read_csv(FilePath)
    t_start = timeit.default_timer()
    data2 = FE.FE0_AutoLags(
        data=data, LagPeriods=[1, 3, 5], LagColumnNames=['Leads', 'XREGS1'],
        DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        ImputeValue=-1.0, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data2.columns)
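For readers who want to see what a grouped lag computes, the following pure-Python sketch mirrors the arguments used above (lag periods, a lag column, a date column, By-Variables, and an impute value). It is an illustration under assumptions, not the package's datatable or polars code: the function `grouped_lags`, the toy rows, and the `Lag_<period>_<column>` naming are hypothetical.

```python
from collections import defaultdict


def grouped_lags(rows, value_col, date_col, by_cols, periods, impute=-1):
    """Return copies of rows, sorted by group then date, with lag columns added."""
    ordered = sorted(rows, key=lambda r: (tuple(r[c] for c in by_cols), r[date_col]))
    history = defaultdict(list)  # values already seen, per group
    out = []
    for row in ordered:
        key = tuple(row[c] for c in by_cols)
        seen = history[key]
        new_row = dict(row)
        for p in periods:
            # Lag p is the value p rows earlier within the same group
            new_row[f"Lag_{p}_{value_col}"] = seen[-p] if len(seen) >= p else impute
        seen.append(row[value_col])
        out.append(new_row)
    return out


rows = [
    {"Segment": "A", "Date": "2021-01-02", "Leads": 12},
    {"Segment": "B", "Date": "2021-01-01", "Leads": 7},
    {"Segment": "A", "Date": "2021-01-01", "Leads": 10},
    {"Segment": "A", "Date": "2021-01-03", "Leads": 15},
]
result = grouped_lags(rows, "Leads", "Date", ["Segment"], periods=[1, 2])
for r in result:
    print(r)
```

Rows at the start of each group have no history, so they receive the impute value, which is the role `ImputeValue` appears to play in the calls above.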

FE0_AutoRollStats()

Function Description

FE0_AutoRollStats() Automatically generate any number of moving averages, moving standard deviations, moving mins and moving maxs from any number of source columns, by any number of By-Variables, using datatable.

Code Example

    # Test Function
    import pkg_resources
    import timeit
    import datatable as dt
    import retrofit
    from retrofit import DatatableFE as dtfe

    ## No Group Example:
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    t_start = timeit.default_timer()
    data = FE.FE0_AutoRollStats(
        data=data, RollColumnNames='Leads',
        DateColumnName='CalendarDateColumn', ByVariables=None,
        MovingAvg_Periods=[3, 5, 7], MovingSD_Periods=[3, 5, 7],
        MovingMin_Periods=[3, 5, 7], MovingMax_Periods=[3, 5, 7],
        ImputeValue=-1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data.names)

    ## Group and Multiple Periods and RollColumnNames:
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    t_start = timeit.default_timer()
    data = FE.FE0_AutoRollStats(
        data=data, RollColumnNames=['Leads', 'XREGS1'],
        DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        MovingAvg_Periods=[3, 5, 7], MovingSD_Periods=[3, 5, 7],
        MovingMin_Periods=[3, 5, 7], MovingMax_Periods=[3, 5, 7],
        ImputeValue=-1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data.names)

    ## No Group Example:
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    t_start = timeit.default_timer()
    data = FE.FE0_AutoRollStats(
        data=data, RollColumnNames='Leads',
        DateColumnName='CalendarDateColumn', ByVariables=None,
        MovingAvg_Periods=[3, 5, 7], MovingSD_Periods=[3, 5, 7],
        MovingMin_Periods=[3, 5, 7], MovingMax_Periods=[3, 5, 7],
        ImputeValue=-1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data.names)
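As a rough illustration of the four rolling statistics, here is a standard-library sketch for a single column and a single window size. Two things are assumptions rather than documented behavior of FE0_AutoRollStats(): that the window is trailing and includes the current row, and that positions without a full window receive the impute value. The function name `rolling_stats` is hypothetical.

```python
import statistics


def rolling_stats(values, window, impute=-1):
    """Trailing-window mean, sample SD, min and max for each position."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet: fall back to the impute value
            out.append({"mean": impute, "sd": impute, "min": impute, "max": impute})
            continue
        w = values[i + 1 - window: i + 1]
        out.append({
            "mean": statistics.fmean(w),
            # A sample standard deviation needs at least two points
            "sd": statistics.stdev(w) if window > 1 else impute,
            "min": min(w),
            "max": max(w),
        })
    return out


stats = rolling_stats([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
for s in stats:
    print(s)
```

Running the same function once per window size and per source column, within each group, gives the kind of feature set the description above refers to.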

FE0_AutoDiff()

Function Description

FE0_AutoDiff() Automatically generate any number of differences from any number of source columns, for numeric, character, and date columns, by any number of By-Variables, using datatable.

Code Example

    # Test Function
    import pkg_resources
    import timeit
    import datatable as dt
    import retrofit
    from retrofit import DatatableFE as dtfe

    ## Group Example:
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    t_start = timeit.default_timer()
    data = FE.FE0_AutoDiff(
        data=data, DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        DiffNumericVariables='Leads', DiffDateVariables='CalendarDateColumn',
        DiffGroupVariables=None, NLag1=0, NLag2=1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data.names)

    ## Group and Multiple Periods and RollColumnNames:
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    t_start = timeit.default_timer()
    data = FE.FE0_AutoDiff(
        data=data, DateColumnName='CalendarDateColumn',
        ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        DiffNumericVariables='Leads', DiffDateVariables='CalendarDateColumn',
        DiffGroupVariables=None, NLag1=0, NLag2=1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data.names)

    ## No Group Example:
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    t_start = timeit.default_timer()
    data = FE.FE0_AutoDiff(
        data=data, DateColumnName='CalendarDateColumn', ByVariables=None,
        DiffNumericVariables='Leads', DiffDateVariables='CalendarDateColumn',
        DiffGroupVariables=None, NLag1=0, NLag2=1, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    print(data.names)
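The `NLag1` and `NLag2` arguments are not explained in this README. One plausible reading, and it is only an assumption, is that the difference is taken between the value `NLag1` rows back and the value `NLag2` rows back, so `NLag1=0, NLag2=1` gives the ordinary first difference. The sketch below implements that reading for numeric and date values with a hypothetical helper `lag_diff`.

```python
from datetime import date


def lag_diff(values, n_lag1=0, n_lag2=1, impute=None):
    """Difference between the value n_lag1 rows back and the value n_lag2 rows back."""
    out = []
    for i in range(len(values)):
        if i - n_lag1 < 0 or i - n_lag2 < 0:
            # Not enough history for one of the two lags
            out.append(impute)
        else:
            out.append(values[i - n_lag1] - values[i - n_lag2])
    return out


# Numeric column: ordinary first difference
numeric_diff = lag_diff([10, 12, 15, 11])

# Date column: subtracting dates yields timedeltas, reported here in days
dates = [date(2021, 1, 1), date(2021, 1, 3), date(2021, 1, 10)]
date_diff = [d.days if d is not None else None for d in lag_diff(dates)]

print(numeric_diff)
print(date_diff)
```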

FE1 Feature Engineering Row Independence

FE1_AutoCalendarVariables()

Function Description

FE1_AutoCalendarVariables() Automatically generate calendar variables from your datatable.

Code Example

    # Test Function
    import pkg_resources
    import timeit
    import datatable as dt
    import polars as pl
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import PolarsFE as pfe

    # DatatableFE

    # Data can be created using the R package RemixAutoML and function FakeDataGenerator
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    t_start = timeit.default_timer()
    data = FE.FE1_AutoCalendarVariables(
        data=data, DateColumnNames='CalendarDateColumn',
        CalendarVariables=['wday', 'mday', 'month', 'quarter', 'year'],
        use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    data.names

    # PolarsFE

    # Reload the data as a polars DataFrame
    data = pl.read_csv(FilePath)

    # Instantiate Feature Engineering Class
    FE = pfe.FE()

    t_start = timeit.default_timer()
    data = FE.FE1_AutoCalendarVariables(
        data=data, DateColumnNames='CalendarDateColumn',
        CalendarVariables=['wday', 'mday', 'month', 'quarter', 'year'],
        use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    data.columns
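The calendar variable names used above (`wday`, `mday`, `month`, `quarter`, `year`) map naturally onto the standard library's `datetime.date`. The sketch below shows one way to derive them for a single date. The helper `calendar_variables` is hypothetical, and the weekday numbering (1 = Monday through 7 = Sunday) is a convention chosen for this example; the package may number weekdays differently.

```python
from datetime import date


def calendar_variables(d):
    """Derive simple calendar features from a date."""
    return {
        "wday": d.isoweekday(),             # 1 = Monday ... 7 = Sunday (assumed convention)
        "mday": d.day,                      # day of the month
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, and so on
        "year": d.year,
    }


features = calendar_variables(date(2021, 8, 15))
print(features)
```

Because each output depends only on the row's own date, this kind of feature belongs in the row-independent group, unlike lags and rolling statistics.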

FE1_DummyVariables()

Function Description

FE1_DummyVariables() Automatically generate dummy variables for user supplied categorical columns

Code Example

    # Example: datatable
    import pkg_resources
    import timeit
    import datatable as dt
    import polars as pl
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import PolarsFE as pfe

    # DatatableFE

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Run function
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)
    t_start = timeit.default_timer()
    data = FE.FE1_DummyVariables(
        data=data,
        CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2'],
        use_saved_args=False)
    t_end = timeit.default_timer()
    t_end - t_start

    # Example: polars

    # PolarsFE

    # Instantiate Feature Engineering Class
    FE = pfe.FE()

    # Run function (reads the packaged data set instead of a machine-specific path)
    data = pl.read_csv(FilePath)
    t_start = timeit.default_timer()
    data = FE.FE1_DummyVariables(
        data=data, ArgsList=None,
        CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2'],
        use_saved_args=False)
    t_end = timeit.default_timer()
    t_end - t_start
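Dummy variables are a case where remembering training-time settings matters, because scoring data must produce exactly the same columns. The following sketch is a pure-Python illustration, not the package's code: `dummy_variables`, the `<column>_<level>` naming, and the choice to drop the source column are all assumptions.

```python
def dummy_variables(rows, column, levels=None):
    """One-hot encode `column`; reuse `levels` from training when scoring."""
    if levels is None:
        # Training: learn the set of levels from the data
        levels = sorted({row[column] for row in rows})
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        out.append(new_row)
    return out, levels


train = [{"Segment": "A", "Leads": 10}, {"Segment": "B", "Leads": 12}]
score = [{"Segment": "B", "Leads": 20}, {"Segment": "C", "Leads": 21}]

train_out, learned_levels = dummy_variables(train, "Segment")
# Scoring reuses the learned levels, so the unseen level "C" becomes all zeros
score_out, _ = dummy_variables(score, "Segment", levels=learned_levels)
print(train_out)
print(score_out)
```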

FE2 Feature Engineering Full Data Set

FE2_ColTypeConversions()

Function Description

FE2_ColTypeConversions() Automatically convert column types required by certain models

Code Example

    # Setup Environment
    import pkg_resources
    import timeit
    import datatable as dt
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import MachineLearning as ml

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/RegressionData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Create some lags
    data = FE.FE0_AutoLags(
        data,
        LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
        DateColumnName='DateTime', ByVariables='Factor_1', LagPeriods=[1, 2],
        ImputeValue=-1, Sort=True, use_saved_args=False)

    # Create some rolling stats
    data = FE.FE0_AutoRollStats(
        data,
        RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
        DateColumnName='DateTime', ByVariables='Factor_1',
        MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
        MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
        ImputeValue=-1, Sort=True, use_saved_args=False)

    # Create some diffs
    data = FE.FE0_AutoDiff(
        data,
        DateColumnName='DateTime', ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
        DiffNumericVariables='Independent_Variable1', DiffDateVariables=None,
        DiffGroupVariables=None, NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

    # Create Calendar Vars
    data = FE.FE1_AutoCalendarVariables(
        data, DateColumnNames='DateTime',
        CalendarVariables=['wday', 'month', 'quarter'], use_saved_args=False)

    # Type conversions for modeling
    # (the stray `self` argument is dropped: it is supplied implicitly by the method call)
    data = FE.FE2_ColTypeConversions(
        data, Int2Float=True, Bool2Float=True, RemoveDateCols=False,
        RemoveStrCols=False, SkipCols=None, use_saved_args=False)
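The `Int2Float`, `Bool2Float`, and `SkipCols` options suggest a simple rule: cast integer and boolean columns to float unless a column is explicitly skipped. The sketch below shows that rule on a plain dictionary of columns. It is an assumption about the intent, not the package's implementation, and `convert_column_types` is a hypothetical name.

```python
def convert_column_types(columns, int2float=True, bool2float=True, skip_cols=None):
    """Return a copy of a column dict with integer and boolean columns cast to float."""
    skip = set(skip_cols or [])
    converted = {}
    for name, values in columns.items():
        is_bool = all(isinstance(v, bool) for v in values)
        # bool is a subclass of int in Python, so booleans are excluded explicitly
        is_int = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
        cast = name not in skip and bool(values) and (
            (is_bool and bool2float) or (is_int and int2float))
        converted[name] = [float(v) for v in values] if cast else list(values)
    return converted


columns = {
    "Leads": [10, 12, 15],
    "IsPromo": [True, False, True],
    "Segment": ["A", "B", "A"],
    "RowID": [1, 2, 3],
}
converted = convert_column_types(columns, skip_cols=["RowID"])
print(converted)
```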

FE2_AutoDataParition()

Function Description

FE2_AutoDataParition() Automatically create data sets for training based on random or time based splits

Code Example

    # FE2_AutoDataParition Example

    # Setup Environment
    import pkg_resources
    import timeit
    import datatable as dt
    import polars as pl
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import PolarsFE as pfe

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/RegressionData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # datatable random Example
    t_start = timeit.default_timer()
    DataSets = FE.FE2_AutoDataParition(
        data=data, DateColumnName='CalendarDateColumn', PartitionType='random',
        Ratios=[0.70, 0.20, 0.10], Sort=False, ByVariables=None, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']

    # polars random Example (uses the polars Feature Engineering class)
    FE = pfe.FE()
    data = pl.read_csv(FilePath)
    t_start = timeit.default_timer()
    DataSets = FE.FE2_AutoDataParition(
        data=data, DateColumnName='CalendarDateColumn', PartitionType='random',
        Ratios=[0.70, 0.20, 0.10], ByVariables=None, Sort=False, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']

    # datatable time Example
    FE = dtfe.FE()
    data = dt.fread(FilePath)
    t_start = timeit.default_timer()
    DataSets = FE.FE2_AutoDataParition(
        data=data, DateColumnName='CalendarDateColumn', PartitionType='time',
        Ratios=[0.70, 0.20, 0.10], Sort=True, ByVariables=None, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']

    # polars time Example (uses the polars Feature Engineering class)
    FE = pfe.FE()
    data = pl.read_csv(FilePath)
    t_start = timeit.default_timer()
    DataSets = FE.FE2_AutoDataParition(
        data=data, DateColumnName='CalendarDateColumn', PartitionType='time',
        Ratios=[0.70, 0.20, 0.10], ByVariables=None, Sort=True, use_saved_args=False)
    t_end = timeit.default_timer()
    print(t_end - t_start)
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']
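The difference between the `'random'` and `'time'` partition types can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the package's algorithm: `partition` is a hypothetical helper, the `seed` argument is added here for reproducibility, and the rounding rule for the split points is a choice made for this example. Only the output keys (`TrainData`, `ValidationData`, `TestData`) and the ratio list come from the examples above.

```python
import random


def partition(rows, ratios, partition_type="random", date_col=None, seed=0):
    """Split rows into train, validation and test sets according to `ratios`."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    if partition_type == "time":
        # Time split: oldest rows train the model, newest rows test it
        ordered = sorted(rows, key=lambda r: r[date_col])
    elif partition_type == "random":
        ordered = list(rows)
        random.Random(seed).shuffle(ordered)
    else:
        raise ValueError("partition_type must be 'random' or 'time'")
    n = len(ordered)
    train_end = int(round(n * ratios[0]))
    valid_end = train_end + int(round(n * ratios[1]))
    return {
        "TrainData": ordered[:train_end],
        "ValidationData": ordered[train_end:valid_end],
        "TestData": ordered[valid_end:],
    }


rows = [{"Day": d, "Leads": d * 2} for d in [3, 9, 0, 5, 1, 8, 2, 7, 4, 6]]
random_sets = partition(rows, [0.70, 0.20, 0.10], partition_type="random")
time_sets = partition(rows, [0.70, 0.20, 0.10], partition_type="time", date_col="Day")
print({k: len(v) for k, v in random_sets.items()})
print([r["Day"] for r in time_sets["TestData"]])
```

A time-based split avoids training on rows that come after the rows being evaluated, which matters once lag and rolling features are in play.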

FE3 Feature Engineering Model-Based

Coming soon

Machine Learning

ML0 Machine Learning: Prepare for Modeling

ML0_Parameters()

Function Description

ML0_Parameters() Automatically generate parameters for modeling. User can update the parameters as desired.

Code Example

    # Setup Environment
    import pkg_resources
    import timeit
    import datatable as dt
    from datatable import sort, f, by
    import retrofit
    from retrofit import FeatureEngineering as fe
    from retrofit import MachineLearning as ml

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Create partitioned data sets
    Data = fe.FE2_AutoDataParition(
        data=data, ArgsList=None, DateColumnName=None, PartitionType='random',
        Ratios=[0.7, 0.2, 0.1], ByVariables=None, Sort=False,
        Processing='datatable', InputFrame='datatable', OutputFrame='datatable')

    # Prepare modeling data sets
    DataSets = ml.ML0_GetModelData(
        Processing='catboost',
        TrainData=Data['TrainData'],
        ValidationData=Data['ValidationData'],
        TestData=Data['TestData'],
        ArgsList=None,
        TargetColumnName='Leads',
        NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
        CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1, InputFrame='datatable')

    # Get args list for algorithm and target type
    ModelArgs = ml.ML0_Parameters(Algorithms='CatBoost', TargetType='Regression', TrainMethod='Train')

ML0_GetModelData()

Function Description

ML0_GetModelData() Automatically create data sets for the chosen ML algorithm. Currently supports catboost, xgboost, and lightgbm.

Code Example

    # ML0_GetModelData Example:
    import pkg_resources
    import datatable as dt
    from datatable import sort, f, by
    import retrofit
    from retrofit import FeatureEngineering as fe
    from retrofit import MachineLearning as ml

    ############################################################################################
    # CatBoost
    ############################################################################################

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Create partitioned data sets
    DataSets = fe.FE2_AutoDataParition(
        data=data, ArgsList=None, DateColumnName='CalendarDateColumn',
        PartitionType='random', Ratios=[0.70, 0.20, 0.10], ByVariables=None,
        Processing='datatable', InputFrame='datatable', OutputFrame='datatable')

    # Collect partitioned data
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']
    del DataSets

    # Create catboost data sets
    DataSets = ml.ML0_GetModelData(
        TrainData=TrainData, ValidationData=ValidationData, TestData=TestData,
        ArgsList=None,
        TargetColumnName='Leads',
        NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
        CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1,
        Processing='catboost', InputFrame='datatable')

    # Collect catboost training data
    catboost_train = DataSets['train_data']
    catboost_validation = DataSets['validation_data']
    catboost_test = DataSets['test_data']

    ############################################################################################
    # XGBoost
    ############################################################################################

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Create partitioned data sets
    DataSets = fe.FE2_AutoDataParition(
        data=data, ArgsList=None, DateColumnName='CalendarDateColumn',
        PartitionType='random', Ratios=[0.70, 0.20, 0.10], ByVariables=None,
        Processing='datatable', InputFrame='datatable', OutputFrame='datatable')

    # Collect partitioned data
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']
    del DataSets

    # Create xgboost data sets
    DataSets = ml.ML0_GetModelData(
        TrainData=TrainData, ValidationData=ValidationData, TestData=TestData,
        ArgsList=None,
        TargetColumnName='Leads',
        NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
        CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1,
        Processing='xgboost', InputFrame='datatable')

    # Collect xgboost training data
    xgboost_train = DataSets['train_data']
    xgboost_validation = DataSets['validation_data']
    xgboost_test = DataSets['test_data']

    ############################################################################################
    # LightGBM
    ############################################################################################

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/BenchmarkData.csv')
    data = dt.fread(FilePath)

    # Create partitioned data sets
    DataSets = fe.FE2_AutoDataParition(
        data=data, ArgsList=None, DateColumnName='CalendarDateColumn',
        PartitionType='random', Ratios=[0.70, 0.20, 0.10], ByVariables=None,
        Processing='datatable', InputFrame='datatable', OutputFrame='datatable')

    # Collect partitioned data
    TrainData = DataSets['TrainData']
    ValidationData = DataSets['ValidationData']
    TestData = DataSets['TestData']
    del DataSets

    # Create lightgbm data sets
    DataSets = ml.ML0_GetModelData(
        TrainData=TrainData, ValidationData=ValidationData, TestData=TestData,
        ArgsList=None,
        TargetColumnName='Leads',
        NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
        CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1,
        Processing='lightgbm', InputFrame='datatable')

    # Collect lightgbm training data
    lightgbm_train = DataSets['train_data']
    lightgbm_validation = DataSets['validation_data']
    lightgbm_test = DataSets['test_data']

ML1 Machine Learning RetroFit Class

Class Meta Information

Class Goals

    ####################################
    # Goals
    ####################################

    Class Initialization
    Model Initialization
    Training
    Feature Tuning
    Grid Tuning
    Model Scoring
    Model Evaluation
    Model Interpretation

Class Functions

    ####################################
    # Functions
    ####################################

    ML1_Single_Train()
    ML1_Single_Score()
    ML1_Single_Evaluate()
    PrintAlgoArgs()

Class Attributes

    ####################################
    # Attributes
    ####################################

    self.ModelArgs = ModelArgs
    self.ModelArgsNames = [*self.ModelArgs]
    self.Runs = len(self.ModelArgs)
    self.DataSets = DataSets
    self.DataSetsNames = [*self.DataSets]
    self.ModelList = dict()
    self.ModelListNames = []
    self.FitList = dict()
    self.FitListNames = []
    self.EvaluationList = dict()
    self.EvaluationListNames = []
    self.InterpretationList = dict()
    self.InterpretationListNames = []
    self.CompareModelsList = dict()
    self.CompareModelsListNames = []

Ftrl Examples

Regression Training

    ####################################
    # Ftrl Regression
    ####################################

    # Setup Environment
    import pkg_resources
    import timeit
    import datatable as dt
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import MachineLearning as ml

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/RegressionData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Create some lags
    data = FE.FE0_AutoLags(
        data,
        LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
        DateColumnName='DateTime', ByVariables='Factor_1', LagPeriods=[1, 2],
        ImputeValue=-1, Sort=True, use_saved_args=False)

    # Create some rolling stats
    data = FE.FE0_AutoRollStats(
        data,
        RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
        DateColumnName='DateTime', ByVariables='Factor_1',
        MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
        MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
        ImputeValue=-1, Sort=True, use_saved_args=False)

    # Create some diffs
    data = FE.FE0_AutoDiff(
        data,
        DateColumnName='DateTime', ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
        DiffNumericVariables='Independent_Variable1', DiffDateVariables=None,
        DiffGroupVariables=None, NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

    # Create Calendar Vars
    data = FE.FE1_AutoCalendarVariables(
        data, DateColumnNames='DateTime',
        CalendarVariables=['wday', 'month', 'quarter'], use_saved_args=False)

    # Type conversions for modeling
    data = FE.FE2_ColTypeConversions(
        data, Int2Float=True, Bool2Float=True, RemoveDateCols=True,
        RemoveStrCols=False, SkipCols=None, use_saved_args=False)

    # Drop Text Cols (no word2vec yet)
    data = data[:, [z for z in data.names if z not in ['Comment']]]

    # Create partitioned data sets
    DataFrames = FE.FE2_AutoDataPartition(
        data, DateColumnName=None, PartitionType='random', Ratios=[0.7, 0.2, 0.1],
        ByVariables=None, Sort=False, use_saved_args=False)

    # Prepare modeling data sets
    ModelData = ml.ML0_GetModelData(
        Processing='Ftrl',
        TrainData=DataFrames['TrainData'],
        ValidationData=DataFrames['ValidationData'],
        TestData=DataFrames['TestData'],
        ArgsList=None,
        TargetColumnName='Adrian',
        NumericColumnNames=[z for z in list(data.names) if z not in ['Factor_1', 'Factor_2', 'Factor_3', 'Adrian']],
        CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1, InputFrame='datatable')

    # Get args list for algorithm and target type
    ModelArgs = ml.ML0_Parameters(Algorithms='Ftrl', TargetType='Regression', TrainMethod='Train')

    # Initialize RetroFit
    x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

    # Train Model
    x.ML1_Single_Train(Algorithm='Ftrl')

    # Score data
    x.ML1_Single_Score(
        DataName=x.DataSetsNames[2], ModelName=x.ModelListNames[0],
        Algorithm='Ftrl', NewData=None)

    # Evaluate scored data
    metrics = x.ML1_Single_Evaluate(
        FitName=x.FitListNames[0],
        TargetType=x.ModelArgs.get('Ftrl')['TargetType'],
        ScoredDataName=x.DataSetsNames[-1], ByVariables=None, CostDict=None)

    # Metrics
    metrics.keys()

    # Scoring data names
    x.DataSetsNames

    # Scoring data
    x.DataSets.get('Scored_test_data_Ftrl_1')

    # Check ModelArgs Dict
    x.PrintAlgoArgs(Algo='Ftrl')

    # List of model names
    x.ModelListNames

    # List of model fitted names
    x.FitListNames

Classification Training

    ####################################
    # Ftrl Classification
    ####################################

    # Setup Environment
    import pkg_resources
    import timeit
    import datatable as dt
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import MachineLearning as ml

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/ClassificationData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Create some lags
    data = FE.FE0_AutoLags(
        data,
        LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
        DateColumnName='DateTime', ByVariables='Factor_1', LagPeriods=[1, 2],
        ImputeValue=-1, Sort=True, use_saved_args=False)

    # Create some rolling stats
    data = FE.FE0_AutoRollStats(
        data,
        RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
        DateColumnName='DateTime', ByVariables='Factor_1',
        MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
        MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
        ImputeValue=-1, Sort=True, use_saved_args=False)

    # Create some diffs
    data = FE.FE0_AutoDiff(
        data,
        DateColumnName='DateTime', ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
        DiffNumericVariables='Independent_Variable1', DiffDateVariables=None,
        DiffGroupVariables=None, NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

    # Create Calendar Vars
    data = FE.FE1_AutoCalendarVariables(
        data, DateColumnNames='DateTime',
        CalendarVariables=['wday', 'month', 'quarter'], use_saved_args=False)

    # Type conversions for modeling
    data = FE.FE2_ColTypeConversions(
        data, Int2Float=True, Bool2Float=True, RemoveDateCols=True,
        RemoveStrCols=False, SkipCols=None, use_saved_args=False)

    # Drop Text Cols (no word2vec yet)
    data = data[:, [z for z in data.names if z not in ['Comment']]]

    # Create partitioned data sets
    DataFrames = FE.FE2_AutoDataPartition(
        data, DateColumnName=None, PartitionType='random', Ratios=[0.7, 0.2, 0.1],
        ByVariables=None, Sort=False, use_saved_args=False)

    # Prepare modeling data sets
    ModelData = ml.ML0_GetModelData(
        Processing='Ftrl',
        TrainData=DataFrames['TrainData'],
        ValidationData=DataFrames['ValidationData'],
        TestData=DataFrames['TestData'],
        ArgsList=None,
        TargetColumnName='Adrian',
        NumericColumnNames=[z for z in list(data.names) if z not in ['Factor_1', 'Factor_2', 'Factor_3', 'Adrian']],
        CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1, InputFrame='datatable')

    # Get args list for algorithm and target type
    ModelArgs = ml.ML0_Parameters(Algorithms='Ftrl', TargetType='Classification', TrainMethod='Train')

    # Initialize RetroFit
    x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

    # Train Model
    x.ML1_Single_Train(Algorithm='Ftrl')

    # Score data
    x.ML1_Single_Score(
        DataName=x.DataSetsNames[2], ModelName=x.ModelListNames[0],
        Algorithm='Ftrl', NewData=None)

    # Evaluate scored data
    metrics = x.ML1_Single_Evaluate(
        FitName=x.FitListNames[0],
        TargetType=x.ModelArgs.get('Ftrl')['TargetType'],
        ScoredDataName=x.DataSetsNames[-1], ByVariables=None,
        CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=1))

    # Metrics
    metrics.keys()

    # Scoring data names
    x.DataSetsNames

    # Scoring data
    x.DataSets.get('Scored_test_data_Ftrl_1')

    # Check ModelArgs Dict
    x.PrintAlgoArgs(Algo='Ftrl')

    # List of model names
    x.ModelListNames

    # List of model fitted names
    x.FitListNames
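The classification example passes a `CostDict` with one cost per confusion-matrix cell. How RetroFit uses it internally is not shown in this README, so the sketch below is only one plausible interpretation: count true and false positives and negatives at a probability threshold, then weight each count by its cost. The helpers `confusion_counts` and `total_cost`, the toy labels, and the threshold are hypothetical; the dictionary keys and values are copied from the example above.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count confusion-matrix cells for binary labels at a probability threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def total_cost(counts, cost_dict):
    """Weight each confusion-matrix cell by its cost and sum the result."""
    return sum(counts[cell] * cost_dict[cell + "cost"] for cell in ("tp", "fp", "fn", "tn"))


cost_dict = dict(tpcost=0, fpcost=1, fncost=1, tncost=1)
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.1]

counts = confusion_counts(y_true, y_prob, threshold=0.5)
cost = total_cost(counts, cost_dict)
print(counts, cost)
```

Evaluating the total cost over a grid of thresholds would be one way to pick a cost-minimizing cutoff.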

MultiClass Training

    ####################################
    # Ftrl MultiClass
    ####################################

    # Setup Environment
    import pkg_resources
    import timeit
    import datatable as dt
    import retrofit
    from retrofit import DatatableFE as dtfe
    from retrofit import MachineLearning as ml

    # Load some data
    FilePath = pkg_resources.resource_filename('retrofit', 'datasets/MultiClassData.csv')
    data = dt.fread(FilePath)

    # Instantiate Feature Engineering Class
    FE = dtfe.FE()

    # Create Calendar Vars
    data = FE.FE1_AutoCalendarVariables(
        data, DateColumnNames='DateTime',
        CalendarVariables=['wday', 'month', 'quarter'], use_saved_args=False)

    # Type conversions for modeling
    data = FE.FE2_ColTypeConversions(
        data, Int2Float=True, Bool2Float=True, RemoveDateCols=True,
        RemoveStrCols=False, SkipCols=None, use_saved_args=False)

    # Drop Text Cols (no word2vec yet)
    data = data[:, [z for z in data.names if z not in ['Comment']]]

    # Create partitioned data sets
    DataFrames = FE.FE2_AutoDataPartition(
        data, DateColumnName=None, PartitionType='random', Ratios=[0.7, 0.2, 0.1],
        ByVariables=None, Sort=False, use_saved_args=False)

    # Prepare modeling data sets
    ModelData = ml.ML0_GetModelData(
        Processing='Ftrl',
        TrainData=DataFrames['TrainData'],
        ValidationData=DataFrames['ValidationData'],
        TestData=DataFrames['TestData'],
        ArgsList=None,
        TargetColumnName='Adrian',
        NumericColumnNames=[z for z in list(data.names) if z not in ['Factor_2', 'Factor_3', 'Adrian']],
        CategoricalColumnNames=['Factor_2', 'Factor_3'],
        TextColumnNames=None, WeightColumnName=None, Threads=-1, InputFrame='datatable')

    # Get args list for algorithm and target type
    ModelArgs = ml.ML0_Parameters(Algorithms='Ftrl', TargetType='MultiClass', TrainMethod='Train')

    # Initialize RetroFit
    x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

    # Train Model
    x.ML1_Single_Train(Algorithm='Ftrl')

    # Score data
    x.ML1_Single_Score(
        DataName=x.DataSetsNames[2], ModelName=x.ModelListNames[0],
        Algorithm='Ftrl', NewData=None)

    # Evaluate scored data
    metrics = x.ML1_Single_Evaluate(
        FitName=x.FitListNames[0],
        TargetType=x.ModelArgs.get('Ftrl')['TargetType'],
        ScoredDataName=x.DataSetsNames[-1], ByVariables=None,
        CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=1))

    # Metrics
    metrics.keys()

    # Scoring data names
    x.DataSetsNames

    # Scoring data
    x.DataSets.get('Scored_test_data_Ftrl_1').names

    # Check ModelArgs Dict
    x.PrintAlgoArgs(Algo='Ftrl')

    # List of model names
    x.ModelListNames

    # List of model fitted names
    x.FitListNames

CatBoost Examples

Regression Training

####################################
# CatBoost Regression
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/RegressionData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create some lags
data = FE.FE0_AutoLags(
    data,
    LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    LagPeriods=[1, 2], ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some rolling stats
data = FE.FE0_AutoRollStats(
    data,
    RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
    MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
    ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some diffs
data = FE.FE0_AutoDiff(
    data,
    DateColumnName='DateTime',
    ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
    DiffNumericVariables='Independent_Variable1',
    DiffDateVariables=None, DiffGroupVariables=None,
    NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='catboost',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=[z for z in list(data.names) if z not in ['Factor_1', 'Factor_2', 'Factor_3', 'Adrian']],
    CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='CatBoost', TargetType='Regression', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('CatBoost').get('AlgoArgs')['iterations'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='CatBoost')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='CatBoost',
    NewData=None)

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('CatBoost')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=None)

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_CatBoost_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='CatBoost')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames
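The example above splits the data with PartitionType='random' and Ratios=[0.7, 0.2, 0.1]. As a rough illustration of what a random three-way split of that shape does, here is a minimal pure-Python sketch. The helper `random_partition` and its `seed` argument are hypothetical and are not part of RetroFit; how FE2_AutoDataPartition samples rows internally is not shown in this README.

```python
import random


def random_partition(rows, ratios, seed=0):
    """Shuffle rows and cut them into consecutive chunks sized by `ratios`.

    Hypothetical helper for illustration only; not RetroFit's implementation.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last chunk takes whatever is left so no row is lost to rounding.
        if i == len(ratios) - 1:
            end = len(shuffled)
        else:
            end = start + round(ratio * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts


train, validation, test = random_partition(range(1000), [0.7, 0.2, 0.1])
print(len(train), len(validation), len(test))  # 700 200 100
```

The three chunks play the same roles as the 'TrainData', 'ValidationData', and 'TestData' entries used in the example.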

Classification Training

####################################
# CatBoost Classification
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/ClassificationData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create some lags
data = FE.FE0_AutoLags(
    data,
    LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    LagPeriods=[1, 2], ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some rolling stats
data = FE.FE0_AutoRollStats(
    data,
    RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
    MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
    ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some diffs
data = FE.FE0_AutoDiff(
    data,
    DateColumnName='DateTime',
    ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
    DiffNumericVariables='Independent_Variable1',
    DiffDateVariables=None, DiffGroupVariables=None,
    NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='catboost',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=[z for z in list(data.names) if z not in ['Factor_1', 'Factor_2', 'Factor_3', 'Adrian']],
    CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='CatBoost', TargetType='Classification', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('CatBoost').get('AlgoArgs')['iterations'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='CatBoost')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='CatBoost',
    NewData=None)

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('CatBoost')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=0))

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_CatBoost_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='CatBoost')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames
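The classification example passes CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=0) to the evaluation step. This README does not show how RetroFit uses those values, so the sketch below is only one plausible reading: each confusion-matrix cell is multiplied by its cost and the products are summed for a given probability threshold. The function `total_cost` and the toy data are hypothetical.

```python
def total_cost(actuals, probabilities, threshold, cost):
    """Sum the per-outcome costs of thresholded predictions.

    Illustrative only; assumes a cost dict keyed like the CostDict above.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, probability in zip(actuals, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1:
            key = "tp" if actual == 1 else "fp"
        else:
            key = "fn" if actual == 1 else "tn"
        counts[key] += 1
    # Weight each confusion-matrix cell by its cost, e.g. counts['fp'] * cost['fpcost'].
    return sum(counts[k] * cost[k + "cost"] for k in counts)


cost = dict(tpcost=0, fpcost=1, fncost=1, tncost=0)
actuals = [1, 0, 1, 0, 1]
probabilities = [0.9, 0.6, 0.4, 0.2, 0.7]

for threshold in (0.3, 0.5, 0.8):
    print(threshold, total_cost(actuals, probabilities, threshold, cost))
```

With false positives and false negatives both costing 1 and correct predictions costing 0, the total is simply the number of misclassified rows at each threshold.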

MultiClass Training

####################################
# CatBoost MultiClass
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/MultiClassData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features for modeling
Features = [z for z in list(data.names) if z not in ['Factor_2', 'Factor_3', 'Adrian']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='catboost',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=['Factor_2', 'Factor_3'],
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='CatBoost', TargetType='MultiClass', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('CatBoost').get('AlgoArgs')['iterations'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='CatBoost')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='CatBoost',
    NewData=None)

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('CatBoost')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=0))

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_CatBoost_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='CatBoost')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames

XGBoost Examples

Regression Training

####################################
# XGBoost Regression
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/RegressionData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create some lags
data = FE.FE0_AutoLags(
    data,
    LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    LagPeriods=[1, 2], ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some rolling stats
data = FE.FE0_AutoRollStats(
    data,
    RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
    MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
    ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some diffs
data = FE.FE0_AutoDiff(
    data,
    DateColumnName='DateTime',
    ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
    DiffNumericVariables='Independent_Variable1',
    DiffDateVariables=None, DiffGroupVariables=None,
    NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

# Dummify
data = FE.FE1_DummyVariables(
    data=data,
    CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
    use_saved_args=False)
data = data[:, [name not in ['Factor_1', 'Factor_2', 'Factor_3'] for name in data.names]]

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features
Features = [z for z in list(data.names) if not z in ['Adrian', 'DateTime', 'Comment', 'Weights']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='xgboost',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=None,
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='XGBoost', TargetType="Regression", TrainMethod="Train")

# Update iterations to run quickly
ModelArgs['XGBoost']['AlgoArgs']['num_boost_round'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='XGBoost')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='XGBoost',
    NewData=None)

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('XGBoost')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=None)

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_XGBoost_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='XGBoost')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames
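Unlike the CatBoost examples, the XGBoost example dummifies the categorical columns and then drops the originals, so the model only sees numeric inputs. The sketch below shows the general idea of dummy (one-hot) encoding in plain Python. The helper `dummy_variables` and the output column naming (`Factor_1_A`) are assumptions for illustration; the column names FE1_DummyVariables actually produces are not shown in this README.

```python
def dummy_variables(rows, column):
    """Replace `column` with one 0/1 indicator column per observed level.

    Hypothetical helper; it drops the source column in the same step, whereas
    the example above drops the original columns in a separate line.
    """
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"Factor_1": "A", "y": 1.0},
    {"Factor_1": "B", "y": 2.0},
    {"Factor_1": "A", "y": 3.0},
]
encoded = dummy_variables(rows, "Factor_1")
print(encoded[0])  # {'y': 1.0, 'Factor_1_A': 1, 'Factor_1_B': 0}
```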

Classification Training

####################################
# XGBoost Classification
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/ClassificationData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create some lags
data = FE.FE0_AutoLags(
    data,
    LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    LagPeriods=[1, 2], ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some rolling stats
data = FE.FE0_AutoRollStats(
    data,
    RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
    MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
    ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some diffs
data = FE.FE0_AutoDiff(
    data,
    DateColumnName='DateTime',
    ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
    DiffNumericVariables='Independent_Variable1',
    DiffDateVariables=None, DiffGroupVariables=None,
    NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

# Dummify
data = FE.FE1_DummyVariables(
    data=data,
    CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
    use_saved_args=False)
data = data[:, [name not in ['Factor_1', 'Factor_2', 'Factor_3'] for name in data.names]]

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features
Features = [z for z in list(data.names) if not z in ['Adrian', 'DateTime', 'Comment', 'Weights']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='xgboost',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=None,
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='XGBoost', TargetType='Classification', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('XGBoost').get('AlgoArgs')['num_boost_round'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='XGBoost')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='XGBoost',
    NewData=None)

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('XGBoost')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=0))

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_XGBoost_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='XGBoost')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames

MultiClass Training

####################################
# XGBoost MultiClass
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/MultiClassData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Dummify
data = FE.FE1_DummyVariables(
    data=data,
    CategoricalColumnNames=['Factor_2', 'Factor_3'],
    use_saved_args=False)
data = data[:, [name not in ['Factor_2', 'Factor_3'] for name in data.names]]

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features
Features = [z for z in list(data.names) if not z in ['Adrian', 'DateTime', 'Comment', 'Weights']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='xgboost',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=None,
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='XGBoost', TargetType='MultiClass', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('XGBoost').get('AlgoArgs')['num_boost_round'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='XGBoost')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='XGBoost',
    NewData=None)

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_XGBoost_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='XGBoost')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames

LightGBM Examples

Regression Training

####################################
# LightGBM Regression
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/RegressionData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create some lags
data = FE.FE0_AutoLags(
    data,
    LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    LagPeriods=[1, 2], ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some rolling stats
data = FE.FE0_AutoRollStats(
    data,
    RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
    MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
    ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some diffs
data = FE.FE0_AutoDiff(
    data,
    DateColumnName='DateTime',
    ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
    DiffNumericVariables='Independent_Variable1',
    DiffDateVariables=None, DiffGroupVariables=None,
    NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

# Dummify
data = FE.FE1_DummyVariables(
    data=data,
    CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
    use_saved_args=False)
data = data[:, [name not in ['Factor_1', 'Factor_2', 'Factor_3'] for name in data.names]]

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features
Features = [z for z in list(data.names) if not z in ['Adrian', 'DateTime', 'Comment', 'Weights']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='lightgbm',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=None,
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='LightGBM', TargetType='Regression', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('LightGBM').get('AlgoArgs')['num_iterations'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='LightGBM')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='LightGBM')

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_LightGBM_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='LightGBM')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames
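The regression examples build lag features with LagPeriods=[1, 2], ByVariables='Factor_1', and ImputeValue=-1. As a conceptual illustration of a grouped lag with imputation, here is a small pure-Python sketch. The helper `grouped_lag`, the output column naming (`x_lag1`), and the toy rows are hypothetical; this is not how FE0_AutoLags is implemented, and its output column names are not shown in this README.

```python
from collections import defaultdict


def grouped_lag(rows, value_col, date_col, group_col, periods, impute=-1):
    """Add lagged copies of `value_col`, computed within each group by date.

    Hypothetical helper for illustration only.
    """
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_col]].append(row)

    lagged = []
    for group_rows in by_group.values():
        ordered = sorted(group_rows, key=lambda r: r[date_col])
        for i, row in enumerate(ordered):
            new_row = dict(row)
            for p in periods:
                # The first p rows of a group have no history, so impute.
                has_history = i >= p
                new_row[f"{value_col}_lag{p}"] = (
                    ordered[i - p][value_col] if has_history else impute
                )
            lagged.append(new_row)
    return lagged


rows = [
    {"Factor_1": "A", "DateTime": 1, "x": 10.0},
    {"Factor_1": "A", "DateTime": 2, "x": 11.0},
    {"Factor_1": "B", "DateTime": 1, "x": 20.0},
    {"Factor_1": "A", "DateTime": 3, "x": 12.0},
]
lagged = grouped_lag(rows, "x", "DateTime", "Factor_1", periods=[1, 2])
for row in lagged:
    print(row)
```

Lags never cross group boundaries here: the single row in group B gets the imputed value for both lag columns even though group A has earlier observations.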

Classification Training

####################################
# LightGBM Classification
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/ClassificationData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Create some lags
data = FE.FE0_AutoLags(
    data,
    LagColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    LagPeriods=[1, 2], ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some rolling stats
data = FE.FE0_AutoRollStats(
    data,
    RollColumnNames=['Independent_Variable1', 'Independent_Variable2'],
    DateColumnName='DateTime', ByVariables='Factor_1',
    MovingAvg_Periods=[1, 2], MovingSD_Periods=[2, 3],
    MovingMin_Periods=[1, 2], MovingMax_Periods=[1, 2],
    ImputeValue=-1, Sort=True, use_saved_args=False)

# Create some diffs
data = FE.FE0_AutoDiff(
    data,
    DateColumnName='DateTime',
    ByVariables=['Factor_1', 'Factor_2', 'Factor_3'],
    DiffNumericVariables='Independent_Variable1',
    DiffDateVariables=None, DiffGroupVariables=None,
    NLag1=0, NLag2=1, Sort=True, use_saved_args=False)

# Dummify
data = FE.FE1_DummyVariables(
    data=data,
    CategoricalColumnNames=['Factor_1', 'Factor_2', 'Factor_3'],
    use_saved_args=False)
data = data[:, [name not in ['Factor_1', 'Factor_2', 'Factor_3'] for name in data.names]]

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features
Features = [z for z in list(data.names) if not z in ['Adrian', 'DateTime', 'Comment', 'Weights']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='lightgbm',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=None,
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='LightGBM', TargetType='Classification', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('LightGBM').get('AlgoArgs')['num_iterations'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='LightGBM')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='LightGBM')

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('LightGBM')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=0))

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_LightGBM_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='LightGBM')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames

MultiClass Training

####################################
# LightGBM MultiClass
####################################

# Setup Environment
import pkg_resources
import timeit
import datatable as dt
import retrofit
from retrofit import DatatableFE as dtfe
from retrofit import MachineLearning as ml

# Load some data
FilePath = pkg_resources.resource_filename('retrofit', 'datasets/MultiClassData.csv')
data = dt.fread(FilePath)

# Instantiate Feature Engineering Class
FE = dtfe.FE()

# Dummify
data = FE.FE1_DummyVariables(
    data=data,
    CategoricalColumnNames=['Factor_2', 'Factor_3'],
    use_saved_args=False)
data = data[:, [name not in ['Factor_2', 'Factor_3'] for name in data.names]]

# Create Calendar Vars
data = FE.FE1_AutoCalendarVariables(
    data,
    DateColumnNames='DateTime',
    CalendarVariables=['wday', 'month', 'quarter'],
    use_saved_args=False)

# Type conversions for modeling
data = FE.FE2_ColTypeConversions(
    data,
    Int2Float=True, Bool2Float=True,
    RemoveDateCols=True, RemoveStrCols=False,
    SkipCols=None, use_saved_args=False)

# Drop Text Cols (no word2vec yet)
data = data[:, [z for z in data.names if z not in ['Comment']]]

# Create partitioned data sets
DataFrames = FE.FE2_AutoDataPartition(
    data,
    DateColumnName=None, PartitionType='random',
    Ratios=[0.7, 0.2, 0.1],
    ByVariables=None, Sort=False, use_saved_args=False)

# Features
Features = [z for z in list(data.names) if not z in ['Adrian', 'DateTime', 'Comment', 'Weights']]

# Prepare modeling data sets
ModelData = ml.ML0_GetModelData(
    Processing='lightgbm',
    TrainData=DataFrames['TrainData'],
    ValidationData=DataFrames['ValidationData'],
    TestData=DataFrames['TestData'],
    ArgsList=None,
    TargetColumnName='Adrian',
    NumericColumnNames=Features,
    CategoricalColumnNames=None,
    TextColumnNames=None, WeightColumnName=None,
    Threads=-1, InputFrame='datatable')

# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='LightGBM', TargetType='MultiClass', TrainMethod='Train')

# Update iterations to run quickly
ModelArgs.get('LightGBM').get('AlgoArgs')['num_iterations'] = 50

# Initialize RetroFit
x = ml.RetroFit(ModelArgs, ModelData, DataFrames)

# Train Model
x.ML1_Single_Train(Algorithm='LightGBM')

# Score data
x.ML1_Single_Score(
    DataName=x.DataSetsNames[2],
    ModelName=x.ModelListNames[0],
    Algorithm='LightGBM')

# Evaluate scored data
metrics = x.ML1_Single_Evaluate(
    FitName=x.FitListNames[0],
    TargetType=x.ModelArgs.get('LightGBM')['TargetType'],
    ScoredDataName=x.DataSetsNames[-1],
    ByVariables=None,
    CostDict=dict(tpcost=0, fpcost=1, fncost=1, tncost=0))

# Metrics
metrics.keys()

# Scoring data names
x.DataSetsNames

# Scoring data
x.DataSets.get('Scored_test_data_LightGBM_1')

# Check ModelArgs Dict
x.PrintAlgoArgs(Algo='LightGBM')

# List of model names
x.ModelListNames

# List of model fitted names
x.FitListNames

Visualization

Code here
