# mofc-demand-forecast
- Compare the accuracy of various time series forecasting algorithms such as Prophet, DeepAR, VAR, DeepVAR, and LightGBM.
- (Optional) Use `tsfresh` for automated feature engineering of time series data.
- The dataset can be downloaded from the M5 Forecasting competition on Kaggle.
- In addition to the Anaconda libraries, you need to install `altair`, `vega_datasets`, `category_encoders`, `mxnet`, `gluonts`, `kats`, `lightgbm`, `hyperopt`, and `pandarallel`. `kats` requires Python 3.7 or higher.
- The M5 Competition aims to forecast daily sales for the next 28 days based on sales over the last 1,941 days for IDs of 30,490 items per Walmart store.
- Data includes (i) time series of daily sales quantity by ID, (ii) sales prices, and (iii) holiday and event information.
- Evaluation is done through the Weighted Root Mean Squared Scaled Error (WRMSSE). A detailed explanation is given in the M5 Participants Guide, and the implementation is at this link.
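The linked M5 implementation handles the competition's full weighting scheme; the metric itself can be sketched minimally. This is an illustrative version assuming each series comes with a precomputed weight (the weights sum to 1), not the project's exact code:

```python
import numpy as np

def rmsse(train, actual, forecast):
    # scale: mean squared one-step naive-forecast error on the training series
    scale = np.mean(np.diff(train) ** 2)
    return np.sqrt(np.mean((actual - forecast) ** 2) / scale)

def wrmsse(series, weights):
    # series: iterable of (train, actual, forecast) arrays; weights sum to 1
    return sum(w * rmsse(tr, ac, fc) for (tr, ac, fc), w in zip(series, weights))
```

A forecast that matches the actuals exactly scores 0; a forecast whose error equals the naive method's average training error scores 1.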
- For hyperparameter tuning, 0.1% of IDs were randomly selected and used, and 1% were used to measure test set performance.
- Prophet can incorporate forward-looking related time series into the model, so additional features were created with holiday and event information.
- Since a Prophet model has to be fitted separately for each ID, running the fits through the `apply` function of a pandas DataFrame is slow, so `pandarallel` was used instead to maximize parallelization.
- Prophet hyperparameters were tuned through 3-fold CV using the Bayesian optimization module built into the `Kats` library, with Tweedie applied as the loss function. Below is the hyperparameter tuning result.
seasonality_prior_scale | changepoint_prior_scale | changepoint_range | n_changepoints | holidays_prior_scale | seasonality_mode |
---|---|---|---|---|---|
0.01 | 0.046 | 0.93 | 5 | 100.00 | multiplicative |
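The per-ID fitting pattern described above can be sketched as follows. Here `fit_forecast` is a hypothetical placeholder (a 7-day mean) standing in for the actual Prophet fit, and the `pandarallel` usage is shown in comments since it requires the library to be installed:

```python
import pandas as pd

def fit_forecast(group: pd.DataFrame) -> float:
    # placeholder for a per-ID model fit; in the project this is where
    # a Prophet model would be fitted on the group's sales history
    return group["y"].tail(7).mean()

df = pd.DataFrame({
    "id": ["a"] * 10 + ["b"] * 10,
    "y": list(range(20)),
})

# serial baseline with plain pandas:
forecasts = df.groupby("id").apply(fit_forecast)

# with pandarallel, initialize once and swap apply for parallel_apply:
#   from pandarallel import pandarallel
#   pandarallel.initialize()
#   forecasts = df.groupby("id").parallel_apply(fit_forecast)
```

Because each ID's fit is independent, this is an embarrassingly parallel workload, which is why swapping `apply` for `parallel_apply` scales well with core count.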
- The figures below show the actual sales (black dots), the point predictions with confidence intervals (blue lines and bands), and red dotted lines marking the test period.
- Since VAR is a multivariate time series model, the more IDs it fits simultaneously, the better the performance; however, the memory requirement grows quadratically with the number of series, since each lag contributes a full coefficient matrix over all series.
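The memory pressure is easy to see by counting coefficients: a VAR(p) over k series estimates p matrices of size k×k plus k intercepts. A tiny sketch (the lag order of 7 in the usage lines is an arbitrary assumption for illustration):

```python
def var_param_count(k: int, p: int) -> int:
    # p lag-coefficient matrices, each k x k, plus k intercept terms
    return p * k * k + k

# parameter count grows quadratically in the number of series fitted jointly
small = var_param_count(10, 7)    # 710
large = var_param_count(100, 7)   # 70,100
```

Going from 10 to 100 jointly-fitted IDs multiplies the parameter count by roughly 100, not 10, which is what limits how many series a single VAR can hold in memory.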
- DeepAR can incorporate metadata and forward-looking related time series into the model, so additional features were created from sales prices and from holiday and event information. Dynamic categorical variables were encoded via feature hashing.
- Choosing the probability distribution of the output is a very important hyperparameter; here it was set to the Negative Binomial distribution, which suits non-negative, overdispersed count data such as daily unit sales.
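The README does not show the exact encoding code, so here is a minimal sketch of the feature-hashing idea using a stdlib hash; the bucket count of 16 and the event names are arbitrary assumptions for illustration:

```python
import hashlib

def hash_bucket(value: str, n_buckets: int = 16) -> int:
    # deterministically map a categorical value (e.g. an event name)
    # to one of n_buckets integer buckets, with no lookup table to store
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

# e.g. encode a day's event as a small integer feature for the model
event_feature = hash_bucket("SuperBowl")
```

The appeal over one-hot encoding is that the feature dimension stays fixed at `n_buckets` no matter how many distinct event names appear, at the cost of occasional hash collisions.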
- In the case of DeepVAR, a multivariate model, the choice of output probability distribution is limited (i.e., the Multivariate Gaussian distribution), which leads to a decrease in performance.
- I used `tsfresh` to convert the time series into structured data features, which consumes a lot of computational resources even with minimal settings.
- A LightGBM Tweedie regression model was fitted. Hyperparameters were tuned via 3-fold CV using the Bayesian optimization function of the `hyperopt` library. The following is the hyperparameter tuning result.
boosting | learning_rate | num_iterations | num_leaves | min_data_in_leaf | min_sum_hessian_in_leaf | bagging_fraction | bagging_freq | feature_fraction | extra_trees | lambda_l1 | lambda_l2 | path_smooth | max_bin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gbdt | 0.01773 | 522 | 11 | 33 | 0.0008 | 0.5297 | 4 | 0.5407 | False | 2.9114 | 0.2127 | 217.3879 | 1023 |
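The tuned values above can be transcribed into LightGBM's native parameter dictionary. The training call in the comments is a sketch that assumes `lightgbm` is installed and that `X_train`/`y_train` have already been built from the `tsfresh` features:

```python
# hyperparameters from the tuning table above, in LightGBM's parameter names
params = {
    "objective": "tweedie",
    "boosting": "gbdt",
    "learning_rate": 0.01773,
    "num_iterations": 522,
    "num_leaves": 11,
    "min_data_in_leaf": 33,
    "min_sum_hessian_in_leaf": 0.0008,
    "bagging_fraction": 0.5297,
    "bagging_freq": 4,
    "feature_fraction": 0.5407,
    "extra_trees": False,
    "lambda_l1": 2.9114,
    "lambda_l2": 0.2127,
    "path_smooth": 217.3879,
    "max_bin": 1023,
}

# usage sketch (assuming lightgbm is installed):
#   import lightgbm as lgb
#   model = lgb.train(params, lgb.Dataset(X_train, y_train))
```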
- The forecast for day D+1 was fed back through feature engineering to predict sales for day D+2, and this recursive process was repeated to measure performance over the 28-day test set.
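The recursive scheme above can be sketched generically. Here `predict_one` is a hypothetical stand-in for the feature-engineering-plus-model step; the key point is that each prediction is appended to the history and reused as input for the next day:

```python
def recursive_forecast(history, predict_one, horizon=28):
    # predict_one maps the history so far (real sales plus earlier
    # predictions) to the next day's forecast
    history = list(history)
    predictions = []
    for _ in range(horizon):
        y_hat = predict_one(history)
        predictions.append(y_hat)   # collect the forecast for this day
        history.append(y_hat)       # feed it back as input for the next day
    return predictions
```

A consequence of this design is that errors compound: a biased day-D+1 forecast contaminates the features used for day D+2, so later horizons are typically less accurate than early ones.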
Algorithm | WRMSSE | sMAPE | MAE | MASE | RMSE |
---|---|---|---|---|---|
DeepAR | 0.7513 | 1.4200 | 0.8795 | 0.9269 | 1.1614 |
LightGBM | 1.0701 | 1.4429 | 0.8922 | 0.9394 | 1.1978 |
Prophet | 1.0820 | 1.4174 | 1.1014 | 1.0269 | 1.4410 |
VAR | 1.2876 | 2.3818 | 1.5545 | 1.6871 | 1.9502 |
Naive Method | 1.3430 | 1.5074 | 1.3730 | 1.1077 | 1.7440 |
Mean Method | 1.5984 | 1.4616 | 1.1997 | 1.0708 | 1.5352 |
DeepVAR | 4.6933 | 4.6847 | 1.9201 | 1.3683 | 2.3195 |
As a result, DeepAR was finally selected, and its predictions were submitted to Kaggle, achieving a WRMSSE of 0.8112 on the private leaderboard.
- Taylor SJ, Letham B. 2017. Forecasting at scale.PeerJ Preprints 5:e3190v2
- Prophet: Forecasting at Scale
- Stock, James H., and Mark W. Watson. 2001. Vector Autoregressions. Journal of Economic Perspectives, 15 (4): 101-115.
- David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski. 2020. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36 (3): 1181-1191.
- David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. In Advances in Neural Information Processing Systems. 6827-6837.
- Kats - One Stop Shop for Time Series Analysis in Python
- GluonTS - Probabilistic Time Series Modeling
About
Time Series Forecasting for the M5 Competition