Time Series Forecasting for the M5 Competition

bits-bytes-nn/mofc-demand-forecast

Goals

  • Compare the accuracy of various time series forecasting algorithms such as Prophet, DeepAR, VAR, DeepVAR, and LightGBM.
  • (Optional) Use tsfresh for automated feature engineering of time series data.

Requirements

  • The dataset can be downloaded from this Kaggle competition.
  • In addition to the Anaconda libraries, you need to install altair, vega_datasets, category_encoders, mxnet, gluonts, kats, lightgbm, hyperopt, and pandarallel.
    • kats requires Python 3.7 or higher.
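Assuming a standard pip-based setup on top of the Anaconda base environment (package names taken verbatim from the list above; kats may pin specific dependency versions), the extra packages can be installed in one step:

```shell
# Extra dependencies on top of the Anaconda libraries.
# Note: kats requires Python 3.7 or higher.
pip install altair vega_datasets category_encoders mxnet gluonts kats lightgbm hyperopt pandarallel
```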

Competition, Datasets and Evaluation

  • The M5 Competition aims to forecast daily sales for the next 28 days based on the previous 1,941 days of sales for 30,490 item–store IDs across Walmart stores.
  • Data includes (i) time series of daily sales quantity by ID, (ii) sales prices, and (iii) holiday and event information.
  • Evaluation is done through the Weighted Root Mean Squared Scaled Error (WRMSSE). A detailed explanation is given in the M5 Participants Guide, and the implementation is at this link.
  • For hyperparameter tuning, 0.1% of IDs were randomly selected and used, and 1% were used to measure test set performance.
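The official metric implementation is linked above; as a minimal NumPy sketch, the RMSSE scales each series' forecast error by the training set's one-step naive-forecast error, and WRMSSE combines series with given weights (revenue shares in M5):

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    """Root Mean Squared Scaled Error for one series: forecast MSE
    scaled by the mean squared one-step naive error on the training set."""
    scale = np.mean(np.diff(y_train) ** 2)
    return np.sqrt(np.mean((y_true - y_pred) ** 2) / scale)

def wrmsse(y_train_list, y_true_list, y_pred_list, weights):
    """Weighted RMSSE across series; `weights` are assumed to sum to 1."""
    scores = [rmsse(tr, t, p)
              for tr, t, p in zip(y_train_list, y_true_list, y_pred_list)]
    return float(np.dot(weights, scores))
```

A perfect forecast scores 0, and errors on high-weight series dominate the aggregate, which is why a handful of top-selling IDs matter disproportionately.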

Algorithms

Kats: Prophet

  • Prophet can incorporate forward-looking related time series into the model, so additional features were created from holiday and event information.
  • Since a Prophet model has to be fitted for each ID, I had to use the apply function of the pandas dataframe, and used pandarallel to maximize parallelization performance.
  • Prophet hyperparameters were tuned through 3-fold CV using the Bayesian optimization module built into the Kats library, with Tweedie applied as the loss function. Below is the hyperparameter tuning result.
| seasonality_prior_scale | changepoint_prior_scale | changepoint_range | n_changepoints | holidays_prior_scale | seasonality_mode |
| --- | --- | --- | --- | --- | --- |
| 0.01 | 0.046 | 0.93 | 5 | 100.00 | multiplicative |
  • The figures below show the actual sales (black dots), the point predictions and confidence intervals (blue lines and bands), and red dotted lines marking the test period.
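The per-ID fitting pattern can be sketched with plain pandas; the naive last-value forecaster below is a toy stand-in for the Kats ProphetModel actually fitted in the repository, and the data is illustrative. With pandarallel, the only change is `pandarallel.initialize()` followed by `parallel_apply` instead of `apply`:

```python
import pandas as pd

def fit_and_forecast(group: pd.DataFrame, horizon: int = 28) -> pd.Series:
    # Toy stand-in for fitting one model per ID: repeat the last
    # observed sales level over the forecast horizon.
    last = group.sort_values("date")["sales"].iloc[-1]
    return pd.Series([last] * horizon)

sales = pd.DataFrame({
    "id": ["A"] * 3 + ["B"] * 3,
    "date": pd.to_datetime(["2016-04-22", "2016-04-23", "2016-04-24"] * 2),
    "sales": [3, 4, 5, 7, 7, 8],
})

# pandarallel version: sales.groupby("id").parallel_apply(fit_and_forecast)
forecasts = sales.groupby("id").apply(fit_and_forecast)
```

Because each ID's model is independent, the problem is embarrassingly parallel, which is what makes pandarallel effective here.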

Forecasting

Kats: VAR

  • Since VAR is a multivariate time series model, the more IDs it fits simultaneously, the better the performance, but the memory requirement grows rapidly, since the number of coefficients is quadratic in the number of series.
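To make the memory point concrete: a VAR(p) over k series has k equations, each with an intercept and k·p lag coefficients, so the coefficient count is k(kp + 1). The lag order of 7 below is an arbitrary illustration, not the repository's setting:

```python
def var_param_count(n_series: int, n_lags: int) -> int:
    """Coefficients in a VAR(p): each of the k equations has an
    intercept plus k * p lag coefficients, i.e. k * (k * p + 1)."""
    return n_series * (n_series * n_lags + 1)

# Doubling the number of series roughly quadruples the model size.
sizes = {k: var_param_count(k, n_lags=7) for k in (10, 100, 1000)}
```

Going from 10 to 1,000 jointly modeled IDs multiplies the parameter count by roughly 10,000, which is why fitting all 30,490 IDs in one VAR is infeasible.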

Forecasting

GluonTS: DeepAR

  • DeepAR can incorporate metadata and forward-looking related time series into the model, so additional features were created from sales prices and holiday and event information. Dynamic categorical variables were quantified through Feature Hashing.
  • Setting the probability distribution of the output is a very important hyperparameter; here it is set to the Negative Binomial distribution.
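Feature hashing maps arbitrary category values into a fixed-width numeric vector without maintaining a fitted vocabulary. This is a minimal deterministic sketch (the repository may well use a library implementation such as scikit-learn's FeatureHasher instead; the event names are made up):

```python
import hashlib

def hash_feature(value: str, n_buckets: int = 8) -> list:
    """One-hot vector over hash buckets; md5 keeps the bucket
    assignment stable across runs (Python's str hash is randomized)."""
    bucket = int(hashlib.md5(value.encode()).hexdigest(), 16) % n_buckets
    vec = [0] * n_buckets
    vec[bucket] = 1
    return vec

# Hypothetical dynamic categorical feature: the event name per day.
events = ["SuperBowl", "Easter", "NoEvent"]
encoded = [hash_feature(e) for e in events]
```

The trade-off is that unrelated categories can collide in a bucket, but the encoding width stays fixed no matter how many distinct event labels appear.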

Forecasting

GluonTS: DeepVAR

  • In the case of DeepVAR, a multivariate model, the choice of output probability distribution is limited (i.e. the Multivariate Gaussian distribution), which leads to a decrease in performance.

Forecasting

LightGBM

  • I used tsfresh to convert the time series into structured data features, which consumes a lot of computational resources even with minimal settings.
  • A LightGBM Tweedie regression model was fitted. Hyperparameters were tuned via 3-fold CV using the Bayesian optimization function of the hyperopt library. The following is the hyperparameter tuning result.
| parameter | value |
| --- | --- |
| boosting | gbdt |
| learning_rate | 0.0177 |
| num_iterations | 352 |
| num_leaves | 21 |
| min_data_in_leaf | 133 |
| min_sum_hessian_in_leaf | 0.0008 |
| bagging_fraction | 0.5297 |
| bagging_freq | 4 |
| feature_fraction | 0.5407 |
| extra_trees | False |
| lambda_l1 | 2.9114 |
| lambda_l2 | 0.2127 |
| path_smooth | 217.3879 |
| max_bin | 1023 |
  • The sales forecast for day D+1 was fed back through feature engineering to predict the sales volume for day D+2, and this recursive process was repeated to measure performance over the 28-day test set.
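The recursive loop can be sketched as follows; the lag-mean `mean_of_lags` function is a hypothetical stand-in for the fitted LightGBM model's predict call, and the 7-lag feature window is an illustrative choice, not the repository's tsfresh feature set:

```python
import numpy as np

def recursive_forecast(history, model_predict, horizon=28, n_lags=7):
    """Recursive multi-step forecasting: each day's prediction is
    appended to the series so it can serve as a lag feature for the
    next day, mirroring the D+1 -> D+2 loop described above."""
    series = list(history)
    preds = []
    for _ in range(horizon):
        lag_features = np.array(series[-n_lags:])  # features for the next day
        y_hat = float(model_predict(lag_features))
        preds.append(y_hat)
        series.append(y_hat)  # feed the forecast back into the history
    return preds

# Hypothetical stand-in for the fitted model's predict():
def mean_of_lags(lags):
    return float(np.mean(lags))

forecast = recursive_forecast([2.0] * 10, mean_of_lags)
```

A known downside of this strategy is error accumulation: a biased D+1 forecast contaminates the lag features for every later day in the horizon.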

Forecasting

Algorithms Performance Summary

| Algorithm | WRMSSE | sMAPE | MAE | MASE | RMSE |
| --- | --- | --- | --- | --- | --- |
| DeepAR | 0.7513 | 1.4200 | 0.8795 | 0.9269 | 1.1614 |
| LightGBM | 1.0701 | 1.4429 | 0.8922 | 0.9394 | 1.1978 |
| Prophet | 1.0820 | 1.4174 | 1.1014 | 1.0269 | 1.4410 |
| VAR | 1.2876 | 2.3818 | 1.5545 | 1.6871 | 1.9502 |
| Naive Method | 1.3430 | 1.5074 | 1.3730 | 1.1077 | 1.7440 |
| Mean Method | 1.5984 | 1.4616 | 1.1997 | 1.0708 | 1.5352 |
| DeepVAR | 4.6933 | 4.6847 | 1.9201 | 1.3683 | 2.3195 |

As a result, DeepAR was finally selected, and its predictions were submitted to Kaggle, achieving a WRMSSE of 0.8112 on the private leaderboard.
