M5 Department Sales Forecasting

Forecast retail department-level sales based on the M5 forecasting competition data. 7 departments tracked over 1,941 time periods.

Dataset Exploration

The M5 Department dataset is based on the M5 forecasting competition and contains Walmart department-level sales data. Features include CPI, holiday events (sporting, cultural, national, religious), SNAP benefits by state, day of week, and month.

Panel data structure: 7 department aggregations tracked over 1,941 days (13,587 total rows, 13 columns). Each department (e.g. FOODS_1) is tracked daily.
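
The panel dimensions quoted above can be checked with a few lines of plain Python. This is a minimal sketch: `panel_shape` is a hypothetical helper, not part of the datfid package, and it accepts any iterable of (id, time) pairs, for example `zip(df["agg_id"], df["ds"])` once the fit data has been loaded.

```python
from collections import defaultdict


def panel_shape(pairs):
    """Return (n_groups, n_periods, is_balanced) for (id, time) pairs.

    Hypothetical helper for illustration; not part of any library.
    """
    periods_by_id = defaultdict(set)
    n_rows = 0
    for group_id, period in pairs:
        periods_by_id[group_id].add(period)
        n_rows += 1
    all_periods = set().union(*periods_by_id.values())
    # Balanced: every group covers every period, with no duplicate rows.
    is_balanced = (
        all(p == all_periods for p in periods_by_id.values())
        and n_rows == len(periods_by_id) * len(all_periods)
    )
    return len(periods_by_id), len(all_periods), is_balanced


# Synthetic stand-in with the dimensions quoted above: 7 ids x 1,941 days.
demo = [(f"DEPT_{i}", day) for i in range(7) for day in range(1941)]
n_groups, n_periods, balanced = panel_shape(demo)
print(n_groups, n_periods, n_groups * n_periods, balanced)
```

A balanced panel with 7 groups and 1,941 periods has 7 × 1,941 = 13,587 rows, which matches the row count stated for the fit data.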

Fit Data (first 5 rows)

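| agg_id | ds | CPI | Sporting | Cultural | National | Religious | snap_CA | snap_TX | snap_WI | wday | month | Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FOODS_1 | 40572 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2343 |
| FOODS_1 | 40573 | 1.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2216 |
| FOODS_1 | 40574 | 1.053 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 1657 |
| FOODS_1 | 40575 | 1.066 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 4 | 2 | 1508 |
| FOODS_1 | 40576 | 1.075 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 5 | 2 | 1209 |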

Forecast Data (first 3 rows)

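| agg_id | ds | CPI | Sporting | Cultural | National | Religious | snap_CA | snap_TX | snap_WI | wday | month | Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FOODS_1 | 42513 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 5 | 3246 |
| FOODS_1 | 42514 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 5 | 3270 |
| FOODS_1 | 42515 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | 3274 |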
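
The `ds` values in both samples are plain integers rather than formatted dates. The sketch below assumes they are Excel serial day numbers in the 1900 date system; the source does not state this, so treat it as an assumption. `excel_serial_to_date` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Assumption: ds is an Excel serial day number (1900 date system).
# For dates after February 1900, serial N corresponds to 1899-12-30 + N days.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


first_fit_day = excel_serial_to_date(40572)       # first ds in the fit sample
first_forecast_day = excel_serial_to_date(42513)  # first ds in the forecast sample
print(first_fit_day.isoformat(), first_forecast_day.isoformat())
print((first_forecast_day - first_fit_day).days)  # gap between the two samples
```

Under that assumption the fit sample starts on 2011-01-29 and the forecast sample on 2016-05-23, exactly 1,941 days later, which is consistent with the stated length of the fit period. With pandas, the same conversion can be written as `pd.to_datetime(df["ds"], unit="D", origin="1899-12-30")`.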

Download sample datasets from GitHub

Code Walkthrough

Step 1: Initialize

Python
import pandas as pd
from datfid import DATFIDClient

client = DATFIDClient(token="your_DATFID_token")

Step 2: Fit the Model

Python
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/M5_Department.xlsx"
df = pd.read_excel(url_fit)

result = client.fit_model(
    df=df,
    id_col="agg_id",
    time_col="ds",
    y="Sales",
    current_features="all",
    filter_by_significance=True
)

Step 3: Forecast

Python
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/M5_Department_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

forecast = client.forecast_model(df_forecast=df_forecast)

Analysis Results (Model Fit)

Formula

Sales ~ α1*Intercept + β1*CPI + β2*Sporting + β3*Cultural + β4*National + β5*Religious + β6*snap_CA + β7*snap_TX + β8*snap_WI + β9*wday + β10*month
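
To make the formula concrete, the sketch below evaluates it as a plain linear combination, using the estimates reported in the tables of this section and the feature values of the first forecast sample row. It is illustrative arithmetic only: `linear_prediction` is a hypothetical helper, the intercept is read as -24,015, and the result is not guaranteed to match what `forecast_model` returns, since the library may apply additional steps that are not documented here.

```python
def linear_prediction(intercept, betas, features):
    """Evaluate intercept + sum(beta_k * x_k) for one observation."""
    return intercept + sum(betas[name] * features[name] for name in betas)


# Estimates as reported in the tables of this section.
intercept = -24015.0  # assumption: intercept estimate read as -24,015
betas = {
    "CPI": 25321.84, "Sporting": 51.14, "Cultural": -205.29,
    "National": -758.56, "Religious": -97.62,
    "snap_CA": 212.01, "snap_TX": 256.73, "snap_WI": 242.74,
    "wday": -242.33, "month": -16.39,
}

# Feature values from the first forecast sample row (FOODS_1, ds 42513):
# all event and SNAP flags are 0, CPI = 1.201, wday = 3, month = 5.
row = {name: 0 for name in betas}
row.update({"CPI": 1.201, "wday": 3, "month": 5})

print(round(linear_prediction(intercept, betas, row), 2))
```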

Alpha Estimates (Time-Invariant)

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | -24,015 | 518.46 | Baseline department sales level |

Beta Estimates (Time-Varying)

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| CPI | +25,321.84 | 5.18 | A 1-unit CPI increase is associated with ~25,322 more units sold (possibly inflation-linked pricing) |
| wday | -242.33 | 38.64 | Each one-step increase in wday is associated with ~242 fewer units (weekends sell more) |
| National | -758.56 | 9.68 | National holidays are associated with ~759 fewer units (stores may close or reduce hours) |
| snap_TX | +256.73 | 7.79 | SNAP benefit days in Texas are associated with ~257 more units |
| snap_WI | +242.74 | 7.36 | SNAP benefit days in Wisconsin are associated with ~243 more units |
| snap_CA | +212.01 | 7.07 | SNAP benefit days in California are associated with ~212 more units |
| month | -16.39 | 4.47 | Each later month is associated with ~16 fewer units |
| Cultural | -205.29 | 2.31 | Cultural holidays are associated with ~205 fewer units (p=0.021) |
| Religious | -97.62 | 1.29 | Not statistically significant (p=0.197) |
| Sporting | +51.14 | 0.37 | Not statistically significant (p=0.713) |
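
The p-values quoted for Cultural, Religious, and Sporting can be approximately recovered from their t-statistics. The sketch below assumes a two-sided test with a standard normal reference distribution, which is a reasonable approximation with 13,587 rows; the source does not say how the p-values were actually computed, and `two_sided_p_value` is a hypothetical helper name.

```python
import math


def two_sided_p_value(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    # P(|Z| >= |t|) = erfc(|t| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(t_stat) / math.sqrt(2))


for name, t_stat in [("Cultural", 2.31), ("Religious", 1.29), ("Sporting", 0.37)]:
    print(f"{name}: p = {two_sided_p_value(t_stat):.3f}")
```

Because the t-statistics in the table are rounded to two decimals, the recomputed values can differ from the reported ones in the third decimal place.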

Model Performance

| Metric | Value |
|---|---|
| R² Overall | 0.021 |
| R² Between | ≈0 |
| R² Within | 0.230 |
| MAE | 3,821 |
| MSE | 2.91e7 |
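
MSE is expressed in squared sales units, so it is hard to compare with MAE directly. Taking the square root gives an RMSE in the same units as Sales; the short calculation below uses only the two reported metrics.

```python
import math

mae = 3821.0   # reported MAE
mse = 2.91e7   # reported MSE

rmse = math.sqrt(mse)  # same units as Sales
print(f"RMSE = {rmse:,.0f}, RMSE / MAE = {rmse / mae:.2f}")
```

RMSE is never smaller than MAE, and the further the ratio rises above 1, the more the total error is dominated by a subset of large misses rather than spread evenly across observations.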

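Sales levels differ widely across departments (FOODS vs. HOBBIES vs. HOUSEHOLD), which is the likely reason the between-group R² is close to zero. Adding lagged sales features or filtering regressors by significance may improve results; in this fit, Religious and Sporting are not statistically significant.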
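
One way to build lagged sales features with pandas is sketched below. `add_sales_lags` is a hypothetical helper, not part of datfid, and the choice of lags is arbitrary. Whether `fit_model` uses the extra columns when `current_features="all"` is an assumption that should be checked against the library documentation, and the forecast data would need the same columns.

```python
import pandas as pd


def add_sales_lags(df, lags=(1, 7), id_col="agg_id", time_col="ds", y="Sales"):
    """Return a copy of df with per-department lagged copies of y."""
    out = df.sort_values([id_col, time_col]).copy()
    for lag in lags:
        # shift() within each department so lags never cross departments
        out[f"{y}_lag{lag}"] = out.groupby(id_col)[y].shift(lag)
    # The first max(lags) rows of each department have no history; drop them.
    lag_cols = [f"{y}_lag{lag}" for lag in lags]
    return out.dropna(subset=lag_cols).reset_index(drop=True)


# Tiny illustrative frame. FOODS_1 values come from the fit sample above;
# the HOBBIES_1 rows are made up for the demo.
demo = pd.DataFrame({
    "agg_id": ["FOODS_1"] * 3 + ["HOBBIES_1"] * 3,
    "ds": [40572, 40573, 40574] * 2,
    "Sales": [2343, 2216, 1657, 500, 480, 510],
})
lagged = add_sales_lags(demo, lags=(1,))
print(lagged)
```

Grouping by department before shifting matters in panel data: a plain `df["Sales"].shift(1)` would leak the last value of one department into the first row of the next.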

Try it yourself: Select "M5 Department" in the Free Playground.