M5 Department Sales Forecasting
Forecast retail department-level sales based on the M5 forecasting competition data. 7 departments tracked over 1,941 time periods.
Dataset Exploration
The M5 Department dataset is based on the M5 forecasting competition and contains Walmart department-level sales data. Features include CPI, holiday events (sporting, cultural, national, religious), SNAP benefits by state, day of week, and month.
Panel data structure: 7 department aggregations tracked over 1,941 days (13,587 total rows, 13 columns). Each department (e.g. FOODS_1) is tracked daily.
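The panel arithmetic above (7 departments × 1,941 days = 13,587 rows) can be checked programmatically. The following is a minimal sketch on a small synthetic frame, assuming the column names `agg_id` and `ds` shown in the tables of this section. The helper `panel_shape` is hypothetical and not part of DATFID.

```python
import pandas as pd

def panel_shape(df: pd.DataFrame, id_col: str = "agg_id", time_col: str = "ds"):
    """Return (n_ids, n_periods, is_balanced) for a long-format panel."""
    n_ids = df[id_col].nunique()
    n_periods = df[time_col].nunique()
    # Balanced means every id has exactly one row for every period.
    has_duplicates = df.duplicated([id_col, time_col]).any()
    is_balanced = (not has_duplicates) and len(df) == n_ids * n_periods
    return n_ids, n_periods, bool(is_balanced)

# Tiny synthetic panel: 2 departments x 3 days (values are made up).
toy = pd.DataFrame({
    "agg_id": ["FOODS_1"] * 3 + ["HOBBIES_1"] * 3,
    "ds": [40572, 40573, 40574] * 2,
    "Sales": [10, 12, 9, 4, 5, 3],
})
shape = panel_shape(toy)

# Row count expected for the full dataset described above.
expected_rows = 7 * 1941
```

Running the same helper on the full fit data should report 7 ids and 1,941 periods if the panel is balanced as described.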
Fit Data (first 5 rows)
| agg_id | ds | CPI | Sporting | Cultural | National | Religious | snap_CA | snap_TX | snap_WI | wday | month | Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FOODS_1 | 40572 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2343 |
| FOODS_1 | 40573 | 1.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2216 |
| FOODS_1 | 40574 | 1.053 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 1657 |
| FOODS_1 | 40575 | 1.066 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 4 | 2 | 1508 |
| FOODS_1 | 40576 | 1.075 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 5 | 2 | 1209 |
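The `ds` values in the table look like Excel serial day numbers rather than formatted dates. That is an inference from the values, not something the source states. Assuming it holds, a sketch like the following converts them to timestamps with pandas. Depending on how the workbook cells are formatted, `pd.read_excel` may already return dates, in which case no conversion is needed.

```python
import pandas as pd

# First five ds values from the fit data table above.
ds = pd.Series([40572, 40573, 40574, 40575, 40576], name="ds")

# Assumption: ds counts days in Excel's 1900 date system, whose
# conventional pandas origin is 1899-12-30.
dates = pd.to_datetime(ds, unit="D", origin="1899-12-30")

first_day = dates.iloc[0].date().isoformat()
months = dates.dt.month.tolist()
```

Under this assumption the first row falls on 2011-01-29, and the derived months (1, 1, 1, 2, 2) match the `month` column of the table.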
Forecast Data (first 3 rows)
| agg_id | ds | CPI | Sporting | Cultural | National | Religious | snap_CA | snap_TX | snap_WI | wday | month | Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FOODS_1 | 42513 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 5 | 3246 |
| FOODS_1 | 42514 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 5 | 3270 |
| FOODS_1 | 42515 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | 3274 |
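The forecast rows appear to start where the fit data ends. If the fit sample begins at `ds` = 40572 and covers 1,941 consecutive days, its last day is 42512, and the first forecast row is 42513. Below is a minimal sketch of that check, assuming contiguous daily rows. It uses a hypothetical helper `starts_after_fit` (not a DATFID function) and tiny made-up frames.

```python
import pandas as pd

def starts_after_fit(df_fit, df_forecast, id_col="agg_id", time_col="ds"):
    """True if, for every id, the forecast begins one step after the fit ends."""
    last_fit = df_fit.groupby(id_col)[time_col].max()
    first_fc = df_forecast.groupby(id_col)[time_col].min()
    # Align on id; an id missing from either frame yields NaN and fails the check.
    gap = first_fc.sub(last_fit)
    return bool((gap == 1).all())

# Last ds value of a 1,941-day sample starting at 40572.
fit_end = 40572 + 1941 - 1

toy_fit = pd.DataFrame({"agg_id": ["FOODS_1"] * 2, "ds": [fit_end - 1, fit_end]})
toy_fc = pd.DataFrame({"agg_id": ["FOODS_1"] * 3, "ds": [42513, 42514, 42515]})
ok = starts_after_fit(toy_fit, toy_fc)
```

A check like this can catch a forecast file that skips days or overlaps the fit window before it is sent to the model.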
Code Walkthrough
Step 1: Initialize
import pandas as pd
from datfid import DATFIDClient
client = DATFIDClient(token="your_DATFID_token")
Step 2: Fit the Model
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/M5_Department.xlsx"
df = pd.read_excel(url_fit)
result = client.fit_model(
    df=df,
    id_col="agg_id",
    time_col="ds",
    y="Sales",
    current_features="all",
    filter_by_significance=True
)
Step 3: Forecast
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/M5_Department_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)
forecast = client.forecast_model(df_forecast=df_forecast)
Analysis Results (Model Fit)
Formula
Sales ~ α1*Intercept + β1*CPI + β2*Sporting + β3*Cultural + β4*National + β5*Religious + β6*snap_CA + β7*snap_TX + β8*snap_WI + β9*wday + β10*month
Alpha Estimates (Time-Invariant)
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | -24,015 | 518.46 | Baseline department sales level |
Beta Estimates (Time-Varying)
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
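| CPI | +25,321.8 | 45.18 | A 1-unit CPI increase is associated with ~25,322 more units sold (inflation-linked pricing); CPI in the sample rows moves from 1 to about 1.2, so a 0.01 rise corresponds to ~253 units |
| wday | -242.33 | 38.64 | Each one-step increase in the wday code lowers sales by ~242 units (weekends, which carry the lowest codes, sell more) |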
| National | -758.56 | 9.68 | National holidays reduce sales by ~759 units (stores may close or reduce hours) |
| snap_TX | +256.73 | 7.79 | SNAP benefits in Texas increase sales by ~257 units |
| snap_WI | +242.74 | 7.36 | SNAP benefits in Wisconsin increase sales by ~243 units |
| snap_CA | +212.01 | 7.07 | SNAP benefits in California increase sales by ~212 units |
| month | -16.39 | 4.47 | Each one-step increase in the month number lowers sales by ~16 units |
| Cultural | -205.29 | 2.31 | Cultural holidays reduce sales by ~205 units (p=0.021) |
| Religious | -97.62 | 1.29 | Not statistically significant (p=0.197) |
| Sporting | +51.14 | 0.37 | Not statistically significant (p=0.713) |
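To make the formula concrete, the sketch below plugs the reported estimates into the linear expression for the feature values of the first forecast row (`ds` = 42513). It only illustrates how the terms combine. DATFID may estimate department-specific effects that are not shown in these tables, so the result is not expected to reproduce the service's forecast. The function name `linear_predictor` is hypothetical.

```python
# Estimates copied from the Alpha and Beta tables above.
INTERCEPT = -24015.0
BETAS = {
    "CPI": 25321.8, "Sporting": 51.14, "Cultural": -205.29,
    "National": -758.56, "Religious": -97.62, "snap_CA": 212.01,
    "snap_TX": 256.73, "snap_WI": 242.74, "wday": -242.33, "month": -16.39,
}

def linear_predictor(row: dict) -> float:
    """Intercept plus the sum of coefficient * feature value."""
    return INTERCEPT + sum(beta * row[name] for name, beta in BETAS.items())

# Feature values of the first forecast row (ds = 42513).
row = {
    "CPI": 1.201, "Sporting": 0, "Cultural": 0, "National": 0, "Religious": 0,
    "snap_CA": 0, "snap_TX": 0, "snap_WI": 0, "wday": 3, "month": 5,
}
value = linear_predictor(row)

# Per-term contributions make it easy to see what drives the total.
contributions = {name: BETAS[name] * row[name] for name in BETAS}
```

For this row the total is about 5,588, dominated by the intercept and the CPI term. The `Sales` value listed for that FOODS_1 row is 3,246, which suggests the single set of estimates shown here is better read as a summary than as a per-department forecast.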
Model Performance
Sales levels in the M5 dataset vary widely across departments (FOODS vs. HOBBIES vs. HOUSEHOLD), which explains the lower between-group R². Adding lagged features or filtering features by statistical significance (as `filter_by_significance=True` does in Step 2) can improve results.
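One way to add lagged features, as suggested above, is to build them in pandas before fitting. The sketch below shifts `Sales` within each department. The helper `add_lags` and the column name `Sales_lag1` are hypothetical, and DATFID may offer its own lag options, which this example does not rely on.

```python
import pandas as pd

def add_lags(df, col="Sales", lags=(1, 7), id_col="agg_id", time_col="ds"):
    """Add per-id lagged copies of `col`; early rows without history become NaN."""
    out = df.sort_values([id_col, time_col]).copy()
    for lag in lags:
        # shift() inside groupby keeps one department's history
        # from leaking into another's.
        out[f"{col}_lag{lag}"] = out.groupby(id_col)[col].shift(lag)
    return out

toy = pd.DataFrame({
    "agg_id": ["FOODS_1"] * 3 + ["HOBBIES_1"] * 3,
    "ds": [40572, 40573, 40574] * 2,
    "Sales": [2343, 2216, 1657, 100, 110, 120],  # HOBBIES_1 values are made up
})
lagged = add_lags(toy, lags=(1,))
```

The first row of each department has no history, so its lag is missing. Such rows typically need to be dropped or filled before the frame is passed to a model.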
Try it yourself: Select "M5 Department" in the Free Playground.