M5 Department Sales Forecasting

Forecast retail department-level sales based on the M5 forecasting competition data. 7 departments tracked over 1,941 time periods.

Dataset Exploration

The M5 Department dataset is based on the M5 forecasting competition and contains Walmart department-level sales data. Features include CPI, holiday events (sporting, cultural, national, religious), SNAP benefits by state, day of week, and month.

Panel data structure: 7 department aggregations tracked over 1,941 days (13,587 total rows, 13 columns). Each department (e.g. FOODS_1) is tracked daily.
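
The panel shape can be checked before fitting. Below is a minimal pandas sketch, assuming a long-format frame with one row per department and day; the helper name `panel_shape` and the two-department toy frame are illustrative and not part of the DATFID SDK.

```python
import pandas as pd

def panel_shape(df, id_col, time_col):
    """Return (n_series, n_periods, is_balanced) for a long-format panel."""
    # Number of distinct time periods observed for each series
    counts = df.groupby(id_col)[time_col].nunique()
    # Balanced: every series has the same number of periods and no duplicate rows
    balanced = counts.nunique() == 1 and len(df) == counts.sum()
    return len(counts), int(counts.max()), bool(balanced)

# Toy stand-in for the real file: 2 departments x 3 days
toy = pd.DataFrame({
    "agg_id": ["FOODS_1"] * 3 + ["FOODS_2"] * 3,
    "ds": [40572, 40573, 40574] * 2,
})

n_series, n_periods, balanced = panel_shape(toy, "agg_id", "ds")
print(n_series, n_periods, balanced)

# Row count implied by the full dataset: 7 departments x 1,941 days
expected_rows = 7 * 1941
print(expected_rows)
```

Run against the real file, the same helper should report 7 series and 1,941 periods if the panel is balanced as described.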

Fit Data (first 5 rows)

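| agg_id | ds | CPI | Sporting | Cultural | National | Religious | snap_CA | snap_TX | snap_WI | wday | month | Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FOODS_1 | 40572 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2343 |
| FOODS_1 | 40573 | 1.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2216 |
| FOODS_1 | 40574 | 1.053 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 1657 |
| FOODS_1 | 40575 | 1.066 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 4 | 2 | 1508 |
| FOODS_1 | 40576 | 1.075 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 5 | 2 | 1209 |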

Forecast Data (first 3 rows)

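| agg_id | ds | CPI | Sporting | Cultural | National | Religious | snap_CA | snap_TX | snap_WI | wday | month | Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FOODS_1 | 42513 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 5 | 3246 |
| FOODS_1 | 42514 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 5 | 3270 |
| FOODS_1 | 42515 | 1.201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | 3274 |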
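
The `ds` values look like Excel-style serial day numbers, which is an inference from the sample rows rather than something the page states. Under that assumption, this short sketch converts them to calendar dates with pandas. It is for inspection only and is not a step DATFID is known to require.

```python
import pandas as pd

# pandas reproduces Excel's 1900 date system for modern dates with this origin
EXCEL_ORIGIN = "1899-12-30"

# First two fit rows and the first forecast row from the sample tables
ds = pd.Series([40572, 40573, 42513])
dates = pd.to_datetime(ds, unit="D", origin=EXCEL_ORIGIN)

labels = dates.dt.strftime("%Y-%m-%d %A").tolist()
print(labels)
# ['2011-01-29 Saturday', '2011-01-30 Sunday', '2016-05-23 Monday']
```

Read this way, the fit data starts on Saturday 2011-01-29 with `wday` = 1, and the forecast data starts on Monday 2016-05-23 with `wday` = 3. That matches a day-of-week index that starts at 1 on Saturday.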

Download sample datasets from GitHub

Code Walkthrough

Step 1: Initialize

Python
import pandas as pd
from datfid import DATFIDClient

client = DATFIDClient(token="your_DATFID_token")

Step 2: Fit the Model

Python
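# Load the fit dataset (reading .xlsx with pandas requires openpyxl)
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/M5_Department.xlsx"
df = pd.read_excel(url_fit)

# Fit on the panel: one series per department, one row per day
result = client.fit_model(
    df=df,
    id_col="agg_id",    # department identifier, e.g. FOODS_1
    time_col="ds",      # day index
    y="Sales",          # target variable
    current_features="all",
    filter_by_significance=True
)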

Step 3: Forecast

Python
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/M5_Department_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

forecast = client.forecast_model(df_forecast=df_forecast)

Analysis Results (Model Fit)

Formula

Sales ~ α1*Intercept + β1*CPI + β2*Sporting + β3*Cultural + β4*National + β5*Religious + β6*snap_CA + β7*snap_TX + β8*snap_WI + β9*wday + β10*month

Alpha Estimates (Time-Invariant)

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | -24,014.85 | 18.5 | Baseline department-level sales when every driver sits at zero. |

Beta Estimates (Time-Varying)

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| CPI | +25,321.8 | 45.2 | A 1-unit CPI increase lifts sales by ~25,321.8 units. CPI runs from about 1.0 to 1.2 in the sample rows, so a 0.01 increase corresponds to ~253 units. The link is inflation-driven retail pricing flowing through to revenue. |
| Sporting | +51.1 | 0.4 | Sporting-event days are not statistically distinguishable from non-event days in this run. |
| Cultural | -205.3 | 2.3 | Cultural holidays subtract ~205.3 units from sales (stores reduce hours or shoppers stay home). |
| National | -758.6 | 9.7 | National holidays subtract ~758.6 units from sales (closures and reduced traffic). |
| Religious | -97.6 | 1.3 | Religious holidays move sales slightly down, but the effect is not statistically significant. |
| snap_CA | +212.0 | 7.1 | SNAP days in California add ~212.0 units of sales over a typical day. |
| snap_TX | +256.7 | 7.8 | SNAP days in Texas add ~256.7 units of sales over a typical day. |
| snap_WI | +242.7 | 7.4 | SNAP days in Wisconsin add ~242.7 units of sales over a typical day. |
| wday | -242.3 | 38.6 | Each one-step increase in the wday index subtracts ~242.3 units. In the M5 calendar encoding wday 1 is Saturday (the first sample row, wday 1, falls on a Saturday), so the weekend sits at the high-sales end of the index and sales taper through the week. |
| month | -16.4 | 4.5 | Each later month of the year shaves ~16.4 units off sales, a mild seasonal drift baked into the average. |
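
To see how the estimates combine, the sketch below plugs the first fit row into the formula as printed. It reads the intercept as -24,014.85 and treats the model as a plain linear combination. Whether DATFID's fitted values include further department-specific terms is not stated on this page, so this is an illustration of the arithmetic and not a reproduction of the library's output. The helper name `linear_prediction` is illustrative.

```python
# Estimates copied from the Alpha and Beta tables
coef = {
    "Intercept": -24014.85,
    "CPI": 25321.8, "Sporting": 51.1, "Cultural": -205.3,
    "National": -758.6, "Religious": -97.6,
    "snap_CA": 212.0, "snap_TX": 256.7, "snap_WI": 242.7,
    "wday": -242.3, "month": -16.4,
}

# First row of the fit data (FOODS_1, ds = 40572, observed Sales = 2343)
row = {
    "CPI": 1.0, "Sporting": 0, "Cultural": 0, "National": 0, "Religious": 0,
    "snap_CA": 0, "snap_TX": 0, "snap_WI": 0, "wday": 1, "month": 1,
}

def linear_prediction(coef, row):
    """Intercept plus the sum of estimate * feature value."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in row.items())

pred = linear_prediction(coef, row)
print(round(pred, 2))  # -24014.85 + 25321.8 - 242.3 - 16.4 = 1048.25
```

The result, about 1,048 units against an observed 2,343, shows how far a single pooled equation can sit from an individual department's level.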

Model Performance

| Metric | Value |
|---|---|
| R² Overall | 0.0214 |
| R² Between | ≈0 |
| R² Within | 0.230 |
| MAE | 3,821.1 |
| MSE | 2.91e7 |
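
The three R² figures split the fit by where the variation comes from. The sketch below uses one common panel-econometrics convention, squared correlations between actual and predicted values. It is an assumption that DATFID computes its metrics this way. The helper `panel_r2` and the toy numbers are illustrative.

```python
import numpy as np
import pandas as pd

def panel_r2(df, id_col, y_col, pred_col):
    """Overall, between, and within R² as squared correlations."""
    def r2(a, b):
        return float(np.corrcoef(a, b)[0, 1] ** 2)

    # Within: correlation after removing each group's own mean
    means = df.groupby(id_col)[[y_col, pred_col]].transform("mean")
    within = r2(df[y_col] - means[y_col], df[pred_col] - means[pred_col])

    # Between: correlation of the group means only
    group_means = df.groupby(id_col)[[y_col, pred_col]].mean()
    between = r2(group_means[y_col], group_means[pred_col])

    # Overall: correlation on the raw pooled values
    overall = r2(df[y_col], df[pred_col])
    return {"overall": overall, "between": between, "within": within}

# Toy panel: predictions follow day-to-day movement but miss group levels
toy = pd.DataFrame({
    "agg_id": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "y":    [10, 12, 14, 100, 102, 104, 50, 52, 54],
    "pred": [2, 3, 4, 3, 4, 5, 4, 5, 6],
})
scores = panel_r2(toy, "agg_id", "y", "pred")
print(scores)
```

In the toy panel the within score is 1 while the between score is low, the same qualitative pattern as the reported 0.230 within and ≈0 between. Separately, the reported MSE of 2.91e7 implies an RMSE of about 5,394, which is on the same scale as the MAE of 3,821.1.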

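Sales levels differ widely across departments (FOODS vs. HOBBIES vs. HOUSEHOLD), while the listed features (CPI, events, SNAP, day of week, month) appear to describe the calendar day and not the department. Such features cannot separate one department's average from another's, which is consistent with the near-zero between-group R². Adding lagged features or filtering by significance can improve results.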
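
Lagged features can be built by hand in pandas before calling `fit_model`. This is a minimal sketch, assuming the lags are added as ordinary columns; DATFID may offer its own lag options, which are not shown on this page. The helper `add_lags` is illustrative, and the HOBBIES_1 sales values are made up (the FOODS_1 values come from the sample rows).

```python
import pandas as pd

def add_lags(df, id_col, time_col, target, lags=(1, 7)):
    """Add per-series lagged copies of `target` as new columns."""
    out = df.sort_values([id_col, time_col]).copy()
    for k in lags:
        # Shift within each department so lags never cross series boundaries
        out[f"{target}_lag{k}"] = out.groupby(id_col)[target].shift(k)
    return out

toy = pd.DataFrame({
    "agg_id": ["FOODS_1"] * 3 + ["HOBBIES_1"] * 3,
    "ds": [40572, 40573, 40574] * 2,
    "Sales": [2343, 2216, 1657, 500, 480, 510],  # HOBBIES_1 values are invented
})

lagged = add_lags(toy, "agg_id", "ds", "Sales", lags=(1,))
print(lagged)
```

The first row of each department has no lag value (NaN) and would need to be dropped or filled before fitting. The forecast frame needs the same lag columns, built from the last observed days.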

Try it yourself: Select "M5 Department" in the Free Playground.