Insurance Premium Forecasting
Predict insurance premium prices using panel data with 50 policyholders tracked over 100 time periods.
Dataset Exploration
The Insurance dataset includes policyholder information and claim history over time. Features include claims frequency, claim amounts, fraud detection flags, and risk scores.
Panel data structure: 50 policyholders tracked over 100 time periods (5,000 total rows, 14 columns).
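A quick sanity check before fitting is to confirm that the panel is balanced, meaning every policyholder has exactly one row in every period. The sketch below uses a synthetic stand-in with the same shape rather than the real file. The helper name `is_balanced_panel` is hypothetical, and the column names `Individual` and `Time` are taken from the table headers.

```python
import pandas as pd

def is_balanced_panel(df: pd.DataFrame, id_col: str, time_col: str) -> bool:
    """Return True if every id has exactly one row for every time period."""
    n_ids = df[id_col].nunique()
    n_periods = df[time_col].nunique()
    no_duplicates = not df.duplicated(subset=[id_col, time_col]).any()
    return no_duplicates and len(df) == n_ids * n_periods

# Synthetic stand-in with the same shape as the dataset: 50 ids x 100 periods
panel = pd.DataFrame(
    [(t, i) for i in range(1, 51) for t in range(1, 101)],
    columns=["Time", "Individual"],
)

print(len(panel))                                      # 5000
print(is_balanced_panel(panel, "Individual", "Time"))  # True
```

The same function can be pointed at the loaded fit data to catch missing or duplicated policyholder-period rows.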
Fit Data (first 5 rows)
| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid | Premium Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 7.88 | 0 | 2 | 2 | 2 | 0 | 1 | 0 | 2 | 809 | 1281.76 |
| 2 | 1 | 1 | 7.07 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1310.97 |
| 3 | 1 | 2 | 7.16 | 0 | 2 | 1 | 3 | 0 | 1 | 0 | 2 | 809 | 1283.5 |
| 4 | 1 | 1 | 8.06 | 1 | 2 | 2 | 1 | 1 | 1 | 0 | 2 | 809 | 1295.01 |
| 5 | 1 | 2 | 7.49 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1293.94 |
Forecast Data (first 3 rows)
The forecast file covers future periods without the Premium Price target column.
| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101 | 1 | 1 | 7.27 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 1 | 809 |
| 102 | 1 | 2 | 6.52 | 0 | 3 | 3 | 1 | 0 | 0 | 0 | 1 | 809 |
| 103 | 1 | 2 | 7.54 | 1 | 2 | 1 | 3 | 2 | 0 | 0 | 1 | 809 |
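In the rows shown, the Claims×Fraud column appears to be the product of Claims Filed and Fraud Flag (for example, 2 × 1 = 2 in period 103). A minimal sketch of that consistency check, using only the three forecast rows above. The column labels are the abbreviated table headers and may differ from the names in the actual file.

```python
import pandas as pd

# The three forecast rows shown above, reduced to the columns involved
rows = pd.DataFrame(
    {
        "Time": [101, 102, 103],
        "Claims Filed": [1, 2, 2],
        "Fraud Flag": [0, 0, 1],
        "Claims×Fraud": [0, 0, 2],
    }
)

# Recompute the interaction term and compare it with the stored column
expected = rows["Claims Filed"] * rows["Fraud Flag"]
mismatches = rows[rows["Claims×Fraud"] != expected]

print(expected.tolist())  # [0, 0, 2]
print(len(mismatches))    # 0
```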
Code Walkthrough
Step 1: Initialize
Python
import pandas as pd
from datfid import DATFIDClient

# Authenticate with your DATFID API token (placeholder value shown)
client = DATFIDClient(token="your_DATFID_token")
Step 2: Fit the Model
Python
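# Load the fit data (reading .xlsx files with pandas typically requires openpyxl)
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance.xlsx"
df = pd.read_excel(url_fit)

# Fit the panel model: one row per policyholder per time period
result = client.fit_model(
    df=df,
    id_col="Individual",
    time_col="Time",
    y="Premium Price",
    current_features="all",
    filter_by_significance=True
)
Step 3: Forecast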
Python
# Load the future-period rows (no Premium Price column)
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

# Generate forecasts for the future periods
forecast = client.forecast_model(df_forecast=df_forecast)
Analysis Results (Model Fit)
Formula
Premium Price ~ α1*Intercept + α2*Premium Paid + α3*Policy Type: Auto + α4*Policy Type: Home + α5*Policyholder Risk + β1*Claims Filed + β2*Log Claim Amount + β3*Fraud Detection Flag + β4*Policyholder Claims Frequency + β5*Policyholder Claims Size + β6*Regional Risk + β7*Claims Filed × Fraud Detection Flag
Alpha Estimates (Time-Invariant)
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 1,509.6 | 690.1 | Baseline premium price before any policyholder or claims information is considered (~$1,509.6). |
| Premium Paid | +0.113 | 78.3 | Every dollar of historical premium paid lifts the priced premium by ~$0.11, which is consistent with pricing inertia. |
| Policy Type: Auto | -23.3 | 25.4 | Auto policies are priced ~$23.3 below the reference policy type. |
| Policy Type: Home | -22.0 | 27.3 | Home policies are priced ~$22.0 below the reference policy type. |
| Policyholder Risk | +18.4 | 41.2 | Each additional risk-score point adds ~$18.4 to the premium. |
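Because the formula is additive, the priced difference between two policyholders who differ only in time-invariant traits is the sum of each estimate times the corresponding trait difference. A rough worked example using the rounded estimates from the table. The two profiles are hypothetical and are not rows from the dataset.

```python
# Rounded alpha estimates from the table above (intercept cancels in a difference)
alpha = {
    "Premium Paid": 0.113,
    "Policy Type: Auto": -23.3,
    "Policy Type: Home": -22.0,
    "Policyholder Risk": 18.4,
}

# Hypothetical profiles, chosen only to illustrate the arithmetic
a = {"Premium Paid": 809, "Policy Type: Auto": 1, "Policy Type: Home": 0, "Policyholder Risk": 2}
b = {"Premium Paid": 909, "Policy Type: Auto": 0, "Policy Type: Home": 0, "Policyholder Risk": 1}

# Difference in priced premium (a minus b), holding all claims variables equal
diff = sum(alpha[k] * (a[k] - b[k]) for k in alpha)
print(round(diff, 2))  # -16.2
```

Here the $100 lower premium paid (about -$11.3) and the auto discount (-$23.3) outweigh the extra risk point (+$18.4).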
Beta Estimates (Time-Varying)
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Claims Filed | +20.1 | 40.8 | Each additional claim filed adds ~$20.1 to the premium. |
| Log Claim Amount | +24.6 | 49.8 | A 1-unit increase in log claim amount adds ~$24.6 to the premium. |
| Fraud Detection Flag | +13.1 | 8.9 | A flagged fraud signal pushes the premium up by ~$13.1. |
| Policyholder Claims Frequency | +10.5 | 30.2 | Each step up in claims frequency adds ~$10.5. |
| Policyholder Claims Size | +4.8 | 13.8 | Each step up in average claim size adds ~$4.8. |
| Regional Risk | +5.1 | 14.7 | Each step up in regional risk adds ~$5.1. |
| Claims Filed × Fraud Detection Flag | +1.0 | 1.5 | The interaction between claims filed and fraud flag adds ~$1.0 per combined unit, but is not statistically significant in this run. |
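Two things can be read off this table with a little arithmetic: which estimates clear a conventional significance threshold, and how the fraud effect changes with claims filed through the interaction term. The sketch assumes a two-sided 5% test with the large-sample critical value of 1.96, a choice the source does not confirm. The helper name `fraud_effect` is hypothetical.

```python
# (estimate, t-stat) pairs copied from the beta table above
beta = {
    "Claims Filed": (20.1, 40.8),
    "Log Claim Amount": (24.6, 49.8),
    "Fraud Detection Flag": (13.1, 8.9),
    "Policyholder Claims Frequency": (10.5, 30.2),
    "Policyholder Claims Size": (4.8, 13.8),
    "Regional Risk": (5.1, 14.7),
    "Claims Filed × Fraud Detection Flag": (1.0, 1.5),
}

CRITICAL_T = 1.96  # assumed threshold: two-sided 5% test, normal approximation

not_significant = [name for name, (_, t) in beta.items() if abs(t) < CRITICAL_T]
print(not_significant)  # ['Claims Filed × Fraud Detection Flag']

def fraud_effect(claims_filed: int) -> float:
    """Premium change from a fraud flag, holding claims filed fixed."""
    main = beta["Fraud Detection Flag"][0]
    interaction = beta["Claims Filed × Fraud Detection Flag"][0]
    return main + interaction * claims_filed

print(round(fraud_effect(2), 1))  # 15.1
```

With two claims filed, a fraud flag is associated with roughly $13.1 + 2 × $1.0 = $15.1 of extra premium, although the interaction part of that figure is not reliably different from zero.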
Model Performance
| Metric | Value |
|---|---|
| R² Overall | 0.746 |
| R² Between | 0.857 |
| R² Within | 0.617 |
| MAE | 19.1 |
| MSE | 574.4 |
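MSE is expressed in squared premium units, so its square root (RMSE) is easier to compare with MAE. The sketch below converts the reported MSE of 574.4 and then shows generic MAE and MSE helpers applied to made-up numbers. The helper names and sample values are illustrative only and do not come from the model output.

```python
import math

# RMSE from the reported MSE, in the same units as the premium
reported_mse = 574.4
rmse = math.sqrt(reported_mse)
print(round(rmse, 2))  # 23.97

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up values, only to show how the two metrics are computed
actual = [1280.0, 1310.0, 1290.0]
predicted = [1300.0, 1300.0, 1290.0]
print(mae(actual, predicted))  # 10.0
print(round(mse(actual, predicted), 2))
```

An RMSE of about 24 against an MAE of 19.1 indicates that typical errors are small relative to premiums in the $1,300 range shown in the fit data.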
Try it yourself: Select "Insurance" in the Free Playground.