Insurance Premium Forecasting
Predict insurance premium prices using panel data with 50 policyholders tracked over 100 time periods.
Dataset Exploration
The Insurance dataset includes policyholder information and claim history over time. Features include claims frequency, claim amounts, fraud detection flags, and risk scores.
Panel data structure: 50 policyholders tracked over 100 time periods (5,000 total rows, 14 columns).
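The model expects a long-format panel, with one row per policyholder per period. A quick structural check before fitting can catch missing or duplicated periods. The sketch below is a plain pandas helper, not part of the DATFID API. It assumes the ID and time columns are named `Individual` and `Time`, as in the `fit_model` call in the walkthrough, and it runs on a small toy frame. On the real file you would expect 50 IDs, 100 periods, 5,000 rows, and a balanced panel.

```python
import pandas as pd


def describe_panel(df: pd.DataFrame, id_col: str, time_col: str) -> dict:
    """Summarize a long-format panel and report whether it is balanced."""
    n_ids = df[id_col].nunique()
    n_periods = df[time_col].nunique()
    # A balanced panel has exactly one row per (id, time) pair, so there are
    # no duplicate pairs and the row count equals n_ids * n_periods.
    no_duplicates = not df.duplicated(subset=[id_col, time_col]).any()
    balanced = no_duplicates and len(df) == n_ids * n_periods
    return {
        "n_ids": n_ids,
        "n_periods": n_periods,
        "n_rows": len(df),
        "balanced": balanced,
    }


# Toy panel standing in for Insurance.xlsx: 3 policyholders x 4 periods.
toy = pd.DataFrame(
    [(i, t) for i in range(1, 4) for t in range(1, 5)],
    columns=["Individual", "Time"],
)
summary = describe_panel(toy, id_col="Individual", time_col="Time")
print(summary)
```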
Fit Data (first 5 rows)
| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid | Premium Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 7.88 | 0 | 2 | 2 | 2 | 0 | 1 | 0 | 2 | 809 | 1281.76 |
| 2 | 1 | 1 | 7.07 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1310.97 |
| 3 | 1 | 2 | 7.16 | 0 | 2 | 1 | 3 | 0 | 1 | 0 | 2 | 809 | 1283.5 |
| 4 | 1 | 1 | 8.06 | 1 | 2 | 2 | 1 | 1 | 1 | 0 | 2 | 809 | 1295.01 |
| 5 | 1 | 2 | 7.49 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1293.94 |
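The Claims×Fraud column appears to be the product of Claims Filed and Fraud Flag. Row 4 is the only preview row with the flag set, and there the column equals 1 × 1 = 1. The minimal pandas check below tests that reading on the five preview rows. The column labels come from the preview table and may differ from the exact headers in `Insurance.xlsx`.

```python
import pandas as pd

# Claims Filed, Fraud Flag and Claims×Fraud for the five preview rows above.
# Labels follow the preview table and may not match the exact Excel headers.
preview = pd.DataFrame({
    "Claims Filed": [1, 1, 2, 1, 2],
    "Fraud Flag": [0, 0, 0, 1, 0],
    "Claims×Fraud": [0, 0, 0, 1, 0],
})

# If the column is the interaction term, it equals the row-wise product.
expected = preview["Claims Filed"] * preview["Fraud Flag"]
mismatches = preview[preview["Claims×Fraud"] != expected].index.tolist()
print("rows where the interaction does not match:", mismatches)
```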
Forecast Data (first 3 rows)
The forecast file covers future periods (Time 101 onward) and omits the Premium Price target column.
| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101 | 1 | 1 | 7.27 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 1 | 809 |
| 102 | 1 | 2 | 6.52 | 0 | 3 | 3 | 1 | 0 | 0 | 0 | 1 | 809 |
| 103 | 1 | 2 | 7.54 | 1 | 2 | 1 | 3 | 2 | 0 | 0 | 1 | 809 |
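The forecast file has to line up with the fit file. A useful check is that it carries every feature column, omits the target, and starts after each policyholder's last fitted period. The helper below is a hypothetical pandas sketch, not a DATFID function, and it runs on made-up toy frames. Whether DATFID itself enforces these conditions is an assumption.

```python
import pandas as pd


def check_forecast_frame(fit_df, fc_df, target, id_col, time_col):
    """List problems that would make fc_df unsuitable as a forecast input for fit_df."""
    problems = []
    # Every non-target column used for fitting should also be present for forecasting.
    missing = [c for c in fit_df.columns if c != target and c not in fc_df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if target in fc_df.columns:
        problems.append(f"unexpected target column: {target!r}")
    # Forecast periods should start after each policyholder's last fitted period.
    last_fit = fit_df.groupby(id_col)[time_col].max()
    first_fc = fc_df.groupby(id_col)[time_col].min()
    unseen = [i for i in first_fc.index if i not in last_fit.index]
    if unseen:
        problems.append(f"ids absent from fit data: {unseen}")
    overlap = [i for i in first_fc.index
               if i in last_fit.index and first_fc[i] <= last_fit[i]]
    if overlap:
        problems.append(f"forecast overlaps fit periods for ids: {overlap}")
    return problems


# Toy frames (made-up values) mimicking the layout of the fit and forecast files.
fit = pd.DataFrame({
    "Time": [1, 2, 1, 2],
    "Individual": [1, 1, 2, 2],
    "Claims Filed": [1, 2, 0, 1],
    "Premium Price": [100.0, 101.0, 90.0, 92.0],
})
fc = pd.DataFrame({
    "Time": [3, 4, 3, 4],
    "Individual": [1, 1, 2, 2],
    "Claims Filed": [1, 1, 2, 0],
})
print(check_forecast_frame(fit, fc, "Premium Price", "Individual", "Time"))  # -> []
```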
Code Walkthrough
Step 1: Initialize
```python
import pandas as pd
from datfid import DATFIDClient

# Replace the placeholder with your own DATFID API token
client = DATFIDClient(token="your_DATFID_token")
```
Step 2: Fit the Model
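```python
# Load the fit data, which includes the Premium Price target
# (pandas needs the openpyxl package to read .xlsx files)
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance.xlsx"
df = pd.read_excel(url_fit)

result = client.fit_model(
    df=df,
    id_col="Individual",          # policyholder identifier
    time_col="Time",              # time period index
    y="Premium Price",            # target column
    current_features="all",       # consider all available feature columns
    filter_by_significance=True,  # filter features by statistical significance
)
```
Step 3: Forecast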
```python
# Load the future periods (same features, no Premium Price column) and predict them
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)
forecast = client.forecast_model(df_forecast=df_forecast)
```
Analysis Results (Model Fit)
Formula
Premium Price ~ α1*Intercept + α2*Policy Type: Home + α3*Premium Paid + α4*Policyholder Risk + α5*Policy Type: Auto + β1*Claims Filed + β2*Log Claim Amount + β3*Fraud Detection Flag + β4*Policyholder Claims Frequency + β5*Policyholder Claims Size + β6*Regional Risk + β7*Claims Filed × Fraud Detection Flag
Alpha Estimates (Time-Invariant)
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 1,509.62 | 690.12 | Baseline premium price |
| Premium Paid | +0.113 | 78.26 | Each extra $1 of premium paid adds ~$0.11 to the premium price |
| Policyholder Risk | +18.39 | 41.24 | Each one-level increase in policyholder risk adds ~$18 to the premium |
| Policy Type: Home | -22.00 | -27.32 | Home policies cost ~$22 less than the omitted policy type (neither Auto nor Home) |
| Policy Type: Auto | -23.35 | -25.36 | Auto policies cost ~$23 less than the omitted policy type (neither Auto nor Home) |
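Policy Type is encoded as two dummies (Auto, Home) alongside an intercept. Each dummy's coefficient is therefore relative to the omitted policy type, meaning rows where both dummies are 0, as in the forecast preview. The sketch below compares two hypothetical policyholder profiles using only the alpha estimates. It assumes the formula is linear in these features in their raw units. It reports differences rather than absolute premiums because the page does not say whether DATFID centers or rescales features before fitting.

```python
# Alpha (time-invariant) estimates copied from the table above.
ALPHA = {
    "Policy Type: Home": -22.00,
    "Premium Paid": 0.113,
    "Policyholder Risk": 18.39,
    "Policy Type: Auto": -23.35,
}


def alpha_contrast(profile_a: dict, profile_b: dict) -> float:
    """Difference in the time-invariant part of the prediction, A minus B.

    Assumes a linear formula in raw feature units, so the intercept and all
    time-varying terms cancel when the two profiles share them.
    Features missing from a profile are treated as 0.
    """
    return sum(coef * (profile_a.get(name, 0) - profile_b.get(name, 0))
               for name, coef in ALPHA.items())


# Hypothetical profiles; 809 is the Premium Paid value seen in the preview rows.
auto_risk2 = {"Policy Type: Auto": 1, "Policyholder Risk": 2, "Premium Paid": 809}
home_risk2 = {"Policy Type: Home": 1, "Policyholder Risk": 2, "Premium Paid": 809}
other_risk3 = {"Policyholder Risk": 3, "Premium Paid": 809}

print(round(alpha_contrast(home_risk2, auto_risk2), 2))   # Home vs Auto
print(round(alpha_contrast(other_risk3, auto_risk2), 2))  # omitted type, one risk level higher, vs Auto
```

Under these assumptions, a Home policy prices about $1.35 above an otherwise identical Auto policy. A policy of the omitted type at risk level 3 prices about $41.74 above an Auto policy at risk level 2.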
Beta Estimates (Time-Varying)
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Log Claim Amount | +24.58 | 49.78 | Each one-unit rise in log claim amount (with a natural log, roughly a 2.7× larger claim) adds ~$25 to the premium |
| Claims Filed | +20.08 | 40.84 | Each additional claim filed adds ~$20 to the premium |
| Claims Frequency | +10.54 | 30.22 | Each one-unit increase in claims frequency adds ~$11 to the premium |
| Regional Risk | +5.13 | 14.65 | Each one-level increase in regional risk adds ~$5 to the premium |
| Claims Size | +4.81 | 13.77 | Each one-level increase in claims size adds ~$5 to the premium |
| Fraud Detection Flag | +13.13 | 8.86 | A fraud flag adds ~$13 to the premium (main effect, before the Claims × Fraud term) |
| Claims × Fraud | +1.02 | 1.45 | Not statistically significant (p=0.15) |
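Two of the beta terms are easier to read as worked effects. Because the formula includes the Claims Filed × Fraud Detection Flag interaction, switching the fraud flag from 0 to 1 changes the premium by β3 + β7 · (Claims Filed), not by β3 alone. Because the claim amount enters in logs, multiplying the claim by k adds β2 · ln(k). The sketch below assumes the natural log. It also uses a normal approximation to convert a t-stat into a two-sided p-value, which is close to the t distribution at this sample size.

```python
import math

# Beta (time-varying) estimates copied from the table above.
BETA_LOG_CLAIM = 24.58       # beta_2
BETA_FRAUD = 13.13           # beta_3
BETA_CLAIMS_X_FRAUD = 1.02   # beta_7


def fraud_effect(claims_filed: int) -> float:
    """Premium change when the fraud flag switches 0 -> 1 at a given claim count.

    The interaction term also switches on, so the effect is beta_3 + beta_7 * claims.
    """
    return BETA_FRAUD + BETA_CLAIMS_X_FRAUD * claims_filed


def claim_amount_effect(multiplier: float) -> float:
    """Premium change when the raw claim amount is multiplied by `multiplier`.

    Assumes Log Claim Amount is the natural log of the claim amount.
    """
    return BETA_LOG_CLAIM * math.log(multiplier)


def two_sided_p(t_stat: float) -> float:
    """Two-sided p-value under a normal approximation: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


print(round(fraud_effect(2), 2))          # fraud flag with 2 claims filed
print(round(claim_amount_effect(2), 2))   # doubling the claim amount
print(round(two_sided_p(1.45), 3))        # Claims x Fraud interaction
```

This gives roughly $15.17 for a fraud flag with 2 claims filed and about $17.04 for doubling the claim amount. It also gives p ≈ 0.147 for the interaction, in line with the p = 0.15 reported in the table. The interaction's contribution is only a point estimate, since that term is not statistically significant.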
Model Performance
| Metric | Value |
|---|---|
| R² Overall | 0.746 |
| R² Between | 0.857 |
| R² Within | 0.617 |
| MAE ($) | 19.14 |
| MSE ($²) | 574.39 |
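The page does not define the three R² variants. One common panel convention, similar to what Stata's `xtreg` reports, treats them as squared correlations between actual and fitted values: overall on the raw rows, between on per-policyholder means, and within on deviations from each policyholder's own mean. The sketch below implements that convention plus MAE, MSE, and RMSE. DATFID's definitions may differ. The `Fitted` column is a hypothetical stand-in for fitted values, since this page does not show how to extract them from `result`. For reference, the reported MSE of 574.39 corresponds to an RMSE of about $23.97.

```python
import numpy as np
import pandas as pd


def panel_fit_metrics(df, id_col, y_col, yhat_col):
    """Error metrics plus overall/between/within R^2 for a long-format panel.

    R^2 variants are squared correlations (one common convention, assumed here).
    """
    y = df[y_col].to_numpy(dtype=float)
    yhat = df[yhat_col].to_numpy(dtype=float)
    err = y - yhat
    mse = float(np.mean(err ** 2))

    # Between: per-individual means of actual and fitted values.
    means = df.groupby(id_col)[[y_col, yhat_col]].mean()
    # Within: deviations from each individual's own mean.
    group_means = df.groupby(id_col)[[y_col, yhat_col]].transform("mean")
    demeaned = df[[y_col, yhat_col]] - group_means

    def r2(a, b):
        return float(np.corrcoef(a, b)[0, 1] ** 2)

    return {
        "mae": float(np.mean(np.abs(err))),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2_overall": r2(y, yhat),
        "r2_between": r2(means[y_col], means[yhat_col]),
        "r2_within": r2(demeaned[y_col], demeaned[yhat_col]),
    }


# Toy panel with made-up values; "Fitted" is a hypothetical fitted-value column.
toy = pd.DataFrame({
    "Individual":    [1, 1, 1, 2, 2, 2],
    "Premium Price": [10.0, 12.0, 14.0, 20.0, 22.0, 24.0],
    "Fitted":        [11.0, 12.0, 13.0, 21.0, 22.0, 23.0],
})
metrics = panel_fit_metrics(toy, "Individual", "Premium Price", "Fitted")
print(metrics)
```

On this toy data, MAE and MSE are both 2/3, and the within and between R² are 1. After demeaning, the fitted values are perfectly correlated with the actuals even though they capture only half the swings. A correlation-based R² ignores that kind of scale error, whereas MAE and MSE do not.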
Try it yourself: Select "Insurance" in the Free Playground.