Insurance Premium Forecasting

Predict insurance premium prices using panel data with 50 policyholders tracked over 100 time periods.

Dataset Exploration

The Insurance dataset includes policyholder information and claim history over time. Features include claims frequency, claim amounts, fraud detection flags, and risk scores.

Panel data structure: one row per policyholder per period, with 50 policyholders tracked over 100 time periods (50 × 100 = 5,000 rows, 14 columns).
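The stated shape can be checked before fitting. The sketch below builds a synthetic stand-in with the same layout (placeholder values, not the real Insurance data) and confirms that every policyholder appears exactly once in every period. The helper name `panel_summary` is hypothetical; the column names `Individual` and `Time` follow the tables on this page.

```python
import pandas as pd


def panel_summary(df, id_col="Individual", time_col="Time"):
    """Return basic shape facts for a long-format panel DataFrame."""
    n_ids = df[id_col].nunique()
    n_periods = df[time_col].nunique()
    # Balanced: no duplicate (id, time) pairs and every id/period combination present.
    no_duplicates = not df.duplicated([id_col, time_col]).any()
    balanced = no_duplicates and len(df) == n_ids * n_periods
    return {
        "individuals": n_ids,
        "periods": n_periods,
        "rows": len(df),
        "balanced": balanced,
    }


# Synthetic stand-in: 50 policyholders x 100 periods (placeholder values only).
demo = pd.DataFrame(
    [(i, t) for i in range(1, 51) for t in range(1, 101)],
    columns=["Individual", "Time"],
)
summary = panel_summary(demo)
print(summary)
```

The same function can be pointed at the real fit data once it has been loaded with `pd.read_excel`.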

Fit Data (first 5 rows)

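| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid | Premium Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 7.88 | 0 | 2 | 2 | 2 | 0 | 1 | 0 | 2 | 809 | 1281.76 |
| 2 | 1 | 1 | 7.07 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1310.97 |
| 3 | 1 | 2 | 7.16 | 0 | 2 | 1 | 3 | 0 | 1 | 0 | 2 | 809 | 1283.5 |
| 4 | 1 | 1 | 8.06 | 1 | 2 | 2 | 1 | 1 | 1 | 0 | 2 | 809 | 1295.01 |
| 5 | 1 | 2 | 7.49 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1293.94 |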
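The Claims×Fraud column appears to be the product of Claims Filed and Fraud Flag, which matches the interaction term named in the model formula on this page. A minimal sketch of that check follows, using the five rows listed above. The values are read from the listing, and the product rule is an assumption about how the column was built.

```python
import pandas as pd

# First five fit rows as listed above (only the three columns needed here).
rows = pd.DataFrame({
    "Claims Filed": [1, 1, 2, 1, 2],
    "Fraud Flag": [0, 0, 0, 1, 0],
    "Claims×Fraud": [0, 0, 0, 1, 0],
})

# Recompute the interaction and compare it with the stored column.
recomputed = rows["Claims Filed"] * rows["Fraud Flag"]
matches = bool(recomputed.eq(rows["Claims×Fraud"]).all())
print(matches)
```

If you add your own interaction columns, building them the same way keeps them comparable to the ones shipped in the sample file.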

Forecast Data (first 3 rows)

The forecast file covers future periods without the Premium Price target column.

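| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101 | 1 | 1 | 7.27 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 1 | 809 |
| 102 | 1 | 2 | 6.52 | 0 | 3 | 3 | 1 | 0 | 0 | 0 | 1 | 809 |
| 103 | 1 | 2 | 7.54 | 1 | 2 | 1 | 3 | 2 | 0 | 0 | 1 | 809 |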
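Since the forecast file is meant to carry the same columns as the fit file minus the target, a quick comparison of the two headers can catch a mismatch early. The sketch below uses the abbreviated headers shown on this page. The actual headers in the Excel files may be spelled differently, and the helper name `compare_columns` is hypothetical.

```python
def compare_columns(fit_columns, forecast_columns, target):
    """Return (missing, unexpected) forecast columns relative to the fit columns."""
    expected = [c for c in fit_columns if c != target]
    missing = [c for c in expected if c not in forecast_columns]
    unexpected = [c for c in forecast_columns if c not in expected]
    return missing, unexpected


# Headers as displayed in the two tables above (abbreviated display names).
fit_columns = [
    "Time", "Individual", "Claims Filed", "Log Claim Amt", "Fraud Flag",
    "Claims Freq.", "Claims Size", "Regional Risk", "Claims×Fraud",
    "Auto", "Home", "Policy Risk", "Premium Paid", "Premium Price",
]
forecast_columns = [c for c in fit_columns if c != "Premium Price"]

missing, unexpected = compare_columns(fit_columns, forecast_columns, "Premium Price")
print(missing, unexpected)
```

With real data, `list(df.columns)` and `list(df_forecast.columns)` would take the place of the two hand-written lists.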

Download sample datasets from GitHub

Code Walkthrough

Step 1: Initialize

Python
import pandas as pd
from datfid import DATFIDClient

# Replace the placeholder with your own DATFID token.
client = DATFIDClient(token="your_DATFID_token")
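To keep the token out of source files, one option is to read it from an environment variable. This is a sketch, not part of the DATFID package: the helper `load_token` and the variable name `DATFID_TOKEN` are assumptions introduced here.

```python
import os


def load_token(var_name="DATFID_TOKEN"):
    """Read the API token from an environment variable (variable name is an assumption)."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable before creating the client."
        )
    return token


# Usage with the client from Step 1 (commented out so the snippet runs without the package):
# client = DATFIDClient(token=load_token())
```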

Step 2: Fit the Model

Python
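# Load the fit data (reading .xlsx files with pandas requires the openpyxl package).
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance.xlsx"
df = pd.read_excel(url_fit)

# "Individual" identifies the policyholder, "Time" the period,
# and "Premium Price" is the target to predict.
result = client.fit_model(
    df=df,
    id_col="Individual",
    time_col="Time",
    y="Premium Price",
    current_features="all",
    filter_by_significance=True
)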

Step 3: Forecast

Python
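# Load the forecast data: future periods, without the "Premium Price" column.
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

# Generate forecasts for the future periods.
forecast = client.forecast_model(df_forecast=df_forecast)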

Analysis Results (Model Fit)

Formula

Premium Price ~ α1*Intercept + α2*Policy Type: Home + α3*Premium Paid + α4*Policyholder Risk + α5*Policy Type: Auto + β1*Claims Filed + β2*Log Claim Amount + β3*Fraud Detection Flag + β4*Policyholder Claims Frequency + β5*Policyholder Claims Size + β6*Regional Risk + β7*Claims Filed × Fraud Detection Flag
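Because Claims Filed enters the formula both on its own and through the interaction with the fraud flag, its effect on the fitted premium depends on the flag. Treating the formula above as linear in its terms and holding the other variables fixed, and writing C for Claims Filed and F for the Fraud Detection Flag (shorthand introduced here, not in the original), the change per additional claim works out as:

```latex
\frac{\partial\,\text{Premium Price}}{\partial C} = \beta_1 + \beta_7 F
\quad\Longrightarrow\quad
\begin{cases}
\beta_1 & \text{if } F = 0 \\
\beta_1 + \beta_7 & \text{if } F = 1
\end{cases}
```

With the Beta estimates reported on this page (+20.08 for Claims Filed and +1.02 for the interaction), that is roughly $20.08 per additional claim without a fraud flag and roughly $21.10 with one, although the interaction estimate is not statistically significant.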

Alpha Estimates (Time-Invariant)

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 1,509.62 | 690.12 | Baseline premium price |
| Premium Paid | +0.11 | 378.26 | Past premium paid strongly predicts current pricing |
| Policyholder Risk | +18.39 | 41.24 | Each risk category adds ~$18 to premium |
| Policy Type: Home | -22.00 | 27.32 | Home policies are ~$22 cheaper than baseline |
| Policy Type: Auto | -23.35 | 25.36 | Auto policies are ~$23 cheaper than baseline |

Beta Estimates (Time-Varying)

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Log Claim Amount | +24.58 | 49.78 | Higher claim amounts increase premium by ~$25 per log unit |
| Claims Filed | +20.08 | 40.84 | Each additional claim increases premium by ~$20 |
| Claims Frequency | +10.54 | 30.22 | Higher claim frequency adds ~$11 per unit |
| Regional Risk | +5.13 | 14.65 | Higher-risk regions add ~$5 to premium |
| Claims Size | +4.81 | 13.77 | Larger claim size adds ~$5 to premium |
| Fraud Detection Flag | +13.13 | 8.86 | Fraud flag adds ~$13 to premium |
| Claims × Fraud | +1.02 | 1.45 | Not statistically significant (p=0.15) |
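The p-value quoted for the interaction term can be reproduced approximately from its t-statistic. The sketch below uses a standard normal approximation, which is an assumption: the page does not state the degrees of freedom or how the p-value was computed, though with 5,000 rows the normal and t distributions are very close. The function name `two_sided_p_normal` is introduced here for illustration.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value for a t-statistic under a standard normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi.
    return math.erfc(abs(t_stat) / math.sqrt(2))


p_interaction = two_sided_p_normal(1.45)  # Claims × Fraud
p_fraud = two_sided_p_normal(8.86)        # Fraud Detection Flag
print(round(p_interaction, 2), p_fraud < 0.001)
```

Under this approximation the interaction term lands at about 0.15, in line with the table, while every other listed t-statistic corresponds to a p-value far below conventional thresholds.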

Model Performance

| Metric | Value |
|---|---|
| R² Overall | 0.746 |
| R² Between | 0.857 |
| R² Within | 0.617 |
| MAE | 19.14 |
| MSE | 574.39 |
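MSE is expressed in squared premium units, so it is not directly comparable to MAE. Taking the square root gives the RMSE, which is on the same scale as the premium. A small sketch using only the two figures reported above:

```python
import math

mae = 19.14   # reported on this page
mse = 574.39  # reported on this page

rmse = math.sqrt(mse)  # back on the same scale as the premium
ratio = rmse / mae     # a larger ratio suggests heavier-tailed errors
print(round(rmse, 2), round(ratio, 2))
```

For reference, normally distributed errors give an RMSE/MAE ratio of sqrt(π/2), about 1.25, so the ratio here gives no sign that a few very large errors dominate the fit.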

Try it yourself: Select "Insurance" in the Free Playground.