Insurance Premium Forecasting

Predict insurance premium prices using panel data with 50 policyholders tracked over 100 time periods.

Dataset Exploration

The Insurance dataset includes policyholder information and claim history over time. Features include claims frequency, claim amounts, fraud detection flags, and risk scores.

Panel data structure: 50 policyholders tracked over 100 time periods (5,000 total rows, 14 columns).
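Before fitting, it can help to confirm that a long-format file really has the shape described above. The following is a minimal sketch, not part of the DATFID workflow: `panel_summary` is a hypothetical helper, and the frame is a synthetic stand-in built with the same `Individual` and `Time` column names used later in the walkthrough.

```python
import numpy as np
import pandas as pd

def panel_summary(df, id_col="Individual", time_col="Time"):
    """Return basic shape facts about a long-format panel (hypothetical helper)."""
    periods_per_id = df.groupby(id_col)[time_col].nunique()
    return {
        "n_individuals": int(periods_per_id.size),
        "n_periods": int(df[time_col].nunique()),
        "n_rows": int(len(df)),
        # Balanced means every individual is observed for the same number of periods
        "balanced": bool(periods_per_id.nunique() == 1),
        # Each (individual, period) pair should appear at most once
        "duplicate_keys": int(df.duplicated([id_col, time_col]).sum()),
    }

# Synthetic stand-in: 50 policyholders x 100 periods, sorted by individual then time
demo = pd.DataFrame({
    "Individual": np.repeat(np.arange(1, 51), 100),
    "Time": np.tile(np.arange(1, 101), 50),
})
summary = panel_summary(demo)
print(summary)
```

Running the same helper on the real fit file should report 50 individuals, 100 periods, and 5,000 rows if the panel is balanced as described.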

Fit Data (first 5 rows)

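| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid | Premium Price |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 7.88 | 0 | 2 | 2 | 2 | 0 | 1 | 0 | 2 | 809 | 1281.76 |
| 2 | 1 | 1 | 7.07 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1310.97 |
| 3 | 1 | 2 | 7.16 | 0 | 2 | 1 | 3 | 0 | 1 | 0 | 2 | 809 | 1283.5 |
| 4 | 1 | 1 | 8.06 | 1 | 2 | 2 | 1 | 1 | 1 | 0 | 2 | 809 | 1295.01 |
| 5 | 1 | 2 | 7.49 | 0 | 3 | 1 | 2 | 0 | 1 | 0 | 2 | 809 | 1293.94 |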
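The Claims×Fraud column appears to be the product of Claims Filed and Fraud Flag. The sketch below checks that reading against the five rows listed above. The values are transcribed by hand from this page, and the column labels are the abbreviated headers shown here, which may differ from the names in the actual Excel file.

```python
import pandas as pd

# Values transcribed from the five fit rows shown on this page
rows = pd.DataFrame({
    "Claims Filed": [1, 1, 2, 1, 2],
    "Fraud Flag":   [0, 0, 0, 1, 0],
    "Claims×Fraud": [0, 0, 0, 1, 0],
})

# Recompute the interaction term and compare it with the listed column
recomputed = rows["Claims Filed"] * rows["Fraud Flag"]
matches = bool((recomputed == rows["Claims×Fraud"]).all())
print(matches)  # → True
```

Only the fourth row has a fraud flag set, so it is the only row with a non-zero interaction value.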

Forecast Data (first 3 rows)

The forecast file covers future periods (Time 101 onward in the rows shown) and omits the Premium Price target column.

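| Time | Individual | Claims Filed | Log Claim Amt | Fraud Flag | Claims Freq. | Claims Size | Regional Risk | Claims×Fraud | Auto | Home | Policy Risk | Premium Paid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 101 | 1 | 1 | 7.27 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 1 | 809 |
| 102 | 1 | 2 | 6.52 | 0 | 3 | 3 | 1 | 0 | 0 | 0 | 1 | 809 |
| 103 | 1 | 2 | 7.54 | 1 | 2 | 1 | 3 | 2 | 0 | 0 | 1 | 809 |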
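Since the forecast file is expected to carry the same feature columns as the fit file, minus the target, a quick consistency check can catch mismatches before forecasting. This is a sketch only: `check_forecast_frame` is a hypothetical helper, not part of the DATFID client, and the two small frames are made up for illustration.

```python
import pandas as pd

def check_forecast_frame(df_fit, df_forecast, target="Premium Price", time_col="Time"):
    """Compare forecast columns with fit columns, ignoring the target (hypothetical helper)."""
    expected = [c for c in df_fit.columns if c != target]
    return {
        "missing": [c for c in expected if c not in df_forecast.columns],
        "extra": [c for c in df_forecast.columns if c not in expected],
        # Forecast periods should begin after the last fitted period
        "starts_after_fit": bool(df_forecast[time_col].min() > df_fit[time_col].max()),
    }

# Made-up miniature frames with a subset of the columns
fit_demo = pd.DataFrame({
    "Time": [99, 100],
    "Individual": [1, 1],
    "Claims Filed": [1, 2],
    "Premium Price": [1290.0, 1301.5],
})
forecast_demo = pd.DataFrame({
    "Time": [101, 102],
    "Individual": [1, 1],
    "Claims Filed": [1, 2],
})

report = check_forecast_frame(fit_demo, forecast_demo)
print(report)
```

An empty `missing` list and an empty `extra` list indicate that the two files line up column for column.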

Download sample datasets from GitHub

Code Walkthrough

Step 1: Initialize

Python
import pandas as pd
from datfid import DATFIDClient

client = DATFIDClient(token="your_DATFID_token")

Step 2: Fit the Model

Python
# Reading .xlsx files with pandas requires the openpyxl package.
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance.xlsx"
df = pd.read_excel(url_fit)

result = client.fit_model(
    df=df,
    id_col="Individual",      # policyholder identifier
    time_col="Time",          # time period index
    y="Premium Price",        # target column
    current_features="all",
    filter_by_significance=True
)

Step 3: Forecast

Python
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Insurance_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

forecast = client.forecast_model(df_forecast=df_forecast)

Analysis Results (Model Fit)

Formula

Premium Price ~ α1*Intercept + α2*Premium Paid + α3*Policy Type: Auto + α4*Policy Type: Home + α5*Policyholder Risk + β1*Claims Filed + β2*Log Claim Amount + β3*Fraud Detection Flag + β4*Policyholder Claims Frequency + β5*Policyholder Claims Size + β6*Regional Risk + β7*Claims Filed × Fraud Detection Flag

Alpha Estimates (Time-Invariant)

| Variable | Estimate | T-stat | Interpretation |
| --- | --- | --- | --- |
| Intercept | 1,509.6 | 690.1 | Baseline premium price before any policyholder or claims information is considered (~$1,509.6). |
| Premium Paid | +0.11 | 378.3 | Every dollar of historical premium paid lifts the priced premium by ~$0.11, a sign of pricing inertia. |
| Policy Type: Auto | -23.3 | 25.4 | Auto policies are priced ~$23.3 below the reference policy type. |
| Policy Type: Home | -22.0 | 27.3 | Home policies are priced ~$22.0 below the reference policy type. |
| Policyholder Risk | +18.4 | 41.2 | Each additional risk-score point adds ~$18.4 to the premium. |

Beta Estimates (Time-Varying)

| Variable | Estimate | T-stat | Interpretation |
| --- | --- | --- | --- |
| Claims Filed | +20.1 | 40.8 | Each additional claim filed adds ~$20.1 to the premium. |
| Log Claim Amount | +24.6 | 49.8 | A 1-unit increase in log claim amount adds ~$24.6 to the premium. |
| Fraud Detection Flag | +13.1 | 8.9 | A flagged fraud signal pushes the premium up by ~$13.1. |
| Policyholder Claims Frequency | +10.5 | 30.2 | Each step up in claims frequency adds ~$10.5. |
| Policyholder Claims Size | +4.8 | 13.8 | Each step up in average claim size adds ~$4.8. |
| Regional Risk | +5.1 | 14.7 | Each step up in regional risk adds ~$5.1. |
| Claims Filed × Fraud Detection Flag | +1.0 | 1.5 | The interaction between claims filed and fraud flag adds ~$1.0 per combined unit, but is not statistically significant in this run. |
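One way to read the T-stat column is to compare each absolute value with 1.96, the two-sided 5% critical value under a normal approximation. The sketch below applies that rule to the beta estimates listed above. The 1.96 cut-off is an assumption made here for illustration; the source does not state which threshold `filter_by_significance` uses.

```python
# T-statistics transcribed from the beta estimates listed on this page
BETA_T_STATS = {
    "Claims Filed": 40.8,
    "Log Claim Amount": 49.8,
    "Fraud Detection Flag": 8.9,
    "Policyholder Claims Frequency": 30.2,
    "Policyholder Claims Size": 13.8,
    "Regional Risk": 14.7,
    "Claims Filed × Fraud Detection Flag": 1.5,
}

# Assumed threshold: two-sided 5% level, normal approximation
CRITICAL_VALUE = 1.96

not_significant = [
    name for name, t in BETA_T_STATS.items() if abs(t) < CRITICAL_VALUE
]
print(not_significant)  # → ['Claims Filed × Fraud Detection Flag']
```

Under this rule only the interaction term falls below the threshold, which agrees with the interpretation given for that row.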

Model Performance

| Metric | Value |
| --- | --- |
| R² Overall | 0.746 |
| R² Between | 0.857 |
| R² Within | 0.617 |
| MAE | 19.1 |
| MSE | 574.4 |
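To make these metrics concrete, the sketch below computes MAE, MSE, and three panel R² values on a tiny made-up panel. It uses one common convention, in which overall, between, and within R² are squared correlations between actual and fitted values on the raw data, the per-individual means, and the deviations from those means. How DATFID defines its R² figures is not stated in the source, so `panel_fit_metrics` is a hypothetical helper and its definitions are an assumption.

```python
import numpy as np
import pandas as pd

def panel_fit_metrics(df, id_col, y_col, pred_col):
    """Error and R² summaries for a long-format panel (assumed definitions)."""
    y, pred = df[y_col], df[pred_col]
    err = y - pred
    # Per-individual means, broadcast back to every row
    y_bar = df.groupby(id_col)[y_col].transform("mean")
    pred_bar = df.groupby(id_col)[pred_col].transform("mean")
    # One row of means per individual
    means = df.groupby(id_col)[[y_col, pred_col]].mean()
    return {
        "mae": float(err.abs().mean()),
        "mse": float((err ** 2).mean()),
        "r2_overall": float(np.corrcoef(y, pred)[0, 1] ** 2),
        "r2_between": float(np.corrcoef(means[y_col], means[pred_col])[0, 1] ** 2),
        "r2_within": float(np.corrcoef(y - y_bar, pred - pred_bar)[0, 1] ** 2),
    }

# Tiny made-up panel: 2 individuals x 3 periods
demo = pd.DataFrame({
    "Individual": [1, 1, 1, 2, 2, 2],
    "y":    [10.0, 12.0, 14.0, 20.0, 23.0, 26.0],
    "pred": [11.0, 12.0, 13.0, 21.0, 23.0, 25.0],
})
metrics = panel_fit_metrics(demo, "Individual", "y", "pred")
print(metrics)

# RMSE implied by the MSE reported above, in the same units as the premium
rmse_reported = 574.4 ** 0.5
print(round(rmse_reported, 2))
```

The reported MSE of 574.4 corresponds to an RMSE of roughly 23.97, which sits in the same units as the MAE of 19.1 and is therefore easier to compare with it.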

Try it yourself: Select "Insurance" in the Free Playground.