Loan Probability Forecasting

Predict loan approval probability in the banking sector using panel data with 50 individuals tracked over 100 time periods.

Dataset Exploration

The Banking dataset contains financial transaction data with individual customer records over time. Features include loan amounts, repayment schedules, credit scores, and customer demographics. The target variable is the loan probability, which the model relates to historical payment patterns and customer behavior.

Panel data structure: 50 individuals tracked over 100 time periods (5,000 total rows, 14 columns). The same individual (e.g. "ind1") appears in multiple time periods, which is what makes this panel data.
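A quick way to confirm that a file has this shape is to count individuals and periods and check that every individual appears in every period. The sketch below uses only the standard library and a synthetic list of (Individual, Time) pairs as a stand-in for the real columns; the helper name panel_shape is hypothetical and not part of DATFID.

```python
from collections import Counter

def panel_shape(rows):
    """Return (n_individuals, n_periods, is_balanced) for (individual, time) pairs."""
    pairs = set(rows)  # ignore exact duplicate rows
    periods_per_individual = Counter(ind for ind, _ in pairs)
    n_periods = len({t for _, t in pairs})
    # Balanced panel: every individual is observed in every period.
    balanced = all(c == n_periods for c in periods_per_individual.values())
    return len(periods_per_individual), n_periods, balanced

# Synthetic stand-in for the Individual and Time columns: 50 individuals x 100 periods.
rows = [(f"ind{i}", t) for i in range(1, 51) for t in range(100)]
shape = panel_shape(rows)
print(shape, len(rows))  # (50, 100, True) 5000
```

With a pandas DataFrame, the same pairs can be obtained with zip(df["Individual"], df["Time"]).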

Fit Data (first 5 rows)

Time	Individual	Repayment Amount	Missed Payments	Credit Score	Unemp. Rate	Inflation Rate	Rep.Amt X Missed	Log(Loan Amt)	Income Level	Mortgage	Loan Prob.
40179	ind1	1287	1	790	5.81	3.32	1287	5.404	1	1	0.513
40210	ind1	1261	1	800	6.05	2.42	1261	5.404	1	1	0.463
40238	ind1	1273	1	810	5.7	3.46	1273	5.404	1	1	0.448
40269	ind1	1248	1	830	4.89	3.49	1248	5.404	1	1	0.448
40299	ind1	1260	1	830	4.82	2.61	1260	5.404	1	1	0.555
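The Time values in the preview look like Excel serial day numbers, which is how dates often appear when an .xlsx column loses its date formatting. That reading is an assumption, not something the dataset states. Under that assumption, a minimal conversion looks like this:

```python
from datetime import date, timedelta

# Assumption: Time holds Excel serial day numbers (1900 date system),
# for which day 0 corresponds to 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

# Time values copied from the first five rows of the fit data.
fit_times = [40179, 40210, 40238, 40269, 40299]
fit_dates = [excel_serial_to_date(t) for t in fit_times]
for serial, d in zip(fit_times, fit_dates):
    print(serial, d.isoformat())
```

The five values map to the first day of January through May 2010, which is consistent with monthly observations. In pandas, pd.to_datetime(df["Time"], unit="D", origin="1899-12-30") performs the same conversion on a whole column.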

Forecast Data (first 3 rows)

The forecast file has the same structure but covers future periods and does not include the target variable. DATFID uses this to know which individuals and time points to generate predictions for.

Time	Individual	Repayment Amount	Credit Score	Unemp. Rate	Inflation Rate	Log(Loan Amt)	Income Level	Stable Income	Auto
43221	ind1	1830	830	5.75	2.96	5.368	2	1	1
43252	ind1	1847	800	5.52	3.34	5.368	2	1	1
43282	ind1	1810	830	5.23	3.42	5.368	2	1	1

Download the sample datasets from GitHub. CSV files work as well.

Code Walkthrough

Step 1: Initialize

Python

import pandas as pd
from datfid import DATFIDClient

client = DATFIDClient(token="your_DATFID_token")
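To keep the token out of source files, one option is to read it from an environment variable. The sketch below assumes a variable named DATFID_TOKEN; that name and the load_token helper are chosen here for illustration and are not taken from the DATFID documentation.

```python
import os

def load_token(var_name: str = "DATFID_TOKEN") -> str:
    """Read the API token from an environment variable (hypothetical name)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token

# Usage with the client from Step 1:
# client = DATFIDClient(token=load_token())
```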

Step 2: Fit the Model

Python

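# Load the fit dataset (pandas needs the openpyxl package to read .xlsx files)
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking.xlsx"
df = pd.read_excel(url_fit)

result = client.fit_model(
    df=df,
    id_col="Individual",       # column identifying each individual in the panel
    time_col="Time",           # column holding the time period
    y="Loan Probability",      # target variable
    current_features="all",
    filter_by_significance=True
)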

Step 3: Forecast

Python

# Load the forecast dataset (future periods, no target column)
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

forecast = client.forecast_model(df_forecast=df_forecast)

Analysis Results (Model Fit)

Formula

Loan Probability ~ α1*Intercept + α2*Log(Loan Amount) + α3*Income Level + α4*Stable Income + α5*Loan Type: Mortgage + α6*Loan Type: Auto + β1*Repayment Amount + β2*Credit Score + β3*Unemployment Rate + β4*Inflation Rate + β5*Repayment Amount X Missed Payments
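As a rough check of how the formula is read, the sketch below plugs in the alpha and beta estimates reported in this section together with the first row of the fit data. Stable Income and Loan Type: Auto are not shown in the data preview, so both are set to 0 here as an assumption. This illustrates the arithmetic only and is not DATFID's own prediction routine.

```python
# Coefficient estimates as reported in the alpha and beta tables of this section.
ALPHA = {
    "Intercept": 0.316,
    "Log(Loan Amount)": 0.0246,
    "Income Level": 0.0294,
    "Stable Income": 0.0435,
    "Loan Type: Mortgage": 0.0409,
    "Loan Type: Auto": 0.0179,
}
BETA = {
    "Repayment Amount": -2.5e-5,
    "Credit Score": 9.8e-5,
    "Unemployment Rate": -0.00632,
    "Inflation Rate": -0.00181,
    "Repayment Amount X Missed Payments": -2.6e-5,
}

# First row of the fit data (ind1, Time 40179).
row = {
    "Intercept": 1.0,
    "Log(Loan Amount)": 5.404,
    "Income Level": 1,
    "Stable Income": 0,        # assumption: not shown in the preview
    "Loan Type: Mortgage": 1,
    "Loan Type: Auto": 0,      # assumption: not shown in the preview
    "Repayment Amount": 1287,
    "Credit Score": 790,
    "Unemployment Rate": 5.81,
    "Inflation Rate": 3.32,
    "Repayment Amount X Missed Payments": 1287,
}

def linear_prediction(features):
    """Sum of coefficient * feature value over all terms of the formula."""
    coefficients = {**ALPHA, **BETA}
    return sum(coefficients[name] * features[name] for name in coefficients)

prediction = linear_prediction(row)
print(round(prediction, 3))
```

Under these assumptions the result is about 0.49, compared with an observed value of 0.513 for that row. The rounded coefficients and the two assumed inputs account for some of that gap.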

Alpha Estimates (Time-Invariant)

These coefficients capture characteristics that are constant over time for each individual — the inherent baseline differences between borrowers.

Variable	Estimate	T-stat	Interpretation
Intercept	0.316	20.8	Baseline loan-approval probability before any borrower or macro factor is considered.
Log(Loan Amount)	+0.0246	9.5	Each one-unit increase in log loan amount lifts approval probability by ~2.5 percentage points.
Income Level	+0.0294	29.0	Each step up in income level raises approval probability by ~2.9 percentage points.
Stable Income	+0.0435	27.6	Borrowers with stable income are ~4.4 pp more likely to be approved than the baseline.
Loan Type: Mortgage	+0.0409	21.9	Mortgage applications run ~4.1 pp higher in approval probability than the reference loan type.
Loan Type: Auto	+0.0179	9.3	Auto loans run ~1.8 pp higher in approval probability than the reference loan type.

Beta Estimates (Time-Varying)

These coefficients capture effects that change over time — the dynamic factors influencing loan probability.

Variable	Estimate	T-stat	Interpretation
Repayment Amount	-2.5e-5	5.7	Each extra dollar of monthly repayment trims approval probability by ~2.5e-5 (about 1 pp per $400 of repayment).
Credit Score	+9.8e-5	2.1	Each additional credit-score point lifts approval probability by ~9.8e-5 (about 1 pp per 100 points).
Unemployment Rate	-0.00632	3.1	Each percentage-point rise in unemployment shaves ~0.63 pp off approval probability.
Inflation Rate	-0.00181	0.9	Inflation moves approval probability slightly down (~0.18 pp per pp), but the effect is not statistically significant in this run.
Repayment Amount X Missed Payments	-2.6e-5	44.7	The strongest signal in the panel by t-statistic: each unit of the interaction (repayment dollars multiplied by the number of missed payments) lowers approval probability by ~2.6e-5, so high repayments combined with missed payments are penalized most.
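Small coefficients are easier to compare when expressed as the change in a feature needed to move the approval probability by one percentage point. The sketch below computes that for each beta estimate and flags significance using a cutoff of 1.96. The cutoff is an assumption (a two-sided 5% level under a normal approximation) and may differ from the rule DATFID applies; the t-statistics are treated as magnitudes, as they are listed in the table.

```python
# (estimate, t-statistic) pairs copied from the beta table.
BETA = {
    "Repayment Amount": (-2.5e-5, 5.7),
    "Credit Score": (9.8e-5, 2.1),
    "Unemployment Rate": (-0.00632, 3.1),
    "Inflation Rate": (-0.00181, 0.9),
    "Repayment Amount X Missed Payments": (-2.6e-5, 44.7),
}

CRITICAL_T = 1.96  # assumption: two-sided 5% level, normal approximation

def units_per_pp(estimate: float) -> float:
    """Feature change that shifts the probability by 0.01 (one percentage point)."""
    return 0.01 / abs(estimate)

summary = {
    name: (units_per_pp(estimate), t_stat >= CRITICAL_T)
    for name, (estimate, t_stat) in BETA.items()
}
for name, (units, significant) in summary.items():
    print(f"{name}: {units:.1f} units per pp, significant={significant}")
```

For Repayment Amount this gives roughly 400 dollars per percentage point, and Inflation Rate is the only beta term that falls below the cutoff.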

Model Performance

Metric	Value
R² Overall	0.714
R² Between	0.950
R² Within	0.446
MAE	0.0423
MSE	0.00280
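For reference, MAE and MSE are the mean absolute and mean squared differences between observed and predicted values. The sketch below shows both calculations. The observed values come from the first five fit rows, while the predictions are made-up placeholders used only to exercise the functions. It also derives the RMSE implied by the reported MSE of 0.00280.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Observed Loan Prob. values from the first five fit rows.
actual = [0.513, 0.463, 0.448, 0.448, 0.555]
# Hypothetical predictions, NOT model output; for illustration only.
predicted = [0.50, 0.47, 0.46, 0.45, 0.52]

print(mae(actual, predicted), mse(actual, predicted))

# RMSE implied by the reported MSE, in the same units as the target.
reported_mse = 0.00280
rmse = math.sqrt(reported_mse)
print(round(rmse, 4))  # 0.0529
```

An RMSE of about 0.053 sits close to the reported MAE of 0.0423, which suggests the errors are not dominated by a few large outliers.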

Forecast Results

After running the forecast, DATFID produces a CSV file with predicted loan probabilities for each individual and time period specified in the forecast file. The output includes the entity ID, time, and predicted value.

Try it yourself: Run this exact analysis in the Free Playground — select "Banking" from the sample datasets and click Run Analysis.