Loan Probability Forecasting

Predict loan approval probability in the banking sector using panel data with 50 individuals tracked over 100 time periods.

Dataset Exploration

The Banking dataset contains financial transaction data, with one record per customer per time period. Features include loan amounts, repayment schedules, credit scores, and customer demographics. The target variable is the loan probability, which the model predicts from historical payment patterns and customer behavior.

Panel data structure: 50 individuals tracked over 100 time periods (5,000 rows in total, 14 columns). The same individual (e.g. "ind1") appears in multiple time periods; this repetition is what makes the data a panel.
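Before fitting, it can help to confirm that a file really has this shape. The sketch below is a minimal check written with pandas; `panel_summary` is a hypothetical helper (not part of DATFID), and the toy frame only stands in for the Banking data.

```python
import pandas as pd

def panel_summary(df, id_col="Individual", time_col="Time"):
    """Return basic shape facts for a long-format panel."""
    periods_per_id = df.groupby(id_col)[time_col].nunique()
    n_periods = int(df[time_col].nunique())
    return {
        "n_individuals": int(periods_per_id.size),
        "n_periods": n_periods,
        # Balanced: every individual is observed in every period.
        "balanced": bool((periods_per_id == n_periods).all()),
        # Each (individual, time) pair should appear only once.
        "duplicate_rows": int(df.duplicated([id_col, time_col]).sum()),
    }

# Toy stand-in for the Banking data: 3 individuals x 4 periods.
toy = pd.DataFrame({
    "Individual": [f"ind{i}" for i in range(1, 4) for _ in range(4)],
    "Time": [40179, 40210, 40238, 40269] * 3,
})
summary = panel_summary(toy)
print(summary)
```

For the Banking file described above, the same call would be expected to report 50 individuals and 100 periods.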

Fit Data (first 5 rows)

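| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto | Loan Prob. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40179 | ind1 | 1287 | 1 | 790 | 5.81 | 3.32 | 1287 | 5.404 | 1 | 0 | 1 | 0 | 0.513 |
| 40210 | ind1 | 1261 | 1 | 800 | 6.05 | 2.42 | 1261 | 5.404 | 1 | 0 | 1 | 0 | 0.463 |
| 40238 | ind1 | 1273 | 1 | 810 | 5.7 | 3.46 | 1273 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40269 | ind1 | 1248 | 1 | 830 | 4.89 | 3.49 | 1248 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40299 | ind1 | 1260 | 1 | 830 | 4.82 | 2.61 | 1260 | 5.404 | 1 | 0 | 1 | 0 | 0.555 |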
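The Time values look like Excel serial day numbers rather than calendar dates. That reading is an assumption based on their magnitude, not something the page states. If it holds, a short pandas sketch converts them for inspection:

```python
import pandas as pd

# Time values from the first rows of the fit and forecast samples.
serials = pd.Series([40179, 40210, 40238, 40269, 40299, 43221])

# Assumption: Excel's 1900 date system, whose day count is anchored at 1899-12-30.
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")
date_strings = dates.dt.strftime("%Y-%m-%d").tolist()
print(date_strings)
```

Under that assumption the fit sample starts on 2010-01-01 and advances one month per row, which is consistent with a monthly panel.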

Forecast Data (first 3 rows)

The forecast file has the same structure but covers future periods and does not include the target variable. DATFID uses this to know which individuals and time points to generate predictions for.

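| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43221 | ind1 | 1830 | 0 | 830 | 5.75 | 2.96 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43252 | ind1 | 1847 | 0 | 800 | 5.52 | 3.34 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43282 | ind1 | 1810 | 0 | 830 | 5.23 | 3.42 | 0 | 5.368 | 2 | 1 | 0 | 1 |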
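Since the forecast file should match the fit file except for the missing target, a quick column comparison can catch mismatches before any request is sent. This is a minimal sketch; `check_forecast_columns` is a hypothetical helper, and the two small frames are made up for illustration.

```python
import pandas as pd

TARGET = "Loan Probability"

def check_forecast_columns(df_fit, df_forecast, target=TARGET):
    """Return (missing, unexpected) forecast columns relative to the fit frame."""
    expected = [c for c in df_fit.columns if c != target]
    missing = [c for c in expected if c not in df_forecast.columns]
    unexpected = [c for c in df_forecast.columns if c not in expected]
    return missing, unexpected

# Made-up frames with a subset of the Banking columns.
fit_toy = pd.DataFrame({
    "Time": [40179], "Individual": ["ind1"],
    "Credit Score": [790], "Loan Probability": [0.513],
})
forecast_toy = pd.DataFrame({
    "Time": [43221], "Individual": ["ind1"], "Credit Score": [830],
})
missing, unexpected = check_forecast_columns(fit_toy, forecast_toy)
print(missing, unexpected)
```

Two empty lists mean the forecast frame has every feature column and does not carry the target.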

Sample datasets can be downloaded from GitHub. The workflow also works with CSV files.

Code Walkthrough

Step 1: Initialize

Python
import pandas as pd
from datfid import DATFIDClient

client = DATFIDClient(token="your_DATFID_token")

Step 2: Fit the Model

Python
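# Load the fitting data; reading .xlsx files with pandas requires the openpyxl package
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking.xlsx"
df = pd.read_excel(url_fit)

# Fit the panel model on the individual and time columns, with Loan Probability as the target
result = client.fit_model(
    df=df,
    id_col="Individual",       # column identifying each individual
    time_col="Time",           # column identifying the time period
    y="Loan Probability",      # target variable
    current_features="all",
    filter_by_significance=True
)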

Step 3: Forecast

Python
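# Load the forecast file (future periods, no target column)
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

# Generate predictions for the individuals and time points in the forecast file
forecast = client.forecast_model(df_forecast=df_forecast)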

Analysis Results (Model Fit)

Formula

Loan Probability ~ α1*Intercept + α2*Loan Type: Auto + α3*Income Level + α4*Log(Loan Amount) + α5*Stable Income + α6*Loan Type: Mortgage + β1*Repayment Amount + β2*Credit Score + β3*Unemployment Rate + β4*Inflation Rate + β5*Repayment Amount X Missed Payments
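To make the formula concrete, the sketch below evaluates it for the first row of the fit data, using the rounded estimates from the Alpha and Beta tables on this page. It assumes the formula is a plain linear combination of features and coefficients; DATFID's internal computation may differ, and rounding in the reported estimates limits the precision.

```python
# Estimates as reported (rounded) in the Alpha and Beta tables.
coef = {
    "Intercept": 0.3165,
    "Loan Type: Auto": 0.0179,
    "Income Level": 0.0294,
    "Log(Loan Amount)": 0.0246,
    "Stable Income": 0.0435,
    "Loan Type: Mortgage": 0.0409,
    "Repayment Amount": -2.46e-05,
    "Credit Score": 9.82e-05,
    "Unemployment Rate": -0.0063,
    "Inflation Rate": -0.0018,
    "Repayment Amount X Missed Payments": -2.63e-05,
}

# First row of the fit data (ind1, Time 40179); Intercept is a constant 1.
row = {
    "Intercept": 1,
    "Loan Type: Auto": 0,
    "Income Level": 1,
    "Log(Loan Amount)": 5.404,
    "Stable Income": 0,
    "Loan Type: Mortgage": 1,
    "Repayment Amount": 1287,
    "Credit Score": 790,
    "Unemployment Rate": 5.81,
    "Inflation Rate": 3.32,
    "Repayment Amount X Missed Payments": 1287,
}

# Assumption: prediction = sum of coefficient * feature value.
predicted = sum(coef[name] * row[name] for name in coef)
observed = 0.513
print(round(predicted, 3), round(observed - predicted, 3))
```

Under these assumptions the prediction comes out near 0.489 against an observed 0.513, a gap of about 0.024.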

Alpha Estimates (Time-Invariant)

These coefficients belong to features that stay constant over time for each individual. They capture the inherent baseline differences between borrowers.

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 0.3165 | 20.79 | Baseline loan probability when all features are zero |
| Income Level | +0.0294 | 29.03 | Each 1-unit higher income level increases loan probability by ~2.9 percentage points |
| Stable Income | +0.0435 | 27.61 | Having a stable income increases loan probability by ~4.4 percentage points |
| Loan Type: Mortgage | +0.0409 | 21.89 | Mortgage loans have ~4.1 pp higher probability than baseline |
| Loan Type: Auto | +0.0179 | 9.31 | Auto loans have ~1.8 pp higher probability than baseline |
| Log(Loan Amount) | +0.0246 | 9.48 | Each 1-unit increase in Log(Loan Amount) is associated with ~2.5 pp higher loan probability |

Beta Estimates (Time-Varying)

These coefficients belong to features that change over time within each individual. They capture the dynamic factors influencing loan probability.

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Rep. Amt X Missed Payments | -2.63e-05 | 44.66 | Most significant predictor. Higher repayments combined with missed payments decrease loan probability |
| Repayment Amount | -2.46e-05 | 5.72 | Higher repayment amounts slightly decrease loan probability |
| Unemployment Rate | -0.0063 | 3.11 | A 1 pp increase in unemployment decreases loan probability by ~0.6 pp |
| Credit Score | +9.82e-05 | 2.14 | Higher credit score slightly increases loan probability |
| Inflation Rate | -0.0018 | 0.86 | Not statistically significant (p=0.39) |
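The link between a t-statistic and its p-value can be reproduced with the standard library. The sketch below uses a standard normal approximation, which is an assumption (the page does not say which distribution DATFID uses), although with about 5,000 observations the difference from a t-distribution should be negligible.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

# T-stats taken from the Beta table.
for name, t in [("Inflation Rate", 0.86), ("Credit Score", 2.14), ("Unemployment Rate", 3.11)]:
    print(f"{name}: t={t}, p={two_sided_p(t):.4f}")
```

For the Inflation Rate row this gives a p-value that rounds to 0.39, matching the table, while Credit Score and Unemployment Rate fall below the usual 0.05 threshold.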

Model Performance

R² Overall: 0.714
R² Between: 0.950
R² Within: 0.446
MAE: 0.0423
MSE: 0.0028
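For readers unfamiliar with the three R² variants, the sketch below computes them on a made-up panel using one common convention: overall on the raw values, between on per-individual means, and within on values demeaned per individual. DATFID's exact definitions are not stated on this page, so `panel_metrics` is a hypothetical helper and the formulas are an assumption.

```python
import pandas as pd

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = ((y - yhat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return float(1 - ss_res / ss_tot)

def panel_metrics(df, id_col, y_col, pred_col):
    err = df[y_col] - df[pred_col]
    grp = df.groupby(id_col)
    # Per-individual means broadcast back to every row, used for demeaning.
    y_mean = grp[y_col].transform("mean")
    p_mean = grp[pred_col].transform("mean")
    return {
        "mae": float(err.abs().mean()),
        "mse": float((err ** 2).mean()),
        "r2_overall": r2(df[y_col], df[pred_col]),
        "r2_between": r2(grp[y_col].mean(), grp[pred_col].mean()),
        "r2_within": r2(df[y_col] - y_mean, df[pred_col] - p_mean),
    }

# Made-up panel: predictions are the true values shifted up by 0.1.
toy = pd.DataFrame({
    "Individual": ["ind1"] * 3 + ["ind2"] * 3,
    "y": [0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
})
toy["pred"] = toy["y"] + 0.1
metrics = panel_metrics(toy, "Individual", "y", "pred")
print(metrics)
```

In this toy case the constant offset lowers the between R² but leaves the within R² at 1, because demeaning removes any per-individual level error. The pattern reported above is the reverse: the model explains differences between borrowers (0.950) much better than changes within a borrower over time (0.446). The reported MSE of 0.0028 corresponds to an RMSE of roughly 0.053.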

Forecast Results

After running the forecast, DATFID produces a CSV file with predicted loan probabilities for each individual and time period specified in the forecast file. The output includes the entity ID, time, and predicted value.

Try it yourself: run this exact analysis in the Free Playground. Select "Banking" from the sample datasets and click Run Analysis.