Loan Probability Forecasting

Predict loan approval probability in the banking sector using panel data with 50 individuals tracked over 100 time periods.

Dataset Exploration

The Banking dataset contains financial data with individual customer records over time. Features include loan amounts, repayment schedules, credit scores, and customer demographics. The target variable is the loan probability, which the model predicts from historical payment patterns and customer behavior.

Panel data structure: 50 individuals tracked over 100 time periods (5,000 total rows, 14 columns). The same individual (e.g. "ind1") appears across multiple time periods, which is what makes this panel data.
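The structure described above can be checked before fitting. The following is a minimal sketch with plain pandas on a toy frame; the helper name describe_panel and the toy values for "ind2" are illustrative assumptions, not part of DATFID or the Banking dataset.

```python
import pandas as pd

def describe_panel(df, id_col, time_col):
    """Summarize a long-format panel: one row per (individual, period)."""
    periods_per_id = df.groupby(id_col)[time_col].nunique()
    n_periods = df[time_col].nunique()
    return {
        "n_individuals": int(periods_per_id.size),
        "n_periods": int(n_periods),
        # Balanced: every individual is observed in every period.
        "balanced": bool((periods_per_id == n_periods).all()),
        # Each (individual, period) pair should appear only once.
        "has_duplicates": bool(df.duplicated([id_col, time_col]).any()),
    }

# Toy panel: 2 individuals x 3 periods (values are made up for illustration).
toy = pd.DataFrame({
    "Individual": ["ind1", "ind1", "ind1", "ind2", "ind2", "ind2"],
    "Time": [40179, 40210, 40238, 40179, 40210, 40238],
    "Credit Score": [790, 800, 810, 700, 705, 710],
})

summary = describe_panel(toy, "Individual", "Time")
print(summary)
```

On the full Banking data, the same check would be expected to report 50 individuals and 100 periods if the panel is balanced as described.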

Fit Data (first 5 rows)

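| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto | Loan Prob. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40179 | ind1 | 1287 | 1 | 790 | 5.81 | 3.32 | 1287 | 5.404 | 1 | 0 | 1 | 0 | 0.513 |
| 40210 | ind1 | 1261 | 1 | 800 | 6.05 | 2.42 | 1261 | 5.404 | 1 | 0 | 1 | 0 | 0.463 |
| 40238 | ind1 | 1273 | 1 | 810 | 5.7 | 3.46 | 1273 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40269 | ind1 | 1248 | 1 | 830 | 4.89 | 3.49 | 1248 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40299 | ind1 | 1260 | 1 | 830 | 4.82 | 2.61 | 1260 | 5.404 | 1 | 0 | 1 | 0 | 0.555 |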
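The Time values in these rows (40179, 40210, ...) look like Excel serial dates. That is an assumption based on the numbers, not something the dataset description states. If the column loads as plain integers, a sketch like the following converts them to calendar dates with pandas, using the standard origin for Excel's 1900 date system.

```python
import pandas as pd

# Time values from the first five fit rows, assumed to be Excel serial dates.
serials = pd.Series([40179, 40210, 40238, 40269, 40299])

# Excel's 1900 date system counts days from 1899-12-30.
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")

print(dates.dt.strftime("%Y-%m-%d").tolist())
# ['2010-01-01', '2010-02-01', '2010-03-01', '2010-04-01', '2010-05-01']
```

Under this reading the panel is monthly, starting in January 2010.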

Forecast Data (first 3 rows)

The forecast file has the same structure but covers future periods and does not include the target variable. DATFID uses this to know which individuals and time points to generate predictions for.

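| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43221 | ind1 | 1830 | 0 | 830 | 5.75 | 2.96 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43252 | ind1 | 1847 | 0 | 800 | 5.52 | 3.34 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43282 | ind1 | 1810 | 0 | 830 | 5.23 | 3.42 | 0 | 5.368 | 2 | 1 | 0 | 1 |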
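Because the forecast file must mirror the fit file minus the target, a quick column comparison can catch mismatches before calling the service. This is a minimal pandas sketch; check_forecast_columns is a hypothetical helper, and the two tiny frames stand in for the real files.

```python
import pandas as pd

def check_forecast_columns(df_fit, df_forecast, target):
    """Return (missing, extra) forecast columns relative to the fit columns minus the target."""
    expected = [c for c in df_fit.columns if c != target]
    missing = [c for c in expected if c not in df_forecast.columns]
    extra = [c for c in df_forecast.columns if c not in expected]
    return missing, extra

# Stand-in frames with a subset of the Banking columns.
fit = pd.DataFrame({
    "Time": [40179], "Individual": ["ind1"],
    "Credit Score": [790], "Loan Probability": [0.513],
})
forecast = pd.DataFrame({
    "Time": [43221], "Individual": ["ind1"], "Credit Score": [830],
})

missing, extra = check_forecast_columns(fit, forecast, target="Loan Probability")
print("missing:", missing, "extra:", extra)  # both lists are empty here
```

A non-empty "extra" list that contains the target column would indicate that the target was left in the forecast file by mistake.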

Sample datasets can be downloaded from GitHub. CSV files are also supported.

Code Walkthrough

Step 1: Initialize

Python
import pandas as pd
from datfid import DATFIDClient

client = DATFIDClient(token="your_DATFID_token")

Step 2: Fit the Model

Python
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking.xlsx"
df = pd.read_excel(url_fit)

result = client.fit_model(
    df=df,
    id_col="Individual",
    time_col="Time",
    y="Loan Probability",
    current_features="all",
    filter_by_significance=True
)

Step 3: Forecast

Python
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)

forecast = client.forecast_model(df_forecast=df_forecast)

Analysis Results (Model Fit)

Formula

Loan Probability ~ α1*Intercept + α2*Log(Loan Amount) + α3*Income Level + α4*Stable Income + α5*Loan Type: Mortgage + α6*Loan Type: Auto + β1*Repayment Amount + β2*Credit Score + β3*Unemployment Rate + β4*Inflation Rate + β5*Repayment Amount X Missed Payments
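To make the formula concrete, the sketch below evaluates it as a plain linear combination for the first fit row (ind1, Time 40179), using the rounded estimates reported in the Alpha and Beta sections of this walkthrough. It assumes the coefficients apply exactly as written and ignores any individual-specific effects DATFID may estimate internally, so it approximates the fitted value rather than reproducing it.

```python
# Rounded estimates as read from the Alpha and Beta tables in this walkthrough.
coefficients = {
    "Intercept": 0.316,
    "Log(Loan Amount)": 0.0246,
    "Income Level": 0.0294,
    "Stable Income": 0.0435,
    "Loan Type: Mortgage": 0.0409,
    "Loan Type: Auto": 0.0179,
    "Repayment Amount": -2.5e-5,
    "Credit Score": 9.8e-5,
    "Unemployment Rate": -0.0063,
    "Inflation Rate": -0.0018,
    "Repayment Amount X Missed Payments": -2.6e-5,
}

# First row of the fit data; the intercept is multiplied by a constant 1.
row = {
    "Intercept": 1,
    "Log(Loan Amount)": 5.404,
    "Income Level": 1,
    "Stable Income": 0,
    "Loan Type: Mortgage": 1,
    "Loan Type: Auto": 0,
    "Repayment Amount": 1287,
    "Credit Score": 790,
    "Unemployment Rate": 5.81,
    "Inflation Rate": 3.32,
    "Repayment Amount X Missed Payments": 1287,  # 1287 repayment * 1 missed payment
}

prediction = sum(coefficients[name] * row[name] for name in coefficients)
print(round(prediction, 3))  # 0.488, against an observed value of 0.513
```

The gap of about 0.025 between this hand calculation and the observed value is smaller than the reported MAE of 0.0423.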

Alpha Estimates (Time-Invariant)

These coefficients capture characteristics that are constant over time for each individual — the inherent baseline differences between borrowers.

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 0.316 | 20.8 | Baseline loan-approval probability before any borrower or macro factor is considered. |
| Log(Loan Amount) | +0.0246 | 9.5 | Each one-unit increase in log loan amount lifts approval probability by ~2.5 percentage points. |
| Income Level | +0.0294 | 29.0 | Each step up in income level raises approval probability by ~2.9 percentage points. |
| Stable Income | +0.0435 | 27.6 | Borrowers with stable income are ~4.4 pp more likely to be approved than the baseline. |
| Loan Type: Mortgage | +0.0409 | 21.9 | Mortgage applications run ~4.1 pp higher in approval probability than the reference loan type. |
| Loan Type: Auto | +0.0179 | 9.3 | Auto loans run ~1.8 pp higher in approval probability than the reference loan type. |

Beta Estimates (Time-Varying)

These coefficients capture effects that change over time — the dynamic factors influencing loan probability.

| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Repayment Amount | -2.5e-5 | 5.7 | Each extra dollar of monthly repayment trims approval probability by ~2.5e-5 (about 1 pp per $400 of repayment). |
| Credit Score | +9.8e-5 | 2.1 | Each additional credit-score point lifts approval probability by ~9.8e-5 (about 1 pp per 100 points). |
| Unemployment Rate | -0.0063 | 23.1 | Each percentage-point rise in unemployment shaves ~0.63 pp off approval probability. |
| Inflation Rate | -0.0018 | 10.9 | Inflation moves approval probability slightly down (~0.18 pp per pp). The effect is small, but with a t-statistic of 10.9 it is statistically significant. |
| Repayment Amount X Missed Payments | -2.6e-5 | 44.7 | The strongest signal in the panel by t-statistic: each unit of the interaction term (repayment dollars times missed payments) lowers approval probability by roughly 2.6e-5. |
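The "1 pp per N units" readings of these coefficients follow from dividing one percentage point (0.01) by the absolute estimate. A short sketch of that arithmetic, using the rounded estimates from the table:

```python
# Rounded beta estimates as reported in the table above.
betas = {
    "Repayment Amount": -2.5e-5,   # per dollar
    "Credit Score": 9.8e-5,        # per score point
    "Unemployment Rate": -0.0063,  # per percentage point
    "Inflation Rate": -0.0018,     # per percentage point
}

# Units of each feature needed to move approval probability by 1 pp (0.01).
units_per_pp = {name: 0.01 / abs(beta) for name, beta in betas.items()}

for name, units in units_per_pp.items():
    print(f"{name}: {units:.1f} units per 1 pp")
```

For example, the repayment coefficient works out to roughly $400 of monthly repayment per percentage point, and the credit-score coefficient to roughly 102 points.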

Model Performance

| Metric | Value |
|---|---|
| R² Overall | 0.714 |
| R² Between | 0.950 |
| R² Within | 0.446 |
| MAE | 0.0423 |
| MSE | 0.00280 |
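For orientation, an MSE of 0.00280 corresponds to an RMSE of about 0.053. The sketch below shows one common convention for these panel metrics: squared correlations between actual and predicted values computed on the raw data (overall), on per-individual means (between), and on deviations from those means (within). DATFID's exact definitions are not stated here, so treat this as an assumption. The function name and toy numbers are illustrative.

```python
import numpy as np
import pandas as pd

def panel_fit_metrics(df, id_col, y_col, pred_col):
    """MAE, MSE, and squared-correlation R² variants for a long-format panel."""
    cols = [y_col, pred_col]
    err = df[y_col] - df[pred_col]

    group_means = df.groupby(id_col)[cols].mean()                  # one row per individual
    demeaned = df[cols] - df.groupby(id_col)[cols].transform("mean")

    def r2(a, b):
        return float(np.corrcoef(a, b)[0, 1] ** 2)

    return {
        "mae": float(err.abs().mean()),
        "mse": float((err ** 2).mean()),
        "r2_overall": r2(df[y_col], df[pred_col]),
        "r2_between": r2(group_means[y_col], group_means[pred_col]),
        "r2_within": r2(demeaned[y_col], demeaned[pred_col]),
    }

# Toy panel: 2 individuals x 3 periods, with made-up actual and predicted values.
toy = pd.DataFrame({
    "Individual": ["ind1"] * 3 + ["ind2"] * 3,
    "actual": [0.50, 0.52, 0.54, 0.30, 0.31, 0.35],
    "predicted": [0.51, 0.52, 0.53, 0.31, 0.32, 0.33],
})

metrics = panel_fit_metrics(toy, "Individual", "actual", "predicted")
print({name: round(value, 4) for name, value in metrics.items()})
```

A high between R² alongside a lower within R², as in the results above, would under this convention mean that the model separates borrowers from one another better than it tracks each borrower's movements over time.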

Forecast Results

After running the forecast, DATFID produces a CSV file with predicted loan probabilities for each individual and time period specified in the forecast file. The output includes the entity ID, time, and predicted value.

Try it yourself: Run this exact analysis in the Free Playground — select "Banking" from the sample datasets and click Run Analysis.