Loan Probability Forecasting
Predict loan approval probability in the banking sector using panel data with 50 individuals tracked over 100 time periods.
Dataset Exploration
The Banking dataset contains financial transaction data with individual customer records over time. Features include loan amounts, repayment schedules, credit scores, and customer demographics. The target variable is the loan probability, which the model predicts from historical payment patterns and customer behavior.
Panel data structure: 50 individuals tracked over 100 time periods (5,000 rows in total, 14 columns). The same individual (e.g. "ind1") appears in multiple time periods, which is what makes this panel data.
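A quick way to confirm that a panel is balanced (every individual observed in every period) is to compare the set of periods seen for each individual. The sketch below runs on a small synthetic stand-in with the same shape as the Banking panel, not on the actual file, and `is_balanced` is a hypothetical helper rather than part of DATFID.

```python
from collections import defaultdict


def is_balanced(rows, id_key="Individual", time_key="Time"):
    """Return True if every individual is observed in the same set of periods."""
    periods_by_id = defaultdict(set)
    for row in rows:
        periods_by_id[row[id_key]].add(row[time_key])
    if not periods_by_id:
        return True
    all_periods = set().union(*periods_by_id.values())
    return all(periods == all_periods for periods in periods_by_id.values())


# Synthetic stand-in (assumption): 50 individuals x 100 periods, ids and times only.
rows = [
    {"Individual": f"ind{i}", "Time": t}
    for i in range(1, 51)
    for t in range(100)
]

n_individuals = len({r["Individual"] for r in rows})
n_periods = len({r["Time"] for r in rows})
print(len(rows), n_individuals, n_periods, is_balanced(rows))
```

Dropping any single row from `rows` makes the check fail, which is the kind of gap worth catching before fitting.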
Fit Data (first 5 rows)
| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto | Loan Prob. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40179 | ind1 | 1287 | 1 | 790 | 5.81 | 3.32 | 1287 | 5.404 | 1 | 0 | 1 | 0 | 0.513 |
| 40210 | ind1 | 1261 | 1 | 800 | 6.05 | 2.42 | 1261 | 5.404 | 1 | 0 | 1 | 0 | 0.463 |
| 40238 | ind1 | 1273 | 1 | 810 | 5.7 | 3.46 | 1273 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40269 | ind1 | 1248 | 1 | 830 | 4.89 | 3.49 | 1248 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40299 | ind1 | 1260 | 1 | 830 | 4.82 | 2.61 | 1260 | 5.404 | 1 | 0 | 1 | 0 | 0.555 |
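The Time values look like Excel serial day numbers. Under that assumption (the source does not state the encoding), the five rows above decode to the first day of consecutive months starting in January 2010. The helper name `excel_serial_to_date` is illustrative only.

```python
from datetime import date, timedelta

# Excel's 1900 date system: serial numbers count days from 1899-12-30
# (valid for dates after February 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


# Time values from the first five fit rows.
fit_times = [40179, 40210, 40238, 40269, 40299]
decoded = [excel_serial_to_date(s) for s in fit_times]

for serial, day in zip(fit_times, decoded):
    print(serial, day.isoformat())
```

Read this way, the panel is monthly, and 100 monthly periods starting in January 2010 run through April 2018.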
Forecast Data (first 3 rows)
The forecast file has the same structure but covers future periods and does not include the target variable. DATFID uses this to know which individuals and time points to generate predictions for.
| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43221 | ind1 | 1830 | 0 | 830 | 5.75 | 2.96 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43252 | ind1 | 1847 | 0 | 800 | 5.52 | 3.34 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43282 | ind1 | 1810 | 0 | 830 | 5.23 | 3.42 | 0 | 5.368 | 2 | 1 | 0 | 1 |
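Since the forecast file is expected to match the fit file minus the target, it can help to compare the two column lists before sending anything to the API. The sketch below uses the abbreviated headers from the tables above; the headers in the actual files may be spelled differently (the `fit_model` call refers to the target as "Loan Probability"). `check_forecast_columns` is a hypothetical helper, not a DATFID function.

```python
def check_forecast_columns(fit_cols, forecast_cols, target):
    """Return (missing, unexpected) columns of a forecast file relative to the fit file."""
    expected = [c for c in fit_cols if c != target]
    missing = [c for c in expected if c not in forecast_cols]
    unexpected = [c for c in forecast_cols if c not in expected]
    return missing, unexpected


# Column headers as abbreviated in the tables above (assumption: display names only).
fit_cols = [
    "Time", "Individual", "Repayment Amount", "Missed Payments", "Credit Score",
    "Unemp. Rate", "Inflation Rate", "Rep.Amt X Missed", "Log(Loan Amt)",
    "Income Level", "Stable Income", "Mortgage", "Auto", "Loan Prob.",
]
forecast_cols = fit_cols[:-1]  # same structure, without the target

missing, unexpected = check_forecast_columns(fit_cols, forecast_cols, target="Loan Prob.")
print("missing:", missing, "unexpected:", unexpected)
```

Both lists come back empty when the forecast file has the expected 13 columns; a forecast file that still carries the target would report it as unexpected.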
Sample datasets can be downloaded from GitHub. CSV files are supported as well.
Code Walkthrough
Step 1: Initialize
import pandas as pd
from datfid import DATFIDClient
client = DATFIDClient(token="your_DATFID_token")
Step 2: Fit the Model
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking.xlsx"
df = pd.read_excel(url_fit)
result = client.fit_model(
df=df,
id_col="Individual",
time_col="Time",
y="Loan Probability",
current_features="all",
filter_by_significance=True
)
Step 3: Forecast
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)
forecast = client.forecast_model(df_forecast=df_forecast)
Analysis Results (Model Fit)
Formula
Loan Probability ~ α1*Intercept + α2*Loan Type: Auto + α3*Income Level + α4*Log(Loan Amount) + α5*Stable Income + α6*Loan Type: Mortgage + β1*Repayment Amount + β2*Credit Score + β3*Unemployment Rate + β4*Inflation Rate + β5*Repayment Amount X Missed Payments
Alpha Estimates (Time-Invariant)
These coefficients capture characteristics that are constant over time for each individual — the inherent baseline differences between borrowers.
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 0.3165 | 20.79 | Baseline loan probability when all features are zero |
| Income Level | +0.0294 | 29.03 | Each 1-unit higher income level increases loan probability by ~2.9 percentage points |
| Stable Income | +0.0435 | 27.61 | Having a stable income increases loan probability by ~4.4 percentage points |
| Loan Type: Mortgage | +0.0409 | 21.89 | Mortgage loans have ~4.1 pp higher probability than baseline |
| Loan Type: Auto | +0.0179 | 9.31 | Auto loans have ~1.8 pp higher probability than baseline |
| Log(Loan Amount) | +0.0246 | 9.48 | Higher loan amounts are associated with higher approval probability |
Beta Estimates (Time-Varying)
These coefficients capture effects that change over time — the dynamic factors influencing loan probability.
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Rep. Amt X Missed Payments | -2.63e-05 | 44.66 | Most significant predictor. Higher repayments combined with missed payments decrease loan probability |
| Repayment Amount | -2.46e-05 | 5.72 | Higher repayment amounts slightly decrease loan probability |
| Unemployment Rate | -0.0063 | 3.11 | A 1 pp increase in unemployment decreases loan probability by ~0.6 pp |
| Credit Score | +9.82e-05 | 2.14 | Higher credit score slightly increases loan probability |
| Inflation Rate | -0.0018 | 0.86 | Not statistically significant (p=0.39) |
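The p-value quoted for Inflation Rate can be reproduced approximately from its t-statistic. The sketch below uses a two-sided standard normal approximation, which is a reasonable stand-in with 5,000 rows. This is an assumption: the source does not say which reference distribution DATFID uses.

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value for a t-statistic under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


# T-statistics copied from the beta table above.
t_stats = {
    "Unemployment Rate": 3.11,
    "Credit Score": 2.14,
    "Inflation Rate": 0.86,
}
p_values = {name: two_sided_p(t) for name, t in t_stats.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Inflation Rate comes out at roughly 0.39, in line with the table, while Credit Score and Unemployment Rate both fall below the usual 0.05 threshold.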
Model Performance
Forecast Results
After running the forecast, DATFID produces a CSV file with predicted loan probabilities for each individual and time period specified in the forecast file. The output includes the entity ID, time, and predicted value.
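To see how a predicted value relates to the reported estimates, the formula can be evaluated by hand for the first forecast row (ind1, Time 43221). This is only a sketch: it assumes the formula is a plain linear combination of the published coefficients and that the Mortgage and Auto columns correspond to the two Loan Type terms. The value DATFID actually returns may differ, for example if the service applies individual-specific adjustments not shown in the tables.

```python
# Estimates copied from the alpha and beta tables.
coefficients = {
    "Intercept": 0.3165,
    "Loan Type: Auto": 0.0179,
    "Income Level": 0.0294,
    "Log(Loan Amount)": 0.0246,
    "Stable Income": 0.0435,
    "Loan Type: Mortgage": 0.0409,
    "Repayment Amount": -2.46e-05,
    "Credit Score": 9.82e-05,
    "Unemployment Rate": -0.0063,
    "Inflation Rate": -0.0018,
    "Repayment Amount X Missed Payments": -2.63e-05,
}

# First forecast row, keyed by the formula's variable names
# (assumption: "Auto"/"Mortgage" columns map to the Loan Type terms).
row = {
    "Intercept": 1.0,
    "Loan Type: Auto": 1,
    "Income Level": 2,
    "Log(Loan Amount)": 5.368,
    "Stable Income": 1,
    "Loan Type: Mortgage": 0,
    "Repayment Amount": 1830,
    "Credit Score": 830,
    "Unemployment Rate": 5.75,
    "Inflation Rate": 2.96,
    "Repayment Amount X Missed Payments": 0,
}

# Linear predictor: sum of coefficient * feature value over all terms.
prediction = sum(coefficients[name] * row[name] for name in coefficients)
print(round(prediction, 3))
```

Under these assumptions the linear predictor is about 0.564, which sits in the same range as the loan probabilities observed in the fit data.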
Try it yourself: Run this exact analysis in the Free Playground — select "Banking" from the sample datasets and click Run Analysis.