Loan Probability Forecasting
Predict loan approval probability in the banking sector using panel data with 50 individuals tracked over 100 time periods.
Dataset Exploration
The Banking dataset contains financial transaction data with individual customer records over time. Features include loan amounts, repayment schedules, credit scores, and customer demographics. The target variable is the loan probability, which is modeled from historical payment patterns and customer behavior.
Panel data structure: 50 individuals tracked over 100 time periods (5,000 total rows, 14 columns). The same individual (e.g. "ind1") appears across multiple time periods, which is what makes this panel data.
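The row count above follows from a balanced panel: every individual appears once in every period, so rows = individuals × periods. A minimal sketch of that check in pandas, run on a small toy frame rather than the real file (the helper name `check_balanced_panel` is hypothetical; the column names are taken from the code walkthrough):

```python
import pandas as pd

def check_balanced_panel(df, id_col="Individual", time_col="Time"):
    """Return (n_individuals, n_periods, is_balanced) for a long-format panel."""
    n_ids = df[id_col].nunique()
    n_periods = df[time_col].nunique()
    # No (individual, period) pair may repeat, and every pair must be present.
    no_duplicates = not df.duplicated([id_col, time_col]).any()
    is_balanced = no_duplicates and len(df) == n_ids * n_periods
    return n_ids, n_periods, is_balanced

# Toy panel: 3 individuals x 4 periods (the real file would be 50 x 100).
toy = pd.DataFrame({
    "Individual": [f"ind{i}" for i in (1, 2, 3) for _ in range(4)],
    "Time": [40179, 40210, 40238, 40269] * 3,
})
n_ids, n_periods, balanced = check_balanced_panel(toy)
print(n_ids, n_periods, balanced)
```

On the Banking file, the same helper would be expected to report 50 individuals and 100 periods if the panel is balanced as described.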
Fit Data (first 5 rows)
| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto | Loan Prob. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40179 | ind1 | 1287 | 1 | 790 | 5.81 | 3.32 | 1287 | 5.404 | 1 | 0 | 1 | 0 | 0.513 |
| 40210 | ind1 | 1261 | 1 | 800 | 6.05 | 2.42 | 1261 | 5.404 | 1 | 0 | 1 | 0 | 0.463 |
| 40238 | ind1 | 1273 | 1 | 810 | 5.7 | 3.46 | 1273 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40269 | ind1 | 1248 | 1 | 830 | 4.89 | 3.49 | 1248 | 5.404 | 1 | 0 | 1 | 0 | 0.448 |
| 40299 | ind1 | 1260 | 1 | 830 | 4.82 | 2.61 | 1260 | 5.404 | 1 | 0 | 1 | 0 | 0.555 |
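
The Time values in the rows above step by roughly one month and look like Excel serial dates. Assuming that is what they are (the source does not say so explicitly), a short pandas sketch can turn them into calendar dates for plotting or sanity checks:

```python
import pandas as pd

# Time values copied from the first five fit rows.
serial_days = pd.Series([40179, 40210, 40238, 40269, 40299])

# Assumption: Excel 1900 date system, where pandas needs origin 1899-12-30.
dates = pd.to_datetime(serial_days, unit="D", origin="1899-12-30")
date_strings = dates.dt.strftime("%Y-%m-%d").tolist()
print(date_strings)
```

Under that assumption the five rows map to the first day of each month from January to May 2010.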
Forecast Data (first 3 rows)
The forecast file has the same structure but covers future periods and does not include the target variable. DATFID uses this to know which individuals and time points to generate predictions for.
| Time | Individual | Repayment Amount | Missed Payments | Credit Score | Unemp. Rate | Inflation Rate | Rep.Amt X Missed | Log(Loan Amt) | Income Level | Stable Income | Mortgage | Auto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43221 | ind1 | 1830 | 0 | 830 | 5.75 | 2.96 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43252 | ind1 | 1847 | 0 | 800 | 5.52 | 3.34 | 0 | 5.368 | 2 | 1 | 0 | 1 |
| 43282 | ind1 | 1810 | 0 | 830 | 5.23 | 3.42 | 0 | 5.368 | 2 | 1 | 0 | 1 |
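
Before forecasting, it can help to confirm that the forecast frame really has the fit columns minus the target and that its periods come after the fit periods. The sketch below uses two tiny illustrative frames with only a few of the columns; `forecast_frame_ok` is a hypothetical helper, not part of DATFID:

```python
import pandas as pd

def forecast_frame_ok(df_fit, df_forecast, target, time_col="Time"):
    """Check column layout and time ordering of a forecast frame."""
    expected = [c for c in df_fit.columns if c != target]
    same_columns = list(df_forecast.columns) == expected
    # Forecast periods should start after the last fitted period.
    is_future = bool(df_forecast[time_col].min() > df_fit[time_col].max())
    return same_columns and is_future

# Illustrative subsets of the fit and forecast tables.
fit = pd.DataFrame({
    "Time": [40179, 40210],
    "Individual": ["ind1", "ind1"],
    "Credit Score": [790, 800],
    "Loan Probability": [0.513, 0.463],
})
fc = pd.DataFrame({
    "Time": [43221, 43252],
    "Individual": ["ind1", "ind1"],
    "Credit Score": [830, 800],
})
ok = forecast_frame_ok(fit, fc, target="Loan Probability")
print(ok)
```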
Sample datasets are available for download from GitHub. The same workflow also works with CSV files.
Code Walkthrough
Step 1: Initialize
import pandas as pd
from datfid import DATFIDClient
client = DATFIDClient(token="your_DATFID_token")
Step 2: Fit the Model
url_fit = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking.xlsx"
df = pd.read_excel(url_fit)
result = client.fit_model(
    df=df,
    id_col="Individual",
    time_col="Time",
    y="Loan Probability",
    current_features="all",
    filter_by_significance=True
)
Step 3: Forecast
url_forecast = "https://raw.githubusercontent.com/datfid-valeriidashuk/sample-datasets/main/Banking_forecast.xlsx"
df_forecast = pd.read_excel(url_forecast)
forecast = client.forecast_model(df_forecast=df_forecast)
Analysis Results (Model Fit)
Formula
Loan Probability ~ α1*Intercept + α2*Log(Loan Amount) + α3*Income Level + α4*Stable Income + α5*Loan Type: Mortgage + α6*Loan Type: Auto + β1*Repayment Amount + β2*Credit Score + β3*Unemployment Rate + β4*Inflation Rate + β5*Repayment Amount X Missed Payments
Alpha Estimates (Time-Invariant)
These coefficients capture characteristics that are constant over time for each individual — the inherent baseline differences between borrowers.
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Intercept | 0.316 | 20.8 | Baseline loan-approval probability before any borrower or macro factor is considered. |
| Log(Loan Amount) | +0.0246 | 9.5 | Each one-unit increase in log loan amount lifts approval probability by ~2.5 percentage points. |
| Income Level | +0.0294 | 29.0 | Each step up in income level raises approval probability by ~2.9 percentage points. |
| Stable Income | +0.0435 | 27.6 | Borrowers with stable income are ~4.4 pp more likely to be approved than the baseline. |
| Loan Type: Mortgage | +0.0409 | 21.9 | Mortgage applications run ~4.1 pp higher in approval probability than the reference loan type. |
| Loan Type: Auto | +0.0179 | 9.3 | Auto loans run ~1.8 pp higher in approval probability than the reference loan type. |
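
To see how the alpha estimates add up, the sketch below computes the time-invariant part of the formula for the borrower profile shown in the first fit rows (log loan amount 5.404, income level 1, no stable income, mortgage). It is plain arithmetic on the table values, not DATFID's own prediction routine:

```python
# Alpha estimates copied from the table above.
alpha = {
    "Intercept": 0.316,
    "Log(Loan Amount)": 0.0246,
    "Income Level": 0.0294,
    "Stable Income": 0.0435,
    "Loan Type: Mortgage": 0.0409,
    "Loan Type: Auto": 0.0179,
}

# Time-invariant values for "ind1" as they appear in the fit data sample.
profile = {
    "Intercept": 1,
    "Log(Loan Amount)": 5.404,
    "Income Level": 1,
    "Stable Income": 0,
    "Loan Type: Mortgage": 1,
    "Loan Type: Auto": 0,
}

# Sum of coefficient x value over the time-invariant terms only.
baseline = sum(alpha[name] * profile[name] for name in alpha)
print(round(baseline, 4))
```

The result is this borrower's baseline contribution, roughly 0.52, before any of the time-varying terms are added.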
Beta Estimates (Time-Varying)
These coefficients capture effects that change over time — the dynamic factors influencing loan probability.
| Variable | Estimate | T-stat | Interpretation |
|---|---|---|---|
| Repayment Amount | -2.5e-5 | 5.7 | Each extra dollar of monthly repayment trims approval probability by ~2.5e-5 (about 1 pp per $400 of repayment). |
| Credit Score | +9.8e-5 | 2.1 | Each additional credit-score point lifts approval probability by ~9.8e-5 (about 1 pp per 100 points). |
| Unemployment Rate | -0.00632 | 3.1 | Each percentage-point rise in unemployment shaves ~0.63 pp off approval probability. |
| Inflation Rate | -0.00181 | 0.9 | Inflation moves approval probability slightly down (~0.18 pp per pp), but the effect is not statistically significant in this run. |
| Repayment Amount X Missed Payments | -2.6e-5 | 44.7 | The most statistically significant term in the panel (highest t-stat): each unit of the interaction, i.e. one dollar of repayment per missed payment, lowers approval probability by ~2.6e-5. |
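
The interpretations above can be cross-checked with a few lines of arithmetic: how many units of each variable move the probability by one percentage point, and which t-statistics clear a conventional threshold. The 1.96 cutoff below is the usual two-sided 5% normal approximation and is an assumption here, not necessarily the rule behind `filter_by_significance`:

```python
# (estimate, t-stat) pairs copied from the beta table above.
beta = {
    "Repayment Amount": (-2.5e-5, 5.7),
    "Credit Score": (9.8e-5, 2.1),
    "Unemployment Rate": (-0.00632, 3.1),
    "Inflation Rate": (-0.00181, 0.9),
    "Repayment Amount X Missed Payments": (-2.6e-5, 44.7),
}

CRITICAL_T = 1.96  # assumed threshold: two-sided 5% level, normal approximation

summary = {}
for name, (estimate, t_stat) in beta.items():
    # Units of the variable needed to shift the probability by 0.01 (1 pp).
    units_per_pp = round(0.01 / abs(estimate), 1)
    significant = abs(t_stat) >= CRITICAL_T
    summary[name] = (units_per_pp, significant)

for name, (units_per_pp, significant) in summary.items():
    print(f"{name}: {units_per_pp} units per pp, significant={significant}")
```

With these inputs, only Inflation Rate falls below the assumed threshold, and Repayment Amount needs about 400 dollars to move the probability by one percentage point.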
Model Performance
Forecast Results
After running the forecast, DATFID produces a CSV file with predicted loan probabilities for each individual and time period specified in the forecast file. The output includes the entity ID, time, and predicted value.
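As a rough sketch of working with such an output, the snippet below loads a small made-up CSV and reshapes it to one row per individual. The column names `Individual`, `Time`, and `Prediction` and all values are placeholders chosen for illustration; the actual DATFID output may use different names:

```python
import io
import pandas as pd

# Placeholder CSV standing in for the forecast output (values are made up).
csv_text = """Individual,Time,Prediction
ind1,43221,0.50
ind1,43252,0.52
ind2,43221,0.40
ind2,43252,0.44
"""

pred = pd.read_csv(io.StringIO(csv_text))

# One row per individual, one column per forecast period.
wide = pred.pivot(index="Individual", columns="Time", values="Prediction")

# Average predicted probability per individual across the forecast horizon.
mean_by_individual = pred.groupby("Individual")["Prediction"].mean()
print(wide)
print(mean_by_individual)
```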
Try it yourself: Run this exact analysis in the Free Playground — select "Banking" from the sample datasets and click Run Analysis.