The Story That Explains Counterfactual Explanations
No reason. No path forward. Just a closed door.
Maria asked the bank's AI system: "Why was I rejected?" The system replied with a SHAP waterfall chart — a stack of bars showing that her low income was –0.42, her credit utilisation was –0.31, her short employment history was –0.18…
Maria stared at it. She understood nothing.
Then she asked a different question: "What would I need to change to get approved?"
The system replied: "If your annual income were £32,000 instead of £26,500, and your credit utilisation were below 35% instead of 52%, your application would be approved with 87% probability."
That she could act on. That was a counterfactual explanation.
A counterfactual explanation answers the question: "What is the smallest change to the input that would flip the model's output?" It is the XAI technique closest to how humans naturally think about decisions — not "why did this happen?" but "what would I need to do differently?"
Given an input x that received a prediction ŷ, a counterfactual x′ is a perturbed version of x such that the model predicts a different (desired) outcome ŷ′, while keeping x′ as close as possible to the original x. The difference x′ − x is the actionable recourse.
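The definition can be made concrete with the figures from Maria's story. The sketch below is illustrative only: the helper name `recourse` and the dictionary layout are assumptions made for this example, not part of any library.

```python
def recourse(x, x_cf, tol=1e-9):
    """Return {feature: (old, new, delta)} for every feature that differs."""
    return {
        name: (x[name], x_cf[name], x_cf[name] - x[name])
        for name in x
        if abs(x_cf[name] - x[name]) > tol
    }

# Figures taken from the bank's reply in the story above
maria = {"income": 26500, "credit_utilisation": 0.52}
maria_cf = {"income": 32000, "credit_utilisation": 0.35}

changes = recourse(maria, maria_cf)
for name, (old, new, delta) in changes.items():
    print(f"{name}: {old} -> {new} ({delta:+g})")
```

The returned deltas are exactly the x′ − x of the definition, restricted to the features that actually move.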
Where Counterfactuals Fit in the XAI Landscape
Explainable AI (XAI) techniques are broadly divided by scope (global vs local) and approach (feature attribution vs example-based vs rule-based). Counterfactual explanations are local, example-based methods — they explain a single prediction by generating a contrast example.
| XAI Family | Examples | Answers | Actionable? |
|---|---|---|---|
| Feature Attribution | SHAP, LIME, Integrated Gradients | "Which features mattered most?" | Partially |
| Rule Extraction | Anchors, Decision Rules | "What conditions lock in this prediction?" | Partially |
| Counterfactuals | DiCE, Wachter, FACE, GrowingSpheres | "What do I change to get a different outcome?" | Yes, directly |
| Prototype / Criticism | MMD-critic, ProtoDash | "What does a typical example look like?" | Rarely |
| Saliency Maps | GradCAM, Grad×Input | "Which pixels / tokens caused this?" | No |
Unlike SHAP or LIME, which tell you what the model focused on, counterfactuals tell you what you can do. This is the difference between a post-mortem and a road map. In regulated industries such as finance, healthcare, and hiring, explanations of automated decisions that support recourse are increasingly expected by regulators (GDPR Article 22, EU AI Act).
The Four Properties of a Good Counterfactual
Not all counterfactuals are useful. A counterfactual that says "if you were 20 years younger and had a PhD, you'd be approved" is technically valid but practically useless. Good counterfactuals satisfy four properties.
There is rarely a single correct counterfactual. Many equally minimal changes can flip a decision. Generating a diverse set of counterfactuals, as DiCE does, lets users choose the path that fits their own constraints and preferences: one person can reduce their debt, another can increase their salary. Offering only one counterfactual is paternalistic; offering too many is overwhelming. A set of 3–5 is a reasonable default.
The Mathematical Formulation
Counterfactual search is formulated as a constrained optimisation problem. The original formulation by Wachter et al. (2017) is:
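In the notation used above, with f the model, y′ the desired output, p the number of features, and MAD_k the median absolute deviation of feature k over the training data, the objective from the paper can be written as follows. The symbols are adapted to this article, so treat the exact typography as an assumption.

```latex
\[
\arg\min_{x'} \, \max_{\lambda} \;\; \lambda \,\bigl(f(x') - y'\bigr)^{2} \;+\; d(x, x'),
\qquad
d(x, x') \;=\; \sum_{k=1}^{p} \frac{\lvert x_k - x'_k \rvert}{\mathrm{MAD}_k}
\]
```

The first term pushes the prediction towards the desired outcome and the second keeps x′ close to x. In practice the inner maximisation is usually handled by raising λ step by step until f(x′) is close enough to y′.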
Animated Diagram — How Counterfactual Search Works
The animation below shows a binary classifier's decision boundary in 2D feature space. Watch how the search algorithm nudges the original point (red) across the boundary to find the nearest valid counterfactual (green).
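For readers without the animation, the same idea can be sketched in a few lines of NumPy. Everything here is an assumption chosen for illustration: a toy logistic classifier with hand-picked weights, a squared L2 distance in place of the MAD-weighted L1 distance (so the objective stays smooth), and plain gradient descent with a fixed step.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Assumed toy classifier: p(approved) = sigmoid(w . x + b)
w = np.array([1.0, 1.0])
b = -3.0

def predict(x):
    return sigmoid(w @ x + b)

def search_cf(x0, lam=10.0, step=0.05, n_steps=500):
    """Gradient descent on lam * (p(x') - 1)^2 + ||x' - x0||^2."""
    x = x0.copy()
    for _ in range(n_steps):
        p = predict(x)
        grad_pred = 2.0 * (p - 1.0) * p * (1.0 - p) * w   # gradient of (p - 1)^2
        grad_dist = 2.0 * (x - x0)                        # gradient of squared L2
        x -= step * (lam * grad_pred + grad_dist)
    return x

x_orig = np.array([1.0, 1.0])   # the "red" point, on the rejected side
x_cf = search_cf(x_orig)        # the "green" point after the search

print(f"original p       = {predict(x_orig):.3f}")
print(f"counterfactual p = {predict(x_cf):.3f}, moved by {x_cf - x_orig}")
```

The point moves perpendicular to the linear boundary and stops where the pull towards approval balances the pull back towards the original input; a larger `lam` pushes it further past the boundary.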
Major Counterfactual Algorithms Compared
The field has produced several distinct algorithms, each with different trade-offs between plausibility, diversity, and computational cost.
Animated Diagram — DICE Diversity vs Single CF
This animation contrasts finding a single counterfactual (Wachter-style) versus a diverse set (DiCE-style). Each path represents a different recourse strategy.
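One simple way to turn many valid candidates into a small diverse set is greedy max-min selection, sketched below. This is a stand-in chosen for brevity, not DiCE's own procedure (the DiCE paper optimises a diversity term based on determinantal point processes). The candidate values are hypothetical changes expressed in MAD units.

```python
import numpy as np

def pick_diverse(candidates, k):
    """Greedy max-min selection: start from the first candidate, then
    repeatedly add the candidate farthest (L1) from those already chosen."""
    candidates = np.asarray(candidates, dtype=float)
    chosen = [0]
    while len(chosen) < min(k, len(candidates)):
        # L1 distance from every candidate to each already-chosen candidate
        d = np.abs(candidates[:, None, :] - candidates[chosen][None, :, :]).sum(axis=2)
        nearest = d.min(axis=1)      # distance to the closest chosen candidate
        nearest[chosen] = -1.0       # never re-pick a chosen candidate
        chosen.append(int(nearest.argmax()))
    return chosen

# Hypothetical valid counterfactuals as (change in income, change in debt ratio)
candidates = [
    [1.0,  0.0],   # raise income only
    [1.1,  0.0],   # nearly the same strategy
    [0.0, -1.0],   # cut debt only
    [0.5, -0.5],   # a bit of both
    [0.9, -0.1],   # again close to the first
]
print(pick_diverse(candidates, k=3))
```

The near-duplicate strategies are skipped in favour of candidates that represent genuinely different routes to approval.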
Python Implementation — From Scratch (Wachter Method)
Let us build a minimal counterfactual explainer from scratch to understand what libraries like DiCE do under the hood. We will use a synthetic loan approval dataset and a trained Random Forest classifier.
# ── Step 1: Build dataset and train model ───────────────────
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)
n = 2000

# Synthetic loan data
df = pd.DataFrame({
    'income':        np.random.normal(35000, 12000, n).clip(10000, 150000),
    'credit_score':  np.random.normal(640, 80, n).clip(300, 850),
    'debt_ratio':    np.random.uniform(0.1, 0.9, n),
    'employment_yr': np.random.exponential(5, n).clip(0, 30),
    'loan_amount':   np.random.normal(18000, 8000, n).clip(1000, 60000),
})

# Label: approved if strong financials (weighted score plus noise)
score = (
    (df['income'] / 50000) * 0.35 +
    (df['credit_score'] / 850) * 0.30 +
    (1 - df['debt_ratio']) * 0.20 +
    (df['employment_yr'] / 30) * 0.15
)
df['approved'] = (score + np.random.normal(0, 0.08, n) > 0.48).astype(int)

X = df.drop('approved', axis=1)
y = df['approved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest (tree ensembles need no feature scaling, so no pipeline is used)
model = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
# ── Step 2: Wachter-style Counterfactual from scratch ───────
from scipy.optimize import minimize
from scipy.stats import median_abs_deviation

def wachter_cf(model, x_orig, y_desired, feature_names,
               immutable=None, lam=0.5, max_iter=500):
    """
    Wachter-style counterfactual finder.
    model     : sklearn classifier with predict_proba
    x_orig    : pd.Series, the original input row
    y_desired : int, desired class (e.g. 1 = approved)
    immutable : list of feature names that cannot change
    lam       : starting weight on the prediction loss (higher = prioritise flipping)
    """
    x0 = x_orig.values.astype(float)
    names = list(feature_names)
    immut_idx = [names.index(f) for f in (immutable or [])]
    mut_idx = [i for i in range(len(names)) if i not in immut_idx]

    # MAD normalisation factors (from the global X_train built in Step 1)
    mad = X_train.apply(lambda col: median_abs_deviation(col)).values + 1e-8

    def to_x(z):
        # z holds the change of each mutable feature in MAD units;
        # immutable features keep their original value by construction
        x_cf = x0.copy()
        x_cf[mut_idx] += z * mad[mut_idx]
        return x_cf

    def proba_of(x_cf):
        return model.predict_proba(pd.DataFrame([x_cf], columns=names))[:, y_desired][0]

    def objective(z, lam_k):
        pred_loss = (proba_of(to_x(z)) - 1.0) ** 2   # want proba → 1
        proximity = np.sum(np.abs(z))                # L1 proximity (MAD-normalised)
        return lam_k * pred_loss + proximity

    # A Random Forest's predict_proba is piecewise constant, so a gradient-based
    # solver such as SLSQP sees a zero gradient and stays at x0. Powell's method
    # is derivative-free, and lam is raised until the prediction flips (a simple
    # heuristic standing in for the max over lambda in Wachter et al.).
    lam_k = lam
    for _ in range(8):
        result = minimize(
            objective,
            x0=np.zeros(len(mut_idx)),
            args=(lam_k,),
            method='Powell',
            options={'maxiter': max_iter, 'ftol': 1e-6}
        )
        x_best = to_x(result.x)
        proba = proba_of(x_best)
        if proba > 0.5:
            break
        lam_k *= 4

    x_cf = pd.Series(x_best, index=names)
    valid = proba > 0.5
    return x_cf, valid, proba
# ── Apply to Maria's application ────────────────────────────
maria = pd.Series({
    'income': 26500,
    'credit_score': 580,
    'debt_ratio': 0.52,
    'employment_yr': 1.5,
    'loan_amount': 20000,
})
feature_names = list(X.columns)

cf, valid, proba = wachter_cf(model, maria, y_desired=1,
                              feature_names=feature_names,
                              immutable=[],   # all mutable
                              lam=1.0)

print(f"\nOriginal prediction: {model.predict_proba(maria.to_frame().T)[0,1]:.3f}")
print(f"CF prediction: {proba:.3f} (Valid: {valid})")
print("\nFeature Changes:")
print(f"{'Feature':20s} {'Original':>12s} {'Counterfactual':>16s} {'Delta':>12s}")
print("-"*65)
for feat in feature_names:
    delta = cf[feat] - maria[feat]
    # Only report features that moved by more than 1% of their original value
    if abs(delta) > 0.01 * abs(maria[feat]):
        print(f"{feat:20s} {maria[feat]:>12.1f} {cf[feat]:>16.1f} {delta:>+12.1f}")
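In the run reported here, Maria's probability of approval rises from 14.3% to 82.1% if she can raise her annual income by about £5,680, improve her credit score by 38 points, and reduce her debt ratio from 52% to 30%. Exact figures depend on the fitted model and the optimiser settings, so a rerun may give different targets. Either way, the output is a set of concrete, measurable targets rather than abstract feature importances.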
DICE Library — Diverse Counterfactuals in Practice
The DiCE library (Microsoft Research) is one of the most widely used open-source counterfactual libraries. It provides backends for scikit-learn, TensorFlow, and PyTorch models.
# ── DiCE: Diverse Counterfactual Explanations ───────────────
import dice_ml
from dice_ml import Dice

# 1. Wrap dataset
d = dice_ml.Data(
    dataframe=pd.concat([X_train, y_train.rename('approved')], axis=1),
    continuous_features=['income', 'credit_score', 'debt_ratio',
                         'employment_yr', 'loan_amount'],
    outcome_name='approved'
)

# 2. Wrap model
m = dice_ml.Model(model=model, backend='sklearn')

# 3. Create explainer
exp = Dice(d, m, method='random')  # method: 'random' | 'genetic' | 'kdtree'

# 4. Generate 4 diverse counterfactuals for Maria
query = maria.to_frame().T
cf_result = exp.generate_counterfactuals(
    query_instances=query,
    total_CFs=4,
    desired_class='opposite',          # flip the prediction
    permitted_range={
        'income': [26500, 80000],      # can only increase
        'debt_ratio': [0.1, 0.52],     # can only decrease debt
        'employment_yr': [1.5, 30],    # can only increase tenure
    },
    features_to_vary=['income', 'credit_score', 'debt_ratio', 'employment_yr']
    # loan_amount not varied: she already fixed the amount
)
cf_result.visualize_as_dataframe(show_only_changes=True)
Use 'random' for quick exploration and prototyping. Use 'genetic' for better quality in high dimensions (slower). Use 'kdtree' when you need plausibility guaranteed — it finds counterfactuals that are actual training instances, ensuring maximum realism.
Interactive — Actionability Constraints
Actionability constraints are what separate useful counterfactuals from technically valid but practically useless ones. The interactive diagram below shows how constraining mutable features changes the counterfactual space.
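Such constraints can also be checked in code before a counterfactual is shown to anyone. The sketch below is a hypothetical helper (the name `is_actionable` and its arguments are assumptions for illustration) that rejects any candidate touching an immutable feature or moving a feature in a direction the user cannot follow.

```python
def is_actionable(x, x_cf, immutable=(), increase_only=(), decrease_only=(), tol=1e-9):
    """True if the counterfactual respects every actionability constraint."""
    for name in immutable:
        if abs(x_cf[name] - x[name]) > tol:
            return False          # an immutable feature was changed
    for name in increase_only:
        if x_cf[name] < x[name] - tol:
            return False          # e.g. employment tenure cannot shrink
    for name in decrease_only:
        if x_cf[name] > x[name] + tol:
            return False          # e.g. recourse should not require more debt
    return True

x = {"income": 26500, "debt_ratio": 0.52, "employment_yr": 1.5, "loan_amount": 20000}
rules = dict(immutable=["loan_amount"],
             increase_only=["income", "employment_yr"],
             decrease_only=["debt_ratio"])

cf_ok  = {"income": 32000, "debt_ratio": 0.35, "employment_yr": 1.5, "loan_amount": 20000}
cf_bad = {"income": 26500, "debt_ratio": 0.52, "employment_yr": 1.5, "loan_amount": 9000}

print(is_actionable(x, cf_ok, **rules))    # respects all constraints
print(is_actionable(x, cf_bad, **rules))   # changes the fixed loan amount
```

Filtering after generation is the simplest option; libraries that accept constraints up front (such as the `permitted_range` and `features_to_vary` arguments used earlier) avoid generating the invalid candidates in the first place.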
Real-World Applications
Under the EU AI Act Article 86, he was entitled to a human review AND an explanation. The counterfactual explanation said: "If your CV contained 3+ mentions of 'Python', had a quantified achievement (e.g. '↑ revenue by 15%'), and listed a degree year after 2010, your application score would move from 47 to 74 (threshold: 60)."
James updated his CV in 20 minutes. He got the job.
| Industry | Decision Being Explained | Counterfactual Question | Regulatory Driver |
|---|---|---|---|
| 🏦 Banking | Loan / credit approval | "What income/score would get me approved?" | GDPR Art. 22, EBA Guidelines |
| 🏥 Healthcare | Disease risk prediction | "What lifestyle changes lower my risk?" | FDA AI/ML Action Plan |
| 💼 HR / Hiring | CV screening, promotion | "What skills/experience would flip the decision?" | EU AI Act Art. 86 |
| 🎓 Education | Admissions, grade prediction | "What grades/activities improve my chances?" | Ethical best practice |
| 🔒 Insurance | Premium calculation, claim denial | "What reduces my premium?" | FCA Consumer Duty |
| ⚖️ Criminal Justice | Recidivism risk (COMPAS-style) | "What factors would lower my risk score?" | Highly contested — ethical caution required |
Animated Diagram — Proximity vs Plausibility Trade-off
A fundamental tension in counterfactual generation: the nearest point that crosses the decision boundary may be off the data manifold — a combination of feature values that never occurs in reality. The animation below shows this trade-off.
How to Evaluate Counterfactual Explanations
Generating counterfactuals is only half the challenge. Evaluating their quality requires a multi-dimensional scorecard. No single metric captures everything: the table lists six, and the helper after it computes the first five.
| Metric | What It Measures | Formula / Test | Ideal Value |
|---|---|---|---|
| Validity Rate | % of CFs that actually achieve desired class | mean(f(x′) = y′) | 1.0 (100%) |
| Proximity | Average feature distance to original | mean L1/MAD(x, x′) | Minimise |
| Sparsity | Average # of features changed | mean L0(x, x′) | 1–3 features ideal |
| Diversity | Pairwise distance between multiple CFs | mean d(x′ᵢ, x′ⱼ) for i≠j | Maximise |
| Plausibility (nearest-neighbour distance) | Distance to nearest training point | min_z∈X_train d(x′, z) | Minimise |
| Actionability | % of changes that respect constraints | All immutable features unchanged? | 1.0 (100%) |
# ── Evaluation helper ───────────────────────────────────────
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import median_abs_deviation

def evaluate_cfs(x_orig, cfs_df, model, X_train, y_desired=1):
    """Compute 5 CF quality metrics (actionability is checked separately)."""
    cols = list(X_train.columns)
    x0 = x_orig[cols].values.astype(float)
    cfs = cfs_df[cols].astype(float)           # drop extra columns such as the outcome
    mad = X_train.apply(lambda c: median_abs_deviation(c)).values + 1e-8

    # Work in MAD units so large-valued features (income) do not dominate
    cf_scaled = cfs.values / mad
    x0_scaled = x0 / mad
    train_scaled = X_train.values / mad

    # 1. Validity
    preds = model.predict(cfs)                 # a DataFrame keeps the feature names
    validity = np.mean(preds == y_desired)

    # 2. Proximity (L1 MAD-normalised)
    proximity = np.mean(np.abs(cf_scaled - x0_scaled).sum(axis=1))

    # 3. Sparsity (number of features changed by more than 1% of a MAD)
    sparsity = np.mean((np.abs(cf_scaled - x0_scaled) > 0.01).sum(axis=1))

    # 4. Diversity (mean pairwise L1 distance)
    if len(cf_scaled) > 1:
        pw = cdist(cf_scaled, cf_scaled, metric='minkowski', p=1)
        diversity = pw[np.triu_indices_from(pw, k=1)].mean()
    else:
        diversity = 0.0

    # 5. Plausibility: distance to nearest training point
    dists = cdist(cf_scaled, train_scaled, metric='minkowski', p=1)
    plausibility = np.mean(dists.min(axis=1))

    return {
        'validity': round(validity, 3),
        'proximity': round(proximity, 3),
        'sparsity': round(sparsity, 2),
        'diversity': round(diversity, 3),
        'plausibility': round(plausibility, 3),
    }
Common Pitfalls and How to Avoid Them
Specify immutable_features before running any CF method.
Counterfactuals vs SHAP — When to Use Each
| Feature | SHAP Value |
|---|---|
| Income | −0.42 |
| Credit Score | −0.31 |
| Debt Ratio | −0.22 |
| Employment | −0.18 |
| Loan Amt | +0.04 |
SHAP tells Maria why she was rejected — which features drove the model's output downward. Useful for debugging and auditing.
| Feature | Change Needed |
|---|---|
| Income | £26.5k → £32.2k |
| Credit Score | 580 → 618 |
| Debt Ratio | 52% → 30% |
| Employment | No change needed |
| Loan Amt | No change needed |
The CF tells Maria what to do — concrete, measurable targets. Useful for user recourse and regulatory compliance.
| Criterion | SHAP / LIME | Counterfactuals |
|---|---|---|
| What it answers | "Why this prediction?" | "How to change the prediction?" |
| Output type | Feature importance scores | Modified input instance |
| Actionability | Indirect — requires interpretation | Direct — concrete targets |
| Best audience | Data scientists, auditors | End users, regulated subjects |
| Regulatory fit | Partial (transparency) | Strong (right to recourse) |
| Computation | Fast for tree models (TreeSHAP: O(TLD²)) | Slower (optimisation loop) |
| Scope | Local or Global | Local only |
SHAP and counterfactuals are complementary, not competing. In a production XAI system: (1) use SHAP to build developer intuition about the model, detect bias, and audit feature importance; (2) surface counterfactuals to end users who need actionable recourse. Many leading fintech and insurtech platforms now include both explanation types in their user-facing interfaces.
Complete XAI Pipeline with Counterfactuals
Golden Rules
Changes expressed in plain terms (+£5,680, −22% debt ratio) are actionable. Raw feature vectors are not. Translate automatically in your presentation layer before surfacing to non-technical users.
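A presentation-layer translation of this kind can be very small. The sketch below is one possible shape, with a hypothetical helper name (`describe_changes`) and hand-written labels and formatters; the numbers reuse Maria's targets from earlier in the article.

```python
def describe_changes(x, x_cf, labels, rel_tol=0.01):
    """Turn a counterfactual into short plain-language instructions."""
    sentences = []
    for name, (label, fmt) in labels.items():
        old, new = x[name], x_cf[name]
        if abs(new - old) <= rel_tol * abs(old):
            continue                      # ignore negligible changes
        verb = "Increase" if new > old else "Reduce"
        sentences.append(f"{verb} your {label} from {fmt(old)} to {fmt(new)}.")
    return sentences

# Display label and value formatter per feature (assumed for this example)
labels = {
    "income":       ("annual income", lambda v: f"£{v:,.0f}"),
    "credit_score": ("credit score",  lambda v: f"{v:.0f}"),
    "debt_ratio":   ("debt ratio",    lambda v: f"{v:.0%}"),
}
x    = {"income": 26500, "credit_score": 580, "debt_ratio": 0.52}
x_cf = {"income": 32180, "credit_score": 618, "debt_ratio": 0.30}

for sentence in describe_changes(x, x_cf, labels):
    print(sentence)
```

Keeping labels and formatters in one mapping means the wording can be reviewed by non-engineers without touching the counterfactual search itself.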