
Counterfactual Explanations in XAI: Python Guide

A comprehensive, story-driven tutorial on counterfactual explanations — the XAI technique that answers "what would I need to change to get a different outcome?" Covers the theory, four major algorithms (Wachter, DICE, FACE, MACE), Python implementation from scratch, the DICE library, evaluation metrics, real-world applications in finance, healthcare and hiring, and a full production pipeline — with animated diagrams throughout.

Section 01

The Story That Explains Counterfactual Explanations

The Rejected Loan Application
Maria applied for a £20,000 home-improvement loan. Three days later, she received a terse email: "Your application has been declined."

No reason. No path forward. Just a closed door.

Maria asked the bank's AI system: "Why was I rejected?" The system replied with a SHAP waterfall chart — a stack of bars showing that her low income was –0.42, her credit utilisation was –0.31, her short employment history was –0.18…

Maria stared at it. She understood nothing.

Then she asked a different question: "What would I need to change to get approved?"

The system replied: "If your annual income were £32,000 instead of £26,500, and your credit utilisation were below 35% instead of 52%, your application would be approved with 87% probability."

That she could act on. That was a counterfactual explanation.

A counterfactual explanation answers the question: "What is the smallest change to the input that would flip the model's output?" It is the XAI technique closest to how humans naturally think about decisions — not "why did this happen?" but "what would I need to do differently?"

💡
The Counterfactual Principle

Given an input x that received a prediction ŷ, a counterfactual x′ is a perturbed version of x such that the model predicts a different (desired) outcome ŷ′, while keeping x′ as close as possible to the original x. The difference x′ − x is the actionable recourse.
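As a minimal sketch of the principle, the recourse is simply the element-wise difference between the counterfactual and the original input. The numbers come from Maria's story; the helper name `recourse` and the two-feature layout are assumptions made only for this illustration.

```python
import numpy as np

feature_names = ["income", "credit_utilisation"]
x = np.array([26500.0, 0.52])       # original input (rejected)
x_cf = np.array([32000.0, 0.35])    # counterfactual from the story (approved)

def recourse(x, x_cf, names, tol=1e-9):
    """Return {feature: delta} for every feature that actually changes."""
    delta = x_cf - x
    return {n: float(d) for n, d in zip(names, delta) if abs(d) > tol}

changes = recourse(x, x_cf, feature_names)
for name, d in changes.items():
    print(f"{name}: {d:+.2f}")
# income: +5500.00
# credit_utilisation: -0.17
```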


Section 02

Where Counterfactuals Fit in the XAI Landscape

Explainable AI (XAI) techniques are broadly divided by scope (global vs local) and approach (feature attribution vs example-based vs rule-based). Counterfactual explanations are local, example-based methods — they explain a single prediction by generating a contrast example.

| XAI Family | Examples | Answers | Actionable? |
|---|---|---|---|
| Feature Attribution | SHAP, LIME, Integrated Gradients | "Which features mattered most?" | Partially |
| Rule Extraction | Anchors, Decision Rules | "What conditions lock in this prediction?" | Partially |
| Counterfactuals | DiCE, Wachter, FACE, Growing Spheres | "What do I change to get a different outcome?" | Yes, directly |
| Prototype / Criticism | MMD-critic, ProtoDash | "What does a typical example look like?" | Rarely |
| Saliency Maps | Grad-CAM, Grad×Input | "Which pixels / tokens caused this?" | No |
🔑
The Unique Selling Point

Unlike SHAP or LIME which tell you what the model focused on, counterfactuals tell you what you can do. This is the difference between a post-mortem and a road map. In regulated industries — finance, healthcare, hiring — the right to actionable recourse is increasingly a legal requirement (GDPR Article 22, EU AI Act).


Section 03

The Four Properties of a Good Counterfactual

Not all counterfactuals are useful. A counterfactual that says "if you were 20 years younger and had a PhD, you'd be approved" is technically valid but practically useless. Good counterfactuals satisfy four properties.

🎯
Validity
Prediction Flip
The counterfactual x′ must actually achieve the desired output ŷ′ from the model. An invalid counterfactual that fails to flip the prediction is useless.
🪶
Proximity
Minimal Change
The distance between x and x′ must be minimised. Changing 1 feature is better than changing 10. Measured via L0, L1, or L2 norms in feature space.
⚙️
Actionability
Feasible Change
Only mutable, causally plausible features should be changed. Age cannot be decreased. Immutable features (race, birth country) must be locked. Past events cannot be reversed.
🌍
Plausibility
Data Manifold
The counterfactual must lie on the data manifold — it should look like a realistic data point, not a bizarre combination of features that never occurs in the real world.
⚠️
The Rashomon Effect — Multiple Valid Counterfactuals

There is rarely a single correct counterfactual. Many equally minimal changes can flip a decision. Generating a diverse set of counterfactuals (DICE) lets users choose the path that fits their own constraints and preferences — one person can reduce their debt, another can increase their salary. Offering only one counterfactual is paternalistic; offering too many is overwhelming. A set of 3–5 is usually optimal.


Section 04

The Mathematical Formulation

Counterfactual search is formulated as a constrained optimisation problem. The original formulation by Wachter et al. (2017) is:

📐 Wachter et al. Objective Function
Minimise
λ · (f(x′) − y′)² + d(x, x′) — where λ balances prediction accuracy vs feature proximity
f(x′)
The model's predicted probability for input x′
y′
The desired outcome (e.g. y′ = 1 for "loan approved")
d(x, x′)
Distance metric — typically L1 / MAD-normalised to handle feature scale differences
λ
Regularisation weight — higher λ prioritises flipping the prediction over minimising changes
L1 Distance (Sparsity)
d(x, x′) = Σ |xᵢ − x′ᵢ| / MADᵢ
Encourages few features to change. MAD normalisation makes features comparable across different scales.
L2 Distance (Smoothness)
d(x, x′) = √Σ (xᵢ − x′ᵢ)²
Distributes changes across many features smoothly. More stable but less sparse. Common in gradient-based methods.
L0 Norm (Sparsest)
d(x, x′) = |{i : xᵢ ≠ x′ᵢ}|
Counts the number of changed features directly. Non-differentiable — requires combinatorial or relaxation approaches.
FACE Geodesic Distance
d(x, x′) = shortest path through data density
Travels only through high-density regions. Guarantees plausibility — counterfactuals stay on the data manifold.
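The distance choices above can be compared on a toy example. This is a sketch under stated assumptions: the training matrix `X_toy` is made up, the function names are hypothetical, and the FACE geodesic distance is left out because it needs a graph over the training data.

```python
import numpy as np

def mad_per_feature(X):
    """Median absolute deviation of each column (the MAD_i in the L1 formula)."""
    med = np.median(X, axis=0)
    return np.median(np.abs(X - med), axis=0)

def l1_mad(x, x_cf, mad, eps=1e-8):
    """Sum of absolute changes, each divided by that feature's MAD."""
    return float(np.sum(np.abs(x - x_cf) / (mad + eps)))

def l2(x, x_cf):
    """Plain Euclidean distance (no scaling)."""
    return float(np.sqrt(np.sum((x - x_cf) ** 2)))

def l0(x, x_cf, tol=1e-9):
    """Number of features that differ."""
    return int(np.sum(np.abs(x - x_cf) > tol))

def wachter_loss(p_cf, y_target, distance, lam):
    """lam * (f(x') - y')^2 + d(x, x') for an already-computed probability."""
    return lam * (p_cf - y_target) ** 2 + distance

# Toy training matrix: columns are income and debt ratio (made-up values)
X_toy = np.array([[20000, 0.2], [30000, 0.4], [40000, 0.6],
                  [50000, 0.8], [60000, 1.0]])
mad = mad_per_feature(X_toy)        # roughly [10000, 0.2]

x = np.array([26500.0, 0.52])
x_cf = np.array([32000.0, 0.52])    # only income changes

d1 = l1_mad(x, x_cf, mad)           # 5500 / 10000, about 0.55
d2 = l2(x, x_cf)                    # 5500.0, dominated by the income scale
d0 = l0(x, x_cf)                    # 1 feature changed
loss = wachter_loss(p_cf=0.8, y_target=1.0, distance=d1, lam=10.0)
```

The unscaled L2 value is driven entirely by the currency units of income, which is why the MAD-normalised L1 form is the usual choice for mixed-scale tabular data.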

Section 05

Animated Diagram — How Counterfactual Search Works

The animation below shows a binary classifier's decision boundary in 2D feature space. Watch how the search algorithm nudges the original point (red) across the boundary to find the nearest valid counterfactual (green).

🎬 Counterfactual Search — Gradient Descent on Decision Boundary
The red point is the original input (denied loan). The gradient descent path (dashed white) crosses the decision boundary (purple). The green star is the found counterfactual — the minimal change needed for approval.

Section 06

Major Counterfactual Algorithms Compared

The field has produced several distinct algorithms, each with different trade-offs between plausibility, diversity, and computational cost.

🔎
Wachter et al. (2017)
wachter
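The original. Gradient-based search minimising a loss that combines prediction error and L1 proximity. It needs a differentiable model, or numerical gradients for a black-box model whose output varies smoothly; tree ensembles have piecewise-constant outputs and need a gradient-free optimiser on the same objective. Fast but ignores plausibility, so it may produce unrealistic inputs.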
✓ Simple, fast, model-agnostic
✗ No diversity, may be implausible
🎲
DiCE (2020)
DiCE · Diverse CF Explanations
Generates k diverse counterfactuals simultaneously by adding a diversity regulariser that maximises pairwise distance between generated counterfactuals. Supports actionability constraints.
✓ Diversity, actionability, sklearn/TF/PyTorch
✗ Slower, may still leave manifold
🌐
FACE (2020)
Feasible Actionable CF Explanations
Builds a graph on the training data and finds the shortest path to a positively-labelled point through high-density regions. Guarantees plausibility by construction — every step is on the data manifold.
✓ Plausible, realistic paths
✗ Computationally expensive, needs training data
🔮
GrowingSpheres (2018)
Growing Spheres Algorithm
Starts with a hypersphere around x and expands it until a point with the desired class is found. Then performs a feature selection step to minimise the number of changed features.
✓ Truly model-agnostic, no gradients needed
✗ Slow in high dimensions
🧬
MACE (2022)
Model-Agnostic CF Explanations
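Respects causal relationships between features: changing income also adjusts credit score downstream as it would in reality. Strictly, this describes causal algorithmic recourse (Karimi et al.), which searches over interventions on a structural causal model; the MACE paper by the same group finds nearest counterfactuals with a satisfiability solver and does not itself require a causal graph. This guide uses MACE as shorthand for the causal variant, which is the strongest option when causal plausibility matters.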
✓ Causally consistent, most realistic
✗ Requires a causal graph (hard to obtain)
🧩
CFRL (2022)
CF via Reinforcement Learning
Trains a RL agent whose reward is flipping the model prediction while minimising change. Naturally discovers sequential actionable steps — useful for multi-step planning (e.g. career moves over 3 years).
✓ Sequential actions, multi-step recourse
✗ Complex to train, not off-the-shelf
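To make one of the gradient-free options above concrete, here is a simplified sketch of the Growing Spheres idea: sample candidates in ever larger shells around x until one lands in the desired class, then undo changes that are not needed. The toy classifier, the function names and the uniform radius sampling are all assumptions for illustration, not the published implementation.

```python
import numpy as np

def sparsify(predict, x, x_cf, target):
    """Reset features to their original value, smallest change first,
    as long as the prediction stays in the target class."""
    x_cf = x_cf.copy()
    for i in np.argsort(np.abs(x_cf - x)):
        trial = x_cf.copy()
        trial[i] = x[i]
        if predict(trial[None, :])[0] == target:
            x_cf = trial
    return x_cf

def growing_spheres(predict, x, target, step=0.1, n_samples=500,
                    max_radius=10.0, seed=0):
    """Search shells [lo, lo + step) around x for the closest target-class point."""
    rng = np.random.default_rng(seed)
    lo = 0.0
    while lo < max_radius:
        hi = lo + step
        # Random unit directions times random radii inside the current shell
        dirs = rng.normal(size=(n_samples, x.shape[0]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        cand = x + dirs * rng.uniform(lo, hi, size=(n_samples, 1))
        hits = cand[predict(cand) == target]
        if len(hits):
            best = hits[np.argmin(np.linalg.norm(hits - x, axis=1))]
            return sparsify(predict, x, best, target)
        lo = hi
    return None   # nothing found within max_radius

# Toy black-box model: class 1 when the two (already scaled) features sum above 1
toy_predict = lambda X: (X[:, 0] + X[:, 1] > 1.0).astype(int)

x = np.array([0.2, 0.3])                      # predicted class 0
cf = growing_spheres(toy_predict, x, target=1)
cf_distance = float(np.linalg.norm(cf - x))   # a little above the boundary distance
```

Only `predict` is called, so the same loop works for any model, but the number of samples needed grows quickly with the number of features.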

Section 07

Animated Diagram — DICE Diversity vs Single CF

This animation contrasts finding a single counterfactual (Wachter-style) versus a diverse set (DICE-style). Each path represents a different recourse strategy.

🎬 Single CF vs DICE Diverse CFs — Three Paths to Approval
Legend: Original (Rejected) · CF-1: Raise income · CF-2: Reduce debt + income · CF-3: Increase employment tenure
DICE generates all three paths simultaneously, maximising pairwise distance between counterfactuals so users receive genuinely different options.
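The effect of a diversity objective can be imitated with a much simpler stand-in: given a pool of already-valid candidates, greedily keep the ones that are far apart. This farthest-point selection is not how DiCE optimises internally; `pick_diverse` and the candidate pool (in scaled units, with columns for income, debt and tenure) are hypothetical.

```python
import numpy as np

def pick_diverse(candidates, x, k=3):
    """Greedy max-min selection over L1 distance.

    Start with the candidate closest to x, then repeatedly add the candidate
    whose distance to its nearest already-chosen neighbour is largest.
    """
    cand = np.asarray(candidates, dtype=float)
    chosen = [int(np.argmin(np.abs(cand - x).sum(axis=1)))]
    while len(chosen) < min(k, len(cand)):
        # Distance from every candidate to every chosen one: shape (n, len(chosen))
        d = np.abs(cand[:, None, :] - cand[chosen][None, :, :]).sum(axis=2)
        min_d = d.min(axis=1)
        min_d[chosen] = -np.inf          # never pick the same candidate twice
        chosen.append(int(np.argmax(min_d)))
    return cand[chosen]

x = np.zeros(3)                          # original input, after scaling
pool = [
    [1.0, 0.0, 0.0],                     # raise income
    [1.1, 0.0, 0.0],                     # raise income slightly more (near duplicate)
    [0.0, 1.2, 0.0],                     # reduce debt
    [0.0, 0.0, 1.5],                     # increase tenure
    [0.9, 0.1, 0.0],                     # mostly income again
]
picked = pick_diverse(pool, x, k=3)      # one option per distinct strategy
```

The near-duplicate income options are skipped, so the three returned rows correspond to three different recourse strategies.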

Section 08

Python Implementation — From Scratch (Wachter Method)

Let us build a minimal counterfactual explainer from scratch to understand what libraries like DICE do under the hood. We will use a loan approval dataset and a trained Random Forest classifier.

📦 Setup — Libraries and Dataset
Install
pip install dice-ml alibi scikit-learn pandas numpy
Dataset
Synthetic loan approval dataset — 5 features: Income, Credit Score, Debt Ratio, Employment Years, Loan Amount
Model
Random Forest Classifier — black-box, no gradient access needed
# ── Step 1: Build dataset and train model ───────────────────
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)
n = 2000

# Synthetic loan data
df = pd.DataFrame({
    'income':       np.random.normal(35000, 12000, n).clip(10000, 150000),
    'credit_score': np.random.normal(640, 80, n).clip(300, 850),
    'debt_ratio':   np.random.uniform(0.1, 0.9, n),
    'employment_yr': np.random.exponential(5, n).clip(0, 30),
    'loan_amount':  np.random.normal(18000, 8000, n).clip(1000, 60000),
})

# Label: approved if strong financials
score = (
    (df['income'] / 50000) * 0.35 +
    (df['credit_score'] / 850) * 0.30 +
    (1 - df['debt_ratio']) * 0.20 +
    (df['employment_yr'] / 30) * 0.15
)
df['approved'] = (score + np.random.normal(0, 0.08, n) > 0.48).astype(int)

X = df.drop('approved', axis=1)
y = df['approved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest (tree models need no feature scaling)
model = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
OUTPUT
Test accuracy: 0.891
# ── Step 2: Wachter-style Counterfactual from scratch ───────
from scipy.optimize import minimize
from scipy.stats import median_abs_deviation

def wachter_cf(model, x_orig, y_desired, feature_names,
               immutable=None, lam=0.5, max_iter=500, max_rounds=6):
    """
    Wachter-style counterfactual finder.
    model       : sklearn classifier with predict_proba
    x_orig      : pd.Series, the original input row
    y_desired   : int, desired class (e.g. 1 = approved)
    immutable   : list of feature names that cannot change
    lam         : starting weight on the prediction loss (higher = prioritise flipping)
    max_rounds  : how many times lam is multiplied by 10 while the CF is still invalid

    Uses the module-level X_train for the MAD scaling factors.
    """
    names = list(feature_names)
    x0 = x_orig[names].values.astype(float)
    immut_idx = {names.index(f) for f in (immutable or [])}
    free_idx = [i for i in range(len(names)) if i not in immut_idx]
    col = list(model.classes_).index(y_desired)

    # MAD normalisation factors (from training data stats)
    mad = X_train[names].apply(median_abs_deviation).values + 1e-8

    def proba_of(x):
        # Probability of the desired class for one candidate row
        return model.predict_proba(pd.DataFrame([x], columns=names))[0, col]

    def expand(z_free):
        # Rebuild a full feature vector; immutable features keep their original values
        x = x0.copy()
        x[free_idx] = z_free
        return x

    z = x0[free_idx]
    for _ in range(max_rounds):
        def objective(z_free, lam=lam):
            x_cf = expand(z_free)
            pred_loss = (proba_of(x_cf) - 1.0) ** 2        # want proba → 1
            proximity = np.sum(np.abs(x_cf - x0) / mad)    # L1, MAD-normalised
            return lam * pred_loss + proximity

        # Nelder-Mead is derivative-free. A forest's probabilities are piecewise
        # constant, so finite-difference gradients (as used by SLSQP) are zero
        # almost everywhere and a gradient-based optimiser would not leave x0.
        result = minimize(objective, x0=z, method='Nelder-Mead',
                          options={'maxiter': max_iter, 'xatol': 1e-4, 'fatol': 1e-6})
        z = result.x
        if proba_of(expand(z)) > 0.5:
            break
        lam *= 10    # raise the weight until the prediction flips, as in Wachter et al.

    x_full = expand(z)
    proba = proba_of(x_full)
    valid = proba > 0.5    # may still be False if the search budget runs out

    return pd.Series(x_full, index=names), valid, proba


# ── Apply to Maria's application ────────────────────────────
maria = pd.Series({
    'income': 26500,
    'credit_score': 580,
    'debt_ratio': 0.52,
    'employment_yr': 1.5,
    'loan_amount': 20000,
})

feature_names = list(X.columns)
cf, valid, proba = wachter_cf(model, maria, y_desired=1,
                                feature_names=feature_names,
                                immutable=[],       # all mutable
                                lam=1.0)

print(f"\nOriginal prediction: {model.predict_proba(maria.to_frame().T)[0,1]:.3f}")
print(f"CF prediction:       {proba:.3f}  (Valid: {valid})")
print("\nFeature Changes:")
print(f"{'Feature':20s} {'Original':>12s} {'Counterfactual':>16s} {'Delta':>12s}")
print("-"*65)
for feat in feature_names:
    delta = cf[feat] - maria[feat]
    if abs(delta) > 0.01 * abs(maria[feat]):
        print(f"{feat:20s} {maria[feat]:>12.1f} {cf[feat]:>16.1f} {delta:>+12.1f}")
OUTPUT
Original prediction: 0.143
CF prediction:       0.821  (Valid: True)

Feature Changes:
Feature                  Original   Counterfactual        Delta
-----------------------------------------------------------------
income                    26500.0          32180.4      +5680.4
credit_score                580.0            618.3        +38.3
debt_ratio                    0.5              0.3         -0.2
Reading the Counterfactual

Maria's probability of approval rises from 14.3% to 82.1% if she can raise her annual income by ~£5,680, slightly improve her credit score by 38 points, and reduce her debt ratio from 52% to 30%. These are real, measurable targets — not abstract feature importances.


Section 09

DICE Library — Diverse Counterfactuals in Practice

The DiCE library (originally developed at Microsoft Research) is one of the most widely used open-source counterfactual libraries. It supports scikit-learn, TensorFlow and PyTorch models.

# ── DICE: Diverse Counterfactual Explanations ───────────────
import dice_ml
from dice_ml import Dice

# 1. Wrap dataset
d = dice_ml.Data(
    dataframe=pd.concat([X_train, y_train.rename('approved')], axis=1),
    continuous_features=['income', 'credit_score', 'debt_ratio',
                         'employment_yr', 'loan_amount'],
    outcome_name='approved'
)

# 2. Wrap model
m = dice_ml.Model(model=model, backend='sklearn')

# 3. Create explainer
exp = Dice(d, m, method='random')      # method: 'random' | 'genetic' | 'kdtree'

# 4. Generate 4 diverse counterfactuals for Maria
query = maria.to_frame().T
cf_result = exp.generate_counterfactuals(
    query_instances=query,
    total_CFs=4,
    desired_class='opposite',             # flip the prediction
    permitted_range={
        'income':       [26500, 80000],    # can only increase
        'debt_ratio':  [0.1, 0.52],         # can only decrease debt
        'employment_yr': [1.5, 30],          # can only increase tenure
    },
    features_to_vary=['income', 'credit_score', 'debt_ratio', 'employment_yr']
    # loan_amount not varied — she already fixed the amount
)

cf_result.visualize_as_dataframe(show_only_changes=True)
OUTPUT — 4 DIVERSE COUNTERFACTUALS (changes only)
Query instance: approved = 0 (p=0.143)

        income   credit_score   debt_ratio   employment_yr
CF 1    32,100   615            -            -
CF 2    28,900   -              0.28         4.2
CF 3    -        660            0.25         3.5
CF 4    31,200   -              0.35         2.8

All 4 counterfactuals achieve approved: 1 (p > 0.75)
🎯
Choosing the Right DICE Method

Use 'random' for quick exploration and prototyping. Use 'genetic' for better quality in high dimensions (slower). Use 'kdtree' when you need plausibility guaranteed — it finds counterfactuals that are actual training instances, ensuring maximum realism.
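The idea behind returning actual training instances can be sketched without any library: keep only training rows that already have the desired label and rank them by scaled distance to the query. This illustrates the concept only and is not DiCE's implementation; the function name, toy data and scale vector are assumptions.

```python
import numpy as np

def nearest_training_cfs(X_train, y_train, x, target, k=2, scale=None):
    """Return the k training rows of class `target` closest to x (scaled L1)."""
    X_train = np.asarray(X_train, dtype=float)
    scale = np.ones(X_train.shape[1]) if scale is None else np.asarray(scale, dtype=float)
    pool = X_train[np.asarray(y_train) == target]
    dist = np.abs((pool - x) / scale).sum(axis=1)
    order = np.argsort(dist)[:k]
    return pool[order], dist[order]

# Toy data: columns are income and debt ratio; label 1 = approved (made-up values)
X_toy = [[20000, 0.60], [25000, 0.50], [33000, 0.30], [40000, 0.20], [31000, 0.45]]
y_toy = [0, 0, 1, 1, 1]

maria_x = np.array([26500.0, 0.52])
cfs, dists = nearest_training_cfs(X_toy, y_toy, maria_x, target=1,
                                  k=2, scale=[10000, 0.2])
# cfs[0] is the approved applicant most similar to the query
```

Because every returned row was observed in the data, plausibility comes for free; the price is that the suggested change can be larger than the true minimum.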


Section 10

Interactive — Actionability Constraints

Actionability constraints are what separate useful counterfactuals from technically valid but practically useless ones. The interactive diagram below shows how constraining mutable features changes the counterfactual space.
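One simple way to enforce such constraints is to project every candidate back onto the feasible set before scoring it. The sketch below assumes three kinds of rule (locked, increase-only, decrease-only) plus optional value ranges; the function name and the rule sets are hypothetical choices for Maria's case.

```python
import numpy as np

def project_feasible(x, x_cf, names, immutable=(), increase_only=(),
                     decrease_only=(), bounds=None):
    """Clip a raw candidate so that it respects the mutability rules."""
    out = np.array(x_cf, dtype=float)
    for i, name in enumerate(names):
        if name in immutable:
            out[i] = x[i]                       # locked: restore the original value
            continue
        if name in increase_only:
            out[i] = max(out[i], x[i])          # may not go below the original
        if name in decrease_only:
            out[i] = min(out[i], x[i])          # may not go above the original
        if bounds and name in bounds:
            lo, hi = bounds[name]
            out[i] = min(max(out[i], lo), hi)   # stay inside the allowed range
    return out

names = ["income", "credit_score", "debt_ratio", "employment_yr", "loan_amount"]
x = np.array([26500.0, 580.0, 0.52, 1.5, 20000.0])

# A raw candidate that lowers income, raises debt and shrinks the loan
raw = np.array([25000.0, 640.0, 0.60, 1.0, 15000.0])

feasible = project_feasible(
    x, raw, names,
    immutable={"loan_amount"},
    increase_only={"income", "employment_yr"},
    decrease_only={"debt_ratio"},
    bounds={"credit_score": (300, 850)},
)
# Only the credit score change survives: [26500, 640, 0.52, 1.5, 20000]
```

A projected candidate is feasible but not necessarily valid any more, so the model's prediction has to be re-checked after projection.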



Section 11

Real-World Applications

James — The Job Application
James applied for a senior data analyst role. An ATS (Applicant Tracking System) rejected his CV before a human ever saw it.

Under the EU AI Act Article 86, he was entitled to a human review AND an explanation. The counterfactual explanation said: "If your CV contained 3+ mentions of 'Python', had a quantified achievement (e.g. '↑ revenue by 15%'), and listed a degree year after 2010, your application score would move from 47 to 74 (threshold: 60)."

James updated his CV in 20 minutes. He got the job.

| Industry | Decision Being Explained | Counterfactual Question | Regulatory Driver |
|---|---|---|---|
| 🏦 Banking | Loan / credit approval | "What income/score would get me approved?" | GDPR Art. 22, EBA Guidelines |
| 🏥 Healthcare | Disease risk prediction | "What lifestyle changes lower my risk?" | FDA AI/ML Action Plan |
| 💼 HR / Hiring | CV screening, promotion | "What skills/experience would flip the decision?" | EU AI Act Art. 86 |
| 🎓 Education | Admissions, grade prediction | "What grades/activities improve my chances?" | Ethical best practice |
| 🔒 Insurance | Premium calculation, claim denial | "What reduces my premium?" | FCA Consumer Duty |
| ⚖️ Criminal Justice | Recidivism risk (COMPAS-style) | "What factors would lower my risk score?" | Highly contested; ethical caution required |

Section 12

Animated Diagram — Proximity vs Plausibility Trade-off

A fundamental tension in counterfactual generation: the nearest point that crosses the decision boundary may be off the data manifold — a combination of feature values that never occurs in reality. The animation below shows this trade-off.

⚖️ Proximity vs Plausibility — Short Path vs Manifold Path
Direct (Wachter): shortest Euclidean path, may exit the manifold
Manifold (FACE): follows data density, longer but realistic
The grey cloud represents the actual data distribution. The red path cuts through a low-density void (implausible). The green path stays within the cloud (plausible).
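The manifold-following path can be sketched as a shortest-path search over a neighbourhood graph of the training points. This is a simplified, FACE-style illustration: edges use plain Euclidean length with an epsilon cut-off rather than a density-weighted cost, and the arc-shaped toy data and the name `face_path` are assumptions.

```python
import heapq
import numpy as np

def face_path(X, y, start, target, eps):
    """Dijkstra from node `start` to the nearest node labelled `target`,
    moving only between points closer than `eps` to each other."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    best = np.full(len(X), np.inf)
    best[start] = 0.0
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue                       # stale queue entry
        if y[u] == target and u != start:
            path = [u]
            while path[-1] != start:       # walk back to the start node
                path.append(prev[path[-1]])
            return path[::-1], cost
        for v in np.flatnonzero((dist[u] <= eps) & (dist[u] > 0)):
            new_cost = cost + dist[u, v]
            if new_cost < best[v]:
                best[v] = new_cost
                prev[int(v)] = u
                heapq.heappush(heap, (new_cost, int(v)))
    return None, np.inf                    # target class not reachable

# Toy data: nine points on a half circle, with an empty region in the middle
theta = np.linspace(np.pi, 0.0, 9)
X_arc = np.column_stack([1 + np.cos(theta), np.sin(theta)])
y_arc = np.array([0] * 7 + [1, 1])         # the last two points are "approved"

path, path_cost = face_path(X_arc, y_arc, start=0, target=1, eps=0.5)
straight = float(np.linalg.norm(X_arc[path[-1]] - X_arc[0]))
# The path walks around the arc, so path_cost exceeds the straight-line distance
```

Every intermediate node is a real data point, which is what keeps the route plausible; the extra length compared with the straight line is the cost of staying on the manifold.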

Section 13

How to Evaluate Counterfactual Explanations

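Generating counterfactuals is only half the challenge. Evaluating their quality requires a multi-dimensional scorecard. No single metric captures everything: the table lists six, and the helper that follows computes the first five (actionability is checked directly against your constraint list).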

| Metric | What It Measures | Formula / Test | Ideal Value |
|---|---|---|---|
| Validity Rate | % of CFs that actually achieve desired class | mean(f(x′) = y′) | 1.0 (100%) |
| Proximity | Average feature distance to original | mean L1/MAD(x, x′) | Minimise |
| Sparsity | Average # of features changed | mean L0(x, x′) | 1–3 features ideal |
| Diversity | Pairwise distance between multiple CFs | mean d(x′ᵢ, x′ⱼ) for i≠j | Maximise |
| Plausibility (IM1) | Distance to nearest training point (a simple proxy; IM1 as defined in the literature compares autoencoder reconstruction errors) | min_z∈X_train d(x′, z) | Minimise |
| Actionability | % of changes that respect constraints | All immutable features unchanged? | 1.0 (100%) |
# ── Evaluation helper ───────────────────────────────────────
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
from scipy.stats import median_abs_deviation

def evaluate_cfs(x_orig, cfs_df, model, X_train, y_desired=1):
    """Compute 5 CF quality metrics (actionability is checked separately).

    cfs_df needs the same feature columns as X_train; extra columns
    such as the outcome are ignored.
    """
    cols = list(X_train.columns)
    x0 = x_orig[cols].values.astype(float)
    cf_vals = cfs_df[cols].values.astype(float)
    mad = X_train.apply(median_abs_deviation).values + 1e-8

    # 1. Validity (pass a DataFrame so sklearn sees the training feature names)
    preds = model.predict(pd.DataFrame(cf_vals, columns=cols))
    validity = np.mean(preds == y_desired)

    # MAD-scaled changes, so a large-valued feature such as income does not dominate
    scaled_delta = np.abs(cf_vals - x0) / mad

    # 2. Proximity (L1 MAD-normalised)
    proximity = scaled_delta.sum(axis=1).mean()

    # 3. Sparsity (features changed by more than 1% of their MAD)
    sparsity = (scaled_delta > 0.01).sum(axis=1).mean()

    # 4. Diversity (mean pairwise L1 distance between CFs, MAD-scaled)
    if len(cf_vals) > 1:
        pw = cdist(cf_vals / mad, cf_vals / mad, metric='cityblock')
        diversity = pw[np.triu_indices_from(pw, k=1)].mean()
    else:
        diversity = 0.0

    # 5. Plausibility: MAD-scaled L1 distance to the nearest training point
    dists = cdist(cf_vals / mad, X_train.values / mad, metric='cityblock')
    plausibility = dists.min(axis=1).mean()

    return {
        'validity':    round(float(validity), 3),
        'proximity':   round(float(proximity), 3),
        'sparsity':    round(float(sparsity), 2),
        'diversity':   round(float(diversity), 3),
        'plausibility': round(float(plausibility), 3),
    }
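The plausibility number is easier to act on once it is turned into a pass/fail check. One possible rule of thumb, sketched here, is to flag a counterfactual when its distance to the nearest training point exceeds a multiple of the typical nearest-neighbour distance inside the training set. The factor of 2, the function names and the toy data are assumptions, and the features are taken to be already scaled.

```python
import numpy as np

def nn_distance_threshold(X_train, factor=2.0):
    """factor × median L1 distance from each training point to its nearest neighbour."""
    X = np.asarray(X_train, dtype=float)
    d = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)   # O(n²) memory
    np.fill_diagonal(d, np.inf)                             # ignore self-distances
    return factor * float(np.median(d.min(axis=1)))

def off_manifold(cfs, X_train, threshold):
    """True for every counterfactual farther than `threshold` from all training points."""
    cfs = np.asarray(cfs, dtype=float)
    X = np.asarray(X_train, dtype=float)
    d = np.abs(cfs[:, None, :] - X[None, :, :]).sum(axis=2)
    return d.min(axis=1) > threshold

# Toy training set lying on a diagonal line
X_toy = [[0, 0], [1, 1], [2, 2], [3, 3]]
threshold = nn_distance_threshold(X_toy)        # 2 × 2.0 = 4.0

cfs = [[1.5, 1.5],     # sits between two training points
       [3.0, -3.0]]    # far from the line
flags = off_manifold(cfs, X_toy, threshold)     # [False, True]
```

For large training sets the full pairwise matrix is too expensive, and a nearest-neighbour index would be the natural replacement.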

Section 14

Common Pitfalls and How to Avoid Them

⚠️ Counterfactual Anti-Patterns — What NOT to Do
1
Ignoring actionability constraints. Generating CFs that change age, race, sex, or past history is not only useless but potentially discriminatory. Always fix the list of immutable features before running any CF method, for example through features_to_vary in DiCE or the immutable argument of the from-scratch function.
2
Treating the CF as the model's "reason". A counterfactual shows a path to change, not the cause of the decision. Conflating the two misleads users. "Your income was the reason" is attribution; "raise income to get approved" is recourse. These are different.
3
Presenting a single CF as definitive. Always generate a diverse set. A single CF reflects an optimisation choice, not ground truth. Users deserve to choose their own path.
4
Ignoring causal structure. A CF might say "increase credit score by 50 points" without noting that paying down debt also raises income-to-debt ratio — a cascade effect. For consequential decisions, use causally-informed methods (MACE).
5
Generating CFs without plausibility checks. A CF that says "credit score: 780, debt ratio: 0.95" is simultaneously high creditworthiness and high debt — a combination that barely exists in real data. Always measure IM1 (nearest training point distance) to catch off-manifold CFs.
6
Using CFs for model gaming. Sharing counterfactuals with applicants who then game the model (without genuinely improving their situation) is an ethical risk. Ensure the CF corresponds to genuine real-world improvement, not just manipulating the model's input space.

Section 15

Counterfactuals vs SHAP — When to Use Each

📊 SHAP — Feature Attribution
| Feature | SHAP Value |
|---|---|
| Income | −0.42 |
| Credit Score | −0.31 |
| Debt Ratio | −0.22 |
| Employment | −0.18 |
| Loan Amt | +0.04 |

SHAP tells Maria why she was rejected — which features drove the model's output downward. Useful for debugging and auditing.

🔄 Counterfactual — Recourse
| Feature | Change Needed |
|---|---|
| Income | £26.5k → £32.2k |
| Credit Score | 580 → 618 |
| Debt Ratio | 52% → 30% |
| Employment | No change needed |
| Loan Amt | No change needed |

The CF tells Maria what to do — concrete, measurable targets. Useful for user recourse and regulatory compliance.

| Criterion | SHAP / LIME | Counterfactuals |
|---|---|---|
| What it answers | "Why this prediction?" | "How to change the prediction?" |
| Output type | Feature importance scores | Modified input instance |
| Actionability | Indirect; requires interpretation | Direct; concrete targets |
| Best audience | Data scientists, auditors | End users, regulated subjects |
| Regulatory fit | Partial (transparency) | Strong (right to recourse) |
| Computation | Fast for tree models (TreeSHAP: O(TLD²) for T trees, L leaves, depth D) | Slower (optimisation loop) |
| Scope | Local or Global | Local only |
🏆
Best Practice: Use Both Together

SHAP and counterfactuals are complementary, not competing. In a production XAI system: (1) use SHAP to build developer intuition about the model, detect bias, and audit feature importance; (2) surface counterfactuals to end users who need actionable recourse. Where both audiences exist, a user-facing interface can reasonably offer both explanation types.


Section 16

Complete XAI Pipeline with Counterfactuals

01
Train & Validate Your Model
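Train any classifier. Counterfactual search is model-agnostic in principle: gradient-based methods suit differentiable models such as neural networks, while tree ensembles such as Random Forests and XGBoost need gradient-free or sampling-based search because their outputs are piecewise constant. Validate on a holdout set first; CF metrics such as validity and proximity are measured against the model, not the ground truth, so they say nothing about whether the model itself is accurate.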
02
Define Feature Mutability and Constraints
Decide which features are immutable (age, sex, nationality), which are mutable but bounded (income can only increase, debt ratio can only decrease), and which have causal dependencies. This is a domain decision, not a machine learning one, so consult domain experts.
03
Choose Your CF Method
For quick integration: use DiCE (random). For maximum plausibility: use DiCE (kdtree) or FACE. For causal consistency: MACE. For sequential recourse: CFRL. For a from-scratch build: the Wachter objective with a SciPy optimiser.
04
Generate Diverse CF Set (k = 3–5)
Never present a single counterfactual. Generate 3–5 valid, diverse options and let the user choose the path that best fits their situation. Diversity is a core quality property — enforce it explicitly via DICE's diversity regulariser.
05
Evaluate Quality (5-Metric Scorecard)
Compute validity, proximity, sparsity, diversity, and plausibility (IM1) for every CF batch. Set minimum thresholds: validity = 1.0, sparsity ≤ 3, plausibility distance ≤ 2× median nearest-neighbour distance of training data.
06
Present to User — Plain Language
Translate the CF into natural language: "If your income were £32,000 (+£5,680) and your credit utilisation were 30% (22 percentage points lower), your application would be approved." Never show raw feature vectors or technical scores to end users. Include confidence level and a clear disclaimer that the model is a tool, not a human decision.
07
Monitor for Gaming and Model Drift
Track whether users are genuinely improving their situations versus gaming the input space. Monitor CF validity rates as the model drifts — a CF valid today may be invalid after retraining. Set up alerts when CF quality metrics degrade below threshold.
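The plain-language step of the pipeline above can be sketched as a small template function. The wording, the label and formatter dictionaries and the name `describe_recourse` are illustrative assumptions; a real presentation layer would also add the confidence level and disclaimer.

```python
def describe_recourse(original, counterfactual, labels, formats):
    """Turn changed features into one plain-language sentence."""
    parts = []
    for key, label in labels.items():
        old, new = original[key], counterfactual[key]
        if old == new:
            continue                      # unchanged features are not mentioned
        fmt = formats[key]
        parts.append(f"your {label} were {fmt(new)} instead of {fmt(old)}")
    if not parts:
        return "No changes are needed."
    return "If " + " and ".join(parts) + ", your application would be approved."

labels = {"income": "annual income", "debt_ratio": "debt ratio",
          "loan_amount": "loan amount"}
formats = {
    "income": lambda v: f"£{v:,.0f}",
    "debt_ratio": lambda v: f"{v:.0%}",
    "loan_amount": lambda v: f"£{v:,.0f}",
}

maria_orig = {"income": 26500, "debt_ratio": 0.52, "loan_amount": 20000}
maria_cf = {"income": 32000, "debt_ratio": 0.30, "loan_amount": 20000}

message = describe_recourse(maria_orig, maria_cf, labels, formats)
print(message)
# If your annual income were £32,000 instead of £26,500 and your debt ratio
# were 30% instead of 52%, your application would be approved.
```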

Section 17

Golden Rules

🌿 Counterfactual Explanations — Non-Negotiable Rules
1
Always define immutable features explicitly before generating any counterfactuals. Changing protected attributes (age, sex, race, disability status) is both practically meaningless and legally dangerous. This is not optional.
2
Generate diverse counterfactuals, never just one. Use k ≥ 3. One CF encodes a single arbitrary optimisation path. Three or more CFs give users genuine agency over their recourse strategy.
3
Always check plausibility (IM1). If your CF lies far from any training point, it is an artefact of the model's extrapolation behaviour, not a real achievable target. Reject and regenerate with stricter constraints.
4
Counterfactuals ≠ feature importance. A CF is recourse advice, not a causal explanation. Never present a CF as "the reason the model decided X." It shows a path to change, not the model's reasoning.
5
Communicate CFs in plain language to end users. Delta values (+£5,680 income, debt ratio down 22 percentage points) are actionable. Raw feature vectors are not. Translate automatically in your presentation layer before surfacing to non-technical users.
6
Re-validate CFs after every model update. A model retrain shifts the decision boundary. CFs that were valid last month may be invalid today. Build CF validity monitoring into your MLOps pipeline alongside standard model performance monitoring.
7
In high-stakes domains (healthcare, criminal justice), use causal CFs. Standard counterfactuals ignore causal structure. In clinical or legal settings, a CF that ignores causal dependencies can recommend impossible or harmful action sequences. Use MACE or a structural causal model.
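The re-validation rule above amounts to re-scoring every stored counterfactual against the new model. A minimal sketch follows, with two threshold functions standing in for the model before and after retraining; `revalidate` and the stored values are hypothetical.

```python
import numpy as np

def revalidate(stored_cfs, predict_new, target=1):
    """Return the share of stored CFs still valid under the new model, plus a mask."""
    cfs = np.asarray(stored_cfs, dtype=float)
    still_valid = predict_new(cfs) == target
    return float(still_valid.mean()), still_valid

# Stand-ins for the model before and after retraining: the approval
# threshold on income moves from 30,000 to 32,000
old_model = lambda X: (X[:, 0] >= 30000).astype(int)
new_model = lambda X: (X[:, 0] >= 32000).astype(int)

# Counterfactual incomes previously issued to three applicants
stored = np.array([[30500.0], [32180.0], [35000.0]])

rate_before, _ = revalidate(stored, old_model)     # 1.0: all valid when issued
rate_after, mask = revalidate(stored, new_model)   # 2 of 3 survive the retrain
```

A drop in this rate after a retrain is the signal to regenerate and re-communicate the affected counterfactuals.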