Foundations of Explainable AI (XAI)

Section 01

What Is Explainable AI — and Why Does It Matter?

📖 Real-World Story

The Mortgage That No One Could Explain

Maria has held a stable job for eleven years, earns a comfortable salary, and has never missed a single payment in her life. She applies for a home mortgage. Within four seconds, an algorithm rejects her application. No reason. No breakdown. Just: "Application Declined."

She calls the bank. The manager apologises: "The system says no, and I'm not authorised to override it — I honestly don't know why it made that call." Maria's entire financial future now depends on a decision that no human in the building can account for.

Scenarios like this play out every day in hiring, medical diagnosis, parole hearings, insurance pricing, and university admissions. Powerful AI systems make life-altering decisions while remaining opaque to the people they affect. Explainable AI exists to address exactly this.

Explainable AI (XAI) is the collection of methods, tools, processes, and design principles that make the predictions and behaviours of artificial intelligence systems understandable to humans — not just to the engineers who built them, but to the doctors, judges, loan officers, and everyday users who depend on them.

XAI asks a deceptively simple question: "Why did the model output this specific result?" The difficulty — and the entire research field — flows from the fact that the answer is often buried inside millions of learned parameters with no obvious translation to human language.

🧠

The Core Goal of XAI

Explainable AI does not simply aim to make models accurate. It aims to make them trustworthy, auditable, correctable, and legally defensible. A model you can explain is a model you can debug, challenge, improve, and responsibly put into the hands of real people.

Why Does It Matter Right Now?

XAI sits at the intersection of several urgent pressures that have converged in the 2020s:

⚖️

Legal & Regulatory Pressure

GDPR · EU AI Act · CCPA

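GDPR Article 22 restricts decisions based solely on automated processing when they significantly affect individuals. Together with Articles 13 to 15 and Recital 71, it is widely read as giving people a right to meaningful information about the logic involved. The EU AI Act classifies many AI systems as "high risk" and imposes transparency obligations on them. Fines can reach 4% of global annual turnover under GDPR, and up to 7% under the AI Act for the most serious violations.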

🔬

Scientific Discovery

Medicine · Biology · Physics

In drug discovery, genomics, and materials science, understanding why a model predicts an outcome is often more valuable than the prediction itself. XAI lets humans extract scientific knowledge from learned model behaviour.

🛡️

Safety & Error Correction

Debugging · Bias Detection

Models can learn spurious patterns and still achieve high accuracy. Without explanation, these failures are invisible until they cause real harm. XAI is the mechanism that lets us find and fix what went wrong.

⚠️

The Clever Hans Warning — A Famous Cautionary Tale

Clever Hans was a horse in early 1900s Germany that appeared to solve arithmetic problems by tapping his hoof. When the questioner did not know the answer, or stood out of sight, his accuracy collapsed: he had been reading subtle cues in the questioner's posture. In 2018, researchers showed that a chest X-ray pneumonia model that performed well on data from its home hospitals was relying partly on cues that revealed which hospital and scanner an image came from, not only on signs of disease. Without XAI, that shortcut would likely have gone unnoticed. High accuracy can hide dangerous shortcuts.
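
The shortcut failure described above can be reproduced in miniature. The sketch below is a hypothetical toy, not a reconstruction of any real study: the `lesion_score` and `scanner_id` features, the data generator, and the `fit_stump` helper are all invented for illustration. It trains the simplest possible "model" (pick the single feature that best matches the label) on data where the scanner happens to match the diagnosis, then evaluates it where that coincidence no longer holds.

```python
# Toy illustration of shortcut learning. All data and names are hypothetical.
# Each record is (lesion_score, scanner_id, has_disease).

def make_records(n, shortcut_holds):
    """Build a small deterministic dataset.

    lesion_score is a genuine but noisy signal (it matches the label 80% of the time).
    scanner_id matches the label perfectly when shortcut_holds is True
    and is unrelated to the label otherwise.
    """
    records = []
    for i in range(n):
        label = i % 2
        lesion = label if i % 5 != 0 else 1 - label   # every 5th record is noisy
        scanner = label if shortcut_holds else (i // 2) % 2
        records.append((lesion, scanner, label))
    return records

def accuracy(records, feature_index):
    """Accuracy of the one-feature rule 'predict label = feature value'."""
    hits = sum(1 for r in records if r[feature_index] == r[2])
    return hits / len(records)

def fit_stump(records):
    """Pick the single feature whose value best matches the label."""
    return max((0, 1), key=lambda j: accuracy(records, j))

FEATURES = ["lesion_score", "scanner_id"]
train = make_records(100, shortcut_holds=True)
new_hospital = make_records(100, shortcut_holds=False)

chosen = fit_stump(train)
train_acc = accuracy(train, chosen)
deploy_acc = accuracy(new_hospital, chosen)

print(f"model relies on: {FEATURES[chosen]}")
print(f"accuracy on original hospital: {train_acc:.2f}")
print(f"accuracy at a new hospital:    {deploy_acc:.2f}")
```

The accuracy figure alone gives no hint that the wrong feature was chosen; only inspecting which feature the model relies on reveals the problem.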

Section 02

The "Black Box" Problem in Machine Learning

📖 Story

The Infallible Oracle That No One Understands

Imagine a hiring company deploys a neural network to screen CVs. It achieves 91% accuracy on historical hires — impressive by any metric. The HR manager trusts it completely. Two years later, an investigation reveals the model had learned to penalise applicants who listed women's sports clubs or women's university organisations on their CV — because the historical training data was biased toward male hires.

The model was never designed to discriminate. It simply found statistical patterns in data that reflected human bias. And because it was a deep neural network with 47 hidden layers, nobody could see what it was actually doing inside. That is the black box problem.

A black box model is any machine learning model whose internal reasoning is not accessible or interpretable to humans. Inputs go in. Outputs come out. What happens in between is opaque — a wall of matrix multiplications, activation functions, and learned weights that produce a number with no human-readable explanation.

⬛ Animated — The Black Box in Action

Watch how raw input features enter the model and a prediction emerges — with no explanation of the path taken.

🔍 What Makes a Model a "Black Box"?

Complexity

The model has so many parameters (sometimes billions) that no human can trace a single prediction through the chain of computations that produced it.

Non-linearity

Deep networks apply cascading non-linear transformations. Unlike a decision tree's "if income > £40k then...", the reasoning is distributed across thousands of neurons with no single decision node.

Emergent Logic

The model's internal logic was never written by a human — it emerged from gradient descent over training data. There are no rules to read; only weights to stare at.

No Audit Trail

Unlike a human expert who can reconstruct their reasoning step by step, a neural network cannot "explain itself" — it can only produce an output given an input.

Black Box vs Glass Box — A Spectrum

Not all models are equally opaque. The field defines a spectrum from fully interpretable glass box models to deeply opaque black box models.

📊 The Model Complexity Spectrum

Models at the top of the table are self-explanatory; models at the bottom require post-hoc XAI methods.

Model Type	Examples	Transparency	Accuracy Ceiling	XAI Needed?
Glass Box	Linear Regression, Decision Tree, Rule Lists	Self-explanatory	Moderate	No
Semi-Transparent	Random Forest, Shallow GBM, GAMs	Partially inspectable	High	Sometimes
Black Box	Deep Neural Networks, LLMs, Complex Ensembles	Opaque	Very High	Always

Section 03

Interpretability vs Explainability vs Transparency

📖 Analogy

Three Doctors, One Diagnosis

Three doctors each diagnose you with the same condition. But they communicate very differently:

Doctor A (Transparent): Shares their full methodology upfront. "I always follow the WHO diagnostic checklist for this condition. Here are the criteria, and here is how your results map to them."

Doctor B (Interpretable): Uses a simple decision tree in their head. "Your fever exceeds 39°C, your white blood cell count is elevated, and you have two of the four markers — that combination means bacterial infection." You can follow the logic yourself.

Doctor C (Explainable but complex): Uses a complex AI diagnostic tool. The tool itself is opaque, but she adds: "The AI flagged your case because your symptoms most closely resemble patterns from 47 similar cases in the training data where bacterial infection was confirmed."

Each approach offers a different kind of understanding. XAI studies all three — and when to use each one.

These three terms are often used interchangeably in casual speech but mean distinct things in the field of responsible AI. Getting the distinction right is foundational for choosing the correct XAI method for any given situation.

🪟

Interpretability

intrinsic · built-in · structural

A property of the model itself. An interpretable model can be understood by examining its structure directly, without any external tools. The model's mechanics are human-readable by design. A linear regression equation is interpretable — you can read the coefficients and immediately know how each feature influences the output.

✔ No extra tools needed ✔ Explanation is exact, because it is the model itself

✘ Complex models cannot be made intrinsically interpretable
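
To make "read the coefficients" concrete, here is a minimal sketch of an intrinsically interpretable scoring model. The weights, intercept, and feature names are hypothetical and assume standardised inputs; they do not come from any trained model. The point is only that each feature's effect on the log-odds is a single visible product of weight and value.

```python
import math

# Hypothetical logistic-regression coefficients for a loan model.
# Features are assumed to be standardised, so the weights are comparable.
INTERCEPT = -0.5
WEIGHTS = {
    "credit_score": 1.2,
    "debt_to_income": -0.9,
    "years_employed": 0.4,
}

def explain(applicant):
    """Return the approval probability and each feature's log-odds contribution."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

# Hypothetical applicant, expressed in standard deviations from the mean.
applicant = {"credit_score": 0.5, "debt_to_income": 1.5, "years_employed": 1.0}
probability, contributions = explain(applicant)

# Largest absolute contribution first.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:15s} {value:+.2f}")
print(f"approval probability: {probability:.3f}")
```

No separate explanation method is involved: the explanation is the model's own arithmetic.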

💬

Explainability

post-hoc · approximated · added later

The ability to provide an after-the-fact account of why a model made a specific decision. Applied to models that are not inherently interpretable. Methods like SHAP or LIME approximate the model's behaviour locally to generate human-readable explanations.

✔ Works on any model ✔ Highly flexible

✘ Approximations — not perfectly faithful to the true model

🔎

Transparency

process · system-level · operational

A higher-level property of the entire AI system, not just the model. Includes how data was collected, how the model was trained, what its limitations are, and how decisions are audited. Transparency is about organisational and operational openness, not just technical mechanics.

✔ Builds societal trust ✔ Supports regulation compliance

✘ Often requires governance processes, not just technical fixes

Formal Definitions at a Glance

Concept	Who It Targets	When It Applies	Example
Interpretability	Data Scientists, Researchers	Model design phase	Reading coefficients in a logistic regression
Explainability	Users, Regulators, Domain Experts	Inference / deployment phase	SHAP explaining a specific loan rejection
Transparency	Society, Policy Makers, Auditors	Governance / system level	Publishing model cards and data sheets

🔑

The Key Relationship to Remember

Interpretability → Explainability → Transparency is a layered relationship. Interpretability is the most precise (about model mechanics). Explainability extends coverage to opaque models using approximation. Transparency is the broadest, covering the entire AI lifecycle. You can have transparency without interpretability (open about limitations but model is a black box) and explanations without transparency (explain individual decisions but hide the full system).

Section 04

Types of Explanations: Global vs Local

📖 Analogy

The Satellite Map and the Street-Level Pin

Imagine you are trying to understand why traffic jams occur in your city. You have two tools:

Tool 1 — The Satellite Map (Global): You zoom out to see the entire city at once. You can see that 70% of all jams occur near the central train station, that rush hour is the dominant factor, and that ring roads with fewer exits consistently flow better. This gives you general rules about the city's traffic system.

Tool 2 — The Street-Level Pin (Local): You zoom into one specific jam at 8:14am on Tuesday on Bridge Street. You can see the exact lorry that blocked the junction, the traffic lights that failed, and the school run that added 200 extra cars in a 10-minute window. This explains that specific event.

Global explanations tell you how the model works overall. Local explanations tell you why the model made this one specific decision. Both are essential. Neither replaces the other.

Global Explanations — Understanding the Model Overall

A global explanation characterises the model's overall behaviour across the entire dataset or input space. It answers: "What has this model generally learned to do?"

📊

Feature Importance

global · aggregate

Ranks features by how much they contribute to model predictions across the entire training set. In a loan model: Credit Score might be the #1 most important feature globally.
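
One common way to obtain such a ranking is permutation importance: shuffle one column and measure how much the model's accuracy drops. The sketch below is a minimal pure-Python version under stated assumptions: the `model` function is a hypothetical stand-in for a black box, the data are randomly generated, and the labels are simply the model's own predictions, so the baseline accuracy is 1.0 by construction.

```python
import random

FEATURES = ["credit_score", "debt_ratio", "postcode_digit"]

def model(row):
    """Hypothetical black box. It ignores postcode_digit entirely."""
    credit_score, debt_ratio, _postcode_digit = row
    return 1 if credit_score / 850 - debt_ratio > 0.45 else 0

def accuracy(rows, labels):
    """Fraction of rows where the model output matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(rows, labels, column, rng, repeats=5):
    """Mean drop in accuracy after shuffling one column, over several shuffles."""
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in the chosen column.
        permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(permuted, labels))
    return sum(drops) / repeats

rng = random.Random(0)  # fixed seed so the run is repeatable
rows = [(rng.randint(300, 850), rng.uniform(0.0, 0.6), rng.randint(0, 9))
        for _ in range(200)]
labels = [model(r) for r in rows]  # simplification: labels are the model's own outputs

importance = {name: permutation_importance(rows, labels, j, rng)
              for j, name in enumerate(FEATURES)}
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {value:.3f}")
```

A feature the model never reads gets an importance of exactly zero, which is a quick sanity check for any importance method.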

📈

Partial Dependence Plots

global · visualised

Show the marginal effect of one or two features on the predicted outcome, averaged over the observed values of all other features. Reveals non-linear relationships the model has learned (e.g. income has diminishing returns above £80k).
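
The averaging behind a partial dependence curve is simple enough to write by hand. The following sketch assumes a hypothetical two-feature `model` whose income effect saturates at £80k and a made-up four-row background sample. For each grid value it overwrites income in every row, queries the model, and averages the results.

```python
def model(income_k, debt_ratio):
    """Hypothetical black box: the benefit of income saturates at 80 (thousand)."""
    return min(income_k, 80) / 100 - 0.5 * debt_ratio

# Small background sample of (income in £k, debt ratio). Values are made up.
background = [(30, 0.2), (55, 0.4), (90, 0.1), (120, 0.3)]

def partial_dependence(grid):
    """For each grid value, force income to that value in every row and average the output."""
    curve = []
    for value in grid:
        predictions = [model(value, debt) for _income, debt in background]
        curve.append(sum(predictions) / len(predictions))
    return curve

grid = [20, 40, 60, 80, 100, 120]
curve = partial_dependence(grid)
for income, avg in zip(grid, curve):
    print(f"income £{income}k -> average prediction {avg:.3f}")
```

The curve rises up to £80k and is flat afterwards, which is exactly the kind of learned non-linearity a PDP is meant to expose. Because every row is evaluated at every grid value, the method can query unrealistic feature combinations when features are correlated.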

🌐

Global SHAP Summary

global · SHAP values

Aggregates SHAP values across all predictions to show the overall distribution of each feature's contribution. The beeswarm plot is the canonical visualisation — one of the most informative global XAI tools.

Local Explanations — Understanding One Specific Prediction

A local explanation explains why the model made a specific prediction for a specific individual. It answers: "Why did this particular applicant get rejected?"

🔎

SHAP Values (Local)

local · additive · precise

Assigns each feature a contribution value for this specific prediction. For Maria's rejection: Debt-to-income: −0.31 | Self-employed: −0.18 | Credit score: +0.12. Taken over all features, the contributions sum to the difference between the model's output for Maria and the baseline (average) output.
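
The additive property can be verified on a toy model by computing Shapley values by brute force. The sketch below is an illustration under simplifying assumptions, not how the `shap` library works internally: it uses a hypothetical three-feature `model`, compares the applicant against a single reference input instead of averaging over a background dataset, and enumerates every feature ordering, which is only feasible for a handful of features.

```python
from itertools import permutations

FEATURES = ["debt_to_income", "self_employed", "credit_score"]

# Hypothetical reference input and applicant.
baseline = {"debt_to_income": 0.30, "self_employed": 0, "credit_score": 650}
applicant = {"debt_to_income": 0.38, "self_employed": 1, "credit_score": 700}

def model(x):
    """Hypothetical scoring model with an interaction term."""
    score = 0.5 - 2.0 * (x["debt_to_income"] - 0.30) + 0.002 * (x["credit_score"] - 650)
    if x["self_employed"] and x["debt_to_income"] > 0.35:
        score -= 0.2  # penalty applies only when both conditions hold
    return score

def shapley_values(x, reference):
    """Average each feature's marginal contribution over every feature ordering."""
    totals = {f: 0.0 for f in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(reference)      # start from the reference input
        previous = model(current)
        for f in order:
            current[f] = x[f]          # switch this feature to the applicant's value
            value = model(current)
            totals[f] += value - previous
            previous = value
    return {f: totals[f] / len(orderings) for f in FEATURES}

phi = shapley_values(applicant, baseline)
for name, value in phi.items():
    print(f"{name:15s} {value:+.3f}")
print(f"sum of contributions: {sum(phi.values()):+.3f}")
print(f"model(applicant) - model(baseline): {model(applicant) - model(baseline):+.3f}")
```

The interaction penalty is split evenly between the two features that jointly trigger it, and the contributions add up to the gap between the applicant's score and the reference score.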

🧩

LIME Explanations

local · approximated · model-agnostic

Fits a simple interpretable model (linear) on perturbed samples around this specific data point. The linear approximation reveals which features drove this local decision, even for neural networks.
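
The core idea can be sketched without the `lime` package. The code below is a simplified stand-in, not the library's algorithm: it assumes a hypothetical two-feature `black_box`, perturbs the instance with Gaussian noise rather than using training-data statistics, weights samples with an exponential kernel, and fits a weighted linear model by solving the normal equations with a small hand-written solver. The names `local_surrogate`, `sigma`, and `kernel_width` are illustrative choices.

```python
import math
import random

def black_box(x1, x2):
    """Hypothetical non-linear model. The surrogate only calls it, never inspects it."""
    return x1 ** 2 - 3.0 * x2

def solve(a, b):
    """Solve the small linear system a @ w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def local_surrogate(x0, num_samples=500, sigma=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model to the black box around x0."""
    rng = random.Random(seed)
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for _ in range(num_samples):
        z = (x0[0] + rng.gauss(0, sigma), x0[1] + rng.gauss(0, sigma))
        dist2 = (z[0] - x0[0]) ** 2 + (z[1] - x0[1]) ** 2
        weight = math.exp(-dist2 / kernel_width ** 2)  # closer samples count more
        row = [1.0, z[0] - x0[0], z[1] - x0[1]]        # features centred on x0
        y = black_box(*z)
        for i in range(3):
            xty[i] += weight * row[i] * y
            for j in range(3):
                xtx[i][j] += weight * row[i] * row[j]
    return solve(xtx, xty)  # [intercept, weight for x1, weight for x2]

intercept, w1, w2 = local_surrogate((1.0, 0.0))
print(f"local effect of x1: {w1:+.2f}")
print(f"local effect of x2: {w2:+.2f}")
```

Around the point (1.0, 0.0) the surrogate's weights should land close to the true local slopes of the black box (about +2 for `x1` and −3 for `x2`). A different starting point would give different weights, which is why such explanations are local.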

🔄

Counterfactual Explanations

local · actionable · contrastive

Answer: "What is the minimum change that would flip the decision?" For Maria: "If your debt-to-income ratio were 28% instead of 38%, your application would have been approved." Maximally actionable.
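
A counterfactual search can be as simple as nudging one feature until the decision flips. The sketch below assumes a hypothetical rule-based `approve` function standing in for the black box, with made-up thresholds, and searches a single feature only. Practical methods search several features at once and add plausibility constraints.

```python
def approve(debt_to_income, credit_score):
    """Hypothetical decision rule standing in for a black-box model."""
    return credit_score >= 640 and debt_to_income <= 0.35

def nearest_counterfactual(debt_to_income, credit_score, step=0.01, max_steps=50):
    """Lower the debt-to-income ratio in small steps until the decision flips.

    Returns the first (smallest-change) ratio that is approved, or None if
    no change to this one feature is enough.
    """
    if approve(debt_to_income, credit_score):
        return debt_to_income
    for k in range(1, max_steps + 1):
        candidate = round(debt_to_income - k * step, 2)
        if candidate < 0:
            break
        if approve(candidate, credit_score):
            return candidate
    return None

# Hypothetical applicant values.
maria = {"debt_to_income": 0.38, "credit_score": 700}
target = nearest_counterfactual(**maria)
print(f"Approved if debt-to-income were {target:.0%} instead of {maria['debt_to_income']:.0%}")
```

When the search returns `None`, changing this feature alone cannot flip the outcome, which is itself useful information for the applicant.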

🎨 Animated — Global vs Local Explanation

🌐 Global — Feature Importances

📍 Local — One Prediction (SHAP)

Both diagrams are animated. Global shows overall feature importance; Local shows SHAP contributions for one specific applicant.

Comparison Table: Global vs Local

Property	Global Explanation	Local Explanation
Question Answered	How does the model behave overall?	Why did the model make this decision?
Scope	Entire dataset / model	Single prediction / data point
Primary Users	Data scientists, model developers	End users, regulators, auditors
Key Methods	Feature importance, PDP, global SHAP	SHAP, LIME, Counterfactuals, ICE plots
Use Case Example	"Credit score is the #1 driver of approvals globally"	"Maria was rejected primarily because debt-to-income > 35%"
Faithfulness Risk	Can be misleading if features are correlated	LIME approximations may not reflect true model boundary

Python Code: SHAP Global & Local Explanations

import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# ── 1. Train a black-box model ──────────────────────────────
X, y = shap.datasets.adult()         # Income prediction dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(n_estimators=200, max_depth=4, random_state=42)
model.fit(X_tr, y_tr)

# ── 2. Create a SHAP explainer ──────────────────────────────
# TreeExplainer is exact (no approximation) for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)

# ── 3. GLOBAL explanation: feature importance summary ────────
print("=== GLOBAL: mean |SHAP| across all test predictions ===")
mean_abs_shap = np.abs(shap_values).mean(axis=0)
importance_df = pd.DataFrame({
    'feature': X_te.columns,
    'mean_abs_shap': mean_abs_shap
}).sort_values('mean_abs_shap', ascending=False)
print(importance_df.to_string(index=False))

# ── 4. LOCAL explanation: one specific individual ────────────
idx = 42                             # Explain prediction for row 42
individual = X_te.iloc[[idx]]
pred_prob = model.predict_proba(individual)[0][1]

# For this classifier TreeExplainer works in log-odds by default, and
# expected_value may come back as a length-1 array, so flatten it first.
base_log_odds = float(np.ravel(explainer.expected_value)[0])
base_prob = 1.0 / (1.0 + np.exp(-base_log_odds))   # shown as a probability for readability

print(f"\n=== LOCAL: individual #{idx} ===")
print(f"Predicted probability of income >50k: {pred_prob:.3f}")
print(f"Base value (expected):               {base_prob:.3f}")
print("\nTop SHAP contributions (this person):")

# SHAP values are in log-odds units; sort by absolute size, largest first
local_shap = pd.Series(shap_values[idx], index=X_te.columns).sort_values(key=lambda x: -x.abs())
for feat, val in local_shap.head(5).items():
    direction = "▲ pushes towards >50k" if val > 0 else "▼ pushes towards ≤50k"
    print(f"  {feat:25s}: {val:+.4f}  {direction}")

OUTPUT

=== GLOBAL: mean |SHAP| across all test predictions ===
       feature  mean_abs_shap
  capital-gain         0.1834   ← #1 most influential globally
hours-per-week         0.1201
marital-status         0.1087
     education         0.0934
    occupation         0.0876
           age         0.0743

=== LOCAL: individual #42 ===
Predicted probability of income >50k: 0.819
Base value (expected):               0.241

Top SHAP contributions (this person):
  marital-status           : +0.2341  ▲ pushes towards >50k
  education                : +0.1892  ▲ pushes towards >50k
  hours-per-week           : +0.0943  ▲ pushes towards >50k
  occupation               : -0.0512  ▼ pushes towards ≤50k
  age                      : +0.0381  ▲ pushes towards >50k

✅

Reading SHAP Output — The Key Insight

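The base value is the model's average output, and each SHAP value is how much one feature shifted the prediction for individual #42 away from it. For this gradient-boosting classifier, TreeExplainer attributes the raw log-odds output by default, so the base value plus the sum of all SHAP values equals the model's log-odds for that row, and passing that sum through the sigmoid recovers the predicted probability (0.819). The contributions do not add up directly on the probability scale. This property is called local additivity (also known as local accuracy), and it is what lets SHAP attributions be checked exactly against the model's output.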

Python Code: LIME — Approximating Any Black Box Locally

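import lime
import lime.lime_tabular
import pandas as pd

# Continues from the SHAP example above: reuses model, X_tr, X_te and idx.

# ── LIME explainer setup ────────────────────────────────────
# LIME needs the training data statistics to perturb features.
# Categorical columns are label-encoded integers in this dataset; without
# categorical_features / categorical_names LIME discretises them like
# numeric columns.
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_tr.values,
    feature_names=X_tr.columns.tolist(),
    class_names=['≤50k', '>50k'],
    mode='classification',
    discretize_continuous=True,
    random_state=42
)

# LIME passes NumPy arrays to predict_fn; wrap them in a DataFrame so the
# model sees the same column names it was trained on.
def predict_fn(rows):
    return model.predict_proba(pd.DataFrame(rows, columns=X_tr.columns))

# ── Explain one prediction with LIME ─────────────────────────
exp = lime_explainer.explain_instance(
    data_row=X_te.values[idx],
    predict_fn=predict_fn,
    num_features=6,
    num_samples=2000           # More samples = more stable approximation
)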

print("=== LIME Local Explanation ===")
for feature_rule, contribution in exp.as_list():
    sign = "▲" if contribution > 0 else "▼"
    print(f"  {sign} {feature_rule:40s}: {contribution:+.4f}")

OUTPUT

=== LIME Local Explanation ===
  ▲ marital-status = Married-civ-spouse     : +0.2187
  ▲ education = Bachelors                   : +0.1734
  ▲ hours-per-week > 45.00                  : +0.0891
  ▼ occupation = Craft-repair               : -0.0534
  ▲ 35.00 < age <= 45.00                    : +0.0412
  ▲ capital-gain > 0.00                     : +0.0318

Section 05

Human-Centered AI and Trust

📖 Analogy

The Autopilot and the Sleeping Crew

Commercial aviation has one of the best safety records of any human activity. Not because pilots blindly trust autopilot — but because they understand exactly what the autopilot is doing, when it will act, and when to override it.

Now imagine an alternate world where autopilot is a black box. When the system veers off course, the pilots have no idea if it is responding correctly to bad weather, experiencing a sensor failure, or heading toward a mountain. Their only option: trust blindly, or take over blindly. Both options are equally dangerous.

AI in healthcare, criminal justice, and finance often puts people in exactly this situation. Doctors who cannot understand AI diagnostic tools either over-trust them (automation bias) or reject them entirely (algorithm aversion). Neither outcome is good for patients. Human-Centered AI provides the cockpit displays that make informed collaboration possible.

Human-Centered AI (HCAI) is the design philosophy that places human needs, capabilities, and values at the centre of every AI system. It goes beyond technical explainability to ask: "Is this explanation useful to the actual human making this decision?"

👤

The Explanation Must Fit the Audience

A SHAP waterfall plot is useful for a data scientist but meaningless to a loan applicant. A counterfactual ("increase your savings by £5,000 to get approved") is actionable for the applicant but insufficient for a regulator auditing systemic bias. The right explanation is context-specific, audience-specific, and goal-specific — not just technically correct.

Key Dimensions of Human Trust in AI

🎯

Calibrated Trust

neither over-trust nor under-trust

The goal is not maximum trust — it is appropriate trust proportional to the model's actual capability in a given context. XAI shows users when the model is confident and reliable versus when it is uncertain and should be overridden.
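
One simple way to see whether trust would be well placed is to compare the model's stated confidence with how often it is actually right. The sketch below uses made-up confidence scores and outcomes and a hypothetical `reliability_table` helper with arbitrary bin edges. It groups predictions by confidence and reports the gap between average confidence and observed accuracy in each group.

```python
def reliability_table(confidences, correct, edges=(0.5, 0.7, 0.9, 1.0)):
    """Group predictions by stated confidence and compare with observed accuracy."""
    rows = []
    for low, high in zip(edges, edges[1:]):
        is_last = high == edges[-1]
        members = [i for i, c in enumerate(confidences)
                   if low <= c < high or (is_last and c == high)]
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(correct[i] for i in members) / len(members)
        rows.append({"bin": (low, high), "n": len(members),
                     "confidence": mean_conf, "accuracy": accuracy})
    return rows

# Hypothetical model confidences and whether each prediction was right (1) or wrong (0).
confidences = [0.55, 0.60, 0.65, 0.60, 0.75, 0.80, 0.85, 0.80, 0.95, 0.95, 0.99, 0.97]
correct     = [1,    0,    1,    0,    1,    1,    1,    0,    1,    0,    1,    0]

table = reliability_table(confidences, correct)
for row in table:
    gap = row["confidence"] - row["accuracy"]
    low, high = row["bin"]
    print(f"confidence {low:.1f}-{high:.1f}: stated {row['confidence']:.2f}, "
          f"observed {row['accuracy']:.2f}, gap {gap:+.2f}")
```

In this made-up data the model is most overconfident exactly where it claims to be most certain, which is the situation where users are most likely to over-trust it.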

🔮

Predictability

consistent · rule-following · stable

Humans trust systems they can predict. When the same inputs always produce the same outputs and users understand why, they can confidently rely on the system. Explainability makes model behaviour predictable to non-experts.

🛠️

Controllability

override-able · contestable · correctable

Users need meaningful ability to contest and override AI decisions. XAI enables this by revealing which features drove the decision — giving humans the information they need to challenge incorrect or biased outcomes effectively.

Building Trust — The XAI Pipeline

Define the Explanation Goal

Before choosing an XAI method, identify: who needs the explanation, what decision they are making, and what action the explanation should enable. A doctor needs different information from a regulator.

Choose Intrinsic or Post-hoc Approach

If accuracy constraints allow, prefer an intrinsically interpretable model (logistic regression, decision tree, GAM). If a black-box model is required for accuracy, add post-hoc methods (SHAP, LIME, counterfactuals).

Validate Explanation Faithfulness

Not all explanations accurately reflect the model. LIME approximations can fail outside the local neighbourhood. Validate that explanations are faithful (reflect true model behaviour) using perturbation tests and sensitivity analysis.

User-Test Explanations With Real Users

Run user studies to verify that the explanation actually helps humans make better decisions. Technically correct explanations can still confuse, mislead, or overwhelm non-expert users. Clarity is as important as accuracy.

Monitor for Explanation Drift

As data distributions shift over time, model behaviour changes — and so do explanations. Monitor global explanation dashboards in production to detect when the model's learned logic has silently changed.
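
The perturbation test mentioned in the faithfulness step can be sketched as a "deletion" check: reset features to a baseline in the order an explanation ranks them, and see whether the model's output actually moves most for the features ranked highest. Everything below is hypothetical: the linear `model`, the instance, and the two candidate attribution sets are invented to show the mechanics.

```python
def model(x):
    """Hypothetical black box over three numeric features."""
    return 0.8 * x["income"] - 1.5 * x["debt_ratio"] + 0.05 * x["age"]

baseline = {"income": 0.0, "debt_ratio": 0.0, "age": 0.0}
instance = {"income": 1.0, "debt_ratio": 0.6, "age": 0.4}

# Two candidate explanations of the same prediction (attribution per feature).
faithful = {"income": 0.80, "debt_ratio": -0.90, "age": 0.02}
misleading = {"income": 0.05, "debt_ratio": -0.10, "age": 0.90}

def deletion_curve(attributions):
    """Reset features to baseline, highest-ranked first, recording |output change| per step."""
    x = dict(instance)
    previous = model(x)
    changes = []
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    for name in ranked:
        x[name] = baseline[name]
        current = model(x)
        changes.append(abs(current - previous))
        previous = current
    return changes

faithful_curve = deletion_curve(faithful)
misleading_curve = deletion_curve(misleading)
print("faithful explanation:  ", [round(c, 2) for c in faithful_curve])
print("misleading explanation:", [round(c, 2) for c in misleading_curve])
```

For a faithful explanation the largest output changes come first. When the feature an explanation ranks highest barely moves the output, the explanation is not describing what the model actually does.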

Major XAI Methods — Quick Reference

Method	Type	Scope	Model Agnostic?	Key Strength	Key Weakness
SHAP	Post-hoc	Both	Yes (KernelSHAP; TreeSHAP is tree-specific)	Theoretically grounded in game theory	Slow on large datasets
LIME	Post-hoc	Local	Yes	Simple, intuitive output	Unstable — can vary between runs
PDP / ICE	Post-hoc	Global / Local	Yes	Great for visualising feature effects	Misleading with correlated features
Grad-CAM	Post-hoc	Local	Neural nets only	Visual heatmaps for image models	Only for CNNs / vision models
Counterfactuals	Post-hoc	Local	Yes	Actionable — tells what to change	Multiple valid counterfactuals exist
Linear Regression	Intrinsic	Global	N/A (intrinsic)	Exact, no approximation needed	Limited expressiveness
Decision Tree	Intrinsic	Global	N/A (intrinsic)	Fully human-readable rules	Overfits with depth; less accurate

Python Code: Checking Explanation Faithfulness

import numpy as np
import shap
from scipy.special import expit  # sigmoid: log-odds -> probability

# ── Strategy: check that the SHAP values add up to the model's ──
# ── output (local additivity) by rebuilding each prediction ─────
# Continues from the SHAP example above: reuses model and X_te.

# 1. Get SHAP values and the base value (average log-odds output)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
# expected_value may be a length-1 array for this model, so flatten it
base_value  = float(np.ravel(explainer.expected_value)[0])

# 2. Reconstruct predictions from SHAP values alone
# SHAP guarantee: base_value + sum(shap_i) = model log-odds output
shap_sum = shap_values.sum(axis=1) + base_value

# 3. Compare to the actual model output (probability = sigmoid of log-odds)
shap_probs  = expit(shap_sum)
model_probs = model.predict_proba(X_te)[:, 1]

# Absolute deviation between SHAP reconstruction and true model
max_err = np.abs(shap_probs - model_probs).max()
mean_err = np.abs(shap_probs - model_probs).mean()

print("SHAP faithfulness check:")
print(f"  Max absolute error:  {max_err:.6f}")
print(f"  Mean absolute error: {mean_err:.6f}")
print("  TreeExplainer is EXACT for tree models: error should be ~0")

OUTPUT

SHAP faithfulness check:
  Max absolute error:  0.000001
  Mean absolute error: 0.000000
  TreeExplainer is EXACT for tree models: error should be ~0

Non-Negotiable Rules for Responsible XAI Deployment

🧠 XAI Deployment — Principles You Must Follow

Explanations must be honest. Never simplify an explanation to the point where it becomes misleading. A slightly confusing but accurate explanation is always preferable to a clean but inaccurate one. Users can be educated; they cannot un-trust a model once it lies to them.

Match the explanation to the user. A data scientist needs SHAP beeswarm plots. A loan applicant needs plain-language counterfactuals. A regulator needs audit logs and model cards. One explanation type rarely fits all audiences. Build multiple explanation layers.

Do not confuse importance with causation. SHAP tells you which features the model used — not why those features are correlated with the outcome. High "age" importance does not mean age causes higher income. Always add domain-expert review.

Monitor explanations in production. Run global SHAP summaries on live predictions weekly. When top features shift dramatically, it signals data drift, feature engineering errors, or model degradation — even if accuracy metrics haven't moved yet.

Provide contestability, not just visibility. Showing a user why they were rejected is useless if they cannot challenge the decision. True human-centered AI includes an appeals process connected to the explanation — otherwise it is a legal liability pretending to be transparency.

Never use "explainable" as a marketing term without substance. Adding a SHAP plot to a dashboard does not make a system responsible. What matters, to regulators as much as to users, is whether explanations are actually used to improve decisions, not just displayed.
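
The monitoring principle above can be turned into a small automated check. The sketch below assumes you already export mean |SHAP| per feature for a reference window and for each new week. The feature values, the `ALERT_THRESHOLD` of 0.15, and the helper names are hypothetical choices for illustration. It compares the two importance profiles with a total variation distance and flags a change in the top feature.

```python
ALERT_THRESHOLD = 0.15  # hypothetical cut-off; tune it on your own history

def normalise(importances):
    """Scale importances so they sum to 1, making weeks comparable."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}

def importance_shift(reference, current):
    """Total variation distance between two importance profiles (0 = identical)."""
    ref, cur = normalise(reference), normalise(current)
    return 0.5 * sum(abs(ref[name] - cur[name]) for name in ref)

def top_feature(importances):
    return max(importances, key=importances.get)

def check_week(reference, current):
    """Summarise how far this week's explanation profile is from the reference."""
    shift = importance_shift(reference, current)
    return {
        "shift": shift,
        "top_changed": top_feature(reference) != top_feature(current),
        "alert": shift > ALERT_THRESHOLD,
    }

# Hypothetical mean |SHAP| per feature for three time windows.
reference_week = {"capital_gain": 0.18, "hours_per_week": 0.12,
                  "marital_status": 0.11, "education": 0.09}
stable_week = {"capital_gain": 0.17, "hours_per_week": 0.13,
               "marital_status": 0.11, "education": 0.09}
drifted_week = {"capital_gain": 0.05, "hours_per_week": 0.12,
                "marital_status": 0.24, "education": 0.09}

stable_report = check_week(reference_week, stable_week)
drift_report = check_week(reference_week, drifted_week)
print("stable week: ", stable_report)
print("drifted week:", drift_report)
```

An alert here is a prompt to investigate, not proof of a fault: the shift may reflect genuine change in the population as well as data or pipeline errors.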

🚀

What You Have Covered in This Tutorial

You now understand the five pillars of XAI foundations: what explainable AI is and why it matters urgently today; the nature of the black box problem and its real-world consequences; the precise distinction between interpretability, explainability, and transparency; the complementary roles of global and local explanations (with working SHAP and LIME code); and the human-centered principles that turn technical explanations into genuine, trustworthy AI systems. The next step is practice: apply SHAP to your own model and study what it reveals — and what surprises you.