Introduction to Matplotlib
Matplotlib is Python's foundational plotting library — the engine under almost every visualisation in the scientific and data science ecosystem. Seaborn, pandas .plot(), and dozens of other libraries are all built on top of it. Understanding matplotlib directly gives you complete control over every pixel of every chart you produce.
Use seaborn or plotly for quick EDA. Use matplotlib directly when you need pixel-precise control: custom tick formatters, dual axes, subplots with shared axes, inset charts, publication-quality figures with exact font sizes, or any layout that a higher-level library cannot produce. Matplotlib's learning curve pays off in unlimited flexibility.
The Anatomy of a Matplotlib Figure
Every visual element in matplotlib is an object you can access and modify. The Figure is the canvas; the Axes is the actual plot area. One Figure can contain many Axes.
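A minimal sketch of this hierarchy, using made-up values: one Figure holds two Axes, and each element (the line, the title) is an object you can change after creation. The variable names here are illustrative.

```python
import matplotlib.pyplot as plt

# One Figure (the canvas) holding a 1x2 grid of Axes (the plot areas)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Every Axes keeps a reference to its parent Figure
parent_is_fig = all(ax.figure is fig for ax in axes)

# The Figure lists the Axes it contains
n_axes = len(fig.axes)

# Elements inside an Axes are objects too: plot() returns Line2D instances
(line,) = axes[0].plot([0, 1, 2], [0, 1, 4])
line.set_color('#60a5fa')              # restyle the line after creating it
axes[0].title.set_text('Left panel')   # the title is a Text object
```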
Setup and Imports
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd
# Apply a style sheet globally
plt.style.use('seaborn-v0_8-darkgrid') # grey background with white grid lines (style name used by Matplotlib 3.6+)
# Set default figure size and resolution
plt.rcParams['figure.figsize'] = (10, 5)
plt.rcParams['figure.dpi'] = 120
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['lines.linewidth'] = 2
Matplotlib has two interfaces. The pyplot interface (plt.plot()) is quick for single plots. The object-oriented interface (fig, ax = plt.subplots()) is the professional standard — always use it for anything more than a one-liner. It gives you explicit control over every axes object and avoids confusing state bugs.
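As a rough side-by-side, the sketch below draws the same illustrative data through both interfaces. The pyplot calls act on an implicit "current" Axes, while the object-oriented calls act on the ax you hold.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]   # illustrative values only

# pyplot interface: each call acts on the implicit "current" Axes
plt.figure()
plt.plot(x, y)
plt.title('pyplot interface')
implicit_ax = plt.gca()   # the Axes that pyplot was drawing on

# Object-oriented interface: you hold the Figure and Axes explicitly
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('object-oriented interface')
```

With several figures open, the explicit ax makes it unambiguous which plot each call modifies.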
Line Chart — Trends Over Time
The line chart is matplotlib's most used plot. It connects data points in sequence — ideal for time series, continuous functions, and any data where order matters. Every visual property of the line (colour, width, style, markers) is independently controllable.
# months, revenue and target are assumed to be equal-length sequences defined earlier
fig, ax = plt.subplots(figsize=(10, 5))
# Multiple lines with different styles
ax.plot(months, revenue,
        color='#60a5fa', linewidth=2, marker='o', markersize=5,
        label='Revenue')
ax.plot(months, target,
        color='#f59e0b', linewidth=1.5, linestyle='--',
        label='Target')
# Fill the area between the lines, only where revenue is above target
ax.fill_between(months, revenue, target,
                where=[r > t for r, t in zip(revenue, target)],
                alpha=0.15, color='#34d399', label='Above target')
ax.set_title('Monthly Revenue vs Target', fontsize=14, fontweight='bold', pad=12)
ax.set_xlabel('Month')
ax.set_ylabel('Revenue (₹ thousands)')
ax.legend(framealpha=0.3)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'₹{x:.0f}k'))
plt.tight_layout()
plt.show()
The green shaded area appears only where Revenue exceeds Target — created with ax.fill_between(where=[...]). Dashed line = target series using linestyle='--'.
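One detail worth knowing: the where= mask is evaluated only at the data points, so the shading can stop short of the exact place where the two lines cross. The sketch below, with made-up numbers, passes a NumPy boolean mask and sets interpolate=True so that the fill extends to the crossing points.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative numbers only, not data from this article
months = np.arange(1, 7)
revenue = np.array([40, 55, 48, 62, 58, 70])
target = np.array([45, 50, 52, 55, 60, 65])

above = revenue > target   # boolean mask, one entry per month

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, revenue, marker='o', label='Revenue')
ax.plot(months, target, linestyle='--', label='Target')
# interpolate=True extends each shaded region to where the lines intersect
ax.fill_between(months, revenue, target, where=above,
                interpolate=True, alpha=0.15, color='#34d399',
                label='Above target')
ax.legend()
fig.tight_layout()
```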
Bar Chart — Comparing Categories
Bar charts compare a numeric measure across discrete categories. Matplotlib supports vertical bars, horizontal bars, grouped bars, and stacked bars — each with independent colour and width control per bar.
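# categories, male_avg, female_avg and total_revenue are assumed to be equal-length sequences defined earlier
fig, axes = plt.subplots(1, 2, figsize=(12, 5))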
# ── Grouped bar chart ──────────────────────────────
x = np.arange(len(categories))
width = 0.35
bars1 = axes[0].bar(x - width/2, male_avg, width, label='Male', color='#60a5fa', alpha=0.85)
bars2 = axes[0].bar(x + width/2, female_avg, width, label='Female', color='#f87171', alpha=0.85)
# Add value labels on top of each bar
axes[0].bar_label(bars1, fmt='₹%.0fk', padding=3, fontsize=9)
axes[0].bar_label(bars2, fmt='₹%.0fk', padding=3, fontsize=9)
axes[0].set_xticks(x)
axes[0].set_xticklabels(categories, rotation=30, ha='right')
axes[0].set_title('Avg Spend by Category & Gender')
axes[0].legend()
# ── Horizontal bar chart (sorted) ──────────────────
sorted_idx = np.argsort(total_revenue)
axes[1].barh(np.array(categories)[sorted_idx],
             np.array(total_revenue)[sorted_idx],
             color='#f59e0b', alpha=0.85)
axes[1].set_title('Total Revenue by Category')
plt.tight_layout()
plt.show()
Left: ax.bar_label() adds value annotations automatically. Right: ax.barh() with np.argsort() creates a sorted horizontal ranking chart.
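Stacked bars are mentioned above but not shown. A minimal sketch, with hypothetical category names and numbers, uses the bottom= argument to start the second series where the first one ends.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative numbers only
categories = ['Electronics', 'Clothing', 'Grocery']
online = np.array([30, 20, 10])
in_store = np.array([15, 25, 35])

fig, ax = plt.subplots(figsize=(6, 4))
lower = ax.bar(categories, online, label='Online', color='#60a5fa')
# bottom= starts each second-series bar at the top of the first series
upper = ax.bar(categories, in_store, bottom=online,
               label='In store', color='#f59e0b')

# label_type='center' writes each segment's own value inside the segment
ax.bar_label(lower, label_type='center')
ax.bar_label(upper, label_type='center')
ax.set_ylabel('Orders')
ax.legend()
fig.tight_layout()

# Top edge of every stack = first series + second series
stack_tops = [rect.get_y() + rect.get_height() for rect in upper]
```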
Scatter Plot — Relationships Between Variables
Scatter plots reveal correlations, clusters, and outliers. In matplotlib, the scatter() function encodes up to four dimensions simultaneously: x position, y position, colour (third variable), and size (fourth variable).
fig, ax = plt.subplots(figsize=(9, 6))
# df is assumed to be a DataFrame with age, purchase_amount, rating and delivery_days columns
scatter = ax.scatter(
    df['age'], df['purchase_amount'],
    c=df['rating'],               # colour encodes a 3rd variable
    s=df['delivery_days'] * 8,    # size encodes a 4th variable
    cmap='viridis',
    alpha=0.6, edgecolors='white', linewidths=0.4
)
# Colourbar for the 3rd dimension
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Customer Rating', fontsize=11)
# Annotate a specific outlier point: xy is the point, xytext is where the label sits
# (these coordinates are specific to the example data)
ax.annotate('High-value outlier',
            xy=(58, 24800), xytext=(45, 23000),
            arrowprops=dict(arrowstyle='->', color='#f87171'),
            fontsize=9, color='#f87171')
ax.set_xlabel('Customer Age')
ax.set_ylabel('Purchase Amount (₹)')
ax.set_title('Age vs Purchase — colour=rating, size=delivery days')
plt.tight_layout()
plt.show()
Four variables encoded in one chart. The ax.annotate() call adds an arrow pointing to the outlier. Colourbar created with plt.colorbar(scatter, ax=ax).
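The colourbar explains the colour encoding, but the size encoding has no key. One possible way to add one is the legend_elements() method of the object that scatter() returns. The sketch below uses synthetic data with hypothetical column meanings, and its func argument undoes the ×8 scaling so the legend shows delivery days rather than marker areas.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)            # synthetic data, illustration only
age = rng.integers(18, 70, size=60)
amount = rng.normal(5000, 1500, size=60)
rating = rng.uniform(1, 5, size=60)
delivery_days = rng.integers(1, 8, size=60)

fig, ax = plt.subplots(figsize=(8, 5))
# s is the marker area in points squared, hence the scaling factor
sc = ax.scatter(age, amount, c=rating, s=delivery_days * 8,
                cmap='viridis', alpha=0.6)
fig.colorbar(sc, ax=ax, label='Customer Rating')

# Build legend entries from the marker sizes; func maps sizes back to days
handles, labels = sc.legend_elements(prop='sizes', num=4, alpha=0.6,
                                     func=lambda s: s / 8)
ax.legend(handles, labels, title='Delivery days', loc='upper right')
fig.tight_layout()
```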
Histogram & KDE — Visualising Distributions
Matplotlib's ax.hist() is the foundation for distribution analysis. Combined with a manually computed KDE line, it gives you the full picture of a variable's shape — skewness, modality, and tail weight — all in one chart.
from scipy.stats import gaussian_kde
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# ── Left: histogram + KDE ──────────────────────────
# data is assumed to be a 1D NumPy array (or pandas Series) of purchase amounts
n, bins, patches = axes[0].hist(
    data, bins=35, color='#60a5fa', alpha=0.6,
    edgecolor='white', linewidth=0.4, density=True
)
# Overlay KDE curve
kde = gaussian_kde(data)
x_range = np.linspace(data.min(), data.max(), 300)
axes[0].plot(x_range, kde(x_range), color='#f59e0b', linewidth=2, label='KDE')
# Colour bars by region (below mean = blue, above = amber)
mean_val = data.mean()
for patch, left_edge in zip(patches, bins[:-1]):  # bins has one more edge than there are bars
    if left_edge > mean_val:
        patch.set_facecolor('#f59e0b')
axes[0].axvline(mean_val, color='#f59e0b', linestyle='--', label=f'Mean={mean_val:.0f}')
axes[0].axvline(np.median(data), color='#34d399', linestyle='--', label='Median')
axes[0].legend()
axes[0].set_title('Distribution of Purchase Amount')
# ── Right: overlapping histograms by group ──────────
# groups is assumed to be a list of dicts with 'data' and 'label' keys
for group, colour in zip(groups, ['#60a5fa', '#f87171', '#34d399']):
    axes[1].hist(group['data'], bins=25, alpha=0.5,
                 color=colour, label=group['label'], density=True)
axes[1].legend()
axes[1].set_title('Distribution by Income Bracket')
plt.tight_layout()
plt.show()
Left: bars to the right of the mean (amber) show the right-skew. ax.axvline() draws vertical reference lines. Right: three overlapping histograms with alpha=0.5 reveal how income groups differ in spending.
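The KDE curve and the histogram share a y-scale only because density=True normalises the bars to a total area of 1. A small check on a synthetic right-skewed sample (not the article's data) makes that concrete, and also shows the mean sitting above the median.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.lognormal(mean=8, sigma=0.5, size=1000)   # synthetic right-skewed sample

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(data, bins=35, density=True, alpha=0.6)

# With density=True each height is count / (total * bin_width),
# so height * width summed over all bins equals 1
area = np.sum(heights * np.diff(edges))

# Right skew pulls the mean above the median
mean_val, median_val = data.mean(), np.median(data)
ax.axvline(mean_val, linestyle='--', color='#f59e0b', label='Mean')
ax.axvline(median_val, linestyle='--', color='#34d399', label='Median')
ax.legend()
```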
Subplots — Multiple Charts in One Figure
The plt.subplots() function creates a grid of axes objects. This is the core layout tool in matplotlib — it lets you create dashboards, comparison panels, and multi-panel analysis figures with precise shared axis control.
# 2×2 grid of subplots with independent axes; set sharex=True / sharey=True to link them
fig, axes = plt.subplots(2, 2, figsize=(12, 8),
                         sharex=False, sharey=False)
# Access individual axes
ax_line = axes[0, 0] # top-left
ax_bar = axes[0, 1] # top-right
ax_hist = axes[1, 0] # bottom-left
ax_scat = axes[1, 1] # bottom-right
# Add a shared super-title for the whole figure
fig.suptitle('Sales Dashboard — Q4 2024', fontsize=16, fontweight='bold', y=1.01)
# Hide an unused panel, e.g. axes[1, 1].set_visible(False) (valid indices in a 2×2 grid are 0 and 1)
# Adjust spacing between subplots
plt.tight_layout(pad=2.0)
plt.show()
[Figure: Sales Dashboard, Q4 2024. Panels: Revenue Trend (line), Category Comparison (bar), Age Distribution (histogram), Age vs Spend (scatter).]
fig.suptitle() sets a shared title across all panels. Each subplot is an independent Axes object — style each one separately using its own ax variable.
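To illustrate the shared-axis control mentioned above, the following sketch links the x-axis across all panels and the y-axis within each row. The curves are arbitrary placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)

# sharex=True links the x-axis of every panel; sharey='row' links y within each row
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey='row')
axes[0, 0].plot(x, np.sin(x))
axes[0, 1].plot(x, np.cos(x))
axes[1, 0].plot(x, x)
axes[1, 1].plot(x, x ** 2)

# Changing limits on one panel propagates to the panels that share that axis
axes[0, 0].set_xlim(2, 8)
axes[1, 0].set_ylim(0, 50)

fig.suptitle('Shared axes')
fig.tight_layout()
```

Here the x-limits set on the top-left panel apply to all four panels, while the y-limits set on the bottom-left panel apply only to the bottom row.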
Heatmap with imshow() — Correlation & Pivot Tables
Matplotlib's ax.imshow() renders any 2D array as a coloured grid, which makes it the foundation for correlation heatmaps, pivot table visualisations, and confusion matrices. seaborn's heatmap() is a shortcut for the same kind of chart; building it directly in matplotlib takes more code but lets you set the colormap, cell annotations, and axis formatting explicitly.
fig, ax = plt.subplots(figsize=(7, 6))
# Render the correlation matrix as an image
im = ax.imshow(corr_matrix, cmap='coolwarm', vmin=-1, vmax=1, aspect='auto')
# Add colourbar
plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
# Annotate every cell with the correlation value
# corr_matrix is assumed to be a 2D NumPy array (e.g. df.corr().to_numpy())
n = len(feature_names)
for i in range(n):
    for j in range(n):
        val = corr_matrix[i, j]
        text_color = 'white' if abs(val) > 0.6 else 'black'
        ax.text(j, i, f'{val:.2f}', ha='center', va='center',
                fontsize=11, color=text_color, fontweight='bold')
ax.set_xticks(range(n))
ax.set_xticklabels(feature_names, rotation=45, ha='right')
ax.set_yticks(range(n))
ax.set_yticklabels(feature_names)
ax.set_title('Correlation Matrix (Pearson)', pad=14)
plt.tight_layout()
plt.show()
Cell text colour automatically switches between white and black based on background intensity — done with the if abs(val) > 0.6 conditional. The colourbar is added with plt.colorbar(im, ax=ax).
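The heatmap code takes corr_matrix, feature_names and n as given. One plausible way to produce them, sketched here with a synthetic DataFrame whose column names are hypothetical, is to convert the result of DataFrame.corr() to a NumPy array so that corr_matrix[i, j] indexing works.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
df = pd.DataFrame({                        # synthetic data, illustration only
    'age': rng.integers(18, 70, size=200),
    'purchase_amount': rng.normal(5000, 1500, size=200),
    'rating': rng.uniform(1, 5, size=200),
})

# Plain 2D array, so cells can be read as corr_matrix[i, j]
corr_matrix = df.corr(method='pearson').to_numpy()
feature_names = list(df.columns)
n = len(feature_names)

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr_matrix, cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)
ax.set_xticks(range(n))
ax.set_xticklabels(feature_names, rotation=45, ha='right')
ax.set_yticks(range(n))
ax.set_yticklabels(feature_names)
fig.tight_layout()
```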
Boxplot & Violin Plot — Distribution Shape
Matplotlib's ax.boxplot() shows the summary statistics of a distribution: median, quartiles, whiskers, and outliers. ax.violinplot() extends this by drawing a mirrored KDE for each group, which shows the full shape of the distribution.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# ── Styled boxplot ─────────────────────────────────
# data_by_category is assumed to be a list of 1D arrays, one per category (up to five here)
bp = ax1.boxplot(
    data_by_category,
    patch_artist=True,   # fills boxes with colour
    notch=True,          # notch = approximate 95% CI around the median
    vert=True,
    widths=0.5
)
# Style each box individually
colors = ['#60a5fa', '#34d399', '#f59e0b', '#a78bfa', '#f87171']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.6)
# ── Violin plot ─────────────────────────────────────
vp = ax2.violinplot(
    data_by_category,
    showmedians=True, showextrema=True
)
for body, color in zip(vp['bodies'], colors):
    body.set_facecolor(color)
    body.set_alpha(0.5)
plt.tight_layout()
plt.show()
[Figure: two panels. Left: boxplot (patch_artist=True, notch=True). Right: violin plot (showmedians=True).]
A violin plot is wider where most of the data lies, so it can reveal bimodal distributions that a boxplot hides. Use notch=True on boxplots to show an approximate 95% confidence interval around the median.
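For the notch, matplotlib's default (when no bootstrap is requested) is a Gaussian-based approximation: median ± 1.57 × IQR / √n. The sketch below recomputes that interval by hand on a synthetic sample and compares it with the values returned by matplotlib.cbook.boxplot_stats.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cbook

rng = np.random.default_rng(7)
sample = rng.normal(100, 15, size=400)     # synthetic data, illustration only

# boxplot_stats returns the numbers a boxplot draws, including the notch limits
stats = cbook.boxplot_stats(sample)[0]
half_width = 1.57 * stats['iqr'] / np.sqrt(len(sample))
manual_lo = stats['med'] - half_width
manual_hi = stats['med'] + half_width

fig, ax = plt.subplots(figsize=(4, 5))
ax.boxplot(sample, notch=True, patch_artist=True)
ax.set_title('Notched boxplot')
```

When the notches of two boxes do not overlap, that is commonly read as a sign that the medians differ, though it is a visual guide and not a formal test.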
Styling & Saving Publication-Quality Figures
Matplotlib's style system and fine-grained control over every element make it the gold standard for publication figures. Journals, conference papers, and reports all demand specific font sizes, exact figure dimensions, and lossless export, and matplotlib handles all of these.
# ── List all available styles ───────────────────────
print(plt.style.available)
# ── Apply a style ───────────────────────────────────
# Each call updates rcParams on top of the previous one, so in practice pick one
plt.style.use('seaborn-v0_8-whitegrid') # clean white
plt.style.use('dark_background') # pure dark
plt.style.use('ggplot') # R-style
plt.style.use('bmh') # from "Bayesian Methods for Hackers"
# ── Custom rcParams for publication ─────────────────
plt.rcParams.update({
    'font.family': 'DejaVu Sans',
    'figure.dpi': 150,
    'savefig.dpi': 300,
    'axes.spines.top': False,    # remove top spine
    'axes.spines.right': False,  # remove right spine
    'axes.grid': True,
    'grid.alpha': 0.3,
})
# ── Saving figures ───────────────────────────────────
fig.savefig('chart.png', dpi=300, bbox_inches='tight')
fig.savefig('chart.pdf', bbox_inches='tight') # vector PDF
fig.savefig('chart.svg', bbox_inches='tight') # scalable SVG
fig.savefig('chart.eps', bbox_inches='tight') # for LaTeX
[Figure: the same chart rendered in four styles: seaborn-v0_8-darkgrid, dark_background, ggplot, and bmh (Bayesian Methods for Hackers).]
Same data, four completely different aesthetics. For publication use: bbox_inches='tight' prevents labels from being cropped. Use .pdf or .svg for vector output that scales to any size.
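Because plt.style.use() changes the global settings, a style meant for a single figure can be scoped with plt.style.context() instead. The sketch below applies a style temporarily and saves the figure; the in-memory buffer is only there to keep the example self-contained, and a filename such as 'chart.png' works the same way.

```python
import io
import matplotlib.pyplot as plt

facecolor_before = plt.rcParams['axes.facecolor']

# The style applies only inside the with-block
with plt.style.context('dark_background'):
    facecolor_inside = plt.rcParams['axes.facecolor']
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title('Temporary style')

    # Save while the style is active; bbox_inches='tight' trims surplus margins
    buffer = io.BytesIO()
    fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight')

facecolor_after = plt.rcParams['axes.facecolor']   # back to the earlier setting
png_bytes = buffer.getvalue()
```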
Golden Rules of Matplotlib
Matplotlib's verbosity is its superpower. Higher-level Python visualisation libraries eventually hit a wall where they cannot produce exactly what you need, and for those built on matplotlib, such as seaborn and pandas plotting, the answer is to drop down to matplotlib. Learn it deeply, and you will rarely be blocked by a chart requirement.