
Mastering OpenCV

A comprehensive hands-on tutorial covering 15 major OpenCV use-cases — from reading images and color space conversion to real-time video processing and face detection

Section 01

The Story That Explains OpenCV

Teaching a Machine to See
Imagine you are holding a photo of a cat. Your brain fires in milliseconds — you recognise fur, ears, eyes, whiskers. Now imagine you want a computer to do the same thing. To a computer, that photo is nothing more than a grid of numbers — millions of tiny values between 0 and 255 representing pixel brightness.

OpenCV (Open Source Computer Vision Library) is the Swiss Army knife that lets you manipulate, analyse, and extract meaning from those grids of numbers. First built at Intel in 1999, it now powers everything from NASA rover cameras to your phone's portrait mode — and it is 100% free.

At its core, every image in OpenCV's Python API is a NumPy array. A colour image has shape (Height, Width, Channels) and a grayscale image has shape (Height, Width); the usual data type is uint8 (values 0–255). Most algorithms are simply functions that take such an array in and return a transformed array out.
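
To make the array view concrete, here is a minimal sketch that builds a tiny synthetic image with NumPy instead of loading a file. The 4×6 size and the pixel values are arbitrary choices for illustration; the indexing and shape conventions are the same ones you get back from cv2.imread().

```python
import numpy as np

# A 4-row by 6-column colour image, all black to start with
img = np.zeros((4, 6, 3), dtype=np.uint8)

# Indexing is [row, column], i.e. [y, x], and channels are in B, G, R order
img[1, 2] = (255, 0, 0)        # pure blue pixel at x=2, y=1
img[3, 5] = (0, 0, 255)        # pure red pixel at x=5, y=3

height, width, channels = img.shape
print(height, width, channels)  # 4 6 3
print(img.dtype)                # uint8

# A "transform" is just array in, array out: invert every value
inverted = 255 - img
print(inverted[1, 2])           # the blue pixel becomes yellow in BGR terms
```

Note that shape is (rows, columns, channels), so a point written as (x, y) elsewhere in OpenCV is addressed as img[y, x] in the array.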

📈 Fig 1 — OpenCV Pipeline: From Pixels to Insight
[Figure: pipeline diagram. INPUT (camera, file, stream or URL) → LOAD (cv2.imread() returns a uint8 ndarray) → PROCESS (modules such as imgproc, features2d, video, dnn, calib3d, objdetect, photo) → OUTPUT (imshow to display, imwrite to save, VideoWriter to stream or analyse). Every image is a NumPy ndarray with shape (H, W, C), dtype uint8, values 0–255. Example: img.shape → (480, 640, 3), meaning H=480 rows, W=640 columns, C=3 channels in BGR order.]

OpenCV reads images as BGR arrays by default — not RGB. All processing happens on NumPy arrays in memory.

📷
BGR — Not RGB

OpenCV stores colour channels in Blue-Green-Red order — the opposite of what Matplotlib, PIL, and most libraries expect. Always convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before passing to any other library. Forgetting this is the most common beginner mistake.

🔌 Image Flow — What Happens Step by Step
Input: camera, file, or stream (raw bytes on disk or in memory)
Load: cv2.imread() decodes the bytes into a NumPy array of shape (H, W, 3), dtype uint8
Process: any OpenCV algorithm (blur, detect, transform, annotate) operates on the array
Output: imshow() to display, imwrite() to save, or pass the array to another function for further analysis

Installation

# Standard install: everything needed for this tutorial
pip install opencv-python numpy

# Alternative: full build that adds the extra contrib modules.
# Install this INSTEAD of opencv-python, not alongside it; the two packages conflict.
pip install opencv-contrib-python numpy

Section 02

Reading, Writing & Displaying Images

The three functions you will type more than any other. Every OpenCV program begins and ends with these.

Function | Purpose | Key Flag / Parameter
cv2.imread() | Load image from disk | IMREAD_COLOR · IMREAD_GRAYSCALE · IMREAD_UNCHANGED
cv2.imshow() | Display in a pop-up window | window name string, image array
cv2.imwrite() | Save image to disk | Extension sets format automatically (.jpg/.png/.bmp)
cv2.waitKey() | Pause execution for keyboard input | 0 = wait forever · n = wait n ms
cv2.destroyAllWindows() | Close all display windows | (none)

import cv2
import numpy as np

# ── Load an image from disk ───────────────────────────────────
img  = cv2.imread("photo.jpg")                        # BGR, shape (H, W, 3)
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)  # shape (H, W)
rgba = cv2.imread("logo.png",  cv2.IMREAD_UNCHANGED)  # keeps alpha channel

# ── Guard against failed loads ────────────────────────────────
if img is None:
    raise FileNotFoundError("Image not found — check the path")

# ── Inspect the array ─────────────────────────────────────────
print(f"Shape : {img.shape}")   # (480, 640, 3)
print(f"Dtype : {img.dtype}")   # uint8
print(f"Pixels: {img.size}")    # H × W × C total values

# ── Display ───────────────────────────────────────────────────
cv2.imshow("Original", img)
cv2.imshow("Grayscale", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

# ── Save with quality control ─────────────────────────────────
cv2.imwrite("output.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 95])
cv2.imwrite("output.png", img)   # PNG is lossless
OUTPUT
Shape : (480, 640, 3)
Dtype : uint8
Pixels: 921600
⚠️
imread() Returns None Silently

If the file path is wrong, cv2.imread() does not raise an exception — it returns None. Every subsequent operation will crash with a cryptic error. Always add a None check immediately after loading.


Section 03

Color Spaces & Conversions

The Traffic Light Problem
A self-driving car needs to detect a red traffic light. In BGR, red is (0, 0, 255) — but under noon sun it reads as (20, 30, 210), and at dusk as (10, 15, 140). Matching three channels across lighting conditions is a nightmare.

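Switch to HSV. In OpenCV's 8-bit HSV, Hue runs from 0 to 179 (degrees divided by two, so the value fits in one byte), and red sits at both ends of that scale: roughly 0–10 and 170–179. Because Hue is largely independent of brightness, a single two-range mask holds up far better in rain, noon glare, and tunnel shadow than any BGR range. Choosing the right color space is often more powerful than any algorithm.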
🌈 Fig 2 — Common Color Spaces in OpenCV
[Figure: four colour spaces linked by cvtColor. BGR: 3 channels, 0–255 each, OpenCV's default for load and save. HSV: Hue, Saturation, Value; best for colour masks; for 8-bit images Hue is 0–179, Saturation 0–255, Value 0–255. LAB (L*a*b*): perceptually uniform, used for skin and colour matching; the nominal ranges are L 0–100 and a, b from -128 to +127, which OpenCV rescales to 0–255 for 8-bit images. GRAY: 1 channel, 0–255, used for edges and thresholding.]

HSV isolates colour (Hue) from brightness — making masks robust to lighting changes. Use GRAY when you only need intensity.

🏃
BGR / RGB
3 channels · 0–255 each
Default in OpenCV (BGR) and most tools (RGB). Good for display and saving. Poor for colour-based filtering because brightness affects all three channels simultaneously.
🌈
HSV
Hue · Saturation · Value
Best for colour masking and object detection. Hue encodes pure colour largely independently of brightness. Define a range on Hue (plus loose Saturation and Value floors to exclude grey and near-black pixels) and your mask holds up across lighting conditions.
👁
Grayscale / LAB
1 channel · Perceptual
Grayscale for edge detection, thresholding, and speed (⅓ the data). LAB for perceptually uniform colour comparisons — great for skin tone detection and colour-consistency checks.
import cv2
import numpy as np

img = cv2.imread("scene.jpg")

# ── Common conversions ────────────────────────────────────────
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hsv  = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
rgb  = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
lab  = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)

# ── Practical: isolate RED objects using HSV ──────────────────
# Red wraps around 0° in Hue — needs two separate ranges
lower_red1 = np.array([  0, 120, 70])
upper_red1 = np.array([ 10, 255, 255])
lower_red2 = np.array([170, 120, 70])
upper_red2 = np.array([180, 255, 255])

mask1  = cv2.inRange(hsv, lower_red1, upper_red1)
mask2  = cv2.inRange(hsv, lower_red2, upper_red2)
mask   = cv2.bitwise_or(mask1, mask2)
result = cv2.bitwise_and(img, img, mask=mask)

cv2.imshow("Red Objects Only", result)
cv2.waitKey(0)

Section 04

Image Filtering & Blurring

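Before detecting edges or finding objects, you almost always need to reduce noise. Filtering replaces each pixel with a value computed from its neighbours. For linear filters (box, Gaussian, custom kernels) that value is a weighted sum, a mathematical operation called convolution; median and bilateral filters are non-linear variants of the same sliding-window idea.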

🧰 Fig 3 — How Kernel Convolution Works
[Figure: a 3×3 Gaussian kernel sliding over an input image patch. Kernel weights, row by row: (1/16, 2/16, 1/16), (2/16, 4/16, 2/16), (1/16, 2/16, 1/16). Each output pixel is the weighted sum of its 3×3 neighbourhood, with the centre weight (4/16) the largest. Sharp contrast gives spiky, noisy values; Gaussian smoothing gives gradual, clean transitions.]

The kernel slides one pixel at a time. Different kernel values produce blur, sharpen, edge-detect, or emboss effects.

Function | Filter Type | Removes | Best For
cv2.blur() | Box / average | General noise | Quick pre-processing
cv2.GaussianBlur() | Gaussian weighted | Gaussian noise | Before Canny edge detection
cv2.medianBlur() | Median of neighbourhood | Salt-and-pepper noise | Scanned documents, old photos
cv2.bilateralFilter() | Edge-preserving smoothing | Noise, while keeping edges | Portrait smoothing, medical images
cv2.filter2D() | Custom kernel | Anything (you define it) | Sharpen, emboss, custom effects

import cv2
import numpy as np

img = cv2.imread("noisy.jpg")

# ── Standard blur methods ─────────────────────────────────────
box       = cv2.blur(img, (5, 5))
gaussian  = cv2.GaussianBlur(img, (5, 5), sigmaX=0)
median    = cv2.medianBlur(img, 5)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

# ── Custom sharpening kernel ──────────────────────────────────
sharpen_k = np.array([[ 0, -1,  0],
                        [-1,  5, -1],
                        [ 0, -1,  0]])
sharp = cv2.filter2D(img, ddepth=-1, kernel=sharpen_k)

compare = np.hstack([img, gaussian, bilateral, sharp])
cv2.imshow("Original | Gaussian | Bilateral | Sharp", compare)
cv2.waitKey(0)
🔑
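Keep Kernel Sizes Odd

cv2.GaussianBlur() and cv2.medianBlur() require odd kernel sizes (3, 5, 7, …) so that there is a defined centre pixel; passing an even size raises a cv2.error. cv2.blur() does accept even sizes, but the window is then no longer centred on the pixel, so odd sizes are the safer habit everywhere. Start with (5, 5) and increase for more smoothing.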


Section 05

Edge Detection — Canny, Sobel & Laplacian

The Architect's Blueprint
An architect's blueprint is 95% blank space and 5% lines. Those lines carry all the information. Edge detection is how computers "read" blueprints: it strips colour and texture to reveal only the structure. John Canny published his algorithm in 1986, building on his graduate work at MIT. Nearly 40 years later it remains a default choice for edge detection in medical imaging, autonomous vehicles, and industrial inspection.
📈 Fig 4 — Canny Edge Detection: 5-Step Pipeline
[Figure: the five stages of the pipeline. 1. Grayscale: convert BGR to a single channel (COLOR_BGR2GRAY). 2. Gaussian blur: suppress noise first, e.g. GaussianBlur with a (5, 5) kernel. 3. Sobel gradient: compute Gx and Gy, then magnitude and direction per pixel. 4. Non-maximum suppression: thin every edge to one pixel wide. 5. Hysteresis threshold: keep strong edges, plus weak edges that are joined to a strong one. threshold1 is the weak cutoff and threshold2 the strong cutoff; Otsu's automatic value is one convenient way to choose them.]

Canny produces the cleanest, thinnest edges of any single-scale detector. The 5-step pipeline eliminates false edges from noise before they can propagate.

Detector | Strengths | Weaknesses | Use When
Canny | Clean thin edges, best overall | Two thresholds to tune | General purpose; almost always first choice
Sobel | Directional (X or Y separately) | Thicker edges, more noise | You need edge direction information
Laplacian | Isotropic, one pass | Very noise sensitive | Blur/focus detection (is the image sharp?)

import cv2
import numpy as np

img  = cv2.imread("building.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# ── Canny with Otsu auto-threshold ───────────────────────────
otsu_thresh, _ = cv2.threshold(blur, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
edges = cv2.Canny(blur,
                   threshold1=otsu_thresh * 0.5,
                   threshold2=otsu_thresh)

# ── Sobel — horizontal and vertical separately ───────────────
sobelX = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobelY = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel  = cv2.convertScaleAbs(cv2.magnitude(sobelX, sobelY))

# ── Laplacian — all directions at once ───────────────────────
laplacian = cv2.convertScaleAbs(cv2.Laplacian(gray, cv2.CV_64F, ksize=3))

compare = np.hstack([edges, sobel, laplacian])
cv2.imshow("Canny | Sobel | Laplacian", compare)
cv2.waitKey(0)

Section 06

Thresholding & Binary Segmentation

Thresholding converts a grayscale image into a clean black-and-white binary image. Each pixel is compared to a threshold value: above → white, below → black. It is the simplest and fastest form of segmentation.

▮ Fig 5 — Global vs Adaptive Thresholding
[Figure: two panels. Global threshold: one fixed cutoff (T=127) splits pixel intensities into 0 (black) and 255 (white); it fails under uneven lighting. Adaptive threshold: the cutoff is recomputed for each pixel from its local neighbourhood (blockSize) as the local mean minus C, so it handles shadows and gradients.]

For documents with shadows or uneven scanner lighting, always use Adaptive Gaussian. Global binary is only reliable under perfectly uniform illumination.

Type | Flag | Threshold Chosen By | Best For
Binary | THRESH_BINARY | You pick manually | Uniform lighting, clear background
Otsu's Auto | THRESH_OTSU | Automatic, assumes a bimodal histogram | Well-lit scenes, documents
Adaptive Mean | ADAPTIVE_THRESH_MEAN_C | Mean of neighbourhood | Uneven lighting
Adaptive Gaussian | ADAPTIVE_THRESH_GAUSSIAN_C | Gaussian-weighted mean (smoother) | Shadows, gradients in scene
Inverse Binary | THRESH_BINARY_INV | You pick (inverted) | Dark objects on bright background

import cv2

img  = cv2.imread("document.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# ── 1. Manual binary threshold ────────────────────────────────
_, thresh_manual = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# ── 2. Otsu's automatic threshold ────────────────────────────
_, thresh_otsu = cv2.threshold(gray, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# ── 3. Adaptive Gaussian — best for uneven lighting ──────────
thresh_adapt = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    blockSize=11,    # neighbourhood size — must be odd, > 1
    C=2              # constant subtracted from the mean
)

cv2.imshow("Adaptive Threshold", thresh_adapt)
cv2.waitKey(0)

Section 07

Contour Detection & Shape Analysis

The Factory Quality Robot
A pharmaceutical factory needs to count tablets in a blister pack and reject any pack with a missing or broken tablet — 10,000 packs per hour. The OpenCV solution: threshold the image to isolate white tablets on dark foil, call findContours() to outline each blob, measure the area of each contour, and flag anything outside the expected size range. Zero human involvement. Zero blinks. The robot paid for itself in three weeks.
🔎 Fig 6 — Shape Classification via Contour Vertices
[Figure: binary image → findContours → contours drawn → approxPolyDP → shape labels (Circle, Quad, Triangle).]
Key measurements per contour:
Area: cv2.contourArea(cnt)
Perimeter: cv2.arcLength(cnt, True)
Bounding box: cv2.boundingRect(cnt) → x, y, w, h
Centroid: cv2.moments(cnt) → (m10/m00, m01/m00)
Vertices: len(cv2.approxPolyDP(cnt, …))
Hull: cv2.convexHull(cnt)

approxPolyDP reduces a contour to its key vertices — triangle=3, rectangle=4, pentagon=5, circle=many. Use area to filter noise blobs.

import cv2
import numpy as np

img  = cv2.imread("shapes.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Found {len(contours)} objects")

output = img.copy()
for cnt in contours:
    area      = cv2.contourArea(cnt)
    perimeter = cv2.arcLength(cnt, closed=True)
    if area < 500: continue

    approx   = cv2.approxPolyDP(cnt, 0.04 * perimeter, True)
    vertices = len(approx)
    shape    = {3: "Triangle", 4: "Rectangle",
                5: "Pentagon"}.get(vertices, "Circle")

    M  = cv2.moments(cnt)
    cx = int(M["m10"] / (M["m00"] + 1e-5))
    cy = int(M["m01"] / (M["m00"] + 1e-5))

    cv2.drawContours(output, [cnt], -1, (0, 255, 0), 2)
    cv2.putText(output, shape, (cx-30, cy),
               cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

cv2.imshow("Shapes", output)
cv2.waitKey(0)

Section 08

Drawing & Annotations

OpenCV's drawing functions modify the image array in-place. Always call img.copy() first if you want to keep the original.

Function | Shape | Key Arguments
cv2.line() | Straight line | img, pt1, pt2, color, thickness
cv2.rectangle() | Axis-aligned box | img, pt1, pt2, color, thickness (-1 = filled)
cv2.circle() | Circle | img, center, radius, color, thickness
cv2.ellipse() | Ellipse / arc | img, center, axes, angle, startAngle, endAngle, color
cv2.polylines() | Polygon outline | img, [pts], isClosed, color, thickness
cv2.putText() | Text label | img, text, org, fontFace, fontScale, color, thickness
cv2.arrowedLine() | Arrow with head | img, pt1, pt2, color, thickness, tipLength

import cv2
import numpy as np

canvas = np.zeros((480, 640, 3), dtype=np.uint8)

RED    = (0,   0,   255)
GREEN  = (0,   255, 0)
BLUE   = (255, 0,   0)
YELLOW = (0,   255, 255)
WHITE  = (255, 255, 255)

cv2.line      (canvas, (50, 50),   (300, 50),  RED,   3)
cv2.rectangle (canvas, (50, 80),   (200, 180), GREEN, 2)
cv2.circle    (canvas, (350, 130), 60,          BLUE,  -1)
cv2.ellipse   (canvas, (500, 130), (70, 40), 30, 0, 360, YELLOW, 2)
pts = np.array([[100,300],[200,250],[300,350],[150,400]], np.int32)
cv2.polylines (canvas, [pts], isClosed=True, color=YELLOW, thickness=2)
cv2.putText   (canvas, "OpenCV Drawing API", (50, 450),
              cv2.FONT_HERSHEY_DUPLEX, 1.0, WHITE, 2)

cv2.imshow("Canvas", canvas)
cv2.waitKey(0)

Section 09

Geometric Transformations — Resize, Rotate, Warp

🔃 Fig 7 — Affine vs Perspective Transform
[Figure: a quadrilateral ABCD shown three ways. Original. After warpAffine (defined by 3 point pairs): parallel lines stay parallel. After warpPerspective (defined by 4 point pairs): only the straightness of lines is preserved.]

Affine needs 3 point pairs (getAffineTransform). Perspective needs 4 point pairs (getPerspectiveTransform), which makes it the right tool for flattening a photographed page, the way a document scanner app does.

Transform | Function | What It Preserves | Real Use
Resize | cv2.resize() | Aspect ratio (if fx = fy) | Normalising for ML model input
Rotation | getRotationMatrix2D + warpAffine | Parallel lines | Deskew scanned documents
Affine Warp | getAffineTransform + warpAffine | Parallel lines (3-point) | Correct camera tilt
Perspective Warp | getPerspectiveTransform + warpPerspective | Straight lines only | Document scanner, bird's-eye road
Flip | cv2.flip() | Shape and values | Data augmentation
Crop (ROI) | img[y1:y2, x1:x2] | Pixel values exactly | Region of interest extraction

import cv2
import numpy as np

img  = cv2.imread("photo.jpg")
h, w = img.shape[:2]

# ── 1. Resize to half ─────────────────────────────────────────
half = cv2.resize(img, (0, 0), fx=0.5, fy=0.5,
                   interpolation=cv2.INTER_AREA)

# ── 2. Rotate 45° around image centre ────────────────────────
M_rot  = cv2.getRotationMatrix2D((w//2, h//2), angle=45, scale=1.0)
rotated = cv2.warpAffine(img, M_rot, (w, h))

# ── 3. Perspective warp — flatten a document ─────────────────
src = np.float32([[73,239],[356,117],[475,265],[187,391]])
dst = np.float32([[0,0],[300,0],[300,400],[0,400]])
M_persp = cv2.getPerspectiveTransform(src, dst)
warped  = cv2.warpPerspective(img, M_persp, (300, 400))

# ── 4. Crop an ROI ────────────────────────────────────────────
roi = img[100:300, 150:400]    # [y1:y2, x1:x2]

cv2.imshow("Rotated", rotated)
cv2.imshow("Warped",  warped)
cv2.waitKey(0)

Section 10

Histograms & Contrast Equalization

A histogram counts how many pixels exist at each intensity level (0–255). A dark image clusters near 0. Equalization stretches the distribution across the full range — dramatically improving visibility in medical scans and surveillance footage.

📊 Fig 8 — Dark Image vs After CLAHE Equalization
[Figure: two histograms of pixel intensity (0–255) against count. Original (dark) image: counts bunched near 0. After CLAHE equalization: counts spread across the full range.]

CLAHE divides the image into tiles and equalises each tile separately (tileGridSize), capping amplification at clipLimit to avoid over-boosting noise.

import cv2
import numpy as np

img  = cv2.imread("dark_xray.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

hist = cv2.calcHist([gray], channels=[0], mask=None,
                     histSize=[256], ranges=[0, 256])

# Global equalization (avoid — amplifies noise in uniform areas)
eq_global = cv2.equalizeHist(gray)

# CLAHE — always prefer this for real images
clahe    = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
eq_clahe = clahe.apply(gray)

cv2.imshow("Original", gray)
cv2.imshow("CLAHE", eq_clahe)
cv2.waitKey(0)

Section 11

Face Detection with Haar Cascades

The Algorithm Behind the Little Green Square
Paul Viola and Michael Jones published their face detector in 2001. Within a few years, detectors of this kind were running inside consumer digital cameras: the little green square that appears when a face comes into frame. It works by rapidly testing simple brightness differences (Haar features) at many positions and scales, rejecting non-faces quickly through a cascade of increasingly strict classifiers. Pre-trained XML files ship with the opencv-python package (under cv2.data.haarcascades), so no training is required.
👀 Fig 9 — Haar Features & Face Detection Cascade
[Figure: three Haar feature types and the cascade. Edge feature: a light region next to a dark region (forehead or chin line). Line feature: a dark stripe (nose bridge / eyes). Centre-surround feature: eye or nostril pairs. Cascade classifier: Stage 1 uses 2 features and rejects 50% of windows; Stage 2 uses 5 features and rejects 30%; Stage N uses 200+ features and accepts the window as a face.]

The cascade rejects non-faces early (cheap stages) and only runs expensive stages on candidate regions — enabling real-time detection at 30+ fps.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade  = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

img  = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(
    gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
print(f"Detected {len(faces)} face(s)")

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    roi_gray  = gray[y:y+h, x:x+w]
    roi_color = img [y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray, 1.1, 3)
    for (ex, ey, ew, eh) in eyes:
        cv2.circle(roi_color, (ex+ew//2, ey+eh//2), ew//2, (255,0,0), 2)

cv2.imshow("Face Detection", img)
cv2.waitKey(0)
OUTPUT
Detected 4 face(s)

Section 12

Real-Time Video Processing

Every video is a sequence of images (frames) delivered at a fixed frame rate. OpenCV's VideoCapture reads frames one by one — from a webcam, file, or IP stream — so you can apply any image algorithm to each frame.

🎥 Fig 10 — Real-Time Video Processing Loop
[Figure: the video loop. cap = cv2.VideoCapture(0) → read frame (ret, frame = cap.read()) → process (blur, detect, annotate the frame) → display (cv2.imshow(), then waitKey(1)) → exit check (key == 'q'? then cap.release()), otherwise loop back for the next frame. Target: about 30 fps.]

Use waitKey(1) (1 ms) — not waitKey(0) — inside a video loop. waitKey(0) blocks indefinitely and freezes the feed.

import cv2

cap = cv2.VideoCapture(0)     # 0 = default webcam

if not cap.isOpened():
    raise IOError("Cannot access camera")

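# Request 1280x720; the driver may ignore this and pick another size
cap.set(cv2.CAP_PROP_FRAME_WIDTH,  1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

# Ask the camera what it will actually deliver. If the writer's frame size
# does not match the frames written, the output file is typically unplayable.
width  = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("output.mp4", fourcc, 30, (width, height))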

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

while True:
    ret, frame = cap.read()
    if not ret: break

    gray  = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x,y), (x+w,y+h), (0,255,0), 2)

    writer.write(frame)
    cv2.imshow("Live — press q to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"): break

cap.release()
writer.release()
cv2.destroyAllWindows()

Section 13

Morphological Operations

Separating Touching Cells
A biologist is counting cells in a microscope image. Problem: several cells are touching each other, so findContours counts them as one blob. Apply Erosion to shrink every white region — the cells separate. Apply Dilation to grow them back — they return to their original size but are now cleanly separated. That sequence is called Opening and it is one of the most powerful tricks in binary image processing.
🧪 Fig 11 — Effect of Each Morphological Operation
[Figure: one white foreground shape on a black background, shown after each operation, with changed pixels highlighted. Original; Erosion; Dilation; Opening (erode then dilate): separates and cleans; Closing (dilate then erode): fills gaps; Gradient: outlines only.]

Structuring element shape (RECT, ELLIPSE, CROSS) affects how corners and curves are handled. ELLIPSE is usually best for natural objects.

Operation | Effect on White Regions | Practical Use
Erosion | Shrinks blobs, removes thin protrusions | Separate touching objects
Dilation | Grows blobs, fills small holes | Connect broken contours
Opening (E→D) | Removes small bright blobs | Noise removal without shrinking
Closing (D→E) | Fills small dark holes inside bright regions | Close gaps in contour lines
Morphological Gradient | Outline of objects (D − E) | Edge detection alternative
Top Hat | Bright structures smaller than kernel | Uneven background correction

import cv2
import numpy as np

img  = cv2.imread("cells.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

kernel_ellp = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
kernel_rect = cv2.getStructuringElement(cv2.MORPH_RECT,    (5, 5))

eroded   = cv2.erode    (bw, kernel_ellp, iterations=1)
dilated  = cv2.dilate   (bw, kernel_ellp, iterations=1)
opened   = cv2.morphologyEx(bw, cv2.MORPH_OPEN,     kernel_ellp)
closed   = cv2.morphologyEx(bw, cv2.MORPH_CLOSE,    kernel_ellp)
gradient = cv2.morphologyEx(bw, cv2.MORPH_GRADIENT, kernel_rect)
tophat   = cv2.morphologyEx(bw, cv2.MORPH_TOPHAT,   kernel_rect)

compare = np.hstack([bw, eroded, dilated, opened, closed])
cv2.imshow("Original | Erode | Dilate | Open | Close", compare)
cv2.waitKey(0)

Section 14

Template Matching

Template matching slides a small reference image (template) over a larger scene, computing a similarity score at every position. The peak score marks where the template is found.

import cv2
import numpy as np

img      = cv2.imread("scene.jpg")
template = cv2.imread("logo.png")
th, tw   = template.shape[:2]

result    = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
locations = np.where(result >= 0.80)

output = img.copy()
for pt in zip(*locations[::-1]):
    cv2.rectangle(output, pt, (pt[0]+tw, pt[1]+th), (0,0,255), 2)

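# This counts every position scoring above the threshold, so a single object
# usually contributes several overlapping hits
print(f"Found {len(locations[0])} candidate position(s)")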
cv2.imshow("Template Match", output)
cv2.waitKey(0)
ℹ️
TM_CCOEFF_NORMED — Almost Always the Right Choice

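TM_CCOEFF_NORMED computes a normalised correlation coefficient, so scores fall between −1 and 1 and are largely unaffected by overall image brightness. Template matching only works when the object appears at the same scale and orientation as the template. For scale- or rotation-invariant matching, use feature matching instead, for example ORB descriptors with a brute-force Hamming matcher (or FLANN configured with an LSH index).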


Section 15

Golden Rules

📷 OpenCV — Non-Negotiable Rules
1. Always check for None after imread(). OpenCV returns None on a bad path instead of raising, and every subsequent call then fails with a cryptic AttributeError or cv2.error. One "if img is None: raise ..." check saves hours of debugging.
2. Images are BGR, not RGB. Convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before passing to Matplotlib, PIL, TensorFlow, or any other library. The sky turns orange if you forget.
3. Always use img.copy() before drawing or modifying if you want to preserve the original. Drawing functions modify arrays in-place with no undo.
4. Blur before you detect edges or threshold. Canny, Sobel, and Otsu's method all behave much better after a light Gaussian blur. GaussianBlur(gray, (5,5), 0) is almost never wasted.
5. Use HSV for colour-based masking, not BGR. In HSV, Hue carries the colour largely independently of brightness, so one range works across indoor, outdoor, daylight, and shadow. In BGR you must fight all three channels simultaneously.
6. Always call cap.release() and cv2.destroyAllWindows() when your video loop exits. A camera that is never released can stay locked, so your next run may not be able to open it until the Python process fully exits.
7. Use CLAHE, not equalizeHist(), for contrast enhancement. Global equalization amplifies noise. CLAHE limits amplification per local tile and usually produces visibly better results on medical images, dark video, and satellite imagery.