The Story That Explains OpenCV
OpenCV (Open Source Computer Vision Library) is the Swiss Army knife that lets you manipulate, analyse, and extract meaning from the grids of numbers that make up a digital image. First built at Intel in 1999, it now powers everything from NASA rover cameras to your phone's portrait mode, and it is free and open source.
At its core, every colour image in OpenCV's Python bindings is a NumPy array of shape (Height, Width, Channels), usually with data type uint8 (values 0–255); a grayscale image drops the last axis and has shape (Height, Width). Most algorithms are simply functions that take that array in and return a transformed array.
OpenCV reads images as BGR arrays by default — not RGB. All processing happens on NumPy arrays in memory.
OpenCV stores colour channels in Blue-Green-Red order — the opposite of what Matplotlib, PIL, and most libraries expect. Always convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before passing to any other library. Forgetting this is the most common beginner mistake.
Installation
# Standard install — everything needed for this tutorial
pip install opencv-python numpy
# Full build with extra contrib modules (SIFT, SURF, aruco…)
# Install this instead of opencv-python, not alongside it: the two packages conflict
pip install opencv-contrib-python
Reading, Writing & Displaying Images
These are the functions you will type more than any others. Most OpenCV scripts begin and end with them.
| Function | Purpose | Key Flag / Parameter |
|---|---|---|
| cv2.imread() | Load image from disk | IMREAD_COLOR · IMREAD_GRAYSCALE · IMREAD_UNCHANGED |
| cv2.imshow() | Display in a pop-up window | window name string, image array |
| cv2.imwrite() | Save image to disk | Extension sets format automatically (.jpg/.png/.bmp) |
| cv2.waitKey() | Pause execution for keyboard input | 0 = wait forever · n = wait n ms |
| cv2.destroyAllWindows() | Close all display windows | (none) |
import cv2
import numpy as np
# ── Load an image from disk ───────────────────────────────────
img = cv2.imread("photo.jpg") # BGR, shape (H, W, 3)
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE) # shape (H, W)
rgba = cv2.imread("logo.png", cv2.IMREAD_UNCHANGED) # keeps alpha channel
# ── Guard against failed loads ────────────────────────────────
if img is None:
    raise FileNotFoundError("Image not found: check the path")
# ── Inspect the array ─────────────────────────────────────────
print(f"Shape : {img.shape}") # (480, 640, 3)
print(f"Dtype : {img.dtype}") # uint8
print(f"Pixels: {img.size}") # H × W × C total values
# ── Display ───────────────────────────────────────────────────
cv2.imshow("Original", img)
cv2.imshow("Grayscale", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
# ── Save with quality control ─────────────────────────────────
cv2.imwrite("output.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 95])
cv2.imwrite("output.png", img) # PNG is lossless
If the file path is wrong, cv2.imread() does not raise an exception — it returns None. Every subsequent operation will crash with a cryptic error. Always add a None check immediately after loading.
Color Spaces & Conversions
Switch to HSV. OpenCV stores 8-bit hue as degrees divided by two (0–179), and red stays near 0–10 and 170–179 on that scale largely regardless of sunlight. You can now write a single two-range mask that keeps working in rain, noon glare, and tunnel shadow. Choosing the right color space often matters more than the choice of algorithm.
HSV isolates colour (Hue) from brightness — making masks robust to lighting changes. Use GRAY when you only need intensity.
import cv2
import numpy as np
img = cv2.imread("scene.jpg")
# ── Common conversions ────────────────────────────────────────
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
# ── Practical: isolate RED objects using HSV ──────────────────
# Red wraps around 0° in Hue — needs two separate ranges
lower_red1 = np.array([ 0, 120, 70])
upper_red1 = np.array([ 10, 255, 255])
lower_red2 = np.array([170, 120, 70])
upper_red2 = np.array([180, 255, 255])
mask1 = cv2.inRange(hsv, lower_red1, upper_red1)
mask2 = cv2.inRange(hsv, lower_red2, upper_red2)
mask = cv2.bitwise_or(mask1, mask2)
result = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow("Red Objects Only", result)
cv2.waitKey(0)
Image Filtering & Blurring
Before detecting edges or finding objects, you almost always need to reduce noise. Linear filters replace each pixel with a weighted combination of its neighbours, a mathematical operation called convolution; median and bilateral filters use the same sliding window but are nonlinear.
The kernel slides one pixel at a time. Different kernel values produce blur, sharpen, edge-detect, or emboss effects.
| Function | Filter Type | Removes | Best For |
|---|---|---|---|
| cv2.blur() | Box / Average | General noise | Quick pre-processing |
| cv2.GaussianBlur() | Gaussian weighted | Gaussian noise | Before Canny edge detection |
| cv2.medianBlur() | Median of neighbourhood | Salt-and-pepper noise | Scanned documents, old photos |
| cv2.bilateralFilter() | Edge-preserving smooth | Noise, keeps edges | Portrait smoothing, medical |
| cv2.filter2D() | Custom kernel | Any (you define it) | Sharpen, emboss, custom effects |
import cv2
import numpy as np
img = cv2.imread("noisy.jpg")
# ── Standard blur methods ─────────────────────────────────────
box = cv2.blur(img, (5, 5))
gaussian = cv2.GaussianBlur(img, (5, 5), sigmaX=0)
median = cv2.medianBlur(img, 5)
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
# ── Custom sharpening kernel ──────────────────────────────────
sharpen_k = np.array([[ 0, -1,  0],
                      [-1,  5, -1],
                      [ 0, -1,  0]])
sharp = cv2.filter2D(img, ddepth=-1, kernel=sharpen_k)
compare = np.hstack([img, gaussian, bilateral, sharp])
cv2.imshow("Original | Gaussian | Bilateral | Sharp", compare)
cv2.waitKey(0)
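cv2.GaussianBlur() and cv2.medianBlur() require odd kernel sizes (3, 5, 7, …) so there is a defined centre pixel; passing an even number raises a cv2.error. cv2.blur() accepts even sizes, but odd sizes keep the result centred. Start with (5, 5) and increase for more smoothing.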
Edge Detection — Canny, Sobel & Laplacian
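Canny usually produces cleaner, thinner edges than simple gradient detectors. Its five-step pipeline (Gaussian smoothing, gradient computation, non-maximum suppression, double thresholding, and edge tracking by hysteresis) suppresses most false edges caused by noise before they can propagate.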
| Detector | Strengths | Weaknesses | Use When |
|---|---|---|---|
| Canny | Clean thin edges, best overall | Two thresholds to tune | General purpose — almost always first choice |
| Sobel | Directional (X or Y separately) | Thicker edges, more noise | You need edge direction information |
| Laplacian | Isotropic, one pass | Very noise sensitive | Blur/focus detection (is image sharp?) |
import cv2
import numpy as np
img = cv2.imread("building.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# ── Canny with Otsu auto-threshold ───────────────────────────
otsu_thresh, _ = cv2.threshold(blur, 0, 255,
                               cv2.THRESH_BINARY + cv2.THRESH_OTSU)
edges = cv2.Canny(blur,
                  threshold1=otsu_thresh * 0.5,
                  threshold2=otsu_thresh)
# ── Sobel — horizontal and vertical separately ───────────────
sobelX = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobelY = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel = cv2.convertScaleAbs(cv2.magnitude(sobelX, sobelY))
# ── Laplacian — all directions at once ───────────────────────
laplacian = cv2.convertScaleAbs(cv2.Laplacian(gray, cv2.CV_64F, ksize=3))
compare = np.hstack([edges, sobel, laplacian])
cv2.imshow("Canny | Sobel | Laplacian", compare)
cv2.waitKey(0)
Thresholding & Binary Segmentation
Thresholding converts a grayscale image into a clean black-and-white binary image. Each pixel is compared to a threshold value: above → white, below → black. It is the simplest and fastest form of segmentation.
For documents with shadows or uneven scanner lighting, prefer Adaptive Gaussian. A single global threshold is only reliable when the illumination is close to uniform.
| Type | Flag | Threshold Chosen By | Best For |
|---|---|---|---|
| Binary | THRESH_BINARY | You pick manually | Uniform lighting, clear background |
| Otsu's Auto | THRESH_OTSU | Auto — bimodal histogram | Well-lit scenes, documents |
| Adaptive Mean | ADAPTIVE_THRESH_MEAN_C | Mean of neighbourhood | Uneven lighting |
| Adaptive Gaussian | ADAPTIVE_THRESH_GAUSSIAN_C | Weighted mean — smoother | Shadows, gradients in scene |
| Inverse Binary | THRESH_BINARY_INV | You pick (inverted) | Dark objects on bright background |
import cv2
img = cv2.imread("document.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# ── 1. Manual binary threshold ────────────────────────────────
_, thresh_manual = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# ── 2. Otsu's automatic threshold ────────────────────────────
_, thresh_otsu = cv2.threshold(gray, 0, 255,
cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# ── 3. Adaptive Gaussian — best for uneven lighting ──────────
thresh_adapt = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    blockSize=11,  # neighbourhood size (must be odd, > 1)
    C=2            # constant subtracted from the weighted mean
)
cv2.imshow("Adaptive Threshold", thresh_adapt)
cv2.waitKey(0)
Contour Detection & Shape Analysis
Call findContours() to outline each blob, measure the area of each contour, and flag anything outside the expected size range. Zero human involvement. Zero blinks. The robot paid for itself in three weeks.
approxPolyDP reduces a contour to its key vertices — triangle=3, rectangle=4, pentagon=5, circle=many. Use area to filter noise blobs.
import cv2
import numpy as np
img = cv2.imread("shapes.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Found {len(contours)} objects")
output = img.copy()
for cnt in contours:
    area = cv2.contourArea(cnt)
    perimeter = cv2.arcLength(cnt, closed=True)
    if area < 500:  # skip small noise blobs
        continue
    approx = cv2.approxPolyDP(cnt, 0.04 * perimeter, True)
    vertices = len(approx)
    shape = {3: "Triangle", 4: "Rectangle",
             5: "Pentagon"}.get(vertices, "Circle")
    M = cv2.moments(cnt)
    cx = int(M["m10"] / (M["m00"] + 1e-5))
    cy = int(M["m01"] / (M["m00"] + 1e-5))
    cv2.drawContours(output, [cnt], -1, (0, 255, 0), 2)
    cv2.putText(output, shape, (cx-30, cy),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
cv2.imshow("Shapes", output)
cv2.waitKey(0)
Drawing & Annotations
OpenCV's drawing functions modify the image array in-place. Always call img.copy() first if you want to keep the original.
| Function | Shape | Key Arguments |
|---|---|---|
| cv2.line() | Straight line | img, pt1, pt2, color, thickness |
| cv2.rectangle() | Axis-aligned box | img, pt1, pt2, color, thickness (-1=filled) |
| cv2.circle() | Circle | img, center, radius, color, thickness |
| cv2.ellipse() | Ellipse / arc | img, center, axes, angle, startAngle, endAngle, color |
| cv2.polylines() | Polygon outline | img, [pts], isClosed, color, thickness |
| cv2.putText() | Text label | img, text, org, fontFace, fontScale, color, thickness |
| cv2.arrowedLine() | Arrow with head | img, pt1, pt2, color, thickness, tipLength |
import cv2
import numpy as np
canvas = np.zeros((480, 640, 3), dtype=np.uint8)
RED = (0, 0, 255)
GREEN = (0, 255, 0)
BLUE = (255, 0, 0)
YELLOW = (0, 255, 255)
WHITE = (255, 255, 255)
cv2.line(canvas, (50, 50), (300, 50), RED, 3)
cv2.rectangle(canvas, (50, 80), (200, 180), GREEN, 2)
cv2.circle(canvas, (350, 130), 60, BLUE, -1)
cv2.ellipse(canvas, (500, 130), (70, 40), 30, 0, 360, YELLOW, 2)
pts = np.array([[100,300],[200,250],[300,350],[150,400]], np.int32)
cv2.polylines(canvas, [pts], isClosed=True, color=YELLOW, thickness=2)
cv2.putText(canvas, "OpenCV Drawing API", (50, 450),
            cv2.FONT_HERSHEY_DUPLEX, 1.0, WHITE, 2)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
Geometric Transformations — Resize, Rotate, Warp
An affine transform is defined by 3 point pairs (getAffineTransform). A perspective transform needs 4 point pairs (getPerspectiveTransform), which makes it a good fit for flattening a photographed document in a scanner app.
| Transform | Function | What It Preserves | Real Use |
|---|---|---|---|
| Resize | cv2.resize() | Aspect (if fx=fy) | Normalising for ML model input |
| Rotation | getRotationMatrix2D + warpAffine | Parallel lines | Deskew scanned documents |
| Affine Warp | getAffineTransform + warpAffine | Parallel lines (3-point) | Correct camera tilt |
| Perspective Warp | getPerspectiveTransform + warpPerspective | Straight lines only | Document scanner, bird's-eye road |
| Flip | cv2.flip() | Shape and values | Data augmentation |
| Crop (ROI) | img[y1:y2, x1:x2] | Pixel values exactly | Region of interest extraction |
import cv2
import numpy as np
img = cv2.imread("photo.jpg")
h, w = img.shape[:2]
# ── 1. Resize to half ─────────────────────────────────────────
half = cv2.resize(img, (0, 0), fx=0.5, fy=0.5,
interpolation=cv2.INTER_AREA)
# ── 2. Rotate 45° around image centre ────────────────────────
M_rot = cv2.getRotationMatrix2D((w//2, h//2), angle=45, scale=1.0)
rotated = cv2.warpAffine(img, M_rot, (w, h))
# ── 3. Perspective warp — flatten a document ─────────────────
src = np.float32([[73,239],[356,117],[475,265],[187,391]])
dst = np.float32([[0,0],[300,0],[300,400],[0,400]])
M_persp = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M_persp, (300, 400))
# ── 4. Crop an ROI ────────────────────────────────────────────
roi = img[100:300, 150:400] # [y1:y2, x1:x2]
cv2.imshow("Rotated", rotated)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
Histograms & Contrast Equalization
A histogram counts how many pixels exist at each intensity level (0–255). A dark image clusters near 0. Equalization stretches the distribution across the full range — dramatically improving visibility in medical scans and surveillance footage.
CLAHE divides the image into tiles and equalises each tile separately (tileGridSize), capping amplification at clipLimit to avoid over-boosting noise.
import cv2
import numpy as np
img = cv2.imread("dark_xray.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hist = cv2.calcHist([gray], channels=[0], mask=None,
histSize=[256], ranges=[0, 256])
# Global equalization (avoid — amplifies noise in uniform areas)
eq_global = cv2.equalizeHist(gray)
# CLAHE — always prefer this for real images
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
eq_clahe = clahe.apply(gray)
cv2.imshow("Original", gray)
cv2.imshow("CLAHE", eq_clahe)
cv2.waitKey(0)
Face Detection with Haar Cascades
The cascade rejects non-faces early (cheap stages) and only runs expensive stages on candidate regions — enabling real-time detection at 30+ fps.
import cv2
face_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + "haarcascade_eye.xml")
img = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(
    gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
print(f"Detected {len(faces)} face(s)")
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    # Search for eyes only inside the detected face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray, 1.1, 3)
    for (ex, ey, ew, eh) in eyes:
        cv2.circle(roi_color, (ex+ew//2, ey+eh//2), ew//2, (255,0,0), 2)
cv2.imshow("Face Detection", img)
cv2.waitKey(0)
Real-Time Video Processing
Every video is a sequence of images (frames) delivered at a fixed frame rate. OpenCV's VideoCapture reads frames one by one — from a webcam, file, or IP stream — so you can apply any image algorithm to each frame.
Use waitKey(1) (1 ms) — not waitKey(0) — inside a video loop. waitKey(0) blocks indefinitely and freezes the feed.
import cv2
cap = cv2.VideoCapture(0) # 0 = default webcam
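if not cap.isOpened():
    raise IOError("Cannot access camera")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
# The camera may ignore the requested size, so read back what it actually delivers;
# VideoWriter needs a frame size that matches the frames it is given
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("output.mp4", fourcc, 30, (width, height))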
face_cascade = cv2.CascadeClassifier(
cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x,y), (x+w,y+h), (0,255,0), 2)
    writer.write(frame)
    cv2.imshow("Live: press q to quit", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
writer.release()
cv2.destroyAllWindows()
Morphological Operations
When objects touch, findContours counts them as one blob. Apply erosion to shrink every white region until the cells separate, then apply dilation to grow them back to roughly their original size while keeping them cleanly separated. That sequence is called opening, and it is one of the most useful tricks in binary image processing.
Structuring element shape (RECT, ELLIPSE, CROSS) affects how corners and curves are handled. ELLIPSE is usually best for natural objects.
| Operation | Effect on White Regions | Practical Use |
|---|---|---|
| Erosion | Shrinks blobs, removes thin protrusions | Separate touching objects |
| Dilation | Grows blobs, fills small holes | Connect broken contours |
| Opening (E→D) | Removes small bright blobs | Noise removal without shrinking |
| Closing (D→E) | Fills small dark holes inside bright regions | Close gaps in contour lines |
| Morphological Gradient | Outline of objects (D − E) | Edge detection alternative |
| Top Hat | Bright structures smaller than kernel | Uneven background correction |
import cv2
import numpy as np
img = cv2.imread("cells.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
kernel_ellp = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
kernel_rect = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
eroded = cv2.erode(bw, kernel_ellp, iterations=1)
dilated = cv2.dilate(bw, kernel_ellp, iterations=1)
opened = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel_ellp)
closed = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel_ellp)
gradient = cv2.morphologyEx(bw, cv2.MORPH_GRADIENT, kernel_rect)
tophat = cv2.morphologyEx(bw, cv2.MORPH_TOPHAT, kernel_rect)
compare = np.hstack([bw, eroded, dilated, opened, closed])
cv2.imshow("Original | Erode | Dilate | Open | Close", compare)
cv2.waitKey(0)
Template Matching
Template matching slides a small reference image (template) over a larger scene, computing a similarity score at every position. The peak score marks where the template is found.
import cv2
import numpy as np
img = cv2.imread("scene.jpg")
template = cv2.imread("logo.png")
th, tw = template.shape[:2]
result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
locations = np.where(result >= 0.80)
output = img.copy()
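for pt in zip(*locations[::-1]):  # reverse (rows, cols) into (x, y) points
    cv2.rectangle(output, pt, (pt[0]+tw, pt[1]+th), (0,0,255), 2)
# Neighbouring positions around one object often all pass the threshold,
# so this counts matching positions, not distinct objects
print(f"Found {len(locations[0])} matching position(s)")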
cv2.imshow("Template Match", output)
cv2.waitKey(0)
Normalised cross-correlation (TM_CCOEFF_NORMED) gives scores between −1 and 1 and is insensitive to uniform changes in brightness and contrast. Template matching only works when the object appears at the same scale and orientation as the template. For scale- or rotation-invariant matching, use ORB feature matching instead (for example with a brute-force Hamming matcher, or FLANN configured with an LSH index).
Golden Rules
Always check for None after imread(). OpenCV silently returns None on a bad path, and every subsequent call crashes with a cryptic AttributeError or shape error. One if img is None: raise check saves hours of debugging.
Convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before passing to Matplotlib, PIL, TensorFlow, or any other library. The sky turns orange if you forget.
Call img.copy() before drawing or modifying if you want to preserve the original. Drawing functions modify arrays in-place with no undo.
Blur before detecting edges or thresholding. A GaussianBlur(gray, (5,5), 0) is almost never wasted.
Call cap.release() and cv2.destroyAllWindows() when your video loop exits. Failing to release the camera can leave it locked, so your next run cannot access it until Python fully exits.
Prefer CLAHE over equalizeHist() for contrast enhancement. Global equalization amplifies noise. CLAHE limits amplification per local tile and usually produces visibly better results on medical images, dark video, and satellite imagery.