The Story That Explains Feature Detection
Imagine two completely different ships, one arriving from the north and one from the east, both spotting the same volcano. Despite seeing it from different angles, in different light, and from different distances, they both recognise the same landmark. That is feature detection.
A feature detector is an algorithm that finds those distinctive landmarks in an image: the corners, blobs, and other distinctive structures that remain recognisable even when the image is zoomed, rotated, or seen from a different viewpoint. A feature descriptor then gives each landmark a distinctive fingerprint, a compact numerical representation, so that the same landmark can be matched across two completely different images.
In computer vision, Feature Detection is the process of automatically finding interesting points in an image — locations that carry distinctive, repeatable information. Paired with a descriptor that encodes the local appearance around each point, features become the backbone of image matching, panorama stitching, object tracking, 3D reconstruction, and robot navigation.
Every feature-based vision system follows three steps: (1) Detection — find keypoints (distinctive locations in the image). (2) Description — compute a numerical descriptor vector for each keypoint's local neighbourhood. (3) Matching — compare descriptors between images to find corresponding point pairs. Getting all three right is what separates a working system from a failing one.
What Makes a Good Feature?
Not every point in an image is a useful feature. A pixel in the middle of a smooth wall looks exactly like every other pixel around it — useless for matching. A corner, a blob centre, or a salient texture region is far more distinctive. Good features share four essential properties:
A single straight edge is a terrible feature. If you look through a small window (aperture) at a line, you can tell it moved perpendicular to itself — but you cannot tell how far it moved along its length. This ambiguity, the aperture problem, is why edges are unreliable landmarks. Corners do not suffer from this — they constrain motion in all directions.
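To put numbers on the aperture problem, here is a small NumPy sketch (our own illustration, not a library API) that builds the 2×2 structure tensor M from image gradients over a patch and returns its eigenvalues. np.gradient stands in for a Sobel filter, and the three 9×9 test patches are made up for the example. An edge produces one eigenvalue of zero (no information along the edge), while a corner produces two large eigenvalues.

```python
import numpy as np

def structure_tensor_eigs(patch):
    """Eigenvalues of M = sum over the patch of [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]]."""
    gy, gx = np.gradient(patch.astype(np.float64))  # np.gradient returns d/drow, d/dcol
    M = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    return np.linalg.eigvalsh(M)  # ascending order: (lambda_small, lambda_large)

flat = np.ones((9, 9))
edge = np.zeros((9, 9)); edge[:, 5:] = 1.0       # vertical step edge
corner = np.zeros((9, 9)); corner[5:, 5:] = 1.0  # bright quadrant: an L-shaped corner

for name, patch in [("flat", flat), ("edge", edge), ("corner", corner)]:
    lam_small, lam_large = structure_tensor_eigs(patch)
    print(f"{name:6}: lambda_small={lam_small:.3f}, lambda_large={lam_large:.3f}")
```

The edge patch has a zero smallest eigenvalue: sliding the window along the edge changes nothing, which is exactly the ambiguity described above.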
Harris Corner Detector — The Classic Foundation
In 1988, Chris Harris and Mike Stephens had a simple insight: take a small square patch of an image and slide it in every direction. On a flat surface, intensity barely changes as you slide. On an edge, intensity changes strongly in one direction but not in the perpendicular one. On a corner, intensity changes strongly in every direction.
That "change in every direction" is exactly what a corner is. Harris and Stephens formalised this using a structure tensor (the second-moment matrix of image gradients), then distilled it into a single response value R. Thirty-five years later, it is still the first detector taught in most computer vision courses.
The Harris detector computes image gradients Ix and Iy, builds the structure tensor M (the sums of Ix², IxIy, and Iy² over each pixel's local neighbourhood), then scores the pixel with R = det(M) − k·trace(M)². Because det(M) = λ₁λ₂ and trace(M) = λ₁ + λ₂, R depends on M's eigenvalues without ever computing them explicitly, which would be slower.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load image and convert to greyscale
img = cv2.imread('chessboard.jpg')
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
grey_f = np.float32(grey)
# Harris Corner Detection
# blockSize = neighbourhood size for M computation
# ksize = Sobel kernel aperture size
# k = Harris free parameter (0.04 – 0.06)
harris = cv2.cornerHarris(grey_f, blockSize=2, ksize=3, k=0.04)
# Threshold the raw (undilated) response: pixels with a strong corner response
corner_mask = harris > 0.01 * harris.max()
# Dilate a copy only for display, so each corner is drawn as a visible red blob
img_corners = img.copy()
img_corners[cv2.dilate(harris, None) > 0.01 * harris.max()] = [0, 0, 255]  # red in BGR
# Several neighbouring pixels fire per corner: group them into connected blobs
n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(corner_mask.astype(np.uint8))
print(f"Corner pixels above threshold : {int(corner_mask.sum())}")
print(f"Corner blobs (≈ corners)      : {n_labels - 1}")  # label 0 is the background
print(f"Max response value            : {harris.max():.4f}")
print(f"Image shape                   : {grey.shape}")
# Sub-pixel accuracy refinement (optional but recommended): refine each blob centroid
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
corners_sub = cv2.cornerSubPix(grey_f, np.float32(centroids[1:]), (5, 5), (-1, -1), criteria)
print(f"Sub-pixel refined corners     : {len(corners_sub)}")
| R Value | λ₁, λ₂ Relationship | Region Type | Action |
|---|---|---|---|
| R ≫ 0 | Both large & similar | Corner ✓ | Keep as keypoint |
| R ≪ 0 | λ₁ ≫ λ₂ or vice versa | Edge ✗ | Discard — unreliable |
| |R| ≈ 0 | Both small | Flat region ✗ | Discard — no information |
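The sign rules in the table follow directly from R = det(M) − k·trace(M)² = λ₁λ₂ − k(λ₁ + λ₂)². The following NumPy-only sketch computes that response for every pixel so the arithmetic is visible. It is an illustration, not a reimplementation of cv2.cornerHarris: it uses np.gradient instead of a Sobel kernel and a plain box window, and the helper names box_sum and harris_response are our own.

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over a (2r+1) x (2r+1) window around each pixel (zero padding)."""
    p = np.pad(a, r)
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, k=0.04, r=1):
    """Harris response R = det(M) - k * trace(M)**2 for every pixel."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)            # central differences stand in for Sobel
    Sxx = box_sum(Ix * Ix, r)            # structure tensor entries summed over the window
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2           # = lambda1 * lambda2
    trace = Sxx + Syy                    # = lambda1 + lambda2
    return det - k * trace ** 2

# Synthetic test image: a bright 20x20 square on a dark 40x40 background
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
print(f"corner (10,10): R = {R[10, 10]:+.4f}")   # both eigenvalues large -> R > 0
print(f"edge   (10,20): R = {R[10, 20]:+.4f}")   # one large eigenvalue   -> R < 0
print(f"flat   (20,20): R = {R[20, 20]:+.4f}")   # no gradient            -> R = 0
```

The corner of the square scores positive, the middle of its top edge scores negative, and the uniform interior scores zero, matching the three rows of the table.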
Harris detects corners at a fixed scale determined by the window size and Sobel kernel. A corner visible at 1× magnification may not be detected at 0.5×. This is the fundamental limitation that motivated SIFT, which David Lowe first presented about a decade later, in 1999. For matching images with significant zoom differences, Harris alone is insufficient.
FAST — Features from Accelerated Segment Test
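Harris corners are accurate but comparatively slow, because computing gradients and a structure tensor for every pixel is expensive. In 2006, Edward Rosten and Tom Drummond published FAST, which uses a segment test on a 16-pixel circle around each candidate: the pixel is a corner if a contiguous arc of N circle pixels (9 for FAST-9) are all brighter, or all darker, than the centre by more than a threshold t. Most pixels can be rejected after checking only a few circle pixels, which makes FAST much faster than Harris (often by an order of magnitude or more, depending on image content and implementation). FAST later became the detector half of ORB, one of the most widely used real-time feature pipelines today.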
import cv2
import numpy as np
img = cv2.imread('street.jpg')
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Create FAST detector
fast = cv2.FastFeatureDetector_create(
threshold=20, # intensity difference threshold t
nonmaxSuppression=True, # remove clustered detections
type=cv2.FastFeatureDetector_TYPE_9_16 # FAST-9 variant
)
# Detect keypoints
kp = fast.detect(grey, None)
print(f"Keypoints (NMS ON) : {len(kp)}")
# Compare without non-maximum suppression
fast.setNonmaxSuppression(False)
kp_no_nms = fast.detect(grey, None)
print(f"Keypoints (NMS OFF): {len(kp_no_nms)}")
# Draw keypoints
img_kp = cv2.drawKeypoints(img, kp, None, color=(0, 255, 0))
# Benchmark: FAST vs Harris speed (restore NMS so we time the configuration used above)
import time
fast.setNonmaxSuppression(True)
grey_f = np.float32(grey)  # convert once, outside the timing loop
t0 = time.perf_counter()
for _ in range(100): fast.detect(grey, None)
fast_ms = (time.perf_counter() - t0) * 10  # total s * 1000 ms / 100 calls = avg ms per call
t0 = time.perf_counter()
for _ in range(100): cv2.cornerHarris(grey_f, 2, 3, 0.04)
harris_ms = (time.perf_counter() - t0) * 10
h, w = grey.shape
print(f"\nSpeed comparison ({w}×{h} image):")
print(f" FAST : {fast_ms:.2f} ms")
print(f" Harris : {harris_ms:.2f} ms")
print(f" Speedup: {harris_ms/fast_ms:.1f}×")
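The segment test is simple enough to write out by hand. The plain-Python sketch below illustrates FAST-9 on the standard 16-pixel circle of radius 3, including the quick compass-point rejection that lets FAST skip most pixels. It is only an illustration under our own naming (CIRCLE and is_fast_corner are hypothetical): it omits the learned decision tree from the original paper and non-maximum suppression.

```python
import numpy as np

# (dx, dy) offsets of the 16-pixel Bresenham circle of radius 3, clockwise from the top
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, y, x, t=20, n=9):
    """Segment test: n contiguous circle pixels all > p + t, or all < p - t."""
    p = int(img[y, x])
    states = []
    for dx, dy in CIRCLE:
        v = int(img[y + dy, x + dx])
        states.append(1 if v > p + t else (-1 if v < p - t else 0))
    # Quick rejection (valid for n >= 9): any arc of 9+ contiguous pixels covers
    # at least 2 of the 4 compass points (indices 0, 4, 8, 12).
    compass = [states[i] for i in (0, 4, 8, 12)]
    if n >= 9 and compass.count(1) < 2 and compass.count(-1) < 2:
        return False
    ring = states + states  # duplicate the circle so arcs can wrap around
    for target in (1, -1):
        run = 0
        for s in ring:
            run = run + 1 if s == target else 0
            if run >= n:
                return True
    return False

img = np.zeros((20, 20), dtype=np.uint8)
img[10:, 10:] = 255                   # bright bottom-right quadrant
print(is_fast_corner(img, 10, 10))    # True: tip of the quadrant
print(is_fast_corner(img, 15, 10))    # False: straight left edge, rejected early
print(is_fast_corner(img, 3, 3))      # False: flat background
```

The edge pixel is rejected at the compass check after looking at four neighbours, which is where most of FAST's speed comes from.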
SIFT — Scale-Invariant Feature Transform
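David Lowe's breakthrough insight: build an image pyramid by repeatedly blurring and downsampling the image, and look for features that stand out relative to their own scale. SIFT approximates this blob test with a Difference of Gaussians (DoG): it subtracts adjacent blur levels and keeps points that are extrema among their 26 neighbours in space and scale. A feature that is distinctive at its own scale, appearing as a blob against its neighbours in the pyramid, will be found at the same physical location regardless of the image magnification.
SIFT was first presented in 1999 and published in full in 2004. Its 128-dimensional descriptor set the gold standard that every subsequent detector has been measured against. The patent expired in 2020, and SIFT has shipped in OpenCV's main module (cv2.SIFT_create) since version 4.4.0.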
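A single octave of a DoG scale space can be sketched with NumPy alone. This is a simplified illustration under our own assumptions (one octave, no downsampling, a zero-padded separable blur, hypothetical helper names), not OpenCV's SIFT code. It blurs a synthetic Gaussian blob at σ₀·kⁱ with k = 2^(1/3), subtracts neighbouring levels, and looks for the strongest response, which lands at the blob centre and at an intermediate scale matched to the blob's size.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur, kernel truncated at 3 sigma, zero padding (NumPy only)."""
    r = int(np.ceil(3 * sigma))
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2 * sigma ** 2))
    g /= g.sum()

    def blur_1d(v):
        return np.convolve(v, g, mode='same')

    rows = np.apply_along_axis(blur_1d, 1, img)
    return np.apply_along_axis(blur_1d, 0, rows)

def dog_stack(img, sigma0=1.6, n_levels=8, k=2 ** (1 / 3)):
    """One octave of a DoG scale space: blur at sigma0 * k**i, subtract neighbours."""
    sigmas = [sigma0 * k ** i for i in range(n_levels + 1)]
    blurred = [gaussian_blur(img, s) for s in sigmas]
    dogs = np.stack([blurred[i + 1] - blurred[i] for i in range(n_levels)])
    return dogs, sigmas

# Synthetic bright Gaussian blob (std 4 px) centred at (32, 32) in a 64x64 image
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 4.0 ** 2))
dogs, sigmas = dog_stack(blob)

# A bright blob gives a negative DoG extremum, so look for the minimum
s, y, x = np.unravel_index(np.argmin(dogs), dogs.shape)
print(f"extremum at (y={y}, x={x}), DoG level {s} "
      f"(sigma {sigmas[s]:.2f} to {sigmas[s + 1]:.2f})")
```

Real SIFT additionally compares each candidate with its 26 neighbours, repeats the process over several downsampled octaves, and refines the extremum location by interpolation.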
import cv2
import numpy as np
import matplotlib.pyplot as plt
img1 = cv2.imread('building_a.jpg')
img2 = cv2.imread('building_b.jpg') # same building, different viewpoint
grey1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
grey2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
# Create SIFT detector (patent-free since 2020)
sift = cv2.SIFT_create(
nfeatures=0, # 0 = unlimited
nOctaveLayers=3, # layers per octave (DoG levels = nOctaveLayers + 2)
contrastThreshold=0.04,# min DoG response to keep keypoint
edgeThreshold=10, # max principal-curvature ratio; rejects edge-like DoG responses
sigma=1.6 # initial Gaussian blur sigma
)
# Detect and compute descriptors in one call
kp1, des1 = sift.detectAndCompute(grey1, None)
kp2, des2 = sift.detectAndCompute(grey2, None)
print(f"SIFT keypoints — img1: {len(kp1)}, img2: {len(kp2)}")
print(f"Descriptor shape : {des1.shape}") # (N, 128) float32
# FLANN-based matcher: approximate nearest neighbours, typically faster than brute force for large SIFT descriptor sets
FLANN_INDEX_KDTREE = 1
index_params = {'algorithm': FLANN_INDEX_KDTREE, 'trees': 5}
search_params = {'checks': 50}
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)
# Lowe's ratio test — keep only unambiguous matches
# A match is "good" if the nearest neighbour is much closer than the 2nd nearest
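# knnMatch can return fewer than 2 neighbours for some queries, so check the length first
good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]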
print(f"Total matches : {len(matches)}")
print(f"After ratio test (0.75): {len(good)}")
print(f"Match acceptance rate : {len(good)/len(matches)*100:.1f}%")
For every query descriptor, FLANN returns the two nearest neighbours (distances d1 and d2). If d1 / d2 < 0.75, the nearest neighbour is significantly closer than the second-nearest, so the match is unambiguous and likely correct. If the ratio is close to 1.0, two candidates look almost equally similar; the match is ambiguous and should be discarded. This single test removes most false positives at a small cost in true matches: in Lowe's 2004 experiments, a threshold of 0.8 eliminated 90% of the false matches while discarding less than 5% of the correct ones.
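The ratio test needs nothing more than the two nearest distances per query. This brute-force NumPy sketch (Euclidean distance, a hypothetical function name, and tiny hand-made 2-D "descriptors") shows one clear match kept and one ambiguous match rejected; FLANN only replaces the exhaustive distance computation with an approximate search.

```python
import numpy as np

def ratio_test_matches(des1, des2, ratio=0.75):
    """Brute-force 2-nearest-neighbour matching with Lowe's ratio test (Euclidean)."""
    d = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)  # (len1, len2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(len(des1)):
        j1, j2 = order[i, 0], order[i, 1]          # nearest and second-nearest
        if d[i, j1] < ratio * d[i, j2]:             # d1 / d2 < ratio
            matches.append((i, int(j1), float(d[i, j1])))
    return matches

# Toy 2-D "descriptors"; des2[1] and des2[2] are near-duplicates
des2 = np.array([[0.0, 0.0], [10.0, 0.0], [10.5, 0.0], [0.0, 10.0]])
des1 = np.array([[0.5, 0.0],     # clearly closest to des2[0]
                 [10.25, 0.0],   # equally close to des2[1] and des2[2]: ambiguous
                 [0.0, 9.0]])    # clearly closest to des2[3]
for i, j, dist in ratio_test_matches(des1, des2):
    print(f"des1[{i}] -> des2[{j}] (distance {dist:.2f})")
```

The middle query has d1 / d2 = 1.0, so it is dropped even though its nearest distance (0.25) is the smallest of the three.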
ORB — The Real-Time Champion
SIFT is excellent but slow, and it was patent-encumbered until 2020. In 2011, Rublee et al. (Willow Garage) published ORB (Oriented FAST and Rotated BRIEF): a detector and binary descriptor that performs comparably to SIFT in many situations, runs about two orders of magnitude faster, and has always been free to use. ORB is a common default for applications that need real-time performance.
import cv2
import numpy as np
img1 = cv2.imread('logo_clean.jpg')
img2 = cv2.imread('logo_rotated.jpg') # same logo, rotated 45°
grey1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
grey2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
# Create ORB detector
orb = cv2.ORB_create(
nfeatures=1000, # max keypoints to retain
scaleFactor=1.2, # pyramid scale factor between levels
nlevels=8, # number of pyramid levels
edgeThreshold=31, # border where features are not detected
firstLevel=0,
WTA_K=2, # points compared per BRIEF test (2=standard)
scoreType=cv2.ORB_HARRIS_SCORE, # use Harris score for NMS ranking
patchSize=31, # patch size for BRIEF descriptor
fastThreshold=20
)
# Detect and compute
kp1, des1 = orb.detectAndCompute(grey1, None)
kp2, des2 = orb.detectAndCompute(grey2, None)
print(f"ORB keypoints : img1={len(kp1)}, img2={len(kp2)}")
print(f"Descriptor dtype : {des1.dtype}") # uint8 binary
print(f"Descriptor shape : {des1.shape}") # (N, 32) — 256 bits
# Brute Force matcher with Hamming distance
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)
# Print top matches
print(f"\nTotal ORB matches : {len(matches)}")
print(f"Best match distance: {matches[0].distance:.1f} (0=perfect)")
print(f"Worst match distance: {matches[-1].distance:.1f} (256=worst)")
# Draw top 30 matches
img_matches = cv2.drawMatches(
img1, kp1, img2, kp2, matches[:30], None,
flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
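Because ORB descriptors are 32 bytes (256 bits) packed into uint8, the Hamming distance behind cv2.NORM_HAMMING is simply the number of differing bits. The NumPy sketch below (our own helper names, random descriptors for illustration) computes it with XOR plus a bit count, both for a single pair and as the all-pairs distance matrix a brute-force matcher scans.

```python
import numpy as np

def hamming(d1, d2):
    """Number of differing bits between two packed binary descriptors (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def hamming_matrix(D1, D2):
    """All-pairs Hamming distances, shape (len(D1), len(D2))."""
    x = np.bitwise_xor(D1[:, None, :], D2[None, :, :])   # (n1, n2, 32) bytes
    return np.unpackbits(x, axis=2).sum(axis=2)          # count set bits per pair

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=32, dtype=np.uint8)  # one 256-bit ORB-style descriptor
b = a.copy()
b[0] ^= 1                                           # flip a single bit

print(hamming(a, a))    # 0: identical
print(hamming(a, b))    # 1: one bit differs
print(hamming(a, ~a))   # 256: every bit differs (the worst case)

D1 = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
D2 = rng.integers(0, 256, size=(7, 32), dtype=np.uint8)
dist = hamming_matrix(D1, D2)
print(dist.shape)       # (5, 7)
```

XOR and popcount are cheap on any CPU, which is a large part of why binary descriptors match so much faster than 128-float SIFT vectors.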
AKAZE & BRISK — The Modern Binary Alternatives
import cv2
import numpy as np
import time
img = cv2.imread('texture_scene.jpg')
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
detectors = {
'SIFT' : cv2.SIFT_create(),
'ORB' : cv2.ORB_create(nfeatures=1000),
'AKAZE' : cv2.AKAZE_create(),
'BRISK' : cv2.BRISK_create(),
'KAZE' : cv2.KAZE_create(),
}
print(f"{'Detector':8} | {'Keypoints':>10} | {'Desc Shape':>12} | {'Desc Type':>10} | {'ms':>6}")
print("-" * 58)
for name, det in detectors.items():
    t0 = time.perf_counter()
    kp, des = det.detectAndCompute(grey, None)
    ms = (time.perf_counter() - t0) * 1000
    dtype = des.dtype if des is not None else 'N/A'
    shape = des.shape if des is not None else (0, 0)
    print(f"{name:8} | {len(kp):>10,} | {str(shape):>12} | {str(dtype):>10} | {ms:>6.1f}")
Detector & Descriptor Comparison — Choosing the Right Tool
| Detector | Scale Inv. | Rotation Inv. | Speed | Desc. Size | Desc. Type | Best For |
|---|---|---|---|---|---|---|
| Harris | ✗ No | Partial | Medium | N/A | None | Calibration boards, teaching |
| FAST | ✗ No | ✗ No | Very fast | N/A | None | Real-time detection only (pair with BRIEF) |
| SIFT | ✓ Yes | ✓ Yes | Slow | 128 × 4B | float32 | Accuracy-critical matching, 3D reconstruction |
| ORB | ✓ Yes | ✓ Yes | Very fast | 32B | uint8 binary | Real-time AR, mobile, embedded devices |
| AKAZE | ✓ Yes | ✓ Yes | Medium | 61B | uint8 binary | Textured objects, deformable surfaces |
| BRISK | ✓ Yes | ✓ Yes | Fast | 64B | uint8 binary | General purpose real-time matching |
| KAZE | ✓ Yes | ✓ Yes | Very slow | 64 × 4B | float32 | Medical / scientific imaging, deformable objects |
Use ORB when speed matters and you can control image quality. Use SIFT when you cannot — extreme viewpoint changes, low-contrast images, or when wrong matches are costly (medical, forensic, satellite matching). Use AKAZE when matching highly textured or deformable objects. Never use Harris or FAST alone for matching — they produce no descriptors.
HOG — Histogram of Oriented Gradients
In 2005, Navneet Dalal and Bill Triggs had a different insight, aimed at detecting people: a person's shape is defined by the local distribution of gradient directions, the orientation of edges in small regions of the image. You don't need to know the exact brightness of each pixel; you need to know which way the edges point in each small block.
HOG divides the image into a grid of cells, computes a 9-bin gradient orientation histogram (0–180°) for each cell, groups cells into overlapping blocks, and normalises each block. The result is a dense feature vector that captures local shape while tolerating small shifts and lighting changes. The Dalal-Triggs paper became one of the most cited in computer vision, and HOG-based pedestrian detectors were widely used before deep-learning detectors took over.
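Two pieces of the HOG pipeline are easy to check by hand: the descriptor length and the per-cell histogram. The sketch below is a simplified illustration with hypothetical helper names. It uses hard binning, whereas Dalal and Triggs interpolate votes between neighbouring bins and positions, and it skips block normalisation. The length formula reproduces the 3780-element vector of a 64×128 window with 8×8 cells, 2×2-cell blocks, a one-cell block stride, and 9 bins.

```python
import numpy as np

def hog_length(win_w=64, win_h=128, cell=8, block=2, bins=9, block_stride=1):
    """HOG vector length for one window (block and block_stride measured in cells)."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block) // block_stride + 1
    blocks_y = (cells_y - block) // block_stride + 1
    return blocks_x * blocks_y * block * block * bins

def cell_histogram(cell_patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    gy, gx = np.gradient(cell_patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # fold opposite directions together
    idx = np.minimum((ang // (180.0 / bins)).astype(int), bins - 1)
    return np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)

print(hog_length())  # 7 x 15 blocks x 4 cells x 9 bins = 3780

ramp_x = np.tile(np.arange(8.0), (8, 1))  # brightness increases left to right
print(cell_histogram(ramp_x))             # all weight in bin 0 (0-20 degrees)
print(cell_histogram(ramp_x.T))           # all weight in bin 4 (80-100 degrees)
```

The unsigned 0–180° range treats a dark-to-light edge and a light-to-dark edge with the same orientation as one direction, which helps with clothing and backgrounds of arbitrary contrast.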
from skimage.feature import hog
from skimage import exposure, color
import cv2
import numpy as np
import matplotlib.pyplot as plt
img_bgr = cv2.imread('pedestrian.jpg')
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
# Resize to standard detection window
img_resized = cv2.resize(img_rgb, (64, 128))
# Compute HOG features + visualisation image
fd, hog_image = hog(
img_resized,
orientations=9, # 9 orientation bins (0–180°)
pixels_per_cell=(8, 8), # cell size in pixels
cells_per_block=(2, 2), # block size in cells (for normalisation)
visualize=True,
channel_axis=-1 # input is HxWxC (RGB)
)
# Enhance contrast for visualisation
hog_vis = exposure.rescale_intensity(hog_image, in_range=(0, 10))
print(f"HOG feature vector length : {len(fd)}") # 3780 for 64×128
print(f"HOG vector dtype : {fd.dtype}") # float64
print(f"HOG min / max : {fd.min():.4f} / {fd.max():.4f}")
# ── HOG + SVM pedestrian detector (OpenCV built-in) ──────────
hog_cv = cv2.HOGDescriptor()
hog_cv.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Multi-scale sliding window detection
full_img = cv2.imread('crowd.jpg')
boxes, weights = hog_cv.detectMultiScale(
full_img,
winStride=(8, 8), # step between detection windows
padding=(4, 4), # padding around image before detection
scale=1.05 # pyramid scale factor
)
print(f"\nPedestrians detected: {len(boxes)}")
for (x, y, w, h), conf in zip(boxes, np.ravel(weights)):  # weights may be (N,) or (N, 1)
    print(f" Box ({x},{y},{w},{h}) confidence: {conf:.3f}")
Feature Matching & RANSAC — From Matches to Geometry
Detecting and describing features is only half the story. The real goal is using those matched feature pairs to estimate a geometric transformation between two images — a homography (for planar scenes or pure rotation), an essential matrix (for calibrated cameras), or a fundamental matrix (for uncalibrated cameras). The challenge: even with Lowe's ratio test, some matches are still wrong. Wrong matches are called outliers. They destroy least-squares estimation. The solution is RANSAC.
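Given four or more correct correspondences, the homography itself comes from a small linear system. Below is a bare-bones direct linear transform (DLT) in NumPy, an illustrative sketch with hypothetical function names: it assumes noise-free points and omits the coordinate normalisation and nonlinear refinement that robust implementations add. Each point pair contributes two rows to a matrix A, and H is the right singular vector of A with the smallest singular value.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3) with dst ~ H @ [x, y, 1] from >= 4 point pairs via DLT + SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=np.float64)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)  # null-space direction of A
    return H / H[2, 2]        # fix the arbitrary scale so that H[2, 2] == 1

def apply_h(H, pts):
    """Map (N, 2) points through H, dividing by the homogeneous coordinate."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

H_true = np.array([[1.2, 0.1, 30.0],
                   [-0.05, 0.9, 10.0],
                   [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 40]], dtype=np.float64)
dst = apply_h(H_true, src)
H_est = homography_dlt(src, dst)
print(np.round(H_est, 6))
```

With exact correspondences the estimate matches H_true; a single mismatched pair would corrupt it, which is the problem RANSAC addresses.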
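RANSAC (Random Sample Consensus, Fischler and Bolles, 1981) works like a detective questioning witnesses, some of whom lie. In feature matching, each "witness" is a matched point pair, the "story" is whether those points obey a specific geometric transformation model (here, a homography), and the liars are mismatches. RANSAC repeatedly draws a minimal random sample (4 pairs for a homography), fits the model to it, and counts how many other pairs agree within a pixel threshold. The model supported by the most matches wins, regardless of how wildly the outliers disagree.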
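The sample, fit, and count loop is easiest to see on the simplest model, a 2-D line, where the minimal sample is 2 points instead of 4. This NumPy sketch is our own illustration (hypothetical names; synthetic data with 50 inliers on y = 2x + 1 plus 20 gross outliers). It keeps the hypothesis with the largest consensus set and then refits by least squares on those inliers only, the same overall pattern a RANSAC homography estimator follows with 4-point samples.

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Fit y = a*x + b robustly: sample 2 points, fit, count inliers, keep the best."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)   # minimal sample
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                                            # cannot fit y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < threshold                         # consensus set
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares refit on the consensus set only
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return a, b, best_inliers

data_rng = np.random.default_rng(1)
x_in = np.arange(50, dtype=np.float64)
inlier_pts = np.column_stack([x_in, 2 * x_in + 1])              # exactly on y = 2x + 1
outlier_pts = np.column_stack([data_rng.uniform(0, 50, 20),     # far above the line
                               data_rng.uniform(150, 300, 20)])
points = np.vstack([inlier_pts, outlier_pts])

a, b, inliers = ransac_line(points)
print(f"a={a:.3f}, b={b:.3f}, inliers={inliers.sum()} / {len(points)}")
print(np.polyfit(points[:, 0], points[:, 1], 1))  # plain least squares, pulled by outliers
```

The second print shows why least squares alone fails: the 20 outliers drag the plain fit far from y = 2x + 1, while RANSAC recovers it exactly.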
import cv2
import numpy as np
# ── Load images ────────────────────────────────────────────────
img1 = cv2.imread('scene_left.jpg')
img2 = cv2.imread('scene_right.jpg') # overlapping panorama shot
g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
# ── SIFT detect + describe ─────────────────────────────────────
sift = cv2.SIFT_create(nfeatures=2000)
kp1, des1 = sift.detectAndCompute(g1, None)
kp2, des2 = sift.detectAndCompute(g2, None)
# ── FLANN matching + ratio test ────────────────────────────────
flann = cv2.FlannBasedMatcher({'algorithm':1,'trees':5}, {'checks':50})
raw = flann.knnMatch(des1, des2, k=2)
good = [p[0] for p in raw if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
print(f"Ratio-test survivors: {len(good)} / {len(raw)}")
# ── Extract (x,y) point pairs ──────────────────────────────────
src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1,1,2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1,1,2)
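# ── RANSAC homography estimation (needs at least 4 point pairs) ─
H, mask = cv2.findHomography(
    src_pts, dst_pts,
    method=cv2.RANSAC,
    ransacReprojThreshold=5.0  # max reprojection error (pixels)
)
if H is None:
    raise RuntimeError("Homography estimation failed: too few or degenerate matches")
inliers = mask.ravel().tolist()
n_in = sum(inliers)
n_out = len(inliers) - n_in
print(f"RANSAC inliers : {n_in}")
print(f"RANSAC outliers : {n_out}")
print(f"Inlier ratio : {n_in/len(inliers)*100:.1f}%")
# ── Warp img2 into img1's plane ────────────────────────────────
# H maps img1 -> img2, so its inverse maps img2 -> img1. The right shot then
# lands to the right of img1 instead of being clipped at negative x.
h1, w1 = img1.shape[:2]
h2, w2 = img2.shape[:2]
warped = cv2.warpPerspective(img2, np.linalg.inv(H), (w1 + w2, h1))
warped[0:h1, 0:w1] = img1
print(f"Panorama canvas : {warped.shape}")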
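The number of RANSAC iterations needed to guarantee (with probability p) at least one all-inlier sample is N = log(1−p) / log(1−w^s), where w is the inlier ratio and s is the sample size (4 for a homography). With 50% inliers and p = 0.99: N = log(0.01)/log(1−0.5⁴) ≈ 72 iterations. OpenCV's default cap of 2000 iterations (the maxIters argument of findHomography) covers inlier ratios down to roughly 22% at p = 0.99; at 10% inliers the formula asks for about 46,000 iterations, so very low inlier ratios need a higher cap or better matches.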
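The iteration formula is a one-liner to evaluate. This standard-library sketch (hypothetical function names) reproduces the 72-iteration figure and also inverts the formula to find the lowest inlier ratio that a fixed iteration cap can handle at a given confidence.

```python
import math

def ransac_iterations(inlier_ratio, sample_size=4, p=0.99):
    """Smallest N with P(at least one all-inlier sample in N draws) >= p."""
    return math.ceil(math.log(1 - p) / math.log(1 - inlier_ratio ** sample_size))

def min_inlier_ratio(max_iters, sample_size=4, p=0.99):
    """Lowest inlier ratio w for which max_iters iterations still reach confidence p."""
    return (1 - (1 - p) ** (1 / max_iters)) ** (1 / sample_size)

print(ransac_iterations(0.5))            # 72
print(ransac_iterations(0.1))            # 46050
print(f"{min_inlier_ratio(2000):.3f}")   # 0.219
```

Because N grows with 1/w^s, the sample size matters as much as the inlier ratio: a 4-point homography sample is far more sensitive to outliers than a 2-point line sample.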
Deep Learning Features — SuperPoint, LightGlue & Beyond
Classical detectors are engineered by hand: every design choice (DoG blob scale, orientation bin count, BRIEF sampling pattern) is tuned manually. Since 2017, a new wave of learned feature detectors and matchers has emerged, trained end-to-end to maximise matching accuracy on real image pairs. These methods now lead most public matching benchmarks.
# SuperPoint + LightGlue example (requires PyTorch and the cvg/LightGlue package)
# Install, e.g.: pip install git+https://github.com/cvg/LightGlue.git
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Initialise detector and matcher
extractor = SuperPoint(max_num_keypoints=1024).eval().to(device)
matcher = LightGlue(features='superpoint').eval().to(device)
# Load images as normalised (3, H, W) tensors in [0, 1]
img0 = load_image('view_a.jpg').to(device)
img1 = load_image('view_b.jpg').to(device)
with torch.no_grad():
    # Extract keypoints + descriptors (extract() adds the batch dimension)
    feats0 = extractor.extract(img0)
    feats1 = extractor.extract(img1)
    # Match: LightGlue replaces nearest-neighbour search + ratio test;
    # RANSAC is still typically used afterwards to fit the geometry
    result = matcher({'image0': feats0, 'image1': feats1})
# Remove batch dimension; extract matches
feats0, feats1, result = [rbd(x) for x in [feats0, feats1, result]]
matches = result['matches']   # (K, 2) index pairs
scores = result['scores']     # (K,) per-match confidence
kpts0 = feats0['keypoints'][matches[..., 0]]
kpts1 = feats1['keypoints'][matches[..., 1]]
print(f"Keypoints detected : {len(feats0['keypoints'])}")
print(f"Matches : {len(matches)}")
print(f"Mean match confidence: {scores.mean():.3f}")
Use learned features when you have a GPU, need maximum accuracy, and face challenging conditions: low texture, large viewpoint changes, night/day transitions, or motion blur. Use classical features (ORB, SIFT) when you need CPU-only deployment, interpretability, or are working in a constrained environment (embedded systems, edge devices). Classical methods are still competitive for well-textured scenes under moderate viewpoint change.
End-to-End Project — Panorama Stitching from Scratch
This project ties everything together: detect with SIFT, match with FLANN + ratio test, estimate geometry with RANSAC, then warp and composite into a single panorama. Production panorama tools build on the same detect, match, align, and warp pipeline, adding seam finding, exposure compensation, and blending on top.
import cv2
import numpy as np
def stitch_panorama(img_left, img_right, ratio=0.75, reproj_thresh=5.0):
    """
    Stitch two overlapping images into a panorama using SIFT + RANSAC.
    The right image is warped into the left image's coordinate system.
    Returns the stitched panorama and metadata dict.
    """
    g_left = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)
    g_right = cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY)
    # ── 1. Detect and describe ─────────────────────────────────
    sift = cv2.SIFT_create(nfeatures=3000)
    kp_l, des_l = sift.detectAndCompute(g_left, None)
    kp_r, des_r = sift.detectAndCompute(g_right, None)
    if des_l is None or des_r is None:
        raise ValueError("No features detected in one of the images")
    # ── 2. FLANN match + ratio test (query = right, train = left) ─
    flann = cv2.FlannBasedMatcher({'algorithm': 1, 'trees': 5}, {'checks': 50})
    raw = flann.knnMatch(des_r, des_l, k=2)
    good = [p[0] for p in raw if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 10:
        raise ValueError(f"Insufficient matches: {len(good)}")
    # ── 3. RANSAC homography: right-image points -> left-image points ─
    src = np.float32([kp_r[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_l[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    if H is None:
        raise ValueError("Homography estimation failed")
    inliers = int(mask.sum())
    # ── 4. Warp right image into left image's coordinate system ─
    h_l, w_l = img_left.shape[:2]
    w_r = img_right.shape[1]
    canvas_w = w_l + w_r  # wide enough for both images
    warped = cv2.warpPerspective(img_right, H, (canvas_w, h_l))
    # ── 5. Composite: paste the left image over the warped right (no blending) ─
    warped[0:h_l, 0:w_l] = img_left
    # Crop black borders
    grey_w = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(grey_w, 1, 255, cv2.THRESH_BINARY)
    x, y, w, h = cv2.boundingRect(thresh)
    panorama = warped[y:y+h, x:x+w]
    meta = {'keypoints_left': len(kp_l), 'keypoints_right': len(kp_r),
            'good_matches': len(good), 'inliers': inliers,
            'panorama_shape': panorama.shape}
    return panorama, meta
# ── Run it ─────────────────────────────────────────────────────
left = cv2.imread('pano_left.jpg')
right = cv2.imread('pano_right.jpg')
result, info = stitch_panorama(left, right)
for k, v in info.items():
    print(f"{k:20}: {v}")
cv2.imwrite('panorama_output.jpg', result)
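The stitching function sizes its canvas as the sum of the two image widths, which wastes space for small overlaps and can still clip content if the warp shifts an image upwards or to negative coordinates. A common refinement, sketched below with hypothetical helper names, projects the moving image's four corners through H, takes the bounding box of those corners together with the fixed image, and builds a translation T so that everything lands at non-negative coordinates. With OpenCV you would then warp with cv2.warpPerspective(moving, T @ H, canvas_size) and paste the fixed image at offset (T[0, 2], T[1, 2]).

```python
import numpy as np

def warped_bounds(H, w, h):
    """Bounding box (x_min, y_min, x_max, y_max) of a w x h image's corners mapped through H."""
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], dtype=np.float64).T
    proj = H @ corners                 # 3 x 4 homogeneous points
    proj = proj[:2] / proj[2]          # divide by the homogeneous coordinate
    x_min, y_min = np.floor(proj.min(axis=1)).astype(int)
    x_max, y_max = np.ceil(proj.max(axis=1)).astype(int)
    return int(x_min), int(y_min), int(x_max), int(y_max)

def canvas_for_pair(H, size_moving, size_fixed):
    """Translation T and (width, height) so the warped moving image and the fixed image both fit."""
    wm, hm = size_moving
    wf, hf = size_fixed
    x0, y0, x1, y1 = warped_bounds(H, wm, hm)
    x_min, y_min = min(x0, 0), min(y0, 0)
    x_max, y_max = max(x1, wf), max(y1, hf)
    T = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=np.float64)
    return T, (x_max - x_min, y_max - y_min)

# Example: the moving image sits 100 px left of and 50 px above the fixed one
H = np.array([[1, 0, -100], [0, 1, -50], [0, 0, 1]], dtype=np.float64)
T, size = canvas_for_pair(H, size_moving=(400, 300), size_fixed=(640, 480))
print(size)   # (740, 530)
print(T)
```

Composing T @ H keeps a single warp call, so no extra resampling pass is needed to shift the result into view.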
Evaluating Feature Detectors — How Do You Know It Is Working?
import cv2
import numpy as np
def evaluate_repeatability(img1, img2, H_gt, detector, px_threshold=3):
    """
    Compute keypoint repeatability between two images given ground-truth homography H_gt.
    H_gt maps points from img1 to img2's coordinate system.
    """
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    kp1 = detector.detect(g1, None)
    kp2 = detector.detect(g2, None)
    if len(kp1) == 0 or len(kp2) == 0:
        return 0.0, len(kp1), len(kp2), 0
    # Project kp1 points into img2 using ground-truth homography
    pts1 = np.float32([k.pt for k in kp1]).reshape(-1, 1, 2)
    pts2 = np.float32([k.pt for k in kp2])
    proj1 = cv2.perspectiveTransform(pts1, H_gt).reshape(-1, 2)
    # For each projected point, check whether some kp2 point lies within px_threshold
    repeated = 0
    for p in proj1:
        dists = np.linalg.norm(pts2 - p, axis=1)
        if dists.min() < px_threshold:
            repeated += 1
    # Simplification: keypoints that project outside img2's frame are not excluded
    repeatability = repeated / min(len(kp1), len(kp2))
    return repeatability, len(kp1), len(kp2), repeated
# Evaluate SIFT vs ORB vs AKAZE on a test pair with known homography
img1 = cv2.imread('test_a.jpg')
img2 = cv2.imread('test_b.jpg')
H_gt = np.load('H_ground_truth.npy')  # from dataset
for name, det in {'SIFT': cv2.SIFT_create(),
                  'ORB' : cv2.ORB_create(),
                  'AKAZE': cv2.AKAZE_create()}.items():
    rep, n1, n2, n_rep = evaluate_repeatability(img1, img2, H_gt, det)
    print(f"{name:6s}: kp1={n1:5d}, kp2={n2:5d}, repeated={n_rep:4d}, R={rep:.3f}")
Classical vs Deep Learning Features — Full Showdown
| Property | Harris / FAST | SIFT / AKAZE | ORB / BRISK | SuperPoint + LightGlue |
|---|---|---|---|---|
| Scale invariance | ✗ None | ✓ Full | ✓ Partial | ✓ Learned |
| Rotation invariance | Partial (no orientation) | ✓ Full | ✓ Full | Limited (SuperPoint is not rotation-invariant) |
| Illumination robustness | Limited | Good | Moderate | Excellent |
| Texture-less scenes | Fails | Struggles | Fails | Better (detector-free LoFTR handles it well) |
| CPU-only speed | Very fast | SIFT: slow / AKAZE: ok | Very fast | Slow on CPU; GPU recommended |
| Descriptor size | N/A | 512 B (SIFT) | 32 B (ORB) | 1 KB (SuperPoint: 256 × float32) |
| Requires training data | No | No | No | Yes (large image datasets) |
| HPatches MMA @ 3px | — | ~0.62 | ~0.45 | ~0.92 |
| Best use case | Calibration, teaching | Offline 3D recon, forensics | Mobile AR, robotics | Autonomous vehicles, SLAM |
Golden Rules of Feature Detection
- Convert to greyscale with cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) before every detect-and-compute call.
- Cap the number of features: nfeatures=1000–2000 for panorama stitching and nfeatures=300–500 for real-time tracking. Uncapped SIFT (nfeatures=0) is fine for offline pipelines.
- On noisy images, a light cv2.GaussianBlur(img, (3,3), 0) before detection removes many noise-driven false keypoints at negligible cost.
- Draw your matches with cv2.drawMatches() and inspect them visually at least once per new dataset. Silent failures (wrong homographies that happen not to throw exceptions) are far more dangerous than noisy errors, and your eyes catch geometric inconsistencies instantly.