Computational Photography, an AI-powered Slopendium — 07 Matching pixels across space and time
Part 7 MATCHING PIXELS ACROSS SPACE AND TIME
💡 **Big lesson (L17 · estimation half):** correspondence-then-transport. This part owns the **hard estimation** half (where did each pixel go?), which is made ill-posed by the **aperture problem** and broken by occlusion and large motion; [[Warping and morphing]] owns the transport. Matching is the inverse of warping: warping consumes a coordinate map, matching produces one.
7.1 Sparse matching
7.2 Feature tracking
fig-track-vs-flow · two regimes of correspondence — dense flow (one pair, a vector at every pixel; dense but short) vs tracking (a few points threaded as trajectories across many frames; sparse but long) 🟨
fig-klt-is-harris · one matrix, two readings — the same structure-tensor ellipse read as Harris ("good corner to detect?") and as KLT ("motion well-conditioned?"); detection and trackability are the same statement about $M$ 🟨
fig-good-features-to-track · Shi–Tomasi points land only on corners — markers clustering on corners/junctions, never flat sky or single edges, with windows annotated by their eigenvalue pair; "good feature" = "well-conditioned $M$" = "corner" 🟨
⬜ figure not yet created
`fig-klt-pyramid-iteration` (per-feature coarse-to-fine: predict on a blurred level, warp, re-solve the $2\times2$ system, refine down the pyramid; the LK iteration localised to one patch)
fig-track-drift-occlusion · three ways a long track fails — drift (window slides off as sub-pixel errors accumulate), occlusion (dissimilarity spikes → drop the feature), re-detection (a fresh Shi–Tomasi point spawned to replace a lost one) 🟨
⬜ figure not yet created
`fig-modern-point-trackers` (KLT's single greedy patch vs a learned tracker, CoTracker, tracking *many* points *jointly* through an occlusion and recovering the point when it reappears)
[[Optical flow]] estimates a **dense** flow field, a motion vector at *every* pixel, between *one* pair of frames. This section does the complementary thing: it follows a **sparse** handful of *distinctive* points across *many* frames. The physics is the same (brightness constancy, the aperture problem), but restricted to the places where the math is well-behaved and run forward in time. The payoff of the restriction is the chapter's spine: **the matrix that says a patch is a good corner to *detect* is the same matrix that must be invertible to *track* it.** Detect with it (Harris), track with it (KLT), refuse to track where it is near-singular (Shi–Tomasi). One $2\times2$ matrix, three jobs.
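The "one matrix, three jobs" claim can be checked numerically. The sketch below is illustrative only: the three synthetic $9\times9$ patches, the helper names, and the Harris constant `k = 0.05` are assumptions made for this example, not values taken from the text. It builds the structure tensor $M$ for a flat, an edge, and a corner patch and reads it two ways: the Harris response and the Shi–Tomasi minimum eigenvalue.

```python
import numpy as np

def structure_tensor(patch):
    """M = sum over the window of [[Ix^2, IxIy], [IxIy, Iy^2]] (window = whole patch)."""
    Iy, Ix = np.gradient(patch.astype(float))   # np.gradient returns d/drow first, then d/dcol
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

def harris_response(M, k=0.05):
    """Harris reading: det(M) - k * trace(M)^2 (k = 0.05 is an assumed, typical value)."""
    return np.linalg.det(M) - k * np.trace(M) ** 2

def min_eigenvalue(M):
    """Shi-Tomasi reading: the smaller eigenvalue of the symmetric matrix M."""
    return np.linalg.eigvalsh(M)[0]              # eigvalsh returns eigenvalues in ascending order

# Three hypothetical test patches: no structure, a vertical step edge, one bright quadrant.
n = 9
yy, xx = np.mgrid[0:n, 0:n]
patches = {
    "flat":   np.full((n, n), 0.5),
    "edge":   (xx >= n // 2).astype(float),
    "corner": ((xx >= n // 2) & (yy >= n // 2)).astype(float),
}

results = {}
for name, patch in patches.items():
    M = structure_tensor(patch)
    results[name] = (min_eigenvalue(M), harris_response(M))
    print(f"{name:6s}  min eig = {results[name][0]:.3f}  Harris = {results[name][1]:.3f}")
```

Only the corner patch has two clearly positive eigenvalues. The edge patch has one eigenvalue at zero, so $M$ cannot be inverted there, and its Harris response is negative: the detector and the tracker reject it for the same reason.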
💡 **Big lesson (recurrence of L8 — a learned operator swaps a hand-designed prior for one learned from data):** classical KLT tracks a hand-built brightness patch by a hand-derived least-squares update; modern point trackers (PIPs, CoTracker) keep the *exact same task* — "where did this point go?" — but replace the patch with **learned features** and the greedy per-point update with a **learned, occlusion-aware, track-together** module. The skeleton (initialise a trajectory, iteratively refine it against appearance, predict visibility) is unchanged; only the operator is now fit to data. (→ see Big lesson **L8**; first appears in [[Deep learning]]. The same "neuralise the classical iteration" move powers RAFT in [[Optical flow]].)
equations
per-feature LK normal equations $M\,\Delta\mathbf{u}=\mathbf{b}$, with the **structure tensor** $M=\sum_{\mathbf{x}\in W} \begin{psmallmatrix}I_x^2 & I_xI_y\\ I_xI_y & I_y^2\end{psmallmatrix}$ and $\mathbf{b}=-\sum_{\mathbf{x}\in W} I_t\begin{psmallmatrix}I_x\\ I_y\end{psmallmatrix}$ → update $\Delta\mathbf{u}=M^{-1}\mathbf{b}$
**Shi–Tomasi trackability** $\min(\lambda_1,\lambda_2)>\lambda_{\min}$ (a feature is trackable iff $M$ is well-conditioned)
affine-warp **dissimilarity** $\varepsilon=\sum_W [\,I_2(A\mathbf{x}+\mathbf{d})-I_1(\mathbf{x})\,]^2$ (occlusion / lost-track test)
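The first two entries above can be sketched as a single translation-only update. This is a minimal illustration under stated assumptions, not a full KLT tracker: the function name `lk_step`, the window half-width, the threshold `lam_min`, and the synthetic sinusoidal image are all hypothetical choices for the example. The second frame is generated by shifting an analytic pattern, so the true displacement is known.

```python
import numpy as np

def lk_step(I1, I2, center, half=7, lam_min=1e-3):
    """One LK update for the window W centred at (row, col).

    Returns (delta_u, eigenvalues); delta_u is None when the Shi-Tomasi
    test min(lambda_1, lambda_2) > lam_min fails.
    """
    Iy, Ix = np.gradient(I1)          # spatial derivatives (rows = y, cols = x)
    It = I2 - I1                      # temporal derivative for a unit time step
    r, c = center
    W = (slice(r - half, r + half + 1), slice(c - half, c + half + 1))
    ix, iy, it = Ix[W].ravel(), Iy[W].ravel(), It[W].ravel()

    # Structure tensor M and right-hand side b, as in the normal equations above.
    M = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([it @ ix, it @ iy])

    lam = np.linalg.eigvalsh(M)       # ascending: lam[0] is the smaller eigenvalue
    if lam[0] <= lam_min:
        return None, lam              # near-singular M: refuse to track
    return np.linalg.solve(M, b), lam # solve M du = b without forming M^-1

# Synthetic textured patch with a known sub-pixel motion (u, v).
yy, xx = np.mgrid[0:41, 0:41].astype(float)
f = lambda x, y: np.sin(0.3 * x) + np.sin(0.25 * y)
true_d = np.array([0.3, -0.2])
I1 = f(xx, yy)
I2 = f(xx - true_d[0], yy - true_d[1])   # brightness constancy: I2(x + d) = I1(x)

du, lam = lk_step(I1, I2, center=(20, 20))
print("estimated (u, v):", du, " eigenvalues of M:", lam)

# Edge-only pattern: intensity varies in x alone, so M has a zero eigenvalue.
E1 = np.sin(0.3 * xx)
E2 = np.sin(0.3 * (xx - true_d[0]))
du_edge, lam_edge = lk_step(E1, E2, center=(20, 20))
print("edge window tracked?", du_edge is not None)
```

The update is a single linearised step, so it is only accurate for small motions like the one used here. A real tracker would iterate it with warping and run it down a pyramid.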
7.3 Robustness: the ratio test and RANSAC
7.4 Deep learning approaches to sparse matching
7.5 Misc: fast matching
7.6 Optical flow
fig-flow-field-colorkey · from two frames to a dense field — a per-pixel flow shown with the standard colour key (hue = direction, saturation = speed, after Baker et al. 2011); coherent regions read as one colour, static background near-white
⬜ figure not yet created
`fig-brightness-constancy` (a patch at $(x,y,t)$ reappears at $(x+u,y+v,t+1)$ with the *same* intensity; shows the assumption and where it breaks: lighting change, specularity, occlusion)
fig-flow-constraint-line · one equation, a line of answers — $I_x u + I_y v + I_t = 0$ as a line in the $(u,v)$ plane; the gradient fixes only the normal component (normal flow), the along-edge tangent undetermined 🟨
fig-aperture-problem · why one pixel is not enough — a straight edge through an aperture (three true motions, identical appearance, only normal recoverable) and the barber-pole illusion (diagonal stripes appearing to move straight up) 🟨
fig-zoom-mechanism · a zoom at two focal lengths (short-f/wide vs long-f/tele): moving variator + compensator groups slide along the axis to change f while the focal plane (sensor) stays fixed; motion arrows between states
fig-flow-corner-edge-flat · the structure tensor $A^\top A$ deciding where flow is solvable — corner (two large eigenvalues, full 2-D flow), edge (rank-deficient, only normal flow), flat (indeterminate); the same picture that selected Harris corners 🟨
⬜ figure not yet created
`fig-LK-vs-HS` (Lucas–Kanade: solve a tiny system per window, local …)
fig-mc-as-flow · same correspondence, two budgets — a dense smooth optical-flow field vs the codec's one-constant-vector-per-block field; the same "where did this come from?" coarsened to what is cheap to estimate and transmit 🟨
fig-coarse-to-fine-flow · making large motion sub-pixel: in a Gaussian pyramid the displacement at the coarsest level shrinks to a fraction of a pixel; at each finer level, upsample and scale the flow, warp, estimate the small residual, and add it; a Laplacian-pyramid view of the flow 🟨
fig-raft-skeleton · RAFT, the classical pipeline neuralized — learned feature + context encoders, an all-pairs 4-D correlation volume, and a recurrent GRU update operator iterating; the boxes line up with data term, cost volume, and warp-then-refine 🟨
💡 **Big lesson (the chapter's core — *brightness constancy + the aperture problem*):** track a point by the one thing that's (assumed) conserved — its **brightness**. Linearizing that single scalar equation gives **one constraint per pixel**, $I_x u + I_y v + I_t = 0$, for **two** unknowns $(u,v)$ — so locally motion is **fundamentally underdetermined**: you can only see the component **along the image gradient** (normal flow), never the component **along an edge** (the *aperture problem*). Every flow method is a strategy for supplying the missing second equation — from a **neighbor** (Lucas–Kanade: pixels in a patch share a motion → the well-/ill-posedness is the **same corner/edge/flat structure tensor as Harris**), from a **smoothness prior** (Horn–Schunck), or from **learned context** (RAFT). *Only the gradient carries motion information, and one gradient is never enough.*
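A small numerical sketch makes the "one equation, two unknowns" point concrete. The intensity ramp, the slope `a`, the helper names, and the two test motions are assumptions chosen for illustration. The pattern varies only in $x$, like a straight edge seen through an aperture, so two different true motions that share the same $u$ produce identical second frames, and the flow constraint recovers only the component along the gradient.

```python
import numpy as np

def normal_flow(Ix, Iy, It, eps=1e-12):
    """Flow component along the gradient: -It * grad(I) / |grad(I)|^2.

    This is the point on the line Ix*u + Iy*v + It = 0 closest to the origin.
    """
    g2 = Ix ** 2 + Iy ** 2
    scale = -It / np.maximum(g2, eps)     # eps guards flat pixels with zero gradient
    return scale * Ix, scale * Iy

# Hypothetical edge-like pattern: a ramp in x with slope a, constant in y.
a = 2.0
yy, xx = np.mgrid[0:8, 0:8].astype(float)
pattern = lambda x, y: a * x + 0.0 * y

I1 = pattern(xx, yy)
frame_A = pattern(xx - 0.5, yy - 0.8)     # true motion (u, v) = (0.5,  0.8)
frame_B = pattern(xx - 0.5, yy + 3.0)     # true motion (u, v) = (0.5, -3.0)
same_appearance = np.array_equal(frame_A, frame_B)

Iy, Ix = np.gradient(I1)                  # d/drow (y) first, then d/dcol (x)
It = frame_A - I1
un, vn = normal_flow(Ix, Iy, It)

print("frames identical for both motions:", same_appearance)
print("recovered normal flow at one pixel:", un[4, 4], vn[4, 4])
```

The along-edge component $v$ never enters the data, so no per-pixel method can recover it. It has to come from a neighbouring pixel with a different gradient direction, from a smoothness prior, or from learned context.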
7.7 Deep learning approaches to optical flow