12.1 Motion blur and temporal sampling

The two great themes of spatial imaging are convolution (a measurement integrates over a region, the story of the point-spread function) and sampling (a discrete grid can only represent frequencies below its Nyquist limit, and anything faster aliases). This chapter makes one claim, and it is almost embarrassingly simple: both recur, identically, on the time axis. A photograph is not an instant; the shutter stays open for an exposure time, and the sensor integrates everything that lands during that window, so a moving scene smears, and motion blur is convolution in time. A video is not a continuous record; it is a sequence of frames at some rate, so periodic motion faster than half the frame rate folds down to a false, slower apparent motion, and temporal aliasing is the wagon-wheel effect, the temporal twin of moiré.

What makes this more than a cute analogy is that the two phenomena turn out to be one. The anti-alias filter you would want before sampling a video is a low-pass over time, and the camera already has one for free: the exposure window. Integrating over the exposure is the temporal prefilter, and the blur it produces is the price of suppressing aliasing. Motion blur and temporal aliasing are not two problems; they are the two outcomes of one knob, the exposure time $\tau$. And the deeper question the chapter ends on (do I think of motion as particles I follow, or as a time series at each fixed pixel?) is the Lagrangian vs Eulerian distinction, the frame that ties this whole part together.

💡 Big lesson (recurrence of L5 / L16 — Nyquist and prefiltering, on the time axis)

The spatial sampling laws apply unchanged in time. L5: a frame rate $f_s$ can faithfully capture only temporal frequencies below $f_s/2$; faster motion folds down to a false low (even reversed) frequency — the wagon-wheel effect is temporal moiré. L16: the cure for aliasing is to prefilter before you sample — and on the time axis the prefilter is built into the camera, because integrating over the exposure window (which is exactly what produces motion blur) is the temporal anti-alias filter. So motion blur and temporal aliasing are not two problems but one tradeoff: a longer exposure removes strobing by blurring; a shorter one gives sharp frames that strobe. (Registered in Linearity, Fourier, aliasing and deblurring as L5 and L16; this is their temporal instance — the wagon-wheel is L16's named time-axis face.)

12.1.1 A frame is an integral over time → motion blur

Start with the cause, intuition first. A photo is not a snapshot of a single instant. The shutter is open for an exposure time $\tau$, and over that whole window the sensor accumulates every photon that arrives. If a bright feature moves across the frame while the shutter is open, its light is deposited not at one place but along the entire path it travels — a streak, not a point. Motion blur is precisely the temporal version of the standing intuition that a pixel is a weighted average of what it saw; here the average is taken over time rather than over a spatial neighborhood (Figure 12.1.1).

Make it an equation. For an image-plane velocity $\mathbf v$, taken constant over the exposure, the blurred frame $B$ at pixel $\mathbf x$ is the time-average of the sharp scene $I$ over all the positions the scene occupied during the exposure:

$$ B(\mathbf{x}) = \frac{1}{\tau}\int_0^\tau I(\mathbf{x}-\mathbf{v}\,t)\,dt . $$

Read it literally: the value recorded at $\mathbf x$ is the average of the sharp scene seen through a sliding offset $-\mathbf v t$, swept over the open-shutter interval $[0,\tau]$ and normalized by its length.

Now the punchline, the one that makes this a chapter and not a footnote. That integral is a convolution. Write it as

$$ B = I * k_{\mathbf v}, $$

where the path kernel $k_{\mathbf v}$ is a 1-D box of length $\lVert\mathbf v\rVert\,\tau$, oriented along the motion direction $\mathbf v$, normalized to unit area. Constant velocity gives a straight box; curved motion gives a curved path kernel that traces the trajectory. This is the exact temporal analog of the spatial PSF — spatial blur smears over a spatial neighborhood; motion blur smears over the motion path — the same operator, with the axis merely time projected onto space. It is worth recalling from the convolution material that convolution was the averaging / sum-of-random-variables operation in the first place; here the "random variable" being averaged is simply position-during-exposure.
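The equivalence between the time-average and the box convolution can be checked numerically. The following is a minimal 1-D sketch, assuming a point source moving at one pixel per time step and an exposure lasting `n_steps` steps; the function names and the discretization are illustrative choices, not part of the chapter.

```python
import numpy as np

def box_kernel(length_px: int) -> np.ndarray:
    """1-D box of the given length in pixels, normalized to unit sum."""
    return np.full(length_px, 1.0 / length_px)

def blur_by_time_average(sharp: np.ndarray, n_steps: int) -> np.ndarray:
    """Average the scene over the exposure, moving it 1 px per time step.

    np.roll(sharp, t)[x] equals sharp[x - t], i.e. I(x - v t) with v = 1 px/step.
    The shifts are circular, so keep the moving feature away from the border.
    """
    return np.mean([np.roll(sharp, t) for t in range(n_steps)], axis=0)

# A single bright point in an otherwise dark 1-D scene.
sharp = np.zeros(32)
sharp[10] = 1.0
n_steps = 5  # exposure length in time steps, so the streak is 5 px long

blurred_avg = blur_by_time_average(sharp, n_steps)
# Full convolution, cropped back to the scene length.
blurred_conv = np.convolve(sharp, box_kernel(n_steps))[: sharp.size]

print(np.allclose(blurred_avg, blurred_conv))
```

Both routes deposit one fifth of the point's light in each of pixels 10 through 14: a streak whose length is speed times exposure, with total energy unchanged.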

The photographer's knob for all of this is shutter speed, because the streak length is $\lVert\mathbf v\rVert\,\tau$ and so scales directly with exposure time. In cinematography the same control is expressed as a shutter angle,

$$ \theta = 360^\circ\cdot\frac{\tau}{T}, \qquad T = \frac{1}{f_s}, $$

the fraction of the frame interval $T$ that the (historically rotating-disc) shutter is open. The $180^\circ$ convention, $\tau = T/2$, gives the natural "film look"; $360^\circ$ integrates the whole interval and blurs maximally; small angles give crisp but strobed frames — the staccato, every-impact-frozen look of the Saving Private Ryan battle scenes (Figure 12.1.2).
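As a quick arithmetic check of the shutter-angle relation, here is a small sketch that converts an angle and frame rate into an exposure time and then into a streak length. The helper names and the example image-plane speed of 960 px/s are assumptions made for illustration.

```python
def exposure_time(shutter_angle_deg: float, frame_rate_hz: float) -> float:
    """tau = (theta / 360 degrees) * T, with T = 1 / f_s."""
    return (shutter_angle_deg / 360.0) / frame_rate_hz

def streak_length_px(speed_px_per_s: float,
                     shutter_angle_deg: float,
                     frame_rate_hz: float) -> float:
    """Streak length ||v|| * tau for a constant image-plane speed."""
    return speed_px_per_s * exposure_time(shutter_angle_deg, frame_rate_hz)

# 180 degrees at 24 fps gives tau = 1/48 s.
tau_180 = exposure_time(180.0, 24.0)

# A feature moving at a hypothetical 960 px/s smears over about 20 px at
# 180 degrees, about 40 px at 360 degrees, and about 5 px at 45 degrees.
streaks = {angle: streak_length_px(960.0, angle, 24.0) for angle in (45.0, 180.0, 360.0)}
print(tau_180, streaks)
```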

That sets up the tradeoff to state plainly. A longer $\tau$ collects more light — better signal-to-noise — but produces more blur; a shorter $\tau$ gives sharper frames but less light and, as the next section shows, temporal aliasing. For fast motion at a fixed frame rate you cannot have sharp, well-exposed, and alias-free all at once; you pick two.

Finally, a forward-looking word on removing the blur. If the path kernel $k_{\mathbf v}$ is known, undoing motion blur is just deconvolution by it — but a straight-line box kernel has zeros in its frequency response, so naive inversion is ill-posed. This is exactly the Wiener / regularized-inversion story of Blind deblurring, now with a motion PSF rather than a defocus one; when different objects move at different speeds the blur is spatially varying and no single kernel applies. One can even engineer the kernel to invert cleanly — a coded or "flutter" shutter (Raskar et al. 2006) chops the exposure open and closed to give $k_{\mathbf v}$ a broadband, zero-free spectrum, and motion-invariant capture (Levin et al. 2008) sweeps the camera so that every speed receives the same invertible PSF. The philosophy is the same "make the forward operator nice" move as coded aperture.
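The claim that a box kernel has zeros in its frequency response is easy to see in a discrete setting. The sketch below, assuming a 64-sample signal and an 8-sample box (both arbitrary choices), locates the nulls with an FFT and then forms a simple regularized inverse filter of the Wiener type; the constant `eps` is a hypothetical regularization weight, not a value from the text.

```python
import numpy as np

n_samples = 64   # assumed signal length
box_len = 8      # assumed streak length in samples

# Unit-area box kernel, zero-padded to the signal length.
box = np.zeros(n_samples)
box[:box_len] = 1.0 / box_len

kernel_fft = np.fft.fft(box)
response = np.abs(kernel_fft)

# Frequency bins where the box transmits (numerically) nothing.
zero_bins = np.flatnonzero(response < 1e-12)
print(zero_bins.tolist())  # [8, 16, 24, 32, 40, 48, 56]

def regularized_inverse(k_fft: np.ndarray, eps: float) -> np.ndarray:
    """Inverse filter conj(K) / (|K|^2 + eps); eps > 0 keeps the gain bounded."""
    return np.conj(k_fft) / (np.abs(k_fft) ** 2 + eps)

eps = 1e-3  # hypothetical regularization weight
inverse = regularized_inverse(kernel_fft, eps)
print(np.max(np.abs(inverse)))
```

Dividing by `kernel_fft` directly would divide by zero at the listed bins, which is the ill-posedness in concrete form. The regularized filter instead caps its gain at $1/(2\sqrt{\text{eps}})$ and gives up on the nulled frequencies, which cannot be recovered from the blurred frame alone.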

fig-motion-blur-integral
Figure 12.1.1. A frame is an integral over the exposure. A bright point traverses the frame while the shutter is open for time $\tau$; the sensor accumulates its light along the whole path, recording a streak rather than a point. The streak's length is $\lVert\mathbf v\rVert\,\tau$ and its intensity profile is the shape of the exposure's time-window — so motion blur is literally a 1-D box convolution along the motion vector $\mathbf v$.
fig-shutter-angle
Figure 12.1.2. Shutter angle sets the blur. A rotating-disc shutter with a near-$0^\circ$ / $180^\circ$ / $360^\circ$ opening admits a small, half, or full fraction of the frame interval $T$, integrating short / normal / maximal motion blur. The $180^\circ$ case ($\tau=T/2$) is the cinematic "film-look" convention; small angles freeze motion into strobed frames.

12.1.2 Time is sampled → temporal aliasing, the wagon-wheel effect

A video is a discrete sequence of frames at rate $f_s$: it samples the continuous motion of the world. And as with any sampling, motion components faster than half the frame rate cannot be represented and instead alias — they fold down to a false, slower, sometimes reversed apparent motion. This is L5/L16 on the time axis, full stop.

The condition is the temporal Nyquist limit. To capture motion of temporal frequency $f_{\text{motion}}$ — say, a wheel's spokes passing a fixed point — you need

$$ f_s > 2\,f_{\text{motion}} . $$

Violate it and the recorded apparent frequency folds to $f_{\text{app}} = \lvert f_{\text{motion}} - n f_s\rvert$, for the integer $n$ that brings the result into $[0, f_s/2]$.
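The folding rule can be written as a few lines of code. This is a minimal sketch: `signed_alias` is an illustrative helper (not from the text) that folds a frequency into $[-f_s/2, f_s/2)$ so that a negative result can be read as apparent backward motion, and its absolute value is the $f_{\text{app}}$ of the formula above.

```python
import numpy as np

def signed_alias(f_motion: float, f_s: float) -> float:
    """Fold f_motion into [-f_s/2, f_s/2); a negative value reads as backward motion."""
    return (f_motion + f_s / 2.0) % f_s - f_s / 2.0

def apparent_frequency(f_motion: float, f_s: float) -> float:
    """|f_motion - n f_s| for the integer n that lands in [0, f_s/2]."""
    return abs(signed_alias(f_motion, f_s))

f_s = 24.0
for f in (10.0, 23.0, 24.0, 25.0):
    print(f, signed_alias(f, f_s))
# 10 Hz is below Nyquist and is kept; 23 Hz folds to -1 Hz (slow, backward);
# 24 Hz folds to 0 Hz (stalled); 25 Hz folds to +1 Hz (slow, forward).

# The samples themselves cannot tell 23 Hz from 1 Hz at 24 frames per second.
t = np.arange(48) / f_s
same_samples = np.allclose(np.cos(2 * np.pi * 23.0 * t), np.cos(2 * np.pi * 1.0 * t))
print(same_samples)
```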

The canonical demonstration is the wagon-wheel effect (Figure 12.1.3). Film a spoked wheel at frame rate $f_s$. While the spoke-passing frequency stays below Nyquist, $f_s/2$, the wheel turns forward correctly. Past $f_s/2$ it appears to spin backward, because each frame catches the next spoke short of where the previous one stood, so the sampled snapshots imply a backward step per frame; as the spoke-passing frequency climbs toward $f_s$ that backward step shrinks, and at exactly $f_s$ every frame finds a spoke where the last one stood and the wheel appears to stall. It is the exact temporal twin of spatial moiré: undersampling maps a high frequency onto a wrong low one. Helicopter rotors and car rims under video or strobed streetlights show the identical artifact.

The everyday face of the same phenomenon is strobing / judder: fast pans, or fast subjects shot at low frame rate or with a very short shutter, look stuttery rather than smoothly flowing, because the motion is temporally undersampled and the eye perceives discrete jumps instead of continuous travel. Stroboscopic lighting that appears to freeze or reverse a spinning fan is the same effect, with the light doing the sampling instead of the shutter.

The fixes are the sampling toolkit, transplanted to time. One option is brute force: raise $f_s$. A higher frame rate buys a higher temporal Nyquist limit, which is exactly what high-speed cameras are for. The other is to prefilter before sampling, which on the time axis means integrate longer: a longer exposure is a wider temporal box, a low-pass that attenuates the offending high temporal frequencies before the frame grid samples them. And that prefilter is motion blur, which is why these two sections are really one story, the subject of the next section.

fig-temporal-aliasing-wagonwheel
Figure 12.1.3. The wagon-wheel effect. A spoked wheel is filmed at a fixed frame rate $f_s$ while its spoke-passing frequency rises: below the temporal Nyquist limit $f_s/2$ it is seen spinning forward, past the limit it appears to spin backward, and when the spoke-passing frequency reaches $f_s$ it appears to stall. This strobing is the temporal twin of moiré, a high motion frequency mapped to a wrong low one.

12.1.3 Motion blur is the temporal prefilter — the two are one tradeoff

Here is the unification, and the payoff of reading L16 in time. The temporal anti-alias filter you would want in front of a video sampler is a low-pass over time — and the camera already supplies one, for free: integration over the exposure window. A long exposure box-filters the motion in time before the frame rate samples it, suppressing exactly the high temporal frequencies that would otherwise alias. So motion blur and temporal aliasing are the two outcomes of one knob $\tau$: integrate more and you get blur (aliasing prefiltered away); integrate less and you get sharp frames that strobe (aliasing let through). They are not separate phenomena to be traded off against each other from outside — they are the two sides of a single dial (Figure 12.1.4).
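How strongly a given exposure suppresses a given temporal frequency can be estimated from the frequency response of the box window, whose magnitude is $\lvert\operatorname{sinc}(f\tau)\rvert$. The sketch below evaluates it at $f = f_s$, the frequency that would alias to a standstill, for three shutter angles; the function name and the 24 fps example are assumptions for illustration.

```python
import numpy as np

def exposure_gain(f_hz: float, tau_s: float) -> float:
    """Magnitude response of a box exposure of duration tau_s at frequency f_hz.

    np.sinc(x) is sin(pi x) / (pi x), the Fourier magnitude of a unit-area box.
    """
    return float(np.abs(np.sinc(f_hz * tau_s)))

f_s = 24.0          # assumed frame rate
T = 1.0 / f_s       # frame interval

gain_45 = exposure_gain(f_s, T * 45.0 / 360.0)   # short shutter: about 0.97
gain_180 = exposure_gain(f_s, T / 2.0)           # film convention: 2/pi, about 0.64
gain_360 = exposure_gain(f_s, T)                 # full interval: essentially 0
print(gain_45, gain_180, gain_360)
```

Under these assumptions a short shutter passes the aliasing frequency almost untouched, a $180^\circ$ shutter attenuates it to roughly two thirds, and only the $360^\circ$ shutter nulls it. The box is a gentle low-pass with sidelobes, so in practice the exposure reduces temporal aliasing rather than eliminating it outright.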

Why you cannot win for free is just the spatial L16 dilemma — prefiltering buys freedom from aliasing at the cost of blur — moved bodily to the time axis. The "right" exposure is a frequency-domain compromise: long enough to kill the aliasing, short enough to keep the wanted motion sharp. Synthetic imaging faces the dual of this and resolves it the same way: computer-generated imagery (CGI) renderers add motion blur deliberately as a temporal anti-alias filter (Potmesil & Chakravarty 1983), integrating over sub-frame time samples so that fast on-screen motion blurs smoothly instead of strobing — the renderer is choosing a temporal $\tau$ exactly as a camera does.

There is an escape hatch, and it is the same one that recurs across the book: capture the full set, decide later. Shoot at a high frame rate with a short shutter — sharp and, if $f_s$ is high enough, alias-free — and then synthesize whatever motion blur or slow-motion you want afterward by integrating or interpolating frames. This defers the exposure-versus-blur decision out of the instant of capture entirely; it is the temporal cousin of capturing a light field and choosing the aperture later, and it sets up Frame interpolation and slow-motion synthesis.
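The "decide later" idea can be sketched as frame averaging: group consecutive short-exposure frames and average some of them to emulate a longer shutter. This is a minimal sketch under simplifying assumptions (linear intensities, no noise, no interpolation between frames); the function `synthesize_exposure` and its parameters are hypothetical names, not an established API.

```python
import numpy as np

def synthesize_exposure(frames: np.ndarray, group: int, open_frames: int) -> np.ndarray:
    """Emulate a slower camera from a (T, H, W) stack of short-exposure frames.

    Each output frame covers `group` input frames and averages the first
    `open_frames` of them, i.e. a shutter angle of 360 * open_frames / group degrees.
    """
    if not 1 <= open_frames <= group:
        raise ValueError("open_frames must lie in 1..group")
    n_out = frames.shape[0] // group
    # Drop any incomplete trailing group, then split time into (n_out, group).
    blocks = frames[: n_out * group].reshape(n_out, group, *frames.shape[1:])
    return blocks[:, :open_frames].mean(axis=1)

# Toy capture: a bright dot moving one pixel per high-speed frame.
frames = np.zeros((8, 1, 16))
for t in range(8):
    frames[t, 0, t] = 1.0

blur_180 = synthesize_exposure(frames, group=4, open_frames=2)  # half-open shutter
blur_360 = synthesize_exposure(frames, group=4, open_frames=4)  # fully open shutter
print(blur_180[0, 0, :4], blur_360[0, 0, :4])
```

From one capture, the half-open setting produces a 2-pixel streak per output frame and the fully open setting a 4-pixel streak, so the blur level becomes a post-processing choice.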

fig-blur-as-temporal-prefilter
Figure 12.1.4. One knob, two outcomes. The same fast motion is captured two ways: a long exposure gives a frame that is blurred but not aliased, because the exposure window low-passed the motion before sampling; a short exposure (small shutter angle) gives sharp frames that strobe, because the high temporal frequencies were let through to alias. The exposure window is the temporal anti-alias filter; this is L16 in time.

12.1.4 Lagrangian vs Eulerian — the organizing distinction for the part

The chapter's last move steps back from blur and aliasing to a distinction that frames the entire part. Borrowed from fluid dynamics, there are two ways to describe a moving scene, and almost every technique in this part commits to one of them.

The Lagrangian view says follow the particles. Attach yourself to a physical point in the scene and ask where it goes over time. This is tracking and optical flow: the output is trajectories and correspondences — a displacement field recording which point moved where, the material-derivative ($\tfrac{D}{Dt}$) way of looking at things.

The Eulerian view says watch fixed locations. Stay put at a single grid pixel and record the time series of intensity that passes through it. There is no correspondence, no "which point" — only what happened at this fixed cell over time. The output is a per-pixel signal in time (Figure 12.1.5).

This is the right organizing axis because it cleanly sorts the part's methods and explains their difficulty. The Lagrangian methods are Optical flow (dense — follow every pixel's motion) and Feature tracking (sparse — follow distinctive points). They are hard precisely because correspondence is non-local and ill-posed: the aperture problem, occlusion, and large displacements all bite. Matching-based flow is "more Lagrangian" than differential flow exactly because it follows the patch rather than differentiating at a fixed pixel. The Eulerian method is video magnification (Video magnification), and its trick is to refuse correspondence altogether: take the time series at each fixed pixel, band-pass filter it in time, amplify the tiny temporal variations — a pulse in a cheek, a sub-pixel structural sway, a sound-induced vibration of a chip bag — and add them back. By never following particles, Eulerian processing sidesteps the entire hard matching problem; that is why it works on motions far too small to track. This chapter exists to set that contrast up.
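The Eulerian recipe described above (per-pixel time series, temporal band-pass, amplify, add back) fits in a few lines. The following is a deliberately stripped-down sketch: it uses an ideal FFT band-pass and omits any spatial decomposition, so it illustrates the idea rather than reproducing a published pipeline. The function name, the pass band, and the gain `alpha` are assumptions.

```python
import numpy as np

def eulerian_amplify(video: np.ndarray, fps: float,
                     f_lo: float, f_hi: float, alpha: float) -> np.ndarray:
    """Amplify temporal variations in [f_lo, f_hi] Hz at every fixed pixel.

    video has shape (T, H, W). No pixel is ever matched to another pixel:
    all processing runs along the time axis only.
    """
    spectrum = np.fft.rfft(video, axis=0)
    freqs = np.fft.rfftfreq(video.shape[0], d=1.0 / fps)
    passband = (freqs >= f_lo) & (freqs <= f_hi)
    spectrum[~passband] = 0.0                      # ideal temporal band-pass
    band = np.fft.irfft(spectrum, n=video.shape[0], axis=0)
    return video + alpha * band                    # add the amplified band back

# Toy video: constant gray, with a faint 1 Hz flicker at a single pixel.
fps, n_frames = 30.0, 90
t = np.arange(n_frames) / fps
video = np.full((n_frames, 2, 2), 0.5)
video[:, 0, 0] += 0.01 * np.sin(2 * np.pi * 1.0 * t)

magnified = eulerian_amplify(video, fps, f_lo=0.8, f_hi=1.2, alpha=10.0)
print(np.ptp(video[:, 0, 0]), np.ptp(magnified[:, 0, 0]))
```

The flickering pixel's variation grows by a factor of $1 + \alpha$ while the static pixels are left unchanged, and at no point does the code ask where anything moved.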

The two views trade off against each other in a way worth stating sharply. Lagrangian gives you explicit, large-range correspondence — you genuinely know where the ball went — but you must solve the hard, ill-posed matching to get it. Eulerian is trivial to compute — just per-pixel temporal filtering — and is brilliant for tiny, sub-pixel changes, but it has no notion of where anything went and breaks down for large motion, because a big displacement is simply not a small per-pixel perturbation. The rule of thumb: small motion → Eulerian; large motion → Lagrangian. (Phase-based magnification is a clever middle ground — Eulerian processing carried out in a steerable-pyramid phase representation, where local phase already encodes local motion.)

Optional sidebar — the fluid-dynamics origin

The Lagrangian / Eulerian terms come from describing fluid flow, and the analogy is exact. The Lagrangian specification tracks individual fluid parcels along their trajectories: imagine following one dyed droplet downstream. The Eulerian specification fixes a measurement grid and records velocity or density at each fixed point as fluid passes through, like a weather station logging whatever blows by. Computer-graphics fluid solvers make the very same choice: particle / smoothed-particle hydrodynamics (SPH) methods are Lagrangian, grid-based solvers are Eulerian, and many production simulators are hybrid (FLIP, particle-in-cell), borrowing from both. That mirrors vision exactly (tracked particles for large motion, per-pixel grids for tiny motion): the same dichotomy answering the same question, in two fields some two centuries apart.

fig-lagrangian-vs-eulerian
Figure 12.1.5. The organizing diagram. Lagrangian (left): arrows follow individual particles through the frame over time — optical flow and feature tracking, whose output is trajectories and correspondences. Eulerian (right): a fixed grid of pixels, each plotting its own intensity-over-time series — video magnification, which never asks where anything went. The two ways to look at one moving scene.

Big lessons of this chapter

The recurring principles from this chapter, gathered for review.

💡 Big lesson (recurrence of L5 / L16 — Nyquist and prefiltering, on the time axis)

The spatial sampling laws apply unchanged in time. L5: a frame rate $f_s$ can faithfully capture only temporal frequencies below $f_s/2$; faster motion folds down to a false low (even reversed) frequency — the wagon-wheel effect is temporal moiré. L16: the cure for aliasing is to prefilter before you sample — and on the time axis the prefilter is built into the camera, because integrating over the exposure window (which is exactly what produces motion blur) is the temporal anti-alias filter. So motion blur and temporal aliasing are not two problems but one tradeoff: a longer exposure removes strobing by blurring; a shorter one gives sharp frames that strobe. (Registered in Linearity, Fourier, aliasing and deblurring as L5 and L16; this is their temporal instance — the wagon-wheel is L16's named time-axis face.)