💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
jump to

10.0 MULTIPLE EXPOSURE IMAGING

In the 1850s, Gustave Le Gray sold seascapes that no camera of his day could take. A wet-plate negative could hold a bright sky or a dim sea, never both — so he shot the two separately and sandwiched two negatives under one sheet of printing paper, exposing each through its own mask. The horizon where they meet is a seam he hid by hand. He was not faking; he was measuring twice and combining, because one exposure could not hold the whole scene. He was not alone in the instinct: a decade earlier, W. H. Fox Talbot's printing establishment at Reading was already cutting and pasting prints to widen a view beyond what a single plate could frame. The arithmetic that used to live in masks, scissors, and a darkroom timer now lives in software and runs per-pixel, millions of times a second, inside the phone in your pocket. But the move is identical, and it is old: one frame isn't enough — so take several, and combine them.

This part is that move, made systematic. The previous part measured and characterised a single capture and all the ways it falls short — not enough light, not enough dynamic range, not enough field of view, not enough depth of field, not enough colors, not enough time. The fix, every time, is the same shape: when one frame cannot hold everything you want, capture several frames that each commit differently, and resolve the tradeoff afterward, as a downstream computation. That is the spine of the whole part, and it is worth stating bluntly — capture the full set, decide later. Instead of forcing the instant of exposure to make every decision at once, you record a set that brackets the decision, and you choose per-pixel in software once the data is in hand. Capture is cheap and parallel; the decision is a computation you can take your time over, redo, or defer indefinitely.

Each chapter is one axis of "more than one capture," and on each axis the deferral is the same idea wearing different clothes. Averaging a burst defers the noise/exposure tradeoff — don't take one careful frame and hope, take many and let the noise cancel. High-dynamic-range (HDR) imaging defers the oldest question in photography, expose for the highlights or the shadows? — capture both and merge. Focal stacks defer focus near or far? Panoramas defer which field of view? Hyperspectral defers which three colors do I call red, green, blue? Time-lapse intrinsics defer which illumination? The cost is always the same — more data, plus a reconstruction step — and so is the payoff: an irreversible choice is lifted out of the moment of capture and turned into something you compute. Once you see this one move, the part stops looking like six unrelated topics and becomes one idea applied along six axes.

Two quantitative results recur underneath the prose, so meet them now. The first is the $1/\sqrt N$ averaging law: model each noisy pixel as a true value plus independent, zero-mean noise; average $N$ frames and the signal survives untouched while the noise standard deviation falls as $1/\sqrt N$. It is the honest engine behind every "shoot more frames" trick in the part — and it comes with a built-in sting, each extra stop of cleanliness costs four times the frames. The second is geometric: when a camera merely rotates (it does not translate), the views are related by a homography that does not depend on scene depth — which is exactly why you can stitch a panorama from a hand-held pan without knowing anything about the 3-D world. The third is not a formula but a recurring algorithmic move — robust combination: when a stack carries outliers, replace the plain mean with a median or sigma-clipping combine, trust the consensus and reject the stray. It begins as outlier rejection in a denoising stack, becomes de-ghosting when something moves between an HDR bracket's frames or a panorama's, and returns as the Weiss median that separates a moving object from a static scene — the same move in five chapters. Keep all three in your pocket; they explain most of what follows.

The roadmap walks the axes. Denoising by averaging adds time: it derives the $1/\sqrt N$ law from two lines of probability, upgrades the plain mean to a robust combine (median, sigma-clipping) when the stack has outliers, adds the calibration frames (bias, dark, flat) that fixed-pattern noise demands, and lands on deep-sky astrophotography as the law pushed to its physical extreme — pulling a galaxy out of near-nothing. Image alignment is the glue every later chapter needs: you must register before you combine, because averaging two frames that disagree by a pixel doesn't clean the image, it blurs it. This chapter is the shared toolbox — sum-of-squared-differences (SSD) and normalized cross-correlation (NCC) matching scores, phase correlation (a pure shift is a phase ramp in Fourier), coarse-to-fine search on a pyramid — and it hands off naturally to the video part, where the same "find the motion between frames" machinery becomes optical flow and stabilization.

HDR merging adds exposure: calibrate the camera's response, then merge a bracket into a single radiance map weighted by each pixel's reliability. Application to cell phones: HDR+ and burst imaging shows why your phone does the opposite of a classic bracket — it shoots a burst of identical underexposed raws and leans on the $1/\sqrt N$ law plus a robust merge, the production system that ships in every handset. Then viewpoint: Manual panorama stitching from multiple views builds the homography by hand (and explains why it needs no depth), and Automatic panorama stitching from multiple views and feature matching makes it automatic — detect features, match them, survive the bad matches with random sample consensus (RANSAC), recover the warps. Blending hides the seams the warps leave behind, climbing the ladder from feathering through pyramid (multiband) and Poisson to graph-cut seams, with a clear account of which one fails when. Bells and whistles handles the rest of reality — projections onto a cylinder or sphere, exposure drift, moving objects, ghosting — and Continuous panoramas (e.g. on cell phones) covers the phone "sweep," where the panorama is built continuously from a video-rate stream rather than a few stills.

The last three chapters carry the same idea onto fresh axes. Focal stacks and depth of field extension adds focus: capture a stack swept through focus, then pick, per pixel, the frame where that point is sharpest — an all-in-focus image no single aperture could produce. Hyperspectral imaging, color wheels adds wavelength: instead of committing to three color channels at capture, record many narrow bands (a filter wheel, a swept color) and decide which combination to display later. And Intrinsic images with time lapse adds illumination: hold the camera still and let the light change across a day; from the stack you can factor each pixel into the surface that stayed put and the lighting that moved — the same stack-and-decide trick, now separating reflectance from illumination over time.

Through all of it, watch the same characters. The robust combine that rejects a satellite streak in an astro stack is the same move that de-ghosts a panorama and a moving HDR scene — trust the consensus, not the outlier. Alignment is the silent precondition everywhere; skip it and every chapter's payoff turns to mush. And the radiance map HDR produces must still be displayed on an ordinary screen — that is tone mapping, recapped in Recap: tone mapping and developed in the basics; the merge lives here. Hold onto the one sentence and the chapters stop looking like a catalogue of tricks: don't make one capture carry every decision — record the full set, and decide, per pixel, later.


Contents of this part

▸ full collapsible outline of this part