10.11 Hyperspectral imaging, color wheels
Stand in front of a fresh leaf and a plastic leaf dyed to match it. To your eye, and to your camera, they can be the same green — identical R, G, B. Yet the light coming off them, measured wavelength by wavelength across the spectrum, is not remotely the same: the real leaf has a sharp cliff in reflectance just past the red, where chlorophyll stops absorbing and the near-infrared (NIR) floods back, and the plastic has nothing of the kind. Your camera cannot see that cliff. It collapsed the whole spectrum into three numbers at the instant of capture, and the cliff fell into the gap between them. Hyperspectral imaging is the refusal to make that collapse — to record, at every pixel, not a color but a spectrum: tens or hundreds of narrow wavelength bands instead of three broad ones.
This is the part's one move, transplanted to a new axis. We have stacked frames along exposure to beat dynamic range and along focus to beat depth of field; here we stack along wavelength, deferring "which three colors?" — or really, which spectral question — to long after capture (the L14 framing, made precise in the box below). Record the full spectral set, and you can ask, days or years later, what is this thing actually made of, a question RGB threw away before you ever thought to ask it.
Capture the full set, decide later. RGB discards the spectrum at the moment of capture: three broad, overlapping bands, irreversibly mixed into three numbers. Hyperspectral imaging instead records the whole spectrum per pixel and decides spectral questions afterward — the same defer-the-decision move as HDR on the exposure axis, focal stacks on the focus axis, and light fields on the aperture axis, now run along the wavelength axis. The cost is the usual one: more data, more light, more capture time. The payoff is that you can ask what is this made of? — match a material, spot disease, expose a forgery — long after the shutter has closed. (Registered as L14 in Big Lessons; first appears in this part introduction; this is its wavelength-axis recurrence, and the light-field/plenoptic camera in Advanced computational photography is the same idea for the focus axis.)
10.11.1 Why three numbers aren't enough — RGB as a 3-sample projection
Recall, from Human (and animal) vision and color and Color technology, exactly what a camera channel does. Light arrives at a pixel carrying a continuous spectrum $S(\lambda)$, that is, power as a function of wavelength. A color channel does not record that function; it records a single integral of it against a fixed spectral sensitivity $R_c(\lambda)$:

$$I_c = \int S(\lambda)\, R_c(\lambda)\, d\lambda, \qquad c \in \{R, G, B\}.$$
Read back: each channel multiplies the incoming spectrum by its own sensitivity curve and sums the result to one number. RGB does this with three broad, overlapping sensitivities — roughly the long-, medium-, and short-wavelength responses our cones inspired — so a whole continuous function is summarised by three dot products. That projection is brutally lossy and many-to-one: it is entirely possible for two physically different spectra to produce identical RGB. Those are metamers, and they are not an edge case — they are the everyday currency of color reproduction (the leaf and its plastic twin, two paints, two inks).
What gets lost is not noise; it is information that can be exactly the information you need. Two pigments on a canvas, two species of plant, fresh produce and the same fruit a day from spoiling, the forger's ink and the original it imitates — these routinely look identical to RGB and differ plainly in their spectra. The data that would tell them apart is precisely what the three-number projection discarded (Figure 10.11.1).
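To make the many-to-one claim concrete, here is a minimal sketch that builds two different spectra with identical RGB. The Gaussian sensitivity curves, the wavelength grid, and the toy spectra are all assumptions made for illustration; they are not measured camera or cone data. The trick is to add a perturbation that is orthogonal to every sensitivity curve, so each channel's integral is unchanged.

```python
import numpy as np

# Assumed wavelength grid: visible range in 5 nm steps.
wavelengths = np.arange(400.0, 701.0, 5.0)
dlam = 5.0

def gaussian(center, sigma):
    """Hypothetical bell-shaped sensitivity curve (not a measured one)."""
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)

# Three broad, overlapping channel sensitivities R_c(lambda), one per row.
R = np.stack([gaussian(600, 40), gaussian(540, 40), gaussian(460, 40)])

def to_rgb(spectrum):
    """Discretised channel integral: sum over lambda of S * R_c * d_lambda."""
    return R @ spectrum * dlam

# Spectrum A: a smooth, made-up curve.
spectrum_a = 0.5 + 0.3 * np.sin(wavelengths / 50.0)

# Take a spectral wiggle and remove the part the channels can see,
# leaving a component orthogonal to every row of R.
wiggle = np.cos(wavelengths / 12.0)
coeffs, *_ = np.linalg.lstsq(R.T, wiggle, rcond=None)
invisible = wiggle - R.T @ coeffs

# Spectrum B differs from A only by that invisible component.
spectrum_b = spectrum_a + 0.05 * invisible

rgb_a = to_rgb(spectrum_a)
rgb_b = to_rgb(spectrum_b)
```

Here `rgb_a` and `rgb_b` agree to numerical precision even though `spectrum_a` and `spectrum_b` differ band by band, which is exactly the metamer situation: the difference lives in the part of the spectrum that the three-number projection cannot see.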
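The generalisation is then almost trivial. Keep the very same equation, but swap the three broad sensitivities for many narrow band responses $R_b(\lambda)$, each a near-delta spike at a wavelength $\lambda_b$:

$$I_b = \int S(\lambda)\, R_b(\lambda)\, d\lambda \;\approx\; S(\lambda_b), \qquad b = 1, \dots, B,$$

where the approximation holds when each $R_b$ is narrow and normalised to unit area.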
Now each pixel yields not three numbers but a sampled spectrum $S(x,y,\lambda_b)$ — and stacking those samples across all bands gives the hyperspectral data cube $I(x,y,\lambda)$: an image stack whose third axis is wavelength, not focus or exposure (Figure 10.11.2). Seen this way, RGB is just this cube with three broad bands, and a Bayer color-filter array is a three-band snapshot mosaic — the most impoverished hyperspectral camera there is (cross-ref Color technology). The cube is the general object; color is a thin slice of it.
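As a rough illustration of "RGB is a thin slice of the cube", the sketch below stores a toy cube as an array of shape (rows, columns, bands) and collapses it to RGB with three broad weightings along the wavelength axis. The band centres, the random cube contents, and the Gaussian weightings are assumptions chosen only to show the shapes involved.

```python
import numpy as np

# Assumed band centres lambda_b: 31 bands, 10 nm apart.
wavelengths = np.linspace(400.0, 700.0, 31)
height, width = 4, 6  # tiny toy image

rng = np.random.default_rng(0)
# Toy data cube I(x, y, lambda) with values in [0, 1).
cube = rng.random((height, width, wavelengths.size))

# The spectrum at one pixel is just a 1-D slice along the last axis.
pixel_spectrum = cube[1, 2]

def broad_band(center, sigma=40.0):
    """Hypothetical broad channel weighting, normalised to sum to 1."""
    w = np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)
    return w / w.sum()

# Columns are the R, G, B weightings over the bands: shape (bands, 3).
rgb_weights = np.stack(
    [broad_band(600), broad_band(540), broad_band(460)], axis=1
)

# Collapse the wavelength axis: (H, W, bands) @ (bands, 3) -> (H, W, 3).
rgb = cube @ rgb_weights
```

Going from `cube` to `rgb` is a single matrix product; going back is not possible without extra assumptions, because 31 numbers per pixel have been folded into 3.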
A word on vocabulary, because the two terms are used loosely. Multispectral imaging means a handful of bands: the seven of Landsat's Thematic Mapper, say, or an ordinary camera with a near-infrared channel bolted on. Hyperspectral means many contiguous, narrow bands, tens to hundreds of them, packed densely enough that each pixel carries a quasi-continuous spectrum. Same idea, different sampling density along $\lambda$; the line between them is one of degree, not of kind.
10.11.2 Building the spectral stack — filter wheels, tunable filters, pushbroom, snapshot
Having decided to fill a cube $I(x,y,\lambda)$, the engineering question is how. A sensor is a two-dimensional array; the cube is three-dimensional; something has to give. Every hyperspectral camera is a different answer to "which dimension do I trade away to get the third?" — and, exactly as with the focal and exposure stacks, the unifying view is L14: each method fills the cube by a different schedule, trading time against spatial resolution against spectral resolution (Figure 10.11.3).
Spectral scanning — the filter wheel and the tunable filter. The most direct strategy: put a narrow bandpass filter in front of the sensor, take a full-frame image in that one band, then change the filter and shoot again, sweeping through the bands one at a time. A mechanical color wheel rotates discrete glass filters into the optical path — the literal "color wheel" of the chapter's title, and the oldest trick in the book (Prokudin-Gorskii's three-filter color photographs of 1900s Russia are the three-band ancestor). Better, an electronically tunable filter — a liquid-crystal tunable filter (LCTF) or an acousto-optic tunable filter (AOTF) — sweeps its passband with no moving parts, fast and programmable. The trade is clean: you keep full spatial resolution and can take as many bands as you have patience for, but the capture is sequential in time, so the scene and camera must hold perfectly still across the whole sweep. This is the natural choice for the lab bench, the microscope, and a painting on an easel — anything that will sit still.
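The sequential schedule can be sketched in a few lines. The function `capture_through_filter` is a hypothetical stand-in for "select this filter and expose"; here it simply reads one band from a synthetic scene. The second half simulates a scene that drifts by one pixel per exposure, to show why the sweep needs a still subject.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for the true I(x, y, lambda): 8 x 10 pixels, 5 bands.
scene = rng.random((8, 10, 5))
n_bands = scene.shape[2]

def capture_through_filter(band_index):
    """Hypothetical capture: one full-frame image through one bandpass filter."""
    return scene[:, :, band_index].copy()

# One full-resolution frame per filter, taken one after another in time.
frames = [capture_through_filter(b) for b in range(n_bands)]

# Stack the frames along a new last axis to form the cube.
cube = np.stack(frames, axis=-1)

# If the scene drifts sideways by one pixel per exposure, each band lands
# in a different place and the per-pixel spectra are no longer consistent.
drifted = np.stack(
    [np.roll(scene[:, :, b], shift=b, axis=1) for b in range(n_bands)],
    axis=-1,
)
```

With a static scene, `cube` reproduces the scene exactly at full spatial resolution. In `drifted`, only the first band is still aligned, so a spectrum read at any pixel mixes values from different scene points.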
Spatial scanning — pushbroom (line-scan). Here you give up imaging the whole frame at once. A slit admits a single line of the scene, and a dispersing element (a prism or grating) spreads that line's light across the sensor by wavelength. One sensor axis then records position along the line and the other records wavelength, so you capture one spatial line in all its bands simultaneously. To fill the second spatial dimension you sweep the line across the scene, usually by moving the platform. This is the right architecture precisely when the scene is already moving past the sensor: a satellite or aircraft with the ground scrolling beneath it (hence the name "pushbroom", for the way the line of view sweeps the ground like a broom pushed across a floor), or a conveyor belt of produce or crushed mineral streaming past an inspection head. The trade: every band of a line arrives at once, but you need relative motion and careful line-to-line registration to assemble a clean cube.
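The pushbroom schedule fills the same cube along a different axis. In this sketch, `read_line` is a hypothetical sensor readout that returns one slit position as a (pixels, bands) frame; the scene is synthetic and perfectly registered, which real platforms are not.

```python
import numpy as np

rng = np.random.default_rng(2)
# True cube: (along-track lines, across-track pixels, bands).
scene = rng.random((6, 8, 12))

def read_line(line_index):
    """Hypothetical readout: one slit position, dispersed into all bands.

    One sensor axis is position along the slit, the other is wavelength.
    """
    return scene[line_index].copy()  # shape (8, 12)

# The platform moves; each step delivers one line with every band at once.
line_frames = [read_line(i) for i in range(scene.shape[0])]

# Stacking the line frames along the direction of motion rebuilds the cube.
cube = np.stack(line_frames, axis=0)
```

Each element of `line_frames` is already spectrally complete, so nothing has to hold still across bands; the burden moves to knowing exactly where each line fell on the ground, which this toy example assumes away.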
Snapshot spectral imaging. The third answer captures the entire cube in one exposure, by spending spatial pixels on spectral bands. A spectral filter mosaic — a Bayer-style array, but with many band filters tiled instead of just R, G, B — gives each tiny neighbourhood of pixels a full set of bands; computational and coded designs do the same trick more cleverly (surveyed by Hagen & Kudenov 2013). The trade is the obvious one: a single shot, so motion and even video are fine, paid for by a drop in spatial resolution, since the sensor's pixel budget is now split between where and what color.
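A filter-mosaic snapshot can be simulated as follows. The 2 x 2 tile with four bands is an assumed layout chosen for brevity, and the "reconstruction" is plain subsampling with no interpolation, which is the simplest option rather than what a production pipeline would do.

```python
import numpy as np

rng = np.random.default_rng(3)
n_bands, tile = 4, 2
# Synthetic true cube: 8 x 8 pixels, 4 bands.
scene = rng.random((8, 8, n_bands))

# Assumed mosaic tile: which band each pixel of a 2 x 2 cell sees.
# [[0, 1],
#  [2, 3]]
pattern = np.arange(n_bands).reshape(tile, tile)
band_map = np.tile(pattern, (scene.shape[0] // tile, scene.shape[1] // tile))

# Single exposure: every pixel records only the band of its own filter.
rows, cols = np.indices(band_map.shape)
raw = scene[rows, cols, band_map]  # shape (8, 8)

# Demultiplex by subsampling: band b sits at offset (b // tile, b % tile).
low_res = np.stack(
    [raw[(b // tile)::tile, (b % tile)::tile] for b in range(n_bands)],
    axis=-1,
)
```

The single frame `raw` yields a cube of shape (4, 4, 4): all four bands from one shot, at half the resolution along each spatial axis. More bands per tile means a larger tile and a proportionally coarser image.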
The most ambitious snapshot designs are unapologetically computational, and they pull the chapter back onto the part's other spine: the inverse problem. In coded aperture snapshot spectral imaging (CASSI) (Wagadarikar et al. 2008), a coded aperture and a dispersive element (a prism or grating) sit in the optical path and together multiplex the whole $(x,y,\lambda)$ cube onto a single 2-D sensor frame: the mask blocks a patterned subset of rays and the disperser shears each band sideways by an amount that depends on its wavelength, so every sensor pixel ends up summing a coded mixture of wavelengths from a small neighbourhood. No band is read cleanly; the measurement is one scrambled projection of the entire cube. Recovering the cube is then exactly the compressive-sensing reconstruction we met for coded imaging: find the $I(x,y,\lambda)$ whose coded, sheared projection matches the measured frame, an under-determined fit pinned down by a prior. That prior may be that a natural spectrum is sparse in some basis or, increasingly, a learned spectral prior that fills in the rest. The bargain is the one this whole part keeps making, now spent the other way round: instead of trading time (the filter wheel's sequential sweep) or spatial pixels (the mosaic) for the cube, CASSI trades a reconstruction, solving an inverse problem in software, for a genuine single-shot capture, fast enough for motion and video where the scanners cannot follow. It is the same coded-aperture, compressive-sensing move as single-pixel and coded-aperture imaging; the machinery is developed under compressive sensing / coded imaging in Advanced computational photography.
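The measurement side of this scheme (mask, shear, sum) is easy to sketch, even though the reconstruction is not. The version below is a simplified forward model under stated assumptions: a random binary mask, a shear of exactly one pixel per band along the column axis, and no noise or optical blur. It does not attempt the inverse problem.

```python
import numpy as np

rng = np.random.default_rng(4)
H, W, B = 6, 8, 5
# Synthetic cube I(x, y, lambda) and an assumed random binary coded aperture.
cube = rng.random((H, W, B))
mask = rng.integers(0, 2, size=(H, W)).astype(float)

def cassi_forward(cube, mask):
    """Simplified coded, sheared projection of a cube onto one 2-D frame.

    Each band is multiplied by the mask, shifted sideways by its band
    index (assumed: one pixel per band), and summed on the sensor.
    """
    H, W, B = cube.shape
    frame = np.zeros((H, W + B - 1))
    for b in range(B):
        frame[:, b:b + W] += mask * cube[:, :, b]
    return frame

measurement = cassi_forward(cube, mask)

# Far fewer measurements than unknowns: the fit is under-determined.
n_unknowns = cube.size          # H * W * B
n_measurements = measurement.size  # H * (W + B - 1)
```

With these sizes there are 240 unknowns and 72 measurements, which is why a prior is needed to pick one cube among the many that reproduce the same frame. The model is linear in the cube, so standard sparse or learned reconstruction methods apply in principle.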
Step back and the three strategies fall on a clean tradeoff triangle. You want all three of {full spatial resolution, full spectral resolution, single-shot / fast}; you can cheaply have any two. Spectral scanning sacrifices time (the scene must freeze); pushbroom demands motion (you must sweep); snapshot sacrifices spatial resolution (the mosaic spends pixels on wavelength). There is no free corner — which is simply the L14 cost (data, light, time) made geometric.
10.11.3 What it's for — material ID, agriculture, art and beyond
A spectrum per pixel changes the kind of question you can answer. RGB answers "what color is it?"; a spectrum answers "what is it made of?" — and that shift drives every application.
Material identification and unmixing. A pixel's spectrum is a fingerprint. Lay it against a library of known spectral signatures and you can classify the material — this mineral, that plastic, this species of crop — pixel by pixel, building a material map of the scene. And because a single ground pixel from orbit may straddle several materials, you can go further and unmix it, decomposing the measured spectrum into a weighted sum of pure-material signatures to recover how much of each is present. This is the founding use of imaging spectrometers in remote sensing — the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), run by NASA's Jet Propulsion Laboratory (JPL), and its descendants map geology and mineralogy from the air this way.
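Both operations described here, matching against a library and unmixing a blended pixel, can be sketched with a few lines of linear algebra. The signatures below are random made-up vectors, not real material spectra. The matching step uses the spectral angle, one common similarity measure, and the unmixing step uses plain unconstrained least squares; practical unmixing usually adds non-negativity and sum-to-one constraints, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(5)
n_bands, n_materials = 50, 3
# Hypothetical library: one made-up signature per column.
endmembers = rng.random((n_bands, n_materials))

def spectral_angle(a, b):
    """Angle between two spectra; insensitive to overall brightness."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Classification: a pure pixel of material 1, seen at 40% brightness.
pure_pixel = 0.4 * endmembers[:, 1]
angles = [spectral_angle(pure_pixel, endmembers[:, m])
          for m in range(n_materials)]
best_match = int(np.argmin(angles))

# Unmixing: a pixel that straddles three materials (linear mixing assumed).
true_abundance = np.array([0.6, 0.3, 0.1])
mixed_pixel = endmembers @ true_abundance

# Solve endmembers @ a = mixed_pixel for the abundances a.
estimated, *_ = np.linalg.lstsq(endmembers, mixed_pixel, rcond=None)
```

In this noise-free toy case `estimated` recovers `true_abundance` and `best_match` is 1. With real data, noise and an incomplete library make both steps less clean, and the constrained solvers earn their keep.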
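Agriculture and vegetation. Healthy plants do something striking just past the visible: they absorb red light (chlorophyll) but reflect near-infrared strongly, producing the sharp "red edge" we met in the leaf example. The workhorse index that exploits it is the normalised difference vegetation index (NDVI),

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}},$$

where NIR and Red are the pixel's values in the near-infrared and red bands.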
Read it back: the contrast between the near-infrared and red bands, normalised by their sum. High NDVI means vigorous, well-watered foliage; a drop flags water stress, disease, or senescence — often before any change is visible to the eye. Note the shape of the formula. It is a per-pixel band ratio, and that is no accident: it is the L1/L2 multiplicative-and-log lesson in miniature. A ratio of bands cancels the overall illumination level — bright sun or shade scales NIR and red together and divides out — leaving an intrinsic property of the surface. Normalising by the sum just bounds the index to $[-1,1]$. This is the same instinct as working in log to turn a multiplicative scene into an additive one (cross-ref BASIC point operations): when the thing you care about is a proportion, you build a ratio, not a difference.
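The illumination-cancelling property of the band ratio can be checked numerically. In this sketch the band values are made-up numbers standing in for a small patch of pixels, and the small `eps` in the denominator is an added guard against division by zero, not part of the index's definition.

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Per-pixel normalised difference of the NIR and red bands."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Hypothetical 2 x 2 patch: three vegetated pixels and one bare one.
nir = np.array([[0.50, 0.45],
                [0.30, 0.08]])
red = np.array([[0.05, 0.06],
                [0.20, 0.07]])

index_sun = ndvi(nir, red)

# Shade scales both bands by the same factor, which divides out.
index_shade = ndvi(0.3 * nir, 0.3 * red)
```

The two index maps agree to within the tiny `eps` guard, although the second patch is more than three times darker. A simple difference `nir - red` would have dropped by the same factor of 0.3, which is the practical reason the index is built as a ratio.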
Art conservation and forensics. Train a multispectral or hyperspectral camera on a painting and the surface turns translucent to questions the eye cannot pose. Infrared bands penetrate paint layers to reveal underdrawings and pentimenti — the artist's changes of mind; subtle spectral differences betray later retouching and overpaint; matching pigment spectra to a reference library identifies the pigments and dates them; and a forgery can be unmasked when its modern inks or pigments have spectra the period's materials never had. The discipline documents and authenticates by seeing colors that, to us, are not different colors at all.
And beyond. Food inspection grades ripeness and catches contamination on the line; medical and biological imaging reads tissue oxygenation from haemoglobin's spectrum and runs fluorescence microscopy through LCTF/AOTF filters; recycling and sorting plants pick plastics apart by their infrared signatures faster than any human could.
In every one of these the discipline is the same — record the full spectral set, then ask the question: which material? healthy or stressed? original or forgery? — long after the light has been captured. The wavelength axis thus takes its place alongside exposure (HDR), focus (focal stacks), and aperture (light fields) as one more dimension along which you can refuse to decide at the shutter, and defer the decision into software.
Big lessons of this chapter
The recurring principles from this chapter, gathered for review.
Capture the full set, decide later. RGB discards the spectrum at the moment of capture: three broad, overlapping bands, irreversibly mixed into three numbers. Hyperspectral imaging instead records the whole spectrum per pixel and decides spectral questions afterward — the same defer-the-decision move as HDR on the exposure axis, focal stacks on the focus axis, and light fields on the aperture axis, now run along the wavelength axis. The cost is the usual one: more data, more light, more capture time. The payoff is that you can ask what is this made of? — match a material, spot disease, expose a forgery — long after the shutter has closed. (Registered as L14 in Big Lessons; first appears in this part introduction; this is its wavelength-axis recurrence, and the light-field/plenoptic camera in Advanced computational photography is the same idea for the focus axis.)