2.4 Image measurements as integrals
A camera makes a number out of light. Point it at a scene, open the shutter, and where there was a continuous, infinitely detailed field of light there is now a grid of integers. The question of this chapter is exactly what each of those integers is — what physical quantity a pixel value reports, and what governs how trustworthy it is.
The answer is short enough to state up front: a pixel value is an integral. The light reaching the camera is not one ray but a whole bundle, spread over a range of directions (the aperture), a patch of the sensor (the pixel's area), a band of colors (wavelength), and a stretch of time (the exposure). The sensor adds all of it up. Everything else in this chapter elaborates that one sentence: first by naming the full field of light a camera draws from (the plenoptic function), then by writing the measurement itself as that integral and connecting its four axes to the camera's controls. The physical device that performs the integral (the sensor) is the subject of the next chapter. The inevitable jitter in the result, noise, traces straight back to the fact that the thing being integrated is a stream of discrete, randomly arriving photons; it follows after that.
It is worth pausing on the word measurement, because the digital sensor sits at the end of a long historical arc. Photography began as a way to make pictures, and for most of its life a photograph was a perceptual, interpretive object: a film stock had a "look," exposure and development were choices, and the print was a performance of the negative, not a readout of it (the score and the performance). But across that whole history the medium bent steadily toward objective measurement — each advance made the photograph a more faithful, more quantitative record of the light that was actually there, frozen at one instant. The digital sensor is the far end of that arc. Because each photosite literally counts photons, a raw image is, to a good approximation, a calibrated radiometric measurement — a physical map of scene radiance, in real units, of a single snapshot of time — rather than merely a pleasing rendering of one. That shift is what makes the rest of this book possible: you can only merge exposures into high dynamic range, recover depth, or undo a blur if the pixel numbers genuinely mean something physical. (The intro's generation → measurement → generation arc tells the same story from the other side — photography swung hard toward measurement, and only now, with generative AI, swings part of the way back.)
2.4.1 The plenoptic function
Before we can say what a pixel measures, it helps to name everything there is to measure. Stand in a room and ask: what is all the light here, before any camera, eye, or sensor selects from it? At every point in space, light is passing through in every direction, at every wavelength, at every instant. The function that captures all of it at once is the plenoptic function (from plenus, full, and optic — "full optic") introduced by Adelson and Bergen.
Parameterize a single ray of light by its 3D position $(x, y, z)$, its 2D direction $(\theta, \phi)$, its wavelength $\lambda$, and the time $t$, and the plenoptic function returns the radiance carried by that ray:

$$P(x, y, z, \theta, \phi, \lambda, t)$$
That is seven numbers in, one radiance out — a seven-dimensional description of every ray of light filling all of space, in every direction and color, at every moment. It is everything there is to see, from everywhere, at once (Figure 2.4.1).
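As a minimal sketch, a plenoptic function can be modeled in code as a callable that takes seven numbers and returns one radiance. The scene below is entirely invented for illustration, and the names `Ray` and `plenoptic` are hypothetical; only the seven-in, one-out shape reflects the text.

```python
import math
from typing import NamedTuple


class Ray(NamedTuple):
    """The seven plenoptic coordinates of one ray (hypothetical container)."""
    x: float
    y: float
    z: float
    theta: float  # polar direction angle, radians
    phi: float    # azimuthal direction angle, radians
    lam: float    # wavelength, nanometers
    t: float      # time, seconds


def plenoptic(ray: Ray) -> float:
    """Toy radiance field: an invented scene, not a physical model.

    Brightest along theta = 0, peaked at 550 nm, flickering slowly in time.
    """
    directional = max(0.0, math.cos(ray.theta))
    spectral = math.exp(-((ray.lam - 550.0) / 50.0) ** 2)
    temporal = 1.0 + 0.1 * math.sin(2.0 * math.pi * ray.t)
    return directional * spectral * temporal


# Seven numbers in, one radiance out.
on_axis = plenoptic(Ray(0.0, 0.0, 0.0, 0.0, 0.0, 550.0, 0.0))
backward = plenoptic(Ray(0.0, 0.0, 0.0, math.pi, 0.0, 550.0, 0.0))
print(on_axis, backward)
```

A real scene has no closed form like this, of course; the point is only that every imaging device discussed below can be read as some way of evaluating and combining values of such a function.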
Seven is the classical count, and it captures everything an ordinary camera responds to — but a ray of light carries one more property the standard plenoptic function omits: its polarization, the orientation in which the light's electromagnetic field oscillates. Add it as an eighth axis and $P$ gains a polarization argument. We are largely blind to it, so it is easy to forget; yet it is physically there on every ray, and it is information — the sky is partially polarized in a pattern that follows the sun, reflections off glass and water are strongly polarized (which is how a polarizing filter cuts glare), and the surface orientation of an object leaves a polarization signature. Many animals read it directly (bees navigate by the sky's polarization pattern; mantis shrimp even sense circular polarization — see Human (and animal) vision and color), and polarization cameras sample it on purpose for glare removal, reflection separation, and shape-from-polarization. It is one more dimension of the light that a measurement can choose to integrate over (throwing it away, as most sensors do) or resolve (as a polarization camera does) — the same capture-or-discard choice the rest of this chapter draws for direction, wavelength, and time.
The reason to introduce such an extravagant object is that it makes one idea precise: any imaging operation is the measurement of some portion of the plenoptic function. The eye samples a 2D slice; a video camera samples a 2D slice that also varies over time; a depth sensor recovers part of the geometry hidden in it; a light-field camera grabs a richer 4D chunk. Each imaging device is just a different way of sampling $P$.
In particular, a conventional camera records only a 2D projection. Each pixel collects all the rays that reach it through the aperture and sums them, discarding their direction: it records that light arrived, not the direction within the aperture from which it came. That discarded directional information is exactly what light-field and multi-aperture cameras (Advanced part) work to preserve, so they can refocus or compute depth after the shot. For an ordinary camera it is thrown away, and the throwing-away is an integral. The next section makes that literal.
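The loss of directional information can be illustrated with a tiny discrete example. The sketch below assumes a made-up light field with one direction axis `u` and one pixel axis `x` (real light fields have two of each); `conventional_image` is a hypothetical helper that sums over direction the way an ordinary pixel does.

```python
def conventional_image(light_field):
    """Sum over the direction axis: light_field[u][x] -> image[x]."""
    n_pixels = len(light_field[0])
    return [sum(row[x] for row in light_field) for x in range(n_pixels)]


# Two invented light fields: rows are aperture directions u, columns are pixels x.
field_a = [[1.0, 2.0, 3.0],
           [3.0, 2.0, 1.0]]
field_b = [[2.0, 2.0, 2.0],
           [2.0, 2.0, 2.0]]

image_a = conventional_image(field_a)
image_b = conventional_image(field_b)
print(image_a, image_b)  # both are [4.0, 4.0, 4.0]
```

The two light fields differ in how radiance is distributed across directions, yet they produce the same image. That ambiguity is the information a light-field camera keeps and a conventional camera integrates away.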
2.4.2 The pixel integral
The plenoptic function now pays off concretely. We said a conventional camera measures only a 2D projection of $P$; we can now say exactly what each pixel of that projection holds. A pixel does not sample a single ray. It integrates a slice of $P$ over four domains at once, which together span six of its seven dimensions, each weighted by how the pixel responds there (Figure 2.4.2):
- the aperture — all the ray directions $(\theta, \phi)$ the lens admits toward this pixel, a solid angle set by the f-number;
- the pixel's area $(x, y)$ on the sensor — a finite patch, not a mathematical point;
- wavelength $\lambda$ — weighted by the sensor's and color filter's spectral response; and
- the exposure time $t$ — the interval the shutter is open.
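Schematically, the value recorded at a pixel is

$$I = \int_{t} \int_{\lambda} \int_{A} \int_{\Omega} P(x, y, z, \theta, \phi, \lambda, t)\, k(x, y, \theta, \phi)\, r(\lambda)\, d\Omega\, dA\, d\lambda\, dt,$$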
the scene radiance integrated over aperture × pixel area × wavelength × time, weighted by the pixel's spectral response $r(\lambda)$ and the geometric weight $k$ that the aperture and footprint impose. Stated in one line: measurement is integrating the plenoptic function against the pixel's sampling kernel — and the support of that kernel along each axis is exactly a photographic knob. Because the underlying capture counts photons, this integral is linear in scene radiance — a fact we will need both for high-dynamic-range merging and for the noise analysis later in the chapter.
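The linearity claim can be checked numerically with a rough sketch of the pixel integral. The code below makes several simplifying assumptions that are not in the text: each domain is collapsed to a single axis (`u` for the aperture, `x` for the footprint), the geometric weight $k$ is taken as 1 inside the integration limits, and the scene and spectral response are invented. The function names are hypothetical.

```python
import math


def midpoints(lo, hi, n):
    """Midpoints of n equal cells spanning [lo, hi], plus the cell width."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)], step


def pixel_value(radiance, response, aperture, footprint, band, exposure, n=12):
    """Midpoint-rule estimate of the (simplified) pixel integral."""
    us, du = midpoints(*aperture, n)
    xs, dx = midpoints(*footprint, n)
    lams, dlam = midpoints(*band, n)
    ts, dt = midpoints(*exposure, n)
    total = 0.0
    for u in us:
        for x in xs:
            for lam in lams:
                for t in ts:
                    total += radiance(u, x, lam, t) * response(lam)
    return total * du * dx * dlam * dt


def scene(u, x, lam, t):
    """Invented static scene radiance with a gentle spatial gradient."""
    return 1.0 + 0.5 * x


def green_response(lam):
    """Hypothetical spectral response r(lambda), peaked at 550 nm."""
    return math.exp(-((lam - 550.0) / 40.0) ** 2)


settings = dict(aperture=(-0.1, 0.1), footprint=(0.0, 1.0),
                band=(400.0, 700.0), exposure=(0.0, 0.01))

base = pixel_value(scene, green_response, **settings)
brighter = pixel_value(lambda u, x, lam, t: 2.0 * scene(u, x, lam, t),
                       green_response, **settings)
longer = pixel_value(scene, green_response,
                     **{**settings, "exposure": (0.0, 0.02)})
print(brighter / base, longer / base)
```

Both ratios come out at 2 up to rounding: doubling the scene radiance doubles the pixel value, and for a static scene so does doubling the exposure interval. That proportionality is what exposure merging relies on.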
The four domains are the four knobs. What makes this more than bookkeeping is that each integration domain is governed by a control the photographer actually sets — the integral is the camera's exposure settings, written as mathematics:
- the aperture integral is set by the f-number $N = f/D$: stopping down shrinks the cone of directions each pixel sums over (less light, more depth of field); opening up widens it;
- the exposure-time integral is set by the shutter speed: a longer interval integrates over more time (more light, but more motion blur);
- the pixel-area integral is fixed by the sensor's pixel pitch, the physical size of a photosite, which is why larger pixels (for example, a larger sensor at the same pixel count) gather more light per pixel, trading against spatial resolution;
- the wavelength integral is shaped by the spectral response — the sensor's sensitivity curve times the color filter over each photosite — which collapses a whole spectrum into a single number per channel.
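The two knobs a photographer changes most often, aperture and shutter, can be turned into a small worked calculation. The sketch below uses the standard relation that light gathered per pixel scales as exposure time divided by $N^2$ (the aperture area goes as $D^2$, so the admitted cone shrinks as $1/N^2$). The function names are illustrative only, and the constant of proportionality is dropped.

```python
import math


def f_number(focal_length_mm, aperture_diameter_mm):
    """N = f / D, as defined in the text."""
    return focal_length_mm / aperture_diameter_mm


def relative_light(n, shutter_s):
    """Light gathered per pixel, up to a constant: exposure time / N^2."""
    return shutter_s / n ** 2


def stops_between(a, b):
    """Difference in stops (factors of two) going from setting a to setting b."""
    return math.log2(b / a)


n_wide = f_number(50.0, 25.0)     # a 50 mm lens with a 25 mm aperture: f/2
n_narrow = f_number(50.0, 12.5)   # the same lens stopped down: f/4

wide = relative_light(n_wide, 1 / 100)
narrow = relative_light(n_narrow, 1 / 100)
compensated = relative_light(n_narrow, 4 / 100)

print(stops_between(wide, narrow))       # stopping down f/2 -> f/4 costs two stops
print(stops_between(wide, compensated))  # a 4x longer shutter wins them back
```

The aperture and time integrals trade against each other exactly: the compensated exposure gathers the same light as the original, but with more depth of field and more motion blur.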
The one familiar exposure control missing from this list is ISO, and the omission is instructive: ISO is not an integration limit. It scales the result of the integral after the light is collected (analog or digital gain), rather than changing how much light is gathered — which is exactly why pushing ISO amplifies noise along with signal, as the noise section will show.
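A short calculation makes the ISO point concrete. The sketch below assumes a pixel limited only by photon shot noise, where a count with mean $n$ has standard deviation $\sqrt{n}$ (the Poisson property), and it ignores read noise and quantization, which a real sensor also has. The function name and the photon counts are hypothetical.

```python
import math


def shot_noise_snr(mean_photons, gain=1.0):
    """SNR of a gained photon count, assuming Poisson shot noise only."""
    signal = gain * mean_photons
    noise = gain * math.sqrt(mean_photons)  # gain scales the noise too
    return signal / noise


low_iso = shot_noise_snr(mean_photons=400.0, gain=1.0)
high_iso = shot_noise_snr(mean_photons=400.0, gain=8.0)
more_light = shot_noise_snr(mean_photons=1600.0, gain=1.0)
print(low_iso, high_iso, more_light)
```

Under these assumptions, raising the gain leaves the signal-to-noise ratio unchanged at 20, while collecting four times as many photons doubles it to 40. Gain makes the image brighter; only the integration limits make it cleaner.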
The deeper reason to write capture this way is that each of the four integration domains earns its own topic later, and much of photography is the craft of choosing how to shape these four integrals — every later sensing topic is just what happens when you change the support of one axis:
- integrating over the aperture is what produces depth of field (the lens chapter): a wide aperture integrates over a fat cone of directions, so an out-of-focus point spreads its energy across many pixels — the circle of confusion. The light-versus-blur trade lives entirely on this one axis;
- integrating over the pixel's area is sampling and antialiasing (Basic Image Processing): a finite footprint makes each sample a box-filtered average rather than a point sample, pre-blurring the image before it is laid on the grid — a wider footprint suppresses aliasing at the cost of resolution;
- integrating over time is motion blur (Motion/Video): a long exposure sums a moving scene over the interval — the temporal sibling of the aperture, trading light against sharpness;
- integrating over wavelength is color sensing: the spectral response collapses a whole spectrum into one number per channel, exactly as a cone does (Human (and animal) vision and color). Because a color-filter array gives each photosite a different $r(\lambda)$, color is spatially multiplexed across the Bayer mosaic and must be reconstructed by demosaicking.
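The antialiasing effect of the pixel footprint can be sketched in one dimension. Averaging a sinusoid of spatial frequency $f$ over a box of width $w$ scales its contrast by $\sin(\pi f w) / (\pi f w)$; the code below checks that closed form against a direct numerical average. The frequency and footprint widths are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math


def box_average(signal, center, width, n=1000):
    """Average of signal over a pixel footprint, by the midpoint rule."""
    step = width / n
    start = center - width / 2.0
    return sum(signal(start + (i + 0.5) * step) for i in range(n)) / n


def sinc_attenuation(frequency, width):
    """Closed-form contrast of a sinusoid after box averaging."""
    arg = math.pi * frequency * width
    return 1.0 if arg == 0.0 else math.sin(arg) / arg


FREQUENCY = 0.8  # cycles per pixel pitch, above the 0.5 Nyquist limit


def wave(x):
    """A fine pattern the pixel grid cannot represent without aliasing."""
    return math.cos(2.0 * math.pi * FREQUENCY * x)


point_sample = wave(0.0)                     # ideal point sample: full contrast
small_pixel = box_average(wave, 0.0, 0.25)   # footprint is a quarter of the pitch
full_pixel = box_average(wave, 0.0, 1.0)     # footprint fills the whole pitch
print(point_sample, small_pixel, full_pixel)
```

The wider the footprint, the more the too-fine pattern is attenuated before sampling, so less of it survives to alias. A footprint filling the pitch reduces this pattern's contrast to under a quarter, where a point sample would pass it at full strength.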
The payoff of the kernel framing is that the rest of the sensing story reduces to a single question — which kernel, how wide, and what does it lose: antialiasing widens the spatial kernel, depth of field is the aperture kernel, motion blur is the time kernel, and demosaicking inverts the wavelength multiplexing. Each is also a source of noise: a narrower kernel along any axis — a smaller footprint, a shorter exposure, a stopped-down aperture — integrates fewer photons, and fewer photons mean a noisier estimate, which is exactly where the next chapter picks up.
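Off-axis ($\cos^4$) falloff. One prediction falls straight out of this irradiance integral. Rays reaching the corners of the frame arrive obliquely, so the same scene radiance delivers less irradiance there: for a pixel whose chief ray makes an angle $\theta$ with the optical axis, the aperture-and-area geometry of the pixel integral works out to a $\cos^4\theta$ dimming toward the edges, called natural vignetting (Figure 2.4.3). Two factors of $\cos\theta$ come from the inverse-square law over the longer oblique path, one from the aperture appearing foreshortened when seen off-axis, and one from the light striking the pixel at a slant. It is distinct from optical vignetting (lens elements physically blocking edge rays, covered in Optics) in that natural vignetting is a clean consequence of the pixel-integral geometry alone.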
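To get a feel for the size of the effect, the falloff can be evaluated for a given position on the sensor. The sketch below assumes an idealized thin-lens (pinhole-like) geometry focused at infinity, so that the off-axis angle satisfies $\tan\theta = h / f$ for image height $h$ and focal length $f$; real lenses often deviate from this. The function name and the example numbers are illustrative.

```python
import math


def natural_vignetting(image_height_mm, focal_length_mm):
    """cos^4 falloff at a given distance from the image center.

    Assumes tan(theta) = image_height / focal_length (idealized geometry).
    """
    theta = math.atan2(image_height_mm, focal_length_mm)
    return math.cos(theta) ** 4


center = natural_vignetting(0.0, 24.0)
edge = natural_vignetting(24.0, 24.0)   # image height equal to focal length: 45 degrees
stops_lost = -math.log2(edge)
print(center, edge, stops_lost)
```

At 45 degrees off axis the irradiance drops to a quarter of its central value, a loss of two stops. Under the same idealized geometry a longer focal length reduces the angle at a given image height, and with it the falloff.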
This completes the picture-forming side of measurement: a camera selects a slice of all the light in the world — the plenoptic function — and integrates it over aperture, area, wavelength, and time. The device that performs that integral physically — a grid of photosites counting photons, read out as numbers, and the shutter that times them — is the subject of the next chapter; the unavoidable jitter in the photon count, the noise it imposes, follows after that.