2.6 Noise, signal-to-noise ratio and dynamic range
2.6.1 Noise sources
Counting photons is what makes a sensor linear, but counting is also where noise comes from. Take two photos of the same still scene at the same settings and they are not byte-for-byte identical; each pixel value jitters from frame to frame. Several physical sources contribute, and because they are statistically independent, their variances ($\sigma^2$) add: to combine them we sum variances, not standard deviations (Figure 2.6.1).
Photon (shot) noise is the deepest source, and it is not a defect of the sensor: it is the light itself. Light arrives as discrete photons at random times, so the count in a fixed interval fluctuates even from a perfectly steady source. The statistics are Poisson, and the defining property of a Poisson process is that its variance equals its mean. So if a pixel collects $N$ photons on average, the variance of the count is $N$ and the standard deviation is

$$\sigma = \sqrt{N}.$$
Within a single exposure this is irreducible: no better sensor can remove it, because it is a property of light, not of electronics. It dominates the midtones and highlights, where $N$ is large. The one escape is averaging: average $N_f$ independent frames of the same scene and the shot noise drops by a factor of $\sqrt{N_f}$, which is the statistical basis of burst denoising and astrophotography stacking.
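The $\sqrt{N_f}$ claim is easy to check numerically. The following is a minimal sketch, not a camera pipeline: it simulates one pixel as a Poisson count and compares the frame-to-frame jitter of single frames with that of 16-frame averages. The mean of 20 photons, the trial count and the helper names `poisson` and `noise_of_average` are illustrative assumptions.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson sample with mean lam (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def noise_of_average(mean_photons, n_frames, trials, rng):
    """Std. dev. of a pixel value formed by averaging n_frames Poisson frames."""
    averages = [
        sum(poisson(mean_photons, rng) for _ in range(n_frames)) / n_frames
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)

rng = random.Random(0)                          # fixed seed for repeatability
single = noise_of_average(20, 1, 1000, rng)     # assumed: 20 photons per frame
stacked = noise_of_average(20, 16, 1000, rng)   # average of 16 frames

print(f"single-frame sigma : {single:.2f}  (theory {math.sqrt(20):.2f})")
print(f"16-frame avg sigma : {stacked:.2f}  (theory {math.sqrt(20) / 4:.2f})")
print(f"reduction factor   : {single / stacked:.2f}  (theory 4)")
```

The measured reduction should land near 4, the square root of 16, up to sampling error.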
The remaining sources are added by the hardware:
- Read noise is added by the amplifier and the analog-to-digital converter (ADC) at readout. It is signal-independent — a roughly fixed amount regardless of brightness — so it dominates the shadows (where the signal is tiny) and sets the sensor's noise floor.
- Thermal (dark-current) noise comes from electrons freed by heat rather than light. It is independent of the scene but grows with exposure time and temperature, which is why long exposures and astrophotography call for sensor cooling and dark-frame subtraction (photographing the dark current with the shutter closed and subtracting it).
- Fixed-pattern noise is per-pixel non-uniformity: slightly different gain or offset at each photosite, plus hot or stuck pixels and banding. Because it is structured rather than random — the same pattern in every frame — it reads as worse than random noise of the same magnitude, and it is removed by calibration (flat and dark fields).
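Because the random sources are independent, they combine in quadrature. The sketch below adds the variances of shot, dark-current and read noise for one pixel and compares the result with the (wrong) sum of standard deviations. All numbers are made-up illustrative values in electrons, and `total_noise` is a hypothetical helper; fixed-pattern noise is left out because it is a calibration problem rather than a random term.

```python
import math

def total_noise(signal_e, read_noise_e, dark_e=0.0):
    """Total noise (electrons, std. dev.) from independent sources."""
    shot_var = signal_e            # Poisson: variance equals the mean count
    dark_var = dark_e              # dark electrons are also a Poisson count
    read_var = read_noise_e ** 2   # read noise is quoted as a std. dev.
    return math.sqrt(shot_var + dark_var + read_var)

# Assumed example values: 3 e- read noise, 5 dark electrons per exposure.
for label, signal in [("shadow", 20), ("highlight", 10_000)]:
    sigma = total_noise(signal, read_noise_e=3.0, dark_e=5.0)
    naive = math.sqrt(signal) + 3.0 + math.sqrt(5.0)   # adding std. devs: wrong
    print(f"{label:9s} signal={signal:6d} e-  sigma={sigma:7.2f}  naive sum={naive:7.2f}")
```

In the shadow the read and dark terms are a large share of the total; in the highlight they are negligible next to the shot noise, which is the division of labour described in the list above.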
We can see that noise directly. Shoot a burst of one static scene at high ISO and take, for every pixel, its variance across the frames: the result is a map of the noise itself, laid over the picture (Figure 2.6.2). The noise is not uniform — it is heavier in the brighter regions (shot noise) and along edges — which is the clue to the rule that follows.
Because the variances of these sources add, and because the dominant two scale so simply with the signal, the whole noise level follows a strikingly simple rule that we can measure directly from data.
The variances add, and two terms dominate: shot noise contributes an amount proportional to the signal (Poisson, variance = mean, scaled by the gain) and read noise a constant floor. So the total is variance ≈ gain·signal + read², a straight line in brightness (Figure 2.6.3). The slope is the photon gain (it steepens with ISO, which amplifies the signal and its shot noise together); the intercept is the read-noise floor, that is, the read-noise variance read². We can measure it without any model: take an aligned burst of a static scene and, per pixel, plot the variance across frames against the mean. And a single pixel's value over many frames is approximately Gaussian (Figure 2.6.4), which is what licenses modelling per-pixel noise as additive Gaussian.
Carl Friedrich Gauss (1777–1855) — the "prince of mathematicians" — is the most over-cited name in this book, and you have just met him in the bell curve. The Gaussian (normal) distribution that a noisy pixel traces over many frames is his, and so is least squares, which he used in 1801 to recover the orbit of the dwarf planet Ceres from a handful of sightings and which underlies nearly every fit and reconstruction we do. The same name returns as the Gaussian blur, Gaussian elimination inside our linear solvers, and the Gauss–Seidel iteration. That a sum of many small independent effects collapses into one bell shape — the central limit theorem — is exactly why the additive-Gaussian noise model keeps working. Portrait: painting by Christian Albrecht Jensen, 1840, public domain (via Wikimedia Commons).
The affine model and the Gaussian shape are the ideal. Real recorded values break the ideal in one important way at the extremes.
A recorded value is clamped to $[0,\text{max}]$. Near black the negative half of the noise is cut off and frames pile up at 0; near white the positive half is. So at the extremes the noise distribution is asymmetric, with its mean pushed inward (Figure 2.6.5). The consequence for denoising: a naive average or smoothing returns that biased mean, so it makes shadows come out too bright and highlights too dark — a bias every denoiser must correct for (we return to it in Denoising, BASIC).
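The inward push of the mean can be reproduced in a few lines. The sketch below is an illustration under assumed values, not a denoiser: it draws Gaussian noise with a standard deviation of 5 around a true value, clamps each sample to $[0, 255]$, and averages. The function name `clamped_mean` and all numbers are illustrative.

```python
import random
import statistics

def clamped_mean(true_value, sigma, n_frames, rng, lo=0.0, hi=255.0):
    """Average of noisy samples after each one is clamped to [lo, hi]."""
    frames = [
        min(hi, max(lo, rng.gauss(true_value, sigma)))
        for _ in range(n_frames)
    ]
    return statistics.fmean(frames)

rng = random.Random(2)
for true_value in (2.0, 128.0, 253.0):   # near black, midtone, near white
    estimate = clamped_mean(true_value, sigma=5.0, n_frames=20_000, rng=rng)
    print(f"true {true_value:6.1f} -> mean of clamped frames {estimate:7.2f}")
```

The midtone average comes back unbiased, while the near-black average lands above the true value and the near-white average below it, which is the bias a plain average or smoothing filter inherits.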
2.6.2 Signal-to-noise ratio: it's about ratios
Here is the fact that surprises people and ties the whole part together. Since shot noise is $\sigma = \sqrt{N}$, the absolute noise is actually larger in the bright parts of the image than in the dark parts: a highlight collecting 10,000 photons has $\sigma = 100$, while a shadow collecting 100 photons has only $\sigma = 10$. And yet noise looks worst in the shadows. The resolution is that perception cares about the signal-to-noise ratio (SNR), not the absolute noise. For shot noise,

$$\text{SNR} = \frac{N}{\sigma} = \frac{N}{\sqrt{N}} = \sqrt{N}.$$
The SNR grows with brightness, so it is worst where the light is faintest — the shadows — even though the raw noise magnitude is smallest there (Figure 2.6.6). Ratios are all that matters. This is the same lesson that drives gamma/log encoding (we encode so that equal ratios get equal steps) and exactly why exposing to the right works: collecting more photons everywhere, especially in the shadows, raises $N$ and therefore raises the SNR where it is worst.
Shot noise is Poisson, so its variance grows linearly with the signal ($\sigma^2 \propto N$) and its standard deviation grows only as the square root ($\sigma \propto \sqrt{N}$). Two consequences pull in opposite directions: the absolute noise is larger in highlights than in shadows (a 10,000-photon highlight has $\sigma=100$; a 100-photon shadow has $\sigma=10$), yet the SNR $= N/\sqrt{N} = \sqrt{N}$ is higher in the highlights, because the signal outgrows the noise. So bright areas are noisier in absolute terms but cleaner to the eye; the shadows have the worst SNR (Figure 2.6.2, Figure 2.6.6). This is why exposing to the right helps, why brighter scenes look smoother, and why we gamma/log-encode so equal ratios get equal steps.
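The two numbers in the example, and the effect of a read-noise floor on top of them, can be tabulated directly. This is a small sketch that treats counts as electrons and assumes, for illustration only, a read noise of 5 electrons; `snr` is a hypothetical helper combining shot and read noise in quadrature.

```python
import math

def snr(photons, read_noise=0.0):
    """Signal-to-noise ratio with shot noise plus an optional read-noise floor."""
    return photons / math.sqrt(photons + read_noise ** 2)

print(" photons  sigma_shot  SNR (shot only)  SNR (read noise 5)")
for n in (100, 400, 10_000):
    print(f"{n:8d}  {math.sqrt(n):10.1f}  {snr(n):15.1f}  {snr(n, 5.0):18.1f}")
```

Quadrupling the photon count (two stops more exposure) doubles the shot-limited SNR, and the read-noise floor costs the 100-photon shadow proportionally far more than the 10,000-photon highlight.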
2.6.3 Dynamic range
Two extremes bound what a single exposure can hold. At the top, a pixel's well saturates: once it has counted its full-well capacity of electrons, more light adds nothing and the highlight clips to white. At the bottom, the read-noise floor sets the dimmest signal still distinguishable from noise. The ratio of the two is the sensor's dynamic range — how much of a high-contrast scene a single exposure can capture. When a scene's range exceeds the sensor's — a bright window and a dim interior in one frame — no single exposure holds both ends, which is the motivation for high-dynamic-range (HDR) imaging, merging several exposures (Multiple-exposure part) (Figure 2.6.7). Like everything else a photographer counts, dynamic range is quoted in stops — factors of two — which connects it straight to the exposure controls.
How wide a range, exactly? It helps to put numbers on it, because the dynamic ranges of real media and of the eye differ by a lot (Figure 2.6.8). A color slide is narrow, only about 5–6 stops; a reflective print about 6–7; a film negative is famously forgiving at roughly 12–13 stops of latitude; a phone sensor manages about 10–12 and a full-frame sensor about 14. The human eye sees about 10–14 stops instantaneously, but allow it to adapt — the slow re-centering of the retinal response covered with perception — and it spans 20+ stops over time; some animals do better still within their niche. Against all of these, an outdoor sun-and-shadow scene can exceed 20 stops in a single frame — which is precisely why no single capture holds it, and why HDR exists.
See it: recover an underexposed shot, full-frame vs phone. The clearest way to feel that dynamic range scales with photosite size is to push an exposure until it breaks. Shoot the same underexposed scene as raw on a full-frame camera and on a phone, then push the exposure up in software by three or four stops (Figure 2.6.9). The full-frame image cleans up, because its deep full well and low read-noise floor give it the headroom to be lifted, while the phone image breaks into shadow noise and banding, because lifting the exposure also amplifies its read-noise floor, which sat much closer to the signal to begin with. That is a direct, visible demonstration that dynamic range scales with photosite (and sensor) size, and exactly why phones lean on burst capture and HDR+ (Multiple-exposure part) rather than trusting a single exposure. It is also a coding exercise; see Exercises & Experiments.
A single exposure records from a top — the photosite's full-well capacity, where it saturates and clips to white — down to a bottom — the noise floor, the read noise that drowns the shadows. Their ratio is the dynamic range, quoted in stops. You widen it with a bigger well (larger photosites — why a full-frame sensor out-ranges a phone) or a lower floor (cooling, lower read noise), and you beat it altogether by merging multiple exposures (HDR). This is the capture-side companion to the quantization lesson below: what bounds a single shot is this range — full-well over floor — not the number of bits.
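Full well over floor, expressed in stops, is a one-line calculation. The sketch below uses made-up full-well and read-noise figures for a hypothetical large-photosite sensor and a hypothetical phone sensor; they are chosen only to land near the ranges quoted above and are not measurements of any real device.

```python
import math

def dynamic_range_stops(full_well_e, read_noise_e):
    """Dynamic range in stops: log2 of full-well capacity over read-noise floor."""
    return math.log2(full_well_e / read_noise_e)

# Hypothetical sensors (electrons); illustrative values only.
sensors = {
    "large photosite": (50_000, 3.0),
    "phone photosite": (6_000, 2.0),
}
for name, (full_well, read_noise) in sensors.items():
    stops = dynamic_range_stops(full_well, read_noise)
    print(f"{name:16s} full well {full_well:6d} e-, floor {read_noise} e- -> {stops:.1f} stops")

# Doubling the well, or halving the floor, buys exactly one stop.
print(dynamic_range_stops(100_000, 3.0) - dynamic_range_stops(50_000, 3.0))
```

The last line shows why both levers matter equally: each factor of two in the ratio, from either end, is worth one stop.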
This is also the right moment to deflate a worry beginners often have about digital capture: that the image is quantized into a finite number of levels, and that this stair-stepping is what limits quality. It almost never is.
With enough bits and a sane encoding, what limits image quality is noise and dynamic range — the two quantities of this section — not the number of levels the analog-to-digital converter (ADC) offers. The noise floor and the full well bound what you can record; the quantization step, once it sits below the noise, is invisible. (Indeed, a little noise dithers the quantization, hiding it further.) The one place levels do bite is banding in the deep shadows of a linearly-encoded image — too few codes where the eye, working in ratios, is most sensitive — and that is exactly the problem gamma encoding exists to solve, by spending code levels perceptually (Color technology). So "rarely" is not "never." We meet this lesson again in BASIC → Image representation → Float vs 8-bit, where floats with headroom are the sane default.
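The dithering remark can be checked with a toy experiment. In this sketch a true level of 10.3 (in units of one quantization step, an arbitrary illustrative value) is rounded to an integer code, once with no noise and once with Gaussian noise of one step added before rounding; `mean_after_quantization` is a hypothetical helper.

```python
import random
import statistics

def mean_after_quantization(true_value, noise_sigma, n_frames, rng):
    """Average of integer codes obtained by rounding noisy samples."""
    codes = [
        round(true_value + rng.gauss(0.0, noise_sigma))
        for _ in range(n_frames)
    ]
    return statistics.fmean(codes)

TRUE_LEVEL = 10.3   # assumed value, sitting between two integer codes

clean = mean_after_quantization(TRUE_LEVEL, 0.0, 20_000, random.Random(4))
dithered = mean_after_quantization(TRUE_LEVEL, 1.0, 20_000, random.Random(4))

print(f"no noise      : mean code = {clean:.3f}  (stuck on the nearest level)")
print(f"noise = 1 step: mean code = {dithered:.3f}  (recovers about {TRUE_LEVEL})")
```

Without noise every frame returns the same code and the fractional part is lost; with noise at or above the step size the codes straddle the true level and their average recovers it, so the step size stops being the limiting factor.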
These three facts — that noise is the jitter of a photon count, that the signal-to-noise ratio is worst in the shadows, and that dynamic range is the full-well-over-floor a single exposure can hold — recur on nearly every later page: in how a photographer exposes to the right, in high-dynamic-range merging (Multiple-exposure part), and in the denoising of the basic pipeline. With measurement and its limits now in hand, the part turns next to what a second viewpoint buys you — depth — and then to the eye that will finally judge the result.