
2.6 Noise, signal-to-noise ratio and dynamic range

2.6.1 Noise sources

Counting photons is what makes a sensor linear, but counting is also where noise comes from. Take two photos of the same still scene at the same settings and they are not byte-for-byte identical; each pixel value jitters from frame to frame. Several physical sources contribute. Because they are statistically independent, they combine by adding variances ($\sigma^2$), not standard deviations, so the total is $\sigma_{\text{total}} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots}$ (Figure 2.6.1).

Photon (shot) noise is the deepest source, and it is not a defect of the sensor — it is the light itself. Light arrives as discrete photons at random times, so the count in a fixed interval fluctuates even from a perfectly steady source. The statistics are Poisson, and the defining property of a Poisson process is that its variance equals its mean. So if a pixel collects $N$ photons on average, the variance of the count is $N$ and the standard deviation is

$$ \sigma = \sqrt{N}. $$

This is irreducible: no better sensor can remove it, because it is a property of light, not of electronics. It dominates the midtones and highlights, where $N$ is large. The one escape is averaging: average $N_f$ independent frames of the same scene and the shot noise drops by a factor of $\sqrt{N_f}$. That is the statistical basis of burst denoising and astrophotography stacking.
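Both claims are easy to check numerically. The sketch below assumes NumPy, and its photon counts and frame numbers are arbitrary illustrative choices: it draws Poisson counts to confirm that the variance tracks the mean, then averages independent frames of a flat field to show the standard deviation falling by about $\sqrt{N_f}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def shot_noise_stats(mean_photons, n_samples=200_000):
    """Sample mean and variance of Poisson photon counts (the two should match)."""
    counts = rng.poisson(mean_photons, size=n_samples)
    return counts.mean(), counts.var()

def averaged_std(mean_photons, n_frames, n_pixels=50_000):
    """Noise of the per-pixel average of n_frames independent Poisson frames.

    Every pixel of this flat field is an independent realization, so the
    spread across pixels estimates the per-pixel noise.
    """
    frames = rng.poisson(mean_photons, size=(n_frames, n_pixels))
    return frames.mean(axis=0).std()

for n in (100, 10_000):
    m, v = shot_noise_stats(n)
    print(f"N = {n:>6}: mean {m:9.1f}, variance {v:9.1f}, "
          f"std {np.sqrt(v):6.2f} vs sqrt(N) {np.sqrt(n):6.2f}")

ratio = averaged_std(100, 1) / averaged_std(100, 16)
print(f"std of 1 frame / std of 16-frame average = {ratio:.2f} (sqrt(16) = 4)")
```

With 16 frames the ratio should come out close to $\sqrt{16} = 4$; quadrupling the frame count again would only halve the noise once more.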

The remaining sources are added by the hardware:

Read noise is added by the amplifier and the analog-to-digital converter (ADC) at readout. It is signal-independent — a roughly fixed amount regardless of brightness — so it dominates the shadows (where the signal is tiny) and sets the sensor's noise floor.
Thermal (dark-current) noise comes from electrons freed by heat rather than light. It is independent of the scene but grows with exposure time and temperature, which is why long exposures and astrophotography call for sensor cooling and dark-frame subtraction (photographing the dark current with the shutter closed and subtracting it). Subtraction removes the dark current's average, pixel-by-pixel offset; the random fluctuation around that average remains.
Fixed-pattern noise is per-pixel non-uniformity: slightly different gain or offset at each photosite, plus hot or stuck pixels and banding. Because it is structured rather than random — the same pattern in every frame — it reads as worse than random noise of the same magnitude, and it is removed by calibration (flat and dark fields).
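Adding variances rather than standard deviations is simple arithmetic but easy to get wrong. Here is a minimal sketch with hypothetical per-pixel figures in electrons that combines shot, dark-current and read noise for one pixel. It assumes fixed-pattern noise has already been removed by calibration, and it treats dark-current noise as Poisson in the accumulated dark electrons.

```python
import math

def total_noise(signal_e, read_e, dark_e_per_s, exposure_s):
    """Total random noise of one pixel, in electrons.

    Independent sources add in variance:
      shot noise   -> variance = signal electrons (Poisson)
      dark current -> variance = accumulated dark electrons (also Poisson)
      read noise   -> a fixed standard deviation, so variance = read_e ** 2
    Fixed-pattern noise is assumed already removed by calibration.
    """
    shot_var = signal_e
    dark_var = dark_e_per_s * exposure_s
    read_var = read_e ** 2
    return math.sqrt(shot_var + dark_var + read_var)

# Hypothetical pixel: 3 e- read noise, 0.1 e-/s dark current, 1 s exposure.
for signal in (25, 10_000):                      # a deep shadow and a highlight
    sigma = total_noise(signal, read_e=3.0, dark_e_per_s=0.1, exposure_s=1.0)
    print(f"signal {signal:>6} e-: total noise {sigma:7.2f} e-")
```

For the shadow pixel, summing the three standard deviations instead would give about 5 + 3 + 0.3 ≈ 8.3 e-, a clear overestimate of the roughly 5.8 e- the variances imply. In the shadow the read noise is a visible share of the total; in the highlight it is swamped by shot noise.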


Figure 2.6.1. Noise sources and a noisy capture. A high-ISO photo (here ISO 3200) with a flat patch enlarged, beside its histogram, which shows a spread, not a spike. The spread comes from independent sources whose variances add: photon/shot noise (Poisson, $\sigma = \sqrt{N}$, dominates midtones and highlights), read noise (signal-independent, dominates shadows, sets the floor), thermal/dark current (grows with time and heat), and structured fixed-pattern noise (removed by calibration).

We can see that noise directly. Shoot a burst of one static scene at high ISO and take, for every pixel, its variance across the frames: the result is a map of the noise itself, laid over the picture (Figure 2.6.2). The noise is not uniform. It is heavier in the brighter regions (shot noise), and along edges, where any slight misalignment between frames masquerades as noise. That dependence on brightness is the clue to the rule that follows.

**Figure 2.6.2.** The noise of a real high-ISO image, and its SNR. From a 50-frame ISO-3200 burst of one static scene: the scene itself (left, mean of the burst), the **per-pixel variance** across the burst (middle), and the **per-pixel SNR** = signal/noise (right). Computed on **linearised** values, so shot noise behaves physically. The variance map is **brightest where the scene is bright**, since absolute noise grows with signal, while the SNR map is **also brightest in the bright areas and worst in the shadows**, because noise grows only as √signal, so the ratio improves with light. Bright pixels are noisier in absolute terms yet cleaner in relative terms. This is the raw material the affine model below summarises.
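The three maps reduce to one line of NumPy each. A sketch, assuming the burst is already aligned and stored as a linear-valued array of shape `(n_frames, height, width)`; since no real burst is at hand, a synthetic ramp with Poisson shot noise plus Gaussian read noise (hypothetical values, in electrons) stands in for one.

```python
import numpy as np

def burst_statistics(burst):
    """Per-pixel mean, variance and SNR of an aligned burst.

    burst: array of shape (n_frames, height, width) holding linear values.
    """
    mean = burst.mean(axis=0)
    var = burst.var(axis=0, ddof=1)          # unbiased variance across frames
    snr = mean / np.sqrt(var)                # signal over noise, per pixel
    return mean, var, snr

# Synthetic stand-in for a real burst (hypothetical values, in electrons):
# a dark-to-bright ramp, Poisson shot noise, plus Gaussian read noise.
rng = np.random.default_rng(1)
scene = np.tile(np.linspace(20.0, 5000.0, 64), (32, 1))     # 32 x 64 ramp
n_frames, read_noise = 50, 3.0
burst = (rng.poisson(scene, size=(n_frames, *scene.shape))
         + rng.normal(0.0, read_noise, size=(n_frames, *scene.shape)))

mean, var, snr = burst_statistics(burst)
print("mean variance, darkest vs brightest column:", var[:, 0].mean(), var[:, -1].mean())
print("mean SNR,      darkest vs brightest column:", snr[:, 0].mean(), snr[:, -1].mean())
```

The synthetic ramp behaves as the figure describes: its bright end has far more variance and also far higher SNR.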

Because the variances of these sources add, and because the dominant two scale so simply with the signal, the whole noise level follows a strikingly simple rule that we can measure directly from data.

💡 Big lesson — in a linear image, noise variance is an affine function of brightness

The variances add, and two terms dominate: shot noise contributes an amount proportional to the signal (Poisson, variance = mean, scaled by the gain) and read noise a constant floor. So the total is $\sigma^2 \approx a\,I + b$, a straight line in brightness $I$ whose slope $a$ is the photon gain and whose intercept $b$ is the squared read noise (Figure 2.6.3). The slope steepens with ISO, which amplifies the signal and its shot noise together; the intercept is the read-noise floor. We can measure the line directly from data, without assuming any model: take an aligned burst of a static scene and, per pixel, plot the variance across frames against the mean. And a single pixel's value over many frames is approximately Gaussian (Figure 2.6.4), which is what licenses modelling per-pixel noise as additive Gaussian.

**Figure 2.6.3.** Noise variance is affine in brightness, as measured. From a 50-frame burst of one static scene at ISO 3200, the per-pixel variance plotted against the per-pixel mean (in the linearised domain, on flat patches so misalignment doesn't masquerade as noise) traces a straight line: a constant read-noise floor (intercept) plus a shot-noise term proportional to the signal (slope = gain), $\sigma^2 \approx a\,I + b$. Toward the bright end the camera's highlight roll-off and clipping pull the measured variance below the line; this is the truncation taken up next.

**Figure 2.6.4.** A single pixel, over many frames, is ≈ Gaussian. Histograms of a few flat pixels' values across the 50 frames, at increasing brightness, each with a fitted Gaussian: the spread is well described by a normal distribution (the central-limit sum of many small effects). This is the empirical basis for the additive-Gaussian noise model used throughout denoising.
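Fitting the line $\sigma^2 \approx a\,I + b$ takes a few lines. This is a minimal sketch, assuming NumPy and synthetic flat patches from a hypothetical camera (gain 0.5 digital numbers per electron, read noise 2 DN), so the fit should return $a \approx 0.5$ and $b \approx 4$. One design choice matters: the scatter of a sample variance grows with the variance itself, so an unweighted least-squares fit lets the bright pixels swamp the intercept. The sketch therefore refits a few times with weights $1/(a\,I + b)$ taken from the previous fit, a simple iteratively reweighted scheme chosen here for illustration, not a prescribed calibration procedure.

```python
import numpy as np

def fit_affine_noise(mean, var, n_iter=3):
    """Fit var ~ a*mean + b to per-pixel burst statistics; returns (a, b).

    Starts from ordinary least squares, then refits with weights
    1/(predicted variance), since the scatter of a sample variance grows with
    the variance itself. np.polyfit's `w` multiplies the unsquared residual,
    so 1/sigma is the appropriate weight.
    """
    x, y = mean.ravel(), var.ravel()
    floor = np.percentile(y, 1)              # keeps weights finite if b starts negative
    a, b = np.polyfit(x, y, deg=1)
    for _ in range(n_iter):
        predicted = np.maximum(a * x + b, floor)
        a, b = np.polyfit(x, y, deg=1, w=1.0 / predicted)
    return a, b

# Synthetic flat patches from a hypothetical camera. In DN the variance is
# gain**2 * electrons + read_dn**2 = gain * mean_DN + read_dn**2,
# so the fit should give a ~ gain and b ~ read_dn**2.
rng = np.random.default_rng(2)
gain, read_dn, n_frames = 0.5, 2.0, 50
electrons = rng.uniform(0.0, 2000.0, size=4000)          # one mean level per patch
frames = (gain * rng.poisson(electrons, size=(n_frames, electrons.size))
          + rng.normal(0.0, read_dn, size=(n_frames, electrons.size)))
a, b = fit_affine_noise(frames.mean(axis=0), frames.var(axis=0, ddof=1))
print(f"slope a = {a:.3f} (true gain {gain}), intercept b = {b:.2f} (true {read_dn ** 2})")
```

On real data the same fit would be run on flat patches only, and points near clipping would be excluded, since they fall below the line.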

Sidebar — who was Gauss?

Carl Friedrich Gauss (1777–1855), the "prince of mathematicians", is the most over-cited name in this book, and you have just met him in the bell curve. The Gaussian (normal) distribution that a noisy pixel traces over many frames is his, and so is least squares, which he used in 1801 to recover the orbit of the dwarf planet Ceres from a handful of sightings and which underlies nearly every fit and reconstruction we do. The same name returns as the Gaussian blur, Gaussian elimination inside our linear solvers, and the Gauss–Seidel iteration. That a sum of many small independent effects collapses into one bell shape (the central limit theorem) is exactly why the additive-Gaussian noise model keeps working. Portrait: painting by Christian Albrecht Jensen, 1840, public domain (via Wikimedia Commons).

The affine model and the Gaussian shape are the ideal. Real recorded values break the ideal in one important way at the extremes.

💡 Big lesson — noise is clipped at black and white, so near the extremes it is not zero-mean

A recorded value is clamped to $[0,\text{max}]$. Near black the negative half of the noise is cut off and frames pile up at 0; near white the positive half is. So at the extremes the noise distribution is asymmetric, with its mean pushed inward (Figure 2.6.5). The consequence for denoising: a naive average or smoothing returns that biased mean, so it makes shadows come out too bright and highlights too dark — a bias every denoiser must correct for (we return to it in Denoising, BASIC).

**Figure 2.6.5.** Clipping makes noise asymmetric at the extremes. A near-black pixel spends about 44% of its frames pinned at 0 (the negative half of its noise is clamped away), so the average of its frames, which is what naive denoising returns, sits well above 0: biased bright. A clean midtone pixel, by contrast, is a symmetric Gaussian whose average is unbiased. Averaging therefore lifts shadows too bright and, symmetrically, pulls clipped highlights too dark.
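The bias is easy to reproduce. The sketch below assumes NumPy, an 8-bit scale clamped to $[0, 255]$, and Gaussian noise of a fixed, hypothetical $\sigma = 4$ at every level (a simplification of the affine model). It averages many clamped frames of a near-black, a midtone and a near-white pixel.

```python
import numpy as np

def clipped_average(true_value, sigma, max_value, n_frames=100_000, seed=0):
    """Average of noisy frames after clamping each one to [0, max_value].

    Returns (average, fraction of frames pinned at 0).
    """
    rng = np.random.default_rng(seed)
    frames = np.clip(true_value + rng.normal(0.0, sigma, n_frames), 0.0, max_value)
    return frames.mean(), np.mean(frames == 0.0)

for true_value in (1.0, 128.0, 253.0):   # near black, midtone, near white
    avg, frac_black = clipped_average(true_value, sigma=4.0, max_value=255.0)
    print(f"true {true_value:6.1f} -> average {avg:7.2f}  "
          f"(frames pinned at 0: {frac_black:.0%})")
```

The near-black average lands near 2.1 instead of 1, and the near-white one near 252.2 instead of 253, while the midtone stays unbiased: exactly the inward push of the mean described above.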

2.6.2 Signal-to-noise ratio: it's about ratios

Here is the fact that surprises people and ties the whole part together. Since shot noise is $\sigma = \sqrt{N}$, the absolute noise is actually larger in the bright parts of the image than in the dark parts — a highlight collecting 10,000 photons has $\sigma = 100$, while a shadow collecting 100 photons has only $\sigma = 10$. And yet noise looks worst in the shadows. The resolution is that perception cares about the signal-to-noise ratio (SNR), not the absolute noise. For shot noise,

$$ \mathrm{SNR} = \frac{N}{\sqrt{N}} = \sqrt{N}. $$

The SNR grows with brightness, so it is worst where the light is faintest — the shadows — even though the raw noise magnitude is smallest there (Figure 2.6.6). Ratios are all that matters. This is the same lesson that drives gamma/log encoding (we encode so that equal ratios get equal steps) and exactly why exposing to the right works: collecting more photons everywhere, especially in the shadows, raises $N$ and therefore raises the SNR where it is worst.
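The arithmetic is worth running once. A minimal sketch: shot-noise SNR is $N/\sqrt{N}$, and an optional read-noise term (a hypothetical 3 electrons, added in variance like any independent source) shows why the shadows fare even worse than $\sqrt{N}$ alone suggests.

```python
import math

def snr(photons, read_noise_e=0.0):
    """Signal over total noise; shot and read variances add as independent sources."""
    return photons / math.sqrt(photons + read_noise_e ** 2)

for n in (100, 10_000):                      # the shadow and highlight from the text
    print(f"N = {n:>6}: shot-only SNR = {snr(n):6.2f}, "
          f"with 3 e- read noise = {snr(n, 3.0):6.2f}")

# Exposing one stop more doubles N, so the shot-limited SNR improves by sqrt(2).
print(f"SNR gain for +1 stop: x{snr(200) / snr(100):.3f}")
```

The read-noise term barely touches the highlight but takes a noticeable bite out of the shadow, which is one more reason extra exposure pays off most where the light is faintest.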

💡 Big lesson — bright pixels are noisier (more absolute noise) but have better SNR

Shot noise is Poisson, so its variance grows linearly with the signal ($\sigma^2 \propto N$) and its standard deviation grows only as the square root ($\sigma \propto \sqrt{N}$). Two consequences pull in opposite directions: the absolute noise is larger in highlights than in shadows (a 10,000-photon highlight has $\sigma=100$; a 100-photon shadow has $\sigma=10$), yet the SNR $= N/\sqrt{N} = \sqrt{N}$ is higher in the highlights, because the signal outgrows the noise. So bright areas are noisier in absolute terms but cleaner to the eye; the shadows have the worst SNR (Figure 2.6.2, Figure 2.6.6). This is why exposing to the right pays off, why brighter scenes look smoother, and why we gamma/log-encode so that equal ratios get equal steps.

**Figure 2.6.6.** *Noise is numerically worse in highlights but looks worse in shadows.* Two maps of the same brightness ramp: a **standard-deviation** map (absolute noise) is brightest in the highlights, since shot noise $\sigma = \sqrt{N}$ grows with brightness; an **SNR** map ($=\sqrt{N}$) is worst in the shadows, since the ratio falls where photons are few. Perception tracks the ratio, so shadows look noisiest, which is the reason for exposing to the right and for gamma/log encoding.

2.6.3 Dynamic range

Two extremes bound what a single exposure can hold. At the top, a pixel's well saturates: once it has counted its full-well capacity of electrons, more light adds nothing and the highlight clips to white. At the bottom, the read-noise floor sets the dimmest signal still distinguishable from noise. The ratio of the two is the sensor's dynamic range, which measures how much of a high-contrast scene a single exposure can capture. When a scene's range exceeds the sensor's, as with a bright window and a dim interior in one frame, no single exposure holds both ends (Figure 2.6.7). That is the motivation for high-dynamic-range (HDR) imaging, which merges several exposures (Multiple-exposure part). Like everything else a photographer counts, dynamic range is quoted in stops (factors of two), which connects it directly to the exposure controls.

**Figure 2.6.7.** The dynamic-range problem. One high-contrast scene (a sunset over water) developed at three exposures: expose for the highlights and the foreground is crushed to black; expose for the shadows and the sky blows out to white; the compromise loses both ends. No single exposure holds the full range, which motivates high-dynamic-range merging of multiple exposures.

How wide a range, exactly? It helps to put numbers on it, because the dynamic ranges of real media and of the eye differ by a lot (Figure 2.6.8). A color slide is narrow, only about 5–6 stops; a reflective print about 6–7; a film negative is famously forgiving at roughly 12–13 stops of latitude; a phone sensor manages about 10–12 and a full-frame sensor about 14. The human eye sees about 10–14 stops instantaneously, but allow it to adapt (the slow re-centering of the retinal response, covered when we turn to perception) and it spans 20+ stops over time; some animals do better still within their niche. Against all of these, an outdoor sun-and-shadow scene can exceed 20 stops in a single frame, which is precisely why no single capture holds it, and why HDR exists.

**Figure 2.6.8.** *(Deferred: to be generated.)* Dynamic range across media and eyes, as a ladder of stops: color slide (~5–6 stops), reflective print (~6–7), film negative (~12–13 latitude), phone sensor (~10–12), full-frame sensor (~14), and the human eye (~10–14 instantaneous, ~20+ once adaptation is allowed), with an animal example, set against a real sun-and-shadow scene exceeding 20 stops, which no single capture spans: the motivation for HDR.
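Converting between stops and contrast ratios is a matter of powers of two. A short sketch, using the upper ends of the rounded figures quoted above; those figures are approximate, so the ratios are too.

```python
import math

def stops_to_ratio(stops):
    """Brightest-to-darkest ratio spanned by a range quoted in stops (factors of 2)."""
    return 2.0 ** stops

def ratio_to_stops(ratio):
    """Inverse: how many doublings fit in a contrast ratio."""
    return math.log2(ratio)

for label, stops in [("color slide", 6), ("full-frame sensor", 14), ("sun-and-shadow scene", 20)]:
    print(f"{label:>20}: {stops:2d} stops = {stops_to_ratio(stops):>11,.0f} : 1")
print(f"a million-to-one scene is {ratio_to_stops(1_000_000):.1f} stops")
```

So the roughly 6-stop gap between a 14-stop sensor and a 20-stop scene is a factor of $2^6 = 64$ in contrast.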

See it — recover an underexposed shot, full-frame vs phone. The clearest way to feel that dynamic range scales with photosite size is to break one. Shoot the same underexposed scene as raw on a full-frame camera and on a phone, then push the exposure up in software by three or four stops (Figure 2.6.9). The full-frame frame cleans up — its deep full-well and low read-noise floor give it the headroom to be lifted — while the phone frame breaks into shadow noise and banding, because lifting the exposure also amplifies its read-noise floor, which sat much closer to the signal to begin with. That is a direct, visible demonstration that dynamic range scales with photosite (and sensor) size, and exactly why phones lean on burst capture and HDR+ (Multiple-exposure part) rather than trusting a single exposure. It is also a coding exercise — see Exercises & Experiments.


Figure 2.6.9. (Deferred — to be generated.) The same underexposed scene, shot as raw on a full-frame camera and on a phone, then pushed up about 3–4 stops in software: the full-frame frame cleans up (deep full-well, low read-noise floor → wide dynamic range), while the phone frame reveals amplified shadow noise and banding (its read-noise floor lifted with the signal) — dynamic range scaling with sensor size.

💡 Big lesson — dynamic range = full-well capacity ÷ noise floor

A single exposure records from a top — the photosite's full-well capacity, where it saturates and clips to white — down to a bottom — the noise floor, the read noise that drowns the shadows. Their ratio is the dynamic range, quoted in stops. You widen it with a bigger well (larger photosites — why a full-frame sensor out-ranges a phone) or a lower floor (cooling, lower read noise), and you beat it altogether by merging multiple exposures (HDR). This is the capture-side companion to the quantization lesson below: what bounds a single shot is this range — full-well over floor — not the number of bits.
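Quoted in stops, this ratio is a base-2 logarithm. Here is a minimal sketch with hypothetical sensor figures (not measurements of any real camera), showing that doubling the full well or halving the read noise each buys exactly one stop.

```python
import math

def dynamic_range_stops(full_well_e, read_noise_e):
    """Dynamic range in stops: log2(full-well capacity / read-noise floor)."""
    return math.log2(full_well_e / read_noise_e)

# Hypothetical sensor figures, chosen only to fall in a plausible range.
sensors = {
    "large photosite": (60_000, 3.0),    # (full well in e-, read noise in e-)
    "small photosite": (6_000, 2.0),
}
for name, (well, read) in sensors.items():
    print(f"{name}: {dynamic_range_stops(well, read):.1f} stops")

base = dynamic_range_stops(60_000, 3.0)
print("double the well: +", round(dynamic_range_stops(120_000, 3.0) - base, 3), "stop")
print("halve the floor: +", round(dynamic_range_stops(60_000, 1.5) - base, 3), "stop")
```

Because the scale is logarithmic, a tenfold larger well adds only about 3.3 stops, which is why cooling, lower read noise and multi-exposure merging are all worth pursuing alongside bigger photosites.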

This is also the right moment to deflate a worry beginners often have about digital capture: that the image is quantized into a finite number of levels, and that this stair-stepping is what limits quality. It almost never is.

💡 Big lesson — quantization is rarely the real problem

With enough bits and a sane encoding, what limits image quality is noise and dynamic range — the two quantities of this section — not the number of levels the analog-to-digital converter (ADC) offers. The noise floor and the full well bound what you can record; the quantization step, once it sits below the noise, is invisible. (Indeed, a little noise dithers the quantization, hiding it further.) The one place levels do bite is banding in the deep shadows of a linearly-encoded image — too few codes where the eye, working in ratios, is most sensitive — and that is exactly the problem gamma encoding exists to solve, by spending code levels perceptually (Color technology). So "rarely" is not "never." We meet this lesson again in BASIC → Image representation → Float vs 8-bit, where floats with headroom are the sane default.
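Both halves of this lesson can be checked numerically. The sketch below assumes NumPy, an arbitrary value between two codes, and a hypothetical noise of two code steps. It shows that rounding a noise-free value loses the fraction for good, whereas with noise present, quantization merely adds error of standard deviation $\Delta/\sqrt{12}$ (with $\Delta$ the step; this is the standard uniform-error approximation), combined in quadrature, and averaging recovers the in-between value: the dithering described above.

```python
import numpy as np

rng = np.random.default_rng(3)
step = 1.0            # quantization step: one code value
true_value = 10.3     # falls between two codes

def quantize(x):
    """Round to the nearest code value."""
    return np.round(x / step) * step

# No noise: every sample rounds to the same code, so the 0.3 is lost for good.
print("no noise, mean of 100k samples:", quantize(np.full(100_000, true_value)).mean())

# Noise of two steps: quantization now behaves like extra noise of std step/sqrt(12),
# added in quadrature, and the average recovers the in-between value (dithering).
sigma = 2.0 * step
noisy = true_value + rng.normal(0.0, sigma, 100_000)
q = quantize(noisy)
print("with noise, mean:", round(q.mean(), 3))
print("std before / after quantizing:", round(noisy.std(), 3), round(q.std(), 3),
      "| predicted after:", round(float(np.hypot(sigma, step / np.sqrt(12))), 3))
```

Here quantization adds only about 1% to the noise ($\sqrt{2^2 + 1/12} \approx 2.02$ steps), which is the sense in which a step below the noise is invisible.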

These three facts recur on nearly every later page: noise is the jitter of a photon count, the signal-to-noise ratio is worst in the shadows, and dynamic range (full well over noise floor) bounds what a single exposure can hold. They shape how a photographer exposes to the right, high-dynamic-range merging (Multiple-exposure part), and the denoising of the basic pipeline. With measurement and its limits now in hand, the part turns next to what a second viewpoint buys you, namely depth, and then to the eye that will finally judge the result.

Big lessons of this chapter

The recurring principles from this chapter, gathered for review.

💡 Big lesson — in a linear image, noise variance is an affine function of brightness

💡 Big lesson — noise is clipped at black and white, so near the extremes it is not zero-mean

💡 Big lesson — bright pixels are noisier (more absolute noise) but have better SNR

💡 Big lesson — dynamic range = full-well capacity ÷ noise floor

💡 Big lesson — quantization is rarely the real problem

| symbol | meaning (this chapter) | note / clash |
|---|---|---|
| $N$ | photon count at a pixel; shot noise $\sigma = \sqrt{N}$, $\mathrm{SNR} = \sqrt{N}$ | clash: $N$ is also the f-number $f/D$ (lens chapter), disambiguated by context |
| $N_f$ | number of frames averaged (noise drops by a factor of $\sqrt{N_f}$) | new |
| $\sigma$ | noise standard deviation | already in Notations |
| $\mathrm{SNR}$ | signal-to-noise ratio; for shot noise $\mathrm{SNR} = \sqrt{N}$ | new |
