💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
jump to
💡 In a hurry? Jump to this chapter’s 1 big lesson ↓

10.1 Denoising by averaging

The previous part measured and characterised noise. This part fights it — and the bluntest, most trustworthy weapon is to take the same picture many times and average. No prior, no model of what images look like, no guess about which pixels to trust: just repeated measurement and the law of large numbers. Where single-image denoising (Bilateral filtering) must infer which neighbours belong to the same surface and average only those, averaging multiple frames adds genuine independent information with every shot — each frame is a fresh, honest measurement of the very same scene value. That difference is the whole reason this method comes with a guarantee the single-image filters cannot offer.

The guarantee is a law, and the law is short. Model each noisy pixel as a true value plus independent, zero-mean noise. Averaging $N$ frames leaves the true value untouched while the noise's standard deviation — the visible error — falls as $1/\sqrt N$. That single relation is the quantitative spine of this entire part: it is why you bracket for HDR, why a phone shoots a burst, why an astrophotographer integrates for hours. The chapter develops the law from a two-line probability argument, then upgrades the plain mean to a robust combine that survives outliers, adds the calibration frames that remove the fixed-pattern errors averaging is powerless against, and lands on deep-sky astrophotography as the law taken to its physical extreme — pulling a galaxy out of near-nothing.

10.1.1 Why averaging works — the $1/\sqrt N$ derivation

Start before any math, with an observation you can make on a tripod. Point a camera at a static, evenly-lit grey card and shoot a burst. Each pixel should read one constant value; instead it fluctuates from frame to frame. And across the card — which should be perfectly flat — a single frame is visibly grainy. These are two faces of the same thing: noise as fluctuation over time (the same pixel wobbling shot to shot) and noise as fluctuation over space (neighbouring pixels disagreeing when the answer should be constant). The cure for both is the same one operation (Figure 10.1.1).

The naive algorithm is a one-liner: sum the frames and divide by their count.

mean = zeros_like(imSeq[0])
for im in imSeq:
    mean += im
mean /= len(imSeq)

Run it on 1, then 3, then 5, then 45 frames and the grey card's histogram tightens visibly at each step; the dark regions clean up first and most. The questions the rest of the section answers are why it works at all, and how fast.

The model. This is the one load-bearing assumption, so state it carefully. The value the sensor reports for a given pixel in frame $i$ is the true signal plus a noise term,

$$ X_i = \mu + n_i, $$

where $\mu$ is the scene value we want and $n_i$ is the noise in frame $i$. Two properties of $n_i$ carry the entire derivation. First, the noise is zero-mean, $\mathbb{E}[n_i] = 0$: it has no built-in offset, so it is just as likely to push a reading up as down. Averaging is only honest — only leaves the signal where it found it — if this holds; a noise term with a DC (constant, non-zero-mean) bias would survive the average and corrupt $\mu$. Second, the noise is independent across frames: knowing frame $i$'s error tells you nothing about frame $j$'s. This is true of shot noise (each photon arrives on its own schedule) and read noise (a fresh draw each read-out), and it is the property that makes a second frame genuinely informative rather than a repeat of the first. It is false for fixed-pattern noise — the same per-pixel error baked into every frame — which is exactly why averaging cannot touch it; we flag that here and repair it under Calibration frames. Each $n_i$ has the same variance, $\operatorname{Var}[n_i] = \sigma^2$.

The two tools. The derivation needs only two facts about variance, worth stating as reusable algebra. Scaling a random variable squares its variance,

$$ \operatorname{Var}[k X] = k^2 \operatorname{Var}[X], $$

and for independent variables, the variance of a sum is the sum of the variances,

$$ \operatorname{Var}\Big[\textstyle\sum_i X_i\Big] = \sum_i \operatorname{Var}[X_i]. $$

The second is where independence earns its keep: in general a sum's variance also carries cross-terms (covariances), and independence is precisely what makes those vanish.

The derivation, in two lines. The average over $N$ frames is $\bar X = \tfrac1N \sum_{i=1}^N X_i$. Its mean is the true value,

$$ \mathbb{E}[\bar X] = \frac1N \sum_i \mathbb{E}[X_i] = \mu, $$

so averaging is unbiased — it does not distort the signal, no matter how many or few frames. Its variance follows from the two tools: pull the $\tfrac1N$ out front (it squares), then add across the independent frames,

$$ \operatorname{Var}[\bar X] = \frac{1}{N^2}\operatorname{Var}\Big[\textstyle\sum_i X_i\Big] = \frac{1}{N^2}\cdot N\sigma^2 = \frac{\sigma^2}{N}. $$

Take the square root for the human-readable number — the standard deviation, the size of the visible error:

$$ \sigma_{\bar X} = \frac{\sigma}{\sqrt N}. $$
💡 Big lesson (the $1/\sqrt N$ law)

Averaging $N$ independent, zero-mean measurements of the same scene leaves the signal untouched but shrinks the noise standard deviation by $\sqrt N$. The variance falls as $1/N$; the visible error falls as $1/\sqrt N$. This is the quantitative spine of the whole MULTIPLE-EXPOSURE part — the same law behind HDR's noise-weighted merge, burst low-light on a phone, burst super-resolution, and astrophotography's hours of integration. Its sting is built in: the returns diminish. Each extra stop of cleanliness costs four times the frames — 100 frames buy a $10\times$ cleaner image, not $100\times$. In SNR terms the gain is $+10\log_{10} N$ dB, about 3 dB per doubling. It ties directly to L15 — dynamic range = well ÷ floor: averaging is the algorithmic way to lower the effective noise floor, the complement to widening the well, and the two together set how much real signal you can lift out of the shadows. (First developed here; registered alongside L15.)

That diminishing return is the law's whole economic content (Figure 10.1.2). It is good news and bad news at once: a quiet burst of four frames buys a clean stop for almost free, but chasing the last bit of cleanliness gets brutally expensive — which is exactly why the rest of this part looks for smarter ways to spend captures than averaging more of the same.

A short practical aside, useful when you build an analysis pipeline. From the very same stack you can measure the per-pixel noise $\sigma^2$ — but divide the sum of squared residuals by $N-1$, not $N$. This is Bessel's correction:

$$ \hat\sigma^2 = \frac{1}{N-1}\sum_i (X_i - \bar X)^2. $$

The reason is that you used the same samples to compute $\bar X$, which pulls the mean slightly toward each sample and shrinks the residuals; dividing by $N$ would systematically under-estimate the true variance. The cleanest illustration is two coin-flips: the naive $N$-estimator averages to half the true variance, while the $N-1$ version is unbiased. Map this measured $\sigma$ across a real scene and it tracks the noise sources from the previous part — numerically larger in bright regions (shot noise grows as $\sqrt{\#\text{photons}}$) yet with better SNR there. Averaging lowers the floor uniformly; it does not change which source dominates where.

[figure fig-averaging- not built]
Figure 10.1.1. Averaging tightens the histogram. The same static grey patch shot at high ISO: a single frame, then the mean of 3, of 5, and of 45 frames. Below each, the histogram of the patch's pixel values — flat and broad for one frame, visibly narrowing toward a spike as $N$ grows; the dark regions clean up first. The signal (the patch's true value) never moves; only the spread shrinks. (Deferred — to be generated. Needs a real high-ISO burst of a static grey card.)
[figure fig-sqrt not built]
Figure 10.1.2. The diminishing return, on log–log axes. Noise standard deviation versus number of frames $N$ is a straight line of slope $-\tfrac12$. Reference marks show that halving the noise (one stop cleaner) costs $4\times$ the frames, and that 100 frames buy only a $10\times$ improvement. The straight line is the $1/\sqrt N$ law.

10.1.2 When the plain mean is wrong: robust combination

The derivation assumed clean independent, identically distributed (IID) noise. Real stacks rarely cooperate. Somewhere in your burst a satellite or aircraft streaks across one frame, a cosmic ray spikes a single pixel to white, or a passer-by walks through the scene. These are outliers — values that do not come from the same zero-mean distribution as the rest — and the mean is not robust to them. A single wild value drags the average toward itself, so the streak does not disappear; it merely fades to a faint ghost trail exactly where the outlier was, scaled down by $1/N$ but never gone (Figure 10.1.3).

The simplest fix is the per-pixel median across the stack instead of the mean. A streaked frame contributes just one extreme value at the affected pixels, and the median steps right over it — the streak vanishes cleanly. The cost is efficiency: for clean Gaussian data the median is noisier than the mean (its variance is about $1.57\times$ larger), so you forfeit roughly a third of your hard-won $\sqrt N$ benefit. Robustness is not free.

The practical compromise — and the default in serious stacking — is sigma-clipping. Iterate: compute the per-pixel mean and standard deviation, reject any sample lying more than $k\sigma$ (typically $2$–$3\sigma$) from the mean, then recompute the mean on the survivors. You get the mean's efficiency on the well-behaved inliers together with the median's rejection of outliers. This is what astro stacking packages such as DeepSkyStacker and RegiStax implement under names like "kappa-sigma" or "winsorized" stacking.

It is worth naming the design choice underneath, because it recurs. Choosing between mean and median/sigma-clip is the same reject-or-average decision as edge-preserving filtering — use a difference to decide who belongs together — only applied across frames instead of across space. Plain mean trusts everyone equally; median and sigma-clip trust the consensus and treat a sample too far from its peers as not-belonging. The very same robust-combine logic returns later as de-ghosting in HDR merging and panoramas (reject pixels that disagree because something moved between captures), and as the robust merge in burst super-resolution (Super-resolution and image priors).

fig-mean-vs-median-stack
Figure 10.1.3. Mean versus robust combine, on a stack with one satellite streak. The same scene stacked two ways: the plain per-pixel mean retains a faint ghost of the streak (one frame's outlier divided across $N$); the per-pixel median (or sigma-clip) rejects it entirely, leaving a clean result. The inset shows the per-pixel sample distribution at a streaked pixel — one far-out value the consensus ignores.

10.1.3 Calibration frames: what averaging can't fix

The $1/\sqrt N$ law has fine print: it kills only zero-mean, independent noise. Two things violate that and therefore survive any amount of averaging. The first is fixed-pattern error — the same offset in every frame: hot pixels, amplifier glow, per-pixel dark current, lens vignetting, dust shadows on the sensor. Because it is identical frame to frame it is not independent, and averaging $N$ copies of the same error simply reproduces it. The second is clipping — the sensor clamps readings at $0$ and at its maximum, a non-linearity that quietly biases the mean. Neither can be averaged away; both must be removed (Figure 10.1.4).

The removal tool is calibration frames, and the classic recipe is a triad — each frame the inverse of one specific degradation. (The canonical reference treatment is Howell's Handbook of CCD Astronomy — the charge-coupled device (CCD) being the cooled image sensor of classic astrophotography Howell 2006.)

In one expression, with $X_\text{light}$ the raw exposure,

$$ X_\text{cal} = \frac{X_\text{light} - X_\text{dark}}{X_\text{flat} - X_\text{bias}}, $$

which reads back as: subtract the additive fixed-pattern terms, divide out the multiplicative one. And you of course average many of each calibration frame too, so they do not re-inject noise of their own — it is $1/\sqrt N$ all the way down.

Clipping deserves its own paragraph because the failure is so counter-intuitive. In deep shadows the true signal sits near zero, so the noise distribution straddles zero — but the sensor cannot record a negative value, so it clamps the below-zero excursions to zero, leaving only the positive half. The surviving noise is no longer zero-mean: its average is pushed up. Average a clipped true-black region and you get something measurably too bright — averaging amplifies the bias instead of cancelling it. The fix is to add a deliberate positive offset before read-out (Canon's well-known trick) so the entire noise distribution clears zero and stays zero-mean; you subtract that offset back during calibration, where it lives in the bias frame (Figure 10.1.5).

One more trick converts an otherwise un-averageable error into an averageable one. Dithering means deliberately nudging the framing by a pixel or two between frames. After you re-align the stack, any residual fixed-pattern error now lands on a different part of the scene in each frame — so once registered, it is effectively random across the stack, and averaging or sigma-clipping can finally remove it. (This is also why the hand-tremor in a handheld burst is a feature rather than a bug — the same sub-pixel jitter is what burst super-resolution exploits, in Super-resolution and image priors.)

fig-calibration-triad
Figure 10.1.4. The calibration triad. A light frame decomposed as signal plus additive fixed-pattern terms (bias offset, dark current / hot pixels) times a multiplicative response field (flat: vignetting, dust shadows). Beside it, the three calibration frames — bias, dark, flat — and what each removes, assembled into $X_\text{cal} = (X_\text{light}-X_\text{dark})/(X_\text{flat}-X_\text{bias})$: subtract the additive terms, divide out the multiplicative one.
fig-clipping-bias
Figure 10.1.5. Clipping bias in the shadows. A true-black region's noise distribution straddles zero; the sensor clamps the negative half to zero, so the surviving distribution is no longer zero-mean and its average is pushed up — averaging makes a black region come out grey. The fix (the deliberate positive offset) shifts the whole distribution above zero so it stays symmetric, then subtracts the offset back in calibration.

10.1.4 Deep-sky astrophotography: averaging at the extreme

Deep-sky astrophotography is the $1/\sqrt N$ law pushed to its physical limit, the case where every idea in this chapter matters at once. The target — a faint nebula or galaxy — is buried so deep in noise and light pollution that it is invisible in any single frame. The recipe is exactly the chapter's: shoot many long sub-exposures ("subs"), align them (the sky rotates over the hours of capture, so registration is mandatory — the bridge to Image alignment), and combine. The signal, fixed on the sky, accumulates; the noise, independent frame to frame, falls as $1/\sqrt N$. Hundreds of subs and hours of total integration lift structure out of near-nothing — spiral arms and faint outer halo emerging from frames in which no single one showed them.

At this extreme, calibration frames are not optional, they are non-negotiable. Across hours of dark-sky exposure the fixed-pattern terms — dark current, hot pixels, amp glow, vignetting, dust — would otherwise dominate the faint signal entirely. So the astrophotographer subtracts darks and bias and divides by flats, exactly the triad above, which is the historical home of these techniques. The temperature regulation of a cooled astro camera exists precisely so the dark frames match the lights.

The combine, too, must be robust, not a plain mean. Over a long session, satellites and aircraft will streak some subs, and cosmic rays will spike pixels; per-pixel sigma-clipping or median rejects them while keeping the inliers' efficiency. Dithering between subs turns any leftover fixed-pattern residue into something the combine can also remove.

Finally there is a capture-budget tradeoff, the astro edition of the exposure decisions from Noise, signal-to-noise ratio and dynamic range. Read noise is paid once per sub, so fewer, longer subs incur it fewer times; but saturation, tracking error, and clipping all favour more, shorter subs. The resolution is quantitative: once each sub's signal sits comfortably above the read-noise floor, lengthening subs no longer helps the $\sqrt N$ arithmetic — but it does reduce the number of read-noise events, so when read noise is significant (a light-polluted sky, or narrowband imaging) fewer-longer wins, whereas shaky tracking or saturating stars push you toward more-shorter.

A closing contrast sharpens what "average to denoise" really commits to. Deep-sky imaging averages everything to beat read and shot noise. Planetary lucky imaging does the opposite — it selects the sharpest few percent of frames and discards the rest, because there the enemy is atmospheric blur, not photon noise. Same burst, opposite combine rule: average all to fight noise, or pick the best few to fight blur. Which one is right depends entirely on what is corrupting your frames — and recognising that is itself an instance of L14, deferring the decision until you can see the whole captured set.


Big lessons of this chapter

The recurring principles from this chapter, gathered for review.

💡 Big lesson (the $1/\sqrt N$ law)

Averaging $N$ independent, zero-mean measurements of the same scene leaves the signal untouched but shrinks the noise standard deviation by $\sqrt N$. The variance falls as $1/N$; the visible error falls as $1/\sqrt N$. This is the quantitative spine of the whole MULTIPLE-EXPOSURE part — the same law behind HDR's noise-weighted merge, burst low-light on a phone, burst super-resolution, and astrophotography's hours of integration. Its sting is built in: the returns diminish. Each extra stop of cleanliness costs four times the frames — 100 frames buy a $10\times$ cleaner image, not $100\times$. In SNR terms the gain is $+10\log_{10} N$ dB, about 3 dB per doubling. It ties directly to L15 — dynamic range = well ÷ floor: averaging is the algorithmic way to lower the effective noise floor, the complement to widening the well, and the two together set how much real signal you can lift out of the shadows. (First developed here; registered alongside L15.)