Computational Photography, an AI-powered Slopendium — 08 Single-image computational photography

▸Part 8 SINGLE IMAGE COMPUTATIONAL PHOTOGRAPHY⧉

▸8.1 Recap: tone mapping⧉

`fig-tonemap-real` · global vs local tone mapping RESULT on a real HDR photo (seal at marina): naive single exposure · global Reinhard (one slope flattens local contrast → hazy) · local bilateral base+detail split (range fits, detail stays crisp) — real-image companion to `fig-tonemap-global-vs-local` (BASIC tone mapping) ✅

`fig-tonemap-taxonomy` · a taxonomy of local tone mapping — bilateral base/detail, gradient-domain, local Laplacian, exposure fusion, learned — placed on a shared axis 🟨

⬜ figure not yet created

`fig-darkroom-dodge-burn` (a Zone-System / dodge-&-burn darkroom print beside its digital local-tone-mapping analogue) fig-editing-lr-vs-ps

This chapter doesn't introduce new algorithms — it **gathers** the tone-mapping thread that has run since BASIC into one picture, and ties it back to the **analog darkroom** where tone control began. (Scope note: this single-image recap deliberately covers the **single-image** operators; the **multi-image** tone mapping — HDR *merge* / exposure fusion — lives in [[Multiple exposure imaging]] and is only forward-referenced here. **See Known Issue.**)

▸8.1.1 Global vs local, re-hashed⧉

• **global** tone mapping applies **one curve to every pixel** (a function of intensity alone — gamma, an S-curve, Reinhard's global operator). Simple, fast, monotone — but a single curve **cannot both compress the overall range and preserve local contrast**: flatten the range and local detail goes with it.

• **local** tone mapping lets the mapping **depend on a pixel's neighbourhood** — compress the *large-scale* range while *keeping* local contrast. The cost is the risk of **halos** (and reversals) where the locality is mishandled — the recurring failure mode introduced in BASIC.

• the recap point: every local operator below is a different answer to the same question — *how do you separate "overall brightness" (compress it) from "local detail" (keep it)?*

▸8.1.2 A taxonomy of local methods⧉

• **gradient-domain** (Fattal et al. 2002): attenuate **large gradients**, then **reconstruct** the image from the edited gradient field (a Poisson solve) — compress range by squashing big luminance steps while leaving small ones. Developed in [[Poisson image editing]].

• **bilateral base/detail** (Durand & Dorsey 2002): split the image into a **base** (large-scale, edge-preserving bilateral) and **detail**; **compress the base**, **keep the detail**, recombine. The canonical edge-aware local operator — developed in [[Edge-preserving techniques]].

• **local Laplacian** (Paris, Hasinoff & Kautz 2011): manipulate a **Laplacian pyramid** with a per-coefficient remapping that **avoids halos by construction** — local contrast control without the bilateral's artifacts. [[Edge-preserving techniques]].

• **exposure fusion** (Mertens et al. 2007): blend a **bracket of exposures** by per-pixel quality weights (well-exposedness, contrast, saturation) in a pyramid — *tone mapping without ever forming an HDR image.* (It straddles single/multi-image — flagged in the Known Issue.)

• **learned** (HDRnet, Gharbi 2017; FiveK retouching): **predict** a local (bilateral-grid affine) tone mapping from data — the operator itself learned from expert edits. [[Deep learning]].

• the table to land: same goal (separate and treat large-scale vs local), five mechanisms — gradient field, base/detail, pyramid remap, multi-exposure blend, learned. [fig-tonemap-taxonomy]

▸8.1.3 The darkroom ancestor: dodge & burn and the Zone System⧉

• tone mapping is **not** a digital invention — photographers have **locally controlled tone** since the wet darkroom. **Dodge** (hold light back from a region → lighter) and **burn** (add light → darker) are **manual local tone mapping** with the hands and bits of card under the enlarger.

• the **Zone System** (Ansel Adams): a disciplined method for **mapping scene luminances onto print tones** — meter the scene, place key values on chosen "zones," and develop/print to control where highlights and shadows land. It is **tone mapping as craft**: deciding the global transfer *and* the local adjustments before computation existed.

• the analog **characteristic curve** of film/paper (the H&D curve — toe, linear, shoulder) is the **global** tone curve in physical form; its shoulder is highlight roll-off, its toe is shadow compression — exactly what a digital global operator emulates.

• the bridge to land: digital global tone mapping = the **characteristic curve**; digital local tone mapping = **dodge & burn + the Zone System**, now automatic and per-pixel. The computation **mechanized** a century-old craft (cross-ref [[Human factors and the art of photography]]; the Intro's Adams "negative = score, print = performance" analogy).

▸8.2 Super-resolution and image priors⧉

`fig-sr-ill-posed` · many sharp HR images collapse to the same blurry LR image — the SR inverse is one-to-many (ill-posed); a prior picks one 🟨

`fig-isp-walk-real` · the ISP pipeline walked on ONE real photo (boatman gathering lotus, overcast lake, Sony DNG): raw (linear, no WB — flat & green) · white balance · tone+colour (gamma encode) · denoise · sharpen · finished JPEG, frame-coloured by linear-light vs encoded regime — real-image companion to `fig-isp-block-diagram` (Recap ISP, BASIC) ✅

⬜ figure not yet created

`fig-sr-recon-vs-halluc` (one low-res crop → reconstruction-based result from a burst (real detail) vs hallucination-based result from a learned prior (invented but plausible detail)) fig-sr-recon-vs-halluc

`fig-burst-subpixel` · hand-jitter across a burst lands samples between the LR grid points; merging the sub-pixel-shifted frames fills the HR grid 🟨

⬜ figure not yet created

aliasing made useful, after Wronski 2019)

`fig-pnp-loop` · Plug-and-Play / RED: alternate a data-fidelity step with a denoiser-as-prior step until convergence — the prior is a black-box denoiser 🟨

`fig-denoiser-as-prior-spectrum` · one PnP/RED prior slot, ever-stronger denoisers: hand-built → classical → learned → diffusion score (GenAI / Super-res) ✅

💡 **Big lesson (L10 · The prior is not optional):** when the measurement genuinely destroys information — super-resolution past the sensor's sampling, deblurring at killed frequencies, inpainting a hole — *no amount of cleverness recovers it from the data alone*. A **prior** (what natural images look like) is what selects an answer; it is a load-bearing part of the algorithm, not a tuning knob. The honest split: **reconstruction** priors fuse genuinely-measured extra samples; **hallucination** priors *invent* plausible detail that was never measured. (Registered in [[Big Lessons]] as **L10**; first appears here; recurs in [[#Blind deblurring]] — blind deblurring & dark-channel prior — and forward in [[Generative AI and diffusion]].)

equations

SR forward model $y = (k * x)\downarrow_s + n$ (blur by PSF $k$, downsample by factor $s$, add noise $n$)

MAP / variational recovery $\hat{x} = \arg\min_x \tfrac{1}{2}\lVert (k*x)\downarrow_s - y\rVert^2 + \lambda\, \Phi(x)$ (data-fit + prior $\Phi$)

PnP iteration (HQS form) — alternate a **data-fit** step $z \leftarrow \arg\min_x \tfrac{1}{2}\lVert (k*x)\downarrow_s - y\rVert^2 + \tfrac{\rho}{2}\lVert x - \tilde{x}\rVert^2$ and a **denoise** step $\tilde{x} \leftarrow \mathcal{D}_\sigma(z)$ (the prior, *any* denoiser $\mathcal{D}$)

RED gradient $\nabla \Phi(x) = x - \mathcal{D}(x)$ (prior gradient = residual of a denoiser)

Tweedie $\hat{x} = z + \sigma^2\, \nabla_z \log p_\sigma(z)$ (the score IS a Gaussian denoiser → diffusion = iterated $\mathcal{D}$)

▸8.2.1 What problem super-resolution solves (and why it's ill-posed)⧉

• **goal**: produce a higher-resolution image than the camera measured — more pixels that are *meaningfully* sharper, not just upsampled. Contrast with plain **interpolation** (bicubic), which adds pixels but no new frequencies → soft.

• **the forward model**: a low-res measurement is a high-res scene $x$ **blurred** by the optics/PSF $k$, **downsampled** by factor $s$, plus **noise** $n$: $y = (k * x)\downarrow_s + n$. Exactly the linear forward model $y = Ax + n$ of [[Linear Inverse Problems and Regression]], with $A = (\text{downsample})\cdot(\text{blur})$. [`fig-sr-forward-model`]

• **why it's ill-posed**: downsampling is **many-to-one** — it discards the high frequencies above the new Nyquist limit. Infinitely many high-res $x$ pass through the *same* $y$ (add any detail that the blur+downsample would have annihilated and $y$ is unchanged). So $A$ has no usable inverse; the data **under-determines** the answer. [`fig-sr-ill-posed`]

• **the resolution — a prior**: recover with **data-fit + regularizer**, $\hat{x} = \arg\min_x \tfrac12\lVert (k*x)\downarrow_s - y\rVert^2 + \lambda\,\Phi(x)$. The data term keeps $\hat{x}$ consistent with what was measured; $\Phi(x)$ (the **prior**) picks, among all consistent $x$, the one that *looks like a real image*. This is the MAP reading: $\Phi = -\log p(x)$, a prior × likelihood (Refreshers — Bayes).

• 💡 **the throughline**: without $\Phi$ the problem has no answer — see the **L10 big-lesson box** above. Everything below is *which* prior and *how* it's applied.

• **intuition-first framing**: SR is not "magnifying glass for files." It is "guess the most plausible scene that would have produced this blurry, aliased, noisy little image" — and *plausible* is defined by the prior.

▸8.2.2 Scenarios: single-image, burst, and hybrid space–time⧉

• **classic single-image SR**: one frame in, one bigger frame out. The data fixes nothing new above Nyquist → **all** the extra detail comes from the prior. Two classic priors: **internal** (patches recur across scales within the *same* image — Glasner 2009) and **external/example-based** (a database / learned model of how low-res patches map to high-res — Freeman 2002). Pure single-image SR is therefore **hallucination-leaning** by necessity. [`fig-sr-recon-vs-halluc`, right panel]

• **multi-frame / burst SR** (Wronski 2019, the Pixel "Super Res Zoom"): capture a **burst** of frames; **hand tremor** shifts the scene by **sub-pixel** amounts between frames. Each frame samples the *same* continuous scene at slightly different grid offsets → align with sub-pixel **registration** and fuse → a **denser** effective sampling grid. This is **genuine information**: aliasing (normally a defect) becomes the signal, because the shifts let you separate frequencies a single frame folded together. No dedicated prior needed for the *recovered* detail — the frames supply it (though robust fusion still needs a noise/motion model). [`fig-burst-subpixel`]

• key sub-points: **sub-pixel alignment is the whole game** (mis-registration → blur/ghosting); **robust merge** rejects moving objects / occlusion (else ghosts); replaces the demosaic step too (the color mosaic's offsets help). Ties to BASIC — sampling/aliasing and to multi-frame denoising ($1/\sqrt{N}$ averaging).

• **hybrid space–time SR**: trade **temporal** resolution for **spatial** (or vice versa) — combine several low-spatial/high-temporal captures (or a fast + a slow camera) so motion provides extra spatial samples and vice versa. Same idea as burst, generalized to the time axis.

• **the organizing axis** — *how much is measured vs invented*: many sub-pixel-shifted frames ⇒ mostly **reconstruction**; a single frame ⇒ mostly **hallucination**; real systems sit in between. This axis is the next section. [`fig-sr-recon-vs-halluc`]

▸8.2.3 Reconstruction vs hallucination — measured detail vs invented detail⧉

• **the honest distinction**, and why it matters for a reader: both make a sharper picture, but only one adds **truth**.

• **reconstruction-based**: the extra detail was **genuinely measured** — by multiple sub-pixel-shifted frames (burst), or by deconvolving frequencies the optics merely attenuated (not killed). It is *recovered*. Verifiable, repeatable, faithful.

• **hallucination-based**: a **learned prior** synthesizes detail that is *plausible* for natural images but was **never in the measurement** — pores on a face, text on a distant sign. It can look stunning and be **wrong** (invent the wrong digits, the wrong texture). [`fig-sr-recon-vs-halluc`]

• **the learned hallucination models** (used here as **priors**, treated as *models* in [[Deep learning]]): **Real-ESRGAN** (Wang 2021 — GAN-trained on a realistic degradation pipeline, sharp/textured outputs) and **SwinIR** (Liang 2021 — transformer restoration). They learn $p(\text{high-res} \mid \text{low-res})$ from data and sample a plausible completion. **Forward-ref**: their architecture and training live in the ML chapter; here they're one end of the prior axis.

• **caveats to surface** (pedagogical honesty): metric trap — a hallucinated result can score *better* on PSNR/SSIM yet be unfaithful (and a faithful one can score worse); the **perception–distortion tradeoff**. Forensic/ethical note: SR'd faces and license plates are **not evidence**. (Cross-ref learned perceptual metrics — LPIPS — in [[Deep learning]].)

• **resolve the open "in ML at this point?" question** — **verdict: keep super-resolution HERE**, as the *image-prior* story that unifies the whole single-image part. The chapter owns the **prior** abstraction (ill-posedness → regularizer → denoiser-as-prior → diffusion). The **learned models** themselves (Real-ESRGAN, SwinIR, GANs, diffusion nets) are taught as models in [[Deep learning]] / [[Generative AI and diffusion]], cross-referenced — not duplicated. So: priors here, models there.

▸8.2.4 Denoising as a universal prior — Plug-and-Play and RED⧉

• **the unifying move**: in $\hat{x} = \arg\min_x \tfrac12\lVert Ax - y\rVert^2 + \lambda\Phi(x)$, splitting algorithms (**ADMM / HQS**) separate the two terms into alternating steps. The regularizer step turns out to be **exactly a denoising operation** (the proximal operator of $\Phi$ = "find the nearby image the prior likes" = denoise). [`fig-pnp-loop`]

• **Plug-and-Play (PnP) priors** (Venkatakrishnan 2013): so **drop in *any* denoiser** $\mathcal{D}_\sigma$ for that step — even one with **no closed-form prior** (BM3D, NL-means, a learned CNN). Alternate:

• **data-fit step**: $z \leftarrow \arg\min_x \tfrac12\lVert (k*x)\downarrow_s - y\rVert^2 + \tfrac{\rho}{2}\lVert x - \tilde{x}\rVert^2$ (enforce the measurement; a linear solve — CG/FFT, per [[Linear Inverse Problems and Regression]]);

• **denoise step**: $\tilde{x} \leftarrow \mathcal{D}_\sigma(z)$ (impose the prior — *any* denoiser).

• payoff: **decouples** the imaging physics (the forward model $A$, swap per task — SR, deblur, inpaint, demosaic) from the **image prior** (the denoiser). One good denoiser ⇒ a solver for *every* inverse problem. (FlexISP, Heide 2014, builds a whole ISP this way.) [`fig-denoiser-as-prior-spectrum`]

• **RED (Regularization by Denoising)** (Romano 2017): make the prior **explicit** from a denoiser. Define $\Phi(x) = \tfrac12 x^\top(x - \mathcal{D}(x))$; under mild conditions its **gradient is just the denoising residual**, $\nabla\Phi(x) = x - \mathcal{D}(x)$. Then plain **gradient steps** minimize the regularized objective — the denoiser defines an actual energy, not only an opaque proximal step. (PnP ≈ proximal/implicit; RED ≈ explicit-gradient.)

• **why "universal"** (the section's headline): the **denoiser is the prior**. Improve the denoiser and *every* recovery task improves for free. This reframes "image prior" from a hand-derived penalty (TV, sparsity) to "whatever your best denoiser knows." [`fig-denoiser-as-prior-spectrum` — the spectrum from TV → BM3D → learned → diffusion]

• 💡 **callback to L10**: PnP/RED is the *mechanism* by which the not-optional prior enters — the denoise step IS the prior step.

▸8.2.5 Diffusion is iterated denoising (the continuous limit)⧉

• **Tweedie's formula**: for a Gaussian-noised observation $z = x + \sigma\epsilon$, the **MMSE denoiser** (posterior mean) is $\hat{x} = z + \sigma^2\,\nabla_z \log p_\sigma(z)$. So the **score** $\nabla_z \log p_\sigma(z)$ and a **Gaussian denoiser** are the *same object* (one is the other rearranged). [`fig-denoiser-as-prior-spectrum`, right end]

• **therefore diffusion = iterated denoising**: a diffusion model learns the score at every noise level; **sampling** repeatedly applies "denoise a little, add a little noise, repeat" from pure noise down to a clean image. That outer loop is precisely **PnP/RED taken to the continuous limit** — the prior-step denoiser is the learned score, run over a noise-level schedule.

• **consequence for recovery**: conditioning diffusion sampling on a measurement ($y$ via the forward model $A$) = a PnP/RED solver whose prior is the strongest learned denoiser we have → state-of-the-art SR / deblur / inpaint. This is the bridge from this chapter's *prior* story to generative models.

• **forward-ref**: the full diffusion treatment (DDPM, latent diffusion, score-matching, samplers) is in [[Generative AI and diffusion]]; here we only establish the **identity** denoiser ≡ score ≡ diffusion-step, closing the prior throughline. The diffusion chapter cross-refs back to *Denoising as a universal prior*.

• **closing idea** (hand-off to [[#Blind deblurring]]): the same "data-fit + prior, and the prior is doing the heavy lifting" story now plays out where the **forward operator itself is unknown** — blind deblurring — and where hand-built priors (dark-channel) still earn their keep.

`fig-naive-deconv-noise` · naive inverse filtering explodes: dividing by the blur's spectrum amplifies noise at its near-zeros 🟨

`fig-wiener-tradeoff` · the Wiener filter as a regularised inverse — the $1/\mathrm{SNR}$ term trades deconvolution sharpness against noise amplification 🟨

`fig-metric-degradation-gallery` · one clean photo vs four degradations (1-px shift, low-q JPEG, noise, blur), each labelled with PSNR + SSIM — PSNR and SSIM mostly agree but disagree where L2 is blind (shift scores worst PSNR yet looks identical; blur keeps high PSNR yet softens texture) (Image metrics, BASIC) ✅

`fig-natural-image-gradient-prior` · natural-image gradients are heavy-tailed (mostly flat, rare strong edges) vs a Gaussian — the sparse prior that breaks the blind tie 🟨

`fig-camera-shake-nonuniform` · camera-shake blur is spatially varying — a rotation gives a different PSF in each corner of the frame 🟨

`fig-bw-conversions` · one colour photo → black & white several ways: single channel (R/G/B) · average · weighted luminance · red-filter channel mix (dark sky) · an isoluminant pair collapsing to one grey — the choices differ (Converting to B&W, Color technology) ✅

`fig-point-op-contrast-spaces` · the same contrast (×gain about mid-gray) in gamma (sRGB) / linear (no gamma) / log: three outputs + remapping curves (linear crushes shadows, log preserves them; black clamped to ε for log) 🟨

⬜ figure not yet created

`fig-colorization-scribbles` (gray photo + a few color scribbles → propagated color, stopping at edges) fig-colorization-scribbles

The previous chapter inverted a **known** blur on a **clean** image — a textbook least-squares solve. Reality breaks both assumptions: sensors add **noise**, and you usually **don't know the blur kernel**. This chapter is what you do then. The unifying frame is one line — *recovery = data-fit + prior* — and each section just swaps in the prior that suits the degradation. Forward-ref the punchline early: when the prior is *learned* rather than hand-designed, this is exactly [[Machine learning]] and [[Super-resolution and image priors]] (PnP/RED).

💡 **Big lesson:** *diagonalize when you can* — shift-invariant blur is a **convolution**, so it's **diagonal in Fourier**: deblurring becomes a per-frequency **division** by the kernel's frequency response, and the Wiener filter is just that division, *regularized* per frequency. (→ see Big lesson L5, first placed in [[Linearity, Fourier, Aliasing and deblurring]] — recurs here as the reason both inverse and Wiener filters have one-line Fourier forms.)

equations

forward model with noise $Y = K * X + n$

**naive (inverse-filter) deconvolution** $\hat X = \mathcal F^{-1}\!\big[\hat Y / \hat K\big]$ — blows up where $\hat K \to 0$

**Wiener / regularized inverse** (Fourier, per frequency) $\hat X = \dfrac{\overline{K}\,Y}{|K|^2 + \mathrm{SNR}^{-1}}$ (equivalently $|K|^2/(|K|^2 + S_n/S_x)$ applied to the inverse filter)

**blind objective** $\displaystyle (\hat X,\hat K) = \arg\min_{X,K}\

\underbrace{\lVert K * X - Y\rVert^2}_{\text{data fit}} + \underbrace{\lambda\,\rho(\nabla X)}_{\text{image prior}} + \underbrace{\mu\,\psi(K)}_{\text{kernel prior}}$ with $\rho$ a **sparse / heavy-tailed** gradient penalty ($\lVert\nabla X\rVert_\alpha,\ \alpha<1$)

**color2gray** objective = match grayscale gradients to (signed) color-difference gradients $\min_g \sum \lVert \nabla g - \theta\rVert^2$

**colorization** objective = minimise color variation between pixels weighted by intensity affinity $\min_U \sum_x \big(U(x) - \sum_{x'\in N(x)} w_{xx'} U(x')\big)^2$ (developed in [[Edge-preserving techniques]])

▸8.3.1 Deblurring in the presence of noise: why naive inversion fails⧉

• the trap: with a *known* kernel and *no* noise, deconvolution is exact — divide by the kernel in Fourier, done. Add even a *whisper* of noise and that exact inverse is **useless**. [fig-naive-deconv-noise]

• where it breaks, intuitively: blur is a **low-pass** filter — it *kills* high frequencies (the kernel's frequency response $\hat K$ goes to ~0 there). The inverse filter $1/\hat K$ therefore **divides by almost zero** at exactly those frequencies. The signal there is gone, but the *noise* isn't — so you amplify noise by huge factors. Result: an image of pure speckle, not a sharp photo.

• the linear-algebra twin: same statement as **small singular values** of the blur matrix. Inversion = dividing each component by its singular value; the tiny ones (high frequencies) are where noise explodes. Naive inversion has **no defense** because nothing tells it "this frequency is noise, not signal."

• 💡 **Big lesson (recurrence of L10):** *real measurements have noise, and inversion amplifies it — so naive inversion fails and a prior is not optional.* The forward problem (blur, project, downsample) is stable and loses information; running it backward is **ill-posed** and **noise-amplifying**. You cannot recover what the forward model destroyed — you can only **fill it in with prior knowledge.** This is the spine of the whole chapter (and of super-resolution, denoising, inpainting). (→ see Big lesson **L10** · the prior is not optional; first appears in [[#Super-resolution and image priors]].)

• the fix, stated: don't invert blindly — **trade off** fidelity to the measurement against plausibility of the answer. Where the kernel is strong (low freq), trust the data; where it's weak (high freq, noise-dominated), back off and rely on what we *expect* images to look like. That trade-off, optimal under a Gaussian-noise / known-spectrum assumption, **is the Wiener filter** (next).

• encoding note: deconvolution assumes the blur is **linear in scene radiance**, so do it in **linear light** (un-gamma first), per L2 — gamma/log distorts the additive convolution model. State this assumption up front.

▸8.3.2 The Wiener filter — the regularized, noise-aware inverse⧉

• one-line idea: take the naive inverse filter and **attenuate it wherever noise would dominate**, frequency by frequency. The amount of attenuation is governed by the **signal-to-noise ratio** at each frequency.

• the formula (per frequency, in Fourier): $\hat X = \dfrac{\overline{K}\,Y}{|K|^2 + \mathrm{SNR}^{-1}}$. Read it: the $\overline K/|K|^2$ part *is* the inverse filter; the $+\,\mathrm{SNR}^{-1}$ in the denominator is the **regularizer** that stops the blow-up. Equivalently the inverse filter times a gain $|K|^2/(|K|^2 + S_n/S_x)$ (noise spectrum $S_n$, signal spectrum $S_x$). [fig-wiener-tradeoff]

• limit checks (hand-verifiable): **high SNR** ($\mathrm{SNR}^{-1}\!\to 0$) → recovers the exact inverse filter $1/\hat K$ (no noise, invert fully). **Where the kernel vanishes** ($|K|\to 0$) → the $+\,\mathrm{SNR}^{-1}$ keeps the denominator finite, so the gain → 0 instead of ∞: those lost frequencies are **left alone**, not amplified. Exactly the desired behavior.

• where it comes from (just the intuition, no derivation): the **MMSE** linear estimator under Gaussian noise — minimize expected squared error between estimate and truth → this filter. The "$\mathrm{SNR}^{-1}$" is literally a **prior on the image's power spectrum** ($1/f$-ish for natural images). So even the classic Wiener filter is **already a prior-driven method** — that's the bridge to everything that follows.

• limitation → the sequel: Wiener assumes a **known kernel** and a **stationary Gaussian** prior (a power-spectrum prior). Real images aren't Gaussian — they have **edges**, and the right prior is **sparse gradients**, not a power spectrum. Swapping the Gaussian prior for a sparse-gradient one gives the modern non-blind deconvolution (Levin 2007; Krishnan & Fergus 2009) — and when even the *kernel* is unknown, we get blind deblurring (next).

• forward-ref: the general version "any denoiser as the prior" is **Plug-and-Play / RED** in [[Super-resolution and image priors]]; the *learned* prior is [[Machine learning]].

• the new difficulty: now we **don't know the blur kernel** $K$ either (camera shake gives an unknown, photo-specific PSF). We must recover **both** $X$ and $K$ from one blurry $Y$ — twice as many unknowns as equations.

• the chicken-and-egg: knowing the sharp image makes the kernel easy (divide), and knowing the kernel makes the image easy (non-blind deconv) — but we have **neither**. [fig-blind-chicken-egg]

• why it's genuinely ill-posed (the trap to call out): the blurry image **explains itself** — $Y = (\text{sharp }X) * (\text{real blur }K)$ fits, but so does $Y = (\text{the blurry image}) * (\delta)$, i.e. "no blur at all." A naive joint optimizer happily returns the **no-op / delta-kernel** solution because it fits the data perfectly. Data-fit alone is hopeless.

• what breaks the tie — **natural-image priors:** real sharp photos have **sparse, heavy-tailed gradients** (mostly flat regions, a few strong edges); **blurring spreads** those strong edges into many weak gradients, *raising* the prior's cost. So the prior **prefers the sharp explanation** — the delta-kernel "no-blur" answer is penalized because the *blurry* image has the wrong (too-dense, too-Gaussian) gradient statistics. [fig-natural-image-gradient-prior]

• the objective: $(\hat X,\hat K) = \arg\min_{X,K}\ \lVert K*X - Y\rVert^2 + \lambda\,\rho(\nabla X) + \mu\,\psi(K)$, with $\rho$ a **sparse** penalty ($\lVert\nabla X\rVert_\alpha$, $\alpha<1$, a hyper-Laplacian) and $\psi$ a non-negativity / sparsity / energy constraint on the kernel. Non-convex in $(X,K)$ jointly.

• two ways people solve it:

• **alternating minimization** (the obvious approach): fix $X$, solve for $K$; fix $K$, do non-blind deconv for $X$; repeat — usually **coarse-to-fine** (estimate the kernel on a downsampled image first, where it's small, then refine). Simple but can collapse to the trivial solution if the prior isn't handled carefully.

• **pseudocode block** (in the draft): the coarse-to-fine alternation — `K ← δ; for each pyramid level: repeat { X ← deconv given K; K ← fit given X }; final non-blind deconv`.

• **the Fergus 2006 insight / Levin 2009 lesson:** don't MAP-estimate $X$ and $K$ *jointly* — that **biases toward the blurry (no-blur) solution**. Instead **marginalize over the image** and estimate the **kernel alone** (variational Bayes), *then* do a single non-blind deconvolution. Levin et al. 2009's evaluation is the canonical "here's why naive MAP fails and what actually works" result — worth stating as the takeaway.

• attribution to state plainly: **Fergus et al. 2006** — "Removing Camera Shake from a Single Photograph" — put blind deblurring on the map for photographs using a natural-image gradient prior + variational inference; **Levin et al. 2007/2009** clarified the *role of the prior* and *why the estimator design matters*.

• forward-ref: the learned successors (CNN/diffusion deblurring) in [[Deep learning]] — same prior, now learned from data instead of hand-modeled.

▸8.3.4 A more realistic blur model: spatially-varying (camera-shake) blur⧉

• the assumption we've been making: a **single** kernel convolved over the whole image (**shift-invariant** blur). Real **camera shake** violates it.

• why shake is *not* a convolution: hand shake is mostly **rotation** of the camera, not translation. Rotation moves different parts of the frame by **different amounts** (and in different directions) — corners smear more than the center, and the smear *curves*. There is **no single PSF** that, convolved everywhere, reproduces it. [fig-camera-shake-nonuniform]

• the better model (Whyte 2011/2014): the blurry image is a **weighted sum of the sharp image under many camera poses** — integrate over the camera's **rotational trajectory** during the exposure. Equivalently a **spatially-varying PSF** that is consistent because it comes from *one* underlying 3-D camera motion (far fewer unknowns than an independent kernel per pixel).

• consequence: deblurring becomes inversion of this **global, non-stationary** operator — still linear in $X$, still the data-fit + sparse-prior recipe, but the forward operator is the pose-integral, not a convolution (so **no single-FFT trick**; back to the iterative solvers of the previous chapter).

• also note (Whyte 2014): real shots have **saturated highlights**, which break the linear model — handling clipping is part of "realistic" deblurring.

• forward-ref (the optimistic flip side): instead of *fighting* an accidental, hard-to-invert blur, **engineer** the PSF so it's easy to invert and even encodes depth — **coded aperture / motion-invariant capture / wavefront coding** (Levin 2007; Dowski & Cathey 1995) in the coded-imaging / PSF-engineering part. "Make the forward operator nice" beats "invert a nasty one."

▸8.3.5 Decolorization (Color2Gray) as gradient-matching optimization⧉

• **placement note**: color2gray and the colorization framing below are *not* deblurring, but they are the same **data-fit + prior** inverse-problem recipe this chapter teaches — kept here as the general "advanced inverse problem" instances (their *solvers* live in [[Poisson image editing]] / [[Edge-preserving techniques]]).

• the problem: convert color → grayscale **preserving contrast that luminance alone destroys.** Two colors can be **isoluminant** (different hues, *same* luminance) — naive "take the luminance / desaturate" maps them to the **identical gray**, erasing a contrast a viewer clearly saw. [fig-color2gray]

• the framing (Gooch et al. 2005): pose it as an **optimization** — find the grayscale image $g$ whose **gradients best match the perceived color differences** between neighboring pixels. Where colors differ (even isoluminantly), force a gray-level difference; carry the **sign** so ordering is consistent.

• objective (the gradient-domain shape, ties back to [[Poisson image editing]]): $\min_g \sum \lVert \nabla g - \theta\rVert^2$, where $\theta$ is a target gradient derived from **CIELAB** luminance + chrominance differences (a signed color-contrast). Solve the resulting **Poisson-type linear system** — *the same machinery as gradient-domain editing.*

• why it sits here: it is a **least-squares reconstruction from a target gradient field** — the inverse-problem / gradient-domain pattern again, with the "prior" being *match perceived contrast*. Short section; mainly a demonstration that the framework generalizes beyond restoration.

• encoding note: contrasts are computed in a **perceptually uniform** space (CIELAB), not linear RGB — state the space explicitly (L1/L2: work where differences mean what you want).

▸8.3.6 Colorization as an optimization (the inverse-problem framing)⧉

• decision (resolve "here or in edge-preserving?"): **introduce the framing here; develop the solver in [[Edge-preserving techniques]]** (*Edge-preserving optimization (colorization)*). Here it's *another instance of the chapter's recipe*; there it's the *worked edge-aware affinity solve.* Cross-ref both ways.

• the problem: a **grayscale photo plus a few color scribbles** → propagate color over the whole image, **stopping at edges** (don't bleed color across object boundaries). [fig-colorization-scribbles]

• the inverse-problem framing (Levin, Lischinski & Weiss 2004): the **unknown** is the per-pixel chrominance $U$; the **constraints** are the scribbles; the **prior** is *neighboring pixels with similar intensity should get similar color* — an **affinity** between pixels keyed on intensity difference. Minimize color variation subject to the scribbles: $\min_U \sum_x \big(U(x) - \sum_{x'\in N(x)} w_{xx'}\,U(x')\big)^2$, with weights $w_{xx'}$ large when intensities match. A **sparse linear least-squares** solve (the same CG/multigrid machinery as [[Efficient solvers]] and Poisson editing).

• 💡 **Big lesson (callback):** the weight $w_{xx'}$ = "how much these two pixels belong together, judged by their intensity difference" *is* an **affinity** — exactly **edge-preserving = affinity** (→ see Big lesson L4, first placed in the bilateral filter; the colorization solver is the optimization-form of that idea). This is the hinge to [[Edge-preserving techniques]], where the affinity/edge-aware machinery is developed in full.

• forward-ref: the **learned** alternative — predict plausible colors from a gray image with a CNN, no scribbles (Zhang, Isola & Efros 2016) — in [[Deep learning]] / colorization.

▸8.4 Dehazing⧉

Haze isn't a deblurring problem — light is **scattered**, not convolved — but it's the same *idea* as the previous chapter in a new costume: an image-formation model that is **under-determined**, made solvable by **one well-chosen statistical prior**.

equations

**haze image model** $Y(x)=t(x)\,X(x) + A\,(1-t(x))$ (transmission $t=e^{-\beta d}$, airlight $A$, depth $d$, scattering coeff $\beta$)

**dark channel** $X^{\text{dark}}(x)=\min_c \min_{x'\in\Omega(x)} X_c(x')$

▸8.4.1 Dehazing as a prior-driven inverse problem⧉

• the shape: an under-determined image-formation model + a statistical prior that makes it solvable — the **same thesis** as blind deblurring, even though the physics (atmospheric scattering, not blur) is different.

• the (physical) forward model: $Y(x) = t(x)\,X(x) + A\,(1-t(x))$ — each pixel is the clean radiance $X$ attenuated by **transmission** $t(x)$ (how much light survives the haze; $t = e^{-\beta d}$, $d$ = scene depth), **blended with airlight** $A$ (the bright, scattered atmospheric light). Haze = a depth-dependent fade toward $A$.

• why it's under-determined: per pixel we know $Y$ (3 numbers) but want $X$ (3), $t$ (1), and the global $A$ — more unknowns than measurements. Need a prior. [fig-dark-channel]

• the **dark-channel prior** (He, Sun & Tang 2009): in a haze-free outdoor patch, *some* color channel is **nearly zero** somewhere (shadows, colorful/dark surfaces) — the patch's $\min_c\min_{\text{window}} X_c \approx 0$. Haze *lifts* this dark channel (adds airlight everywhere), so **how bright the dark channel is ≈ how much haze ≈ $1-t$.** That single observation pins down the transmission map.

• the pipeline (state as steps): estimate $A$ from the haziest (brightest-dark-channel) pixels → estimate a coarse $t(x)$ from the dark channel → **refine the transmission's edges** so it follows real object boundaries (a matting / guided-filter / edge-aware solve — He's original used soft matting, later the **guided filter**, [[Edge-preserving techniques]]) → invert the model for $X$. [fig-dark-channel]

• the takeaway to land: *the prior is doing all the work.* The model is unsolvable as stated; one well-chosen statistical regularity (dark channel) turns it into a clean recovery — the **same moral as Wiener and blind deblurring**, just a different prior.

• xref: the refinement step ties into [[Edge-preserving techniques]] (guided filtering); the learned successors live in [[Deep learning]].

▸8.4.2 The wider lineage: scattering physics, learned dehazing, and beyond⧉

• **before the dark channel — physics from scattering.** **Cozman & Krotkov** ([@cozman-krotkov-1997]) turned the haze model around to *recover depth from scattering* — given the atmosphere, the depth-dependent fade *is* a depth cue (haze carries information, it doesn't only destroy it). **Narasimhan & Nayar**'s vision-and-weather work ([@narasimhan-nayar-2002]) laid down the systematic physics of imaging through fog, haze, and rain and how to invert it (classically from two images or a known airlight) — the physical foundation the single-image priors later shortcut.

• **learned dehazing.** Replace the hand-built prior with a network. **Zhang, Sindagi & Patel** ([@zhang-sindagi-patel-2019]) train a single network that **jointly estimates the transmission map and produces the dehazed image** end-to-end, instead of estimating $t$ by a prior and inverting separately — the learned successor to the dark-channel pipeline (forward-ref [[Deep learning]]). (Also flagged: **SDDE** — *queue exact ref / author to confirm*.)

• **out of the air, into the water.** **Sea-thru** ([@akkaynak-treibitz-2019]) corrects **underwater** images with the *right* physics: unlike atmospheric haze, water attenuates each wavelength differently and the veiling light varies with range, so a single global airlight is simply wrong — Sea-thru estimates wideband, range-dependent attenuation (from a known depth/range map) and removes the water's color cast where naive dehazing fails.

• **haze as a task, not just a nuisance.** **Sakaridis, Dai & Van Gool** ([@sakaridis-etal-2018]) synthesize fog onto clear images to build training data for **semantic scene understanding in fog**, so dehazing becomes a means to robust *perception* (driving, robotics) — closing back to the depth-from-scattering view that fog *carries* scene information.

▸8.4.3 Differentiable image pipelines and algorithm optimization (Halide)⧉

• the meta-point that closes the inverse-problem run: every method across these two chapters is *an optimization with a prior* — but who **optimizes the pipeline itself**? When a restoration/ISP pipeline has knobs (filter weights, kernel parameters, the whole sequence of stages), you want to **tune them by gradient descent against a quality loss** — which means the pipeline must be **differentiable** and **fast**.

• **Halide** (Ragan-Kelley et al. 2013): a language that **separates *what* an image pipeline computes (the algorithm) from *how* it runs (the schedule)** — tiling, vectorization, parallelism, fusion. One algorithm, many schedules → portable high performance without rewriting the math. This is *the* tool the rest of computational photography is implemented in.

• **differentiable Halide** (gradient Halide, Li et al. 2018): add **automatic differentiation** through the pipeline → you can **backprop a loss to the pipeline's parameters** (or use the pipeline as a layer inside a network). Now "tune the deblurring/dehazing/ISP parameters" or "learn the filter" is just gradient descent through a fast, scheduled pipeline.

• why it closes the chapter: it's the **engine** under the whole "data-fit + prior, minimized" program — and the bridge to [[Machine learning]] (differentiable pipelines = the boundary where classical optimization meets learned components; cf. FlexISP, Heide 2014, as a differentiable-ISP example).

• scope note: keep this **conceptual** (separation of algorithm/schedule; differentiability as an enabler), not a Halide tutorial — point to the language for hands-on use.

▸8.5 Style transfer⧉

`fig-style-transfer-zoo` · the zoo of style transfer — patch/analogies, texture statistics (Gram), neural optimisation, feed-forward, image-to-image GANs — same skeleton, swap the prior 🟨

`fig-point-op-levels` · levels on a real (flat/hazy) photo: bunched-up luma histogram stretched out to fill [0,1] by setting a black point and a white point — input + after + transfer curve with the clip-and-stretch anchors and the two histograms (Point operations → Black point, white point, and levels, BASIC) ✅

`fig-neural-style-content-style` · neural style: a content loss (deep feature match) balanced against a style loss (Gram-matrix match), summed into one objective 🟨

The thread of this chapter is one sentence: **match the *look* of a reference onto a target, keeping the target's content.** It runs from **classical** methods — match patches or low-order statistics by hand — to **learned** ones — match deep-feature statistics, or learn a whole image-to-image mapping. Several of these pieces appear as *models* in [[Deep learning]]; here they're gathered under the single idea of **style transfer**.

equations

(light) **neural style** objective $\min_{\hat I}\ \alpha\,\lVert\phi_\ell(\hat I)-\phi_\ell(I_{\text{content}})\rVert^2 + \beta\sum_\ell \lVert G_\ell(\hat I)-G_\ell(I_{\text{style}})\rVert^2$, with **Gram matrix** $G_\ell = \phi_\ell\phi_\ell^\top$ (style = feature correlations)

feed-forward variant trains a net to minimize the same **perceptual loss** in one pass

▸8.5.1 Classical style transfer: patches and statistics⧉

• **Image Analogies** (Hertzmann et al. 2001): give an example pair **A → A′** (an image and its filtered/stylized version) and a new target **B**; synthesize **B′** so that *"A : A′ :: B : B′"* — learned **by patch matching** (for each B patch, find the best-matching A patch and copy its A′ counterpart). A single example teaches a filter/style — texture-synthesis machinery turned into a transfer, the **pre-deep ancestor** of neural style (cross-ref [[Deep learning]] and [[#Inpainting, texture synthesis]]).

• **color (palette) transfer** (Reinhard et al. 2001): the leanest classical method — make a target photo take on a reference's **palette/mood** by matching just the **mean and standard deviation** of each color channel. Convert both images to a **decorrelated color space** (l$\alpha\beta$ / Ruderman — axes statistically near-independent for natural scenes, so channels can be retargeted separately without cross-talk), then per channel **shift** to the reference's mean and **scale** by the ratio of standard deviations. Only **first- and second-order statistics** — no patches, no histograms — yet enough to swap a sunset's warmth onto a daylight shot. The classical, **pre-deep ancestor of neural style / color grading**: where Gatys later matches *deep-feature* second-order statistics (Gram matrices), Reinhard matches *raw-pixel* low-order statistics in a smart color space. (Forward-ref the learned / style-based color-grading successors → [[Deep learning]] and *Style transfer as image-to-image translation* below; the decorrelated transfer space ties to [[Human (and animal) vision and color]].)

• **two-scale tone / style transfer** (Bae, Paris & Durand 2006): transfer the **pictorial look** of a model photograph onto a target by **separating scales** — an edge-preserving **base / detail** split — and transferring the **large-scale tonal distribution** (a histogram-match of the base) and the **detail/texture** amount independently. Classic example: give a snapshot the tonal drama of an Ansel Adams print. Ties to the base–detail decomposition of [[Edge-preserving techniques]] and to histogram matching (BASIC).

• the classical takeaway: style here is **low-order statistics + patches** — global mean/variance (Reinhard), histograms, local texture, exemplar patches — matched by hand-designed pipelines. The arc rises in statistical order: Reinhard's first/second moments → histograms/patches → the deep-feature statistics below, where the learned methods replace "hand-designed statistic" with "deep-feature statistic."

▸8.5.2 Neural style and feed-forward stylization⧉

• **neural style** (Gatys, Ecker & Bethge 2016): the breakthrough that **content = deep CNN features** (what is where) and **style = the *correlations* among those features**, captured by the **Gram matrix** $G_\ell=\phi_\ell\phi_\ell^\top$ at each layer. **Optimize an image** to simultaneously match the target's content features and the reference's style Gram matrices: $\min_{\hat I}\ \alpha\lVert\phi(\hat I)-\phi(I_c)\rVert^2 + \beta\sum_\ell\lVert G_\ell(\hat I)-G_\ell(I_s)\rVert^2$. Slow (an optimization per image), but it reframed style as *deep-feature statistics*. [fig-neural-style-content-style]

• **feed-forward / perceptual-loss style transfer** (Johnson, Alahi & Fei-Fei 2016): **train a network once** to produce the stylized output in a **single forward pass**, by minimizing the *same* content+style **perceptual loss** (deep-feature distance) Gatys optimized iteratively. Real-time stylization; the same **perceptual / feature loss** reappears as a training loss for super-res and as LPIPS (cross-ref [[Deep learning]]).

• **Markovian / patch-based neural style** (Li & Wand 2016): combine **MRFs with CNNs** — match style on **local neighbourhoods of deep features** (a patch/Markovian style term) rather than global Gram statistics, better preserving local structure. Sits **between Image Analogies (patches, no net) and Gatys (global feature statistics)** — patches, but in deep-feature space.

▸8.5.3 Style transfer as image-to-image translation⧉

• **the GAN view**: when "style" is a whole **domain** (photos↔paintings, day↔night, summer↔winter), pose stylization as **learned image-to-image translation** — a generator maps one domain's look onto another, a discriminator keeps it plausible.

• **Pix2Pix** (Isola et al. 2017): **paired** translation — when you have aligned (input, styled-output) pairs, learn the map directly with a conditional GAN. The generic "learn the map between two image domains" tool.

• **CycleGAN** (Zhu et al. 2017): the **unpaired** version — when no aligned pairs exist (you have *photos* and *Monets*, but not the same scene as both), **cycle-consistency** ($B\to A\to B$ should return $B$) lets you learn the style mapping without pairs. The workhorse for "make my photo look like X" when X is only a *collection*.

• placement: these are surveyed as *models* in [[Deep learning]] (generative image-to-image); here they're the **learned, domain-level** end of the style-transfer spectrum. Forward-ref: text-/exemplar-driven stylization with diffusion ([[Generative AI and diffusion]]) is the current frontier.

• refs: Reinhard et al. 2001 (color transfer between images); Hertzmann et al. 2001; Bae, Paris & Durand 2006; Gatys et al. 2016; Johnson, Alahi & Fei-Fei 2016; Li & Wand 2016; Isola et al. 2017 (Pix2Pix); Zhu et al. 2017 (CycleGAN).

▸8.6 Inpainting, texture synthesis⧉

`fig-inpaint-prior-spectrum` · the hole-filling prior spectrum: PDE/diffusion (smoothness) → exemplar/texture (self-similarity) → learned/generative (semantics) 🟨

`fig-pde-isophote-diffusion` · PDE inpainting continues isophotes (level lines) smoothly into the hole — diffusion of structure inward (Bertalmío 2000) 🟨

`fig-thin-lens-vs-real` · side-by-side "convenient lie vs reality": ideal thin lens (single plane, all parallel rays → one focus) vs a real thick multi-surface lens where outer rays focus short of the paraxial focus (spherical aberration → blur, no single point) ✅

`fig-efros-leung-growth` · Efros–Leung non-parametric synthesis: grow one pixel at a time by matching its known neighbourhood against the sample 🟨

`fig-quilting-seam` · image quilting: lay overlapping patches and cut along the minimum-error boundary so seams disappear (Efros–Freeman 2001) 🟨

⬜ figure not yet created

`fig-object-removal` (Criminisi: a person erased, the hole filled by confidence-/edge-priority patch order so structures continue across it) fig-object-removal

⬜ figure not yet created

`fig-highlight-recovery` (a clipped specular spot → inpainted plausible surface color) fig-highlight-recovery

This chapter is the most literal instance of 💡 L10. A hole has **no measurement at all** — the only thing that can fill it is a model of what images look like. So inpainting is best read as a **ladder of priors**, from the weakest ("be smooth") to the strongest (a learned generative model), and the whole chapter is *which prior, and where it copies-vs-invents*. The recurring honesty: filled pixels are **plausible, never measured** — verifiable when the prior copies genuine structure from elsewhere, an outright **guess** when it hallucinates.

💡 **Big lesson (recurrence of L10 · the prior is not optional):** filling a hole is the purest case — there is *zero* data inside it, so **100% of the answer comes from the prior.** Smoothness, self-similarity, a photo database, a learned net — each is a different prior, and the result is exactly as good (and as honest) as the prior is. (→ see Big lesson **L10**, first placed in [[#Super-resolution and image priors]]; here taken to its extreme — no data, all prior.)

▸8.6.1 Inpainting as filling unmeasured pixels — the spectrum of priors⧉

• **the problem**: given an image with a **masked region** (a scratch, a date-stamp, a removed power line, a missing JPEG block), produce a complete image where the hole is **plausible and seamless**. There is **no data in the hole** — the entire fill is the prior's invention. [`fig-inpaint-prior-spectrum`]

• **the organizing axis — weak → strong prior** (this is the chapter):

• **smoothness / PDE** (weakest): *"continue the surrounding color/edges smoothly."* Great for **thin** gaps; **blurs** anything large because smoothness has no notion of *texture*.

• **self-similarity / exemplar** (the workhorse): *"this image repeats itself — copy patches from elsewhere in the same image."* Texture synthesis, clone/heal, Criminisi object removal. Copies **real** structure (reconstruction-leaning), but only structure the image already contains.

• **database** (Hays & Efros): *"the answer is in some other photo"* — retrieve a matching scene from millions and graft a region in. The prior is *the rest of the internet*.

• **learned generative** (strongest): a net that has internalized $p(\text{image})$ **generates** a completion (context encoders → gated conv → diffusion inpainting). Most flexible, most prone to **hallucination**.

• **intuition-first framing**: not "Photoshop magic" — *"the algorithm answers a question it has no data for, by assuming the missing region looks like (a) its neighbours, (b) some other part of this image, (c) some other image, or (d) a typical image."* Those four assumptions are the four families.

• 💡 **the throughline**: this is L10 with the dial at max — see the box. Everything below is *which* prior.

▸8.6.2 PDE / diffusion-based inpainting⧉

• **the idea** (Bertalmío et al. 2000): mimic what a **museum restorer** does to a cracked painting — **continue the lines** arriving at the hole's boundary smoothly into it. Formally, **propagate the image's smoothness (its Laplacian) *along* the level-lines (isophotes)** so edges arriving at the hole are extended across it rather than blurred over. [`fig-pde-isophote-diffusion`]

• **why "along isophotes" matters**: naive diffusion (solve Laplace's equation in the hole — a membrane / harmonic fill) smears *across* edges → a soft blur. Steering the diffusion **along** the level-lines keeps an edge continuing as an edge. (Telea 2004 is a fast **marching-method** approximation.)

• **good for / hard limit**: **thin** structures — scratches, text overlays, fine cracks, scanner dust. For a **large** hole there is nothing to propagate but flatness → the fill goes **blurry and textureless** — the motivation for the exemplar/texture methods next.

• **placement note**: this is the **gradient-domain / Poisson** family in disguise — cross-ref [[Poisson image editing]] (💡 L9): seamless *cloning* edits the gradient field too; PDE inpainting edits it where there's *no source at all*.

▸8.6.3 Texture synthesis (Efros–Leung; Efros–Freeman quilting)⧉

• **the reframing**: a large hole in a *textured* region (grass, gravel, a sweater) can't be diffused — but it **can be synthesized**, because texture is **self-similar**: patches recur.

#### Non-parametric texture synthesis (Efros–Leung)

• **non-parametric texture synthesis** (Efros & Leung 1999): synthesize **one pixel at a time** — for the next unfilled pixel $p$, take its **already-known neighbourhood** $N(p)$, search the source for **windows that match it** (Gaussian-weighted SSD over the known part), pick a best match, and **copy its centre pixel**. No texture *model* is fit — the sample image **is** the model (a Markov-random-field assumption). [`fig-efros-leung-growth`]

• **the one knob — window size**: too **small** → loses large-scale structure (noise); too **large** → copies the source **verbatim** and is slow. Must straddle the texture's characteristic scale. Honest cost: **slow** (a full search per pixel) → the motivation for PatchMatch ([[#Patch match]]).

• **pseudocode block** (in the draft): the frontier loop — `while unfilled: pick the most-surrounded pixel p; SSD-match N(p) against source windows; copy a near-best match's centre into p`.

#### Image quilting (Efros–Freeman)

• **image quilting** (Efros & Freeman 2001): synthesize a **whole patch at a time** — lay down overlapping blocks copied from the source, and where two overlap, **cut along the minimum-error boundary** through the overlap (a tiny dynamic-program / 1-D graph cut) so the seam is invisible. Faster than pixel-at-a-time, and the seam-cut prefigures **GraphCut textures** (Kwatra 2003) and 💡 **L12**. [`fig-quilting-seam`]

• **parametric texture** (the road not taken): **Heeger & Bergen 1995** / **Portilla & Simoncelli 2000** match **statistics** of **steerable-pyramid** subband responses — the direct ancestor of **neural style's Gram-matrix** idea (cross-ref [[Style transfer]], [[Linear pyramids and wavelets]]).

• 💡 **L12 connection**: the patch + min-error-seam view is **discrete optimization on a graph** (match cost + seam cost) — the energy-is-the-design lesson (→ Big lesson **L12**).

▸8.6.4 Exemplar inpainting: clone, healing brush, and object removal (Criminisi)⧉

• **clone brush**: the manual exemplar prior — **copy pixels from a chosen source region**. Pure copy → **visible seams** where lighting differs. The **healing brush** fixes that by transferring **texture/detail** but **matching the destination's color/illumination** — paste the source's *gradients* and re-level (the **gradient-domain / Poisson** trick — cross-ref [[Poisson image editing]], 💡 L9).

• **exemplar-based inpainting / object removal** (Criminisi, Pérez & Toyama 2004): automate the clone brush with texture-synthesis-style **patch copying** plus a crucial **fill order** — a **priority** $P(p)=C(p)\,D(p)$ (a **confidence** term × a **data/edge** term) fills **high-priority patches first**, so strong linear structures (a wall edge, a horizon) continue across the hole *before* flat regions. [`fig-object-removal`]

• **scaling → next sections**: the search is slow and limited to *this* image; **PatchMatch** ([[#Patch match]]) makes it interactive; a **database** (Hays & Efros) or a **learned net** lifts the "only this image" limit.

▸8.6.5 Data-driven scene completion (Hays & Efros)⧉

• **the provocation**: if exemplar methods are limited to *this image's* content, **use a different image.** Hays & Efros 2007 fill a hole by **retrieving, from millions of photos, scenes that match the hole's surround**, then **seamlessly graft** a region from a good match (alignment + a graph-cut seam + Poisson blend). [`fig-inpaint-prior-spectrum`, database panel]

• **the lesson — "the internet is your prior"**: with a big enough database, *some* photo already contains a plausible completion; you **retrieve** rather than model. The non-parametric prior taken to web scale.

• **honest caveats**: it grafts a **semantically different** scene — coherent and photoreal, but not *this* scene's truth (a vivid L10 "plausible ≠ measured"); needs a huge database and good scene matching.

• **why it's the hinge to deep learning**: "retrieve the right completion from millions of images" is what a **generative net** does *implicitly* — it has compressed those millions into weights and can **synthesize** the completion.

▸8.6.6 Deep inpainting (context encoders → partial/gated conv → diffusion)⧉

• **the move** (💡 L8, [[Deep learning]]): replace the hand-built prior with a **learned** one — a net trained to **regenerate masked regions**.

• **context encoders** (Pathak et al. 2016): the first CNN inpainter (encoder–decoder, reconstruction + adversarial loss) — captures **semantics** but early versions were blurry.

• **partial / gated convolutions** (Liu et al. 2018; Yu et al. 2019): fix the core nuisance — a vanilla conv near a hole mixes in **garbage masked pixels**. Partial conv renormalizes by the *valid* fraction; gated conv *learns* a soft mask gate → robust to **free-form** holes, the practical "remove this object" workhorse.

• **diffusion inpainting** (Lugmayr et al. 2022, RePaint): the strongest prior — treat the hole as a **measurement constraint** and, at every denoising step, **pin the known region** to the (renoised) real pixels: $x_{t-1}\!\leftarrow\!\text{mask}\odot q(x^{\text{known}}_{t-1}) + (1-\text{mask})\odot \mathcal{D}_\theta(x_t)$. This is **exactly PnP/RED posterior sampling** (💡 L11) with operator = "keep the known pixels" — the continuation of [[#Super-resolution and image priors]].

• **forward reference**: the architectures live in [[Deep learning]] / [[Generative AI and diffusion]]; here they're the **strong-prior end**. Honest note: most flexible **and** most prone to **hallucinate** — an inpainted region is **not evidence** (L10 ethics).

▸8.6.7 Highlight / specular recovery⧉

• **the framing — clipping is a hole**: a **blown highlight** is a region where the sensor **saturated** — the true radiance above the clip is *unmeasured*. So **highlight recovery is inpainting a clipped region**: fill the saturated pixels using the **unclipped channels** and surrounding texture as the prior. [`fig-highlight-recovery`]

• **two angles**: (a) **specular vs diffuse separation** — recover the diffuse layer under the specular add-on (cross-ref [[#Illumination related effects in a single image]]); (b) **per-channel clipping recovery** — when only one channel clipped, the others carry the hue, so extrapolate the missing channel (a small, well-posed inpainting).

• **why it lives here**: same L10 idea — *recover what the measurement destroyed* (by clipping rather than masking); the single-image cousin of **HDR merge** (capture the highlight with a shorter exposure instead of guessing it — forward-ref [[Multiple exposure imaging]]).

▸8.6.8 Epitomes (compact patch models)⧉

• **the idea** (Jojic, Frey & Kannan 2003): summarize an image (or video) by a **small "epitome"** — a miniature holding the **patch statistics** of the whole, learned so that every original patch has a close match somewhere inside it. A **generative patch model**: the big image is explained as a quilt of epitome patches.

• **uses**: denoising, inpainting, super-resolution, and texture transfer all fall out of "find each patch's place in the epitome, read back a clean version" — the **patch-NN** idea of [[#Patch match]] with a *learned compact dictionary* instead of the image itself.

• **place**: the pre-deep cousin of today's learned patch / latent priors (**L8 / L11**); a bridge from exemplar texture synthesis to modern generative models.

▸8.7 Patch match⧉

`fig-nnf-field` · the nearest-neighbour field: every patch points to its best match elsewhere — visualised as a colour-coded offset map 🟨

`fig-patchmatch-three-steps` · PatchMatch's three moves — random initialisation → propagation (good offsets spread to neighbours) → random search (refine locally) 🟨

⬜ figure not yet created

`fig-patchmatch-apps` (one image → hole-filled, retargeted (wider, no distortion), reshuffled (an object dragged, surround re-synthesized)) fig-patchmatch-apps

`fig-shiftmap-labeling` · Shift-Map editing as a graph-cut labelling: each output pixel is assigned a shift, optimised for data + smoothness 🟨

The previous chapter's exemplar methods (Efros–Leung, Criminisi) all live or die on one operation: *for each patch, find its most similar patch elsewhere.* Done naively that's a full search per patch — far too slow to be interactive, and the real reason texture synthesis felt like a batch job. This chapter is the **two ways to make that fast or optimal**: a brilliant **randomized heuristic** (PatchMatch) that gets a near-perfect answer in milliseconds, and a **global graph-cut** (Shift-Map) that gets the *optimal* answer more slowly. The unifying object is the **nearest-neighbour field**; the unifying lesson is 💡 **L12** — *the edit is an optimization over a correspondence, and the energy you pick is the design.*

💡 **Big lesson (recurrence of L12 · image edits as optimization on a graph):** "where does each output pixel come from?" is a **labelling** — assign every output pixel a **shift** into the input. Pick the **energy** — a data term (respect the user's keep/remove/resize constraints) plus a **seam** term (neighbouring pixels should take consistent offsets, so cuts fall on real edges) — and a generic **graph-cut** solver does the rest. PatchMatch is the *fast heuristic* answer to the same labelling; Shift-Map is the *globally-optimal* one. (→ see Big lesson **L12**, first placed in Seam optimization; it is the affinity lesson L4 turned into a *cut*.)

▸8.7.1 The nearest-neighbour field, and why exhaustive search is the bottleneck⧉

• **the shared sub-problem**: inpainting, texture synthesis, retargeting, NL-means denoising, Image Analogies — all need, for each output patch, **the most similar source patch**. Collect those answers → a **nearest-neighbour field (NNF)**: an **offset** $f(p)=\Delta$ per pixel s.t. $P_{\text{in}}(p+\Delta)$ best matches $P_{\text{out}}(p)$. [`fig-nnf-field`]

• **the cost wall**: computed exhaustively, each patch is compared to **every** candidate → quadratic; kd-trees help but stay far from interactive on megapixels. *This* is why Efros–Leung and Criminisi were slow, and why "Content-Aware Fill" didn't exist as a click-and-wait tool until the NNF could be computed in real time.

• **the reframing**: compute the NNF **approximately, but fast** — a **randomized heuristic** (PatchMatch) or a **global energy minimization** (Shift-Map). The rest of the chapter is those two.

▸8.7.2 PatchMatch — randomized correspondence [@barnes-etal-2009|Barnes et al. 2009]⧉

• **the two facts it exploits**:

1. **coherence** — adjacent output patches usually map to **adjacent** source patches, so **one good offset is good for its neighbours.**

2. **the birthday-paradox / law of large numbers** — over a whole image **some** patches get a good match by **random guessing**; you just need to **spread** the lucky hits.

• **the algorithm — three moves** [`fig-patchmatch-three-steps`]:

1. **random initialization**: every pixel gets a **random** offset. Mostly bad — a few good.

2. **propagation**: scan (alternating raster order each iteration); for $p$, also **try the offsets its scanned neighbours used** (shifted by one) → good offsets **flood-fill** across coherent regions.

3. **random search**: refine the best by testing random offsets in an **exponentially shrinking window** ($w\cdot\alpha^i$) — large jumps escape local optima, small jumps fine-tune.

• iterate 2–5 times → a clean NNF in **milliseconds**, orders of magnitude faster than exhaustive search.

• **pseudocode block** (in the draft): the init-then-sweep loop — `init f randomly; repeat passes (alternating scan): for each p: propagate from neighbours, then random-search in a shrinking window; return f`.

• **why it converges fast**: propagation makes a single lucky match **infect** an entire coherent region in one scan; random search avoids getting stuck. *Randomness + coherence beats brute force.*

• **generalized PatchMatch** (Barnes et al. 2010): k nearest neighbours + **rotations/scales** — needed for matching transformed content (style transfer, denoising).

• 💡 **why it sits next to Shift-Map**: PatchMatch answers "where does each patch come from?" **heuristically and fast**; Shift-Map answers the **same** question **optimally and slower** (next).

▸8.7.3 Applications: hole filling, retargeting, reshuffle⧉

• **interactive hole filling / Content-Aware Fill**: run PatchMatch between hole and rest, copy best-matching patches in, iterate coarse-to-fine for global coherence — the engine that turned exemplar inpainting into a **real-time** tool. The *objective* it approximately optimizes is **bidirectional similarity** (Simakov et al. 2008 — output should be *coherent* and *complete*). [`fig-patchmatch-apps`, hole panel]

• **image retargeting** (content-aware resize): change aspect ratio while **preserving important content** (vs **seam carving**, Avidan & Shamir 2007, the greedy cousin) — reshape without squashing faces/buildings. [`fig-patchmatch-apps`, retarget panel]

• **reshuffle**: **drag an object** to a new location; re-synthesize the rest so the scene stays coherent. [`fig-patchmatch-apps`, reshuffle panel]

• **and beyond**: **video completion** (Wexler et al. 2007), **Image Melding** (Darabi et al. 2012 — PatchMatch + gradient-domain), denoising self-similarity, the search inside Image Analogies (cross-ref [[Style transfer]]).

• **honest framing (L10 callback)**: all still **copy** real content (reconstruction-leaning within the image); the *generative* successors (diffusion inpainting/outpainting) **invent** content — more powerful, less faithful.

▸8.7.4 Shift-Map image editing [@pritch-etal-2009|Pritch et al. 2009] — editing as graph-cut labelling⧉

• **the reframing**: instead of a heuristic search, decide **per output pixel** a single **shift** $M(p)$ into the input by **minimizing a global energy**, then read the output by copying `input(p + M(p))` — the output is a **rearrangement** of input pixels under a smooth offset field. [`fig-shiftmap-labeling`]

• **the energy** (💡 L12): $E(M)=\sum_p E_{\text{data}}(M(p)) + \lambda\sum_{(p,q)} E_{\text{smooth}}(M(p),M(q))$.

• **data term** encodes the task: retargeting (force output size / drop columns), object removal (forbid shifts into the removed region), reshuffle (pin content to its new place).

• **smoothness / seam term** penalizes **inconsistent neighbouring shifts** by **color + gradient discontinuity** — cheap seams fall on **real edges** (same principle as quilting's min-error cut).

• **the solver — graph cuts / α-expansion** (Boykov et al. 2001), often **coarse-to-fine** (the label set — all shifts — is large) → a **globally (near-)optimal** offset map.

• **PatchMatch vs Shift-Map — the duality**: both produce an **offset field** rearranging input into output. PatchMatch = randomized **heuristic**, fast, approximate, great for dense synthesis (fill, reshuffle). Shift-Map = a **global graph-cut energy**, slower, optimal for its energy, with clean explicit seams — great for retargeting / removal. This is the **L4→L12** progression: an **affinity** (patch similarity / seam cost) used by a **fast local rule** or a **global cut** — *the affinity is the design; the solver is a choice.*

• **cross refs**: seam carving (Avidan & Shamir 2007) is the greedy special case Shift-Map generalizes; the seamless blend after a cut is the **Poisson** solve ([[Poisson image editing]], L9); the learned successor is [[Generative AI and diffusion]] (L11).

▸8.8 Compositing, segmentation and matting⧉

`fig-matting-equation` · the matting equation — one boundary pixel is $\alpha F + (1-\alpha)B$: a foreground colour, a background colour, and a soft coverage $\alpha$ 🟨

`fig-graphcut-segmentation` · binary segmentation as a min cut: pixel nodes, source/sink terminals, t-links (Dp) + n-links (Vpq, small across edges), min cut severs the cheap edges (Seam optimization) ✅

`fig-segmentation-graphcut` · binary segmentation as a min-cut on a grid graph wired to source/sink terminals; the cut is the object boundary (GrabCut inset) 🟨

`fig-segmentation-three-framings` · three framings of segmentation — min-cut, normalized-cut, and least-cost path (intelligent scissors) — on the same boundary 🟨

⬜ figure not yet created

`fig-trimap-to-alpha` (user trimap FG/BG/unknown → recovered continuous $\alpha$ over the unknown band, e.g. hair, after closed-form matting) fig-trimap-to-alpha

`fig-key-vs-measure` · two ways to get the cut-out: *key* it (constrain the background — green screen) vs *measure* the separation (depth / IR / flash) 🟨

⬜ figure not yet created

`fig-harmonization` (a correct-$\alpha$ cutout pasted raw ("stickered on") vs Poisson-blended vs color/lighting-harmonized → "belongs") fig-harmonization

The previous chapters fixed *one* image — denoise, deblur, recover detail. Now we **combine** images: take the subject out of one photo and drop it into another. That splits cleanly into two jobs — **cut** (which pixels are the subject?) and **blend** (lay it down so the join is invisible). The first is **segmentation/matting**; the second is **compositing** (and its seamless cousins, Poisson and pyramid blending, built in EDGES MATTER and only pointed to here). The honest twist is that "which pixels are the subject" has no crisp answer at hair, fur, motion blur and glass — so the right object is not a binary mask but a **soft $\alpha$**, and the model of the world is one line: $C = \alpha F + (1-\alpha)B$.

💡 **Big lesson (recurrence of L12 · image edits as discrete optimization on a graph — the energy is the design):** a *hard* selection is exactly the **graph cut** of [[Seam optimization]] — pixels are nodes, you add a **region data term** (this color looks like foreground / background) and the same **edge-aware smoothness term** (cheap to cut where neighbouring colors differ), and min-cut/max-flow returns the **globally optimal** boundary for two labels. The optimizer is off-the-shelf; *choosing the energy is the whole design.* And the smoothness weight $V_{pq}\propto e^{-\lVert C_p-C_q\rVert^2/2\sigma^2}$ is an **affinity** — so this is also a recurrence of **L4 · edge-preserving = affinity**: the very same "how much do these two pixels belong together" that drove the bilateral filter, now deciding *where not to cut*. (→ see Big lesson **L12**, first placed in [[Seam optimization]], extending **L4**, first placed in [[Bilateral filtering]]; here both recur, as a *cut* and, below, as the *matting Laplacian*.)

equations

**compositing / matting equation** $C = \alpha F + (1-\alpha)B,\ \alpha\in[0,1]$ — *forward* = composite, *inverse* = matting (under-determined: 3 knowns, 7 unknowns per pixel)

**graph-cut segmentation energy** $E(\mathbf{L}) = \sum_p D_p(L_p) + \lambda \sum_{(p,q)\in\mathcal N} V_{pq}(L_p,L_q)$ with **data term** $D_p$ (color fit FG vs BG) and **edge-aware smoothness** $V_{pq}\propto \exp(-\lVert C_p-C_q\rVert^2/2\sigma^2)$ (an **affinity**, **L4**), minimized **exactly** for two labels by min-cut/max-flow (**L12**)

**closed-form matting** $\hat\alpha = \arg\min_\alpha \alpha^\top L\,\alpha$ s.t. trimap, where $L$ is the **matting Laplacian** (a pixel-affinity matrix from a local color-line model) — *the same affinity-Laplacian as colorization*

▸8.8.1 Segmentation: cutting the object out⧉

• **the task**: produce a **binary mask** — these pixels are the subject, those are not. Three classic framings, the *same* lesson (**L12**) with three different optimizers — the energy/affinity you pick is the design, not the solver.

#### Interactive graph-cut segmentation

• **interactive graph-cut segmentation** (Boykov & Jolly 2001): treat the image as a **grid graph**; the user paints a few **FG** and **BG** strokes. Build a color model from the strokes (a **data/region term** $D_p$), add an **edge-aware smoothness term** $V_{pq}\propto e^{-\lVert C_p-C_q\rVert^2/2\sigma^2}$ (cheap to cut where colors differ), and minimize $E=\sum_p D_p+\lambda\sum_{pq}V_{pq}$ by **min-cut/max-flow**. For two labels this is **globally optimal** — *why* graph cut beats greedy region-growing. [`fig-segmentation-graphcut`]

#### GrabCut: less input, same cut

• **GrabCut** (Rother et al. 2004): the user drags a **loose rectangle**; fit **Gaussian-mixture color models** to FG/BG, run the cut, **re-estimate** the models, **iterate**. A rectangle plus a couple of strokes → a clean mask — the interaction model "select subject" tools descend from.

#### Two other framings: normalized cuts and intelligent scissors

• **normalized cuts** (Shi & Malik 2000): the **spectral** framing — cut a pixel-affinity graph into balanced pieces by a **normalized** cut cost, solved by an **eigenvector** of the graph Laplacian (a relaxation) → *unsupervised* grouping. Same affinity, different optimizer. [`fig-segmentation-three-framings`]

• **intelligent scissors / live-wire** (Mortensen & Barrett 1995): the **boundary** framing — a **least-cost path** (dynamic programming, costs low along strong edges) **snaps** to the contour as you trace. Energy on the *boundary*, optimizer = shortest-path. Three energies — region, spectral, boundary — one **L12** lesson. [`fig-segmentation-three-framings`]

#### The bridge to soft selection

• **the bridge to soft selection**: every method here returns a **hard 0/1** mask, which through **hair** looks scissor-cut — jagged, missing wisps, fringed. The fix is to relax the label to a **continuous $\alpha\in[0,1]$** — **matting**, next.

▸8.8.2 The matting equation and the alpha channel⧉

• **the model**: each observed pixel is a **mixture**, $C = \alpha F + (1-\alpha)B,\ \alpha\in[0,1]$ — $\alpha=1$ pure foreground, $\alpha=0$ pure background, and the *interesting* pixels are **in between** (a strand of hair, a motion-blurred edge, an out-of-focus border, frosted glass). The matte $\alpha$ is the **coverage / transparency** channel. [`fig-matting-equation`]

• **compositing = the forward direction** (the "over" operator, Porter & Duff 1984 — *queue ref*): given $F,\alpha$ and a new $B$, the composite is one **per-pixel lerp**. (In practice colors are **premultiplied**, $F'=\alpha F$, so over is $C=F'+(1-\alpha)B$ — cheaper, correct under filtering.) This part is easy.

• **matting = the inverse, brutally ill-posed**: from one photo you observe only $C$ (3 numbers) and must recover $F$ (3), $B$ (3), $\alpha$ (1) — **7 unknowns, 3 equations** per pixel.

• 💡 **callback to L10 (the prior is not optional)**: every method adds information — a user **trimap** + a **prior/affinity** (natural matting), a **known background** (chroma key), or an **extra measurement** (depth/IR/multi-flash). Same moral as super-resolution and dehazing. (→ see Big lesson **L10**, [[#Super-resolution and image priors]].)

▸8.8.3 Trimaps: turning the impossible inverse into a tractable one⧉

• **the trimap** is the standard input: a three-way label — **definitely FG**, **definitely BG**, and a thin **unknown** band $\Omega$ straddling the boundary (the hair, the blur). From a scribble, a dilated mask, or learned.

• **why it's enough**: in the definite regions $\alpha$ is pinned and $F$ or $B$ is *observed*; the algorithm solves $\alpha,F,B$ only in the **narrow unknown band**, with nearby known colors as anchors — "7 unknowns everywhere" → "a few unknowns in a thin strip with strong boundary conditions." [`fig-trimap-to-alpha`]

• **the through-line**: *less user input is the research arc* — hard scribbles (graph cut) → loose rectangle (GrabCut) → trimap (matting) → **nothing** (learned/SAM).

▸8.8.4 Natural-image matting: Bayesian and closed-form⧉

• **Bayesian matting** (Chuang et al. 2001 — *queue ref*): a **per-pixel MAP** — model nearby FG/BG colors as Gaussians, find the $(F,B,\alpha)$ best explaining $C$, sweep outward from known regions. Intuitive but **local** (neighbouring alphas needn't agree).

• **closed-form matting** (Levin et al. 2008 — *queue ref*): **eliminate $F,B$** — assume in a small window FG and BG each lie on a **color line**, so $\alpha$ is locally an **affine function of color**, and solve $\hat\alpha=\arg\min_\alpha \alpha^\top L\,\alpha$ s.t. the trimap, where $L$ is the **matting Laplacian** (a sparse **pixel-affinity** matrix). A **sparse linear solve**. [`fig-trimap-to-alpha`]

• 💡 **the hinge — recurrence of L4**: the matting Laplacian $L$ **is an affinity matrix** — the *same object* as the bilateral weight, the graph-cut smoothness term, and the **colorization** solve ([[#Colorization as an optimization (the inverse-problem framing)]]). "**Propagate $\alpha$ along affinities, anchored by the trimap**" is colorization with $\alpha$ in place of chroma. (→ Big lesson **L4**, [[Bilateral filtering]]; the *cut* form is **L12**.)

• **robust / sampling hybrids** (Wang & Cohen 2007 — *queue ref*): combine color-sampling with the affinity smoothness — the workhorse just before the learned era.

▸8.8.5 Easy matting I — blue/green/chroma keying (constrain the background)⧉

• **the trick**: make the inverse *well-posed by construction* — shoot against a **known, uniform, saturated background** (green/blue screen) → $B$ is **known** (and far from skin/hair), so only $\alpha,F$ remain (Smith & Blinn 1996 — *queue ref*). [`fig-key-vs-measure`, left]

• **green vs blue**: **green** dominates today (two green photosites per Bayer cell → less noise; rarer in skin/clothing); **blue** was the film standard.

• **practice / failure modes**: **spill** (green bounce on edges → de-spill), **uneven screen lighting** (the "known" $B$ isn't constant), **fine detail** (hair still needs a soft $\alpha$ at the boundary). *Why studios obsess over even screen lighting* — it keeps $B$ genuinely known.

• **the moral**: chroma key removes unknowns by **controlling the scene**.

▸8.8.6 Easy matting II — IR, depth and multi-flash matting (measure the separation)⧉

• **the other road**: **add a measurement** that distinguishes FG from BG — **computational illumination** (light/sense so the *capture* hands you the matte).

• **depth matting**: a **depth sensor** (ToF, stereo, dual-pixel, LiDAR) gives "near = foreground" independent of color — phone **portrait mode**, video-call background blur. The depth edge is coarse → used as a **trimap generator**, refined with closed-form/guided matting (same affinity solve; cf. the guided-filter transmission refinement in [[#Dehazing]]).

• **IR / "Disney" matting**: light the background in the **infrared** (a retro-reflective IR background; IR-keyed matting) → a *second IR image* is a built-in green screen with **no visible colored screen** to spill/relight.

• **flash / multi-flash matting** (*queue ref*): a **flash** falls off as $1/d^2$, brightening the *near* foreground far more than the far background → differencing flash/no-flash (or multi-flash occlusion shadows) separates FG/BG by **illumination falloff**. [`fig-key-vs-measure`, right]

• **the moral**: these spend a *sensor or a light* to make the matte well-posed, not *user input* or a *prior* — the computational-illumination thread.

▸8.8.7 Rotoscoping and video matting (coherence over time)⧉

• **rotoscoping**: the hand-craft ancestor — tracing a moving subject **frame by frame**; modern tools interpolate **keyframed splines/masks** — segmentation with a **temporal** axis.

• **the new requirement — temporal coherence**: a per-frame matte that **flickers** looks worse than a slightly-wrong but **stable** one, so video matting adds a **temporal smoothness / propagation** term (track, or optical-flow-warp the matte forward) — the graph/affinity now spans a video volume.

• **applications**: green-screen VFX, live virtual backgrounds, object removal/replacement, the **selection** layer under any video edit (ties to [[Non-photorealistic rendering]]). Learned video matting (Lin et al. 2021 — *queue ref*) does this with a recurrent net.

▸8.8.8 Learned segmentation and matting (the prior, learned)⧉

• 💡 **callback to L8**: deep methods keep the *same skeleton* — select, then recover $\alpha$ — but replace the hand-built color model / affinity / trimap-propagation with a function **fit to data**. The matting equation and "cut then blend" don't change; the prior does. (→ Big lesson **L8**, [[Deep learning]].)

• **SAM — Segment Anything** (Kirillov et al. 2023): a **promptable** model — click/box/"everything" → clean masks for **arbitrary objects**, no per-class training; collapses GrabCut's interaction to a prompt and its color model to a pretrained net. The "**cut anything**" front-end feeding masks/trimaps downstream.

• **deep alpha matting** (Xu et al. 2017 — *queue ref*; trimap-free MODNet / robust matting, Lin et al. 2021 — *queue ref*): a network regresses $\alpha$ (and $F$) directly, learning the hair/edge prior; trimap-free variants push user input to **zero** (the end of the "less input" arc), only as good as their training distribution.

• placement: taught as **models** in [[Deep learning]]; here the **learned-prior** end of the same select-then-recover spectrum. Generative *object insertion* (re-render the blended result incl. shadow) is [[Generative AI and diffusion]] (**L11**).

▸8.8.9 Harmonization: making the composite belong⧉

• **the problem after the cut**: even with a *perfect* $\alpha$, a raw paste looks "**stickered on**" — different light, white balance, contrast, **grain** than the new background. The matte was right; the *appearance* is wrong. [`fig-harmonization`]

• **two layers of fix, both pointing to EDGES MATTER**:

• **seamless blending** — **Poisson / gradient-domain** (Pérez et al. 2003) pastes the foreground's **gradients**, re-deriving the boundary color from the background (**L9**, [[Poisson image editing]]); **multi-band / pyramid blending** (Burt & Adelson 1983) blends each band over a band-appropriate width (developed in [[Linear pyramids and wavelets]]). *Used here, not re-derived.*

• **global appearance harmonization** — match the foreground's **color distribution, illumination direction, contrast and grain** (classical: transfer color statistics; learned: a net predicting a harmonized composite). Fixes lighting *consistency*, which Poisson alone doesn't (Poisson hides the *seam*, not a wrong *shading direction*).

• **the takeaway**: a believable composite needs **all three** — a correct **matte** (the cut), a seamless **blend** (Poisson/pyramid), and consistent **appearance** (harmonization).

▸8.8.10 Where the blends live: Poisson, pyramid, photomontage, and fusion (cross-refs)⧉

• **resolving the stub's placeholders** — these belong to other parts (*cut here, blend there*):

• **Poisson / gradient-domain blending** → [[Poisson image editing]] (EDGES MATTER); used above (**L9**).

• **pyramid / multi-band blending** → [[Linear pyramids and wavelets]] (Burt & Adelson 1983).

• **interactive digital photomontage** (Agarwala et al. 2004) → the canonical *cut-then-blend* system: a **graph-cut seam** picks which source photo per pixel (group photo with everyone smiling, clean-plate removal, extended DoF), then a **Poisson blend** hides seams — exactly this part's two halves wired together (graph cut = **L12**; Poisson = **L9**); placed at the seam/blending crossover in EDGES MATTER, showcased here.

• **multi-image fusion** (HDR merge, exposure fusion, focus stacking, flash/no-flash) → [[Multiple exposure imaging]] (and the [[#Recap: tone mapping]] recap); shares "weighted per-pixel combine in a pyramid" with pyramid blending but combines **exposures/focus**, not **objects**.

• the one-line map: **segmentation/matting = the cut (this part); Poisson/pyramid/photomontage = the blend (EDGES MATTER); HDR/fusion = the multi-capture combine ([[Multiple exposure imaging]]).**

⬜ figure not yet created

`fig-intrinsic-decomp` (one photo → its **reflectance** layer (flat albedo) × **shading** layer (smooth illumination, no texture)) fig-intrinsic-decomp

`fig-retinex-thresholding` · Retinex: threshold the log-gradient (small steps = shading, large steps = reflectance edges), then re-integrate 🟨

⬜ figure not yet created

`fig-multi-illuminant-wb` (a scene lit warm tungsten on one side, cool window light on the other → one global white balance leaves *one* half wrong fig-multi-illuminant-wb

`fig-hdr-merge-antelope` · the HDR merge on a REAL bracket (Antelope Canyon): two exposures (short/long) + their per-pixel reliability weight maps → reliability-weighted radiance average → tone-mapped result; both sun and shadows readable ✅

`fig-reflection-removal-cues` · cues that separate a window reflection from the transmitted scene — ghosting/double-image, polarization, focus difference 🟨

`fig-dichromatic-model` · the dichromatic reflection model: observed colour = a body (diffuse) component + a specular (illuminant-coloured) component — a plane in colour space 🟨

A single photograph mixes things the eye effortlessly keeps apart: it instantly reads a white shirt in shadow as *white-and-shadowed*, not as a grey shirt. The camera can't — it records the **product**, $I = R\cdot S$, illumination times surface (**L1**). This chapter is the project of **un-mixing** that product from one image — separating light from surface, or one layer of light from another. And it is the same story as super-resolution and deblurring: the un-mixing is **under-determined**, so a **prior** does the work (**L10**). The unifying frame is the **intrinsic-image decomposition**; everything else here is a special case of it. (These are the *single-image* methods; their easier *multi-image* cousins — flash/no-flash, a rotating polarizer, a light stage — live in **Computational illumination (Advanced)**.)

💡 **Big lesson (recurrence of L1 + L10):** because the world is **multiplicative** — what you measure is light **×** surface — recovering *either factor* from one image is **ill-posed**: a dark patch can be a dim surface in bright light or a bright surface in dim light, and the pixel can't tell you which. So separating illumination from reflectance (or a reflection from a transmission, a shadow from a stain, a highlight from a color) always needs a **prior** about how surfaces, light, and natural images behave — not a tuning knob but the load-bearing part of the method. (→ see **L1** · multiplicative world, FUNDAMENTALS; → see **L10** · the prior is not optional, [[#Super-resolution and image priors]].)

equations

**image as product** $I(x) = R(x)\,S(x)$ — log makes it additive $\log I = \log R + \log S$ (**L1/L2**)

**Retinex / gradient classification** — assign $\nabla(\log I)$ to $\nabla R$ if $|\nabla\log I| > \tau$ (sharp = reflectance) else to $\nabla S$ (smooth = shading), then **Poisson-integrate** each layer (**L9**)

**layer superposition (reflection)** $I = T + R$ (transmitted + reflected), under-determined → priors

**dichromatic model** $I(x) = m_d(x)\,\Lambda_{\text{body}} + m_s(x)\,\Lambda_{\text{illum}}$ (diffuse in the **surface** color direction + specular in the **illuminant** color direction — two color vectors, separable per pixel)

▸8.9.1 Intrinsic images: the unifying frame ($I = R\cdot S$)⧉

• **the decomposition**: split one image into two **intrinsic layers** — **reflectance** $R$ (the surface's albedo, independent of lighting) and **shading** $S$ (the illumination, independent of surface). Per channel $I = R\cdot S$; in **log** it's additive, $\log I = \log R + \log S$ (**L2** — why illumination work lives in log/linear, not gamma). [`fig-intrinsic-decomp`]

• **why it's the umbrella**: once you can name "this variation is *light*, that is *surface*," every effect here is an edit on one layer — **white balance** corrects the *color* of $S$; **shadow removal** flattens a dark region of $S$; **highlight removal** strips a specular add-on; **reflection removal** un-sums two whole $I$'s.

• **why it's hard**: infinitely many $(R,S)$ multiply to the same $I$ — the multiplicative ill-posedness in the box. You need a prior on *which* variations are reflectance and which are shading.

• **Retinex** (Land & McCann 1971) — the **classic gradient prior**: assume **reflectance changes are sharp** (a painted edge) while **shading changes are smooth** (light falls off gradually). Threshold the (log-)gradient: **big jumps → reflectance**, **gentle ramps → shading**; then **Poisson-integrate** each (**L9**). Crude but captures the whole intuition: *edges are surface, ramps are light.* [`fig-retinex-thresholding`]

• **SIRFS** (Barron & Malik 2015 — *queue ref*): jointly recover **Shape, Illumination, Reflectance** by **MAP** with hand-built priors (reflectance piecewise-constant/sparse, shape smooth, illumination low-frequency). Same data-fit + prior recipe; learned successors (CNNs predicting $R,S$) in [[Deep learning]] (**L8**).

• **the takeaway**: intrinsic decomposition is *the* generalization; the rest is "I care about one layer, and I have a cue that pins it down."

▸8.9.2 Multiple-light / spatially-varying white balance⧉

• **the assumption that breaks**: ordinary white balance estimates **one** illuminant for the whole frame and **divides** it out (**L1**, [[Color technology]]) — right only under a *single* color of light.

• **the failure**: real scenes mix lights — **warm tungsten** vs **cool daylight** through a window; flash + ambient; sun + open-shade blue. One global correction neutralizes *one* region and casts the other: fix the tungsten side and the window goes blue. [`fig-multi-illuminant-wb`]

• **the fix — per-pixel light mixture** (Hsu et al. 2008): model each pixel as a **blend of two illuminant colors**, $I(x) = (\alpha(x)L_1 + (1-\alpha(x))L_2)\cdot R(x)$, solve for the **mixing map** $\alpha(x)$ — a **spatially-varying** white balance — then divide out the *local* light. The two colors come from the user, clipped highlights, or clustering.

• **why it's intrinsic-image**: recovering the *color* of the shading layer as it varies — a chrominance-only intrinsic split; the map $\alpha(x)$ is **smooth** (light varies gently) — the Retinex prior on illuminant *color*.

• **encoding note**: divide in **linear light** (mixing/dividing illuminants is linear in radiance — **L2**).

• **cross-ref**: the *easy* multi-image version (flash/no-flash, the flash's known color disambiguates) is in **Computational illumination (Advanced)**; learned single-image in [[Deep learning]].

▸8.9.3 Reflection removal — pulling apart a transmission and a glass reflection⧉

• **the problem**: shoot through a window → **two images summed**, the **transmitted** scene ($T$) plus a **reflection** ($R$): $I = T + R$ (this one is **additive**, unlike $R\cdot S$; **L2**). Pulling them apart from one $I$ is hopelessly under-determined — pure prior territory (**L10**).

• **the cue zoo** (each a different prior breaking the tie): [`fig-reflection-removal-cues`]

• **defocus / blur**: you focus on the *transmitted* scene → the **reflection is out of focus** (weak, broad gradients). Assign sharp gradients to $T$, soft to $R$ (Fattal 2008 — a Retinex-like split on *focus*).

• **ghosting**: a *thick* pane reflects twice → the reflection appears **doubled/shifted**; that known double-impulse lets you identify and subtract $R$.

• **sparse-gradient prior** (Levin & Weiss 2007 — *queue ref*): both layers have **sparse, heavy-tailed gradients**; the sum has *too many* edges → find the $(T,R)$ split **minimizing total edges** (optionally with user scribbles). Same sparse-gradient prior as blind deblurring.

• **polarization**: light off glass is **partially polarized** (Brewster) → a polarizer attenuates $R$ differently from $T$. *(The genuine multi-image method — rotate a polarizer, two frames — lives in **Computational illumination (Advanced)**; single-image methods try to get this without the extra shot.)*

• **learned** (Zhang et al. 2018 — *queue ref*): a CNN trained on synthesized $T+R$ pairs predicts $T$ with a **perceptual loss** so the layer *looks* like a real photo (**L8**).

• **the takeaway**: "$I = T + R$ has no answer; pick the cue (focus, ghosting, edge-sparsity, polarization, a learned prior)." A cousin of intrinsic images — un-summing two illuminations.

▸8.9.4 Shadow detection and shadow removal⧉

• **what a shadow *is*, intrinsically**: a region where the **shading** $S$ drops (less light) while **reflectance** $R$ is **unchanged** — same paint, just darker. So shadow removal is an **edit on the shading layer only**: flatten the darkening, keep the texture. This is why it lives under intrinsic images.

• **the detection cue (the crux)**: tell a **shadow edge** (shading discontinuity) from a **material edge** (reflectance): a shadow edge changes **brightness but not hue much** (same surface color), is often **soft** (penumbra) and geometrically smooth; shadow regions are lit by **skylight** (bluer) not direct sun (warmer) — a color tell.

• **removal as a gradient-domain edit** (**L9**): label the shadow *boundary* gradients, **zero them**, and **Poisson-reconstruct** — interior texture (small gradients) preserved, the big shadow step removed. Same machinery as gradient-domain compositing. Refine the matte with the **fast bilateral solver** (Barron 2017) so the correction follows the real penumbra (**L4**).

• **the penumbra subtlety**: a hard binary mask leaves a dark halo; you need a **soft shadow matte** and to **scale** (multiply the shadowed radiance back up by the recovered shading ratio — a multiplicative correction in linear light, **L1**).

• **cross-ref**: the easy multi-image version (**flash/no-flash**) is in **Computational illumination (Advanced)**; learned shadow removal in [[Deep learning]].

▸8.9.5 Specular-highlight removal / "fake polarization" (the dichromatic model)⧉

• **the problem**: glossy surfaces (skin, leaves, plastic, wet roads) show **specular highlights** — the light source reflected, carrying the *illuminant's* color, not the surface's; they blow out texture and fool color/material estimation. We want the **matte (diffuse-only)** image.

• **the dichromatic reflection model** (Shafer 1985 — *queue ref*): reflected light is the **sum of two terms** with **different colors** — a **diffuse/body** term in the **surface's** color + a **specular/interface** term in the **illuminant's** color: $I(x) = m_d(x)\Lambda_{\text{body}} + m_s(x)\Lambda_{\text{illum}}$. The falloffs vary per pixel, but there are only **two color directions**.

• **the geometric picture (why separable)**: plot a glossy patch's pixels in RGB → an **"L"/dog-leg**: a **line** along the body color (matte shading) with a **spur** toward the **illuminant white** (the specular ramp). Identify the two directions (Lee 1979; Klinker–Shafer–Kanade 1988 — *queue ref*), then **project each pixel back onto the diffuse line**. [`fig-dichromatic-model`]

• **"fake polarization"**: a **real** specular killer is a **polarizing filter** (specular reflection is polarized); doing the *same job* purely **computationally from one image** via the dichromatic model is the "fake polarization" — the polarizer's matte look **without** the filter or extra shot. (Lee 1979 also used the highlight's illuminant color for **color constancy** — a highlight is a free white reference.)

• **why it's an intrinsic-image cousin**: the specular term is an **additive contamination of the shading layer** in the *illuminant's* color; removing it recovers the diffuse $R\cdot S$. A **color-direction** prior rather than a gradient prior — different cue, same un-mixing goal.

• **cross-ref**: the genuine optical version (rotate a polarizer) and multi-light specular removal are in **Computational illumination (Advanced)**; learned separation in [[Deep learning]]. **Encoding**: separate in **linear light** so the dichromatic *sum* is physically additive (**L2**).

▸8.10 Non-photorealistic rendering⧉

`fig-npr-painterly-layers` · painterly rendering in coarse-to-fine brush layers: big strokes lay the ground, smaller strokes add detail where it matters (Hertzmann 1998) 🟨

`fig-npr-cartoon-real` · the Winnemöller edge-preserving abstraction pipeline on a real photo: input → iterated bilateral flatten → luminance quantise → thresholded DoG ink → cartoon composite (NPR) ✅

`fig-npr-flow-dog` · isotropic Difference-of-Gaussians lines vs flow-based coherent lines that follow the local edge tangent (Kang et al. 2007) 🟨

`fig-npr-brush-pset` · the brush-stroke problem-set figure: placing, orienting and sizing strokes from image gradients (course p-set) 🟨

Everything before this chapter tried to make images **truer** — sharper, cleaner, better-exposed. NPR does the opposite on purpose: it throws information **away** to make a photograph more like a **drawing or a painting** — to *communicate* rather than reproduce. The surprising lesson is that the tools are the **same** ones from EDGES MATTER: the operation that makes a good cartoon — **flatten the unimportant texture, keep and darken the meaningful edges** — is exactly the **edge-preserving / base–detail** decomposition (**L4**), used to *abstract* instead of *enhance*. A light chapter and a coda: *learned* stylization is in [[#Style transfer]] above and [[Deep learning]]; here we cover the **classical, structure-driven** core and the course **brush p-set**.

equations

(light) **stroke placement** — paint where the canvas differs from a **blurred reference** (error $> T$), stroke **orientation $\perp \nabla I$**, **size coarse→fine** across pyramid levels

**DoG edges** $D = G_{\sigma_1}*I - G_{\sigma_2}*I$, thresholded → ink lines

**abstraction** = **bilateral**-flattened (quantized) color **+** DoG edges overlaid

▸8.10.1 What NPR is for, and the one idea⧉

• **the goal**: a **depiction**, not a reproduction — painterly, sketch, or cartoon, **emphasizing structure** and **suppressing clutter**. Closer to *illustration* than photography; the value is in **what's left out**.

• **the one idea — structure-aware abstraction**: keep what the eye uses (strong **edges**, **large-scale tone**, the silhouette), simplify the rest (fine texture, mid-tone noise, busy backgrounds). **L13** (separate the subject, commit) executed in pixels, using **L4** (edge-preserving) machinery.

• **intuition-first**: a cartoon = **flat color regions + bold outlines.** Flat regions come from an **edge-preserving smoother** (don't blur across edges); outlines from an **edge detector**. That two-part recipe is the whole field in one sentence.

▸8.10.2 Stroke-based / painterly rendering (and the brush p-set)⧉

• **the idea** (Hertzmann 1998 — *queue ref*): repaint the photo with **discrete brush strokes**, **coarse-to-fine** — big rough strokes over a heavily-blurred version, then progressively **smaller** strokes **only where** the painting still differs from the (less-blurred) reference. Detail appears where needed; flat areas stay loose. [`fig-npr-painterly-layers`]

• **the structure that makes it look good (not random daubs)**:

• **orientation**: strokes run **perpendicular to the image gradient** (along the "grain") — following a cheek's contour or a sky's flow (**L9**);

• **size = scale**: coarse strokes on coarse pyramid levels, fine on fine — a direct use of the **Gaussian / image pyramid** ([[Linear pyramids and wavelets]]);

• **curved strokes**: trace along the orientation field for long painterly strokes, stopping at edges;

• **error-driven placement**: add a stroke only where canvas-vs-reference error exceeds a threshold → the painting **converges** to the photo at the chosen abstraction level.

• **the parameters are the *style***: stroke size range, opacity, jitter, curvature, blur radius → impressionist vs pointillist vs expressionist from *one* algorithm. (The **Gaussian pyramid** sets stroke scale and the blurred references.)

• **the course p-set**: implement the **layered, gradient-following brush painter** — build the pyramid of blurred references, place oriented strokes coarse-to-fine, explore a couple of brush/parameter styles on your own photo. [`fig-npr-brush-pset`] Use the photos-from-Fredo / sourced images for the worked example.

▸8.10.3 Edge-preserving abstraction: bilateral + Difference-of-Gaussians⧉

• **the canonical real-time pipeline** (Winnemöller et al. 2006 — *queue ref*): [`fig-npr-abstraction-pipeline`]

1. **flatten regions** with a **bilateral filter** (iterated) — smooths within objects, **stops at edges** (**L4**), collapsing texture into flat cartoon regions; optionally **quantize** color/luminance into a few bands (the posterized look);

2. **draw lines** with a **Difference-of-Gaussians**, $D = G_{\sigma_1}*I - G_{\sigma_2}*I$, thresholded into **black ink edges**;

3. **composite**: flat quantized color **+** DoG lines = a cartoon, in real time, even on video.

• **why this *is* EDGES MATTER**: the bilateral step is literally the [[Edge-preserving techniques]] base/detail filter (**L4**) — here used to **destroy** detail deliberately, not enhance it. Same tool, opposite intent.

• **flow-based DoG / coherent lines** (Kang et al. 2007 — *queue ref*): plain DoG gives **broken, noisy** edges; compute a smooth **edge-tangent flow** and run the DoG **along** it → **long, clean, coherent** lines like a confident pen. [`fig-npr-flow-dog`]

• **temporal coherence** (for video): filter with motion-awareness so flat regions and lines don't **flicker** frame-to-frame.

▸8.10.4 Example-based stylization and the bridge to neural style⧉

• **eye-tracked, perceptually-guided abstraction** (DeCarlo & Santella 2002, *Stylization and Abstraction of Photographs*, SIGGRAPH): the chapter's deepest answer to *what to keep*. Instead of inferring importance from edges alone, they **record where a viewer actually looks** (an eye-tracker) and build a **hierarchical, multi-scale** abstraction that **preserves detail at the fixated, meaningful regions and simplifies everything else** — bold flat color and clean bounding contours from a segmentation hierarchy, with detail *dialed in by measured visual attention*. NPR reframed as **directing the eye**: the abstraction reads as right because it is tied to **perception**, not just to gradients (**L13** — separate the subject and commit). This is the model of *meaningful* (importance-driven) abstraction, and the natural complement to the bilateral + DoG pipeline above, which abstracts the frame uniformly.

• **Image Analogies** (Hertzmann et al. 2001): give one example pair **A → A′** (a photo and its hand-stylized version) and a new photo **B**; synthesize **B′** so *A:A′ :: B:B′* by **patch matching** — *learn the style from one example*. The pre-deep ancestor of neural style; developed in [[#Style transfer]] (and surveyed in [[Deep learning]]) — appearing in both by design.

• **the bridge to neural style**: classical NPR hand-codes *what to keep* (edges, strokes, flat regions); **neural style** (Gatys 2016) replaces those rules with **deep-feature statistics** (content features + Gram-matrix style), and **feed-forward** stylization (Johnson 2016) makes it real-time — the same painterly goal, the prior now **learned** (**L8**). Full treatment in [[#Style transfer]] / [[Deep learning]].

• **the takeaway**: across painterly strokes, bilateral+DoG cartoons, and neural style, the constant is **abstraction guided by structure** — decide what carries meaning, keep and exaggerate it, simplify the rest. NPR is **L4 + L13** turned into a rendering style.

▸8.10.5 Region-based stylization: stained glass, low-poly, mosaics⧉

• **the idea**: instead of strokes or edges, **partition** the image into a few large regions and fill each with a **flat color** (its mean) — *abstraction by tessellation*. Pick the tiling to respect structure: a **Voronoi / centroidal** tessellation ("**stained glass by numbers**"), a **Delaunay low-poly** triangulation seeded at salient/edge points, or **superpixels** (SLIC). Same NPR thesis — *abstraction guided by structure* (**L4**) — with "what to keep" carried by **region boundaries** rather than brush orientation.

• **the dial = number / shape of regions**: few large cells → bold, poster-like; many small cells → near-photographic. Edge-aware seeding keeps tiles from straddling strong contours (the [[Bilateral filtering]] / segmentation link).

• **p-set lineage**: the "stained glass by numbers" make-your-own assignment; kin to **photomosaics** ([[10 Many images and photo collections#Photomosaics, Salavon]]) and to **graph-cut region labelling** ([[Seam optimization]]).

▸8.10.6 Artistic screening and halftoning⧉

• **the idea**: reproduce a continuous-tone image with **discrete marks** — the classic print problem — but make the *marks themselves* expressive. Ordinary **halftoning** trades spatial resolution for tone (clustered-dot screens, **ordered dithering**, **error diffusion** à la Floyd–Steinberg); **artistic screening** (Ostromoukhov & Hersch) replaces the regular dot screen with **shaped, image-bearing screen elements** so the halftone microstructure carries a *second* picture, lettering, or texture. Tone is still right at a distance; up close the rendering is a deliberate pattern.

• **why it's NPR**: same thesis — *abstraction guided by structure* (**L4**) — here the abstraction lives in **how tone is quantized into marks**, not in strokes, edges, or regions. Stippling, engraving/hatching, and ASCII-art are the same move with different mark vocabularies.

• **the dial = the screen / mark set**: choice of screen element, frequency, and angle sets the style; error-diffusion vs ordered-dither sets the "texture" of the noise (blue-noise dithers look cleanest). Ties to [[Color technology]] (CMYK screen angles, moiré) and the perceptual reason it works (the eye's spatial averaging — [[Human (and animal) vision and color]]).

• **p-set lineage**: the "artistic screening" make-your-own assignment.