Computational Photography, an AI-powered Slopendium — 08 Single-image computational photography
📖 Full book outline: 1 part · 10 chapters · 53 sections · 39 figures embedded · 11 placeholders
Part 8 SINGLE IMAGE COMPUTATIONAL PHOTOGRAPHY
8.1 Recap: tone mapping
fig-tonemap-real
fig-tonemap-real · global vs local tone mapping RESULT on a real HDR photo (seal at marina): naive single exposure · global Reinhard (one slope flattens local contrast → hazy) · local bilateral base+detail split (range fits, detail stays crisp) — real-image companion to `fig-tonemap-global-vs-local` (BASIC tone mapping)
fig-tonemap-taxonomy
fig-tonemap-taxonomy · a taxonomy of local tone mapping — bilateral base/detail, gradient-domain, local Laplacian, exposure fusion, learned — placed on a shared axis 🟨
⬜ figure not yet created
`fig-darkroom-dodge-burn` (a Zone-System / dodge-&-burn darkroom print beside its digital local-tone-mapping analogue) fig-editing-lr-vs-ps
This chapter doesn't introduce new algorithms: it **gathers** the tone-mapping thread that has run since BASIC into one picture, and ties it back to the **analog darkroom** where tone control began. (Scope note: this recap deliberately covers only the **single-image** operators; **multi-image** tone mapping, i.e. HDR *merge* and exposure fusion, lives in [[Multiple exposure imaging]] and is only forward-referenced here. **See Known Issue.**)
8.2 Super-resolution and image priors
fig-sr-ill-posed
fig-sr-ill-posed · many sharp HR images collapse to the same blurry LR image — the SR inverse is one-to-many (ill-posed); a prior picks one 🟨
fig-isp-walk-real
fig-isp-walk-real · the ISP pipeline walked on ONE real photo (boatman gathering lotus, overcast lake, Sony DNG): raw (linear, no WB — flat & green) · white balance · tone+colour (gamma encode) · denoise · sharpen · finished JPEG, frame-coloured by linear-light vs encoded regime — real-image companion to `fig-isp-block-diagram` (Recap ISP, BASIC)
⬜ figure not yet created
`fig-sr-recon-vs-halluc` (one low-res crop → reconstruction-based result from a burst (real detail) vs hallucination-based result from a learned prior (invented but plausible detail)) fig-sr-recon-vs-halluc
fig-burst-subpixel
fig-burst-subpixel · hand-jitter across a burst lands samples between the LR grid points; merging the sub-pixel-shifted frames fills the HR grid 🟨
⬜ figure not yet created
aliasing made useful, after Wronski 2019)
fig-pnp-loop
fig-pnp-loop · Plug-and-Play / RED: alternate a data-fidelity step with a denoiser-as-prior step until convergence — the prior is a black-box denoiser 🟨
fig-denoiser-as-prior-spectrum
fig-denoiser-as-prior-spectrum · one PnP/RED prior slot, ever-stronger denoisers: hand-built → classical → learned → diffusion score (GenAI / Super-res)
💡 **Big lesson (L10 · The prior is not optional):** when the measurement genuinely destroys information — super-resolution past the sensor's sampling, deblurring at killed frequencies, inpainting a hole — *no amount of cleverness recovers it from the data alone*. A **prior** (what natural images look like) is what selects an answer; it is a load-bearing part of the algorithm, not a tuning knob. The honest split: **reconstruction** priors fuse genuinely-measured extra samples; **hallucination** priors *invent* plausible detail that was never measured. (Registered in [[Big Lessons]] as **L10**; first appears here; recurs in [[#Blind deblurring]] — blind deblurring & dark-channel prior — and forward in [[Generative AI and diffusion]].)
equations
SR forward model $y = (k * x)\downarrow_s + n$ (blur by PSF $k$, downsample by factor $s$, add noise $n$)
MAP / variational recovery $\hat{x} = \arg\min_x \tfrac{1}{2}\lVert (k*x)\downarrow_s - y\rVert^2 + \lambda\, \Phi(x)$ (data-fit + prior $\Phi$)
PnP iteration (HQS form) — alternate a **data-fit** step $z \leftarrow \arg\min_x \tfrac{1}{2}\lVert (k*x)\downarrow_s - y\rVert^2 + \tfrac{\rho}{2}\lVert x - \tilde{x}\rVert^2$ and a **denoise** step $\tilde{x} \leftarrow \mathcal{D}_\sigma(z)$ (the prior, *any* denoiser $\mathcal{D}$)
RED gradient $\nabla \Phi(x) = x - \mathcal{D}(x)$ (prior gradient = residual of a denoiser)
Tweedie $\hat{x} = z + \sigma^2\, \nabla_z \log p_\sigma(z)$ (the score IS a Gaussian denoiser → diffusion = iterated $\mathcal{D}$)
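The forward model and the HQS-form PnP iteration above can be sketched in a few lines. This is a minimal 1-D illustration under stated assumptions, not a reference implementation: the helper names (`make_forward_matrix`, `box_denoiser`, `pnp_hqs`), the 3-tap kernel, the value of `rho` and the moving-average "denoiser" are all hypothetical stand-ins chosen to keep the example small. The blur is circular and the operator is built as an explicit matrix so the data-fit step can be solved exactly.

```python
import numpy as np

def make_forward_matrix(n, kernel, s):
    """Matrix A with A @ x = (k * x) downsampled by s (circular blur, symmetric k)."""
    blur = np.zeros((n, n))
    half = len(kernel) // 2
    for i in range(n):
        for j, w in enumerate(kernel):
            blur[i, (i + j - half) % n] += w
    return blur[::s]  # keeping every s-th row is the downsampling step

def box_denoiser(z, width=3):
    """Stand-in for the denoiser D: a small circular moving average."""
    out = np.zeros_like(z)
    for shift in range(-(width // 2), width // 2 + 1):
        out += np.roll(z, shift)
    return out / width

def pnp_hqs(y, A, rho=0.5, iters=30, denoiser=box_denoiser):
    """Alternate the quadratic data-fit step with a denoise (prior) step."""
    n = A.shape[1]
    x_tilde = A.T @ y                        # crude initial guess
    lhs = A.T @ A + rho * np.eye(n)          # normal equations of the data-fit step
    for _ in range(iters):
        z = np.linalg.solve(lhs, A.T @ y + rho * x_tilde)  # data-fit step
        x_tilde = denoiser(z)                              # prior step: any denoiser
    return z, x_tilde

# Tiny synthetic experiment: blur, downsample by 2, add noise, then recover.
rng = np.random.default_rng(0)
n, s = 64, 2
x_true = np.sin(2 * np.pi * np.arange(n) / n)
A = make_forward_matrix(n, kernel=np.array([0.25, 0.5, 0.25]), s=s)
y = A @ x_true + 0.01 * rng.standard_normal(n // s)

z, x_tilde = pnp_hqs(y, A)
z_no_prior, _ = pnp_hqs(y, A, denoiser=lambda v: v)  # identity "denoiser"
```

Swapping `denoiser` is the whole point of the design: the data-fit step never changes, and only the black-box prior does. With the identity in that slot the loop simply drives the data residual towards zero, which fits the measurements but says nothing about the samples the downsampling discarded.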
8.3 Blind deblurring
fig-naive-deconv-noise
fig-naive-deconv-noise · naive inverse filtering explodes: dividing by the blur's spectrum amplifies noise at its near-zeros 🟨
fig-wiener-tradeoff
fig-wiener-tradeoff · the Wiener filter as a regularised inverse — the $1/\mathrm{SNR}$ term trades deconvolution sharpness against noise amplification 🟨
fig-metric-degradation-gallery
fig-metric-degradation-gallery · one clean photo vs four degradations (1-px shift, low-q JPEG, noise, blur), each labelled with PSNR + SSIM — PSNR and SSIM mostly agree but disagree where L2 is blind (shift scores worst PSNR yet looks identical; blur keeps high PSNR yet softens texture) (Image metrics, BASIC)
fig-natural-image-gradient-prior
fig-natural-image-gradient-prior · natural-image gradients are heavy-tailed (mostly flat, rare strong edges) vs a Gaussian — the sparse prior that breaks the blind tie 🟨
fig-camera-shake-nonuniform
fig-camera-shake-nonuniform · camera-shake blur is spatially varying — a rotation gives a different PSF in each corner of the frame 🟨
fig-bw-conversions
fig-bw-conversions · one colour photo → black & white several ways: single channel (R/G/B) · average · weighted luminance · red-filter channel mix (dark sky) · an isoluminant pair collapsing to one grey — the choices differ (Converting to B&W, Color technology)
fig-point-op-contrast-spaces
fig-point-op-contrast-spaces · the same contrast (×gain about mid-gray) in gamma (sRGB) / linear (no gamma) / log: three outputs + remapping curves (linear crushes shadows, log preserves them; black clamped to ε for log) 🟨
⬜ figure not yet created
`fig-colorization-scribbles` (gray photo + a few color scribbles → propagated color, stopping at edges) fig-colorization-scribbles
The previous chapter inverted a **known** blur on a **clean** image — a textbook least-squares solve. Reality breaks both assumptions: sensors add **noise**, and you usually **don't know the blur kernel**. This chapter is what you do then. The unifying frame is one line — *recovery = data-fit + prior* — and each section just swaps in the prior that suits the degradation. Forward-ref the punchline early: when the prior is *learned* rather than hand-designed, this is exactly [[Machine learning]] and [[Super-resolution and image priors]] (PnP/RED).
💡 **Big lesson:** *diagonalize when you can* — shift-invariant blur is a **convolution**, so it's **diagonal in Fourier**: deblurring becomes a per-frequency **division** by the kernel's frequency response, and the Wiener filter is just that division, *regularized* per frequency. (→ see Big lesson L5, first placed in [[Linearity, Fourier, Aliasing and deblurring]] — recurs here as the reason both inverse and Wiener filters have one-line Fourier forms.)
equations
forward model with noise $Y = K * X + n$
**naive (inverse-filter) deconvolution** $\hat X = \mathcal F^{-1}\!\big[\hat Y / \hat K\big]$ — blows up where $\hat K \to 0$
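**Wiener / regularized inverse** (Fourier, per frequency) $\hat X = \mathcal F^{-1}\!\left[\dfrac{\overline{\hat K}\,\hat Y}{|\hat K|^2 + \mathrm{SNR}^{-1}}\right]$ (equivalently the inverse filter $\hat Y/\hat K$ multiplied by $|\hat K|^2/(|\hat K|^2 + S_n/S_x)$, with $\mathrm{SNR}^{-1} = S_n/S_x$)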
**blind objective** $\displaystyle (\hat X,\hat K) = \arg\min_{X,K}\ \underbrace{\lVert K * X - Y\rVert^2}_{\text{data fit}} + \underbrace{\lambda\,\rho(\nabla X)}_{\text{image prior}} + \underbrace{\mu\,\psi(K)}_{\text{kernel prior}}$ with $\rho$ a **sparse / heavy-tailed** gradient penalty ($\lVert\nabla X\rVert_\alpha,\ \alpha<1$)
**color2gray** objective = match grayscale gradients to (signed) color-difference gradients $\min_g \sum \lVert \nabla g - \theta\rVert^2$, where $\theta$ is the signed color-difference target
**colorization** objective = minimise color variation between pixels weighted by intensity affinity $\min_U \sum_x \big(U(x) - \sum_{x'\in N(x)} w_{xx'} U(x')\big)^2$ (developed in [[Edge-preserving techniques]])
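The non-blind half of the list above (forward model, naive inverse filter, Wiener filter) can be tried on a 1-D signal with NumPy's FFT. This is a sketch under assumptions: the test signal, the 9-tap box blur, the noise level and the constant `inv_snr` are illustrative choices, and `inv_snr` is a hand-picked regularizer rather than a measured $S_n/S_x$. The blur is circular so that it is exactly diagonal in Fourier.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n) / n
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)  # clean signal

# 9-tap box blur, centred on sample 0 so its spectrum is real.
k = np.zeros(n)
k[:9] = 1.0 / 9.0
k = np.roll(k, -4)
K = np.fft.fft(k)

# Forward model: circular convolution plus white noise.
y = np.fft.ifft(K * np.fft.fft(x)).real + 0.01 * rng.standard_normal(n)
Y = np.fft.fft(y)

# Naive inverse filter: per-frequency division, unstable where |K| is tiny.
x_naive = np.fft.ifft(Y / K).real

# Wiener-style regularized inverse: the same division, damped per frequency.
inv_snr = 1e-2  # assumed constant; plays the role of SNR^{-1}
x_wiener = np.fft.ifft(np.conj(K) * Y / (np.abs(K) ** 2 + inv_snr)).real

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

err_naive = rmse(x_naive, x)
err_wiener = rmse(x_wiener, x)
print(f"naive RMSE  = {err_naive:.3f}")
print(f"Wiener RMSE = {err_wiener:.3f}")
```

The box kernel's spectrum has several near-zeros, so the naive division multiplies the noise at those frequencies by a very large factor. The added constant in the Wiener denominator caps that gain, at the cost of a small bias where the signal actually lives.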
8.4 Dehazing
fig-tonemap-real
fig-tonemap-real · global vs local tone mapping RESULT on a real HDR photo (seal at marina): naive single exposure · global Reinhard (one slope flattens local contrast → hazy) · local bilateral base+detail split (range fits, detail stays crisp) — real-image companion to `fig-tonemap-global-vs-local` (BASIC tone mapping)
Haze isn't a deblurring problem — light is **scattered**, not convolved — but it's the same *idea* as the previous chapter in a new costume: an image-formation model that is **under-determined**, made solvable by **one well-chosen statistical prior**.
equations
**haze image model** $Y(x)=t(x)\,X(x) + A\,(1-t(x))$ (transmission $t=e^{-\beta d}$, airlight $A$, depth $d$, scattering coeff $\beta$)
**dark channel** $X^{\text{dark}}(x)=\min_c \min_{x'\in\Omega(x)} X_c(x')$
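A small sketch of how the two equations above combine into a dehazing recipe. The dark-channel function follows the definition directly; the transmission estimate $t \approx 1 - \omega\,\min_c\min_\Omega (Y_c/A_c)$ and the lower clamp `t_min` are the commonly cited dark-channel-prior recipe and are an assumption here, since the outline itself only lists the model and the dark channel. The airlight $A$ is taken as known, and the function names are hypothetical.

```python
import numpy as np

def dark_channel(img, patch=3):
    """Min over colour channels, then min over a patch x patch window (patch odd)."""
    per_pixel_min = img.min(axis=2)
    r = patch // 2
    padded = np.pad(per_pixel_min, r, mode="edge")
    h, w = per_pixel_min.shape
    out = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

def estimate_transmission(hazy, airlight, omega=0.95, patch=3):
    """Prior: the haze-free dark channel is ~0, so what remains is airlight."""
    return 1.0 - omega * dark_channel(hazy / airlight, patch)

def recover_scene(hazy, airlight, t, t_min=0.1):
    """Invert Y = t X + A (1 - t), clamping t to avoid dividing by ~0."""
    t = np.maximum(t, t_min)[..., None]
    return (hazy - airlight) / t + airlight

# Synthetic check: a scene whose dark channel is exactly zero (blue channel = 0).
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(8, 8, 3))
scene[..., 2] = 0.0
t_true = 0.6
airlight = np.array([0.9, 0.9, 0.9])
hazy = t_true * scene + airlight * (1.0 - t_true)

t_est = estimate_transmission(hazy, airlight, omega=1.0)
scene_rec = recover_scene(hazy, airlight, t_est)
```

On this synthetic scene the prior holds exactly, so the transmission and the scene come back unchanged. On real photos the prior is only approximately true (it tends to fail on bright, colourless regions such as sky), and that is where the estimate drifts.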
8.5 Style transfer
fig-style-transfer-zoo
fig-style-transfer-zoo · the zoo of style transfer — patch/analogies, texture statistics (Gram), neural optimisation, feed-forward, image-to-image GANs — same skeleton, swap the prior 🟨
fig-point-op-levels
fig-point-op-levels · levels on a real (flat/hazy) photo: bunched-up luma histogram stretched out to fill [0,1] by setting a black point and a white point — input + after + transfer curve with the clip-and-stretch anchors and the two histograms (Point operations → Black point, white point, and levels, BASIC)
fig-neural-style-content-style
fig-neural-style-content-style · neural style: a content loss (deep feature match) balanced against a style loss (Gram-matrix match), summed into one objective 🟨
The thread of this chapter is one sentence: **match the *look* of a reference onto a target, keeping the target's content.** It runs from **classical** methods — match patches or low-order statistics by hand — to **learned** ones — match deep-feature statistics, or learn a whole image-to-image mapping. Several of these pieces appear as *models* in [[Deep learning]]; here they're gathered under the single idea of **style transfer**.
equations
(light) **neural style** objective $\min_{\hat I}\ \alpha\,\lVert\phi_\ell(\hat I)-\phi_\ell(I_{\text{content}})\rVert^2 + \beta\sum_\ell \lVert G_\ell(\hat I)-G_\ell(I_{\text{style}})\rVert^2$, with **Gram matrix** $G_\ell = \phi_\ell\phi_\ell^\top$ (style = feature correlations)
feed-forward variant trains a net to minimize the same **perceptual loss** in one pass
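To make "style = feature correlations" concrete, here is a minimal NumPy sketch of the Gram matrix and the two loss terms. The feature maps are random arrays standing in for $\phi_\ell$ (no network is involved), and the function names are hypothetical. The point it illustrates is that the Gram matrix discards spatial layout: shuffling the positions of a feature map leaves the style term at zero while the content term becomes large.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activations -> (C, C) matrix of channel correlations."""
    c = features.shape[0]
    phi = features.reshape(c, -1)      # each row is one channel, flattened
    return phi @ phi.T

def style_loss(f_out, f_style):
    return float(np.sum((gram_matrix(f_out) - gram_matrix(f_style)) ** 2))

def content_loss(f_out, f_content):
    return float(np.sum((f_out - f_content) ** 2))

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 6, 6))           # stand-in for phi_l(I)

# Shuffle spatial positions identically in every channel.
perm = rng.permutation(36)
shuffled = feats.reshape(4, -1)[:, perm].reshape(4, 6, 6)

same_style = style_loss(feats, shuffled)         # ~0: Gram ignores position
changed_content = content_loss(feats, shuffled)  # large: layout has changed
```

That asymmetry is why the objective needs both terms: the content term pins down where things are, and the Gram term only constrains which features co-occur.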
8.6 Inpainting, texture synthesis
fig-inpaint-prior-spectrum
fig-inpaint-prior-spectrum · the hole-filling prior spectrum: PDE/diffusion (smoothness) → exemplar/texture (self-similarity) → learned/generative (semantics) 🟨
fig-pde-isophote-diffusion
fig-pde-isophote-diffusion · PDE inpainting continues isophotes (level lines) smoothly into the hole — diffusion of structure inward (Bertalmío 2000) 🟨
fig-thin-lens-vs-real
fig-thin-lens-vs-real · side-by-side "convenient lie vs reality": ideal thin lens (single plane, all parallel rays → one focus) vs a real thick multi-surface lens where outer rays focus short of the paraxial focus (spherical aberration → blur, no single point)
fig-efros-leung-growth
fig-efros-leung-growth · Efros–Leung non-parametric synthesis: grow one pixel at a time by matching its known neighbourhood against the sample 🟨
fig-quilting-seam
fig-quilting-seam · image quilting: lay overlapping patches and cut along the minimum-error boundary so seams disappear (Efros–Freeman 2001) 🟨
⬜ figure not yet created
`fig-object-removal` (Criminisi: a person erased, the hole filled by confidence-/edge-priority patch order so structures continue across it) fig-object-removal
⬜ figure not yet created
`fig-highlight-recovery` (a clipped specular spot → inpainted plausible surface color) fig-highlight-recovery
This chapter is the most literal instance of 💡 L10. A hole has **no measurement at all** — the only thing that can fill it is a model of what images look like. So inpainting is best read as a **ladder of priors**, from the weakest ("be smooth") to the strongest (a learned generative model), and the whole chapter is *which prior, and where it copies-vs-invents*. The recurring honesty: filled pixels are **plausible, never measured** — verifiable when the prior copies genuine structure from elsewhere, an outright **guess** when it hallucinates.
💡 **Big lesson (recurrence of L10 · the prior is not optional):** filling a hole is the purest case — there is *zero* data inside it, so **100% of the answer comes from the prior.** Smoothness, self-similarity, a photo database, a learned net — each is a different prior, and the result is exactly as good (and as honest) as the prior is. (→ see Big lesson **L10**, first placed in [[#Super-resolution and image priors]]; here taken to its extreme — no data, all prior.)
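The bottom rung of the ladder ("be smooth") fits in a few lines. The sketch below fills a hole by repeatedly replacing each missing pixel with the average of its four neighbours, i.e. it solves Laplace's equation inside the hole. This is a deliberately simplified stand-in for PDE inpainting, not the isophote-continuing scheme of the figure above; the function name and iteration count are assumptions, and the hole is assumed not to touch the image border.

```python
import numpy as np

def harmonic_inpaint(img, hole, iters=500):
    """Fill hole pixels with the smoothest (harmonic) interpolant of the boundary.

    img  : 2-D float array; values inside the hole are ignored
    hole : boolean mask, True where pixels are missing (must not touch the border)
    """
    out = img.astype(float).copy()
    out[hole] = out[~hole].mean()                 # neutral starting guess
    for _ in range(iters):
        avg = (np.roll(out, 1, axis=0) + np.roll(out, -1, axis=0) +
               np.roll(out, 1, axis=1) + np.roll(out, -1, axis=1)) / 4.0
        out[hole] = avg[hole]                     # known pixels are never touched
    return out

# A linear ramp is harmonic, so the smoothness prior recovers it exactly.
yy, xx = np.mgrid[0:16, 0:16]
ramp = 0.02 * xx + 0.03 * yy
hole = np.zeros((16, 16), dtype=bool)
hole[5:10, 6:11] = True
damaged = ramp.copy()
damaged[hole] = 0.0

filled = harmonic_inpaint(damaged, hole)
```

The ramp is the friendly case: the prior happens to match the image. Put a texture or an edge under the hole and the same code returns a smooth blur, which is exactly the "as good as the prior" point made above.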
8.7 Patch match
fig-nnf-field
fig-nnf-field · the nearest-neighbour field: every patch points to its best match elsewhere — visualised as a colour-coded offset map 🟨
fig-patchmatch-three-steps
fig-patchmatch-three-steps · PatchMatch's three moves — random initialisation → propagation (good offsets spread to neighbours) → random search (refine locally) 🟨
⬜ figure not yet created
`fig-patchmatch-apps` (one image → hole-filled, retargeted (wider, no distortion), reshuffled (an object dragged, surround re-synthesized)) fig-patchmatch-apps
fig-shiftmap-labeling
fig-shiftmap-labeling · Shift-Map editing as a graph-cut labelling: each output pixel is assigned a shift, optimised for data + smoothness 🟨
The previous chapter's exemplar methods (Efros–Leung, Criminisi) all live or die on one operation: *for each patch, find its most similar patch elsewhere.* Done naively that's a full search per patch, far too slow to be interactive, and the real reason texture synthesis felt like a batch job. This chapter is the **two ways to make that fast or principled**: a brilliant **randomized heuristic** (PatchMatch) that reaches a good approximate answer in a handful of iterations, fast enough for interactive editing, and a **global graph-cut** (Shift-Map) that minimizes an explicit energy over the whole image, more slowly (exact for two labels; with many candidate shifts the multi-label cut is a strong approximation rather than a guaranteed optimum). The unifying object is the **nearest-neighbour field**; the unifying lesson is 💡 **L12**: *the edit is an optimization over a correspondence, and the energy you pick is the design.*
💡 **Big lesson (recurrence of L12 · image edits as optimization on a graph):** "where does each output pixel come from?" is a **labelling**: assign every output pixel a **shift** into the input. Pick the **energy**, a data term (respect the user's keep/remove/resize constraints) plus a **seam** term (neighbouring pixels should take consistent offsets, so cuts fall on real edges), and a generic **graph-cut** solver does the rest. PatchMatch is the *fast heuristic* answer to the same labelling; Shift-Map is the *global-energy* one (multi-label graph cuts return a strong approximate minimum; exact optimality is a two-label guarantee). (→ see Big lesson **L12**, first placed in Seam optimization; it is the affinity lesson L4 turned into a *cut*.)
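The three PatchMatch moves (random initialisation, propagation, random search) can be sketched compactly for grayscale images. This is an unoptimized illustration under assumptions: patches are compared by sum of squared differences, the field stores absolute target coordinates rather than offsets, out-of-range candidates are simply rejected, and the function names and defaults are hypothetical.

```python
import numpy as np

def patch_cost(a, b, ay, ax, by, bx, p):
    d = a[ay:ay + p, ax:ax + p] - b[by:by + p, bx:bx + p]
    return float(np.sum(d * d))

def patchmatch(a, b, p=3, iters=4, seed=0):
    """Approximate nearest-neighbour field from patches of a to patches of b."""
    rng = np.random.default_rng(seed)
    ha, wa = a.shape[0] - p + 1, a.shape[1] - p + 1   # valid patch corners in a
    hb, wb = b.shape[0] - p + 1, b.shape[1] - p + 1   # valid patch corners in b

    # 1. Random initialisation.
    nnf = np.stack([rng.integers(0, hb, (ha, wa)),
                    rng.integers(0, wb, (ha, wa))], axis=-1)
    cost = np.array([[patch_cost(a, b, y, x, nnf[y, x, 0], nnf[y, x, 1], p)
                      for x in range(wa)] for y in range(ha)])

    def try_candidate(y, x, cy, cx):
        # Accept a candidate only if it is in range and strictly better.
        if 0 <= cy < hb and 0 <= cx < wb:
            c = patch_cost(a, b, y, x, cy, cx, p)
            if c < cost[y, x]:
                cost[y, x] = c
                nnf[y, x] = (cy, cx)

    for it in range(iters):
        step = 1 if it % 2 == 0 else -1               # alternate scan direction
        ys = range(ha) if step == 1 else range(ha - 1, -1, -1)
        xs = range(wa) if step == 1 else range(wa - 1, -1, -1)
        for y in ys:
            for x in xs:
                # 2. Propagation: reuse the already-visited neighbours' matches.
                if 0 <= y - step < ha:
                    try_candidate(y, x, nnf[y - step, x, 0] + step, nnf[y - step, x, 1])
                if 0 <= x - step < wa:
                    try_candidate(y, x, nnf[y, x - step, 0], nnf[y, x - step, 1] + step)
                # 3. Random search around the current best, halving the radius.
                radius = max(hb, wb)
                while radius >= 1:
                    cy = nnf[y, x, 0] + rng.integers(-radius, radius + 1)
                    cx = nnf[y, x, 1] + rng.integers(-radius, radius + 1)
                    try_candidate(y, x, cy, cx)
                    radius //= 2
    return nnf, cost

rng = np.random.default_rng(1)
a = rng.random((16, 16))
b = np.roll(a, shift=(2, 3), axis=(0, 1))     # b is a shifted copy of a

nnf_init, cost_init = patchmatch(a, b, iters=0)   # random field only
nnf_final, cost_final = patchmatch(a, b, iters=4) # same seed, so same start
```

Because a candidate is only ever accepted when it lowers the cost, the field can only improve from its random start. What makes the heuristic effective is propagation: one lucky match spreads to its neighbours on the next scan, so far fewer random guesses are needed than patches.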
8.8 Compositing, segmentation and matting
fig-matting-equation
fig-matting-equation · the matting equation — one boundary pixel is $\alpha F + (1-\alpha)B$: a foreground colour, a background colour, and a soft coverage $\alpha$ 🟨
fig-graphcut-segmentation
fig-graphcut-segmentation · binary segmentation as a min cut: pixel nodes, source/sink terminals, t-links (Dp) + n-links (Vpq, small across edges), min cut severs the cheap edges (Seam optimization)
fig-segmentation-graphcut
fig-segmentation-graphcut · binary segmentation as a min-cut on a grid graph wired to source/sink terminals; the cut is the object boundary (GrabCut inset) 🟨
fig-segmentation-three-framings
fig-segmentation-three-framings · three framings of segmentation — min-cut, normalized-cut, and least-cost path (intelligent scissors) — on the same boundary 🟨
⬜ figure not yet created
`fig-trimap-to-alpha` (user trimap FG/BG/unknown → recovered continuous $\alpha$ over the unknown band, e.g. hair, after closed-form matting) fig-trimap-to-alpha
fig-key-vs-measure
fig-key-vs-measure · two ways to get the cut-out: *key* it (constrain the background — green screen) vs *measure* the separation (depth / IR / flash) 🟨
⬜ figure not yet created
`fig-harmonization` (a correct-$\alpha$ cutout pasted raw ("stickered on") vs Poisson-blended vs color/lighting-harmonized → "belongs") fig-harmonization
The previous chapters fixed *one* image — denoise, deblur, recover detail. Now we **combine** images: take the subject out of one photo and drop it into another. That splits cleanly into two jobs — **cut** (which pixels are the subject?) and **blend** (lay it down so the join is invisible). The first is **segmentation/matting**; the second is **compositing** (and its seamless cousins, Poisson and pyramid blending, built in EDGES MATTER and only pointed to here). The honest twist is that "which pixels are the subject" has no crisp answer at hair, fur, motion blur and glass — so the right object is not a binary mask but a **soft $\alpha$**, and the model of the world is one line: $C = \alpha F + (1-\alpha)B$.
💡 **Big lesson (recurrence of L12 · image edits as discrete optimization on a graph — the energy is the design):** a *hard* selection is exactly the **graph cut** of [[Seam optimization]] — pixels are nodes, you add a **region data term** (this color looks like foreground / background) and the same **edge-aware smoothness term** (cheap to cut where neighbouring colors differ), and min-cut/max-flow returns the **globally optimal** boundary for two labels. The optimizer is off-the-shelf; *choosing the energy is the whole design.* And the smoothness weight $V_{pq}\propto e^{-\lVert C_p-C_q\rVert^2/2\sigma^2}$ is an **affinity** — so this is also a recurrence of **L4 · edge-preserving = affinity**: the very same "how much do these two pixels belong together" that drove the bilateral filter, now deciding *where not to cut*. (→ see Big lesson **L12**, first placed in [[Seam optimization]], extending **L4**, first placed in [[Bilateral filtering]]; here both recur, as a *cut* and, below, as the *matting Laplacian*.)
equations
**compositing / matting equation** $C = \alpha F + (1-\alpha)B,\ \alpha\in[0,1]$ — *forward* = composite, *inverse* = matting (under-determined: 3 knowns, 7 unknowns per pixel)
**graph-cut segmentation energy** $E(\mathbf{L}) = \sum_p D_p(L_p) + \lambda \sum_{(p,q)\in\mathcal N} V_{pq}(L_p,L_q)$ with **data term** $D_p$ (color fit FG vs BG) and **edge-aware smoothness** $V_{pq}\propto \exp(-\lVert C_p-C_q\rVert^2/2\sigma^2)$ (an **affinity**, **L4**), minimized **exactly** for two labels by min-cut/max-flow (**L12**)
**closed-form matting** $\hat\alpha = \arg\min_\alpha \alpha^\top L\,\alpha$ s.t. trimap, where $L$ is the **matting Laplacian** (a pixel-affinity matrix from a local color-line model) — *the same affinity-Laplacian as colorization*
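The forward and inverse directions of the matting equation can be compared directly in code. Compositing is one line; the inverse is only easy in the special case sketched here, where both $F$ and $B$ are known per pixel, so $\alpha$ is the single remaining unknown and has a least-squares closed form. That known-layers setting is an assumption made for illustration (it is not the trimap-based closed-form matting listed above), and the function names are hypothetical.

```python
import numpy as np

def composite(alpha, fg, bg):
    """Forward model: C = alpha * F + (1 - alpha) * B, per pixel and channel."""
    a = alpha[..., None]
    return a * fg + (1.0 - a) * bg

def alpha_from_known_layers(c, fg, bg, eps=1e-8):
    """Inverse in the easy case: project C - B onto F - B at each pixel."""
    d = fg - bg
    num = np.sum((c - bg) * d, axis=-1)
    den = np.sum(d * d, axis=-1) + eps        # eps guards pixels where F == B
    return np.clip(num / den, 0.0, 1.0)

rng = np.random.default_rng(0)
fg = rng.uniform(0.6, 1.0, size=(8, 8, 3))    # bright foreground colours
bg = rng.uniform(0.0, 0.4, size=(8, 8, 3))    # dark background colours
alpha_true = rng.random((8, 8))

c = composite(alpha_true, fg, bg)
alpha_rec = alpha_from_known_layers(c, fg, bg)
```

With $F$ and $B$ fixed there are three equations and one unknown per pixel, so the problem is over-determined and the projection recovers $\alpha$. Take away knowledge of $F$ and $B$ and the count flips to three knowns against seven unknowns, which is why green screens (constrain $B$) and trimap priors exist.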
⬜ figure not yet created
`fig-intrinsic-decomp` (one photo → its **reflectance** layer (flat albedo) × **shading** layer (smooth illumination, no texture)) fig-intrinsic-decomp
fig-retinex-thresholding
fig-retinex-thresholding · Retinex: threshold the log-gradient (small steps = shading, large steps = reflectance edges), then re-integrate 🟨
⬜ figure not yet created
`fig-multi-illuminant-wb` (a scene lit warm tungsten on one side, cool window light on the other → one global white balance leaves *one* half wrong) fig-multi-illuminant-wb
fig-hdr-merge-antelope
fig-hdr-merge-antelope · the HDR merge on a REAL bracket (Antelope Canyon): two exposures (short/long) + their per-pixel reliability weight maps → reliability-weighted radiance average → tone-mapped result; both sun and shadows readable
fig-reflection-removal-cues
fig-reflection-removal-cues · cues that separate a window reflection from the transmitted scene — ghosting/double-image, polarization, focus difference 🟨
fig-dichromatic-model
fig-dichromatic-model · the dichromatic reflection model: observed colour = a body (diffuse) component + a specular (illuminant-coloured) component — a plane in colour space 🟨
A single photograph mixes things the eye effortlessly keeps apart: it instantly reads a white shirt in shadow as *white-and-shadowed*, not as a grey shirt. The camera can't: it records the **product**, $I = R\cdot S$, surface reflectance $R$ times illumination (shading) $S$ (**L1**). This chapter is the project of **un-mixing** that product from one image: separating light from surface, or one layer of light from another. And it is the same story as super-resolution and deblurring: the un-mixing is **under-determined**, so a **prior** does the work (**L10**). The unifying frame is the **intrinsic-image decomposition**; everything else here is a special case of it. (These are the *single-image* methods; their easier *multi-image* cousins, such as flash/no-flash, a rotating polarizer, or a light stage, live in **Computational illumination (Advanced)**.)
💡 **Big lesson (recurrence of L1 + L10):** because the world is **multiplicative** — what you measure is light **×** surface — recovering *either factor* from one image is **ill-posed**: a dark patch can be a dim surface in bright light or a bright surface in dim light, and the pixel can't tell you which. So separating illumination from reflectance (or a reflection from a transmission, a shadow from a stain, a highlight from a color) always needs a **prior** about how surfaces, light, and natural images behave — not a tuning knob but the load-bearing part of the method. (→ see **L1** · multiplicative world, FUNDAMENTALS; → see **L10** · the prior is not optional, [[#Super-resolution and image priors]].)
equations
**image as product** $I(x) = R(x)\,S(x)$ — log makes it additive $\log I = \log R + \log S$ (**L1/L2**)
**Retinex / gradient classification**: assign $\nabla(\log I)$ to $\nabla \log R$ if $|\nabla\log I| > \tau$ (sharp = reflectance), else to $\nabla \log S$ (smooth = shading), then **Poisson-integrate** each log layer and exponentiate (**L9**)
**layer superposition (reflection)** $I = T + R$ (transmitted + reflected), under-determined → priors
**dichromatic model** $I(x) = m_d(x)\,\Lambda_{\text{body}} + m_s(x)\,\Lambda_{\text{illum}}$ (diffuse in the **surface** color direction + specular in the **illuminant** color direction — two color vectors, separable per pixel)
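The Retinex rule above is easiest to see in 1-D, where "Poisson integration" reduces to a cumulative sum. The sketch below is an illustration under assumptions: a single scanline, a hand-picked threshold `tau`, and the unknown global scale assigned entirely to the shading layer (the product $R\cdot S$ is preserved either way). The function name is hypothetical.

```python
import numpy as np

def retinex_1d(signal, tau=0.1):
    """Split a positive 1-D signal into reflectance and shading by thresholding
    its log-gradient, then re-integrating each part with a cumulative sum."""
    log_i = np.log(signal)
    d = np.diff(log_i)
    d_refl = np.where(np.abs(d) > tau, d, 0.0)   # large steps: reflectance edges
    d_shad = d - d_refl                          # small steps: smooth shading
    log_r = np.concatenate([[0.0], np.cumsum(d_refl)])
    log_s = log_i[0] + np.concatenate([[0.0], np.cumsum(d_shad)])
    return np.exp(log_r), np.exp(log_s)

# Scanline: a two-tone surface (albedo 0.2 then 0.8) under a smooth light ramp.
n = 200
reflectance = np.where(np.arange(n) < 100, 0.2, 0.8)
shading = np.exp(np.linspace(0.0, 0.5, n))
image = reflectance * shading

r_est, s_est = retinex_1d(image)
edge_ratio = r_est[-1] / r_est[0]    # should be close to 0.8 / 0.2 = 4
```

The recovered reflectance is only defined up to scale, so the meaningful check is the ratio across the edge. It comes out very slightly above 4 because the single shading step that coincides with the edge is also assigned to reflectance; the threshold cannot split a gradient that mixes both causes.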
8.10 Non-photorealistic rendering
fig-npr-painterly-layers
fig-npr-painterly-layers · painterly rendering in coarse-to-fine brush layers: big strokes lay the ground, smaller strokes add detail where it matters (Hertzmann 1998) 🟨
fig-npr-cartoon-real
fig-npr-cartoon-real · the Winnemöller edge-preserving abstraction pipeline on a real photo: input → iterated bilateral flatten → luminance quantise → thresholded DoG ink → cartoon composite (NPR)
fig-npr-flow-dog
fig-npr-flow-dog · isotropic Difference-of-Gaussians lines vs flow-based coherent lines that follow the local edge tangent (Kang et al. 2007) 🟨
fig-npr-brush-pset
fig-npr-brush-pset · the brush-stroke problem-set figure: placing, orienting and sizing strokes from image gradients (course p-set) 🟨
Everything before this chapter tried to make images **truer** — sharper, cleaner, better-exposed. NPR does the opposite on purpose: it throws information **away** to make a photograph more like a **drawing or a painting** — to *communicate* rather than reproduce. The surprising lesson is that the tools are the **same** ones from EDGES MATTER: the operation that makes a good cartoon — **flatten the unimportant texture, keep and darken the meaningful edges** — is exactly the **edge-preserving / base–detail** decomposition (**L4**), used to *abstract* instead of *enhance*. A light chapter and a coda: *learned* stylization is in [[#Style transfer]] above and [[Deep learning]]; here we cover the **classical, structure-driven** core and the course **brush p-set**.
equations
(light) **stroke placement** — paint where the canvas differs from a **blurred reference** (error $> T$), stroke **orientation $\perp \nabla I$**, **size coarse→fine** across pyramid levels
**DoG edges** $D = G_{\sigma_1}*I - G_{\sigma_2}*I$, thresholded → ink lines
**abstraction** = **bilateral**-flattened (quantized) color **+** DoG edges overlaid
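The DoG-ink and quantization steps above can be sketched with NumPy alone. Two simplifications are assumptions of this sketch, not the pipeline in the figure: the bilateral flattening stage is skipped (the input is quantized directly), and ink is a hard threshold on the negative (dark-side) DoG response rather than a soft ramp. The function names and the defaults `sigma`, `ratio`, `tau` and `levels` are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (output has the input's shape)."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def cartoon(img, sigma=1.0, ratio=1.6, tau=0.02, levels=4):
    """Quantized tones plus thresholded DoG ink lines."""
    dog = gaussian_blur(img, sigma) - gaussian_blur(img, ratio * sigma)
    ink = dog < -tau                                  # dark side of each edge
    flat = np.round(img * (levels - 1)) / (levels - 1)
    return np.where(ink, 0.0, flat), ink

# Test card: dark left half, bright right half, edge between columns 15 and 16.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
result, ink = cartoon(img)
```

On the test card the ink mask lights up in a thin band just on the dark side of the step and nowhere else, which is the behaviour the cartoon look relies on: lines appear where tone changes, and flat regions stay flat.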