Computational Photography, an AI-powered Slopendium — 04 Computational tools
1 part · 5 chapters · 22 sections · 23 figures embedded · 1 placeholder
Part 4 COMPUTATIONAL TOOLS
The chapters before this one built images up from physics, perception, and the basic processing pipeline. This part assembles the **general-purpose computational tools** the rest of the book reaches for again and again: casting a recovery task as a **linear inverse problem** and solving it by regression; replacing a hand-designed operator with one **learned from data**; and **generating** plausible images with modern generative models. They are gathered here, before the single-image applications, because nearly every later part — deblurring, super-resolution, compositing, HDR, video — is at bottom an application of one of these three.
4.1 Linear Inverse Problems and Regression
fig-correspondence-then-transport
fig-correspondence-then-transport · the L17 spine — one scene displaced (two views / two faces / two frames / one long frame), each resolved by estimating a coordinate map (homography, morph field, flow, track, motion vector, camera path) then transporting pixels by one shared inverse-warp engine; the finding is hard, the moving is plumbing 🟨
fig-image-as-vector
fig-image-as-vector · an image *is* a vector: a 5×5 pixel grid unrolled into a tall column vector; n = H·W (Linear algebra) 🟩
fig-least-squares
fig-least-squares · line fit minimizing squared vertical residuals (Optimization & regression) 🟩
fig-deblur-preview
fig-deblur-preview · the deblur preview: sharp → blurred + noise → naive inverse amplifies noise; MTF 🟨
fig-gradient-descent
fig-gradient-descent · descent path on a convex-bowl contour (−∇f steps); too-large step overshoots (Optimization) 🟩
fig-pyramid-reconstruction
fig-pyramid-reconstruction · RECONSTRUCTION on a real photo: collapse the Laplacian pyramid coarse→fine (residual → +L_k each octave → exact image), plus an all-black per-pixel error panel (max ~1e-16 = lossless) (Reconstruction, BASIC)
fig-focus-stacking
fig-focus-stacking · optics-chapter illustrative figure (07-07 Figure 4): a stepped-focus stack → sharpness selection → all-in-focus composite. Synthetic per-slice defocus on one photo (`sourced/corn-cobs.jpg`, © Frédo Durand) — license-safe. The full real-data treatment lives in part-08 `fig-focalstack-*`
equations
forward model $y = Ax$
least squares $\hat x = \arg\min_x \tfrac12\|Ax - y\|^2$
normal equations $A^\top A\,x = A^\top y$
gradient of the data term $\nabla f(x) = A^\top(Ax - y)$
gradient step $x_{t+1} = x_t - \eta\,A^\top(Ax_t - y)$
convolution form $A x = k * x$ and $A^\top y = \tilde k * y$ (flipped kernel $\tilde k$)
per-frequency inverse $\hat x(\omega) = \hat y(\omega)/\hat k(\omega)$ (when it diagonalizes)
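The operator pair and the gradient step above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming periodic (circular) boundaries so that convolution becomes a per-frequency product; the helper names `psf_to_otf`, `apply_A`, `apply_At`, the 5×5 box kernel, and the step size $\eta = 1/\max_\omega|\hat k(\omega)|^2$ are illustrative choices, not taken from the text.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a small kernel to `shape`, centre it at pixel (0, 0), and FFT it."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Roll so the kernel centre sits at the origin (no net translation).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def apply_A(x, otf):
    """Forward model A x = k * x (circular convolution). A is never built."""
    return np.real(np.fft.ifft2(otf * np.fft.fft2(x)))

def apply_At(y, otf):
    """Adjoint A^T y = k~ * y. For a real kernel, flipping it conjugates its FFT."""
    return np.real(np.fft.ifft2(np.conj(otf) * np.fft.fft2(y)))

rng = np.random.default_rng(0)
x_true = rng.random((32, 32))
kernel = np.ones((5, 5)) / 25.0            # assumed blur: 5x5 box
otf = psf_to_otf(kernel, x_true.shape)
y = apply_A(x_true, otf)                   # noiseless measurement y = A x

# Dot-product test: <A u, v> should equal <u, A^T v> for any u, v.
u, v = rng.standard_normal((2, 32, 32))
adjoint_gap = abs(np.sum(apply_A(u, otf) * v) - np.sum(u * apply_At(v, otf)))

# Gradient descent on f(x) = 0.5 * ||A x - y||^2.
eta = 1.0 / np.max(np.abs(otf)) ** 2       # 1 / largest eigenvalue of A^T A
x = np.zeros_like(y)
losses = []
for _ in range(200):
    residual = apply_A(x, otf) - y
    losses.append(0.5 * np.sum(residual ** 2))
    x = x - eta * apply_At(residual, otf)  # x_{t+1} = x_t - eta * A^T (A x_t - y)
```

The dot-product test is a cheap way to catch a wrong adjoint (for example, a forgotten kernel flip) before trusting any solver built on the pair.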
4.2 Efficient solvers
fig-gradient-descent
fig-gradient-descent · descent path on a convex-bowl contour (−∇f steps); too-large step overshoots (Optimization) 🟩
fig-pyramid-reconstruction
fig-pyramid-reconstruction · RECONSTRUCTION on a real photo: collapse the Laplacian pyramid coarse→fine (residual → +L_k each octave → exact image), plus an all-black per-pixel error panel (max ~1e-16 = lossless) (Reconstruction, BASIC)
fig-focus-stacking
fig-focus-stacking · optics-chapter illustrative figure (07-07 Figure 4): a stepped-focus stack → sharpness selection → all-in-focus composite. Synthetic per-slice defocus on one photo (`sourced/corn-cobs.jpg`, © Frédo Durand) — license-safe. The full real-data treatment lives in part-08 `fig-focalstack-*`
The previous chapter cast recovery as the normal equations $A^\top A\,x = A^\top y$ and made the key point that we **never build $A$** — every solver needs only to *apply* the blur and its transpose. This chapter is the toolkit that actually does the solving: a handful of matrix-free, iterative / multiscale methods, and the rule for which one to reach for. The same machinery returns in [[Poisson image editing]], colorization, matting, and gradient-domain HDR — learn it once here.
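As a concrete instance of "apply, never build", here is a minimal sketch of conjugate gradient on the normal equations that touches $A$ only through two function calls. It assumes a 1-D signal, zero boundaries (`mode="same"`), and an odd-length kernel, for which the transpose is convolution with the flipped kernel; the kernel values and the function name `cg_normal_equations` are hypothetical, and no preconditioner is used.

```python
import numpy as np

kernel = np.array([0.1, 0.6, 0.3])   # assumed blur: asymmetric, odd length

def A(x):
    """Forward blur A x = k * x with zero boundaries."""
    return np.convolve(x, kernel, mode="same")

def At(y):
    """Transpose A^T y = k~ * y: same convolution with the flipped kernel."""
    return np.convolve(y, kernel[::-1], mode="same")

def cg_normal_equations(A, At, y, n_iter=200, tol=1e-10):
    """Plain conjugate gradient on A^T A x = A^T y using only A() and At()."""
    b = At(y)
    x = np.zeros_like(b)
    r = b.copy()                     # residual b - A^T A x at x = 0
    p = r.copy()                     # first search direction
    rs = r @ r
    b_norm = np.sqrt(b @ b)
    for _ in range(n_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = At(A(p))                # one blur and one transposed blur
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p    # next A^T A-conjugate direction
        rs = rs_new
    return x

rng = np.random.default_rng(0)
x_true = rng.standard_normal(64)
y = A(x_true)                        # noiseless measurement
x_cg = cg_normal_equations(A, At, y)
max_error = np.max(np.abs(x_cg - x_true))
```

Each iteration costs one call to `A` and one to `At`, so the same loop works unchanged for a 2-D blur or any other operator that supplies the pair.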
equations
matrix-free operator pair $A x = k * x$, $A^\top y = \tilde k * y$
gradient step $x_{t+1}=x_t-\eta\,A^\top(Ax_t-y)$
preconditioned CG ($M\approx(A^\top A)^{-1}$)
FFT one-shot $\hat x(\omega)=\hat y(\omega)\,\overline{\hat k(\omega)}/\big(|\hat k(\omega)|^2+\lambda|\hat L(\omega)|^2\big)$
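The FFT one-shot formula above can be sketched directly, assuming periodic boundaries and taking $L$ to be the discrete 5-point Laplacian. The Gaussian blur, the noise level, the value $\lambda = 10^{-2}$, and the helper names are illustrative assumptions. Setting $\lambda = 0$ recovers the naive per-frequency inverse $\hat y/\hat k$, which makes the noise amplification easy to see.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a kernel to `shape`, centre it at pixel (0, 0), and FFT it."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    k = np.outer(g, g)
    return k / k.sum()

LAPLACIAN = np.array([[0.0, -1.0, 0.0],
                      [-1.0, 4.0, -1.0],
                      [0.0, -1.0, 0.0]])

def fft_deconvolve(y, kernel, lam):
    """x_hat = y_hat * conj(k_hat) / (|k_hat|^2 + lam * |L_hat|^2), per frequency."""
    K = psf_to_otf(kernel, y.shape)
    L = psf_to_otf(LAPLACIAN, y.shape)
    X = np.fft.fft2(y) * np.conj(K) / (np.abs(K) ** 2 + lam * np.abs(L) ** 2)
    return np.real(np.fft.ifft2(X))

rng = np.random.default_rng(0)
x_true = np.zeros((64, 64))
x_true[16:48, 16:48] = 1.0                      # synthetic test image: one square
kernel = gaussian_kernel(13, 2.0)               # assumed blur
K = psf_to_otf(kernel, x_true.shape)
blurred = np.real(np.fft.ifft2(K * np.fft.fft2(x_true)))
y = blurred + 0.01 * rng.standard_normal(x_true.shape)

def rmse(estimate):
    return np.sqrt(np.mean((estimate - x_true) ** 2))

rmse_naive = rmse(fft_deconvolve(y, kernel, lam=0.0))   # plain y_hat / k_hat
rmse_reg = rmse(fft_deconvolve(y, kernel, lam=1e-2))    # regularized one-shot
```

The one-shot solve is only available when both the blur and the regularizer diagonalize in the Fourier basis; spatially varying blur or non-periodic boundaries push the problem back to the iterative solvers.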
4.3 Machine learning
fig-learned-vs-handdesigned
fig-learned-vs-handdesigned · same inverse-problem skeleton, prior swapped: classical (data-fit + hand prior $\Phi$) vs learned ($f_\theta$ fit to data) (ML)
fig-synthetic-data-pipeline
fig-synthetic-data-pipeline · manufacture (degraded, clean) training pairs by simulating the camera/degradation (ML)
reference the ML & deep-learning refresher ([[Refreshers#Machine learning and deep learning]]) up front; this chapter does **not** re-teach networks — it sets up *learning an operator from data* and the **data** that powers it. The concrete deep-network operators are the next chapter, [[Deep learning]].
equations
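learned operator $\hat I = f_\theta(\text{measurement})$ trained by $\min_\theta \sum_i \ell\!\big(f_\theta(x_i),\,y_i\big)$ over (degraded, clean) pairs; note the ML convention here, with $x_i$ the degraded input and $y_i$ the clean target, the reverse of $y = Ax$ in 4.1
reuse the regularized inverse problem $\hat I=\arg\min_I \lVert AI-b\rVert^2+\lambda\,R(I)$ (the least-squares objective of 4.1 with $I$ for $x$ and $b$ for $y$, plus a prior term $R$) to contrast a hand-tuned $R$ vs a learned one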
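The training objective and the synthetic-pair idea can be illustrated with the smallest possible learned operator: a linear 9-tap filter $f_\theta$ fit by least squares. This is a sketch under stated assumptions, not the book's pipeline: the 1-D signals, the degradation (3-tap blur plus Gaussian noise), and the names `make_pair` and `to_patches` are all hypothetical. Following the ML convention, `X` holds degraded inputs and `y` holds clean targets.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
BLUR = np.array([0.25, 0.5, 0.25])   # assumed degradation kernel
NOISE_STD = 0.1                      # assumed sensor noise level
TAPS = 9                             # size of the learned filter theta

def make_pair(n):
    """Simulate one (degraded, clean) pair: degraded = blur(clean) + noise."""
    clean = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")
    degraded = np.convolve(clean, BLUR, mode="same")
    degraded = degraded + NOISE_STD * rng.standard_normal(n)
    return degraded, clean

def to_patches(degraded, clean):
    """Each training example: a TAPS-wide degraded window -> its clean centre sample."""
    X = sliding_window_view(degraded, TAPS)      # shape (n - TAPS + 1, TAPS)
    half = TAPS // 2
    y = clean[half:len(clean) - half]
    return X, y

# "Training": theta = argmin_theta sum_i (x_i . theta - y_i)^2
X_train, y_train = to_patches(*make_pair(4096))
theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Held-out evaluation against doing nothing (the identity operator).
X_test, y_test = to_patches(*make_pair(4096))
mse_learned = np.mean((X_test @ theta - y_test) ** 2)
mse_identity = np.mean((X_test[:, TAPS // 2] - y_test) ** 2)
```

Nothing in the fit refers to the blur kernel or the noise level; the filter absorbs both from the simulated pairs, which is the sense in which the data replaces the hand-designed prior. A deep network swaps the linear `theta` for a nonlinear $f_\theta$ and the closed-form solve for stochastic gradient descent.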
4.4 Deep learning
fig-downsample-aliasing
fig-downsample-aliasing · 3 panels: the full **high-res input** zone plate (cos r², rings finer toward the edge — what's being shrunk) → ÷4 naive decimation (drop samples) → moiré aliasing → ÷4 prefilter-then-decimate (Gaussian low-pass first) → clean
fig-colorization-classical-vs-learned
fig-colorization-classical-vs-learned · Levin 2004 scribble-propagation vs Zhang 2016 fully-automatic colorization (ML)
fig-depth-anything
fig-depth-anything · ⬜ figure not yet created · one photo → its monocular depth map
fig-gan-pix2pix
fig-gan-pix2pix · paired image-to-image translation: edge map → photo (conditional GAN) (ML)
fig-metric-degradation-gallery
fig-metric-degradation-gallery · one clean photo vs four degradations (1-px shift, low-q JPEG, noise, blur), each labelled with PSNR + SSIM — PSNR and SSIM mostly agree but disagree where L2 is blind (shift scores worst PSNR yet looks identical; blur keeps high PSNR yet softens texture) (Image metrics, BASIC)
these are the **deep-network** realizations of the learned-operator framing ([[Machine learning]]); architectures and training are the Refreshers' job — here we survey *what gets learned*.
equations
minimal (survey chapter) — perceptual / feature loss $\ell_{\text{feat}}=\sum_l \lVert \phi_l(\hat I)-\phi_l(I)\rVert^2$ over deep features $\phi_l$ (LPIPS)
reuse the learned operator $\hat I = f_\theta(\text{measurement})$ from [[Machine learning]]
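The structure of the feature loss $\ell_{\text{feat}}$ can be sketched without a trained network. The block below uses a tiny stack of fixed random 3×3 filters with ReLU as a stand-in for the deep features $\phi_l$; that stand-in, the channel counts, and every function name are assumptions for illustration only. A real perceptual loss takes $\phi_l$ from a pretrained network, and LPIPS additionally normalizes the features and applies learned per-channel weights, both omitted here.

```python
import numpy as np

def conv3x3_relu(x, weights):
    """x: (C_in, H, W); weights: (C_out, C_in, 3, 3). Circular padding, then ReLU."""
    out = np.zeros((weights.shape[0],) + x.shape[1:])
    for dy in range(3):
        for dx in range(3):
            shifted = np.roll(x, (1 - dy, 1 - dx), axis=(1, 2))
            out += np.einsum("oc,chw->ohw", weights[:, :, dy, dx], shifted)
    return np.maximum(out, 0.0)

def make_stand_in_network(channels=(1, 4, 8), seed=0):
    """Fixed random filters standing in for a pretrained feature extractor."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((c_out, c_in, 3, 3)) / np.sqrt(9 * c_in)
            for c_in, c_out in zip(channels[:-1], channels[1:])]

def features(image, network):
    """Return [phi_1(I), phi_2(I), ...] for a 2-D grayscale image."""
    x = image[None, :, :]
    feats = []
    for weights in network:
        x = conv3x3_relu(x, weights)
        feats.append(x)
        x = x[:, ::2, ::2]           # crude 2x downsampling between layers
    return feats

def feature_loss(estimate, reference, network):
    """l_feat = sum over layers l of ||phi_l(estimate) - phi_l(reference)||^2."""
    return sum(np.sum((fe - fr) ** 2)
               for fe, fr in zip(features(estimate, network),
                                 features(reference, network)))

rng = np.random.default_rng(1)
network = make_stand_in_network()
image = rng.random((32, 32))
noisy = image + 0.1 * rng.standard_normal(image.shape)

loss_same = feature_loss(image, image, network)
loss_noisy = feature_loss(noisy, image, network)
```

Only the choice of `features` changes between this toy and a real perceptual loss; the sum of squared feature differences over layers stays the same.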
4.5 Generative AI and diffusion
fig-genai-evaluate-vs-sample
fig-genai-evaluate-vs-sample · the generative leap (L11): a prior you can only *evaluate* ($\Phi$/denoiser) vs one you can *sample* ($x\sim p(x)$) (GenAI)
fig-diffusion-forward-reverse
fig-diffusion-forward-reverse · the two chains: forward $q$ adds Gaussian noise to a real photo; reverse $p_\theta$ denoises back (GenAI)
fig-diffusion-demo
fig-diffusion-demo · live interactive demo (web edition): type a prompt and watch the reverse process denoise from noise, step by step; static fallback is a noise→photo filmstrip (GenAI)
fig-denoiser-as-prior-spectrum
fig-denoiser-as-prior-spectrum · one PnP/RED prior slot, ever-stronger denoisers: hand-built → classical → learned → diffusion score (GenAI / Super-res)
fig-latent-diffusion
fig-latent-diffusion · encode → diffuse in a compressed latent → decode; prompt conditions by cross-attention (Stable Diffusion) (GenAI)
fig-posterior-sampling
fig-posterior-sampling · generative priors for inverse problems: one degraded $y$ → many plausible samples $x\sim p(x\mid y)$ (GenAI)
fig-conditioning-controlnet
fig-conditioning-controlnet · one prompt + a spatial hint (edges / pose / depth) → a generated image obeying both (GenAI)
This chapter is the **generative limit** of the two before it. *Machine learning* replaced a hand-designed prior with a **learned** one (L8); *Super-resolution* showed the prior is **not optional** and that a **denoiser is a universal prior** you can plug into any solver (L10, PnP/RED, and the punchline that the **score is a denoiser**). Here that same prior becomes something you can **sample from** — a generative model — and the canonical one, **diffusion**, is literally that denoiser run in a loop. We reference the ML / deep-learning refresher ([[Refreshers#Machine learning and deep learning]]) for U-Nets, transformers, and attention; this chapter does **not** re-teach networks — it surveys the *generative* idea and where it plugs back into the book's inverse-problem spine.
equations
forward (noising) process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, $\epsilon\sim\mathcal N(0,I)$
training loss (predict the noise) $\mathbb{E}_{x_0,\epsilon,t}\big\lVert \epsilon - \epsilon_\theta(x_t, t)\big\rVert^2$
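Tweedie / score = denoiser $\hat x_0 = x_t + \sigma_t^2\,\nabla_{x_t}\log p_{\sigma_t}(x_t)$ for additive noise $x_t = x_0 + \sigma_t\,\epsilon$; in the scaled forward process above it reads $\hat x_0 = \big(x_t + (1-\bar\alpha_t)\,\nabla_{x_t}\log p_t(x_t)\big)/\sqrt{\bar\alpha_t}$. Either way the score and a Gaussian denoiser are the same object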
reverse (sampling) step — denoise a little, add a little noise, repeat from $x_T\sim\mathcal N(0,I)$ down to $x_0$
classifier-free guidance $\tilde\epsilon_\theta = \epsilon_\theta(x_t,\varnothing) + w\,\big(\epsilon_\theta(x_t,c) - \epsilon_\theta(x_t,\varnothing)\big)$
posterior sampling for an inverse problem — sample $x\sim p(x\mid y)\propto p(y\mid x)\,p(x)$ by alternating the diffusion prior step with a data-fit step against the forward model $A$ (a PnP/RED solver with a generative prior)
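Three of the equations above can be checked numerically on a toy case where everything is known in closed form. The sketch assumes 1-D Gaussian data $x_0\sim\mathcal N(\mu, s^2)$, so the noisy marginal $p_t$ and its score are exact and no network is needed. The linear $\beta$ schedule, the values of $\mu$ and $s$, and all function names are illustrative assumptions. Tweedie is used in its scaled form, $\hat x_0 = \big(x_t + (1-\bar\alpha_t)\nabla\log p_t(x_t)\big)/\sqrt{\bar\alpha_t}$, which matches the forward process with the $\sqrt{\bar\alpha_t}$ factor.

```python
import numpy as np

rng = np.random.default_rng(0)
MU, S = 2.0, 0.5                        # toy data distribution x0 ~ N(MU, S^2)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def score(xt, t):
    """Exact score of p_t = N(sqrt(ab) * MU, ab * S^2 + 1 - ab)."""
    ab = alpha_bar[t]
    return -(xt - np.sqrt(ab) * MU) / (ab * S ** 2 + 1.0 - ab)

def tweedie_x0(xt, t):
    """Denoised estimate of x0 computed from the score alone."""
    ab = alpha_bar[t]
    return (xt + (1.0 - ab) * score(xt, t)) / np.sqrt(ab)

def posterior_mean_x0(xt, t):
    """E[x0 | x_t] from Gaussian conditioning, used as the reference answer."""
    ab = alpha_bar[t]
    gain = np.sqrt(ab) * S ** 2 / (ab * S ** 2 + 1.0 - ab)
    return MU + gain * (xt - np.sqrt(ab) * MU)

def eps_from_score(xt, t):
    """The ideal noise prediction is a rescaled score."""
    return -np.sqrt(1.0 - alpha_bar[t]) * score(xt, t)

def cfg(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    return eps_uncond + w * (eps_cond - eps_uncond)

x0 = MU + S * rng.standard_normal(100_000)
eps = rng.standard_normal(x0.shape)

# Score-based denoising agrees with the exact posterior mean at a mid-chain step.
xt = q_sample(x0, 300, eps)
tweedie_gap = np.max(np.abs(tweedie_x0(xt, 300) - posterior_mean_x0(xt, 300)))

# At the end of the chain the data is almost fully replaced by N(0, 1) noise.
xT = q_sample(x0, T - 1, eps)
```

With real images the score has no closed form, and a trained $\epsilon_\theta$ plays the role of `eps_from_score`. The reverse sampler and the posterior-sampling loop are not shown here.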