Computational Photography, an AI-powered Slopendium

▸Part 5 EDGES MATTER⧉

The thread of this part is that **where the edges are** governs the edit. Edges carry most of an image's meaning, so three families of method are organised around them: **gradient-domain (Poisson) editing** reconstructs an image *from* its edges (seamless cloning, gradient-domain HDR, photomontage); **edge-preserving filtering** smooths everything *except* across edges (bilateral, guided, local Laplacian → tone mapping, detail, denoising); and **seam optimization** cuts *along* the least-noticeable edges (seam carving, graph-cut compositing). All three lean on the linear-systems and convolution machinery from earlier parts; they belong together because they share one idea — respect the edges.

▸5.1 Poisson image editing⧉

`fig-seamless-cloning` · Poisson seamless cloning (Pérez 2003), EDGES MATTER → Poisson editing: a disk of Jupiter cloned onto Earth — destination (target ring) · source patch · naive paste (visible ring) · Poisson paste (seam gone) ✅

`fig-gradient-domain-pipeline` · gradient-domain (Poisson) workflow: image → take gradients (∇f) → modify the field → integrate by solving ∇²f=div v → reconstructed (EDGES MATTER → Poisson editing) ✅

`fig-mixing-gradients` · mixing gradients — keep the per-pixel *stronger* gradient so destination texture shows through holey / transparent inserts (naive vs max-gradient paste) (Poisson editing) ✅

⬜ figure not yet created

`fig-chong-colorspace-results` — results from Chong, Gortler & Zickler 2008 [placeholder]

`fig-poisson-vs-laplacian-blend` · Poisson vs Laplacian-pyramid blending, contrasted on the DC: 1-D three cases (matched level — both agree; mismatched pedestal — Poisson re-lights, pyramid leaves a halo; constant — Poisson ramps/dissolves, pyramid keeps a feathered plateau) + 2-D (paste a constant square — Poisson dissolves it via the harmonic membrane, pyramid keeps a feathered version) ✅

`fig-fourier-lowpass-highpass` · low-pass vs high-pass a real photo by masking its spectrum and inverse-transforming: keep center → blurred, drop center → edges only (convolution = mult in frequency, by hand) (Reading an image's Fourier transform) ✅

equations

the **Poisson equation** $\nabla^2 f = \operatorname{div}\mathbf{v}$ (reconstruct the field $f$ whose gradient best matches guidance $\mathbf{v}$)

the **discrete Laplacian** system $4f_{p}-\sum_{q\in N(p)} f_{q}=\sum_{q\in N(p)} v_{pq}$ (one sparse equation per pixel $p$ over its 4-neighbours $N(p)$; for cloning, $v_{pq}=g_p-g_q$ is the source's difference across the edge $pq$)

the **seamless-cloning Dirichlet boundary condition** $f|_{\partial\Omega}=f^{*}|_{\partial\Omega}$ (interior gradients from the source, boundary values pinned to the destination $f^{*}$)

▸5.1.1 The key idea: edit gradients, not pixels⧉

• the **key idea**: the eye cares about **gradients** (local contrast), not absolute values — so edit the **gradient field** $\mathbf{v}$ and reconstruct the image $f$ whose gradients best match it. A field rarely has an exact integral (it is generally not a true gradient — see the conservative-fields sidebar), so it is a **least-squares** reconstruction $\min_f \int \lVert \nabla f - \mathbf{v}\rVert^2$, whose normal equations are the **Poisson equation** $\nabla^2 f = \operatorname{div}\mathbf{v}$ — i.e. exactly the inverse-problem machinery of [[Linear Inverse Problems and Regression]], applied to gradients.

• 💡 **Big lesson:** *the eye cares about gradients (local contrast), not absolute values* — so edit the gradient field and reconstruct. A mismatched absolute level (a paste's lighting/color shift, an HDR scene's enormous range) is invisible once the *gradients* are right; this is the same reason lightness constancy works and the visual system encodes differences (→ register/first appearance: [[Big Lessons#L9]]).

▸5.1.2 Seamless cloning: the headline application⧉

• **seamless cloning** (the headline app, Pérez 2003): to paste a region $\Omega$ without a visible seam, copy the source's **gradients** $\mathbf{v}=\nabla g$, not its pixels, and pin the **boundary** to the destination's values, $f|_{\partial\Omega}=f^{*}|_{\partial\Omega}$ (a **Dirichlet** condition); solving $\nabla^2 f=\operatorname{div}\mathbf{v}$ fills an interior that flows smoothly out of the destination at the seam, so lighting/color mismatches are absorbed and the paste blends invisibly. `fig-seamless-cloning`
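A minimal sketch of the cloning solve, assuming a single channel, a mask that stays off the image border, and the difference convention $v_{pq}=g_p-g_q$. The function name `poisson_clone_jacobi` and the toy data are illustrative, not from the source. It uses plain Jacobi sweeps, the slowest solver, chosen only because every step is visible:

```python
import numpy as np

def poisson_clone_jacobi(dest, src, mask, n_iter=2000):
    """Seamless cloning of one channel by plain Jacobi sweeps.

    dest, src : 2-D float arrays of equal shape.
    mask      : boolean array, True inside Omega; assumed not to touch the image border.
    """
    f = dest.astype(float).copy()
    g = src.astype(float)

    def neighbour_sum(a):
        return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                np.roll(a, 1, 1) + np.roll(a, -1, 1))

    # Right-hand side sum_q v_pq with v_pq = g_p - g_q.
    rhs = 4.0 * g - neighbour_sum(g)
    for _ in range(n_iter):
        # Update only inside Omega; everything else keeps the destination
        # value, which is what enforces the Dirichlet boundary condition.
        f = np.where(mask, 0.25 * (neighbour_sum(f) + rhs), f)
    return f

# Toy check: a source that equals the destination plus a pedestal and a bump.
yy, xx = np.mgrid[0:16, 0:16]
dest = 0.01 * yy + 0.02 * xx
bump = np.zeros((16, 16))
bump[6:10, 6:10] = 0.3
src = dest + 0.5 + bump
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
out = poisson_clone_jacobi(dest, src, mask)
```

Because only gradients are copied, the 0.5 pedestal of the toy source disappears and the bump survives on top of the destination.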

▸5.1.3 Reconstruction = solving the Poisson equation⧉

• the discrete **Laplacian** (a convolution kernel — the 5-point stencil) turns $\nabla^2 f = \operatorname{div}\mathbf{v}$ into **one equation per pixel**, $4f_p-\sum_{q\in N(p)}f_q=\sum_{q\in N(p)} v_{pq}$ over the 4-neighbours $N(p)$ — a **large, sparse, structured** linear system. **Boundary conditions** close it: **Dirichlet** (fixed boundary $f|_{\partial\Omega}=f^{*}|_{\partial\Omega}$) for cloning; **Neumann** (zero normal derivative) when integrating a whole image with no enclosing destination.

• pseudocode block: the transparent **Jacobi / Gauss-Seidel** iterate-to-convergence solve, in which each pixel ← ¼·(sum of 4 neighbours + the right-hand side $\sum_{q} v_{pq}$, which is *minus* the discrete divergence of $\mathbf{v}$), swept until settled, boundary held fixed. This is the slow baseline that the CG / multigrid / FFT solvers accelerate.

• the **Fourier-domain view (one FFT divide)**: gradient and Laplacian are **convolutions** (linear shift-invariant), so on a rectangular domain with periodic boundaries they are **diagonal in the Fourier basis**: every frequency decouples and the whole Poisson equation $\nabla^2 f=\operatorname{div}\mathbf{v}$ solves in **one per-frequency division**, $\hat f(\omega)=\widehat{\operatorname{div}\mathbf{v}}(\omega)/\widehat{\nabla^2}(\omega)$ (dividing by the Laplacian's spectrum $-(\omega_x^2+\omega_y^2)$; a **DCT** variant handles Neumann/free boundary conditions cleanly). This is the **same "diagonalize when you can" (L5) FFT solver** as in deblurring; see [[Efficient solvers]] / [[Computational tools]] and [[Linearity, Fourier, Aliasing and deblurring]]. Subtlety: the **DC / mean is undetermined**. The Laplacian's spectrum is **zero at $\omega=0$** (its zero eigenvalue), so gradients pin down everything **but the overall offset**, which must be fixed by a boundary value or a reference. [`fig-poisson-fourier`: the Poisson solve as a per-frequency division, dividing by the Laplacian's spectrum $-(\omega_x^2+\omega_y^2)$]
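The per-frequency division can be sketched as follows, assuming periodic boundaries and a discrete guidance field built from forward differences. The name `poisson_solve_fft` is illustrative. The code divides by the spectrum of the discrete 5-point Laplacian, which approaches the continuous $-(\omega_x^2+\omega_y^2)$ at low frequency, and fixes the undetermined DC by choosing zero mean:

```python
import numpy as np

def poisson_solve_fft(vx, vy):
    """Least-squares integrate a guidance field (vx, vy) on a periodic grid.

    Assumes vx, vy are forward differences; the divergence below uses backward
    differences, so the implied Laplacian is the 5-point stencil.
    """
    div = (vx - np.roll(vx, 1, axis=1)) + (vy - np.roll(vy, 1, axis=0))
    h, w = div.shape
    wy = 2.0 * np.pi * np.fft.fftfreq(h)[:, None]
    wx = 2.0 * np.pi * np.fft.fftfreq(w)[None, :]
    # Spectrum of the 5-point Laplacian: (2 cos w - 2) per axis.
    lap = (2.0 * np.cos(wx) - 2.0) + (2.0 * np.cos(wy) - 2.0)
    lap[0, 0] = 1.0                 # avoid 0/0 at the DC term
    f_hat = np.fft.fft2(div) / lap
    f_hat[0, 0] = 0.0               # DC is undetermined: pick zero mean
    return np.real(np.fft.ifft2(f_hat))

# Round trip: differentiate a random image, then integrate it back.
rng = np.random.default_rng(0)
f0 = rng.random((32, 48))
vx = np.roll(f0, -1, axis=1) - f0
vy = np.roll(f0, -1, axis=0) - f0
rec = poisson_solve_fft(vx, vy)
```

The round trip recovers the image only up to its mean, which is the DC subtlety in action.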

#### The same solver as the regression chapter

• **same solver as the regression chapter**: don't form the matrix — use **conjugate gradient** (with a preconditioner), **multigrid** (coarse-to-fine — ties to [[Linear pyramids and wavelets]]), or an **FFT solver** on a rectangular periodic/Neumann domain (the Laplacian **diagonalizes in Fourier** — ties to [[Linearity, Fourier, Aliasing and deblurring]]). This is the *same* machinery as [[Linear Inverse Problems and Regression]], colorization, and matting.

#### Which space to integrate in: linear vs. gamma vs. log

• **which space to integrate in (linear vs gamma vs log)**: integrate in a space where equal *gradients* mean equal *perceived* contrast. Plain **linear** light over-weights highlights and lets bright regions dominate the seam; **log** (or a perceptual encoding) makes a ratio a fixed gradient, which matches the eye and keeps shadow detail — so for tone/exposure-mismatched clones and gradient-domain HDR we integrate in **log**, then map back. We state the chosen encoding explicitly per edit (the encoding-clarity rule).
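A tiny arithmetic check of the "a ratio becomes a fixed gradient" claim, using made-up linear-light samples and a hypothetical two-stop exposure change:

```python
import numpy as np

scene = np.array([0.02, 0.04, 0.08, 0.40, 0.80])   # made-up linear-light samples
brighter = 4.0 * scene                              # same scene, two stops up

# Finite-difference gradients in linear light and in log.
lin_a, lin_b = np.diff(scene), np.diff(brighter)
log_a, log_b = np.diff(np.log(scene)), np.diff(np.log(brighter))
```

In linear light the brighter copy has gradients four times larger; in log the two gradient fields coincide, because the exposure factor turns into an additive constant that differencing removes.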

▸5.1.4 Poisson vs. pyramid blending⧉

• the **membrane** lens on the two seam-hiders: **Poisson** = gradient-domain + pinned boundary → copies *gradients* and **discards the source DC** (re-set by the boundary; adding a constant changes nothing); **Laplacian-pyramid blend** ([[Linear pyramids and wavelets]]; Burt–Adelson spline) = cross-fade *values* band-by-band → **keeps the DC**, feathered by a coarse-wide mask

• discriminator — **paste a constant**: Poisson collapses to the bare **membrane** (Laplace, boundary = destination) and the constant *vanishes* (a line in 1-D, a soap film in 2-D); the pyramid's coarse band **copies the DC**, so a feathered constant *survives*. This is **L9** (gradients, not absolute values) seen from both sides

• upshot: Poisson **re-lights** the insert (seamless, but bleeds / erases flat regions); the pyramid **preserves** appearance and only dissolves the seam (robust, solver-free, but a true brightness mismatch lingers as a halo). Cloning → Poisson; panorama → pyramid

• [`fig-poisson-vs-laplacian-blend`: 1-D textured + constant source, and a 2-D constant square — Poisson vs pyramid]
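The 1-D "paste a constant" discriminator is small enough to solve directly. A minimal sketch, assuming `n` interior samples pinned between two destination values; the helper name `poisson_1d` and the numbers are illustrative:

```python
import numpy as np

def poisson_1d(left, right, v):
    """Solve f[i+1] - 2 f[i] + f[i-1] = v[i] - v[i-1] with both ends pinned.

    v holds the n+1 guidance differences between consecutive samples,
    boundary samples included; the return value is the n interior samples.
    """
    v = np.asarray(v, dtype=float)
    n = len(v) - 1
    A = (np.diag(-2.0 * np.ones(n)) +
         np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
    b = np.diff(v)              # divergence of the guidance field
    b[0] -= left                # move the pinned end values to the right-hand side
    b[-1] -= right
    return np.linalg.solve(A, b)

left, right = 0.2, 0.8
flat = poisson_1d(left, right, np.zeros(5))        # constant source: zero guidance
g = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])       # textured source
a = poisson_1d(left, right, np.diff(g))
b = poisson_1d(left, right, np.diff(g + 100.0))    # same texture on a huge pedestal
```

With zero guidance the solve returns the straight line between the two boundary values (the 1-D membrane), and adding a pedestal to the textured source leaves the result unchanged.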

▸5.1.5 Advanced techniques⧉

#### Mixing gradients

• **mixing gradients** (Pérez 2003): per pixel, keep the **stronger** of the source's and destination's gradient, $\mathbf{v}=\nabla f^{*}$ where $\lVert\nabla f^{*}\rVert>\lVert\nabla g\rVert$ and $\mathbf{v}=\nabla g$ otherwise, before solving. This lets you insert **holey / lacy objects**, **transparent overlays**, or **text onto texture** so the destination's texture shows through the gaps. `fig-mixing-gradients`

• **interactive demo** (`fig-poisson-blend`): a seamless-cloning playground — upload/pick source+target, paint Ω, drag/scale the source, solve per-channel by SOR; switch color space (**linear / gamma / Chong log**) and toggle **mixed gradients**, with live source/dest/∇/reconstructed-∇/residual panels + a scanline. Standalone in `interactive-sandbox/poisson-blend/` (iframe-embedded, static PNG fallback), like the [[photo101-simulators]]. Ties together seamless cloning, the linear-vs-log encoding choice, and mixing gradients.
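The per-pixel choice of the stronger gradient can be sketched as below, assuming forward differences with a zero gradient at the last row and column. The function name `mixed_gradient_field` is illustrative; the resulting field would then go to whichever Poisson solver is in use:

```python
import numpy as np

def mixed_gradient_field(src, dest):
    """Per pixel, keep whichever of the two gradients has the larger magnitude."""
    def grad(img):
        gx = np.diff(img, axis=1, append=img[:, -1:])   # zero at the last column
        gy = np.diff(img, axis=0, append=img[-1:, :])   # zero at the last row
        return gx, gy

    sx, sy = grad(src.astype(float))
    dx, dy = grad(dest.astype(float))
    keep_dest = np.hypot(dx, dy) > np.hypot(sx, sy)
    return np.where(keep_dest, dx, sx), np.where(keep_dest, dy, sy)

# Toy data: a flat source with one bright stroke, over a destination ramp.
src = np.zeros((4, 4))
src[:, 2] = 1.0
dest = np.tile(0.1 * np.arange(4.0), (4, 1))
vx, vy = mixed_gradient_field(src, dest)
```

Where the source is flat the destination's ramp gradient shows through, and where the stroke is stronger the source's gradient wins.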

#### Illumination invariance, pixel encoding and color space

• **preferred solution = Chong's color space (Chong, Gortler & Zickler 2008)**: remind the reader of the transform from **RGB** (or **XYZ**) into Chong's space; because it is a **log** space it is **illumination-invariant** — the multiplicative shading becomes an **additive offset** that gradient-domain editing simply ignores, so equal gradients mean equal perceived contrast regardless of lighting. `fig-chong-colorspace-results` (results from their paper)

• **healing brush = a covariant derivative (Georgiev 2004), folded in**: Photoshop's healing brush is gradient-domain cloning made *multiplicative* — Georgiev replaces the plain gradient with a **covariant derivative** (work on $\log f$, transport gradients as ratios) so a patch matches the destination's *illumination*, not its absolute brightness — the same log/illumination-invariance point baked into the operator. Note the historical order: **Photoshop's healing brush came first (2002); Pérez et al. (2003) were inspired by it** and formalized the gradient-domain view.

#### Conservative-gradient simplification: the membrane shortcut

• **conservative-gradient simplification (Sylvain Paris / Farbman et al. 2009; Bhat 2010)**: when the guidance field is (or is made) **conservative**, the expensive Poisson solve collapses to adding a smooth **membrane** — solve $\nabla^2 u = 0$ (a Laplace, not Poisson, problem) for the boundary-mismatch correction $u$ and add it to the naive paste. Far cheaper, and the route to **real-time** gradient-domain editing.
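A short derivation of why the shortcut holds, assuming the guidance is the source's own gradient, $\mathbf{v}=\nabla g$, and writing the clone as the naive paste plus a correction $u$:

$$f = g + u \quad\text{in } \Omega$$

$$\nabla^2 f = \operatorname{div}\nabla g = \nabla^2 g \;\Longrightarrow\; \nabla^2 u = 0 \quad\text{in } \Omega$$

$$f|_{\partial\Omega}=f^{*}|_{\partial\Omega} \;\Longrightarrow\; u|_{\partial\Omega} = (f^{*}-g)|_{\partial\Omega}$$

So $u$ is the harmonic interpolant of the boundary mismatch $f^{*}-g$, and it depends on the source only through its values on $\partial\Omega$.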

#### Other gradient-domain edits

• **section framing — "cover your tracks"**: Poisson is the great tool for **hiding the evidence of an edit** — make any *local* edit blend so it leaves **no artifact at its boundary** (the destination flows smoothly out of the seam). Whatever you did inside the region, the gradient-domain reconstruction makes the join invisible.

• **panorama / seam blending**: stitch overlapping images by Poisson-blending across the chosen seam so exposure/color mismatches vanish at the join → forward-ref to the panorama / stitching part (where the seam is *found*; here it is *hidden*).

• **other gradient-domain edits**: HDR **tone compression** (attenuate large gradients, then integrate — Fattal 2002; integrate in log); **photomontage** (stitch many photos by choosing gradients + seams — Agarwala 2004); shadow / reflection removal; flash–no-flash fusion. All change only the **guidance field** $\mathbf{v}$ and reuse the same solver.

▸5.2 Bilateral filtering⧉

▸5.2.1 Motivation — local tone-mapping haloes⧉

**Motivation — local tone mapping haloes.** `fig-tonemap-halo`

• the setup (craft callback): HDR **local tone mapping** compresses range by a **two-scale decomposition** — split into a large-scale **base** (the overall light) and a fine **detail** layer, compress the base, keep the detail, recombine (the *best-of-both* move, [[Style Reference]] rule 5)

• do it with a **plain Gaussian** base and it **bleeds across edges**: at a bright-sky / dark-mountain boundary the blur averages sky into the mountain, so after compressing the base a **bright halo** glows along the silhouette — the classic tone-mapping artefact

• *we want* a base layer that is smooth **within** a region but **stops at edges** — exactly an edge-preserving filter

• other motivations (state plainly, don't list-open): edits that must **not bleed past edges** — denoise without smudging detail; sharpen / abstract for stylization; build selection masks. Cross-link to [[#Compositing, segmentation and matting]] and stylization (NPR).
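The halo can be reproduced on a single scanline. A minimal sketch, assuming log-luminance input and a Gaussian base; the helper names and the step values are illustrative:

```python
import numpy as np

def gaussian_blur_1d(x, sigma):
    """Gaussian blur truncated at 3 sigma, with edge replication."""
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    return np.convolve(np.pad(x, r, mode="edge"), k, mode="valid")

def two_scale_tonemap(log_lum, sigma, compression):
    """Split into base + detail, compress only the base, recombine."""
    base = gaussian_blur_1d(log_lum, sigma)
    detail = log_lum - base
    return compression * base + detail

# Dark mountain (log value 0) next to bright sky (log value 4).
log_lum = np.concatenate([np.zeros(50), 4.0 * np.ones(50)])
out = two_scale_tonemap(log_lum, sigma=5.0, compression=0.25)
```

Far from the edge the sky settles at its compressed level of 1, but next to the silhouette the output overshoots well above that level and undershoots below zero on the mountain side: the halo that an edge-stopping base is meant to remove.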

▸5.2.2 Bilateral filter⧉

• **start from Gaussian blur** (name-then-use): the output is a weighted average of neighbours, weight = a spatial Gaussian $g_s(\lVert p-q\rVert)$ — close pixels count more. Illustrate on a **1-D scanline** = one row of the image. `fig-bilateral-1d`

• **sidebar: debugging the bilateral** (recipe moved here from Developing, testing & debugging): crank the **range width $\sigma_r$ very large** → the range term drops out and the bilateral must collapse to a **plain Gaussian** (compare against your already-tested Gaussian); shrink **$\sigma_r$ very small** → pixels only average with near-identical neighbours, so **edges stay untouched** (run it on the half-plane / rectangle). Two extreme settings, two predictable outcomes that bracket the whole filter. `fig-bilateral-limits`

• **the problem**: near an edge the Gaussian **gathers information from across the boundary** — it averages the dark side into the bright side, smearing the step (the same bleed that made the halo)

• **the fix — a range weight**: multiply in a *second* Gaussian on the **value difference**, $g_r(\lvert I_p-I_q\rvert)$, that **down-weights pixels too different in value**. A neighbour that is close *in space* but far *in intensity* (across the edge) gets almost zero weight.

• **the bilateral weight** (Tomasi & Manduchi 1998): $w(p,q)=g_s(\lVert p-q\rVert)\,g_r(\lvert I_p-I_q\rvert)$, and the output is the normalized average $I^{\mathrm{bf}}_p=\frac1{W_p}\sum_q w(p,q)\,I_q$ with $W_p=\sum_q w(p,q)$. Read it back in words ([[Style Reference]] rule 3): *"average my neighbours, but only those that are both nearby and similar to me."* Two widths: $\sigma_s$ in pixels (how big a region), $\sigma_r$ in intensity units (how different is "too different").

• pseudocode block (from the slides): the brute-force double loop. Precompute the **spatial** weight $g_s$ once, **recompute the range weight $g_r$ per pair** (it depends on the pair's value difference, so it can't be folded into a fixed kernel or reused from convolution code), accumulate $\sum w\,I_q$ and $\sum w$, divide per pixel; truncate only the spatial Gaussian; RGB 3-D distance for color.

• **1-D demo**: a noisy **step edge** → Gaussian smears the step; the bilateral **denoises each plateau but keeps the step crisp**, because the weight collapses to one side at the edge. Then the **2-D image**: texture smoothed, edges held. `fig-bilateral-1d`, `fig-bilateral-2d`

• **the affinity reading** (the conceptual payoff): the range Gaussian *is* a similarity / **affinity** between two pixels — "how much do they belong together" — computed from their value difference. Every later method in this chapter is a variation on how to choose that affinity.

💡 **Big lesson:** edge-preserving = affinity — use the **color / intensity difference** between two pixels as how much they *belong together* (an **affinity**), then average (or connect) pixels in proportion to it. The bilateral's range weight $g_r$ is the first instance; the *same* affinity drives the bilateral grid, the guided filter, NL-means, colorization, matting and segmentation. *(L4 · first full edge-preserving treatment of the principle; registered in BASIC → Denoising, fully developed here.)*
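A minimal grayscale sketch of the brute-force bilateral, assuming edge replication at the borders and a spatial window truncated at $3\sigma_s$. The inner loop over neighbours is vectorized over the window, and the function name `bilateral_bruteforce` is illustrative:

```python
import numpy as np

def bilateral_bruteforce(img, sigma_s, sigma_r):
    img = img.astype(float)
    r = int(3 * sigma_s)                       # truncate only the spatial Gaussian
    ax = np.arange(-r, r + 1)
    # Spatial weight g_s: computed once and reused for every pixel.
    g_s = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma_s ** 2))
    pad = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = pad[y:y + 2 * r + 1, x:x + 2 * r + 1]
            # Range weight g_r: depends on the centre value, so recomputed here.
            g_r = np.exp(-(window - img[y, x]) ** 2 / (2 * sigma_r ** 2))
            wgt = g_s * g_r
            out[y, x] = (wgt * window).sum() / wgt.sum()   # normalize by W_p
    return out

rng = np.random.default_rng(0)
step = np.zeros((12, 12))
step[:, 6:] = 1.0
noisy = step + 0.02 * rng.standard_normal(step.shape)
smooth = bilateral_bruteforce(noisy, sigma_s=2.0, sigma_r=0.1)
identity = bilateral_bruteforce(noisy, sigma_s=2.0, sigma_r=1e-6)
blurred = bilateral_bruteforce(step, sigma_s=2.0, sigma_r=1e6)
```

The three calls cover the middle setting and the two extremes: a moderate $\sigma_r$ denoises each plateau and keeps the step, a tiny $\sigma_r$ returns the input, and a huge $\sigma_r$ smears the step like a plain Gaussian.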

#### Robustness for values with few neighbours

• **the failure**: a pixel whose value is **rare** (few neighbours within $\sigma_r$) has a tiny, noisy normalizer $W_p$ — the standard normalized bilateral becomes unstable / over-trusts the few similar pixels, and the Durand–Dorsey normalization is, strictly, **wrong** there

• **the fix — unnormalized / delta bilateral** (Aubry et al. 2014): express the output as a **delta from the input** rather than a fresh normalized average — $I^{\mathrm{bf}}_p=I_p+\sum_q w(p,q)\,(I_q-I_p)$ — and **drop the $1/W_p$ division**

• **why it works (intuition)**: each neighbour contributes a *correction* $w\,(I_q-I_p)$ pulling $I_p$ toward that neighbour's value; when few neighbours match, those corrections are few and weak, so you make a **small correction** instead of dividing by a near-zero $W_p$ and amplifying noise, and a lonely value stays close to itself rather than blowing up

• ties to **local Laplacian filters** (Paris / Aubry): the unnormalized form is the bilateral cousin of the multiscale local Laplacian — see [[#Local Laplacian filters]] (and [[Linear pyramids and wavelets]])

#### Debugging — hand-checkable limits

• **large range width** ($\sigma_r\to\infty$): $g_r\to1$, the range weight does nothing → the bilateral **collapses to a plain Gaussian** blur

• **tiny range width** ($\sigma_r\to0$): only the pixel itself survives → output ≈ **identity**, edges (and everything) untouched

• these two limits are an **easy correctness test** for an implementation; a middle $\sigma_r$ interpolates between them

• also useful for tuning: pick $\sigma_s$ from the spatial scale of structure to remove, $\sigma_r$ from the contrast of edges to preserve (state the encoding first — see meta)

▸5.2.3 Cross / joint bilateral⧉

• **the twist**: read the **range weight from a *different* (guide) image** than the one being averaged — $g_r(\lvert J_p-J_q\rvert)$ with guide $J$, while still averaging the target $I$. The affinity comes from a *cleaner* signal.

• **why**: when the target is too noisy / dark for its own edges to be trusted, borrow edges from a **better-exposed guide**

• **flash / no-flash** (the headline app): smooth the noisy **no-flash** ambient image using **flash-image** edges as the guide — keeps the mood, kills the noise. Forward-ref → [[#Illumination related effects in a single image]] / flash–no-flash (Computational illumination). `fig-crossbilateral`

• a special case of the general **joint / guided** idea (→ guided filter, below)
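The only change from the plain bilateral is where the range weight is read from. A 1-D sketch, assuming an untruncated spatial Gaussian over the whole row; the function name `cross_bilateral_1d` and the toy flash / no-flash data are illustrative:

```python
import numpy as np

def cross_bilateral_1d(target, guide, sigma_s, sigma_r):
    """Average `target`, but take the range (affinity) weight from `guide`."""
    target = np.asarray(target, dtype=float)
    guide = np.asarray(guide, dtype=float)
    idx = np.arange(len(target))
    out = np.empty(len(target))
    for p in idx:
        w = (np.exp(-(idx - p) ** 2 / (2 * sigma_s ** 2)) *
             np.exp(-(guide - guide[p]) ** 2 / (2 * sigma_r ** 2)))
        out[p] = (w * target).sum() / w.sum()
    return out

rng = np.random.default_rng(1)
guide = np.repeat([0.0, 1.0], 20)                       # clean, well-exposed edge
target = 0.5 * guide + 0.1 * rng.standard_normal(40)    # dim, noisy version
out = cross_bilateral_1d(target, guide, sigma_s=4.0, sigma_r=0.1)
```

The output keeps the target's own levels (0 and 0.5) while the edge location comes from the guide, so the noise is averaged away without smearing the step.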

▸5.2.4 Joint / cross bilateral upsampling⧉

• **the idea (Kopf et al. 2007)**: compute an **expensive** operation at **low resolution** — tone mapping, depth, stereo, colorization, a graph-cut label map — then **upsample it to full resolution guided by the full-res image**

• **how**: it's a cross/joint bilateral where the **range weight is read from the high-res guide** (the original image), so the cheap low-res solution snaps to the guide's edges and **stays sharp** — no blurry haloes across boundaries

• **why it matters**: a heavy solver runs on a tiny image, edges come (for free) from the big one — best-of-both, and the conceptual ancestor of HDRnet's learned edge-aware upsampling (below)

▸5.2.5 Bilateral grid⧉

• **the idea** (Jiawen Chen's SIGGRAPH paper): turn the **nonlinear 2-D** bilateral into a **linear 3-D** filter by adding the **range / value as a third axis** — lift $(x,y)\to(x,y,I)$ into a coarse 3-D **grid**. `fig-bilateral-grid`

• **splat → blur → slice**: *splat* pixels into grid cells (accumulate value + weight); *blur* with a **plain separable 3-D Gaussian** (now an ordinary convolution — edges are automatic because dissimilar values live in different grid layers); *slice* back out by interpolating at each pixel's $(x,y,I)$

• pseudocode block (from the slides): **splat / blur / slice** as code — accumulate value and weight into cell $(x/\sigma_s,\,y/\sigma_s,\,I/\sigma_r)$; separable 3-D Gaussian on **both** grid and weight; slice by trilinear lookup at each pixel's $(x,y,I)$ and divide value by weight (that division *is* the bilateral normalizer $1/W_p$).

• **why it's fast**: the grid is **coarse** (downsampled in space and range), so it runs real-time on a GPU; ties to [[#Differentiable image pipelines and algorithm optimization (Halide)]] (Halide schedules exactly this pipeline)

• connects to the affinity lesson: the third axis *is* the affinity made geometric — pixels far apart in value are far apart in the grid

• **what the grid unlocks** (uses): (1) **cross / joint bilateral filtering** — splat the target, read the range axis from a guide; (2) **local histogram equalization** — each grid cell already stores a **local histogram** (the per-value counts splatted into it), so you can equalize / manipulate tone *locally* straight off the grid.
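Splat / blur / slice can be sketched on a 1-D signal, where the grid is only 2-D (position × value). Assumptions: nearest-cell splatting, a single `[1, 2, 1]/4` pass per axis standing in for the Gaussian, and bilinear slicing. The name `bilateral_grid_1d` is illustrative:

```python
import numpy as np

def bilateral_grid_1d(signal, sigma_s, sigma_r):
    signal = np.asarray(signal, dtype=float)
    # Grid coordinates of every sample, shifted by one empty padding cell.
    cx = np.arange(len(signal)) / sigma_s + 1.0
    ci = (signal - signal.min()) / sigma_r + 1.0
    shape = (int(cx.max()) + 3, int(ci.max()) + 3)
    val, wgt = np.zeros(shape), np.zeros(shape)

    # Splat: accumulate value and weight into the nearest cell.
    ix, ii = np.rint(cx).astype(int), np.rint(ci).astype(int)
    np.add.at(val, (ix, ii), signal)
    np.add.at(wgt, (ix, ii), 1.0)

    # Blur: the same separable linear filter on BOTH value and weight.
    def blur(g):
        for axis in (0, 1):
            g = 0.25 * np.roll(g, 1, axis) + 0.5 * g + 0.25 * np.roll(g, -1, axis)
        return g
    val, wgt = blur(val), blur(wgt)

    # Slice: bilinear lookup at each sample's own (x, I), then normalize.
    x0, i0 = np.floor(cx).astype(int), np.floor(ci).astype(int)
    fx, fi = cx - x0, ci - i0
    def lookup(g):
        return ((1 - fx) * (1 - fi) * g[x0, i0] + fx * (1 - fi) * g[x0 + 1, i0] +
                (1 - fx) * fi * g[x0, i0 + 1] + fx * fi * g[x0 + 1, i0 + 1])
    return lookup(val) / lookup(wgt)       # the division is the 1/W_p normalizer

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0], 50)
noisy = clean + 0.05 * rng.standard_normal(100)
out = bilateral_grid_1d(noisy, sigma_s=4.0, sigma_r=0.25)
```

The two plateaus land in grid layers several cells apart along the value axis, so the blur never mixes them and the step survives without any explicit range weight.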

▸5.2.6 Bilateral-grid learning (HDRnet)⧉

• **learn the grid**: a CNN predicts, at **low resolution**, the *coefficients* of a per-cell affine transform stored in a **bilateral grid**; slice + apply at **full resolution** → real-time learned enhancement on a phone

• the bilateral grid becomes a **learned, edge-aware upsampling** structure — best-of-both: a heavy network on a tiny image, edge-aware detail on the big one

• ties to L8 (a learned operator swaps a hand-designed prior for a learned one) — a one-line callback here, not a full box; xref ML / [[Machine learning]]

▸5.2.7 Edge-respecting selection brush⧉

• **same affinity, made interactive**: the affinity that stops the bilateral from averaging across edges also powers an **edge-respecting selection / enhancement brush** — a manual local-edit brush whose painted effect **does not bleed across edges** (it follows the same value/color affinity)

• so it is the **interactive cousin of edge-preserving filtering**: where the bilateral decides automatically "these pixels belong together," the brush lets a user *paint* an edit that respects exactly the same boundaries — dodge/burn, local tone or color tweaks that stop at the object's edge.

▸5.3 Locally adaptive regression kernel (LARK)⧉

• **the view**: treat denoising / upsampling as **local regression** — fit a smooth function to nearby pixels, weighted by a kernel

• **steering / adaptive kernels**: shape the kernel to the **local image structure** — estimate the local gradient, then **stretch the kernel *along* edges and squeeze it *across* them** (an anisotropic, data-dependent weight) so it averages along the edge, not across it

• so LARK is the **regression flavour** of the same affinity idea: the kernel geometry is *derived from* local structure, generalizing the bilateral's scalar range weight to an oriented one

• also a **descriptor**: LARK signatures are used for detection / matching (Seo & Milanfar) — note in passing, not developed

▸5.4 NL-means (non-local means)⧉

• **the leap — compare patches, not pixels**: the bilateral's affinity uses a single pixel's value; NL-means measures similarity between **small patches** around two pixels, so it matches **texture / structure**, not just brightness. `fig-nlmeans-patches`

• **non-local**: the search is **not limited to a spatial neighbourhood** — any patch anywhere in the image (a repeated texture, another brick, another eye) can vote; the spatial Gaussian is dropped or relaxed

• **patch distance & weight**: $d^2(p,q)=\sum_k g_a(k)\,\lVert I(p+k)-I(q+k)\rVert^2$ over patch offsets $k$ (optionally Gaussian-weighted $g_a$), weight $\propto e^{-d^2/(2\sigma^2)}$; average the **centre** pixels of the matching patches

• **why it denoises well**: a clean signal repeats, noise does not — so averaging many similar patches cancels noise while preserving structure

• the affinity reading again (now a patch-space affinity); the direct precursor to **BM3D** and to the self-similarity priors used in learned denoising (forward-ref ML)

• **BM3D** (Dabov et al. 2007): the high-water mark of the patch / self-similarity idea, and for years the classical denoising **gold standard**. Where NL-means *averages* similar patches, BM3D goes further in two moves: (1) **block-matching** groups the most-similar patches into a **3-D stack**; (2) **collaborative filtering** applies a 3-D transform to that stack and **jointly shrinks** the coefficients (hard-threshold, then a Wiener pass using a first-pass estimate), exploiting the fact that aligned similar patches are *sparse together*, far more so than any one patch alone. Aggregate the filtered patches back (each group weighted inversely to how many coefficients survived the threshold, so sparser groups count more) and the result beats plain NL-means. Same lesson as the whole chapter, **find what belongs together (here, similar patches) and process them jointly**, pushed from a weighted average to a sparse transform-domain shrinkage; the bridge from bilateral/NL-means affinity to sparsity/dictionary and learned self-similarity denoisers (forward-ref [[Deep learning]]).
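A brute-force 1-D sketch of plain NL-means (not BM3D), assuming uniform patch weights, a search over the whole signal, and a mean rather than a sum in the patch distance. The name `nl_means_1d` and the parameter `h` for the weight bandwidth are illustrative:

```python
import numpy as np

def nl_means_1d(signal, patch_radius, h):
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    pad = np.pad(signal, patch_radius, mode="reflect")
    # Row p holds the patch centred on sample p.
    patches = np.stack([pad[p:p + 2 * patch_radius + 1] for p in range(n)])
    out = np.empty(n)
    for p in range(n):
        # Patch-to-patch distance against every patch in the signal (non-local).
        d2 = ((patches - patches[p]) ** 2).mean(axis=1)
        w = np.exp(-d2 / (2 * h ** 2))
        out[p] = (w * signal).sum() / w.sum()   # average the centre samples
    return out

rng = np.random.default_rng(0)
clean = np.tile(np.repeat([0.0, 1.0], 10), 10)   # repeating square wave
noisy = clean + 0.1 * rng.standard_normal(clean.size)
den = nl_means_1d(noisy, patch_radius=3, h=0.15)
```

Every period of the square wave contributes matching patches, so each sample is averaged with look-alikes from across the whole signal rather than with its spatial neighbours.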

▸5.5 Local Laplacian filters⧉

`fig-local-laplacian` · local Laplacian filters on a real HDR sunset (Paris–Hasinoff–Kautz 2011, implemented from scratch): naive global (blown) vs Gaussian base (halo) vs local Laplacian (halo-free) tone mapping, plus the compressive remapping curve $r_g(\cdot)$ (detail centre + identity tails) and a zoom comparison proving the silhouette stays halo-free (Local Laplacian filters) ✅

⬜ figure not yet created

`fig-wb-before-after` · before/after detail-boost with **no halo**

• **the odd one out — not an affinity method**: every other technique here weights *pairs* of pixels by similarity; local Laplacian instead works **band-by-band in a Laplacian pyramid** with a **pointwise remapping** — no range weight, no base/detail split (so none of the halos / gradient reversals those can produce). The multiscale cousin of the bilateral, by a different mechanism. Cross-ref [[Linear pyramids and wavelets]] (the pyramid it's built on) and [[Bilateral filtering]].

• **the mechanism**: to compute the output's Laplacian coefficient at a pixel and level, take the pixel's value $g$, apply a **remapping function** $r_g(\cdot)$ to the *whole* image (a smooth curve, centred on $g$, that decides what is "detail" vs "edge"), build a Laplacian pyramid of that remapped image, and read off **just this level's coefficient at this pixel**; collapse the resulting pyramid for the result.

• **the knob is the remapping $r$**: near $g$ (small amplitude = **detail**) it **compresses or amplifies** — controlling detail / local contrast; far from $g$ (large amplitude = **edge**) it stays the **identity** — so strong edges pass through untouched. One curve → detail enhancement, tone mapping, or inverse tone mapping, all **artifact-free**.

• **why it matters**: state-of-the-art **edge-aware detail manipulation and HDR tone mapping** with no halos and no gradient reversal — the failure modes that motivated the bilateral in the first place.

• **cost & the fast version**: the naive form rebuilds a pyramid per pixel-and-level (expensive); **Aubry et al. 2014** gives a fast approximation and shows the **unnormalized bilateral** is its single-scale relative (ties back to [[Bilateral filtering#Bilateral filter|the unnormalized bilateral]]).
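One possible shape for the remapping curve, sketched under assumptions: a power curve inside a band of half-width `sigma_r` around $g$ and a linear tail outside it, modelled on the form used in the Paris–Hasinoff–Kautz paper. The parameter names `alpha` and `beta` and the function name `remap` are illustrative, and the smoothing near zero that a practical version needs for `alpha < 1` is omitted:

```python
import numpy as np

def remap(i, g, sigma_r, alpha, beta=1.0):
    """Pointwise remapping centred on the reference value g.

    |i - g| <= sigma_r : treated as detail, reshaped by a power curve
                         (alpha < 1 boosts it, alpha > 1 flattens it).
    |i - g| >  sigma_r : treated as an edge, scaled by beta
                         (beta = 1 leaves it as the identity).
    """
    i = np.asarray(i, dtype=float)
    d = i - g
    mag = np.abs(d)
    detail = g + np.sign(d) * sigma_r * (mag / sigma_r) ** alpha
    edge = g + np.sign(d) * (beta * (mag - sigma_r) + sigma_r)
    return np.where(mag <= sigma_r, detail, edge)

values = np.linspace(0.0, 1.0, 11)
unchanged = remap(values, g=0.5, sigma_r=0.1, alpha=1.0)    # alpha = beta = 1
boosted = remap(0.51, g=0.5, sigma_r=0.1, alpha=0.5)         # small detail, amplified
far = remap(0.9, g=0.5, sigma_r=0.1, alpha=0.5)              # strong edge, untouched
```

With `alpha = 0.5` a deviation of 0.01 from $g$ grows to roughly 0.03, while a value 0.4 away passes through unchanged, which is the detail-versus-edge split the curve is meant to encode.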

▸5.6 Edge-preserving optimization — colorization⧉

• **the shift — filtering → optimization**: instead of one weighted-average *pass*, solve a **global least-squares** problem whose smoothness term uses the affinity as a weight. The same affinity, now inside an energy.

• **colorization** (the demo): the user scribbles a few colors; **propagate** them so that **neighbouring pixels of similar luminance get similar color** — minimize $\sum_p (U_p-\sum_{q\in N(p)} a_{pq} U_q)^2$ over the chrominance $U$, with affinity $a_{pq}$ **large where $I_p\approx I_q$** (small luminance difference) — exactly the bilateral's range idea inside a quadratic energy

• **solve it** with the linear-systems machinery of the regression chapter — sparse, structured, solved by CG / multigrid (cross-ref [[Single-image computational photography#Blind deblurring]], [[Linear Inverse Problems and Regression]])

• the **affinity matrix** here is the same object as the **matting Laplacian** and the graph-cut edge weights → forward-ref [[#Compositing, segmentation and matting]] (matting / segmentation = affinity again)

• also placeable under [[Single-image computational photography#Blind deblurring]] (the "Colorization as optimization?" stub) — keep the *technique* here, cross-ref from there
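A 1-D sketch of scribble propagation, with assumptions stated: hard constraints at the scribbled samples, two neighbours per sample, Gaussian affinities on luminance difference normalized to sum to one, and a dense solve standing in for the sparse CG / multigrid solver. It sets the residual $U_p-\sum_q a_{pq}U_q$ to zero at every unscribbled sample, which minimizes the energy summed over those samples. The name `colorize_1d` is illustrative:

```python
import numpy as np

def colorize_1d(lum, scribbles, sigma=0.05):
    """scribbles: dict {sample index: chroma value}. Returns chroma U per sample."""
    lum = np.asarray(lum, dtype=float)
    n = len(lum)
    A = np.eye(n)
    b = np.zeros(n)
    for p in range(n):
        if p in scribbles:
            b[p] = scribbles[p]            # hard constraint: U_p = scribbled value
            continue
        nbrs = [q for q in (p - 1, p + 1) if 0 <= q < n]
        # Affinity: large where luminance is similar; tiny floor avoids 0/0.
        a = np.array([np.exp(-(lum[p] - lum[q]) ** 2 / (2 * sigma ** 2))
                      for q in nbrs]) + 1e-12
        a /= a.sum()
        A[p, nbrs] -= a                    # row encodes U_p - sum_q a_pq U_q = 0
    return np.linalg.solve(A, b)

lum = np.repeat([0.0, 1.0], 10)            # two regions separated by a luminance edge
U = colorize_1d(lum, {2: 1.0, 17: -1.0})   # one scribble per region
```

Each scribble floods its own luminance region and stops at the edge, because the affinity across the step is effectively zero.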

▸5.7 Guided image filtering⧉

• **the idea**: like the joint bilateral (filter $I$, take structure from a **guide** $J$) but the output is a **local linear transform of the guide**, $I^{\mathrm{out}}=a_k J + b_k$ fit per window by **least squares** — so it preserves the guide's edges by construction

• **why prefer it**: **$O(N)$** time regardless of window size, **no gradient-reversal artefacts** near strong edges (which the bilateral can show), no range quantization, and **differentiable** → drops into learning pipelines

• **uses**: edge-preserving smoothing; detail enhancement; **flash / no-flash**; **matting / feathering**; **dehazing** (guide = the hazy image — xref He, Sun & Tang 2009 dark-channel, [[#Dehazing as a prior-driven inverse problem]]); depth upsampling

• the affinity is now **implicit** in the local linear fit — same family, different mechanism; a fitting end-point for the chapter's through-line (bilateral → grid → learned → regression → optimization → guided, all *affinity*)
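A 1-D sketch of the local linear fit, using this section's notation (filter $I$, guide $J$). Assumptions: box windows of radius `r` with edge replication and a regularizer `eps` added to the guide's variance. The box mean uses `np.convolve` for brevity, so this sketch is not itself $O(N)$; running sums would be needed for that. Function names are illustrative:

```python
import numpy as np

def box_mean(x, r):
    """Mean over a window of radius r, with edge replication at the ends."""
    k = np.ones(2 * r + 1) / (2 * r + 1)
    return np.convolve(np.pad(x, r, mode="edge"), k, mode="valid")

def guided_filter_1d(I, J, r, eps):
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    mean_J, mean_I = box_mean(J, r), box_mean(I, r)
    var_J = box_mean(J * J, r) - mean_J ** 2
    cov_JI = box_mean(J * I, r) - mean_J * mean_I
    a = cov_JI / (var_J + eps)      # per-window slope of the fit I ~ a J + b
    b = mean_I - a * mean_J         # per-window intercept
    # Each sample sits in several windows: average their (a, b), then apply.
    return box_mean(a, r) * J + box_mean(b, r)

rng = np.random.default_rng(0)
I = np.repeat([0.0, 1.0], 50) + 0.05 * rng.standard_normal(100)
out = guided_filter_1d(I, I, r=4, eps=0.01)          # self-guided smoothing
same = guided_filter_1d(I, I, r=4, eps=1e-12)         # eps -> 0: nearly identity
```

In flat windows the guide's variance is far below `eps`, so the slope shrinks and the output tends to the local mean; across the step the variance dominates `eps`, the slope stays near one, and the edge passes through.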

▸5.8 Seam optimization⧉

• **the third leg of EDGES MATTER** (the part's framing payoff): Poisson **reconstructs from edges**, bilateral **smooths except across edges**, seam optimization **cuts along edges**. State it as the dual of edge preservation: edge-preserving filtering works *hard to avoid* edges; seam methods *go looking for them* — a transition is invisible exactly where the image is already changing, so we hide the cut **in the edges / busy texture** and keep it **out of smooth regions** where a seam would scream. (Callback to the part intro's "respect the edges.")

• the **unifying machinery**: all three sub-families are **discrete optimization on a graph or grid** — a 1-D least-cost path (**DP**), a 2-D globally-optimal boundary (**graph cut / min-cut**), or a spectral partition (**normalized cuts**). What changes between methods is **the energy / affinity**, not the optimizer.

💡 **Big lesson:** *many image edits are **discrete optimization on a graph** — dynamic programming, min-cut/max-flow, or spectral relaxation — and the **energy (affinity) you choose is the whole game**.* The optimizer is off-the-shelf; **what** you penalize (gradient magnitude, label disagreement across edges, region/boundary cost) is the design. This is the **L4 affinity** principle (Bilateral) turned into a *cut*: there the affinity said "average these together"; here it says "**don't cut here**." *(Register as new lesson **L12 · Image edits as discrete optimization on a graph (the energy is the design)**, first appearance here; "also relevant": Poisson/photomontage, compositing/segmentation/matting, texture synthesis, stereo/MRF labeling. Full box at this section's head; one-line callback to L4.)*

▸5.8.1 Find the non-edges — least-noticeable cuts⧉

• the one idea: a cut is **least noticeable where the image is already changing** (a strong edge or busy texture) and **most noticeable in smooth regions** — so *seek* edges to hide the seam, the mirror image of edge-preserving smoothing.

• the **panorama motivation** (craft callback): naive reprojection shows a hard seam, smooth/feather blending **ghosts** moving objects; the fix is to **route the seam around disagreement** — build a difference map of "bad things happen here," then find the boundary that crosses the fewest changing pixels. (Pays off the panorama-blending forward-ref to graph cut.)

• **two regimes**: **interactive** path-finding where a user guides the cut (scissors, snakes), vs **automatic** global optimization (seam carving, graph cut, normalized cuts). Present the automatic ones as the main line; scissors/snakes as the interactive ancestors.

▸5.8.2 Intelligent scissors / live-wire — the cut as a least-cost path⧉

• **the idea** (Mortensen & Barrett 1995, *name-then-use*): an interactive selection tool — the user clicks a seed on an object boundary; as the cursor moves, a **"live wire" snaps to the strongest nearby edge** and traces the boundary in real time. Click to anchor, continue.

• **mechanism = shortest path on a cost graph**: pixels are nodes; edge cost is **low where image gradient is high** (cost $\propto$ inverse edge magnitude, + gradient-direction terms). Dijkstra from the seed gives the **least-cost path** to the cursor — the DP/shortest-path idea made interactive. `fig-snake-livewire`

• the payoff: same machinery as seam carving (least-cost path), different goal — **select/cut along** an edge rather than carve through a non-edge. The cost function is the design.
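
A minimal sketch of the shortest-path mechanism, under stated assumptions: the cost `1 - g / max(g)` is just one simple "low where the gradient is high" choice and omits the gradient-direction terms mentioned above; `edge_cost`, `live_wire`, and `trace` are hypothetical names, and the gradient map is a toy. Dijkstra runs once from the seed, so the path to any cursor position is then a back-pointer lookup.

```python
import heapq

def edge_cost(grad_mag, eps=1e-6):
    """Per-pixel cost that is low where the gradient magnitude is high."""
    g_max = max(max(row) for row in grad_mag) or 1.0
    return [[1.0 - g / g_max + eps for g in row] for row in grad_mag]

def live_wire(cost, seed):
    """Dijkstra from `seed` over the 8-connected pixel grid.

    Returns back-pointers for every reachable pixel, so moving the cursor
    only needs a trace, not a new search.
    """
    h, w = len(cost), len(cost[0])
    dist = {seed: 0.0}
    parent = {seed: None}
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < h and 0 <= nc < w):
                    continue
                # diagonal steps are longer, so they cost sqrt(2) times more
                step = (2 ** 0.5 if dr and dc else 1.0) * cost[nr][nc]
                nd = d + step
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    return parent

def trace(parent, cursor):
    """Follow back-pointers from the cursor to the seed."""
    path, node = [], cursor
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

# Toy gradient-magnitude map: a strong vertical edge in column 2.
grad = [
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
]
parent = live_wire(edge_cost(grad), seed=(0, 2))
path = trace(parent, cursor=(3, 2))  # the wire follows the edge column
```

Calling `trace(parent, cursor)` again with a different cursor reuses the same search, which is what makes the wire feel "live".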

▸5.8.3 Snakes (active contours) — the continuous cousin⧉

• **the idea** (Kass, Witkin & Terzopoulos 1988): a deformable **curve** that **relaxes onto an object boundary** by minimizing an energy — the variational, continuous counterpart to live-wire's discrete path.

• **the energy** (read it back): $E=\int_0^1\big(E_\text{int}+E_\text{image}+E_\text{con}\big)\,ds$, integrated along the curve $v(s)$. **Internal** keeps the curve smooth (elastic tension $\alpha$, bending stiffness $\beta$); **image** $E_\text{image}=-\lvert\nabla I\rvert^2$ pulls it toward **strong edges**; **constraint** lets the user nudge it. Minimized by gradient descent / a sparse banded solve, the same linear-algebra machinery as the regression chapter.

• honest tradeoff: snakes are **local** (need a good initialization, can get stuck) — which motivates the **global** optimizers next; note balloon/GVF/level-set variants only in passing.
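
One way to see where the "sparse banded solve" comes from, sketched under assumed notation: the curve $v(s)=(x(s),y(s))$ is sampled at $n$ points with coordinate vectors $\mathbf{x},\mathbf{y}$, $\gamma$ is a step size, and the constraint term is dropped. The internal energy is quadratic in the sample positions, so its gradient is a matrix product $A\mathbf{x}$ with $A$ pentadiagonal, and one semi-implicit descent step in the style of Kass et al. reads

$$
E_\text{int}=\tfrac12\big(\alpha\,\lvert v'(s)\rvert^2+\beta\,\lvert v''(s)\rvert^2\big),
\qquad
(A+\gamma I)\,\mathbf{x}_t=\gamma\,\mathbf{x}_{t-1}-\frac{\partial E_\text{image}}{\partial x}\bigg|_{(\mathbf{x}_{t-1},\,\mathbf{y}_{t-1})}
$$

and likewise for $\mathbf{y}$. Since $A$ depends only on $\alpha$ and $\beta$, the banded matrix $A+\gamma I$ can be factored once and reused at every step; only the image-force term on the right changes as the curve moves.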

▸5.8.4 Graph cut — globally optimal boundaries by min-cut/max-flow⧉

• **the leap from local to global** (Boykov & Jolly 2001): pose the cut as a **binary labeling** — every pixel is **object** or **background** — and minimize a **graph energy** whose minimum is found **exactly** (binary case) by **min-cut = max-flow**. No local minima, unlike snakes.

• **the energy** $E(L)=\sum_p D_p(L_p)+\sum_{(p,q)}V_{pq}(L_p,L_q)$ (read back): a **data term** $D_p$ — how well pixel $p$ fits each label (from user scribbles / color models) — plus a **smoothness term** $V_{pq}$ — a penalty for neighbours getting *different* labels, made **small across true edges** so the boundary **prefers to cut along edges**. (This $V_{pq}$ **is** the L4 affinity / the bilateral range weight — callback.) `fig-graphcut-segmentation`

• **the graph construction**: pixels = nodes; two terminals **source (object) / sink (background)**; **t-links** carry $D_p$, **n-links** carry $V_{pq}$; the **min cut** severs the cheapest set of edges separating source from sink → the segmentation. Solve by **max-flow** (Boykov–Kolmogorov 2004).

• **multi-label, in passing**: with more than two labels (photomontage, stereo, denoising) exact minimization is NP-hard in general, so it is solved approximately by **$\alpha$-expansion** (Boykov, Veksler & Zabih 2001), which iterates binary graph cuts. Note, don't develop.

• **GrabCut** (Rother, Kolmogorov & Blake 2004): graph-cut segmentation made **easy to drive** — the user drags **one rectangle**, iterated graph cuts + updated **GMM color models** extract the object. (Cross-ref the segmentation/matting section, which owns the *application*.)
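
A minimal sketch of the t-link / n-link construction on a 1-D "image", under stated assumptions: the data term is the absolute distance to two assumed mean intensities `mu_obj` and `mu_bkg`, the smoothness weight is a Gaussian of the intensity difference, and the solver is a plain Edmonds-Karp max-flow standing in for the much faster Boykov–Kolmogorov algorithm. `min_cut` and `segment_1d` are hypothetical names.

```python
import math
from collections import deque

def min_cut(cap, s, t):
    """Edmonds-Karp max-flow. Returns (cut value, nodes on the source side)."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual arcs
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # no augmenting path left: reachable nodes form the source side
            return flow, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def segment_1d(signal, mu_obj, mu_bkg, lam=1.0, sigma=0.5):
    """Binary labeling of a 1-D signal: 1 = object (source side), 0 = background."""
    S, T = "S", "T"
    cap = {S: {}, T: {}}
    for p, val in enumerate(signal):
        cap[p] = {}
        cap[S][p] = abs(val - mu_bkg)  # t-link cut (paid) if p becomes background
        cap[p][T] = abs(val - mu_obj)  # t-link cut (paid) if p becomes object
    for p in range(len(signal) - 1):
        diff = signal[p] - signal[p + 1]
        w = lam * math.exp(-diff ** 2 / (2 * sigma ** 2))  # small across edges
        cap[p][p + 1] = w  # n-links in both directions
        cap[p + 1][p] = w
    energy, source_side = min_cut(cap, S, T)
    labels = [1 if p in source_side else 0 for p in range(len(signal))]
    return labels, energy

# Pixel 2 (0.45) is slightly closer to the background mean, but the
# smoothness term keeps it with its bright neighbours.
signal = [0.9, 0.85, 0.45, 0.9, 0.1, 0.15, 0.1]
labels, energy = segment_1d(signal, mu_obj=0.9, mu_bkg=0.1)
```

The returned cut value equals $E(L)$ for the returned labeling, which is the sense in which the binary minimum is exact.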

▸5.8.5 Normalized cuts — the spectral relaxation⧉

• **the problem with plain min-cut**: minimizing the raw cut weight is **biased toward tiny cuts** — it happily lops off a single isolated pixel. We want **balanced** partitions.

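• **the fix** (Shi & Malik 2000): normalize the cut by each side's total **association**, $\mathrm{Ncut}(A,B)=\frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}+\frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$, where $\mathrm{cut}(A,B)$ sums the affinities $w_{uv}$ across the split and $\mathrm{assoc}(A,V)$ sums the affinities from $A$ to every node. A tiny side has a tiny association, so unbalanced splits are penalized. The affinity $w_{uv}$ is again the **L4 affinity** (color / proximity similarity).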

• **how it's solved (intuition only; keep eigen-machinery light, V13)**: exact Ncut is NP-hard, but it **relaxes to a generalized eigenproblem** $(D-W)x=\lambda D x$, with $W$ the affinity matrix and $D$ the diagonal matrix of its row sums; the eigenvector of the **second-smallest eigenvalue** gives a soft partition you threshold, a **spectral** (vs combinatorial) route to the cut. Defer eigen-fluency to the appendix.

• the contrast worth stating: graph-cut = **combinatorial, needs seeds, exact binary**; normalized cut = **spectral, unsupervised, approximate** — two answers to "where's the best boundary," same affinity input.
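
A minimal sketch of the spectral route, assuming NumPy and a toy affinity built from six 1-D intensities (a Gaussian of intensity difference with an assumed `sigma`; proximity is ignored). The generalized problem $(D-W)x=\lambda Dx$ is solved through its symmetric form $D^{-1/2}(D-W)D^{-1/2}$, and thresholding at zero is one simple choice of split point. `ncut_bipartition` is a hypothetical name.

```python
import numpy as np

def ncut_bipartition(W):
    """Relaxed two-way normalized cut for a symmetric affinity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form of (D - W) x = lambda D x, using x = D^{-1/2} z
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    x = d_inv_sqrt * vecs[:, 1]          # eigenvector of the 2nd-smallest eigenvalue
    return x > 0                         # threshold the soft partition at zero

# Toy data: three dark pixels and three bright ones.
intensities = np.array([0.10, 0.12, 0.11, 0.90, 0.88, 0.92])
sigma = 0.5
diff = intensities[:, None] - intensities[None, :]
W = np.exp(-diff ** 2 / (2 * sigma ** 2))

labels = ncut_bipartition(W)  # boolean mask; which side is True is arbitrary
```

The sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which group gets `True`.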

▸5.8.6 Seam carving — content-aware resize by DP⧉

• **the application** (Avidan & Shamir 2007, motivation-first): resize a photo to a new aspect ratio **without squashing the subject or cropping it off** — instead of uniform scaling or cropping, **remove (or insert) the least-important pixels**, one connected seam at a time. `fig-seam-carving`

• **a seam** = an 8-connected top-to-bottom (or left-to-right) **path of one pixel per row** whose total **energy** $\sum e(i,j)$ is **minimum**; energy $e=\lvert\partial_x I\rvert+\lvert\partial_y I\rvert$ (gradient magnitude / saliency) — so the seam **threads through low-energy, low-detail regions** and avoids the subject.

• **the DP** (the worked mechanism, `fig-seam-energy-dp`): fill a **cumulative-energy** table top-down, $M(i,j)=e(i,j)+\min\!\big(M(i{-}1,j{-}1),M(i{-}1,j),M(i{-}1,j{+}1)\big)$; the cheapest endpoint $\arg\min_j M(H,j)$ + **back-trace** = the optimal seam. Remove it (narrow), or duplicate it (widen); **repeat**. Read back: *"the cheapest way down is the cheapest way to my best parent, plus my own cost."*

• pseudocode block: the DP sweep that fills $M$ row-by-row with the 3-way $\min$, then the $\arg\min$ + upward back-trace that reads off the seam.

• **the same DP as live-wire/scissors** (through-line): a 1-D least-cost path on a gradient-derived energy — only the **energy** and the **direction of the cut** differ.

• honest limits: seam carving **fails on busy/structured images** (no low-energy path exists → it carves through content); better saliency/forward-energy helps — note, don't develop.
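
The DP sweep and back-trace described above can be sketched as follows, assuming NumPy, a grayscale image, and `np.gradient` finite differences for the energy (one reasonable choice, not necessarily the one used in the original paper). The function names are hypothetical, and the per-pixel loops favor readability over speed.

```python
import numpy as np

def energy(gray):
    """e = |dI/dx| + |dI/dy| from finite differences."""
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(e):
    """Return, for each row, the column of the minimum-energy vertical seam."""
    h, w = e.shape
    M = e.astype(float)                  # cumulative energy; row 0 is e itself
    back = np.zeros((h, w), dtype=int)   # best parent column for each cell
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 1, w - 1)
            k = lo + int(np.argmin(M[i - 1, lo:hi + 1]))  # 3-way min
            back[i, j] = k
            M[i, j] = e[i, j] + M[i - 1, k]
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))     # cheapest endpoint in the last row
    for i in range(h - 1, 0, -1):        # upward back-trace
        seam[i - 1] = back[i, seam[i]]
    return seam

def remove_vertical_seam(img, seam):
    """Delete one pixel per row; works for (H, W) or (H, W, C) arrays."""
    h, w = img.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1, *img.shape[2:])

# Hand-checkable energy map (used directly instead of a real image).
e_demo = np.array([[1, 4, 3, 5, 2],
                   [3, 2, 5, 2, 3],
                   [5, 2, 4, 2, 3]], dtype=float)
seam = find_vertical_seam(e_demo)          # columns 0, 1, 1 with total energy 5
carved = remove_vertical_seam(e_demo, seam)
```

For a real resize, the loop would be: compute `energy(gray)`, find a seam, remove it from both the image and the grayscale copy, and repeat until the target width is reached.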

▸5.8.7 Time-lapse via DP — a seam through time⧉

• **the idea**: choosing frames/exposures for a smooth **time-lapse or day-to-night sequence** is the **same least-cost path**, but the axis is **time** — pick a sequence minimizing inter-frame jumps (exposure/color discontinuity), a DP over candidate frames. ⚠️ *coverage note: no dedicated slide/transcript found — confirm scope or trim to a one-line example under seam carving.*

▸5.8.8 Video textures — looping a seam in time⧉

• **the idea** (Schödl et al. 2000): turn a short clip of a stochastic phenomenon (waving flag, candle, fountain) into an **arbitrarily long, seamless loop** by finding **good transitions** — frames that look similar enough to jump between invisibly.

• **mechanism = a cut in the time graph**: frames are nodes; a transition cost = **inter-frame difference** (low cost = similar frames = invisible jump); finding loops / a play order is **least-cost path / DP** on this graph, the seam idea applied to **time**. `fig-video-texture-loop`

• the payoff: same "hide the seam where things match" principle as panorama blending and seam carving — now between **frames** instead of pixels.
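
A minimal sketch of the transition cost, assuming NumPy and a toy "clip" whose content repeats every 5 frames. A jump from frame $i$ back to frame $j$ looks seamless when frame $j$ resembles the frame that would have followed $i$, so the cost used here is the L2 distance $D_{i+1,j}$. For brevity this picks the single cheapest backward jump by exhaustive search rather than running the DP over compound loops; the function names are hypothetical.

```python
import numpy as np

def frame_distances(frames):
    """D[i, j] = L2 distance between frames i and j (fine for short toy clips)."""
    flat = frames.reshape(len(frames), -1).astype(float)
    diff = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def best_backward_jump(frames, min_length=2):
    """Cheapest jump i -> j with j <= i, i.e. the least visible simple loop.

    The loop plays frames j..i and then jumps back to j, so its length
    is i - j + 1 frames; loops shorter than `min_length` are skipped.
    """
    D = frame_distances(frames)
    best = None
    for i in range(len(frames) - 1):
        for j in range(0, i - min_length + 2):
            cost = D[i + 1, j]           # does j look like the successor of i?
            if best is None or cost < best[0]:
                best = (float(cost), i, j)
    return best

# Toy clip: 15 frames of 5 "pixels" each, repeating with period 5.
clip = np.tile(np.eye(5), (3, 1))
cost, i, j = best_backward_jump(clip)    # jump from frame 4 back to frame 0
```

On real footage no cost is exactly zero, and the frame distance is usually filtered over a few neighbouring frames so that the jump also preserves the motion, not just the appearance.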

▸5.8.9 GraphCut textures & photomontage — cut, then blend⧉

• **GraphCut textures** (Kwatra et al. 2003): synthesize/extend a texture (or composite two images) by **graph-cut along the best seam** through the overlap region — the min cut picks the path where the two sources **agree best**, so the join is invisible. The texture-synthesis sibling of panorama seam-finding.

• **interactive digital photomontage** (Agarwala et al. 2004 — **shared with Poisson**): the headline **"cut + blend" pipeline** — **graph cut** chooses *which source contributes each pixel* (the seam, this section), then a **gradient-domain (Poisson) blend** hides any residual seam (the reconstruct-from-edges section). The clean statement of how the three legs **compose**: cut along the edges, then reconstruct across the seam. (Explicit two-way xref to [[#Poisson image editing]].)

▸5.8.10 Where this connects — MRFs and beyond⧉

• **MRFs**: the graph-cut energy $E(L)=\sum D_p+\sum V_{pq}$ **is** a pairwise **Markov Random Field** — data + pairwise prior — and graph cut / belief propagation are its inference engines. Name the connection (so the reader meets "MRF" once); the broader Bayesian/MRF treatment lives with inverse problems / segmentation, cross-ref rather than develop here. *(Resolves the old "MRFs? / Bill's Bayesian stuff" stub.)*

• closing through-line: scissors · snakes · seam carving · graph cut · normalized cuts · video textures are **one idea wearing different optimizers** — *cut where it won't be seen; the energy is the design.*

▸5.9 Recap: which edge-aware technique when?⧉

`fig-pinhole-imaging` · imaging-scenario series (2/3): add a pinhole to the bare sensor — one ray per scene point → a dim **inverted** image (same tree+sensor+colours as fig-bare-sensor-averaging) ✅

▸5.9.1 The three relationships to edges⧉

• **reconstruct *from* edges — Poisson / gradient-domain**: throw away absolute level, keep gradients, integrate back. Reach for it to **hide an edit's boundary** — seamless cloning, healing, gradient-domain HDR, photomontage/panorama blending.

• **smooth *except across* edges — bilateral & friends**: average pixels that *belong together* (the L4 affinity). Reach for it to **denoise / decompose / enhance without haloes or bleeding** — bilateral, joint/cross & upsampling, grid, guided filter, local Laplacian, NL-means, LARK, colorization.

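• **cut *along* edges (seam optimization)**: find the least-noticeable path/boundary. Reach for it to **resize, select, or composite** by putting the seam where it will not be seen: along object edges for selection (scissors/snakes, graph cut, normalized cuts), where the sources agree for compositing (GraphCut textures), and through low-energy regions for resizing (seam carving).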

▸5.9.2 Pros / cons / cost — when to reach for each⧉

• **Poisson**: global, seamless, encoding-sensitive; cost = a **sparse linear solve** (CG / multigrid / FFT; membrane shortcut when conservative). Pick when the artifact lives **at a boundary**.

• **bilateral & friends**: mostly a **single pass** (grid → real-time; guided $O(N)$; local Laplacian heavier; colorization a solve). Pick when you must **process within regions but not across edges**. Watch range-quantization / gradient-reversal artifacts (guided & local Laplacian avoid them).

• **seam optimization**: **discrete optimization** by DP (1-D path, cheap), graph cut (2-D, exact binary, needs seeds), or normalized cut (spectral, unsupervised, approximate). Pick when the right answer is **where to put the boundary**; seam carving in particular fails on busy images with no low-energy path.

• **they compose**: photomontage / panorama = **graph-cut seam + Poisson blend** (cut along edges, then reconstruct across the seam) — the part's payoff.