💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
jump to

5.6 Edge-preserving optimization — colorization

So far the affinity has powered a single pass: gather neighbours, weight, average, done. The next mechanism keeps the affinity but changes the verb again — from one filtering pass to a global optimization. Rather than compute each output pixel on its own, write down an energy over the whole image whose smoothness term is weighted by the affinity, and solve for the image that minimizes it. Now every pixel's answer leans on every other's, coupled through the affinities, and one sparse linear solve settles them all together.

This is not a different idea from the bilateral filter; it is the bilateral filter's idea in a different form. The bilateral takes a local affinity-weighted average and reports it once per pixel. Colorization takes the same affinity and demands that, at the solution, every pixel already equal its neighbours' affinity-weighted average. The local average becomes a global constraint; the filtering pass becomes a least-squares solve. That is the whole shift, and the rest of this chapter is one canonical demonstration of it.

5.6.1 The shift: filtering becomes optimization

Hold the bilateral filter in mind. To smooth a pixel $p$, it forms a weighted average of the neighbours $q$, where each weight — the affinity $a_{pq}$ — is large when $p$ and $q$ are alike in value and small when an edge separates them. The output is that average, computed once, independently, for every pixel. Information crosses the image only as far as the filter's footprint reaches; a stroke of influence on one side of the picture has no way to reach the other in a single pass.

An optimization removes that ceiling. Instead of assigning each pixel its neighbours' weighted average, we ask for the image in which every pixel agrees, as nearly as possible, with that average — and we let a solver find the image that best satisfies all those wishes at once. Write the wish for pixel $p$ as a penalty on its disagreement with the affinity-weighted average of its neighbours, sum the penalties over the whole image, and minimize. Because pixel $p$'s wish involves its neighbours, whose wishes involve their neighbours, the constraints chain across the entire picture: a value pinned in one corner propagates, link by affinity link, as far as the affinities allow it to travel — which, where luminance is flat, is arbitrarily far. The single pass could not do this; the global solve does it for free, because "best satisfy every wish simultaneously" is inherently a statement about the whole image.

The price is that we no longer have a closed-form output per pixel; we have a system of coupled equations. The reward, as we will see, is that the system is exactly the kind we already know how to solve, and that the coupling is precisely what lets a handful of user marks fill an entire photograph.

5.6.2 Colorization: the canonical demonstration

The crispest demonstration is colorization, due to Anat Levin, Dani Lischinski and Yair Weiss (2004). Start from a greyscale photograph — an old family portrait, a black-and-white film still — and the goal is to add plausible color. Doing it pixel by pixel would be hopeless. Instead the user scribbles a few coarse strokes of color roughly where each color belongs: a smear of red across the lips, blue on a sleeve, brown over the hair. The strokes need not be careful and need not fill anything; they are just hints. The algorithm then propagates the color from the scribbles to flood the rest of the image, under a single rule (Figure 5.6.1):

neighbouring pixels of similar luminance should take similar color.

That rule is the bilateral's range idea stated verbatim — same affinity, same "alike in luminance means belongs together" — only now expressed as something to minimize rather than to average. And it is exactly the right rule, because of how images are made. Within one object, luminance varies smoothly (shading, a gentle fold of cloth); across an object's boundary, luminance jumps. So "follow the regions of smooth luminance" is a remarkably good proxy for "follow the objects," and color that obeys it spreads through each object and stops at its edges — without anyone ever telling the algorithm where the objects are.

The reason colorization is the canonical demonstration, rather than one application among many, is that it isolates the optimization idea cleanly. There is no clever color model, no segmentation, no learned prior. There is one quadratic energy, one affinity, and one linear solve, and yet a few seconds of scribbling colors an entire photograph. Everything edge-preserving and everything global about the method is on the surface.

[figure fig-colorization-scribbles not built]
Figure 5.6.1. Colorization by optimization. Left: a greyscale photograph with a few coarse color scribbles laid over it — red on the lips, blue on the sleeve, brown on the hair — none of them filling or carefully tracing anything. Middle: the propagated result, color flooding each region of smooth luminance and halting at luminance edges, so the red fills the lips but does not bleed onto the chin and the blue fills the sleeve but stops at its seam. Right (inset): the luminance channel with the same scribbles, showing that color spreads freely where luminance is flat and is blocked wherever luminance jumps — the affinity at work.

5.6.3 Writing down the energy

We are given the luminance $I$ everywhere (it is just the greyscale image) and the user's scribbles at a few pixels. We solve for the chrominance $U$ — the color part, the two color-difference channels — at every pixel. Luminance is known; only color is unknown, which is why the energy is over $U$ alone.

The rule "each pixel's color should match the affinity-weighted average of its neighbours' colors" becomes, written as a penalty to minimize,

$$ \min_{U}\ \sum_{p} \Bigl( U_p - \sum_{q\in N(p)} a_{pq}\,U_q \Bigr)^{2}, $$

where $N(p)$ is the small neighbourhood of pixel $p$ and $a_{pq}$ is the affinity between $p$ and its neighbour $q$. Each summand is one pixel's deviation from its neighbours' weighted average, squared so that positive and negative deviations both cost and so that the whole thing is a clean quadratic. Minimizing it drives every pixel toward agreement with the neighbours it most belongs with.

Two details make the term mean exactly what we said. First, the affinity is large when $I_p \approx I_q$ in luminance and small across a luminance edge — that is the entire edge-preserving content, the L4 affinity keyed on intensity difference. Second, the affinities around each pixel are row-normalized, $\sum_{q\in N(p)} a_{pq} = 1$, so that the inner sum $\sum_{q} a_{pq} U_q$ is a genuine weighted average of the neighbours rather than a mere weighted sum. With both in place, read the term back: each pixel's color should equal the affinity-weighted average of its neighbours' colors.

A natural choice for the affinity, and the one in the original method, makes the dependence on luminance difference concrete:

$$ a_{pq} \;\propto\; \exp\!\Bigl( -\frac{(I_p - I_q)^2}{2\,\sigma_p^{2}} \Bigr), $$

normalized over $q \in N(p)$, where $\sigma_p$ is a local measure of how much luminance varies around $p$ (so the falloff adapts to each region's contrast). This is the same exponential-of-squared-difference that gave the bilateral its range weight; here it is the same number doing the same job inside an energy. Where luminance is flat, neighbouring $I$ values are close, the exponent is near zero, and the affinity is strong — color couples tightly and flows freely, flooding the region. Where luminance jumps at an object boundary, $(I_p - I_q)^2$ is large, the affinity collapses toward zero, the coupling weakens, and color halts — so the red on the lips never spills onto the chin. (Levin and colleagues note that a correlation-based affinity works comparably; the exponential form is the cleanest to picture.)

Finally, the scribbles. They enter as hard constraints: at every scribbled pixel, $U$ is fixed to the color the user painted, and the minimization runs over only the unscribbled pixels. Those fixed values are what the propagation propagates — they pin the color at a few locations, and the energy carries it everywhere else along the high-affinity paths. (One could instead add a soft penalty pulling scribbled pixels toward the user's color; the hard-constraint version is simplest to state and is what the figure shows.)

Sidebar — why "match your neighbours' average" is edge-preserving smoothing in disguise

The energy looks like it is asking for smoothness — every pixel near the average of its neighbours — and it is, but a very particular, edge-aware smoothness. A plain smoothness penalty would write $\sum_p \sum_{q\in N(p)} (U_p - U_q)^2$ with equal weight on every neighbour pair, which smooths across everything, edges included, and would smear color straight over object boundaries. Inserting the affinity $a_{pq}$ as the weight is the whole fix: pairs separated by a luminance edge carry almost no weight, so the solver pays almost nothing to let their colors differ, and the smoothing simply switches off across edges. This is the same move that turned the box average into the bilateral — weight the combination by belonging — transplanted from a filtering pass into the smoothness term of an energy. "Match your neighbours' weighted average" and "smooth, but not across edges" are the same instruction.

5.6.4 Solving it: a sparse linear system we already own

The reward for casting the problem as a quadratic energy is that we already own the solver. Because each penalty term is a square of a linear combination of the unknowns, the energy is a quadratic in $U$, and minimizing a quadratic means setting its derivative to zero — which gives a linear system. Stack one equation per unknown pixel and we have a large, sparse, structured least-squares problem, the very shape met in Linear Inverse Problems and Regression.

It is sparse because each pixel's term touches only its few immediate neighbours, so each row of the system has only a handful of nonzero entries — exactly as the Poisson system of Poisson image editing touched only a pixel's four neighbours. It is structured because every row is the same affinity-weighted stencil, just shifted and re-weighted by the local luminance. And as there, the right move is not to build the matrix: keep $U$ as an image, and the only operation any solver needs is "apply the affinity-weighted Laplacian to the current guess," which is a per-pixel weighted combination of neighbours — a cheap image operation. The solve reduces to repeated applications of that operation plus some dot products, never an explicit matrix.

So the same toolkit applies — the Efficient solvers machinery:

These are the same tools that solve the Poisson equation for gradient-domain editing and that will solve the matting system in Compositing, segmentation and matting. Learn to solve this one sparse system and you have solved all of them; what differs between the applications is only which affinity goes into the weights and what plays the role of the boundary constraint.

5.6.5 One affinity, two faces: the matting Laplacian connection

Step back and the punchline of the whole part comes into focus. Filtering and optimization are two faces of one affinity. The bilateral filter takes the affinity and averages with it in a single pass; colorization takes the same affinity and minimizes an energy built from it. Same question — "do these pixels belong together?" — answered with the same instinct, once locally and once globally. Colorization is simply the moment the affinity turns its face toward optimization.

And the object that does the work — the matrix of all the $a_{pq}$, the weighted Laplacian of the energy above — is not unique to colorization. The same matrix, built from essentially the same intensity-keyed affinities, reappears as the matting Laplacian of Levin, Lischinski and Weiss (2008), where minimizing the analogous quadratic energy extracts a foreground object's soft alpha matte from a few scribbles marking "definitely foreground" and "definitely background" (developed in Compositing, segmentation and matting). Colorization propagates color from color scribbles; closed-form matting propagates opacity from foreground/background scribbles; both do it by minimizing the same affinity-weighted energy with the same sparse solver. The machinery is one; only the quantity being propagated and the marks pinning it differ.

The same energy template powers more than these two. Lischinski and colleagues used precisely this construction for interactive tonal and exposure editing: scribble "brighter here, darker there," and the affinity-weighted solve propagates a smooth, edge-respecting adjustment map across the photograph, so an exposure tweak floods a region and stops at its borders exactly as color does in colorization. In every case the recipe is the same — user marks a few pixels, an affinity-weighted energy propagates the marks along smooth-luminance paths, a sparse solve fills the rest. Colorization is the cleanest face of a single, widely reused idea.

[figure fig-affinity-two-faces not built]
Figure 5.6.2. One affinity, two faces. Top row — filtering (the bilateral): a pixel and its neighbours, each neighbour tagged with its affinity $a_{pq}$ (large within a region, near zero across the edge), and the output computed as one affinity-weighted average — a single local pass. Bottom row — optimization (colorization / matting): the same affinities, now the off-diagonal weights of a sparse matrix; the propagated solution fills the whole image from a few pinned scribbles via one global linear solve. Caption pointer: the very same weight matrix is the colorization operator, the matting Laplacian, and the graph-cut smoothness weights — averaging and solving are two uses of one affinity.