💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
jump to

5.3 Locally adaptive regression kernel (LARK)

The bilateral filter settled the central question of edge-preserving smoothing — which of my neighbours count as the same surface? — with one device: weight each neighbour by an affinity read off its value difference, so a pixel sitting just across an edge gets almost no say. The lasting idea there is not the weighted average; it is the affinity. The average is merely one thing to do with it, and a single value difference is merely one way to compute it. The locally adaptive regression kernel takes the affinity at its word and promotes it from a number to a shape.

5.3.1 Change the verb: from averaging to fitting

Begin by changing the verb. Instead of telling the operator to average its neighbours, tell it to fit — to regress a smooth function through the nearby pixels and report that function's value at the centre. The clean pixel is then what a smooth local model predicts, and the noise is the residue the model declines to explain. This single stance covers two jobs at once. For denoising, evaluate the fitted model at a pixel you already have, and you get a cleaned version of it. For interpolation and upsampling, evaluate the same model at a location between samples, and you get a principled in-between value. Both are the same act — fit locally, then read off the model — which is why kernel regression is a natural common engine for them.

Concretely, around the pixel of interest we suppose the image is locally well-described by a smooth function — a constant, a plane, or a low-order polynomial in the spatial coordinates — and we fit that function to the neighbours by weighted least squares, then take its value at the centre. The weights are supplied by a kernel: a neighbour close to the centre counts heavily, a far one counts little. With a fixed, round, isotropic kernel this is exactly an ordinary weighted average (for a constant model) or a smooth local polynomial fit (for higher orders) — and it is no better than a Gaussian blur. It smears edges just as badly, because a round kernel centred on an edge reaches indiscriminately to both sides and averages the dark surface with the bright one.

[figure fig-lark-fit-vs-average not built]
Figure 5.3.1. Denoising and interpolation are one act: fit, then read off. A 1-D slice of a noisy edge with sample points scattered around it. Left, a local model (a low-order polynomial) is fitted to the samples under a weighting kernel; the denoised value is the model evaluated at an existing sample, the interpolated value is the same model evaluated between samples — one fit serves both. Right, the same fit done with a fixed round kernel straddling the edge: the model splits the difference and the edge is smeared. The fix is to reshape the kernel, which the next figure shows.

5.3.2 Steering the kernel to the local structure

The fix is steering-kernel regression, the locally adaptive regression kernel of Takeda, Farsiu & Milanfar (2007): let the shape of the kernel be dictated by the image. The mechanism is a tidy bit of geometry. At each pixel, estimate the local structure — in practice from the local gradient, or, more robustly, from the structure tensor, the covariance of the gradients gathered in a small window around the pixel. That tensor's eigenvectors point along and across the dominant edge direction, and its eigenvalues say how strongly oriented the neighbourhood is: large and lopsided on a crisp edge, small and balanced on a flat patch. From it we read the edge's direction, its strength, and its sharpness.

Now reshape the kernel accordingly. Stretch it along the edge and squeeze it across it. The round weighting becomes an elongated, oriented ellipse that draws many samples down the length of the edge and almost none from the far side. The kernel "steers" itself to the structure: on a flat region, where the tensor is isotropic, it stays round and smooths freely in every direction; along a crisp edge it becomes a thin sliver hugging the contour; at a corner or fine texture, where structure is strong in no single direction, it shrinks so as to trust only the nearest samples. Average along a thin bright filament with such a kernel and the filament survives — sharpened, even, by all the samples taken along its length — while the refusal to reach across it keeps the filament one pixel wide.

This is also why LARK denoises and interpolates so gracefully near structure. When you upsample, the in-between value is predicted from samples that lie along the local edge rather than across it, so a steep boundary is reconstructed crisply instead of being rounded off — exactly the failure that betrays naive interpolation.

[figure fig-lark-steering-kernel not built]
Figure 5.3.2. The kernel steers to the image. Over a patch containing a curved bright edge, three pixels are marked: one in a flat region, one on the straight part of the edge, one at a sharp bend. At each, the steering kernel is drawn as an ellipse derived from the local structure tensor — round and wide in the flat region (smooth freely), long and thin along the straight edge (average down its length, never across), small and compact at the bend (trust only the nearest samples). An inset shows the structure tensor's two eigenvectors at the straight-edge point, one along the contour and one across it, with the kernel elongated along the first.

5.3.3 The regression flavour of the affinity

Read all of this as the regression flavour of the affinity — the same L4 idea wearing different clothes. The bilateral wrote affinity as a scalar weight on a value difference: a single number per neighbour, large when two pixels are alike, small across an edge. LARK writes affinity as the geometry of an anisotropic kernel derived from local structure: not one number but an oriented shape that says, all at once, which directions to trust and how far. The instinct is identical — let the edges decide who gets averaged with whom — but it has been promoted from a single number to an oriented shape, and that shape carries strictly more information than a scalar can. The bilateral knows only how different a neighbour is; the steering kernel also knows in which direction the surface continues, so it can keep averaging confidently along an edge that the bilateral would only know to stop at.

It is worth being precise about the relationship, because it is easy to overstate. The bilateral's range weight and LARK's steering kernel are not the same object computed two ways; they are two different ways to spend the same affinity instinct. The bilateral compares the centre pixel's value to each neighbour's value directly. LARK never compares the two values at all — it looks only at the local structure (the gradients) and builds a shape from it, then fits. That is why LARK belongs to the same family as the bilateral while sitting a clear step away from it: same question, different mechanism, more expressive answer. For the wider family — patches, optimization energies, guided fits — see the methods that follow in this part; here the point is only that regression is one more place the affinity can live.

5.3.4 LARK as a structure descriptor

There is a second, almost incidental use that falls out of the same object, and it deserves a sentence because it shows how general the construction is. The steering kernel computed at a pixel is, in effect, a compact descriptor of the local image structure — it encodes the orientation, strength, and isotropy of the neighbourhood in a single small matrix. Collected over a window and normalized, these locally adaptive kernels form a feature — the LARK descriptors of Seo and Milanfar — used for detection and matching: finding a template object in a cluttered image, or spotting an unusual action in video, by comparing these structure descriptors rather than raw pixels. We only note it. The same machinery that steers a denoising kernel can serve as a similarity feature, which is a clean illustration of the through-line: once you have a faithful description of local structure, you can either filter with it or recognize with it.