💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
jump to

3.8 Resampling

Resizing an image is the most ordinary thing you ever ask of an editor — shrink a photo for the web, zoom into a crop, straighten a tilted horizon, undo lens distortion. It feels like it should be a one-liner. To make a picture twice as big you spread the pixels out; to make it smaller you throw some away. The whole point of this chapter is that both of those instincts are wrong, and that doing the job well — well enough that you could rotate an image three hundred times without it dissolving — quietly leans on everything we built in the convolution and Fourier chapters.

The reason is that an image is not a picture, it is a grid of samples: point measurements of an underlying continuous scene. When you resize, you ask for the scene's value at new positions that fall between the samples you actually recorded. No pixel lives there to copy. So the honest statement of the problem has two steps — reconstruct a continuous signal from the samples you have, then re-sample it wherever the new grid asks for a value. Hold onto that sentence. It is the spine of the chapter, and every method below is just a different answer to "how should I reconstruct?"

3.8.1 Domain operations: moving pixels around

Recall the three families of image operation from the point-operations chapter, sorted by what each output pixel is allowed to read. Resampling is the domain family: the colors are left alone, only the positions move. Formally a domain transform applies a function $f$ from the plane to itself — from $\mathbb{R}^2$ to $\mathbb{R}^2$ — to the image's coordinates, with the rule that if the point $(x,y)$ carried color $c$ in the input, then $f(x,y)$ carries that same color $c$ in the output. The range (the colors) is untouched; only the domain (the coordinates) is remapped.

That one definition spans a huge amount of ground. When $f$ is a uniform scale it is resizing; a rotation straightens a horizon; an affine or perspective $f$ fixes keystoning or stitches a panorama; and a freely chosen $f$ is a warp — picture the image printed on rubber and stretched. The genuinely free-form case, including how a user even specifies such an $f$ (and the morphing it unlocks), gets its own warping chapter. Here we take $f$ as given and concentrate on the hard part shared by every domain transform: the new positions almost never land on the input grid, so we must invent values between the samples.

3.8.2 Start with scaling up

Start with the friendliest case, scaling an image up — say by an integer factor $k$. The output has $k$ times as many pixels along each axis, so most output pixels sit between input pixels and have no value of their own: we have too few samples and must fill in the gaps. (Downsampling is the opposite complaint — too many input samples crowding into each output pixel — and, surprisingly, it needs a slightly different fix. Hold that thought; the pleasant punchline is that the machinery turns out to be almost identical.) Upsampling is the cleaner place to build intuition, so we begin there.

3.8.3 The naive idea, and why it leaves gaps

Here is the instinct nearly everyone has first: loop over the input pixels, and for each one compute where it lands in the output and write its color there. To scale up by $k$, the input pixel at $(x,y)$ goes to output location $(k\,x,\ k\,y)$:

$$ \text{out}[\,k\,y,\ k\,x\,] \;\leftarrow\; I[\,y,\ x\,]. $$

This is forward mapping — push each input pixel forward to its destination. It is intuitive, and it is broken. Scale up by $k=2$ and you write output pixels $(0,0),(2,0),(4,0),\dots$ while every odd-coordinate pixel is never touched: a regular lattice of black holes (Figure 3.8.1, left). The output is mostly gaps, because spreading $N$ input pixels over $k^2 N$ output positions can fill at most $N$ of them — one cell of every $2\times2$ block, three left empty.

You might try to patch it with an inner loop that paints the little square between four placed pixels. That limps along for an axis-aligned enlargement, but the instant $f$ is a rotation the gaps stop being neat squares — they become slivers at odd angles, and "fill the gap" no longer has an easy shape. Forward mapping fights you. We need a different loop.

fig-resample-forward-inverse
Figure 3.8.1. Forward vs. inverse mapping on a 2× upscale of a small rainbow grid. Left (forward): loop over input pixels and push each to $(2i,2j)$ — only one cell of every 2×2 output block gets written, leaving the other three as black gaps (a regular lattice of holes; worse under a rotation, where the holes become tilted slivers). Right (inverse): loop over output pixels and, for each, apply $f^{-1}$ to reach back into the input and sample — so every pixel is filled, no gaps. The whole chapter loops over the output and reaches back into the input.

3.8.4 Loop over the output, use the inverse transform

The fix is to turn the loop inside out. Instead of pushing inputs forward, iterate over the output pixels — all of them, so none can be skipped — and for each one ask where did this come from in the input? That means applying the inverse transform $f^{-1}$ to the output coordinate and reading the input there:

$$ \text{out}[\,y,\ x\,] \;\leftarrow\; I\big[\,f^{-1}(x,y)\,\big]. $$

For a scale-up by $k$ the inverse divides by $k$, so output pixel $(x,y)$ reads the input at $(x/k,\ y/k)$. Now every output pixel receives exactly one value and there are no gaps by construction. This is inverse mapping, and the lesson deserves to be stated flatly: loop over output pixels, and always use the inverse transform. It is the same "where-from, not where-to" reflex as the flip in convolution — to fill a destination, reach back into the source — and it is the single most important habit in this chapter. (Figure 3.8.1 sets the two loops side by side.)

There is one immediate wrinkle, and it is the real subject of everything that follows. Inverse mapping is clean, but $f^{-1}(x,y)$ lands at a fractional coordinate — $(x/k,y/k)$ is rarely an integer, and under a rotation it essentially never is. There is no pixel sitting at $(1.3, 4.7)$. So the genuine content of resampling is the question we have been circling: given the input samples, what value should we report at a position between them? Every method below is one answer.

3.8.5 Nearest neighbour

The crudest answer: round the fractional coordinate to the closest integer and copy that pixel. Want the value at $1.3$? Round to $1$, return $I[1]$. This is nearest-neighbour reconstruction, and it is exactly what we were silently doing when we wrote $I[x/k,\,y/k]$ and let integer indexing truncate the coordinate.

It works and it is fast, but the result is conspicuously blocky: a whole range of fractional positions all snap to the same input pixel, so an upscaled image looks like fat rectangular tiles (Figure 3.8.12, nearest panel), each input pixel becoming a hard $k\times k$ block. Occasionally that is precisely what you want — pixel art, or a debug view where you want to see the literal samples — but as a general-purpose resize it is the floor, not the goal. As the challenge for nearest neighbour: try upscaling a face by $8\times$ this way and watch it turn to mosaic. We do much better by interpolating between samples instead of snapping to one.

3.8.6 Linear interpolation, in 1-D first

Drop to one dimension, where the idea is cleanest. An image row is a 1-D array of values, and we want the value at $1.3$, between the samples at $1$ and $2$. The natural move is to blend the two neighbours in proportion to how close the query sits to each. At $1.3$ we are $0.3$ of the way from $1$ toward $2$, so

$$ I[1.3] \;=\; 0.7\cdot I[1] \;+\; 0.3\cdot I[2]. $$

This is linear interpolation, and there is one detail that trips up everyone at first: the weights run opposite to the distances. The neighbour you are $0.3$ away from gets weight $0.7$, not $0.3$ — because you want weight $1$ when you sit exactly on a sample and $0$ when you are a full step away. Say it back in words: take a weighted average of the two bracketing samples, weighting each by one minus the distance to it. Geometrically, the reconstructed signal is just the straight line connecting consecutive samples (Figure 3.8.2).

fig-linear-interp-1d
Figure 3.8.2. 1-D linear interpolation. To reconstruct the value at the fractional position $1.3$ between samples $I[1]$ and $I[2]$, blend them in proportion to one minus the distance to each: $I[1.3]=0.7\cdot I[1]+0.3\cdot I[2]$. The reconstructed signal is the straight line joining consecutive samples; a neighbour's weight is $1$ when you sit on it and falls linearly to $0$ a full sample away — the opposite of the raw distance.

3.8.7 Bilinear, in 2-D

In two dimensions we now have four surrounding samples — the corners of the unit square the query falls inside — and two fractional coordinates to interpolate against. The clean construction is to do 1-D linear interpolation twice: first along $x$, then along $y$ (Figure 3.8.4). Blend the top pair of pixels horizontally to get one value, blend the bottom pair horizontally to get another, then blend those two results vertically with the $y$ fraction. The four corner weights come out as products of the $x$- and $y$-blend factors — which is exactly why this is bilinear: linear in each axis, but their product, not a single plane through all four points.

That last point is a real subtlety worth a pause. Four arbitrary corner values do not in general lie on a flat plane, so a "purely linear" 2-D surface through all four simply does not exist. Bilinear sidesteps this by being linear separately in each direction (the product of two tents), which is well-defined and cheap. The resulting surface is doubly ruled — dead straight along every horizontal and every vertical line, yet curved overall, twisting into a saddle when the four corners disagree (Figure 3.8.4b). The only way to get a truly planar piecewise-linear reconstruction is to triangulate — cut each square into two triangles and interpolate within each, since three corners do fix a plane.

fig-bilinear
Figure 3.8.3. Bilinear interpolation as two 1-D passes. The query falls inside the unit square spanned by its four nearest input pixels. Interpolate the top edge and the bottom edge horizontally (with the $x$ fraction), then interpolate those two intermediate values vertically (with the $y$ fraction). The four corner weights are the products of the $x$- and $y$-blend factors — linear in each axis, hence bilinear.
fig-bilinear-heightfield
Figure 3.8.4. The bilinear surface over one unit cell, drawn as a height field whose four corner samples are the controlling heights. Every line of constant $x$ and every line of constant $y$ is dead straight (the sheet is ruled in both directions), yet the surface as a whole bows into a saddle — proof that bilinear is linear in each axis separately but not one flat plane through the four corners.
Sidebar — which diagonal to triangulate on?

If you do triangulate, you must choose which diagonal to cut each square along, and the two choices differ near edges. A naive splitter always cuts the same way, secretly preferring one diagonal and producing a faint directional bias. The smarter trick picks, per square, the diagonal joining the two most similar corner values — cutting along an edge rather than across it, so a sharp boundary isn't smeared. Bilinear's quiet advantage is that it has no diagonal to choose: it needs no branch and favours no direction.

Exercise

Show that interpolating $y$ first, then $x$ gives exactly the same result as $x$ first, then $y$. (Write both as weighted sums of the four corners and compare the coefficients — the order of the two separable passes does not matter.)

Putting the two halves together — the inverse-mapping loop and the bilinear sample — gives the whole upsampler. Loop over output pixels, send each back through $f^{-1}$ to a fractional input location, split that into its integer cell and fractional offsets, and blend the four corners:

for each output pixel (x, y):
    (u, v) = f⁻¹(x, y)                  # fractional input location
    x0, y0 = floor(u), floor(v)
    s, t = u - x0, v - y0               # fractional offsets in [0, 1)
    top = (1 - s)·I[x0, y0]   + s·I[x0+1, y0]
    bot = (1 - s)·I[x0, y0+1] + s·I[x0+1, y0+1]
    out[x, y] = (1 - t)·top + t·bot     # blend the two rows vertically

Read it back: the two horizontal blends collapse the top and bottom edges of the cell, and the final vertical blend combines them — 1-D linear interpolation done twice. Bilinear is the workhorse default: cheap, no blocky tiles, smooth enough for most resizing, and the reconstruction you get for free from a graphics-hardware texture lookup (Figure 3.8.12, bilinear panel). But it has a ceiling. The reconstructed surface has creases — it is only continuous, not smooth, kinking at every sample — and on a strong upscale it reads a touch soft. To push past it we need to look at more than four neighbours, and the "interpolate between neighbours" story does not generalize gracefully. We need a better lens — but first a word on what any of this costs.

Sidebar — a practical note on cost

In general you cannot precompute the kernel weights: because $f^{-1}$ lands at arbitrary fractional offsets, every output pixel may need the kernel evaluated at different points (expect to re-evaluate it per output pixel, per neighbour). Two things rescue performance. First, the kernels are separable — a 2-D resample is a horizontal pass then a vertical pass, turning an $r\times r$ cost into $2r$ — so the standard implementation scales in $x$, producing an anisotropically-stretched intermediate, then scales in $y$. Second, for an integer scale factor the fractional offsets repeat, so the weights can be precomputed and the whole thing collapses to an ordinary discrete convolution. Andrew Adams' advice for downsampling is exactly this: build the tiny sparse weight matrix that downsamples a single row (the weights repeat for every row), normalize each row to sum to $1$ — which conveniently also fixes the borders — and apply it everywhere; and because the reduction is separable, do the shrinking pass first so the second pass works on less data.

3.8.8 The convolution perspective

Here is the reframing that unlocks everything. So far we have described reconstruction as interpolating between neighbours. That picture is fine for linear interpolation and for upsampling, but it has two failings: it does not extend cleanly to fancier reconstructions, and — crucially — it does not extend to downsampling at all. Since this whole chapter is fundamentally about (re)sampling, the Fourier-flavoured view from the aliasing chapter has much better insight to offer. In practice that view means one thing: reconstruction is convolution.

Concretely, the output value at a fractional location is a weighted average of nearby input pixels, with the weights read off a smooth analytical kernel $k$ centred at the desired (non-integer) location. Slide a continuous kernel so its centre sits exactly on the query point $f^{-1}(x,y)$, evaluate the kernel at each surrounding integer input pixel to get that pixel's weight, and sum. The general resampling formula is

$$ \text{out}(x,y) \;=\; \sum_{x'} I(x')\,\cdot\, k\big(f^{-1}(x,y) - x'\big), $$

summing over input pixels $x'$ in the neighbourhood, the kernel evaluated at the offset between the query and each pixel. In words: to find an output pixel, inverse-transform it to a fractional input location, drop a smooth kernel centred there, and accumulate kernel-weight times pixel-value over the nearby samples. It is convolution — but where convolution in the earlier chapter lived purely on the integer grid, here the kernel is a continuous function we can evaluate at any fractional offset (ask for $k(-0.7)$ and it answers), because the centre is non-integer.

This is the place to be emphatic about a requirement easy to skate past: because resampling evaluates the kernel at non-integer positions — the output sample falls between the input samples, so the offset $f^{-1}(x,y)-x'$ is essentially never an integer — the kernel must be a continuous, analytic function of a real argument, $k(x)$, that you can evaluate anywhere, not a discrete list of taps. A fixed convolution stencil (a $[1,4,6,4,1]/16$ blur, say) only has values at integer offsets; ask it for $k(-0.7)$ and it has nothing to say. This is the whole reason resampling kernels are written as closed-form functions — cubic polynomials, windowed sincs — rather than tap lists: you need a formula you can sample at the exact fractional offset every output pixel demands (Figure A).

fig-resample-kernel
Figure 3.8.5. Why the kernel must be a continuous function. A row of discrete input samples sits on the integer grid; a smooth kernel $k(x)$ is centred at a non-integer query location $x^\*$ between two of them, and the output is the sum of each sample times the kernel's height at that sample's offset. Because the centre is fractional, those offsets are fractional too — the kernel has to be a closed-form curve readable at any real argument, not a fixed stencil of integer taps.

The beauty is that bilinear is already a special case. The linear-interpolation weights $0.7$ and $0.3$ are exactly the values of a triangular (tent) kernel — one that is $1$ at its centre and falls linearly to $0$ one sample away on each side — evaluated at the offsets to the two neighbours. So "interpolate linearly" and "convolve with a tent" are the same operation, seen two ways. Nearest neighbour is the same story with a box kernel (a flat $1$ over a one-pixel window). Once reconstruction is "convolve with a kernel," choosing a better reconstruction is just choosing a better kernel.

Sidebar — proper resampling: reconstruct, prefilter, resample

The fully honest pipeline has three stages, not one (Figure B). Start from the sampled input. 1/ Reconstruct a continuous signal with a reconstruction filter. 2/ Prefilter (low-pass / blur) it so it carries no detail too fine for the output grid to represent — the anti-aliasing step from the aliasing chapter. 3/ Resample on the output grid. Because both the reconstruction and the prefilter are convolutions, they combine into a single kernel — the final filter is the prefilter composed with the reconstruction filter — which is why one piece of code does the whole job. Which factor dominates that combined kernel depends on direction: upsampling is dominated by the reconstruction filter (the output grid is fine, so little prefiltering is needed); downsampling is dominated by the prefilter (we must aggressively strip detail the coarse grid cannot hold). Same machinery, opposite emphasis.

fig-reconstruction-pipeline
Figure 3.8.6. The full resampling pipeline as one chain: sampled input → reconstruct (continuous signal) → prefilter (low-pass for the output grid) → resample (new grid). The reconstruction and prefilter steps are both convolutions, so they fold into one combined kernel; upsampling is dominated by the reconstruction filter, downsampling by the prefilter.

3.8.9 Better kernels: bicubic and Lanczos

If the kernel is what matters, what makes a good one? The theoretical answer, straight from the sampling chapter, is the sinc function, $\operatorname{sinc}(x)=\sin(\pi x)/(\pi x)$: it is the kernel whose Fourier transform is a perfect box, passing every frequency the grid can represent and rejecting every one it cannot, exactly. Sinc is the single mathematically optimal reconstruction filter. The catch — fatal in practice — is that sinc is infinitely wide, assuming an infinite image and an infinite compute budget. You cannot afford a kernel that reaches across the whole image for every output pixel, and it rings badly when truncated.

So every practical reconstruction kernel is an approximation of sinc with finite support, balancing two competing evils that are clearest in the frequency domain (the perfect-box-versus-smooth-approximation picture from the aliasing chapter). A kernel that is too gentle loses medium frequencies — its response rolls off early and the image goes soft — while one too aggressive lets high frequencies alias — it leaks past the grid's limit and admits jaggies and moiré. Every kernel is a point on this blur-versus-alias tradeoff. The two famous families both live on it.

Lanczos is sinc literally windowed by a wider sinc:

$$ k(x) \;=\; \operatorname{sinc}(\pi x)\,\cdot\,\operatorname{sinc}\!\big(\pi x / a\big), \qquad |x| < a, $$

over a support of $a$ samples on each side (and zero beyond). It hugs the ideal closely and looks crisp, at the cost of some ringing near strong edges — the surviving side-lobes of the truncated sinc (Figure C).

fig-lanczos-kernel
Figure 3.8.7. The Lanczos kernel $\operatorname{sinc}(\pi x)\,\operatorname{sinc}(\pi x/a)$ over its $a$-sample support (Lanczos-2 and Lanczos-3) against the plain infinite sinc: a tall central lobe flanked by alternating side-lobes, truncated at $\pm a$ — closest to the ideal box spectrum, but the surviving side-lobes are what produce edge ringing.

Bicubic uses a piecewise cubic polynomial — smooth, compact (four samples wide in 1-D), the standard default in image editors. It typically carries small negative lobes to the sides, and those are not a bug: they give the reconstruction a built-in mild sharpening (much like the unsharp mask) that makes upscaled images crisper. The cubic kernel is not unique — there is a family. Once you impose the natural requirements — the kernel should be symmetric, should be smooth where the cubic pieces meet, and should sum to one so it preserves brightness — a cubic still has two free parameters left. Mitchell and Netravali (1988) named them $B$ and $C$; the result is a single piecewise cubic, supported over four samples (two on each side):

$$ k_{BC}(x)=\frac{1}{6}\begin{cases}(12-9B-6C)\,|x|^3+(-18+12B+6C)\,|x|^2+(6-2B), & |x|<1\\[2pt] (-B-6C)\,|x|^3+(6B+30C)\,|x|^2+(-12B-48C)\,|x|+(8B+24C), & 1\le|x|<2\\[2pt] 0,& |x|\ge2\end{cases}$$

one cubic whose two knobs slide it between sharper and smoother. Three points of the $(B,C)$ plane are the canonical members worth knowing by name:

fig-bicubic-kernels
Figure 3.8.8. The cubic kernel $k_{BC}(x)$ as a one-parameter-pair family, plotted for the three standard members: cubic B-spline $(B,C)=(1,0)$, all-positive and smooth; Catmull–Rom $(0,\tfrac12)$, interpolating with strong negative side-lobes; and Mitchell–Netravali $(\tfrac13,\tfrac13)$, the balanced default between them. The deeper the negative lobes, the sharper (and ringier) the reconstruction.
fig-resample-kernels-2d
Figure 3.8.9. The kernels in 2-D, formed as the separable outer product $k(x)\,k(y)$ for Catmull–Rom and Lanczos-3, each shown as a diverging-color image and as a height field. The smooth central mound is ringed by a shallow negative moat — the sharpening side-lobes seen from above.

Mitchell and Netravali (1988) found their balance by running a perceptual study mapping which $(B,C)$ choices look best, charting where each lands among three failure modes: residual blur (too little negative lobe), ringing (ghost edges from too-strong negative lobes), and anisotropy (a visible preference for the horizontal / vertical / diagonal directions). A broad region of the $(B,C)$ map looks satisfactory, but different points suit different images — which is exactly why Photoshop and friends offer you a choice of bicubic ("bicubic smoother" for enlargements, "bicubic sharper" for reductions) rather than one fixed filter. There is no universally best answer, only a tradeoff you tune to the image: a slightly blurry original can afford a sharper kernel, a crisp one cannot.

Real software exposes exactly these kernel choices. Photoshop's Image Size resample menu lets you pick Nearest Neighbor, Bilinear, Bicubic, Bicubic Smoother (for enlargements), Bicubic Sharper (for reductions), and Preserve Details — the very box / tent / cubic family, and the smooth-versus-sharp bicubic tradeoff, turned into a dropdown (Figure F).

fig-photoshop-resample
Figure 3.8.10. Adobe Photoshop's Image Size resample-method dropdown (© Adobe Inc., reproduced for educational commentary), with each menu option mapped to its kernel: Nearest Neighbor → box, Bilinear → tent, Bicubic → cubic, Bicubic Smoother → cubic biased smooth (enlarge), Bicubic Sharper → cubic biased sharp (reduce), Preserve Details → an edge-aware / learned upsampler. The shipping product is this chapter's kernel zoo as a menu.
A surprising amount of "bicubic" on the internet is wrong

Bicubic is not "fit a cubic polynomial through the data points." It is convolution with a fixed cubic kernel — a reconstruction filter, not a curve fit. The confusion is widespread: blog posts, textbooks, and historically shipping software (old Python imaging libraries among them). If a description says it interpolates by fitting a cubic through the pixels, distrust it. Resampling — and downsampling especially — is notoriously easy to get subtly wrong and hard to debug, because the output still looks like an image, just a slightly bad one. The stakes are not merely cosmetic. Broken, aliased resizing in standard image libraries silently corrupts the metrics used to evaluate generative models (arXiv:2104.11222) and is an outright security hole (arXiv:2104.08690). The worst offender is nearest-neighbour downsampling with no prefilter: because it keeps one input pixel per output pixel and discards the rest, an attacker who knows the target size can plant a sparse lattice of tiny, near-invisible dots positioned to land exactly on the kept sample grid — so the decimated result is a different picture entirely (Figure G). Prefilter-free decimation is not just blurry-in-a-bad-way; it is broken enough to hijack.

fig-nn-downsample-adversarial
Figure 3.8.11. An adversarial-resampling demo. A large image that to the eye is one picture carries a faint regular lattice of dots borrowed from a second image. Nearest-neighbour downsampling to the target size keeps only the pixels on that lattice and throws away everything between them — so the hidden image emerges and the visible one vanishes. A correct prefilter, which averages over the discarded pixels, would have washed the dots out. The image-scaling attack (arXiv:2104.11222, arXiv:2104.08690) works only because nearest-neighbour decimation ignores all but one pixel per cell.
fig-interp-comparison
Figure 3.8.12. Nearest, bilinear, and two bicubics on the same upsampled crop (an iguana eye and scales), at a tight zoom and a larger crop. Nearest: each input pixel becomes a hard block — blocky and aliased. Bilinear: smooth but a touch soft, with faint creases at the sample grid. Catmull–Rom $(0,\tfrac12)$: crisp, the strong negative lobes adding bite — but watch the haloing and faint ringing on high-contrast edges. Mitchell $(\tfrac13,\tfrac13)$: softer and artefact-free but a hair less sharp. The two bicubics make the chapter's central tradeoff concrete: the same family slides between sharpness and aliasing / ringing, and which you prefer depends on the image and the zoom.
fig-resample-kernels-real
Figure 3.8.13. The same kernel ladder on a real photograph — a Boston skyline with a rainbow, resampled under a 35° rotation (the general case, not a clean axis-aligned zoom) and then magnified on a building detail. Nearest holds each sample as a hard square (blocky); bilinear is smooth but smears the brickwork; bicubic restores edge contrast with its negative side-lobes; Lanczos (a windowed sinc) is the sharpest of the four. Going down the ladder costs more taps per output pixel but recovers detail nearest and bilinear throw away.

3.8.10 Upsampling vs. downsampling: scale the kernel

We have promised twice that downsampling reuses the same machinery; now we cash it in. Whether you scale up or down, you run the same convolution with the same family of kernels. The only thing that changes is how you scale the kernel.

When upsampling, the kernel keeps a fixed footprint in the input image — about four pixels wide for bicubic, however much you enlarge — because the input samples are already dense enough; you are just reading between them. So for upsampling the kernel width in input space is constant (a useful corollary: clamp the kernel's scale factor at $1$, never narrower).

Downsampling is the genuinely harder case, and not by a little. Pick one input pixel per output pixel — nearest neighbour, no blur — and you get aliasing: a fine texture turns into bold, wrong, low-frequency moiré (a brick wall or a striped shirt shot from far away dissolving into wide phantom bands; Figure 3.8.16). This is not a cosmetic glitch but the wagon-wheel / Prudential-tower effect from the aliasing chapter: detail finer than the output grid can represent does not vanish, it folds down and masquerades as coarse structure.

The cure is exactly the anti-aliasing rule from that chapter: blur before you downsample. Average the input over the footprint of each new, larger output pixel, so the only patterns surviving into the small image are ones it can actually represent; then sample. Mechanically this is the same kernel as upsampling, scaled up in input space: the more aggressively you shrink, the wider the kernel must reach, because we want it to stay a roughly fixed size in output space — so it has to span more and more input pixels and average over all of them. Downsample by $5\times$ and each output pixel should gather roughly a $5$-pixel-wide neighbourhood. So the whole up / down distinction collapses to one knob: keep the kernel fixed in input space to upsample, grow it in proportion to the scale factor to downsample. One code path, one kernel-scale parameter (Figure 3.8.15).

fig-bicubic-bc
Figure 3.8.14. The Mitchell–Netravali cubic family. A single pair of parameters $(b, c)$ tunes the cubic resampling kernel across a whole family — from blurry (the cubic B-spline, $b=1$) through the balanced Mitchell ($b=c=\tfrac13$) to sharp-but-ringing (Catmull-Rom, $b=0$, $c=\tfrac12$). Drag the $(b,c)$ point to trade blur against ringing and overshoot, shown on the kernel, its frequency response, and a resampled edge.
fig-kernel-scaling
Figure 3.8.15. The one knob that distinguishes up- from downsampling. Upsampling: the kernel keeps its native width in input space (about four pixels for bicubic), reaching across the same handful of dense input samples no matter how much you enlarge. Downsampling: the kernel is stretched to span the output-pixel spacing — by $s\times$ for an $s\times$ reduction — so it averages over all the input detail crammed into each new pixel, doubling as the anti-aliasing prefilter. Same kernel shape, scaled differently.
fig-downsample-aliasing
Figure 3.8.16. Downsampling a fine texture (a zone plate, rings finer toward the edge), across three panels. Left: the high-resolution input — the high-frequency signal we start from. Middle: shrink by simply dropping samples — high-frequency detail aliases into bold, wrong moiré. Right: blur first (a kernel scaled up to the output-pixel footprint) so sub-grid detail is removed, then sample — a clean, correctly-averaged miniature. Downsampling without a prefilter is the classic resampling bug.

The effect is ugliest on real, regular structure — the rows of windows on a glass facade, shrunk for a thumbnail, break into sweeping moiré if you decimate without prefiltering (Figure 3.8.17).

fig-downsample-comparison
Figure 3.8.17. Downsampling a real photograph of a building facade. Left: naive decimation (pick one pixel per output) turns the regular grid of windows and mullions into coarse, wrong moiré bands sweeping across the facade — aliasing on actual content, not a test pattern. Right: the same shrink with a properly scaled prefilter first — a clean, sharp miniature with the window grid faithfully averaged down. The artefact is exactly the aliasing of the previous figure, now on the kind of image you would really downsample.

3.8.11 Prefiltering when the transform isn't a clean scale

Everything above quietly assumed a uniform scale, where the right prefilter is the same everywhere and radially symmetric — the same in every direction, a round footprint. For a shear, rotation, or perspective (homography) that assumption breaks. Back-project a single square output pixel through $f^{-1}$ and its footprint in the input is no longer round: under an affine map it becomes a stretched, tilted ellipse, elongated in whatever direction the transform compresses most. So the ideal prefilter is not a circle but that ellipse — anisotropic and position-dependent, a different shape at every pixel. Doing it exactly is therefore spatially varying and anisotropic, which is genuinely expensive. In practice we approximate it: mip-maps (a precomputed pyramid of progressively blurred, downsampled copies — see pyramids) let hardware pick a roughly-right isotropic blur level cheaply, and elliptical weighted average (EWA) filtering approximates the true back-projected ellipse more faithfully when quality matters. The takeaway for now: the clean "scale the kernel" story is the easy case, and non-uniform transforms need an anisotropic, position-dependent prefilter we can only approximate.

Sidebar — Andrew Adams' rotation challenge

Here is a stress test that exposes a resampler instantly: rotate an image all the way around in $N$ equal steps — rotate $N$ times by $360/N$ degrees — so the geometry returns the image to its original orientation and a perfect resampler would hand back (nearly) the original picture. The point is to try several $N$ (the figure shows $N=10$, $60$, and $360$): the more steps you take, the more times the image is resampled, and the more — smaller — rotations compound more damage, not less. Crucially you rotate $k$ times by $360/N$ each; you never do a single rotation by $k\cdot 360/N$, which would resample only once. A naive resampler — repeated bilinear with no proper prefilter — blurs a little on every step, and those tiny blurs compound into mush; the image dissolves, worse at larger $N$. A good resampler (a proper prefilter plus a sharp-but-stable reconstruction kernel) keeps the errors from accumulating and returns something very close to the start even after $360$ rounds. It is a brutal, honest way to compare resampling quality: errors invisible once become glaring after a few hundred rounds.

fig-rotation-challenge
Figure 3.8.18. Andrew Adams' rotation challenge at three step counts: rotating an image all the way around in $N=10$, $60$, and $360$ equal steps of $360/N$ degrees, each step resampling the previous result. A naive resampler blurs a little on every resample, so more (smaller) steps compound more damage and the image degrades visibly more at $N=360$ than at $N=10$; a good resampler returns close to the original at every $N$. Each row is the same total $360°$ of rotation, only chopped into more or fewer resampling steps; a validity mask tracks the corners that ever leave the frame.

3.8.12 Where this goes next

We end where the chapter began: resampling is reconstruct-then-resample, and reconstruction is convolution with a kernel that approximates sinc on a finite budget. That one idea recurs immediately. The blur-then-downsample reduce step is the engine of the Gaussian and Laplacian pyramids, the next chapter. The inverse-mapping discipline — loop over output, apply $f^{-1}$, reconstruct — is exactly what general warping and morphing will need once we let the user specify an arbitrary $f$. And a resampler's quality is one of the things we will quantify when we reach image metrics. The mechanics are humble — a weighted average of a handful of neighbours — but, as the rotation challenge shows, getting that weighted average right is what separates a tool you can trust from one that quietly destroys your image.