💡 In a hurry? Jump to this chapter’s 1 big lesson ↓

3.7 Fourier⧉

The previous chapter left us with a debt. We built blur out of convolution, and we noted that blur is not only something we do to an image but something the world does to us: diffraction, a soft lens, a shaky hand, a missed focus. The natural wish is to run that blur backward — to deblur. That wish organizes this entire chapter. Deblurring is an inverse problem: we are handed the blurred image and asked for the sharp one, which means undoing a convolution. Whether that is even possible, and what it costs, turns out to hinge on a single change of viewpoint — seeing the image not as a grid of pixels but as a sum of waves. That viewpoint is the Fourier transform, and reaching it honestly is most of the work here. On the way it hands us, almost as a free gift, the right way to think about sampling and aliasing: the jaggies, the moiré on a striped shirt, the wagon wheel that spins backward in an old film. We cash in the deblurring promise at the very end.

To see why we need new machinery at all, start with two perfectly ordinary operations and the baffling results they can produce (Figure 3.7.1). First, deblurring: take a sharp photo, blur it and add a whisper of noise, then try to undo the blur the obvious way — invert the operation directly. Instead of the sharp original, the result explodes into noise. Second, downsampling: shrink a photo of a building facade by simply keeping every seventh pixel, and bold curved bands appear that are nowhere in the building — phantom moiré. Neither result is a bug; both are doing exactly what we asked. What makes them so hard to reason about is that each operation couples a great many pixels at once — a blur mixes every neighbour into every output, and decimation's fate depends on how the discarded pixels related to the kept ones across the whole image. Staring at the pixels tells us nothing. To understand how these artifacts and patterns emerge — and how to predict, invert, and tame them — we need a different mathematical tool, one in which these many-pixel operations become simple enough to analyze at a glance. That tool is the Fourier transform, and building it is the work of this chapter.

fig-fourier-motivation — **Figure 3.7.1.** Two simple operations, two baffling results — the motivation for the whole chapter. **Top:** a sharp photo is blurred and given a little noise; inverting the blur by the naive, direct method does not return the sharp image but **explodes into noise**. **Bottom:** a building facade — a fine, near-periodic grid of windows — is shrunk by keeping every seventh pixel with no prefilter, and false **moiré** bands appear that are not in the scene. Both artifacts arise because the operation couples many pixels at once, which is what makes them hard to analyze in the pixel domain — and exactly what the Fourier transform untangles.

Sidebar — who was Fourier?

Portrait of Joseph Fourier, engraving after Julien-Léopold Boilly Joseph Fourier (1768–1830) reached his transform not through pure mathematics but through physics — the conduction of heat. In the 1822 Théorie analytique de la chaleur he made the then-scandalous claim that any function, however jagged, can be written as a sum of sines and cosines; contemporaries including Lagrange doubted it, and pinning down exactly when it holds occupied analysts for a century after. The decomposition into frequencies that diagonalizes convolution — the engine of this whole chapter — is his. (Fourier also argued that the atmosphere keeps the Earth warmer than incoming sunlight alone would: the first description of what we now call the greenhouse effect. He was, briefly, Napoleon's governor of Lower Egypt, too.) Portrait: engraving after Julien-Léopold Boilly, early 19th c., public domain (via Wikimedia Commons).

3.7.1 Images as vectors in a high-dimensional space⧉

Begin with a reframing that costs nothing and pays for the whole chapter. A grayscale image with $W \times H$ pixels is simply $W\!\cdot\!H$ numbers. Line them up in a fixed order — unroll the grid row by row into one tall column — and the image becomes a single vector, one point in a space with that many dimensions (Figure 1). A one-megapixel image is a single point in a million-dimensional space. The move feels abstract, but it is exactly what lets us bring linear algebra to bear on images, and linear algebra is the sharpest tool we have for the kind of problem we care about. The flattening is sometimes literal — it is, after all, already how the pixels sit in memory and in a machine-learning tensor — and sometimes only conceptual; the image remains, just as validly, a function over pixel coordinates. (For the linear-algebra vocabulary itself — basis, dot product, orthonormality — see the Refreshers appendix; we lean on Strang's Introduction to Linear Algebra as the standard reference.)

fig-image-as-vector — **Figure 3.7.2.** *An image is a vector.* A small $H \times W$ pixel grid (left) is unrolled row by row into one tall column vector (right) — the same gray cells, merely restacked. An image of $H\!\cdot\!W$ pixels is therefore a single point in an $n = H\!\cdot\!W$ dimensional space, and a one-megapixel image lives in a space of a million dimensions. This flattening is how images already sit in memory, and it is what lets us treat image operations with the machinery of linear algebra.

Why does the vector view help? Because a great many image operations are linear: brightening scales the vector, adding two exposures adds the vectors, and — the one that matters here — blur is linear too. Any linear operation on a vector is a matrix. So an optical blur, a motion blur, a Gaussian filter: each is some large matrix $M$ that carries the sharp image $\mathbf{x}$ to the blurred image $\mathbf{y} = M\mathbf{x}$. Deblurring is then, in principle, nothing more dramatic than solving a linear system: invert the matrix, recover $\mathbf{x} = M^{-1}\mathbf{y}$. The entire question of whether we can deblur, and how badly noise will hurt when we try, collapses into a question about a single matrix — and that is a question linear algebra knows how to answer. This dual usefulness, conceptual (a clean language for inverse problems) and practical (we get to call well-tested solvers and libraries), is the reason to adopt the view.

There is one practical worry to dispel immediately. The matrix $M$ for a one-megapixel image is a million by a million — far too large to build, store, or invert directly. The trick, which recurs throughout the book, is that we reason about $M$ without ever forming it. We work out what to do in the abstract language of $M$ and $\mathbf{x}$, then implement it with the cheap image operations that $M$ secretly stands for — a convolution by the blur kernel, a scalar multiply, a dot product of two images. The matrix is the thinking tool; the convolution is the running code. Keeping that separation in mind is what makes the linear-algebra view usable rather than merely elegant.

Sidebar — when do we get to use linearity?

Linearity is a gift, and we grab it whenever we honestly can. Sometimes the world genuinely is linear: light adds up, optical blur is a true linear operator. Sometimes we must reformulate to expose a linearity hiding underneath — a multiplicative effect becomes additive in log, perspective becomes linear in homogeneous coordinates (later). Sometimes we simplify a messy reality to a linear model on purpose, because linear tools are so much cheaper and better understood. The honest caveat for this chapter's physics: images are usually gamma-encoded for storage, while optical and camera-shake blur happen in linear light — so decode to linear before deblurring. The recurring caution of the book holds: idealized linear math rarely fits reality exactly, but its insights almost always survive the mismatch.

3.7.2 why Fourier⧉

Here is the catch with the matrix view. The blur matrix $M$ is enormous, and although each row is actually quite sparse — every output pixel mixes only the handful of inputs under the kernel, so most entries are zero — that local cross-talk is still painful to reason about and, above all, to invert. Undoing the blur means untangling all the overlapping neighborhoods at once, and the inverse $M^{-1}$ is generally not sparse: a clean local operation can have a thoroughly non-local undo. You cannot eyeball a million-by-million matrix and pronounce whether it can be undone. The structure is at least tidy in 1-D: there the convolution matrix is Toeplitz — the same kernel sliding down every row, constant along each diagonal — clean enough to draw. In 2-D that tidiness fragments: once you flatten the grid into a vector, the matrix becomes a block-of-Toeplitz-blocks, far messier to picture even though it is doing the very same local averaging. That is one more reason to change basis rather than wrestle the matrix directly.

There is, however, one species of matrix that is trivial to understand and trivial to invert: a diagonal matrix. A diagonal matrix has no cross-talk whatsoever — each input coordinate is multiplied by its own scalar, independently of every other. Inverting it is then embarrassingly easy: divide each coordinate by that scalar. The entire difficulty of inversion shrinks to one question per coordinate — how close is that scalar to zero? — because dividing by a near-zero number is where all the trouble lives.

So the dream is obvious: we would love the blur matrix to be diagonal. In the pixel representation it is nowhere close. But — and this is the pivot of the chapter — we are free to change how we represent the image, to describe it in a different basis, and a linear operator that looks coupled and hard to invert in one basis can look perfectly diagonal in another. The basis that diagonalizes convolution is precisely the Fourier transform. Change to the Fourier representation and blur stops being a million-dimensional tangle and becomes, instead, a separate scalar multiplication on each frequency — exactly the diagonal case we wished for. (One honest caveat, which we keep flagging: this exact diagonalization assumes the image is cyclic — that it wraps around at its edges like a torus. Real images do not, so the borders misbehave; the insight survives anyway, which is why we lean on it as an analysis tool more than a literal implementation.)

💡 Big lesson — diagonalize when you can

Express a linear operator in a basis where it becomes diagonal. A coupled, hard-to-invert operation in one representation can be a list of independent scalar multiplications in another — and scalars are trivial to understand, compose, and invert. Fourier diagonalizes convolution, turning a blur into per-frequency multiplication. This is not a one-off trick: it is the single most reusable move in applied linear algebra, and it returns for differential equations (the heat equation, below), for principal components, and for nearly every inverse problem in this book. The idea is already visible in the smallest possible case — a $2\times2$ matrix (Figure 3.7.3).

fig-diagonalize-2x2 — **Figure 3.7.3.** The smallest example of diagonalizing. The symmetric matrix $M=\bigl[\begin{smallmatrix}2&1\\1&2\end{smallmatrix}\bigr]$ looks **coupled** in the $x,y$ basis — the off-diagonal $1$s mix the two axes. But its eigenvectors point along $(1,1)$ and $(1,-1)$, i.e. at $\pm 45^\circ$, and in that rotated basis it is simply $\operatorname{diag}(3,1)$: $M=R_{45^\circ}\,\operatorname{diag}(3,1)\,R_{45^\circ}^{\top}$. **Left:** the algebra. **Right:** the geometry on the standard plane — the unit circle (dotted) maps to an ellipse (solid) whose axes lie exactly on the $45^\circ$ eigenvectors, stretched $\times 3$ along the diagonal $u_1$ and left unchanged ($\times 1$) across it. Rotate to the eigenbasis and the coupling vanishes, leaving two independent scalings — diagonalization in miniature.

And there is a bonus that makes Fourier feel less like a computational convenience and more like the right language for images. We have already met spatial frequency once, back in perception: the contrast sensitivity function (CSF) describes the eye's sensitivity as a function of how fine or coarse a pattern is — its spatial frequency (see Human perception → Spatial vision / CSF). The visual system is, to a first approximation, organized by frequency. So the Fourier view matches both the mathematics of convolution and the structure of human vision, which is why frequency keeps reappearing — in JPEG compression, in image-quality metrics, in tone mapping. When two unrelated considerations point at the same representation, that representation is usually worth learning.

Sidebar — a Fourier transform you can see

Before it is a piece of mathematics we impose on images, the Fourier transform is something the physical world already does. Look at a distant point of light — a star, a streetlight — through a small aperture or a camera lens, and the little starburst pattern you see is, almost exactly, the Fourier transform of that aperture (its squared magnitude). A round opening makes the bright central disc and faint rings of the Airy pattern; the polygonal opening of a stopped-down iris makes the radiating sunstar spikes around a bright light, one streak perpendicular to each blade-edge. The optics performs the transform for free, in the instant it takes light to cross the lens — no FFT required. The lesson to take into this chapter is that the Fourier transform is not an arbitrary trick: it is a natural, physical way that waves and apertures already organize themselves, and we are simply borrowing it to organize images. (We meet the optics of this in the OPTICS part, where the lens's frequency response — its modulation transfer function, the Fourier transform of its point-spread blur — is this same picture seen in the frequency domain.)

3.7.3 Definition: one coefficient per wave⧉

Here is the transform itself, in words first. Instead of describing an image by one value per pixel, the Fourier transform describes it by one coefficient per sine wave. We fix a family of sinusoidal patterns — waves of every frequency that fit in the image — and ask, for each one, how much of that wave is present? The list of answers is the Fourier transform $\hat I$ of the image, and because the operation is perfectly reversible, that list carries exactly the same information as the original pixels: it is the same image, written in a different alphabet.

For an $N$-sample one-dimensional signal the recipe is a sum — one output coefficient $\hat I(u)$ for each frequency index $u$. This is the discrete Fourier transform (DFT):

$$ \hat I(u) = \sum_{x=0}^{N-1} I(x)\, e^{-2\pi i\, u x / N}. $$

Read it back in words: to find how much of frequency $u$ the signal holds, slide a wave of that frequency across the signal, multiply point by point, and add up — a dot product between the signal and that wave. The complex exponential $e^{-2\pi i\,ux/N}$ is just a compact way to carry a cosine and a sine of frequency $u$ together, by Euler's formula

$$ e^{i\theta} = \cos\theta + i\,\sin\theta, $$

so the real part of $\hat I(u)$ measures the cosine content and the imaginary part the sine content. Why carry both? Because a single frequency can appear at any phase — shifted left or right — and a cosine and a sine of that frequency together can reproduce a wave of that frequency at any phase and amplitude. One pair, one cosine and one sine, per frequency. (The naive sum costs $O(N^2)$, but the fast Fourier transform (FFT) computes the very same coefficients in $O(N\log N)$ — the algorithm that made Fourier methods practical, and the reason spectral tricks are cheap in code.)

Sidebar — who was Euler?

Portrait of Leonhard Euler, by Jakob Emanuel Handmann Leonhard Euler (1707–1783) was the most prolific mathematician who ever lived — he kept publishing for years after going blind, dictating from memory. The formula just above, $e^{i\theta}=\cos\theta+i\sin\theta$, is his, and it is what lets a single complex exponential carry a cosine and a sine together — the bookkeeping that makes the Fourier transform compact. His reach into this book is wide: the Euler–Lagrange equations we set to zero to minimize an energy (optical flow, Poisson blending), and the Eulerian "watch a fixed location" view of motion that names half of Feature tracking and all of Video magnification. Portrait: by Jakob Emanuel Handmann, 1753, public domain (via Wikimedia Commons).

⚠️ So many Fourier conventions

A warning to carry out of this chapter, because it will bite the first time you compare a formula against code. The transform above is one of many cosmetically different definitions, and they do not agree digit-for-digit. They differ in the form (the compact complex-exponential here versus separate real sine/cosine series); in where the $2\pi$ lives (in the exponent, folded into the frequency variable, or hidden in the normalization); in which transform you mean (the continuous Fourier transform, the Fourier series of a periodic signal, or the discrete DFT used in code); in the normalization (a $1/N$ on the forward transform, a symmetric $1/\sqrt N$ on each side, or none at all); in the sign of the exponent (forward versus backward — who gets the minus); and in angular versus ordinary frequency. Textbooks, signal-processing courses, and NumPy each pick their own combination, so a formula lifted from one source and dropped into another can be off by a factor, a sign, or a $2\pi$ without any error message to warn you. The rule is simple: always check which convention a source uses before you trust its formula. We use the one above throughout, and — reassuringly — the intuition (a dot product against each wave, convolution becoming per-frequency multiplication) survives every convention unchanged (Figure 3.7.4).

fig-fourier-conventions — **Figure 3.7.4.** *So many Fourier conventions — beware the "flavors."* Two light illustrations of the same warning: the **same** Fourier transform comes in many cosmetically different "flavors" — complex exponential versus real sine/cosine, where the $2\pi$ lives, the normalization, the sign of the exponent, continuous transform versus series versus DFT — each genuinely correct, none digit-for-digit interchangeable. The point is the lesson: there is no single canonical formula, only a convention you must pin down before trusting any equation or dropping it into code.

That "dot product against each wave" structure is the punchline of the vector view. The Fourier basis is orthonormal: the waves are mutually perpendicular unit vectors in our high-dimensional space, so finding a coefficient is just projecting the image onto that wave — a dot product, no system to solve. The whole transform is therefore a single matrix multiplication: stack the waves as the rows of a matrix and multiply by the flattened image (Figure 2). And the same orthonormal change of basis that those rows perform is exactly the one that diagonalizes every convolution at once — every linear shift-invariant (LSI) operator. That is the general theorem hiding beneath the chapter: Fourier diagonalizes every linear shift-invariant operator.

fig-fourier-basis-matrix — **Figure 3.7.5.** *The DFT as a change-of-basis matrix.* Each row is one basis wave — a cosine (real part) and a sine (imaginary part) of a given frequency, from the flat constant term at the top through ever-finer ripples below. Multiplying this matrix by the flattened image computes, row by row, the dot product of the image with each wave — i.e. how much of each frequency it contains. Because the rows are orthonormal, the matrix is its own (conjugate-transpose) inverse: the same waves reconstruct the image from its coefficients.

Sidebar — Fourier and the heat equation

This trick was not invented for images. Jean-Baptiste Joseph Fourier (1768–1830) was trying to solve heat diffusion in a one-dimensional rod with a given initial temperature profile, and the governing equation was a coupled mess. His move, in 1807, was to write the temperature as a sum of cosines — and in that basis the heat equation diagonalizes: each frequency evolves entirely on its own, and a single cosine simply decays exponentially in time ($\propto e^{-k^2 t}$), the highest frequencies fastest, so any profile smooths out. The cosines are the eigenfunctions of the Laplacian (the diffusion operator) — the same reason Fourier diagonalizes our convolutions. Notice what separation of variables has handed us: the two canonical solution families of linear differential equations meeting in one problem — sine waves in space and exponentials in time. The spatial sinusoids set the basis; their amplitudes decay as $e^{-k^2 t}$, the finest ripples dying first, so a general profile melts smooth while a single sine merely fades in place without changing shape (Figure 4). The lesson reaches far past heat: for almost any linear ordinary differential equation (ODE) or partial differential equation (PDE), find the eigenbasis and a coupled system decouples into independent scalar equations whose solutions are plain exponentials ($e^{\lambda t}$) or, for oscillatory systems, sine waves. Diagonalizing is the win.

fig-heat-diffusion-1d — **Figure 3.7.6.** *Heat diffusion in 1-D: sines in space, exponentials in time.* Left, a general 1-D temperature profile (amplitude vs. position) shown at successive times — every wiggle softens, the sharpest features vanishing first, until the profile flattens. Right, the same equation acting on a single sine wave: it keeps its exact shape and position and simply **decays in place**, its amplitude shrinking as $e^{-k^2 t}$. The two panels are the same fact: in the Fourier basis the coupled diffusion decouples into independent modes, each a spatial sinusoid whose height falls off exponentially in time, the highest frequencies fastest.

3.7.4 Sines are the eigenvectors of convolution⧉

We have asserted that Fourier diagonalizes convolution. Here is the one-line reason, and it is worth seeing because it is the mechanical heart of the chapter. Feed a pure sine wave into any convolution — any blur — and what comes out is the same sine wave: same frequency, only with its amplitude scaled and its phase possibly shifted (Figure 5). The wave is never bent into a different frequency; it is merely turned down (or up) and slid over. In the language of linear algebra, the sine waves are the eigenvectors of convolution, and the per-frequency scale factor is the eigenvalue. Symbolically, convolving a wave $e^{i\omega x}$ by a kernel $g$ just multiplies it by a number $\hat g(\omega)$ that depends only on the frequency:

$$ g * e^{i\omega x} = \hat g(\omega)\, e^{i\omega x}. $$

In words: a blur cannot create or move frequencies, it can only attenuate each one by its own factor. That factor $\hat g(\omega)$ — the eigenvalue at frequency $\omega$ — is exactly the Fourier transform of the kernel. A blur, having a broad smooth kernel, leaves the low frequencies almost untouched ($\hat g \approx 1$) and crushes the high ones toward zero ($\hat g \approx 0$): blur is a low-pass filter, which is just the frequency-domain restatement of "blur removes fine detail."

Because every frequency is scaled independently, the whole convolution, written in the Fourier basis, is diagonal — the eigenvalues $\hat g(\omega)$ sit on the diagonal and nothing else. This is the convolution theorem, and it is the sentence to memorize:

$$ \mathcal{F}\{I * g\} = \hat I \cdot \hat g. $$

Convolution in space is multiplication in frequency: transform the image and the kernel, multiply their spectra frequency by frequency, transform back. Everything else in this chapter is a consequence of that single identity.

fig-sine-eigenvectors — **Figure 3.7.7.** *Sine waves are the eigenvectors of convolution.* A pure sinusoid (top) convolved with a blur kernel comes out (bottom) as the **same** sinusoid — identical frequency — only reduced in amplitude and slightly shifted in phase. No new frequencies appear. The amount of attenuation is the kernel's Fourier coefficient $\hat g(\omega)$ at that frequency: large for low frequencies, near zero for high ones, which is why blur is a low-pass filter.

3.7.5 The 2-pixel example⧉

A million dimensions is impossible to picture, so shrink the image to two pixels — a vector in the ordinary 2-D plane, with the two pixel values as its $x$ and $y$ coordinates (Figure 6). Take the simplest blur: each output pixel is $0.8$ of itself plus $0.2$ of its neighbor. As a matrix in the pixel basis this mixes the two coordinates — it is not diagonal; the off-diagonal $0.2$ is the cross-talk.

Now rotate the coordinate axes by 45°, onto the two directions $[1, 1]$ (both pixels equal — the constant, the zero-frequency or direct current (DC) term, borrowing the electrical-engineering name) and $[1, -1]$ (the two pixels opposite — the highest frequency a two-pixel image can carry). That 45° rotation is the Fourier transform for a two-pixel signal. And in these rotated axes the very same blur is diagonal: the constant direction $[1, 1]$ passes through untouched (a flat signal cannot be blurred — eigenvalue $1$), while the oscillating direction $[1, -1]$ is attenuated to $0.6$ (since $0.8 - 0.2 = 0.6$, the high frequency is damped). Two pixels, two frequencies, two independent scale factors — the entire apparatus of the chapter in a picture you could draw on a napkin. Everything that follows is this, scaled up to a million dimensions.

fig-fourier-2pixel — **Figure 3.7.8.** *The 2-pixel example.* A two-pixel image is a point in the plane (axes = the two pixel values). The blur $\text{out} = 0.8\cdot\text{self} + 0.2\cdot\text{neighbor}$ mixes the axes — off-diagonal, not diagonal. Rotating the axes by 45° to $[1,1]$ (constant / DC) and $[1,-1]$ (highest frequency) — which is the two-point Fourier transform — makes the same blur diagonal: it leaves the constant direction unchanged (eigenvalue 1) and scales the oscillating direction to 0.6. A change of basis turns mixing into independent per-frequency scaling.

3.7.6 Reading an image's Fourier transform⧉

For a 2-D image the recipe is the same with two frequency indices $(u, v)$ — horizontal and vertical — so the transform is itself an image, the same size as the original, with a value at each frequency. We display it as two pictures: a magnitude (how much of each frequency is present) and a phase (where each wave sits). A handful of properties make these readable.

First, a real image has a conjugate-symmetric spectrum: the coefficient at frequency $(u, v)$ is the mirror of the one at $(-u, -v)$, so half the coefficients are redundant and the magnitude image is point-symmetric about its center. The very low frequencies (slow, large-scale variation) sit near the center; the high frequencies (fine detail, sharp edges) sit out toward the corners. Natural images put most of their energy at low frequencies — the world is mostly smooth, with detail concentrated at edges — so a typical magnitude spectrum is bright in the middle and dim at the edges, often with bright streaks where strong oriented structure (a horizon, a picket fence) lives. Set a real photograph beside its log-magnitude spectrum and you can read these correspondences off directly (Figure 7): the bright center is the smooth tone, oriented scene edges show up as bright streaks through the center, and fine texture fills the outskirts. (Beware: the cyclic wrap-around assumption itself injects a cross of horizontal and vertical power, from the discontinuity where the image's left edge meets its right; serious analysis windows the image first to suppress it.)

fig-fourier-read-spectrum — **Figure 3.7.9.** *Reading a real photograph's Fourier magnitude.* Left, a photo with strong oriented edges; right, its log-magnitude spectrum $\log(1+|\hat I|)$, with DC at the center (fftshift). The correspondences are labelled: the bright center is the DC and low frequencies (the smooth, large-scale tone); strong oriented edges in the scene produce bright streaks through the center; fine texture and detail land in the high-frequency outskirts. The image is windowed before the transform so the wrap-around boundary cross does not drown the genuine oriented structure.

A second reading aid is to manipulate the spectrum and watch what returns. Because filtering is multiplication in frequency, we can low-pass or high-pass a real photo by hand: transform it, multiply the spectrum by a mask — a disk that keeps only the center, or its complement that keeps only the outside — and transform back (Figure 8). Keep the center and the fine detail is gone: a smooth, blurred image, since the detail lived in the high frequencies we deleted. Keep the outside and only the edges and texture survive, the smooth tone subtracted away. This is the convolution theorem turned into an operation you can watch.

fig-fourier-lowpass-highpass — **Figure 3.7.10.** *Filtering a real photo by masking its spectrum.* From left: the photo; its spectrum with a low-pass disk drawn on it (keep inside); the inverse transform after low-pass masking — a smooth, blurred image (the fine detail, which lived at high frequencies, is gone); the inverse transform after the complementary high-pass mask — just the edges and texture on a mid-gray (the smooth tone removed). Multiplying the spectrum by a mask and transforming back is "convolution in space = multiplication in frequency" performed by hand.

Second, and more surprising: it is the phase, not the magnitude, that carries the structure of a scene. The magnitude reports which frequencies and orientations are present; the phase reports how they line up to form edges and objects. The standard demonstration is to swap them — take the magnitude of image A with the phase of image B — and you recognize the image whose phase you kept, not its magnitude (Figure 9). Phase is where the picture lives.

fig-fourier-magnitude-phase — **Figure 3.7.11.** *An image's Fourier magnitude and phase, and the phase-swap demo.* Top: a photo, its magnitude spectrum (bright at the center = low frequencies, energy fading outward, oriented streaks from strong edges), and its phase. Bottom: cross the two — magnitude of one image, phase of the other — and the result resembles the image whose **phase** was used. Magnitude says which frequencies are present; phase says how they assemble into structure.

A spatial shift, third, shows up as a pure phase ramp — sliding the image over does not change which frequencies are present (the magnitude is identical), it only re-times them (the phase rotates). This is consistent with the eigenvector picture: a shift is a convolution by a shifted impulse, and a convolution can only rescale each frequency by a complex number of magnitude one — a pure rotation, i.e. a phase change.

Fourth, the compact-in-space / spread-in-frequency tradeoff. A narrow, sharply localized blob in the image has a wide spectrum, and a wide smooth blob has a narrow one (Figure 10). A Gaussian makes this exact: the narrower the Gaussian in space, the wider its Fourier transform — which is itself a Gaussian. You cannot be compact in both domains at once. This locality-versus-frequency tradeoff is a genuine uncertainty principle, and it is the reason we will later want pyramids and wavelets, which sit halfway between the pixel and Fourier representations to buy some localization in both.

fig-compact-space-frequency — **Figure 3.7.12.** *Compact in space ⇔ spread in frequency.* Left column, two Gaussians in the image domain — a narrow one and a wide one; right column, their Fourier magnitudes. The narrow Gaussian (tightly localized in space) has a broad spectrum; the wide Gaussian (spread out in space) has a narrow spectrum. You cannot be simultaneously compact in both domains — the tradeoff that motivates multiresolution pyramids.

Sidebar — MTF, the lens's frequency response

The same language describes a lens. An optical system blurs by its point spread function (PSF) — the little smudge a perfect point of light becomes — and convolving the scene by that PSF is exactly the blur of this chapter. Its Fourier transform, the per-frequency attenuation, is the modulation transfer function (MTF): it reports, for each spatial frequency, how much contrast the lens preserves. A perfect lens would have $\text{MTF} = 1$ everywhere; a real one rolls off toward high frequencies, and diffraction sets a hard ceiling past which no contrast survives — the diffraction limit (see Optics). The MTF is how lenses are actually specified, and it is the frequency-domain face of the PSF — the very low-pass attenuation we will have to fight when we try to deblur.

3.7.7 The fine print: two limitations of Fourier⧉

Before we lean on this machinery any harder, two honest caveats are worth stating plainly, because the rest of the book keeps bumping into them. Both have surfaced already in passing; here they are named, because each one is the seed of a later idea.

The first is the one we have been flagging since the beginning: the DFT treats the signal as cyclic — periodic. It does not see a finite image with edges; it sees that image tiled to infinity, wrapping around so that whatever runs off the right edge re-enters on the left, and whatever leaves the top returns at the bottom. Where the left edge and the right edge disagree — which, for any real photograph, is essentially everywhere — that wrap-around manufactures a hard artificial discontinuity, and a discontinuity is broadband: it sprays energy across all frequencies. That spurious energy is the boundary leakage that lays a faint cross of horizontal and vertical power over a real spectrum, and it is the ringing that haunts frequency-domain filtering near borders. It is exactly why serious spectral analysis windows the image first — tapering it to zero at the edges so the wrap-around seam vanishes — and exactly the periodic-boundary assumption baked into any FFT-based deblurring solver, which silently inherits this same wrap-around and must be coaxed not to ring at the frame's edge.

The second cuts deeper, and we met its shadow in the space-versus-frequency tradeoff just above. The Fourier basis has infinite, global support: every single coefficient is a wave that spans the entire image, edge to edge. So a Fourier coefficient can tell you which frequencies are present, but it cannot tell you where — there is no spatial localization at all. Move one pixel, or paste a small bright object into one corner, and in principle every coefficient changes, because every global wave must be re-weighted to account for that one local change. This is not a flaw to be patched but a fundamental property of a global basis, and it is precisely what motivates the wavelets and pyramids of the next part: representations that sit halfway between pixels and Fourier, trading a little frequency precision for the ability to say where in the image a given band of detail lives (Linear pyramids and wavelets). The uncertainty principle we saw with the Gaussian — compact in space or compact in frequency, never both — is the same wall, seen from the other side.

3.7.8 Sampling and aliasing⧉

Now the second payoff. A digital image is a sampling of a continuous scene — we record the value on a regular grid and throw away everything in between. Most of the time that is harmless. But when the scene carries detail finer than the grid can resolve, something strange happens, and it is one of the most counter-intuitive facts in imaging.

Picture photographing a building's facade — rows of fine windows, or a striped shirt — and shrinking the result. Where the stripes are finer than the output pixels, they do not melt into a uniform gray; they reorganize into bold, wrong, low-frequency bands and swirls that are not in the scene at all: moiré. The same effect spins wagon wheels backward in old films and freezes a helicopter's rotor on video. Intuitively, when a pixel must summarize detail it cannot represent, it ought to return the local average — a neutral gray. Instead, the leftover high frequency gets misread as a low one. To see why precisely we need Fourier — and, conveniently, sampling is one more thing Fourier makes easy, because a sampling grid is just a set of evenly shifted spikes.

The clean case is a single sine wave (Figure 11). Sample a high-frequency sine too coarsely — fewer than two samples per cycle — and the samples you collect fall into a pattern indistinguishable from a lower-frequency sine. The original travels in disguise as a different frequency; that impostor frequency is its alias. The damage is not blur (lost detail you could shrug off); it is corruption — energy that lands at the wrong frequency and masquerades as real structure. The same masquerade plays out in 2-D on a real photograph (Figure 12): a facade's regular window grid is a high spatial frequency, and sampling it too coarsely without a pre-filter folds that grid down into bold, slow moiré bands that are not in the building.

The rule that prevents it is the Nyquist–Shannon sampling theorem: to represent a signal faithfully you must sample at more than twice its highest frequency. Equivalently, with a fixed sampling grid there is a top frequency — the Nyquist frequency, half the sampling rate — that the grid can represent; anything above it cannot be stored and, if present, folds (aliases) down onto a lower frequency and contaminates it. There is no recovering from aliasing after the fact: once a high frequency has folded onto a low one, the two are added together and can never be told apart.

In the frequency domain the mechanism is vivid. Sampling on a grid is multiplying the signal by a comb of spikes, and multiplying by a comb in space replicates the spectrum — it pastes shifted copies of the signal's spectrum at every multiple of the sampling rate (Figure 13). If the original spectrum is narrow enough (band-limited below Nyquist), the copies sit side by side without touching and the original can be isolated. If it is too wide, the copies overlap, the tails fold back over the genuine low frequencies, and that overlap is aliasing. Seen this way the fix is obvious: band-limit the signal before you sample it. Run a low-pass filter — a blur — first, to delete the frequencies above Nyquist that would otherwise fold, then sample. This pre-filter (anti-aliasing) trades a little sharpness for the removal of false structure, and it is why a well-written image-shrinking routine blurs before it downsamples. The space-versus-frequency tradeoff returns: a perfectly sharp cutoff in frequency is a sinc spread out forever in space, so practical pre-filters compromise.

fig-aliasing — **Figure 3.7.13.** *Sampling a sine too coarsely → aliasing.* A high-frequency sinusoid (light) is sampled at too few points per cycle; the samples (dots) are consistent with a completely different, **lower**-frequency sinusoid (dark) — its alias. Below twice the signal's frequency (the Nyquist rate) the two are indistinguishable from the samples alone: the high frequency "folds" down and masquerades as a low one. This is the moiré / wagon-wheel effect.

fig-aliasing-photo — **Figure 3.7.14.** *The same aliasing on a real photo.* A building facade — a dense, near-periodic grid of windows, i.e. a high spatial frequency — sampled ×7 by simply dropping samples, with no pre-filter. The window frequency, above the coarse grid's Nyquist, folds back into bold, slow, curved **moiré** bands that are not in the building (center; zoom at the bottom). It is the one-dimensional sine of the previous figure acting in two dimensions on a real image — high frequency masquerading as low. The cure — pre-filter, then sample — is shown in the resampling chapter.

fig-sampling-comb — **Figure 3.7.15.** *Sampling as multiplication by a comb → spectral replicas.* Top: sampling in space = multiplying by a comb of evenly spaced spikes, which in frequency pastes shifted **copies** of the signal's spectrum at every multiple of the sampling rate. Middle: if the spectrum is band-limited below Nyquist, the copies do not touch and the original is recoverable. Bottom: if it is not, the copies **overlap** and high frequencies fold onto low ones (aliasing) — fixed by low-pass **pre-filtering** before downsampling.

Sidebar — aliasing without Fourier

You do not strictly need Fourier to feel why sampling fails. A pixel records the signal at a single point, not the average over its little footprint, so two scenes that happen to agree at every sample point produce identical images even if they differ wildly in between — a fast wiggle and a slow one can thread the same dots. The honest "what should a pixel be?" answer is the average over the pixel's area (the integral of the scene under the pixel's footprint), and computing that average is the pre-filtering blur described above, dressed in primal-domain clothes. Fourier merely turns this geometric coincidence into a precise statement about which frequencies survive.

Sidebar — aliasing on purpose

Everything so far has treated aliasing as a nuisance to be suppressed. But the same effect can be engineered to carry information. Print a fine micro-pattern, photograph it with a camera, and the camera's own pixel grid samples that pattern too coarsely — producing a moiré that is no longer in the print at all but emerges from the beat between the print's spatial frequency and the sensor's sampling grid. Because that beat pattern is exquisitely sensitive to the exact sampling geometry, it shifts dramatically with the slightest change in viewpoint — so the moiré effectively encodes the relative camera pose or motion, reading out information that the bare pixels never recorded. MIT's KinéCam dynamic prints exploit exactly this: a static printed micro-pattern that animates and reports camera motion entirely through engineered moiré (hcie.csail.mit.edu/research/kinecam). It is the same physics as the wagon wheel and the striped shirt, turned from a bug into a channel — and it rhymes with the structured / coded patterns that recur elsewhere in the book.

The deepest reconstruction filter, the one Nyquist's theorem secretly invokes, is the sinc: in the frequency domain it is a perfect box — keep everything below Nyquist, kill everything above, with a vertical wall in between. The sinc reconstructs a band-limited signal exactly. But that vertical frequency wall costs a kernel that ripples on forever in space (the space–frequency tradeoff again), and infinitely wide kernels are neither practical nor well-behaved — they ring. So real systems use approximations to the sinc with finite support (Figure 14), each striking its own bargain between losing a little mid-frequency contrast and letting a little high-frequency aliasing through. We will meet the practical menagerie — bilinear, bicubic, Lanczos — in the resampling chapter; here the point is only that the ideal exists, that it is unreachable, and that every real filter is a compromise away from it.

fig-sinc-vs-practical — **Figure 3.7.16.** *Ideal sinc vs a practical filter, in both domains.* Top row (space): the ideal **sinc** reconstruction kernel ripples on forever, while a practical filter has finite support. Bottom row (frequency): the sinc is a perfect box — flat passband, vertical cutoff at Nyquist, zero beyond — whereas the practical filter rolls off gradually, slightly softening mid-frequencies and leaking a little past Nyquist. The vertical wall in frequency is exactly what forces the infinite ripple in space.

3.7.9 Application preview: can we deblur, given the blur?⧉

We can finally settle the promise. Suppose we know the blur — the kernel $g$ the optics applied — and we are handed the blurred image; this is non-blind deconvolution (the harder blind case, where the kernel is unknown, waits for a later chapter). The Fourier view makes the question crisp. In the frequency domain, blurring multiplied each frequency of the sharp image by the kernel's eigenvalue: $\hat I_\text{blur} = \hat I_\text{sharp} \cdot \hat g$. To invert, just divide it back out:

$$ \hat I_\text{sharp} = \hat I_\text{blur} \,/\, \hat g. $$

This is inverse filtering, and on paper it is perfect — a blur diagonalized into per-frequency multiplications is undone by per-frequency divisions. So why is deblurring not a solved problem?

Because of the one number we said decides everything: how close the eigenvalue is to zero. A blur is a low-pass filter, so $\hat g$ is large at low frequencies but tiny — near zero — at exactly the high frequencies the blur destroyed. Dividing by a near-zero number is catastrophic. And real images carry noise: a little random grain the sensor added after the blur, so that what we actually observe is $\hat I_\text{blur} = \hat I_\text{sharp}\cdot\hat g + \hat n$. Divide that through and we get $\hat I_\text{sharp} + \hat n / \hat g$: the genuine signal comes back, but the noise term is divided by $\hat g$ as well. At a high frequency where $\hat g \approx 0.01$ the blur attenuated the signal a hundredfold — yet it never touched the noise, which was added afterward — so dividing by $0.01$ now amplifies that noise a hundredfold. The inverse filter faithfully reconstructs the signal and, in the same stroke, explodes the noise; the result is a sharp-looking image drowned in colorful high-frequency garbage (Figure 15). This is the central difficulty of deblurring, and the Fourier picture diagnoses it exactly: inverse filtering is unstable precisely where the blur was strong.

The way out is to stop dividing blindly. A Wiener filter asks, at each frequency, whether there is enough genuine signal there to be worth recovering: where the blur left healthy signal, divide it back; where the blur crushed the signal below the noise floor, do not bother — leave it attenuated rather than amplify pure noise. It is the optimal linear compromise between sharpness and noise, in effect inverse filtering with a per-frequency safety valve. This regularized view is the foundation; the full treatment — better priors than "minimize noise," gradient-domain and sparsity methods, even designing the lens so its blur has no near-zero frequencies to begin with (coded apertures) — waits for the deblurring chapter. The lesson to carry forward is the diagnosis: deblurring is a division in Fourier, and it amplifies noise wherever the blur's frequency response (its MTF) dips toward zero.

fig-deblur-preview — **Figure 3.7.17.** *Deblurring preview: why naive inversion fails.* Left, a sharp image. Center, the same image blurred (a box / Gaussian PSF) with a little sensor noise added. Right, naive inverse filtering — dividing each frequency by the kernel's spectrum $\hat g$: the image sharpens but high-frequency **noise explodes**, because dividing by the near-zero $\hat g$ at high frequencies amplifies the noise the blur never attenuated. Inset: the kernel's MTF, dipping toward zero at high frequencies — exactly where the division blows up. A Wiener / regularized filter tames this by not dividing where the signal is below the noise floor.

Big lessons of this chapter

The recurring principles from this chapter, gathered for review.

💡 Big lesson — diagonalize when you can

symbol	meaning (this chapter)	note
$I(x, y)$	grayscale image / pixel value	from Notations; color = per-channel
$\mathbf{x}\in\mathbb{R}^n$	the image flattened to a vector in $n = W\!\cdot\!H$ dimensions	new (the vector / inverse-problem view)
$*$	convolution $(I * g)(x) = \sum I(x')\,g(x-x')$	from the convolution chapter
$g$, $\delta$	a kernel / impulse response; the impulse	from Notations
$\mathcal{F}$, $\hat I$	the Fourier transform and a transformed signal, $\hat I = \mathcal{F}\{I\}$	from Notations; $\hat g$ = transform of the kernel
$e^{i\theta}$	complex exponential; $e^{i\theta} = \cos\theta + i\sin\theta$ (Euler)	new (carries a cosine + sine together)
$u$, $v$	frequency indices (1-D $u$; 2-D $(u, v)$)	from Notations
$\omega$	(angular) spatial frequency	from Notations
$\hat g(\omega)$	the convolution eigenvalue at frequency $\omega$ (= the kernel's spectrum)	new (named here)
$\text{MTF}$	modulation transfer function — $\lvert\hat g\rvert$, a system's per-frequency contrast	from Notations
Nyquist	half the sampling rate; the top representable frequency	new (named, not symbolized)
$M$, $M^{-1}$	the blur as a matrix, and its inverse (deblurring)	new (local to the vector / inverse-problem view)

3.7 Fourier🔗⧉

3.7.1 Images as vectors in a high-dimensional space🔗⧉

3.7.2 why Fourier🔗⧉

3.7.3 Definition: one coefficient per wave🔗⧉

3.7.4 Sines are the eigenvectors of convolution🔗⧉

3.7.5 The 2-pixel example🔗⧉

3.7.6 Reading an image's Fourier transform🔗⧉

3.7.7 The fine print: two limitations of Fourier🔗⧉

3.7.8 Sampling and aliasing🔗⧉

3.7.9 Application preview: can we deblur, given the blur?🔗⧉