3.2 Developing, Testing and Debugging

📎 Problem set

PS0 sets up the toolchain, a minimal image class, and image I/O. → Problem sets (appendix).

Let me start with an admission that will save you some grief. Most of the time you spend writing image-processing code will not go into writing it. It will go into figuring out why the picture that came out looks wrong — too dark, shifted by a pixel, fringed with color, peppered with NaN, or, on a genuinely bad day, not coming out at all because the program crashed. Image code has a special talent for failing quietly. A bug in ordinary software throws an exception or prints a wrong number and you notice. A bug in a blur produces a perfectly valid image that is merely a touch too soft, or a hair too dark at the borders — and it sails right past you unless you are in the habit of looking. This chapter is about that habit, and the few others that go with it.

Nothing here is specific to fancy algorithms. It is the unglamorous craft of getting code to actually work, adapted to the awkward fact that your data is a few million numbers you cannot read by eye. The principles are common sense; the reason to spell them out is that under deadline pressure everyone quietly abandons them, and every lapse costs you an afternoon. We go through the workflow, then the handful of principles that do most of the work, then the single best trick — testing on inputs you can verify by hand — before turning to the per-algorithm recipes, the dreaded segmentation fault, and a word on writing this kind of code with a large language model (LLM) at your elbow.

3.2.1 The workflow: build, run, look at the picture

The mechanics will be familiar from any programming course. You write code in your favorite text editor or integrated development environment (IDE); you build, run, and repeat. What is different for images is the run step, because the thing you most need to inspect is not a variable sitting in a debugger — it is a picture.

So the one habit that anchors this entire chapter fits in a sentence: write your intermediate images to disk and open them. Saving a .png after each stage of your pipeline is the image-processing equivalent of a print statement — the "printf of image code." (We use PNG precisely because it is trivial to read and write; see Image representation.) When the final output is wrong, you do not squint at the source guessing which stage spoiled it. You open the intermediate images and see where things first went bad. The blurred image looks fine but the sharpened one is full of black specks? The bug is in sharpening, and you localized it in ten seconds without reading a line of code.

Saving to disk is the zero-setup default and it always works, so reach for it first. If you catch yourself dumping the same images over and over while tuning a parameter, it is worth wiring up a small display straight from your code — a window that pops the array on screen, or a tiny browser-based interface that lays a few intermediates side by side. That is a convenience, not a requirement. The discipline is looking at the picture; the tooling you look at it with is up to you.

Sidebar — make "show me an image" a one-liner

Before you debug anything, write the smallest possible helper that takes an array and puts it on screen or on disk, and make calling it effortless: one short name, no ceremony. The friction of inspecting an intermediate result is the single biggest predictor of whether you will actually do it. If dumping a debug image costs one line, you will do it constantly and find bugs in minutes. If it costs a five-line incantation every time, you will skip it and reason in your head instead — which, as the next section argues, is exactly the mistake.
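As a rough sketch of what such a helper could look like, here is one that writes a grayscale array to disk in a single call. Everything in it is an assumption made for illustration: the `Image` struct is a stand-in for whatever image class you build in PS0, `dump` is a made-up name, and it writes binary PGM rather than PNG only because PGM needs no library at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical minimal grayscale image: row-major floats, nominally in [0, 1].
struct Image {
    int width;
    int height;
    std::vector<float> pixels;  // size == width * height
};

// Write `img` to `path` as an 8-bit binary PGM (P5).
// Returns false if the file could not be opened or written.
bool dump(const Image& img, const std::string& path) {
    assert(img.width > 0 && img.height > 0);
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << "P5\n" << img.width << ' ' << img.height << "\n255\n";
    for (float v : img.pixels) {
        float c = (v >= 0.0f) ? v : 0.0f;  // negatives and NaN both map to 0
        if (c > 1.0f) c = 1.0f;            // clip overshoot to white
        out.put(static_cast<char>(static_cast<std::uint8_t>(c * 255.0f + 0.5f)));
    }
    out.flush();
    return out.good();
}
```

With that in place, inspecting a stage costs one line, for example `dump(blurred, "01_blurred.pgm");`. Note that this sketch silently clips out-of-range values and NaN so the file is always viewable; if you are hunting for exactly those values, check for them before the dump.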

3.2.2 Principles

A few principles sit underneath everything else. Written down they look obvious; they earn their place here because they are the first things to go when a deadline looms.

Doubt everything. Assume no piece of your code works until you have watched it work. Not the convolution you wrote yesterday, not the helper you lifted from your own earlier project, and especially not the part that is "too simple to be wrong." The most painful bugs hide in the piece you were so sure of that you never bothered to check it.

Test incrementally. Never write the whole program and then run it for the first time. If you implement five stages and only then hit run, a wrong final image could be any one of the five — or any interaction among them — and you are searching a haystack. Implement one stage, test it, convince yourself it is right, and only then build the next on top. Bugs are cheap to catch when there is exactly one new place they could be hiding.

Isolate, and binary-search the bug. When something is wrong in a long pipeline, do not read it top to bottom hoping the error jumps out. Cut the problem in half: look at the image halfway through. Already wrong there? The bug is upstream — check halfway through that half. Still fine? The bug is downstream. Each look halves the suspect region, so you pin a bug in a ten-stage pipeline in three or four checks instead of ten. This is plain divide-and-conquer, the move you would use on any system, applied to a chain of images.
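The halving argument can be written down as a short sketch. It assumes three things the text implies but does not state: the pipeline input is good, the final output is bad, and once an intermediate image is wrong every later one is wrong too. The names `first_bad_stage` and `looks_wrong` are hypothetical; in practice `looks_wrong(k)` is you, opening the image saved after stage `k`.

```cpp
#include <cassert>
#include <functional>

// Returns the 1-based index of the first stage whose output is wrong.
// looks_wrong(k) reports whether the image after k stages is wrong.
// Assumes stage 0 (the input) is good and stage num_stages (the output) is bad.
int first_bad_stage(int num_stages, const std::function<bool(int)>& looks_wrong) {
    assert(num_stages >= 1);
    int good = 0;           // image after `good` stages is known to be fine
    int bad = num_stages;   // image after `bad` stages is known to be wrong
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (looks_wrong(mid)) {
            bad = mid;      // bug is at or before mid
        } else {
            good = mid;     // bug is after mid
        }
    }
    return bad;             // the stage that turned a good image into a bad one
}
```

For a ten-stage pipeline this loop asks for at most four looks, whichever stage is at fault.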

Change one thing at a time. When you are fixing a bug or chasing an effect, vary a single thing between runs. Change three things at once, watch the output move, and you have no idea which change did it — and if two of them are bugs that partly cancel, you will be thoroughly lost. One knob per experiment.

Display, don't guess. This is the principle that swallows the others, and the one people resist hardest, because reading the code feels faster than running it. It is not. Your mental model of what the code does is precisely the thing that is wrong — that is why there is a bug. So stop reasoning about what the array probably holds and dump it. Look at the actual numbers, the actual picture. Do not stare at the code trying to guess how it behaves; display enough intermediate information to know.

There is a sharp little corollary to "display, don't guess" worth promoting to a habit of its own.

Sidebar — to trust a step, break it on purpose

Your code runs, the output looks plausible, and you want to confirm that a particular line is actually doing what you think. Deliberately break it — comment it out, zero its effect, flip its sign — and look again. If the picture changes the way you predicted, the line was doing its job and you have just proven it. If the picture doesn't change at all, that line was a no-op all along: dead code, a result you forgot to assign back, a parameter that never reached the function. A plausible-looking output is no evidence a step works; a predicted change when you break it is.

3.2.3 Test on inputs you can verify by hand

If you take one concrete technique away from this chapter, take this one. The reason image bugs are hard is that you cannot check a million-pixel output by inspection — so don't start there. Feed your code tiny synthetic inputs whose correct output you can compute in your head, and check that you get exactly that. A 3×3 or 5×5 image is plenty; you can read every number.

A small zoo of these inputs covers most needs, and each isolates a different kind of failure:

Figure 3.2.1. (Optional.) The four standard debug inputs, each a tiny grid you can read pixel by pixel: a constant (every pixel equal), an impulse (a single 1 in a field of 0s), an edge / half-plane (one straight black-to-white boundary), and a rectangle (a bright block on a dark field). Each isolates a different failure mode — constants catch normalization and edge-padding bugs, the impulse exposes a linear filter's kernel directly, the edge stresses behaviour at discontinuities, and the rectangle adds corners and two directions at once. The chapter stands alone without it; the montage just collects the set in one place.

The crucial part, and the part everyone skips: feed these inputs to your intermediate stages, not only to the final program. Running the whole pipeline on an impulse tells you that something is broken; running each stage on an impulse tells you which stage. The synthetic inputs and the binary-search habit are the same idea from two directions — small inputs make each stage checkable, and checking each stage localizes the bug.
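The four inputs are only a few lines each to generate. The sketch below is one possible version; the `Image` struct and the `make_*` names are placeholders invented here, with black as 0 and white as 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image, row-major.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;
};

// Every pixel equal to `value`.
Image make_constant(int w, int h, float value) {
    assert(w > 0 && h > 0);
    return Image{w, h, std::vector<float>(static_cast<std::size_t>(w) * h, value)};
}

// A single 1 at the centre pixel (w/2, h/2) in a field of 0s.
Image make_impulse(int w, int h) {
    Image img = make_constant(w, h, 0.0f);
    img.pixels[static_cast<std::size_t>(h / 2) * w + w / 2] = 1.0f;
    return img;
}

// Vertical edge: columns x < w/2 are 0, columns x >= w/2 are 1.
Image make_edge(int w, int h) {
    Image img = make_constant(w, h, 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = w / 2; x < w; ++x)
            img.pixels[static_cast<std::size_t>(y) * w + x] = 1.0f;
    return img;
}

// Bright block covering the half-open range [x0, x1) x [y0, y1) on a dark field.
Image make_rect(int w, int h, int x0, int y0, int x1, int y1) {
    assert(0 <= x0 && x0 <= x1 && x1 <= w);
    assert(0 <= y0 && y0 <= y1 && y1 <= h);
    Image img = make_constant(w, h, 0.0f);
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            img.pixels[static_cast<std::size_t>(y) * w + x] = 1.0f;
    return img;
}
```

Keeping these generators next to your pipeline means any stage can be fed `make_impulse(5, 5)` directly, without the rest of the program running first.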

Why a constant is such a good first test

A surprising number of bugs announce themselves on a constant image, because a constant strips away every spatial effect and leaves only the value-handling exposed. If a uniform gray comes out darker — a classic symptom — your filter weights probably don't sum to 1 (the kernel isn't normalized), or your edge handling is padding with black and dragging the borders down. Either way you caught a real bug on an input you could verify in your head, before a single real photo was involved.
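To make the symptom concrete, here is a small sketch of a 3×3 box blur with a switch between two edge treatments. The function and type names are invented for illustration and this is not the blur developed later in the book. On a uniform gray, the clamped version returns the same gray everywhere, while the zero-padded version darkens the border.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image, row-major.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;
};

Image make_constant(int w, int h, float value) {
    return Image{w, h, std::vector<float>(static_cast<std::size_t>(w) * h, value)};
}

// 3x3 box blur. With clamp_edges == true, out-of-range neighbours repeat the
// nearest border pixel; with false they are treated as 0 (black padding).
Image box_blur3(const Image& in, bool clamp_edges) {
    Image out = make_constant(in.width, in.height, 0.0f);
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = x + dx;
                    int sy = y + dy;
                    bool inside = sx >= 0 && sx < in.width && sy >= 0 && sy < in.height;
                    if (!inside) {
                        if (!clamp_edges) continue;  // zero padding adds nothing
                        sx = std::min(std::max(sx, 0), in.width - 1);
                        sy = std::min(std::max(sy, 0), in.height - 1);
                    }
                    sum += in.pixels[static_cast<std::size_t>(sy) * in.width + sx];
                }
            }
            out.pixels[static_cast<std::size_t>(y) * in.width + x] = sum / 9.0f;
        }
    }
    return out;
}

// Largest absolute difference between any pixel and `expected`.
float max_deviation(const Image& img, float expected) {
    float worst = 0.0f;
    for (float v : img.pixels) worst = std::max(worst, std::fabs(v - expected));
    return worst;
}
```

Blurring `make_constant(5, 5, 0.5f)` with zero padding leaves the centre at 0.5 but drops each corner to 0.5 · 4/9 ≈ 0.22, because only four of the nine taps land inside the image. That is a number you can work out by hand and compare against.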

3.2.4 Per-algorithm debug recipes

The general inputs above specialize beautifully once you know what a particular algorithm is supposed to do: an impulse reads a convolution kernel straight off the output, a known sub-pixel shift validates image alignment, the two limiting cases of the range parameter bracket the bilateral filter, and a constant patch tests demosaicking for channel leakage. Rather than work those recipes here, we keep them where they belong: each one lives as a debug sidebar inside the algorithm's own chapter, next to the algorithm it tests.

Notice the shape they all share: pick an input whose correct output you can state in advance, then check you got exactly that. The method never changes; the algorithm only changes what "correct" looks like.
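As one worked instance of that shape, the sketch below runs a 3×3 convolution on a centred impulse and reads the kernel back out of the result. The `convolve3x3` function is a hypothetical illustration with zero padding, not the implementation from the convolution chapter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image, row-major.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;
};

// True convolution with a 3x3 kernel stored row-major in `k`
// (k[(dy + 1) * 3 + (dx + 1)] is the weight for offset (dx, dy)).
// Samples outside the image count as 0.
Image convolve3x3(const Image& in, const std::array<float, 9>& k) {
    assert(in.pixels.size() == static_cast<std::size_t>(in.width) * in.height);
    Image out{in.width, in.height, std::vector<float>(in.pixels.size(), 0.0f)};
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = x - dx;  // minus sign: convolution, not correlation
                    int sy = y - dy;
                    if (sx < 0 || sx >= in.width || sy < 0 || sy >= in.height) continue;
                    sum += k[(dy + 1) * 3 + (dx + 1)] *
                           in.pixels[static_cast<std::size_t>(sy) * in.width + sx];
                }
            }
            out.pixels[static_cast<std::size_t>(y) * in.width + x] = sum;
        }
    }
    return out;
}
```

Feed it a 5×5 impulse and the 3×3 neighbourhood around the centre of the output should be the kernel, entry for entry. Use a deliberately asymmetric kernel (say the values 1 through 9) for this test: if the loop had used `x + dx` and `y + dy` instead, the kernel would come back rotated by 180°, and a symmetric kernel would hide that.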

3.2.5 Crashes and bounds

Sometimes the program does not produce a wrong image — it dies. A segmentation fault, an index out of range, a process that simply vanishes. Image code crashes for a characteristic reason, and there is a characteristic fix.

The reason is almost always an array index gone out of bounds. You ask for a pixel that isn't there: x = -1 at the left edge, y = height at the bottom, a neighbour one step past the boundary in a filter loop. The maddening part is that the crash usually surfaces far from its cause. In C++ especially, an unchecked access just past the end of a vector is undefined behavior and may not crash on the spot: a stray read silently returns garbage, a stray write quietly corrupts adjacent memory, and the program topples over later, deep inside some innocent unrelated function, pointing your debugger at a line that is completely fine. So when you hit a segmentation fault, do not trust where it landed; suspect an index.

The fix is to make the bug surface at the moment it happens rather than later. When you suspect an indexing bug, print the offending index and the array's size right before the access, and — better — assert that every index lies in $[0, \text{size})$ before you use it. More generally, an assertion is a cheap way to pin down any property you believe should hold — that a constant came in constant, that the weights summed to 1, that a value stayed in range — so that the instant it stops being true the program halts and tells you where, instead of carrying a poisoned value downstream into a baffling final image. In a tight pixel loop you would not want these checks in your shipped, optimized build, so put them behind a debug flag: pay the cost while developing, compile it out for release. When an assertion fires it points at the exact access or invariant that broke, actual numbers in hand — no more chasing a crash that surfaced three functions away.
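A minimal sketch of both kinds of assertion follows, using the standard `assert` macro, which is compiled out whenever `NDEBUG` is defined (typically in release builds). The `Image` struct and the `kernel_is_normalized` helper are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image with a checked pixel accessor.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;  // row-major, size == width * height

    float& at(int x, int y) {
        // Check each coordinate separately. A check on the flat index alone
        // would miss x == width on any row but the last, because that index
        // still lands inside the vector (on the next row).
        assert(x >= 0 && x < width && "x out of range");
        assert(y >= 0 && y < height && "y out of range");
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Invariant check for filter weights: do they sum to 1 within `tol`?
bool kernel_is_normalized(const std::vector<float>& weights, float tol = 1e-5f) {
    float sum = 0.0f;
    for (float w : weights) sum += w;
    return std::fabs(sum - 1.0f) <= tol;
}
```

A filter would then begin with `assert(kernel_is_normalized(weights));` and read pixels only through `img.at(x, y)`. In a debug build the first bad index stops the program at the access itself; with `-DNDEBUG` the checks vanish from the inner loop.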

This is the same boundary problem we met when designing the safe pixel accessor in Image representation: a filter near the edge will ask for pixels past the boundary, and you must decide what that means (clamp, mirror, wrap, or zero). A bounds assertion and a well-defined edge policy are two sides of one coin — the assertion catches the accesses you forgot to handle, and the edge policy defines the answer for the ones you did.
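The four policies can be sketched as a single index-mapping function. This is an illustration under stated assumptions, not the accessor from Image representation: the names `Edge`, `resolve`, and `sample` are invented here, and "mirror" is taken to mean reflection that repeats the border pixel (so -1 maps to 0), which is only one of the conventions in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Edge { Clamp, Mirror, Wrap, Zero };

// Map a possibly out-of-range coordinate i onto [0, n).
// For Edge::Zero, returns -1 when i is out of range.
int resolve(int i, int n, Edge policy) {
    assert(n > 0);
    if (i >= 0 && i < n) return i;  // in range: every policy agrees
    if (policy == Edge::Clamp) return i < 0 ? 0 : n - 1;
    if (policy == Edge::Wrap) return ((i % n) + n) % n;
    if (policy == Edge::Mirror) {
        int period = 2 * n;
        int m = ((i % period) + period) % period;  // position within one period
        return m < n ? m : period - 1 - m;         // second half runs backwards
    }
    return -1;  // Edge::Zero
}

// Hypothetical minimal grayscale image, row-major.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;
};

// Read a pixel at any integer coordinate under the chosen edge policy.
float sample(const Image& img, int x, int y, Edge policy) {
    int sx = resolve(x, img.width, policy);
    int sy = resolve(y, img.height, policy);
    if (sx < 0 || sy < 0) return 0.0f;  // Zero policy, outside the image
    return img.pixels[static_cast<std::size_t>(sy) * img.width + sx];
}
```

Routing every neighbour read in a filter through one function like `sample` means the edge decision is made in exactly one place, and the constant-image test then checks that decision directly.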

When the failure is a compiler error, not a segmentation fault

Not every failure is a runtime crash. Early on, half of them are the compiler refusing to build at all, behind a wall of template noise that means nothing the first time you read it. The fastest move is rarely to reason it out from first principles: paste the error message into a search engine or an LLM and let the collective experience of everyone who hit the same message point you at the cause. A cryptic compiler error is almost always a known error, and treating it as a lookup rather than a riddle saves hours.

3.2.6 Vibe coding: writing image code with an LLM

You will increasingly write this code with a large language model — call it vibe coding: describing what you want and letting the model draft it. For image processing this is mostly a genuine help and occasionally a trap, and it pays to be clear-eyed about which is which.

The good news first. For the scaffolding around your algorithms — file input/output, reading and writing PNGs, setting up that little display helper, wiring an interface, the boilerplate of looping over pixels — an LLM is excellent and you should lean on it. That is exactly the code that is tedious, well-trodden, and easy to eyeball for correctness.

The catch is the part that matters most. For code that is numerically or perceptually subtle — filtering, resampling, color, antialiasing — generated code has a habit of looking completely plausible while being subtly wrong. The model will cheerfully hand you a resampler that doesn't prefilter and so aliases, an "antialiasing" routine that isn't, or a cross-fade that goes muddy in the middle because it blended in the wrong space. None of these throw an error. None of these look wrong in the source. They are only wrong in the picture — which is precisely the failure mode this whole chapter exists to catch. So the guidance is simple: keep the model on a short leash for the subtle numerical and perceptual code, where "looks plausible" is no evidence at all, and let it run free on the boilerplate.

What makes this unnerving is that an LLM, like all of today's AI, can be remarkable and trivially incompetent in the same sitting — and not always where you would expect. I watched this play out reproducing Kemelmacher-Shlizerman & Seitz's Photobios (the morphing slideshow that ages a person smoothly across a whole collection of their photos — see Many images and photo collections). The model handled the genuinely hard parts on its own: it registered the faces from facial landmarks, and it found a dynamic-programming path through the collection so consecutive frames matched. Then it repeatedly botched the easy part — a one-line cross-fade, frame = (1−t)·start + t·end, the linear blend you would write in your sleep — and it took something like five re-prompts to get that single line right. The lesson is not that the model is dumb; it plainly isn't. The lesson is that competence on the hard stuff is no guarantee on the easy stuff, so you review even the trivial parts. The line you would never bother to check is exactly the one it will quietly get wrong.

Which means the rule for trusting generated image code is the same as the rule for trusting your own: guilty until tested. Take the impulse, the constant, the edge, the rectangle, and run the generated function on them exactly as you would run your own. The LLM cannot tell you its resampler is correct; the impulse response can.
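Applied to the cross-fade from the Photobios story, the check takes only a few lines. The sketch below is one plausible way to write the blend, with the `Image` struct and the `cross_fade` name assumed for illustration. It blends whatever values it is handed; whether those values are in the right space for blending is a separate question that this test does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image, row-major.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;
};

// Linear cross-fade: frame = (1 - t) * start + t * end, with t in [0, 1].
Image cross_fade(const Image& start, const Image& end, float t) {
    assert(start.width == end.width && start.height == end.height);
    assert(start.pixels.size() == end.pixels.size());
    assert(t >= 0.0f && t <= 1.0f);
    Image out = start;
    for (std::size_t i = 0; i < out.pixels.size(); ++i) {
        out.pixels[i] = (1.0f - t) * start.pixels[i] + t * end.pixels[i];
    }
    return out;
}
```

Three hand-checkable cases pin it down: at t = 0 the result is exactly `start`, at t = 1 it is exactly `end`, and fading an all-black image into an all-white one at t = 0.5 gives 0.5 everywhere. A fourth is worth adding: fading a constant image into itself should return that constant for every t, which catches weights that do not sum to 1.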

Sidebar — vibe debugging, on a leash

The model is genuinely useful for debugging too, as long as you keep the verification in your own hands. Paste it a cryptic error and ask what causes that class of crash; ask it to propose a minimal reproduction of a bug; ask it where in a function to look for an off-by-one. All fair game. What you must not do is let it hand you a fix and apply it on faith — "this should fix it" carries exactly the same risk as the original generated code. Run the proposed fix against a hand-checkable input before you believe it. The LLM is a fast source of hypotheses; the impulse, the constant, and the edge remain your only source of proof.

💡 Big lesson

Trust nothing — your own code or a model's — until you have watched it produce the right answer on an input simple enough to check by hand. Image code is unusually easy to get almost right and unusually hard to notice when you haven't. The defense is not cleverness; it is a small set of cheap, repeatable habits — look at the picture, test one piece at a time on inputs you can verify, change one thing, assert your bounds. Build those habits now, on three-by-three images, and they will carry you through every algorithm in the rest of this book.

