8.10 Non-photorealistic rendering
PS8 implements stroke-based painterly rendering. → Problem sets (appendix).
Everything up to here has tried to make images truer — to recover detail the optics blurred away, to denoise, to expose well, to undo haze. Non-photorealistic rendering (NPR) does the opposite, and on purpose. It throws information away to make a photograph more like a drawing or a painting — to communicate rather than reproduce. A medical illustration leaves out the distracting tissue and draws a bold line around the organ that matters; a courtroom sketch gives you the witness's posture and not the wood grain of the bench; a cartoon is flat color and a few black outlines. The value of each is in what was left out.
The surprising part, and the reason this chapter belongs in a computational-photography book rather than an art-history one, is that the tools are the same ones we built in EDGES MATTER. The operation that makes a good cartoon is to flatten the unimportant texture and to keep and darken the meaningful edges, and that is exactly the edge-preserving / base–detail decomposition (L4), used now to abstract instead of to enhance. The bilateral filter that we used to smooth noise while protecting edges is the very same filter that collapses a face into flat poster-like regions; the edge detector that we used to find structure is the one that inks it. So NPR is less a new field than a new intent placed on familiar machinery, which is also why it makes a fitting coda. We will run from stroke-based painterly rendering (painting the photo with brushes, the course p-set), through edge-preserving abstraction (bilateral + Difference-of-Gaussians), to example-based stylization, and end on the bridge into neural style, where the hand-coded rules give way to a learned prior.
8.10.1 What NPR is for, and the one idea
The goal is a depiction, not a reproduction: a painterly, sketch, or cartoon rendering that emphasizes structure and suppresses clutter. It sits closer to illustration than to photography, and — to repeat the point because it is the whole point — its value lies in what it omits. A photorealistic render is judged by how little you can tell it apart from a photograph; an NPR render is judged by how clearly it says one thing.
The one idea underneath all of it is structure-aware abstraction. Keep what the visual system actually uses to read a scene — the strong edges, the large-scale tone, the silhouette of the subject — and simplify everything else: fine texture, mid-tone noise, busy backgrounds. This is big lesson L13 — separate the subject and commit (first placed in Human factors and the art of photography) — executed not with a lens aperture or a choice of viewpoint but in pixels, and it is carried out with the L4 edge-preserving machinery. The two lessons meet here: deciding what carries meaning (L13) and having a filter that respects edges while smoothing within them (L4) are the two halves of every NPR algorithm.
Edge-preserving = affinity — and the same filter that enhances detail can be turned around to abstract it. A bilateral / base–detail decomposition smooths within a region but never across an edge, because it weights neighbours by how much they "belong together." Tone mapping used that to keep local contrast while compressing range; NPR uses the identical operation to destroy texture deliberately, collapsing busy regions into flat cartoon patches while leaving the meaningful edges standing. Same tool, opposite intent. (First placed in Bilateral filtering; here it is the engine of abstraction.)
The intuition-first version fits in a sentence: a cartoon is flat color regions plus bold outlines. The flat regions come from an edge-preserving smoother (one that refuses to blur across edges); the outlines come from an edge detector. That two-part recipe — smooth the insides, ink the borders — is very nearly the entire field, and the rest of this chapter is variations on it plus one older, stroke-by-stroke approach that gets at the same place from the painter's side.
8.10.2 Stroke-based / painterly rendering (and the brush p-set)
The oldest computational route to a painting is to actually paint — to repaint the photograph with discrete brush strokes rather than filter its pixels. The canonical method is Hertzmann's Painterly Rendering with Curved Brush Strokes of Multiple Sizes Hertzmann 1998, and its central move is coarse-to-fine. Start with a blank (or roughly toned) canvas and a heavily blurred version of the photo as the reference. Lay down big, rough strokes to match that blurry reference — broad daubs of approximate color. Then sharpen the reference by one pyramid level and add smaller strokes, but only where the current painting still disagrees with the reference. Repeat down to the finest level. The effect is that detail appears only where it is needed — an eye, a highlight, a busy edge — while flat areas like a sky or a cheek stay loose and impressionistic, painted once with a fat brush and never revisited. (Figure 1.)
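The coarse-to-fine loop can be sketched in a few lines. The version below is a minimal illustration under simplifying assumptions, not Hertzmann's implementation: strokes are plain round dabs on a greyscale image, each layer's reference blur is set to half the brush radius, and the names `gaussian_blur`, `paint_layer`, and `paint`, the brush radii, and the threshold value are all chosen here for illustration.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicating borders."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def paint_layer(canvas, reference, radius, threshold):
    """Paint round dabs of one brush size where the canvas is still wrong."""
    h, w = reference.shape
    yy, xx = np.mgrid[0:h, 0:w]
    error = np.abs(canvas - reference)       # measured once, before this layer
    strokes = 0
    for y0 in range(0, h, radius):           # one candidate stroke per grid cell
        for x0 in range(0, w, radius):
            cell = error[y0:y0 + radius, x0:x0 + radius]
            if cell.mean() > threshold:
                # Seed the dab at the worst pixel of the cell.
                dy, dx = np.unravel_index(np.argmax(cell), cell.shape)
                cy, cx = y0 + dy, x0 + dx
                dab = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
                canvas[dab] = reference[cy, cx]
                strokes += 1
    return strokes

def paint(photo, radii=(8, 4, 2), threshold=0.05):
    """Layered painter: big brushes first, smaller ones only where needed."""
    canvas = np.ones_like(photo)             # blank white canvas (assumption)
    counts = []
    for radius in radii:                     # coarse to fine
        reference = gaussian_blur(photo, 0.5 * radius)
        counts.append(paint_layer(canvas, reference, radius, threshold))
    return canvas, counts

# Toy "photo": a bright square of detail on a flat dark background.
photo = np.full((64, 64), 0.2)
photo[24:40, 24:40] = 0.9
painting, strokes_per_layer = paint(photo)
```

On this toy photo the coarsest layer covers the whole canvas, while each finer layer adds strokes only in the grid cells that still disagree with its sharper reference.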
What keeps this from looking like random daubs is the structure built into where and how the strokes go:
- Orientation. Strokes run perpendicular to the image gradient — that is, along the "grain" of the image, following the contour of a cheek or the flow of a sky rather than cutting across them. A stroke laid along an edge reinforces it; one laid across it would smear it. This is the gradient field of Poisson image editing (L9) used as a direction field for the brush.
- Size equals scale. Coarse strokes are placed against coarse pyramid levels, fine strokes against fine ones — a direct use of the Gaussian / image pyramid (Linear pyramids and wavelets). The pyramid does double duty: it supplies the blurred references and it sets each layer's stroke size.
- Curved strokes. Rather than a single dab, a stroke can be traced as a long curve that follows the orientation field, bending with the image's flow and stopping when it reaches an edge (where the gradient direction becomes unreliable or the color changes). Long curved strokes are what make the result read as confident brushwork rather than stippling.
- Error-driven placement. A new stroke is added only where the canvas-versus-reference error exceeds a threshold $T$. This is what makes the painting converge: keep adding strokes where it is still wrong, stop where it is already close enough — at the abstraction level the current pyramid layer represents.
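The orientation and curved-stroke rules above can be sketched as a short tracing routine. This is a simplified illustration on a greyscale reference rather than the published procedure: it leaves out refinements such as a curvature limit, and `trace_stroke`, `step`, and `max_len` are names assumed here.

```python
import numpy as np

def trace_stroke(reference, canvas, x0, y0, step=1.0, max_len=16):
    """Trace one curved stroke from (x0, y0) and return its control points.

    The stroke keeps the reference colour found at its seed, steps
    perpendicular to the local gradient, and stops when that colour fits
    the reference worse than the existing canvas does.
    """
    h, w = reference.shape
    gy, gx = np.gradient(reference)           # np.gradient returns d/drow first
    color = reference[y0, x0]
    x, y = float(x0), float(y0)
    points = [(x, y)]
    last_dx, last_dy = 0.0, 0.0
    for _ in range(max_len):
        iy, ix = int(round(y)), int(round(x))
        # Stop if the stroke colour is now a worse match than the canvas.
        if abs(reference[iy, ix] - color) > abs(reference[iy, ix] - canvas[iy, ix]):
            break
        mag = np.hypot(gx[iy, ix], gy[iy, ix])
        if mag < 1e-8:                        # flat area: no reliable direction
            break
        # Rotate the unit gradient by 90 degrees to run along the contour.
        dx, dy = -gy[iy, ix] / mag, gx[iy, ix] / mag
        if dx * last_dx + dy * last_dy < 0:   # keep heading, do not double back
            dx, dy = -dx, -dy
        x, y = x + step * dx, y + step * dy
        if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
            break
        points.append((x, y))
        last_dx, last_dy = dx, dy
    return points

# Toy reference: brightness rises left to right, so the gradient points
# along x and a stroke should run straight down a column.
h, w = 32, 32
reference = np.tile(np.linspace(0.0, 1.0, w), (h, 1))
canvas = np.zeros((h, w))
stroke = trace_stroke(reference, canvas, x0=10, y0=5, max_len=10)
```

In a full painter the returned points would be rendered as a thick spline in the seed colour; here they only show that the path follows the contour instead of crossing it.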
The crucial pedagogical point is that the parameters are the style. One algorithm, with a few knobs — the stroke size range, opacity, positional jitter, curvature limit, the blur radius of the references — produces an impressionist, a pointillist, or an expressionist look. Big strokes with high jitter and low opacity give a loose, Monet-ish wash; tiny round strokes with no curvature give pointillism; long high-curvature strokes give something more Van-Gogh-like. Nothing in the algorithm changes; only the brush does. This is the cleanest demonstration of "style as a small parameter set," and it is the conceptual ancestor of the learned methods at the end of the chapter, which replace this hand-chosen parameter vector with a statistic learned from an example.
This is the basis of the course brush p-set: implement the layered, gradient-following brush painter. Build the pyramid of blurred references; place oriented strokes coarse-to-fine, perpendicular to the local gradient, adding a stroke only where the error exceeds threshold; then explore a couple of brush/parameter styles on your own photograph. Use the photos-from-Fredo / sourced images for the worked example. The deliverable is one photo rendered in two or three distinct painterly styles from the same code — the parameters carrying all the difference. (Figure 4.)
8.10.3 Edge-preserving abstraction: bilateral + Difference-of-Gaussians
The filtering route reaches a cartoon directly, and in real time — fast enough for live video. The canonical pipeline is Winnemöller, Olsen and Gooch's Real-Time Video Abstraction Winnemöller et al. 2006, and it is the two-part recipe from above made literal:
- Flatten the regions with an (iterated) bilateral filter. The bilateral smooths within objects but stops at edges (L4), so a textured face or a noisy wall collapses into broad, flat, poster-like regions while the boundaries between them stay crisp. Run it two or three times for a stronger, more painterly flattening. Optionally quantize the result — snap luminance (or color) into a small number of bands — for the hard-stepped, posterized look of a screen-print.
- Draw the lines with a Difference-of-Gaussians. Blur the image at two nearby scales and subtract: $$ D = G_{\sigma_1} * I - G_{\sigma_2} * I, $$ which responds wherever intensity changes rapidly — i.e. at edges — and is near zero in flat regions. **Threshold** $D$ and you get **black ink edges**: a clean line drawing of the scene's structure. (The DoG is the classic band-pass / edge operator; with $\sigma_2$ a bit larger than $\sigma_1$ it approximates a Laplacian-of-Gaussian.)
- Composite. Lay the thresholded DoG lines on top of the flat, quantized color, and the result is a cartoon — flat regions, bold outlines — produced cheaply enough to run on each frame of a video. (Figure 2.)
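The three steps above can be sketched end to end on a greyscale image. This is a minimal illustration, not the published real-time pipeline: the bilateral filter is a brute-force loop, the threshold is hard rather than soft, ink is drawn only on the darker side of each edge (where $D < -\tau$), and the function names, the 1.6 scale ratio, and every parameter value are assumptions made for the example.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicating borders."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def bilateral(img, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: spatial weight times range weight."""
    radius = int(2 * sigma_s)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            weight = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s**2)
                            - (shifted - img) ** 2 / (2 * sigma_r**2))
            num += weight * shifted
            den += weight
    return num / den

def cartoon(img, passes=2, bands=4, sigma_e=1.0, tau=0.02):
    """Flatten, quantize, find DoG ink lines, and composite."""
    flat = img
    for _ in range(passes):                   # iterate for stronger flattening
        flat = bilateral(flat)
    # Snap luminance to the centre of one of `bands` equal-width bins.
    quantized = (np.clip(np.floor(flat * bands), 0, bands - 1) + 0.5) / bands
    dog = gaussian_blur(flat, sigma_e) - gaussian_blur(flat, 1.6 * sigma_e)
    ink = dog < -tau                          # dark side of each edge
    return np.where(ink, 0.0, quantized), ink

# Toy image: two regions with a fine checkerboard texture standing in for grain.
yy, xx = np.mgrid[0:32, 0:32]
img = np.where(xx < 16, 0.35, 0.85) + 0.02 * (2 * ((yy + xx) % 2) - 1)
toon, ink = cartoon(img)
```

On the toy image the texture disappears into two flat tones and the only ink is a thin line hugging the boundary between them.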
This pipeline is worth pausing on because it is EDGES MATTER stated outright. The bilateral step is literally the base/detail filter of Bilateral filtering (L4) — only here we keep the base (the flattened regions) and discard the detail (the texture we worked so hard to preserve elsewhere). The same affinity-weighted smoothing that protected detail in tone mapping is now used to annihilate it. Same tool, opposite intent — which is exactly the L4 recurrence boxed above.
There is one common refinement. Plain isotropic DoG, applied independently at every pixel, gives broken, noisy edges — short dashes and speckle, because nothing ties one edge pixel's response to its neighbour's. Kang, Lee and Chui's flow-based coherent line drawing Kang et al. 2007 fixes this by first computing a smooth edge-tangent flow — an orientation field that says, at each pixel, which way the local edge runs — and then running the DoG along that flow rather than isotropically. The result is long, clean, coherent lines, like a confident pen stroke instead of a scratchy pencil. It is the same idea as the painter's oriented, curved strokes, now applied to the edge operator: respect the local orientation and the marks join up. (Figure 3.)
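A rough feel for an edge-tangent orientation field can be had from a smoothed structure tensor. To be clear about the substitution: this is a stand-in technique chosen for brevity, not the iterative edge-tangent-flow construction of Kang et al., and `edge_tangent_field` and its `sigma` parameter are names assumed here.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicating borders."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def edge_tangent_field(img, sigma=2.0):
    """Return unit vectors (tx, ty) pointing along the local edge direction."""
    gy, gx = np.gradient(img)                 # np.gradient returns d/drow first
    # Smoothing the tensor entries (not the gradients) lets opposite-signed
    # gradients on the same edge reinforce instead of cancelling.
    jxx = gaussian_blur(gx * gx, sigma)
    jxy = gaussian_blur(gx * gy, sigma)
    jyy = gaussian_blur(gy * gy, sigma)
    theta = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)   # dominant gradient angle
    return -np.sin(theta), np.cos(theta)             # rotate by 90 degrees

# Toy image: a diagonal step edge along x + y = 32.
yy, xx = np.mgrid[0:32, 0:32]
img = (xx + yy >= 32).astype(float)
tx, ty = edge_tangent_field(img)
```

On the diagonal edge the recovered tangent points along the direction in which x + y stays constant, which is the direction a flow-guided DoG would follow when linking neighbouring responses.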
For video, the remaining problem is temporal coherence: filtered independently, the flat regions and ink lines flicker and crawl frame-to-frame, which is distracting and unnatural. The fix is to make the filtering motion-aware — propagate the abstraction along the optical flow so a region keeps its color and a line keeps its place as the scene moves — so the stylization is stable in time, not just within each frame.
8.10.4 Example-based stylization and the bridge to neural style
The methods so far hand-code what to keep: edges via a DoG, strokes via a gradient field, flat regions via a bilateral. Two final ideas point past that — first toward directing the abstraction by where attention goes, then toward learning the style from an example rather than coding it.
Eye-tracked abstraction (DeCarlo and Santella, Stylization and Abstraction of Photographs DeCarlo & Santella 2002) makes the abstraction non-uniform on purpose. They abstract the image at multiple levels of detail and then keep fine detail only where a viewer actually looked — measured with an eye-tracker — abstracting everything else away. The result reproduces, in the rendering, what the viewer's own visual system did: it foregrounds what drew attention and lets the rest dissolve. This is NPR as directing attention, and it is the most explicit possible link back to L13 — separate the subject and commit — with the "subject" defined by literal measured gaze.
Image Analogies Hertzmann et al. 2001 takes the example-based idea to its general form: a single hand-stylized example pair teaches an entire filter, the style inferred from one demonstration rather than coded by hand — the pre-deep ancestor of neural style, the hinge between the classical and learned worlds. The full treatment lives in Style transfer (and it is surveyed as a model in Deep learning); here it is enough to register it as that hinge.
That hinge is the bridge to neural style. Classical NPR, the whole chapter to this point, hand-codes what to keep: edges, strokes, flat regions, all chosen by the algorithm's designer. Neural style (Gatys, Ecker and Bethge Gatys et al. 2016) replaces those rules with deep-feature statistics: content is captured by the feature activations of a convolutional neural network (CNN), style by the correlations among those features (the Gram matrix), and an image is optimized to match both. Feed-forward stylization (Johnson, Alahi and Fei-Fei Johnson et al. 2016) then trains a network to do the same in a single pass, making it real-time. The goal is identical to the painter's, a depiction that keeps content and imposes a look, but the prior is now learned rather than designed (L8). The full treatment is in Style transfer and Deep learning; here it is enough to see the lineage.
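The two statistics are easy to state in code. The sketch below uses random numbers as a stand-in for one layer of CNN activations (no network is involved), and the normalisation by the number of spatial positions is one common convention assumed here; implementations differ in their constants. The names `gram_matrix`, `style_loss`, and `content_loss` are illustrative.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (C, H, W) feature stack."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(a, b):
    """Mean squared difference between the two Gram matrices."""
    return float(np.mean((gram_matrix(a) - gram_matrix(b)) ** 2))

def content_loss(a, b):
    """Mean squared difference between the activations themselves."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 6, 6))            # stand-in for real CNN activations
# Scramble the spatial positions identically in every channel.
perm = rng.permutation(6 * 6)
scrambled = feats.reshape(8, 36)[:, perm].reshape(8, 6, 6)
```

Scrambling the positions leaves the Gram matrix unchanged up to rounding, so the style loss stays near zero while the content loss does not. That is the sense in which the Gram matrix records which features co-occur but not where, which is what lets it stand for "style".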
A learned operator swaps a hand-designed prior for one learned from data. Classical NPR is a stack of hand-tuned rules — this bilateral radius, that DoG threshold, strokes oriented this way. Neural style keeps the same end goal (content kept, style imposed) but replaces those rules with statistics fit to data: deep-feature content and Gram-matrix style. The skeleton — abstract guided by structure — does not change; only the source of the prior does, from the designer's intuition to a dataset. (First placed in Machine learning; here it marks the classical → neural transition.)
The takeaway that ties the chapter together: across painterly strokes, bilateral + DoG cartoons, eye-tracked abstraction, and neural style, the constant is abstraction guided by structure — decide what carries meaning, keep and exaggerate it, simplify the rest. The methods differ only in how they decide and how they execute. NPR is, in the end, L4 + L13 turned into a rendering style: the edge-preserving machinery doing the simplifying, and the subject-separation instinct deciding what to spare.
8.10.5 Region-based stylization: stained glass, low-poly, mosaics
Strokes and edges are not the only handle on abstraction. A different family partitions the image into a few large regions and floods each with one flat color — abstraction by tessellation. Choose the tiling to respect structure and the result reads instantly: a Voronoi / centroidal tessellation gives the leaded panes of "stained glass by numbers"; a Delaunay low-poly triangulation seeded at corners and edges gives the faceted look of vector portraits; superpixels (SLIC) give organic patches. The dial is the number and shape of the cells — a few large ones read as a bold poster, many small ones approach the photograph. It is the same thesis as the rest of the chapter (L4, abstraction guided by structure), with "what to keep" carried by region boundaries rather than brush orientation; edge-aware seeding keeps tiles from straddling strong contours. It is kin to photomosaics (each tile chosen to match the local color) and, when the partition is found by graph-cut labelling, to the seam machinery of Seam optimization.
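Abstraction by tessellation reduces to two steps: assign every pixel to a cell, then flood each cell with its mean color. The sketch below does this for a Voronoi partition. It assumes seeds on a plain regular grid, so it does not include the edge-aware seeding described above, and `tessellate` and the toy image are illustrative.

```python
import numpy as np

def tessellate(img, seeds):
    """Flood each Voronoi cell of `seeds` with the mean color of its pixels.

    img   : (H, W, 3) float array
    seeds : (K, 2) array of (row, col) cell centres
    """
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Squared distance from every pixel to every seed, shape (H, W, K).
    d2 = (yy[..., None] - seeds[:, 0]) ** 2 + (xx[..., None] - seeds[:, 1]) ** 2
    labels = d2.argmin(axis=-1)               # nearest seed wins the pixel
    out = np.zeros_like(img)
    for k in range(len(seeds)):
        cell = labels == k
        if cell.any():
            out[cell] = img[cell].mean(axis=0)
    return out, labels

# Toy image: red ramps left to right, green ramps top to bottom.
h, w = 32, 48
yy, xx = np.mgrid[0:h, 0:w]
img = np.stack([xx / (w - 1), yy / (h - 1), np.full((h, w), 0.5)], axis=-1)
sy, sx = np.mgrid[4:h:8, 4:w:8]               # 4 x 6 grid of seeds
seeds = np.stack([sy.ravel(), sx.ravel()], axis=1)
flat, labels = tessellate(img, seeds)
```

The number of seeds is the abstraction dial: the output never has more distinct colors than cells, and because each cell keeps its own mean, the overall average color of the image is preserved.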
8.10.6 Artistic screening and halftoning
The oldest abstraction of all is reproducing a continuous tone with discrete marks (the print shop's problem), and it turns expressive once the marks themselves carry meaning. Ordinary halftoning trades resolution for tone with clustered dots, ordered dithering, or error diffusion (Floyd–Steinberg pushes each pixel's quantization error onto its not-yet-processed neighbours, giving the fine blue-noise stipple typical of inkjet output; laser printers generally rely on clustered-dot screens instead). Artistic screening (Ostromoukhov & Hersch) replaces the regular dot screen with shaped, image-bearing screen elements, so the halftone microstructure spells a second picture, lettering, or texture: correct tone at arm's length, a deliberate pattern up close. Stippling, engraving/hatching, and ASCII-art are the same move with different mark vocabularies; here the abstraction lives in how tone is quantized into marks, not in strokes, edges, or regions. It works for the same perceptual reason chroma subsampling does: the eye spatially averages fine marks into tone (Human (and animal) vision and color); the screen, its frequency, and its angle are the style, and the CMYK screen angles that dodge moiré tie it back to Color technology.
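Error diffusion is short enough to write out in full. The sketch below uses the standard Floyd–Steinberg weights (7/16, 3/16, 5/16, 1/16) with a plain left-to-right raster scan; error that would fall outside the image is simply dropped, and the function name and test ramp are illustrative.

```python
import numpy as np

def floyd_steinberg(gray):
    """Binarize a greyscale image in [0, 1] by Floyd-Steinberg error diffusion."""
    work = gray.astype(float).copy()
    h, w = work.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            # Push the error onto pixels that have not been visited yet.
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return out

# A horizontal ramp from black to white.
gray = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
dots = floyd_steinberg(gray)
```

Every output pixel is pure black or white, yet the density of white dots tracks the ramp, so the average tone of the image is approximately preserved. Swapping the single-pixel dot for a shaped screen element is the step that turns this into artistic screening.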
Big lessons of this chapter
The recurring principles from this chapter, gathered for review.
Edge-preserving = affinity — and the same filter that enhances detail can be turned around to abstract it. A bilateral / base–detail decomposition smooths within a region but never across an edge, because it weights neighbours by how much they "belong together." Tone mapping used that to keep local contrast while compressing range; NPR uses the identical operation to destroy texture deliberately, collapsing busy regions into flat cartoon patches while leaving the meaningful edges standing. Same tool, opposite intent. (First placed in Bilateral filtering; here it is the engine of abstraction.)
A learned operator swaps a hand-designed prior for one learned from data. Classical NPR is a stack of hand-tuned rules — this bilateral radius, that DoG threshold, strokes oriented this way. Neural style keeps the same end goal (content kept, style imposed) but replaces those rules with statistics fit to data: deep-feature content and Gram-matrix style. The skeleton — abstract guided by structure — does not change; only the source of the prior does, from the designer's intuition to a dataset. (First placed in Machine learning; here it marks the classical → neural transition.)