3.5 Tone mapping
Stand in a doorway on a bright day and look out. Your eye takes in the dim hallway behind you and the sunlit street ahead — shadow detail and highlight detail at once, with no apparent effort. Now raise a camera and try to keep both. Expose for the street and the hallway falls to black; expose for the hallway and the street burns to white. The scene holds more range of light than the picture can. That gap — between the enormous range of light in the world and the modest range a screen or a print can reproduce — is what this chapter is about, and tone mapping is the name for bridging it: choosing the mapping from scene tones to display tones.
We have, in truth, already been tone mapping all along. Every curve from the previous chapter — exposure, contrast, levels, the freehand tone curve — is a mapping from an input tone to an output tone. This chapter is the natural payoff of those point operations: it asks which curve to choose when the explicit goal is to fit a high-contrast scene into a low-contrast display without losing the detail at either end. We give the quick intro here; the full treatment — how you capture a high dynamic range (HDR) image in the first place, and the edge-preserving filter the best local methods rest on — waits for the HDR chapter.
3.5.1 The dynamic-range problem
Begin with numbers, because the gap is wider than intuition admits. The dynamic range of a scene is the ratio between its brightest and its darkest light: its contrast, as distinct from its overall brightness (its exposure). The adapting human eye copes with luminances spanning roughly $10^{-6}$ to $10^{6}$ candela per square metre, and a single outdoor scene routinely runs $1:100{,}000$ from deep shadow to sunlit cloud. A camera sensor or a film negative captures perhaps two to three orders of magnitude before it clips at the top or sinks into noise at the bottom. The display is meaner still: a monitor or a glossy print manages a contrast of maybe $1:20$ to $1:500$, its blacks only tens to a few hundred times darker than its whites.
So there are really two problems, and it pays to keep them apart. Problem one is capture: recording $10^{5}$ of range with a sensor that holds $10^{3}$. The remedy is to take several exposures and merge them into one HDR image — the multiple-exposure story, deferred to the HDR chapter. Problem two is display: given an image that now holds a huge range of values (possibly far above 1), how do you squeeze it down to the $1:50$ your screen can show while keeping the detail visible at both ends? That second problem — display, not capture — is tone mapping, and it is the one we tackle here. The catch that makes it interesting is that you cannot simply rescale. Divide everything down to fit and the shadows collapse into one muddy black: the range survives, but the contrast that let you see the detail is gone.
You can watch the gap on a single real high-contrast scene (Figure 3.5.1). Take one capture and develop it three ways: expose for the highlights and the sky holds while the foreground crushes to black; expose for the shadows and the foreground reads while the sky and sun blow to flat white; the compromise in between satisfies neither end. No single exposure wins — which is exactly the gap tone mapping exists to close.
It is tempting to read all this as a defect to engineer away, but photographers have long used limited range deliberately. W. Eugene Smith spent days printing a single portrait, compressing tones so subject and background sit in a similar range — and the restraint is the point: balance, composition, a face that reads. Lighting does the same job at capture time (three-point lighting and fill flash exist partly to reduce a scene's dynamic range before it ever reaches the sensor). Tone mapping is not only damage control; it is a tonal-design tool with a long analog pedigree.
3.5.2 Two families: global and local
There are two fundamentally different ways to compress the range, and the split is the spine of this chapter (Figure 3.5.2).
A global tone map applies one curve to the whole image: a single function $L_\text{out} = f(L_\text{in})$, the same everywhere — precisely the point operations we already know. A gamma curve, a log curve, an S-curve, or the Reinhard operator below: pick one, apply it to every pixel. Global methods are simple, fast, and — crucially — they never produce halos (a failure we meet shortly), because a single monotone curve cannot conjure local artifacts. Their limitation is intrinsic: a global curve that compresses a huge range into a small one must flatten its slope somewhere, and a flattened slope means lost local contrast. Squeeze hard enough and the picture goes flat — the detail technically present but visually washed out.
A local tone map lets the curve vary spatially: it compresses differently in different regions, applying a gentler squeeze where the eye hunts for detail and a stronger one across the large-scale brightness differences (the bright window versus the dark room). The idea, developed properly below, is to split the image into a slowly-varying large-scale layer and a fine detail layer, compress only the large-scale layer, and put the detail back untouched. This keeps local contrast crisp even under enormous compression — but it needs a spatial, edge-aware blurring step, which is why local tone mapping must wait for the convolution and edge-preserving chapters, and why its naive version fails in a characteristic way.
3.5.3 Global tone mapping: one curve for everything
Start with the simplest honest attempt and watch it fail, because the failure names the fix. The crudest tone map is to clip: keep everything in $[0,1]$ and snap anything brighter to pure white. The sky goes uniformly white, every cloud erased. Marginally better is to rescale — divide the whole image by its maximum so the brightest pixel lands at 1 — but with a $10^{5}$ range that divisor is enormous, and everything but the brightest highlights is crushed to near-black. Neither works, because both are linear answers to a problem that is fundamentally multiplicative: we need to compress ratios, and what does that is a curve that bends — steep in the dark tones, flattening in the bright ones.
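The two failures are easy to reproduce on a handful of numbers. The sketch below is illustrative only: the `scene` values are hypothetical luminances spanning $1:100{,}000$, and the 8-bit quantization step is an assumption standing in for a real display.

```python
# Hypothetical scene luminances in arbitrary linear units, spanning 1:100,000.
scene = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

def clip(values):
    """Snap anything above 1 to pure white; leave the rest alone."""
    return [min(v, 1.0) for v in values]

def rescale(values):
    """Divide by the maximum so the brightest value lands at 1."""
    peak = max(values)
    return [v / peak for v in values]

def to_8bit(values):
    """Quantize display values in [0, 1] to 8-bit codes (an assumed display)."""
    return [round(255 * v) for v in values]

clipped = to_8bit(clip(scene))
rescaled = to_8bit(rescale(scene))
print("clip:   ", clipped)    # the three brightest levels all become 255
print("rescale:", rescaled)   # the three darkest levels all become 0
```

Clipping merges the distinct highlight levels 1, 10 and 100 into a single white; rescaling merges the three darkest levels into a single black. Each loses one end of the scene.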
The principled global curve is Reinhard's operator (Reinhard et al. 2002), derived from how photographic exposure actually behaves. It is one line:
$$L_\text{out} = \frac{L}{1 + L}$$
In words: take the scene luminance $L$ (which can run from 0 up to any large value) and send it through $L/(1+L)$. Read off the two ends. When $L$ is small, the denominator is near 1, so $L_\text{out} \approx L$: dark tones pass through almost unchanged, shadow contrast preserved. When $L$ is large, $L_\text{out} \approx 1$: the brightest highlights are gently drawn toward 1 but never quite reach it, so they roll off rather than clip. The entire infinite range $[0, \infty)$ maps smoothly into $[0, 1)$, with no hard clip anywhere (Figure 3.5.3). A common variant takes a white-point parameter $L_\text{white}$, the luminance you want mapped exactly to 1:
$$L_\text{out} = \frac{L\left(1 + L / L_\text{white}^{2}\right)}{1 + L}$$
Setting $L_\text{white} = \infty$ recovers the plain operator; a finite white point lets you place pure white deliberately (so a chosen highlight burns clean to 1) rather than only approaching it asymptotically — the same "decide where white lands" choice the levels white-point slider offered, and, as we will see, the Zone System's "place a tone in a zone." A detail worth copying: apply the curve to luminance only and rescale the R, G, B channels by the same ratio, so colors keep their hue and only the brightness is compressed (a per-channel curve would desaturate the highlights toward white).
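A minimal sketch of both forms, plus the luminance-only color handling, might look like the following. The function names are illustrative, the white-point expression is the form usually quoted from Reinhard et al. 2002, and the Rec. 709 luminance weights are an assumption (the text does not say which weights to use).

```python
import math

def reinhard(L):
    """Plain operator: maps [0, inf) into [0, 1)."""
    return L / (1.0 + L)

def reinhard_white(L, L_white):
    """White-point variant: L == L_white maps exactly to 1.

    With L_white = inf the extra term vanishes and this equals reinhard(L).
    """
    return L * (1.0 + L / (L_white * L_white)) / (1.0 + L)

def luminance(r, g, b):
    """Linear-light luminance; Rec. 709 weights are an assumption here."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def tone_map_rgb(rgb, L_white=math.inf):
    """Compress luminance only, then scale R, G, B by the same ratio."""
    r, g, b = rgb
    L_in = luminance(r, g, b)
    if L_in <= 0.0:
        return (0.0, 0.0, 0.0)
    scale = reinhard_white(L_in, L_white) / L_in
    return (r * scale, g * scale, b * scale)

# Hypothetical bright orange pixel in linear HDR units.
mapped = tone_map_rgb((8.0, 4.0, 2.0))
print(mapped)
```

Because all three channels share one scale factor, the channel ratios (and so the hue) are unchanged. One caveat of this sketch: the output luminance stays below 1, but an individual channel of a saturated color can still exceed 1, so a final clip or gamut step may be needed.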
That $L/(1+L)$ fits on one line is no accident, and it is worth pausing on why. The problem is multiplicative — we want to shrink ratios. Reinhard's curve does exactly that, and it reads cleanest in log space, where a ratio becomes a difference: there the operator is simply "pull the large log-values back toward the small ones," the additive shadow of a multiplicative squeeze. That is the additive-vs-multiplicative lesson in miniature; we return to it below.
Reinhard is the sensible default global map, and it helps to see the family it belongs to. A plain gamma curve and a log curve are also global tone maps — each a single monotone bend that compresses the highlights relative to the shadows. An S-curve (sigmoid) is the photographer's favorite, steep through the midtones and rolling off at both ends, the shape film's response has and the one a camera bakes into its default look. And the most general global map is just the freehand tone curve of the previous chapter — drag the control points, keep it monotone with a spline, and you can hand-draw any of these. Reinhard's contribution is to supply a principled, parameter-light member of that family.
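For comparison, the other members of the family can each be written in a line or two. These are sketches under stated assumptions: unlike Reinhard's operator they expect input already normalized to $[0, 1]$, and the parameter names and default values (`gamma`, `a`, `steepness`) are illustrative choices, not values from the text.

```python
import math

def gamma_curve(x, gamma=2.2):
    """Power law on x in [0, 1]; steep in the shadows, flatter in the highlights."""
    return x ** (1.0 / gamma)

def log_curve(x, a=100.0):
    """log(1 + a*x), normalized so that 0 -> 0 and 1 -> 1; a sets the bend."""
    return math.log1p(a * x) / math.log1p(a)

def s_curve(x, steepness=8.0):
    """Logistic sigmoid centred on 0.5, renormalized to hit 0 and 1 exactly."""
    def raw(t):
        return 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))
    lo, hi = raw(0.0), raw(1.0)
    return (raw(x) - lo) / (hi - lo)

for name, curve in [("gamma", gamma_curve), ("log", log_curve), ("s", s_curve)]:
    samples = [round(curve(i / 4), 3) for i in range(5)]
    print(name, samples)
```

All three are monotone and map 0 to 0 and 1 to 1; they differ only in where they spend their slope.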
But notice what every global curve shares, and what none can escape: at any given input value there is one slope. Flatten that slope to fit a huge range into a small one and you flatten it for every pixel at that tone at once, wherever it sits in the image and whatever surrounds it. That is the wall global methods hit, and the reason we turn to local ones. You can see the wall on a real photograph (Figure 3.5.4): squeeze an HDR scene hard enough with a global Reinhard curve and the result, though it fits, goes hazy and washed out, because the single curve has flattened local contrast across the compressed tones. A local tone map, developed below, fits the same range while keeping the fine contrast crisp; the figure shows that result now, even though the machinery comes later.
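The flattening can be made concrete for the plain Reinhard curve (assuming no white point). Its slope at input luminance $L$ is

$$\frac{d}{dL}\left(\frac{L}{1+L}\right) = \frac{1}{(1+L)^{2}}$$

so the slope is 1 at $L = 0$, falls to $1/100$ at $L = 9$, and to $1/10{,}000$ at $L = 99$. A small luminance step in a highlight near $L = 9$ is therefore rendered about a hundred times weaker than the same step in deep shadow, and nothing about the pixel's surroundings can change that.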
3.5.4 Histogram equalization and Ward's bound
One global curve deserves its own paragraph, because the image chooses it for you. Histogram equalization (met as an automatic point operation in the previous chapter) uses the image's own cumulative histogram — its cumulative distribution function (CDF) — as the tone curve: $L_\text{out} = \text{CDF}(L_\text{in})$. This spends output range in proportion to how many pixels actually live at each input level — crowded tones get spread apart, empty tones compressed — which is, in effect, an automatic, data-driven tone map that maximizes global contrast. Equalization is itself a tone map, then: the CDF read straight off as the lookup curve, now wearing its dynamic-range-compression hat. As a tone-mapping operator it is appealing because it adapts to the scene with no knob to turn.
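As a rough sketch, equalization on integer levels amounts to building the histogram, accumulating it into a CDF, and using that CDF as a lookup table. The function name and the sample `pixels` are hypothetical.

```python
def equalize(levels, n_bins=256):
    """Histogram equalization for integer levels in [0, n_bins)."""
    hist = [0] * n_bins
    for v in levels:
        hist[v] += 1
    # The running sum of the histogram, normalized to [0, 1], is the CDF.
    cdf, running, total = [], 0, len(levels)
    for count in hist:
        running += count
        cdf.append(running / total)
    # The CDF is used directly as the tone curve (a lookup table).
    return [cdf[v] for v in levels], cdf

# Hypothetical 8-bit levels: most pixels crowd into 100..103, plus two outliers.
pixels = [100] * 40 + [101] * 30 + [102] * 20 + [103] * 8 + [10, 250]
mapped, curve = equalize(pixels)
print(curve[100], curve[103])
```

The crowded levels 100 to 103, only three codes apart on input, end up spread across more than half of the output range, while the nearly empty stretches on either side are squeezed into a few percent.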
Its flaw, as before, is a lack of restraint. Where the histogram piles up, the CDF rises steeply, and equalization will stretch that region's contrast far past anything the scene contained — exaggerating contrast in busy regions and amplifying noise in flat ones (a blotchy sky). Greg Ward and colleagues (1997) fixed exactly this for HDR scenes with histogram-based tone mapping: equalize, but cap the local slope of the curve so contrast is never pushed beyond what the human eye will accept. The result is a perceptually-bounded equalization — it keeps the data-driven, automatic character but refuses to manufacture contrast that was never there. It is a clean instance of a recurring move in tone mapping: take a method that overdoes it, then constrain it with a fact about perception.
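The capping idea can be sketched by truncating over-full histogram bins before accumulating the CDF. This is a simplified stand-in, not the algorithm from the 1997 paper: the `max_slope` parameter and its definition (a multiple of the equal share each bin would get from a uniform histogram) are assumptions made for illustration, whereas the published method derives its ceiling from the display range and a model of human contrast sensitivity.

```python
def capped_equalization_curve(hist, max_slope=2.0, iters=50):
    """Equalization curve whose per-bin slope is capped.

    Slope is measured relative to the equal share 1/n of output that a
    uniform histogram would give each bin. Assumes a non-empty histogram
    and max_slope >= 1.
    """
    counts = [float(c) for c in hist]
    n = len(counts)
    for _ in range(iters):
        total = sum(counts)
        ceiling = max_slope * total / n
        if all(c <= ceiling + 1e-12 for c in counts):
            break
        # Truncate over-full bins; the total shrinks, so re-check next pass.
        counts = [min(c, ceiling) for c in counts]
    total = sum(counts)
    cdf, running = [], 0.0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf

# Hypothetical 8-bin histogram with one heavily crowded bin.
hist = [1, 1, 90, 1, 1, 1, 1, 4]
capped = capped_equalization_curve(hist, max_slope=2.0)
print([round(v, 3) for v in capped])
```

Plain equalization would hand the crowded bin 90% of the output range, a slope of about 7 relative to the equal share; the capped curve holds it to roughly 2 and returns the rest of the range to the other bins.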
3.5.5 Local tone mapping and the halo problem: why naive local fails
Now the local idea, and the trap. The plan is sound: an HDR image is really two things superimposed — a large-scale layer carrying the big brightness differences (the bright window, the dark corner) and a fine detail layer carrying texture and edges. The big range lives almost entirely in the large-scale layer. So: split them, compress only the large-scale layer hard, leave the detail layer alone, and recombine. The detail survives at full local contrast even as the overall range is crushed. This is the best-of-both decomposition that recurs throughout the book — separate a signal into meaningful parts and treat each appropriately.
The naive way to get the large-scale layer is to blur the image: a big Gaussian blur averages away the detail and leaves the slow variations. Compress that blurred version, add the detail back, and for smooth regions it works beautifully. The trouble is strong edges. A Gaussian blur does not respect edges — it averages straight across the boundary between the bright window and the dark wall, so the "large-scale" layer near that edge is a smeared ramp, too dark on the bright side and too light on the dark side. Compress this contaminated layer, add the detail back, and you get a visible bright or dark band hugging the edge: a halo (Figure 3.5.5). It is the same failure mode as naive unsharp masking, and it is the signature artifact of cheap local tone mapping — that telltale glow around the skyline in an over-processed HDR photo.
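The halo shows up even in one dimension. The sketch below runs the naive split on a hypothetical step edge; the values are treated as log-luminance, and the blur width and compression factor are arbitrary illustrative choices.

```python
import math

def gaussian_blur_1d(signal, sigma):
    """Blur with a normalized Gaussian; borders handled by clamping indices."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    norm = sum(weights)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)   # clamp at the borders
            acc += w * signal[j]
        out.append(acc / norm)
    return out

def local_tone_map(signal, blur, compression):
    """Split into large-scale + detail, compress only the large-scale layer."""
    base = blur(signal)
    detail = [s - b for s, b in zip(signal, base)]
    return [compression * b + d for b, d in zip(base, detail)]

# Hypothetical 1-D profile: dark wall (0) on the left, bright window (4) on the right.
profile = [0.0] * 40 + [4.0] * 40
result = local_tone_map(profile, lambda s: gaussian_blur_1d(s, sigma=6.0), 0.3)
print(round(min(result), 2), round(max(result), 2))
```

Far from the edge the two plateaus settle at 0 and 1.2, as intended. Next to the edge the output dips well below 0 on the dark side and rises well above 1.2 on the bright side: the smeared ramp in the blurred layer has been handed to the detail layer and added back at full strength, which is the dark and bright band of the halo.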
The diagnosis points straight at the cure: the problem is that the blur crosses edges, so we want a blur that does not blur across strong edges — one that averages within a region but stops at its boundary. That is precisely the edge-preserving filter, the bilateral filter, and it earns its own treatment after we have built up convolution. With it, the large-scale layer follows the edges instead of smearing over them, the halos vanish, and local tone mapping delivers on its promise: enormous compression with crisp local detail. We flag the fix here and develop it later; the point for now is why the naive version fails and what property the fix must have.
The halo is what you get when a blur treats two pixels as neighbours just because they sit close in space. The fix is to also ask whether they are close in value: pixels on the same side of an edge "belong together," pixels across it do not. Weighting a blur by value affinity as well as spatial distance is the whole idea of the bilateral filter, and it is the recurring trick behind edge-preserving operations throughout the book.
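A brute-force one-dimensional version of that weighting is short enough to show here. It is a sketch of the idea only (the full filter is developed later in the book); the function name, the two sigma parameters, and the test profile are all illustrative assumptions.

```python
import math

def bilateral_1d(signal, sigma_space, sigma_value):
    """Average neighbours weighted by spatial distance AND value difference."""
    radius = int(3 * sigma_space)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)   # clamp at the borders
            w_space = math.exp(-(k * k) / (2.0 * sigma_space ** 2))
            diff = signal[j] - signal[i]
            w_value = math.exp(-(diff * diff) / (2.0 * sigma_value ** 2))
            acc += w_space * w_value * signal[j]
            norm += w_space * w_value
        out.append(acc / norm)
    return out

# Hypothetical step edge in log-luminance: dark wall (0), bright window (4).
profile = [0.0] * 40 + [4.0] * 40
base = bilateral_1d(profile, sigma_space=6.0, sigma_value=0.4)
detail = [s - b for s, b in zip(profile, base)]
compressed = [0.3 * b + d for b, d in zip(base, detail)]
print(round(min(compressed), 3), round(max(compressed), 3))
```

Pixels across the edge differ by 4 in value, so their value weight is vanishingly small and the large-scale layer keeps the step sharp. After compression the output stays between the two plateau levels, 0 and 1.2, with no overshoot on either side.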
3.5.6 Why log space pays off
One choice runs beneath everything above, and it is the throughline of these chapters: which numerical space does the curve act in? For tone mapping the answer leans hard toward log, and the reason is clean. Dynamic-range compression is fundamentally multiplicative: we want to shrink ratios, pulling a $1:100{,}000$ ratio down to $1:50$, because contrast and perception are about ratios, not differences (the contrast between 1 and 2 is the same as between 100 and 200). In linear space, shrinking every ratio consistently is an awkward nonlinear operation (a power law). But take the logarithm and multiplication becomes addition: a ratio becomes a difference, and "shrink every log-ratio by a constant factor" becomes the trivial "scale all log-values toward a fixed anchor such as the mean." So we do the work in log luminance (the large-scale/detail split, the compression) and the arithmetic turns simple.
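The arithmetic for the $1:100{,}000$ to $1:50$ example can be worked directly. In this sketch the function names and the `anchor` parameter are hypothetical; the point is only that one scale factor applied to log-luminance does the whole job.

```python
import math

def log_compression_factor(scene_ratio, display_ratio):
    """Factor k that scales log-contrast so scene_ratio lands on display_ratio."""
    return math.log(display_ratio) / math.log(scene_ratio)

def compress_in_log(L, k, anchor=1.0):
    """Scale log-luminance toward log(anchor) by k; same as anchor*(L/anchor)**k."""
    return math.exp(math.log(anchor) + k * (math.log(L) - math.log(anchor)))

k = log_compression_factor(100_000, 50)   # roughly 0.34
dark, bright = 0.01, 1000.0               # hypothetical luminances, ratio 1:100,000
out_ratio = compress_in_log(bright, k) / compress_in_log(dark, k)
print(round(k, 4), round(out_ratio, 3))
```

Every input ratio is raised to the same power $k$, so equal scene ratios stay equal after compression: the step from 1 to 2 and the step from 100 to 200 come out identical.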
Log space helps the decomposition too. In the log domain the detail layer is literally $\text{detail} = \log(L) - \text{large-scale}$, a plain subtraction, and compressing the large-scale layer by a factor and adding the detail back is plain addition — the multiplicative range compression we actually want, expressed as the additive operations a computer and a Gaussian blur handle gracefully. This is the same reason exposure (a multiply) became an addition in log back in the point-operations chapter, and it generalizes into one of the book's recurring lessons.
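Written out, with $\ell = \log L$, a large-scale layer $B$, a detail layer $D = \ell - B$, and a compression factor $k$ between 0 and 1 (the symbols $\ell$, $B$, $D$ and $k$ are introduced here for illustration), the additive recipe and its multiplicative meaning sit side by side:

$$\ell_\text{out} = k\,B + D \quad\Longleftrightarrow\quad L_\text{out} = e^{kB + D} = \left(e^{B}\right)^{k} \cdot e^{D}$$

The large-scale luminance $e^{B}$ is raised to the power $k$, which shrinks its ratios, while the detail ratio $e^{D} = L / e^{B}$ is multiplied back unchanged.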
Ask whether an operation is fundamentally additive or multiplicative. Light, exposure, and dynamic range are all multiplicative (double the photons, double the radiance; contrast is a ratio). Multiplication is clumsy to manipulate directly — but in log space it becomes addition, which blurs, decomposes, and compresses cleanly. That is why so much of tone mapping (and HDR, and the bilateral split) lives in log: it turns the range we want to compress into a quantity we can simply add and subtract.
3.5.7 The analog ancestor: the Zone System
None of this is new with computers; it is the digital descendant of a craft photographers refined for a century, and the analog version is the best intuition pump for the digital one.
Ansel Adams formalized global tone mapping as the Zone System. The idea is to carve the range of light into discrete zones, each a stop apart (a factor of two in light), labeled 0 (pure black) through X (pure white), with Zone V the 18% mid-grey anchor. The photographer decides, at capture, which scene luminance to place in which zone — "I'll put the shadowed rock in Zone III so it keeps texture, and let the snow fall in Zone VIII" — then controls the mapping from scene to negative to print through exposure, film choice, development chemistry and timing, and paper grade. That is exactly a global tone curve, chosen deliberately: zones are intensity ranges, and the Zone System is a discipline for designing the curve that maps scene tones to print tones (Figure 3.5.6). It is also the analog cousin of Reinhard's white point — "place this tone here" is the darkroom version of "map this luminance to 1."
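Because zones sit one stop apart, placing one tone fixes where every other tone falls, and the bookkeeping is a base-2 logarithm. The sketch below is a simplification that ignores development and paper controls; the function name and the meter readings are hypothetical.

```python
import math

ZONE_NAMES = ["0", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]

def zone_of(luminance, placed_luminance, placed_zone=5):
    """Zone a luminance falls in, given one luminance placed in a chosen zone.

    Zones are one stop apart, so the offset is log2 of the luminance ratio.
    """
    stops = math.log2(luminance / placed_luminance)
    zone = round(placed_zone + stops)
    return min(max(zone, 0), 10)   # anything beyond the scale clips

# Hypothetical meter readings in arbitrary linear units.
rock, snow = 10.0, 320.0
# Place the shadowed rock in Zone III; the snow, five stops brighter, falls higher.
snow_zone = zone_of(snow, placed_luminance=rock, placed_zone=3)
print(ZONE_NAMES[snow_zone])   # prints VIII
```

With the rock placed in Zone III, snow that meters 32 times brighter lands five stops up, in Zone VIII, matching the example in the text.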
And local tone mapping has an even more direct analog ancestor, important enough to pull out on its own.
In the darkroom you hold back light from part of the print during exposure to make it brighter (dodging), or give it extra light to make it darker (burning) — a local, spatially varying adjustment, done by hand for every single print, often with cut-out cards and careful timing. Hold back light over a dark foreground and you lift it relative to the bright sky: you have compressed the scene's range locally, exactly what a digital local tone map automates. Adams and his peers even varied local contrast and sharpness this way, not just brightness. When you reach for a local tone map, or paint on a masked curves adjustment in a modern editor, you are doing in software what a master printer did by hand under the enlarger. This is the analog ancestor of the spatially-varying methods above — and, like them, it lives outside the reach of any single global curve.
Stills and motion picture developed their tonal tools on parallel tracks, and the vocabularies never fully merged. Video tends to be less radiometric — colorists work with lift/gamma/gain wheels rather than stops of exposure — yet film grading developed an extraordinarily rich craft of its own, including elaborate local and color manipulations. The underlying operations are the same tone curves and local adjustments; only the dialect differs. (See Video color grading in the point-operations chapter.)
3.5.8 Where this is going
That is the quick intro, and the takeaways carry straight into the rest of the book. Tone mapping is display-side range compression: fitting a high-contrast scene into a low-contrast medium. Global methods — Reinhard's $L/(1+L)$, gamma, log, S-curves, equalization, the Zone System — apply one monotone curve everywhere: simple and halo-free, but they trade away local contrast under heavy compression. Local methods vary the curve spatially by splitting the image into a large-scale layer and a detail layer and compressing only the former; they keep local detail crisp but demand an edge-preserving blur, because the naive Gaussian version produces halos at strong edges. And all of it is cleanest in log space, where multiplicative range compression becomes additive. The HDR chapter picks up every one of these threads — how to capture the HDR image to begin with, the bilateral filter that kills the halos, and the full local operator — once we have built the convolution and edge-preserving machinery they rest on.
Big lessons of this chapter
The recurring principles from this chapter, gathered for review.
The halo is what you get when a blur treats two pixels as neighbours just because they sit close in space. The fix is to also ask whether they are close in value: pixels on the same side of an edge "belong together," pixels across it do not. Weighting a blur by value affinity as well as spatial distance is the whole idea of the bilateral filter, and it is the recurring trick behind edge-preserving operations throughout the book.
Ask whether an operation is fundamentally additive or multiplicative. Light, exposure, and dynamic range are all multiplicative (double the photons, double the radiance; contrast is a ratio). Multiplication is clumsy to manipulate directly — but in log space it becomes addition, which blurs, decomposes, and compresses cleanly. That is why so much of tone mapping (and HDR, and the bilateral split) lives in log: it turns the range we want to compress into a quantity we can simply add and subtract.