
8.4 Dehazing

A landscape photographed across a few kilometers of hazy air comes back washed out: distant ridgelines fade to a flat, bluish-gray, contrast collapses with distance, and the colors drain away toward the brightness of the sky. The near foreground may look fine while the mountains behind it dissolve into milk. The instinct is to "boost contrast," but a global curve cannot fix it, because the damage is depth-dependent — the far ridge needs far more correction than the near tree, and a single curve has no way to know which pixel is which.

Haze is not blur. The lens is in perfect focus; nothing is smeared. What has happened is scattering: between the scene and the camera, tiny particles (water droplets, dust, aerosols) both attenuate the light traveling from each surface and add their own glow of scattered ambient light — the airlight — and the deeper the scene point, the more of both. So the recipe of the previous chapters returns in a new costume. We will write down the physical image-formation model, notice that it has too many unknowns to invert per pixel, and then watch one clever statistical observation — the dark-channel prior — pin down the missing quantity and turn an unsolvable model into a clean recovery. The chapter ends one level up, on the tooling that lets us optimize such pipelines at all.

8.4.1 Dehazing as a prior-driven inverse problem

The shape of the problem is the thesis of this entire part. We have an under-determined image-formation model, and we add one statistical prior that makes it solvable. That is exactly the move of blind deblurring (see Blind deblurring), even though the physics could not be more different: atmospheric scattering here, an unknown convolution kernel there. Seeing the same skeleton under two unrelated bodies is the point.

The physical forward model. Each observed pixel is a blend of two things: the light that actually made it from the scene surface to the camera, dimmed by the haze, plus the haze's own scattered glow. Writing $I$ for the observed (hazy) image, $J$ for the clear scene radiance we want, $A$ for the airlight (the bright, roughly constant atmospheric light, essentially the color of the haze far away), and $t$ for the transmission (the fraction of $J$'s light that survives the trip to the camera, between $0$ and $1$):

$$ I(x) = J(x)\,t(x) + A\,\big(1 - t(x)\big). $$

Read it back in words: each pixel is the clear radiance attenuated by the transmission, blended with the airlight in proportion to how much was lost. The first term, $J\,t$, is direct attenuation — the scene, faded. The second term, $A(1-t)$, is the airlight — the haze's glow, weighted by exactly the fraction $1-t$ of light that didn't survive. When $t = 1$ (no haze, the near foreground) the pixel is the clear radiance $J$; when $t \to 0$ (deep haze, the far ridge) the pixel washes out entirely to the airlight $A$. Haze is a depth-dependent fade toward $A$. Note the shape: the clear scene enters multiplicatively in the $J\cdot t$ core — radiance times a surviving fraction, the same contrast-is-a-ratio structure (L1) that recurs across imaging — and it is the airlight term $A(1-t)$, an additive offset, that makes the model affine rather than purely multiplicative, which is exactly why a single global tone curve cannot undo it.

The transmission is not arbitrary — it follows the physics of attenuation through a medium, $t(x) = e^{-\beta d(x)}$, where $d$ is the scene depth and $\beta$ the medium's scattering coefficient. This is the same exponential extinction (Beer–Lambert) law that governs any absorbing or scattering medium: light decays multiplicatively with distance. So transmission is really a stand-in for depth — recovering $t$ is implicitly recovering a depth map, which is why dehazing and depth estimation are quietly the same problem wearing different labels.
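The forward model and the exponential transmission law can be sketched together in a few lines of NumPy. This is a minimal illustration, not a calibrated simulator: the function name `add_haze`, the toy scene, and the values of `A` and `beta` are all assumptions made for the example.

```python
import numpy as np

def add_haze(J, depth, A, beta):
    """Apply I = J*t + A*(1 - t) with t = exp(-beta * depth).

    J:     clear image, float array of shape (H, W, 3) in [0, 1]
    depth: scene depth, shape (H, W), in units of 1/beta
    A:     airlight color, shape (3,)
    beta:  scattering coefficient (scalar, >= 0)
    """
    t = np.exp(-beta * depth)        # transmission per pixel, in (0, 1]
    t3 = t[..., None]                # broadcast over the color channels
    I = J * t3 + A * (1.0 - t3)      # direct attenuation + airlight
    return I, t

# Toy scene (hypothetical values): a uniform dark-green surface whose
# depth grows from 0 to 5 units, left to right.
J = np.ones((4, 5, 3)) * np.array([0.1, 0.4, 0.1])
depth = np.tile(np.linspace(0.0, 5.0, 5), (4, 1))
A = np.array([0.8, 0.85, 0.9])       # bluish-gray haze
I, t = add_haze(J, depth, A, beta=1.0)
```

In this toy scene the leftmost column (depth 0, so $t = 1$) is unchanged, while the rightmost column (depth 5, so $t = e^{-5}$) is within about 0.01 of the airlight color: the depth-dependent fade toward $A$ described above.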

Why it is under-determined. Count knowns and unknowns per pixel. We observe $I$: three numbers, the RGB of the hazy pixel. We want to recover $J$ (three numbers) and $t$ (one number), with a single global $A$ shared across the image. So at every pixel we have three equations and four unknowns (the three components of $J$ plus $t$), before we even fix $A$. The model is hopelessly under-determined as written: infinitely many (clear-scene, transmission) pairs reproduce the same hazy pixel. We cannot invert it from the data alone; we need a prior (Figure 8.4.1).
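The ambiguity is easy to see numerically. In the sketch below, the pixel value, the airlight, and the three candidate transmissions are made-up numbers chosen for illustration. Even with $A$ assumed known, each candidate $t$ yields a different, physically plausible clear radiance that reproduces the same hazy pixel exactly.

```python
import numpy as np

A = np.array([0.8, 0.85, 0.9])   # airlight, assumed known for this demo
I = np.array([0.5, 0.6, 0.6])    # one observed hazy pixel (made-up values)

def scene_for(t):
    """The clear radiance J that explains pixel I at transmission t."""
    return (I - A) / t + A

# Three different transmissions, three different scenes, one observation.
candidates = {t: scene_for(t) for t in (0.9, 0.6, 0.45)}
for t, J in candidates.items():
    I_again = J * t + A * (1.0 - t)          # push J back through the model
    print(f"t={t:.2f}  J={np.round(J, 3)}  reproduces I: {np.allclose(I_again, I)}")
```

Every candidate stays inside $[0, 1]$, so nothing in the data rules any of them out. Only outside knowledge about what clear scenes look like can pick one.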

This is another recurrence of L10 — the prior is not optional; see Super-resolution and image priors. Dehazing is the most vivid demonstration in the part: the model is literally unsolvable per pixel, and a single statistical observation about natural scenes is what makes it solvable.

The dark-channel prior. Here is the observation that breaks the deadlock, due to He, Sun and Tang (2009) and one of the more beautiful priors in imaging. Take any small window in a haze-free outdoor photo and look across all three color channels. In almost every such patch, some channel at some pixel is very nearly zero: there is a shadow, a dark or vividly colored surface, a sliver of a deeply saturated object. Purely colorful or shadowed regions are dark in at least one channel; only white or gray, well-lit surfaces are bright in all three, and those are rare in any given small patch of a real outdoor scene. Define the dark channel of an image as the minimum over all three color channels and over a local window $\Omega(x)$:

$$ J^{\text{dark}}(x) = \min_{c \in \{r,g,b\}} \ \min_{x' \in \Omega(x)} \ J_c(x'). $$

The empirical claim — verified on thousands of haze-free outdoor images — is that for a clear scene, $J^{\text{dark}}(x) \approx 0$ almost everywhere.
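The definition translates almost directly into NumPy: a minimum over channels followed by a sliding-window minimum. The sketch below is one possible implementation. The `radius` parameter, the edge-replicated border handling, and the two-color toy scene are assumptions made for the example rather than anything fixed by the definition.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(img, radius=7):
    """Min over color channels, then over a (2*radius+1)^2 window.

    img: float array of shape (H, W, 3). Borders use edge replication,
    an implementation choice the text does not specify.
    """
    min_rgb = img.min(axis=2)                        # min over c
    padded = np.pad(min_rgb, radius, mode="edge")
    k = 2 * radius + 1
    windows = sliding_window_view(padded, (k, k))    # shape (H, W, k, k)
    return windows.min(axis=(2, 3))                  # min over Omega(x)

# Toy clear scene: a red half and a green half, so blue is zero everywhere.
J = np.zeros((20, 20, 3))
J[:, :10, 0] = 0.9
J[:, 10:, 1] = 0.8

# Uniform haze with t = 0.5 and gray airlight A = 0.8.
t, A = 0.5, 0.8
I = J * t + A * (1.0 - t)

dark_clear = dark_channel(J, radius=2)   # zero everywhere
dark_hazy = dark_channel(I, radius=2)    # A * (1 - t) = 0.4 everywhere
```

The clear toy scene has a dark channel of exactly zero, and adding haze lifts it to $A(1-t)$. Real photographs only satisfy the prior approximately, so this is the idealized case.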

Now watch what haze does to that near-zero dark channel. The airlight term $A(1-t)$ adds a bright, positive amount to every channel, more where the haze is thicker. It lifts the dark channel off the floor. So the dark channel of the hazy image is no longer near zero: its brightness is a direct read-out of how much airlight has been added, which is to say a direct read-out of $1 - t$. How bright the hazy dark channel is $\approx$ how much haze $\approx 1 - t$. Working the algebra through the model (divide both sides by the airlight channel by channel, take the dark channel of both sides while treating $t$ as constant within the patch $\Omega(x)$, and use $J^{\text{dark}} \approx 0$) gives a clean estimate of the transmission directly from the hazy image:

$$ \tilde t(x) \approx 1 - \min_{c}\,\min_{x' \in \Omega(x)} \frac{I_c(x')}{A_c}. $$

One statistical regularity has supplied the one missing unknown. That is the whole trick (Figure 8.4.1).
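For readers who want the intermediate lines, the estimate follows in three steps, assuming (as He, Sun and Tang do) that the transmission takes a single value $\tilde t(x)$ throughout the patch $\Omega(x)$ and that every $A_c > 0$. First divide the haze model by the airlight, channel by channel:

$$ \frac{I_c(x')}{A_c} = t(x')\,\frac{J_c(x')}{A_c} + 1 - t(x'). $$

Then take the minimum over the patch and over the channels on both sides, pulling the constant $\tilde t(x)$ out of the minimum:

$$ \min_{c}\,\min_{x' \in \Omega(x)} \frac{I_c(x')}{A_c} = \tilde t(x)\,\min_{c}\,\min_{x' \in \Omega(x)} \frac{J_c(x')}{A_c} + 1 - \tilde t(x). $$

Finally, since $J^{\text{dark}}(x) \approx 0$ and $A_c$ is positive, the first term on the right is approximately zero, which leaves

$$ \tilde t(x) \approx 1 - \min_{c}\,\min_{x' \in \Omega(x)} \frac{I_c(x')}{A_c}. $$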

Figure 8.4.1. The dark-channel prior, end to end. (a) A hazy outdoor photo — distant scene washed out toward the sky's color. (b) Its dark channel $I^{\text{dark}}=\min_c\min_\Omega I_c$: near-zero (black) in the haze-free near field, bright in the hazy far field — directly imaging "how much haze." (c) The recovered transmission map $t(x)$ derived from it, bright (near 1) up close and dark (near 0) at depth — effectively a depth map. (d) The dehazed result $J=(I-A)/t+A$, contrast and color restored with distance, the haze peeled off layer by layer. The model was unsolvable per pixel; one statistical prior on the dark channel made the whole recovery possible.

The pipeline. With the prior in hand, dehazing is four steps:

  1. Estimate the airlight $A$. The haziest pixels are the ones whose dark channel is brightest — they are nearest to pure airlight. So pick the pixels with the highest dark-channel values (the top fraction of a percent), and read $A$ off the brightest of those in the original image. This is "find where the scene has essentially vanished into haze; that color is the haze."
  2. Estimate a coarse transmission $\tilde t(x)$ from the dark channel via the formula above. (In practice one keeps a small amount of haze in the far field — a factor like $1-\omega(1-\tilde t)$ with $\omega\!\approx\!0.95$ — because a perfectly dehazed distance looks unnaturally flat; some aerial perspective is a depth cue the eye expects.)
  3. Refine the transmission's edges. The coarse $\tilde t$ is computed over patches, so it is blocky and its boundaries do not line up with real object edges; it will halo around a sharp silhouette against the sky. The fix is to snap the transmission map to the image's own edges: solve for a refined $t$ that stays close to $\tilde t$ but follows the structure of $I$. The original paper of He et al. used soft image matting (a large, sparse, edge-aware linear solve, from the same matting Laplacian family as colorization in Blind deblurring); the follow-up guided image filter (He, Sun and Tang, 2010) does the same job far faster, using the hazy image itself as the guide that tells the filter where the edges are. Either way this is an edge-preserving / affinity step, exactly the machinery of Bilateral filtering, reused.
  4. Invert the model for $J$. With $A$ and a refined $t$ in hand, solve the haze model pixelwise for the clear radiance: $$ J(x) = \frac{I(x) - A}{\max\!\big(t(x),\,t_0\big)} + A. $$ The clamp $t_0$ (a small floor, $\approx 0.1$) keeps the division well-behaved where transmission is tiny: in the deepest haze almost no signal survived, so dividing by a near-zero $t$ would catastrophically amplify whatever noise is there — the same "inversion amplifies noise" trap that haunted deblurring. The floor caps that amplification, accepting a little residual haze in the far distance rather than a face full of noise.
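The four steps can be strung together in a compact NumPy sketch. It is an illustration under stated assumptions, not a reference implementation:

- All function names, the default radii, and `eps` are hypothetical.
- Brightness in step 1 is measured as the channel sum.
- Borders are edge-replicated.
- Step 3 uses a grayscale-guide guided filter built from plain box means. This is simpler than the color-guide version and far slower than a production box filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_reduce(x, radius, reduce):
    """Apply `reduce` (np.min or np.mean) over (2*radius+1)^2 windows of a 2-D array."""
    padded = np.pad(x, radius, mode="edge")
    k = 2 * radius + 1
    return reduce(sliding_window_view(padded, (k, k)), axis=(2, 3))

def dark_channel(img, radius):
    return window_reduce(img.min(axis=2), radius, np.min)

def estimate_airlight(I, radius, top=0.001):
    """Step 1: among the top `top` fraction of dark-channel pixels,
    return the color of the brightest one in the original image."""
    dark = dark_channel(I, radius).ravel()
    n = max(1, int(top * dark.size))
    idx = np.argsort(dark)[-n:]                  # haziest candidates
    pixels = I.reshape(-1, 3)[idx]
    return pixels[pixels.sum(axis=1).argmax()]   # brightest candidate

def estimate_transmission(I, A, radius, omega=0.95):
    """Step 2: coarse transmission; omega < 1 keeps a little haze."""
    return 1.0 - omega * dark_channel(I / A, radius)

def guided_filter(guide, src, radius, eps=1e-3):
    """Step 3: grayscale-guide guided filter built from box means."""
    mean = lambda x: window_reduce(x, radius, np.mean)
    mg, ms = mean(guide), mean(src)
    a = (mean(guide * src) - mg * ms) / (mean(guide * guide) - mg * mg + eps)
    b = ms - a * mg
    return mean(a) * guide + mean(b)

def recover(I, A, t, t0=0.1):
    """Step 4: invert the haze model with a floor t0 on the transmission."""
    t = np.maximum(t, t0)[..., None]
    return (I - A) / t + A

def dehaze(I, radius=7, refine_radius=20, omega=0.95, t0=0.1):
    """Run steps 1-4 on a float image I of shape (H, W, 3) in [0, 1]."""
    A = estimate_airlight(I, radius)
    t = estimate_transmission(I, A, radius, omega)
    t = np.clip(guided_filter(I.mean(axis=2), t, refine_radius), 0.0, 1.0)
    return recover(I, A, t, t0), t, A
```

Calling `J_hat, t, A = dehaze(I)` on a float RGB array returns the recovered image, the refined transmission map, and the estimated airlight. Keeping the stages as separate functions makes it easy to swap the refinement step, for example for soft matting, without touching the rest.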

The takeaway to land. The prior is doing all the work. The haze model has more unknowns than measurements and is unsolvable as stated; one well-chosen statistical regularity (the dark channel of a haze-free patch is dark) turns it into a clean, almost mechanical recovery. The data-fit (the haze model) plus a prior (the dark channel), with an edge-aware refinement to make the prior respect object boundaries: it is the same moral as the Wiener filter and as blind deblurring, in a third costume. And as everywhere in this part, the learned successor is waiting in the wings. Networks trained to regress transmission, or to map hazy to clear directly, replace the hand-built dark-channel prior with a learned one (see Deep learning), and the de-weathering of fog and rain in video is its motion-aware cousin (see Advanced computational photography).

8.4.2 Differentiable image pipelines and algorithm optimization (Halide)

Step back and look at everything the last several chapters built. Deblurring, super-resolution, colorization, dehazing — every one is an optimization with a prior, a pipeline of operations (take gradients, estimate a map, refine with an edge-aware filter, invert a model) with knobs at every stage: filter weights, the patch size $\Omega$, the airlight percentile, the floor $t_0$, the regularization strength. Who optimizes the pipeline itself? When a restoration or full camera ISP has dozens of such parameters — or whole stages you'd like to learn — you want to tune them by gradient descent against a quality loss, the way we train anything else. For that, two things have to be true of the pipeline: it must be fast (you'll run it thousands of times), and it must be differentiable (you need a gradient of the loss with respect to every knob).

Halide. Halide (Ragan-Kelley et al., 2013) is a language built on one decisive separation: it splits what an image pipeline computes (the algorithm, the math) from how it runs (the schedule, meaning the tiling, vectorization, parallelism, and fusion of stages across the memory hierarchy). You write the algorithm once, in clean mathematical form, and then separately describe a schedule; the compiler combines them. One algorithm admits many schedules, so you get portable, near-hand-tuned performance on CPU, GPU, and beyond without rewriting the math every time you re-target or re-optimize. This decoupling is why Halide became the substrate in which a great deal of production computational photography is actually implemented: the burst pipelines, the local tone operators, the filters of this very part.
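The separation can be imitated, very loosely, in plain NumPy. The sketch below is not Halide code and uses no Halide API; it is only an analogy, and the names `blur3`, `schedule_whole`, and `schedule_tiled` are invented for it. One function states the algorithm (a 3x3 box blur), and two others evaluate it in different orders, producing the same values.

```python
import numpy as np

def blur3(padded):
    """Algorithm: 3x3 box blur of the interior of an already padded array."""
    h, w = padded.shape[0] - 2, padded.shape[1] - 2
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def schedule_whole(img):
    """Schedule 1: evaluate the algorithm over the full image in one pass."""
    return blur3(np.pad(img, 1, mode="edge"))

def schedule_tiled(img, tile=8):
    """Schedule 2: evaluate the same algorithm tile by tile.

    Each tile reads a 1-pixel halo around itself, so tiles are independent
    and could run in parallel or stay resident in cache.
    """
    padded = np.pad(img, 1, mode="edge")
    out = np.empty(img.shape)
    for y in range(0, img.shape[0], tile):
        for x in range(0, img.shape[1], tile):
            y1 = min(y + tile, img.shape[0])
            x1 = min(x + tile, img.shape[1])
            out[y:y1, x:x1] = blur3(padded[y:y1 + 2, x:x1 + 2])
    return out

img = np.random.default_rng(0).uniform(0.0, 1.0, (20, 27))
same = np.allclose(schedule_whole(img), schedule_tiled(img, tile=8))
```

Here the two orderings are written by hand, and `blur3` never changes. In Halide the schedule is a separate declaration that the compiler applies to the algorithm, so changing the tiling does not mean rewriting any loops.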

Differentiable Halide. Li et al. (2018) add automatic differentiation through a Halide pipeline. Once you can backpropagate through the whole sequence of stages, two doors open at once. You can tune the pipeline's parameters by gradient descent — pose "find the dehazing/deblurring/ISP settings that minimize a quality loss on a dataset" as plain optimization through a fast, scheduled pipeline. And you can drop a classical pipeline in as a layer inside a neural network, so hand-built physics and learned components train end to end together. "Learn the filter" and "tune the algorithm" become the same gradient-descent loop.
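To make "tune the pipeline by gradient descent" concrete, here is a toy in plain NumPy, with the gradient derived by hand where a differentiable compiler would produce it automatically. Everything in it is an assumption made for illustration: the pipeline is just the model-inversion step, the transmission is taken as known, the only tunable knob is the airlight $A$, and the data are synthetic.

```python
import numpy as np

def recover(I, t, A):
    """One pipeline stage: J_hat = (I - A) / t + A."""
    return (I - A) / t[..., None] + A

def loss_and_grad(A, I, t, J_ref):
    """Mean squared error against a reference image, and its gradient in A.

    Per pixel, d J_hat / d A = 1 - 1/t, so the chain rule gives the sum below.
    """
    resid = recover(I, t, A) - J_ref
    dJ_dA = (1.0 - 1.0 / t)[..., None]
    loss = np.mean(resid ** 2)
    grad = 2.0 * np.sum(resid * dJ_dA, axis=(0, 1)) / resid.size
    return loss, grad

# Synthetic training pair: a random "clear" image hazed with a known airlight.
rng = np.random.default_rng(0)
J_ref = rng.uniform(0.0, 1.0, (16, 16, 3))
t = rng.uniform(0.4, 0.9, (16, 16))
A_true = np.array([0.8, 0.85, 0.9])
I = J_ref * t[..., None] + A_true * (1.0 - t[..., None])

A = np.array([0.5, 0.5, 0.5])          # deliberately wrong starting guess
for _ in range(100):
    loss, grad = loss_and_grad(A, I, t, J_ref)
    A = A - 1.0 * grad                 # plain gradient descent, step size 1.0
```

After the loop, `A` should sit very close to `A_true`, because the loss is quadratic in $A$ and this step size is small enough to converge on this data. With automatic differentiation the hand-written `dJ_dA` disappears, and the same loop can cover every knob in a multi-stage pipeline.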

This is why the topic closes the inverse-problem run rather than opening a new one. Differentiable, scheduled pipelines are the engine under the entire "data-fit + prior, minimized" program this part has been teaching, and they are precisely the boundary where classical optimization meets learned components. FlexISP (Heide et al., 2014) is the canonical example: a whole camera ISP posed as a single optimization problem with a denoiser as its prior, exactly the plug-and-play idea of Super-resolution and image priors realized as a real pipeline. The keep-it-conceptual caveat applies: this is the idea (separate algorithm from schedule; differentiability as the enabler of optimization and learning), not a Halide tutorial; for hands-on use, the language and its documentation are the place to go. The hand-off from here is natural: when the prior stops being hand-designed and starts being learned through one of these differentiable pipelines, we have walked from this part straight into Deep learning.