3.0 BASIC IMAGE PROCESSING AND ISP
The first part of this book put a camera in front of the world and watched light turn into a grid of numbers. From here on, that grid is the world we work in. We have a photograph in memory, and the question is no longer where the photons came from but what we can do with the numbers they left behind. By the end of this part you should be able to build the core of a small photo editor — a mini Lightroom with exposure, contrast, color, sharpening — or the camera-side version of the same machinery, the image signal processor that turns raw sensor data into a finished JPEG.
A great deal lives in here, it turns out. Almost every operation you associate with a photo app or a camera is in this part: making a picture brighter or punchier, fixing its color, sharpening it, blurring it, scaling it up or down, cleaning up noise, saving it small enough to send. Underneath that variety sit a handful of ideas that recur the whole way through — and we will meet them in an order designed so each one earns the next. We start with the representation itself: what the array is, what its numbers mean, and the small conventions (coordinates, channels, edges) that quietly cause most beginner bugs (this chapter). Because image code is famously easy to get subtly wrong, we then spend a short chapter on how to develop, test, and debug it — the habits that save you later. With that in hand we look at point operations (a curve applied to every pixel: exposure, contrast, tone, histograms), then tone mapping for scenes whose range outruns the display, then neighborhood operations and convolution (a pixel computed from its neighbors: blur, sharpen, gradients). Convolution opens the door to Fourier, which gives us the language for sampling, aliasing, and resampling, and the pyramids that represent an image at many scales at once. We close with the pieces a real camera bolts together: file formats and compression (how a JPEG actually works), the metrics by which images are compared, denoising and demosaicking, and finally the image signal processor — the fixed pipeline from raw sensor data to a finished JPEG that ties every chapter of this part into one block diagram.
One thread runs through all of it, and it is worth stating up front because it will save you grief: the same array of numbers can mean different things, depending on how it is encoded. Do the physics — combining exposures, white balance, blurring — on linear values that are proportional to light; do the display and the compression on gamma-encoded values tuned to perception; and reach for log when the range is enormous. Knowing which of these a given operation wants is half of getting image processing right. This chapter raises that flag; the next chapters pin it down.
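To make the linear-versus-encoded distinction concrete, here is a minimal sketch of what goes wrong when a physical operation (averaging two pixels, as a blur does) is done on encoded values. It assumes the encoding is the standard sRGB transfer curve, which the text above does not specify; the function names `srgb_to_linear` and `linear_to_srgb` are illustrative, not taken from any library.

```python
import math

def srgb_to_linear(v):
    """Decode an sRGB-encoded value in [0, 1] to linear light (assumed encoding)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def linear_to_srgb(l):
    """Encode a linear-light value in [0, 1] with the sRGB transfer curve."""
    if l <= 0.0031308:
        return 12.92 * l
    return 1.055 * l ** (1 / 2.4) - 0.055

# Two neighboring pixels stored as encoded values: pure black and pure white.
a, b = 0.0, 1.0

# Averaging the encoded numbers directly, as a naive blur would.
naive = (a + b) / 2

# Averaging the light itself: decode, average, re-encode for display.
physical = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)

print(f"average of encoded values: {naive:.3f}")     # 0.500
print(f"average in linear light:   {physical:.3f}")  # 0.735

# Log is the natural unit when the range is large: each doubling of
# light is one photographic stop, so a 1024:1 range spans 10 stops.
print(f"stops in a 1024:1 range:   {math.log2(1024):.1f}")  # 10.0
```

The two averages differ noticeably: half the light of white displays at an encoded value of about 0.735, not 0.5, so a blur run on encoded values comes out too dark along edges. Under these assumptions, that gap is the reason to do the physics on linear values and leave the encoding for display and compression.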
Contents of this part
- 3.1 Image representation
- An image is an array
- Beyond the pixels: basic metadata and EXIF
- Float vs 8-bit vs more bits
- What the numbers actually mean: encoding
- What the numbers actually mean: color spaces
- Three kinds of operation
- How the array sits in memory
- Pixel coordinate conventions
- Alpha and extra channels
- Video and more dimensions
- Reading a pixel — and falling off the edge
- 3.2 Developing, testing and debugging
- 3.3 Point operations
- 3.4 Histograms
- 3.5 Tone mapping
- 3.6 Neighborhood operations and convolution
- Motivation: blur and sharpen
- Convolution 101
- The flip: where-from vs. where-to
- Properties: the impulse, normalization, symmetry, commutativity
- Nitty-gritty: finite images
- A blur zoo
- Separability
- Gradient and oriented filters
- Sharpening
- Non-linear sharpening
- Edge-preserving filtering: the nonlinear escape
- Where this goes next
- 3.7 Fourier
- Images as vectors in a high-dimensional space
- Why Fourier
- Definition: one coefficient per wave
- Sines are the eigenvectors of convolution
- The 2-pixel example
- Reading an image's Fourier transform
- The fine print: two limitations of Fourier
- Sampling and aliasing
- Application preview: can we deblur, given the blur?
- 3.8 Resampling
- Domain operations: moving pixels around
- Start with scaling up
- The naive idea, and why it leaves gaps
- Loop over the output, use the inverse transform
- Nearest neighbour
- Linear interpolation, in 1-D first
- Bilinear, in 2-D
- The convolution perspective
- Better kernels: bicubic and Lanczos
- Upsampling vs. downsampling: scale the kernel
- Prefiltering when the transform isn't a clean scale
- Where this goes next
- 3.9 Linear pyramids and wavelets
- Halfway between space and frequency
- The idea: process an image at many resolutions
- The Gaussian pyramid
- The Laplacian pyramid
- Reconstruction: an exact encoder and decoder
- Each level is a frequency band
- What the bands look like on a real image
- Honest limitations
- Cousins: wavelets and steerable pyramids
- What pyramids are good for
- Multiresolution as a recurring theme
- 3.10 Image metrics
- 3.11 Denoising
- 3.12 Demosaicking
- Quad-Bayer sensors: remosaic before demosaicking
- Reminder: the Bayer mosaic
- The task: full RGB at every pixel
- The naive approach: interpolate each channel on its own
- Why naive interpolation zippers: averaging across an edge
- Doing better: edge-directed interpolation
- The harder half: red and blue, and color fringing
- Green-based demosaicking: interpolate the color difference
- Classic (non-learning) demosaicking: the general strategy
- Related: the optical anti-aliasing filter
- Beyond hand-tuned: joint denoising and learned demosaicking
- Cross-reference: other ways to sense color
- Where this sits in the pipeline
- 3.13 Auto-exposure and auto white balance
- 3.14 File formats and compression
- 3.15 Recap: ISP and non-destructive editing