
2.10 Photography and camera 101

We have built the physics of image formation: rays selected by a lens, projected by perspective, integrated over the aperture and the pixel and the exposure time, and counted as photons on a sensor. Now we sit behind a real camera and ask the practical questions. Which dial does what, and what does it cost? How does the camera decide the exposure on its own, and when does it get it wrong? How does autofocus find the focus plane, and how does a phone — with a sensor the size of a fingernail and no aperture ring at all — produce a picture that rivals a camera ten times its size? This chapter is the bridge from the abstractions of the previous chapters to the buttons under your thumb, and from there to the computational chapters that make a phone camera work.

The throughline is that every control on a camera is one of the previous chapters made physical. The aperture ring sets the $f/D$ ratio from the lens chapter; the shutter dial sets the exposure time $t$ from the sensor integral; the ISO setting scales the gain that the noise section described; the focus ring moves the image distance $v$ of the thin-lens equation. Learn the physics once and the camera stops being a box of mysterious modes.

2.10.1 Basic photography: exposure settings — shutter, aperture, and ISO

How bright a photo comes out is its exposure, and from the radiometry and sensor chapters we already have the equation: exposure is irradiance times time, and the irradiance the lens delivers scales with aperture area, so

$$ \text{exposure} \;\propto\; t \left(\frac{D}{f}\right)^2 \;=\; \frac{t}{N^2}. $$

Two optical controls set it. Shutter time $t$ enters linearly: twice the time, twice the light. Aperture enters through its area, which goes as the diameter squared $D^2$, i.e. as $1/N^2$ in the f-number $N = f/D$, so twice the f-number means a quarter the light. This is why photographers describe the aperture by the ratio $f/D$ rather than the raw diameter: a longer lens at the same f-number has a proportionally bigger opening, so it delivers the same light per unit of sensor area despite seeing a narrower field of view. The longer lens spreads a smaller patch of the world across the sensor (less light per pixel from the geometry) but compensates with a larger physical aperture (more light gathered), and the ratio $f/D$ is exactly the quantity that nets those two effects out. The ratio normalizes light-gathering across focal lengths, which is precisely what you want a brightness setting to do.
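The proportionality above is easy to check numerically. The following is a minimal sketch, not a camera model: the function names are illustrative, and the constant of proportionality (scene brightness, lens transmission) is deliberately left out.

```python
def relative_exposure(t, n):
    """Exposure up to a constant factor: t / N^2."""
    return t / n ** 2


def aperture_diameter_mm(focal_mm, n):
    """Physical opening D = f / N, from the definition N = f / D."""
    return focal_mm / n


base = relative_exposure(1 / 125, 4.0)

# Doubling the shutter time doubles the light.
ratio_double_time = relative_exposure(2 / 125, 4.0) / base

# Doubling the f-number quarters the light.
ratio_double_n = relative_exposure(1 / 125, 8.0) / base

# Same f-number on two focal lengths: the opening grows with the lens,
# while the exposure (which depends only on t and N) is unchanged.
d_short = aperture_diameter_mm(50, 2.0)    # 25 mm opening
d_long = aperture_diameter_mm(200, 2.0)    # 100 mm opening
```

Note that the focal length never appears in `relative_exposure`: that absence is the whole point of quoting apertures as f-numbers.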

A third control, ISO, is electronic rather than optical: it scales the gain applied to the sensor signal after capture. Raising ISO does not collect more light; it amplifies what was collected, brightening a dim exposure at the cost of also amplifying noise (the sensor chapter's read and shot noise). So the three legs of exposure are shutter, aperture, and ISO — two that change the light and one that changes the gain.

Each control carries a side effect you cannot escape, and each was derived elsewhere in this part. A longer shutter time risks motion blur of anything that moves during the exposure (covered fully in Motion/Video); a wider aperture (smaller $N$) shrinks the depth of field (DoF), the range of distances in acceptable focus (the lens chapter); and a higher ISO raises noise (the sensor chapter). Choosing an exposure is therefore never a single decision but a negotiation among brightness, motion, depth of field, and noise (Figures 2.10.1 and 2.10.2).

[figure fig-aperture-dof not built]
Figure 2.10.1. Aperture versus depth of field. The same subject shot at a wide aperture (small f-number, e.g. f/2) and a narrow one (large f-number, e.g. f/16). Wide open admits more light but yields a shallow depth of field — a melting, out-of-focus background that isolates the subject. Stopped down admits less light (forcing a longer shutter or higher ISO) but renders foreground-to-background sharp. The same f-number is both a brightness setting and a depth-of-field setting.
[figure fig-shutter-motion-blur not built]
Figure 2.10.2. Shutter speed versus motion blur. A moving subject (a waterfall, a runner) shot at a fast shutter (e.g. 1/1000 s) freezes the motion crisply; at a slow shutter (e.g. 1/4 s) the motion smears into blur — silky water, streaked lights. The slow shutter admits more light but records the subject's motion during the exposure. The same shutter time is both a brightness setting and a motion-rendering choice.

Stops and the power of two. Photographers do not count light in raw multiples; they count it in stops, where one stop is a factor of two in light. This is the natural unit because the eye and the medium respond to ratios, not differences (the same logarithmic perception behind gamma encoding, in Color technology). A stop is a log₂ unit: $\text{stops} = \log_2(\text{light ratio})$. As an exposure adjustment, one stop brighter is written $+1$ exposure value (EV); as a label for a camera setting, $\mathrm{EV} = \log_2(N^2/t)$, which rises by one for every stop less light the setting admits. This is why the classic f-number series looks so irregular: $f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16$. Because light $\propto$ aperture area $\propto D^2 \propto 1/N^2$, each step multiplies $N$ by $\sqrt{2}$, so the area, and with it the light, halves at every step (Figure 2.10.3). Shutter speeds ($1/30, 1/60, 1/125, \dots$ s) and ISO ($100, 200, 400, \dots$) step in the same power-of-two stops; cameras add ⅓-stop clicks for fine control. The point is that one stop of aperture, one stop of shutter, and one stop of ISO all change the brightness by the same factor: they are interchangeable currency, the exposure triangle (Figure 2.10.4).
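The $\sqrt{2}$ spacing and the EV formula can be reproduced in a few lines. This is a small sketch with illustrative function names; it uses the exact powers of $\sqrt{2}$, whereas the numbers engraved on lenses (5.6, 11) are conventional roundings.

```python
import math


def f_stop_series(count=8):
    """Full-stop f-numbers: N_k = sqrt(2)**k for k = 1..count."""
    return [math.sqrt(2) ** k for k in range(1, count + 1)]


def exposure_value(n, t):
    """EV = log2(N^2 / t), the setting label used in the text."""
    return math.log2(n ** 2 / t)


def stops_between(light_a, light_b):
    """Signed number of stops from light_a to light_b."""
    return math.log2(light_b / light_a)


series = f_stop_series()
rounded = [round(n, 1) for n in series]

# One stop wider aperture traded against one stop faster shutter:
# the exact f/8 at 1/125 s and the exact "f/5.6" at 1/250 s share one EV.
ev_a = exposure_value(series[5], 1 / 125)
ev_b = exposure_value(series[4], 1 / 250)
```

The two settings `ev_a` and `ev_b` admit the same light, which is the "interchangeable currency" idea in numeric form.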

💡 Big lesson (L7) — photographers measure light in stops

Photographers measure light in stops, a log₂ (doubling) scale: one stop is a factor of $2$ in light, and $\text{stops} = \log_2(\text{light ratio})$. Aperture (f-stops), shutter, ISO, and dynamic range are all counted in stops, so exposure is additive in log space: adding one stop here and subtracting one there keeps the total fixed, which is exactly what makes the three controls of the exposure triangle interchangeable. It is the same "ratios matter, work in log" principle we meet again in gamma encoding and tone mapping.

fig-fstop-progression
Figure 2.10.3. The f-stop progression as powers of two. The standard series f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16 drawn as aperture disks: each f-number is √2 times the previous, so the disk area — and the light admitted — halves at every step. The irregular-looking numbers are exactly the √2 spacing that makes each step one stop (a factor of two in light).
fig-exposure-triangle
Figure 2.10.4. The exposure triangle. Three controls set how bright a photo is, each in power-of-two stops (one stop = ×2 light = 1 EV): shutter time (longer = brighter, but motion blur), aperture / f-number (wider = brighter, but shallower depth of field), and ISO (higher = brighter, but more noise). A stop of one can be traded for a stop of another to keep brightness constant while changing the side effects.
fig-exposure-triangle-sim
Figure 2.10.5. Try it yourself (web edition). A live exposure-triangle simulator: two 3D subjects stand at different depths and walk in opposite directions while you set shutter, aperture, ISO, scene brightness, and sensor size. It renders the scene physically — supersampling over time for real motion blur, over the aperture for real depth of field, and through a photon-shot-plus-read-noise model for real ISO grain — and an "auto-set triangle" holds the exposure constant so you feel each control's trade-off (freeze vs blur, deep vs shallow focus, clean vs noisy) rather than just its brightness. In the printed edition this stands in as a screenshot.

Autoexposure and metering. How does the camera pick an exposure on its own? Its meter integrates the scene's light through the lens (TTL), as a center-weighted average, a narrow spot, or a smart evaluative/matrix reading, and then applies a mid-gray assumption: it assumes the average scene reflects about 18% of the light falling on it and sets the exposure so that this average renders as mid-gray. This works for ordinary scenes and fails predictably on high- and low-key ones (Figure 2.10.6). A snowfield is far brighter than 18% gray, so the meter, dragging it down to gray, underexposes: the snow comes out dingy, and you must dial in $+1$ or $+2$ EV to put it back. A black cat fills the frame with something far darker than gray, so the meter overexposes it to muddy gray, and you dial in $-$EV. Smart matrix metering (for example Nikon's 3D Color Matrix, learned from a database of tens of thousands of photographs) adds brightness, color, contrast, and distance cues to guess the photographer's intent, but metering remains, in essence, an open problem.
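The size of the meter's error follows directly from the mid-gray assumption. The sketch below models only a plain averaging meter; the scene reflectances (0.80 for snow, 0.04 for a black subject) are illustrative assumptions, not measured values.

```python
import math

MID_GRAY = 0.18  # average reflectance the meter assumes


def metering_error_stops(mean_reflectance, target=MID_GRAY):
    """Stops by which an averaging meter misplaces the scene.

    Positive: the scene is brighter than the assumption, the meter
    underexposes, and +EV compensation is needed.
    Negative: the scene is darker, the meter overexposes, and -EV is needed.
    """
    return math.log2(mean_reflectance / target)


snow = metering_error_stops(0.80)       # a bit over +2 stops
black_cat = metering_error_stops(0.04)  # a bit over -2 stops
ordinary = metering_error_stops(0.18)   # 0: the assumption holds
```

In practice one usually dials in somewhat less than the full figure for snow, so that the snow lands bright but short of clipping.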

Margin note — where does 18% come from?

18% is the reflectance of the standard photographic gray card, the meter's stand-in for the average reflectance of a "normal" scene. The key idea is that 18% is middle gray perceptually, not numerically: because lightness is roughly logarithmic (Weber–Fechner; cf. the Commission Internationale de l'Éclairage (CIE) lightness $L^*$, Color technology), the perceptual midpoint between a black (~2–3% reflectance) and a white (~90%) sits near their geometric mean $\sqrt{0.03 \cdot 0.9} \approx 0.16$, i.e. close to 18%, not 50%. It is Zone V of Ansel Adams's Zone System. (A wrinkle for forum arguments: light meters are actually calibrated to a luminance constant that, with typical lens flare, corresponds to ~12–13% reflectance, so the famous "18% card" and the meter's true target differ by about half a stop, since $\log_2(0.18/0.125) \approx 0.5$.)

[figure fig-metering-18-gray not built]
Figure 2.10.6. The 18% gray assumption and where it fails. The meter renders the average scene to mid-gray (~18%). For an ordinary scene this is right. For a snowfield (high-key), forcing the bright average to gray underexposes — the snow turns dull; the fix is to add +EV. For a black subject (low-key), forcing the dark average to gray overexposes; the fix is −EV. Matrix metering uses extra cues to guess intent.

Priority modes. Because shutter and aperture trade off, cameras offer priority modes: in aperture priority you set $N$ (and thus depth of field) and the camera picks the shutter $t$; in shutter priority you set $t$ (and thus motion blur) and the camera picks $N$. Aperture priority is the usual default, because depth of field is usually the creative variable. Each mode fails when no valid partner exists: ask for $f/1.4$ in blazing sun and the required shutter may be faster than the camera can go; ask for $1/1000$ s in a dim room and no aperture is wide enough. (The full mode dial gets its own section next.)
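Both priority modes solve $\mathrm{EV} = \log_2(N^2/t)$ for the one unknown, then check it against the hardware. The sketch below assumes a scene EV at base ISO and hypothetical limits (1/8000 s to 30 s, f/1.4 to f/22); real bodies and lenses differ.

```python
import math


def aperture_priority(scene_ev, n, t_fastest=1 / 8000, t_slowest=30.0):
    """You fix N; return the shutter time, or None if out of range."""
    t = n ** 2 / 2 ** scene_ev
    return t if t_fastest <= t <= t_slowest else None


def shutter_priority(scene_ev, t, n_widest=1.4, n_narrowest=22.0):
    """You fix t; return the f-number, or None if no aperture fits."""
    n = math.sqrt(t * 2 ** scene_ev)
    return n if n_widest <= n <= n_narrowest else None


# Blazing sun (about EV 15) at f/1.4 needs a shutter faster than 1/8000 s.
sun_wide_open = aperture_priority(15, 1.4)

# The same light at f/8 is comfortably inside the range (1/512 s).
sun_f8 = aperture_priority(15, 8.0)

# A dim room (about EV 5) at 1/1000 s would need an impossibly wide lens.
dim_fast = shutter_priority(5, 1 / 1000)
```

A `None` result corresponds to the blinking warning a camera shows when no valid partner setting exists.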

Expose to the right. A subtle but important habit, which the noise section justifies: expose to the right (ETTR) — push the exposure as bright as you can without clipping the highlights, so the histogram piles up toward the right edge (Figure 2.10.7). The reason is that noise lives in the shadows (the signal-to-noise ratio is worst where photons are fewest), so capturing more light everywhere, especially in the darks, improves quality; you pull the brightness back down later in software at no cost. And when you do need to brighten, raising the ISO in-camera beats brightening in software, because ISO gain amplifies the signal together with the early (pre-amplifier) noise but not the later read noise added on readout — so the signal gets a head start over part of the noise. ETTR and ISO are both the noise section's logic applied at capture time.
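Both claims in this paragraph can be seen in a toy noise budget. The model below is an assumption made for illustration: Poisson shot noise on the signal, a read-noise term added before the amplifier, and a second term added after it; the default noise values are arbitrary, not those of any real sensor.

```python
import math


def snr(signal_e, gain=1.0, pre_amp_noise_e=2.0, post_amp_noise=8.0):
    """Signal-to-noise ratio of one pixel under the toy model.

    signal_e         collected photoelectrons (shot noise = sqrt(signal_e))
    gain             analog ISO gain applied before readout
    pre_amp_noise_e  noise added before the amplifier (amplified by gain)
    post_amp_noise   noise added after the amplifier (not amplified)
    """
    out_signal = gain * signal_e
    variance = gain ** 2 * (signal_e + pre_amp_noise_e ** 2) + post_amp_noise ** 2
    return out_signal / math.sqrt(variance)


# ETTR: four times the light in a shadow pixel gives a cleaner result.
timid = snr(100)
ettr = snr(400)

# Same light, brightened 8x two ways. Software scaling multiplies signal
# and noise alike, so its SNR stays at `timid`; in-camera gain lifts the
# signal above the post-amplifier noise.
in_camera_gain = snr(100, gain=8.0)
software_gain = timid
```

The gap between `in_camera_gain` and `software_gain` shrinks as the post-amplifier noise term goes to zero, which is one way to describe a so-called ISO-invariant sensor.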

fig-ettr-histogram
Figure 2.10.7. Expose to the right (ETTR). Two captures of the same scene: (left) a timid exposure with the histogram bunched at the noise floor, later brightened in software — which amplifies shadow noise; (right) an exposure pushed as far right as possible without clipping the highlights, then pulled down in software — capturing more photons everywhere, especially in the noisy shadows. Brighter-but-not-clipped is the lower-noise capture.

2.10.2 Exposure modes, UI, and auto-ISO

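The mode dial looks like a menagerie, but underneath it is just a question of who picks which leg of the exposure triangle. The four classic modes, program, aperture-priority, shutter-priority, and manual (PASM), are exactly the four answers (Figure 2.10.8): in P the camera picks both aperture and shutter, and you may shift the pair; in A (Av) you fix the aperture and the camera picks the shutter; in S (Tv) you fix the shutter and the camera picks the aperture; in M you fix both.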

Beyond these sit full Auto and the scene modes (portrait, sports, night), which are presets that bias the program toward a guessed intent.

fig-mode-dial
Figure 2.10.8. The mode dial as the exposure triangle's "who decides." P / A(Av) / S(Tv) / M / Auto, annotated with which leg the photographer fixes in each: in P none (the camera picks the pair, you may shift it); in A the aperture (you choose DoF); in S the shutter (you choose motion); in M both; in Auto none, and the camera decides everything. Auto-ISO adds a floating fourth leg, freeing a second variable in A or S.

Auto-ISO is the modern fourth knob, and it changes the logic. In plain A or S the camera still has only one free leg, so a single setting can over- or under-expose when the light is extreme. Auto-ISO frees a second leg by letting ISO float: now you can hold both a chosen aperture and a minimum shutter speed, and let the ISO climb to make the exposure work. You set two limits: a maximum ISO (your noise budget) and a minimum shutter speed, often a "1/focal-length" rule (so a 200 mm lens won't drop below 1/200 s) or a multiple of it. This is why M + auto-ISO has become a popular combo: you nail aperture and shutter for the look you want, and let ISO be the automatic variable, with exposure compensation still steering it.
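The auto-ISO logic in aperture priority can be sketched as a three-way decision. This is a plausible policy written for illustration, not any manufacturer's algorithm: the function name, the base ISO of 100, the cap of 6400, and the choice to lengthen the shutter once the cap is reached are all assumptions.

```python
def auto_iso(scene_ev100, n, focal_mm, base_iso=100, max_iso=6400):
    """Return (shutter_seconds, iso) for a fixed aperture N.

    scene_ev100 is the metered EV at ISO 100; the minimum shutter speed
    follows the 1/focal-length rule.
    """
    t_min = 1.0 / focal_mm
    # Shutter that exposes correctly at base ISO.
    t_base = n ** 2 / 2 ** scene_ev100
    if t_base <= t_min:
        # Enough light: stay at base ISO with a fast shutter.
        return t_base, base_iso
    # Hold the minimum shutter speed and let ISO make up the difference.
    iso = base_iso * t_base / t_min
    if iso <= max_iso:
        return t_min, iso
    # Noise budget exhausted: pin ISO at the cap and slow the shutter.
    return t_base * base_iso / max_iso, max_iso


bright = auto_iso(15, 8.0, 200)   # base ISO, 1/512 s
indoor = auto_iso(8, 4.0, 200)    # 1/200 s held, ISO floats to about 1250
night = auto_iso(3, 4.0, 200)     # ISO capped at 6400, shutter drops to 1/32 s
```

In all three branches the product of shutter time and ISO is the same for a given scene and aperture, so the brightness is held while the side effects move.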

Exposure compensation (±EV) is the manual override for the meter's 18%-gray failure without leaving an auto mode: it biases the mid-gray target, so a snow scene gets $+$EV and a dark scene $-$EV. On a mirrorless body the dedicated dial and the live exposure meter make this immediate — you see the brightness change before you press the shutter, the subject of the UI and display section below.

2.10.3 Focus, autofocus, and depth of field

Focusing, from the thin-lens chapter, is nothing more than choosing the image distance $v$ — moving the lens so the subject's image lands exactly on the sensor. Doing that automatically, fast and accurately, is autofocus (AF), and there are two classic passive families plus a modern on-sensor synthesis (Figure 2.10.9).

Contrast-detection autofocus (CDAF) drives the lens until the image contrast (sharpness) is maximal. It is accurate — the peak of the sharpness curve is exactly best focus — but it has to hunt: from a single blurred frame it knows only that the image is soft, not which direction sharpens it, so it must dither past the peak to find it. Historically slow, it was the staple of early mirrorless bodies and compacts.
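The hunting behaviour falls out of a simple hill climb. The sketch below assumes a single-peaked sharpness curve and a hypothetical `sharpness(position)` callback standing in for "move the lens, capture a frame, measure contrast"; real systems add noise handling and smarter step schedules.

```python
def contrast_af(sharpness, pos, step=8.0, min_step=0.25):
    """Climb the sharpness curve; return (final_position, evaluations).

    Each evaluation costs one lens move plus one frame, which is why
    contrast detection is slow. The climb keeps stepping while sharpness
    improves, and reverses with a halved step once it overshoots.
    """
    best = sharpness(pos)
    evaluations = 1
    while abs(step) >= min_step:
        trial = sharpness(pos + step)
        evaluations += 1
        if trial > best:
            pos, best = pos + step, trial
        else:
            # Overshot the peak (or went the wrong way): turn around.
            step = -step / 2
    return pos, evaluations


def example_curve(x, peak=37.0):
    """Illustrative single-peaked sharpness curve with best focus at 37."""
    return 1.0 / (1.0 + (x - peak) ** 2)


focus_pos, frames_used = contrast_af(example_curve, 0.0)
```

The only way the loop learns where the peak is, is by stepping past it and seeing sharpness drop: that is the "hunt", and `frames_used` counts its cost.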

Phase-detection autofocus (PDAF) is cleverer. It looks at the scene through two opposite edges of the lens — two sub-apertures — giving two slightly shifted views of the same scene. The phase difference between them gives the direction and the amount of defocus in a single shot, so the lens drives straight to focus with no hunting (Figure 2.10.10). This is literally a tiny stereo measurement (the two sub-apertures are a micro baseline), and it is why the next chapter's stereo geometry shows up inside a single camera. The classic single-lens reflex (SLR) implemented PDAF with a dedicated AF sensor hidden under the reflex mirror.
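The "single shot gives sign and amount" property can be illustrated with a one-dimensional shift search between the two sub-aperture views. This is a sketch on synthetic intensity profiles; the sum-of-absolute-differences cost and the sign convention are choices made here, not a description of any camera's AF module.

```python
def phase_shift(left, right, max_shift=8):
    """Integer shift s that best aligns left[i] with right[i + s]."""
    n = len(left)
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)  # overlapping samples only
        cost = sum(abs(left[i] - right[i + s]) for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


# A small bright feature seen through one edge of the lens ...
left_view = [0] * 20 + [1, 3, 7, 3, 1] + [0] * 20
# ... and through the opposite edge, displaced one way or the other.
right_shifted_plus = [0] * 3 + left_view[:-3]
right_shifted_minus = left_view[2:] + [0] * 2

shift_a = phase_shift(left_view, right_shifted_plus)    # +3
shift_b = phase_shift(left_view, right_shifted_minus)   # -2
in_focus = phase_shift(left_view, left_view)            # 0
```

The sign of the result says which way to drive the lens and its magnitude says how far, with no second frame needed; converting the pixel shift into a lens move requires a calibration that is not modelled here.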

fig-af-families
Figure 2.10.9. The three autofocus families. CDAF climbs the contrast/sharpness curve — accurate but must hunt, since one frame gives no direction. PDAF splits the lens into two sub-apertures; the phase shift between the two views gives direction and distance in one shot (no hunt) — a tiny stereo measurement. On-sensor PDAF / dual-pixel puts that split inside the imaging sensor: each photosite is divided into halves that see opposite sub-apertures, giving phase detection at (nearly) every pixel.
[figure fig-pdaf-phase not built]
Figure 2.10.10. Phase-detection geometry. The two sub-aperture views of a scene point coincide when the image is in focus; when the subject is too near, the two views separate one way; when too far, they separate the other way. The sign and magnitude of the separation tell the lens exactly which way and how far to move — focus in one step, no hunting.

On-sensor PDAF, including dual-pixel AF, is the modern synthesis that mirrorless bodies use. Instead of a separate AF sensor, phase detection is moved onto the imaging sensor itself, either with scattered masked AF pixels or with dual-pixel designs that split every photosite into two halves that see opposite sub-apertures. The result is phase detection at nearly every pixel: the speed of PDAF with the coverage of CDAF, the best of both worlds. (Those same split photosites are a literal micro-stereo pair, which is how Google's Pixel computes the depth for its portrait-mode background blur.) For completeness, active AF, such as old Polaroid sonar (ultrasonic time-of-flight) and infrared rangefinding, works in the dark but is crude; it is distinct from an AF-assist lamp, which merely throws contrast onto the subject so passive AF can work.

Depth-from-defocus AF (Panasonic DFD) is a third route that wrings a one-step focus cue out of an ordinary contrast sensor — no phase pixels at all. The trick is to model how a particular lens blurs: knowing its point-spread function as a function of focus, the camera shoots two frames at slightly different focus and reads off, from how the blur changes between them, both the direction and the amount of defocus. That is a PDAF-like single-step estimate sitting on top of a plain contrast sensor — it still rides on CDAF for the final confirmation, and it needs a per-lens blur profile baked in (so it works only with the maker's own lenses). The full treatment of how defocus encodes depth is deferred to the Optics part (Focus).

Focus modes decide whether to lock or track: AF-S / one-shot locks focus once, for still subjects; AF-C / AI-Servo tracks continuously and even predicts motion, for action. Increasingly, what to focus on is itself automated: from single-point and zone selection, cameras have moved to subject-detection AF that finds and tracks a face → eye → animal / bird / vehicle with a deep-learning detector (→ Machine learning). Eye-AF is the headline modern feature for portraits — the camera locks the nearest eye and holds it as the subject moves. Two handling tricks round out the section: back-button focus decouples AF from the shutter button (focus with a thumb button, release with the shutter), and focus-and-recompose locks focus on the subject, then reframes — but beware the small focus-plane error this introduces at wide apertures and close range, where depth of field is razor-thin.

2.10.4 Lenses and focal length

Focal length sets the field of view — the geometry was derived in the Pinhole chapter, including the "35 mm" / crop-factor sidebar — so here we treat lenses as a photographer chooses them: what each focal length is for, and what fast/slow and prime/zoom buy you.

Primes versus zooms. A prime lens is one fixed focal length; for the same quality it is typically brighter (faster), sharper, smaller, and cheaper, and it "makes you think and move your feet." A zoom trades some of that for the convenience of many framings in one lens. A cheap fast prime — a 35 mm or 50 mm or 85 mm at f/1.8 — is some of the best value in photography: razor-sharp and bright for the price. Modern zooms are excellent, but the basic kit zoom is usually a camera's weakest optical link.

The focal-length families (quoted on full frame; remember the crop factor for smaller sensors) run from ultrawide through wide, normal, short telephoto, and telephoto to super-telephoto, each with its own typical use (Figure 2.10.11).

fig-focal-length-families
Figure 2.10.11. The focal-length families on one strip. Ultrawide / wide / normal / short-tele / tele / super-tele, each with a representative framing and use (landscape, reportage, everyday, portrait, sports/wildlife). Quoted in full-frame focal lengths; on a smaller sensor multiply by the crop factor for the equivalent field of view.
fig-dolly-zoom-sim
Figure 2.10.12. Try it yourself (web edition). The dolly zoom makes the focal-length-versus-distance distinction visceral: the camera physically dollies toward or away from the subject while the lens zooms to hold the subject exactly the same size — so the subject is pinned and only the background swells or recedes (the queasy "Vertigo" effect). Log-spaced focal-length (12–200 mm) and distance sliders are linked to preserve the framing and refuse to slide past either limit; pick a realistic environment and a subject — a head bust or one of several diverse 3D human figures. It shows directly that viewpoint, not focal length, sets the background (the compression of the Pinhole chapter). In the printed edition this stands in as a screenshot. The human 3D models are generated with Meshy AI.

Why portrait lenses flatter — it is the distance, not the lens. It is tempting to credit the "portrait" focal length itself for flattering a face, but the lens is only an accomplice. What actually shapes a face is the camera-to-subject distance, and the same focal-length-versus-distance distinction the Pinhole chapter drew for backgrounds governs faces too. Up close — where a wide lens forces you to stand to fill the frame — the nose is markedly nearer the lens than the ears, so by the perspective $1/Z$ falloff it looms: a bulbous nose, an inflated forehead, ears that recede into nothing. Step back and shoot from a few meters with a short telephoto and those front-to-back distances equalize, so the proportions read as natural. The "portrait" focal length (~85–135 mm) flatters precisely because, at a normal framing, it makes you back away to that comfortable distance; the longer focal length is just what crops the now-distant face back to a head-and-shoulders. The striking finding is that the chosen distance changes not only proportions but perceived personality: in rating studies, faces shot from closer are judged less trustworthy, less competent, and less attractive — a sub-meter change in where you stand measurably shifts how a stranger is read (Perona 2007; Bryan et al.). The flip side — the gross wide-angle face stretch when you cannot back away, as on a phone's selfie camera, and how to undo it computationally — is the subject of Perspective distortion and its correction.
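The $1/Z$ argument reduces to two short formulas. In the sketch below the 10 cm nose-to-ear depth and the example distances are illustrative assumptions, and the distance is measured from the camera to the plane of the ears.

```python
def nose_to_ear_magnification(distance_m, depth_m=0.10):
    """How much larger the nose is rendered than the ears.

    Image size scales as 1/Z, so a feature depth_m closer than the ears
    is magnified by Z / (Z - depth_m) relative to them.
    """
    return distance_m / (distance_m - depth_m)


def focal_to_hold_framing(f0_mm, d0_m, d_m):
    """Focal length that keeps the subject the same size at distance d_m."""
    return f0_mm * d_m / d0_m


close = nose_to_ear_magnification(0.4)   # nose about 33% oversized
far = nose_to_ear_magnification(2.0)     # nose about 5% oversized

# Backing off from 0.4 m to 2 m while holding the framing turns a 24 mm
# lens into a 120 mm one, inside the classic portrait range.
portrait_focal = focal_to_hold_framing(24, 0.4, 2.0)
```

The focal length appears only in the second function: it restores the framing, while the proportions are set entirely by the distance in the first.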

fig-face-distortion-sim
Figure 2.10.13. Try it yourself (web edition). A live face-distortion simulator that separates the two effects above. Set the focal length, the magnification (face size in the frame), and the eccentricity (how far the face sits off-centre), and watch the face's shape change at constant size: the close-and-wide "big-nose" perspective when a short lens forces you in, and the wide-angle edge stretch as the face nears the frame's corner. A full-frame view and a face-cropped view let you compare shape rather than position; the subject can be a face, a neutral grid sphere, or one of several diverse 3D human faces. In the printed edition this stands in as a screenshot. The human 3D models are generated with Meshy AI.

Fast versus slow. A fast lens has a large maximum aperture (small f-number, e.g. f/1.4) — more light and shallower depth of field; a slow lens (f/5.6) is smaller and cheaper but light-starved. Zooms are often slow, and many have a variable maximum aperture that shrinks as you zoom in (e.g. f/3.5 wide, f/5.6 long). The "35 mm-equivalent" reminder is worth repeating but not re-deriving: a focal length only means a field of view relative to a sensor size, so smaller sensors crop and you multiply by the crop factor (APS-C ≈ 1.5×, Micro Four Thirds ≈ 2×, phones much more) to compare — see the Pinhole chapter's "35 mm" sidebar.
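The crop-factor bookkeeping is a one-line multiplication, and the field of view it preserves follows from the pinhole geometry. A minimal sketch, assuming a 36 mm wide full-frame sensor and idealizing APS-C as exactly 36/1.5 = 24 mm wide:

```python
import math


def equivalent_focal_mm(focal_mm, crop_factor):
    """Full-frame focal length giving the same field of view."""
    return focal_mm * crop_factor


def horizontal_fov_deg(focal_mm, sensor_width_mm=36.0):
    """Horizontal field of view of an ideal rectilinear lens at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))


# A 35 mm lens on a 1.5x crop body frames like a 52.5 mm lens on full frame.
eq = equivalent_focal_mm(35, 1.5)
fov_crop = horizontal_fov_deg(35, sensor_width_mm=36.0 / 1.5)
fov_full = horizontal_fov_deg(eq)
```

Real APS-C sensors are slightly narrower than 24 mm, so the quoted crop factors are approximations.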

2.10.5 Lens filters: polarizers, ND, and graduated ND

A few pieces of glass screwed onto the front of the lens change which light the sensor captures — and the most interesting of them do something software cannot redo afterward (Figure 2.10.14).

The circular polarizer (CPL). Natural light is unpolarized (its electric field oscillates in a rapidly varying mix of directions), but reflection off a non-metal surface such as water, glass, or wet leaves partially polarizes what bounces off it, and so does the blue sky (skylight is most strongly polarized in a band about $90°$ from the sun). A polarizer is a rotatable filter that passes only one polarization, so turning it lets you cut glare and reflections off those surfaces, deepen a blue sky, and boost saturation as the haze of scattered, polarized light is removed (this is the polarization of the Reflection sidebar, put to work). It costs about 1–2 stops of light. It is "circular" (a linear polarizer followed by a quarter-wave plate) specifically so that the light leaving the filter is circularly polarized, which the camera's beam-splitter-based autofocus and metering handle like unpolarized light and so keep working. Crucially, its effect is not reproducible in post: it changes the light before capture, so no slider can recover the un-glared, deep-sky image from a shot taken without it.

Neutral-density (ND) filters. An ND is a uniform gray that cuts the light by a chosen number of stops with (ideally) no color shift — a brightness knob in glass. Its point is to let you choose a setting you otherwise could not in bright light: a slow shutter for silky water or motion blur, or a wide aperture for shallow depth of field under a midday sun. Strong NDs (6–10 stops) make daytime long exposures possible; a variable ND is simply two stacked polarizers, rotated against each other to dial the strength continuously.
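ND arithmetic is stop arithmetic. The sketch below computes the shutter time an ND buys, and models the variable ND with Malus's law for two ideal polarizers; real polarizers are not ideal, and the fixed loss of the first polarizer is not included in the second function.

```python
import math


def shutter_with_nd(t, nd_stops):
    """Shutter time giving the same exposure once an ND is fitted."""
    return t * 2 ** nd_stops


def variable_nd_stops(angle_deg):
    """Extra attenuation of two ideal stacked polarizers, in stops.

    Malus's law: the second polarizer transmits cos^2(angle) of the light
    passed by the first, relative to the aligned (0 degree) position.
    """
    transmission = math.cos(math.radians(angle_deg)) ** 2
    return -math.log2(transmission)


# A 10-stop ND turns 1/500 s into about 2 s: daytime long exposure.
long_exposure = shutter_with_nd(1 / 500, 10)

aligned = variable_nd_stops(0)    # no extra loss
half = variable_nd_stops(45)      # one extra stop
strong = variable_nd_stops(80)    # about five extra stops
```

The attenuation climbs very steeply near 90 degrees, which is consistent with variable NDs being hard to set precisely at their strong end.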

Graduated ND (GND). A graduated ND is dark at the top and clear at the bottom, with a hard, soft, or reverse transition. Lined up with the horizon, it holds back a bright sky so a high-contrast landscape fits inside a single exposure — an optical alternative to HDR bracketing (→ Image measurements; Multiple exposure), done in glass at capture time rather than by merging frames later.

UV / protective filters. A clear or UV filter today is mostly lens protection (it once cut UV haze on film); whether the protection is worth adding one more glass surface that can introduce flare is a perennial photographers' debate.

The throughline ties the section back to the chapter's bigger argument: the polarizer's and GND's effects are largely un-doable in software — the polarizer changes the captured light itself, and the GND is a per-shot exposure blend — which is exactly why these physical filters survive into the computational era, whereas a plain ND (which only scales brightness, something software can fake) is the one whose job a computer can take over.

[figure fig-lens-filters not built]
Figure 2.10.14. (Deferred — to be generated.) The three image-altering filters. (Left) a circular polarizer rotated through its range: glare and reflections off water and glass vanish, the blue sky darkens, and saturation rises — strongest in the band ~$90°$ from the sun. (Middle) a strong ND turning a bright daytime scene into a long exposure (silky water) at a slow shutter. (Right) a graduated ND aligned to the horizon, holding back the bright sky so the whole high-contrast scene fits one exposure — hard vs soft vs reverse edges. Polarizer and GND effects cannot be reproduced in post.

2.10.6 Keeping the lens clean

A smudged or dusty front element is not a cosmetic problem — it is an optical one. Stray light hitting the dirt scatters across the frame, producing veiling flare, lowering contrast, and casting soft dust spots (worst when stopped down, where the spots come into focus). Clean glass is real image quality, not fussiness.

There is an order of operations, gentlest first, because each step risks the delicate anti-reflection coating. Start with a blower to lift loose grit without dragging it across the glass; then a soft brush; then a microfibre cloth or lens pen; and only for stubborn smears, a drop of lens fluid. Wipe gently and from the centre outward, and never dry-wipe a gritty element — that grinds the grit into the coating.

The companion warning is don't overdo it: every wipe is a small risk to the coating, and a little dust is optically harmless (it is far out of focus and scatters a negligible fraction of the light). A protective / UV filter or simply the lens hood prevents most of the problem in the first place. The cousin problem is sensor dust, whose spots sit in a fixed place in the frame (they do not move when you change lenses or zoom), which is the giveaway. Cameras fight it with an in-body ultrasonic shake of the sensor at startup; beyond that, a rocket blower or a wet swab cleans it, and dust-mapping lets software paint the spots out automatically (→ dust-spot removal, BASIC).

2.10.7 Image stabilization

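A long exposure handheld blurs not only moving subjects but the whole frame, from the tiny tremor of your hands. Image stabilization fights that camera shake: a gyroscope senses the camera's angular shake and a feedback loop counters it during the exposure, letting you hand-hold slower shutter speeds, typically a gain of 2–5 stops (so a shot that needed 1/100 s can succeed at 1/4 s). There are three implementations (Figure 2.10.15): optical image stabilization (OIS), in which a floating lens element shifts to counter the shake; in-body image stabilization (IBIS), in which the sensor itself counter-moves and so works with any lens; and electronic image stabilization (EIS), in which a margin is cropped and each frame is warped to cancel the motion, with no moving parts but at the cost of resolution and field of view.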

fig-stabilization
Figure 2.10.15. Three stabilization schemes. OIS: a floating lens element shifts to counter shake. IBIS: the sensor rides a gimballed stage and counter-moves, working with any lens. EIS / digital: a margin is cropped and each frame warped to cancel motion — no moving parts, at the cost of resolution and field of view. In all three, a gyroscope senses the angular shake and drives the counter-motion.

The crucial limit: stabilization fixes camera shake, not subject motion. A stabilized 1/4 s shot of a static scene is sharp, but a moving subject in that same frame still blurs — for that you need a fast shutter. This is a common beginner confusion, worth stating flatly: stabilization buys you slow shutters for still subjects only.
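The stabilization gain quoted in stops converts to shutter times with the same power-of-two arithmetic. A small sketch, assuming the 1/focal-length rule as the unstabilized baseline; the function names are illustrative, and none of this applies to subject motion.

```python
import math


def slowest_handheld_shutter(focal_mm, stabilization_stops=0.0):
    """Slowest shutter time for a static scene under the 1/f rule."""
    return (1.0 / focal_mm) * 2 ** stabilization_stops


def stops_gained(t_unstabilized, t_stabilized):
    """Stops of benefit implied by two hand-holdable shutter times."""
    return math.log2(t_stabilized / t_unstabilized)


# A 200 mm lens: 1/200 s unaided, 0.08 s (about 1/12 s) with 4 stops.
unaided = slowest_handheld_shutter(200)
stabilized = slowest_handheld_shutter(200, 4)

# Going from 1/100 s to 1/4 s is a factor of 25, about 4.6 stops.
gain = stops_gained(1 / 100, 1 / 4)
```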

Sidebar — image stabilization as a controllable blur actuator

The stabilizer is, physically, a precise way to move the image during the exposure — so driving it deliberately turns it into a blur generator. Bando and Holtzman (ICCP 2011) used it for coded and custom motion blur, a hint that hardware meant to remove blur can be repurposed to shape it. We return to coded exposure in the Advanced part.

2.10.8 UI and display

A camera's controls are abstractions; its viewfinder and display are where those abstractions become visible feedback — and this is exactly what mirrorless changed (Figure 2.10.16).

Viewfinders. An optical viewfinder (OVF) — the SLR's mirror-and-prism, or a rangefinder's window — shows the real world directly: zero lag, infinite resolution, but it does not show the captured exposure or white balance, only the scene. An electronic viewfinder (EVF) (and the rear LCD) shows the sensor's live feed instead: slightly laggy, but what-you-see-is-what-you-get. Exposure, white balance, and even a depth-of-field preview are all visible before you press the shutter.

Exposure preview and the live histogram. Because the EVF or LCD can simulate the captured image, it can overlay a live histogram to judge clipping in real time — which is the practical home of ETTR: push the histogram to the right without letting the highlights pile up against the right edge.

Focus aids. For manual focus, focus peaking highlights the high-contrast (in-focus) edges, and a magnify mode zooms in to check critical focus. For exposure, zebras — animated diagonal stripes over near-clipped highlights, borrowed from video — warn you before you blow out the brights.
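The live histogram and the zebra overlay are both simple functions of the pixel values. The sketch below works on a flat list of luminances normalized to the range 0 to 1; the thresholds (0.98 for "clipped", 0.95 for zebras) are illustrative assumptions, since cameras let the user choose them.

```python
def histogram(pixels, bins=8):
    """Counts of normalized luminances (0..1) in equal-width bins."""
    counts = [0] * bins
    for p in pixels:
        index = min(int(p * bins), bins - 1)  # p == 1.0 goes in the top bin
        counts[index] += 1
    return counts


def clipped_fraction(pixels, threshold=0.98):
    """Share of pixels at or above the clipping threshold."""
    return sum(1 for p in pixels if p >= threshold) / len(pixels)


def zebra_mask(pixels, threshold=0.95):
    """True where the display would draw zebra stripes."""
    return [p >= threshold for p in pixels]


frame = [0.0, 0.1, 0.5, 0.96, 1.0]
counts = histogram(frame, bins=4)
blown = clipped_fraction(frame)
stripes = zebra_mask(frame)
```

An ETTR workflow would raise the exposure until `blown` just starts to rise above zero, then back off.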

fig-viewfinder-types
Figure 2.10.16. OVF vs EVF vs rear LCD. The OVF (SLR mirror/prism or rangefinder window) shows the real world directly — zero lag, but not the captured exposure or white balance. The EVF and rear LCD show the sensor's live feed — slightly laggy, but WYSIWYG, with exposure, white balance, and depth-of-field preview, plus overlays (live histogram, focus peaking, zebras).

The throughline: an EVF turns the abstractions of exposure and focus into direct visual feedback, which is why mirrorless changed how people shoot — you stop guessing and start seeing the photograph before you take it.

2.10.9 Video modes

Video is a stack of exposures, so most of its treatment is deferred to the dedicated Motion/Video chapter; here we name just the controls a stills photographer meets when they hit the record button.

The algorithms behind all of this — and the full treatment of motion — live in the Motion/Video chapter; this section is deliberately brief.

2.10.10 Programmability, or the lack thereof

Here is an obstacle that shapes this entire book. Most cameras run closed firmware: you cannot run your own code on them. The computational-photography ideas in the chapters ahead (multi-frame high-dynamic-range merging, burst denoising, computational bokeh) all need programmable capture: control of the per-frame exposure, focus, and gain, and access to the raw burst before the camera's own pipeline mangles it. A stock camera grants none of this, so these ideas cannot even be tried on most cameras (Figure 2.10.17).

The community responded with hacks: CHDK (Canon Hack Development Kit, for PowerShot compacts) and Magic Lantern (for Canon DSLRs) are add-on firmware that expose scripting, raw video, auto-bracketing, intervalometers, and focus stacking, features born precisely because the manufacturers withheld them. The open road, though, runs through phones: Android's Camera2 / CameraX API (and, before it, the Frankencamera, or FCam, research platform from Stanford and Nokia) expose per-frame control of exposure, focus, and gain, plus access to the raw burst. That openness is why phones, not cameras, became the home of computational photography: the platform that grants programmable capture is the platform where the research happens, and it is the bridge to the rest of this book.

There is also a maker road, the most open of all: a Raspberry Pi with a camera module (the HQ camera, driven by the libcamera / picamera2 stack) is a cheap, fully open, fully programmable camera. It hands you per-frame control of exposure and gain and direct access to the raw Bayer frames on a hobbyist board you can script in a few lines of Python — which makes it the natural platform on which to try this book's algorithms and build a do-it-yourself computational camera.
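To show what "per-frame control" means in practice, here is a hardware-independent sketch that turns a list of EV offsets into per-frame exposure and gain requests. The `FrameRequest` type, the `plan_bracket` function, and the 1/30 s hand-holding limit are all hypothetical; a real capture API (Camera2, picamera2) has its own request objects, which are not shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameRequest:
    exposure_s: float  # shutter time in seconds
    gain: float        # analog gain, 1.0 = base ISO

def plan_bracket(base_exposure_s, ev_offsets, max_exposure_s=1 / 30, base_gain=1.0):
    """Turn EV offsets into per-frame requests.

    Shutter time absorbs each offset until it reaches max_exposure_s
    (an assumed hand-holding limit); gain makes up the remainder."""
    requests = []
    for ev in ev_offsets:
        wanted = base_exposure_s * 2.0 ** ev      # ideal shutter time
        exposure = min(wanted, max_exposure_s)    # clamp to the limit
        gain = base_gain * wanted / exposure      # compensate with gain
        requests.append(FrameRequest(exposure, gain))
    return requests
```

For example, `plan_bracket(1 / 120, [-2, 0, 2, 3])` keeps gain at 1.0 for the first three frames and doubles it for the last, where the shutter time is pinned at 1/30 s.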

Figure 2.10.17. Closed firmware versus an open capture API. A stock camera is a black box: the user's code cannot reach the per-frame exposure/focus/gain controls or the raw burst, so this book's algorithms cannot run on it. A phone's Camera2-style API exposes exactly those hooks — per-frame control and raw access — which is why computational photography lives on phones. Community firmware (CHDK, Magic Lantern) pries a few hooks open on closed cameras.

2.10.11 Anatomy of a modern full-frame interchangeable-lens camera

It helps to see the whole machine at once, because each block is a previous section made physical (Figure 2.10.18). Trace the light from front to back through a modern mirrorless body:

Lens and mount (electronic contacts carry autofocus, aperture, and stabilization signals) → sensor on an IBIS stage (the moving sensor of the stabilization section) → shutter — a mechanical curtain and/or an electronic rolling shutter (the readout of the sensor chapter) → image processor and AF — on-sensor PDAF, subject detection, and the image signal processor (ISP, the computational pipeline of the next part) → EVF and rear LCD (the live feed of the UI section) → card and battery.

The defining change from the single-lens reflex (SLR) is that the mirror box and optical prism are gone. In an SLR, a 45° mirror bounced the lens's image up into an optical prism and the OVF, then flipped up out of the way at the instant of exposure. Remove it, and the sensor feeds the EVF directly, which is exactly what enables WYSIWYG preview, on-sensor PDAF spread across the whole frame, a silent electronic shutter, and the shorter lens-to-sensor (flange) distance that the new mirrorless lens mounts are designed around. The mirrorless body is the SLR with its one big moving part removed and the sensor promoted to do the viewfinder's job.

fig-mirrorless-anatomy
Figure 2.10.18. Cutaway of a modern mirrorless body, light to bits. Lens + mount (electronic contacts) → sensor on the IBIS stage → mechanical and/or electronic shutter → image processor + AF (on-sensor PDAF, subject detect, ISP) → EVF + rear LCD → card + battery. Inset: the older SLR's mirror box and optical prism, the parts mirrorless removes — which is what enables WYSIWYG preview, full-frame on-sensor PDAF, and shorter flange distances.

2.10.12 Anatomy of cell-phone cameras

A phone reaches the same goal — a good photograph — by a wholly different route, and the difference is the punchline of the chapter (Figure 2.10.19). Its hardware is constrained: a tiny sensor behind a bright, fixed-aperture lens. There is no iris, so exposure is set by shutter and ISO alone. The tiny sensor has two consequences that fall straight out of earlier chapters: depth of field grows as the sensor shrinks (the lens chapter), so a phone has enormous depth of field and must fake background blur; and a small sensor collects fewer photons, so it has more noise and must burst-and-average to clean it up.
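The burst-and-average claim can be checked with a small simulation. The sketch below assumes pure photon shot noise (Poisson), a static scene, and perfectly aligned frames, none of which holds exactly on a real phone; the photon count and burst length are arbitrary.

```python
import numpy as np

def simulate_burst(mean_photons, n_frames, n_pixels=100_000, seed=0):
    """Photon counts for a flat gray patch: n_frames x n_pixels Poisson draws."""
    rng = np.random.default_rng(seed)
    return rng.poisson(mean_photons, size=(n_frames, n_pixels)).astype(float)

def snr(samples):
    """Signal-to-noise ratio of a flat patch: mean over standard deviation."""
    return samples.mean() / samples.std()

burst = simulate_burst(mean_photons=25, n_frames=16)
single_snr = snr(burst[0])            # close to sqrt(25) = 5
merged_snr = snr(burst.mean(axis=0))  # close to 5 * sqrt(16) = 20
```

Averaging N frames improves the signal-to-noise ratio by about the square root of N, so a 16-frame burst buys roughly the noise performance of a sensor collecting 16 times the light in one frame.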

Where a camera uses a zoom lens, a phone uses a multi-camera array: separate ultrawide / wide / tele modules, the long end often a periscope that folds the light path sideways with a prism so a long focal length fits a thin body. "Zooming" really means switching cameras (with cropping and fusion in between).
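A minimal sketch of the "zoom means switching cameras" logic follows. The module names and their magnifications are made up for illustration, and the sketch ignores the fusion step: it simply picks the longest module that does not exceed the requested zoom and covers the rest with a digital crop.

```python
def pick_module(requested_zoom, modules):
    """Choose a camera module and the digital crop needed on top of it.

    modules maps a name to its native magnification relative to the
    main camera. Returns (name, crop_factor) with crop_factor >= 1."""
    usable = {name: z for name, z in modules.items() if z <= requested_zoom}
    if not usable:
        # Wider than the widest module: nothing to do but use it uncropped.
        return min(modules, key=modules.get), 1.0
    name = max(usable, key=usable.get)
    return name, requested_zoom / usable[name]

# Hypothetical three-module phone.
PHONE = {"ultrawide": 0.5, "wide": 1.0, "tele": 5.0}
```

With these numbers, a 3x request uses the wide module cropped 3x (a ninth of the pixels), which is why image quality dips just below the point where the tele module takes over.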

fig-phone-camera-array
Figure 2.10.19. A phone's multi-camera array. Separate ultrawide, wide, and tele modules — each a tiny sensor behind a bright fixed-aperture lens — stand in for a zoom; "zooming" switches between them (plus crop and fusion). The long end is often a periscope: a prism folds the optical path sideways so a long focal length fits inside a thin body. Tiny sensors mean huge depth of field and more noise — overcome by computation, not glass.

The key idea is that the computational pipeline does the heavy lifting that the optics cannot: demosaicking, multi-frame high dynamic range (HDR; the HDR+ pipeline) and burst denoising, computational bokeh (depth from the dual-pixel split or the multi-camera array), night mode, and super-resolution. The phone overcomes small-sensor physics with computation rather than glass — forward-referencing the Basic ISP and Multiple-exposure parts, where these algorithms live. This is the chapter's punchline and the book's bridge: cheap optics plus a smart pipeline beats good optics alone.

2.10.13 Types of cameras

The two anatomies above — a full-frame mirrorless body and a phone — are just two points in a much wider space. It helps to lay that space out along a few axes, because they are the axes a photographer actually shops along (Figure 2.10.20).

By sensor size. This is the spine of the taxonomy, because sensor size is what sets depth of field, light-gathering, and body bulk (cross-reference the crop factor of the Pinhole chapter). From smallest to largest: phone → 1-inch compacts → Micro Four Thirds → APS-C → full-frame (35 mm) → medium format → large-format / view cameras. Each step up gathers more light and shrinks depth of field at the same framing, and each makes the camera bigger and dearer.
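The ladder can be made quantitative with the crop factor, the ratio of the full-frame diagonal to the sensor's diagonal. The sketch below uses nominal sensor dimensions (APS-C in particular varies by manufacturer) and the usual equivalence rule of thumb, under which both focal length and f-number scale by the crop factor for matched framing and depth of field.

```python
import math

FULL_FRAME_MM = (36.0, 24.0)

# Nominal sensor sizes in millimetres (width, height); approximate values.
SENSORS_MM = {
    "1-inch": (13.2, 8.8),
    "Micro Four Thirds": (17.3, 13.0),
    "APS-C": (23.5, 15.6),
    "full-frame": FULL_FRAME_MM,
}

def crop_factor(width_mm, height_mm):
    """Full-frame diagonal divided by this sensor's diagonal."""
    return math.hypot(*FULL_FRAME_MM) / math.hypot(width_mm, height_mm)

def full_frame_equivalent(focal_mm, f_number, width_mm, height_mm):
    """Focal length and f-number giving the same framing and depth of
    field on full frame (rule-of-thumb equivalence)."""
    c = crop_factor(width_mm, height_mm)
    return focal_mm * c, f_number * c
```

For instance, a 25 mm f/1.8 lens on Micro Four Thirds frames and blurs roughly like a 50 mm f/3.6 lens on full frame.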

By lens. Cameras split into fixed-lens (phones, compacts, bridge/superzooms, instant cameras) and interchangeable-lens (ILC) bodies (mirrorless, DSLR, medium format) where you swap optics for the job.

By mechanism and viewfinder. The DSLR (a reflex mirror feeding an optical viewfinder) is the legacy design; mirrorless (an EVF fed by the live sensor) is now dominant — the anatomy above. Around them sit the rangefinder (a Leica-style window with a focus patch), the point-and-shoot / compact, the bridge / superzoom (a fixed, enormous-range zoom on a small sensor), the rugged action / 360 cameras (ultrawide, fixed), the instant camera (Polaroid / Instax, a self-developing print), still-made film bodies (→ the darkroom below), and the view / technical camera (large format with movements → tilt-shift, deferred to Optics).

By platform — how the camera is carried (aerial photography). A camera also gets classified by where you put it. The modern default for getting above a scene is the drone / UAV, a gimballed camera that flies. But it caps a long lineage of lifting a camera when no tripod will reach: balloon photography (Nadar over Paris, 1858), kite aerial photography (Arthur Batut, 1880s, with a timer or altitude trigger firing the rig), pigeon cameras, and pole / mast photography for low overhead frames. The platform buys a viewpoint, not a different sensor — it is the same imaging, simply lifted.

The practical lesson is that sensor size and lens system — not megapixels — are what mostly separate these cameras; the rest is ergonomics, platform, and price.

Figure 2.10.20. (Deferred — to be generated.) The space of cameras on two axes. A sensor-size ladder (phone → 1″ → Micro Four Thirds → APS-C → full-frame → medium → large format), each rung with a representative body, drawn against a second axis of fixed vs interchangeable lens — placing phone, compact, bridge/superzoom, mirrorless, DSLR, and medium format in the plane. Sensor size sets depth of field, light-gathering, and bulk; the lens axis sets versatility.

2.10.14 Cameras beyond photography

Most of the cameras in the world never make a picture for a human to look at. Their image is an input to a decision or a measurement — the same optics and sensor, but pointed at a different goal (speed, a spectral band, geometry, reliability) and producing a different output (a number, a position, a defect flag). This is the hardware counterpart of the Introduction's insistence that the field is not just about photography (Figure 2.10.21).

The throughline, echoing the Introduction: once a camera is a measurement device, "good" stops meaning pretty and starts meaning accurate, fast, and robust for the task at hand.

Figure 2.10.21. (Deferred — to be generated.) A strip of cameras that aren't for pictures, each labelled with its output rather than an image: machine-vision / industrial inspection (defect flag), barcode / QR reader (a decoded number), robot / drone (pose + depth via SLAM), automotive ADAS (detections), scientific / cooled-astronomy (a calibrated measurement), medical endoscope (a diagnostic view), security CCTV and AR/VR tracking. Same optics + sensor, different goal: accuracy, speed, and robustness over beauty.

2.10.15 A camera's other sensors

The image sensor is not the only sensor in a modern camera. Tucked around it is a small suite of others, and several of them feed straight into the computational pipeline rather than just decorating the metadata.

The most important is the inertial measurement unit (IMU): a tiny accelerometer (sensing linear acceleration and, through gravity, which way is down) paired with a gyroscope (sensing angular velocity, that is, how fast the camera is rotating). It is the camera's sense of its own motion, and it does real work. It drives the stabilization described earlier (the gyro tells OIS and IBIS exactly which way to counter-shift), and it auto-rotates the picture and levels a horizon overlay. Increasingly it is also an input to computation: it feeds video stabilization (EIS), seeds the frame-to-frame alignment in burst and panorama modes so the software starts from a good guess, and even informs gyro-based deblurring, where knowing the camera's motion during the exposure helps estimate the blur kernel (forward-referenced to the deblurring and motion chapters). The IMU is why a phone, which cannot afford a heavy stabilized lens, can still hand-hold a long night-mode exposure.
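To see why a gyro reading is enough to predict blur, consider the simplest case: a distant scene and a camera rotating at a constant rate during the exposure. The sketch below is an idealization under those assumptions (no translation, no rolling shutter), and the example numbers are arbitrary.

```python
import math

def blur_pixels(omega_deg_per_s, exposure_s, focal_mm, pixel_pitch_um):
    """Image shift, in pixels, caused by a constant rotation rate.

    The camera turns by theta = omega * t during the exposure, which
    moves a distant point by about f * tan(theta) on the sensor."""
    theta = math.radians(omega_deg_per_s) * exposure_s
    shift_mm = focal_mm * math.tan(theta)
    return shift_mm * 1000.0 / pixel_pitch_um
```

A gentle 1°/s shake over 1/50 s with a 50 mm lens and 4 µm pixels already smears a point across about 4.4 pixels, which is the shift a stabilizer has to cancel.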

The rest of the suite is mostly about context and metadata:

The throughline echoes the rest of the chapter: a camera is no longer just a lens and a sensor but a small sensor fusion platform, and the side channels — especially the IMU — are increasingly part of how the picture is computed, not just notes attached to it.

Figure 2.10.22. (Deferred — to be generated.) The sensors surrounding the image sensor in a modern camera/phone: the IMU (accelerometer + gyroscope) driving stabilization and alignment, GPS/GNSS and magnetometer for geotag and heading, barometer, ambient-light/flicker sensor, laser/ToF and lidar depth sensors, microphones, and housekeeping (temperature, Hall position, clock) — with arrows marking which ones feed the computational pipeline versus the EXIF metadata.

2.10.16 Flash and lighting

Sometimes the available light is wrong (too little, too harsh, too contrasty) and the photographer adds light with a flash. Done well, this is dynamic-range management, not just "more light" (Figure 2.10.23).

TTL metering automates flash power the way autoexposure automates ambient exposure: the camera fires a brief pre-flash, meters its return through the lens (TTL), and sets the main flash power so the subject lands at the right brightness.
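The pre-flash logic reduces to one proportionality, sketched below. It assumes the metered return scales linearly with flash power and that ambient light has already been subtracted from the reading; the function name, units, and power cap are illustrative, not a manufacturer's protocol.

```python
def ttl_main_power(preflash_power, measured, target, max_power=1.0):
    """Scale flash power so the metered return reaches the target.

    preflash_power and max_power are fractions of full power; measured
    and target are in the same (arbitrary, linear) brightness units."""
    if measured <= 0:
        raise ValueError("no pre-flash return measured")
    return min(max_power, preflash_power * target / measured)
```

If a 1/64-power pre-flash returns a quarter of the target brightness, the main flash fires at 1/16 power; when the subject is too far away, the result saturates at full power and the frame underexposes.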

Sync speed and high-speed sync. A focal-plane shutter (the curtain in an SLR or mirrorless body) fully uncovers the sensor only at shutter times as long as or longer than its X-sync speed (~1/200 s). At shorter times the second curtain starts closing before the first has finished opening, so the shutter is a moving slit and a single flash pulse would light only the strip the slit exposes at that instant. High-speed sync (HSS) works around this by pulsing the flash rapidly as the slit travels, at a large cost in light. A leaf shutter (built into the lens, opening fully at any speed) instead syncs at all shutter speeds, a real advantage for daylight fill.
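A back-of-the-envelope model shows how narrow the slit gets. The sketch assumes both curtains cross the frame at the same constant speed and that the curtain travel time roughly equals the X-sync time; real shutters travel somewhat faster than their rated sync speed, so treat the numbers as indicative only.

```python
def slit_fraction(shutter_s, curtain_travel_s):
    """Fraction of the frame height uncovered at any instant.

    The second curtain starts shutter_s after the first, and each takes
    curtain_travel_s to cross, so the open slit spans
    shutter_s / curtain_travel_s of the frame (capped at the full frame)."""
    return min(1.0, shutter_s / curtain_travel_s)
```

With an assumed 1/200 s travel time, a 1/1000 s exposure opens a slit only a fifth of the frame tall, so a single flash pulse would light just that strip.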

Fill flash. In harsh or backlit light, add flash at 1 to 2 EV below the ambient exposure (a flash compensation of −1 to −2 EV): enough to open the shadows without looking flashed. It is exactly dynamic-range management: you are not lighting the scene, you are lifting the dark end into the range the sensor can hold.
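The dynamic-range arithmetic is easy to work through. The sketch below assumes an on-axis flash that adds the same amount of light to the sunlit and shadowed sides of the subject, with luminances in linear units where 1.0 is the correctly exposed ambient level; the 3-stop starting contrast is an invented example.

```python
import math

def contrast_stops(lit, shadow):
    """Lighting contrast between two regions, in stops."""
    return math.log2(lit / shadow)

def with_fill(lit, shadow, fill_ev):
    """Add a flash contribution of 2**fill_ev (relative to the ambient
    exposure level of 1.0) equally to both regions."""
    flash = 2.0 ** fill_ev
    return lit + flash, shadow + flash

before = contrast_stops(1.0, 0.125)                      # 3 stops
after = contrast_stops(*with_fill(1.0, 0.125, -1.5))     # 1.5 stops
```

Fill at −1.5 EV halves the contrast in stops here while raising the lit side by less than half a stop, which is why the result does not look flashed.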

Bounce and off-camera. Never fire a bare flash straight at the subject: it gives flat, harsh light and red-eye. Bounce it off a ceiling or wall (~45°), or spread it with a diffuser, to enlarge the apparent light source and soften the shadows. Taking the flash off-camera entirely gives directional, shaped light (→ Computational illumination, including computational bounce flash, Murmann et al., SIGGRAPH Asia 2016).

fig-flash-sync
Figure 2.10.23. Flash sync and fill. (Left) fill flash before/after: harsh, backlit shadows opened by adding flash at about −1.5 EV — dynamic-range management, not "more light." (Right) sync: a leaf shutter opens fully at any speed and so syncs at every shutter speed, while a focal-plane curtain only fully uncovers the sensor up to its X-sync speed (~1/200 s); above that, a moving slit means a single flash pulse can't light the whole frame, and high-speed sync (HSS) pulses the flash to cover it.
fig-portrait-lighting-sim
Figure 2.10.24. Try it yourself (web edition). A live portrait-lighting simulator: three soft area lights — a key, a fill, and a kicker, each with its own direction (azimuth, elevation), size, intensity, and colour — sculpt a 3D face, with area-light-supersampled soft shadows and a physiologically grounded melanin/hemoglobin skin model. A second "from behind" view shows the lights as studio umbrellas plus the camera so you can read the whole setup, and one-click presets (butterfly, loop, Rembrandt, split, clamshell, rim) let you feel how a light's placement and size — not its brightness — shape a face. In the printed edition this stands in as a screenshot. The 3D studio umbrella is generated with Meshy AI.

2.10.17 The traditional darkroom

A photograph is only half made when the shutter closes. For most of photography's history the other half happened in a darkroom, and it is worth knowing — both because film workflows are enjoying a revival and because nearly every tool in today's editing software is a darkroom operation in disguise.

The chemical pipeline has two stages. First the exposed film is developed: bathed in a developer that turns the exposed silver-halide grains to metallic silver, a stop bath that halts the reaction, and a fixer that dissolves away the unexposed grains so the image is no longer light-sensitive. The result is a negative — dark where the scene was bright. Then comes printing: an enlarger projects the negative onto a sheet of light-sensitive photographic paper, which is developed in its turn to give a positive print.

Printing is where the craft lives, because the enlarger is a second exposure that the printer fully controls:

This is the literal origin of the dodge and burn tools in Photoshop, of the exposure and contrast sliders everywhere, and of Ansel Adams's Zone System (the metering section's 18% gray is its Zone V) — a discipline for planning, at capture time, exactly where each scene tone will land on the print. The darkroom is also the concrete meaning of the book's recurring metaphor that the print is a performance of the negative (the score and the performance, and the measurement chapter's sidebar): two printers, one negative, two different photographs.
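Dodging and burning can be sketched as a per-pixel change to the enlarger exposure. The model below is a toy: the paper response `1 / (1 + exposure**contrast)` is an invented stand-in for a real characteristic curve, chosen only so that more light gives a darker print and a higher contrast grade gives a steeper response.

```python
import numpy as np

def print_tone(negative_transmission, base_time_s, local_stops=None, contrast=1.0):
    """Toy enlarger print. Returns print lightness in (0, 1], 1 = paper white.

    local_stops is a per-pixel exposure change: negative values dodge
    (hold light back, lighter print), positive values burn (darker print)."""
    t = np.asarray(negative_transmission, dtype=float)
    stops = np.zeros_like(t) if local_stops is None else np.asarray(local_stops, dtype=float)
    exposure = t * base_time_s * 2.0 ** stops   # light reaching the paper
    return 1.0 / (1.0 + exposure ** contrast)   # toy paper response
```

Two different `local_stops` maps applied to the same negative give two different prints, which is the point of the score-and-performance metaphor.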

Figure 2.10.25. (Deferred — to be generated.) The chemical darkroom. (Left) the film pipeline: develop → stop → fix → a negative. (Right) the enlarger projects the negative onto photographic paper; the printer controls overall exposure time (lightness) and contrast grade (filter/paper), and shapes regions by dodging (holding light back → lighter) and burning (adding light → darker) — the operations that name today's editing tools.

2.10.18 The digital darkroom: editing software

Today that darkroom is software, and it comes in two flavors worth telling apart, because they embody two different philosophies of editing (Figure 2.10.26).

Parametric raw "developers" — the Lightroom style. Adobe Lightroom, Capture One, Apple Photos, and the open-source darktable and RawTherapee are, at heart, raw developers wrapped in a library. They are non-destructive and parametric: the original file is never altered; every edit is stored as a small list of instructions applied on top of it, re-rendered live and reversible at any moment — the digital analog of deciding how to print a negative. The toolset is mostly global adjustments: white balance, exposure, contrast, highlights / shadows / whites / blacks, the tone curve, HSL (per-color hue/saturation/luminance), clarity / texture / dehaze, sharpening and noise reduction, lens corrections (distortion, vignetting, chromatic aberration), and crop / straighten — plus local adjustments painted in with masks, graduated and radial filters, and increasingly AI-driven subject and sky masks. The other half of the job is asset management: importing, rating, keywording, and searching tens of thousands of photos. This is the everyday workflow for most photographers, applied across whole shoots at once.
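The "small list of instructions" idea can be sketched in a few lines. The operation names, parameters, and recipe format here are hypothetical, not the file format of any of the products named above; the point is only that the original pixels are never written to and every render replays the recipe from scratch.

```python
import numpy as np

def exposure(img, stops):
    """Scale linear pixel values by 2**stops."""
    return img * 2.0 ** stops

def white_balance(img, gains):
    """Multiply the R, G, B channels by per-channel gains."""
    return img * np.asarray(gains, dtype=float)

OPS = {"exposure": exposure, "white_balance": white_balance}

def render(original, recipe):
    """Apply a list of (op_name, params) to a copy of the original."""
    out = np.array(original, dtype=float)  # copy: the original is untouched
    for name, params in recipe:
        out = OPS[name](out, **params)
    return np.clip(out, 0.0, 1.0)

recipe = [("white_balance", {"gains": [2.0, 1.0, 1.0]}),
          ("exposure", {"stops": 1.0})]
```

Undoing an edit is just rendering with a shorter recipe, for example `render(original, recipe[:-1])`, and applying one look to a whole shoot is rendering many originals with the same recipe.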

Pixel editors — the Photoshop style. Adobe Photoshop (and GIMP, Affinity Photo) work the other way: they manipulate the pixels directly, with layers, selections, and masks for compositing, and a deep toolbox for retouching — the clone stamp and healing brush to remove blemishes and distractions, content-aware fill to erase objects, dodge / burn tools (there they are again), curves and levels as adjustment layers, filters, text, and montage. This is where you do pixel-level surgery, combine several frames into one, and composite — the things a parametric developer is not built for.
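At its core, a layer with a mask is a per-pixel blend. The sketch below shows only the basic "normal" blend of one layer over another, under the assumption of float RGB images in [0, 1] and a single-channel mask; real editors add blend modes, color management, and much more.

```python
import numpy as np

def composite(bottom, top, mask):
    """Blend `top` over `bottom`: the mask is 1 where the top layer
    shows, 0 where the bottom shows, and in between for soft edges."""
    m = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)[..., None]
    return m * np.asarray(top, dtype=float) + (1.0 - m) * np.asarray(bottom, dtype=float)
```

Unlike the parametric recipe, the result here is a new set of pixels: once the layers are flattened, the inputs cannot be recovered from the output.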

In practice most serious workflows use both: develop and cull in a Lightroom-style tool, then hand the few keepers to Photoshop for heavy retouching or compositing. The dividing line is global, reversible, many-photos versus local, pixel-level, one-photo.

And here is the bridge into the rest of this book: every one of those tools is an algorithm, and we are going to study how each one works. The exposure and contrast sliders are point operations and tone curves (BASIC → Point operations, Tone mapping); the histogram you push for ETTR gets its own chapter (Histograms); white balance is color constancy (Color technology); sharpening, blur, and clarity are convolution and edge-preserving filtering (Neighborhood operations; Bilateral filtering); noise reduction is denoising; raw development is demosaicking; crop and straighten are resampling; content-aware fill and seamless compositing are Poisson editing and seam optimization (EDGES MATTER); and the newest "generative fill" and AI masking are diffusion models and learned segmentation (COMPUTATIONAL TOOLS). The slider is the interface; the rest of the book is the machinery underneath it.

Figure 2.10.26. (Deferred — to be generated.) The two digital darkrooms, annotated with where each tool is explained in this book. (Left) a Lightroom-style develop panel — white balance, exposure/contrast, highlights/shadows, tone curve, HSL, clarity/dehaze, sharpening/noise reduction, lens corrections, local masks — labelled with the chapters that derive them (point operations, tone mapping, color constancy, denoising). (Right) a Photoshop-style workspace — layers, masks, clone/heal, content-aware fill, curves — labelled with theirs (convolution, Poisson/seam, diffusion). Parametric & non-destructive vs pixel-level & compositing.

This chapter has walked photography end to end — exposure, modes, focus, lenses and their filters, keeping the glass clean, stabilization, the viewfinder, video, programmability, the two anatomies that book-end the field (a full-frame body where good optics do the work, and a phone where computation does), the wider taxonomy of camera types and the many cameras that measure rather than photograph, and finally the post-capture darkroom, chemical and then digital. That last step is the hinge into the rest of the book: the phone is the machine the book is really about, and editing software is where its output is finished — and everything from the develop slider to content-aware fill is an algorithm we are about to study. The next parts build the pipeline — image representation, the ISP, multi-frame fusion, and the editing operations themselves — that lets a fingernail-sized sensor punch far above its physics.

Big lessons of this chapter

The recurring principles from this chapter, gathered for review.

💡 Big lesson (L7) — photographers measure light in stops

Photographers measure light in stops, a log₂ (doubling) scale: one stop is a factor of $2$ in light, and $\text{stops} = \log_2(\text{light ratio})$. Aperture (f-stops), shutter, ISO, and dynamic range are all counted in stops, so exposure is additive in log space: adding one stop here and subtracting one there keeps the total fixed, which is exactly what makes the three controls of the exposure triangle interchangeable. It is the same "ratios matter, work in log" principle we meet again in gamma encoding and tone mapping.
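The additivity is easy to verify numerically. The sketch below combines the chapter's relation, exposure proportional to $t/N^2$, with ISO treated as a plain gain relative to ISO 100; the function names are illustrative and the result is defined only up to an additive constant.

```python
import math

def stops_between(light_a, light_b):
    """How many stops brighter light_b is than light_a."""
    return math.log2(light_b / light_a)

def brightness_stops(shutter_s, f_number, iso=100):
    """Image brightness in stops, up to a constant:
    log2(t) - 2*log2(N) + log2(ISO / 100)."""
    return math.log2(shutter_s) - 2 * math.log2(f_number) + math.log2(iso / 100)

a = brightness_stops(1 / 100, 4.0, 100)
b = brightness_stops(1 / 400, 2.0, 100)  # 2 stops less time, 2 stops more aperture
c = brightness_stops(1 / 200, 4.0, 200)  # 1 stop less time, 1 stop more ISO
```

All three settings land on the same brightness, which is the exposure triangle's interchangeability expressed as a sum that stays fixed.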