Computational Photography, an AI-powered Slopendium

▸Part 2 FUNDAMENTALS OF PHOTOGRAPHY⧉

`fig-light-journey` · FUNDAMENTALS part opener — the journey of light: source → scene (reflection) → lens → sensor / retina → visual system, each step labelled with its chapter ✅

**Photography means writing with light** — from the Greek *phōs* ("light") and *graphē* ("drawing"). This first part follows light on its entire journey and studies each step in turn.

• **the path of light** (and the chapters that follow it): a **source** emits light → it strikes **objects** and is reflected, absorbed, and scattered (the scene) → it passes through a **lens** → it lands on a camera **sensor** or the **retina**'s cones and rods → and the **visual system** interprets it.

• **Light and physics** — what light *is* (wave / ray / photon), how it interacts with matter (reflection, refraction, scattering, the BRDF), and how we *measure* it (radiometry: radiance, irradiance, the inverse-square and cosine laws).

• **Image formation and linear perspective** — how a lens turns 3-D rays into a 2-D image (pinhole, perspective projection), depth of field, the **sensor** and the exposure triangle, and **noise**.

• **Human perception and color** — the eye and visual system; why three **cones** make us trichromatic; color as the projection of a spectrum onto three numbers; opponent processing, constancy, and the contrast sensitivity function.

• **Color technology** — the engineering payoff: **measuring** color (CIE), **encoding** it (linear / gamma / log, color spaces, CIELAB), and **reproducing** it (additive vs subtractive synthesis, gamuts, white balance, color management).

• **Human factors and the art of photography** — making *good* pictures (composition, light, portraiture), the perception of images, and the **ethics** of photography.

• **skippable, but foundational**: a reader who only wants to code can jump straight to BASIC IMAGE PROCESSING — but the **big lessons** of this part recur everywhere later. The ones to carry forward:

• **multiplicative vs additive** — much of imaging is *multiplicative* (illumination × albedo, contrast), some *additive* (incoherent light from different sources, blur); the right **encoding** (linear / log / gamma) follows from which one you are doing (→ [[Big Lessons]]).

• **gamma / ratios matter** — perception is roughly logarithmic, so we store images **gamma-encoded** to spend code levels where the eye looks; what matters is *ratios*, not absolute values.

• **noise is affine** — sensor noise is **read + shot** (a constant plus a term that grows with the signal); because shot noise is **Poisson**, SNR is worst in the **shadows**, which is exactly where noise *looks* worst.

• **projective geometry is matrices + a division** — perspective projection is a **linear map in homogeneous coordinates** followed by a divide-by-depth; that single trick handles cameras, warps, and stereo.

• **color is non-orthogonal and non-negative** — the cone "axes" overlap and light cannot be negative, which is what makes color sensing and reproduction genuinely hard (no perfect set of primaries; white balance is never exact).

• **by the end of this part** you can trace a photon from a lamp to a perceived color, reason quantitatively about exposure and noise, and you will carry the small set of principles the rest of the book keeps reusing.

▸2.1 Light and physics⧉

• chapter intro: a working model of light — its **color** (spectrum), its **interaction with surfaces**, and its **energy** (radiometry); closes with the **plenoptic function** that ties the pipeline together

▸2.1.1 Light: rays, waves, and the spectrum⧉

`fig-light-models` · three models of light: wave (λ) vs ray vs particle/photon propagation, and what each is good for 🟨

`fig-newton-prism` · Newton's prism experiment: white light dispersed into a spectrum (classic diagram) 🟨

`fig-em-spectrum` · EM spectrum / wavelength map 🟨

• three models: **ray** (geometry), **wave** (color, diffraction), **particle/photon** (counting → noise) — use whichever fits the question [wave/ray/particle figure]

• side bar — **wave–particle duality, made visible with a liquid analog**: light is *both* wave and particle, which is famously hard to picture. A startling **hydrodynamic analog** makes the marriage tangible: a tiny oil droplet **bouncing** on a vibrating fluid bath becomes a **"walker,"** carried across the surface by its own **pilot wave**, and these walkers reproduce quantum-like behaviour (single- and double-slit statistics, quantized orbits, tunnelling). They are *not* a theory of quantum mechanics, but they are a gorgeous classical picture of a particle that is inseparable from its wave. [embed video — Couder & Fort and **John Bush's** (MIT) walking-droplet experiments → [[Video Resources]]] (revisited at diffraction, below)

• wavelength, EM map, spectra

• **Newton's prism experiment** (the historical hook): a prism splits white light into the **spectrum** — white light is a *mixture* of wavelengths, and the *experimentum crucis* (a second prism) shows those pure colors don't split further. [classic diagram]

• a spectrum = power vs wavelength; like an **infinity of channels**, of which a camera (and the eye) keep only three → **metamerism** (Color technology)

equations

c = λν (wavelength × frequency)

▸2.1.2 Reflection, refraction, and what happens at a surface⧉

`fig-reflection-refraction-diffraction` · reflection (equal angles) · refraction (Snell, two indices) · diffraction 🟨

`fig-polarization` · unpolarized (random orientations) vs polarized light 🟨

`fig-atmospheric-scattering` · Rayleigh scattering: blue sky (short path) vs red sunset (long path) 🟨

• a camera sees nothing that hasn't bounced off something

• **reflection** (angle out = angle in); **refraction** (light slows in glass/water → *refractive index*; **Snell's law** n₁ sinθ₁ = n₂ sinθ₂); **diffraction** (→ Wave effects section below)

• marching-band (wavefront) intuition; fuller lens geometry deferred to Single lenses

• **interactive — a wave simulation of refraction** (`fig-refraction-wave-sim`): the same 2-D FDTD wave solver as the diffraction figure, but with a slower medium (higher index) filling the lower half of the box. An angled plane wave crosses the interface and its wavefronts **bend toward the normal** and **crowd together** (λ→λ/n) — the wave speed drops, so Snell's law becomes a *consequence* of the wave picture (the wave-level version of the marching-band intuition and Fermat's equal-path view in [[#Single lenses: refraction and Snell's law]]). Sliders for the index n₂ and the incidence angle, with an on-figure Snell check. Companion to `fig-diffraction-wave-sim`.

• **light–matter interaction is largely *resonance***: much of what light does to matter — the **index of refraction** itself, and **absorption** (hence material color) — comes from the **resonances** of atoms, molecules, and bound electrons. Light's oscillating field drives those bound charges, which resonate at characteristic frequencies (the **Lorentz-oscillator** picture); **near a resonance** the material **absorbs** strongly, and the **index of refraction varies with wavelength** (**dispersion**) according to how far each wavelength sits from the nearest resonance. This is why different materials transmit/absorb different colors, and why $n(\lambda)$ gives **prisms, rainbows, and chromatic aberration**. Flag forward: dispersion **shapes much of lens design** — achromatic/apochromatic doublets (crown+flint), the fight against chromatic aberration in fast lenses → [[Optics, lenses, and aberration correction]]. **Interactive** `fig-lorentz-resonance`: a driven bound electron (mass on a spring) with the absorption peak at ω₀ and the n(ω) normal→anomalous dispersion curve; sliders for driving frequency and damping.

• side bar: **polarization** — light is a *transverse* wave (field ⊥ propagation); **polarization** is the **orientation of that oscillation**. Natural light is **unpolarized** (orientation a rapidly varying mix, *no single direction*); reflection partly polarizes it → a polarizing filter cuts reflections / deepens the sky. Two everyday encounters: (a) **good sunglasses are polarizers** that selectively kill glare — *why* lives in the reflection/BRDF material (specular reflection off dielectrics is partially polarized, maximally at **Brewster's angle**) → [[#The BRDF and the look of materials]], the polarizer filter in [[#Lens filters: polarizers, ND, and graduated ND]], and reflection / specular removal in [[Single-image computational photography]]; (b) **LCD displays emit polarized light** — the liquid-crystal shutter forms an image by **rotating polarization between two crossed polarizers**, which is why a polarizer (or polarized sunglasses) makes some screens go dark at certain angles.

• **why some screens go dark through sunglasses and some don't.** Polarized sunglasses pass only **vertically** polarized light (so they can block the horizontally-polarized glare off roads and water). An LCD emits **linearly polarized** light fixed by its **exit polarizer**; whether the screen survives your sunglasses depends on *that polarizer's orientation*: if it emits **vertical** light it passes straight through, if **horizontal** the glasses **black it out**, and at **45°** it dims. Manufacturers choose the axis — laptops and monitors are usually safe in landscape, but many phones (and especially **car dashboards / GPS**, ATMs, and aircraft seat-back screens) are built with a diagonal or horizontal axis, so they wash out or go black when you tilt your head or rotate the phone 90°. The honest fix is to put a **quarter-wave retarder** *outside* the exit polarizer so the emitted light is **circularly** (not linearly) polarized — then it looks the same through any linear polarizer at any rotation; phones that stay bright through sunglasses generally do this.

• side bar: **why the sky is blue and the sunset orange** — **Rayleigh scattering**: air molecules scatter short (blue) wavelengths far more than long (red) ones (∝ 1/λ⁴). At midday the sunlight takes a *short* path through the atmosphere and the blue it sheds scatters into the sky in every direction → the sky reads **blue**; at sunset the light skims a *long* path, its blue is scattered away before it reaches us, and mostly **orange–red** survives → the sun reddens. (Larger droplets scatter all wavelengths about equally — **Mie** scattering → white haze and clouds.) [atmospheric scattering figure] [Minnaert]

• side bar: **rainbows** (spoiler: it's a caustic) — and a desktop demo. A **single spherical raindrop** bends sunlight three times: **refract** in at the front, **reflect** off the back, **refract** out again; because $n$ depends on wavelength (**dispersion**, above) each color exits at a slightly different angle. Across a whole sky of drops those exit rays **pile up** into a bright **caustic** at the **rainbow angle** — about **42° (red)** down to **~40° (violet)** measured from the **antisolar point** (the shadow of your head) — and *that* accumulation is the rainbow. A **glass sphere** reproduces it on a tabletop: it **stands in for a giant water droplet**, and a bright lamp throws its dispersed caustic onto a card (`fig-rainbow-glass-sphere`).

• **the color order, and the second bow**: the **primary** bow has **one** internal reflection — **red on the outside, violet inside**. A fainter **secondary** bow (~51°) comes from **two** internal reflections, with the colors **reversed**; between them sits **Alexander's dark band** (named for Alexander of Aphrodisias, ~200 AD), darker because one-reflection light emerges only at **≤42°** and two-reflection light only at **≥51°**, so the 42–51° gap gets almost no rainbow light (and the sky *inside* the primary is correspondingly brighter). Same geometry, one extra bounce. Real example: a **double bow over Loch Ness** (`fig-rainbow-loch-ness`, photo) showing both bows and the dark band.

• **dispersion alone is NOT a rainbow**: a prism only *fans* colours; the rainbow's brightness and fixed angle are a **caustic**. Vary the impact parameter and the deviation $D(b)$ of the **one-internal-reflection** ray bottoms out at a **minimum** (~138° → bow at 42°) where rays pile up. Crucially, a ray with **only two refractions and no internal reflection makes no bow** — its $D(b)$ is **monotonic**, no minimum, no pile-up. Dispersion only splits the already-bunched light into the band. Interactive Descartes diagram `fig-rainbow-droplet` (water drop; drag the impact parameter, choose the number of internal reflections, and tick **"zoom near the caustic"** to re-range both the slider and the plot tightly around the bow; the deviation-vs-$b$ plot marks the minimum, and a side panel plots the **ray density** — an integral over impact parameter — whose **spike** at the stationary deviation *is* the bow. Dispersion is always on, so three colour bands separate near the caustic).

• **it's an angle, not an object**: the bow is centered on the antisolar point, so it can't be reached and *everyone sees their own* — it is a **caustic of accumulated rays at a particular angle**, not a thing sitting in space (the same "it's a direction" lesson as vanishing points). [Minnaert] [`fig-rainbow-glass-sphere`, real demo photo; `fig-rainbow-droplet`, interactive]

• **scattering & absorption** in media (fog, haze, underwater): single vs. multiple scattering → de-weathering / dehazing; the same physics as the blue sky, at close range

▸2.1.3 The color of objects: illumination times reflectance⧉

`fig-reflectance-illumination-spectra` · example reflectance & illumination spectra 🟨

`fig-illum-times-reflectance` · light = illumination × reflectance (per-wavelength) ✅

• a **red apple is not red in the dark**: color = **illumination × reflectance**

• reflectance ρ(λ) is the surface's intrinsic, fixed signature; illumination S(λ) is whatever light is present; their per-wavelength product C(λ) reaches the eye

• this multiplicative entanglement is exactly what **white balance** must later undo

• scope caveat: the clean product assumes direction-independent reflectance (matte case); the general directional case is the **BRDF**, next

• side bar: **albedo can appear > 100% (fluorescence cheats)** — real (non-fluorescent) reflectance is **≤ 1 per wavelength** (a surface can't return more light than it receives at that wavelength). But a **fluorescent** material **absorbs** short-wavelength (often **ultraviolet**) light and **re-emits** it at **longer** wavelengths, so in the re-emitted band the measured "reflectance" **exceeds 1** — apparent albedo tops 100%. Examples: **optical brighteners** in laundry detergent and paper ("whiter than white"), **highlighter** ink, **fluorescent safety vests**. The trick is **moving energy between wavelengths**, not creating it.

equations

reflected color C(λ) = S(λ)·ρ(λ) (illuminant spectrum S × spectral reflectance ρ, per wavelength)

▸2.1.4 The BRDF and the look of materials⧉

`fig-brdf` · light bouncing off a surface — the BRDF (incoming/outgoing dirs, normal, reflection lobe, θ angles) 🟨

`fig-diffuse-specular-spheres` · Lambertian (matte) sphere vs shiny (specular) sphere — same shape/light, different BRDF 🟨

`fig-global-illumination` · color bleeding: a red wall casts red onto a white floor (multiple bounces) 🟨

• the **BRDF** (bidirectional reflectance distribution function): a surface re-emits incoming light into a *distribution* of outgoing directions — radiance toward q = f_r(p→q; n, λ) · incoming; *bidirectional* = depends on both the incoming and outgoing directions. [BRDF diagram]

• **diffuse (Lambertian) vs specular (shiny)**: a matte surface scatters into a broad lobe and looks equally bright from any view (the light it *receives* falls off as cos θ); a shiny one concentrates near the mirror direction, giving a view-dependent highlight; most materials are a blend. [Lambertian sphere vs shiny sphere]

• **side bar — where a material's colour actually comes from.** "Colour" is not one mechanism; the same red can be made several physically different ways, and they behave differently (some shift with viewing angle, some don't), which is why this belongs with the BRDF:

• **absorption (pigments & dyes)** — the everyday case: molecules/electrons absorb certain wavelengths (the **resonance** picture from *Light and physics*) and what's left is the colour. Paint, ink, skin, leaves (chlorophyll), most natural surfaces. **Angle-independent** and the basis of the *illumination × reflectance* model above.

• **structural colour (interference & diffraction)** — colour from **micro-geometry**, not pigment: **thin-film interference** (soap bubbles, oil slicks, anodized metal, the blue of a Morpho butterfly, peacock feathers), **diffraction gratings** (CD/DVD, holographic foil, some beetles), and **photonic crystals** (opal). The hallmark is **iridescence** — the hue **shifts with viewing/lighting angle** — because it comes from path-length differences, so it is strongly **directional** and a genuinely *spectral* BRDF effect, not a single albedo.

• **scattering** — wavelength-dependent **Rayleigh/Mie/Tyndall** scattering makes colour without any pigment: the **blue sky** and blue eyes, the **Tyndall blue** of some bird feathers and milk, the red of a sunset (cross-ref the Rayleigh sidebar in *Reflection, refraction*).

• **metallic reflection** — in metals the **free electrons** set the reflectance; most metals are grey/white (aluminium, silver), but **gold** and **copper** absorb in the blue and so reflect warm — a coloured *specular* with a tinted highlight (unlike dielectrics, whose highlight stays the light's colour). Matters for shading models (metal vs dielectric is the key axis in PBR).

• **emission & fluorescence** — the surface **makes** light: incandescence/LEDs (emit), and **fluorescence/phosphorescence** which absorb short λ and re-emit longer (the albedo->100% and "fluorescence breaks the BRDF" sidebars).

• takeaway: pigment colour is a per-wavelength **albedo** (the simple model), but interference, scattering, and metallic colour are **angle- and spectrum-dependent**, which is precisely why a faithful material model is a *spectral, directional* BRDF rather than a single RGB reflectance. [PBR: Pharr, Jakob & Humphreys]

• **light–matter interaction also polarizes the light**: reflection (especially the **specular** component) partially **polarizes** what bounces off a surface — maximally near **Brewster's angle** — so a full account of the BRDF is *polarization-dependent* (a **polarization BRDF / pBRDF**). This is exactly what a **polarizing filter** exploits to cut glare and reflections (→ Photography 101 — Lens filters) and what **shape-from-polarization** and reflection-removal methods use (→ Advanced). [cross-ref the polarization sidebar in *Reflection*]

• **side bar — why good sunglasses are polarizers.** Glare off roads, water, snow, and car hoods is light that reflected off a roughly **horizontal** surface, so (near Brewster's angle) it is strongly **horizontally polarized**. Quality sunglasses are **linear polarizers oriented to pass vertical and block horizontal** light, so they kill that glare specifically — not just dim everything like a plain neutral-density tint. That's the difference between "good" (polarized) and cheap (just dark) sunglasses: you see *into* the water and the wet road instead of at the sheen. The same physics is the camera's circular/linear polarizer, and the same selective extinction is shape-from-polarization's signal. (One catch: polarized glasses can make some **LCD screens / phone displays** go dark at certain angles, because those screens emit polarized light.)

• **try it yourself (how to test your sunglasses).** Hold the glasses up to a glary reflection — a car hood, a puddle, a window, a lake — and **rotate them ~90°**. If the glare **darkens in one orientation and comes back in the other**, they are genuine **polarizers**; if nothing changes, they are just tinted. (That is exactly what `fig-polarizer-sunglasses` shows. A second quick test: look at an LCD screen and tilt your head — polarized lenses go dark at some angles.)

• hint of **global illumination** (light bounces many times between surfaces) — and it is not just a rendering nicety, it is something the photographer actively **exploits** and occasionally **fights**:

• **handy — reflectors and bounce flash**: because light keeps bouncing, you can shape it by adding bounce surfaces. A **reflector** (a white/silver/gold panel) catches the key light and **bounces fill** into the shadows; a **bounce flash** fires at a ceiling or wall so the *whole room* becomes a big, soft secondary source instead of a hard on-axis point. Both are global illumination put to work — softening contrast and wrapping light around the subject (→ Photography 101 — fill & bounce flash; [[#Human factors and the art of photography]]; computational bounce flash in Computational illumination). [photo: someone holding a reflector to bounce fill onto a subject — `fig-reflector-fill`, **shoot our own** unless a CC image turns up]

• **a challenge — unwanted bounced color**: the same bounced light can **hurt** you, because it carries the *color* of whatever it bounced off (color bleeding — the very thing the `fig-global-illumination` figure shows). The classic case is shooting at a **pool**: the strong **blue** light bouncing up off the water casts a cold tint on faces that makes decent **skin tones** genuinely hard to hold (a green lawn or a red wall does the same in its own hue). Knowing the bounce is colored is half the fix — block it, add neutral fill, or correct it later.

• **side bar — fluorescence breaks the BRDF.** A BRDF is **per-wavelength**: same wavelength in, same wavelength out, just redistributed in direction. **Fluorescence** violates this — it **absorbs short-λ (often UV) light and re-emits at *longer* wavelengths** (the **Stokes shift**), moving energy *between* wavelengths. So a fluorescent material is not described by a scalar BRDF per λ but by a **reradiation (bispectral) matrix** — input wavelength → output wavelength. (This is also the cheat behind apparent albedo > 100% — cross-ref, don't re-derive, the albedo sidebar in [[#The color of objects: illumination times reflectance]].) Everywhere once you look: **optical brighteners**, **highlighter** ink, **minerals under UV**, **GFP** in bio-imaging, **banknote security inks** — and it is the working signal in fluorescence **microscopy** and **forensics**.

• **side bar — subsurface scattering (why skin isn't plastic).** A BRDF assumes light leaves at the *same point* it entered. In **translucent** materials light instead **enters, scatters around beneath the surface, and exits somewhere else**, so a point-wise BRDF is **insufficient** — you need the **BSSRDF** (bidirectional *surface-scattering* RDF), which couples an entry point to a separate exit point. Examples: **skin**, **marble**, **milk**, **wax**, **leaves**. This is what gives skin its soft, "lit-from-within" glow and rounded shadow terminator; render skin with a pure BRDF and it looks **plastic**. (Cross-ref translucency / global illumination in rendering.)

equations

BRDF — outgoing radiance toward q = f_r(p→q

n, λ) · incoming

▸2.1.5 Illuminants⧉

`fig-illuminant-spectra` · blackbody / illuminant spectra (D65, tungsten, fluorescent) 🟨

`fig-color-temperature` · the (inverted) color-temperature scale: warm (low K) → cool (high K) 🟨

• 'white' light is not one thing — the same scene at noon vs sunset shifts color

• **blackbody** & **color temperature** (low K warm/orange, high K cool/blue); **D65**, **tungsten** (warm), **fluorescent** (spiky → poor color rendition)

• insist again: the stimulus is the **spectral product** (illuminant × reflectance) → motivates white balance

equations

blackbody spectrum set by temperature → color temperature (K)

▸2.1.6 Radiometry: radiance, irradiance, exposure, and falloff⧉

`fig-radiance-irradiance` · radiance (per direction) vs irradiance (per area) 🟨

`fig-cosine-irradiance` · why irradiance carries a cosine: a beam of width w vs its surface footprint L = w/cosθ 🟨

`fig-seasons` · sidebar — seasons: N. America in summer vs winter (sun angle → energy per area, cosine law) 🟨

`fig-inverse-square-vs-radiance` · point source 1/r² vs distance-invariant surface radiance 🟨

• **radiance** L: power per unit area per unit solid angle carried *along a ray* — ultimately what a pixel measures; **conserved along a ray** in vacuum (a wall doesn't get dimmer as you step back, only smaller)

• **irradiance** E: power per unit area *arriving on a surface* (the sensor) — the integral of incoming radiance over the collecting solid angle, which grows with **aperture area**

• the **cos θ** in E = ∫ L cos θ dω is **projected area**: a beam (cylinder of light) of width w striking a surface at angle θ spreads over a larger footprint w/cos θ, so the same power is thinned → E ∝ cos θ (Lambert's cosine law). [cosine / projected-area figure]

• **sidebar — seasons**: the same cosine explains seasons — the Earth's 23.5° tilt makes summer sun strike North America near-vertically (concentrated) while winter sun arrives at a grazing angle (spread out, ~half the energy per area). [seasons figure]

• **units — radiometric vs photometric** (worth pinning down): the quantities above are **radiometric** (raw energy, in **watts**). Weight them by the eye's spectral sensitivity — the **luminous-efficiency curve $V(\lambda)$** — and you get the **photometric** quantities the photographic world actually quotes:

• radiant flux **W** ↔ luminous flux **lumen (lm)**

• irradiance **W/m²** ↔ **illuminance lux** (lm/m²) — light *arriving on* a surface (an incident-meter reading)

• radiance **W/m²/sr** ↔ **luminance cd/m²** (the "**nit**") — light *along a ray* / leaving a surface; this is the perceptual "brightness" of a patch, and what displays, HDR, and the photopic/scotopic scale (Perception) are all quoted in

• intensity **W/sr** ↔ luminous intensity **candela (cd)** — the SI **base** unit (≈ one candle)

• so **cd/m² = lm/(m²·sr)**; for scale, a monitor is ~100–300 cd/m², an HDR display ~1000+, paper in sunlight ~10⁴, and the photopic/scotopic ranges (Perception) live on this same axis

• **exposure** H = irradiance × time → ties straight back to aperture (D/f)² and shutter t (Exposure settings)

• **falloff**: a *point* source dims as **1/r²** (inverse-square), but scene **radiance is distance-invariant** — contrast the two carefully, it confuses people

• this radiometry underlies **HDR** (recovering scene radiance) and **computational illumination**

• (the off-axis **cos⁴** edge falloff / natural vignetting is deferred to **Sensors**, where the pixel irradiance integral and a finite aperture make it meaningful)

equations

radiance L (power per area per solid angle, **conserved along a ray** in vacuum)

irradiance E = ∫ L cos θ dω (power per area on the sensor)

exposure H = E·t

inverse-square E ∝ 1/r² (point source)

**photometric** counterparts via the luminous-efficiency curve V(λ): lumen, lux, candela, **cd/m² (nit)**

▸2.1.7 Wave effects, diffraction, and the diffraction limit⧉

`fig-diffraction-aperture-size` · two apertures: a smaller one diffracts more (spread θ ≈ λ/D) 🟨

`fig-airy-disk` · Airy disk = diffraction-limited PSF 🟨

`fig-aperture-sharpness-sweet-spot` · aberration vs diffraction → sharpness sweet spot 🟨

• light is a **wave**: at edges and small apertures it **diffracts** — bends and spreads instead of travelling in straight rays

• consequence for imaging: a **finite aperture cannot focus to a perfect point**; even an ideal lens images a point as a small blur, the **Airy disk**, whose size grows with wavelength × f-number

• the **diffraction limit**: the finest resolvable detail is set by wavelength and aperture (Rayleigh ≈ 1.22 λ/D); a **larger** aperture diffracts **less**

• so **stopping down too far softens** the image — aberrations dominate wide open, diffraction dominates stopped down → a sharpness **sweet spot** (developed with MTF in the Optics part)

• **coherent vs incoherent**: we mostly assume **incoherent** imaging (intensities add, stay positive, linear); coherent / wave optics returns for advanced optics (metalenses, lensless, holography)

• **a live demo — the wave equation, solved**: diffraction through a slit is shown as an actual 2-D wave simulation (FDTD), wavefronts bending and spreading past the opening, with the time-averaged intensity pattern alongside — and a companion widget where a *smaller aperture visibly spreads more* (θ ≈ λ/D). [`fig-diffraction-wave-sim` (animated), `fig-diffraction-aperture-size`]

• **the audio parallel** (use it — sound is the everyday 1-D wave): every wave effect here has a **sound** counterpart, and most people have *heard* them. Diffraction — **bass bends around a doorway** (long λ) while treble stays directional, exactly as a small aperture spreads light; plus **interference** and **beats**, **standing waves** in a pipe or room, and the **Doppler** shift. The governing math is shared (a wave equation, Fourier content, λ/D spreading), so acoustics is a reliable intuition pump for the optics — and the same FDTD solver in `fig-diffraction-wave-sim` *is* an acoustic simulation with light's numbers.

• side bar — **diffraction and duality, revisited with walkers**: diffraction is the *wave* story (a slit spreads light by θ ≈ λ/D); the *photon* story is that **individual** photons still arrive one at a time, yet pile up into the same fringe pattern. The **walking-droplet** analog (see [[#Light: rays, waves, and the spectrum]]) dramatizes the marriage: a single bouncing drop passing a slit is steered by its own **pilot wave** into a **diffraction-like** distribution — so you can literally *watch* a "particle" guided by a wave. [reuse the Couder/Bush walker video → [[Video Resources]]]

equations

Airy-disk angular radius θ ≈ 1.22 λ/D

blur-spot diameter ≈ 2.44 λ·N (N = f-number)

Rayleigh resolution criterion

▸2.2 Pinhole image formation and linear perspective⧉

`fig-camera-obscura` · camera obscura / pinhole (artist's) 🟨

`fig-camera-obscura-apparatus` · camera obscura apparatus (Diderot engraving) 🟨

`fig-bare-sensor-averaging` · a bare sensor averages all rays 🟨

`fig-pinhole-fov` · pinhole geometry: real (inverted) vs virtual (upright) image plane, and focal length → field of view 🟨

`fig-perspective-projection` · perspective projection equation: x'=f·X/Z, y'=f·Y/Z (divide by depth, scale by f) 🟨

`fig-perspective-projection-3d` · perspective projection in 3D: P=(X,Y,Z) → p=(x,y) on the image plane through the pinhole 🟨

`fig-perspective-vanishing-points` · perspective projection + vanishing points 🟨

⬜ figure not yet created

focal length vs subject distance (background compression) fig-focal-length-compression

`fig-face-distortion-sim` · live face-distortion simulator (web edition): focal length + magnification + eccentricity controls with a full-frame view and a face-cropped view, showing the close-wide "big nose" perspective and the wide-angle edge stretch at constant face size; subject = a face, a grid sphere, or one of six diverse Meshy humans framed on the face; static fallback is a screenshot. *Human 3D models generated with Meshy AI.* ✅

`fig-dolly-zoom-sim` · live dolly-zoom simulator (web edition): hold the subject size while focal length and camera distance trade off so only perspective / background scale changes; log focal (12–200 mm) + distance sliders, realistic Poly Haven environments, subject = head bust or one of six diverse Meshy humans. *Human 3D models generated with Meshy AI.* ✅

`fig-portrait-lighting-sim` · live portrait-lighting simulator (web edition): three soft area lights (key/fill/kicker — az/el/extent/intensity/colour, area-light-supersampled soft shadows) on a 3D face with a physiological melanin/hemoglobin skin model; a from-behind setup view with Meshy studio umbrellas + a camera rig; lighting presets; static fallback is a screenshot. *3D umbrella generated with Meshy AI.* ✅

`fig-keystoning-cause` · keystoning is the tilt, not the lens — a camera tilted up makes world-parallel verticals converge (façade → trapezoid), while the fronto-parallel shot keeps them parallel; projection preserves lines, not parallelism 🟨

`fig-point-line-duality` · point↔line duality in homogeneous 2D: two points → a line, two lines → a point (same cross product) 🟨

`fig-projection-decomposition` · projection-matrix decomposition K·[R matplotlib, 3-stage pipeline

`fig-crop-focal-length` · cropping = changing focal length: full frame vs a crop box ≡ a longer-focal-length capture, upsampled 🟨

`fig-depth-vs-ray-length` · depth (z coordinate) vs ray length ‖P‖ from camera to a 3D point; unproject (pixel + depth → 3D) 🟨

• sensor alone would capture a diffuse light. We need to select which rays from which part of the scene contribute to which pixel.

equations

pinhole projection x = f·X/Z, y = f·Y/Z

homogeneous p ≈ K·[R|t]·P

field of view = 2·arctan(sensor / 2f)

▸2.2.1 Pinhole imaging and the perspective projection⧉

• pinhole,

• show camera obscure example (some artist)

• **geometry of pinhole imaging**: each scene point's chief ray passes through the pinhole onto the sensor. The physical sensor sits *behind* the pinhole, so the image is **inverted** (upside-down and left-right flipped).

• we instead use a **virtual image plane the same distance in front** of the pinhole — *not physical*, but it keeps the image **right-side up**, so we adopt it throughout (consistent with the camera looking down **+z** → [[Conventions, Notation & Style#Geometry & coordinate systems]]).

• the pinhole→sensor distance is the **focal length f**: a short f gives a **wide** field of view, a long f a **narrow** (telephoto) one — FOV = 2·arctan(sensor / 2f).

• **side bar — DIY pinhole.** A pinhole needs no lens, so it is the easiest real camera to build: punch a clean tiny hole in foil and tape it over the lens mount of a camera body, or over one end of a lightproof box (a shoebox / oatmeal tin) with film or paper at the other — a working **camera obscura**. Smaller holes are sharper but dimmer (diffraction sets the floor), so exposures run long.

• **side bar — the pinhole bag** (credit the idea to **Freeman, Torralba & Isola**'s online vision book, their Figure 5.5, at <https://visionbook.mit.edu/imaging.html>): pull a bag with a single small pinhole over your head, let your eyes dark-adapt, and you see the **upside-down image of the bright world outside projected on the inside of the bag** — pinhole projection made personal, and an unforgettable demo that the inverted image is real.

• exercise: make your own, do in a room

• mention animal eyes

• linear perspective when camera is at origin and along z axis

• **perspective projection equation**: a scene point (X, Y, Z) projects to screen coordinates **x' = f·X/Z** and **y' = f·Y/Z** — each screen coordinate is the world coordinate **divided by the depth Z and multiplied by the focal length f** (the "divide by z" that makes distant things small). [shown both as 2D similar-triangles and in a 3D view]

• 💡 **Big lesson — with the camera at the origin looking down +z, perspective is *mostly a division by depth*.** In that canonical frame the whole of linear (pinhole) perspective is just $x' = f\,X/Z,\ y' = f\,Y/Z$: divide each world coordinate by the depth $Z$, then scale by $f$. Everything else a camera does — sitting somewhere else, pointing some other way, landing on pixels — is bolted on *around* this one step. The **divide-by-depth is the whole of perspective**; the rest is bookkeeping. → register [[Big Lessons]]

▸2.2.2 Homogeneous coordinates⧉

• **homogeneous coordinates**: write a 2D point as (x, y, 1) up to scale, so projection, **translation**, and camera motion all become **matrix multiplies**

• **translation becomes a matrix, too.** A shift by $(t_x, t_y)$ is *not* linear on $(x, y)$, but on the homogeneous triple it is an ordinary matrix multiply: $\begin{bmatrix}1&0&t_x\\0&1&t_y\\0&0&1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix}x+t_x\\y+t_y\\1\end{bmatrix}$, and in 3D the $4\times4$ analogue $\begin{bmatrix}I_3&\mathbf{t}\\\mathbf{0}&1\end{bmatrix}$ translates $(X, Y, Z, 1)$. This is *the* reason graphics and vision live in homogeneous coordinates — translation, rotation, projection, and the perspective divide all compose into one chain of matrix multiplies. [translation-matrix in homogeneous coordinates]

▸2.2.3 Translating and rotating the camera⧉

• so far the camera has sat at the **origin looking down +z**, where perspective is the clean divide-by-depth above. But a real camera is **somewhere else, pointing some other way** — so before we can use that canonical projection we must re-express the scene **in the camera's own frame**. This is the step that turns "an arbitrary camera" into "the camera at the origin."

• 💡 **moving the camera = applying the *inverse* transform to the world.** Sliding the camera right by $d$ is indistinguishable from sliding the whole **world left** by $d$; rotating the camera one way is rotating the world the other. So we never move the camera *inside* the projection — we move the **world into the camera's frame** by the inverse rigid transform, then project with the fixed origin/+z pinhole. (This is also why "camera-to-world" and "world-to-camera" matrices are inverses — the convention trap below.)

• that world→camera step is a **rotation R then a translation t**, written as one $4\times4$ matrix in homogeneous coordinates, $\begin{bmatrix}R&\mathbf{t}\\\mathbf{0}&1\end{bmatrix}$, carrying a world point $(X, Y, Z, 1)$ into the camera frame — where the canonical divide-by-depth then applies. Translating the camera is changing $\mathbf{t}$; aiming it is changing $R$.

• 💡 **Big lesson — perspective for an *arbitrary* camera is one $4\times4$ matrix followed by a divide by the last coordinate.** Pose the camera (world→camera $4\times4$), project, and map to pixels — every stage is a matrix on the homogeneous point, so the *whole* geometric pipeline composes into **one matrix multiply**, and the only nonlinearity — the perspective divide — *is* the final "divide through by the last coordinate." The arbitrary-camera case is the origin case (divide by depth) wrapped in two more matrices; homogeneous coordinates are exactly what make that wrapping elegant. → register [[Big Lessons]]

▸2.2.4 Intrinsics, extrinsics, and what cropping really does⧉

• moving camera, matrices, all that good stuff

• **camera intrinsics K** — give them a real treatment, not a bullet: reason first in **normalized image coordinates** (project onto the z=1 plane, independent of sensor size), then map to **pixels** with a second, **affine** matrix carrying the focal lengths fx, fy and the **principal point** (the arbitrary image centre). So the full projection **decomposes** as *(pixel affine) · (project at z=1) · [R|t]* = K·[R|t]. This split is also why **cropping an image = using a longer focal length** — you change only the pixel mapping, not the projection. [projection-matrix decomposition figure; crop = focal-length figure]

• 💡 **Big lesson — cropping a photo *is* changing the focal length.** From the projection geometry, cropping touches only the last stage, the pixel mapping $K$: keep a smaller patch of the normalized image and rescale it to fill the frame — exactly what enlarging $f_x, f_y$ does. A 2× crop and a 2× zoom produce the *same* picture (the crop just has fewer pixels). Scaled up to whole sensors, this same fact is the **crop factor** that relates sensor sizes. → register [[Big Lessons]]

• **camera extrinsics [R|t]** — where intrinsics are *inside* the camera, extrinsics are the camera's **pose in the world**: a **rigid 6-DOF** world→camera transform, a rotation **R** (3×3, 3 DOF — orientation) plus a translation **t** (3 DOF — position). Together they place the world into the camera's frame *before* projection, so the **full projection is intrinsics · extrinsics = K·[R|t]** (the [R|t] moves the world into camera coordinates; K projects and maps to pixels). "Moving the camera" is just changing R and t.

• **the convention trap — cam2world vs. world2cam** (the one place to slow down): the matrix [R|t] above maps **world → camera**; its inverse maps **camera → world**. Worse, packages disagree on whether **t is the camera's position** (its center C in world coordinates) or its **negation** in the projection (t = −R·C). Mixing the two silently flips signs / mirrors a reconstruction — always pin down *which direction* a stored pose maps and *what t means* before trusting it.

• this is exactly the machinery that **multiple-view geometry** and **panorama homographies** rely on — relating two cameras is composing their extrinsics; a pure rotation between views collapses [R|t] into a **homography** (→ [[Multiple view geometry]]; Pano & homographies).

▸2.2.5 What perspective preserves, and what it destroys⧉

• vanishing points

• **what projection preserves vs destroys**: straight **lines → lines** and incidence survive; **angles and lengths do not** (parallels converge to vanishing points, a circle can become an ellipse) — the core fact behind perspective

▸2.2.6 Photography with focal length: framing, magnification, and compression⧉

• Photography, focal length, field of view, various trigomometric ideas, magnification. Difference focal length vs distance and how that changes the background

• the same focal-length-vs-distance distinction governs **portraiture**: facial proportions (and even perceived personality) are set by **camera-to-subject distance**, which is *why* portrait lenses (~85–135 mm) flatter — they let you back up (Perona 2007; full treatment in [[#Lenses and focal length]]).

• sensor size and impact on perspective. Maybe too late?

• sidebar: the (weird) meaning of **"35 mm"** = a **24×36 mm** image frame. The "35 mm" is the *width of the film stock* (including the sprocket-hole margins), not the image; Barnack ran 35 mm cine film sideways, giving a 36 mm × 24 mm frame. Today this size is called **"full frame"**, and smaller sensors are quoted by their **crop factor** relative to it (APS-C ≈ 1.5×, Micro 4/3 ≈ 2×, phone sensors much smaller).

▸2.2.7 Depth, ray length, and unprojection⧉

• **depth vs. ray length**: a depth sensor's "depth" is the **z coordinate**, *not* the ray length ‖·‖ to the point — keep them distinct; **unproject** (pixel + depth → 3D point) inverts the projection (RealSense & friends do this live). Depth sensors exist (→ ToF / structured light / stereo). [depth-vs-ray-length / unproject figure]

▸2.3 Lens image formation⧉

• chapter intro: from the **pinhole** (sharp but dim) to a **lens** (bright but with one focus plane) — refraction at curved surfaces, the thin-lens linearization, and the depth-of-field that the finite aperture forces on us

▸2.3.1 Single lenses: refraction and Snell's law⧉

`fig-single-lens-refraction` · single lens, two interfaces, Snell at each → focus 🟨

`fig-thick-lens-snell` · thick lens: Snell's law at both interfaces, with the incidence/refraction angles and indices of refraction shown 🟨

`fig-fermat-equal-path` · the Fermat / equal-optical-path view of a lens: several rays object→focus, all the same optical path (glass detour offsets the slower speed) 🟨

`fig-thick-lens-wave-sim` · live 2-D FDTD wave sim of a **thick biconvex lens** focusing: a point source's diverging wave is reshaped by the slower glass into a converging one that meets at an image point; focus tracks 1/f=(n−1)2/R and 1/v=1/f−1/u (Fundamentals → Lens image formation, Fermat/equal-path) ✅

• a real lens is a piece of glass bounded by **two curved surfaces** (two interfaces)

• light bends at each interface by **refraction** — Snell's law: n₁ sin θ₁ = n₂ sin θ₂

• shown explicitly on a **thick lens** (the two interfaces drawn far apart): the angle of incidence and the angle of refraction at *each* surface, with the indices n₁ (air) and n₂ (glass) marked. [thick-lens Snell figure]

• a curved interface bends a bundle of rays; the two surfaces of a single lens together steer parallel light toward a **focal point**

• **a second view — Fermat / equal optical path length**: a lens focuses because **every** ray from the object to the focus has the *same optical path length* — the longer geometric detour through the thick centre is exactly compensated by the slower speed (higher index) of glass there, so all paths arrive **in phase** and interfere constructively. The ray (Snell) picture and this wave (Fermat) picture are two takes on the one fact. [equal-path-length figure]

• 🖱️ **Interactive (web edition):** a live **2-D wave simulation of a thick lens** — a point source on the left radiates circular waves that cross a biconvex glass slab (slower inside, higher index n); because the glass is thickest on the axis it retards the centre of each wavefront most, turning the diverging wave into a **converging** one that collapses to an **image point**. This is the Fermat/wave picture made literal: focusing emerges from the wave equation, not from a ray rule. Sliders for the surface curvature R, the index n, and the source distance u, with the focus tracking the **lensmaker's equation** $1/f=(n-1)\,2/R$ and the **imaging equation** $1/v=1/f-1/u$ live on the figure. The lens sits in an opaque mount (aperture stop) so all light is forced through the glass; absorbing (sponge) borders, so nothing reflects off the window edges. Companion to `fig-refraction-wave-sim` (single interface) and `fig-diffraction-wave-sim`. [`fig-thick-lens-wave-sim`; interactive, FDTD]

• **lensmaker's equation** ties focal length to the two radii of curvature and the glass index: 1/f = (n−1)(1/R₁ − 1/R₂)

• this already assumes small angles (paraxial); at large angles the rays do **not** meet at one point → **aberrations** (forward ref: Optics part, where compound lenses correct them)

• eyes have optics too (cornea + lens are the two main refracting elements)

equations

Snell's law n₁ sin θ₁ = n₂ sin θ₂

lensmaker 1/f = (n−1)(1/R₁ − 1/R₂)

▸2.3.2 Thin lens optics⧉

`fig-thin-lens-rays` · thin-lens ray diagram (focus, focal point) 🟨

`fig-thin-lens-conjugates` · conjugates: object at ∞ / far / close / at focal length → where the image forms 🟨

• the **thin lens** as a *simplification / linearization* of the single lens: collapse the two interfaces into one plane, assume small angles and negligible thickness → all bending happens once, at the lens plane (forward ref: Optics part for the accurate thick / compound model)

• precisely, **paraxial = first-order optics**: replace **sin θ ≈ θ** (keep only the linear term). Keeping the *next* term — sin θ ≈ θ − θ³/6 (**third-order optics**) — is exactly what produces the **Seidel aberrations**, so "linearization" and "no aberrations" are the *same* approximation (forward ref: Optics part)

• three-ray construction (parallel→focus, through-centre, focus→parallel) locates the image

• **focus equation** 1/f = 1/u + 1/v (object distance u, image distance v); an object at infinity focuses at f; **focusing** = changing v (move the lens or the sensor) to image nearer/farther objects

• **conjugates**: each object distance pairs with one image distance — object at ∞ → image at f; far → just beyond f; close → far behind; at the focal point F → rays exit parallel (image at ∞). [conjugates figure]

• **symmetry & reversibility**: the focus equation $1/f = 1/s_o + 1/s_i$ is **symmetric** in object and image distance, so object and image are **conjugate** — **swap them and the rays retrace the same path** (the reversibility of light paths). Object and image planes are **conjugate planes**, and the magnification follows directly ($m = -s_i/s_o$, so swapping inverts it). This object↔image **reciprocity** recurs throughout optics (e.g. why a lens images either way, and why entrance/exit pupils are conjugates).

• **magnification** m = −v/u (the image is inverted)

• **lensmaker's equation** (thin-lens form) 1/f = (n−1)(1/R₁ − 1/R₂) — connects f to the glass shape (from the single-lens section)

• **f-number** N = f/D sets the aperture (→ exposure, → blur)

• (depth of field gets its **own section**, next)

equations

thin lens / focus 1/f = 1/u + 1/v

magnification m = −v/u

lensmaker 1/f = (n−1)(1/R₁ − 1/R₂)

f-number N = f/D

▸2.3.3 Depth of field⧉

`fig-circle-of-confusion` · finite aperture & circle of confusion ✅

`fig-defocus-vs-parameters` · defocus blur (CoC) vs aperture and object distance 🟨

`fig-depth-of-field` · near/far limits where blur reaches c (linearized DoF) 🟨

`fig-dof-derivation-near` · near (front) limit by similar triangles — all quantities 🟨

`fig-dof-derivation-far` · far (back) limit by similar triangles — all quantities 🟨

`fig-dof-vs-parameters` · DoF vs aperture, focal length, focus distance 🟨

`fig-hyperfocal` · hyperfocal: focus at H → sharp from H/2 to ∞ 🟨

`fig-dof-same-framing` · same framing + same f-number → same object-space DoF (short vs long focal length); background blur still looks larger with the long lens 🟨

`fig-dof-double-cone` · object-space DoF as a double cone in front of the lens, waist on the focus plane (the blur for one scene point) 🟨

`fig-dof-depth-dependent` · defocus is NOT a convolution: blur radius depends on scene depth (not pixel location), and occlusion at depth edges 🟨

`fig-dof-background-blur` · DoF focal-length invariance is only first-order: background blur c vs background distance for 35/85/200 mm (matched framing & N) coincide near the subject and fan apart far away; + c vs focal length showing growth/saturation ✅

• **defocus / circle of confusion**: a point off the focus plane images to a point that is not on the sensor, so the aperture cone meets the sensor as a blur disk — the **circle of confusion**. A point still counts as sharp while its disk stays within an acceptable c. [CoC figure]

• the blur disk grows with the **aperture** (wider → bigger disk) and with the **object's distance from the focus plane** (zero exactly at focus); focal length and focus distance enter too. [defocus-vs-parameters figure]

• **depth of field** = the *range* of object distances whose blur disk stays within c; the **near** and **far** limits are where the blur just reaches c. [DoF near/far figure — each cone carried out to the CoC at the sensor]

• **deriving the limits**: on the image side a near-limit object images *behind* the sensor and a far-limit object *in front* of it; **similar triangles** relate the aperture D, that convergence distance, and the circle of confusion c — and together with the conjugate relations (1/f = 1/s + 1/v) they fix the near (front) and far (back) object-distance limits. [near-limit and far-limit similar-triangle figures]

• **hyperfocal distance** H = f²/(N·c): focus there and everything from **H/2 to infinity** is acceptably sharp — the focus setting that maximizes DoF (focusing at ∞ instead wastes the H/2…H zone). [hyperfocal figure]

• DoF **grows** with a smaller aperture (larger N), a shorter focal length, and a greater focus distance — these are the paraxial (linearized) formulas; the accurate model is in the Optics part. [DoF-vs-parameters figure]

• **the surprising invariance** (the lecture's flagged "important conclusion"): for a **fixed framing** (same subject size) and a **fixed f-number**, the depth of field in **object space** is essentially **independent of focal length** — switching 28 mm → 100 mm forces you to step back, and the larger physical aperture cancels the longer focal length. So "long lenses have shallow DoF" is really about **magnification**, not focal length per se. (The *background* still looks more blurred with the long lens — that's magnification of the out-of-focus disk, not a DoF change.) [same-framing/same-N → same DoF figure]

• ⚠️ **but the invariance is only *first-order* — the background gives the focal length away.** "Same framing + same f-number → same DoF" is the **small-defocus (paraxial)** result: it holds for the **near/far acceptable-sharpness limits straddling the subject**, but **not** for how hard a *distant* background is thrown out of focus. **Derivation** (thin lens; blur-disk diameter on the sensor): a point at distance $s_b$, with the lens focused at $s_f$, images to a circle of confusion $c \approx \dfrac{f^2}{N}\,\dfrac{|s_b-s_f|}{s_b\,s_f}$. **Fix the framing** — subject magnification $m=f/s_f$ held constant, so $s_f=f/m$ — and **fix $N$**, with the background a gap $\Delta=s_b-s_f$ behind the subject:

$$c(f)=\frac{m^2\Delta/N}{\,1+ m\Delta/f\,}\ \xrightarrow[f\to\infty]{}\ \frac{m^2\Delta}{N}.$$

The subject-zone DoF is ~independent of $f$ (the leading $f$ cancels — the **first-order invariance**), but the **background blur grows monotonically with focal length**, saturating at $m^2\Delta/N$. Two shots matched in framing *and* aperture render the **subject** identically, yet a **200 mm melts a far background far more** than a 35 mm: the correction term is order $m\Delta/f$ — negligible for a background near the focus plane (invariance looks exact), but dominant once the background is many focus-distances away. This is exactly why portrait shooters reach for long lenses for "background separation" even though the DoF on the *face* is unchanged. **Full derivation + plots** of $c$ vs. background distance for 35 / 85 / 200 mm (fixed framing & $N$) — the curves coincide near the focus plane and fan apart far from it. [`fig-dof-background-blur`]

• **object-space picture & a caveat**: the simplest view is a **double cone** in front of the lens whose waist sits on the focus plane — it gives the blur for **one** scene point. But **defocus is *not* a spatially-invariant convolution**: the circle of confusion depends on each point's **scene depth** (not its pixel location), and at depth discontinuities **occlusion** matters — which is why naive "just blur the image" fakes look wrong, and why light-field / depth-aware methods (Advanced) do better. [object-space double cone; depth-dependent-blur + occlusion figure]

• **sensor size scales DoF**: shrink the sensor (keeping FOV) and both the circle of confusion and the aperture shrink, so DoF grows **linearly** with sensor size — why phone cameras have huge DoF (and resort to *computational* background blur), and why **macro is easier on small sensors** (a scaled-down camera photographs a scaled-down world — modulo diffraction)

• 🖱️ **Interactive (web edition):** a live **macro depth-of-field simulator** — a to-scale thin-lens diagram, a lens-supersampled 3D photo (real bokeh), and a circle-of-confusion-vs-distance plot all driven by one shared optics model (focal length, f-number, sensor, focus distance, subject–background gap). A 3D bug (beetle / bee / ladybug) over a flower makes the macro lesson vivid: focus lands on the front of the face and the rest of the body, antennae, and background all fall out of focus. [fig-depth-of-field-sim; interactive] *(bug & flower 3D models generated with Meshy AI — acknowledge in the caption)*

equations

acceptable blur = circle of confusion c

defocus blur diameter b = D·|v − v′|/v′

hyperfocal H = f²/(N·c)

near limit D_n = s(H−f)/(H+s−2f)

far limit D_f = s(H−f)/(H−s)

total DoF = D_f − D_n

▸2.4 Image measurements as integrals⧉

• chapter intro: the unifying idea behind sensing — **measurement = integrating a slice of the plenoptic function** — then the physical sensor that does it (photosites, CCD/CMOS) and the **noise** that limits it

• sidebar — **photography as objective measurement**: the medium's long arc bends toward objective measurement (film/print = a *performance*; → measurement). The **digital sensor** is the far end: because each photosite *counts photons*, a **raw** image is ≈ a **calibrated radiometric measurement** — a physical map of scene radiance, in real units, of one snapshot of time — not just a rendering. This is what makes HDR / depth / deblur possible: the pixel numbers must *mean* something physical. Ties to the intro's **generation → measurement → generation** arc (→ [[01 Intro]]).

▸2.4.1 The plenoptic function⧉

`fig-contrast-af-curve` · contrast-detect AF: high-frequency contrast (sharpness) vs focus position, peaked at best focus; hill-climb arrows that must overshoot past the peak then step back (no direction cue) ✅

fig-pixel-measurement-integral · a pixel value = integral over aperture (thin lens ellipse) + finite pixel area (+ time, wavelength); the 4 domains → DoF, antialiasing, motion blur, color; linked to the plenoptic function 🟨

• (placed here, just before **sensors**, because it pays off as the framing for what a pixel actually measures)

• the **plenoptic function** describes *all the light filling space*: the radiance of every **ray**, parameterized by its 3D position, 2D direction, wavelength, and time — a 7-dimensional function [Adelson & Bergen]

• **any** imaging operation is the **measurement of some portion** of this function; the eye and the camera each sample a slice of it

• a conventional camera records only a **2D projection**: each pixel integrates all the rays reaching it, **discarding their direction** — information that richer ray-based / multi-aperture devices (Advanced part) can recover

• a recurring unifying view: imaging = sampling the plenoptic function

equations

plenoptic function P(x, y, z, θ, φ, λ, t) — a ray's 3D position, 2D direction, wavelength, and time (7D)

**add polarization** as an 8th axis (the field's oscillation orientation) — omitted classically, but physically present and informative (sky polarization, glare/reflections, shape-from-polarization

animals & polarization cameras sample it)

▸2.4.2 The pixel integral⧉

• **the chapter's thesis in one equation — a pixel value is an integral of the [[#The plenoptic function|plenoptic function]]** against a sampling kernel: each photosite *counts photons*, so it reports the **incoming radiance integrated over four physical dimensions**, weighted by the pixel's response. Schematically,

• pixel value ≈ ∫_time ∫_λ ∫_aperture ∫_(pixel area) L(x, y, θ, φ, λ, t) · r(λ) · (geometry) dA dω dλ dt

• i.e. **measurement = integrate the plenoptic function against the pixel's sampling kernel** — and the kernel's *support* in each dimension is exactly a photographic knob. [fig-pixel-integral: thin lens as an ellipse (aperture solid angle), the sensor, one finite pixel; the integral over aperture + pixel area]

• **each integration dimension ↔ a photographic parameter ↔ a downstream effect** (every later sensing topic is "what happens when you change the support of one axis"):

• **sensor footprint (x, y area)** — the pixel integrates radiance over its **finite area** on the sensor, which sets **resolution** and makes a sample a *box-filtered* average rather than a point sample → **sampling & anti-aliasing** (a bigger footprint blurs but suppresses aliasing; pre-filter vs. point-sample lives here).

• **aperture (solid angle of admitted rays)** — the pixel integrates over the **cone of directions** the lens admits, set by the **f-number** N = f/D. Wider aperture = more light *and* a wider integration cone = a shallower **depth of field** (out-of-focus points spread their energy across many pixels → the circle of confusion). The light-vs-blur trade is this single axis. (→ [[#Depth of field]])

• **exposure time** — the pixel integrates over the **shutter interval** t, which sets exposure (H = E·t) and, when the scene or camera moves during t, produces **motion blur** (the width of the temporal kernel). Light-vs-sharpness, the temporal sibling of aperture. (→ Motion / video)

• **wavelength × spectral sensitivity** — the pixel integrates radiance against its **spectral response** r(λ), collapsing a whole spectrum to one number (exactly like a cone, → [[#Color]]). A **color-filter array** gives each photosite a different r(λ) (R/G/B), so color is *spatially multiplexed* and must be reconstructed → **demosaicking** (Bayer).

• **why this framing pays off**: the rest of the sensing story is just *which kernel, how wide, and what it loses* — antialiasing widens the spatial kernel, depth of field is the aperture kernel, motion blur is the time kernel, demosaicking inverts the wavelength multiplexing. Each is also a **source of noise**: integrating fewer photons (small footprint, short time, narrow aperture) means a noisier estimate (→ Noise, next). Capture is **linear** in radiance, which is why this integral is well-defined and why **raw** values *mean* something physical.

▸2.5 Sensors: photosites, CCD vs CMOS⧉

`fig-sensor-microlens` · sensor cross-section: microlens → filter → photosite 🟨

`fig-rolling-shutter-pan` · a slanted, sheared vertical pole from an uncorrected fast pan vs a straightened, rectified version, illustrating line-by-line CMOS readout during camera motion 🟨

`fig-slowmo-axis` · four ways to treat the time axis on a shared timeline — normal capture (sparse), high-speed (dense true samples), interpolation (sparse reals with synthesized in-betweens), and long-exposure blur (the integral); only interpolation adds resolution after capture and can be wrong 🟨

`fig-mirrorless-anatomy` · labeled cutaway of a mirrorless full-frame camera (lens+mount, sensor/IBIS, mech+electronic shutter, EVF, processor/on-sensor PDAF, card+battery); SLR mirror-box inset for contrast ✅

`fig-rolling-shutter-skew` · rolling-shutter distortion — a global shutter keeping a vertical pole upright under a fast pan vs a rolling shutter where per-row readout $t(r)=t_\text{frame}+r\,t_\text{row}$ shears the pole and smears fan blades (the jello effect) 🟨

`fig-flash-sync` · flash & focal-plane-shutter sync: whole frame ≤ X-sync · slit/partial frame above it · high-speed-sync pulse train; fill-flash noted ✅

`fig-darkroom-dodge-burn` · the darkroom ancestor: dodging (hold back light) and burning (add light) under the enlarger as hand-painted spatially-varying exposure 🟨

▸2.5.1 Photon to number: the photosite⧉

• the **digital-sensor revolution came in two steps**: first **electronic** — turning light into an *analog electronic signal* (the photosite + readout; the era of the CCD and analog electronic / video imaging) — and only **then digital** — putting an **analog-to-digital converter (ADC)** on the path so that signal becomes **numbers** we can compute on. Computational photography lives on that second step; flag the distinction up front.

• a **sensor** is a 2-D array of **photosites** (pixels), each a potential well that **counts photons** → charge → voltage → a digital number; counting photons is what makes capture ~**linear** in radiance (keep this short — the deep sensor zoo is Advanced)

• a **microlens** over each photosite gathers light onto its active area (**fill factor** < 100% because of wiring), and a **color filter** sits under it → the **Bayer** mosaic (forward-ref: Demosaicking). [sensor cross-section figure]

• the scene-radiance → pixel-value mapping is the **camera response curve**: **raw** is ~linear in radiance, while the rendered (JPEG) image applies a non-linear response (≈ gamma + a tone curve — see Linear-vs-gamma in Color technology); **recovering** that response is exactly what HDR merging needs

▸2.5.2 CCD vs CMOS⧉

• **CCD vs CMOS**: a **CCD** shuttles charge across the chip in a "bucket brigade" to **one** amplifier at the edge (clean, uniform, but slow and power-hungry, and it reads the whole frame at once → naturally **global** shutter); **CMOS** puts an amplifier (and now the ADC + logic) **at every pixel** and reads in place, **row by row** (fast, cheap, low-power — now utterly dominant). [CCD bucket-brigade vs CMOS read-in-place schematic]

• what CMOS's per-pixel, row-by-row readout **unlocks** (and costs): on-chip ADC, region-of-interest and high-frame-rate readout, on-sensor HDR and PDAF and binning — but, because rows are read at different times, an **electronic rolling shutter** with its motion artifacts (below).

▸2.5.3 Time integration: the exposure (the time axis of the pixel integral)⧉

• the pixel value is an **integral over four dimensions** (aperture × pixel area × wavelength × **time**) — the [[#Image measurements as integrals]] chapter owns the first three; **this is the time one**. Over the **exposure time** Δt the well **accumulates** photon arrivals: value ∝ ∫ irradiance dt. [photons-accumulating-in-a-well figure]

• **exposure = irradiance × time** (the other half of the exposure triangle with aperture; ISO is gain *after*). **Reciprocity**: ½ the light for 2× the time gives the same exposure — until it doesn't (sensor reciprocity holds far better than film's reciprocity *failure*).

• the time integral is exactly **motion blur**: anything that moves during Δt is averaged along its path → a long exposure smears motion (light trails, silky water), a short one freezes it. The deliberate use and the *removal* of this integral are [[Motion blur and temporal sampling]] and deblurring.

• **what ends the integral is the shutter** — which is the rest of this chapter.

▸2.5.4 Mechanical vs electronic shutter⧉

• **mechanical shutter** — a physical curtain (or leaf) opens then closes to time the exposure. A focal-plane shutter is **two curtains**: at short times the second starts closing before the first fully opens, so a **slit** sweeps the frame (already a kind of rolling exposure, in hardware). A **leaf shutter** (in the lens) opens from the centre → exposes the whole frame at once (global) and **syncs flash at any speed**.

• **electronic shutter** — no moving parts: the sensor **resets** each photosite to start its integration and **reads it out** to end it. Silent, vibration-free, and capable of very short or very high-frame-rate exposures — but on CMOS it is usually **rolling** (below). Many cameras use an **electronic first curtain** (electronic start, mechanical end) to cut shutter shock while keeping a clean finish.

▸2.5.5 Global, rolling, and readout shutter⧉

• **global shutter**: every pixel integrates over the **same** time window (all start and stop together), then the frame is read out. No motion distortion. Native to CCD and to special global-shutter CMOS; costs transistors/area per pixel.

• **rolling shutter**: rows are exposed and **read out one after another**, so each row's time window is **shifted** slightly later than the one above. Cheap and standard on CMOS — but rows see the scene at different instants. [row-vs-time diagram: global = a rectangle in (row, time); rolling = a slanted parallelogram]

• the **artifacts** (all from "different rows, different times"): **skew** (a fast pan or a moving car leans over), **wobble / "jello"** (handheld video shimmer), the **spinning-propeller / guitar-string** surreal shapes, and **partial exposure with flash** (a brief flash lights only the rows open at that instant → a **bright band**, which is why flash sync speed exists). [📺 [rolling shutter explained](https://www.youtube.com/watch?v=dNVtMmLlnoE) — Smarter Every Day; `fig-rolling-shutter` interactive: a vertical rod / spinning blade under global vs rolling readout]

• **lighting & flicker banding**: mains-powered and many **LED/PWM** lights **flicker** (100/120 Hz, or fast PWM dimming); under a rolling shutter, rows captured at different phases of that flicker get different brightness → horizontal **banding** (and colour banding if RGB phosphors lag). Fixes: match the exposure to the flicker period, **anti-flicker** modes that time the readout, or a global shutter. Ties to video [[Video stabilization and rolling-shutter correction]].

• **correcting rolling shutter** computationally — estimate the per-row time offset and the camera/scene motion, then warp each row back to a common instant → [[Video stabilization and rolling-shutter correction]].

▸2.5.6 Modern sensor tricks (a taste; the zoo is Advanced)⧉

• **off-axis (cos⁴) falloff** still applies once there's a sensor + finite aperture: rays reaching the frame edge arrive obliquely → **cos⁴θ** dimming toward the corners (**natural vignetting**), distinct from optical vignetting (Optics).

• **dual conversion gain (DCG)** — a CMOS pixel can switch the **sense-node capacitance**, changing the conversion gain (µV per electron): **high-conversion-gain** mode (small capacitance → lower read noise, better low light / high ISO) vs **low-conversion-gain** mode (large capacitance → bigger full well, more highlight headroom). Two settings → two **native ISOs** (**dual native ISO**); reading/combining both extends dynamic range → [[Noise, signal-to-noise ratio and dynamic range]].

• **quad-Bayer (Tetracell / quad CFA)** — the CFA groups **2×2 same-color photosites** under one filter tile. Good light: the sensor **remosaics** back to full-resolution Bayer; low light: it **bins** each group into one larger effective pixel (less noise, ¼ resolution). How 48/50/108 MP phone sensors get clean low-light output and in-pixel staggered-exposure HDR. Variant: **3×3 Nonacell**. → [[Demosaicking]], [[HDR merging]].

• **dual-pixel and quad-pixel autofocus (on-chip lens, OCL)** — split a photosite into **2 (dual-pixel)** or **4 (quad-pixel, 2×2 OCL)** sub-photodiodes under **one microlens**; each sub-diode sees a different half/quadrant of the lens pupil → **phase-detection autofocus (PDAF) at every pixel** (a stereo baseline inside the pixel); sub-diodes are **summed** for the final image (no light loss). Sensor structure only — the optics/AF algorithm are in [[Focus, autofocus]].

▸2.6 Noise, signal-to-noise ratio and dynamic range⧉

⬜ figure not yet created

noisy image + flat-patch histogram (ISO 3200) fig-noise-histogram

⬜ figure not yet created

noise vs ISO fig-noise-vs-iso

`fig-noise-affine` · measured noise variance is affine in brightness (σ²≈gain·I+read²) — real per-pixel variance vs mean from a 50-frame aligned ISO-3200 burst, with the affine fit, read-noise floor, and highlight roll-off (Noise, SNR, dynamic range) ✅

`fig-noise-gaussian-pixels` · a single pixel's value across many frames is ≈ Gaussian — per-pixel histograms from the ISO-3200 burst with Gaussian overlays (Noise) ✅

`fig-noise-truncation` · noise clips at black/white so it is not zero-mean at the extremes — a near-black pixel pinned at 0 ~44% of frames, its average biased bright, vs an unbiased midtone (Noise; denoising trap) ✅

`fig-snr-vs-stddev` · noise std-dev map vs SNR map of a brightness ramp: std rises with brightness (∝√N), but SNR=√N is worst in the shadows 🟨

⬜ figure not yet created

**underexposure recovery — full-frame vs phone**: the same underexposed scene pushed up several stops in software, the full-frame frame cleaning up while the phone reveals amplified shadow noise / banding [fig-underexposure-recovery-ff-vs-phone fig-underexposure-recovery-ff-vs-phone

⬜ figure not yet created

photo] fig-results-montage

`fig-dynamic-range-comparison` · dynamic-range ladder in **stops** (horizontal bars): colour slide (~5–6) · reflective print (~6–7) · film negative (~12–13) · phone sensor (~10–12) · full-frame sensor (~14) · human eye instantaneous (~10–14) vs adapted (~20+) · an animal example · a typical sun-and-shadow scene (>20) — shows why no single capture holds a high-contrast scene (→ HDR) ✅

⬜ figure not yet created

an animal or two

• the noise **sources** (their variances add):

• **photon / shot noise** — the Poisson statistics of *counting photons*: variance = mean, so σ = √N. Fundamental (it's the light itself), dominates the midtones/highlights; averaging N independent frames cuts it by √N.

• **read noise** — added by the amplifier / ADC on readout; signal-independent, dominates the **shadows** and sets the noise floor (hence dynamic range).

• **thermal / dark current** — electrons freed by heat, independent of light; grows with exposure time and temperature (long exposures & astrophotography → sensor cooling, dark-frame subtraction).

• **fixed-pattern / pattern noise** — per-pixel gain/offset non-uniformity (PRNU/DSNU), hot/stuck pixels, banding; *structured*, so it reads as worse than random noise of equal magnitude, and is removed by calibration (flat / dark fields).

• 💡 **Big lesson — in a *linear* image, noise variance is an *affine* function of brightness.** The variances add: shot noise gives a term **proportional to the signal** (Poisson, variance = mean, scaled by the gain) and read noise a **constant** floor, so the total is **variance ≈ gain·signal + read²** — a straight line in brightness. You can *measure* it: take an aligned **burst** of a static scene and, per pixel, plot the variance across frames against the mean — the points trace that line, slope = photon gain, intercept = the read-noise floor. And a single pixel's value over many frames is **≈ Gaussian**, which is what licenses additive-Gaussian noise models. [`fig-noise-affine` — per-pixel variance vs brightness, measured from a 50-frame ISO-3200 burst; `fig-noise-gaussian-pixels` — per-pixel histograms] → register [[Big Lessons]]

• 💡 **Big lesson — noise is *clipped* at black and white, so near the extremes it is *not* zero-mean.** A recorded value is clamped to [0, max]: near black the negative half of the noise is cut off (frames pile up at 0), near white the positive half is. So at the extremes the noise distribution is **asymmetric** and its mean is pushed **inward**. The consequence for **denoising**: any naive average/smoothing returns that biased mean, so it makes **shadows come out too bright and highlights too dark** — a bias every denoiser has to correct for (carried forward to [[#Denoising]] in BASIC). [`fig-noise-truncation` — a near-black pixel pinned at 0 a third of the time, its average biased bright; from the ISO-3200 burst] → register [[Big Lessons]]

• **the key intuition — it's about ratios (SNR)**: shot noise is **Poisson**, so σ = √N grows with brightness — measured noise is actually **higher in the highlights**. Yet noise *looks* worst in the **shadows**, because perception cares about the **signal-to-noise ratio** SNR = N/√N = √N, which is worst where N is small. "Ratios are all that matters" — the same reason we encode in gamma/log, and the motivation for ETTR (below â Photography 101). [std-dev map vs SNR map figure]

• **dynamic range** = the brightest recordable signal (sensor **saturation / full-well**) ÷ the **noise floor** (read noise in the shadows); it sets how much of a high-contrast scene fits in a *single* exposure → the motivation for **HDR** (Multiple exposure)

• **HDR (high dynamic range)**, sometimes called **wide dynamic range (WDR)**, is a deliberately **fuzzy** term — flag this. It gets used for at least four different things: the *scene* (a high-contrast world), the *capture* (multi-exposure / a high-DR sensor), the *file/encoding* (float or ≥10-bit radiance, e.g. OpenEXR, HDR10), and the *display* (an HDR monitor) — and, loosely, for the **tone-mapped "HDR look."** There is no single sharp definition; say which sense you mean. (Developed in [[Multiple exposure imaging]].)

• 💡 **Big lesson (L15) — dynamic range is set by full-well capacity over the noise floor**: a single exposure records from a **top** (the photosite's **full-well capacity** — where it clips to white) down to a **bottom** (the **noise floor** — read noise in the shadows); their **ratio is the dynamic range** (quoted in stops). You **widen** it with a **bigger well** (larger photosites — why a full-frame sensor out-ranges a phone) or a **lower floor** (cooling, lower read noise), and you **beat** it altogether by merging **multiple exposures** (HDR). The capture-side companion to **L6**: what bounds a shot is this *range*, not the bit depth. → register [[Big Lessons]]

• **examples, in stops** (the figure): a color **slide** is narrow (~5–6 stops), a reflective **print** ~6–7, **film negative** wide (~12–13 latitude), a **phone** sensor ~10–12, a **full-frame** sensor ~14; the **human eye** is ~10–14 stops *instantaneously* but ~**20+** once **adaptation** is allowed — and some **animals** do better in their niche; an outdoor sun-and-shadow scene can exceed **20 stops**, which is why no single capture holds it [fig-dynamic-range-comparison]

• **see it — recover an underexposed shot, full-frame vs phone**: shoot the *same* underexposed scene as **raw** on a **full-frame** camera and on a **phone**, then **push the exposure up** in software (e.g. +3–4 stops). The full-frame frame cleans up (deep full-well + low read-noise floor → wide dynamic range), while the phone frame breaks down into **shadow noise and banding** as the lift amplifies its read-noise floor — a direct, visible demonstration that **dynamic range scales with photosite / sensor size**, and exactly why phones lean on **burst / HDR+** (Multiple exposure) rather than a single exposure. [fig-underexposure-recovery-ff-vs-phone; photo — paired Fredo full-frame DNG + phone raw] *(also a coding exercise — see [[Exercises & Experiments]])*

• 💡 **Big lesson (L6) — quantization is rarely the real problem**: with enough bits and a sane encoding, it's **noise and dynamic range** (this section) that bite — not the number of levels. The one exception is **gamma / banding in shadows**, which is exactly why gamma encoding exists (allocate codes perceptually); so "rarely" ≠ "never". Reminded again in BASIC → Image representation → *Float vs 8-bit*.

equations

shot noise Poisson (variance = mean), σ ∝ √N, SNR = √N

variances add (shot + read + thermal)

averaging N frames → noise /√N

PSNR = 10·log₁₀(MAX²/MSE)

dynamic range = full-well ÷ read-noise floor

▸2.7 Multiple view geometry⧉

• chapter intro: the leap from one image to **two** — what a second viewpoint buys you (depth), built on the stereo / disparity intuition, then the epipolar-geometry derivation of E/F, with only their estimation and the multi-view scaling-up forwarded

▸2.7.1 Two views: stereo and disparity⧉

⬜ figure not yet created

a stereo pair (anaglyph) fig-stereo-pair

`fig-disparity-geometry` · parallel-camera disparity geometry 🟨

⬜ figure not yet created

a disparity map fig-disparity-map

`fig-random-dot-stereogram` · a random-dot stereogram pair (Julesz): a shifted central patch → depth from disparity alone, no monocular cues 🟨

• two eyes / two cameras → **disparity** (nearer points shift more)

• **random-dot stereograms** (Julesz): two random-dot images identical except for a shifted patch fuse into a floating surface — proof that **disparity alone yields depth**, with no recognizable objects or monocular cues at all ("depth without objects") [random-dot stereogram figure]

• simple parallel-camera geometry: depth ∝ baseline · focal length / disparity

• sidebar — **widening your baseline**: since disparity ∝ baseline, gadgets that optically push your eyes farther apart exaggerate depth (**hyperstereo**) — Helmholtz's **telestereoscope** (mirrors, 1857), military **optical rangefinders** & the **scissors periscope** (*Scherenfernrohr*), vs **hypostereo** (shrunk baseline) for macro 3D

• disparity map; rectification (a homography) when the cameras aren't parallel

• ties to panorama stitching; forward ref: multi-view stereo & depth (Advanced)

equations

depth Z = f·B / d (baseline B, disparity d)

▸2.7.2 Two-view geometry: epipolar lines, and the essential and fundamental matrices⧉

`fig-epipolar-line` · the epipolar constraint: two views of a 3-D point, epipolar plane & line, the match constrained to a 1-D search ✅

• the key fact, in one picture: a point in one view does **not** pin its match to a point in the other — it constrains it to a **line**. Given a pixel in the left image, its 3-D point lies somewhere along a single ray; that ray projects, in the right image, to a line — the **epipolar line**. So matching is a **1-D search along a line**, not a 2-D search (this is exactly why we *rectify* stereo pairs — to make those lines horizontal). [fig-epipolar-line]

• **derive the matrix straight from that picture** — ray → plane → line → algebra:

• **(1) the ray (depth ambiguity)**: a pixel back-projects to a ray; in **calibrated** coords ($\hat{x}=K^{-1}x$) the 3-D point is $X=\lambda\hat{x}$, depth $\lambda$ unknown.

• **(2) the epipolar plane**: that ray together with the **second camera centre** spans a plane (equivalently: the plane through both camera centres and $X$ — independent of $\lambda$).

• **(3) the epipolar line**: the plane meets the second image in a line, so the match $\hat{x}'$ must lie on it.

• **(4) write the plane in 3-D**: with relative pose $(R,t)$ ($X\mapsto RX+t$; the first centre maps to $t$ = the **baseline** vector, the ray direction maps to $R\hat{x}$), the three vectors $\hat{x}'$, $t$, $R\hat{x}$ are **coplanar** → vanishing triple product $\hat{x}'\cdot(t\times R\hat{x})=0$ → $\hat{x}'^{\mathsf T}[t]_\times R\,\hat{x}=0$ → the **essential matrix** $E=[t]_\times R$.

• **(5) → the fundamental matrix**: in raw pixel coords, $x'^{\mathsf T}Fx=0$ with $F=K'^{-\mathsf T}[t]_\times R\,K^{-1}=K'^{-\mathsf T}EK^{-1}$; $Fx$ is the epipolar line. The relative pose $(R,t)$ (translation only up to scale) is recoverable from $E$ — the seed of multi-view reconstruction.

• **forward-ref** (deferred to Advanced/3-D part): *estimating* F/E from noisy matches (the **8-point algorithm**, RANSAC), rectification, triangulation, and **structure-from-motion / multi-view stereo**.

▸2.8 Human (and animal) vision and color⧉

▸2.8.1 Visual system anatomy:⧉

`fig-eye-cross-section` · eye cross-section 🟨

`fig-retina-layers` · retina layers & photoreceptor synapses 🟨

`fig-cone-rod-distribution` · cone/rod distribution & fovea 🟨

`fig-visual-pathway` · pathway retina→LGN→V1 🟨

`fig-photoreceptor-cell` · anatomy of a rod & cone cell 🟨

`fig-rods-cones-micrograph` · micrograph of rods & cones (primate retina) 🟨

• light, lens, pupil, retina,

• cones/rods, distribution, fovea — note there are **no S (blue) cones in the very centre of the fovea**, and far more L+M than S overall, so **blue is low-resolution and a bit out of focus** (chromatic aberration) — yet we perceive blue everywhere (the brain fills it in). Luminance ≈ the **L+M cone sum**, which is why **green carries most of the acuity** (and why Bayer uses 2× green).

• cells, ganglions, on-retina spatial processing, visual cortex, V1, higher.

• **the eye as a camera — and its refractive errors.** Like any lens system the eye can mis-focus: **myopia** (eyeball too long / cornea too curved → image forms *in front* of the retina), **hyperopia** (too short → *behind*), **astigmatism** (a non-spherical cornea → two focal lines, not one point), and **presbyopia** (the aging lens stiffens and loses **accommodation**). Each is corrected by a lens quoted in **diopters** (D = 1/focal-length in metres): a negative/diverging lens for myopia, positive/converging for hyperopia, a **cylindrical** lens (with an axis) for astigmatism. [`fig-refractive-errors`]

• **how the prescription is *measured* (the machines), and why it's just inverse imaging.** **(1) Retinoscopy** — shine a moving streak of light into the eye and watch the **retinal reflex** sweep across the pupil; its direction and speed reveal myopia vs hyperopia, and trial lenses are stacked until the reflex stops moving ("neutralization"). **(2) Autorefractor** — automates this objectively: it projects an **infrared** target onto the retina and analyses the **returning image** (its focus/size, or a Hartmann–Shack / Scheimpflug pattern) to solve for the corrective lens that makes the retinal image sharp — essentially a little camera running autofocus *on your eye*, in seconds. **(3) Phoropter** — the familiar "better one… or two?" wheel of trial lenses over an eye chart: a **subjective** refinement of the machine's objective estimate. **(4) Wavefront aberrometer (Shack–Hartmann)** — for the full optical picture: a **lenslet array** samples the wavefront leaving the eye and measures **higher-order aberrations** (coma, spherical, trefoil), not just sphere/cylinder — the same wavefront sensing used in **adaptive optics** and in planning **LASIK**. The throughline: measuring an eye is **inverse imaging** — find the optics that would make the retinal image (or the returning wavefront) ideal. (Cross-ref wavefront / Shack–Hartmann sensing and adaptive optics in [[Optics, lenses, and aberration correction]].)

▸2.8.2 Color⧉

`fig-cone-sensitivities` · cone spectral sensitivities (L/M/S) 🟨

`fig-spectrum-to-three-responses` · spectrum → 3 responses (projection) 🟨

`fig-metamers` · metamers (two spectra, one color) 🟨

`fig-ishihara` · Ishihara plate (“74”) 🟨

`fig-cone-mosaic` · trichromatic cone mosaic (L/M/S, fovea) 🟨

`fig-cone-response-matrix` · cone response as a 3×N matrix–vector product r = C·E (discretized λ); note the continuous / infinite-D limit r_k = ∫ c_k(λ) E(λ) dλ 🟨

`fig-nonorthogonal-dual-basis` · non-orthogonal cone basis (two vectors in the positive quadrant) and its dual: synthesis vs analysis; the dual (analysis) basis has negative coordinates 🟨

`fig-opsin-tree` · the opsin gene tree (schematic): ancestral opsin → RH1 rod + cone classes (SWS1/SWS2/RH2/LWS); the human S/M/L branch highlighted, with the recent primate L/M gene duplication marked ✅

• cones, projection from spectrum to 3 responses

• relate to measurement equation in image formation

• **the response is a matrix–vector product**: discretize the spectrum into N wavelength bins → a vector E ∈ ℝᴺ; stack the three cone sensitivities as the **rows of a 3×N matrix C** (one row per L/M/S); the cone responses are then **r = C·E**, i.e. each r_k = Σ_i c_k(λ_i)·E(λ_i). In the real world wavelength is **continuous**, so it's really **infinite-dimensional** — the sum becomes the integral r_k = ∫ c_k(λ)·E(λ) dλ. [cone-response matrix figure]

• insist a given cone does not make the difference between wavelength. Example of monochromatic stimuli

• point out they are non orthogonal and lots of stuff can’t be negative.

• This make linear algebra more messy

• 💡 **Big lesson:** **color is non-orthogonal *and* non-negative** — overlapping cone axes plus the no-negative-light constraint are what make color algebra tricky (analysis needs a dual basis with negative coordinates).

• side bar: **non-orthogonal bases & the dual basis** — the cone "axes" are **not orthogonal** (L and M overlap heavily), so the comfortable orthonormal intuition fails. Picture two basis vectors **c₁, c₂ both in the positive quadrant** of the (canonical) spectrum space, each standing for one cone. **Synthesis** is easy — any point is a combination a₁c₁ + a₂c₂ (a parallelogram). But **analysis** — recovering the coordinates a₁, a₂ — is **not** a plain projection onto c₁, c₂; you must project onto the **dual (reciprocal) basis** c₁\*, c₂\* (defined by cᵢ\*·cⱼ = δᵢⱼ). For a non-orthogonal positive basis the **dual vectors point partly into the negative quadrants** — they have **negative coordinates** in the original space. That is exactly why "stuff can't be negative" makes color algebra messy: the natural *analysis* vectors are negative. (Only for an **orthonormal** basis do a basis and its dual coincide — then analysis = synthesis = projection.) [non-orthogonal dual-basis figure]

• color blindness. Linear algebra perspective (projection loses info)

• we are all color blind

• side bar — **animal vision and the divergence of opsins**: "color" is just whatever set of **opsins** (photopigments) an animal carries, and these have **diverged enormously** across major animal groups — many insects are UV–blue–green trichromats, birds and reptiles are often **tetrachromats** (a fourth, UV cone), most mammals reverted to **dichromacy** (primate trichromacy is *re-evolved*, via a gene duplication), and the **mantis shrimp** carries ~12–16 photoreceptor classes. So human red–green color blindness is one point in a vast space — every species is "color blind" relative to some other. The opsin family tree and its divergence are traced in the "Evolution of Eyes" chapter of *Vision* (Cambridge); the schematic opsin gene tree (ancestral opsin → rod RH1 + cone classes, human S/M/L with the recent L/M duplication) is [fig-opsin-tree]. [→ Animal eyes]

• the **multistage color model** — the brain re-codes color in stages (simplified; see Reinhard et al., *Color Imaging* for the full version):

• **stage 1 — cones**: three heavily-overlapping LMS responses (above)

• **stage 2 — opponent recoding** in the retina / LGN: LMS → **light–dark, red–green, blue–yellow** (decorrelates the channels, matches the four unique hues; → next section)

• **stage 3 — appearance level**: a higher organization closer to **hue / saturation / brightness** (an HSV-like arrangement) where context, adaptation and memory act (→ color appearance models, Color technology)

• **focal colors & naming**: color *names* cluster around shared **focal colors** across languages (Berlin & Kay; the World Color Survey) — evidence the recoding is perceptual, not arbitrary

• experimental support: **psychophysics** (hue-cancellation — Hurvich & Jameson) and **physiology** (opponent cells — De Valois)

equations

cone response r_k = ∫ E(λ)·c_k(λ) dλ, k ∈ {L,M,S} (a projection)

metamerism = same r_k for different E(λ)

▸2.8.3 So what are the primary colors?⧉

• the perennial muddle (RGB? RYB? CMY?) has *two separate* causes, neither of which is a fact about light:

• **dual-process / opponent coding** (next section): the visual system re-wires the three cone signals into **red–green, blue–yellow, light–dark**, so *red, green, blue, and yellow* all feel "primary" perceptually — there are **four** psychological unique hues, not three. This is why RYB feels natural to painters and why the cones' actual L/M/S don't match anyone's intuition of "primaries."

• the rest is **additive vs subtractive synthesis** (Color technology): **RGB** are the *additive* (light) primaries, **CMY** the *subtractive* (ink/filter) primaries; grade-school **RYB** is a folk version of subtractive mixing. So "primary" conflates a **perception** fact with a **technology** choice — different goals, different "primaries."

• source: lecture — the opponent rewiring "is one of the many causes of all the confusion about what's a primary color" (transcript 10/2); pair with [[Glossary]] entry for *primary*.

▸2.8.4 Opponent process and the multistage model⧉

`fig-opponent-channels` · opponent channels (R–G, B–Y, light–dark) 🟨

• two stages: trichromatic at the cones, then opponent in the retina / LGN — red–green, blue–yellow, light–dark

• explains afterimages and color naming; and why **luma/chroma** (YCbCr — luma Y′ on gamma-encoded RGB, see *Linear vs Gamma*) compresses well (see JPEG)

equations

opponent = linear map LMS → (L−M, S−(L+M), L+M) (YCbCr-like)

▸2.8.5 Light adaptation⧉

`fig-spanish-castle` · afterimage / Spanish-Castle illusion ✅

`fig-photopic-scotopic-range` · the photopic / mesopic / scotopic regimes on a horizontal log-luminance axis (cd/m², ~10⁻⁶→10⁸): rod vs cone activity bands, the absolute threshold, the Purkinje shift, and landmarks (starlight, moonlight, indoor, overcast, sunlight, the sun); the eye's full range vs the few-log instantaneous range (→ adaptation, HDR) ✅

• we recenter the neural response around the current average brightness — spanning a huge range over *time*, not all at once

• mechanisms: neural, chemical (photopigment bleaching), and the pupil

• afterimages; the Spanish-Castle illusion

• **the operating regimes — photopic, mesopic, scotopic** (which receptors are working, and *how bright* that is, in **luminance** cd/m²):

• **photopic** (daylight, **cones**, full color, sharp foveal vision): roughly **> 3 cd/m²** — office light ~10²; an overcast sky ~10³–10⁴; a sunlit scene ~10⁴–10⁵

• **mesopic** (dusk / dim interior, **rods *and* cones together**): roughly **10⁻³ – 3 cd/m²** — colors desaturate and the **Purkinje shift** sets in (as rods take over, blues look relatively brighter and reds darker)

• **scotopic** (night, **rods only** — no color, poor acuity, and *off-fovea* since the fovea is all cones): below ~**10⁻³ cd/m²** — moonlit landscape ~10⁻²; starlight ~10⁻³–10⁻⁴; the **absolute threshold** of vision ~**10⁻⁶ cd/m²**

• big picture: the eye works across a staggering **~10⁻⁶ to ~10⁸ cd/m²** (≈ 14 log units), but only over a few log units **at once** — which is exactly why **adaptation** (above) and, in cameras, **HDR / tone mapping** exist. [fig-photopic-scotopic-range]

equations

Naka–Rushton response R = Iⁿ / (Iⁿ + I₀ⁿ) (saturating)

▸2.8.6 Lightness constancy⧉

`fig-checker-shadow` · Adelson's checker-shadow illusion — the original artwork: the illusion (squares A/B identical luminance) beside the equal-grey-bars proof ✅

• the visual system reports a surface's intrinsic **lightness** (its reflectance / albedo), **not** the raw luminance reaching the eye — a white page in shadow still looks white, a black one in sunlight still looks black, even when the shadowed white sends *less* light to the eye than the sunlit black

• it **discounts the illuminant**: the brain factors the retinal image into **illumination × reflectance** (the same entanglement from Light and physics) and reports the *reflectance* — the property we actually care about

• this is an **ill-posed inversion** (one image, two unknowns), so the brain leans on cues and priors: **edges and context** tell it where the *illumination* changes versus where the *surface* changes

• the canonical demo: **Adelson's checker-shadow illusion** — two squares of identical luminance look very different because the brain infers the cast shadow and discounts it

• **Retinex** (Land & McCann) models the same idea computationally: treat slow spatial gradients as illumination and sharp edges as reflectance changes, then integrate the edges back to lightness

• ties to **intrinsic images** (split a photo into reflectance + shading) and to the darkroom craft of **dodging & burning** / tone mapping

▸2.8.7 Color constancy⧉

`fig-white-balance-real` · white balance computed from scratch (von Kries per-channel gain in linear light): a warm-casted photo corrected by gray-world vs white-patch/max-RGB, with the recovered channel gains (PS1) ✅

⬜ figure not yet created

the same surface under two illuminants reading the same color fig-wb-before-after

⬜ figure not yet created

"the dress" — identical pixels read blue-black or white-gold depending on the illuminant the viewer assumes fig-the-dress

• the **color analog** of lightness constancy: we perceive an object's intrinsic **surface color** (its reflectance) despite large changes in the **illuminant** — a white shirt reads white under noon daylight, a warm lamp, or blue shade, even though the spectra reaching the eye differ a lot

• again the brain **discounts the illuminant**: it estimates the light's color and divides it out so the surface reads consistently — the same ill-posed (**illuminant × reflectance**) inversion, solved with assumptions (e.g. the average scene is roughly gray, or the brightest patch is white)

• a key mechanism is **von Kries adaptation**: an *independent gain on each cone type* (L, M, S) rescales the response to the prevailing light — plus spatial / contextual cues; it is **not perfect** (some cast remains, and memory colors bias us)

• this is the **biological counterpart of white balance** (Color technology) — a camera's automatic white balance must do, by algorithm, what our visual system does for free

• classics: **Retinex** (Land & McCann); **Bayesian** color constancy (Freeman & Brainard); gray-world / bright-pixel heuristics

• the **"the dress"** demo (2015): the *same* image pixels look **blue-black** to some viewers and **white-gold** to others — because the brain must *guess the illuminant* (cool daylight vs warm indoor light) and divides it out differently. A perfect, viral illustration that color constancy is an **ambiguous inference**, not a measurement — the visual system commits to a prior and different people commit differently. Ties to white balance (a camera faces the exact same ill-posed guess).

▸2.8.8 Contrast⧉

`fig-simultaneous-contrast` · simultaneous-contrast patch 🟨

`fig-contrast-ratios` · contrast-as-ratios strip 🟨

• contrast is about *ratios*: 1→2 looks like 100→200 (discounts the multiplicative effect of light) — foreshadows gamma / log encoding

• Weber's law / just-noticeable difference

• simultaneous contrast; the Adelson checker-shadow illusion

equations

Weber's law ΔI/I ≈ const

Michelson contrast = (I_max − I_min)/(I_max + I_min)

▸2.8.9 Spatial vision⧉

`fig-campbell-robson` · Campbell–Robson CSF chart 🟨

`fig-csf-chromatic` · three contrast-sensitivity functions: luminance (band-pass) vs red–green and blue–yellow (low-pass, lower cutoff) — we resolve colour more coarsely than brightness 🟨

`fig-blur-chroma-luma` · blur only the chroma (Lab) and the image still looks sharp; blur the luminance and it falls apart — the basis for chroma subsampling ✅

• Visual Acuity

• Simultaneous contrast

• Mach bands; lateral inhibition (center-surround receptive fields)

• **CSF** — measured by **psychophysics** (show **sine-wave gratings**, ask "do you see a grating?"); the **contrast sensitivity function** is band-pass in spatial frequency. Crucially the **chromatic** CSFs (red–green, blue–yellow) cut off at **far lower** spatial frequency than the **luminance** CSF — we resolve **color** much more coarsely than **brightness**. [three-CSF figure]

• the dramatic demo: in Lab, **blur the chroma** channels and the image still looks sharp; blur the **luminance** and it falls apart — the empirical basis for **chroma subsampling** (JPEG / YCbCr) and a confirmation of the opponent model. [blur-chroma vs blur-luminance figure]

▸2.8.10 Temporal vision⧉

• the visual system trades some spatial resolution for **temporal** resolution: a **temporal CSF** band-passes flicker, with a **critical flicker-fusion** frequency (~50–60 Hz in bright light, lower when dim) above which a flickering source looks steady — why displays/film refresh fast enough to fuse, and why old CRTs / fluorescents could flicker in peripheral vision

• **apparent motion** (the phi phenomenon) + persistence of vision make a sequence of frames read as continuous movement (→ Motion/Video)

• **temporal adaptation** (light/dark adaptation over seconds–minutes; → Light adaptation) and the eye's own motion (saccades, microsaccades) shape what we see over time

• exploited by computation: **frame-rate** choices, flicker-free capture, and **video magnification** (amplifying tiny temporal changes — Motion/Video)

▸2.8.11 Attention and eye movements⧉

⬜ figure not yet created

a fixation scanpath over an image fig-scanpath

• attention; fixations; saccades; smooth pursuit

• the eye samples a scene through rapid saccades + a high-res fovea

• saliency (what draws the eye) is modeled computationally → see [[Human factors#Computational models of perception]] (the model and its uses) — the ML estimator is in [[Computational tools#Machine learning]]

▸2.8.12 Perception of color and images⧉

`fig-hunt-effect` · colorfulness vs luminance (Hunt) 🟨

`fig-bezold-brucke-abney` · Bezold–Brücke / Abney demos ✅

• color reproduction kind of stuff

• blue shift

• colorfulness as a function of luminance (Hunt effect)

• hue shifts: Bezold–Brücke (with intensity), Abney effect (with added white)

▸2.8.13 Animal eyes⧉

`fig-eye-evolution` · the evolution of the eye, in cross-sections: flat photoreceptor patch (no image) → cup (directional) → pinhole (an image, no lens) → lensed eye — recapitulating sensor → camera-obscura → lens (Bonus: animal eyes) 🟨

• **the eye evolved as a sequence of small improvements**, and it recapitulates the optics of this book — from "no image" to a focused one. Following Nilsson's stages (Land & Nilsson, *Animal Eyes*):

• **a flat patch of photoreceptors** (an *eyespot*): just **photosites**, **no image formation** — it tells light from dark and, crudely, which side has *more* light (phototaxis), but cannot form a picture. This is the bare **sensor** with no optics.

• **a concavity / cup**: fold the patch into a **pit** and the rim's shading gives **directional** sensitivity — the deeper the cup, the better you can tell *where* light comes from.

• **a pinhole**: close the cup almost shut → a **pinhole eye**, a real (if dim) **image** with no lens at all (the **nautilus** still uses one) — exactly the camera obscura of [[#Pinhole image formation and linear perspective]].

• **a lens**: fill the aperture with a **refractive lens** (often a graded-index sphere) → a **bright, focused** image, independently evolved in fish, cephalopods, and vertebrates. Add a **cornea, iris, and accommodation** and you have the human eye.

• so the human eye is *one* solution; evolution reached the same goal by very different routes (below).

• **a tour of those other solutions** — a useful contrast to the human eye and to camera design:

• **compound eyes** (insects): many **ommatidia** → wide field of view and fast temporal response, but low spatial resolution

• **more cone types**: birds / reptiles are **tetrachromatic** (a 4th cone, into the UV); the **mantis shrimp** has ~12–16 photoreceptor classes (yet poor color *discrimination* — it seems to recognize rather than compare) — the **divergence of opsins** across animal groups is the color sidebar in [[#Color]]

• **tapetum lucidum** (cats, deer): a reflective layer behind the retina that boosts night vision (and causes eye-shine)

• **polarization vision** (cephalopods, many insects): they see light's **polarization** — a channel we're blind to

• **other eye designs**: **pinhole** (nautilus), **mirror eyes** (scallops — image formed by a concave mirror, not a lens), and the front-facing (**stereo**) vs side-facing (**field-of-view**) placement trade-off

• further reading: Land & Nilsson, *Animal Eyes*; the "Evolution of Eyes" chapter in *Vision* (Cambridge); a friendly overview at phos.co.uk, "The Evolution of Sight".

▸2.9 Color technology⧉

▸2.9.1 Analysis vs synthesis and non-orthogonality⧉

`fig-analysis-vs-synthesis` · analysis vs synthesis (sense→3 / reproduce←few) 🟨

• analysis = *sensing* color (project a spectrum onto 3 cone / sensor responses)

• synthesis = *reproducing* color (add a few primaries)

• both lose information: the projection is non-orthogonal and light can't be negative — a tension that recurs (sensing, displays, white balance)

▸2.9.2 Measuring color⧉

`fig-color-matching-setup` · color-matching experiment setup ✅

`fig-cie-cmfs` · CIE color-matching functions 🟨

`fig-chromaticity-diagram` · xy chromaticity + spectral locus + white point 🟨

• the color-matching experiment: match a test light by mixing 3 primaries

• trichromatic theory (von Helmholtz 1859); tristimulus values

• metamers: different spectra that match — *why* color reproduction is possible at all

• CIE 1931: color-matching functions, the xy chromaticity diagram, the spectral locus, the white point

• negative matches → why some colors need "impossible" primaries (see Reproducing color)

equations

tristimulus X = ∫ E(λ)·x̄(λ) dλ (and Y, Z)

chromaticity x = X/(X+Y+Z), y = Y/(X+Y+Z)

▸2.9.3 Linear vs gamma vs. log encoding⧉

`fig-gamma-curves` · gamma encode/decode curves 🟨

`fig-quantization-banding` · quantization banding (low-bit demo), worst in shadows ✅

⬜ figure not yet created

CRT power-law response (screen luminance vs grid voltage ≈ V^2.2)

`fig-slr-cross-section` · labelled cross-section of an SLR: 45° main reflex mirror sends light UP to the focusing screen + pentaprism → optical viewfinder; the semi-transparent main mirror + secondary (sub) mirror fold light DOWN to the phase-detect AF module in the mirror-box floor; focal-plane shutter + sensor/film behind; inset shows the mirror flipping up for exposure (companion to `fig-mirrorless-anatomy`) ✅

• different challenges: quantization, noise

• same observation: ratios matter

• **side bar — why gamma ≈ 2.2? the CRT accident**: old **CRT** displays had a *natural* power-law response — screen luminance ∝ (grid voltage)^γ, γ ≈ 2.2–2.5 (electron-gun physics: beam current vs control-grid voltage is a power law). So a CRT **automatically decodes** a gamma-encoded signal: send V = L^(1/γ) and the tube returns L = V^γ (linear light), no decode hardware. **sRGB's ≈2.2 codifies it.** Bonus: that curve ≈ the *inverse of human lightness perception* (Weber–Fechner), so it also puts code levels where the eye is sensitive — **why we kept gamma after CRTs vanished** (LCD/OLED emulate the curve for compatibility). [CRT-response figure]

• **side bar — even sRGB isn't quite continuous**: the ubiquitous sRGB transfer curve is **piecewise** — a short **linear** toe near black, $V = 12.92\,L$ for $L \le 0.0031308$, then a **power** segment $V = 1.055\,L^{1/2.4} - 0.055$ above it — meant to join smoothly at the knee. But the standardized constants (the $12.92$ slope, the $0.0031308$ breakpoint, the $0.055$ offset, the $1/2.4$ exponent) are **rounded and not mutually consistent**, so the two pieces **don't exactly meet**: at the breakpoint the linear part gives $\approx 0.04045$ while the power part gives $\approx 0.04059$ — a tiny **discontinuity** ($\sim 10^{-4}$), on top of the slope kink that's there by design. So even the encoding *every* image uses is, strictly, **not continuous** at the join. The lesson: real transfer functions are messier than the clean "$\gamma \approx 2.2$" story — **check the actual curve** (and its inverse) before doing arithmetic or quantizing in that space. *(A "corrected" sRGB nudges the breakpoint/scale so the pieces meet exactly.)*

• **side bar — gamma in film**: the word **gamma** comes from photography. Film's response is the **characteristic (Hurter–Driffield) curve** — **density** vs **log exposure** — with a toe (shadows), straight middle, and shoulder (highlights). Film **gamma = the slope of the straight portion** (γ = ΔD/Δlog H) = the medium's **contrast**, set by development (push-processing raises γ). So *film* gamma is a contrast/rendering parameter while *display* gamma is an encoding power law — both log–log slopes, different jobs. End-to-end **system gamma** (capture × display) is tuned slightly >1 (~1.1–1.5) so images look right in a **dim viewing surround**. [film characteristic-curve figure]

• 💡 **Big lesson:** **additive vs multiplicative → choice of encoding.** Light *adds* (incoherent sources, blur) but surfaces and contrast *multiply*; **log** makes the multiplicative native, **linear** suits the additive (deconvolution), **gamma** is the compromise with least crazy behaviour near zero.

• **log is becoming the standard for *video***: high-end and now prosumer cameras capture in **log** profiles — **S-Log / S-Log3** (Sony), **Log-C** (ARRI), **V-Log** (Panasonic), **C-Log** (Canon), **RED Log3G10** — and the **ACES** pipeline standardises a log working space (ACEScc / ACEScct). Log packs a wide scene dynamic range into limited bits while keeping the grade malleable: a **stop** becomes a roughly constant code increment, so exposure and contrast moves are uniform. The footage looks flat and desaturated straight out of camera and is **color-graded** to a display encoding; the broadcast-HDR transfer functions **HLG** and **PQ** (SMPTE ST 2084) are the display-side cousins. So in video the trio increasingly resolves to **log for capture, gamma / PQ / HLG for display**. (→ HDR, tone mapping.)

• **luma vs luminance** (a recurring confusion): **luminance** Y is the *linear*, perceptual brightness — a weighted sum of **linear** RGB (CIE Y); **luma** Y′ is the *same-shaped* weighted sum computed on the **gamma-encoded** RGB (the coding convenience behind YUV / YCbCr / JPEG). They differ because the weighting and the gamma don't commute.

• **our convention**: use **Rec. 601 weights — Y = 0.299 R + 0.587 G + 0.114 B** — matching video / JPEG / YCbCr and our programming exercises; mention **Rec. 709** (0.2126 / 0.7152 / 0.0722) as the HDTV / linear-light alternative.

equations

gamma encode V = L^(1/γ), decode L = V^γ (γ ≈ 2.2)

sRGB piecewise curve

**luma** Y′ = 0.299 R′ + 0.587 G′ + 0.114 B′ (Rec. 601, on *encoded* RGB) ≠ **luminance** Y (same-style weights on *linear* RGB

Rec. 709 alt 0.2126/0.7152/0.0722)

▸2.9.4 Linear (kind of) color spaces⧉

`fig-gamut-primaries` · sRGB vs wide-gamut primaries on xy 🟨

• **RGB and its flavours**: many RGB spaces differ only by their **primaries** (gamut) and **white point**, related by a single **3×3 matrix M** — sRGB / Rec. 709 (web, HDTV), **Adobe RGB** and **Display P3** (wider), **Rec. 2020** (UHD), **ProPhoto** (very wide, for editing). Converting between them (or to XYZ) is one matrix multiply *in linear light*.

• **opponent / luma-chroma spaces** (YCbCr, YUV, YCoCg): a fixed 3×3 map separating **luma** from two **chroma** axes — used for compression (chroma subsampling) and some editing.

• caveat: "linear" here means **a linear map between spaces**, not necessarily linear-*light* values — sRGB carries a gamma; XYZ is linear-light.

equations

RGB ↔ RGB and RGB → XYZ via a 3×3 matrix M

▸2.9.5 Non-linear space: CIELAB⧉

`fig-perceptual-uniformity` · a rainbow at N colours equally spaced in sRGB vs CIELAB; ΔE bars under each strip show sRGB steps are perceptually ragged, CIELAB steps even (Color technology — CIELAB) 🟩

`fig-cielab-ab-plane` · the a*b* plane 🟨

• **CIELAB** (and CIELUV): a deliberately **non-linear, perceptually-uniform** space — a cube-root compression of XYZ about a white point yields **L\*** (lightness) and **a\*, b\*** (opponent chroma), so **equal distances ≈ equal perceived differences**; color difference **ΔE = ‖ΔLab‖** (improved by ΔE2000)

• the motivating demo: sample a rainbow at **N colors equally spaced in sRGB vs in CIELAB** — the sRGB steps have wildly uneven perceived gaps (ΔE), the Lab steps are even [perceptual-uniformity figure]

• **why non-linear**: perception is roughly power-law (Weber–Fechner / Stevens), so a uniform space must warp XYZ — the same reason gamma encoding exists

• **friends**: HSV / HSL (handy for UI, *not* perceptually uniform), and modern uniform spaces — CAM02-UCS, **Oklab** — that fix CIELAB's known non-uniformities (the *Chong* perceptual color metric belongs here)

equations

CIELAB L* = 116·f(Y/Yₙ) − 16 (cube-root nonlinearity)

ΔE = ‖ΔLab‖

▸2.9.6 Reproducing color⧉

`fig-additive-subtractive` · additive vs subtractive synthesis 🟨

`fig-color-synthesis-bands` · additive vs subtractive synthesis, 3-band cartoon; subtractive is multiplicative (Color technology) 🟨

`fig-subtractive-spectra` · subtractive mixing as a per-wavelength multiplication: yellow & cyan filter transmittances and their product T_Y·T_C (= green) 🟨

`fig-gamut-mapping` · gamut + gamut mapping 🟨

`fig-repro-monochromatic` · reproducing a monochromatic signal (2D LA analogy) 🟨

• intro: fundamental challenge of spectrum non-orthogonality.

• Give example of reproducing a monochromatic signal.

• Show 2D linear algebra equivalent problem.

• Say that the big extra challenge is that we can only have positive light.

• Non-orthogonality + restriction to positive -> impossible problems.

• **additive vs subtractive** color synthesis — cartoon: split the spectrum into 3 equal bands and pretend they are pure **B, G, R** [3-band synthesis figure]

• **additive** (lights): start from black and *add* bands → R+G+B = white; pairs give yellow / cyan / magenta

• **subtractive** (filters/inks): start from white and each filter *removes* a band — but this is really a wavelength-by-wavelength **multiplication** of the spectrum, so "subtractive" is **misleading**; C·M·Y → black

• concrete proof it's a multiplication: a **yellow** filter (blocks blue) stacked with a **cyan** filter (blocks red) passes only the overlapping green band → **yellow × cyan = green**, not a "sum" of the two [subtractive spectra-multiplication figure: T_Y(λ)·T_C(λ)]

• the **overlapping-disks** picture makes the same point at a glance: RGB lights overlapping toward white (additive) vs CMY inks overlapping toward black (subtractive) [overlapping RGB/CMY disks figure]

• real systems are often **hybrid** — some light *added* and some *removed* — e.g. an **LCD projector**: a white lamp, LCD panels that *subtract* (modulate) each channel, then the three channels *added* on the screen

• various technology: CRT, LCD, projectors (DMD, LCD, laser)

• film, printer, halftoning

• gamut, gamut mapping

equations

additive mix = Σ wᵢ·Pᵢ (linear)

solve primary weights via 3×3 inverse

gamut = convex hull of primaries

▸2.9.7 Color management, ICC, and industry standards⧉

`fig-icc-workflow` · ICC workflow diagram 🟨

`fig-device-gamut-mismatch` · gamut mismatch across devices ✅

• the problem: **the same numbers look different on different devices** — each camera / monitor / printer has its own primaries, white point, gamut and transfer curve

• the solution — **ICC color management**: every device carries a **profile** mapping its values to/from a device-independent **profile-connection space** (XYZ or Lab); image → PCS → target device makes color *portable*

• **rendering intents** for out-of-gamut color: **perceptual** (compress the whole gamut, keep relations), **relative colorimetric** (clip + white-point match), **saturation**, **absolute colorimetric**

• **chromatic adaptation** between white points via a **Bradford 3×3** transform (the engineering cousin of von Kries adaptation, Perception)

• in practice: profiles embedded in files (→ File formats), monitor calibration, soft-proofing; **sRGB** as the safe default when no profile is present

equations

device ↔ profile-connection space (XYZ/Lab)

chromatic-adaptation transform (Bradford 3×3)

▸2.9.8 Sensing color: multiplexing strategies⧉

`fig-color-multiplexing` · four colour-sensing multiplexing strategies: temporal, spatial (Bayer), beam-split prism, depth (Foveon) 🟨

⬜ figure not yet created

temporal multiplexing artifact — Prokudin-Gorskii color fringes on moving water fig-prokudin-gorskii

• four ways to turn a monochrome sensor into a color one — **multiplex** the three (or more) measurements across some axis:

• **in time**: shoot R, G, B frames sequentially through filters — Maxwell's first color photograph (1861), **Prokudin-Gorskii**, flatbed scanners, astronomy filter wheels. Cheap and full-resolution, but **fails on motion** (the classic color fringes on moving water).

• **in space (the dominant choice)**: a **color filter array (CFA)** — the **Bayer mosaic** (2 green : 1 red : 1 blue) — one color per photosite, then **demosaick** to fill the rest; the eye does the same with its interleaved cone mosaic. Variants: Fuji's **X-Trans** (a larger, less-periodic tile to fight moiré) and **complementary CMY/CYGM** CFAs (more light through, messier color). Needs an **optical anti-aliasing (low-pass) filter** to tame high-frequency color artifacts, and suffers some channel **crosstalk**.

• **beam-splitter (prism)**: a **dichroic prism** splits the rays onto **three sensors** — **3-CCD / 3-chip**, the classic **broadcast video camera** — wasting no photons at full per-channel resolution, but bulky, costly, and hard to align.

• **in depth (stacked)**: stack wavelength-selective layers so one location senses all three — **Foveon** (silicon absorbs longer wavelengths deeper; used in **Sigma** cameras), echoing **Kodachrome**'s stacked dye layers. Full color at every pixel (**no demosaicking**), but trickier color separation and more chroma noise.

• **spectral / Lippmann**: record the standing-wave **interference** of the full spectrum (Lippmann, 1908 Nobel; a precursor to holography) — true spectral capture, not just 3 numbers.

• **hybrid**: combine these (e.g. spatial + temporal in video).

• this is *analysis* (sensing color) — the mirror of color *synthesis* (reproduction, above); the **demosaicking** algorithm itself is deferred to Basic Image Processing (which is why this is just the *sensing* half)

• **trichromatic (perceptual) vs spectral (physical) capture**: almost all imaging systems aim to capture **perceptual, trichromatic color** — three numbers that reproduce what a *human* would see (RGB ≈ the cones), so metamers are, by design, indistinguishable. **Multispectral / hyperspectral** imaging instead samples the **physical spectrum** in many narrow bands (tens to hundreds), keeping distinctions the eye throws away — for remote sensing, agriculture, art conservation, and machine vision, where the *material*, not the *appearance*, is what matters (and metamers must be told apart). Lippmann (above) is the limiting case: the full continuous spectrum.

▸2.9.9 Processing color: saturation and beyond⧉

`fig-point-op-saturation-vibrance` · uniform saturation (constant `s`, over-saturates already-vivid colours → clip) vs vibrance (per-pixel gain `g = 1+v(1−sat)·w_skin(hue)`, protects vivid colours & skin): input, saturation, vibrance + a gain-vs-saturation panel (Point operations → Basic colour enhancement, BASIC) 🟩

• **saturation**: scale the chroma — push colors away from the grey (luma) axis in an opponent / HSV space; a uniform scale over-saturates already-vivid colors and skin

• **vibrance**: a *smarter* saturation — boost **less-saturated** colors more and **protect** already-saturated ones and **skin tones** (a non-linear, hue-aware gain)

• **selective / HSL color** (Lightroom's mixer): per-hue-band hue/saturation/luminance, and **color grading** (shadows / mids / highlights tinting) from film & video

• the *mechanics* (curves, LUTs, pipeline placement) live in **Basic Image Processing** (point operations); here we own the **color-space** reasoning — which axis, which space, why protect skin

▸2.9.10 Converting to black and white⧉

`fig-bw-conversions` · one colour photo → black & white several ways: single channel (R/G/B) · average · weighted luminance · red-filter channel mix (dark sky) · an isoluminant pair collapsing to one grey — the choices differ (Converting to B&W, Color technology) ✅

• the key framing: "convert to black & white" is **under-specified** — you are projecting **three numbers to one**, an inherently **lossy, many-to-one** map, so there are *many* valid answers and the right one depends on **intent**, not correctness

• **pick a single channel** (R, G, or B): quick; occasionally useful (the **green** channel is close to luminance and usually least noisy; the **red** channel smooths skin and darkens skies) — but it throws away two-thirds of the data

• **average** $(R+G+B)/3$: simplest, but **perceptually wrong** — it makes blue as bright as green; skies and foliage come out muddy

• **weighted luminance** (the perceptual default): a weighted sum that matches the eye's sensitivity — **green dominates**, blue counts little — **Rec. 601** $0.299/0.587/0.114$ or **Rec. 709** $0.2126/0.7152/0.0722$; this is exactly the cone-weighting / luminance idea from [[Human (and animal) vision and color]]

• **but in which space?** the recurring **[[#Linear vs gamma vs. log encoding|luma vs luminance]]** distinction — the cheap version weights the **gamma-encoded** values (**luma** Y′, what most software does), the radiometrically correct version weights **linear** RGB (**luminance** Y) then re-encodes; they give visibly different greys (linear-light is darker in saturated reds/blues)

• **lightness, not luminance**: take **L\*** from **CIELAB** (perceptually uniform lightness) — or the UI-space shortcuts **HSL "lightness"** $(\max+\min)/2$ and **HSV "value"** $\max(R,G,B)$ — each a different grey again

• **custom channel mixer** (the artistic control): choose the weights yourself (keep $\sum w_i\approx 1$ to hold exposure) — this is Lightroom/Photoshop's **B&W mixer**, and the digital descendant of the darkroom **color filter on B&W film**: a **red** filter darkens a blue sky for dramatic clouds, a **yellow/green** filter lightens foliage, an **orange** filter smooths skin (Ansel Adams's black skies are a red filter on panchromatic film). So a B&W conversion is really a decision about *how each color should map to grey*

• **the hard case — isoluminant colors collapse**: two distinct hues with the *same* luminance (a red and a green of equal brightness) map to the **same grey**, erasing the edge between them; **contrast-preserving decolorization** methods (Color2Gray, Gooch et al. 2005; Grundland & Dodgson 2007) optimise the mapping to keep **color contrast** rather than only luminance — the principled answer to "the green apple and red apple shouldn't vanish into one grey"

equations

grey $= w_R R + w_G G + w_B B$ with $\sum w_i = 1$ (and *in which space* — luma on gamma RGB vs luminance on linear RGB)

▸2.9.11 Color appearance models⧉

`fig-color-dimensions` · the dimensions of color (hue/chroma/lightness) 🟨

`fig-color-solid` · a perceptual color solid ✅

• dimensions of color perception:hue, chroma, etc.

• perceptual color space

• HSV and other UI-spaces

• color appearance models

▸2.9.12 Skin tones⧉

`fig-skin-tone-vectorscope` · skin-tone locus on a vectorscope ✅

⬜ figure not yet created

diverse skin tones fig-diverse-skin-tones

⬜ figure not yet created

a Shirley card fig-shirley-card

• a *memory color*: viewers are unusually sensitive to skin looking right

• the skin-tone locus; cameras and film are tuned to render skin pleasingly

• diversity of skin tones and historical bias (e.g. Shirley cards) — a fairness tie-in (see ethics)

▸2.10 Photography and camera 101⧉

• chapter intro: having built the physics, we now sit behind a real camera — its **controls, modes, and guts** — and see how phones reach the same goal by a wholly different (computational) route

▸2.10.1 Basic photography: exposure settings — shutter, aperture, and ISO⧉

`fig-exposure-triangle` · exposure triangle 🟨

`fig-fstop-progression` · f-stop power-of-two progression 🟨 pilot rendered

⬜ figure not yet created

aperture vs depth of field fig-aperture-dof

⬜ figure not yet created

shutter vs motion blur fig-shutter-motion-blur

⬜ figure not yet created

metering & the 18% gray assumption (snow scene → underexposed without +EV) fig-metering-18-gray

`fig-ettr-histogram` · expose-to-the-right: an image histogram pushed toward the highlights without clipping (vs an under-exposed one bunched at the noise floor) 🟨

• effect on light (linear, quadratic). Why f/# are a ratio: same amount of light (divide fov, compensate with aperture)

• side effects (motion blur, depth of field). Will derive later

• **f-stops are a log₂ (doubling) scale of light**: the *f-number* itself is a **ratio** N = f/D, but what a photographer counts is **stops**, and **one stop = a factor of 2 in light**. Because light ∝ aperture **area** ∝ D² ∝ 1/N², the standard series **1.4, 2, 2.8, 4, 5.6, 8, 11, 16** steps by **√2 in N** so the area (and the light) **halves** each step. A "stop" is therefore a **log₂ unit**: $\text{stops} = \log_2(\text{light ratio})$ — and aperture, shutter time, and ISO are all measured in this same stop currency, which is exactly why you can trade one against another (the exposure triangle). Cameras add **⅓-stop** clicks for fine control. [f-stop power-of-two progression figure]

• 💡 **Big lesson:** **photographers measure light in *stops* — a log₂ (doubling) scale.** Aperture, shutter, ISO, and dynamic range are all counted in stops (factors of 2), so **exposure is additive in log space** — the same "ratios matter, work in log" principle as gamma and tone mapping.

• 🖱️ **Interactive (web edition):** a live **exposure-triangle simulator** — two 3D subjects at different depths walk in opposite directions while you set shutter, aperture, ISO, scene brightness, and sensor size; it renders the scene *physically* (time supersampling → real motion blur, aperture supersampling → real depth of field, a shot+read-noise model → real ISO grain), and an "auto-set triangle" holds the exposure constant so you feel each control's trade-off, not just its brightness. [fig-exposure-triangle-sim; interactive]

• **autoexposure & metering**: the meter integrates scene light (TTL — center-weighted / **spot** / **evaluative-matrix**) and renders it to a **mid-gray (~18%) assumption**, which **fails on high-/low-key scenes** — a snowfield meters to gray and comes out *underexposed* (you add +EV); a black cat meters to gray and comes out *overexposed*. Matrix metering (e.g. Nikon's 3D Color Matrix, trained on tens of thousands of photos) adds brightness/color/contrast/distance cues to guess intent. [18%-gray metering figure]

• **where does 18% come from?** it's the reflectance of the standard photographic **grey card** and the meter's stand-in for the *average reflectance of a "normal" scene*. The key idea: **18% is middle grey *perceptually*, not numerically** — because lightness is roughly **logarithmic** (Weber–Fechner; cf. CIE $L^*$), the perceptual midpoint between a black (~2–3% reflectance) and a white (~90%) sits near their **geometric mean** (√(0.03·0.9) ≈ 0.16), i.e. ~**18%**, *not* 50%. It is **Zone V** of Ansel Adams's Zone System (the middle of eleven zones). A meter's job is then "assume the scene averages to middle grey, and expose so that average lands at middle grey."

• 💡 **the radiometric meaning of (shutter, aperture, ISO) — a worked example (and why not to trust it).** The three controls aren't arbitrary dials: they tie the photo to **absolute scene radiometry** through two calibrated relations.

• **ISO is calibrated to the sensor** (ISO 12232). The **saturation-based speed** is $S_{\text{sat}} = 78 / H_{\text{sat}}$, where $H_{\text{sat}}$ (in **lux·s**) is the focal-plane *luminous exposure* that just **saturates** the sensor; so a higher ISO means less light is needed to fill the well. (The standard also defines SOS / REI variants tied to a mid-grey output level — which is partly why bodies disagree.)

• **the exposure (metering) equation** ties the controls to scene **luminance** $L$ (cd/m²): a "correct" (middle-grey) exposure satisfies $\boxed{\,N^2/t = L\,S/K\,}$, with $N$ the f-number, $t$ the shutter (s), $S$ the ISO, and $K \approx 12.5$ the meter calibration constant (ISO 2720).

• **worked example** — **f/2.8, 1/100 s, ISO 400.** The scene luminance this renders to middle grey: $L = K\,N^2/(t\,S) = 12.5 \times 2.8^2 / (0.01 \times 400) = 12.5\times 7.84/4 \approx \mathbf{24.5\ \text{cd/m}^2}$. The **focal-plane exposure** actually delivered to the sensor follows the *camera equation* $H = q\,L\,t/N^2$ with the lens factor $q=\tfrac{\pi}{4}T\cos^4\theta \approx 0.65$ on-axis (DxO uses $q=0.71$): $H \approx 0.65\times 24.5\times 0.01/7.84 \approx \mathbf{0.020\ \text{lux·s}}$. Sanity check against saturation: at ISO 400, $H_{\text{sat}}=78/400 = 0.195$ lux·s, so middle grey sits $\log_2(0.195/0.020)\approx\mathbf{3.3\ \text{stops}}$ below clipping — that's your highlight headroom. So a bare $(t,N,S)$ triple **does** pin down scene radiance in real units.

• ⚠️ **but two big caveats** before you believe the number. **(1) the nominal values lie** — the marked/EXIF $N$, $t$, $S$ are not the true ones (rounded shutter, manufacturer-tuned ISO that can be **⅓–1 stop** off the measured value; → [[Basic image processing and ISP#Beyond the pixels: basic metadata and EXIF]], [@burggraaff-etal-2019]). **(2) not all pixels see the same light** — lens **vignetting** and the **cos⁴θ** natural falloff (→ [[#Sensors: photosites, CCD vs CMOS]]) mean **edge pixels get less light than the centre** for the same scene radiance, so "absolute radiance per pixel" needs a per-lens **flat-field** correction, not one global scalar. Honest radiometry therefore **calibrates the actual body + lens + aperture** rather than trusting the triplet. (DxOMark's "measured ISO" is exactly this saturation-based calibration; their old *ISO-sensitivity* measurements page is now folded into the [DxOMark glossary → ISO speed](https://www.dxomark.com/glossary/iso-speed/) and the [sensor-testing protocol](https://www.dxomark.com/dxomark-camera-sensor-testing-protocol-and-scores/).) [refs: ISO 12232; ISO 2720; [@burggraaff-etal-2019]]

• caveat (the **18 vs 12.5%** confusion): light meters are actually calibrated to a fixed **luminance constant** ($K$, ISO 2720) that, with typical lens flare, corresponds to **~12–13%** reflectance — so the famous "18% card" and the meter's true midpoint differ by **~⅓–½ stop**; metering off an 18% card gives a slightly different exposure than the meter's internal target, a perennial source of forum arguments.

• **priority modes & their failure cases**: aperture-priority (set N, camera picks t) and shutter-priority (set t, camera picks N) — each fails when no valid partner exists (f/1.4 in bright sun needs an impossible shutter; 1/1000 in dim light needs an impossible aperture)

• **"expose to the right" (ETTR)**: push the exposure as bright as possible **without clipping** highlights, since the **shadows** are where noise lives; **raising ISO** beats brightening in software because ISO gain amplifies the signal *and* the pre-amplifier noise but **not** the downstream **read noise** (→ Noise) [ETTR histogram]

equations

exposure ∝ t·(D/f)²

one stop = ×2 light = 1 EV

EV = log₂(N²/t)

▸2.10.2 Exposure modes, UI, and auto-ISO⧉

`fig-mode-dial` · exposure mode dial (P/A/S/M) → who sets aperture vs shutter in the exposure triangle ✅

`fig-telephoto-vs-retrofocus` · two two-group schematics — telephoto (+ then −, principal plane H′ pushed in front → physical length < f) vs retrofocus/inverted-telephoto (− then +, long back-focal distance to clear the SLR mirror); marks f vs physical length, H′, F′ ✅

`fig-exposure-triangle-sim` · live exposure-triangle simulator (web edition): two 3D subjects at different depths walking opposite ways; shutter/aperture/ISO/scene-light/sensor sliders with brute-force time + aperture supersampling (real motion blur, depth of field, shot+read noise) and an auto-set triangle; static fallback is a screenshot ✅

• the **PASM** modes are just *who picks which leg of the exposure triangle*: **P** (program — camera picks aperture+shutter, you can shift the pair), **A/Av** (you set aperture → DoF, camera picks shutter), **S/Tv** (you set shutter → motion, camera picks aperture), **M** (you set both); plus full **Auto** / scene modes

• **auto-ISO** is the modern fourth knob: in A or S the camera still has only one free leg — auto-ISO frees a *second* one by letting ISO float, so you can hold *both* a chosen aperture **and** a minimum shutter speed; you cap the **max ISO** (noise budget) and a **min shutter** (often a "1/focal-length" rule, or a factor of it)

• **exposure compensation** (±EV) biases the meter's mid-grey target — the manual override for the 18%-grey failure (snow → +EV, dark scene → −EV) without leaving an auto mode

• **M + auto-ISO** is a popular combo: you nail aperture and shutter for look, and let ISO be the automatic variable, with ±EV still steering it

▸2.10.3 Focus, autofocus, and depth of field⧉

`fig-af-families` · autofocus families: contrast-detect hill-climb · phase-detect sub-aperture offset · on-sensor/dual-pixel PDAF ✅

`fig-focus-conjugate-move` · focusing geometry: as subject distance u changes the image distance v must change (1/f=1/u+1/v); the lens slides to keep the image plane on the fixed sensor — near vs far focus (lens extends vs retracts) ✅

`fig-depth-of-field-sim` · live macro depth-of-field simulator (web edition): a to-scale thin-lens diagram + a 3D photo (aperture-supersampled bokeh) + a circle-of-confusion plot all sharing one optics model; subject dropdown (Meshy beetle / bee / ladybug) over a flower background, focus landing on the front of the face so the body and antennae blur; static fallback is a screenshot. *Bug & flower 3D models generated with Meshy AI.* ✅

• **the two classic passive AF families**:

• **contrast-detection (CDAF)**: drive the lens until image **contrast/sharpness is maximal** — accurate but has to **hunt** (it doesn't know which way is "more in focus", only that it overshot); historically slow, common on early mirrorless / compacts

• **phase-detection (PDAF)**: look at the scene through **two opposite edges of the lens** (sub-apertures) → two slightly shifted images; the **phase difference gives the direction *and* amount** of defocus in **one shot**, so the lens moves straight to focus (open-loop, no hunt). Classic SLR mechanism (a dedicated AF sensor under the mirror); it is literally a **tiny stereo measurement**

• **on-sensor PDAF / dual-pixel**: mirrorless bodies put phase detection *on the imaging sensor* — masked AF pixels, or **dual-pixel** designs that split each photosite into two halves that see opposite sub-apertures → phase detection at (nearly) every pixel, the best of both worlds (used for Pixel's portrait-mode depth too — a literal micro-stereo pair)

• **depth-from-defocus AF (Panasonic DFD)**: a third route — model how the lens's blur (PSF) changes with focus and read the **defocus direction and amount** from **two frames at slightly different focus**, giving a PDAF-like one-step cue from a plain **contrast** sensor (no phase pixels); rides on CDAF, needs a **per-lens blur profile** (full treatment → Optics, *Focus*)

• **active AF** (for completeness / history): time-of-flight ultrasonic (old Polaroid sonar) and IR rangefinding — work in the dark but limited; distinct from an **AF-assist lamp** that merely adds contrast for passive AF

• **focus modes**: **AF-S / one-shot** (lock once, for still subjects) vs **AF-C / AI-Servo** (track continuously, predict motion, for action); plus **MF** with focus aids (peaking, magnify — see UI)

• **focus selection & automation** — where computation now lives: single-point / zone / wide-area selection, then **subject-detection AF** — **face → eye → animal/bird/vehicle** detect that picks and **tracks** the subject (deep-learning driven; →ML). **Eye-AF** is the headline modern feature for portraits.

• **handling tricks**: **back-button focus** (decouple AF from the shutter button — focus with a thumb button, shutter only releases) and **focus-and-recompose** (lock focus on the subject, then reframe — beware the small focus-plane error at wide apertures / close range)

▸2.10.4 Lenses and focal length⧉

`fig-focal-length-families` · focal-length families as nested FOV cones (ultrawide/wide/normal/short-tele/tele) with angle of view + one-word use each ✅

• **primes vs zooms**: a **prime** is one focal length — typically **brighter (faster), sharper, smaller, cheaper** for its quality, and "makes you think / move your feet"; a **zoom** trades some of that for **convenience** (one lens, many framings). Fredo's standing advice: a cheap **35–85 mm f/1.8 prime** is razor-sharp value; modern zooms are good but the basic kit zoom is the weak point.

• **the focal-length families** (full-frame reference; *what each is for*):

• **ultrawide** (≤24 mm): landscapes, interiors, exaggerated near/far — watch the perspective stretch

• **wide** (~24–35 mm): environmental / reportage, "the scene + context"

• **normal** (~50 mm): close to the eye's perspective, unobtrusive, fast & cheap

• **short telephoto** (~85–135 mm): **portraits** — flattering compression, easy subject separation (see the portrait-distance bullet below)

• **telephoto / super-tele** (~200 mm+): sports, wildlife, the moon — strong compression, brings far subjects close

• 🖱️ **Interactive (web edition):** a live **dolly-zoom simulator** — the camera dollies in/out while the focal length zooms to hold the subject the same size, so *only* perspective changes and the background rushes in or falls away (the "Vertigo" effect); log focal-length (12–200 mm) and distance sliders with a magnification-preserving link, realistic environment maps, and a subject that is a bust or one of several diverse 3D humans. It makes "focal length vs distance / background compression" tangible (the Pinhole chapter's lesson). [fig-dolly-zoom-sim; interactive] *(human 3D models generated with Meshy AI — acknowledge in the caption)*

• **why portrait lenses flatter — it's the *distance*, not the lens**: a "portrait" focal length (~**85–135 mm**) flatters because it lets you **back away** from the subject, and what actually shapes a face is the **camera-to-subject distance**. Up close (a wide lens), the nose is much nearer the lens than the ears, so it looms — bulbous nose, big forehead, receding ears; step back and those distances equalize, giving natural proportions. Striking finding: the chosen **distance changes not just proportions but perceived *personality*** — closer faces read as less trustworthy / less attractive in rating studies. Cite **Perona 2007** ("A new perspective on portraiture," *Journal of Vision* 7(9):992) and the perceived-traits-vs-distance study (Bryan et al., PMC4114730: <https://pmc.ncbi.nlm.nih.gov/articles/PMC4114730/>). The flip side — the **wide-angle face stretch** up close — and its correction are in [[Perspective distortion and its correction]].

• 🖱️ **Interactive (web edition):** a live **face-distortion simulator** — set the focal length, the magnification (face size), and the eccentricity (how far off-centre the face sits), and watch the face's *shape* change: the close-wide "big-nose" perspective when a short lens forces you in, and the wide-angle stretch as the face nears the frame edge. A full-frame view and a face-cropped view (constant size) let you compare shape, not position; the subject can be a face, a grid sphere, or one of several diverse 3D human faces. [fig-face-distortion-sim; interactive] *(human 3D models generated with Meshy AI — acknowledge in the caption)*

• **fast vs slow**: a **fast** lens has a **large maximum aperture** (small f-number, e.g. f/1.4) → more light + shallower DoF; a **slow** lens (f/5.6) is smaller/cheaper but light-starved. Zooms are often slow and/or have a **variable** max aperture across the zoom range.

• **the "35 mm-equiv" reminder** (cross-ref, don't re-derive): a focal length only means a field of view *relative to a sensor size*; smaller sensors **crop** → multiply by the **crop factor** to get the full-frame-equivalent FOV (APS-C ≈ 1.5×, M4/3 ≈ 2×, phones much more). See the Pinhole chapter's "35 mm" / crop-factor sidebar.

▸2.10.5 Lens filters: polarizers, ND, and graduated ND⧉

`fig-reflection-removal-cues` · cues that separate a window reflection from the transmitted scene — ghosting/double-image, polarization, focus difference 🟨

⬜ figure not yet created

**ND** enabling a daylight long exposure (silky water) [photo before/after] fig-gain-vignette-solve

`fig-jpeg-artifacts-photo` · JPEG blocking & ringing on a REAL photographic edge (white egret vs dark foliage) at q=10: grid-aligned zoom with 8×8 overlay — blocking in the smooth area, ringing along the edge (File formats, BASIC) ✅

• **polarizer (circular polarizer, CPL)**: a rotatable filter passing one polarization. Because **reflection partially polarizes light** (Brewster's angle; Light & physics), rotating it **cuts glare and reflections** off non-metal surfaces (water, glass, foliage), **deepens blue sky** (skylight ~90° from the sun is strongly polarized) and **boosts saturation**. Costs ~1–2 stops; "**circular**" (a linear polarizer + quarter-wave plate) so the camera's beam-splitter AF/metering still work. **Not reproducible in post** — it changes *which* light is captured.

• **neutral-density (ND) filter**: a **uniform grey** that cuts light by a chosen number of **stops** with (ideally) no color shift → lets you use a **slow shutter** (silky water, motion blur) or a **wide aperture** in bright light. Strong NDs (6–10 stops) enable **daytime long exposures**; a **variable ND** is two stacked polarizers rotated to dial the strength.

• **graduated ND (GND)**: dark at the top, clear at the bottom, with a **hard / soft / reverse** transition. Aligned to the **horizon**, it holds back a bright sky so a high-contrast scene fits in **one exposure** — an *optical* alternative to **HDR bracketing** (Image measurements; Multiple exposure).

• **UV / clear "protective" filter**: today mostly lens protection (perennial debate: protection vs an extra surface that can add flare); historically cut UV haze on film.

• **throughline**: polarizer and GND effects are **partly or wholly un-doable in software** (the polarizer changes the captured light; the GND is a manual exposure blend) — which is why physical filters survive the computational era, in contrast to a plain ND (≈ an exposure change software *can* fake).

▸2.10.6 Keeping the lens clean⧉

⬜ figure not yet created

the cleaning kit + order of operations (blower → brush → microfibre / lens-pen → fluid)

`fig-lens-flare-ghosting` · stray light: a bright source, a ray reflecting off two internal lens surfaces → a "ghost" landing displaced on the sensor, plus a veiling-glare wash; ghost vs flare labelled ✅

• a smudged or dusty front element **scatters light** → **veiling flare**, lower contrast, and soft dust spots (worst stopped down) — clean glass is real image quality, not fussiness

• **order of operations**: **blower first** (lift grit without dragging it), then a soft **brush**, then a **microfibre cloth / lens pen**, and only then **lens fluid** for stubborn smears — wipe **gently, centre-outward**; never a dry rough wipe (it grinds grit into the **coating**)

• **don't overdo it**: every wipe risks the anti-reflection coating, and a little dust is optically harmless; a **protective/UV filter** or the **lens hood** prevents most of the problem

• **sensor dust** is the cousin problem — dust on the **sensor (or its cover glass)** shows as **soft dark spots fixed in the frame** (vs moving with the lens), most visible at **small apertures** on plain areas (a clear **sky**); the cure is cleaning (rocket **blower**, then a **wet swab**) plus in-camera **dust-removal / ultrasonic shake** and **dust-mapping** for software subtraction — a maintenance reality of **interchangeable-lens** cameras (changing lenses lets dust in) (forward-ref dust-spot removal, BASIC)

▸2.10.7 Image stabilization⧉

`fig-ois-vs-ibis` · two stabilization strategies side-by-side: lens-shift OIS (floating lens group moves) vs sensor-shift IBIS (sensor moves on a magnetic stage), moving element highlighted, same gyro/IMU input ✅

`fig-stabilization` · image stabilization: hand-held blur vs OIS (lens shift) vs IBIS (sensor shift); ~2–5 stops, camera shake not subject motion ✅

• **what it does**: a gyroscope senses the camera's **angular shake** and a feedback loop counters it during the exposure, so you can hand-hold **slower shutter speeds** — typically a **gain of ~2–5 stops** (e.g. 1/100 s at 1000 mm on a monopod, per the slides)

• **three implementations**:

• **OIS — optical, in-lens**: a floating lens element shifts to counter shake (Canon IS, Nikon VR) — tuned per lens, also stabilizes the viewfinder/AF

• **IBIS — in-body**: the **sensor** itself rides a moving stage (Panasonic, Olympus, Sony) → works with *any* mounted lens, and can stack with OIS; also enables sensor-shift super-resolution / high-res modes

• **digital / EIS**: crop a margin and **warp each frame** to cancel motion — cheap, no moving parts (phones, video), but costs resolution/FOV and can add wobble

• **the crucial limit** — it fixes **camera shake, not subject motion**: a stabilized 1/4 s shot of a *static* scene is sharp, but a *moving* subject still blurs (you need a fast shutter for that). State this explicitly — a common beginner confusion.

• **sidebar — IS hacking**: the stabilizer is a controllable blur actuator — driving it deliberately enables **custom motion blur** / coded exposure (Bando–Holtzman ICCP 2011) → Advanced

▸2.10.8 UI and display⧉

`fig-viewfinder-types` · viewfinder types: OVF (optical) vs EVF (electronic) vs rear LCD ✅

• **viewfinders**: **OVF** (optical — an SLR mirror/prism or a rangefinder window) shows the real world directly, zero lag, but **not** the captured exposure/WB; **EVF** (electronic, mirrorless) and the **rear LCD** show the *sensor's* live feed — slightly laggy, but **WYSIWYG**: exposure, white balance, and depth-of-field preview all visible before the shot

• **exposure preview & the live histogram**: the EVF/LCD can show the image *as it will be captured* (exposure simulation), with a **live histogram** to judge clipping; this is the practical home of **ETTR** — push the histogram right without clipping

• **focus aids**: **focus peaking** (highlight high-contrast in-focus edges) and **magnify** for manual focus; **zebras** (animated stripes over near-clipped highlights) for exposure — borrowed from video

• the through-line: an EVF turns the *abstractions* of exposure and focus into **direct visual feedback**, which is why mirrorless changed how people shoot

▸2.10.9 Video modes⧉

`fig-shutter-angle` · shutter angle sets the blur — a rotating-disc shutter at $0°/180°/360°$ admitting a smaller/larger fraction of the frame interval $T$; $180°$ ($\tau=T/2$) is the cinematic film-look, small angles strobe 🟨

• **frame rate & shutter angle**: video sets exposure time as a fraction of the frame interval — the **180° shutter-angle** convention (shutter ≈ ½ the frame time) gives "natural" motion blur; high frame rate → slow motion

• **rolling shutter / "jello"**: CMOS reads the sensor **row by row**, so fast motion (or a fast pan, or a spinning prop) **skews/wobbles** — a **global shutter** avoids it but is rarer/costlier (ties back to CMOS readout in Image measurements as integrals)

• **log / flat profiles + grading**: shoot a **log** (flat, low-contrast) profile to preserve dynamic range, then **color-grade** in post — the video cousin of raw + tone curve

• **codecs & bitrate**: intra- vs inter-frame compression, 8- vs 10-bit, chroma subsampling (4:2:0 vs 4:2:2), bitrate — trade file size vs editing/grading headroom

• *(kept deliberately brief — the algorithms and the full motion treatment are in **Motion/Video**)*

▸2.10.10 Programmability, or the lack thereof⧉

`fig-bokeh-shapes` · bokeh: out-of-focus highlights take the aperture's shape — circular (many blades / wide open) vs polygonal (few blades, stopped down); synthetic defocus-disk fields + aperture insets ✅

• most cameras run **closed firmware**: you cannot run your own code on them, so the **computational-photography ideas in this book can't be tried on a stock camera** — a real obstacle for experimentation

• the **hacks**: **CHDK** (Canon PowerShot) and **Magic Lantern** (Canon DSLRs) are community firmware add-ons that expose scripting, raw video, bracketing, intervalometers, and focus stacking — born exactly because the manufacturers wouldn't

• the **open road**: **Android's Camera2 / CameraX** API (and Google's earlier FCam research camera) expose per-frame control of exposure / focus / gain and access to **raw burst** — which is *why phones, not cameras, became the home of computational photography*

• the **maker road**: a **Raspberry Pi + camera module** (the HQ camera, driven by **libcamera / picamera2**) is a cheap, fully **open and programmable** camera — per-frame exposure/gain control and **raw** access on a hobbyist board — the natural platform for *trying this book's algorithms* and building DIY computational cameras

• **why this matters for the book**: our algorithms (HDR+, burst denoising, computational bokeh) need **programmable capture** — control of the per-frame integral and access to raw; the platform that grants it is where the research happens

▸2.10.11 Anatomy of a modern full-frame interchangeable-lens camera⧉

• the **mirrorless block diagram** (light → bits): **lens + mount** (electronic contacts carry AF/aperture/IS) → **sensor on an IBIS stage** → **shutter** (a mechanical curtain *and/or* an electronic/rolling shutter) → **image processor + AF** (on-sensor PDAF, subject detect, the ISP) → **EVF + rear LCD** (the live feed) → **card + battery**

• the defining change from the **SLR**: the **mirror box and optical prism are gone** — the sensor feeds the EVF directly, which is what enables WYSIWYG preview, on-sensor PDAF across the frame, silent electronic shutter, and tighter lens-to-sensor distances (shorter flange distance → new mounts)

• each block is a previous section made physical — a good place to consolidate the chapter

▸2.10.12 Anatomy of cell-phone cameras⧉

`fig-phone-camera-array` · phone multi-camera array (ultrawide/wide/tele+periscope, tiny sensors, bright fixed lenses) → a computational-pipeline block → final photo ✅

• the hardware constraints: a **tiny sensor** behind a **bright, fixed-aperture lens** (no iris — aperture is fixed, exposure is shutter + ISO only); tiny sensor → **huge depth of field** (why phones must *fake* background blur) and **more noise** (why they burst-and-average)

• a **multi-camera array** stands in for a zoom: separate **ultrawide / wide / tele** modules, the long end often a **periscope** (a prism folds the light path sideways so a long focal length fits a thin body); switching "zoom" really switches cameras (+ crop/fusion between them)

• 💡 the key idea: the **computational pipeline does the heavy lifting** — demosaicking, multi-frame **HDR+ / burst denoising**, **computational bokeh** (depth from dual-pixel / multi-camera), night mode, super-resolution. The phone overcomes small-sensor physics with **computation**, not glass → forward-ref **Basic ISP** and **Multiple-exposure**.

• this is the chapter's punchline and the bridge to the rest of the book: cheap optics + a smart pipeline beats good optics alone

▸2.10.13 Types of cameras⧉

`fig-lens-design-tradeoff` · lens-design Pareto cartoon as a 6-axis radar (sharpness, speed, size, weight, cost, distortion): fast pro prime vs compact phone lens overlaid, plus a dashed polygon showing where software correction shifts the phone lens (Lens optimization) ✅

• **by sensor size** (the spine — sets DoF, light-gathering, and body size; cross-ref crop factor): **phone** → **1-inch compacts** → **Micro Four Thirds** → **APS-C** → **full-frame (35 mm)** → **medium format** → **large-format / view cameras**

• **by lens**: **fixed-lens** (phones, compacts, bridge/superzooms, instant) vs **interchangeable-lens (ILC)** (mirrorless, DSLR, medium format)

• **by mechanism / viewfinder**: **DSLR** (mirror + OVF, legacy) vs **mirrorless** (EVF, dominant — the anatomy above); **rangefinder** (Leica-style window + focus patch); **point-and-shoot / compact**; **bridge / superzoom** (fixed huge-range zoom on a small sensor); **action / 360** (rugged, ultrawide, fixed); **instant** (Polaroid / Instax, self-developing print); **glasses / wearable cameras** (the camera leaves your hands entirely — Google Glass, Snap Spectacles, Ray-Ban Meta, GoPro-on-a-head; **point-of-view, always-available capture**, raising the obvious **privacy / consent** questions — cross-ref Ethics); **film** bodies (still made — see the darkroom below); **view / technical cameras** (large format + movements → tilt-shift, Optics)

• **by platform — how the camera is *carried* (aerial photography)**: the **drone / UAV** is the modern default (a gimballed camera that flies), but it sits at the end of a long lineage of getting a camera *above* the scene when no tripod will reach — **balloon** photography (Nadar over Paris, 1858), **kite** aerial photography (Arthur Batut, 1880s; the rig + a timer/altitude trigger), **pigeon** cameras, and **pole / mast** photography (a camera on a tall pole or monopod for low-altitude overhead frames). The platform buys a **viewpoint**, not a different sensor — the same imaging, lifted.

• the practical lesson: **sensor size and lens system, not megapixels**, mostly separate these — the rest is ergonomics, platform, and price

▸2.10.14 Cameras beyond photography⧉

`fig-lagrangian-vs-eulerian` · the organizing diagram — Lagrangian (arrows following particles over time → flow and tracking, output trajectories) vs Eulerian (a fixed grid, each pixel plotting intensity-over-time → magnification, never asking where anything went) 🟨

• **industrial / machine vision**: defect inspection, line metrology — often **global-shutter** mono sensors with fixed lighting and GigE/USB3 interfaces; the "image" is really a **pass/fail or a measurement**

• **barcode & QR readers**: dedicated retail/warehouse scanners, and the **QR / barcode** reader built into every phone camera — the camera as a **data input** (locate → rectify → decode a 1D/2D code to a number or URL), a tiny computer-vision pipeline rather than a picture

• **robotics & drones**: cameras for **SLAM**, obstacle avoidance, **visual-inertial odometry** (camera + IMU), and grasping — output is **pose / depth / a control signal**, not a picture

• **automotive (ADAS / self-driving)**: surround cameras fused with radar and lidar for lane-keeping, sign and pedestrian detection — **high dynamic range and reliability** over prettiness

• **scientific & specialty**: microscope and telescope cameras, **high-speed** cameras, **thermal / multispectral / hyperspectral**, and **event cameras** (per-pixel brightness-change sensors) — many revisited in Advanced

• **cooled astronomy cameras**: dedicated deep-sky cameras **actively cool the sensor** (a thermoelectric **Peltier** stage, often to a fixed *set-point* tens of °C below ambient) to crush **dark current** for minutes-long exposures; typically **mono** (paired with a filter wheel — LRGB / narrowband), **regulated** so dark frames match, and **no IR-cut / no Bayer** for sensitivity — the noise chapter's thermal term, attacked with refrigeration → cross-ref *Denoising by averaging* (stacking) and the noise section

• **medical**: endoscopes, retinal and surgical cameras, capsule cameras — imaging in service of diagnosis

• **ubiquitous & embedded**: webcams and video-conferencing, **AR/VR** inside-out and eye tracking, doorbell / CCTV security, automotive cabin sensing — cheap modules, computation-heavy

• throughline (→ Intro): once the camera is a **measurement device**, "good" stops meaning *pretty* and starts meaning **accurate, fast, and robust** for the task

▸2.10.15 A camera's other sensors⧉

`fig-pinhole-imaging` · imaging-scenario series (2/3): add a pinhole to the bare sensor — one ray per scene point → a dim **inverted** image (same tree+sensor+colours as fig-bare-sensor-averaging) ✅

⬜ figure not yet created

deferred]

• the image sensor isn't the only sensor; a small suite surrounds it, and several feed the *computational* pipeline, not just the metadata

• **IMU** (accelerometer + gyroscope) — the camera's sense of its own motion; drives **stabilization** (tells OIS/IBIS which way to counter-shift), auto-rotate / level horizon, and increasingly **computation**: EIS, seeding burst/panorama alignment, gyro-based deblur (knowing the motion helps estimate the blur kernel) → Motion / Deblurring

• **context / metadata sensors**: **GPS/GNSS** (geotag → EXIF), **magnetometer/compass** (heading; AR), **barometer** (altitude)

• **ambient-light sensor (ALS)**: a small front-facing photodiode — increasingly a multi-channel **color / spectral** version — that reads the **level and color of the surrounding light** (a *non-imaging* light meter sitting beside the imaging sensor). It drives **auto screen brightness**, seeds a **white-balance prior** (what illuminant are we under? → cross-ref automatic white balance, Color technology), and a fast variant detects the **flicker** of mains-powered fluorescent/LED lighting (the 50/60 Hz pulsing) so the camera can phase its exposures to dodge **banding** (*flicker-free / anti-banding* mode)

• **dedicated focus/depth sensors**: **laser / ToF** rangefinder for fast low-light AF, **lidar** scanner for coarse depth (portrait blur, AR) — beyond on-sensor PDAF

• **microphones** (video sound; arrays → directional/wind-aware audio; sync with IMU), **housekeeping** (thermometer → dark-current/noise comp; Hall sensors → lens/stabilizer element position; real-time clock → timestamps)

• throughline: a camera is now a small **sensor-fusion** platform; the side channels (esp. the IMU) are part of *how the picture is computed*, not just notes attached to it

▸2.10.16 Flash and lighting⧉

`fig-point-op-levels` · levels on a real (flat/hazy) photo: bunched-up luma histogram stretched out to fill [0,1] by setting a black point and a white point — input + after + transfer curve with the clip-and-stretch anchors and the two histograms (Point operations → Black point, white point, and levels, BASIC) ✅

`fig-wide-angle-perspective-distortion` · wide-angle perspective distortion: off-axis spheres image as radially-stretched ellipses (flat-sensor geometry), and faces near a wide frame's edge are widened — not a lens flaw; forward-ref to correction (Pinhole image formation) ✅

• **TTL metering**: the camera fires a **pre-flash**, meters the return **through the lens**, and sets flash power automatically — the flash analog of auto-exposure

• **sync speed & high-speed sync**: a **focal-plane** shutter only fully uncovers the sensor up to its **X-sync speed** (~1/200 s) — faster than that and a moving slit means the flash can't light the whole frame; **high-speed sync (HSS)** pulses the flash rapidly to cover the slit (at a big power cost). A **leaf shutter** (in-lens) opens fully at *any* speed → **syncs at all shutter speeds** (a real advantage for daylight fill).

• **fill flash**: in harsh/backlit light, add flash at **−1 to −2 EV** (the slides say −1.5 to −2, i.e. 3–4× below ambient) to **open the shadows** without looking flashed — it's **dynamic-range management**, not "more light"

• **bounce & off-camera**: never bare flash head-on (flat, harsh, red-eye) — **bounce** off a ceiling/wall (~45°) or use a diffuser to enlarge the source → softer light; **off-camera** flash gives directional, shaped light (→ Computational illumination; computational bounce flash, Davis SIGA 2016)

• 🖱️ **Interactive (web edition):** a live **portrait-lighting simulator** — three soft area lights (key / fill / kicker, each with azimuth, elevation, size/softness, intensity, and colour) on a 3D face, with **area-light-supersampled soft shadows** and a physiological melanin/hemoglobin skin model; a from-behind "setup" view shows the lights as studio umbrellas plus the camera, and lighting presets (butterfly, loop, Rembrandt, split, clamshell, rim) let you feel how light *placement and size* sculpt a face. [fig-portrait-lighting-sim; interactive] *(3D studio umbrella generated with Meshy AI — acknowledge in the caption)*

▸2.10.17 The traditional darkroom⧉

⬜ figure not yet created

chemical darkroom — film develop pipeline + enlarger printing, labelling exposure time / contrast grade / dodging / burning [fig-darkroom-enlarger fig-darkroom-enlarger

⬜ figure not yet created

deferred]

• film **development**: developer (exposed silver-halide → metallic silver) → **stop bath** → **fixer** (dissolves unexposed grains) → a **negative** (dark where the scene was bright)

• **printing**: an **enlarger** projects the negative onto light-sensitive **paper**, developed in turn → a positive print; the enlarger is a *second, fully controlled exposure*

• print controls: **exposure time** = overall lightness; **contrast grade** (paper grade / filter) = tone-mapping steepness; **dodging** = hold light back → lighter region; **burning** = add light → darker region

• the origin of the **dodge/burn** tools, the **exposure/contrast** sliders, and the **Zone System** (the 18% gray of metering = Zone V); makes the "print = performance of the negative" metaphor concrete

▸2.10.18 The digital darkroom: editing software⧉

⬜ figure not yet created

Lightroom-style develop panel vs Photoshop-style layer workspace, each tool annotated with the chapter that derives it [fig-editing-lr-vs-ps fig-editing-lr-vs-ps

⬜ figure not yet created

schem/screenshot

⬜ figure not yet created

deferred]

• **parametric / non-destructive (Lightroom style)**: the original file is untouched; edits are a re-rendered *recipe*. Global: white balance, exposure, contrast, highlights/shadows/whites/blacks, tone curve, HSL, clarity/texture/dehaze, sharpening, noise reduction, lens corrections, crop. Local: masks, graduated/radial filters, AI subject/sky masks. Plus **asset management** (catalog, rating, keyword, search) across whole shoots

• **pixel editor (Photoshop style)**: edits pixels directly — layers, selections, masks; clone stamp & healing brush, content-aware fill, dodge/burn, curves/levels adjustment layers, filters, text, montage → retouching & compositing

• the split: **global · reversible · many photos** vs **local · pixel-level · one photo**; serious workflows use both (develop in LR → retouch/composite in PS)

• 💡 the bridge: **the slider is the interface; the rest of the book is the machinery** — name each tool and forward-ref the chapter that explains it