💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
Computational Photography, an AI-powered Slopendium — 02 Fundamentals
expand to📖 Full book outlinejump to1 parts · 10 chapters · 70 sections · 166 figures embedded · 26 placeholders · double-click a figure to enlarge
Part 2 FUNDAMENTALS OF PHOTOGRAPHY
fig-light-journey
fig-light-journey · FUNDAMENTALS part opener — the journey of light: source → scene (reflection) → lens → sensor / retina → visual system, each step labelled with its chapter
**Photography means writing with light** — from the Greek *phōs* ("light") and *graphē* ("drawing"). This first part follows light on its entire journey and studies each step in turn.
• **the path of light** (and the chapters that follow it): a **source** emits light → it strikes **objects** and is reflected, absorbed, and scattered (the scene) → it passes through a **lens** → it lands on a camera **sensor** or the **retina**'s cones and rods → and the **visual system** interprets it.
• **Light and physics** — what light *is* (wave / ray / photon), how it interacts with matter (reflection, refraction, scattering, the BRDF), and how we *measure* it (radiometry: radiance, irradiance, the inverse-square and cosine laws).
• **Image formation and linear perspective** — how a lens turns 3-D rays into a 2-D image (pinhole, perspective projection), depth of field, the **sensor** and the exposure triangle, and **noise**.
• **Human perception and color** — the eye and visual system; why three **cones** make us trichromatic; color as the projection of a spectrum onto three numbers; opponent processing, constancy, and the contrast sensitivity function.
• **Color technology** — the engineering payoff: **measuring** color (CIE), **encoding** it (linear / gamma / log, color spaces, CIELAB), and **reproducing** it (additive vs subtractive synthesis, gamuts, white balance, color management).
• **Human factors and the art of photography** — making *good* pictures (composition, light, portraiture), the perception of images, and the **ethics** of photography.
• **skippable, but foundational**: a reader who only wants to code can jump straight to BASIC IMAGE PROCESSING — but the **big lessons** of this part recur everywhere later. The ones to carry forward:
• **multiplicative vs additive** — much of imaging is *multiplicative* (illumination × albedo, contrast), some *additive* (incoherent light from different sources, blur); the right **encoding** (linear / log / gamma) follows from which one you are doing (→ [[Big Lessons]]).
• **gamma / ratios matter** — perception is roughly logarithmic, so we store images **gamma-encoded** to spend code levels where the eye looks; what matters is *ratios*, not absolute values.
• **noise is affine** — sensor noise is **read + shot** (a constant plus a term that grows with the signal); because shot noise is **Poisson**, SNR is worst in the **shadows**, which is exactly where noise *looks* worst.
• **projective geometry is matrices + a division** — perspective projection is a **linear map in homogeneous coordinates** followed by a divide-by-depth; that single trick handles cameras, warps, and stereo.
• **color is non-orthogonal and non-negative** — the cone "axes" overlap and light cannot be negative, which is what makes color sensing and reproduction genuinely hard (no perfect set of primaries; white balance is never exact).
• **by the end of this part** you can trace a photon from a lamp to a perceived color, reason quantitatively about exposure and noise, and you will carry the small set of principles the rest of the book keeps reusing.
2.1 Light and physics
• chapter intro: a working model of light — its **color** (spectrum), its **interaction with surfaces**, and its **energy** (radiometry); closes with the **plenoptic function** that ties the pipeline together
2.2 Pinhole image formation and linear perspective
fig-camera-obscura
fig-camera-obscura · camera obscura / pinhole (artist's) 🟨
fig-camera-obscura-apparatus
fig-camera-obscura-apparatus · camera obscura apparatus (Diderot engraving) 🟨
fig-bare-sensor-averaging
fig-bare-sensor-averaging · a bare sensor averages all rays 🟨
fig-pinhole-fov
fig-pinhole-fov · pinhole geometry: real (inverted) vs virtual (upright) image plane, and focal length → field of view 🟨
fig-perspective-projection
fig-perspective-projection · perspective projection equation: x'=f·X/Z, y'=f·Y/Z (divide by depth, scale by f) 🟨
fig-perspective-projection-3d
fig-perspective-projection-3d · perspective projection in 3D: P=(X,Y,Z) → p=(x,y) on the image plane through the pinhole 🟨
fig-perspective-vanishing-points
fig-perspective-vanishing-points · perspective projection + vanishing points 🟨
⬜ figure not yet created
focal length vs subject distance (background compression) fig-focal-length-compression
fig-face-distortion-sim
fig-face-distortion-sim · live face-distortion simulator (web edition): focal length + magnification + eccentricity controls with a full-frame view and a face-cropped view, showing the close-wide "big nose" perspective and the wide-angle edge stretch at constant face size; subject = a face, a grid sphere, or one of six diverse Meshy humans framed on the face; static fallback is a screenshot. *Human 3D models generated with Meshy AI.*
fig-dolly-zoom-sim
fig-dolly-zoom-sim · live dolly-zoom simulator (web edition): hold the subject size while focal length and camera distance trade off so only perspective / background scale changes; log focal (12–200 mm) + distance sliders, realistic Poly Haven environments, subject = head bust or one of six diverse Meshy humans. *Human 3D models generated with Meshy AI.*
fig-portrait-lighting-sim
fig-portrait-lighting-sim · live portrait-lighting simulator (web edition): three soft area lights (key/fill/kicker — az/el/extent/intensity/colour, area-light-supersampled soft shadows) on a 3D face with a physiological melanin/hemoglobin skin model; a from-behind setup view with Meshy studio umbrellas + a camera rig; lighting presets; static fallback is a screenshot. *3D umbrella generated with Meshy AI.*
fig-keystoning-cause
fig-keystoning-cause · keystoning is the tilt, not the lens — a camera tilted up makes world-parallel verticals converge (façade → trapezoid), while the fronto-parallel shot keeps them parallel; projection preserves lines, not parallelism 🟨
fig-point-line-duality
fig-point-line-duality · point↔line duality in homogeneous 2D: two points → a line, two lines → a point (same cross product) 🟨
fig-projection-decomposition
fig-projection-decomposition · projection-matrix decomposition K·[R matplotlib, 3-stage pipeline
fig-crop-focal-length
fig-crop-focal-length · cropping = changing focal length: full frame vs a crop box ≡ a longer-focal-length capture, upsampled 🟨
fig-depth-vs-ray-length
fig-depth-vs-ray-length · depth (z coordinate) vs ray length ‖P‖ from camera to a 3D point; unproject (pixel + depth → 3D) 🟨
• sensor alone would capture a diffuse light. We need to select which rays from which part of the scene contribute to which pixel.
equations
pinhole projection x = f·X/Z, y = f·Y/Z
homogeneous p ≈ K·[R|t]·P
field of view = 2·arctan(sensor / 2f)
2.3 Lens image formation
• chapter intro: from the **pinhole** (sharp but dim) to a **lens** (bright but with one focus plane) — refraction at curved surfaces, the thin-lens linearization, and the depth-of-field that the finite aperture forces on us
2.4 Image measurements as integrals
• chapter intro: the unifying idea behind sensing — **measurement = integrating a slice of the plenoptic function** — then the physical sensor that does it (photosites, CCD/CMOS) and the **noise** that limits it
• sidebar — **photography as objective measurement**: the medium's long arc bends toward objective measurement (film/print = a *performance*; → measurement). The **digital sensor** is the far end: because each photosite *counts photons*, a **raw** image is ≈ a **calibrated radiometric measurement** — a physical map of scene radiance, in real units, of one snapshot of time — not just a rendering. This is what makes HDR / depth / deblur possible: the pixel numbers must *mean* something physical. Ties to the intro's **generation → measurement → generation** arc (→ [[01 Intro]]).
2.5 Sensors: photosites, CCD vs CMOS
fig-sensor-microlens
fig-sensor-microlens · sensor cross-section: microlens → filter → photosite 🟨
fig-rolling-shutter-pan
fig-rolling-shutter-pan · a slanted, sheared vertical pole from an uncorrected fast pan vs a straightened, rectified version, illustrating line-by-line CMOS readout during camera motion 🟨
fig-slowmo-axis
fig-slowmo-axis · four ways to treat the time axis on a shared timeline — normal capture (sparse), high-speed (dense true samples), interpolation (sparse reals with synthesized in-betweens), and long-exposure blur (the integral); only interpolation adds resolution after capture and can be wrong 🟨
fig-mirrorless-anatomy
fig-mirrorless-anatomy · labeled cutaway of a mirrorless full-frame camera (lens+mount, sensor/IBIS, mech+electronic shutter, EVF, processor/on-sensor PDAF, card+battery); SLR mirror-box inset for contrast
fig-rolling-shutter-skew
fig-rolling-shutter-skew · rolling-shutter distortion — a global shutter keeping a vertical pole upright under a fast pan vs a rolling shutter where per-row readout $t(r)=t_\text{frame}+r\,t_\text{row}$ shears the pole and smears fan blades (the jello effect) 🟨
fig-flash-sync
fig-flash-sync · flash & focal-plane-shutter sync: whole frame ≤ X-sync · slit/partial frame above it · high-speed-sync pulse train; fill-flash noted
fig-darkroom-dodge-burn
fig-darkroom-dodge-burn · the darkroom ancestor: dodging (hold back light) and burning (add light) under the enlarger as hand-painted spatially-varying exposure 🟨
2.6 Noise, signal-to-noise ratio and dynamic range
⬜ figure not yet created
noisy image + flat-patch histogram (ISO 3200) fig-noise-histogram
⬜ figure not yet created
noise vs ISO fig-noise-vs-iso
fig-noise-affine
fig-noise-affine · measured noise variance is affine in brightness (σ²≈gain·I+read²) — real per-pixel variance vs mean from a 50-frame aligned ISO-3200 burst, with the affine fit, read-noise floor, and highlight roll-off (Noise, SNR, dynamic range)
fig-noise-gaussian-pixels
fig-noise-gaussian-pixels · a single pixel's value across many frames is ≈ Gaussian — per-pixel histograms from the ISO-3200 burst with Gaussian overlays (Noise)
fig-noise-truncation
fig-noise-truncation · noise clips at black/white so it is not zero-mean at the extremes — a near-black pixel pinned at 0 ~44% of frames, its average biased bright, vs an unbiased midtone (Noise; denoising trap)
fig-snr-vs-stddev
fig-snr-vs-stddev · noise std-dev map vs SNR map of a brightness ramp: std rises with brightness (∝√N), but SNR=√N is worst in the shadows 🟨
⬜ figure not yet created
**underexposure recovery — full-frame vs phone**: the same underexposed scene pushed up several stops in software, the full-frame frame cleaning up while the phone reveals amplified shadow noise / banding [fig-underexposure-recovery-ff-vs-phone fig-underexposure-recovery-ff-vs-phone
⬜ figure not yet created
photo] fig-results-montage
fig-dynamic-range-comparison
fig-dynamic-range-comparison · dynamic-range ladder in **stops** (horizontal bars): colour slide (~5–6) · reflective print (~6–7) · film negative (~12–13) · phone sensor (~10–12) · full-frame sensor (~14) · human eye instantaneous (~10–14) vs adapted (~20+) · an animal example · a typical sun-and-shadow scene (>20) — shows why no single capture holds a high-contrast scene (→ HDR)
⬜ figure not yet created
an animal or two
• the noise **sources** (their variances add):
• **photon / shot noise** — the Poisson statistics of *counting photons*: variance = mean, so σ = √N. Fundamental (it's the light itself), dominates the midtones/highlights; averaging N independent frames cuts it by √N.
• **read noise** — added by the amplifier / ADC on readout; signal-independent, dominates the **shadows** and sets the noise floor (hence dynamic range).
• **thermal / dark current** — electrons freed by heat, independent of light; grows with exposure time and temperature (long exposures & astrophotography → sensor cooling, dark-frame subtraction).
• **fixed-pattern / pattern noise** — per-pixel gain/offset non-uniformity (PRNU/DSNU), hot/stuck pixels, banding; *structured*, so it reads as worse than random noise of equal magnitude, and is removed by calibration (flat / dark fields).
• 💡 **Big lesson — in a *linear* image, noise variance is an *affine* function of brightness.** The variances add: shot noise gives a term **proportional to the signal** (Poisson, variance = mean, scaled by the gain) and read noise a **constant** floor, so the total is **variance ≈ gain·signal + read²** — a straight line in brightness. You can *measure* it: take an aligned **burst** of a static scene and, per pixel, plot the variance across frames against the mean — the points trace that line, slope = photon gain, intercept = the read-noise floor. And a single pixel's value over many frames is **≈ Gaussian**, which is what licenses additive-Gaussian noise models. [`fig-noise-affine` — per-pixel variance vs brightness, measured from a 50-frame ISO-3200 burst; `fig-noise-gaussian-pixels` — per-pixel histograms] → register [[Big Lessons]]
• 💡 **Big lesson — noise is *clipped* at black and white, so near the extremes it is *not* zero-mean.** A recorded value is clamped to [0, max]: near black the negative half of the noise is cut off (frames pile up at 0), near white the positive half is. So at the extremes the noise distribution is **asymmetric** and its mean is pushed **inward**. The consequence for **denoising**: any naive average/smoothing returns that biased mean, so it makes **shadows come out too bright and highlights too dark** — a bias every denoiser has to correct for (carried forward to [[#Denoising]] in BASIC). [`fig-noise-truncation` — a near-black pixel pinned at 0 a third of the time, its average biased bright; from the ISO-3200 burst] → register [[Big Lessons]]
• **the key intuition — it's about ratios (SNR)**: shot noise is **Poisson**, so σ = √N grows with brightness — measured noise is actually **higher in the highlights**. Yet noise *looks* worst in the **shadows**, because perception cares about the **signal-to-noise ratio** SNR = N/√N = √N, which is worst where N is small. "Ratios are all that matters" — the same reason we encode in gamma/log, and the motivation for ETTR (below — Photography 101). [std-dev map vs SNR map figure]
• **dynamic range** = the brightest recordable signal (sensor **saturation / full-well**) ÷ the **noise floor** (read noise in the shadows); it sets how much of a high-contrast scene fits in a *single* exposure → the motivation for **HDR** (Multiple exposure)
• **HDR (high dynamic range)**, sometimes called **wide dynamic range (WDR)**, is a deliberately **fuzzy** term — flag this. It gets used for at least four different things: the *scene* (a high-contrast world), the *capture* (multi-exposure / a high-DR sensor), the *file/encoding* (float or ≥10-bit radiance, e.g. OpenEXR, HDR10), and the *display* (an HDR monitor) — and, loosely, for the **tone-mapped "HDR look."** There is no single sharp definition; say which sense you mean. (Developed in [[Multiple exposure imaging]].)
• 💡 **Big lesson (L15) — dynamic range is set by full-well capacity over the noise floor**: a single exposure records from a **top** (the photosite's **full-well capacity** — where it clips to white) down to a **bottom** (the **noise floor** — read noise in the shadows); their **ratio is the dynamic range** (quoted in stops). You **widen** it with a **bigger well** (larger photosites — why a full-frame sensor out-ranges a phone) or a **lower floor** (cooling, lower read noise), and you **beat** it altogether by merging **multiple exposures** (HDR). The capture-side companion to **L6**: what bounds a shot is this *range*, not the bit depth. → register [[Big Lessons]]
• **examples, in stops** (the figure): a color **slide** is narrow (~5–6 stops), a reflective **print** ~6–7, **film negative** wide (~12–13 latitude), a **phone** sensor ~10–12, a **full-frame** sensor ~14; the **human eye** is ~10–14 stops *instantaneously* but ~**20+** once **adaptation** is allowed — and some **animals** do better in their niche; an outdoor sun-and-shadow scene can exceed **20 stops**, which is why no single capture holds it [fig-dynamic-range-comparison]
• **see it — recover an underexposed shot, full-frame vs phone**: shoot the *same* underexposed scene as **raw** on a **full-frame** camera and on a **phone**, then **push the exposure up** in software (e.g. +3–4 stops). The full-frame frame cleans up (deep full-well + low read-noise floor → wide dynamic range), while the phone frame breaks down into **shadow noise and banding** as the lift amplifies its read-noise floor — a direct, visible demonstration that **dynamic range scales with photosite / sensor size**, and exactly why phones lean on **burst / HDR+** (Multiple exposure) rather than a single exposure. [fig-underexposure-recovery-ff-vs-phone; photo — paired Fredo full-frame DNG + phone raw] *(also a coding exercise — see [[Exercises & Experiments]])*
• 💡 **Big lesson (L6) — quantization is rarely the real problem**: with enough bits and a sane encoding, it's **noise and dynamic range** (this section) that bite — not the number of levels. The one exception is **gamma / banding in shadows**, which is exactly why gamma encoding exists (allocate codes perceptually); so "rarely" ≠ "never". Reminded again in BASIC → Image representation → *Float vs 8-bit*.
equations
shot noise Poisson (variance = mean), σ ∝ √N, SNR = √N
variances add (shot + read + thermal)
averaging N frames → noise /√N
PSNR = 10·log₁₀(MAX²/MSE)
dynamic range = full-well ÷ read-noise floor
2.7 Multiple view geometry
• chapter intro: the leap from one image to **two** — what a second viewpoint buys you (depth), built on the stereo / disparity intuition, then the epipolar-geometry derivation of E/F, with only their estimation and the multi-view scaling-up forwarded
2.8 Human (and animal) vision and color
2.9 Color technology
2.10 Photography and camera 101
• chapter intro: having built the physics, we now sit behind a real camera — its **controls, modes, and guts** — and see how phones reach the same goal by a wholly different (computational) route