Computational Photography, an AI-powered Slopendium — 18 Image Forensics and Authentication

▸Part 18 IMAGE FORENSICS AND AUTHENTICATION⧉

The rest of this book is mostly about *making* images — better, brighter, sharper, or out of nothing. This part is about *believing* them. Cheap editing tools, and now generative models that synthesize a convincing photograph from a sentence, have severed the old reflexive link between "photograph" and "something that happened." Two responses run in parallel. **Forensics** works on an image you are handed cold — no cooperation, possibly an adversary on the other side — and looks for the statistical and physical fingerprints that authentic capture leaves and editing or generation disturbs. **Authentication** works the other way around: it asks creators and cameras to *sign* their work and every edit to it, so trust comes from a verifiable record rather than a hunt for tells. Neither is sufficient alone — forensics gives evidence, not proof, and provenance is opt-in — so the honest position is that they are complementary.

▸18.1 Image Forensics⧉

`fig-focus-stacking` · optics-chapter illustrative figure (07-07 Figure 4): a stepped-focus stack → sharpness selection → all-in-focus composite. Synthetic per-slice defocus on one photo (`sourced/corn-cobs.jpg`, © Frédo Durand) — license-safe. The full real-data treatment lives in part-08 `fig-focalstack-*` ✅

`fig-telephoto-vs-retrofocus` · two two-group schematics — telephoto (+ then −, principal plane H′ pushed in front → physical length < f) vs retrofocus/inverted-telephoto (− then +, long back-focal distance to clear the SLR mirror); marks f vs physical length, H′, F′ ✅

`fig-correspondence-then-transport` · the L17 spine — one scene displaced (two views / two faces / two frames / one long frame), each resolved by estimating a coordinate map (homography, morph field, flow, track, motion vector, camera path) then transporting pixels by one shared inverse-warp engine; the finding is hard, the moving is plumbing 🟨

`fig-thick-lens-wave-sim` · live 2-D FDTD wave sim of a **thick biconvex lens** focusing: a point source's diverging wave is reshaped by the slower glass into a converging one that meets at an image point; focus tracks 1/f=(n−1)2/R and 1/v=1/f−1/u (Fundamentals → Lens image formation, Fermat/equal-path) ✅

`fig-jpeg-artifacts` · JPEG blocking & ringing at low quality (original vs q=8, zoomed crop) 🟨

`fig-illum-times-reflectance` · light = illumination × reflectance (per-wavelength) ✅

`fig-portrait-lighting-sim` · live portrait-lighting simulator (web edition): three soft area lights (key/fill/kicker — az/el/extent/intensity/colour, area-light-supersampled soft shadows) on a 3D face with a physiological melanin/hemoglobin skin model; a from-behind setup view with Meshy studio umbrellas + a camera rig; lighting presets; static fallback is a screenshot. *3D umbrella generated with Meshy AI.* ✅

The premise of all blind forensics: a real photograph is the output of a specific **physical pipeline** — a particular sensor, a color-filter mosaic, a demosaicking algorithm, a lens, a JPEG encoder, a single illumination of a single 3-D scene — and that pipeline stamps the pixels with **consistent low-level regularities**. Splice two photos together, paint something out, scale a pasted region, or synthesize an image from a network, and those regularities are broken locally or never reproduced. Forensics is the art of modeling the expected regularity and flagging where it fails.

▸18.1.1 The problem and the threat model⧉

• **passive / blind, after the fact, adversarial.** Forensics assumes **no cooperation**: no watermark, no signature, possibly a skilled forger who knows the detectors. You get pixels (maybe a file) and must reason backward. Contrast the next chapter, which assumes cooperation up front.

• **three questions.** *Is it authentic?* (manipulation detection), *what was changed and where?* (localization), and *where did it come from?* (source attribution / was it AI-generated). Different traces answer different questions.

• **why now.** Editing went from darkroom craft to a one-tap "remove object"; generation went from impossible to a text prompt. The barrier to a convincing fake collapsed, so detection had to industrialize.

▸18.1.2 Sensor and pipeline traces: PRNU, CFA, and noise⧉

• **sensor fingerprint — PRNU (photo-response non-uniformity).** Every sensor's photosites have tiny, *permanent*, manufacturing-random gain differences, so a given camera multiplies the scene by a fixed, near-unique noise pattern. Average many of a camera's images (especially flat, bright ones) to estimate its **PRNU fingerprint**; then **correlate** a suspect image's noise residual with candidate fingerprints to (a) **identify the source camera** (forensic "ballistics") and (b) **localize tampering** — a pasted-in region carries a *different* (or no) fingerprint, so its correlation drops out. Lukáš–Fridrich–Goljan's signature result. Robust-ish, but defeatable by strong denoising, re-compression, or deliberate fingerprint transplant.

• **CFA / demosaicking correlations.** A single-chip camera measures one color per pixel (the **Bayer mosaic**) and *interpolates* the rest, which imprints a periodic correlation lattice between a pixel and its neighbors ([[Demosaicking]]). Resaving, resizing, or splicing disturbs that lattice; estimating the local interpolation pattern reveals regions that don't match the camera's demosaicking (Popescu & Farid).

• **noise & resampling inconsistency.** Authentic capture has a coherent, signal-dependent noise level (shot + read, [[Fundamentals]] — Noise). A spliced region from another image, or one that was **scaled/rotated**, carries a *different* noise floor and tell-tale **resampling** periodicities (interpolation correlates neighbors in a detectable way). Local noise-level and resampling maps expose the seams.

• **lens traces.** Optical **chromatic aberration** and **vignetting/lens-distortion** vary smoothly and predictably across a real frame; a pasted object violates the global model (Johnson & Farid). Tie-in to [[Optics, lenses, and aberration correction]].

▸18.1.3 Compression, geometry, and metadata forensics⧉

• **JPEG forensics — the file remembers.** JPEG compresses 8×8 blocks with a quantization table; **re-saving** leaves **double-compression** artifacts in the DCT-coefficient histograms, **blocking-grid misalignment** where a region was pasted off-grid, and **JPEG ghosts** — re-compress the whole image at a quality matching a hidden region's prior compression and that region's error *dips*, making it pop out (Farid). The quantization table itself is a coarse camera/software signature. (Background: [[File formats and compression]].)

• **lighting, shadows, and geometry consistency.** A composite must obey **one** physical scene: a single dominant **light direction** (estimate it from shading and from specular highlights / catchlights in eyes), consistent **shadows** (cast-shadow constraints, the shadow of every object must agree on the light), consistent **reflections**, and consistent **perspective / vanishing points**. Physically impossible lighting is one of the hardest tells for a forger to fake and one of the most convincing for a court (Farid's geometric methods). Ties to the BRDF / shading and projective geometry earlier in the book.

• **copy-move and splicing.** **Copy-move** (clone-stamp within one image — hiding an object by covering it with a patch of the same image) is found by matching **blocks or keypoints** that repeat at an unusual offset (Fridrich et al.; SIFT-based variants — cross-ref [[Automatic panorama stitching from multiple views and feature matching]]). **Splicing** (paste from another image) is caught by the seam/noise/CFA/illumination inconsistencies above.

• **metadata & file-structure forensics.** **EXIF** inconsistencies (a "straight-from-camera" JPEG with editing software in the tags, an embedded **thumbnail** that doesn't match the full image, an impossible timestamp/GPS, a quantization table that no camera uses) are quick, useful tells — but **trivially stripped or forged**, so they corroborate, never prove. Direct tie to [[Appendices#EXIF and image metadata]].

▸18.1.4 Deepfakes, GAN/diffusion fingerprints, and learned detection⧉

• **the synthetic-media problem.** Generative models ([[Generative AI and diffusion]]) make **face swaps / reenactment ("deepfakes")** and **fully synthetic** images that pass the eye. Detection shifts from hand-modeled traces to **learned classifiers** trained on real-vs-fake corpora.

• **generative fingerprints.** Up-sampling layers and the generator architecture leave **frequency-domain artifacts** — periodic grid/checkerboard spectra, unnatural high-frequency statistics — and even **model-specific "fingerprints"** that can attribute an image to a particular generator (Marra, Yu). Wang et al. showed a detector trained on one CNN generator's images can **generalize surprisingly well** with the right augmentation — *for a while*.

• **the generalization wall.** A detector trained on yesterday's generator often **fails on tomorrow's** (new architectures, diffusion vs GAN, post-processing, social-media re-compression). This is the central, unsolved difficulty; benchmarks like **FaceForensics++** and the **DFDC** track it.

• **biological/behavioral cues (faces, video).** Inconsistent **blinking**, pulse (subtle skin-color **video magnification** — cross-ref [[Video magnification]]), lip-sync, head-pose vs gaze, and lighting-on-skin physics catch many face fakes — and get patched in the next generation.

▸18.1.5 Why forensics is evidence, not proof — and the hand-off to provenance⧉

• 💡 **the cat-and-mouse reality.** Every published detector is a **training target** for the next forger; **anti-forensics** deliberately launders fingerprints (denoise, re-add plausible noise, re-compress, adversarial perturbations). Robustness and generalization, not peak accuracy on a fixed benchmark, are what matter.

• **calibrated honesty.** Forensics produces a **likelihood with a false-positive rate**, region maps, and physical arguments a human expert weighs — not a yes/no oracle. Over-claimed "AI detectors" do real harm. The discipline's own honesty is part of the methodology.

• **the structural fix.** Because detection is adversarial and asymptotically losing against ever-better generators, the durable answer is to **stop guessing** and instead **carry trust with the asset** — signed provenance at capture and through every edit. That is the next chapter.

▸18.2 Authentication and Provenance (C2PA)⧉

`fig-diffraction-aperture-size` · two apertures: a smaller one diffracts more (spread θ ≈ λ/D) 🟨

⬜ figure not yet created

their complementarity [fig-provenance-vs-forensics

`fig-crop-focal-length` · cropping = changing focal length: full frame vs a crop box ≡ a longer-focal-length capture, upsampled 🟨

`fig-rotation-challenge` · Andrew Adams' rotation challenge: rotate a full turn in N steps of 360/N° (each step bilinear-resamples the PREVIOUS result), for N∈{10,60,360}; track a validity mask through identical rotations so corners that ever leave the frame go black. Result returns to 0° but is resampled N times → accumulated blur/mush + shrinking valid region toward the inscribed disc. More, smaller rotations compound MORE damage (rotate k× by 360/N, not once by k·360/N) ✅

⬜ figure not yet created

**Durable Content Credentials** — manifest + invisible **watermark** + **fingerprint**, so a screenshot that strips the metadata can still be **recovered** by matching the watermark/fingerprint to a provenance store [fig-durable-credentials fig-editing-lr-vs-ps

`fig-pinhole-imaging` · imaging-scenario series (2/3): add a pinhole to the bare sensor — one ray per scene point → a dim **inverted** image (same tree+sensor+colours as fig-bare-sensor-averaging) ✅

`fig-cos4-vignetting` · natural vignetting: relative illumination ∝ cos⁴θ falling toward the corner, with an image-corner darkening illustration (companion to `fig-cos4-falloff`, ties θ to image radius + a picture) ✅

Forensics is a losing race in the limit: as generators improve, the pixel-level tells vanish. Authentication changes the game by **not relying on the pixels to confess**. Instead it asks the *honest* parts of the pipeline — the camera, the editing app, the AI tool, the publisher — to **cryptographically sign what they did**, and binds that signed record to the image. A reader then **verifies a signature against a trust list** rather than hunting for artifacts. The catch, which we keep front and center, is that this is **opt-in**: it can prove an image *is* what it claims, but a missing or stripped credential proves nothing — so it complements forensics rather than replacing it.

▸18.2.1 From detection to attestation: trustworthy cameras and watermarking⧉

• **the shift.** Stop asking "does this image betray a fake?" and start asking "**what does this image's signed record say, and does it verify?**" Trust derives from cryptography and a known signer, not from artifacts.

• **the trustworthy camera (the old idea).** Friedman (1993) proposed a camera that **digitally signs each photo at capture** with a private key, so any later edit breaks the signature. The seed of all provenance — held back for decades by missing standards, key management, and the fact that *every* edit (even a legitimate crop) invalidates a naive whole-image signature. C2PA's manifest-of-edits is the modern fix.

• **watermarking — robust, fragile, semi-fragile.** Embed an imperceptible signal in the pixels. **Robust** watermarks survive compression/cropping/screenshotting (used to carry an ID or recover provenance); **fragile** ones break on any edit (tamper *evidence*); **semi-fragile** tolerate benign processing but break on malicious edits. Watermarking is provenance's way to survive when the metadata is stripped (see Durable Content Credentials). Text: Cox–Miller–Bloom.

▸18.2.2 C2PA and Content Credentials: the standard⧉

• **who and what.** **C2PA** — the *Coalition for Content Provenance and Authenticity* — is an open standard formed by merging Adobe's **Content Authenticity Initiative (CAI)** with **Project Origin** (BBC, Microsoft), now backed by Adobe, Microsoft, BBC, Intel, **Sony, Nikon, Leica, Canon**, Truepic, Google, OpenAI, and others. The consumer-facing brand is **Content Credentials** ("a nutrition label for digital content").

• **the manifest.** Attached to (or associated with) the asset is a **manifest**: a **tamper-evident, cryptographically-signed** bundle of **assertions** ("claims") — e.g. *capture device & settings*, *date/time/location*, the **list of edits** performed, **thumbnails** of intermediate states, **AI-generation disclosure** (was a generative model used, and how), and **who signed**. Change a pixel without updating the manifest and the **hash** no longer matches → tamper is evident.

• **the claim signature & trust.** The manifest is signed with an **X.509 certificate**; verification walks the **certificate chain** to a **trust list** of accepted signers (cameras, apps, publishers). So "verified" means *this manifest was signed by a key we recognize and the bytes haven't changed since* — **not** that the content is "true," only that its provenance record is **authentic and intact**.

• **hard vs soft binding.** **Hard binding** = a cryptographic **hash of the pixels** in the manifest (any pixel change is detected, but the credential is lost if the file is re-encoded or the manifest is stripped). **Soft binding** = an imperceptible **watermark** and/or a perceptual **fingerprint** that lets a stripped or re-encoded asset be **re-matched** to its manifest in a provenance store. C2PA supports both.

• **ingredients and provenance chains.** Edits **compose**: open a photo, crop it, color-grade it, paste in an element, export. Each step records its **ingredients** (the assets it consumed) and adds a **signed assertion**, building an **auditable lineage** from capture to publish. You can trace where a published image's pieces came from.

• **Durable Content Credentials.** Because metadata is easily stripped (a screenshot, a re-upload), Adobe/CAI pair the **signed manifest + invisible watermark + fingerprint** so provenance is **recoverable** even after stripping: match the watermark/fingerprint to the cloud provenance store and re-surface the credential. This is the practical answer to "but you can just screenshot it."

• **verification & UX.** Published assets carry a small **"cr" Content Credentials pin**; clicking it (or dropping the file into *verify.contentauthenticity.org*) shows the manifest — device, edits, AI use, signer. The whole point is a **one-click, human-readable** provenance view.

▸18.2.3 AI disclosure, watermarking, and the regulatory push⧉

• **disclosing the synthetic.** Generative tools increasingly **attach C2PA at creation**: Adobe **Firefly**, OpenAI's image models, Microsoft, and others stamp "**AI-generated**" assertions, so synthetic content is *labeled by the maker* rather than *caught by a detector*. This is provenance's most immediate, highest-leverage use.

• **AI watermarking.** Complementary invisible watermarks designed for *generated* content — Google DeepMind's **SynthID** (for images, audio, and text) — let a generator mark its own output so it can later be recognized even without metadata. Pairs with C2PA's soft binding.

• **regulation.** The **EU AI Act** (and similar proposals) push **mandatory labeling/transparency** for AI-generated and manipulated media, which is accelerating provenance adoption across cameras, editors, platforms, and model providers.

▸18.2.4 Limits, critiques, and the forensics partnership⧉

• ⚠️ **opt-in: absence is not guilt.** No credential ≠ fake; a credential ≠ "true." Provenance proves **origin and integrity of the record**, not the *honesty* of the scene. A staged photo can carry a perfectly valid credential.

• ⚠️ **strippable (mitigated, not solved).** Plain metadata is removed by a screenshot or a re-upload; **Durable Content Credentials** (watermark + fingerprint + cloud store) recover it, but watermarks can be attacked and fingerprints are probabilistic.

• ⚠️ **governance & trust lists.** Who decides which signers are trusted? Centralized trust lists carry power and single points of failure; key compromise is a real risk.

• ⚠️ **privacy.** Rich provenance can leak **location, device serials, and identity** (echoing the EXIF privacy issue) — so the standard makes many assertions optional and redactable.

• **the synthesis.** **Forensics** (detect, no cooperation, adversarial) and **authentication** (attest, cooperative, opt-in) are two halves of one trust system: provenance makes honesty cheap and verifiable; forensics handles everything *without* a credential. Trust on the modern internet of images comes from **both**, plus the human judgment the [[Human factors and the art of photography|ethics]] discussion frames.