1.1 From Digital to Computational Photography⧉
When digital cameras first arrived, they did one obvious thing: they swapped a roll of film for a sensor. No more buying film, no more waiting for the lab, no more wondering whether the shot came out — you could see it on a little screen the instant you pressed the button, and you could share it with the world a moment later. Quick, cheap, gratifying. That alone was enough to win.
But the sensor brought something far more consequential than instant gratification, and at first almost nobody noticed it. Once the image is a grid of numbers rather than a pattern of silver grains, you can put arbitrary computation between the photons in the scene and the picture you finally look at. The camera stops being a copying machine and becomes a small computer that happens to have a lens attached. That single shift — computation sitting in the middle of the imaging path — is what this book is about.
It lets us do genuinely surprising things. We can photograph a scene whose range of brightness far exceeds what any single exposure can hold — a sunlit window seen from a dark room — and recover detail everywhere. We can computationally remove blur from a shaky handheld shot. We can stitch a sweep of ordinary frames into a single wide panorama no lens could capture in one shot. And if we capture not just where light lands but the full bundle of rays flowing through the lens, we can refocus a photograph after it was taken, or pull a 3D model of the scene out of it.
The striking part is that you don't need any of those exotic goals to need computation. Even the most ordinary snapshot already depends on it. As we'll see shortly, a typical sensor measures only one color channel at each pixel — red, or green, or blue — and the other two are reconstructed by an algorithm before you ever see a full-color image. Computation isn't the icing on photography; it's baked into the bread.
This is why a phone takes far better pictures than its tiny lens and sensor have any right to produce. Squeezed into a few millimeters, the optics are mediocre by any classical standard; the image quality comes from computation — aligning and merging many frames, learned denoising, careful tone reproduction — doing the work that a big lens used to do.
And the story is still accelerating. Generative AI has learned to model the space of all plausible images, and can now synthesize, complete, and transform pictures with a fluency that would have seemed like science fiction a decade ago. There is something a little vertiginous about it — a machine that has, in effect, internalized every photograph that could exist, much like the library in Borges's story that contains every possible book (Borges 1941; on the parallel between that library and modern generative models, see Bottou & Schölkopf 2023). We will treat generative methods as full members of the field, not a sideshow.
1.1.1 Computation is the new optics⧉
For almost the entire history of photography, the principle of image formation barely changed. A lens gathers the rays leaving a scene and focuses them onto a flat surface — film, then a sensor — which records them directly. The final photograph is essentially a copy of the optical image the lens forms. Under this arrangement there is really only one way to make a better picture: build better optics. Sharper lenses, finer film, larger formats. Quality was an optics problem, and for a century and a half that is exactly how the field improved.
Computational photography breaks this chain. By inserting computation between the optical image and the final picture, we are no longer bound to copy what the lens produced. We can push past physical limits the optics alone couldn't beat, edit flexibly after the shutter has closed, record information an ordinary photograph throws away — depth, scene structure, the direction light was traveling — and construct visual experiences that have no single-exposure counterpart at all.
The deepest consequence is easy to miss. Once the final image need not resemble the optical image, the optical image is freed to be whatever is most useful to the computation, not what looks best to the eye. The raw measurement can be a coded, scrambled, or otherwise strange thing, as long as an algorithm can turn it into the picture we want. Computation is not merely post-processing tacked onto a normally formed image: the optics themselves get redesigned around it. The lens and the algorithm are co-designed, and we optimize the whole pipeline end to end rather than each piece in isolation.
A concrete example makes this vivid. In coded-aperture imaging (Levin et al. 2007), a carefully patterned mask replaces the lens's ordinary round aperture, so that the out-of-focus blur it produces is no longer a smooth disk but an invertible pattern whose shape encodes how far each part of the scene lies from the focal plane. The raw photograph looks oddly, almost wrongly blurred — exactly the point that the optical image need not resemble the final one. The mask and the deblurring algorithm are designed together: the mask is shaped precisely so that the algorithm can both recover a sharp image and read off scene depth from a single shot. We return to coded apertures and this kind of optics–algorithm co-design in Advanced computational photography.
That is what we mean by the slogan "computation is the new optics." Optics were how we extended our ability to image the world for a hundred and fifty years; digital processing now vastly expands how we form and enhance images, and it does so with a flexibility that grinding glass never had.
There is a lovely older analogy for this. Ansel Adams — who was, before he was a photographer, a trained pianist — described the negative as the score and the print as the performance. A "straight print" reproduces the negative mechanically, note for note. But the photographer in the darkroom interprets that score: dodging here, burning there, choosing how to render the scene. Computation is the modern darkroom, and it lets us interpret the score far more freely than dodging and burning ever could.
1.1.2 The goals of computational photography⧉
It helps to keep in view why we reach for computation in the first place. The motivations recur throughout the book, and a given technique usually serves more than one of them at once:
- Prettier pictures — straightforwardly better-looking results: cleaner, sharper, better-toned, more pleasing color.
- Alleviate physical limitations — beat the constraints the hardware imposes: too little light, too little dynamic range, a lens too small or too shaky.
- Record more information — capture what an ordinary photograph discards: depth, scene structure, separate layers of the scene.
- Facilitate human-driven post-processing — give the photographer powerful, flexible control after capture rather than locking in every decision at the moment of the shot.
- New visual experiences — make images that simply have no single-shot equivalent, like a refocusable photograph or a navigable panorama.
- Reveal the invisible — surface what the eye cannot see on its own, from tiny motions and color changes to information outside the visible band.
1.1.3 Many flavors: from traditional to generative⧉
Computational photography is not one technique but a spectrum, and it is worth laying out the range before we dive in. At one end sits traditional photography lightly assisted by computation — a single capture, processed much as a darkroom would have processed it. A step further, we combine multiple exposures: several frames of the same scene, merged to beat a physical limit (more dynamic range, less noise, a wider field of view). Further still, we change the imaging itself — co-designing new optics with the reconstruction, so the camera measures something unusual on purpose. And increasingly, at the far end, the picture is partly or wholly generated, synthesized by a model rather than measured from the scene.
There is a pleasing arc to this. Image-making began as pure generation — a painter inventing the picture by hand. Photography then swung hard toward measurement: the camera as an objective recorder of the rays in front of it. And now, with generative AI, we are swinging back toward generation — except this time the generator is a learned model, and measurement and synthesis blur into each other. The book lives across this whole spectrum.
Photography's history is, in large part, a history of optics — and the recent chapters are increasingly a history of computation. A rough chronology:
- Optics first — the camera obscura. A darkened room or box with a small hole projects an inverted image of the world onto the far wall — the ancestor of every camera. Known since antiquity, described carefully by Alhazen around 1020, and used as a drawing aid by Renaissance painters: image formation centuries before anyone could fix the image in place.
- 1826 — first permanent photograph. Nicéphore Niépce records a lasting image by heliography — the oldest surviving photograph made in a camera (Figure 1.1.3).
- 1839 — photography announced. Louis Daguerre's daguerreotype produces a sharp image that, crucially, does not fade; Henry Fox Talbot's calotype introduces the negative→positive idea — one negative, any number of positive prints — the conceptual foundation of film photography.
- 1851 — wet collodion. Frederick Scott Archer's collodion process cuts exposure times to a few seconds.
- 1861 — first color photograph. James Clerk Maxwell combines three exposures through red, green, and blue filters — the three-channel idea we still use.
- 1888 / 1900 — photography for everyone. Kodak's roll film arrives under the slogan "you press the button, we do the rest" (1888); the Brownie (1900) puts a camera in ordinary hands. In parallel, the Lumière brothers launch cinema (1895) and a practical color process, Autochrome (1907).
- 1925 — the Leica. 35 mm becomes the standard small format.
- 1935 — coatings and color film. Anti-reflection lens coating (Smakula, at Zeiss) cuts internal flare and lifts transmission and contrast — later refined into multi-coating, like Zeiss's T*, in the 1970s. The same year, Kodachrome delivers practical color film (Kodacolor color-negative film follows in 1942).
- 1948 — instant photography. Edwin Land's Polaroid Land Camera develops the print inside the camera in about a minute — no lab, no waiting; the iconic folding SX-70 single-lens reflex follows in 1972.
- 1963 — foolproof loading. Kodak's Instamatic, with its drop-in cartridge, makes loading mistake-proof and cements the mass-market snapshot.
- 1969 / 1975 — toward digital. Boyle and Smith invent the charge-coupled device (CCD) (1969); Steven Sasson at Kodak builds the first digital-camera prototype around one (1975).
- 1977 / 1985 / 1995 — automation. The Konica C35 AF brings the first autofocus camera (1977); the Minolta Maxxum 7000 introduces phase-detection autofocus in a single-lens reflex (SLR) (1985); Canon ships the first optically image-stabilized lens (1995), with sensor-shift, in-body stabilization arriving later.
- Late 1980s–1990s — the image industry went digital before the camera did. Long before most photographs were captured digitally, the photo industry had already moved to digital images. Desktop publishing and digital prepress (late 1980s) scanned film for page layout; drum and film scanners, together with Kodak's Photo CD (1992), digitized negatives and slides; Adobe Photoshop 1.0 (1990, the Knoll brothers, Thomas and John) put the darkroom on a desktop; and photojournalism digitized early — Kodak DCS bodies feeding pictures down the wire — because filing a picture fast beat film's quality. Images were edited, scanned, and transmitted digitally years before the camera itself caught up.
- 1990s / ~2000 — the digital era. The first practical digital cameras appear through the 1990s — the Kodak DCS (1991), the Apple QuickTake (1994), the Nikon D1 (1999) — and the first camera phone follows around 2000, putting a camera in every pocket.
- 2012 / ~2014 / ~2022 — the learning era. ImageNet and AlexNet set off the deep-learning revolution in vision (2012); generative adversarial networks (GANs) arrive around 2014; diffusion models and systems like DALL·E bring generative image AI into the mainstream around 2022.
- Side branches worth knowing. The flash, which let photographers carry their own light; holography, which records the wavefront itself rather than a flat image; Gabriel Lippmann's integral imaging and his interference-based color photography; consumer light-field cameras like Lytro; and the term "computational photography" itself, coined by Steve Mann in the mid-1990s and broadened into common use by Marc Levoy in the mid-2000s.
1.1.4 Objective, perceptual, and subjective⧉
Problems in computational photography are not all of one kind, and knowing which kind you are facing tells you what counts as a good answer — and what tools even apply. It helps to picture a spectrum running from objective to subjective, with a surprisingly important middle.
At the objective end live physics and measurement: radiometry, optics, sensor noise, geometry, reconstruction error. Here there is a right answer. A deblurred image is measurably closer to or further from the ground truth. A signal-to-noise ratio is a number. A calibration is correct or it isn't. You can put error bars on everything.
At the subjective end live taste and intent: what makes a photograph good, how to compose it, which color grade is pleasing, what artistic look to aim for. There is no single correct answer, and pretending otherwise just hides the judgment being made.
The interesting part is the middle ground — perceptual — which concerns how an image actually looks to a human eye. You might expect this to be hopelessly subjective, but it is not. Human vision is remarkably well modeled mathematically, along several measurable dimensions: the contrast sensitivity function, color appearance, lightness and brightness, visual masking, perceived sharpness, and full image-quality metrics such as SSIM and visible-difference predictors. "How it looks" can be measured and even predicted, though it is not pure physics. That is precisely what lets us engineer for the eye — tone-mapping to fit the eye's response, throwing away color detail the eye won't miss, choosing where to spend bits or where a little smoothing will go unnoticed.
This objective → perceptual → subjective spread is a lens we'll return to throughout the book. Before you reach for a tool, it pays to ask which kind of problem you are actually solving: a measurement to get right, a perceptual effect to model, or a matter of taste to respect.
1.1.5 A field at a crossroads⧉
Part of what makes computational photography so much fun is that it sits at the crossroads of an unusual number of disciplines — you rarely get to touch this many at once. It draws on algorithms; AI and machine learning; physics (optics, light transport, radiometry); chemistry (film, sensors, dyes); electrical engineering (sensors, circuits, the image signal processor, or ISP); mechanical engineering (lenses, shutters, stabilization, camera bodies); computer graphics; computer vision; human perception; the art and craft of photography itself; human–computer interaction (HCI); performance engineering (real-time, on-device processing); and ethics.
And these don't take turns — they show up together, often inside a single feature. Take a phone's night mode. It leans on physics (catching scarce photons), electrical engineering (the sensor that catches them), algorithms (aligning and merging a burst of frames), AI (learned denoising), perception (tone-mapping the result so it reads naturally), performance engineering (finishing all of this in about a second, on a phone, without draining the battery), and ethics (how much to "beautify" a face, and whether to tell the user) — all at once. That breadth is not a complication to tolerate; it is the appeal.
1.1.6 Neighbouring fields, and where the work is published⧉
A few of those neighbours overlap with computational photography so heavily that it is worth saying exactly how they relate — and how they differ.
Computer vision is the closest. A useful way to carve up vision is by its three ingredients: 2-D images (the pixels themselves — filtering, features, matching), 3-D (the geometry behind the pixels — depth, shape, pose), and meaning (semantics — what objects and scenes the pixels depict). Computer vision ultimately cares most about the last two: it treats the image as a means to recover 3-D structure or to understand content. Computational photography inverts that emphasis: its product is the 2-D image — a better photograph. But it freely reaches into 3-D and meaning when they help the picture: depth to fake a shallow-focus background, segmentation to brighten a face and not the sky, recognition to pick the frame where everyone is smiling. The same tools, pointed at a different goal.
Computer graphics runs the arrow the other way — synthesizing images from models of geometry, materials, and light. Computational photography borrows its rendering and image-based-rendering machinery (and, increasingly, shares it outright: gradient-domain editing, light fields, differentiable and neural rendering all live in both). Image processing is the classical signal-processing bedrock — filtering, transforms, sampling, compression — that the whole enterprise is built on; computational photography is in large part image processing that knows where its pixels came from. And optics supplies the physics of image formation, which computational photography increasingly co-designs with the computation rather than treating as fixed (coded apertures, lensless cameras, end-to-end optimized lenses).
Because the field straddles these neighbours, so does its literature. A great deal of computational photography is published at the major computer-vision conferences — CVPR, ICCV, and ECCV — and at the premier graphics venue, SIGGRAPH (and SIGGRAPH Asia / ACM Transactions on Graphics). But the home of the community is ICCP — the IEEE International Conference on Computational Photography — the venue dedicated to the field itself.
1.1.7 It's not just about photography⧉
We will talk about cameras and pictures throughout this book, because that is the most familiar and most vivid way in. But almost nothing here is really about snapshots. The underlying machinery — forming an image from measurements, modeling how light and optics and sensors behave, and inverting that model with computation — is the same wherever images are made, and once you have it you start seeing it everywhere.
Start with the ordinary camera pointed at extraordinary jobs. The instant a camera stops being aimed by a human enjoying the view and starts being aimed by a machine making a decision, the same pipeline serves a new master. Robotics and self-driving cars lean on cameras (and their cousins, depth sensors and lidar) to perceive, localize, and navigate — the alignment, depth, and motion estimation we develop for panoramas and refocusing are exactly what a robot needs to know where it is and what is moving. Surveillance and security, industrial inspection (spotting a hairline crack or a misplaced solder joint on a production line, where a camera that never blinks beats any human inspector), agriculture, drones, document scanning, biometrics — all are computational imaging wearing work clothes. The picture is no longer the product; it is an input to a decision.
Then push the optics past what a handheld lens can do. Astronomy is computational photography at the grandest scale: telescopes stack thousands of frames to beat noise (the same averaging that powers a phone's night mode), use adaptive optics to cancel the atmosphere's blur in real time, and combine separate dishes into one synthetic aperture — the trick that let the Event Horizon Telescope assemble an Earth-sized virtual lens to image a black hole. At the opposite scale, microscopy does the same: light-sheet, confocal, and computational microscopes reconstruct 3D cellular structure, and super-resolution techniques reach past the diffraction limit we will meet in the optics part. Big or small, it is the same game — co-design the measurement and the reconstruction.
Change the wavelength and the field opens again. Visible light is a sliver of the electromagnetic spectrum, and imaging lives all along it. Infrared gives thermal cameras and night vision; ultraviolet reveals forgeries and surface detail; X-ray imaging, in the form of computed tomography (CT), reconstructs a 3D body from many 1D projections — a pure inverse problem; radio gives us radio astronomy and synthetic-aperture radar (SAR) that maps terrain through clouds and darkness. The whole of medical imaging is this same discipline under a different name: CT, magnetic resonance imaging (MRI), positron-emission tomography (PET), and ultrasound are all reconstruction problems, recovering an image from indirect measurements by solving exactly the kind of regularized inverse problem this book builds up to. A radiologist's scanner and a phone's night mode are closer cousins than either would guess.
And the techniques do not even need an image at all. Sampling, filtering, the Fourier transform, denoising, priors, inverse problems — the toolkit is really a toolkit for signals, and a signal can be anything. Audio is the obvious neighbor: a sound is a one-dimensional signal, and the same spectral and statistical ideas drive noise reduction, compression, and synthesis there. A charming example closes the loop between the two: Paul Lamere's Infinite Jukebox takes any song, finds beats that sound nearly identical, and quietly jumps between them so the track plays forever without ever sounding wrong — which is precisely the idea of Video Textures (Schödl et al. 2000), a computational-video technique that finds similar frames in a short clip and loops between them to synthesize an endless, non-repeating video, transplanted from pictures to sound.
There is a deeper reason imaging keeps reappearing in places that have nothing obvious to do with pictures: at bottom, we really only know how to measure two things well — electrons and photons. Electrons are the workhorse, because nearly every sensor ultimately hands us a voltage or a current. But photons turn into electrons almost for free, through the photoelectric effect — a photon strikes a surface and frees an electron — the very phenomenon that won Einstein his Nobel Prize, and exactly how a photosite on a sensor works. Because that conversion is so cheap, a huge range of measurements is staged as the same trick: make the quantity you care about emit or modulate light, then count the photons as electrons. Fiber-optic and spectroscopic sensors, fluorescence assays, optical readout of pressure or strain or temperature, even reading the bits off a disc all work this way; scintillators turn invisible X-rays and gamma rays into little flashes of visible light that an ordinary camera can then photograph. The photon ⇄ electron conversion recurs because it is cheap, fast, and roughly linear — which is why the sensor we build up in this book, a photon counter at heart, is a template that reaches far beyond photography. The lesson worth carrying through the whole book is that you are not learning photography tricks; you are learning a way of thinking about measured signals that happens to be illustrated, beautifully, by photographs.
Because these connections are so many and so concrete, the book gives them a part of their own near the end — Adjacent fields and applications — which tours imaging across disciplines and instruments: modern and exotic sensors, astronomy, X-ray imaging, medical imaging, microscopy, millimetre-wave imaging, sound and music, fluorescence imaging, opto-acoustic imaging, ultrasound, aerial imaging, computer vision, and robotics and self-driving. Each is the same machinery this book builds — measure, model, invert — wearing another field's clothes; that part is where we follow the threads out.