11.10 Artistic projects with photo collections
The methods in this part were built to use collections. Photomosaics tile one image out of thousands; retrieval finds the matching photo; curation throws the bad ones away; Photo Tourism and the photobio turn a heap of snapshots into a navigable space. This closing section is about the projects — many of them fine art, one of them a perception study — that take the collection itself as the object, and ask what it means as a whole. The unifying claim is the one stated in the big lesson below: a photo collection has properties that exist only at the level of the set. Average a genre and its hidden rulebook appears; estimate the density of a place's photographs and the cliché is the peak; gather thousands of "faces in things" and you can measure the human face detector against the machine one. Three threads, one idea — and all of them are this part's retrieval, averaging, and curation tools turned to an expressive or scientific end rather than a practical one.
11.10.1 Statistical collage: Salavon
The artist Jason Salavon treats a set of photographs as the work rather than any single frame. His signature move is to average hundreds of same-genre images into one image (newspaper wedding announcements, real-estate listing photos, Playboy centerfolds grouped by decade), and the result is an uncanny, ghostly prototype (Figure 1). Each contributing photograph dissolves into a soft haze, but what survives the averaging is precisely what the photographs in the genre share: the standard pose, the standard framing, the standard lighting, the bride to the left and the groom to the right, the living-room-then-kitchen sequence of the home listing. The blur is not a defect; it is the genre's variation averaging out, leaving its convention visible. A statistical distribution that no individual photograph contains is rendered as a single picture you can hang on a wall.
The reason this works is the reason it belongs in this part rather than in an art-history one. Averaging a stack of aligned images is exactly the operation behind denoising by averaging: independent, zero-mean fluctuations cancel, their standard deviation in the mean falling as 1/√N for N images, while the component common to every image stays. In denoising the "signal" is the true scene and the "noise" is sensor randomness; in a Salavon average the "signal" is the genre's shared template and the "noise" is each couple's, each house's, each decade's particularity. Same arithmetic, opposite reading: the averaging removes identity, and that removal is the entire point of the work. Salavon also sorts and recombines: tiling images by color, reorganizing a photograph's own pixels by hue, building grids that read as one thing from far off and as a thousand things up close. That is the same perceptual fusion that makes a photomosaic resolve at a distance, and the same appearance-similarity that drives Retrieval, here used as an aesthetic act rather than a search.
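The averaging argument can be checked numerically. The following is a minimal sketch, not Salavon's actual process: it assumes each photo is a flat list of pixel values made of a shared, hypothetical `template` plus independent Gaussian variation of standard deviation `sigma`, and that the images are already aligned. The names `average_stack`, `rms_error`, and `sample_photo` are illustrative only.

```python
import random

def average_stack(images):
    """Pixel-wise mean of equally sized, already aligned images (lists of floats)."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def rms_error(image, reference):
    """Root-mean-square difference between two images of equal size."""
    total = sum((a - b) ** 2 for a, b in zip(image, reference))
    return (total / len(reference)) ** 0.5

rng = random.Random(0)

# Assumption: a hypothetical "genre template", the structure every photo shares.
template = [rng.random() for _ in range(2000)]

def sample_photo(sigma=0.5):
    """One photo = shared template + independent per-photo variation."""
    return [t + rng.gauss(0.0, sigma) for t in template]

# Distance between the average and the template for growing stack sizes.
errors = {}
for n in (1, 16, 256):
    stack = [sample_photo() for _ in range(n)]
    errors[n] = rms_error(average_stack(stack), template)
    print(n, round(errors[n], 3))
```

Under these assumptions the printed error should fall by roughly a factor of four each time the stack grows sixteenfold, which is the 1/√N behavior of averaging. Real photographs are not independent draws around one template, so the sketch illustrates the arithmetic rather than predicting what a real genre average looks like.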
There is also a perceptual punchline worth naming, because it ties this thread to the others. The averaged face or scene tends to look not just typical but uncannily attractive and prototypical — a known effect in Human vision, where averaged faces are rated as more appealing than most of the faces that went into them. The collection, averaged, produces a hyper-normal exemplar that exists nowhere in the world. That is the cleanest possible demonstration of the emergent-content thesis: the prototype is real, it is computed straightforwardly from the set, and yet it is no photograph that was ever taken.
11.10.2 Anticliché camera
The cliché is the most-photographed view: the postcard angle on the Eiffel Tower, the exact overlook everyone shoots at sunset, the framing that millions of phones have already captured. The anticliché camera is the contrarian inversion of Auto curation. Where auto-curation asks "is this a good, canonical shot?" and keeps the frame that best matches the prototype, the anticliché camera asks the opposite — "is this unlike what everyone already has?" — and steers you toward the un-photographed angle, the overlooked detail, the view no one bothered to take. It is the photographic cousin of the blind / candid camera ideas elsewhere in the book: both are machines that decide what to shoot, one optimizing for the canonical and the other against it.
Mechanically it is the same density estimate that everything in this part runs on, with the objective's sign flipped. Score a candidate frame against a distribution of existing photographs of the same place (a personal collection, or a geotagged web corpus of the millions of shots already taken there) using the image descriptors from Retrieval. Auto-curation rewards a frame that lands in a high-density region of that distribution, where the good, agreed-upon shots cluster. The anticliché camera rewards low density instead: a frame whose nearest neighbors in the collection are few and far between, a view the crowd has missed. Novelty, not canonicality, becomes the score: retrieval and density estimation put to a deliberately contrarian aesthetic end.
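One simple way to turn "low density" into a number is the mean distance to the k nearest neighbors in descriptor space. The sketch below is an assumption-laden illustration, not a description of any particular anticliché camera: the 8-dimensional descriptors are random stand-ins for real retrieval descriptors, and `knn_novelty`, `postcard_view`, and `unusual_view` are hypothetical names.

```python
import math
import random

def knn_novelty(candidate, collection, k=5):
    """Mean Euclidean distance from a candidate to its k nearest neighbors.

    A large value means a sparse region of the collection (novel view);
    a small value means a dense region (the cliche).
    """
    dists = sorted(math.dist(candidate, d) for d in collection)
    return sum(dists[:k]) / k

rng = random.Random(1)

# Assumption: stand-in descriptors for 500 existing photos of one place,
# clustered around the postcard view at the origin.
collection = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]

postcard_view = [0.0] * 8   # lands in the densest part of the collection
unusual_view = [4.0] * 8    # far from anything already photographed

cliche_score = knn_novelty(postcard_view, collection)
novel_score = knn_novelty(unusual_view, collection)
print(round(cliche_score, 2), round(novel_score, 2))
```

An auto-curation score in the same spirit would rank frames by the negative of this value, so the two systems differ only in the sign of the objective. A kernel density estimate would serve equally well; the k-nearest-neighbor distance is used here only because it needs no bandwidth choice.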
The thread matters because it exposes a bias the rest of the part quietly carries. A learned aesthetic scorer, trained on what people already like and already photograph, systematically undervalues the unconventional shot — it is built to reward the cliché. The anticliché camera is the honest counterweight: a reminder that "good," when learned from a collection, means "typical of the collection," and that an interesting photograph is often the one that is atypical of it. The same emergent property — the place's distribution of photographs — is read here not for its peak (the cliché) but for its valleys (the views still worth taking).
11.10.3 Pareidolia
Pareidolia is the human tendency to see faces — and patterns — in things: the two screws and a slot of a wall outlet, the grille-and-headlights "face" of a car, a startled expression in a cloud or a piece of toast. It is an everyday quirk of perception, and Seeing Faces in Things (Hamilton, Stent, DuTell, Harrington, Corbett, Rosenholtz & Freeman, 2024, arXiv:2409.16143) turns it into a computational object. The authors assemble a dataset of roughly five thousand human-annotated "faces in things" — everyday objects whose arrangement triggers a face percept — and use it to compare human and machine face detection on images that contain, by construction, no actual face (Figure 2). The collection here is not a medium for art but an instrument for measurement: gathering thousands of these accidental faces lets you probe the face detector — both kinds — directly.
The findings are the interesting part. Machine face detectors partly fire on these illusory faces, which already says something: the same learned features that find real faces also catch the things that merely look like them. But they diverge from humans, and not randomly. The authors argue that an evolutionary pressure to detect animal faces, not just human ones, may explain part of the divergence: a perceptual system shaped to never miss a predator or prey will be tuned to over-detect faces, accepting false alarms (a face in a rock) to avoid the costly miss. Human pareidolia, on this reading, is the visible residue of a detector deliberately biased toward false positives. Training a model on human faces alone gives it a different operating point, and the gap between the two is what the dataset measures.
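The idea of an operating point shifted by asymmetric costs can be made concrete with the textbook equal-variance Gaussian model from signal detection theory. This is a generic sketch and not the model used in the paper: it assumes "no face" evidence is distributed as N(0, 1), "face" evidence as N(d', 1), and that the detector picks the criterion minimizing expected cost. The function names and the cost values are hypothetical.

```python
import math
from statistics import NormalDist

def optimal_criterion(d_prime, cost_miss, cost_false_alarm, p_face=0.5):
    """Evidence threshold that minimizes expected cost.

    Respond "face" when the likelihood ratio exceeds
    beta = (P(no face) / P(face)) * (cost_false_alarm / cost_miss),
    which for equal-variance Gaussians is x > d'/2 + ln(beta)/d'.
    """
    beta = ((1 - p_face) / p_face) * (cost_false_alarm / cost_miss)
    return d_prime / 2 + math.log(beta) / d_prime

def error_rates(d_prime, criterion):
    """Return (false-alarm rate, miss rate) for a given criterion."""
    std_normal = NormalDist()
    false_alarm = 1 - std_normal.cdf(criterion)   # "no face" evidence above threshold
    miss = std_normal.cdf(criterion - d_prime)    # "face" evidence below threshold
    return false_alarm, miss

d_prime = 2.0  # assumed separation between the two evidence distributions

# Symmetric costs versus a detector for which a miss is 20x worse.
neutral = optimal_criterion(d_prime, cost_miss=1, cost_false_alarm=1)
cautious = optimal_criterion(d_prime, cost_miss=20, cost_false_alarm=1)

fa_neutral, miss_neutral = error_rates(d_prime, neutral)
fa_cautious, miss_cautious = error_rates(d_prime, cautious)
print(round(neutral, 3), round(fa_neutral, 3), round(miss_neutral, 3))
print(round(cautious, 3), round(fa_cautious, 3), round(miss_cautious, 3))
```

With the same discriminability, making misses costly lowers the criterion: the cautious detector reports many more faces that are not there and almost never overlooks one that is. Two detectors with different criteria will disagree most on ambiguous inputs, which is the kind of image a pareidolia dataset collects.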
This is the cleanest case of the section's thesis pointed inward, at perception itself. The Salavon average revealed a genre's convention; the anticliché camera revealed a place's distribution; here a photo collection of illusory faces reveals a property of the human visual system — the shape and bias of its face detector — that no single image of a startled-looking electrical outlet could establish on its own. It ties the artistic thread of this part squarely back to Human vision: the collection does not just hold pictures of the world, it holds, in aggregate, a portrait of the eye that looks at it.
A collection has emergent content that no single photograph in it holds. Average a genre and its hidden rulebook — pose, framing, lighting — appears as a ghostly prototype that was never photographed. Estimate the density of a place's pictures and the cliché is the peak, the worthwhile shot a valley. Gather thousands of accidental faces and you have measured the human face detector, not any face. Each project is this part's own machinery — alignment-then-average (Photo Mosaics, averaging), retrieval-and-density (Retrieval, Auto curation), the face prior (Human vision) — turned from a practical means into an expressive, critical, or scientific end. The recurring move of the whole part has been the data is the prior; the recurring move of this section is its mirror image: the data is the message. What the set knows, no element of it knows alone.