22.14 Datasets
Modern imaging is as much about data as about algorithms: a learned denoiser is only as good as the noisy/clean pairs it trained on, and a "state-of-the-art" number means nothing without the benchmark it was measured on (the data story of Machine learning). In fact, the benchmark quietly defines the task. Its choices of scene, noise model, and ground truth become the model's assumptions, and its blind spots become the model's blind spots. This appendix is a working index of the public datasets the methods in this book lean on, grouped by what they are for. Each entry gives a name, a one-line description, and, where available, a link.
When a dataset is relevant to a chapter, that chapter adds a margin reference pointing here rather than re-listing the link in the text. This appendix is the single source for dataset names and links, so a reader implementing or comparing a method always knows where the canonical data lives.
22.14.1 Classification and features
- ImageNet — the large labeled classification set (its 1,000-class ILSVRC subset alone has about 1.28 million training images); the pretraining corpus behind most of the visual features the rest of the field reuses. <https://image-net.org>
- COCO — Common Objects in Context: detection, segmentation, keypoints, and captioning, on cluttered everyday scenes. <https://cocodataset.org>
- Places / Places2 — scene recognition at scale (millions of images across hundreds of scene categories); also a standard image source for inpainting and scene-parsing work. <http://places2.csail.mit.edu>
22.14.2 Super-resolution
- DIV2K — 2K-resolution, high-quality images; the de-facto standard super-resolution training set. <https://data.vision.ee.ethz.ch/cvl/DIV2K>
- Flickr2K — 2K Flickr photographs, commonly pooled with DIV2K to enlarge the training pool.
- Set5 / Set14 / BSD100 / Urban100 — the small, classic super-resolution test sets — a handful of stock images, natural scenes, and self-similar urban structure — that nearly every SR paper reports on.
22.14.3 Deblurring and restoration
- GoPro (Nah et al. 2017) — sharp/blurry video-frame pairs synthesized from high-fps GoPro footage; the standard dynamic-scene motion-deblurring benchmark. <https://seungjunnah.github.io/Datasets/gopro>
- REDS — REalistic and Dynamic Scenes (the NTIRE challenge set): high-quality video for deblurring, super-resolution, and denoising. <https://seungjunnah.github.io/Datasets/reds>
22.14.4 Denoising
- SIDD — Smartphone Image Denoising Dataset: real noisy/clean pairs from phone cameras, the benchmark that pushed denoisers toward realistic sensor noise. <https://www.eecs.yorku.ca/~kamel/sidd/>
- DND — the Darmstadt Noise Dataset: real photographs with held-out ground truth, scored on a submission server to prevent overfitting. <https://noise.visinf.tu-darmstadt.de>
- Kodak — the 24 lossless "kodim" test images, a long-standing benchmark for denoising, demosaicking, and compression. <https://r0k.us/graphics/kodak/>
22.14.5 HDR and tone mapping
- HDR+ burst dataset — Google's raw burst dataset behind the HDR+ pipeline (Multiple exposure imaging). <https://hdrplusdata.org>
- Kalantari HDR (dynamic scenes) — multi-exposure bursts with subject motion, for HDR merging that must handle moving content.
- Laval HDR (sky / indoor) — high-dynamic-range outdoor-sky and indoor panoramas, widely used for lighting estimation. <http://hdrdb.com>
- Fairchild HDR Photographic Survey — a set of calibrated HDR scenes built for evaluating tone-mapping operators. <http://markfairchild.org/HDR.html>
22.14.6 Retouching and enhancement
- MIT-Adobe FiveK — 5,000 raw photographs, each retouched by five expert artists; the standard dataset for learning photo enhancement and tone adjustment from expert edits. <https://data.csail.mit.edu/graphics/fivek>
22.14.7 Depth and motion
- NYU Depth V2 — indoor RGB-D (Kinect) scenes, the workhorse for monocular depth. <https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html>
- KITTI — autonomous-driving benchmark with stereo, depth, and optical-flow ground truth. <https://www.cvlibs.net/datasets/kitti/>
- Middlebury — the classic high-accuracy stereo (and early optical-flow) benchmark. <https://vision.middlebury.edu/stereo/>
- MPI Sintel — a synthetic optical-flow benchmark with long-range, large-motion sequences from an animated film. <http://sintel.is.tue.mpg.de>
22.14.8 Color and white balance
- NUS / Gehler-Shi — color-constancy sets with a color chart in each scene, giving a measured ground-truth illuminant. <https://www2.cs.sfu.ca/~colour/data/shi_gehler/>
- Cube+ — single-illuminant color-constancy images with a calibration cube for ground truth. <https://ipg.fer.hr/ipg/resources/color_constancy>
22.14.9 Faces
- CelebA / CelebA-HQ — celebrity faces with 40 binary attribute labels; CelebA-HQ is a 30,000-image, 1024×1024 version prepared for generative modeling. <https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html>
- FFHQ — Flickr-Faces-HQ: 70,000 high-quality aligned faces, the StyleGAN training set. <https://github.com/NVlabs/ffhq-dataset>
- LFW — Labeled Faces in the Wild: the long-standing face-verification benchmark. <http://vis-www.cs.umass.edu/lfw/>
22.14.10 Inpainting, segmentation, and matting
- ADE20K — densely annotated scenes for scene parsing and semantic segmentation. <https://groups.csail.mit.edu/vision/datasets/ADE20K/>
- Composition-1k — the standard alpha-matting benchmark: 50 test foregrounds, each composited over 20 backgrounds, for 1,000 test images. <https://sites.google.com/view/deepimagematting>