• recap, don't re-teach: **depth = the z-coordinate**, *not* the ray length to the point; **unprojection** (pixel + depth → 3-D point) inverts perspective projection (defined in [[Multiple view geometry]] / Fundamentals). A **depth map** is just a per-pixel z image; a **point map** (DUSt3R's term, below) is its unprojected 3-D field.
• the cues/sensors that *give* depth, as a quick inventory (each lives in its home chapter; collected here for contrast):
• **stereo / disparity** — two views, triangulate matched points (baseline + correspondence) → [[Multiple view geometry]]
• **multi-view** — many views (the SfM/MVS pipeline below)
• **active depth** — **time-of-flight**, **structured light** (project a known pattern), **LiDAR** — return z directly (pointer to sensors/optics; RealSense & phone face-ID dot projectors)
• **defocus / focus** — depth-from-defocus and focal stacks → [[Advanced computational photography#Light field and multi-aperture imaging]], Depth of field
• **dual-pixel** — the half-aperture parallax phones already capture for autofocus, reused for portrait depth → Optics (dual-pixel AF)
• **motion** — structure-from-motion / optical-flow parallax → [[Matching pixels across space and time]]
• **monocular cues** — a *single* still already implies depth to a human (occlusion, perspective, texture gradient, shading, familiar size, haze); turning those into numbers is the next chapter
• the recurring catch: most reconstructions recover geometry only **up to a similarity** (scale/rotation/translation) unless something fixes the **absolute scale** — a known baseline, a calibration target, or a metric sensor. Keep "relative" vs "metric" depth distinct.