
9.9 Optical stabilization

Hold a camera in your hands, look through it at a distant sign, and watch the letters quiver. Your hands are never still: muscle tremor, your pulse, the act of pressing the shutter button all rotate the camera by tiny, fast, more-or-less random angles. With a short enough exposure that quiver is frozen and you never see it. But photography is an integration over time — the sensor accumulates light for the whole duration the shutter is open — so if the image is moving across the sensor during that interval, every scene point smears along the path the image took. That smear is motion blur, and when its cause is the trembling camera rather than a moving subject, it is hand-shake blur.

The classic defenses are blunt: shorten the exposure (freeze the shake, but starve the sensor of light), or brace the camera (a tripod, a wall, your elbows on a table). Stabilization is the clever defense. Instead of shortening the exposure or immobilizing the whole camera, it lets the camera shake but moves an optical element — or the sensor itself — to keep the projected image still on the sensor for the duration of the exposure. A small gyroscope feels the shake; an actuator pushes a lens group or the sensor in the opposite sense, just enough to hold the image in place. Done well, this buys several stops of slower shutter speed, which is a large amount of light and the difference between a sharp hand-held shot and a blurry one. The chapter develops the problem (the blur budget), the two optical mechanisms (lens-shift and sensor-shift), and the electronic and computational alternatives — ending on the limit that frames the whole part's relationship between optics and software: optical stabilization fixes the camera's motion, and computation must fix the rest.

9.9.1 The problem: hand-shake and the blur budget

Start with the geometry, because it explains why long lenses are so much harder to hold steady. Hand tremor mostly rotates the camera — small angular wobbles in pitch and yaw — rather than translating it bodily through space. (Translation matters too, especially up close, but rotation dominates for ordinary subjects.) Now recall from FUNDAMENTALS how a camera rotation maps to the image: rotating the camera by a small angle $\Delta\theta$ slides the projected image across the sensor by

$$\Delta x \approx f\,\Delta\theta,$$

where $f$ is the focal length and $\Delta\theta$ is measured in radians. The displacement is proportional to focal length. A $0.1^\circ$ wobble that nudges the image a trivial amount on a wide-angle lens drags it across many pixels on a long telephoto. This is the geometric heart of hand-shake blur: the same physical tremor of your hands produces a larger image-plane smear the longer the lens.
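To put rough numbers on $\Delta x \approx f\,\Delta\theta$, the following minimal sketch converts a small rotation into a shift measured in pixels. The function name `image_shift_pixels` and the 4 µm pixel pitch are illustrative assumptions, not values from the text.

```python
import math

def image_shift_pixels(focal_length_mm, wobble_deg, pixel_pitch_um):
    """Image-plane shift in pixels for a small camera rotation (dx ~ f * dtheta)."""
    dtheta_rad = math.radians(wobble_deg)        # the formula needs radians
    shift_mm = focal_length_mm * dtheta_rad      # small-angle approximation
    return shift_mm * 1000.0 / pixel_pitch_um    # mm -> micrometres -> pixels

# Same 0.1 degree wobble, assumed 4 um pixels, a short lens and a long lens.
for f_mm in (24, 200):
    shift = image_shift_pixels(f_mm, 0.1, 4.0)
    print(f"{f_mm} mm lens: {shift:.1f} px")
```

Because the shift is linear in $f$, the ratio between the two results is simply the ratio of the focal lengths, whatever pixel pitch is assumed.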

fig-handshake-motion-blur
Figure 9.9.1. Hand-shake motion blur: a tremor smeared over the exposure. Left: a stylized hand-tremor trajectory — the small, jittery angular path the camera traces during one exposure (pitch vs. yaw, a few tenths of a degree, with the squiggle of pulse and muscle tremor). Center: a single bright scene point projected through the lens; as the camera rotates by $\Delta\theta$, its image walks across the sensor by $\approx f\,\Delta\theta$, so over the open-shutter interval the point integrates into a streak that traces the tremor path. Right: two streaks side by side for a short lens and a long lens under the same tremor, the long-lens streak conspicuously longer — blur $\approx f\,\Delta\theta$ grows with focal length. A caption notes that the blur also grows with shutter time, since a longer exposure integrates more of the path.

The smear also grows with exposure time, for the obvious reason: the longer the shutter stays open, the more of the tremor path the image integrates. Put the two together (blur grows with focal length and with shutter time) and you recover the old photographer's rule of thumb for hand-holding: keep the shutter time, in seconds, no longer than about $1/f$ with $f$ in millimeters (for example, $1/200\,\text{s}$ on a $200\,\text{mm}$ lens). At shorter exposures the tremor smear stays within a pixel-or-two tolerance; at longer ones, shake shows.
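The $1/f$ rule can be seen to fall out of a very crude model in which the camera rotates at a constant angular rate $\omega$ during the exposure, so the smear is about $f\,\omega\,t$. The sketch below solves that for the longest tolerable shutter time. The tremor rate (1.7°/s) and the blur tolerance (30 µm) are hypothetical values chosen only for illustration; real tremor is neither constant nor the same from person to person.

```python
import math

def max_shutter_s(focal_length_mm, tremor_deg_per_s, blur_tol_um):
    """Longest exposure that keeps the smear f * omega * t within blur_tol_um."""
    omega = math.radians(tremor_deg_per_s)                  # rad/s
    blur_rate_um_per_s = focal_length_mm * 1000.0 * omega   # image speed on sensor
    return blur_tol_um / blur_rate_um_per_s

# Assumed tremor rate and tolerance; only the 1/f scaling is the point here.
for f_mm in (50, 100, 200):
    t = max_shutter_s(f_mm, tremor_deg_per_s=1.7, blur_tol_um=30.0)
    print(f"{f_mm} mm: about 1/{1.0 / t:.0f} s")
```

Under these assumptions the limit scales exactly as $1/f$: doubling the focal length halves the usable shutter time.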

That rule exposes a tension we already met in FUNDAMENTALS as the blur budget inside the exposure triangle. You want a short shutter to freeze the shake — but a short shutter lets in less light, and the missing light has to be bought back somewhere. You can open the aperture wider (more light, but less depth of field — cross-ref Depth of field), or raise the ISO (more light, but more noise — cross-ref denoising). In dim conditions you may not be able to afford a shutter fast enough to freeze shake without paying an unacceptable price in depth of field or noise. The whole appeal of stabilization is that it relaxes the blur budget: by physically holding the image still, it lets you use a longer shutter — gathering more light — without the shake smear that a longer shutter would otherwise cause. It is, in effect, extra exposure time for free, spent on light instead of on blur.

There is one limit so important it must be stated before any mechanism: stabilization fights camera motion only. It senses and cancels the motion of the camera. A subject that moves during the exposure — a running child, a passing car, wind-blown leaves — still smears across the sensor, and no stabilizer can help, because the stabilizer has no idea the subject is moving and would have to move the image differently for that part of the frame than for the rest, which a single rigid shift of the lens or sensor simply cannot do. Stabilization can hold the whole image still relative to the camera; it cannot hold one moving object still relative to its moving background. Freezing a moving subject is therefore a separate problem — either a faster shutter (back to the blur budget) or, after the fact, motion deblurring as an inverse problem (forward-ref to the deblurring chapter in the Advanced inverse-problems part). Keep this scope in mind: everything in this chapter buys sharpness against your hands, not against the world.

9.9.2 Optical stabilization: lens-shift vs. sensor-shift (IBIS)

To cancel image motion you need two things: to sense how the camera is moving, and to act to undo it. Both flavors of optical stabilization share the same sensing and differ only in what they move.

Sensing. The camera carries a tiny micro-electro-mechanical-system (MEMS) gyroscope, the same class of chip found in your phone, that measures angular velocity about the pitch and yaw axes (and, in some systems, roll) hundreds or thousands of times a second. A gyroscope reports how fast the camera is rotating; integrating that signal estimates how far it has rotated. Often an accelerometer is paired with it (together they form an inertial measurement unit, or IMU) to help with low-frequency drift and translational shake. From the angular signal and the focal length, a controller computes the image-plane displacement the shake is about to cause (recall $\Delta x \approx f\,\Delta\theta$) and commands an actuator to produce the opposite displacement, so the net motion of the image on the sensor is driven toward zero. This runs as a tight feedback (control) loop: sense, compute the counter-motion, drive the actuator, and repeat, fast enough to track the tremor in real time (Figure 9.9.2).
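The sense, integrate, counter-shift cycle can be sketched as a toy one-axis simulation. Everything specific here is an assumption made for illustration: the function name `simulate_ois`, an ideal drift-free gyro, an 8 Hz sinusoidal tremor of 2°/s amplitude, an actuator that applies each command one sample late, and a hard travel limit. It is not a model of any real controller.

```python
import math

def simulate_ois(focal_length_mm=100.0, rate_hz=1000, duration_s=0.1,
                 travel_mm=0.5):
    """Return (peak unstabilized shift, peak residual shift) in mm."""
    dt = 1.0 / rate_hz
    theta = 0.0        # true camera angle (rad)
    theta_est = 0.0    # controller's estimate from integrating the gyro
    counter_mm = 0.0   # actuator position, commanded on the previous sample
    peak_raw = peak_residual = 0.0
    for k in range(int(duration_s * rate_hz)):
        # Hypothetical tremor: angular velocity in rad/s.
        omega = math.radians(2.0) * math.sin(2 * math.pi * 8.0 * k * dt)
        theta += omega * dt                      # the camera actually rotates
        image_shift = focal_length_mm * theta    # dx ~ f * dtheta
        residual = image_shift + counter_mm      # actuator holds the last command
        peak_raw = max(peak_raw, abs(image_shift))
        peak_residual = max(peak_residual, abs(residual))
        theta_est += omega * dt                  # integrate the gyro sample
        wanted = -focal_length_mm * theta_est    # opposite displacement
        counter_mm = max(-travel_mm, min(travel_mm, wanted))  # finite travel
    return peak_raw, peak_residual

raw, residual = simulate_ois()
print(f"peak shift without OIS: {raw:.4f} mm, with OIS: {residual:.4f} mm")
```

In this sketch the residual is set by how far the image moves during one loop period, which is why a faster loop tracks better. Calling `simulate_ois(travel_mm=0.05)` shows the other limit: once the actuator hits its stop, the uncancelled part of the shake reappears in the image.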

fig-gyro-feedback-loop
Figure 9.9.2. The stabilization feedback loop. A block diagram: a MEMS gyroscope / accelerometer (IMU) senses the camera's angular velocity → a controller integrates it and, using the focal length, computes the image-plane shift the shake would cause and the counter-shift needed to cancel it → an actuator (a voice-coil moving a floating lens group, or a magnetic stage moving the sensor) produces that counter-motion → the image on the sensor is held still, and its residual motion is fed back to close the loop. Arrows show the loop running continuously during the exposure; a side note marks that the loop's bandwidth (how fast it can react) and the actuator's travel range set the limit on how much shake it can absorb.

Acting, version one: lens-shift optical image stabilization (OIS). Put a dedicated floating lens group inside the lens that can slide laterally (perpendicular to the optical axis), driven by voice-coil actuators, the same kind of electromagnetic motor that moves a loudspeaker cone. Shifting that group sideways steers the whole projected image back to where it should be, opposing the shift the shake induced. This is the original form of optical stabilization: Canon's Image Stabilizer (IS), introduced in 1995 (cross-ref the image-stabilization history in the opening chapter), and Nikon's Vibration Reduction (VR). Its advantages flow from being built into the lens: the correcting group is optimized for that specific lens (its focal length, its aberration behavior), and because the correction happens before the sensor, the stabilized image is what the viewfinder and the autofocus system see, giving a steadier view to compose and focus with. The cost is that every lens needs its own stabilization hardware, and a long zoom with a heavy floating group is expensive to build well (Figure 9.9.3).

fig-ois-vs-ibis
Figure 9.9.3. Two ways to hold the image still: lens-shift vs. sensor-shift. Two side-by-side cutaway schematics fed by the same gyro signal. Left, lens-shift OIS: a floating lens group inside the barrel is driven sideways by voice-coil actuators (vertical arrows), bending the ray bundle so the image lands back on its original spot on a fixed sensor. Right, sensor-shift IBIS: the lens is rigid, and the sensor rides a magnetic stage that translates (and can rotate for roll) to chase the moving image. Each panel shows a single ray bundle from an off-axis point and how it is brought back to the same image location, one by moving glass, the other by moving silicon. A label notes that the two can be combined ("dual IS").

Acting, version two — sensor-shift in-body image stabilization (IBIS). Leave the lens rigid and instead mount the sensor on a small magnetic stage that can translate in the image plane — and, crucially, rotate about the optical axis — to follow the image as it moves. Because the moving part is in the camera body, IBIS works with any lens mounted on it, including old, un-stabilized, and adapted lenses that have no stabilization of their own. And because the sensor can rotate, IBIS can correct roll (rotation about the lens axis), an axis a laterally-shifting lens group cannot touch; combined with the two translational shake components it is often described as 5-axis stabilization (pitch, yaw, roll, plus horizontal and vertical translation). IBIS and lens OIS are not mutually exclusive: many systems run both at once — the lens correcting the axes it handles best, the body correcting the rest — a cooperation manufacturers market as "dual IS."

How many stops? The benefit is quoted in stops, meaning $\log_2$ of the ratio of shutter times the stabilizer lets you use. If stabilization lets you hand-hold at $1/15\,\text{s}$ where without it you would need $1/250\,\text{s}$ for the same sharpness, that is a ratio of about $16{:}1$, or $\log_2 16 = 4$ stops. Modern systems advertise roughly 3 to 7 stops of benefit. The number is not unbounded, and the ceilings are physical: the gyro and controller drift on low-frequency motion (slow sway is hard to distinguish from the camera being deliberately panned), the actuator has a finite travel range (it can only shift so far before it hits a stop), and for IBIS the moved sensor must stay inside the lens's image circle — push it too far and the corners fall off the projected image. Past those limits, residual blur creeps back in, which is one reason the burst-capture alternative at the end of the chapter exists.
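The stops arithmetic is short enough to write out. This is a minimal sketch with hypothetical helper names (`stops_gained`, `slowest_shutter_s`); it only restates the $\log_2$ definition above.

```python
import math

def stops_gained(unstabilized_s, stabilized_s):
    """Stops of benefit: log2 of the ratio of usable shutter times."""
    return math.log2(stabilized_s / unstabilized_s)

def slowest_shutter_s(unstabilized_s, stops):
    """Slowest hand-holdable shutter time given a quoted stops figure."""
    return unstabilized_s * 2 ** stops

# The example from the text: 1/250 s without stabilization, 1/15 s with it.
print(f"{stops_gained(1 / 250, 1 / 15):.2f} stops")
# Going the other way: what a quoted 4-stop benefit allows from 1/250 s.
print(f"{slowest_shutter_s(1 / 250, 4):.3f} s")
```

The ratio 250/15 is slightly more than 16, so the first result comes out just over 4 stops, consistent with the "about $16{:}1$" in the text.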

9.9.3 Digital / electronic stabilization and the computational alternatives

Optical stabilization moves real hardware. The cheaper, hardware-free path is to fix the motion in the pixels after the fact: electronic image stabilization (EIS), also called digital stabilization. The idea is purely computational: estimate how the camera moved between frames (from the gyro, from optical-flow analysis of the image content, or both), then crop in by a margin and warp each frame so that successive frames line up. The wobble is absorbed into the margin you cropped away, and the visible frame stays steady (Figure 9.9.4).

fig-digital-vs-optical-stab
Figure 9.9.4. Electronic (digital) vs. optical stabilization. Top row, EIS: a sequence of video frames whose full sensor area jitters frame-to-frame; an inset "output window" is cropped inside each frame and re-positioned (and slightly warped) so the output sequence is steady — with a visible loss of field of view (the discarded margin) and the warning that this acts between frames, not within one exposure, so it cannot remove the blur smeared into a single frame. Bottom row, optical (OIS/IBIS): the same shake cancelled by moving the lens/sensor during the exposure, so each frame is captured sharp at full sensor resolution and full field of view. A side note marks the rolling-shutter caveat: on a complementary metal-oxide-semiconductor (CMOS) sensor the EIS warp must also un-skew rolling-shutter distortion.

EIS is attractive because it has no moving parts and costs nothing in hardware — it is software running on frames the camera already has. But it carries two real limitations. First, the crop-and-warp costs resolution and field of view: the margin you reserve for the correction is thrown away, so a digitally-stabilized video is narrower and (after up-scaling back to size) slightly softer than the raw capture. Second, and more fundamentally, EIS works between frames, not within a single exposure. It can align frame $n$ to frame $n+1$, smoothing the jitter of a video sequence — but it cannot reach inside one frame to undo the smear that the shake already baked into that frame's exposure. So EIS stabilizes the motion of the video without reducing the per-frame motion blur the way optical stabilization does. The two are complementary, and phones combine them: gyro-driven OIS to keep each frame sharp, plus gyro-plus-optical-flow EIS to glue the frames into smooth video.
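The crop-and-reposition idea, and its cost in field of view, can be sketched on toy frames. The helper names, the tiny 16 by 12 frames, and the whole-pixel shake values are assumptions for illustration; a real EIS pipeline estimates sub-pixel motion and applies a full warp rather than a plain crop.

```python
def make_frame(h, w, point_xy):
    """Blank h-by-w frame with a single bright pixel at (x, y)."""
    frame = [[0] * w for _ in range(h)]
    frame[point_xy[1]][point_xy[0]] = 1
    return frame

def find_point(frame):
    """Return (x, y) of the bright pixel, or None if it was cropped away."""
    for y, row in enumerate(frame):
        if 1 in row:
            return (row.index(1), y)
    return None

def stabilize_crop(frame, shake_xy, margin):
    """Cut the output window out of the frame, moved along with the measured shake."""
    h, w = len(frame), len(frame[0])
    dx = max(-margin, min(margin, shake_xy[0]))   # cannot exceed the margin
    dy = max(-margin, min(margin, shake_xy[1]))
    x0, y0 = margin + dx, margin + dy
    return [row[x0:x0 + w - 2 * margin] for row in frame[y0:y0 + h - 2 * margin]]

h, w, margin = 12, 16, 2
shakes = [(0, 0), (2, -1), (-1, 2)]               # assumed per-frame image motion
raw = [make_frame(h, w, (8 + dx, 6 + dy)) for dx, dy in shakes]
steady = [stabilize_crop(f, s, margin) for f, s in zip(raw, shakes)]
fov_retained = (w - 2 * margin) * (h - 2 * margin) / (w * h)

print([find_point(f) for f in raw])      # the point jitters in the raw frames
print([find_point(f) for f in steady])   # and sits still in the output window
print(f"fraction of the frame kept: {fov_retained:.2f}")
```

Note what the sketch cannot do: it only repositions whole frames, so any smear already recorded inside a frame would pass through unchanged.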

There is one more wrinkle EIS must handle on modern sensors: the rolling shutter. A CMOS sensor reads its rows sequentially rather than all at once, so during fast motion the top of the frame is captured a few milliseconds before the bottom, skewing straight verticals into slanted ones (the classic "jello" wobble — cross-ref the motion/video treatment of rolling shutter). Electronic stabilization that warps frames must therefore not only translate them to cancel shake but also un-skew the rolling-shutter distortion, which means the warp is a per-row geometric correction, not a single rigid shift. Getting this right is exactly what makes phone video look uncannily smooth.
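A per-row correction of this kind can be sketched for the simplest case: a constant yaw rate during readout, so that each later row is displaced a little further sideways. The function names, the whole-pixel rounding, and the numbers (a 30 ms readout, a focal length of 1000 pixels, a yaw rate of 0.14 rad/s) are hypothetical; real pipelines interpolate at sub-pixel precision and use the gyro trace rather than a constant rate.

```python
def row_shifts_px(n_rows, readout_s, omega_rad_s, focal_px):
    """Horizontal shift of each row: later rows are read after more rotation."""
    line_time = readout_s / n_rows
    return [focal_px * omega_rad_s * (r * line_time) for r in range(n_rows)]

def unskew(frame, shifts_px):
    """Shift every row back by its own rounded offset, zero-filling the gap."""
    out = []
    for row, s in zip(frame, shifts_px):
        k = max(-len(row), min(len(row), round(s)))
        if k >= 0:
            out.append(row[k:] + [0] * k)        # content moved right: pull left
        else:
            out.append([0] * (-k) + row[:k])     # content moved left: push right
    return out

n_rows, width, true_col = 6, 10, 3
shifts = row_shifts_px(n_rows, readout_s=0.03, omega_rad_s=0.14, focal_px=1000.0)

# Build a skewed frame: a vertical line that leans because of the row delays.
skewed = []
for s in shifts:
    row = [0] * width
    row[true_col + round(s)] = 1
    skewed.append(row)

fixed = unskew(skewed, shifts)
print([row.index(1) for row in skewed])   # slanted line
print([row.index(1) for row in fixed])    # straight again
```

The key design point is that the shift is a list with one entry per row, not a single number for the frame, which is what distinguishes this from a rigid translation.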

Finally, the computational-photography flip that closes the chapter and points to the rest of the book. Optical stabilization answers the blur budget by enabling one long, stabilized exposure. The computational alternative answers it the opposite way: take a burst of many short, individually blur-free exposures and combine them. Each short frame is so brief that hand-shake cannot smear it (the shake is frozen, not cancelled), but each short frame is also dark and noisy. So the burst is then aligned and merged: software estimates the small frame-to-frame offset the shake produced and shifts the frames back into register, then averages them. Averaging $N$ aligned frames cuts the random noise by a factor of $\sqrt{N}$ (recall the multi-frame denoising result), approaching the signal-to-noise ratio that a single long exposure of the same total duration would have had (falling short only by the read noise that each extra frame adds), but without the motion blur, because every frame was short. This align-and-merge burst pipeline (cross-ref the multi-frame denoising and super-resolution machinery) is how a phone with a tiny sensor and limited optical stabilization nonetheless produces sharp, clean low-light photographs: it replaces a steady exposure with a steady computation. It is the software counterpart of optical stabilization, and it is forward-referenced into the Advanced computational-camera part. Optics holds the image still with hardware; computation holds it still with alignment. These are two ends of the same design space, exactly the theme of this part.
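The align-and-merge step can be sketched in one dimension. This is a toy under stated assumptions, not a production burst pipeline: the scene is a 1-D signal, shake is a whole-sample circular shift, noise is independent Gaussian noise, and alignment is a brute-force search against the first frame. All names (`shift`, `estimate_offset`, `align_and_merge`) are hypothetical.

```python
import random
import statistics

def shift(signal, k):
    """Circularly shift a 1-D signal right by k samples (k may be negative)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)

def estimate_offset(ref, frame, max_shift=4):
    """Brute-force search for the shift that best registers frame onto ref."""
    def ssd(k):
        moved = shift(frame, -k)
        return sum((a - b) ** 2 for a, b in zip(ref, moved))
    return min(range(-max_shift, max_shift + 1), key=ssd)

def align_and_merge(frames):
    """Shift every frame back into register with the first, then average."""
    ref = frames[0]
    aligned = [shift(f, -estimate_offset(ref, f)) for f in frames]
    return [sum(col) / len(aligned) for col in zip(*aligned)]

random.seed(0)
scene = [20.0 if 20 <= i < 30 else 0.0 for i in range(64)]
sigma, n_frames = 1.0, 16
offsets = [0] + [random.randint(-3, 3) for _ in range(n_frames - 1)]
frames = [[v + random.gauss(0.0, sigma) for v in shift(scene, k)] for k in offsets]

merged = align_and_merge(frames)
noise_single = statistics.pstdev([a - b for a, b in zip(frames[0], scene)])
noise_merged = statistics.pstdev([a - b for a, b in zip(merged, scene)])
print(f"single-frame noise: {noise_single:.2f}, merged noise: {noise_merged:.2f}")
```

With 16 frames the merged noise should land near a quarter of the single-frame noise, the $\sqrt{N}$ improvement described above, provided the alignment step finds the right offsets.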