💬Comments welcome. To leave a note, select any text and click the note / highlight button that pops up — or open the panel with the tab at the top-right (‹). Notes are visible only inside our private review group.
jump to
💡 In a hurry? Jump to this chapter’s 1 big lesson ↓

5.8 Seam optimization

You have stitched a panorama and one seam will not go away. Two photographs overlap, already aligned so the static background registers, and somewhere in that overlap you must decide, pixel by pixel, which photograph to keep. Wherever the decision flips from one source to the other, there is a seam. Run it straight down the middle and a hard line appears — the two exposures never matched perfectly, a cloud drifted between frames, a pedestrian walked through. Hide the line by feathering across the boundary and you swap the line for ghosting: the pedestrian, half from each frame, becomes a transparent double. The hard cut and the soft blend both fail, and for the same reason — they put the seam in the wrong place.

The fix is to choose where the seam goes. Route it through the hedge, the gravel, the cluttered roofline — anywhere the two photographs already agree, or the texture is so busy that a switch cannot be seen — and keep it out of the flat sky and the blank wall, where the smallest mismatch would shout. That, stated once, is the entire subject of this chapter: find the least-noticeable cut. It is the precise dual of the edge-preserving filtering we just finished. There we labored to avoid edges while smoothing; here we seek them while cutting.

That duality is the part's framing payoff, and it is worth saying plainly. The three families of EDGES MATTER are three things to do with an image's edges. Poisson editing reconstructs an image from its edges. Bilateral filtering and its relatives smooth everything except across its edges. Seam optimization cuts along its edges. All three respect the edges — the thesis the part opened with — and this chapter develops the third leg, picking up exactly where the edge-preserving chapters left off.

What unifies the methods is one idea wearing three hats. Lay the image out as a graph — or a grid, which is just a graph with a regular neighbourhood — attach a cost to cutting at each place, and ask an optimizer for the cheapest cut. When the cut is a one-dimensional path running top to bottom, the optimizer is dynamic programming. When the cut is a two-dimensional boundary separating one region from another, the optimizer is min-cut / max-flow on the graph. When we want a balanced split with no seeds at all, the optimizer is a spectral relaxation — normalized cuts. Between methods the optimizer rarely changes; you take it off the shelf. What changes is the energy: what you decide to penalize. That is the lesson this chapter exists to teach.

💡 Big lesson — image edits as discrete optimization on a graph

Many image edits are discrete optimization on a graph or grid — dynamic programming (a least-cost path), min-cut / max-flow (a globally optimal boundary), or a spectral relaxation (normalized cuts) — and the energy (affinity) you choose is the whole game. The optimizer is generic and off the shelf; what you penalize — gradient magnitude along a seam, label disagreement across an edge, the cost of severing two regions — is the entire design. Read this as the L4 affinity principle from the bilateral chapter turned inside out. There, the affinity between two pixels said "average these together." Here, the very same quantity, sitting on a graph edge, says "do not cut here." High affinity (similar neighbours) marks an expensive place to cut; low affinity (a genuine edge) marks a cheap one. Design that cost well and a generic min-cut returns the boundary a person would have drawn by hand. (Registered as new lesson L12 · image edits as discrete optimization on a graph (the energy is the design), first appearance here. Extends L4. Also relevant downstream: graph-cut photomontage composed with a Poisson blend; compositing / segmentation / matting; texture synthesis; stereo and Markov-random-field labeling.)

We will reach that lesson the way the field did: first the interactive ancestors that let a person steer the cut (intelligent scissors, snakes), then the automatic global optimizers that need no steering (seam carving, graph cut, normalized cuts), and finally the places where the same cut-along-a-seam idea spreads to texture, composites, and time.

5.8.1 Find the non-edges — least-noticeable cuts

Before any algorithm, the principle. A cut between two pieces of image is least noticeable where the image is already changing — across a strong edge, inside busy texture, along a high-frequency boundary — and most noticeable in smooth regions, where the eye has nothing else to attend to and reads the faintest discontinuity as a flaw. So every method here seeks out edges and hides the seam inside them, exactly inverting the goal of edge-preserving smoothing, which seeks out edges in order to protect them. Same target, opposite intent.

The panorama makes the recipe concrete, and it pays off the forward-reference from the panorama alignment-and-blending chapter — RANSAC (random sample consensus) for the alignment, then the blend; → blending chapter, which promised that "the optimal boundary between two images" would be found by graph cut. Build a difference map over the overlap — a per-pixel record of where bad things happen: where the two aligned images disagree in color, where a moving object shows up in one frame but not the other. Then find the boundary that crosses the fewest of those bad pixels. A seam that threads around every disagreement leaves a join nobody can see. The blending chapter's hard line and ghosted soft line were both just the price of not choosing the boundary.

Two regimes follow, and the chapter is organized around them. In the interactive regime a person guides the cut and the algorithm snaps it to the nearest good path — intelligent scissors and snakes, the historical ancestors. In the automatic regime a global optimizer finds the whole cut unaided — seam carving, graph cut, normalized cuts, the main line. We meet the ancestors first because they make "the cut is a least-cost path" tangible with a human in the loop; the automatic methods are then the same idea with the human removed.

5.8.2 Intelligent scissors / live-wire — the cut as a least-cost path

The first tool to make the principle interactive is intelligent scissors, also called live-wire (Mortensen & Barrett 1995). It is a selection tool. You want to trace an object's outline to cut it out, and tracing it freehand, pixel by pixel, is slow and shaky. Intelligent scissors lets you click one seed point on the boundary; then, as you sweep the cursor roughly along the object's edge, a "live wire" snaps to the strongest nearby edge and traces the boundary for you in real time. When the path looks right, click to anchor it and carry on. The tool does the precise edge-following; you supply only the rough intent (Figure 5.8.1).

fig-snake-livewire
Figure 5.8.1. Interactive boundary tools. Intelligent scissors / live-wire: from a clicked seed, the least-cost path on a gradient-derived cost graph snaps to the nearest strong edge and traces the object boundary as the cursor moves. Snake (active contour): the continuous cousin — an elastic curve dropped near the object relaxes onto the same boundary by minimizing its energy. Both lock onto image edges to select an object.

The mechanism is a shortest path on a cost graph, and it is the cleanest first instance of this chapter's machinery. Make every pixel a node, connect neighbours, and set each connection's cost low where the image gradient is high. A good boundary runs along strong edges, so we want the cheap path to be the one that hugs them — the cost is essentially the inverse edge magnitude, plus terms that reward following the gradient's direction smoothly rather than cutting across it. Now run Dijkstra's algorithm from the seed: it returns the least-cost path from the seed to every other pixel at once, so the optimal wire to the cursor's current position is already computed and can be drawn instantly as the cursor moves. Drift the cursor over a smooth region and the cheapest path still detours to ride the nearby edge — which is exactly the "snap to the boundary" behaviour you feel.

The payoff is the through-line of the whole chapter, visible already in this first example: this is the same machinery we will use for seam carving — a least-cost path on a gradient-derived energy — aimed at the opposite goal. Live-wire makes its cost low along edges, so the path follows an edge to select it. Seam carving will make its cost low in flat regions, so the path avoids edges and threads through non-content. The optimizer (a least-cost path) is identical; only the energy, and so the meaning of the cut, differs. The energy is the design.

5.8.3 Snakes (active contours) — the continuous cousin

A snake, or active contour (Kass, Witkin & Terzopoulos 1988), is the continuous, variational counterpart of live-wire. In place of a discrete path on a pixel graph, a snake is a smooth curve $v(s)$ — picture an elastic loop dropped near an object — that relaxes onto the object's boundary by minimizing an energy. Where live-wire is combinatorial, the snake is calculus: it flows downhill on an energy functional until it settles into the boundary, like a deformable band shrink-wrapping the object.

The energy has three parts, and reading them back in words is the whole intuition:

$$ E = \int \Big( E_\text{int}\big(v(s)\big) + E_\text{image}\big(v(s)\big) + E_\text{con}\big(v(s)\big) \Big)\, ds. $$

The internal term keeps the curve well-behaved — smooth, and not too crinkly — through two penalties on the curve's own shape,

$$ E_\text{int} = \alpha\,\lvert v'(s)\rvert^2 + \beta\,\lvert v''(s)\rvert^2, $$

where $\alpha$ is an elastic tension (penalizing length, so the curve wants to be short and not stretch) and $\beta$ a bending stiffness (penalizing curvature, so the curve wants to be smooth and not kink). The image term pulls the curve toward strong edges,

$$ E_\text{image} = -\lvert \nabla I\rvert^2, $$

negative because we minimize energy, so the curve is drawn to places where gradient magnitude is large — the boundaries. The constraint term $E_\text{con}$ lets a user nudge the curve, pinning a point or pushing it off a bad local fit. Together: be smooth (internal), sit on the edges (image), obey my hints (constraint). Minimizing $E$ is gradient descent, which in discretized form is a sparse banded linear solve at each step — the same linear-algebra machinery as the regression chapter, now driving a curve instead of an image.

The honest catch is that a snake is local. It descends to the nearest energy minimum, so it needs a decent initialization and can lodge on a spurious edge or fail to crawl into a deep concavity. That fragility is exactly what motivates the global optimizers next — methods that find the best boundary outright, with no initialization and no local minima to trap them. The many patches for snakes — balloon forces that inflate the curve, gradient vector flow that broadens an edge's pull, level-set reformulations that let the curve change topology — we note only in passing; the global graph cut is the cleaner answer.

5.8.4 Graph cut — globally optimal boundaries by min-cut / max-flow

The leap from local to global is the heart of the chapter (Boykov & Jolly 2001). Stop tracing a curve and instead pose the cut as a binary labeling: every pixel is assigned a label — object or background — and the boundary is wherever the labels change. Choosing the labels well is a single optimization whose minimum, in the binary case, is found exactly and globally by min-cut / max-flow on a graph. No initialization, no getting stuck — the global optimum, every time.

That global guarantee carries one important condition, worth stating because it is what makes the method honest. The exact min-cut equivalence holds only when the smoothness term $V_{pq}$ is submodular — intuitively, it must never cost more for two neighbours to agree than to disagree, formally $V_{pq}(0,0) + V_{pq}(1,1) \le V_{pq}(0,1) + V_{pq}(1,0)$. The Potts penalty and the $e^{-\Delta^2}$ edge weights used in practice satisfy it, so the global optimum is genuinely available here. The multi-label generalization (more than two labels) is by contrast NP-hard, and is only approximated — by $\alpha$-expansion, below. Exact global optimality is the binary, submodular case; keep that scope in mind.

The energy is the same shape that runs through all of computer vision:

$$ E(L) = \sum_p D_p(L_p) + \sum_{(p,q)\in\mathcal{N}} V_{pq}(L_p, L_q). $$

Read it back in two pieces. The data term $D_p(L_p)$ says how well pixel $p$ fits each label on its own — built from user scribbles ("these pixels are object, those background") turned into color models, so $D_p$ is small when $p$'s color matches the label it is handed. The smoothness term $V_{pq}(L_p, L_q)$ is a penalty paid whenever two neighbours $p$ and $q$ receive different labels — it is what makes the boundary short and coherent instead of a speckled mess. And here is the crucial design choice: make $V_{pq}$ small across true edges, i.e. small wherever the colors of $p$ and $q$ already differ a lot. Then the boundary, which must break some neighbour pairs, prefers to break the pairs that already differ — it prefers to cut along edges. That is the panorama principle, now written as an energy.

Sidebar — the smoothness term is the affinity (callback to L4)

The smoothness weight $V_{pq}$ is not a new idea — it is the L4 affinity from the bilateral chapter, read as a cost of cutting. There, the affinity (large when $I_p \approx I_q$) said "these two pixels belong together — average them." Here, the same quantity says "these two pixels belong together — so it is expensive to put a boundary between them." Make $V_{pq}$ large for similar neighbours (do not cut here, they match) and small for dissimilar ones (cheap to cut, there is already an edge), typically $V_{pq} \propto \exp(-\lVert I_p - I_q\rVert^2 / 2\sigma^2)$ — the bilateral range weight exactly. It is also the very same object as the colorization affinity and the matting Laplacian. One similarity measure, used to connect in the filtering chapter and to separate here.

The graph construction makes "min-cut" literal (Figure 5.8.2). Every pixel is a node; add two extra terminal nodes, a source standing for object and a sink standing for background. Connect each pixel to both terminals with t-links whose weights are the data costs $D_p$ — a pixel that strongly looks like object gets a fat link to the source and a thin one to the sink. Connect neighbouring pixels with n-links whose weights are the smoothness costs $V_{pq}$. Now a cut that severs the graph into a source side and a sink side, dropping every pixel onto one terminal, is a labeling — and the total weight of the severed edges is exactly the energy $E(L)$. So the minimum-energy labeling is the minimum-weight cut, and by the max-flow / min-cut theorem that cut is found by pushing maximum flow from source to sink (Boykov & Kolmogorov 2004 give a max-flow algorithm tuned for these grid graphs). The min cut slices through the cheap edges — the ones across real image edges, where $V_{pq}$ is small — and returns the object boundary.

fig-graphcut-segmentation
Figure 5.8.2. Binary segmentation as a minimum cut. Every pixel is a node; two terminals are added — a source (object) and a sink (background). t-links join each pixel to both terminals with weights set by the data term $D_p$ (how much the pixel looks like object vs background); n-links join neighbouring pixels with weights set by the smoothness term $V_{pq}$, made small across true image edges. The min cut (dashed) severs the cheapest set of edges separating source from sink — preferring the low-$V_{pq}$ links along real edges — and the side each pixel lands on is its label.

Two extensions, named not developed. First, multi-label. With more than two labels — photomontage choosing among several source photos, stereo choosing among many depths, denoising choosing among many intensities — the exact problem is NP-hard. It is solved approximately, and very well, by $\alpha$-expansion (Boykov, Veksler & Zabih 2001): iterate a sequence of binary graph cuts, each asking "which pixels should switch to label $\alpha$?", cycling through the labels until nothing improves. Second, GrabCut (Rother, Kolmogorov & Blake 2004): graph-cut segmentation made nearly effortless to drive — the user drags one rectangle loosely around the object, and the method then alternates running a graph cut and re-estimating Gaussian-mixture color models for object and background, tightening the cut each round. The application — interactive segmentation and matting — lives in the single-image part; here we keep the machinery and cross-refer both ways.

5.8.5 Normalized cuts — the spectral relaxation

Plain min-cut has a bias worth confronting head-on. If you minimize the raw weight of a cut with no terminals and no seeds — just "split this image into two pieces as cheaply as possible" — the cheapest split is almost always to lop off a single isolated pixel (or a tiny corner), because severing one weakly-connected node cuts very little weight. Min-cut is biased toward tiny, unbalanced cuts. When we actually want a balanced partition — roughly object-sized against roughly background-sized — raw cut weight is the wrong objective.

The fix is normalized cuts (Shi & Malik 2000): divide the cut weight by how strongly each side is connected to the whole, so a split that strands a tiny fragment is penalized for that fragment's feeble total connection. With an affinity weight $w_{uv}$ on every pair of nodes (again the L4 affinity — color and spatial similarity), set $\mathrm{cut}(A,B) = \sum_{u\in A,\, v\in B} w_{uv}$ and the association $\mathrm{assoc}(A,V) = \sum_{u\in A,\, t\in V} w_{ut}$ (how strongly side $A$ connects to everything), and minimize

$$ \mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}. $$

A lopsided split makes one of the two denominators small, blowing the objective up, so the minimum is a balanced partition that still cuts along low-affinity (edge) boundaries. Same affinity input as graph cut; a different, balance-aware objective.

How it is solved we keep to intuition; the eigenvector machinery is deferred to the appendix. Exact normalized cut (Ncut) is NP-hard, but it relaxes — allow the binary "which side" indicator to take continuous values — into a generalized eigenproblem

$$ (D - W)\,x = \lambda\, D\, x, $$

where $W$ is the affinity matrix and $D$ its diagonal degree matrix (each node's total affinity). (For the eigenvector / spectral background this leans on — eigenvalues, eigenvectors, the generalized problem — see Refreshers → linear algebra; we use the result here, not the derivation.) The second-smallest eigenvector of this system is a smooth, real-valued "soft partition" of the image; threshold it for the two regions, and recurse for more. The takeaway without the linear algebra: there is a spectral route to the best boundary — an eigenvector computation — running alongside the combinatorial route of graph cut. The contrast is worth stating plainly. Graph cut is combinatorial, needs seeds, exact and binary; normalized cut is spectral, unsupervised, approximate, and naturally multi-way. Two answers to the same question — where is the best boundary? — fed by the same affinity.

5.8.6 Seam carving — content-aware resize by DP

Now the automatic method that gives the chapter its name (Avidan & Shamir 2007). The problem is content-aware resizing (or retargeting): a wide photo has to fit a tall phone screen, or a square thumbnail, and the two obvious options are both poor. Uniform scaling squashes every face; cropping throws away the edges of the scene. Seam carving does neither. It changes the aspect ratio by removing the least-important pixels, one connected seam at a time, so salient content keeps its shape while the boring regions quietly shrink (Figure 5.8.3).

fig-seam-carving
Figure 5.8.3. Content-aware resizing by seam carving. From the energy map $e=\lvert\partial_x I\rvert+\lvert\partial_y I\rvert$, the lowest-energy vertical seams (one pixel per row, 8-connected) are found by dynamic programming and removed. Left: the original with several computed seams drawn in — they thread through low-detail regions and route around the subject. Centre: the seam-carved result, narrower but with the subject's proportions intact. Right: a naive uniform rescale to the same width, which squashes the subject — the comparison seam carving is built to win.

A seam is the unit of removal: an 8-connected, top-to-bottom path with exactly one pixel per row, where each row's pixel lands within one column of the row above (so deleting it leaves a coherent image one pixel narrower). Among all such paths we want the one of minimum total energy, where the per-pixel energy is gradient magnitude — a proxy for importance or saliency:

$$ e(i,j) = \lvert \partial_x I(i,j)\rvert + \lvert \partial_y I(i,j)\rvert. $$

The cheapest seam threads through low-energy, low-detail regions — flat sky, blurred background, smooth wall — and avoids the subject, whose edges are expensive to cross. Remove it and the photo narrows by one pixel where it could least be missed. Repeat to narrow further; duplicate the lowest-energy seam instead to widen.

The mechanism is the same least-cost path as live-wire, now solved by dynamic programming, and Figure 5.8.4 walks the table. Build a cumulative-energy table $M$ top-down. The first row is just the energy. For every later row, the cheapest way to reach pixel $(i,j)$ is its own energy plus the cheapest of the three pixels it could have descended from:

$$ M(i,j) = e(i,j) + \min\big( M(i{-}1,\,j{-}1),\; M(i{-}1,\,j),\; M(i{-}1,\,j{+}1) \big). $$

Read it back: "the cheapest way down to me is my own cost, plus the cheapest way down to my best parent." When the table is full, the bottom row holds the total cost of the best seam ending at each column; take $\arg\min_j M(H, j)$ for the cheapest endpoint, then back-trace upward — at each step step to whichever of the three parents supplied the minimum — to recover the seam itself. One pass down to fill $M$, one pass up to read off the path:

M[0, :] = e[0, :]
for each row i from 1 to H-1:
    for each column j:
        M[i, j] = e[i, j] + min(M[i-1, j-1], M[i-1, j], M[i-1, j+1])

j = argmin over columns of M[H-1, :]
seam = [(H-1, j)]
for each row i from H-1 down to 1:
    j = column of the parent in {j-1, j, j+1} that gave M[i, j]
    seam.prepend((i-1, j))
return seam

This is exactly the live-wire / scissors recurrence; only the energy (low in flats, not low on edges) and the cut's role (avoid content, not follow it) have changed. The energy is the design, again.

fig-seam-energy-dp
Figure 5.8.4. The seam dynamic program on a small grid. Left: the per-pixel energy $e(i,j)$. Right: the cumulative table $M(i,j)=e(i,j)+\min$ of its three parents, filled top-down; the minimum of the bottom row is the cheapest seam's total cost, and back-tracing the chosen parent at each step (arrows) recovers the seam itself (highlighted).

The honest limit is well known and worth stating. Seam carving fails on busy or structured images. If there is no low-energy path — a packed crowd, a brick wall, a face filling the frame — every seam must cross something, and carving puts kinks in straight lines and dents in faces. Better energies help: forward-energy scores the new edges a removal would create rather than the ones it removes, and saliency maps or face detectors can be folded into $e$ to protect what matters. But seam carving is a tool for images with slack in them, not a universal resizer. Note, do not dwell.

5.8.7 Time-lapse via DP — a seam through time

A short bridge, because the least-cost-path idea does not care whether the axis is space or time. Assembling a smooth time-lapse — a day-to-night sequence, a season's worth of one tree — out of a pile of candidate frames or exposures is the same dynamic program, with time as the running axis. Lay the candidate frames in a column per time step; cost a transition from one frame to the next by the discontinuity it introduces (an exposure jump, a color shift, a lighting flicker); and run a DP to pick the one frame per step that minimizes the total jump across the whole sequence. It is the seam recurrence with frames for pixels and time for rows.

Coverage note

No dedicated slide or transcript was found for the time-lapse-via-DP example; this section is written from the outline and standard practice as a short bridge between seam carving and video textures. Treat its scope as provisional — it may be trimmed to a single sentence under seam carving, or expanded if a source surfaces.

5.8.8 Video textures — looping a seam in time

The same time-axis cut, made into a product (Schödl et al. 2000). A short clip of a stochastic, repetitive phenomenon — a waving flag, a candle flame, a fountain, a child on a swing — can be turned into an arbitrarily long, seamless loop that never visibly repeats. The trick is to find good transitions: pairs of frames that look similar enough that playback can jump from one to the other without the eye catching the cut (Figure 5.8.5).

fig-video-texture-loop
Figure 5.8.5. Video textures as a cut in time. Each frame of a short clip is a node; an edge from frame $i$ to a similar later frame $j$ is a low-cost transition (an invisible jump). A low-cost back-edge that closes a loop lets the clip play forever without a visible repeat — the seam idea applied to time.

The mechanism is a cut in a time graph. Make every frame a node. The cost of jumping from frame $i$ to frame $j$ is their inter-frame dissimilarity — low cost means the two frames look nearly identical, so the jump is invisible; high cost means a visible pop. Finding a play order, or a tight loop that closes back on itself through a low-cost back-edge, is least-cost path / dynamic programming on this graph — the seam idea applied to time, picking the moments where the clip can fold back on itself unseen. The payoff is the chapter's principle one more time: hide the cut where things already match. It is panorama seam-finding and seam carving, now between frames rather than between pixels.

5.8.9 GraphCut textures & photomontage — cut, then blend

Two applications close the chapter by showing the seam machinery composing with the rest of the part.

GraphCut textures (Kwatra et al. 2003) is graph cut aimed at texture synthesis. To extend a texture, or composite two overlapping images, place the pieces so they overlap and run a min-cut through the overlap region to choose the seam — the boundary that picks, pixel by pixel, the path where the two sources agree best (smoothness cost small where the patches match). The min cut routes the join through the places the two textures are indistinguishable, so the synthesized texture has no visible patch boundaries. It is the texture-synthesis sibling of panorama seam-finding: the same "cut where the sources agree" idea, now tiling a plane with texture instead of stitching two photos.

Interactive digital photomontage (Agarwala et al. 2004) is the clean statement of how the three legs of EDGES MATTER compose, and it is shared with the Poisson chapter for exactly that reason. The pipeline is cut, then blend. First a graph cut chooses which source photograph contributes each pixel — the user paints rough strokes ("take the eyes from this frame, the smile from that one") and a multi-label graph cut finds seams between sources that fall along edges where switching is cheap. That settles where to switch. Then a gradient-domain (Poisson) blend reconstructs across those seams, absorbing any residual exposure or color mismatch the cut could not avoid. That handles hiding what is left. So the seam picks the boundary (this chapter) and Poisson hides the leftover step (the Poisson chapter): cut along the edges, then reconstruct across the seam. It is the tidiest example in the book of two EDGES-MATTER methods working as a single tool, and the explicit reason the two chapters cross-refer.

5.8.10 Where this connects — MRFs and beyond

A name to meet once, in passing. The graph-cut energy

$$ E(L) = \sum_p D_p(L_p) + \sum_{(p,q)} V_{pq}(L_p, L_q) $$

— a per-pixel data term plus a pairwise smoothness term — is a pairwise Markov random field (MRF): a prior that says "labels should agree with the evidence ($D_p$) and with their neighbours ($V_{pq}$)." Graph cut and belief propagation are two ways to do inference in that MRF, i.e. to find the most probable labeling. The same template governs stereo (label = disparity), denoising (label = clean intensity), and optical flow (label = motion). We name the connection so the reader meets "MRF" once and recognizes the data-plus-pairwise shape when it returns; the fuller Bayesian / MRF treatment lives with inverse problems and segmentation, cross-referred rather than developed here.

Sidebar — who was Markov?

Portrait of Andrey Markov Andrey Markov (1856–1922) studied chains of events in which the next state depends only on the current one, not on the whole past — the Markov property. Lift that from a chain to a grid and you get the Markov random field above: a pixel's label depends only on its neighbours, which is exactly the data-plus-pairwise energy that graph cut and belief propagation minimize. The same local-dependence assumption powers patch-based texture synthesis, stereo, and labeling. He first demonstrated his chains by tallying the runs of vowels and consonants in Pushkin's Eugene Onegin. Portrait: before 1922, public domain (via Wikimedia Commons).

The closing through-line is the lesson restated. Intelligent scissors, snakes, seam carving, graph cut, normalized cuts, video textures — these are one idea wearing different optimizers. Lay the image on a graph, choose what it costs to cut at each place, and let a generic least-cost path, min-cut, or eigenvector find the cut. Cut where it will not be seen; the energy is the design.


Big lessons of this chapter

The recurring principles from this chapter, gathered for review.

💡 Big lesson — image edits as discrete optimization on a graph

Many image edits are discrete optimization on a graph or grid — dynamic programming (a least-cost path), min-cut / max-flow (a globally optimal boundary), or a spectral relaxation (normalized cuts) — and the energy (affinity) you choose is the whole game. The optimizer is generic and off the shelf; what you penalize — gradient magnitude along a seam, label disagreement across an edge, the cost of severing two regions — is the entire design. Read this as the L4 affinity principle from the bilateral chapter turned inside out. There, the affinity between two pixels said "average these together." Here, the very same quantity, sitting on a graph edge, says "do not cut here." High affinity (similar neighbours) marks an expensive place to cut; low affinity (a genuine edge) marks a cheap one. Design that cost well and a generic min-cut returns the boundary a person would have drawn by hand. (Registered as new lesson L12 · image edits as discrete optimization on a graph (the energy is the design), first appearance here. Extends L4. Also relevant downstream: graph-cut photomontage composed with a Poisson blend; compositing / segmentation / matting; texture synthesis; stereo and Markov-random-field labeling.)