Core Components of CH

Query Correctness

Here we show that the query algorithm described in the previous section is correct on the G* graph produced from any node contraction order. Moreover, it will follow that shortest paths in G* are also shortest paths in the original graph G.

Note that although we already proved that a general bidirectional Dijkstra query is correct, we have to do a little more work to show that the CH modified query also holds.

Our Notation

First we should restate some of the notation that was introduced in earlier sections:

We have some node ordering O = {v₁, … , v_n} for every v in V.
We denote G₀ as the original graph G. And G_i is the subgraph induced by contracting node v_i in G_i-1.
G_i then does not contain v_i or its adjacent edges, but it does contain any shortcut edges added after v_i was contracted. So G_i contains n-i nodes.
We denote the shortest path between two nodes u, v in G_i as dist_i(u, v).
G* is the overlay graph containing all nodes and all edges in G, and all shortcut edges added in any {G₁, … , G_n-1}.

The Contraction of All Nodes Preserves Shortest Paths

We state Lemma 1, which tells us that the shortest path length between two nodes is the same before and after a given round of contraction.

Formally:

For all nodes s and t in G_i, dist_i(s, t) = dist_i-1(s, t) for i in [1, n].

If we have a shortest path P from s to t in G_i, either P contains a shortcut added after contracting v_i or it doesn’t. If it doesn’t, then P also must exist in G_i-1. If P does have a shortcut that was added in this round of contraction, consider that the shortcut edge is uw, and u and w are both neighbors of v_i. Then there is also some path <uv_iw> in G_i-1 with the same cost as uw, because that’s why we added the shortcut in the first place. And therefore dist_i-1(s, t) <= dist_i(s, t).

And similarly, consider now a shortest path P from s to t in G_i-1. Either P has edges that go through v_i (the node that will be contracted in the next round), or it doesn’t. If it doesn’t, then P is the same in G_i-1 as it is in G_i. If P does contain edges through v_i, then in G_i we’ll be adding a shortcut edge with the same weight. So dist_i(s, t) <= dist_i-1(s, t).

So dist_i(s, t) = dist_i-1(s, t).

This result is more or less obvious, but if you want a visual example to prove this lemma for yourself, you can reconsider our original contraction example:

1 / 3

2 / 3

3 / 3

❮ ❯

The Query is Correct

Now, we’ll use this first lemma to prove that, as we said in the previous section:

dist(s, t) = min{ dist(s, v) + dist(v, t)} over all v in L

where L is the set of all nodes settled in both searches on the G*_U graph from s and t.

We also can reiterate that when considering symmetric graphs (every edge goes both directions), a forward search on G*_U is identical to a backward search on G*_D. So wlog, we consider that our two searches from s and t run on G*_U, as we described in the previous section.

Now, we consider the shortest path P from s to t in G. We define vertex m as the node with the highest order (i.e. contracted later than others) on the path P.

We say that there is a prefix path in P that contains a set of nodes {v₀, v₁, … , v_k} such that v_i-1 < v_i, where the < symbol (as said before) indicates that v_i-1 was contracted before v_i. In other words, the prefix path contains the longest subsequence of nodes on the portion of the shortest path from s to m such that each node has a lower contraction order than the next.

We can use the following illustration to visualize m and the prefix path. We only consider the search starting from s for now, but this argument is completely analogous to the search starting from t:

query-proof

Because we defined our path P as existing in the original graph G, we also see that P can have edges between two nodes like v₁ and r and r and q, where v₁ > r > q. There is nothing from precluding this edge from belonging to P.

However, we can show that there must exist either shortcut edges or original edges in G* between all nodes in the prefix path. (The shortcut edges that would appear in G* are dotted green in the illustration above). And moreover, these shortcuts have weights that preserve the cost of edges along the actual shortest path.

For example, consider the graph G’ right before node v₁ is contracted. Then lower-ordered vertices like r and q and their incident edges aren’t in G’, since by definition, those vertices were already contracted in earlier rounds.

But by Lemma 1, we must have a path from v₁ to v₂ that preserves the cost of the shortest path between those two nodes. And if that edge is not an original edge in G, then it must have already been added as a shortcut and appear in G'.

And if we continue this argument further up the prefix path, we can conclude that all edges between nodes in the prefix path must exist in G*. If we tried to give a counter argument that there exists some intermediary node v’ such that v’ > v_i and v’ lies on the path between v_i and v_i+1, then by definition v’ itself would be on the prefix path.

So this proves that the shortest path distance from s to m in G*_U is equal to the shortest path distance from s to m in the original graph G. And like we said earlier, because our backward search on G*_D from t is equivalent to the forward search on G*_U from t, the same argument holds for the path from t to m.

And because m is on the shortest path from s to t in G, m must have been among the nodes settled by both searches during the bidirectional query on G*_U. So by taking the minimum sum of the two shortest path scores among all the nodes settled by both searches (the set L), we will find the sum of the two scores at node m. And whether or not the paths from s to m and t to m contain shortcut edges in G*, we know that dist(s, t) is the same in G* as it is in G, since by Lemma 1, adding shortcuts through contraction preserves shortest path lengths.

It then only follows to backtrack the parent edge pointers in G in both directions from m as shown in the previous section.

For more detailed proofs of query correctness see [9] and [2].

Node Ordering

So our query is correct for any contraction order, but this doesn’t necessarily mean that all node orderings lead to the same query performance. So we continue to understand how node orderings are determined.

[previous] [continue]