Graph isomorphism and Babai’s proof

How do we know two objects are the same?

This is a loaded question, as there is ambiguity about what “the same” precisely means. In the real world, we rely on subtle cues to determine if objects are different. Even an untrained eye can view the Mona Lisa’s in the featured image of this blog and determine the one on the right is a forgery. In mathematics, where rigor is all-important, the gold standard of sameness is the notion of isomorphism.  Isomorphisms are defined depending on the kinds of objects we are discussing. In a slogan: isomorphic objects share all identical properties.

There are several distinct kinds of isomorphisms; the definition used depends on the mathematical objects we consider. There are group isomorphisms, topological isomorphisms (or homeomorphisms), graph isomorphisms and so on. We focus here on the latter kind.

Graphs G and H are isomorphic if there is a function between their vertex sets that is 1) bijective (that is, one-to-one and onto; here is a definition) and 2) maps edges to edges, and non-edges to non-edges. Item 1) ensures that the graphs have the same number of vertices. For example, a graph of with 10 vertices cannot be isomorphic to one of with 11 vertices. Item 2) implies that G and H have identical structure, up to a permutation of vertices. So an 8-cycle with a single chord creating a triangle is not isomorphic to one with a chord creating a 4-cycle.

One way to view the definition (that is less precise but more revealing) is that we can redraw G to look like H, as depicted in the following figure. All three graphs here are isomorphic, with the vertex labellings depicting the bijection.

Image result for isomorphic graphs
Isomorphic graphs.

The graphs in the figure are very small with only four vertices. Imagine the difficulty in trying to determine by hand if graphs with hundreds of vertices are isomorphic. Or millions or billions. Such large-scale graphs cannot be redrawn by hand, and we must resort to computer support to test their properties.

Algorithms and their speed

Is there an efficient algorithm to detect if G and H are isomorphic?

The answer would be either YES or NO, and we require the algorithm to terminate (that is, stop and output an answer in finite time) with one of these two answers. One approach is list all the functions between the graphs, and check if properties 1) and 2) hold.That is: use pure brute force. If you are given a specific function, then there are fast ways to check the isomorphism properties 1) and 2). However, listing all such functions and testing each of them is not efficient. The bottleneck, of course, comes from the fact that there are so many functions to consider. Precisely, there are |V(H)||V(G)| many such functions, which is exponential in the order of H.

In mathematics and theoretical computer science, we are interested in the worst case time complexity of algorithms. That is, for all the possible kind of inputs for the algorithm, we must determine the speed (or running time, or complexity) of the algorithm in the worst case, roaming over all possible choices of G and H. That may sound heady, but it is a rigorous way of quantifying the relative speed of algorithms. You can argue that worst case should be average case or something else entirely. But the worst case complexity is important because the example you are considering for your research problem may be one of the “bad ones” that takes a very long time for the algorithm to terminate. Running times of algorithms can vary between a fraction of a second to longer than the expected lifetime of the universe, so it’s important to know the complexity of a given algorithm.

One measure of whether an algorithm is fast is whether it is of polynomial time complexity (or simply polynomial): if the input consists of n bits, where n is a positive integer, then the algorithm should run in no longer than nk steps, for some positive integer k. A time-step could be a second, millisecond, or something shorter or longer.

A graph decision problem answers YES or NO to a question about graphs, such as “Given G and H as input, is there an isomorphism between G and H?” The input would be data structures representing G and H, such as their adjacency lists. If we can find a polynomial algorithm for such a problem, then we say the problem itself is decidable in polynomial time.

Polynomial or quasipolynomial?

One might guess that the graph isomorphism decision problem is not polynomial time given the bottleneck of the number of functions, but no one has proven that. Proving this assertion is a tall order and remains open: one would need to show that there is no polynomial time algorithm for graph isomorphism, among all the many different possible algorithms. A universal principle in mathematics is that showing something doesn’t exist is usually much harder than simply giving an example.

There are infinitely complexity layers beyond polynomial time, of course. One such layer is quasi-polynomial, where algorithms run in time 2(log n)c, where c > 1. (I am fudging by omitting big ohs terms for simplicity. Further, the base 2 of the exponential can be any fixed real number; I prefer to use it for simplicity.) Note if c = 1, then we would have what is called a linear time algorithm.

To get some sense the relative speed of a quasi-polynomial logarithm, think of the log in the exponent as just counting digits of n (which is essentially what happens in base 10). If n were a billion and c = 1.1, then we would get a running time of 291.1, which is about 2,372. If the time-steps used are measured seconds, then that’s under 40 minutes for the algorithm to terminate. If we had no log in the exponent, then the algorithm would run in exponential time of more than 2 to the power a billion. For context, there are about 280 electrons in the universe, and an estimate on the life expectancy of the universe is about 258 seconds or 9.6 billion years. Exponential algorithms are hopelessly inefficient (possibly until the time when a working quantum computer is engineered).

Quasipolynomial!

Laszlo Babai (born in 1950 in Budapest, now at the University of Chicago) shocked the mathematical world when he claimed that the running time of the graph isomorphism problem is quasi-polynomial time. If true, then this would be a major advance. To put the claim in context, the best-known worst case bound was by Eugene Luks in 1983, and that was the exponential bound 2n1.2.

Image result for laszlo babai
Laszlo Babai.

Babai announced his discovery on his website, while posting his paper in December on arXiv; the proof runs about eighty pages. The proof is quite deep, relying on a subtle grasp of graph theory, algorithmic complexity, and group theory. Even the classification of finite simple groups is used in his proof (and this is widely considered the deepest proven result in mathematics).

20160608_174909.jpg
Babai near the end of his three hours of lectures at the SIAMDM’16 meeting in Atlanta this June.

I saw Babai give lectures in June 2016 in Atlanta at the SIAM Discrete Mathematics meeting on his proof. The first lecture was standing room only. The technical details were formidable, and Babai dived right in; this was not a lecture for the layperson. He dove in literally, as he fell off the stage at one point and caught himself! No harm was done and he continued the lecture without missing a beat. The audience was slightly sparser by the third hour as you can see in the picture above.

The plot thickens

As can happen with proofs of this magnitude, a mistake was discovered. The highly regarded mathematician Harald Helfgott (University of Göttingen and CNRS) spent months studying the paper and found the error. A revised version of the proof gives a subexponential but not quasipolynomial bound. That would be a major feat even if the full claim of quasipolynomiality remained open.

Image result for Harald Helfgott
Harald Helfgott.

However, on January 7, 2017, Babai announced he could correct his first proof. He now claims the algorithm is quasipolynomial as originally asserted. In his own words:

On January 7 I discovered a replacement for the recursive call in the “Split-or-Johnson” routine that had caused the problem. With this modification, I claim that the Graph Isomorphism test runs in quasipolynomial time (now really).

The replacement consists of a few lines of pseudocode, analyzed via a simple new lemma on the structure of coherent configurations.

I am working on an updated arXiv posting.”

No doubt, Helfgott and others will tackle the revised proof when it appears. It will require careful refereeing, and we should remain skeptical until experts are in agreement on its validity.

I hope his proof is correct. If so, then it would be a historic achievement and a fitting apex of Babai’s already stellar career.

Anthony Bonato

One thought on “Graph isomorphism and Babai’s proof

  1. For a weaker but still quasipolynomial version of the algorithm, which does not rely on the Classification theorem of Finite Simple Groups, see my arxiv preprint ” A CFSG-free analysis of Babai’s quasipolynomial GI-algorithm ( see also the STOC extended abstract of Babai).

    Like

Leave a comment