09 October 2014

Nordic championships in programming 2014

The Nordic championships in programming (NCPC) and UK/Ireland championships (UKIEPC) 2014 took place last Saturday. For the second time I was the head of the jury together with Michał Pilipczuk from the University of Bergen (now at Warsaw). This was one of my most memorable contests as a jury member, so I wanted to share some things that happened behind the scenes.

First some interesting facts about the contest.
  • We had 257 teams in NCPC and 68 teams in UKIEPC. Most teams had 3 people, so we had almost 1000 people participating in 7 different countries!
  • The jury could solve the whole contest (11 problems) in 463 lines of code. Something to brag about: 4 of the shortest solutions were mine.
  • The winning team, Omogen Heap, solved all 11 problems 80 minutes before the end, when the second-placed team had only 9 problems solved. These are the guys I have been coaching since high school; I have mentioned them in one of my previous posts.
  • The hardest problem Basin City Surveillance had 33 solutions from the judges: 11 correct, 9 wrong and 13 slow ones. There were 76 test cases, most of them hand-crafted.
Now let me talk more about the hardest problem.

Basin City Surveillance

The problem: check if there is an independent set of size k in a bounded degree graph, where each vertex has at most 4 neighbors. See problem B here for the full problem statement. Initially, the upper limit on k was 11, which allowed 5^k algorithms to pass.

On the last Sunday before the contest, I sat down to write a random solution that seemed too simple to pass: it adds a random vertex into the independent set and erases its neighbors, and repeats this until there are no vertices left. If it fails to find k independent vertices, it restarts from the beginning with an empty set.
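
A minimal sketch of that idea in Python (assuming the graph has already been read into an adjacency list adj; the names and the restart limit here are just for illustration):

import random

def try_once(adj, k):
    # One pass: pick a random remaining vertex, add it to the set,
    # erase it together with its neighbors, repeat until nothing is left.
    alive = set(range(len(adj)))
    chosen = []
    while alive:
        v = random.choice(tuple(alive))
        chosen.append(v)
        alive -= {v, *adj[v]}
    return len(chosen) >= k

def exists_independent_set(adj, k, restarts=300):
    # Restart from scratch with an empty set until one pass succeeds.
    return any(try_once(adj, k) for _ in range(restarts))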

I was expecting such an algorithm to require too many restarts, but only 100 restarts were enough to solve all the test cases we had at that point. I constructed some test cases that required 300 restarts, but my solution was still the fastest by far.

At about the same time, Per Austrin sent an e-mail suggesting that we raise the upper limit on k to 15, because he had written a 3^k solution and thought the whole contest was too easy otherwise. Interesting! Per's and my solutions were intriguing enough that it took me a long time to fall asleep on Sunday.

Michał was against raising the limits at first, but at Monday lunch we decided to go for it to make the contest more interesting, even though there were only a few days left. Over the next couple of days we were all busy writing all kinds of solutions and test cases killing those solutions; that's why we ended up with 76 test cases and 33 solutions. In the meantime, we also managed to prove that my random solution needs on average at most 3^k restarts.

Why does the random solution need only 3^k restarts?

First we need to prove a lemma that is important when you want to speed up the 5^k algorithm (branch on which of a vertex and its at most 4 neighbors goes into the set) to 3^k. For a vertex u we have two options: either it is in some maximum independent set or it isn't. If it isn't, at least two of its neighbors are. One can prove this by contradiction. If only one neighbor v is in an optimal set, then we can substitute u for v and get an optimal solution with u in it – a contradiction. And if none of u's neighbors are in an optimal set and neither is u, then this can't be an optimal set: we could add u to it to make it larger. Again a contradiction.

Using this lemma one can write a 3^k backtracking algorithm that either picks a particular vertex or two of its neighbors. With a little more insight this can be sped up to 2.31^k.
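
A sketch of that branching, under the same adjacency-list assumption as before (one way to see the 3^k bound: a vertex has at most C(4,2) = 6 pairs of neighbors, which gives the recurrence T(k) ≤ T(k-1) + 6·T(k-2), and 3^k satisfies it):

from itertools import combinations

def has_independent_set(adj, alive, k):
    # Does the graph restricted to the vertices in `alive` contain
    # an independent set of size k?
    if k <= 0:
        return True
    if not alive:
        return False
    u = next(iter(alive))
    nbrs = [v for v in adj[u] if v in alive]

    def remove_closed(vertices):
        # Take `vertices` into the set: delete them and all their neighbors.
        gone = set(vertices)
        for v in vertices:
            gone |= set(adj[v])
        return alive - gone

    # Case 1: u is in some maximum independent set.
    if has_independent_set(adj, remove_closed([u]), k - 1):
        return True
    # Case 2: at least two of u's neighbors are; try every non-adjacent pair.
    for a, b in combinations(nbrs, 2):
        if b not in adj[a] and has_independent_set(adj, remove_closed([a, b]), k - 2):
            return True
    return False

It would be called as has_independent_set(adj, set(range(n)), k).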

By the way, this has been the first and probably the last time I have written a program with complexity 2.31^k. The slides contain the recurrence that leads to such a complexity. If you want to be able to solve such recurrences, Concrete Mathematics is recommended reading.

With some more work, the lemma above also implies that my random algorithm has a 1/3^k chance of succeeding. The following proof is by Per Austrin. We call a vertex good if it is in some maximum independent set and bad otherwise. Let G be the fraction of good and B the fraction of bad vertices, so we have G + B = 1. The claim is that at least a third of the vertices are good: G ≥ 1/3.

Let us count the number of edges L that connect good vertices with bad ones. By the lemma, a bad vertex has at least two good neighbors, so L ≥ 2Bn (n is the total number of vertices). We have a bounded degree graph, so each good vertex has at most 4 bad neighbors, which gives L ≤ 4Gn. Combining the two, 4Gn ≥ 2Bn, so 2G ≥ B. Since G + B = 1, we get 3G ≥ 1, that is, G ≥ 1/3.

At each step my algorithm has at least a 1/3 chance of picking a good vertex, so in total it has at least a 1/3^k chance of succeeding. On average, at most 3^k restarts are therefore needed when an independent set of size k exists.
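
Not part of the proof, but a small brute-force sanity check of the G ≥ 1/3 claim is easy to write (everything below, including the random graph generator, is just for illustration):

import itertools, random

def good_vertices(adj):
    # Brute force: a vertex is good if it lies in some maximum independent set.
    n = len(adj)
    best, good = 0, set()
    for mask in range(1 << n):
        verts = [v for v in range(n) if mask >> v & 1]
        if any(b in adj[a] for a, b in itertools.combinations(verts, 2)):
            continue  # not an independent set
        if len(verts) > best:
            best, good = len(verts), set(verts)
        elif len(verts) == best:
            good.update(verts)
    return good

def random_degree4_graph(n, attempts):
    adj = [set() for _ in range(n)]
    for _ in range(attempts):
        a, b = random.randrange(n), random.randrange(n)
        if a != b and len(adj[a]) < 4 and len(adj[b]) < 4:
            adj[a].add(b)
            adj[b].add(a)
    return adj

for _ in range(100):
    n = random.randint(1, 10)
    adj = random_degree4_graph(n, 3 * n)
    assert 3 * len(good_vertices(adj)) >= n  # at least a third of the vertices are good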

Omogen Heap's solution is also 3^k

Omogen Heap first submitted a slow 2^n solution with a very small optimisation. In the next submission, they randomised their choices and added a cutoff: after 50 million iterations, exit and print "impossible". They got accepted, solving everything 80 minutes before the end.
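
Just to show the shape of that trick, here is the same cutoff bolted onto the random greedy from the first sketch (this shows only the pattern, not their actual backtracking code):

import random

BUDGET = 50_000_000  # give up after this many iterations, like their submission

def search_with_cutoff(adj, k):
    iterations = 0
    while iterations < BUDGET:
        alive, picked = set(range(len(adj))), 0
        while alive and picked < k:
            iterations += 1
            v = random.choice(tuple(alive))
            picked += 1
            alive -= {v, *adj[v]}
        if picked >= k:
            return True   # found an independent set of size k
    return False          # corresponds to printing "impossible"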

At first I thought the solution was too slow and our test data too weak, because I had overlooked the randomisation part. I told them that they got lucky. But with randomisation, such a solution actually has the same expected running time as my random algorithm, since the chance of stumbling upon a solution is high enough: at least 1/3^k per attempt.