Pywren, the cloud, and certificates of optimality
Posted with : Clustering, Data science, Convex optimization
In theory, theory and practice are the same, but in practice, they’re not. This phrase is probably a cliché but I think it captures part of the spirit of Ben Recht’s “quixotic quest for super-linear algorithms” talk on Tuesday at Simons.
His point (if I got it right) is that in ‘big-data’ problems (his example, solving ) one can chase the fastest algorithm (in this case conjugate gradient methods) but in practice if your big data doesn’t fit in memory, then it’ll be real slow. Even Gaussian elimination can be faster when memory is dealt properly.
And (luckily) there is a magic place where memory is infinite and all your dreams come true (at a price). It’s called the cloud, and this is how he depicted it:
But the cloud has entry barriers, and in order to make it simpler for users (data scientists?) they introduced Pywren. Pywren is a framework that allows you to run your python code in the cloud in a seemingly transparent way. I haven’t tried it yet but I think it’s a great idea, so much better than running SDPs overnight in Rachel’s laptop.
During Ben’s talk someone asked how can you know whether the cloud is giving you what you paid for (in terms of memory and processing power). That’s tricky, I guess you probably don’t. I suspect a potential business model once your code runs transparently in the cloud is to get charged by task and not by time. The user doesn’t care if the cloud uses Gauss-Seidel or conjugate gradient as long as it gives him a correct (or approximately correct) solution.
And the next question is, how do you know that the cloud gave you a decent solution for your problem. In the problem it’s easy, but what if you gave it another problem? Say, one of my favorite problems: -means clustering. We have a convex vs non-convex issue here: convex algorithms are slow but provide certificates of approximate optimality, non-convex algorithms are fast, but given a solution, it is difficult to detect its optimality.
One idea, by Afonso Bandeira, that combines the best of both worlds, are Probably Certifiably Correct algorithms. The idea leverages the following components:
- A fast non-convex solver that produces the optimal solution with high probability (under some probability distribution of problem instances).
- A convex relaxation that is tight with high probability (under the same distribution).
- A fast method of computing a certificate of global optimality for the output of the non-convex solver in (1) by exploiting convex duality with the relaxation in (2).
PCC algorithms exists for minimum bisection, synchronization and k-means clustering. They rely on the existence of a tight convex relaxation and (for now) the existence of a mathematician coming up with a dual certificate.
What happens if instead of having a tight convex relaxation you have a nearly tight one? Like the semidefinite program for -means clustering from my previous post. Is it possible to exploit the nearly optimal solution from the fast non-convex solver to efficiently ( linear time) compute a nearly optimal solution for the dual (and therefore a certificate of approximate optimality)? I don’t know, I think it depends on the problem but I don’t know any successful example.
In my recent work with Dustin Mixon, we developed a Monte Carlo algorithm that uses sketching ideas to provide a high-confidence, sub-linear-time approximation certificate for -means clustering. See Dustin’s blog for a summary of the main idea of the paper. I am interested in investigating if the ideas can be used to certify approximate optimality of solutions of other ‘big-data’ problems found with fast algorithms that work in practice but lack theoretical guarantees.