Topological data analysis and HodgeRank

This week I’ve been attending this conference on Topological Data Analysis. On Thursday both Lek-Heng Lim and Sayan Mukherjee talked about HodgeRank (their slides are available on Sayan’s website). I think it’s a really neat application of cohomology to a simple problem that illustrates what information you can get from cohomology groups. So I’m writing about it in my first post.

Let's say we have $m$ people and $n$ movies. Each person watches a bunch of movies and gives pairwise comparisons of the movies. The objective is to find a global ranking of the movies by aggregating the people's preference. Of course inconsistencies will occur most likely but cohomology will give a tool to measure them.

Let $Y_{ij}^\alpha$ be how much the person $\alpha$ liked the movie $i$ over the movie $j$ (and say 0 if the person does not compare the movies $i$ and $j$). If people are not completely insane we can expect $Y_{ij}^\alpha=-Y_{ji}^\alpha$. Let's aggregate all the information we have in a graph $G=(V,E) $ where $V=\{1,\ldots, n\} $ and $E=\{ (i,j): \exists \alpha Y_{ij}^\alpha\neq 0 \} $. One possible way of aggregating the information is $\begin{equation} \bar Y_{ij}=\frac{\sum_\alpha Y_{ij}^{\alpha} }{ \# \{\alpha: Y_{ij}^\alpha \ne 0\}}\end{equation} $. Note that $\bar Y:V\times V \to \mathbb{R}$ is a skew symmetric function.

A global ranking of movies is a score function $s:V\to \mathbb R$, where $s_i$ is the score of the $i$-th movie, and $s_j-s_i$ is how much better is the $j$-th movie movie with respect to the $i$-th movie. A pairwise comparison that comes from a global ranking is a function $X:V\times V \to \mathbb R $ such that there exists $s:V\to \mathbb R $ with $ X_{ij}=s_j-s_i$. If we define the gradient of a function $s$ as $(\operatorname{grad}\,s): V\times V \to \mathbb R$ such that $(\operatorname{grad}\,s)(i,j)= s_j-s_i $ then we have $$\{\text{globally consistent rankings}\}=\operatorname{im}(\operatorname{grad})$$

For $X:V\times V \to \mathbb R$ skew symmetric we can define $\operatorname{curl}\, X: V\times V\times V\to \mathbb R $ as $$\operatorname{curl}\, X (i,j,k) =\left\{ \begin{matrix} X_{ij}+X_{jk}+X_{ki} & \text{ if } (i,j),(j,k),(k,i)\in E \\ 0 & \text{otherwise} \end{matrix} \right..$$ Note that $X\in \operatorname{ker}(\operatorname{curl})$ if and only if triangular relations in $X$ are consistent. In particular: $$ \{\text{globally consistent rankings}\}=\operatorname{im}(\operatorname{grad})\subseteq \operatorname{ker}(\operatorname{curl}) = \{\text{triangularly consistent rankings}\}$$ $$\operatorname{curl} \circ \operatorname{grad}=0 $$

Now consider $K_G$ the clique simplicial complex generated by the graph $G$. It sounds technical but it's just considering the 0-simplex as the set of vertices of $G$, the 1-simplex as the set of edges, the 2-simplex as the set of triangles in $G$ and so on.

Define a $k$-dimensional cochain as a real-valued function on $(k +1)$-tuples of vertices that is alternating on each of the k-simplex and 0 otherwise. Let $C^k(K_G,\mathbb R)$ the set of linear combinations of $k$-dimensional cochains with coefficients in $\mathbb R$.

Define the $k$-th coboundary operator $\delta_k : C^k (K_G, R) \to C^{k+1}(K_G, R) $ as the linear map that takes a $k$-cochain $f \in C^k$ to a $(k + 1)$-cochain $\delta_k f \in C^{k+1}$ defined by </p> $$ (\delta_k f) (i_0,\ldots, i_{k+1}) := \sum_{j=0}^{k+1}(-1)^j f(i_0,\ldots, i_{j-1},i_{j+1},\ldots, i_{k+1}) $$

Note that by definition $ \operatorname{grad} =\delta_0$ and $\operatorname{curl} =\delta_1$. Then we have:

$$ C^0(K_G,\mathbb R) \stackrel{\operatorname{grad}}{\longrightarrow} C^1(K_G,\mathbb R) \stackrel{\operatorname{curl}}{\longrightarrow} C^2(K_G,\mathbb R) $$ $$C^0(K_G,\mathbb R) \stackrel{\operatorname{grad}^* =-\operatorname{div}}{\longleftarrow} C^1(K_G,\mathbb R) \stackrel{\operatorname{curl}^* }{\longleftarrow} C^2(K_G,\mathbb R) $$

Define the combinatorial Laplacian operators as: $$\Delta_k: C^k(K_G,\mathbb R) \to C^k(K_G,\mathbb R) $$ $$ \Delta_0= \delta_0^* \delta_0 $$ $$ \Delta_k= \delta_k^* \delta_k + \delta_{k-1} \delta_{k-1}^* $$ Then, the Hodge decomposition theorem says: $$ C^k(K_G,\mathbb R) = \operatorname{im}(\delta_{k-1}) \oplus \operatorname{ker}(\Delta_k) \oplus \operatorname{im}(\delta_k^* ) $$ $$ \operatorname{ker}(\Delta_k) = \operatorname{ker}(\delta_k)\cap \operatorname{ker}(\delta_{k-1}) $$ Therefore $ C^1(K_G,\mathbb R) = \operatorname{im}(\operatorname{grad}) \oplus \operatorname{ker}(\Delta_1) \oplus \operatorname{im}(\operatorname{curl}^* ) $. Now remember that $\operatorname{im}(\operatorname{grad})$ was the set of globally consistent rankings, and $\operatorname{im}(\operatorname{curl}^* ) = \operatorname{ker}(\operatorname{curl})^\top$ is the set of triangularly inconsistent rankings. Therefore, what the cohomology group $ \operatorname{ker}(\Delta_1) $ is accounting for is the triangularly consistent rankings that are not globally consistent. So, the cohomology group is measuring the obstruction our space has for local solutions to be global solutions. (Exact same thing as in vector calculus with the de Rham cohomology). For example, if $G$ was the complete graph, then all triangularly consistent rankings are globally consistent, or, in other words, the cohomology group $\operatorname{ker}(\Delta_1)$ is trivial.