Inferning networks for graph partitioning
Abstract
Image analysis, pattern recognition, and computer vision pose challenging problems, especially now that billions of images are generated and uploaded every day. In many computer vision systems, segmentation is often the first step: it localizes the objects present in the image by partitioning the image into meaningful segments. More precisely,
image segmentation is the task of assigning discrete labels to pixels, where each label identifies the segment (or cluster) to which the pixel belongs. We express an image as a graphical model, also known
as an affinity graph, whose vertices denote pixels or regions and whose edges denote adjacency. Each
edge in the affinity graph has a weight that quantifies the similarity between the adjoining vertices.
To partition the graph, we select a threshold, discard all edges whose weights fall below it, and form segments as the resulting path-connected regions.
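As a concrete illustration of this step (a minimal sketch, not the implementation used in this work; the names segment_by_threshold, edges, and threshold are ours), thresholding followed by component extraction can be written with a union-find structure:

```python
def segment_by_threshold(n_vertices, edges, threshold):
    """Keep edges with weight >= threshold, then label each vertex by its
    connected component (illustrative sketch; names are hypothetical)."""
    parent = list(range(n_vertices))

    def find(u):
        # Path-halving find for the union-find structure.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Merge the endpoints of every edge that survives the threshold.
    for u, v, w in edges:
        if w >= threshold:
            parent[find(u)] = find(v)

    # Label each vertex by the root of its component.
    return [find(u) for u in range(n_vertices)]

# Example: a 4-vertex chain with one weak edge splits into two segments.
labels = segment_by_threshold(4, [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.8)], 0.5)
```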
We can define an energy function of a segmentation by summing the weights of all edges between different segments; this quantity is referred to as the correlation clustering energy.
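In symbols (notation introduced here for clarity, not taken from the main text): for an affinity graph G = (V, E) with edge weights w and a segmentation S that assigns a segment label S(u) to each vertex u, this energy is

```latex
E(S) \;=\; \sum_{\substack{(u,v)\,\in\, E \\ S(u)\,\neq\, S(v)}} w_{uv},
```

so only edges that cross a segment boundary contribute.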
Partitioning the graph to obtain a segmentation can be viewed as inference in a graphical model and formulated as the minimization of this energy over all possible segmentations, an NP-hard problem. We train a deep neural network to produce an affinity graph
whose segmentation minimizes a natural learning objective called the Rand error. For a graph with a ground-truth segmentation, the Rand error measures the pairwise misclassification error between a predicted segmentation and the ground truth over all possible vertex pairs.
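Written out under one common normalization (our assumption; the abstract does not fix one), the Rand error of a prediction S against a ground truth T on n vertices is the fraction of vertex pairs whose same-segment status the two segmentations decide differently:

```latex
\mathrm{RandError}(S, T) \;=\; \binom{n}{2}^{-1} \sum_{u < v}
\mathbf{1}\big[\, \mathbf{1}[S(u) = S(v)] \;\neq\; \mathbf{1}[T(u) = T(v)] \,\big].
```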
We describe an efficient mini-batch learning algorithm based on Kruskal’s algorithm and discuss two formulations of the loss function and two graph encoding schemes used to train the network.
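The abstract leaves the algorithm’s details to the main text; the sketch below shows only the generic Kruskal-style skeleton such a method can build on, processing edges in order of decreasing affinity and merging components with union by size. The hook pair_loss is hypothetical, standing in for whatever per-merge bookkeeping the actual loss requires:

```python
def kruskal_sweep(n_vertices, edges, pair_loss):
    """Process edges by decreasing affinity, merging components as in
    Kruskal's algorithm; pair_loss is a hypothetical per-merge hook."""
    parent = list(range(n_vertices))
    size = [1] * n_vertices
    total = 0.0

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            # Merging two components decides the same-segment status of
            # size[ru] * size[rv] vertex pairs at once, which is what makes
            # pairwise (Rand-style) bookkeeping affordable.
            total += pair_loss(w, size[ru] * size[rv])
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    return total
```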
We introduce a novel idea: during training, we select the optimal threshold, namely the one that minimizes the correlation clustering energy over the restricted set of segmentations induced by different thresholds.
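Under the definitions above, this inner inference step reduces to a one-dimensional search over candidate thresholds; a deliberately naive sketch (reusing the hypothetical segment_by_threshold from above) is:

```python
def best_threshold(n_vertices, edges):
    """Among segmentations induced by thresholding at each distinct edge
    weight, return the threshold whose segmentation has the lowest
    correlation clustering energy (illustrative only)."""
    def energy(threshold):
        labels = segment_by_threshold(n_vertices, edges, threshold)
        # Energy: total weight of edges whose endpoints land in
        # different segments.
        return sum(w for u, v, w in edges if labels[u] != labels[v])

    # Distinct edge weights enumerate all distinct thresholded segmentations;
    # appending a value above the maximum would add the all-singletons case.
    candidates = sorted({w for _, _, w in edges})
    return min(candidates, key=energy)
```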
We present experiments on a synthetic dataset showing that adding this extra inference step to training causes the network to learn different affinities, leading to a 5% reduction in the Rand error on a validation set of similar synthetic images.