Inferning networks for graph partitioning
Abstract
Image analysis, pattern recognition, and computer vision pose challenging problems, especially at a time when billions of images are generated and uploaded every day. In many computer vision systems, segmentation is often the first step: it localizes the objects present in an image by partitioning the image into meaningful segments. More precisely, image segmentation is the task of assigning discrete labels to pixels, where each label represents the segment (or cluster) to which a pixel belongs. We express an image as a graphical model, known as an affinity graph, whose vertices denote pixels or regions and whose edges denote adjacency. Each edge in the affinity graph carries a weight that quantifies the similarity between its endpoints. To partition the graph, we select a threshold, discard all edges with weights below the threshold, and form segments as the resulting path-connected regions. We can define an energy function of a segmentation by summing the weights of all edges between different segments; this is referred to as the correlation clustering energy. Partitioning the graph to obtain a segmentation can be viewed as inference in a graphical model and formulated as the minimization of this energy function over all possible segmentations, an NP-hard problem. We train a deep neural network to produce an affinity graph whose segmentation minimizes a natural learning objective called the Rand error. For a graph with a ground-truth segmentation, the Rand error measures the pairwise misclassification rate between a predicted segmentation and the ground truth over all vertex pairs. We describe an efficient mini-batch learning algorithm based on Kruskal's algorithm and discuss two formulations of the loss function and two graph encoding schemes used to train the network.
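The core quantities above can be illustrated on a toy affinity graph. The following is a minimal sketch, not the thesis implementation: all names, the union-find routine, and the example weights are illustrative. It thresholds the graph, forms segments as path-connected regions, and evaluates the correlation clustering energy and the Rand error against a ground-truth segmentation.

```python
from itertools import combinations


def segment_by_threshold(n, edges, theta):
    """Discard edges with weight below theta, then form segments as
    connected components of the remaining graph (via union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= theta:
            parent[find(u)] = find(v)
    return [find(i) for i in range(n)]


def correlation_clustering_energy(edges, labels):
    """Sum of weights of edges whose endpoints lie in different segments."""
    return sum(w for u, v, w in edges if labels[u] != labels[v])


def rand_error(n, pred, truth):
    """Fraction of vertex pairs on which the predicted and ground-truth
    segmentations disagree about being in the same segment."""
    disagree = sum(
        (pred[u] == pred[v]) != (truth[u] == truth[v])
        for u, v in combinations(range(n), 2)
    )
    return disagree / (n * (n - 1) / 2)


# Toy graph: two clusters {0,1,2} and {3,4} joined by one weak edge.
edges = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.7), (3, 4, 0.9), (2, 3, 0.2)]
labels = segment_by_threshold(5, edges, theta=0.5)
truth = [0, 0, 0, 1, 1]
print(correlation_clustering_energy(edges, labels))  # 0.2 (only the weak edge is cut)
print(rand_error(5, labels, truth))                  # 0.0 (partitions agree)
```

Here only the weak cross-cluster edge is cut, so the energy is its weight, and the recovered partition matches the ground truth exactly.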
We present a novel concept: during training, we select the optimal threshold, the one that minimizes the correlation clustering energy over the restricted set of segmentations induced by different thresholds. Experiments on a synthetic dataset show that adding this extra inference step to the training of the neural net causes it to learn different affinities, yielding a 5% reduction in the Rand error on a validation set of similar synthetic images.
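The extra inference step can be sketched as a search over candidate thresholds. This is an illustrative sketch, not the thesis code; in particular, shifting each affinity by 0.5 to obtain signed weights is an assumption made here so that the energy minimization over thresholds is nontrivial (with purely positive weights, merging everything would trivially give zero energy).

```python
def segment_by_threshold(n, edges, theta):
    """Segments = connected components after discarding edges below theta."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= theta:
            parent[find(u)] = find(v)
    return [find(i) for i in range(n)]


def energy(edges, labels):
    # Assumed signed weight (affinity - 0.5): cutting weak edges
    # lowers the energy, cutting strong edges raises it.
    return sum(w - 0.5 for u, v, w in edges if labels[u] != labels[v])


def best_threshold(n, edges, candidates):
    """Pick the candidate threshold whose induced segmentation has the
    lowest correlation clustering energy."""
    return min(
        candidates,
        key=lambda t: energy(edges, segment_by_threshold(n, edges, t)),
    )


edges = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.9), (2, 3, 0.2)]
theta = best_threshold(5, edges, candidates=[0.1, 0.5, 0.95])
print(theta)  # 0.5: cuts only the weak edge, giving the lowest energy
```

During training, the segmentation produced at this selected threshold would then be the one scored against the ground truth, so the gradient signal reflects the segmentation the inference step actually chooses.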