Acta Automatica Sinica, 2017, Vol. 43, Issue (10): 1829-1840
A Super-pixel-based Tumors Automata Interactive Segmentation Algorithm
Sixian Chan1, Xiaolong Zhou1, Zhuo Zhang1, Shengyong Chen1,2
1. Zhejiang University of Technology, Hangzhou 310014, China;
2. Tianjin University of Technology, Tianjin 300384, China
Abstract: Interactive segmentation is valuable for selecting objects of interest in an image. It occupies an important position in image processing, has a wide range of applications, and remains an active research topic. However, performing interactive segmentation pixel by pixel is usually time consuming. This paper proposes a new segmentation method: within the growcut framework, a super-pixel-based tumors automata (TA) segmentation is presented. Super-pixels provide strong boundary cues to guide the segmentation and can easily be obtained by an over-segmentation algorithm. The TA follows a principle similar to that of cellular automata (CA); given a small number of user-tagged object super-pixels, the TA completes the segmentation task and runs faster than the growcut algorithm. In addition, level set and multi-layer TA methods are applied to refine the boundary for the best result. Experiments on the VOC challenge segmentation dataset show that the proposed algorithm is efficient and accurate and can handle multi-object segmentation.
Keywords: growcut algorithm     interactive segmentation     super-pixel     tumors automata (TA)
Interactive Multi-label Image Segmentation With Multi-layer Tumors Automata
Sixian Chan1, Xiaolong Zhou1, Zhuo Zhang1, Shengyong Chen1,2     
1. College of Computer Science, Zhejiang University of Technology, Hangzhou 310014, China;
2. College of Computer Science, Tianjin University of Technology, Tianjin 300384, China
Abstract: Interactive segmentation is useful for selecting objects of interest in an image and continues to be a popular research topic. It plays an increasingly important role in image processing and has a wide range of applications. However, performing interactive segmentation pixel by pixel is normally time consuming. This paper presents a new method to improve segmentation efficiency. The proposed method improves the growcut algorithm by utilizing super-pixel-level tumors automata (TA), since super-pixels supply powerful boundary clues to guide segmentation and can be gathered easily by an over-segmentation algorithm. The TA has a similar principle to cellular automata. Given a small number of user-tagged super-pixels, the rest of the image can be automatically segmented by TA, and thanks to the iterative strategy the processing is faster than growcut. To obtain the best result, both level set and multi-layer TA approaches are applied. Experiments carried out on the VOC challenge segmentation dataset show that the proposed method achieves state-of-the-art performance.
Key words: growcut     interactive segmentation     super-pixel     tumors automata (TA)
1 Introduction

The image segmentation task is to divide an image into regions of interest that are suitable for machine or human operations [1], [2], such as image retrieval and recognition. Recently, the accuracy of fully automatic segmentation techniques [3], [4] has improved substantially. Nevertheless, current state-of-the-art algorithms still cannot satisfy the accuracy requirements of professional image editors for delineating target boundaries. Many interactive algorithms based on graph theory have therefore been proposed to improve the accuracy, including grabcut [5], graph-cut [6]-[8], random walks [9], regioncut [10] and growcut [11].

Graph-cut [6] is a combinatorial optimization strategy for addressing object segmentation in an image. An image is treated as a graph in which each pixel is a node, and the globally optimal pixel labelling for the two-label case (i.e., object and background) can be efficiently computed using max-flow/min-cut algorithms. Grabcut [5] improves graph-cut by incorporating an iterative segmentation mechanism: a first approximation of the final foreground/background labelling is obtained when the user draws a rectangular box around the target of interest. Random walker (RW) [9] takes a few user-labelled pixels as seeds and analytically computes, for each unlabelled pixel, the probability that a random walker starting there first reaches one of the pre-labelled pixels. Object segmentation is obtained by assigning each pixel to the label with the greatest probability. RW can also be used to edit images with weak structure, color, and appearance features, but this kind of energy-minimizing approach is not easy to control. Regioncut [10] combines the robustness of region information with the precision of gradient-oriented segmentation approaches; furthermore, the distributed seeds are initialized by region probabilities, so the method can reach convergence without user-initialized seeds. Under the framework of cellular automata (CA) [12], an interactive segmentation method named growcut [11] was proposed. This algorithm has two major properties: it can handle multi-label segmentation, and it can be extended to high-dimensional images.

In computer vision, interactive object segmentation plays a significant role in photo analysis and image editing. Through interactions such as scribbles [1], [2] or bounding boxes [13] around the object of interest, which serve as seeds, users can directly steer the segmentation algorithm towards a desired output. Recently, researchers have presented many powerful approaches for interactive image segmentation. In this paper, we focus on the literature of interactive segmentation performed with super-pixels.

In regioncut [10], a Gaussian mixture model (GMM) is learned to combine the robustness of region information with the precision of a gradient-oriented segmentation method. The GMM is applied to pre-initialize the region probabilities, which plays a role similar to distributed seeds, while the final segmentation is still obtained from a pixel-based graph; in addition, the results show that using only the final segmentation mask is ineffective. The method in [14] adopts the framework of [15] to obtain super-pixels on each frame independently; forward and backward optical flow information is then used to build a spatio-temporal super-pixel graph. The major contribution is to focus on occlusion boundaries and use the information of an occlusion boundary detector to modify the graph, after which the spatio-temporal super-pixel graph is partitioned into object and background by graph-cut. In [16], super-pixels serve as interactive buttons that can be tapped by the user to quickly add to or remove from an initial low-quality segmentation mask, with the purpose of correcting segmentation errors and generating promising results. Reference [17] develops a segmentation framework based on bipartite graph partitioning, in which multi-layer super-pixels can be fused in a principled manner; computationally, it is tailored to the unbalanced bipartite graph structure and leads to a highly efficient, linear-time spectral algorithm. To the best of our knowledge, however, almost all existing interactive approaches initialize the object and background via pixels.

In this paper, a new interactive segmentation algorithm is developed. It operates at the super-pixel level, and simple linear iterative clustering (SLIC) [18] is employed as the super-pixel generator. Each generated super-pixel is simply labelled as object or background, and the segmentation is then iteratively updated by the improved growcut with TA. To obtain the best result, both level set [19] and multi-layer TA approaches are applied. The corresponding segmentation results of our algorithm are illustrated in Fig. 1. The major contributions of our work are summarized as follows.

Figure 1 The final results of our proposed interactive segmentation system. (a), (b) and (c) are single object segmentations and (d) is the multi-object segmentation.

1) A novel mechanism called tumors automata (TA) is proposed to improve the traditional CA method by using super-pixel to replace pixel.

2) For better incorporating the super-pixel to deal with TA mechanism, an improved growcut approach is developed.

3) A level set method [19] is applied for smoothing the object boundary, which uses the output from the previous process for initialization.

4) To make use of multiple final outputs, a multi-layer tumors automata (MTA) is proposed. By integrating different segmentation results, the MTA brings a boost in performance and surpasses many leading state-of-the-art methods.

The process of the proposed algorithm and the corresponding outputs are illustrated in Fig. 2. The original image (Fig. 2(a)) is first over-segmented (Fig. 2(b)) and then initialized to select seeds (Fig. 2(c)). Black, white and gray represent the background-labelled bacteria, the object-labelled bacteria and the neutral territory, respectively. Figs. 2(d), 2(e) and 2(f) demonstrate the object binary mask, object region and object boundary produced by our algorithm, respectively.

Figure 2 The process of our interactive segmentation algorithm.

We show each module of our algorithm in Fig. 3. User labelling is needed first, and the neighborhood information is computed at the super-pixel level. Then, the modified growcut approach with TA is applied to segment the whole image. The proposed method also incorporates the level set method to smooth the boundary.

Figure 3 The framework of proposed algorithm.
2 Proposed Algorithm
2.1 Initial Segmentation

As demonstrated in Fig. 2, we first over-segment the image. The spatial proximity weight and the number of super-pixels must be provided. The image is thus divided into super-pixels $S = \{s_1, \ldots, s_k\}$ at the beginning. These super-pixels are the source of TA, which will be introduced in the next section.
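To make this step concrete, the following sketch (not part of the original implementation; the function name over_segment and its defaults are our own choices) uses the SLIC routine from scikit-image with the parameter values reported in Section 4.4 (spatial proximity weight 10, 300 super-pixels) and also computes the mean-color descriptor used later by the TA:

import numpy as np
from skimage import io
from skimage.segmentation import slic

def over_segment(image_path, num_superpixels=300, compactness=10):
    """Over-segment an RGB image; return the label map and per-super-pixel mean colors."""
    image = io.imread(image_path)                      # H x W x 3 RGB image
    labels = slic(image, n_segments=num_superpixels,   # super-pixel index of every pixel
                  compactness=compactness, start_label=0)
    num_sp = labels.max() + 1
    mean_colors = np.zeros((num_sp, 3))
    for k in range(num_sp):                            # mean RGB feature C_p of each tumor
        mean_colors[k] = image[labels == k].mean(axis=0)
    return labels, mean_colors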

2.2 Tumors Automata

The CA has been widely used; for example, it has been adopted in computer vision tasks such as image processing [20] and saliency detection [21]. Being spatially and temporally discrete, a CA operates on a lattice of sites $ p$ $\in$ $P$ $\subseteq$ $\mathbb{Z}^n $ (pixels or voxels in image processing). Commonly used neighborhood systems $N$ are the von Neumann [12] and Moore neighborhoods:

1) Von Neumann Neighborhood:

$ N(p)=\left\{q \in \mathbb{Z}^n:\parallel p-q \parallel_1:=\sum\limits_{i=1}^{n}\mid p_i-q_i\mid=1\right\}. $ (1)

2) Moore Neighborhoods:

$ N(p)=\left\{q \in \mathbb{Z}^n:\parallel p-q \parallel_\infty:= \mathop {\max}\limits_{i=1, \ldots, n}\mid p_i-q_i\mid=1\right\}. $ (2)

To better capture intrinsic structural information and to compute more efficiently, the TA is put forward. The TA behaves in a similar way to CA; the only difference is that the TA operates on super-pixels rather than pixels. Therefore, an over-segmentation approach is necessary to pre-process the image and generate the super-pixels.

$ Num_p $ super-pixels are obtained via SLIC [18]. Each of them is described by the mean color feature and the coordinates of its pixels. The proposed TA algorithm then operates directly on the super-pixels, and each super-pixel in our algorithm is analogous to a tumor. A (bi-directional, deterministic) TA is a triplet $ A=(S, N, \delta) $, where $ S $ is a non-empty state set, $ N $ is the super-pixel neighborhood system, and $ \delta:$ $S^N$ $\to$ $S $ is the local transition function. At time step $ t+1 $, this function defines the rule for computing a tumor's state given the states of its neighborhood tumors at the previous time step $ t $. The idea of the eight-neighborhood is still employed in our super-pixel neighborhood system $ N $ (as shown in Fig. 4): the neighbors of each tumor are the tumors surrounding it that share a common boundary with it (as shown in Algorithm 1).

Figure 4 The super-pixel neighborhood in our algorithm.
Algorithm 1. Search super-pixel neighborhoods
$//$ For each tumor, $s$ is a super-pixel; $Temp\_P$ counts the pixels on the overlapping boundary.
$//$ $Neighborhood\_S$ stores the final neighborhood information of each super-pixel.
for $\forall\, s\in{S}$ do
    $//$ Collect the super-pixels adjacent to $s$
    $Temp\_S \leftarrow {L}(s)$ $//$ candidate neighbors touching $s$
    for $\forall\, q\in Temp\_S$ do
        $Temp\_P \leftarrow P(boundary(s, q))$ $//$ number of overlapping boundary pixels between $s$ and $q$
        if $Temp\_P > threshold $ then
            $Neighborhood\_S(s) \leftarrow Neighborhood\_S(s)\cup\{q\}$
        end if
    end for
end for
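A possible realization of Algorithm 1 is sketched below (our own illustration, not the authors' code). It counts the boundary pixels shared by every pair of adjacent super-pixels in the SLIC label map and keeps the pairs whose common border exceeds the threshold (20 pixels, see Section 4.1):

import numpy as np
from collections import defaultdict

def superpixel_neighbors(labels, threshold=20):
    """For each super-pixel, return the set of neighbors sharing a long enough border."""
    border_len = defaultdict(int)
    # Count label transitions between horizontally and vertically adjacent pixels.
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        mask = a != b
        for p, q in zip(a[mask], b[mask]):
            border_len[(min(p, q), max(p, q))] += 1
    neighborhood = defaultdict(set)
    for (p, q), length in border_len.items():
        if length > threshold:            # common border longer than the threshold
            neighborhood[p].add(q)
            neighborhood[q].add(p)
    return neighborhood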

The tumor's state $ S_p $ in our case is a triplet $ (l_p, $ $\theta_p, $ $C_p)$, where $ l_p $ is the current tumor's label, $ \theta_p $ is the strength of the current tumor, and $ C_p $ is the tumor feature vector defined by the image. Without loss of generality we assign $ \theta_p \in [0, 1] $. Any input image ($ k\times m $) is segmented into $ Num_p $ super-pixels. A segmented image can then be treated as a particular configuration of a TA, where the tumor space $ P $ is represented by the $ Num_p $ super-pixels, and the initial states for $ \forall\, p \in P $ are set as:

$ \theta_p = 0, ~~ l_p=0, ~~C_p=RGB_p $ (3)

where $ RGB_p $ is the three-dimensional mean color vector of the super-pixel in RGB space. The final goal of the segmentation is to assign each super-pixel one of the $K$ possible labels.
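For illustration, the TA state of equation (3) can be stored as three arrays with one entry per super-pixel. The sketch below uses our own (hypothetical) naming and already applies the seed handling described in Section 2.3, where user-tagged super-pixels receive strength one:

import numpy as np

def init_ta_state(mean_colors, object_seeds, background_seeds):
    """Initialize (l_p, theta_p, C_p) for every tumor; user-tagged seeds get strength 1."""
    num_sp = mean_colors.shape[0]
    labels = np.zeros(num_sp, dtype=int)       # l_p = 0: still unlabelled
    strength = np.zeros(num_sp)                # theta_p in [0, 1]
    labels[list(object_seeds)] = 1             # example labelling: 1 = object
    labels[list(background_seeds)] = 2         # example labelling: 2 = background
    strength[list(object_seeds) + list(background_seeds)] = 1.0
    return labels, strength, mean_colors.copy()   # C_p = mean RGB of the super-pixel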

2.3 Improved Growcut Algorithm

In growcut, the user marks some seed pixels, which then iteratively attempt to attack their neighbors. Different from growcut, super-pixels are selected as initial seeds in our algorithm, and each super-pixel has a pre-defined strength: the strength values of the initially selected seed super-pixels are set to one, whereas the strength values of all other super-pixels are set to zero. This initializes the state of the TA. The seed super-pixels then expand over the image, and super-pixels keep attacking their neighbors until the fronts of two different labels contact each other (as shown in Algorithm 2).

Algorithm 2. Tumors automata evolution rule
$//$ For each tumor
for $\forall \, p\in P$ do
     $//$ Copy previous state
     $l_p^{t+1}=l_p^t$
     $\theta_p^{t+1}=\theta_p^t$
     $//$ Neighbors try to attack current tumor
     for $\forall q\in N(p)$ do
         if $g(\|\vec C_p-\vec C_q\|_2)\cdot \theta_q^t > \theta_p^t$ then
             $l_p^{t+1}=l_q^t$
             $\theta_p^{t+1}=g(\|\vec C_p - \vec C_q\|_2)\cdot \theta_q^t$
         end if
     end for
end for

In each iteration, the strength of an occupying (attacking) super-pixel is multiplied with a linear weighting function $g \to [0, 1]$, defined on the colour difference between the attacking super-pixel $p$ and the attacked super-pixel $q$. The goal is to effectively weaken the power of an attacking super-pixel whose colour differs from its target. The function $g$ is given as follows:

$ g(I_p, I_q)=1-\frac{\|I_p-I_q\|}{C_{\max}} \geq 0 $ (4)

where $I_p$ and $I_q$ are the color vectors of super-pixels $p$ and $q$, and $C_{\max}$ is the maximum color difference.

Let $q$ be the attacking and $p$ the attacked super-pixel, where $\theta_q$ denotes the attacker's strength and the color difference between $p$ and $q$ determines the attenuation. Then $q$ occupies $p$ if the attenuated strength $\theta_q \cdot g(I_p, I_q)$ is higher than $\theta_p$; in this case, the label $l_p$ is set to $l_q$ and the strength $\theta_p$ is set to $\theta_q \cdot g(I_p, I_q)$. Each super-pixel in $I$ iteratively tries to occupy its neighbors until the automaton reaches a stable state. Fig. 2 illustrates the processing of image segmentation and Fig. 1 shows some examples of segmentation results. The proposed algorithm is guaranteed to converge because the strength of each tumor is non-decreasing and bounded. Fig. 9 illustrates the iteration process. Since the given competition rule is multi-label capable, the improved growcut naturally supports multi-label segmentation.
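A minimal sketch of the evolution rule (Algorithm 2 combined with the attenuation function of equation (4)) is given below. It is our own illustration: neighborhood is the structure produced by Algorithm 1 and c_max is the maximum colour difference $C_{\max}$:

import numpy as np

def ta_evolve(labels, strength, colors, neighborhood, c_max, max_iter=500):
    """Iterate the tumors-automata rule until no super-pixel changes its state."""
    for _ in range(max_iter):
        new_labels, new_strength = labels.copy(), strength.copy()
        for p, neighbors in neighborhood.items():
            for q in neighbors:
                # Attenuated strength of the attacking tumor q (equation (4)).
                g = 1.0 - np.linalg.norm(colors[p] - colors[q]) / c_max
                if g * strength[q] > new_strength[p]:
                    new_labels[p] = labels[q]
                    new_strength[p] = g * strength[q]
        if np.array_equal(new_labels, labels) and np.allclose(new_strength, strength):
            break                              # stable state reached
        labels, strength = new_labels, new_strength
    return labels, strength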

Figure 5 Object's boundary optimized by the level set.
Figure 6 The results given by the multi-layer TA.
Figure 7 Plotting overlap score vs. no. of seeds.
Figure 8 Multi-labels segmentation results.
Figure 9 The process of the convergence and corresponding results.
2.4 Boundary Smoothness Mechanism

The improved growcut method is able to achieve quality segmentation (as shown in Fig. 1). However, the resulting segment boundaries can be ragged in some images (as shown in Fig. 5, where the right, middle and left images denote the final binary segmentation results after using the level set, the binary segmentation results without the level set method and the original images, respectively). Sometimes the task requires extracting the smallest details of the boundaries, but ragged edges are usually an unwanted artifact when editing generic high-resolution images.

Once the output of the improved growcut algorithm with TA is obtained, we treat it as the initial boundary of the object. A distance regularized level set evolution (DRLSE) [19] is then applied to optimize the boundary. The level set approach implicitly represents a closed planar curve as a level set of a two-dimensional surface function, so that the motion of the curve can be obtained by evolving the level set function. The basic equation of the level set function is:

$ \Phi_t+F\,|\nabla\Phi|=0 $ (5)

where $\Phi$ is the level set function and $t$ is time. Before performing the optimization, the gradient (difference) image has to be calculated. In the calculus of variations, evolving the gradient flow to its steady state is the standard approach to minimizing an energy functional $F(\Phi)$:

$ \frac{\partial \Phi}{\partial t} = -\frac{\partial F}{\partial \Phi} $ (6)

where $\frac{\partial F}{\partial \Phi}$ is the Gâteaux derivative of the functional $F(\Phi)$. The zero level set represents the target contour curve: the boundary of the object is the zero level set of $\Phi$, while the object itself is the set of points in the plane for which $\Phi$ is positive (interior of the contour) or zero (at the boundary).

$ \Gamma(t)=\{ (x, y)\mid\Phi((x, y), t)=0\} $ (7)

where $(x, y)$ is a pixel on the boundary and $t$ is the time step. Fig. 6 shows the results optimized by the multi-layer TA.
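The paper uses DRLSE [19] for this refinement. As a much simpler stand-in (explicitly not the authors' method), the ragged TA mask can be smoothed by filtering its signed distance function, whose zero level set then plays the role of $\Phi=0$ in equation (7):

import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def smooth_boundary(mask, sigma=3.0):
    """Smooth a binary object mask by regularizing its signed distance function.

    Only a rough substitute for DRLSE: the signed distance acts as the level set
    function Phi, and the smoothed zero level set gives the new contour."""
    mask = mask.astype(bool)
    inside = distance_transform_edt(mask)            # distance to the background
    outside = distance_transform_edt(~mask)          # distance to the object
    phi = inside - outside                           # signed distance, positive inside
    phi_smooth = gaussian_filter(phi, sigma=sigma)   # regularize the level set surface
    return phi_smooth > 0                            # new object mask: {Phi > 0}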

The results illustrate that the multi-layer TA algorithm can optimize the boundary of an object. Even though some results are not fully satisfying, it is clear that all of them are greatly improved and reach a high accuracy level after evolution.

2.5 Multi-layer Tumors Automata Mechanism

Numerous novel approaches have been proposed to solve the problem of interactive object segmentation, each with its own strengths and weaknesses. To exploit the advantages of each approach, Qin et al. [21] designed an effective mechanism to amalgamate $M$ segmentation outputs generated by $M$ state-of-the-art algorithms. Inspired by this, we treat each of those re-labelled images as a layer of the TA. We employ the two state-of-the-art algorithms proposed by Arndt et al. [10] and Li et al. [17] as two individual layers of the TA.

In MTA, each tumor represents a super-pixel and $Num_p$ denotes the total number of super-pixels in an image. Different from the definition of neighborhood in Section 4.1, super-pixels with the same center coordinate in different outputs are neighbors in MTA. In fact, every tumor has $M-1$ neighbors obtained from the other outputs, and each neighbor is considered to exert the same force on the tumor's next state.

After segmentation, the super-pixel $i$ is determined and denoted as follows:

$ \begin{align} \begin{cases} background, & {\rm if~} l_{sp}=0\\ object, & {\rm if~} l_{sp}=1 \end{cases} \end{align} $ (8)

where $l_{sp} = 1$ means that this super-pixel belongs to an object, whereas $l_{sp} = 0$ indicates that it belongs to the background. Since the segmentation may not always be correct, a super-pixel binarized or determined as object does not necessarily belong to the foreground. Thus, the MTA is proposed to improve the segmentation accuracy. Different from Qin et al.'s method [21], which makes use of multiple final outputs in a Bayesian framework, the tumors are fed back to the improved growcut framework in our method. In Fig. 6, (a), (d), (g), (j), (m), (p) are the original images, (b), (e), (h), (k), (n), (q) are the binary segmentations, and (c), (f), (i), (l), (o), (r) are the segmented object contours. (a)-(c), (g)-(i), (m)-(o) are the segmentations without MTA, while (d)-(f), (j)-(l), (p)-(r) are the segmentations with MTA.
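Since no pseudocode is given for the MTA pass, the following sketch is only one possible reading of the description above: every layer acts as one equally weighted neighbor, and each super-pixel takes the label with the largest accumulated strength over the $M$ layers before being fed back to the improved growcut:

import numpy as np

def mta_combine(layer_labels, layer_strength):
    """Fuse M per-layer results: each super-pixel takes the label whose accumulated
    strength over the M equally weighted layers is largest."""
    layer_labels = np.asarray(layer_labels)        # shape (M, num_superpixels)
    layer_strength = np.asarray(layer_strength)    # shape (M, num_superpixels)
    num_sp = layer_labels.shape[1]
    fused = np.zeros(num_sp, dtype=int)
    for i in range(num_sp):
        votes = {}
        for m in range(layer_labels.shape[0]):     # every layer acts as one neighbor
            lbl = int(layer_labels[m, i])
            votes[lbl] = votes.get(lbl, 0.0) + layer_strength[m, i]
        fused[i] = max(votes, key=votes.get)       # label with the largest accumulated strength
    return fused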

3 Experiments

We conduct several experiments to verify the effectiveness and robustness of the proposed approach. The method is tested on the PASCAL VOC segmentation challenge [22] to evaluate the quality of our interactive segmentation and to compare it with existing algorithms on the new and harder dataset [23], which augments the existing grabcut dataset [8] with images and ground truth taken from the PASCAL VOC segmentation challenge [22]. Details are described below.

3.1 Robustness Analysis of Our Algorithm

The quality of an interactive system is evaluated as the average number of super-pixel seeds required to achieve segmentation quality within a certain band. Fig. 7 illustrates the result and measures the interaction effort. The graph of overlap score versus number of super-pixel seeds captures how the segmentation accuracy varies with successive user interactions, and the average number of seeds summarizes this in a single score. Here the overlap score, the measure used to evaluate segmentation quality in the VOC segmentation challenge [22], is given by

$ score = 100\times\frac{|y \cap y_{gt}|}{|y \cup y_{gt}|} $ (9)

where $y$ denotes the output segmentation and $y_{gt}$ denotes the ground truth. The average is computed over a certain range of scores, and we take $S_{\rm low} = 83$, $S_{\rm high} = 95$.
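The overlap score of equation (9) can be computed directly from the binary output and ground-truth masks, for example:

import numpy as np

def overlap_score(y, y_gt):
    """VOC-style overlap score of equation (9): 100 * |intersection| / |union|."""
    y, y_gt = np.asarray(y, bool), np.asarray(y_gt, bool)
    union = np.logical_or(y, y_gt).sum()
    return 100.0 * np.logical_and(y, y_gt).sum() / union if union else 100.0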

3.2 Quantitative Analysis of Segmentation and Segmentation Efficiency

This section compares results obtained using user interaction on the introduced dataset with RW [9], the original growcut [11], graph-cut [6], regioncut [10] and the bipartite graph partitioning approach (BGPA) [17]. The results are analyzed on the more challenging set of 151 images listed in [23], of which 49 images are taken from the grabcut dataset, 99 from the PASCAL VOC'09 dataset and 3 from the Alpha matting dataset, to ensure a fair comparison with other methods. We evaluate the proposed method on the harder dataset [23], which augments the existing grabcut dataset [8] with images and ground truth taken from the PASCAL VOC segmentation challenge [22]. The VOC segmentation challenge images come with ground truth labelling of object classes, which facilitates foreground segmentation. The VOC dataset has simpler shapes (car, bus) but more complex appearances, where the color distributions of foreground and background overlap, whereas images from the grabcut dataset contain complex shapes but foregrounds and backgrounds that tend to have disjoint color distributions. Since the methods differ, we do our best to select seeds under similar conditions for each method so that every algorithm achieves its best result. Experiments are conducted on a computer with a 2.67 GHz Intel Core i5 processor. The precision measure taken for binary segmentation is defined by:

$ Precision = \frac{Num_{\rm olp}}{Num_{\rm tp}} $ (10)

where $Num_{\rm olp}$ is the number of correctly overlapping labelled pixels and $Num_{\rm tp}$ denotes the total number of pixels. For multi-label segmentation, we use the IcgBench dataset introduced by Santner et al. [24], and the mean Dice evaluation score is given as follows:

$ score = \frac{1}{N}\sum\limits_{i=1}^{N}{\rm dice}(E_i, GT_i) = \frac{1}{N}\sum\limits_{i=1}^{N}\frac{2|E_i\cap GT_i|}{|E_i|+|GT_i|} $ (11)

where $|\cdot|$ denotes the area of a region, $E_i$ is the $i$-th segment, $GT_i$ is the corresponding ground truth labelling, and $N$ is the number of segments.
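Similarly, the mean Dice score of equation (11) can be evaluated over the $N$ segment/ground-truth pairs as follows:

import numpy as np

def mean_dice(segments, ground_truths):
    """Mean Dice score of equation (11) over N segment/ground-truth mask pairs."""
    scores = []
    for e, gt in zip(segments, ground_truths):
        e, gt = np.asarray(e, bool), np.asarray(gt, bool)
        scores.append(2.0 * np.logical_and(e, gt).sum() / (e.sum() + gt.sum()))
    return float(np.mean(scores))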

Table Ⅰ shows the evaluation of the proposed algorithm on the test images compared with super-pixel-based approaches such as regioncut [10] and the BGPA [17], as well as pixel-based approaches such as RW [9], growcut [11] and graph-cut [6]. It is clearly shown that both improvements, the TA mechanism and the multi-layer mechanism, increase the accuracy of the proposed scheme. In summary, our algorithm clearly outperforms not only the original growcut method but also the graph-cut framework in terms of mean error rate.

Table Ⅰ Accuracy Rates on the Harder Dataset [23] Based on Different Methods

The proposed interactive object segmentation algorithm is also evaluated with a lasso initialization, using the initializations provided with the introduced dataset. With the lasso initialization there is no need for regional analysis; our pre-initialization step initializes as much of the image as possible except the edges. Compared to the original growcut algorithm, our proposed algorithm performs better, and it also outperforms regioncut with discriminatively learned parameters. In addition, the proposed method can handle multi-label segmentation (as shown in Fig. 8).

4 Discussion
4.1 Neighborhood Measure

To enforce smoothness, the von Neumann and Moore neighborhoods provide weights only between directly neighboring pixels; this no longer applies directly once super-pixels are used in our algorithm. To preserve smoothness across neighboring super-pixels, different methods have been proposed. Reference [17] connects neighboring super-pixels that are similar in feature space. This differs from the well-known CA, in which the influences of all neighbors are either fixed [11], [12] or depend on the similarity between any pair of cells in color feature space [18], [25]. If the object's color is similar to the background, the feature-space connection brings noise and causes confusion with the background, which is very hard to handle (as shown in Fig. 10: (a), (g) and (b), (h) are the original images and the ground truth; (c), (i) and (d), (j) are the results obtained using the feature-space neighborhood measure; (e), (k) and (f), (l) are our results using the proposed neighborhood measure). For the proposed super-pixel neighborhood system, we still use the idea of the eight-neighborhood (see Fig. 4). More details are given in Algorithm 1.

Figure 10 The segmentation results with different neighborhood measure. (a), (g) and (b), (h) are the original images and the ground truth. (c), (i) and (d), (j) are the results obtained by using the feature space neighborhood measure. (e), (k) and (f), (l) are our results by using the new neighborhood measure.

In Algorithm 1, the threshold is set to 20: if the overlapping boundary is longer than 20 pixels, the boundary is regarded as a common border and the super-pixel pair as neighbors; otherwise, this boundary is excluded from consideration.

4.2 Efficiency Analysis

Similar to regioncut [10], which ignores pre-learning and pre-classification, we analyze the efficiency of the proposed algorithm by excluding the pre-processing of over-segmentation. Fig. 9 shows a segmentation example of the lotus image compared with graph-cut [25] and regioncut [10]. In Fig. 9, the bottom, middle and top images represent the results of regioncut, graph-cut and ours, respectively. It is easy to see that the proposed method converges faster than the others.

Fig. 9 shows that our method requires the fewest user strokes compared with regioncut and graph-cut. Clearly, the results are the best not only in convergence speed but also in accuracy. The results are summarized in Table Ⅱ, which evaluates the proposed algorithm on the segmentation benchmark against the original growcut algorithm and graph-cut based on a coarse initialization. The results illustrate that our method is better than graph-cut and regioncut. Compared with graph-cut, the super-pixel carries boundary features as well as spatial information, which improves the segmentation accuracy. Meanwhile, compared with regioncut, the pre-classification step introduces imprecise weights for super-pixels, which hurts the final segmentation. Even though the total time of our method (4.75 s) is longer than that of graph-cut (3.28 s), our accuracy is higher. Figs. 12-14 show example results compared with different segmentation methods; the first, second, third and fourth rows are the results of the BGPA [17], graph-cut [6], regioncut [10] and our method, respectively. It is clear that the proposed algorithm achieves better results than the other methods.

Table Ⅱ Comparison of Segmentation Efficiency
Figure 11 Failure examples. The result demonstrates that our method is a little sensitive to the color.
Figure 12 The results compared with different segmentation methods. The first row is the result of the BGPA [17]. The second row is the result of the graph-cut [6]. The third row is the result of the regioncut [10]. The last row is the result of our method.
Figure 13 The results compared with different segmentation methods. The first row is the result of the BGPA [17]. The second row is the result of the graph-cut [6]. The third row is the result of the regioncut [10]. The last row is the result of our method.
Figure 14 The results compared with different segmentation methods. The first row is the result of the BGPA [17]. The second row is the result of the graph-cut [6]. The third row is the result of the regioncut [10]. The last row is the result of our method.
4.3 Shortcoming

The results demonstrate how effective our algorithm can be when an initialization with well-distributed and reasonable seeds is given. However, the experimental results also reveal the deficiencies of the proposed method. Our approach can compete with state-of-the-art segmentation methods while requiring less time-consuming user initialization. Since only the RGB color feature is extracted, our approach is somewhat sensitive to the color distribution. It is empirically found that if the seeds cover the main features of the object and background, a good segmentation boundary can be extracted. Some promising works [17], [26] have addressed effective methods for arc weight estimation during the seed-marking process; they take into account image attributes and object information to enhance the discontinuities between object and background, and visual feedback can be provided to the user for the next action. We will investigate how to incorporate these methods into our work in the future. Fig. 11 shows failure examples: (a), (f) are the original images, (b), (g) are the marked seeds, (c), (h) are the binary segmentation results, (d), (i) are the ground truth and (e), (j) are the target contour results. The selected tumors only connect to object regions that are very similar to the background, so they are easily assigned the same labels. Hence, we will extract more effective and robust features to address this issue.

4.4 Over-segmentation

In the proposed method, the spatial proximity weight and the number of super-pixels must be provided. The larger the number, the more super-pixels are produced; the smaller the number, the fewer super-pixels. Segmenting more super-pixels takes more time, and vice versa, while too few super-pixels cannot offer rich edge information. Therefore, selecting a proper amount of over-segmentation is very important. For practical use, we will continue to investigate an adaptive setting based on the size or the frequency distribution of an image in the future. In our experiments, the spatial proximity weight and the number of super-pixels are set to 10 and 300, respectively.

5 Conclusion

In this paper, we have investigated a new approach to interactive object segmentation. The presented TA is similar to CA, but the TA operates on super-pixels directly. Based on TA, a novel growcut strategy was developed to handle super-pixels via interactions with neighbors. Experiments illustrated that our approach achieved superior performance and exceeded other state-of-the-art methods. The experiments also demonstrated that the context-based multi-layer TA can effectively enhance given state-of-the-art methods to obtain more accurate results.

In the future, we will continue to improve the performance of the proposed approach by extracting more effective features and integrating more algorithms. Implementing a high-performance version on the graphics processing unit to fully exploit the parallel nature of the algorithm is also a promising direction.

References
1
V. Kolmogorov and R. Zabin, "What energy functions can be minimized via graph cuts, "IEEE Trans. Patt. Anal. Mach. Intell., 2004, 26(2): 147-159. DOI:10.1109/TPAMI.2004.1262177
2
M. A. G. Carvalho and A. L. Costa, "Combining hierarchical structures on graphs and normalized cut for image segmentation, " New Frontiers in Graph Theory, Y. G. Zhang, Ed. Rijeka, Croatia: InTech Open Access Publisher, 2012.
3
J. Carreira and C. Sminchisescu, "Constrained parametric min-cuts for automatic object segmentation, " in Proc. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010, pp. 3241-3248.
4
D. Kuettel and V. Ferrari, "Figure-ground segmentation by transferring window masks, " in Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 2012, pp. 558-565.
5
C. Rother, V. Kolmogorov, and A. Blake, "'GrabCut': Interactive foreground extraction using iterated graph cuts, " ACM Trans. Graph., 2004, 23(3): 309-314. DOI:10.1145/1015706
6
X. Bai and G. Sapiro, "A geodesic framework for fast interactive image and video segmentation and matting, " University of Minnesota, Minnesota, USA, Tech. Rep. 2171, 2007.
7
L. Yu and C. S. Li, "Low depth of field image automatic segmentation based on graph cut, "J. Autom., 2014, 10: 1471-1481.
8
O. Sener, K. Ugur, and A. A. Alatan, "Error-tolerant interactive image segmentation using dynamic and iterated graph-cuts, " in Proc. 2nd ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices, New York, NY, USA, 2012, pp. 9-16.
9
L. Grady, "Random walks for image segmentation, "IEEE Trans. Patt. Anal. Mach. Intell., 2006, 28(11): 1768-1783. DOI:10.1109/TPAMI.2006.233
10
O. J. Arndt, B. Scheuermann, and B. Rosenhahn, "'Regioncut'-interactive multi-label segmentation utilizing cellular automaton, " in Proc. 2013 IEEE Workshop on Applications of Computer Vision (WACV), Tampa, FL, USA, 2013, pp. 309-316.
11
V. Vezhnevets and V. Konouchine, ""GrowCut": Interactive multi-label N-D image segmentation by cellular automata, " in Proc. Graphicon, Novosibirsk Akademgorodok, Russia, 2005, pp. 150-156.
12
J. Von Neumann and A. W. Burks, Theory of Selfreproducing Automata. Champaign, IL, USA: University of Illinois Press, 1966.
13
A. Blake, C. C. E. Rother, and P. Anandan, "Foreground extraction using iterated graph cuts, " U. S. Patent 7 660 463, Feb. 9, 2010.
14
R. Dondera, V. Morariu, Y. L. Wang, and L. Davis, "Interactive video segmentation using occlusion boundaries and temporally coherent superpixels, " in Proc. 2014 IEEE Winter Conference on Applications of Computer Vision (WACV), Steamboat Springs, CO, USA, 2014, pp. 784-791.
15
M. Ghafarianzadeh, M. B. Blaschko, and G. Sibley, "Unsupervised spatio-temporal segmentation with sparse spectral clustering, " in Proc. British Machine Vision Conference (BMVC), Nottingham, UK, 2014.
16
I. Gallo, A. Zamberletti, and L. Noce, "Interactive object class segmentation for mobile devices, " in Proc. 27th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil, 2014, pp. 73-79.
17
Z. G. Li, X. M. Wu, and S. F. Chang, "Segmentation using superpixels: A bipartite graph partitioning approach, " in Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 2012, pp. 789-796.
18
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods, " IEEE Trans. Patt. Anal. Mach. Intell., 2012, 34(11): 2274-2282. DOI:10.1109/TPAMI.2012.120
19
C. M. Li, C. Y. Xu, C. F. Gui and M. D. Fox, "Distance regularized level set evolution and its application to image segmentation, "IEEE Trans. Image Process., 2010, 19(12): 3243-3254. DOI:10.1109/TIP.2010.2069690
20
P. L. Rosin, "Image processing using 3-state cellular automata, "Comp. Vision Image Understand., 2010, 114(7): 790-802. DOI:10.1016/j.cviu.2010.02.005
21
Y. Qin, H. C. Lu, Y. Q. Xu, and H. Wang, "Saliency detection via cellular automata, " in Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 110-119.
22
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal visual object classes challenge 2009 (VOC2009), " in Summary Presentation at the 2009 PASCAL VOC Workshop, 2009.
23
V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, "Geodesic star convexity for interactive image segmentation, " in Proc. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010, pp. 3129-3136.
24
J. Santner, T. Pock, and H. Bischof, "Interactive multi-label segmentation, " in Asian Conference on Computer Vision, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Berlin, Heidelberg, Germany: Springer, 2010, pp. 397-410.
25
Y. Y. Boykov and M. P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, " in Proc. 8th IEEE International Conference on Computer Vision, Vancouver, BC, Canada, vol. 1, pp. 105-112, Jul. 2001.
26
P. A. V. de Miranda, A. X. Falcão, and J. K. Udupa, "Synergistic arc-weight estimation for interactive image segmentation using graphs, " Comp. Vision Image Understand., 2010, 114(1): 85-99. DOI:10.1016/j.cviu.2009.08.001