2. 天津工业大学 天津 300384
2. College of Computer Science, Tianjin University of Technology, Tianjin 300384, China
Image segmentation task is to divide an image into regions of interest that are suitable for machine or human operations [1], [2] like image retrieval and recognition. Recently, the accuracies of completely automatic segmentation techniques [3], [4] have been enhanced substantially. Nevertheless, the achievements of current stateoftheart algorithms still cannot satisfy the accuracy requirement of professional image editors for choosing target boundaries. Many interactive algorithms have been proposed to improve the accuracy recently. These algorithms are based on the graphbased theory, including interactive grabcut [5], graphcut [6][8], random walks [9], regioncut [10] and growcut [11].
Graphcut [6] is an assembled optimization strategy to address the issue of the object segmentation in an image. An image is treated as a graph and each pixel is a graph node. The globally optimal pixel labelling for twolabel case (i.e., object and background) can be efficiently computed by using maxflow/mincut algorithms. Grabcut [5] is an improvement of graphcut by merging an iterative segmentation mechanism. The first proximity of the ultimate foreground/background labelling can be found when the user draws a rectangular box surrounding the target of interest. Random walker (RW) [9] acquires a few pixels as userdetermined seed labels, but it gives an analytical decision of the probability which a random walker starts at each unlabelled pixel will attain one of the prelabelled pixels firstly. Object segmentation is gained by distributing each pixel to the label for which the greatest probability is computed. Some special images with poor structure, color, and appearance features also can employ the RW for editing. But it is not easy to control and accomplish this kind of energy minimizing approach. Regioncut [10] associates the traits of the robustness of region information and the precision of gradient oriented segmentation approaches. Furthermore, the distributed seeds are initialized by region probabilities. This method can reach the state of convergence without user initialized seeds. Under the framework of the cellular automata (CA) [12], an interactive segmentation method, named growcut [11], is proposed. There are two major properties of this algorithm. One is the possibility to deal with the multilabel segmentations. The other is that this approach can be extended to handle the highdimensional images.
In computer vision, interactive object segmentation plays a significant role in photo analysis and image editing. Under interactions in terms of scribbles [1], [2] or bounding boxes [13] around the object of interest for seeds, users can directly utilize the segmentation algorithm towards a desired output. Recently, researchers have presented many powerful approaches for interactive image segmentation. In this paper, we focus on the literature of interactive segmentation performed with superpixel.
In regioncut [10], a Gaussian mixture model (GMM) and a precision of gradient oriented segmentation method are learned by combining the robustness of region information. The GMM is applied in preinitializing the region probabilities. In this way, it is similar to distributed seeds. The final segmentation output is still gained from building a pixelbased graph. Additionally, the ineffectiveness of only using final segmentation mask is shown in the results. In [14], the method fuses the framework mentioned in [15] to obtain superpixels on each frame independently. After that, the optical forward and backward information are utilized to build a spatiotemporal superpixel graph. The graphs based on occlusion boundaries are focused on and the major contribution is to use the information of an occlusion boundary detector to modify them. Subsequently, the spatiotemporal superpixel graph is partitioned into object and background by graphcut. In [16], superpixels serve as interactive buttons which can be tapped by the user quickly to add or remove an initial low quality segmentation mask, with the purpose of correcting the segmentation errors and generating promising results. Reference [17] develops an innovative segmentation framework based on bipartite graph partitioning, in which the multilayer superpixels can be fused in a principled manner. Computationally, it is tailored to unbalance bipartite graph structure and lead to a highly efficient, lineartime spectral algorithm. As far as our information goes, nevertheless, almost all the existing interactive approaches initialize the object and background via pixels.
In this paper, a new interactive segmentation algorithm is developed. It is based on the superpixel level and the simple linear iterative clustering (SLIC) [18] is employed as the superpixels generators. Each generated superpixel is simply labelled as object or background. The segmentation is then updated by the graphcut. To obtain the best result, both level set [19] and multilayer TA approaches are applied. The corresponding segmentation results of our algorithm are illustrated in Fig. 1. The major contributions of our work are summarized as following.
1) A novel mechanism called tumors automata (TA) is proposed to improve the traditional CA method by using superpixel to replace pixel.
2) For better incorporating the superpixel to deal with TA mechanism, an improved growcut approach is developed.
3) A level set method [19] is applied for smoothing the object boundary, which uses the output from the previous process for initialization.
4) To make use of multiple final outputs, a multilayer tumors automata (MTA) is proposed. By integrating different segmentation results, the MTA brings a boost in performance and beats many leading methods in the stateoftheart.
The process of the proposed algorithm and the corresponding outputs are illustrated in Fig. 2. The original image (Fig. 2(a) is first oversegmented (Fig. 2(b)) and then initialized to select seeds (Fig. 2(c)). Black, white and gray colors represent the background labelled bacteria, object labelled bacteria and the neutral territory, respectively. Fig. 2(d), Fig. 2(e) and Fig. 2(f) demonstrate the results of object binary, object region and object boundary by our algorithm, respectively.
We show each module of our algorithm in Fig. 3. User labelling is needed firstly and neighbors information is computed by superpixelbased calculation. Then, we apply the modified growcut approach with TA in segmenting the whole image. The proposed method also absorbs the level set method to smooth the boundary.
As demonstrated in Fig. 2, we need to oversegment the image initially. The spatial proximity weight and the number of superpixels must be provided. The image is then divided into superpixels
The CA has been widely used. For example, it has been absorbed in computer vision tasks including image processing [20] and saliency detection [21]. Being spatially and temporally discrete, CA operates on a lattice of sites
1) Von Neumann Neighborhood:
$ N(p)=\left\{q \in \mathbb{Z}^n:\parallel pq \parallel_1:=\sum\limits_{1}^{n}\mid p_iq_i\mid=1\right\}. $  (1) 
2) Moore Neighborhoods:
$ N(p)=\left\{q \in \mathbb{Z}^n:\parallel pq \parallel_\infty:= \mathop {\max}\limits_{i=1, \ldots, n}\mid p_iq_i\mid=1\right\}. $  (2) 
In order to better obtain intrinsic structural information and compute more efficiently, the TA is put forward. The TA performs in a similar way as CA. The only difference is that the TA is operated with superpixel rather than pixel. Therefore, an oversegmentation approach for preprocessing an image and generating superpixels is necessary.
Algorithm 1. Search superpixel neighborhoods algorithm 
for superpixels for if end if end for end for 
The tumor's state
$ \theta_p = 0, ~~ l_p=0, ~~C_p=RGB_p $  (3) 
where
Growcut is one of the major methods that is used to determine some seed pixels which iteratively attempt to attack their neighbors. Different from growcut, superpixels are selected for initial seeds in our algorithm. Therefore, each superpixel has a predefined strength. The strength values of the initially selected seed superpixels are set to one. Instead, all other superpixels' strength values are set to zero. This initializes the state of the TA. Henceforth, the seed superpixels expand over the image until the edges of two different labels contact each other. Otherwise, superpixels continue to attack their neighbors (as shown in the Algorithm 2).
Algorithm 2. Tumors automata evolution rule 
for for if end if end for end for 
For each iteration, to weaken the power of occupying superpixels, the strength value of an occupying superpixel is multiplied with a linear weighting function
$ g(I_p, I_q)=1\frac{\I_pI_q\}{C_{\max}} \geq 0 $  (4) 
where
Let
The improved growcut method is able to achieve quality segmentation (as shown in Fig. 1). However, the resulting segments boundary can be ragged (as shown in Fig. 5: the right, middle and left images denote the final binary segmentation results after using level set, the binary segmentation results without level set method and the original images, respectively) in some images. Sometimes the task is to extract the smallest details of the boundaries. However, this can be an unwanted artifact when editing generic highresolution images.
Once the output is obtained from the improved growcut algorithm with TA, we treat it as the initial boundary of the object. A distance regularized level set evolution (DRLSE) [19] is then applied for optimizing the boundary. Level set approach is the basic principle of plane closed curve and it can be implicitly represented as a twodimensional surface level set function. The solution of curve movement can be implicited through the processing of the level set function surfaces. The basic equation of the level set function is:
$ \Phi_t+F\mid\Delta\Phi=0 $  (5) 
where
$ \frac{\partial \Phi}{\partial t} = \frac{\partial F}{\partial \Phi} $  (6) 
where
$ \Gamma(t)=\{ (x, y)\mid\Phi((x, y), t)=0\} $  (7) 
where
The results illustrate that the multilayer TA algorithm can optimize the boundary of an object. Even though some results are not satisfying, it is clear that all of those are greatly improved and reach a high accuracy level after evolution.
2.5 Multilayer Tumors Automata MechanismNumerous novel approaches have been raised to solve the issue of the interactive object segmentation. Each of them has its own superiorities and disadvantages. To make use of the advantage of each approach, an effective mechanism to amalgamate
In MTA, each tumor represents a superpixel.
After segmentation, the superpixel
$ \begin{align} \begin{cases} background, & {\rm if~} l_{sp}=0\\ object, & {\rm if~} l_{sp}=1 \end{cases} \end{align} $  (8) 
where
We conduct several experiments to test and verify the effectiveness and robustness of the proposed approach. It is tested in the PASCAL VOC segmentation challenge [22] to evaluate the quality of our interactive segmentation method and compare it with existing algorithms based on the new and harder dataset [23] which augments the existing grabcut dataset [8] with images and ground truth taken from the PASCAL VOC segmentation challenge [22]. Details are described below.
3.1 Robustness Analysis of Our AlgorithmInteractive system quality is evaluated as the average number of superpixel seeds required to achieve segmentation quality within a certain band. Fig. 7 illustrates the result and measure interaction effort. The graph of overlap score versus number of superpixel seeds captures how the accuracy of the segmentation varies with successive user interactions, and the average number of seeds summarizes that in a single score. Here overlap score, the measure used to evaluate segmentation quality in the VOC segmentation challenge [22], is given by
$ score = 100*\frac{y \cap y_{gt}}{y \cup y_{gt}} $  (9) 
where
This section analyzes comparison results obtained using the user interaction on the introduced dataset with: RW [9], original growcut [11], graphcut [6], regioncut [10] and the bipartite graph partitioning approach (BGPA) [17]. The results are analyzed from more challenging images which consists of 151 images, as listed in [23], of which 49 images are taken from the grabcut dataset, 99 from the PASCAL VOC'09 dataset and 3 images from the Alpha matting dataset to ensure a fair comparison with other methods. We evaluate the proposed method on the harder dataset [23] which augments the existing grabcut dataset [8] with images and ground truth taken from the PASCAL VOC segmentation challenge [22]. The VOC segmentation challenge images come with ground truth labeling of object classes, and are facilitated by the foreground segmentation. Even though the VOC dataset has simpler shapes (car, bus) but more complex appearances, where the color distributions of foreground and background overlap. Images from the grabcut dataset contain complex shapes but the foreground and background tend to have disjoint color distributions. Because of the different methods, we do our best to select the seeds which have the similar conditions for each method, and make all kinds of algorithm to get the best results. Experiments are conducted on a computer with an i5 and 2.67 GHz Intel Core. The precision measure taken for binary segmentation is defined by:
$ Precision = \frac{Num_{\rm olp}}{Num_{\rm tp}} $  (10) 
where the
$ score = \frac{1}{N}\sigma_{i=1}^{N}{\rm dice}(E_i, GT_i) = \frac{1}{N}\sigma_{i=1}^{N}\frac{2E_i\bigcap GT_i}{E_i+GT_i} $  (11) 
where
Table Ⅰ shows the evaluation of the proposed algorithm on the test images compared with the superpixelbased approaches, for instance, regioncut [10] and the BGPA [17] as well as the pixelbased approaches, such as RW [9], growcut [11] and graphcut [6]. It is clearly shown that both improvements, the TA mechanism and the Multilayer mechanism, increase the accuracy of the proposed scheme. In summary, our algorithm clearly outperforms not only the original growcut method but also the graphcut framework in terms of mean error rate.
The proposed interactive object segmentation algorithm is evaluated via a lasso initialization. The initializations given by the introduced dataset are utilized. There is no need for regional analysis by using the lasso initialization. Our preinitialization measure is conducted to initialize as much as possible of the image except the edges. Compared to the original growcut algorithm, our proposed algorithm performs better and outperforms the regioncut with discriminatively learning parameters. In addition, the proposed method can handle the issue of the Multilabel segmentation (as shown in Fig. 8).
4 Discussion 4.1 Neighborhood MeasureTo strengthen smoothness, only neighboring pixels weights could be attained by the von Neumann and Moore neighborhoods. However, this situation will end up with the superpixels applied in our algorithm. To compensate for the smoothness on the neighboring across superpixels, different methods have been proposed. Reference [17] connects the neighboring superpixels which are similar in feature space. It is different from the well known CA, the influences of all neighbors are fixed [11], [12] or depending on the similarity between any pair of cells in color feature space [18], [25]. If the object's color is similar with the background, this will bring noise and cause confusion with background. It is very hard to handle those solutions (as shown in Fig. 10: (a), (g) and (b), (h) are the original images and the ground truth. (c), (i) and (d), (j) are the results obtained by using the feature space neighborhood measure. (e), (k) and (f), (l) are our results by using the proposed neighborhood measure). For the proposed superpixel neighborhood systems, we still use the idea of eight neighborhoods (see Fig. 4). More details are shown in the Algorithm 1.
In Algorithm 1, we set threshold value as 20. If the overlap boundary line is more than 20 pixels, we think that the boundary is the common border. The superpixel pair are the neighborhood. Otherwise, this boundary will be out of our consideration.
4.2 Efficiency AnalysisSimilarly as the regioncut [10] that ignores prelearning and preclassification, we analyze the efficiency of the proposed algorithm by ignoring the preprocess of oversegmentation. Fig. 9 shows a segmentation example of the lotus image compared to graphcut [25] and regioncut [10]. In Fig. 9, the bottom, middle and top images represent the results of regioncut, graphcut and ours, respectively. It is easy to find that the convergence speed of the proposed method is faster than others.
Fig. 9 shows that our method is with the least user strokes compared with regioncut and graphcut. Clearly, the results are the best not only in the speed of convergence but also the accuracy. The results are summarized in Table Ⅱ. The evaluation of the proposed algorithm on the segmentation benchmark is demonstrated, which is compared with the original growcut algorithm and graphcut based on a coarse initialization. The results illustrate that our method is better than the graphcut and regioncut. Compared to graphcut, the superpixel contains the boundary feature as well as the space information, which promotes the accuracy of segmentation. Meanwhile, compared to regioncut, the processing of preclassification brings imprecise weights for superpxiels. It is not conducive to the final segmentation. Even though the total time (4.75 s) of our method is more than the time (3.28 s)of the graphcut, our accuracy is higher than graphcut's. The Figs. 1214 are the example results compared with different segmentation methods. The first, second, third and fourth rows are the results of the BGPA [17], graphcut [6], regioncut [10] and our method, respectively. It is clear that the proposed algorithm achieves better result than other methods.
The results demonstrate how cogent our algorithm can be if the initialization with distributed and reasonable seeds is given. However, the experimental results also illustrate the deficiencies of the proposed method. Our approach has the ability to compete with stateoftheart segmentation methods, by contrast, without needing the time consumption by user initializations. Since only the RGB color feature is extracted, it is a flaw that our approach is a little sensitive to color distribution. It is empirically found that if the seeds can cover the main features of the object and background, good segmentation boundary can be extracted. Some promising works [17], [26] have addressed effective methods for arc weight estimation during the seeds marking process. Their works take into account image attributes and object information to enhance the discontinuities between object and background, whereas a visual feedback can be provided to the user for next action. We will investigate how to incorporate these methods into our work in the future. Fig. 11 demonstrates the failure examples. (a), (f) are the original images, (b), (g) are the marked seeds, (c), (h) are the binary segmentation results, (d), (i) are the ground truth and (e), (j) are the target contour results. The selected tumors only connect to object regions and are very similar with background. So they are easily assigned to the same labels. Hence, we will extract more effective and robust features to solve the issue of object segmentation.
4.4 OversegmentationIn the proposed method, the spatial proximity weight and the number of superpixels must be provided. The bigger the number is, the more superpixels will be segmented; the smaller the number is, the less superpixels will be segmented. Segmenting more superpixels will take more time and vice versa. However, fewer superpixels cannot offer the rich information of edges. Therefore, selecting a proper amount for oversegmentation is very important. For practical use, we will continue to research an adaptive segmentation algorithm according to the size of image or the frequency distribution of an image in the future. In our experiment, the spatial proximity weight and the number of superpixels are set to 10 and 300.
5 ConclusionIn this paper, we have investigated a new approach for solving the issue of interactive object segmentation in the image. The presented TA was similar to CA. However, the TA could operate superpixel directly. Based upon TA, a novel growcut strategy was motivated to handle superpixels via interactions with neighbors. Experiments illustrated that our approach achieved superior performance and exceeded other stateofthearts. It demonstrated by experiments that the contextbased multilayer TA could effectively enhance any given stateoftheart methods to obtain more accurate results.
In the future, we will continue to improve the performance of proposed approach by extracting more effective features and integrating more algorithms. Implementing a high performance version by the graphics processing unit to fully explore the parallel nature of the algorithm is also a promising direction.
1 
V. Kolmogorov and R. Zabin, "What energy functions can be minimized via graph cuts, "IEEE Trans. Patt. Anal. Mach. Intell., 2004, 26(2): 147159. DOI:10.1109/TPAMI.2004.1262177 
2 
M. A. G. Carvalho and A. L. Costa, "Combining hierarchical structures on graphs and normalized cut for image segmentation, " New Frontiers in Graph Theory, Y. G. Zhang, Ed. Rijeka, Yugoslavia: InTech Open Access Publisher, 2012.

3 
J. Carreira and C. Sminchisescu, "Constrained parametric mincuts for automatic object segmentation, " in Proc. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010, pp. 32413248.

4 
D. Kuettel and V. Ferrari, "Figureground segmentation by transferring window masks, " in Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 2012, pp. 558565.

5 
C. Rother, V. Kolmogorov and A. Blake, "'grabcut':Interactive foreground extraction using iterated graph cuts, "ACM Trans. Graph., 2004, 23(3): 309314. DOI:10.1145/1015706 
6 
X. Bai and G. Sapiro, "A geodesic framework for fast interactive image and video segmentation and matting, " University of Minnesota, Minnesota, USA, Tech. Rep. 2171, 2007.

7 
L. Yu and C. S. Li, "Low depth of field image automatic segmentation based on graph cut, "J. Autom., 2014, 10: 14711481. 
8 
O. Sener, K. Ugur, and A. A. Alatan, "Errortolerant interactive image segmentation using dynamic and iterated graphcuts, " in Proc. 2nd ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices, New York, NY, USA, 2012, pp. 916.

9 
L. Grady, "Random walks for image segmentation, "IEEE Trans. Patt. Anal. Mach. Intell., 2006, 28(11): 17681783. DOI:10.1109/TPAMI.2006.233 
10 
O. J. Arndt, B. Scheuermann, and B. Rosenhahn, "'Regioncut'interactive multilabel segmentation utilizing cellular automaton, " in Proc. 2013 IEEE Workshop on Applications of Computer Vision (WACV), Tampa, FL, USA, 2013, pp. 309316.

11 
V. Vezhnevets and V. Konouchine, ""GrowCut": Interactive multilabel ND image segmentation by cellular automata, " in Proc. Graphicon, Novosibirsk Akademgorodok, Russia, 2005, pp. 150156.

12 
J. Von Neumann and A. W. Burks, Theory of Selfreproducing Automata. Champaign, IL, USA: University of Illinois Press, 1966.

13 
A. Blake, C. C. E. Rother, and P. Anandan, "Foreground extraction using iterated graph cuts, " U. S. Patent 7 660 463, Feb. 9, 2010.

14 
R. Dondera, V. Morariu, Y. L. Wang, and L. Davis, "Interactive video segmentation using occlusion boundaries and temporally coherent superpixels, " in Proc. 2014 IEEE Winter Conference on Applications of Computer Vision (WACV), Steamboat Springs, CO, USA, 2014, pp. 784791.

15 
M. Ghafarianzadeh, M. B. Blaschko, and G. Sibley, "Unsupervised spatiotemporal segmentation with sparse spectral clustering, " in Proc. British Machine Vision Conference (BMVC), Nottingham, UK, 2014.

16 
I. Gallo, A. Zamberletti, and L. Noce, "Interactive object class segmentation for mobile devices, " in Proc. 27th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil, 2014, pp. 7379.

17 
Z. G. Li, X. M. Wu, and S. F. Chang, "Segmentation using superpixels: A bipartite graph partitioning approach, " in Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 2012, pp. 789796.

18 
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, "Slic superpixels compared to stateoftheart superpixel methods, "IEEE Trans. Patt. Anal. Mach. Intell., 2012, 34(11): 22742282. DOI:10.1109/TPAMI.2012.120 
19 
C. M. Li, C. Y. Xu, C. F. Gui and M. D. Fox, "Distance regularized level set evolution and its application to image segmentation, "IEEE Trans. Image Process., 2010, 19(12): 32433254. DOI:10.1109/TIP.2010.2069690 
20 
P. L. Rosin, "Image processing using 3state cellular automata, "Comp. Vision Image Understand., 2010, 114(7): 790802. DOI:10.1016/j.cviu.2010.02.005 
21 
Y. Qin, H. C. Lu, Y. Q. Xu, and H. Wang, "Saliency detection via cellular automata, " in Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 110119.

22 
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal visual object classes challenge 2009 (VOC2009), " in Summary Presentation at the 2009 PASCAL VOC Workshop, 2009.

23 
V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, "Geodesic star convexity for interactive image segmentation, " in Proc. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010, pp. 31293136.

24 
J. Santner, T. Pock, and H. Bischof, "Interactive multilabel segmentation, " in Asian Conference on Computer Vision, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Berlin, Heidelberg, Germany: Springer, 2010, pp. 397410.

25 
Y. Y. Boykov and M. P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in ND images, " in Proc. 8th IEEE International Conference on Computer Vision, Vancouver, BC, USA, vol. 1, pp. 105112, Jul. 2001.

26 
P. A. V. de Miranda, A. X. Falcáo and J. K. Udupa, "Synergistic arcweight estimation for interactive image segmentation using graphs, "Comp. Vision Image Understand., 2010, 114(1): 8599. DOI:10.1016/j.cviu.2009.08.001 