To make interaction with robots easier and more natural, people have placed new demands on human-robot interaction (HRI). Robots are expected to recognize human facial expressions, understand emotions, and respond appropriately. Emotionally intelligent robots have attracted great attention in recent years, and research on emotional robots spans many fields, such as computer science and psychology. However, current research is still at a preliminary stage, and only a few intelligent service systems incorporate emotion. The Mascot Robot System, which includes five eye robots, has been proposed; its eye robots achieve friendly interaction with humans through eye rolling and speech recognition. A face robot called KAPPA has been introduced that recognizes emotions from facial expressions and generates six basic emotions. The Minotaurus robot system builds a smart human-robot interaction environment in which the robot interacts with users through gestures, speech, and facial expressions. Although several HRI systems involve robot emotion, few studies address both emotion recognition and emotion expression by robots to facilitate smooth communication between humans and robots.
In this paper, a facial expression emotion recognition based human-robot interaction (FEER-HRI) system is proposed as a subsystem of the multi-modal emotional communication based human-robot interaction (MEC-HRI) system. The MEC-HRI system contains three NAO robots, two mobile robots, a Kinect, a workstation, a server, an eye tracker, a portable electroencephalograph (EEG), and other intelligent devices. The FEER-HRI system is designed with two primary goals: first, to enable the robot to recognize human emotions from facial expressions; second, to enable the robot to generate its own emotions, so that it communicates with humans emotionally rather than in the unemotional manner of traditional systems.
The system operates in three steps. First, the robot collects images of the user's face through the Kinect and transmits them to the workstation. Second, a facial expression recognition method based on the extreme learning machine (ELM) recognizes the user's emotion, and the system then generates a robot facial expression that adapts to the user. Third, the system transmits an affective control signal to the robot, which responds by displaying its own facial expression composed of basic cartoon symbols.
The remainder of this paper is organized as follows. Section Ⅱ presents the architecture of the MEC-HRI system and the scenario design. Section Ⅲ briefly introduces the facial expression feature extraction and recognition method. Section Ⅳ gives the experimental setup and results, and Section Ⅴ concludes the paper.
Ⅱ. ARCHITECTURE OF MEC-HRI SYSTEM
The MEC-HRI system realizes multi-modal emotional communication through speech, facial expressions, body gestures, etc. Emotional robots and emotional information acquisition equipment and sensors are connected to a workstation in which the facial expression emotion recognition algorithms are embedded. Both the hardware devices and the algorithms of the MEC-HRI system can be extended and improved.
A. Hierarchical Structure of MEC-HRI System
The MEC-HRI system is organized into four layers. From bottom to top, these are the hardware layer, the physical interface layer, the data processing layer, and the application layer, as shown in Fig. 1. The hardware layer captures human emotional signals and expresses robot emotions: its sensor module is responsible for data collection and preprocessing, while its actuator module is responsible for interacting with users. For instance, a high-resolution camera captures real-time images of facial expressions and body gestures, microphones collect speech signals, and the eye tracker and other motion tracking devices acquire motion information. The sensor module can be extended according to specific system requirements; for example, wearable equipment such as a smart glove, an intelligent heart-rate belt, and an EEG can detect physiological data of the human body for emotion recognition. The actuator module controls the robot's interaction with the user based on the emotional analysis and behavior instructions from the upper layers. Many interaction devices, such as NAO robots, mobile robots, facial expression interaction software, and mobile terminals, can be used to extend this module.
The physical interface layer provides the data transmission channel and serves as the bridge between software and hardware in the MEC-HRI system. Its network module is responsible for initializing the network and communicating with each module. The data processing layer is the key part of the system and provides the following functions.
1) Correlation analysis and feature extraction of speech, facial expressions, gestures, and physiological signals.
2) Multi-modal information fusion based on a two-layer fusion structure, i.e., feature level and decision level.
3) Recognition of the user's emotional states and other deep cognitive information during the interaction.
4) Derivation of the robots' operating instructions through a multi-robot behavioral adaptation mechanism.
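The two-layer fusion structure in item 2) can be illustrated with a minimal sketch. Feature-level fusion is shown here as plain concatenation of per-modality feature vectors, and decision-level fusion as a weighted average of per-modality class scores followed by an argmax. The function names, the modality names, the scores, and the weights are hypothetical; the actual fusion rules of the MEC-HRI system are not specified here.

```python
from typing import Dict, List, Sequence

# The seven emotion categories used in this paper.
EMOTIONS = ["happy", "angry", "surprise", "fear", "disgust", "sad", "neutral"]


def feature_level_fusion(features: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed (alphabetical) order."""
    fused: List[float] = []
    for modality in sorted(features):
        fused.extend(features[modality])
    return fused


def decision_level_fusion(scores: Dict[str, Sequence[float]],
                          weights: Dict[str, float]) -> str:
    """Weighted average of per-modality class scores, then pick the top emotion."""
    total_weight = sum(weights[m] for m in scores)
    fused = [0.0] * len(EMOTIONS)
    for modality, class_scores in scores.items():
        w = weights[modality] / total_weight  # normalise weights to sum to 1
        for i, s in enumerate(class_scores):
            fused[i] += w * s
    best = max(range(len(EMOTIONS)), key=lambda i: fused[i])
    return EMOTIONS[best]


# Hypothetical per-modality scores over the seven emotions.
face = [0.6, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]
speech = [0.2, 0.5, 0.1, 0.05, 0.05, 0.05, 0.05]
print(decision_level_fusion({"face": face, "speech": speech},
                            {"face": 0.7, "speech": 0.3}))  # happy
print(decision_level_fusion({"face": face, "speech": speech},
                            {"face": 0.3, "speech": 0.7}))  # angry
```

Shifting weight from the face channel to the speech channel flips the fused decision from happy to angry, which shows how decision-level fusion lets a more trusted modality override a less trusted one.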
The interactive application layer is the highest layer of the system and provides a variety of interaction modes, such as speech, facial expressions, gestures, and multi-modal interaction. In addition, the user can choose between two interaction targets: a physical robot or a virtual robot presented through a graphical interface.
B. Scenario Design
Four scenarios including guiding, entertainment, home service, and scene simulation are designed as shown in Fig. 2.
In the guiding region, a mobile robot guides users and interacts with them effectively. When users enter the robot's field of view, the robot welcomes them according to their historical data. For a new user, the robot shows a happy expression on its LED screen and then talks with and guides the user.
In the entertainment region, two NAO robots can play a finger-guessing game with users. The NAO robot's camera captures images and transmits them to the workstation, where the users' gestures are recognized and the game result is judged. The NAO robot then expresses emotions according to the game result through speech, facial expressions, and gestures. Users can also play the game with each other while NAO acts as a spectator, cheering for the winner and encouraging the loser.
In the home service region, there are three NAO robots, and emotional communication among multiple humans and multiple robots can be carried out. The NAO robots can provide services for the elderly, the disabled, and children. For example, when an elderly person is watching TV, the MEC-HRI system monitors his or her health condition through wearable sensors and talks with him or her. In addition, the Kinect can recognize children's gestures and the sign language of the disabled.
In the scene simulation region, scenes such as a coffee bar can be simulated. Users can drink coffee and talk casually with the robots, which recognize the users' emotions from multi-modal information. Meanwhile, the MEC-HRI system can change the background music to adjust the atmosphere.
Ⅲ. FACIAL EXPRESSION EMOTION RECOGNITION
To communicate with users, the FEER-HRI system needs to collect and analyze facial information. Facial expression images of the user are acquired by the Kinect mounted on the mobile robot and transmitted to the robot through a USB port; they are then sent over the WLAN to the workstation for image processing. Finally, the user's emotional state is obtained, and the system adapts to the user according to that emotion. Considering different cultural backgrounds and people's subjective understanding of emotions, facial expressions have been divided into six categories: happy, angry, surprise, fear, disgust, and sad. Furthermore, comparisons among different classifications of facial expressions indicate that these six categories are more universal than others. Therefore, in this paper facial expressions are divided into seven basic categories: happy, angry, surprise, fear, disgust, sad, and neutral. A facial expression recognition approach based on multi-feature extraction, which consists of three parts, is used to improve the accuracy of the classifier. This process is summarized in Fig. 3; the main steps are feature collection, feature extraction, and emotion recognition.
First, images are preprocessed by face detection and segmentation. The facial images are then divided into three regions of interest (ROIs), i.e., eyes, nose, and mouth, which contain most of the emotion-related facial features. Second, facial expression features are extracted using 2D-Gabor filters and the uniform LBP operator. The 2D-Gabor filter is robust against illumination changes and rotation of the face pose. Moreover, it has a low computational cost and good real-time performance, and it can extract local features at different scales and orientations. Fig. 4 shows the real parts of the 2D-Gabor filters at five scales and eight orientations. When a face image is filtered by these 2D-Gabor filters, the energy of other texture features is suppressed, and only the texture features corresponding to each filter's characteristic frequency pass through. The texture features are composed of the outputs of all the 2D-Gabor filters. Fig. 5 shows the amplitude spectrum of the segmented eye image after 2D-Gabor feature extraction.
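As an illustration of a bank of 2D-Gabor filters at five scales and eight orientations, the sketch below builds complex kernels in the form commonly used for face analysis, ψ(z) = (‖k‖²/σ²)·exp(−‖k‖²‖z‖²/(2σ²))·[exp(i k·z) − exp(−σ²/2)], with k = k_v(cos φ_u, sin φ_u), k_v = k_max/f^v and φ_u = πu/8. The parameter values (k_max = π/2, f = √2, σ = 2π, 15×15 kernels) and the function names are assumptions chosen for illustration, not necessarily the parameters behind Fig. 4.

```python
import cmath
import math
from typing import List

Kernel = List[List[complex]]


def gabor_kernel(scale: int, orientation: int, size: int = 15, n_orient: int = 8,
                 k_max: float = math.pi / 2, f: float = math.sqrt(2),
                 sigma: float = 2 * math.pi) -> Kernel:
    """Complex 2D-Gabor kernel for scale index v and orientation index u."""
    k = k_max / f ** scale                     # centre frequency of this scale
    phi = math.pi * orientation / n_orient     # orientation angle
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    dc = math.exp(-sigma ** 2 / 2)             # subtracting this removes the DC response
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k * k / sigma ** 2) * math.exp(-k * k * (x * x + y * y) / (2 * sigma ** 2))
            row.append(envelope * (cmath.exp(1j * (kx * x + ky * y)) - dc))
        kernel.append(row)
    return kernel


def gabor_bank(n_scales: int = 5, n_orient: int = 8, size: int = 15) -> List[List[Kernel]]:
    """bank[v][u] is the kernel at scale v and orientation u."""
    return [[gabor_kernel(v, u, size, n_orient) for u in range(n_orient)]
            for v in range(n_scales)]


bank = gabor_bank()
print(len(bank), len(bank[0]))           # 5 8
print(round(bank[0][0][7][7].real, 4))   # 0.0625 (centre of the finest kernel)
```

The real parts of these 40 kernels correspond to the kind of patterns plotted in Fig. 4; filtering an ROI with each kernel and keeping the magnitude of the complex response gives one texture map per scale and orientation.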
Fig. 4 The real part of the 2D-Gabor filters at five scales and eight directions with the following parameters:
LBP describes image texture and is widely used in image processing. The LBP operator compares each pixel with its neighboring pixels and stores the comparison results as a binary number. It is one of the best-performing texture descriptors. In addition, it is computationally efficient and robust against image offset and illumination changes. Since faces often move and face images are easily affected by light from various directions, the LBP operator is well suited to extracting features from facial images. Moreover, since a face can be regarded as a composition of local features, the LBP operator describes it well.
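The basic 3×3 LBP operator described above can be sketched in plain Python. Each of the eight neighbours is compared with the centre pixel and the results are packed into an 8-bit code; the clockwise neighbour order starting at the top-left pixel is an assumed convention (any fixed order works), and the function names are hypothetical. Because only the sign of each difference matters, adding a constant brightness offset leaves the codes unchanged, which is one reason LBP tolerates illumination changes.

```python
from typing import List

Image = List[List[int]]

# Neighbour offsets (row, col), clockwise from the top-left pixel; bit i <- OFFSETS[i].
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """8-bit LBP code of pixel (r, c): bit i is 1 if neighbour i >= centre."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(img: Image) -> Image:
    """LBP codes for all interior pixels (the one-pixel border is skipped)."""
    rows, cols = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


patch = [[9, 1, 1],
         [1, 5, 1],
         [1, 1, 1]]
print(lbp_image(patch))                                     # [[1]]
print(lbp_image([[p + 50 for p in row] for row in patch]))  # [[1]], offset-invariant
```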
However, the basic LBP operator produces too many kinds of binary patterns, so the LBP histogram is too sparse to describe the texture effectively. The excessive number of binary patterns also occupies more storage space and reduces computational efficiency. To solve this problem, the uniform LBP operator is used, which can reduce the number of patterns from
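A pattern is called uniform when its circular bit string contains at most two 0/1 transitions. The sketch below (plain Python, function names hypothetical) enumerates the uniform patterns, gives each its own histogram bin, and sends every non-uniform pattern to one shared bin. For an 8-neighbour operator this yields 58 uniform patterns, i.e., a 59-bin histogram instead of a 256-bin one; in general a P-neighbour operator has P(P−1)+2 uniform patterns.

```python
from typing import Dict, Iterable, List


def transitions(code: int, p: int = 8) -> int:
    """Number of 0/1 changes when the p-bit code is read as a circle."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % p)) & 1) for i in range(p))


def uniform_mapping(p: int = 8) -> Dict[int, int]:
    """Map each p-bit LBP code to a histogram bin; non-uniform codes share the last bin."""
    uniform = [c for c in range(2 ** p) if transitions(c, p) <= 2]
    mapping = {c: i for i, c in enumerate(uniform)}
    shared = len(uniform)  # single bin index for all non-uniform codes
    for c in range(2 ** p):
        mapping.setdefault(c, shared)
    return mapping


def uniform_histogram(codes: Iterable[int], p: int = 8) -> List[int]:
    """Uniform-LBP histogram of a sequence of LBP codes."""
    mapping = uniform_mapping(p)
    hist = [0] * (max(mapping.values()) + 1)
    for code in codes:
        hist[mapping[code]] += 1
    return hist


print(sum(transitions(c) <= 2 for c in range(256)))  # 58 uniform patterns
print(len(set(uniform_mapping().values())))          # 59 histogram bins
```

Collapsing the non-uniform patterns is what makes the histogram dense enough to be informative: most LBP codes observed in natural images are uniform, so little texture information is lost.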
2D-Gabor filtering cannot capture subtle changes of the texture feature in every direction and frequency, whereas the LBP operator extracts local texture features. Combining the two methods integrates their advantages: features are extracted at multiple scales and orientations while the local features of the face image are preserved. In addition, the combination reduces the data dimension and thus improves computational efficiency. The two methods also compensate for each other's deficiencies: 2D-Gabor wavelet filtering reduces the influence of noise on the LBP operator, and the uniform LBP operator enhances the local texture characteristics of the 2D-Gabor wavelet transform.
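One common way to combine the two descriptors, assumed here for illustration rather than taken from the paper, is to filter an ROI with each Gabor kernel, take the magnitude of the complex response, compute uniform-LBP codes on that magnitude map, and concatenate the per-filter 59-bin histograms. The self-contained sketch below follows that recipe in plain Python; the Gabor form commonly used for faces and its parameters (k_max = π/2, f = √2, σ = 2π) are likewise assumptions, and all names are hypothetical. It is written for clarity rather than speed, and a practical version would also split each map into blocks and normalise the histograms.

```python
import cmath
import math
from typing import List

Matrix = List[List[float]]

# Clockwise 8-neighbourhood used by the LBP operator.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
# Uniform 8-bit patterns (at most two circular 0/1 transitions): 58 of them.
UNIFORM = [c for c in range(256)
           if sum(((c >> i) & 1) != ((c >> ((i + 1) % 8)) & 1) for i in range(8)) <= 2]
BIN = {c: i for i, c in enumerate(UNIFORM)}  # non-uniform codes use bin len(UNIFORM)


def gabor_kernel(scale: int, orient: int, size: int, n_orient: int = 8,
                 k_max: float = math.pi / 2, f: float = math.sqrt(2),
                 sigma: float = 2 * math.pi) -> List[List[complex]]:
    """Complex Gabor kernel at the given scale and orientation indices."""
    k = k_max / f ** scale
    phi = math.pi * orient / n_orient
    half = size // 2
    return [[(k * k / sigma ** 2) * math.exp(-k * k * (x * x + y * y) / (2 * sigma ** 2))
             * (cmath.exp(1j * k * (x * math.cos(phi) + y * math.sin(phi)))
                - math.exp(-sigma ** 2 / 2))
             for x in range(-half, half + 1)] for y in range(-half, half + 1)]


def gabor_magnitude(img: Matrix, kernel: List[List[complex]]) -> Matrix:
    """Magnitude of the filter response (correlation, zero padding, same size as img)."""
    rows, cols, half = len(img), len(img[0]), len(kernel) // 2
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            acc = 0j
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += img[rr][cc] * kernel[dy + half][dx + half]
            row.append(abs(acc))
        out.append(row)
    return out


def uniform_lbp_histogram(img: Matrix) -> List[int]:
    """59-bin uniform-LBP histogram over the interior pixels of img."""
    hist = [0] * (len(UNIFORM) + 1)
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = sum(1 << b for b, (dr, dc) in enumerate(OFFSETS)
                       if img[r + dr][c + dc] >= img[r][c])
            hist[BIN.get(code, len(UNIFORM))] += 1
    return hist


def gabor_lbp_features(img: Matrix, n_scales: int = 5, n_orient: int = 8,
                       size: int = 15) -> List[int]:
    """Concatenate the uniform-LBP histograms of every Gabor magnitude map."""
    features: List[int] = []
    for v in range(n_scales):
        for u in range(n_orient):
            magnitude = gabor_magnitude(img, gabor_kernel(v, u, size, n_orient))
            features.extend(uniform_lbp_histogram(magnitude))
    return features
```

With the default five scales and eight orientations this produces 40 × 59 = 2360 values per ROI; for a 10×10 patch, `gabor_lbp_features(patch, n_scales=1, n_orient=2, size=5)` returns 118 values whose two 59-bin blocks each sum to the 64 interior pixels. A high-dimensional vector of this kind is what the subsequent PCA step reduces.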
Figs. 7 and 8 show the overall process of facial emotion recognition. Facial features are extracted by the method combining 2D-Gabor and LBP, and principal component analysis (PCA) is then used to remove redundant features, which increases computational efficiency. The processed facial features are divided into two parts, one for training and the other for testing. Since emotions are divided into seven categories, the multi-class ELM classifier is used for emotion recognition. Our previous work verified that, compared with other multi-class classification methods, ELM has distinctive advantages for facial expression recognition. ELM is fast: modeling and facial expression recognition usually take less than 0.1 s, while the recognition rate is usually above 80%. Therefore, ELM is adopted in the FEER-HRI system because it meets the real-time requirement of facial expression recognition.
Ⅳ. EXPERIMENTS ON FEER-HRI SYSTEM
The proposed facial expression recognition algorithm is successfully applied to the FEER-HRI system, which recognizes users' emotions in a timely and accurate manner.
A. Experimental Setup
The MEC-HRI system consists of three NAO robots, two mobile robots, a Kinect, an eye tracker, two high-performance computers (a server and a workstation), a portable EEG, wearable sensing devices, and data transmission and networking devices. The topology of the MEC-HRI system is shown in Fig. 9.
The two high-performance computers in the system are HP Z840 workstations equipped with two NVIDIA Tesla K40 accelerator cards. They can achieve 9 Tflops of double-precision floating-point performance and reach the best configuration of
Once the MEC-HRI system is built, both the NAO robots and the mobile robots are connected to the wireless router via Wi-Fi. The eye tracker, Kinect, and wearable sensing devices are connected via USB and Wi-Fi to the mobile workstation, which is responsible for capturing emotional information and controlling devices. The mobile workstation and the wireless router are connected to the server and the workstation through a hub. The NAO robots capture video and audio data for recognizing human emotions and, in turn, express their own emotions through speech, body gestures, and movement according to those human emotions.
Fig. 10 shows the structure of the mobile robot, which is mainly composed of an industrial personal computer (IPC), a Kinect, a touch screen, a
The LED screen displays the facial expressions of the mobile robot. Whereas humans have seven basic expressions, nine facial expressions, i.e., happy, angry, disgust, fear, neutral, sad, surprise, doubtful, and pitiful, are designed for the mobile robot. These expressions are intended to reflect the robot's emotional state during human-robot interaction. In the FEER-HRI system, the robot's facial expressions are represented by simple cartoon symbols that convey the expression vividly and are easily understood by humans.
Fig. 11 shows the nine facial expressions displayed on the LED screen. Each pattern in Fig. 11 corresponds to one facial expression; for example, a pattern with two opposing triangles represents anger, a pattern with two heart shapes represents happiness, a pattern with two question marks represents doubt, a pattern with two symmetrical check marks represents sadness, and a pattern with two inverted U shapes represents fear. The two extra expressions, doubtful and pitiful, are added to the seven basic human expressions according to the characteristics of the robot in human-robot interaction. When the robot cannot recognize the user's emotion, it displays the doubtful expression to adapt to the user; when the user is angry with the robot, it displays the pitiful expression to gain the user's sympathy.
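The two adaptation rules stated above (doubtful when recognition fails, pitiful when the user is angry) can be written as a small decision function. In this hypothetical sketch, "cannot recognize" is modelled as the top class score falling below an assumed confidence threshold of 0.5, and every other recognized emotion is simply mirrored; the threshold, the mirroring default, and the function name are assumptions, not the system's documented policy.

```python
from typing import Dict

# The nine robot expressions shown on the LED screen (Fig. 11).
ROBOT_EXPRESSIONS = {"happy", "angry", "disgust", "fear", "neutral",
                     "sad", "surprise", "doubtful", "pitiful"}


def choose_expression(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Pick the robot's LED expression from the recognised user-emotion scores.

    threshold is a hypothetical confidence cut-off, not a value from the paper.
    """
    emotion, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence < threshold:
        return "doubtful"   # the user's emotion could not be recognised reliably
    if emotion == "angry":
        return "pitiful"    # try to gain the user's sympathy
    return emotion          # assumed default: mirror the user's emotion


print(choose_expression({"happy": 0.8, "sad": 0.2}))               # happy
print(choose_expression({"angry": 0.7, "neutral": 0.3}))           # pitiful
print(choose_expression({"happy": 0.3, "sad": 0.3, "fear": 0.4}))  # doubtful
```

Keeping the policy in one small function makes it easy to swap in a different mapping without touching the recognition pipeline.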
When the system is running, the Kinect and NAO robots capture users' facial images. First, the image data are transmitted to the server, where they are segmented into three ROIs, i.e., eyes, nose, and mouth, which contain most of the facial emotion information. Next, feature extraction and expression recognition, i.e., the combination of 2D-Gabor and LBP, PCA, and the ELM classifier, are applied to obtain the final emotional state. The system then makes an appropriate affective decision according to the user's emotion. Finally, the server sends control instructions to the robots and sensors so that they can give emotional feedback that adapts to the user; for example, the mobile robot can express emotion through speech, its LED display, and its movement.
B. Classification of Facial Expression
The standard facial emotion corpus used in this experiment is JAFFE. Fig. 12 shows some images from this corpus, which covers seven emotions: happy, angry, sad, surprise, neutral, disgust, and fear.
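In this experiment the extracted features are reduced by PCA and then classified by ELM. That classification stage can be sketched with NumPy as below. The ELM follows the standard formulation of Huang et al.: the input weights and biases of a single hidden layer are drawn at random and kept fixed, and only the output weights are solved in closed form as β = H⁺T, where H⁺ is the Moore-Penrose pseudo-inverse of the hidden-layer output matrix and T holds one-hot targets. The sigmoid activation, the uniform [−1, 1] weight range, the hidden-layer sizes, and the global rescaling step are assumptions; the 800 → 96 reduction mirrors the dimensions used in this experiment, but the data in the usage example are synthetic, not JAFFE.

```python
import numpy as np


def pca_fit(X: np.ndarray, n_components: int):
    """Return the training mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    return (X - mean) @ components.T


class ELM:
    """Single-hidden-layer extreme learning machine classifier."""

    def __init__(self, n_hidden: int = 200, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Numerically stable sigmoid of the random affine map.
        return 0.5 * (1.0 + np.tanh(0.5 * (X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray, n_classes: int) -> "ELM":
        # Random input weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        T = np.eye(n_classes)[y]                          # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T   # beta = H^+ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Synthetic stand-in for 800-dimensional facial features of 7 emotion classes.
rng = np.random.default_rng(42)
centers = rng.normal(0.0, 1.0, size=(7, 800))
y_train, y_test = np.repeat(np.arange(7), 20), np.repeat(np.arange(7), 10)
X_train = centers[y_train] + rng.normal(0.0, 0.3, size=(y_train.size, 800))
X_test = centers[y_test] + rng.normal(0.0, 0.3, size=(y_test.size, 800))

mean, components = pca_fit(X_train, n_components=96)   # 800 -> 96, fitted on training data only
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)
scale = Z_train.std()  # one global factor keeps the sigmoid out of saturation
elm = ELM(n_hidden=40).fit(Z_train / scale, y_train, n_classes=7)
accuracy = float(np.mean(elm.predict(Z_test / scale) == y_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Note that the PCA mean and components are estimated on the training split only and then reused for the test split, so no information from the test images leaks into the model. Because ELM has no iterative training, fitting reduces to one pseudo-inverse, which is why its modeling time is short.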
The method combining 2D-Gabor and uniform LBP is used to extract facial emotion features. As shown in Fig. 13, 800 facial features are extracted from every face image. Such a high dimension would consume considerable computing resources, so PCA is adopted to reduce the feature dimension from 800 to 96. These representative features are input into an ELM classifier to obtain the final emotion recognition results.
C. Application on the Mobile Robot
To enable the mobile robot to recognize human emotions, the feature extraction methods and classification algorithm of Section Ⅲ are applied to it. The Kinect is programmed in C++ to capture real-time facial images, while the feature extraction and classification algorithms are implemented in MATLAB. To combine the two, the MATLAB program is compiled into a DLL file that can be called from C++. The robots are connected to the workstation via WLAN, so the mobile robot can be operated from another computer by connecting to its IP address and port. Fig. 14 shows the operation interface of the MEC-HRI system, which can connect to and control all devices in the system and displays interaction information between humans and robots. For example, the mobile robot module in Fig. 15 displays real-time images captured by the robot's Kinect and the facial expression recognition result.
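Operating the robot by connecting to its IP address and port amounts to opening a network connection to a control service on the robot. The sketch below uses Python's standard `socket` and `socketserver` modules with a hypothetical line-based text protocol (one command per line, one reply line back); the command format (e.g., `EXPRESSION happy`), the acknowledgement reply, and the stub server are invented for illustration and are not the system's actual protocol.

```python
import socket
import socketserver
import threading


def send_command(host: str, port: int, command: str, timeout: float = 2.0) -> str:
    """Send one newline-terminated command and return the robot's one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((command + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


class StubRobotHandler(socketserver.StreamRequestHandler):
    """Stand-in for the robot's control service: acknowledges each command line."""

    def handle(self) -> None:
        command = self.rfile.readline().decode("utf-8").strip()
        self.wfile.write(f"ACK {command}\n".encode("utf-8"))


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for the local stub robot.
    with socketserver.TCPServer(("127.0.0.1", 0), StubRobotHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        print(send_command("127.0.0.1", port, "EXPRESSION happy"))  # ACK EXPRESSION happy
        server.shutdown()
```

Against a real robot, the host and port would be the robot's own address on the WLAN; the stub server only makes the example runnable on a single machine.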
In Fig. 15(a), the user smiles at the mobile robot; the robot recognizes the emotion as happy and displays a happy expression. In Fig. 15(b), the user looks angry; the robot recognizes the emotion as angry and displays a sad expression. The whole human-robot interaction process is smooth, and the time from emotion recognition to emotion expression is within 2 seconds.
Ⅴ. CONCLUSION
A facial expression emotion recognition based human-robot interaction (FEER-HRI) system with a four-layer framework is proposed. In addition, a facial emotion recognition method based on 2D-Gabor and uniform LBP feature extraction and a multi-class ELM classifier is presented and applied to the FEER-HRI system. The robot in the FEER-HRI system can recognize facial emotions and make simple responses to adapt to users. Experiments on the system show that users can perform simple emotional interaction with robots, which express their emotions through facial expressions and body gestures according to the users' emotional states.
The FEER-HRI system is expected to play a major role in customer service, safe driving, home service, health care, and so on. For example, when customers browse the Web, the FEER-HRI system could evaluate customer satisfaction through facial expressions and eye movements. It could also support safe driving by monitoring drivers' emotional states from eye movements, facial expressions, and physiological parameters. In addition, the system could provide health care and emotional communication for the elderly. In future research, we will develop a multi-modal human-robot interaction system based on facial expressions, speech, body posture, and physiological signals, and explore its applications in daily life.
Z. K. Wang, K. Mülling, M. P. Deisenroth, H. B. Amor, D. Vogt, B. Schölkopf, and J. Peters, "Probabilistic movement modeling for intention inference in human-robot interaction," Int. J. Robot. Res., vol. 32, no. 7, pp. 841-858, Apr. 2013. http://dl.acm.org/citation.cfm?id=2500334.2500341
K. Qian, J. Niu, and H. Yang, "Developing a gesture based remote human-robot interaction system using kinect," Int. J. Smart Home, vol. 7, no. 4, pp. 203-208, Jul. 2013. http://connection.ebscohost.com/c/articles/90307845/developing-gesture-based-remote-human-robot-interaction-system-using-kinect
M. Awais and D. Henrich, "Human-robot interaction in an unknown human intention scenario," in Proc. 11th Int. Conf. Frontiers of Information Technology, Washington, DC, USA, 2013, pp. 89-94. http://ieeexplore.ieee.org/document/6717232/
S. Iengo, A. Origlia, M. Staffa, and A. Finzi, "Attentional and emotional regulation in human-robot interaction," in Proc. 21st IEEE Int. Symp. Robot and Human Interactive Communication, Paris, France, 2012, pp. 1135-1140. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6343901
J. W. Ryu, C. Park, J. Kim, S. Kang, J. Oh, J. Sohn, and H. K. Cho, "KOBIE: A pet-type emotion robot," J. Korea Robot. Soc., vol. 3, no. 2, pp. 154-163, Jun. 2008. http://www.koreascience.kr/article/ArticleFullRecord.jsp?cn=KROBC7_2008_v3n2_154
L. Zhang, M. Jiang, D. Farid, and M. A. Hossain, "Intelligent facial emotion recognition and semantic-based topic detection for a humanoid robot," Exp. Syst. Appl., vol. 40, no. 13, pp. 5160-5168, Oct. 2013. http://www.sciencedirect.com/science/article/pii/S0957417413001668
H. J. Zhang, Z. Y. Zhao, Z. Y. Meng, and Z. L. Lin, "Experimental verification of a multi-robot distributed control algorithm with containment and group dispersion behaviors: The case of dynamic leaders," IEEE/CAA J. Automat. Sinica, vol. 1, no. 1, pp. 54-60, Jan. 2014. http://ieeexplore.ieee.org/document/6391021/
D. S. Kwon, Y. K. Kwak, J. C. Park, M. J. Chung, E. S. Jee, K. S. Park, H. R. Kim, Y. M. Kim, J. C. Park, E. H. Kim, K. H. Hyun, H. J. Min, H. S. Lee, J. W. Park, S. H. Jo, S. Y. Park, and K. W. Lee, "Emotion interaction system for a service robot," in Proc. 16th IEEE Int. Symp. Robot and Human Interactive Communication, Jeju, Korea, 2007, pp. 351-356. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4415108
J. J. Gross and L. F. Barrett, "The emerging field of affective science," Emotion, vol. 13, no. 6, pp. 997-998, Dec. 2013. http://www.ncbi.nlm.nih.gov/pubmed/24320711
K. Hirota and F. Y. Dong, "Development of mascot robot system in NEDO project," in Proc. 4th Int. IEEE Conf. Intelligent Systems, Varna, Bulgaria, 2008, pp. 1-38-1-44. http://ieeexplore.ieee.org/document/4670396/
Y. Yamazaki, F. Y. Dong, Y. Masuda, Y. Uehara, P. Kormushev, H. A. Vu, P. Q. Le, and K. Hirota, "Intent expression using eye robot for mascot robot system," in Proc. 8th Int. Symp. Advanced Intelligent Systems, Madrid, Spain, 2007, pp. 576-580. http://www.oalib.com/paper/4078089
Y. Yamazaki, H. A. Vu, Q. P. Le, K. Fukuda, Y. Matsuura, M. S. Hannachi, F. Y. Dong, Y. Takama, and K. Hirota, "Mascot robot System by integrating eye robot and speech recognition using RT middleware and its casual information recommendation," in Proc. 3rd Int. Symp. Computational Intelligence and Industrial Applications, Bali, Indonesia, 2008, pp. 375-384. http://t2r2.star.titech.ac.jp/cgi-bin/publicationinfo.cgi?q_publication_content_number=CTT100601375
T. Fukuda, D. Tachibana, F. Arai, J. Taguri, M. Nakashima, and Y. Hasegawa, "Human-robot mutual communication system," in Proc. 10th IEEE Int. Workshop on Robot and Human Interactive Communication, Bordeaux, Paris, France, 2001, pp. 14-19. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=981870
J. Röning, J. Holappa, V. Kellokumpu, A. Tikanmäki, and M. Pietikäinen, "Minotaurus: A system for affective human-robot interaction in smart environments," Cogn. Comput., vol. 6, no. 4, pp. 940-953, Dec. 2014. http://link.springer.com/article/10.1007/s12559-014-9285-9
Z. T. Liu, F. F. Pan, M. Wu, W. H. Cao, L. F. Chen, J. P. Xu, R. Zhang, and M. T. Zhou, "A multimodal emotional communication based humans-robots interaction system," in Proc. 35th Chinese Control Conf., Chengdu, China, 2016, pp. 6363-6368. http://ieeexplore.ieee.org/document/7554357/
P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," J. Personal. Soc. Psychol., vol. 17, no. 2, pp. 124-129, Feb. 1971. http://europepmc.org/abstract/med/5542557
M. N. Dailey, C. Joyce, M. J. Lyons, M. Kamachi, H. Ishi, J. Gyoba, and C. W. Cottrell, "Evidence and a computational explanation of cultural differences in facial expression recognition," Emotion, vol. 10, no. 6, pp. 874-893, Dec. 2010. http://www.ncbi.nlm.nih.gov/pubmed/21171759
G. Sandbach, S. Zafeiriou, M. Pantic, and L. J. Yin, "Static and dynamic 3D facial expression recognition: A comprehensive survey," Image Vis. Comput., vol. 30, no. 10, pp. 683-697, Oct. 2012. http://dl.acm.org/citation.cfm?id=2379846.2380025
Z. C. Lian, M. J. Er, and Y. Cong, "Local line derivative pattern for face recognition," in Proc. 19th IEEE Int. Conf. Image Processing, Orlando, FL, USA, 2012, pp. 1449-1452. http://dx.doi.org/10.1109/ICIP.2012.6467143
G. Huo, Y. N. Liu, and X. D. Zhu, "2D-gabor filter design and parameter selection based on iris recognition," J. Inform. Comput. Sci., vol. 11, no. 6, pp. 1995-2002, Apr. 2014. http://www.joics.com/showabstract.aspx?id=3060
Z. H. Guo, L. Zhang, and D. Zhang, "Rotation invariant texture classification using LBP variance (LBPV) with global matching," Pattern Recognition, vol. 43, no. 3, pp. 706-719, Mar. 2010. http://dl.acm.org/citation.cfm?id=1660645
K. Mo and L. Xu, "Defect detection of solar cell based on threshold uniform LBP and BP neural network," Acta Energ. Solar. Sinica, vol. 35, no. 12, pp. 2448-2454, Dec. 2014.
Z. T. Liu, G. T. Sui, D. Y. Li, and G. Z. Tan, "A novel facial expression recognition method based on extreme learning machine," in Proc. 34th Chinese Control Conf., Hangzhou, China, 2015, pp. 3852-3857. http://ieeexplore.ieee.org/document/7260233/
W. B. Mu, S. L. Zhang, X. D. Wang, and J. Y. Li, "Study on the iris recognition technology based on 2D-gabor filter," Appl. Mech. Mater., vol. 539, pp. 151-155, Jul. 2014. http://www.scientific.net/AMM.539.151
G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489-501, Dec. 2006. http://www.sciencedirect.com/science/article/pii/S0925231206000385