Affective State Assessment


Introduction

Research efforts in affective state assessment have expanded over the past several decades across an extensive spectrum of psychology, physiology, linguistics, and computer science. The number of systems and applications that attempt affective state assessment, or are related to this problem, is too large to allow an exhaustive review. I therefore first define the objective of this literature review and limit its scope. It cannot be fully predicted at this time what else might contribute to future research on modeling and recognizing affective state in user modeling and assistance applications; many other works, especially in physiology and psychology, may prove relevant and will be studied and summarized later.

The objective of this review is to summarize past and current approaches to recognizing the internal affective state of a subject, more precisely a person, from externally observable information about the subject and/or the surroundings that is relevant to the studied affective state. Our aim is thus limited to the computational aspects of related systems and applications. Most work in this area treats the task as a pattern recognition or classification problem and rarely uses analytical models to simulate the human affective appraisal process. In this manuscript, the measures used as input to affective state assessment are first categorized. Then the methods and algorithms used to process these measures are discussed. Finally, our approach to modeling the relations between the affective state and the observed variables is compared with these approaches, highlighting its advantages and contributions.

Measures

Here, measures mean the information expressed by a subject under study that is directly or indirectly related to the subject’s internal state, more specifically the emotional, affective, or mental status (affective state hereafter). Such information can be acquired and recorded as discrete or continuous-valued variables in a data set that may evolve over time. Human beings have abundant emotions such as sadness, happiness, guilt, pride, shame, anxiety, fear, anger, and so on, and categorizations differ among researchers. Another way of describing emotional status is to use several independent dimensions, such as valence and arousal. Valence describes the “quality” of the emotion, in terms of negative, neutral, and positive; arousal describes the “energy degree,” which may be “activated” or “not activated.” This method provides a very general representation, as illustrated below.
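
As a minimal illustration of this dimensional representation, the sketch below places a few emotion labels at (valence, arousal) coordinates. The coordinates are rough conventions chosen for illustration, not measured values.

```python
# Illustrative dimensional representation of emotions: each label is placed
# at a (valence, arousal) coordinate in [-1, 1] x [-1, 1].
# The coordinates below are invented conventions, not measured values.
EMOTION_SPACE = {
    "happiness": (+0.8, +0.5),   # positive valence, activated
    "anger":     (-0.6, +0.8),   # negative valence, activated
    "sadness":   (-0.7, -0.4),   # negative valence, not activated
    "calm":      (+0.4, -0.6),   # positive valence, not activated
}

print(EMOTION_SPACE["anger"])    # -> (-0.6, 0.8)
```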

A huge variety of information falls into the group of measures that a computing machine could use to assess the subject’s affective state. To appreciate the complexity of such measures, imagine a subject in the context of an active social setting. Our review in this manuscript does not attempt a complete and detailed investigation and categorization of these measures, because of current limitations in their acquisition and in the level of research on such measures in specific domains.

Measures used in affective state assessment can be categorized into different groups according to different criteria, such as verbal versus nonverbal, or intrusive versus non-intrusive. The following collection of measures is organized by the relation of the modalities to the subject and by the required instruments.

Self-report

The subject can report his/her emotional state at a specific point in time. Generally we consider this approach inapplicable in a practical user modeling and assistance system. However, self-report may be of interest in rare cases, such as querying the subject directly about the internal state when the recognition accuracy of the subject’s internal state is a major concern.

Physiological measures

·                     EMG (Electromyography) - EMG instruments measure the electrical signal generated by muscles as they contract. This voltage is picked up by small pads placed on the skin.

·                     SC (Skin Conductance) or GSR (Galvanic Skin Response) - These instruments measure the electrical resistance of the skin using two sensors, placed on two fingers of one hand. This gives a measure of overall stress.

·                     GSA (General Somatic Activity) – This measures the minute movements of the human body.

·                     Temperature - This uses a small sensor that is held or taped to a fingertip. It gives another measure of overall body stress; the higher the temperature, the more relaxed the person.

·                     EEG (Electroencephalography) - This uses sensors placed on the head to pick up the small voltages produced by the brain.

·                     Other biofeedback signals are also used: heart rate, respiration, perspiration, and so on.

Physical appearance measures

Visual – Eye movement, facial expression, head gesture, body gesture. A key coding system for facial expression is the Facial Action Coding System (FACS). FACS also inspired the derivation of the facial animation and definition parameters in the framework of the ISO MPEG-4 standard. The Facial Animation Parameter Units (FAPUs) and the Facial Definition Parameter (FDP) set defined in MPEG-4 can be used to define facial shape and texture, as well as to animate faces reproducing expressions, emotions, and speech pronunciation.

Acoustic – Useful acoustic information about a subject’s emotional or mental state includes physical expression in voice intonation and pitch, and semantic expression in speech content.

Behavioral – These measures are domain specific, such as mouse movement when operating a computer, or steering features when driving a car.

Social and problem-solving strategy

The use of social and problem-solving strategies in emotion recognition is difficult to study, and most research efforts lie in other fields such as management and psychology. Human beings’ high-level assessment of their surroundings and their choice of strategies often reflect changes in the internal state. A complete set of measures for affective state assessment may need to take such information into account, but this is clearly beyond the current state of the art.

Algorithms

Viewing emotion or affective state assessment as a classification or pattern recognition problem, a set of approaches and algorithms from statistics, machine learning, and pattern recognition are potential candidates for the task. Our description focuses on the processing stage that maps the input features derived from the above measures to an emotional category.

Rule-based knowledge systems are a basic method for describing the relations between inputs and outputs and for predicting the consequence in classification. In fact, many models built with other approaches can be transformed into rule-based knowledge and used in a rule-based system for classification, and many emotion synthesis systems apply rule-based models for emotion expression. However, because of the constraints the IF-THEN rule structure places on representation and modeling capability, researchers tend to avoid rule-based systems in feature extraction and pattern recognition. One study of interest is in facial expression recognition (Pantic et al., 2002). In this work, a rule-based system containing 20 rules recognizes the AUs defined in FACS from extracted mid-level feature parameters describing the state, motion, and shape of feature points on the face profile contour. A minimal sketch of this style of classifier is given below.
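
The following sketch illustrates the IF-THEN style of such a classifier. The mid-level feature names, thresholds, and rules are hypothetical stand-ins for illustration, not the actual 20 rules of Pantic et al. (2002).

```python
# Minimal sketch of a rule-based AU classifier in the spirit of Pantic et al.
# (2002). Feature names, thresholds, and rules below are hypothetical.

def recognize_aus(features):
    """Map mid-level facial feature parameters to FACS action units (AUs)."""
    aus = set()
    # IF the inner brow is raised beyond a threshold THEN AU1 is active.
    if features.get("inner_brow_raise", 0.0) > 0.5:
        aus.add("AU1")
    # IF the brows are drawn together THEN AU4 is active.
    if features.get("brow_distance_change", 0.0) < -0.3:
        aus.add("AU4")
    # IF the lip corners are pulled up THEN AU12 is active.
    if features.get("lip_corner_raise", 0.0) > 0.4:
        aus.add("AU12")
    return aus

print(recognize_aus({"inner_brow_raise": 0.7, "lip_corner_raise": 0.5}))
```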

The fuzzy set method defines the degree of membership of an element in a class. Input membership functions and fuzzy rules convert inputs into rule strengths, which are combined with output membership functions to obtain an output distribution; if necessary, a crisp class can be produced through defuzzification. Hudlicka and McNeese (2002) use a fuzzy rule knowledge base to assess anxiety from a pilot’s static and dynamic data, including task context, external events, personality, individual history, training, and physiological data. In this work, fuzzy rules are matched to produce numerical anxiety weight factors (AWFs) for different modalities; these AWFs are used to compute an overall anxiety level, and the resulting value is mapped onto a three-valued qualitative variable. In this very specific usage, no uncertainty is handled in the rule-based affect assessment. For a complex system, building a complete and accurate rule base is an overwhelming task. Fuzzy systems have also been used for emotion recognition from facial expression (Tsapatsoulis et al., 2000) and from combined face and voice data (Massaro, 2000). A small sketch of fuzzy inference in this style follows.
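
The sketch below shows the basic fuzzy-inference pipeline (input memberships, rule strengths, defuzzification, qualitative mapping) on two invented, normalized inputs. The membership functions and rules are illustrative, not those of the ABAIS system.

```python
# Minimal fuzzy inference sketch, loosely patterned after the anxiety
# assessment in Hudlicka and McNeese (2002); memberships and rules invented.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def assess_anxiety(heart_rate, task_load):
    # Input memberships (inputs normalized to [0, 1]).
    hr_high = tri(heart_rate, 0.4, 1.0, 1.6)
    load_high = tri(task_load, 0.4, 1.0, 1.6)
    # Rule strengths: IF hr high AND load high THEN anxiety high, etc.
    high = min(hr_high, load_high)
    medium = max(min(hr_high, 1 - load_high), min(1 - hr_high, load_high))
    low = min(1 - hr_high, 1 - load_high)
    # Weighted-average (centroid-style) defuzzification on levels 0, 0.5, 1.
    total = low + medium + high
    score = (0.5 * medium + 1.0 * high) / total if total else 0.0
    # Map the overall level onto a three-valued qualitative variable.
    return "low" if score < 0.33 else "medium" if score < 0.67 else "high"

print(assess_anxiety(heart_rate=0.9, task_load=0.8))   # -> "high"
```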

Case/instance-based learning is a very straightforward way to design a classifier. Scherer (1993) designed an emotion analyst expert system, GENESE, based on the component process appraisal model. The knowledge base consists of 14 vectors for 14 emotions, with quantified predictions for typical stimulus checks. In classification, subjects are asked 15 questions to determine the values for these checks; a Euclidean distance then measures how far the new case is from each emotion. The smaller the distance to an emotion, the more likely the current case belongs to it. Training adjusts the vector of the actual emotion toward the vector of the current case when the prediction is incorrect. Instance/case-based learning has strength in its natural way of designing a classification algorithm, but good performance depends on an accurate choice of representative instances. Similarly, Petrushin (2000) used k-nearest neighbors to predict emotions from speech features. A nearest-prototype sketch of this scheme is shown below.
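
The nearest-prototype sketch below follows the GENESE scheme in spirit; the emotion prototypes, appraisal checks, and learning rate are invented for illustration, not Scherer’s actual 14 vectors.

```python
# Nearest-prototype sketch in the spirit of Scherer's GENESE: each emotion is
# a vector of quantified appraisal-check predictions; a new case is assigned
# to the closest vector. All vectors and the learning rate are illustrative.
import math

prototypes = {
    "joy":   [0.9, 0.8, 0.1],   # e.g. conduciveness, pleasantness, urgency
    "fear":  [0.1, 0.2, 0.9],
    "anger": [0.2, 0.1, 0.8],
}

def classify(case):
    return min(prototypes, key=lambda e: math.dist(prototypes[e], case))

def train(case, true_emotion, rate=0.2):
    """If the prediction is wrong, move the true emotion's vector toward the case."""
    if classify(case) != true_emotion:
        proto = prototypes[true_emotion]
        prototypes[true_emotion] = [p + rate * (c - p)
                                    for p, c in zip(proto, case)]

case = [0.8, 0.7, 0.2]
print(classify(case))   # -> "joy"
train(case, "joy")      # prediction was correct, so no adjustment occurs
```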

Linear or nonlinear regression can be used for classification. In (Moriyama et al., 1999; Moriyama and Ozawa, 1999), the authors describe an emotion recognition and synthesis system for human speech. Users speak the same sentences in neutral and emotional states. In training, measured physical parameters of each utterance’s pitch contour and power envelope are transformed into principal components and then used to estimate coefficients with respect to the emotions, through multiple linear regression. In recognition, the input is speech of unknown emotion together with neutral speech of the same words by the same person, and the output is the emotional information. Regression can serve as a general classification solution. Like other augmented algorithms in the regression family, it suffers from distributional assumptions, the computational cost for complex data, and the challenge of missing and incomplete data. A small sketch of regression-based classification follows.
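
The sketch below shows the general pattern of classification by multiple linear regression, regressing one-hot emotion indicators on feature components. The data are synthetic and the dimensions arbitrary; it does not reproduce the Moriyama pipeline.

```python
# Sketch of classification via multiple linear regression: regress one-hot
# emotion indicators on feature components (standing in for the acoustic
# principal components of Moriyama et al.). Data and dimensions are invented.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                # 60 cases, 4 feature components
y = rng.integers(0, 3, size=60)             # 3 emotion classes
Y = np.eye(3)[y]                            # one-hot emotion indicators

Xb = np.hstack([X, np.ones((60, 1))])       # add an intercept column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # least-squares coefficients

def predict(x):
    scores = np.append(x, 1.0) @ W          # predicted indicator values
    return int(np.argmax(scores))           # most strongly indicated emotion

print(predict(X[0]))
```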

Discriminant analysis is a statistical classification method in multivariate analysis based on comparing Mahalanobis distances to different class representatives, where the distance is measured from a data point to the mean of each class. In (Ark et al., 1999), an emotion mouse is designed to classify the user’s emotional state among happiness, surprise, anger, fear, sadness, and disgust, based on four physiological measures: GSR, skin temperature, heart rate, and GSA. The task is fulfilled by training a set of discriminant functions with the physiological measures as predictor variables and the emotional states as classes. These functions are used to calculate the Mahalanobis distance to each emotion class and thereby determine the membership of the current case. Again, the algorithm is limited by its assumption of a normal distribution. The reported correct prediction rate is only about two thirds, even when testing on the training data and without the baseline normal cases. A minimal sketch of this classifier is given below.
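
A minimal Mahalanobis-distance classifier in this spirit, using a single pooled covariance as a simplification and synthetic data in place of the actual physiological recordings:

```python
# Mahalanobis-distance discriminant sketch in the spirit of the emotion mouse
# (Ark et al., 1999): assign a case to the emotion class with the nearest
# mean under a pooled covariance. Data are synthetic, classes reduced to 3.
import numpy as np

rng = np.random.default_rng(1)
classes = {e: rng.normal(loc=m, size=(30, 4))    # 4 physiological measures
           for e, m in [("happiness", 0.0), ("anger", 1.5), ("sadness", -1.5)]}

pooled_cov = np.cov(np.vstack(list(classes.values())).T)
inv_cov = np.linalg.inv(pooled_cov)
means = {e: x.mean(axis=0) for e, x in classes.items()}

def mahalanobis_sq(x, mu):
    d = x - mu
    return float(d @ inv_cov @ d)            # squared Mahalanobis distance

def classify(x):
    return min(means, key=lambda e: mahalanobis_sq(x, means[e]))

print(classify(np.array([1.4, 1.6, 1.3, 1.5])))   # -> likely "anger"
```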

Neural networks (NN) have been used extensively in pattern analysis, including a series of applications to facial expression recognition. Petrushin (1999, 2000a, 2000b) used features selected from call-center speech, including pitch, energy, speaking rate, formants, and descriptive statistics of these quantities. The NN classifiers used a two-layer backpropagation architecture with an 8-, 10-, or 14-element input vector, 10 or 20 nodes in the hidden sigmoid layer, and five nodes in the linear output layer. This approach gave an average accuracy of about 65%, with the following distribution over emotional categories: normal state 55-65%, happiness 60-70%, anger 60-80%, sadness 60-70%, and fear 25-50%. Ensembles of NN classifiers, and combinations of NN classifiers each trained for a single emotion, were also explored. Fellenz et al. (2000) used the ASSESS system to process acoustic features such as voice level, voice pitch, phrase, word, phoneme and feature boundaries, voice quality, and temporal structure, together with facial features. There are other applications of NNs to facial expression analysis, such as (Zhao and Kearney, 1996). Neural networks have long achieved good performance on many difficult problems; their main disadvantages are the expertise needed to choose the network structure and training configuration, and the intensive computation required. A minimal version of the two-layer architecture above is sketched below.
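
The sketch below implements the reported 8-input, 10-sigmoid-hidden, 5-linear-output backpropagation architecture, trained with plain squared-error gradient descent on synthetic data; Petrushin’s actual training details are not reproduced.

```python
# Minimal two-layer backpropagation network matching the architecture
# Petrushin reports (8 inputs, 10 sigmoid hidden units, 5 linear outputs).
# Training data here are synthetic stand-ins for speech statistics.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = rng.integers(0, 5, size=200)
T = np.eye(5)[y]                              # one-hot emotion targets

W1 = rng.normal(scale=0.1, size=(8, 10)); b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=(10, 5)); b2 = np.zeros(5)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.01
for epoch in range(200):
    H = sigmoid(X @ W1 + b1)                  # hidden sigmoid layer
    O = H @ W2 + b2                           # linear output layer
    dO = (O - T) / len(X)                     # squared-error gradient
    dH = (dO @ W2.T) * H * (1 - H)            # backpropagate through sigmoid
    W2 -= lr * H.T @ dO; b2 -= lr * dO.sum(0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)

pred = np.argmax(sigmoid(X @ W1 + b1) @ W2 + b2, axis=1)
print("training accuracy on random labels:", (pred == y).mean())
```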

Bayesian approaches apply probability and belief theory to system modeling and learning. Bayes’ theorem calculates the posterior probability of a hypothesis given the evidence, using the prior probability of the hypothesis and the dependence of the evidence on the hypothesis. In (Qi and Picard, 2002; Qi et al., 2001), the authors describe a Bayesian classifier that predicts the frustration level of users from the mean and variance of the pressure signals from mouse sensors. The data distributions are modeled by mixtures of Gaussians, and the classifiers are augmented with context-sensitive information, i.e. by switching component classifiers according to the context. The classifiers are trained with an approximate algorithm, Expectation Propagation. The experimental results show a small improvement over global learning algorithms such as SVMs, and are believed to be better than classical local learning algorithms such as k-nearest neighbors. Bayesian approaches provide a powerful modeling and prediction tool, though the computation is normally intensive. The Bayesian classifier in this case does not take advantage of conditional independence among the variables. In a much more simplified form, a naïve Bayes classifier has been used to predict emotions (Sebe et al., 2002). The basic Bayes-rule step is sketched below.
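
The sketch below keeps only the basic Bayes-rule step, modeling each class’s (synthetic) mouse-pressure features with a single Gaussian rather than the mixture-of-Gaussians, context-switching, and Expectation Propagation machinery of the original.

```python
# Bayesian classifier sketch: model each class's features with one Gaussian
# and pick the class with the highest posterior via Bayes' theorem.
# Features (mean and variance of mouse pressure) are synthetic.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
data = {"frustrated": rng.normal(1.0, 0.5, size=(40, 2)),
        "neutral":    rng.normal(0.0, 0.5, size=(40, 2))}

n = sum(len(x) for x in data.values())
models = {c: multivariate_normal(x.mean(axis=0), np.cov(x.T))
          for c, x in data.items()}
priors = {c: len(x) / n for c, x in data.items()}

def posterior(x):
    joint = {c: priors[c] * models[c].pdf(x) for c in models}  # Bayes' rule
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

print(posterior(np.array([0.9, 1.1])))   # high probability of "frustrated"
```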

Bayesian networks use graphical models to encode prior knowledge of the causal probabilities and conditional independence among the variables of a physical system; Bayesian inference then updates the hidden, hypothetical variables based on observations of external evidence. In (Ball and Breese, 2000; Breese and Ball, 1998), the authors provide a Bayesian network that assesses the user’s affective state along the dimensions of valence and arousal, and personality along the dimensions of dominance and friendliness. The observable data are facial and speech data, including word choice, speech characteristics, input style characteristics, and body language movements [figure (Breese and Ball, 1998)]. The network models word selection in greater depth by expanding it into an expression style (active, positive, terse, and strong expression) and an interpretation layer of paraphrases used. This network could be implemented as a dynamic Bayesian network to capture the temporal structure of emotional states. It gives a good representation of the emotional state model and reasonable performance, although a more complete model could expand the structure in both depth and breadth. A toy network in this spirit is sketched below.
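
The toy network below has one hidden valence node and two observable expression children. The conditional probability values are invented, and inference is exact enumeration over the single hidden variable.

```python
# Tiny Bayesian-network sketch in the spirit of Ball and Breese: a hidden
# valence node with observable expression children. CPT numbers are invented.
prior = {"negative": 0.3, "neutral": 0.4, "positive": 0.3}
# P(observation | valence) for two observable modalities.
p_wording = {"negative": {"terse": 0.6, "verbose": 0.4},
             "neutral":  {"terse": 0.4, "verbose": 0.6},
             "positive": {"terse": 0.2, "verbose": 0.8}}
p_pitch =   {"negative": {"low": 0.7, "high": 0.3},
             "neutral":  {"low": 0.5, "high": 0.5},
             "positive": {"low": 0.2, "high": 0.8}}

def posterior(wording, pitch):
    # Enumerate the hidden variable, then normalize (exact inference).
    joint = {v: prior[v] * p_wording[v][wording] * p_pitch[v][pitch]
             for v in prior}
    z = sum(joint.values())
    return {v: p / z for v, p in joint.items()}

print(posterior("terse", "low"))   # evidence points toward negative valence
```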

In (Conati, 2002; Conati and Zhou, 2002), the authors provide a dynamic Bayesian network model for assessing students’ emotions in educational games. The network models emotions according to the OCC cognitive theory of emotions, as the results of appraising how the current situation fits the person’s goals and preferences [figure (Conati and Zhou, 2002)]. Body expressions and sensor readings, such as visual information, EMG, GSR, and heart rate, serve as evidence for the student’s emotional state. The emotion states include joy, distress, pride, shame, admiration, and reproach. Each time the student performs an action, or the agent provides help, a new time slice is added to the network. This research tries to combine an analytical emotion model with a specific application domain. However, the very fine granularity used to describe actions and consequences seems questionable, given that a user’s emotion is, to some extent, stable. The varying time interval between slices may also require time-variant conditional probabilities. Furthermore, it is hard to evaluate whether an action is satisfactory in complex systems.

An HMM is equivalent to a simple dynamic Bayesian network. Picard (1997) discussed using an HMM to model the emotions of users and then using that model to recognize and predict the emotional state. In the model, the hidden state can take three emotional values: interest, joy, and distress. The observation node can contain any sentic measurements that vary with the underlying states. Transition probabilities from one emotional state to another are defined, and similarly the emission probabilities of the measurements given the states. In (Cohen et al., 2002; Cohen et al., 2000), the authors used a multilevel HMM to classify six emotions: happy, angry, sad, surprise, disgust, and fear [figure (Cohen et al., 2000)]. The input is the AUs defined in FACS for facial expression. Six 5-state emotion-specific HMMs produce state sequences from the continuous AUs, and a high-level HMM has seven states corresponding to the six emotions plus the neutral state. The authors claim to achieve automatic segmentation and recognition of video data. Again, the HMM does not fully exploit expertise about the emotion states and the variables influencing or influenced by them, since it employs the Bayesian model in its most basic two-layer form. Choosing the number of hidden states in the lower-level HMMs is somewhat arbitrary, and the computational complexity grows as more observation variables are added, because all of them connect fully to the hypothesis variables. The forward-filtering step of such a model is sketched below.
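
The sketch below implements the forward (filtering) step of such an emotion HMM, with the three hidden states Picard discusses and discrete observations. All probability values are illustrative.

```python
# Forward-algorithm sketch for an emotion HMM as discussed in Picard (1997):
# hidden emotional states emit discrete observation symbols; we track the
# filtered state distribution over time. All probabilities are illustrative.
import numpy as np

states = ["interest", "joy", "distress"]
A = np.array([[0.8, 0.1, 0.1],        # state transition probabilities
              [0.2, 0.7, 0.1],
              [0.2, 0.1, 0.7]])
B = np.array([[0.6, 0.3, 0.1],        # emission probs for 3 discrete symbols
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
pi = np.array([0.5, 0.3, 0.2])        # initial state distribution

def filter_states(observations):
    alpha = pi * B[:, observations[0]]
    alpha /= alpha.sum()
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict, then weight by likelihood
        alpha /= alpha.sum()
    return dict(zip(states, alpha.round(3)))

print(filter_states([0, 1, 1, 2]))    # posterior over the current state
```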

A Complete Assessment Framework

According to the above discussion, current approaches to affective state assessment fall into two groups. The first group uses the measures as predictor variables and applies classification algorithms without prior or contextual knowledge of the relations among these variables and the target affective state variable. Such approaches include common machine learning and classification algorithms such as regression, discriminant analysis, neural networks, Bayesian classifiers, and instance-based learning; decision trees, the EM algorithm, and others are similar candidates. The advantage of these algorithms is that they have very general and direct expressions in terms of numerical functions, so the models can be converted into rule-based expert systems easily. Their disadvantage is the limited ability to handle the uncertainty, complexity, and incompleteness of the data sets used to build the pattern models or perform classification. The other group is represented by Bayesian network and HMM models. They encode prior knowledge and expertise in graphical network form and balance global and local representations. While some critics dislike the domain knowledge necessary to build accurate models, such knowledge, together with the causal and uncertainty representation structure, provides powerful capabilities for handling complex situations in practical systems.

Integrated with an analytical cognitive model, our approach provides a complete framework for user affective state assessment.

1)      Consideration of the most relevant factors. In modeling the subject’s internal or affective state, our model takes context and profile information into account. On the other hand, unlike the work discussed above, our framework considers the stability of the subject’s affective state and the difficulty of depending on task goals. We therefore rely mainly on the power of external observations to recognize the subject’s internal state, and let accurate profile and other context information help after online and offline training.

2)      Combination of growing evidence. Applying a Bayesian network model to affective state recognition has the advantage of handling the uncertainty in multimodal data about the subject. New evidence can be integrated into the model once its relation to the human affective state is established. Little effort is needed to combine new modules with the legacy system, yet they provide a more accurate view. Such new components could also cover the context and profile aspects of the subject, implemented as independent modules.

3)      Integration of domain-independent and domain-dependent models. Following the above way of applying Bayesian networks to recognizing the subject’s affective state, the assessment model itself is domain independent and can thus be used in many different application fields. Meanwhile, a domain-specific model is still necessary when we want to react to the subject’s affect. Such models have functions similar to the “Affective Understanding” described by the Affective Computing Research Group (2002), or to the Belief Assessment and Impact Prediction components in the ABAIS system by Hudlicka and McNeese (2002). Such a model is essential for providing “task environment awareness” in HCI systems such as intelligent assistance: it captures the related information, explains the causes of problems, and predicts their impact.

4)      Integration of analytical and synthetic models. Although Bayesian network models have advantages in modeling the subject and the associated affective state, two weak points must be addressed when we design and implement a practical HCI system. One problem arises when we need detail in understanding the affect and the related problem: the granularity of Bayesian network models is normally not fine enough for this task based on external observations alone. The other arises when we need very accurate predictions and must therefore address the validity of the network structure and parameters; we need a third party to function as an objective judge. In our work, the analytical ACT-R cognitive model is a very suitable candidate for meeting both requirements. Working with the domain-dependent and domain-independent models, the cognitive model provides mechanisms for simulating, verifying, and explaining the subject’s status, detecting conflicts, and improving the models.

Based on this review of the current state of affective state assessment research, our next steps mainly include the following.

1)      Choose the measures we will use. At the beginning, we could focus on facial expression and eye movement measures. These may include the AU-based facial expression recognition measures by Yongmian and Haisong, and eyelid measures.

2)      Determine a finite number of affective or mental states to consider. There are two methods for defining the states of concern. We could use several dimensions, e.g. degrees of valence and arousal, or we could choose from a small set of negative states including fatigue, anxiety, confusion, and enragement.

3)      Evaluate the framework in several simple applications. Design and implement the framework in some simple scenarios. Again, the difficulty lies in the data we can obtain: either we experiment with human subjects acting out imagined states, or we must use real-world data from others’ work. As a first step we could focus on the domain-independent assessment model.

References

1.       M. Pantic, I. Patras and L.J.M. Rothkrantz. (2002).  Facial mimics recognition from face profile image sequences, Technical Report DKS-02-01, Data and Knowledge Systems group, Delft University of Technology, Netherlands.

2.       Scherer, K. R. (1993). Studying the emotion-antecedent appraisal process: An expert system approach. Cognition and Emotion, 7, 325-355.

3.       Moriyama, T. and Ozawa, S. (1999). Emotion recognition and synthesis system on speech. IEEE ICMCS 99, June 1999.

4.       Moriyama, T., Saito, H., and Ozawa, S. (1999). Evaluation of the relation between emotional concepts and emotional parameters on speech. IEICE Journal. Vol.J82-D-II, No.10, pp.1710-1720, October 1999.

5.       Ark, W., Dryer, D. C., and Lu, D. J. (1999). The Emotion Mouse. Proceedings of HCI International '99, Munich, Germany.

6.       Hudlicka, E. and McNeese, M. D. (2002). Assessment of User Affective and Belief States for Interface Adaptation: Application to an Air Force Pilot Task. User Modeling and User-Adapted Interaction 12: 1-47.

7.       Tsapatsoulis, N., Karpouzis, K. Stamou, G., Piat, F., and Kollias, S. (2000). A Fuzzy System for Emotion Classification based on the MPEG-4 Facial Definition Parameter. EUSIPCO 2000.

8.       Massaro, D. W. (2000). Multimodal emotion perception: Analogous to speech processes. In Proceedings of the ISCA Workshop on Speech and Emotion, Newcastle, Northern Ireland, September 5-7, 2000, 114-121.

9.       Fellenz, W. A., Taylor, J. G., Cowie, R., Douglas-Cowie, E., Piat, F., Kollias, S., Orovas, C., and Apolloni, B. (2000). On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS system. IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN), Como, Italy, 24-27 July.

10.   Ball, G. and Breese, J. (2000). Emotion and Personality in a Conversational Agent, in Cassell, J. et al. (eds.), Embodied Conversational Agents. Cambridge, MA: MIT Press, 2000.

11.   Breese, J. and Ball, G. (1998). Modeling Emotional State and Personality for Conversational Agents. Technical Report. MSR-TR-98-41. Microsoft Corporation.

12.   Y. Qi and R. W. Picard. (2002). Context-sensitive Bayesian Classifiers and Application to Mouse Pressure Pattern Classification. Appears in: International Conference on Pattern Recognition, Quebec City, Canada.

13.   Y. Qi, C. Reynolds, and R. W. Picard. (2001). The Bayes Point Machine for Computer-User Frustration Detection via PressureMouse. Appears in: Proceedings from the Workshop on Perceptive User Interfaces.

14.   N. Sebe, I. Cohen, A. Garg, and T. S. Huang. (2002). Emotion Recognition using a Cauchy Naive Bayes Classifier. International Conference on Pattern Recognition (ICPR) 2002.

15.   Conati, C. (2002).  Probabilistic Assessment of User's Emotions in Educational Games. In: Journal of Applied Artificial Intelligence, special issue on “Merging Cognition and Affect in HCI”, vol. 16 (7-8).

16.   Conati, C. and Zhou, X. (2002).  Modeling Students’ Emotions from Cognitive Appraisal in Educational Games. In Proceedings of ITS 2002, 6th International Conference on Intelligent Tutoring Systems, Biarritz, France.

17.   Picard. R. (1997). Affective Computing. Cambridge, MA: MIT Press.

18.   I. Cohen, N. Sebe, A. Garg, and T. S. Huang. (2002). Facial Expression Recognition from Video Sequences. International Conference on Multimedia and Expo (ICME ’02).

19.   I. Cohen, A. Garg, and T. S. Huang. (2000). Emotion Recognition using Multilevel-HMM, NIPS Workshop on Affective Computing, Colorado, Dec 2000.

20.   V. A. Petrushin. (2000a). Emotion Recognition Agents in Real World. 2000 AAAI Fall Symposium on Socially Intelligent Agents: Human in the Loop.

21.   V. A. Petrushin. (1999). Emotion in Speech: Recognition and Application to Call Centers. Artificial Neural Networks in Engineering (ANNIE '99), St. Louis.

22.   V. A. Petrushin. (2000b). Emotion Recognition in Speech Signal: Experimental Study, Development, And Application. 6th International Conference on Spoken Language Processing (ICSLP 2000).

23.   J. Zhao and G. Kearney. (1996). Classifying facial movement by backpropagation neural networks with fuzzy inputs. Proc. Intl. Conf. Neural Information Processing, pp. 454-457.

24.   Affective Computing Research Group. (2002). "Affective Understanding:" Modeling and Responding to User Affect. http://affect.media.mit.edu/AC_research/understanding.html. MIT Media Laboratory.


Appendix: figures

(Breese and Ball, 1998) – Bayesian network relating affective state (valence, arousal) and personality to observable expression. [figure omitted]

(Conati and Zhou, 2002) – dynamic Bayesian network for assessing students’ emotions in educational games. [figure omitted]

(Cohen et al., 2000) – multilevel HMM for emotion recognition from facial action units. [figure omitted]