Wenhui (Wendy) Liao

I am currently a research scientist in the R&D group at Thomson Reuters. I received my PhD from the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute in 2006. My primary research interests are probabilistic graphical models, machine learning, and their applications in information retrieval, information extraction, information fusion, and computer vision.
Probabilistic Graphical Models
Learning Bayesian Networks (BNs) When Data Is Incomplete
Many real applications of BNs need to learn the BN parameters automatically from data, because specifying the parameters manually is difficult and time-consuming. However, when data is incomplete, which happens frequently in real-world applications, even state-of-the-art learning techniques such as the Expectation-Maximization (EM) algorithm and Gibbs sampling can fail: EM is easily trapped in local maxima of the likelihood, and Gibbs sampling converges slowly.
This motivates us to propose a BN parameter-learning algorithm that escapes local maxima by systematically incorporating domain knowledge during learning. The key idea is to impose two kinds of qualitative constraints on EM: relative relationships between local parameters, and ranges of local parameters. In addition, sensitivity analysis is used to further refine the search space. The proposed algorithm achieves state-of-the-art performance in a variety of real applications.
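As a rough illustration of how qualitative constraints can enter EM, the Python sketch below runs EM on a toy two-node network with missing values and, after each M-step, projects the estimates into a hypothetical range constraint and relative constraint. The published algorithm handles the constraints and the sensitivity analysis far more carefully; simple clipping is only a stand-in.

```python
import numpy as np

# Toy network A -> B with binary nodes; some values of A are missing (None).
# Hypothetical domain knowledge: 0.6 <= P(B=1|A=1) <= 0.9 (a range constraint)
# and P(B=1|A=1) >= P(B=1|A=0) (a relative constraint between parameters).
data = [(1, 1), (None, 1), (0, 0), (1, 1), (None, 0), (0, 0), (1, 0), (None, 1)]

p_a, p_b1, p_b0 = 0.5, 0.5, 0.5   # P(A=1), P(B=1|A=1), P(B=1|A=0)

for _ in range(50):
    # E-step: expected value of A per record (posterior when A is hidden).
    w = []
    for a, b in data:
        if a is not None:
            w.append(float(a))
        else:
            like1 = p_a * (p_b1 if b else 1 - p_b1)
            like0 = (1 - p_a) * (p_b0 if b else 1 - p_b0)
            w.append(like1 / (like1 + like0))
    w = np.array(w)
    b_all = np.array([b for _, b in data], dtype=float)

    # M-step: maximum-likelihood updates from the expected counts.
    p_a = w.mean()
    p_b1 = (w * b_all).sum() / w.sum()
    p_b0 = ((1 - w) * b_all).sum() / (1 - w).sum()

    # Constraint step: project the estimates back into the feasible region.
    p_b1 = float(np.clip(p_b1, 0.6, 0.9))   # range constraint
    p_b0 = min(p_b0, p_b1)                  # relative constraint

print(round(p_a, 3), round(p_b1, 3), round(p_b0, 3))
```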
Non-myopic Value-of-Information Computation
Influence diagrams (IDs) have been widely used to model decision-making under uncertainty. A common scenario in decision-making modeled by an ID is that a decision maker must decide whether some information is worth collecting, and which information should be acquired first, given several available information sources. Each set of information sources is usually evaluated by its value of information (VOI). However, because computing the VOI of multiple information sources exactly takes exponential time, decision analysts and expert-system designers have focused on myopic VOI, which relies on assumptions that are not satisfied in most applications. This motivates us to propose an approximate algorithm that computes non-myopic VOI efficiently by exploiting the central limit theorem. The efficiency and accuracy of the algorithm make it feasible in a variety of applications where a large number of information sources must be evaluated efficiently.
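The sketch below conveys the central-limit-theorem idea on a toy problem with identical, conditionally independent binary sensors (all numbers hypothetical). Instead of enumerating all 2^k joint sensor outcomes, the sum of per-sensor log-likelihood ratios is approximated as Gaussian under each hypothesis, which reduces the VOI computation to a one-dimensional integral. The actual algorithm is more general; this only illustrates the flavor.

```python
import numpy as np
from scipy.stats import norm

# Binary hypothesis H with prior p; k conditionally independent sensors,
# each reporting H correctly with probability q (hypothetical numbers).
p, q, k = 0.3, 0.7, 20

# Each sensor contributes a log-likelihood ratio l_i = +/- log(q/(1-q)).
# By the CLT, L = sum_i l_i is approximately Gaussian under each hypothesis.
lam = np.log(q / (1 - q))
mu1 = k * lam * (2 * q - 1)                 # E[L | H=1]
mu0 = -mu1                                  # E[L | H=0]
sd = np.sqrt(k * lam**2 * 4 * q * (1 - q))  # same variance under both

grid = np.linspace(mu0 - 5 * sd, mu1 + 5 * sd, 4001)
post1 = p * np.exp(grid) / (p * np.exp(grid) + (1 - p))    # P(H=1 | L)
mix = p * norm.pdf(grid, mu1, sd) + (1 - p) * norm.pdf(grid, mu0, sd)

# Expected utility of deciding after seeing all k sensors (utility 1 for a
# correct decision), minus the utility of deciding now, gives the VOI.
eu_with_info = np.trapz(np.maximum(post1, 1 - post1) * mix, grid)
voi = eu_with_info - max(p, 1 - p)
print(round(float(voi), 4))
```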
Information Extraction
A Unified ID Framework for Information Selection, Information Fusion, and Decision-making
A common issue in many real-world applications is how to choose and integrate multiple information sources (sensors) to solve a problem efficiently, especially when the information may be ambiguous, dynamic, and multimodal. We present a general mathematical framework based on influence diagrams for actively fusing information for timely decision-making. The model provides a coherent, fully unified, hierarchical probabilistic framework for three main functions: choosing a sensory action set that achieves an optimal trade-off between the cost and benefit of the sensors, applying a fusion model to efficiently combine the information from the selected sensors, and making decisions based on the fusion results. The parameters of the model can be learned automatically with the proposed learning algorithm. The model has been applied to recognizing user affective states and providing user assistance, as well as to battlefield situation assessment.
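A minimal sketch of the three stages on a toy binary problem appears below; the sensor names, accuracies, costs, and cost-to-utility factor are all hypothetical, and the subset search is exhaustive here only for clarity (the selection algorithms in the next subsection address its cost).

```python
import itertools

# Stage 1: pick a sensor subset trading expected benefit against cost.
# Stage 2: fuse the selected readings. Stage 3: act on the fused belief.
SENSORS = {"camera": (0.80, 2.0), "gsr": (0.90, 5.0), "mouse": (0.65, 0.5)}
prior = 0.4                                   # P(state = 1), hypothetical

def expected_benefit(subset):
    # Probability of deciding correctly after seeing the subset's readings:
    # sum over joint readings ys of P(ys) * max_h P(h | ys).
    total = 0.0
    for ys in itertools.product((0, 1), repeat=len(subset)):
        joint = {1: prior, 0: 1 - prior}
        for name, y in zip(subset, ys):
            acc = SENSORS[name][0]
            for h in (0, 1):
                joint[h] *= acc if y == h else 1 - acc
        total += max(joint.values())
    return total

# Stage 1 (exhaustive only for this toy example).
subsets = [s for r in range(len(SENSORS) + 1)
           for s in itertools.combinations(SENSORS, r)]
best = max(subsets, key=lambda s: expected_benefit(s)
           - 0.05 * sum(SENSORS[n][1] for n in s))

# Stages 2 and 3: Bayesian fusion of (pretend) readings, then a decision.
readings = {name: 1 for name in best}         # hypothetical observations
post = {1: prior, 0: 1 - prior}
for name, y in readings.items():
    acc = SENSORS[name][0]
    for h in (0, 1):
        post[h] *= acc if y == h else 1 - acc
p1 = post[1] / (post[1] + post[0])
print(best, round(p1, 3), "assist" if p1 > 0.5 else "no action")
```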
Sensor Selection Algorithms
Two typical sensor selection scenarios appear in many applications: choosing a sensor set with maximum information gain subject to a budget limit, and choosing a sensor set with an optimal trade-off between information gain and cost. Unfortunately, both are computationally intractable because the space of sensor subsets is exponential. Based on the proposed ID framework, we propose efficient sensor selection algorithms for both scenarios. The algorithms exploit the theory of submodular functions and the probabilistic dependencies among sensors embedded in the ID model. For the budget-limit case, the proposed algorithm guarantees a constant factor of (1-1/e) of the optimal performance, and its computational efficiency is further improved by a partitioning procedure. For the optimal trade-off case, a submodular-supermodular procedure is embedded in the sensor selection algorithm to choose the optimal sensor set in polynomial time.
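For the budget-limit case, the core of such an algorithm is a cost-benefit greedy step, sketched below with a toy set-coverage function standing in for the model-based information gain. Note that the full (1-1/e) guarantee requires an additional enumeration step beyond this basic procedure; the sketch only shows the greedy heart of it.

```python
# Cost-benefit greedy selection under a budget for a monotone submodular
# gain function. The set-coverage gain below is a hypothetical stand-in for
# the information gain computed from the ID model.
COST = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 2.5}
GAINSET = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4}, "d": {1, 5, 6}}

def gain(sensors):                 # coverage: monotone and submodular
    return len(set().union(*(GAINSET[s] for s in sensors))) if sensors else 0

def select(budget):
    chosen, spent = [], 0.0
    while True:
        affordable = [s for s in COST
                      if s not in chosen and spent + COST[s] <= budget]
        if not affordable:
            break
        # Pick the sensor with the best marginal-gain-per-cost ratio.
        best = max(affordable,
                   key=lambda s: (gain(chosen + [s]) - gain(chosen)) / COST[s])
        chosen.append(best)
        spent += COST[best]
    # Also consider the best single affordable sensor: the ratio greedy
    # alone can be arbitrarily bad without this comparison.
    singles = [s for s in COST if COST[s] <= budget]
    best_single = max(singles, key=lambda s: gain([s]), default=None)
    if best_single is not None and gain([best_single]) > gain(chosen):
        return [best_single]
    return chosen

print(select(budget=4.0))          # -> ['d', 'c'] for these toy numbers
```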
Computer Vision and Human Computer Interaction
Affective State Recognition and User Assistance
Increasingly, HCI researchers are interested in users' emotional and mental states, since affective states directly influence a user's performance, and negative affect in particular can degrade it. Recognizing such negative user affect and providing appropriate interventions is therefore important for many HCI systems. In this study, we apply the proposed ID framework (described above) to simultaneously model both affective state recognition (stress, frustration, fatigue, etc.) and user assistance in HCI systems. Affective state recognition is achieved through active probabilistic inference from the available sensory data of multimodal sensors. User assistance is accomplished automatically through a decision-making process that balances the benefit of keeping the user in productive affective states against the cost of performing the assistance. To validate the model, we built a non-invasive, real-time prototype system that recognizes user affective states (stress and fatigue) from measurements in four modalities: visual appearance features (facial expression, eye gaze, eye movement, head gesture, etc.), physiological measures (heart rate, GSR, temperature, etc.), user performance, and behavioral data. To our knowledge, this integration of evidence from four modalities, together with the probabilistic framework, is unique in user-affect research.
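At its simplest, the assistance decision reduces to an expected-utility comparison like the sketch below, where the stress posterior would come from the fusion model and the utility values (hypothetical here) encode the benefit of intervening against the cost of interrupting the user.

```python
# One intervention decision as an expected-utility comparison; the posterior
# comes from the fusion model and all utility numbers are hypothetical.
p_stress = 0.72

U = {("assist", "stressed"): 8.0,   # intervention helps a stressed user
     ("assist", "fine"): -2.0,      # ...but needlessly interrupts a productive one
     ("none", "stressed"): -5.0,    # a stressed user left alone loses productivity
     ("none", "fine"): 0.0}

def expected_utility(action):
    return (p_stress * U[(action, "stressed")]
            + (1 - p_stress) * U[(action, "fine")])

action = max(("assist", "none"), key=expected_utility)
print(action, round(expected_utility("assist"), 2),
      round(expected_utility("none"), 2))
```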
Visual Object Tracking
Real-time object tracking is essential for video surveillance. One well-known problem is drifting during tracking. We propose a simple but robust framework that automatically maintains and updates the object templates used for tracking, so that drifting is handled well. Compared to existing tracking techniques, the proposed technique makes three significant contributions. First, a case-based reasoning (CBR) method is introduced to track non-rigid objects robustly under significant appearance changes without drifting away. Second, an automatic case-base maintenance algorithm dynamically updates the case base, keeping it representative and concise. Third, the framework provides an accurate confidence measure for each tracked object, so that tracking failures can be identified. With the proposed framework, we implemented a real-time face tracker that tracks human faces robustly at 26 frames per second under a variety of appearance changes.
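The sketch below conveys the flavor of template maintenance with a confidence measure: candidates are matched against the whole case base rather than only the latest template, low confidence is reported as a tracking failure instead of silently drifting, and sufficiently novel appearances are added as new cases. The actual maintenance algorithm also prunes the base; the thresholds and random patches here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ncc(a, b):             # normalized cross-correlation of two patches
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

case_base = [rng.normal(size=(24, 24))]   # initial template from frame 0
T_FAIL, T_NEW = 0.4, 0.85                 # failure / new-case thresholds

def track_step(candidate_patch):
    scores = [ncc(candidate_patch, c) for c in case_base]
    conf = max(scores)                    # confidence of this track
    if conf < T_FAIL:
        return conf, "tracking failure"   # report failure instead of drifting
    if conf < T_NEW:                      # good match but new appearance:
        case_base.append(candidate_patch) # keep the base representative
    return conf, "tracked"

patch = case_base[0] + 0.3 * rng.normal(size=(24, 24))
print(track_step(patch))
```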
Facial Activity Modeling and Recognition
Facial activities are among the most natural and powerful means of human communication. A spontaneous facial activity is characterized by rigid head movements, non-rigid facial muscular movements, and their interactions. Current research in facial activity analysis is limited to recognizing rigid or non-rigid motion separately, often ignoring their interactions; as a result, these approaches cannot always recognize facial activities reliably. We propose to explicitly exploit prior knowledge about facial activities and to systematically combine that knowledge with image measurements to achieve accurate, robust, and consistent facial activity understanding. Specifically, we propose a unified probabilistic framework based on a dynamic Bayesian network (DBN) to simultaneously and coherently represent the rigid and non-rigid facial motions, their interactions, and their image observations, as well as to capture the temporal evolution of facial activities. Robust computer vision methods obtain measurements of both rigid and non-rigid facial motions, and facial activity recognition is then accomplished through probabilistic inference that systematically integrates the visual measurements with the facial activity model.
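A toy version of such a filter is sketched below: the joint hidden state couples head pose with one facial action unit, the transition model includes a coupling term for their interaction, and noisy pose and AU measurements are integrated by forward filtering. All probabilities are illustrative stand-ins, not the learned model.

```python
import numpy as np

# Joint hidden state couples a rigid motion (head pose) with a non-rigid
# one (a smile AU), so their interaction is modeled instead of ignored.
STATES = [(p, a) for p in ("frontal", "turned") for a in ("neutral", "smile")]

def trans_weight(s, t):
    (p0, a0), (p1, a1) = s, t
    w = (0.8 if p1 == p0 else 0.2) * (0.7 if a1 == a0 else 0.3)
    if p1 == "turned" and a1 == "smile":
        w *= 1.3       # toy coupling: expression co-occurs with head motion
    return w

T = np.array([[trans_weight(s, t) for t in STATES] for s in STATES])
T /= T.sum(axis=1, keepdims=True)        # rows are P(state_t | state_{t-1})

def obs_lik(state, pose_meas, au_meas):
    # Independent noisy measurements from a pose tracker and an AU detector.
    p, a = state
    return (0.85 if pose_meas == p else 0.15) * (0.75 if au_meas == a else 0.25)

belief = np.full(len(STATES), 1.0 / len(STATES))
for pose_meas, au_meas in [("frontal", "smile"), ("frontal", "smile"),
                           ("turned", "smile")]:
    belief = belief @ T                                         # predict
    belief *= [obs_lik(s, pose_meas, au_meas) for s in STATES]  # correct
    belief /= belief.sum()

print({f"{p}/{a}": round(float(b), 3) for (p, a), b in zip(STATES, belief)})
```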
Video Content Analysis
For most multimedia retrieval systems, it is essential to organize video by scene in the multimedia database. This motivates us to propose an effective algorithm that automatically segments a video sequence into scenes by systematically combining audio and visual features extracted from the sequence. Specifically, an unsupervised segmentation algorithm, together with object tracking, identifies candidate scene boundaries, and audio features are then used to refine the candidates. In addition to video segmentation, content-based audio classification is a valuable step in multimedia content analysis. Most current systems for classifying audio signals either focus on speech recognition or simply classify audio into a few broad groups such as music and speech. We extract multiple audio features and classify audio content into seven categories using support vector machines.
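A minimal sketch of the SVM classification stage, with hypothetical category names and synthetic stand-ins for the extracted audio features, might look like this:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Seven-way audio classification with an SVM. The category names and random
# feature vectors are placeholders for real clip-level descriptors
# (e.g. MFCC statistics, zero-crossing rate, short-time energy).
CLASSES = ["speech", "music", "speech+music", "silence",
           "applause", "environmental", "other"]      # hypothetical labels
rng = np.random.default_rng(0)
y = np.repeat(np.arange(len(CLASSES)), 100)
X = rng.normal(size=(len(y), 20)) + 0.5 * y[:, None]  # class-shifted noise

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X[::2], y[::2])                    # train on half the clips
print("held-out accuracy:", round(clf.score(X[1::2], y[1::2]), 3))
print("first test clip ->", CLASSES[int(clf.predict(X[1:2])[0])])
```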
Selected Publication

Yan Tong, Wenhui Liao, and Qiang Ji, "Facial Action Unit Recognition by Exploiting Their Dynamic and Semantic Relationships", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 29, No. 10, pp. 1683-1699, 2007.