Robust Non-verbal Behavior Sensing for Human Interaction Analysis
Beyond words, non-verbal behaviors (NVB) are known to play an important role in human face-to-face interactions. However, decoding non-verbal behaviors is a challenging problem that involves both extracting subtle physical NVB cues and mapping them to higher-level communication behaviors or social constructs. In the first part of this talk, I will present our research towards the automatic estimation of NVB in context from audio-visual sensors, in particular head gestures and gaze (and its discrete version, the visual focus of attention, VFOA), in situations where large user mobility is expected and minimal intrusion is required. I will discuss the main challenges associated with these tasks and how we have addressed them. I will describe how we approached VFOA recognition in meetings using Dynamic Bayesian Networks to jointly model speech conversation, gaze (represented by head pose), and task context. In addition, I will present recent techniques we have investigated for 3D gaze tracking from RGB-D (color and depth) cameras such as the Kinect, which can offer an alternative to the costly and/or intrusive systems currently available and can further be used for head gesture recognition. The methods will be illustrated with several examples from human-robot and human-human interaction analysis, such as automatic gaze coding of natural dyadic and group interactions. In the second part of the talk, I will present in more detail a study on the use of audio-visual NVB cues for the classification of listener categories in group discussions.