cSCAN Rounds

Coordinating Turn-Taking and Language Use in Spoken Human-Robot Interaction

Being able to communicate with machines through spoken interaction has been a long-standing vision in both science fiction and research labs. Thanks to recent developments, this vision is starting to become a reality. One application domain for this technology is social robots, which are soon expected to serve as receptionists, teachers, companions, and more. One example is the Furhat robot head, which started as a research project at KTH and is now used in commercial applications, such as serving as a concierge at Frankfurt Airport and conducting job interviews. Despite this progress, however, current systems are still limited in several ways. In this talk, the speaker will focus on two challenges that lie ahead and that are currently being addressed in his research group.

First, turn-taking in these interactions is typically not very fluent and bears little resemblance to how humans talk to each other. The speaker will present his and his colleagues' efforts at modelling the multi-modal signals that the human face and voice provide in order to continuously anticipate what will happen next in the interaction, and show how these predictions can be used to coordinate the robot's behaviour with that of the user.

Second, current systems are typically based on a single generic model of the user and of the language being spoken. Humans, by contrast, adapt their language to each other and invent terms for new things they need to talk about. Prof. Skantze will present an initial study on how a computer could learn to understand the semantics of referring language by observing humans interacting with each other in a collaborative task, and on how the language used by the two interlocutors converges over repeated interactions.