Thursday March 7th 2019, 2:30pm
Being able to communicate with machines through spoken interaction has
been a long-standing vision in both science fiction and research labs.
Thanks to recent developments, this vision is starting to become a
reality. One application domain for this technology is social robots,
which are soon expected to serve as receptionists, teachers,
companions, etc. One example of this is the Furhat robot head, which
started as a research project at KTH and is now being used in
commercial applications, such as serving as a concierge at Frankfurt
Airport and conducting job interviews.
However, despite this progress, current systems are still limited in
several ways. In this talk, the speaker will focus on two challenges that lie
ahead, and that are currently being addressed in his research group.
First, the turn-taking in the interaction is typically not very fluent
and not very similar to how we usually talk to each other. The speaker will
present his and his colleagues' efforts to model and utilize the
multi-modal signals that the human face and voice provide, in order to
continuously anticipate what will happen in the interaction, and how
this can be used to
coordinate the robot’s behaviour with the user. Second, current
systems are typically based on one generic model of the user and the
language they are conversing in. Contrary to this, humans adapt their
language to each other and invent terms for new things they need to
talk about. Prof. Skantze will present an initial study on how a computer could
learn to understand the semantics of referring language by observing
humans interacting with each other in a collaborative task, and how
the language used by the two interlocutors converges over repeated
interactions.
Prof. Gabriel Skantze
Associate Professor in Speech Technology, Department of Speech, Music and Hearing,
School of Computer Science and Communication, KTH Royal Institute of Technology (Sweden)