29 May 1997 ............... Length about 4300 words (28000 bytes).
This is a WWW version of a document. You may copy it.

Christopher P. O'Donnell,
re-edited by S.W. Draper

"Hitherto, the fact that face to face contact has almost always been the most satisfactory form of communication has been a fundamental constraint of society" (Short et al., 1976, p. v). It could be, and has been, argued that large urban conurbations have grown out of society's need for people to address one another face to face at work and in play. Technology, it has been promised, can free us from these constraints. First it was the telephone, and now it is the video conference link that is threatening to restructure the nature of social interaction between human beings. This grand vision of the future was expressed to me by many students of computer science in the course of researching this review. The technological utopia they conveyed to me is dependent upon one vital ingredient: that video communication is equal to face to face communication. If it is, this utopia could be possible.

A common accusation levelled at video conference designers, and computer designers in general, is that they are just technophiles, too close to the technical possibilities to see objectively the problems and limitations of the technology. That accusation is not directly relevant to our review, but it is one of the main motivations for avoiding a technology based review. We felt that by not dwelling on the technological nuances, we could avoid this criticism and perhaps add to a parallel field of psychological research into this burgeoning area. What is relevant is the assumption underlying the technology's claims: that video conferencing (VC) is as effective as face to face communication. There is a substantial amount of technology based literature on the capabilities of video conferencing, from desk-top systems to large scale conferencing systems. Although that literature is of interest, this review is more concerned with the social psychology of video mediated interactions.

The social science of video conferencing is somewhat thinner on the ground than I had expected at the inception of this study. Consequently, the focus of this review shifted as this fact became apparent. The goal of the review now is to map out the structure and the strategies of face to face communication. Our first task, then, is to identify the component parts of a face to face interaction, so that we can in turn critically evaluate what, if anything, is different in the literature on video conferencing. There are two main constituent parts to any face to face communication: dialogue, or verbal communication, and the non verbal behaviour that accompanies the dialogue. The dialogue's structure is highly dependent on the task the communicators have to achieve. If there is a highly structured task to complete, the dialogue's structure should reflect that. A social interaction between the same individuals, by contrast, should differ considerably from that scenario.

So how do you measure a successful conversation? Many researchers have attempted to define one (Chapanis, 1967; Schober and Clark, 1989; Clark and Wilkes-Gibbs, 1986; O'Connell, Kowal, and Kaltenbacher, 1990). The last of these, O'Connell et al. (1990), produced a rather inflexible definition of a successful dialogue: it is the "fulfilment of the purposes entertained by two or more interlocutors." It is frequently difficult to evaluate objectively whether an interaction has succeeded in this sense. Consequently, studies tend to use highly task oriented scenarios with measurable outcomes, and the study of the structure of dialogue tends to be limited to task dependent interactions. Naturally this affects the parameters of our definition of the structure of a dialogue, and we felt it necessary to make this qualification explicit.

Dialogue is typically structured as follows. There are sessions of turn taking by each speaker. Each session is characterised by speech, gaps in the dialogue (pauses), extended speaker turns, interruptions and the eventual cessation of dialogue. A successful conversation within these parameters would exhibit smooth interchanges between speaker turns, with gaps, interruptions and overlaps in the dialogue minimised.
This general structure of conversational dialogue is accepted by most of the differing research camps.
Breakdowns in communication are attributable to many different factors. Cutler and Pearson (1987) attributed breakdowns to a failure in turn taking procedures. This view held for some time.
An opposing but equally plausible explanation of communicative breakdown is a failure to reach a mutual understanding. This view is offered by a group of psycholinguists known as the interactionists. They believe that a successful interaction is characterised by criteria different from the ones I have already offered: it is one marked by moment to moment collaboration between participants, who co-operate to establish and maintain mutual understanding, a process commonly referred to as "grounding" (Clark, 1989). The interactionists also do not see interruptions, overlapping speech and gaps in dialogue as necessarily problematic for achieving successful grounding, although these same phenomena are counted as problems in turn-taking accounts.

So what part does non verbal behaviour play in the competing theories? The prevailing opinion is that the successful flow of verbal information is governed by the exchange of signals carried through the non verbal channel (Rosenfeld, 1987). The spectrum of non verbal behaviour is a broad one, and researchers have consequently studied many forms of it. Kendon (1967) first worked on gaze and also on head and eye movement. Others, such as Dittman and Llewellyn (1968), focused on what they christened "kinesic activity", namely activities such as torso movements, head nods and shoulder movements. Others have studied the general gesturing behaviour of communicators, e.g. Graham and Heywood (1975). Again, the theories differ on the function of non verbal behaviour.

Early researchers (Beattie, 1979, 1981, 1986) felt that non verbal behaviour is used to regulate turn taking in face to face interactions. The opposing view, that of the interactionists, is that non verbal behaviour is used to achieve grounding. The 1994 work of Boyle et al. agreed with the interactionists, in that their results support the belief that non verbal behaviour (in their case gaze) is used to achieve mutual understanding (grounding). Boyle et al. qualify this hypothesis with the conclusion, "whatever the task and the communicative channel. We as communicative beings can work, all be it more difficulty, to achieve grounding and more often than not task completion," (p. 17).

In the light of these assertions by Boyle et al., I have to ask in whose interests it is imperative to push the technology to mimic face to face communication, when it is well established that we are more than capable of coping in our present environment. In the course of forming the ideas in this review, I have read many pieces of work that seem to contain nothing but hyperbole. Perhaps I am being too harsh, but I wonder if the same furore surrounded the introduction of the printing press, the radio, the telephone, and the television.

These points aside, let's summarise what we have taken time to establish, namely the verbal and non verbal structure of face to face interactions, and then use that framework to analyse the current structure of video-mediated interactions. Verbal interaction is characterised and measured by the number of words used, the dialogue length, and back channels such as interruptions, dialogue gaps and requisitioning. Non verbal behaviour in communication is characterised by eye gaze, by kinesic information such as posture and bodily movement (hand gestures, fidgeting, nods etc.), and by facial expressions (winks, grins, smiles, frowns etc.). In our optimum communicative scenario, face to face communication would exhibit minimal dialogue and a considerable number of non verbal signals. In contrast, when the visual channel is denied, the dialogue length (number of words used) and the number of back channels needed to achieve grounding and task completion increase significantly, as seen in Boyle et al. (1994) and Short et al. (1976).

As we said earlier, we will use the framework of verbal and non verbal behaviour in face to face communication to analyse the current status of video mediated communication. Why? On a subjective analysis, both systems share common elements: both have an audio channel and both have a visual channel. Moreover, many of the proposed and current applications would, and do, use video conferencing as a surrogate for face to face communication.

Now that we have established our intentions, let's briefly look at some of the research into telecommunications that preceded video conferencing. Our motivations for doing this are, firstly, to obtain a measure of the difference in efficiency between face to face communication and telecommunications and, secondly, so that this measure can be compared with video conferencing and used as a yardstick for the gains that video conferencing might be expected to deliver. Telecommunications were the dominant form of distance communication prior to video conferencing. The telephone, as I am sure you know, denies the communicants the use of a visual channel. Consequently, many studies from the 1970s through to the late 1980s (see Short et al., 1972) reported that telephone conversations are significantly longer than face to face conversations. That extra length entails significantly more words in the dialogue. Telephone conversations also involve significantly more verbal confirmation strategies (back channels, interruptions and requisitioning) than would be seen in face to face communication. In their excellent 1994 study, Boyle et al. provide evidence that the time spent on achieving grounding and task completion is significantly greater in the audio only condition than in face to face. Thus, denying communicators a visual channel has an undoubted effect on the time needed to achieve the same level of task completion.

As it became evident that video communication could become a reality, the subjective impression was that the addition of a visual channel would decrease the time spent on task completion, presumably because the visual channel would provide the facility for non verbal behaviour. Thus, face to face communication could be achieved across large distances at little expense in time or money. Much of the prevailing thought and discussion around the applications of video conferencing is in these terms: that video conferencing is an opportunity for distanced communicators to achieve face to face communication. The possible application of such a system could lead to a technological utopia after all and, perhaps more concretely, to the business and distance learning applications suggested by Nicholls (1997), such as tutorials, lectures and seminars. A Net-University is not an impossibility, as many of its applications, such as library catalogue and document access, can be used at this very moment on the net. For a total system of net access, however, video conferencing is crucial.

So is video conferencing capable of meeting this demand? Undoubtedly tasks can be achieved via a video link. Our assertion is that video conferencing is not as efficient, in time spent, as face to face conversation. Moreover, social uses such as friendly conversation or joking seem very difficult if not impossible.

Let's work through the evidence for these claims, using what we have learned about verbal and non verbal communication. In most if not all of the studies, video conferencing takes as long as an audio link to achieve a task. Why should this be? In many studies the technical difficulties of low bandwidth (which causes parts of the audio to be lost or significantly delayed) have been cited as the culprit for the inefficiency. Jameson, Hobsley et al. (1996), for example, found that low bandwidth significantly increases the time spent achieving grounding and, consequently, the task completion time. Common problems in studies such as Jameson, Hobsley et al.'s were longer and more interrupted dialogues in comparison with face to face. In fact, even when results are compared with audio only conditions, the video conferencing dialogues are frequently longer and more interrupted. This is in spite of the fact that the video conferencing subjects used visual cues. The problem, then, would appear to be a technical one.

Not so, say O'Malley, Langton et al. (1996). They removed all trace of audio delay (the classic low bandwidth complaint) by using the `videotunnels' technique, which gives high quality images that allow direct eye to eye contact between the communicators, with no audio delay. In their own words, this was "..technology which was as close as possible to face to face interactions.." (p. 180). Their results show that, in comparison with an audio only condition, "subjects needed to say more rather than less, to achieve the same level of task performance" (p. 184). There are numerous results of this type, in which video conferencing is not as efficient as we might have predicted a priori (Anderson, Newlands et al., 1996; Andrew, 1996; Bruce, 1996; Watson and Sasse, 1996; Ramsey, 1996).

Why is this? On the strength of Boyle et al.'s (1994) work, we had believed that the addition of a visual channel would make video conferencing more efficient than audio only links. Short et al. (1976) suggested that technology would lift the "fundamental constraints" of face to face interaction. The evidence suggests that no university is, as yet, about to become the first complete Net-Uni. What is missing? The evidence suggests that one or more elements are missing from video mediated interactions. It appeared to us that the subjects in the different studies share a common problem: they appear to revert to using verbal channels even though they sometimes have an excellent visual channel. Why would that be?

One of the first questions that struck us was: is the visual channel failing to transmit the non verbal signals? No: many studies deliberately examined non verbal signals, usually gaze, and noted that their subjects did in fact gaze at each other. This would suggest that subjects attempted to use the visual cues provided and found problems with the quality of the information contained in those cues. Let's then turn to the literature on non verbal signals to ascertain what the problems might be. As I have suggested throughout this review, the study of the structure of face to face communication should yield information about video conferencing.

O'Malley, Langton et al. (1996) reported a similar problem to the one we have already described: communicators did initially attempt to interact using the visual channel, but eventually reverted to using verbal schemas only. As we have pointed out, a part of face to face interaction is a sense of social presence or proximity. It has been suggested that visual cues in face to face communication heavily influence feelings of social presence. Visual cues such as gaze, facial expression and body movement all add to the sense of proximity to another individual. It is not unreasonable to believe that this sense may be missing, or at the very least significantly depleted, in video conferencing.

Studies on proximity (cited in Short et al., 1976) have produced a scale of proximity, ranging from intimate right through to public distance. Classically, face to face communication would fall under the intimate section of the scale. As far as we can gather from the existing research on video conferencing, the only studies to pay any attention to proximity are those that need to hold distances constant so that their cameras can focus on individuals. If proximity is an issue of visual cueing, however, then many studies have considered aspects of visual information. Let's defer any conclusions on proximity until we have considered these studies.

One piece of information that is missing from the data on visual cueing is the study of non verbal signals from torso and arm movements. Information from the trunk and arms is, at its simplest, used to confirm the information given by the individual's posture. Posture can be used as a source of information about personality and mood: is the individual communicating with confidence? Or are they disturbed, angry or even uninterested? This information is lost in most video conferencing systems. Why? We maintain that it is an issue of pictorial (image) size. Most screens are too small to carry the scale of visual information needed for these global cues; most show the face and perhaps the shoulders of the individual. Perhaps our hypothesised decrease in proximity has at its root the issue of pictorial size. All screens, apart from some at the high cost end of the technology, present video stimuli that are less than life size (approx. 40*40 cm for a face). Comparing video conferencing with face to face is therefore futile, as it is not a like with like comparison. It seems reasonable to suggest that reducing the video image from life size will detract from the usefulness of having a visual channel, and that the resulting decrease in visual effectiveness is only a product of this. O'Malley, Langton et al. (1996) did increase the size of their visual presentation in one of their experimental conditions. The intention of the increase was to allow participants to see the head and shoulders of their fellow subject. They found a significant effect of image size: the larger of the two images reduced the number of words needed to achieve task completion.

Unfortunately this effect of screen size did not interact with visibility. Even with a larger screen the dialogue was longer than in the audio only condition. Anderson, Newlands et al. (1996) also varied the size of the video image presented to their subjects. Their small image measured 3.5*4.5 inches and their large image 6.5*8 inches. They also found no significant visual effect for image size. Although this evidence is disappointing, we feel that no study has yet compared like with like. To come down on one side or the other of the debate would therefore be ill advised.

One area that has been studied, though, is facial information. It has long been artistic and literary knowledge that eye to eye contact is a fundamental part of human interaction, so much so that "our eyes met across a crowded room" has become a cliché. Gaze is the main focus of the research. Boyle et al.'s (1994) work provided good evidence of the importance of gaze in face to face communication. It therefore makes sense to study its use in video conferencing. We would expect a priori that the addition of a technically based visual channel would significantly decrease the number of verbal strategies used, as the interlocutors could actively use gaze to convey non verbal signals.

Anderson, Newlands et al. (1996) found that the addition of the visual channel did not bring the dialogue length or the number of interruptions to the level of face to face communication on the same task. In fact, in comparison with an audio only condition, the dialogue length was not significantly better. The overall communicative strategies in the video conferencing condition more closely resembled an audio only encounter than a face to face one. This was despite the fact that the subjects did actively gaze at one another. O'Malley, Langton et al. (1996) also included gaze analysis in their experiments. Their gaze analysis results are similar to, but perhaps more revealing than, those of Anderson, Newlands et al. (1996). Again their video conferencing condition used communicative strategies that more closely resembled an audio only encounter. Again the presumed benefits that a visual channel would bring to a video conferencing encounter (non verbal cues, and a consequent reduction in verbal strategies) failed to appear. Interestingly, their video conferencing subjects used significantly more `gazes' than subjects in the face to face condition. Subjects seem to try harder to communicate visually.
As we reported earlier, O'Malley et al.'s apparatus contained a `videotunnel', which should facilitate eye to eye contact when one gazes (presuming that visual angles were held constant). This result suggests that there is indeed some "missing link" in video conferencing's apparatus. Again, image size may be an unaccounted variable.

What the video conferencing literature seems to suggest, in the light of its comparison with face to face communication, is that face to face communication is a much more subtle process than was first envisaged. Consequently, it is our assertion that these subtleties are being lost or are missing altogether in video conferencing applications.
This, we feel, is evident in the absence from the video conferencing literature of any study of the more subtle forms of non verbal signal. Notable omissions are the study of facial expressions and, as we have previously said, of full size images. Facial expression has been shown to have pan-cultural consistency in both the encoding and decoding of information (Ekman, 1971). The effect of facial expression would seem, subjectively, to be a powerful one.

The literature on video conferencing has yet to study it. Considering that the belief exists in some quarters that the head and face are the most important areas in face to face communication, its exclusion is surprising. Perhaps because it is a more subtle, some would say subjective, issue, it has been easy to ignore. As you have seen, it would appear that it is the subtleties that are absent from video conferencing. Thus it may be pertinent to make an effort to study facial expressions and ways of maximising their transmission.

Video conferencing is in no way a failure. If we created that impression, we are sorry. Video conferencing completes the tasks set for it as well as any existing telecommunications system. Given a little more time on the same task, its users complete the task just as well as face to face communicators. But given its supposed potential to emulate face to face communication, it still has some way to go before that goal is fully achieved.

So, in conclusion, what do we want you to take away from this review? Firstly, video conferencing has the potential to deliver some, if not all, of the utopian goals claimed in its name. At the moment it is failing to deliver these goals in several ways. I believe what we have shown is that the answers to video conferencing's shortfalls are not simple ones. For one thing, the users of video conferencing are all inexperienced in comparison with users of telecommunications and face to face interaction. It may be that the answer is merely a question of establishing new communicative strategies for the use of video conferencing. I seriously doubt that it is as simple as that. Still, it is an issue that needs to be addressed.

Our own main issues centred on replicating a sense of communicative intimacy. For us, image size is crucial to this. A medium is also needed that allows the subtleties of intimate human communication to be encoded, transmitted and decoded. We spoke at length about facial expression, eye to eye contact and image size as factors that have scarcely been researched as yet and that may be a pivotal part of the puzzle. At the moment, successful use of a video conferencing application is dependent on too many factors working together at once, which makes success precarious. In our slightly facetious title we said, "a nod is as good as a wink." The completion of this bit of folk psychology goes as follows: "to a blind man." At the moment, although video conferencing is capable of sight, it might as well be blind.


ANDERSON, NEWLANDS, et al. (1996)
Impact of video-mediated communication on simulated service encounter.
Interacting with Computers, 8 (2), pp. 193-206.

BOYLE et al. (1994)
The effects of visibility on dialogue and task performance in a
co-operative problem solving task.
Language and Speech 37, (1), pp. 1-20.

BRUCE, V. (1996)
The role of the face in communication: implications for videophone design.
Interacting with Computers, 8 (2), pp.166-176.

CLARK, H.H, & WILKES-GIBBS, D. (1986),
Referring as a collaborative process.
Cognition 22 pp.1-39

EKMAN, P. (1971)
Universals and cultural differences in facial expression and emotion.
in COLE, J.K.(Ed) Nebraska Symposium on Motivation
Nebraska University Press, Lincoln.

GOWAN, J.A., & DOWNS, J.M., (1994),
Video Conferencing Human-Machine Interface- a field study.
Information & Management, 27, (6), pp.341-356.

JAMESON, HOBSLEY, et al. (1996)
Real-time interactivity on the SuperJANET network.
Interacting with Computers, 8 (3), pp.285-295.

Videoconferencing in a language learning application.
Interacting with Computers, 8 (2), pp. 207-217.

O'CONNELL, KOWAL, & KALTENBACHER (1990)
Turn Taking: A Critical Analysis of the research tradition.
Journal of Psycholinguistic Research 19, pp. 345-379.

O'MALLEY, LANGTON, et al. (1996)
Comparison of face-to-face and video-mediated interaction.
Interacting with Computers, 8 (2), pp. 177-192.

NICHOLLS, A. (1997),
Towards new horizons. Distance learning: your flexible friend.
Educational Supplement, The Guardian, Tuesday 21 January, pp II-III.

RAMSEY, J., BARABESI, A., & PREECE, J. (1996),
Informal communication is about sharing objects and media.
Interacting with Computers, 8 (2), pp. 277-283.

ROSENFELD, H.M. (1987)
Conversational control function of Non-Verbal Behaviour.
In Siegman, A., & Feldstein, S. (eds), Nonverbal Behavior and Communication
(pp. 563-601). Hillsdale, NJ: LEA.

SCHOBER, M., & CLARK, H.H, (1989)
Understanding by addressees and overhearers.
Cognitive Psychology 21, pp. 211-232.

SHORT et al. (1976)
The Social Psychology of Telecommunication.
Wiley, London.

VANHORN, R. (1996),
Sorting it out - Distance learning, video conferencing and desk-top
video conferencing.
Phi Delta Kappan 77, (9), pp. 646-647.

WATSON, A.,& SASSE, M.A.(1996),
Evaluating audio and video quality in low cost multimedia
conferencing systems.
Interacting with Computers, 8 (3), pp.255-275.

WILKES-GIBBS,D., & CLARK, H.H., (1992),
Co-ordinating beliefs in conversation
Journal of Memory and Language 31, pp. 183-194.