Last changed 23 Apr 2001 ............... Length about 4,000 words (25,000 bytes).
This is a WWW document maintained by Steve Draper. You may copy it.


Ethics of user action recording

Stephen W. Draper

This is to collect some personal notes about ethical issues of collecting user input data. Although I don't mind who reads this, it is still an early and rough form of my developing views.

Chase up Bellotti papers on feedback and control

No discussion of benefits

This document only discusses problems or costs of collecting personal data, not the benefits. In any specific case, people will (if given a choice) decide whether to participate on the basis of the balance of costs and benefits as seen by them. However it is the business of technologists and researchers to develop and describe the benefits, and one they can be relied on to do. This document is about considering what they will often not bother to write about: the disadvantages, and how to reduce or overcome those disadvantages.

In fact benefits often have two aspects: the possible benefit, and the probability of that coming about. This is particularly so with research, where there is often a pretty small chance of a rather large benefit. After all, research is about pursuing very uncertain issues: that is what makes it research, not development. However uncertainty is pretty unappealing to ordinary people. That makes it even more important to reduce the disadvantages and costs for people we wish to give us their permission to collect data from, so that they need only the smallest belief in there being future benefits.

However, is this avoidance of reasoning about benefits possible? Perhaps not, as we see below, if we accept the argument that we must invest in providing privacy controls. Privacy is a benefit, after all.

General requirements

In general, we have to consider privacy and awareness.

Privacy: most users will need to keep private some of their computer actions: e.g. passwords, confidential communications about themselves or others.

Awareness: It is not obvious what recording could make visible. E.g. ?? Whether for ourselves or our users, whose agreement to participate we need, it is hard to control this. And without awareness, there can be no genuinely informed consent.

Technical approaches

  1. Require signoff acknowledgement on starting up the software every time: no signoff, no software use. Of course this means many will switch it off, i.e. won't sign, so their data won't be collected.
  2. Have a permanently visible reminder (but do not require action every time). Simply having a clear sign or notice might be enough: the analogue is that seeing someone else tells you that you are in public / visible.
  3. Provide a control over the recording, allowing it to be turned off (and on).
  4. Broadcast information on EVERY use, particularly new uses that extend what can be drawn from the collection. I.e. every time a new type of data retrieval / inference is developed, publish this to the users.

Usability Principles

In practice, the principles about privacy and awareness must be combined with usability issues. Convenience means a user does not want extra actions inserted into their regular work, which is why acknowledging or explicitly authorising collection every session is not desirable. Seeing controls as a reminder is better than having to do something every time, but is perhaps unnecessary if we could be sure that a) users had previously understood and thought fully about the issue, and b) users would recall it to conscious consideration whenever appropriate.

What is wanted is a) no collection without awareness; but b) once genuinely considered, it should be OK for it to fall back below awareness; yet c) the user should not be led into allowing it without appropriate awareness. The last is probably in part about periodic reviews, and about spelling out advanced implications at a time when the user has the attention and knowledge to spare.

But probably it means something like (cf. 14 day retraction period for consumer (financial?) products):
We can start up new software for a user with minimal warning that it includes recording, but then need a graded set of reminders about the options to disable the collection and/or about the inferences that are drawn from it. Require periodic reconfirmation? Or reminders that are reduced only according to how long the user has spent reading the information on the consequences? Or on passing a quiz on them?
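
One way the graded reminders might be sketched: the gap between reminders grows only as the user accumulates time with the consequences information open, and is capped so that a periodic review still happens. Every name and threshold below is an assumption made for illustration, and time with a page open is only a rough proxy for having read it.

```python
from dataclasses import dataclass

# Illustrative assumptions, not taken from any real system.
BASE_INTERVAL_DAYS = 1      # remind daily at first
MAX_INTERVAL_DAYS = 90      # never stay silent longer than this: forces periodic review
SECONDS_PER_DOUBLING = 60   # each minute spent reading doubles the gap


@dataclass
class ReminderState:
    seconds_spent_reading: float = 0.0  # time the information page was open

    def record_reading(self, seconds: float) -> None:
        """Accumulate time the user spent on the consequences information."""
        self.seconds_spent_reading += max(0.0, seconds)

    def interval_days(self) -> int:
        """Days until the next reminder: grows with reading time, but is capped."""
        doublings = int(self.seconds_spent_reading // SECONDS_PER_DOUBLING)
        doublings = min(doublings, 10)  # keep the arithmetic bounded
        return min(BASE_INTERVAL_DAYS * 2 ** doublings, MAX_INTERVAL_DAYS)
```

A quiz score could be fed in the same way as reading time; the cap is what keeps the reminders from ever disappearing entirely.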

In other words, all the techniques we see in current UIDs (about blending convenience, personal settings, and warning of dangerous actions) should here be applied to protecting the subject (person being observed).

Layers of attention; how to manage them

As noted above, people make decisions by attending to something for a short period, and then want to forget it, having the decision remain effective without further action and attention. If it can't be done just once, then they need to have, or to set up, a review mechanism: they review the decision periodically, or when a problem arises (in which case they need a mechanism to have it brought to their attention).

How is this supported in software?

  1. Ideally, by finding one solution that suits everybody, so it can be wired in and no decisions or actions by users are required.
  2. Next best, is if the software can inspect the context of that user, and make the decision correctly for that user without bothering them. This is now often achieved by self-installation software.
  3. Next best: personal settings. The user may/must set various parameters, which are then remembered by the software. They can sometimes afford to defer this, then personalise the settings after a bit. Often this turns out to require the software to remember 3 sets of settings: factory settings (which hopefully allow the software at least to be used, even if not optimally, when switched on); users' permanent settings, once made; temporary state, allowing users to change modes but not have them remembered for long.
  4. Modes: user explicitly changes state whenever needed. This classically leads to mode errors: when users forget to change state appropriately. The errors arise because users must maintain a memory of the state in their heads (but may forget); AND because the actions they take must depend on the prior mode/state: this means the actions cannot be routinised, as they vary not just with what they want to achieve next, but with what the last state was.

  5. Routine alerts: whenever the user requests a dangerous (irreversible and/or costly) action, the software queries them requiring an extra confirmation. Habit frequently negates this, by getting the user always to confirm without even reading, much less thinking about, the alert. (This is thought to contribute to train drivers going through red signals where the same AWS warning is routine in areas where most signals are yellow.)
  6. Send users the consequences of their current settings: in interaction recording, summaries of the data collected and inferences drawn?
  7. Repeat alerts until timing information shows users have probably thought about it and actually read the small print / warnings. (Cf. on railways, timing trains approaching junctions/speed restrictions, until it's certain they have not just seen the signal but slowed down.)
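
The three sets of settings described in item 3 can be sketched as a layered lookup. This is a minimal illustration: the setting names and the privacy-protective factory default are assumptions, and the saving of permanent settings to disk is left out.

```python
from collections import ChainMap

# Hypothetical setting names; "off" as the factory default is an assumption.
FACTORY = {"recording": "off", "reminder": "visible"}


class LayeredSettings:
    def __init__(self, permanent=None):
        self.permanent = dict(permanent or {})  # would be saved across sessions
        self.temporary = {}                     # discarded at end of session
        # ChainMap looks keys up in order: temporary, then permanent, then factory.
        self._view = ChainMap(self.temporary, self.permanent, FACTORY)

    def get(self, key):
        return self._view[key]

    def set_permanent(self, key, value):
        self.permanent[key] = value

    def set_temporary(self, key, value):
        self.temporary[key] = value

    def end_session(self):
        """Temporary state is forgotten; permanent choices survive."""
        self.temporary.clear()
```

With this shape, a user can switch recording off for one sitting without losing a considered long-term choice, and a choice once made does not need repeating every session.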

In general, we have to set up not just the software settings, but the users' habits. And furthermore, set up periodic review and/or warnings when the situation has changed so the setting needs to be reviewed. But also to allow easy (simple as well as fast) startup for new software adoption; i.e. allow immediate startup without specifying many parameters, but have these reviewed as usage settles down.

Spectrum of behaviour

  1. Concealed observation, then sudden publication: the News of the World tactic. Widely regarded as distasteful, but legal. It uses deception and concealment to bring out things which an individual does not advertise, and feels ashamed of if they are advertised.
    1b. Most of us would in fact feel this, not about actually unethical behaviour, but just if a camera were concealed in our bedrooms and bathrooms.

  2. Concealed observation, and permanently secret records: police state methods, but also those required in personal banking. Again, information you thought was limited gets to a wider audience you don't know about.

  3. Concealed recording of public actions. You know you are visible when going into any public space, so should you even be told about it?
    3b. Being filmed in public, and the film distributed: in fact, modern technology is an issue here, as it affects distribution and inference from the observation. Normally we (seem to) limit people's observation of us to events and their immediate consequences as communicative acts. Recordings mean that our actions get displaced in time and space to other audiences, ...

  4. You are told about the observation, but cannot stop it. You knew, by virtue of being in a public space, that you were visible: this just confirms it, and perhaps warns you about the recording aspect. This is the DPA (Data Protection Act) position: you must be notified, and have access to the records, but have no rights to prevent it.

  5. You are given control over the recording happening or not.

Our present action

Presently we are informing students (participants, subjects); and giving them control to turn it off; but NOT remembering that setting, so it requires action every time to turn it off.
We aren't remembering that state because it is not convenient to our software design.
Furthermore, we aren't reminding them (if they turned recording up to total) to turn it down when they do confidential stuff. Perhaps we should prompt or act, to turn it down to concealing keystrokes, when they switch to email, and perhaps when they switch to Word?

Probably the software, and the architecture from which it springs, should be designed to: a) set things via defaults AND reminders so that it nearly always gets confidentiality right for most people; b) allow users to override it when they want; c) then, but only then, is it reasonable to allow the path of least resistance also to suit the collectors.
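
Points a) and b) might look something like the following. The level names, the application names, and the choice of which applications are capped by default are all assumptions made for illustration.

```python
# Hypothetical recording levels, ordered from least to most revealing.
LEVELS = ["off", "focus_only", "keystrokes_concealed", "full"]

# Assumed defaults: applications likely to hold confidential text are capped.
DEFAULT_CAP = {"email": "keystrokes_concealed", "word": "keystrokes_concealed"}


def effective_level(requested, app, overrides=None):
    """Level to record at when `app` gains focus.

    `requested` is the level the user chose in general; `overrides` holds the
    user's explicit per-application choices, which always win over defaults.
    """
    overrides = overrides or {}
    if app in overrides:
        return overrides[app]
    cap = DEFAULT_CAP.get(app)
    if cap is None:
        return requested
    # Take whichever of the two levels reveals less.
    return min(requested, cap, key=LEVELS.index)
```

Here a user who turned recording up to "full" is quietly turned down on switching to email, unless they have explicitly said otherwise for that application.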

The principle of responsible research

Why we must adhere to the principle

The notion is that principles -- here the ethical issues to do with data collection -- should be wired into the basic architecture we invent, and not just left for later users of our architecture as a commentary on desirable practice.

This is desirable because:

  1. Depressing social psychology studies have shown that, while a small minority may be principled, most of us will do terrible things if led into it by the situation. If we want a society where most people behave well, we need to put ourselves into situations where we are led into behaving well.
  2. Civil engineering is currently in trouble here. Big dam projects in India and China are currently criticised because they destroy more people's lives (by flooding) than they improve by water and electricity supply. The engineers pursue their ends, and leave it to others to compensate for loss of land and housing. Those others then don't get round to delivering, meaning that the net effect of the engineers is to make life worse. On this analogy, it is a very bad sign that we aren't remembering users' privacy settings across login sessions because it is inconvenient in our architecture: this, on the argument above, shows we have a bad architecture: one that favours stealing data, and ignoring privacy concerns. Indeed, one that promotes this unethical behaviour to the extent that it has determined our own behaviour in this, our first case.
  3. Note that people (here, engineers who may use our architecture) can often be influenced by very small things. It has been shown that simply having a yard of garden in front of a house greatly reduces vandalism and burglary: a garden offers zero physical or legal barrier, but anyone entering that front garden feels (rightly) that they have to have a reason for it. We should consider setting things up, so that it is both easier to do the right than the wrong thing, and that what the engineer does is visible and makes them feel they are publicly accountable for the settings they adopt.
  4. The growing number of ways that modern ICT allows data to be both collected and processed means that much more care must be taken by everyone, both users and developers, to understand what their visibility is, and so how to control their privacy. See CACM, Feb. 2001 issue for some example issues.

What this means for us

So we can't just do collection: we must also put a lot of development effort into active privacy controls for our users, even if this isn't our main or original interest. Put another way, it looks as if this point is one of our first discoveries about what must be done in this field.

There are two main approaches we could take. One is to collect everything in a persistent store and put the controls on its exit: on what gets used and by whom. The other is to put filters at the point of collection: controls for the user (and also ourselves) on whatever gets collected, so that sensitive stuff doesn't get sent across the net (with the potential problems that brings). In Grumps, we are exploring the latter.

Our present Grumps privacy controls

It is looking as if this is more difficult to do right than we had realised. The stages are something like:
  1. Do nothing. This means few users would be right to trust us, as we make it difficult or impossible for them to protect what they need to protect.
  2. Our first implementation: offer control in terms of the classes of events recorded i.e. the control mirrors low level technical categories. It turns out there is already a considerable problem with this. Our second most secure category or level is to record only NT window focus events. These store information about the window including its title. It turns out that there is quite sensitive information in these e.g. email addresses, user's names, login names, home directories etc.
  3. Many users, and many of us, think that what we want is privacy control in terms of software applications e.g. turn recording off for email, but on for other things. However MPA says this wouldn't be much use for him: many of his emails are public domain, but private stuff might be typed into a number of different applications (e.g. creating email text in a word processor, ...).
  4. We now think, in general, we need to create filters of arbitrary complexity but supported by authoring software we develop, that are installed at the point of collection; and use these not only for the investigators' needs, but equally for users to control their privacy.
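
As a rough sketch of what filters at the point of collection could look like, the following treats each filter as a function over an event, so that users and investigators can compose them freely. The event format (a dict with "type" and "title" keys) is hypothetical and is not the Grumps format; the email pattern is deliberately simplified.

```python
import re

# Simplified pattern for email addresses; real addresses can be more varied.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(event):
    """Replace email addresses in a window title with a placeholder."""
    if "title" in event:
        event = {**event, "title": EMAIL.sub("[email]", event["title"])}
    return event


def drop_keystrokes(event):
    """Discard keystroke events entirely; returning None means 'do not send'."""
    return None if event.get("type") == "key" else event


def apply_filters(event, filters):
    """Run an event through each filter in turn, before anything leaves the machine."""
    for f in filters:
        event = f(event)
        if event is None:
            return None
    return event
```

For example, `apply_filters(event, [redact_emails, drop_keystrokes])` would pass window focus events on with addresses masked in their titles, and would send nothing at all for keystrokes.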


Why do people care?

Why do people care about privacy? I'm going to suggest two classes of reason: the costs of communication, and trying to exploit others' ignorance.

Communication costs

Human communication turns out to rely heavily on inferences by the hearer to fill in what the speaker says, and so reduce by a large factor the amount that must be stated explicitly. This is quite intentional by the speaker (it saves them work and time), but it means a lot of mental effort is spent by both parties in making, managing, and correcting the inferences made. One of the things that makes writing so much harder than conversation is shaping the text to elicit all but only the intended inferences in the reader (who cannot use dialogue to check and correct these on the fly). Another feature that has come out of studies of purposeful dialogues (i.e. those to do with practical joint action, as opposed to idle conversation) is that a lot of work is done to set up and achieve a common set of beliefs in the facts, but often only as far as they are relevant to the task at hand: it is often just too much work to correct all the irrelevant misconceptions as well.

Observing others, no matter how (e.g. overhearing their conversations) also gives rise to copious inferences. However because these are not the subject of interaction, they often contain many mistakes. On the other hand, people often do a lot of work to present themselves socially in a certain way e.g. as reasonable. That is, they spend considerable time and effort offering explanations ("accounts") of themselves, their actions, and their motives. A basic reason for this is that not just conversation but all our interactions with people largely depend upon their ability to guess correctly what we want: whether we are moving past them on the pavement or selecting things to mention to our boss.

Being observed when you were unaware of it, or being observed in new ways as technology develops, puts you in the position of either losing the successful presentation of yourself to others that we all work on to some extent, or else having to do a lot more work on it. This must make people reluctant because it's too much effort to go explaining oneself all the time: actively communicating the impression you make and overcoming mis-inferences; and equally explaining what is actually the case, the facts.

There are probably two fundamental sources of the difference that makes inference and explanation so central to our communication: firstly, people just know different things from each other, even in little things; and secondly, people have different values.

Basically: being more visible is likely to involve us in giving more explanations of our behaviour, but without any benefit in terms of more beneficial actions from others. (Targeted marketing based on records of our purchases might be an exception to this.)

Exploiting ignorance

The effort of explaining oneself and other things is one deterrent. Separately from that, there are other motives for not exposing all the information you have. What people do depends on what they know; sometimes, their ignorance leads them to act in ways you prefer.

One example is information about price. But equally, other kinds of information can change market (purchase) behaviour: about the quality of the goods, or about where this food has come from. Similarly, how people vote, or act in other ways, can be affected by the information they are given or prevented from getting. Another example is crime: fraud depends upon withholding information, and indeed on purveying false data.

In my value system, these are all bad reasons; but they are quite widely important.

Candidate principles

  1. The principle of visible observation. Like seeing someone else in the room or a bulky CCTV camera, you know you are in view, even if you don't interact with the observer. This is what we are inured to in pre-technological life; and is a considerable part of the distinction between public and private spaces. Being observed when we assumed we were private is what annoys and threatens us. The distinction between people at the next table being able to hear, versus someone listening at a keyhole.
  2. The principle of mutual visibility. Being seen only by people you can see in an equal, reciprocal relationship. The analogue is being in the same room or same street. Reports suggest that CCTV in public places leads to bad behaviour by the control room staff because they can see without being seen, in ways that would probably not occur if they were themselves equally surveyed. The fix there would be to broadcast a camera showing the control room staff on to public screens in the spaces being surveyed. For user input recording, it suggests that
    2a. the data gatherers should also have their input actions gathered, and be as visible to the other users as the latters' data is to the gatherers. It may also suggest that
    2b. all data retrieval actions (and who did them) should be recorded and made public, since these are the equivalent of observing the original users. In other words, one should be able to know, not only what data is recorded about oneself, but who has inspected it.
  3. The principle of actively informing about (new) consequences. We may need to extend the above principles because the implications and extended uses of data in new technology are not familiar to many users, and indeed may only be discovered and developed as time goes by, contexts change, technology develops. Thus informed consent requires extended informing (and "re-consent") over time. So this principle proposes that each time a new application of, or inference from, a type of data is developed, all the users contributing to the data should be informed. [That anecdote on the person who allowed their input to be visible on the web, only to find that a hostile summary of their days' performance was broadcast to the world.]
    There are two aspects here:
    3a. When new inference methods are developed. Need to warn subjects, since they are now "visible" in a new way.
    3b. When the subject's circumstances change to allow new inferences to work on them. When they start up, their actions may not mean much, but later they will. Eg1: if you only spend about 10% via a credit card, having it public doesn't mean the same as when you spend 90% that way and your whole income can be estimated from it. Eg2: you start using a computer as a student and anyone could look at it, but when your job begins to require you to write confidential references, or to store exam marks then your privacy requirements change.
    3ab: Note that both might be satisfied by sending to each user all the inferences / summaries about them that the system can make.
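
Principle 3a could be sketched as a registry that knows which users contribute which type of data, and queues a message to each of them whenever a new inference over that type is registered. The class and the data type names are hypothetical; changes in a subject's circumstances (3b) are not detected here, beyond telling someone who starts contributing a data type what can already be inferred from it.

```python
from collections import defaultdict


class InferenceRegistry:
    """Tracks which users contribute which data types, and who must be told what."""

    def __init__(self):
        self.contributors = defaultdict(set)  # data type -> user ids
        self.inferences = defaultdict(list)   # data type -> inference descriptions
        self.outbox = []                      # (user, message) pairs awaiting delivery

    def add_contributor(self, user, data_type):
        """A user starts contributing a data type: tell them what is already inferred."""
        self.contributors[data_type].add(user)
        for description in self.inferences[data_type]:
            self.outbox.append((user, description))

    def register_inference(self, data_type, description):
        """A new inference is developed: tell every current contributor."""
        self.inferences[data_type].append(description)
        for user in sorted(self.contributors[data_type]):
            self.outbox.append((user, description))
```

The design choice is that no inference can be added to the system without passing through `register_inference`, so informing users becomes a side effect of doing the work rather than a separate duty that can be forgotten.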


Note that my principles really amount to:
  1. Show you are being observed. But this still gives no information on what audience to prepare your actions for.
  2. Show who is observing you. Tells you the audience, but gives no feedback on the kind of inference they are drawing about you.
  3. Show you what inferences can be drawn about you from the observation.
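
The second of these, showing who is observing you, might be supported by a store in which every read is itself recorded, in the spirit of 2b above. This is a minimal sketch with hypothetical names; a real system would also need to prevent reads that bypass the store.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical data store in which every read is itself recorded."""

    def __init__(self):
        self._records = {}    # subject -> list of recorded events
        self.access_log = []  # (time, reader, subject), open to inspection

    def record(self, subject, event):
        self._records.setdefault(subject, []).append(event)

    def read(self, reader, subject):
        """Return a subject's data, logging who looked and when."""
        self.access_log.append((datetime.now(timezone.utc), reader, subject))
        return list(self._records.get(subject, []))

    def who_inspected(self, subject):
        """Names of everyone who has read this subject's data, in order."""
        return [reader for _, reader, s in self.access_log if s == subject]
```

A subject can then call `who_inspected` on their own name and see their audience, which is the reciprocal visibility the principle asks for.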

Note too that there are both social AND technical means of control. Do the principles express this?

Focus on actively informing about new consequences

More and more, I favour focusing on this principle alone.

Miscellaneous aspects

Secret detective work vs. public deterrence

Related to that are the two contrasting approaches to policing: having secret detectives that work by stealth and even deception as a counter to committed criminals, versus very public monitoring that works by deterrence e.g. uniformed patrols, neighbourhood watch, large signs marking burglar alarms, etc. The latter are extensions of communities where everyone's actions are open to scrutiny. ICT offers extended opportunities in both directions, just as it offers new opportunities for malicious and criminal use wherever any individual uses it in ways that do not follow the principles discussed above.


We must note that anonymising data is not a simple issue. In many particular cases, identities can be re-deduced from other data. To take a simple example, anonymous marking of student scripts probably works well in a large class, but can seldom be possible at all in a class of six, except by markers who have never met the class, nor heard anything about the individuals.
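
One simple way to make this risk concrete is to count how many records share each combination of visible attributes; small groups are the ones where identities can be re-deduced. This is the idea behind k-anonymity, a technique these notes do not themselves name, and the field names in the example are hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return combinations of attribute values shared by fewer than k records.

    Records in such groups are at risk of re-identification even with names
    removed, because few people match the same visible attributes.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}
```

In a class of six, almost any pair of attributes (course option, handwriting, topic chosen) will produce groups of one, which is the situation described above.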

Privacy measures now

Locking doors, drawing window blinds, locking filing cabinets.

Why do people care? Because it's too much effort to go explaining oneself all the time: actively communicating the impression you make and overcoming mis-inferences. Because people are just too different from each other in what they expect, value, do.

What annoys people now

What evidence can bear on all this?

People feel betrayed by snooping, and this appears in moral codes, legal restrictions, and outrage: listening at keyholes, opening mail.

Recording one's own phone conversations is legally restricted in some but not other places: after all, only the intended audience gets it in the first instance.
Pressing one's nose to someone else's window: invasion, but essentially, inviting oneself into someone else's space.
Spying by telescope.

Turning on all recording and publishing it, while thinking of learning to program; and not realising that typing personal emails has other implications.
