23 Apr 2001 ............... Length about 4,000 words (25,000 bytes).
This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/grumps/ethics.html.
You may copy it.
Stephen W. Draper
This is to collect some personal notes about ethical issues of
collecting user input data.
Although I don't mind who reads this, it is still an early and rough form
of my developing views.
Chase up Bellotti papers on feedback and control
This document only discusses problems or costs of collecting personal
data, not the benefits. In any specific case, people will (if given a choice)
decide whether to participate on the basis of the balance of costs and benefits
as seen by them. However, it is the business of technologists and researchers
to develop and describe the benefits, and a job they can be relied on to do.
This document is about what they will often not bother to write
about: the disadvantages, and how to reduce or overcome those disadvantages.
In fact benefits often have two aspects: the possible benefit, and the
probability of that coming about. This is particularly so with research, where
there is often a pretty small chance of a rather large benefit. After all,
research is about pursuing very uncertain issues: that is what makes it
research, not development. However, uncertainty is pretty unappealing to
ordinary people. That makes it even more important to reduce the disadvantages
and costs for the people we wish to give us their permission to collect data
from, so that they need only the smallest belief in there being future
benefits in order to agree.
However, is this avoidance of reasoning about benefits possible? Perhaps not,
as we see below, if we accept the argument that we must invest in providing
privacy controls. Privacy is a benefit, after all.
In general, we have to consider privacy and awareness.
Privacy: most users will need to keep private some of their computer actions:
e.g. passwords, confidential communications about themselves or others.
Awareness: it is not obvious what recording could make visible.
Whether for ourselves or for our users, whose agreement to participate we need,
it is hard to control this. And without awareness, there can be no genuinely
informed consent. Possible mechanisms:
- Require signoff acknowledgement on starting up the software
every time: no signoff, no software use. Of course this means many will switch
it off, i.e. won't sign, so their data won't be collected.
- Have a permanently visible reminder (but do not require action every time).
Simply having a clear sign or notice might be enough: the analogue is that
seeing someone else tells you that you are in public / visible.
- Provide a control over the recording, allowing it to be turned off (and on).
- Broadcast information on EVERY use, particularly new uses that extend what
can be drawn from the collection. I.e. every time a new type of data retrieval
/ inference is developed, publish this to the users.
In addition to the principles about privacy and awareness, in practice
these must be combined with usability considerations. Convenience means users
do not want extra actions inserted into their regular work, which is
why requiring acknowledgement or explicit authorisation of collection every
session is undesirable. Seeing controls as a reminder is better than having to do something
every time, but perhaps unnecessary if we could be sure a) users had previously
understood and thought fully about the issue, b) users would recall it to
conscious consideration whenever appropriate.
What is wanted is a) no collection without awareness; but b) once genuinely
considered it should be OK for it to fall back below awareness; yet c) not
being led into allowing it without appropriate awareness. The last is probably
in part about periodic reviews, and about spelling out the more advanced
implications at some time when the user has the attention and knowledge to spare.
But probably it means something like this (cf. the 14-day retraction period for
consumer purchases): a user can start up new software with only minimal warning
that it includes recording, but then needs a graded set of reminders about the
options to disable the collection and/or about the inferences that are drawn
from it. Require periodic reconfirmation? Or reminders that are only reduced
depending on how long the user has spent reading the information on the
consequences, or on passing a quiz on it?
In other words, all the techniques we see in current UIDs (about blending
convenience, personal settings, and warning of dangerous actions) should here
be applied to protecting the subject (person being observed).
As noted above, people make decisions by attending to something for a
short period, and then want to forget it: having the decision remain effective
without further action and attention. If it can't be done just once, then they
need to have/ to set up a review mechanism, where they review the decision
periodically, or when a problem arises (in which case, they need a mechanism to
have it brought to their attention).
How is this supported in software?
- Ideally, by finding one solution that suits everybody, so it can be wired in
and no decisions or actions by users are required.
- Next best, is if the software can inspect the context of that user, and make
the decision correctly for that user without bothering them. This is now often
achieved by self-installation software.
- Next best: personal settings. The user may/must set various parameters,
which are then remembered by the software. They can sometimes afford to defer
this, then personalise the settings after a bit. Often this turns out to
require the software to remember 3 sets of settings: factory settings (which
hopefully allow the software at least to be used, even if not optimally, when
switched on); users' permanent settings, once made; temporary state, allowing
users to change modes but not have them remembered for long.
- Modes: user explicitly changes state whenever needed. This classically leads
to mode errors: when users forget to change state appropriately. The errors
arise because users must maintain a memory of the state in their heads (but may
forget); AND because the actions they take must depend on the prior mode/state:
this means the actions cannot be routinised, as they vary not just with what
they want to achieve next, but with what the last state was.
- Routine alerts: whenever the user requests a dangerous (irreversible and/or
costly) action, the software queries them requiring an extra confirmation.
Habit frequently negates this, by getting the user always to confirm without
even reading, much less thinking about, the alert. (This is thought to
contribute to train drivers going through red signals where the same AWS
warning is routine in areas where most signals are yellow.)
- Send users the consequences of their current settings: in interaction
recording, summaries of the data collected and inferences drawn?
- Repeat alerts until timing information shows users have probably thought
about it and actually read the small print / warnings. (Cf. on railways,
timing trains approaching junctions/speed restrictions, until it's certain they
have not just seen the signal but slowed down.)
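The "personal settings" option above, with its three sets of settings, can be sketched as a layered lookup. This is a hypothetical illustration, not the actual software: the class and setting names are invented.

```python
class LayeredSettings:
    """Resolve a setting by checking session-only state first, then the
    user's saved preferences, then the factory defaults."""

    def __init__(self, factory):
        self.factory = dict(factory)  # shipped defaults: software usable at once
        self.user = {}                # permanent settings, kept across sessions
        self.temporary = {}           # mode changes, not remembered for long

    def get(self, key):
        for layer in (self.temporary, self.user, self.factory):
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def set_permanent(self, key, value):
        self.user[key] = value

    def set_temporary(self, key, value):
        self.temporary[key] = value

    def end_session(self):
        self.temporary.clear()  # temporary state is deliberately forgotten


settings = LayeredSettings({"record_keystrokes": False})
settings.set_permanent("record_keystrokes", True)   # user opts in
settings.set_temporary("record_keystrokes", False)  # but not during this task
settings.end_session()
print(settings.get("record_keystrokes"))  # True: the permanent choice survives
```

The point of the three layers is exactly the one made above: the factory layer lets the software run at all, the user layer makes a considered decision stick without further attention, and the temporary layer lets a mode change lapse automatically rather than being remembered for too long.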
In general, we have to set up not just the software settings, but the users'
habits. And furthermore, set up periodic review and/or warnings when the
situation has changed so the setting needs to be reviewed. But also to allow
easy (simple as well as fast) startup for new software adoption; i.e. allow
immediate startup without specifying many parameters, but have these reviewed
as usage settles down.
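The "repeat alerts until timing information shows users have probably read them" idea above could be sketched as a confirmation that is only accepted once enough time has passed, by analogy with the railway example. The threshold and the interface are invented for illustration.

```python
import time


class RepeatingAlert:
    """An alert that keeps being re-shown until the user's response time
    suggests the warning text was actually read, not reflexively dismissed."""

    def __init__(self, text, min_reading_seconds=5.0):
        self.text = text
        self.min_reading_seconds = min_reading_seconds  # assumed threshold
        self.acknowledged = False

    def show(self, respond):
        """Display the alert via the given callback; accept the
        confirmation only if enough time passed before it arrived."""
        shown_at = time.monotonic()
        respond(self.text)
        if time.monotonic() - shown_at >= self.min_reading_seconds:
            self.acknowledged = True  # plausibly read: stop repeating

    def needs_repeat(self):
        return not self.acknowledged
```

A dismissal that arrives too quickly leaves the alert unacknowledged, so it would be shown again later: habit can defeat a single confirmation, but not (as easily) a timed one.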
- Concealed observation, then sudden publication: the News of the World
tactic. Widely regarded as distasteful, but legal. It uses deception and
concealment to bring out things which an individual does not advertise, and
feels ashamed of if they are advertised.
1b. Most of us would in fact feel this not only about actually unethical
behaviour, but simply if a camera were concealed in our bedrooms and bathrooms.
- Concealed observation, and permanently secret records: police state
methods, but also those required in personal banking.
Again, information you thought was limited gets to a wider audience you don't expect or control.
- Concealed recording of public actions.
You know you are visible when going into any public space, so should you even
be told about it?
3b Being filmed in public, and the film distributed: in fact, modern technology
is an issue here, as it affects distribution and inference from the
observation. Normally we (seem to) limit people's observation of us to events
and their immediate consequences as communicative acts. Recordings mean that
our actions get displaced in time and space to other audiences, ...
- You are told about the observation, but cannot stop it. You knew, by virtue
of being in a public space, you were visible: this just confirms it, and
perhaps warns you about the recording aspect. This is the DPA (data protection
act) position: you must be notified, and have access to the records, but have
no rights to prevent it.
- You are given control over the recording happening or not.
- Can turn it off whenever you remember to
- It is only turned on when you remember to.
Presently we are informing students (participants, subjects); and
giving them control to turn it off; but NOT remembering that setting, so it
requires action every time to turn it off.
We aren't remembering that state because it is not convenient to our software architecture.
Furthermore, we aren't reminding them (if they turned recording up to total)
to turn it down when they do confidential stuff. Perhaps we should prompt or
act, to turn it down to concealing keystrokes, when they switch to email, and
perhaps when they switch to Word?
Probably the software, and the architecture from which it springs, should be
designed to: a) set things via defaults AND reminders so that it nearly always
gets confidentiality right for most people; b) allow users to override it when
they want; c) then, but only then, is it reasonable to allow the path of least
resistance also to suit the collectors.
The notion is that principles -- here the ethical issues to do
with data collection -- should be wired into the basic architecture we invent,
and not just left for later users of our architecture as a commentary on it.
This is desirable because:
- Depressing social psychology studies have shown that, while a small minority
may be principled, most of us will do terrible things if led into it by the
situation. If we want a society where most people behave well, we need to put
ourselves into situations where we are led into behaving well.
- Civil engineering is currently in trouble here. Big dam projects in India
and China are currently criticised because they destroy more people's lives (by
flooding) than they improve by water and electricity supply. The engineers
pursue their ends, and leave it to others to compensate for loss of land and
housing. Those others then don't get round to delivering, meaning that the net
effect of the engineers is to make life worse. On this analogy, it is a very
bad sign that we aren't remembering users' privacy settings across login
sessions because it is inconvenient in our architecture: this, on the argument
above, shows we have a bad architecture: one that favours stealing data and
ignoring privacy concerns. Indeed, one that promotes this unethical behaviour
to the extent that it has determined our own behaviour in this, our first
implementation.
- Note that people (here, engineers who may use our architecture) can often
be influenced by very small things. It has been shown that simply having a
yard of garden in front of a house greatly reduces vandalism and burglary: a
garden offers zero physical or legal barrier, but anyone entering that front
garden feels (rightly) that they have to have a reason for it. We should
consider setting things up, so that it is both easier to do the right than the
wrong thing, and that what the engineer does is visible and makes them feel
they are publicly accountable for the settings they adopt.
- The growing number of ways that modern ICT allows data to be both
collected and processed means that much more care must be taken by everyone,
both users and developers, to understand what their visibility is, and so how
to control their privacy. See CACM, Feb. 2001 issue for some example issues.
So we can't just do collection: we must also put a lot of development effort
into active privacy controls for our users, even if this isn't our main or
original aim.
Put another way, it looks as if this point is one of our first discoveries
about what must be done in this field.
There are two main approaches we could take. Either to collect everything in a
persistent store and put the controls on its exit: on what gets used and by
whom. Or else, to put filters at the point of collection: controls for the
user (and also ourselves) on whatever gets collected, and so sensitive stuff
doesn't get sent across the net (and the potential problems with that).
In Grumps, we are exploring the latter.
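The second approach, filtering at the point of collection, might look like this in outline. The event shape and the two filter functions are invented for illustration; Grumps' actual design is not shown here.

```python
def drop_keystrokes(event):
    """Filter: block keystroke events entirely."""
    return None if event["type"] == "keystroke" else event


def strip_window_titles(event):
    """Filter: keep window-focus events but discard their titles."""
    if event["type"] == "window_focus":
        event = dict(event, title="<removed>")
    return event


def collect(event, filters, send):
    """Apply every filter at the point of collection; anything a filter
    blocks never leaves the user's machine at all."""
    for f in filters:
        event = f(event)
        if event is None:
            return  # suppressed at source
    send(event)


sent = []
filters = [drop_keystrokes, strip_window_titles]
collect({"type": "keystroke", "key": "a"}, filters, sent.append)
collect({"type": "window_focus", "title": "Inbox - jo@example.org"},
        filters, sent.append)
print(sent)  # [{'type': 'window_focus', 'title': '<removed>'}]
```

The design point is that a blocked event is dropped before `send` is ever called, so sensitive material never crosses the net, in contrast to the persistent-store approach where it is collected and only its exit is controlled.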
It is looking as if this is more difficult to do right than we had realised.
The stages are something like:
- Do nothing. This means few users would be right to trust us, as we make
it difficult or impossible for them to protect what they need to protect.
- Our first implementation: offer control in terms of the classes of events
recorded, i.e. the control mirrors low-level technical categories. It turns
out there is already a considerable problem with this. Our second most secure
category or level is to record only NT window focus events. These store
information about the window, including its title. It turns out that there is
quite sensitive information in these, e.g. email addresses, users' names,
login names, home directories, etc.
- Many users, and many of us, think that what we want is privacy control in
terms of software applications e.g. turn recording off for email, but on for
other things. However MPA says this wouldn't be much use for him: many of his
emails are public domain, but private stuff might be typed into a number of
different applications (e.g. creating email text in a word processor, ...).
- We now think, in general, we need to create filters of arbitrary
complexity but supported by authoring software we develop, that are installed
at the point of collection; and use these not only for the investigators'
needs, but equally for users to control their privacy.
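As the window-title example shows, even an apparently safe event class can leak personal details, so a filter may need to inspect content rather than just event type. A minimal sketch of such a content filter; the patterns below are illustrative only, not a complete or adequate list.

```python
import re

# Illustrative patterns only: real sensitive content is far more varied.
SENSITIVE = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),     # email addresses
    re.compile(r"[A-Za-z]:\\Users\\[^\\\s]+"),  # Windows home directories
    re.compile(r"/home/[^/\s]+"),               # Unix home directories
]


def redact_title(title):
    """Replace sensitive fragments in a window title before it is recorded."""
    for pattern in SENSITIVE:
        title = pattern.sub("[REDACTED]", title)
    return title


print(redact_title("Inbox - jo.bloggs@example.org - Mail"))
# → Inbox - [REDACTED] - Mail
```

This is the sense in which filters of "arbitrary complexity" are needed: no fixed set of event categories captures what is sensitive, and the patterns themselves would need authoring support so that users (not just investigators) can extend them.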
Why do people care about privacy?
I'm going to suggest two classes of reason: the costs
of communication, and trying to exploit others' ignorance.
Human communication turns out to rely heavily on inferences by the hearer to
fill in what the speaker leaves unsaid, and so reduce by a large factor the
amount that must be stated explicitly. This is quite intentional by the speaker (it
saves them work and time), but it means a lot of mental effort is spent by
both parties in making, managing, and correcting the inferences made. One of
the things that makes writing so much harder than conversation, is shaping the
text to elicit all but only the intended inferences in the reader (who cannot
use dialogue to check and correct these on the fly). Another feature that has
come out of studies of purposeful dialogues (i.e. those to do with practical
joint action, as opposed to idle conversation), is that a lot of work is done
to set up and achieve a common set of beliefs in the facts, but often only as
far as they are relevant to the task at hand: it is often just too much work
to correct all the irrelevant misconceptions as well.
Observing others, no matter how (e.g. by overhearing their conversations),
also gives rise to copious inferences. However, because these are not the subject
of interaction, they often contain many mistakes. On the other hand, people
often do a lot of work to present themselves socially in a certain way e.g. as
reasonable. That is, they spend considerable time and effort offering
explanations ("accounts") of themselves, their actions, and their motives. A
basic reason for this is that not just conversation but all our interactions
with people largely depend upon their ability to guess correctly what we want:
whether we are moving past them on the pavement or selecting things to
mention to our boss.
Being observed when you were unaware of it, or being observed in new ways as
technology develops, puts you in the position of either losing the successful
presentation of yourself to others that we all work on to some extent, or else
having to do a lot more work on it. This must make people reluctant to be observed,
because it's too much effort to go explaining oneself all
the time: actively communicating the impression you make and overcoming
mis-inferences; and equally explaining what is actually the case, the facts.
There are probably two fundamental sources of this difference that
makes inference and explanation so central to our communication: firstly,
people just know different things from each other, even in little things; and
secondly, people have different values.
Basically: being more visible is likely to involve us in giving more
explanations of our behaviour, but without any benefit in terms of more
beneficial actions from others. (Targeted marketing based on records of our
purchases might be an exception to this.)
The effort of explaining oneself and other things is one deterrent.
Separately from that, there are other motives for not exposing all the
information you have. What people do depends on what they know; sometimes,
their ignorance leads them to act in ways you prefer.
One example is information about price. But equally, other kinds of
information can change market (purchase) behaviour: about the quality of the
goods, about where this food has come from.
Similarly how people vote, or act in other ways can be affected by the
information they are given or prevented from getting.
Another example is crime: fraud depends upon withholding information, and
indeed on purveying false data.
In my value system, these are all bad reasons; but they are quite widely acted upon.
- The principle of visible observation.
Like seeing someone else in the room or a bulky CCTV camera, you know you are
in view, even if you don't interact with the observer. This is what we are
inured to in pre-technological life; and is a considerable part of the
distinction between public and private spaces. Being observed when we assumed
we were private is what annoys and threatens us. The distinction between
people at the next table being able to hear, versus someone listening at a
keyhole, illustrates this.
- The principle of mutual visibility.
Being seen only by people you can see in an equal, reciprocal relationship.
The analogue is being in the same room or same street. Reports suggest that
CCTV in public places leads to bad behaviour by the control room staff because
they can see without being seen, in ways that would probably not occur if they
were themselves equally surveyed. The fix there would be to broadcast a camera
showing the control room staff on to public screens in the spaces being
surveyed. For user input recording, it suggests that
2a. the data gatherers should also have their input actions gathered, and be as
visible to the other users as those users' data is to the gatherers. It may
also suggest that
2b. all data retrieval actions (and who did them) should be recorded and made
public, since these are the equivalent of observing the original users. In
other words, one should be able to know, not only what data is recorded about
oneself, but who has inspected it.
- The principle of actively informing about (new) consequences.
We may need to extend the above principles because the implications and
extended uses of data in new technology are not familiar to many users, and
indeed may only be discovered and developed as time goes by, contexts change,
technology develops. Thus informed consent requires extended informing (and
"re-consent") over time. So this principle proposes that each time a new
application of, or inference from, a type of data is developed, all the users
contributing to the data should be informed. [That anecdote on the person who
allowed their input to be visible on the web, only to find that a hostile
summary of their day's performance was broadcast to the world.]
There are two aspects here:
3a. When new inference methods are developed. Need to warn subjects, since
they are now "visible" in a new way.
3b. When the subject's circumstances change to allow new inferences to work on
them. When they start up, their actions may not mean much, but later they
will. Eg1: if you only spend about 10% via a credit card, having it public
doesn't mean the same as when you spend 90% that way and your whole income can
be estimated from it. Eg2: you start using a computer as a student and anyone
could look at it, but when your job begins to require you to write confidential
references, or to store exam marks then your privacy requirements change.
3ab: Note that both might be satisfied by sending to each user all the
inferences / summaries about them that the system can make.
Note that my principles really amount to:
- Show you are being observed. But this still gives no information on what
audience to prepare your actions for.
- Show who is observing you. Tells you the audience, but gives no feedback
on the kind of inference they are drawing about you.
- Show you what inferences can be drawn about you from the observation.
Note too that there are both social AND technical means of control. Do the
principles express this?
More and more, I favour focusing on this principle alone.
Related to that is the two contrasting approaches to policing: having
secret detectives that work by stealth and even deception as a counter to
committed criminals, versus very public monitoring that works by deterrence
e.g. uniformed patrols, neighbourhood watch, large signs marking burglar
alarms, etc. The latter are extensions of communities where everyone's actions
are open to scrutiny. ICT offers extended opportunities in both directions,
just as it offers new opportunities for malicious and criminal use wherever any
individual uses it in ways that do not follow the principles discussed above.
- My principle of active publication
- This goes against laws promoting surreptitious monitoring, even by police
- Can it replace the other principles?
- It focusses attention on the feedback loop: letting users learn what the
exposure is, and so how to deal with it. This may be the main issue: what we
hate is having surprises sprung on us. Not least because information is used
like communication; but communication is an active effort, requiring you to
work on how it will be understood by others.
We must note that anonymising data is not a simple issue. In many particular
cases, identities can be re-deduced from other data. To take a simple
example, anonymous marking of student scripts probably works well in a large
class, but can seldom work at all in a class of six, except for markers
who have never met the class, nor heard anything about the individuals.
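The point about small classes can be made concrete: with few individuals, even innocuous attributes combine into a unique fingerprint. A toy illustration, with invented records and attributes:

```python
from collections import Counter

# "Anonymised" records: names removed, quasi-identifying attributes kept.
records = [
    {"degree": "CS",    "year": 3, "lab_group": "A"},
    {"degree": "CS",    "year": 3, "lab_group": "B"},
    {"degree": "Psych", "year": 3, "lab_group": "A"},
    {"degree": "CS",    "year": 4, "lab_group": "A"},
    {"degree": "Psych", "year": 4, "lab_group": "B"},
    {"degree": "CS",    "year": 4, "lab_group": "B"},
]


def unique_fraction(records):
    """Fraction of records whose attribute combination is unique, i.e.
    re-identifiable by anyone who knows those attributes about a person."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    unique = sum(1 for r in records if counts[tuple(sorted(r.items()))] == 1)
    return unique / len(records)


print(unique_fraction(records))  # 1.0: every "anonymous" record is unique
```

In a class of six, every record here is unique, so anyone who already knows a student's degree, year, and lab group can re-identify their "anonymous" record; in a class of hundreds, most combinations would be shared.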
Locking doors, drawing window blinds, locking filing cabinets.
Why do people care? Because it's too much effort to go explaining oneself all
the time: actively communicating the impression you make and overcoming
mis-inferences. Because people are just too different from each other in what
they expect, value, do.
What evidence can bear on all this?
People feel betrayed by snooping, and this appears in moral codes, legal
restrictions, and outrage: listening at keyholes, opening mail.
Recording one's own phone conversations is legally restricted in some but not
other places: after all, only the intended audience gets it in the first place.
Pressing one's nose to someone else's window: invasion, but essentially,
inviting oneself into someone else's space.
Spying by telescope.
Turning on all recording and publishing it, while thinking of learning to
program; and not realising that typing personal emails has other consequences.