Last changed 6 Apr 1998. Length about 2,000 words (13,000 bytes).
This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/miraplans.html.
You may copy it.
MIRA plans
by
Stephen W. Draper
Started April 1998.
This sketches plans for the last two MIRA workshops, and other things.
These people might be invited. They need to be cleared with the EC, and
then written to with the invitation (to get them to put it in their diaries).
Miche (not at City)
Victoria (at City; should be encouraged to come?)
Clare Harvey
Photo woman (GETI? company)
Stefano Mizzaro
Annelise
Josianne? (has now come three times, but is not an official member)
Pia Borlund
Iain Campbell (to talk about his program)
Smeaton's yankee
Date and call for papers for the final conference; in fact, therefore,
decisions about its format and themes are probably needed too.
Dates, hence length, hence content of the Dublin workshop.
Fund a 4th working group (me and Dunlop).
Whether to spend more or less on inviting people.
Last workshop week: Oct. 1998, Dublin. (?Wed-Fri 28-30 Oct 1998)
Schedule largely set by a) reports from the working subgroups and b) rehearsal
of activities for the final workshop/conference.
- A. Reports from the working groups. 1 hour each? Certainly a report; perhaps
ask for a trial interactive exercise.
A1 Joemon / Finland: photo retrieval domain study
A2 Fabio: relevance consensus tests
A3 Miche: designing an MMTC proposal
A4 Dunlop&Draper: start work on a demo illustrating the onion evaluation
model w.r.t. versions of a single piece of IR software. Do evaluation tests,
and save the videos of users failing.
- B. CVR: 1 hour session: planning for the final conference. Presumably the
aims are to get ideas, promises of preparatory work, and consensus within
MIRA. Might not be necessary if enough of the planning below looks acceptable
and/or can be agreed over email within MIRA.
- C. Sessions prompting people to write papers for the final conference.
?Perhaps get MIRA people each to come up with an idea and present a 10 min.
outline?
- D. A debate / panel on interactive TREC. To what extent do we think it is
adequate? Get Smeaton's Yank to present what it is, and perhaps defend it.
Dunlop&Draper, if no-one else, could concoct an "against" position.
- E. 30 mins on senses of information need. E.g. a 10 min. talk by Pia
Borlund; possibly ditto by Stefano; a demo (examples? exercise?) by
Dunlop&Draper to illustrate all this.
E2 ?Pair this with a similar session on senses of relevance judgement? Try to
balance: an abstract statement, one or more good examples, and interactive
exercises e.g. to show the degree of consensus. And relate it forward to
possible implications for an MMTC.
- F1. 30 min. session, plus available as a demo over the evenings etc.
Get Iain Campbell to give a talk on his software. Its importance is that it is
a) image retrieval and b) a radical no-query, all-relevance-feedback
technique. Not just interesting as an approach in itself: it of course also
invalidates the old TC tests as an evaluation approach.
F2 30 min. talk/session. Clare Harvey: we missed her this time. Continue our
education on evaluation techniques.
F3 30 min. talk/session. Invite Dunlop's GETY woman to re-give her talk on a
particular domain of image retrieval.
Final one: Easter 1999, on an island near Glasgow. [Cumbrae; Arran;
Bute; Orkney? The Trossachs and a cruise on Loch Katrine?]
It was decided to have 2 half-weeks (one for IRSG), back to back with a free
weekend in between.
Two aims:
- 1. Dissemination. Any gesture, particularly one with a wider audience, would
do to make it look as if we are disseminating. If, however, we really want to
change minds, then two things are needed:
1a Getting a range of influential yet susceptible attendees
1b Putting on exercises and demos rather than just a few monologues to get the
key points across.
- 2. Motivate wider attendance: this probably means a conference and
publication format. However, some of the topics proposed below would probably
motivate DARPA officials to attend if we notify them, as they relate to what
could be a sensible way forward for testbeds in IR.
- A. Paper sessions. Could go for parallel sessions: more time and more
discussion for each speaker. Or could go for a high-discussion format:
speakers warned to design a 30 min. session led by only a 5 min. monologue ...
A2 [NONE] It would be entirely possible to invite and publish papers without
having the author give a talk at all.
- B. Possible gimmick. Video all sessions; get revelation to mount this video
archive ASAP on the web, both for its own sake for the IR community and as a
multimedia test set (a modern counterpart to a collection of academic papers).
But video multiple copies live (e.g. one super-VHS for the records plus a
chain of 4 domestic VHS recorders); have the tapes instantly available during
the workshop for replay, for viewing the sessions you missed, and for
continuing the discussion; give one of the tapes to each speaker as a leaving
present.
- C. Demos / exercises. Have one from each of our working groups:
C1 Fabio (also Yves/Fermi??): image retrieval relevance tests. The exercise
would be to get every attendee to rate each of a set of test images for
relevance, then pool the answers and show the degree of (lack of) consensus.
(A toy sketch of this pooling step appears after this list.)
C2 Joemon??
C3 Miche: proposal for an MMTC. At the least, some example simulated
information needs, which participants take and then use a test program to try
to satisfy; followed by scoring of the results using previously obtained
domain-user relevance judgements?
C4. Dunlop&Draper: a demo illustrating the onion model of evaluation with
reference to a piece of IR software. For each onion layer, identify at least
one specific IR design issue, and show 2 or more variants of the software with
and without specific support for it. For instance: a) Raya's study mentioned
that her pupils failed to recognise a search engine form/box on screen if it
wasn't labelled in a way they could recognise; b) failing to have the software
highlight, and auto-scroll documents to, the words matching the query has
often been reported to result in users failing to recognise a selected
document as in fact useful to them.
*C5. Re-run Annelise's exercise. Seems a pity to waste all the work that went
into it. This would mean inviting her. Would it be too heavyweight for the
conference? Still, also impressive as a demo of what serious evaluation could
be like.
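
As promised under C1, here is a toy code sketch of that pooling step: a
minimal illustration only, not anything planned or written within MIRA. The
function name, the scoring rule (agreement with the majority vote), and the
example ratings are all invented for illustration.

from collections import Counter

# Toy sketch for exercise C1: each attendee rates each test image as
# relevant (1) or not relevant (0); consensus per image is the fraction
# of judgements agreeing with the majority (1.0 = unanimous).
def consensus_scores(ratings):
    """ratings: dict mapping image id -> list of 0/1 judgements."""
    scores = {}
    for image, judgements in ratings.items():
        majority_count = Counter(judgements).most_common(1)[0][1]
        scores[image] = majority_count / len(judgements)
    return scores

# Invented example: three attendees, three test images.
ratings = {
    "image_A": [1, 1, 1],  # unanimous    -> 1.00
    "image_B": [1, 0, 1],  # split 2 vs 1 -> 0.67
    "image_C": [1, 0, 0],  # split 2 vs 1 -> 0.67
}
for image, score in sorted(consensus_scores(ratings).items()):
    print(f"{image}: {score:.2f}")

Low scores across many images would be exactly the "(lack of) consensus" the
exercise is meant to expose.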
- D. Panels (debates). Something you can get out of workshops and conferences
which you can't usually get from journals is the opinions of experienced
researchers about issues they don't research themselves (particularly, their
reasons for NOT working on those issues). Panels allow the organisers to
dictate the question and elicit interesting views. Some sample panel topics
[this is only a very preliminary list]:
- Interactive TREC / non-interactive TREC: useless? ...
- What kind of relevance should we measure? (Mizzaro lists dozens of kinds)
- Interactive system performance depends on the work context, so no standard
test bed is of any use or generality.
- All kinds / levels of evaluation are needed. Each panellist speaks briefly
about a different level. [This could be a talk session rather than a debate;
but might be a good way to drive home our point about the diversity of
approaches actually needed if IR evaluation is to get real.]