25 Oct 1996 ............... Length about 6400 words (40000 bytes).
This is a WWW version of a document. You may copy it.

Report on ELTHE workshop 3
Thursday 19 Sept. 1996

assembled by Stephen W. Draper


The third ELTHE evaluation workshop was held in Glasgow University on 19 Sept 1996. It was organised by Steve Draper, and supported both financially and administratively by TLTSN (the Glasgow University part of the regional TLTP support network). There were 34 participants altogether. Apart from the organisers there was little overlap in participants with the February workshop, implying that the UK could use at least two workshops a year, but with a variety of times and places to suit more people.

Feedback indicates that it was a definite success: no-one fell asleep after lunch, many seemed reluctant to leave, and remarks such as "I now feel refreshed about the whole business of evaluation" were heard. One reason for attending is for people who are the only evaluator in their group and need some peer interaction to overcome the isolation, or even training. However I am equally struck by how those who do have substantial experience and local colleagues in evaluation still find these workshops stimulating and useful. Too often workshops and conferences are conceived of as either for "dissemination" (meaning talk-only, don't listen) or for training (meaning listen-only, don't contribute), but these workshops remind me how important it is to share problems and solutions: to hear about other people's problems and wonder how your own methods would stand up to them, and hear about other people's solutions and wonder which bit of your own work might adopt them.

The workshop followed the plan tried out at the previous one, and in fact if anything went further along that road. While many "workshops" actually consist of a string of invited talks, these workshops have become mainly discussion while yielding just as much learning for participants (what I myself learned appears below). We had one proper invited talk by Sue Hewer, while Steve Draper's was cut short and all the indications were that even less monologue and more discussion would have been better, as participants organise their own topics. An evaluation scenario had been provided as a topic for the afternoon group discussions, but in the event only one of the five groups addressed it. Two of them addressed an issue not foreseen by the organisers, but important to many participants: how to do evaluation of WWW based materials. This was associated with a reshuffling of the groups in the light of this emergent interest. Feedback suggests that this spontaneous reorganisation had mixed effects: about equal numbers of people felt it was a good thing or a bad thing from their point of view. Besides the group discussions and main speakers, there were 6 short (5 min) talks at the start by participants who wanted to air their positions and/or problems, and which in part set the agenda. Below I have a section on the issues which came out of the day for me, and which were initiated by these short talks. After that are reports on each of the group discussion sessions. These were written by the coordinator or other participant, and have been given to group members for comments.

Some quotes

Why do you have to do empirical evaluation? "Because I can't tell the difference between a gut reaction and indigestion."

Warning to students who show initiative: "Self-directed learning equals self-destructive learning, because the exam hasn't changed."

"He's terribly blinkered, but he is facing towards the light." (Better read the education literature if you don't want to be patronised.)

Short presentations of evaluation problem issues

Gayle Calverley asked how she could continue to do evaluation with no money and no other resources, given that the user sites were now remote from her and that four different areas of evaluation were important: formative and summative evaluation of the software, evaluation of the educational content, and evaluation of how it matched teachers' requirements.

Julian Cook presented his problem of supplying medical images on the WWW, which meant that his users were not all, or even mainly, students doing courses. Indeed, apart from the problem common in web applications of not knowing the time or place of many uses, his evaluation is as much about discovering uses for the service as about evaluating how effective it is at serving known needs.

John Cowan and Judith George raised two problems in formative evaluation of courses. The first was that while testing the effectiveness of individual learning resources (such as a text book or single piece of CAL) can be done using pre- and post-tests, and used to refine the design of those resources, courses that use an integrated mix of resources cannot be tackled in this simple way, as test scores do not tell you which resource failed and should be modified. The second issue was how to use a tool such as concept maps to yield measures of deep learning or understanding that could be used in formative evaluation.

Cathy Gunn stated her problem as a headline: how do you resolve generalisability and authenticity?

Ross Reynolds stated his problem as: can the WWW be used for training? This theme attracted enough attention to take over two discussion groups later on.

Jennifer Wilson raised the issue of how exactly pre- and post-tests should be organised: surely if you use identical tests then students may just remember the tests, but if you vary the test then you have to worry about differences in the tests (however much you try to make them equivalent) and try to counterbalance two versions of the test and two groups of students.
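
The counterbalancing that Jennifer Wilson describes can be pictured with a minimal sketch. It is only an illustration under stated assumptions: two test versions, here labelled "A" and "B", and a random split of the class into two groups of (nearly) equal size. The function name and the student names are hypothetical and do not come from the workshop.

```python
import random

def counterbalance(students, seed=0):
    """Split students into two groups that take the test versions in opposite orders.

    One group sits version A as the pre-test and version B as the post-test;
    the other group sits B first and A second. The labels are assumptions.
    """
    rng = random.Random(seed)      # fixed seed so the allocation can be reproduced
    shuffled = list(students)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    plan = {}
    for student in shuffled[:half]:
        plan[student] = {"pre": "A", "post": "B"}
    for student in shuffled[half:]:
        plan[student] = {"pre": "B", "post": "A"}
    return plan

plan = counterbalance(["Ann", "Bob", "Cal", "Dee"])
for name in sorted(plan):
    print(name, plan[name])
```

Because each version serves as the pre-test for one group and as the post-test for the other, a difference in difficulty between the two versions should roughly cancel when gains are averaged over both groups, and no student sees the same test twice.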

Themes that struck me

These are the four most important things that I myself learned from the workshop.

Looking for unspecified educational gains

In my own work, although we have always looked out for any problems that hadn't been foreseen by the teachers, we only looked for educational gains specified by the teacher in the learning objectives we elicited from them. This in fact is consistent with a hardline instructional design approach of subordinating all design, implementation, and testing to explicit objectives. But a number of discussions around the workshop made me realise for the first time that this is probably unreasonable. There could easily be benefits that a given teacher had not planned explicitly for, and if in evaluation we have to be alert for unforeseen problems we should also look for unforeseen benefits. Furthermore, if we could really foresee educational cause and effect then we wouldn't have to do evaluation at all: the point of doing formative evaluation and iterating the design is to take account of the unforeseen. Given that, it is unbalanced to only look for unforeseen bad, not for unforeseen good. It may sound unprofessional to implement courseware on the basis of rather vague ideas, and then use the testing on learners to discover what it is actually good for, but this is consistent with the need for evaluation; whereas hardline instructional design is logically very close to instructivism and a position that teacher knows best, telling learners causes learning, and nothing should happen in education that was not planned for and organised by the instructor i.e. the learners should have no input to the design process (it wouldn't be design if it weren't controlled by the designers, would it?).

We could begin to do this by, for instance, first asking learners what they think they learned, and only then administering a post-test to measure how much they learned of what we had thought they should learn. In later iterations of the evaluation, we could do systematic tests of any new benefits brought up by students. Similarly, as John Cowan pointed out, we can ask which outcome or learning objective is valued or valued most by students, and how they learned them (not just whether they did).

If I ask people, at least those of my age (I'm in my 40s), what was most important about their university education they tend to say things like "living away from home for the first time" or "meeting people with seriously different opinions than those I had met" or "having my views respected and taken seriously". They do not say "learning Maxwell's equations" nor "learning to write essays and give talks". If you look at those statements, you can see how relevant they are, not just for personal significance, but for many jobs; yet current learning objectives and notions of personal skills in HE fail to address them. Perhaps students do know more than teachers about the value of what they are learning, and we should ask them and attempt to measure it.

Similarly we could not measure the value of workshops such as this by fixed post-tests and an instructional design perspective: no-one knew before the workshop what I was going to learn; and the learning outcomes, though considerable, were probably different for different participants.

Big and small CAL

Some of the discussion made me focus on just how great a difference the scale of the educational intervention makes. A piece of CAL might cover the whole of a year's course, or it might be a small tool bearing on one point and taking less than an hour of each student's time. While all the big phrases (like formative and summative) apply in the abstract to both extremes, many things are entirely different depending on scale. At the big end, the evaluator may not need to do any tests of learning outcomes of their own at all, as the year exams will be essentially a direct and exact test of the CAL; while at the small end testing can only take a few minutes if the whole intervention is less than an hour, so probably confidence logs rather than direct tests must be used. Scale also has great effects on the development costs, and on the organisational issues of acceptance. At the big end, development costs are going to be big and the timescale slow, which probably means that many institutions must agree a common curriculum and the material must be usable for years at a time (or to put it another way, will be out of date for much of its lifetime, perhaps even the first year it is used as it will take several years to create). At the small end, a single teacher can create a little simulation or demonstration to illustrate one session. This will not involve any technical staff, will not require any explicit funding, will not involve any coordination or agreement between teachers, much less institutions. Clearly it is the small scale CAL that is likely to find easy acceptance and dissemination. It was suggested (by Fares Samara) that a good institutional strategy might be to support small scale CAL as the way to gain staff acceptance of the use of learning technology: it gives them total control, low risk, incremental change to courses (not revolution), and so on.

Thus small, medium, and large scale CAL may have quite different characteristics for evaluation, for funding, and for issues of institutional change.

Making evaluation part of standard practice

Evaluation that is part of permanent ongoing practice is not just a pious hope, but will have rather different characteristics and advantages from evaluation done one-off as part of innovation. For instance it may be objected that giving students a pre-test will alert them to what they should learn, and so they may learn more than they will in later years when the evaluators have gone. It is true that evaluation only measures the combined effect of teaching and evaluation, but if "evaluation" improves learning then it should be part of the permanent practice: the pre-test will become part of the communication to the learners about what is expected of them. A further point is that this would be a move towards making the students evaluators of their own learning: a useful skill for them, salutary for the teacher, but above all a move towards a more intimate involvement and understanding between those who evaluate and those doing the learning and teaching. Another point is that to the extent that evaluation becomes permanent, it may be done by the teachers without external evaluators, and so will not incur extra costs (a partial answer to Gayle's problem), although there may still be a need for the confidentiality that external evaluators can bring (a further issue that we didn't discuss).

Another advantage of evaluation becoming standard practice is that as data for successive years accumulates, it becomes more valuable: exam results can be added to the evaluation measures, the chance that any good results were a fluke of special students, novelty, particular care by the teacher and so on is reduced, and the likelihood that a stable result is being seen is increased. Furthermore in a number of cases the best results are only seen in later years, as the experience and evaluation of the first year are used by the teacher to make further improvements. All of these are reasons for making evaluation part of permanent practice, and for viewing reports of one-off evaluations as rather less informative. From this perspective, too, educational evaluation should be seen as more like meteorology and less like landmark scientific experiments; less to do with once-in-history discoveries and more to do with ongoing sets of measurements that are both of immediate use locally and build up large datasets for wider conclusions.

Evangelism-friendly evaluation

What kind of evaluation report would be most useful in convincing other teachers to adopt a piece of CAL? 1) Effectiveness: it should report on learning outcomes: at least adequate, hopefully improved. 2) It should report costs e.g. savings in teacher time. (This is likely to motivate them, unless the institution will just assign them other duties as its response.) 3) It should report on how to use the new material (e.g. many simulations give poor learning when just dumped on students but good learning when accompanied by a well designed worksheet to guide students through appropriate use). However concealed in the last item is one of the biggest barriers to changed practices, which is whether the teacher will be required to learn a new technique. This might be using a new piece of software, or conducting genuine group discussions: but whatever it is, whether it is new depends on the individual teacher and not on anything the software or course developer can do.

Group 1 morning by Margaret Brown

We listed the issues participants were interested in:
* Pre and post tests
* Evaluation of Computer Conferencing
* How to prioritise when you have limited resources (people, money) e.g. alternative methods if you can't afford a 'full' evaluation
* Evaluating Learning on the Web: control, purposes, specific resources
* Remote Evaluations of: WWW, CAL in use in other institutions
* CMC (Computer Mediated Communication): the problem of people not responding electronically (or otherwise) to questionnaires
The main focus of the discussion was on WWW evaluation, as the medium through which to discuss the wider issues of evaluation. Topics included: the WWW as a provider of specific courseware as well as a source of many, varied resources; problems of downloading and accessing resources; and problems, both legal and technical, with keeping a log of the e-mail addresses etc. of those accessing a web site. People are having problems with knowing who is actually accessing it, let alone beginning to evaluate what they are doing!

We know about available evaluation methods BUT how do we apply them? Especially for remote evaluations (WWW etc.) and where there are constraints (people, time and money).

Group 1 afternoon by Margaret Brown

The Group decided to attempt the supplied evaluation design exercise. The course to be evaluated was a basic IT course for students with little or no previous IT experience. Two courses were offered:
1) a 12 hour course for those with little or no previous IT experience (to meet the IT Baseline)
2) a 6 hour course for those with some experience but who wished to meet the IT Baseline

For those who felt they had achieved the IT Baseline already there is an assessment only option. Two versions of the courses are available:
1) Scheduled in individual departments or faculties
2) Open access in the central computing labs

We focussed on the statement from the course designer:
"....the idea is supposed to be EDUCATION not TRAINING. .......... It is intended to make people AWARE of how IT can enable them to carry out their STUDY TASKS more EFFECTIVELY." We then discussed the necessary resources, the areas to be addressed and the instruments to be used.

1) RESOURCES FOR EVALUATION. We hoped that these would be adequate for a comprehensive evaluation of the course, as the course was to be an important resource for all university students.

IT Baseline
Skill Level
Attainment of appropriate skills
Application of skills
Subsequent use of skills
IT contribution to future study skills (students' perception of utility and reality)
Effective course design
Actual learning time
Departments' perception of student attainment
Skills omitted
"Fear" of Computers (overcome or not?)

DATA required from all courses and from the different versions of the courses, including the assessment only option.

All Documentation on Course and Students
Pre and Post Questionnaires
Delayed Questionnaires
Interviews of students and Departments
Observation of students
Sample and examination of examples of assessments
Computer logs

The evaluation must attempt to determine if what the course designer wanted is achieved.

It must determine if students attain appropriate skills which they can and do use effectively in their studies.

Also the skills obtained by the students need to be shown to be appropriate/acceptable to the requirements of the departments from which those students come (otherwise there may not be a case for a centrally run course), i.e. the improvement should be observable and able to be capitalised on by the individual departments as opposed to just useful to the student alone. N.B. how this discussion, by focussing on testing "skills", seems to have gone back to interpreting the course as aimed at training not education. Perhaps this shows how vital it is for real evaluations to have access to the teacher in order to elicit both their aims and objectives, and how to test them.

Group 2 morning by John Milne

What are some of the issues / problems you have with evaluation?

* How can authenticity and generalisability be reconciled? Evaluations about learning should occur in the classroom within a very specific situation. The evaluation then yields no generalisable result.
* How do you determine educational effectiveness?
* How do you measure learning gains?
* How can you determine if the system benefits students?
* What is educational effectiveness?
* How do you manage evaluation at different sites?
* How do you evaluate material on the web?
* How do you evaluate on a shoestring?

How do you judge if courseware is good?

Points that were made were:
* Determine what the software is required to do
* Define what is a good outcome for the software
* Think about what the computer adds to the learning situation that other media cannot provide.
* Judge the software on its development process. Did they use prototyping? Did they involve end users in the design and development process?
* Integration is vital. How the software is integrated into the course will determine if students learn.

What are confidence logs?

Two participants wanted to use confidence logs: one for University students, the other for Access students who may have low self esteem.

* Confidence logs are described in Draper et al, Observing and Measuring the Performance of Educational Technology. They are based on the learning outcomes of the situation and are administered as a pre-test post-test.
* Could use interviews to find out why they are confident or not confident.
* Use diaries to highlight changes in confidence or areas that could be worked on
* There is a bubble dialogue technique (any references to this?). This would allow students to enter their confidence levels using pictures with few words. This may be less threatening than other techniques.
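
As a rough illustration of how pre-test and post-test confidence logs might be compared, here is a minimal sketch. It assumes each student rates their confidence on each learning objective on a 1 to 5 scale before and after the session; the scale, the function name, the objective names and the ratings are all assumptions made for the example, not taken from Draper et al.

```python
from statistics import mean

def confidence_shift(pre, post):
    """Return the mean post-minus-pre confidence rating for each learning objective.

    pre and post map an objective to a list of ratings, one per student,
    on an assumed 1 (no confidence) to 5 (very confident) scale.
    """
    return {objective: mean(post[objective]) - mean(pre[objective])
            for objective in pre}

# Made-up ratings from four students, for illustration only.
pre = {"use a spreadsheet": [1, 2, 2, 3], "send e-mail": [4, 4, 5, 3]}
post = {"use a spreadsheet": [3, 4, 4, 5], "send e-mail": [4, 5, 5, 4]}

for objective, shift in confidence_shift(pre, post).items():
    print(f"{objective}: {shift:+.2f}")
```

An objective whose mean confidence barely moves, or falls, is a natural candidate for the follow-up interviews or diaries mentioned in the points above.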

Group 2 afternoon by John Milne

We set out to plan an evaluation based on a Web based application. The scenario was based on a participant's project. The MIDRIB project is a database of medical images that will be made available on the WWW. The evaluation was to be on a shoestring. It was to be a Ford Fiesta of an evaluation. This seemed to be the consensus about most evaluations!

We identified the following areas that are necessary for an evaluation. Some discussion is added to each of these areas.

Identify who the stakeholders are in the Evaluation. In our scenario it was JISC, Project Team, Lecturers, Students, Content Contributors, Others. Should prioritise the stakeholders when there is such a large group and have a core group that will receive priority.

Decide on the Aims and Objectives of the Evaluation

This occupied us for a while. There was discussion about the difference between the objectives of the project and the objectives of the evaluation. I see that the project objectives can influence the evaluation objectives, if that is what you choose to evaluate. However the evaluation objectives can be other things, so project objectives are a subset of the evaluation objectives. It depends on what your stakeholders want you to evaluate.

Evaluation needs planning

This is an important part of the evaluation. Write a plan and circulate it to the stakeholders and others for their comments. This will ensure that you are answering the right questions and will help to get others involved in the evaluation. Are the results going to be useful? At the planning stage it may be useful to predict likely outcomes of the evaluation. This will help to determine how useful the results will be.

What are the likely influences on student learning? What should you be trying to measure? It is better to know at the beginning of an evaluation what might affect students, so the evaluator can be aware of it and also students can have the opportunity to comment on it.


A number of instruments are outlined in Draper et al (Observing and Measuring the Performance of Educational Technology).

There may be some instruments that are easier to use on the Web. Questionnaires from remote sites are easier to administer on the web. The web server does record the pages that people view, which could be useful information. But access logs on their own are not really "an evaluation" - you have to do something with them and be able to interpret the results - frequent access to one part of the courseware might actually indicate a problem with navigation and offer the possibility of a shortcut or streamlining.
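
To make the point about access logs concrete, here is a minimal sketch of a first pass over them: counting requests per page. It assumes the server writes lines in the widely used Common Log Format; the sample lines, host names and paths are invented for illustration.

```python
import re
from collections import Counter

# Matches the request part of a Common Log Format line, e.g. "GET /page.html HTTP/1.0".
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')

def page_counts(log_lines):
    """Count how many times each path was requested."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Invented log lines; hosts and paths are hypothetical.
sample = [
    'host1.example.ac.uk - - [19/Sep/1996:10:01:02 +0100] "GET /index.html HTTP/1.0" 200 2048',
    'host2.example.ac.uk - - [19/Sep/1996:10:01:40 +0100] "GET /images/list.html HTTP/1.0" 200 5120',
    'host1.example.ac.uk - - [19/Sep/1996:10:02:15 +0100] "GET /index.html HTTP/1.0" 200 2048',
]

for path, hits in page_counts(sample).most_common():
    print(hits, path)
```

The counts only say which pages were fetched most often. Whether a heavily used page reflects interest or a navigation problem still has to be established by other means, such as asking the users.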

I think that questionnaires are best administered on paper, especially if there are open ended questions. This allows all students to comment: those not used to computers may not be comfortable responding to questionnaires on computers.

The speed of the network may be an issue for Web based applications. This could be measured by timing how long files take to be delivered and then asking students if this is adequate.
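
The timing idea can be sketched as follows. This is an illustration only: the function names and the URL are hypothetical, and a stand-in download function is used so that the example runs without a network connection.

```python
import time

def time_delivery(fetch, url, repeats=3):
    """Return the delivery times in seconds for `repeats` fetches of url.

    fetch is any callable that retrieves the url and returns its bytes.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings

def fake_fetch(url):
    """Stand-in for a real download so the sketch runs without a network."""
    time.sleep(0.01)
    return b"image bytes"

timings = time_delivery(fake_fetch, "http://example.org/image.gif")
print(f"slowest of {len(timings)} fetches: {max(timings):.3f} s")
```

In real use the stand-in would be replaced by an actual download, for example a small wrapper around `urllib.request.urlopen(url).read()`, run from the machines and at the times of day that students actually use. The measured times can then be set beside students' answers about whether the speed is adequate.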

The Evaluation Report

The evaluation has to be written up and sent to those interested in the results. This may mean that different reports are written for different stakeholders.

Other issues

Context of use. How should you administer pretests and post-tests? How do you manage multi institution evaluations?

Group 3 morning by Gordon Doughty

The six in our group exchanged descriptions of our reasons for being at the workshop. We discovered that some were local to Glasgow, some had been at the ALT conference, and others were just attending the workshop. Familiarity with TLTP, Glasgow University's TILT project, and evaluation in Higher Education varied from being closely involved to not knowing what they were.

Those least familiar with the work of TILT's Evaluation Group asked what were the evaluation tools used, where could they be found, and how could they be used. All had genuine evaluation jobs to be done, and wanted help by the end of the afternoon. All but one of the group had some degree of interest in evaluating use of the Web.

Group 3 afternoon by Gordon Doughty

The depleted group - Gordon Doughty, Sue Tickner and Jane Pannikar - found a common interest in applying IT to Continuing Professional Development (CPD). So we looked at this rather than the supplied scenario. We focussed on Jane's evaluation needs (partly because she was the only paying attendee in this group). She needs to evaluate a series of CD-ROMs, to be delivered every six months, containing case studies for the CPD of Gyno Specialists in NHS hospitals.

The need to evaluate is: Summative - to establish how long it takes to earn a CPD point by this method of study; Formative - to help each subsequent CD-ROM to be more effective, and perhaps to help turn them into commercial products for export; Illuminative - to help Jane study her personal interest, probably for an MSc, in how good this is as an educational tool.

There are some ideas on changes to the integrated learning experience that evaluation may inform, e.g. the need for incentives and motivation to use the CDs, and for assistance such as a telephone helpline.

The learning system is unusual compared with most Higher Education in that there is no teacher and no assessment. The learners are professionals who probably do not like to admit ignorance. They tend to subscribe to the scientific paradigm of knowledge generation, so may respect the use of quantitative questionnaires and statistical analysis, but not an interpretative method of drawing conclusions. There will be an opportunity at a big conference/workshop in December to launch the CD-ROM series and carry out some evaluation. After that it may not be very easy to evaluate the users.

We felt that the following evaluation instruments would be appropriate: Computer and task experience questionnaires; Observation (e.g. of a group at the workshop - we discussed incentives to make them spend at least 30 minutes with the CD); Focus group at the workshop, followed by individual semi-structured interviews; Post-task questionnaires; Survey of other resources from which they update their knowledge and skills; Questions to establish what types of learners they tend to be; Comparison with learning theories of CPD (Laurillard's model not appropriate?). In the plenary discussion John Cowan suggested that the learners would be cooperative if asked to draft a notional letter to the Royal College on how this aspect of CPD could be improved.

Group 4 morning by Erica McAteer

Some of us had given presentations during the opening stages of the workshop, others hadn't, so we went round the table quickly with a "oneliner" statement of our individual main issue for being there. Edmund gave us a basic question, relating to summative evaluation - Something has been developed and tested and is ready for use, what is the best (economic and effective) way of usefully evaluating the package before implementation? Judith has a more holistic concern, looking to build in reflective evaluation throughout use, involving all participants - i.e. from the student to the institution - in the process. Erica seconded Judith, wanting manageable practices which could extend back through development as well as forward during varied use and change of use. Sue brought us back to the concrete with her problem, which is TIME - how to devise useful methods which can be applied by very busy people, to material which is variously designed and delivered to suit various needs in medical education. Sally underlined all of this - her issue is the development and integration of materials to "balance" existing teaching, as well as adaptation of more general CAL (biological sciences) material and CMC resources. Norman then extended us - he is concerned with delivering courses to rural areas through the internet, and looks to build evaluation into this delivery.

We felt that rather than identifying particular instruments or methods, the issue was one of practice. Reflective evaluation should become a habit within teaching and learning practice: Part of the culture sort of thing... This can seem a bit of a vague and lofty aim, and not very useful as a directive though experience suggests that it doesn't take long "in the business" before each of us realises that that, for better or worse, is how it is. However, this doesn't mean that there are no good guidelines to hand and we discussed these - the "TILT instruments" are tried and true and can be adapted to many classroom uses of CAL. CMC resources may (or may not!) need a different approach - this was picked up during afternoon sessions. There was felt to be a need for guidelines to practice that would cover the development as well as the implementation of learning programmes. Erica had found Judith Calder's book for open and distance learning systems pretty sensible, Judith (George!) seconded this and knew of others - it was agreed that she would mail a list round.

It is difficult to encapsulate the interactions over the rest of the session - especially as I didn't pick up that acetate at the end of the day! There was general agreement that gathering information from the various perspectives of interests (Who Are The Stakeholders?) is obviously important; that feedback between immediately interested parties (developer, teacher, evaluator, students if appropriate) should be quick and even casual, though not at the level of change and change back with every comment!; that some of the responsibility and effort of the exercise should be taken on by the users; that reflection seems to imply ownership in some sense.

There was a fair amount of off the cuff exchange of problems and solutions, much of it in parallel, with the coordinator too involved to take multiple notes. We agreed that such gatherings were really useful to new and old alike, as we questioned our own wisdoms in the face of others' questions.

Group 4 afternoon by Erica McAteer

The afternoon session was framed by a task - "Design an evaluation for:-" which we completed with "establishing the usefulness of Web delivered learning".

The breadth of that title indicates how much our individual remits varied. Just looking round at our own examples it is clear that CMC learning resources can differ with respect to: the nature of the communication itself; the situation within which the communicating task is undertaken; the purpose of that task; the nature of the task; the output of the task.

Devising any "standard methodology" is a bit of a challenge - if only for the relevance of whatever is devised! But some standard axioms of evaluation apply - e.g., considering efficacy in terms of purpose. This requires careful consideration of purpose, and of the suitability of the programme for that purpose (a bit like looking hard at learning objectives then wondering why you thought a particular teaching practice might achieve them!). And given that this is a fairly new field, with not much experience on which to predicate, it seems reasonable to take some trouble to closely monitor situations of use, since these are likely to vary widely for any specific CMC resource.

This should be done "from the qualitative to the quantitative" - interviewing (telephone even!) and facilitating focus group sessions of teaching staff, student users, and other protagonists - separately might be better, but not necessarily - to achieve a "story" of the resource from its different perspectives of use. The keeping of logs or use diaries may be sensible - or not. The thing is to establish what is critical, by asking the users. Where these users are spread around various sites, locate the evaluation at each site.

Then if more global data is required, questionnaires can be devised which address critical issues from the different perspectives, perhaps providing options for choice - N.B. to include some open response questions.

Methods for this were discussed - for some of us there was no problem, as the resource had fairly closely defined use and users, who could be located and addressed easily. Others, for whom the target users were all-inclusive ("the world"), had problems which required both technological and social research know-how to address.

There are issues for on-line questionnaire techniques which seriously mushroom when it is a question of generally accessible web resources.

Two issues raised by the group: "Anonymity culture" and "Techno proficiency barrier". I'd really like some examples here, to flesh out my "intuitive" understanding of what they mean for evaluation strategists.

We agreed it would be useful to keep in touch, and to pass experiences along the line. Whilst no truly global tools are likely to emerge, the notion of "Cluster Evaluation" (Kozma and Quellmalz 1996) might work - a portfolio of diverse yet related projects whose features can be evaluated from common perspectives. Points for clustering could include primary goals, educational approach (e.g. online seminars, project work), intended participants, context of use, technology type...

Group 5 morning by Steve Draper

We ranged over a number of issues. First was big vs. small CAL (see an earlier section). Next, a series of points about what evaluation reports should be: jargon-free, user friendly, contextualised, and evangelism-friendly (a point expanded above). This last point concerns how, from the point of view of changing how teaching is done in a university, evaluation is only one issue, and the issues of teacher acceptance and retraining (particularly in educational issues) are crucial; and furthermore that evaluation may be seen as threatening because it may push for cultural change, which in turn will render worthless skills that have been practised for a long time.

The discussion then moved to a related set of points about what evaluation should be: early in development, involving many stakeholders and beginning with their concerns whether explicit or hidden. That is, it was argued (by Philip Crompton) that evaluation should begin by identifying the motivation behind the development of the courseware and behind doing the evaluation: unless an undertaking (e.g. writing a new piece of software or getting students to learn IT skills) has clearly defined aims and objectives how can we carry out a clear and realistic evaluation? Identifying the aims for the intervention, the stakeholder(s) in the intervention and the participants in the intervention at the beginning is crucial. The same applies to an evaluation: unless you know what you want to find out and/or what others expect you to examine, and what might be changed as a result, then how can you proceed? (Actually I don't fully agree with this. Often the most important findings are those that were not expected, and were not wanted or looked for deliberately. This is related to the issue of looking for unspecified educational gains: see above.)

Finally we discussed the notion that learners should become evaluators in their own right, and how that could be seen as part of the idea of evaluation becoming, not a specially commissioned enquiry, but part of permanent ongoing practice (see above).

Group 5 after lunch, reporter Kirsty Davidson

We chose to discuss a real-life scenario facing one of the group members, centred on multimedia statistics courseware which is being developed at Napier University. User-centred design has been used throughout the development of the courseware, with frequent usability testing. The courseware is targeted at about 500 students a year studying a specific module in quantitative methods. As the module is under the control of the developers as regards integration, assessment, whether use of the courseware should be optional or compulsory, etc., there is a lot of potential for evaluation.

During the discussion it was observed that there are often opposing objectives in evaluation of courseware - to demonstrate improved quality of learning and to demonstrate reduced costs. The primary aim of the evaluation should be clear.

In order to demonstrate improved pedagogy, the effectiveness of achievement of the learning outcomes needs to be assessed. Pre- and post-tests were suggested as a means of assessing this, with three graded bands of questions starting off fairly simple and getting progressively harder, so that even at the pre-test, the student would be able to answer some questions. The desirability of anonymity was recognised, to save students any embarrassment, and some sort of coding, perhaps using matriculation number, was suggested.
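As an illustration only, the coding idea above might be sketched as follows in Python. Nothing here comes from the Napier project: the salt, the function names and the scores are all invented for the example, and a salted hash of a matriculation number is only one possible coding scheme (and only as private as the salt is kept).

```python
import hashlib
from statistics import mean

# Hypothetical secret known only to the evaluators (an assumption for this sketch).
SALT = "keep-this-private"

def anonymous_code(matric_number: str, salt: str = SALT) -> str:
    """Return a short, repeatable code that hides the matriculation number."""
    digest = hashlib.sha256((salt + matric_number).encode("utf-8")).hexdigest()
    return digest[:8]

def paired_gains(pre: dict, post: dict) -> dict:
    """Match pre- and post-test scores by code; return post minus pre per student."""
    return {code: post[code] - pre[code] for code in pre if code in post}

# Made-up matriculation numbers and scores, purely for illustration.
pre_scores = {anonymous_code("9601234"): 4, anonymous_code("9605678"): 7}
post_scores = {anonymous_code("9601234"): 9, anonymous_code("9605678"): 10}

gains = paired_gains(pre_scores, post_scores)
print("mean gain:", mean(gains.values()))
```

Because the same matriculation number always yields the same code, a student's two tests can be paired without the marker ever seeing who the student is.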

In order to evaluate the best way of implementing the courseware with a view to reducing costs, it was suggested that the courseware should be implemented in a staggered fashion with different classes, i.e. 3 classes + 1 unsupervised lab for one class, 2 classes + 1 supervised lab + 1 unsupervised lab for another class, and 1 class + 2 supervised classes + 1 unsupervised lab for another class. Exam marks and the views of the students could then be examined for the different approaches. This might point the way to identifying the optimum way to integrate the courseware with complementary teaching delivery, including the probably essential human teacher component. It was pointed out that there are many other factors involved in the delivery of a course, and a more holistic approach to such evaluation is often necessary.
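A first look at the exam marks from such a staggered implementation could be as simple as the Python sketch below. The pattern labels "A", "B" and "C" and all the marks are invented for illustration; they stand in for the three delivery patterns and are not data from the workshop. Given the caveat about other factors, a table like this is a starting point for discussion and not a verdict on any one pattern.

```python
from statistics import mean

def summarise(marks_by_pattern: dict) -> dict:
    """Return (number of students, mean exam mark) for each delivery pattern."""
    return {
        pattern: (len(marks), mean(marks))
        for pattern, marks in marks_by_pattern.items()
    }

# Made-up exam marks for three hypothetical delivery patterns.
marks_by_pattern = {
    "A": [52, 61, 58, 47],
    "B": [55, 63, 60, 50],
    "C": [49, 66, 57, 54],
}

for pattern, (n, average) in summarise(marks_by_pattern).items():
    print(pattern, n, average)
```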