Last changed 25 Oct 2016 ............... Length about 5,000 words (42,000 bytes).
(Document started on 25 Dec 2009.) This is a WWW document maintained by Steve Draper. You may copy it.


Assessment and Feedback (A&F) in HE

By Steve Draper,   Department of Psychology,   University of Glasgow.

This is an entry page into pages on A&F (assessment and feedback) in HE. This began with my involvement with the REAP project (April 2005 - July 2007); and with followup work.

Contents (click to jump to a section)

Design Principles

Do students actually want or need feedback?

The rest of this page goes along with the conventional literature, which presupposes, without discussion (much less evidence), that students want feedback in order to improve their learning. This may well be untrue.

Oh well, back to the bad old assumptions ...

Interventions (some learning designs for feedback)

My pages on particular A&F interventions (learning designs, methods)

Links to other websites on A&F

  • GU "LEAF" A&F toolkit
  • REAP project website       (Page on my part of the REAP project)
  • New Strathclyde website(s) on A&F
  • Students: 5+ important websites on A&F

      GU "LEAF" A&F toolkit
      REAP (see above) (hosted at Strathclyde)
      Dai Hounsell (Edinburgh): Enhancing Feedback site
      Edinburgh University website on A&F: A&F site
      David Boud (UT Sydney): feedback
      David Boud (UT Sydney): assessment
      Australian project, hosted at Adelaide, on assessment in web 2.0: Transforming Assessment
      TESTA project (at Winchester) for evaluating a programme's assessment: TESTA
      HE Academy: HEA on assessment (1)
      HE Academy: Transforming assessment in higher education toolkit (2)
      HE Academy: SENLEF feedback (3)
      Sheffield: Sheffield toolkit
      QAA: Assessment in first years
      GCU (Caledonian): Feedback for Future Learning
      See also a 2-part report on practical advice on giving learners feedback by Thalheimer:   Part 1   Part 2

    Conflicting criteria for assessment design

    In this draft I'm writing this section egocentrically, referring to practices in this Psychology department, which is an essay-based discipline. I believe the points are general, but here I'm not writing to bring this out.

    The idea here is NOT to offer techniques for assessment BUT to provide a clear statement of the conflicting criteria which any assessment must satisfy or compromise over. This is necessary for any rational thought, let alone discussion, about choices in deciding on assessment design. Most of the literature lacks this.

    a) Criteria / requirements / dimensions of merit / aims / constraints: all of which independently apply to any assessment design.
    List EXPLICITLY the key criteria that have to be considered: both the naively aspirational educational slogans, and the unspeakable but real constraints. What is hard about redesigning assessment is that there isn't one thing you want to improve; rather, you must optimise, or at least satisfice (reach an acceptability threshold on), multiple requirements that often conflict. This is made much harder by some of them not being written down in public, and so not being discussed rationally by staff. (There is a provisional list below.)

    b) Metrics: For each of these criteria give a measurement scale that shows teachers the degree to which it is satisfied. E.g. if you want to raise the NSS score, then the NSS subscale is the measure (and could be administered every semester by a course team). If you want to improve learning, then you must show (for instance) grade rises year on year to demonstrate whether or not you succeeded.

    c) Marks: I will also occasionally mention the marks or grades given to students as the result of an assessment activity, to point out what they would (logically) mean if they were to represent that educational aim (criterion) for the assessment design.

    Draft list of assessment constraints / dimensions

    1. Learning from doing. The single biggest use of assessment at the moment, though never mentioned in most literature on assessment, is not to measure student knowledge at all, but to mount an activity which is powerfully "mathemagenic" (productive of learning). We learn mostly by doing; it often doesn't matter whether the attempt succeeds or fails, and it often requires NO feedback from staff (contrary to what Laurillard says), just the internal changes that happen when we plan and attempt something new.

      At a simple level, the whole of the Maths presentation at the workshop was about the large demonstrated learning benefits of persuading students to actually do some maths work every week; their whole redesign addresses this criterion. Conversely, students generally report learning a lot from doing their final year project, although we don't measure this.

      Metric: The metric for satisfying this design criterion/aim is how much the student learns from the activity, pre-to-post.
      Mark: essentially this measures attendance (engaging in the learning activity with reasonable sincerity), if it aligns with this aim.
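The pre-to-post metric above amounts to simple arithmetic on before-and-after test scores. A minimal sketch in Python, using invented scores purely for illustration:

```python
# Sketch: the metric for "learning from doing" is pre-to-post change.
# All scores are invented for illustration; in practice they would come
# from a short test given before and after the learning activity.
import statistics

pre_scores  = [40, 55, 62, 48, 70]   # hypothetical pre-test marks (%)
post_scores = [58, 72, 75, 61, 82]   # the same students afterwards

gain = statistics.mean(post_scores) - statistics.mean(pre_scores)
print(f"Mean pre-to-post gain: {gain:.1f} percentage points")
```

A real evaluation would also report per-student gains and some measure of spread, not just the mean.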

    2. Produces information that is useful to the student.
      1. The information might be used either formatively or summatively.
      2. It may be based on:
        1. A human judgement
        2. A fact (right answer), independently confirmable elsewhere
        3. Or most powerfully, the degree of success of a construction (a bridge you built, a cake you baked).
      3. It may be in the form of:
        1. a mark/grade,
        2. written comments,
        3. only an internal effect: a change in the learner's degree of certainty / confidence in knowing something. Three kinds of assessment relate to this:
          • "Catalytic assessment" (Draper, 2009b) and peer discussion in general is one way of problematising confidence: the learner wonders if they have got it right, and is likely to work later to resolve it.
          • Formative tests: typically these have many items, and which items a learner fails shows which topics they need to direct further effort to.
          • Reassurance quizzes: essentially these are like formative tests in that any missed items show something that needs further work, but students may mostly use the overall score to tell them whether they are on the right track, or have got a lot wrong and need a major redirection of effort.

      [2.2] But the often neglected further issue is: to which use is it put? As argued in Draper (2009a), egocentric academics hold whole conferences on A&F while presupposing that the only use is to improve the technical knowledge of the learner. Each type of learner use of assessment and feedback is in fact an independent criterion for designing an assessment, so that it produces that information. Thus this one sub-criterion of providing information useful to the learner in fact produces six alternative independent criteria, all desirable.

      Draper,S.W. (2009a) "What are learners actually regulating when given feedback?" British Journal of Educational Technology vol.40 no.2 pp.306-315 doi:10.1111/j.1467-8535.2008.00930.x
      Draper,S.W. (2009b) "Catalytic assessment: understanding how MCQs and EVS can foster deep learning" British Journal of Educational Technology vol.40 no.2 pp.285-293 doi:10.1111/j.1467-8535.2008.00920.x

      One list of learner uses follows.

      1. Self-regulate and allocate the learner's limited time and effort: if I got a B grade, then I needn't think about this topic any more. Spend less time on what I'm good at, more on what I am struggling with. As used in "mastery learning", and its use of formative testing to focus remedial learning each week, this brings large gains.
        Another form of this is "catalytic assessment" (Draper, 2009b): designed, like a brain-teaser, to signal to the learner that this is something they don't understand yet but want to.
      2. Decide future courses, based on what I did well on in the past. Spend more time on what I'm good at, drop what I struggle with. Our educational system requires students to make choices, but we fail to design assessments to support that choice optimally.
      3. Decide on the quality of the marker. Seek out other opinions.
      4. Improve the learner's technical knowledge.
      5. Decide whether and how to adjust my learning method.
      6. The mark may be interpreted by a learner as feedback on their learning, revision, and exam technique as a whole process.

      Metric: measures of pre/post change in information picked up by the learner.

    3. Cost to staff (in time, mostly).
      Metric: Staff-hours on the assessment.
    4. Defensiveness against student complaints, which cost both school and senate office staff a lot of time and trouble. This criterion has always been the main problem obstructing useful feedback from exams.
      Metric: Staff-hours / money spent on complaints and appeals.
    5. A measure for employers to use to discriminate amongst job applicants.
      Metric: (Validity, reliability, and ...) One metric is variance. E.g. coursework not only has a higher mean mark than exams, it also has a lower standard deviation, which makes it considerably less useful for discriminating capability.
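The point about variance can be shown with a small sketch (all marks invented for illustration): the distribution with the smaller standard deviation packs candidates into a narrower band, so ranking applicants by it conveys less information.

```python
# Sketch: why a compressed mark distribution discriminates less.
# Marks are invented, not real data: coursework marks typically cluster
# high and narrow, while exam marks spread more widely.
import statistics

coursework = [62, 65, 66, 68, 68, 70, 71, 72]
exam       = [38, 48, 55, 60, 64, 70, 78, 88]

for name, marks in [("coursework", coursework), ("exam", exam)]:
    print(f"{name}: mean {statistics.mean(marks):.1f}, "
          f"s.d. {statistics.stdev(marks):.1f}")

# The higher-mean, lower-s.d. coursework marks leave an employer ranking
# applicants with less information than the wider-spread exam marks.
```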
    6. A measure of competence: (if you want this, use senate schedule C for pass/fail course marks; if you don't then don't moan about competence assessment as an aim).
      Metric: (Validity, reliability)
      Mark: Pass / fail.
    7. A measure of how much specific knowledge a student has. Our level 3 stats exam does this well; our other level 3 exams (1-hour essays) do not, because they offer a choice of questions each of which requires only a small proportion of the course's content knowledge.
      Metric: (Validity, reliability)
    8. A measure of generic discipline skill. Exam essays are our instrument for this, and quite good at it. The main criterion is: to what extent is the essay written like a psychologist's? There are low-level skills we teach but do little assessment to measure. Then (mid-level) we assess specific content knowledge (facts and concepts rather than skills). We, like most departments, focus most on the ultimate, deep, high-level skill of thinking and writing like a professional in the discipline. This is why essays are fundamentally confusing to level 1 students: essays mean different things in each discipline, for a very deep reason. The metric for this criterion is whether a given assessment measures, usually tacitly, how well the candidate exhibits disciplinary thinking (rather than reproduction of specific facts, names, etc.).
      Metric: (Validity, reliability)
    9. Student enjoyment of the activity: Do students like doing it? Giving students a choice of topic in an essay or project is motivated by this. On all other criteria, a fixed topic would be better. (Students may learn more if they enjoy it: that would be a positive secondary effect. Equally, they may choose a topic that is least work to them: a negative secondary effect on how much they learn.) In choosing a topic for an assessment, students are in fact choosing part of their curriculum: another deep educational issue disguised as an assessment design choice by teachers.
      Metric: student self-reports on enjoyment. More sophisticated versions of this might ask for self-reports on how much they feel they learned, and separately how much it corresponded to their intrinsic learning goals (as opposed to required curriculum learning goals).
    10. Raise NSS scores for the A&F subscale. There is generally little correlation between scores on the A&F subscale and on overall course (programme) satisfaction, so there is no reason to think that A&F contributes either to learning or to student satisfaction.
      Metric: The NSS subscale: how much does it increase?

    Key issues

    NSS: A&F scores don't affect the overall student rating of a course

    Perhaps feedback doesn't make a difference to the amount of learning. Teachers should have communicated the material in advance, making feedback unnecessary; and learners should know how to check and remediate their own learning, not rely on being told.

    F-prompting seems to be SO important: transformative of whether students learn from feedback. The main problem seems to be that our students mostly do not have any concept of learning from our written feedback: it doesn't occur to them to actively use it.

    Transformation: How to achieve change in practice

    Reflecting back on the success of REAP gave us some ideas on what does (and does not) go into making a project effective at actually changing learning and teaching in practice, and making it measurably better. These papers are about this, and so effectively on ideas about how to design and run large projects that bring about significant, large scale changes (in areas such as A&F).

  • References and links to Carol Twigg's work on transformation       (Resources for doing it her way)

  • Transforming Higher Education through Technology-Enhanced Learning ed. Terry Mayes, Derek Morrison, Harvey Mellar, Peter Bullen and Martin Oliver (2009) (York: Higher Education Academy). A book, available online.

    QEE website

  • QEE / QET   Integrative Assessment
  • HEAcademy

