30 June 2019 ............... Length about 1,000 words (11,000 bytes).
(Document started on 29 Jan 2017.)
This is a WWW document maintained by
Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/best/rct.html.
You may copy it.
RCTs: summary of the gold standards, and degrees of merit, for experimental designs
Department of Psychology,
University of Glasgow.
(In this web page, any literature references given or implied should be (although may in fact not be) on my page of educ. lit refs.)
This page overlaps with another /~steve/educ/meth.html
and I'm not sure whether to merge them.
To do: add the material (now in my hawth/placebo page) on sham interventions as an
important control against participants' (Ps') expectations.
The idea is that, in medicine at least, the gold standard for a study is the RCT (randomised controlled trial).
I've found that this standard can be a good way to discuss and report on less-than-gold trials, by providing a checklist of which inferences from the data are and are not supported in a given trial.
Furthermore, there are some other terms worth noting for similar use with common designs e.g. "two-arm trial".
And there are some extra grades on top of RCT (platinum standard?).
Merit score of quality from 1 upwards:
1. Trial (an intervention, not a correlational study)
2. Randomised (not a convenience sample, not self-selected. Stratified?)
Allocation concealment: preventing the people receiving participants
from knowing which condition the new P will be placed in until after that
allocation has been done.
3. Controlled:
by baseline ==? within-participants design.
Cross-over trials. 2-arm trials.
by "treatment as usual" i.e. the treatment in use before the new idea being tested was proposed.
by baseline: a larger group getting no treatment.
4. Blinded (e.g. by placebo); then the degree of blinding: 0, 1, 2, 3 [triple-blind]
5. Merely formal blinding vs. actually testing that the participant didn't know whether the condition was placebo or "treatment":
despite formal blinding, many patients can tell from side-effects.
6. Co-morbidities excluded
7. Co-drugs excluded (confounded by potential multi-drug interactions?)
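To make grade 2 in the list above concrete, here is a minimal sketch of stratified, blocked random allocation in which the arm is revealed only at the moment a participant is handed over, which is one simple way of modelling allocation concealment. The class name, the arm labels, the stratum labels and the block size are all hypothetical choices made for illustration, not part of any standard trial software. Real trials usually hold the sequence with a third party, and small fixed blocks can make the last allocation in a block guessable.

```python
import random
from collections import defaultdict

ARMS = ("treatment", "control")  # hypothetical arm labels


class ConcealedAllocator:
    """Stratified, blocked random allocation revealed one participant at a time."""

    def __init__(self, block_size=4, seed=None):
        if block_size % len(ARMS) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self._block_size = block_size
        self._rng = random.Random(seed)
        self._pending = defaultdict(list)  # stratum -> arms not yet handed out
        self.log = []                      # (participant_id, stratum, arm)

    def allocate(self, participant_id, stratum):
        """Return the arm for one participant who has already been enrolled."""
        queue = self._pending[stratum]
        if not queue:
            # Start a new balanced block for this stratum and shuffle it.
            block = list(ARMS) * (self._block_size // len(ARMS))
            self._rng.shuffle(block)
            queue.extend(block)
        arm = queue.pop()
        self.log.append((participant_id, stratum, arm))
        return arm
```

The recruiter calls `allocate(participant_id, stratum)` only after the participant has been accepted into the trial, so the recruiter's knowledge of the next arm cannot influence who gets enrolled. Within each stratum, every completed block contains equal numbers of each arm.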
What is so great about randomisation and comparative trials is that they can
balance out, cancel out, factors which you neither understand nor even suspect
exist ("unknown unknowns"). In a way, this is essential to science,
because for science to exist at all it must have been possible to create knowledge about some
factor without knowing in advance all the factors that exist.
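A small simulation can illustrate this balancing of unknown factors. The sketch below assumes a single hidden factor that strongly affects the outcome and that nobody measures; the function name, the effect sizes and the self-selection rule (people with a high hidden factor choose treatment) are invented purely for illustration.

```python
import random
import statistics


def simulate(n=20000, true_effect=1.0, seed=0):
    """Compare the estimated effect under random allocation and self-selection."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]  # a factor nobody measured

    def outcome(h, treated):
        # The hidden factor matters more than the treatment itself.
        return 2.0 * h + (true_effect if treated else 0.0) + rng.gauss(0, 1)

    def estimate(assignment):
        treated = [outcome(h, True) for h, t in zip(hidden, assignment) if t]
        control = [outcome(h, False) for h, t in zip(hidden, assignment) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    randomised = [rng.random() < 0.5 for _ in range(n)]
    self_selected = [h > 0 for h in hidden]  # assumed self-selection rule
    return estimate(randomised), estimate(self_selected)


rct_estimate, self_selected_estimate = simulate()
```

Under these assumptions the randomised estimate lands close to the true effect of 1.0, because the hidden factor is spread roughly evenly across both arms, while the self-selected estimate is several times too large, because the hidden factor is confounded with who received the treatment.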
RCTs in themselves address key biases, but not all.
Other sources of bias remain, though they too may have commonly used solutions: see e.g.
Jadad, A. R. and Enkin, M. W. (2007) "Bias in Randomized Controlled Trials" in
Randomized Controlled Trials: Questions, Answers, and Musings,
Second Edition, Blackwell Publishing Ltd, Oxford, UK.
Common problems with both medical and educational trials
In public health / medicine.
MRC advice, which on page 4 has this sequence:
- Muddled thinking about ethics, that confuses what we know after the trial
with what we know in advance. (Retrolental fibroplasia)
- Stats should additionally report what % of Ps see the effect. Means
don't tell you this; effect size doesn't tell you this.
- Regression to the mean: this arises from selecting for a trial only patients with a
particular diagnosis. A number of these will randomly revert to below the
threshold for diagnosis regardless of any intervention. An RCT does control
for this BUT does not let you measure how many (what percent) were
cured by the treatment (even more so if the "control" group get "standard
treatment" rather than no treatment).
The education counterpart of this is those who do better in the post-test
regardless, for random reasons or due to "maturation" ....
- Forgetting to keep in mind that each experiment in these fields is not
stand-alone but part of a chain of types of study. Sample road maps of these:
In medicine / drug trials:
- Studies in lab cell cultures;
- Animal studies (!A failed mouse model of the human immune system very
nearly killed 100% of the human treatment group);
- Human safety studies;
- Small human trials of the treatment, perhaps with dying people;
- Large scale treatment trials.
In education (Shayer's view):
- Pre-clinical (Theory)
- Phase I: Modelling
- Phase II: Exploratory trial
- Phase III: Definitive RCT
- Phase IV: Long-term implementation
- One "lab" trial of an intervention done by the chief researcher
(hugely non-blind AND implicit skill and charisma of the researcher)
- Second trial, using any other teacher
(still non-blind AND still probably an enthusiast for the new method)
- Create and use a training course for teachers: trial to see if benefits
to learners still occur.
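The regression-to-the-mean point in the list above can be illustrated with a simulation. This is only a sketch under invented assumptions: each person has a stable true level plus independent measurement noise at screening and at follow-up, only those above a diagnostic threshold at screening are recruited, and the treatment simply lowers the follow-up score by a fixed amount. All names and numbers are hypothetical.

```python
import random
import statistics


def regression_to_mean_demo(n=50000, threshold=1.0, effect=0.5, seed=0):
    """Recruit only those above threshold at screening, then re-measure them."""
    rng = random.Random(seed)
    control_after, treated_after = [], []
    for _ in range(n):
        true_level = rng.gauss(0, 1)
        screening = true_level + rng.gauss(0, 1)
        if screening <= threshold:
            continue  # not diagnosed, so not recruited
        follow_up = true_level + rng.gauss(0, 1)
        if rng.random() < 0.5:
            treated_after.append(follow_up - effect)
        else:
            control_after.append(follow_up)  # no intervention at all

    def fraction_below(scores):
        return sum(s <= threshold for s in scores) / len(scores)

    return {
        "control_below_threshold": fraction_below(control_after),
        "treated_below_threshold": fraction_below(treated_after),
        "mean_difference": statistics.mean(control_after)
        - statistics.mean(treated_after),
    }


result = regression_to_mean_demo()
```

In this toy model a large share of the untreated control arm falls back below the diagnostic threshold at follow-up purely by chance. The share of the treated arm below threshold therefore overstates the benefit of treatment; only the comparison between arms is attributable to the treatment, and that comparison is a group-level difference, not a count of which individuals were cured.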
Common problems with medical trials
- A clinical sample is a self-selected, not random, sample.
- A clinical sample excluding all those who don't complete the trial
is self-selected not only by wanting treatment but by the treatment being
perceived by the P as effective.
- Intention to treat analysis.
This may be better than ignoring the issue, or excluding all those who don't
fully comply with the trial. It preserves randomisation by the Rs;
and it includes the real world issue of refusing treatment (and leaving the
trial) because the treatment is unpleasant.
But it doesn't prevent self-selection:
it doesn't solve problems such as patients leaving before the end
because they feel cured, or leaving because they can tell they aren't getting the
treatment. Nor does it solve the problem that decisions to leave or continue can and will
change as knowledge or prejudices about the treatment change over
time. So it cannot be said to represent the near-future real world context
that would be most useful to learn about from a trial.
And the sample it sees, in both intervention and control groups, may consist only,
or misleadingly largely, of those who feel they are getting better (in the control
group, those who have spontaneously got better).
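The contrast between intention-to-treat analysis and analysing completers only can be sketched with a toy simulation. The assumptions here are invented for illustration: the treatment has no true effect, participants in the treatment arm who are doing badly drop out, and outcomes are still recorded for those who drop out (which real trials often cannot manage). The function name and the dropout rule are hypothetical.

```python
import random
import statistics


def itt_vs_per_protocol(n=20000, seed=0):
    """Return (intention-to-treat estimate, completers-only estimate)."""
    rng = random.Random(seed)
    rows = []  # (arm, completed, outcome)
    for _ in range(n):
        arm = "treatment" if rng.random() < 0.5 else "control"
        outcome = rng.gauss(0, 1)  # higher is better; treatment has no true effect
        # Assumed dropout rule: in the treatment arm only, Ps doing badly quit.
        completed = not (arm == "treatment" and outcome < -0.5)
        rows.append((arm, completed, outcome))

    def mean_diff(selected):
        treated = [o for a, _, o in selected if a == "treatment"]
        control = [o for a, _, o in selected if a == "control"]
        return statistics.mean(treated) - statistics.mean(control)

    itt = mean_diff(rows)                                # everyone as randomised
    per_protocol = mean_diff([r for r in rows if r[1]])  # completers only
    return itt, per_protocol


itt_estimate, per_protocol_estimate = itt_vs_per_protocol()
```

Under these assumptions the intention-to-treat estimate stays near zero, as it should for an ineffective treatment, while the completers-only estimate shows a spurious benefit created entirely by who chose to stay. The sketch does not address the further problems raised above, such as the reasons for leaving changing over time.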
Common problems with education trials
- The intervention, which the RCT compares to a control condition, is not
in fact the treatment we want to study, but is the combination of:
a) the treatment; b) the testing (which often also causes learning);
c) and often the rest of the context e.g. school, what is in the news, the
time in the academic year; ....
- Delayed post-test effects.
(Accelerated learning. Learning to learn. Catalytic effects. Papert's
gearwheels — developmental events that are measurable only years later.)
- Need for blinding the teacher (Pygmalion effect)
- Learners (Ls) having different levels of internalised meta-cognitive skills at learning.
The better L is at learning, the less difference the teaching / intervention makes.
- And these skills may be acquired during the trial.
(Asymmetric learning effects.)
- The teacher is in almost all cases the biggest causal factor, and the
learning design you are interested in is just a minor additional factor.
Hanushek; Dylan Wiliam.
- Lovisa's problem: are there any standardised measures available?
In an RCT non-standardised measures still allow an unbiased comparison of
groups; but unless standardised, you can't compare the measurements from one
trial to another e.g. a class's marks from one year to another.
You can today measure time very accurately; but not knowledge in any HE
discipline (no national exam bodies).
- The length of time to study e.g.
- Memory retention over a few minutes in a 60 second trial
- VS. the ecologically valid unit of learning time in HE:
one semester. (N.B. This is not different from many / most medical trials.)
In medical trials, the comparable issue is selecting the survival period to
measure and use.
- Can do 1-hour interventions, with one or more delayed measures e.g. at
end of semester. This is
not very difficult, although a) drop-outs are significantly more numerous; b) other
stimuli over the whole period add to the random noise. But many successful
educational trials have been like this.
- BUT always remember that with short things you can do very many
more experiments. (e.g. compared to animal and plant breeding).
- Also not only more experiments, but more doses within one intervention.
E.g. do 6 mini-presentations per P within an hour. Look at the number of
seconds a student speaks in different cases. In a class of 200, most never
speak a word; in a class of 30 (school) it is shocking how few seconds
each pupil can speak in an "interactive" class.
In a group of 6, maybe one 5 min. speech by each L; but with 1
min. talks, and groups of 3, each can speak say 6 times and get / give
feedback to the other 2 in their group.
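The speaking-time comparison above is simple arithmetic, sketched here as an equal-share upper bound. The function name and the `learner_talk_fraction` parameter (the share of the session in which learners rather than the teacher are talking) are hypothetical, and the 20% figure used in the examples is an assumption for illustration, not a measured value.

```python
def speaking_seconds_per_learner(session_minutes, group_size,
                                 learner_talk_fraction=1.0):
    """Average seconds each learner could speak if talk time were shared equally."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if not 0.0 <= learner_talk_fraction <= 1.0:
        raise ValueError("learner_talk_fraction must be between 0 and 1")
    return session_minutes * 60 * learner_talk_fraction / group_size


# Whole-class discussion, assuming learners hold the floor 20% of the time.
lecture_200 = speaking_seconds_per_learner(60, 200, learner_talk_fraction=0.2)
school_30 = speaking_seconds_per_learner(60, 30, learner_talk_fraction=0.2)

# Parallel small groups: every group talks at once for the whole hour.
groups_of_6 = speaking_seconds_per_learner(60, 6)
groups_of_3 = speaking_seconds_per_learner(60, 3)
```

With these assumed figures, an hour in a class of 30 gives each pupil about 24 seconds and a class of 200 gives under 4 seconds each, whereas groups of 3 working in parallel leave up to about 20 minutes per learner to divide between short talks and feedback.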