Methods & Meta-science

Stimulus-level control variables and effective sample size

Here, I wish to discuss an issue that I recently encountered in one of my supervised maxi projects. In this project, we were interested in whether capitalisation (LIKE SO) is less detrimental to the recognition of "loud" words (e.g. "hurricane") than to the recognition of "silent" words (e.g. "monastery"). Given that previous word recognition research had shown an interaction between word frequency and capitalisation, it was important to balance frequency between loud and silent words. Following common practice, we selected and pre-tested 36 loud and 36 silent words so that, at item level, there was no significant difference in lexical frequency, despite a numerical trend towards loud words being more frequent (95% CI for the difference: 0.47 ± 0.60).

The problem is as follows: the non-significant frequency difference holds at item level, but since the stimuli were presented repeatedly, to different participants (N = 33), statistical power for any frequency-related effect increases accordingly. Indeed, at subject level, we obtain a 95% CI of 0.47 ± 0.10 for the frequency difference. This interval excludes zero, meaning that we cannot simply neglect the potential influence of frequency.

Potential lessons to be learned are (a) ideally use 'matched pairs' of stimuli that are equated on the control variable, or (b) make sure that your norms take effective sample sizes into account.
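To see why the interval narrows so much, here is a minimal sketch in Python. It assumes that the subject-level analysis effectively treats every presentation of a word as an independent observation, so that each item's frequency value enters the computation once per participant; this is my reading of the numbers above, not a description of the exact analysis we ran. The frequency values are randomly generated placeholders, not our stimuli, and the helper `ci_half_width` is a hypothetical name that uses a normal approximation rather than a t distribution.

```python
import math
import random
import statistics


def ci_half_width(a, b, z=1.96):
    """Approximate 95% CI half-width for the difference between two
    independent means (normal approximation, unpooled variances)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return z * se


random.seed(1)
n_items = 36         # words per condition, as in the project
n_participants = 33  # participants, as in the project

# Placeholder frequency values (assumption: NOT the real norms).
loud = [random.gauss(3.0, 1.3) for _ in range(n_items)]
silent = [random.gauss(2.5, 1.3) for _ in range(n_items)]

# Item level: each word counts once.
item_level = ci_half_width(loud, silent)

# Observation level: each word counts once per participant.
trial_level = ci_half_width(loud * n_participants, silent * n_participants)

ratio = item_level / trial_level
print(f"item-level half-width:        {item_level:.3f}")
print(f"observation-level half-width: {trial_level:.3f}")
print(f"shrinkage factor: {ratio:.2f} (sqrt(N) = {math.sqrt(n_participants):.2f})")

# Applying the sqrt(N) rule of thumb to the reported item-level interval:
approx_subject_half_width = 0.60 / math.sqrt(n_participants)
print(f"0.60 / sqrt(33) = {approx_subject_half_width:.2f}")
```

Under this assumption the half-width shrinks by a factor close to the square root of the number of participants, whatever the placeholder values are, and 0.60 divided by the square root of 33 comes out at about 0.10, in line with the subject-level interval reported above. The mean difference itself does not change, which is why a numerical trend that looks harmless across 36 items per condition becomes hard to dismiss once the repeated presentations are counted.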