Methods & Meta-science

Essentials of data manipulation

Datasets in psychology and neuroscience are becoming increasingly complex, and researchers typically spend huge amounts of time just coping with this complexity. Most researchers use an ad hoc, patched-together workflow involving numerous manual point-and-click operations across multiple software packages. Not only is such a workflow inefficient and error-prone, it is also not reproducible. We can improve this situation by taking note of data scientists' recent attempts to develop a "grammar" for data manipulation, now implemented in several packages (e.g., dplyr in R, pandas in Python). I will give an overview of these developments and lead a discussion on which data manipulation skills are essential for researchers to know.