Safer Science

Simine Vazire

At the last ARP conference in Charlotte, I organized a symposium called “Safer Science: How to improve the quality and replicability of personality research”. I got the idea for this symposium from various discussions I’d had about the “replicability crisis”. First, I served on an APS taskforce on this issue, during which my primary role was to throw a yellow flag every time the group made a recommendation that applied only to experimental designs. This experience made me reflect on which of the challenges our field faces are specific to experimental (i.e., mostly social and cognitive) psych, and which are unique to non-experimental research. It’s easy to get haughty about not dropping conditions or not peeking at your data when you work with large-scale correlational datasets. But we personality researchers are also susceptible to capitalizing on chance in other ways (it’s easy to do when our datasets have 7,000 variables).
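
To make that concrete, here is a minimal simulation of my own (it is not from the symposium, and it assumes numpy and scipy are available) showing how pure noise yields “significant” correlations once the number of variables grows:

```python
# Illustrative simulation: with many variables, pure noise alone
# produces plenty of "significant" correlations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_vars = 200, 100                    # a small slice of a 7,000-variable dataset
noise = rng.normal(size=(n_subjects, n_vars))    # no true relationships at all

n_tests, false_positives = 0, 0
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        _, p = stats.pearsonr(noise[:, i], noise[:, j])
        n_tests += 1
        false_positives += p < 0.05

# Expect roughly 5% of the 4,950 tests (~250 pairs) to come out "significant".
print(f"{false_positives} of {n_tests} correlations significant at p < .05")
```

In a 7,000-variable dataset there are roughly 24.5 million pairwise tests, so the same 5% base rate implies over a million spurious “findings” waiting to be discovered.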

The purpose of the Safer Science symposium was to stimulate discussion about what personality researchers can do to improve the quality of our research. I used the phrase “safer science” because I believe that, like sex, science is never completely safe – there will always be errors, false positives, and even fraud, and we should not delude ourselves that we can eradicate them. However, we can always do better. Some people seem to find the replicability crisis depressing – I find it uplifting, because the popularity of the reform initiatives shows that we want to do better. This is what progress looks like. This is the only way a field becomes stronger. Rather than pointing fingers, personality researchers should join in.

There are signs of change everywhere.  JRP has new submission guidelines emphasizing power and transparency, and encouraging replication studies.  SPSP is in the midst of similar changes.  APS has already opened its doors to pre-registered replication reports, and is about to put in place new submission guidelines for Psychological Science. NSF is holding a workshop on replicability.  It’s a brave new world.

What does this mean for personality research? The talks in the Safer Science symposium shed light on some of the issues we should keep in mind as we try to do better science. Here are a few highlights (with some editorializing from me):

So where do we go from here? First, we shouldn’t get too comfortable. None of the journals we examined reached the average sample size Sanjay recommended: 180 for 80% power (to detect an r of .21), or 300 for 90% power. The average sample size at JP – the journal that came out on top – is 178, at JRP it is 128, and at JPSP:PPID it is 122 (Fraley & Vazire, 2013). We are still falling short of adequate power. So lesson #1 is that we should, whenever possible, take the time to increase our sample sizes.
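
For readers who want to check numbers like these, here is a minimal sketch of the required N for detecting a correlation, using the standard Fisher z approximation (my own illustration; the effect sizes below are examples, and the talk may have used different assumptions):

```python
# A rough sample-size calculation for correlations, via the Fisher z
# approximation (alpha = .05, two-tailed). A sketch, not the method
# necessarily used in the talk.
import math
from scipy.stats import norm

def required_n(r: float, power: float, alpha: float = 0.05) -> int:
    """Smallest N expected to detect a true correlation r with the given power."""
    z_r = math.atanh(r)                           # Fisher z transform of r
    z_total = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil((z_total / z_r) ** 2 + 3)    # SE of Fisher z is 1/sqrt(N - 3)

for r in (0.10, 0.21, 0.30):
    print(f"r = {r:.2f}: N = {required_n(r, 0.80):4d} for 80% power, "
          f"N = {required_n(r, 0.90):4d} for 90% power")
```

At r = .21 – roughly the average published effect size in social psychology – the approximation gives N ≈ 176 for 80% power, in line with the 180 figure above. And because the required N grows roughly as 1/r², halving the effect size quadruples the sample you need.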

Second, we should remember to take the time to attempt to replicate our own and each other’s work.  And we shouldn’t take an attempted replication as a sign of mistrust.  Indeed, as Funder put it, we should be flattered if someone deems our finding important enough to be worthy of attempted replication.

Notice that I used the phrase “take the time” in both of these recommendations. That’s because increasing our sample sizes and replicating our results will take time. And time is something none of us feels we have in abundance. So what are we to do? One option is to put all of our studies on MTurk, or to do only self-report or vignette studies. That is my idea of research dystopia. Yes, we need larger samples and more replication, but not at the expense of methodological rigor – we need to continue using multiple methods, sampling non-college students, coding actual behavior, and tracking people over time. So there’s only one solution: we need to join the slow science movement.

References

Fraley, R. C., & Vazire, S. (2013). The N-Pact Factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. Manuscript under review.