My View on the Connection between Theory and Direct Replication – Brent Donnellan (The Trait-State Continuum)

I loved Simine’s blog post on flukiness and I don’t want to hijack the comments section of her blog with my own diatribe. So here it goes… I want to comment on the suggestion that researchers should propose an alternative theory to conduct a useful or meaningful close/exact/direct replication. In practice, I think most replicators draw on the same theory that original authors used for the original study.  Moreover, I worry that people making this argument (or even more extreme variants) sometimes get pretty darn close to equating a theory with a sort of religion.  As in, you have to truly believe (deep in your heart) the theory or else the attempt is not valid.  The point of a direct replication is to make sure the results of a particular method are robust and obtainable by independent researchers. My take: Original authors used Theory P to derive Prediction Q (If P then Q). This is the deep structure of the Introduction of their paper.  They then report evidence consistent with Q using a particular Method (M) in the Results section. A replicator might find the theoretical reasoning more or less plausible but mostly just think it is a good idea to evaluate whether repeating M yields the same result (especially if the original study was underpowered).* The point of the replication is to redo M (and ideally improve on it using a larger N to generate more precise parameter estimates) to test Prediction Q. Continue reading
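
Donnellan's aside about using a larger N to get more precise parameter estimates is easy to see with a quick simulation. The sketch below is mine, not from his post; the effect size (d = 0.4), the per-group sample sizes, and the number of simulated studies are illustrative assumptions. It shows that quadrupling the per-group N roughly halves the spread of the effect-size estimates you would see across hypothetical direct replications.

```python
# Illustrative sketch (not from the original post): how a larger N tightens
# the precision of an effect-size estimate across hypothetical replications.
# The true effect (d = 0.4) and the sample sizes below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
TRUE_D = 0.4  # assumed population effect (Cohen's d)

def estimated_d(n_per_group):
    """Estimate Cohen's d from one simulated two-group study."""
    treatment = rng.normal(TRUE_D, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treatment.mean() - control.mean()) / pooled_sd

for n in (20, 80, 320):
    estimates = [estimated_d(n) for _ in range(5000)]
    print(f"n per group = {n:>3}: SD of d estimates = {np.std(estimates):.3f}")
```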

on flukiness – Simine Vazire (sometimes i'm wrong)


this idea keeps popping up: if you conduct a replication study and get a null result, you need to explain why the original study found a significant effect and you didn't.
what's wrong with this idea?  a few things.
first, it seems to discount the possibility that the original finding was a fluke - a false positive that made it look like there is an effect when in fact there isn't.  here's an analogy:
null hypothesis: my coin is fair
research hypothesis: my coin is weighted
original study: i flip the coin 20 times and get 15 heads (p = .041)
replication study: i flip the coin another 20 times and get 10 heads (p = 1.0)
do i need to explain why i got 15 heads the first time?
maybe.  or maybe the first study was just a fluke.  that happens sometimes (4.1% of the time, to be exact).
what if the replication study was: i flip the same coin 100 times and get 50 heads?  now isn't the evidence pretty strong that the null is true, and the original study was just a fluke? Continue reading
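
For anyone who wants to check the arithmetic, the p-values in the coin example come from an exact two-sided binomial test. The snippet below is a minimal sketch of that calculation (using scipy is my assumption; the post itself only reports the numbers):

```python
# Exact two-sided binomial tests for the coin-flip example
# (requires scipy >= 1.7 for binomtest).
from scipy.stats import binomtest

print(binomtest(15, n=20, p=0.5).pvalue)   # original study: 15/20 heads  -> ~0.041
print(binomtest(10, n=20, p=0.5).pvalue)   # replication:    10/20 heads  -> 1.0
print(binomtest(50, n=100, p=0.5).pvalue)  # larger replication: 50/100   -> 1.0
```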

ASA releases consensus statement – Sanjay Srivastava (The Hardest Science)

Several months ago, the journal Basic and Applied Social Psychology published an editorial announcing a “ban” on p-values and confidence intervals, and treating Bayesian inferential methods with suspicion as well. The editorial generated quite a bit of buzz among scientists and statisticians alike. In response, the American Statistical Association released a letter expressing concern about the prospect of doing science without any inferential statistics at all. It announced that it would assemble a blue-ribbon panel of statisticians to issue recommendations. That statement has now been completed, and I got my hands on an advance copy. Here it is:

We, the undersigned statisticians, represent the full range of statistical perspectives, Bayesian and frequentist alike. We have come to full agreement on the following points:

1. Regarding guiding principles, we all agree that statistical inference is an essential part of science and should not be dispensed with under any circumstances. Whenever possible you should put one of us on your grant to do it for you.

2. Continue reading

Gender Imbalance in Discussions of Best Research Practices – Michael Kraus (Psych Your Mind)

Over the last couple of weeks, there have been some really excellent blog posts about gender representation in discussions of best research practices. The first was a shared email correspondence between Simine Vazire and Lee Jussim. The second was a report of gender imbalance in discussions of best research practices by Alison Ledgerwood, Elizabeth Haines, and Kate Ratliff. Before that (in May 2014), Sanjay Srivastava wrote about a probable diversity problem in the best practices debate. Go read these posts! I'll be here when you return. Read More->

“Open Source, Open Science” Meeting Report – March 2015 – Tal Yarkoni ([citation needed])

[The report below was collectively authored by participants at the Open Source, Open Science meeting, and has been cross-posted in other places.] On March 19th and 20th, the Center for Open Science hosted a small meeting in Charlottesville, VA, convened by COS and co-organized by Kaitlin Thaney (Mozilla Science Lab) and Titus Brown (UC Davis). People working across the open science ecosystem attended, including publishers, infrastructure non-profits, public policy experts, community builders, and academics. Open Science has emerged into the mainstream, primarily due to concerted efforts from various individuals, institutions, and initiatives. This small, focused gathering brought together several of those community leaders. The purpose of the meeting was to define common goals, discuss common challenges, and coordinate on common efforts. We had good discussions about several issues at the intersection of technology and social hacking, including badging, improving standards for scientific APIs, and developing shared infrastructure. We also talked about coordination challenges due to the rapid growth of the open science community. At least three collaborative projects emerged from the meeting as concrete outcomes to combat the coordination challenges. A repeated theme was how to make the value proposition of open science more explicit. Why should scientists become more open, and why should institutions and funders support open science? We agreed that incentives in science are misaligned with practices, and we identified particular pain points and opportunities to nudge incentives. Continue reading

Guest Post: Not Nutting Up or Shutting Up – Simine Vazire (sometimes i'm wrong)

Not nutting up or shutting up: Notes on the demographic disconnect in our field’s best practices conversation

Alison Ledgerwood, Elizabeth Haines, and Kate Ratliff

A few weeks ago, two of us chaired a symposium on best practices at SPSP focusing on concrete steps that researchers can take right now to maximize the information they get from the work that they do. Before starting, we paused briefly to ask a couple of simple questions about the field’s ongoing conversation on these issues. Our goal was to take a step back for a moment and consider both who is doing the talking and how we are talking about these issues. Apparently our brief pause sounded strident to some ears, and precipitated an email debate that was ultimately publicized on two blogs. The thing is, the issues we originally wanted to raise seemed to be getting a little lost in translation. And somehow, despite the absolute best of intentions of the two people having the (cordial, reasonable, interesting) debate, we had become literally invisible in the conversation that was taking place. So we thought maybe we would chime in, and Simine graciously allowed us to guest blog.* As we said in our symposium, a conversation about where the field as a whole is going should involve the field as a whole. And yet, when we look at the demographics of the voices involved in the conversation on best practices and the demographics of the field, it’s clear that there’s a disconnect.** For instance, the SPSP membership is about 56% female. Continue reading

Is there p-hacking in a new breastfeeding study? And is disclosure enough? – Sanjay Srivastava (The Hardest Science)

There is a new study out about the benefits of breastfeeding on eventual adult IQ, published in The Lancet Global Health. It’s getting lots of news coverage, for example from NPR, the BBC, the New York Times, and more. A friend shared a link and asked what I thought of it. So I took a look at the article and came across this (emphasis added):

We based statistical comparisons between categories on tests of heterogeneity and linear trend, and we present the one with the lower p value. We used Stata 13·0 for the analyses. We did four sets of analyses to compare breastfeeding categories in terms of arithmetic means, geometric means, median income, and to exclude participants who were unemployed and therefore had no income.

Yikes. The description of the analyses is frankly a little telegraphic. But unless I’m misreading it, or they did some kind of statistical correction that they forgot to mention, it sounds like they had flexibility in the data analyses (I saw no mention of a pre-registered analysis plan), they used that flexibility to test multiple comparisons, and they’re openly disclosing that they used p-values for model selection – which is a more technical way of saying they engaged in p-hacking. (They don’t say how they selected among the four sets of analyses with different kinds of means etc.; was that based on p-values too?)* Continue reading
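
To see why "present the one with the lower p value" raises eyebrows, here is a minimal simulation sketch. It is my own illustration, not a reanalysis of the Lancet data, and the group sizes and number of simulated datasets are arbitrary assumptions. Even when there is no effect at all, picking the smaller of a heterogeneity p-value (one-way ANOVA) and a linear-trend p-value (regression on the category index) pushes the false-positive rate above the nominal 5%.

```python
# Illustrative sketch (not the Lancet analysis): under a true null, report
# whichever of two tests (heterogeneity via one-way ANOVA, or linear trend
# via regression on the category index) has the lower p-value, and the
# false-positive rate creeps above the nominal 5%. Group sizes are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, n_groups, n_sims = 50, 4, 5000
false_positives = 0

for _ in range(n_sims):
    groups = [rng.normal(0, 1, n_per_group) for _ in range(n_groups)]  # no real effect
    p_heterogeneity = stats.f_oneway(*groups).pvalue
    x = np.repeat(np.arange(n_groups), n_per_group)  # category index per observation
    y = np.concatenate(groups)
    p_trend = stats.linregress(x, y).pvalue          # linear-trend test
    if min(p_heterogeneity, p_trend) < 0.05:         # keep whichever p is lower
        false_positives += 1

print(f"False-positive rate when reporting the lower p: {false_positives / n_sims:.3f}")
```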