Category Archives: funderstorms

Thresholds – David Funder (funderstorms)

Part One

I’ve been suffering an acute bout of cognitive dissonance lately, finding myself disagreeing with people I admire; specifically, several of the authors of this article. (The article has 72 authors and I don’t know all of them!) The gist of the article can be stated very simply, in the authors’ own words: “We propose to change the default P-value threshold for statistical significance for claims of new discoveries from .05 to .005.” This proposal is soberly and clearly argued, and the article makes some good points, the best of which is that, imperfect as this change would be, at least it’s a step in the right direction. But I respectfully disagree. Here’s why.

I’m starting to think that p-levels should all be labeled “for entertainment purposes only.” They give a very, very rough idea of the non-randomness of your data, and are kind of interesting to look at. So they’re not completely useless, but they are imprecise at best and almost impossible to interpret at worst*, and so should be treated as only one among many considerations when we decide what we as scientists actually believe. Other considerations (partial list): prior probabilities (also very rough!), conceptual coherence, consistency with related findings, and (hats off, please) replicability.

Why doesn’t personality psychology have a replication crisis? – David Funder (funderstorms)

Because It’s Boring

“[Personality psychology] has reduced the chances of being wrong but palpably increased the fact of being boring. In making that transition, personality psychology became more accurate but less broadly interesting.” – Roy Baumeister (2016, p. 6)

Many fields of research – not just social psychology but also biomedicine, cancer biology, economics, political science, and even physics – are experiencing crises of replicability. Recent and classic results are challenged by reports that when new investigators try to repeat them, often they simply can’t. This fact has led to gnashing of teeth and rending of garments, not to mention back-and-forth controversies pitting creativity against rigor (see the article quoted in the epigraph), and has spawned memorable phrases such as “replication police” and “shameless little bullies.”

But, as the quote above attests, personality psychology seems to be immune. In particular, I am not aware of any major finding (1) in personality psychology that has experienced the kind of assault on its reliability that has been inflicted upon many findings in social psychology (2). Why not? Is it because personality psychology is boring? Maybe so, and I’ll come back to that point at the end, but first let’s consider some other

Possible Reasons Personality Psychology Does Not Have a Replication Crisis

  1. Personality Psychology Takes Measurement Seriously
The typical study in personality measures some attribute of persons (usually a personality trait) and also measures an outcome such as a behavior, a level of attainment, or an indicator of mental or physical health.

What if Gilbert is Right? – David Funder (funderstorms)

I. The Story Until Now (For late arrivals to the party)

Over the decades, since about 1970, social psychologists conducted lots of studies, some of which found cute, counter-intuitive effects that gained great attention. After years of private rumblings that many of these studies – especially some of the cutest ones – couldn’t be replicated, a crisis suddenly broke out into the open (1). Failures to replicate famous and even beloved findings began to appear publicly, become well known, and be thoroughly argued over, not always in the most civil of terms. The “replicability crisis” became a thing.

But how bad was the crisis really? The accumulation of anecdotal stories and one-off failures to replicate was perhaps clarified to some extent by a major project organized by the Center for Open Science (COS), published last November, in which labs around the world tried to replicate 100 studies and, depending on your definition, “replicated” only 36% of them (2).

In the face of all this, some optimists argued that social psychology shouldn’t really feel so bad, because failed replicators might simply be incompetent, if not actually motivated to fail, and the typical cute, counter-intuitive effect is a delicate flower that can only bloom under the most ideal climate and careful cultivation. Optimists of a different variety (including myself) also pointed out that psychology shouldn’t feel so bad, but for a different reason: problems of replicability are far from unique to our field. Failures to reproduce key findings have come to be seen as serious problems within biology, biochemistry, cardiac medicine, and even – disturbingly – cancer research. It was widely reported that the massive biotech company Amgen was unable to replicate 47 of 53 seemingly promising cancer biology studies. If we have a problem, we are far from alone.

II. And Then Came Last Friday’s News (3)

Prominent psychology professors Daniel Gilbert and Tim Wilson published an article that “overturned” (4) the epic COS study.

Bargain Basement Bayes – David Funder (funderstorms)

One of the more salutary consequences of the “replication crisis” has been a flurry of articles and blog posts re-examining basic statistical issues such as the relations between N and statistical power, the importance of effect size, the interpretation of confidence intervals, and the meaning of probability levels. A lot of the discussion of what is now often called the “new statistics” really amounts to a re-teaching (or first teaching?) of things anybody, certainly anybody with an advanced degree in psychology, should have learned in graduate school if not as an undergraduate. It should not be news, for example, that bigger N’s give you a bigger chance of getting reliable results, including being more likely to find effects that are real and less likely to be fooled into thinking you have found effects that aren’t real. Nor should anybody who had a decent undergrad stats teacher be surprised to learn that p-levels, effect sizes and N’s are functions of each other, such that if you know any two of them you can compute the third, and that therefore statements like “I don’t care about effect size” are absurd when said by anybody who uses p-levels and N’s.

But that’s not my topic for today. My topic today is Bayes’ theorem, which is an important alternative to the usual statistical methods, but which is rarely taught at the undergraduate or even graduate level (1). I am far from an expert on Bayesian statistics. This fact gives me an important advantage: I won’t get bogged down in technical details; in fact that would be impossible, because I don’t really understand them. A problem with discussions of Bayes’ theorem that I often see in blogs and articles is that they have a way of being both technical and dogmatic. A lot of ink – virtual and real – has been spilled over the exact right way to compute Bayes Factors, and in advocating that all statistical analyses should be conducted within a Bayesian framework.
I don’t think the technical and dogmatic aspects of these articles help non-experts appreciate what thinking in a semi-Bayesian way has to offer – in fact I think they mostly do harm. So, here is my extremely non-technical and very possibly wrong (2) appreciation of what I call Bargain Basement Bayes.
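The point that p-levels, effect sizes and N’s are functions of each other can be made concrete. What follows is a minimal sketch, assuming the effect size is a Pearson correlation r and using the Fisher z approximation rather than an exact t-test; the function names are hypothetical, chosen only for illustration. Given any two of r, N and the two-tailed p, each function recovers the third.

```python
import math
from statistics import NormalDist

# Standard normal distribution, used for the Fisher z approximation (an assumption of this sketch).
_STD_NORMAL = NormalDist()


def p_from_r_n(r: float, n: float) -> float:
    """Approximate two-tailed p for a correlation r observed in a sample of size n."""
    # Fisher's transform: atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-tailed p = 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def r_from_p_n(p: float, n: float) -> float:
    """Smallest |r| that just reaches two-tailed level p with sample size n."""
    z_crit = _STD_NORMAL.inv_cdf(1 - p / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def n_from_r_p(r: float, p: float) -> float:
    """Sample size at which an observed correlation r just reaches two-tailed level p."""
    z_crit = _STD_NORMAL.inv_cdf(1 - p / 2)
    return (z_crit / math.atanh(abs(r))) ** 2 + 3


if __name__ == "__main__":
    r = 0.30
    for alpha in (0.05, 0.005):
        print(f"alpha={alpha}: an observed r of {r} reaches significance at N of about {n_from_r_p(r, alpha):.0f}")
```

Run as written, this compares the .05 threshold with the .005 threshold proposed in the article discussed above: under this approximation an observed r of .30 just reaches significance at N of about 43 in the first case and about 85 in the second, so holding the effect size fixed, the stricter threshold roughly doubles the N required. Note that this is the N at which an observed effect crosses the threshold, not the N needed for adequate statistical power.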

Towards a De-biased Social Psychology: The effects of ideological perspective go beyond politics. – David Funder (funderstorms)

Behavioral and Brain Sciences, in press; subject to final editing before publication.

This is a commentary on: Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (in press). Political diversity will improve social psychological science. Behavioral and Brain Sciences.

How to Flunk Uber: A Guest Post by Bob Hogan – David Funder (funderstorms)

How to Flunk Uber
by Robert Hogan, Hogan Assessment Systems

Delia Ephron, a best-selling American author, screenwriter, and playwright, published an essay in the New York Times on August 31st, 2014 entitled “Ouch, My Personality, Reviewed” that is a superb example of what Freud called “the psychopathology of everyday life.” She starts the essay by noting that she recently used Uber, the car service for metrosexuals, and the driver told her that if she received one more bad review, “…no driver will pick you up.” She reports that this feedback triggered some “obsessive” soul searching: she wondered how she could have earned such a bad score as an Uber passenger when she had used the service only 6 times. She then reviewed her trips, noting that, although she had often behaved badly (“I do get short tempered when I am anxious”), in each case extenuating circumstances caused her behavior. She even got a bad review after a trip during which she said very little: “Perhaps I simply am not a nice person and an Uber driver sensed it.”

The essay is interesting because it is prototypical of people who can’t learn from experience. For example, when Ms. Ephron reviewed the situations in which she mistreated Uber drivers, she spun each incident to show that her behavior should be understood in terms of the circumstances (the driver’s poor performance) and not in terms of her personality. Perhaps situational explanations are the last refuge of both neurotics and social psychologists? In addition, although the situations changed, she behaved the same way in each of them: she complained, she nagged and micro-managed the drivers, she lost her temper, and she broadcast her unhappiness to the world.

The Real Source of the Replication Crisis – David Funder (funderstorms)

“Replication police.” “P-squashers.” “Hand-wringers.” “Hostile replicators.”  And of course, who can ever forget, “shameless little bullies.”  These are just some of the labels applied to what has become known as the replication movement, an attempt to improve science (psychological and otherwise) by assessing whether key findings can be reproduced in independent laboratories.

Replication researchers have sometimes targeted findings they found doubtful.  The grounds for finding them doubtful have included (a) the effect is “counter-intuitive” or in some way seems odd (1), (b) the original study had a small N and an implausibly large effect size, (c) anecdotes (typically heard at hotel bars during conferences) abound concerning naïve researchers who can’t reproduce the finding, (d) the researcher who found the effect refuses to make data public, has “lost” the data or refuses to answer procedural questions, or (e) sometimes, all of the above.

Fair enough. If a finding seems doubtful, and it’s important, then it behooves the science (if not any particular researcher) to get to the bottom of things. And we’ve seen a lot of attempts to do that lately. Famous findings by prominent researchers have been put through the replication wringer, sometimes with discouraging results. But several of these findings also have been stoutly defended, and indeed the failure to replicate certain prominent effects seems to have stimulated much of the invective thrown at replicators more generally.