Category Archives: funderstorms

Thoughts on “Ego Depletion” and Some Related Issues Concerning Replication – David Funder (funderstorms)

This brief essay was stimulated by a chapter by Baumeister (2019), which can be accessed at https://psyarxiv.com/uf3cn/.  “Fatigue,” though a common word, is far from being a boring or neglected concept. A quick search on PsycINFO reveals thousands of published articles on the subject (14,892, to be exact). A lot of this work is theoretical, more of it is applied, and all of it focuses on an experience that is common to everybody. I was particularly impressed by an article by Evans, Boggero, and Segerstrom (2016) that illuminates the connections between physical and psychological factors, and specifically addresses “how fatigue can occur even in the presence of sufficient resources.”  In fact, I read their article as providing evidence that fatigue usually occurs in the presence of sufficient resources – it’s not primarily a physical phenomenon at all; it’s a psychological one. This fact has many important implications. Fascinating stuff. The related phenomena demonstrated by many, many studies of “ego depletion” are real and important, and I personally have no doubt whatsoever about that. When people are tired, including psychologically tired (an interesting concept in its own right), their self-control abilities wane, and prepotent responses (such as emotional lashing out, simplistic thinking, overlearned habits, selfish impulses) tend to take over as conscious control weakens. Isn’t that pretty much what the studies show, in the aggregate? Does anybody doubt that really happens? Continue reading

MIsgivings: Some thoughts about “measurement invariance” – David Funder (funderstorms)

As a newcomer to cross-cultural research a few years ago, I soon became aware of the term “measurement invariance,” which typically is given as a necessary condition for using a psychological measurement instrument, such as a personality inventory, in more than one cultural context[1]. At one of the first talks where I presented some then-new data gathered in 20 different countries, using a new instrument developed in our lab (the Riverside Situational Q-sort), a member of the audience asked, “What did you do to assess measurement invariance?”  I had no real answer, and my questioner shook his head sadly. Which, I started to realize, is kind of the generic response when these issues come up. If a researcher gathers data in multiple cultures and doesn’t assess measurement invariance, then the researcher earns scorn – from certain kinds of critics – for ignoring the issue. If the researcher does do the conventional kinds of analyses recommended to assess measurement invariance, the results are often discouraging. The RMSEA’s are out of whack, Delta CFI’s are bigger than .01, and oh my goodness, the item intercepts are not even close to equivalent, so scalar invariance is a total joke, not to mention the forlorn hope of attaining “strict” invariance (which sounds harsh, because it is). A member of a symposium I recently attended exclaimed, “If you can show me some real data where strict measurement invariance was achieved across cultures, I shall buy you a beer!” He had no takers. The following message is approaching the status of conventional wisdom: the lack of equivalence in the properties of psychological measures across cultures means that they cannot be used for cross-cultural comparison, and attempts to do so are not just psychometrically ignorant, they are fatally flawed. As I have become a bit more experienced, however, I have begun to develop some misgivings about this conclusion, and the whole business of “measurement invariance,” which I put in scare quotes because I suspect there is less there than meets the eye. Below, I shall refer to it simply as MI. Continue reading
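For readers unfamiliar with how such verdicts are reached: the conventional procedure fits a series of increasingly constrained multi-group confirmatory factor models (configural, metric, scalar) and compares their fit, typically using rules of thumb such as a drop in CFI of no more than about .01 per added constraint. The sketch below is my own minimal illustration of that decision rule, with made-up fit indices standing in for output from an SEM package; none of the numbers, names, or cutoffs come from the post itself.

```python
# Minimal sketch of the conventional measurement-invariance decision rules.
# The fit indices below are hypothetical; in practice they would come from
# fitting configural, metric, and scalar multi-group CFA models in an SEM package.

HYPOTHETICAL_FITS = {
    "configural": {"cfi": 0.952, "rmsea": 0.047},  # loadings and intercepts free
    "metric":     {"cfi": 0.945, "rmsea": 0.051},  # loadings constrained equal
    "scalar":     {"cfi": 0.921, "rmsea": 0.068},  # loadings and intercepts equal
}

def check_invariance(fits, delta_cfi_cutoff=0.01, rmsea_cutoff=0.08):
    """Apply common rules of thumb: each added constraint should not drop
    CFI by more than ~.01, and RMSEA should stay in an acceptable range."""
    steps = ["configural", "metric", "scalar"]
    verdicts = {}
    for prev, curr in zip(steps, steps[1:]):
        delta_cfi = fits[prev]["cfi"] - fits[curr]["cfi"]
        ok = delta_cfi <= delta_cfi_cutoff and fits[curr]["rmsea"] <= rmsea_cutoff
        verdicts[curr] = (delta_cfi, ok)
    return verdicts

for level, (delta_cfi, ok) in check_invariance(HYPOTHETICAL_FITS).items():
    print(f"{level} invariance: dCFI = {delta_cfi:.3f} -> "
          f"{'supported' if ok else 'not supported'}")
```

With these hypothetical numbers, metric invariance squeaks by and scalar invariance fails, which is exactly the discouraging pattern the post describes as typical.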

8 Words Psychologists Have Almost Ruined – David Funder (funderstorms)

Psychology has almost ruined some perfectly innocent words. In each case, the first step was to take a useful word from the English language and give it a technical meaning that did not exactly or, in some cases, even approximately, match what it meant to begin with.  The second step – and this one is crucial – was to forget that this was done, and act as if the word still had its original meaning. The result: widespread confusion. Examples, starting with the most obvious: Significant (adj.): What it originally meant: sufficiently great or important to be worthy of attention; noteworthy[1]. How psychology uses the word: As used in “significance testing,” this word actually, and merely, means not-random. The most succinct – and accurate – interpretation of the meaning of a “significant finding” that I’ve seen is “there’s not nothing going on.” Why it’s a problem: An undergraduate psychology student, having just taken a stats course, phones home one evening and says, “Mom, something significant happened today!” Mom: “Oh my goodness, Sweetie, what do you mean?” Undergraduate: “I mean, there’s less than a 5% chance that what happened was completely random!!!” Continue reading
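To make the gap between the everyday and technical meanings concrete, here is a small simulation of my own (hypothetical numbers, not anything from the post): with a large enough sample, a difference far too small to be noteworthy in the everyday sense still comes out “significant” in the technical sense.

```python
# Hypothetical illustration: a trivially small effect is "significant" with a huge N.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000                                          # participants per group
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.02, scale=1.0, size=n)    # true effect: d = 0.02 (tiny)

res = stats.ttest_ind(group_a, group_b)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.2g}")
# p comes out well below .05 -- "there's not nothing going on" -- even though a
# mean difference of 0.02 standard deviations is hardly significant in the
# everyday sense of the word.
```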

Replication and Open Science for Undergraduates – David Funder (funderstorms)

(Draft of material for forthcoming The Personality Puzzle, 8th edition. New York: W.W. Norton). [Note: These are two sections of a chapter on Research Methods, and the first section follows a discussion of Null Hypothesis Significance Testing (NHST) and effect size.]

Replication

Beyond the size of a research result, no matter how it is evaluated, lies a second and even more fundamental question:  Is the result dependable, something you could expect to find again and again, or did it merely occur by chance? As was discussed above, null hypothesis significance testing (NHST) is typically used to answer this question, but it is not really up to the job. A much better indication of the stability of results is replication. In other words, do the study again. Statistical significance is all well and good, but there is nothing quite so persuasive as finding the same result repeatedly, with different participants and in different labs (Asendorpf et al., 2013; Funder et al., 2014). Continue reading

Thresholds – David Funder (funderstorms)

Part One

I’ve been suffering an acute bout of cognitive dissonance lately, finding myself disagreeing with people I admire, specifically, several of the authors of this article. (The article has 72 authors and I don’t know all of them!)   The gist of the article can be stated very simply and in the authors’ own words: “We propose to change the default P-value threshold for statistical significance for claims of new discoveries from .05 to .005.”  This proposal is soberly, clearly argued and the article makes some good points, the best of which is that, imperfect as this change would be, at least it’s a step in the right direction.  But I respectfully disagree.  Here’s why. I’m starting to think that p-levels should all be labeled “for entertainment purposes only.”  They give a very very rough idea of the non-randomness of your data, and are kind of interesting to look at. So they’re not completely useless, but they are imprecise at best and almost impossible to interpret at worst*, and so should be treated as only one among many considerations when we decide what we as scientists actually believe.  Other considerations (partial list): prior probabilities (also very rough!), conceptual coherence, consistency with related findings, and (hats off please) replicability. Continue reading
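The “very rough idea” point is easy to see in a quick simulation (mine, not the post’s): exact replications of the same modest true effect, run with the same sample size, produce p-values that scatter all over the conventional thresholds, whether the cutoff is .05 or .005.

```python
# Hypothetical simulation: how much p-values vary across exact replications
# of the same true effect (d = 0.4) with the same sample size (n = 50 per group).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n, n_replications = 0.4, 50, 1000

p_values = np.array([
    stats.ttest_ind(rng.normal(0.0, 1.0, n), rng.normal(true_d, 1.0, n)).pvalue
    for _ in range(n_replications)
])

print(f"p < .05  in {np.mean(p_values < 0.05):.0%} of replications")
print(f"p < .005 in {np.mean(p_values < 0.005):.0%} of replications")
print(f"middle 80% of p-values runs from {np.percentile(p_values, 10):.4f} "
      f"to {np.percentile(p_values, 90):.2f}")
```

Identical studies of an identical effect disagree with each other about “significance” a large share of the time, which is one way of seeing why a p-value, by itself, is a shaky basis for deciding what to believe.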

Why doesn’t personality psychology have a replication crisis? – David Funder (funderstorms)

Because It’s Boring

“[Personality psychology] has reduced the chances of being wrong but palpably increased the fact of being boring. In making that transition, personality psychology became more accurate but less broadly interesting.”  — Roy Baumeister (2016, p. 6) Many fields of research – not just social psychology but also biomedicine, cancer biology, economics, political science, and even physics – are experiencing crises of replicability.  Recent and classic results are challenged by reports that when new investigators try to repeat them, often they simply can’t.  This fact has led to gnashing of teeth and rending of garments, not to mention back-and-forth controversies pitting creativity against rigor (see the article quoted in the epigraph), and spawned memorable phrases such as “replication police” and “shameless little bullies.” But, as the quote above attests, personality psychology seems to be immune.  In particular, I am not aware of any major finding (1) in personality psychology that has experienced the kind of assault on its reliability that has been inflicted upon many findings in social psychology (2).  Why not?  Is it because personality psychology is boring?  Maybe so, and I’ll come back to that point at the end, but first let’s consider some other

Possible Reasons Personality Psychology Does Not Have a Replication Crisis

  1. Personality Psychology Takes Measurement Seriously
The typical study in personality psychology measures some attribute of persons (usually a personality trait) and also measures an outcome such as a behavior, a level of attainment, or an indicator of mental or physical health. Continue reading

What if Gilbert is Right? – David Funder (funderstorms)

I. The Story Until Now (For late arrivals to the party)

Over the decades, since about 1970, social psychologists conducted lots of studies, some of which found cute, counter-intuitive effects that gained great attention. After years of private rumblings that many of these studies – especially some of the cutest ones – couldn’t be replicated, a crisis suddenly broke out into the open (1). Failures to replicate famous and even beloved findings began to publicly appear, become well known, and be thoroughly argued-over, not always in the most civil of terms. The “replicability crisis” became a thing. But how bad was the crisis really? The accumulation of anecdotal stories and one-off failures to replicate was perhaps clarified to some extent by a major project organized by the Center for Open Science (COS), published last November, in which labs around the world tried to replicate 100 studies and, depending on your definition, “replicated” only 36% of them (2). In the face of all this, some optimists argued that social psychology shouldn’t really feel so bad, because failed replicators might simply be incompetent, if not actually motivated to fail, and the typical cute, counter-intuitive effect is a delicate flower that can only bloom under the most ideal climate and careful cultivation. Optimists of a different variety (including myself) also pointed out that psychology shouldn’t feel so bad, but for a different reason: problems of replicability are far from unique to our field. Failures to reproduce key findings have come to be seen as serious problems within biology, biochemistry, cardiac medicine, and even – and disturbingly – cancer research. It was widely reported that the massive biotech company Amgen was unable to replicate 47 out of 53 seemingly promising cancer biology studies. If we have a problem, we are far from alone.

II. And Then Came Last Friday’s News (3)

Prominent psychology professors Daniel Gilbert and Tim Wilson published an article that “overturned” (4) the epic COS study. Continue reading

Bargain Basement Bayes – David Funder (funderstorms)

One of the more salutary consequences of the “replication crisis” has been a flurry of articles and blog posts re-examining basic statistical issues such as the relations between N and statistical power, the importance of effect size, the interpretation of confidence intervals, and the meaning of probability levels. A lot of the discussion of what is now often called the “new statistics” really amounts to a re-teaching (or first teaching?) of things anybody, certainly anybody with an advanced degree in psychology, should have learned in graduate school if not as an undergraduate. It should not be news, for example, that bigger N’s give you a bigger chance of getting reliable results, including being more likely to find effects that are real and not being fooled into thinking you have found effects when they aren’t real. Nor should anybody who had a decent undergrad stats teacher be surprised to learn that p-levels, effect sizes, and N’s are functions of each other, such that if you know any two of them you can compute the third, and that therefore statements like “I don’t care about effect size” are absurd when said by anybody who uses p-levels and N’s. But that’s not my topic for today. My topic today is Bayes’ theorem, which is an important alternative to the usual statistical methods, but which is rarely taught at the undergraduate or even graduate level.(1)  I am far from expert about Bayesian statistics. This fact gives me an important advantage: I won’t get bogged down in technical details; in fact that would be impossible, because I don’t really understand them. A problem with discussions of Bayes’ theorem that I often see in blogs and articles is that they have a way of being both technical and dogmatic. A lot of ink – virtual and real – has been spilled about the exact right way to compute Bayes Factors and about whether all statistical analyses should be conducted within a Bayesian framework. I don’t think the technical and dogmatic aspects of these articles are helpful – in fact I think they are mostly harmful – when it comes to helping non-experts appreciate what thinking in a semi-Bayesian way has to offer. So, herewith is my extremely non-technical and very possibly wrong (2) appreciation of what I call Bargain Basement Bayes. Continue reading
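Two claims in this opening lend themselves to tiny worked examples, so here is a sketch of my own (illustrative numbers and function names, not anything from the post): first, that for a given test, p-levels, effect sizes, and N’s are locked together, so knowing two of them pins down the third; second, a “bargain basement” application of Bayes’ theorem showing how much the believability of a significant result depends on its prior probability and on statistical power.

```python
# Hypothetical illustrations of (1) the p / effect size / N interrelation for a
# two-sample t-test with equal group sizes, and (2) a back-of-the-envelope Bayes
# calculation for interpreting a single "significant" result.
import numpy as np
from scipy import stats

def p_from_d_and_n(d, n_per_group):
    """Two-tailed p implied by an observed effect size d and group size n."""
    t = d * np.sqrt(n_per_group / 2)
    df = 2 * n_per_group - 2
    return 2 * stats.t.sf(abs(t), df)

def d_from_p_and_n(p, n_per_group):
    """Effect size implied by a two-tailed p and group size n."""
    df = 2 * n_per_group - 2
    t = stats.t.ppf(1 - p / 2, df)
    return t / np.sqrt(n_per_group / 2)

print(p_from_d_and_n(d=0.4, n_per_group=50))    # ~0.048: "significant," barely
print(d_from_p_and_n(p=0.05, n_per_group=50))   # ~0.40: the d that p = .05 implies

def posterior_given_significance(prior, power, alpha=0.05):
    """Bayes' theorem: probability the hypothesis is true given one significant
    result, from its prior probability, the study's power, and the alpha level."""
    return (prior * power) / (prior * power + (1 - prior) * alpha)

print(posterior_given_significance(prior=0.10, power=0.50))   # ~0.53, not ~0.95
```

The last line is the bargain-basement moral: a long-shot hypothesis tested with modest power is still roughly a coin flip even after a significant result, which is the kind of consideration a semi-Bayesian way of thinking puts front and center.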

Towards a De-biased Social Psychology: The effects of ideological perspective go beyond politics. – David Funder (funderstorms)

Behavioral and Brain Sciences, in press; subject to final editing before publication. This is a commentary on: Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (in press). Political diversity will improve social psychological science. Behavioral and Brain Sciences. To access the target article, click here. Continue reading

How to Flunk Uber: A Guest Post by Bob Hogan – David Funder (funderstorms)

How to Flunk Uber by Robert Hogan Hogan Assessment Systems Delia Ephron, a best-selling American author, screenwriter, and playwright, published an essay in the New York Times on August 31st, 2014 entitled “Ouch, My Personality, Reviewed”  that is a superb example of what Freud called “the psychopathology of everyday life.”  She starts the essay by noting that she recently used Uber, the car service for metrosexuals, and the driver told her that if she received one more bad review, “…no driver will pick you up.”  She reports that this feedback triggered some “obsessive” soul searching:  she wondered how she could have created such a bad score as an Uber passenger when she had only used the service 6 times.  She then reviewed her trips, noting that, although she had often behaved badly (“I do get short tempered when I am anxious”), in each case extenuating circumstances caused her behavior.  She even got a bad review after a trip during which she said very little:  “Perhaps I simply am not a nice person and an Uber driver sensed it.” The essay is interesting because it is prototypical of people who can’t learn from experience.  For example, when Ms. Ephron reviewed the situations in which she mistreated Uber drivers, she spun each incident to show that her behavior should be understood in terms of the circumstances—the driver’s poor performance—and not in terms of her personality.  Perhaps situational explanations are the last refuge of both neurotics and social psychologists? In addition, although the situations changed, she behaved the same way in each of them:  she complained, she nagged and micro-managed the drivers, she lost her temper, and she broadcast her unhappiness to the world. Continue reading