Hidden Brain XX
[0] So here's the deal.
[1] Researchers recently tried to replicate 100 experiments in psychology that were published.
[2] The Center for Open Science recruited colleagues from around the world to try and replicate a hundred studies.
[3] And found that most of them could not be reproduced with the same results.
[4] In fact, depending on how you count, fewer than half of the original findings held up.
[5] Welcome to Hidden Brain.
[6] I'm Shankar Vedantam.
[7] Today we're going to talk about what has been called a replication crisis in science.
[8] The replicators in this recent study failed to get the same findings as the original experiments.
[9] From cancer medicine to psychology, researchers are finding that many claims made in scientific studies fail to hold up when those studies are repeated by an independent group.
[10] Later in this episode, we're going to explore one provocative study that looked at stereotypes about Asians, women, and math tests, and explain what happened when researchers tried to reproduce the finding.
[11] We're going to use this story to explore a deeper question.
[12] What do scientists really mean when they talk about the truth?
[13] Let's take a moment to thank and share a message from our sponsor, LearnVest.
[14] LearnVest is an online financial advice company focused on empowering people nationwide to make good decisions with their money.
[15] Studies show that writing down your goals makes you 49% more likely to achieve them.
[16] That's why when you work with LearnVest, you tell them what you want to accomplish, and they create a customized financial plan to help you get there.
[17] Plus, they pair you with a financial planner to help keep you on track.
[19] To see a sample plan and get a $50 credit, go to learnvest.com/brain.
[20] The crisis has actually been a long time coming.
[21] In 2011, for example, Dutch researchers claimed that broken sidewalks encourage racism.
[22] They published their findings in one of the most prestigious academic journals, Science magazine.
[23] A couple of years later, another article in Science showed that when a gay person shows up at a stranger's door and speaks openly about what it's like to be gay, this has an extraordinary effect.
[24] So it was the personal connection between the gay person and the person they were trying to persuade, you know, there in person.
[25] People who were against gay marriage changed their minds after these emotional encounters.
[26] It was the combination of, you know, contact with a minority coupled with a discussion of issues pertinent.
[27] The results were written up in the New York Times, the Wall Street Journal, the Washington Post.
[28] They were featured on public radio programs, such as the clip you're hearing from Science Friday.
[30] Finally, in 2009, researchers claimed that bilingualism, the ability to speak more than one language, is better for your brain.
[31] All these claims had serious problems.
[32] The Dutch claim was based on fabricated data.
[33] One author of the gay marriage claim asked for the paper to be withdrawn after concerns were raised about fraud.
[34] Both claims were retracted.
[35] Might be true.
[36] Might not.
[37] We don't know.
[38] Because it turns out the researchers made up the data.
[39] The bilingual advantage paper wasn't fabricated, but it was missing important context.
[40] The researchers had conducted four experiments.
[41] Three failed to show that bilingualism was better for the brain.
[42] Only one experiment showed a benefit.
[43] It was the only one that was published.
[44] Angela de Bruin was on the team that worked on the bilingual advantage study.
[45] It is troubling because we like to believe that what we see is actually the truth.
[46] But if it's only half of the results we find, and we're in effect hiding the other half of the results, then we'll never really find out what's going on.
[47] At the University of Virginia, psychologist Brian Nosek decided something had to be done.
[48] Brian felt the problem was that too many researchers and too many scientific journals were focusing on publishing new and unusual findings.
[49] Too few were spending time cross -checking earlier work to make sure it was solid.
[50] One of the key factors of science is that a claim becomes a credible claim by being reproducible, that someone else can take the same approach, the same protocol, the same procedure, do it again themselves, and obtain a similar result.
[51] Brian launched an effort to reproduce dozens of studies in psychology.
[52] He published a report in 2015.
[53] We found that we were able to reproduce the original results in less than half of the cases across five different criteria of evaluating whether a replication was successful or not.
[54] Over the last year, there have been many debates about what this means.
[55] Some critics say it proves that most studies are worthless.
[56] At many universities, researchers feel it's their integrity, not just their scientific conclusions, that are being called into question.
[57] At Harvard University, psychologist Dan Gilbert recently published a paper calling Brian's conclusions into question.
[58] So I think we just have to use our heads to figure out which kinds of things we expect to replicate directly, and which kinds of things we would only expect a conceptual replication of.
[59] And we need to calm down when we don't see direct replication and ask whether we really should have expected it at all.
[60] To unpack all of this, let's take a detailed look at one study and what happened when researchers tried to replicate it.
[61] I think the story reveals many truths about the ongoing controversy.
[62] When they were graduate students at Harvard, Todd Pittinsky and his friend Margaret Shih often went to restaurants together.
[63] They went for the food, but they also spent a lot of time observing human behavior.
[64] We often, after class, would go to the Cheesecake Factory, and she would order a strawberry shortcake and I would typically order a salad, and the number of times that the salad was delivered to her and the strawberry shortcake was delivered to me! She also likes regular Coke, and I'm a diabetic, so I drink Diet Coke, and without fail, the Diet Coke would go to her and the regular Coke would go to me. The waiters were stereotyping Todd and Margaret.
[65] The guy was probably ordering the less healthy stuff.
[66] The woman was ordering salads and diet drinks.
[67] Todd and Margaret knew there was lots of research into the effects of such stereotypes.
[68] Now, getting a dish you haven't ordered is one thing, but there are more serious consequences.
[69] Stereotypes can be hurtful.
[70] They can affect performance.
[71] But as Todd and Margaret observed the waiters, they realized something was missing in the research.
[73] The previous studies had focused on the negative consequences of stereotypes.
[74] Could stereotypes also work in a positive fashion?
[75] We thought if we really want to understand how stereotypes operate in the world, we can't simply look at half of it.
[76] The young researchers brainstormed how they might study the other half of the equation.
[77] The answer came to them as they were, yeah, eating together.
[78] We were sitting in Harvard Square over ice cream, and we said what we need is a group where the stereotypes go in very different directions.
[79] They wanted to study a situation where stereotypes could have both positive and negative effects.
[80] And Margaret Shih happens to be an Asian American and a woman, and we started talking about math identities, and we kept going back and forth and back and forth, and then literally at the same moment, we said, well, why don't we study Asian women in math?
[81] The experiment they designed was ingenious and simple.
[82] There are negative stereotypes about women doing math, and positive stereotypes about Asians doing math.
[84] So what happens when you give a math test to women who are Asian?
[85] We hypothesize that when you make different identities salient, you should expect different stereotypes to be applied.
[86] Todd and Margaret figured that if they reminded Asian women about their gender, they would see the negative stereotype at work.
[87] But what would happen if they subtly reminded the volunteers about their Asian identity?
[88] The researchers recruited Asian women as volunteers and asked some of them to identify their gender on a form before taking a math test.
[90] Earlier research had shown that when you make gender salient in this way, this triggers the negative stereotype about women in math.
[91] Todd and Margaret reminded other volunteers, selected at random, about their Asian heritage.
[92] They wanted to make these volunteers remember the stereotype about Asians being good at math.
[93] After all the volunteers finished the tests, the researchers analyzed their performance.
[94] Todd was working down the hall from Margaret one day when he heard her call out to him.
[95] She just shouted, holy cow, it worked.
[96] So I just sort of ran down there and we started looking at the output together.
[97] The study found that when the volunteers were reminded that they were women, they did worse on the math test.
[98] When they were reminded that they were Asian, they did better.
[99] Same women, same math test.
[100] Negative stereotype, negative result.
[101] Positive stereotype, positive result.
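To make the shape of that analysis concrete, here is a minimal sketch in Python of the kind of two-group comparison such a priming study involves. The scores, group sizes, and the use of Welch's t-test are illustrative assumptions, not the authors' actual data or code.

```python
# Minimal sketch of a two-group comparison like the one described above.
# The numbers below are made up for illustration; they are NOT the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical math scores for volunteers randomly assigned to one of two
# priming conditions (Asian identity salient vs. female identity salient).
asian_identity_salient = rng.normal(loc=0.54, scale=0.15, size=16)
female_identity_salient = rng.normal(loc=0.43, scale=0.15, size=16)

# Welch's t-test: compares the two group means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(
    asian_identity_salient, female_identity_salient, equal_var=False
)

print(f"Asian-salient mean:  {asian_identity_salient.mean():.3f}")
print(f"Female-salient mean: {female_identity_salient.mean():.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A design like this stands or falls on that single between-group difference, which is why the replications described later focus on exactly that comparison.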
[102] The study was an instant sensation.
[103] Psychologist Brian Nosek.
[104] This is one of my favorite effects in psychological science.
[105] Something that seems like it shouldn't be flexible, how well we perform in math, is flexible as a function of the identities that we have in mind and stereotypes associated with those identities.
[106] Asians being good at math, women being not as good at math.
[107] The study quickly became a staple of college textbooks, says psychologist Carolyn Gibson.
[108] It is a pretty amazing finding.
[109] I heard about that study for the first time as an undergrad.
[110] It's been used as an example in social psychology courses for years since it was published in 1999.
[111] And it's used as a good example of stereotype threat and stereotype boost.
[112] But from a scientific perspective, there was one big problem.
[113] It had never been replicated exactly.
[114] Nobody had ever followed the exact steps they followed and replicated their results, but it's been used to support further studies many times over the past 15 years.
[115] Brian Nosek agreed: someone needed to replicate the original study.
[116] He was spearheading a mammoth effort to reproduce dozens of studies in psychology.
[117] Along with a panel of reviewers, he selected this study for replication and asked Carolyn Gibson at Georgia Southern University to conduct it.
[118] Brian wanted the replication to closely match the conditions of the original study.
[119] If you don't do that, you really are conducting two different studies.
[120] After launching the replication, he had second thoughts about its location in the south.
[121] And the reviewers thought, this looks like a case where the location might matter.
[122] Asians in the southern U.S. might be a more distinct minority than Asians in the Northeast or in the West.
[123] And so we recruited a second team to do a replication simultaneously at UC Berkeley, a West Coast university where Asians are much more prominent members of the community.
[125] The team in Berkeley was headed by Alice Moon.
[126] Alice was a fan of the original paper.
[127] When I heard about it, I just thought it was like one of the very cool demonstrations in social psychology, and so that's why I always liked this paper.
[128] She followed the protocol of the original Harvard experiment.
[129] She recruited Asian women, reminded them of either the female side of their identity or the Asian side of their identity, and then gave them a math test.
[130] So what happened?
[131] When we compared, just as the original paper did, the participants who were in the Asian identity salient condition with the participants in the female identity salient condition, we found that there was no difference in their math performance.
[132] The celebrated study had failed to replicate.
[133] When Brian Nosek announced the finding about this and dozens of other studies that could not be replicated, it caused an uproar.
[134] Newspaper articles called it a crisis.
[135] Critics hurled accusations of fraud and scientific misconduct.
[136] In a 6,000-word cover story, the conservative magazine Weekly Standard said that liberals had been making up research into how stereotypes affect women and people of color.
[137] The Berkeley study, however, was not the only replication of Todd Pittinsky and Margaret Shih's paper.
[139] Remember how Brian had two groups conduct replications?
[140] I asked Carolyn Gibson at Georgia Southern University what she found when she ran the experiment on Asian women and math.
[141] When primed with their Asian identity, Asian females did better on a math test compared to those who had been primed with their female identity, and those primed with their female identity did significantly worse.
[142] Carolyn has no doubt about the meaning of what she found.
[143] I believe that it further supports the original finding and that it gives even more robust evidence to this idea, mostly because we followed the same method as the original study and because we collected more participants, and so we have a more powerful study.
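Gibson's point about collecting more participants is a statement about statistical power. Here is a hedged sketch, in Python, of how power grows with sample size; the effect size and the per-group counts are hypothetical numbers chosen only to illustrate the idea, not figures from either study.

```python
# Sketch: how statistical power changes with sample size, for a fixed true effect.
# The effect size and group sizes below are hypothetical illustrations.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.5  # an assumed "medium" standardized mean difference (Cohen's d)

for n_per_group in (20, 40, 80, 160):
    power = analysis.power(effect_size=effect_size, nobs1=n_per_group, alpha=0.05)
    print(f"n = {n_per_group:3d} per group -> power ~ {power:.2f}")
```

With more participants, the same true effect is more likely to show up as statistically significant, which is what "a more powerful study" means here.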
[144] At Berkeley, Alice isn't sure.
[145] I do believe that stereotypes in general do have effects on our lives, but in terms of this particular finding about whether stereotypes can facilitate people's academic performance, I guess it has made me question whether or not that finding is true.
[146] Okay, so which is it?
[147] Should we trust the results of the Berkeley study and say that Todd Pittinsky and Margaret Shih's finding was disproved?
[148] Or should we trust the Georgia study and say the finding was confirmed?
[149] What happens when scientific studies disagree with one another?
[150] The popular narrative of the replication crisis suggests that scientists are like dueling gladiators.
[151] If two scientists come up with different findings, it must mean one of them is wrong, or worse, one of them must have faked her data.
[152] When we come back, we'll take a look at why this idea misunderstands how science is supposed to work.
[153] Our statistical techniques are probabilistic and not definitive, and so we absolutely need replications.
[154] But replications in our current academic climate are also serving the purpose of trying to detect academic fraud and are serving as a detection technique.
[155] And those two are very different missions for replications.
[156] Stay with us.
[157] Support for NPR comes from Eli Lilly and Company.
[158] For 140 years, Lilly has united caring with discovery to make life better for people around the world.
[159] Today, they're working to discover life-changing medicines in the areas of diabetes, cancer, autoimmune diseases, and Alzheimer's disease, among others.
[160] Learn how the people of Lilly turn inspiration into action at lilyforbetter.com.
[161] Support also comes from the Amazon original series The Man in the High Castle, which imagines a world where the Allies lost World War II and America is ruled by Nazi Germany and imperialist Japan.
[162] But revelations in secret prophetic films prove our future belongs to those who change it.
[164] Based on the award-winning book by Philip K. Dick, executive produced by Ridley Scott and winner of two Emmy Awards.
[165] Stream the new season now on Amazon Prime Video.
[166] This is Hidden Brain.
[167] I'm Shankar Vedantam.
[168] We're taking a look today at how science works and the so -called replication crisis in the social sciences.
[169] As I listened to the news reports about the controversy, I found myself drawing an analogy with my own profession: journalism.
[171] Here's what I mean.
[172] A few years ago, a reporter for the New York Times was caught fabricating stories.
[173] Instead of traveling to various locations and interviewing people, he simply made stuff up.
[174] The newspaper went back and re-reported the stories Jayson Blair had written.
[175] When the facts didn't match, the reporter was fired.
[176] Imagine for a second what would happen if we re -reported every story by every reporter at the New York Times.
[177] Even when reporters are doing a perfectly good job, the old and new stories might not match.
[178] A source might not see exactly the same thing again.
[179] Sometimes, if the circumstances have changed, a source might say something completely different.
[180] So when two reporters don't produce the same story, it could be that one of them is making stuff up.
[181] But much more likely is that both of them are right.
[182] Now, I know what you're thinking.
[183] Journalism is storytelling.
[184] Science is about data.
[185] But let's look closely at what happened in the replications that Carolyn Gibson and Alice Moon did of Todd Pittinsky's study.
[186] In the original study, women administered the experiment.
[187] In Georgia, the facilitator was also female.
[188] But in Berkeley, where the replication failed, both male and female facilitators administered the study.
[189] Could that have made a difference?
[190] Let's be clear.
[191] So it was not an exact replication.
[192] So here's an example.
[193] It mentions clearly in the paper, and I don't know whether this factor is important or not, that in one study the experimenters were male, and in another study the experimenters were female.
[194] I have no idea whether that's a factor that could explain the difference between the two studies.
[195] And so let's be clear about what it is: it's not an exact replication.
[196] This is Eric Bradlow from the University of Pennsylvania.
[197] He's eminently qualified to talk about this stuff.
[198] I spent four years here at Wharton studying statistics and mathematics, and went on to get my Ph.D. in statistics.
[199] And for the last 20 years, I've been applying statistical methods to lots of problems, but I consider myself a mathematical social scientist.
[200] Eric believes that requiring studies to achieve statistical replication, to match more or less perfectly, before you conclude that either is true, is like requiring two reporters to cover a basketball game and come back with nearly identical stories.
[201] Exact replication is one of those mythological ivory tower things that doesn't exist.
[202] What we really need to think about is if the study doesn't replicate, why doesn't it replicate?
[203] And even if it doesn't replicate exactly, it may actually reinforce the original finding.
[204] In other words, you may be more certain.
[205] This isn't just true about studies in psychology.
[206] Eric told me that NIH researchers once found that lab mice, given a sedative, took 35 minutes to recover.
[207] When the experiment was repeated, the mice took 16 minutes to recover.
[208] The scientists scratched their heads.
[209] It made no sense.
[210] It took a while to figure out that something that shouldn't have made a difference, did.
[211] In between the two experiments, wood shavings in the animal cages were changed.
[212] It turns out that red cedar and pine shavings speed up the rate at which the sedative is metabolized.
[213] Birch or maple don't.
[214] This is not to say that repeating experiments is useless or pointless.
[215] It's incredibly valuable.
[216] But replications primarily help us understand the nuances around a phenomenon.
[217] They're not very useful as a tool to detect fraud.
[218] Just because you get different results doesn't mean you shouldn't trust them.
[219] How much do they differ? Are they within a margin of error of each other?
[220] Are there other variables that would make it so that study done at University A and the study done at University B wouldn't yield exactly the same thing?
[221] I think that's a better way of looking at it than to say, if you don't get exactly the same results, or even results that are very nearly the same, you can't trust them.
[222] I think that's a superficial level of science.
[223] I think you need to go below that.
[224] So when you yourself look at a study that has not replicated, or you look at what's sometimes called a failed replication, do you not at the back of your mind say, well, this disproves the first study?
[225] Do you actually never think that way?
[226] Oh, never's a long word.
[227] Never's a strong word.
[228] You know, I have to think of that James Bond movie, when Sean Connery said he'd never do James Bond again, and then 15 years later, he came out with a movie called Never Say Never Again.
[229] No, I would never say that, but I would say the following.
[230] Let's imagine that you do a study and that you find that, you know, people that take an SAT prep course do 15% better on the SAT.
[231] And let's imagine then someone else does a study, and the answer's only 3%.
[232] Now, there's two possibilities.
[233] One is that the first study, for whatever reason, overestimated the effect. That's entirely possible, and therefore 3% is less than 15%.
[234] But note, if you combine those two studies together, your finding might actually be stronger, in the sense that I'm now more sure that SAT prep helps performance on the SAT.
[235] Now, the effect size may shrink from 15% to 11%, but also notice I've possibly now doubled or tripled my sample size, so my uncertainty goes down, and now I may even be more sure that SAT prep helps.
[236] Maybe not to the degree that it helped in the first study, but still I'm more sure that it's actually effective.
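Bradlow's "combine those two studies" point is, in spirit, a fixed-effect meta-analysis: weight each estimate by its precision and pool them. Here is a minimal sketch using the SAT numbers from his example; the standard errors are invented for illustration, since none are given in the interview.

```python
# Sketch of inverse-variance (fixed-effect) pooling of two study estimates.
# Point estimates come from the example in the interview (15% and 3% improvements);
# the standard errors are hypothetical, added only to make the arithmetic concrete.
import math

estimates = [15.0, 3.0]   # estimated SAT improvement from each study, in percent
std_errors = [6.0, 5.0]   # assumed standard errors for each estimate

weights = [1.0 / se**2 for se in std_errors]
pooled_estimate = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"Pooled estimate: {pooled_estimate:.1f}%")   # lands between 3% and 15%
print(f"Pooled standard error: {pooled_se:.1f}%")   # smaller than either study alone
```

The pooled estimate is smaller than the first study's 15%, but its uncertainty is lower than either study's alone, which is exactly the "smaller effect, more certainty" outcome Bradlow describes.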
[237] When you think about different branches of science, though, aren't there branches of science where you can expect the same thing to happen very predictably over and over again? When you look at particle physics, for example, you would expect that if you fire 20,000 protons out of a gun, they're basically going to do the same thing pretty much every time.
[238] Well, you're testing the boundary of my memory of my particle physics class when I took it here at Penn. But my understanding is, of course, and this is what statisticians study, right?
[239] We study the concept of randomness.
[240] And so every science, every discipline, unless you're talking about an equal sign, like E equals MC squared.
[241] E doesn't approximately equal MC squared.
[242] It actually equals.
[243] Most physical laws and things aren't equal signs.
[244] They're approximate signs, and so that means there's randomness to it.
[245] I think if you fired 20,000 protons, you would see that there's a deviation in the way they collide with other particles, and there's randomness.
[246] I think the same thing is true in the social sciences.
[247] You bring in 500 subjects.
[248] You bring them in at University A. You bring them in at University B. There's randomness in people's answers.
[249] Of course, you would hope the overall patterns would be similar, but the fact, this belief that you're going to get exactly the same findings, I'm not sure that's something science should be striving for.
[250] Any individual study is just that.
[251] An individual study.
[252] It isn't the truth.
[253] Every observation is a point.
[254] It's a dot, and we observe dots, and then we observe more dots.
[255] And if those dots replicate, great, then we have more belief.
[256] Science is about the evolution of knowledge, right?
[257] But the process is never ending.
[258] There will always be more things to uncover, more nuances.
[259] We get more certain about what it is we know, and we also get more certain about what are called boundary conditions or moderators.
[260] Like, for example, maybe this effect holds in urban areas versus not.
[261] Maybe it holds in California, not in Alabama.
[262] Maybe it holds for people that hold these stereotypes, and maybe it doesn't hold for people that don't.
[263] To me, that's an advance of science.
[264] We have found what's called a main effect, which is, you know, stereotypes have an effect on outcomes or priming has an effect on outcomes, and then we say, oh, and by the way, it doesn't hold in these conditions.
[265] That's not a failure to replicate.
[266] That's a more nuanced view of the original finding.
[267] At Harvard, Dan Gilbert says you can expect some studies to replicate nearly perfectly every time, but in other cases, the very thing you're studying is changing, so exact replications aren't possible.
[268] There are many findings in psychological science that we would expect to replicate quite exactly years later and on different populations.
[269] Eyeblink conditioning is a very nice example.
[270] If I blow in your eye enough, you're going to start blinking as I purse my lips.
[271] And that's not going to be very different across cultures, across times, across age groups.
[272] Other kinds of findings certainly aren't.
[273] One of my favorite experiments in social psychology shows that when young men from the North or the South of the United States are insulted, they react very differently, because Northerners and Southerners have very different codes of honor.
[274] Now, you can't take that experiment and expect to do it in Italy or expect to do it 25 years from now.
[275] It's an experiment that's of its moment and of its time.
[276] Every researcher I spoke with told me there's lots of agreement within the scientific community.
[277] There are certainly many scientific studies that are poorly designed.
[278] There are researchers who do shoddy work.
[279] There is great pressure at universities and scientific journals to publish striking findings.
[280] But the solution to all these problems, say Eric Bradlow, Brian Nosek, and Dan Gilbert, is more and better science.
[281] Eric Bradlow.
[282] The truth will come out.
[283] More dots will come out.
[284] And if it turns out that what I published just isn't true, not because I did anything fraudulent, but because of sample size or the way I collected the data, then you know what?
[285] Science will eventually figure out that what I'm saying is not true.
[286] Brian Nosek is bemused that his findings about replicability have been taken to mean that the studies that fail to reproduce are worthless.
[287] He started a new system where researchers register protocols for their studies and commit to sticking to them.
[288] Scientific journals commit to publishing the findings of these studies, regardless of whether the results are sexy.
[289] Science is the slow march of accumulating evidence, and it's very easy to want a simple answer.
[290] Is it true or is it false?
[291] But really, replication is just an opportunity to accumulate more evidence to get a more precise estimate of that particular effect.
[292] To most people, the debate over scientific truth is an abstract issue.
[293] Most of us turn to scientists for answers.
[294] Should I drink a glass of red wine in the evening?
[295] Is this drug safe to give to my ailing mother?
[296] Should I give my kid a dollar every time she does something well at school?
[297] In reality, science is more in the question business than the answer business.
[298] There's a reason nearly every scientific paper ends with a call for more research.
[299] Especially when it comes to human behavior, nearly every conclusion you can draw about human beings has tons of exceptions.
[300] Are people selfish?
[301] Yeah, except millions act altruistically every day.
[302] Are humans kind?
[303] Yes, except that few species are capable of greater cruelty.
[304] If you want answers that never change, definitive conclusions and final truths, odds are you don't want to ask a scientist.
[305] This episode of Hidden Brain was produced by Kara McGuirk-Allison, Max Nesterak, and Maggie Penman.
[306] Our staff includes René Clare, Jenny Schmidt, and our supervising producer Tara Boyle.
[307] Our unsung hero this week is Camille Smiley, who truly lives up to her name.
[308] Camille is the executive assistant for our department and one of our favorite people.
[309] Whether you want to bounce around ideas, order office supplies, or just procrastinate next to her desk, Camille is always game to help out.
[310] For more Hidden Brain, you can find us on Facebook and Twitter and listen for my stories on your local public radio station.
[311] And speaking of your local public radio station, this is the season for giving.
[312] If you enjoy this program, please consider supporting your local station and tell them Hidden Brain sent you.
[313] It'll take just a few minutes, and it's tax deductible.
[314] Go to stations.npr.org.
[315] And thanks.
[316] I'm Shankar Vedantam, and this is NPR.
[317] As you look back on the past year, you can listen back too.
[318] New NPR podcasts and old favorites are all waiting for you on the NPR One app.
[319] Dive back in and listen to Embedded, Invisibilia, Code Switch, or even the Hidden Brain Archive.
[320] It's perfect for a long road trip or a break from the holiday parties.
[321] Listen any time on the NPR One app or at npr.org/podcasts.