In the summer of 2013, Özge Siğirci, a young scientist in Turkey, had not yet arrived at Cornell University for her new research stint. But she already had an assignment from her future boss, Brian Wansink: Find something interesting about all-you-can-eat buffets.
As the head of Cornell’s prestigious food psychology research unit, the Food and Brand Lab, Wansink was a social science star. His dozens of studies about why and how we eat received mainstream attention everywhere from O, the Oprah Magazine to the Today show to the New York Times. At the heart of his work was an accessible, inspiring message: Weight loss is possible for anyone willing to make a few small changes to their environment, without need for strict diets or intense exercise.
Brian Wansink in 2013.
Gallo Images / Getty Images
When Siğirci started working with him, she was assigned to analyze a dataset from an experiment that had been carried out at an Italian restaurant. Some customers paid $8 for the buffet, others half price. Afterward, they all filled out a questionnaire about who they were and how they felt about what they’d eaten.
Somewhere in those survey results, the professor was convinced, there had to be a meaningful relationship between the discount and the diners. But he wasn’t satisfied by Siğirci’s initial review of the data.
“I don’t think I’ve ever done an interesting study where the data ‘came out’ the first time I looked at it,” he told her over email.
More than three years later, Wansink would publicly praise Siğirci for being “the grad student who never said ‘no.’” The unpaid visiting scholar from Turkey was dogged, Wansink wrote on his blog in November 2016. Initially given a “failed study” with “null results,” Siğirci analyzed the data over and over until she began “discovering solutions that held up,” he wrote. Her tenacity ultimately turned the buffet experiment into four published studies about pizza eating, all cowritten with Wansink and widely covered in the press.
But that’s not how science is supposed to work. Ideally, statisticians say, researchers should set out to prove a specific hypothesis before a study begins. Wansink, in contrast, was retroactively creating hypotheses to fit data patterns that emerged after an experiment was over.
Wansink couldn’t have known that his blog post would ignite a firestorm of criticism that now threatens the future of his three-decade career. Over the last 14 months, critics the world over have pored over more than 50 of his old studies and compiled “the Wansink Dossier,” a list of errors and inconsistencies that suggests he aggressively manipulated data. Cornell, after initially clearing him of misconduct, has opened an investigation. And he’s had five papers retracted and 14 corrected, the latest just this month.
Now, interviews with a former lab member and a trove of previously undisclosed emails show that, year after year, Wansink and his collaborators at the Cornell Food and Brand Lab have turned shoddy data into headline-friendly eating lessons that they could feed to the masses.
In correspondence between 2008 and 2016, the renowned Cornell scientist and his team discussed and even joked about exhaustively mining datasets for impressive-looking results. They strategized how to publish subpar studies, sometimes targeting journals with low standards. And they often framed their findings in the hopes of stirring up media coverage to, as Wansink once put it, “go virally big time.”
The correspondence shows, for example, how Wansink coached Siğirci to knead the pizza data.
First, he wrote, she should break up the diners into all kinds of groups: “males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to buffet, people who sit far away, and so on...”
Then she should dig for statistical relationships between those groups and the rest of the data: “# pieces of pizza, # trips, fill level of plate, did they get dessert, did they order a drink, and so on...”
“This is really important to try and find as many things here as possible before you come,” Wansink wrote to Siğirci. Doing so would not only help her impress the lab, he said, but “it would be the highest likelihood of you getting something publishable out of your visit.”
He concluded on an encouraging note: “Work hard, squeeze some blood out of this rock, and we’ll see you soon.”
Siğirci was game. “I will try to dig out the data in the way you described.”
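The scale of that fishing expedition is easy to see with a little arithmetic. The sketch below, in Python, simply counts the comparisons implied by the groups and outcomes Wansink listed; treating each subgroup-outcome pairing as one statistical test is an illustrative simplification, not a reconstruction of how the lab actually ran its analyses.

```python
# Illustrative only: count the comparisons implied by Wansink's email.
# The labels are paraphrased from his lists; the lab's real analysis
# may have sliced the data differently.
subgroups = [
    "males", "females", "lunch goers", "dinner goers",
    "eating alone", "groups of 2", "groups of 2+",
    "ordered alcohol", "ordered soft drinks",
    "sat close to buffet", "sat far away",
]
outcomes = [
    "pieces of pizza", "trips to buffet", "fill level of plate",
    "got dessert", "ordered a drink",
]

# One test per subgroup-outcome pairing, before even counting the
# interactions and extra variables hiding in his "and so on..."
n_tests = len(subgroups) * len(outcomes)
print(n_tests)  # 55 comparisons from just these two lists
```

At the conventional 0.05 significance threshold, a handful of those 55 comparisons would be expected to look “significant” even if the dataset were pure noise.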
All four of the pizza papers were eventually retracted or corrected. But the newly uncovered emails — obtained through records requests to New Mexico State University, which employs Wansink’s longtime collaborator Collin Payne — reveal two published studies that were based on shoddy data and have so far received no public scrutiny.
Still, Wansink defends his work.
“I stand by and am immensely proud of the work done here at the Lab,” he told BuzzFeed News by email, in response to a detailed list of allegations made in this story. “The Food and Brand Lab does not use ‘low-quality data’, nor does it seek to publish ‘subpar studies’.”
He pointed out that an independent lab confirmed the basic findings of the pizza papers. “That is, even where there has been unintentional error, the conclusions and impacts of the studies have not changed,” he wrote.
Siğirci and Payne did not respond to requests to comment for this story.
Wansink’s practices are part of a troubling pattern of strategic data-crunching across the entire field of social science. Even so, several independent statisticians and psychology researchers are appalled at the extent of Wansink’s data manipulation.
“I am sorry to say that it is difficult to read these emails and avoid a conclusion of research misconduct,” Brian Nosek, a psychologist at the University of Virginia, told BuzzFeed News. As executive director of the Center for Open Science, Nosek is one of his field’s most outspoken reformers and spearheaded a massive project to try to reproduce prominent discoveries.
Based on the emails, Nosek said, “this is not science, it is storytelling.”
Wansink in the Food and Brand Lab in 2006.
David Duprey / Associated Press
The so-called replication crisis has punctured some of the world’s most famous psychology research, from Amy Cuddy’s work suggesting that “power poses” make people feel more confident, to Diederik Stapel’s fabricated claims that messy environments lead to discrimination. In an influential 2015 report, Nosek’s team attempted to repeat 100 psychology experiments and successfully reproduced fewer than half of the original findings.
One reason for the discrepancy is “p-hacking,” the taboo practice of slicing and dicing a dataset until an impressive-looking pattern emerges. It can take various forms, from tweaking variables to show a desired result, to presenting a finding as if it confirmed the original hypothesis when, in fact, the question was only asked after the data were in.
In psychology research, a result is usually considered statistically significant when a calculation called a p-value comes out at 0.05 or less. But the more comparisons a researcher runs on a dataset, the more likely at least one of them will fall below 0.05 by random chance alone, making a hypothesis look valid when the result is actually a fluke.
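To see why, consider a minimal simulation in Python. Everything in it is an assumption for demonstration purposes: the sample size, the number of variables tested, and the use of simple correlations. It is not a reconstruction of any Food and Brand Lab analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_diners = 100      # hypothetical sample size
n_variables = 400   # arbitrary number of unrelated variables to test

# The outcome is pure noise: none of the variables below has any real effect on it.
pizza_slices = rng.normal(loc=4, scale=1.5, size=n_diners)

false_positives = 0
for _ in range(n_variables):
    predictor = rng.normal(size=n_diners)          # random, unrelated to the outcome
    _, p_value = stats.pearsonr(predictor, pizza_slices)
    if p_value < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_variables} pure-noise variables "
      f"came out 'significant' at p < 0.05")
# Roughly 5 percent of them (about 20 of 400) will clear the bar by chance alone.
```

Run enough of those tests and report only the hits, and a dataset full of noise can be made to look like a string of discoveries.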
Wansink said his lab’s data is “heavily scrutinized,” and that’s “what exploratory research is all about.”
But for years, Wansink’s inbox has been filled with chatter that, according to independent statisticians, is blatant p-hacking.
“Pattern doesn’t look good,” Payne of New Mexico State wrote to Wansink and David Just, another Cornell professor, in April 2009, after what Payne called a “marathon” data-crunching session for an experiment about eating and TV-watching.
“I also ran — i am not kidding — 400 strategic mediation analyses to no avail...” Payne wrote. In other words, testing 400 variables to find one that might explain the relationship between the experiment and the outcomes. “The last thing to try — but I shutter to think of it — is trying to mess around with the mood variables. Ideas...suggestions?”
Two days later, Payne was back with promising news: By focusing on the relationship between two variables in particular, he wrote, “we get exactly what we need.” (The study does not appear to have been published.)
“That’s p-hacking on steroids,” said Kristin Sainani, an associate professor of health research and policy at Stanford University. “They’re running every possible combination of variables, essentially, to see if anything will come up significant.”
In a conversation about another study in August 2015, Wansink mentioned a series of experiments that “were chasing interactions that were hard to find.” He apparently hoped that they would all arrive at the same conclusion, which is “bad science,” said Susan Wei, an assistant professor of biostatistics at the University of Minnesota.
“It does very much seem like this Brian Wansink investigator is a consistent and repeated offender of statistics,” Wei added. “He’s so brazen about it, I can’t tell if he’s just bad at statistical thinking, or he knows that what he’s doing is scientifically unsound but he goes ahead anyway.”
In 2012, Wansink, Payne, and Just published one of their most famous studies, which revealed an easy way of nudging kids into healthy eating choices. By decorating apples with stickers of Elmo from Sesame Street, they claimed, elementary school students could be swayed to pick the fruit over cookies at lunchtime.
But back in September 2008, when Payne was looking over the data soon after it had been collected, he found no strong apples-and-Elmo link — at least not yet.
“I have attached some initial results of the kid study to this message for your report,” Payne wrote to his collaborators. “Do not despair. It looks like stickers on fruit may work (with a bit more wizardry).”
Wansink also acknowledged the paper was weak as he was preparing to submit it to journals. The p-value was 0.06, just above the gold-standard cutoff of 0.05 and therefore not statistically significant. It was a “sticking point,” as he put it in a Jan. 7, 2012, email.
“It seems to me it should be lower,” he wrote, attaching a draft. “Do you want to take a look at it and see what you think. If you can get the data, and it needs some tweeking, it would be good to get that one value below .05.”
Later in 2012, the study appeared in the prestigious JAMA Pediatrics, the 0.06 p-value intact. But in September 2017, it was retracted and replaced with a version that listed a p-value of 0.02. And a month later, it was retracted yet again for an entirely different reason: Wansink admitted that the experiment had not been done on 8- to 11-year-olds, as he’d originally claimed, but on preschoolers.
Scientists are under a lot of pressure to attain the 0.05 p-value, said Wei of the University of Minnesota, even though it’s an arbitrary cutoff. “It’s an unfortunate state of being in the research community, in the publishing world.”
Still, the Food and Brand Lab’s practices appear far more flagrant than those of most other scientists who face the same pressures, Nosek said.
“It’s a cartoon of how someone in the most extreme form might p-hack data,” he said of the emails as a whole. “There was the explicit goal of ‘Let’s just get something out of the data, use the data as a device to find something, anything, that’s interesting.’”
Back in March of last year, shortly after his pizza papers were called into question, Wansink was interviewed by the Chronicle of Higher Education. He told the outlet that before all the hubbub over his studies, he’d never heard of the term “p-hacking” or the replication crisis. “Science is messy in a lot of ways,” he said.
The emails reviewed by BuzzFeed News point to potential problems with two of Wansink’s studies that haven’t received any public criticism.
In July 2009, Wansink wrote to collaborators about a study in progress. Mall shoppers had been asked to read a pamphlet that described one of two kinds of walks — one that focused on listening to music and the other on exercising. At the end, researchers offered them salty and sweet snacks as a thank you, and recorded how much participants served themselves.
“What’s neat about this is that it shows that just thinking about walking makes people eat more,” Wansink wrote. “But we should be able to get much more from this.” He added, “I think it would be good to mine it for significance and a good story.”
Meanwhile, one of his coauthors, former Cornell visiting scholar Carolina Werle, was trying — in vain — to find a link between how much time the participants spent in the mall and other variables she’d tested. “There is no interaction with the experimental conditions,” she wrote, adding that “nothing works in terms of interaction.”
Werle did not respond to a request to comment for this story. The study wound up in the journal Appetite two years later, claiming that just the act of imagining physical activity prompts people to take more snacks.
In 2013, Werle and Wansink were discussing a different study about whether describing a walk as fun, such as by framing it as a scenic stroll rather than a form of exercise, influenced how much the walkers would want to eat afterward.
The scientists emailed back and forth about two facts that the subsequent paper did not disclose: that the “exercise” group was much smaller than the “fun” group, and that there was some missing data.
In another message, Wansink mentioned that “there’s been a lot of data torturing with this cool data set.”