Episode: 2695 Title: HPR2695: Problems with Studies Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2695/hpr2695.mp3 Transcribed: 2025-10-19 07:37:16

---

This is HPR Episode 2695, entitled Problems with Studies, and is part of the series Health and Health Care. It is hosted by Ahuka, is about 13 minutes long, and carries a clean flag. The summary is: some principles for evaluating medical studies. This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair, at AnHonestHost.com.

Hello, this is Ahuka, welcoming you to Hacker Public Radio and another exciting episode in our series about health and taking care of yourself. We have now moved into evaluating studies, and there are problems that need to be addressed when we take a look at them; we don't want to misconstrue what's going on. As we said last time around, medical science has made great strides, and when we look at problems, we should not lose sight of all of the significant accomplishments that medical science has made. I for one have no wish to return to the days when the leading cause of death for women was childbirth and a man was considered old if he reached the age of 40. However, it is worth taking the time to develop a better idea of what constitutes quality in research and how results should be interpreted.

So what are some of the problems we run into? Well, the first one is that generally no one cares about negative results. Now, they should care; it is also important to know these things. Let's modify our previous example, the one about children eating breakfast, and consider whether it is a hot breakfast that improves student performance and raises test scores. Our control group would then be students who only got a cold breakfast like, say, juice and cereal with milk. We do a study, and it turns out there is no statistically significant evidence that hot breakfasts produce better performance than cold breakfasts. With this in mind, we could then target our assistance to children a little better and maybe save some money. But we run into a problem of bias in publishing, which is that journals tend to prefer to publish studies that have positive results. This is very pronounced in studies of the efficacy of drugs. Pharmaceutical companies in the United States have to do careful studies for any new drug to establish (a) the safety of the drug and (b) the efficacy of the drug, and I think there are similar rules in other countries. So what happens if a company does extensive testing and cannot find proof that the drug does anything? The study goes onto a dusty shelf somewhere and is never heard from again. Now, this matters for a couple of reasons. The first is that data is data, and even negative results have utility. But even more important is that sometimes studies that by themselves do not show positive results can be combined with other studies, and a positive result can come out. This is because combining studies, something called meta-analysis, gives you a larger sample size, which can improve the power of the combined studies.
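To make that last point a bit more concrete, here is a minimal Python sketch. It is my own illustration, not something from the episode or its show notes, and the numbers in it (a modest effect size of 0.4, 25 students per group, the use of numpy and scipy) are arbitrary choices for the sake of the demonstration. It simulates two small breakfast studies of the same real effect and then pools the raw data, which is the crudest stand-in for a meta-analysis.

```python
# Toy Monte Carlo (my own illustration, not from the episode): two
# underpowered studies of the same modest, real effect versus a pooled
# analysis of both data sets.  Pooling is the simplest stand-in for
# meta-analysis: a bigger sample means more power to detect the effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
effect, n, alpha, runs = 0.4, 25, 0.05, 2000   # illustrative numbers only

hits = {"study 1": 0, "study 2": 0, "pooled ": 0}
for _ in range(runs):
    c1, t1 = rng.normal(0, 1, n), rng.normal(effect, 1, n)
    c2, t2 = rng.normal(0, 1, n), rng.normal(effect, 1, n)
    hits["study 1"] += stats.ttest_ind(t1, c1).pvalue < alpha
    hits["study 2"] += stats.ttest_ind(t2, c2).pvalue < alpha
    hits["pooled "] += stats.ttest_ind(np.concatenate([t1, t2]),
                                       np.concatenate([c1, c2])).pvalue < alpha

for label, count in hits.items():
    print(f"{label}: detects the effect in {count / runs:.0%} of runs")
```

On most runs, the pooled test detects the effect noticeably more often than either small study does on its own, which is exactly the benefit of the larger combined sample.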
Okay, the next problem, and this is serious, is that no one cares about replication. Well, there are a few people who do, but remember, in the last episode we said that about one out of 20 studies will reach the wrong conclusion even if no one did anything wrong. It's just, you know, randomness. Now, the best defense against that is to have other scientists replicate the study and see if they get the same results. And this is a huge problem, because quality journals generally are not interested in publishing replication studies, and the tenure committees at universities therefore do not consider replication studies as valuable research. Everyone wants something original. But many original studies have problems that could be addressed through replication studies, and when it has been studied, we have found that a large number of these medical studies cannot be replicated. This is particularly acute in social psychology, and has become known as the replication crisis. For an example, there was a 2018 paper in Nature Human Behaviour which looked at 21 studies and found that only 13 could be successfully replicated. That is slightly over half, which is not an outstanding record.

Now, if you're at all interested in this, there is a name you will want to follow: Professor John Ioannidis, that's I-O-A-N-N-I-D-I-S, a professor at Stanford University who has been a leading voice on this replication problem. He was an advisor to the Reproducibility Project: Cancer Biology. This project looked at five influential studies in cancer research, influential meaning that other people were basing their research on them, and found that two were confirmed, two were uncertain, and one was out-and-out disproved. Now, we want to be careful here. In the vast majority of cases this does not derive from fraud, although there are examples of fraud that we'll mention. It can derive from small differences in the research approach, and if we understand what those differences are, that might bring out areas for further research and can point you in the direction of important factors. And if an influential study is just out-and-out wrong, it is going to lead to wasted time and resources for everyone building on it. Pharmaceutical companies have found, for instance, that they cannot replicate the basic research underlying some of their proposed drugs, which causes waste and diverts resources that could be used more productively.

Now, there is a lot of controversy over replication, and part of it is that there is sometimes an inference that if a scientist's results cannot be replicated, the scientist was at worst fraudulent or at best sloppy and lacking in care. There are examples of fraud. Andrew Wakefield published a study claiming to find a relationship between vaccination and autism that was just totally bogus, although people are still citing it. But his license to practice medicine was taken away, the paper was retracted, and he is now making a living talking to fringe groups about vaccination; no one in medical science is going to take him at all seriously. Another one, Hwang Woo-suk, was also guilty of fraud, in stem cell research; it turns out he had basically fabricated some results. Now, sometimes a respected researcher who did not do anything wrong can somehow get tainted. An example of that is Yoshiki Sasai, who ended up committing suicide because someone else in his lab made some mistakes and he got tagged with all of that. So you can understand it is a bit of a controversial area, but the basic point is that you cannot rely on any one study for anything. There is a replication problem here.
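Before moving on, here is another small Python sketch, again my own illustration rather than anything from the episode, with arbitrary illustrative numbers (50 subjects per group, a true effect size of 0.5 in the "real effect" case). It shows why an independent replication is such a good filter: a finding that was nothing but the one-in-twenty fluke replicates only about 5% of the time, while a real effect of reasonable size replicates far more often.

```python
# Toy sketch (my own illustration, not data from the episode): why an
# independent replication weeds out the roughly one-in-twenty false
# positives.  We simulate "findings" that are pure noise and findings
# that reflect a real effect, then ask how often a second, independent
# study confirms each kind.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, runs = 50, 0.05, 4000   # illustrative numbers only

def significant(effect):
    """Run one two-group study and report whether it hits p < alpha."""
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)
    return stats.ttest_ind(treated, control).pvalue < alpha

for label, effect in [("no real effect", 0.0), ("real effect (d = 0.5)", 0.5)]:
    first_hits = replicated = 0
    for _ in range(runs):
        if significant(effect):                 # original study got a positive result
            first_hits += 1
            replicated += significant(effect)   # independent replication attempt
    print(f"{label}: {replicated}/{first_hits} 'positive' findings replicate")
```

The exact percentages depend on the sample size and effect size chosen, but the gap between noise and a genuine signal is the point.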
Now, this could be dealt with. We could start allocating a percentage of the research budget to replication studies, and we could start rewarding people who do good work in this area, but it is going to take some work to change that.

Now, another problem is something referred to as p-hacking. Well, there's an interesting buzzword, isn't it? As we discussed in our previous episode, the statistical significance of a result is based on the p-value, which is the probability of getting a result at least that strong from random chance alone, when there is no real relationship. And this is important because people are not, in general, good at dealing with probability and statistics, and a formal test can provide a valuable check on confirmation bias. Confirmation bias is when you see what you want to see. If you want to know confirmation bias, think of how many times you look in the mirror and say, oh, I'm not that much overweight. Now, we've noted how this should happen: a good researcher should come up with an hypothesis, then collect the data, then do a test to see if there is a statistically significant result. That is the ideal. But as we saw above, if you find nothing, you have a problem. Studies that fail to find anything do not generally get published, and researchers who do not get published do not get tenure and will probably have trouble getting grants to do any more research. So there is tremendous pressure to get a result that is significant and can be published; your whole career can rest on it.

Now, one result of that pressure is a practice known as p-hacking, which in general means looking for ways to get a significant result that is publishable, by any means necessary. I've got a few links in the show notes if you want to follow up on this one, including a nice video that gets into it. Colloquially, p-hacking has been referred to as torturing the data until it confesses. One variation is to skip the initial hypothesis and just go looking for any relationship you can find in the data. You might think that no harm is done, since if the researcher finds a relationship, that's a good thing, but remember, the idea is to find true relationships. If you grab enough variables and run them against each other, you will definitely find correlations, but they will generally not be anything other than coincidence (the sketch below makes this concrete). The whole point of stating an hypothesis in advance of collecting the data is that your hypothesis should be grounded in a legitimate theory of what is going on. For example, someone once did a study showing that you could predict the winner of the US presidential election by which team won the previous Super Bowl. There is no way that this correlation means anything other than random chance helped by a small sample size. Another approach is to drop some parameters of the original study to get a better p-value, or you might find an excuse to eliminate some of the data until you get a result. But if the p-value is the gate you have to get through to get published and have a career, some people will look for a reason to go through that gate.
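To see how "torturing the data" works out in numbers, here is one more Python sketch. Again, this is my own toy illustration with arbitrary choices (20 variables, 30 observations each, scipy's Pearson correlation test), not anything from the episode or its show notes. Every variable is pure random noise, yet testing every pair at the usual p < 0.05 threshold still turns up a handful of "significant" correlations, roughly the 5% of 190 comparisons you would expect by chance alone.

```python
# Toy sketch (my own illustration, not from the episode): take 20
# completely random, unrelated variables, test every pair for
# correlation, and count how many come out "statistically significant"
# at p < 0.05.  With 20*19/2 = 190 comparisons you expect roughly
# 190 * 0.05, i.e. nine or ten, spurious hits that mean nothing at all.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_vars, n_obs, alpha = 20, 30, 0.05
data = rng.normal(size=(n_vars, n_obs))   # pure noise, no real relationships

spurious = []
for i, j in itertools.combinations(range(n_vars), 2):
    r, p = stats.pearsonr(data[i], data[j])
    if p < alpha:
        spurious.append((i, j, r, p))

print(f"{len(spurious)} 'significant' correlations found among "
      f"{n_vars * (n_vars - 1) // 2} tests of pure noise")
for i, j, r, p in spurious[:5]:
    print(f"  variable {i} vs variable {j}: r = {r:+.2f}, p = {p:.3f}")
```

Any one of those spurious hits could be dressed up as a publishable finding, which is precisely why a hypothesis stated in advance, and some correction for multiple comparisons, matter so much.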
So the point of this analysis is that you have to be careful about uncritically accepting any result you come across. Reports on television news or in newspapers deserve skepticism. I never accept any one study as definitive; you need to see repeated studies that validate the result before you can start to accept it.

Now, of course, most civilians are going to have trouble doing all of that, which sets us up for the next article, where we look at how healthcare providers can help with this. And above all else, if you want one rule to live by: if Gwyneth Paltrow suggests something, do the opposite. She may be the first recorded case of an IQ measured in negative numbers. And with that, this is Ahuka for Hacker Public Radio, signing off, and as always, reminding you to support free software. Bye-bye.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the Binary Revolution at Binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.