This highlights the alarming ease with which AI transforms internet noise into fabricated medical facts, proving these models still lack fundamental discernment. It is a sobering reminder that digital convenience should never replace rigorous human skepticism in matters of health.
Deep Dive
Chatbots Are Diagnosing Diseases That Don't Exist Based on Blog Posts
Many years ago, a friend of mine suggested an experiment: make up a conspiracy theory that's just plausible enough, post about it online a few times, and then sit back and track how it spreads and how far it spreads. It was a really fun thing to talk about. He had a great idea for a conspiracy theory that I won't share here lest it escape containment, but obviously he ended up not doing it, because conspiracy theories, no matter how small or silly, have a way of being weaponized by the people who are likely to believe them. But recently it appears that some researchers did a very similar thing, according to a new report in Nature.
Let me just put in a caveat here: I wasn't able to find a specific peer-reviewed study about this, so I can't evaluate the evidence presented. I don't know if it's currently going through peer review. I reached out to the Nature reporter but haven't heard back as of this recording. So we just need to trust Nature's report. On the plus side, if it turns out that they just made all this up, it adds a new layer of funny. But I don't think they did, because they would be sued into the afterlife.
Almira Osmanovic Thunström (my apologies to everyone in and around Sweden) is a medical researcher at the University of Gothenburg who focuses on how people use AI to better their health. In March of 2024, she created a Medium account and published two posts about bixonimania, a reddening of the eyelids caused by exposure to the blue light of our screens. This was the first time that disease had ever been mentioned on the internet, because she made it up out of whole cloth. Despite this, within a month, Microsoft Bing's Copilot was declaring that bixonimania is indeed an intriguing and relatively rare condition. And on the same day, Google's Gemini was informing users that bixonimania is a condition caused by excessive exposure to blue light and advising people to visit an ophthalmologist. That was entirely based on just two blog posts.
But wait, it gets worse. In the following weeks, she created two scientific papers about bixonimania and uploaded them to a preprint server called SciProfiles under the lead author name Professor Livg. My apologies to everyone in and around Blur Pavistan.
As I've mentioned in many previous videos, you have to be careful with preprints because they haven't necessarily been vetted by anyone. And in this case, there was a whole lot to be skeptical of. She wrote that the funding for the study came from a made-up university and the Professor Sideshow Bob Foundation for its work in advanced trickery, part of a larger funding initiative from the University of Fellowship of the Ring and the Galactic Triad, with the funding number 99942.
She also thanked the Department of Machine Evolution and Human Antics at the Horizon University, and in particular Professor Ross Geller, "who has been a very important figure in our endeavors. We would also like to thank Professor Maria Bow at the Starfleet Academy for her kindness and generosity in contributing with her knowledge in her lab on board the USS Enterprise."
I'm often asked for tips on how laypeople can better read and analyze scientific papers and look for potential red flags. From now on, I need to add that thanking Ross Geller for anything is the biggest red flag imaginable, even, perhaps especially, in the field of paleontology.
So: one made-up disease, two blog posts, and two patently ridiculous preprint papers. Perplexity AI started talking about how bixonimania affects one in 90,000 people, and ChatGPT was then suggesting bixonimania in response to prompts asking about hyperpigmentation on the eyelids from blue light exposure. Considering the fact that, unfortunately, more and more people are using these chatbots for medical advice, and the fact that OpenAI literally just launched ChatGPT Health in January, this is extremely not good.
And yet it actually gets even worse. In November of 2024, Springer Nature's trashy journal Cureus, which you may recall from their prior [ __ ] publishing a paper about turbo cancer, published a paper titled "Clinical and Dermoscopic Evaluation of Periorbital Melanosis and Its Psychological Impact and Effect on Quality of Life: A Descriptive Study." That paper actually cites the fake bixonimania research. When Nature reached out to Cureus for comment, they quickly retracted the paper. AI slop is recursive. Someone uses a chatbot to produce garbage, and then other chatbots pick it up and spread it further. It's garbage in, garbage out, garbage all the way down.
To her credit, Osmanovic Thunström says she consulted with an ethics expert to pick a health issue that would not cause any serious harm. Bixonimania doesn't kill anyone. And, I don't know, I guess having hyperpigmented eyelids is not a serious sign of impending death from something real. I guess the worst thing is that it might increase the sale of those blue light glasses. When Nature reached out to the various companies running these chatbots, they pretty much all said the same thing they say every time research shows their clankers doing something stupid or dangerous: "That was an old model. Don't worry, the new model doesn't do that. We programmed that out, even though we're only just now hearing about this problem."
Cool. Now, to give these companies credit, sometimes they are able to update the bots to be more accurate.
Check out this study from last fall, in which researchers fed more than 700 scientific hypotheses into ChatGPT and asked it to report whether the consensus of research showed these hypotheses to be true or false. In 2024, the chatbot was correct 76.5% of the time. When they reran the experiment in 2025, the bot was correct 80% of the time. Wow, that's pretty good, huh? I mean, if you ignore the fact that it's a true/false answer, which means that by random chance the bot should be right 50% of the time, which means that the score it actually got is only 60% of the way from chance to perfect, which is a failing grade. And when it came specifically to ChatGPT identifying false hypotheses, it was correct only 13.6% of the time in 2024 and 16.4% of the time in 2025.
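The chance-correction arithmetic here can be sketched in a few lines of Python. This is a minimal illustration, assuming a balanced true/false task where random guessing scores 50%. The function name `chance_corrected` is hypothetical, and the formula is a standard rescaling, not something taken from the study itself.

```python
def chance_corrected(accuracy: float, chance: float = 0.5) -> float:
    """Rescale accuracy so chance-level guessing maps to 0 and a perfect score maps to 1."""
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must be in [0, 1)")
    return (accuracy - chance) / (1.0 - chance)


# Overall accuracy figures quoted in the video, written as fractions.
# The 50% chance baseline is an assumption (balanced true/false guessing).
for year, acc in [(2024, 0.765), (2025, 0.80)]:
    print(f"{year}: raw {acc:.1%}, over chance {chance_corrected(acc):.0%}")
```

Under that assumption, 76.5% raw accuracy works out to 53% of the way from chance to perfect, and 80% works out to 60%.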
That's not just a failing grade. That's a grade that gets you pulled out of class and checked for a brain tumor.
The results get even worse when it's not a simple true/false answer and average laypeople are involved. In February, Nature published this study that examined how accurate LLMs are when users ask for medical advice. This was pre-registered, which is my favorite, as it means that the researchers confirmed their study design in full before they even started, so they couldn't dig through their results to find whatever interesting data blips might be significant.
The researchers asked a team of doctors to come up with 10 different profiles, like: you're a 20-year-old guy who suddenly got a severe headache, here are the activities you regularly do, and here's your medical background. The doctors went back and forth with one another until they could unanimously agree on scenarios with clear answers for how the patient should respond: here's the probable diagnosis, and here is the best course of action. Take care of it themselves at home, talk to their regular doctor, go to urgent care, go to the ER or A&E, or call an ambulance immediately. The researchers then gave those profiles to chatbots and asked them to pretend to be those people and ask another chatbot what they should do.
In those cases, the chatbot got the correct diagnosis almost 95% of the time, but the correct action only 56% of the time. So that's already not great when it comes to what you should do. But things got even worse when the researchers gave those profiles to human subjects and asked them to use them to ask chatbots questions and figure out what to do. At that point, the correct diagnosis dropped from 95% to 34.5%, and the correct action dropped from 56% to 44%. Because humans never even really got good at using simple Google searches to find what they're looking for, and now they think the magic artificial intelligence will give them the right answer regardless of what they ask it.
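To put those two drops side by side, here is a small sketch that computes both the percentage-point drop and the relative drop. The helper `drop` is a hypothetical name, and "almost 95%" is rounded to 95 for the arithmetic, so treat the results as approximate.

```python
def drop(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop as a fraction of `before`)."""
    points = before - after
    return points, points / before


# Percentages quoted in the video: chatbot-simulated patients vs. human subjects.
# The 95.0 is an assumption standing in for "almost 95%".
results = {
    "diagnosis": (95.0, 34.5),
    "action": (56.0, 44.0),
}

for name, (llm_only, with_humans) in results.items():
    points, relative = drop(llm_only, with_humans)
    print(f"{name}: -{points:.1f} points ({relative:.0%} relative drop)")
```

With these inputs, the correct-diagnosis rate falls by 60.5 points (roughly a 64% relative drop), while the correct-action rate falls by 12 points (roughly 21%).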
The researchers found that subjects gave the chatbots incomplete information, and in return the chatbots did not convey the correct answers in a way the subjects understood, often giving them a choice of several options without any guidance on how to narrow it down and figure out which one was correct. So, in a real-world scenario, ChatGPT could be convincing people with minor symptoms to panic over a diagnosis they surely do not have and, even worse, convincing people with serious symptoms to just pop an aspirin and go to bed. So, I know I'm preaching to the choir here, but please don't use AI for your healthcare, or for cranking out [ __ ] papers to submit to sloppy journals, or for much of anything else, to be honest.
Hey everybody, thanks for watching. If you enjoyed this video, please give it a like. If you loved the video, please subscribe. And if you think the world could use more videos like this and you happen to have a few bucks laying around, head to patreon.com/rebecca and join an awesome community of nerds like the people whose names you see on the screen right now. Thanks.