AI’s Sources of Truth: How Chatbots Cite Health Information

Large language models (LLMs) have rapidly become a default source for health advice, from basic questions such as "What causes migraines?" to more complex ones like "What are the treatment options for autoimmune disorders?" However, the sources from which these models acquire their medical knowledge remain unclear.
Our study, “AI’s Sources of Truth: How Chatbots Cite Health Information”, examined health-related queries to address this question. Using a wide array of health-related prompts, researchers collected 5,472 citations generated by ChatGPT (GPT-4o with browsing), Google Gemini (2.5 Flash), Claude (Sonnet 4), and Perplexity (Sonar mode). By analyzing these citations, the study identifies the most frequently cited websites, examines the recency of sources, characterizes the type of content cited, and notes whether the information is freely accessible or paywalled.
Key Findings
- The most frequently cited domain, PubMed Central (pmc.ncbi.nlm.nih.gov), accounts for 385 citations, about 7.0% of the total.
- Institutional health media such as Cleveland Clinic and Mayo Clinic play a prominent role.
- Nearly one in three citations (30.7%) comes from health media sources. Commercial and affiliate-driven sites comprise 23.1%, and academic/research sources 22.9%.
- Domain Rating analysis shows 62.4% of citations originate from domains in the highest authority tier (DR 81-100), while only 2.7% come from the lowest tier.
- Most citations are recent, with almost two-thirds dated 2024 or 2025.
- Chatbots rely more on summaries and interpretations of science (59%) than on peer-reviewed research (41%).
- Perplexity averages the most citations per answer (14.97), followed by Claude (13.99), ChatGPT (13.59), and Gemini (12.29).
- 99.3% of citations are open access, with almost no broken links (0.2%).
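The domain-share figures above follow directly from raw citation counts. As a minimal sketch (the study's dataset and field names are not published here, so the URL list below is a hypothetical stand-in), per-domain shares can be computed like this:

```python
from collections import Counter
from urllib.parse import urlparse

def domain_shares(citation_urls):
    """Return each domain's share of the total citation count.

    `citation_urls` is an illustrative list of cited URLs,
    not the study's actual dataset.
    """
    domains = Counter(urlparse(u).netloc.lower() for u in citation_urls)
    total = sum(domains.values())
    return {d: n / total for d, n in domains.items()}

# PubMed Central's share in the study: 385 of 5,472 citations
print(f"{385 / 5472:.1%}")  # → 7.0%
```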
Conclusion
The data shows that chatbots overwhelmingly prefer accessible, recent, high-authority sources, and favor summaries over original research. Patterns also differ across LLMs: Perplexity favors commercial and user-generated content, Claude approaches research parity, ChatGPT draws heavily from health media, and Gemini focuses on government/NGO sources. AI is changing how we interact with healthcare information, but careful validation remains critical.
Methodology
This study analyzed 5,472 unique citations generated by AI chatbots in response to health-related prompts.
Data Collection
We built a prompt set designed to mimic real-world health queries, ranging from general wellness advice to more technical medical topics. These prompts were run through four major web-enabled large language models during August 2025:
- ChatGPT (Web browsing mode, GPT-4o)
- Google Gemini (2.5 Flash)
- Claude (Sonnet 4)
- Perplexity (Sonar mode)
All links surfaced in chatbot responses were extracted and cleaned before classification. The final dataset comprised 1,497 citations from Perplexity, 1,217 from Gemini, 1,359 from ChatGPT, and 1,399 from Claude.
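The study does not detail its extraction and cleaning pipeline, so the sketch below is an illustrative assumption: a simple regex pulls URLs out of response text, and a normalization step lowercases hosts and strips query strings, fragments, and trailing punctuation before deduplication.

```python
import re
from urllib.parse import urlparse, urlunparse

# Illustrative URL pattern; the study's actual extraction rules are not published.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_links(response_text):
    """Pull raw URLs out of a chatbot response (assumed plain text)."""
    return URL_RE.findall(response_text)

def clean_link(url):
    """Normalize a URL: drop trailing punctuation, lowercase the host,
    and strip query string and fragment."""
    url = url.rstrip(".,;")
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc.lower(), p.path, "", "", ""))

responses = [
    "See https://PMC.ncbi.nlm.nih.gov/articles/PMC123/, "
    "and https://example.com/page?utm=x."
]
links = sorted({clean_link(u) for r in responses for u in extract_links(r)})
print(links)
```

Deduplicating after normalization (rather than before) keeps variants of the same page, such as tracking-parameter copies, from inflating the citation counts.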
Limitations
Findings represent a snapshot at a single point in time. Citation patterns may shift as generative AI systems are updated and retrained.