
General-purpose AI models outperformed specialized clinical AI tools across three independent tests, suggesting that widely available large language models may match or exceed the performance of proprietary medical software.

Hacker News · 4d ago · 3 min read


3 Key Points

1. What happened: Researchers compared two specialized clinical AI tools (OpenEvidence and UpToDate Expert AI) against three frontier general-purpose large language models (GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6) in three evaluation stages: 500 medical knowledge questions, 500 clinician-alignment items, and 100 real clinical queries reviewed by 12 US clinicians. On the medical knowledge questions, Gemini achieved 97.4% accuracy, GPT 94.2%, and Claude 90.2%, while the clinical tools scored 89.6% (OpenEvidence) and 88.4% (UpToDate). The frontier models also outperformed the clinical tools on clinician-alignment scoring and on the real-world clinical queries.
2. Why it matters: Specialized clinical AI tools are entering medical practice at scale, yet their internal design and training remain proprietary, which makes it difficult for clinicians and health systems to assess their value independently. The study is described as the first rigorous, blinded comparison showing that general-purpose models available to anyone may perform as well as or better than tools marketed specifically for clinical use. That result challenges the assumption that domain-specific AI tools offer superior clinical performance.
3. What to watch: On real clinical queries, the clinical tools performed comparably to Google Search's auto-enabled AI Overview, suggesting that physicians may already be relying on similarly capable (and more widely accessible) tools. The research highlights the need for independent evaluation as specialized clinical AI tools are deployed in medical settings at scale.
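
To make the accuracy gaps in the first key point more concrete, the reported percentages can be converted into approximate question counts. The sketch below is only an illustration: it assumes every system was scored on all 500 medical knowledge questions and that each percentage is simply the share answered correctly, neither of which the summary states explicitly. The names `reported_accuracy`, `implied_correct`, and `gap_in_questions` are hypothetical helpers, not part of the study.

```python
# Accuracy figures as reported in the summary (percent correct).
reported_accuracy = {
    "Gemini 3.1 Pro": 97.4,
    "GPT-5.2": 94.2,
    "Claude Opus 4.6": 90.2,
    "OpenEvidence": 89.6,
    "UpToDate Expert AI": 88.4,
}

# Assumption: every system answered the same 500 knowledge questions.
N_QUESTIONS = 500


def implied_correct(accuracy_pct: float, n: int = N_QUESTIONS) -> int:
    """Number of correct answers implied by a percentage accuracy."""
    return round(accuracy_pct * n / 100)


def gap_in_questions(a: str, b: str) -> int:
    """How many more questions system `a` answered correctly than system `b`."""
    return implied_correct(reported_accuracy[a]) - implied_correct(reported_accuracy[b])


if __name__ == "__main__":
    for name, pct in reported_accuracy.items():
        print(f"{name}: {pct}% is about {implied_correct(pct)} of {N_QUESTIONS}")
    print("Gemini vs OpenEvidence gap:",
          gap_in_questions("Gemini 3.1 Pro", "OpenEvidence"), "questions")
```

Under these assumptions, the spread between the best frontier model and the clinical tools amounts to a few dozen questions, while the gap between Claude and OpenEvidence is only a handful, so the smaller differences should be read with some caution.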


