Fine-Tuned Large Language Models Enhance Error ID in Radiology Reports
By Elana Gotkine HealthDay Reporter
TUESDAY, May 27, 2025 -- Fine-tuning large language models (LLMs) on radiology reports improves their ability to detect errors in those reports, according to a study published online May 20 in Radiology.
In a retrospective study, Cong Sun, Ph.D., from Weill Cornell Medicine in New York City, and colleagues developed and evaluated generative LLMs for proofreading radiology reports, that is, for detecting errors in them. The dataset had two parts. The first included 1,656 synthetic chest radiology reports generated by GPT-4 (OpenAI): 828 error-free and 828 containing errors. The second included 614 reports: 307 error-free reports from the MIMIC chest radiograph (MIMIC-CXR) database and 307 synthetic reports with errors generated by GPT-4. Several models were adapted using zero-shot prompting, few-shot prompting, or fine-tuning, and their performance was assessed.
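To make the difference between the two prompting strategies concrete, here is a minimal sketch in Python. It is not the study's method: the instruction wording, the function names, and the example reports are all invented for illustration, since the article does not give the prompts the authors used.

```python
# Hypothetical instruction text; the study's actual prompts are not given here.
INSTRUCTION = (
    "You are proofreading a chest radiology report. "
    "Answer 'error' if the report contains a negation, left/right, "
    "interval change, or transcription error; otherwise answer 'no error'."
)


def zero_shot_prompt(report: str) -> str:
    """Zero-shot: the instruction and the report, with no worked examples."""
    return f"{INSTRUCTION}\n\nReport: {report}\nAnswer:"


def few_shot_prompt(report: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: the same instruction, preceded by labeled example reports."""
    shots = "\n\n".join(
        f"Report: {text}\nAnswer: {label}" for text, label in examples
    )
    return f"{INSTRUCTION}\n\n{shots}\n\nReport: {report}\nAnswer:"


# Invented example reports, for illustration only.
examples = [
    ("No pleural effusion. Small left pleural effusion is unchanged.", "error"),
    ("Lungs are clear. No pneumothorax.", "no error"),
]

print(few_shot_prompt("Heart size is normal.", examples))
```

Fine-tuning, by contrast, changes the model's weights using labeled reports rather than changing the prompt, which is why a fine-tuned model can still be queried with a zero-shot prompt.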
The researchers found that the fine-tuned Llama-3-70B-Instruct model achieved the best performance using zero-shot prompting, with F1 scores of 0.769 for negation errors, 0.772 for left/right errors, 0.750 for interval change errors, 0.828 for transcription errors, and 0.780 overall. In a real-world evaluation phase, two radiologists reviewed 200 randomly selected reports output by the model; 99 were confirmed by both radiologists to contain errors detected by the model, and 163 were confirmed by at least one radiologist.
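For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the calculation with made-up counts (the article does not report the underlying true-positive, false-positive, or false-negative counts), and then converts the radiologist review figures above into proportions of the 200 reviewed reports.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall; 0.0 if undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of flagged reports that truly had errors
    recall = tp / (tp + fn)     # share of erroneous reports that were flagged
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only; these are NOT from the study.
print(round(f1_score(tp=78, fp=22, fn=22), 3))

# Figures reported in the article: 200 reports reviewed by two radiologists.
reviewed = 200
confirmed_by_both = 99
confirmed_by_at_least_one = 163

both_rate = confirmed_by_both / reviewed              # 0.495
any_rate = confirmed_by_at_least_one / reviewed       # 0.815
print(f"Confirmed by both: {both_rate:.1%}")
print(f"Confirmed by at least one: {any_rate:.1%}")
```

In other words, about half of the reviewed reports were confirmed by both radiologists and roughly four in five by at least one.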
"The findings show that fine-tuning is crucial for enabling local deployment of LLMs while also demonstrating the importance of prompt design in optimizing performance for specific medical tasks," the authors write.
One author has patents planned, issued, or pending with Weill Cornell Hospital.
Editorial (subscription or payment may be required)

© 2025 HealthDay. All rights reserved.
Posted May 2025