Large Language Models Fall Short in Breast Imaging Classification

Medically reviewed by Carmen Pope, BPharm. Last updated on May 10, 2024.

By Elana Gotkine, HealthDay Reporter

FRIDAY, May 10, 2024 -- Large language models (LLMs) appear to fall short in classification of breast imaging, which can have a negative impact on clinical management, according to a study published online April 30 in Radiology.

Andrea Cozzi, M.D., Ph.D., from the Ente Ospedaliero Cantonale in Lugano, Switzerland, and colleagues conducted a retrospective study examining the agreement between human readers and LLMs for Breast Imaging Reporting and Data System (BI-RADS) categories assigned based on breast imaging reports written in three languages (Italian, English, and Dutch). Using only the findings described by the original radiologists, board-certified breast radiologists and the LLMs, including GPT-3.5 and GPT-4 (OpenAI) and Bard, now known as Gemini, assigned BI-RADS categories. Agreement between human readers and LLMs was assessed using the Gwet agreement coefficient (AC1 value).
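For readers unfamiliar with the Gwet coefficient, the following is a minimal sketch of the standard two-rater AC1 formula: observed agreement corrected for chance agreement, where chance agreement is estimated from the average share of ratings falling in each category. It is not the study's analysis code; the function name `gwet_ac1` and the sample ratings are hypothetical and serve only as illustration.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters over a fixed list of categories.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of ratings")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: based on the average proportion of ratings per category.
    p_e = 0.0
    for c in categories:
        pi_c = (counts_a[c] + counts_b[c]) / (2 * n)
        p_e += pi_c * (1 - pi_c)
    p_e /= (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Made-up BI-RADS categories for six reports (not study data).
original = [1, 2, 2, 3, 4, 5]
reviewer = [1, 2, 3, 3, 4, 5]
print(round(gwet_ac1(original, reviewer, categories=[0, 1, 2, 3, 4, 5, 6]), 2))
```

A value of 1 indicates complete agreement, and values near 0 indicate agreement no better than chance.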

The researchers observed almost perfect agreement between the original and reviewing radiologists across 2,400 reports (AC1, 0.91), while moderate agreement was seen between the original radiologists and GPT-4, GPT-3.5, and Bard (AC1, 0.52, 0.48, and 0.42, respectively). Differences were observed in the frequency of BI-RADS category upgrades or downgrades that would result in alterations in clinical management across human readers and LLMs (4.9 percent for human readers versus 25.5, 23.9, and 18.1 percent for Bard, GPT-3.5, and GPT-4, respectively) and that would negatively impact clinical management (1.5 percent versus 18.1, 14.3, and 10.6 percent, respectively).

"The results of this study add to the growing body of evidence that reminds us of the need to carefully understand and highlight the pros and cons of LLM use in health care," Cozzi said in a statement.

Several authors disclosed ties to the pharmaceutical and health technology industries.

Disclaimer: Statistical data in medical articles provide general trends and do not pertain to individuals. Individual factors can vary greatly. Always seek personalized medical advice for individual healthcare decisions.

© 2025 HealthDay. All rights reserved.
