Artificial intelligence algorithms are being built into almost all aspects of health care. They’re integrated into breast cancer screenings, clinical note-taking, health insurance management and even phone and computer apps that serve as virtual nurses and transcribe doctor-patient conversations. Companies say these tools will make medicine more efficient and reduce the burden on doctors and other health care workers. But some experts question whether the tools work as well as companies claim they do.
AI tools such as large language models, or LLMs, which are trained on vast troves of text data to generate humanlike text, are only as good as their training and testing. But the publicly available assessments of LLM capabilities in the medical domain are largely based on evaluations that use medical exam questions, such as those from the U.S. Medical Licensing Examination. In fact, a review of studies evaluating health care AI models, specifically LLMs, found that only 5 percent used real patient data. Moreover, most studies evaluated LLMs by asking questions about medical knowledge. Very few assessed LLMs’ abilities to write prescriptions, summarize conversations or converse with patients, the tasks LLMs would actually perform in the real world.