After the technology demonstrated an ability to perform many tasks faster, cheaper, and in some cases better than humans, healthcare companies are competing to incorporate generative AI tools into their product pipelines and IT systems.
But the push to harness the power of so-called large language models, which are trained on vast amounts of data, has outpaced efforts to assess their value. AI experts are still trying to understand and explain how and why these models work better than previous systems, and what blind spots might undermine their usefulness in medicine.
For example, it remains unclear how well these models will perform, and what privacy and ethical challenges will arise, when they are exposed to new types of data such as gene sequences, CT scans, and electronic medical records. Even knowing exactly how much data a model must be fed to achieve the best performance on a given task is still mostly guesswork.
“There is no satisfactory mathematical or theoretical explanation for exactly why these models have to be so large,” said Zachary Lipton, a professor of computer science at Carnegie Mellon University. “Why do they seem to get better as they scale from millions of parameters to 5 trillion parameters? These are all open technical questions.”
STAT reporters asked AI experts to explain the history and foundations of large language models and other forms of generative AI, which are designed to produce answers in response to prompts. How accurate those responses are depends primarily on the data used to train the models. STAT also asked the experts to dispel the many misconceptions swirling around these systems as healthcare companies seek to apply them to new tasks. Here’s what they think you should know before betting your patients’ health and your profit expectations on a first impression of ChatGPT.
What a generative AI model is actually doing when generating an answer
In short, they are doing math.
More precisely, they’re doing the same kind of autocomplete that’s been built into our email and tools like automatic language translators for years.
“AI is identifying and reproducing patterns,” University of Michigan computer scientists Jenna Wiens and Trenton Chan wrote in response to STAT questions. “Many generative models of text are fundamentally based on predicting the probability that each word will come next, using probability as a proxy for how ‘reasonable’ an answer is.”
Heather Lane, senior architect on the data science team at athenahealth, told STAT that when a model is given a string of words, it produces the words that are statistically likely to follow them, but there is no “real understanding” of what it is doing. AI models learn what is “statistically likely” from vast amounts of data (including Wikipedia, Reddit, books, and the rest of the internet), and they learn what is “good” from human feedback on their answers.
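To make the idea of “predicting the probability that each word will come next” concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not how a large language model is actually built: real models use neural networks with billions of parameters, while this one only counts which word follows which. The tiny corpus and the function names (`train_bigram_counts`, `next_word_probabilities`, `most_likely_next`) are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Convert the follow-counts for `word` into probabilities that sum to 1."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word never appeared with a follower in the training text
    return {w: c / total for w, c in followers.items()}


def most_likely_next(counts, word):
    """Pick the single most probable next word, or None if the word is unseen."""
    probs = next_word_probabilities(counts, word)
    if not probs:
        return None
    return max(probs, key=probs.get)


# Hypothetical four-sentence "training set"; punctuation is treated as a word.
corpus = (
    "the patient has a fever . "
    "the patient has a cough . "
    "the patient has a fever . "
    "the doctor has a plan ."
)

counts = train_bigram_counts(corpus)
probs = next_word_probabilities(counts, "a")
print(probs)
print(most_likely_next(counts, "a"))
```

The sketch chooses “fever” after “a” only because that pairing is the most frequent one in its training text. It has no notion of whether the patient actually has a fever, which is the point Lane makes about statistical likelihood without understanding.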
This is very different from how humans think, and it is certainly far less efficient and more limited than the reasoning systems that define how the brain processes information and solves problems. If you think large language models are on the verge of becoming artificial general intelligence (the Holy Grail of AI research), you are wrong.
Why generative AI is much better than previous versions
Today’s generative AI is better largely because it was trained on much more data than previous versions, but several factors have converged over the last few years to create the powerful models we have today.
“To start a fire, you need oxygen, fuel and heat,” Elliot Bolton, a research engineer at Stanford University who works on generative AI, told STAT via email. Similarly, in the past few years, the development of a technology called “transformers” (the “T” in “GPT”), combined with huge models trained on vast amounts of data with enormous computing power, has yielded the impressive results we see today.
“People forget that only 12 years ago, if someone used all of Wikipedia to train [AI], that was a breathtakingly large study,” Lipton said. “But now when people train language models, they’re training on all the text on the internet, or something like that.”
Models such as OpenAI’s GPT-4 and Google’s PaLM 2 are trained on so much data that they can recognize and reproduce patterns more easily. Still, their fluency in producing complex output, such as pieces of music or snippets of computer code, surprised AI researchers who had not expected such a leap from autocompleting a message to writing an essay on late 19th-century Impressionism.
“These larger models have been trained on much more computational resources and much more data and have proven to be amazingly capable,” Lipton said. Models can also be updated with data in new or different formats, or incorporated into existing products such as Microsoft’s Bing search engine.
They may look smart, but they are far from intelligent
Language models learn language in much the same way that young children do, but these models require much more training data than children, Lane said. Their verbal abilities are not grounded in understanding the world or causality, so they also fail spatial reasoning and math tasks.
“It’s very easy to make a model look stupid,” added Lipton of Carnegie Mellon University. “These are ultimately text processing engines. They have no idea that the world this document refers to exists.”
However, as more people start using these models and relying on them for tasks they previously had to struggle through on their own, such as writing and summarizing information, there is a lot of uncertainty about how that will affect human intelligence, he said.
“My biggest fear is that they will somehow interfere with what we are doing and we will not be as creative as we used to be,” he said.
There are ways to deal with ChatGPT’s fabrication problem
These generative AI models are only predicting probable and convincing texts, so they have no basis for understanding what is true and what is false.
“It doesn’t know that it’s lying, because it basically doesn’t know the difference between truth and lies,” Lane said. “This is no different than dealing with a human being who sounds incredibly compelling, but whose words are literally not tied to reality.”
That’s why it’s important to ask a few quick questions before using a model for a particular task. Who built the model? Was it trained on data likely to contain reliable information relevant to its intended use? What kinds of bias or misinformation might that data contain?
This is especially important in the medical field, where inaccurate information can have a variety of negative consequences. “I don’t want doctors to be trained on Reddit. I don’t know about you,” said Nigam Shah, a professor of biomedical informatics at Stanford University.
That doesn’t mean it’s impossible to improve the accuracy of a model that may have been biased or misinformed in its training. Builders of generative AI systems can use a technique known as reinforcement learning. This involves giving the model feedback so it learns which responses are more accurate and useful when judged by a human expert.
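The feedback loop described above can be illustrated with a deliberately simplified Python sketch. This is not how GPT-4 or any production system was trained: real reinforcement learning from human feedback adjusts the weights of a neural network, typically with the help of a separate learned reward model. Here a “model” is just a table of scores for two canned responses, and the ratings, response labels, function names, and learning rate are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities of choosing each response."""
    highest = max(scores.values())
    exps = {k: math.exp(v - highest) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def apply_feedback(scores, response, reward, learning_rate=0.5):
    """Return new scores with `response` nudged up for a positive reward
    and down for a negative one."""
    updated = dict(scores)
    updated[response] += learning_rate * reward
    return updated


# Before feedback, the toy model has no preference between the two responses.
scores = {"accurate answer": 0.0, "fluent but wrong answer": 0.0}
before = softmax(scores)

# Hypothetical ratings from a human expert: +1 for useful, -1 for misleading.
feedback = [("accurate answer", 1.0), ("fluent but wrong answer", -1.0)] * 3
for response, reward in feedback:
    scores = apply_feedback(scores, response, reward)

after = softmax(scores)
print(before)
print(after)
```

After three rounds of ratings, the toy model strongly favors the response the expert rewarded. The same sketch also shows a limitation: the model only becomes as accurate as the feedback it receives.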
The technique was used to build GPT-4, but the model’s maker, OpenAI, has not disclosed what data was used to train it. Google created a large language model known as Med-PaLM 2, which was trained on medical information to make it more relevant to healthcare uses.
“As generative AI models advance, ‘hallucinations’ are likely to decrease,” said Ron Kim, senior vice president of IT architecture at Merck.
The apocalypse probably won’t happen, but we need guardrails
The hype surrounding ChatGPT has raised new concerns that AI will take people’s jobs, or run wild in some way.
However, many researchers in the field are much more optimistic about this technology and its potential in medicine. Thomas Fuchs, director of the Mount Sinai Office of Artificial Intelligence and Human Health in New York, said that in the broadest sense an apocalyptic scenario is “extremely unlikely,” and that fearful speculation is no reason to hold back AI’s potential to democratize access to quality healthcare, develop better medicines, and reduce pressure on a scarce supply of physicians.
“In medicine, patients are dying today not because of AI, but because of lack of AI,” he said.
While there are many examples of inappropriate use of algorithms in medicine, experts hope GPT can be used responsibly with the right guardrails. There are no regulations specific to generative AI yet, but there is a growing movement for rules.
“At least for now, we need to enumerate the use cases where it makes sense and is low risk to use generative AI for a specific purpose,” said John Halamka, president of the Mayo Clinic Platform. He co-leads the Coalition for Health AI, which is debating what kind of guardrails are appropriate. He said that GPT-based tools might be good for drafting insurance denial appeals or helping non-native English speakers write scientific papers, but that other use cases should be off limits.
“Asking [generative AI] to create clinical summaries or provide diagnostic assistance to physicians is probably not a use case we would choose today,” he said.
However, as the technology advances and its ability to perform such tasks improves, humans will need to decide whether relying too heavily on AI will undermine their own ability to think through problems and write their own answers.
“What if what we really need is smart people thinking hard about what they are trying to say,” Lipton said, “and not just letting GPT-4 supply what someone might plausibly have said in the past?”
This story is part of a series that explores the use of artificial intelligence in medicine and the practice of patient data exchange and analysis. This project is supported by funding from the Gordon & Betty Moore Foundation.