After Watson, IBM’s artificial intelligence (AI) program, shot to fame on the game show Jeopardy! in 2011, it seemed poised to take the world stage.
In quick succession, Watson established partnerships with prominent medical institutions, such as the U.S. Department of Veterans Affairs, Memorial Sloan Kettering Cancer Center, Mayo Clinic, and Cleveland Clinic, to apply AI to cancer care.
However, Watson’s subsequent foray into oncology went nowhere near as smoothly as its domination of Jeopardy! Although Watson has become more accurate over the years, erroneous recommendations in cancer care have far more serious consequences than a missed question on a game show. As a result, many high-profile collaborations between medical institutions and IBM quietly ended.
Why did Watson’s awesome computational power fall short in oncology?
It boils down to the fact that winning a game show and diagnosing cancer are very different tasks, so Watson was trained with different tools for each and, as a result, achieved very different levels of success.
Data training for AI algorithms
There are two phases of AI: training and inference. Before an AI program can be deployed to make decisions (inference), it needs to be trained so that its predictions reach an acceptably low level of error.
Training an AI program is analogous to teaching a student. Both the AI software and the student need help to establish a system of thinking using external information so that they can solve the same or similar problems in the future.
A set of problem-solving rules and equations, called an algorithm, is built into the AI software. Then, in training sessions, the algorithm analyzes existing data, much as students learn from textbooks, to establish its parameters. Those parameters encode a thought process that the AI can apply in future analyses.
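To make the training/inference split concrete, here is a minimal Python sketch, not Watson’s actual method: a toy model with two parameters, w and b, is fitted by gradient descent to a handful of made-up input/output pairs (training), and the learned parameters are then applied to an input it has never seen (inference). The data, the function names train and infer, and the learning-rate and epoch values are all illustrative assumptions.

```python
# Hypothetical toy data: each input x is paired with a known output y (y = 2x + 1).
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def train(data, learning_rate=0.05, epochs=2000):
    """Training phase: adjust parameters w and b to reduce prediction error."""
    w, b = 0.0, 0.0  # start with uninformed parameters
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step each parameter against its gradient to lower the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def infer(params, x):
    """Inference phase: apply the learned parameters to a new input."""
    w, b = params
    return w * x + b


params = train(training_data)
print(round(infer(params, 10.0), 2))  # approximately 21.0
```

Once training is done, inference is cheap: it only evaluates the learned formula, which is why the two phases are usually treated separately.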
IBM's Watson triumphs on Jeopardy! in 2011. Credit: IBM
In supervised training, the algorithm works with datasets that are completely labeled. In other words, there is a clear relationship between each input value (the “question”) and its output value (the “answer”). After the algorithm computes a result from the input, it gets instant, precise feedback on whether that result matches the expected output. This way, it can adjust its parameters to improve its chance of getting the right answer the next time.
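The feedback loop described above can be sketched with a perceptron, one of the simplest supervised learners. This is an illustrative stand-in, not how Watson was trained: the toy dataset, the +1/-1 labels, and the helper names predict and train_perceptron are hypothetical. Each time a prediction disagrees with the known label, the weights are nudged toward the right answer.

```python
# Hypothetical labeled data: (features, label), where the label is +1 when
# x1 + x2 > 1 and -1 otherwise. Every example comes with its "answer".
labeled_data = [
    ((0.0, 0.0), -1),
    ((1.0, 0.0), -1),
    ((0.0, 1.0), -1),
    ((1.0, 1.0), 1),
    ((2.0, 1.0), 1),
    ((1.0, 2.0), 1),
]


def predict(weights, bias, features):
    """Return +1 or -1 depending on which side of the learned line a point falls."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1


def train_perceptron(data, epochs=100):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            # Feedback step: compare the prediction with the known answer.
            if predict(weights, bias, features) != label:
                # Wrong answer: shift the parameters toward the correct label.
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every labeled example is now answered correctly
            break
    return weights, bias


weights, bias = train_perceptron(labeled_data)
print(predict(weights, bias, (3.0, 3.0)))  # 1
```

Because the toy data can be separated by a straight line, the loop stops as soon as a full pass produces no mistakes.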
In unsupervised training, the data are unlabeled, so there is no answer to check against; the algorithm has to discover patterns and relationships in the data on its own, which is more challenging. In semi-supervised learning, only some of the data are labeled; as a result, its effectiveness typically falls between that of supervised and unsupervised learning.
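Without labels there is nothing to check answers against, so an unsupervised algorithm can only look for structure. A common example is k-means clustering, sketched below in plain Python under simple assumptions (made-up two-dimensional points, naive initialization from the first k points, and a hypothetical kmeans helper). It groups similar points together but cannot say what each group means; that interpretation still has to come from labels or a human expert.

```python
import math

# Hypothetical unlabeled points: no "answers" are attached to them.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def kmeans(points, k, iterations=100):
    """Group unlabeled 2-D points into k clusters (basic Lloyd's algorithm)."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the grouping is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignments, centroids


labels, centers = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers 0 and 1 are arbitrary; the algorithm finds two groups, but deciding what they represent is left to whoever reads the result.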
Different data training for Jeopardy! and oncology
When given the task of winning Jeopardy! or a board game such as chess, the AI software searches for the outcome most likely to lead to victory: a checkmate in chess or a correct answer in Jeopardy!
Training for such game problems is supervised, because the datasets contain large numbers of previous chess matches or Jeopardy! question-and-answer pairs. These datasets are completely labeled, with clear relationships between input and outcome.
On the other hand, completely labeled datasets are not feasible for training Watson for oncology as a whole. Because many lab results are analyzed quantitatively and AI excels at processing and analyzing image scans, it is relatively straightforward to train Watson in diagnostics. A 2018 paper in The Oncologist reported that Watson achieved very high accuracy on clear, well-defined tasks such as diagnosis.
However, it is much harder to train Watson on unstructured, abbreviated, and often subjective patient information, such as doctors’ notes and hospital discharge summaries, which make up close to 80% of a patient’s record.
Jeopardy! and oncology are different tasks
With Jeopardy!, Watson has the perfect scenario. The question-and-answer format is specific and defined. Watson was trained on and tested with quiz questions written in the same style. Therefore, the collection and preparation of data for analytics is relatively straightforward. All Watson needs is tremendous computational power to crunch through lots of data and determine the most likely answer.
On the other hand, oncology is far more complex. In fact, oncology is several problems rolled into one: diagnostics, culling information from published journal articles, and analyzing unstructured patient information. While AI does well at analyzing the quantitative data in lab results, it does not yet have the capability to interpret text that is rich in context and nuance.
Almost every journal article and doctor’s note is written by a different author, each with their own use of jargon and shorthand. Parsing out the content and the relationships between different components (genetic mutations, symptoms, signaling pathways, etc.) in a ten-page, densely written journal article is much more complex than dissecting a one-sentence Jeopardy! question. Humans find it comparatively easy to figure out which of the many points made in a paper is most crucial in a particular context; AIs such as Watson cannot pick up such nuance easily.
Similarly, doctors’ notes often contain incomplete information and details that are vaguely written or out of chronological order. Humans can decide which detail is most crucial in a particular context, but AI can only work according to a defined protocol; as a result, these systems lack the flexibility to weigh one type of detail over others in one note and do the opposite in the next.
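To illustrate why a fixed protocol struggles with varied writing styles, here is a deliberately rigid, hypothetical rule for pulling a tumor stage out of free-text notes. It is a toy, not how Watson parses records; the pattern, the helper name extract_stage, and the sample notes are all invented for illustration. The rule works for the phrasing it was written for and silently fails on shorthand that a human reader would understand immediately.

```python
import re

# Hypothetical fixed rule, written for one author's style such as "Stage III".
# It only recognizes the word "stage" followed by a Roman numeral I to IV.
STAGE_PATTERN = re.compile(r"\bstage\s+(I{1,3}|IV)\b", re.IGNORECASE)


def extract_stage(note):
    """Return the Roman-numeral stage if the rule matches, otherwise None."""
    match = STAGE_PATTERN.search(note)
    return match.group(1).upper() if match else None


print(extract_stage("Pt presents with Stage III adenocarcinoma."))  # III
print(extract_stage("stg 3 adenoca, see prior path"))  # None
```

Adding more rules can cover "stg 3", but every new author brings new shorthand, and a rule set cannot decide on its own which detail matters most in a given note.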
Moreover, there are what engineers call “unknown unknowns”, which can introduce bias into an analysis without anyone realizing it. Here, the unknown unknowns may be mechanisms, genes, pathways, or interactions whose connection with cancer has not yet been identified. While these also pose a problem for humans, our brains are more flexible in assessing their significance, whereas AI systems are less effective at assessing such variables without prior instruction. Lastly, if doctors have biases, Watson will inherit those biases through training on their notes, and the biases may affect the accuracy of the diagnosis.
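The point about inherited bias can be shown with an equally small sketch. Assume a hypothetical set of historical records in which patients in two groups, A and B, present with the same symptom, but the annotating doctor recommended a follow-up scan for group B less often. A simple learner that predicts the most common recommendation for each symptom and group (the train_majority helper and all counts are invented for illustration) faithfully reproduces that pattern, because the labels are all it has to learn from.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (symptom, group, doctor's recommendation).
# Symptoms are identical across groups, but group "B" was recommended a
# follow-up scan less often; that skew is baked into the labels.
history = (
    [("persistent cough", "A", "scan")] * 8
    + [("persistent cough", "A", "no scan")] * 2
    + [("persistent cough", "B", "scan")] * 3
    + [("persistent cough", "B", "no scan")] * 7
)


def train_majority(records):
    """Learn the most common recommendation for each (symptom, group) pair."""
    counts = defaultdict(Counter)
    for symptom, group, label in records:
        counts[(symptom, group)][label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


model = train_majority(history)
print(model[("persistent cough", "A")])  # scan
print(model[("persistent cough", "B")])  # no scan
```

Nothing in the code is "biased" on its own; the model simply learns whatever pattern the training labels contain.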
As a result, collecting and preparing data from journal articles and doctors’ notes for analytics is messy. The same paper in The Oncologist that showed Watson’s proficiency in diagnostics also found that Watson scored poorly on time-dependent, complex recommendations such as therapy timelines. In addition, Watson performed inconsistently across different types of cancer, doing better on some than on others.
What Watson does well in medicine
AI systems like Watson can still have broad applications in medicine. AI and AI-powered robots excel at repetitive tasks with defined steps, such as simple, routine eye and hair procedures, analysis of X-rays and other scans, checking on patients between office visits, and handling administrative billing or claims.
Watson has also had success analyzing clear, structured data such as genetic information. For example, researchers at the University of North Carolina recently published a paper on the effectiveness of Watson Genomics. In their study, Watson identified previously unidentified mutations that proved important to therapeutic recommendations.
Right now, AI already has broad applications in manufacturing. By taking over tasks that are repetitive, tedious, and dangerous, AI leaves humans free to do more complex and nuanced problem-solving. This way, AI and humans can work side by side to achieve higher efficiency and fewer errors.
The same lesson can be applied to healthcare, where AI takes on the more menial tasks and leaves humans to deal with ambiguity and complexity.
Meanwhile, continuously retraining the algorithm with new data, such as newly identified genes, pathways, and biomarkers involved in cancer, may reinforce the AI’s learning and improve its accuracy.