IBM tells a nice story about Watson and oncology in India. The country has a paucity of oncologists — roughly 2000 in fact — to cover a population of 1.3 billion. Get cancer in India and your chances of receiving specialist care are, in relative terms, almost non-existent.
Enter Big Blue’s artificial intelligence platform. The company threw its AI firepower at the problem and — by IBM’s telling — within a few years the machine had ingested so much information and learnt so much that it could outperform cancer specialists in both diagnosis and treatment recommendations. As a story it is a powerful way to illustrate the potential of artificial intelligence.
More a series of APIs and applications than a unified platform, Watson is one of IBM chief Ginni Rometty’s great bets, as she recasts the global company and oversees another operational transformation.
A $US60M failure
However, with all the sound and fury that attends artificial intelligence and machine learning today, it is easy to lose sight of the fact that these are early days. Failures might be as likely as successes.
That is a hard and expensive lesson that IBM and one of its customers, the University of Texas MD Anderson Cancer Center in Houston, have learned over the last five years.
And it is important not to sheet all the blame home to IBM. PricewaterhouseCoopers was paid $US23 million for its part of the project yet, despite contractual stipulations, the program was never piloted. Likewise, MD Anderson hardly followed the project management textbook, and its travails have as much to do with its own poor governance, the traditional killer of technology projects through the ages, as with its vendors. The CEO of MD Anderson, Dr Ronald DePinho, resigned yesterday, admitting a failure of management. His wife, Dr Lynda Chin, was the leader of the project.
An audit of the failed AI project by the University of Texas graphically showcases all the ways in which the dreams of artificial intelligence can turn into nightmares and multimillion-dollar black holes.
In 2012 MD Anderson, part of the University of Texas, approved a small investment in developing a proposed clinical advice system: the Oncology Expert Advisor (OEA). Dr Chin described the ultimate goal of the OEA as being to elevate the standard of cancer care worldwide. The "expert" in question, however, would be software rather than a human specialist.
The audit report outlines the background:
- In early 2011, the IBM Watson (Watson) artificial intelligence system received global attention by winning a Jeopardy! exhibition against the game show’s two highest-rated players. Dr. Chin told us her idea was to use Watson technology to improve cancer treatment, and she approached IBM with her idea. The first MD Anderson contract related to development of OEA using Watson technology was signed with IBM in June 2012 — specifically to develop a “pilot solution that will enable MD Anderson to analyse MD Anderson’s data to derive insights into patient outcomes”. The contract states that IBM had been working with health insurer WellPoint, Inc. since September 2011 “to develop and commercialise a Watson-based diagnosis and treatment decision support system for Oncology”.
- The contract also noted that one deliverable of the project would be to “help enable both MD Anderson and IBM to understand how data developed within this type of solution may be incorporated into the WellPoint Watson Oncology Solution”. The contract specified that version 1.0 of MD Anderson’s product would “focus on lower risk” myelodysplastic syndrome (MDS) leukemia patients.
Basically, MD Anderson wanted to build software that could “ingest data from a variety of sources and use IBM Watson artificial intelligence technology to offer care advice and match patients with clinical trials”.
The first stage got rolling after an initial payment of $US2.4 million. Four years later, the project had blown out to more than $US60 million. The research fund was overdrawn by nearly ten million dollars, and the Oncology Expert Advisor developed by IBM was little better than vapourware. It remains unusable.
The University of Texas audit uncovered systemic failures in MD Anderson’s contracting and procurement practices, its compliance with the University’s approval requirements, the vendors’ delivery against their contracts, and even the way in which the Center sourced the project’s funding.
Amongst serial failures in contracting and procurement, auditors discovered that “only one of the seven OEA-related service agreements was procured through a competitive process” and, of the six awarded without competition, two “were not formally justified and approved”.
MD Anderson staff, when questioned about the rather lax process for awarding multimillion-dollar contracts, said they recalled “understanding” that IBM would be engaged on the basis of the proprietary nature of Watson technology and its exclusive use by the Center. That understanding doesn’t seem to have been shared by their technology partner.
Dr Chin told auditors that because the OEA project was high risk and transformative, it wasn’t suitable for routine funding. And, being funded directly by donors, it wasn’t suitable for management under the University’s compulsory IT governance processes. These processes, the auditors reported, would have prevented the project’s management failure, and possibly even its overall failure. They did not agree with Chin that it was “not an IT project”.
The auditors’ job was made all the more difficult by the ad hoc and inconsistent way in which contracts and spending were processed. In spite of the tens of millions of dollars involved, obtaining all relevant contracts and amendments was “challenging”.
Much of the expenditure seemed to have been structured to avoid oversight, with many sums authorised just below levels at which external supervision would have been triggered.
IBM was not the only vendor involved. PricewaterhouseCoopers was retained to provide a business plan for exploiting the OEA, a task which inflated from an initial payment of less than two million dollars to a final reckoning of more than twenty million. The management of PwC’s contracts was found to be little better than that of IBM’s.
The project itself suffered from mission creep. IBM’s original focus on “lower risk” leukemia patients was repeatedly expanded “to include five additional types of leukemia, then in December 2014 to include lung cancer”. PwC’s agreements were also subject to significant changes in scope. Some of these amendments took the Oncology Expert Advisor into territory beyond the work approved by the University’s Board of Regents.
The payment system was no better organised.
The auditor reported that some “invoices were paid in full regardless of whether contracted services were delivered as agreed upon”, while others for completed work languished unpaid for significant periods of time.
From a sample of ten invoices, the audit was able to find evidence of correct review and approval for two. Email chains discussing another three invoices were found, but without any evidence that Center staff had followed the necessary and expected procedures to authorise expenditure. For another four invoices sampled, staff recalled having received verbal approval to pay, but could produce no documentation.
For the last sampled invoice, there was no conclusive evidence of review and approval prior to payment.
There was nothing unique about the Oncology Expert Advisor project that set it up for failure. It was always going to be difficult to make it work. But the University of Texas audit demonstrates that, while you can’t guarantee success by following established procedures, you can almost certainly guarantee failure by ignoring them.
We asked data science specialists about the lessons for other companies from the problems at MD Anderson.
Expectations are a big part of the problem, according to Sri Annaswamy, founder and director of Swamy and Associates, a Sydney-based independent advanced analytics and BPM research and advisory business.
“AI — or, more correctly, machine learning — technologies are currently the subject of a mega-hype by vendors such as IBM and consulting firms such as PwC. Hence,” he said, “boards need to ask searching and tough questions right from the start of an AI project.”
Specifically, he cautioned that unless commercial value in dollars and cents is established from each and every pilot, the board should not commit funds to scale these AI pilots into massive AI engagements.
He also made the point that 90 per cent of machine-learning algorithms and their code are now available as open source through various communities such as TensorFlow, scikit-learn, H2O and Weka. “A compelling reason must be presented for the likes of IBM Watson and PwC to be engaged using their proprietary models and approaches in the light of this.”
Annaswamy told Which-50 that “Institutionalisation of AI and machine learning is the most critical aspect of any AI engagement. If that does not happen, the project will fail.”
Walter Adamson, General Manager of Victoria-based KINSHIP digital, pointed out that the Watson MD Anderson project never reached the stage where end users had any trust or confidence in the results. In his view, the problem lay less in the technology than in the “wetware”: the human and organisational factors, including the setting of expectations, which were not calibrated realistically from the start.
“Even when AI and data science projects work technically, the biggest hurdle is in the last mile. That is, getting users to trust and accept the guidance from the system in how they do their day-to-day jobs. Where KPIs and performance bonuses are involved, people are naturally extremely reluctant to accept guidance from something that they do not trust,” he said.
“Addressing this last mile needs to be done at the beginning of such projects — not as a band-aid at the end — if greater success is to be achieved.”
We have reached out to IBM for a comment.
*Andrew Birmingham also contributed to this story