Why data science needs predictive analytics but predictive analytics doesn’t need data science
We are firmly in the trough of disillusionment over data science. I’m seeing a trend among my clients, from startups to the Fortune 100, that’s driving that disillusionment: the results of data science often fail to meet expectations. The fall of Tesco and the pending sale of its analytics business is one of a number of telling case studies.
What follows a business failure is the post mortem — a deep dive into what went wrong and how it’s going to get fixed. As the fog of data science fades, an uncomfortable truth is settling in. I was working with a mid-sized retail client to turn around a data science lab that wasn’t producing what the rest of the business expected. After a painful post mortem the CMO pulled me aside. He stared at me for an uncomfortable length of time and then decided to ask his question: “Why doesn’t any of this feel especially actionable? Why do I feel like I’m overpaying for what I’m getting?”
Moving from descriptive to predictive
He nailed it. The long and the short of my response was this. Businesses start data science initiatives with a three-slide presentation:
- Gather data;
- Get insights;
- Growth and profit.
Businesses trust data science labs, outsourced data science teams and data science applications to provide the insights on slide 2. In the short term, that works out well.
Data science is incredibly good at descriptive analytics: the data visualisations that describe the current state of everything from the business to competitors to customers and more. Data science isn’t as good at extrapolating what will happen next, because the majority of its tools don’t work well for mid-range and long-range predictive models.
Said another way, data science tools are great at telling a business what it should do right now, and really bad at telling leadership what it should plan to do after that. After a while, descriptive analytics begins to feel like a visit from Captain Obvious. Those aren’t the insights the business was looking for on slide 2. Through these post mortems, data science is revealed as a one-trick pony, with businesses expecting a second act that the methodology doesn’t deliver.
I was fortunate to spend some time on the inside of business strategy with some extremely smart strategists before starting my current business. I learned the focus of business strategy is firmly on the future. What’s happening right now is great context, but what a business strategist really wants is a picture of what’s coming to build the right three- to five-year plan. That’s the second act, and predictive analytics delivers where data science stops. To realise the promise of actionable insights on slide 2, business needs predictive analytics.
Differentiating predictive analytics from data science
Data science is excellent at connecting two or more data endpoints. This reveals the relationships between these data points, which opens up a rich set of real-time insights. When X changes, those insights allow the business to understand what’s happening to Y. It’s tempting to say that data science predicts what’s happening to Y based on what’s happening to X, but that’s not really what’s going on. It’s more accurate to say that data science describes what’s happening to Y based on what’s happening to X.
Why am I parsing terms so closely? Let’s use American football as an example. If the quarterback completes a pass to a receiver who’s faster than anyone on the opposing team, with no defenders around him, he’ll score a touchdown. We can make that statement with a high level of certainty. The black swans — like the receiver celebrating too soon and dropping the ball or injuring himself during the run — make our certainty less than 100 per cent, but we’re all pretty comfortable making that statement.
However, we haven’t predicted a touchdown. We’ve defined a set of circumstances in which a touchdown is very likely to be scored, and we can only apply that definition while the touchdown is being scored. Let’s say the TV signal gets lost in the instant that the fast receiver, all alone near the end zone, catches a pass. Using the relationship between known data points, we can describe what’s happening right now even though we aren’t seeing it ourselves. That’s how most data comes to us in business, as an incomplete picture, which is why data science proves itself so useful in the short term.
Let’s wind the clock backwards a bit to the end of the previous play. The coaches on both sides are making their decisions on play calls. What predictive analytics is good at is connecting two or more event endpoints. Events are little data ecosystems in and of themselves. An event is all the data that describes a specific point in time.
In our example we have all the information each coach is using as well as a model of their decision-making process. Using the event, we can model their play calls. That leads to the next set of events. This is everything from the quarterback or defensive captain making changes at the line to individual player behaviours. We run this series of nested models forward to predict the outcome of that play, which is the second event endpoint — or, in our case, the touchdown.
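To make the idea of running nested models forward concrete, here is a minimal Python sketch. It is an illustration only, not a real football model: the three stages (play call, change at the line, outcome) follow the description above, but every probability and name in it is a made-up assumption.

```python
import random

# Hypothetical numbers for illustration only, not real football data.
PASS_CALL_PROBABILITY = 0.6               # event 1: coach calls a pass (else a run)
AUDIBLE = {"pass": 0.2, "run": 0.3}       # event 2: quarterback switches to the other play
TOUCHDOWN = {"pass": 0.10, "run": 0.04}   # event 3: touchdown given the play actually run


def simulate_play(rng: random.Random) -> bool:
    """Run one play forward through the three nested events."""
    call = "pass" if rng.random() < PASS_CALL_PROBABILITY else "run"
    # The quarterback may change the call at the line.
    if rng.random() < AUDIBLE[call]:
        call = "run" if call == "pass" else "pass"
    return rng.random() < TOUCHDOWN[call]


def estimate_touchdown_probability(trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the chance of reaching the second event endpoint, the touchdown."""
    rng = random.Random(seed)
    touchdowns = sum(simulate_play(rng) for _ in range(trials))
    return touchdowns / trials


if __name__ == "__main__":
    print(f"Estimated touchdown probability: {estimate_touchdown_probability():.3f}")
```

With these placeholder numbers the exact answer works out to 0.076, so the simulated estimate should land close to that. The point of the sketch is the shape of the calculation: each event feeds the next, and the uncertain branches are carried forward rather than dropped.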
The complexity difference between a data science model and a predictive model is significant. The tendency to extrapolate a data science model into a predictive model is a huge pitfall. Stringing data science models together to form a predictive model fails very quickly because of the way a data science model handles uncertainty — it essentially ignores it. Although data science-based analytics gives a degree of certainty like 85 per cent, it doesn’t describe what happens the other 15 per cent of the time. Predictive analytics does, which gives rise to that increased complexity.
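A rough way to see why stringing such models together breaks down is to multiply the certainties. The sketch below assumes each step holds independently with the same 85 per cent certainty; that independence is a simplifying assumption made here for illustration, and the function name is hypothetical.

```python
def chained_certainty(per_step: float, steps: int) -> float:
    """Chance that every step in a chain holds, assuming the steps are independent."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    for steps in (1, 3, 5):
        print(steps, round(chained_certainty(0.85, steps), 3))
```

Under that assumption the combined certainty falls to roughly 61 per cent after three steps and 44 per cent after five, which is why the unexplained 15 per cent cannot simply be ignored at each link.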
That’s where the event endpoint versus data endpoint differentiation becomes important. Predictive models look at a more complete picture and, as a result, can provide a more accurate description of not only what’s happening now but what will happen a number of events into the future. A predictive analytics approach includes methodologies to obtain the missing bits of information through very large datasets or experimentation.
Strategy: the business case for predictive analytics
The evolution of analytics capabilities must move quickly from descriptive to predictive to keep pace with business needs. The driver of that process is business strategy. The decisions that surround the strategy planning process are filled with uncertainty. The goal of predictive analytics is to remove as much uncertainty as possible from the process so strategists can make better decisions based on more complete information.
When I show clients the fruit of predictive models, it’s like giving them a bright flashlight while they’re walking through a dark forest. That’s the key business need justifying the effort behind predictive analytics. Why does a child run into the street without looking? They can’t see the potential consequences of that action. Parents have that foresight and can make better decisions resulting in better outcomes. In this case that leads to keeping their child from being hit by a car. In a business case, that can lead to executives preventing the business from being hit by a disruption like a new competitor.
Imagine the advantage of being able to see one step farther than competitors. That’s the shift in capabilities that predictive analytics brings to business strategy, and it’s one that data science methodologies can’t deliver. The solution for business’s disillusionment with data science is predictive analytics. Let’s move on to act two.