Do 87% of Data Science projects fail?

user5724987
7 min read · Nov 13, 2021


Photo by Samuel Bourke on Unsplash

In recent years, both machine learning engineering and “MLOps” have become very popular, especially MLOps (Machine Learning Operations), which has been promoted as a silver bullet for reducing technical debt in ML projects and making productization much easier.

I am pretty sure you have seen some of the diagrams and phrases that are repeated in almost every MLOps presentation, book, or blog post.

Probably the best example is the famous diagram from the paper “Hidden Technical Debt in Machine Learning Systems” by D. Sculley et al.

I mean, the paper is great and the diagram is pretty accurate, but I cannot stop myself from smiling when I see the same picture (usually just recolored to match the presentation style) over and over.

87% of data science…

But this is not the point of this post. There is one more thing used over and over in machine learning resources. Have you ever heard that only 1 out of 10 data science projects makes it into production?

No? Maybe the statement that 87% of data science projects never make it into production sounds familiar? I hear and see it a lot, mostly as a justification for MLOps, or rather for MLOps books, courses, and everything else that can be sold once it is justified.

But not only then. You can see this statement in a Forbes article, on the Stack Overflow blog, and all over the Internet in blog posts and conference videos. This quote, sometimes paraphrased, is a must-have if you are targeting the business side of MLOps.

But is it really true? Where did that number come from? I decided to check.

Step one: VentureBeat Article

Where does the statement come from? It seems that everybody cites the VentureBeat article entitled “Why do 87% of data science projects never make it into production?” So let’s find out why.

Figure 1. Venture Beat Article | Source: VentureBeat.com

The article was written in July 2019 and (to be fair, I have to point this out) is a sponsored article that references a panel from the VentureBeat Transform 2019 conference. It is basically nothing but a short commentary mixed with quotations from the panel “What the heck does it even mean to ‘Do AI’?”

But if this is a universal understanding, that AI empirically provides a competitive edge, why do only 13% of data science projects, or just one out of every 10, actually make it into production?

There are three ways to get started, and avoid becoming one of the 87%, Chapo said. Pick a small project to get started, he says — don’t try to boil the ocean, but choose a pain point to solve, where you can show demonstrable progress. Ensure you have the right team, cross-functionally, to solve this. And third, leverage third parties and folks like IBM and others to help accelerate your journey at the beginning.

Once again we see the bold statement that 87% of data science projects fail, or never make it into production. But where does this number come from? I didn’t find the answer in the article, so I decided to watch the panel itself (there was no link to the video; I had to find it on YouTube myself). The answer must be there.

Step two: Transform 2019 conference

Figure 2. Transform 2019 Panel (Source: YouTube)

Here I am, watching the recording of the panel from the Transform 2019 conference. I assume this is where it all comes from and that I will finally learn more about the magic number passed from one MLOps presentation to another.

By the way, I couldn’t help noticing that this video has only 353 views and 0 comments two years after it was uploaded. So I assume not many people were interested in figuring out why almost 9 out of 10 machine learning projects fail. That’s okay, I will find out for all of you.

The video is 26 minutes long, and I listened carefully for the moment one of the speakers mentions that 87% of data science projects fail (or that only 13% of projects succeed, or anything similar). I watched it three times just to be sure I didn’t miss anything, and I got it.

Around the 10-minute mark, you can hear:

I think CIO Dive Magazine says that only 13% of data science projects actually make it into production. 13%. I mean, that’s a staggering number…

Here it is! Said by Deborah Leff, Global Leader and Industry CTO for Data Science and AI at IBM. Unfortunately, it’s just another breadcrumb to follow, because apparently the Transform 2019 panel is not the source of the information I’m trying to confirm.

Let’s find that CIO Dive article then…

Step three: CIO Dive Magazine says that…

Figure 3. An article by James Roberts | Source: CIODive.com

In 2017, two years before the Transform 2019 conference, James Roberts (Chief Data Scientist at Quisitive at the time) wrote a guest article in CIO Dive Magazine. It’s called “4 reasons why most data science projects fail”, and I expected it to finally reveal why 87% of data science projects fail and how somebody measured that, i.e., where that magic number comes from.

The article is relatively short and well structured, so I read it from top to bottom a couple of times. Here is what I discovered:

Experts have called 2017 the year of data literacy and digital transformation. While data is a key component that drives true digital transformation, too often companies approach data and analytics projects the wrong way. In fact, a mere 13% of data and analytics projects reach completion, and of those that do, only 8% of company leadership report being completely satisfied with the outcome.

I already know that number (13%) very well. Deborah Leff was right: it was CIO Dive Magazine where she found that piece of information. But what is its source? Where is the explanation, or at least another breadcrumb?

Why do only 13% of “data and analytics projects” reach completion?

Unfortunately, we learn nothing about the source of that statement. Maybe it was simply made up for the CIO Dive article, or maybe the author forgot to cite yet another article that would finally explain how it was measured that 87% of DS projects fail.

While it’s entirely possible that 9 out of 10 ML projects fail, it is hardly possible to measure this reliably, or even to define “failure” or “making it into production”. First of all, what does it even mean for a machine learning model to be in production?

Is a single API endpoint served e.g. with FastAPI enough? Or do we need a whole CI/CD/CT pipeline plus monitoring to be set up? Moreover, some projects are simply not meant (planned) to be deployed into production — do we count those as failures too?

I don’t know and I’m a bit disappointed that I didn’t find anything.

So what does it mean?

Wrap up

In 2017, a Chief Data Scientist wrote an “Opinion”-labeled guest article in CIO Dive Magazine stating that “a mere 13% of data and analytics projects reach completion”. No source, no links to research papers, zero information about where that magic number comes from.

Then the article was brought up by Deborah Leff, IBM’s Global Leader and Industry CTO for Data Science and AI, during a Transform 2019 panel, where she said: “I think CIO Dive Magazine says that only 13% of data science projects actually make it into production”.

This was then quoted by VentureBeat in a sponsored article promoting the Transform 2019 panel. The article doesn’t even provide a link to the video recording. What happens next?

Dozens or hundreds of ML and MLOps resources cite the same article and the same claim, that 87% of data science projects never make it into production, and use it as a backdrop for selling their tools and products.

I am genuinely disappointed that we spread such an unconfirmed piece of information so easily, especially in a community so close to the R&D and academic world, one that relies heavily on research.

What does it mean for MLOps? Probably nothing; we still need it. It’s just shocking that we built this community, these tools, and these startups on a phrase that is nothing but a magic number from a single opinion piece.
