Dark data – the hidden gold
The other day, I stumbled across the phrase “dark data”. I may or may not have heard it before, but this time it sparked my imagination. The metaphor is simple and, if you’re familiar with the term it echoes – “dark matter” – you immediately get what it means.
For those of you who don’t know about dark matter, it is the hypothesis that roughly 85% of all matter in our universe is “dark”, in the sense that we don’t (yet) have the understanding – or technology – to detect it directly. We can only infer it indirectly.
Similarly, dark data is data that sloshes around in our databases, file systems and clouds without being put to any use because we either don’t have the means to mine it, or we don’t even know it exists.
When I researched this a bit more, I found an interesting report from the software company Splunk that puts some flesh on the bones of the concept, “The state of dark data report”. The report is based on a global survey of more than 1,300 business and IT decision-makers across seven countries in North America, Europe, Asia and Australia.
For those of us who have been interested in bringing out the potential value in data for a long time, it confirms many of the things we have suspected.
First, some of the main findings in numbers. According to the respondents:
- 60 % said that at least half of all their data was dark
- 56 % agreed that “data driven” is just a slogan in their organization
- 81 % agreed that the future success of their organization means turning “data driven” from a slogan into a reality
- 85 % believed that data skills will continue to become more important for workers in all roles within their organization, not just IT (my emphasis)
At the same time, according to the respondents:
- more than half said they feel too old to learn new data skills
- 69 % were content to keep doing what they’re doing, even if it means they don’t get promoted again
- only 56 % rated their organization as “extremely good” or “very good” at asking the right questions of data, even though 75 % rated that skill as “extremely important” or “very important”
These numbers all speak to a common theme which many of us have been championing for many years. To quote the report, “(…) respondents said that to make better use of data, they’ll need a holistic approach to overcoming technical and organizational obstacles. Data strategy, and the pursuit of dark data, cannot be a “project” — it has to be a key organizational priority, an essential competency driven by in-house leaders and in-house talent with an eye to the end-to-end management of all data.”
A gap between importance of data skills and the willingness to acquire them
Reading through the Splunk report, I get the feeling that one of the main obstacles to progress is a lack of deep understanding of the value of data among business leaders. One can assume (supported by the finding above that more than half of the respondents feel too old to learn new data skills) that this is a generational issue.
The massive increase in data has occurred mainly in the last 15 years, as is evident in this graph from Statista*.
People like me, who were already 20 years into their careers when this exponential development took off around 2005, have never really got our hands dirty managing (let alone leveraging) these massive volumes of data. At the same time, our generation occupies the top positions in many organizations, which is probably why we are perceived to be (and in many cases also are) real impediments to progress.
Going back to the Splunk report, it points to a gap between the recognition that data skills are essential and the willingness to acquire them. Given the profile of respondents (from managers to C-level), this would strengthen the generation theory. My personal take on this is that even if I can’t keep up with the science and technology of big data, I can understand its value and recognize what to look for when hiring people. I can be at peace with letting the next generation take off propelled by the dark data booster, leaving me and my grey peers behind.
AI is the future for dark data
The report also highlights two key capabilities around dark data. These are what most respondents think will be essential for their organizations to make better use of the potential value locked away in their dark data.
- THE FIRST is Artificial Intelligence (although the report stresses that the term is ill-defined). 73 % of the respondents think AI will be an essential tool for tackling the massive amounts of data and deriving value from them. One of the most striking findings in the report is that while more than 60 % of the respondents think AI will be used in their organization for “essential use cases” in 5 years, only some 12 % say they already use AI in this way.
- THE SECOND capability is having “employees combining technical data skills with business acumen”. Most respondents acknowledged that such employees are exceedingly hard to find. To realize the potential of AI to mine dark data, you presumably need to recruit people like this, and fast, if you want to be on top of things within 5 years.
Coming to the end of this post, I realize I haven’t used the word “digital” once. I think that is because, at least to me, “digital” has increasingly become a blanket buzzword. Whatever we ultimately mean when we flaunt the digital blanket, beneath it the success or failure of any digital endeavor hinges on data. Either we manage it well or we don’t. Digitalization without high-quality and comprehensive data is just a puff of hot air. To really take off, we need the fuel locked away in our dark data.
A new generation of data-savvy people is emerging, although many respondents lamented the slow rate at which universities turn out data scientists. Our job is to recognize them, hire them and provide funds and organizational structures to help them flourish. This is going to be a challenge for us all, but the rewards will be worth it.
* One zettabyte equals 1 billion terabytes / Source: IDC, Kleiner Perkins, Statista Digital Market Outlook