The business benefits of illuminating “dark data”
Across the data estates of almost every organisation there are ever-expanding stores of unstructured and unanalysed data, sometimes referred to as “dark data”. This data can take many forms: text messages, documents, PDFs, spreadsheets, comment fields in databases, chat transcripts, social media messages, photographs, scanned documents, conversations in collaboration tools, log files, IoT device data, survey responses and, of course, email.
Data was recently described by the Economist as “the new oil”. If we accept the premise that data is a valuable commodity, there are undoubted benefits in being able to tap into the insights held within your enterprise’s data.
However, data analytics and business intelligence initiatives have traditionally been applied solely to structured data, i.e. data held within databases or data warehouses that are assembled from a variety of structured sources to yield a complete picture. But is this enough?
How much of your “dark data” has value?
This is a difficult one to answer, but let’s look at the research. A few years ago, IDC predicted that by 2020, 37% of the digital universe would contain information of value if analysed. They also predicted that up to 80% of our business data would be unstructured (“dark”) data. So, if we take a reasonably conservative assumption that the average “dark data” item size is 1MB, then an organisation with 10TB of data will have circa 2.96 million unstructured data assets, i.e. (10TB × 80% × 37%) / 1MB, that contain information of potential relevance and value to the organisation.
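The estimate above can be reproduced with a few lines of Python. This is only a sketch of the article's own arithmetic: the function and parameter names are illustrative, and it assumes decimal units (1TB = 1,000,000MB) and that the 37% figure is applied to the unstructured share, as in the calculation above.

```python
# Illustrative sketch of the estimate above; names are hypothetical.
MB_PER_TB = 1_000_000  # assumption: decimal units, 1TB = 1,000,000MB


def valuable_dark_items(total_tb, unstructured_pct=80, valuable_pct=37, avg_item_mb=1):
    """Estimate how many unstructured items may hold information of value.

    Percentages are whole numbers so the arithmetic stays exact.
    """
    total_mb = total_tb * MB_PER_TB
    valuable_mb = total_mb * unstructured_pct * valuable_pct // (100 * 100)
    return valuable_mb // avg_item_mb


estimate = valuable_dark_items(total_tb=10)
print(f"{estimate:,} potentially valuable items")
```

Changing `avg_item_mb` shows how sensitive the count is to the assumed item size: halving the average size doubles the number of items to assess.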
Buried within these assets will be valuable information on pricing, customer behaviour, and competitors. In multinational companies, they may also contain potentially valuable yet untranslated data assets created for or generated in non-English-speaking markets.
“Crash for cash”
Let’s consider an example scenario and a very real problem. The average cost of an insurance whiplash claim in the UK is circa £1,000 to £2,500, but can be as high as £100,000 in serious cases. Insurer Aviva claim that 66% of the fraud they detect is whiplash-related, so-called “crash for cash”.
They are now turning down 1 in 8 claims which they suspect are fraudulent, and processed almost 17,000 personal injury claims in the period, up 1,000 on the previous period (Source: Moneywise).
Many of these fraudulent claims are likely to be inspired by previously successful claimants. The new claimant may well use the same language or phraseology to describe the accident and substantiate the credibility of their claim, or even deliberately crash the vehicle in a certain way to maximise the chances of the claim being accepted.
It is not possible for a claim handler to read through thousands of items of correspondence relating to historical claims, but a machine can. It can look for words and phrases like “whiplash” or “rear shunt”, or look up whether this claimant has claimed previously. This helps to classify the correspondence and highlight known fraudulent language or perpetrators, better informing the claim handler or the fraud investigation team.
Imagine if, as the correspondence is received, it had already been assessed by a machine to determine its fraud potential.
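As a rough illustration of the kind of machine triage described above, the Python sketch below flags correspondence that contains watch-list phrases or comes from a previous claimant. Everything in it is hypothetical: the phrase list uses only the two examples mentioned above, the claimant names are invented, and a real system would rely on far richer signals than simple phrase matching. A match is a prompt for human review, not evidence of fraud.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical watch list, limited to the two phrases mentioned in the text.
WATCH_PHRASES = ("whiplash", "rear shunt")


@dataclass
class Assessment:
    matched_phrases: list[str]
    repeat_claimant: bool

    @property
    def needs_review(self) -> bool:
        # Flag for a human handler if any signal is present.
        return bool(self.matched_phrases) or self.repeat_claimant


def assess(text: str, claimant: str, previous_claimants: set[str]) -> Assessment:
    """Check one item of correspondence against the watch list and claim history."""
    lowered = text.lower()
    matched = [
        phrase
        for phrase in WATCH_PHRASES
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
    return Assessment(matched, claimant in previous_claimants)


# Invented example data for illustration only.
history = {"A. Sample"}
flagged = assess("I suffered whiplash after a rear shunt.", "A. Sample", history)
clear = assess("A stone chipped my windscreen.", "B. Other", history)
print(flagged.needs_review, clear.needs_review)
```

Running every incoming letter or email through a check like this would let the claim handler see the matched phrases and claim history before reading the correspondence in full.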
What about the data that’s not relevant and has no value?
Let’s go back to our imaginary company with 10TB of data and almost 3 million data assets that are potentially relevant and valuable to the organisation. The converse is also true: the company has just over 7 million data assets of potentially no relevance or value. The storage and backup costs alone would be close to £2,000 per month (Source: Storagecraft.com)! If you were able to identify the dark data of no relevance or value and remove it, you would make an immediate saving of £24,000 p.a.! However, IDC have also predicted a ten-fold increase in data by 2025, so if you do nothing now, you are committing to a future expense of £240,000 p.a. in five years’ time to store and manage data of potentially no relevance or value to your organisation. Not a good business decision!
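The cost projection above is simple enough to check in Python. The sketch below uses only the figures quoted in the text; the names are illustrative, and it assumes, as the text implicitly does, that storage cost scales in direct proportion to data volume.

```python
# Illustrative sketch of the cost projection above; names are hypothetical.
MONTHLY_COST_GBP = 2_000  # storage and backup cost quoted in the text
GROWTH_FACTOR = 10        # ten-fold data growth predicted by IDC


def annual_cost(monthly_cost_gbp):
    """Convert a monthly cost into a yearly one."""
    return monthly_cost_gbp * 12


def projected_annual_cost(monthly_cost_gbp, growth_factor):
    """Assumes cost grows in direct proportion to data volume."""
    return annual_cost(monthly_cost_gbp) * growth_factor


saving_now = annual_cost(MONTHLY_COST_GBP)
future_cost = projected_annual_cost(MONTHLY_COST_GBP, GROWTH_FACTOR)
print(f"Saving today: £{saving_now:,} p.a.")
print(f"Cost if nothing changes: £{future_cost:,} p.a.")
```

If storage prices fall or data grows more slowly than predicted, the future figure would be lower, so the projection is best read as an order-of-magnitude guide.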
Given the potential afforded by illuminating “dark data”, it is perhaps worth taking a moment to reflect on the relevance and value of your own company’s “dark data”, and on what you are going to do today to mitigate long-term cost, drive efficiencies, increase your competitiveness and deliver tangible improvements in services to your customers.