Dark data - or unstructured words and numbers - is the bulk of information sitting on company servers and in the cloud.
You might have come across the term “dark data”, which, according to a recent Deloitte University Press article, is the mass of unstructured, unanalysed text and numbers sitting untouched in your company in sources such as text messages, PDF documents, emails, video and audio files.
I have been approached by quite a few businesses looking for advice on how to analyse this so-called “dark data” and add value to their business. So I wanted to share some advice on how to explore the hidden world beneath your digital feet, whilst helping you separate the practicalities from the hype of big data.
The value of “dark data” from the “deep web” (not to be confused with the “dark web”, which is something completely different) lies in integrating and filtering it to bring insights for better customer segmentation, personalisation and even new product development.
This unstructured data makes up over 90% of the data companies own and is doubling every year. So why isn’t your company taking advantage of it? Usually, the limitation is analyst time and the fact that the data is simply not in a database that allows easy access.
As the Deloitte authors put it: “Nagging quality issues with master data, lack of user sophistication, and the inability to bring together data from across enterprise systems often colluded to produce insights that were at best limited in scope and, at worst, misleading.” (Kambies et al.)
I agree, but I would focus on the data quality and inability to stitch together different numeric sources, as it’s the easiest area for smaller companies to fix – and it is less likely to bring ethical and legal dilemmas than analysing customer images or text messages.
Data Quality
The suspicion that a data source is inaccurate or patchy can immediately consign it to the untrusted and unanalysed “dark bin”. This often happens with retailers’ Google Analytics, where the volume of sales or registrations does not match internal systems.
For example, we were contacted by the web team at one of the UK’s largest pizza delivery companies, who felt their Google Analytics setup was incorrect. They wanted to prove that the launch of their improved website would boost ecommerce conversion, but they needed numbers that management would believe.
After fixing a number of Google Analytics setup problems, the value of sales captured in Google Analytics matched the sales recorded on their accounting systems to within 3% and, as a result, they could present their new site’s stats with confidence.
The problem was not that Google Analytics was inaccessible, but that it was unreliable. After the transformation, the web behaviour could be integrated with the customer behaviour (such as repeat purchasing) to give reliable insights.
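The reconciliation check described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 3% default tolerance and the sales figures are my own assumptions, not the pizza company's actual setup.

```python
def sales_discrepancy(analytics_total: float, accounting_total: float) -> float:
    """Return the gap between the two totals as a fraction of the accounting total."""
    if accounting_total == 0:
        raise ValueError("accounting_total must be non-zero")
    return abs(analytics_total - accounting_total) / accounting_total


def is_reconciled(analytics_total: float, accounting_total: float,
                  tolerance: float = 0.03) -> bool:
    """True when the analytics figure is within `tolerance` of the accounting figure."""
    return sales_discrepancy(analytics_total, accounting_total) <= tolerance


# Hypothetical monthly totals, for illustration only.
print(is_reconciled(98_000, 100_000))  # 2% gap, inside the 3% tolerance
print(is_reconciled(90_000, 100_000))  # 10% gap, outside the tolerance
```

Running a check like this every week or month makes it obvious when tracking has drifted away from the numbers management already trusts.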
Data Integration
The other situation in which data is hidden is when it cannot be linked together. Ironically, the proliferation of cloud-based business tools (CRM, accounting, etc.) is actually making the problem harder – since one organisation may have multiple sources of truth about the customer relationship.
In the ecommerce world, there is a big opportunity to join product data and customer data with web behaviour data.
British Red Cross Training, a trading arm of the charity, sells first aid training courses around the UK. Unusually, their ‘products’ or courses have a location and a time. They wanted to understand which courses were in demand at different locations, and where there was not enough availability of time slots.
In Google Analytics they could see the volume of searches being made. But by pulling the times and locations of the courses from their systems and sending them as custom dimensions, the web team could list what users were searching for when they got zero results. As a result, British Red Cross could work with training venues to increase availability at the in-demand locations.
An example of stitching together web and customer data is finding out who has transacted previously. With ecommerce sites, the user is usually anonymous (does not log in) until the final payment step. However, when the customer pays, an account is created (which stores their purchase history), so it is possible to match this ‘has purchased previously’ flag with the web data.
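To make the zero-results idea concrete, here is a minimal Python sketch. It assumes the search events have already been exported as a list of records with hypothetical `location` and `results` fields. The field names and sample data are invented for illustration and are not the British Red Cross setup.

```python
from collections import Counter


def zero_result_demand(searches):
    """Count searches that returned no courses, grouped by location, busiest first."""
    unmet = Counter(s["location"] for s in searches if s["results"] == 0)
    return unmet.most_common()


# Hypothetical exported search events.
searches = [
    {"location": "Leeds", "results": 0},
    {"location": "Bristol", "results": 4},
    {"location": "Leeds", "results": 0},
    {"location": "Bristol", "results": 0},
]

print(zero_result_demand(searches))
```

The locations at the top of the list are where users are asking for something that is not on offer.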
The problem faced by online retailers is that they spend marketing dollars, especially via affiliate and voucher sites, acquiring customers who would have purchased anyway (because they already know the brand).
For example, Made.com, an online furniture retailer, found that being able to segment marketing campaigns by which ones brought NEW customers gave them confidence to invest in channels which initially looked marginal. Something as simple as splitting web behaviour by people who had purchased previously versus those who hadn’t proved very useful.
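A rough Python sketch of that new-versus-returning split follows. It assumes you can export orders as (campaign, customer ID) pairs and that you hold a set of customer IDs who had purchased before the period in question. All names and data here are hypothetical, not Made.com's.

```python
from collections import defaultdict


def new_customer_share(orders, previous_customers):
    """Return, per campaign, the fraction of orders placed by first-time customers.

    orders: iterable of (campaign, customer_id) pairs.
    previous_customers: set of customer IDs with an earlier purchase on record.
    """
    counts = defaultdict(lambda: {"new": 0, "returning": 0})
    for campaign, customer_id in orders:
        kind = "returning" if customer_id in previous_customers else "new"
        counts[campaign][kind] += 1
    return {
        campaign: c["new"] / (c["new"] + c["returning"])
        for campaign, c in counts.items()
    }


# Hypothetical data: voucher-site orders all come from existing customers.
orders = [("voucher", "a"), ("voucher", "b"), ("display", "c"), ("display", "d")]
previous_customers = {"a", "b", "c"}

print(new_customer_share(orders, previous_customers))
```

A channel with a low share is mostly paying to reach people who already know the brand. A channel with a high share may deserve more budget even if its headline return looks marginal.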
Online with offline data
The current frontier of dark data for retail is in combining online with store data. Many online retailers are starting to open physical locations (e.g. Made.com in Liverpool) and every offline retailer is pushing into ecommerce. So how do you combine the offline and online data to create a better customer experience?
One example is US supermarket chain Kroger Co., which launched a pilot with 14 stores in 2016 to embed a network of sensors into store shelves that can interact with the Kroger app and a digital shopping list on a customer’s phone. As the customer walks down each aisle, the system analyses the customer’s purchase history and can display promoted products on 4-inch displays mounted along the aisles.
Getting started yourself
So how can you put some of these ideas into action in your company?
1. Start with an audit
You need to understand what is being captured and counted accurately before you dream up ways to use it. Fixing the little problems may be the quickest way to surface that data. Companies such as Littledata offer a free audit for your Google Analytics setup.
2. Brainstorm your data assets
What potential sources of dark data do you have? Which cloud tools or databases are being used to store text or events, but never analysed? Or which are infrequently analysed, because they are in a hard-to-access format?
3. List out the questions you’d like to answer
What do you really need to know about your customers or products? Or where do you see the biggest opportunities for cost saving or extra revenue? Start with the question you need to answer, and then work back to which data source could provide some insight if integrated.
Data analysis is not like panning for gold – you don’t just sift vast quantities of dark sand until a nugget pops out. You need to know where to dig first!