What is Big Data?

You don’t have to look very far before you hear mention of ‘Big Data’ these days. But what exactly is it, and why does it matter?

You’d be forgiven for thinking that the technology industry is obsessed with Big Data. The hype around the concept has gathered so much momentum over the last few years that the term has entered the collective consciousness, and the phrase is now bandied around even in the mainstream press. Most people, though, are unclear about what Big Data really means, and especially about why it matters. This article aims to bust some of the myths around the term and to explain why we all need to understand its impact.

The most widely cited definition comes from Doug Laney, then an analyst at META Group, who described the growing complexity of data along three axes: Volume, Variety and Velocity. This “Three V’s” definition is often quoted as the meaning of Big Data, and some authors have since expanded it to include two more: Variability and Veracity.

One problem with defining Big Data by its characteristics is that it leaves the reader thinking there is some threshold on each of these axes beyond which data can justifiably be called “Big”. I often hear questions such as “I have a 2TB database: do I need a Big Data solution?” or “I need to query my data daily rather than monthly; will I need a Big Data solution to do this?” Unfortunately, there is simply no threshold beyond which data becomes “Big”. Neither speed nor size is a qualifying criterion; it’s what you do with the data that counts.

The other question I’m now very used to hearing is, “We need help with our Big Data strategy.” “Really?” I say. “What are you trying to do?” The answer is always telling. If I hear, “We have all these data sources across the organisation, and we need to try to find the value in them,” then I know I’m in for a long conversation! Data may well be the new oil, but simply mining and refining it for value is unlikely to yield a fast, positive return on investment. We need to start with a business problem and then look at how we can build a technology strategy to solve it.

Past Experiences

A few years ago, I was asked to help a Financial Services business find opportunities to dramatically reduce cost in their organisation.

They had invested millions of dollars in IT solutions to automate various business processes, producing significant efficiencies. In the business units that hadn’t been automated, the manual elements of the workflow had invariably been relocated to offshore centres in South America, Eastern Europe or Asia, where labour costs were lower. In some cases, functions had been outsourced to specialist firms that could offer economies of scale on a per-unit cost basis. Management consultants had spent months of person-time building flow-charts of the systems and processes in order to ‘re-engineer’ a new Target Operating Model that could shave a few fractions of a percent off the total salary cost.

My client was still dissatisfied. She knew there were inefficiencies in the system, even if they couldn’t easily be spotted in the reports. After some head-scratching, and far too many hours sketching on a whiteboard, we decided to test whether the flow of data through the organisation matched the process models that had been created. We devised a system that could track how data moved from individual to individual, team to team and system to system. The resulting picture looked more like something you’d expect from the Large Hadron Collider, but it was in fact just the output of a few days of monitoring activity at this bank.

Immediately we could see that people were working in ways the Visio diagrams said they wouldn’t. We quickly identified the areas to target for efficiency improvements and were able to deliver significant savings for the client in very short order.
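The article doesn’t describe how the data-flow model was built. As a minimal sketch, assuming the observed handoffs have already been extracted from logs as (sender, receiver) pairs, comparing them against the process model reduces to counting the edges the model does not contain. The team names, `MODELED_HANDOFFS` and `unexpected_handoffs` below are hypothetical.

```python
from collections import Counter

# Hypothetical process model: the handoffs the flow-charts say should happen.
MODELED_HANDOFFS = {
    ("intake_team", "review_team"),
    ("review_team", "approval_team"),
    ("approval_team", "settlement_system"),
}

def unexpected_handoffs(observed_events, modeled=MODELED_HANDOFFS):
    """Count observed (sender, receiver) handoffs that the process model does not contain.

    observed_events: iterable of (sender, receiver) pairs, assumed to have been
    extracted already from email, ticketing or system logs.
    """
    counts = Counter(observed_events)
    # Keep only the edges that are missing from the modelled process.
    return Counter({edge: n for edge, n in counts.items() if edge not in modeled})

# Illustrative events: work bouncing back from review to intake is not in the model.
events = [
    ("intake_team", "review_team"),
    ("review_team", "intake_team"),
    ("review_team", "intake_team"),
    ("review_team", "approval_team"),
]
print(unexpected_handoffs(events).most_common())  # [(('review_team', 'intake_team'), 2)]
```

Ranking the off-model edges by frequency is one simple way to decide where to look first; a real analysis would also weigh the volume and cost of work flowing along each edge.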

Another client had a problem with customer service.

Surveys had indicated that customers were extremely dissatisfied with their in-store experience, yet the organisation had what it considered a very high-quality training programme for all its customer-facing staff, and could not understand why there was such a problem. At first we couldn’t spot any pattern explaining why so many customers were so dissatisfied.

For example, on a given day two surveyed customers visiting the same store, looking at the same products and dealing with the same representative might give wildly different survey results. We decided to expand the data set to include online orders and telephone support. “But those users are extremely happy!” protested the client. We brought the data together and very quickly found that a large proportion of the dissatisfied customers had previously tried to research their purchase online, or had called the telephone support department. Their dissatisfaction in store had nothing to do with the quality of the store staff; it was simply a symptom of frustration, because by the time they reached the store they were at the end of their tether with the brand. Until that point, the store staff had been unaware of these customers’ previous interactions across the other channels.
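The article doesn’t say how the channels were joined. A minimal sketch, assuming survey responses and online or phone contacts share a common customer identifier: the field names, the 1 to 10 score scale, the `threshold` cut-off and the function name are all hypothetical.

```python
from datetime import date

def prior_contact_share(surveys, contacts, threshold=3, channels=("online", "phone")):
    """Share of dissatisfied in-store respondents with an earlier online or phone contact.

    surveys:  dicts with 'customer_id', 'visit_date' (date) and 'score' (assumed 1 to 10 scale)
    contacts: dicts with 'customer_id', 'channel' and 'contact_date' (date)
    threshold: scores at or below this count as dissatisfied (an assumed cut-off)
    """
    # Index contact dates by customer so the join is a dictionary lookup.
    contact_dates = {}
    for c in contacts:
        if c["channel"] in channels:
            contact_dates.setdefault(c["customer_id"], []).append(c["contact_date"])

    dissatisfied = [s for s in surveys if s["score"] <= threshold]
    if not dissatisfied:
        return 0.0
    # A respondent counts if any contact happened before the store visit.
    with_prior = [
        s for s in dissatisfied
        if any(d < s["visit_date"] for d in contact_dates.get(s["customer_id"], []))
    ]
    return len(with_prior) / len(dissatisfied)

# Illustrative data only.
surveys = [
    {"customer_id": 1, "visit_date": date(2020, 3, 10), "score": 2},
    {"customer_id": 2, "visit_date": date(2020, 3, 10), "score": 9},
    {"customer_id": 3, "visit_date": date(2020, 3, 11), "score": 1},
]
contacts = [
    {"customer_id": 1, "channel": "phone", "contact_date": date(2020, 3, 8)},
    {"customer_id": 2, "channel": "online", "contact_date": date(2020, 3, 1)},
]
print(prior_contact_share(surveys, contacts))  # 0.5
```

Comparing this share with the same figure for satisfied respondents would show whether prior contact is actually over-represented among unhappy customers.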

A third example was a company that had a very strong suspicion that their expenses policy was being abused.

They had updated both their travel and entertaining policies, but their expenses still benchmarked far higher than they should have. We looked at the client entertaining expenses first: a couple of spikes, but nothing that indicated fraud. The company had a policy under which staff working late could order food into the office at the company’s expense after 8pm, and could take taxis home on the firm if they were still in the office after 10pm.

We decided to smash together these two datasets, which had previously been unconnected. Imagine how many instances there were of pizza delivery orders placed at five past ten in the evening, followed by a taxi home just 20 minutes later! Ordering take-away food and then travelling home to eat it was endemic in the firm, yet nobody had known until that point. What about the client entertaining, you might be wondering? We had a lucky break here: we pulled Human Resources data and matched staff birthdays against the outlying expenses. Incredibly, 8 of the top 10 entertainment expenses in the previous year had fallen within 5 days of the claimant’s birthday. What are the chances of that?
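Both checks amount to simple joins: a time-window match between food orders and taxi rides, and a date-proximity test between claims and birthdays. The sketch below assumes every record carries a shared employee identifier; the 30-minute window, the function names and the record shapes are hypothetical, and only the 5-day birthday tolerance comes from the text.

```python
from datetime import date, datetime, timedelta

def food_then_taxi(food_orders, taxi_rides, window=timedelta(minutes=30)):
    """Return (employee, food_time, taxi_time) for taxis booked shortly after a food order.

    food_orders, taxi_rides: lists of (employee_id, datetime) pairs.
    window: how soon after the food order a taxi counts as suspicious (assumed value).
    """
    # Group taxi times by employee so each order is only compared with that person's rides.
    taxis_by_employee = {}
    for emp, t in taxi_rides:
        taxis_by_employee.setdefault(emp, []).append(t)

    flagged = []
    for emp, food_time in food_orders:
        for taxi_time in taxis_by_employee.get(emp, []):
            if timedelta(0) <= taxi_time - food_time <= window:
                flagged.append((emp, food_time, taxi_time))
    return flagged

def days_from_birthday(claim_date, birthday):
    """Days between claim_date and the nearest occurrence of the birthday (handles year-end wrap)."""
    distances = []
    for year in (claim_date.year - 1, claim_date.year, claim_date.year + 1):
        try:
            anniversary = birthday.replace(year=year)
        except ValueError:  # 29 February in a non-leap year: fall back to 28 February
            anniversary = date(year, 2, 28)
        distances.append(abs((claim_date - anniversary).days))
    return min(distances)

def birthday_claims(claims, birthdays, max_days=5):
    """claims: (employee_id, claim_date, amount) tuples; birthdays: employee_id -> date of birth."""
    return [
        c for c in claims
        if c[0] in birthdays and days_from_birthday(c[1], birthdays[c[0]]) <= max_days
    ]

# Illustrative data only.
food = [("e1", datetime(2019, 5, 2, 22, 5)), ("e2", datetime(2019, 5, 2, 20, 30))]
taxis = [("e1", datetime(2019, 5, 2, 22, 25)), ("e2", datetime(2019, 5, 2, 23, 45))]
print(food_then_taxi(food, taxis))  # flags only e1: 22:05 order, 22:25 taxi

claims = [("e1", date(2019, 1, 2), 900.0), ("e2", date(2019, 6, 1), 700.0)]
birthdays = {"e1": date(1985, 12, 30), "e2": date(1990, 3, 14)}
print(birthday_claims(claims, birthdays))  # only e1's claim, 3 days after a 30 December birthday
```

To mirror the analysis described, the claims would first be ranked by amount and `birthday_claims` applied to the top ten.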

In summary, if you’re trying to figure out what Big Data is, you really need to stop. Stop and think, that is. Big Data can’t be defined by the size or speed of the data, but rather by the opportunity it can create for you. It’s a state of mind more than it is a state-of-the-art technology. That’s why, when clients ask me what Big Data is, I tell them it just means Big Opportunity. The question is: what opportunity do they want to create, and how can we use data to find and create it?