The crucial thing about “big data” is the data. “Big” is relative, and while size often matters, real disruption can come from data of any size.
This is not a new idea; it is several hundred years old. The key advance of the scientific revolution (and the associated industrial revolution) was the realisation that in order to understand something you had to measure it, that is, gather the data.
The modern hoopla about “big data” is simply the scientific method applied to a wider range of problems. Doing this cheaply enough is the challenge.
While the idea of collecting the data is the most fundamental, it is not sufficient. Analysts need to make sense of the data. The field of statistics was developed well over a century ago to help do so, originally for kings so they could know how much tax they could raise (the word “statistics” shares etymology with “state”).
Statistical thinking involves computing functions of data. Until recently the ability to do these computations was a major bottleneck. The reduction, by many orders of magnitude, in the cost per byte stored or computation done that we have seen in the last couple of decades (driven by technological advances in chip fabrication and disk drive manufacture) has removed the old bottlenecks. It is this reduction in cost that has enabled the “big data revolution”.
Many businesses and organisations are now gathering and keeping data on a much finer-grained scale than before. Rather than tabulating aggregate sales figures, a large retailer can now store every single purchase made by every single customer. With this they can understand patterns of consumer behaviour well enough to tailor their offerings to individual customers.
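The difference between aggregate figures and purchase-level records can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer names, products, prices and the two helper functions are all hypothetical, and a real retailer would work with far larger data and more careful methods.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (customer_id, product, amount_paid).
purchases = [
    ("alice", "coffee", 4.50),
    ("bob", "tea", 3.00),
    ("alice", "coffee", 4.50),
    ("alice", "bread", 2.25),
    ("bob", "tea", 3.00),
    ("carol", "bread", 2.25),
]


def aggregate_sales(records):
    """The coarse view: a single sales total for the whole store."""
    return sum(amount for _, _, amount in records)


def customer_profiles(records):
    """The fine-grained view: total spend and most-bought product per customer."""
    spend = defaultdict(float)
    baskets = defaultdict(Counter)
    for customer, product, amount in records:
        spend[customer] += amount
        baskets[customer][product] += 1
    return {
        customer: {
            "total_spend": spend[customer],
            "favourite": baskets[customer].most_common(1)[0][0],
        }
        for customer in spend
    }


total = aggregate_sales(purchases)
profiles = customer_profiles(purchases)
```

The single total says nothing about who buys what, whereas the per-customer profiles are the kind of detail that makes tailored offers possible.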
To extend the analogy with the methods of science, this allows business people of all types to approach their business as a scientist would approach an experiment.
The use of data-centric techniques for marketing and the analysis of customer behaviour is certainly the most visible use of big data in industry, but it is just the tip of the iceberg. It is perhaps popular now because businesses typically already record a lot of this data for other purposes (such as payments).
The real disruptions are likely to occur when business leaders realise they can measure (and then potentially make sense of) any other aspect of their business.
A bus company can measure every single journey on each bus by capturing the data from electronic payment systems. It can use this to optimise its routes and timetable in a much more fine-grained manner than before.
A city can potentially control all of its traffic lights on the basis of real-time information about traffic across the whole city, rather than simply controlling each intersection locally.
Energy companies can measure the output of rooftop solar panels and predict the energy produced 10 minutes hence.
Hospitals can mine nurses’ daily records to detect deadly fungal diseases before they are noticed by other means.
Any problem that involves something you can measure can be tackled better with the techniques of data science.