Data storage may be cheap, but where's the value?
The stone age didn't end because of a lack of stones: it was the development of metal tools that brought the curtain down on that 3.4-million-year period. The oil age, likewise, won't end because oil runs out; the world's collective consumption habits will change, in time, because of a perceived rather than an actual lack of oil. So what about the data era we're in? The sheer volume of data will give us both wonderful opportunities and desperate headaches. And that volume isn't going to decline, because storage is cheap.
The fact that there is an incredibly large amount of data isn't in dispute. You store a large amount yourself! But how much of the data that you store is relevant? Do you store data because of its relevance, or because it is so cheap to store? If I were to suggest that somewhere between 80% and 90% of the data you have is irrelevant, what would you think? You might say: show me where the irrelevant data is and I'll delete it. And that's the rub. I don't know which 80-90%, and you probably don't either, otherwise you wouldn't waste your time storing it.
When it comes to the Coracle Learning Line, we allow you to store lots and lots of data, and then we give you tools to filter for the information that's relevant to you, when you want to find it. You can store statements from your structured learning, you can store statements from unstructured learning (i.e., any webpage you visit) and you can store attachments. We don't force you into making a choice about what data to keep; rather, we encourage you to use the learning line filters to help find relevant, contextual records in this data-rich, information-poor world of ours.
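To make the idea of "store everything, filter later" concrete, here is a minimal sketch in Python. The record shape, field names and filter options are assumptions made up for illustration; they are not the Learning Line's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical statement shape -- purely illustrative, not the real format.
@dataclass
class Statement:
    verb: str            # e.g. "completed", "read", "attached"
    subject: str         # what the statement is about
    source: str          # "structured", "web", or "attachment"
    when: date
    tags: set = field(default_factory=set)

def filter_statements(statements, *, tags=None, source=None, since=None):
    """Return only the statements that match every supplied filter."""
    results = []
    for s in statements:
        if tags and not tags & s.tags:      # no overlapping tags
            continue
        if source and s.source != source:   # wrong kind of learning
            continue
        if since and s.when < since:        # too old
            continue
        results.append(s)
    return results

# Store everything first, then filter for what is relevant right now.
journey = [
    Statement("completed", "Navigation module 3", "structured", date(2012, 5, 1), {"navigation"}),
    Statement("read", "Blog post on tides", "web", date(2012, 6, 12), {"navigation", "tides"}),
    Statement("attached", "Scanned certificate", "attachment", date(2011, 9, 30), {"certificates"}),
]

for s in filter_statements(journey, tags={"navigation"}, since=date(2012, 1, 1)):
    print(s.when, s.verb, s.subject)
```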
Your learning journey is unlikely to generate the sort of volume of data referred to as big data, so why shouldn't we offer you these tools?
Big data diversion!
If you google 'big data' you get nearly 1,600,000,000 results. Is that a big number in the context of big data? To answer that, we need some understanding of what big data is.
Big data refers to data sets so massive that processing them with traditional tools becomes difficult and complex.
On a visit to the Silicon Valley offices of Symantec during 2012 I learnt some interesting facts about data. For instance, I learnt that there is more money in cyber crime than in the illicit drug trade, and that pre-teens are being taught to hack. If criminal activity gives us any clues as to where value is, then data is the biggest game in town. It is no surprise, then, that as the volume of data in the world increases, so does the need to understand it. This is a somewhat circular, self-fulfilling argument, but it is the spiral we find ourselves caught in.
How much data is there?
Estimates suggest that as much as 2.7 zettabytes of data may have been created in 2012. A zettabyte (ZB) is 10^21 bytes, or 1,000 exabytes. To put that in context, in 2009 the entire world wide web contained around 500 exabytes (0.5 zettabytes). In other words, the rate at which data is created is exploding.
How long will it be before the world has a yottabyte of data? A yottabyte is 10^24 bytes, and however many times I look at it, 24 zeros is a lot: 1,000,000,000,000,000,000,000,000. At the moment there isn't an official word for a bigger number, though 'hella' is a popular suggestion for 10^27.
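For anyone who wants to check the arithmetic, here is a quick sketch in Python using the decimal (power-of-ten) prefixes and the figures quoted above.

```python
# Decimal unit prefixes, each step is a factor of 1,000.
EXABYTE   = 10**18
ZETTABYTE = 10**21
YOTTABYTE = 10**24

data_2012 = 2.7 * ZETTABYTE   # estimated data created in 2012
web_2009  = 500 * EXABYTE     # rough size of the web in 2009

print(data_2012 / web_2009)        # ~5.4x the whole 2009 web, created in one year
print(ZETTABYTE / EXABYTE)         # 1,000 exabytes per zettabyte
print(YOTTABYTE / ZETTABYTE)       # 1,000 zettabytes per yottabyte
```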
With all of this data we can be confident that we live in a world that is data rich. But how much of that data is actually relevant? How can we extract useful information from it, and how do we decide how to treat different pieces of data? Intuitively we all understand that not all data is equally valuable; the problem is figuring out which parts are helpful, practical and worthwhile.
The issue here is one of processing the volume. Traditional relational databases aren't up to the task in a reasonable time frame, so new ways are being developed to discover meaning in this vast array of data. One familiar example is the Large Hadron Collider, which has 150 million sensors delivering data 40 million times a second. With 600 million collisions a second, there's a lot of data to sift through. In order to cope with the potential data set, the scientists filter early, and as a result they end up working with only about 0.001% of the sensor data. That still generates around 25 petabytes of data a year!
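Taking those quoted figures at face value, here is a back-of-envelope sketch of what "filter early" implies about the raw, unfiltered volume.

```python
PETABYTE = 10**15
EXABYTE  = 10**18

kept = 25 * PETABYTE          # data retained after filtering, per year
fraction_kept = 0.001 / 100   # 0.001% expressed as a fraction

raw = kept / fraction_kept    # implied raw sensor volume before filtering
print(raw / EXABYTE)          # ~2,500 exabytes -- hence the need to filter early
```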
Which brings us back to the Learning Line and the opportunity to derive value by filtering your learning journey.
Oh, and to answer the question of whether 1,600,000,000 results is a big number: I think we can agree that the answer is no, but without some means of filtering those results, it's a pointlessly large, unprocessable number.