I think it’s fair to say that the face of data as we know it is changing significantly – not only in the breadth of what we’re able to collect, store and analyse, but in the sheer amount of information produced. In 2016 it was estimated that 90% of all the data in existence had been created in the two years prior. We’ve come a long way from the days of recording information with pen and paper – smart devices are constantly generating data on everything from GPS coordinates to temperature.
The databases we’ve become so accustomed to, built in the pre-internet era, just aren’t going to cut it at the scale we’re anticipating – an estimated 2.5 exabytes of data are now produced daily. SQL queries and relational database systems are poorly suited to sifting through unstructured data, which makes up around 80% of the average business’s stored information. This data is cumbersome to store, and problematic for the large-scale, high-speed streams to and from a database that are needed to power emerging data-hungry technologies.
It’s time to explore new and emerging forms of technology that can manage data as AI, IoT and enhanced analytics are deployed across industries.
Blockchain technology and its role in data storage
Blockchain shows real potential here. Instead of housing data in the traditional centralised silos we’ve seen to date, distributed ledger technology allows it to be broken up and scattered across a number of nodes. The advantages are numerous: not only does this offer a vastly more secure means of storage (there’s no central point of failure), but it opens the door to mechanisms such as swarming, which leverages peer-to-peer connections so that fragments of files can be delivered to their owners with as little latency as possible (much as torrenting does). Building this on a blockchain adds an incentive layer – native tokens can be used to reward peers for storing and replicating files.
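The fragmentation scheme described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not any particular network’s protocol: a file is split into fixed-size fragments, each fragment is keyed by its content hash (so peers can verify what they receive), and each fragment is placed on more than one node for redundancy. The chunk size, node names and round-robin placement are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes per fragment; real systems use far larger chunks


def chunk_file(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a blob into fixed-size fragments, keyed by content hash."""
    chunks = {}
    order = []  # hashes in sequence, needed to reassemble the file
    for i in range(0, len(data), chunk_size):
        fragment = data[i:i + chunk_size]
        digest = hashlib.sha256(fragment).hexdigest()
        chunks[digest] = fragment
        order.append(digest)
    return order, chunks


def distribute(order, nodes, replicas=2):
    """Assign each fragment to several nodes, round-robin, for redundancy."""
    placement = {node: [] for node in nodes}
    for j, digest in enumerate(order):
        for r in range(replicas):
            placement[nodes[(j + r) % len(nodes)]].append(digest)
    return placement


order, chunks = chunk_file(b"hello world!")
placement = distribute(order, ["node-a", "node-b", "node-c"])
```

Because fragments are addressed by hash, any peer holding a copy can serve it and the recipient can check its integrity – which is what makes the torrent-style swarming described above workable.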
Such a network is well suited to high-throughput applications. It’s important that a proper structure is put in place as IoT takes off – interconnected devices stream data between each other constantly. Relying on a centralised server to process such colossal amounts of data is risky: every connected device is at the mercy of that server, which is prone to downtime or congestion at peak times. With a distributed architecture, this is far less of an issue. Because interactions are peer-to-peer, the strain on any single point is reduced, and there is no one server whose failure brings the whole system down. If one (or indeed, several) nodes go offline, redundant storage means that the data can be pulled from other sources.
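The failover behaviour in the paragraph above amounts to a simple rule: try each replica in turn and skip unreachable nodes. The sketch below is a toy model under assumed data structures (the `placement` registry, `node_store` contents and fragment IDs are all invented for the example), showing that a fragment remains retrievable while any one replica survives.

```python
# Hypothetical registry: which nodes hold which fragment.
placement = {
    "frag-1": ["node-a", "node-b"],
    "frag-2": ["node-b", "node-c"],
}

# Hypothetical per-node storage.
node_store = {
    "node-a": {"frag-1": b"hello "},
    "node-b": {"frag-1": b"hello ", "frag-2": b"world"},
    "node-c": {"frag-2": b"world"},
}


def fetch(fragment_id, offline=frozenset()):
    """Return the fragment from the first reachable replica."""
    for node in placement[fragment_id]:
        if node not in offline:  # skip nodes that are down or congested
            return node_store[node][fragment_id]
    raise LookupError(f"no reachable replica holds {fragment_id}")


# Even with node-a offline, frag-1 is still recoverable from node-b.
data = fetch("frag-1", offline={"node-a"}) + fetch("frag-2")
```

A retrieval only fails if *every* replica of a fragment is offline at once – the probability of which falls rapidly as the replication factor rises.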
Data in AI
There’s a lot of buzz around using AI in an enterprise setting. Amid all the hype, though, it seems we forget the key ingredient for training neural networks: data. The more data that machine learning algorithms can process, the better the insights and predictions they can produce. It’s imperative that we move to a system that is better ordered than our current model. There are hundreds if not thousands of use cases seeking to build atop blockchain infrastructure to facilitate the development of artificially intelligent software – everything from allowing individuals to monetise their own information for training purposes, to decentralised applications building marketplaces for AI components.
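One way to picture the monetisation idea mentioned above is a ledger that credits each data owner whenever one of their records is consumed by a training run. The sketch below is entirely hypothetical – the reward rate, record format and `consume_training_batch` function are assumptions for illustration, standing in for the token mechanics a real network would implement on-chain.

```python
from collections import defaultdict

REWARD_PER_RECORD = 1  # assumed token credit per record used in training

ledger = defaultdict(int)  # owner -> accumulated token balance


def consume_training_batch(batch):
    """Credit each record's contributor, then hand features to the model."""
    for record in batch:
        ledger[record["owner"]] += REWARD_PER_RECORD
    return [record["features"] for record in batch]


batch = [
    {"owner": "alice", "features": [0.1, 0.2]},
    {"owner": "bob",   "features": [0.3, 0.4]},
    {"owner": "alice", "features": [0.5, 0.6]},
]
features = consume_training_batch(batch)
```

The point of putting such a ledger on a blockchain rather than in a single company’s database is that contributors don’t have to trust the model trainer to count honestly.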
We speak a lot about the importance of IoT, AI and big data, and their roles in shaping industries. To ensure that we can flesh these out to be the revolutionary vehicles for change they’re hyped to be, we need to focus on organising data properly. I strongly believe blockchain technology is key to building a highly resilient and secure infrastructure that can support the data needs of today.