
The missing piece of the decentralized internet? A database service


They say that history repeats itself and in the case of the decentralized internet at the moment, that might well be the case.

For while there has been rapid blockchain innovation and frenetic activity in the area of decentralized storage, a problem that earlier digital pioneers encountered seems to be blocking progress once again.

Even before the decentralized internet, there were two primary data storage services – file system storage for files and database storage for data fields.

Files are usually over 10KB in size – relatively large, and of arbitrary size – while data fields are typically small and of fixed size. Files are also not generally structured in a way that makes them searchable, while data fields are organized into groups and collections for easy searching.

These different characteristics made the storage solutions for each type of data quite different too. Database systems were optimized to store and retrieve small data fields quickly, with strong security, performance and scalability, while file systems were optimized to deliver the entire file but lacked the granularity to search and retrieve the data within it.
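To make the distinction concrete, here is a minimal sketch of the two access patterns. The class names and API below are illustrative only, not any real product's interface:

    # Minimal sketch of the two storage models (all names hypothetical).

    class FileStore:
        """File storage: opaque blobs, written and read whole."""
        def __init__(self):
            self._blobs = {}

        def put(self, name, data):
            self._blobs[name] = data

        def get(self, name):
            # The store knows nothing about the bytes inside the file;
            # the only operation available is handing back the entire blob.
            return self._blobs[name]

    class FieldStore:
        """Database storage: small, structured fields, addressable individually."""
        def __init__(self):
            self._rows = {}  # primary key -> record

        def put(self, key, record):
            self._rows[key] = record

        def get_field(self, key, field):
            # Because the store understands the structure of its data, it can
            # return one small field without touching anything else.
            return self._rows[key][field]

    files = FileStore()
    files.put("backup.tar", b"\x00" * 10_000)     # only retrievable as a whole

    db = FieldStore()
    db.put("user:42", {"name": "Ada", "city": "London"})
    print(db.get_field("user:42", "city"))        # one field, not the whole record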

We’re now totally familiar with the major players in these fields, from Dropbox and Google Drive on the file storage side to Oracle and Mongo on the database side. To a greater or lesser extent, these solutions solved the problems surrounding access to data in the internet world.

What is apparent now is that the same problems they faced then are occurring again in the decentralized internet. For while solutions such as IPFS/Filecoin, Storj, Sia and Ethereum’s Swarm have moved things forward, they can only progress so far.

Immutability is not an option

The immutability of the data on the blockchain is one of the key dynamics at play here. While never removing or changing data has many positive aspects to it, it also poses a serious issue when you consider how it works within important regulatory frameworks.

For example, the EU’s General Data Protection Regulation (GDPR), which is high on the agenda of many businesses right now, will require them to be able to purge customer data completely from their systems.

Even without the threat of GDPR, the immutability of data is seen as an unacceptable constraint for many real world data storage scenarios and can therefore be a deal breaker for software projects. This is one of the reasons we’ve seen so many decentralized software companies resort to traditional cloud-based, centralized databases.

“the immutability of data is seen as an unacceptable constraint for many real world data storage scenarios”

The issue of decentralized storage

Coupled with the issues of immutability is the fact that files are not stored in a useful way in new decentralized file storage services. Often, they are broken up into chunks with the divisions made at arbitrary locations, demonstrating little regard for the data in the file. Trying to access data when the underlying storage mechanism does not understand the nature of it is inefficient and likely to be error prone.
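As a toy illustration of the problem (the 16-byte chunk size is artificial; real systems chunk at, say, 256KB, but the principle is the same), content-blind chunking happily splits a single field across two chunks:

    # Toy illustration: content-blind, fixed-size chunking.
    data = b"id=1;name=Alice;addr=12 High St;id=2;name=Bob;addr=9 Low Rd;"

    CHUNK_SIZE = 16                   # artificial; real systems use e.g. 256KB
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

    for n, chunk in enumerate(chunks):
        print(n, chunk)
    # 0 b'id=1;name=Alice;'
    # 1 b'addr=12 High St;'
    # 2 b'id=2;name=Bob;ad'
    # 3 b'dr=9 Low Rd;'
    #
    # Bob's address is split mid-field across chunks 2 and 3: no single chunk
    # can answer "what is Bob's address?" without reassembling the file.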

The reality is that reading a simple mailing address from a relatively modest 10GB file on a storage service like IPFS would require the entire file to be downloaded and then searched for the relevant information.

Even at a download speed of 1 gigabit per second, that 10GB file (80 gigabits) would take 80 seconds to fetch every time it is accessed.

A database from the ground up

Alternatively, if you imagine a real database containing the same 10GB of data, that same 32-byte mailing address could be read in around 100 milliseconds. You could build a database layer on top of IPFS to tackle this issue too but, because the entire file would still need to be downloaded first, that would just add overhead and make performance worse.
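A quick back-of-the-envelope sketch, using the figures above (a 1 gigabit per second link, a 10GB file and roughly 100 milliseconds for an indexed read), shows the size of the gap:

    # Back-of-the-envelope comparison using the figures above.
    FILE_SIZE_BYTES = 10 * 10**9      # a 10GB file
    LINK_BITS_PER_SEC = 1 * 10**9     # a 1 gigabit per second link

    download_secs = FILE_SIZE_BYTES * 8 / LINK_BITS_PER_SEC
    indexed_read_secs = 0.100         # ~100ms for a keyed 32-byte lookup

    print(f"full-file download: {download_secs:.0f}s")                     # 80s
    print(f"indexed read:       {indexed_read_secs * 1000:.0f}ms")         # 100ms
    print(f"speedup:            {download_secs / indexed_read_secs:.0f}x") # 800x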

The solutions that exist right now are file services. What is needed is a database service for the decentralized internet that dApps can scale quickly and cheaply.

It is the missing piece in the decentralized internet, marrying the best aspects of blockchain decentralization with the lessons learnt from decades of database science, and it will future-proof the development of dApps.

It’s this combination of old and new that makes me think Mark Twain was much closer to the mark with his assessment that “history doesn’t repeat itself, but it does rhyme”.

Pavel Bains - Bluzelle

Pavel Bains is CEO of Bluzelle, as well as a futurist, entrepreneur, designer and investor in exponential technologies. Bluzelle builds blockchain and distributed ledger solutions for the finance industry, and was named one of the World Economic Forum’s 2017 Technology Pioneers. Pavel also provides advisory, M&A and capital-raising services for companies in digital media and technology, and is an investor in fintech startup Bench and virtual reality startup VRChat.

 

1 Comment

  1. Andy Lawrence

    October 20, 2017 at 8:42 pm

    This describes very well one aspect of the project I am currently working on. I am building a revolutionary new kind of object store that can be the basis for a decentralized web. Each logical container (with its accompanying control software, comprising a ‘data node’) can manage hundreds of millions of objects and can communicate with all the other data nodes distributed around the globe. It is designed to scale such that a data node can be run on simple devices with limited RAM and storage; all the way up to large clusters of servers with PB of storage. A node is designed to be run on your laptop, the company server, or out in the cloud.

    It was built originally to be just a replacement for file systems (I got tired of waiting for something like WinFS to actually come to market) but is expanding to do much more. For example, I can put 100 million files in it and put contextual tags on each one. A query like 'find all JPEG photos where Event = "Wedding"' can find all of them in just a couple of seconds (even if there are a million of them).

    One of the other things we can do with this system is build very flexible, relational database tables. Each column within the table is stored within a separate object. Large columns can also be broken into separate shards and distributed as well. New columns can be added or existing columns deleted at any time without re-writing any data. So far, all my tests show some impressive performance gains over existing RDBMS offerings. For example, I can create a table with 10 columns and 5 million rows and run a query like "SELECT name, address, city, state, zip WHERE state ILIKE 'N%';" much faster than doing the same thing on Postgres or MySQL.

    The project is still under development, but we have a ton of features working. We have some demo videos on YouTube that show some of what it can do so far. Links can be found on my blog at http://www.DidgetMaster.blogspot.com
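For readers curious what such a column-per-object layout might look like, here is a minimal, hypothetical sketch of the general technique (not the commenter’s actual implementation; all names are illustrative):

    # Minimal sketch of a column-per-object table layout (hypothetical).
    # Each column lives in its own object, so a query touches only the
    # columns it needs, and new columns require no rewrite of existing data.

    class ColumnarTable:
        def __init__(self, columns):
            # one independent "object" (here just a list) per column
            self._cols = {name: [] for name in columns}

        def insert(self, row):
            for name, values in self._cols.items():
                values.append(row.get(name))

        def add_column(self, name):
            # existing column objects are untouched; the new one is padded
            n_rows = len(next(iter(self._cols.values()), []))
            self._cols[name] = [None] * n_rows

        def select(self, wanted, predicate):
            # scan only the column the predicate references
            col_name, test = predicate
            for i, value in enumerate(self._cols[col_name]):
                if test(value):
                    yield {name: self._cols[name][i] for name in wanted}

    t = ColumnarTable(["name", "address", "city", "state", "zip"])
    t.insert({"name": "Ada", "address": "1 Main St", "city": "Reno",
              "state": "NV", "zip": "89501"})
    t.insert({"name": "Bob", "address": "2 Elm St", "city": "Boise",
              "state": "ID", "zip": "83702"})

    # rough analogue of: SELECT name, state WHERE state ILIKE 'N%'
    for row in t.select(["name", "state"],
                        ("state", lambda s: s and s.upper().startswith("N"))):
        print(row)    # {'name': 'Ada', 'state': 'NV'}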
