There are only a handful of corporations with the massive data sets, skills and knowledge needed to turn that data into value. But what if data originators (individuals and businesses) could easily and securely share and monetize their data on their own terms?
The 451 Take
Data is fueling our economy, and the way we share it can be revolutionized by blockchain. Decentralized data marketplaces or exchanges represent an emerging concept that leverages blockchain technology for buying and selling personal and business data; the idea is to equalize the access to data and unlock its value for individuals, as well as researchers and businesses – set it free, but not for free. There are a few emerging approaches, but we are not yet aware of any production implementations or even real-world pilots. We believe that the concept will be welcomed by many, but what will actually work in the real world is unclear at this point. Data management and integration, as well as other technology and business issues, need to be addressed before the idea of open data marketplaces can take hold.
Data has become a strategic asset in our economy – some high-performing technology companies have built their entire businesses around the user data they collect. These companies can basically sell this data to whomever they wish, and in the majority of cases the actual source or originator of the data does not get paid for providing that data, and has little to no control over how that data is used or who it is sold to.
At the same time, there is a significant amount of data kept behind the firewalls of organizations that never gets leveraged because most of these companies don't know how to make this data available in a secure way and maximize its value, unlike the Facebooks and Googles of this world. At the other end of the spectrum, there are plenty of data analytics and AI startups, as well as researchers, that are starving for high-quality data that they can process, enrich, analyze or train their algorithms with.
Set it free, but not for free? Blockchain technology is well-suited to help tackle these issues and enable a system that directly connects and establishes trust between data haves and have-nots, equalizes access to data, and rewards participation. There are different decentralized data marketplace approaches being developed at the moment, and we present three of them below for illustrative purposes.
Datum Network aims to provide a decentralized data marketplace for storing, selling and buying data powered by Ethereum (a blockchain platform with 'smart contract' functionality), BigchainDB (scalable blockchain database) and IPFS (peer-to-peer distributed file system developed by Protocol Labs). The blockchain startup uses the motto 'take back your data' to describe its purpose, which is to work toward a future where data is primarily owned by its creators, who can choose to selectively share their anonymized data with trusted buyers and get compensated for it.
The Datum network, as envisaged by its creators, consists of data owners, storage nodes and data buyers. Datum's smart contract provides the logic and ensures that each piece of data is traded securely and according to the data owner's terms. Each piece of data (together with the owner's terms) is encrypted, and only the data owner can provide the decryption key to interested buyers.
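The owner-controlled key release described above can be sketched as a simple envelope pattern: the owner encrypts the data with a key only it holds, publishes the ciphertext and terms, and releases the key only to buyers who accept those terms. The class and method names below are illustrative, not Datum's API, and the XOR cipher is a toy stand-in for a real symmetric cipher such as AES-GCM.

```python
import secrets
from typing import Optional

def xor_stream(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for a real symmetric cipher; do not use in production.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class DataListing:
    """An encrypted listing: ciphertext plus the owner's terms. Only
    the owner holds the key and decides when to release it."""
    def __init__(self, payload: bytes, terms: str):
        self._key = secrets.token_bytes(32)  # held by the owner only
        self.terms = terms
        self.ciphertext = xor_stream(payload, self._key)

    def release_key(self, buyer_accepted_terms: bool) -> Optional[bytes]:
        # The key is released only once the buyer accepts the terms.
        return self._key if buyer_accepted_terms else None

# Usage: the ciphertext can sit on any storage node; the buyer can
# only decrypt after the owner releases the key.
listing = DataListing(b"sensor readings 2017-10", "research use only")
key = listing.release_key(buyer_accepted_terms=True)
assert xor_stream(listing.ciphertext, key) == b"sensor readings 2017-10"
```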
Datum recently carried out a token sale – also known as an initial coin offering (ICO). The DAT tokens power the transactions on the network, enabling both the storage and the sharing of data.
The data is stored across multiple storage nodes that have compute power, storage capacity and bandwidth, and that run Datum's blockchain database. Storage nodes are rewarded with DAT tokens (a percentage of each transaction) in exchange for storing and transmitting data.
When the data owner receives a purchase request from a potential buyer, it can accept it or send a counteroffer. Once the parties reach an agreement, the buyer pays the owner in DAT tokens.
The release of a fully functioning platform is scheduled for Q2 2018.
The Ocean Protocol has been developed to incentivize data owners to provide high-quality data (priced or public) that is adequate for training AI algorithms. It offers a tokenized approach that can power data sharing and marketplaces.
The project envisions a network consisting of data owners, who provide the data; data referrers or curators, who curate the data; data consumers, who purchase the data; and keepers, who collectively maintain the network by running nodes and mining tokens. The tokenized system is designed to incentivize three stakeholders: data owners, referrers/curators and keepers.
The Ocean core software (a correct implementation of the protocol) is built on the Interplanetary Database (IPDB) network, which runs BigchainDB, and uses COALA IP (a blockchain-ready protocol for intellectual property licensing). It includes a member and data registry, with reputation in the form of a 'curation market' (a concept that uses tokens to curate information), and a pricing scheme. Verification that the correct data assets are made available is based on a 'challenge-response' protocol.
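The 'challenge-response' verification mentioned above can be illustrated as follows: a keeper asks the data provider to hash a randomly chosen slice of the data set together with a fresh nonce, then compares the answer with a hash computed from a trusted reference copy. The functions below are a hypothetical sketch, not Ocean's actual protocol.

```python
import hashlib
import secrets

def make_challenge(size: int, window: int = 16):
    # Pick a random byte range and a nonce that prevents replaying
    # a previously recorded answer.
    start = secrets.randbelow(max(size - window, 1))
    nonce = secrets.token_bytes(8)
    return start, nonce

def respond(data: bytes, start: int, nonce: bytes, window: int = 16) -> str:
    # The provider can only answer correctly if it still holds the
    # actual bytes at the challenged offset.
    return hashlib.sha256(nonce + data[start:start + window]).hexdigest()

# Usage: the keeper verifies the provider's reply against a reference.
reference = b"high-quality training data, exactly as registered"
start, nonce = make_challenge(len(reference))
answer = respond(reference, start, nonce)            # provider's reply
assert answer == respond(reference, start, nonce)    # keeper's check
```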
According to its creators, Ocean will not store the data itself, but will link to it and provide mechanisms for access control using BigchainDB protocols. The Ocean Protocol is being built gradually, and the team is looking to test the network publicly in mid-2018.
Unlike the Facebooks and the Googles of the world, most businesses are not built to monetize data. PencilDATA is a stealth startup that is developing a framework with the aim of helping these businesses safely and securely share and monetize data that is currently locked down behind their firewalls in CRM, ERP and other databases. The company's value proposition is to help businesses get the right data and maximize its value.
PencilDATA proposes going on-chain for visibility and off-chain for managing 'active' data sets and encrypted containers. Static (read-only) data sets are stored on-chain, providing a detailed audit trail of user activities and interactions, while active data sets, which may be very large data files, are kept off-chain at trusted endpoints. Keys are managed on-chain, and access to those keys is controlled by the data publishers. If the data publisher does not like how subscribers are consuming the data, it has the right to revoke all downloaded copies, according to the company.
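The on-chain/off-chain split described above can be modeled as a key ledger: keys, access control and an audit trail live "on-chain," while the (possibly very large) encrypted data set stays off-chain at a trusted endpoint. All names below are illustrative; PencilDATA has not published its API.

```python
class KeyLedger:
    """Toy stand-in for the on-chain side: key access control plus
    an audit trail of every access attempt."""
    def __init__(self):
        self._keys = {}        # dataset_id -> decryption key
        self._acl = {}         # dataset_id -> set of allowed subscribers
        self.audit_trail = []  # (dataset_id, subscriber, allowed) tuples

    def publish(self, dataset_id: str, key: bytes, subscribers: set):
        self._keys[dataset_id] = key
        self._acl[dataset_id] = set(subscribers)

    def request_key(self, dataset_id: str, subscriber: str):
        # Every request is logged, whether it succeeds or not.
        allowed = subscriber in self._acl.get(dataset_id, set())
        self.audit_trail.append((dataset_id, subscriber, allowed))
        return self._keys[dataset_id] if allowed else None

    def revoke(self, dataset_id: str, subscriber: str):
        # Revokes on-chain key access; the company's claim of revoking
        # already-downloaded copies is not modeled here.
        self._acl[dataset_id].discard(subscriber)

# Usage: publish a key, grant access, then revoke it.
ledger = KeyLedger()
ledger.publish("crm-2017", b"\x01\x02", {"partner-a"})
assert ledger.request_key("crm-2017", "partner-a") == b"\x01\x02"
ledger.revoke("crm-2017", "partner-a")
assert ledger.request_key("crm-2017", "partner-a") is None
```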
Initially, PencilDATA plans to help companies share data internally between business units, and to expand from there to supply chains and potentially to an open marketplace model.
We have analyzed the most pressing technology and business hurdles that blockchain needs to overcome before it can enjoy widespread adoption. Data management and integration with other data sources are among them, and both are especially relevant here. In the blockchains in use today, very little data beyond tracking state is transmitted between nodes, and very little development has been done to address how best to integrate blockchain nodes with other data sources and systems of record within enterprises.
When large sets of data need to be handled, the question arises: Should the 'active' data remain within the blockchain node (on-chain) or be stored in a centralized or other distributed repository (off-chain), with nodes handling access to the data and the state of the data? In the former case, what data capabilities does that node need? In the latter case, does the centralized repository in some way defeat the decentralized trust supposedly enabled by a blockchain? We believe that these questions must be addressed for each use case.
Furthermore, monitoring applicable regulations will be important, especially when it comes to data protection (e.g., EU GDPR). It shouldn't be overlooked that these decentralized data marketplaces (and any blockchain network, for that matter) need to attract enough of the right participants (e.g., data owners/publishers, buyers and curators) and keep them engaged (economically or otherwise) to make their concepts work and bring their value propositions to life.
Csilla Zsigri is a Senior Analyst for 451 Research’s Cloud Transformation channel. Csilla also works on custom research, providing strategic guidance, as well as market and competitive intelligence, to technology vendors, service providers and enterprises.
Carl Lehmann is a Principal Analyst in the Development, DevOps & IT Ops Channel. He leads 451 Research's coverage of integration and process management technologies in hybrid cloud architecture, as well as how hybrid IT affects business strategy and operations.
Keith Dawson is a principal analyst in 451 Research's Customer Experience & Commerce practice, primarily covering marketing technology. Keith has been covering the intersection of communications and enterprise software for 25 years, mainly looking at how to influence and optimize the customer experience.