The five most valuable companies in the world in terms of market capitalization – Apple, Alphabet, Microsoft, Amazon and Facebook – also stand to benefit the most from the combination of artificial intelligence algorithms and skills, as well as abundant data processing power and, most importantly, proprietary data. The self-reinforcing nature of this combination of data, processing power and skills is such that the five 'data giants' have the potential to create an effective oligopoly over the intelligence that can be generated from the world's most valuable data.

It is not our desire to take a position on whether antitrust rules should be adapted to reflect data-enabled dominance, nor on whether the dominance of the data giants is inherently good or bad – merely to point out how that dominance exists, and why it is likely to continue. Those that own the data will win, and data begets more data, giving the data-rich a potentially unassailable advantage over the data-poor.

451 Research has been saying for a while that 'those that own the data will win – everyone else will have to pay for it.' Follow this statement to its logical conclusion and it quickly brings you to a vision of the future, not unlike that described in Dave Eggers' 2013 novel 'The Circle,' where a single corporation controls how the vast majority of people communicate and interact.

This dystopian fictional vision of the future could become reality only if the five most valuable companies in the world – Apple, Alphabet, Microsoft, Amazon and Facebook – were to be combined into a single entity. However, there remains the potential for these companies – let's call them the 'data giants' – to establish, among them, unassailable dominance over how we live, work, play, transact and travel (among other things) based on the ownership and analysis of data.

Indeed, there is an argument to be made that we have already reached that point. Through internal development and acquisition, the data giants have amassed near-monopolies in multiple segments such as search, online retail, mobile applications, digital home assistants and social media.

For example, Kleiner Perkins Caufield & Byers general partner Mary Meeker's 2017 Internet trends report estimated that Google and Facebook account for 85% (and rising) of US internet advertising revenue. The related data provides them with unique insight into the behavior of huge numbers of consumers. What is more, they increasingly own the skills and processing power to aggregate and analyze data from multiple sources to generate unique insight that is unavailable to niche competitors.

Raising questions about the data dominance of the likes of Google, Amazon or Facebook is nothing new. As suggested above, however, what raises this to another level is their ownership of the resources to process all this data, as well as their ability to hire and acquire those with the skills to analyze the data via machine-learning and deep-learning algorithms.

This combination of data, processing power and analytic skills serves the data giants well: enabling the development of new features and services that attract and retain users. Data-driven applications produce still more data, resulting in a snowball effect that is likely to accentuate the dominance of the data-rich, and make life increasingly difficult for their data-poor competitors.

Benevolent dictators?

As a counter-argument to this implication of the inevitability of monopolistic tendencies, one could look at examples of the data giants sharing artificial intelligence algorithms and tools via open source (Google's TensorFlow, for example) as evidence of their willingness to spread the wealth. However, this is misleading. The core algorithms that enable artificial intelligence have been in existence for many years and provide relatively little differentiation in and of themselves besides improved efficiency.
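The point that openly available algorithms provide little differentiation on their own can be made concrete with a minimal sketch. The logistic-regression classifier below uses plain gradient descent – an algorithm that has been public for decades and can be reproduced by anyone in a few lines. The toy data set is hypothetical; in practice, the scarce ingredient is the training data, not the code.

```python
import math

def train_logistic(data, lr=0.1, epochs=1000):
    """Train a logistic-regression model by plain gradient descent.

    data: list of (features, label) pairs, with label in {0, 1}.
    Returns the learned weights and bias.
    """
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Classify a point with the learned linear boundary."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

# Hypothetical toy data: points are labeled 1 roughly when x1 + x2 > 1.
toy = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.9, 0.8), 1), ((1.0, 1.0), 1)]
w, b = train_logistic(toy)
print(predict(w, b, (0.1, 0.1)), predict(w, b, (0.9, 0.9)))  # 0 1
```

The code is deliberately unremarkable: swapping in a more sophisticated model would change little, because the quality of the result is dominated by the volume and quality of the training data.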

That is not to say that there is no space for data-driven innovation by smaller companies. In particular, there are opportunities to apply the combination of domain expertise and advanced skills in niche areas that have been underserved by existing players. For example, Uber and Tesla have amassed huge troves of autonomous driving data, while the likes of Salesforce and Oracle Data Cloud have assembled significant volumes of customer data profiles.

However, far from diluting the dominance of the data giants, making algorithms and tools freely available increases the talent pool for the data giants to hire from, while the data giants' purchasing power enables them to quickly acquire any emerging startups, expanding their dominance still further.

A prime example is Google's acquisition of deep-learning startup DeepMind Technologies, but there are many other examples of the data giants snapping up artificial intelligence expertise to differentiate their products and services driven by machine-learning and deep-learning functionality.

The true value that can be generated with artificial intelligence algorithms relies on applying them against large data sets, using abundant data processing power, and having the skills to interpret the results.

These skills have traditionally accumulated in academic institutions, many of which have been targeted by the data giants and other data-driven enterprises, which have cherry-picked artificial intelligence graduates and researchers (or in some cases scooped up entire research teams – most famously Uber at Carnegie Mellon University's National Robotics Engineering Center).

In a 2016 article in The Economist, Anthony Goldbloom of data science community Kaggle compared this accumulation of academic talent to the concentration of scientists that contributed to the Manhattan Project, which developed the atomic bomb. What differentiates the trend today is that it is being driven by private companies. Ironically – or perhaps inevitably – Kaggle itself was acquired by Google in March 2017.

Barriers to entry

Meanwhile, the data giants continue to retain an advantage over any smaller competitors in terms of owning proprietary data sets with which to run, train, debug and improve the algorithms and tools.

Data (or the lack of it) is a barrier to entry in any given market, and the longer any one company has been operating in a particular segment, the greater its ability to exploit the advantages of associated data. For example, historical search data produces more accurate targeted search results.

Google's initial advantage over the likes of Yahoo and AltaVista lay in having a 'better' algorithm than the existing search providers (specifically, PageRank). However, the company's true differentiation quickly became the size of its data. As the company explained in its research paper 'The Unreasonable Effectiveness of Data,' "simple models and a lot of data trump more elaborate models based on less data." That quote is often attributed to Google's director of research, Peter Norvig, as is the more direct "We don't have better algorithms, we just have more data."
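PageRank itself illustrates the point: the algorithm was published in 1998 and can be sketched in a few lines of standard code, as below, so the hard-to-replicate asset is not the method but the web-scale link and query data it runs against. This is a minimal power-iteration sketch over a hypothetical four-page link graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from random teleportation.
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # A page passes its rank evenly to the pages it links to.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: pages a, b and d all link to c.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c, the most-linked-to page
```

Anyone can write this; what no new entrant can easily reproduce is the billions of real pages, links and queries against which such an algorithm is run, trained and tuned.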

New vendors, even if they do develop better algorithms, lack the historical data with which to exploit them.

The data giants have also shown their willingness to acquire not just artificial intelligence expertise, but also data itself. Microsoft's $26.2bn acquisition of LinkedIn certainly falls into this category. And while IBM is not among the data giants, it certainly wants to be: it has made multiple data-driven acquisitions in industries such as healthcare (Truven Health Analytics) and weather (The Weather Company) in order to establish ownership of data sets that can be combined with its Watson analytics and intelligence capabilities to its competitive advantage.

Of course, the ability to analyze these proprietary data sets depends on having the data-processing resources to do so. Here again, the data giants have an advantage over smaller rivals. Processing power is more widely available than ever before via cloud services, but users still have to pay for that processing power.

As it happens, much of that processing power is concentrated in the hands of the data giants, which can effectively subsidize the cost of their own data-processing efforts by renting out computing power to others.

Through this combination of artificial intelligence and cloud services, there is little doubt that the wider world will benefit from the democratization of intelligent services such as image recognition and speech recognition. However, ownership of highly proprietary data sets will ensure that the data giants continue to generate differentiated value and higher levels of intelligence.

A matter of (anti)trust

As explained above, it is the combination of data, skills and processing power that gives the data giants an advantage that is likely to be self-reinforcing. Even if there are five data giants rather than a single vendor acting as a monopoly, questions have been raised about whether the dominance of the data giants should raise antitrust concerns.

However, the fact that what gives these giants their power is data, rather than dollars, means that the traditional tests for monopoly – such as price collusion and price gouging – don't necessarily apply. In many cases, the services provided by the data giants are given away at little or no charge. Users are effectively trading not money, but data, for these products and services. Current antitrust and discrimination laws simply do not take this into account.

There may still come a time when antitrust rules and measurements of influence are adapted to better reflect the power of data. In the interim, there are regulatory hurdles that restrict the extent to which the data giants can exploit that data – including the impending arrival of the EU's General Data Protection Regulation (GDPR).

Additionally, the data giants, along with other data-driven companies, have shown a willingness to ensure that at least some of the benefits of artificial intelligence are available to others.

For example, Amazon Web Services has joined the likes of Elon Musk (Tesla, SpaceX), Reid Hoffman (LinkedIn, Greylock Partners), Peter Thiel (PayPal, Palantir, Facebook), Jessica Livingston and Sam Altman (both Y Combinator) and Greg Brockman (Stripe) in committing $1bn to non-profit AI research firm OpenAI, which is committed to making the results of its research available to all.

Microsoft is also a sponsor of OpenAI, and has additionally joined Amazon, DeepMind/Google, Facebook and IBM in forming the Partnership on AI, a non-profit designed to ensure that artificial intelligence is used to benefit people and society through the development of best practices and the employment of AI for socially benevolent applications.

While this benevolence is to be welcomed, it should be noted that – as is the case in the distribution of open source AI tools – it does little to diminish the advantage that the data giants have in owning vast proprietary data sets, as well as the processing power, the skills and algorithms to exploit them.
Matt Aslett
Research Director, Data Platforms & Analytics

Matt Aslett is a Research Director for the Data Platforms and Analytics Channel at 451 Research. Matt has overall responsibility for the data platforms and analytics research coverage, which includes operational and analytic databases, Hadoop, grid/cache, stream processing, search-based data platforms, data integration, data quality, data management, analytics, machine learning and advanced analytics. Matt's own primary area of focus includes data management, reporting and analytics, and exploring how the various data platforms and analytics technology sectors are converging in the form of next-generation data platforms.

Patrick Daly
Senior Research Associate, Information Security

As a Senior Research Associate in 451 Research’s Information Security Channel, Patrick Daly covers emerging technologies in Internet of Things (IoT) security. His research focuses on different industrial disciplines of IoT security, including the protection of critical infrastructure, transportation and medical devices. In addition, Patrick’s coverage spans technological domains, including security for IoT devices, applications, platforms and networks.

Keith Dawson
Principal Analyst

Keith Dawson is a principal analyst in 451 Research's Customer Experience & Commerce practice, primarily covering marketing technology. Keith has been covering the intersection of communications and enterprise software for 25 years, mainly looking at how to influence and optimize the customer experience.
