As the volume of unstructured data explodes, so too do the opportunity costs for enterprises unable to harness it effectively. Not so long ago the battleground in content services was content storage, but increasingly the focus is on building metadata around content and content components, improving discoverability, security and reuse, and extracting contextual insight to enhance the value of content in workflows. To that end, the document capture segment is emerging as the point of convergence between advanced content analytics and process automation. There has been steady consolidation in this space, with ECM players heavily acquisitive over the past several years, but there is scope for pure plays to become more dynamic members of both the content services and process automation landscapes.
The 451 Take
Two major forces, anxiety over new regulatory mandates and the need to digitize and automate rote manual processes, have organizations scrambling to understand volumes of new and historical data and make them operationally functional. The new capture remit – locate and identify information of value within any unstructured data source and lend it as much functionality as structured data – caters directly to that need. New usage expectations for cloud and mobile-friendly services are pressuring long-standing leaders to create 'capture as a service' platforms providing embeddable, developer-friendly services to extend the value proposition beyond MFP (multi-function printer) and print management. We expect more partnerships and perhaps acquisitions from line-of-business applications and process automation players for which transforming unstructured data into extractable sources of business intelligence is top priority.
Market Context
The progenitors of today's document capture technologies stretch back over a hundred years to early character recognition devices designed to assist the blind. Today's defining technologies include OCR (optical character recognition), ICR (intelligent character recognition), OMR (optical mark recognition), IWR (intelligent word recognition) and barcode scanning. Key document services functionality includes automated document ingest, classification, routing, extraction, migration and integration.
The advent of smartphones and the potential for mobile capture, strides in artificial intelligence (in particular, cognitive analytics) and significant investment in content-related process automation are all breathing new life into this category. Their disruptive impact is shaking up an industry that, supported by long-standing ecosystems of value-added resellers, OEM arrangements and integration partnerships, has enabled vendors like Kofax, Canon, ABBYY, Nuance Communications, Xerox, IBM and OpenText to solidify their place in the market.
In the table below we've listed many of the players defining the space today.

As is clear from this list, capture technology has been tightly bound to the enterprise content management (ECM) market. Most major vendors – OpenText, Lexmark, Hyland – either own capture assets or have key partnerships to automate content management operations (e.g., metadata population, document classification) and to support transactional information exchange in customer, employee and partner communications. Hyland's acquisition of Allscripts' OneContent document capture and health record management software is the most recent instance of years of industry consolidation.
Market opportunities
Most capture players identify the two biggest growth opportunities in the market as mobile capture and cloud, both of which enable more modern and flexible methods of engaging with customer, employee and partner communities. Younger players like Ephesoft are marketing themselves as cloud-based, API-driven, 'capture as a service' platforms with a view of the content repository as a data lake, rife with actionable insight. Inevitably this will eat into much of the legacy capture revenue in the MFP market by facilitating new forms of democratized, 'off-site' and multi-channel digitization.
There is likewise a huge opportunity for capture technologies to become the connection point between users in the cloud, legacy content storage and other repositories, and the span of third-party applications and business systems. Our 2H 2017 Voice of the Connected User Landscape: Corporate Mobility and Digital Transformation survey shows that digitizing manual and paper-based employee processes and better managing risk around corporate data are the top two considerations for organizations pursuing digital transformation. As a result, unstructured data extraction and processing becomes much more of a core capability for any vendor looking to automate data entry and facilitate data integration.

Source: 451 Research's VoCUL, Corporate Mobility and Digital Transformation Survey, 2nd Half 2017
As companies look to digitize more of their processes, the value proposition for capture technologies is to free useful information from its document container and make it processable by other systems. There are thus growing synergies between document capture and process-centric technology providers such as Appian, Kofax, Pega and Alfresco Software around their business process management (BPM) applications, as well as with robotic process automation (RPA) vendors like EdgeVerve Systems and IT services vendors like Accenture, Deloitte, EY and IBM that have invested in RPA.
Kofax is one traditional document imaging provider that is bridging the gap itself, pursuing RPA and data transformation opportunities with its Kapow assets. For workflow automation vendors like Nintex and K2 that enable document-centric process automation, the increasing granularity and intelligence of capture technologies promise to make their processes more fine-grained and potentially open more points for workflow orchestration. The same goes for vendors enabling automated document generation, such as contract lifecycle management vendors Conga, Apttus and Icertis.
Interest has also been sparked in content service categories like content collaboration, content productivity and e-signature as vendors like Box, Dropbox, Adobe and DocuSign look to enhance support for the mobile user and automate transactional content workflows.
Areas of innovation
The explosion of personal connected devices has forced the capture landscape to evolve to handle more varied types of content such as image, voice, and video. ABBYY, for instance, has just released an SDK for its Real-Time Recognition functionality that permits livestreaming video capture. There is also growing demand to incorporate analytics and data processing at the 'edge,' capturing metadata regarding context and sourcing as early as possible to subsequently inform policy or prompt workflows in recurring environments. This is what distinguishes document capture technologies from general content analytics platforms in the short term. Historically, capture has been more concerned with content ingest, extracting data that originates from outside of the business firewall, but in time these markets will be indistinguishable from one another as vendors embrace capture-as-a-service platform approaches that apply analytics to content at any point in its lifecycle.
Content service providers have for a while now spoken of 'document intelligence,' referring to capabilities like automated classification, risk identification and insight into the defining characteristics of a document or set of documents. Contract risk analysis is one such example, driven in large part by enterprises wanting to realize capture's potential to support security and compliance programs over its ability to drive productivity. Natural language processing (NLP) and cognitive analytics, however, will increasingly shape the market as providers bring contextualized human understanding to text.
Going beyond transcription, it's becoming increasingly possible to pick out relevant information from large text assets based on context signals and derived patterns. One of the drivers is the development of the international regulatory environment, with a growing number of regulatory frameworks requiring much more granular handling and retrieval of consumer data. As these capabilities evolve, all unstructured data sources – like email, chat streams and audio conversations – will become extractable sources of business insight.