Apple made voice recognition mainstream when it launched Siri with the iPhone 4S in October 2011, driving consumer familiarity with voice-enabled applications. Our expectation was that human-computer interaction (HCI) would evolve and that by 2015, voice-enabled applications would be widely used on computers, smartphones and wearable devices, both by consumers and in the workplace. We are four years past that initial prediction, but there are signs that 2019 will be a turning point for the industry. Voice is emerging as the main user interface for many applications, devices and appliances. This includes smart speakers such as the Amazon Echo and Google Home, which, according to 451 Research's most recent VoCUL Smart Home Leading Indicator survey, continue to be the most-owned smart home devices. The trend is now extending into the enterprise, but unlike the consumer segment, adoption is driven by use cases rather than devices. In this report, we look at six companies – Apprente, AWS, Cisco, Microsoft, Orion Labs and Voicea – targeting use cases where voice-driven HCI can be an alternative to keyboard or touchscreen interfaces.
The 451 Take
The vendors and use cases covered in this report are good examples of the impact that voice-enabled applications can have in the workplace. Unlike the consumer segment, early deployments show enterprise adoption is influenced by use cases where voice-driven HCI can be an alternative to keyboard or touchscreen interfaces. Advances in natural language understanding and voice synthesis give developers the flexibility to choose the best fit for HCI. This opens many opportunities for use cases where data entry remains a challenge. For example, some end users, such as healthcare practitioners and first responders, are unable to use a keyboard, and some interfaces, such as wearable devices, are not well suited for typing. We expect to see more enterprise applications with conversational interfaces in 2019, particularly where mobility has a direct impact on productivity and where a voice interface is the best fit for data input or real-time collaboration.
Organizations are becoming increasingly aware of the benefits that voice-activated interfaces can have for improving workforce productivity. 451 Research's VoCUL Digital Transformation survey shows it is one of the top disruptive technologies that organizations plan to adopt: 37% of early adopters of digital transformation initiatives plan to adopt voice-activated interfaces within the next 24 months, compared to 23% for all respondents (Figure 1).
Figure 1: Voice-activated interfaces are among the top disruptive technologies for early adopters of digital transformation
Source: 451 Research's Voice of the Connected User Landscape: Digital Transformation survey, 2H 2018
Use Cases Driving Adoption of Voice-Enabled Enterprise Applications
Speech recognition is now a standard feature for laptops, tablets and smartphones, as well as a growing number of devices including desk phones and meeting room equipment. With Alexa, Cortana, Google Assistant and Siri available on many of the devices used in the workplace, voice-enabled applications are already available to employees. Unlike the consumer segment, however, enterprise adoption of voice-enabled applications is not device-driven. Rather, early deployments show it is largely influenced by worker type and workflow requirements.
In the past two years, several vendors – including (but not limited to) Apprente, Avaya, AWS, Cisco, Huawei, Microsoft, Orion Labs, Phobio, Plantronics, RingCentral, Voicea (formerly Voicera) and Vonage – have launched initiatives exploring HCI options beyond conventional keyboard and touchscreen input for enterprise use cases. In addition to horizontal use cases in business communications and meeting room technologies, early deployments include vertical-driven applications in industries such as healthcare, hospitality and construction, as well as use cases for field workers and first responders that can benefit from hands-free device capabilities for data entry, workflow automation, and real-time communications and collaboration (Figure 2).
The use cases covered in this report include the following worker profiles:
- Knowledge workers. Employees whose job involves transforming or managing information. Their work typically involves unstructured processes and outcomes. Examples include software developers, marketing and communications professionals, and sales professionals.
- Task workers. Employees whose job requires handling or consuming information. Their work is defined by structured processes with defined outcomes. Examples include contact center agents, field service technicians and healthcare professionals.
- Service workers. Employees whose job does not typically involve the use of information. Their work tends to be highly structured with discrete, measurable outcomes. Examples include frontline workers in hotels and fast food restaurants, retail workers, warehouse employees and drivers (transportation and logistics).
Figure 2: Enterprise use cases for voice-enabled enterprise applications
Improving Workforce Productivity with Intelligent Assistants
The most visible examples of enterprise voice-enabled applications are the intelligent assistants that are finding their way into meeting rooms and audio and video conferences, targeting team collaboration use cases for knowledge workers.
Microsoft Cortana
When it comes to AI voice-enabled enterprise applications, Microsoft has been ahead of the curve with Cortana, its speech-enabled intelligent assistant. Cortana is integrated with Windows 10 and Microsoft productivity applications, which means it already has a significant footprint in the enterprise. It also supports iOS and Android devices, further extending its reach. At Ignite 2018, Microsoft announced a new Cortana Skills Kit for Enterprise, developed with the Microsoft Bot Framework and the Azure Cognitive Services Language Understanding service. It enables enterprises to build custom skills and agents that use Cortana to improve workforce productivity. It is currently available in private beta.
AWS Alexa for Business
In November 2017, AWS announced Alexa for Business (A4B), bringing its intelligent voice-enabled digital assistant technology to the enterprise. A4B aims to help organizations streamline and automate common business tasks such as dialing into conference calls, setting reminders, and managing agendas and calendars, as well as more complex tasks such as inventory management and querying databases and reports. The service integrates with business applications such as Microsoft Office 365, Google G Suite, Skype for Business, Splunk, Vonage, Webex, Salesforce, ServiceNow and Concur. Additionally, A4B APIs make the service extensible, allowing customers to build integrations with any application in the enterprise. Enrolled users can also link their personal Alexa accounts, gaining access to these applications via voice commands from home on any Alexa-enabled device.
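Custom integrations of this kind are typically delivered as Alexa skills backed by a handler function. The sketch below is illustrative only – the intent name, slot and stubbed room lookup are hypothetical – but the request/response envelope follows the standard Alexa Skills Kit JSON format:

```python
# Minimal sketch of a custom Alexa skill handler (AWS Lambda style).
# "GetRoomStatusIntent", the "Room" slot and the stubbed availability
# lookup are hypothetical; the JSON envelope is the standard Alexa
# Skills Kit request/response format.

def _speech_response(text, end_session=True):
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context=None):
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # Opening the skill without an intent: prompt and keep listening.
        return _speech_response("Which conference room do you want to check?",
                                end_session=False)
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "GetRoomStatusIntent":
            room = intent.get("slots", {}).get("Room", {}).get("value", "that room")
            # A real deployment would query a calendar or facilities
            # system here; the lookup is stubbed out for illustration.
            return _speech_response(f"{room} is free for the next hour.")
    return _speech_response("Sorry, I didn't catch that.")
```

The same handler shape works whether the skill is invoked from a shared Echo device in a meeting room or from an enrolled user's personal device at home.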
Cisco Webex Assistant
In November 2017, Cisco announced Webex Assistant (formerly Cisco Spark Assistant), an AI speech-enabled voice assistant for the enterprise. Initially available in the Cisco Webex Room Series portfolio, including the Webex Room 70 video system, Webex Assistant is a 'digital team member' capable of voice, visual and text interactions, and able to access enterprise knowledge. It aims to support team collaboration and enhance the meeting experience by performing tasks such as finding and calling people, starting and ending meetings, and providing in-meeting controls. Cisco is exploring expanded capabilities such as assigning action items and creating meeting summaries.
Voicea
Voicea is the creator of Enterprise Voice Assistant (EVA), an AI-enabled virtual assistant for meetings, including conference calls, in-person meetings and online meetings. EVA aims to help people have more productive meetings with faster follow-up and activation of important elements. Voicea allows users to attend only the meetings in which they need to participate, while still reviewing notes from the meetings they skipped. EVA listens, records and takes meeting notes, extracting highlights such as action items and decisions. Users can have EVA schedule follow-up reminders or push highlights to other collaboration tools for follow-up. EVA aims to become part of the meeting, enabling users to focus on the conversation without worrying about getting the information where it needs to go. Voicea believes AI can help knowledge workers spend less time taking notes and focus their attention on getting work done efficiently.
EVA can email attendees in advance to remind them of the meeting. Afterward, it sends a meeting recap with key takeaways and action items, or delivers these notes to a third-party system. Users can log in to the dashboard to customize what EVA listens for, perform keyword searches and recall key moments, replay highlights, or share important moments with others. Users can also upload their own audio for transcription or organize meetings into channels for team collaboration, and they can push highlights to their team via email or via plug-ins to tools like Salesforce and Slack. Voicea plans to expand EVA's integration with the collaboration environment, including the ability to manage teams across the organization and manage alerts for meetings where key concepts are being discussed. EVA currently integrates with video conference services such as BlueJeans, Cisco Webex, Google Meet, GoToMeeting, Highfive, Skype, UberConference and Zoom.
Enhancing the Customer Experience with Conversational Agents
Apprente is a startup currently focused on using speech technology to support frontline workers in quick-service restaurants (QSRs), developing systems that automate food ordering at drive-through stations, kiosks and mobile devices. Unlike conventional speech-to-text approaches that transcribe audio signals, its speech-to-meaning technology learns and processes speech signals to infer meaning directly from the audio. The company believes this is a better approach for customer-experience use cases, particularly in noisy environments such as restaurants and public areas, or where customers use colloquial, poorly structured language that conventional speech recognition handles with low accuracy.
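The architectural difference can be sketched in a few lines of Python. Everything below is a hypothetical stub – this is not Apprente's implementation – and only the two pipeline shapes are the point: the conventional route passes through an intermediate transcript, while the direct route maps audio straight to a structured order.

```python
# Conceptual contrast only -- not Apprente's implementation. All three
# lower-level functions are hypothetical stubs standing in for real
# models; only the pipeline shapes matter.

def transcribe(audio: bytes) -> str:
    """Stub acoustic model: audio -> text. In noisy settings this
    intermediate transcript is where accuracy is typically lost."""
    return "large fries please"

def parse_intent(text: str) -> dict:
    """Stub language model: text -> structured order."""
    return {"item": "fries", "size": "large"}

def acoustic_intent_model(audio: bytes) -> dict:
    """Stub end-to-end model: audio -> structured order directly."""
    return {"item": "fries", "size": "large"}

def speech_to_text_pipeline(audio: bytes) -> dict:
    """Conventional two-stage pipeline: transcribe, then parse."""
    return parse_intent(transcribe(audio))

def speech_to_meaning_pipeline(audio: bytes) -> dict:
    """Direct pipeline: infer the order from the audio signal itself,
    with no intermediate transcript to get wrong."""
    return acoustic_intent_model(audio)
```

In the two-stage pipeline, any transcription error propagates to the parser; the end-to-end model avoids that failure mode, which is the advantage the company claims in noisy, colloquial settings.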
Enabling Hands-Free, Heads-Up Communications for Mobile Workers and First Responders
Orion Labs provides a real-time team communications platform with AI-enabled push-to-talk (PTT) functionality delivered over mobile networks and Wi-Fi. The platform enables users to reach one, some or all of their team members with the touch of a button using iOS and Android smartphones or other connected devices. It operates via voice-enabled commands and integrates with business systems as well as third-party platforms such as Google Assistant, Microsoft's Cortana and Amazon's Alexa. Other key features include access to the real-time location of team members and workflow automation. The company's portfolio includes Onyx, a round clip-on walkie-talkie device that pairs via Bluetooth to the Orion mobile app on the user's smartphone, and Orion Sync, a compact, rugged wearable device with LTE connectivity that provides multiple communication functions, including PTT, a voice-driven action button, and EMR/panic and voice command capabilities.
Key verticals and use cases for Orion Labs include those that have traditionally relied on PTT and two-way radios, including hospitality, transportation, healthcare, manufacturing, retail, construction and public safety. Named clients include Kennesaw Pediatrics, a healthcare provider based in Kennesaw, Georgia; Eastern PA Tactical EMS, a paramedic service in Leesport, Pennsylvania; Subverse Industries, a fashion and clothing company; and the Parks, Beaches and Recreation Department in Pacifica, California.