Let’s talk about the voice-user interface


Personal computers began with a screen and a keyboard. Then we started pointing and clicking with a mouse, which was followed by touching our screens directly with our fingertips. And now, we can take an entirely hands-off approach thanks to the voice-user interface.

Although voice input isn’t new, personal computers have never been able to process spoken language as accurately and quickly as they can today, thanks to the incredible cloud computing power now available. With services from Apple, Google, Nuance, Amazon, and others being offered to developers essentially for free, the commoditization of this power will redefine the way we interact with our devices.

What are the opportunities?

According to Amazon’s Alexa Skills Kit documentation, “Natural user interfaces, such as those based on speech, represent the next major disruption in computing.” This claim comes as no surprise given the early success of Amazon’s Echo device, and it’s obviously in Amazon’s best interest to grab the lion’s share of developers invested in voice interfaces, but this isn’t just self-serving hyperbole. The current state of always-connected devices, enormously powerful cloud computing, and natural language processing means there are many opportunities for technology companies not only to extend existing product lines, but also to create entirely new and novel use cases.

The voice-user interface can be added to or reimagined for existing applications that are suddenly able to take advantage of natural language voice controls. Instead of providing a short list of simple commands that must be spoken a certain way, applications can now understand how people really talk. Existing connected home hardware with a corresponding app could gain a voice interface through a software update.

New and existing Internet services can be extended with Amazon Alexa capabilities using the Alexa Skills Kit (ASK). ASK lets developers integrate their own services with Amazon’s Echo device, teaching Alexa, the cloud intelligence that drives the Echo, a new skill.
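At its core, a skill built this way is a web service that receives a JSON request describing what the user said and returns JSON describing what Alexa should say back. The sketch below illustrates that request/response cycle without using Amazon's official SDK; the "GetShowtimes" intent and "City" slot are hypothetical examples, not part of any real skill.

```python
# Minimal sketch of the request/response cycle a voice skill handles.
# The intent name "GetShowtimes" and slot "City" are invented for illustration.

def speech_response(text):
    """Wrap plain text in the response envelope the voice service expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

def handle_request(event):
    """Dispatch an incoming voice request to a handler and build a response."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return speech_response("Welcome. Ask me about shows near you.")
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "GetShowtimes":
            # Slots carry the variable parts of the utterance, e.g. a city name.
            city = intent["slots"]["City"]["value"]
            return speech_response(f"Here are tonight's shows in {city}.")
    return speech_response("Sorry, I didn't understand that.")
```

The cloud service handles the hard part, turning speech into the structured intent and slots; the developer's code only has to route that structure to business logic.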

One example Amazon cites is an integration with StubHub, where one can ask Echo about nearby shows using natural language. Amazon has also announced its Alexa Voice Service (AVS) for hardware manufacturers who build connected products with a speaker and microphone; AVS enables natural language processing to control the product. To demonstrate the point, Amazon describes how Wink is integrating AVS into its connected home products to offer voice control.

Amazon has also created the $100 million Alexa Fund to kickstart the community while providing new interfaces for developers to integrate into their products. The funds are available to developers, manufacturers, and startups to help create innovative uses for voice interfaces.

Who are the competitors?

Amazon isn’t the only company flexing its vocal cords. Apple’s Siri remains a closed system, despite developers’ calls for Apple to offer an API, but Nuance, whose speech technology originally powered Siri, is open for business. Siri continues to improve and to make its way into more of Apple’s products, like Apple Watch and the new Apple TV, and it may only be a matter of time before Apple takes the wraps off a developer API after years of teaching Siri how people talk to their devices.

Like Apple, Google made voice a primary user interface in its Android Wear platform. Unlike Apple, Google also provides an API for developers to add their own voice actions. Besides using voice to launch custom actions, developers can call Google’s built-in speech recognizer to turn users’ speech into text for parsing.
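Once a platform's speech recognizer has turned an utterance into plain text, mapping that text to an action is still the application's job. Here is a minimal, hypothetical sketch of that last parsing step; the trigger phrases and action names are invented for illustration and no platform API is involved.

```python
# Hypothetical command table mapping trigger phrases to app actions.
COMMANDS = {
    "lights on": "LIGHTS_ON",
    "lights off": "LIGHTS_OFF",
    "lock the door": "LOCK_DOOR",
}

def parse_transcript(transcript):
    """Return the first action whose trigger phrase appears in the transcript,
    or None if nothing matches."""
    text = transcript.lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None
```

Substring matching like this is the crude, old-style "short list of commands" approach; the appeal of the newer natural language services is that they replace this table with models that understand many phrasings of the same request.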

On diminutive platforms like Android Wear and Apple Watch, where screen real estate is at a premium and full keyboards are impractical, voice is the only input interface that makes sense beyond simple taps and turns.

Other vendors trying to entice developers and manufacturers include api.ai, which is backed by Intel and Motorola, and wit.ai, which was acquired by Facebook earlier this year. Both services promise the ability to extract user intent from natural speech and turn it into actionable data.
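"Actionable data" here means a structured result: an intent label, a confidence score, and the extracted entities. The sketch below shows how an application might act on that kind of result; the field names and the "book_table" intent are illustrative assumptions, not the actual response format of api.ai or wit.ai, which varies by service.

```python
# Illustrative shape of an NLU result for an utterance like
# "book me a table for two at seven". Field names are assumptions.
nlu_result = {
    "intent": "book_table",
    "confidence": 0.92,
    "entities": {"party_size": 2, "time": "19:00"},
}

def act_on_intent(result, threshold=0.7):
    """Route a recognized intent to application logic, or ask the user again
    when the service wasn't confident enough in its interpretation."""
    if result["confidence"] < threshold:
        return "Sorry, could you rephrase that?"
    if result["intent"] == "book_table":
        e = result["entities"]
        return f"Booking a table for {e['party_size']} at {e['time']}."
    return "I can't help with that yet."
```

The confidence check matters in practice: voice interfaces feel broken when an app confidently acts on a misheard request, so low-confidence results are usually bounced back to the user.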

With so many options from behemoths like Amazon, Google, and Nuance, untapped potential with Apple’s Siri, and nimble newcomers like api.ai, designers, developers, and manufacturers now have inexpensive tools to take a significant step forward in voice-user interface design.

How will we benefit?

While computing has made incredible leaps and crossed many significant milestones over the last seven decades, the watershed moments where everything changes have been much rarer.

Arguably, these watershed moments occur when the user interface to computing changes in some significant way, making computing accessible to millions, even billions, more people.

The screen and keyboard. The mouse and point-and-click. The touch screen. And now, voice.

The commoditization of the incredible computing power that makes the voice-user interface possible is that next watershed moment. As developers use this technology to teach devices and services to communicate meaningfully with humans on their own terms and in their own language, every application that uses it will benefit from the knowledge gained by every application that came before. The result will be a significant evolutionary leap forward for computing as a platform for intelligence, setting the stage for the watershed moment after this one, perhaps designed entirely by a computer itself.
