How to Build Amazon Alexa Skills That Sound Natural

We’re simple beings. When we say things, we like to be understood. When other people say things, we like to understand them. Whether it’s a living, breathing human or an Amazon Echo, the best conversations are the ones we don’t have to think too much about.

Our new course, “Conversational Design with Alexa,” makes it easier to build Alexa skills that leave users feeling understood. Like the first Alexa courses we released last year, this new series was co-created with the Alexa experts at Amazon who know the subject best. Together, we want to help you build Alexa skills that feel as natural as talking to a friend.

Why? Well, there are millions of voice-based interactions happening with smart home devices and smartphones every day. That means any Alexa skill you develop has the potential to be adopted on a massive scale, but only if people feel comfortable with the voice interface you’ve designed. Read on to learn about the principles that define a good voice user interface, where the field is going next, and how you can level up and play a part in that evolution.

What’s VUI?

Let’s start with some context. Voice User Interface (VUI) refers to an input method that “allows people to use voice input to control computers and devices.” Just a decade ago, applications of VUI were mostly limited to systems that “were capable of understanding human speech over the telephone in order to carry out tasks.” These were a major innovation in their own right, but they had their limitations—think of automated bank operators that asked you to say “account balance” if you wanted to check your account balance. They could be useful sources of information, but using them felt too much like talking to a robot.

With Alexa devices like the Amazon Echo in homes everywhere, we’ve seen VUI start to come into its own over the past five years. No longer quite as robotic, it’s a booming field with nearly limitless potential. Developers are proving that you can already build smart, adaptive skills that handle everything from high-level personal banking to conversations with Pikachu.

The IBM Shoebox, an early 1960s machine that could perform basic arithmetic on voice command. (Image via IBM Archives)

The potential of VUI is immense because, when well-designed, there’s no learning curve. We’re always hungry for information, and we want to get that information in ways that come naturally to us. Very few things come more naturally than the inputs-and-outputs of conversation. In the words of Cathy Pearl, author of “Designing Voice User Interfaces,” “imagine being able to create technology and not needing to instruct customers on how to use it because they already know: they can simply ask.”

This is exactly what makes designing skills that users feel comfortable with such a challenge for Alexa developers. Most of us are willing to learn how to navigate clunky website and mobile app layouts because even the best-designed visual interfaces take some getting used to; it’s an expected part of the process. The same is not true for voice-based interactions, which we’re immersed in from the very beginning of our lives. In other words, “the more you make users deviate from their normal conversational patterns, the more difficult the interaction will be.”

That’s why it’s so important for anyone developing an Alexa skill to account for the nuances of human conversation. The days of saying “account balance” to check your account balance are long gone. We want to talk to Alexa devices like we’d talk to anyone else, with room to improvise and think on the fly. It’s the responsibility of developers to make that possible.

What’s entity resolution?

That’s where entity resolution comes in. Entity resolution gives Alexa developers the ability to define synonyms within the Alexa service. This feature wasn’t available when our first Alexa courses launched, and it’s a massive step toward building skills that simulate authentic conversation.

Defining synonyms for "rain" with entity resolution. (Image via Alexa Blogs)

Say you’re building an Alexa skill that mimics talking to your friend Jesse, an NBA junkie who knows everything about the league. If you wanted to ask Jesse how many points LeBron James scored in a recent game against the Detroit Pistons, how might you ask the question?

Jesse, how many points did LeBron James score against the Pistons last night?
Jesse, how many points did LeBron score against the Pistons last night?
Jesse, how many points did King James score against the Pistons last night?

For people to feel comfortable using your skill, they’ll want to ask Alexa in the same way(s) they would any other NBA nerd in their life. That means not having to think twice about the exact phrasing of the question, knowing that their friend will understand what they mean and respond accordingly.

Entity resolution makes it easier to build a skill that mimics natural human recognition patterns by accounting for synonyms. In the example above, “LeBron James,” “LeBron,” and “King James” are all processed identically, taking the pressure off users to ask the question in one specific way.
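To make this concrete, here’s a rough sketch of how that slot type might be defined in the skill’s interaction model. The slot-type name and id below are hypothetical (not from the course), written as a TypeScript object literal in the shape Alexa’s interaction model expects:

```typescript
// Hypothetical custom slot type for the Jesse example. Every synonym
// resolves to the same canonical id and name, so the skill's code only
// ever has to deal with one representation of the player.
const nbaPlayerSlotType = {
  name: "NBA_PLAYER",
  values: [
    {
      id: "LEBRON_JAMES", // the canonical id your skill code receives
      name: {
        value: "LeBron James",
        synonyms: ["LeBron", "King James"],
      },
    },
    // ...other players would be defined the same way
  ],
};
```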

When you dig into it, you start to realize the incredible number of ways a person could ask this or any other simple question. The thought of accounting for every one of those variations ahead of time can feel intimidating, but entity resolution lets developers address them with ease.

As Stephanie Hay, leader of the design team that created Capital One’s Alexa skill, has put it, “while we may not be able to predict every potential rabbit hole, we need to at least design an infrastructure that mimics how conversations work and are contextually driven.” Accounting for synonyms via entity resolution frees up developers to build that infrastructure for their Alexa skills.
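At runtime, Alexa attaches its resolution results to each slot on the incoming intent. Here’s a minimal handler sketch, assuming the hypothetical NBA_PLAYER slot type above, a made-up PlayerPointsIntent with a player slot, and the ASK SDK for Node.js:

```typescript
import { HandlerInput, RequestHandler } from "ask-sdk-core";
import { IntentRequest } from "ask-sdk-model";

// Hypothetical handler: answers "how many points did ___ score?"
const PlayerPointsIntentHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    const { request } = handlerInput.requestEnvelope;
    return request.type === "IntentRequest"
      && request.intent.name === "PlayerPointsIntent";
  },
  handle(handlerInput: HandlerInput) {
    const request = handlerInput.requestEnvelope.request as IntentRequest;
    const slot = request.intent.slots?.player;

    // Entity resolution results arrive under resolutionsPerAuthority.
    // Whether the user said "LeBron", "King James", or "LeBron James",
    // a successful match yields the same canonical id and name.
    const resolution = slot?.resolutions?.resolutionsPerAuthority?.[0];
    if (resolution?.status.code === "ER_SUCCESS_MATCH") {
      const player = resolution.values[0].value; // { id, name }
      return handlerInput.responseBuilder
        .speak(`Looking up last night's points for ${player.name}.`)
        .getResponse();
    }

    return handlerInput.responseBuilder
      .speak("Sorry, I don't know that player yet.")
      .getResponse();
  },
};
```

Because the handler keys off the resolved id rather than the raw utterance, supporting a new nickname later is a one-line addition to the interaction model, not a code change.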

In “Conversational Design with Alexa,” you’ll learn how to take full advantage of this feature. By the end of the course, you’ll be able to build skills that account for synonyms with ease, putting you one major step closer to truly conversational skills.

What’s the future of VUI?

It’s an exciting time to be a VUI developer. For one, more people are interacting with their devices by voice alone every day. comScore predicts that 50% of all searches will be voice searches by 2020.

With this shift has come an explosion of VUI development. Browse the Alexa Skills Store to get a sense of the range of VUI applications that developers all over the world are racing to perfect.

The Alexa Skills Store.

Mass adoption and optimization across the board will only drive up demand for Alexa skills that provide natural voice experiences. As Brooke Hawkins, Voice User Interface Designer at Emmi, has put it, “as users begin to trust their VUIs, they start using more complex responses and phrases that they expect the system should understand.” The supply of and demand for well-designed Alexa skills will continue to drive each other.

Any Alexa developer willing to take on the challenge of building skills that feel seamless to interact with should be prepared for an iterative process, with skills and their use cases building on one another. “If we are learning from the use cases we’ve designed in one [VUI], then we can more quickly nail it for different kinds of people,” says Stephanie Hay. That’s why we’re developing two more courses for the Conversational Design with Alexa series that will dig into dialog management and more advanced voice design principles.

Combined with our existing Alexa courses, these new courses will bring you closer to building skills that feel natural to interact with, and that people will have fun using. You’ll have the skillset to build anything you’re able to dream up.

With tens of millions of Alexa-enabled devices in homes around the world, any skill you build has the potential to make a major impact. Be a part of the generation that defines what’s possible with a voice user interface and learn Conversational Design with Alexa.

Get more practice, more projects, and more guidance.