Six questions for Dustin Coates, author of Voice Applications for Alexa and Google Assistant

Interview conducted by Frances Lefkowitz, Development Editor, Manning Publications

Dustin A. Coates is a web developer who has been programming for and writing about voice-first devices since the Alexa Skills Kit was first released. He shares his knowledge online at talkingtocomputers.com. He’s a Google Developers Expert for Assistant, and is Voice Search Go-To-Market Lead at Algolia.

 

What exactly does “voice-first” mean?
 

Sometimes you’ll see people using voice-first in the same context as mobile-first: start by planning and developing for voice first, and then move on to building a website or mobile app. I don’t find this meaning very useful, because it obscures the fact that building voice applications is different from building for other platforms; voice applications are not simply scaled-down versions of websites or mobile apps.
Another meaning for voice-first, which I find much more enlightening, describes how people interact with voice applications. I bucket these interactions into voice-only, voice-first, and voice-added. These categories live on a spectrum and refer to the degree to which voice is the primary mode of interaction. For example, speaking to Google Assistant through headphones would be a voice-only experience because voice is the only input and output. A voice search input on a website might be voice-added if the primary interaction is still through the screen. And speaking to your Echo device is a voice-first experience.

 

How do these apps differ from chatbots, and how does the development process compare?
 

Chatbots and voice apps both use conversational interactions to achieve a goal. The primary difference for most developers is not in the tech, but in the design. Voice is naturally a less precise mode of interaction than text. Developers need to account for misunderstandings on both sides of the conversation that arise from the aural nature of voice. (Though they don’t need to worry about misspellings, as they do for chatbots!)
Another significant difference is that voice always moves forward at a single, fixed rate. Users can’t scan a response for what they want, so what the application says needs to be concise, on-topic, and complete in covering the request, while hewing to the conversational best practices that we all learn from birth. This is true for chatbots, too, but it takes on heightened importance with voice.

 

What new kinds of skills, and new ways of thinking, do developers need for the general movement toward more voice and conversational apps?
 

Developers need to start thinking like script writers. There are myriad “rules” of conversation that we never think about explicitly, but they make interactions between people efficient. Four of the most important are Grice’s maxims, which say that any response in a conversation is expected to be of the right quantity, quality, relation, and manner. To pull out just one of these maxims, think of people you know who are overly taciturn or who go on for too long. Both are flouting the maxim of quantity.
Still, these maxims are strong enough that even the flouting thereof is generally assumed to be a message. If your friend asks how your date went and you reply that you enjoyed the evening news, the assumption is not that you’re changing the subject. Instead, the date must have gone poorly enough that you were home by a reasonable hour.

 

Your book teaches readers to build apps that piggyback on the biggest voice-first platforms, like Alexa and Google Assistant. Do these apps have their own voices and personalities, or do they just use the platform’s?
 

Yes, the voice-first platforms use two kinds of apps. First-party apps are the ones built into the platform, and they wholly assume the assistant’s personality. Third-party apps are those created by developers like the ones reading my book. Third-party apps do not need to take on the personality of the assistant on the platform. In fact, both Alexa and Google Assistant give developers the option to use different computer-generated voices.

 

Can you give me an idea of the kinds of apps you’re talking about?
 

The best voice apps are the ones that complete a task more efficiently than other modes can. Controlling a television or lights is easier by voice, as is getting a small nugget of information. Voice can also lend itself well to games and other entertainment. Alas, voice is still not the best mode for purchasing, though both Amazon and Google are working to make it easier.

 

You’ve spent some time in France. How do you say, “Hey, Google” in French?
 

You can say “Okay, Google” just like in the other locales.