By Shayna Stewart & Amit Garg | July 8, 2019

To say the stakes with voice interactions are high would be an understatement. This is the moment for voice technology.

Voice has the power to capture attention like never before because it hooks directly into the way people think. It removes the friction of reading, clicking and translating that other technologies require. 

However, voice-based AI is highly constrained in terms of what it can and cannot do:

  1. Voice can only perform tasks that it was programmed to do, which can result in inaccuracies
  2. The user is not aware of all the potential tasks that can be completed
  3. Obviously, voice is not suitable for tasks that require sight 

These constraints leave voice design with very little room for error.

At YML, we think about building products (including voice projects) in the form of an infinity loop: a repeating series of moments that are continuously optimized as you learn more.

Below we outline how UX and Data Strategists can partner in each moment of a voice-based AI project to reduce the risk of voice AI going terribly wrong.

1. DEFINE - Align tone and personality of the conversation

Having a clear, distinct vision is essential for voice.

The utility of the conversation is the most important part of the vision. At this point in time, voice-based AI has not mastered the art of casual conversation, where it can react to what a person has said and respond with the compliments and relatability they anticipate hearing.

AI is trained on smaller verbal tasks and then carries what it learns from those tasks over to related ones (though progress is being made there).

UX should define the utility, personality and context of the conversation. Why is this interaction important to have? At what point should the conversation happen, particularly if the conversation is prompted by another interaction? What is the intended outcome of the conversation? What qualities of our brand will this voice represent?

We must provide evidence for why voice is the right channel to design for in a given interaction, especially by understanding its context. For example, a bedroom voice interface that reduces volume to 25% at late night and is less wordy understands we don’t want loud robotic voices at midnight. Practically, a feature like that could be documented in a user flow.
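The bedroom example above could be encoded as a simple context rule. The following sketch is purely illustrative; the function names, the 22:00–06:00 window, and the 25% volume figure from the example are assumptions for demonstration:

```python
from datetime import time

# Hypothetical rule: late at night, speak quietly and keep responses short.
LATE_NIGHT_START = time(22, 0)
LATE_NIGHT_END = time(6, 0)

def is_late_night(now: time) -> bool:
    """True between 22:00 and 06:00, a window that spans midnight."""
    return now >= LATE_NIGHT_START or now < LATE_NIGHT_END

def response_settings(now: time) -> dict:
    """Pick output volume and verbosity based on the time of day."""
    if is_late_night(now):
        return {"volume": 0.25, "verbosity": "terse"}
    return {"volume": 1.0, "verbosity": "normal"}
```

A rule like this is exactly the kind of contextual behavior that belongs in the documented user flow, so design, data, and engineering share one definition of "late night."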

Tone and personality are not something a data person typically creates (or even thinks about in the real world). In voice, however, they are a critical element, because the personality is actually a data requirement. For example, the answers to the following questions are data requirements:

  • How human does the voice need to sound?
  • How should the AI respond?

Data strategists should also start identifying any current or potential datasets that may be relevant to help train the AI in the subsequent product phases. They will need to work with a variety of teams to collect that data and get it into a format that can be easily used during training.

2. DESIGN - Examining vocal vs graphic UIs

Fundamentally, the way we think about the design of sound is different from sight. 

Of course there’s overlap, but it’s interesting to take a closer look. For example, a designer strives for visual consistency. Repetition and visual hierarchy help us stay organized when looking at an interface. 

But with speech, that kind of repetition gets rather annoying. Therefore, we should think of the journey as a carefully crafted conversation full of familiar variety.

In an app or website, the way people interact is relatively constant. GUI interaction occurs in a fairly regular rhythm of cognitive load. Mentally navigating the interface, reading text, and executing tasks requires a sustained level of attention throughout. 

It’s a very different situation for voice. 

People make the first move, unprompted, and the system responds immediately. And, due to the transient quality of sound, people need to give their full attention to process the response. The luxury of closing an alert dialog without reading it on a GUI is not afforded by voice, nor is the action of reading and re-reading information. Instead, our full attention is required during voice interaction, and absolutely no attention when not interacting. 

Therefore, voice experiences should feel like a conversation - an interaction that we want to give more of our attention to when it matters - in order to have the highest likelihood of re-engagement.

UX should define a framework for the desired flow of the conversation.

Similar to designing for GUIs, the overall flow of the interaction needs to be designed, as do the user intents the system should recognize.

UX should be asking questions like:

How can we remove friction in the process? Is this how someone would actually think about this interaction? Is the system doing everything possible to pick up on the nuances of speech and trying to move the conversation forward?

Even if the engineering team leverages machine learning techniques to let the AI learn the conversation flow on its own, this framework will help the team identify whether it is producing the intended results. 

For example, at YML we recently worked with a Fortune 500 insurance company to reimagine their self-service digital strategy, which involved a concept for using in-home voice assistants to handle basic transactions like paying a bill.

Along each step of the conversation, we outlined how the system should move the conversation forward by capturing user intent, setting the variables of intent, and the next action to be taken - all packaged into a helpful and professional voice that emanates confidence and security.
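One step of that kind of bill-payment conversation could be sketched as a small state machine: capture the intent, fill the slot variables, and pick the next action. This is a minimal illustration, not the actual system we built; the intent names, slot names, and prompts are all assumptions:

```python
# One conversation turn for a hypothetical bill-payment flow.
def handle_turn(state: dict, intent: str, slots: dict) -> tuple:
    """Capture intent, fill slot variables, and decide the next action."""
    if intent == "PayBill":
        # Keep any slot values the speech recognizer managed to capture.
        state.update({k: v for k, v in slots.items() if v is not None})
        missing = [s for s in ("account", "amount") if s not in state]
        if missing:
            # Ask only for what is still unknown, keeping the turn short.
            return state, f"Which {missing[0]} would you like to use?"
        return state, f"Paying ${state['amount']} from {state['account']}. Confirm?"
    if intent == "Confirm" and "amount" in state:
        return state, "Done. Your payment is scheduled."
    return state, "Sorry, I can help you pay a bill. What would you like to do?"
```

The value of writing the flow down this explicitly is that UX, data strategy, and engineering can all point at the same turn and agree on what "moving the conversation forward" means.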

The data strategy team should partner with UX to understand the basic conversation framework and then work with the engineering team to understand their methodology for building the dialog model.

The data strategy team member will need to be able to translate the constraints of each methodology, whether it's rule-based or machine learning-based.

For example, Amazon Alexa skills experts recommend that the conversation have no hierarchy.

This makes sense when designing skills, which are typically single-use products. A flat structure keeps questions from having to pass through a menu-like conversation (think of the first line of defense at almost any credit card company's call center, where you answer a multitude of yeses and nos before finally being directed to someone).

The implications of having no hierarchy, though, are that:

  • The skill does not have to go through a rule-based system to answer the question (positive) 
  • However, the conversation can get repetitive, leading to an outdated dataset from which the AI makes conversation, reducing engagement over time (negative) 
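The contrast between a flat and a menu-like structure can be made concrete with a toy router. Both functions below are hypothetical sketches; the intent names, phrases, and menu tree are invented for illustration:

```python
# Flat routing: any request can be answered in one turn, no menu to walk.
FLAT_INTENTS = {
    "pay my bill": "PayBill",
    "check my balance": "CheckBalance",
    "talk to an agent": "AgentHandoff",
}

def route_flat(utterance: str) -> str:
    """Match the utterance directly to an intent, with a fallback."""
    for phrase, intent in FLAT_INTENTS.items():
        if phrase in utterance.lower():
            return intent
    return "Fallback"

# Hierarchical routing: the user walks the menu one level per turn,
# like a call center's first line of defense.
MENU_TREE = {"root": ["billing", "accounts"], "billing": ["PayBill"]}

def route_menu(path: list) -> list:
    """Follow the chosen menu branches; any wrong turn is a dead end."""
    node = "root"
    for choice in path:
        if choice not in MENU_TREE.get(node, []):
            return ["Fallback"]
        node = choice
    return MENU_TREE.get(node, [node])
```

The flat version answers "I want to pay my bill" in a single turn; the menu version needs the user to say "billing" first, which is exactly the repetition voice users find annoying.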

This synthesis of the UX framework and engineering approach is important in this step because it will provide input on how to evaluate the success, the learning methodology and optimization strategies post-launch.

3. DEVELOP - Bringing the vision to life and defining metrics

This step is owned by the engineering team, but it should entail regular meetings with both UX and data strategy to ensure that the assumptions being made are in line with the overall vision.

This is also where the AI starts to learn from the team.

Defining conversation flows requires defining trigger words that move the task forward. These are documented in the user flow and serve as launching points for a task.
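In its simplest form, a trigger word is just a mapping from a word to the task it launches. The words and task names below are hypothetical placeholders for whatever the user flow documents:

```python
# Hypothetical trigger words that launch a task, as documented in a user flow.
TRIGGERS = {"pay": "start_payment", "balance": "show_balance"}

def detect_trigger(utterance: str):
    """Return the task launched by the first trigger word found, if any."""
    for word in utterance.lower().split():
        if word in TRIGGERS:
            return TRIGGERS[word]
    return None
```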

During development, UX can conduct usability testing. The classic task-based metrics (effectiveness, efficiency, and satisfaction) are still relevant here, in addition to qual research (in-home ethnography, surveys, interviews, etc.) to learn how customers respond to the design in context.

Data strategy should be listening in on how the AI is progressing over time. If it is not producing the anticipated results, the data strategy expert will need to evaluate why. The dataset may be biased, or the training data may not reflect the task at hand.

Once the issue is identified, the data strategy expert can make recommendations as to how the dataset should be modified.

Also in this phase, the data strategy expert should outline a measurement strategy for how the AI will be evaluated based on its current progress. Datasets that will evaluate performance also need to be built during the development phase. This measurement strategy should include the workflows and resources needed to update the AI as it encounters new phrases, as this can be a manual process post-MVP launch.
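A measurement strategy like this boils down to a handful of KPIs computed over logged turns, plus a queue of unrecognized phrases for manual labeling. The field names (`outcome`, `utterance`) and outcome values are assumptions for the sake of the sketch:

```python
# Sketch of the KPI computation a measurement strategy might define.
def evaluate(turns: list) -> dict:
    """Compute a task-completion rate and queue unrecognized phrases."""
    completed = sum(1 for t in turns if t["outcome"] == "completed")
    unhandled = [t["utterance"] for t in turns if t["outcome"] == "unrecognized"]
    return {
        "task_completion_rate": completed / len(turns) if turns else 0.0,
        "review_queue": unhandled,  # new phrases to label manually post-MVP
    }
```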

4. DEPLOY - Monitor, measure, and understand

Voice-based AI is a product that needs constant optimization to not only ensure that it continues to work as anticipated, but also to keep audiences engaged.

If the experience loses its initial utility or becomes repetitive, usage of the product will plummet. Teams should be monitoring and optimizing based on the workflows outlined in the data strategy to ensure the sustained quality of the product.

In addition to refining the design, UX can provide insight into why any failures may be happening.

Was there an insight missing from the define period that changed the perspective of the utility of the conversation? Or is there a technical failure happening? Is the developed conversation mismatched from what was designed? Maybe the personality feels off.

All of this needs to be caught as soon as possible and translated into any new requirements for refinement.

To get deeper insight, UX should review transcripts of all conversations, as these provide rich qualitative data to help understand how the product is performing.

The data teams should be analyzing the number of failed conversations, understanding why they failed and making recommendations on how to teach the AI based on these conversations.
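That failure analysis can be sketched as a pass over the transcripts: count the conversations that contain a failed turn and tally why they failed. The transcript schema here (a list of turns, each with a `failed` flag and a `reason`) is an assumption for illustration:

```python
from collections import Counter

def failure_report(transcripts: list) -> dict:
    """Count failed conversations and tally the reasons they failed."""
    failed = [t for t in transcripts if any(turn.get("failed") for turn in t)]
    reasons = Counter(
        turn["reason"] for t in failed for turn in t if turn.get("failed")
    )
    return {"failed_count": len(failed), "top_reasons": reasons.most_common(3)}
```

The `top_reasons` tally is what turns raw failure counts into teaching material: the most common reason is the first thing the AI should be retrained on.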

This is where the measurement strategy workflow outlined in "Develop" comes into play.

The data strategy team member will need to ensure that the workflows increase in efficiency over time by monitoring the AI KPIs. This is what leads to continuous optimization of the product infinity loop.

In Conclusion

The qualities of voice-based AI defined in this process become the underlying identity of a brand.

It’s a high-impact touchpoint that, when it goes wrong, goes really wrong.

Though, it also has the potential to reach people in new ways. It is the personification of a brand and has the potential for businesses to create new relationships with their customers.

To protect your brand from a potentially high-risk situation, partner your UX and Data Strategy teams together.
