Scale your design
Multimodal design
Anatomy of a response
Conversational components
Conversational components are combined to compose the content in the spoken prompts, display prompts, and chips.
Conversational components (prompts and chips) should be designed for every dialog turn.
Spoken prompt | The content your Action speaks to the user, via TTS or pre-recorded audio
Display prompt | The content your Action writes to the user, via printed text on the screen
Chips | Suggestions for how the user can continue or pivot the conversation
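As a rough illustration, the three conversational components can be modeled as one record per dialog turn. This is a minimal sketch, not an Actions on Google type: the class name `ConversationalComponents`, its fields, and the helper `missing_components` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConversationalComponents:
    """Hypothetical container for one dialog turn (not a platform API)."""
    spoken_prompt: str   # what the Action speaks, via TTS or pre-recorded audio
    display_prompt: str  # what the Action writes on the screen
    chips: List[str] = field(default_factory=list)  # ways to continue or pivot


def missing_components(turn: ConversationalComponents) -> List[str]:
    """Return the names of the components that are empty for this turn."""
    missing = []
    if not turn.spoken_prompt.strip():
        missing.append("spoken_prompt")
    if not turn.display_prompt.strip():
        missing.append("display_prompt")
    if not turn.chips:
        missing.append("chips")
    return missing
```

A check like this could run over a draft dialog to flag turns where a prompt or the chips have not been designed yet.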
Visual components
Visual components include cards, carousels, and other visual assets.
Perfect for scanning and comparing options, visual components are useful if you're presenting detailed information—but they aren't required for every dialog turn.
Basic card | Use basic cards to display an image and text to users.
Browsing carousel | Browsing carousels are optimized for allowing users to select one of many items, when those items are content from the web.
Carousel | Carousels are optimized for allowing users to select one of many items, when those items are most easily differentiated by an image.
List | Lists are optimized for allowing users to select one of many items, when those items are most easily differentiated by their title.
Media response | Media responses are used to play and control the playback of audio content like music or other media.
Table | Tables are used to display static data to users in an easily scannable format.
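Three of the components above (browsing carousel, carousel, and list) all let users select one of many items, and differ only in what best distinguishes those items. The sketch below encodes that choice as a lookup. The enum `ItemTrait` and the function `pick_selection_component` are hypothetical names used only for illustration; they are not part of any SDK.

```python
from enum import Enum


class ItemTrait(Enum):
    """What best distinguishes the items the user is choosing between."""
    WEB_CONTENT = "web content"
    IMAGE = "image"
    TITLE = "title"


# Mapping taken from the component descriptions above.
_SELECTION_COMPONENTS = {
    ItemTrait.WEB_CONTENT: "browsing carousel",
    ItemTrait.IMAGE: "carousel",
    ItemTrait.TITLE: "list",
}


def pick_selection_component(trait: ItemTrait) -> str:
    """Return the name of the selection component suited to the given trait."""
    return _SELECTION_COMPONENTS[trait]
```

For example, `pick_selection_component(ItemTrait.TITLE)` returns `"list"`. Basic cards, media responses, and tables are left out because they are not described as selection components.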
Group devices by the components used for the response:
Go from spoken to multimodal
Relationship between prompts
In general, spoken prompts are optimized for and follow the conventions of spoken conversations. Display prompts are optimized for and follow the conventions of written conversations. Although slightly different, they should still convey the same core message.
Design prompts for both the ear and the eye. It’s easiest to start with the spoken prompt, imagining what you might say in a human-to-human conversation. Then, condense it to create the display prompt.
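One way to catch display prompts that were not condensed is a simple word-count comparison. This is only a rough proxy under a stated assumption (fewer or equal words means "condensed"). It says nothing about whether the two prompts convey the same core message, which still needs a human review. The function name `is_condensed` and the sample prompts are invented for this sketch.

```python
def is_condensed(spoken_prompt: str, display_prompt: str) -> bool:
    """Rough proxy: the display prompt uses no more words than the spoken prompt."""
    return len(display_prompt.split()) <= len(spoken_prompt.split())


# Invented sample prompts, for illustration only.
spoken = "Okay, I found three flights to Boston that leave tomorrow morning. Which one would you like?"
display = "3 flights found. Which one?"
condensed = is_condensed(spoken, display)
```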
Say, essentially, the same thing
Do.
Don't.
Display prompts should be condensed versions of their spoken counterparts
Do.
Don't.
Keep the voice and tone consistent
Do.
Don't.
Design spoken and display prompts so they can be understood independently
Do.
Don't.
Relationship between components
Remember that all the components are meant to provide a single unified response.
It’s often easiest to start by writing prompts for a screenless experience, again imagining what you might say in a human-to-human conversation. Then, imagine how the conversation would change if one of the participants were holding a touchscreen. What details can now be omitted from the conversational components? Typically, the display prompt is significantly reduced, since the user can comprehend the information in the visual just as easily as in the display prompt. Group the information so that the user doesn’t have to look back and forth repeatedly between the display prompt and the visual.
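The screenless-first approach can be sketched as a function that builds the same response for a device with or without a screen. This is a minimal sketch under assumptions: the `Response` class, the `build_response` function, the `has_screen` flag, and the prompt wording are all hypothetical and do not correspond to a platform API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical unified response for one dialog turn."""
    spoken_prompt: str
    display_prompt: Optional[str] = None
    visual: Optional[List[str]] = None  # e.g. items shown in a list or carousel


def build_response(question: str, options: List[str], has_screen: bool) -> Response:
    if not has_screen:
        # Screenless: the spoken prompt has to carry the options and the question.
        listed = ", ".join(options)
        return Response(spoken_prompt=f"I found {listed}. {question}")
    # With a screen: the details move to the visual, and both prompts
    # still include the question so each can be understood on its own.
    return Response(
        spoken_prompt=f"Here are {len(options)} options. {question}",
        display_prompt=question,
        visual=list(options),
    )
```

With a screen, the display prompt shrinks to the question because the options already appear in the visual, so the user does not have to read the same details twice.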
Always include the question in the prompts
Do.
Don't.
Avoid redundancy
Do.
Don't.
Give the short answer in the prompts, and the details in the visuals
Do.
Don't.
Even when the visuals provide the best answer, make sure the prompts still carry the core of the message
Do.
Don't.
Encourage users to select from lists or carousels, but allow them to continue with their voice
Do.
Don't.