Responses

This document describes the types of user interfaces you can present to your users when building Actions. Responses can have an audio component, a visual component, or both.

Simple response

Figure 1. Simple response example (smartphone)

Visually, a simple response takes the form of a chat bubble; audibly, it is TTS or SSML sound.

By default, the TTS text is used as the chat bubble content. If that text reads well on screen, you do not need to specify separate display text for the chat bubble.

Properties

Simple responses have the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.AUDIO_OUTPUT or actions.capability.SCREEN_OUTPUT capabilities.
  • 640 character limit per chat bubble. Strings longer than the limit are truncated at the first word break (or whitespace) before 640 characters.

  • Chat bubble content must be a phonetic subset or a complete transcript of the TTS/SSML output. This helps users map out what you are saying and increases comprehension in various conditions.

  • At most two chat bubbles per turn.

  • Chat head (logo) that you submit to Google must be 192x192 pixels and cannot be animated.
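The 640-character truncation rule above can be sketched as a small helper. This is only an illustration of the documented behavior, not the platform's actual implementation, and `truncateBubble` is a hypothetical name:

```javascript
// Illustrative sketch of the documented rule: strings longer than 640
// characters are truncated at the last word break (whitespace) before the
// limit. Not a Google API; `truncateBubble` is a hypothetical helper.
function truncateBubble(text, limit = 640) {
  if (text.length <= limit) return text;
  const cut = text.lastIndexOf(' ', limit);
  // Fall back to a hard cut if there is no whitespace before the limit.
  return cut > 0 ? text.slice(0, cut) : text.slice(0, limit);
}
```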

Figure 2. Simple response example (smart display)

Sample code

Node.js
conv.ask(new SimpleResponse({
  speech: 'Howdy, this is GeekNum. I can tell you fun facts about almost any number, my favorite is 42. What number do you have in mind?',
  text: 'Howdy! I can tell you fun facts about almost any number. What do you have in mind?',
}));
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
return responseBuilder
    .add(
        new SimpleResponse()
            .setDisplayText(
                "Howdy! I can tell you fun facts about almost any number. "
                    + "What do you have in mind?")
            .setTextToSpeech(
                "Howdy, this is GeekNum. I can tell you fun facts about almost any number, "
                    + "my favorite is 42. What number do you have in mind?"))
    .build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "Howdy, this is GeekNum. I can tell you fun facts about almost any number, my favorite is 42. What number do you have in mind?",
              "displayText": "Howdy! I can tell you fun facts about almost any number. What do you have in mind?"
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "Howdy, this is GeekNum. I can tell you fun facts about almost any number, my favorite is 42. What number do you have in mind?",
                "displayText": "Howdy! I can tell you fun facts about almost any number. What do you have in mind?"
              }
            }
          ]
        }
      }
    }
  ]
}

SSML and sounds

Using SSML and sounds in your responses gives them more polish and enhances the user experience. The following code snippet shows you how to create a response that uses SSML:

Node.js
function saySSML(conv) {
  const ssml = '<speak>' +
    'Here are <say-as interpret-as="characters">SSML</say-as> samples. ' +
    'I can pause <break time="3s" />. ' +
    'I can play a sound <audio src="https://www.example.com/MY_WAVE_FILE.wav">your wave file</audio>. ' +
    'I can speak in cardinals. Your position is <say-as interpret-as="cardinal">10</say-as> in line. ' +
    'Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line. ' +
    'Or I can even speak in digits. Your position in line is <say-as interpret-as="digits">10</say-as>. ' +
    'I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>. ' +
    'Finally, I can speak a paragraph with two sentences. ' +
    '<p><s>This is sentence one.</s><s>This is sentence two.</s></p>' +
    '</speak>';
  conv.ask(ssml);
}
    
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);

String ssmlResponse =
    "<speak>"
        + "Here are <say-as interpret-as=\"characters\">SSML</say-as> samples. "
        + "I can pause <break time=\"3s\" />. "
        + "I can play a sound <audio src=\"https://www.example.com/MY_WAVE_FILE.wav\">"
        + "your wave file</audio>. "
        + "I can speak in cardinals. Your position is "
        + "<say-as interpret-as=\"cardinal\">10</say-as> in line. "
        + "Or I can speak in ordinals. You are "
        + "<say-as interpret-as=\"ordinal\">10</say-as> in line. "
        + "Or I can even speak in digits. Your position in line is "
        + "<say-as interpret-as=\"digits\">10</say-as>. "
        + "I can also substitute phrases, like the "
        + "<sub alias=\"World Wide Web Consortium\">W3C</sub>. "
        + "Finally, I can speak a paragraph with two sentences. "
        + "<p><s>This is sentence one.</s><s>This is sentence two.</s></p>"
        + "</speak>";
return responseBuilder.add(ssmlResponse).build();
    
Actions SDK JSON
{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "initialPrompts": [
                    {
                        "ssml": "<speak>Here are <say-as interpret-as=\"characters\">SSML</say-as> samples. I can pause <break time=\"3s\" />. I can play a sound <audio src=\"https://www.example.com/MY_WAVE_FILE.wav\">your wave file</audio>. I can speak in cardinals. Your position is <say-as interpret-as=\"cardinal\">10</say-as> in line. Or I can speak in ordinals. You are <say-as interpret-as=\"ordinal\">10</say-as> in line. Or I can even speak in digits. Your position in line is <say-as interpret-as=\"digits\">10</say-as>. I can also substitute phrases, like the <sub alias=\"World Wide Web Consortium\">W3C</sub>. Finally, I can speak a paragraph with two sentences. <p><s>This is sentence one.</s><s>This is sentence two.</s></p></speak>"
                    }
                ],
                "noInputPrompts": []
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.TEXT"
                }
            ]
        }
    ]
}

See the SSML reference documentation for more information.

Sound library

We provide a variety of free, short sounds in our sound library. These sounds are hosted for you, so all you need to do is include them in your SSML.
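For example, you can drop a hosted clip into a response with an SSML `<audio>` tag. The clip URL below follows the sound library's URL pattern but is illustrative; pick an actual clip from the library. In fulfillment you would pass the resulting string to `conv.ask()`:

```javascript
// Build an SSML prompt that plays a hosted sound clip. The clip URL is
// illustrative; substitute a real URL from the sound library.
function soundEffectSSML(clipUrl, fallbackText) {
  return '<speak>' +
    'Here comes a sound effect. ' +
    `<audio src="${clipUrl}">${fallbackText}</audio>` +
    '</speak>';
}

// In fulfillment, pass the result to conv.ask(...).
const ssml = soundEffectSSML(
  'https://actions.google.com/sounds/v1/cartoon/cartoon_boing.ogg',
  'a boing sound');
```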

Rich responses

Use a rich response if you want to display visual elements to enhance user interactions with your Action. These visual elements can provide hints on how to continue a conversation.

Rich responses can appear in screen-only experiences or in experiences that combine audio and screen. They can contain components such as simple responses, basic cards, browsing carousels, suggestion chips, and media responses.

Properties

Rich responses have the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.
  • The first item in a rich response must be a simple response.
  • At most two simple responses.
  • At most one basic card or StructuredResponse.
  • At most 8 suggestion chips.
  • Suggestion chips are not allowed in a FinalResponse.
  • Linking out to the web from smart displays is currently not supported.
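The constraints above can be checked mechanically. The following sketch validates a richResponse object, in the webhook JSON shape used by the samples in this document, against them; `validateRichResponse` is a hypothetical helper, not part of any Google API:

```javascript
// Check a richResponse object (webhook JSON shape) against the documented
// rules. Illustrative helper only; returns a list of violations.
function validateRichResponse(richResponse) {
  const items = richResponse.items || [];
  const errors = [];
  if (!items.length || !items[0].simpleResponse) {
    errors.push('The first item must be a simple response.');
  }
  if (items.filter((i) => i.simpleResponse).length > 2) {
    errors.push('At most two simple responses are allowed.');
  }
  if (items.filter((i) => i.basicCard || i.structuredResponse).length > 1) {
    errors.push('At most one basic card or StructuredResponse is allowed.');
  }
  if ((richResponse.suggestions || []).length > 8) {
    errors.push('At most 8 suggestion chips are allowed.');
  }
  return errors;
}
```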

The following sections show you how to build various types of rich responses.

Basic card

Figure 3. Basic card example (smartphone)

A basic card displays information that can include the following:

  • Image
  • Title
  • Sub-title
  • Text body
  • Link button
  • Border

Use basic cards mainly for display purposes. They are designed to be concise, to present key (or summary) information to users, and to allow users to learn more if you choose (using a weblink).

In most situations, you should add suggestion chips below the cards to continue or pivot the conversation.

Avoid repeating in the chat bubble the information already presented in the card.

Properties

The basic card response type has the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.
  • Formatted text (required if there's no image)
    • Plain text by default.
    • Must not contain a link.
    • 10 line limit with an image, 15 line limit without an image. This is roughly 500 characters (with an image) or 750 characters (without an image). Phones with smaller screens also truncate text earlier than phones with larger screens. If the text contains too many lines, it's truncated at the last word break with an ellipsis.
    • A limited subset of markdown is supported:
      • New line with a double space followed by \n
      • **bold**
      • *italics*
  • Image (required if there's no formatted text)
    • All images are forced to a height of 192 dp.
    • If the image's aspect ratio is different from the screen's, the image is centered with gray bars on the vertical or horizontal edges.
    • Image source is a URL.
    • Motion GIFs are allowed.
Optional
  • Title
    • Plain text.
    • Fixed font and size.
    • At most one line; extra characters are truncated.
    • The card height collapses if no title is specified.
  • Sub-title
    • Plain text.
    • Fixed font and font size.
    • At most one line; extra characters are truncated.
    • The card height collapses if no subtitle is specified.
  • Link button
    • Link title is required.
    • At most one link.
    • Links to sites outside the developer's domain are allowed.
    • Link text cannot be misleading. This is checked in the approval process.
    • A basic card has no interaction capabilities without a link. Tapping on the link sends the user to the link, while the main body of the card remains inactive.
  • Border
    • The border between the card and the image container can be adjusted to customize the presentation of your basic card.
    • Configured by setting the JSON string property imageDisplayOptions.
Figure 4. Basic card example (smart display)

Sample code

Node.js
if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}

conv.ask('This is a basic card example.');
// Create a basic card
conv.ask(new BasicCard({
  text: `This is a basic card.  Text in a basic card can include "quotes" and
  most other unicode characters including emoji 📱.  Basic cards also support
  some markdown formatting like *emphasis* or _italics_, **strong** or
  __bold__, and ***bold italic*** or ___strong emphasis___ as well as other
  things like line  \nbreaks`, // Note the two spaces before '\n' required for
                               // a line break to be rendered in the card.
  subtitle: 'This is a subtitle',
  title: 'Title: this is a title',
  buttons: new Button({
    title: 'This is a button',
    url: 'https://assistant.google.com/',
  }),
  image: new Image({
    url: 'https://example.com/image.png',
    alt: 'Image alternate text',
  }),
  display: 'CROPPED',
}));
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
if (!request.hasCapability(Capability.SCREEN_OUTPUT.getValue())) {
  return responseBuilder
      .add("Sorry, try this on a screen device or select the phone surface in the simulator.")
      .build();
}
Button learnMoreButton =
    new Button()
        .setTitle("This is a Button")
        .setOpenUrlAction(new OpenUrlAction().setUrl("https://assistant.google.com"));
List<Button> buttons = new ArrayList<>();
buttons.add(learnMoreButton);
String text =
    "This is a basic card.  Text in a basic card can include \"quotes\" and\n"
        + "  most other unicode characters including emoji \uD83D\uDCF1. Basic cards also support\n"
        + "  some markdown formatting like *emphasis* or _italics_, **strong** or\n"
        + "  __bold__, and ***bold italic*** or ___strong emphasis___ as well as other\n"
        + "  things like line  \\nbreaks";
responseBuilder
    .add("This is a basic card")
    .add(
        new BasicCard()
            .setTitle("This is a title")
            .setSubtitle("This is a subtitle")
            .setFormattedText(text)
            .setImage(
                new Image()
                    .setUrl("https://example.com/image.png")
                    .setAccessibilityText("Image alternate text"))
            .setImageDisplayOptions("CROPPED")
            .setButtons(buttons));
return responseBuilder.build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "This is a basic card example."
            }
          },
          {
            "basicCard": {
              "title": "Title: this is a title",
              "subtitle": "This is a subtitle",
              "formattedText": "This is a basic card.  Text in a basic card can include \"quotes\" and\n        most other unicode characters including emoji 📱.  Basic cards also support\n        some markdown formatting like *emphasis* or _italics_, **strong** or\n        __bold__, and ***bold italic*** or ___strong emphasis___ as well as other\n        things like line  \nbreaks",
              "image": {
                "url": "https://example.com/image.png",
                "accessibilityText": "Image alternate text"
              },
              "buttons": [
                {
                  "title": "This is a button",
                  "openUrlAction": {
                    "url": "https://assistant.google.com/"
                  }
                }
              ],
              "imageDisplayOptions": "CROPPED"
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "This is a basic card example."
              }
            },
            {
              "basicCard": {
                "title": "Title: this is a title",
                "subtitle": "This is a subtitle",
                "formattedText": "This is a basic card.  Text in a basic card can include \"quotes\" and\n        most other unicode characters including emoji 📱.  Basic cards also support\n        some markdown formatting like *emphasis* or _italics_, **strong** or\n        __bold__, and ***bold italic*** or ___strong emphasis___ as well as other\n        things like line  \nbreaks",
                "image": {
                  "url": "https://example.com/image.png",
                  "accessibilityText": "Image alternate text"
                },
                "buttons": [
                  {
                    "title": "This is a button",
                    "openUrlAction": {
                      "url": "https://assistant.google.com/"
                    }
                  }
                ],
                "imageDisplayOptions": "CROPPED"
              }
            }
          ]
        }
      }
    }
  ]
}
Browsing carousel

Figure 5. Browsing carousel example (smartphone)

A browsing carousel is a rich response that allows users to scroll vertically and select a tile in a collection. Browsing carousels are designed specifically for web content by opening the selected tile in a web browser (or an AMP browser if all tiles are AMP-enabled). The browsing carousel will also persist on the user's Assistant surface for browsing later.

Properties

The browsing carousel response type has the following requirements and optional properties that you can configure:

  • Supported on surfaces that have both the actions.capability.SCREEN_OUTPUT and actions.capability.WEB_BROWSER capabilities. This response type is currently not available on smart displays.
  • Browsing carousel
    • Maximum of ten tiles.
    • Minimum of two tiles.
    • Tiles in the carousel must all link to web content (AMP content recommended).
      • In order for the user to be taken to an AMP viewer, the urlHintType on AMP content tiles must be set to "AMP_CONTENT".
  • Browsing carousel tiles

    • Tile consistency (required):
      • All tiles in a browsing carousel must have the same components. For example, if one tile has an image field, the rest of the tiles in the carousel must also have image fields.
      • If all tiles in the browsing carousel link to AMP-enabled content, the user will be taken to an AMP browser with additional functionality. If any tile links to non-AMP content, then all tiles will direct users to a web browser.
    • Image (optional)
      • Image is forced to be 128 dp tall x 232 dp wide.
      • If the image aspect ratio doesn't match the image bounding box, then the image is centered with bars on either side.
      • If an image link is broken then a placeholder image is used instead.
      • Alt-text is required on an image.
    • Title (required)
      • Same formatting options as the basic text card.
      • Titles must be unique (to support voice selection).
      • Maximum of two lines of text.
      • Font size 16 sp.
    • Description (optional)
      • Same formatting options as the basic text card.
      • Maximum of four lines of text.
      • Truncated with an ellipsis (...).
      • Font size 14sp, gray color.
    • Footer (optional)
      • Fixed font and font size.
      • Maximum of one line of text.
      • Truncated with an ellipsis (...).
      • Anchored at the bottom, so tiles with fewer lines of body text may have white space above the sub-text.
      • Font size 14sp, gray color.
  • Interaction

    • The user can scroll vertically to view items.
    • Tap card: Tapping an item takes the user to a browser, displaying the linked page.
  • Voice input

    • Mic behavior
      • The mic doesn't re-open when a browsing carousel is sent to the user.
      • The user can still tap the mic or invoke the Assistant ("OK Google") to re-open the mic.
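The tile-count and tile-consistency rules above lend themselves to a pre-send check. The sketch below is illustrative only; `checkBrowsingCarousel` is a hypothetical helper, and the tile field names follow the carousel item JSON (title, description, footer, image):

```javascript
// Sketch of the documented tile rules: 2-10 tiles, and every tile must have
// the same set of optional components as the first tile.
function checkBrowsingCarousel(tiles) {
  const errors = [];
  if (tiles.length < 2 || tiles.length > 10) {
    errors.push('A browsing carousel needs between 2 and 10 tiles.');
  }
  const optional = ['image', 'description', 'footer'];
  const first = optional.filter((f) => f in (tiles[0] || {}));
  tiles.forEach((tile, i) => {
    const fields = optional.filter((f) => f in tile);
    if (fields.join(',') !== first.join(',')) {
      errors.push(`Tile ${i} has [${fields}] but tile 0 has [${first}].`);
    }
  });
  return errors;
}
```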
Guidance

By default, the mic remains closed after a browse carousel is sent. If you want to continue the conversation afterwards, we strongly recommend adding suggestion chips below the carousel.

Never repeat the options presented in the list as suggestion chips. Chips in this context are used to pivot the conversation (not for choice selection).

As with lists, the chat bubble that accompanies the carousel card is a subset of the audio (TTS/SSML). The audio here mentions the first tile in the carousel, and we strongly discourage reading out every element of the carousel. It's best to mention the first item and the reason it's there (for example, the most popular, the most recently purchased, or the most talked about).

Sample code

Handling selected item

No follow-up fulfillment is necessary for user interactions with browse carousel items, since the carousel handles the browser handoff. Keep in mind that the mic will not re-open after the user interacts with a browse carousel item, so you should either end the conversation or include suggestion chips in your response as per the guidance above.

Suggestion chips

Figure 6. Suggestion chips example (smartphone)

Use suggestion chips to hint at responses to continue or pivot the conversation. If there is a primary call to action during the conversation, consider listing it as the first suggestion chip.

Whenever possible, you should incorporate one key suggestion as part of the chat bubble, but do so only if the response or chat conversation feels natural.

Properties

Suggestion chips have the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.
  • To link suggestion chips out to the web, surfaces must also have the actions.capability.WEB_BROWSER capability. This capability is currently not available on smart displays.
  • Maximum of eight chips.
  • Maximum text length of 20 characters.
  • Supports only plain text.
Figure 7. Suggestion chips example (smart display)

Sample code

Node.js
if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}

conv.ask('These are suggestion chips.');
conv.ask(new Suggestions('Suggestion Chips'));
conv.ask(new Suggestions(['suggestion 1', 'suggestion 2']));
conv.ask(new LinkOutSuggestion({
  name: 'Suggestion Link',
  url: 'https://assistant.google.com/',
}));
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
if (!request.hasCapability(Capability.SCREEN_OUTPUT.getValue())) {
  return responseBuilder
      .add("Sorry, try this on a screen device or select the phone surface in the simulator.")
      .build();
}

return responseBuilder
    .addSuggestions(new String[]{"Suggestion chips", "suggestion 1", "suggestion 2"})
    .add(
        new LinkOutSuggestion()
            .setDestinationName("Suggestion link")
            .setUrl("https://assistant.google.com"))
    .build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "These are suggestion chips."
            }
          }
        ],
        "suggestions": [
          {
            "title": "Suggestion Chips"
          },
          {
            "title": "suggestion 1"
          },
          {
            "title": "suggestion 2"
          }
        ],
        "linkOutSuggestion": {
          "destinationName": "Suggestion Link",
          "url": "https://assistant.google.com/"
        }
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "These are suggestion chips."
              }
            }
          ],
          "suggestions": [
            {
              "title": "Suggestion Chips"
            },
            {
              "title": "suggestion 1"
            },
            {
              "title": "suggestion 2"
            }
          ],
          "linkOutSuggestion": {
            "destinationName": "Suggestion Link",
            "url": "https://assistant.google.com/"
          }
        }
      }
    }
  ]
}

Media responses

Figure 8. Media response example (smartphone)

Media responses let your Actions play audio content with a playback duration longer than the 120-second limit of SSML. The primary component of a media response is the single-track card. The card allows the user to perform these operations:

  • Replay the last 10 seconds.
  • Skip forward 30 seconds.
  • View the total length of the media content.
  • View a progress indicator for audio playback.
  • View the elapsed playback time.

Media responses support the following audio controls for voice interaction:

  • “Ok Google, play.”
  • “Ok Google, pause.”
  • “Ok Google, stop.”
  • “Ok Google, start over.”

Properties

Media responses have the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.MEDIA_RESPONSE_AUDIO capability.
  • Audio for playback must be in a correctly formatted .mp3 file. Live streaming is not supported.
  • The media file for playback must be specified as an HTTPS URL.
  • Image (optional)
    • You can optionally include a small thumbnail or a large image.
    • Small image
      • Your image appears as a borderless thumbnail on the right of the media player card.
      • Size should be 36 x 36 dp. Larger sized images are resized to fit.
    • Large image
      • The image container will be 192 dp tall.
      • Your image appears at the top of the media player card, and spans the full width of the card. Most images will appear with bars along the top or sides.
      • Motion GIFs are allowed.
    • You must specify the image source as a URL.
    • Alt-text is required on all images.

Behavior on surfaces

Media responses are supported on Android phones and on Google Home. The behavior of media responses depends on the surface on which users interact with your Actions.

On Android phones, users can see media responses when any of these conditions are met:

  • Google Assistant is in the foreground, and the phone screen is on.
  • The user leaves Google Assistant while audio is playing and returns to Google Assistant within 10 minutes of playback completion. On returning to Google Assistant, the user sees the media card and suggestion chips.

Media controls are available while the phone is locked. On Android, the controls also appear in the notification area.

Figure 9. Media response example (smart display)

Sample code

The following code sample shows how you might update your rich responses to include media.

Node.js
if (!conv.surface.capabilities.has('actions.capability.MEDIA_RESPONSE_AUDIO')) {
  conv.ask('Sorry, this device does not support audio playback.');
  return;
}

conv.ask('This is a media response example.');
conv.ask(new MediaObject({
  name: 'Jazz in Paris',
  url: 'https://storage.googleapis.com/automotive-media/Jazz_In_Paris.mp3',
  description: 'A funky Jazz tune',
  icon: new Image({
    url: 'https://storage.googleapis.com/automotive-media/album_art.jpg',
    alt: 'Album cover of an ocean view',
  }),
}));
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
if (!request.hasCapability(Capability.MEDIA_RESPONSE_AUDIO.getValue())) {
  return responseBuilder
      .add("Sorry, this device does not support audio playback.")
      .build();
}

MediaObject mediaObject = new MediaObject();
mediaObject
    .setName("Jazz in Paris")
    .setContentUrl("https://storage.googleapis.com/automotive-media/Jazz_In_Paris.mp3")
    .setDescription("A funky Jazz tune")
    .setIcon(
        new Image()
            .setUrl("https://storage.googleapis.com/automotive-media/album_art.jpg")
            .setAccessibilityText("Ocean view"));
List<MediaObject> mediaObjects = new ArrayList<>();
mediaObjects.add(mediaObject);

return responseBuilder.add(new MediaResponse().setMediaObjects(mediaObjects)).build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "This is a media response example."
            }
          },
          {
            "mediaResponse": {
              "mediaType": "AUDIO",
              "mediaObjects": [
                {
                  "contentUrl": "https://storage.googleapis.com/automotive-media/Jazz_In_Paris.mp3",
                  "description": "A funky Jazz tune",
                  "icon": {
                    "url": "https://storage.googleapis.com/automotive-media/album_art.jpg",
                    "accessibilityText": "Album cover of an ocean view"
                  },
                  "name": "Jazz in Paris"
                }
              ]
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "This is a media response example."
              }
            },
            {
              "mediaResponse": {
                "mediaType": "AUDIO",
                "mediaObjects": [
                  {
                    "contentUrl": "https://storage.googleapis.com/automotive-media/Jazz_In_Paris.mp3",
                    "description": "A funky Jazz tune",
                    "icon": {
                      "url": "https://storage.googleapis.com/automotive-media/album_art.jpg",
                      "accessibilityText": "Ocean view"
                    },
                    "name": "Jazz in Paris"
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}

Guidance

Your response must include a mediaResponse with a mediaType of AUDIO that contains a mediaObject within the rich response's item array. A media response supports a single media object. A media object must include the name and content URL of the audio file. It may optionally include sub-text (description) and an icon or large image URL.

On phones and Google Home, when your Action completes audio playback, Google Assistant checks if the media response is a final response. If not, it sends a callback to your fulfillment, allowing you to respond to the user.

Your Action must include suggestion chips if the response is not a final response.
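Put together, the guidance above corresponds to a webhook payload like the one this sketch assembles. `buildMediaRichResponse` is a hypothetical helper; the field names follow the JSON samples earlier in this section:

```javascript
// Assemble the rich response described above: a simple response, then a
// mediaResponse with a single media object, plus suggestion chips because
// the response is not final. Illustrative helper only.
function buildMediaRichResponse({speech, name, contentUrl, description, chips}) {
  return {
    expectUserResponse: true,
    richResponse: {
      items: [
        {simpleResponse: {textToSpeech: speech}},
        {
          mediaResponse: {
            mediaType: 'AUDIO',
            mediaObjects: [{name, contentUrl, description}],
          },
        },
      ],
      suggestions: (chips || []).map((title) => ({title})),
    },
  };
}
```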

Handling callback after playback completion

Your Action should handle the actions.intent.MEDIA_STATUS intent to prompt the user for a follow-up (for example, to play another song). Your Action receives this callback once media playback is complete. In the callback, the MEDIA_STATUS argument contains status information about the current media. The status value is either FINISHED or STATUS_UNSPECIFIED.

Using Dialogflow

If you want to perform conversational branching in Dialogflow, you’ll need to set up an input context of actions_capability_media_response_audio on the intent to ensure it only triggers on surfaces that support a media response.

Building your fulfillment

The code snippet below shows how you might write the fulfillment code for your Action. If you're using Dialogflow, replace actions.intent.MEDIA_STATUS with the action name specified in the intent that receives the actions_intent_MEDIA_STATUS event (for example, "media.status.update").

Node.js
app.intent('actions.intent.MEDIA_STATUS', function(conv) {
  const mediaStatus = conv.arguments.get('MEDIA_STATUS');
  let response = 'Unknown media status received.';
  if (mediaStatus && mediaStatus.status === 'FINISHED') {
    response = 'Hope you enjoyed the tunes!';
  }
  conv.ask(response);
});
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
String mediaStatus = request.getMediaStatus();
String response;
if (mediaStatus != null && mediaStatus.equals("FINISHED")) {
  response = "Hope you enjoyed the tunes!";
} else {
  response = "Unknown media status received";
}

return responseBuilder.add(response).build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "Unknown media status received."
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "Unknown media status received."
              }
            }
          ]
        }
      }
    }
  ]
}

Table cards

Table cards allow you to display tabular data in your response (for example, sports standings, election results, and flights). You can define columns and rows (up to 3 each) that the Assistant is required to show in your table card. You can also define additional columns and rows along with their prioritization.

Tables differ from vertical lists in that they display static data; unlike list elements, table elements are not interactable.

Figure 10. Table card example (smart display)
Properties

Table cards have the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.

The following section summarizes how you can customize the elements in a table card.

| Name | Is Optional | Is Customizable | Customization Notes |
| --- | --- | --- | --- |
| title | Yes | Yes | Overall title of the table. Must be set if subtitle is set. You can customize the font family and color. |
| subtitle | Yes | No | Subtitle for the table. |
| image | Yes | Yes | Image associated with the table. |
| Row | No | Yes | Row data of the table. Consists of an array of Cell objects and a divider_after property that indicates whether there should be a divider after the row. The first 3 rows are guaranteed to be shown, but others might not appear on certain surfaces. Test with the simulator to see which rows are shown for a given surface. On surfaces that support the WEB_BROWSER capability, you can point the user to a web page with more data. Linking to the web is currently not available for smart displays. |
| ColumnProperties | Yes | Yes | Header and alignment for a column. Consists of a header property (representing the header text for a column) and a horizontal_alignment property (of type HorizontalAlignment). |
| Cell | No | Yes | Describes a cell in a row. Each cell contains a string representing a text value. You can customize the text in the cell. |
| Button | Yes | Yes | A button object that usually appears at the bottom of a card. A table card can have only one button. You can customize the button color. |
| HorizontalAlignment | Yes | Yes | Horizontal alignment of content within the cell. Values can be LEADING, CENTER, or TRAILING. If unspecified, content is aligned to the leading edge of the cell. |
Sample code

The following snippets show how to implement a simple table card:

Node.js
// Simple table
conv.ask('This is a simple table example.')
conv.ask(new Table({
  dividers: true,
  columns: ['header 1', 'header 2', 'header 3'],
  rows: [
    ['row 1 item 1', 'row 1 item 2', 'row 1 item 3'],
    ['row 2 item 1', 'row 2 item 2', 'row 2 item 3'],
  ],
}))
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
if (!request.hasCapability(Capability.SCREEN_OUTPUT.getValue())) {
  return responseBuilder.add("Sorry, try this on a device with a screen").build();
}

List<TableCardColumnProperties> columnProperties = new ArrayList<>();
columnProperties.add(new TableCardColumnProperties().setHeader("Column #1"));
columnProperties.add(new TableCardColumnProperties().setHeader("Column #2"));
columnProperties.add(new TableCardColumnProperties().setHeader("Column #3"));

List<TableCardRow> rows = new ArrayList<>();
for (int i = 0; i < 4; i++) {
  List<TableCardCell> cells = new ArrayList<>();
  for (int j = 0; j < 3; j++) {
    String cellText = MessageFormat.format("Cell #{0}", (i + 1));
    cells.add(new TableCardCell().setText(cellText));
  }
  rows.add(new TableCardRow().setCells(cells));
}

TableCard table =
        new TableCard()
                .setColumnProperties(columnProperties)
                .setRows(rows);

responseBuilder.add("This is an example of Table card.").add(table);
return responseBuilder.build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "This is a simple table example."
            }
          },
          {
            "tableCard": {
              "rows": [
                {
                  "cells": [
                    {
                      "text": "row 1 item 1"
                    },
                    {
                      "text": "row 1 item 2"
                    },
                    {
                      "text": "row 1 item 3"
                    }
                  ],
                  "dividerAfter": true
                },
                {
                  "cells": [
                    {
                      "text": "row 2 item 1"
                    },
                    {
                      "text": "row 2 item 2"
                    },
                    {
                      "text": "row 2 item 3"
                    }
                  ],
                  "dividerAfter": true
                }
              ],
              "columnProperties": [
                {
                  "header": "header 1"
                },
                {
                  "header": "header 2"
                },
                {
                  "header": "header 3"
                }
              ]
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "This is a simple table example."
              }
            },
            {
              "tableCard": {
                "rows": [
                  {
                    "cells": [
                      {
                        "text": "row 1 item 1"
                      },
                      {
                        "text": "row 1 item 2"
                      },
                      {
                        "text": "row 1 item 3"
                      }
                    ],
                    "dividerAfter": true
                  },
                  {
                    "cells": [
                      {
                        "text": "row 2 item 1"
                      },
                      {
                        "text": "row 2 item 2"
                      },
                      {
                        "text": "row 2 item 3"
                      }
                    ],
                    "dividerAfter": true
                  }
                ],
                "columnProperties": [
                  {
                    "header": "header 1"
                  },
                  {
                    "header": "header 2"
                  },
                  {
                    "header": "header 3"
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}

The following snippets show how to implement a complex table card:

Node.js
// All fields
conv.ask('This is a table with all the possible fields.')
conv.ask(new Table({
  title: 'Table Title',
  subtitle: 'Table Subtitle',
  image: new Image({
    url: 'https://developers.google.com/actions/images/badges/XPM_BADGING_GoogleAssistant_VER.png',
    alt: 'Alt Text'
  }),
  columns: [
    {
      header: 'header 1',
      align: 'CENTER',
    },
    {
      header: 'header 2',
      align: 'LEADING',
    },
    {
      header: 'header 3',
      align: 'TRAILING',
    },
  ],
  rows: [
    {
      cells: ['row 1 item 1', 'row 1 item 2', 'row 1 item 3'],
      dividerAfter: false,
    },
    {
      cells: ['row 2 item 1', 'row 2 item 2', 'row 2 item 3'],
      dividerAfter: true,
    },
    {
      cells: ['row 2 item 1', 'row 2 item 2', 'row 2 item 3'],
    },
  ],
  buttons: new Button({
    title: 'Button Text',
    url: 'https://assistant.google.com'
  }),
}))
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
if (!request.hasCapability(Capability.SCREEN_OUTPUT.getValue())) {
  return responseBuilder.add("Sorry, try this on a device with a screen").build();
}

List<TableCardColumnProperties> columnProperties = new ArrayList<>();
columnProperties.add(new TableCardColumnProperties()
          .setHeader("Column #1")
          .setHorizontalAlignment("CENTER"));
columnProperties.add(new TableCardColumnProperties()
          .setHeader("Column #2")
          .setHorizontalAlignment("LEADING"));
columnProperties.add(new TableCardColumnProperties()
          .setHeader("Column #3")
          .setHorizontalAlignment("TRAILING"));

List<TableCardRow> rows = new ArrayList<>();
for (int i = 0; i < 4; i++) {
  List<TableCardCell> cells = new ArrayList<>();
  for (int j = 0; j < 3; j++) {
    String cellText = MessageFormat.format("Cell #{0}", (i + 1));
    cells.add(new TableCardCell().setText(cellText));
  }
  rows.add(new TableCardRow().setCells(cells));
}

Button learnMoreButton = new Button()
                .setTitle("Button Text")
                .setOpenUrlAction(new OpenUrlAction().setUrl("https://assistant.google.com"));
List<Button> button = new ArrayList<>();
button.add(learnMoreButton);

TableCard table =
        new TableCard()
                .setTitle("Table Title")
                .setSubtitle("Table Subtitle")
                .setColumnProperties(columnProperties)
                .setRows(rows)
                .setImage(new Image()
                        .setUrl( "https://developers.google.com/actions/images/badges/XPM_BADGING_GoogleAssistant_VER.png")
                        .setAccessibilityText("Alt Text"))
                .setButtons(button);

responseBuilder.add("This is an example of Table card.").add(table);
return responseBuilder.build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "This is a table with all the possible fields."
            }
          },
          {
            "tableCard": {
              "title": "Table Title",
              "subtitle": "Table Subtitle",
              "image": {
                "url": "https://developers.google.com/actions/images/badges/XPM_BADGING_GoogleAssistant_VER.png",
                "accessibilityText": "Alt Text"
              },
              "rows": [
                {
                  "cells": [
                    {
                      "text": "row 1 item 1"
                    },
                    {
                      "text": "row 1 item 2"
                    },
                    {
                      "text": "row 1 item 3"
                    }
                  ],
                  "dividerAfter": false
                },
                {
                  "cells": [
                    {
                      "text": "row 2 item 1"
                    },
                    {
                      "text": "row 2 item 2"
                    },
                    {
                      "text": "row 2 item 3"
                    }
                  ],
                  "dividerAfter": true
                },
                {
                  "cells": [
                    {
                      "text": "row 2 item 1"
                    },
                    {
                      "text": "row 2 item 2"
                    },
                    {
                      "text": "row 2 item 3"
                    }
                  ]
                }
              ],
              "columnProperties": [
                {
                  "header": "header 1",
                  "horizontalAlignment": "CENTER"
                },
                {
                  "header": "header 2",
                  "horizontalAlignment": "LEADING"
                },
                {
                  "header": "header 3",
                  "horizontalAlignment": "TRAILING"
                }
              ],
              "buttons": [
                {
                  "title": "Button Text",
                  "openUrlAction": {
                    "url": "https://assistant.google.com"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.TEXT"
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "This is a table with all the possible fields."
              }
            },
            {
              "tableCard": {
                "title": "Table Title",
                "subtitle": "Table Subtitle",
                "image": {
                  "url": "https://developers.google.com/actions/images/badges/XPM_BADGING_GoogleAssistant_VER.png",
                  "accessibilityText": "Alt Text"
                },
                "rows": [
                  {
                    "cells": [
                      {
                        "text": "row 1 item 1"
                      },
                      {
                        "text": "row 1 item 2"
                      },
                      {
                        "text": "row 1 item 3"
                      }
                    ],
                    "dividerAfter": false
                  },
                  {
                    "cells": [
                      {
                        "text": "row 2 item 1"
                      },
                      {
                        "text": "row 2 item 2"
                      },
                      {
                        "text": "row 2 item 3"
                      }
                    ],
                    "dividerAfter": true
                  },
                  {
                    "cells": [
                      {
                        "text": "row 2 item 1"
                      },
                      {
                        "text": "row 2 item 2"
                      },
                      {
                        "text": "row 2 item 3"
                      }
                    ]
                  }
                ],
                "columnProperties": [
                  {
                    "header": "header 1",
                    "horizontalAlignment": "CENTER"
                  },
                  {
                    "header": "header 2",
                    "horizontalAlignment": "LEADING"
                  },
                  {
                    "header": "header 3",
                    "horizontalAlignment": "TRAILING"
                  }
                ],
                "buttons": [
                  {
                    "title": "Button Text",
                    "openUrlAction": {
                      "url": "https://assistant.google.com"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}

Visual selection responses

Use a visual selection response if you want the user to pick a choice between several options in order to continue with your Action.

Visual selection responses can appear in screen-only experiences or in combined audio and screen experiences.

Properties

Visual selection responses have the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.
  • The first item in a visual selection response must be a simple response.
  • At most one simple response.
  • At most one basic card, option interface (list or carousel), or StructuredResponse. (You cannot have both a basic card and an option interface at the same time).
  • At most 8 suggestion chips.
  • Suggestion chips are not allowed in a FinalResponse.
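The constraints above can be checked programmatically. The following is a minimal sketch of a hypothetical validation helper (not part of the Actions on Google SDK) that tests a webhook richResponse and its suggestion chips against these limits:

```javascript
// Hypothetical helper (not part of the Actions on Google SDK): checks a
// webhook richResponse and its suggestion chips against the limits above.
function validateVisualSelectionResponse(richResponse, suggestions = []) {
  const items = richResponse.items || [];
  const errors = [];
  // The first item must be a simple response.
  if (items.length === 0 || !items[0].simpleResponse) {
    errors.push('first item must be a simpleResponse');
  }
  // At most one simple response per turn.
  if (items.filter((item) => item.simpleResponse).length > 1) {
    errors.push('at most one simpleResponse');
  }
  // At most one basic card or structured response (option interfaces such
  // as lists and carousels are carried separately, in the systemIntent data).
  if (items.filter((item) => item.basicCard || item.structuredResponse).length > 1) {
    errors.push('at most one basicCard or structuredResponse');
  }
  // At most 8 suggestion chips.
  if (suggestions.length > 8) {
    errors.push('at most 8 suggestion chips');
  }
  return errors;
}
```

A check like this could run in your fulfillment before sending the response, so limit violations surface during development rather than at serving time.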

The following sections show you how to build various types of visual selection responses.

List

Figure 11. List example (smartphone)

The single-select list presents the user with a vertical list of multiple items and allows the user to select a single one. Selecting an item from the list generates a user query (chat bubble) containing the title of the list item.

Properties

The list response type has the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.
  • Requires at least one list item and can contain a maximum of 30.
  • List Title (optional)
    • Fixed font and font size
    • Restricted to a single line. (Excessive characters will be truncated.)
    • Plain text, Markdown is not supported.
    • The card height collapses if no title is specified.
  • List item
    • Title
      • Fixed font and font size
      • Max length: 1 line (truncated with ellipses…)
      • Required to be unique (to support voice selection)
    • Body Text (optional)
      • Fixed font and font size
      • Max length: 2 lines (truncated with ellipses…)
    • Image (optional)
      • Size: 48x48 px
  • Interaction
    • Voice/Text
      • The user can always say or type an item's title instead of tapping it.
      • Must have an intent for touch input that handles the actions_intent_OPTION event.
Guidance

Lists are good when it's important to disambiguate options (for example, "Which Peter do you need to speak to: Peter Jons or Peter Hans?") or when the user needs to choose between options that can be scanned at a glance.

We recommend adding suggestion chips below a list to let the user pivot or expand the conversation. Never repeat the options presented in the list as suggestion chips; chips in this context are used to pivot the conversation, not for choice selection.

Notice that in the accompanying example, the chat bubble that accompanies the list card is a subset of the audio (TTS/SSML). The audio output includes only the first list item. We discourage reading all the elements from the list.

Make sure your Action shows what is most important to your users at the top of the list (for example, the most popular, recently purchased, or most talked about items). The list initially displays up to 10 elements, but users can expand it to show more. The number of items shown before expansion can also vary by surface and change over time.
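
The chip guidance above can be sketched as plain webhook JSON: chips live in the richResponse.suggestions array, separate from the list options in systemIntent. The helper and chip titles below are illustrative, not part of the SDK:

```javascript
// Sketch: attach pivot chips to a response by populating the
// richResponse.suggestions array of the webhook payload. The chip titles
// below are illustrative; they pivot the conversation rather than
// repeating the list options.
function withPivotChips(richResponse, chipTitles) {
  return Object.assign({}, richResponse, {
    suggestions: chipTitles.map((title) => ({title})),
  });
}

const richResponse = withPivotChips(
    {items: [{simpleResponse: {textToSpeech: 'This is a list example.'}}]},
    ['Popular items', 'Recent orders']);
```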

Figure 12. List example (smart display)

Sample code

Node.js
if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}

conv.ask('This is a list example.');
// Create a list
conv.ask(new List({
  title: 'List Title',
  items: {
    // Add the first item to the list
    'SELECTION_KEY_ONE': {
      synonyms: [
        'synonym 1',
        'synonym 2',
        'synonym 3',
      ],
      title: 'Title of First List Item',
      description: 'This is a description of a list item.',
      image: new Image({
        url: 'IMG_URL_AOG.com',
        alt: 'Image alternate text',
      }),
    },
    // Add the second item to the list
    'SELECTION_KEY_GOOGLE_HOME': {
      synonyms: [
        'Google Home Assistant',
        'Assistant on the Google Home',
      ],
      title: 'Google Home',
      description: 'Google Home is a voice-activated speaker powered by ' +
        'the Google Assistant.',
      image: new Image({
        url: 'IMG_URL_GOOGLE_HOME.com',
        alt: 'Google Home',
      }),
    },
    // Add the third item to the list
    'SELECTION_KEY_GOOGLE_PIXEL': {
      synonyms: [
        'Google Pixel XL',
        'Pixel',
        'Pixel XL',
      ],
      title: 'Google Pixel',
      description: 'Pixel. Phone by Google.',
      image: new Image({
        url: 'IMG_URL_GOOGLE_PIXEL.com',
        alt: 'Google Pixel',
      }),
    },
  },
}));
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
if (!request.hasCapability(Capability.SCREEN_OUTPUT.getValue())) {
  return responseBuilder
      .add("Sorry, try this on a screen device or select the phone surface in the simulator.")
      .build();
}
List<ListSelectListItem> items = new ArrayList<>();
ListSelectListItem item = new ListSelectListItem();
item.setTitle("Title of first list item")
    .setDescription("This is a description of the list item")
    .setOptionInfo(
        new OptionInfo()
            .setKey("SELECTION_KEY_ONE")
            .setSynonyms(Arrays.asList("synonym 1", "synonym 2", "synonym 3")))
    .setImage(
        new Image()
            .setUrl("http://example.com/image.png")
            .setAccessibilityText("Image alternate text"));
items.add(item);

item = new ListSelectListItem();
item.setTitle("Google Home")
    .setDescription("Google Home is a voice activated speaker powered by the Google Assistant.")
    .setOptionInfo(
        new OptionInfo()
            .setKey("SELECTION_KEY_GOOGLE_HOME")
            .setSynonyms(Arrays.asList("Google Home assistant", "Assistant")))
    .setImage(new Image().setUrl("IMG_URL_GOOGLE_HOME").setAccessibilityText("Google Home"));
items.add(item);

item = new ListSelectListItem();
item.setTitle("Google Pixel")
    .setDescription("Pixel. Phone by Google")
    .setOptionInfo(
        new OptionInfo()
            .setKey("SELECTION_KEY_GOOGLE_PIXEL")
            .setSynonyms(Arrays.asList("Pixel", "Pixel XL")))
    .setImage(new Image().setUrl("IMG_URL_GOOGLE_PIXEL").setAccessibilityText("Google Pixel"));
items.add(item);

return responseBuilder
    .add("This is a List")
    .add(new SelectionList().setTitle("List title").setItems(items))
    .build();
Dialogflow JSON

Note that the JSON below describes a webhook response.

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "systemIntent": {
        "intent": "actions.intent.OPTION",
        "data": {
          "@type": "type.googleapis.com/google.actions.v2.OptionValueSpec",
          "listSelect": {
            "title": "List Title",
            "items": [
              {
                "optionInfo": {
                  "key": "SELECTION_KEY_ONE",
                  "synonyms": [
                    "synonym 1",
                    "synonym 2",
                    "synonym 3"
                  ]
                },
                "description": "This is a description of a list item.",
                "image": {
                  "url": "IMG_URL_AOG.com",
                  "accessibilityText": "Image alternate text"
                },
                "title": "Title of First List Item"
              },
              {
                "optionInfo": {
                  "key": "SELECTION_KEY_GOOGLE_HOME",
                  "synonyms": [
                    "Google Home Assistant",
                    "Assistant on the Google Home"
                  ]
                },
                "description": "Google Home is a voice-activated speaker powered by the Google Assistant.",
                "image": {
                  "url": "IMG_URL_GOOGLE_HOME.com",
                  "accessibilityText": "Google Home"
                },
                "title": "Google Home"
              },
              {
                "optionInfo": {
                  "key": "SELECTION_KEY_GOOGLE_PIXEL",
                  "synonyms": [
                    "Google Pixel XL",
                    "Pixel",
                    "Pixel XL"
                  ]
                },
                "description": "Pixel. Phone by Google.",
                "image": {
                  "url": "IMG_URL_GOOGLE_PIXEL.com",
                  "accessibilityText": "Google Pixel"
                },
                "title": "Google Pixel"
              }
            ]
          }
        }
      },
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "This is a list example."
            }
          }
        ]
      }
    }
  }
}
Actions SDK JSON

Note that the JSON below describes a webhook response.

{
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "possibleIntents": [
        {
          "intent": "actions.intent.OPTION",
          "inputValueData": {
            "@type": "type.googleapis.com/google.actions.v2.OptionValueSpec",
            "listSelect": {
              "title": "List Title",
              "items": [
                {
                  "optionInfo": {
                    "key": "SELECTION_KEY_ONE",
                    "synonyms": [
                      "synonym 1",
                      "synonym 2",
                      "synonym 3"
                    ]
                  },
                  "description": "This is a description of a list item.",
                  "image": {
                    "url": "IMG_URL_AOG.com",
                    "accessibilityText": "Image alternate text"
                  },
                  "title": "Title of First List Item"
                },
                {
                  "optionInfo": {
                    "key": "SELECTION_KEY_GOOGLE_HOME",
                    "synonyms": [
                      "Google Home Assistant",
                      "Assistant on the Google Home"
                    ]
                  },
                  "description": "Google Home is a voice-activated speaker powered by the Google Assistant.",
                  "image": {
                    "url": "IMG_URL_GOOGLE_HOME.com",
                    "accessibilityText": "Google Home"
                  },
                  "title": "Google Home"
                },
                {
                  "optionInfo": {
                    "key": "SELECTION_KEY_GOOGLE_PIXEL",
                    "synonyms": [
                      "Google Pixel XL",
                      "Pixel",
                      "Pixel XL"
                    ]
                  },
                  "description": "Pixel. Phone by Google.",
                  "image": {
                    "url": "IMG_URL_GOOGLE_PIXEL.com",
                    "accessibilityText": "Google Pixel"
                  },
                  "title": "Google Pixel"
                }
              ]
            }
          }
        }
      ],
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "This is a list example."
              }
            }
          ]
        }
      }
    }
  ]
}

Handling a selected item

When users select an item, the selected item value is passed to you as an argument. In the argument value, you will get the key identifier for the selected item:

Node.js
const SELECTED_ITEM_RESPONSES = {
  'SELECTION_KEY_ONE': 'You selected the first item',
  'SELECTION_KEY_GOOGLE_HOME': 'You selected the Google Home!',
  'SELECTION_KEY_GOOGLE_PIXEL': 'You selected the Google Pixel!',
};

app.intent('actions.intent.OPTION', (conv, params, option) => {
  let response = 'You did not select any item';
  if (option && SELECTED_ITEM_RESPONSES.hasOwnProperty(option)) {
    response = SELECTED_ITEM_RESPONSES[option];
  }
  conv.ask(response);
});
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
String selectedItem = request.getSelectedOption();
String response;

if (selectedItem == null) {
  response = "You did not select any item";
} else if (selectedItem.equals("SELECTION_KEY_ONE")) {
  response = "You selected the first item";
} else if (selectedItem.equals("SELECTION_KEY_GOOGLE_HOME")) {
  response = "You selected the Google Home";
} else if (selectedItem.equals("SELECTION_KEY_GOOGLE_PIXEL")) {
  response = "You selected the Google Pixel";
} else {
  response = "You did not select a valid item";
}
return responseBuilder.add(response).build();
Dialogflow JSON

Note that the JSON below describes a webhook request.

{
  "responseId": "",
  "queryResult": {
    "queryText": "",
    "action": "",
    "parameters": {},
    "allRequiredParamsPresent": true,
    "fulfillmentText": "",
    "fulfillmentMessages": [],
    "outputContexts": [],
    "intent": {
      "name": "foo",
      "displayName": "foo"
    },
    "intentDetectionConfidence": 1,
    "diagnosticInfo": {},
    "languageCode": ""
  },
  "originalDetectIntentRequest": {
    "source": "google",
    "version": "2",
    "payload": {
      "isInSandbox": true,
      "surface": {
        "capabilities": [
          {
            "name": "actions.capability.SCREEN_OUTPUT"
          },
          {
            "name": "actions.capability.AUDIO_OUTPUT"
          },
          {
            "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
          },
          {
            "name": "actions.capability.WEB_BROWSER"
          }
        ]
      },
      "inputs": [
        {
          "rawInputs": [],
          "intent": "",
          "arguments": []
        }
      ],
      "user": {},
      "conversation": {},
      "availableSurfaces": [
        {
          "capabilities": [
            {
              "name": "actions.capability.SCREEN_OUTPUT"
            },
            {
              "name": "actions.capability.AUDIO_OUTPUT"
            },
            {
              "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
            },
            {
              "name": "actions.capability.WEB_BROWSER"
            }
          ]
        }
      ]
    }
  },
  "session": ""
}
Actions SDK JSON

Note that the JSON below describes a webhook request.

{
  "user": {},
  "device": {},
  "surface": {
    "capabilities": [
      {
        "name": "actions.capability.SCREEN_OUTPUT"
      },
      {
        "name": "actions.capability.AUDIO_OUTPUT"
      },
      {
        "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
      },
      {
        "name": "actions.capability.WEB_BROWSER"
      }
    ]
  },
  "conversation": {},
  "inputs": [
    {
      "rawInputs": [],
      "intent": "fizzbuzz",
      "arguments": []
    }
  ],
  "availableSurfaces": [
    {
      "capabilities": [
        {
          "name": "actions.capability.SCREEN_OUTPUT"
        },
        {
          "name": "actions.capability.AUDIO_OUTPUT"
        },
        {
          "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
        },
        {
          "name": "actions.capability.WEB_BROWSER"
        }
      ]
    }
  ]
}
Figure 13. Carousel example (smartphone)

The carousel scrolls horizontally and allows the user to select one item. Compared to the list selector, it has larger tiles, which allow for richer content. The tiles that make up a carousel are similar to the basic card with image. Selecting an item from the carousel generates a chat bubble as the response, just as with the list selector.

Properties

The carousel response type has the following requirements and optional properties that you can configure:

  • Supported on surfaces with the actions.capability.SCREEN_OUTPUT capability.
  • Carousel
    • Maximum of ten tiles.
    • Minimum of two tiles.
    • Plain text, Markdown is not supported.
  • Carousel tile
    • Image (optional)
      • Image is forced to be 128 dp tall x 232 dp wide
      • If the image aspect ratio doesn't match the image bounding box, then the image is centered with bars on either side
      • If an image link is broken then a placeholder image is used instead
    • Title (required)
      • Same as the Basic Text Card
      • Titles must be unique (to support voice selection)
    • Description (optional)
      • Same formatting options as the Basic Text Card
      • Max 4 lines
      • Plain text, Markdown is not supported.
  • Interaction
    • Swipe left/right: Slide the carousel to reveal different cards.
    • Tap card: Tapping an item simply generates a chat bubble with the same text as the element title.
      • Must have an intent for touch input that handles the actions_intent_OPTION event.
    • Voice/Keyboard: Replying with the card title (if specified) functions the same as selecting that item.
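
The constraints above can be checked in your webhook before a response is sent. A minimal sketch in plain JavaScript; `validateCarouselItems` is a hypothetical helper, not part of the actions-on-google library:

```javascript
// Hypothetical helper: checks the carousel constraints listed above
// (two to ten tiles, unique titles) before the response is built.
function validateCarouselItems(items) {
  if (items.length < 2 || items.length > 10) {
    throw new Error('A carousel needs between 2 and 10 tiles.');
  }
  const titles = items.map((item) => item.title);
  if (new Set(titles).size !== titles.length) {
    throw new Error('Carousel titles must be unique to support voice selection.');
  }
  return true;
}
```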
Guidance

Carousels are good when various options are presented to the user, but a direct comparison is not required among them (versus lists). In general, lists are preferred over carousels simply because lists are easier to visually scan and interact with via voice.

If you want to build a carousel with items that link out to web pages, you may want to build a browsing carousel instead.

We recommend adding suggestion chips below a carousel if you want to continue the conversation.

Never repeat the options presented in the carousel as suggestion chips. Chips in this context are used to pivot the conversation (not for choice selection).

Same as with lists, the chat bubble that accompanies the carousel is a subset of the audio (TTS/SSML). The audio (TTS/SSML) here should focus on the first tile in the carousel; we strongly discourage reading out every element of the carousel. It's best to mention the first item and the reason why it's there (for example, the most popular, the most recently purchased, or the most talked about).

Sample code
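
The selection keys referenced in the handlers below originate in the item map the carousel is built from. A minimal sketch of that map in plain JavaScript; titles, descriptions, and image URLs here are placeholders:

```javascript
// Sketch of the option map a carousel is typically built from. Each key is
// the identifier sent back to your webhook when the user selects that tile.
const carouselItems = {
  'SELECTION_KEY_GOOGLE_HOME': {
    title: 'Google Home',
    description: 'Google Home is a voice-activated speaker powered by the Google Assistant.',
    imageUrl: 'https://example.com/google-home.png', // placeholder URL
  },
  'SELECTION_KEY_GOOGLE_PIXEL': {
    title: 'Google Pixel',
    description: 'Pixel. Phone by Google.',
    imageUrl: 'https://example.com/google-pixel.png', // placeholder URL
  },
};
```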

Handling selected item

When users select an item, the selected item's value is passed to you as an argument. The argument value contains the key identifier of the selected item:

Node.js
const SELECTED_ITEM_RESPONSES = {
  'SELECTION_KEY_ONE': 'You selected the first item',
  'SELECTION_KEY_GOOGLE_HOME': 'You selected the Google Home!',
  'SELECTION_KEY_GOOGLE_PIXEL': 'You selected the Google Pixel!',
};

app.intent('actions.intent.OPTION', (conv, params, option) => {
  let response = 'You did not select any item';
  if (option && SELECTED_ITEM_RESPONSES.hasOwnProperty(option)) {
    response = SELECTED_ITEM_RESPONSES[option];
  }
  conv.ask(response);
});
Java
ResponseBuilder responseBuilder = getResponseBuilder(request);
String selectedItem = request.getSelectedOption();
String response;

if (selectedItem == null) {
  response = "You did not select any item";
} else if (selectedItem.equals("SELECTION_KEY_ONE")) {
  response = "You selected the first item";
} else if (selectedItem.equals("SELECTION_KEY_GOOGLE_HOME")) {
  response = "You selected the Google Home";
} else if (selectedItem.equals("SELECTION_KEY_GOOGLE_PIXEL")) {
  response = "You selected the Google Pixel";
} else {
  response = "You did not select a valid item";
}
return responseBuilder.add(response).build();
Dialogflow JSON

Note that the JSON below describes a webhook request.

{
  "responseId": "",
  "queryResult": {
    "queryText": "",
    "action": "",
    "parameters": {},
    "allRequiredParamsPresent": true,
    "fulfillmentText": "",
    "fulfillmentMessages": [],
    "outputContexts": [],
    "intent": {
      "name": "foo",
      "displayName": "foo"
    },
    "intentDetectionConfidence": 1,
    "diagnosticInfo": {},
    "languageCode": ""
  },
  "originalDetectIntentRequest": {
    "source": "google",
    "version": "2",
    "payload": {
      "isInSandbox": true,
      "surface": {
        "capabilities": [
          {
            "name": "actions.capability.SCREEN_OUTPUT"
          },
          {
            "name": "actions.capability.AUDIO_OUTPUT"
          },
          {
            "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
          },
          {
            "name": "actions.capability.WEB_BROWSER"
          }
        ]
      },
      "inputs": [
        {
          "rawInputs": [],
          "intent": "",
          "arguments": []
        }
      ],
      "user": {},
      "conversation": {},
      "availableSurfaces": [
        {
          "capabilities": [
            {
              "name": "actions.capability.SCREEN_OUTPUT"
            },
            {
              "name": "actions.capability.AUDIO_OUTPUT"
            },
            {
              "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
            },
            {
              "name": "actions.capability.WEB_BROWSER"
            }
          ]
        }
      ]
    }
  },
  "session": ""
}
Actions SDK JSON

Note that the JSON below describes a webhook request.

{
  "user": {},
  "device": {},
  "surface": {
    "capabilities": [
      {
        "name": "actions.capability.SCREEN_OUTPUT"
      },
      {
        "name": "actions.capability.AUDIO_OUTPUT"
      },
      {
        "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
      },
      {
        "name": "actions.capability.WEB_BROWSER"
      }
    ]
  },
  "conversation": {},
  "inputs": [
    {
      "rawInputs": [],
      "intent": "fizzbuzz",
      "arguments": []
    }
  ],
  "availableSurfaces": [
    {
      "capabilities": [
        {
          "name": "actions.capability.SCREEN_OUTPUT"
        },
        {
          "name": "actions.capability.AUDIO_OUTPUT"
        },
        {
          "name": "actions.capability.MEDIA_RESPONSE_AUDIO"
        },
        {
          "name": "actions.capability.WEB_BROWSER"
        }
      ]
    }
  ]
}

Customizing your responses

You can change the appearance of your rich responses by creating a custom theme. If you define a theme for your Actions project, rich responses across your project's Actions will be styled according to your theme. This custom branding can be useful for defining a unique look and feel to the conversation when users invoke your Actions on a surface with a screen.

To set a custom response theme, do the following:

  1. In the Actions console, navigate to Build > Theme customization.
  2. Set any or all of the following:
    • Background color will be used as the background of your cards. In general, you should use a light color for the background so the card's content will be easy to read.
    • Primary color is the main color for your cards' header texts and UI elements. In general, you should use a darker primary color to contrast with the background.
    • Font family describes the type of font used for titles and other prominent text elements.
    • Image corner style can alter the look of your cards' corners.
    • Background image will use a custom image in place of the background color. You'll need to provide two different images for when the surface device is in landscape or portrait mode, respectively. Note that if you use a background image, the primary color will be set to white.
  3. Click Save.
Figure 15. Customizing the theme in the Actions console

UI checklist

The following checklist highlights common things you can do to make sure your responses appear appropriately on the surfaces where users experience your Actions.

Cards and Options
Use cards and options

Cards and options let you display information in a richer, more customizable format.

  • Basic card - If you need to present a lot of text to the user, use a basic card. A card can display up to 15 lines of text, and link to a website for further reading. Unlike chat bubbles, the card supports text formatting. You can also add an image and a list or carousel to display options.
  • List - If you are asking the user to pick from a list of choices, consider using a list instead of writing out the list in a chat bubble.
  • Carousel - If you want the user to pick from a list of choices with a focus on larger images, use a carousel, which has a limit of ten items.
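
The guidance above can be summarized as a small decision helper. A minimal sketch; `pickResponseType` is hypothetical, not a library function, and it assumes list selectors hold up to 30 items while carousels max out at ten:

```javascript
// Hypothetical helper mirroring the guidance above; not part of any library.
// Assumes list selectors hold up to 30 items and carousels up to 10.
function pickResponseType(optionCount, emphasizeImages) {
  if (optionCount < 2) return 'simple response'; // nothing to select from
  if (optionCount > 10) return 'list';           // too many tiles for a carousel
  return emphasizeImages ? 'carousel' : 'list';  // prefer lists for easy scanning
}
```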

Suggestion Chips
Use them after most turns

The best thing you can do to increase your Action's usability on devices with screens is to add suggestion chips, so the user can quickly tap to respond in addition to using voice or the keyboard. For example, any yes/no question should have suggestion chips for Yes and No.

When there are a few choices...

When offering the user a small number of choices (eight or fewer), you can add a suggestion chip for each choice (present them in the same order as in your TTS, and use the same terminology).

When there are many choices...

If you ask a question with a wide range of possible answers, present a few of the most popular answers.

When returning media responses...

Your fulfillment must include suggestion chips with the media response if the response is not a final response.
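
Before attaching chips, it can help to enforce the platform limits (at most eight chips per turn, 25 characters per chip). A minimal sketch; `toSuggestionChips` is a hypothetical helper, not part of the actions-on-google library:

```javascript
// Hypothetical helper: keeps at most 8 chips and drops any chip longer
// than 25 characters, matching the platform limits for suggestion chips.
function toSuggestionChips(choices) {
  return choices
    .filter((choice) => choice.length <= 25)
    .slice(0, 8);
}
```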

Chat Bubbles
Correct capitalization and punctuation

Now that your TTS strings can show up as chat bubbles, check them for correct capitalization and punctuation.

Fix phonetic spellings

If you spelled something out phonetically in your TTS to work around a pronunciation issue, that phonetic misspelling will appear in your chat bubble. Specify different display text so that chat bubbles on devices with screens use the correct spelling.

Avoid truncation

Chat bubbles are limited to 640 characters and are truncated after that limit (however, we recommend around 300 characters as a general design guideline). If you have more than that, you can:

  • Use a 2nd chat bubble - Up to 2 chat bubbles are allowed per turn, so find a natural break point and create a second chat bubble.
  • Don't show everything - If you are presenting long TTS content, consider showing only a subset of the TTS content in the chat bubble, such as just an introduction. You can use shorter display text than TTS text in this case.
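
Finding a natural break point can be automated. A minimal sketch in plain JavaScript; `splitIntoBubbles` is a hypothetical helper, and the 640-character limit comes from the platform:

```javascript
// Hypothetical helper that splits long display text into at most two chat
// bubbles, breaking at the last sentence boundary before the limit.
function splitIntoBubbles(text, limit = 640) {
  if (text.length <= limit) return [text];
  const head = text.slice(0, limit);
  // Prefer a sentence boundary; fall back to the last whitespace.
  let breakAt = head.lastIndexOf('. ') + 1;
  if (breakAt <= 0) breakAt = head.lastIndexOf(' ');
  return [text.slice(0, breakAt).trim(), text.slice(breakAt).trim()];
}
```

Note that this produces at most two bubbles because that is the per-turn limit; anything that still exceeds two bubbles should be shortened editorially rather than split further.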

Recorded Audio
Remove <audio> text from chat bubbles

If you have text inside your SSML <audio> tag, it's displayed in your corresponding chat bubble. For example, if your SSML is:

<speak>
  Here's that song.
  <audio src="...">song audio</audio>
</speak>

your chat bubble text appears as "Here's that song. song audio".

Instead, add a <desc> element inside your <audio> element. Any text inside <desc> is displayed in the chat bubble, and any text inside <audio> but outside <desc> is used as alternate text to play if the audio source file cannot be loaded. For example:

<speak>
  Here's that song.
  <audio src="bad_url"><desc></desc>song audio</audio>
</speak>

results in the audio output "Here's that song. song audio" and the chat bubble text "Here's that song."

Alternatively, you can just remove the text from your <audio> tag altogether, or use the SSML <sub> tag.
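
The pattern above can be produced with a small template helper. A sketch in plain JavaScript; `audioSsml` is hypothetical and the URL is a placeholder:

```javascript
// Hypothetical helper that builds SSML for recorded audio. Text inside
// <desc> is shown in the chat bubble; text after </desc> but still inside
// <audio> is only spoken if the audio file fails to load.
function audioSsml({ intro, src, desc = '', alt = '' }) {
  return `<speak>${intro}<audio src="${src}"><desc>${desc}</desc>${alt}</audio></speak>`;
}

const ssml = audioSsml({
  intro: "Here's that song.",
  src: 'https://example.com/song.mp3', // placeholder URL
  alt: 'song audio',
});
```

With an empty `desc`, nothing from the `<audio>` element leaks into the chat bubble, while the alternate text is still available if playback fails.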

Eliminate empty chat bubbles

Every dialog turn must include at least one chat bubble. If your Action has dialogs composed only of streaming audio (no TTS), the chat bubble text will be missing and your response will fail. In these cases, add display text that matches or introduces the words in your recorded audio.
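
A simple guard in your fulfillment can enforce this. A sketch with a hypothetical `ensureDisplayText` helper (not a library function):

```javascript
// Hypothetical guard: gives an audio-only response fallback display text
// so every turn still produces at least one chat bubble.
function ensureDisplayText(response, fallback) {
  if (!response.text || response.text.trim() === '') {
    return { ...response, text: fallback };
  }
  return response;
}
```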