AI-powered content search

Learn how to build a generative-AI conversational search application capable of answering questions related to a project or product.

11 activities
1

Overview: AI-powered content search

Searching for information is one of the most common uses of artificial intelligence (AI) language models. Building a conversational search interface for your content using AI allows your users to ask specific questions and get direct answers.

This pathway shows you how to build an AI-powered, conversational search interface for your content. It's based on Docs Agent, an open source project that uses the Google PaLM API to create a conversational search interface without training a new AI model or tuning PaLM models. That means you can build this search functionality quickly and use it for both small and large content sets.
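One common way to answer questions from existing content without training or tuning a model is to place relevant passages from that content in the prompt next to the user's question. The sketch below illustrates that pattern only. It is an assumption for illustration, not a description of Docs Agent's actual code, and `build_prompt`, its parameters, and the instruction wording are all hypothetical.

```python
def build_prompt(question, passages,
                 instruction="Answer the question using only the context below."):
    """Assemble a prompt from retrieved passages and a user question.

    Hypothetical helper: the layout and wording are illustrative assumptions.
    """
    # Number each passage so an answer can refer back to its source.
    context = "\n\n".join(
        f"[{index}] {passage}" for index, passage in enumerate(passages, start=1)
    )
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The resulting string would then be sent to the language model. Which passages to include is decided by the embedding search set up later in this pathway.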

Refer to "Build an AI content search with Docs Agent" as you move through the steps of this pathway.

2

AI Content Search - Build with Google AI

Video

Watch this video to see how you can build an AI-powered conversational search interface for your content using the Google Gemini API and the open-source project Docs Agent.

This video gives an overview of what you will learn in this pathway and includes a demo, a developer chat, and customization guidance.

3

Install prerequisites

The Docs Agent project uses Python 3 and Python Poetry to manage packages and run the application. The following installation instructions are for a Linux host machine.

1. Install Python 3 and the venv virtual environment package for Python.

  
  sudo apt update
  sudo apt install git pip python3-venv
  

2. Install Python Poetry to manage the dependencies and packaging for the Docs Agent project. You can use this tool to add more Python libraries if you extend the project.

  
  curl -sSL https://install.python-poetry.org | python3 -
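After installing, you may want to confirm that the tools are actually on your PATH. The following is a minimal sketch using only the Python standard library. The helper name `missing_tools` is hypothetical, and the list of tool names in the usage note is an assumption based on the steps above.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    return [tool for tool in tools if which(tool) is None]
```

For example, `missing_tools(["git", "python3", "poetry"])` returns an empty list when all three commands are available. If `poetry` is reported missing right after installation, its install directory may not be on your PATH yet.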
  

4

Set environment variables

To set the environment variables required for the Docs Agent code project to run, do the following:

1. Get a Google PaLM API Key from the Generative AI Developer site and copy the key string.

2. Set the API Key as an environment variable using the following command, replacing YOUR_API_KEY with the key string you copied:

  
  export PALM_API_KEY=YOUR_API_KEY
  

3. Resolve a known issue for Python Poetry by setting the PYTHON_KEYRING_BACKEND parameter.

  
  export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
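A quick way to verify that both variables are visible to Python before running the project is sketched below. The variable names come from the steps above. The helper `unset_variables` is hypothetical and is not part of Docs Agent.

```python
import os

# Names taken from the steps above.
REQUIRED_VARS = ("PALM_API_KEY", "PYTHON_KEYRING_BACKEND")


def unset_variables(names=REQUIRED_VARS, environ=None):
    """Return the variable names that are missing or set to an empty string."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling `unset_variables()` in the same shell session where you ran the export commands should return an empty list. Variables set with `export` last only for the current shell session, so a new terminal needs them set again.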
  

5

Clone and configure the project

Article

Download the project code and use the Poetry installation command to download the required dependencies and configure the project.

1. Clone the git repository using the following command:

  
  git clone https://github.com/google/generative-ai-docs
  

2. Optionally, configure your local git repository to use sparse checkout, so you have only the files for the Docs Agent project.

  
  cd generative-ai-docs/
  git sparse-checkout init --cone
  git sparse-checkout set demos/palm/python/docs-agent/
  

3. Move to the docs-agent project root directory.

  
  cd demos/palm/python/docs-agent/
  

4. Run the Poetry install command to download dependencies and configure the project.

  
  poetry install
  

6

Prepare content

The Docs Agent project is designed to work with text content. It includes tools that work specifically with websites that use Markdown as the source format. If you are working with website content, preserve or replicate the directory structure of the served website. This enables the content processing task to map and create links to that content.

Depending on the format and details of your content, you may need to clean your content to remove non-public information, internal notes, or other information that you don't want to be searchable. Retain basic formatting such as titles and headings, which help create logical text splits, or chunks, in the content processing step.
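To illustrate why headings matter, here is a minimal sketch of splitting Markdown text into chunks at each heading. It is not the project's own splitter. The function name `split_by_headings` is hypothetical, and the sketch assumes `#`-style headings and does not handle `#` lines inside fenced code blocks.

```python
import re

# Matches Markdown headings written with one to six leading '#' characters.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown_text):
    """Split Markdown into (heading, body) chunks, one chunk per heading."""
    chunks = []
    heading, lines = "", []

    def flush():
        # Skip the empty chunk that would precede the first heading.
        if heading or any(line.strip() for line in lines):
            chunks.append((heading, "\n".join(lines).strip()))

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Content with no headings at all comes back as a single chunk with an empty heading, which shows how removing headings during cleanup would leave the splitter with nothing to divide on.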

To prepare content for processing, do the following:

1. Create a directory for the content you want the AI agent to search.

  
  mkdir docs-agent/content/
  

2. Copy your content into the docs-agent/content/ directory. If the content is a website, preserve or replicate the directory structure of the served website.

3. Clean or edit the content as needed to remove non-public information, or other information you don't want included in the searches.
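The reason for preserving the website's directory structure can be sketched as follows: when source files mirror the served site, a file path can be turned back into a page URL. This is an illustration under assumed URL conventions (pages served without the `.md` extension, `index.md` served at its directory), not the project's actual mapping logic. The function name and the example base URL are hypothetical.

```python
from pathlib import PurePosixPath


def source_path_to_url(path, content_root, base_url):
    """Map a Markdown file under `content_root` to a URL under `base_url`."""
    relative = PurePosixPath(path).relative_to(content_root)
    page = relative.with_suffix("")  # drop the .md extension
    if page.name == "index":
        page = page.parent  # index.md is served at its directory
    suffix = page.as_posix()
    if suffix == ".":
        suffix = ""  # the site root
    return base_url.rstrip("/") + "/" + suffix
```

With these assumptions, `docs-agent/content/get-started/install.md` maps to `https://example.com/get-started/install`. If the files were copied into a flat directory instead, that path information would be lost.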

7

Use Flutter docs for testing

Article

You can use the Flutter developer docs for testing by doing the following:

1. Move to the content directory for the content you want the AI agent to search.

  
  cd docs-agent/content/
  

2. Clone the Flutter docs into the docs-agent/content/ directory.

  
  git clone --recurse-submodules https://github.com/flutter/website.git
  

8

Create text embedding vectors

Code sample

To generate text embeddings and populate the vector database, do the following:

1. Navigate to the docs-agent project directory.

  
  cd docs-agent/
  

2. Populate the vector database with your content using the populate_vector_database.py script.

  
  poetry run python3 scripts/populate_vector_database.py
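Conceptually, a vector database stores one embedding vector per text chunk and returns the chunks whose vectors are closest to the vector of a question. The sketch below shows that lookup with an in-memory list and cosine similarity. It is a toy illustration, not the database or the embedding model the project uses. The tiny two-dimensional vectors in the usage note are made up, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_chunks(query_vector, store, k=2):
    """Return the text of the k entries in `store` most similar to the query.

    `store` is a list of (text, vector) pairs standing in for a vector database.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

For example, with a store of `[("install", [1.0, 0.0]), ("widgets", [0.0, 1.0]), ("setup", [0.9, 0.1])]`, the query vector `[1.0, 0.0]` ranks "install" first and "setup" second. Real embeddings have many more dimensions, but the comparison works the same way.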
  

9

Other content formats

Article Optional

The Docs Agent project is designed to work with website content in Markdown format. You can use other content formats with the project, but you or other members of the community need to build the tools to process them. Check the code repository Issues and Pull Requests for people building similar solutions.

The key thing you need to build to support other content formats is a splitter script like the scripts/markdown_to_plain_text.py script. Aim to build a script or program that creates output similar to that script's. Remember that the final text output should have minimal formatting and extraneous information. If you are using content formats such as HTML or JSON, strip away as much of the non-informational formatting (tags, scripting, CSS) as possible, so that it does not skew the values of the text embeddings you generate from the content.

Once you have built a splitter script for your content format, you should be able to run the populate_vector_database.py script to populate your vector database.
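As a starting point for HTML content, the following sketch strips tags, scripts, and CSS using Python's standard `html.parser` module. It is a hedged illustration only: it does not reproduce the output format of scripts/markdown_to_plain_text.py, which you would need to check and match yourself. The names `html_to_plain_text` and `_TextExtractor` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}
    BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "br":
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            # End each block element with a line break.
            self._parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_plain_text(html):
    """Return the visible text of an HTML string, one block element per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Navigation menus, footers, and other repeated page elements are still included by this sketch, so real content would likely need additional filtering before you generate embeddings.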

10

Run and test the project web interface

Article

To run and test the project web interface, do the following:

1. Navigate to the docs-agent project directory.

  
  cd docs-agent/
  

2. Run the web application launch script.

  
  poetry run ./chatbot/launch.sh -p 5000
  

3. Using your web browser, navigate to the URL shown in the output of the launch script and test the application.

  
  * Running on http://YOUR_HOSTNAME:5000
  

11

Additional resources

Code sample Optional

For more information about the Docs Agent project, see the code repository.

If you need help building the application or are looking for developer collaborators, check out the Google Developers Community Discord server.

If you plan to deploy Docs Agent for a large audience, note that your use of the Google PaLM API may be subject to rate limiting and other use restrictions.
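A common way to cope with rate limiting is to retry a failed call after an exponentially increasing delay. The sketch below shows that general pattern. `RateLimitError` is a hypothetical placeholder for whatever exception your API client raises when a rate limit is hit, and the attempt count and delays are illustrative assumptions, not documented limits.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client library's rate-limit exception."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying on RateLimitError with exponential backoff.

    Waits base_delay, then 2x, then 4x, and so on between attempts, and
    re-raises the error once `max_attempts` calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Retrying smooths over short bursts only. It does not raise your quota, so sustained high traffic still calls for the options described next.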

If you are considering building a production application like Docs Agent with the PaLM API, check out Google Cloud Vertex AI services for increased scalability and reliability of your app.