Machine Learning Glossary: Generative AI

This page contains Generative AI glossary terms. For all glossary terms, click here.

A

auto-regressive model

#language

#image

#generativeAI

A model that infers a prediction based on its own previous predictions. For example, auto-regressive language models predict the next token based on the previously predicted tokens. All Transformer-based large language models are auto-regressive.

In contrast, GAN-based image models are usually not auto-regressive since they generate an image in a single forward-pass and not iteratively in steps. However, certain image generation models are auto-regressive because they generate an image in steps.

C

chain-of-thought prompting

#language

#generativeAI

A prompt engineering technique that encourages a large language model (LLM) to explain its reasoning, step by step. For example, consider the following prompt, paying particular attention to the second sentence:

How many g forces would a driver experience in a car that goes from 0 to 60 miles per hour in 7 seconds? In the answer, show all relevant calculations.

The LLM's response would likely:

Show a sequence of physics formulas, plugging in the values 0, 60, and 7 in appropriate places.
Explain why it chose those formulas and what the various variables mean.

Chain-of-thought prompting forces the LLM to perform all the calculations, which might lead to a more correct answer. In addition, chain-of-thought prompting enables the user to examine the LLM's steps to determine whether or not the answer makes sense.

chat

#language

#generativeAI

The contents of a back-and-forth dialogue with an ML system, typically a large language model. The previous interaction in a chat (what you typed and how the large language model responded) becomes the context for subsequent parts of the chat.

A chatbot is an application of a large language model.

contextualized language embedding

#language

#generativeAI

An embedding that comes close to "understanding" words and phrases in ways that native human speakers can. Contextualized language embeddings can understand complex syntax, semantics, and context.

For example, consider embeddings of the English word cow. Older embeddings such as word2vec can represent English words such that the distance in the embedding space from cow to bull is similar to the distance from ewe (female sheep) to ram (male sheep) or from female to male. Contextualized language embeddings can go a step further by recognizing that English speakers sometimes casually use the word cow to mean either cow or bull.

context window

#language

#generativeAI

The number of tokens a model can process in a given prompt. The larger the context window, the more information the model can use to provide coherent and consistent responses to the prompt.

D

direct prompting

#language

#generativeAI

Synonym for zero-shot prompting.

distillation

#generativeAI

The process of reducing the size of one model (known as the teacher) into a smaller model (known as the student) that emulates the original model's predictions as faithfully as possible. Distillation is useful because the smaller model has two key benefits over the larger model (the teacher):

Faster inference time
Reduced memory and energy usage

However, the student's predictions are typically not as good as the teacher's predictions.

Distillation trains the student model to minimize a loss function based on the difference between the outputs of the predictions of the student and teacher models.

Compare and contrast distillation with the following terms:

fine-tuning
prompt-based learning

F

few-shot prompting

#language

#generativeAI

A prompt that contains more than one (a "few") example demonstrating how the large language model should respond. For example, the following lengthy prompt contains two examples showing a large language model how to answer a query.

Parts of one prompt	Notes
`What is the official currency of the specified country?`	The question you want the LLM to answer.
`France: EUR`	One example.
`United Kingdom: GBP`	Another example.
`India:`	The actual query.

Few-shot prompting generally produces more desirable results than zero-shot prompting and one-shot prompting. However, few-shot prompting requires a lengthier prompt.

Few-shot prompting is a form of few-shot learning applied to prompt-based learning.

fine tuning

#language

#image

#generativeAI

A second, task-specific training pass performed on a pre-trained model to refine its parameters for a specific use case. For example, the full training sequence for some large language models is as follows:

Pre-training: Train a large language model on a vast general dataset, such as all the English language Wikipedia pages.
Fine-tuning: Train the pre-trained model to perform a specific task, such as responding to medical queries. Fine-tuning typically involves hundreds or thousands of examples focused on the specific task.

As another example, the full training sequence for a large image model is as follows:

Pre-training: Train a large image model on a vast general image dataset, such as all the images in Wikimedia commons.
Fine-tuning: Train the pre-trained model to perform a specific task, such as generating images of orcas.

Fine-tuning can entail any combination of the following strategies:

Modifying all of the pre-trained model's existing parameters. This is sometimes called full fine-tuning.
Modifying only some of the pre-trained model's existing parameters (typically, the layers closest to the output layer), while keeping other existing parameters unchanged (typically, the layers closest to the input layer). See parameter-efficient tuning.
Adding more layers, typically on top of the existing layers closest to the output layer.

Fine-tuning is a form of transfer learning. As such, fine-tuning might use a different loss function or a different model type than those used to train the pre-trained model. For example, you could fine-tune a pre-trained large image model to produce a regression model that returns the number of birds in an input image.

Compare and contrast fine-tuning with the following terms:

distillation
prompt-based learning

G

generative AI

#language

#image

#generativeAI

An emerging transformative field with no formal definition. That said, most experts agree that generative AI models can create ("generate") content that is all of the following:

complex
coherent
original

For example, a generative AI model can create sophisticated essays or images.

Some earlier technologies, including LSTMs and RNNs, can also generate original and coherent content. Some experts view these earlier technologies as generative AI, while others feel that true generative AI requires more complex output than those earlier technologies can produce.

Contrast with predictive ML.

I

in-context learning

#language

#generativeAI

Synonym for few-shot prompting.

instruction tuning

#generativeAI

A form of fine-tuning that improves a generative AI model's ability to follow instructions. Instruction tuning involves training a model on a series of instruction prompts, typically covering a wide variety of tasks. The resulting instruction-tuned model then tends to generate useful responses to zero-shot prompts across a variety of tasks.

Compare and contrast with:

parameter-efficient tuning
prompt tuning

L

LoRA

#language

#generativeAI

Abbreviation for Low-Rank Adaptability.

Low-Rank Adaptability (LoRA)

#language

#generativeAI

An algorithm for performing parameter efficient tuning that fine-tunes only a subset of a large language model's parameters. LoRA provides the following benefits:

Fine-tunes faster than techniques that require fine-tuning all of a model's parameters.
Reduces the computational cost of inference in the fine-tuned model.

A model tuned with LoRA maintains or improves the quality of its predictions.

LoRA enables multiple specialized versions of a model.

M

model cascading

#generativeAI

A system that picks the ideal model for a specific inference query.

Imagine a group of models, ranging from very large (lots of parameters) to much smaller (far fewer parameters). Very large models consume more computational resources at inference time than smaller models. However, very large models can typically infer more complex requests than smaller models. Model cascading determines the complexity of the inference query and then picks the appropriate model to perform the inference. The main motivation for model cascading is to reduce inference costs by generally selecting smaller models, and only selecting a larger model for more complex queries.

Imagine that a small model runs on a phone and a larger version of that model runs on a remote server. Good model cascading reduces cost and latency by enabling the smaller model to handle simple requests and only calling the remote model to handle complex requests.

model router

#generativeAI

The algorithm that determines the ideal model for inference in model cascading. A model router is itself typically a machine learning model that gradually learns how to pick the best model for a given input. However, a model router could sometimes be a simpler, non-machine learning algorithm.

O

one-shot prompting

#language

#generativeAI

A prompt that contains one example demonstrating how the large language model should respond. For example, the following prompt contains one example showing a large language model how it should answer a query.

Parts of one prompt	Notes
`What is the official currency of the specified country?`	The question you want the LLM to answer.
`France: EUR`	One example.
`India:`	The actual query.

Compare and contrast one-shot prompting with the following terms:

zero-shot prompting
few-shot prompting

P

parameter-efficient tuning

#language

#generativeAI

A set of techniques to fine-tune a large pre-trained language model (PLM) more efficiently than full fine-tuning. Parameter-efficient tuning typically fine-tunes far fewer parameters than full fine-tuning, yet generally produces a large language model that performs as well (or almost as well) as a large language model built from full fine-tuning.

Compare and contrast parameter-efficient tuning with:

instruction tuning
prompt tuning

Parameter-efficient tuning is also known as parameter-efficient fine-tuning.

PLM

#language

#generativeAI

Abbreviation for pre-trained language model.

pre-trained model

#language

#image

#generativeAI

Models or model components (such as an embedding vector) that have been already been trained. Sometimes, you'll feed pre-trained embedding vectors into a neural network. Other times, your model will train the embedding vectors themselves rather than rely on the pre-trained embeddings.

The term pre-trained language model refers to a large language model that has gone through pre-training.

pre-training

#language

#image

#generativeAI

The initial training of a model on a large dataset. Some pre-trained models are clumsy giants and must typically be refined through additional training. For example, ML experts might pre-train a large language model on a vast text dataset, such as all the English pages in Wikipedia. Following pre-training, the resulting model might be further refined through any of the following techniques:

distillation
fine-tuning
instruction tuning
parameter-efficient tuning
prompt-tuning

prompt

#language

#generativeAI

Any text entered as input to a large language model to condition the model to behave in a certain way. Prompts can be as short as a phrase or arbitrarily long (for example, the entire text of a novel). Prompts fall into multiple categories, including those shown in the following table:

Prompt category	Example	Notes
Question	`How fast can a pigeon fly?`
Instruction	`Write a funny poem about arbitrage.`	A prompt that asks the large language model to do something.
Example	`Translate Markdown code to HTML. For example: Markdown: * list item HTML: <ul> <li>list item</li> </ul>`	The first sentence in this example prompt is an instruction. The remainder of the prompt is the example.
Role	`Explain why gradient descent is used in machine learning training to a PhD in Physics.`	The first part of the sentence is an instruction; the phrase "to a PhD in Physics" is the role portion.
Partial input for the model to complete	`The Prime Minister of the United Kingdom lives at`	A partial input prompt can either end abruptly (as this example does) or end with an underscore.

A generative AI model can respond to a prompt with text, code, images, embeddings, videos…almost anything.

prompt-based learning

#language

#generativeAI

A capability of certain models that enables them to adapt their behavior in response to arbitrary text input (prompts). In a typical prompt-based learning paradigm, a large language model responds to a prompt by generating text. For example, suppose a user enters the following prompt:

Summarize Newton's Third Law of Motion.

A model capable of prompt-based learning isn't specifically trained to answer the previous prompt. Rather, the model "knows" a lot of facts about physics, a lot about general language rules, and a lot about what constitutes generally useful answers. That knowledge is sufficient to provide a (hopefully) useful answer. Additional human feedback ("That answer was too complicated." or "What's a reaction?") enables some prompt-based learning systems to gradually improve the usefulness of their answers.

prompt design

#language

#generativeAI

Synonym for prompt engineering.

prompt engineering

#language

#generativeAI

The art of creating prompts that elicit the desired responses from a large language model. Humans perform prompt engineering. Writing well-structured prompts is an essential part of ensuring useful responses from a large language model. Prompt engineering depends on many factors, including:

The dataset used to pre-train and possibly fine-tune the large language model.
The temperature and other decoding parameters that the model uses to generate responses.

See Introduction to prompt design for more details on writing helpful prompts.

Prompt design is a synonym for prompt engineering.

prompt tuning

#language

#generativeAI

A parameter efficient tuning mechanism that learns a "prefix" that the system prepends to the actual prompt.

One variation of prompt tuning—sometimes called prefix tuning—is to prepend the prefix at every layer. In contrast, most prompt tuning only adds a prefix to the input layer.

Click the icon to learn more about prefixes.

For prompt tuning, the "prefix" (also known as a "soft prompt") is a handful of learned, task-specific vectors prepended to the text token embeddings from the actual prompt. The system learns the soft prompt by freezing all other model parameters and fine-tuning on a specific task.

R

Reinforcement Learning from Human Feedback (RLHF)

#generativeAI

#rl

Using feedback from human raters to improve the quality of a model's responses. For example, an RLHF mechanism can ask users to rate the quality of a model's response with a 👍 or 👎 emoji. The system can then adjust its future responses based on that feedback.

role prompting

#language

#generativeAI

An optional part of a prompt that identifies a target audience for a generative AI model's response. Without a role prompt, a large language model provides an answer that may or may not be useful for the person asking the questions. With a role prompt, a large language model can answer in a way that's more appropriate and more helpful for a specific target audience. For example, the role prompt portion of the following prompts are in boldface:

Summarize this article for a PhD in economics.
Describe how tides work for a ten-year old.
Explain the 2008 financial crisis. Speak as you might to a young child, or a golden retriever.

S

soft prompt tuning

#language

#generativeAI

A technique for tuning a large language model for a particular task, without resource intensive fine-tuning. Instead of retraining all the weights in the model, soft prompt tuning automatically adjusts a prompt to achieve the same goal.

Given a textual prompt, soft prompt tuning typically appends additional token embeddings to the prompt and uses backpropagation to optimize the input.

A "hard" prompt contains actual tokens instead of token embeddings.

T

temperature

#language

#image

#generativeAI

A hyperparameter that controls the degree of randomness of a model's output. Higher temperatures result in more random output, while lower temperatures result in less random output.

Choosing the best temperature depends on the specific application and the preferred properties of the model's output. For example, you would probably raise the temperature when creating an application that generates creative output. Conversely, you would probably lower the temperature when building a model that classifies images or text in order to improve the model's accuracy and consistency.

Temperature is often used with softmax.

Z

zero-shot prompting

#language

#generativeAI

A prompt that does not provide an example of how you want the large language model to respond. For example:

Parts of one prompt	Notes
`What is the official currency of the specified country?`	The question you want the LLM to answer.
`India:`	The actual query.

The large language model might respond with any of the following:

Rupee
INR
₹
Indian rupee
The rupee
The Indian rupee

All of the answers are correct, though you might prefer a particular format.

Compare and contrast zero-shot prompting with the following terms:

one-shot prompting
few-shot prompting