How PaLM 2 was built and evaluated
Building PaLM 2
PaLM 2 excels at tasks like advanced reasoning, translation, and code generation because of how it was built. It improves upon its predecessor, PaLM, by unifying three distinct research advancements in large language models:
- Use of compute-optimal scaling: The basic idea of compute-optimal scaling is to scale the model size and the training dataset size in proportion to each other (a worked sketch of this idea follows the list). This technique makes PaLM 2 smaller than PaLM yet more efficient, with better overall performance: faster inference, fewer parameters to serve, and a lower serving cost.
- Improved dataset mixture: Previous LLMs, like PaLM, used pre-training datasets that were mostly English-only text. PaLM 2 improves on its corpus with a more multilingual and diverse pre-training mixture, which includes hundreds of human and programming languages, mathematical equations, scientific papers, and web pages.
- Updated model architecture and objectives: PaLM 2 has an improved architecture and was trained on a variety of different tasks, all of which help PaLM 2 learn different aspects of language.
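For intuition on the first point, here is a minimal sketch of proportional, compute-optimal scaling in the style of the Chinchilla scaling laws (Hoffmann et al., 2022), which reach the same "scale model and data together" conclusion. The `compute_optimal_split` function and the 20-tokens-per-parameter ratio are illustrative assumptions, not PaLM 2's actual (unpublished) fitted coefficients.

```python
# A sketch of compute-optimal scaling: for a training-compute budget C
# (in FLOPs), a common approximation is C ~= 6 * N * D, where N is the
# parameter count and D is the number of training tokens. Holding the
# tokens-to-parameters ratio fixed makes N and D scale in proportion,
# each growing roughly as the square root of C.

def compute_optimal_split(flops_budget: float,
                          tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Return (params, tokens) that spend `flops_budget` while keeping a
    fixed tokens-to-parameters ratio (an assumed, Chinchilla-like value)."""
    # With D = r * N and C = 6 * N * D = 6 * r * N**2, solving for N gives:
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for budget in (1e21, 1e22, 1e23):  # 10x more compute at each step
        n, d = compute_optimal_split(budget)
        # Both N and D grow by ~sqrt(10) ~= 3.16x per 10x compute.
        print(f"C={budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Under these assumptions, a 10x larger compute budget is spent not on a 10x larger model but on a ~3.2x larger model trained on ~3.2x more tokens, which is why a compute-optimally scaled model can be smaller than a predecessor trained with a similar budget while performing better.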