AI safety includes a set of design and operational techniques to follow to avoid and contain actions that can cause harm, intentionally or unintentionally. For example, do AI systems behave as intended, even in the face of a security breach or targeted attack? Is the AI system robust enough to operate safely even when perturbed? How do you plan ahead to prevent or avoid risks? Is the AI system reliable and stable under pressure?
One such safety technique is adversarial testing, or the practice of trying to "break" your own application to learn how it behaves when provided with malicious or inadvertently harmful input. The Safety section of Google's Responsible AI Practices outlines recommended practices to protect AI systems from attacks, including adversarial testing. Learn more about Google's work in this area and lessons learned in the Keyword blog post, Google's AI Red Team: the ethical hackers making AI safer or at SAIF: Google's Guide to Secure AI.