Safety
Stay organized with collections
Save and categorize content based on your preferences.
AI safety includes a set of design and operational techniques to follow to
avoid and contain actions that can cause harm, intentionally or unintentionally.
For example, do AI systems behave as intended, even in the face of a security
breach or targeted attack? Is the AI system robust enough to operate safely
even when perturbed? How do you plan ahead to prevent or avoid risks? Is the AI
system reliable and stable under pressure?
One such safety technique is adversarial testing,
or the practice of trying to "break" your own application to learn how it
behaves when provided with malicious or inadvertently harmful input. The
Responsible Generative AI Toolkit
explains more about safety evaluations, including adversarial testing. Learn
more about Google's work in this area and lessons
learned in the Keyword blog post, Google's AI Red Team: the ethical hackers
making AI
safer
or at SAIF: Google's Guide to Secure AI.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-25 UTC.
[null,null,["Last updated 2025-08-25 UTC."],[[["\u003cp\u003eAI safety encompasses design and operational techniques to prevent harm, ensuring AI systems behave as intended, even under pressure or attack.\u003c/p\u003e\n"],["\u003cp\u003eAdversarial testing is a key safety technique where AI systems are intentionally challenged with malicious or harmful input to assess their robustness.\u003c/p\u003e\n"],["\u003cp\u003eGoogle's Responsible AI Practices provide recommendations for protecting AI systems, including methods for adversarial testing and safeguarding against attacks.\u003c/p\u003e\n"]]],[],null,["# Safety\n\n\u003cbr /\u003e\n\nAI **safety** includes a set of design and operational techniques to follow to\navoid and contain actions that can cause harm, intentionally or unintentionally.\nFor example, do AI systems behave as intended, even in the face of a security\nbreach or targeted attack? Is the AI system robust enough to operate safely\neven when perturbed? How do you plan ahead to prevent or avoid risks? Is the AI\nsystem reliable and stable under pressure?\n\nOne such safety technique is [adversarial testing](/machine-learning/guides/adv-testing),\nor the practice of trying to \"break\" your own application to learn how it\nbehaves when provided with malicious or inadvertently harmful input. The\n[Responsible Generative AI Toolkit](https://ai.google.dev/responsible/docs/evaluation)\nexplains more about safety evaluations, including adversarial testing. Learn\nmore about Google's work in this area and lessons\nlearned in the Keyword blog post, [Google's AI Red Team: the ethical hackers\nmaking AI\nsafer](https://blog.google/technology/safety-security/googles-ai-red-team-the-ethical-hackers-making-ai-safer/)\nor at [SAIF: Google's Guide to Secure AI](https://saif.google/)."]]