8. Security - Hacking & Guardrails in LLM

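Summary

LLM guardrails are a set of predefined protocols, rules, and restrictions that control the behavior and output of an AI system. They act as a safety mechanism that keeps AI software within ethical standards, and they strengthen legal compliance, privacy protection, ethical considerations, and safety and security throughout AI development. Guardrails also block malicious inputs and harmful outputs, and they help users and developers understand and trust how the AI system behaves.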

Key Concepts

LLM Guardrails : A set of predefined protocols, rules, and restrictions that control the behavior and output of an AI system; a safety mechanism that ensures AI software meets ethical standards.
Guardrail Types : Several types of guardrails exist, including Adaptive Guardrails, Input Validation, Output Filtering, Legal Compliance, Privacy Preservation, Ethical Considerations, and Safety and Security. Each is designed to mitigate a specific risk of an AI system.
Implementation : LLM guardrails can be implemented with frameworks and tools such as Guardrails AI, NVIDIA's NeMo Guardrails, and Amazon Bedrock Guardrails.
Hacking and Bypass Techniques : Malicious users employ a variety of techniques to bypass LLM guardrails; countering them requires continuous monitoring and security updates.
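
To make the Input Validation and Output Filtering guardrail types more concrete, here is a minimal sketch in plain Python. It does not use the API of any framework named in this section. The deny-list patterns, the `GuardrailResult` type, and the `validate_input`, `filter_output`, and `guarded_call` helpers are all hypothetical names introduced for illustration. The normalization step is one simple way to resist trivial bypass attempts such as extra whitespace or full-width look-alike characters; it is an assumption about what a basic defense might look like, not a complete one.

```python
import re
import unicodedata
from dataclasses import dataclass

# Hypothetical deny-list; a real deployment would rely on a maintained
# classifier or a dedicated guardrail framework, not a few regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

# Deliberately simple email matcher used to illustrate privacy redaction.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class GuardrailResult:
    allowed: bool
    text: str
    reason: str = ""


def normalize(text: str) -> str:
    """Fold Unicode compatibility forms and collapse whitespace before matching."""
    folded = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", folded).strip()


def validate_input(prompt: str) -> GuardrailResult:
    """Input guardrail: reject prompts that match a known injection pattern."""
    cleaned = normalize(prompt)
    for pattern in INJECTION_PATTERNS:
        if pattern.search(cleaned):
            return GuardrailResult(False, "", f"blocked by pattern: {pattern.pattern}")
    return GuardrailResult(True, cleaned)


def filter_output(completion: str) -> GuardrailResult:
    """Output guardrail: redact email addresses from the model's response."""
    redacted, count = EMAIL_PATTERN.subn("[REDACTED_EMAIL]", completion)
    reason = f"redacted {count} email address(es)" if count else ""
    return GuardrailResult(True, redacted, reason)


def guarded_call(prompt: str, model) -> GuardrailResult:
    """Wrap any callable `model(prompt) -> str` with both guardrails."""
    checked = validate_input(prompt)
    if not checked.allowed:
        return GuardrailResult(False, "Request declined by input guardrail.", checked.reason)
    return filter_output(model(checked.text))
```

Here `model` can be any function that takes a prompt string and returns a completion string, so the wrapper can be tried with a stub before connecting a real LLM client. Pattern matching of this kind is easy to evade with paraphrasing, which is why the section stresses continuous monitoring and security updates rather than a fixed rule set.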

References

URL Name	URL
Heavybit - LLM Guardrails	https://www.heavybit.com/library/article/how-llm-guardrails-reduce-ai-risk-in-software-development
AWS - Building Safe and Responsible Generative AI Applications with Guardrails	https://aws.amazon.com/blogs/machine-learning/build-safe-and-responsible-generative-ai-applications-with-guardrails/
YouTube - Attacking AI	Bypass Guardrails
Towards Data Science - Safeguarding LLMs with Guardrails	https://towardsdatascience.com/safeguarding-llms-with-guardrails-4f5d9f57cff2
Neptune.ai - LLM Guardrails: Secure and Controllable Deployment	https://neptune.ai/blog/llm-guardrails

LLM Engineering Handbook
