8. Security - Hacking & Guardrails in LLM

8. Security - Hacking & Guardrails in LLMยถ

Summaryยถ

LLM guardrails๋Š” AI ์‹œ์Šคํ…œ์˜ ํ–‰๋™๊ณผ ์ถœ๋ ฅ์„ ์ œ์–ดํ•˜๊ธฐ ์œ„ํ•œ ์‚ฌ์ „ ์ •์˜๋œ ํ”„๋กœํ† ์ฝœ, ๊ทœ์น™, ๋ฐ ์ œํ•œ ์‚ฌํ•ญ์˜ ์ง‘ํ•ฉ์ž…๋‹ˆ๋‹ค. ์ด๋“ค์€ AI ์†Œํ”„ํŠธ์›จ์–ด๊ฐ€ ์œค๋ฆฌ์  ๊ธฐ์ค€์„ ์ถฉ์กฑํ•˜๋„๋ก ๋ณด์žฅํ•˜๋Š” ์•ˆ์ „ ๋ฉ”์ปค๋‹ˆ์ฆ˜์œผ๋กœ ์ž‘๋™ํ•˜๋ฉฐ, AI ๊ฐœ๋ฐœ ๊ณผ์ •์—์„œ ๋ฒ•์  ์ค€์ˆ˜, ๊ฐœ์ธ ์ •๋ณด ๋ณดํ˜ธ, ์œค๋ฆฌ์  ๊ณ ๋ ค ์‚ฌํ•ญ, ๊ทธ๋ฆฌ๊ณ  ์•ˆ์ „ ๋ฐ ๋ณด์•ˆ์„ ๊ฐ•ํ™”ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, LLM guardrails๋Š” AI ์‹œ์Šคํ…œ์ด ์•…์˜์ ์ธ ์ž…๋ ฅ์ด๋‚˜ ์ถœ๋ ฅ์„ ๋ฐฉ์ง€ํ•˜๊ณ , ์‚ฌ์šฉ์ž์™€ ๊ฐœ๋ฐœ์ž๊ฐ€ AI์˜ ๋™์ž‘์„ ์ดํ•ดํ•˜๊ณ  ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค๋‹ˆ๋‹ค.

Key Conceptsยถ

  • LLM Guardrails : AI ์‹œ์Šคํ…œ์˜ ํ–‰๋™๊ณผ ์ถœ๋ ฅ์„ ์ œ์–ดํ•˜๊ธฐ ์œ„ํ•œ ์‚ฌ์ „ ์ •์˜๋œ ํ”„๋กœํ† ์ฝœ, ๊ทœ์น™, ๋ฐ ์ œํ•œ ์‚ฌํ•ญ์˜ ์ง‘ํ•ฉ์œผ๋กœ, AI ์†Œํ”„ํŠธ์›จ์–ด๊ฐ€ ์œค๋ฆฌ์  ๊ธฐ์ค€์„ ์ถฉ์กฑํ•˜๋„๋ก ๋ณด์žฅํ•˜๋Š” ์•ˆ์ „ ๋ฉ”์ปค๋‹ˆ์ฆ˜์ž…๋‹ˆ๋‹ค.

  • Guardrail Types : Adaptive Guardrails, Input Validation, Output Filtering, Legal Compliance, Privacy Preservation, Ethical Considerations, Safety and Security ๋“ฑ ๋‹ค์–‘ํ•œ ์œ ํ˜•์˜ guardrails์ด ์กด์žฌํ•˜๋ฉฐ, ๊ฐ๊ฐ AI ์‹œ์Šคํ…œ์˜ ํŠน์ •ํ•œ ์œ„ํ—˜์„ ์™„ํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

  • Implementation : Guardrails AI, NVIDIA์˜ NeMo Guardrails, Amazon Bedrock Guardrails ๋“ฑ ๋‹ค์–‘ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ์™€ ๋„๊ตฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ LLM guardrails์„ ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • Hacking and Bypass Techniques : ์•…์˜์ ์ธ ์‚ฌ์šฉ์ž๊ฐ€ LLM guardrails์„ ์šฐํšŒํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋Š” ๋‹ค์–‘ํ•œ ๊ธฐ์ˆ ์ด ์กด์žฌํ•˜๋ฉฐ, ์ด๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ์ง€์†์ ์ธ ๋ชจ๋‹ˆํ„ฐ๋ง๊ณผ ๋ณด์•ˆ ์—…๋ฐ์ดํŠธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Referencesยถ

URL Name

URL

Heavybit - LLM Guardrails

https://www.heavybit.com/library/article/how-llm-guardrails-reduce-ai-risk-in-software-development

AWS - Building Safe and Responsible Generative AI Applications with Guardrails

https://aws.amazon.com/blogs/machine-learning/build-safe-and-responsible-generative-ai-applications-with-guardrails/

YouTube - Attacking AI

Bypass Guardrails

Towards Data Science - Safeguarding LLMs with Guardrails

https://towardsdatascience.com/safeguarding-llms-with-guardrails-4f5d9f57cff2

Neptune.ai - LLM Guardrails: Secure and Controllable Deployment

https://neptune.ai/blog/llm-guardrails