프롬프트 해킹(Prompt Hacking)

Contents

프롬프트 해킹(Prompt Hacking)¶

요약¶

프롬프트 해킹은 LLM(대형 언어 모델)의 입력을 조작하여 의도하지 않은 결과를 얻는 공격 기법입니다. 이 공격은 LLM의 복잡한 언어 이해 능력을 악용하여 모델이 의도하지 않은 행동을 하도록 유도합니다. 프롬프트 해킹은 LLM의 보안에 심각한 위협을 가하며, 이를 방지하기 위해서는 강력한 보안 조치와 지속적인 모니터링이 필요합니다.

주요 개념¶

프롬프트 인젝션(Prompt Injection) : 악의적인 프롬프트를 합법적인 프롬프트로 위장하여 LLM을 조작하는 공격 기법입니다.
프롬프트 리크(Prompt Leakage) : LLM의 내부 데이터를 노출시키는 공격 기법으로, 이는 민감한 정보의 유출을 초래할 수 있습니다.
제일브레이킹(Jailbreaking) : LLM의 보안 조치를 우회하여 모델을 조작하는 공격 기법입니다.

참고자료¶

URL 이름	URL
OWASP LLM Prompt Hacking	https://owasp.org/www-project-llm-prompt-hacking/
Prompt Hacking: The New Cyber Threat	https://promptengineering.org/the-rise-of-a-new-threat-prompt-hacking/
Prompt Hacking: Understanding Types and Defenses for LLM Security	https://learnprompting.org/docs/prompt_hacking/introduction
Prompt Hacking of Large Language Models - Comet.ml	https://www.comet.com/site/blog/prompt-hacking-of-large-language-models/
Prompt Hacking of Large Language Models (LLMs)	https://futureskillsacademy.com/blog/prompt-hacking-of-large-language-models/

previous

프롬프트 가드레일(Prompt Guardrail)

next

9. Agents and Tools in LLM