In the growing canon of AI security, the indirect prompt injection has emerged as the most powerful means for attackers to hack large language models such as OpenAIs GPT 3 and GPT 4 or Microsofts Copilot. By exploiting a model inability to distinguish between, on the one hand, developer defined prompts and, on the other, text in external content LLMs interact with, indirect prompt injections are remarkably effective at invoking harmful or otherwise unintended actions. Examples include divulging...
