I’ve been sitting with a question that doesn’t fit neatly into any one field, which is usually a sign it’s worth chasing. The question is this: if prompt injection is fundamentally an attack on an agent’s ordering of values — slipping a low-priority instruction past a higher-priority commitment — then why do some humans seem nearly immune to the analogous attack in the wild, while others fall for it constantly? And what does that asymmetry tell us about how to build AI systems that don’t?