LLM Jailbreak Taxonomy · AI Safety Research · 2026
40 adversarial attack patterns. 10 mechanism-grounded categories.
Mapped to the safety-alignment assumptions each subverts.
Every citation verified by direct WebFetch. Refuted claims documented.
