Alexander Panfilov

Yo! My name is Sasha and I am a second-year ELLIS / IMPRS-IS PhD student, based in Tübingen. I find myself very lucky to be advised by Jonas Geiping and Maksym Andriushchenko.
Broadly, I am interested in adversarial robustness, AI safety, and ML security. In practical terms, I enjoy finding various ways to break machine learning systems. Roughly three days a week I am an AI doomer.
Lately, I have been focusing on jailbreaking attacks on LLMs, contemplating: (1) What are the viable threat models for attacks on safety tuning? (2) Are safety jailbreaks truly effective, or are we victims of flawed (LLM-based) evaluations? (3) Are we doomed?
You can find my CV here. I am always open to collaboration — feel free to reach out via email!
News
- May 01, 2025: Our work, "An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks", has been accepted at ICML 2025.
- April 15, 2025: Our work, "ASIDE: Architectural Separation of Instructions and Data in Language Models", has been accepted for an oral presentation at the BuildingTrust Workshop at ICLR 2025.
- November 05, 2024: Presented our work, "Provable Compositional Generalization for Object-Centric Learning", at EPFL (Nicolas Flammarion's group seminar). You can find the slides here.
- October 09, 2024: Our work, "A Realistic Threat Model for Large Language Model Jailbreaks", has been accepted for an oral presentation at the Red Teaming GenAI Workshop at NeurIPS 2024.
- May 01, 2024: Started my PhD at the ELLIS Institute Tübingen / Max Planck Institute for Intelligent Systems. You can find the slides for my IMPRS talk here.
Selected Publications

Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping
preprint
Paper / Code

An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko*, Alexander Panfilov*, Václav Voráček, Matthias Hein, Jonas Geiping
ICML 2025
Paper / Code

ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Alexandra Volkova, Sebastian Lapuschkin, Wojciech Samek, Christoph Lampert
BuildingTrust Workshop at ICLR 2025
Paper / Code

Provable Compositional Generalization for Object-Centric Learning
Thaddäus Wiedemer*, Jack Brady*, Alexander Panfilov*, Attila Juhos*, Matthias Bethge, Wieland Brendel
ICLR 2024
Paper / Code / Project Page

Arip Asadulaev, Vitaly Shutov, Alexander Korotin, Alexander Panfilov, Vladislava Kontsevaya, Andrey Filchenkov
CoLLAs 2023
Paper
Acknowledgements
I am grateful to the many colleagues I have worked with, from whom I learned so much, for their invaluable contributions to my career. I would like to especially acknowledge the mentorship and guidance of Svyatoslav Oreshin, Arip Asadulaev, Roland Zimmerman, Thaddäus Wiedemer, Jack Brady, Wieland Brendel, Valentyn Boreiko, and Matthias Hein.