Alexander Panfilov
Yo! My name is Sasha and I am a second-year ELLIS / IMPRS-IS PhD student, based in Tübingen. I find myself very lucky to be advised by Jonas Geiping and Maksym Andriushchenko.
Broadly, I am interested in adversarial robustness, AI safety, and ML security. In practical terms, I enjoy finding various ways to break machine learning systems. Roughly three days a week I am an AI doomer.
Lately, I have been focusing on jailbreaking attacks on LLMs, contemplating: (1) What are the viable threat models for attacks on safety tuning? (2) Are safety jailbreaks truly effective, or are we victims of flawed (LLM-based) evaluations? (3) Are we doomed?
You can find my CV here. I am always open to collaboration — feel free to reach out via email!
News
- September 01, 2025: Kristina Nikolić, Evgenii Kortukov, and I won third place at the ARENA 6.0 Mechanistic Interpretability Hackathon, organized by Apart Research at LISA (London)!
- July 09, 2025: Capability-Based Scaling Laws for LLM Red-Teaming accepted at ICML 2025 Workshop on Reliable and Responsible Foundation Models!
- June 23, 2025: Presented our work on Capability-Based Scaling Laws for LLM Red-Teaming and ASIDE at Google's Red Teaming seminar. You can find the slides here. Thanks for the invitation!
- May 01, 2025: Our work, An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks, has been accepted at ICML 2025.
- April 15, 2025: Our work, ASIDE: Architectural Separation of Instructions and Data in Language Models, has been accepted for an oral presentation at the BuildingTrust Workshop at ICLR 2025.
- November 05, 2024: Presented our work, Provable Compositional Generalization for Object-Centric Learning, at EPFL (Nicolas Flammarion's group seminar). You can find the slides here.
- October 09, 2024: Our work, A Realistic Threat Model for Large Language Model Jailbreaks, has been accepted for an oral presentation at the Red Teaming GenAI Workshop at NeurIPS 2024.
- May 01, 2024: Started my PhD at the ELLIS Institute Tübingen / Max Planck Institute for Intelligent Systems. You can find the slides for my IMPRS talk here.
Selected Publications

Mikhail Terekhov*, Alexander Panfilov, Daniil Dzenhaliou*, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
preprint
Paper / Project Page

Alexander Panfilov, Evgenii Kortukov*, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping
preprint
Paper

Capability-Based Scaling Laws for LLM Red-Teaming
Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping
preprint
Paper / Code

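An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko*, Alexander Panfilov, Václav Voráček, Matthias Hein, Jonas Geiping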
ICML 2025
Paper / Code

ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Alexandra Volkova, Sebastian Lapuschkin, Wojciech Samek, Christoph Lampert
BuildingTrust Workshop at ICLR 2025
Paper / Code

Provable Compositional Generalization for Object-Centric Learning
Thaddäus Wiedemer*, Jack Brady*, Alexander Panfilov, Attila Juhos*, Matthias Bethge, Wieland Brendel
ICLR 2024
Paper / Code / Project Page
Acknowledgements
I am grateful to the many colleagues I have worked with, from whom I have learned so much and who have contributed invaluably to my career. I would especially like to acknowledge the mentorship and guidance of Svyatoslav Oreshin, Arip Asadulaev, Roland Zimmermann, Thaddäus Wiedemer, Jack Brady, Wieland Brendel, Valentyn Boreiko, and Matthias Hein.