Alexander Panfilov

Yo! My name is Sasha and I am a first-year ELLIS / IMPRS-IS PhD student, based in Tübingen. I find myself very lucky to be advised by Jonas Geiping and Maksym Andriushchenko.
Broadly, I am interested in adversarial robustness, AI safety, and ML security. In practical terms, I enjoy finding various ways to break machine learning systems.
Lately, I have been focusing on jailbreaking attacks on LLMs, contemplating: (1) What are the viable threat models for attacks on safety tuning? (2) Are safety jailbreaks truly effective, or are we victims of flawed (LLM-based) evaluations? (3) Are we doomed?
You can find my CV here. I am always open to collaboration — feel free to reach out via email!
News
- November 05, 2024: Presented our work, "Provable Compositional Generalization for Object-Centric Learning", at EPFL (Nicolas Flammarion's group seminar). You can find the slides here.
- October 09, 2024: Our work, "A Realistic Threat Model for Large Language Model Jailbreaks", has been accepted for an oral presentation at the Red Teaming GenAI Workshop at NeurIPS 2024.
- May 01, 2024: Started my PhD at the ELLIS Institute Tübingen / Max Planck Institute for Intelligent Systems. You can find the slides for my IMPRS talk here.
Selected Publications

A Realistic Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko*, Alexander Panfilov*, Václav Voráček, Matthias Hein, Jonas Geiping
Red Teaming GenAI Workshop at NeurIPS, 2024
Paper / Code

Provable Compositional Generalization for Object-Centric Learning
Thaddäus Wiedemer*, Jack Brady*, Alexander Panfilov*, Attila Juhos*, Matthias Bethge, Wieland Brendel
The Twelfth International Conference on Learning Representations (ICLR), 2024
Paper / Code / Project Page

Arip Asadulaev, Vitaly Shutov, Alexander Korotin, Alexander Panfilov, Vladislava Kontsevaya, Andrey Filchenkov
The Second Conference on Lifelong Learning Agents (CoLLAs), 2023
Paper
Acknowledgements
I am grateful to the many colleagues I have worked with, from whom I learned so much and who contributed invaluably to my career. I would especially like to acknowledge the mentorship and guidance of Svyatoslav Oreshin, Arip Asadulaev, Roland Zimmermann, Thaddäus Wiedemer, Jack Brady, and Wieland Brendel.