Alexander Panfilov

Google Scholar

Yo! My name is Sasha and I am a first-year ELLIS / IMPRS-IS PhD student, based in Tübingen. I find myself very lucky to be advised by Jonas Geiping and Maksym Andriushchenko.

Broadly, I am interested in adversarial robustness, AI safety, and ML security. In practical terms, I enjoy finding various ways to break machine learning systems.

Lately, I have been focusing on jailbreaking attacks on LLMs, contemplating: (1) What are the viable threat models for attacks on safety tuning? (2) Are safety jailbreaks truly effective, or are we victims of flawed (LLM-based) evaluations? (3) Are we doomed?

You can find my CV here. I am always open to collaboration — feel free to reach out via email!

News

  • November 05, 2024: Presented our work, "Provable Compositional Generalization for Object-Centric Learning" at EPFL (Nicolas Flammarion's group seminar). You can find the slides here.
  • October 09, 2024: Our work, "A Realistic Threat Model for Large Language Model Jailbreaks", has been accepted for an oral presentation at the Red Teaming GenAI Workshop at NeurIPS 2024.
  • May 01, 2024: Started my PhD at the ELLIS Institute Tübingen / Max Planck Institute for Intelligent Systems. You can find the slides for my IMPRS talk here.

Selected Publications

A Realistic Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko*, Alexander Panfilov*, Václav Voráček, Matthias Hein, Jonas Geiping
Red Teaming GenAI Workshop at NeurIPS, 2024
Paper / Code
Provable Compositional Generalization for Object-Centric Learning
Thaddäus Wiedemer*, Jack Brady*, Alexander Panfilov*, Attila Juhos*, Matthias Bethge, Wieland Brendel
The Twelfth International Conference on Learning Representations (ICLR), 2024
Paper / Code / Project Page
A Minimalist Approach for Domain Adaptation with Optimal Transport
Arip Asadulaev, Vitaly Shutov, Alexander Korotin, Alexander Panfilov, Vladislava Kontsevaya, Andrey Filchenkov
The Second Conference on Lifelong Learning Agents (CoLLAs), 2023
Paper

Acknowledgements

I am grateful to the many colleagues I have worked with, from whom I learned so much and who contributed invaluably to my career. I would like to especially acknowledge the mentorship and guidance of Svyatoslav Oreshin, Arip Asadulaev, Roland Zimmermann, Thaddäus Wiedemer, Jack Brady, and Wieland Brendel.