Moritz Vandenhirtz, MSc.

PhD Student

E-Mail
moritz.vandenhirtz@inf.ethz.ch
Address
Department of Computer Science
CAB G 15.2
Universitätstr. 6
CH – 8092 Zurich, Switzerland
Room
CAB G 15.2

I completed my bachelor's degree in Banking and Finance at the University of Zurich in 2020, where I worked with Prof. Dr. Michael Wolf on additive, high-dimensional models for predicting stock returns. I obtained my master's degree in Statistics at ETH Zurich in 2022, where I acquired a strong background in machine learning and was awarded the Willi Studer Prize. My master's thesis focused on the interpretable discovery and removal of hidden biases. In October 2022, I joined the Medical Data Science lab as a PhD student.

I am excited about exploring techniques that give insight into the decision-making process of machine learning models and leveraging this extracted knowledge for medical problems. At the time of writing, my focus lies on (inherently) interpretable machine learning methods and representation learning. Additionally, I am curious about combining the aforementioned topics with other fields such as adversarial machine learning, anomaly detection, drug discovery, or reinforcement learning.

Abstract

Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs), which first predict high-level concepts from the raw features and then predict the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, thereby affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design, given an annotated validation set. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable and more performant than CBMs.
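
To picture the intervention mechanism described in this abstract, the minimal sketch below overwrites selected predicted concept values with user-supplied ones before the target prediction is made. The class name, the two-layer architecture, and the mask-based interface are illustrative assumptions, not the implementation from the paper.

```python
# Minimal sketch of a concept-bottleneck-style model with user interventions.
# All names and the two-layer architecture are illustrative assumptions.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Step 1: raw features -> concept logits.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # Step 2: (predicted or intervened) concepts -> target logits.
        self.target_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, interventions=None):
        c_hat = torch.sigmoid(self.concept_net(x))
        if interventions is not None:
            # Intervention: replace the selected concepts with ground-truth values.
            mask, c_true = interventions  # boolean mask and true concept values
            c_hat = torch.where(mask, c_true, c_hat)
        return self.target_net(c_hat)

# Usage: overwrite the first concept of every sample in a batch.
model = ConceptBottleneck(in_dim=32, n_concepts=8, n_classes=2)
x = torch.randn(4, 32)
mask = torch.zeros(4, 8, dtype=torch.bool)
mask[:, 0] = True
c_true = torch.ones(4, 8)
logits = model(x, interventions=(mask, c_true))
```

The sketch only captures the basic predict, intervene, predict flow of a CBM; the contribution described above is to enable a comparable intervention on black-box networks using an annotated validation set.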

Authors

Ricards Marcinkevics, Sonia Laguna, Moritz Vandenhirtz, Julia E. Vogt

Submitted

arXiv

Date

24.01.2024

Link

Abstract

We propose Tree Variational Autoencoder (TreeVAE), a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables. The proposed tree-based generative architecture enables lightweight conditional inference and improves generative performance by utilizing specialized leaf decoders. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on a variety of datasets, including real-world imaging data. We show empirically that TreeVAE achieves a more competitive log-likelihood lower bound than its sequential counterparts. Finally, due to its generative nature, TreeVAE is able to generate new samples from the discovered clusters via conditional sampling.
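
As a rough, heavily simplified illustration of the ideas above, the sketch below routes each sample through a soft split to specialized leaf decoders and supports conditional sampling from a chosen leaf. The names (TinyTreeDecoder, sample_from_leaf) are hypothetical, the tree is fixed rather than adaptively grown, and the variational objective is omitted entirely.

```python
# Heavily simplified sketch of hierarchical routing to specialized leaf decoders,
# loosely in the spirit of TreeVAE. The tree is fixed (one split, two leaves),
# all names are hypothetical, and the probabilistic objective is omitted.
import torch
import torch.nn as nn

class TinyTreeDecoder(nn.Module):
    def __init__(self, latent_dim: int, out_dim: int, n_leaves: int = 2):
        super().__init__()
        self.latent_dim = latent_dim
        self.router = nn.Linear(latent_dim, n_leaves)  # soft assignment to leaves
        self.leaf_decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))
            for _ in range(n_leaves)
        ])

    def forward(self, z):
        # Routing probabilities determine how much each leaf decoder contributes.
        probs = torch.softmax(self.router(z), dim=-1)                  # (batch, n_leaves)
        recons = torch.stack([dec(z) for dec in self.leaf_decoders], dim=1)
        return (probs.unsqueeze(-1) * recons).sum(dim=1), probs

    @torch.no_grad()
    def sample_from_leaf(self, leaf: int, n: int):
        # Conditional sampling: draw latents and decode with one chosen leaf only.
        z = torch.randn(n, self.latent_dim)
        return self.leaf_decoders[leaf](z)

decoder = TinyTreeDecoder(latent_dim=16, out_dim=784)
x_hat, routing = decoder(torch.randn(8, 16))
cluster_samples = decoder.sample_from_leaf(leaf=0, n=4)
```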

Authors

Laura Manduchi*, Moritz Vandenhirtz*, Alain Ryser, Julia E. Vogt
* denotes shared first authorship

Submitted

Spotlight at Neural Information Processing Systems, NeurIPS 2023

Date

20.12.2023

Link · Code

Abstract

Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs), which first predict high-level concepts from the raw features and then predict the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, consequently affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable and more performant than CBMs.
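
Complementing the intervention sketch shown for the arXiv version above, one simplified way to read the intervenability notion is as the improvement in the downstream loss when predicted concepts are replaced by ground-truth ones. The sketch below follows that reading; the function name and the exact quantity are assumptions, not the paper's formal definition.

```python
# Simplified sketch: measure how much replacing predicted concepts with
# ground-truth concepts improves the downstream loss. The function and its
# exact form are assumptions, not the paper's formal definition.
import torch
import torch.nn.functional as F

def intervenability_gap(concept_predictor, target_predictor, x, c_true, y):
    c_hat = torch.sigmoid(concept_predictor(x))               # predicted concepts
    loss_plain = F.cross_entropy(target_predictor(c_hat), y)
    loss_intervened = F.cross_entropy(target_predictor(c_true), y)
    # A larger gap means interventions help more, i.e. the model is more intervenable.
    return (loss_plain - loss_intervened).item()
```

Fine-tuning for intervenability can then be pictured as optimising the model so that the intervened loss is low, i.e. so that corrected concepts reliably translate into better predictions; the exact objective used in the paper may differ.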

Authors

Ricards Marcinkevics, Sonia Laguna, Moritz Vandenhirtz, Julia E. Vogt

Submitted

XAI in Action: Past, Present, and Future Applications, NeurIPS 2023

Date

16.12.2023

Link

Abstract

Prototype learning, a popular machine learning method designed for inherently interpretable decisions, leverages similarities to learned prototypes for classifying new data. While it is mainly applied in computer vision, in this work, we build upon prior research and further explore the extension of prototypical networks to natural language processing. We introduce a learned weighted similarity measure that enhances the similarity computation by focusing on informative dimensions of pre-trained sentence embeddings. Additionally, we propose a post-hoc explainability mechanism that extracts prediction-relevant words from both the prototype and input sentences. Finally, we empirically demonstrate that our proposed method not only improves predictive performance on the AG News and RT Polarity datasets over a previous prototype-based approach, but also improves the faithfulness of explanations compared to rationale-based recurrent convolutions.
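
The learned weighted similarity measure mentioned above can be pictured as a per-dimension weighting applied inside a cosine similarity between sentence embeddings and prototypes. The sketch below is a minimal version under that assumption; the module name and the softplus parameterisation are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of a learned, dimension-wise weighted cosine similarity between
# sentence embeddings and prototypes. Module name and softplus parameterisation
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedPrototypeSimilarity(nn.Module):
    def __init__(self, embed_dim: int, n_prototypes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, embed_dim))
        # One learnable weight per embedding dimension, kept positive via softplus.
        self.dim_weights = nn.Parameter(torch.zeros(embed_dim))

    def forward(self, embeddings):
        w = F.softplus(self.dim_weights)          # (embed_dim,)
        e = F.normalize(embeddings * w, dim=-1)   # emphasise informative dimensions
        p = F.normalize(self.prototypes * w, dim=-1)
        return e @ p.t()                          # (batch, n_prototypes) similarity scores

sim = WeightedPrototypeSimilarity(embed_dim=384, n_prototypes=10)
scores = sim(torch.randn(2, 384))  # scores could feed a linear classifier over prototypes
```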

Authors

Claudio Fanconi*, Moritz Vandenhirtz*, Severin Husmann, Julia E. Vogt
* denotes shared first authorship

Submitted

Conference on Empirical Methods in Natural Language Processing, EMNLP 2023

Date

25.10.2023

Link · DOI · Code

Abstract

We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling.

Authors

Laura Manduchi*, Moritz Vandenhirtz*, Alain Ryser, Julia E. Vogt
* denotes shared first authorship

Submitted

ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling

Date

30.06.2023

Link · Code

Abstract

We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling.

Authors

Laura Manduchi*, Moritz Vandenhirtz*, Alain Ryser, Julia E. Vogt
* denotes shared first authorship

Submitted

ICML 2023 Workshop on Deployment Challenges for Generative AI

Date

30.06.2023

Link · Code

Abstract

Spurious correlations are everywhere. While humans often do not perceive them, neural networks are notorious for learning unwanted associations, also known as biases, instead of the underlying decision rule. As a result, practitioners are often unaware of the biased decision-making of their classifiers. Such a biased model based on spurious correlations might not generalize to unobserved data, leading to unintended, adverse consequences. We propose Signal is Harder (SiH), a variational-autoencoder-based method that simultaneously trains a biased and an unbiased classifier using a novel, disentangling reweighting scheme inspired by the focal loss. Using the unbiased classifier, SiH matches or improves upon the performance of state-of-the-art debiasing methods. To improve the interpretability of our technique, we propose a perturbation scheme in the latent space for visualizing the bias, which helps practitioners become aware of the sources of spurious correlations.
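
The focal-loss-inspired reweighting can be sketched as follows: samples that the biased classifier already explains well are down-weighted in the unbiased classifier's loss, so the unbiased classifier concentrates on bias-conflicting samples. The code below is a generic illustration of this pattern and not the exact scheme used in SiH.

```python
# Generic sketch of focal-loss-style reweighting for debiasing: the unbiased
# classifier's loss is up-weighted on samples the biased classifier gets wrong
# (bias-conflicting samples). Illustrative pattern only, not SiH's exact scheme.
import torch
import torch.nn.functional as F

def reweighted_losses(logits_biased, logits_unbiased, targets, gamma: float = 2.0):
    ce_biased = F.cross_entropy(logits_biased, targets, reduction="none")
    ce_unbiased = F.cross_entropy(logits_unbiased, targets, reduction="none")
    # Confidence of the biased classifier on the true class.
    p_biased = torch.softmax(logits_biased, dim=-1).gather(1, targets[:, None]).squeeze(1)
    # Focal-style weight: small where the bias already explains the sample,
    # large on bias-conflicting samples.
    weight = (1.0 - p_biased).detach() ** gamma
    return ce_biased.mean(), (weight * ce_unbiased).mean()
```

In a training loop, the two losses would be optimised jointly, each updating only its own classifier.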

Authors

Moritz Vandenhirtz, Laura Manduchi, Ricards Marcinkevics, Julia E. Vogt

Submitted

Domain Generalization Workshop, ICLR 2023

Date

04.05.2023

Link · Code

Abstract

Objective: To report the outcomes of active surveillance (AS) for low-risk prostate cancer (PCa) in a single-center cohort. Patients and Methods: This is a prospective, single-center, observational study. The cohort included all patients who underwent AS for PCa between December 1999 and December 2020 at our institution. Follow-up appointments (FU) ended in February 2021. Results: A total of 413 men were enrolled in the study, and 391 had at least one FU. Of those who followed up, 267 had PCa diagnosed by transrectal ultrasound (TRUS)-guided biopsy (T1c: 68.3%), while 124 were diagnosed after transurethral resection of the prostate (TURP) (T1a/b: 31.7%). Median FU was 46 months (IQR 25–90). Cancer specific survival was 99.7% and overall survival was 92.3%. Median reclassification time was 11.2 years. After 20 years, 25% of patients were reclassified within 4.58 years, 6.6% opted to switch to watchful waiting, 4.1% died, 17.4% were lost to FU, and 46.8% remained on AS. Those diagnosed by TRUS had a significantly higher reclassification rate than those diagnosed by TURP (p < 0.0001). Men diagnosed by targeted MRI/TRUS fusion biopsy tended to have a higher reclassification probability than those diagnosed by conventional template biopsies (p = 0.083). Conclusions: Our single-center cohort spanning over two decades revealed that AS remains a safe option for low-risk PCa even in the long term. Approximately half of AS enrollees will eventually require definitive treatment due to disease progression. Men with incidental prostate cancer were significantly less likely to have disease progression.

Authors

Sarah Hagmann, Venkat Ramakrishnan, Alexander Tamalunas, Marc Hofmann, Moritz Vandenhirtz, Silvan Vollmer, Jsmea Hug, Philipp Niggli, Antonio Nocito, Rahel A. Kubik-Huch, Kurt Lehmann, Lukas John Hefermehl

Submitted

Cancers

Date

12.01.2022

Link · DOI