Emanuele Palumbo, MSc.
PhD Student
Email: emanuele.palumbo@inf.ethz.ch
Address:
Department of Computer Science
Universitätstr. 6
CH – 8092 Zurich, Switzerland
Room: CAB G 15.2
I completed my Bachelor’s Degree in Computer and Automation Engineering at Università Politecnica delle Marche (UNIVPM). Afterwards, I obtained a Master’s Degree in Data Science at ETH Zürich, with a thesis on efficient deep generative learning with a large number of data modalities. In May 2022, I joined the Medical Data Science group as a Doctoral Fellow of the ETH AI Center, co-mentored by Prof. Dr. Andrea Burden.
I am interested in multimodal learning and its applications in the medical domain. In particular, I conduct research on generative approaches that can efficiently handle a large number of heterogeneous modalities, with the downstream goal of applying such models to real-world medical data.
Publications
The structure of many real-world datasets is intrinsically hierarchical, making the modeling of such hierarchies a critical objective in both unsupervised and supervised machine learning. Recently, novel approaches for hierarchical clustering with deep architectures have been proposed. In this work, we take a critical perspective on this line of research and demonstrate that many approaches exhibit major limitations when applied to realistic datasets, partly due to their high computational complexity. In particular, we show that a lightweight procedure implemented on top of pre-trained non-hierarchical clustering models outperforms models designed specifically for hierarchical clustering. Our proposed approach is computationally efficient and applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our findings, we illustrate how our method can also be applied in a supervised setup, recovering meaningful hierarchies from a pre-trained ImageNet classifier.
Authors: Emanuele Palumbo, Moritz Vandenhirtz, Alain Ryser, Imant Daunhawer†, Julia E. Vogt† († denotes shared last authorship)
Submitted: Preprint
Date: 10.10.2024
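As a rough illustration of what such a lightweight post-hoc procedure can look like, the sketch below builds a hierarchy from the logits of a pre-trained flat clustering model: cluster prototypes are formed from average soft assignments and then agglomerated into a dendrogram. The prototype construction, the average-linkage choice, and the name hierarchy_from_logits are illustrative assumptions, not the exact procedure from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

def hierarchy_from_logits(logits: np.ndarray):
    """Build a cluster hierarchy from the (n_samples, n_clusters) logits
    of a pre-trained flat clustering model, without any fine-tuning."""
    # Soft assignments via a softmax over clusters.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Hard assignment of each sample to its most likely cluster.
    hard = probs.argmax(axis=1)

    # One prototype per cluster: the mean soft-assignment profile of its members.
    n_clusters = logits.shape[1]
    prototypes = np.stack([
        probs[hard == k].mean(axis=0) if np.any(hard == k) else np.eye(n_clusters)[k]
        for k in range(n_clusters)
    ])

    # Agglomerate prototypes into a dendrogram (average linkage on Euclidean distances).
    return linkage(pdist(prototypes), method="average")

# Usage: logits could come from any pre-trained clustering model or classifier.
rng = np.random.default_rng(0)
Z = hierarchy_from_logits(rng.normal(size=(1000, 10)))
tree = dendrogram(Z, no_plot=True)  # inspect the recovered hierarchy
```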
Multimodal VAEs have recently received significant attention as generative models for weakly-supervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favourably compares to alternative clustering approaches in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for our method to have a priori knowledge of the true number of clusters, mitigating a critical limitation of previous clustering frameworks.
Authors: Emanuele Palumbo, Sonia Laguna, Daphné Chopard, Julia E. Vogt
Submitted: ICML 2023 Workshop on Structured Probabilistic Inference/Generative Modeling
Date: 23.06.2023
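To make the idea of a cluster-aware latent space concrete, here is a minimal sketch of a diagonal Gaussian-mixture prior over a shared latent, from which cluster assignments follow as posterior responsibilities. All names, shapes, and the random inputs are placeholder assumptions; the model in the paper differs in its architecture and training objective.

```python
import numpy as np

def mixture_prior_logpdf(z, means, log_vars, log_weights):
    """Log-density of a diagonal Gaussian-mixture prior p(z) = sum_k pi_k N(z; mu_k, sigma_k^2).

    z: (n, d) latent codes; means/log_vars: (K, d); log_weights: (K,).
    Returns (log p(z), responsibilities p(k | z)); the latter give cluster assignments."""
    diff = z[:, None, :] - means[None, :, :]                                   # (n, K, d)
    log_comp = -0.5 * (np.log(2 * np.pi) + log_vars[None]
                       + diff ** 2 / np.exp(log_vars)[None]).sum(-1)           # log N(z; mu_k, sigma_k^2)
    log_joint = log_weights[None, :] + log_comp                                # log pi_k + log N(...)
    log_pz = np.logaddexp.reduce(log_joint, axis=1)                            # log-sum-exp over components
    resp = np.exp(log_joint - log_pz[:, None])                                 # posterior responsibilities p(k | z)
    return log_pz, resp

# Usage: latent codes inferred by the multimodal encoder would replace this random draw.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
means = rng.normal(size=(5, 16))
log_pz, resp = mixture_prior_logpdf(z, means, np.zeros((5, 16)), np.log(np.full(5, 0.2)))
clusters = resp.argmax(axis=1)  # hard cluster assignment per sample
```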
Multimodal VAEs have recently received significant attention as generative models for weakly-supervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favourably compares to alternative clustering approaches in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for our method to have a priori knowledge of the true number of clusters, mitigating a critical limitation of previous clustering frameworks.
Authors: Emanuele Palumbo, Sonia Laguna, Daphné Chopard, Julia E. Vogt
Submitted: ICML 2023 Workshop DeployableGenerativeAI
Date: 23.06.2023
Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
Authors: Imant Daunhawer, Alice Bizeul, Emanuele Palumbo, Alexander Marx, Julia E. Vogt
Submitted: The Eleventh International Conference on Learning Representations, ICLR 2023
Date: 23.03.2023
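For reference, below is a minimal sketch of the symmetric InfoNCE objective that underlies this kind of multimodal contrastive learning, assuming already-computed embeddings from two modality-specific encoders; the temperature, dimensions, and random inputs are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss for a batch of paired embeddings from two modalities.

    z_a, z_b: (batch, dim) embeddings of corresponding samples (e.g. image and text).
    Positive pairs lie on the diagonal of the similarity matrix; all other pairs act as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)    # matching indices are the positives
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with placeholder embeddings (any pair of modality-specific encoders would produce these):
batch, dim = 32, 128
loss = symmetric_info_nce(torch.randn(batch, dim), torch.randn(batch, dim))
```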
Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular, mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality, while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces, proposing an objective that overcomes certain dependencies on hyperparameters that arise for existing approaches with the same latent space structure. Compared to these existing approaches, we show increased robustness with respect to changes in the design of the latent space, in terms of the capacity allocated to modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
Authors: Emanuele Palumbo, Imant Daunhawer, Julia E. Vogt
Submitted: The Eleventh International Conference on Learning Representations, ICLR 2023
Date: 02.03.2023
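A toy sketch of the latent-space structure described above: the latent is partitioned into a shared subspace and a modality-specific subspace, and cross-modal generation reuses the shared part inferred from the source modality while sampling the target-specific part from the prior. The dimensions, names, and standard-Gaussian prior are assumptions for illustration, not the paper's exact design.

```python
import torch

# Illustrative latent sizes; the split into shared and modality-specific
# subspaces is the point, the exact dimensions are arbitrary.
SHARED_DIM, SPECIFIC_DIM = 32, 16

def split_latent(z: torch.Tensor):
    """Partition a latent code into a shared part and a modality-specific part."""
    return z[..., :SHARED_DIM], z[..., SHARED_DIM:]

def cross_modal_latent(z_source: torch.Tensor) -> torch.Tensor:
    """Build the latent for generating a target modality from a source modality:
    reuse the shared subspace inferred from the source, and sample the target's
    modality-specific subspace from the prior (here a standard Gaussian)."""
    z_shared, _ = split_latent(z_source)
    z_specific_target = torch.randn(*z_shared.shape[:-1], SPECIFIC_DIM)
    return torch.cat([z_shared, z_specific_target], dim=-1)

# Usage: a latent inferred by one modality's encoder would replace this random code.
z_from_image_encoder = torch.randn(4, SHARED_DIM + SPECIFIC_DIM)
z_for_text_decoder = cross_modal_latent(z_from_image_encoder)
```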
Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the trade-offs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied to more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
Authors: Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt
Submitted: The Tenth International Conference on Learning Representations, ICLR 2022
Date: 07.04.2022
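One way to see why modality sub-sampling is restrictive (a sketch of the standard ELBO identity, not the paper's exact bound): each sub-sampled term of a mixture-based objective is itself an evidence lower bound on the full set of modalities $X$, computed with an encoder that only sees the subset $x_S$,

\[
\mathbb{E}_{q_\phi(z \mid x_S)}\big[\log p_\theta(X \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x_S)\,\|\,p(z)\big)
= \log p_\theta(X) - \mathrm{KL}\big(q_\phi(z \mid x_S)\,\|\,p_\theta(z \mid X)\big) \le \log p_\theta(X).
\]

Because $q_\phi(z \mid x_S)$ cannot depend on the left-out modalities, it generally cannot match the true posterior $p_\theta(z \mid X)$, so the KL gap stays strictly positive whenever those modalities carry information about $z$; averaging such terms over subsets therefore keeps the objective bounded away from $\log p_\theta(X)$.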