Dr. Thomas Sutter
Postdoc

- Email: suttetho@inf.ethz.ch
- Address: Department of Computer Science, Universitätstr. 6, CH-8092 Zurich, Switzerland
- Room: CAB G 37.1
I completed my Master's in Electrical Engineering and Information Technology in 2014. After that, I joined the ETH spin-off upicto - which was later sold to Logitech - where I stayed for four years before starting my PhD in 2018.
My research applies machine learning models to the medical domain. I am interested in generative models of multiple data types, and I try to understand how different modalities can be combined to model inference and generation efficiently.
I successfully defended my thesis on June 14, 2023.
Publications
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by first inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on three different challenging experiments: variational clustering, inference of shared and independent generative factors under weak supervision, and multitask learning.
Authors: Thomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt (* denotes shared first authorship)
Submitted: NeurIPS 2023
Date: 12.12.2023
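To make the two-step construction above concrete, here is a minimal, illustrative sketch in PyTorch of the scheme in its hard, non-relaxed form; the function names and the multinomial stand-in for the subset-size distribution are my own simplifications, not the authors' code. The paper's contribution is precisely to make both sampling steps reparameterizable and differentiable.

```python
# Illustrative sketch (not the authors' code) of the two-step partition
# scheme, in its hard, non-relaxed form.
import torch

def sample_partition(order_scores: torch.Tensor, size_logits: torch.Tensor) -> torch.Tensor:
    """Partition n elements into K subsets; returns an (n,) tensor of subset ids."""
    n = order_scores.shape[0]
    # Step 1: sample the number of elements per subset (the sizes sum to n).
    # The multinomial is a stand-in for the paper's learned size distribution.
    probs = torch.softmax(size_logits, dim=0)
    sizes = torch.distributions.Multinomial(n, probs).sample().long()
    # Step 2: sample an element order via Gumbel-perturbed learnable scores.
    u = torch.rand(n).clamp(1e-9, 1 - 1e-9)
    order = torch.argsort(order_scores - torch.log(-torch.log(u)), descending=True)
    # Fill the subsets with consecutive elements of the sampled order.
    assignment = torch.empty(n, dtype=torch.long)
    start = 0
    for k, size in enumerate(sizes.tolist()):
        assignment[order[start:start + size]] = k
        start += size
    return assignment
```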
Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations by exploiting structure in unlabeled data, allowing the model to achieve high accuracy, even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful for limited data scenarios, such as having labels for only 200 patients, which is common in medical applications.
Authors: Ece Özkan Elsen*, Thomas M. Sutter*, Yurong Hu, Sebastian Balzer, Julia E. Vogt (* denotes shared first authorship)
Submitted: GCPR 2023
Date: 01.09.2023
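The construction of an artificial M-mode image is simple to illustrate: fix a scan line in the B-mode video and stack its intensities over time. Below is a small NumPy sketch under assumptions of my own (function name, line parameterization, and resolution), not the paper's exact procedure.

```python
# Sketch: build an artificial M-mode image from a B-mode echo clip by
# sampling a fixed scan line in every frame and stacking over time.
import numpy as np

def extract_mmode(video: np.ndarray, p0, p1, n_samples: int = 128) -> np.ndarray:
    """video: (T, H, W) grayscale clip; p0, p1: (y, x) endpoints of the scan line.
    Returns an (n_samples, T) M-mode image: space along rows, time along columns."""
    t = np.linspace(0.0, 1.0, n_samples)
    ys = np.clip(np.round(p0[0] + t * (p1[0] - p0[0])).astype(int), 0, video.shape[1] - 1)
    xs = np.clip(np.round(p0[1] + t * (p1[1] - p0[1])).astype(int), 0, video.shape[2] - 1)
    return video[:, ys, xs].T

# Sampling several such lines, e.g. rotated around the image center, yields
# the multiple M-mode images that are combined downstream.
```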
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by first inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on two different challenging experiments: variational clustering and inference of shared and independent generative factors under weak supervision.
Authors: Thomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt (* denotes shared first authorship)
Submitted: ICML Workshop on Structured Probabilistic Inference & Generative Modeling
Date: 23.07.2023
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine-learning problems. However, assigning elements to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We propose a novel two-step method for learning distributions over partitions, including a reparametrization trick, to allow the inclusion of partitions in variational inference tasks. Our method works by first inferring the number of elements per subset and then sequentially filling these subsets in an order learned in a second step. We highlight the versatility of our general-purpose approach on two different experiments: multitask learning and unsupervised conditional sampling.
Authors: Thomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt (* denotes shared first authorship)
Submitted: Fifth Symposium on Advances in Approximate Bayesian Inference
Date: 18.07.2023
Partitioning a set of elements into subsets of a priori unknown sizes is essential in many applications. These subset sizes are rarely explicitly learned - be it the cluster sizes in clustering applications or the number of shared versus independent generative latent factors in weakly-supervised learning. Probability distributions over correct combinations of subset sizes are non-differentiable due to hard constraints, which prohibit gradient-based optimization. In this work, we propose the differentiable hypergeometric distribution. The hypergeometric distribution models the probability of different group sizes based on their relative importance. We introduce reparameterizable gradients to learn the importance between groups and highlight the advantage of explicitly learning the size of subsets in two typical applications: weakly-supervised learning and clustering. In both applications, we outperform previous approaches, which rely on suboptimal heuristics to model the unknown size of groups.
Authors: Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt
Submitted: ICLR 2023
Date: 01.05.2023
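For reference, the (Fisher's noncentral) multivariate hypergeometric distribution underlying this work assigns to group sizes $x = (x_1, \dots, x_K)$ with $\sum_i x_i = n$ a probability weighted by group importances $\omega_i$; the notation below is one standard form, not copied from the paper:

$$P(X = x \mid \omega) = \frac{\prod_{i=1}^{K} \binom{m_i}{x_i}\, \omega_i^{x_i}}{\sum_{y \in \mathcal{S}} \prod_{i=1}^{K} \binom{m_i}{y_i}\, \omega_i^{y_i}}, \qquad \mathcal{S} = \Big\{ y \in \mathbb{N}_0^K : y_i \le m_i,\; \textstyle\sum_i y_i = n \Big\}$$

The hard constraint $\sum_i x_i = n$ is what breaks naive gradient-based optimization; the reparameterizable gradients for $\omega$ mentioned above can be obtained, e.g., by sampling the group sizes sequentially from conditional distributions with a Gumbel-softmax relaxation.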
Data scarcity is a fundamental problem since data lies at the heart of any ML project. For most applications, annotation is an expensive task in addition to data collection. Thus, the ability to learn sample-efficiently from limited labeled data is critical for data-limited problems, such as healthcare applications. Self-supervised learning (SSL) can learn meaningful representations by exploiting structure in unlabeled data, which allows the model to achieve high accuracy in various downstream tasks, even with limited annotations. In this work, we extend contrastive learning, an efficient implementation of SSL, to cardiac imaging. We propose to use generated M(otion)-mode images from readily available B(rightness)-mode echocardiograms and design contrastive objectives with structure- and patient-awareness. Experiments on EchoNet-Dynamic show that our proposed model can achieve an AUROC score of 0.85 by simply training a linear head on top of the learned representations, and is insensitive to the reduction of labeled data.
Authors: Yurong Hu, Thomas M. Sutter, Ece Özkan, Julia E. Vogt
Submitted: 1st Workshop on Machine Learning & Global Health (ICLR 2023)
Date: 20.03.2023
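As a concrete anchor for the contrastive objective above, the sketch below shows a SimCLR-style NT-Xent loss in PyTorch, where the positive pair consists of two views (e.g. two M-mode images) of the same patient's echocardiogram. The structure- and patient-aware variants modify how positives are chosen; the temperature here is an arbitrary illustration.

```python
# SimCLR-style NT-Xent loss as a sketch of the contrastive objective;
# z1[i] and z2[i] are embeddings of two views forming the positive pair.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)           # (2B, d)
    sim = z @ z.T / temperature              # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))        # exclude self-similarity
    B = z1.shape[0]
    # row i's positive sits at index i + B (and vice versa)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)
```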
Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), which is used to diagnose cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is both time-consuming and expertise-demanding, raising the need for an automated approach. Earlier automated works have been limited to still images or have used echocardiogram videos with spatio-temporal convolutions in a complex pipeline. In this work, we propose to generate images from readily available echocardiogram videos, each image mimicking an M(otion)-mode image from a different scan line through time. We then combine different M-mode images using off-the-shelf model architectures to estimate the EF and, thus, diagnose cardiomyopathy. Our experiments show that our proposed method converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process.
Authors: Thomas Sutter, Sebastian Balzer, Ece Özkan Elsen, Julia E. Vogt
Submitted: Medical Imaging Meets NeurIPS Workshop 2022
Date: 02.12.2022
Humans naturally integrate various senses to understand our surroundings, enabling us to compensate for partially missing sensory input. In contrast, machine learning models excel at harnessing extensive datasets but face challenges in handling missing data effectively. While utilizing multiple data types provides a more comprehensive perspective, it also raises the likelihood of encountering missing values, underscoring the significance of proper missing data management in machine learning techniques. In this thesis, we advocate for developing machine learning models that emulate the human approach of merging diverse sensory inputs into a unified representation, demonstrating resilience in the face of missing input sources. Generating labels for multiple data types is laborious and often costly, resulting in a scarcity of fully annotated multimodal datasets. On the other hand, multimodal data naturally possesses a form of weak supervision. We understand that these samples describe the same event and assume that certain underlying generative factors are shared among the group members, providing a form of weak guidance. Our thesis focuses on learning from data characterized by weak supervision, delving into the interrelationships among group members. We start by exploring novel techniques for machine learning models capable of processing multimodal inputs while effectively handling missing data. Our emphasis is on variational autoencoders (VAE) for learning from weakly supervised data. We introduce a generalized formulation of probabilistic aggregation functions, designed to overcome the limitations of previous …
Authors: Thomas M. Sutter
Date: 30.09.2022
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by first inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on two different challenging experiments: multitask learning and inference of shared and independent generative factors under weak supervision.
Authors: Thomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt (* denotes shared first authorship)
Submitted: ICML 2023 Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
Date: 17.09.2022
Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
Authors: Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt
Submitted: The Tenth International Conference on Learning Representations, ICLR 2022
Date: 07.04.2022
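To make the limitation concrete: a mixture-based multimodal VAE averages modality-subset ELBOs, so the model is trained on sub-sampled inputs while reconstructing all modalities. In the notation below (mine, not the paper's), $\mathbb{X} = \{x_1, \dots, x_M\}$ and $\mathcal{S}$ is the collection of modality subsets used for sub-sampling:

$$\mathcal{L}_{\text{mix}}(\mathbb{X}) = \frac{1}{|\mathcal{S}|} \sum_{S \in \mathcal{S}} \Big( \mathbb{E}_{q_\phi(z \mid \mathbb{X}_S)}\big[\log p_\theta(\mathbb{X} \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid \mathbb{X}_S) \,\|\, p(z)\big) \Big)$$

Roughly, the reconstruction terms for modalities missing from $\mathbb{X}_S$ cap this objective strictly below $\log p_\theta(\mathbb{X})$ whenever modalities carry modality-specific variation; this gap is the undesirable upper bound referred to above.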
In this work, we propose a new formulation for multimodal VAEs to model and learn the relationship between data types. Despite their recent progress, current multimodal generative methods are based on simplistic assumptions regarding the relation between data types, which leads to a trade-off between coherence and quality of generated samples - even for simple toy datasets. The proposed method learns the relationship between data types instead of relying on pre-defined and limiting assumptions. Based on the principles of variational inference, we change the posterior approximation to explicitly include information about the relation between data types. We show empirically that the simplified assumption of a single shared latent space leads to inferior performance for a dataset with additional pairwise shared information.
Authors: Thomas M. Sutter, Julia E. Vogt
Submitted: Bayesian Deep Learning Workshop at NeurIPS 2021
Date: 14.12.2021
Machine learning has become increasingly popular in the medical domain over the past years. While supervised machine learning has already been applied successfully, the vast amount of unlabelled data offers new opportunities for un- and self-supervised learning methods. Especially given the multimodal nature of most clinical data, the labelling of multiple data types quickly becomes infeasible in the medical domain. However, to the best of our knowledge, multimodal unsupervised methods have been tested extensively on toy datasets only but have never been applied to real-world medical data, for direct applications such as disease classification and image generation. In this article, we demonstrate that self-supervised methods provide promising results on medical data while highlighting that the task is extremely challenging and that there is space for substantial improvements.
Authors: Hendrik J. Klug, Thomas M. Sutter, Julia E. Vogt
Submitted: Medical Imaging with Deep Learning, MIDL 2021
Date: 07.07.2021
Background: Preterm neonates frequently experience hypernatremia (plasma sodium concentrations >145 mmol/l), which is associated with clinical complications, such as intraventricular hemorrhage.
Study design: In this single-center retrospective observational study, the following 7 risk factors for hypernatremia were analyzed in very low gestational age (VLGA, below 32 weeks) neonates: gestational age (GA), delivery mode (DM; vaginal or caesarean section), sex, birth weight, small for GA, multiple birth, and antenatal corticosteroids. Machine learning (ML) approaches were applied to obtain probabilities for hypernatremia.
Results: 824 VLGA neonates were included (median GA 29.4 weeks, median birth weight 1170 g, caesarean section 83%). 38% of neonates experienced hypernatremia. A maximal sodium concentration of 144 mmol/l (interquartile range 142–147) was observed 52 hours (41–65) after birth. ML identified vaginal delivery and GA as key risk factors for hypernatremia. The risk of hypernatremia increased with lower GA, from 22% for GA ≥ 31–32 weeks to 46% for GA < 31 weeks and 60% for GA < 27 weeks. A linear relationship between maximal sodium concentrations and GA was found, with decreases of 0.29 mmol/l per additional week of GA in neonates with vaginal delivery and 0.49 mmol/l per week after caesarean section. Sex, multiple birth, and antenatal corticosteroids were not associated with hypernatremia.
Conclusion: VLGA neonates with vaginal delivery and low GA have the highest risk of hypernatremia. Early identification of neonates at risk and early intervention may prevent extreme sodium excursions and associated clinical complications.
Authors: Nadia S. Eugster, Florence Corminboeuf, Gilbert Koch, Julia E. Vogt, Thomas Sutter, Tamara van Donge, Marc Pfister, Roland Gerull
Submitted: Klinische Pädiatrie
Date: 07.06.2021
Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.
Authors: Thomas M. Sutter*, Imant Daunhawer*, Julia E. Vogt (* denotes shared first authorship)
Submitted: Ninth International Conference on Learning Representations, ICLR 2021
Date: 04.05.2021
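At its core, the generalized formulation above is an aggregation rule: a Mixture of Products of Experts (MoPoE) that averages product-of-experts posteriors over modality subsets, recovering a PoE (one subset containing all modalities) and an MoE (only singleton subsets) as special cases. Below is a minimal PyTorch sketch of that aggregation for Gaussian experts; the function names and the uniform mixture over subsets are my simplifications.

```python
# Sketch of MoPoE-style aggregation for Gaussian experts.
import itertools
import torch

def poe(mus, logvars):
    """Product of Gaussian experts, including a standard-normal prior expert."""
    precs = [torch.ones_like(mus[0])] + [torch.exp(-lv) for lv in logvars]
    means = [torch.zeros_like(mus[0])] + list(mus)
    total_prec = sum(precs)
    mu = sum(p * m for p, m in zip(precs, means)) / total_prec
    return mu, -torch.log(total_prec)  # joint mean and log-variance

def sample_mopoe(mus, logvars):
    """Sample the mixture over all non-empty modality subsets:
    pick a subset uniformly, then reparameterize its PoE posterior."""
    idx = range(len(mus))
    subsets = [s for r in idx for s in itertools.combinations(idx, r + 1)]
    s = subsets[torch.randint(len(subsets), (1,)).item()]
    mu, logvar = poe([mus[i] for i in s], [logvars[i] for i in s])
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
```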
Rationale: Tuberculosis diagnosis in children remains challenging. Microbiological confirmation of tuberculosis disease is often lacking, and standard immunodiagnostics, including the tuberculin skin test and the interferon-gamma release assay for tuberculosis infection, have limited sensitivity.
Objective: To identify optimal antigen–cytokine combinations using novel Mycobacterium tuberculosis antigens and cytokine read-outs by machine learning algorithms to improve immunodiagnostic assays for tuberculosis.
Methods: A total of 80 children undergoing investigation of tuberculosis were included (15 confirmed tuberculosis disease, five unconfirmed tuberculosis disease, 28 tuberculosis infection and 32 unlikely tuberculosis). Whole blood was stimulated with 10 novel Mycobacterium tuberculosis antigens and a fusion protein of early secretory antigenic target (ESAT)-6 and culture filtrate protein (CFP) 10. Cytokines were measured using xMAP multiplex assays. Machine learning algorithms defined a discriminative classifier with performance measured using area under the receiver operating characteristic curve.
Measurements and main results: We found the following four antigen–cytokine pairs had a higher weight in the discriminative classifier compared to the standard ESAT-6/CFP-10-induced interferon-gamma: Rv2346/47c- and Rv3614/15c-induced interferon-gamma inducible protein-10; Rv2031c-induced granulocyte-macrophage colony-stimulating factor and ESAT-6/CFP-10-induced tumor necrosis factor-alpha. A combination of the 10 best antigen–cytokine pairs resulted in an area under the curve of 0.92 ± 0.04.
Conclusion: We exploited the use of machine learning algorithms as a key tool to evaluate large immunological datasets. This identified several antigen–cytokine pairs with the potential to improve immunodiagnostic tests for tuberculosis in children.
Authors: Noemi Rebecca Meier, Thomas M. Sutter, Marc Jacobsen, Tom H. M. Ottenhoff, Julia E. Vogt, Nicole Ritz
Submitted: Frontiers in Cellular and Infection Microbiology
Date: 08.01.2021
Unplanned hospital readmissions are a burden to patients and increase healthcare costs. A wide variety of machine learning (ML) models have been suggested to predict unplanned hospital readmissions. These ML models were often specifically trained on patient populations with certain diseases. However, it is unclear whether these specialized ML models—trained on patient subpopulations with certain diseases or defined by other clinical characteristics—are more accurate than a general ML model trained on an unrestricted hospital cohort. In this study based on an electronic health record cohort of consecutive inpatient cases of a single tertiary care center, we demonstrate that accurate prediction of hospital readmissions may be obtained by general, disease-independent, ML models. This general approach may substantially decrease the cost of development and deployment of respective ML models in daily clinical routine, as all predictions are obtained by the use of a single model.
Authors: Thomas Sutter, Jan A. Roth, Kieran Chin-Cheong, Balthasar L. Hug, Julia E. Vogt
Submitted: Journal of the American Medical Informatics Association
Date: 18.12.2020
Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.
Authors: Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt
Submitted: NeurIPS 2020
Date: 22.10.2020
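The Jensen-Shannon divergence for multiple distributions used here generalizes the familiar two-distribution case: with weights $\pi$ and the mixture $\tilde{p} = \sum_j \pi_j p_j$ acting as the dynamic prior, one standard form (notation mine, not the paper's) is

$$\mathrm{JS}_{\boldsymbol{\pi}}\big(p_1, \dots, p_{M+1}\big) = \sum_{m=1}^{M+1} \pi_m \, \mathrm{KL}\big(p_m \,\|\, \tilde{p}\big), \qquad \tilde{p} = \sum_{j=1}^{M+1} \pi_j p_j.$$

In the mmJSD objective described above, the $p_m$ are the unimodal posterior approximations $q_\phi(z \mid x_m)$ together with the prior $p(z)$, and this divergence takes the place of the usual KL term of the ELBO.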
Multimodal generative models learn a joint distribution over multiple modalities and thus have the potential to learn richer representations than unimodal models. However, current approaches are either inefficient in dealing with more than two modalities or fail to capture both modality-specific and shared variations. We introduce a new multimodal generative model that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently. Our method partitions the latent space into disjoint subspaces for modality-specific and shared factors and learns to disentangle these in a purely self-supervised manner. In extensive experiments, we show improvements in representation learning and generative performance compared to previous methods and showcase the disentanglement capabilities.
Authors: Imant Daunhawer, Thomas M. Sutter, Ricards Marcinkevics, Julia E. Vogt
Submitted: GCPR 2020
Date: 10.09.2020
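A minimal sketch of the latent-space partitioning described above: each modality encoder outputs two Gaussian posteriors, one over a modality-specific subspace and one over the shared subspace. Class and attribute names are illustrative, not the authors' code.

```python
# Sketch of a per-modality encoder with a partitioned latent space.
import torch.nn as nn

class PartitionedEncoder(nn.Module):
    def __init__(self, in_dim: int, private_dim: int, shared_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.private_head = nn.Linear(256, 2 * private_dim)  # mu, logvar
        self.shared_head = nn.Linear(256, 2 * shared_dim)    # mu, logvar

    def forward(self, x):
        h = self.backbone(x)
        private = self.private_head(h).chunk(2, dim=-1)
        shared = self.shared_head(h).chunk(2, dim=-1)
        return private, shared  # two (mu, logvar) pairs
```

The shared posteriors of the present modalities can then be fused across any subset (e.g. via a product of experts), while each private code is inferred from its own modality alone.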
Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for, e.g., classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous (mixed) data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. For these reasons, using EHR data in practice presents a real challenge. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets for downstream classification tasks. We further explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets, which provide rigorous privacy guarantees and are therefore shareable and usable in the real world. The performance (measured by AUROC, AUPRC and accuracy) of our model's synthetic, heterogeneous data is very close to the original data set (within 3–5% of the baseline) for the non-DP model when tested in a binary classification task. Using strong (ε = 1, δ = 10⁻⁵) DP, our model still produces data useful for machine learning tasks, albeit incurring a roughly 17% performance penalty in our tested classification task. We additionally perform a sub-population analysis and find that our model does not introduce any bias into the synthetic EHR data compared to the baseline in either male/female populations, or the 0–18, 19–50 and 51+ age groups in terms of classification performance for either the non-DP or DP variant.
Authors: Kieran Chin-Cheong, Thomas M. Sutter, Julia E. Vogt
Submitted: arXiv
Date: 07.06.2020
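Differential-privacy-preserving optimization of the kind referenced above is typically realized with DP-SGD: clip each per-sample gradient, then add calibrated Gaussian noise before the parameter update. The sketch below shows the core idea in plain PyTorch; it is deliberately naive (per-sample loop, no privacy accountant), the parameter values are placeholders, and computing the actual (ε, δ) budget requires a privacy accountant.

```python
# Naive DP-SGD sketch: per-sample gradient clipping + Gaussian noise.
import torch

def dp_sgd_step(model, loss_fn, batch, clip_norm=1.0, noise_mult=1.1, lr=1e-3):
    grads = [torch.zeros_like(p) for p in model.parameters()]
    for x in batch:
        model.zero_grad()
        loss_fn(model, x).backward()
        # Clip the per-sample gradient to at most clip_norm in L2 norm.
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
        scale = min(1.0, float(clip_norm / (norm + 1e-6)))
        for g, p in zip(grads, model.parameters()):
            g.add_(p.grad, alpha=scale)
    with torch.no_grad():
        for g, p in zip(grads, model.parameters()):
            # Add noise calibrated to the clipping norm, then average and step.
            g.add_(torch.randn_like(g), alpha=noise_mult * clip_norm)
            p.add_(g, alpha=-lr / len(batch))
```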
Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. Existing generative models that try to approximate a multimodal ELBO rely on difficult training schemes to handle the intermodality dependencies, as well as the approximation of the joint representation in case of missing data. In this work, we propose an ELBO for multimodal data which learns the unimodal and joint multimodal posterior approximation functions directly via a dynamic prior. We show that this ELBO is directly derived from a variational inference setting for multiple data types, resulting in a divergence term which is the Jensen-Shannon divergence for multiple distributions. We compare the proposed multimodal JS-divergence (mmJSD) model to state-of-the-art methods and show promising results in unsupervised, generative learning with a multimodal VAE on two different datasets.
Authors: Thomas Sutter, Imant Daunhawer, Julia E. Vogt
Submitted: Visually Grounded Interaction and Language Workshop, NeurIPS 2019
Date: 12.12.2019
Multimodal generative models learn a joint distribution of data from different modalities, a task which arguably benefits from the disentanglement of modality-specific and modality-invariant information. We propose a factorized latent variable model that learns such a disentanglement on multimodal data without additional supervision. We demonstrate the disentanglement capabilities on simulated data, and show that disentangled representations can improve the conditional generation of missing modalities without sacrificing unconditional generation.
Authors: Imant Daunhawer, Thomas Sutter, Julia E. Vogt
Submitted: Bayesian Deep Learning Workshop, NeurIPS 2019
Date: 12.12.2019
Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for, e.g., classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets. We further explore applying differential privacy (DP) preserving optimization in order to produce differentially private synthetic EHR data sets, which provide rigorous privacy guarantees and are therefore more easily shareable. The performance of our model's synthetic, heterogeneous data is very close to the original data set (within 4.5%) for the non-DP model. Although around 20% worse, the DP synthetic data is still usable for machine learning tasks.
Authors: Kieran Chin-Cheong, Thomas Sutter, Julia E. Vogt
Submitted: Machine Learning for Health (ML4H) Workshop, NeurIPS 2019
Date: 12.12.2019