Prof. Dr.

Julia Vogt

Group Leader

E-Mail
julia.vogt@inf.ethz.ch
Phone
+41 44 633 8714
Address
Department of Computer Science
CAB G 16.2
Universitätstr. 6
CH – 8092 Zurich, Switzerland
Room
CAB G 16.2

Julia Vogt is an assistant professor in Computer Science at ETH Zurich, where she leads the Medical Data Science Group. The focus of her research is on linking computer science with medicine, with the ultimate aim of personalized patient treatment. She has studied mathematics both in Konstanz and in Sydney and earned her Ph.D. in computer science at the University of Basel. She was a postdoctoral research fellow at the Memorial Sloan-Kettering Cancer Center in NYC and with the Bioinformatics and Information Mining group at the University of Konstanz. In 2018, she joined the University of Basel as an assistant professor. In May 2019, she and her lab moved to Zurich where she joined the Computer Science Department of ETH Zurich.

Abstract

Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. Previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed. For predicting the diagnosis, the extended multiview CBM attained an AUROC of 0.80 and an AUPR of 0.92, performing comparably to similar black-box neural networks trained and tested on the same dataset.

Authors

Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ugne Klimiene, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Özkan Elsen, Christian Knorr, Julia E. Vogt

Submitted

Medical Image Analysis

Date

01.01.2024

LinkDOICode

Abstract

We propose Tree Variational Autoencoder (TreeVAE), a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables. The proposed tree-based generative architecture enables lightweight conditional inference and improves generative performance by utilizing specialized leaf decoders. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on a variety of datasets, including real-world imaging data. We present empirically that TreeVAE provides a more competitive log-likelihood lower bound than the sequential counterparts. Finally, due to its generative nature, TreeVAE is able to generate new samples from the discovered clusters via conditional sampling.

Authors

Laura Manduchi, Moritz Vandenhirtz, Alain Ryser, Julia E. Vogt

Submitted

Spotlight at Neural Information Processing Systems, NeurIPS 2023

Date

17.12.2023

LinkCode

Abstract

Recently, interpretable machine learning has re-explored concept bottleneck models (CBM), comprising step-by-step prediction of the high-level concepts from the raw features and the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, consequently affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable and more performant than CBMs.

Authors

Ricards Marcinkevics, Sonia Laguna, Moritz Vandenhirtz, Julia E. Vogt

Submitted

XAI in Action: Past, Present, and Future Applications, NeurIPS 2023

Date

16.12.2023

Link

Abstract

Prototype learning, a popular machine learning method designed for inherently interpretable decisions, leverages similarities to learned prototypes for classifying new data. While it is mainly applied in computer vision, in this work, we build upon prior research and further explore the extension of prototypical networks to natural language processing. We introduce a learned weighted similarity measure that enhances the similarity computation by focusing on informative dimensions of pre-trained sentence embeddings. Additionally, we propose a post-hoc explainability mechanism that extracts prediction-relevant words from both the prototype and input sentences. Finally, we empirically demonstrate that our proposed method not only improves predictive performance on the AG News and RT Polarity datasets over a previous prototype-based approach, but also improves the faithfulness of explanations compared to rationale-based recurrent convolutions.

Authors

Claudio Fanconi, Moritz Vandenhirtz, Severin Husmann, Julia E. Vogt

Submitted

Conference on Empirical Methods in Natural Language Processing, EMNLP 2023

Date

07.12.2023

LinkCode

Abstract

Chronic obstructive pulmonary disease (COPD) is a significant public health issue, affecting more than 100 million people worldwide. Remote patient monitoring has shown great promise in the efficient management of patients with chronic diseases. This work presents the analysis of the data from a monitoring system developed to track COPD symptoms alongside patients’ self-reports. In particular, we investigate the assessment of COPD severity using multisensory home-monitoring device data acquired from 30 patients over a period of three months. We describe a comprehensive data pre-processing and feature engineering pipeline for multimodal data from the remote home-monitoring of COPD patients. We develop and validate predictive models forecasting i) the absolute and ii) differenced COPD Assessment Test (CAT) scores based on the multisensory data. The best obtained models achieve Pearson’s correlation coefficient of 0.93 and 0.37 for absolute and differenced CAT scores. In addition, we investigate the importance of individual sensor modalities for predicting CAT scores using group sparse regularization techniques. Our results suggest that feature groups indicative of the patient’s general condition, such as static medical and physiological information, date, spirometer, and air quality, are crucial for predicting the absolute CAT score. For predicting changes in CAT scores, sleep and physical activity features are most important, alongside the previous CAT score value. Our analysis demonstrates the potential of remote patient monitoring for COPD management and investigates which sensor modalities are most indicative of COPD severity as assessed by the CAT score. Our findings contribute to the development of effective and data-driven COPD management strategies.

Authors

Zixuan Xiao, Michal Muszynski, Ricards Marcinkevics, Lukas Zimmerli, Adam D. Ivankay, Dario Kohlbrenner, Manuel Kuhn, Yves Nordmann, Ulrich Muehlner, Christian Clarenbach, Julia E. Vogt, Thomas Brunschwiler

Submitted

25th ACM International Conference on Multimodal Interaction, ICMI'23

Date

09.10.2023

LinkDOI

Abstract

Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations from exploiting structures in unlabeled data allowing the model to achieve high accuracy, even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful for limited data scenarios, such as having labels for only 200 patients, which is common in medical applications.

Authors

Ece Özkan Elsen and Thomas M. Sutter, Yurong Hu, Sebastian Balzer, Julia E. Vogt

Submitted

GCPR 2023

Date

01.09.2023

LinkCode

Abstract

We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling.

Authors

Laura Manduchi, Moritz Vandenhirtz, Alain Ryser, Julia E. Vogt

Submitted

ICML 2023 Workshop on Deployment Challenges for Generative AI

Date

29.07.2023

LinkCode

Abstract

We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling.

Authors

Laura Manduchi, Moritz Vandenhirtz, Alain Ryser, Julia E. Vogt

Submitted

ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling

Date

29.07.2023

LinkCode

Abstract

Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. With recent advances in machine learning, data-driven decision support could help clinicians diagnose and manage patients while reducing the number of non-critical surgeries. However, previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored the use of abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. To this end, our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts that are understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed.

Authors

Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ugne Klimiene, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Özkan Elsen, Christian Knorr, Julia E. Vogt

Submitted

Workshop on Machine Learning for Multimodal Healthcare Data, Co-located with ICML 2023

Date

29.07.2023

Abstract

Abstract Ante-hoc interpretability has become the holy grail of explainable artificial intelligence for high-stakes domains such as healthcare; however, this notion is elusive, lacks a widely-accepted definition and depends on the operational context. It can refer to predictive models whose structure adheres to domain-specific constraints, or ones that are inherently transparent. The latter conceptualisation assumes observers who judge this quality, whereas the former presupposes them to have technical and domain expertise (thus alienating other groups of explainees). Additionally, the distinction between ante-hoc interpretability and the less desirable post-hoc explainability, which refers to methods that construct a separate explanatory model, is vague given that transparent predictive models may still require (post-)processing to yield suitable explanatory insights. Ante-hoc interpretability is thus an overloaded concept that comprises a range of implicit properties, which we unpack in this paper to better understand what is needed for its safe deployment across high-stakes domains. To this end, we outline modelling and explaining desiderata that allow us to navigate its distinct realisations in view of the envisaged application and audience.

Authors

Kacper Sokol, Julia E. Vogt

Submitted

Workshop on Interpretable ML in Healthcare at 2023 International Conference on Machine Learning (ICML)

Date

28.07.2023

LinkDOI

Authors

Alexander Immer, Christoph Schultheiss, Julia E. Vogt, Bernhard Schölkopf, Peter Bühlmann, Alexander Marx

Submitted

Proceedings of the 40th International Conference on Machine Learning, ICML 2023

Date

04.07.2023

LinkCode

Abstract

Multimodal VAEs have recently received significant attention as generative models for weakly-supervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favourably compares to alternative clustering approaches, in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for our method to have a priori knowledge of the true number of clusters, mitigating a critical limitation of previous clustering frameworks.

Authors

Emanuele Palumbo, Sonia Laguna, Daphné Chopard, Julia E Vogt

Submitted

ICML 2023 Workshop on Structured Probabilistic Inference/Generative Modeling

Date

23.06.2023

Link

Abstract

Multimodal VAEs have recently received significant attention as generative models for weaklysupervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favorably compares to alternative clustering approaches, in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for to have a priori knowledge of the true number of clusters, mitigating a critical limitation previous clustering frameworks.

Authors

Emanuele Palumbo, Sonia Laguna, Daphné Chopard, Julia E Vogt

Submitted

ICML 2023 Workshop DeployableGenerativeAI

Date

23.06.2023

Link

Authors

Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx

Submitted

Arxiv

Date

19.06.2023

LinkCode

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces B and T cell responses, contributing to virus neutralization. In a cohort of 2,911 young adults, we identified 65 individuals who had an asymptomatic or mildly symptomatic SARS-CoV-2 infection and characterized their humoral and T cell responses to the Spike (S), Nucleocapsid (N) and Membrane (M) proteins. We found that previous infection induced CD4 T cells that vigorously responded to pools of peptides derived from the S and N proteins. By using statistical and machine learning models, we observed that the T cell response highly correlated with a compound titer of antibodies against the Receptor Binding Domain (RBD), S and N. However, while serum antibodies decayed over time, the cellular phenotype of these individuals remained stable over four months. Our computational analysis demonstrates that in young adults, asymptomatic and paucisymptomatic SARS-CoV-2 infections can induce robust and long-lasting CD4 T cell responses that exhibit slower decays than antibody titers. These observations imply that next-generation COVID-19 vaccines should be designed to induce stronger cellular responses to sustain the generation of potent neutralizing antibodies.

Authors

Ricards Marcinkevics, Pamuditha N. Silva, Anna-Katharina Hankele, Charlyn Dörnte, Sarah Kadelka, Katharina Csik, Svenja Godbersen, Algera Goga, Lynn Hasenöhrl, Pascale Hirschi, Hasan Kabakci, Mary P. LaPierre, Johanna Mayrhofer, Alexandra C. Title, Xuan Shu, Nouell Baiioud, Sandra Bernal, Laura Dassisti, Mara D. Saenz-de-Juano, Meret Schmidhauser, Giulia Silvestrelli, Simon Z. Ulbrich, Thea J. Ulbrich, Tamara Wyss, Daniel J. Stekhoven, Faisal S. Al-Quaddoomi, Shuqing Yu, Mascha Binder, Christoph Schultheiβ, Claudia Zindel, Christoph Kolling, Jörg Goldhahn, Bahram Kasmapour Seighalani, Polina Zjablovskaja, Frank Hardung, Marc Schuster, Anne Richter, Yi-Ju Huang, Gereon Lauer, Herrad Baurmann, Jun Siong Low, Daniela Vaqueirinho, Sandra Jovic, Luca Piccoli, Sandra Ciesek, Julia E. Vogt, Federica Sallusto, Markus Stoffel, Susanne E. Ulbrich

Submitted

Frontiers in Immunology

Date

29.05.2023

LinkDOICode

Abstract

Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on three different challenging experiments: variational clustering, inference of shared and independent generative factors under weak supervision, and multitask learning.

Authors

Thomas M. Sutter and Alain Ryser, Joram Liebeskind, Julia E. Vogt

Submitted

arxiv

Date

26.05.2023

Link

Abstract

Spurious correlations are everywhere. While humans often do not perceive them, neural networks are notorious for learning unwanted associations, also known as biases, instead of the underlying decision rule. As a result, practitioners are often unaware of the biased decision-making of their classifiers. Such a biased model based on spurious correlations might not generalize to unobserved data, leading to unintended, adverse consequences. We propose Signal is Harder (SiH), a variational-autoencoder-based method that simultaneously trains a biased and unbiased classifier using a novel, disentangling reweighting scheme inspired by the focal loss. Using the unbiased classifier, SiH matches or improves upon the performance of state-of-the-art debiasing methods. To improve the interpretability of our technique, we propose a perturbation scheme in the latent space for visualizing the bias that helps practitioners become aware of the sources of spurious correlations.

Authors

Moritz Vandenhirtz, Laura Manduchi, Ricards Marcinkevics, Julia E. Vogt

Submitted

Domain Generalization Workshop, ICLR 2023

Date

04.05.2023

LinkCode

Abstract

Partitioning a set of elements into subsets of a priori unknown sizes is essential in many applications. These subset sizes are rarely explicitly learned - be it the cluster sizes in clustering applications or the number of shared versus independent generative latent factors in weakly-supervised learning. Probability distributions over correct combinations of subset sizes are non-differentiable due to hard constraints, which prohibit gradient-based optimization. In this work, we propose the differentiable hypergeometric distribution. The hypergeometric distribution models the probability of different group sizes based on their relative importance. We introduce reparameterizable gradients to learn the importance between groups and highlight the advantage of explicitly learning the size of subsets in two typical applications: weakly-supervised learning and clustering. In both applications, we outperform previous approaches, which rely on suboptimal heuristics to model the unknown size of groups.

Authors

Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt

Submitted

ICLR 2023

Date

01.05.2023

LinkCode

Authors

Alice Bizeul, Imant Daunhawer, Emanuele Palumbo, Bernhard Schölkopf, Alexander Marx, Julia E. Vogt

Submitted

Conference on Causal Learning and Reasoning (Datasets Track), CLeaR, 2023

Date

08.04.2023

LinkCode

Abstract

Machine learning (ML) is a discipline emerging from computer science with close ties to statistics and applied mathematics. Its fundamental goal is the design of computer programs, or algorithms, that learn to perform a certain task in an automated manner. Without explicit rules or knowledge, ML algorithms observe and possibly, interact with the surrounding world by the use of available data. Typically, as a result of learning, algorithms distil observations of complex phenomena into a general model which summarises the patterns, or regularities, discovered from the data. Modern ML algorithms regularly break records achieving impressive performance at a wide range of tasks, e.g. game playing, protein structure prediction, searching for particles in high-energy physics, and forecasting precipitation. The utility of machine learning methods for healthcare is apparent: it is often argued that given vast amounts of heterogeneous data, our understanding of diseases, patient management and outcomes can be enriched with the insights from machine learning. In this chapter, we will provide a nontechnical introduction to the ML discipline aimed at a general audience with an affinity for biomedical applications. We will familiarise the reader with the common types of algorithms and typical tasks these algorithms can solve and illustrate these basic concepts by concrete examples of current machine learning applications in healthcare. We will conclude with a discussion of the open challenges, limitations, and potential impact of machine-learning-powered medicine.

Authors

Julia E. Vogt, Ece Özkan Elsen, Ricards Marcinkevics

Submitted

Chapter in Digital Medicine: Bringing Digital Solutions to Medical Practice

Date

31.03.2023

LinkDOI

Abstract

Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.

Authors

Imant Daunhawer, Alice Bizeul, Emanuele Palumbo, Alexander Marx, Julia E. Vogt

Submitted

The Eleventh International Conference on Learning Representations, ICLR 2023

Date

23.03.2023

Link

Abstract

Background and Objectives: Remote patient monitoring (RPM) of vital signs and symptoms for lung transplant recipients (LTRs) has become increasingly relevant in many situations. Nevertheless, RPM research integrating multisensory home monitoring in LTRs is scarce. We developed a novel multisensory home monitoring device and tested it in the context of COVID-19 vaccinations. We hypothesize that multisensory RPM and smartphone-based questionnaire feedback on signs and symptoms will be well accepted among LTRs. To assess the usability and acceptability of a remote monitoring system consisting of wearable devices, including home spirometry and a smartphone-based questionnaire application for symptom and vital sign monitoring using wearable devices, during the first and second SARS-CoV-2 vaccination. Materials and Methods: Observational usability pilot study for six weeks of home monitoring with the COVIDA Desk for LTRs. During the first week after the vaccination, intensive monitoring was performed by recording data on physical activity, spirometry, temperature, pulse oximetry and self-reported symptoms, signs and additional measurements. During the subsequent days, the number of monitoring assessments was reduced. LTRs reported on their perceptions of the usability of the monitoring device through a purpose-designed questionnaire. Results: Ten LTRs planning to receive the first COVID-19 vaccinations were recruited. For the intensive monitoring study phase, LTRs recorded symptoms, signs and additional measurements. The most frequent adverse events reported were local pain, fatigue, sleep disturbance and headache. The duration of these symptoms was 5–8 days post-vaccination. Adherence to the main monitoring devices was high. LTRs rated usability as high. The majority were willing to continue monitoring. Conclusions: The COVIDA Desk showed favorable technical performance and was well accepted by the LTRs during the vaccination phase of the pandemic. The feasibility of the RPM system deployment was proven by the rapid recruitment uptake, technical performance (i.e., low number of errors), favorable user experience questionnaires and detailed individual user feedback.

Authors

Mace M. Schuurmans, Michal Muszynski, Xiang Li, Ricards Marcinkevics, Lukas Zimmerli, Diego Monserrat Lopez, Bruno Michel, Jonas Weiss, Rene Hage, Maurice Roeder, Julia E. Vogt, Thomas Brunschwiler

Submitted

Medicina

Date

20.03.2023

LinkDOI

Abstract

Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality, while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces, proposing an objective that overcomes certain dependencies on hyperparameters that arise for existing approaches with the same latent space structure. Compared to these existing approaches, we show increased robustness with respect to changes in the design of the latent space, in terms of the capacity allocated to modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.

Authors

Emanuele Palumbo, Imant Daunhawer, Julia E. Vogt

Submitted

The Eleventh International Conference on Learning Representations, ICLR 2023

Date

02.03.2023

Link

Abstract

Interpretability and explainability are crucial for machine learning (ML) and statistical applications in medicine, economics, law, and natural sciences and form an essential principle for ML model design and development. Although interpretability and explainability have escaped a precise and universal definition, many models and techniques motivated by these properties have been developed over the last 30 years, with the focus currently shifting toward deep learning. We will consider concrete examples of state-of-the-art, including specially tailored rule-based, sparse, and additive classification models, interpretable representation learning, and methods for explaining black-box models post hoc. The discussion will emphasize the need for and relevance of interpretability and explainability, the divide between them, and the inductive biases behind the presented “zoo” of interpretable models and explanation methods.

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

WIREs Data Mining and Knowledge Discovery

Date

28.02.2023

LinkDOI

Abstract

Many modern research fields increasingly rely on collecting and analysing massive, often unstructured, and unwieldy datasets. Consequently, there is growing interest in machine learning and artificial intelligence applications that can harness this `data deluge'. This broad nontechnical overview provides a gentle introduction to machine learning with a specific focus on medical and biological applications. We explain the common types of machine learning algorithms and typical tasks that can be solved, illustrating the basics with concrete examples from healthcare. Lastly, we provide an outlook on open challenges, limitations, and potential impacts of machine-learning-powered medicine.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

Arxiv

Date

23.12.2022

LinkDOI

Abstract

The robustness of machine learning algorithms to distributions shift is primarily discussed in the context of supervised learning (SL). As such, there is a lack of insight on the robustness of the representations learned from unsupervised methods, such as self-supervised learning (SSL) and auto-encoder based algorithms (AE), to distribution shift. We posit that the input-driven objectives of unsupervised algorithms lead to representations that are more robust to distribution shift than the target-driven objective of SL. We verify this by extensively evaluating the performance of SSL and AE on both synthetic and realistic distribution shift datasets. Following observations that the linear layer used for classification itself can be susceptible to spurious correlations, we evaluate the representations using a linear head trained on a small amount of out-of-distribution (OOD) data, to isolate the robustness of the learned representations from that of the linear head. We also develop "controllable" versions of existing realistic domain generalisation datasets with adjustable degrees of distribution shifts. This allows us to study the robustness of different learning algorithms under versatile yet realistic distribution shift conditions. Our experiments show that representations learned from unsupervised learning algorithms generalise better than SL under a wide variety of extreme as well as realistic distribution shifts.

Authors

Yuge Shi, Imant Daunhawer, Julia E. Vogt, Philip H.S. Torr, Amartya Sanyal

Submitted

The Eleventh International Conference on Learning Representations, ICLR 2023

Date

16.12.2022

Link

Abstract

Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), which is used to diagnose cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is both time-consuming and expertise-demanding, raising the need for an automated approach. Earlier automated works have been limited to still images or use echocardiogram videos with spatio-temporal convolutions in a complex pipeline. In this work, we propose to generate images from readily available echocardiogram videos, each image mimicking a M(otion)-mode image from a different scan line through time. We then combine different M-mode images using off-the-shelf model architectures to estimate the EF and, thus, diagnose cardiomyopathy. Our experiments show that our proposed method converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process.

Authors

Thomas Sutter, Sebastian Balzer, Ece Özkan Elsen, Julia E. Vogt

Submitted

Medical Imaging Meets NeurIPS Workshop 2022

Date

02.12.2022

Link

Abstract

Background: Arm use metrics derived from wrist-mounted movement sensors are widely used to quantify the upper limb performance in real-life conditions of individuals with stroke throughout motor recovery. The calculation of real-world use metrics, such as arm use duration and laterality preferences, relies on accurately identifying functional movements. Hence, classifying upper limb activity into functional and non-functional classes is paramount. Acceleration thresholds are conventionally used to distinguish these classes. However, these methods are challenged by the high inter and intra-individual variability of movement patterns. In this study, we developed and validated a machine learning classifier for this task and compared it to methods using conventional and optimal thresholds.Methods: Individuals after stroke were video-recorded in their home environment performing semi-naturalistic daily tasks while wearing wrist-mounted inertial measurement units. Data were labeled frame-by-frame following the Taxonomy of Functional Upper Limb Motion definitions, excluding whole-body movements, and sequenced into 1-s epochs. Actigraph counts were computed, and an optimal threshold for functional movement was determined by receiver operating characteristic curve analyses on group and individual levels. A logistic regression classifier was trained on the same labels using time and frequency domain features. Performance measures were compared between all classification methods.Results: Video data (6.5 h) of 14 individuals with mild-to-severe upper limb impairment were labeled. Optimal activity count thresholds were ≥20.1 for the affected side and ≥38.6 for the unaffected side and showed high predictive power with an area under the curve (95% CI) of 0.88 (0.87,0.89) and 0.86 (0.85, 0.87), respectively. A classification accuracy of around 80% was equivalent to the optimal threshold and machine learning methods and outperformed the conventional threshold by ∼10%. Optimal thresholds and machine learning methods showed superior specificity (75–82%) to conventional thresholds (58–66%) across unilateral and bilateral activities.Conclusion: This work compares the validity of methods classifying stroke survivors’ real-life arm activities measured by wrist-worn sensors excluding whole-body movements. The determined optimal thresholds and machine learning classifiers achieved an equivalent accuracy and higher specificity than conventional thresholds. Our open-sourced classifier or optimal thresholds should be used to specify the intensity and duration of arm use.

Authors

Johannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope

Submitted

Frontiers in Physiology

Date

28.09.2022

LinkDOI

Abstract

Background: Stroke leads to motor impairment which reduces physical activity, negatively affects social participation, and increases the risk of secondary cardiovascular events. Continuous monitoring of physical activity with motion sensors is promising to allow the prescription of tailored treatments in a timely manner. Accurate classification of gait activities and body posture is necessary to extract actionable information for outcome measures from unstructured motion data. We here develop and validate a solution for various sensor configurations specifically for a stroke population.Methods: Video and movement sensor data (locations: wrists, ankles, and chest) were collected from fourteen stroke survivors with motor impairment who performed real-life activities in their home environment. Video data were labeled for five classes of gait and body postures and three classes of transitions that served as ground truth. We trained support vector machine (SVM), logistic regression (LR), and k-nearest neighbor (kNN) models to identify gait bouts only or gait and posture. Model performance was assessed by the nested leave-one-subject-out protocol and compared across five different sensor placement configurations.Results: Our method achieved very good performance when predicting real-life gait versus non-gait (Gait classification) with an accuracy between 85% and 93% across sensor configurations, using SVM and LR modeling. On the much more challenging task of discriminating between the body postures lying, sitting, and standing as well as walking, and stair ascent/descent (Gait and postures classification), our method achieves accuracies between 80% and 86% with at least one ankle and wrist sensor attached unilaterally. The Gait and postures classification performance between SVM and LR was equivalent but superior to kNN.Conclusion: This work presents a comparison of performance when classifying Gait and body postures in post-stroke individuals with different sensor configurations, which provide options for subsequent outcome evaluation. We achieved accurate classification of gait and postures performed in a real-life setting by individuals with a wide range of motor impairments due to stroke. This validated classifier will hopefully prove a useful resource to researchers and clinicians in the increasingly important field of digital health in the form of remote movement monitoring using motion sensors.

Authors

Johannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope

Submitted

Frontiers in Physiology

Date

26.09.2022

LinkDOI

Abstract

Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, accurate and early detection of PH is crucial for successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.

Authors

Hanna Ragnarsdottir, Laura Manduchi, Holger Michel, Fabian Laumer, Sven Wellmann, Ece Özkan Elsen, Julia E. Vogt

Submitted

DAGM German Conference on Pattern Recognition

Date

20.09.2022

DOI

Abstract

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn three variants of a variational latent trajectory model (TVAE). While the first two variants (TVAE-C and TVAE-R) model strict periodic movements of the heart, the third (TVAE-S) is more general and allows shifts in the spatial representation throughout the video. All models are trained on the healthy samples of a novel in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders when detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method enables interpretable explanations of its output through heatmaps highlighting the regions corresponding to anomalous heart structures.

Authors

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

The Seventh Machine Learning for Healthcare Conference, MLHC 2022

Date

05.08.2022

LinkCode

Abstract

Deep neural networks for image-based screening and computer-aided diagnosis have achieved expert-level performance on various medical imaging modalities, including chest radiographs. Recently, several works have indicated that these state-of-the-art classifiers can be biased with respect to sensitive patient attributes, such as race or gender, leading to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making in healthcare. Fair machine learning has focused on mitigating such biases against disadvantaged or marginalised groups, mainly concentrating on tabular data or natural images. This work presents two novel intra-processing techniques based on fine-tuning and pruning an already-trained neural network. These methods are simple yet effective and can be readily applied post hoc in a setting where the protected attribute is unknown during the model development and test time. In addition, we compare several intra- and post-processing approaches applied to debiasing deep chest X-ray classifiers. To the best of our knowledge, this is one of the first efforts studying debiasing methods on chest radiographs. Our results suggest that the considered approaches successfully mitigate biases in fully connected and convolutional neural networks offering stable performance under various settings. The discussed methods can help achieve group fairness of deep medical image classifiers when deploying them in domains with different fairness considerations and constraints.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

The Seventh Machine Learning for Healthcare Conference, MLHC 2022

Date

05.08.2022

LinkCode

Abstract

Arguably, interpretability is one of the guiding principles behind the development of machine-learning-based healthcare decision support tools and computer-aided diagnosis systems. There has been a renewed interest in interpretable classification based on high-level concepts, including, among other model classes, the re-exploration of concept bottleneck models. By their nature, medical diagnosis, patient management, and monitoring require the assessment of multiple views and modalities to form a holistic representation of the patient's state. For instance, in ultrasound imaging, a region of interest might be registered from multiple views that are informative about different sets of clinically relevant features. Motivated by this, we extend the classical concept bottleneck model to the multiview classification setting by representation fusion across the views. We apply our multiview concept bottleneck model to the dataset of ultrasound images acquired from a cohort of pediatric patients with suspected appendicitis to predict the disease. The results suggest that auxiliary supervision from the concepts and aggregation across multiple views help develop more accurate and interpretable classifiers.

Authors

Ugne Klimiene, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ece Özkan Elsen, Alyssia Paschke, David Niederberger, Sven Wellmann, Christian Knorr, Julia E Vogt

Submitted

Oral spotlight at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022

Date

23.07.2022

LinkCode

Abstract

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn different variants of a variational latent trajectory model (TVAE). The models are trained on the healthy samples of an in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-ofdistribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shonecomplex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders on the task of detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method provides interpretable explanations of its output through heatmaps which highlight the regions corresponding to anomalous heart structures.

Authors

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Poster at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022

Date

23.07.2022

LinkCode

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces both B and T cell responses which jointly contribute to effective neutralization and clearance of the virus. Multiple compartments of circulating immune memory to SARS-CoV-2 are not fully understood. We analyzed humoral and T cell immune responses in young convalescent adults with previous asymptomatic SARS-CoV-2 infections or mildly symptomatic COVID-19 disease. We concomitantly measured antibodies in the blood and analyzed SARS-CoV-2-reactive T cell reaction in response to overlapping peptide pools of four viral proteins in peripheral blood mononuclear cells (PBMC). Using statistical and machine learning models, we investigated whether T cell reactivity predicted antibody status. Individuals with previous SARS-CoV-2 infection differed in T cell responses from non-infected individuals. Subjects with previous SARS-CoV-2 infection exhibited CD4+ T cell responses against S1-, N-proteins and CoV-Mix (containing N, M and S protein-derived peptides) that were dominant over CD8+ T cells. At the same time, signals against the M protein were less pronounced. Double positive IL2+/CD154+ and IFN+/TNF+ CD4+ T cells showed the strongest association with antibody titers. T-cell reactivity to CoV-Mix-, S1-, and N-antigens were most strongly associated with humoral immune response, specifically with a compound antibody titer consisting of RBD, S1, S2, and NP. The T cell phenotype of SARS-CoV-2 infected individuals was stable for four months, thereby exceeding antibody decay rates. Our findings demonstrate that mild COVID-19 infections can elicit robust SARS-CoV-2 T-cell reactive immunity against specific components of SARS-CoV-2.

Authors

Ricards Marcinkevics, Pamuditha Silva, Anna-Katharina Hankele, Katharina Csik, Svenja Godbersen, Algera Goga, Lynn Hasenöhrl, Pascale Hirschi, Hasan Kabakci, Mary P LaPierre, Johanna Mayrhofer, Alexandra Title, Xuan Shu, Nouell Baiioud, Sandra Bernal, Laura Dassisti, Mara D Saenz-de-Juano, Meret Schmidhauser, Giulia Silvestrelli, Simon Z Ulbrich, Thea J Ulbrich, Tamara Wyss, Daniel J Stekhoven, Faisal S Al-Quaddoomi, Shuqing Yu, Mascha Binder, Christoph Schultheiss, Claudia Zindel, Christoph Kolling, Jörg Goldhahn, Bahram Kasmapour, Polina Zjablovskaja, Frank Hardung, Anne Richter, Stefan Miltenyi, Luca Piccoli, Sandra Ciesek, Julia E Vogt, Federica Sallusto, Markus Stoffel, Susanne E Ulbrich

Submitted

The 1st Workshop on Healthcare AI and COVID-19 at ICML 2022

Date

22.07.2022

Abstract

Due to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making, recent research has focused on mitigating biases against already disadvantaged or marginalised groups in classification models. From the perspective of classification parity, the two commonest metrics for assessing fairness are statistical parity and equality of opportunity. Current approaches to debiasing in classification either require the knowledge of the protected attribute before or during training or are entirely agnostic to the model class and parameters. This work considers differentiable proxy functions for statistical parity and equality of opportunity and introduces two novel debiasing techniques for neural network classifiers based on fine-tuning and pruning an already-trained network. As opposed to the prior work leveraging adversarial training, the proposed methods are simple yet effective and can be readily applied post hoc. Our experimental results encouragingly suggest that these approaches successfully debias fully connected neural networks trained on tabular data and often outperform model-agnostic post-processing methods.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

Contributed talk at ICLR 2022 Workshop on Socially Responsible Machine Learning

Date

29.04.2022

LinkCode

Abstract

In this work, we study the problem of clustering survival data — a challenging and so far under-explored task. We introduce a novel semi-supervised probabilistic approach to cluster survival data by leveraging recent advances in stochastic gradient variational inference. In contrast to previous work, our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and censored survival times. We compare our model to the related work on clustering and mixture models for survival data in comprehensive experiments on a wide range of synthetic, semi-synthetic, and real-world datasets, including medical imaging data. Our method performs better at identifying clusters and is competitive at predicting survival times. Relying on novel generative assumptions, the proposed model offers a holistic perspective on clustering survival data and holds a promise of discovering subpopulations whose survival is regulated by different generative mechanisms.

Authors

Laura Manduchi, Ricards Marcinkevics, Michela C. Massi, Thomas Weikert, Alexander Sauter, Verena Gotta, Timothy Müller, Flavio Vasella, Marian C. Neidert, Marc Pfister, Bram Stieltjes, Julia E. Vogt

Submitted

The Tenth International Conference on Learning Representations, ICLR 2022

Date

25.04.2022

LinkCode

Abstract

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.

Authors

Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt

Submitted

The Tenth International Conference on Learning Representations, ICLR 2022

Date

07.04.2022

Link

Abstract

Using artificial intelligence to improve patient care is a cutting-edge methodology, but its implementation in clinical routine has been limited due to significant concerns about understanding its behavior. One major barrier is the explainability dilemma and how much explanation is required to use artificial intelligence safely in healthcare. A key issue is the lack of consensus on the definition of explainability by experts, regulators, and healthcare professionals, resulting in a wide variety of terminology and expectations. This paper aims to fill the gap by defining minimal explainability standards to serve the views and needs of essential stakeholders in healthcare. In that sense, we propose to define minimal explainability criteria that can support doctors’ understanding, meet patients’ needs, and fulfill legal requirements. Therefore, explainability need not to be exhaustive but sufficient for doctors and patients to comprehend the artificial intelligence models’ clinical implications and be integrated safely into clinical practice. Thus, minimally acceptable standards for explainability are context-dependent and should respond to the specific need and potential risks of each clinical scenario for a responsible and ethical implementation of artificial intelligence.

Authors

Laura Arbelaez Ossa, Georg Starke, Giorgia Lorenzini, Julia E Vogt, David M Shaw, Bernice Simone Elger

Submitted

DIGITAL HEALTH

Date

11.02.2022

LinkDOI

Abstract

Die Digitalisierung hat die Medizin bereits verändert und wird die ärztliche Tätig­keit auch in Zukunft stark beeinflussen. Es ist deshalb wichtig, dass sich angehende Ärztinnen und Ärzte bereits während des Studiums mit den Methoden und Ein­satzmöglichkeiten des maschinellen Lernens auseinandersetzen. Die Arbeits­gruppe «Digitalisierung der Medizin» hat dazu Lernziele erarbeitet.

Authors

Raphaël Bonvin, Joachim Buhmann, Carlos Cotrini Jimenez, Marcel Egger, Alexander Geissler, Michael Krauthammer, Christian Schirlo, Christiane Spiess, Johann Steurer, Kerstin Noëlle Vokinger, Julia Vogt

Date

26.01.2022

Link

Abstract

Appendicitis is a common childhood disease, the management of which still lacks consolidated international criteria. In clinical practice, heuristic scoring systems are often used to assess the urgency of patients with suspected appendicitis. Previous work on machine learning for appendicitis has focused on conventional classification models, such as logistic regression and tree-based ensembles. In this study, we investigate the use of risk supersparse linear integer models (risk SLIM) for learning data-driven risk scores to predict the diagnosis, management, and complications in pediatric patients with suspected appendicitis on a dataset consisting of 430 children from a tertiary care hospital. We demonstrate the efficacy of our approach and compare the performance of learnt risk scores to previous analyses with random forests. Risk SLIM is able to detect medically meaningful features and outperforms the traditional appendicitis scores, while at the same time is better suited for the clinical setting than tree-based ensembles.

Authors

Pedro Roig Aparicio, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E. Vogt

Submitted

Short paper at 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

Date

16.12.2021

LinkDOI

Abstract

Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments to demonstrate that DC-GMM shows superior clustering performances and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.

Authors

Laura Manduchi, Kieran Chin-Cheong, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Accepted at NeurIPS 2021

Date

14.12.2021

Abstract

In this work, we propose a new formulation for multimodal VAEs to model and learn the relationship between data types. Despite their recent progress, current multimodal generative methods are based on simplistic assumptions regarding the relation between data types, which leads to a trade-off between coherence and quality of generated samples - even for simple toy datasets. The proposed method learns the relationship between data types instead of relying on pre-defined and limiting assumptions. Based on the principles of variational inference, we change the posterior approximation to explicitly include information about the relation between data types. We show empirically that the simplified assumption of a single shared latent space leads to inferior performance for a dataset with additional pairwise shared information.

Authors

Thomas M. Sutter, Julia E. Vogt

Submitted

Bayesian Deep Learning Workshop at Neurips 2021

Date

14.12.2021

Link

Abstract

Background: Current strategies for risk stratification and prediction of neonatal early-onset sepsis (EOS) are inefficient and lack diagnostic performance. The aim of this study was to use machine learning to analyze the diagnostic accuracy of risk factors (RFs), clinical signs and biomarkers and to develop a prediction model for culture-proven EOS. We hypothesized that the contribution to diagnostic accuracy of biomarkers is higher than of RFs or clinical signs. Study Design: Secondary analysis of the prospective international multicenter NeoPInS study. Neonates born after completed 34 weeks of gestation with antibiotic therapy due to suspected EOS within the first 72 hours of life participated. Primary outcome was defined as predictive performance for culture-proven EOS with variables known at the start of antibiotic therapy. Machine learning was used in form of a random forest classifier. Results: One thousand six hundred eighty-five neonates treated for suspected infection were analyzed. Biomarkers were superior to clinical signs and RFs for prediction of culture-proven EOS. C-reactive protein and white blood cells were most important for the prediction of the culture result. Our full model achieved an area-under-the-receiver-operating-characteristic-curve of 83.41% (+/-8.8%) and an area-under-the-precision-recall-curve of 28.42% (+/-11.5%). The predictive performance of the model with RFs alone was comparable with random. Conclusions: Biomarkers have to be considered in algorithms for the management of neonates suspected of EOS. A 2-step approach with a screening tool for all neonates in combination with our model in the preselected population with an increased risk for EOS may have the potential to reduce the start of unnecessary antibiotics.

Authors

Martin Stocker, Imant Daunhawer, Wendy van Herk, Salhab el Helou, Sourabh Dutta, Frank A. B. A.Schuerman, Rita K. van den Tooren-de Groot, ; Jantien W. Wieringa, Jan Janota, Laura H. van der Meer-Kappelle, Rob Moonen, Sintha D. Sie, Esther de Vries, Albertine E. Donker, Urs Zimmerman, Luregn J. Schlapbach, Amerik C. de Mol, Angelique Hoffmann-Haringsma, Madan Roy, Maren Tomaske, René F. Kornelisse, Juliette van Gijsel, Frans B. Plötz, Sven Wellmann, Niek B Achten, Dirk Lehnick, Annemarie M. C. van Rossum, Julia E. Vogt

Submitted

The Pediatric Infectious Disease Journal, 2022

Date

09.09.2021

LinkDOI

Abstract

Autonomic peripheral activity is partly governed by brain autonomic centers. However, there is still a lot of uncertainties regarding the precise link between peripheral and central autonomic biosignals. Clarifying these links could have a profound impact on the interpretability, and thus usefulness, of peripheral autonomic biosignals captured with wearable devices. In this study, we take advantage of a unique dataset consisting of intracranial stereo-electroencephalography (SEEG) and peripheral biosignals acquired simultaneously for several days from four subjects undergoing epilepsy monitoring. Compared to previous work, we apply a deep neural network to explore high-dimensional nonlinear correlations between the cerebral brainwaves and variations in heart rate and electrodermal activity (EDA). Further, neural network explainability methods were applied to identify most relevant brainwave frequencies, brain regions and temporal information to predict a specific biosignal. Strongest brain-peripheral correlations were observed from contacts located in the central autonomic network, in particular in the alpha, theta and 52 to 58 Hz frequency band. Furthermore, a temporal delay of 12 to 14 s between SEEG and EDA signal was observed. Finally, we believe that this pilot study demonstrates a promising approach to mapping brain-peripheral relationships in a data-driven manner by leveraging the expressiveness of deep neural networks.

Authors

Alexander H. Hatteland, Ricards Marcinkevics, Renaud Marquis, Thomas Frick, Ilona Hubbard, Julia E. Vogt, Thomas Brunschwiler, Philippe Ryvlin

Submitted

Best paper award at IEEE International Conference on Digital Health, ICDH 2021

Date

05.09.2021

LinkDOI

Abstract

Machine Learning has become more and more popular in the medical domain over the past years. While supervised machine learning has already been applied successfully, the vast amount of unlabelled data offers new opportunities for un- and self-supervised learning methods. Especially with regard to the multimodal nature of most clinical data, the labelling of multiple data types becomes quickly infeasible in the medical domain. However, to the best of our knowledge, multimodal unsupervised methods have been tested extensively on toy-datasets only but have never been applied to real-world medical data, for direct applications such as disease classification and image generation. In this article, we demonstrate that self-supervised methods provide promising results on medical data while highlighting that the task is extremely challenging and that there is space for substantial improvements.

Authors

Hendrik J. Klug, Thomas M. Sutter, Julia E. Vogt

Submitted

Medical Imaging with Deep Learning, MIDL 2021

Date

07.07.2021

Link

Abstract

Background Preterm neonates frequently experience hypernatremia (plasma sodium concentrations >145 mmol/l), which is associated with clinical complications, such as intraventricular hemorrhage. Study design In this single center retrospective observational study, the following 7 risk factors for hypernatremia were analyzed in very low gestational age (VLGA, below 32 weeks) neonates: gestational age (GA), delivery mode (DM; vaginal or caesarian section), sex, birth weight, small for GA, multiple birth, and antenatal corticosteroids. Machine learning (ML) approaches were applied to obtain probabilities for hypernatremia. Results 824 VLGA neonates were included (median GA 29.4 weeks, median birth weight 1170g, caesarean section 83%). 38% of neonates experienced hypernatremia. Maximal sodium concentration of 144 mmol/l (interquartile range 142–147) was observed 52 hours (41–65) after birth. ML identified vaginal delivery and GA as key risk factors for hypernatremia. The risk of hypernatremia increased with lower GA from 22% for GA >= 31–32 weeks to 46% for GA < 31 weeks and 60% for GA < 27 weeks. A linear relationship between maximal sodium concentrations and GA was found, showing decreases of 0.29 mmol/l per increasing week GA in neonates with vaginal delivery and 0.49 mmol/l/week after cesarean section. Sex, multiple birth and antenatal corticosteroids were not associated hypernatremia. Conclusion VLGA neonates with vaginal delivery and low GA have the highest risk for hypernatremia. Early identification of neonates at risk and early intervention may prevent extreme sodium excursions and associated clinical complications.

Authors

Nadia S. Eugster, Florence Corminboeuf, Gilbert Koch, Julia E. Vogt, Thomas Sutter, Tamara van Donge, Marc Pfister, Roland Gerull

Submitted

Klinische Pädiatrie

Date

07.06.2021

LinkDOI

Abstract

Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.

Authors

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

Ninth International Conference on Learning Representations, ICLR 2021

Date

04.05.2021

Link

Abstract

Background: Given the absence of consolidated and standardized international guidelines for managing pediatric appendicitis and the few strictly data-driven studies in this specific, we investigated the use of machine learning (ML) classifiers for predicting the diagnosis, management and severity of appendicitis in children. Materials and Methods: Predictive models were developed and validated on a dataset acquired from 430 children and adolescents aged 0-18 years, based on a range of information encompassing history, clinical examination, laboratory parameters, and abdominal ultrasonography. Logistic regression, random forests, and gradient boosting machines were used for predicting the three target variables. Results: A random forest classifier achieved areas under the precision-recall curve of 0.94, 0.92, and 0.70, respectively, for the diagnosis, management, and severity of appendicitis. We identified smaller subsets of 6, 17, and 18 predictors for each of targets that sufficed to achieve the same performance as the model based on the full set of 38 variables. We used these findings to develop the user-friendly online Appendicitis Prediction Tool for children with suspected appendicitis. Discussion: This pilot study considered the most extensive set of predictor and target variables to date and is the first to simultaneously predict all three targets in children: diagnosis, management, and severity. Moreover, this study presents the first ML model for appendicitis that was deployed as an open access easy-to-use online tool. Conclusion: ML algorithms help to overcome the diagnostic and management challenges posed by appendicitis in children and pave the way toward a more personalized approach to medical decision-making. Further validation studies are needed to develop a finished clinical decision support system.

Authors

Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E Vogt

Submitted

Frontiers in Pediatrics

Date

29.04.2021

LinkDOICode

Abstract

Survival analysis has gained significant attention in the medical domain with many far-reaching applications. Although a variety of machine learning methods have been introduced for tackling time-to-event prediction in unstructured data with complex dependencies, clustering of survival data remains an under-explored problem. The latter is particularly helpful in discovering patient subpopulations whose survival is regulated by different generative mechanisms, a critical problem in precision medicine. To this end, we introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting. Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times. We compare our model to the related work on survival clustering in comprehensive experiments on a range of synthetic, semi-synthetic, and real-world datasets. Our proposed method performs better at identifying clusters and is competitive at predicting survival times in terms of the concordance index and relative absolute error.

Authors

Laura Manduchi, Ricards Marcinkevics, Julia E. Vogt

Submitted

Contributed talk at AI for Public Health Workshop at ICLR 2021

Date

09.04.2021

Link