Prof. Dr.

Julia Vogt

Group Leader

E-Mail
julia.vogt@inf.ethz.ch
Phone
+41 44 633 8714
Address
Department of Computer Science
CAB G 69.1
Universitätstr. 6
CH-8092 Zurich, Switzerland
Room
CAB G 69.1

Julia Vogt is an assistant professor in Computer Science at ETH Zurich, where she leads the Medical Data Science Group. The focus of her research is on linking computer science with medicine, with the ultimate aim of personalized patient treatment. She studied mathematics in Konstanz and Sydney and earned her Ph.D. in computer science at the University of Basel. She was a postdoctoral research fellow at the Memorial Sloan Kettering Cancer Center in New York City and with the Bioinformatics and Information Mining group at the University of Konstanz. In 2018, she joined the University of Basel as an assistant professor. In May 2019, she and her lab moved to Zurich, where she joined the Computer Science Department of ETH Zurich.

Abstract

Many modern research fields increasingly rely on collecting and analysing massive, often unstructured, and unwieldy datasets. Consequently, there is growing interest in machine learning and artificial intelligence applications that can harness this 'data deluge'. This broad nontechnical overview provides a gentle introduction to machine learning with a specific focus on medical and biological applications. We explain the common types of machine learning algorithms and typical tasks that can be solved, illustrating the basics with concrete examples from healthcare. Lastly, we provide an outlook on open challenges, limitations, and potential impacts of machine-learning-powered medicine.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

arXiv

Date

23.12.2022

Link | DOI

Abstract

Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), which is used to diagnose cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is both time-consuming and expertise-demanding, raising the need for an automated approach. Earlier automated works were limited to still images or used echocardiogram videos with spatio-temporal convolutions in a complex pipeline. In this work, we propose to generate images from readily available echocardiogram videos, each image mimicking an M(otion)-mode image from a different scan line through time. We then combine different M-mode images using off-the-shelf model architectures to estimate the EF and, thus, diagnose cardiomyopathy. Our experiments show that our proposed method converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process.
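As a rough illustration of the M-mode idea above: fixing one scan line (here, a pixel column) and stacking its intensities over time turns a video into a single image. This is a simplified NumPy sketch on synthetic data, not the authors' pipeline; array shapes and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
video = rng.random((32, 64, 64))  # (frames, height, width), a stand-in echo clip

def mmode_from_video(video, col):
    """Stack pixel column `col` of every frame side by side, yielding a
    (height, frames) image: space on one axis, time on the other."""
    return video[:, :, col].T  # (frames, height) -> (height, frames)

# One M-mode-like image per chosen scan line; the paper combines several lines.
mmode = mmode_from_video(video, col=32)
```

In the paper, several such images from different scan lines are fed to off-the-shelf 2D architectures, avoiding spatio-temporal convolutions over the full video.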

Authors

Thomas Sutter, Sebastian Balzer, Ece Özkan Elsen, Julia E. Vogt

Submitted

Medical Imaging Meets NeurIPS Workshop 2022

Date

02.12.2022

Link

Abstract

Background: Arm use metrics derived from wrist-mounted movement sensors are widely used to quantify upper limb performance in real-life conditions of individuals with stroke throughout motor recovery. The calculation of real-world use metrics, such as arm use duration and laterality preferences, relies on accurately identifying functional movements. Hence, classifying upper limb activity into functional and non-functional classes is paramount. Acceleration thresholds are conventionally used to distinguish these classes. However, these methods are challenged by the high inter- and intra-individual variability of movement patterns. In this study, we developed and validated a machine learning classifier for this task and compared it to methods using conventional and optimal thresholds.

Methods: Individuals after stroke were video-recorded in their home environment performing semi-naturalistic daily tasks while wearing wrist-mounted inertial measurement units. Data were labeled frame-by-frame following the Taxonomy of Functional Upper Limb Motion definitions, excluding whole-body movements, and sequenced into 1-s epochs. Actigraph counts were computed, and an optimal threshold for functional movement was determined by receiver operating characteristic curve analyses at the group and individual levels. A logistic regression classifier was trained on the same labels using time- and frequency-domain features. Performance measures were compared between all classification methods.

Results: Video data (6.5 h) of 14 individuals with mild-to-severe upper limb impairment were labeled. Optimal activity count thresholds were ≥20.1 for the affected side and ≥38.6 for the unaffected side and showed high predictive power, with an area under the curve (95% CI) of 0.88 (0.87, 0.89) and 0.86 (0.85, 0.87), respectively. A classification accuracy of around 80% was achieved by both the optimal threshold and machine learning methods, outperforming the conventional threshold by ∼10%. Optimal thresholds and machine learning methods showed superior specificity (75–82%) compared to conventional thresholds (58–66%) across unilateral and bilateral activities.

Conclusion: This work compares the validity of methods for classifying stroke survivors’ real-life arm activities measured by wrist-worn sensors, excluding whole-body movements. The determined optimal thresholds and machine learning classifiers achieved equivalent accuracy and higher specificity than conventional thresholds. Our open-sourced classifier or the optimal thresholds should be used to specify the intensity and duration of arm use.
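The ROC-based selection of an optimal activity-count threshold can be sketched as follows. The data are synthetic, and Youden's J statistic is used as one common optimality criterion; the study's exact criterion and count distributions may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
functional = rng.normal(60, 15, 500)       # synthetic counts, functional epochs
non_functional = rng.normal(25, 15, 500)   # synthetic counts, non-functional epochs

def youden_threshold(pos, neg):
    """Pick the cut-off maximizing sensitivity + specificity - 1 (Youden's J)."""
    candidates = np.unique(np.r_[pos, neg])
    tpr = (pos[:, None] >= candidates).mean(axis=0)   # sensitivity at each cut-off
    fpr = (neg[:, None] >= candidates).mean(axis=0)   # 1 - specificity
    return candidates[np.argmax(tpr - fpr)]

optimal_threshold = youden_threshold(functional, non_functional)
```

Epochs whose count reaches the threshold would then be classified as functional; repeating the analysis per subject yields individual-level thresholds.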

Authors

Johannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope

Submitted

Frontiers in Physiology

Date

28.09.2022

Link | DOI

Abstract

Background: Stroke leads to motor impairment, which reduces physical activity, negatively affects social participation, and increases the risk of secondary cardiovascular events. Continuous monitoring of physical activity with motion sensors is promising for prescribing tailored treatments in a timely manner. Accurate classification of gait activities and body posture is necessary to extract actionable information for outcome measures from unstructured motion data. Here, we develop and validate a solution for various sensor configurations specifically for a stroke population.

Methods: Video and movement sensor data (locations: wrists, ankles, and chest) were collected from fourteen stroke survivors with motor impairment who performed real-life activities in their home environment. Video data were labeled for five classes of gait and body postures and three classes of transitions that served as ground truth. We trained support vector machine (SVM), logistic regression (LR), and k-nearest neighbor (kNN) models to identify gait bouts only or gait and posture. Model performance was assessed by a nested leave-one-subject-out protocol and compared across five different sensor placement configurations.

Results: Our method achieved very good performance when predicting real-life gait versus non-gait (Gait classification), with an accuracy between 85% and 93% across sensor configurations, using SVM and LR modeling. On the much more challenging task of discriminating between the body postures lying, sitting, and standing, as well as walking and stair ascent/descent (Gait and postures classification), our method achieves accuracies between 80% and 86% with at least one ankle and one wrist sensor attached unilaterally. The Gait and postures classification performance of SVM and LR was equivalent but superior to kNN.

Conclusion: This work presents a comparison of performance when classifying gait and body postures in post-stroke individuals with different sensor configurations, which provides options for subsequent outcome evaluation. We achieved accurate classification of gait and postures performed in a real-life setting by individuals with a wide range of motor impairments due to stroke. This validated classifier will hopefully prove a useful resource to researchers and clinicians in the increasingly important field of digital health, in the form of remote movement monitoring using motion sensors.
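The leave-one-subject-out protocol can be sketched with scikit-learn's `LeaveOneGroupOut`: each fold trains on all subjects but one and tests on the held-out subject, so performance reflects generalization to unseen individuals. This is a simplified, non-nested version on synthetic data with a stand-in logistic regression, not the study's feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_subjects, n_per_subject = 6, 40
X = rng.normal(size=(n_subjects * n_per_subject, 5))           # stand-in sensor features
y = (X[:, 0] + 0.5 * rng.normal(size=len(X)) > 0).astype(int)  # gait vs non-gait stand-in
groups = np.repeat(np.arange(n_subjects), n_per_subject)       # subject IDs

model = make_pipeline(StandardScaler(), LogisticRegression())
accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model.fit(X[train_idx], y[train_idx])                      # train on all other subjects
    accuracies.append(model.score(X[test_idx], y[test_idx]))   # test on the held-out one

mean_accuracy = float(np.mean(accuracies))
```

A nested variant would add an inner cross-validation loop over the training subjects for hyperparameter selection.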

Authors

Johannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope

Submitted

Frontiers in Physiology

Date

26.09.2022

Link | DOI

Abstract

Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, accurate and early detection of PH is crucial for successful management. In echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work on automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.
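The view-aggregation step described above reduces to a simple majority vote over per-view labels. A minimal sketch, where the per-view classifier outputs are stand-ins:

```python
from collections import Counter

def majority_vote(view_predictions):
    """Aggregate one predicted label per echocardiographic view
    into a single patient-level prediction."""
    return Counter(view_predictions).most_common(1)[0][0]

# e.g. three views, two of whose classifiers predict "PH"
prediction = majority_vote(["PH", "no PH", "PH"])
```

With an odd number of views and binary labels, this vote is always decisive; ties in other settings would need a tie-breaking rule.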

Authors

Hanna Ragnarsdottir, Laura Manduchi, Holger Michel, Fabian Laumer, Sven Wellmann, Ece Özkan Elsen, Julia E. Vogt

Submitted

DAGM German Conference on Pattern Recognition

Date

20.09.2022

DOI

Abstract

Deep neural networks for image-based screening and computer-aided diagnosis have achieved expert-level performance on various medical imaging modalities, including chest radiographs. Recently, several works have indicated that these state-of-the-art classifiers can be biased with respect to sensitive patient attributes, such as race or gender, leading to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making in healthcare. Fair machine learning has focused on mitigating such biases against disadvantaged or marginalised groups, mainly concentrating on tabular data or natural images. This work presents two novel intra-processing techniques based on fine-tuning and pruning an already-trained neural network. These methods are simple yet effective and can be readily applied post hoc in a setting where the protected attribute is unknown during the model development and test time. In addition, we compare several intra- and post-processing approaches applied to debiasing deep chest X-ray classifiers. To the best of our knowledge, this is one of the first efforts studying debiasing methods on chest radiographs. Our results suggest that the considered approaches successfully mitigate biases in fully connected and convolutional neural networks offering stable performance under various settings. The discussed methods can help achieve group fairness of deep medical image classifiers when deploying them in domains with different fairness considerations and constraints.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

The Seventh Machine Learning for Healthcare Conference, MLHC 2022

Date

05.08.2022

Link | Code

Abstract

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn three variants of a variational latent trajectory model (TVAE). While the first two variants (TVAE-C and TVAE-R) model strict periodic movements of the heart, the third (TVAE-S) is more general and allows shifts in the spatial representation throughout the video. All models are trained on the healthy samples of a novel in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders when detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method enables interpretable explanations of its output through heatmaps highlighting the regions corresponding to anomalous heart structures.
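To convey the flavor of MAP-based anomaly scoring, here is a toy sketch with a linear Gaussian "decoder" and a standard normal prior on the latent code, where the MAP latent has a closed form and the negative log posterior (up to constants) serves as the anomaly score. The actual TVAE models use learned latent trajectory decoders on echocardiogram videos; everything below is a stand-in.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(20, 4))  # stand-in linear "decoder" weights (data dim 20, latent dim 4)

def anomaly_score(x, W, noise_var=0.1):
    """Negative log posterior at the MAP latent: argmin_z ||x - Wz||^2/noise_var + ||z||^2."""
    z_map = np.linalg.solve(W.T @ W / noise_var + np.eye(W.shape[1]),
                            W.T @ x / noise_var)   # closed-form MAP for linear Gaussian model
    recon = W @ z_map
    return np.sum((x - recon) ** 2) / noise_var + np.sum(z_map ** 2)

x_in = W @ rng.normal(size=4)           # sample lying on the "healthy" manifold
x_out = x_in + 5 * rng.normal(size=20)  # heavily perturbed, out-of-distribution sample
in_score, out_score = anomaly_score(x_in, W), anomaly_score(x_out, W)
```

Samples far from the normative model receive high scores and are flagged as anomalous; thresholding the score yields the detector.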

Authors

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

The Seventh Machine Learning for Healthcare Conference, MLHC 2022

Date

05.08.2022

Link | Code

Abstract

Arguably, interpretability is one of the guiding principles behind the development of machine-learning-based healthcare decision support tools and computer-aided diagnosis systems. There has been a renewed interest in interpretable classification based on high-level concepts, including, among other model classes, the re-exploration of concept bottleneck models. By their nature, medical diagnosis, patient management, and monitoring require the assessment of multiple views and modalities to form a holistic representation of the patient's state. For instance, in ultrasound imaging, a region of interest might be registered from multiple views that are informative about different sets of clinically relevant features. Motivated by this, we extend the classical concept bottleneck model to the multiview classification setting by representation fusion across the views. We apply our multiview concept bottleneck model to the dataset of ultrasound images acquired from a cohort of pediatric patients with suspected appendicitis to predict the disease. The results suggest that auxiliary supervision from the concepts and aggregation across multiple views help develop more accurate and interpretable classifiers.

Authors

Ugne Klimiene, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ece Özkan Elsen, Alyssia Paschke, David Niederberger, Sven Wellmann, Christian Knorr, Julia E Vogt

Submitted

Oral spotlight at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022

Date

23.07.2022

Link

Abstract

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn different variants of a variational latent trajectory model (TVAE). The models are trained on the healthy samples of an in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders on the task of detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method provides interpretable explanations of its output through heatmaps which highlight the regions corresponding to anomalous heart structures.

Authors

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Poster at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022

Date

23.07.2022

Link | Code

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces both B and T cell responses which jointly contribute to effective neutralization and clearance of the virus. The multiple compartments of circulating immune memory to SARS-CoV-2 are not yet fully understood. We analyzed humoral and T cell immune responses in young convalescent adults with previous asymptomatic SARS-CoV-2 infections or mildly symptomatic COVID-19 disease. We concomitantly measured antibodies in the blood and analyzed SARS-CoV-2-reactive T cell responses to overlapping peptide pools of four viral proteins in peripheral blood mononuclear cells (PBMC). Using statistical and machine learning models, we investigated whether T cell reactivity predicted antibody status. Individuals with previous SARS-CoV-2 infection differed in T cell responses from non-infected individuals. Subjects with previous SARS-CoV-2 infection exhibited CD4+ T cell responses against S1-, N-proteins and CoV-Mix (containing N, M and S protein-derived peptides) that were dominant over CD8+ T cells. At the same time, signals against the M protein were less pronounced. Double positive IL2+/CD154+ and IFN+/TNF+ CD4+ T cells showed the strongest association with antibody titers. T cell reactivity to CoV-Mix-, S1-, and N-antigens was most strongly associated with the humoral immune response, specifically with a compound antibody titer consisting of RBD, S1, S2, and NP. The T cell phenotype of SARS-CoV-2 infected individuals was stable for four months, thereby exceeding antibody decay rates. Our findings demonstrate that mild COVID-19 infections can elicit robust SARS-CoV-2-reactive T cell immunity against specific components of SARS-CoV-2.

Authors

Ricards Marcinkevics, Pamuditha Silva, Anna-Katharina Hankele, Katharina Csik, Svenja Godbersen, Algera Goga, Lynn Hasenöhrl, Pascale Hirschi, Hasan Kabakci, Mary P LaPierre, Johanna Mayrhofer, Alexandra Title, Xuan Shu, Nouell Baiioud, Sandra Bernal, Laura Dassisti, Mara D Saenz-de-Juano, Meret Schmidhauser, Giulia Silvestrelli, Simon Z Ulbrich, Thea J Ulbrich, Tamara Wyss, Daniel J Stekhoven, Faisal S Al-Quaddoomi, Shuqing Yu, Mascha Binder, Christoph Schultheiss, Claudia Zindel, Christoph Kolling, Jörg Goldhahn, Bahram Kasmapour, Polina Zjablovskaja, Frank Hardung, Anne Richter, Stefan Miltenyi, Luca Piccoli, Sandra Ciesek, Julia E Vogt, Federica Sallusto, Markus Stoffel, Susanne E Ulbrich

Submitted

The 1st Workshop on Healthcare AI and COVID-19 at ICML 2022

Date

22.07.2022

Abstract

Due to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making, recent research has focused on mitigating biases against already disadvantaged or marginalised groups in classification models. From the perspective of classification parity, the two most common metrics for assessing fairness are statistical parity and equality of opportunity. Current approaches to debiasing in classification either require knowledge of the protected attribute before or during training or are entirely agnostic to the model class and parameters. This work considers differentiable proxy functions for statistical parity and equality of opportunity and introduces two novel debiasing techniques for neural network classifiers based on fine-tuning and pruning an already-trained network. As opposed to prior work leveraging adversarial training, the proposed methods are simple yet effective and can be readily applied post hoc. Our experimental results encouragingly suggest that these approaches successfully debias fully connected neural networks trained on tabular data and often outperform model-agnostic post-processing methods.
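The notion of a differentiable proxy for statistical parity can be illustrated by replacing the hard rate of positive decisions with the mean predicted probability per protected group. This conveys the general idea only; the paper's exact proxy functions may differ.

```python
import numpy as np

def statistical_parity_proxy(probs, protected):
    """Absolute gap in mean predicted probability between the two groups.
    Unlike a gap in thresholded decision rates, this quantity is smooth in
    the model outputs and could serve as a penalty during fine-tuning."""
    probs, protected = np.asarray(probs), np.asarray(protected)
    return abs(probs[protected == 1].mean() - probs[protected == 0].mean())

# group 1 receives much higher scores than group 0 -> large parity gap
gap = statistical_parity_proxy([0.9, 0.8, 0.2, 0.3], [1, 1, 0, 0])
```

In an actual debiasing run, such a proxy would be computed inside an autodiff framework and added to the training loss or used to guide pruning.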

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

Contributed talk at ICLR 2022 Workshop on Socially Responsible Machine Learning

Date

29.04.2022

Link | Code

Abstract

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.

Authors

Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt

Submitted

The Tenth International Conference on Learning Representations, ICLR 2022

Date

27.04.2022

Link

Abstract

In this work, we study the problem of clustering survival data — a challenging and so far under-explored task. We introduce a novel semi-supervised probabilistic approach to cluster survival data by leveraging recent advances in stochastic gradient variational inference. In contrast to previous work, our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and censored survival times. We compare our model to the related work on clustering and mixture models for survival data in comprehensive experiments on a wide range of synthetic, semi-synthetic, and real-world datasets, including medical imaging data. Our method performs better at identifying clusters and is competitive at predicting survival times. Relying on novel generative assumptions, the proposed model offers a holistic perspective on clustering survival data and holds the promise of discovering subpopulations whose survival is regulated by different generative mechanisms.

Authors

Laura Manduchi, Ricards Marcinkevics, Michela C. Massi, Thomas Weikert, Alexander Sauter, Verena Gotta, Timothy Müller, Flavio Vasella, Marian C. Neidert, Marc Pfister, Bram Stieltjes, Julia E. Vogt

Submitted

The Tenth International Conference on Learning Representations, ICLR 2022

Date

25.04.2022

Link | Code

Abstract

Partitioning a set of elements into a given number of groups of a priori unknown sizes is an important task in many applications. Due to hard constraints, it is a non-differentiable problem, which prohibits its direct use in modern machine learning frameworks. Hence, previous works mostly fall back on suboptimal heuristics or simplified assumptions. The multivariate hypergeometric distribution offers a probabilistic formulation of how to distribute a given number of samples across multiple groups. Unfortunately, as a discrete probability distribution, it is not differentiable either. In this work, we propose a continuous relaxation for the multivariate non-central hypergeometric distribution. We introduce an efficient and numerically stable sampling procedure. This enables reparameterized gradients for the hypergeometric distribution and its integration into automatic differentiation frameworks. We highlight the applicability and usability of the proposed formulation on two different common machine learning tasks.
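For context, sampling from the (central) multivariate hypergeometric distribution is available in NumPy. The sketch below shows the discreteness that blocks reparameterized gradients, which is what the proposed relaxation addresses; group sizes and draw count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
group_sizes = [10, 5, 20]   # elements available per group
n_draw = 12                 # total number of samples to distribute across groups

# One draw: how many of the 12 samples land in each group, without replacement.
sample = rng.multivariate_hypergeometric(group_sizes, n_draw)

# `sample` is integer-valued: an infinitesimal change in the parameters cannot
# change the sample infinitesimally, so gradients cannot flow through this step
# without a continuous relaxation.
```

The paper's relaxation replaces such hard integer draws with a differentiable surrogate so the sampling step can sit inside an end-to-end trained model.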

Authors

Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt

Submitted

arXiv

Date

03.03.2022

Link | Code

Abstract

Using artificial intelligence to improve patient care is a cutting-edge methodology, but its implementation in clinical routine has been limited due to significant concerns about understanding its behavior. One major barrier is the explainability dilemma and how much explanation is required to use artificial intelligence safely in healthcare. A key issue is the lack of consensus on the definition of explainability among experts, regulators, and healthcare professionals, resulting in a wide variety of terminology and expectations. This paper aims to fill the gap by defining minimal explainability standards that serve the views and needs of essential stakeholders in healthcare. In that sense, we propose to define minimal explainability criteria that can support doctors’ understanding, meet patients’ needs, and fulfill legal requirements. Therefore, explainability need not be exhaustive but sufficient for doctors and patients to comprehend the artificial intelligence models’ clinical implications and for the models to be integrated safely into clinical practice. Thus, minimally acceptable standards for explainability are context-dependent and should respond to the specific needs and potential risks of each clinical scenario for a responsible and ethical implementation of artificial intelligence.

Authors

Laura Arbelaez Ossa, Georg Starke, Giorgia Lorenzini, Julia E Vogt, David M Shaw, Bernice Simone Elger

Submitted

DIGITAL HEALTH

Date

11.02.2022

Link | DOI

Abstract

Digitalization has already changed medicine and will continue to strongly shape medical practice in the future. It is therefore important that future physicians engage with the methods and applications of machine learning already during their studies. To this end, the working group «Digitalisierung der Medizin» (Digitalization of Medicine) has developed a set of learning objectives.

Authors

Raphaël Bonvin, Joachim Buhmann, Carlos Cotrini Jimenez, Marcel Egger, Alexander Geissler, Michael Krauthammer, Christian Schirlo, Christiane Spiess, Johann Steurer, Kerstin Noëlle Vokinger, Julia Vogt

Date

26.01.2022

Link

Abstract

Appendicitis is a common childhood disease, the management of which still lacks consolidated international criteria. In clinical practice, heuristic scoring systems are often used to assess the urgency of patients with suspected appendicitis. Previous work on machine learning for appendicitis has focused on conventional classification models, such as logistic regression and tree-based ensembles. In this study, we investigate the use of risk supersparse linear integer models (risk SLIM) for learning data-driven risk scores to predict the diagnosis, management, and complications in pediatric patients with suspected appendicitis on a dataset consisting of 430 children from a tertiary care hospital. We demonstrate the efficacy of our approach and compare the performance of learnt risk scores to previous analyses with random forests. Risk SLIM is able to detect medically meaningful features and outperforms the traditional appendicitis scores, while being better suited for the clinical setting than tree-based ensembles.
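To illustrate how scores of the risk SLIM family are applied at the bedside: each finding contributes integer points, and the total maps to a risk estimate through a logistic link. The features, point values, and offset below are invented for illustration, not the learnt score from the study.

```python
import math

POINTS = {"rebound_tenderness": 2, "elevated_wbc": 2, "nausea": 1}  # invented point values
INTERCEPT = -3                                                      # invented offset

def appendicitis_risk(findings):
    """Sum integer points for the present findings and convert the
    total score to a probability via the logistic function."""
    score = INTERCEPT + sum(POINTS[f] for f in findings)
    return 1 / (1 + math.exp(-score))

low = appendicitis_risk([])                                              # no findings
high = appendicitis_risk(["rebound_tenderness", "elevated_wbc", "nausea"])  # all findings
```

The appeal in a clinical setting is that such a score can be evaluated by mental arithmetic, unlike a tree-based ensemble.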

Authors

Pedro Roig Aparicio, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E. Vogt

Submitted

Short paper at 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

Date

16.12.2021

Link | DOI

Abstract

Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments to demonstrate that DC-GMM shows superior clustering performances and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.

Authors

Laura Manduchi, Kieran Chin-Cheong, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Accepted at NeurIPS 2021

Date

14.12.2021

Abstract

In this work, we propose a new formulation for multimodal VAEs to model and learn the relationship between data types. Despite their recent progress, current multimodal generative methods are based on simplistic assumptions regarding the relation between data types, which leads to a trade-off between coherence and quality of generated samples - even for simple toy datasets. The proposed method learns the relationship between data types instead of relying on pre-defined and limiting assumptions. Based on the principles of variational inference, we change the posterior approximation to explicitly include information about the relation between data types. We show empirically that the simplified assumption of a single shared latent space leads to inferior performance for a dataset with additional pairwise shared information.

Authors

Thomas M. Sutter, Julia E. Vogt

Submitted

Bayesian Deep Learning Workshop at NeurIPS 2021

Date

14.12.2021

Link

Abstract

Background: Current strategies for risk stratification and prediction of neonatal early-onset sepsis (EOS) are inefficient and lack diagnostic performance. The aim of this study was to use machine learning to analyze the diagnostic accuracy of risk factors (RFs), clinical signs and biomarkers and to develop a prediction model for culture-proven EOS. We hypothesized that the contribution to diagnostic accuracy of biomarkers is higher than of RFs or clinical signs. Study Design: Secondary analysis of the prospective international multicenter NeoPInS study. Neonates born after completed 34 weeks of gestation with antibiotic therapy due to suspected EOS within the first 72 hours of life participated. Primary outcome was defined as predictive performance for culture-proven EOS with variables known at the start of antibiotic therapy. Machine learning was used in the form of a random forest classifier. Results: One thousand six hundred eighty-five neonates treated for suspected infection were analyzed. Biomarkers were superior to clinical signs and RFs for prediction of culture-proven EOS. C-reactive protein and white blood cells were most important for the prediction of the culture result. Our full model achieved an area-under-the-receiver-operating-characteristic-curve of 83.41% (±8.8%) and an area-under-the-precision-recall-curve of 28.42% (±11.5%). The predictive performance of the model with RFs alone was comparable with random. Conclusions: Biomarkers have to be considered in algorithms for the management of neonates suspected of EOS. A 2-step approach with a screening tool for all neonates in combination with our model in the preselected population with an increased risk for EOS may have the potential to reduce the start of unnecessary antibiotics.
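The modeling step, a random forest classifier evaluated on held-out data, can be sketched with scikit-learn. Features and labels below are synthetic stand-ins, not the NeoPInS biomarkers and risk factors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 6))  # stand-in tabular predictors (e.g. biomarkers, RFs)
y = (X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=600) > 1.5).astype(int)  # rare outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # AUROC on held-out data
```

With an imbalanced outcome such as culture-proven EOS, the area under the precision-recall curve reported above is the more informative companion metric.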

Authors

Martin Stocker, Imant Daunhawer, Wendy van Herk, Salhab el Helou, Sourabh Dutta, Frank A. B. A. Schuerman, Rita K. van den Tooren-de Groot, Jantien W. Wieringa, Jan Janota, Laura H. van der Meer-Kappelle, Rob Moonen, Sintha D. Sie, Esther de Vries, Albertine E. Donker, Urs Zimmerman, Luregn J. Schlapbach, Amerik C. de Mol, Angelique Hoffmann-Haringsma, Madan Roy, Maren Tomaske, René F. Kornelisse, Juliette van Gijsel, Frans B. Plötz, Sven Wellmann, Niek B. Achten, Dirk Lehnick, Annemarie M. C. van Rossum, Julia E. Vogt

Submitted

The Pediatric Infectious Disease Journal, 2022

Date

09.09.2021

Link | DOI

Abstract

Autonomic peripheral activity is partly governed by brain autonomic centers. However, there are still many uncertainties regarding the precise link between peripheral and central autonomic biosignals. Clarifying these links could have a profound impact on the interpretability, and thus usefulness, of peripheral autonomic biosignals captured with wearable devices. In this study, we take advantage of a unique dataset consisting of intracranial stereo-electroencephalography (SEEG) and peripheral biosignals acquired simultaneously for several days from four subjects undergoing epilepsy monitoring. Compared to previous work, we apply a deep neural network to explore high-dimensional nonlinear correlations between the cerebral brainwaves and variations in heart rate and electrodermal activity (EDA). Further, neural network explainability methods were applied to identify the most relevant brainwave frequencies, brain regions and temporal information for predicting a specific biosignal. The strongest brain-peripheral correlations were observed from contacts located in the central autonomic network, in particular in the alpha, theta and 52 to 58 Hz frequency bands. Furthermore, a temporal delay of 12 to 14 s between the SEEG and EDA signals was observed. Finally, we believe that this pilot study demonstrates a promising approach to mapping brain-peripheral relationships in a data-driven manner by leveraging the expressiveness of deep neural networks.

Authors

Alexander H. Hatteland, Ricards Marcinkevics, Renaud Marquis, Thomas Frick, Ilona Hubbard, Julia E. Vogt, Thomas Brunschwiler, Philippe Ryvlin

Submitted

Best paper award at IEEE International Conference on Digital Health, ICDH 2021

Date

06.09.2021

LinkDOI

Abstract

Machine learning has become increasingly popular in the medical domain over the past years. While supervised machine learning has already been applied successfully, the vast amount of unlabelled data offers new opportunities for un- and self-supervised learning methods. Especially with regard to the multimodal nature of most clinical data, the labelling of multiple data types quickly becomes infeasible in the medical domain. However, to the best of our knowledge, multimodal unsupervised methods have been tested extensively only on toy datasets and have never been applied to real-world medical data, for direct applications such as disease classification and image generation. In this article, we demonstrate that self-supervised methods provide promising results on medical data while highlighting that the task is extremely challenging and that there is room for substantial improvement.

Authors

Hendrik J. Klug, Thomas M. Sutter, Julia E. Vogt

Submitted

Medical Imaging with Deep Learning, MIDL 2021

Date

07.07.2021

Link

Abstract

Background Preterm neonates frequently experience hypernatremia (plasma sodium concentrations >145 mmol/l), which is associated with clinical complications, such as intraventricular hemorrhage. Study design In this single-center retrospective observational study, the following 7 risk factors for hypernatremia were analyzed in very low gestational age (VLGA, below 32 weeks) neonates: gestational age (GA), delivery mode (DM; vaginal or caesarean section), sex, birth weight, small for GA, multiple birth, and antenatal corticosteroids. Machine learning (ML) approaches were applied to obtain probabilities for hypernatremia. Results 824 VLGA neonates were included (median GA 29.4 weeks, median birth weight 1170 g, caesarean section 83%). 38% of neonates experienced hypernatremia. A maximal sodium concentration of 144 mmol/l (interquartile range 142–147) was observed 52 hours (41–65) after birth. ML identified vaginal delivery and GA as key risk factors for hypernatremia. The risk of hypernatremia increased with lower GA, from 22% for GA >= 31–32 weeks to 46% for GA < 31 weeks and 60% for GA < 27 weeks. A linear relationship between maximal sodium concentrations and GA was found, with a decrease of 0.29 mmol/l per additional week of GA in neonates with vaginal delivery and 0.49 mmol/l per week after caesarean section. Sex, multiple birth, and antenatal corticosteroids were not associated with hypernatremia. Conclusion VLGA neonates with vaginal delivery and low GA have the highest risk of hypernatremia. Early identification of neonates at risk and early intervention may prevent extreme sodium excursions and associated clinical complications.

Authors

Nadia S. Eugster, Florence Corminboeuf, Gilbert Koch, Julia E. Vogt, Thomas Sutter, Tamara van Donge, Marc Pfister, Roland Gerull

Submitted

Klinische Pädiatrie

Date

07.06.2021

LinkDOI

Abstract

Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.

Authors

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

Ninth International Conference on Learning Representations, ICLR 2021

Date

04.05.2021

Link

Abstract

Background: Given the absence of consolidated and standardized international guidelines for managing pediatric appendicitis and the few strictly data-driven studies in this specific field, we investigated the use of machine learning (ML) classifiers for predicting the diagnosis, management, and severity of appendicitis in children. Materials and Methods: Predictive models were developed and validated on a dataset acquired from 430 children and adolescents aged 0-18 years, based on a range of information encompassing history, clinical examination, laboratory parameters, and abdominal ultrasonography. Logistic regression, random forests, and gradient boosting machines were used for predicting the three target variables. Results: A random forest classifier achieved areas under the precision-recall curve of 0.94, 0.92, and 0.70, respectively, for the diagnosis, management, and severity of appendicitis. We identified smaller subsets of 6, 17, and 18 predictors for each of the targets that sufficed to achieve the same performance as the model based on the full set of 38 variables. We used these findings to develop the user-friendly online Appendicitis Prediction Tool for children with suspected appendicitis. Discussion: This pilot study considered the most extensive set of predictor and target variables to date and is the first to simultaneously predict all three targets in children: diagnosis, management, and severity. Moreover, this study presents the first ML model for appendicitis that was deployed as an open-access, easy-to-use online tool. Conclusion: ML algorithms help to overcome the diagnostic and management challenges posed by appendicitis in children and pave the way toward a more personalized approach to medical decision-making. Further validation studies are needed to develop a finished clinical decision support system.
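The area under the precision-recall curve reported above can be computed as average precision. The following minimal NumPy sketch illustrates the metric itself (it is not the study's pipeline or data; the function name and toy labels are ours):

```python
import numpy as np

def average_precision(y_true, scores):
    """Area under the precision-recall curve, computed as average
    precision: precision evaluated at each positive, in score order.
    (Assumes no tied scores, for simplicity.)"""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                           # true positives at each cut-off
    precision = tp / np.arange(1, len(y) + 1)   # precision at each cut-off
    return float((precision * y).sum() / y.sum())

# toy example: three positives, two negatives, ranked by classifier score
print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.2]))  # ≈ 0.806
```

A perfect ranking yields 1.0; a random ranking tends toward the positive-class prevalence, which is why AUPRC is informative for imbalanced diagnosis tasks.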

Authors

Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E Vogt

Submitted

Frontiers in Pediatrics

Date

29.04.2021

LinkDOICode

Abstract

Survival analysis has gained significant attention in the medical domain with many far-reaching applications. Although a variety of machine learning methods have been introduced for tackling time-to-event prediction in unstructured data with complex dependencies, clustering of survival data remains an under-explored problem. The latter is particularly helpful in discovering patient subpopulations whose survival is regulated by different generative mechanisms, a critical problem in precision medicine. To this end, we introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting. Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times. We compare our model to the related work on survival clustering in comprehensive experiments on a range of synthetic, semi-synthetic, and real-world datasets. Our proposed method performs better at identifying clusters and is competitive at predicting survival times in terms of the concordance index and relative absolute error.

Authors

Laura Manduchi, Ricards Marcinkevics, Julia E. Vogt

Submitted

Contributed talk at AI for Public Health Workshop at ICLR 2021

Date

09.04.2021

Link

Abstract

Generating interpretable visualizations of multivariate time series in the intensive care unit is of great practical importance. Clinicians seek to condense complex clinical observations into intuitively understandable critical illness patterns, like failures of different organ systems. They would greatly benefit from a low-dimensional representation in which the trajectories of the patients’ pathology become apparent and relevant health features are highlighted. To this end, we propose to use the latent topological structure of Self-Organizing Maps (SOMs) to achieve an interpretable latent representation of ICU time series and combine it with recent advances in deep clustering. Specifically, we (a) present a novel way to fit SOMs with probabilistic cluster assignments (PSOM), (b) propose a new deep architecture for probabilistic clustering (DPSOM) using a VAE, and (c) extend our architecture to cluster and forecast clinical states in time series (T-DPSOM). We show that our model achieves superior clustering performance compared to state-of-the-art SOM-based clustering methods while maintaining the favorable visualization properties of SOMs. On the eICU dataset, we demonstrate that T-DPSOM provides interpretable visualizations of patient state trajectories and uncertainty estimation. We show that our method rediscovers well-known clinical patient characteristics, such as a dynamic variant of the Acute Physiology And Chronic Health Evaluation (APACHE) score. Moreover, we illustrate how it can disentangle individual organ dysfunctions on disjoint regions of the two-dimensional SOM map.

Authors

Laura Manduchi, Matthias Hüser, Martin Faltys, Julia Vogt, Gunnar Rätsch, Vincent Fortuin

Submitted

ACM CHIL 2021

Date

04.03.2021

Link

Abstract

In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. For achieving this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI’s car racing environment. Hence, such a procedure permits decoupling state representations from RL controllers. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than directly learning from pixel inputs; however, it has a more stable learning curve, a substantial reduction of the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.

Authors

Juan M. Montoya, Imant Daunhawer, Julia E. Vogt, Marco Wiering

Submitted

ICAART 2021

Date

04.02.2021

Link

Abstract

Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.
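As a rough illustration of the underlying idea only (a plain lag-regression comparison, not the self-explaining neural network framework proposed in the paper), Granger causality can be scored by how much the lagged history of one series reduces the prediction error of another; the function name and simulated data below are ours:

```python
import numpy as np

def rss(X, Y):
    """Residual sum of squares of the least-squares fit Y ~ X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ beta
    return float(r @ r)

def granger_gain(x, y, lags=2):
    """Relative drop in residual variance when lagged x is added to an
    autoregression of y: a crude proxy for 'x Granger-causes y'."""
    T = len(y)
    Y = y[lags:]
    ylags = np.column_stack([y[lags - k:T - k] for k in range(1, lags + 1)])
    xlags = np.column_stack([x[lags - k:T - k] for k in range(1, lags + 1)])
    ones = np.ones((T - lags, 1))
    restricted = np.hstack([ones, ylags])          # y's own history only
    full = np.hstack([ones, ylags, xlags])         # plus x's history
    return (rss(restricted, Y) - rss(full, Y)) / rss(full, Y)

# simulate x driving y with a one-step lag
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

print(granger_gain(x, y))  # large: x's past helps predict y
print(granger_gain(y, x))  # near zero: y's past does not help predict x
```

Neural approaches such as the one in the paper generalize this idea to nonlinear dynamics while keeping the inferred interactions, their signs, and their variability over time inspectable.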

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

Ninth International Conference on Learning Representations, ICLR 2021

Date

15.01.2021

LinkCode

Abstract

Rationale Tuberculosis diagnosis in children remains challenging. Microbiological confirmation of tuberculosis disease is often lacking, and standard immunodiagnostic tests for tuberculosis infection, including the tuberculin skin test and the interferon-gamma release assay, have limited sensitivity. Recent research suggests that the inclusion of novel Mycobacterium tuberculosis antigens has the potential to improve standard immunodiagnostic tests for tuberculosis. Objective To identify optimal antigen–cytokine combinations using novel Mycobacterium tuberculosis antigens and cytokine read-outs by machine learning algorithms to improve immunodiagnostic assays for tuberculosis. Methods A total of 80 children undergoing investigation of tuberculosis were included (15 confirmed tuberculosis disease, five unconfirmed tuberculosis disease, 28 tuberculosis infection and 32 unlikely tuberculosis). Whole blood was stimulated with 10 novel Mycobacterium tuberculosis antigens and a fusion protein of early secretory antigenic target (ESAT)-6 and culture filtrate protein (CFP) 10. Cytokines were measured using xMAP multiplex assays. Machine learning algorithms defined a discriminative classifier, with performance measured using the area under the receiver operating characteristic curve. Measurements and main results We found that the following four antigen–cytokine pairs had a higher weight in the discriminative classifier compared to the standard ESAT-6/CFP-10-induced interferon-gamma: Rv2346/47c- and Rv3614/15c-induced interferon-gamma inducible protein-10; Rv2031c-induced granulocyte-macrophage colony-stimulating factor and ESAT-6/CFP-10-induced tumor necrosis factor-alpha. A combination of the 10 best antigen–cytokine pairs resulted in an area under the curve of 0.92 +/- 0.04. Conclusion We exploited the use of machine learning algorithms as a key tool to evaluate large immunological datasets. This identified several antigen–cytokine pairs with the potential to improve immunodiagnostic tests for tuberculosis in children.

Authors

Noemi Rebecca Meier, Thomas M. Sutter, Marc Jacobsen, Tom H. M. Ottenhoff, Julia E. Vogt, Nicole Ritz

Submitted

Frontiers in Cellular and Infection Microbiology

Date

08.01.2021

LinkDOI

Abstract

Unplanned hospital readmissions are a burden to patients and increase healthcare costs. A wide variety of machine learning (ML) models have been suggested to predict unplanned hospital readmissions. These ML models were often specifically trained on patient populations with certain diseases. However, it is unclear whether these specialized ML models—trained on patient subpopulations with certain diseases or defined by other clinical characteristics—are more accurate than a general ML model trained on an unrestricted hospital cohort. In this study based on an electronic health record cohort of consecutive inpatient cases of a single tertiary care center, we demonstrate that accurate prediction of hospital readmissions may be obtained by general, disease-independent, ML models. This general approach may substantially decrease the cost of development and deployment of respective ML models in daily clinical routine, as all predictions are obtained by the use of a single model.

Authors

Thomas Sutter, Jan A Roth, Kieran Chin-Cheong, Balthasar L Hug, Julia E Vogt

Submitted

Journal of the American Medical Informatics Association

Date

18.12.2020

LinkDOI

Abstract

In this review, we examine the problem of designing interpretable and explainable machine learning models. Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences. Although interpretability and explainability have escaped a clear universal definition, many techniques motivated by these properties have been developed over the past 30 years, with the focus currently shifting towards deep learning methods. In this review, we emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art. The review is intended for a general machine learning audience with interest in exploring the problems of interpretation and explanation beyond logistic regression or random forest variable importance. This work is not an exhaustive literature survey, but rather a primer focusing selectively on certain lines of research which the authors found interesting or informative.

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

Arxiv

Date

04.12.2020

Link

Abstract

Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

Interpretable Inductive Biases and Physically Structured Learning Workshop, NeurIPS 2020

Date

01.11.2020

Link

Abstract

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.
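For reference, the Jensen-Shannon divergence for multiple distributions mentioned above generalizes the familiar two-distribution case to a weighted sum of KL terms against a mixture (this is the standard definition; in mmJSD the fixed mixture is replaced by the dynamic prior):

```latex
% Generalized Jensen-Shannon divergence for M distributions with weights \pi
\mathrm{JS}_{\pi}(p_1, \dots, p_M)
  \;=\; \sum_{m=1}^{M} \pi_m \,
  \mathrm{KL}\!\left( p_m \,\middle\|\, \sum_{m'=1}^{M} \pi_{m'}\, p_{m'} \right)
```

For M = 2 and uniform weights this reduces to the usual symmetric JS divergence between two distributions.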

Authors

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

NeurIPS 2020

Date

22.10.2020

Link

Abstract

PET/CT imaging is the gold standard for the diagnosis and staging of lung cancer. However, especially in healthcare systems with limited resources, costly PET/CT images are often not readily available. Conventional machine learning models either process CT or PET/CT images but not both. Models designed for PET/CT images are hence restricted by the number of PET images, such that they are unable to additionally leverage CT-only data. In this work, we apply the concept of visual soft attention to efficiently learn a model for lung cancer segmentation from only a small fraction of PET/CT scans and a larger pool of CT-only scans. We show that our model is capable of jointly processing PET/CT as well as CT-only images and that it performs on par with the respective baselines whether or not PET images are available at test time. We then demonstrate that the model learns efficiently from only a few PET/CT scans in a setting where mostly CT-only data is available, unlike conventional models.

Authors

Varaha Karthik Pattisapu, Imant Daunhawer, Thomas Weikert, Alexander Sauter, Bram Stieltjes, Julia E. Vogt

Submitted

GCPR 2020

Date

12.10.2020

Link

Abstract

Multimodal generative models learn a joint distribution over multiple modalities and thus have the potential to learn richer representations than unimodal models. However, current approaches are either inefficient in dealing with more than two modalities or fail to capture both modality-specific and shared variations. We introduce a new multimodal generative model that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently. Our method partitions the latent space into disjoint subspaces for modality-specific and shared factors and learns to disentangle these in a purely self-supervised manner. In extensive experiments, we show improvements in representation learning and generative performance compared to previous methods and showcase the disentanglement capabilities.

Authors

Imant Daunhawer, Thomas M. Sutter, Ricards Marcinkevics, Julia E. Vogt

Submitted

GCPR 2020

Date

10.09.2020

Link

Abstract

Background The mortality risk remains significant in paediatric and adult patients on chronic haemodialysis (HD) treatment. We aimed to identify factors associated with mortality in patients who started HD as children and continued HD as adults. Methods The data originated from a cohort of patients < 30 years of age who started HD in childhood (<= 19 years) on thrice-weekly HD in outpatient DaVita dialysis centres between 2004 and 2016. Patients with at least 5 years of follow-up since the initiation of HD or death within 5 years were included; 105 variables relating to demographics, HD treatment and laboratory measurements were evaluated as predictors of 5-year mortality utilizing a machine learning approach (random forest). Results A total of 363 patients were included in the analysis, with 84 patients having started HD at < 12 years of age. Low albumin and elevated lactate dehydrogenase (LDH) were the two most important predictors of 5-year mortality. Other predictors included elevated red blood cell distribution width or blood pressure and decreased red blood cell count, haemoglobin, albumin:globulin ratio, ultrafiltration rate, z-score weight for age or single-pool K_t/V (below target). Mortality was predicted with an accuracy of 81%. Conclusions Mortality in paediatric and young adult patients on chronic HD is associated with multifactorial markers of nutrition, inflammation, anaemia and dialysis dose. This highlights the importance of multimodal intervention strategies besides adequate HD treatment as determined by K_t/V alone. The association with elevated LDH was not previously reported and may indicate the relevance of blood–membrane interactions, organ malperfusion or haematologic and metabolic changes during maintenance HD in this population.

Authors

Verena Gotta, Georgi Tancev, Olivera Marsenic, Julia E. Vogt, Marc Pfister

Submitted

Nephrology Dialysis Transplantation

Date

08.06.2020

LinkDOI

Abstract

Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for, e.g., classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous (mixed) data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. For these reasons, using EHR data in practice presents a real challenge. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets for downstream classification tasks. We will further explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets, which provide rigorous privacy guarantees, and are therefore shareable and usable in the real world. The performance (measured by AUROC, AUPRC and accuracy) of our model's synthetic, heterogeneous data is very close to the original data set (within 3–5% of the baseline) for the non-DP model when tested in a binary classification task. Using strong (ε = 1, δ = 10^-5)-DP, our model still produces data useful for machine learning tasks, albeit incurring a roughly 17% performance penalty in our tested classification task. We additionally perform a sub-population analysis and find that our model does not introduce any bias into the synthetic EHR data compared to the baseline in either male/female populations, or the 0-18, 19-50 and 51+ age groups in terms of classification performance for either the non-DP or DP variant.
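Differential-privacy-preserving optimization of the kind referred to above typically follows the DP-SGD recipe: clip each per-example gradient to a fixed norm, average, and add Gaussian noise calibrated to that clipping bound. A minimal NumPy sketch of one such step (illustrative only; the function name and parameters are ours, not the paper's):

```python
import numpy as np

def dp_sgd_step(grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One differentially private gradient step (Gaussian mechanism):
    clip each per-example gradient to clip_norm, average, then add
    noise whose scale is tied to the clipping bound."""
    if rng is None:
        rng = np.random.default_rng(0)
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # scale down any gradient whose norm exceeds clip_norm
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    mean = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(grads), size=mean.shape)
    return mean + noise

# two per-example gradients of a 3-parameter model
g = np.array([[10.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(dp_sgd_step(g))
```

Clipping bounds each individual's influence on the update, and the added noise is what yields the formal (ε, δ) guarantee after accounting over all training steps.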

Authors

Kieran Chin-Cheong, Thomas M. Sutter, Julia E. Vogt

Submitted

Arxiv

Date

07.06.2020

Link

Abstract

Clinical pharmacology is a multi-disciplinary data sciences field that utilizes mathematical and statistical methods to generate maximal knowledge from data. Pharmacometrics (PMX) is a well-recognized tool to characterize disease progression, pharmacokinetics and risk factors. Since the amount of data produced keeps growing with increasing pace, the computational effort necessary for PMX models is also increasing. Additionally, computationally efficient methods such as machine learning (ML) are becoming increasingly important in medicine. However, ML is currently not an integrated part of PMX, for various reasons. The goals of this article are to (i) provide an introduction to ML classification methods, (ii) provide examples for a ML classification analysis to identify covariates based on specific research questions, (iii) examine a clinically relevant example to investigate possible relationships of ML and PMX, and (iv) present a summary of ML and PMX tasks to develop clinical decision support tools.

Authors

Gilbert Koch, Marc Pfister, Imant Daunhawer, Melanie Wilbaux, Sven Wellmann, Julia E. Vogt

Submitted

Clinical Pharmacology & Therapeutics, 2020

Date

11.01.2020

LinkDOI

Abstract

Despite the application of advanced statistical and pharmacometric approaches to pediatric trial data, a large pediatric evidence gap still remains. Here, we discuss how to collect more data from children by using real-world data from electronic health records, mobile applications, wearables, and social media. The large datasets collected with these approaches enable, and may demand, the use of artificial intelligence and machine learning to allow the data to be analyzed for decision-making. Applications of this approach are presented, which include the prediction of future clinical complications, medical image analysis, identification of new pediatric endpoints and biomarkers, the prediction of treatment non-responders and the prediction of placebo-responders for trial enrichment. Finally, we discuss how to bring machine learning from science to pediatric clinical practice. We conclude that advantage should be taken of the current opportunities offered by innovations in data science and machine learning to close the pediatric evidence gap.

Authors

Sebastiaan C. Goulooze, Laura B. Zwep, Julia E. Vogt, Elke H.J. Krekels, Thomas Hankemeier, John N. van den Anker, Catherijne A.J. Knibbe

Submitted

Clinical Pharmacology & Therapeutics

Date

19.12.2019

LinkDOI

Abstract

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. Existing generative models that try to approximate a multimodal ELBO rely on difficult training schemes to handle the intermodality dependencies, as well as the approximation of the joint representation in case of missing data. In this work, we propose an ELBO for multimodal data which learns the unimodal and joint multimodal posterior approximation functions directly via a dynamic prior. We show that this ELBO is directly derived from a variational inference setting for multiple data types, resulting in a divergence term which is the Jensen-Shannon divergence for multiple distributions. We compare the proposed multimodal JS-divergence (mmJSD) model to state-of-the-art methods and show promising results using our model in unsupervised, generative learning using a multimodal VAE on two different datasets.

Authors

Thomas Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

Visually Grounded Interaction and Language Workshop, NeurIPS 2019

Date

12.12.2019

Abstract

Multimodal generative models learn a joint distribution of data from different modalities, a task which arguably benefits from the disentanglement of modality-specific and modality-invariant information. We propose a factorized latent variable model that learns such a disentanglement on multimodal data without additional supervision. We demonstrate the disentanglement capabilities on simulated data, and show that disentangled representations can improve the conditional generation of missing modalities without sacrificing unconditional generation.

Authors

Imant Daunhawer, Thomas Sutter, Julia E. Vogt

Submitted

Bayesian Deep Learning Workshop, NeurIPS 2019

Date

12.12.2019

Abstract

Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for e.g. classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets. We will further explore applying differential privacy (DP) preserving optimization in order to produce differentially private synthetic EHR data sets, which provide rigorous privacy guarantees, and are therefore more easily shareable. The performance of our model's synthetic, heterogeneous data is very close to the original data set (within 4.5%) for the non-DP model. Although around 20% worse, the DP synthetic data is still usable for machine learning tasks.

Authors

Kieran Chin-Cheong, Thomas Sutter, Julia E. Vogt

Submitted

Machine Learning for Health (ML4H) Workshop, NeurIPS 2019

Date

12.12.2019

Abstract

We present a probabilistic model for clustering which enables the modeling of overlapping clusters where objects are only available as pairwise distances. Examples of such distance data are genomic string alignments or protein contact maps. In our clustering model, an object has the freedom to belong to one or more clusters at the same time. By using an Indian Buffet Process (IBP) prior, there is no need to explicitly fix the number of clusters, or the number of overlapping clusters, in advance. In this paper, we demonstrate the utility of our model using distance data obtained from HIV-1 protease inhibitor contact maps.
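The Indian Buffet Process prior mentioned above can be sampled with its classic sequential construction, which makes clear why neither the number of clusters nor the amount of overlap needs to be fixed in advance. A minimal sketch (the function name and parameters are ours, for illustration):

```python
import numpy as np

def sample_ibp(n_objects, alpha, seed=0):
    """Sample a binary object-by-cluster assignment matrix from the
    Indian Buffet Process: an object may join several clusters at once
    (overlap), and the number of clusters is unbounded a priori."""
    rng = np.random.default_rng(seed)
    Z = np.zeros((n_objects, 0), dtype=int)
    for i in range(n_objects):
        if Z.shape[1] > 0:
            popularity = Z.sum(axis=0)              # members per existing cluster
            # join each existing cluster with probability m_k / (i + 1)
            Z[i] = rng.random(Z.shape[1]) < popularity / (i + 1)
        n_new = rng.poisson(alpha / (i + 1))        # open brand-new clusters
        if n_new > 0:
            block = np.zeros((n_objects, n_new), dtype=int)
            block[i] = 1
            Z = np.hstack([Z, block])
    return Z

Z = sample_ibp(10, alpha=2.0)
print(Z.shape)  # (10, K); the number of clusters K is itself random
```

Rows with more than one 1 correspond to objects that belong to several clusters simultaneously, the overlap structure the model exploits.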

Authors

Sandhya Prabhakaran, Julia E. Vogt

Submitted

Artificial Intelligence in Medicine (AIME), Springer Lecture Notes in Artificial Intelligence, 2019

Date

29.05.2019

LinkDOI

Abstract

The recent adoption of Electronic Health Records (EHRs) by health care providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured texts designed for human readability. In this work, we develop the methodology to automatically extract clinical features from clinical narratives from large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and utilize the clusters to represent information about the patient compactly. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of about 65 thousand documents with a total of about 3.2 million sentences. We identify 341 significant statistical associations between the presence of somatic mutations and clinical features. We annotate these associations according to their novelty and report several known associations. We also propose 32 testable hypotheses where the underlying biological mechanism does not appear to be known but is plausible. These results illustrate that the automated discovery of clinical features is possible and the joint analysis of clinical and genetic datasets can generate appealing new hypotheses.

Authors

Stefan G. Stark, Stephanie L. Hyland, Melanie F. Pradier, Kjong Lehmann, Andreas Wicki, Fernando Perez Cruz, Julia E. Vogt, Gunnar Rätsch

Submitted

Arxiv preprint

Date

02.05.2019

Link

Abstract

Motivation: Personalized medicine aims at combining genetic, clinical, and environmental data to improve medical diagnosis and disease treatment, tailored to each patient. This paper presents a Bayesian nonparametric (BNP) approach to identify genetic associations with clinical/environmental features in cancer. We propose an unsupervised approach to generate data-driven hypotheses and bring potentially novel insights about cancer biology. Our model combines somatic mutation information at gene-level with features extracted from the Electronic Health Record. We propose a hierarchical approach, the hierarchical Poisson factor analysis (H-PFA) model, to share information across patients having different types of cancer. To discover statistically significant associations, we combine Bayesian modeling with bootstrapping techniques and correct for multiple hypothesis testing. Results: Using our approach, we empirically demonstrate that we can recover well-known associations in cancer literature. We compare the results of H-PFA with two other classical methods in the field: case-control (CC) setups, and linear mixed models (LMMs).
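H-PFA itself is a hierarchical Bayesian nonparametric model fit with posterior inference, which is beyond a short snippet. As a rough non-Bayesian analogue, a patients-by-genes count matrix can be factorized under a Poisson-style (KL-divergence) objective; the sketch below uses scikit-learn's NMF with multiplicative updates on synthetic counts, and omits the hierarchy over cancer types entirely.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic patients-by-genes mutation counts (30 patients, 50 genes).
counts = rng.poisson(lam=2.0, size=(30, 50))

# KL (beta) loss with the "mu" solver corresponds to a Poisson
# likelihood up to constants.
model = NMF(n_components=5, beta_loss="kullback-leibler", solver="mu",
            max_iter=500, random_state=0)
W = model.fit_transform(counts)  # per-patient factor loadings
H = model.components_            # per-factor gene weights

# Low-rank reconstruction of the expected counts.
reconstruction = W @ H
```

The Bayesian treatment in the paper additionally shares statistical strength across cancer types and supports the bootstrap-based significance testing described above.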

Authors

Melanie F. Pradier, Stephanie L. Hyland, Stefan G. Stark, Kjong Lehmann, Julia E. Vogt, Fernando Perez-Cruz, Gunnar Rätsch

Submitted

Biorxiv preprint

Date

29.04.2019

LinkDOI

Abstract

Background: Machine learning models may enhance the early detection of clinically relevant hyperbilirubinemia based on patient information available in every hospital. Methods: We conducted a longitudinal study on preterm and term-born neonates with serial measurements of total serum bilirubin in the first two weeks of life. An ensemble that combines a logistic regression with a random forest classifier was trained to discriminate between the two classes, phototherapy treatment vs. no treatment. Results: Of 362 neonates included in this study, 98 had a phototherapy treatment, which our model was able to predict up to 48 h in advance with an area under the ROC curve of 95.20%. From a set of 44 variables, including potential laboratory and clinical confounders, a subset of just four (bilirubin, weight, gestational age, hours since birth) suffices for a strong predictive performance. The resulting early phototherapy prediction tool (EPPT) is provided as an open web application. Conclusion: Early detection of clinically relevant hyperbilirubinemia can be enhanced by the application of machine learning. Existing guidelines can be further improved to optimize timing of bilirubin measurements to avoid toxic hyperbilirubinemia in high-risk patients while minimizing unneeded measurements in neonates who are at low risk.
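The ensemble described above, a logistic regression combined with a random forest, can be sketched with scikit-learn's soft-voting classifier. The data below is synthetic and the four feature columns merely mirror the variables named in the abstract; this is not the study's dataset or exact model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for: bilirubin, weight, gestational age, hours since birth.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Soft voting averages the predicted class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression())),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
).fit(X, y)

proba = ensemble.predict_proba(X[:5])  # class probabilities for 5 neonates
```

Averaging a linear and a tree-based model in this way often yields better-calibrated risk scores than either model alone, which matters when the output feeds a clinical decision tool.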

Authors

Imant Daunhawer, Severin Kasser, Gilbert Koch, Lea Sieber, Hatice Cakal, Janina Tütsch, Marc Pfister, Sven Wellman, Julia E. Vogt

Submitted

Pediatric Research, 2019

Date

30.03.2019

LinkDOI

Abstract

To exploit the full potential of big routine data in healthcare and to efficiently communicate and collaborate with information technology specialists and data analysts, healthcare epidemiologists should have some knowledge of large-scale analysis techniques, particularly about machine learning. This review focuses on the broad area of machine learning and its first applications in the emerging field of digital healthcare epidemiology.

Authors

Jan A. Roth, Manuel Battegay, Fabrice Juchler, Julia E. Vogt, Andreas F. Widmer

Submitted

Infection Control & Hospital Epidemiology, 2018

Date

04.11.2018

LinkDOI

Abstract

Molecular classification of hepatocellular carcinomas (HCC) could guide patient stratification for personalized therapies targeting subclass-specific cancer 'driver pathways'. Currently, there are several transcriptome-based molecular classifications of HCC with different subclass numbers, ranging from two to six. They were established using resected tumours that introduce a selection bias towards patients without liver cirrhosis and with early stage HCCs. We generated and analyzed gene expression data from paired HCC and non-cancerous liver tissue biopsies from 60 patients as well as five normal liver samples. Unbiased consensus clustering of HCC biopsy profiles identified 3 robust classes. Class membership correlated with survival, tumour size and with Edmondson and Barcelona Clinical Liver Cancer (BCLC) stage. When focusing only on the gene expression of the HCC biopsies, we could validate previously reported classifications of HCC based on expression patterns of signature genes. However, the subclass-specific gene expression patterns were no longer preserved when the fold-change relative to the normal tissue was used. The majority of genes believed to be subclass-specific turned out to be cancer-related genes differentially regulated in all HCC patients, with quantitative rather than qualitative differences between the molecular subclasses. With the exception of a subset of samples with a definitive β-catenin gene signature, biological pathway analysis could not identify class-specific pathways reflecting the activation of distinct oncogenic programs. In conclusion, we have found that gene expression profiling of HCC biopsies has limited potential to direct therapies that target specific driver pathways, but can identify subgroups of patients with different prognosis.

Authors

Zuzanna Makowska, Tujana Boldanova, David Adametz, Luca Quagliata, Julia E. Vogt, Michael T. Dill, Mathias S. Matter, Volker Roth, Luigi Terracciano, Markus H. Heim

Submitted

Journal of Pathology: Clinical Research, 2016

Date

05.01.2016

LinkDOI