Publications
2024
Self-supervised learning (SSL) has emerged as a powerful approach for learning biologically meaningful representations of single-cell data. To establish best practices in this domain, we present a comprehensive benchmark evaluating eight SSL methods across three downstream tasks and eight datasets, with various data augmentation strategies. Our results demonstrate that SimCLR and VICReg consistently outperform other methods across different tasks. Furthermore, we identify random masking as the most effective augmentation technique. This benchmark provides valuable insights into the application of SSL to single-cell data analysis, bridging the gap between SSL and single-cell biology.
Authors: Philip Toma*, Olga Ovcharenko*, Imant Daunhawer, Julia Vogt, Florian Barkmann†, Valentina Boeva† (* denotes shared first authorship, † denotes shared last authorship)
Submitted: Preprint
Date: 06.11.2024
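As a rough illustration of the random-masking augmentation highlighted above, the sketch below builds the kind of positive pair a SimCLR-style objective would use on gene-expression vectors. This is not the benchmarked implementation; the masking rate and tensor shapes are illustrative assumptions.

```python
import torch

def random_mask(x: torch.Tensor, mask_rate: float = 0.2) -> torch.Tensor:
    """Zero out a random subset of genes (features) in each cell's expression vector."""
    mask = (torch.rand_like(x) > mask_rate).float()
    return x * mask

def make_contrastive_pair(x: torch.Tensor, mask_rate: float = 0.2):
    """Two independently masked views of the same cells form a positive pair."""
    return random_mask(x, mask_rate), random_mask(x, mask_rate)

# Example: a batch of 32 cells with 2,000 highly variable genes (illustrative sizes).
cells = torch.rand(32, 2000)
view_a, view_b = make_contrastive_pair(cells)
```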
The structure of many real-world datasets is intrinsically hierarchical, making the modeling of such hierarchies a critical objective in both unsupervised and supervised machine learning. Recently, novel approaches for hierarchical clustering with deep architectures have been proposed. In this work, we take a critical perspective on this line of research and demonstrate that many approaches exhibit major limitations when applied to realistic datasets, partly due to their high computational complexity. In particular, we show that a lightweight procedure implemented on top of pre-trained non-hierarchical clustering models outperforms models designed specifically for hierarchical clustering. Our proposed approach is computationally efficient and applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our findings, we illustrate how our method can also be applied in a supervised setup, recovering meaningful hierarchies from a pre-trained ImageNet classifier.
Authors: Emanuele Palumbo, Moritz Vandenhirtz, Alain Ryser, Imant Daunhawer†, Julia E. Vogt† († denotes shared last authorship)
Submitted: Preprint
Date: 10.10.2024
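The abstract above does not spell out the lightweight procedure, so the sketch below is only one plausible way to recover a hierarchy from a pre-trained flat clustering model that outputs logits: represent each cluster by its mean soft-assignment profile and merge clusters agglomeratively. The softmax assignments and Ward linkage are assumptions, not the published method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.special import softmax

# logits: (n_samples, n_clusters) output of a pre-trained flat clustering model (assumed given).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))

probs = softmax(logits, axis=1)   # soft cluster assignments
hard = probs.argmax(axis=1)       # hard assignments

# Represent each flat cluster by its mean soft-assignment profile and
# merge clusters agglomeratively to obtain a hierarchy over them.
profiles = np.stack([probs[hard == k].mean(axis=0) for k in range(probs.shape[1])])
Z = linkage(profiles, method="ward")   # hierarchy over the 10 flat clusters
```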
Performant machine learning models are becoming increasingly complex and large. Due to their black-box design, they often have limited utility in exploratory data analysis and evoke little trust in non-expert users. Interpretable and explainable machine learning research emerges from application domains where, for technical or social reasons, interpreting or explaining the model's predictions or parameters is deemed necessary. In practice, interpretability and explainability are attained by (i) constructing models understandable to users by design and (ii) developing techniques to help explain already-trained black-box models. This thesis develops interpretable and explainable machine learning models and methods tailored to applications in biomedical and healthcare data analysis. The challenges posed by this domain require nontrivial solutions and deserve special treatment. In particular, we consider practical use cases with high-dimensional and unstructured data types, diverse application scenarios, and different stakeholder groups, which all dictate special design considerations. We demonstrate that, beyond social and ethical value, interpretability and explainability help in (i) performing exploratory data analysis, (ii) supporting medical professionals' decisions, (iii) facilitating interaction with users, and (iv) debugging the model. Our contributions are structured in two parts, tackling distinct research questions from the perspective of biomedical and healthcare applications. Firstly, we explore how to develop and incorporate inductive biases to render neural network models interpretable. Secondly, we study how to leverage explanation methods to interact with and edit already-trained black-box models. This work spans several model and method families, including interpretable neural network architectures, prototype- and concept-based models, and attribution methods. Our techniques are motivated by classic biomedical and healthcare problems, such as time series, survival, and medical image analysis. In addition to new model and method development, we concentrate on empirical comparison, providing proof-of-concept results on real-world biomedical benchmarks. Thus, the primary contribution of this thesis is the development of interpretable models and explanation methods with a principled treatment of specific biomedical and healthcare data types to solve application- and user-grounded problems. Through concrete use cases, we show that interpretability and explainability are context- and user-specific and, therefore, must be studied in conjunction with their application domain. We hope that our methodological and empirical contributions pave the way for future application- and user-driven interpretable and explainable machine learning research.
Authors: Ricards Marcinkevics
Submitted: Doctoral thesis
Date: 24.09.2024
Sudden cardiac death (SCD) remains a pressing health issue, affecting hundreds of thousands each year globally. The heterogeneity among SCD victims, ranging from individuals with severe heart failure to seemingly healthy individuals, poses a significant challenge for effective risk assessment. Conventional risk stratification, which primarily relies on left ventricular ejection fraction, has resulted in only modest efficacy of implantable cardioverter-defibrillators for SCD prevention. In response, artificial intelligence (AI) holds promise for personalized SCD risk prediction and tailoring preventive strategies to the unique profiles of individual patients. Machine and deep learning algorithms have the capability to learn intricate nonlinear patterns between complex data and defined end points and leverage these to identify subtle indicators and predictors of SCD that may not be apparent through traditional statistical analysis. However, despite the potential of AI to improve SCD risk stratification, there are important limitations that need to be addressed. We aim to provide an overview of the current state-of-the-art of AI prediction models for SCD, highlight the opportunities for these models in clinical practice, and identify the key challenges hindering widespread adoption.
Authors: MZH Kolk, S Ruipérez-Campillo, AAM Wilde, RE Knops, SM Narayan, FVY Tjong
Submitted: Heart Rhythm
Date: 06.09.2024
Concept Bottleneck Models (CBMs) have emerged as a promising interpretable method whose final prediction is based on intermediate, human-understandable concepts rather than the raw input. Through time-consuming manual interventions, a user can correct wrongly predicted concept values to enhance the model's downstream performance. We propose Stochastic Concept Bottleneck Models (SCBMs), a novel approach that models concept dependencies. In SCBMs, a single-concept intervention affects all correlated concepts. Leveraging the parameterization, we derive an effective intervention strategy based on the confidence region. We show empirically on synthetic tabular and natural image datasets that our approach improves intervention effectiveness significantly. Notably, we showcase the versatility and usability of SCBMs by examining a setting with CLIP-inferred concepts, alleviating the need for manual concept annotations.
Authors: Moritz Vandenhirtz*, Sonia Laguna*, Ricards Marcinkevics, Julia E. Vogt (* denotes shared first authorship)
Submitted: ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, Workshop on Models of Human Feedback for AI Alignment, and Workshop on Humans, Algorithmic Decision-Making and Society
Date: 26.07.2024
Background: Segmenting computed tomography (CT) is crucial in various clinical applications, such as tailoring personalized cardiac ablation for managing cardiac arrhythmias. Automating segmentation through machine learning (ML) is hindered by the necessity for large, labeled training data, which can be challenging to obtain. This article proposes a novel approach for automated, robust labeling using domain knowledge to achieve high-performance segmentation by ML from a small training set. The approach, the domain knowledge-encoding (DOKEN) algorithm, reduces the reliance on large training datasets by encoding cardiac geometry while automatically labeling the training set. The method was validated in a hold-out dataset of CT results from an atrial fibrillation (AF) ablation study. Methods: The DOKEN algorithm parses left atrial (LA) structures, extracts “anatomical knowledge” by leveraging digital LA models (available publicly), and then applies this knowledge to achieve high ML segmentation performance with a small number of training samples. The DOKEN-labeled training set was used to train an nnU-Net deep neural network (DNN) model for segmenting cardiac CT in N = 20 patients. Subsequently, the method was tested in a hold-out set with N = 100 patients (five times larger than the training set) who underwent AF ablation. Results: The DOKEN algorithm integrated with the nnU-Net model achieved high segmentation performance with few training samples, with a training-to-test ratio of 1:5. The Dice score of the DOKEN-enhanced model was 96.7% (IQR: 95.3% to 97.7%), with a median error in surface distance of boundaries of 1.51 mm (IQR: 0.72 to 3.12) and a mean centroid–boundary distance of 1.16 mm (95% CI: −4.57 to 6.89), similar to expert results (r = 0.99; p < 0.001). In digital hearts, the novel DOKEN approach segmented the LA structures with a mean difference for the centroid–boundary distances of −0.27 mm (95% CI: −3.87 to 3.33; r = 0.99; p < 0.0001). Conclusions: The proposed novel domain knowledge-encoding algorithm was able to perform the segmentation of six substructures of the LA, reducing the need for large training data sets. The combination of domain knowledge encoding and a machine learning approach could reduce the dependence of ML on large training datasets and could potentially be applied to AF ablation procedures and extended in the future to other imaging, 3D printing, and data science applications.
Authors: P Ganesan*, R Feng*, B Deb, FVY Tjong, AJ Rogers, S Ruipérez-Campillo, S Somani, Paul Clopton, T Baykaner, M Rodrigo, J Zou, F Haddad, M Zaharia, SM Narayan (* denotes shared first authorship)
Submitted: Diagnostics
Date: 17.07.2024
The efficacy of an implantable cardioverter-defibrillator (ICD) in patients with a non-ischaemic cardiomyopathy for primary prevention of sudden cardiac death is increasingly debated. We developed a multimodal deep learning model for arrhythmic risk prediction that integrated late gadolinium enhanced (LGE) cardiac magnetic resonance imaging (MRI), electrocardiography (ECG) and clinical data. Short-axis LGE-MRI scans and 12-lead ECGs were retrospectively collected from a cohort of 289 patients prior to ICD implantation, across two tertiary hospitals. A residual variational autoencoder was developed to extract physiological features from LGE-MRI and ECG, and used as inputs for a machine learning model (DEEP RISK) to predict malignant ventricular arrhythmia onset. In the validation cohort, the multimodal DEEP RISK model predicted malignant ventricular arrhythmias with an area under the receiver operating characteristic curve (AUROC) of 0.84 (95% confidence interval (CI) 0.71–0.96), a sensitivity of 0.98 (95% CI 0.75–1.00) and a specificity of 0.73 (95% CI 0.58–0.97). The models trained on individual modalities exhibited lower AUROC values compared to DEEP RISK [MRI branch: 0.80 (95% CI 0.65–0.94), ECG branch: 0.54 (95% CI 0.26–0.82), Clinical branch: 0.64 (95% CI 0.39–0.87)]. These results suggest that a multimodal model achieves high prognostic accuracy in predicting ventricular arrhythmias in a cohort of patients with non-ischaemic systolic heart failure, using data collected prior to ICD implantation.
Authors: MZH Kolk, S Ruipérez-Campillo, CP Allaart, AAM Wilde, RE Knops, SM Narayan, FVY Tjong
Submitted: Nature Scientific Reports
Date: 27.06.2024
Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics—inspired by their supervised fairness counterparts—to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results. All experiments can be reproduced using our provided repository.
Authors: Mike Laszkiewicz, Imant Daunhawer, Julia E. Vogt†, Asja Fischer†, Johannes Lederer† († denotes shared last authorship)
Submitted: ACM Conference on Fairness, Accountability, and Transparency, 2024
Date: 05.06.2024
Despite significant progress, evaluation of explainable artificial intelligence remains elusive and challenging. In this paper, we propose a fine-grained validation framework that is not overly reliant on any one facet of these sociotechnical systems, and that recognises their inherent modular structure: technical building blocks, user-facing explanatory artefacts and social communication protocols. While we concur that user studies are invaluable in assessing the quality and effectiveness of explanation presentation and delivery strategies from the explainees' perspective in a particular deployment context, the underlying explanation generation mechanisms require a separate, predominantly algorithmic validation strategy that accounts for the technical and human-centred desiderata of their (numerical) outputs. Such a comprehensive sociotechnical utility-based evaluation framework would allow us to systematically reason about the properties and downstream influence of different building blocks from which explainable artificial intelligence systems are composed – accounting for a diverse range of their engineering and social aspects – in view of the anticipated use case.
Authors: Kacper Sokol, Julia E. Vogt
Submitted: Extended Abstracts of the 2024 ACM Conference on Human Factors in Computing Systems (CHI)
Date: 02.05.2024
Background and Objectives: The extensive collection of electrocardiogram (ECG) recordings stored in paper format has provided opportunities for numerous digitization studies. However, the traditional 10 s 12-lead ECG printout typically splits the ECG signals into four asynchronous sections of 3 leads and 2.5 s each. Since each lead corresponds to different time instants, developing a synchronization method becomes necessary for applications such as vectorcardiogram (VCG) reconstruction. Methods: A beat-level synchronization method has been developed and validated using a dataset of 21,674 signals. This method effectively addresses synchronization distortions caused by RR interval variations and preserves the time lags between R peaks across different leads for each beat. Results: The results demonstrate that the proposed method successfully synchronizes the ECG, allowing a VCG reconstruction with an average Pearson Correlation Coefficient of 0.9815±0.0426. The Normalized Root Mean Squared Error (NRMSE) and Mean Absolute Error (MAE) values for the reconstructed VCG are 0.0248±0.0214 mV and 0.0133±0.0123 mV, respectively. These metrics indicate the reliability of the VCG reconstruction achieved by means of the proposed synchronization method. Conclusions: The synchronization method has demonstrated its robustness and high performance compared to existing techniques in the field. Its effectiveness has been observed across a wide variety of signals, showcasing its applicability in real clinical environments. Moreover, its ability to handle a large number of signals makes it suitable for various applications, including retrospective studies and the development of machine learning methods.
Authors: E Ramírez, S Ruipérez-Campillo, F Castells, R Casado-Arroyo, J Millet
Submitted: Biomedical Signal Processing and Control
Date: 01.05.2024
In the field of cardiac electrophysiology (EP), effectively reducing noise in intra-cardiac signals is crucial for the accurate diagnosis and treatment of arrhythmias and cardiomyopathies. However, traditional noise reduction techniques fall short in addressing the diverse noise patterns from various sources, often non-linear and non-stationary, present in these signals. This work introduces a Variational Autoencoder (VAE) model, aimed at improving the quality of intra-ventricular monophasic action potential (MAP) signal recordings. By constructing representations of clean signals from a dataset of 5706 time series from 42 patients diagnosed with ischemic cardiomyopathy, our approach demonstrates superior denoising performance when compared to conventional filtering methods commonly employed in clinical settings. We assess the effectiveness of our VAE model using various metrics, indicating its superior capability to denoise signals across different noise types, including time-varying non-linear noise frequently found in clinical settings. These results reveal that VAEs can eliminate diverse sources of noise in single beats, outperforming state-of-the-art denoising techniques and potentially improving treatment efficacy in cardiac EP.
Authors: S Ruipérez-Campillo, A Ryser, TM Sutter, R Feng, P Ganesan, B Deb, KA Brennan, AJ Rogers, MZH Kolk, FVY Tjong, SM Narayan, JE Vogt
Submitted: ICLR 2024 - Workshop on Time Series for Healthcare
Date: 28.03.2024
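For readers unfamiliar with the setup, a minimal 1D convolutional VAE for single-beat signals could look as follows; this is a generic sketch with assumed signal length, channel counts, and KL weight, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class Conv1dVAE(nn.Module):
    """Minimal 1D convolutional VAE: the encoder maps a single-beat signal to a
    latent Gaussian and the decoder reconstructs a clean beat."""
    def __init__(self, signal_len: int = 512, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (signal_len // 4)
        self.mu = nn.Linear(feat, latent_dim)
        self.logvar = nn.Linear(feat, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, feat), nn.ReLU(),
            nn.Unflatten(1, (32, signal_len // 4)),
            nn.ConvTranspose1d(32, 16, kernel_size=8, stride=2, padding=3), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=8, stride=2, padding=3),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = nn.functional.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + 1e-3 * kld  # KL weight is an illustrative choice

# Example: a batch of 8 noisy beats, 512 samples each (shapes are assumptions).
beats = torch.randn(8, 1, 512)
model = Conv1dVAE()
recon, mu, logvar = model(beats)
loss = vae_loss(recon, beats, mu, logvar)
```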
High-density multielectrode catheters are becoming increasingly popular in cardiac electrophysiology for advanced characterisation of the cardiac tissue, due to their potential to identify impaired sites. These are often characterised by abnormal electrical conduction, which may cause locally disorganised propagation wavefronts. To quantify this disorganisation, a novel heterogeneity parameter based on vector field analysis is proposed, utilising finite differences to measure direction changes between adjacent cliques. The proposed Vector Field Heterogeneity metric has been evaluated on a set of simulations with controlled levels of organisation in vector maps, and a variety of grid sizes. Furthermore, it has been tested on animal experimental models of isolated Langendorff-perfused rabbit hearts. The proposed parameter exhibited a superior ability to capture heterogeneous propagation wavefronts compared to the classical Spatial Inhomogeneity Index, and simulations showed that the metric effectively captures gradual increments in disorganisation in propagation patterns. Notably, it yielded robust and consistent outcomes for 4 × 4 grid sizes, underscoring its suitability for the latest generation of orientation-independent cardiac catheters.
Index Terms: Animal experimental models, cardiac signal processing, electrophysiology, high-density electrode catheters, vector field heterogeneity.
Impact Statement: The authors introduce the Vector Field Heterogeneity (VFH) metric, which provides a precise evaluation of disorganisation in electrical propagation maps within cardiac tissue, potentially improving the diagnosis and characterisation of electrophysiological conditions.
Authors: L Pancorbo*, S Ruipérez-Campillo*, A Tormos, A Guill, R Cervigón, A Alberola, FJ Chorro, J Millet, F Castells (* denotes shared first authorship)
Submitted: IEEE Open Journal of Engineering in Medicine and Biology
Date: 23.02.2024
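As a hedged sketch of measuring direction changes between adjacent cliques with finite differences, one simple instantiation averages the absolute angular difference between neighbouring propagation vectors on the grid. The published Vector Field Heterogeneity metric may differ in its exact definition.

```python
import numpy as np

def angular_diff(a, b):
    """Smallest absolute difference between two angles (radians)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def vector_field_heterogeneity(vx, vy):
    """Mean absolute direction change between adjacent cells of a propagation
    vector map (finite differences along rows and columns)."""
    theta = np.arctan2(vy, vx)
    dx = angular_diff(theta[:, 1:], theta[:, :-1])   # horizontal neighbours
    dy = angular_diff(theta[1:, :], theta[:-1, :])   # vertical neighbours
    return np.concatenate([dx.ravel(), dy.ravel()]).mean()

# Example on a 4x4 grid, matching the clique layout mentioned in the abstract.
rng = np.random.default_rng(1)
organised = np.zeros((4, 4)), np.ones((4, 4))                  # uniform wavefront
disorganised = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(vector_field_heterogeneity(*organised))      # ~0
print(vector_field_heterogeneity(*disorganised))   # larger value
```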
Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Thus, accurate and early detection of PH and the classification of its severity is crucial for appropriate and successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. Little effort has been directed towards automatic assessment of PH using echocardiography, and the few proposed methods only focus on binary PH classification on the adult population. In this work, we present an explainable multi-view video-based deep learning approach to predict and classify the severity of PH for a cohort of 270 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation and 0.63 for severity prediction and 0.78 for binary detection on the held-out test set. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms.
Authors: Hanna Ragnarsdottir*, Ece Özkan Elsen*, Holger Michel*, Kieran Chin-Cheong, Laura Manduchi, Sven Wellmann†, Julia E. Vogt† (* denotes shared first authorship, † denotes shared last authorship)
Submitted: International Journal of Computer Vision
Date: 06.02.2024
Recently, interpretable machine learning has re-explored concept bottleneck models (CBM), comprising step-by-step prediction of the high-level concepts from the raw features and the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design, given an annotated validation set. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable and more performant than CBMs.
Authors: Sonia Laguna*, Ricards Marcinkevics*, Moritz Vandenhirtz, Julia E. Vogt (* denotes shared first authorship)
Submitted: arXiv
Date: 24.01.2024
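To make the notion of a concept-based intervention concrete, the toy example below shows the standard CBM mechanism the paper builds on: a user overwrites a subset of predicted concepts with ground-truth values before the head recomputes the target. Network sizes are arbitrary, and the paper's intervenability fine-tuning is not reproduced here.

```python
import torch
import torch.nn as nn

# A toy concept bottleneck: features -> concepts -> target (sizes are illustrative).
n_features, n_concepts = 32, 6
concept_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts))
head = nn.Linear(n_concepts, 1)

def predict_with_intervention(x, true_concepts=None, intervened=None):
    """Predict the target; optionally overwrite a subset of predicted concepts
    with user-provided ground-truth values (a concept-based intervention)."""
    c_hat = torch.sigmoid(concept_net(x))
    if intervened is not None:
        c_hat = torch.where(intervened, true_concepts, c_hat)
    return head(c_hat)

x = torch.randn(4, n_features)
true_c = torch.randint(0, 2, (4, n_concepts)).float()
mask = torch.zeros(4, n_concepts, dtype=torch.bool)
mask[:, 0] = True                      # the user corrects the first concept only
y_before = predict_with_intervention(x)
y_after = predict_with_intervention(x, true_c, mask)
```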
Biological organisms experience a world of multiple modalities through a variety of sensory systems. For example, they may perceive physical or chemical stimuli through the senses of sight, smell, taste, touch, and hearing. Across species, the nervous system integrates heterogeneous sensory stimuli and forms multimodal representations that capture information shared between modalities. Analogously, machines can perceive their environment through different types of sensors, such as cameras and microphones. Yet, it is not sufficiently well understood how multimodal representations can be formed in silico, i.e., via computer simulation. In this thesis, we study how to leverage statistical dependencies between modalities to form multimodal representations computationally using machine learning. We start from the premise that real-world data is generated from a few factors of variation. Given a set of observations, representation learning seeks to infer these latent variables, which is fundamentally impossible without further assumptions. However, when we have corresponding observations of different modalities, statistical dependencies between them can carry meaningful information about the latent structure of the underlying process. Motivated by this idea, we study multimodal learning under weak supervision, which means that we consider corresponding observations of multiple modalities without labels for what is shared between them. For this challenging setup, we design machine learning algorithms that transform observations into representations of shared and modality-specific information without explicit supervision by labels. Thus, we develop methods that infer latent structure from low-level observations using weak supervision in the form of multiple modalities. We develop techniques for multimodal representation learning using two approaches—generative and discriminative learning. First, we focus on generative learning with variational autoencoders (VAEs) and propose a principled and scalable method for variational inference and density estimation on sets of modalities. Our method enhances the encoding and disentanglement of shared and modality-specific information and consequently improves the generative performance compared to relevant baselines. Motivated by these results, we consider an explicit partitioning of the latent space into shared and modality-specific subspaces. We explore the benefits and pitfalls of partitioning and develop a model that promotes the desired disentanglement for the respective subspaces. Thereby, it further improves the generative performance compared to models with a joint latent space. On the other hand, we also establish fundamental limitations for generative learning with multimodal VAEs. We show that the sub-sampling of modalities enforces an undesirable bound on the approximation of the joint distribution. This limits the generative performance of mixture-based multimodal VAEs and constrains their application to settings where relevant information can be predicted in expectation across modalities on the level of observations. To address these issues, we shift to discriminative approaches and focus on contrastive learning. We show that contrastive learning can be used to identify shared latent factors that are invariant across modalities up to a block-wise indeterminacy, even in the presence of non-trivial statistical and causal dependencies between latent variables. 
Finally, we demonstrate how the representations produced by contrastive learning can be used to transcend the limitations of multimodal VAEs, which yields a hybrid approach for multimodal generative learning and the disentanglement of shared and modality-specific information. Thus, we establish a theoretical basis for multimodal representation learning and explain in which settings generative and discriminative approaches can be effective in practice.
Authors: Imant Daunhawer
Submitted: Doctoral Thesis
Date: 12.01.2024
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. Previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed. For predicting the diagnosis, the extended multiview CBM attained an AUROC of 0.80 and an AUPR of 0.92, performing comparably to similar black-box neural networks trained and tested on the same dataset.
Authors: Ricards Marcinkevics*, Patricia Reis Wolfertstetter*, Ugne Klimiene*, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Özkan Elsen†, Christian Knorr†, Julia E. Vogt† (* denotes shared first authorship, † denotes shared last authorship)
Submitted: Medical Image Analysis
Date: 01.01.2024
Background Risk stratification for ventricular arrhythmias currently relies on static measurements that fail to adequately capture dynamic interactions between arrhythmic substrate and triggers over time. We trained and internally validated a dynamic machine learning (ML) model and neural network that extracted features from longitudinally collected electrocardiograms (ECG), and used these to predict the risk of malignant ventricular arrhythmias. Methods A multicentre study in patients implanted with an implantable cardioverter-defibrillator (ICD) between 2007 and 2021 in two academic hospitals was performed. Variational autoencoders (VAEs), which combine neural networks with variational inference principles and can learn patterns and structure in data without explicit labelling, were trained to encode the mean ECG waveforms from the limb leads into 16 variables. Supervised dynamic ML models using these latent ECG representations and clinical baseline information were trained to predict malignant ventricular arrhythmias treated by the ICD. Model performance was evaluated on a hold-out set, using time-dependent receiver operating characteristic (ROC) and calibration curves. Findings 2942 patients (61.7 ± 13.9 years, 25.5% female) were included, with a total of 32,129 ECG recordings during a mean follow-up of 43.9 ± 35.9 months. The mean time-varying area under the ROC curve for the dynamic model was 0.738 ± 0.07, compared to 0.639 ± 0.03 for a static (i.e. baseline-only) model. Feature analyses indicated dynamic changes in latent ECG representations, particularly those affecting the T-wave morphology, were of highest importance for model predictions. Interpretation Dynamic ML models and neural networks effectively leverage routinely collected longitudinal ECG recordings for personalised and updated predictions of malignant ventricular arrhythmias, outperforming static models.
Authors: MZH Kolk, S Ruipérez-Campillo, L Alvarez-Florez, B Deb, EJ Bekkers, CP Allaart, ALCJ van der Lingen, P Clopton, I Isgum, AAM Wilde, RE Knops, SM Narayan, FVY Tjong
Submitted: Lancet eBiomedicine
Date: 01.01.2024
2023
We propose Tree Variational Autoencoder (TreeVAE), a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables. The proposed tree-based generative architecture enables lightweight conditional inference and improves generative performance by utilizing specialized leaf decoders. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on a variety of datasets, including real-world imaging data. We show empirically that TreeVAE provides a more competitive log-likelihood lower bound than its sequential counterparts. Finally, due to its generative nature, TreeVAE is able to generate new samples from the discovered clusters via conditional sampling.
Authors: Laura Manduchi*, Moritz Vandenhirtz*, Alain Ryser, Julia E. Vogt (* denotes shared first authorship)
Submitted: Spotlight at Neural Information Processing Systems, NeurIPS 2023
Date: 20.12.2023
ExpLIMEable is a tool to enhance the comprehension of Local Interpretable Model-Agnostic Explanations (LIME), particularly within the realm of medical image analysis. LIME explanations often lack robustness due to variances in perturbation techniques and interpretable function choices. Powered by a convolutional neural network for brain MRI tumor classification, ExpLIMEable seeks to mitigate these issues. This explainability tool allows users to tailor and explore the explanation space generated post hoc by different LIME parameters to gain deeper insights into the model’s decision-making process, its sensitivity, and limitations. We introduce a novel dimension reduction step on the perturbations seeking to find more informative neighborhood spaces and extensive provenance tracking to support the user. This contribution ultimately aims to enhance the robustness of explanations, which is key in high-risk domains like healthcare.
Authors: Sonia Laguna, Julian Heidenreich, Jiugeng Sun, Nilüfer Cetin, Ibrahim Al Hazwani, Udo Schlegel, Furui Cheng, Mennatallah El-Assady
Submitted: NeurIPS 2023, XAI in Action: Past, Present, and Future Applications
Date: 16.12.2023
Recently, interpretable machine learning has re-explored concept bottleneck models (CBM), comprising step-by-step prediction of the high-level concepts from the raw features and the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, consequently affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable and more performant than CBMs.
Authors: Ricards Marcinkevics*, Sonia Laguna*, Moritz Vandenhirtz, Julia E. Vogt (* denotes shared first authorship)
Submitted: XAI in Action: Past, Present, and Future Applications, NeurIPS 2023
Date: 16.12.2023
Background: The overarching goal of blood glucose forecasting is to assist individuals with type 1 diabetes (T1D) in avoiding hyper- or hypoglycemic conditions. While deep learning approaches have shown promising results for blood glucose forecasting in adults with T1D, it is not known if these results generalize to children. Possible reasons are physical activity (PA), which is often unplanned in children, as well as the age and development of a child, which both affect the blood glucose level. Materials and Methods: In this study, we collected time series measurements of glucose levels, carbohydrate intake, insulin dosing and physical activity from children with T1D for one week in an ethics-approved prospective observational study, which included daily physical activities. We investigate the performance of state-of-the-art deep learning methods for adult data—(dilated) recurrent neural networks and a transformer—on our dataset for short-term (30 min) and long-term (2 h) prediction. We propose to integrate static patient characteristics, such as age, gender, BMI, and percentage of basal insulin, to account for the heterogeneity of our study group. Results: Integrating static patient characteristics (SPC) proves beneficial, especially for short-term prediction. LSTMs and GRUs with SPC perform best for a prediction horizon of 30 min (RMSE of 1.66 mmol/l), a vanilla RNN with SPC performs best across different prediction horizons, while the performance decays significantly for long-term prediction. For prediction during the night, the best method improves to an RMSE of 1.50 mmol/l. Overall, the results for our baselines and RNN models indicate that blood glucose forecasting for children conducting regular physical activity is more challenging than for previously studied adult data. Conclusion: We find that integrating static data improves the performance of deep-learning architectures for blood glucose forecasting of children with T1D and achieves promising results for short-term prediction. Despite these improvements, additional clinical studies are warranted to extend forecasting to longer-term prediction horizons.
Authors: Alexander Marx, Francesco Di Stefano, Heike Leutheuser, Kieran Chin-Cheong, Marc Pfister, Marie-Anne Burckhardt, Sara Bachmann†, Julia E. Vogt† († denotes shared last authorship)
Submitted: Frontiers in Pediatrics
Date: 14.12.2023
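A minimal sketch of the idea of conditioning a recurrent forecaster on static patient characteristics is shown below: the static features are concatenated to the last GRU hidden state before the output layer. Channel counts, horizon, and sampling resolution are assumptions, not the study's configuration.

```python
import torch
import torch.nn as nn

class GlucoseForecaster(nn.Module):
    """GRU over the measurement time series, with static patient characteristics
    (e.g. age, BMI) concatenated to the last hidden state before the output layer."""
    def __init__(self, n_channels=4, n_static=4, hidden=64, horizon=6):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True)
        self.out = nn.Sequential(
            nn.Linear(hidden + n_static, 64), nn.ReLU(),
            nn.Linear(64, horizon),            # e.g. 6 x 5-min steps = 30 min ahead
        )

    def forward(self, series, static):
        _, h = self.gru(series)                # h: (1, batch, hidden)
        return self.out(torch.cat([h[-1], static], dim=-1))

# Example: 16 patients, 2 h of history at 5-min resolution, 4 channels
# (glucose, carbs, insulin, activity) plus 4 static features; all sizes are assumptions.
series = torch.randn(16, 24, 4)
static = torch.randn(16, 4)
pred = GlucoseForecaster()(series, static)     # (16, 6) future glucose values
```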
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by first inferring the number of elements per subset and, second, filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on three different challenging experiments: variational clustering, inference of shared and independent generative factors under weak supervision, and multitask learning.
Authors: Thomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt (* denotes shared first authorship)
Submitted: NeurIPS 2023
Date: 12.12.2023
Prototype learning, a popular machine learning method designed for inherently interpretable decisions, leverages similarities to learned prototypes for classifying new data. While it is mainly applied in computer vision, in this work, we build upon prior research and further explore the extension of prototypical networks to natural language processing. We introduce a learned weighted similarity measure that enhances the similarity computation by focusing on informative dimensions of pre-trained sentence embeddings. Additionally, we propose a post-hoc explainability mechanism that extracts prediction-relevant words from both the prototype and input sentences. Finally, we empirically demonstrate that our proposed method not only improves predictive performance on the AG News and RT Polarity datasets over a previous prototype-based approach, but also improves the faithfulness of explanations compared to rationale-based recurrent convolutions.
Authors: Claudio Fanconi*, Moritz Vandenhirtz*, Severin Husmann, Julia E. Vogt (* denotes shared first authorship)
Submitted: Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Date: 25.10.2023
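As a rough sketch of a learned weighted similarity between sentence embeddings and prototypes, the module below scales each embedding dimension by a learnable non-negative weight before a cosine-style comparison; the embedding size, the softplus parameterisation, and the number of prototypes are illustrative assumptions rather than the published design.

```python
import torch
import torch.nn as nn

class WeightedPrototypeSimilarity(nn.Module):
    """Cosine-style similarity between sentence embeddings and learned prototypes,
    with a learnable non-negative weight per embedding dimension."""
    def __init__(self, emb_dim=384, n_prototypes=10):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, emb_dim))
        self.log_w = nn.Parameter(torch.zeros(emb_dim))   # softplus keeps weights >= 0

    def forward(self, emb):                                # emb: (batch, emb_dim)
        w = nn.functional.softplus(self.log_w)
        e = emb * w.sqrt()                                 # reweight informative dimensions
        p = self.prototypes * w.sqrt()
        return nn.functional.normalize(e, dim=-1) @ nn.functional.normalize(p, dim=-1).T

sims = WeightedPrototypeSimilarity()(torch.randn(8, 384))  # (8, 10) similarity scores
```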
Background: Hyperbilirubinemia of the newborn infant is a common disease worldwide. However, recognized early and treated appropriately, it typically remains innocuous. We recently developed an early phototherapy prediction tool (EPPT) by means of machine learning (ML) utilizing just one bilirubin measurement and a few clinical variables. The aim of this study is to test the applicability and performance of the EPPT on a new patient cohort from a different population. Materials and methods: This work is a retrospective study of prospectively recorded neonatal data from infants born in 2018 in an academic hospital, Regensburg, Germany, meeting the following inclusion criteria: born with 34 completed weeks of gestation or more, at least two total serum bilirubin (TSB) measurements prior to phototherapy. First, the original EPPT—an ensemble of a logistic regression and a random forest—was used in its freely accessible version and evaluated in terms of the area under the receiver operating characteristic curve (AUROC). Second, a new version of the EPPT model was re-trained on the data from the new cohort. Third, the predictive performance, variable importance, sensitivity and specificity were analyzed and compared across the original and re-trained models. Results: In total, 1,109 neonates were included with a median (IQR) gestational age of 38.4 (36.6–39.9) weeks and a total of 3,940 bilirubin measurements prior to any phototherapy treatment, which was required in 154 neonates (13.9%). For the phototherapy treatment prediction, the original EPPT achieved a predictive performance of 84.6% AUROC on the new cohort. After re-training the model on a subset of the new dataset, 88.8% AUROC was achieved as evaluated by cross-validation. The same five variables as for the original model were found to be most important for the prediction on the new cohort, namely gestational age at birth, birth weight, bilirubin-to-weight ratio, hours since birth, and bilirubin value. Discussion: The individual risk for treatment requirement in neonatal hyperbilirubinemia is robustly predictable in different patient cohorts with a previously developed ML tool (EPPT) demanding just one TSB value and only four clinical parameters. Further prospective validation studies are needed to develop an effective and safe clinical decision support system.
Authors: Imant Daunhawer, Kai Schumacher, Anna Badura, Julia E. Vogt, Holger Michel, Sven Wellmann
Submitted: Frontiers in Pediatrics, 2023
Date: 09.10.2023
Chronic obstructive pulmonary disease (COPD) is a significant public health issue, affecting more than 100 million people worldwide. Remote patient monitoring has shown great promise in the efficient management of patients with chronic diseases. This work presents the analysis of the data from a monitoring system developed to track COPD symptoms alongside patients’ self-reports. In particular, we investigate the assessment of COPD severity using multisensory home-monitoring device data acquired from 30 patients over a period of three months. We describe a comprehensive data pre-processing and feature engineering pipeline for multimodal data from the remote home-monitoring of COPD patients. We develop and validate predictive models forecasting i) the absolute and ii) differenced COPD Assessment Test (CAT) scores based on the multisensory data. The best obtained models achieve Pearson’s correlation coefficient of 0.93 and 0.37 for absolute and differenced CAT scores. In addition, we investigate the importance of individual sensor modalities for predicting CAT scores using group sparse regularization techniques. Our results suggest that feature groups indicative of the patient’s general condition, such as static medical and physiological information, date, spirometer, and air quality, are crucial for predicting the absolute CAT score. For predicting changes in CAT scores, sleep and physical activity features are most important, alongside the previous CAT score value. Our analysis demonstrates the potential of remote patient monitoring for COPD management and investigates which sensor modalities are most indicative of COPD severity as assessed by the CAT score. Our findings contribute to the development of effective and data-driven COPD management strategies.
Authors: Zixuan Xiao, Michal Muszynski, Ricards Marcinkevics, Lukas Zimmerli, Adam D. Ivankay, Dario Kohlbrenner, Manuel Kuhn, Yves Nordmann, Ulrich Muehlner, Christian Clarenbach, Julia E. Vogt, Thomas Brunschwiler
Submitted: 25th ACM International Conference on Multimodal Interaction, ICMI'23
Date: 09.10.2023
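To illustrate how group sparse regularization can rank sensor modalities, the sketch below adds a group-lasso-style penalty (the sum of per-modality weight-block L2 norms) to a linear regression loss; the feature groups, penalty strength, and plain Adam optimisation are simplifying assumptions rather than the study's exact pipeline.

```python
import torch

# Each sensor modality contributes a contiguous block of features (illustrative groups).
groups = {"spirometer": slice(0, 5), "air_quality": slice(5, 12),
          "sleep": slice(12, 20), "activity": slice(20, 30)}

X, y = torch.randn(200, 30), torch.randn(200)
w = torch.zeros(30, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-2)
lam = 0.1

for _ in range(500):
    opt.zero_grad()
    mse = ((X @ w - y) ** 2).mean()
    # Group-lasso-style penalty: sum of L2 norms of per-modality weight blocks,
    # which pushes whole modalities towards zero weight.
    penalty = sum(w[g].norm(p=2) for g in groups.values())
    (mse + lam * penalty).backward()
    opt.step()

# Larger block norms indicate modalities the model relies on more heavily.
importance = {name: w[g].detach().norm().item() for name, g in groups.items()}
```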
A precise method to measure spasticity is fundamental in improving the quality of life of spastic patients. The measurement methods that exist for spasticity have long been considered scarce and inadequate, which can partly be explained by a lack of consensus in the definition of spasticity. Spasticity quantification methods can be roughly classified according to whether they are based on neurophysiological or biomechanical mechanisms, clinical scales, or imaging techniques. This article reviews methods from all classes and further discusses instrumentation, dimensionality, and EMG onset detection methods. The objective of this article is to provide a review on spasticity measurement methods used to this day in an effort to contribute to the advancement of both the quantification and treatment of spasticity.
Authors: KO Kristinsdottir, S Ruipérez-Campillo, T Helgason
Submitted: Chapter of the Book "Stroke-Management Pearls"
Date: 04.10.2023
Background Segmentation of computed tomography (CT) is important for many clinical procedures including personalized cardiac ablation for the management of cardiac arrhythmias. While segmentation can be automated by machine learning (ML), it is limited by the need for large, labeled training data that may be difficult to obtain. We set out to combine ML of cardiac CT with domain knowledge, which reduces the need for large training datasets by encoding cardiac geometry, which we then tested in independent datasets and in a prospective study of atrial fibrillation (AF) ablation. Methods We mathematically represented atrial anatomy with simple geometric shapes and derived a model to parse cardiac structures in a small set of N = 6 digital hearts. The model, termed “virtual dissection,” was used to train ML to segment cardiac CT in N = 20 patients, then tested in independent datasets and in a prospective study. Results In independent test cohorts (N = 160) from 2 Institutions with different CT scanners, atrial structures were accurately segmented with Dice scores of 96.7% in internal (IQR: 95.3%–97.7%) and 93.5% in external (IQR: 91.9%–94.7%) test data, with good agreement with experts (r = 0.99; p < 0.0001). In a prospective study of 42 patients at ablation, this approach reduced segmentation time by 85% (2.3 ± 0.8 vs. 15.0 ± 6.9 min, p < 0.0001), yet provided similar Dice scores to experts (93.9% (IQR: 93.0%–94.6%) vs. 94.4% (IQR: 92.8%–95.7%), p = NS). Conclusions Encoding cardiac geometry using mathematical models greatly accelerated training of ML to segment CT, reducing the need for large training sets while retaining accuracy in independent test data. Combining ML with domain knowledge may have broad applications.
Authors: Ruibin Feng, Brototo Deb, Prasanth Ganesan, Fleur VY Tjong, Albert J Rogers, Samuel Ruipérez-Campillo, Sulaiman Somani, Paul Clopton, Tina Baykaner, Miguel Rodrigo, James Zou, Francois Haddad, Matei Zaharia, Sanjiv M. Narayan
Submitted: Frontiers in Cardiovascular Medicine
Date: 02.10.2023
This study presents a novel metric to evaluate the heterogeneity of cardiac substrate by using vector maps derived from omnipolar electrograms. This metric determines the level of disorganisation of electrical propagation and has the potential to classify the cardiac tissue under the catheter. We tested the methodology on propagation maps obtained from experimental recordings with and without electrical stimulation, under the assumption that the former exhibit greater heterogeneity. Results show the discriminatory behaviour of the parameter (p < 0.001), assigning higher values to non-stimulated maps and lower values in cases with stimulation. The clinical relevance of this paper lies in the introduction of a new metric defined on omnipolar-derived vector maps, capable of identifying and quantifying areas of disorganised electrical propagation within the heart. This parameter has the potential to make orientation-independent catheterisation procedures more efficient, providing electrophysiologists with valuable information for the management of arrhythmias.
Authors: L Pancorbo*, S Ruipérez-Campillo*, F Castells, J Millet (* denotes shared first authorship)
Submitted: IEEE Computing in Cardiology (50th CinC, 2023)
Date: 01.10.2023
Many patients remain in a comatose state after initially surviving resuscitation following a cardiac arrest. The prognosis in this state underpins the decision to withdraw life support and thus requires an objective and deterministic guideline. The objective of this study is to assist this decision by providing a model able to predict the cerebral performance category (CPC) of comatose patients following cardiac arrest from their electroencephalographic (EEG) signal. To achieve this, binary classifiers built with 3D Convolutional Neural Networks (CNNs) followed by Dense Neural Networks (DNNs) are used in combination with a “divide and conquer” strategy, enabling the automatic extraction of features from the tensors of EEG signals while taking into account the spatial relation of the signals according to the electrodes’ distribution on the scalp. This work was submitted under the team name “BioITACA UPV” to “Predicting Neurological Recovery from Coma After Cardiac Arrest: The George B. Moody PhysioNet Challenge 2023”; while the team did not score in the official phase, results obtained from a held-out subset of the training set demonstrate the capability of the model to classify by CPC from short segments of 5 seconds up to long EEG recordings. Results show an average accuracy of 0.76 across the CPC classifiers and the capability to discern between a good and a bad outcome prognosis.
Authors: RT Ors-Quixal, E Ramírez-Candela, S Ruipérez-Campillo, F Castells, J Millet
Submitted: IEEE Computing in Cardiology (50th CinC, 2023)
Date: 01.10.2023
The aim of this study is to improve the prediction of long-term outcomes in patients with atrial fibrillation using only electrogram (EGM) features. We developed three distinct models based on data from a cohort of N=561 patients, each targeting a different aspect of EGM analysis: (i) principal component analysis (PCA), in which we analysed the eigenvectors whose projections account for more than a fixed threshold (15%) of the overall variance and clustered them with the k-means algorithm to identify common projection axes; (ii) an autoregressive model, in which a bijective transformation of the autoregressive coefficients serves as input to machine learning classifiers such as random forests or support vector classifiers; and (iii) feature engineering, in which voltage, rate, and shape-similarity metrics are extracted from the raw EGM data.
Authors: M Pedron, P Ganesan, R Feng, B Deb, H Chang, S Ruipérez-Campillo, S Somani, Y Desai, AJ Rogers, P Clopton, SM Narayan
Submitted: IEEE Computing in Cardiology (50th CinC, 2023)
Date: 01.10.2023
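A schematic reading of the PCA branch described above is sketched below: keep the principal axes explaining more than 15% of the variance and cluster them with k-means to find common projection directions. The synthetic data and all parameter choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for per-patient EGM feature windows (561 rows, 200 samples each),
# built with two dominant directions so that some components pass the variance threshold.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 200))
egms = rng.normal(size=(561, 2)) @ basis * 5 + rng.normal(size=(561, 200)) * 0.5

pca = PCA().fit(egms)
keep = pca.explained_variance_ratio_ > 0.15   # components explaining >15% of the variance
components = pca.components_[keep]            # principal axes of interest

# Cluster the retained axes to find common projection directions across the cohort.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(components)
```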
The vectorcardiogram (VCG) provides a comprehensive representation of the heart's electrical activity in 3D, aiding in the diagnosis and treatment of cardiovascular diseases. The conventional electrocardiogram (ECG) records twelve leads intermittently at intervals of 2.5 seconds, with lead II typically recorded continuously, which poses a challenge for reconstructing the VCG, as each lead's beats belong to different time instances. The purpose of this research is to propose and validate a methodology for accurately synchronizing the recorded beats to reconstruct the VCG. To achieve this goal, a phantom was created to mimic the standard 12-lead ECG setup. The temporal offset of each beat from the first is calculated by cross-correlation with the continuously recorded lead, the same offset is applied to all leads, and the VCG is finally reconstructed. The results demonstrate precise synchronization, as evidenced by Pearson correlation values of 0.9959±0.0034, an MAE of 0.0077±0.0024 mV, and an RMSE of 0.0119±0.0038 mV in the VCG reconstruction. This technique is essential for the accurate diagnosis and treatment of cardiovascular diseases and can be applied to conventional ECG recordings taken on paper to obtain the VCG.
Authors: E Ramírez, S Ruipérez-Campillo, F Castells, R Casado-Arroyo, J Millet
Submitted: IEEE Computing in Cardiology (50th CinC, 2023)
Date: 01.10.2023
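As a toy illustration of the synchronization step described above, the snippet below estimates a beat's temporal offset by cross-correlating a window of the continuously recorded lead against a reference beat; the synthetic signal, sampling rate, and window sizes are assumptions.

```python
import numpy as np

def beat_offset(continuous_lead, reference_beat, search_start, search_len):
    """Offset (in samples) that best aligns a segment of the continuously recorded
    lead with a reference beat, found via cross-correlation."""
    segment = continuous_lead[search_start:search_start + search_len]
    corr = np.correlate(segment - segment.mean(),
                        reference_beat - reference_beat.mean(), mode="valid")
    return int(np.argmax(corr))

# Toy example: a synthetic 'continuous lead II' with repeated beats (all values illustrative).
fs = 500                                                       # sampling rate in Hz
beat = np.exp(-0.5 * ((np.arange(300) - 150) / 12.0) ** 2)     # Gaussian 'QRS'
continuous = np.concatenate([beat] * 5) + 0.01 * np.random.randn(1500)

ref = continuous[:300]                       # first beat as the reference
offset = beat_offset(continuous, ref, search_start=280, search_len=400)
aligned_start = 280 + offset                 # the same shift is then applied to all 12 leads
```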
Aims Left ventricular ejection fraction (LVEF) is suboptimal as a sole marker for predicting sudden cardiac death (SCD). Machine learning (ML) provides new opportunities for personalized predictions using complex, multimodal data. This study aimed to determine if risk stratification for implantable cardioverter-defibrillator (ICD) implantation can be improved by ML models that combine clinical variables with 12-lead electrocardiograms (ECG) time-series features. Methods and results A multicentre study of 1010 patients (64.9 ± 10.8 years, 26.8% female) with ischaemic, dilated, or non-ischaemic cardiomyopathy, and LVEF ≤ 35% implanted with an ICD between 2007 and 2021 for primary prevention of SCD in two academic hospitals was performed. For each patient, a raw 12-lead, 10-s ECG was obtained within 90 days before ICD implantation, and clinical details were collected. Supervised ML models were trained and validated on a development cohort (n = 550) from Hospital A to predict ICD non-arrhythmic mortality at three-year follow-up (i.e. mortality without prior appropriate ICD-therapy). Model performance was evaluated on an external patient cohort from Hospital B (n = 460). At three-year follow-up, 16.0% of patients had died, with 72.8% meeting criteria for non-arrhythmic mortality. Extreme gradient boosting models identified patients with non-arrhythmic mortality with an area under the receiver operating characteristic curve (AUROC) of 0.90 [95% confidence intervals (CI) 0.80-1.00] during internal validation. In the external cohort, the AUROC was 0.79 (95% CI 0.75-0.84). Conclusions ML models combining ECG time-series features and clinical variables were able to predict non-arrhythmic mortality within three years after device implantation in a primary prevention population, with robust performance in an independent cohort.
Authors: MZH Kolk, S Ruipérez-Campillo, B Deb, E Bekkers, CP Allaart, AJ Rogers, ACJ Van Der Lingen, I Isgum, B De Vos, P Clopton, et al.
Submitted: Europace
Date: 15.09.2023
The purpose of the study is to better understand the complex nature of loneliness in older adults and the potential contributing factors that may impact their sense of connection and well-being. The study utilized a mixed-methods approach, combining quantitative measures such as heart rate monitoring with qualitative data collected through interviews and surveys. The findings suggest that loneliness in older adults may be influenced by multiple factors, including their level of education, resilience, and empathy, as well as their incidence on spontaneous heart rate variations. The results highlight the importance of empathy in promoting social connectedness and reducing feelings of loneliness in older adults, and may have implications for developing targeted interventions aimed at reducing loneliness and improving the well-being of older adults.
Authors: R Cervigón, S Ruipérez-Campillo, J Millet, F Castells
Submitted: IEEE Mediterranean Conference on Medical and Biological Engineering and Computing (2023)
Date: 14.09.2023
Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations by exploiting structure in unlabeled data, allowing the model to achieve high accuracy even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful in limited-data scenarios, such as having labels for only 200 patients, which is common in medical applications.
AuthorsEce Özkan Elsen*, Thomas M. Sutter*, Yurong Hu, Sebastian Balzer, Julia E. Vogt* denotes shared first authorship
SubmittedGCPR 2023
Date01.09.2023
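An illustrative sketch of how artificial M-mode images can be cut from a B-mode echocardiogram video, in the spirit of the GCPR 2023 work above; sampling scan lines through the image centre at evenly spaced angles is an assumption made for this example, not the exact procedure of the paper.

```python
import numpy as np

def artificial_m_mode(video, angle_deg, num_samples=256):
    """Extract one artificial M-mode image from an echo video.

    video: array of shape (T, H, W) holding grayscale B-mode frames.
    A scan line through the image centre at the given angle is sampled in
    every frame; stacking those lines over time yields a (num_samples, T)
    M-mode image.
    """
    t, h, w = video.shape
    cy, cx = h / 2.0, w / 2.0
    theta = np.deg2rad(angle_deg)
    radius = min(h, w) / 2.0 - 1
    offsets = np.linspace(-radius, radius, num_samples)
    ys = np.clip(cy + offsets * np.sin(theta), 0, h - 1).astype(int)
    xs = np.clip(cx + offsets * np.cos(theta), 0, w - 1).astype(int)
    return video[:, ys, xs].T  # shape: (num_samples, T)

def m_mode_stack(video, num_modes=10):
    """Generate several M-mode images at evenly spaced angles, which can then
    be fed to an off-the-shelf backbone and aggregated."""
    angles = np.linspace(0, 180, num_modes, endpoint=False)
    return np.stack([artificial_m_mode(video, a) for a in angles])
```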
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. With recent advances in machine learning, data-driven decision support could help clinicians diagnose and manage patients while reducing the number of non-critical surgeries. However, previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored the use of abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. To this end, our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts that are understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed.
AuthorsRicards Marcinkevics*, Patricia Reis Wolfertstetter*, Ugne Klimiene*, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Özkan Elsen†, Christian Knorr†, Julia E. Vogt†* denotes shared first authorship, † denotes shared last authorship
SubmittedWorkshop on Machine Learning for Multimodal Healthcare Data, Co-located with ICML 2023
Date29.07.2023
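A minimal PyTorch sketch of the concept-bottleneck idea used in the appendicitis work above: an encoder first predicts clinician-understandable concepts, and the label is then predicted from those concepts alone. Layer sizes, the concept count, and the loss weighting are placeholder assumptions.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_features, num_concepts, num_classes, hidden=128):
        super().__init__()
        # Encoder mapping the input (e.g. image features) to concept logits.
        self.concept_net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, num_concepts),
        )
        # Label head that sees *only* the predicted concepts.
        self.label_net = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        concept_logits = self.concept_net(x)
        concepts = torch.sigmoid(concept_logits)
        return concept_logits, self.label_net(concepts)

def joint_loss(concept_logits, label_logits, concept_targets, labels, alpha=1.0):
    """Jointly supervise the concepts (binary targets) and the label."""
    concept_loss = nn.functional.binary_cross_entropy_with_logits(
        concept_logits, concept_targets)
    label_loss = nn.functional.cross_entropy(label_logits, labels)
    return label_loss + alpha * concept_loss
```

Because the label head consumes only the concept predictions, a clinician can intervene by overwriting individual concepts at test time and observing how the prediction changes.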
Abstract Ante-hoc interpretability has become the holy grail of explainable artificial intelligence for high-stakes domains such as healthcare; however, this notion is elusive, lacks a widely-accepted definition and depends on the operational context. It can refer to predictive models whose structure adheres to domain-specific constraints, or ones that are inherently transparent. The latter conceptualisation assumes observers who judge this quality, whereas the former presupposes them to have technical and domain expertise (thus alienating other groups of explainees). Additionally, the distinction between ante-hoc interpretability and the less desirable post-hoc explainability, which refers to methods that construct a separate explanatory model, is vague given that transparent predictive models may still require (post-)processing to yield suitable explanatory insights. Ante-hoc interpretability is thus an overloaded concept that comprises a range of implicit properties, which we unpack in this paper to better understand what is needed for its safe deployment across high-stakes domains. To this end, we outline modelling and explaining desiderata that allow us to navigate its distinct realisations in view of the envisaged application and audience.
AuthorsKacper Sokol, Julia E. Vogt
SubmittedWorkshop on Interpretable ML in Healthcare at 2023 International Conference on Machine Learning (ICML)
Date28.07.2023
Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to algorithmic and domain-specific constraints – such as density-based feasibility for the former and attribute (im)mutability or directionality of change for the latter – that aim to maximise their real-life utility. In addition to desiderata with respect to the counterfactual instance itself, the existence of a viable path connecting it with the factual data point, known as algorithmic recourse, has become an important technical consideration. While both of these requirements ensure that the steps of the journey as well as its destination are admissible, current literature neglects the multiplicity of such counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys and shows how to navigate, reason about and compare the geometry of these paths – their affinity, branching, divergence and possible future convergence – with two methods: vector spaces and graphs. Implementing this (interactive) explanatory process grants explainees more agency by allowing them to select counterfactuals based on the properties of the journey leading to them in addition to their absolute differences.
AuthorsKacper Sokol, Edward Small, Yueqing Xuan
SubmittedWorkshop on Counterfactuals in Minds and Machines at 2023 International Conference on Machine Learning (ICML)
Date28.07.2023
In this study, a novel unsupervised classification framework for medical time series is presented. This framework is based on the intersection of machine learning, Hilbert space algebra, and signal theory. The methodology is illustrated through the resolution of three biomedical engineering problems: neuronal activity tracking, protein functional classification, and non-invasive diagnosis of atrial flutter (AFL). The results indicate that the proposed algorithms exhibit high proficiency in solving these tasks and demonstrate robustness in identifying damaged neuronal units while tracking healthy ones. Moreover, the application of the framework to protein functional classification provides a new perspective for the development of pharmaceutical products and personalised medicine. Additionally, the controlled environment of the framework in the AFL simulation problem underscores the algorithm's ability to encode information efficiently. These results offer valuable insights into the potential of this framework and lay the groundwork for future studies. Clinical relevance: The framework proposed in this study has the potential to yield novel insights into the effects of newly implanted electrodes in the brain. Furthermore, the categorization of proteins by function could facilitate the development of personalised and efficient medicines, ultimately reducing both time and cost. The simulation of atrial flutter also demonstrates the framework's ability to encode information for arrhythmia diagnosis and treatment, which has the potential to lead to improved patient outcomes.
AuthorsS Ruipérez-Campillo, F Castells, J Millet
SubmittedIEEE Engineering in Medicine & Biology Society (45th EMBC, 2023)
Date24.07.2023
The development of high-density multielectrode catheters has significantly advanced cardiac electrophysiology mapping. High-density grid catheters have enabled the creation of a novel technique for reconstructing electrogram (EGM) signals known as "omnipole," which is believed to be more reliable than other methods, especially in terms of orientation independence. This study aims to evaluate how distance affects the omnipolar reconstruction of EGMs by comparing different configurations. Using an animal setup of perfused isolated rabbit hearts, recordings were taken using an ad hoc high-density epicardial multielectrode catheter. Inter-electrode distances ranging from 1 to 4 mm were analysed for their effect on the quality of resulting EGMs. Two biomarkers were computed to evaluate the robustness of the reconstructions: the areas contained within the bipolar loops and the amplitudes of the omnipoles. We hypothesised that both bipolar and omnipolar electrograms would be more robust at shorter inter-electrode distances. The results showed that an increase in distance triggers an increase in loop areas and amplitudes, which supports the hypothesis. This finding provides a more reliable estimate of wavefront propagation for the cross-omnipolar reconstruction method. These results emphasise the importance of distance in cardiac electrophysiology mapping and provide valuable insights into the use of high-density multielectrode catheters for EGM reconstruction. Clinical relevance: The results of this study have direct clinical relevance in the application of the described techniques to recording systems in the cardiac electrophysiology laboratory, enabling clinicians to obtain more precise characterisation of signals in the myocardium.
AuthorsM Crespo*, S Ruipérez-Campillo*, R Casado-Arroyo, J Millet, F Castells* denotes shared first authorship
SubmittedIEEE Engineering in Medicine & Biology Society (45th EMBC, 2023)
Date24.07.2023
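A rough sketch of the two biomarkers mentioned in the EMBC study above (bipolar loop area and omnipolar amplitude) for a square clique of four unipolar electrograms; the orthogonal-bipole construction and the shoelace-formula area are illustrative assumptions, not the exact clinical processing chain.

```python
import numpy as np

def clique_biomarkers(u_tl, u_tr, u_bl, u_br):
    """Loop area and omnipolar amplitude for one 2x2 electrode clique.

    u_tl, u_tr, u_bl, u_br: 1-D unipolar EGMs (same length) at the
    top-left, top-right, bottom-left and bottom-right electrodes.
    """
    # Two orthogonal bipolar signals spanning the clique.
    b_x = 0.5 * ((u_tr + u_br) - (u_tl + u_bl))   # horizontal bipole
    b_y = 0.5 * ((u_tl + u_tr) - (u_bl + u_br))   # vertical bipole

    # Area enclosed by the electric-field loop (shoelace formula); narrow
    # loops indicate a nearly planar, well-resolved wavefront.
    loop_area = 0.5 * np.abs(np.dot(b_x, np.roll(b_y, -1))
                             - np.dot(b_y, np.roll(b_x, -1)))

    # Omnipolar amplitude approximated by the maximum magnitude of the loop,
    # i.e. the largest projection over all possible bipole orientations.
    omni_amplitude = np.hypot(b_x, b_y).max()
    return loop_area, omni_amplitude
```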
The present study aims to design and fabricate a system capable of generating heterogeneities on the epicardial surface of an isolated rabbit heart perfused in a Langendorff system. The system consists of thermoelectric modules that can be independently controlled by the developed hardware, thereby allowing for the generation of temperature gradients on the epicardial surface, resulting in conduction slowing akin to heterogeneities of pathological origin. A comprehensive analysis of the system's viability was performed through modeling and thermal simulation, and its practicality was validated through preliminary tests conducted at the experimental cardiac electrophysiology laboratory of the University of Valencia. The design process involved the use of Fusion 360 for 3D designs, MATLAB/Simulink for algorithms and block diagrams, LTSpice and Altium Designer for schematic captures and PCB design, and the integration of specialized equipment for animal experimentation. The objective of the study was to efficiently capture epicardial recordings under varying conditions. Clinical relevance: The proposed system aims to induce local epicardial heterogeneities to generate correctly labeled signals that can serve as a gold standard for improving algorithms that identify and characterize fibrotic substrates. This improvement will enhance the efficacy of ablation processes and potentially reduce the ablated surface area.
AuthorsI Segarra, A Cebrián, S Ruipérez-Campillo, A Tormos, FJ Chorro, F Castells, A Alberola, J Millet
SubmittedIEEE Engineering in Medicine & Biology Society (45th EMBC, 2023)
Date24.07.2023
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on two different challenging experiments: variational clustering and inference of shared and independent generative factors under weak supervision.
AuthorsThomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt* denotes shared first authorship
SubmittedICML workshop on Structured Probabilistic Inference & Generative Modeling
Date23.07.2023
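A toy, non-differentiable illustration of the two-step partitioning idea from the random partition work above: first draw the number of elements per subset, then fill the subsets in a learned element order. The paper's contribution is a reparameterised, differentiable version; the sketch below only mirrors the generative structure, with made-up parameter names.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_partition(n_elements, subset_logits, element_scores):
    """Two-step partition sampling (toy version).

    1. Draw subset sizes summing to n_elements (here: a multinomial
       parameterised by subset_logits).
    2. Order the elements by learned scores (with Gumbel noise) and fill
       the subsets sequentially.
    """
    probs = np.exp(subset_logits) / np.exp(subset_logits).sum()
    sizes = rng.multinomial(n_elements, probs)

    order = np.argsort(element_scores + rng.gumbel(size=n_elements))[::-1]

    partition, start = [], 0
    for size in sizes:
        partition.append(order[start:start + size])
        start += size
    return partition

# Example: partition 6 elements into up to 3 subsets.
print(sample_partition(6, subset_logits=np.array([0.5, 0.2, -0.1]),
                       element_scores=np.array([2.0, 0.1, 1.3, -0.5, 0.7, 0.0])))
```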
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine-learning problems. However, assigning elements to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We propose a novel two-step method for learning distributions over partitions, including a reparametrization trick, to allow the inclusion of partitions in variational inference tasks. Our method works by first inferring the number of elements per subset and then sequentially filling these subsets in an order learned in a second step. We highlight the versatility of our general-purpose approach on two different experiments: multitask learning and unsupervised conditional sampling.
AuthorsThomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E. Vogt* denotes shared first authorship
SubmittedFifth Symposium on Advances in Approximate Bayesian Inference
Date18.07.2023
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling.
AuthorsLaura Manduchi*, Moritz Vandenhirtz*, Alain Ryser, Julia E. Vogt* denotes shared first authorship
SubmittedICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling
Date30.06.2023
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables. The proposed Tree Variational Autoencoder (TreeVAE) hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data. It adapts its architecture to discover the optimal tree for encoding dependencies between latent variables, improving generative performance. We show that TreeVAE uncovers underlying clusters in the data and finds meaningful hierarchical relations between the different groups on several datasets. Due to its generative nature, TreeVAE can generate new samples from the discovered clusters via conditional sampling.
AuthorsLaura Manduchi*, Moritz Vandenhirtz*, Alain Ryser, Julia E. Vogt* denotes shared first authorship
SubmittedICML 2023 Workshop on Deployment Challenges for Generative AI
Date30.06.2023
High-density catheters combined with Orientation Independent Sensing (OIS) methods have emerged as a groundbreaking technology for cardiac substrate characterisation. In this study, we aim to assess the arrangements and constraints to reliably estimate the so-called omnipolar electrogram (oEGM). Performance was evaluated using an experimental animal model. Thirty-eight recordings from nine retrospective experiments on isolated perfused rabbit hearts with an epicardial HD multielectrode were used. We estimated oEGMs according to the classic triangular clique (4 possible orientations) and a novel cross-orientation clique arrangement. Furthermore, we tested the effects of interelectrode spacing from 1 to 4 mm. Performance was evaluated by means of several parameters that measured amplitude rejection ratios, electric field loop area, activation pulse width and morphology distortion. Most reliable oEGM estimations were obtained with cross-configurations and interelectrode spacings ≤ 2 mm. Estimations from triangular cliques resulted in wider electric field loops and unreliable detection of the direction of the propagation wavefront. Moreover, increasing interelectrode distance resulted in increased pulse width and morphology distortion. The results prove that current oEGM estimation techniques are insufficiently accurate. This study opens a new standpoint for the design of new-generation HD catheters and mapping software.
AuthorsS Ruipérez-Campillo, M Crespo, A Tormos, A Guill, A Cebrián, A Alberola, J Heimer, FJ Chorro, J Millet, F Castells
SubmittedPhysical and Engineering Sciences in Medicine
Date26.06.2023
Multimodal VAEs have recently received significant attention as generative models for weakly-supervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favourably compares to alternative clustering approaches, in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for our method to have a priori knowledge of the true number of clusters, mitigating a critical limitation of previous clustering frameworks.
AuthorsEmanuele Palumbo, Sonia Laguna, Daphné Chopard, Julia E Vogt
SubmittedICML 2023 Workshop on Structured Probabilistic Inference/Generative Modeling
Date23.06.2023
Multimodal VAEs have recently received significant attention as generative models for weakly-supervised learning with multiple heterogeneous modalities. In parallel, VAE-based methods have been explored as probabilistic approaches for clustering tasks. Our work lies at the intersection of these two research directions. We propose a novel multimodal VAE model, in which the latent space is extended to learn data clusters, leveraging shared information across modalities. Our experiments show that our proposed model improves generative performance over existing multimodal VAEs, particularly for unconditional generation. Furthermore, our method favorably compares to alternative clustering approaches, in weakly-supervised settings. Notably, we propose a post-hoc procedure that avoids the need for our method to have a priori knowledge of the true number of clusters, mitigating a critical limitation of previous clustering frameworks.
AuthorsEmanuele Palumbo, Sonia Laguna, Daphné Chopard, Julia E Vogt
SubmittedICML 2023 Workshop on Deployable Generative AI
Date23.06.2023
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces B and T cell responses, contributing to virus neutralization. In a cohort of 2,911 young adults, we identified 65 individuals who had an asymptomatic or mildly symptomatic SARS-CoV-2 infection and characterized their humoral and T cell responses to the Spike (S), Nucleocapsid (N) and Membrane (M) proteins. We found that previous infection induced CD4 T cells that vigorously responded to pools of peptides derived from the S and N proteins. By using statistical and machine learning models, we observed that the T cell response highly correlated with a compound titer of antibodies against the Receptor Binding Domain (RBD), S and N. However, while serum antibodies decayed over time, the cellular phenotype of these individuals remained stable over four months. Our computational analysis demonstrates that in young adults, asymptomatic and paucisymptomatic SARS-CoV-2 infections can induce robust and long-lasting CD4 T cell responses that exhibit slower decays than antibody titers. These observations imply that next-generation COVID-19 vaccines should be designed to induce stronger cellular responses to sustain the generation of potent neutralizing antibodies.
AuthorsRicards Marcinkevics*, Pamuditha N. Silva*, Anna-Katharina Hankele*, Charlyn Dörnte, Sarah Kadelka, Katharina Csik, Svenja Godbersen, Algera Goga, Lynn Hasenöhrl, Pascale Hirschi, Hasan Kabakci, Mary P. LaPierre, Johanna Mayrhofer, Alexandra C. Title, Xuan Shu, Nouell Baiioud, Sandra Bernal, Laura Dassisti, Mara D. Saenz-de-Juano, Meret Schmidhauser, Giulia Silvestrelli, Simon Z. Ulbrich, Thea J. Ulbrich, Tamara Wyss, Daniel J. Stekhoven, Faisal S. Al-Quaddoomi, Shuqing Yu, Mascha Binder, Christoph Schultheiβ, Claudia Zindel, Christoph Kolling, Jörg Goldhahn, Bahram Kasmapour Seighalani, Polina Zjablovskaja, Frank Hardung, Marc Schuster, Anne Richter, Yi-Ju Huang, Gereon Lauer, Herrad Baurmann, Jun Siong Low, Daniela Vaqueirinho, Sandra Jovic, Luca Piccoli, Sandra Ciesek, Julia E. Vogt, Federica Sallusto, Markus Stoffel†, Susanne E. Ulbrich†* denotes shared first authorship, † denotes shared last authorship
SubmittedFrontiers in Immunology
Date29.05.2023
Spurious correlations are everywhere. While humans often do not perceive them, neural networks are notorious for learning unwanted associations, also known as biases, instead of the underlying decision rule. As a result, practitioners are often unaware of the biased decision-making of their classifiers. Such a biased model based on spurious correlations might not generalize to unobserved data, leading to unintended, adverse consequences. We propose Signal is Harder (SiH), a variational-autoencoder-based method that simultaneously trains a biased and unbiased classifier using a novel, disentangling reweighting scheme inspired by the focal loss. Using the unbiased classifier, SiH matches or improves upon the performance of state-of-the-art debiasing methods. To improve the interpretability of our technique, we propose a perturbation scheme in the latent space for visualizing the bias that helps practitioners become aware of the sources of spurious correlations.
AuthorsMoritz Vandenhirtz, Laura Manduchi, Ricards Marcinkevics, Julia E. Vogt
SubmittedDomain Generalization Workshop, ICLR 2023
Date04.05.2023
Partitioning a set of elements into subsets of a priori unknown sizes is essential in many applications. These subset sizes are rarely explicitly learned - be it the cluster sizes in clustering applications or the number of shared versus independent generative latent factors in weakly-supervised learning. Probability distributions over correct combinations of subset sizes are non-differentiable due to hard constraints, which prohibit gradient-based optimization. In this work, we propose the differentiable hypergeometric distribution. The hypergeometric distribution models the probability of different group sizes based on their relative importance. We introduce reparameterizable gradients to learn the importance between groups and highlight the advantage of explicitly learning the size of subsets in two typical applications: weakly-supervised learning and clustering. In both applications, we outperform previous approaches, which rely on suboptimal heuristics to model the unknown size of groups.
AuthorsThomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt
SubmittedICLR 2023
Date01.05.2023
Machine learning (ML) is a discipline emerging from computer science with close ties to statistics and applied mathematics. Its fundamental goal is the design of computer programs, or algorithms, that learn to perform a certain task in an automated manner. Without explicit rules or knowledge, ML algorithms observe and, possibly, interact with the surrounding world by the use of available data. Typically, as a result of learning, algorithms distil observations of complex phenomena into a general model which summarises the patterns, or regularities, discovered from the data. Modern ML algorithms regularly break records, achieving impressive performance at a wide range of tasks, e.g. game playing, protein structure prediction, searching for particles in high-energy physics, and forecasting precipitation. The utility of machine learning methods for healthcare is apparent: it is often argued that given vast amounts of heterogeneous data, our understanding of diseases, patient management and outcomes can be enriched with the insights from machine learning. In this chapter, we will provide a nontechnical introduction to the ML discipline aimed at a general audience with an affinity for biomedical applications. We will familiarise the reader with the common types of algorithms and typical tasks these algorithms can solve and illustrate these basic concepts by concrete examples of current machine learning applications in healthcare. We will conclude with a discussion of the open challenges, limitations, and potential impact of machine-learning-powered medicine.
AuthorsJulia E. Vogt, Ece Özkan Elsen, Ricards Marcinkevics
SubmittedChapter in Digital Medicine: Bringing Digital Solutions to Medical Practice
Date31.03.2023
Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
AuthorsImant Daunhawer, Alice Bizeul, Emanuele Palumbo, Alexander Marx, Julia E. Vogt
SubmittedThe Eleventh International Conference on Learning Representations, ICLR 2023
Date23.03.2023
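For readers unfamiliar with the training objective analysed in the ICLR 2023 identifiability paper above, here is a generic symmetric InfoNCE loss over paired modality embeddings (e.g. image/text). The encoders and the temperature value are placeholders; the paper's contribution is theoretical, not this particular implementation.

```python
import torch
import torch.nn.functional as F

def multimodal_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two batches of paired embeddings.

    z_a, z_b: tensors of shape (batch, dim) produced by modality-specific
    encoders; row i of z_a is paired with row i of z_b.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # pairwise similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Each sample must identify its true partner among all candidates,
    # in both directions (a -> b and b -> a).
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```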
Background and Objectives: Remote patient monitoring (RPM) of vital signs and symptoms for lung transplant recipients (LTRs) has become increasingly relevant in many situations. Nevertheless, RPM research integrating multisensory home monitoring in LTRs is scarce. We developed a novel multisensory home monitoring device and tested it in the context of COVID-19 vaccinations. We hypothesize that multisensory RPM and smartphone-based questionnaire feedback on signs and symptoms will be well accepted among LTRs. The aim was to assess the usability and acceptability of a remote monitoring system consisting of wearable devices, including home spirometry, and a smartphone-based questionnaire application for symptom and vital sign monitoring during the first and second SARS-CoV-2 vaccinations. Materials and Methods: Observational usability pilot study for six weeks of home monitoring with the COVIDA Desk for LTRs. During the first week after the vaccination, intensive monitoring was performed by recording data on physical activity, spirometry, temperature, pulse oximetry and self-reported symptoms, signs and additional measurements. During the subsequent days, the number of monitoring assessments was reduced. LTRs reported on their perceptions of the usability of the monitoring device through a purpose-designed questionnaire. Results: Ten LTRs planning to receive the first COVID-19 vaccinations were recruited. For the intensive monitoring study phase, LTRs recorded symptoms, signs and additional measurements. The most frequent adverse events reported were local pain, fatigue, sleep disturbance and headache. The duration of these symptoms was 5–8 days post-vaccination. Adherence to the main monitoring devices was high. LTRs rated usability as high. The majority were willing to continue monitoring. Conclusions: The COVIDA Desk showed favorable technical performance and was well accepted by the LTRs during the vaccination phase of the pandemic. The feasibility of the RPM system deployment was proven by the rapid recruitment uptake, technical performance (i.e., low number of errors), favorable user experience questionnaires and detailed individual user feedback.
AuthorsMace M. Schuurmans, Michal Muszynski, Xiang Li, Ricards Marcinkevics, Lukas Zimmerli, Diego Monserrat Lopez, Bruno Michel, Jonas Weiss, Rene Hage, Maurice Roeder, Julia E. Vogt, Thomas Brunschwiler
SubmittedMedicina
Date20.03.2023
Data scarcity is a fundamental problem since data lies at the heart of any ML project. For most applications, annotation is an expensive task in addition to data collection. Thus, the ability to learn in a sample-efficient manner from limited labeled data is critical for data-limited problems, such as healthcare applications. Self-supervised learning (SSL) can learn meaningful representations by exploiting structures in unlabeled data, which allows the model to achieve high accuracy in various downstream tasks, even with limited annotations. In this work, we extend contrastive learning, an efficient implementation of SSL, to cardiac imaging. We propose to use generated M(otion)-mode images from readily available B(rightness)-mode echocardiograms and design contrastive objectives with structure and patient-awareness. Experiments on EchoNet-Dynamic show that our proposed model can achieve an AUROC score of 0.85 by simply training a linear head on top of the learned representations, and is insensitive to the reduction of labeled data.
AuthorsHu Yurong, Thomas M. Sutter, Ece Oezkan, Julia E. Vogt
Submitted1st Workshop on Machine Learning & Global Health (ICLR 2023)
Date20.03.2023
Aims There is a clinical spectrum for atrial tachyarrhythmias wherein most patients with atrial tachycardia (AT) and some with atrial fibrillation (AF) respond to ablation, while others do not. It is undefined if this clinical spectrum has pathophysiological signatures. This study aims to test the hypothesis that the size of spatial regions showing repetitive synchronized electrogram (EGM) shapes over time reveals a spectrum from AT, to AF patients who respond acutely to ablation, to AF patients without acute response. Methods and results We studied n = 160 patients (35% women, 65.0 ± 10.4 years) of whom (i) n = 75 had AF terminated by ablation propensity matched to (ii) n = 75 without AF termination and (iii) n = 10 with AT. All patients had mapping by 64-pole baskets to identify areas of repetitive activity (REACT) to correlate unipolar EGMs in shape over time. Synchronized regions (REACT) were largest in AT, smaller in AF termination, and smallest in non-termination cohorts (0.63 ± 0.15, 0.37 ± 0.22, and 0.22 ± 0.18, P < 0.001). Area under the curve for predicting AF termination in hold-out cohorts was 0.72 ± 0.03. Simulations showed that lower REACT represented greater variability in clinical EGM timing and shape. Unsupervised machine learning of REACT and extensive (50) clinical variables yielded four clusters of increasing risk for AF termination (P < 0.01, χ2), which were more predictive than clinical profiles alone (P < 0.001). Conclusion The area of synchronized EGMs within the atrium reveals a spectrum of clinical response in atrial tachyarrhythmias. These fundamental EGM properties, which do not reflect any predetermined mechanism or mapping technology, predict outcome and offer a platform to compare mapping tools and mechanisms between AF patient groups.
AuthorsP Ganesan, B Deb, F Feng, M Rodrigo, S Ruipérez-Campillo, AJ Rogers, P Clopton, JJ Wang, S Zeemering, U Schotten, WJ Rappel, SM Narayan
SubmittedEuropace
Date18.03.2023
Electronic health records contain a wealth of valuable information for improving healthcare. There are, however, challenges associated with clinical text that prevent computers from maximising the utility of such information. While deep learning (DL) has emerged as a practical paradigm for dealing with the complexities of natural language, applying this class of machine learning algorithms to clinical text raises several research questions. First, we tackled the problem of data sparsity by looking into the task of adverse event detection. As these events are rare, examples thereof are lacking. To compensate for data scarcity, we leveraged large pre-trained language models (LMs) in combination with formally represented medical knowledge. We demonstrated that such a combination exhibits remarkable generalisation abilities despite the low availability of data. Second, we focused on the omnipresence of short forms in clinical texts. This typically leads to out-of-vocabulary problems, which motivates unlocking the underlying words. The novelty of our approach lies in its capacity to learn how to automatically expand short forms without resorting to external resources. Third, we investigated data augmentation to address the issue of data scarcity at its core. To the best of our knowledge, we were one of the first to investigate population-based augmentation for scheduling text data augmentation. Interestingly, little improvement was seen in fine-tuning large pre-trained LMs with the augmented data. We suggest that, as LMs proved able to cope well with small datasets, the need for data augmentation was made redundant. We conclude that DL approaches to clinical text mining should be developed by fine-tuning large LMs. One area where such models may struggle is the use of clinical short forms. Our method for automating their expansion fixes this issue. Together, these two approaches provide a blueprint for successfully developing DL approaches to clinical text mining in low-data regimes.
AuthorsDaphné Chopard
SubmittedPhD Thesis
Date15.03.2023
Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular, mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality, while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces, proposing an objective that overcomes certain dependencies on hyperparameters that arise for existing approaches with the same latent space structure. Compared to these existing approaches, we show increased robustness with respect to changes in the design of the latent space, in terms of the capacity allocated to modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
AuthorsEmanuele Palumbo, Imant Daunhawer, Julia E. Vogt
SubmittedThe Eleventh International Conference on Learning Representations, ICLR 2023
Date02.03.2023
Objective: The aim of this study is to propose a method to reduce the sensitivity of the estimated omnipolar electrogram (oEGM) with respect to the angle of the propagation wavefront. Methods: A novel configuration of cliques taking into account all four electrodes of a squared cell is proposed. To test this approach, simulations of HD grids of cardiac activations at different propagation angles, conduction velocities, interelectrode distances and electrogram waveforms are considered. Results: The proposed approach successfully provided narrower loops (essentially a straight line) of the electrical field described by the bipole pair with respect to the conventional approach. Estimation of the direction of propagation was improved. Additionally, estimated oEGMs presented larger amplitude, and estimations of the local activation times were more accurate. Conclusions: A novel method to improve the estimation of oEGMs in HD grids of electrodes is proposed. This approach is superior to the existing methods and avoids pitfalls not yet resolved. Relevance: Robust tools for quantifying the cardiac substrate are crucial to accurately determine target ablation sites during an electrophysiological procedure.
AuthorsF Castells*, S Ruipérez-Campillo*, I Segarra, R Cervigón, R Casado-Arroyo, JL Merino, J Millet* denotes shared first authorship
SubmittedComputers in Biology and Medicine
Date01.03.2023
Background Ventricular arrhythmia (VA) precipitating sudden cardiac death (SCD) is among the most frequent causes of death and poses a high burden on public health systems worldwide. The increasing availability of electrophysiological signals collected through conventional methods (e.g. electrocardiography (ECG)) and digital health technologies (e.g. wearable devices) in combination with novel predictive analytics using machine learning (ML) and deep learning (DL) holds potential for personalised predictions of arrhythmic events. Methods This systematic review and exploratory meta-analysis assesses the state-of-the-art of ML/DL models of electrophysiological signals for personalised prediction of malignant VA or SCD, and studies potential causes of bias (PROSPERO, reference: CRD42021283464). Five electronic databases were searched to identify eligible studies. Pooled estimates of the diagnostic odds ratio (DOR) and summary area under the curve (AUROC) were calculated. Meta-analyses were performed separately for studies using publicly available, ad-hoc datasets, versus targeted clinical data acquisition. Studies were scored on risk of bias by the PROBAST tool. Findings 2194 studies were identified of which 46 were included in the systematic review and 32 in the meta-analysis. Pooling of individual models demonstrated a summary AUROC of 0.856 (95% CI 0.755–0.909) for short-term (time-to-event up to 72 h) prediction and AUROC of 0.876 (95% CI 0.642–0.980) for long-term prediction (time-to-event up to years). While models developed on ad-hoc sets had higher pooled performance (AUROC 0.919, 95% CI 0.867–0.952), they had a high risk of bias related to the re-use and overlap of small ad-hoc datasets, choices of ML tool and a lack of external model validation. Interpretation ML and DL models appear to accurately predict malignant VA and SCD. However, wide heterogeneity between studies, in part due to small ad-hoc datasets and choice of ML model, may reduce the ability to generalise and should be addressed in future studies.
AuthorsMZH Kolk, B Deb, S Ruipérez-Campillo, NK Bhatia, P Clopton, AAM Wilde, SM Narayan, RE Knops, FVY Tjong
SubmittedThe Lancet eBiomedicine
Date01.03.2023
Interpretability and explainability are crucial for machine learning (ML) and statistical applications in medicine, economics, law, and natural sciences and form an essential principle for ML model design and development. Although interpretability and explainability have escaped a precise and universal definition, many models and techniques motivated by these properties have been developed over the last 30 years, with the focus currently shifting toward deep learning. We will consider concrete examples of state-of-the-art, including specially tailored rule-based, sparse, and additive classification models, interpretable representation learning, and methods for explaining black-box models post hoc. The discussion will emphasize the need for and relevance of interpretability and explainability, the divide between them, and the inductive biases behind the presented “zoo” of interpretable models and explanation methods.
AuthorsRicards Marcinkevics, Julia E. Vogt
SubmittedWIREs Data Mining and Knowledge Discovery
Date28.02.2023
2022
Many modern research fields increasingly rely on collecting and analysing massive, often unstructured, and unwieldy datasets. Consequently, there is growing interest in machine learning and artificial intelligence applications that can harness this 'data deluge'. This broad nontechnical overview provides a gentle introduction to machine learning with a specific focus on medical and biological applications. We explain the common types of machine learning algorithms and typical tasks that can be solved, illustrating the basics with concrete examples from healthcare. Lastly, we provide an outlook on open challenges, limitations, and potential impacts of machine-learning-powered medicine.
AuthorsRicards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt
SubmittedArxiv
Date23.12.2022
The robustness of machine learning algorithms to distribution shift is primarily discussed in the context of supervised learning (SL). As such, there is a lack of insight into the robustness of the representations learned from unsupervised methods, such as self-supervised learning (SSL) and auto-encoder based algorithms (AE), to distribution shift. We posit that the input-driven objectives of unsupervised algorithms lead to representations that are more robust to distribution shift than the target-driven objective of SL. We verify this by extensively evaluating the performance of SSL and AE on both synthetic and realistic distribution shift datasets. Following observations that the linear layer used for classification itself can be susceptible to spurious correlations, we evaluate the representations using a linear head trained on a small amount of out-of-distribution (OOD) data, to isolate the robustness of the learned representations from that of the linear head. We also develop "controllable" versions of existing realistic domain generalisation datasets with adjustable degrees of distribution shifts. This allows us to study the robustness of different learning algorithms under versatile yet realistic distribution shift conditions. Our experiments show that representations learned from unsupervised learning algorithms generalise better than SL under a wide variety of extreme as well as realistic distribution shifts.
AuthorsYuge Shi, Imant Daunhawer, Julia E. Vogt, Philip H.S. Torr, Amartya Sanyal
SubmittedThe Eleventh International Conference on Learning Representations, ICLR 2023
Date16.12.2022
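A small sketch of the linear-probe protocol used in the distribution-shift study above: freeze the learned representation, fit only a linear head (here scikit-learn logistic regression) on a small amount of labelled, possibly out-of-distribution data, and evaluate. The array names and dimensions are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(train_features, train_labels, test_features, test_labels):
    """Fit a linear classifier on frozen features and report test accuracy.

    Keeping the representation fixed isolates its quality (and robustness)
    from that of the classification head trained on top of it.
    """
    head = LogisticRegression(max_iter=1000)
    head.fit(train_features, train_labels)
    return head.score(test_features, test_labels)

# Example with random placeholder features (128-dim) standing in for an encoder.
rng = np.random.default_rng(0)
feats_ood = rng.normal(size=(200, 128))     # small labelled OOD subset
labels_ood = rng.integers(0, 2, size=200)
feats_test = rng.normal(size=(100, 128))
labels_test = rng.integers(0, 2, size=100)
print(linear_probe(feats_ood, labels_ood, feats_test, labels_test))
```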
Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), which is used to diagnose cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is both time-consuming and expertise-demanding, raising the need for an automated approach. Earlier automated works have been limited to still images or use echocardiogram videos with spatio-temporal convolutions in a complex pipeline. In this work, we propose to generate images from readily available echocardiogram videos, each image mimicking an M(otion)-mode image from a different scan line through time. We then combine different M-mode images using off-the-shelf model architectures to estimate the EF and, thus, diagnose cardiomyopathy. Our experiments show that our proposed method converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process.
AuthorsThomas Sutter, Sebastian Balzer, Ece Özkan Elsen, Julia E. Vogt
SubmittedMedical Imaging Meets NeurIPS Workshop 2022
Date02.12.2022
Three-dimensional imaging of live processes at a cellular level is a challenging task. It requires high-speed acquisition capabilities, low phototoxicity, and low mechanical disturbances. Three-dimensional imaging in microfluidic devices poses additional challenges as a deep penetration of the light source is required, along with a stationary setting, so the flows are not perturbed. Different types of fluorescence microscopy techniques have been used to address these limitations; particularly, confocal microscopy and light sheet fluorescence microscopy (LSFM). This manuscript proposes a novel architecture of a type of LSFM, single-plane illumination microscopy (SPIM). This custom-made microscope includes two mirror galvanometers to scan the sample vertically and reduce shadowing artifacts while avoiding unnecessary movement. In addition, two electro-tunable lenses fine-tune the focus position and reduce the scattering caused by the microfluidic devices. The microscope has been fully set up and characterized, achieving a resolution of 1.50 μm in the x-y plane and 7.93 μm in the z-direction. The proposed architecture has risen to the challenges posed when imaging microfluidic devices and live processes, as it can successfully acquire 3D volumetric images together with time-lapse recordings, and it is thus a suitable microscopic technique for live tracking miniaturized tissue and disease models.
AuthorsClara Gomez-Cruz, Sonia Laguna, Ariadna Bachiller-Pulido, Cristina Quilez, Marina Cañadas-Ortega, Ignacio Albert-Smet, Jorge Ripoll, Arrate Muñoz-Barrutia
SubmittedBiosensors
Date01.12.2022
Synthetic super-resolved images generated by a machine learning algorithm from portable low-field-strength (0.064-T) brain MRI had good agreement with real images at high field strength (1.5–3 T).
AuthorsJuan Eugenio Iglesias, Riana Schleicher, Sonia Laguna, Benjamin Billot, Pamela Schaefer, Brenna McKaig, Joshua N Goldstein, Kevin N Sheth, Matthew S Rosen, W Taylor Kimberly
SubmittedRadiology
Date08.11.2022
Humans naturally integrate various senses to understand our surroundings, enabling us to compensate for partially missing sensory input. On the contrary, machine learning models excel at harnessing extensive datasets but face challenges in handling missing data effectively. While utilizing multiple data types provides a more comprehensive perspective, it also raises the likelihood of encountering missing values, underscoring the significance of proper missing data management in machine learning techniques. In this thesis, we advocate for developing machine learning models that emulate the human approach of merging diverse sensory inputs into a unified representation, demonstrating resilience in the face of missing input sources. Generating labels for multiple data types is laborious and often costly, resulting in a scarcity of fully annotated multimodal datasets. On the other hand, multimodal data naturally possesses a form of weak supervision. We understand that these samples describe the same event and assume that certain underlying generative factors are shared among the group members, providing a form of weak guidance. Our thesis focuses on learning from data characterized by weak supervision, delving into the interrelationships among group members. We start by exploring novel techniques for machine learning models capable of processing multimodal inputs while effectively handling missing data. Our emphasis is on variational autoencoders (VAE) for learning from weakly supervised data. We introduce a generalized formulation of probabilistic aggregation functions, designed to overcome the limitations of previous …
AuthorsThomas M. Sutter
Date30.09.2022
Background: Arm use metrics derived from wrist-mounted movement sensors are widely used to quantify the upper limb performance in real-life conditions of individuals with stroke throughout motor recovery. The calculation of real-world use metrics, such as arm use duration and laterality preferences, relies on accurately identifying functional movements. Hence, classifying upper limb activity into functional and non-functional classes is paramount. Acceleration thresholds are conventionally used to distinguish these classes. However, these methods are challenged by the high inter- and intra-individual variability of movement patterns. In this study, we developed and validated a machine learning classifier for this task and compared it to methods using conventional and optimal thresholds. Methods: Individuals after stroke were video-recorded in their home environment performing semi-naturalistic daily tasks while wearing wrist-mounted inertial measurement units. Data were labeled frame-by-frame following the Taxonomy of Functional Upper Limb Motion definitions, excluding whole-body movements, and sequenced into 1-s epochs. Actigraph counts were computed, and an optimal threshold for functional movement was determined by receiver operating characteristic curve analyses on group and individual levels. A logistic regression classifier was trained on the same labels using time and frequency domain features. Performance measures were compared between all classification methods. Results: Video data (6.5 h) of 14 individuals with mild-to-severe upper limb impairment were labeled. Optimal activity count thresholds were ≥20.1 for the affected side and ≥38.6 for the unaffected side and showed high predictive power with an area under the curve (95% CI) of 0.88 (0.87, 0.89) and 0.86 (0.85, 0.87), respectively. A classification accuracy of around 80% was equivalent to the optimal threshold and machine learning methods and outperformed the conventional threshold by ∼10%. Optimal thresholds and machine learning methods showed superior specificity (75–82%) to conventional thresholds (58–66%) across unilateral and bilateral activities. Conclusion: This work compares the validity of methods classifying stroke survivors' real-life arm activities measured by wrist-worn sensors excluding whole-body movements. The determined optimal thresholds and machine learning classifiers achieved an equivalent accuracy and higher specificity than conventional thresholds. Our open-sourced classifier or optimal thresholds should be used to specify the intensity and duration of arm use.
AuthorsJohannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope
SubmittedFrontiers in Physiology
Date28.09.2022
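A compact sketch of how a per-epoch activity-count threshold can be tuned by ROC analysis, in the spirit of the arm-use study above; selecting the cut-off by Youden's index and the toy count distributions are assumptions made for this example.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def optimal_count_threshold(counts, is_functional):
    """Pick the activity-count cut-off that best separates functional from
    non-functional 1-second epochs, using Youden's J on the ROC curve."""
    fpr, tpr, thresholds = roc_curve(is_functional, counts)
    best = np.argmax(tpr - fpr)   # Youden's J = sensitivity + specificity - 1
    auc = roc_auc_score(is_functional, counts)
    return thresholds[best], auc

# Toy example: higher counts tend to correspond to functional movement.
rng = np.random.default_rng(1)
counts = np.concatenate([rng.poisson(15, 500), rng.poisson(45, 500)])
labels = np.concatenate([np.zeros(500), np.ones(500)])
print(optimal_count_threshold(counts, labels))
```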
Background: Stroke leads to motor impairment which reduces physical activity, negatively affects social participation, and increases the risk of secondary cardiovascular events. Continuous monitoring of physical activity with motion sensors is promising to allow the prescription of tailored treatments in a timely manner. Accurate classification of gait activities and body posture is necessary to extract actionable information for outcome measures from unstructured motion data. We here develop and validate a solution for various sensor configurations specifically for a stroke population. Methods: Video and movement sensor data (locations: wrists, ankles, and chest) were collected from fourteen stroke survivors with motor impairment who performed real-life activities in their home environment. Video data were labeled for five classes of gait and body postures and three classes of transitions that served as ground truth. We trained support vector machine (SVM), logistic regression (LR), and k-nearest neighbor (kNN) models to identify gait bouts only or gait and posture. Model performance was assessed by the nested leave-one-subject-out protocol and compared across five different sensor placement configurations. Results: Our method achieved very good performance when predicting real-life gait versus non-gait (Gait classification) with an accuracy between 85% and 93% across sensor configurations, using SVM and LR modeling. On the much more challenging task of discriminating between the body postures lying, sitting, and standing as well as walking, and stair ascent/descent (Gait and postures classification), our method achieves accuracies between 80% and 86% with at least one ankle and wrist sensor attached unilaterally. The Gait and postures classification performance between SVM and LR was equivalent but superior to kNN. Conclusion: This work presents a comparison of performance when classifying Gait and body postures in post-stroke individuals with different sensor configurations, which provide options for subsequent outcome evaluation. We achieved accurate classification of gait and postures performed in a real-life setting by individuals with a wide range of motor impairments due to stroke. This validated classifier will hopefully prove a useful resource to researchers and clinicians in the increasingly important field of digital health in the form of remote movement monitoring using motion sensors.
AuthorsJohannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope
SubmittedFrontiers in Physiology
Date26.09.2022
Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, accurate and early detection of PH is crucial for successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.
AuthorsHanna Ragnarsdottir, Laura Manduchi, Holger Michel, Fabian Laumer, Sven Wellmann, Ece Özkan Elsen, Julia E. Vogt
SubmittedDAGM German Conference on Pattern Recognition
Date20.09.2022
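A brief sketch of the per-view prediction plus majority-vote aggregation described in the pulmonary-hypertension work above; the view names and the tie-breaking rule are illustrative assumptions.

```python
from collections import Counter

def aggregate_views(view_predictions):
    """Combine class predictions from several echocardiographic views by
    majority vote; ties go to the earliest-listed view whose prediction
    is among the tied labels.

    view_predictions: dict mapping view name -> predicted class label.
    """
    votes = Counter(view_predictions.values())
    top_count = max(votes.values())
    winners = {label for label, count in votes.items() if count == top_count}
    for view, label in view_predictions.items():   # deterministic tie-break
        if label in winners:
            return label

# Example with three hypothetical views.
print(aggregate_views({"apical": "PH", "parasternal": "no PH", "subcostal": "PH"}))
```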
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and discrete number of subsets is inherently non-differentiable, prohibiting end-to-end gradient-based optimization of parameters. We overcome this limitation by proposing a novel two-step method for inferring partitions, which allows its usage in variational inference tasks. This new approach enables reparameterized gradients with respect to the parameters of the new random partition model. Our method works by inferring the number of elements per subset and, second, by filling these subsets in a learned order. We highlight the versatility of our general-purpose approach on two different challenging experiments: multitask learning and inference of shared and independent generative factors under weak supervision.
AuthorsThomas M. Sutter*, Alain Ryser*, Joram Liebeskind, Julia E Vogt* denotes shared first authorship
SubmittedICML 2023 Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
Date17.09.2022
CD8+ T cells underpin effective anti-tumor immune responses in melanoma; however, their functions are attenuated due to various immunosuppressive factors in the tumor microenvironment (TME), resulting in disease progression. T cell function is elicited by the T cell receptor (TCR), which recognizes antigen peptide-major histocompatibility complex (pMHC) expressed on tumor cells via direct physical contact, i.e., two-dimensional (2D) interaction. TCR–pMHC 2D affinity plays a central role in antigen recognition and discrimination, and is sensitive to both the conditions of the T cell and the microenvironment in which it resides. Herein, we demonstrate that CD8+ T cells residing in TME have lower 2D TCR–pMHC bimolecular affinity and TCR–pMHC–CD8 trimolecular avidity, pull fewer TCR–pMHC bonds by endogenous forces, flux lower level of intracellular calcium in response to antigen stimulation, exhibit impaired in vivo activation, and show diminished anti-tumor effector function. These detrimental effects are localized in the tumor and tumor draining lymph node (TdLN), and affect both antigen-inexperienced and antigen-experienced CD8+ T cells irrespective of their TCR specificities. These findings implicate impaired antigen recognition as a mechanism of T cell dysfunction in the TME.
AuthorsZ Yuan, MJ O’Melia, K Li, J Lyu, F Zhou, P Jothikumar, NA Rohner, MP Manspeaker, DM Francis, K Bai, C Ge, MN Rushdi, L Chingozha, S Ruipérez-Campillo, H Lu, SN Thomas, C Zhu
SubmittedbioRxiv
Date13.09.2022
The diagnosis and treatment of cardiac arrhythmias relies on catheter recordings, which may be inefficient because of the continued use of the bipolar processing and analysis techniques of traditional catheters, missing the potential of the novel matrix catheters. This results in the need for more processing of the signals and longer cardiac scans to obtain accurate information about the state of the tissue being analysed. This study proposes a new clique configuration to compute omnipolar EGM (oEGM) in multi-electrode array catheters to obtain parameters of interest in a more robust and accurate manner. Numerous simulations with varying input parameters are designed to emulate the propagation of electrical activity on the cardiac tissue surface captured by the catheter and characterise the differences between the classic method of omnipolar analysis (triangular clique) and our proposed new method (cross clique). The results show that the cross clique is more robust to variations in the direction of wave propagation, and more accurate in the estimation of the local activation time (LAT).
AuthorsI Segarra, S Ruipérez-Campillo, F Castells, J Millet
SubmittedIEEE Computing in Cardiology (49th CinC, 2022)
Date04.09.2022
Automated segmentation of myocardial fibrosis in late gadolinium enhancement (LGE) cardiac MRI (CMR) has the potential to improve efficiency and precision of diagnosis and treatment of cardiomyopathies. However, state-of-the-art Deep Learning approaches require manual pixel-level annotations. Using weaker labels can greatly reduce manual annotation time and expedite dataset curation, which is why we propose fibrosis segmentation methods using either slice-level or stack-level fibrosis labels. 5759 short-axis LGE CMR image slices were retrospectively obtained from 482 patients. U-Nets with slice-level and stack-level supervision are trained with 446 weakly-labeled patients by making use of a myocardium segmentation U-Net and fibrosis classification Dilated Residual Networks (DRN). For comparison, a U-Net is trained with pixel-level supervision using a training set of 81 patients. On the proprietary test set of 24 patients, pixel-level, slice-level and stack-level supervision reach Dice scores of 0.74, 0.70 and 0.70, while on the external Emidec dataset of 100 patients Dice scores of 0.55, 0.61 and 0.52 were obtained. Results indicate that using larger weakly-annotated datasets can approach the performance of methods using pixel-level annotated datasets and potentially improve generalization to external datasets.
AuthorsRC Klein, RE van Lieshout, MZH Kolk, K Geijtenbeek, R Vos, S Ruipérez-Campillo, R Feng, B Deb, P Ganesan, RE Knops, I Isgum, SM Narayan, E Bekkers, B Vos, FVY Tjong
SubmittedIEEE Computing in Cardiology (49th CinC, 2022)
Date04.09.2022
Many patients at high risk of life-threatening ventricular arrhythmias (VA) and sudden cardiac death (SCD) who received an implantable cardioverter defibrillator (ICD), never receive appropriate device therapy. The presence of fibrosis on LGE CMR imaging is shown to be associated with increased risk of VA. Therefore, there is a strong need for both automatic segmentation and quantification of cardiac fibrosis as well as better risk stratification for SCD. This study first presents a novel two-stage deep learning network for the segmentation of left ventricle myocardium and fibrosis on LGE CMR images. Secondly it aims to effectively predict device therapy in ICD patients by using a graph neural network approach which incorporates both myocardium and fibrosis features as well as the left ventricle geometry. Our segmentation network outperforms previous state-of-the-art methods on 2D CMR data, reaching a Dice score of 0.82 and 0.77 on myocardium and fibrosis segmentation, respectively. The ICD therapy prediction network reaches an AUC of 0.60 while using only CMR data and outperforms baseline methods based on current guideline markers for ICD implantation. This work lays a strong basis for future research on improved risk stratification for VA and SCD.
AuthorsFE van Lieshout, RC Klein, MZH Kolk, K van Geijtenbeek, R Vos, S Ruipérez-Campillo, R Feng, B Deb, P Ganesan, RE Knops, I Isgum, SM Narayan, E Bekkers, B Vos, FVY Tjong
SubmittedIEEE Computing in Cardiology (49th CinC, 2022)
Date04.09.2022
Accurate non-invasive diagnoses in the context of cardiac diseases are problems that hitherto remain unresolved. We propose an unsupervised classification of atrial flutter (AFL) using dimensional transforms of ECG signals in high dimensional vector spaces. A mathematical model is used to generate synthetic signals based on clinical AFL signals, and hierarchical clustering analysis and novel machine learning (ML) methods are designed for the unsupervised classification. Metrics and accuracy parameters are created to assess the performance of the model, demonstrating the power of this novel approach for the diagnosis of AFL from ECG using innovative AI algorithms.
AuthorsS Ruipérez-Campillo, J Millet, F Castells
SubmittedIEEE Computing in Cardiology (49th CinC, 2022)
Date04.09.2022
Persistent atrial fibrillation ablation has a high recurrence rate. In this work, we performed an analysis of bipolar intracavitary signals obtained with a conventional 24-pole diagnostic catheter (Woven Orbiter) placed in the right atrium and coronary sinus in a cohort of patients with persistent atrial fibrillation undergoing ablation to detect features predictive of acute procedural success (conversion to sinus rhythm during ablation) and the occurrence of recurrences. The goal is to arrive at a quantitative description of the degree of randomness of the atrial response in atrial fibrillation and to demonstrate the presence of hidden periodic components. This was done by the determination of the autocorrelation function. Results showed that higher correlation in relative maximum peaks, and a lower dominant atrial frequency (greater distance between relative amplitude maxima) may be associated with a greater likelihood of achieving reversion to sinus rhythm and lower probability of recurrences. A larger study is needed to draw conclusions.
AuthorsR Cervigón, E Franco, S Ruipérez-Campillo, C Lozano, F Castells, J Moreno
SubmittedIEEE Computing in Cardiology (49th CinC, 2022)
Date04.09.2022
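The autocorrelation analysis described in the entry above can be sketched as follows: compute the normalized autocorrelation of a zero-mean intracavitary signal and read the dominant atrial cycle length off its first prominent peak. The sampling rate, peak thresholds, and synthetic test signal are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def dominant_cycle_length(signal, fs):
    """Return (lag in seconds, correlation value) of the first prominent autocorrelation peak."""
    x = signal - signal.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0]                                   # normalize so that lag 0 has correlation 1
    peaks, props = find_peaks(acf, height=0.1, prominence=0.2)
    if len(peaks) == 0:
        return None, None
    return peaks[0] / fs, float(props["peak_heights"][0])

fs = 1000.0                                         # Hz, assumed sampling rate
rng = np.random.default_rng(0)
t = np.arange(0, 5, 1 / fs)
signal = np.sin(2 * np.pi * 6.0 * t) + 0.3 * rng.normal(size=len(t))  # ~6 Hz synthetic "atrial" activity
lag_s, corr = dominant_cycle_length(signal, fs)
print(f"dominant cycle ~{lag_s:.3f} s (~{1 / lag_s:.1f} Hz), peak correlation {corr:.2f}")
```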
Loneliness in older adults is associated with functional decline, depression and even death. Given the prevalence of loneliness, the aim of this study was to examine the association between loneliness and cardiac biomarkers in older people attending cardiology consultations. The results showed that loneliness was more prevalent in women than in men, and it was also associated with marital status. ECG recordings were analyzed; the QT interval and T-wave length showed higher values in people suffering from loneliness, as well as a higher cardiac frequency, with the presence of meaning in life appearing to be a protective factor. Studies with a larger sample size are needed, but these results appear to show a relationship between biomarkers and mental state.
AuthorsML Cardo, A Chulián, S Ruipérez-Campillo, J Millet, F Castells, R Cervigón
SubmittedIEEE Computing in Cardiology (49th CinC, 2022)
Date04.09.2022
We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn three variants of a variational latent trajectory model (TVAE). While the first two variants (TVAE-C and TVAE-R) model strict periodic movements of the heart, the third (TVAE-S) is more general and allows shifts in the spatial representation throughout the video. All models are trained on the healthy samples of a novel in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders when detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method enables interpretable explanations of its output through heatmaps highlighting the regions corresponding to anomalous heart structures.
AuthorsAlain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt
SubmittedThe Seventh Machine Learning for Healthcare Conference, MLHC 2022
Date05.08.2022
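For context, the baseline that the trajectory models above are compared against scores samples by how poorly a standard VAE trained on healthy data explains them. One common way to instantiate such a score is the negative ELBO; the sketch below shows this generic variant, with placeholder architecture and dimensionalities, not the study's exact MAP procedure.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, d_in=256, d_lat=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, d_lat), nn.Linear(64, d_lat)
        self.dec = nn.Sequential(nn.Linear(d_lat, 64), nn.ReLU(), nn.Linear(64, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def anomaly_score(model, x):
    """Negative ELBO per sample: reconstruction error plus KL divergence to the prior."""
    x_hat, mu, logvar = model(x)
    rec = ((x - x_hat) ** 2).sum(dim=1)                        # Gaussian likelihood term (up to constants)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return rec + kl

model = TinyVAE()                                              # would be trained on healthy samples only
scores = anomaly_score(model, torch.randn(8, 256))
print(scores.shape)                                            # torch.Size([8]); higher = more anomalous
```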
Deep neural networks for image-based screening and computer-aided diagnosis have achieved expert-level performance on various medical imaging modalities, including chest radiographs. Recently, several works have indicated that these state-of-the-art classifiers can be biased with respect to sensitive patient attributes, such as race or gender, leading to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making in healthcare. Fair machine learning has focused on mitigating such biases against disadvantaged or marginalised groups, mainly concentrating on tabular data or natural images. This work presents two novel intra-processing techniques based on fine-tuning and pruning an already-trained neural network. These methods are simple yet effective and can be readily applied post hoc in a setting where the protected attribute is unknown during the model development and test time. In addition, we compare several intra- and post-processing approaches applied to debiasing deep chest X-ray classifiers. To the best of our knowledge, this is one of the first efforts studying debiasing methods on chest radiographs. Our results suggest that the considered approaches successfully mitigate biases in fully connected and convolutional neural networks offering stable performance under various settings. The discussed methods can help achieve group fairness of deep medical image classifiers when deploying them in domains with different fairness considerations and constraints.
AuthorsRicards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt
SubmittedThe Seventh Machine Learning for Healthcare Conference, MLHC 2022
Date05.08.2022
Arguably, interpretability is one of the guiding principles behind the development of machine-learning-based healthcare decision support tools and computer-aided diagnosis systems. There has been a renewed interest in interpretable classification based on high-level concepts, including, among other model classes, the re-exploration of concept bottleneck models. By their nature, medical diagnosis, patient management, and monitoring require the assessment of multiple views and modalities to form a holistic representation of the patient's state. For instance, in ultrasound imaging, a region of interest might be registered from multiple views that are informative about different sets of clinically relevant features. Motivated by this, we extend the classical concept bottleneck model to the multiview classification setting by representation fusion across the views. We apply our multiview concept bottleneck model to the dataset of ultrasound images acquired from a cohort of pediatric patients with suspected appendicitis to predict the disease. The results suggest that auxiliary supervision from the concepts and aggregation across multiple views help develop more accurate and interpretable classifiers.
AuthorsUgne Klimiene*, Ricards Marcinkevics*, Patricia Reis Wolfertstetter, Ece Özkan Elsen, Alyssia Paschke, David Niederberger, Sven Wellmann, Christian Knorr, Julia E Vogt* denotes shared first authorship
SubmittedOral spotlight at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022
Date23.07.2022
We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn different variants of a variational latent trajectory model (TVAE). The models are trained on the healthy samples of an in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders on the task of detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method provides interpretable explanations of its output through heatmaps which highlight the regions corresponding to anomalous heart structures.
AuthorsAlain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt
SubmittedPoster at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022
Date23.07.2022
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces both B and T cell responses which jointly contribute to effective neutralization and clearance of the virus. Multiple compartments of circulating immune memory to SARS-CoV-2 are not fully understood. We analyzed humoral and T cell immune responses in young convalescent adults with previous asymptomatic SARS-CoV-2 infections or mildly symptomatic COVID-19 disease. We concomitantly measured antibodies in the blood and analyzed SARS-CoV-2-reactive T cell reaction in response to overlapping peptide pools of four viral proteins in peripheral blood mononuclear cells (PBMC). Using statistical and machine learning models, we investigated whether T cell reactivity predicted antibody status. Individuals with previous SARS-CoV-2 infection differed in T cell responses from non-infected individuals. Subjects with previous SARS-CoV-2 infection exhibited CD4+ T cell responses against S1-, N-proteins and CoV-Mix (containing N, M and S protein-derived peptides) that were dominant over CD8+ T cells. At the same time, signals against the M protein were less pronounced. Double positive IL2+/CD154+ and IFN+/TNF+ CD4+ T cells showed the strongest association with antibody titers. T-cell reactivity to CoV-Mix-, S1-, and N-antigens were most strongly associated with humoral immune response, specifically with a compound antibody titer consisting of RBD, S1, S2, and NP. The T cell phenotype of SARS-CoV-2 infected individuals was stable for four months, thereby exceeding antibody decay rates. Our findings demonstrate that mild COVID-19 infections can elicit robust SARS-CoV-2 T-cell reactive immunity against specific components of SARS-CoV-2.
AuthorsRicards Marcinkevics*, Pamuditha Silva*, Anna-Katharina Hankele*, Katharina Csik, Svenja Godbersen, Algera Goga, Lynn Hasenöhrl, Pascale Hirschi, Hasan Kabakci, Mary P LaPierre, Johanna Mayrhofer, Alexandra Title, Xuan Shu, Nouell Baiioud, Sandra Bernal, Laura Dassisti, Mara D Saenz-de-Juano, Meret Schmidhauser, Giulia Silvestrelli, Simon Z Ulbrich, Thea J Ulbrich, Tamara Wyss, Daniel J Stekhoven, Faisal S Al-Quaddoomi, Shuqing Yu, Mascha Binder, Christoph Schultheiss, Claudia Zindel, Christoph Kolling, Jörg Goldhahn, Bahram Kasmapour, Polina Zjablovskaja, Frank Hardung, Anne Richter, Stefan Miltenyi, Luca Piccoli, Sandra Ciesek, Julia E Vogt, Federica Sallusto, Markus Stoffel†, Susanne E Ulbrich†* denotes shared first authorship, † denotes shared last authorship
SubmittedThe 1st Workshop on Healthcare AI and COVID-19 at ICML 2022
Date22.07.2022
Portable low-field MRI has the potential to revolutionize neuroimaging, by enabling point-of-care imaging and affordable scanning in underserved areas. The lower resolution and signal-to-noise ratio of these scans preclude image analysis with existing tools. Super-resolution (SR) methods can overcome this limitation, but: (i) training with downsampled high-field scans fails to generalize; and (ii) training with paired low/high-field data is hard due to the lack of perfectly aligned images. Here, we present an architecture that combines denoising, SR and domain adaptation modules to tackle this problem. The denoising and SR components are pretrained in a supervised fashion with large amounts of existing high-resolution data, whereas unsupervised learning is used for domain adaptation and end-to-end finetuning. We present preliminary results on a dataset of 11 low-field scans. The results show that our method enables segmentation with existing tools, which yield ROI volumes that correlate strongly with those derived from high-field scans (ρ > 0.8).
AuthorsSonia Laguna, Riana Schleicher, Benjamin Billot, Pamela Schaefer, Brenna McKaig, Joshua N Goldstein, Kevin N Sheth, Matthew S Rosen, W Taylor Kimberly, Juan Eugenio Iglesias
SubmittedMedical Imaging with Deep Learning
Date07.07.2022
We study the problem of identifying cause and effect over two univariate continuous variables X and Y from a sample of their joint distribution. Our focus lies on the setting when the variance of the noise may be dependent on the cause. We propose to partition the domain of the cause into multiple segments where the noise indeed is dependent. To this end, we minimize a scale-invariant, penalized regression score, finding the optimal partitioning using dynamic programming. We show under which conditions this allows us to identify the causal direction for the linear setting with heteroscedastic noise, for the non-linear setting with homoscedastic noise, as well as empirically confirm that these results generalize to the non-linear and heteroscedastic case. Altogether, the ability to model heteroscedasticity translates into an improved performance in telling cause from effect on a wide range of synthetic and real-world datasets.
AuthorsSascha Xu, Osman A Mian, Alexander Marx, Jilles Vreeken
SubmittedProceedings of the 39th International Conference on Machine Learning, ICML 2022
Date28.06.2022
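A much-simplified sketch of the score-based decision rule underlying the entry above: fit a flexible regression in both directions, penalize model complexity, and prefer the direction with the lower score. The generic BIC-style Gaussian score and the standardization below are assumptions for illustration; they are not the scale-invariant, segmentation-based score of the paper.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def direction_score(cause, effect, degree=3):
    """Penalized goodness-of-fit of a polynomial regression effect ~ f(cause)."""
    cause = (cause - cause.mean()) / cause.std()          # crude stand-in for scale invariance
    effect = (effect - effect.mean()) / effect.std()
    X = PolynomialFeatures(degree).fit_transform(cause.reshape(-1, 1))
    resid = effect - LinearRegression().fit(X, effect).predict(X)
    n, k = len(effect), X.shape[1]
    return n * np.log(resid.var() + 1e-12) + k * np.log(n)   # fit term + complexity penalty

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x ** 3 + 0.5 * rng.normal(size=2000)                  # ground truth: X causes Y (nonlinear, homoscedastic)
s_xy, s_yx = direction_score(x, y), direction_score(y, x)
print("X -> Y" if s_xy < s_yx else "Y -> X")
```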
Motivation: Global acronyms are used in written text without their formal definitions. This makes it difficult to automatically interpret their sense as acronyms tend to be ambiguous. Supervised machine learning approaches to sense disambiguation require large training datasets. In clinical applications, large datasets are difficult to obtain due to patient privacy. Manual data annotation creates an additional bottleneck. Results: We proposed an approach to automatically modifying scientific abstracts to (i) simulate global acronym usage and (ii) annotate their senses without the need for external sources or manual intervention. We implemented it as a web-based application, which can create large datasets that in turn can be used to train supervised approaches to word sense disambiguation of biomedical acronyms. Availability and implementation: The datasets will be generated on demand based on a user query and will be downloadable from https://datainnovation.cardiff.ac.uk/acronyms/.
AuthorsMaxim Filimonov, Daphné Chopard, Irena Spasić
SubmittedBioinformatics
Date26.05.2022
The algorithmic independence of conditionals, which postulates that the causal mechanism is algorithmically independent of the cause, has recently inspired many highly successful approaches to distinguish cause from effect given only observational data. Most popular among these is the idea to approximate algorithmic independence via two-part Minimum Description Length (MDL). Although intuitively sensible, the link between the original postulate and practical two-part MDL encodings is left vague. In this work, we close this gap by deriving a two-part formulation of this postulate, in terms of Kolmogorov complexity, which directly links to practical MDL encodings. To close the cycle, we prove that this formulation leads on expectation to the same inference result as the original postulate.
AuthorsAlexander Marx, Jilles Vreeken
SubmittedAAAI'22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery (ITCI’22)
Date05.05.2022
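The two-part formulation discussed above is commonly written as the following decision rule (a hedged paraphrase of the standard statement in this literature, not the paper's exact theorem), with Kolmogorov complexity K approximated in practice by MDL code lengths L:

```latex
% Hedged sketch of the usual two-part decision rule, not the paper's exact statement.
\[
  X \to Y \quad \text{if} \quad
  K\big(P(X)\big) + K\big(P(Y \mid X)\big)
  \;<\;
  K\big(P(Y)\big) + K\big(P(X \mid Y)\big),
\]
\[
  \text{approximated in practice by comparing } \;
  L(X) + L(Y \mid X) \; \text{ against } \; L(Y) + L(X \mid Y).
\]
```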
We study the problem of identifying the cause and the effect between two univariate continuous variables X and Y. The examined data is purely observational, hence it is required to make assumptions about the underlying model. Often, the independence of the noise from the cause is assumed, which is not always the case for real-world data. In view of this, we present a new method, which explicitly models heteroscedastic noise. With our HEC algorithm, we can find the optimal model, regularized by an information-theoretic score. In thorough experiments we show that our ability to model heteroscedastic noise translates into superior performance on a wide range of synthetic and real-world datasets.
AuthorsSascha Xu, Alexander Marx, Osman Mian, Jilles Vreeken
SubmittedAAAI'22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery (ITCI’22)
Date05.05.2022
Estimating mutual information (MI) between two continuous random variables X and Y allows to capture non-linear dependencies between them, non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet, robustly estimating MI for high-dimensional X and Y is still an open research question. In this paper, we formulate this problem through the lens of manifold learning. That is, we leverage the common assumption that the information of X and Y is captured by a low-dimensional manifold embedded in the observed high-dimensional space and transfer it to MI estimation. As an extension to state-of-the-art kNN estimators, we propose to determine the k-nearest neighbors via geodesic distances on this manifold rather than from the ambient space, which allows us to estimate MI even in the high-dimensional setting. An empirical evaluation of our method, G-KSG, against the state-of-the-art shows that it yields good estimations of MI in classical benchmark and manifold tasks, even for high dimensional datasets, which none of the existing methods can provide.
AuthorsAlexander Marx, Jonas Fischer
SubmittedProceedings of the SIAM International Conference on Data Mining, SDM 2022
Date30.04.2022
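For reference, the classical KSG kNN estimator that the entry above builds on can be sketched as below; it measures distances in the ambient space, whereas G-KSG's contribution is to replace them with geodesic distances on a neighborhood graph. Sample size, k, and the test data are illustrative.

```python
import numpy as np
from scipy.special import digamma
from sklearn.neighbors import NearestNeighbors

def ksg_mi(x, y, k=5):
    """KSG (algorithm 1) estimate of I(X;Y) in nats, using Chebyshev distances."""
    x, y = x.reshape(len(x), -1), y.reshape(len(y), -1)
    z = np.hstack([x, y])
    n = len(z)
    # Distance to the k-th neighbor in the joint space (first neighbor is the point itself).
    nn = NearestNeighbors(metric="chebyshev", n_neighbors=k + 1).fit(z)
    eps = nn.kneighbors(z)[0][:, -1]
    # Count neighbors strictly within eps in each marginal space (excluding the point itself).
    nx = np.array([np.sum(np.max(np.abs(x - x[i]), axis=1) < eps[i]) - 1 for i in range(n)])
    ny = np.array([np.sum(np.max(np.abs(y - y[i]), axis=1) < eps[i]) - 1 for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = a + 0.5 * rng.normal(size=2000)          # dependent pair
print(ksg_mi(a, b))                          # should be clearly positive (roughly 0.8 nats)
```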
Due to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making, recent research has focused on mitigating biases against already disadvantaged or marginalised groups in classification models. From the perspective of classification parity, the two commonest metrics for assessing fairness are statistical parity and equality of opportunity. Current approaches to debiasing in classification either require the knowledge of the protected attribute before or during training or are entirely agnostic to the model class and parameters. This work considers differentiable proxy functions for statistical parity and equality of opportunity and introduces two novel debiasing techniques for neural network classifiers based on fine-tuning and pruning an already-trained network. As opposed to the prior work leveraging adversarial training, the proposed methods are simple yet effective and can be readily applied post hoc. Our experimental results encouragingly suggest that these approaches successfully debias fully connected neural networks trained on tabular data and often outperform model-agnostic post-processing methods.
AuthorsRicards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt
SubmittedContributed talk at ICLR 2022 Workshop on Socially Responsible Machine Learning
Date29.04.2022
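A minimal sketch of post-hoc debiasing by fine-tuning with a differentiable statistical-parity proxy, in the spirit of the entry above: penalize the gap between the groups' mean predicted probabilities while keeping the task loss. The specific proxy, penalty weight, network, and data below are illustrative assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn as nn

def parity_proxy(probs, group):
    """Differentiable |E[p | group=1] - E[p | group=0]| computed on a batch."""
    return (probs[group == 1].mean() - probs[group == 0].mean()).abs()

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))  # stands in for a pretrained classifier
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(256, 20)
y = torch.randint(0, 2, (256,)).float()
a = torch.randint(0, 2, (256,))              # protected attribute, available for the debiasing step only

for _ in range(100):                          # short post-hoc fine-tuning loop
    logits = model(x).squeeze(-1)
    loss = bce(logits, y) + 2.0 * parity_proxy(torch.sigmoid(logits), a)
    opt.zero_grad(); loss.backward(); opt.step()

print(float(parity_proxy(torch.sigmoid(model(x).squeeze(-1)), a)))  # parity gap after fine-tuning
```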
In this work, we study the problem of clustering survival data — a challenging and so far under-explored task. We introduce a novel semi-supervised probabilistic approach to cluster survival data by leveraging recent advances in stochastic gradient variational inference. In contrast to previous work, our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and censored survival times. We compare our model to the related work on clustering and mixture models for survival data in comprehensive experiments on a wide range of synthetic, semi-synthetic, and real-world datasets, including medical imaging data. Our method performs better at identifying clusters and is competitive at predicting survival times. Relying on novel generative assumptions, the proposed model offers a holistic perspective on clustering survival data and holds a promise of discovering subpopulations whose survival is regulated by different generative mechanisms.
AuthorsLaura Manduchi*, Ricards Marcinkevics*, Michela C. Massi, Thomas Weikert, Alexander Sauter, Verena Gotta, Timothy Müller, Flavio Vasella, Marian C. Neidert, Marc Pfister, Bram Stieltjes, Julia E. Vogt* denotes shared first authorship
SubmittedThe Tenth International Conference on Learning Representations, ICLR 2022
Date25.04.2022
Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
AuthorsImant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt
SubmittedThe Tenth International Conference on Learning Representations, ICLR 2022
Date07.04.2022
Using artificial intelligence to improve patient care is a cutting-edge methodology, but its implementation in clinical routine has been limited due to significant concerns about understanding its behavior. One major barrier is the explainability dilemma and how much explanation is required to use artificial intelligence safely in healthcare. A key issue is the lack of consensus on the definition of explainability by experts, regulators, and healthcare professionals, resulting in a wide variety of terminology and expectations. This paper aims to fill the gap by defining minimal explainability standards to serve the views and needs of essential stakeholders in healthcare. In that sense, we propose to define minimal explainability criteria that can support doctors’ understanding, meet patients’ needs, and fulfill legal requirements. Therefore, explainability need not be exhaustive but sufficient for doctors and patients to comprehend the artificial intelligence models’ clinical implications and be integrated safely into clinical practice. Thus, minimally acceptable standards for explainability are context-dependent and should respond to the specific need and potential risks of each clinical scenario for a responsible and ethical implementation of artificial intelligence.
AuthorsLaura Arbelaez Ossa, Georg Starke, Giorgia Lorenzini, Julia E Vogt, David M Shaw, Bernice Simone Elger
SubmittedDIGITAL HEALTH
Date11.02.2022
The recent introduction of portable, low-field MRI (LF-MRI) into the clinical setting has the potential to transform neuroimaging. However, LF-MRI is limited by lower resolution and signal-to-noise ratio, leading to incomplete characterization of brain regions. To address this challenge, recent advances in machine learning facilitate the synthesis of higher resolution images derived from one or multiple lower resolution scans. Here, we report the extension of a machine learning super-resolution (SR) algorithm to synthesize 1 mm isotropic MPRAGE-like scans from LF-MRI T1-weighted and T2-weighted sequences. Our initial results on a paired dataset of LF and high-field (HF, 1.5T-3T) clinical scans show that: (i) application of available automated segmentation tools directly to LF-MRI images falters; but (ii) segmentation tools succeed when applied to SR images with high correlation to gold standard measurements from HF-MRI (e.g., r = 0.85 for hippocampal volume, r = 0.84 for the thalamus, r = 0.92 for the whole cerebrum). This work demonstrates proof-of-principle postprocessing image enhancement from lower resolution LF-MRI sequences. These results lay the foundation for future work to enhance the detection of normal and abnormal image findings at LF and ultimately improve the diagnostic performance of LF-MRI. Our tools are publicly available on FreeSurfer (surfer.nmr.mgh.harvard.edu/).
AuthorsJuan Eugenio Iglesias, Riana Schleicher, Sonia Laguna, Benjamin Billot, Pamela Schaefer, Brenna McKaig, Joshua N Goldstein, Kevin N Sheth, Matthew S Rosen, W Taylor Kimberly
SubmittedarXiv preprint arXiv:2202.03564
Date07.02.2022
Digitalization has already changed medicine and will continue to strongly influence medical practice in the future. It is therefore important that future physicians engage with the methods and potential applications of machine learning already during their studies. The working group «Digitalisierung der Medizin» has developed learning objectives for this purpose.
AuthorsRaphaël Bonvin, Joachim Buhmann, Carlos Cotrini Jimenez, Marcel Egger, Alexander Geissler, Michael Krauthammer, Christian Schirlo, Christiane Spiess, Johann Steurer, Kerstin Noëlle Vokinger, Julia Vogt
Date26.01.2022
Objective: To report the outcomes of active surveillance (AS) for low-risk prostate cancer (PCa) in a single-center cohort. Patients and Methods: This is a prospective, single-center, observational study. The cohort included all patients who underwent AS for PCa between December 1999 and December 2020 at our institution. Follow-up appointments (FU) ended in February 2021. Results: A total of 413 men were enrolled in the study, and 391 had at least one FU. Of those who followed up, 267 had PCa diagnosed by transrectal ultrasound (TRUS)-guided biopsy (T1c: 68.3%), while 124 were diagnosed after transurethral resection of the prostate (TURP) (T1a/b: 31.7%). Median FU was 46 months (IQR 25–90). Cancer specific survival was 99.7% and overall survival was 92.3%. Median reclassification time was 11.2 years. After 20 years, 25% of patients were reclassified within 4.58 years, 6.6% opted to switch to watchful waiting, 4.1% died, 17.4% were lost to FU, and 46.8% remained on AS. Those diagnosed by TRUS had a significantly higher reclassification rate than those diagnosed by TURP (p < 0.0001). Men diagnosed by targeted MRI/TRUS fusion biopsy tended to have a higher reclassification probability than those diagnosed by conventional template biopsies (p = 0.083). Conclusions: Our single-center cohort spanning over two decades revealed that AS remains a safe option for low-risk PCa even in the long term. Approximately half of AS enrollees will eventually require definitive treatment due to disease progression. Men with incidental prostate cancer were significantly less likely to have disease progression.
AuthorsSarah Hagmann, Venkat Ramakrishnan, Alexander Tamalunas, Marc Hofmann, Moritz Vandenhirtz, Silvan Vollmer, Jsmea Hug, Philipp Niggli, Antonio Nocito, Rahel A. Kubik-Huch, Kurt Lehmann, Lukas John Hefermehl
SubmittedCancers
Date12.01.2022
2021
Background: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. Objective: This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns. Methods: We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases–10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify the mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represented adverse events and those that did not. Results: The model achieved a high F1 score of 0.8080, despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach. Conclusions: These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.
AuthorsDaphné Chopard, Matthias S Treder, Padraig Corcoran, Nagheen Ahmed, Claire Johnson, Monica Busse, Irena Spasić, et al.
SubmittedJMIR Medical Informatics
Date24.12.2021
Appendicitis is a common childhood disease, the management of which still lacks consolidated international criteria. In clinical practice, heuristic scoring systems are often used to assess the urgency of patients with suspected appendicitis. Previous work on machine learning for appendicitis has focused on conventional classification models, such as logistic regression and tree-based ensembles. In this study, we investigate the use of risk supersparse linear integer models (risk SLIM) for learning data-driven risk scores to predict the diagnosis, management, and complications in pediatric patients with suspected appendicitis on a dataset consisting of 430 children from a tertiary care hospital. We demonstrate the efficacy of our approach and compare the performance of learnt risk scores to previous analyses with random forests. Risk SLIM is able to detect medically meaningful features and outperforms the traditional appendicitis scores, while at the same time being better suited for the clinical setting than tree-based ensembles.
AuthorsPedro Roig Aparicio, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E. Vogt
SubmittedShort paper at 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021
Date16.12.2021
Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments to demonstrate that DC-GMM shows superior clustering performances and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.
AuthorsLaura Manduchi, Kieran Chin-Cheong, Holger Michel, Sven Wellmann, Julia E. Vogt
SubmittedAccepted at NeurIPS 2021
Date14.12.2021
In this work, we propose a new formulation for multimodal VAEs to model and learn the relationship between data types. Despite their recent progress, current multimodal generative methods are based on simplistic assumptions regarding the relation between data types, which leads to a trade-off between coherence and quality of generated samples - even for simple toy datasets. The proposed method learns the relationship between data types instead of relying on pre-defined and limiting assumptions. Based on the principles of variational inference, we change the posterior approximation to explicitly include information about the relation between data types. We show empirically that the simplified assumption of a single shared latent space leads to inferior performance for a dataset with additional pairwise shared information.
AuthorsThomas M. Sutter, Julia E. Vogt
SubmittedBayesian Deep Learning Workshop at Neurips 2021
Date14.12.2021
Despite its proven efficiency in other fields, data augmentation is less popular in the context of natural language processing (NLP) due to its complexity and limited results. A recent study (Longpre et al., 2020) showed for example that task-agnostic data augmentations fail to consistently boost the performance of pretrained transformers even in low data regimes. In this paper, we investigate whether data-driven augmentation scheduling and the integration of a wider set of transformations can lead to improved performance where fixed and limited policies were unsuccessful. Our results suggest that, while this approach can help the training process in some settings, the improvements are unsubstantial. This negative result is meant to help researchers better understand the limitations of data augmentation for NLP.
AuthorsDaphné Chopard, Matthias S Treder, Irena Spasić
SubmittedProceedings of the Second Workshop on Insights from Negative Results in NLP
Date01.11.2021
Sleep is crucial to restore body functions and metabolism across nearly all tissues and cells, and sleep restriction is linked to various metabolic dysfunctions in humans. Using exhaled breath analysis by secondary electrospray ionization high-resolution mass spectrometry, we measured the human exhaled metabolome at 10-s resolution across a night of sleep in combination with conventional polysomnography. Our subsequent analysis of almost 2,000 metabolite features demonstrates rapid, reversible control of major metabolic pathways by the individual vigilance states. Within this framework, whereas a switch to wake reduces fatty acid oxidation, a switch to slow-wave sleep increases it, and the transition to rapid eye movement sleep results in elevation of tricarboxylic acid (TCA) cycle intermediates. Thus, in addition to daily regulation of metabolism, there exists a surprising and complex underlying orchestration across sleep and wake. Both likely play an important role in optimizing metabolic circuits for human performance and health.
AuthorsNora Nowak, Thomas Gaisl, Djordje Miladinovic, Ricards Marcinkevics, Martin Osswald, Stefan Bauer, Joachim Buhmann, Renato Zenobi, Pablo Sinues, Steven A. Brown, Malcolm Kohler
SubmittedCell Reports
Date26.10.2021
Estimating conditional mutual information (CMI) is an essential yet challenging step in many machine learning and data mining tasks. Estimating CMI from data that contains both discrete and continuous variables, or even discrete-continuous mixture variables, is a particularly hard problem. In this paper, we show that CMI for such mixture variables, defined based on the Radon-Nikodym derivative, can be written as a sum of entropies, just like CMI for purely discrete or continuous data. Further, we show that CMI can be consistently estimated for discrete-continuous mixture variables by learning an adaptive histogram model. In practice, we estimate such a model by iteratively discretizing the continuous data points in the mixture variables. To evaluate the performance of our estimator, we benchmark it against state-of-the-art CMI estimators as well as evaluate it in a causal discovery setting.
AuthorsAlexander Marx, Lincen Yang, Matthijs van Leeuwen
SubmittedProceedings of the SIAM International Conference on Data Mining, SDM 2021
Date21.10.2021
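The entropy decomposition used above, I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z), can be illustrated with a simple plug-in estimate over a fixed equal-width discretization; the adaptive histogram model of the actual estimator is not reproduced here, and the bin count and toy data are illustrative.

```python
import numpy as np

def entropy(columns, bins=8):
    """Plug-in joint entropy (nats) of the given 1D arrays after equal-width discretization."""
    codes = [np.digitize(c, np.histogram_bin_edges(c, bins)[1:-1]) for c in columns]
    _, counts = np.unique(np.stack(codes, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def cmi(x, y, z, bins=8):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    return entropy([x, z], bins) + entropy([y, z], bins) - entropy([z], bins) - entropy([x, y, z], bins)

rng = np.random.default_rng(0)
z = rng.normal(size=20000)
x = z + 0.3 * rng.normal(size=20000)
y = z + 0.3 * rng.normal(size=20000)          # X and Y are dependent only through Z
w = rng.normal(size=20000)                    # irrelevant conditioning variable
# Conditioning on Z should explain away most of the dependence between X and Y.
print(round(cmi(x, y, w), 3), ">", round(cmi(x, y, z), 3))
```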
Background: Current strategies for risk stratification and prediction of neonatal early-onset sepsis (EOS) are inefficient and lack diagnostic performance. The aim of this study was to use machine learning to analyze the diagnostic accuracy of risk factors (RFs), clinical signs and biomarkers and to develop a prediction model for culture-proven EOS. We hypothesized that the contribution to diagnostic accuracy of biomarkers is higher than of RFs or clinical signs. Study Design: Secondary analysis of the prospective international multicenter NeoPInS study. Neonates born after completed 34 weeks of gestation with antibiotic therapy due to suspected EOS within the first 72 hours of life participated. Primary outcome was defined as predictive performance for culture-proven EOS with variables known at the start of antibiotic therapy. Machine learning was used in form of a random forest classifier. Results: One thousand six hundred eighty-five neonates treated for suspected infection were analyzed. Biomarkers were superior to clinical signs and RFs for prediction of culture-proven EOS. C-reactive protein and white blood cells were most important for the prediction of the culture result. Our full model achieved an area-under-the-receiver-operating-characteristic-curve of 83.41% (+/-8.8%) and an area-under-the-precision-recall-curve of 28.42% (+/-11.5%). The predictive performance of the model with RFs alone was comparable with random. Conclusions: Biomarkers have to be considered in algorithms for the management of neonates suspected of EOS. A 2-step approach with a screening tool for all neonates in combination with our model in the preselected population with an increased risk for EOS may have the potential to reduce the start of unnecessary antibiotics.
AuthorsMartin Stocker, Imant Daunhawer, Wendy van Herk, Salhab el Helou, Sourabh Dutta, Frank A. B. A. Schuerman, Rita K. van den Tooren-de Groot, Jantien W. Wieringa, Jan Janota, Laura H. van der Meer-Kappelle, Rob Moonen, Sintha D. Sie, Esther de Vries, Albertine E. Donker, Urs Zimmerman, Luregn J. Schlapbach, Amerik C. de Mol, Angelique Hoffmann-Haringsma, Madan Roy, Maren Tomaske, René F. Kornelisse, Juliette van Gijsel, Frans B. Plötz, Sven Wellmann, Niek B Achten, Dirk Lehnick, Annemarie M. C. van Rossum, Julia E. Vogt
SubmittedThe Pediatric Infectious Disease Journal, 2022
Date09.09.2021
Autonomic peripheral activity is partly governed by brain autonomic centers. However, there are still many uncertainties regarding the precise link between peripheral and central autonomic biosignals. Clarifying these links could have a profound impact on the interpretability, and thus usefulness, of peripheral autonomic biosignals captured with wearable devices. In this study, we take advantage of a unique dataset consisting of intracranial stereo-electroencephalography (SEEG) and peripheral biosignals acquired simultaneously for several days from four subjects undergoing epilepsy monitoring. Compared to previous work, we apply a deep neural network to explore high-dimensional nonlinear correlations between the cerebral brainwaves and variations in heart rate and electrodermal activity (EDA). Further, neural network explainability methods were applied to identify the most relevant brainwave frequencies, brain regions and temporal information to predict a specific biosignal. The strongest brain-peripheral correlations were observed from contacts located in the central autonomic network, in particular in the alpha, theta and 52 to 58 Hz frequency bands. Furthermore, a temporal delay of 12 to 14 s between the SEEG and EDA signals was observed. Finally, we believe that this pilot study demonstrates a promising approach to mapping brain-peripheral relationships in a data-driven manner by leveraging the expressiveness of deep neural networks.
AuthorsAlexander H. Hatteland*, Ricards Marcinkevics*, Renaud Marquis, Thomas Frick, Ilona Hubbard, Julia E. Vogt, Thomas Brunschwiler†, Philippe Ryvlin†* denotes shared first authorship, † denotes shared last authorship
SubmittedBest paper award at IEEE International Conference on Digital Health, ICDH 2021
Date05.09.2021
Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated to genes using large-scale datasets of epigenetic and expression data. However, for regions of complex epigenomic signals and enhancers that regulate many genes, it is difficult to understand these associations. We present StitchIt, an approach to dissect epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to generate the location and the target genes of a REM simultaneously. We show that this approach leads to a more accurate and refined REM detection compared to standard methods even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches suggesting that a large part of the regulome might be uncharted water.
AuthorsFlorian Schmidt, Alexander Marx, Nina Baumgarten, Marie Hebel, Martin Wegner, Manuel Kaulich, Matthias S Leisegang, Ralf P Brandes, Jonathan Göke, Jilles Vreeken, Marcel H Schulz
SubmittedNucleic Acids Research
Date01.09.2021
Objective: To evaluate the association of self-reported physical function with subjective and objective measures as well as temporospatial gait features in lumbar spinal stenosis (LSS). Design: Cross-sectional pilot study. Setting: Outpatient multispecialty clinic. Participants: Participants with LSS and matched controls without LSS (n=10 per group; N=20). Interventions: Not applicable. Main outcome measures: Self-reported physical function (36-Item Short Form Health Survey [SF-36] physical functioning domain), Oswestry Disability Index, Swiss Spinal Stenosis Questionnaire, the Neurogenic Claudication Outcome Score, and inertia measurement unit (IMU)-derived temporospatial gait features. Results: Higher self-reported physical function scores (SF-36 physical functioning) correlated with lower disability ratings, neurogenic claudication, and symptom severity ratings in patients with LSS (P<.05). Compared with controls without LSS, patients with LSS have lower scores on physical capacity measures (median total distance traveled on 6-minute walk test: controls 505 m vs LSS 316 m; median total distance traveled on self-paced walking test: controls 718 m vs LSS 174 m). Observed differences in IMU-derived gait features, physical capacity measures, disability ratings, and neurogenic claudication scores between populations with and without LSS were statistically significant. Conclusions: Further evaluation of the association of IMU-derived temporospatial gait with self-reported physical function, pain related-disability, neurogenic claudication, and spinal stenosis symptom severity score in LSS would help clarify their role in tracking LSS outcomes.
AuthorsCharles A Odonkor, Salam Taraben, Christy Tomkins-Lane, Wei Zhang, Amir Muaremi, H. Leutheuser, Ruopeng Sun, Matthew Smuck
SubmittedArchives of Rehabilitation Research and Clinical Translation
Date01.09.2021
One of the core assumptions in causal discovery is the faithfulness assumption--i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.
AuthorsAlexander Marx, Arthur Gretton, Joris M. Mooij
SubmittedProceedings of the Conference on Uncertainty in Artificial Intelligence, UAI 2021
Date01.08.2021
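For intuition, the classical Grow and Shrink recovery of a Markov blanket, which the entry above modifies and re-analyzes under weaker assumptions, can be sketched with a Gaussian partial-correlation test as the conditional-independence oracle. The test, significance level, and toy data are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.stats import norm

def ci_test(data, i, j, cond, alpha=0.01):
    """True if X_i is (approximately) independent of X_j given X_cond (Fisher-z partial correlation)."""
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.pinv(np.cov(sub, rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])       # partial correlation of i and j given cond
    r = float(np.clip(r, -0.999999, 0.999999))
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - norm.cdf(abs(z))) > alpha                 # p-value above alpha -> treat as independent

def grow_shrink(data, target):
    others = [v for v in range(data.shape[1]) if v != target]
    mb, changed = [], True
    while changed:                                            # grow: add variables dependent given current blanket
        changed = False
        for v in others:
            if v not in mb and not ci_test(data, target, v, mb):
                mb.append(v); changed = True
    for v in list(mb):                                        # shrink: drop variables independent given the rest
        rest = [u for u in mb if u != v]
        if ci_test(data, target, v, rest):
            mb.remove(v)
    return mb

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + rng.normal(size=5000)
c = b + rng.normal(size=5000)                                 # causal chain a -> b -> c
data = np.column_stack([a, b, c])
print(grow_shrink(data, target=2))                            # Markov blanket of c should be [1]
```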
Machine Learning has become more and more popular in the medical domain over the past years. While supervised machine learning has already been applied successfully, the vast amount of unlabelled data offers new opportunities for un- and self-supervised learning methods. Especially with regard to the multimodal nature of most clinical data, the labelling of multiple data types becomes quickly infeasible in the medical domain. However, to the best of our knowledge, multimodal unsupervised methods have been tested extensively on toy-datasets only but have never been applied to real-world medical data, for direct applications such as disease classification and image generation. In this article, we demonstrate that self-supervised methods provide promising results on medical data while highlighting that the task is extremely challenging and that there is space for substantial improvements.
AuthorsHendrik J. Klug, Thomas M. Sutter, Julia E. Vogt
SubmittedMedical Imaging with Deep Learning, MIDL 2021
Date07.07.2021
It is well-known that correlation does not equal causation, but how can we infer causal relations from data? Causal discovery tries to answer precisely this question by rigorously analyzing under which assumptions it is feasible to infer directed causal networks from passively collected, so-called observational data. Classical approaches assume the data to be faithful to the causal graph, that is, independencies found in the distribution are assumed to be due to separations in the true graph. Under this assumption, so-called constraint-based methods can infer the correct Markov equivalence class of the true graph (i.e. the correct undirected graph and some edge directions), only using conditional independence tests. In this dissertation, we aim to alleviate some of the weaknesses of constraint-based algorithms. In the first part, we investigate causal mechanisms, which cannot be detected when assuming faithfulness. We then suggest a weaker assumption based on triple interactions, which allows for recovering a broader spectrum of causal mechanisms. Subsequently, we focus on conditional independence testing, which is a crucial tool for causal discovery. In particular, we propose to measure dependencies through conditional mutual information, which we show can be consistently estimated even for the most general setup: discrete-continuous mixture random variables. Last, we focus on distinguishing Markov equivalent graphs (i.e. infer the complete DAG structure), which boils down to inferring the causal direction between two random variables. In this setting, we focus on continuous and mixed-type data and develop our methods based on an information-theoretic postulate, which states that the true causal graph can be compressed best, i.e. has the smallest Kolmogorov complexity.
AuthorsAlexander Marx
SubmittedSaarländische Universitäts- und Landesbibliothek
Date06.07.2021
Background Preterm neonates frequently experience hypernatremia (plasma sodium concentrations >145 mmol/l), which is associated with clinical complications, such as intraventricular hemorrhage. Study design In this single center retrospective observational study, the following 7 risk factors for hypernatremia were analyzed in very low gestational age (VLGA, below 32 weeks) neonates: gestational age (GA), delivery mode (DM; vaginal or caesarian section), sex, birth weight, small for GA, multiple birth, and antenatal corticosteroids. Machine learning (ML) approaches were applied to obtain probabilities for hypernatremia. Results 824 VLGA neonates were included (median GA 29.4 weeks, median birth weight 1170 g, caesarean section 83%). 38% of neonates experienced hypernatremia. Maximal sodium concentration of 144 mmol/l (interquartile range 142–147) was observed 52 hours (41–65) after birth. ML identified vaginal delivery and GA as key risk factors for hypernatremia. The risk of hypernatremia increased with lower GA from 22% for GA >= 31–32 weeks to 46% for GA < 31 weeks and 60% for GA < 27 weeks. A linear relationship between maximal sodium concentrations and GA was found, showing decreases of 0.29 mmol/l per increasing week GA in neonates with vaginal delivery and 0.49 mmol/l/week after cesarean section. Sex, multiple birth and antenatal corticosteroids were not associated with hypernatremia. Conclusion VLGA neonates with vaginal delivery and low GA have the highest risk for hypernatremia. Early identification of neonates at risk and early intervention may prevent extreme sodium excursions and associated clinical complications.
AuthorsNadia S. Eugster, Florence Corminboeuf, Gilbert Koch, Julia E. Vogt, Thomas Sutter, Tamara van Donge, Marc Pfister, Roland Gerull
SubmittedKlinische Pädiatrie
Date07.06.2021
Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.
AuthorsThomas M. Sutter*, Imant Daunhawer*, Julia E. Vogt* denotes shared first authorship
SubmittedNinth International Conference on Learning Representations, ICLR 2021
Date04.05.2021
We study the problem of inferring causal graphs from observational data. We are particularly interested in discovering graphs where all edges are oriented, as opposed to the partially directed graph that the state-of-the-art discover. To this end we base our approach on the algorithmic Markov condition. Unlike the statistical Markov condition, it uniquely identifies the true causal network as the one that provides the simplest--as measured in Kolmogorov complexity--factorization of the joint distribution. Although Kolmogorov complexity is not computable, we can approximate it from above via the Minimum Description Length principle, which allows us to define a consistent and computable score based on non-parametric multivariate regression. To efficiently discover causal networks in practice, we introduce the GLOBE algorithm, which greedily adds, removes, and orients edges such that it minimizes the overall cost. Through an extensive set of experiments we show GLOBE performs very well in practice, beating the state-of-the-art by a margin.
AuthorsOsman Mian, Alexander Marx, Jilles Vreeken
SubmittedProceedings of the AAAI Conference on Artificial Intelligence, AAAI 2021
Date01.05.2021
Background: Given the absence of consolidated and standardized international guidelines for managing pediatric appendicitis and the few strictly data-driven studies in this specific domain, we investigated the use of machine learning (ML) classifiers for predicting the diagnosis, management and severity of appendicitis in children. Materials and Methods: Predictive models were developed and validated on a dataset acquired from 430 children and adolescents aged 0-18 years, based on a range of information encompassing history, clinical examination, laboratory parameters, and abdominal ultrasonography. Logistic regression, random forests, and gradient boosting machines were used for predicting the three target variables. Results: A random forest classifier achieved areas under the precision-recall curve of 0.94, 0.92, and 0.70, respectively, for the diagnosis, management, and severity of appendicitis. We identified smaller subsets of 6, 17, and 18 predictors for each of the targets that sufficed to achieve the same performance as the model based on the full set of 38 variables. We used these findings to develop the user-friendly online Appendicitis Prediction Tool for children with suspected appendicitis. Discussion: This pilot study considered the most extensive set of predictor and target variables to date and is the first to simultaneously predict all three targets in children: diagnosis, management, and severity. Moreover, this study presents the first ML model for appendicitis that was deployed as an open-access, easy-to-use online tool. Conclusion: ML algorithms help to overcome the diagnostic and management challenges posed by appendicitis in children and pave the way toward a more personalized approach to medical decision-making. Further validation studies are needed to develop a finished clinical decision support system.
AuthorsRicards Marcinkevics*, Patricia Reis Wolfertstetter*, Sven Wellmann, Christian Knorr†, Julia E Vogt†* denotes shared first authorship, † denotes shared last authorship
SubmittedFrontiers in Pediatrics
Date29.04.2021
Survival analysis has gained significant attention in the medical domain with many far-reaching applications. Although a variety of machine learning methods have been introduced for tackling time-to-event prediction in unstructured data with complex dependencies, clustering of survival data remains an under-explored problem. The latter is particularly helpful in discovering patient subpopulations whose survival is regulated by different generative mechanisms, a critical problem in precision medicine. To this end, we introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting. Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times. We compare our model to the related work on survival clustering in comprehensive experiments on a range of synthetic, semi-synthetic, and real-world datasets. Our proposed method performs better at identifying clusters and is competitive at predicting survival times in terms of the concordance index and relative absolute error.
AuthorsLaura Manduchi*, Ricards Marcinkevics*, Julia E. Vogt* denotes shared first authorship
SubmittedContributed talk at AI for Public Health Workshop at ICLR 2021
Date09.04.2021
Generating interpretable visualizations of multivariate time series in the intensive care unit is of great practical importance. Clinicians seek to condense complex clinical observations into intuitively understandable critical illness patterns, like failures of different organ systems. They would greatly benefit from a low-dimensional representation in which the trajectories of the patients’ pathology become apparent and relevant health features are highlighted. To this end, we propose to use the latent topological structure of Self-Organizing Maps (SOMs) to achieve an interpretable latent representation of ICU time series and combine it with recent advances in deep clustering. Specifically, we (a) present a novel way to fit SOMs with probabilistic cluster assignments (PSOM), (b) propose a new deep architecture for probabilistic clustering (DPSOM) using a VAE, and (c) extend our architecture to cluster and forecast clinical states in time series (T-DPSOM). We show that our model achieves superior clustering performance compared to state-of-the-art SOM-based clustering methods while maintaining the favorable visualization properties of SOMs. On the eICU dataset, we demonstrate that T-DPSOM provides interpretable visualizations of patient state trajectories and uncertainty estimation. We show that our method rediscovers well-known clinical patient characteristics, such as a dynamic variant of the Acute Physiology And Chronic Health Evaluation (APACHE) score. Moreover, we illustrate how it can disentangle individual organ dysfunctions on disjoint regions of the two-dimensional SOM map.
AuthorsLaura Manduchi, Matthias Hüser, Martin Faltys, Julia Vogt, Gunnar Rätsch, Vincent Fortuin
SubmittedACM CHIL 2021
Date04.03.2021
In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI's CarRacing environment. This procedure thus decouples state representations from the RL controller. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than directly learning from pixel inputs; however, it has a more stable learning curve, a substantial reduction of the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks. (A minimal sketch of this decoupling appears after this entry.)
AuthorsJuan M. Montoya, Imant Daunhawer, Julia E. Vogt, Marco Wiering
SubmittedICAART 2021
Date04.02.2021
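As an aside to the entry above, the described decoupling of unsupervised state representations from the RL controller can be illustrated with a minimal PyTorch sketch. The module sizes, frozen-encoder setup, and all names below are illustrative assumptions rather than the authors' implementation, and the PPO update itself is omitted.

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Stand-in for an encoder pretrained as a VAE or contrastive model."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, latent_dim),  # sized for 64x64 RGB inputs
        )

    def forward(self, x):
        return self.net(x)

class PolicyValueHead(nn.Module):
    """Small actor-critic head that PPO would optimize on top of frozen features."""
    def __init__(self, latent_dim=32, n_actions=5):
        super().__init__()
        self.policy = nn.Linear(latent_dim, n_actions)
        self.value = nn.Linear(latent_dim, 1)

    def forward(self, z):
        return self.policy(z), self.value(z)

encoder = ConvEncoder()   # in practice: load weights pretrained without rewards
head = PolicyValueHead()

# Decoupling: the encoder is frozen, so the RL optimizer only sees the head.
for p in encoder.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(head.parameters(), lr=3e-4)  # stepped by the PPO loop

obs = torch.rand(8, 3, 64, 64)        # dummy batch of 64x64 RGB frames
with torch.no_grad():
    z = encoder(obs)                  # reward-free state representation
logits, value = head(z)               # quantities entering the PPO objective

n_head = sum(p.numel() for p in head.parameters())
n_enc = sum(p.numel() for p in encoder.parameters())
print(f"trainable head params: {n_head}, frozen encoder params: {n_enc}")
```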
Background and objectives Macroreentrant atrial tachyarrhythmias (MRATs) can be caused by different reentrant circuits. The treatment for each MRAT type may require ablation at different sites, either at the right or left atria. Unfortunately, the reentrant circuit that drives the arrhythmia cannot be ascertained prior to the electrophysiological intervention. Methods A noninvasive approach based on the comparison of atrial vectorcardiogram (VCG) loops is proposed. An archetype for each group was created, which served as a reference to measure the similarity between loops. Methods were tested in a variety of simulations and real data obtained from the most common right (peritricuspid) and left (perimitral) macroreentrant circuits, each divided into clockwise and counterclockwise subgroups. Adenosine was administered to patients to induce transient AV block, allowing the recording of the atrial signal without the interference of ventricular signals. From the vectorcardiogram, we measured intra-patient loop consistence, similarity of the pathway to archetypes, characterisation of slow velocity regions and pathway complexity. Results Results show a considerably higher similarity between each loop and its corresponding archetype, in both simulations and real data. We found that the vectorcardiogram is able to reflect a slow velocity region, consistent with the mechanisms of MRAT, and the role that it plays in the characterisation of the reentrant circuit. The intra-patient loop consistence was over 0.85 for all clinical cases, while the similarity of the pathway to archetypes was found to be 0.85 ± 0.03, 0.95 ± 0.03, 0.87 ± 0.04 and 0.91 ± 0.02 for the different MRAT types (with p<0.02 for 3 of the 4 groups), and pathway complexity also allowed discrimination among cases (with p<0.05). Conclusions We conclude that the presented methodology allows us to differentiate between the most common forms of right and left MRATs and to predict the existence and location of a slow conduction zone. This approach may be useful in planning ablation procedures in advance.
AuthorsS Ruipérez-Campillo, S Castrejón, M Martínez, R Cervigón, O Meste, JL Merino, J Millet, F Castells
SubmittedComputer methods and programs in biomedicine
Date01.02.2021
Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.
AuthorsRicards Marcinkevics, Julia E. Vogt
SubmittedNinth International Conference on Learning Representations, ICLR 2021
Date15.01.2021
Rationale Tuberculosis diagnosis in children remains challenging. Microbiological confirmation of tuberculosis disease is often lacking, and standard immunodiagnostic tests, including the tuberculin skin test and the interferon-gamma release assay for tuberculosis infection, have limited sensitivity. Recent research suggests that inclusion of novel Mycobacterium tuberculosis antigens has the potential to improve standard immunodiagnostic tests for tuberculosis. Objective To identify optimal antigen–cytokine combinations using novel Mycobacterium tuberculosis antigens and cytokine read-outs by machine learning algorithms to improve immunodiagnostic assays for tuberculosis. Methods A total of 80 children undergoing investigation of tuberculosis were included (15 confirmed tuberculosis disease, five unconfirmed tuberculosis disease, 28 tuberculosis infection and 32 unlikely tuberculosis). Whole blood was stimulated with 10 novel Mycobacterium tuberculosis antigens and a fusion protein of early secretory antigenic target (ESAT)-6 and culture filtrate protein (CFP) 10. Cytokines were measured using xMAP multiplex assays. Machine learning algorithms defined a discriminative classifier with performance measured using the area under the receiver operating characteristic curve. Measurements and main results We found that the following four antigen–cytokine pairs had a higher weight in the discriminative classifier compared to the standard ESAT-6/CFP-10-induced interferon-gamma: Rv2346/47c- and Rv3614/15c-induced interferon-gamma inducible protein-10; Rv2031c-induced granulocyte-macrophage colony-stimulating factor and ESAT-6/CFP-10-induced tumor necrosis factor-alpha. A combination of the 10 best antigen–cytokine pairs resulted in an area under the curve of 0.92 ± 0.04. Conclusion We exploited the use of machine learning algorithms as a key tool to evaluate large immunological datasets. This identified several antigen–cytokine pairs with the potential to improve immunodiagnostic tests for tuberculosis in children.
AuthorsNoemi Rebecca Meier, Thomas M. Sutter, Marc Jacobsen, Tom H. M. Ottenhoff, Julia E. Vogt, Nicole Ritz
SubmittedFrontiers in Cellular and Infection Microbiology
Date08.01.2021
2020
Unplanned hospital readmissions are a burden to patients and increase healthcare costs. A wide variety of machine learning (ML) models have been suggested to predict unplanned hospital readmissions. These ML models were often specifically trained on patient populations with certain diseases. However, it is unclear whether these specialized ML models—trained on patient subpopulations with certain diseases or defined by other clinical characteristics—are more accurate than a general ML model trained on an unrestricted hospital cohort. In this study based on an electronic health record cohort of consecutive inpatient cases of a single tertiary care center, we demonstrate that accurate prediction of hospital readmissions may be obtained by general, disease-independent, ML models. This general approach may substantially decrease the cost of development and deployment of respective ML models in daily clinical routine, as all predictions are obtained by the use of a single model.
AuthorsThomas Sutter, Jan A Roth, Kieran Chin-Cheong, Balthasar L Hug, Julia E Vogt
SubmittedJournal of the American Medical Informatics Association
Date18.12.2020
In this review, we examine the problem of designing interpretable and explainable machine learning models. Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences. Although interpretability and explainability have escaped a clear universal definition, many techniques motivated by these properties have been developed over the past 30 years, with the focus currently shifting towards deep learning methods. In this review, we emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art. The review is intended for a general machine learning audience with interest in exploring the problems of interpretation and explanation beyond logistic regression or random forest variable importance. This work is not an exhaustive literature survey, but rather a primer focusing selectively on certain lines of research which the authors found interesting or informative.
AuthorsRicards Marcinkevics, Julia E. Vogt
SubmittedArxiv
Date04.12.2020
Echocardiography monitors the heart movement for noninvasive diagnosis of heart diseases. It proves to be of profound practical importance as it combines low-cost portable instrumentation and rapid image acquisition without the risks of ionizing radiation. However, echocardiograms produce high-dimensional, noisy data that frequently prove difficult to interpret. As a solution, we propose a novel autoencoder-based framework, DeepHeartBeat, to learn human interpretable representations of cardiac cycles from cardiac ultrasound data. Our model encodes high-dimensional observations by a cyclic trajectory in a lower-dimensional space. We show that the learned parameters describing the latent trajectory are well interpretable and we demonstrate the versatility of our model by successfully applying it to various cardiologically relevant tasks, such as ejection fraction prediction and arrhythmia detection. As a result, DeepHeartBeat promises to serve as a valuable assistant tool for automating therapy decisions and guiding clinical care.
AuthorsFabian Laumer, Gabriel Fringeli, Alina Dubatovka, Laura Manduchi, Joachim M. Buhmann
SubmittedBest newcomer award and spotlight talk at the Machine Learning for Health Workshop, NeurIPS 2020
Date01.12.2020
Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.
AuthorsRicards Marcinkevics, Julia E. Vogt
SubmittedInterpretable Inductive Biases and Physically Structured Learning Workshop, NeurIPS 2020
Date01.11.2020
Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks. (A toy computation of the multi-distribution Jensen-Shannon divergence appears after this entry.)
AuthorsThomas M. Sutter, Imant Daunhawer, Julia E. Vogt
SubmittedNeurIPS 2020
Date22.10.2020
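The central quantity in the entry above, a Jensen-Shannon divergence generalized to more than two distributions, is the entropy of a (weighted) mixture minus the weighted mean of the component entropies. The toy computation below, for discrete distributions, is only meant to make that definition concrete; it does not reproduce the model's dynamic prior or variational posteriors, and the example probability vectors are invented.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; assumes p is a valid probability vector."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def multi_js_divergence(dists, weights=None):
    """JS divergence of several discrete distributions:
    H(sum_i w_i P_i) - sum_i w_i H(P_i)."""
    dists = np.asarray(dists, dtype=float)
    n = len(dists)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    mixture = np.tensordot(w, dists, axes=1)
    return entropy(mixture) - np.sum(w * np.array([entropy(p) for p in dists]))

# Three 'unimodal posteriors' over the same four latent states (toy numbers).
p1 = [0.70, 0.10, 0.10, 0.10]
p2 = [0.10, 0.70, 0.10, 0.10]
p3 = [0.25, 0.25, 0.25, 0.25]
print(multi_js_divergence([p1, p2, p3]))  # zero iff all distributions coincide
```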
PET/CT imaging is the gold standard for the diagnosis and staging of lung cancer. However, especially in healthcare systems with limited resources, costly PET/CT images are often not readily available. Conventional machine learning models either process CT or PET/CT images but not both. Models designed for PET/CT images are hence restricted by the number of PET images, such that they are unable to additionally leverage CT-only data. In this work, we apply the concept of visual soft attention to efficiently learn a model for lung cancer segmentation from only a small fraction of PET/CT scans and a larger pool of CT-only scans. We show that our model is capable of jointly processing PET/CT as well as CT-only images and performs on par with the respective baselines whether or not PET images are available at test time. We then demonstrate that the model learns efficiently from only a few PET/CT scans in a setting where mostly CT-only data is available, unlike conventional models.
AuthorsVaraha Karthik Pattisapu, Imant Daunhawer, Thomas Weikert, Alexander Sauter, Bram Stieltjes, Julia E. Vogt
SubmittedGCPR 2020
Date12.10.2020
The objective of this study is to non-invasively characterise a variety of atrial flutter (AFL) types, defined by a macroreentrant circuit. A vectorcardiographic approach is proposed to compare atrial macroreentrant circuits. Vectorcardiogram (VCG) archetypes are computed so that parameters such as similarity among loops can be calculated. The methodology was employed in a set of artificial VCGs created from a computational simulation based on a mathematical model and in signals from real patients. Adenosine was used to block the ventricular contribution to the ECG signal, which was later transformed to a VCG and analysed from different perspectives. Results demonstrate a high similarity between the cases belonging to a group and its archetype, in both synthetic and real cases. Slow conduction velocity regions were found to be very well represented in VCGs, in accordance with AFL mechanisms and their importance when characterising atrial macroreentries. The conclusion is that our methodology allows differentiation between the most recurrent types of AFL through the analysis of their VCG representation, predicting the presence of slow conduction regions along the macroreentry. This can be very useful when planning the ablation procedure in advance.
AuthorsS Ruipérez-Campillo, S Castrejón, M Martínez, R Cervigón, O Meste, JL Merino, J Millet, F Castells
SubmittedIEEE Computing in Cardiology (47th CinC, 2020)
Date13.09.2020
Multimodal generative models learn a joint distribution over multiple modalities and thus have the potential to learn richer representations than unimodal models. However, current approaches are either inefficient in dealing with more than two modalities or fail to capture both modality-specific and shared variations. We introduce a new multimodal generative model that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently. Our method partitions the latent space into disjoint subspaces for modality-specific and shared factors and learns to disentangle these in a purely self-supervised manner. In extensive experiments, we show improvements in representation learning and generative performance compared to previous methods and showcase the disentanglement capabilities.
AuthorsImant Daunhawer, Thomas M. Sutter, Ricards Marcinkevics, Julia E. Vogt
SubmittedGCPR 2020
Date10.09.2020
AuthorsRichard Rau, Ece Özkan Elsen, Batu M. Ozturkler, Leila Gastli, Orcun Goksel
SubmittedIEEE International Ultrasonics Symposium (IUS)
Date11.08.2020
Background Functional ambulation limitations are features of lumbar spinal stenosis (LSS) and knee osteoarthritis (OA). With numerous validated walking assessment protocols and a vast number of spatiotemporal gait parameters available from sensor-based assessment, there is a critical need for selection of appropriate test protocols and variables for research and clinical applications. Research question In patients with knee OA and LSS, what are the best sensor-derived gait parameters and the most suitable clinical walking test to discriminate between these patient populations and controls? Methods We collected foot-mounted inertial measurement unit (IMU) data during three walking tests (fast-paced walk test, FPWT; 6-minute walk test, 6MWT; self-paced walk test, SPWT) for subjects with LSS, knee OA and matched controls (N = 10 for each group). Spatiotemporal gait characteristics were extracted and pairwise compared between patients and controls using partial omega squared (ω_p^2) effect sizes. Results We found that normal-paced walking tests (6MWT, SPWT) are better suited for distinguishing gait characteristics between patients and controls. Among the sensor-based gait parameters, stance and double support phase timing were identified as the best gait characteristics for discriminating the OA population, whereas foot flat ratio, gait speed, stride length and cadence were identified as the best gait characteristics for discriminating the LSS population. Significance These findings provide guidance on the selection of sensor-derived gait parameters and clinical walking tests to detect alterations in mobility for people with LSS and knee OA. (A small sketch of the effect-size computation appears after this entry.)
AuthorsC. Odonkor, A. Kuwabara, C. Tomkins-Lane, W. Zhang, A. Muaremi, H. Leutheuser, R. Sun, M. Smuck
SubmittedGait & Posture
Date01.07.2020
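For reference, the effect size used in the entry above, partial omega squared, can be computed from one-way ANOVA sums of squares; for a one-way between-subjects design it coincides with plain omega squared. The gait-speed values below are made up for illustration and are not the study's data.

```python
import numpy as np

def partial_omega_squared(*groups):
    """Omega squared for a one-way between-subjects design
    (equal to partial omega squared in this design)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    n_total, k = len(all_vals), len(groups)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)

# Hypothetical gait speed (m/s) for controls vs. an LSS group (N = 10 each).
controls = [1.35, 1.28, 1.40, 1.32, 1.25, 1.38, 1.30, 1.36, 1.29, 1.33]
lss      = [1.05, 0.98, 1.12, 1.00, 0.95, 1.08, 1.02, 1.10, 0.99, 1.04]
print(round(partial_omega_squared(controls, lss), 3))
```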
Background The mortality risk remains significant in paediatric and adult patients on chronic haemodialysis (HD) treatment. We aimed to identify factors associated with mortality in patients who started HD as children and continued HD as adults. Methods The data originated from a cohort of patients < 30 years of age who started HD in childhood (<= 19 years) on thrice-weekly HD in outpatient DaVita dialysis centres between 2004 and 2016. Patients with at least 5 years of follow-up since the initiation of HD or death within 5 years were included; 105 variables relating to demographics, HD treatment and laboratory measurements were evaluated as predictors of 5-year mortality utilizing a machine learning approach (random forest). Results A total of 363 patients were included in the analysis, with 84 patients having started HD at < 12 years of age. Low albumin and elevated lactate dehydrogenase (LDH) were the two most important predictors of 5-year mortality. Other predictors included elevated red blood cell distribution width or blood pressure and decreased red blood cell count, haemoglobin, albumin:globulin ratio, ultrafiltration rate, z-score weight for age or single-pool K_t/V (below target). Mortality was predicted with an accuracy of 81%. Conclusions Mortality in paediatric and young adult patients on chronic HD is associated with multifactorial markers of nutrition, inflammation, anaemia and dialysis dose. This highlights the importance of multimodal intervention strategies besides adequate HD treatment as determined by K_t/V alone. The association with elevated LDH was not previously reported and may indicate the relevance of blood–membrane interactions, organ malperfusion or haematologic and metabolic changes during maintenance HD in this population.
AuthorsVerena Gotta, Georgi Tancev, Olivera Marsenic, Julia E. Vogt, Marc Pfister
SubmittedNephrology Dialysis Transplantation
Date08.06.2020
Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for e.g. classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous (mixed) data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. For these reasons, using EHR data in practice presents a real challenge. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets for downstream classification tasks. We further explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets, which provide rigorous privacy guarantees and are therefore shareable and usable in the real world. The performance (measured by AUROC, AUPRC and accuracy) of our model's synthetic, heterogeneous data is very close to the original data set (within 3-5% of the baseline) for the non-DP model when tested in a binary classification task. Using strong (ε = 1, δ = 10^-5) DP, our model still produces data useful for machine learning tasks, albeit incurring a roughly 17% performance penalty in our tested classification task. We additionally perform a sub-population analysis and find that our model does not introduce any bias into the synthetic EHR data compared to the baseline in either male/female populations, or the 0-18, 19-50 and 51+ age groups in terms of classification performance for either the non-DP or DP variant. (A sketch of the train-on-synthetic, test-on-real evaluation appears after this entry.)
AuthorsKieran Chin-Cheong, Thomas M. Sutter, Julia E. Vogt
SubmittedArxiv
Date07.06.2020
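A common way to substantiate the claim above that synthetic records can replace real ones in downstream classification is a train-on-synthetic, test-on-real comparison. The sketch below shows only that evaluation step with placeholder arrays; the GAN and its differentially private training are not reproduced, and the feature matrices, labels, and downstream classifier are stand-in assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)

# Placeholder data: in the study these would be real EHR features/labels and
# records sampled from the (optionally differentially private) GAN.
X_real = rng.normal(size=(1000, 20))
y_real = (X_real[:, 0] > 0).astype(int)
X_synth = rng.normal(size=(1000, 20))
y_synth = (X_synth[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

def evaluate(train_X, train_y, test_X, test_y):
    """Fit a downstream classifier and report AUROC / AUPRC on the test set."""
    clf = GradientBoostingClassifier().fit(train_X, train_y)
    scores = clf.predict_proba(test_X)[:, 1]
    return roc_auc_score(test_y, scores), average_precision_score(test_y, scores)

# Baseline: train and test on (disjoint parts of) the real data.
# TSTR: train on synthetic records, test on the same held-out real data; the
# gap between the two result pairs is the utility cost of going synthetic.
print("real  -> real", evaluate(X_real[:800], y_real[:800], X_real[800:], y_real[800:]))
print("synth -> real", evaluate(X_synth, y_synth, X_real[800:], y_real[800:]))
```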
The benefit of fog computing to use local devices more efficiently and to reduce the latency and operation cost compared to cloud infrastructure is promising for industrial automation. Many industrial (control) applications have demanding real-time requirements and existing automation networks typically exhibit low-bandwidth links between sensing and computing devices. Fog applications in industrial automation contexts thus require that the amount of data transferred between sensing, computing and actuating devices, as well as latencies of control loops are minimized. To meet these requirements, this paper proposes a fog layer architecture that manages the computation and deployment of latency-aware industrial applications with Kubernetes, the prevalent container orchestration framework. The resulting fog layer dynamically solves the resource allocation optimization problem and then deploys distributed containerized applications to automation system networks. It achieves this in a non-intrusive manner, i.e. without actively modifying Kubernetes. Moreover it does not depend on proprietary protocols and infrastructure and is thus widely applicable and preferable to a vendor-specific solution. We compare the architecture with two alternative approaches that differ in the level of coupling to Kubernetes.
AuthorsRaphael Eidenbenz, Yvonne-Anne Pignolet, Alain Ryser
SubmittedFifth International Conference on Fog and Mobile Edge Computing (FMEC)
Date20.04.2020
Clinical pharmacology is a multi-disciplinary data sciences field that utilizes mathematical and statistical methods to generate maximal knowledge from data. Pharmacometrics (PMX) is a well-recognized tool to characterize disease progression, pharmacokinetics and risk factors. Since the amount of data produced keeps growing at an increasing pace, the computational effort necessary for PMX models is also increasing. Additionally, computationally efficient methods such as machine learning (ML) are becoming increasingly important in medicine. However, ML is currently not an integrated part of PMX, for various reasons. The goals of this article are to (i) provide an introduction to ML classification methods, (ii) provide examples for a ML classification analysis to identify covariates based on specific research questions, (iii) examine a clinically relevant example to investigate possible relationships of ML and PMX, and (iv) present a summary of ML and PMX tasks to develop clinical decision support tools.
AuthorsGilbert Koch, Marc Pfister, Imant Daunhawer, Melanie Wilbaux, Sven Wellmann, Julia E. Vogt
SubmittedClinical Pharmacology & Therapeutics, 2020
Date11.01.2020
2019
Despite the application of advanced statistical and pharmacometric approaches to pediatric trial data, a large pediatric evidence gap still remains. Here, we discuss how to collect more data from children by using real-world data from electronic health records, mobile applications, wearables, and social media. The large datasets collected with these approaches enable, and may demand, the use of artificial intelligence and machine learning to allow the data to be analyzed for decision-making. Applications of this approach are presented, which include the prediction of future clinical complications, medical image analysis, identification of new pediatric endpoints and biomarkers, the prediction of treatment non-responders and the prediction of placebo-responders for trial enrichment. Finally, we discuss how to bring machine learning from science to pediatric clinical practice. We conclude that advantage should be taken of the current opportunities offered by innovations in data science and machine learning to close the pediatric evidence gap.
AuthorsSebastiaan C. Goulooze, Laura B. Zwep, Julia E. Vogt, Elke H.J. Krekels, Thomas Hankemeier, John N. van den Anker, Catherijne A.J. Knibbe
SubmittedClinical Pharmacology & Therapeutics
Date19.12.2019
Self-organizing maps (SOMs) have been widely used as a means to visualize latent structure in large amounts of heterogeneous data, in particular as a clustering method. Relatively little work, however, has focused on combining SOMs with deep generative networks for modeling health states, which arise for example in the intensive care unit (ICU). We present Temporal PSOM, a novel neural network architecture that jointly trains a Variational Autoencoder for feature extraction and a probabilistic version of SOM to achieve an interpretable discrete representation of patient health states in the ICU. Experiments on the publicly available eICU data set show significant improvements over state-of-the-art methods in terms of cluster enrichment for current APACHE physiology scores as well as prediction of future physiology states.
AuthorsLaura Manduchi, Matthias Hueser, Gunnar Raetsch, Vincent Fortuin
SubmittedML4H Workshop, NeurIPS 2019
Date15.12.2019
Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. Existing generative models that try to approximate a multimodal ELBO rely on difficult training schemes to handle the intermodality dependencies, as well as to approximate the joint representation in the case of missing data. In this work, we propose an ELBO for multimodal data which learns the unimodal and joint multimodal posterior approximation functions directly via a dynamic prior. We show that this ELBO is directly derived from a variational inference setting for multiple data types, resulting in a divergence term which is the Jensen-Shannon divergence for multiple distributions. We compare the proposed multimodal JS-divergence (mmJSD) model to state-of-the-art methods and show promising results using our model in unsupervised, generative learning using a multimodal VAE on two different datasets.
AuthorsThomas Sutter, Imant Daunhawer, Julia E. Vogt
SubmittedVisually Grounded Interaction and Language Workshop, NeurIPS 2019
Date12.12.2019
Multimodal generative models learn a joint distribution of data from different modalities---a task which arguably benefits from the disentanglement of modality-specific and modality-invariant information. We propose a factorized latent variable model that learns such a disentanglement on multimodal data without additional supervision. We demonstrate the disentanglement capabilities on simulated data, and show that disentangled representations can improve the conditional generation of missing modalities without sacrificing unconditional generation.
AuthorsImant Daunhawer, Thomas Sutter, Julia E. Vogt
SubmittedBayesian Deep Learning Workshop, NeurIPS 2019
Date12.12.2019
Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for e.g. classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets. We will further explore applying differential privacy (DP) preserving optimization in order to produce differentially private synthetic EHR data sets, which provide rigorous privacy guarantees, and are therefore more easily shareable. The performance of our model's synthetic, heterogeneous data is very close to the original data set (within 4.5%) for the non-DP model. Although around 20% worse, the DP synthetic data is still usable for machine learning tasks.
AuthorsKieran Chin-Cheong, Thomas Sutter, Julia E. Vogt
SubmittedMachine Learning for Health (ML4H) Workshop, NeurIPS 2019
Date12.12.2019
Abbreviations and acronyms are shortened forms of words or phrases that are commonly used in technical writing. In this study, we focus specifically on abbreviations and introduce a corpus-based method for their expansion. The method divides the processing into three key stages: abbreviation identification, full form candidate extraction, and abbreviation disambiguation. First, potential abbreviations are identified by combining pattern matching and named entity recognition. Both acronyms and abbreviations exhibit similar orthographic properties, thus additional processing is required to distinguish between them. To this end, we implement a character-based recurrent neural network (RNN) that analyses the morphology of a given token in order to classify it as an acronym or an abbreviation. A siamese RNN that learns the morphological process of word abbreviation is then used to select a set of full form candidates. Having considerably constrained the search space, we take advantage of the Word Mover’s Distance (WMD) to assess semantic compatibility between an abbreviation and each full form candidate based on their contextual similarity. This step does not require any corpus-based training, thus making the approach highly adaptable to different domains. Unlike the vast majority of existing approaches, our method does not rely on external lexical resources for disambiguation, but with a macro F-measure of 96.27% is comparable to the state of the art.
AuthorsDaphné Chopard, Irena Spasić
SubmittedStatistical Language and Speech Processing: 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14-16, 2019, Proceedings 7
Date14.10.2019
We consider the problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. This case is especially challenging as the graph X causes Y is Markov equivalent to the graph Y causes X, and hence it is impossible to determine the correct direction using conditional independence tests. To tackle this problem, we follow an information theoretic approach based on the algorithmic Markov condition. This postulate states that in terms of Kolmogorov complexity the factorization given by the true causal model is the most succinct description of the joint distribution. This means that we can infer that X is a likely cause of Y when we need fewer bits to first transmit the data over X, and then the data of Y as a function of X, than for the inverse direction. That is, in this paper we perform causal inference by compression. To put this notion to practice, we employ the Minimum Description Length principle, and propose a score to determine how many bits we need to transmit the data using a class of regression functions that can model both local and global functional relations. To determine whether an inference, i.e. the difference in compressed sizes, is significant, we propose two analytical significance tests based on the no-hypercompression inequality. Last but not least, we introduce the linear-time Slope and Sloper algorithms which, as we show through thorough empirical evaluation, outperform the state of the art by a wide margin. (A toy version of this compression-based comparison appears after this entry.)
AuthorsAlexander Marx, Jilles Vreeken
SubmittedKnowledge and Information Systems
Date01.09.2019
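The "causal inference by compression" idea in the entry above can be caricatured in a few lines: encode the data in both directions with a simple regression model plus a Gaussian code for the residuals and prefer the direction with the shorter total description. This is a deliberately crude two-part MDL score with invented parameters, not the Slope/Sloper scoring or significance tests of the paper.

```python
import numpy as np

def conditional_bits(x, y, degree=3):
    """Toy cost (in bits) of sending y given x: a polynomial model plus a
    Gaussian code for the residuals. The cost of the marginal of the 'cause'
    is assumed comparable in both directions and is therefore dropped."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma = residuals.std() + 1e-12
    data_bits = 0.5 * len(y) * np.log2(2 * np.pi * np.e * sigma ** 2)
    model_bits = 32.0 * (degree + 1)            # crude: 32 bits per coefficient
    return data_bits + model_bits

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=2000)
y = x ** 3 + rng.normal(scale=1.0, size=x.size)   # ground truth: X causes Y

forward = conditional_bits(x, y)     # bits for Y as a function of X
backward = conditional_bits(y, x)    # bits for X as a function of Y
print("infer X -> Y" if forward < backward else "infer Y -> X")
```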
Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $\ell_1$ and $\ell_2$ metrics from $\tilde{O}(dn^{2-\Theta(\sqrt{\epsilon})})$ to $\tilde{O}(dn + n^{2-\alpha})$, where $\alpha = \Omega({\epsilon^{1/3}}/{\log(1/\epsilon)})$. These algorithms are also used to improve a framework that provides approximate solutions to other high dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+\epsilon)$-approximate $k$th-nearest neighbor distance, $(4+\epsilon)$-approximate Min-Max clustering, $(4+\epsilon)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+\epsilon)$-approximates greedy permutations in time $\tilde{O}((dn + n^{2-\alpha}) \cdot \log{\Phi})$ where $\Phi$ is the spread of the input. This algorithm is used to $(2+\epsilon)$-approximate $k$-center with the same time complexity. (A definitional sketch of $r$-nets appears after this entry.)
AuthorsGeorgia Avarikioti, Alain Ryser, Yuyi Wang, Roger Wattenhofer
SubmittedProceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 3207-3214).
Date17.07.2019
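For readers unfamiliar with the structure used above: an r-net is a subset of points that is both an r-covering (every point lies within distance r of some net point) and an r-packing (net points are pairwise more than r apart). The quadratic-time greedy construction below is only the definitional baseline that the paper's algorithms improve upon, not the approximation scheme itself.

```python
import numpy as np

def greedy_r_net(points, r):
    """Quadratic-time greedy r-net: add a point whenever it lies farther than
    r (Euclidean) from every net point chosen so far."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > r for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(500, 2))
net = greedy_r_net(pts, r=1.5)

# Check the two defining properties: covering and packing.
covering = all(min(np.linalg.norm(p - q) for q in net) <= 1.5 for p in pts)
packing = all(np.linalg.norm(a - b) > 1.5
              for i, a in enumerate(net) for b in net[i + 1:])
print(len(net), covering, packing)
```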
We consider the problem of telling apart cause from effect between two univariate continuous-valued random variables X and Y. In general, it is impossible to make definite statements about causality without making assumptions on the underlying model; one of the most important aspects of causal inference is hence to determine under which assumptions we are able to do so. In this paper we show under which general conditions we can identify cause from effect by simply choosing the direction with the best regression score. We define a general framework of identifiable regression-based scoring functions, and show how to instantiate it in practice using regression splines. Compared to existing methods that either give strong guarantees, but are hardly applicable in practice, or provide no guarantees, but do work well in practice, our instantiation combines the best of both worlds; it gives guarantees, while empirical evaluation on synthetic and real-world data shows that it performs at least as well as the state of the art.
AuthorsAlexander Marx, Jilles Vreeken
SubmittedProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD 2019
Date01.07.2019
AuthorsLisa Ruby, Sergio J. Sanabria, Katharina Martini, Konstantin J. Dedes, Denise Vorburger, Ece Özkan Elsen, Thomas Frauenfelder, Orcun Goksel, Marga B. Rominger
SubmittedInvestigative Radiology
Date30.06.2019
We present a probabilistic model for clustering which enables the modeling of overlapping clusters where objects are only available as pairwise distances. Examples of such distance data are genomic string alignments, or protein contact maps. In our clustering model, an object has the freedom to belong to one or more clusters at the same time. By using an Indian buffet process (IBP) prior, there is no need to explicitly fix the number of clusters, as well as the number of overlapping clusters, in advance. In this paper, we demonstrate the utility of our model using distance data obtained from HIV1 protease inhibitor contact maps.
AuthorsSandhya Prabhakaran, Julia E. Vogt
SubmittedArtificial Intelligence in Medicine (AIME), Springer Lecture Notes in Artificial Intelligence, 2019
Date29.05.2019
The recent adoption of Electronic Health Records (EHRs) by health care providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured texts designed for human readability. In this work, we develop the methodology to automatically extract clinical features from clinical narratives from large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and to utilize the clusters to represent information about the patient compactly. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of about 65 thousand documents with a total of about 3.2 million sentences. We identify 341 significant statistical associations between the presence of somatic mutations and clinical features. We annotated these associations according to their novelty, and report several known associations. We also propose 32 testable hypotheses where the underlying biological mechanism does not appear to be known but plausible. These results illustrate that the automated discovery of clinical features is possible and the joint analysis of clinical and genetic datasets can generate appealing new hypotheses.
AuthorsStefan G. Stark, Stephanie L. Hyland, Melanie F. Pradier, Kjong Lehmann, Andreas Wicki, Fernando Perez Cruz, Julia E. Vogt, Gunnar Rätsch
SubmittedArxiv preprint
Date02.05.2019
Motivation: Personalized medicine aims at combining genetic, clinical, and environmental data to improve medical diagnosis and disease treatment, tailored to each patient. This paper presents a Bayesian nonparametric (BNP) approach to identify genetic associations with clinical/environmental features in cancer. We propose an unsupervised approach to generate data-driven hypotheses and bring potentially novel insights about cancer biology. Our model combines somatic mutation information at gene-level with features extracted from the Electronic Health Record. We propose a hierarchical approach, the hierarchical Poisson factor analysis (H-PFA) model, to share information across patients having different types of cancer. To discover statistically significant associations, we combine Bayesian modeling with bootstrapping techniques and correct for multiple hypothesis testing. Results: Using our approach, we empirically demonstrate that we can recover well-known associations in cancer literature. We compare the results of H-PFA with two other classical methods in the field: case-control (CC) setups, and linear mixed models (LMMs).
AuthorsMelanie F. Pradier, Stephanie L. Hyland, Stefan G. Stark, Kjong Lehmann, Julia E. Vogt, Fernando Perez-Cruz, Gunnar Rätsch
SubmittedBiorxiv preprint
Date29.04.2019
Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased and L2-consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
AuthorsAlexander Marx, Jilles Vreeken
SubmittedProceedings of the International Conference on Artificial Intelligence and Statistics, AISTATS 2019
Date01.04.2019
Background Machine learning models may enhance the early detection of clinically relevant hyperbilirubinemia based on patient information available in every hospital. Methods We conducted a longitudinal study on preterm and term born neonates with serial measurements of total serum bilirubin in the first two weeks of life. An ensemble that combines a logistic regression with a random forest classifier was trained to discriminate between the two classes phototherapy treatment vs. no treatment. Results Of 362 neonates included in this study, 98 had a phototherapy treatment, which our model was able to predict up to 48 h in advance with an area under the ROC curve of 95.20%. From a set of 44 variables, including potential laboratory and clinical confounders, a subset of just four (bilirubin, weight, gestational age, hours since birth) suffices for a strong predictive performance. The resulting early phototherapy prediction tool (EPPT) is provided as an open web application. Conclusion Early detection of clinically relevant hyperbilirubinemia can be enhanced by the application of machine learning. Existing guidelines can be further improved to optimize the timing of bilirubin measurements to avoid toxic hyperbilirubinemia in high-risk patients while minimizing unneeded measurements in neonates who are at low risk. (A minimal sketch of such an ensemble appears after this entry.)
AuthorsImant Daunhawer, Severin Kasser, Gilbert Koch, Lea Sieber, Hatice Cakal, Janina Tütsch, Marc Pfister, Sven Wellmann, Julia E. Vogt
SubmittedPediatric Research, 2019
Date30.03.2019
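The ensemble described above combines a logistic regression with a random forest classifier. Below is a minimal scikit-learn sketch of such a soft-voting combination; the data are synthetic placeholders in the spirit of the four retained predictors (bilirubin, weight, gestational age, hours since birth), not the published EPPT pipeline or its hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-ins for four predictors; the label mimics "needs phototherapy".
X = rng.normal(size=(362, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=362) > 0.8).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("forest", RandomForestClassifier(n_estimators=300, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities of both members
)
auc = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc")
print(round(auc.mean(), 3))
```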
AuthorsAlvaro Gomariz, Weiye Li, Ece Özkan Elsen, Christine Tanner, Orcun Goksel
SubmittedInternational Symposium on Biomedical Imaging (ISBI)
Date06.02.2019
The classification of time series data is a well-studied problem with numerous practical applications, such as medical diagnosis and speech recognition. A popular and effective approach is to classify new time series in the same way as their nearest neighbours, whereby proximity is defined using the Dynamic Time Warping (DTW) distance, a measure analogous to sequence alignment in bioinformatics. However, practitioners are not only interested in accurate classification, they are also interested in why a time series is classified a certain way. To this end, we introduce here the problem of finding a minimum-length subsequence of a time series, the removal of which changes the outcome of the classification under the nearest neighbour algorithm with DTW distance. Informally, such a subsequence is expected to be relevant for the classification and can be helpful for practitioners in interpreting the outcome. We describe a simple but optimized implementation for detecting these subsequences and define an accompanying measure to quantify the relevance of every time point in the time series for the classification. In tests on electrocardiogram data, we show that the algorithm allows discovery of important subsequences and can be helpful in detecting abnormalities in cardiac rhythms, distinguishing sick from healthy patients. (A brute-force sketch of this search appears after this entry.)
AuthorsRicards Marcinkevics, Steven Kelk, Carlo Galuzzi, Berthold Stegemann
SubmittedArxiv
Date26.01.2019
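The two ingredients of the entry above, the DTW distance and the search for a minimum-length subsequence whose removal flips the 1-NN label, can be written out directly. The brute-force search below is only a readable baseline on a toy signal; the paper describes an optimized implementation and an accompanying per-time-point relevance measure that are not reproduced here.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def nn_label(series, references, labels):
    """Label of the DTW-nearest reference series."""
    return labels[int(np.argmin([dtw(series, r) for r in references]))]

def minimal_flipping_subsequence(series, references, labels):
    """Shortest contiguous subsequence whose removal flips the 1-NN label
    (brute force over all start/length pairs, shortest lengths first)."""
    original = nn_label(series, references, labels)
    n = len(series)
    for length in range(1, n):
        for start in range(0, n - length + 1):
            reduced = np.concatenate([series[:start], series[start + length:]])
            if nn_label(reduced, references, labels) != original:
                return start, length
    return None

# Toy example: two reference 'beats', one flat and one with a bump.
flat = np.zeros(30)
bump = np.concatenate([np.zeros(10), np.ones(10), np.zeros(10)])
query = np.concatenate([np.zeros(12), np.ones(6), np.zeros(12)])
print(minimal_flipping_subsequence(query, [flat, bump], ["healthy", "abnormal"]))
```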
AuthorsStefanie Ehrbar, Alexander Jöhl, Michael Kühni, Mirko Meboldt, Ece Özkan Elsen, Christine Tanner, Orcun Goksel, Stephan Klöck, Jan Unkelbach, Matthias Guckenberger, Stephanie Tanadini-Lang
SubmittedMedical Physics
Date03.01.2019
2018
AuthorsSandhya Prabhakaran and Julia E. Vogt
SubmittedAll of Bayesian Nonparametrics Workshop in Neural Information Processing Systems Conference 2018
Date02.12.2018
To exploit the full potential of big routine data in healthcare and to efficiently communicate and collaborate with information technology specialists and data analysts, healthcare epidemiologists should have some knowledge of large-scale analysis techniques, particularly about machine learning. This review focuses on the broad area of machine learning and its first applications in the emerging field of digital healthcare epidemiology.
AuthorsJan A. Roth, Manuel Battegay, Fabrice Juchler, Julia E. Vogt, Andreas F. Widmer
SubmittedInfection Control & Hospital Epidemiology, 2018
Date04.11.2018
AuthorsSergio J Sanabria, Ece Özkan Elsen, Marga Rominger, Orcun Goksel
SubmittedPhysics in Medicine and Biology
Date26.10.2018
How can we discover whether X causes Y, or vice versa, that Y causes X, when we are only given a sample over their joint distribution? How can we do this such that X and Y can be univariate, multivariate, or of different cardinalities? And, how can we do so regardless of whether X and Y are of the same, or of different data type, be it discrete, numeric, or mixed? These are exactly the questions we answer. We take an information theoretic approach, based on the Minimum Description Length principle, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. Simply put, if Y can be explained more succinctly by a set of classification or regression trees conditioned on X, than in the opposite direction, we conclude that X causes Y. Empirical evaluation on a wide range of data shows that our method, Crack, infers the correct causal direction reliably and with high accuracy on a wide range of settings, outperforming the state of the art by a wide margin. Code related to this paper is available at: http://eda.mmci.uni-saarland.de/crack.
AuthorsAlexander Marx, Jilles Vreeken
SubmittedProceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data, ECMLPKDD 2018
Date13.08.2018
AuthorsEce Özkan Elsen, Valery Vishnevsky, Orcun Goksel
SubmittedIEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control
Date03.03.2018
AuthorsEce Özkan Elsen, Orcun Goksel
SubmittedBiomedical Physics & Engineering Express
Date10.01.2018
Wearable health sensors are about to change our health system. While several technological improvements have been presented to enhance performance and energy efficiency, battery runtime is still a critical concern for the practical use of wearable biomedical sensor systems. The runtime limitation is directly related to the battery size, which is another concern regarding practicality and customer acceptance. We introduced ULPSEK (Ultra-Low-Power Sensor Evaluation Kit) for the evaluation of biomedical sensors and monitoring applications (http://ulpsek.com). ULPSEK includes a multiparameter sensor measuring and processing electrocardiogram, respiration, motion, body temperature, and photoplethysmography. Instead of a battery, ULPSEK is powered using an efficient body heat harvester. The harvester produced 171 µW on average, which was sufficient to power the sensor below 25 °C ambient temperature. We present design issues regarding the power supply and the power distribution network of the ULPSEK sensor platform. Due to the security aspect of self-powered health sensors, we suggest a hybrid solution consisting of a battery charged by a harvester. (A small runtime-calculation sketch appears after this entry.)
AuthorsA. Tobola, H. Leutheuser, M. Pollak, P. Spies, C. Hofmann, C. Weigand, B.M. Eskofier, G. Fischer
SubmittedIEEE J Biomed Health Inform.
Date01.01.2018
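A battery runtime calculation of the kind the entry above alludes to reduces, at its core, to dividing battery capacity by the duty-cycle-weighted average current of the selected sensor modules. The module currents, duty cycles, and capacity below are invented for illustration and are not figures from the ULPSEK tool.

```python
def average_current_ma(modules):
    """Duty-cycle-weighted average current in mA.
    Each module is (active_current_mA, sleep_current_mA, duty_cycle in [0, 1])."""
    return sum(act * duty + sleep * (1.0 - duty) for act, sleep, duty in modules)

def runtime_hours(capacity_mah, modules):
    """Idealized runtime: capacity divided by the average current draw."""
    return capacity_mah / average_current_ma(modules)

# Illustrative numbers only: ECG front end, accelerometer, radio, MCU.
modules = [
    (0.8,  0.005, 1.00),   # ECG analog front end, always on
    (0.4,  0.002, 0.50),   # accelerometer sampled half the time
    (12.0, 0.001, 0.02),   # radio transmitting 2% of the time
    (2.5,  0.010, 0.25),   # microcontroller active 25% of the time
]
print(f"{runtime_hours(150.0, modules):.1f} h on a 150 mAh battery")
```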
2017
Cardiovascular disease is the second most common cause of diving fatalities. Monitoring the cardiovascular system in actual underwater conditions is necessary to gain insights into cardiac activity during immersion and to trigger preventive measures. We developed a wearable, current-based electrocardiogram (ECG) device in the ecosystem of the FitnessSHIRT platform. It can be used for normal/dry ECG measuring purposes but is specifically designed to allow underwater signal acquisition without having to use insulated electrodes. Our design is based on a transimpedance amplifier circuit including active current feedback. We integrated additional cascaded filter components to counter noise characteristics specific to the immersed condition of such a system. The results of the evaluation show that our design is able to deliver high-quality ECG signals underwater with no interference or loss of signal quality. To further evaluate the applicability of the system, we performed an applied study with 12 healthy subjects to examine whether differences in heart rate variability exist between sitting and supine positions of the human body immersed in water and outside of it. We saw significant differences, for example, in the RMSSD and SDSD between sitting outside the water (36 ms) and sitting immersed in water (76 ms) and in the pNN50 outside the water (6.4%) and immersed in water (18.2%). The power spectral density for the sitting positions in the TP and HF increased significantly during water immersion while the LF/HF decreased significantly. No significant changes were found for the supine position.
AuthorsS. Gradl, T. Cibis, J. Lauber, R. Richer, R. Rybalko, N. Pfeiffer, H. Leutheuser, M. Wirth, V. Tscharner, B. M. Eskofier
SubmittedAppl Sci.
Date08.12.2017
Objective: Respiratory inductance plethysmography (RIP) provides an unobtrusive method for measuring breathing characteristics. Accurately adjusted RIP provides reliable measurements of ventilation during rest and exercise if data are acquired via two elastic measuring bands surrounding the rib cage (RC) and abdomen (AB). Disadvantageously, the most accurate adjustment model for RIP reported in the literature, least-squares regression, requires simultaneous RIP and flowmeter (FM) data acquisition. An adjustment method that does not require simultaneous (reference-free) measurement of RIP and FM would foster usability enormously. Methods: In this paper, we develop generalizable, functional, and reference-free algorithms for RIP adjustment incorporating anthropometric data. Further, the performance of only one degree of freedom (RC or AB) instead of two (RC and AB) is investigated. We evaluate the algorithms with data from 193 healthy subjects who performed an incremental running test, using three different datasets: training, reliability, and validation. The regression equation is improved with machine learning techniques such as sequential forward feature selection and 10-fold cross-validation. Results: Using the validation dataset, the best reference-free adjustment model is the combination of both bands, with 84.69% of breaths within 20% limits of equivalence compared to 43.63% of breaths using the best comparable algorithm from the literature. Using only one band, we obtain better results using the RC band alone. Conclusion: Reference-free adjustment for RIP reveals tidal volume differences of up to 0.25 l when compared to the best possible adjustment currently available, which needs the simultaneous measurement of RIP and FM. Significance: This demonstrates that RIP has the potential for wide application in ambulatory settings. (A sketch of the feature-selection step appears after this entry.)
AuthorsH. Leutheuser, C. Heyde, K. Roecker, A. Gollhofer, B. M Eskofier
SubmittedIEEE Trans Biomed Eng.
Date01.12.2017
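The model-selection step described above, sequential forward feature selection with 10-fold cross-validation over anthropometric predictors, can be sketched with scikit-learn. The feature names and the synthetic target below are placeholders, not the study's adjustment equation or its data.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder anthropometric predictors and a synthetic calibration target.
feature_names = ["height", "weight", "age", "chest_circ", "abdomen_circ", "sex"]
X = rng.normal(size=(193, len(feature_names)))
y = 0.8 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=193)

# Greedily add the feature that most improves cross-validated fit.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=10
)
selector.fit(X, y)
chosen = [name for name, keep in zip(feature_names, selector.get_support()) if keep]

score = cross_val_score(LinearRegression(), selector.transform(X), y,
                        cv=10, scoring="r2")
print(chosen, round(score.mean(), 3))
```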
We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer that X causes Y in case it is shorter to describe Y as a function of X than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm which, as we show through thorough empirical evaluation on both synthetic and real-world data, outperforms the state of the art by a wide margin.
AuthorsAlexander Marx, Jilles Vreeken
SubmittedProceedings of the IEEE International Conference on Data Mining, ICDM 2017
Date01.11.2017
Aims: The identification of arrhythmogenic right ventricular dysplasia (ARVD) from the 12-channel standard electrocardiogram (ECG) is challenging. High-density ECG data may identify lead locations and criteria with a higher sensitivity. Methods and results: Eighty-channel ECG recordings from patients diagnosed with ARVD and controls were quantified by magnitude and integral measures of QRS and T waves and by a measure (the average silhouette width) of differences in the shapes of the normalized ECG cycles. The channels with the best separability between ARVD patients and controls were near the right ventricular wall, at the third intercostal space. These channels showed pronounced differences in P waves compared to controls, as well as the expected differences in QRS and T waves. Conclusion: Multichannel recordings, as in body surface mapping, add little to the reliability of diagnosing ARVD from ECGs. However, repositioning ECG electrodes to a high anterior position can improve the identification of ECG variations in ARVD. Additionally, increased P wave amplitude appears to be associated with ARVD.
AuthorsRicards Marcinkevics, James O’Neill, Hannah Law, Eleftheria Pervolaraki, Andrew Hogarth, Craig Russell, Berthold Stegemann, Arun V Holden, Muzahir H Tayebjee
SubmittedEP Europace
Date29.08.2017
Sleep plays a fundamental role in the life of every human. The prevalence of sleep disorders has increased significantly, now affecting up to 50% of the general population. Sleep is usually analyzed by extracting a hypnogram containing sleep stages. The gold standard method, polysomnography (PSG), requires subjects to stay overnight in a sleep laboratory and to wear a series of obtrusive devices. This work presents an easy-to-use method to perform somnography at home using unobtrusive motion sensors. Ten healthy male subjects were recorded during two consecutive nights. Sensors from the Shimmer platform were placed in the bed to record accelerometer data, while reference hypnograms were collected using a SOMNOwatch system. A series of filters was used to extract a motion feature in 30-second epochs from the accelerometer signals. The feature was used together with the ground truth information to train a Naive Bayes classifier that distinguished wakefulness, REM and non-REM sleep. Additionally, the algorithm was implemented on an Android mobile phone. Averaged over all subjects, the classifier had a mean accuracy of 79.0 % (SD 9.2%) for the three classes. The mobile phone implementation was able to run in real time during all experiments. In the future, this will lead to a method for simple and unobtrusive somnography using mobile phones.
AuthorsS. Gradl, H. Leutheuser, P. Kugler, T. Biermann, S. Kreil, J. Kornhuber, M. Bergner, B. M. Eskofier
SubmittedIn Proc: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date03.07.2017
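A minimal sketch of the epoch-based classification idea in the entry above is shown below, assuming a single motion feature per 30 s epoch and a Gaussian Naive Bayes classifier. The band-pass filtering of the original work and the SOMNOwatch reference labels are replaced by synthetic placeholders.

```python
# Epoch-based sleep/wake classification sketch with one motion feature per 30 s epoch.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
fs, epoch_s = 50, 30                                   # sampling rate (Hz), epoch length (s)
acc = rng.normal(0, 1, size=fs * epoch_s * 120)        # synthetic accelerometer magnitude, 1 h

# One motion feature per 30 s epoch: standard deviation of the epoch samples.
epochs = acc.reshape(-1, fs * epoch_s)
motion = epochs.std(axis=1, keepdims=True)

# Synthetic hypnogram labels (0 = wake, 1 = REM, 2 = non-REM), for illustration only.
labels = rng.integers(0, 3, size=motion.shape[0])

clf = GaussianNB().fit(motion, labels)
print("predicted stages for the first epochs:", clf.predict(motion[:5]))
```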
Ece Özkan Elsen, Christine Tanner, Matej Kastelic, Oliver Mattausch, Maxim Makhinya, Orcun Goksel
SubmittedInternational Journal of Computer Assisted Radiology and Surgery
Date22.03.2017
Textile integration of wearable computing components opens up innovative and pervasive monitoring possibilities. We present the FitnessSHIRT (Fraunhofer IIS, Erlangen, Germany) as one example of a textile-integrated wearable computing device. Using the FitnessSHIRT, the electrical activity of the human heart and breathing characteristics can be determined. Within this chapter, we give an overview of the market situation, current application scenarios, and related work. We describe the technology and algorithms behind the wearable FitnessSHIRT as well as current application areas in sports and medicine. Challenges of using textile-integrated wearable devices are stated and addressed in experiments or in explicit recommendations. The applicability of the FitnessSHIRT is shown in user studies in sports and medicine. This chapter is concluded with perspectives for textile-integrated wearable devices.
AuthorsLeutheuser, H. and Lang, N. and Gradl, S. and Struck, M. and Tobola, A. and Hofmann, C. and Anneken, L. and Eskofier, B. M.
SubmittedSmart Textiles: Fundamentals, Design, and Interaction
Date01.02.2017
2016
Battery runtime is a critical concern for practical usage of wearable biomedical sensor systems. A long runtime requires interdisciplinary low-power knowledge and appropriate design tools. We addressed this issue by designing a toolbox with three parts: (1) a modular evaluation kit for the development of wearable ultra-low-power biomedical sensors; (2) a miniaturized, wearable, and code-compatible sensor system with the same properties as the development kit; (3) a web-based battery runtime calculator for our sensor systems. The purpose of the development kit is the optimization of power consumption. Once optimization is finished, the same embedded software can be transferred to the miniaturized body-worn sensor. The web-based application supports development by quantifying the effects of use case and design decisions on battery runtime. A sensor developer can select sensor modules, configure sensor parameters, enter use-case-specific requirements, and select a battery to predict the battery runtime for a specific application. Our concept adds value to the development of ultra-low-power biomedical wearable sensors. The concept is effective for professional work and educational purposes.
AuthorsTobola, A. and Leutheuser, H. and Schmitz, B. and Hofmann, C. and Struck, M. and Weigand, C. and Eskofier, B. M. and Fischer, G.
SubmittedIn Proc: IEEE-EMBS 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN)
Date14.06.2016
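The core computation behind a runtime calculator like the one described above reduces to battery capacity divided by average current draw. The sketch below is a back-of-the-envelope illustration; the module names and current values are assumptions, not the actual toolbox data.

```python
# Simple battery runtime estimate from per-module active currents and duty cycles.
def battery_runtime_hours(capacity_mah, module_currents_ma, duty_cycles):
    """Estimate runtime (h) from per-module currents (mA) and duty cycles (0..1)."""
    avg_current_ma = sum(i * d for i, d in zip(module_currents_ma, duty_cycles))
    return capacity_mah / avg_current_ma

# Example: MCU, ECG front end and radio with different duty cycles, 150 mAh battery.
print(f"{battery_runtime_hours(150, [2.0, 0.8, 15.0], [1.0, 1.0, 0.05]):.1f} h")
```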
Arrhythmia detection algorithms require the exact and instantaneous detection of fiducial points in the ECG signal. These fiducial points (QRS-complex, P- and T-wave) correspond to distinct cardiac contraction phases. The performance evaluation of different fiducial point detection algorithms requires the existence of large databases (DBs) encompassing reference annotations. Up to last year, P- and T-wave annotations were only available for the QT DB. This was addressed by Elgendi et al., who provided P- and T-wave annotations for the MIT-BIH arrhythmia DB. A variety of ECG fiducial point detection algorithms exists in the literature; however, to the best knowledge of the authors, no single-lead algorithm ready for instantaneous P- and T-wave detection could be identified. In this work, we present three P- and T-wave detection algorithms: a revised version of a line-fitting QRS detector capable of detecting P- and T-waves, an expeditious version of a wavelet-based ECG delineation algorithm, and a fast naive fiducial point detection algorithm. The fast naive fiducial point detection algorithm performed best on both DBs, with sensitivities ranging from 73.0% (P-wave detection, error interval of ± 40 ms) to 89.4% (T-wave detection, error interval of ± 80 ms). As this algorithm detects a wave event in every search window, it has to be investigated how this affects arrhythmia detection algorithms. The reference Matlab implementations are available for download to encourage the development of highly accurate and automated ECG processing algorithms for integration in daily life using mobile computers.
AuthorsLeutheuser, H. and Gradl, S. and Anneken, L. and Arnold, M. and Lang, N. and Achenbach, S. and Eskofier, B. M.
SubmittedIn Proc: IEEE-EMBS 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN)
Date14.06.2016
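In the spirit of the "naive" search-window detector described above, the sketch below always returns one P- and one T-wave candidate per beat by taking the signal maximum in fixed windows around each R-peak. The window lengths are illustrative assumptions, not the values used in the paper.

```python
# Naive search-window P-/T-wave candidate detection around given R-peaks.
import numpy as np

def naive_p_t_detection(ecg, r_peaks, fs):
    """Return (p_indices, t_indices), one candidate per R-peak where the window fits."""
    p_idx, t_idx = [], []
    for r in r_peaks:
        p_start = max(r - int(0.25 * fs), 0)
        p_win = ecg[p_start: max(r - int(0.08 * fs), 1)]       # window before the QRS
        t_start = r + int(0.08 * fs)
        t_win = ecg[t_start: r + int(0.45 * fs)]               # window after the QRS
        if len(p_win):
            p_idx.append(p_start + int(np.argmax(p_win)))
        if len(t_win):
            t_idx.append(t_start + int(np.argmax(t_win)))
    return np.array(p_idx), np.array(t_idx)

# Toy usage on a synthetic signal with R-peaks every second at 360 Hz.
fs = 360
ecg = np.random.default_rng(0).normal(size=10 * fs)
r_peaks = np.arange(fs, 10 * fs, fs)
p, t = naive_p_t_detection(ecg, r_peaks, fs)
print(len(p), "P candidates,", len(t), "T candidates")
```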
Ece Özkan Elsen, Gemma Roig, Orcun Goksel, Xavier Boix
SubmittedarXiv
Date27.05.2016
Firat Ozdemir, Ece Özkan Elsen, Orcun Goksel
SubmittedInternational Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
Date27.05.2016
Ece Özkan Elsen, Orcun Goksel
SubmittedIEEE International Ultrasonics Symposium (IUS)
Date27.05.2016
Respiratory motion analysis based on range imaging (RI) has emerged as a popular means of generating respiration surrogates to guide motion management strategies in computer-assisted interventions. However, existing approaches employ heuristics, require substantial manual interaction, or yield highly redundant information. In this paper, we propose a framework that uses preprocedurally obtained 4-D shape priors from patient-specific breathing patterns to drive intraprocedural RI-based real-time respiratory motion analysis. As the first contribution, we present a shape motion model enabling an unsupervised decomposition of respiration induced high-dimensional body surface displacement fields into a low-dimensional representation encoding thoracic and abdominal breathing. Second, we propose a method designed for GPU architectures to quickly and robustly align our models to high-coverage multiview RI body surface data. With our fully automatic method, we obtain respiration surrogates yielding a Pearson correlation coefficient (PCC) of 0.98 with conventional surrogates based on manually selected regions on RI body surface data. Compared to impedance pneumography as a respiration signal that measures the change of lung volume, we obtain a PCC of 0.96. Using off-the-shelf hardware, our framework enables high temporal resolution respiration analysis at 50 Hz.
AuthorsJ. Wasza, P. Fischer, H. Leutheuser, T. Oefner, C. Bert, A. Maier, J. Hornegger
SubmittedIEEE Trans Biomed Eng.
Date01.03.2016
Molecular classification of hepatocellular carcinomas (HCC) could guide patient stratification for personalized therapies targeting subclass-specific cancer 'driver pathways'. Currently, there are several transcriptome-based molecular classifications of HCC with different subclass numbers, ranging from two to six. They were established using resected tumours that introduce a selection bias towards patients without liver cirrhosis and with early stage HCCs. We generated and analyzed gene expression data from paired HCC and non-cancerous liver tissue biopsies from 60 patients as well as five normal liver samples. Unbiased consensus clustering of HCC biopsy profiles identified 3 robust classes. Class membership correlated with survival, tumour size and with Edmondson and Barcelona Clinical Liver Cancer (BCLC) stage. When focusing only on the gene expression of the HCC biopsies, we could validate previously reported classifications of HCC based on expression patterns of signature genes. However, the subclass-specific gene expression patterns were no longer preserved when the fold-change relative to the normal tissue was used. The majority of genes believed to be subclass-specific turned out to be cancer-related genes differentially regulated in all HCC patients, with quantitative rather than qualitative differences between the molecular subclasses. With the exception of a subset of samples with a definitive β-catenin gene signature, biological pathway analysis could not identify class-specific pathways reflecting the activation of distinct oncogenic programs. In conclusion, we have found that gene expression profiling of HCC biopsies has limited potential to direct therapies that target specific driver pathways, but can identify subgroups of patients with different prognosis.
AuthorsZuzanna Makowska, Tujana Boldanova, David Adametz, Luca Quagliata, Julia E. Vogt, Michael T. Dill, Mathias S. Matter, Volker Roth, Luigi Terracciano, Markus H. Heim
SubmittedJournal of Pathology: Clinical Research, 2016
Date05.01.2016
In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science to analyze if two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, the exact solution of the WMW test takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programming to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we presented different optimization strategies and developed a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches, and moreover evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/.
AuthorsAlexander Marx, Christina Backes, Eckart Meese, Hans-Peter Lenhof, Andreas Keller
SubmittedGenomics, Proteomics & Bioinformatics
Date01.01.2016
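The dynamic-programming idea behind an exact WMW test can be sketched for the simplified tie-free case: count, for every possible rank sum, how many subsets of the pooled ranks achieve it, and read the exact p-value off that distribution. This is only a no-ties illustration of the principle; handling ties exactly, as EDISON-WMW does, requires a more elaborate scheme.

```python
# Exact two-sided rank-sum p-value via subset-sum dynamic programming (no ties assumed).
import numpy as np

def exact_wmw_pvalue(x, y):
    """Exact two-sided p-value of the rank-sum statistic of x, assuming no ties."""
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    ranks = pooled.argsort().argsort() + 1           # ranks 1..N (distinct values assumed)
    w_obs = ranks[:n].sum()                          # rank sum of the first sample
    N = n + m
    max_sum = N * (N + 1) // 2
    # counts[k, s] = number of k-element subsets of {1,...,N} whose ranks sum to s
    counts = np.zeros((n + 1, max_sum + 1))
    counts[0, 0] = 1.0
    for r in range(1, N + 1):                        # add rank r to the pool
        for k in range(min(r, n), 0, -1):            # descending k: use each rank once
            counts[k][r:] += counts[k - 1][:-r]
    probs = counts[n] / counts[n].sum()              # null distribution of the rank sum
    mean_w = n * (N + 1) / 2
    sums = np.arange(max_sum + 1)
    return probs[np.abs(sums - mean_w) >= abs(w_obs - mean_w) - 1e-12].sum()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=12)
y = rng.normal(1.0, 1.0, size=15)
print(f"exact two-sided p = {exact_wmw_pvalue(x, y):.4f}")
```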
2015
This paper proposes a new framework to find associations between somatic mutations and clinical features in cancer. The clinical features are directly extracted from the Electronic Health Records by performing a large-scale clustering of the sentences. Using a linear mixed model, we find significant associations between EHR-based phenotypes and gene mutations, while correcting for the cancer type as a confounding effect. To the authors' knowledge, this is the first attempt to perform genetic association studies using EHR-based phenotypes. Such research has the potential to help in the discovery of unknown mechanisms in cancer, which would make it possible to prevent the disease, monitor patients at risk, and design tailored treatments for patients.
AuthorsMelanie F. Pradier, Stefan Stark, Stephanie Hyland, Julia E. Vogt, Gunnar Rätsch, and Fernando Perez-Cruz
SubmittedPaper + Spotlight Talk at Machine Learning for Computational Biology Workshop in Neural Information Processing Systems Conference 2015
Date07.12.2015
Melanie F. Pradier, Theofanis Karaletsos, Stefan Stark, Julia E. Vogt, Gunnar Rätsch, and Fernando Perez-Cruz
SubmittedAccepted Abstract at Machine Learning for Healthcare Workshop in Neural Information Processing Systems Conference 2015
Date06.12.2015
Photoplethysmography (PPG) is a non-invasive, inexpensive and unobtrusive method for heart rate monitoring during physical exercise. Motion artifacts during exercise challenge the heart rate estimation from wrist-type PPG signals. This paper presents a methodology to overcome this limitation by incorporating acceleration information. The proposed algorithm consisted of four stages: (1) wavelet-based denoising, (2) acceleration-based denoising, (3) a frequency-based approach to estimate the heart rate, followed by (4) a postprocessing step. Experiments with different movement types such as running and rehabilitation exercises were used for algorithm design and development. Evaluation showed that our heart rate estimation achieved a mean absolute error of 1.96 bpm (beats per minute) with a standard deviation of 2.86 bpm and a correlation of 0.98. These findings suggest that the proposed methodology is robust to motion artifacts and is therefore applicable for heart rate monitoring during sports and rehabilitation.
AuthorsMullan, P. J. and Kanzler, C. M. and Lorch, B. and Schröder, L. and Winkler, L. and Laich, L. H. and Riedel, F. and Richer, R. and Luckner, C. and Leutheuser, H. and Eskofier, B. M. and Pasluosta, C. F.
SubmittedIn Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date25.08.2015
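The frequency-based estimation stage described in the entry above can be illustrated as follows: attenuate the PPG spectrum wherever the wrist acceleration spectrum is strong, then pick the remaining peak in a plausible heart-rate band. The window length, band limits and attenuation rule are illustrative assumptions, not the paper's parameters.

```python
# Frequency-domain heart rate estimation with acceleration-based attenuation (sketch).
import numpy as np

def estimate_hr_bpm(ppg, acc, fs):
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    ppg_spec = np.abs(np.fft.rfft(ppg - ppg.mean()))
    acc_spec = np.abs(np.fft.rfft(acc - acc.mean()))
    band = (freqs >= 0.7) & (freqs <= 3.5)                  # 42-210 bpm
    # Suppress PPG bins dominated by motion (simplified acceleration-based denoising).
    cleaned = ppg_spec * band / (1.0 + acc_spec / (acc_spec.max() + 1e-12))
    return 60.0 * freqs[np.argmax(cleaned)]

# Toy usage: 8 s window at 125 Hz with a 1.5 Hz pulse and a 2.5 Hz motion artifact.
fs = 125
t = np.arange(8 * fs) / fs
ppg = np.sin(2 * np.pi * 1.5 * t) + 0.8 * np.sin(2 * np.pi * 2.5 * t)
acc = np.sin(2 * np.pi * 2.5 * t)
print(f"estimated heart rate: {estimate_hr_bpm(ppg, acc, fs):.0f} bpm")   # expect ~90 bpm
```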
In the last decade, the interest in heart rate variability analysis has increased tremendously. Related algorithms depend on accurate temporal localization of the heartbeat, e.g. the R-peak in electrocardiogram signals, especially in the presence of arrhythmia. This localization can be delivered by numerous solutions found in the literature, which all lack an exact specification of their temporal precision. We implemented three different state-of-the-art algorithms and evaluated the precision of their R-peak localization. We suggest a method to estimate the overall R-peak temporal inaccuracy, dubbed beat slackness, of QRS detectors with respect to normal and abnormal beats. We also propose a simple algorithm that can complement existing detectors to reduce this slackness. Furthermore, we define improvements to one of the three detectors, allowing it to be used in real time on mobile devices or embedded hardware. Across the entire MIT-BIH Arrhythmia Database, the average slackness of all the tested algorithms was 9 ms for normal beats and 13 ms for abnormal beats. Using our complementing algorithm this could be reduced to 4 ms for normal beats and to 7 ms for abnormal beats. The presented methods can be used to significantly improve the precision of R-peak detection and provide an additional measurement for QRS detector performance.
AuthorsGradl, S. and Leutheuser, H. and Elgendi, M. and Lang, N. and Eskofier, B. M.
SubmittedIn Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date25.08.2015
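A slackness-style measurement as described above amounts to averaging the timing error between detected and annotated R-peaks. The sketch below uses a simple nearest-detection matching rule and makes no distinction between normal and abnormal beats; both simplifications and the variable names are assumptions for illustration.

```python
# Simplified beat-slackness computation: mean absolute timing error in milliseconds.
import numpy as np

def beat_slackness_ms(detected, reference, fs):
    """Mean absolute distance (ms) from each reference R-peak to its nearest detection."""
    detected = np.sort(np.asarray(detected))
    errors = [np.min(np.abs(detected - r)) for r in reference]
    return 1000.0 * float(np.mean(errors)) / fs

# Toy usage at 360 Hz: detections jittered by a few samples around the references.
fs = 360
reference = np.arange(fs, 10 * fs, fs)
detected = reference + np.random.default_rng(0).integers(-4, 5, size=len(reference))
print(f"slackness: {beat_slackness_ms(detected, reference, fs):.1f} ms")
```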
Epilepsy is a disease of the central nervous system. Nearly 70% of people with epilepsy respond to a proper treatment, but for a successful therapy of epilepsy, physicians need to know if and when seizures occur. The gold standard diagnostic tool, video-electroencephalography (vEEG), requires patients to stay at the hospital for several days. A wearable sensor system, e.g. a wristband, serving as a diagnostic tool or event monitor would allow unobtrusive ambulatory long-term monitoring while reducing costs. Previous studies showed that seizures with motor symptoms, such as generalized tonic-clonic seizures, can be detected by measuring the electrodermal activity (EDA) and motion via acceleration (ACC). In this study, EDA and ACC from 8 patients were analyzed. In extension to previous studies, different types of seizures, including seizures without motor activity, were taken into account. A hierarchical classification approach was implemented in order to detect different types of epileptic seizures using data from wearable sensors. Using a k-nearest neighbor (kNN) classifier, an overall sensitivity of 89.1% and an overall specificity of 93.1% were achieved; for seizures without motor activity, the sensitivity was 97.1% and the specificity was 92.9%. The presented method is a first step towards a reliable ambulatory monitoring system for epileptic seizures with and without motor activity.
AuthorsB. E. Heldberg, T. Kautz, H. Leutheuser, R. Hopfengärtner, B. Kasper, B. M. Eskofier
SubmittedIn Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date25.08.2015
Medical diagnosis is the first level for recognition and treatment of diseases. To enable fast diagnosis, we propose a concept of a basic framework for the underwater monitoring of a diver's ECG signal, including an alert system that warns the diver of predefined medical emergency situations. The framework contains QRS detection, heart rate calculation and an alert system. After performing a predefined study protocol, the algorithm's accuracy was evaluated with 10 subjects in a dry environment and with 5 subjects in an underwater environment. The results showed that, in 3 out of 5 dives as well as in the dry environment, data transmission remained stable. In these cases, the subjects were able to trigger the alert system. The evaluated data showed a clear ECG signal with a QRS detection accuracy of 90%. Thus, the proposed framework has the potential to detect and to warn of health risks. Further development of this concept could extend it to the monitoring of additional biomedical parameters.
AuthorsT. Cibis, B. Groh, H. Leutheuser, B. M. Eskofier
SubmittedIn Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date25.08.2015
Purpose: Exercise and physical activity are a driving force for mental health. Major challenges in the treatment of psychological diseases are accurate activity profiles and adherence to exercise intervention programs. We present the development and validation of CHRONACT, a wearable real-time activity tracker based on inertial sensor data to support mental health. Methods: CHRONACT comprised a Human Activity Recognition (HAR) algorithm that determined activity levels based on their Metabolic Equivalent of Task (MET) with sensors on ankle and wrist. Special emphasis was put on wearability, real-time data analysis and runtime, so that the system can be used as an augmented feedback device. For the development, data of 47 healthy subjects performing clinical intervention program activities were collected to train different classification models. The most suitable model according to the accuracy and processing power tradeoff was selected for an embedded implementation on CHRONACT. Results: A validation trial (six subjects, 6 h of data) showed the accuracy of the system with a classification rate of 85.6%. The main source of error was identified in acyclic activities that contained activity bouts of neighboring classes. The runtime of the system was more than 7 days, and continuous result logging was available for 39 h. Conclusions: In future applications, the CHRONACT system can be used to create accurate and unobtrusive patient activity profiles. Furthermore, the system is ready to assess the effects of individual augmented feedback on exercise adherence.
AuthorsU. Jensen, H. Leutheuser, S. Hofmann, B. Schuepferling, G. Suttner, K. Seiler, J. Kornhuber, B. M. Eskofier
SubmittedBiomed Eng Lett.
Date18.07.2015
We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and the identities of the objects do not need to be known. Further, the model does not require the number of clusters to be specified in advance; it is instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data, showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.
AuthorsJulia E. Vogt, Marius Kloft, Stefan Stark, Sandhya Prabhakaran, Sudhir Raman, Volker Roth and Gunnar Rätsch
SubmittedMachine Learning Journal, 2015
Date16.07.2015
Long battery runtime is one of the most wanted properties of wearable sensor systems. The sampling rate has a high impact on the power consumption. However, defining a sufficient sampling rate, especially for cutting-edge mobile sensors, is difficult. Often, a high sampling rate, up to four times higher than necessary, is chosen as a precaution. Especially for biomedical sensor applications, many contradictory recommendations exist on how to select the appropriate sampling rate. They are all motivated from one point of view: the signal quality. In this paper we argue for keeping the sampling rate as low as possible. To this end, we reviewed common algorithms for biomedical signal processing. For each algorithm, the number of operations depending on the data rate was estimated. The Bachmann-Landau notation was used to evaluate the computational complexity as a function of the sampling rate. We found linear, logarithmic, quadratic and cubic dependencies.
AuthorsTobola, A. and Streit, F. and Espig, C. and Korpok, O. and Leutheuser, H. and Sauter, C. and Lang, N. and Schmitz, B. and Hofmann, C. and Struck, M. and Weigand, C. and Eskofier, B. M. and Fischer, G.
SubmittedIn Proc: IEEE-EMBS 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN)
Date09.06.2015
Far too many people are dying from stroke or other heart-related diseases each year. Early detection of abnormal heart rhythm could trigger the timely presentation to the emergency department or outpatient unit. Smartphones are an integral part of everyone's life, and they form the ideal basis for mobile monitoring and real-time analysis of signals related to the human heart. In this work, we investigated the performance of arrhythmia classification systems using only features calculated from the time instances of individual heart beats. We built a sinusoidal model using N (N = 10, 15, 20) consecutive RR intervals to predict the (N+1)th RR interval. The integration of the innovative sinusoidal regression feature, together with the amplitude and phase of the proposed sinusoidal model, led to an increase in the mean class-dependent classification accuracies. Best mean class-dependent classification accuracies of 90% were achieved using a Naive Bayes classifier. Well-performing real-time arrhythmia classification algorithms using only the time instances of individual heart beats could have a tremendous impact in reducing healthcare costs and reducing the high number of deaths related to cardiovascular diseases.
AuthorsLeutheuser, H. and Tobola, A. and Anneken, L. and Arnold, M. and Lang, N. and Achenbach, S. and Eskofier, B. M.
SubmittedIn Proc: IEEE-EMBS 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN)
Date09.06.2015
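The sinusoidal model of consecutive RR intervals mentioned above can be sketched as a least squares fit of a sinusoid over a small frequency grid, followed by a one-step prediction. The grid, N, and the variable names are illustrative assumptions; the paper's derived features (regression error, amplitude, phase) would be read off the same fit.

```python
# Fit RR_i ~ a*sin(w*i) + b*cos(w*i) + c over the last N intervals, then predict the next.
import numpy as np

def predict_next_rr(rr, freqs=np.linspace(0.05, 0.45, 41)):
    n = len(rr)
    i = np.arange(n)
    best = (np.inf, None, None)
    for w in 2 * np.pi * freqs:                           # grid search over the frequency
        X = np.column_stack([np.sin(w * i), np.cos(w * i), np.ones(n)])
        coef, *_ = np.linalg.lstsq(X, rr, rcond=None)
        err = np.sum((rr - X @ coef) ** 2)
        if err < best[0]:
            best = (err, w, coef)
    _, w, coef = best
    return np.array([np.sin(w * n), np.cos(w * n), 1.0]) @ coef

# Toy usage: 20 RR intervals with a slow respiratory-like oscillation around 800 ms.
i = np.arange(20)
rr = 800 + 40 * np.sin(2 * np.pi * 0.2 * i)
print(f"predicted next RR interval: {predict_next_rr(rr):.1f} ms")   # close to 800 ms
```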
Athletes and their coaches aim to enhance sports performance. Collecting data from athletes, transforming them into useful information related to their sports performance (e.g., their type of gait), and transmitting the information to the coaches supports this aim. The gait types standing, walking, and running have often been examined, whereas research on the two types of running, jogging and sprinting, is still lacking. In this work, standing, walking, jogging, and sprinting were classified with a single inertial-magnetic measurement unit that was placed at a novel position at the trunk. A comparison was made between classification systems using different combinations of accelerometer, gyroscope, and magnetometer data as well as different classifiers (Naïve Bayes, k-Nearest Neighbors, Support Vector Machine, Adaptive Boosting). After collecting data from 15 male subjects, the data were preprocessed, features were extracted and selected, and the data were classified. All classification systems were successful. With a mean true positive rate of 95.68% ±1.80%, the classification system using accelerometer and gyroscope data together with the Naïve Bayes classifier performed best. The classification system can be used for applications in sport, and sports performance analysis in particular.
AuthorsK. Full, H. Leutheuser, J. Schlessman, R. Armitage, B. M. Eskofier
SubmittedIn Proc: IEEE-EMBS 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN)
Date09.06.2015
Ece Özkan Elsen, Orcun Goksel
SubmittedInternational Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date27.05.2015
Everything in nature tries to reach the lowest possible energy level. Therefore, any natural or artificial system must have the ability to adjust itself to the changing requirements of its surrounding environment. In this paper we address this issue with an ECG sensor designed to be adjustable during runtime, having the ability to reduce the power consumption at the cost of the informational content. Standard ECG hardware and open-source software, accessible to everyone, have been used to realize an ECG processing system for wearable applications. The average power consumption has been measured for each mode of operation. Finally, we conclude that context-aware scaling is a key feature for addressing the energy issue of wearable sensor systems.
AuthorsTobola, A. and Espig, C. and Streit, F. J. and Korpok, O. and Leutheuser, H. and Schmitz, B. and Hofmann, C. and Struck, M. and Weigand, C. and Eskofier, B. M. and Fischer, G.
SubmittedIn Proc: 10th Annual IEEE International Symposium on Medical Measurements and Applications (MeMeA)
Date07.05.2015
A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly without requiring a vectorial embedding of data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density based clustering methods and model-based clustering. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, the visualization of data can be used as a first pre-processing step when analyzing real world data sets to get an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets which demonstrate the high quality and usefulness of the inferred structure.
AuthorsJulia E. Vogt
SubmittedIEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 12 , Issue: 4 , July-Aug. 1 2015)
Date26.01.2015
2014
Early detection of arrhythmic beats in the electrocardiogram (ECG) signal could improve the identification of patients at risk of sudden death, for example due to coronary heart disease. We present a mobile, hierarchical classification system (three stages in total) using complete databases, with the aim of providing instantaneous analysis in case of symptoms and, if necessary, the recommendation to visit an emergency department. In this work, we give more details about the training process of the second-stage classifier. The Linear Regression classifier achieved the smallest false negative rate of 14.06% with an accuracy of 66.19% after feature selection. It remains to be investigated whether the hierarchical classification system in its entirety performs better when the second-stage classifier is oriented towards the false negative rate or towards accuracy. The complete hierarchical classification system has the potential to provide automated, accurate ECG arrhythmia detection that can easily be integrated into daily life.
AuthorsH. Leutheuser, T. Gottschalk, L. Anneken, M. Struck, A. Heuberger, M. Arnold, S. Achenbach, B. M. Eskofier
SubmittedIn Proc: Conference on Mobile and Information Technologies in Medicine (MobileMed)
Date20.11.2014
Activity recognition is mandatory in order to provide feedback about the individual quality of life. Usually, activity recognition algorithms are evaluated on one specific database, which is limited in the number of subjects, sensors and types of activities. In this paper, a novel database fusion strategy was proposed which fused three different publicly available databases into one large database consisting of 42 subjects. The fusion of databases addresses two attributes of the term "big data": high volume and high variety. Furthermore, an algorithm was developed which can deal with multiple databases varying in the number of sensors and activities. Nine features were computed in sliding windows of inertial data from several sensor positions. Decision-level fusion was performed in order to combine the information of different sensor positions. The proposed classification system achieved an overall mean classification rate of 85.8 % and allows an easy integration of new databases. Using big data is necessary to develop robust and stable activity recognition algorithms in the future.
AuthorsSchuldhaus, D. and Leutheuser, H. and Eskofier, B. M.
SubmittedIn Proc: 9th International Conference on Body Area Networks (BodyNets)
Date01.09.2014
Analysis of electroencephalography (EEG) recorded during movement is often aggravated or even completely hindered by electromyogenic artifacts. This is caused by the overlapping frequencies of brain and myogenic activity and the higher amplitude of the myogenic signals. One commonly employed computational technique to reduce these types of artifacts is Independent Component Analysis (ICA). ICA estimates statistically independent components (ICs) that, when linearly combined, closely match the input (sensor) data. Removing the ICs that represent artifact sources and re-mixing the sources returns the input data with reduced noise activity. ICs of real-world data are usually not perfectly separated actual sources, but rather mixtures of these sources. Adding additional input signals, predominantly generated by a single IC that is already part of the original sensor data, should increase that IC's separability. We conducted this study to evaluate this concept for ICA-based electromyogenic artifact reduction in EEG using EMG signals as additional inputs. To acquire the appropriate data, we worked with nine human volunteers. The EEG and EMG were recorded while the study volunteers performed seven exercises designed to produce a wide range of representative myogenic artifacts. To evaluate the effect of the EMG signals, we estimated the sources of each dataset once with and once without the EMG data. The ICs were automatically classified as either 'myogenic' or 'non-myogenic'. We removed the former before back projection. Afterwards, we calculated an objective measure to quantify the artifact reduction and assess the effect of including EMG signals. Our study showed that the ICA-based reduction of electromyogenic artifacts can be improved by including the EMG data of artifact-inducing muscles. This approach could prove beneficial for locomotor disorder research, brain-computer interfaces, neurofeedback, and most other areas where brain activity during movement has to be analyzed.
AuthorsF. Gabsteiger, H. Leutheuser, P. Reis, M. Lochmann, B. M. Eskofier
SubmittedIn Proc: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date26.08.2014
Respiratory inductive plethysmography (RIP) has been introduced as an alternative for measuring ventilation by means of body surface displacement (diameter changes in rib cage and abdomen). Using a posteriori calibration, it has been shown that RIP may provide accurate measurements for ventilatory tidal volume under exercise conditions. Methods for a priori calibration would facilitate the application of RIP. Currently, to the best knowledge of the authors, none of the existing ambulant procedures for RIP calibration can be used a priori for valid subsequent measurements of ventilatory volume under exercise conditions. The purpose of this study is to develop and validate a priori calibration algorithms for ambulant application of RIP data recorded in running exercise. We calculated Volume Motion Coefficients (VMCs) using seven different models on resting data and compared the root mean squared error (RMSE) of each model applied on running data. Least squares approximation (LSQ) without offset of a two-degree-of-freedom model achieved the lowest RMSE value. In this work, we showed that a priori calibration of RIP exercise data is possible using VMCs calculated from 5 min resting phase where RIP and flowmeter measurements were performed simultaneously. The results demonstrate that RIP has the potential for usage in ambulant applications.
AuthorsH. Leutheuser, C. Heyde, A. Gollhofer, B. M. Eskofier
SubmittedIn Proc: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date26.08.2014
The electrocardiogram (ECG) is a key diagnostic tool in heart disease and may serve to detect ischemia, arrhythmias, and other conditions. Automatic, low-cost monitoring of the ECG signal could be used to provide instantaneous analysis in case of symptoms and may trigger the presentation to the emergency department. Currently, since mobile devices (smartphones, tablets) are an integral part of daily life, they could form an ideal basis for an automatic and low-cost ECG monitoring solution. In this work, we aim for a real-time classification system for arrhythmia detection that is able to run on Android-based mobile devices. Our analysis is based on 70% of the MIT-BIH Arrhythmia and on 70% of the MIT-BIH Supraventricular Arrhythmia databases. The remaining 30% are reserved for the final evaluation. We detected the R-peaks with a QRS detection algorithm and, based on the detected R-peaks, we calculated 16 features (statistical, heartbeat, and template-based). With these features and four different feature subsets we trained 8 classifiers using the Embedded Classification Software Toolbox (ECST) and compared the computational costs for each classification decision and the memory demand for each classifier. We conclude that the C4.5 classifier is best for our two-class classification problem (distinction of normal and abnormal heartbeats) with an accuracy of 91.6%. This classifier still needs a detailed feature selection evaluation. Our next steps are implementing the C4.5 classifier for Android-based mobile devices and evaluating the final system using the remaining 30% of the two used databases.
AuthorsH. Leutheuser, S. Gradl, P. Kugler, L. Anneken, M. Arnold, S. Achenbach, B. M. Eskofier
SubmittedIn Proc: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date26.08.2014
Insufficient physical activity is the 4th leading risk factor for mortality. The physical activity of a person is reflected in the walking behavior. Different methods for the calculation of an accurate step count exist, and most of them are evaluated using different walking speeds measured on a treadmill or using a small sample size of overground walking. In this paper, we introduce the BaSA (Basic Step Activities) dataset consisting of four different step activities (walking, jogging, ascending, and descending stairs) that were performed under natural conditions. We further compare two step segmentation algorithms (a simple peak detection algorithm vs. subsequence Dynamic Time Warping (sDTW)). We calculated a multivariate Analysis of Variance (ANOVA) with repeated measures followed by multiple dependent t-tests with Bonferroni correction to test for significant differences between the two algorithms. sDTW performed as well as the peak detection algorithm, but was not considerably better. In further analysis, continuous, real walking signals with transitions from one step activity to another should be considered to investigate the adaptability of these two step segmentation algorithms.
AuthorsH. Leutheuser, S. Doelfel, D. Schuldhaus, S. Reinfelder, B. M. Eskofier
SubmittedIn Proc: IEEE-EMBS 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN)
Date16.06.2014
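A minimal version of the peak-detection baseline compared in the entry above is sketched below: steps are counted as local maxima of the low-pass-filtered acceleration magnitude. The cut-off frequency, minimum peak distance and height threshold are illustrative assumptions; the subsequence DTW variant is not shown.

```python
# Simple peak-detection step counting from a triaxial accelerometer.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def count_steps(acc_xyz, fs):
    """Count steps from a (n_samples, 3) accelerometer array sampled at fs Hz."""
    magnitude = np.linalg.norm(acc_xyz, axis=1)
    b, a = butter(4, 3.0 / (fs / 2), btype="low")          # keep step-rate frequencies
    smooth = filtfilt(b, a, magnitude - magnitude.mean())
    peaks, _ = find_peaks(smooth, distance=int(0.3 * fs), height=0.2 * smooth.std())
    return len(peaks)

# Toy usage: 30 s of synthetic walking at ~2 steps/s sampled at 100 Hz.
fs = 100
t = np.arange(30 * fs) / fs
rng = np.random.default_rng(0)
acc = np.column_stack([0.1 * rng.normal(size=t.size),
                       0.1 * rng.normal(size=t.size),
                       9.81 + 2.0 * np.sin(2 * np.pi * 2.0 * t)])
print(count_steps(acc, fs), "steps detected")               # expect roughly 60
```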
Using multiple inertial sensors for energy expenditure estimation provides a useful tool for the assessment of daily life activities. Due to the high variety of new upcoming sensor types and recommendations for sensor placement to assess physiological human body function, an adaptable inertial sensor fusion-based approach is mandatory. In this paper, two inertial body sensors, consisting of a triaxial accelerometer and a triaxial gyroscope, were placed on hip and ankle. Ten subjects performed two trials of running on a treadmill under three speed levels ([3.2, 4.8, 6.4] km/h). Each sensor source was separately subjected to preprocessing, feature extraction and regression. In the final step, decision level fusion was performed by averaging the predicted results. A mean absolute error of 0.50 MET was achieved against indirect calorimetry. The system allows an easy integration of new sensors without retraining the complete system. This is an advantage over commonly used feature level fusion approaches.
AuthorsSchuldhaus, D. and Dorn, S. and Leutheuser, H. and Tallner, A. and Klucken, J. and Eskofier, B. M.
SubmittedIn Proc: 15th International Conference on Biomedical Engineering (ICBME)
Date15.06.2014
Traditionally, electroencephalography (EEG) recorded during movement has been considered too noise prone to allow for sophisticated analysis. Superimposed electromyogenic activity interferes and masks the EEG signal. Presently, computational techniques such as Independent Component Analysis allow reduction of these artifacts. However, to date, the selection of artifact-contaminated components to reject is left to the user. To automate this process and to reduce user-dependent factors, we trained a support vector machine (SVM) to assist the user in choosing the independent components (ICs) most influenced by electromyogenic artifacts. We designed and conducted a study with specific neck and body movement exercises and collected data from five human participants (35 datasets total). After preprocessing, we decomposed the data by applying the Adaptive Mixture of Independent Component Analysis (AMICA) algorithm. An expert labeled the ICs found in the EEG recordings after decomposition as either ‘myogenic activity’ or ‘non-myogenic activity’. Afterwards, the classifier was evaluated on the dataset of one participant, whose data were not used in the training phase, and obtained 93% sensitivity and 96% specificity. Our study was designed to cover a diverse selection of exercises that stimulate the musculature that most interferes in EEG recordings during movement. This selection should produce similar artifact patterns as seen in most exercises or movements. Although unfamiliar exercises could result in worse classification performance, the results are expected to be equivalent to ours. Our study showed that this tool can help EEG analysis by reliably and efficiently choosing electromyogenic artifact-contaminated components after AMICA decomposition, ultimately increasing the speed of data processing.
AuthorsF. Gabsteiger, H. Leutheuser, P. Reis, M. Lochmann, B. M. Eskofier
SubmittedIn Proc: 15th International Conference on Biomedical Engineering (ICBME)
Date15.06.2014
Introduction: The aim of this study was to provide a rationale for future validations of a priori calibrated respiratory inductance plethysmography (RIP) when used under exercise conditions. Therefore, the validity of a posteriori-adjusted gain factors and the accuracy of the resultant breath-by-breath RIP data recorded under resting and running conditions were examined. Methods: Healthy subjects, 98 men and 88 women (mean ± SD: height = 175.6 ± 8.9 cm, weight = 68.9 ± 11.1 kg, age = 27.1 ± 8.3 yr), underwent a standardized test protocol, including a period of standing still, an incremental running test on a treadmill, and multiple periods of recovery. Least squares regression was used to calculate gain factors for complete individual data sets as well as for several data subsets. In comparison with flowmeter data, the validity of RIP in breathing rate (fR) and inspiratory tidal volume (VTIN) was examined using coefficients of determination (R). Accuracy was estimated from equivalence statistics. Results: Calculated gains between different data subsets showed no equivalence. After gain adjustment for the complete individual data set, fR and VTIN between methods were highly correlated (R = 0.96 ± 0.04 and 0.91 ± 0.05, respectively) in all subjects. Under conditions of standing still, treadmill running, and recovery, 86%, 98%, and 94% (fR) and 78%, 97%, and 88% (VTIN), respectively, of all breaths were accurately measured within ± 20% limits of equivalence. Conclusion: In case of the best possible gain adjustment, RIP confidently estimates tidal volume accurately within ± 20% under exercise conditions. Our results can be used as a rationale for future validations of a priori calibration procedures.
AuthorsC. Heyde, H. Leutheuser, B. M. Eskofier, K. Roecker, A. Gollhofer
SubmittedMed Sci Sports Exerc.
Date01.03.2014
The use of pegylated interferon-α (pegIFN-α) has replaced unmodified recombinant IFN-α for the treatment of chronic viral hepatitis. While the superior antiviral efficacy of pegIFN-α is generally attributed to improved pharmacokinetic properties, the pharmacodynamic effects of pegIFN-α in the liver have not been studied. Here, we analyzed pegIFN-α–induced signaling and gene regulation in paired liver biopsies obtained prior to treatment and during the first week following pegIFN-α injection in 18 patients with chronic hepatitis C. Despite sustained high concentrations of pegIFN-α in serum, the Jak/STAT pathway was activated in hepatocytes only on the first day after pegIFN-α administration. Evaluation of liver biopsies revealed that pegIFN-α induces hundreds of genes that can be classified into four clusters based on different temporal expression profiles. In all clusters, gene transcription was mainly driven by IFN-stimulated gene factor 3 (ISGF3). Compared with conventional IFN-α therapy, pegIFN-α induced a broader spectrum of gene expression, including many genes involved in cellular immunity. IFN-induced secondary transcription factors did not result in additional waves of gene expression. Our data indicate that the superior antiviral efficacy of pegIFN-α is not the result of prolonged Jak/STAT pathway activation in hepatocytes, but rather is due to induction of additional genes that are involved in cellular immune responses.
AuthorsMichael T. Dill, Zuzanna Makowska, Gaia Trincucci, Andreas J. Gruber, Julia E. Vogt, Magdalena Filipowicz, Diego Calabrese, Ilona Krol, Daryl T. Lau, Luigi Terracciano, Erik van Nimwegen, Volker Roth and Markus H. Heim
SubmittedThe Journal of Clinical Investigation
Date23.02.2014
2013
Insufficient physical activity is the 4th leading risk factor for mortality. Methods for assessing the individual daily life activity (DLA) are of major interest in order to monitor the current health status and to provide feedback about the individual quality of life. The conventional assessment of DLAs with self-reports induces problems regarding reliability, validity, and sensitivity. The assessment of DLAs with small and lightweight wearable sensors (e.g. inertial measurement units) provides a reliable and objective method. State-of-the-art human physical activity classification systems differ in, e.g., the number and kind of sensors, the performed activities, and the sampling rate. Hence, it is difficult to compare newly proposed classification algorithms to existing approaches in the literature, and no commonly used dataset exists. We generated a publicly available benchmark dataset for the classification of DLAs. Inertial data were recorded with four sensor nodes, each consisting of a triaxial accelerometer and a triaxial gyroscope, placed on wrist, hip, chest, and ankle. Further, we developed a novel, hierarchical, multi-sensor based classification system for the distinction of a large set of DLAs. Our hierarchical classification system reached an overall mean classification rate of 89.6% and was diligently compared to existing state-of-the-art algorithms using our benchmark dataset. For future research, the dataset can be used in the evaluation process of new classification algorithms and could speed up the process of getting the best performing and most appropriate DLA classification system.
AuthorsH. Leutheuser, D. Schuldhaus, B. M. Eskofier
SubmittedPLOS ONE
Date09.10.2013
We present a Bayesian approach for estimating the relative frequencies of multi-single nucleotide polymorphism (SNP) haplotypes in populations of the malaria parasite Plasmodium falciparum by using microarray SNP data from human blood samples. Each sample comes from a malaria patient and contains one or several parasite clones that may genetically differ. Samples containing multiple parasite clones with different genetic markers pose a special challenge. The situation is comparable with a polyploid organism. The data from each blood sample indicates whether the parasites in the blood carry a mutant or a wildtype allele at various selected genomic positions. If both mutant and wildtype alleles are detected at a given position in a multiply infected sample, the data indicates the presence of both alleles, but the ratio is unknown. Thus, the data only partially reveals which specific combinations of genetic markers (i.e. haplotypes across the examined SNPs) occur in distinct parasite clones. In addition, SNP data may contain errors at non-negligible rates. We use a multinomial mixture model with partially missing observations to represent this data and a Markov chain Monte Carlo method to estimate the haplotype frequencies in a population. Our approach addresses both challenges, multiple infections and data errors.
AuthorsLeonore Wigger, Julia E. Vogt, Volker Roth
SubmittedStatistics in Medicine: 04/2013
Date19.09.2013
The fusion of inertial sensor data is heavily used for the classification of daily life activities. The knowledge about the performed daily life activities is mandatory to give physically inactive people feedback about their individual quality of life. In this paper, four inertial sensors were placed on the wrist, chest, hip and ankle of 19 subjects, who had to perform seven daily life activities. Each sensor node separately performed preprocessing, feature extraction and classification. In the final step, the classifier decisions of the sensor nodes were fused and a single activity was predicted by majority voting. The proposed classification system obtained an overall mean classification rate of 93.9 % and was robust against defective sensors. The system allows an easy integration of new sensors without retraining the complete system, which is an advantage over commonly used feature-level fusion approaches.
AuthorsSchuldhaus, D. and Leutheuser, H. and Eskofier, B. M.
SubmittedIn Proc: 8th International Conference on Body Area Networks (BodyNets)
Date01.09.2013
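The decision-level fusion step in the entry above reduces to a majority vote over the per-node class decisions for each window. The sketch below illustrates exactly that; the class names and node predictions are placeholders.

```python
# Decision-level fusion of per-sensor-node class decisions by majority voting.
from collections import Counter

def majority_vote(node_predictions):
    """Fuse per-node class labels for one window into a single activity label."""
    return Counter(node_predictions).most_common(1)[0][0]

# Example: wrist, chest, hip and ankle nodes disagree on one window.
print(majority_vote(["walking", "walking", "sitting", "walking"]))   # -> walking
```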
Electromyogenic or muscle artifacts constitute a major problem in studies involving electroencephalography (EEG) measurements. This is because the rather low signal activity of the brain is overlaid by comparably high signal activity of muscles, especially neck muscles. Hence, recording an artifact-free EEG signal during movement or physical exercise is not, to the best knowledge of the authors, feasible at the moment. Nevertheless, EEG measurements are used in a variety of different fields, such as diagnosing epilepsy and other brain-related diseases or biofeedback for athletes. Muscle artifacts can be recorded using electromyography (EMG). Various computational methods for the reduction of muscle artifacts in EEG data exist, such as the ICA algorithm InfoMax and the AMICA algorithm. However, there exists no objective measure to compare different algorithms concerning their performance on EEG data. We defined a test protocol with specific neck and body movements and measured EEG and EMG simultaneously to compare the InfoMax algorithm and the AMICA algorithm. A novel objective measure enabled the comparison of both algorithms according to their performance. Results showed that the AMICA algorithm outperformed the InfoMax algorithm. In further research, we will continue using the established objective measure to test the performance of other algorithms for the reduction of artifacts.
AuthorsH. Leutheuser, F. Gabsteiger, F. Hebenstreit, P. Reis, M. Lochmann, B. M. Eskofier
SubmittedIn Proc: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date03.07.2013
The normal oscillation of the heart rate is called Heart Rate Variability (HRV). HRV parameters change under different conditions like rest, physical exercise, mental stress, and body posture changes. However, results on how HRV parameters adapt to physical exercise have been inconsistent. This study investigated how different HRV parameters changed during one hour of running. We used datasets of 295 athletes, where each dataset had a total length of about 65 minutes. Data were divided into segments of five minutes, and three HRV parameters and one kinematic parameter were calculated for each segment. We applied two different analysis of variance (ANOVA) models to analyze the differences in the means of each segment for every parameter. The two ANOVA models were univariate ANOVA with repeated measures and multivariate ANOVA with repeated measures. The obligatory post-hoc procedure consisted of multiple dependent t-tests with Bonferroni correction. We investigated the last three segments of the parameters in more detail and detected a delay of one minute between varying running speed and measured heart rate. Hence, the circulatory system of our population needed one minute to adapt to a change in running speed. The method we provided can be used to further investigate more HRV parameters.
AuthorsH. Leutheuser, B. M. Eskofier
SubmittedInt J Comp Sci Sport
Date01.01.2013
2012
Partitioning methods for observations represented by pairwise dissimilarities are studied. Particular emphasis is put on their properties when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift invariance property which basically means that any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds true for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. This shift-invariance property essentially means that these clustering methods are “blind” against Euclidean or metric violations. From the application side, such blindness against metric violations might be seen as a highly desired feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property might be viewed as a “negative” result, since studying these algorithms will not lead to any new insights on the role of metricity in clustering problems.
AuthorsVolker Roth, Thomas J. Fuchs, Julia E. Vogt, Sandhya Prabhakaran, Joachim M. Buhmann
SubmittedSimilarity-Based Pattern Analysis and Recognition, 157-177
Date31.12.2012
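One compact way to state the shift-invariance property discussed in the entry above is the following. The cost-function normalization and notation here are a common textbook form, added for illustration and not quoted from the chapter: shifting all off-diagonal dissimilarities by a constant d_0 changes the pairwise clustering cost only by a term that does not depend on the assignment (for a fixed number of clusters K), so the optimal partition is unchanged, while a sufficiently large d_0 makes the shifted matrix embeddable as squared Euclidean distances.

```latex
\[
\tilde{D} \;=\; D + d_0\bigl(\mathbf{1}\mathbf{1}^{\top} - I\bigr),
\qquad
H(c; D) \;=\; \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i,j:\, c_i = c_j = k} D_{ij},
\qquad
H(c; \tilde{D}) \;=\; H(c; D) + \frac{d_0}{2}\,(n - K).
\]
```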
Introduction: IFN-α signals through the Jak-STAT pathway to induce expression of IFN-stimulated genes (ISGs) with antiviral functions. USP18 is an IFN-inducible negative regulator of the Jak-STAT pathway. Upregulation of USP18 results in a long-lasting desensitization of IFN-α signalling. As a result of this IFN-induced refractoriness, ISG levels decrease back to baseline despite continuous presence of the cytokine. Pegylated forms of IFN-α (pegIFN-α) are currently in clinical use for treatment of chronic hepatitis C virus infection. PegIFN-αs show increased anti-hepatitis C virus efficacy compared to nonpegylated IFN-α. This has been attributed to the significantly longer plasma half-life of the pegylated form. However, the underlying assumption that persistently high plasma levels obtained with pegIFN-α therapy result in ongoing stimulation of ISGs in the liver has never been tested. In the present study we therefore investigated the kinetics of Jak-STAT pathway activation and ISG induction in the human liver at several time points during the first week of pegIFN-α therapy. Methods: 18 patients with chronic hepatitis C underwent a liver biopsy 4 h (n = 6), 16 h, 48 h, 96 h or 144 h (all n = 3) after the first injection of pegIFN-α-2b. An additional 3 patients received pegIFN-α-2a and were biopsied at 144 h. The activation of Jak-STAT signalling and USP18 upregulation were assessed by immunohistochemistry and Western blot. Gene expression analysis was performed using Human Genome U133 Plus 2.0 arrays and Bioconductor packages of the R statistical environment. Results: A single dose of pegIFN-α-2b resulted in elevated IFN-α plasma levels throughout the one-week dosing interval. Despite the continuous IFN-α exposure, strong activation of the Jak-STAT pathway was only observed at early time points after administration. Almost 500 genes were significantly upregulated in the liver samples following pegIFN-α stimulation. The breadth of transcriptional response to pegIFN-α was maximal 16 h post-injection and decreased gradually, with only a few genes significantly upregulated after 144 h of treatment. Bayesian clustering of the gene expression data revealed 4 distinct groups of the ISGs based on the temporal patterns of regulation. Of 494 upregulated ISGs, the expression of 474 peaked 4 h or 16 h after pegIFN-α administration, followed by a steady decline of mRNA levels through the remaining 128 h of treatment. This transient activation of the Jak-STAT pathway coincided with elevated expression of USP18 on the protein level, which was first detectable 16 h post-injection. Conclusion: PegIFN-α induces a transient activation of Jak-STAT signalling and ISG upregulation in human liver, in spite of persistently high serum concentrations. The short-lived STAT1 phosphorylation and gene induction can be explained by upregulation of USP18 and establishment of a refractory state. The superior efficacy of pegIFN-α compared to conventional IFN-α for chronic hepatitis C therapy cannot be explained by persistent signalling and ISG induction during the one-week dosing interval.
AuthorsZ. Makowska, M. T. Dill, Julia E. Vogt, Magdalena Filipowicz Sinnreich, L. Terracciano, Volker Roth, M. H. Heim
SubmittedCytokine 59(3):563–564, 2012
Date11.08.2012
Archetype analysis involves the identification of representative objects from amongst a set of multivariate data such that the data can be expressed as a convex combination of these representative objects. Existing methods for archetype analysis assume a fixed number of archetypes a priori, so model selection requires multiple runs for different numbers of archetypes. Not only is this computationally infeasible for larger datasets, but model selection also becomes cumbersome in heavy-noise settings. In this paper, we present a novel extension to these existing methods with the specific focus of relaxing the need to fix the number of archetypes beforehand. Our fast iterative optimization algorithm automatically selects the right model using BIC scores and easily scales to noisy, large datasets. These benefits are achieved by introducing a Group-Lasso component popular in sparse linear regression. The usefulness of the approach is demonstrated through simulations and on a real-world application of document analysis for identifying topics.
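For orientation, the sketch below illustrates only the basic archetype model, in which each data point is approximated as a convex combination of a fixed set of archetypes; it uses a generic projected-gradient solver with simplex projection and does not implement the paper's Group-Lasso extension or BIC-based model selection.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of a vector onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def convex_weights(X, Z, lr=0.01, n_iter=2000):
    # Fit weights W (rows on the simplex) so that X is approximated by W @ Z,
    # i.e. each observation is a convex combination of the archetypes Z.
    n, k = X.shape[0], Z.shape[0]
    W = np.full((n, k), 1.0 / k)
    for _ in range(n_iter):
        grad = (W @ Z - X) @ Z.T
        W = np.apply_along_axis(project_simplex, 1, W - lr * grad)
    return W

rng = np.random.default_rng(0)
Z = rng.random((3, 2))                       # three archetypes in 2-D
W_true = rng.dirichlet(np.ones(3), size=50)  # convex mixing weights
X = W_true @ Z + 0.01 * rng.normal(size=(50, 2))

W = convex_weights(X, Z)
print(np.abs(X - W @ Z).mean())              # small reconstruction error
```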
AuthorsSandhya Prabhakaran, Sudhir Raman, Julia E. Vogt, Volker Roth
SubmittedPattern Recognition: Joint 34th DAGM and 36th OAGM Symposium, Lecture Notes in Computer Science, 2012
Date31.07.2012
The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,∞} versions have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 ≤ p ≤ ∞, and we present a unified active set algorithm. For all p-norms, a highly efficient projected gradient algorithm is presented. This new algorithm enables us to compare the prediction performance of many variants of the Group-Lasso in a multi-task learning setting, where the aim is to solve many learning problems in parallel that are coupled via the Group-Lasso constraint. We conduct large-scale experiments on synthetic data and on two real-world data sets. In accordance with theoretical characterizations of the different norms, we observe that the weak-coupling norms with p between 1.5 and 2 consistently outperform the strong-coupling norms with p ≫ 2.
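As a point of reference, here is a minimal sketch of the best-known special case, p = 2: the proximal (block soft-thresholding) operator of the l_{1,2} Group-Lasso inside a plain proximal-gradient loop. It is not the unified active-set or projected-gradient algorithm from the paper, only an illustration of how the group structure enters the optimization.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    # Proximal operator of the l_{1,2} Group-Lasso penalty: each group's block
    # is shrunk by lam and set exactly to zero if its Euclidean norm is <= lam.
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        w[g] = 0.0 if norm <= lam else w[g] * (1.0 - lam / norm)
    return w

def group_lasso_l12(X, y, groups, lam, n_iter=1000):
    # Proximal-gradient (ISTA-style) loop for least squares + l_{1,2} penalty.
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2  # inverse Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = group_soft_threshold(w - lr * grad, groups, lr * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
w_true = np.concatenate([rng.normal(size=3), np.zeros(6)])  # only the first group is active
y = X @ w_true + 0.1 * rng.normal(size=100)

w_hat = group_lasso_l12(X, y, groups, lam=0.1)
print([np.linalg.norm(w_hat[g]).round(3) for g in groups])  # inactive groups shrink to ~0
```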
AuthorsJulia E. Vogt, Volker Roth
SubmittedICML 2012: Proceedings of the 29th International Conference on Machine Learning
Date17.06.2012
2011
BACKGROUND & AIMS: The host immune response during the chronic phase of hepatitis C virus infection varies among individuals; some patients have no interferon (IFN) response in the liver, whereas others have full activation of IFN-stimulated genes (ISGs). Preactivation of this endogenous IFN system is associated with nonresponse to pegylated IFN-α (pegIFN-α) and ribavirin. Genome-wide association studies have associated allelic variants near the IL28B (IFNλ3) gene with treatment response. We investigated whether IL28B genotype determines the constitutive expression of ISGs in the liver and compared the abilities of ISG levels and IL28B genotype to predict treatment outcome. METHODS: We genotyped 109 patients with chronic hepatitis C for IL28B allelic variants and quantified the hepatic expression of ISGs and of IL28B. Decision tree ensembles, in the form of a random forest classifier, were used to calculate the relative predictive power of these different variables in a multivariate analysis. RESULTS: The minor IL28B allele was significantly associated with increased expression of ISGs. However, stratification of the patients according to treatment response revealed increased ISG expression in nonresponders, irrespective of IL28B genotype. Multivariate analysis of ISG expression, IL28B genotype, and several other factors associated with response to therapy identified ISG expression as the best predictor of treatment response. CONCLUSIONS: IL28B genotype and hepatic expression of ISGs are independent predictors of response to treatment with pegIFN-α and ribavirin in patients with chronic hepatitis C. The most accurate prediction of response was obtained with a 4-gene classifier comprising IFI27, ISG15, RSAD2, and HTATIP2.
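To make the multivariate analysis concrete, here is a minimal sketch of a random forest predicting treatment response from hepatic ISG expression and IL28B genotype. The data below are synthetic and the column names are illustrative placeholders for the four classifier genes and the genotype coding; the sketch shows the general workflow, not the study's actual data or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical per-patient table: hepatic expression of the four classifier
# genes, IL28B genotype coded as the number of minor alleles, and response.
rng = np.random.default_rng(0)
n = 109
df = pd.DataFrame({
    "IFI27": rng.normal(size=n),
    "ISG15": rng.normal(size=n),
    "RSAD2": rng.normal(size=n),
    "HTATIP2": rng.normal(size=n),
    "IL28B_minor_alleles": rng.integers(0, 3, size=n),
    "responder": rng.integers(0, 2, size=n),
})

X = df.drop(columns="responder")
y = df["responder"]

# Random forest as a multivariate predictor; feature importances indicate the
# relative predictive power of ISG expression versus IL28B genotype.
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
print(dict(zip(X.columns, clf.feature_importances_.round(3))))
print(cross_val_score(clf, X, y, cv=5).mean())
```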
AuthorsMichael T. Dill, Francois H.T. Duong, Julia E. Vogt, Stephanie Bibert, Pierre-Yves Bochud, Luigi Terracciano, Andreas Papassotiropoulos, Volker Roth, Markus H. Heim
SubmittedGastroenterology, 2011 Mar;140(3):1021-1031.e10
Date28.02.2011
2010
The l_{1,∞} norm and the l_{1,2} norm are well-known tools for joint regularization in Group-Lasso methods. While the l_{1,2} version has been studied in detail, there are still open questions regarding the uniqueness of solutions and the efficiency of algorithms for the l_{1,∞} variant. For the latter, we characterize the conditions for uniqueness of solutions, we present a simple test for uniqueness, and we derive a highly efficient active set algorithm that can deal with input dimensions in the millions. We compare both variants in the two most common application scenarios of the Group-Lasso: obtaining sparsity at the level of groups in “standard” prediction problems, and multi-task learning, where the aim is to solve many learning problems in parallel that are coupled via the Group-Lasso constraint. We show that both versions perform quite similarly in “standard” applications. However, a very clear distinction between the variants occurs in multi-task settings, where the l_{1,2} version consistently outperforms the l_{1,∞} counterpart in terms of prediction accuracy.
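The coupling behaviour of the two norms can be illustrated with a toy computation. In the sketch below, the rows of a weight matrix are groups (features) and the columns are tasks; it compares the l_{1,2} and l_{1,∞} mixed norms of two weight patterns with the same total l_1 mass, one spreading each group's weight evenly across tasks and one concentrating it on a single task. The l_{1,∞} norm favours the spread pattern much more strongly, which is the sense in which it couples tasks tightly.

```python
import numpy as np

def mixed_norm(W, p):
    # l_{1,p} mixed norm: sum over groups (rows of W) of the within-group l_p norm.
    return sum(np.linalg.norm(row, p) for row in W)

# Two weight matrices with the same total l_1 mass: one spreads each group's
# weight evenly across four tasks, the other concentrates it on a single task.
W_spread = np.full((3, 4), 0.25)
W_peaked = np.zeros((3, 4))
W_peaked[:, 0] = 1.0

for name, W in [("spread", W_spread), ("peaked", W_peaked)]:
    print(name, mixed_norm(W, 2).round(3), mixed_norm(W, np.inf).round(3))
```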
AuthorsJulia E. Vogt, Volker Roth
SubmittedPattern Recognition: 32nd DAGM Symposium, Lecture Notes in Computer Science, 2010
Date31.07.2010
We present a probabilistic model for clustering of objects represented via pairwise dissimilarities. We propose that even if an underlying vectorial representation exists, it is better to work directly with the dissimilarity matrix, thereby avoiding unnecessary bias and variance caused by embeddings. By using a Dirichlet process prior, we are not obliged to fix the number of clusters in advance. Furthermore, our clustering model is permutation-, scale-, and translation-invariant; we call it the Translation-invariant Wishart–Dirichlet (TIWD) process. A highly efficient MCMC sampling algorithm is presented. Experiments show that the TIWD process exhibits several advantages over competing approaches.
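The role of the Dirichlet process prior, which removes the need to fix the number of clusters, can be illustrated with its predictive form, the Chinese restaurant process. The minimal sketch below only draws a random partition from this prior; the Wishart likelihood over dissimilarity matrices and the MCMC sampler from the paper are not included.

```python
import numpy as np

def sample_crp_partition(n, alpha, rng):
    # Draw a partition of n objects from the Chinese restaurant process, the
    # predictive form of the Dirichlet process prior: each object joins an
    # existing cluster with probability proportional to its size, or opens a
    # new cluster with probability proportional to alpha.
    labels = np.empty(n, dtype=int)
    counts = []
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels[i] = k
    return labels

rng = np.random.default_rng(0)
print(sample_crp_partition(20, alpha=1.0, rng=rng))  # number of clusters is random
```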
AuthorsJulia E. Vogt, Sandhya Prabhakaran, Thomas J. Fuchs, Volker Roth
SubmittedICML 2010: Proceedings of the 27th International Conference on Machine Learning
Date20.06.2010