The medical data science group carries out research at the intersection of machine learning and medicine with the ultimate goal of improving diagnosis and treatment outcomes to the benefit of the care and wellbeing of patients. As medical and health data is heterogeneous and multimodal, our research deals with the advancement of machine learning models and methodologies to address the specific challenges of the medical domain. Specifically, we work in the areas of multimodal data integration, structure detection, and trustworthy (or transparent) models. The challenge lies not only in developing fast, robust and reliable systems but also in making these systems easy to interpret and usable in clinical practice.

MDS at ICLR 2024

Several members of the MDS group attended ICLR 2024. Congratulations to everyone who presented work at the main conference and workshops!

Artificial intelligence detects heart defects in newborns

Our recent paper "The Deep Learning Based Prediction of Pulmonary Hypertension in Newborns Using Echocardiograms", published together with KUNO Klinik…

Thomas and Imant defend PhD thesis in 2023

Congratulations to Thomas Sutter and Imant Daunhawer, who both successfully defended their PhD theses in 2023.

Thomas' thesis is titled "Imposing and…

Performant machine learning models are becoming increasingly complex and large. Due to their black-box design, they often have limited utility in exploratory data analysis and evoke little trust in non-expert users. Interpretable and explainable machine learning research emerges from application domains where, for technical or social reasons, interpreting or explaining the model's predictions or parameters is deemed necessary. In practice, interpretability and explainability are attained by (i) constructing models understandable to users by design and (ii) developing techniques to help explain already-trained black-box models. This thesis develops interpretable and explainable machine learning models and methods tailored to applications in biomedical and healthcare data analysis. The challenges posed by this domain require nontrivial solutions and deserve special treatment. In particular, we consider practical use cases with high-dimensional and unstructured data types, diverse application scenarios, and different stakeholder groups, which all dictate special design considerations. We demonstrate that, beyond social and ethical value, interpretability and explainability help in (i) performing exploratory data analysis, (ii) supporting medical professionals' decisions, (iii) facilitating interaction with users, and (iv) debugging the model. Our contributions are structured in two parts, tackling distinct research questions from the perspective of biomedical and healthcare applications. Firstly, we explore how to develop and incorporate inductive biases to render neural network models interpretable. Secondly, we study how to leverage explanation methods to interact with and edit already-trained black-box models. This work spans several model and method families, including interpretable neural network architectures, prototype- and concept-based models, and attribution methods. 
Our techniques are motivated by classic biomedical and healthcare problems, such as time series, survival, and medical image analysis. In addition to new model and method development, we concentrate on empirical comparison, providing proof-of-concept results on real-world biomedical benchmarks. Thus, the primary contribution of this thesis is the development of interpretable models and explanation methods with a principled treatment of specific biomedical and healthcare data types to solve application- and user-grounded problems. Through concrete use cases, we show that interpretability and explainability are context- and user-specific and, therefore, must be studied in conjunction with their application domain. We hope that our methodological and empirical contributions pave the way for future application- and user-driven interpretable and explainable machine learning research.


Ricards Marcinkevics


Doctoral thesis





Sudden cardiac death (SCD) remains a pressing health issue, affecting hundreds of thousands each year globally. The heterogeneity among SCD victims, ranging from individuals with severe heart failure to seemingly healthy individuals, poses a significant challenge for effective risk assessment. Conventional risk stratification, which primarily relies on left ventricular ejection fraction, has resulted in only modest efficacy of implantable cardioverter-defibrillators for SCD prevention. In response, artificial intelligence (AI) holds promise for personalized SCD risk prediction and tailoring preventive strategies to the unique profiles of individual patients. Machine and deep learning algorithms have the capability to learn intricate nonlinear patterns between complex data and defined end points and leverage these to identify subtle indicators and predictors of SCD that may not be apparent through traditional statistical analysis. However, despite the potential of AI to improve SCD risk stratification, there are important limitations that need to be addressed. We aim to provide an overview of the current state of the art in AI prediction models for SCD, highlight the opportunities for these models in clinical practice, and identify the key challenges hindering widespread adoption.


MZH Kolk, S Ruipérez-Campillo, AAM Wilde, RE Knops, SM Narayan, FVY Tjong


Heart Rhythm





Concept Bottleneck Models (CBMs) have emerged as a promising interpretable method whose final prediction is based on intermediate, human-understandable concepts rather than the raw input. Through time-consuming manual interventions, a user can correct wrongly predicted concept values to enhance the model's downstream performance. We propose Stochastic Concept Bottleneck Models (SCBMs), a novel approach that models concept dependencies. In SCBMs, a single-concept intervention affects all correlated concepts. Leveraging the parameterization, we derive an effective intervention strategy based on the confidence region. We show empirically on synthetic tabular and natural image datasets that our approach improves intervention effectiveness significantly. Notably, we showcase the versatility and usability of SCBMs by examining a setting with CLIP-inferred concepts, alleviating the need for manual concept annotations.
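To make concrete how a single-concept intervention can propagate to correlated concepts, here is a minimal Python sketch. It assumes, purely for illustration, that concept logits follow a multivariate normal distribution and that an intervention conditions this distribution on the corrected concept. The abstract does not spell out the parameterization, and the function names (`sigmoid`, `intervene`) and all numbers are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def intervene(mean, cov, j, value):
    """Condition a multivariate normal over concept logits on logit j == value.

    Uses the standard Gaussian conditioning formula for a single observed
    coordinate: mu_i' = mu_i + cov[i][j] / cov[j][j] * (value - mu_j).
    Returns the updated mean logits; entry j is set to the intervened value.
    """
    var_j = cov[j][j]
    updated = []
    for i, mu_i in enumerate(mean):
        if i == j:
            updated.append(value)
        else:
            updated.append(mu_i + cov[i][j] / var_j * (value - mean[j]))
    return updated


# Hypothetical example: concepts 0 and 1 are correlated, concept 2 is independent.
mean = [0.0, 0.0, 0.0]
cov = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# A user corrects concept 0 to a confident "present" (logit 2.0).
updated = intervene(mean, cov, j=0, value=2.0)
probabilities = [sigmoid(logit) for logit in updated]
print(updated)
print(probabilities)
```

In this toy setup, correcting concept 0 also raises the predicted probability of the correlated concept 1, while the uncorrelated concept 2 stays at its prior value.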


Moritz Vandenhirtz*, Sonia Laguna*, Ricards Marcinkevics, Julia E. Vogt
* denotes shared first authorship


ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, Workshop on Models of Human Feedback for AI Alignment, and Workshop on Humans, Algorithmic Decision-Making and Society





Background: Segmenting computed tomography (CT) is crucial in various clinical applications, such as tailoring personalized cardiac ablation for managing cardiac arrhythmias. Automating segmentation through machine learning (ML) is hindered by the necessity for large, labeled training data, which can be challenging to obtain. This article proposes a novel approach for automated, robust labeling using domain knowledge to achieve high-performance segmentation by ML from a small training set. The approach, the domain knowledge-encoding (DOKEN) algorithm, reduces the reliance on large training datasets by encoding cardiac geometry while automatically labeling the training set. The method was validated in a hold-out dataset of CT results from an atrial fibrillation (AF) ablation study.
Methods: The DOKEN algorithm parses left atrial (LA) structures, extracts “anatomical knowledge” by leveraging digital LA models (available publicly), and then applies this knowledge to achieve high ML segmentation performance with a small number of training samples. The DOKEN-labeled training set was used to train an nnU-Net deep neural network (DNN) model for segmenting cardiac CT in N = 20 patients. Subsequently, the method was tested in a hold-out set with N = 100 patients (five times larger than the training set) who underwent AF ablation.
Results: The DOKEN algorithm integrated with the nnU-Net model achieved high segmentation performance with few training samples, with a training-to-test ratio of 1:5. The Dice score of the DOKEN-enhanced model was 96.7% (IQR: 95.3% to 97.7%), with a median error in surface distance of boundaries of 1.51 mm (IQR: 0.72 to 3.12) and a mean centroid–boundary distance of 1.16 mm (95% CI: −4.57 to 6.89), similar to expert results (r = 0.99; p < 0.001). In digital hearts, the novel DOKEN approach segmented the LA structures with a mean difference for the centroid–boundary distances of −0.27 mm (95% CI: −3.87 to 3.33; r = 0.99; p < 0.0001).
Conclusions: The proposed novel domain knowledge-encoding algorithm was able to perform the segmentation of six substructures of the LA, reducing the need for large training datasets. The combination of domain knowledge encoding and a machine learning approach could reduce the dependence of ML on large training datasets and could potentially be applied to AF ablation procedures and extended in the future to other imaging, 3D printing, and data science applications.
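The Dice score reported in this abstract measures the overlap between a predicted segmentation mask and a reference mask. As a minimal sketch of how it is commonly computed, the following Python function works on flattened binary masks. The function name `dice_score` and the toy masks are illustrative assumptions and are not taken from the study.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Count foreground voxels in each mask separately.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty; treat this as perfect agreement by convention.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened masks (1 = foreground, 0 = background).
predicted_mask = [1, 1, 1, 0, 0, 0]
reference_mask = [0, 1, 1, 1, 0, 0]

dice = dice_score(predicted_mask, reference_mask)
print(f"Dice: {dice:.3f}")
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground voxels.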


P Ganesan*, R Feng*, B Deb, FVY Tjong, AJ Rogers, S Ruipérez-Campillo, S Somani, Paul Clopton, T Baykaner, M Rodrigo, J Zou, F Haddad, M Zaharia, SM Narayan
* denotes shared first authorship







The efficacy of an implantable cardioverter-defibrillator (ICD) in patients with a non-ischaemic cardiomyopathy for primary prevention of sudden cardiac death is increasingly debated. We developed a multimodal deep learning model for arrhythmic risk prediction that integrated late gadolinium-enhanced (LGE) cardiac magnetic resonance imaging (MRI), electrocardiography (ECG) and clinical data. Short-axis LGE-MRI scans and 12-lead ECGs were retrospectively collected from a cohort of 289 patients prior to ICD implantation, across two tertiary hospitals. A residual variational autoencoder was developed to extract physiological features from LGE-MRI and ECG, which were then used as inputs for a machine learning model (DEEP RISK) to predict malignant ventricular arrhythmia onset. In the validation cohort, the multimodal DEEP RISK model predicted malignant ventricular arrhythmias with an area under the receiver operating characteristic curve (AUROC) of 0.84 (95% confidence interval (CI) 0.71–0.96), a sensitivity of 0.98 (95% CI 0.75–1.00) and a specificity of 0.73 (95% CI 0.58–0.97). The models trained on individual modalities exhibited lower AUROC values compared to DEEP RISK [MRI branch: 0.80 (95% CI 0.65–0.94), ECG branch: 0.54 (95% CI 0.26–0.82), Clinical branch: 0.64 (95% CI 0.39–0.87)]. These results suggest that a multimodal model achieves high prognostic accuracy in predicting ventricular arrhythmias in a cohort of patients with non-ischaemic systolic heart failure, using data collected prior to ICD implantation.
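For readers less familiar with the metrics quoted above, the following Python sketch shows one common way to compute sensitivity, specificity and AUROC from binary labels and predicted risk scores. It is not the evaluation code of the study. The function names, the threshold of 0.5 and the toy data are assumptions made for illustration only.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = arrhythmia occurred, 0 = no event.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = auroc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={auc:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the abstract reports all three.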


MZH Kolk, S Ruipérez-Campillo, CP Allaart, AAM Wilde, RE Knops, SM Narayan, FVY Tjong


Nature Scientific Reports

