The medical data science group carries out research at the intersection of machine learning and medicine, with the ultimate goal of improving diagnosis and treatment outcomes for the care and wellbeing of patients. Because medical and health data are heterogeneous and multimodal, our research advances machine learning models and methodologies that address the specific challenges of the medical domain. Specifically, we work in the areas of multimodal data integration, structure detection, and trustworthy (or transparent) models. The challenge lies not only in developing fast, robust, and reliable systems, but also in making those systems interpretable and usable in clinical practice.
New Timeline Documents 30+ Years of Promoting Women in Computer Science at D-INFK
The Department of Computer Science (D-INFK) at ETH Zurich has published a new historical timeline documenting the development of its efforts to promote women…
Dr Ece Özkan Elsen appointed as BRCCH Professor of Paediatric Digital Health Data Analysis
We are excited to announce that Dr. Ece Ozkan Elsen, currently an Established Researcher in our group, will be transitioning to her new role as…
MDS at NeurIPS 2024
Several members of the MDS group attended NeurIPS 2024. Congratulations to everyone who presented work at the main conference and workshops!
Background: In infants, pulmonary hypertension (PH) increases morbidity and mortality. Echocardiography, though standard, is time- and expertise-intensive. We propose a deep learning approach for automated PH detection from standard echocardiography videos, validated against the systolic eccentricity index (EIs).
Methods: The training and validation set comprised 975 videos and the held-out set 378 videos, covering five echocardiographic standard views from infants aged 3–90 days, recorded in 2018–2021 and 2021–2022, respectively. Echocardiograms were labeled as PH (EIs < 0.82) or healthy (EIs ≥ 0.87). After preprocessing and random segmentation of all videos into 13,530 frames, spatial and spatio-temporal convolutional neural network architectures were trained to predict PH, with gradient-weighted class activation mapping used for explainability.
Results: The best single-view performance was achieved with the parasternal short-axis view (AUROC spatial and spatio-temporal: 0.91 and 0.94 in the validation set; 0.93 and 0.88 in the held-out set, respectively). Combining three standard views improved performance, with AUROC 0.96 in the validation set (spatio-temporal) and 0.90 in the held-out set (spatial). Saliency maps revealed model focus on clinically relevant regions, including the interventricular septum and left atrial filling.
Conclusions: The presented deep learning model for automated detection of PH in neonates shows high accuracy, explainability, and reproducibility.
Authors: Holger Michel, Ece Ozkan, Kieran Chin-Cheong, Anna Badura, Verena Lehnerer, Stephan Gerling, Julia E. Vogt, Sven Wellmann
Submitted: Pediatric Research
Date: 24.09.2025
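The abstract above pairs spatial and spatio-temporal convolutional networks over echocardiography frames. As a rough illustration of the spatio-temporal variant, here is a minimal PyTorch sketch of a 3D CNN mapping a short video clip to a single PH logit; the architecture, clip shape, and layer sizes are illustrative assumptions, not the authors' exact model.

```python
# Hedged sketch: a tiny spatio-temporal (3D) CNN for binary PH
# classification from echocardiography clips. All sizes are assumptions.
import torch
import torch.nn as nn

class SpatioTemporalPHNet(nn.Module):
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions mix spatial and temporal information.
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pooling over time and space
        )
        self.classifier = nn.Linear(32, 1)  # single logit: PH vs. healthy

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = SpatioTemporalPHNet()
clip = torch.randn(2, 1, 16, 112, 112)   # two 16-frame grayscale clips
probs = torch.sigmoid(model(clip))       # predicted PH probabilities
```

Grad-CAM-style saliency maps, as used in the paper, could then be computed from the activations and gradients of the last convolutional layer.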
Anomaly detection focuses on identifying samples that deviate from the norm. Discovering informative representations of normal samples is crucial to detecting anomalies effectively. Recent self-supervised methods have successfully learned such representations by employing prior knowledge about anomalies to create synthetic outliers during training. However, we often do not know what to expect from unseen data in specialized real-world applications. In this work, we address this limitation with our new approach, Con2, which leverages prior knowledge about symmetries in normal samples to observe the data in different contexts. Con2 consists of two parts: Context Contrasting clusters representations according to their context, while Content Alignment encourages the model to capture semantic information by aligning the positions of normal samples across clusters. The resulting representation space allows us to detect anomalies as outliers of the learned context clusters. We demonstrate the benefit of this approach in extensive experiments on specialized medical datasets, outperforming strong baselines based on self-supervised learning and pretrained models, while remaining competitive on natural imaging benchmarks.
Authors: Alain Ryser, Thomas M. Sutter, Alexander Marx, Julia E. Vogt
Submitted: Transactions on Machine Learning Research
Date: 16.09.2025
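As a rough illustration of the two objectives named in the abstract, the sketch below computes a Context Contrasting term (keeping two context clusters apart) and a Content Alignment term (matching each sample's relative position across clusters) on a batch of embeddings. The choice of exactly two contexts, the hinge margin, and the cluster-mean formulation are illustrative assumptions, not the paper's exact losses.

```python
# Hedged sketch of Con2-style objectives on embeddings of the same
# samples observed in two contexts (e.g., original vs. flipped view).
import torch
import torch.nn.functional as F

def con2_losses(z_a: torch.Tensor, z_b: torch.Tensor):
    """z_a, z_b: embeddings of the same normal samples in two contexts,
    shape (batch, dim)."""
    mu_a, mu_b = z_a.mean(dim=0), z_b.mean(dim=0)
    # Context Contrasting: push the two context cluster centers apart
    # (the hinge keeps the loss bounded).
    contrast = F.relu(1.0 - (mu_a - mu_b).norm())
    # Content Alignment: the same sample should occupy the same relative
    # position within each context cluster.
    align = F.mse_loss(z_a - mu_a, z_b - mu_b)
    return contrast, align

z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
contrast, align = con2_losses(z_a, z_b)
loss = contrast + align  # anomalies are later scored as outliers of the clusters
```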
General movements (GMs) are spontaneous, coordinated body movements in infants that offer valuable insights into the developing nervous system. Assessed through the Prechtl GM Assessment (GMA), GMs are reliable predictors of neurodevelopmental disorders. However, GMA requires specifically trained clinicians, who are limited in number. To scale up newborn screening, there is a need for an algorithm that can automatically classify GMs from infant video recordings. These recordings pose challenges, including variability in recording length, device type, and setting, with each video only coarsely annotated for overall movement quality. In this work, we introduce a tool for extracting features from these recordings and explore various machine learning techniques for automated GM classification.
Authors: Daphne Chopard*, Sonia Laguna*, Kieran Chin-Cheong*, Annika Dietz, Anna Badura, Sven Wellmann, Julia E. Vogt (* denotes shared first authorship)
Submitted: Proceedings of Machine Learning Research – Machine Learning for Healthcare 2025; an earlier version appeared at the ICLR 2025 Workshop AI4CHL (Best Paper Award, Oral)
Date: 15.08.2025
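To make the classification setting concrete, here is a hedged sketch of one way to obtain length-invariant descriptors from variable-length infant videos using a pretrained image backbone, then classify overall movement quality; the ResNet-18 backbone and mean-pooling aggregation are illustrative assumptions, not the tool described in the paper.

```python
# Hedged sketch: frame-level features from a pretrained backbone,
# mean-pooled so videos of different lengths map to fixed-size inputs.
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # expose 512-d frame features
backbone.eval()

@torch.no_grad()
def video_features(frames: torch.Tensor) -> torch.Tensor:
    # frames: (num_frames, 3, 224, 224); videos may differ in length.
    feats = backbone(frames)   # (num_frames, 512)
    return feats.mean(dim=0)   # length-invariant clip descriptor

clf = torch.nn.Linear(512, 1)            # binary movement-quality label
video = torch.randn(30, 3, 224, 224)     # stand-in for a preprocessed recording
logit = clf(video_features(video))
```

Mean pooling discards temporal ordering; sequence models or temporal pooling schemes would be natural alternatives for the "various machine learning techniques" the abstract mentions.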
Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and to baselines trained without leveraging this structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.
Authors: Andrea Agostini*, Sonia Laguna*, Alain Ryser*, Samuel Ruiperez-Campillo*, Moritz Vandenhirtz, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter†, Julia E. Vogt† (* denotes shared first authorship; † denotes shared last authorship)
Submitted: International Conference on Machine Learning (ICML) 2025 Workshop on FM4LS
Date: 15.07.2025
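The abstract above combines masked per-view reconstruction with latent alignment of paired views. A toy PyTorch sketch of those two training signals follows; the tiny MLP autoencoder, flattened 64x64 inputs, pixel-level masking, and equal loss weighting are illustrative assumptions rather than the paper's architecture.

```python
# Hedged sketch: reconstruct each view from a sparse (masked) input while
# aligning the latent embeddings of paired frontal/lateral views.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViewAutoencoder(nn.Module):
    def __init__(self, dim: int = 64 * 64, latent: int = 128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x, mask):
        z = self.enc(x * mask)  # encode only the visible portion
        return self.dec(z), z

model = TinyViewAutoencoder()
frontal = torch.randn(8, 64 * 64)  # flattened frontal views
lateral = torch.randn(8, 64 * 64)  # flattened lateral views
mask = (torch.rand(8, 64 * 64) > 0.75).float()  # keep ~25% of the input

rec_f, z_f = model(frontal, mask)
rec_l, z_l = model(lateral, mask)
loss = (F.mse_loss(rec_f, frontal)   # reconstruct each view...
        + F.mse_loss(rec_l, lateral)
        + F.mse_loss(z_f, z_l))      # ...while aligning latent embeddings
loss.backward()
```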
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning through selective concept annotation. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. To make this framework efficient in low-supervision settings, we formalize an active learning strategy that dynamically acquires the most informative concept labels. We propose an acquisition function based on Expected Information Gain and show that it significantly accelerates concept learning without compromising preference accuracy. Evaluated on the UltraFeedback dataset, our method outperforms baselines in interpretability and sample efficiency, marking a step towards more transparent, auditable, and human-aligned reward models.
Authors: Sonia Laguna, Katarzyna Kobalczyk, Julia E. Vogt, Mihaela van der Schaar
Submitted: International Conference on Machine Learning (ICML) 2025 Workshop on PRAL
Date: 12.07.2025
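As a rough illustration of the acquisition step, the sketch below scores candidate concept annotations by a BALD-style mutual-information estimate of Expected Information Gain over an ensemble of concept predictors; the ensemble approximation and the Bernoulli concept model are assumptions, not the paper's exact formulation.

```python
# Hedged sketch: Expected-Information-Gain scores for candidate concept
# labels, estimated as predictive-entropy reduction over an ensemble.
import torch

def bernoulli_entropy(p: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    return -(p * (p + eps).log() + (1 - p) * (1 - p + eps).log())

def eig_scores(concept_probs: torch.Tensor) -> torch.Tensor:
    """concept_probs: (ensemble, candidates, concepts) predicted concept
    probabilities. Returns one EIG score per candidate concept label."""
    mean_p = concept_probs.mean(dim=0)                    # marginal prediction
    total = bernoulli_entropy(mean_p)                     # entropy of the mean
    expected = bernoulli_entropy(concept_probs).mean(0)   # mean member entropy
    return total - expected                               # mutual information

probs = torch.rand(5, 100, 6)   # 5 ensemble members, 100 pairs, 6 concepts
scores = eig_scores(probs)      # shape (100, 6)
best = scores.flatten().argmax()  # index of the most informative label to query
```

High scores mark labels on which the ensemble members disagree despite each being individually confident, which is where a new concept annotation is expected to be most informative.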