The medical data science group carries out research at the intersection of machine learning and medicine, with the ultimate goal of improving diagnosis and treatment outcomes for the benefit of patients' care and wellbeing. Because medical and health data are heterogeneous and multimodal, our research advances machine learning models and methodologies to address the specific challenges of the medical domain. Specifically, we work in the areas of multimodal data integration, structure detection, and trustworthy (or transparent) models. The challenge lies not only in developing fast, robust, and reliable systems but also in making those systems easy to interpret and usable in clinical practice.


Congratulations to Samuel Ruiperez-Campillo on receiving the Best Oral Presentation Award from the European Society of Cardiology at the Digital…


Congratulations to Ricards Marcinkevics on receiving the 2025 ABB Research Prize, which was presented at the 2025 ETH Day, for his doctoral thesis


The Department of Computer Science (D-INFK) at ETH Zurich has published a new historical timeline documenting the development of its women’s promotion…


Abstract

Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or fine-tuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited for enabling techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce Transformer Optimization using Adaptive and Simple Transformations (TOAST), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformations or even the identity function, without any additional training. Across state-of-the-art pretrained vision models (e.g., ViT, DINOv2, DeiT) and datasets ranging from MNIST to ImageNet-1k, TOAST reduces parameters and computation while preserving, and in some cases improving, downstream performance. These results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
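As a rough illustration of what a closed-form linear replacement for a block could look like, the sketch below fits an affine map from a block's input representations to its output representations by ordinary least squares. This is not the authors' implementation: the abstract does not specify how TOAST estimates its mappings, and the function names, the toy data, and the use of a bias term are all assumptions made for this example.

```python
import numpy as np


def fit_linear_map(x_in, x_out):
    """Least-squares fit of W, b so that x_in @ W + b approximates x_out.

    x_in, x_out: arrays of shape (n_samples, dim), standing in for the
    representations entering and leaving a transformer block (assumption).
    """
    ones = np.ones((x_in.shape[0], 1))
    x_aug = np.hstack([x_in, ones])          # append a column for the bias
    coef, *_ = np.linalg.lstsq(x_aug, x_out, rcond=None)
    return coef[:-1], coef[-1]               # W has shape (dim, dim), b has shape (dim,)


def apply_linear_map(x, W, b):
    """Cheap stand-in for the block: a single affine transformation."""
    return x @ W + b


# Toy data (hypothetical): a "block" that is exactly affine and close to identity.
rng = np.random.default_rng(0)
dim = 8
x_in = rng.normal(size=(256, dim))
true_W = np.eye(dim) + 0.05 * rng.normal(size=(dim, dim))
x_out = x_in @ true_W + 0.1

W, b = fit_linear_map(x_in, x_out)
approx = apply_linear_map(x_in, W, b)
rel_err = np.linalg.norm(approx - x_out) / np.linalg.norm(x_out)
print(f"relative approximation error: {rel_err:.2e}")
```

On real activations the fit would not be exact, and the relative error would presumably be one signal for deciding whether a block is redundant enough to replace.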

Authors

Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck, Julia E. Vogt
denotes shared last authorship

Submitted

Transactions on Machine Learning Research (TMLR)

Date

13.05.2026

Link

Code

Abstract

Background: Gait impairment is a hallmark motor deficit of Parkinson’s disease (PD) and represents an important, yet insufficiently understood, target of subthalamic deep brain stimulation (DBS). Although DBS can improve several motor symptoms, identifying robust and physiologically meaningful gait biomarkers that capture both disease-related deficits and stimulation-induced improvements remains a major challenge. In particular, conventional mean-based gait metrics often fail to fully characterize pathological gait or treatment responsiveness.
Methods: We analyzed 35 spatiotemporal gait parameters obtained during continuous walking from individuals with PD assessed before and after subthalamic DBS, alongside age-matched healthy controls. Multiple machine learning classifiers were evaluated to discriminate between groups, with extreme gradient boosting (XGBoost) achieving the best performance. To enhance interpretability and reduce redundancy among correlated parameters, grouped SHapley Additive exPlanations (SHAP) were applied to rank feature importance and guide feature selection.
Results: Feature selection consistently highlighted step width variability, step width asymmetry, bilateral interlimb coordination, and the anteroposterior margin of stability as the most discriminative parameters. A compact set of five overlapping features after selection not only reliably distinguished PD gait from healthy controls but also demonstrated a shift toward healthy ranges following DBS. Importantly, these selected features outperformed conventional mean-based metrics in capturing both pathological gait characteristics and treatment-related changes.
Discussion: Our findings demonstrate that explainable artificial intelligence approaches can identify physiologically grounded gait features that may serve as candidate markers of both PD severity and DBS responsiveness. By emphasizing variability,

Authors

Zhongke Mei, Alain Ryser, Gianluca Amprimo, Jinhao Wang, Julia Vogt, Deepak K Ravi

Submitted

Journal of NeuroEngineering and Rehabilitation

Date

27.04.2026

Link

DOI

Abstract

Reducing electrophysiological (EP) signal noise is essential for diagnosis, mapping, and ablation procedures in patients with arrhythmias or conditions such as cardiomyopathies. However, traditional approaches have been suboptimal due to the varied sources of noise. We hypothesized that variational autoencoders (VAEs) can learn key components of ‘clean’ electrophysiological signals by creating robust internal representations, thereby enabling automatic denoising of diverse noise in clinical recordings. We set out to apply a β-VAE model to a dataset of 5706 intra-ventricular monophasic action potential (MAP) signals, selected because their morphology is verifiable and measurable against a reference, from 42 patients with ischemic cardiomyopathy at risk for sudden death. We designed a noise library and implemented baselines based on state-of-the-art clinical filtering techniques. The proposed β-VAE model was assessed for various noise types, including challenging non-stationary real EP noise. Comprehensive evaluation using general metrics and clinical action potential duration labels by domain experts revealed that our β-VAE outperformed current state-of-the-art filters in denoising efficacy, with key physiological information encoded in the reconstruction. We performed a sensitivity analysis that confirmed the robustness of the β-VAE model to increasing noise levels. These results demonstrate the ability of our model to remove noise from various sources, including those of time-varying nature. The application to well-studied MAPs verifies that clinically meaningful features were reconstructed in the EP context. This work enhances traditional signal processing approaches to ensure ‘clean’ electrical signals, and may have promising applications for diagnosis, tracking therapy and prognostication in patients with EP disorders in real-world clinical environments.
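For readers unfamiliar with the β-VAE objective named in the abstract above, the following sketch computes the standard loss: a reconstruction term plus β times the KL divergence between a diagonal Gaussian posterior and a standard normal prior. It is a generic textbook formulation, not the authors' code; the squared-error reconstruction term, the value of `beta`, and the array shapes are assumptions, since the abstract does not state them.

```python
import numpy as np


def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Generic beta-VAE loss averaged over a batch.

    x, x_recon: (batch, signal_length) original and reconstructed signals.
    mu, logvar: (batch, latent_dim) parameters of the Gaussian posterior.
    beta:       weight on the KL term (hypothetical value, not from the paper).
    """
    # Squared-error reconstruction term per sample (assumed likelihood).
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ) per sample.
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)
    return np.mean(recon + beta * kl), np.mean(recon), np.mean(kl)


# Toy check with made-up values: a perfect reconstruction and a posterior
# equal to the prior give zero loss.
x = np.ones((2, 5))
total, recon, kl = beta_vae_loss(x, x.copy(), np.zeros((2, 3)), np.zeros((2, 3)))
print(total, recon, kl)
```

In a denoising setup, `x_recon` would typically be the decoder output for a noisy input and `x` the clean reference, so that the model learns to reconstruct the clean signal.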

Authors

Samuel Ruipérez-Campillo, Alain Ryser, Thomas M Sutter, Brototo Deb, Ruibin Feng, Prasanth Ganesan, Kelly A Brennan, Albert J Rogers, Maarten ZH Kolk, Fleur VY Tjong, Sanjiv M Narayan, Julia E Vogt
denotes shared last authorship

Submitted

Expert Systems with Applications

Date

01.03.2026

Link

DOI

Abstract

BACKGROUND: Mapping of heart rhythms is influenced by the size and configuration of the mapping electrodes. Whether a recorded electrogram represents near (local) or remote activity influences diagnosis and treatment, yet is affected by mapping characteristics that are often undefined. METHODS: We developed biophysical computational models to predict interactions between the recording tool and cardiac tissue in coherent and disorganized rhythms, which we validated in clinical recordings. RESULTS: Biophysical computational models demonstrated the ability to quantify and visualize the recording antennae for different electrode configurations. Our results show that unipolar electrograms reflected a recording antenna within 3-dimensional ellipsoids of radius 8 mm across-tissue and 2.7 mm transmurally. Bipolar electrogram antennae align with propagation direction in ellipsoids of long axis radius 1.7, 5.7, and 8.3 mm for 2, 5, and 10 mm spacing, respectively, and often extend beyond the physical extent of electrodes. Notably, omnipolar electrograms, constructed from orthogonal bipoles in a triangular configuration, retained some directional preferences of bipolar electrograms, with a complex relationship between electrode orientation and wave direction. When tested clinically on high-resolution, narrow field (grid) catheters and moderate-to-low resolution, global (basket) catheters, antennae varied more with electrode type (correlation coefficient of 0.43 unipolar, 0.05 bipolar, and 0.26 omnipolar; P<0.001) and spacing (correlation coefficient of 0.36 versus 0.42; P=0.002) than the precise electrode size. CONCLUSIONS: This novel computational-clinical system approach enabled us to systematically compare electrode configurations. This work may help interpret signals in complex biological rhythms, such as atrial fibrillation, and may influence the design of novel catheter configurations and signal processing approaches to identify local tissue signals.

Authors

Miguel Rodrigo, Samuel Ruipérez-Campillo, Prasanth Ganesan, Ruibin Feng, Sanjiv M. Narayan

Submitted

Circulation: Arrhythmia and Electrophysiology

Date

26.02.2026

Link

DOI

Abstract

Recent advances in vision–language models (VLMs) have improved Chest X-ray (CXR) interpretation in multiple aspects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), which optimizes next-token prediction without evaluating answer quality. In contrast, reinforcement learning (RL) can incorporate task-specific feedback, and its combination with explicit intermediate reasoning (“thinking”) has demonstrated substantial gains on verifiable math and coding tasks. To investigate the effects of RL and thinking in a CXR VLM, we perform large-scale SFT on CXR data to build an updated RadVLM based on Qwen3-VL, followed by a cold-start SFT stage that equips the model with basic thinking ability. We then apply Group Relative Policy Optimization (GRPO) with clinically grounded, task-specific rewards for report generation and visual grounding, and run matched RL experiments on both domain-specific and general-domain Qwen3-VL variants, with and without thinking. Across these settings, we find that while strong SFT remains crucial for high base performance, RL provides additional gains on both tasks, whereas explicit thinking does not appear to further improve results. Under a unified evaluation pipeline, the RL-optimized RadVLM models outperform their baseline counterparts and reach state-of-the-art performance on both report generation and grounding, highlighting clinically aligned RL as a powerful complement to SFT for medical VLMs.
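The core idea behind GRPO, as it is commonly described, is to score each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that group-relative advantage step, normalizing rewards by the group mean and standard deviation. The reward values are invented for illustration, the clinically grounded reward functions from the paper are not reproduced, and the clipped policy-gradient and KL terms of the full objective are omitted.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group of responses to the same prompt.

    rewards: array-like of shape (n_prompts, group_size).
    eps:     small constant to avoid division by zero (assumption).
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)


# Hypothetical rewards for two prompts with four sampled responses each.
rewards = [
    [0.2, 0.8, 0.5, 0.5],   # mixed quality: better responses get positive advantage
    [1.0, 1.0, 1.0, 1.0],   # identical rewards: no learning signal for this group
]
adv = group_relative_advantages(rewards)
print(adv)
```

Because advantages are computed within each group, a prompt whose responses all receive the same reward contributes no gradient signal in this formulation.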

Authors

Benjamin Gundersen, Nicolas Deperrois, Samuel Ruipérez-Campillo, Thomas M. Sutter, Julia E. Vogt, Michael Moor, Farhad Nooralahzadeh, Michael Krauthammer

Submitted

MIDL 2026

Date

14.02.2026

Link