The medical data science group carries out research at the intersection of machine learning and medicine, with the ultimate goal of improving diagnosis and treatment outcomes for the care and wellbeing of patients. As medical and health data are heterogeneous and multimodal, our research advances machine learning models and methodologies that address the specific challenges of the medical domain. Specifically, we work in the areas of multimodal data integration, structure detection, and trustworthy (or transparent) models. The challenge lies not only in developing fast, robust, and reliable systems, but also in developing systems that are easy to interpret and usable in clinical practice.
Samuel Ruiperez-Campillo receives awards from the European Society of Cardiology, the American Heart Association, and Computing in Cardiology in 2025.
Congratulations to Samuel Ruiperez-Campillo on receiving the Best Oral Presentation Award from the European Society of Cardiology at the Digital…
Ricards Marcinkevics receives ABB Research Prize
Congratulations to Ricards Marcinkevics on receiving the 2025 ABB Research Prize, which was presented at the 2025 ETH Day, for his doctoral thesis …
New Timeline Documents 30+ Years of Promoting Women in Computer Science at D-INFK
The Department of Computer Science (D-INFK) at ETH Zurich has published a new historical timeline documenting the development of its women’s promotion…
Cardiac electrophysiology (EP) time series are frequently degraded by overlapping artifacts from various sources that obscure physiologically relevant morphologies. This is particularly limiting for intracardiac electrograms, an opaque yet potentially informative modality, and remains relevant for large-scale denoising of surface electrocardiograms (ECGs). We cast denoising as conditional generation using conditional denoising diffusion probabilistic models (cDDPMs). We further introduce an antithetic-variable (AV) sampling approach that couples reverse diffusion trajectories via additive-inverse Gaussian transition noise at each step. This variance-reduction scheme improves reconstruction and stabilizes uncertainty estimates without increasing the number of reverse-chain evaluations. We evaluate on a ventricular electrogram cohort of over 6000 curated samples from over 50 patients, as well as on the QTDB + Noise Stress Test ECG benchmark. The proposed AV-cDDPM suppresses baseline wander, powerline contamination, spikes, and EP-derived residual artifacts while preserving depolarization–repolarization morphology in monophasic action potentials (MAPs) and ECG waveforms. Additionally, AV-cDDPM achieves state-of-the-art reconstruction on MAPs (root mean square error 3.32 × 10⁻³, Pearson correlation coefficient 0.978) and improves ECG denoising (cosine similarity 0.926) relative to crude Monte Carlo sampling and competitive baselines. Importantly, MAP denoising translates into improved recovery of repolarization markers and into time-resolved uncertainty maps aligned with physiological transitions, with direct potential for clinical deployment. Overall, variance-reduced conditional diffusion enables uncertainty-aware, compute-efficient denoising of both therapeutically relevant intracardiac signals and ubiquitous ECGs, supporting clinical reliability with a reduced manual curation burden.
Authors: Samuel Ruipérez-Campillo, Pablo Blasco-Fernández, Moritz Rau, Prasanth Ganesan, Sabyasachi Bandyopadhyay, Charles Sillett, Lukas P Arts, Esteban Peralta, Albert J Rogers, Fleur VY Tjong, Sanjiv M Narayan, Julia E Vogt
Submitted: IEEE Journal of Biomedical and Health Informatics
Date: 22.06.2026
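The antithetic-variable coupling described in the abstract above can be illustrated with a minimal sketch. It assumes a generic reverse step of the form x_{t-1} = mu(x_t, t) + sigma_t z; the names `mean_fn`, `antithetic_pair`, and `av_denoise` are hypothetical, the conditioning (noisy) signal is assumed to be captured inside `mean_fn`, and both chains are assumed to start from the same x_T. This is not the authors' implementation; it only shows two chains sharing one Gaussian draw per step, with the second chain receiving its additive inverse.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: mean_fn(x_t, t) returns the reverse-step mean mu(x_t, t).
# In a conditional model the noisy input signal would be captured inside mean_fn.
MeanFn = Callable[[Vector, int], Vector]


def antithetic_pair(x_T: Sequence[float], mean_fn: MeanFn,
                    sigmas: Sequence[float], rng: random.Random) -> Tuple[Vector, Vector]:
    """Run two reverse chains from the same x_T (an assumption).

    At every step both chains use the same Gaussian draw z: chain A adds +sigma_t * z,
    chain B adds -sigma_t * z (additive-inverse transition noise).
    """
    x_a, x_b = list(x_T), list(x_T)
    for t in reversed(range(len(sigmas))):
        z = [rng.gauss(0.0, 1.0) for _ in x_a]  # one draw shared by both chains
        mu_a, mu_b = mean_fn(x_a, t), mean_fn(x_b, t)
        x_a = [m + sigmas[t] * zi for m, zi in zip(mu_a, z)]
        x_b = [m - sigmas[t] * zi for m, zi in zip(mu_b, z)]
    return x_a, x_b


def av_denoise(x_T_list: Sequence[Sequence[float]], mean_fn: MeanFn,
               sigmas: Sequence[float], rng: random.Random) -> Tuple[Vector, Vector]:
    """Pool all antithetic samples; return the point-wise mean and standard deviation.

    The mean is the denoised estimate; the standard deviation is a simple
    time-resolved uncertainty map.
    """
    samples: List[Vector] = []
    for x_T in x_T_list:
        samples.extend(antithetic_pair(x_T, mean_fn, sigmas, rng))
    n = len(samples)  # always >= 2 because samples come in pairs
    columns = list(zip(*samples))
    mean = [sum(col) / n for col in columns]
    std = [(sum((v - m) ** 2 for v in col) / (n - 1)) ** 0.5
           for col, m in zip(columns, mean)]
    return mean, std
```

With a linear toy mean such as `lambda x, t: [0.9 * v for v in x]`, the average of each antithetic pair equals the noise-free trajectory exactly, which is the extreme case of the variance reduction; with a nonlinear learned denoiser the cancellation is only partial. Each pair still costs two reverse chains, the same as two independent crude Monte Carlo samples.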
Background and Objectives: The analysis of intracardiac electrograms (EGMs) is crucial for understanding the electrophysiological mechanisms underlying cardiac arrhythmias. However, the diversity of mapping systems and data formats complicates data integration, visualization, and the comparison of electrophysiological biomarkers. In this work, we develop a modular and extensible software platform for the visualization, analysis, and comparison of intracardiac electrograms and related biomarkers in both research and clinical contexts. Methods: We developed a custom software tool with an interactive graphical user interface (GUI) capable of loading, processing, and visualizing cardiac mapping data. The platform supports three-dimensional (3D) representations of cardiac geometries and allows users to display multiple biomarkers, including peak-to-peak amplitude, local activation times (LATs), ablation points, and other derived metrics. The tool visualizes unipolar and bipolar electrograms and computes omnipolar signals. The software is released as open source. Results: The platform provides a unified environment for handling heterogeneous electrophysiological data. Users can visualize cardiac geometries, overlay various signal-derived maps, and interactively inspect local electrograms. The modular design allows new biomarkers and analytical methods to be integrated with minimal effort, facilitating both clinical interpretation and research development. Conclusions: This software bridges a key gap in electrophysiological data analysis by enabling consistent visualization and comparison of intracardiac signals across datasets and biomarkers. Its modular and extensible design supports ongoing research in cardiac electrophysiology and provides clinicians with an intuitive environment for multimodal analysis.
Authors: Elisa Ramírez, Raul Alós, Samuel Ruipérez-Campillo, Raquel Cervigón, Francisco Castells, José Millet
Submitted: SoftwareX
Date: 15.06.2026
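The abstract above stresses that new biomarkers can be integrated with minimal effort. One common way to get that kind of extensibility is a registry of biomarker functions, sketched below. Everything here is an assumption rather than the tool's actual API: the decorator `register_biomarker`, the function signature (signal samples plus sampling rate in Hz), and the LAT convention (time of the steepest negative slope of a unipolar EGM, a widely used definition).

```python
from typing import Callable, Dict, Sequence

# Hypothetical plugin interface: (EGM samples, sampling rate in Hz) -> scalar biomarker.
BiomarkerFn = Callable[[Sequence[float], float], float]
BIOMARKERS: Dict[str, BiomarkerFn] = {}


def register_biomarker(name: str) -> Callable[[BiomarkerFn], BiomarkerFn]:
    """Decorator that adds a biomarker function to the registry under `name`."""
    def decorator(fn: BiomarkerFn) -> BiomarkerFn:
        BIOMARKERS[name] = fn
        return fn
    return decorator


@register_biomarker("peak_to_peak")
def peak_to_peak(egm: Sequence[float], fs: float) -> float:
    """Peak-to-peak amplitude: maximum minus minimum of the electrogram."""
    return max(egm) - min(egm)


@register_biomarker("lat")
def local_activation_time(egm: Sequence[float], fs: float) -> float:
    """LAT in seconds, taken as the time of the steepest negative slope (unipolar EGM)."""
    slopes = [egm[i + 1] - egm[i] for i in range(len(egm) - 1)]
    i_min = min(range(len(slopes)), key=slopes.__getitem__)
    # Report the midpoint between the two samples that bound the steepest downstroke.
    return (i_min + 0.5) / fs


def compute_biomarkers(egm: Sequence[float], fs: float) -> Dict[str, float]:
    """Evaluate every registered biomarker on one electrogram."""
    return {name: fn(egm, fs) for name, fn in BIOMARKERS.items()}
```

A new biomarker only needs one decorated function; `compute_biomarkers` picks it up without changes elsewhere. Applying it to the EGM recorded at each mapping point would yield the per-point values that are overlaid on the 3D geometry.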
Objective: Accurate pre-procedural identification of atrial flutter (AFL) mechanisms can streamline mapping and indirectly inform ablation strategy, yet surface electrocardiogram (ECG) criteria remain unreliable and circuit definition is typically confirmed invasively. Methods and procedures: We analyzed 97 consecutive patients undergoing an electrophysiological (EP) study with simultaneous 12-lead ECG and EP-verified AFL subtype; adenosine-induced atrioventricular (AV) block enabled extraction of clean atrial segments. We reconstructed atrial vectorcardiograms (VCGs) and engineered interpretable descriptors of loop morphology and kinematics, including archetype cosine correlation, geometric complexity, and velocity-based slow-occupancy indices, then fused these with clinical variables in an explainable tree-ensemble model evaluated with nested cross-validation. Results: VCG loops exhibited subtype-specific archetypes (within-class correlation: 0.832±0.129 CCCW, 0.874±0.154 CCW, 0.647±0.127 PMCCW, 0.667±0.159 PMCW; C: common; PM: perimitral; CW: clockwise; CCW: counter-CW). On the test set, the multimodal random forest improved discrimination over VCG-only and clinical-only baselines, achieving AUROCs of 0.870 (CCCW), 0.900 (CCW), 0.840 (PMCCW), and 0.790 (PMCW), with high sensitivity for common AFL (0.833 and 0.929) and very high specificity for PMCW (0.988). Conclusion: This interpretable framework provides a practical route to non-invasive, mechanism-oriented AFL stratification to support targeted mapping and more efficient ablation planning. Future work will focus on multicenter prospective validation and on robust atrial-signal extraction without adenosine to broaden routine applicability.
Authors: Samuel Ruipérez-Campillo, David Hernando, Elisa Ramírez, Sergio Castrejón, Cecilia Zapata, Carlos Rodríguez-Carneiro, Julia E. Vogt, José L. Merino, Francisco Castells, José Millet
Submitted: IEEE Journal of Translational Engineering in Health and Medicine
Date: 19.05.2026
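The abstract above names archetype cosine correlation and velocity-based slow-occupancy indices without defining them, so the following is only an illustrative reading of those descriptors. Assumptions: each atrial VCG loop is a sequence of (X, Y, Z) samples already resampled to a common length, a class archetype is the point-wise mean loop, and slow occupancy is the fraction of inter-sample steps whose speed falls below a chosen threshold. The function names and the threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # one (X, Y, Z) VCG sample


def flatten(loop: Sequence[Point3]) -> List[float]:
    """Concatenate the coordinates of all samples into one vector."""
    return [c for p in loop for c in p]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_archetype(loops: Sequence[Sequence[Point3]]) -> List[Point3]:
    """Point-wise mean of loops that were already resampled to a common length."""
    n = len(loops)
    return [tuple(sum(loop[i][k] for loop in loops) / n for k in range(3))
            for i in range(len(loops[0]))]


def archetype_correlation(loop: Sequence[Point3], archetype: Sequence[Point3]) -> float:
    """Cosine correlation between a loop and a class archetype."""
    return cosine_similarity(flatten(loop), flatten(archetype))


def slow_occupancy(loop: Sequence[Point3], fs: float, speed_threshold: float) -> float:
    """Fraction of inter-sample steps whose loop speed (distance * fs) is below the threshold."""
    speeds = [math.dist(loop[i + 1], loop[i]) * fs for i in range(len(loop) - 1)]
    return sum(s < speed_threshold for s in speeds) / len(speeds)
```

A loop compared with its own archetype scores 1.0 and its point-reflected copy scores -1.0, which is why a within-class mean correlation close to 1 indicates a consistent, subtype-specific loop shape.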
Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or fine-tuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited for enabling techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce Transformer Optimization using Adaptive and Simple Transformations (TOAST), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformations or even the identity function, without any additional training. Across state-of-the-art pretrained vision models (e.g., ViT, DINOv2, DeiT) and datasets ranging from MNIST to ImageNet-1k, TOAST reduces parameters and computation while preserving, and in some cases improving, downstream performance. These results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
Authors: Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck†, Julia E. Vogt† († denotes shared last authorship)
Submitted: Transactions on Machine Learning Research (TMLR)
Date: 13.05.2026
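The core idea in the abstract above, replacing a transformer block with a closed-form map fitted to its own input and output activations, can be sketched with ordinary least squares. Assumptions: block inputs and outputs have already been collected by running the frozen model on a small calibration set and flattened to (tokens, features) arrays; `fit_linear_block`, `choose_replacement`, and the `identity_tol` threshold are hypothetical; and the selection rule is illustrative, not the criterion TOAST uses.

```python
import numpy as np


def fit_linear_block(x_in: np.ndarray, x_out: np.ndarray):
    """Closed-form least-squares fit of x_out ~ x_in @ W + b (no gradient training)."""
    design = np.hstack([x_in, np.ones((x_in.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(design, x_out, rcond=None)
    return coef[:-1], coef[-1]  # W: (d_in, d_out), b: (d_out,)


def relative_error(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.linalg.norm(pred - target) / np.linalg.norm(target))


def choose_replacement(x_in: np.ndarray, x_out: np.ndarray, identity_tol: float = 0.05):
    """Return (kind, params, error) for the cheapest acceptable replacement.

    The identity (no parameters) is preferred when its relative error is within
    identity_tol; otherwise the fitted linear map is returned.
    """
    if x_in.shape == x_out.shape:
        id_err = relative_error(x_in, x_out)
        if id_err <= identity_tol:
            return "identity", None, id_err
    W, b = fit_linear_block(x_in, x_out)
    return "linear", (W, b), relative_error(x_in @ W + b, x_out)
```

Because the identity is itself a linear map, the fitted linear map never has a larger calibration error than the identity; the tolerance therefore encodes a preference for the cheaper, parameter-free replacement rather than a comparison of errors.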
Multi-domain fine-tuning of large language models requires improving performance on target domains while preserving performance on constrained domains, such as general knowledge, instruction following, or safety evaluations. Existing data mixing strategies rely on fixed heuristics or adaptive rules that cannot explicitly enforce preservation of such capabilities. We propose DynaMiCS, a dynamic mixture optimizer that casts multi-domain fine-tuning as a constrained optimization problem. At each update, DynaMiCS performs short domain-specific probing runs to estimate a slope matrix of local cross-domain effects, capturing how training on each fine-tuning dataset affects each evaluation domain. These estimates are then used to compute mixture weights through optimization over the probability simplex, with the objective of improving target-domain performance while keeping constrained-domain losses below reference levels. Across multi-domain fine-tuning scenarios with varying numbers of target and constrained domains, DynaMiCS achieves stronger target-domain improvements and higher constraint satisfaction than fixed-mixture baselines, at lower computational cost and without reference models, per-example scoring, or manually tuned mixture weights.
Authors: Eleonora Gualdoni, Sonia Laguna, Louis Bethune, Joao Monteiro, Pierre Ablin, Marco Cuturi
Submitted: arXiv
Date: 11.05.2026
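The abstract above turns a slope matrix of local cross-domain effects into mixture weights via optimization over the probability simplex. The sketch below is a deliberately simple stand-in under stated assumptions: it models the predicted loss on each evaluation domain as the current loss plus a weighted sum of slopes (a first-order model), treats each constraint as "predicted loss ≤ reference loss", and replaces whatever solver DynaMiCS actually uses with a brute-force grid search over the simplex. The function names, the `resolution` parameter, and the lexicographic rule (minimize total constraint violation first, then target loss) are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence


def simplex_grid(n: int, resolution: int) -> Iterator[List[float]]:
    """Yield every n-dimensional weight vector with entries k/resolution summing to 1."""
    for counts in product(range(resolution + 1), repeat=n - 1):
        used = sum(counts)
        if used <= resolution:
            yield [c / resolution for c in counts] + [(resolution - used) / resolution]


def choose_mixture(slopes: Sequence[Sequence[float]],
                   current_loss: Sequence[float],
                   targets: Sequence[int],
                   constrained: Dict[int, float],
                   resolution: int = 20) -> List[float]:
    """Pick mixture weights from a first-order loss model.

    slopes[d][e]: estimated change in the loss on evaluation domain e per update on
    fine-tuning dataset d (for example, from short probing runs).
    constrained: maps each constrained domain index to its reference loss.
    """
    best_w: List[float] = []
    best_key = None
    for w in simplex_grid(len(slopes), resolution):
        predicted = [current_loss[e] + sum(w[d] * slopes[d][e] for d in range(len(w)))
                     for e in range(len(current_loss))]
        violation = sum(max(0.0, predicted[e] - ref) for e, ref in constrained.items())
        objective = sum(predicted[e] for e in targets)
        key = (violation, objective)  # feasibility first, then target-domain loss
        if best_key is None or key < best_key:
            best_w, best_key = w, key
    return best_w
```

For example, with slopes `[[-1.0, 0.5], [0.0, -0.5]]`, current losses `[2.0, 1.0]`, domain 0 as the target, and domain 1 constrained at a reference loss of 1.0, the search returns `[0.5, 0.5]`: the largest share of the first dataset that keeps the constrained domain at its reference. The grid grows combinatorially with the number of datasets, so a proper solver (for example a linear program) would be preferable beyond a handful of domains.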


