Dr.
Ece Özkan Elsen
Established Researcher
- ece.oezkanelsen@inf.ethz.ch
- Address
-
Department of Computer Science
CAB G 33.2
Universitätstr. 6
CH – 8092 Zurich, Switzerland - Room
- CAB G 33.2
I obtained my PhD degree in December 2018 from ETH Zurich. During my Master's Degree, I focused on image processing and machine learning methods using probabilistic models for efficient semantic segmentation and object detection. My doctoral research centered on developing methods to characterize the bio-mechanical properties of soft tissues using ultrasound.
From 2021 to 2022, I worked as a Postdoctoral Researcher in the Medical Data Science group at ETH and concurrently led the Network of Women in Computer Science (CSNOW). In 2023, I was a Postdoctoral Fellow at Department of Brain and Cognitive Sciences at MIT, working on human-inspired machine learning techniques for medical image analysis.
Returning to ETH in May 2024, I resumed my role within the Medical Data Science group as an Established Researcher.
My research interests lie in enhancing the generalization, explainability, and fairness of machine learning models to address medical challenges and interpret medical data.
You can find a video portrait of me here.Publications
Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Thus, accurate and early detection of PH and the classification of its severity is crucial for appropriate and successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. Little effort has been directed towards automatic assessment of PH using echocardiography, and the few proposed methods only focus on binary PH classification on the adult population. In this work, we present an explainable multi-view video-based deep learning approach to predict and classify the severity of PH for a cohort of 270 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation and 0.63 for severity prediction and 0.78 for binary detection on the held-out test set. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms.
AuthorsHanna Ragnarsdottir*, Ece Özkan Elsen*, Holger Michel*, Kieran Chin-Cheong, Laura Manduchi, Sven Wellmann†, Julia E. Vogt†* denotes shared first authorship, † denotes shared last authorship
SubmittedInternational Journal of Computer Vision
Date06.02.2024
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. Previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed. For predicting the diagnosis, the extended multiview CBM attained an AUROC of 0.80 and an AUPR of 0.92, performing comparably to similar black-box neural networks trained and tested on the same dataset.
AuthorsRicards Marcinkevics*, Patricia Reis Wolfertstetter*, Ugne Klimiene*, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Özkan Elsen†, Christian Knorr†, Julia E. Vogt†* denotes shared first authorship, † denotes shared last authorship
SubmittedMedical Image Analysis
Date01.01.2024
Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations from exploiting structures in unlabeled data allowing the model to achieve high accuracy, even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful for limited data scenarios, such as having labels for only 200 patients, which is common in medical applications.
AuthorsEce Özkan Elsen*, Thomas M. Sutter*, Yurong Hu, Sebastian Balzer, Julia E. Vogt* denotes shared first authorship
SubmittedGCPR 2023
Date01.09.2023
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. With recent advances in machine learning, data-driven decision support could help clinicians diagnose and manage patients while reducing the number of non-critical surgeries. However, previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored the use of abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. To this end, our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts that are understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed.
AuthorsRicards Marcinkevics*, Patricia Reis Wolfertstetter*, Ugne Klimiene*, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Özkan Elsen†, Christian Knorr†, Julia E. Vogt†* denotes shared first authorship, † denotes shared last authorship
SubmittedWorkshop on Machine Learning for Multimodal Healthcare Data, Co-located with ICML 2023
Date29.07.2023
Machine learning (ML) is a discipline emerging from computer science with close ties to statistics and applied mathematics. Its fundamental goal is the design of computer programs, or algorithms, that learn to perform a certain task in an automated manner. Without explicit rules or knowledge, ML algorithms observe and possibly, interact with the surrounding world by the use of available data. Typically, as a result of learning, algorithms distil observations of complex phenomena into a general model which summarises the patterns, or regularities, discovered from the data. Modern ML algorithms regularly break records achieving impressive performance at a wide range of tasks, e.g. game playing, protein structure prediction, searching for particles in high-energy physics, and forecasting precipitation. The utility of machine learning methods for healthcare is apparent: it is often argued that given vast amounts of heterogeneous data, our understanding of diseases, patient management and outcomes can be enriched with the insights from machine learning. In this chapter, we will provide a nontechnical introduction to the ML discipline aimed at a general audience with an affinity for biomedical applications. We will familiarise the reader with the common types of algorithms and typical tasks these algorithms can solve and illustrate these basic concepts by concrete examples of current machine learning applications in healthcare. We will conclude with a discussion of the open challenges, limitations, and potential impact of machine-learning-powered medicine.
AuthorsJulia E. Vogt, Ece Özkan Elsen, Ricards Marcinkevics
SubmittedChapter in Digital Medicine: Bringing Digital Solutions to Medical Practice
Date31.03.2023
Many modern research fields increasingly rely on collecting and analysing massive, often unstructured, and unwieldy datasets. Consequently, there is growing interest in machine learning and artificial intelligence applications that can harness this `data deluge'. This broad nontechnical overview provides a gentle introduction to machine learning with a specific focus on medical and biological applications. We explain the common types of machine learning algorithms and typical tasks that can be solved, illustrating the basics with concrete examples from healthcare. Lastly, we provide an outlook on open challenges, limitations, and potential impacts of machine-learning-powered medicine.
AuthorsRicards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt
SubmittedArxiv
Date23.12.2022
Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), which is used to diagnose cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is both time-consuming and expertise-demanding, raising the need for an automated approach. Earlier automated works have been limited to still images or use echocardiogram videos with spatio-temporal convolutions in a complex pipeline. In this work, we propose to generate images from readily available echocardiogram videos, each image mimicking a M(otion)-mode image from a different scan line through time. We then combine different M-mode images using off-the-shelf model architectures to estimate the EF and, thus, diagnose cardiomyopathy. Our experiments show that our proposed method converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process.
AuthorsThomas Sutter, Sebastian Balzer, Ece Özkan Elsen, Julia E. Vogt
SubmittedMedical Imaging Meets NeurIPS Workshop 2022
Date02.12.2022
Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, accurate and early detection of PH is crucial for successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.
AuthorsHanna Ragnarsdottir, Laura Manduchi, Holger Michel, Fabian Laumer, Sven Wellmann, Ece Özkan Elsen, Julia E. Vogt
SubmittedDAGM German Conference on Pattern Recognition
Date20.09.2022
Deep neural networks for image-based screening and computer-aided diagnosis have achieved expert-level performance on various medical imaging modalities, including chest radiographs. Recently, several works have indicated that these state-of-the-art classifiers can be biased with respect to sensitive patient attributes, such as race or gender, leading to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making in healthcare. Fair machine learning has focused on mitigating such biases against disadvantaged or marginalised groups, mainly concentrating on tabular data or natural images. This work presents two novel intra-processing techniques based on fine-tuning and pruning an already-trained neural network. These methods are simple yet effective and can be readily applied post hoc in a setting where the protected attribute is unknown during the model development and test time. In addition, we compare several intra- and post-processing approaches applied to debiasing deep chest X-ray classifiers. To the best of our knowledge, this is one of the first efforts studying debiasing methods on chest radiographs. Our results suggest that the considered approaches successfully mitigate biases in fully connected and convolutional neural networks offering stable performance under various settings. The discussed methods can help achieve group fairness of deep medical image classifiers when deploying them in domains with different fairness considerations and constraints.
AuthorsRicards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt
SubmittedThe Seventh Machine Learning for Healthcare Conference, MLHC 2022
Date05.08.2022
Arguably, interpretability is one of the guiding principles behind the development of machine-learning-based healthcare decision support tools and computer-aided diagnosis systems. There has been a renewed interest in interpretable classification based on high-level concepts, including, among other model classes, the re-exploration of concept bottleneck models. By their nature, medical diagnosis, patient management, and monitoring require the assessment of multiple views and modalities to form a holistic representation of the patient's state. For instance, in ultrasound imaging, a region of interest might be registered from multiple views that are informative about different sets of clinically relevant features. Motivated by this, we extend the classical concept bottleneck model to the multiview classification setting by representation fusion across the views. We apply our multiview concept bottleneck model to the dataset of ultrasound images acquired from a cohort of pediatric patients with suspected appendicitis to predict the disease. The results suggest that auxiliary supervision from the concepts and aggregation across multiple views help develop more accurate and interpretable classifiers.
AuthorsUgne Klimiene*, Ricards Marcinkevics*, Patricia Reis Wolfertstetter, Ece Özkan Elsen, Alyssia Paschke, David Niederberger, Sven Wellmann, Christian Knorr, Julia E Vogt* denotes shared first authorship
SubmittedOral spotlight at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022
Date23.07.2022
Due to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making, recent research has focused on mitigating biases against already disadvantaged or marginalised groups in classification models. From the perspective of classification parity, the two commonest metrics for assessing fairness are statistical parity and equality of opportunity. Current approaches to debiasing in classification either require the knowledge of the protected attribute before or during training or are entirely agnostic to the model class and parameters. This work considers differentiable proxy functions for statistical parity and equality of opportunity and introduces two novel debiasing techniques for neural network classifiers based on fine-tuning and pruning an already-trained network. As opposed to the prior work leveraging adversarial training, the proposed methods are simple yet effective and can be readily applied post hoc. Our experimental results encouragingly suggest that these approaches successfully debias fully connected neural networks trained on tabular data and often outperform model-agnostic post-processing methods.
AuthorsRicards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt
SubmittedContributed talk at ICLR 2022 Workshop on Socially Responsible Machine Learning
Date29.04.2022
Richard Rau, Ece Özkan Elsen, Batu M. Ozturkler, Leila Gastli, Orcun Goksel
SubmittedIEEE International Ultrasonics Symposium (IUS)
Date11.08.2020
Lisa Ruby, Sergio J. Sanabria, Katharina Martini, Konstantin J. Dedes, Denise Vorburger, Ece Özkan Elsen, Thomas Frauenfelder, Orcun Goksel, Marga B. Rominger
SubmittedInvestigative Radiology
Date30.06.2019
Alvaro Gomariz, Weiye Li, Ece Özkan Elsen, Christine Tanner, Orcun Goksel
SubmittedInternational Symposium on Biomedical Imaging (ISBI)
Date06.02.2019
Stefanie Ehrbar, Alexander Jöhl, Michael Kühni, Mirko Meboldt, Ece Özkan Elsen, Christine Tanner, Orcun Goksel, Stephan Klöck, Jan Unkelbach, Matthias Guckenberger, Stephanie Tanadini-Lang
SubmittedMedical Physics
Date03.01.2019
Sergio J Sanabria, Ece Özkan Elsen, Marga Rominger, Orcun Goksel
SubmittedPhysics in Medicine and Biology
Date26.10.2018
Ece Özkan Elsen, Valery Vishnevsky, Orcun Goksel
SubmittedIEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control
Date03.03.2018
Ece Özkan Elsen, Orcun Goksel
SubmittedBiomedical Physics & Engineering Express
Date10.01.2018
Ece Özkan Elsen, Christine Tanner, Matej Kastelic, Oliver Mattausch, Maxim Makhinya, Orcun Goksel
SubmittedInternational Journal of Computer Assisted Radiology and Surgery
Date22.03.2017
Ece Özkan Elsen, Gemma Roig, Orcun Goksel, Xavier Boix
SubmittedarXiv
Date27.05.2016
Firat Ozdemir, Ece Özkan Elsen, Orcun Goksel
SubmittedInternational Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
Date27.05.2016
Ece Özkan Elsen, Orcun Goksel
SubmittedIEEE International Ultrasonics Symposium (IUS)
Date27.05.2016
Ece Özkan Elsen, Orcun Goksel
SubmittedInternational Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Date27.05.2015