Welcome

We develop and extend machine learning techniques for precision medicine, the life sciences and clinical data analysis. This field is exciting and challenging because new methods for better understanding diseases are enormously important. It comprises many areas, such as predicting treatment response in personalized medicine, (sparse) biomarker detection, tumor classification and understanding interactions between genes or groups of genes. The challenge lies not only in developing fast, robust and reliable systems but also in developing systems that are easy to interpret and usable in clinical practice.

News


ETH Podcast: How Machine Learning can help in medicine

Julia Vogt and Fanny Yang appear on the ETH Podcast


Predictors of 5-year mortality in young dialysis patients

Latest paper mentioned on EurekAlert!


Inaugural Lecture from Prof. Dr. Julia Vogt

Julia Vogt will give her inaugural lecture at ETH Zurich on Wednesday, November 4, 2020.


Publications


Abstract

Multimodal generative models learn a joint distribution over multiple modalities and thus have the potential to learn richer representations than unimodal models. However, current approaches are either inefficient in dealing with more than two modalities or fail to capture both modality-specific and shared variations. We introduce a new multimodal generative model that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently. Our method partitions the latent space into disjoint subspaces for modality-specific and shared factors and learns to disentangle these in a purely self-supervised manner. In extensive experiments, we show improvements in representation learning and generative performance compared to previous methods and showcase the disentanglement capabilities.

Authors

Imant Daunhawer, Thomas M. Sutter, Ricards Marcinkevics, Julia E. Vogt

Submitted

GCPR

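To make the idea of disjoint latent subspaces concrete, here is a minimal Python sketch under stated assumptions. It assumes Gaussian encoders and uses a product of experts (PoE) to aggregate the shared factors of an arbitrary subset of modalities. PoE is a common choice for this and is swapped in here purely for illustration; the abstract does not say which aggregation the paper uses. The function names (`poe_gaussian`, `sample`, `decoder_latent`) and the [specific | shared] layout are hypothetical.

```python
import numpy as np


def poe_gaussian(mus, logvars):
    """Combine Gaussian experts N(mu_i, sigma_i^2) by a product of experts.

    mus, logvars: arrays of shape (n_experts, dim), one row per modality
    in the chosen subset (shared factors only).
    """
    precisions = np.exp(-np.asarray(logvars, dtype=float))   # 1 / sigma_i^2
    joint_var = 1.0 / precisions.sum(axis=0)                  # 1 / sum of precisions
    joint_mu = joint_var * (precisions * np.asarray(mus, dtype=float)).sum(axis=0)
    return joint_mu, np.log(joint_var)


def sample(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(np.shape(mu))


def decoder_latent(spec_mu, spec_logvar, shared_mu, shared_logvar, rng):
    """Latent for one modality's decoder: [modality-specific | shared]."""
    s = sample(spec_mu, spec_logvar, rng)      # private subspace of this modality
    c = sample(shared_mu, shared_logvar, rng)  # subspace shared across modalities
    return np.concatenate([s, c])


# Usage: aggregate the shared factors of a two-modality subset, then build
# the latent for modality "a" (3 specific dims + 2 shared dims).
rng = np.random.default_rng(0)
shared = {"a": (np.array([0.0, 1.0]), np.zeros(2)),
          "b": (np.array([2.0, 1.0]), np.zeros(2))}
subset = ["a", "b"]
mu_c, logvar_c = poe_gaussian([shared[m][0] for m in subset],
                              [shared[m][1] for m in subset])
z_a = decoder_latent(np.zeros(3), np.zeros(3), mu_c, logvar_c, rng)
print(mu_c, z_a.shape)
```

Because the subset is just a list of keys, the same call aggregates one, two or all modalities without training a separate network per subset, which is one way to keep aggregation cheap as the number of modalities grows.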

Abstract

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.

Authors

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

arXiv

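For reference, the Jensen-Shannon divergence generalizes to M distributions as JS_π(P_1, ..., P_M) = Σ_i π_i KL(P_i ‖ Σ_j π_j P_j). The Python sketch below computes this standard quantity for discrete distributions. It is not the mmJSD objective itself: according to the abstract, mmJSD works with the unimodal and joint posteriors and a dynamic prior, which this sketch does not reproduce. The function names `kl` and `generalized_jsd` are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(dists, weights=None):
    """JS_pi(P_1..P_M) = sum_i pi_i * KL(P_i || mixture), mixture = sum_j pi_j * P_j.

    With positive weights, every support point of P_i has positive mixture
    mass, so the KL terms are finite.
    """
    m = len(dists)
    if weights is None:
        weights = [1.0 / m] * m                      # uniform weights
    mixture = [sum(w * d[k] for w, d in zip(weights, dists))
               for k in range(len(dists[0]))]
    return sum(w * kl(d, mixture) for w, d in zip(weights, dists))


# Usage: three distributions with disjoint support reach the upper bound log(3).
print(generalized_jsd([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # ≈ 1.0986
print(generalized_jsd([[0.5, 0.5], [0.5, 0.5]]))            # 0.0
```

Unlike a pairwise KL, this divergence is symmetric in its arguments and bounded by the entropy of the weights, which is what makes it attractive for comparing more than two distributions at once.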

Abstract

Background: The mortality risk remains significant in paediatric and adult patients on chronic haemodialysis (HD) treatment. We aimed to identify factors associated with mortality in patients who started HD as children and continued HD as adults.

Methods: The data originated from a cohort of patients < 30 years of age who started HD in childhood (≤ 19 years) on thrice-weekly HD in outpatient DaVita dialysis centres between 2004 and 2016. Patients with at least 5 years of follow-up since the initiation of HD or death within 5 years were included; 105 variables relating to demographics, HD treatment and laboratory measurements were evaluated as predictors of 5-year mortality utilizing a machine learning approach (random forest).

Results: A total of 363 patients were included in the analysis, with 84 patients having started HD at < 12 years of age. Low albumin and elevated lactate dehydrogenase (LDH) were the two most important predictors of 5-year mortality. Other predictors included elevated red blood cell distribution width or blood pressure and decreased red blood cell count, haemoglobin, albumin:globulin ratio, ultrafiltration rate, z-score weight for age or single-pool Kt/V (below target). Mortality was predicted with an accuracy of 81%.

Conclusions: Mortality in paediatric and young adult patients on chronic HD is associated with multifactorial markers of nutrition, inflammation, anaemia and dialysis dose. This highlights the importance of multimodal intervention strategies besides adequate HD treatment as determined by Kt/V alone. The association with elevated LDH was not previously reported and may indicate the relevance of blood–membrane interactions, organ malperfusion or haematologic and metabolic changes during maintenance HD in this population.

Authors

Verena Gotta, Georgi Tancev, Olivera Marsenic, Julia E. Vogt, Marc Pfister

Submitted

Nephrology Dialysis Transplantation

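The inclusion rule and the outcome described in the Methods can be written as a small filter. This is a hedged Python sketch: the `Patient` record and its fields are hypothetical, it covers only the childhood-start (≤ 19 years) and follow-up criteria (not the < 30 years or 2004 to 2016 criteria), and it assumes that a death at exactly 5 years counts as "within 5 years".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    age_at_hd_start: float        # years
    followup_years: float         # time observed since HD initiation
    death_years: Optional[float]  # years from HD start to death, None if alive


def died_within_5_years(p: Patient) -> bool:
    # Assumption: a death at exactly 5.0 years counts as "within 5 years".
    return p.death_years is not None and p.death_years <= 5


def is_eligible(p: Patient) -> bool:
    """Started HD in childhood and has >= 5 y follow-up or died within 5 y."""
    if p.age_at_hd_start > 19:
        return False
    return p.followup_years >= 5 or died_within_5_years(p)


def five_year_mortality_label(p: Patient) -> int:
    """Binary outcome: 1 if the patient died within 5 years of starting HD."""
    return int(died_within_5_years(p))


cohort = [
    Patient(12, 6.0, None),   # alive, 6 years of follow-up -> included, label 0
    Patient(15, 2.0, 2.0),    # died after 2 years -> included, label 1
    Patient(15, 3.0, None),   # alive, only 3 years of follow-up -> excluded
    Patient(21, 8.0, None),   # started HD as an adult -> excluded
]
included = [p for p in cohort if is_eligible(p)]
labels = [five_year_mortality_label(p) for p in included]
print(len(included), labels)  # 2 [0, 1]
```

Excluding alive patients with short follow-up avoids treating censored cases as survivors. The resulting labels would be the target for a classifier such as the random forest mentioned in the abstract; the 105 predictors and the model configuration are not shown here.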

Abstract

Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantage that they can be easily distributed and contain many features useful for, e.g., classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous (mixed) data types. Furthermore, the data sets contain sensitive information, which, due to privacy concerns, limits the distribution of any models learned from them. For these reasons, using EHR data in practice presents a real challenge. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets for downstream classification tasks. We further explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets, which provide rigorous privacy guarantees and are therefore shareable and usable in the real world. The performance (measured by AUROC, AUPRC and accuracy) of our model's synthetic, heterogeneous data is very close to that of the original data set (within 3–5% of the baseline) for the non-DP model when tested in a binary classification task. Using strong (ε, δ) = (1, 10⁻⁵) DP, our model still produces data useful for machine learning tasks, albeit incurring a roughly 17% performance penalty in our tested classification task. We additionally perform a sub-population analysis and find that, in terms of classification performance, our model does not introduce any bias into the synthetic EHR data compared to the baseline in either the male or female population, or in the 0-18, 19-50 and 51+ age groups, for either the non-DP or DP variant.

Authors

Kieran Chin-Cheong, Thomas M. Sutter, Julia E. Vogt

Submitted

arXiv

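Differentially private training of deep models is commonly done with DP-SGD: each per-example gradient is clipped to a maximum L2 norm C, and Gaussian noise with standard deviation σ·C is added to the summed gradients before averaging. The abstract does not name the exact optimizer, so treat the following Python/NumPy sketch of one such aggregation step as an assumption rather than the paper's implementation. The function `dp_aggregate` and its parameters are hypothetical, and the privacy accounting needed to report an (ε, δ) guarantee is not included.

```python
import numpy as np


def dp_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD style gradient aggregation (sketch).

    per_example_grads: array of shape (batch, n_params), one row per record.
    Returns the noisy mean of the clipped gradients.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale each row by min(1, C / ||g||) so its L2 norm is at most C;
    # the floor on the norm avoids dividing by zero for all-zero gradients.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Gaussian noise calibrated to the clipping bound (the sensitivity).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


rng = np.random.default_rng(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
update = dp_aggregate(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping bounds how much any single patient record can change the update, and the noise hides what remains. In a GAN, a common choice is to apply this step only to the discriminator's updates, since it is the only network that sees real records.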

Abstract

A project conducted within the K. Bomblies lab at ETHZ to examine the root architecture of A. arenosa. Phenotypes for 14 populations were assayed, including lateral and adventitious roots, direction of growth, upside-down germination, roots lifting off the plate, and starch levels in root tips. Germination time and germination rate were also recorded and analyzed for these populations as additional phenotypes.

Authors

Joyce Y. Kao

Submitted

Open Science Framework
