We work on developing and extending new machine learning techniques for precision medicine, the life sciences and clinical data analysis. This field is exciting and challenging because new methods for a better understanding of diseases are enormous important. The field of action comprises many areas such as prediction of response to treatment in personalized medicine, (sparse) biomarker detection, tumor classification or the understanding of interactions between genes or groups of genes. The challenge lies not only in developing fast, robust and reliable systems but also in systems that are easy to interpret and usable in clinical practice.
Latest paper mentioned on EurekAlert!
Julia Vogt will give her inaugural lecture at ETH Zurich on Wednesday, November 4, 2020.
Portrait of Julia Vogt: Interview with eQual! at ETH
Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.Authors
Thomas M. Sutter, Imant Daunhawer, Julia E. VogtSubmitted
Background The mortality risk remains significant in paediatric and adult patients on chronic haemodialysis (HD) treatment. We aimed to identify factors associated with mortality in patients who started HD as children and continued HD as adults. Methods The data originated from a cohort of patients < 30 years of age who started HD in childhood (<= 19 years) on thrice-weekly HD in outpatient DaVita dialysis centres between 2004 and 2016. Patients with at least 5 years of follow-up since the initiation of HD or death within 5 years were included; 105 variables relating to demographics, HD treatment and laboratory measurements were evaluated as predictors of 5-year mortality utilizing a machine learning approach (random forest). Results A total of 363 patients were included in the analysis, with 84 patients having started HD at < 12 years of age. Low albumin and elevated lactate dehydrogenase (LDH) were the two most important predictors of 5-year mortality. Other predictors included elevated red blood cell distribution width or blood pressure and decreased red blood cell count, haemoglobin, albumin:globulin ratio, ultrafiltration rate, z-score weight for age or single-pool K_t/V (below target). Mortality was predicted with an accuracy of 81%. Conclusions Mortality in paediatric and young adult patients on chronic HD is associated with multifactorial markers of nutrition, inflammation, anaemia and dialysis dose. This highlights the importance of multimodal intervention strategies besides adequate HD treatment as determined by K_t/V alone. The association with elevated LDH was not previously reported and may indicate the relevance of blood–membrane interactions, organ malperfusion or haematologic and metabolic changes during maintenance HD in this population.Authors
Verena Gotta, Georgi Tancev, Olivera Marsenic, Julia E. Vogt, Marc PfisterSubmitted
Nephrology Dialysis Transplantation
Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantages that they can be easily distributed and contain many features useful for e.g. classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous (mixed) data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. For these reasons, using EHR data in practice presents a real challenge. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets for downstream classification tasks. We will further explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets, which provide rigorous privacy guarantees, and are therefore shareable and usable in the real world. The performance (measured by AUROC, AUPRC and accuracy) of our model's synthetic, heterogeneous data is very close to the original data set (within 3 - 5% of the baseline) for the non-DP model when tested in a binary classification task. Using strong (1,10^-5) DP, our model still produces data useful for machine learning tasks, albeit incurring a roughly 17% performance penalty in our tested classification task. We additionally perform a sub-population analysis and find that our model does not introduce any bias into the synthetic EHR data compared to the baseline in either male/female populations, or the 0-18, 19-50 and 51+ age groups in terms of classification performance for either the non-DP or DP variant.Authors
Kieran Chin-Cheong, Thomas M. Sutter, Julia E. VogtSubmitted
A project conducted within the K. Bomblies lab at ETHZ to examine root architecture of A. arenosa. Phenotypes for 14 populations were assayed including lateral & adventitious roots, direction of growth, upside-down germination, roots lifting off of the plate, and starch levels in root tips. Additional phenotypes of germination time and rates for these populations were also recorded and analyzed.Authors
Joyce Y. KaoSubmitted
Open Science Framework
Clinical pharmacology is a multi-disciplinary data sciences field that utilizes mathematical and statistical methods to generate maximal knowledge from data. Pharmacometrics (PMX) is a well-recognized tool to characterize disease progression, pharmacokinetics and risk factors. Since the amount of data produced keeps growing with increasing pace, the computational effort necessary for PMX models is also increasing. Additionally, computationally efficient methods such as machine learning (ML) are becoming increasingly important in medicine. However, ML is currently not an integrated part of PMX, for various reasons. The goals of this article are to (i) provide an introduction to ML classification methods, (ii) provide examples for a ML classification analysis to identify covariates based on specific research questions, (iii) examine a clinically relevant example to investigate possible relationships of ML and PMX, and (iv) present a summary of ML and PMX tasks to develop clinical decision support tools.Authors
Gilbert Koch, Marc Pfister, Imant Daunhawer, Melanie Wilbaux, Sven Wellmann, Julia E. VogtSubmitted
Clinical Pharmacology & Therapeutics