We work on developing and extending machine learning techniques for precision medicine, the life sciences, and clinical data analysis. This field is exciting and challenging because new methods for a better understanding of diseases are enormously important. Our areas of activity include the prediction of treatment response in personalized medicine, (sparse) biomarker detection, tumor classification, and the understanding of interactions between genes or groups of genes. The challenge lies not only in developing fast, robust, and reliable systems, but also in making these systems easy to interpret and usable in clinical practice.


Julia Vogt receives ERC starting grant

Julia Vogt has received an ERC starting grant for the project TransMed: Transparent Machine Learning in Medicine: development and application of…


Inaugural Lecture from Prof. Dr. Julia Vogt

Julia Vogt's inaugural lecture will take place on 18 November 2021, 17.15 - 19.00, at the AudiMax.


Alexander and Ricards win best paper at ICDH 2021

Congratulations to Alexander H. Hatteland and Ricards Marcinkevics for winning the best paper award at ICDH 2021 with their paper Exploring Relationships…



Abstract

Constrained clustering has gained significant attention in the field of machine learning, as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently with stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of the data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments demonstrating that DC-GMM achieves superior clustering performance and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.
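As a toy illustration of how pairwise constraints can steer cluster assignments, the sketch below reweights the responsibilities of a two-component 1-D Gaussian mixture. All numbers (cluster means, the constraint weight w) are illustrative assumptions, and the actual DC-GMM integrates constraints as a prior within stochastic gradient variational inference rather than as a post-hoc reweighting:

```python
import numpy as np

# Toy 1-D mixture: two unit-variance Gaussian clusters at 0 and 4
# (means and the constraint weight below are illustrative, not from the paper).
MEANS = np.array([0.0, 4.0])

def responsibilities(x):
    """Unconstrained posterior over the two clusters for each point."""
    log_p = -0.5 * (x[:, None] - MEANS[None, :]) ** 2
    p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def apply_constraint(r, i, j, must_link, w=4.0):
    """Reweight the pair (i, j)'s joint assignment by a constraint prior.

    Same-cluster (must-link) or different-cluster (cannot-link) assignments
    receive an exp(w) boost; the joint is then renormalized and marginalized.
    """
    joint = np.outer(r[i], r[j])
    mask = np.eye(len(MEANS)) if must_link else 1.0 - np.eye(len(MEANS))
    joint *= np.exp(w * mask)
    joint /= joint.sum()
    r[i], r[j] = joint.sum(axis=1), joint.sum(axis=0)

x = np.array([0.1, 3.9, 2.1])        # the third point is ambiguous
r = responsibilities(x)
print("before:", r[2].round(3))      # nearly split between the two clusters
apply_constraint(r, 0, 2, must_link=True)
print("after: ", r[2].round(3))      # pulled toward the cluster of x[0]
```

The must-link constraint between the confident point x[0] and the ambiguous point x[2] shifts the latter's assignment decisively toward the first cluster, which is the qualitative behavior the abstract describes.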

Authors

Laura Manduchi, Kieran Chin-Cheong, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Accepted at NeurIPS 2021

Abstract

Appendicitis is a common childhood disease whose management still lacks consolidated international criteria. In clinical practice, heuristic scoring systems are often used to assess the urgency of patients with suspected appendicitis. Previous work on machine learning for appendicitis has focused on conventional classification models, such as logistic regression and tree-based ensembles. In this study, we investigate the use of risk supersparse linear integer models (RiskSLIM) for learning data-driven risk scores to predict the diagnosis, management, and complications in pediatric patients with suspected appendicitis, on a dataset of 430 children from a tertiary care hospital. We demonstrate the efficacy of our approach and compare the performance of the learnt risk scores to previous analyses with random forests. RiskSLIM is able to detect medically meaningful features and outperforms the traditional appendicitis scores, while at the same time being better suited for the clinical setting than tree-based ensembles.
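To illustrate the form of such a risk score: RiskSLIM-style models assign small integer points to binary features and map the total, plus an intercept, through a logistic function. The feature names, point values, and intercept below are hypothetical placeholders, not the score learned in the paper:

```python
import math

# Hypothetical integer point table and intercept (NOT the published score);
# RiskSLIM learns such small integer coefficients by solving a constrained
# logistic-regression problem.
POINTS = {
    "rebound_tenderness": 2,
    "elevated_wbc": 2,
    "nausea": 1,
}
INTERCEPT = -4  # hypothetical

def predicted_risk(patient):
    """Logistic risk from an integer score: P(y=1) = 1 / (1 + exp(-(b0 + score)))."""
    score = sum(pts for feat, pts in POINTS.items() if patient.get(feat))
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + score)))

patient = {"rebound_tenderness": True, "elevated_wbc": True, "nausea": False}
print(f"score-based risk: {predicted_risk(patient):.2f}")
```

Because the score is a short sum of small integers, a clinician can evaluate it by hand, which is what makes this model class better suited to clinical practice than tree-based ensembles.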

Authors

Pedro Roig Aparicio, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E. Vogt

Submitted

Short paper at the 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021)

Abstract

Sleep is crucial to restore body functions and metabolism across nearly all tissues and cells, and sleep restriction is linked to various metabolic dysfunctions in humans. Using exhaled breath analysis by secondary electrospray ionization high-resolution mass spectrometry, we measured the human exhaled metabolome at 10-s resolution across a night of sleep in combination with conventional polysomnography. Our subsequent analysis of almost 2,000 metabolite features demonstrates rapid, reversible control of major metabolic pathways by the individual vigilance states. Within this framework, whereas a switch to wake reduces fatty acid oxidation, a switch to slow-wave sleep increases it, and the transition to rapid eye movement sleep results in elevation of tricarboxylic acid (TCA) cycle intermediates. Thus, in addition to daily regulation of metabolism, there exists a surprising and complex underlying orchestration across sleep and wake. Both likely play an important role in optimizing metabolic circuits for human performance and health.

Authors

Nora Nowak, Thomas Gaisl, Djordje Miladinovic, Ricards Marcinkevics, Martin Osswald, Stefan Bauer, Joachim Buhmann, Renato Zenobi, Pablo Sinues, Steven A. Brown, Malcolm Kohler

Submitted

Cell Reports

Link · DOI · Code

Abstract

Estimating conditional mutual information (CMI) is an essential yet challenging step in many machine learning and data mining tasks. Estimating CMI from data that contains both discrete and continuous variables, or even discrete-continuous mixture variables, is a particularly hard problem. In this paper, we show that CMI for such mixture variables, defined based on the Radon-Nikodym derivative, can be written as a sum of entropies, just like CMI for purely discrete or continuous data. Further, we show that CMI can be consistently estimated for discrete-continuous mixture variables by learning an adaptive histogram model. In practice, we estimate such a model by iteratively discretizing the continuous data points in the mixture variables. To evaluate the performance of our estimator, we benchmark it against state-of-the-art CMI estimators as well as evaluate it in a causal discovery setting.
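The entropy decomposition mentioned above, I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z), can be checked with a simple plug-in estimator in the purely discrete case (a simplification: the paper's contribution is handling discrete-continuous mixtures via adaptive histograms):

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in (maximum-likelihood) entropy in nats of a discrete sample."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def cmi(x, y, z):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z), all discrete."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(z)) - entropy(list(zip(x, y, z))))

# Sanity check: X and Y are noisy copies of Z with independent noise, so they
# are dependent marginally but conditionally independent given Z.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, 20_000)
x = z ^ (rng.random(20_000) < 0.1).astype(int)
y = z ^ (rng.random(20_000) < 0.1).astype(int)
print(f"CMI estimate (should be near 0): {cmi(x, y, z):.4f}")
```

With the continuous or mixed variables of the paper, the joint entropies are no longer directly countable, which is why an adaptive discretization of the continuous parts is needed for consistent estimation.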

Authors

Alexander Marx, Lincen Yang, Matthijs van Leeuwen

Submitted

Proceedings of the SIAM International Conference on Data Mining, SDM 2021

Link · DOI · Code

Abstract

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied to more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
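For context, the mixture-based objective referred to here (in the style of MMVAE) averages per-modality terms over the M modalities, with each unimodal encoder q_φ(z | x_m) sub-sampled in turn; a sketch of the standard mixture-based multimodal ELBO:

```latex
\mathcal{L}(x_{1:M}) \;=\; \frac{1}{M} \sum_{m=1}^{M}
\mathbb{E}_{q_\phi(z \mid x_m)}\!\left[ \log \frac{p_\theta(z,\, x_1, \ldots, x_M)}{q_\phi(z \mid x_m)} \right]
```

Since each expectation conditions on a single modality, information private to the remaining modalities is not available to the encoder in that term, which gives an intuition for the upper bound on generative quality discussed in the abstract.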

Authors

Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt

Submitted

arXiv

Link