Publications

Abstract

Many modern research fields increasingly rely on collecting and analysing massive, often unstructured, and unwieldy datasets. Consequently, there is growing interest in machine learning and artificial intelligence applications that can harness this 'data deluge'. This broad nontechnical overview provides a gentle introduction to machine learning with a specific focus on medical and biological applications. We explain the common types of machine learning algorithms and typical tasks that can be solved, illustrating the basics with concrete examples from healthcare. Lastly, we provide an outlook on open challenges, limitations, and potential impacts of machine-learning-powered medicine.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

arXiv

Date

23.12.2022

Link, DOI

Abstract

Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), which is used to diagnose cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is both time-consuming and expertise-demanding, raising the need for an automated approach. Earlier automated works have been limited to still images or use echocardiogram videos with spatio-temporal convolutions in a complex pipeline. In this work, we propose to generate images from readily available echocardiogram videos, each image mimicking a M(otion)-mode image from a different scan line through time. We then combine different M-mode images using off-the-shelf model architectures to estimate the EF and, thus, diagnose cardiomyopathy. Our experiments show that our proposed method converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process.
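The idea of reading an M-mode-like image off a video can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it assumes a video stored as nested lists and a single vertical scan line, and the function names `extract_m_mode` and `ejection_fraction` are hypothetical. The EF helper uses the standard definition from end-diastolic and end-systolic volumes.

```python
def extract_m_mode(video, column):
    """Build an M-mode-like image from one vertical scan line.

    video:  list of frames; each frame is a list of rows of pixel intensities.
    column: x-position of the (assumed vertical) scan line.
    Returns an image with one row per depth position and one column per frame.
    """
    height = len(video[0])
    return [[frame[y][column] for frame in video] for y in range(height)]


def ejection_fraction(edv, esv):
    """Ejection fraction in percent: 100 * (EDV - ESV) / EDV."""
    return 100.0 * (edv - esv) / edv


# Toy video: 3 frames of 2x2 pixels (values are arbitrary).
video = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
m_mode = extract_m_mode(video, column=1)  # rows = depth, columns = time
```

Repeating the extraction for several scan lines would give a set of images, one per line, that a standard image model could consume.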

Authors

Thomas Sutter, Sebastian Balzer, Ece Özkan Elsen, Julia E. Vogt

Submitted

Medical Imaging Meets NeurIPS Workshop 2022

Date

02.12.2022

Link

Abstract

Background: Arm use metrics derived from wrist-mounted movement sensors are widely used to quantify the upper limb performance in real-life conditions of individuals with stroke throughout motor recovery. The calculation of real-world use metrics, such as arm use duration and laterality preferences, relies on accurately identifying functional movements. Hence, classifying upper limb activity into functional and non-functional classes is paramount. Acceleration thresholds are conventionally used to distinguish these classes. However, these methods are challenged by the high inter- and intra-individual variability of movement patterns. In this study, we developed and validated a machine learning classifier for this task and compared it to methods using conventional and optimal thresholds. Methods: Individuals after stroke were video-recorded in their home environment performing semi-naturalistic daily tasks while wearing wrist-mounted inertial measurement units. Data were labeled frame-by-frame following the Taxonomy of Functional Upper Limb Motion definitions, excluding whole-body movements, and sequenced into 1-s epochs. Actigraph counts were computed, and an optimal threshold for functional movement was determined by receiver operating characteristic curve analyses on group and individual levels. A logistic regression classifier was trained on the same labels using time and frequency domain features. Performance measures were compared between all classification methods. Results: Video data (6.5 h) of 14 individuals with mild-to-severe upper limb impairment were labeled. Optimal activity count thresholds were ≥20.1 for the affected side and ≥38.6 for the unaffected side and showed high predictive power with an area under the curve (95% CI) of 0.88 (0.87, 0.89) and 0.86 (0.85, 0.87), respectively. The optimal threshold and machine learning methods achieved an equivalent classification accuracy of around 80% and outperformed the conventional threshold by ∼10%. Optimal thresholds and machine learning methods showed superior specificity (75–82%) to conventional thresholds (58–66%) across unilateral and bilateral activities. Conclusion: This work compares the validity of methods classifying stroke survivors’ real-life arm activities measured by wrist-worn sensors excluding whole-body movements. The determined optimal thresholds and machine learning classifiers achieved an equivalent accuracy and higher specificity than conventional thresholds. Our open-sourced classifier or optimal thresholds should be used to specify the intensity and duration of arm use.
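To make the notion of an ROC-derived threshold concrete, here is a minimal sketch. It assumes that "optimal" means maximising Youden's J (sensitivity + specificity - 1), which is one common criterion; the abstract does not state which criterion was used. The function name and the toy activity counts are hypothetical.

```python
def best_threshold(counts, labels):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    counts: activity count per epoch.
    labels: 1 = functional movement, 0 = non-functional.
    An epoch is predicted functional when count >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(counts)):
        tp = sum(1 for c, y in zip(counts, labels) if c >= threshold and y == 1)
        tn = sum(1 for c, y in zip(counts, labels) if c < threshold and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, threshold, sens, spec)
    return best[1:]


# Toy data (made up for illustration, not from the study).
counts = [2, 5, 8, 15, 22, 30, 41, 55]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, sens, spec = best_threshold(counts, labels)
```

Running the same search per participant instead of on pooled data would correspond to an individual-level threshold.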

Authors

Johannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope

Submitted

Frontiers in Physiology

Date

28.09.2022

Link, DOI

Abstract

Background: Stroke leads to motor impairment which reduces physical activity, negatively affects social participation, and increases the risk of secondary cardiovascular events. Continuous monitoring of physical activity with motion sensors is promising to allow the prescription of tailored treatments in a timely manner. Accurate classification of gait activities and body posture is necessary to extract actionable information for outcome measures from unstructured motion data. Here, we develop and validate a solution for various sensor configurations specifically for a stroke population. Methods: Video and movement sensor data (locations: wrists, ankles, and chest) were collected from fourteen stroke survivors with motor impairment who performed real-life activities in their home environment. Video data were labeled for five classes of gait and body postures and three classes of transitions that served as ground truth. We trained support vector machine (SVM), logistic regression (LR), and k-nearest neighbor (kNN) models to identify gait bouts only or gait and posture. Model performance was assessed by the nested leave-one-subject-out protocol and compared across five different sensor placement configurations. Results: Our method achieved very good performance when predicting real-life gait versus non-gait (Gait classification) with an accuracy between 85% and 93% across sensor configurations, using SVM and LR modeling. On the much more challenging task of discriminating between the body postures lying, sitting, and standing as well as walking, and stair ascent/descent (Gait and postures classification), our method achieves accuracies between 80% and 86% with at least one ankle and wrist sensor attached unilaterally. The Gait and postures classification performance between SVM and LR was equivalent but superior to kNN. Conclusion: This work presents a comparison of performance when classifying gait and body postures in post-stroke individuals with different sensor configurations, which provide options for subsequent outcome evaluation. We achieved accurate classification of gait and postures performed in a real-life setting by individuals with a wide range of motor impairments due to stroke. This validated classifier will hopefully prove a useful resource to researchers and clinicians in the increasingly important field of digital health in the form of remote movement monitoring using motion sensors.
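The nested leave-one-subject-out protocol mentioned above can be illustrated with a small splitter. This is a sketch under simple assumptions: samples are indexed by position, each sample carries a subject ID, and the helper name `leave_one_subject_out` is hypothetical. The outer loop holds out one subject for testing; the inner loop repeats the same split on the remaining subjects, for example to select hyperparameters.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy data: six samples from three subjects (IDs are placeholders).
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
outer_folds = list(leave_one_subject_out(subject_ids))

# Inner split for the first outer fold, using only its training subjects.
# Inner indices refer to positions within inner_ids, not the full dataset.
held_out, train_idx, _ = outer_folds[0]
inner_ids = [subject_ids[i] for i in train_idx]
inner_folds = list(leave_one_subject_out(inner_ids))
```

Splitting by subject rather than by sample keeps data from the same person out of both the training and the test set.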

Authors

Johannes Pohl, Alain Ryser, Janne Marieke Veerbeek, Geert Verheyden, Julia Elisabeth Vogt, Andreas Rüdiger Luft, Chris Awai Easthope

Submitted

Frontiers in Physiology

Date

26.09.2022

Link, DOI

Abstract

Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, accurate and early detection of PH is crucial for successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.
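Aggregating per-view predictions by majority voting can be sketched as follows. The function name is hypothetical, and the tie-breaking rule (prefer the smallest label) is an assumption made for this example; the abstract does not say how ties are handled.

```python
from collections import Counter


def majority_vote(view_predictions):
    """Return the most frequent label among the per-view predictions.

    Ties are broken by choosing the smallest label (an assumption of this sketch).
    """
    counts = Counter(view_predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# Five views of one patient, each predicting 0 (no PH) or 1 (PH); toy values.
patient_prediction = majority_vote([1, 0, 1, 1, 0])
```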

Authors

Hanna Ragnarsdottir, Laura Manduchi, Holger Michel, Fabian Laumer, Sven Wellmann, Ece Özkan Elsen, Julia E. Vogt

Submitted

DAGM German Conference on Pattern Recognition

Date

20.09.2022

DOI

Abstract

Deep neural networks for image-based screening and computer-aided diagnosis have achieved expert-level performance on various medical imaging modalities, including chest radiographs. Recently, several works have indicated that these state-of-the-art classifiers can be biased with respect to sensitive patient attributes, such as race or gender, leading to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making in healthcare. Fair machine learning has focused on mitigating such biases against disadvantaged or marginalised groups, mainly concentrating on tabular data or natural images. This work presents two novel intra-processing techniques based on fine-tuning and pruning an already-trained neural network. These methods are simple yet effective and can be readily applied post hoc in a setting where the protected attribute is unknown during the model development and test time. In addition, we compare several intra- and post-processing approaches applied to debiasing deep chest X-ray classifiers. To the best of our knowledge, this is one of the first efforts studying debiasing methods on chest radiographs. Our results suggest that the considered approaches successfully mitigate biases in fully connected and convolutional neural networks offering stable performance under various settings. The discussed methods can help achieve group fairness of deep medical image classifiers when deploying them in domains with different fairness considerations and constraints.

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

The Seventh Machine Learning for Healthcare Conference, MLHC 2022

Date

05.08.2022

Link, Code

Abstract

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn three variants of a variational latent trajectory model (TVAE). While the first two variants (TVAE-C and TVAE-R) model strict periodic movements of the heart, the third (TVAE-S) is more general and allows shifts in the spatial representation throughout the video. All models are trained on the healthy samples of a novel in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders when detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method enables interpretable explanations of its output through heatmaps highlighting the regions corresponding to anomalous heart structures.

Authors

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

The Seventh Machine Learning for Healthcare Conference, MLHC 2022

Date

05.08.2022

Link, Code

Abstract

Arguably, interpretability is one of the guiding principles behind the development of machine-learning-based healthcare decision support tools and computer-aided diagnosis systems. There has been a renewed interest in interpretable classification based on high-level concepts, including, among other model classes, the re-exploration of concept bottleneck models. By their nature, medical diagnosis, patient management, and monitoring require the assessment of multiple views and modalities to form a holistic representation of the patient's state. For instance, in ultrasound imaging, a region of interest might be registered from multiple views that are informative about different sets of clinically relevant features. Motivated by this, we extend the classical concept bottleneck model to the multiview classification setting by representation fusion across the views. We apply our multiview concept bottleneck model to the dataset of ultrasound images acquired from a cohort of pediatric patients with suspected appendicitis to predict the disease. The results suggest that auxiliary supervision from the concepts and aggregation across multiple views help develop more accurate and interpretable classifiers.

Authors

Ugne Klimiene, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ece Özkan Elsen, Alyssia Paschke, David Niederberger, Sven Wellmann, Christian Knorr, Julia E Vogt

Submitted

Oral spotlight at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022

Date

23.07.2022

Link

Abstract

We propose a novel anomaly detection method for echocardiogram videos. The introduced method takes advantage of the periodic nature of the heart cycle to learn different variants of a variational latent trajectory model (TVAE). The models are trained on the healthy samples of an in-house dataset of infant echocardiogram videos consisting of multiple chamber views to learn a normative prior of the healthy population. During inference, maximum a posteriori (MAP) based anomaly detection is performed to detect out-of-distribution samples in our dataset. The proposed method reliably identifies severe congenital heart defects, such as Ebstein’s Anomaly or Shone-complex. Moreover, it achieves superior performance over MAP-based anomaly detection with standard variational autoencoders on the task of detecting pulmonary hypertension and right ventricular dilation. Finally, we demonstrate that the proposed method provides interpretable explanations of its output through heatmaps which highlight the regions corresponding to anomalous heart structures.

Authors

Alain Ryser, Laura Manduchi, Fabian Laumer, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Poster at the 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH), ICML 2022

Date

23.07.2022

Link, Code

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) induces both B and T cell responses which jointly contribute to effective neutralization and clearance of the virus. Multiple compartments of circulating immune memory to SARS-CoV-2 are not fully understood. We analyzed humoral and T cell immune responses in young convalescent adults with previous asymptomatic SARS-CoV-2 infections or mildly symptomatic COVID-19 disease. We concomitantly measured antibodies in the blood and analyzed SARS-CoV-2-reactive T cell reaction in response to overlapping peptide pools of four viral proteins in peripheral blood mononuclear cells (PBMC). Using statistical and machine learning models, we investigated whether T cell reactivity predicted antibody status. Individuals with previous SARS-CoV-2 infection differed in T cell responses from non-infected individuals. Subjects with previous SARS-CoV-2 infection exhibited CD4+ T cell responses against S1-, N-proteins and CoV-Mix (containing N, M and S protein-derived peptides) that were dominant over CD8+ T cells. At the same time, signals against the M protein were less pronounced. Double positive IL2+/CD154+ and IFN+/TNF+ CD4+ T cells showed the strongest association with antibody titers. T-cell reactivity to CoV-Mix-, S1-, and N-antigens was most strongly associated with humoral immune response, specifically with a compound antibody titer consisting of RBD, S1, S2, and NP. The T cell phenotype of SARS-CoV-2 infected individuals was stable for four months, thereby exceeding antibody decay rates. Our findings demonstrate that mild COVID-19 infections can elicit robust SARS-CoV-2 T-cell reactive immunity against specific components of SARS-CoV-2.

Authors

Ricards Marcinkevics, Pamuditha Silva, Anna-Katharina Hankele, Katharina Csik, Svenja Godbersen, Algera Goga, Lynn Hasenöhrl, Pascale Hirschi, Hasan Kabakci, Mary P LaPierre, Johanna Mayrhofer, Alexandra Title, Xuan Shu, Nouell Baiioud, Sandra Bernal, Laura Dassisti, Mara D Saenz-de-Juano, Meret Schmidhauser, Giulia Silvestrelli, Simon Z Ulbrich, Thea J Ulbrich, Tamara Wyss, Daniel J Stekhoven, Faisal S Al-Quaddoomi, Shuqing Yu, Mascha Binder, Christoph Schultheiss, Claudia Zindel, Christoph Kolling, Jörg Goldhahn, Bahram Kasmapour, Polina Zjablovskaja, Frank Hardung, Anne Richter, Stefan Miltenyi, Luca Piccoli, Sandra Ciesek, Julia E Vogt, Federica Sallusto, Markus Stoffel, Susanne E Ulbrich

Submitted

The 1st Workshop on Healthcare AI and COVID-19 at ICML 2022

Date

22.07.2022

Abstract

We study the problem of identifying cause and effect over two univariate continuous variables X and Y from a sample of their joint distribution. Our focus lies on the setting when the variance of the noise may be dependent on the cause. We propose to partition the domain of the cause into multiple segments where the noise indeed is dependent. To this end, we minimize a scale-invariant, penalized regression score, finding the optimal partitioning using dynamic programming. We show under which conditions this allows us to identify the causal direction for the linear setting with heteroscedastic noise and for the non-linear setting with homoscedastic noise, and we empirically confirm that these results generalize to the non-linear and heteroscedastic case. Altogether, the ability to model heteroscedasticity translates into an improved performance in telling cause from effect on a wide range of synthetic and real-world datasets.

Authors

Sascha Xu, Osman A Mian, Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the 39th International Conference on Machine Learning, ICML 2022

Date

28.06.2022

Link, Code

Abstract

The algorithmic independence of conditionals, which postulates that the causal mechanism is algorithmically independent of the cause, has recently inspired many highly successful approaches to distinguish cause from effect given only observational data. Most popular among these is the idea to approximate algorithmic independence via two-part Minimum Description Length (MDL). Although intuitively sensible, the link between the original postulate and practical two-part MDL encodings is left vague. In this work, we close this gap by deriving a two-part formulation of this postulate, in terms of Kolmogorov complexity, which directly links to practical MDL encodings. To close the cycle, we prove that this formulation leads on expectation to the same inference result as the original postulate.

Authors

Alexander Marx, Jilles Vreeken

Submitted

AAAI'22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery (ITCI’22)

Date

05.05.2022

Abstract

We study the problem of identifying the cause and the effect between two univariate continuous variables X and Y. The examined data is purely observational, hence assumptions about the underlying model are required. Often, the noise is assumed to be independent of the cause, which is not always the case for real-world data. In view of this, we present a new method which explicitly models heteroscedastic noise. With our HEC algorithm, we can find the optimal model, regularized by an information-theoretic score. In thorough experiments, we show that our ability to model heteroscedastic noise translates into superior performance on a wide range of synthetic and real-world datasets.

Authors

Sascha Xu, Alexander Marx, Osman Mian, Jilles Vreeken

Submitted

AAAI'22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery (ITCI’22)

Date

05.05.2022

Abstract

Estimating mutual information (MI) between two continuous random variables X and Y makes it possible to capture non-linear dependencies between them non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet, robustly estimating MI for high-dimensional X and Y is still an open research question. In this paper, we formulate this problem through the lens of manifold learning. That is, we leverage the common assumption that the information of X and Y is captured by a low-dimensional manifold embedded in the observed high-dimensional space and transfer it to MI estimation. As an extension to state-of-the-art kNN estimators, we propose to determine the k-nearest neighbors via geodesic distances on this manifold rather than from the ambient space, which allows us to estimate MI even in the high-dimensional setting. An empirical evaluation of our method, G-KSG, against the state-of-the-art shows that it yields good estimations of MI in classical benchmark and manifold tasks, even for high-dimensional datasets, which none of the existing methods can provide.

Authors

Alexander Marx, Jonas Fischer

Submitted

Proceedings of the SIAM International Conference on Data Mining, SDM 2022

Date

30.04.2022

Link, DOI, Code

Abstract

Due to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making, recent research has focused on mitigating biases against already disadvantaged or marginalised groups in classification models. From the perspective of classification parity, the two commonest metrics for assessing fairness are statistical parity and equality of opportunity. Current approaches to debiasing in classification either require the knowledge of the protected attribute before or during training or are entirely agnostic to the model class and parameters. This work considers differentiable proxy functions for statistical parity and equality of opportunity and introduces two novel debiasing techniques for neural network classifiers based on fine-tuning and pruning an already-trained network. As opposed to the prior work leveraging adversarial training, the proposed methods are simple yet effective and can be readily applied post hoc. Our experimental results encouragingly suggest that these approaches successfully debias fully connected neural networks trained on tabular data and often outperform model-agnostic post-processing methods.
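The two fairness metrics named above can be computed directly from predictions. The sketch below shows the plain empirical versions for a binary protected attribute, not the differentiable proxies the paper introduces; function names and the toy data are hypothetical.

```python
def positive_rate(preds):
    """Fraction of predictions equal to 1."""
    return sum(preds) / len(preds)


def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | A = 0) - P(y_hat = 1 | A = 1)."""
    g0 = [p for p, a in zip(y_pred, group) if a == 0]
    g1 = [p for p, a in zip(y_pred, group) if a == 1]
    return positive_rate(g0) - positive_rate(g1)


def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true positive rates between the two groups."""
    g0 = [p for p, y, a in zip(y_pred, y_true, group) if a == 0 and y == 1]
    g1 = [p for p, y, a in zip(y_pred, y_true, group) if a == 1 and y == 1]
    return positive_rate(g0) - positive_rate(g1)


# Toy labels, predictions, and protected attribute (made up for illustration).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_pred, group)
eod = equal_opportunity_difference(y_true, y_pred, group)
```

A value of zero for either metric would indicate parity between the two groups on that criterion.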

Authors

Ricards Marcinkevics, Ece Özkan Elsen, Julia E. Vogt

Submitted

Contributed talk at ICLR 2022 Workshop on Socially Responsible Machine Learning

Date

29.04.2022

Link, Code

Abstract

Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.

Authors

Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt

Submitted

The Tenth International Conference on Learning Representations, ICLR 2022

Date

27.04.2022

Link

Abstract

In this work, we study the problem of clustering survival data — a challenging and so far under-explored task. We introduce a novel semi-supervised probabilistic approach to cluster survival data by leveraging recent advances in stochastic gradient variational inference. In contrast to previous work, our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and censored survival times. We compare our model to the related work on clustering and mixture models for survival data in comprehensive experiments on a wide range of synthetic, semi-synthetic, and real-world datasets, including medical imaging data. Our method performs better at identifying clusters and is competitive at predicting survival times. Relying on novel generative assumptions, the proposed model offers a holistic perspective on clustering survival data and holds a promise of discovering subpopulations whose survival is regulated by different generative mechanisms.

Authors

Laura Manduchi, Ricards Marcinkevics, Michela C. Massi, Thomas Weikert, Alexander Sauter, Verena Gotta, Timothy Müller, Flavio Vasella, Marian C. Neidert, Marc Pfister, Bram Stieltjes, Julia E. Vogt

Submitted

The Tenth International Conference on Learning Representations, ICLR 2022

Date

25.04.2022

Link, Code

Abstract

Partitioning a set of elements into a given number of groups of a priori unknown sizes is an important task in many applications. Due to hard constraints, it is a non-differentiable problem, which prohibits its direct use in modern machine learning frameworks. Hence, previous works mostly fall back on suboptimal heuristics or simplified assumptions. The multivariate hypergeometric distribution offers a probabilistic formulation of how to distribute a given number of samples across multiple groups. Unfortunately, as a discrete probability distribution, it is not differentiable either. In this work, we propose a continuous relaxation for the multivariate non-central hypergeometric distribution. We introduce an efficient and numerically stable sampling procedure. This enables reparameterized gradients for the hypergeometric distribution and its integration into automatic differentiation frameworks. We highlight the applicability and usability of the proposed formulation on two different common machine learning tasks.
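For orientation, the probability mass function of the standard (central) multivariate hypergeometric distribution is easy to write down. The sketch below covers only this discrete base case, not the non-central variant or the continuous relaxation proposed in the paper; the function name is hypothetical.

```python
from math import comb, prod


def multivariate_hypergeom_pmf(group_sizes, draws):
    """Probability of drawing draws[i] items from group i, without replacement.

    group_sizes: number of items available in each group.
    draws:       number of items drawn from each group; their sum is the sample size.
    """
    total = sum(group_sizes)
    n = sum(draws)
    numerator = prod(comb(m, x) for m, x in zip(group_sizes, draws))
    return numerator / comb(total, n)


# Three groups with 3, 2, and 1 items; draw one item from each of the first two.
p = multivariate_hypergeom_pmf([3, 2, 1], [1, 1, 0])
```

Because the outcome is a vector of integer counts, this function has no gradient with respect to the counts, which is the obstacle the abstract describes.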

Authors

Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt

Submitted

arXiv

Date

03.03.2022

Link, Code

Abstract

Using artificial intelligence to improve patient care is a cutting-edge methodology, but its implementation in clinical routine has been limited due to significant concerns about understanding its behavior. One major barrier is the explainability dilemma and how much explanation is required to use artificial intelligence safely in healthcare. A key issue is the lack of consensus on the definition of explainability by experts, regulators, and healthcare professionals, resulting in a wide variety of terminology and expectations. This paper aims to fill the gap by defining minimal explainability standards to serve the views and needs of essential stakeholders in healthcare. In that sense, we propose to define minimal explainability criteria that can support doctors’ understanding, meet patients’ needs, and fulfill legal requirements. Therefore, explainability need not be exhaustive but sufficient for doctors and patients to comprehend the artificial intelligence models’ clinical implications and be integrated safely into clinical practice. Thus, minimally acceptable standards for explainability are context-dependent and should respond to the specific needs and potential risks of each clinical scenario for a responsible and ethical implementation of artificial intelligence.

Authors

Laura Arbelaez Ossa, Georg Starke, Giorgia Lorenzini, Julia E Vogt, David M Shaw, Bernice Simone Elger

Submitted

DIGITAL HEALTH

Date

11.02.2022

Link, DOI

Abstract

Digitalization has already changed medicine and will continue to strongly influence medical practice in the future. It is therefore important that prospective physicians engage with the methods and possible applications of machine learning during their studies. The working group «Digitalisierung der Medizin» (Digitalization of Medicine) has developed learning objectives for this purpose.

Authors

Raphaël Bonvin, Joachim Buhmann, Carlos Cotrini Jimenez, Marcel Egger, Alexander Geissler, Michael Krauthammer, Christian Schirlo, Christiane Spiess, Johann Steurer, Kerstin Noëlle Vokinger, Julia Vogt

Date

26.01.2022

Link

Abstract

Objective: To report the outcomes of active surveillance (AS) for low-risk prostate cancer (PCa) in a single-center cohort. Patients and Methods: This is a prospective, single-center, observational study. The cohort included all patients who underwent AS for PCa between December 1999 and December 2020 at our institution. Follow-up appointments (FU) ended in February 2021. Results: A total of 413 men were enrolled in the study, and 391 had at least one FU. Of those who followed up, 267 had PCa diagnosed by transrectal ultrasound (TRUS)-guided biopsy (T1c: 68.3%), while 124 were diagnosed after transurethral resection of the prostate (TURP) (T1a/b: 31.7%). Median FU was 46 months (IQR 25–90). Cancer specific survival was 99.7% and overall survival was 92.3%. Median reclassification time was 11.2 years. After 20 years, 25% of patients were reclassified within 4.58 years, 6.6% opted to switch to watchful waiting, 4.1% died, 17.4% were lost to FU, and 46.8% remained on AS. Those diagnosed by TRUS had a significantly higher reclassification rate than those diagnosed by TURP (p < 0.0001). Men diagnosed by targeted MRI/TRUS fusion biopsy tended to have a higher reclassification probability than those diagnosed by conventional template biopsies (p = 0.083). Conclusions: Our single-center cohort spanning over two decades revealed that AS remains a safe option for low-risk PCa even in the long term. Approximately half of AS enrollees will eventually require definitive treatment due to disease progression. Men with incidental prostate cancer were significantly less likely to have disease progression.

Authors

Sarah Hagmann, Venkat Ramakrishnan, Alexander Tamalunas, Marc Hofmann, Moritz Vandenhirtz, Silvan Vollmer, Jsmea Hug, Philipp Niggli, Antonio Nocito, Rahel A. Kubik-Huch, Kurt Lehmann, Lukas John Hefermehl

Submitted

Cancers

Date

12.01.2022

Link, DOI

Abstract

Appendicitis is a common childhood disease, the management of which still lacks consolidated international criteria. In clinical practice, heuristic scoring systems are often used to assess the urgency of patients with suspected appendicitis. Previous work on machine learning for appendicitis has focused on conventional classification models, such as logistic regression and tree-based ensembles. In this study, we investigate the use of risk supersparse linear integer models (risk SLIM) for learning data-driven risk scores to predict the diagnosis, management, and complications in pediatric patients with suspected appendicitis on a dataset consisting of 430 children from a tertiary care hospital. We demonstrate the efficacy of our approach and compare the performance of learnt risk scores to previous analyses with random forests. Risk SLIM is able to detect medically meaningful features and outperforms the traditional appendicitis scores, while at the same time being better suited for the clinical setting than tree-based ensembles.
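A risk score of this kind is a short table of integer points whose sum is mapped to a probability through the logistic function. The sketch below shows how such a score is applied once learnt. The feature names, point values, and intercept are placeholders invented for illustration; they are not taken from the study and have no clinical meaning.

```python
from math import exp

# Hypothetical integer points per binary feature (placeholders, not from the paper).
POINTS = {"feature_a": 2, "feature_b": 1, "feature_c": -1}
INTERCEPT = -2  # hypothetical integer offset


def risk_score(patient):
    """Sum the intercept and the points of all features present in the patient."""
    return INTERCEPT + sum(
        points for name, points in POINTS.items() if patient.get(name)
    )


def predicted_risk(score):
    """Map an integer score to a probability with the logistic function."""
    return 1.0 / (1.0 + exp(-score))


patient = {"feature_a": True, "feature_b": False, "feature_c": False}
score = risk_score(patient)
risk = predicted_risk(score)
```

Because the points are small integers, the score can be added up by hand at the bedside, which is the practical appeal over tree-based ensembles noted in the abstract.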

Authors

Pedro Roig Aparicio, Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E. Vogt

Submitted

Short paper at 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

Date

16.12.2021

LinkDOI

Abstract

Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments to demonstrate that DC-GMM shows superior clustering performances and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.

Authors

Laura Manduchi, Kieran Chin-Cheong, Holger Michel, Sven Wellmann, Julia E. Vogt

Submitted

Accepted at NeurIPS 2021

Date

14.12.2021

Abstract

In this work, we propose a new formulation for multimodal VAEs to model and learn the relationship between data types. Despite their recent progress, current multimodal generative methods are based on simplistic assumptions regarding the relation between data types, which leads to a trade-off between coherence and quality of generated samples - even for simple toy datasets. The proposed method learns the relationship between data types instead of relying on pre-defined and limiting assumptions. Based on the principles of variational inference, we change the posterior approximation to explicitly include information about the relation between data types. We show empirically that the simplified assumption of a single shared latent space leads to inferior performance for a dataset with additional pairwise shared information.

Authors

Thomas M. Sutter, Julia E. Vogt

Submitted

Bayesian Deep Learning Workshop at NeurIPS 2021

Date

14.12.2021

Link

Abstract

Sleep is crucial to restore body functions and metabolism across nearly all tissues and cells, and sleep restriction is linked to various metabolic dysfunctions in humans. Using exhaled breath analysis by secondary electrospray ionization high-resolution mass spectrometry, we measured the human exhaled metabolome at 10-s resolution across a night of sleep in combination with conventional polysomnography. Our subsequent analysis of almost 2,000 metabolite features demonstrates rapid, reversible control of major metabolic pathways by the individual vigilance states. Within this framework, whereas a switch to wake reduces fatty acid oxidation, a switch to slow-wave sleep increases it, and the transition to rapid eye movement sleep results in elevation of tricarboxylic acid (TCA) cycle intermediates. Thus, in addition to daily regulation of metabolism, there exists a surprising and complex underlying orchestration across sleep and wake. Both likely play an important role in optimizing metabolic circuits for human performance and health.

Authors

Nora Nowak, Thomas Gaisl, Djordje Miladinovic, Ricards Marcinkevics, Martin Osswald, Stefan Bauer, Joachim Buhmann, Renato Zenobi, Pablo Sinues, Steven A. Brown, Malcolm Kohler

Submitted

Cell Reports

Date

26.10.2021

LinkDOICode

Abstract

Estimating conditional mutual information (CMI) is an essential yet challenging step in many machine learning and data mining tasks. Estimating CMI from data that contains both discrete and continuous variables, or even discrete-continuous mixture variables, is a particularly hard problem. In this paper, we show that CMI for such mixture variables, defined based on the Radon-Nikodym derivative, can be written as a sum of entropies, just like CMI for purely discrete or continuous data. Further, we show that CMI can be consistently estimated for discrete-continuous mixture variables by learning an adaptive histogram model. In practice, we estimate such a model by iteratively discretizing the continuous data points in the mixture variables. To evaluate the performance of our estimator, we benchmark it against state-of-the-art CMI estimators as well as evaluate it in a causal discovery setting.

Authors

Alexander Marx, Lincen Yang, Matthijs van Leeuwen

Submitted

Proceedings of the SIAM International Conference on Data Mining, SDM 2021

Date

21.10.2021

LinkDOICode

Abstract

Background: Current strategies for risk stratification and prediction of neonatal early-onset sepsis (EOS) are inefficient and lack diagnostic performance. The aim of this study was to use machine learning to analyze the diagnostic accuracy of risk factors (RFs), clinical signs and biomarkers and to develop a prediction model for culture-proven EOS. We hypothesized that the contribution of biomarkers to diagnostic accuracy is higher than that of RFs or clinical signs. Study Design: Secondary analysis of the prospective international multicenter NeoPInS study. Neonates born after completed 34 weeks of gestation with antibiotic therapy due to suspected EOS within the first 72 hours of life participated. Primary outcome was defined as predictive performance for culture-proven EOS with variables known at the start of antibiotic therapy. Machine learning was used in the form of a random forest classifier. Results: One thousand six hundred eighty-five neonates treated for suspected infection were analyzed. Biomarkers were superior to clinical signs and RFs for prediction of culture-proven EOS. C-reactive protein and white blood cells were most important for the prediction of the culture result. Our full model achieved an area-under-the-receiver-operating-characteristic-curve of 83.41% (+/-8.8%) and an area-under-the-precision-recall-curve of 28.42% (+/-11.5%). The predictive performance of the model with RFs alone was comparable with random. Conclusions: Biomarkers have to be considered in algorithms for the management of neonates suspected of EOS. A 2-step approach with a screening tool for all neonates in combination with our model in the preselected population with an increased risk for EOS may have the potential to reduce the start of unnecessary antibiotics.

Authors

Martin Stocker, Imant Daunhawer, Wendy van Herk, Salhab el Helou, Sourabh Dutta, Frank A. B. A. Schuerman, Rita K. van den Tooren-de Groot, Jantien W. Wieringa, Jan Janota, Laura H. van der Meer-Kappelle, Rob Moonen, Sintha D. Sie, Esther de Vries, Albertine E. Donker, Urs Zimmerman, Luregn J. Schlapbach, Amerik C. de Mol, Angelique Hoffmann-Haringsma, Madan Roy, Maren Tomaske, René F. Kornelisse, Juliette van Gijsel, Frans B. Plötz, Sven Wellmann, Niek B Achten, Dirk Lehnick, Annemarie M. C. van Rossum, Julia E. Vogt

Submitted

The Pediatric Infectious Disease Journal, 2022

Date

09.09.2021

LinkDOI

Abstract

Autonomic peripheral activity is partly governed by brain autonomic centers. However, there are still many uncertainties regarding the precise link between peripheral and central autonomic biosignals. Clarifying these links could have a profound impact on the interpretability, and thus usefulness, of peripheral autonomic biosignals captured with wearable devices. In this study, we take advantage of a unique dataset consisting of intracranial stereo-electroencephalography (SEEG) and peripheral biosignals acquired simultaneously for several days from four subjects undergoing epilepsy monitoring. In contrast to previous work, we apply a deep neural network to explore high-dimensional nonlinear correlations between the cerebral brainwaves and variations in heart rate and electrodermal activity (EDA). Further, neural network explainability methods were applied to identify the most relevant brainwave frequencies, brain regions and temporal information to predict a specific biosignal. The strongest brain-peripheral correlations were observed from contacts located in the central autonomic network, in particular in the alpha, theta and 52 to 58 Hz frequency bands. Furthermore, a temporal delay of 12 to 14 s between the SEEG and EDA signals was observed. Finally, we believe that this pilot study demonstrates a promising approach to mapping brain-peripheral relationships in a data-driven manner by leveraging the expressiveness of deep neural networks.

Authors

Alexander H. Hatteland, Ricards Marcinkevics, Renaud Marquis, Thomas Frick, Ilona Hubbard, Julia E. Vogt, Thomas Brunschwiler, Philippe Ryvlin

Submitted

Best paper award at IEEE International Conference on Digital Health, ICDH 2021

Date

06.09.2021

LinkDOI

Abstract

Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated with genes using large-scale datasets of epigenetic and expression data. However, for regions of complex epigenomic signals and enhancers that regulate many genes, it is difficult to understand these associations. We present StitchIt, an approach to dissect epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to generate the location and the target genes of a REM simultaneously. We show that this approach leads to a more accurate and refined REM detection compared to standard methods even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches, suggesting that a large part of the regulome might be uncharted water.

Authors

Florian Schmidt, Alexander Marx, Nina Baumgarten, Marie Hebel, Martin Wegner, Manuel Kaulich, Matthias S Leisegang, Ralf P Brandes, Jonathan Göke, Jilles Vreeken, Marcel H Schulz

Submitted

Nucleic Acids Research

Date

01.09.2021

DOICode

Abstract

Objective: To evaluate the association of self-reported physical function with subjective and objective measures as well as temporospatial gait features in lumbar spinal stenosis (LSS). Design: Cross-sectional pilot study. Setting: Outpatient multispecialty clinic. Participants: Participants with LSS and matched controls without LSS (n=10 per group; N=20). Interventions: Not applicable. Main outcome measures: Self-reported physical function (36-Item Short Form Health Survey [SF-36] physical functioning domain), Oswestry Disability Index, Swiss Spinal Stenosis Questionnaire, the Neurogenic Claudication Outcome Score, and inertial measurement unit (IMU)-derived temporospatial gait features. Results: Higher self-reported physical function scores (SF-36 physical functioning) correlated with lower disability ratings, neurogenic claudication, and symptom severity ratings in patients with LSS (P<.05). Compared with controls without LSS, patients with LSS had lower scores on physical capacity measures (median total distance traveled on 6-minute walk test: controls 505 m vs LSS 316 m; median total distance traveled on self-paced walking test: controls 718 m vs LSS 174 m). Observed differences in IMU-derived gait features, physical capacity measures, disability ratings, and neurogenic claudication scores between populations with and without LSS were statistically significant. Conclusions: Further evaluation of the association of IMU-derived temporospatial gait with self-reported physical function, pain-related disability, neurogenic claudication, and spinal stenosis symptom severity score in LSS would help clarify their role in tracking LSS outcomes.

Authors

Charles A Odonkor, Salam Taraben, Christy Tomkins-Lane, Wei Zhang, Amir Muaremi, H. Leutheuser, Ruopeng Sun, Matthew Smuck

Submitted

Archives of Rehabilitation Research and Clinical Translation

Date

01.09.2021

DOI

Abstract

One of the core assumptions in causal discovery is the faithfulness assumption--i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.

Authors

Alexander Marx, Arthur Gretton, Joris M. Mooij

Submitted

Proceedings of the Conference on Uncertainty in Artificial Intelligence, UAI 2021

Date

01.08.2021

Link

Abstract

Machine learning has become increasingly popular in the medical domain over the past years. While supervised machine learning has already been applied successfully, the vast amount of unlabelled data offers new opportunities for un- and self-supervised learning methods. Especially with regard to the multimodal nature of most clinical data, the labelling of multiple data types quickly becomes infeasible in the medical domain. However, to the best of our knowledge, multimodal unsupervised methods have been tested extensively on toy datasets only but have never been applied to real-world medical data for direct applications such as disease classification and image generation. In this article, we demonstrate that self-supervised methods provide promising results on medical data while highlighting that the task is extremely challenging and that there is room for substantial improvement.

Authors

Hendrik J. Klug, Thomas M. Sutter, Julia E. Vogt

Submitted

Medical Imaging with Deep Learning, MIDL 2021

Date

07.07.2021

Link

Abstract

It is well-known that correlation does not equal causation, but how can we infer causal relations from data? Causal discovery tries to answer precisely this question by rigorously analyzing under which assumptions it is feasible to infer directed causal networks from passively collected, so-called observational data. Classical approaches assume the data to be faithful to the causal graph, that is, independencies found in the distribution are assumed to be due to separations in the true graph. Under this assumption, so-called constraint-based methods can infer the correct Markov equivalence class of the true graph (i.e. the correct undirected graph and some edge directions), only using conditional independence tests. In this dissertation, we aim to alleviate some of the weaknesses of constraint-based algorithms. In the first part, we investigate causal mechanisms, which cannot be detected when assuming faithfulness. We then suggest a weaker assumption based on triple interactions, which allows for recovering a broader spectrum of causal mechanisms. Subsequently, we focus on conditional independence testing, which is a crucial tool for causal discovery. In particular, we propose to measure dependencies through conditional mutual information, which we show can be consistently estimated even for the most general setup: discrete-continuous mixture random variables. Last, we focus on distinguishing Markov equivalent graphs (i.e. infer the complete DAG structure), which boils down to inferring the causal direction between two random variables. In this setting, we focus on continuous and mixed-type data and develop our methods based on an information-theoretic postulate, which states that the true causal graph can be compressed best, i.e. has the smallest Kolmogorov complexity.

Authors

Alexander Marx

Submitted

Saarländische Universitäts- und Landesbibliothek

Date

06.07.2021

DOI

Abstract

Background: Preterm neonates frequently experience hypernatremia (plasma sodium concentrations >145 mmol/l), which is associated with clinical complications, such as intraventricular hemorrhage. Study design: In this single-center retrospective observational study, the following 7 risk factors for hypernatremia were analyzed in very low gestational age (VLGA, below 32 weeks) neonates: gestational age (GA), delivery mode (DM; vaginal or cesarean section), sex, birth weight, small for GA, multiple birth, and antenatal corticosteroids. Machine learning (ML) approaches were applied to obtain probabilities for hypernatremia. Results: 824 VLGA neonates were included (median GA 29.4 weeks, median birth weight 1170 g, cesarean section 83%). 38% of neonates experienced hypernatremia. Maximal sodium concentration of 144 mmol/l (interquartile range 142–147) was observed 52 hours (41–65) after birth. ML identified vaginal delivery and GA as key risk factors for hypernatremia. The risk of hypernatremia increased with lower GA from 22% for GA >= 31–32 weeks to 46% for GA < 31 weeks and 60% for GA < 27 weeks. A linear relationship between maximal sodium concentrations and GA was found, showing decreases of 0.29 mmol/l per increasing week GA in neonates with vaginal delivery and 0.49 mmol/l/week after cesarean section. Sex, multiple birth and antenatal corticosteroids were not associated with hypernatremia. Conclusion: VLGA neonates with vaginal delivery and low GA have the highest risk for hypernatremia. Early identification of neonates at risk and early intervention may prevent extreme sodium excursions and associated clinical complications.

Authors

Nadia S. Eugster, Florence Corminboeuf, Gilbert Koch, Julia E. Vogt, Thomas Sutter, Tamara van Donge, Marc Pfister, Roland Gerull

Submitted

Klinische Pädiatrie

Date

07.06.2021

LinkDOI

Abstract

Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.

Authors

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

Ninth International Conference on Learning Representations, ICLR 2021

Date

04.05.2021

Link

Abstract

We study the problem of inferring causal graphs from observational data. We are particularly interested in discovering graphs where all edges are oriented, as opposed to the partially directed graphs that state-of-the-art methods discover. To this end, we base our approach on the algorithmic Markov condition. Unlike the statistical Markov condition, it uniquely identifies the true causal network as the one that provides the simplest--as measured in Kolmogorov complexity--factorization of the joint distribution. Although Kolmogorov complexity is not computable, we can approximate it from above via the Minimum Description Length principle, which allows us to define a consistent and computable score based on non-parametric multivariate regression. To efficiently discover causal networks in practice, we introduce the GLOBE algorithm, which greedily adds, removes, and orients edges such that it minimizes the overall cost. Through an extensive set of experiments, we show that GLOBE performs very well in practice, beating the state of the art by a margin.

Authors

Osman Mian, Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2021

Date

01.05.2021

LinkCode

Abstract

Background: Given the absence of consolidated and standardized international guidelines for managing pediatric appendicitis and the few strictly data-driven studies in this specific field, we investigated the use of machine learning (ML) classifiers for predicting the diagnosis, management and severity of appendicitis in children. Materials and Methods: Predictive models were developed and validated on a dataset acquired from 430 children and adolescents aged 0-18 years, based on a range of information encompassing history, clinical examination, laboratory parameters, and abdominal ultrasonography. Logistic regression, random forests, and gradient boosting machines were used for predicting the three target variables. Results: A random forest classifier achieved areas under the precision-recall curve of 0.94, 0.92, and 0.70, respectively, for the diagnosis, management, and severity of appendicitis. We identified smaller subsets of 6, 17, and 18 predictors for each of the targets that sufficed to achieve the same performance as the model based on the full set of 38 variables. We used these findings to develop the user-friendly online Appendicitis Prediction Tool for children with suspected appendicitis. Discussion: This pilot study considered the most extensive set of predictor and target variables to date and is the first to simultaneously predict all three targets in children: diagnosis, management, and severity. Moreover, this study presents the first ML model for appendicitis that was deployed as an open access easy-to-use online tool. Conclusion: ML algorithms help to overcome the diagnostic and management challenges posed by appendicitis in children and pave the way toward a more personalized approach to medical decision-making. Further validation studies are needed to develop a finished clinical decision support system.

Authors

Ricards Marcinkevics, Patricia Reis Wolfertstetter, Sven Wellmann, Christian Knorr, Julia E Vogt

Submitted

Frontiers in Pediatrics

Date

29.04.2021

LinkDOICode

Abstract

Survival analysis has gained significant attention in the medical domain with many far-reaching applications. Although a variety of machine learning methods have been introduced for tackling time-to-event prediction in unstructured data with complex dependencies, clustering of survival data remains an under-explored problem. The latter is particularly helpful in discovering patient subpopulations whose survival is regulated by different generative mechanisms, a critical problem in precision medicine. To this end, we introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting. Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times. We compare our model to the related work on survival clustering in comprehensive experiments on a range of synthetic, semi-synthetic, and real-world datasets. Our proposed method performs better at identifying clusters and is competitive at predicting survival times in terms of the concordance index and relative absolute error.

Authors

Laura Manduchi, Ricards Marcinkevics, Julia E. Vogt

Submitted

Contributed talk at AI for Public Health Workshop at ICLR 2021

Date

09.04.2021

Link

Abstract

Generating interpretable visualizations of multivariate time series in the intensive care unit is of great practical importance. Clinicians seek to condense complex clinical observations into intuitively understandable critical illness patterns, like failures of different organ systems. They would greatly benefit from a low-dimensional representation in which the trajectories of the patients’ pathology become apparent and relevant health features are highlighted. To this end, we propose to use the latent topological structure of Self-Organizing Maps (SOMs) to achieve an interpretable latent representation of ICU time series and combine it with recent advances in deep clustering. Specifically, we (a) present a novel way to fit SOMs with probabilistic cluster assignments (PSOM), (b) propose a new deep architecture for probabilistic clustering (DPSOM) using a VAE, and (c) extend our architecture to cluster and forecast clinical states in time series (T-DPSOM). We show that our model achieves superior clustering performance compared to state-of-the-art SOM-based clustering methods while maintaining the favorable visualization properties of SOMs. On the eICU dataset, we demonstrate that T-DPSOM provides interpretable visualizations of patient state trajectories and uncertainty estimation. We show that our method rediscovers well-known clinical patient characteristics, such as a dynamic variant of the Acute Physiology And Chronic Health Evaluation (APACHE) score. Moreover, we illustrate how it can disentangle individual organ dysfunctions on disjoint regions of the two-dimensional SOM map.

Authors

Laura Manduchi, Matthias Hüser, Martin Faltys, Julia Vogt, Gunnar Rätsch, Vincent Fortuin

Submitted

ACM CHIL 2021

Date

04.03.2021

Link

Abstract

In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could offer advantages for scaling RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI’s car racing environment. Hence, such a procedure makes it possible to decouple state representations from RL controllers. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than directly learning from pixel inputs; however, it has a more stable learning curve, a substantial reduction of the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.

Authors

Juan M. Montoya, Imant Daunhawer, Julia E. Vogt, Marco Wiering

Submitted

ICAART 2021

Date

04.02.2021

Link

Abstract

Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

Ninth International Conference on Learning Representations, ICLR 2021

Date

15.01.2021

LinkCode

Abstract

Rationale: Tuberculosis diagnosis in children remains challenging. Microbiological confirmation of tuberculosis disease is often lacking, and standard immunodiagnostics, including the tuberculin skin test and interferon-gamma release assay for tuberculosis infection, have limited sensitivity. Recent research suggests that inclusion of novel Mycobacterium tuberculosis antigens has the potential to improve standard immunodiagnostic tests for tuberculosis. Objective: To identify optimal antigen–cytokine combinations using novel Mycobacterium tuberculosis antigens and cytokine read-outs by machine learning algorithms to improve immunodiagnostic assays for tuberculosis. Methods: A total of 80 children undergoing investigation of tuberculosis were included (15 confirmed tuberculosis disease, five unconfirmed tuberculosis disease, 28 tuberculosis infection and 32 unlikely tuberculosis). Whole blood was stimulated with 10 novel Mycobacterium tuberculosis antigens and a fusion protein of early secretory antigenic target (ESAT)-6 and culture filtrate protein (CFP) 10. Cytokines were measured using xMAP multiplex assays. Machine learning algorithms defined a discriminative classifier with performance measured using area under the receiver operating characteristic curve. Measurements and main results: We found that the following four antigen–cytokine pairs had a higher weight in the discriminative classifier compared to the standard ESAT-6/CFP-10-induced interferon-gamma: Rv2346/47c- and Rv3614/15c-induced interferon-gamma inducible protein-10; Rv2031c-induced granulocyte-macrophage colony-stimulating factor and ESAT-6/CFP-10-induced tumor necrosis factor-alpha. A combination of the 10 best antigen–cytokine pairs resulted in area under the curve of 0.92 +/- 0.04. Conclusion: We exploited the use of machine learning algorithms as a key tool to evaluate large immunological datasets. This identified several antigen–cytokine pairs with the potential to improve immunodiagnostic tests for tuberculosis in children.

Authors

Noemi Rebecca Meier, Thomas M. Sutter, Marc Jacobsen, Tom H. M. Ottenhoff, Julia E. Vogt, Nicole Ritz

Submitted

Frontiers in Cellular and Infection Microbiology

Date

08.01.2021

LinkDOI

Abstract

Unplanned hospital readmissions are a burden to patients and increase healthcare costs. A wide variety of machine learning (ML) models have been suggested to predict unplanned hospital readmissions. These ML models were often specifically trained on patient populations with certain diseases. However, it is unclear whether these specialized ML models—trained on patient subpopulations with certain diseases or defined by other clinical characteristics—are more accurate than a general ML model trained on an unrestricted hospital cohort. In this study based on an electronic health record cohort of consecutive inpatient cases of a single tertiary care center, we demonstrate that accurate prediction of hospital readmissions may be obtained by general, disease-independent, ML models. This general approach may substantially decrease the cost of development and deployment of respective ML models in daily clinical routine, as all predictions are obtained by the use of a single model.

Authors

Thomas Sutter, Jan A Roth, Kieran Chin-Cheong, Balthasar L Hug, Julia E Vogt

Submitted

Journal of the American Medical Informatics Association

Date

18.12.2020

LinkDOI

Abstract

In this review, we examine the problem of designing interpretable and explainable machine learning models. Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences. Although interpretability and explainability have escaped a clear universal definition, many techniques motivated by these properties have been developed over the past 30 years, with the focus currently shifting towards deep learning methods. In this review, we emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art. The review is intended for a general machine learning audience with interest in exploring the problems of interpretation and explanation beyond logistic regression or random forest variable importance. This work is not an exhaustive literature survey, but rather a primer focusing selectively on certain lines of research which the authors found interesting or informative.

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

Arxiv

Date

04.12.2020

Link

Abstract

Echocardiography monitors the heart movement for noninvasive diagnosis of heart diseases. It proves to be of profound practical importance as it combines low-cost portable instrumentation and rapid image acquisition without the risks of ionizing radiation. However, echocardiograms produce high-dimensional, noisy data which frequently prove difficult to interpret. As a solution, we propose a novel autoencoder-based framework, DeepHeartBeat, to learn human-interpretable representations of cardiac cycles from cardiac ultrasound data. Our model encodes high-dimensional observations by a cyclic trajectory in a lower-dimensional space. We show that the learned parameters describing the latent trajectory are readily interpretable and we demonstrate the versatility of our model by successfully applying it to various cardiologically relevant tasks, such as ejection fraction prediction and arrhythmia detection. As a result, DeepHeartBeat promises to serve as a valuable assistant tool for automating therapy decisions and guiding clinical care.

Authors

Fabian Laumer, Gabriel Fringeli, Alina Dubatovka, Laura Manduchi, Joachim M. Buhmann

Submitted

Best Newcomer Award and spotlight talk at the Machine Learning for Health Workshop, NeurIPS 2020

Date

01.12.2020

Link

Abstract

Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.

Authors

Ricards Marcinkevics, Julia E. Vogt

Submitted

Interpretable Inductive Biases and Physically Structured Learning Workshop, NeurIPS 2020

Date

01.11.2020

Link

Abstract

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.
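To make the central quantity of the abstract above concrete, here is a minimal sketch of the Jensen-Shannon divergence for several discrete distributions, using the textbook definition: the entropy of the weighted mixture minus the weighted mean of the individual entropies. This is only the generic divergence, not the mmJSD training objective itself, which the abstract describes as involving unimodal and joint posteriors and a dynamic prior. The function names and example distributions are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(dists, weights=None):
    """Jensen-Shannon divergence of M discrete distributions.

    JS_w(p_1, ..., p_M) = H(sum_i w_i p_i) - sum_i w_i H(p_i)
    Uniform weights are assumed when none are given.
    """
    m = len(dists)
    if weights is None:
        weights = [1.0 / m] * m
    k = len(dists[0])
    # Weighted mixture of all distributions.
    mixture = [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(k)]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))


# Identical distributions have divergence 0; M distributions with
# disjoint support and uniform weights reach the maximum, log(M).
same = js_divergence([[0.5, 0.5], [0.5, 0.5]])
disjoint = js_divergence([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

Unlike a sum of pairwise KL terms, this divergence is symmetric in its arguments and bounded by log(M), which is one reason it is a convenient way to compare more than two distributions at once.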

Authors

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

NeurIPS 2020

Date

22.10.2020

Link

Abstract

PET/CT imaging is the gold standard for the diagnosis and staging of lung cancer. However, especially in healthcare systems with limited resources, costly PET/CT images are often not readily available. Conventional machine learning models either process CT or PET/CT images but not both. Models designed for PET/CT images are hence restricted by the number of PET images, such that they are unable to additionally leverage CT-only data. In this work, we apply the concept of visual soft attention to efficiently learn a model for lung cancer segmentation from only a small fraction of PET/CT scans and a larger pool of CT-only scans. We show that our model is capable of jointly processing PET/CT as well as CT-only images and that it performs on par with the respective baselines whether or not PET images are available at test time. We then demonstrate that the model learns efficiently from only a few PET/CT scans in a setting where mostly CT-only data is available, unlike conventional models.

Authors

Varaha Karthik Pattisapu, Imant Daunhawer, Thomas Weikert, Alexander Sauter, Bram Stieltjes, Julia E. Vogt

Submitted

GCPR 2020

Date

12.10.2020

Link

Abstract

Multimodal generative models learn a joint distribution over multiple modalities and thus have the potential to learn richer representations than unimodal models. However, current approaches are either inefficient in dealing with more than two modalities or fail to capture both modality-specific and shared variations. We introduce a new multimodal generative model that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently. Our method partitions the latent space into disjoint subspaces for modality-specific and shared factors and learns to disentangle these in a purely self-supervised manner. In extensive experiments, we show improvements in representation learning and generative performance compared to previous methods and showcase the disentanglement capabilities.

Authors

Imant Daunhawer, Thomas M. Sutter, Ricards Marcinkevics, Julia E. Vogt

Submitted

GCPR 2020

Date

10.09.2020

Link

Authors

Richard Rau, Ece Özkan Elsen, Batu M. Ozturkler, Leila Gastli, Orcun Goksel

Submitted

IEEE International Ultrasonics Symposium (IUS)

Date

11.08.2020

DOI

Abstract

Background: Functional ambulation limitations are features of lumbar spinal stenosis (LSS) and knee osteoarthritis (OA). With numerous validated walking assessment protocols and a vast number of spatiotemporal gait parameters available from sensor-based assessment, there is a critical need for selection of appropriate test protocols and variables for research and clinical applications. Research question: In patients with knee OA and LSS, what are the best sensor-derived gait parameters and the most suitable clinical walking test to discriminate between these patient populations and controls? Methods: We collected foot-mounted inertial measurement unit (IMU) data during three walking tests (fast-paced walk test, FPWT; 6-min walk test, 6MWT; self-paced walk test, SPWT) for subjects with LSS, knee OA and matched controls (N = 10 for each group). Spatiotemporal gait characteristics were extracted and pairwise compared (partial omega squared, $\omega_p^2$) between patients and controls. Results: We found that normal-paced walking tests (6MWT, SPWT) are better suited for distinguishing gait characteristics between patients and controls. Among the sensor-based gait parameters, stance and double support phase timing were identified as the best gait characteristics for the OA population discrimination, whereas foot flat ratio, gait speed, stride length and cadence were identified as the best gait characteristics for the LSS population discrimination. Significance: These findings provide guidance on the selection of sensor-derived gait parameters and clinical walking tests to detect alterations in mobility for people with LSS and knee OA.

Authors

C. Odonkor, A. Kuwabara, C. Tomkins-Lane, W. Zhang, A. Muaremi, H. Leutheuser, R. Sun, M. Smuck

Submitted

Gait & Posture

Date

01.07.2020

DOI

Abstract

Background: The mortality risk remains significant in paediatric and adult patients on chronic haemodialysis (HD) treatment. We aimed to identify factors associated with mortality in patients who started HD as children and continued HD as adults. Methods: The data originated from a cohort of patients < 30 years of age who started HD in childhood (≤ 19 years) on thrice-weekly HD in outpatient DaVita dialysis centres between 2004 and 2016. Patients with at least 5 years of follow-up since the initiation of HD or death within 5 years were included; 105 variables relating to demographics, HD treatment and laboratory measurements were evaluated as predictors of 5-year mortality utilizing a machine learning approach (random forest). Results: A total of 363 patients were included in the analysis, with 84 patients having started HD at < 12 years of age. Low albumin and elevated lactate dehydrogenase (LDH) were the two most important predictors of 5-year mortality. Other predictors included elevated red blood cell distribution width or blood pressure and decreased red blood cell count, haemoglobin, albumin:globulin ratio, ultrafiltration rate, z-score weight for age or single-pool $K_t/V$ (below target). Mortality was predicted with an accuracy of 81%. Conclusions: Mortality in paediatric and young adult patients on chronic HD is associated with multifactorial markers of nutrition, inflammation, anaemia and dialysis dose. This highlights the importance of multimodal intervention strategies besides adequate HD treatment as determined by $K_t/V$ alone. The association with elevated LDH was not previously reported and may indicate the relevance of blood–membrane interactions, organ malperfusion or haematologic and metabolic changes during maintenance HD in this population.

Authors

Verena Gotta, Georgi Tancev, Olivera Marsenic, Julia E. Vogt, Marc Pfister

Submitted

Nephrology Dialysis Transplantation

Date

08.06.2020

LinkDOI

Abstract

Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantage that they can be easily distributed and contain many features useful for, e.g., classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous (mixed) data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. For these reasons, using EHR data in practice presents a real challenge. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets for downstream classification tasks. We will further explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets, which provide rigorous privacy guarantees, and are therefore shareable and usable in the real world. The performance (measured by AUROC, AUPRC and accuracy) of our model's synthetic, heterogeneous data is very close to the original data set (within 3-5% of the baseline) for the non-DP model when tested in a binary classification task. Using strong $(1, 10^{-5})$-DP, our model still produces data useful for machine learning tasks, albeit incurring a roughly 17% performance penalty in our tested classification task. We additionally perform a sub-population analysis and find that our model does not introduce any bias into the synthetic EHR data compared to the baseline in either male/female populations, or the 0-18, 19-50 and 51+ age groups in terms of classification performance for either the non-DP or DP variant.

Authors

Kieran Chin-Cheong, Thomas M. Sutter, Julia E. Vogt

Submitted

Arxiv

Date

07.06.2020

Link

Abstract

Fog computing promises to use local devices more efficiently and to reduce latency and operating cost compared to cloud infrastructure, which makes it attractive for industrial automation. Many industrial (control) applications have demanding real-time requirements and existing automation networks typically exhibit low-bandwidth links between sensing and computing devices. Fog applications in industrial automation contexts thus require that the amount of data transferred between sensing, computing and actuating devices, as well as latencies of control loops, are minimized. To meet these requirements, this paper proposes a fog layer architecture that manages the computation and deployment of latency-aware industrial applications with Kubernetes, the prevalent container orchestration framework. The resulting fog layer dynamically solves the resource allocation optimization problem and then deploys distributed containerized applications to automation system networks. It achieves this in a non-intrusive manner, i.e. without actively modifying Kubernetes. Moreover, it does not depend on proprietary protocols and infrastructure and is thus widely applicable and preferable to a vendor-specific solution. We compare the architecture with two alternative approaches that differ in the level of coupling to Kubernetes.

Authors

Raphael Eidenbenz, Yvonne-Anne Pignolet, Alain Ryser

Submitted

Fifth International Conference on Fog and Mobile Edge Computing (FMEC)

Date

20.04.2020

Link

Abstract

Clinical pharmacology is a multi-disciplinary data sciences field that utilizes mathematical and statistical methods to generate maximal knowledge from data. Pharmacometrics (PMX) is a well-recognized tool to characterize disease progression, pharmacokinetics and risk factors. Since the amount of data produced keeps growing at an increasing pace, the computational effort necessary for PMX models is also increasing. Additionally, computationally efficient methods such as machine learning (ML) are becoming increasingly important in medicine. However, ML is currently not an integrated part of PMX, for various reasons. The goals of this article are to (i) provide an introduction to ML classification methods, (ii) provide examples for an ML classification analysis to identify covariates based on specific research questions, (iii) examine a clinically relevant example to investigate possible relationships of ML and PMX, and (iv) present a summary of ML and PMX tasks to develop clinical decision support tools.

Authors

Gilbert Koch, Marc Pfister, Imant Daunhawer, Melanie Wilbaux, Sven Wellmann, Julia E. Vogt

Submitted

Clinical Pharmacology & Therapeutics, 2020

Date

11.01.2020

LinkDOI

Abstract

Despite the application of advanced statistical and pharmacometric approaches to pediatric trial data, a large pediatric evidence gap still remains. Here, we discuss how to collect more data from children by using real-world data from electronic health records, mobile applications, wearables, and social media. The large datasets collected with these approaches enable, and may demand, the use of artificial intelligence and machine learning to allow the data to be analyzed for decision-making. Applications of this approach are presented, which include the prediction of future clinical complications, medical image analysis, identification of new pediatric endpoints and biomarkers, the prediction of treatment non-responders and the prediction of placebo-responders for trial enrichment. Finally, we discuss how to bring machine learning from science to pediatric clinical practice. We conclude that advantage should be taken of the current opportunities offered by innovations in data science and machine learning to close the pediatric evidence gap.

Authors

Sebastiaan C. Goulooze, Laura B. Zwep, Julia E. Vogt, Elke H.J. Krekels, Thomas Hankemeier, John N. van den Anker, Catherijne A.J. Knibbe

Submitted

Clinical Pharmacology & Therapeutics

Date

19.12.2019

LinkDOI

Abstract

Self-organizing maps (SOMs) have been widely used as a means to visualize latent structure in large amounts of heterogeneous data, in particular as a clustering method. Relatively little work, however, has focused on combining SOMs with deep generative networks for modeling health states, which arise for example in the intensive care unit (ICU). We present Temporal PSOM, a novel neural network architecture that jointly trains a Variational Autoencoder for feature extraction and a probabilistic version of SOM to achieve an interpretable discrete representation of patient health states in the ICU. Experiments on the publicly available eICU data set show significant improvements over state-of-the-art methods in terms of cluster enrichment for current APACHE physiology scores as well as prediction of future physiology states.

Authors

Laura Manduchi, Matthias Hueser, Gunnar Raetsch, Vincent Fortuin

Submitted

ML4H Workshop, NeurIPS 2019

Date

15.12.2019

Abstract

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. Existing generative models that try to approximate a multimodal ELBO rely on difficult training schemes to handle the intermodality dependencies, as well as the approximation of the joint representation in case of missing data. In this work, we propose an ELBO for multimodal data which learns the unimodal and joint multimodal posterior approximation functions directly via a dynamic prior. We show that this ELBO is directly derived from a variational inference setting for multiple data types, resulting in a divergence term which is the Jensen-Shannon divergence for multiple distributions. We compare the proposed multimodal JS-divergence (mmJSD) model to state-of-the-art methods and show promising results using our model in unsupervised, generative learning using a multimodal VAE on two different datasets.

Authors

Thomas Sutter, Imant Daunhawer, Julia E. Vogt

Submitted

Visually Grounded Interaction and Language Workshop, NeurIPS 2019

Date

12.12.2019

Abstract

Multimodal generative models learn a joint distribution of data from different modalities, a task which arguably benefits from the disentanglement of modality-specific and modality-invariant information. We propose a factorized latent variable model that learns this disentanglement on multimodal data without additional supervision. We demonstrate the disentanglement capabilities on simulated data, and show that disentangled representations can improve the conditional generation of missing modalities without sacrificing unconditional generation.

Authors

Imant Daunhawer, Thomas Sutter, Julia E. Vogt

Submitted

Bayesian Deep Learning Workshop, NeurIPS 2019

Date

12.12.2019

Abstract

Electronic Health Records (EHRs) are commonly used by the machine learning community for research on problems specifically related to health care and medicine. EHRs have the advantage that they can be easily distributed and contain many features useful for, e.g., classification problems. What makes EHR data sets different from typical machine learning data sets is that they are often very sparse, due to their high dimensionality, and often contain heterogeneous data types. Furthermore, the data sets deal with sensitive information, which limits the distribution of any models learned using them, due to privacy concerns. In this work, we explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs with the goal of using these synthetic records in place of existing data sets. We will further explore applying differential privacy (DP) preserving optimization in order to produce differentially private synthetic EHR data sets, which provide rigorous privacy guarantees, and are therefore more easily shareable. The performance of our model's synthetic, heterogeneous data is very close to the original data set (within 4.5%) for the non-DP model. Although around 20% worse, the DP synthetic data is still usable for machine learning tasks.

Authors

Kieran Chin-Cheong, Thomas Sutter, Julia E. Vogt

Submitted

Machine Learning for Health (ML4H) Workshop, NeurIPS 2019

Date

12.12.2019

Abstract

We consider the problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. This case is especially challenging as the graph X causes Y is Markov equivalent to the graph Y causes X, and hence it is impossible to determine the correct direction using conditional independence tests. To tackle this problem, we follow an information theoretic approach based on the algorithmic Markov condition. This postulate states that in terms of Kolmogorov complexity the factorization given by the true causal model is the most succinct description of the joint distribution. This means that we can infer that X is a likely cause of Y when we need fewer bits to first transmit the data over X, and then the data of Y as a function of X, than for the inverse direction. That is, in this paper we perform causal inference by compression. To put this notion into practice, we employ the Minimum Description Length principle, and propose a score to determine how many bits we need to transmit the data using a class of regression functions that can model both local and global functional relations. To determine whether an inference, i.e. the difference in compressed sizes, is significant, we propose two analytical significance tests based on the no-hypercompression inequality. Finally, we introduce the linear-time algorithms Slope and Sloper and show through thorough empirical evaluation that they outperform the state of the art by a wide margin.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Knowledge and Information Systems

Date

01.09.2019

DOICode

Abstract

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $\ell_1$ and $\ell_2$ metrics from $\tilde{O}(dn^{2-\Theta(\sqrt{\epsilon})})$ to $\tilde{O}(dn + n^{2-\alpha})$, where $\alpha = \Omega({\epsilon^{1/3}}/{\log(1/\epsilon)})$. These algorithms are also used to improve a framework that provides approximate solutions to other high-dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+\epsilon)$-approximate $k$th-nearest neighbor distance, $(4+\epsilon)$-approximate Min-Max clustering, $(4+\epsilon)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+\epsilon)$-approximates greedy permutations in time $\tilde{O}((dn + n^{2-\alpha}) \cdot \log{\Phi})$ where $\Phi$ is the spread of the input. This algorithm is used to $(2+\epsilon)$-approximate $k$-center with the same time complexity.
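For readers unfamiliar with greedy permutations, the following is a minimal sketch of the classical exact farthest-first traversal, which is the baseline that the abstract above approximates. Its first $k$ points form a 2-approximate $k$-center solution. This is not the paper's sub-quadratic algorithm: the sketch performs on the order of $nk$ exact distance evaluations in the Euclidean metric. The function name and the example points are hypothetical.

```python
import math


def greedy_k_center(points, k):
    """Classical farthest-first traversal (exact, O(n * k) distance evaluations).

    Returns the indices of the k chosen centers and the resulting covering
    radius, i.e. the largest distance from any point to its nearest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    centers = [0]  # the starting point is arbitrary; we take the first one
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(points[0], p) for p in points]

    while len(centers) < k:
        # Pick the point that is currently farthest from all chosen centers.
        farthest = max(range(len(points)), key=dist.__getitem__)
        centers.append(farthest)
        for i, p in enumerate(points):
            d = math.dist(points[farthest], p)
            if d < dist[i]:
                dist[i] = d

    return centers, max(dist)


# Two well-separated pairs of points on a line: one center lands in each pair.
example_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
example_centers, example_radius = greedy_k_center(example_points, 2)
```

Continuing the loop until all points are chosen yields the full greedy permutation; every prefix of it is a 2-approximate $k$-center solution for the corresponding $k$.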

Authors

Georgia Avarikioti, Alain Ryser, Yuyi Wang, Roger Wattenhofer

Submitted

Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 3207-3214).

Date

17.07.2019

Link

Abstract

We consider the problem of telling apart cause from effect between two univariate continuous-valued random variables X and Y. In general, it is impossible to make definite statements about causality without making assumptions on the underlying model; one of the most important aspects of causal inference is hence to determine under which assumptions we are able to do so. In this paper we show under which general conditions we can identify cause from effect by simply choosing the direction with the best regression score. We define a general framework of identifiable regression-based scoring functions, and show how to instantiate it in practice using regression splines. Compared to existing methods that either give strong guarantees, but are hardly applicable in practice, or provide no guarantees, but do work well in practice, our instantiation combines the best of both worlds; it gives guarantees, while empirical evaluation on synthetic and real-world data shows that it performs at least as well as the state of the art.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD 2019

Date

01.07.2019

DOICode

Authors

Lisa Ruby, Sergio J. Sanabria, Katharina Martini, Konstantin J. Dedes, Denise Vorburger, Ece Özkan Elsen, Thomas Frauenfelder, Orcun Goksel, Marga B. Rominger

Submitted

Investigative Radiology

Date

30.06.2019

DOI

Abstract

We present a probabilistic model for clustering which enables the modeling of overlapping clusters where objects are only available as pairwise distances. Examples of such distance data are genomic string alignments, or protein contact maps. In our clustering model, an object has the freedom to belong to one or more clusters at the same time. By using an Indian Buffet Process (IBP) prior, there is no need to explicitly fix the number of clusters, or the number of overlapping clusters, in advance. In this paper, we demonstrate the utility of our model using distance data obtained from HIV-1 protease inhibitor contact maps.

Authors

Sandhya Prabhakaran, Julia E. Vogt

Submitted

Artificial Intelligence in Medicine (AIME), Springer Lecture Notes in Artificial Intelligence, 2019

Date

29.05.2019

LinkDOI

Abstract

The recent adoption of Electronic Health Records (EHRs) by health care providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured texts designed for human readability. In this work, we develop the methodology to automatically extract clinical features from clinical narratives from large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and utilize the clusters to represent information about the patient compactly. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of about 65 thousand documents with a total of about 3.2 million sentences. We identify 341 significant statistical associations between the presence of somatic mutations and clinical features. We annotated these associations according to their novelty, and report several known associations. We also propose 32 testable hypotheses where the underlying biological mechanism does not appear to be known but is plausible. These results illustrate that the automated discovery of clinical features is possible and the joint analysis of clinical and genetic datasets can generate appealing new hypotheses.

Authors

Stefan G. Stark, Stephanie L. Hyland, Melanie F. Pradier, Kjong Lehmann, Andreas Wicki, Fernando Perez Cruz, Julia E. Vogt, Gunnar Rätsch

Submitted

Arxiv preprint

Date

02.05.2019

Link

Abstract

Motivation: Personalized medicine aims at combining genetic, clinical, and environmental data to improve medical diagnosis and disease treatment, tailored to each patient. This paper presents a Bayesian nonparametric (BNP) approach to identify genetic associations with clinical/environmental features in cancer. We propose an unsupervised approach to generate data-driven hypotheses and bring potentially novel insights about cancer biology. Our model combines somatic mutation information at gene-level with features extracted from the Electronic Health Record. We propose a hierarchical approach, the hierarchical Poisson factor analysis (H-PFA) model, to share information across patients having different types of cancer. To discover statistically significant associations, we combine Bayesian modeling with bootstrapping techniques and correct for multiple hypothesis testing. Results: Using our approach, we empirically demonstrate that we can recover well-known associations in cancer literature. We compare the results of H-PFA with two other classical methods in the field: case-control (CC) setups, and linear mixed models (LMMs).

Authors

Melanie F. Pradier, Stephanie L. Hyland, Stefan G. Stark, Kjong Lehmann, Julia E. Vogt, Fernando Perez-Cruz, Gunnar Rätsch

Submitted

Biorxiv preprint

Date

29.04.2019

LinkDOI

Abstract

Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Among other results, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$-consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
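The abstract above relates SCI to conditional mutual information on discrete data. As a minimal sketch of that target quantity, the snippet below computes the plain empirical (plug-in) estimate of $I(X;Y \mid Z)$ from samples. It is not the SCI test itself: it contains neither stochastic complexity nor a decision threshold. The function name and the toy samples are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete samples.

    I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ),
    with all probabilities replaced by empirical frequencies.
    """
    n = len(xs)
    count_xyz = Counter(zip(xs, ys, zs))
    count_xz = Counter(zip(xs, zs))
    count_yz = Counter(zip(ys, zs))
    count_z = Counter(zs)

    cmi = 0.0
    for (x, y, z), n_xyz in count_xyz.items():
        # The 1/n normalisers cancel inside the logarithm, leaving raw counts.
        ratio = n_xyz * count_z[z] / (count_xz[(x, z)] * count_yz[(y, z)])
        cmi += (n_xyz / n) * math.log2(ratio)
    return cmi


# Y copies X while Z is constant: conditioning on Z explains nothing, so CMI = 1 bit.
dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0])
# X, Y and Z are all identical: given Z, nothing is left to learn, so CMI = 0.
explained = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1])
```

On small samples this estimate is rarely exactly zero even under true conditional independence, which is why a practical test needs a threshold of the kind the abstract describes.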

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the International Conference on Artificial Intelligence and Statistics, AISTATS 2019

Date

01.04.2019

LinkCode

Abstract

Background: Machine learning models may enhance the early detection of clinically relevant hyperbilirubinemia based on patient information available in every hospital. Methods: We conducted a longitudinal study on preterm and term born neonates with serial measurements of total serum bilirubin in the first two weeks of life. An ensemble that combines a logistic regression with a random forest classifier was trained to discriminate between the two classes, phototherapy treatment vs. no treatment. Results: Of 362 neonates included in this study, 98 had a phototherapy treatment, which our model was able to predict up to 48 h in advance with an area under the ROC curve of 95.20%. From a set of 44 variables, including potential laboratory and clinical confounders, a subset of just four (bilirubin, weight, gestational age, hours since birth) suffices for a strong predictive performance. The resulting early phototherapy prediction tool (EPPT) is provided as an open web application. Conclusion: Early detection of clinically relevant hyperbilirubinemia can be enhanced by the application of machine learning. Existing guidelines can be further improved to optimize timing of bilirubin measurements to avoid toxic hyperbilirubinemia in high-risk patients while minimizing unneeded measurements in neonates who are at low risk.

Authors

Imant Daunhawer, Severin Kasser, Gilbert Koch, Lea Sieber, Hatice Cakal, Janina Tütsch, Marc Pfister, Sven Wellman, Julia E. Vogt

Submitted

Pediatric Research, 2019

Date

30.03.2019

LinkDOI

Authors

Alvaro Gomariz, Weiye Li, Ece Özkan Elsen, Christine Tanner, Orcun Goksel

Submitted

International Symposium on Biomedical Imaging (ISBI)

Date

06.02.2019

DOI

Abstract

The classification of time series data is a well-studied problem with numerous practical applications, such as medical diagnosis and speech recognition. A popular and effective approach is to classify new time series in the same way as their nearest neighbours, whereby proximity is defined using Dynamic Time Warping (DTW) distance, a measure analogous to sequence alignment in bioinformatics. However, practitioners are not only interested in accurate classification, they are also interested in why a time series is classified a certain way. To this end, we introduce here the problem of finding a minimum length subsequence of a time series, the removal of which changes the outcome of the classification under the nearest neighbour algorithm with DTW distance. Informally, such a subsequence is expected to be relevant for the classification and can be helpful for practitioners in interpreting the outcome. We describe a simple but optimized implementation for detecting these subsequences and define an accompanying measure to quantify the relevance of every time point in the time series for the classification. In tests on electrocardiogram data we show that the algorithm allows discovery of important subsequences and can be helpful in detecting abnormalities in cardiac rhythms distinguishing sick from healthy patients.
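
The idea above can be sketched in a few lines, under stated assumptions: absolute difference as the local cost, unconstrained warping, "removal" interpreted as deleting a contiguous run of points, and a brute-force search rather than the optimized implementation the abstract mentions. All function names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def classify_1nn(query, train):
    """Label of the training series nearest to `query` under DTW.

    `train` is a list of (series, label) pairs.
    """
    return min(train, key=lambda item: dtw_distance(query, item[0]))[1]


def shortest_flipping_subsequence(query, train):
    """Brute-force search for the shortest contiguous subsequence whose
    deletion changes the 1-NN label. Returns (start, length) or None."""
    original = classify_1nn(query, train)
    for length in range(1, len(query)):
        for start in range(len(query) - length + 1):
            reduced = query[:start] + query[start + length:]
            if classify_1nn(reduced, train) != original:
                return start, length
    return None
```

For example, with training series [0, 0, 0, 0] labelled "flat" and [0, 0, 0, 5] labelled "spike", the query [0, 0, 0, 5] is classified as "spike", and deleting its last point is the shortest change that flips the label.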

Authors

Ricards Marcinkevics, Steven Kelk, Carlo Galuzzi, Berthold Stegemann

Submitted

Arxiv

Date

26.01.2019

Link

Authors

Stefanie Ehrbar, Alexander Jöhl, Michael Kühni, Mirko Meboldt, Ece Özkan Elsen, Christine Tanner, Orcun Goksel, Stephan Klöck, Jan Unkelbach, Matthias Guckenberger, Stephanie Tanadini-Lang

Submitted

Medical Physics

Date

03.01.2019

DOI

Authors

Sandhya Prabhakaran and Julia E. Vogt

Submitted

All of Bayesian Nonparametrics Workshop in Neural Information Processing Systems Conference 2018

Date

02.12.2018

Abstract

To exploit the full potential of big routine data in healthcare and to efficiently communicate and collaborate with information technology specialists and data analysts, healthcare epidemiologists should have some knowledge of large-scale analysis techniques, particularly about machine learning. This review focuses on the broad area of machine learning and its first applications in the emerging field of digital healthcare epidemiology.

Authors

Jan A. Roth, Manuel Battegay, Fabrice Juchler, Julia E. Vogt, Andreas F. Widmer

Submitted

Infection Control & Hospital Epidemiology, 2018

Date

04.11.2018

LinkDOI

Abstract

How can we discover whether X causes Y, or vice versa, that Y causes X, when we are only given a sample over their joint distribution? How can we do this such that X and Y can be univariate, multivariate, or of different cardinalities? And, how can we do so regardless of whether X and Y are of the same, or of different data type, be it discrete, numeric, or mixed? These are exactly the questions we answer. We take an information theoretic approach, based on the Minimum Description Length principle, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. Simply put, if Y can be explained more succinctly by a set of classification or regression trees conditioned on X, than in the opposite direction, we conclude that X causes Y. Empirical evaluation on a wide range of data shows that our method, Crack, infers the correct causal direction reliably and with high accuracy on a wide range of settings, outperforming the state of the art by a wide margin. Code related to this paper is available at: http://eda.mmci.uni-saarland.de/crack.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data, ECMLPKDD 2018

Date

13.08.2018

DOICode

Authors

Ece Özkan Elsen, Valery Vishnevsky, Orcun Goksel

Submitted

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control

Date

03.03.2018

DOI

Abstract

Wearable health sensors are about to change our health system. While several technological improvements have been presented to enhance performance and energy-efficiency, battery runtime is still a critical concern for practical use of wearable biomedical sensor systems. The runtime limitation is directly related to the battery size, which is another concern regarding practicality and customer acceptance. We introduced ULPSEK (Ultra-Low-Power Sensor Evaluation Kit) for evaluation of biomedical sensors and monitoring applications (http://ulpsek.com). ULPSEK includes a multiparameter sensor measuring and processing electrocardiogram, respiration, motion, body temperature, and photoplethysmography. Instead of a battery, ULPSEK is powered using an efficient body heat harvester. The harvester produced 171 µW on average, which was sufficient to power the sensor below 25 °C ambient temperature. We present design issues regarding the power supply and the power distribution network of the ULPSEK sensor platform. Due to the security aspect of self-powered health sensors, we suggest a hybrid solution consisting of a battery charged by a harvester.

Authors

A. Tobola, H. Leutheuser, M. Pollak, P. Spies, C. Hofmann, C. Weigand, B.M. Eskofier, G. Fischer

Submitted

IEEE J Biomed Health Inform.

Date

01.01.2018

DOI

Abstract

The second most common cause of diving fatalities is cardiovascular diseases. Monitoring the cardiovascular system in actual underwater conditions is necessary to gain insights into cardiac activity during immersion and to trigger preventive measures. We developed a wearable, current-based electrocardiogram (ECG) device in the eco-system of the FitnessSHIRT platform. It can be used for normal/dry ECG measuring purposes but is specifically designed to allow underwater signal acquisition without having to use insulated electrodes. Our design is based on a transimpedance amplifier circuit including active current feedback. We integrated additional cascaded filter components to counter noise characteristics specific to the immersed condition of such a system. The results of the evaluation show that our design is able to deliver high-quality ECG signals underwater with no interferences or loss of signal quality. To further evaluate the applicability of the system, we performed an applied study with it using 12 healthy subjects to examine whether differences in the heart rate variability exist between sitting and supine positions of the human body immersed in water and outside of it. We saw significant differences, for example, in the RMSSD and SDSD between sitting outside the water (36 ms) and sitting immersed in water (76 ms) and the pNN50 outside the water (6.4%) and immersed in water (18.2%). The power spectral density for the sitting positions in the TP and HF increased significantly during water immersion while the LF/HF decreased significantly. No significant changes were found for the supine position.
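
The time-domain heart rate variability measures named above (RMSSD, SDSD, pNN50) can be computed from a series of RR intervals as in the following minimal sketch. It assumes RR intervals in milliseconds, the sample standard deviation for SDSD, and a strict "greater than 50 ms" rule for pNN50; conventions vary between tools, and the function name is hypothetical.

```python
from math import sqrt
from statistics import stdev


def hrv_time_domain(rr_ms):
    """Return (RMSSD, SDSD, pNN50) for RR intervals given in milliseconds."""
    # Differences between successive RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three RR intervals")

    # Root mean square of successive differences.
    rmssd = sqrt(sum(d * d for d in diffs) / len(diffs))
    # Standard deviation of successive differences (sample version, an assumption).
    sdsd = stdev(diffs)
    # Percentage of successive differences larger than 50 ms in magnitude.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return rmssd, sdsd, pnn50
```

Higher RMSSD and pNN50 values, as reported for the immersed sitting position, indicate larger beat-to-beat variation.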

Authors

S. Gradl, T. Cibis, J. Lauber, R. Richer, R. Rybalko, N. Pfeiffer, H. Leutheuser, M. Wirth, V. Tscharner, B. M. Eskofier

Submitted

Appl Sci.

Date

08.12.2017

DOI

Abstract

Objective: Respiratory inductance plethysmography (RIP) provides an unobtrusive method for measuring breathing characteristics. Accurately adjusted RIP provides reliable measurements of ventilation during rest and exercise if data are acquired via two elastic measuring bands surrounding the rib cage (RC) and abdomen (AB). Disadvantageously, the most accurate reported adjusted model for RIP in literature, least squares regression, requires simultaneous RIP and flowmeter (FM) data acquisition. An adjustment method without simultaneous measurement (reference-free) of RIP and FM would foster usability enormously. Methods: In this paper, we develop generalizable, functional, and reference-free algorithms for RIP adjustment incorporating anthropometric data. Further, performance of only one degree of freedom (RC or AB) instead of two (RC and AB) is investigated. We evaluate the algorithms with data from 193 healthy subjects who performed an incremental running test using three different datasets: training, reliability, and validation dataset. The regression equation is improved with machine learning techniques such as sequential forward feature selection and 10-fold cross validation. Results: Using the validation dataset, the best reference-free adjustment model is the combination of both bands with 84.69% breaths within 20% limits of equivalence compared to 43.63% breaths using the best comparable algorithm from literature. Using only one band, we obtain better results using the RC band alone. Conclusion: Reference-free adjustment for RIP reveals tidal volume differences of up to 0.25 l when comparing to the best possible adjustment currently present which needs the simultaneous measurement of RIP and FM. Significance: This demonstrates that RIP has the potential for usage in wide applications in ambulatory settings.

Authors

H. Leutheuser, C. Heyde, K. Roecker, A. Gollhofer, B. M Eskofier

Submitted

IEEE Trans Biomed Eng.

Date

01.12.2017

DOI

Abstract

We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer that X causes Y if it is shorter to describe Y as a function of X than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that, as we show through thorough empirical evaluation on both synthetic and real-world data, outperforms the state of the art by a wide margin.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the IEEE International Conference on Data Mining, ICDM 2017

Date

01.11.2017

DOICode

Abstract

Aims: The identification of arrhythmogenic right ventricular dysplasia (ARVD) from 12-channel standard electrocardiogram (ECG) is challenging. High density ECG data may identify lead locations and criteria with a higher sensitivity. Methods and results: Eighty-channel ECG recording from patients diagnosed with ARVD and controls were quantified by magnitude and integral measures of QRS and T waves and by a measure (the average silhouette width) of differences in the shapes of the normalized ECG cycles. The channels with the best separability between ARVD patients and controls were near the right ventricular wall, at the third intercostal space. These channels showed pronounced differences in P waves compared to controls as well as the expected differences in QRS and T waves. Conclusion: Multichannel recordings, as in body surface mapping, add little to the reliability of diagnosing ARVD from ECGs. However, repositioning ECG electrodes to a high anterior position can improve the identification of ECG variations in ARVD. Additionally, increased P wave amplitude appears to be associated with ARVD.

Authors

Ricards Marcinkevics, James O’Neill, Hannah Law, Eleftheria Pervolaraki, Andrew Hogarth, Craig Russell, Berthold Stegemann, Arun V Holden, Muzahir H Tayebjee

Submitted

EP Europace

Date

29.08.2017

LinkDOI

Abstract

Sleep plays a fundamental role in the life of every human. The prevalence of sleep disorders has increased significantly, now affecting up to 50% of the general population. Sleep is usually analyzed by extracting a hypnogram containing sleep stages. The gold standard method polysomnography (PSG) requires subjects to stay overnight in a sleep laboratory and to wear a series of obtrusive devices. This work presents an easy to use method to perform somnography at home using unobtrusive motion sensors. Ten healthy male subjects were recorded during two consecutive nights. Sensors from the Shimmer platform were placed in the bed to record accelerometer data, while reference hypnograms were collected using a SOMNOwatch system. A series of filters were used to extract a motion feature in 30 second epochs from the accelerometer signals. The feature was used together with the ground truth information to train a Naive Bayes classifier that distinguished wakefulness, REM and non-REM sleep. Additionally, the algorithm was implemented on an Android mobile phone. Averaged over all subjects, the classifier had a mean accuracy of 79.0% (SD 9.2%) for the three classes. The mobile phone implementation was able to run in real time during all experiments. In future this will lead to a method for simple and unobtrusive somnography using mobile phones.

Authors

S. Gradl, H. Leutheuser, P. Kugler, T. Biermann, S. Kreil, J. Kornhuber, M. Bergner, B. M. Eskofier

Submitted

In Proc: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

03.07.2017

DOI

Authors

Ece Özkan Elsen, Christine Tanner, Matej Kastelic, Oliver Mattausch, Maxim Makhinya, Orcun Goksel

Submitted

International Journal of Computer Assisted Radiology and Surgery

Date

22.03.2017

DOI

Abstract

Innovative and pervasive monitoring possibilities are given using textile integration of wearable computing components. We present the FitnessSHIRT (Fraunhofer IIS, Erlangen, Germany) as one example of a textile integrated wearable computing device. Using the FitnessSHIRT, the electric activity of the human heart and breathing characteristics can be determined. Within this chapter, we give an overview of the market situation, current application scenarios, and related work. We describe the technology and algorithms behind the wearable FitnessSHIRT as well as current application areas in sports and medicine. Challenges using textile integrated wearable devices are stated and addressed in experiments or in explicit recommendations. The applicability of the FitnessSHIRT is shown in user studies in sports and medicine. This chapter is concluded with perspectives for textile integrated wearable devices.

Authors

H. Leutheuser, N. Lang, S. Gradl, M. Struck, A. Tobola, C. Hofmann, L. Anneken, B. M. Eskofier

Submitted

Smart Textiles: Fundamentals, Design, and Interaction

Date

01.02.2017

DOI

Abstract

Battery runtime is a critical concern for practical usage of wearable biomedical sensor systems. A long runtime requires an interdisciplinary low-power knowledge and appropriate design tools. We addressed this issue designing a toolbox in three parts: (1) Modular evaluation kit for development of wearable ultra-low-power biomedical sensors; (2) Miniaturized, wearable, and code compatible sensor system with the same properties as the development kit; (3) Web-based battery runtime calculator for our sensor systems. The purpose of the development kit is optimization of the power consumption. Once optimization is finished, the same embedded software can be transferred to the miniaturized body worn sensor. The web-based application supports development quantifying the effects of use case and design decisions on battery runtime. A sensor developer can select sensor modules, configure sensor parameters, enter use case specific requirements, and select a battery to predict the battery runtime for a specific application. Our concept adds value to development of ultra-low-power biomedical wearable sensors. The concept is effective for professional work and educational purposes.

Authors

A. Tobola, H. Leutheuser, B. Schmitz, C. Hofmann, M. Struck, C. Weigand, B. M. Eskofier, G. Fischer

Submitted

In Proc: IEEE-EMBS 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

Date

14.06.2016

DOI

Abstract

Arrhythmia detection algorithms require the exact and instantaneous detection of fiducial points in the ECG signal. These fiducial points (QRS-complex, P- and T-wave) correspond to distinct cardiac contraction phases. The performance evaluation of different fiducial points detection algorithms requires the existence of large databases (DBs) encompassing reference annotations. Up to last year, P- and T-wave annotations were only available for the QT DB. This was addressed by Elgendi et al. who provided P- and T-wave annotations to the MIT-BIH arrhythmia DB. A variety of ECG fiducial points detection algorithms exists in literature, whereas, to the best knowledge of the authors, we could not identify any single-lead algorithm ready for instantaneous P- and T-wave detection. In this work, we present three P- and T-wave detection algorithms: a revised version for QRS detection using line fitting capable of detecting P- and T-waves, an expeditious version of a wavelet based ECG delineation algorithm, and a fast naive fiducial points detection algorithm. The fast naive fiducial points detection algorithm performed best on both DBs with sensitivities ranging from 73.0% (P-wave detection, error interval of ± 40 ms) to 89.4% (T-wave detection, error interval of ± 80 ms). As this algorithm detects a wave event in every search window, it has to be investigated how this affects arrhythmia detection algorithms. The reference Matlab implementations are available for download to encourage the development of highly accurate and automated ECG processing algorithms for the integration in daily life using mobile computers.

Authors

H. Leutheuser, S. Gradl, L. Anneken, M. Arnold, N. Lang, S. Achenbach, B. M. Eskofier

Submitted

In Proc: IEEE-EMBS 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

Date

14.06.2016

DOI

Authors

Ece Özkan Elsen, Gemma Roig, Orcun Goksel, Xavier Boix

Submitted

arXiv

Date

27.05.2016

Authors

Firat Ozdemir, Ece Özkan Elsen, Orcun Goksel

Submitted

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)

Date

27.05.2016

DOI

Abstract

Respiratory motion analysis based on range imaging (RI) has emerged as a popular means of generating respiration surrogates to guide motion management strategies in computer-assisted interventions. However, existing approaches employ heuristics, require substantial manual interaction, or yield highly redundant information. In this paper, we propose a framework that uses preprocedurally obtained 4-D shape priors from patient-specific breathing patterns to drive intraprocedural RI-based real-time respiratory motion analysis. As the first contribution, we present a shape motion model enabling an unsupervised decomposition of respiration induced high-dimensional body surface displacement fields into a low-dimensional representation encoding thoracic and abdominal breathing. Second, we propose a method designed for GPU architectures to quickly and robustly align our models to high-coverage multiview RI body surface data. With our fully automatic method, we obtain respiration surrogates yielding a Pearson correlation coefficient (PCC) of 0.98 with conventional surrogates based on manually selected regions on RI body surface data. Compared to impedance pneumography as a respiration signal that measures the change of lung volume, we obtain a PCC of 0.96. Using off-the-shelf hardware, our framework enables high temporal resolution respiration analysis at 50 Hz.

Authors

J. Wasza, P. Fischer, H. Leutheuser, T. Oefner, C. Bert, A. Maier, J. Hornegger

Submitted

IEEE Trans Biomed Eng.

Date

01.03.2016

DOI

Abstract

Molecular classification of hepatocellular carcinomas (HCC) could guide patient stratification for personalized therapies targeting subclass-specific cancer 'driver pathways'. Currently, there are several transcriptome-based molecular classifications of HCC with different subclass numbers, ranging from two to six. They were established using resected tumours that introduce a selection bias towards patients without liver cirrhosis and with early stage HCCs. We generated and analyzed gene expression data from paired HCC and non-cancerous liver tissue biopsies from 60 patients as well as five normal liver samples. Unbiased consensus clustering of HCC biopsy profiles identified 3 robust classes. Class membership correlated with survival, tumour size and with Edmondson and Barcelona Clinical Liver Cancer (BCLC) stage. When focusing only on the gene expression of the HCC biopsies, we could validate previously reported classifications of HCC based on expression patterns of signature genes. However, the subclass-specific gene expression patterns were no longer preserved when the fold-change relative to the normal tissue was used. The majority of genes believed to be subclass-specific turned out to be cancer-related genes differentially regulated in all HCC patients, with quantitative rather than qualitative differences between the molecular subclasses. With the exception of a subset of samples with a definitive β-catenin gene signature, biological pathway analysis could not identify class-specific pathways reflecting the activation of distinct oncogenic programs. In conclusion, we have found that gene expression profiling of HCC biopsies has limited potential to direct therapies that target specific driver pathways, but can identify subgroups of patients with different prognosis.

Authors

Zuzanna Makowska, Tujana Boldanova, David Adametz, Luca Quagliata, Julia E. Vogt, Michael T. Dill, Mathias S. Matter, Volker Roth, Luigi Terracciano, Markus H. Heim

Submitted

Journal of Pathology: Clinical Research, 2016

Date

05.01.2016

LinkDOI

Abstract

In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science to analyze if two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, the solution of the WMW test takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programming to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we presented different optimization strategies and developed a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches and moreover evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/.
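
To illustrate how dynamic programming avoids enumerating all group assignments, the sketch below computes an exact two-sided rank-sum P value for the simpler case without ties. It is not the EDISON-WMW algorithm, which also handles ties and is parallelized; the function names and the two-sided rule (deviation from the mean rank sum) are assumptions for illustration.

```python
def rank_sum_counts(n1, n2):
    """counts[s] = number of ways to pick n1 of the ranks 1..n1+n2 with sum s."""
    total = n1 + n2
    max_sum = sum(range(n2 + 1, total + 1))  # sum of the n1 largest ranks
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for r in range(1, total + 1):
        # Go through subset sizes in descending order so rank r is used at most once.
        for k in range(min(r, n1), 0, -1):
            for s in range(max_sum, r - 1, -1):
                ways[k][s] += ways[k - 1][s - r]
    return ways[n1]


def exact_wmw_two_sided(x, y):
    """Exact two-sided P value of the rank-sum statistic, assuming no ties."""
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle ties")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n1, total = len(x), len(pooled)
    w = sum(rank[value] for value in x)

    counts = rank_sum_counts(n1, len(y))
    # Compare doubled deviations from the mean n1*(total+1)/2 to stay in integers.
    centre = n1 * (total + 1)
    observed = abs(2 * w - centre)
    extreme = sum(c for s, c in enumerate(counts) if abs(2 * s - centre) >= observed)
    return extreme / sum(counts)
```

For x = [1, 2, 3] and y = [4, 5, 6] only 2 of the 20 possible rank assignments are at least as extreme as the observed one, giving P = 0.1.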

Authors

Alexander Marx, Christina Backes, Eckart Meese, Hans-Peter Lenhof, Andreas Keller

Submitted

Genomics, Proteomics & Bioinformatics

Date

01.01.2016

DOICode

Abstract

This paper proposes a new framework to find associations between somatic mutations and clinical features in cancer. The clinical features are directly extracted from the Electronic Health Records by performing a large-scale clustering of the sentences. Using a linear mixed model, we find significant associations between EHR-based phenotypes and gene mutations, while correcting for the cancer type as a confounding effect. To the authors' knowledge, this is the first attempt to perform genetic association studies using EHR-based phenotypes. Such research has the potential to help in the discovery of unknown mechanisms in cancer, which will make it possible to prevent the disease, monitor patients at risk, and design tailored treatments for the patients.

Authors

Melanie F. Pradier, Stefan Stark, Stephanie Hyland, Julia E. Vogt, Gunnar Rätsch, and Fernando Perez-Cruz

Submitted

Paper + Spotlight Talk at Machine Learning for Computational Biology Workshop in Neural Information Processing Systems Conference 2015

Date

07.12.2015

Link

Authors

Melanie F. Pradier, Theofanis Karaletsos, Stefan Stark, Julia E. Vogt, Gunnar Rätsch, and Fernando Perez-Cruz

Submitted

Accepted Abstract at Machine Learning for Healthcare Workshop in Neural Information Processing Systems Conference 2015

Date

06.12.2015

Link

Abstract

Photoplethysmography (PPG) is a non-invasive, inexpensive and unobtrusive method to achieve heart rate monitoring during physical exercises. Motion artifacts during exercise challenge the heart rate estimation from wrist-type PPG signals. This paper presents a methodology to overcome this limitation by incorporating acceleration information. The proposed algorithm consisted of four stages: (1) a wavelet based denoising, (2) an acceleration based denoising, (3) a frequency based approach to estimate the heart rate, followed by (4) a postprocessing step. Experiments with different movement types such as running and rehabilitation exercises were used for algorithm design and development. Evaluation of our heart rate estimation showed that a mean absolute error of 1.96 bpm (beats per minute) with a standard deviation of 2.86 bpm and a correlation of 0.98 was achieved with our method. These findings suggest that the proposed methodology is robust to motion artifacts and is therefore applicable for heart rate monitoring during sports and rehabilitation.

Authors

P. J. Mullan, C. M. Kanzler, B. Lorch, L. Schröder, L. Winkler, L. H. Laich, F. Riedel, R. Richer, C. Luckner, H. Leutheuser, B. M. Eskofier, C. F. Pasluosta

Submitted

In Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

25.08.2015

DOI

Abstract

In the last decade the interest for heart rate variability analysis has increased tremendously. Related algorithms depend on accurate temporal localization of the heartbeat, e.g. the R-peak in electrocardiogram signals, especially in the presence of arrhythmia. This localization can be delivered by numerous solutions found in the literature which all lack an exact specification of their temporal precision. We implemented three different state-of-the-art algorithms and evaluated the precision of their R-peak localization. We suggest a method to estimate the overall R-peak temporal inaccuracy (dubbed beat slackness) of QRS detectors with respect to normal and abnormal beats. We also propose a simple algorithm that can complement existing detectors to reduce this slackness. Furthermore, we define improvements to one of the three detectors allowing it to be used in real-time on mobile devices or embedded hardware. Across the entire MIT-BIH Arrhythmia Database, the average slackness of all the tested algorithms was 9 ms for normal beats and 13 ms for abnormal beats. Using our complementing algorithm this could be reduced to 4 ms for normal beats and to 7 ms for abnormal beats. The presented methods can be used to significantly improve the precision of R-peak detection and provide an additional measurement for QRS detector performance.

Authors

S. Gradl, H. Leutheuser, M. Elgendi, N. Lang, B. M. Eskofier

Submitted

In Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

25.08.2015

DOI

Abstract

Epilepsy is a disease of the central nervous system. Nearly 70% of people with epilepsy respond to a proper treatment, but for a successful therapy of epilepsy, physicians need to know if and when seizures occur. The gold standard diagnosis tool video-electroencephalography (vEEG) requires patients to stay at hospital for several days. A wearable sensor system, e.g. a wristband, serving as diagnostic tool or event monitor, would allow unobtrusive ambulatory long-term monitoring while reducing costs. Previous studies showed that seizures with motor symptoms such as generalized tonic-clonic seizures can be detected by measuring the electrodermal activity (EDA) and motion measuring acceleration (ACC). In this study, EDA and ACC from 8 patients were analyzed. In extension to previous studies, different types of seizures, including seizures without motor activity, were taken into account. A hierarchical classification approach was implemented in order to detect different types of epileptic seizures using data from wearable sensors. Using a k-nearest neighbor (kNN) classifier, an overall sensitivity of 89.1% and an overall specificity of 93.1% were achieved. For seizures without motor activity, the sensitivity was 97.1% and the specificity was 92.9%. The presented method is a first step towards a reliable ambulatory monitoring system for epileptic seizures with and without motor activity.

Authors

B. E. Heldberg, T. Kautz, H. Leutheuser, R. Hopfengärtner, B. Kasper, B. M. Eskofier

Submitted

In Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

25.08.2015

DOI

Abstract

Medical diagnosis is the first level for recognition and treatment of diseases. To realize fast diagnosis, we propose a concept of a basic framework for the underwater monitoring of a diver’s ECG signal, including an alert system that warns the diver of predefined medical emergency situations. The framework contains QRS detection, heart rate calculation and an alert system. After performing a predefined study protocol, the algorithm’s accuracy was evaluated with 10 subjects in a dry environment and with 5 subjects in an underwater environment. The results showed that, in 3 out of 5 dives as well as in dry environment, data transmission remained stable. In these cases, the subjects were able to trigger the alert system. The evaluated data showed a clear ECG signal with a QRS detection accuracy of 90%. Thus, the proposed framework has the potential to detect and to warn of health risks. Further developments of this sample concept can imply an extension for monitoring different biomedical parameters.

Authors

T. Cibis, B. Groh, H. Leutheuser, B. M. Eskofier

Submitted

In Proc: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

25.08.2015

DOI

Abstract

Purpose: Exercise and physical activity are a driving force for mental health. Major challenges in the treatment of psychological diseases are accurate activity profiles and the adherence to exercise intervention programs. We present the development and validation of CHRONACT, a wearable real-time activity tracker based on inertial sensor data to support mental health. Methods: CHRONACT comprised a Human Activity Recognition (HAR) algorithm that determined activity levels based on their Metabolic Equivalent of Task (MET) with sensors on ankle and wrist. Special emphasis was put on wearability, real-time data analysis and runtime to be able to use the system as augmented feedback device. For the development, data of 47 healthy subjects performing clinical intervention program activities were collected to train different classification models. The most suitable model according to the accuracy and processing power tradeoff was selected for an embedded implementation on CHRONACT. Results: A validation trial (six subjects, 6 h of data) showed the accuracy of the system with a classification rate of 85.6%. The main source of error was identified in acyclic activities that contained activity bouts of neighboring classes. The runtime of the system was more than 7 days and continuous result logging was available for 39 h. Conclusions: In future applications, the CHRONACT system can be used to create accurate and unobtrusive patient activity profiles. Furthermore, the system is ready to assess the effects of individual augmented feedback for exercise adherence.

Authors

U. Jensen, H. Leutheuser, S. Hofmann, B. Schuepferling, G. Suttner, K. Seiler, J. Kornhuber, B. M. Eskofier

Submitted

Biomed Eng Lett.

Date

18.07.2015

DOI

Abstract

We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identification on the identities of the objects is needed. Further, the model does not require the number of clusters being specified in advance—they are instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.

Authors

Julia E. Vogt, Marius Kloft, Stefan Stark, Sandhya Prabhakaran, Sudhir Raman, Volker Roth and Gunnar Rätsch

Submitted

Machine Learning Journal, 2015

Date

16.07.2015

LinkDOI

Abstract

Long battery runtime is one of the most wanted properties of wearable sensor systems. The sampling rate has a high impact on the power consumption. However, defining a sufficient sampling rate, especially for cutting edge mobile sensors, is difficult. Often, a high sampling rate, up to four times higher than necessary, is chosen as a precaution. Especially for biomedical sensor applications, many contradictory recommendations exist on how to select the appropriate sample rate. They all are motivated from one point of view – the signal quality. In this paper we motivate to keep the sampling rate as low as possible. Therefore we reviewed common algorithms for biomedical signal processing. For each algorithm the number of operations depending on the data rate has been estimated. The Bachmann-Landau notation has been used to evaluate the computational complexity in dependency of the sampling rate. We found linear, logarithmic, quadratic and cubic dependencies.

Authors

Tobola, A. and Streit, F. and Espig, C. and Korpok, O. and Leutheuser, H. and Sauter, C. and Lang, N. and Schmitz, B. and Hofmann, C. and Struck, M. and Weigand, C. and Eskofier, B. M. and Fischer, G.

Submitted

In Proc: IEEE-EMBS 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

Date

09.06.2015

Abstract

Far too many people are dying from stroke or other heart related diseases each year. Early detection of abnormal heart rhythm could trigger the timely presentation to the emergency department or outpatient unit. Smartphones are an integral part of everyone's life and they form the ideal basis for mobile monitoring and real-time analysis of signals related to the human heart. In this work, we investigated the performance of arrhythmia classification systems using only features calculated from the time instances of individual heart beats. We built a sinusoidal model using N (N = 10, 15, 20) consecutive RR intervals to predict the (N+1)th RR interval. The integration of the innovative sinusoidal regression feature, together with the amplitude and phase of the proposed sinusoidal model, led to an increase in the mean class-dependent classification accuracies. Best mean class-dependent classification accuracies of 90% were achieved using a Naive Bayes classifier. Well-performing real-time arrhythmia classification algorithms using only the time instances of individual heart beats could have a tremendous impact in reducing healthcare costs and reducing the high number of deaths related to cardiovascular diseases.

Authors

Leutheuser, H. and Tobola, A. and Anneken, L. and Arnold, M. and Lang, N. and Achenbach, S. and Eskofier, B. M.

Submitted

In Proc: IEEE-EMBS 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

Date

09.06.2015

DOI

Abstract

Athletes and their coaches aim for enhancing the sports performance. Collecting data from athletes, transforming them into useful information related to their sports performance (e.g., their type of gait), and transmitting the information to the coaches supports the enhancement. The types of gait standing, walking, and running were often examined. Lack of research remains for the two types of running, jogging and sprinting. In this work, standing, walking, jogging, and sprinting were classified with a single inertial-magnetic measurement unit that was placed at a novel position at the trunk. A comparison was made between classification systems using different combinations of accelerometer, gyroscope, and magnetometer data as well as different classifiers (Naïve Bayes, k-Nearest Neighbors, Support Vector Machine, Adaptive Boosting). After collecting data from 15 male subjects, the data were preprocessed, features were extracted and selected, and the data were classified. All classification systems were successful. With a mean true positive rate of 95.68% ±1.80%, the classification system using accelerometer and gyroscope data as well as the Naïve Bayes classifier performed best. The classification system can be used for applications in sport and sports performance analysis in particular.

Authors

K. Full, H. Leutheuser, J. Schlessman, R. Armitage, B. M. Eskofier

Submitted

In Proc: IEEE-EMBS 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

Date

09.06.2015

DOI

Authors

Ece Özkan Elsen, Orcun Goksel

Submitted

International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

27.05.2015

DOI

Abstract

Everything in nature tries to reach the lowest possible energy level. Therefore any natural or artificial system must have the ability to adjust itself to the changing requirements of its surrounding environment. In this paper we address this issue with an ECG sensor designed to be adjustable during runtime, having the ability to reduce the power consumption at the cost of the informational content. Standard ECG hardware and open source software, accessible for everyone, have been used to realize an ECG processing system for wearable applications. The average power consumption has been measured for each mode of operation. Finally, we conclude that context-aware scaling should be considered a key feature for addressing the energy issue of wearable sensor systems.

Authors

Tobola, A. and Espig, C. and Streit, F. J. and Korpok, O. and Leutheuser, H. and Schmitz, B. and Hofmann, C. and Struck, M. and Weigand, C. and Eskofier, B. M. and Fischer, G.

Submitted

In Proc: 10th Annual IEEE International Symposium on Medical Measurements and Applications (MeMeA)

Date

07.05.2015

DOI

Abstract

A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly without requiring a vectorial embedding of data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density based clustering methods and model-based clustering. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, the visualization of data can be used as a first pre-processing step when analyzing real world data sets to get an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets which demonstrate the high quality and usefulness of the inferred structure.

Authors

Julia E. Vogt

Submitted

IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 12, Issue: 4, July-Aug. 1 2015)

Date

26.01.2015

LinkDOI

Abstract

Early detection of arrhythmic beats in the electrocardiogram (ECG) signal could improve the identification of patients at risk from sudden death, for example due to coronary heart disease. We present a mobile, hierarchical classification system (three stages in total) using complete databases with the aim to provide instantaneous analysis in case of symptoms and, if necessary, the recommendation to visit an emergency department. In this work, we give more details about the training process of the second stage classifier. The Linear Regression classifier achieved the smallest false negative rate of 14.06% with an accuracy of 66.19% after feature selection. It has to be investigated whether the hierarchical classification system as a whole performs better when the second stage classifier is oriented on the false negative rate or on the accuracy. The complete hierarchical classification system has the potential to provide automated, accurate ECG arrhythmia detection that can easily be integrated in daily life.

Authors

H. Leutheuser, T. Gottschalk, L. Anneken, M. Struck, A. Heuberger, M. Arnold, S. Achenbach, B. M. Eskofier

Submitted

In Proc: Conference on Mobile and Information Technologies in Medicine (MobileMed)

Date

20.11.2014

Abstract

Activity recognition is mandatory in order to provide feedback about the individual quality of life. Usually, activity recognition algorithms are evaluated on one specific database which is limited in the number of subjects, sensors and type of activities. In this paper, a novel database fusion strategy was proposed which fused three different publicly available databases to one large database consisting of 42 subjects. The fusion of databases addresses the two attributes high volume and high variety of the term "big data". Furthermore, an algorithm was developed which can deal with multiple databases varying in the number of sensors and activities. Nine features were computed in sliding windows of inertial data of several sensor positions. Decision-level fusion was performed in order to combine the information of different sensor positions. The proposed classification system achieved an overall mean classification rate of 85.8 % and allows an easy integration of new databases. Using big data is necessary to develop robust and stable activity recognition algorithms in the future.

Authors

Schuldhaus, D. and Leutheuser, H. and Eskofier, B. M.

Submitted

In Proc: 9th International Conference on Body Area Networks (BodyNets)

Date

01.09.2014

DOI

Abstract

Analysis of electroencephalography (EEG) recorded during movement is often aggravated or even completely hindered by electromyogenic artifacts. This is caused by the overlapping frequencies of brain and myogenic activity and the higher amplitude of the myogenic signals. One commonly employed computational technique to reduce these types of artifacts is Independent Component Analysis (ICA). ICA estimates statistically independent components (ICs) that, when linearly combined, closely match the input (sensor) data. Removing the ICs that represent artifact sources and re-mixing the sources returns the input data with reduced noise activity. ICs of real-world data are usually not perfectly separated, actual sources, but a mixture of these sources. Adding additional input signals, predominantly generated by a single IC that is already part of the original sensor data, should increase that IC's separability. We conducted this study to evaluate this concept for ICA-based electromyogenic artifact reduction in EEG using EMG signals as additional inputs. To acquire the appropriate data we worked with nine human volunteers. The EEG and EMG were recorded while the study volunteers performed seven exercises designed to produce a wide range of representative myogenic artifacts. To evaluate the effect of the EMG signals we estimated the sources of each dataset once with and once without the EMG data. The ICs were automatically classified as either `myogenic' or `non-myogenic'. We removed the former before back projection. Afterwards we calculated an objective measure to quantify the artifact reduction and assess the effect of including EMG signals. Our study showed that the ICA-based reduction of electromyogenic artifacts can be improved by including the EMG data of artifact-inducing muscles. This approach could prove beneficial for locomotor disorder research, brain-computer interfaces, neurofeedback, and most other areas where brain activity during movement has to be analyzed.

Authors

F. Gabsteiger, H. Leutheuser, P. Reis, M. Lochmann, B. M. Eskofier

Submitted

In Proc: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

26.08.2014

DOI

Abstract

Respiratory inductive plethysmography (RIP) has been introduced as an alternative for measuring ventilation by means of body surface displacement (diameter changes in rib cage and abdomen). Using a posteriori calibration, it has been shown that RIP may provide accurate measurements for ventilatory tidal volume under exercise conditions. Methods for a priori calibration would facilitate the application of RIP. Currently, to the best knowledge of the authors, none of the existing ambulant procedures for RIP calibration can be used a priori for valid subsequent measurements of ventilatory volume under exercise conditions. The purpose of this study is to develop and validate a priori calibration algorithms for ambulant application of RIP data recorded in running exercise. We calculated Volume Motion Coefficients (VMCs) using seven different models on resting data and compared the root mean squared error (RMSE) of each model applied on running data. Least squares approximation (LSQ) without offset of a two-degree-of-freedom model achieved the lowest RMSE value. In this work, we showed that a priori calibration of RIP exercise data is possible using VMCs calculated from 5 min resting phase where RIP and flowmeter measurements were performed simultaneously. The results demonstrate that RIP has the potential for usage in ambulant applications.
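As a rough illustration of the calibration idea described above, the following minimal sketch fits a two-parameter model without offset, volume ≈ a·RC + b·AB, by least squares on resting data and then reports the RMSE on running data. The function names, the variable names (rc for rib cage, ab for abdomen) and the numbers are assumptions made for illustration; they are not taken from the paper, and the paper's seven models are not reproduced here.

```python
import math


def fit_vmc(rc, ab, volume):
    """Least-squares fit of volume ~ a*rc + b*ab with no offset term.

    Solves the 2x2 normal equations directly (illustrative sketch only).
    """
    s_rr = sum(r * r for r in rc)
    s_aa = sum(a * a for a in ab)
    s_ra = sum(r * a for r, a in zip(rc, ab))
    s_rv = sum(r * v for r, v in zip(rc, volume))
    s_av = sum(a * v for a, v in zip(ab, volume))
    det = s_rr * s_aa - s_ra * s_ra
    if abs(det) < 1e-12:
        raise ValueError("rc and ab are collinear; coefficients are not unique")
    a = (s_rv * s_aa - s_ra * s_av) / det
    b = (s_rr * s_av - s_ra * s_rv) / det
    return a, b


def rmse(predicted, reference):
    """Root mean squared error between two equally long sequences."""
    n = len(reference)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(predicted, reference)) / n)


# Hypothetical resting-phase data (arbitrary units) recorded together with a flowmeter.
rest_rc = [1.0, 2.0, 3.0, 4.0]
rest_ab = [2.0, 1.0, 4.0, 3.0]
rest_vol = [0.6 * r + 0.4 * a for r, a in zip(rest_rc, rest_ab)]

a, b = fit_vmc(rest_rc, rest_ab, rest_vol)

# Apply the resting coefficients a priori to hypothetical running data.
run_rc = [5.0, 6.0]
run_ab = [4.0, 7.0]
run_vol = [4.7, 6.3]
run_pred = [a * r + b * x for r, x in zip(run_rc, run_ab)]
run_error = rmse(run_pred, run_vol)
```

The key point of the design is that the coefficients are estimated only from the resting phase and are then held fixed for the exercise data, which is what makes the calibration a priori.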

Authors

H. Leutheuser, C. Heyde, A. Gollhofer, B. M. Eskofier

Submitted

In Proc: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

26.08.2014

DOI

Abstract

The electrocardiogram (ECG) is a key diagnostic tool in heart disease and may serve to detect ischemia, arrhythmias, and other conditions. Automatic, low cost monitoring of the ECG signal could be used to provide instantaneous analysis in case of symptoms and may trigger the presentation to the emergency department. Currently, since mobile devices (smartphones, tablets) are an integral part of daily life, they could form an ideal basis for automatic and low cost monitoring solution of the ECG signal. In this work, we aim for a realtime classification system for arrhythmia detection that is able to run on Android-based mobile devices. Our analysis is based on 70% of the MIT-BIH Arrhythmia and on 70% of the MIT-BIH Supraventricular Arrhythmia databases. The remaining 30% are reserved for the final evaluation. We detected the R-peaks with a QRS detection algorithm and based on the detected R-peaks, we calculated 16 features (statistical, heartbeat, and template-based). With these features and four different feature subsets we trained 8 classifiers using the Embedded Classification Software Toolbox (ECST) and compared the computational costs for each classification decision and the memory demand for each classifier. We conclude that the C4.5 classifier is best for our two-class classification problem (distinction of normal and abnormal heartbeats) with an accuracy of 91.6%. This classifier still needs a detailed feature selection evaluation. Our next steps are implementing the C4.5 classifier for Android-based mobile devices and evaluating the final system using the remaining 30% of the two used databases.

Authors

H. Leutheuser, S. Gradl, P. Kugler, L. Anneken, M. Arnold, S. Achenbach, B. M. Eskofier

Submitted

In Proc: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

26.08.2014

DOI

Abstract

Insufficient physical activity is the 4th leading risk factor for mortality. The physical activity of a person is reflected in the walking behavior. Different methods for the calculation of the accurate step number exist and most of them are evaluated using different walking speeds measured on a treadmill or using a small sample size of overground walking. In this paper, we introduce the BaSA (Basic Step Activities) dataset consisting of four different step activities (walking, jogging, ascending, and descending stairs) that were performed under natural conditions. We further compare two step segmentation algorithms (a simple peak detection algorithm vs. subsequence Dynamic Time Warping (sDTW)). We calculated a multivariate Analysis of Variance (ANOVA) with repeated measures followed by multiple dependent t-tests with Bonferroni correction to test for significant differences between the two algorithms. sDTW performed as well as the peak detection algorithm, but was not considerably better. In further analysis, continuous, real walking signals with transitions from one step activity to another should be considered to investigate the adaptability of these two step segmentation algorithms.

Authors

H. Leutheuser, S. Doelfel, D. Schuldhaus, S. Reinfelder, B. M. Eskofier

Submitted

In Proc: IEEE-EMBS 11th International Conference on Wearable and Implantable Body Sensor Networks (BSN)

Date

16.06.2014

DOI

Abstract

Using multiple inertial sensors for energy expenditure estimation provides a useful tool for the assessment of daily life activities. Due to the high variety of new upcoming sensor types and recommendations for sensor placement to assess physiological human body function, an adaptable inertial sensor fusion-based approach is mandatory. In this paper, two inertial body sensors, consisting of a triaxial accelerometer and a triaxial gyroscope, were placed on hip and ankle. Ten subjects performed two trials of running on a treadmill under three speed levels ([3.2, 4.8, 6.4] km/h). Each sensor source was separately subjected to preprocessing, feature extraction and regression. In the final step, decision level fusion was performed by averaging the predicted results. A mean absolute error of 0.50 MET was achieved against indirect calorimetry. The system allows an easy integration of new sensors without retraining the complete system. This is an advantage over commonly used feature level fusion approaches.
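The decision-level fusion step described above can be sketched as follows: each sensor's regressor produces its own MET estimate, and the fused estimate is the plain average across sensors. This is a minimal sketch under stated assumptions; the function names and all numbers are hypothetical, and the per-sensor preprocessing, feature extraction and regression are not shown.

```python
def fuse_by_averaging(per_sensor_predictions):
    """Average the predictions of several sensors, sample by sample.

    per_sensor_predictions: list of equally long lists, one per sensor.
    """
    lengths = {len(p) for p in per_sensor_predictions}
    if len(lengths) != 1:
        raise ValueError("all sensors must provide the same number of predictions")
    return [sum(values) / len(values) for values in zip(*per_sensor_predictions)]


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical MET estimates from two independently trained sensor pipelines.
hip_met = [3.0, 5.0]
ankle_met = [4.0, 7.0]
fused_met = fuse_by_averaging([hip_met, ankle_met])

# Hypothetical reference values, standing in for indirect calorimetry.
reference_met = [3.0, 6.5]
mae = mean_absolute_error(fused_met, reference_met)
```

Because the fusion only touches the outputs, a further sensor can be added by appending its prediction list, without retraining the existing regressors.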

Authors

Schuldhaus, D. and Dorn, S. and Leutheuser, H. and Tallner, A. and Klucken, J. and Eskofier, B. M.

Submitted

In Proc: 15th International Conference on Biomedical Engineering (ICBME)

Date

15.06.2014

DOI

Abstract

Traditionally, electroencephalography (EEG) recorded during movement has been considered too noise prone to allow for sophisticated analysis. Superimposed electromyogenic activity interferes and masks the EEG signal. Presently, computational techniques such as Independent Component Analysis allow reduction of these artifacts. However, to date, the user must manually select the artifact-contaminated components to reject. To automate this process and to reduce user dependent factors, we trained a support vector machine (SVM) to assist the user in choosing the independent components (ICs) most influenced by electromyogenic artifacts. We designed and conducted a study with specific neck and body movement exercises and collected data from five human participants (35 datasets total). After preprocessing, we decomposed the data by applying the Adaptive Mixture of Independent Component Analysis (AMICA) algorithm. An expert labeled the ICs found in the EEG recordings after decomposition as either ‘myogenic activity’ or ‘non-myogenic activity’. Afterwards, the classifier was evaluated on the dataset of one participant, whose data were not used in the training phase, and obtained 93% sensitivity and 96% specificity. Our study was designed to cover a diverse selection of exercises that stimulate the musculature that most interferes in EEG recordings during movement. This selection should produce similar artifact patterns as seen in most exercises or movements. Although unfamiliar exercises could result in worse classification performance, the results are expected to be equivalent to ours. Our study showed that this tool can help EEG analysis by reliably and efficiently choosing electromyogenic artifact contaminated components after AMICA decomposition, ultimately increasing the speed of data processing.

Authors

F. Gabsteiger, H. Leutheuser, P. Reis, M. Lochmann, B. M. Eskofier

Submitted

In Proc: 15th International Conference on Biomedical Engineering (ICBME)

Date

15.06.2014

DOI

Abstract

Introduction: The aim of this study was to provide a rationale for future validations of a priori calibrated respiratory inductance plethysmography (RIP) when used under exercise conditions. Therefore, the validity of a posteriori-adjusted gain factors and accuracy in resultant breath-by-breath RIP data recorded under resting and running conditions were examined. Methods: Healthy subjects, 98 men and 88 women (mean ± SD: height = 175.6 ± 8.9 cm, weight = 68.9 ± 11.1 kg, age = 27.1 ± 8.3 yr), underwent a standardized test protocol, including a period of standing still, an incremental running test on treadmill, and multiple periods of recovery. Least square regression was used to calculate gain factors, respectively, for complete individual data sets as well as several data subsets. In comparison with flowmeter data, the validity of RIP in breathing rate (fR) and inspiratory tidal volume (VTIN) were examined using coefficients of determination (R²). Accuracy was estimated from equivalence statistics. Results: Calculated gains between different data subsets showed no equivalence. After gain adjustment for the complete individual data set, fR and VTIN between methods were highly correlated (R² = 0.96 ± 0.04 and 0.91 ± 0.05, respectively) in all subjects. Under conditions of standing still, treadmill running, and recovery, 86%, 98%, and 94% (fR) and 78%, 97%, and 88% (VTIN), respectively, of all breaths were accurately measured within ± 20% limits of equivalence. Conclusion: In case of the best possible gain adjustment, RIP confidently estimates tidal volume within ± 20% under exercise conditions. Our results can be used as a rationale for future validations of a priori calibration procedures.

Authors

C. Heyde, H. Leutheuser, B. M. Eskofier, K. Roecker, A. Gollhofer

Submitted

Med Sci Sports Exerc.

Date

01.03.2014

DOI

Abstract

The use of pegylated interferon-α (pegIFN-α) has replaced unmodified recombinant IFN-α for the treatment of chronic viral hepatitis. While the superior antiviral efficacy of pegIFN-α is generally attributed to improved pharmacokinetic properties, the pharmacodynamic effects of pegIFN-α in the liver have not been studied. Here, we analyzed pegIFN-α–induced signaling and gene regulation in paired liver biopsies obtained prior to treatment and during the first week following pegIFN-α injection in 18 patients with chronic hepatitis C. Despite sustained high concentrations of pegIFN-α in serum, the Jak/STAT pathway was activated in hepatocytes only on the first day after pegIFN-α administration. Evaluation of liver biopsies revealed that pegIFN-α induces hundreds of genes that can be classified into four clusters based on different temporal expression profiles. In all clusters, gene transcription was mainly driven by IFN-stimulated gene factor 3 (ISGF3). Compared with conventional IFN-α therapy, pegIFN-α induced a broader spectrum of gene expression, including many genes involved in cellular immunity. IFN-induced secondary transcription factors did not result in additional waves of gene expression. Our data indicate that the superior antiviral efficacy of pegIFN-α is not the result of prolonged Jak/STAT pathway activation in hepatocytes, but rather is due to induction of additional genes that are involved in cellular immune responses.

Authors

Michael T. Dill, Zuzanna Makowska, Gaia Trincucci, Andreas J. Gruber, Julia E. Vogt, Magdalena Filipowicz, Diego Calabrese, Ilona Krol, Daryl T. Lau, Luigi Terracciano, Erik van Nimwegen, Volker Roth and Markus H. Heim

Submitted

The Journal of Clinical Investigation

Date

23.02.2014

LinkDOI

Abstract

Insufficient physical activity is the 4th leading risk factor for mortality. Methods for assessing the individual daily life activity (DLA) are of major interest in order to monitor the current health status and to provide feedback about the individual quality of life. The conventional assessment of DLAs with self-reports induces problems like reliability, validity, and sensitivity. The assessment of DLAs with small and light-weight wearable sensors (e.g. inertial measurement units) provides a reliable and objective method. State-of-the-art human physical activity classification systems differ in e.g. the number and kind of sensors, the performed activities, and the sampling rate. Hence, it is difficult to compare newly proposed classification algorithms to existing approaches in literature and no commonly used dataset exists. We generated a publicly available benchmark dataset for the classification of DLAs. Inertial data were recorded with four sensor nodes, each consisting of a triaxial accelerometer and a triaxial gyroscope, placed on wrist, hip, chest, and ankle. Further, we developed a novel, hierarchical, multi-sensor based classification system for the distinction of a large set of DLAs. Our hierarchical classification system reached an overall mean classification rate of 89.6% and was diligently compared to existing state-of-the-art algorithms using our benchmark dataset. For future research, the dataset can be used in the evaluation process of new classification algorithms and could speed up the process of getting the best performing and most appropriate DLA classification system.

Authors

H. Leutheuser, D. Schuldhaus, B. M. Eskofier

Submitted

PLOS ONE

Date

09.10.2013

DOI

Abstract

We present a Bayesian approach for estimating the relative frequencies of multi-single nucleotide polymorphism (SNP) haplotypes in populations of the malaria parasite Plasmodium falciparum by using microarray SNP data from human blood samples. Each sample comes from a malaria patient and contains one or several parasite clones that may genetically differ. Samples containing multiple parasite clones with different genetic markers pose a special challenge. The situation is comparable with a polyploid organism. The data from each blood sample indicates whether the parasites in the blood carry a mutant or a wildtype allele at various selected genomic positions. If both mutant and wildtype alleles are detected at a given position in a multiply infected sample, the data indicates the presence of both alleles, but the ratio is unknown. Thus, the data only partially reveals which specific combinations of genetic markers (i.e. haplotypes across the examined SNPs) occur in distinct parasite clones. In addition, SNP data may contain errors at non-negligible rates. We use a multinomial mixture model with partially missing observations to represent this data and a Markov chain Monte Carlo method to estimate the haplotype frequencies in a population. Our approach addresses both challenges, multiple infections and data errors.

Authors

Leonore Wigger, Julia E. Vogt, Volker Roth

Submitted

Statistics in Medicine: 04/2013

Date

19.09.2013

LinkDOI

Abstract

The fusion of inertial sensor data is heavily used for the classification of daily life activities. The knowledge about the performed daily life activities is mandatory to give physically inactive people feedback about their individual quality of life. In this paper, four inertial sensors were placed on wrist, chest, hip and ankle of 19 subjects, who had to perform seven daily life activities. Each sensor node separately performed preprocessing, feature extraction and classification. In the final step, the classifier decisions of the sensor nodes were fused and a single activity was predicted by majority voting. The proposed classification system obtained an overall mean classification rate of 93.9 % and was robust against defective sensors. The system allows an easy integration of new sensors without retraining of the complete system, which is an advantage over commonly used feature level fusion approaches.
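The majority-voting step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the activity labels, the use of None for a sensor that delivers no decision, and the tie-breaking rule (the label seen first in sensor order wins) are all hypothetical and not taken from the paper.

```python
from collections import Counter


def majority_vote(decisions):
    """Fuse per-sensor class decisions into one label by majority voting.

    decisions: dict mapping sensor position to a predicted label, or to
    None when that sensor delivered no decision (assumed convention).
    Ties are resolved in favour of the label encountered first.
    """
    votes = [label for label in decisions.values() if label is not None]
    if not votes:
        raise ValueError("no sensor delivered a decision")
    return Counter(votes).most_common(1)[0][0]


# Hypothetical decisions from the four sensor nodes; the ankle node has failed.
decisions = {
    "wrist": "walking",
    "chest": "walking",
    "hip": "sitting",
    "ankle": None,
}
fused_activity = majority_vote(decisions)
```

Skipping missing decisions is one simple way to keep the vote usable when a sensor drops out, and a new sensor only adds one more entry to the dictionary.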

Authors

Schuldhaus, D. and Leutheuser, H. and Eskofier, B. M.

Submitted

In Proc: 8th International Conference on Body Area Networks (BodyNets)

Date

01.09.2013

DOI

Abstract

Electromyogenic or muscle artifacts constitute a major problem in studies involving electroencephalography (EEG) measurements. This is because the rather low signal activity of the brain is overlaid by comparably high signal activity of muscles, especially neck muscles. Hence, recording an artifact-free EEG signal during movement or physical exercise is not, to the best knowledge of the authors, feasible at the moment. Nevertheless, EEG measurements are used in a variety of different fields like diagnosing epilepsy and other brain related diseases or in biofeedback for athletes. Muscle artifacts can be recorded using electromyography (EMG). Various computational methods for the reduction of muscle artifacts in EEG data exist, like the ICA algorithm InfoMax and the AMICA algorithm. However, there exists no objective measure to compare different algorithms concerning their performance on EEG data. We defined a test protocol with specific neck and body movements and measured EEG and EMG simultaneously to compare the InfoMax algorithm and the AMICA algorithm. A novel objective measure made it possible to compare both algorithms according to their performance. Results showed that the AMICA algorithm outperformed the InfoMax algorithm. In further research, we will continue using the established objective measure to test the performance of other algorithms for the reduction of artifacts.

Authors

H. Leutheuser, F. Gabsteiger, F. Hebenstreit, P. Reis, M. Lochmann, B. M. Eskofier

Submitted

In Proc: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Date

03.07.2013

DOI

Abstract

The normal oscillation of the heart rate is called Heart Rate Variability (HRV). HRV parameters change under different conditions like rest, physical exercise, mental stress, and body posture changes. However, results on how HRV parameters adapt to physical exercise have been inconsistent. This study investigated how different HRV parameters changed during one hour of running. We used datasets of 295 athletes where each dataset had a total length of about 65 minutes. Data were divided into segments of five minutes and three HRV parameters and one kinematic parameter were calculated for each segment. We applied two different analysis of variance (ANOVA) models to analyze the differences in the means of each segment for every parameter. The two ANOVA models were univariate ANOVA with repeated measures and multivariate ANOVA with repeated measures. The obligatory post-hoc procedure consisted of multiple dependent t tests with Bonferroni correction. We investigated the last three segments of the parameters in more detail and detected a delay of one minute between varying running speed and measured heart rate. Hence, the circulatory system of our population needed one minute to adapt to a change in running speed. The method we provided can be used to further investigate more HRV parameters.

Authors

H. Leutheuser, B. M. Eskofier

Submitted

Int J Comp Sci Sport

Date

01.01.2013

Abstract

Partitioning methods for observations represented by pairwise dissimilarities are studied. Particular emphasis is put on their properties when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift invariance property which basically means that any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds true for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. This shift-invariance property essentially means that these clustering methods are “blind” against Euclidean or metric violations. From the application side, such blindness against metric violations might be seen as a highly desired feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property might be viewed as a “negative” result, since studying these algorithms will not lead to any new insights on the role of metricity in clustering problems.
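The shift invariance described in this abstract can be checked numerically on a toy problem. The sketch below assumes a common form of the Pairwise Clustering cost (within-cluster dissimilarity sums divided by twice the cluster size) and a constant added to all off-diagonal entries; the helper names, the random matrix, and the shift value are illustrative assumptions and are not taken from the paper.

```python
import itertools
import random


def pairwise_clustering_cost(D, labels):
    """Sum over clusters of (within-cluster dissimilarity sum) / (2 * cluster size)."""
    cost = 0.0
    for k in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == k]
        within = sum(D[i][j] for i in members for j in members)
        cost += within / (2.0 * len(members))
    return cost


def shift_off_diagonal(D, c):
    """Add the constant c to every off-diagonal entry of D."""
    n = len(D)
    return [[D[i][j] + (c if i != j else 0.0) for j in range(n)] for i in range(n)]


# Random symmetric dissimilarity matrix with zero diagonal (toy data).
random.seed(0)
n, c = 6, 3.0
D = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        D[i][j] = D[j][i] = random.random()
D_shifted = shift_off_diagonal(D, c)

# Enumerate every split of the n objects into exactly two non-empty clusters.
# Fixing the first label to 0 avoids counting each split twice.
partitions = [(0,) + rest for rest in itertools.product((0, 1), repeat=n - 1) if any(rest)]

costs = [pairwise_clustering_cost(D, p) for p in partitions]
costs_shifted = [pairwise_clustering_cost(D_shifted, p) for p in partitions]
gaps = [b - a for a, b in zip(costs, costs_shifted)]

best = min(range(len(partitions)), key=costs.__getitem__)
best_shifted = min(range(len(partitions)), key=costs_shifted.__getitem__)

print("cost change per partition:", sorted(set(round(g, 9) for g in gaps)))
print("same optimal partition:", partitions[best] == partitions[best_shifted])
```

For a fixed number of clusters K, the shift changes every cost by the same amount, c(n - K)/2 under this form of the cost, so the ranking of partitions and hence the optimal grouping are unchanged.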

Authors

Volker Roth, Thomas J. Fuchs, Julia E. Vogt, Sandhya Prabhakaran, Joachim M. Buhmann

Submitted

Similarity-Based Pattern Analysis and Recognition, 157-177

Date

31.12.2012

LinkDOI

Abstract

Introduction: IFN-\alpha signals through the Jak-STAT pathway to induce expression of IFN-stimulated genes (ISGs) with antiviral functions. USP18 is an IFN-inducible negative regulator of the Jak-STAT pathway. Upregulation of USP18 results in a long-lasting desensitization of IFN-\alpha signalling. As a result of this IFN-induced refractoriness, ISG levels decrease back to baseline despite continuous presence of the cytokine. Pegylated forms of IFN-\alpha (pegIFN-\alpha) are currently in clinical use for treatment of chronic hepatitis C virus infection. PegIFN-\alphas show increased anti-hepatitis C virus efficacy compared to nonpegylated IFN-\alpha. This has been attributed to the significantly longer plasma half-life of the pegylated form. However, the underlying assumption that persistently high plasma levels obtained with pegIFN-\alpha therapy result in ongoing stimulation of ISGs in the liver has never been tested. In the present study we therefore investigated the kinetics of Jak-STAT pathway activation and ISG induction in the human liver at several time points during the first week of pegIFN-\alpha therapy. Methods: 18 patients with chronic hepatitis C underwent a liver biopsy 4 h (n = 6), 16 h, 48 h, 96 h or 144 h (all n = 3) after the first injection of pegIFN-\alpha-2b. An additional 3 patients received pegIFN-\alpha-2a and were biopsied at 144 h. The activation of Jak-STAT signalling and USP18 upregulation were assessed by immunohistochemistry and Western blot. Gene expression analysis was performed using Human Genome U133 Plus 2.0 arrays and Bioconductor packages of the R statistical environment. Results: A single dose of pegIFN-\alpha-2b resulted in elevated IFN-\alpha plasma levels throughout the one-week dosing interval. Despite the continuous IFN-\alpha exposure, strong activation of the Jak-STAT pathway was only observed at early time points after administration. Almost 500 genes were significantly upregulated in the liver samples following pegIFN-\alpha stimulation. The breadth of transcriptional response to pegIFN-\alpha was maximal 16 h post-injection and decreased gradually, with only a few genes significantly upregulated after 144 h of treatment. Bayesian clustering of the gene expression data revealed 4 distinct groups of the ISGs based on the temporal patterns of regulation. Of 494 upregulated ISGs, the expression of 474 peaked 4 h or 16 h after pegIFN-\alpha administration, followed by a steady decline of mRNA levels through the remaining 128 h of treatment. This transient activation of the Jak-STAT pathway coincided with elevated expression of USP18 on the protein level, which was first detectable 16 h post-injection. Conclusion: PegIFN-\alpha induces a transient activation of Jak-STAT signalling and ISG upregulation in human liver, in spite of persistent high serum concentrations. The short-lived STAT1 phosphorylation and gene induction can be explained by upregulation of USP18 and establishment of a refractory state. The superior efficacy of pegIFN-\alpha compared to conventional IFN-\alpha for chronic hepatitis C therapy cannot be explained by persistent signalling and ISG induction during the one-week dosing interval.

Authors

Z. Makowska, M. T. Dill, Julia E. Vogt, Magdalena Filipowicz Sinnreich, L. Terracciano, Volker Roth, M. H. Heim

Submitted

Cytokine 59(3):563–564, 2012

Date

11.08.2012

LinkDOI

Abstract

Archetype analysis involves the identification of representative objects from amongst a set of multivariate data such that the data can be expressed as a convex combination of these representative objects. Existing methods for archetype analysis assume a fixed number of archetypes a priori. Multiple runs of these methods for different choices of archetypes are required for model selection. Not only is this computationally infeasible for larger datasets, but in heavy-noise settings model selection also becomes cumbersome. In this paper, we present a novel extension to these existing methods with the specific focus of relaxing the need to provide a fixed number of archetypes beforehand. Our fast iterative optimization algorithm is devised to automatically select the right model using BIC scores and can easily be scaled to noisy, large datasets. These benefits are achieved by introducing a Group-Lasso component popular for sparse linear regression. The usefulness of the approach is demonstrated through simulations and on a real-world application of document analysis for identifying topics.

Authors

Sandhya Prabhakaran, Sudhir Raman, Julia E. Vogt, Volker Roth

Submitted

Pattern Recognition: Joint 34th DAGM and 36th OAGM Symposium, Lecture Notes in Computer Science, 2012

Date

31.07.2012

LinkDOI

Abstract

The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,\infty} versions have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 <= p <= \infty, and we present a unified active set algorithm. For all p-norms, a highly efficient projected gradient algorithm is presented. This new algorithm enables us to compare the prediction performance of many variants of the Group-Lasso in a multi-task learning setting, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We conduct large-scale experiments on synthetic data and on two real-world data sets. In accordance with theoretical characterizations of the different norms, we observe that the weak-coupling norms with p between 1.5 and 2 consistently outperform the strong-coupling norms with p >> 2.
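To make the l_{1,p} notation concrete, the sketch below evaluates the mixed norm as the sum over groups of the p-norm of each group's coefficients. It only computes the regularizer and is not the active set or projected gradient algorithm from the paper; the function name `group_lasso_norm`, the coefficient vector, and the grouping are hypothetical.

```python
import math


def group_lasso_norm(beta, groups, p):
    """l_{1,p} mixed norm: sum over groups of the p-norm of the group's coefficients.

    beta   -- flat list of coefficients
    groups -- list of index lists, one per (non-empty) group
    p      -- norm parameter with 1 <= p <= infinity (use math.inf for l_{1,inf})
    """
    total = 0.0
    for idx in groups:
        abs_vals = [abs(beta[i]) for i in idx]
        if math.isinf(p):
            total += max(abs_vals)
        else:
            total += sum(v ** p for v in abs_vals) ** (1.0 / p)
    return total


# Hypothetical coefficients split into three groups of two.
beta = [3.0, 4.0, 0.0, 0.0, 1.0, -1.0]
groups = [[0, 1], [2, 3], [4, 5]]

for p in (1, 1.5, 2, math.inf):
    print(f"p = {p}: l_(1,p) norm = {group_lasso_norm(beta, groups, p):.4f}")
```

The group whose coefficients are all zero contributes nothing for any p, which is the group-level sparsity the penalty encourages. Larger p lets the entries within an active group grow together at little extra cost, corresponding to the stronger coupling mentioned in the abstract.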

Authors

Julia E. Vogt, Volker Roth

Submitted

ICML 2012: Proceedings of the 29th international conference on Machine Learning

Date

17.06.2012

Link

Abstract

BACKGROUND & AIMS: The host immune response during the chronic phase of hepatitis C virus infection varies among individuals; some patients have no interferon (IFN) response in the liver, whereas others have full activation of IFN-stimulated genes (ISGs). Preactivation of this endogenous IFN system is associated with nonresponse to pegylated IFN-\alpha (pegIFN-\alpha) and ribavirin. Genome-wide association studies have associated allelic variants near the IL28B (IFN\lambda3) gene with treatment response. We investigated whether IL28B genotype determines the constitutive expression of ISGs in the liver and compared the abilities of ISG levels and IL28B genotype to predict treatment outcome. METHODS: We genotyped 109 patients with chronic hepatitis C for IL28B allelic variants and quantified the hepatic expression of ISGs and of IL28B. Decision tree ensembles, in the form of a random forest classifier, were used to calculate the relative predictive power of these different variables in a multivariate analysis. RESULTS: The minor IL28B allele was significantly associated with increased expression of ISGs. However, stratification of the patients according to treatment response revealed increased ISG expression in nonresponders, irrespective of IL28B genotype. Multivariate analysis of ISG expression, IL28B genotype, and several other factors associated with response to therapy identified ISG expression as the best predictor of treatment response. CONCLUSIONS: IL28B genotype and hepatic expression of ISGs are independent predictors of response to treatment with pegIFN-\alpha and ribavirin in patients with chronic hepatitis C. The most accurate prediction of response was obtained with a 4-gene classifier comprising IFI27, ISG15, RSAD2, and HTATIP2.

Authors

Michael T. Dill, Francois H.T. Duong, Julia E. Vogt, Stephanie Bibert, Pierre-Yves Bochud, Luigi Terracciano, Andreas Papassotiropoulos, Volker Roth and Markus H. Heim

Submitted

Gastroenterology, 2011 Mar;140(3):1021-1031.e10

Date

28.02.2011

LinkDOI

Abstract

The l_{1,\infty} norm and the l_{1,2} norm are well-known tools for joint regularization in Group-Lasso methods. While the l_{1,2} version has been studied in detail, there are still open questions regarding the uniqueness of solutions and the efficiency of algorithms for the l_{1,\infty} variant. For the latter, we characterize the conditions for uniqueness of solutions, we present a simple test for uniqueness, and we derive a highly efficient active set algorithm that can deal with input dimensions in the millions. We compare both variants of the Group-Lasso in its two most common application scenarios: the first is to obtain sparsity on the level of groups in “standard” prediction problems, and the second is multi-task learning, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We show that both versions perform quite similarly in “standard” applications. However, a very clear distinction between the variants occurs in multi-task settings, where the l_{1,2} version consistently outperforms the l_{1,\infty} counterpart in terms of prediction accuracy.

Authors

Julia E. Vogt, Volker Roth

Submitted

Pattern Recognition: 32nd DAGM Symposium, Lecture Notes in Computer Science, 2010

Date

31.07.2010

Link

Abstract

We present a probabilistic model for clustering of objects represented via pairwise dissimilarities. We propose that even if an underlying vectorial representation exists, it is better to work directly with the dissimilarity matrix, thereby avoiding unnecessary bias and variance caused by embeddings. By using a Dirichlet process prior, we are not obliged to fix the number of clusters in advance. Furthermore, our clustering model is permutation-, scale- and translation-invariant, and it is called the Translation-invariant Wishart Dirichlet (TIWD) process. A highly efficient MCMC sampling algorithm is presented. Experiments show that the TIWD process exhibits several advantages over competing approaches.

Authors

Julia E. Vogt, Sandhya Prabhakaran, Thomas J. Fuchs, Volker Roth

Submitted

ICML 2010: Proceedings of the 27th international conference on Machine Learning

Date

20.06.2010

Link