Dr.

Alexander Marx

Alumni

E-Mail
alexander.marx@inf.ethz.ch
Address
Department of Computer Science
CAB G 15.2
Universitätstr. 6
CH – 8092 Zurich, Switzerland

In September 2021, I joined the Medical Data Science group led by Prof. Dr. Julia Vogt as a Post-Doctoral Fellow of the ETH AI Center, and am co-mentored by Peter Bühlmann. I am interested in conducting research related to causality, information theory, and multimodal learning to approach modern challenges in the medical and biological domain. I obtained my Master’s degree in Bioinformatics in 2016 and completed my PhD in Computer Science in 2021 at Saarland University, where I was affiliated with the CISPA Helmholtz Center for Information Security and the Max Planck Institute for Informatics. During my PhD, I did a three-month research visit at the University of Amsterdam in 2019.

You can find a portrait article about me and my research here. For more information on current projects, please visit my personal website.

Abstract

Background: The overarching goal of blood glucose forecasting is to assist individuals with type 1 diabetes (T1D) in avoiding hyper- or hypoglycemic conditions. While deep learning approaches have shown promising results for blood glucose forecasting in adults with T1D, it is not known if these results generalize to children. Possible reasons are physical activity (PA), which is often unplanned in children, as well as age and development of a child, which both have an effect on the blood glucose level. Materials and Methods: In this study, we collected time series measurements of glucose levels, carbohydrate intake, insulin-dosing and physical activity from children with T1D for one week in an ethics approved prospective observational study, which included daily physical activities. We investigate the performance of state-of-the-art deep learning methods for adult data—(dilated) recurrent neural networks and a transformer—on our dataset for short-term (30  min) and long-term (2  h) prediction. We propose to integrate static patient characteristics, such as age, gender, BMI, and percentage of basal insulin, to account for the heterogeneity of our study group. Results: Integrating static patient characteristics (SPC) proves beneficial, especially for short-term prediction. LSTMs and GRUs with SPC perform best for a prediction horizon of 30  min (RMSE of 1.66  mmol/l), a vanilla RNN with SPC performs best across different prediction horizons, while the performance significantly decays for long-term prediction. For prediction during the night, the best method improves to an RMSE of 1.50  mmol/l. Overall, the results for our baselines and RNN models indicate that blood glucose forecasting for children conducting regular physical activity is more challenging than for previously studied adult data. Conclusion: We find that integrating static data improves the performance of deep-learning architectures for blood glucose forecasting of children with T1D and achieves promising results for short-term prediction. Despite these improvements, additional clinical studies are warranted to extend forecasting to longer-term prediction horizons.

Authors

Alexander Marx, Francesco Di Stefano, Heike Leutheuser, Kieran Chin-Cheong, Marc Pfister, Marie-Anne Burckhardt, Sara Bachmann, Julia E. Vogt
denotes shared last authorship

Submitted

Frontiers in Pediatrics

Date

14.12.2023

LinkDOI

Authors

Alexander Immer, Christoph Schultheiss, Julia E. Vogt, Bernhard Schölkopf, Peter Bühlmann, Alexander Marx

Submitted

Proceedings of the 40th International Conference on Machine Learning, ICML 2023

Date

04.07.2023

LinkCode

Authors

Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx

Submitted

Arxiv

Date

19.06.2023

LinkCode

Authors

Alice Bizeul, Imant Daunhawer, Emanuele Palumbo, Bernhard Schölkopf, Alexander Marx, Julia E. Vogt

Submitted

Conference on Causal Learning and Reasoning (Datasets Track), CLeaR, 2023

Date

08.04.2023

LinkCode

Abstract

Contrastive learning is a cornerstone underlying recent progress in multi-view and multimodal learning, e.g., in representation learning with image/caption pairs. While its effectiveness is not yet fully understood, a line of recent work reveals that contrastive learning can invert the data generating process and recover ground truth latent factors shared between views. In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. Specifically, we distinguish between the multi-view setting with one generative mechanism (e.g., multiple cameras of the same type) and the multimodal setting that is characterized by distinct mechanisms (e.g., cameras and microphones). Our work generalizes previous identifiability results by redefining the generative process in terms of distinct mechanisms with modality-specific latent variables. We prove that contrastive learning can block-identify latent factors shared between modalities, even when there are nontrivial dependencies between factors. We empirically verify our identifiability results with numerical simulations and corroborate our findings on a complex multimodal dataset of image/text pairs. Zooming out, our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.

Authors

Imant Daunhawer, Alice Bizeul, Emanuele Palumbo, Alexander Marx, Julia E. Vogt

Submitted

The Eleventh International Conference on Learning Representations, ICLR 2023

Date

23.03.2023

Link

Abstract

We study the problem of identifying cause and effect over two univariate continuous variables X and Y from a sample of their joint distribution. Our focus lies on the setting when the variance of the noise may be dependent on the cause. We propose to partition the domain of the cause into multiple segments where the noise indeed is dependent. To this end, we minimize a scale-invariant, penalized regression score, finding the optimal partitioning using dynamic programming. We show under which conditions this allows us to identify the causal direction for the linear setting with heteroscedastic noise, for the non-linear setting with homoscedastic noise, as well as empirically confirm that these results generalize to the non-linear and heteroscedastic case. Altogether, the ability to model heteroscedasticity translates into an improved performance in telling cause from effect on a wide range of synthetic and real-world datasets.

Authors

Sascha Xu, Osman A Mian, Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the 39th International Conference on Machine Learning, ICML 2022

Date

28.06.2022

LinkCode

Abstract

The algorithmic independence of conditionals, which postulates that the causal mechanism is algorithmically independent of the cause, has recently inspired many highly successful approaches to distinguish cause from effect given only observational data. Most popular among these is the idea to approximate algorithmic independence via two-part Minimum Description Length (MDL). Although intuitively sensible, the link between the original postulate and practical two-part MDL encodings is left vague. In this work, we close this gap by deriving a two-part formulation of this postulate, in terms of Kolmogorov complexity, which directly links to practical MDL encodings. To close the cycle, we prove that this formulation leads on expectation to the same inference result as the original postulate.

Authors

Alexander Marx, Jilles Vreeken

Submitted

AAAI'22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery (ITCI’22)

Date

05.05.2022

Abstract

We study the problem of identifying the cause and the effect between two univariate continuous variables X and Y. The examined data is purely observational, hence it is required to make assumptions about the underlying model. Often, the independence of the noise from the cause is assumed, which is not always the case for real world data. In view of this, we present a new method, which explicitly models heteroscedastic noise. With our HEC algorithm, we can find the optimal model regularized, by an information theoretic score. In thorough experiments we show, that our ability to model heteroscedastic noise translates into a superior performance on a wide range of synthetic and real-world datasets.

Authors

Sascha Xu, Alexander Marx, Osman Mian, Jilles Vreeken

Submitted

AAAI'22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery (ITCI’22)

Date

05.05.2022

Abstract

Estimating mutual information (MI) between two continuous random variables X and Y allows to capture non-linear dependencies between them, non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet, robustly estimating MI for high-dimensional X and Y is still an open research question. In this paper, we formulate this problem through the lens of manifold learning. That is, we leverage the common assumption that the information of X and Y is captured by a low-dimensional manifold embedded in the observed high-dimensional space and transfer it to MI estimation. As an extension to state-of-the-art kNN estimators, we propose to determine the k-nearest neighbors via geodesic distances on this manifold rather than from the ambient space, which allows us to estimate MI even in the high-dimensional setting. An empirical evaluation of our method, G-KSG, against the state-of-the-art shows that it yields good estimations of MI in classical benchmark and manifold tasks, even for high dimensional datasets, which none of the existing methods can provide.

Authors

Alexander Marx, Jonas Fischer

Submitted

Proceedings of the SIAM International Conference on Data Mining, SDM 2022

Date

30.04.2022

LinkDOICode

Abstract

Estimating conditional mutual information (CMI) is an essential yet challenging step in many machine learning and data mining tasks. Estimating CMI from data that contains both discrete and continuous variables, or even discrete-continuous mixture variables, is a particularly hard problem. In this paper, we show that CMI for such mixture variables, defined based on the Radon-Nikodym derivate, can be written as a sum of entropies, just like CMI for purely discrete or continuous data. Further, we show that CMI can be consistently estimated for discrete-continuous mixture variables by learning an adaptive histogram model. In practice, we estimate such a model by iteratively discretizing the continuous data points in the mixture variables. To evaluate the performance of our estimator, we benchmark it against state-of-the-art CMI estimators as well as evaluate it in a causal discovery setting.

Authors

Alexander Marx, Lincen Yang, Matthijs van Leeuwen

Submitted

Proceedings of the SIAM International Conference on Data Mining, SDM 2021

Date

21.10.2021

LinkDOICode

Abstract

Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated to genes using large-scale datasets of epigenetic and expression data. However, for regions of complex epigenomic signals and enhancers that regulate many genes, it is difficult to understand these associations. We present StitchIt, an approach to dissect epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to generate the location and the target genes of a REM simultaneously. We show that this approach leads to a more accurate and refined REM detection compared to standard methods even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches suggesting that a large part of the regulome might be uncharted water.

Authors

Florian Schmidt, Alexander Marx, Nina Baumgarten, Marie Hebel, Martin Wegner, Manuel Kaulich, Matthias S Leisegang, Ralf P Brandes, Jonathan Göke, Jilles Vreeken, Marcel H Schulz

Submitted

Nucleic Acids Research

Date

01.09.2021

DOICode

Abstract

One of the core assumptions in causal discovery is the faithfulness assumption--i.e. assuming that independencies found in the data are due to separations in the true causal graph. This assumption can, however, be violated in many ways, including xor connections, deterministic functions or cancelling paths. In this work, we propose a weaker assumption that we call 2-adjacency faithfulness. In contrast to adjacency faithfulness, which assumes that there is no conditional independence between each pair of variables that are connected in the causal graph, we only require no conditional independence between a node and a subset of its Markov blanket that can contain up to two nodes. Equivalently, we adapt orientation faithfulness to this setting. We further propose a sound orientation rule for causal discovery that applies under weaker assumptions. As a proof of concept, we derive a modified Grow and Shrink algorithm that recovers the Markov blanket of a target node and prove its correctness under strictly weaker assumptions than the standard faithfulness assumption.

Authors

Alexander Marx, Arthur Gretton, Joris M. Mooij

Submitted

Proceedings of the Conference on Uncertainty in Artificial Intelligence, UAI 2021

Date

01.08.2021

Link

Abstract

It is well-known that correlation does not equal causation, but how can we infer causal relations from data? Causal discovery tries to answer precisely this question by rigorously analyzing under which assumptions it is feasible to infer directed causal networks from passively collected, so-called observational data. Classical approaches assume the data to be faithful to the causal graph, that is, independencies found in the distribution are assumed to be due to separations in the true graph. Under this assumption, so-called constraint-based methods can infer the correct Markov equivalence class of the true graph (i.e. the correct undirected graph and some edge directions), only using conditional independence tests. In this dissertation, we aim to alleviate some of the weaknesses of constraint-based algorithms. In the first part, we investigate causal mechanisms, which cannot be detected when assuming faithfulness. We then suggest a weaker assumption based on triple interactions, which allows for recovering a broader spectrum of causal mechanisms. Subsequently, we focus on conditional independence testing, which is a crucial tool for causal discovery. In particular, we propose to measure dependencies through conditional mutual information, which we show can be consistently estimated even for the most general setup: discrete-continuous mixture random variables. Last, we focus on distinguishing Markov equivalent graphs (i.e. infer the complete DAG structure), which boils down to inferring the causal direction between two random variables. In this setting, we focus on continuous and mixed-type data and develop our methods based on an information-theoretic postulate, which states that the true causal graph can be compressed best, i.e. has the smallest Kolmogorov complexity.

Authors

Alexander Marx

Submitted

Saarländische Universitäts- und Landesbibliothek

Date

06.07.2021

DOI

Abstract

We study the problem of inferring causal graphs from observational data. We are particularly interested in discovering graphs where all edges are oriented, as opposed to the partially directed graph that the state-of-the-art discover. To this end we base our approach on the algorithmic Markov condition. Unlike the statistical Markov condition, it uniquely identifies the true causal network as the one that provides the simplest--as measured in Kolmogorov complexity--factorization of the joint distribution. Although Kolmogorov complexity is not computable, we can approximate it from above via the Minimum Description Length principle, which allows us to define a consistent and computable score based on non-parametric multivariate regression. To efficiently discover causal networks in practice, we introduce the GLOBE algorithm, which greedily adds, removes, and orients edges such that it minimizes the overall cost. Through an extensive set of experiments we show GLOBE performs very well in practice, beating the state-of-the-art by a margin.

Authors

Osman Mian, Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2021

Date

01.05.2021

LinkCode

Abstract

We consider the problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. This case is especially challenging as the graph X causes Y is Markov equivalent to the graph Y causes X, and hence it is impossible to determine the correct direction using conditional independence tests. To tackle this problem, we follow an information theoretic approach based on the algorithmic Markov condition. This postulate states that in terms of Kolmogorov complexity the factorization given by the true causal model is the most succinct description of the joint distribution. This means that we can infer that X is a likely cause of Y when we need fewer bits to first transmit the data over X, and then the data of Y as a function of X, than for the inverse direction. That is, in this paper we perform causal inference by compression. To put this notion to practice, we employ the Minimum Description Length principle, and propose a score to determine how many bits we need to transmit the data using a class of regression functions that can model both local and global functional relations. To determine whether an inference, i.e. the difference in compressed sizes, is significant, we propose two analytical significance tests based on the no-hypercompression inequality. Last, but not least, we introduce the linear-time Slope and Sloper algorithms that through thorough empirical evaluation we show outperform the state of the art by a wide margin.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Knowledge and Information Systems

Date

01.09.2019

DOICode

Abstract

We consider the problem of telling apart cause from effect between two univariate continuous-valued random variables X and Y. In general, it is impossible to make definite statements about causality without making assumptions on the underlying model; one of the most important aspects of causal inference is hence to determine under which assumptions are we able to do so. In this paper we show under which general conditions we can identify cause from effect by simply choosing the direction with the best regression score. We define a general framework of identifiable regression-based scoring functions, and show how to instantiate it in practice using regression splines. Compared to existing methods that either give strong guarantees, but are hardly applicable in practice, or provide no guarantees, but do work well in practice, our instantiation combines the best of both worlds; it gives guarantees, while empirical evaluation on synthetic and real-world data shows that it performs at least as well as the state of the art.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD 2019

Date

01.07.2019

DOICode

Abstract

Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice--especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as L2 consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the International Conference on Artificial Intelligence and Statistics, AISTATS 2019

Date

01.04.2019

LinkCode

Abstract

How can we discover whether X causes Y, or vice versa, that Y causes X, when we are only given a sample over their joint distribution? How can we do this such that X and Y can be univariate, multivariate, or of different cardinalities? And, how can we do so regardless of whether X and Y are of the same, or of different data type, be it discrete, numeric, or mixed? These are exactly the questions we answer. We take an information theoretic approach, based on the Minimum Description Length principle, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. Simply put, if Y can be explained more succinctly by a set of classification or regression trees conditioned on X, than in the opposite direction, we conclude that X causes Y. Empirical evaluation on a wide range of data shows that our method, Crack, infers the correct causal direction reliably and with high accuracy on a wide range of settings, outperforming the state of the art by a wide margin. Code related to this paper is available at: http://eda.mmci.uni-saarland.de/crack.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data, ECMLPKDD 2018

Date

13.08.2018

DOICode

Abstract

We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer X causes Y in case it is shorter to describe Y as a function of X than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that through thorough empirical evaluation on both synthetic and real world data we show outperforms the state of the art by a wide margin.

Authors

Alexander Marx, Jilles Vreeken

Submitted

Proceedings of the IEEE International Conference on Data Mining, ICDM 2017

Date

01.11.2017

DOICode

Abstract

In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science to analyze if two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, the solution of the WMW test takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programing to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we presented different optimization strategies and developed a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly-generated data, benchmark it against 13 other commonly-applied approaches and moreover evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/.

Authors

Alexander Marx, Christina Backes, Eckart Meese, Hans-Peter Lenhof, Andreas Keller

Submitted

Genomics, Proteomics & Bioinformatics

Date

01.01.2016

DOICode