Methodological and Computational Advances in Survival Analysis

26th of November 2024

Inria Paris - 48, rue Barrault

Survival analysis has recently seen substantial advances in both its computational methods and theoretical foundations. On the one hand, driven by the new availability of massive claims-management and electronic health record (EHR) datasets, large-scale machine learning techniques and GPUs have been adapted to survival prediction tasks. On the other, survival analysis has been hybridised with key topics in modern machine learning and biostatistics, including time-dependent data, fairness, differential privacy, online learning, and deep models.

This workshop aims to bring together researchers and practitioners for a one-day event in Paris to discuss and explore these recent developments. A particular emphasis will be put on industrial applications of survival analysis.

Registrations for in-person attendance are now closed. You are welcome to attend remotely using the Webex link provided below.

Speakers

Julie Alberge, Inria Paris-Saclay (SODA) and Vincent Maladière, Probabl - Proper Scoring Rule and Stochastic Optimization with Competing Risks

In this talk, we explore the competing risks framework in survival analysis, where the goal is not only to predict the time until an event occurs but also to account for the possibility of multiple outcomes. Traditional survival analysis models focus on a single event, but competing risks pose a classification challenge that has been less explored. A key limitation of classic competing-risks models lies in the coupling of architecture and loss, which affects scalability. To address these issues, we design a strictly proper, censoring-adjusted, separable scoring rule: because the evaluation is conducted independently for each observation, it allows optimization on a subset of the data. The loss estimates outcome probabilities and enables stochastic optimization for competing risks, which we use to train efficient gradient-boosted trees. Our algorithm, called SurvivalBoost, not only outperforms 12 state-of-the-art models across several metrics on 4 real-life datasets, in both competing-risks and survival settings, but also provides strong calibration, the ability to predict across any time horizon, and faster computation than existing methods. Additionally, we present our Python library, hazardous (https://soda-inria.github.io/hazardous/), which implements SurvivalBoost, a gradient-boosted tree model for competing risks built on top of scikit-learn, scalable and easy to cross-validate. Examples of applying SurvivalBoost in both survival and competing-risks settings are provided.
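As a flavour of the interface, a minimal usage sketch follows, assuming hazardous exposes the estimator as SurvivalBoost with a scikit-learn-style fit/predict interface and a target frame holding "event" and "duration" columns; the synthetic data, target encoding and output layout shown here are assumptions, so check the library documentation for the exact API:

```python
# Illustrative sketch only: the SurvivalBoost constructor arguments,
# target format and predict_cumulative_incidence signature are assumed.
import numpy as np
import pandas as pd
from hazardous import SurvivalBoost

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({"age": rng.normal(60, 10, n),
                  "biomarker": rng.normal(0, 1, n)})
# Competing-risks target: event 0 encodes censoring, 1 and 2 are
# two competing event types.
y = pd.DataFrame({"event": rng.integers(0, 3, n),
                  "duration": rng.exponential(365, n)})

model = SurvivalBoost().fit(X, y)
# Cumulative incidence of each event type at chosen horizons; as a
# scikit-learn estimator, the model also plugs into cross-validation.
times = np.array([30.0, 90.0, 365.0])
cif = model.predict_cumulative_incidence(X, times=times)
```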

Olivier Bouaziz, Université de Lille - Pseudo-Observations and the Super Learner for the Prediction of Time-to-Event Data

Predicting the time to an event of interest from patient attributes is of great interest when analysing medical data. However, time-to-event data usually suffer from right-censoring, which makes it difficult to propose a prediction algorithm. Under a quadratic loss, this problem is equivalent to estimating the conditional Restricted Mean Survival Time (RMST). To that end, we propose a flexible and easy-to-use ensemble algorithm that combines pseudo-observations and the super learner. The classical theoretical results for the super learner are extended to right-censored data, using a new definition of pseudo-observations, the so-called split pseudo-observations. Simulation studies indicate that the split pseudo-observations and the standard pseudo-observations behave similarly even for small sample sizes, suggesting that our theoretical results extend to the super learner based on the standard pseudo-observations. If time allows, the method will be further illustrated on a maintenance dataset and a colon cancer dataset. This talk is based on https://arxiv.org/pdf/2404.17211. It is joint work with Ariane Cwiling and Vittorio Perduca (MAP5, Université Paris Cité).
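To make the pseudo-observation idea concrete, here is a minimal sketch of the standard jackknife construction for the RMST; the split pseudo-observations of the paper refine this construction, and the lifelines-based implementation below is illustrative, not the authors' code:

```python
# Jackknife pseudo-observations for the RMST at horizon tau:
#   theta_i = n * theta_hat - (n - 1) * theta_hat_{-i},
# where theta_hat is the Kaplan-Meier RMST on the full sample and
# theta_hat_{-i} leaves observation i out. Each theta_i then serves as
# a regression target for any supervised learner in the super learner.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

def rmst(durations, events, tau):
    km = KaplanMeierFitter().fit(durations, event_observed=events)
    return restricted_mean_survival_time(km, t=tau)

def pseudo_observations(durations, events, tau):
    n = len(durations)
    full = rmst(durations, events, tau)
    pseudo = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        pseudo[i] = n * full - (n - 1) * rmst(durations[mask], events[mask], tau)
    return pseudo
```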

Adeline Fermanian, Califrais - Dynamic Survival Analysis with Controlled Latent States

We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modelling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. We then show that, under sufficient regularity conditions, our model can be linearized in the signature space, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management.
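For readers unfamiliar with the construction, the two estimators can be sketched as follows, in schematic notation that may differ from the paper's:

```latex
% Latent state Z_t driven by the observed covariate path X_t,
% with a Cox-type intensity read off the state:
\[
  \mathrm{d}Z_t = f_\theta(Z_t)\,\mathrm{d}X_t, \qquad
  \lambda_\theta(t) = \exp\!\big(\ell_\theta(Z_t)\big).
\]
% Under sufficient regularity, the solution map linearizes on the
% path signature, giving the signature-based estimator (CoxSig):
\[
  \lambda_\beta(t) = \exp\!\big(\langle \beta,\, S(X)_{[0,t]} \rangle\big),
\]
% where S(X)_{[0,t]} is the (truncated) signature of the path up to t.
```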

Camila Fernandez, LPSM and Nokia Labs - An Online Learning Approach to Survival Analysis

We introduce an online mathematical framework for survival analysis, allowing real-time adaptation to dynamic environments and censored data. This framework enables the estimation of event time distributions through an optimal second-order online convex optimization algorithm, the Online Newton Step (ONS). This approach, previously unexplored, presents substantial advantages, including explicit algorithms with non-asymptotic convergence guarantees. Moreover, we analyze the selection of the ONS hyperparameters, which depends on the exp-concavity property and has a significant influence on the regret bound. We introduce an adaptive aggregation method that ensures robustness in hyperparameter selection while maintaining fast regret bounds. These findings extend beyond the survival analysis field and are relevant for any setting characterized by poor exp-concavity and unstable ONS. We illustrate these claims with simulation experiments. Additionally, I will present an application of diverse survival analysis tools to the employee attrition prediction problem.
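As a reminder of the core update, here is a minimal sketch of the Online Newton Step in its generic online convex optimization form; the survival-specific losses, the exact projection in the A-norm, and the adaptive aggregation discussed in the talk are not shown:

```python
# Online Newton Step (ONS): at round t, observe data z_t, take the
# gradient g_t of the round-t loss at theta_t, and update
#   A_t = A_{t-1} + g_t g_t^T,  theta_{t+1} = theta_t - A_t^{-1} g_t / gamma,
# followed by a projection onto the feasible set. The step size gamma
# depends on the exp-concavity constant of the losses, which is exactly
# the hyperparameter whose selection the talk analyzes.
import numpy as np

def ons(grad_fn, stream, dim, gamma=0.5, eps=1.0, radius=10.0):
    theta = np.zeros(dim)
    A = eps * np.eye(dim)          # regularized Hessian proxy
    for z in stream:               # observations revealed one at a time
        g = grad_fn(theta, z)      # gradient of the current loss at theta_t
        A += np.outer(g, g)
        theta = theta - np.linalg.solve(A, g) / gamma
        # Euclidean projection onto a ball (exact ONS projects in the
        # A-norm; simplified here for the sketch).
        norm = np.linalg.norm(theta)
        if norm > radius:
            theta *= radius / norm
    return theta
```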

Jean Feydy, Inria Paris (HeKA) - Scalable survival analysis in a French hospital

Since 2021, researchers at Inserm and Inria have had access to drug reimbursement data for the full French population. This may open the door to large-scale studies in pharmacovigilance... but is this data truly accessible and usable for novel methodological work? To answer these questions, I will first present survivalGPU, a re-implementation of the R survival package for the Cox PH model that scales up to millions of patients in seconds. Then, I will discuss the practicalities of getting access to French "cartes vitales" data, from a description of the necessary paperwork to honest feedback on the Health Data Hub platform. Finally, I will discuss our current projects on the topic in the multidisciplinary HeKA team at PariSanté Campus.
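To give an idea of why the Cox partial likelihood vectorizes so well on GPU, here is a simplified PyTorch sketch; ties, strata, weights and the time-dependent features that survivalGPU handles are ignored, so this is an illustration rather than the package's code:

```python
# Negative Cox partial log-likelihood in one vectorized pass: after
# sorting by descending time, every prefix is a risk set, so the inner
# sums reduce to a single logcumsumexp, which is ideal for GPUs.
import torch

def cox_nll(log_hazards, times, events):
    order = torch.argsort(times, descending=True)
    lh, ev = log_hazards[order], events[order]
    log_risk = torch.logcumsumexp(lh, dim=0)   # log-sum over each risk set
    return -((lh - log_risk) * ev).sum() / ev.sum()

# For a linear Cox model, log_hazards = X @ beta; autograd then provides
# exact gradients for Newton-type optimization on millions of patients.
```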

Olivier Lopez, CREST and ENSAE - Survival Analysis and Micro-Reserving in Insurance

For some insurance guarantees, such as medical malpractice or bodily injury, the claim management process may be quite long (up to several years). This means that the amount of a given claim may not be stabilized during this period. Evaluating this amount (especially in the case of large claims) is a key challenge for the insurer, who needs to constitute a reserve able to absorb this loss. Traditional reserving approaches (called "triangle-based") are associated with huge confidence intervals and imprecision, which usually lead to immobilizing too high a proportion of capital, to the disadvantage of both insurer and policyholders. Machine learning and survival analysis offer an opportunity to considerably improve the estimation of the reserve. In this talk, we will go through different methodologies and applications that illustrate this issue.

Stefan Michiels, IGR - Survival Prediction from High-Dimensional Biomarker Data in Clinical Trials

With the development of genomics and targeted therapies, prediction models based on biomarkers are increasingly used for estimating patient survival outcomes and expected treatment benefits from time-to-event data in oncology clinical trials. In this presentation, I will give an overview of some of our methodological developments over the last decade using penalised Cox or machine learning methods. In high-dimensional penalised regression models, minimizing the false discovery rate is of primary importance, while a low false negative rate is a complementary measure, and different penalty terms are explored. Variants of adaptive lasso methods can be used to analyse multi-omics or multi-pathway data, or to correctly identify treatment-by-biomarker interactions. More recent work adapts artificial neural network models to time-to-event data using specific strategies for handling censored observations, and also allows the integration of multi-omics data. Different simulation studies are performed to study the operating characteristics of the methods, which will be illustrated using gene expression data from several breast cancer studies.
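As a concrete starting point for the penalised-regression part, here is a hedged sketch of fitting an elastic-net penalised Cox model on high-dimensional data with scikit-survival, one possible implementation; the adaptive lasso variants discussed in the talk additionally re-weight the penalty per coefficient:

```python
# Elastic-net penalised Cox regression with many more genes than patients.
import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis

rng = np.random.default_rng(0)
n, p = 200, 5000                    # n patients, p gene expression features
X = rng.normal(size=(n, p))
time = rng.exponential(5.0, size=n)
event = rng.random(n) < 0.7         # ~70% observed events
# scikit-survival expects a structured array of (event indicator, time)
y = np.array(list(zip(event, time)), dtype=[("event", "?"), ("time", "<f8")])

model = CoxnetSurvivalAnalysis(l1_ratio=0.9, alpha_min_ratio=0.01)
model.fit(X, y)
# Genes retained at the smallest penalty on the regularization path; the
# false discovery / false negative trade-off is governed by the penalty.
selected = np.flatnonzero(model.coef_[:, -1])
```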

Jean Ogier du Terrail, Owkin - FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings

External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval. However, the main challenge in implementing an ECA lies in accessing real-world or historical clinical trial data. Indeed, regulations that protect patients' rights by strictly controlling data processing often make it difficult to pool data from multiple sources in a central server. To address these limitations, we develop a new method, FedECA, that leverages federated learning (FL) to enable inverse probability of treatment weighting (IPTW) for time-to-event outcomes on separate cohorts without needing to pool data. To showcase the potential of FedECA, we apply it in different settings of increasing complexity, culminating with a real-world use case in which FedECA provides evidence for a differential effect between two drugs that would otherwise have gone unnoticed. By sharing our code, we hope FedECA will foster the creation of federated research networks and thus accelerate drug development.
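To fix ideas, here is a minimal centralized sketch of the estimand FedECA computes: propensity-based IPTW feeding a weighted time-to-event model. The federated orchestration, and the actual FedECA code, are not shown, and the column names below are illustrative:

```python
# Centralized IPTW for time-to-event outcomes: the quantity that FedECA
# reproduces across centres without pooling patient-level data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def iptw_hazard_ratio(df: pd.DataFrame, covariates: list) -> float:
    # 1) Propensity model: P(treated | covariates).
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covariates], df["treated"])
          .predict_proba(df[covariates])[:, 1])
    # 2) Inverse probability of treatment weights.
    w = np.where(df["treated"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    # 3) Weighted Cox model on the treatment indicator alone; robust
    #    variance accounts for the estimated weights.
    cph = CoxPHFitter().fit(
        df.assign(w=w)[["time", "event", "treated", "w"]],
        duration_col="time", event_col="event",
        weights_col="w", robust=True)
    return float(np.exp(cph.params_["treated"]))
```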

Geneviève Robin, Owkin - A Statistical Learning Take on the Concordance Index for Survival Analysis

The introduction of machine learning (ML) techniques to the field of survival analysis has increased the flexibility of modelling approaches, and ML-based models have become state-of-the-art. These models optimize their own cost functions, and their performance is often evaluated using the concordance index (C-index). From a statistical learning perspective, it is therefore an important problem to analyze the relationship between the optimizers of the C-index and those of the ML cost functions. We address this issue by providing C-index Fisher-consistency results and excess risk bounds for several of the cost functions commonly used in survival analysis. We identify conditions under which they are consistent, in the form of three nested families of survival models. We also study the general case where no model assumption is made and present a new, off-the-shelf method that is shown to be consistent with the C-index, although computationally expensive at inference. Finally, we perform limited numerical experiments with simulated data to illustrate our theoretical findings.
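For reference, here is a from-scratch sketch of Harrell's C-index as used to evaluate these models; a pair is comparable when the earlier time is an observed event, and ties in times or scores are ignored for brevity:

```python
# Harrell's concordance index: among comparable pairs (i, j) with
# T_i < T_j and the event observed for i, count how often the model
# assigns the higher risk score to i.
import numpy as np

def c_index(times, events, risk_scores):
    concordant, comparable = 0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                  # censored first: pair not comparable
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                concordant += risk_scores[i] > risk_scores[j]
    return concordant / comparable
```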

Charlotte Voinot, Inria Montpellier (Premedical) - Estimation of the Average Treatment Effect (ATE) in Causal Survival: Comparison, Applications and Practical Recommendations

Estimating the Average Treatment Effect (ATE) is one of the fundamental tasks in causal inference, aimed at assessing the causal impact of a treatment on an outcome variable. Causal survival analysis is at the heart of this approach, seeking to evaluate the effect of a treatment on patient survival over time. However, despite the abundant literature on causal survival analysis, the use of Cox methods remains predominant for assessing this effect. The main objective of this research is to estimate the causal effect of a treatment using survival data that are not necessarily derived from randomized trials. We aim to provide users with practical recommendations in the face of the multitude of available methods, and to highlight the advantages and differences compared with the classical associational approaches still widely used, such as the hazard ratio, to measure the impact of a treatment. The impact of the choice of variables is also discussed. To this end, we will begin by presenting the state of the art in causal survival methods, describing the identifiability assumptions and the main estimators, including weighting, regression and doubly/triply robust approaches. An extensive simulation study will then be carried out to compare the different estimators, identify their preferred regimes and illustrate their theoretical properties on finite samples. Finally, we will examine how the addition of certain variables to the censoring, survival or treatment models can impact the variance of the estimators. Results will be discussed with a particular focus on the validity of the estimators and their robustness to misspecification on finite-sample simulated datasets, leading to practical recommendations in non-randomized settings. This is joint work with Julie Josse (Inria) and Bernard Sebastien (SANOFI).
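As a reference point, the survival estimand and a basic weighting estimator can be written as follows, in generic notation rather than the talk's:

```latex
% Target: difference of counterfactual survival probabilities at horizon t,
\[
  \tau(t) = \mathbb{P}\big(T(1) > t\big) - \mathbb{P}\big(T(0) > t\big),
\]
% identified under unconfoundedness, positivity and independent censoring,
% e.g. by IPTW combined with inverse probability of censoring weighting:
\[
  \hat{\tau}(t) = \frac{1}{n} \sum_{i=1}^{n}
  \left[ \frac{A_i}{\hat{e}(X_i)} - \frac{1 - A_i}{1 - \hat{e}(X_i)} \right]
  \frac{\mathbf{1}\{\widetilde{T}_i > t\}}{\hat{G}(t \mid A_i, X_i)},
\]
% with \hat{e} the propensity model and \hat{G} the censoring model.
% Doubly/triply robust estimators add an outcome model so that consistency
% holds when at least one of the working models is well specified.
```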

Hugo Yèche, ETH Zürich - Advancing Deep Learning-based Early Warning System in the Intensive Care Unit with Dynamic Survival Analysis

In the intensive care unit, clinicians have seen the emergence of machine learning-based Early Warning Systems (EWS) to assist them in patient monitoring. Such systems aim to predict specific adverse events, such as organ failures, before they occur. At their core, EWSs rely on models of a cumulative failure function at a predefined horizon. In the literature, such timestep-level estimators are typically trained by maximum likelihood estimation of this cumulative failure function. In this talk, we explore the benefits of an alternative approach: relying on a dynamic survival model, trained as such, to estimate the desired cumulative failure function. First, we discuss how to effectively bridge the gap to existing models in terms of timestep-level risk estimation. Then, we show how such a survival model, once integrated into an EWS, is beneficial when we leverage the additional risk estimates it provides beyond the predefined horizon.
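The link between the two training objectives can be summarized as follows, as a schematic rather than the talk's exact notation:

```latex
% Timestep-level EWS target at a fixed horizon H, given the history
% \mathcal{H}_t of a patient still event-free at time t:
\[
  r_t = \mathbb{P}\big(t < T \le t + H \mid T > t,\ \mathcal{H}_t\big)
      = 1 - \frac{S(t + H \mid \mathcal{H}_t)}{S(t \mid \mathcal{H}_t)}.
\]
% A dynamic survival model of S(. | H_t) therefore recovers the cumulative
% failure probability at horizon H, and simultaneously at every other
% horizon, which is the extra signal leveraged inside the EWS.
```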

Organising Committee

Judith Abecassis, Inria Paris-Saclay (SODA)

Julie Alberge, Inria Paris-Saclay (SODA)

Linus Bleistein, Inria Paris (HeKA) and Inria Montpellier (Premedical)

Agathe Guilloux, Inria Paris (HeKA)

Julie Josse, Inria Montpellier (Premedical)