News Release

Data driven model discovery and interpretation for CAR T-cell killing using sparse identification and latent variables

Peer-Reviewed Publication

College of Charleston

Dynamical systems modeling is one of the most successfully implemented methodologies in mathematical oncology (1). Applications of these model-first approaches have led to important insights in fundamental cancer biology as well as the planning and tracking of treatment response for patient cohorts (2–9). Simultaneously, the last twenty years have seen explosive growth in the study and application of data-driven methods. These data-first approaches, initially implemented as machine learning methods for imaging and genomics analyses, have seen much success (10, 11). However, such approaches are often limited to classification problems and fall short when the intention is to identify and validate mathematical models of the underlying dynamics. Recent efforts by us and others have aimed to develop methodologies that bridge these model-first and data-first approaches (12–14).

In this work, we combine the methods of latent variable discovery and sparse identification of nonlinear dynamics (SINDy) (15–17) to analyze experimental in vitro cell killing assay data for chimeric antigen receptor (CAR) T-cells and glioblastoma cancer cells (18). These experimental data, featuring high temporal resolution, offer a unique opportunity to conduct an in situ test of the SINDy model discovery method. Interpretation of the discovered SINDy model is conducted under the expectation of a predator-prey interaction in which the cancer cells function as the prey and the CAR T-cells as the predator (19).

Predator-prey systems are a broad class of ordinary differential equations (ODEs) that aim to characterize changes in populations between two or more groups of organisms in which at least one survives via predation on another. Originally applied to the study of plant herbivory (20) and fishery monitoring (21) in the early 20th century, predator-prey models have since become a workhorse of ecology, evolutionary biology, and most recently mathematical oncology (19, 22). Importantly, predator-prey models underpin much of the computational modeling of CAR T-cell killing, particularly in the context of in vitro cell killing assays (7, 23). A prominent example is the CAR T-cell Response in GliOma (CARRGO) model, which characterizes the in vitro interactions between CAR T-cells and glioma cells (18). The CARRGO model has shed light on the underlying biological mechanisms of action (18, 23) and has informed effective dosing strategies both for combination CAR T-cell and targeted radionuclide therapy (24) and for CAR T-cell therapy in combination with the anti-inflammatory steroid dexamethasone (25).
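To make the predator-prey structure concrete, the following is a minimal sketch of a generic two-population system in which cancer cells (prey, x) grow logistically and are killed on contact with CAR T-cells (predator, y). It is a textbook form chosen for illustration and is not claimed to reproduce the CARRGO equations of reference (18); the parameter names (rho, K, kappa1, kappa2, theta), their values, and the fixed-step Runge-Kutta integrator are all assumptions.

```python
def predator_prey_rhs(state, rho, K, kappa1, kappa2, theta):
    """Right-hand side of an illustrative predator-prey ODE.

    x: prey (cancer cells), y: predator (CAR T-cells).
    All parameter names are hypothetical, chosen for this sketch only.
    """
    x, y = state
    dx = rho * x * (1.0 - x / K) - kappa1 * x * y  # logistic growth minus killing
    dy = kappa2 * x * y - theta * y                # stimulation by prey minus death
    return (dx, dy)


def rk4_step(f, state, dt, *params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = f(state, *params)
    k2 = f(shifted(k1, 0.5 * dt), *params)
    k3 = f(shifted(k2, 0.5 * dt), *params)
    k4 = f(shifted(k3, dt), *params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, y0, t_end, dt, params):
    """Return the list of (x, y) states from t = 0 to t_end in steps of dt."""
    n_steps = int(round(t_end / dt))
    trajectory = [(x0, y0)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(predator_prey_rhs, trajectory[-1], dt, *params))
    return trajectory


# Illustrative run in arbitrary units: (rho, K, kappa1, kappa2, theta)
example_params = (0.5, 1.0, 1.0, 0.5, 0.1)
trajectory = simulate(x0=0.5, y0=0.1, t_end=400.0, dt=0.05, params=example_params)
print("final prey, predator:", trajectory[-1])
```

Varying the initial ratio y0/x0 in such a sketch mimics, very loosely, changing the effector-to-target ratio of a killing assay.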

Despite the success of the CARRGO model, it is limited in the scope of phenomena that it can capture with regard to the precise interactions between the CAR T-cells and glioma cells. In this work, we use the SINDy modeling framework to incorporate important extensions to the CARRGO model. These extensions are: predator growth that is dependent on the density of prey, also known as a functional response (26, 27); individual predator and prey growth that saturates at some maximum value (logistic growth) (18), or has a population threshold below which collapse occurs (the Allee effect) (28, 29); and predator-prey interactions in which one or two CAR T-cells are bound to a single cancer cell at once, referred to as single or double binding, respectively (23, 30). Other efforts to extend CAR T-cell modeling have looked at fractional order derivatives (31) and stochastic dynamics (32) in the context of CAR T-cell treatment for viral infections, specifically coronaviruses. Our treatment focuses on integer order derivatives and deterministic dynamics.
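For orientation, standard textbook forms of some of the growth and functional-response terms named above are sketched here. These are assumptions for illustration and may differ from the exact forms used in the paper; x denotes a population density, and the symbols rho (growth rate), K (carrying capacity), A (Allee threshold), a (attack rate), and h (handling time) are introduced only for this sketch. The single- and double-binding terms are derived in references (23, 30) and are not reproduced here.

```latex
\begin{align}
\text{Logistic growth:} \quad
  & \frac{dx}{dt} = \rho x \left(1 - \frac{x}{K}\right) \\
\text{Strong Allee effect:} \quad
  & \frac{dx}{dt} = \rho x \left(\frac{x}{A} - 1\right)\left(1 - \frac{x}{K}\right),
    \qquad 0 < A < K \\
\text{Holling type I response:} \quad
  & g(x) = a x \\
\text{Holling type II response:} \quad
  & g(x) = \frac{a x}{1 + a h x}
\end{align}
```

Under the Allee form, growth is negative for x below A, which is the collapse threshold described in the text; the type II response saturates at 1/h as the prey density grows.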

An ever-present challenge for quantitative biologists is fitting a proposed model to experimental data, also known as parameter estimation or model inference. On one hand, quantitative biologists seek models that capture as much biological realism and complexity as possible. On the other hand, increasing model complexity increases the computational challenge of accurately, confidently, and expediently determining model parameter values. The task is further complicated if a researcher chooses to compare competing or complementary models (33, 34). An alternative approach, examined in this paper, is to leverage newly developed methods rooted in data science and machine learning that identify the strength of individual mathematical terms as candidates for an explanatory model. These methods are often referred to as dynamic mode decomposition, symbolic regression, or sparse identification.

Dynamic mode decomposition (DMD) is a data-driven technique that interrogates time-series data by performing a singular value decomposition (SVD) on carefully structured matrices of the given data (13, 35). In this formalism, the orthonormal basis vectors generated by the SVD serve as linear generators of the system dynamics, so that forward prediction can be performed without a known underlying mathematical model. In contrast, SINDy identifies the specific mathematical terms that give rise to the observed dynamics governed by ordinary and partial differential equation models (15). SINDy achieves this by regressing experimental data onto a high-dimensional library of candidate model terms, and it has proven successful in climate modeling (36), fluid mechanics (37), and control theory (38). Since the initial publication of SINDy, several extensions have been studied, including: discovery of rational ordinary differential equations (39, 40); robust implementation with under-sampled data (41) or excessive noise (42); and incorporation of physics-informed neural networks when particular symmetries are known to exist (43).
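The two ideas can be sketched in a few lines of NumPy, under strong simplifying assumptions. The first function builds a DMD-style linear operator from a truncated SVD of snapshot data. The remaining functions perform a SINDy-style regression onto a small polynomial library using sequentially thresholded least squares, one common sparse-regression choice that is not necessarily the optimizer used in this study. The function names, the library terms, the threshold, and the synthetic noise-free data are all hypothetical; real assay data would additionally require numerical differentiation and noise handling.

```python
import numpy as np


def dmd_operator(snapshots, rank):
    """Best-fit linear map A with snapshots[:, 1:] ~ A @ snapshots[:, :-1],
    built from a rank-truncated SVD of the earlier snapshots."""
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    A_reduced = U.T @ X2 @ Vh.T / s  # U^T X2 V S^{-1}, shape (rank, rank)
    return U @ A_reduced @ U.T       # lift back to state coordinates


def build_library(x, y):
    """Candidate terms up to second order in two variables."""
    names = ["1", "x", "y", "x^2", "xy", "y^2"]
    theta = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    return theta, names


def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, then refit on the surviving terms, and repeat."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi


# DMD on synthetic snapshots generated by a known linear map (illustrative)
A_true = np.array([[0.9, -0.2], [0.2, 0.9]])
snapshots = np.empty((2, 30))
snapshots[:, 0] = [1.0, 0.0]
for k in range(29):
    snapshots[:, k + 1] = A_true @ snapshots[:, k]
A_dmd = dmd_operator(snapshots, rank=2)

# SINDy-style regression on synthetic, noise-free predator-prey derivatives
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, 200)
y = rng.uniform(0.1, 1.0, 200)
dxdt = np.column_stack([
    0.5 * x - 0.5 * x**2 - 1.0 * x * y,  # logistic prey growth minus killing
    0.5 * x * y - 0.1 * y,               # predator stimulation minus death
])
theta, names = build_library(x, y)
xi = stlsq(theta, dxdt, threshold=0.05)
for k, label in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{xi[j, k]:+.3f} {names[j]}" for j in range(len(names)) if xi[j, k] != 0.0]
    print(label, "=", " ".join(terms))
```

The nonzero rows of `xi` are the discovered model terms, so a change in which rows survive thresholding corresponds to a change in model structure rather than only in parameter values.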

In its original and subsequent implementations, the CARRGO model proved useful in quantifying CAR T-cell killing dynamics when treating glioblastoma. Inferences of the underlying biological dynamics were made by examining how model parameter values changed along gradients of effector:target (E:T) ratios or as a function of other combination therapy concentrations. This is in direct contrast to the SINDy methodology, where the discovery of different model terms provides insight into the underlying biological dynamics as a result of variation along the E:T gradient. Here we compare these two modeling frameworks on the same data set to provide further insight into the trade-offs of data-first versus model-first approaches.

In this paper we use our experimental data to test these and other aspects of the DMD and SINDy frameworks. In Section 2.2 we introduce the families of models that are anticipated to be simultaneously biologically relevant and identifiable by SINDy, and we introduce a new approach to performing SINDy-based model inference. In Section 2.3.1 we present the latent variable analysis based on DMD that is used to generate the time-series CAR T-cell trajectories based on those of the cancer cells and the known boundary values for the CAR T-cells. In Section 2.3.2 we introduce the SINDy methodology in the particular context of our application. Results of our approach are presented in Section 3, where we (1) highlight how the discovered models vary as a result of different initial conditions in the cancer cell and CAR T-cell populations and (2) examine how well the models discovered in this data-first approach compare to a typical model-first approach in characterizing the experimental data. In Section 4 we demonstrate how our results can guide experimental design to validate the predictions made by the discovered models, and we elaborate on some of the challenges encountered in this study.

