Data-Driven Phenotyping

IEEE Pulse
Author(s): Jeremy Orr, Atul Malhotra

Sleep apnea is a multifactorial disease with a complex underlying physiology, which includes the chemoreflex feedback loop controlling ventilation. Instability of this feedback loop is one of the key factors contributing to a number of sleep disorders, including Cheyne–Stokes respiration and obstructive sleep apnea (OSA). A major limitation of the conventional characterization of this feedback loop is the need for labor-intensive and technically challenging experiments. In recent years, a number of techniques that bring together concepts from signal processing, control theory, and machine learning have proven effective for estimating the overall loop gain of the respiratory control system (see Figure 1) and its major components, chemoreflex gain and plant gain, from noninvasive time-series measurements of ventilation and blood gases. The purpose of this article is to review the existing model-based techniques for phenotyping of sleep apnea, along with some emerging methodologies, under a unified modeling framework known as graphical models. The hope is that the graphical model perspective provides insight into the future development of techniques for model-based phenotyping. Ultimately, such approaches have major clinical relevance, since strategies that manipulate physiological parameters may reduce sleep apnea severity. For example, oxygen therapy or drugs such as acetazolamide may be used to reduce chemoreflex gain, which may improve sleep apnea in selected patients.

FIGURE 1: A schematic diagram of the closed-loop respiratory control system. The plant represents the gas-exchange system. The inputs to the plant are the level of ventilation (VE) and the partial pressures of blood gases in venous blood (taken as constant), and the output is the arterial gas tension (PaCO2 and PaO2). The delay term represents the circulatory time delay between the lungs and chemoreceptors and the delay associated with the mixing of CO2 and O2 with the existing levels in the heart and arteries. The controller represents the aggregate response of the respiratory pattern generator to its inputs (including chemoreceptor outputs, higher cognitive inputs, wakefulness/sleep-stage-related drives, etc.). The (frequency-dependent) product of the various components around the loop (plant, delay, and controller) is known as the loop gain of the system. A high loop gain describes a system that is intrinsically unstable, whereas a low loop gain describes a more stable system.

The Chemoreflex Feedback Loop

A simplified model of the interaction among spontaneous fluctuations in breath-to-breath values of ventilation ($\dot{V}_E$), partial pressure of arterial $\mathrm{CO_2}$ ($P_{a\mathrm{CO_2}}$), and $\mathrm{O_2}$ ($P_{a\mathrm{O_2}}$) can be represented by the following matrix equation:
$$\mathbf{y}_n = \sum_{k=1}^{p} \mathbf{A}_k \mathbf{y}_{n-k} + \mathbf{w}_n, \qquad (1)$$
where $\mathbf{y}_n = [\dot{V}_E[n],\ P_{a\mathrm{CO_2}}[n],\ P_{a\mathrm{O_2}}[n]]^T$ collects the three variables at breath $n$, $\mathbf{w}_n$ is a vector of Gaussian-distributed zero-mean noise terms with covariance $\boldsymbol{\Sigma}$, and $\mathbf{A}_k$ is an appropriately constrained coefficient matrix representing the relationships among the modeled variables at time lags $k = 1, \ldots, p$. The above equation is a structured vector autoregressive (VAR) model and arises as a result of discretizing a continuous differential equation model of the gas exchange process, under small-signal theory approximations [1]. We showed in a previous work [2] that the parameters of this model can be used to describe the pairwise interactions among the model components (i.e., the controller and plant gains), and to derive the frequency-domain stability characteristics (i.e., the loop gain) of the underlying system.
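
As a rough illustration of how pairwise gains and a loop gain can be read off VAR coefficients, the following sketch treats a reduced first-order, two-variable model (ventilation and arterial CO2 only). The function names, the variable ordering, and the example coefficients are assumptions made for illustration; this is not the procedure of [2], which handles higher orders and more variables.

```python
import numpy as np

def path_gain(a_cross, a_self, freqs):
    """Frequency response of one path: a_cross * z^-1 / (1 - a_self * z^-1).

    freqs are in cycles per breath, so z^-1 = exp(-j * 2 * pi * f).
    """
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs, dtype=float))
    return a_cross * z_inv / (1.0 - a_self * z_inv)

def loop_gain(A, freqs):
    """Controller, plant, and loop-gain magnitudes for a first-order model.

    A is the 2x2 coefficient matrix with variables ordered (assumed here) as
    [ventilation, arterial CO2]. The loop gain is the product of the two paths.
    """
    controller = path_gain(A[0, 1], A[0, 0], freqs)  # CO2 -> ventilation
    plant = path_gain(A[1, 0], A[1, 1], freqs)       # ventilation -> CO2
    return np.abs(controller), np.abs(plant), np.abs(controller * plant)

# Hypothetical coefficients: more CO2 raises ventilation, more ventilation lowers CO2.
A_example = np.array([[0.2, 0.5],
                      [-0.4, 0.6]])
freqs = np.linspace(0.0, 0.5, 6)
ctrl_mag, plant_mag, lg_mag = loop_gain(A_example, freqs)
```

Under this simplification, a loop-gain magnitude that approaches or exceeds one at some frequency would point to a less stable system, in line with the interpretation given in the Figure 1 caption.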

Graphical Models for Time Series

The directed graphical model formalism provides a unifying framework for modeling complex evolving interactions among random variables. Graphical model representations of three time-series models are shown in Figure 2. The random variables are typically represented as the nodes of a graph, and their conditional dependencies are captured by directed arrows. In general, the encoded relationships can be nonlinear and may include latent variables of both discrete and continuous type. In the following sections, we discuss how classical time-series models, such as the VAR, time-varying VAR, and switching VAR models, along with many other time-series modeling techniques, can be represented as graphical models, thus lending themselves to standard inference and learning algorithms designed for graphs [3].

FIGURE 2: A graphical model representation of the time-series models: (a) a first-order VAR model with static parameters θ, (b) a TVAR model with dynamic parameters θ1, …, θN, and (c) an SVAR model, which includes a collection of J VAR models, with the Markov transition matrix Z. Each node represents a random variable, and the lack of an edge represents a conditional independence relationship among the variables. The time-series samples y1, …, yN are observed, and the remaining variables are latent.

Vector Autoregressive Modeling

The application of autoregressive modeling to the identification of the respiratory feedback loop goes back to the pioneering work of Khoo et al. [1] in the 1990s. More recently, we showed that the technique could be generalized to identify transfer path functions and to assess the stability properties of a system involving multiple interacting variables in a feedback loop [2]. Figure 2(a) depicts the graphical model representation of a first-order VAR model of a sequence of $N$ observations from a time series $\mathbf{y}_1, \ldots, \mathbf{y}_N$, with the set of VAR coefficients lumped into the parameter $\theta$. Given a $p$th-order VAR model, one may exploit Bayes' rule [which relates the probability of $A$ conditioned on knowing $B$ to their joint probability, $p(A \mid B) = p(A, B)/p(B)$] to write the joint probability of the observations as $p(\mathbf{y}_n, \mathbf{y}_{n-1}, \ldots, \mathbf{y}_{n-p} \mid \theta) = p(\mathbf{y}_n \mid \mathbf{y}_{n-1}, \ldots, \mathbf{y}_{n-p}, \theta)\, p(\mathbf{y}_{n-1}, \ldots, \mathbf{y}_{n-p} \mid \theta)$. Recursive application of Bayes' rule yields the joint probability of the $N$ time-series samples:
$$p(\mathbf{y}_1, \ldots, \mathbf{y}_N \mid \theta) = p(\mathbf{y}_1, \ldots, \mathbf{y}_p \mid \theta) \prod_{n=p+1}^{N} p(\mathbf{y}_n \mid \mathbf{y}_{n-1}, \ldots, \mathbf{y}_{n-p}, \theta). \qquad (2)$$
A point estimate of the model parameters can now be made by maximizing (2) with respect to $\theta$. Although, in the case of the Gaussian likelihood model, this optimization problem simplifies to a set of least-squares equations, the graphical model formalism allows for inference and learning of model parameters under more general distributions. Moreover, graphical models provide a natural and intuitive tool for formulating variations on the classic methods as well as constructing more complex time-series models.
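
The Gaussian least-squares case mentioned above can be sketched as follows. This is a minimal, unconstrained example on simulated data: the helper names `simulate_var` and `fit_var_least_squares` are hypothetical, the coefficient values are made up, and the structural constraints on the coefficient matrices used in [2] are not imposed.

```python
import numpy as np

def simulate_var(A_list, cov, n_samples, rng):
    """Simulate y[n] = sum_k A_k y[n-k] + w[n] with zero-mean Gaussian noise."""
    p = len(A_list)
    d = A_list[0].shape[0]
    y = np.zeros((n_samples + p, d))          # first p rows are zero initial conditions
    noise = rng.multivariate_normal(np.zeros(d), cov, size=n_samples + p)
    for n in range(p, n_samples + p):
        y[n] = sum(A_list[k] @ y[n - 1 - k] for k in range(p)) + noise[n]
    return y[p:]

def fit_var_least_squares(y, p):
    """Conditional Gaussian maximum-likelihood fit of a VAR(p) by least squares."""
    n_samples, d = y.shape
    # Row i of X holds [y[n-1], ..., y[n-p]] for n = p + i.
    X = np.hstack([y[p - k:n_samples - k] for k in range(1, p + 1)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # Y ~ X @ B, B has shape (p*d, d)
    A_hat = [B[k * d:(k + 1) * d].T for k in range(p)]  # block k is A_{k+1} transposed
    resid = Y - X @ B
    cov_hat = resid.T @ resid / len(Y)         # ML estimate of the noise covariance
    return A_hat, cov_hat

rng = np.random.default_rng(0)
A_true = [np.array([[0.5, -0.2], [0.3, 0.4]])]   # illustrative, stable coefficients
y_sim = simulate_var(A_true, np.eye(2), 5000, rng)
A_hat, cov_hat = fit_var_least_squares(y_sim, p=1)
```

With enough samples the recovered `A_hat[0]` should lie close to `A_true[0]`; replacing the Gaussian likelihood with another distribution would require a different estimator, which is where the more general graphical-model machinery comes in.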

Time-Varying Vector Autoregressive

It is known that the ventilatory feedback loop parameters can vary over time due to factors such as sleep-state-related changes in the chemical control system, upper-airway mechanics, and other behavioral factors [1]. Assuming that the system parameters change at a sufficiently slower rate than the dynamics of the time series, we may use a time-varying VAR (TVAR) model to describe a nonstationary ventilatory time series and the associated loop gain (and its components, the controller and plant gains). Figure 2(b) depicts the graphical model representation of a TVAR model. The model belongs to the class of linear dynamical systems (LDSs), and the celebrated Kalman filter and Rauch–Tung–Striebel (RTS) smoother can be used to learn its time-varying parameters. The forward and backward recursions of the Kalman filter and RTS smoother are a special case of a broader class of belief propagation (or message-passing) algorithms on directed graphical models [4]. Moreover, one may use the expectation-maximization technique to learn the optimal learning rate of the TVAR model [7].
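
A minimal sketch of the forward (filtering) pass for such a model is given below, assuming a first-order TVAR whose coefficients follow a random walk. The function name and the drift and noise variances `q` and `r` are illustrative assumptions; the RTS backward pass and the expectation-maximization tuning mentioned above are omitted.

```python
import numpy as np

def tvar_kalman_filter(y, q=1e-5, r=1.0):
    """Track first-order VAR coefficients A_n with a random-walk Kalman filter.

    State: vec(A_n) in row-major order, with vec(A_n) = vec(A_{n-1}) + drift.
    Observation: y[n] = A_n @ y[n-1] + w[n].
    q and r are assumed drift and observation-noise variances.
    """
    n_samples, d = y.shape
    theta = np.zeros(d * d)              # filtered mean of vec(A_n)
    P = np.eye(d * d)                    # filtered covariance of vec(A_n)
    Q = q * np.eye(d * d)
    R = r * np.eye(d)
    A_track = np.zeros((n_samples, d, d))
    for n in range(1, n_samples):
        H = np.kron(np.eye(d), y[n - 1][None, :])  # so that H @ vec(A) = A @ y[n-1]
        P_pred = P + Q                              # predict (mean is unchanged)
        S = H @ P_pred @ H.T + R                    # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
        theta = theta + K @ (y[n] - H @ theta)      # correct with the innovation
        P = (np.eye(d * d) - K @ H) @ P_pred
        A_track[n] = theta.reshape(d, d)
    return A_track

# Simulated series whose (made-up) coefficients change abruptly halfway through.
rng = np.random.default_rng(1)
A_first = np.array([[0.5, 0.0], [0.0, 0.5]])
A_second = np.array([[-0.5, 0.2], [0.1, 0.3]])
y_tv = np.zeros((4000, 2))
for n in range(1, 4000):
    A_n = A_first if n < 2000 else A_second
    y_tv[n] = A_n @ y_tv[n - 1] + rng.standard_normal(2)
A_track = tvar_kalman_filter(y_tv, q=1e-5, r=1.0)
```

A larger `q` lets the estimates follow faster parameter changes at the cost of noisier tracks, which is the trade-off that a learned "learning rate" would settle.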
Notably, the stability analysis of such time-varying systems has been a subject of extensive research within the control theory literature. Briefly, the traditional Lyapunov asymptotic stability (LAS) analysis of linear time-invariant systems has been replaced by the more refined concept of finite-time stability (FTS). While LAS deals with the behavior of a system over a sufficiently long (in principle, infinite) time interval, FTS has been used to study the system behavior within a finite (possibly short) interval and, therefore, is more applicable to the study of systems with threshold mechanisms (for instance, the chemoreflex feedback loop, which includes an apneic threshold for arterial CO2) [5].

Switching Vector Autoregressive

An alternative approach to modeling sleep-dependent changes in the chemoreflex system variables is to utilize a switching VAR (SVAR) model. Figure 2(c) is a graphical model representation of the SVAR model, which includes N discrete latent switching variables, one per time step, each modeling the probability that the nth sample belongs to any one of the J models. Physiologically, these latent variables could be driven by changes in sleep stage, body position, sensor dropouts, etc. Even in the absence of this side information, however, inference in graphical models allows for the identification of the most likely setting of these latent causes across time.
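
To make the inference over switching variables concrete, the sketch below runs a forward (filtering) pass for an SVAR model in which the J first-order VAR models, a shared noise covariance, and the Markov transition matrix are assumed known. The function name and all numerical values are hypothetical; in practice the model parameters would be learned as well, and a backward pass would give smoothed estimates.

```python
import numpy as np

def svar_forward_filter(y, A_models, cov, Z, prior):
    """Filtered mode probabilities for J known first-order VAR models.

    Z[i, j] is the probability of moving from mode i to mode j.
    Returns alpha with alpha[n, j] = p(mode at step n is j | y[0..n]).
    """
    n_samples, _ = y.shape
    J = len(A_models)
    cov_inv = np.linalg.inv(cov)
    alpha = np.zeros((n_samples, J))
    alpha[0] = prior
    for n in range(1, n_samples):
        # One-step prediction residual of y[n] under each candidate model, shape (J, d).
        resid = np.array([y[n] - A @ y[n - 1] for A in A_models])
        # Gaussian log-likelihoods; the shared normalizing constant cancels out.
        loglik = -0.5 * np.einsum("jd,de,je->j", resid, cov_inv, resid)
        pred = Z.T @ alpha[n - 1]                    # predicted mode probabilities
        post = pred * np.exp(loglik - loglik.max())  # shift by max for stability
        alpha[n] = post / post.sum()
    return alpha

# Two made-up regimes: the first 200 samples follow model 0, the next 200 model 1.
rng = np.random.default_rng(2)
A_models = [0.8 * np.eye(2), -0.8 * np.eye(2)]
true_mode = np.repeat([0, 1], 200)
y_sw = np.zeros((400, 2))
for n in range(1, 400):
    y_sw[n] = A_models[true_mode[n]] @ y_sw[n - 1] + rng.standard_normal(2)
Z = np.array([[0.98, 0.02], [0.02, 0.98]])
alpha = svar_forward_filter(y_sw, A_models, np.eye(2), Z, np.array([0.5, 0.5]))
decoded_mode = alpha.argmax(axis=1)
```

Here `decoded_mode` gives the most probable model at each step given the data up to that step, a simple stand-in for identifying the latent causes without side information such as sleep stage.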
The models discussed so far are only a subset of a rich class of time-series models. Other important information, such as the quality of measured signals and the influence of latent and observed variables (such as arousals and the concept of wakefulness drive), can be conveniently incorporated into the graphical model of ventilatory time series. Other latent switching models include hidden Markov models, switching LDSs, and their nonparametric analogs, in which the dimension of the hidden state or the number of modes is also determined as part of inference and learning [6].

Maximum-Likelihood Learning Versus Outcome Discriminative Learning

There are a number of exact and approximate algorithms for inference and learning in graphical models, with the objective of maximizing the data likelihood. These approaches include expectation propagation, sequential Markov chain Monte Carlo methods, and variational Bayesian inference, which provide full marginal distributions over the model parameters [3], [4]. More recently, Nemati et al. introduced a new class of outcome-discriminative (supervised) algorithms for learning “phenotypic” patterns in multivariate time series [8]. Figure 3 presents a schematic diagram of a dynamic neural-network representation of the SVAR model of Figure 2(c), augmented with a neural-network-based classification layer. Given the representation of Figure 3, one may learn the marginal distributions over the switching variables and the model parameters using the standard error backpropagation technique for neural networks. In contrast to the standard maximum-likelihood techniques, here the objective is to find time-series patterns that maximally separate two patient cohorts and, therefore, define cohort-specific phenotypic time-series dynamics.

FIGURE 3: The dynamic neural network (DNN) analog representation of an SVAR model, with an added neural network classifier layer (in purple). The DNN is constructed by unrolling the graphical model representation of Figure 2(c), both in time and in inference step (superscripts f and s denote the inferred filtered and smoothed state variables, respectively). An efficient two-pass algorithm allows for learning the time-series (phenotypic) dynamics that are most predictive of outcomes of interest (e.g., normal versus apneic). A total of J dynamical behaviors (or chemoreflex models) can be learned on a cohort of patients’ time series (indexed by i) from an overnight polysomnography study.

Conclusions and Perspective

The graphical model formalism provides a flexible framework for the development of time-series models of the chemoreflex feedback loop. Additionally, complex interactions among ventilatory variables and the sleep-arousal dynamics, as well as sensor noise and artifacts, can be encoded into the structure of the graph. Stability analysis of the time-varying and switching models of the chemoreflex control system [5] provides unique opportunities for future research, with applications to noninvasive assessment of ventilatory instability in patients with congestive heart failure or OSA. More importantly, identification of the mechanisms responsible for system instability in such patients would enable clinicians to target particular therapies on a personalized basis. For example, interventions that specifically target the individual components of loop gain (e.g., supplemental oxygen or acetazolamide to adjust chemosensitivity) may provide selected patients with attractive alternatives to currently available treatments. Phenotyping these patients in an efficient and reliable manner is essential and can be facilitated by the use of graphical models in sleep apnea.
 

References

  1. M. C. K. Khoo, Physiological Control Systems: Analysis, Simulation, and Estimation. Piscataway, NJ: Wiley IEEE Press, 2000.
  2. S. Nemati, B. A. Edwards, S. A. Sands, P. J. Berger, A. Wellman, G. C. Verghese, A. Malhotra, and J. P. Butler, “Model-based characterization of ventilatory stability using spontaneous breathing,” J. Appl. Physiol., vol. 111, no. 1, pp. 55–67, 2011.
  3. M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Found. Trends Mach. Learn., vol. 1, no. 1–2, pp. 1–305, 2008.
  4. K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press, 2012.
  5. F. Amato, R. Ambrosino, M. Ariola, and C. Cosentino, “Finite-time stability of linear time-varying systems with jumps,” Automatica, vol. 45, no. 5, pp. 1354–1358, 2009.
  6. E. Fox, E. B. Sudderth, M. I. Jordan, and A. Willsky, “Bayesian nonparametric inference of switching dynamic linear models,” IEEE Trans. Signal Processing, vol. 59, no. 4, pp. 1569–1585, 2011.
  7. S. Nemati, A. Wellman, B. A. Edwards, S. A. Sands, and A. Malhotra, “Model-based characterization of ventilatory stability during spontaneous breathing: A human study,” in Proc. B70. Sleep Disordered Breathing: Pathophysiology, May 1, 2012, pp. A3609–A3609.
  8. S. Nemati, L. H. Lehman, and R. P. Adams, “Learning outcome-discriminative dynamics in multivariate physiological cohort time series,” in Proc. 35th Annu. Int. Conf. IEEE Engineering Medicine Biology Society, 2013, pp. 7104–7107.