On Incorporating Forecasts into Linear State Space Model Markov Decision Processes

Weather forecast information will very likely find increasing application in the control of future energy systems. In this paper, we introduce an augmented state space model formulation with linear dynamics, within which one can incorporate forecast information that is dynamically revealed alongside the evolution of the underlying state variable. We use the martingale model for forecast evolution (MMFE) to enforce the necessary consistency properties that must govern the joint evolution of forecasts with the underlying state. The formulation also generates jointly Markovian dynamics that give rise to Markov decision processes (MDPs) that remain computationally tractable. This paper is the first to enforce MMFE consistency requirements within an MDP formulation that preserves tractability.


Introduction
Forecasts are ubiquitous in energy system control problems and there is reason to believe their importance will only grow, e.g., in the fast-changing electric sector. This is especially true for forecasts that provide weather-related information, as weather patterns have a strong impact on energy demand and increasingly on (renewable) energy production. The meteorological community has made significant progress in weather prediction over the past decades and can now offer several advantages over purely statistical models [1]. In a recent review of forecasting for renewable energy [2], the authors note a rise in demand for probabilistic forecasting and that the typical renewable energy production use cases for weather forecasts correspond to timescales of weeks to years, or hours to a day ahead.
Much deeper transformations are in store once very high penetrations of renewables are reached. A growing number of applications will require storing electricity with durations from 10 to 100 hours [3][4][5]. Wind generation is one example for which low availability levels can be observed for several consecutive days. The need for longer duration storage will appear more clearly with penetrations of > 70% wind and solar generation on a regional grid; see, e.g., Figure 1 from [5]. As soon as longer duration storage becomes available, we will correspondingly need new energy management strategies. At these timescales, conditioning on forecasts, in particular for weather variables, will very likely have large impacts on problems of decision making under uncertainty.
Using forecast information in the context of control problems is a difficult general problem that implicitly appears in many real-life applications. In sequential decision problems, it is often the case that exogenous forecast information is presented to the controller at regular intervals. Given the key role of Markov decision processes (MDPs) in the computation of optimal policies in such settings, a full accounting of the impact of future forecast information requires introducing the forecasts into the Markov state variable, thereby leading to potential high-dimensional state representations. Another fundamental issue relates to the fact that the forecasts should be "compatible" with the state variable that is being forecast (e.g. weather), so that the forecasted state variable (i.e. the state variable for which forecasts are available) and the forecasts themselves should exhibit self-consistent dynamics. To gain some appreciation for this issue, note that the s-period forecast must contain information that implicitly "peeks" s periods into the future of the underlying state space model, so that the s-period forecast implicitly constrains the dynamics of the underlying model over the next s periods. These constraints need to be built into the joint dynamics in such a way that the Markov structure is preserved. The preservation of Markov structure is critical if we wish to be able to compute optimal policies via the use of MDP-based theory and algorithms.
In this paper, we utilize the martingale model for forecast evolution (MMFE) as a vehicle for imposing the appropriate mathematical consistency between the dynamics of the forecasts and the forecasted state variables. The MMFE framework was introduced and developed by [6][7][8] and has since been utilized extensively by the inventory control and supply chain management community (see, e.g., [9] and the references therein). Applications of the MMFE and studies on the impact of forecasts on decision making can also be found in the energy community, e.g. for hydraulic reservoir management [10] or wind energy integration [11,12].
To our knowledge, this paper is the first to rigorously introduce forecast model consistency into MDPs, specifically in the context of linear state space models. This work gives us the first principled and mathematically consistent framework for the incorporation of forecasts into MDPs in the setting of state space models with linear dynamics, and uses no ad hoc elements to add forecast information into the MDP setting. Linear state space models are widely applied across many disciplines, and can even represent the linearized dynamics associated with nonlinear structure [13].
We note that forecasts must depend on a richer information filtration than that associated with the forecasted state variable, because an optimal MDP policy computed from the forecasted state variable already fully utilizes all the information associated with the forecasted state variable's filtration. In our setting, the extra information that enters the forecasts is the meteorological data available to the forecasters that is unobserved by the energy system manager. Thus, a key contribution of our paper is the development of an MMFE framework in which one can rigorously discuss, via the use of the language of σ-algebras, the different ways in which forecast information can be incorporated into an MDP framework. In our carefully chosen formulation, each of these different approaches for incorporating MMFE forecasts leads to a different, but computationally tractable, MDP.
Our first new MDP (Section 4) incorporates the "static" forecast information that is available to the decision-maker at the beginning of the decision horizon, and leads to an MDP that has the same state space as for the forecasted state variable, but with transition probabilities that are nonstationary as a consequence of the initial set of forecasts. The model has the property that when one conditions the future dynamics of the forecasted state variable on the forecast information available at the beginning of the decision horizon, the Markov structure is preserved with no need to increase the dimensionality of the state representation. In Section 5, we develop an MDP in which the forecasts are dynamically updated over time, along with the forecasted state variable. Thus, this formulation explicitly models the additional forecasting information that is revealed to the decision-maker over time. In this dynamic forecasting formulation, one needs to expand the state space of the MDP to incorporate the forecast evolution, but the MDP has stationary transition probabilities. Our final MDP is a formulation in which new r-period lookahead forecasts are made available to the decision-maker over time, in addition to an extended set of static forecasts that provides forecast information more than r periods into the future. This MDP that combines both static and dynamic forecasts is introduced in Section 6, and leads to both an enlarged state space and non-stationary transition probabilities.
An alternative means of utilizing the availability of forecasts in the control setting is to apply the ideas of model predictive control (MPC). MPC has become a standard tool for many industrial applications and provides a practical way of dealing with forecasts [14][15][16]. In this approach, one uses the forecasts available to solve a sequence of MDP formulations over time. At each decision epoch, a conventional MDP that incorporates the forecasts available is solved, the optimal first period action is taken, and this process is repeated at the next decision epoch. In particular, the MDPs that are used by MPC at each decision epoch do not explicitly model, within the MDP, the fact that the decision-maker will have available a new set of forecasts at each future decision epoch within the decision horizon associated with the MDP. This is also the case in adaptations of standard MPC to the setting where the forecasts contain probabilistic information [16]. In contrast, the MDPs introduced by this paper model the fact that the forecasts are continually "refreshed" over the MDP's decision horizon, and do so via a formulation that preserves the computational tractability of the MDP.
In Section 7, we introduce a simple energy system control model that controls interior building temperatures in an external weather environment for which forecasts are available. In the presence of a quadratic cost structure, we are able to use the existing linear-quadratic stochastic control theory to compute the optimal value associated with MDPs for the energy system in which no forecast information is available and also the optimal value for the dynamic forecasting MDP of Section 5. This allows us to analyze the degree of improvement that can be obtained by incorporating dynamic forecast information into the MDP formulation in the context of our simple energy system example. Section 8 concludes the paper with a discussion of additional research questions that this paper motivates.

MDP's with no forecasts
In this section, we review the basic MDP framework that can be used when making sequential decisions involving an energy system that is affected by the weather. Our formulation here does not take advantage of any forecast information that may be available. We model the dynamics in discrete time, and take the view that the weather variables W_n at time n can be represented by an R^d-valued random variable (rv). Given our MDP modeling perspective, we assume that (W_n : n ∈ Z) is a stationary R^d-valued stochastic process that enjoys the Markov property, so that for n ∈ Z,

W_{n+1} = f(W_n, Z_{n+1}),   (2.1)

where (Z_n : n ∈ Z) is an independent and identically distributed (iid) sequence of R^{m_1}-valued rv's and f is a (deterministic) mapping from R^d × R^{m_1} into R^d. The stationarity is intended here to simplify the exposition and is (at best) only approximately valid in the weather setting. For example, in examining daily weather records, it may be that such time series look approximately stationary over time scales of (say) one month. Given that our decision horizon is typically much shorter than a month, the stationarity assumption will often be a reasonable one in practice. For the energy system's control, we model its state evolution via an R^l-valued sequence (X_n : n ∈ Z) satisfying

X_{n+1} = φ(X_n, A_n, W_{n+1}, V_{n+1})   (2.2)

for n ∈ Z, where (V_n : n ∈ Z) is an iid sequence of R^{m_2}-valued rv's independent of the Z_j's, A_n is an A-valued action taken at time n, and φ is a deterministic mapping. The action A_n must be adapted to the history F_n ≜ σ((W_j, X_j) : j ≤ n), so that it can depend only on previously observed values of the weather and control system state. The joint dynamics 2.1 and 2.2 assume (reasonably) that the weather affects the control system dynamics, but not vice-versa. We now describe the dynamic program (DP) backwards dynamic recursion that is commonly used to compute the optimal A*_j's, when optimizing the control of such an energy system over a finite horizon [0, t + 1).
Throughout this paper, we take n = 0 as the time at which the sequence of control actions will be computed. Suppose that our goal is to minimize the total expected cost of running the energy system over [0, t + 1), namely

E Σ_{j=0}^{t} c(X_j, A_j, W_{j+1}).

Here c(X_j, A_j, W_{j+1}) represents the one-period cost for running the energy system over [j, j + 1). For an (appropriately integrable) function h with domain R^l × R^d, define the operator

(P_a h)(x, w) = E[h(φ(x, a, W_1, V_1), W_1) | W_0 = w].

The DP value functions (v_i(·) : 0 ≤ i ≤ t) are then computed via the recursion

v_i(x, w) = min_{a ∈ A} E[c(x, a, W_1) + v_{i+1}(φ(x, a, W_1, V_1), W_1) | W_0 = w], 0 ≤ i < t,   (2.6)

subject to the terminal condition

v_t(x, w) = min_{a ∈ A} E[c(x, a, W_1) | W_0 = w].   (2.7)

Assuming that v_t, v_{t−1}, · · · , v_0 are recursively computed via 2.6 and 2.7, we then select a*_i(x, w) (a*_t(x, w)) as any minimizer (assumed to exist) of the right-hand side of 2.6 (2.7), and put A*_i = a*_i(X_i, W_i); (A*_i : 0 ≤ i ≤ t) is then the desired cost-minimizing adapted optimal control, see e.g. [17].
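To make the backwards recursion concrete, the following sketch implements it for a small finite-state discretization. This is purely illustrative and not from the paper: the weather transition matrix, the grid version of φ, and the costs are randomly generated placeholders, and the system noise V is suppressed. A zero "padding" layer v[t+1] lets the terminal condition 2.7 and the interior recursion 2.6 share one loop.

```python
import numpy as np

rng = np.random.default_rng(0)
nW, nX, nA, t = 3, 4, 2, 5                       # illustrative (hypothetical) sizes

# Weather transition matrix: stands in for the Markov dynamics of W_n.
P_w = rng.random((nW, nW))
P_w /= P_w.sum(axis=1, keepdims=True)
# Deterministic grid version of phi(x, a, w') (noise V suppressed for simplicity).
phi = rng.integers(0, nX, size=(nX, nA, nW))
# One-period cost c(x, a, w').
c = rng.random((nX, nA, nW))

# Value functions v_i(x, w) for 0 <= i <= t; v[t+1] is identically 0 (terminal padding).
v = np.zeros((t + 2, nX, nW))
a_opt = np.zeros((t + 1, nX, nW), dtype=int)
for i in range(t, -1, -1):
    for x in range(nX):
        for w in range(nW):
            # Q(a) = E[c(x, a, W') + v_{i+1}(phi(x, a, W'), W') | W = w]
            q = np.array([sum(P_w[w, w2] * (c[x, a, w2] + v[i + 1, phi[x, a, w2], w2])
                              for w2 in range(nW)) for a in range(nA)])
            v[i, x, w] = q.min()
            a_opt[i, x, w] = int(q.argmin())
```

With nonnegative costs, v_i dominates v_{i+1} pointwise, since more periods of running cost remain.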

The mathematical structure of forecasts
In order to build forecast information into the Markov model of Section 2, we review the mathematical structure of forecasts, so that we can ensure that the model combining Markovian state dynamics and forecast information respects the appropriate mathematical constraints. To this end, we assume that E||W_n||² < ∞ (where || · || is the Euclidean norm). We model the (point) forecast F_{n|k} of W_n available at time k ≤ n as the rv E[W_n | G_k], where G_k is a σ-algebra representing the information available to the forecaster. Since weather forecasters have available vastly more weather information than does the energy system manager, we expect that G_k represents a strictly richer "information set" than F^W_k ≜ σ(W_j : j ≤ k). Consequently, we require G_k to be strictly larger than F^W_k. In fact, if G_k = F^W_k, the availability of forecasts will offer no advantage over the optimal control (A*_j : 0 ≤ j ≤ t) computed in Section 2, since that policy is already guaranteed to be optimal over all F^W_k-adapted policies. We note that F_{n|n} = W_n and that the tower property of conditional expectation implies that (F_{n|k} : k ≤ n) is a martingale with respect to (G_k : k ≤ n). Let

D_{n|k} ≜ F_{n|k} − F_{n|k−1}

be the k'th martingale difference associated with the martingale (F_{n|k} : k ≤ n). The square integrability of the W_n's implies that D_{n|k} D_{m|j}^T is integrable and

E[D_{n|k} D_{m|j}^T] = 0

for j ≠ k and n ≥ k, m ≥ j. This orthogonality of D_{n|k} and D_{m|j} is a key property of such martingale differences. As was discussed in the Introduction, the fact that such martingale structure is a reasonable requirement to impose on forecasts has been noted previously [6][7][8][12].
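This orthogonality is easy to probe numerically. The sketch below is a scalar toy construction (all parameters hypothetical) in which martingale differences take the additive form D_{n|k} = Σ_{i=k}^{n} g^{n−i} ε_i(k) used later in the paper's linear state space setting, with the information disturbances ε truncated at lag L. Differences revealed at distinct times k ≠ j are built from disjoint disturbances, so their empirical cross-moment should be near zero while their second moments are positive.

```python
import numpy as np

rng = np.random.default_rng(1)
g, gamma, L, N = 0.8, 0.6, 5, 200_000    # hypothetical parameters; L = max info lag
n, k, j = 10, 4, 7                        # compare D_{n|k} and D_{n|j} with j != k

def mart_diff(n, k, eps):
    # D_{n|k} = sum_{i=k}^{min(n, k+L)} g**(n-i) * eps_i(k)
    i = np.arange(k, min(n, k + L) + 1)
    return (g ** (n - i) * eps[:, i - k]).sum(axis=1)

# eps[:, d] holds N draws of eps_{k+d}(k); its std decays like gamma**d in the lead d.
eps_at_k = rng.normal(0.0, gamma ** np.arange(L + 1), size=(N, L + 1))
eps_at_j = rng.normal(0.0, gamma ** np.arange(L + 1), size=(N, L + 1))

D_k = mart_diff(n, k, eps_at_k)     # difference revealed at time k
D_j = mart_diff(n, j, eps_at_j)     # difference revealed at time j
cross = float(np.mean(D_k * D_j))   # orthogonality: should be near 0
second = float(np.mean(D_k ** 2))   # strictly positive
```

The sample cross-moment concentrates around 0 at rate N^{−1/2}, while the second moment stays bounded away from 0.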

MDP's incorporating a static forecast
We now wish to build a tractable model under which (W k : 0 ≤ k ≤ t + 1) evolves over the decision horizon, conditional on the forecasts (F n|0 : n ≥ 0) available at the outset of the decision interval.
To this end, let K_n ≜ σ(F_{m+j|m} : j ∈ Z_+, m ≤ n) denote the σ-algebra associated with the forecasts collected by time n, and note that K_n ⊆ G_n, the σ-algebra associated with all the information observed by the forecaster by time n. We now wish to construct an MDP formulation appropriate to decision-making by the energy system manager when she has access to the information available both in F_n and K_0. In other words, her decision at time n must be F_n ∨ K_0 adapted, where B_1 ∨ B_2 ∨ · · · ∨ B_l is our notation for the smallest σ-algebra containing B_1, B_2, . . . , B_l. We call this a static forecast formulation, since the decision maker only uses the forecasts available at time 0 in making decisions.
In particular, we shall build a model under which the (conditional on G_0) Markov property holds for 0 ≤ n ≤ t. This ensures that

P(W_{n+1} ∈ · | F^W_n ∨ G_0) = P(W_{n+1} ∈ · | W_n, G_0), 0 ≤ n ≤ t.   (4.1)

We will now formulate a flexible model that satisfies both the ordinary Markov property (as expressed through the recursion 2.1) and the conditional Markov property (as expressed through 4.1). In particular, we now specialize the stochastic recursion 2.1 to a linear state space model of the form

W_{n+1} = G W_n + Z_{n+1},

where G is a deterministic d × d matrix having spectral radius less than 1, and for which the recursion holds for n ∈ Z. We further assume that for each n ∈ Z, we can write Z_n in the form

Z_n = Σ_{j=0}^{∞} ε_n(n − j),   (4.5)

where the sum in 4.5 is assumed to converge a.s. and in mean square. The family of rv's (ε_n(j), j ≤ n, n ∈ Z) is assumed to satisfy:

A1. The rv's (ε_n(j) : j ≤ n, n ∈ Z) are mutually independent, mean zero, and square integrable.

A2. The family is distributionally stationary, in the sense that the distribution of (ε_{n+m}(j+m) : j ≤ n, n ∈ Z) does not depend on m ∈ Z.

Remark The ε_n(k) disturbance models the information gathered by the forecaster at time k that is relevant to the forecast for time n. In view of this interpretation, it is natural that we then "model" the σ-algebra G_n of Section 3 as G_n ≜ σ(ε_m(j) : j ≤ n, m ≥ j) in the context of this state space model. In this case G_n = K_n, as we will see later in this section, although we note that in general G_n could be strictly richer than K_n. We further note that A2 implies that the distribution of ε_n(k) only depends on n − k. A1 and A2 ensure that (Z_n : n ∈ Z) is an iid sequence of mean zero square integrable rv's.
If we set H_n ≜ σ(ε_m(m − j) : m ≤ n, j ∈ Z_+), we note that F^W_n ≜ σ(W_j : j ≤ n) ⊆ H_n and that the independence of (ε_{n+1}(n + 1 − j) : j ∈ Z_+) from H_n ensures that

P(W_{n+1} ∈ · | H_n) = P(W_{n+1} ∈ · | W_n). (4.6)

It follows that the policy (A*_n : 0 ≤ n ≤ t) computed in Section 2 is optimal not only over the F_n-adapted policies but also over the H_n ∨ F_n-adapted policies.
Furthermore, for k ≤ n,

F_{n|k} = G^{n−k} W_k + Σ_{j=k+1}^{n} G^{n−j} Σ_{m≤k} ε_j(m).   (4.7)

We recall that G_n contains the sequence of rv's (ε_m(j) : j ≤ n, m > n) that are independent of H_n (and hence F_n). This represents the additional information available to the forecaster about the weather in future time periods that goes beyond the predictive information present in observing W_n that is locally available to the energy system manager. Figures 1 and 2 illustrate the differences in the information sets H_n and G_n. We then note that 4.7 implies that for n ≥ k,

F_{n+1|k} = G F_{n|k} + Σ_{m≤k} ε_{n+1}(m),   (4.8)

and the corresponding martingale differences are given by

D_{n|k} = Σ_{j=k}^{n} G^{n−j} ε_j(k).   (4.9)

In addition, since (I − G)EW_0 = EZ_0, 4.7 implies that EF_{n|k} = EW_0 for n ≥ k. Setting Y_{n+1}(G_k) ≜ Σ_{m≤k} ε_{n+1}(m), the recursion 4.8 takes the form

F_{n+1|k} = G F_{n|k} + Y_{n+1}(G_k), n ≥ k.   (4.12)

As a consequence of 4.12, we see that the "forward forecasts" from time k are correlated, and form (for each k) their own state space model with independent (but not identically distributed) "noise" rv's (Y_n(G_k) : n > k), initialized at F_{k|k} = W_k. Such correlation in the forward forecasts is clearly desirable from a modeling perspective. We now turn to the conditional dynamics of (W_n : n ≥ k), conditional on G_k. Define the W_n(G_k)'s via

P((W_n(G_k) : n ≥ k) ∈ ·) = P((W_n : n ≥ k) ∈ · | G_k). (4.14)

The relations 4.7 and 4.8 imply that

E[Z_{n+1} | G_k] = Y_{n+1}(G_k), n ≥ k.

It follows that for n ≥ k,

W_{n+1}(G_k) = G W_n(G_k) + Z_{n+1}(G_k),   (4.16)

where, conditional on G_k, Z_{n+1}(G_k) has the distribution of Y_{n+1}(G_k) + Σ_{m=k+1}^{n+1} ε_{n+1}(m). Consequently, (W_n(G_k) : n ≥ k) is (conditional on G_k) a Markov chain that is a linear state space model driven by a sequence (Z_n(G_k) : n > k) of conditionally independent (but non-identically distributed) rv's. For a given k, the variance of the Z_n(G_k) sequence (conditional on G_k) increases with n, so the "uncertainty plume" correspondingly grows with time, as one would expect.
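For completeness, the martingale-difference form can be checked directly from the forecast representation F_{n|k} = E[W_n | G_k]; the following display sketches the telescoping computation (our reconstruction, in the notation above, using W_k − GW_{k−1} = Z_k = Σ_{m≤k} ε_k(m)):

```latex
\begin{aligned}
D_{n|k} &= F_{n|k} - F_{n|k-1} \\
  &= G^{n-k} W_k + \sum_{j=k+1}^{n} G^{n-j} \sum_{m \le k} \epsilon_j(m)
   - G^{n-k+1} W_{k-1} - \sum_{j=k}^{n} G^{n-j} \sum_{m \le k-1} \epsilon_j(m) \\
  &= G^{n-k}\Bigl(W_k - G W_{k-1} - \sum_{m \le k-1} \epsilon_k(m)\Bigr)
   + \sum_{j=k+1}^{n} G^{n-j} \epsilon_j(k) \\
  &= \sum_{j=k}^{n} G^{n-j} \epsilon_j(k),
\end{aligned}
```

since the bracketed term equals ε_k(k). Note that D_{n|k} involves only disturbances with inner index k, which makes the orthogonality of Section 3 immediate in this model.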
With 4.16 in hand, we can now modify the value function recursion of Section 2 so as to compute the optimal policy when the energy system decision maker has available at time n ≥ 0 the information present in Fn ∨ K 0 , the smallest σ-algebra containing both Fn and the forecasts collected up to time 0 by the manager. The structure of our model implies that the policy that is optimal over Fn ∨ Hn ∨ G 0 -adapted policies is actually Fn ∨ K 0 -measurable. Figure 3 illustrates the weather-related information set associated with Hn ∨ G 0 .
Our goal is to minimize

Σ_{j=0}^{t} E[c(X_j, A_j, W_{j+1}) | X_0, G_0],

and the appropriate value function backwards recursion in this setting is then given as in Section 2, with the conditional dynamics 4.16 of the W_n(G_0)'s replacing the unconditional weather dynamics; since those conditional dynamics are non-stationary, so are the resulting one-step operators and value function recursions 4.21 and 4.22, for 1 ≤ i ≤ t + 1. As in Section 2, an optimal F_n ∨ H_n ∨ G_0-adapted policy is then given by A*_i = a*_i(X_i, W_i), where a*_i(x, w) is any minimizer of the right-hand side of 4.21 and 4.22.

MDP's incorporating a dynamic forecast
In this section, we discuss how our energy system manager should modify her decision-making when she has access to a new set of forecasts each day. More precisely, suppose that at each time k through the decision horizon, the decision maker receives the forecasts (F_{n|k} : n ≥ k) prior to making the decision for that period. Now, the decision made at time k can depend on both W_k and the F_{n|k}'s. In particular, the decision can now be F_k ∨ K_k-adapted. Since there is more information about (W_n : n ≥ k) available when one uses the forecasts, this will typically modify the optimal control relative to the previously discussed formulations of Sections 2 and 4. Since the forecasts used by the decision maker are constantly updated as k increases, we refer to this setting as a dynamic forecast formulation. Let F̄_n = (F_{n+j|n} : j ∈ Z_+) be the entire set of forward forecasts issued at time n (and computed from the history G_n). Recall that W_n = F_{n|n}. We claim that the infinite-dimensional process (F̄_n : n ∈ Z) is a Markov chain. To see this, observe that 4.9 implies that

F_{n+1+j|n+1} = F_{n+1+j|n} + Σ_{i=n+1}^{n+1+j} G^{n+1+j−i} ε_i(n+1)   (5.1)

for j ≥ 0. Since the collection of rv's (ε_{n+1+j}(n+1) : j ≥ 0) is independent of G_n, it follows that (F̄_n : n ∈ Z) is a Markov chain. One important and related characteristic of our model is that the Markov chain can be initialized from an arbitrary set of values. This means that our model is consistent with any set of forecast values specified at time 0. Of course, we cannot effectively compute optimal policies with a Markov chain having an infinite dimensional state space. So, we need to truncate the set of forecasts that we use within our formulation in order to generate a finite dimensional Markov state variable.
In particular, suppose that G_{n,r} is the smallest σ-algebra containing both H_n and the σ-algebra σ(ε_{n+j}(k) : k ≤ n, 1 ≤ j ≤ r), so that it contains only the forecaster's information about the r future forecasts for periods n + 1, · · · , n + r, in addition to the information associated with H_n. Figure 4 illustrates the weather-related information set associated with G_{n,r}. We note that G_n ⊇ G_{n,r} and that for 1 ≤ j ≤ r, F_{n+j|n} = E[W_{n+j} | G_n] is a function only of rv's associated with G_{n,r}, and hence is G_{n,r}-measurable. Using the information associated with G_{n,r}, we can use the recursion in 5.1 for 0 ≤ j < r. For j = r, we can use 4.12 to expand F_{n+1+r|n}, which yields the recursion

F_{n+1+r|n+1} = G F_{n+r|n} + Σ_{m≤n} ε_{n+1+r}(m) + Σ_{i=n+1}^{n+1+r} G^{n+1+r−i} ε_i(n+1).   (5.2)

As a result, F̄_{n+1,r} ≜ (F_{n+1+j|n+1} : 1 ≤ j ≤ r) is a linear function of F̄_{n,r} and a collection of rv's (ε_{n+1+i}(n+1), ε_{n+1+r}(j) : 1 ≤ i ≤ r, j ≤ n) that are independent of G_{n,r}. It follows that (F̄_{n,r} : n ∈ Z) is an rd-dimensional Markov chain. It is also easily seen that it is a Markov chain with stationary transition probabilities. Furthermore, W_{n+1} = F_{n+1|n+1} is a simple stochastic function of F̄_{n,r}, specifically W_{n+1} = F_{n+1|n} + ε_{n+1}(n+1), so that it can easily be generated from F̄_{n,r} simultaneously with F̄_{n+1,r}. We can now turn to the computation of the optimal policy in this setting. In particular, we seek the F_n ∨ G_{n,r}-adapted policy that minimizes

Σ_{j=0}^{t} E[c(X_j, A_j, W_{j+1}) | X_0, F̄_{0,r}]   (5.4)

over all F_n ∨ G_{n,r}-adapted policies (A_n : 0 ≤ n ≤ t). Define the operator P_a (acting on integrable functions h of (x, f)) via

(P_a h)(x, f) = E[h(X_1, F̄_{1,r}) | X_0 = x, F̄_{0,r} = f, A_0 = a].

We can then compute the associated value functions for this formulation via the backwards recursion

v_i(x, f) = min_{a ∈ A} E[c(x, a, W_1) + v_{i+1}(φ(x, a, W_1, V_1), F̄_{1,r}) | F̄_{0,r} = f]   (5.7)

for 0 ≤ i < t, subject to the terminal condition

v_t(x, f) = min_{a ∈ A} E[c(x, a, W_1) | F̄_{0,r} = f],   (5.8)

where W_1 = f_1 + ε_1(1). The optimal F_n ∨ G_{n,r}-adapted action A*_n to be taken in period n is then given by A*_n = a*_n(X_n, F̄_{n,r}), where a*_n(x, f) is the minimizer of the right-hand side of 5.7 or 5.8 corresponding to v_n(x, f) for 0 ≤ n ≤ t.
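The joint simulation of the weather and its rolling forecasts is straightforward in this model. The sketch below (scalar case, hypothetical parameters, information disturbances truncated at lag L for simulability) generates the ε array, the weather path, and the forecasts, and then verifies two consistency properties of the construction: F_{n|n} = W_n and W_{n+1} = F_{n+1|n} + ε_{n+1}(n+1).

```python
import numpy as np

rng = np.random.default_rng(2)
g, gamma, L, T = 0.9, 0.5, 8, 40     # hypothetical scalar parameters; L = max info lag

# eps[t, k]: disturbance about time t revealed at time k (zero when t - k > L).
eps = np.zeros((T + 2, T + 2))
for k in range(T + 2):
    t_idx = np.arange(k, min(k + L, T + 1) + 1)
    eps[t_idx, k] = rng.normal(0.0, gamma ** (t_idx - k))

# Weather path: W_{t+1} = g W_t + Z_{t+1}, with Z_t = sum_{k <= t} eps_t(k).
W = np.zeros(T + 2)
for t in range(1, T + 2):
    W[t] = g * W[t - 1] + eps[t, :t + 1].sum()

def forecast(n, k):
    """F_{n|k} = E[W_n | G_k]: propagate W_k forward, adding already-revealed noise."""
    out = g ** (n - k) * W[k]
    for j in range(k + 1, n + 1):
        out += g ** (n - j) * eps[j, :k + 1].sum()
    return out

m = 10
ok_now = np.isclose(forecast(m, m), W[m])                               # F_{n|n} = W_n
ok_next = np.isclose(W[m + 1], forecast(m + 1, m) + eps[m + 1, m + 1])  # one-step split
```

The second check is exactly the decomposition used above to generate W_{n+1} from the forecast state.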
Remark We note that the use of the reduced Markov state variable F̄_{n,r} for the weather variables (as opposed to using the state variable (W_n, F̄_{n,r})) is possible only because we made the modeling decision in Section 2 to express the control state recursion in the form

X_{n+1} = φ(X_n, A_n, W_{n+1}, V_{n+1})   (5.9)

and cost c(X_n, A_n, W_{n+1}) in terms of W_{n+1} rather than W_n. If we had instead modeled the control state evolution via

X_{n+1} = φ(X_n, A_n, W_n, V_{n+1})   (5.10)

and/or cost c(X_n, A_n, W_n), then the decision maker at time n would need to know W_n, and W_n would then need to be added to the Markov state variable for the weather. Since either choice, W_n or W_{n+1}, is typically reasonable from a modeling viewpoint, we choose to use W_{n+1} in order to obtain this state reduction.

MDP's incorporating both static and dynamic forecasts
For computational tractability, the value of r used in Fn,r will typically need to be small. But weather forecasters will typically provide forward forecasts over a much larger number of periods. In order to (partially) account for these longer range forecasts (without expanding our state description for the MDP), we now build a formulation that takes into account all the forward forecasts that are present in G 0 (i.e. the static forecasts that are available at time 0), as well as the dynamic forecasts associated with Gn,r for 1 ≤ n ≤ t. Thus, in this formulation, the decision maker at time n has access to Xn, F n+1|n , · · · , F n+r|n and F j|0 for j ≥ 1. Figure 5 illustrates the weather-related information set corresponding to G 0 ∨ Gn,r.
We now turn to the conditional dynamics of (F̄_{n,r} : n ≥ 0), conditional on G_0. Define the F̄_{n,r}(G_0)'s via

P((F̄_{n,r}(G_0) : n ≥ 0) ∈ ·) = P((F̄_{n,r} : n ≥ 0) ∈ · | G_0). (6.1)

Because the martingale difference terms on the right-hand side of 5.1 are independent of G_0,

F_{n+1+j|n+1}(G_0) = F_{n+1+j|n}(G_0) + β_{n+1,j}(G_0)   (6.2)

for 0 ≤ j < r, where β_{n+1,j}(G_0) is independent of G_0 ∨ G_{n,r} and has the same distribution as the corresponding martingale difference D_{n+1+j|n+1}. On the other hand, the right-hand side of 5.2 contains terms that are G_0-measurable. In particular,

F_{n+1+r|n+1}(G_0) = G F_{n+r|n}(G_0) + Σ_{m≤0} ε_{n+1+r}(m) + β_{n+1,r}(G_0),   (6.5)

where, conditional on G_0, β_{n+1,r}(G_0) has the distribution of Σ_{m=1}^{n} ε_{n+1+r}(m) + Σ_{i=n+1}^{n+1+r} G^{n+1+r−i} ε_i(n+1). Since F̄_{n+1,r}(G_0) can be expressed as a function of F̄_{n,r}(G_0) and a family of rv's β_{n+1}(G_0) ≜ (β_{n+1,j}(G_0) : 1 ≤ j ≤ r) that are independent of H⁰_n, it follows that (F̄_{n,r}(G_0) : n ≥ 0) is (conditional on G_0) a Markov chain. However, as with the Markov chain of Section 4, the conditioning on G_0 makes this a Markov chain with non-stationary transition probabilities; see 6.5 in particular. Furthermore, as in Section 4, the variance of β_{n+1,r}(G_0) (conditional on G_0) increases in n, so that the "uncertainty plume" increases over time.
We now turn to the computation of a policy (A*_n : 0 ≤ n ≤ t) that minimizes

Σ_{j=0}^{t} E[c(X_j, A_j, W_{j+1}) | X_0, G_0, F̄_{0,r}]

over all policies (A_n : 0 ≤ n ≤ t) that are H⁰_n-adapted, where H⁰_n denotes the σ-algebra generated by the information available to the decision maker at time n in this formulation. For f = (f_1, f_2, · · · , f_r), define the operator P_{a,i}(G_0) (acting on integrable functions h) as the one-step transition operator associated with the non-stationary conditional dynamics 6.2 and 6.5, for 1 ≤ i ≤ t + 1. As in Section 5, the value function recursion takes the form of 5.7 and 5.8, with P_{a,i}(G_0) replacing P_a; this yields the recursions 6.10 and 6.11. Again, the optimal H⁰_n-adapted policy is then given by A*_i = a*_i(X_i, F̄_{i,r}(G_0)), where a*_i(x, f) is any minimizer of the right-hand side of 6.10 and 6.11.

An energy control system example
In this section, we illustrate some of our theory in the setting of a simple energy control system example. In particular, we let W_n represent the ambient outdoor temperature at the beginning of period n at the site of the energy system that is under control. We assume that (W_n : n ∈ Z) is a real-valued Markov chain corresponding to a first order autoregressive process, so that

W_{n+1} = g W_n + Z_{n+1}   (7.1)

for n ∈ Z, where g ∈ (0, 1) and the Z_i's are iid with EZ_0² < ∞. To help interpret g, we note that corr(W_j, W_{j+n}) = gⁿ, so that the number of periods for the correlation to decay to 0.1 is approximately log(0.1)/log g. We now describe our simplified energy control system corresponding to heating and cooling a building. We assume that the difference ∆_n ≜ X_n − W_n between the internal (X_n) and external (W_n) temperatures is "mean reverting", so that the ∆_n's satisfy their own first order autoregression. In particular, in the absence of control,

∆_{n+1} = ρ ∆_n + V_{n+1},   (7.2)

where the V_j's are iid and independent of the Z_k's with EV_0² < ∞. We expect the building to equilibrate more rapidly than does the outdoor temperature, so we expect ρ ∈ (0, g). Substituting 7.1 into 7.2, we find that in the presence of the control A_n,

X_{n+1} = ρ X_n + (g − ρ) W_n + A_n + Z_{n+1} + V_{n+1}   (7.3)

for n ∈ Z.
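As a quick numerical sanity check on 7.1 and 7.2 (with hypothetical parameter values and the control set to A_n = 0), the following sketch simulates the uncontrolled pair and compares empirical autocorrelations against gⁿ for the weather and ρⁿ for the temperature gap.

```python
import numpy as np

rng = np.random.default_rng(3)
g, rho, T = 0.9, 0.6, 200_000        # hypothetical parameters
Z = rng.normal(0.0, 1.0, T)          # weather noise
V = rng.normal(0.0, 0.5, T)          # building noise

W = np.zeros(T)
Delta = np.zeros(T)                  # Delta_n = X_n - W_n, uncontrolled (A_n = 0)
for t in range(1, T):
    W[t] = g * W[t - 1] + Z[t]
    Delta[t] = rho * Delta[t - 1] + V[t]
X = W + Delta                        # uncontrolled building temperature

def acorr(x, lag):
    """Empirical lag autocorrelation of a (approximately stationary) series."""
    x = x - x.mean()
    return float((x[:-lag] * x[lag:]).mean() / (x * x).mean())

# corr(W_j, W_{j+n}) ~ g**n and corr(Delta_j, Delta_{j+n}) ~ rho**n
gap_W = abs(acorr(W, 5) - g ** 5)
gap_D = abs(acorr(Delta, 3) - rho ** 3)
```

Both gaps shrink at the usual Monte Carlo rate as T grows.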
We now wish to take advantage of the powerful toolset that is available when our state space model has a quadratic cost structure. We assume that our goal is to minimize the expected infinite horizon discounted cost given by

E Σ_{j=0}^{∞} α^j [(X_j − τ)² + κ A_j²]   (7.4)

over all F_j-adapted controls, where κ > 0, α ∈ (0, 1) is the discount factor, and τ is the reference temperature to which we are trying to steer the system. To incorporate τ into the linear/quadratic formulation, we add Y_j as a state variable for which Y_j = Y_{j−1} for j ∈ Z. Furthermore, we center the temperatures, setting W̃_n = W_n − EW_0 and X̃_n = X_n − τ_0 (where τ_0 = EX_0 is the mean building temperature in the absence of control), and rewrite 7.1 and 7.3 in terms of the mean zero "noise" rv's Z̃_{n+1} and Ṽ_{n+1}:

W̃_{n+1} = g W̃_n + Z̃_{n+1},
X̃_{n+1} = ρ X̃_n + (g − ρ) W̃_n + A_n + Z̃_{n+1} + Ṽ_{n+1}.

Furthermore, we can express X_n − τ as X̃_n − Y_n, where we take Y_0 = τ − τ_0. Set χ_n = (W̃_n, X̃_n, Y_n)^T and ξ_n = (Z̃_n, Z̃_n + Ṽ_n, 0)^T, and observe that we can express our control system dynamics as

χ_{n+1} = A χ_n + B A_n + ξ_{n+1},

where A is the 3 × 3 matrix with rows (g, 0, 0), (g − ρ, ρ, 0), (0, 0, 1), and B = (0, 1, 0)^T. The objective 7.4 can then be re-expressed as

E Σ_{j=0}^{∞} α^j [χ_j^T Q χ_j + R A_j²],

where Q is the 3 × 3 matrix with rows (0, 0, 0), (0, 1, −1), (0, −1, 1), and R = κ. We observe that this model does not satisfy the standard controllability hypothesis that is commonly used within the literature on state space models with quadratic costs (in particular, the Y_j's are not controllable). Nevertheless, the special problem structure here allows us to follow the approach on p. 231-233 of [18] to obtain the solution of this stochastic control problem in closed form. In particular, define the optimal return operator T (defined on suitably integrable functions h) via

(Th)(z) = min_{a} { z^T Q z + κ a² + α E h(Az + Ba + ξ_1) }

for z = (w, x, y)^T, and note that this stochastic control problem corresponds to a positive dynamic program; see p. 214 of [18].
Iterating T from the zero function yields a monotone sequence of value functions converging to the optimal value function v∞. The sequence (h_k : k ≥ 0), obtained by the analogous iteration with the action fixed at zero, is also a monotone sequence, so that h_k → h∞. The limit h∞ is the value function corresponding to the policy in which A_k = 0 for k ≥ 0.
Since |g| < 1 and |ρ| < 1, the associated stochastic dynamical system is stable and h∞ is finite-valued. We can therefore conclude that v∞ is finite-valued. We further note that if J = (J(i, k) : 1 ≤ i, k ≤ 3) is a symmetric non-negative definite matrix, the scalar αB^T J B + R = αJ(2, 2) + κ > 0 (since the diagonal entries of such a matrix must be non-negative), so that αB^T J B + R is guaranteed to be non-singular. As a result, the matrix recursion

K_{j+1} = Q + α A^T K_j A − α² A^T K_j B (αB^T K_j B + R)^{−1} B^T K_j A   (7.13)

is well defined; see p. 231 of [18]. By following the argument on p. 156 of [19], we can conclude that there exists a finite-valued non-negative definite matrix K∞ = (K∞(i, k) : 1 ≤ i, k ≤ 3) for which K_j → K∞ as j → ∞. Taking limits in 7.13, we find that K∞ satisfies the matrix Riccati equation

K∞ = Q + α A^T K∞ A − α² A^T K∞ B (αB^T K∞ B + R)^{−1} B^T K∞ A.

Furthermore, as seen from p. 232 of [18], we conclude that the optimal value function for the control problem is

v∞(z) = z^T K∞ z + (α/(1 − α)) E[ξ_1^T K∞ ξ_1],   (7.16)

and the associated optimal action A*_j to be taken at time j is

A*_j = −α (αB^T K∞ B + R)^{−1} B^T K∞ A χ_j.   (7.17)

We now turn to analyzing exactly the same MDP when the dynamic forecasts of Section 5 are incorporated into the problem. To simplify our exposition, we set r = 2, so that our energy system manager has access to the forecasts F_{n+1|n} and F_{n+2|n} (in addition to W_n and X_n) at the time that the decision at time n is taken. Put F̃_{n+i|n} = F_{n+i|n} − EZ_0/(1 − g), for i = 1, 2 and n ∈ Z.

As an alternative to using the closed forms 7.27 and 7.28 to compute Ev∞(W̃_0, X̃_0, τ − τ_0) and Eṽ∞(χ̃_0), an iterative approach can be used to compute the covariance matrices; this may be more convenient numerically. The covariance matrix Λ_n of χ_n satisfies

Λ_{n+1} = A Λ_n A^T + Σ_ξ,

subject to Λ_0 = 0, where Σ_ξ is the covariance matrix of ξ_1 [20]. Implementations for both the closed-form and the iterative approaches are provided in the case of our energy control system example in the online repository for this paper [21]. Figure 6 shows the percentage reduction in cost from using dynamic forecasts, namely

D ≜ 100 × (Ev∞(W̃_0, X̃_0, τ − τ_0) − Eṽ∞(χ̃_0)) / Ev∞(W̃_0, X̃_0, τ − τ_0).   (7.35)

Contour plots are used to explore D's dependence as a function of selected pairs of parameters for the system. In the top left plot of Figure 6, the improvement increases as γ grows closer to 1. As γ grows, previous terms in the ε_n(k) sequence play a larger role, so the value of forecasts increases.
On the other hand, as g grows closer to 1, the dependence across time of the W_n sequence grows, and so the value of forecasts decreases. On the top right plot, the value of forecasts increases as the dependence of X_{n+1} on X_n grows (with ρ). In the bottom left plot, the value of forecasts increases with the noise in the weather sequence W_n (controlled by σ²_Z) but decreases with the noise in the building temperature sequence X_n (controlled by σ²_V). Finally, the bottom right plot shows a symmetry in the improvement with respect to the control setpoint τ around τ_0. We note that τ_0 = EX_0, the building temperature in the absence of control. The value of forecasts decreases when the target setpoint is farther from τ_0: as the weight of the action in the value function grows, the relative value of forecasts weakens.
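The covariance recursion for χ_n mentioned above can be sketched as follows (hypothetical noise variances; the actual implementations are in the paper's repository [21]). As a check, the W̃ component's limiting variance is compared against the stationary AR(1) formula σ²_Z/(1 − g²).

```python
import numpy as np

g, rho, s2_Z, s2_V = 0.9, 0.6, 1.0, 0.25     # hypothetical parameter values

A = np.array([[g, 0.0, 0.0],
              [g - rho, rho, 0.0],
              [0.0, 0.0, 1.0]])
# xi_n = (Z~_n, Z~_n + V~_n, 0)^T, so its covariance matrix is:
Sigma_xi = np.array([[s2_Z, s2_Z, 0.0],
                     [s2_Z, s2_Z + s2_V, 0.0],
                     [0.0, 0.0, 0.0]])

Lam = np.zeros((3, 3))                       # Lambda_0 = 0
for _ in range(500):                         # Lambda_{n+1} = A Lambda_n A^T + Sigma_xi
    Lam = A @ Lam @ A.T + Sigma_xi

var_W = Lam[0, 0]                            # should approach s2_Z / (1 - g**2)
```

The Y coordinate receives no noise and starts at a deterministic value, so its variance row stays identically zero under the iteration.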

Conclusion
In this work we introduced the first principled and mathematically consistent framework for the incorporation of forecasts into MDPs in the setting of state space models with linear dynamics, using no ad hoc elements to add forecast information into the MDP setting. In this framework, we discussed the different ways in which forecast information can be incorporated (static, dynamic, static and dynamic together). Through an illustrative energy system control example, we provided a numerical comparison of the optimal value functions for the setting with no forecasts to the setting with dynamic forecasts.
The introduction of this framework opens the door to several theoretical and applied research questions, e.g. on how the quality of forecasts affects control methods in different disciplines and in different applications. Potential theoretical research directions include extensions to periodic Markov chains (e.g. to model time-of-day effects), non-stationary Markov chains, forecast updates that are not synchronized with decisions epochs, and Markov chains with nonlinear dynamics.
Authors' Contributions. JAC and PWG conceived of the study and wrote the manuscript.
Competing Interests. The authors declare that they have no competing interests.