SUPPLEMENTARY FILE: Full observability and estimation of unknown inputs, states, and parameters of nonlinear biological models

2 Supplementary information: Methods 1 2.1 On the feasibility of the FISPO analysis without Assumption 1 . . . . . . . . . . . 1 2.2 On the implications of Assumption 1 for the FISPO analysis . . . . . . . . . . . . 2 2.3 Numerical solution of nonlinear optimal control problems . . . . . . . . . . . . . . 3 2.3.1 Control vector parameterisation . . . . . . . . . . . . . . . . . . . . . . . 3 2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.4.1 Remarks on implemented case studies . . . . . . . . . . . . . . . . . . . . 4 2.5 Remarks on input estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


Additional supplementary material
In addition to this document, the following accompanying files are available: • The FISPO analysis is performed with the MATLAB toolbox STRIKE-GOLDD 2.1, which includes implementations of the case studies analysed in the main paper. It can be downloaded from Zenodo (https://zenodo.org/record/2649224). The latest version of the toolbox is also available in GitHub (https://github.com/afvillaverde/strike-goldd_ 2.1).

On the feasibility of the FISPO analysis without Assumption 1
This subsection shows that the FISPO analysis can be performed without Assumption 1, i.e. without setting to zero the (i + 1) th time derivative of the unknown input, w (i+1) (t). To this end we show that w (i+1) (t) does not appear in O I when this matrix is built with i Lie derivatives. Assume we build O I with i Lie derivatives. The augmented state vector and its dynamics are: . . .
Note that the output function g(u, w, x, θ ) is still a function only of {u, w, x, θ }, not of any derivatives of u or w. Hence the first Lie derivative is: Note that g(u, w, x, θ ) may depend on w, but not onẇ,ẅ, . . . . Therefore, the only elements of the vector ∂ g(u,w,x,θ ) ∂x that can be non-zero are the ones corresponding to the partial derivatives with respect to {w, x, θ }. Thus, the product ∂ g(u,w,x,θ ) ∂xf (x, u) may containẇ but not higher order derivatives such asẅ, since the elements of the vectorf where those derivatives appear are multiplied by zeros.
As for the term ∑ j=∞ j=0 ∂ g(u,w,x,θ ) ∂ u ( j) u ( j+1) , it may be a function of {u, w, x, θ ,u} (but not of higher derivatives of u or w). Therefore, the Lie derivative of order one, Lf g, can be a function of {u,u, w,ẇ, x, θ }, but not of derivatives of {u, w} of order higher than one.
This result can be easily generalized to Lie derivatives of order higher than one, which are obtained recursively as: Following the same reasoning as above, it can be readily seen that the Lie derivative of order i may contain derivatives of {u, w} up to order i or lower. As a consequence, a matrix O I built with i Lie derivatives does not contain w (i+1) , even though w (i+1) appears inf . Hence it is not necessary to assume that w (i+1) = 0 in order to calculate rank(O I ).

On the implications of Assumption 1 for the FISPO analysis
This subsection discusses the implications of the following assumption that can be made when applying the FISPO analysis method: Assumption 1. The time derivatives of the unknown input vector of the system under consideration vanish above a given order i, that is, Let us first consider a typical scenario, in which the analysis of a model reveals that it is FISPO for i = {1, 2, 3, 4, 5, . . .}. This is the case of the C2M analysed in the paper, which is computationally cheap to analyse even for large values of i, so we can establish that it is FISPO even for i = 15 (it would be feasible for larger values too, but we stopped the calculations here arbitrarily; see subsection 3.1.1 of this supplementary information for more details). In this case we know for a fact that the model is FISPO for i = {1, 2, 3, 4, 5, . . . , 15}, that is, for all unknown inputs whose 15 th and higher order derivatives are zero. It is reasonable to assume that this will be the case as i keeps growing (in fact, for small models such as this, this might be determined by inspecting the O I matrix, but it will typically not be the case for more complex models). Therefore, we may assume that the model is FISPO for any unknown input. In doing so we might be overestimating observability: according to our assumptions, it would be conceivable that two different inputs whose 16 th derivatives are nonzero produce the same output. However, it should be taken into account that in reality the unknown inputs to biological systems are not ideal mathematical functions. It is rather the other way around: they can be modelled, or approximated, using ideal mathematical functions, which in most cases can be approximated very accurately with low order polynomials. The result obtained for the C2M model tells us that, as long as the input can be accurately modeled with a polynomial of order equal or smaller than 15, it will be reconstructible. Theoretically, there is a chance that we would be reconstructing the 'wrong' polynomial, but the error in this estimation would come from the terms of order 15 and higher, so it is highly unlikely that it will be distinguishable from noise. Of course, for models that are computationally more expensive it may be necessary to set a much lower limit to the number of nonzero derivatives, i. However, even if it is only 4 or 5 it allows for a realistic representation of most biological inputs. Therefore, while there is a risk of overestimating observability, we believe that it should be taken more as a theoretical caution than as an actual limitation in practice.
We may also consider a different scenario: imagine that for low values of i the method yields that a model is not FISPO, but the rank of the O I matrix keeps growing with i, allowing for a theoretical chance that it will eventually have full rank with sufficient derivatives. However, due to computational limitations we are forced to stop the calculations before that happens. If we interpret these results as an indication that the model is not FISPO, we might be underestimating the observability. However, this would be a highly counter-intuitive situation, in which having unknown inputs of higher complexity would improve observability. We have never encountered it in any of the models we analysed so far. Hence we believe that the risk of underestimating observability is not a real concern. (Note that this scenario is not the same as the one in which the number of Lie derivatives is not sufficient to produce a result. In this case the method yields an inconclusive result; the STRIKE-GOLDD toolbox reports that the FISPO analysis could not be completed. Of course this should not be interpreted as a sign or observability nor unobservability.)

Numerical solution of nonlinear optimal control problems
The simultaneous input and parameter estimation problem can be formulated as an optimal control problem (see main text). Methods for the numerical solution of nonlinear optimal control problems can be classified under three main categories: dynamic programming, indirect and direct approaches. Dynamic programming [5,6] suffers from the so-called curse of dimensionality, so the latter two are the most promising strategies for problems of realistic complexity. Indirect approaches are based on the transformation of the original optimal control problem into a multipoint boundary value problem using Pontryagin's necessary conditions [9,13]. Direct methods transform the optimal control problem into a nonlinear programming problem (NLP). They are based on the discretisation of either the control, in which case they are known as "sequential" strategy [4,17], or both the control and the states, in which case they are known as "simultaneous" strategy [7,8].
In the sequential strategy, also known as control vector parameterisation (CVP), the controls are approximated piecewise, usually by a low order polynomial, the coefficients of which are the decision variables. The problem is transformed in an outer NLP problem optimising the decision variables provided in the inner initial value problem (IVP) where the dynamic system is integrated. In the simultaneous strategy, also known as complete parameterisation approach, both states and controls are parameterised by dividing the whole time domain into small intervals. This is done either using multiple shooting, where similarly to the sequential approach, the problem is integrated separately in each interval and linked with the rest through equality constraints, or by a collocation approach, where the solution of the dynamic system is being coupled with the optimisation problem.
The simultaneous collocation strategy leads to larger optimisation problems but has the advantage of avoiding the repeated integration of the system dynamics at each iteration. The Control vector parameterisation strategy is easier to apply to arbitrary nonlinear dynamics, and relies on well tested initial value solvers. Here, we have chosen the control vector parameterisation approach (sequential strategy).

Control vector parameterisation
The control vector parameterisation (CVP) approach proceeds by dividing the time horizon into a number of elements (ρ). Each of the control variables ( j = 1 . . . n u ) are then approximated within each interval (i = 1 . . . ρ) by means of some basis functions (usually, low order Lagrange polynomials [18]) as follows: being τ the normalised time in each element i: and M j the order of the Lagrange polynomial ( ). In this work we will consider M j = 1 or M j = 2, i.e. step-wise or linear-wise control approximations.
In the CVP approach, the controls are expressed as functions of a new set of time invariant parameters corresponding to the polynomial coefficients (w). Therefore the original infinite dimensional problem is transformed into a set of non-linear programming problems, with dynamic (the model) and algebraic constraints, in which the decision variables correspond to the original unknown parameters in θ and w, which will be part of the overall set of parameters to determine.

Implementation
In order to perform the FISPO analysis, a new release of the STRIKE-GOLDD [19] toolbox for MATLAB was created for this paper. It incorporates the capability of analysing models with unmeasured inputs, w(t), and includes as examples all the case studies presented in this article. This new release, STRIKE-GOLDD 2.1.2, is available at Zenodo (https://zenodo.org/record/ 2649224) and GitHub (https://github.com/afvillaverde/strike-goldd_2.1).
Regarding the estimation problem, the methodology was introduced in [16] and implemented as a software module (IOC add-on) for the AMIGO2 toolbox ( [2]). The IOC add-on is publicly available at https://sites.google.com/site/amigo2toolbox/home/amigo2_ioc. The implementation of the case studies reported in this paper is available at Zenodo (https://doi.org/ 10.5281/zenodo.2542798). In all the computations we have used the eSS optimiser, a hybrid global-local optimisation meta-heuristic available in the AMIGO2 toolbox.

Remarks on implemented case studies
As mentioned in the previous sections, the methodology used for the numerical estimation problem of the full system reconstruction was introduced in our previous work [16]. For any further discussion on the use of this methodology or on its scalability we refer the interested reader to the aforementioned paper.
In order to enable the computational reproducibility of the estimation results presented in this article, we provide the necessary scripts that include all the settings that were used, along with the exact synthetic pseudo-experimental data sets that were generated and the CPU seed from which the computation was initialised. The latter is necessary for the reproduction of the exact numerical results, since eSS is a partially stochastic optimisation solver that is generating pseudo-random reference sets of points in the search space.
It should be noted that the majority of the numerical subcases presented in this paper were solved, as can be seen in the provided scripts, using the eSS method with the local solver FSQP [12]. This local solver is not publicly available with the AMIGO2 toolbox, since it requires a licence. Substituting FSQP with any other solver available in the AMIGO2 toolbox can result in differences in the convergence of the solver (e.g. higher CPU time requirements) and therefore affect the accurate reproduction of the presented results.

Remarks on input estimation
In the present work we have used a standard control vector parameterization (CVP) scheme [17,18], where inputs (controls) are approximated using low order piecewise polynomials, typically piecewise constant (PWC) or piecewise linear (PWL) approximations. We refined the controls according to the following procedure: 1. We chose basis functions (PWC, PWL) and an initial coarse discretization based on prior knowledge about the smoothness of the input to be reconstructed. PWC should be chosen in the absence of prior knowledge, or when a non-smooth input is expected.
2. Since we did not assume any prior knowledge, we chose PWC and proceeded to refine the control mesh. If the final result indicated a relatively smooth profile, we considered a PWL discretization with a coarser mesh.
3. We proceeded using an iterative mesh refinement technique (successive re-optimisations increasing the control discretization, as described in [1]) until we obtained a satisfactory fit to data.
In the three case studies considered here, relatively coarse discretisations were sufficient to obtain a good fit to the data. Additional examples of the use of the above scheme for case studies with inputs of different non-smoothness can be found in [3,11].
The above procedure works well for most problems, but if the computational cost of the mesh refinement scheme becomes very significant (e.g. cases with very wild input profiles and which are computationally very costly to evaluate, such as large-scale dynamic models), one can resort to CVP variants that introduce efficient control discretisations which are able to approximate difficult input profiles with coarser discretisations (a review of recent methods is given in [15]).

FISPO analysis
We performed the FISPO analysis of three different versions of the C2M model defined in Section 3.1 of the main paper: 1. The model with known input u, as originally presented in [20] and defined by equation (25) of the main text.
2. The model with unknown input w(t) = b · u(t), as in eq. (26) of the main text.
3. The model with unknown input w(t) = b · u(t), and with the parameter k 1e considered as a known constant.
For each version we considered a number of different bounds on the number of nonzero derivatives that the inputs are assumed to have (i U for the known inputs, i W for the unknown ones), to assess their influence on the outcome of the FISPO analysis. Table S.T1 shows the results for the different options, including the CPU time of the calculations. The number of Lie derivatives required to reach a solution, as well as the corresponding rank of the generalised observability matrix O I , are also shown.
For this case study the number of nonzero derivatives affects the result if the input is known, but not if it is unknown. For the unknown input case the results are constant for all the numbers chosen. We take this as an indication that the same result holds as i W → ∞.

Numerical estimation
For this fully observable model, the optimal tracking approach was used to reconstruct the full system. Two subcases were considered in this problem: 1. A noiseless subcase using synthetic data without any addition of noise. For this case the estimation should achieve the theoretically possible results, numerically proving or disproving the FISPO analysis.

2.
A noisy subcase, where the addition of noise to the synthetic data allows us to assess the practical reconstructibility of the model in a more realistic scenario.
The estimation problem arising from this model includes: two unknown model parameters, one unobserved initial condition (as well as the corresponding state), and one unmeasured input. The nominal parameter values were all equal to 1 with b = 2. The time horizon was considered in arbitrary units between 0 and 1, taking 11 samples. For the noiseless subcase no noise was added to the data, while in the noisy subcase 5% homoscedastic noise was added. In both subcases the search space considered was [0, 10] in the case of the input and a two orders of magnitude space around their respective nominal values in the case of the parameters.
The results of the numerical estimation for the noiseless subcase are given in the main text. The noiseless results approximated the theoretically possible (perfect) inference. For the noisy subcase the problem requires several experiments in order to be practically reconstructible. We used 6 different experiments with pseudo-experimental noisy data, with different input values and a different initial condition of the observed state (x 1 ) in each of them. In the main article we show the reconstruction results for one of the 6 experiments, and the remaining ones are shown here in Figures S1-S5. It can be noticed that after the addition of noise to the data the estimation is still good but obviously worse, highlighting the practical numerical issues that are encountered in a realistic instance of an optimal tracking.

FISPO analysis
Similarly as for the C2M case study analysed in the previous subsection, we performed the FISPO analysis of two different versions of the TS model defined in Section 3.2 of the main paper: 1. The model with known inputs as in [10,14], defined by equation (27) of the main text.
2. The reformulated model with unknown inputs defined by eqs. (28, 29) of the main text.
Again, for each model version we considered a number of different bounds on the number of nonzero derivatives that the inputs are assumed to have (i U for the known inputs, i W for the unknown ones), to assess their influence on the outcome of the FISPO analysis. Table S.T2 shows the results for the different options, including the CPU time of the calculations. The number of Lie derivatives required to reach a solution, as well as the corresponding rank of the generalised observability matrix O I , are also shown.
For this model the number of nonzero derivatives affects the result for the case of known inputs, but not for the unknown inputs case. (Note that in the latter case the model formulation is different, having been reparameterised.) The model with unknown inputs keeps being FISPO as the number of nonzero input derivatives grows. We take this as an indication that the same result holds as

Numerical estimation
Following the same procedure as with the C2M case study, we perform the numerical estimation for the fully observable version of the model with unknown inputs in two synthetic subcases: (i) without and (ii) with the addition of noise to the generated data. We consider the noiseless subcase as a numerical validation of the FISPO analysis results, and the noisy subcase as a numerical validation of the IOC methodology. The problem consists of 6 unknown model parameters and 2 unknown inputs. The nominal values of the unknown parameters are given in Table S.T3. Regarding the experimental scheme, 11 equidistant samples were assumed to be taken within a time horizon [0, 16.5] in arbitrary units. The search space used in the estimation problem was a two orders of magnitude space around their respective nominal values, in the case of the time-invariant unknowns (i.e. the parameters), and of space of [0, 73.68] and [0, 121.83] in the case of the unknown inputs, w 1 and w 2 , respectively. The input search space was adopted according to the reformulation to the upper bounds considered for atc(t) and IPT G(t) in [14] and [10]. In the noisy subcase, 5% homoscedastic noise was included in the pseudo-experimental data. To avoid the numerical ill-conditioning of the estimation problem we considered -as we did for the previous case study (C2M) -a multi-experimental scheme with 4 experiments that had the same experimental conditions but different nominal input values.
The main text of the article shows the results of the the full system reconstruction for the noiseless subcase for one of the four experiments used in the noisy subcase (Exp 4). The rest of the results are presented here in Figures S6-S8. They show that the full system reconstruction in the noisy case was very good, successfully inferring both the unknown model parameters and the unknown inputs in every experiment.

FISPO analysis
We performed the FISPO analysis of two different versions of the HIV model defined in Section 3.3, eq. (30) of the main paper: 1. The model with the time-varying infection rate, η(t), considered as a known input.
2. The model with η(t) considered as an unknown input.
For each model version we considered a number of different bounds on the number of nonzero derivatives that the inputs are assumed to have (i U for the known inputs, i W for the unknown ones), to assess their influence on the outcome of the FISPO analysis. Table S.T4 shows the results for the different options, including the CPU time of the calculations. The number of Lie derivatives required to reach a solution, as well as the corresponding rank of the generalised observability matrix O I , are also shown.
STRIKE-GOLDD determined that the model is FISPO both with known and unknown inputs. Therefore, unlike in the other case studies, no re-formulation was required for estimating the full system. The number of nonzero derivatives did not affect the results.
To demonstrate the possibility of performing the FISPO analysis without the Assumption 1, i.e. without limiting the number of non-zero derivatives of the unknown input, we also analysed a third version of the model: 3. The model with η(t) considered as an unknown input, and either δ or N considered as a known parameter.
As shown in table S.T4, STRIKE-GOLDD determined that this third version the model is FISPO even with an infinite number of nonzero derivatives. In contrast, for the model version 2, i.e. with all parameters unknown, the result was inconclusive: STRIKE-GOLDD reported that {λ , ρ, c} are identifiable, but could not decide about the remaining variables within the maximum computation time that we allowed (5000 seconds).

Numerical estimation
Similarly to the previous case studies we consider (i) a noiseless and (ii) a noisy synthetic subcase in the estimation problem. However, as explained in the main article, in contrast with the previous case studies a single-experimental scheme was used in the estimation of the noisy subcase, in order to consider a realistic scenario of a patient-specific full system reconstruction of this model. The problem includes the 5 unknown model parameters, the 2 unobserved initial conditions, and the unknown input. All nominal values considered for the problem's unknowns were taken from the literature and are presented in Table S  In the estimation problem, the time horizon considered is [0, 201], measured in days. Following a realistically possible experimental scheme we assume daily measurements for the first 20 days, and once per 4 days after that. The search space considered was [10 −8 , 10 −1 ] in the case of η, and a two orders of magnitude space around their respective nominal values in the case of the 5 unknown parameters and the 2 unobserved initial conditions. To improve the numerical conditioning of the NLP problem, scaling was used in the y 1 and η. This can be easily identified in the scripts provided.
To reconstruct the input η we approximated it with a number of equidistant piece-wise linear elements (PWL). Assuming no prior knowledge on the nominal input's shape, we utilised a reoptimisation based technique that was also used in our previous work [16] for handling arbitrary inputs. Solving the problem considering an initial crude discretisation of the input, we used this result as the initial guess for a re-optimisation of the same problem with the double of PWL elements. Iterating this solution strategy, we started from one PWL element (a ramp starting between t 0 and t f ) and increased the number of PWL to 2 and subsequently to 4. We observed that the solution using 4 PWL was already a very good reconstruction of the system. The results regarding both the noiseless and the noisy subcases are shown in the main article. Please note that, in order to reproduce those results using the scripts that are provided as supplementary files, it is necessary to apply the re-optimisation based approach described above.