## Abstract

Pandemic management requires that scientists rapidly formulate and analyse epidemiological models in order to forecast the spread of disease and the effects of mitigation strategies. Scientists must modify existing models and create novel ones in light of new biological data and policy changes such as social distancing and vaccination. Traditional scientific modelling workflows detach the structure of a model—its submodels and their interactions—from its implementation in software. Consequently, incorporating local changes to model components may require global edits to the code base through a manual, time-intensive and error-prone process. We propose a compositional modelling framework that uses high-level algebraic structures to capture domain-specific scientific knowledge and bridge the gap between how scientists think about models and the code that implements them. These algebraic structures, grounded in applied category theory, simplify and expedite modelling tasks such as model specification, stratification, analysis and calibration. With their structure made explicit, models also become easier to communicate, criticize and refine in light of stakeholder feedback.

This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.

### 1. Introduction

The basic principles of epidemics originate with the Kermack–McKendrick model, which established the paradigm of compartmental models to predict how an infectious disease spreads through a population over time. Yet, the COVID-19 pandemic has created a pressing need to expedite the development of new and customized epidemiological models. For instance, the Institute for Health Metrics and Evaluation (IHME), an independent global health research centre at the University of Washington, has been collecting data, running simulations and publishing findings throughout the COVID-19 pandemic. In April 2020, just two months after the first confirmed US cases of COVID-19, the IHME was comparing different epidemiological models according to the accuracy of their forecasts [1]. The models were summarized in natural language and compared mathematically via model outcomes derived by simulation. Such summaries do not permit a precise comparison of model *structure*, which is essential to understanding how modelling assumptions affect model outcomes. Inadequate model representations limit the speed and precision with which organizations like the IHME can respond to emerging pandemics.

We present and exemplify an approach to developing scientific modelling software in which the high-level structure of a model is visible and manipulable in its software implementation. Mathematical models of scientific systems involve rich, structured knowledge that scientists leverage to communicate, iterate and validate their models. However, this knowledge is obscured from computer systems when models are implemented using low-level computational primitives rather than by software tools that mirror the scientific concepts. Our approach has two phases: (i) formalizing the mathematics of high-level structures that recur in scientific models and (ii) implementing these mathematical formalisms directly in software. Our approach contrasts with the traditional implementation of scientific models, because it treats the structure of scientific models as primary, which in turn prioritizes the expertise of domain scientists and modellers during the development, adjustment and analysis of models. As an example of our approach, we formalize the specification of a compositional or stratified epidemiological model using the mathematics of applied category theory, and we demonstrate its implementation in software.^{1}

Many aspects of model structure can be formalized as algebraic structure using mathematics from category theory. For example, Waites *et al.* [2] apply category theory to formalize composable rule-based models. In this work, we present a generalized addition and a generalized product for composing models. These operators are grounded in category theoretic concepts including copresheaves, structured cospans and pullbacks. These concepts underlie our software design, which automates the implementation of these operations. When specifying a composite model, a sharp distinction is drawn between the *syntax of composition* dictating how the subsystems interact and the *semantics of composition* assigning concrete mathematical models to the subsystems. This separation promotes generality and flexibility in modelling as the same syntax can have several different semantics. In this paper, we will see that Petri nets with mass action kinetics and, more generally, ordinary differential and delay differential equations, can all serve as semantics for the same syntax. Furthermore, translations between these and other types of models can be formalized using concepts from category theory. Unlike the textual syntax of conventional programming languages and data formats, our syntax is algebraic and diagrammatic in nature. This language makes the informal diagrams used by scientists and engineers mathematically rigorous and directly computable. Furthermore, the syntax itself becomes a combinatorial data structure that can be algorithmically manipulated. *Undirected wiring diagrams*, which form an algebraic structure called an *operad*, are the main syntax treated in this paper, although other syntaxes, such as directed wiring diagrams, can be similarly applied [3,4].

Although all scientists intuitively understand that complex models are assembled from simpler ones, such decompositions are rarely made explicit in modelling practice. Our operadic approach to compositional modelling makes the modular structure transparent and rigorous. As a result, experts in distinct fields can develop submodels in parallel, particular submodels can easily be replaced without affecting others and programming errors are reduced because the software handles the bookkeeping needed to assemble the composite model. All of this serves to accelerate the modelling process, which is critical in emerging scenarios where existing models must be rapidly adapted to new scientific contexts.

Because programming simulators is labour intensive and error-prone, many existing software systems present higher-level interfaces for modellers to formulate and analyse models. They are often specific to a particular scientific domain or theoretical class of models. For example, Stan is used for analysing data with probabilistic models [5], Kappa for rule-based modelling of biochemical systems [6] and Copasi for simulating mechanistic models in systems biology [7].

Our software system encompasses a range of tasks in the scientific modelling workflow including model formulation, simulation, analysis and comparison. The paper is organized along these lines. Section 2 presents the compositional approach to model specification, explaining how undirected wiring diagrams (UWDs) are a syntax for composing open Petri nets as well as more general differential equation models. Section 2d highlights the computational and theoretical advantages of this approach. Section 3 demonstrates how maps between Petri nets can encode domain-specific type systems, which are useful for building stratified models and verifying that models do not violate established theory. Finally, §4 discusses calibration and sensitivity analysis via interoperation with other tools, emphasizing visualization and the tight feedback loop enabled by high-level model specifications.

### 2. Compositional methods of model specification

#### (a) Structured multicospans of Petri nets for compartmental models

In this section, we exemplify the categorical approach to compositional modelling by giving the three components of such a framework: (i) a model semantics, for specifying concrete mathematical models of systems, (ii) a composition syntax, for specifying interactions between systems and (iii) a composition rule, for specifying how to compose chosen concrete models according to a given syntactic term. We emphasize that the composition syntax is a logical syntax, in that the composition syntax axiomatizes the rules of well-formed expressions that define a composition of models. By contrast, the model semantics is a denotational semantics that assigns mathematical meaning to the models in a composition. We do not mean semantics in the sense of observed, real-world phenomena and such an interpretation is outside of the scope of this work.

For the model semantics, we use Petri nets in a form called *whole-grain Petri nets* [11], defined by the diagram of finite sets and functions shown in figure 1*c*. Thus, a whole-grain Petri net consists of a finite set of *places* or *species* $S$, a finite set of *transitions* $T$, and spans $S\stackrel{is}{\leftarrow}I\stackrel{it}{\to}T$ and $S\stackrel{os}{\leftarrow}O\stackrel{ot}{\to}T$ defining the *input* and *output* arcs between places and transitions. A span is similar to a ‘multirelation’ in that pairs of elements may be related with multiplicity. Multiplicity is important for transitions that input or output multiple tokens of the same species, such as an infection transition that yields two infected individuals as output. As an example, consider the susceptible–infected–recovered (SIR) model in figure 1*b*. This Petri net has three places $S=\{S,I,R\}$ corresponding to susceptible, infected and recovered populations and two transitions corresponding to infection and recovery. The infection transition is the target of two input arcs (one whose source is the place $S$ and one whose source is the place $I$) and the source of two output arcs, both of whose targets are the place $I$. The recovery transition is the target of a single input arc and the source of a single output arc.
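The SIR example above can be written down as data. The following is a minimal Python sketch, not the AlgebraicPetri implementation: the class `PetriNet`, its field names and the transition labels `inf` and `rec` are assumptions made for illustration. Each arc is stored as a (place, transition) pair, and repeating a pair records multiplicity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PetriNet:
    """Hypothetical container for a whole-grain Petri net.

    inputs  holds one (source place, target transition) pair per input arc.
    outputs holds one (target place, source transition) pair per output arc.
    Repeated pairs are distinct arcs, which is how multiplicity is recorded.
    """
    places: tuple
    transitions: tuple
    inputs: tuple
    outputs: tuple

    def input_multiset(self, t):
        # Multiset of places consumed by transition t.
        return Counter(s for s, target in self.inputs if target == t)

    def output_multiset(self, t):
        # Multiset of places produced by transition t.
        return Counter(s for s, source in self.outputs if source == t)


# The SIR net: infection consumes one S and one I and produces two I;
# recovery consumes one I and produces one R.
sir = PetriNet(
    places=("S", "I", "R"),
    transitions=("inf", "rec"),
    inputs=(("S", "inf"), ("I", "inf"), ("I", "rec")),
    outputs=(("I", "inf"), ("I", "inf"), ("R", "rec")),
)

print(sir.input_multiset("inf"))
print(sir.output_multiset("inf"))
```

Storing arcs as elements rather than as integer weights on place-transition pairs is what distinguishes the whole-grain form; the two output arcs of the infection transition are separate entries here.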

Petri nets are closed systems, meaning that they are isolated from interaction with other systems. Non-compositional modelling approaches focus on explicitly defining and implementing closed systems. However, Petri nets arising in practice can comprise hundreds [12] or thousands [13] of states and transitions, making them unwieldy both to conceptualize and to implement. By contrast, our compositional modelling approach capitalizes on the tendency of real-world systems to coexist in richly structured ecosystems and enables the development and assembly of open-system models.

Structured cospans [14] and decorated cospans [15] are formalisms for turning closed model semantics into open model semantics, in which systems can interact along specified interfaces. Although structured cospans are applicable to a variety of systems, mathematically and in our implementation, we restrict our discussion to the important case of Petri nets. A *structured multicospan of Petri nets*, or *open Petri net* (cf. [16]), is a whole-grain Petri net [11] together with a list of finite sets ${A}_{1},\dots ,{A}_{n}$ and functions ${A}_{1}\to S,\dots ,{A}_{n}\to S$. The sets ${A}_{i}$, called the *feet* of the structured multicospan, define an interface for the open Petri net. The functions ${A}_{i}\to S$, called the *legs* of the structured multicospan, select the places of the Petri net which are exposed through the interface. The more standard notion of *structured cospan* is the special case of $n=2$ legs. The extra flexibility afforded by multicospans is useful in practice. In particular, open Petri nets with arbitrary numbers of legs can be composed using the graphical syntax of UWDs [17], which are a generic graphical syntax for composing relations, database tables, structured multicospans and other undirected systems. A UWD consists of a set of *boxes*, a set of *ports* and a set of *junctions*. Each port is assigned to a box and wired to a junction. Figure 2*a* depicts a UWD with three boxes, 10 ports and five junctions. A port assigned to the SIR box and a port assigned to the VIvR box are both wired to the top-most junction. Likewise, two junctions connect two ports assigned to the SIR box with two ports assigned to the cross exposure box, and two junctions connect two ports assigned to the VIvR box with two ports assigned to the cross exposure box.

In figure 2, we present a compartmental model for viral dynamics that accounts for vaccination as the composition of three concrete submodels corresponding to (i) a disease spread model for unvaccinated people, (ii) a disease spread model for vaccinated people and (iii) a cross exposure model of the interactions between the two populations. The open Petri nets for these three primitive subsystems are shown in figure 2*b*. The composition syntax is given by the UWD in figure 2*a*, which has three boxes, 10 ports and five junctions. The systems compose by identifying places that are connected in the UWD. For example, the recovered populations, labelled $R$, in the SIR and VIvR models are identified in the composite model. The resulting composite model is shown in figure 2*c*.
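The gluing step, in which places connected through a junction are identified, can be sketched in a few lines of Python. This is an illustrative toy and not the AlgebraicPetri API: the function `compose`, the dictionary layout and the two-box example (an infection box and a recovery box sharing the infected population) are all assumptions, and the example is deliberately simpler than the three-box model of figure 2. Each foot is taken to be a single place, so a box has one exposed place per port.

```python
def compose(boxes, wiring):
    """Glue open Petri nets along a UWD (illustrative sketch).

    boxes:  {box name: {"places", "transitions", "inputs", "outputs", "legs"}}
            where "legs" lists the exposed place for each port of the box.
    wiring: {box name: [junction name for each port, in the same order]}
    """
    parent = {}

    def find(x):
        # Representative of x in a simple union-find structure.
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for name, box in boxes.items():
        for place in box["places"]:
            find(("P", name, place))  # register every place, exposed or not
        for place, junction in zip(box["legs"], wiring[name]):
            # Attach the place's class to the junction's class, so that the
            # representative of any exposed place is always a junction.
            parent[find(("P", name, place))] = find(("J", junction))

    def label(name, place):
        root = find(("P", name, place))
        return root[1] if root[0] == "J" else f"{root[1]}.{root[2]}"

    return {
        "places": sorted({label(n, p) for n, b in boxes.items() for p in b["places"]}),
        "transitions": [f"{n}.{t}" for n, b in boxes.items() for t in b["transitions"]],
        "inputs": [(label(n, s), f"{n}.{t}") for n, b in boxes.items() for s, t in b["inputs"]],
        "outputs": [(label(n, s), f"{n}.{t}") for n, b in boxes.items() for s, t in b["outputs"]],
    }


# Toy UWD with two boxes, four ports and three junctions (s, i, r).
boxes = {
    "infection": {
        "places": ["S", "I"], "transitions": ["inf"],
        "inputs": [("S", "inf"), ("I", "inf")],
        "outputs": [("I", "inf"), ("I", "inf")],
        "legs": ["S", "I"],
    },
    "recovery": {
        "places": ["I", "R"], "transitions": ["rec"],
        "inputs": [("I", "rec")],
        "outputs": [("R", "rec")],
        "legs": ["I", "R"],
    },
}
wiring = {"infection": ["s", "i"], "recovery": ["i", "r"]}

composite = compose(boxes, wiring)
print(composite["places"])
print(composite["inputs"])
```

Because both boxes wire their infected place to the junction `i`, the composite has a single infected place and recovers the SIR net. Transitions are never identified; only places are glued.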

In mathematical terms, UWDs form an *operad* constituting the composition syntax, and the structured multicospans of Petri nets (strictly speaking, their isomorphism classes) form an *operad algebra* of the UWD syntax. This operadic framework has many advantages. In addition to the modular model specification strategy exemplified in figure 2, the algebra enables a hierarchical model specification strategy in which a submodel may itself be the composite of still more primitive submodels. A syntax for hierarchical modelling is given later in figure 3*f*. The operadic framework also enables a mathematically rigorous divide-and-conquer workflow by designating subsystems that can be developed and refined in parallel. Updates to submodels do not affect others except through explicitly represented changes to the composition syntax. Furthermore, the syntax provides an opportunity to build assumptions and domain-specific knowledge directly into models and can be used to identify properties or appropriate sampling algorithms of the composite model. These advantages are discussed further in §2d.

The Julia packages AlgebraicPetri and AlgebraicDynamics directly implement the operadic approach described in this section and its extensions in §2c. These packages enable modellers to create executable code for composite models that reflect the modular and hierarchical structure of real-world systems [3].

#### (b) Mass action kinetics for open Petri nets

A Petri net is a combinatorial description of a dynamical process. The graphical representation and network topology of Petri nets can be analysed to infer structural properties of the system. However, behavioural analyses often require explicit model simulations. These simulations can be computed using discrete sampling algorithms, such as Gillespie’s direct method or tau-leaping, or by interpreting a Petri net as an ordinary differential equation (ODE) and applying standard numerical integration techniques. In this section, we focus on the latter method, which allows for integration with the calibration and analysis toolkits described in §4.

Following Baez & Pollard ([18], Definition 13), we associate an ODE to a Petri net by applying the law of mass action, which states that transitions consume inputs and produce outputs at rates proportional to the product of their input concentrations. To illustrate, define the function $p:S\to {\mathbb{N}}^{T}$ so that $p(s)$ is the multiset of transitions producing the species $s$. Likewise, define $r:T\to {\mathbb{N}}^{S}$ so that $r(t)$ is the multiset of species that are inputs to the transition $t$. Its weighted preimage ${r}^{-1}:S\to {\mathbb{N}}^{T}$ maps a species $s$ to the multiset of transitions for which it is an input. Each species $s$ in the Petri net is assigned a variable ${u}_{s}$ in the ODE. The transitions define the following vector field on the state space ${\mathbb{R}}^{S}$:
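In the standard mass-action form, and writing $\beta_t$ for the rate constant of transition $t$ (a symbol assumed here, since the surrounding text does not name the rate constants) and $m(x)$ for the multiplicity of $x$ in a multiset $m$, the vector field consistent with the definitions above is

$$
\dot{u}_s \;=\; \sum_{t\in T}\bigl(p(s)(t)-r^{-1}(s)(t)\bigr)\,\beta_t\prod_{s'\in S}u_{s'}^{\,r(t)(s')} .
$$

For the SIR net, with transitions labelled $\mathrm{inf}$ and $\mathrm{rec}$ for illustration, this gives

$$
\dot{u}_S=-\beta_{\mathrm{inf}}\,u_S u_I,\qquad
\dot{u}_I=\beta_{\mathrm{inf}}\,u_S u_I-\beta_{\mathrm{rec}}\,u_I,\qquad
\dot{u}_R=\beta_{\mathrm{rec}}\,u_I ,
$$

where the coefficient of the infection term in $\dot{u}_I$ is $2-1=1$ because infection produces two infected individuals and consumes one.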

When applicable, defining ODEs by Petri nets or composites of Petri nets has significant software advantages. Meaningful, local changes to a Petri net, such as substituting submodels in a composition or adding a single species or transition, often lead to non-trivial, global changes to the corresponding ODE that affect many variables and terms. For example, analysis tools that observe, report and calculate properties of state variables throughout a simulation often refer to a single variable in many places. Therefore, adding a state variable to the system, or removing one from it, requires making many coordinated changes to the code. This design pattern results in software where local changes to the mathematical model require global changes to the software implementation. By contrast, local changes to a Petri net model require only local changes to its software implementation in AlgebraicPetri. The transformation of a Petri net into an ODE via the law of mass action automatically and accurately translates these local changes to the Petri net model into global changes to the corresponding ODE. This automation can accelerate the modelling cycle, for example when making modifications in response to new information or testing for policy robustness, which is critical when responding to urgent situations.
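The locality argument can be illustrated with a small Python sketch of the mass-action translation. It is not the AlgebraicPetri code: the function `vectorfield`, the rate values and the waning-immunity transition `wane` are hypothetical choices made for the example. Adding the new transition means appending one input arc, one output arc and one rate, while the generated right-hand side changes in two equations.

```python
from collections import Counter


def vectorfield(inputs, outputs, rates):
    """Return f(u) = du/dt under the law of mass action (illustrative sketch).

    inputs, outputs: lists of (place, transition) arcs, repeated for multiplicity.
    rates:           {transition: rate constant}.
    """
    def f(u):
        du = {s: 0.0 for s in u}
        for t, beta in rates.items():
            consumed = Counter(s for s, tt in inputs if tt == t)
            produced = Counter(s for s, tt in outputs if tt == t)
            flux = beta
            for s, k in consumed.items():
                flux *= u[s] ** k          # product of input concentrations
            for s, k in consumed.items():
                du[s] -= k * flux
            for s, k in produced.items():
                du[s] += k * flux
        return du
    return f


inputs = [("S", "inf"), ("I", "inf"), ("I", "rec")]
outputs = [("I", "inf"), ("I", "inf"), ("R", "rec")]
rates = {"inf": 0.5, "rec": 0.25}
u = {"S": 8.0, "I": 2.0, "R": 4.0}

sir_rhs = vectorfield(inputs, outputs, rates)(u)

# Local change: one new transition R -> S (hypothetical waning immunity).
sirs_rhs = vectorfield(
    inputs + [("R", "wane")],
    outputs + [("S", "wane")],
    {**rates, "wane": 0.125},
)(u)

print(sir_rhs)
print(sirs_rhs)
```

The rate constants are powers of two so that the arithmetic is exact in floating point; any positive values would serve equally well.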

#### (c) Composition of general differential equation models

Mass action kinetics are often insufficient to simulate complex dynamical processes like those found in biology, ecology and epidemiology. In this section, we show how the composition method for Petri nets generalizes to composition methods for models of different types, in particular to models explicitly defined by ODEs or delay differential equations (DDEs).

The composition of Petri nets described in §2a is an example of the categorical formalism of operads and operad algebras, which equip visual grammars with the rigour of algebraic equations. The theme of the operadic approach is to explicitly and independently describe the syntax of composition—how subsystems interact—and the semantics of composition—particular choices of component models for each subsystem. The example of composing Petri nets, in which the syntax is given by a UWD and the semantics by open Petri nets, is one of many examples of the operadic approach to modelling.

Modelling vector-borne pathogens, such as malaria, is an exemplar context for the operadic approach because the pathogen dynamics naturally decompose into distinct scientific domains such as epidemiology and entomology. The Ross–Macdonald class of equational models highlights the decomposition of a vector-borne pathogen epidemic into three subsystems: pathogen dynamics in the vector (mosquito) population, pathogen dynamics in the host (human) population and the dynamics of pathogen transference in the bloodmeal [19]. Syntactically, this composition is defined by a UWD with three boxes corresponding to the three subsystems and with junctions corresponding to populations involved in multiple processes. Figure 3*a* defines this UWD. To this composition pattern, we apply submodels of increasing complexity and of different types, namely Petri nets in figure 3*b*, ODEs in figure 3*c*,*d* and DDEs in figure 3*e*. The composition for Petri nets was defined in §2a. The ODE and DDE submodels compose by identifying variables connected in the UWD and summing the rates of change for identified variables. For example, the result of composing the three ODE submodels in figure 3*d* according to the given composition pattern is

Just as the law of mass action defines a transformation from Petri nets to ODEs, there is also a transformation from ODEs to DDEs giving a trivial dependence on the history. These transformations allow modellers to use different model types to specify the submodels of a composite. For example, in figure 3*e* the submodels for the pathogen dynamics in hosts and vectors are given by ODEs and in contrast the bloodmeal model is given by a DDE. The composite model is derived by translating the ODE submodels into DDE models and then composing the DDE models with the result being akin to the Sharpe–Lotka model [20]. This formal process gives domain experts the freedom to choose model classes that best fit their field.
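The composition rule for ODE submodels described above (identify variables connected in the UWD, then sum the rates of change of identified variables) can be sketched as follows. This is a toy under stated assumptions rather than the AlgebraicDynamics API: `compose_odes`, the wiring dictionaries and the two SIR-style submodels are hypothetical, they are not the Ross–Macdonald submodels of figure 3, and every local variable is assumed to be exposed to a junction.

```python
def compose_odes(submodels):
    """Compose ODE submodels along shared variables (illustrative sketch).

    submodels: list of (wiring, rhs) pairs, where wiring maps each local
    variable name to a junction name and rhs maps a local state dict to a
    dict of local rates of change.
    """
    junctions = sorted({j for wiring, _ in submodels for j in wiring.values()})

    def rhs(u):
        du = {j: 0.0 for j in junctions}
        for wiring, local_rhs in submodels:
            # Each submodel only sees the junction values it is wired to.
            local_state = {v: u[j] for v, j in wiring.items()}
            for v, rate in local_rhs(local_state).items():
                du[wiring[v]] += rate      # sum contributions at shared junctions
        return du

    return rhs


def infection(x):
    flux = 0.5 * x["sus"] * x["inf"]
    return {"sus": -flux, "inf": flux}


def recovery(x):
    flux = 0.25 * x["inf"]
    return {"inf": -flux, "rec": flux}


composite_rhs = compose_odes([
    ({"sus": "S", "inf": "I"}, infection),
    ({"inf": "I", "rec": "R"}, recovery),
])

rates_of_change = composite_rhs({"S": 8.0, "I": 2.0, "R": 0.0})
print(rates_of_change)
```

Each submodel is written against its own local variable names and knows nothing about the other; replacing `recovery` with a different function changes only one entry of the list.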

#### (d) Discussion of compositional model specification

In this section, we have presented a compositional approach to modelling that is grounded in the mathematics of applied category theory and implemented directly in software. We conclude with a discussion of this framework.

##### (i) Advantages to the engineering process

Engineering is a process that involves taking a theoretical description of a model and developing software that can simulate, calibrate and analyse the model. A model description is often informally compositional and in traditional software this structure is implicit in the code. By contrast, software packages based on the operadic approach to modelling make this structure explicit and disambiguate the process of turning the mathematical specification of a model into the code that implements it. As a result, engineers have a mathematically grounded divide-and-conquer approach to select, implement and iteratively develop submodels. This process is also hierarchical as the categorical formalism implies that a submodel may itself be the composition of still more primitive models. Furthermore, as discussed in §2b, the implementation of the categorical framework can reduce code complexity and errors, since local changes to models correspond to local changes in the code base.

##### (ii) Advantages to the scientific process

The scientific process relies on transparent communication and critique of models, and a common problem is that the shortest description of a model is the code itself rather than the theoretical model description. While the code is precise, it is often not easily or efficiently understood, even by proficient programmers. Strategies such as the Overview, Design Concepts and Details (ODD) Protocol alleviate these strains on the scientific process by establishing documentation conventions and encouraging the assumptions and theoretical underpinnings of a model to be made rigorous and communicable [21]. Our framework takes this strategy a step further by grounding the model description in an algebraic structure. The visual diagrams used to communicate models, such as those in figures 2 and 3, then become rigorous enough to be unambiguously translated into code. By making a model’s theoretical formulation more visible and more tightly bound to its software implementation, the compositional approach helps modellers identify components or interactions that are unnecessarily complicated, do not properly reflect domain knowledge, or depend upon unreasonably strong assumptions.

The compositional framework for modelling also streamlines the scientific process by prioritizing the independence of submodels. Submodels can be efficiently tested and substituted without affecting other submodels in a composite. For example, figure 3*c*–*e* exemplifies updating the bloodmeal submodel without affecting the host and vector submodels. This feature is practical for (i) model formulation, in which parsimonious but empirically adequate models are found by testing different combinations and complexities of submodels and (ii) policymaking, where it is important that policies be robust to variations in submodels and other modelling assumptions. Works such as [13,22] demonstrate the importance of testing multiple combinations of submodels. The categorical approach and its implementation in software disambiguate and assist this process.

##### (iii) Theoretical advantages

The compositional framework also provides theoretical advantages to scientific modelling. Because the syntax and semantics of composition are explicitly and independently represented, the composition syntax is a venue for exchanging expert knowledge, while the choice of submodels can be left to specific domain experts or a model selection process. For example, the syntax proposed in figure 3*f* asserts that the submodels for the vector dynamics must include a susceptible, an infected but not yet infectious and an infectious population. It also specifies that the model of vector dynamics is broken down into submodels for the aquatic stages and for the epidemic in adults. Additionally, the composition syntax can be analysed for mechanistic or causal dependencies. For instance, the syntax given in figure 3*f* expresses that the host and vector population can only affect each other through the bloodmeal.

Finally, mathematical properties of these compositional modelling frameworks translate directly into important consistency properties for model construction. The associativity and symmetry properties of operads and their algebras imply that the order of composing submodels does not affect the final model. Similarly, the functoriality of reinterpretation rules, such as the law of mass action, implies that reinterpretation and composition can be done in any order, which again does not affect the final result.

### 3. Type systems for open Petri nets

#### (a) Typed Petri nets

Category theory emphasizes the importance of *morphisms* or maps between mathematical objects. In this section, we demonstrate how morphisms between Petri nets can be used to define typed Petri nets.

Petri nets can represent domain-specific type systems. For example, the Petri net ${P}_{\mathsf{i}\mathsf{n}\mathsf{f}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{i}\mathsf{o}\mathsf{u}\mathsf{s}}$ in figure 4*a* defines a type system for an infectious disease model. It consists of a single species type and three transition types corresponding to (i) spontaneous changes in infection status; (ii) spontaneous changes between non-infection-related strata, such as movement between patches or changes in quarantine status; and (iii) interactions between a pair of individuals. By contrast, the Petri net ${P}_{\mathsf{v}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{o}\mathsf{r}\text{-}\mathsf{b}\mathsf{o}\mathsf{r}\mathsf{n}\mathsf{e}}$ depicted in figure 4*c* represents a type system for a vector-borne disease model and has two species types corresponding to the vector and host populations.

A morphism between Petri nets is a map of places, transitions, input arcs and output arcs that preserves the arities of the arcs and respects the sources and targets of the arcs.^{2} For example, the source of an input arc $i$ in the domain Petri net must map to the source of the arc to which $i$ is mapped. Given a Petri net ${P}_{\mathsf{t}\mathsf{y}\mathsf{p}\mathsf{e}}$ defining the type system, a *typed Petri net* is a Petri net $P$ together with a morphism $\varphi :P\to {P}_{\mathsf{t}\mathsf{y}\mathsf{p}\mathsf{e}}$. Figures 4*b* and 5 give examples of Petri nets typed by ${P}_{\mathsf{i}\mathsf{n}\mathsf{f}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{i}\mathsf{o}\mathsf{u}\mathsf{s}}$.

All of the Petri nets typed by a given type system form a *slice category* of the category of whole-grain Petri nets. The mathematical features of slice categories guarantee important modelling features. First, typed Petri nets are practical for model checking. A Petri net typed by ${P}_{\mathsf{i}\mathsf{n}\mathsf{f}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{i}\mathsf{o}\mathsf{u}\mathsf{s}}$ assigns each transition to be a spontaneous change in infection, a spontaneous change in strata, or an interaction. The type of a transition must be consistent with the number of input and output arcs connected to it. For example, a typing by ${P}_{\mathsf{i}\mathsf{n}\mathsf{f}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{i}\mathsf{o}\mathsf{u}\mathsf{s}}$ ensures that a transition with interaction type has two inputs and two outputs. Second, typed Petri nets facilitate high-level critiques of a model. For example, a model typed by ${P}_{\mathsf{v}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{o}\mathsf{r}\text{-}\mathsf{b}\mathsf{o}\mathsf{r}\mathsf{n}\mathsf{e}}$ cannot incorporate vertical transmission from parents to offspring or sexual transmission in either hosts or vectors. This property may contradict known transmission pathways for a specific disease and thus motivate a revision of the model and the type system. Third, features of the type system may also directly translate into features of the typed Petri net. For example, because each transition in ${P}_{\mathsf{i}\mathsf{n}\mathsf{f}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{i}\mathsf{o}\mathsf{u}\mathsf{s}}$ has the same number of inputs and outputs, any Petri net typed by ${P}_{\mathsf{i}\mathsf{n}\mathsf{f}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{i}\mathsf{o}\mathsf{u}\mathsf{s}}$ conserves the total population over time.
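The model-checking role of a typing can be illustrated with a short Python sketch. It takes one reading of the arity condition: for every transition, the multiset of types of its input (and output) places must equal the input (and output) multiset of its assigned transition type, which is the condition under which a compatible map of arcs exists. The function `is_typing`, the dictionary layout and the type names `Pop`, `interact`, `disease` and `strata` are assumptions made for illustration.

```python
from collections import Counter


def arcs_at(arcs, t):
    # Multiset of places attached to transition t by the given arcs.
    return Counter(s for s, tt in arcs if tt == t)


def is_typing(net, type_net, place_map, transition_map):
    """Check whether the maps can extend to an arity-preserving morphism."""
    if any(place_map[s] not in type_net["places"] for s in net["places"]):
        return False
    for t in net["transitions"]:
        for kind in ("inputs", "outputs"):
            typed = Counter()
            for s, k in arcs_at(net[kind], t).items():
                typed[place_map[s]] += k
            if typed != arcs_at(type_net[kind], transition_map[t]):
                return False
    return True


# Hypothetical encoding of the infectious-disease type system: one species
# type, an interaction type with two inputs and two outputs, and two
# spontaneous types with one input and one output each.
p_infectious = {
    "places": ["Pop"],
    "transitions": ["interact", "disease", "strata"],
    "inputs": [("Pop", "interact"), ("Pop", "interact"), ("Pop", "disease"), ("Pop", "strata")],
    "outputs": [("Pop", "interact"), ("Pop", "interact"), ("Pop", "disease"), ("Pop", "strata")],
}
sir = {
    "places": ["S", "I", "R"],
    "transitions": ["inf", "rec"],
    "inputs": [("S", "inf"), ("I", "inf"), ("I", "rec")],
    "outputs": [("I", "inf"), ("I", "inf"), ("R", "rec")],
}
to_pop = {"S": "Pop", "I": "Pop", "R": "Pop"}

good = is_typing(sir, p_infectious, to_pop, {"inf": "interact", "rec": "disease"})
bad = is_typing(sir, p_infectious, to_pop, {"inf": "disease", "rec": "disease"})
print(good, bad)
```

The second call fails because infection has two inputs and two outputs, so it cannot be assigned a spontaneous type with a single input and a single output.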

Typed Petri nets also provide guardrails for composing models. A Petri net typed by ${P}_{\mathsf{v}\mathsf{e}\mathsf{c}\mathsf{t}\mathsf{o}\mathsf{r}\text{-}\mathsf{b}\mathsf{o}\mathsf{r}\mathsf{n}\mathsf{e}}$ must assign each species to be of either vector type or host type, thereby separating the vector and host populations. The type system also ensures that interactions only occur between vectors and hosts and that no vectors spontaneously become hosts or vice versa. For example, in figure 4*d*, the transition ${S}_{H}\to {S}_{V}$, which turns susceptible hosts into susceptible vectors, is forbidden by the type system. When composing open Petri nets typed by the same type system via the process described in §2a, we can add a constraint that identified species must have the same type. With this check in place, a host subpopulation will not be identified with a vector subpopulation during model composition. Furthermore, under this constraint, the composition of typed Petri nets retains a typing.

Ultimately, a domain-specific typing can guarantee meaningful properties of a model and prevent novice users or automated systems from generating models that contradict common sense or domain expertise.

#### (b) Stratified compartmental models

A recurring theme in scientific modelling is the importance of stratified models, in which local dynamics are reproduced in multiple strata and strata interact according to a specified scheme. For example, Pooley *et al.* [23] consider age-stratified models, and Citron *et al.* [22] compare stratified models defined by a choice of local epidemiological dynamics (SIR, SIS or Ross–Macdonald) and a choice of stratification by location (the flux or simple trip models of metapopulation dynamics). Typed Petri nets offer a general methodology for stratifying models, which contrasts with the by-hand approach taken in [22].

Consider two typed Petri nets, the unstratified *disease model* $\varphi :P\to {P}_{\mathsf{t}\mathsf{y}\mathsf{p}\mathsf{e}}$ and a *stratification scheme* ${\varphi}^{\prime}:{P}^{\prime}\to {P}_{\mathsf{t}\mathsf{y}\mathsf{p}\mathsf{e}}$. The *stratification* of $P$ by ${P}^{\prime}$ is defined to be the Petri net with places (resp. transitions, input arcs and output arcs) consisting of pairs of places (resp. transitions, input arcs and output arcs) in $P$ and ${P}^{\prime}$ which have the same type. Because the morphisms $\varphi $ and ${\varphi}^{\prime}$ respect the source and target maps from arcs to species and transitions, the source and target maps in the stratified model are well-defined. Figure 6*b* gives an example of stratifying the SIR model by a model of quarantine/isolation status. In the stratified model, the place $(S,Q)$ represents the susceptible and quarantining population while $(S,\sim Q)$ represents the susceptible and not quarantining population. The transition mediating the places $(S,\sim Q)$ and $(I,\sim Q)$ represents the infection transition in the SIR model and the interaction between non-quarantining people in the stratification scheme. Because these transitions are both of interaction type and mediate species of the same type, they are paired in the stratified model. Figure 5 gives a palette of additional epidemiological models (SIS and SVIIvR) and stratification schemes (quarantine status, age, the flux model of spatial dynamics and the simple trip model of spatial dynamics).
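The pairing construction above can be sketched directly in Python. This is an illustration rather than the AlgebraicPetri implementation: the function `stratify`, the data layout, the arc-type names (`in1`, `in2`, `out1`, `out2`, `d_in`, `d_out`) and the simplified quarantine scheme are all assumptions. In particular, the scheme below omits the transitions that move people in and out of quarantine and adds disease-type self-loops `dQ` and `dN` so that recovery can occur in each stratum; it is not claimed to match figure 6. The place `N` stands for the non-quarantining population.

```python
def stratify(p, q):
    """Pair places, transitions and arcs of two typed nets that share a type.

    Each typed net is a dict with
      "S": {place: type}, "T": {transition: type},
      "I"/"O": {arc id: (place, transition, arc type)}.
    """
    def pair(a, b):
        return {(x, y): ta for x, ta in a.items() for y, tb in b.items() if ta == tb}

    def pair_arcs(a, b):
        return {
            (i, j): ((si, sj), (ti, tj), ty)
            for i, (si, ti, ty) in a.items()
            for j, (sj, tj, ty2) in b.items()
            if ty == ty2
        }

    return {
        "S": pair(p["S"], q["S"]),
        "T": pair(p["T"], q["T"]),
        "I": pair_arcs(p["I"], q["I"]),
        "O": pair_arcs(p["O"], q["O"]),
    }


sir = {
    "S": {"S": "Pop", "I": "Pop", "R": "Pop"},
    "T": {"inf": "interact", "rec": "disease"},
    "I": {"i1": ("S", "inf", "in1"), "i2": ("I", "inf", "in2"), "i3": ("I", "rec", "d_in")},
    "O": {"o1": ("I", "inf", "out1"), "o2": ("I", "inf", "out2"), "o3": ("R", "rec", "d_out")},
}
quarantine = {
    "S": {"Q": "Pop", "N": "Pop"},
    "T": {"mix": "interact", "dQ": "disease", "dN": "disease"},
    "I": {"a1": ("N", "mix", "in1"), "a2": ("N", "mix", "in2"),
          "a3": ("Q", "dQ", "d_in"), "a4": ("N", "dN", "d_in")},
    "O": {"b1": ("N", "mix", "out1"), "b2": ("N", "mix", "out2"),
          "b3": ("Q", "dQ", "d_out"), "b4": ("N", "dN", "d_out")},
}

strat = stratify(sir, quarantine)
print(sorted(strat["T"]))
print(strat["I"][("i1", "a1")])
```

In this toy, infection is paired only with mixing among the non-quarantining population, so the stratified net has a single infection transition between the places `(S, N)` and `(I, N)`, while recovery appears once per stratum. The six places, three transitions and eight arcs are generated from two much smaller typed nets.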

The categorical abstraction standardizes the definition of model stratification, and its implementation in AlgebraicPetri automates the construction of stratified models under the constraints of the expert-chosen type system. Because the size of a stratified model grows with the product of the sizes of its component models, this framework streamlines the accurate implementation of stratified models as well as the clear communication and critique of stratified models via their components. Many of the advantages described in §2d also apply to the categorical representation of model stratification.

Comparison with [22] highlights the clarity and efficiency that our approach brings to the modelling workflow. Citron *et al.* [22] investigate how the choice of movement model between subpopulations affects disease models calibrated to real-world datasets. They combine three standard disease models (SIR, SIS and Ross–Macdonald) with two choices of movement dynamics (flux and simple trip). In the flux model, people relocate to different patches at fixed rates. In the simple trip model, people are assigned a home patch and temporarily visit other patches. The flux and simple trip models on two patches are expressed as Petri nets in figure 5*b*(iii) and (iv). In [22], the adjustments to the disease models are made manually and do not express a formal relationship between the adjusted models and their component disease and movement models. By contrast, our approach to model stratification formalizes this construction. In particular, the stratifications of the Petri nets for the SIR and SIS disease models (figures 4*b* and 5*a*(i)) by the Petri nets for the flux and simple trip movement models mirror the differential equation models defined in ([22], eqns 6, 7, 9 and 10). Our software implementation of the mathematical ideas presented in this section can then be applied to automatically generate the stratified models from the palette of component models. As shown in the electronic supplementary material, this implementation greatly reduces the size of the Petri nets that must be encoded by hand. Additionally, this approach makes it easier to extend the methods of Citron *et al.* [22]: new stratification schemes, once defined and typed, can be integrated directly into the model construction, calibration and analysis pipeline, instead of requiring experts to manually adjust each candidate disease model by each candidate movement model.

Mathematically, a stratified model is a *pullback* of whole-grain Petri nets, or equivalently a *product* in the slice category over the given type system. Standard results about these well-studied categorical constructions immediately yield useful properties of stratified Petri nets. For example, consider stratifying a disease model by quarantine status and by spatial dynamics. Since pullback is an associative and commutative binary operation up to isomorphism, the order of stratification does not affect the final model. That is, the following procedures are equivalent: stratifying the disease model by spatial dynamics and then by quarantine status, stratifying the disease model by quarantine status and then by spatial dynamics, and stratifying the disease model by the stratification of spatial dynamics by quarantine status (or vice versa).
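In symbols, the construction can be sketched as follows. Here $S$, $S'$ and $S_{\mathsf{type}}$ denote the sets of places of $P$, $P'$ and $P_{\mathsf{type}}$, and $P''$ is a hypothetical second stratification scheme over the same type system; this notation is assumed for illustration. The first formula applies equally to transitions, input arcs and output arcs.

$$
S \times_{S_{\mathsf{type}}} S' = \{(s, s') \in S \times S' \mid \varphi(s) = \varphi'(s')\}
$$

$$
P \times_{P_{\mathsf{type}}} P' \cong P' \times_{P_{\mathsf{type}}} P, \qquad (P \times_{P_{\mathsf{type}}} P') \times_{P_{\mathsf{type}}} P'' \cong P \times_{P_{\mathsf{type}}} (P' \times_{P_{\mathsf{type}}} P'')
$$

The isomorphisms only reorder or rebracket the tuples that name places, transitions and arcs, so the stratified models obtained in different orders agree up to this renaming.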

### 4. Calibrating and analysing models

The purpose of epidemiological modelling during an unfolding pandemic is to transform sparse data into effective policy decisions. The robustness of this process depends on understanding how adjusting a model affects its accuracy in representing the data (model calibration) and the policy outcomes it evidences (model analysis). In §2, we described the mathematics and implementation of specifying models by composing open Petri nets and other differential equation models. In this section, we show how this modelling framework streamlines the iterative loop of model specification, calibration and analysis in the context of composing Petri nets, clarifying several of the advantages sketched in §2d.

Since our approach decouples disease models based on Petri nets from the implementation of simulators in code, analysis tools can be defined directly on the combinatorial data structures representing the Petri nets. A tool defined once can thus be applied with equal ease to explicitly defined Petri nets, compositions of Petri nets, hierarchically defined Petri nets and stratified Petri nets. One class of analysis tools comes from integrating AlgebraicPetri with the SciML suite, which provides procedures for parameter estimation and sensitivity analysis. We illustrate how this integration tightens the modelling workflow using the example of the SVIIvR Petri net model defined in figure 2.
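The idea that a tool defined once on the Petri net data structure applies to any net can be illustrated with a minimal Python sketch. This is not AlgebraicPetri or SciML code: the dictionary encoding of nets, the function names `mass_action` and `simulate`, the Runge-Kutta integrator and all rate values and initial conditions are assumptions chosen for illustration.

```python
from math import prod


def mass_action(net, rates):
    """Return f(u) = du/dt for a Petri net under mass-action kinetics.

    net maps each transition name to (input places, output places);
    rates maps each transition name to its rate constant.
    """
    def f(u):
        du = {p: 0.0 for p in u}
        for t, (ins, outs) in net.items():
            flow = rates[t] * prod(u[p] for p in ins)
            for p in ins:
                du[p] -= flow
            for p in outs:
                du[p] += flow
        return du
    return f


def simulate(f, u0, dt, steps):
    """Integrate du/dt = f(u) with the classical fourth-order Runge-Kutta method."""
    def shifted(u, k, h):
        return {p: u[p] + h * k[p] for p in u}

    u, path = dict(u0), [dict(u0)]
    for _ in range(steps):
        k1 = f(u)
        k2 = f(shifted(u, k1, dt / 2))
        k3 = f(shifted(u, k2, dt / 2))
        k4 = f(shifted(u, k3, dt))
        u = {p: u[p] + dt / 6 * (k1[p] + 2 * k2[p] + 2 * k3[p] + k4[p]) for p in u}
        path.append(u)
    return path


# Two different nets, one simulator. Rates and initial conditions are illustrative.
sir = {"inf": (["S", "I"], ["I", "I"]), "rec": (["I"], ["R"])}
sis = {"inf": (["S", "I"], ["I", "I"]), "rec": (["I"], ["S"])}
rates = {"inf": 0.5, "rec": 0.1}

sir_path = simulate(mass_action(sir, rates), {"S": 0.99, "I": 0.01, "R": 0.0}, 0.1, 500)
sis_path = simulate(mass_action(sis, rates), {"S": 0.99, "I": 0.01}, 0.1, 500)
```

Neither function inspects how a net was built, so a composed or stratified net encoded in the same way could be passed to `mass_action` unchanged.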

AlgebraicPetri includes a method that converts generic Petri nets into the reaction network format supported by Catalyst, a library in the SciML ecosystem that uses symbolic algebra from the ModelingToolkit framework to represent chemical reaction networks [24]. This method can be applied independently of how the Petri net was constructed. For example, we can apply it to the modularly constructed SVIIvR model from §2a and use Catalyst to fit the parameters to COVID-19 infection data gathered from the US state of Georgia over a five-month period (figure 7) [25]. In this case, there is an order-of-magnitude difference between the estimated initial population and the true initial population. This mismatch may trigger an adjustment to the underlying model, for example a parallel refinement of one or more of the submodels. The adjusted models can be fed into the same calibration pipeline with no modifications to the analysis code (see figure 7).
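A calibration routine that is independent of the model it calibrates might be sketched as below. This is a toy stand-in, not the Catalyst/SciML pipeline: an exhaustive grid search replaces the SciML optimizers, forward Euler replaces the ODE solvers, and the observations are synthetic, generated from the model itself with known rates rather than taken from the Georgia dataset. The names `simulate` and `calibrate` and all numerical values are assumptions.

```python
from itertools import product
from math import prod


def simulate(net, rates, u0, dt, steps):
    """Forward-Euler integration of the mass-action ODE of a Petri net.

    net maps each transition name to (input places, output places).
    """
    u, path = dict(u0), [dict(u0)]
    for _ in range(steps):
        du = {p: 0.0 for p in u}
        for t, (ins, outs) in net.items():
            flow = rates[t] * prod(u[p] for p in ins)
            for p in ins:
                du[p] -= flow
            for p in outs:
                du[p] += flow
        u = {p: u[p] + dt * du[p] for p in u}
        path.append(u)
    return path


def calibrate(net, u0, observed, place, grid, dt):
    """Return the rate assignment on the grid that minimises the squared
    error between the simulated `place` and the observed series."""
    names = sorted(net)

    def loss(rates):
        path = simulate(net, rates, u0, dt, len(observed) - 1)
        return sum((u[place] - y) ** 2 for u, y in zip(path, observed))

    candidates = (dict(zip(names, values))
                  for values in product(grid, repeat=len(names)))
    return min(candidates, key=loss)


sir = {"inf": (["S", "I"], ["I", "I"]), "rec": (["I"], ["R"])}
u0 = {"S": 0.99, "I": 0.01, "R": 0.0}

# Synthetic observations of the infected fraction, generated with known rates.
true_rates = {"inf": 0.5, "rec": 0.1}
observed = [u["I"] for u in simulate(sir, true_rates, u0, 0.5, 100)]

grid = [k / 20 for k in range(1, 21)]  # candidate rates 0.05, 0.10, ..., 1.00
fitted = calibrate(sir, u0, observed, "I", grid, 0.5)
```

Because `calibrate` only reads the transitions of the net it is given, a refined model encoded in the same way can be passed through it without changing the function, which is the property the paragraph above relies on.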

Following calibration, a model can be analysed to suggest policy decisions and predict policy outcomes. A seamless process for analysing models as they evolve is critical for testing the robustness of these decisions and outcomes. As an example analysis, we integrate the proportion of the non-infectious population over a simulation of the SVIIvR model. We also compute the sensitivity of this outcome to the transition rates of the Petri net using the tools in ForwardDiff [26]. In figure 8, the sensitivity results are visualized by a heatmap that explicitly connects the results of the analysis to the underlying Petri net model. Adjustments to the underlying model (such as adding or removing transitions, changing transition rates, or substituting one submodel for another) are immediately reflected in the analysis. This tight feedback loop gives practical and visual tools that can be rapidly refined [27] and used to determine which policy decisions or outcomes are robust to model changes. In these examples, the analysis is external to the model: it is run directly on the ODE derived from the Petri net. In future work, we intend to incorporate explicit representations of behavioural analyses into the compositional framework, including analyses that are informed by the structure of the composition. These analyses may be purely observational, actively control submodels in the composite, or check satisfaction of contracts as formalized in [28].
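The sensitivity computation can be sketched in Python with hand-written dual numbers, which is the idea behind forward-mode automatic differentiation. This is not ForwardDiff and not the SVIIvR analysis of figure 8: the `Dual` class, the function names, the use of an SIR net, the Euler integrator, the normalisation of the population to 1 and all numerical values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A number val + eps*d with d*d = 0; eps carries a derivative."""
    val: float
    eps: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.eps - other.eps)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__


def lift(x):
    """Promote a plain number to a Dual with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))


def non_infectious_time(net, rates, u0, infectious, dt, steps):
    """Euler-integrate the mass-action ODE of the net and accumulate the
    non-infectious proportion over time (population assumed to sum to 1)."""
    u = {p: lift(x) for p, x in u0.items()}
    total = Dual(0.0)
    for _ in range(steps):
        infected = Dual(0.0)
        for p in infectious:
            infected = infected + u[p]
        total = total + dt * (1.0 - infected)
        du = {p: Dual(0.0) for p in u}
        for t, (ins, outs) in net.items():
            flow = lift(rates[t])
            for p in ins:
                flow = flow * u[p]
            for p in ins:
                du[p] = du[p] - flow
            for p in outs:
                du[p] = du[p] + flow
        u = {p: u[p] + dt * du[p] for p in u}
    return total


def sensitivities(net, rates, u0, infectious, dt, steps):
    """Derivative of the outcome with respect to each rate, one pass per rate."""
    result = {}
    for name in rates:
        seeded = {t: Dual(r, 1.0 if t == name else 0.0) for t, r in rates.items()}
        result[name] = non_infectious_time(net, seeded, u0, infectious, dt, steps).eps
    return result


sir = {"inf": (["S", "I"], ["I", "I"]), "rec": (["I"], ["R"])}
rates = {"inf": 0.5, "rec": 0.1}
u0 = {"S": 0.99, "I": 0.01, "R": 0.0}
grads = sensitivities(sir, rates, u0, ["I"], 0.1, 500)
```

Here `grads` maps each transition to the derivative of the outcome with respect to its rate, which is the kind of per-transition quantity that a heatmap over the Petri net can display. Changing the net changes the result without any change to `sensitivities`.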

### 5. Conclusion

Scientific modelling is an iterative process of proposing, implementing, simulating, calibrating, analysing and comparing models. We presented a mathematical framework and software tools to accelerate the modelling process for compartmental models of infectious disease, in an effort to reduce the response time to emerging pandemics. Our framework is grounded in applied category theory and captures the algebraic and compositional structure of scientific models in a way that can be easily conveyed to both human scientists and computer systems. As a result, complex models can be specified compositionally using the syntax of wiring diagrams and algebraic operations or through the stratification of typed models. Our approach makes model structure a readily computable resource, which streamlines numerous downstream analyses, such as parameter estimation and sensitivity analysis. Together, the mathematical and computational features of our approach simplify and accelerate the iterative modelling process.

The structuralist approach to epidemiological modelling suggests many directions for future work. It can be extended to incorporate additional model semantics, such as stock-and-flow diagrams as an alternative to Petri nets, or stochastic and jump differential equations as complements to ODEs and DDEs. As demonstrated, the compositional structure simplifies the specification and visibility of multi-faceted models. A natural next step would be to investigate how compositional structure can be exploited in the mathematical and computational analysis of the models. For instance, parallel computations could be organized using the hierarchical decomposition already inherent in the model specification.

### Data accessibility

The software presented in this paper can be found at www.algebraicjulia.org/.

### Authors' contributions

S.L.: conceptualization, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; A.B.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; M.H.: conceptualization, methodology, software, writing—original draft, writing—review and editing; E.P.: conceptualization, formal analysis, investigation, methodology, software, supervision, validation, visualization, writing—original draft, writing—review and editing; J.P.F.: conceptualization, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

### Conflict of interest declaration

We declare we have no competing interests.

### Funding

The authors were supported by the following DARPA Awards: W911NF2110323 (Fairbanks), HR00112090067 (Libkind and Patterson) and HR00111990008 (Baas and Halter) along with AFOSR Award FA9550-20-1-0348 (Patterson).

## Acknowledgements

The authors thank Xiaoyan Li, Nathaniel Osgood, David Smith and Sean Wu for valuable insights into the methods and workflows of professional epidemiological modellers. We are also grateful to John Baez for insights into categories and pullbacks of Petri nets. We thank Alexandra Trani and Sean Wu for thorough reviews of the manuscript.

## Footnotes

1 Code that demonstrates our software system and reproduces the examples in this manuscript is available on GitHub at https://github.com/AlgebraicJulia/Structured-Epidemic-Modeling/.

2 Such morphisms of Petri nets, also called *étale maps*, are defined in ([11], Section 2.2).

### References

- 1. Friedman J *et al.* 2021 Predictive performance of international COVID-19 mortality forecasting models. *Nat. Commun.* **12**, 2609. (doi:10.1038/s41467-021-22457-w)
- 2. Waites W *et al.* 2022 Compositional modelling of immune response and virus transmission dynamics. *Phil. Trans. R. Soc.* **380**, 20210307. (doi:10.1098/rsta.2021.0307)
- 3. Libkind S, Baas A, Patterson E, Fairbanks J. 2021 Operadic modeling of dynamical systems: mathematics and computation. In *Applied Category Theory 2021*. (https://arxiv.org/abs/2105.12282)
- 4. Vagner D, Spivak DI, Lerman E. 2015 Algebras of open dynamical systems on the operad of wiring diagrams. *Theory Appl. Categ.* **30**, 1793–1822.
- 5. Carpenter B *et al.* 2017 Stan: a probabilistic programming language. *J. Stat. Softw.* **76**, 1–32. (doi:10.18637/jss.v076.i01)
- 6. Boutillier P, Feret J, Krivine J, Fontana W. 2021 The Kappa language and tools. See https://kappalanguage.org/.
- 7. Hoops S *et al.* 2006 COPASI: a complex pathway simulator. *Bioinformatics* **22**, 3067–3074. (doi:10.1093/bioinformatics/btl485)
- 8. King AA, Nguyen D, Ionides EL. 2016 Statistical inference for partially observed Markov processes via the R package pomp. *J. Stat. Softw.* **69**, 1–43. (doi:10.18637/jss.v069.i12)
- 9.
- 10. Hucka M *et al.* 2003 The Systems Biology Markup Language (SBML): a medium for representation and exchange of biochemical network models. *Bioinformatics* **19**, 524–531. (doi:10.1093/bioinformatics/btg015)
- 11. Kock J. 2020 Elements of Petri nets and processes. In *Applied Category Theory 2020*. (https://arxiv.org/abs/2005.05108v2)
- 12. Bohner G, Venkataraman G, Wilde H. 2020 COEXI(S)T: modelling COVID-19 exit strategies for policy makers in the United Kingdom. See https://github.com/gbohner/coexist/.
- 13. Wu SL, Bennett JB, Sánchez CHM, Dolgert AJ, León TM, Marshall JM. 2021 MGDrivE 2: a simulation framework for gene drive systems incorporating seasonality and epidemiological dynamics. *PLoS Comput. Biol.* **17**, e1009030. (doi:10.1371/journal.pcbi.1009030)
- 14. Baez JC, Courser K. 2020 Structured cospans. *Theory Appl. Categ.* **35**, 1771–1822.
- 15.
- 16. Baez JC, Master J. 2020 Open Petri nets. *Math. Struct. Comput. Sci.* **30**, 314–341. (doi:10.1017/S0960129520000043)
- 17. Spivak DI. 2013 The operad of wiring diagrams: formalizing a graphical language for databases, recursion, and plug-and-play circuits. (http://arxiv.org/abs/1305.0297)
- 18. Baez JC, Pollard BS. 2017 A compositional framework for reaction networks. *Rev. Math. Phys.* **29**, 1750028. (doi:10.1142/S0129055X17500283)
- 19. Smith DL *et al.* 2014 Recasting the theory of mosquito-borne pathogen transmission dynamics and control. *Trans. R Soc. Trop. Med. Hyg.* **108**, 185–197. (doi:10.1093/trstmh/tru026)
- 20. Sharpe FR, Lotka AJ. 1978 Contribution to the analysis of malaria epidemiology. IV. Incubation lag. In *The Golden Age of Theoretical Ecology: 1923–1940*, pp. 348–368. Berlin, Heidelberg, Germany: Springer.
- 21. Grimm V, Berger U, DeAngelis DL, Polhill JG, Giske J, Railsback SF. 2010 The ODD protocol: a review and first update. *Ecol. Modell.* **221**, 2760–2768. (doi:10.1016/j.ecolmodel.2010.08.019)
- 22. Citron DT, Guerra CA, Dolgert AJ, Wu SL, Henry JM, Smith DL. 2021 Comparing metapopulation dynamics of infectious diseases under different models of human movement. *Proc. Natl Acad. Sci. USA* **118**, e2007488118. (doi:10.1073/pnas.2007488118)
- 23. Pooley CM, Doeschl-Wilson AB, Marion G. 2022 Estimation of age-stratified contact rates during the COVID-19 pandemic using a novel inference algorithm. *Phil. Trans. R. Soc.* **380**, 20210298. (doi:10.1098/rsta.2021.0298)
- 24. Ma Y, Gowda S, Anantharaman R, Laughman C, Shah V, Rackauckas C. 2021 ModelingToolkit: a composable graph transformation system for equation-based modeling. (https://arxiv.org/abs/2103.05244)
- 25. The New York Times. 2021 Coronavirus (COVID-19) data in the United States. https://github.com/nytimes/covid-19-data. Retrieved September 3, 2021.
- 26. Revels J, Lubin M, Papamarkou T. 2016 Forward-mode automatic differentiation in Julia. (http://arxiv.org/abs/1607.07892)
- 27. Dykes J *et al.* 2022 Visualization for epidemiological modelling: challenges, solutions, reflections and recommendations. *Phil. Trans. R. Soc.* **380**, 20210299. (doi:10.1098/rsta.2021.0299)
- 28. Bakirtzis G, Fleming CH, Vasilakopoulou C. 2021 Categorical semantics of cyber-physical systems theory. *ACM Trans. Cyber-Phys. Syst.* **5**, 1–32. (doi:10.1145/3461669)