Abstract
We introduce
1. Introduction
The spread of SARS-CoV-2 in populations with largely no immunological resistance, and the associated COVID-19 disease, have caused considerable disruption to healthcare systems and a large number of fatalities around the globe. The assessment of policy options to mitigate the impact of this and other epidemics on the health of individuals, and the efficiency of healthcare systems, relies on a detailed understanding of the spread of the disease, and requires both short-term operational forecasts and longer-term strategic resource planning.
There are various modelling approaches which aim to provide insights into the spread of an epidemic. They range from analytic models, formulated through differential or difference equations, which reduce numerous aspects of the society–virus–disease interaction onto a small set of parameters, to purely data-driven parametrizations, often based on machine learning, which inherently rely on a probability density that has been fitted to the current and past state of the system in an often untraceable way. As another class of approaches, agent-based models (ABMs) are particularly useful when it is necessary to model the disease system in a spatially-explicit fashion or when host behavior is complex[.] [1, p. 2:5].^{1} Being the traditional tool of choice to analyse behavioural patterns in society, they find ample use in understanding and modelling the observed spread of infections and in leveraging this for intermediate and long-term forecasting [3–5]. Such models also provide the flexibility to experiment with different policies and practices, founded in realistic changes to the model structure, such as the inclusion of new treatments, changes in social behaviour and restrictions on movement.
To simulate pandemics, specific realizations of ABMs, individual-based models (IBMs), have been developed in the past two decades, for example [6,7]. In these models, the agents represent individuals constituting a population, usually distributed spatially according to the population density and with the demographics (age and sex) taken from census data.^{2} Within the existing taxonomy of agent-based models in epidemiology, see for instance [8,9], these models often use a disease-specific modelling framework. Interactions between individuals in predefined social settings, systematically studied for the first time in [10], provide the background for disease spread, formulated in probabilistic language and dependent on the properties of the individuals and the social setting. The sociology of the population and the transmission dynamics are constrained separately using external datasets and available literature, and connected in the description of the spread of the disease. Calibration of such models to observed disease outcomes, such as hospital admission and mortality rates, is therefore reduced to the specific interface between the disease and the varying physiology across the broad population. Policy interactions and mitigation strategies can be flexibly encoded in detail as modifications of the social setting, and allow precise analysis of their efficacy that is not readily available in other approaches.
Evidence from disease data such as COVID-19 fatality statistics suggests that case and infection fatality rates are correlated with, amongst other factors, the age and socioeconomic status of the population exposed to the etiological agent [11]. This necessitates the construction of a model with exceptional social and geographic granularity to exploit highly local heterogeneities in the demographic structure. In this publication, we introduce a new individual-based model,
As a first application of
The remainder of this paper is organized as follows. Section 2 provides an overview of the structure of the
2. The structure of the June modelling framework
The
The
The
The
In response to the spread of a disease through its population, a government might introduce policy measures designed to control and reduce the impact of the disease. In the case of COVID-19 in England and many other countries, policies have included social distancing measures, the closure of schools, shops, restaurants and other leisure venues, and restrictions on movement. In
3. Population and its static properties
3.1. Geography and demography
To facilitate generalizability across multiple settings,
For the case of England, the construction of the virtual population in
1.  regions—London, East Midlands, West Midlands, the Northwest, the Northeast, etc.;  
2.  super areas—approximately 7200 middle layer super output areas (MSOAs);  
3.  areas—approximately 180 000 output areas (OAs). 
The individuals in
3.2. Household construction
The virtual population within
For the UK, the ONS census datasets provide a detailed record of both household type and composition in England at the OA (area) level. That is, for each OA there is a set of summary statistics across a number of criteria; choices can then be made in regard to aggregating those frequency measurements at different resolutions. In terms of data categories for households, the OA (area) level provides the following occupancy type counts: single, couple, family, student, communal and other [20], and further specifies them by the number of old adults, aged over 65, adults, dependent adults (such as students) and children, providing around 20 distinct classes contingent on the underlying census information. Given the data structure, it is impossible to recover the exact composition for each household type. For example, the number of non-dependent children (people over the age of 18 living with their parents), the number of multigenerational families, and the exact distribution of adult groups sharing a household are not specified in these datasets. However, these features can be statistically extrapolated using a mix of further secondary data and validated against various aggregate survey information at the regional and national level.^{4} Households are populated iteratively, giving preference to those household types with the most precise available data. The exact procedure for the UK is documented in appendix A.1.
Similarly to households, care homes are classified by type, positioned and populated using ONS data [21] at the OA (area) level. The ONS collects information on the age distribution and sex of residents of communal establishments at the MSOA (super area) level [22]. By combining these datasets, we infer the age and sex distribution of the care home population.
Other communal establishments specified in the census, including student accommodations and prisons can be flexibly added with sufficient datasets. Within the presented version of
3.3. Construction of virtual schools and universities
Schools and universities are two types of location that the resident population visits and in which it interacts. Every location can have universal and specific attributes flexibly initiated within the modelling framework depending on the detail of available information. From public records
To model schools in England, we use data provided by [23] to determine the location of schools and their age brackets. Based on the current enrolment requirements for the UK, we assume that children between the ages 0–19 can attend school, with mandatory attendance between 5 and 18. Since 19-year-olds can attend school, university, work or none of these, the institution they attend is determined by the number of vacancies in schools accepting students of that age group. We send children to one of the n = 10 nearest schools, where class sizes are limited to 40. One way in which we validate our assumptions is by comparing average travel distance to schools of different types. In
Similarly, universities are located according to their address as recorded in the UK Register of Learning Providers (UKRLP) [23]. Students are enrolled in a university using the UKRLP enrolment data. The enrolled students are assigned from a subset of the local population to the university, reflecting the fact that the ONS census uses the termtime address of students.
Students are sampled from adults between the ages 18–25 with a preference given to those previously assigned to living in student or communal households in a given radius around the university. The concentrations of students expected by
3.4. Construction of workplaces
Workplaces are constructed for the subset of the population in employment according to public records. We divide employment structures into three categories: work in companies with employees; work outside fixed company structures; work in hospitals and schools. The number of employees in each MSOA (super area) is derived from the workforce information for that specific MSOA. To distribute the workforce over workplaces,
In England, the ONS database contains information on companies and workforce structured by industry type. Industries and companies are categorized according to 21 sectors following the Standard Industrial Classification (SIC) code convention [26] (table 1), and information about company numbers per sector and company sizes is available at the MSOA (super area) level [27]. Similarly, the ONS data also contain the size and sex distribution of the workforce by sector at the MSOA level, as well as the location of their employment [28,29]. This enables the construction of an origin–destination matrix and allows us to distribute the workforce accordingly. More details on this specific procedure for initializing companies in
SIC code identifier  description 

A  agriculture, forestry and fishing 
B  mining and quarrying 
C  manufacturing 
D  electricity, gas, steam and air conditioning supply 
E  water supply; sewerage, waste management and remediation activities 
F  construction 
G  wholesale and retail trade; repair of motor vehicles and motorcycles 
H  transportation and storage 
I  accommodation and food service activities 
J  information and communication 
K  financial and insurance activities 
L  real estate activities 
M  professional, scientific and technical activities 
N  administrative and support service activities 
O  public administration and defence; compulsory social security 
P  education 
Q  human health and social work activities 
R  arts, entertainment and recreation 
S  other service activities 
T  activities of households as employers; undifferentiated goods- and services-producing activities of households for own use 
U  activities of extraterritorial organizations and bodies 
The resulting distribution of our procedure assigning individuals an industry sector can be seen in figure 5.
Hospitals play a dual role in
4. Simulating social interactions
The
4.1. A virtual individual’s day
Calendar days, decomposed into timesteps of varying length given in units of hours, are the background for our simulation of the social interactions of our virtual population.
Time in
A summary of how much time is spent each week on various activities as a function of age is reported in figure 6a. In figure 6b, we show a comparison of the amount of time spent at home, work, grocery shopping, eating at restaurants/pubs and commuting, between
4.2. Localized activities
Within
For England, we have located 120 000 pubs and restaurants according to their geocoordinates, as well as 32 000 stores and 650 cinemas, with data from OpenStreetMap [32]. Each time a person is assigned to any of ‘pubs’, ‘groceries’ or ‘cinemas’, we pick a random venue from the n venues closest to their place of residence, or the closest venue if the distance to any of them is greater than 5 km. We have chosen n = 7 for pubs, n = 15 for shopping stores and n = 5 for cinemas. Note that there are no permanent ‘workers’ in these venues who return to a single venue daily; only ‘attendees’ who choose their venue at random. Further locations such as gyms and places of worship can be easily added to the activity model, and, of course, it can easily be adjusted to other societies.
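The venue-selection rule described above can be sketched as follows. This is a minimal illustration rather than the framework's own implementation: the function names, the representation of venues as (latitude, longitude) tuples and the use of the haversine distance are all assumptions. The fallback rule is also our reading of the text, namely that the closest venue is taken whenever any of the n nearest candidates lies more than 5 km away.

```python
import math
import random

# Values of n per activity and the 5 km threshold are taken from the text.
N_NEAREST = {"pubs": 7, "groceries": 15, "cinemas": 5}
MAX_DISTANCE_KM = 5.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def choose_venue(home, venues, activity, rng=random):
    """Pick a venue for `activity` near `home` (hypothetical helper).

    `venues` is a list of (lat, lon) tuples. Returns None if no venue exists.
    """
    ranked = sorted(venues, key=lambda v: haversine_km(home, v))
    candidates = ranked[: N_NEAREST[activity]]
    if not candidates:
        return None
    # Assumed reading of the rule: if any candidate is farther than 5 km,
    # fall back to the single closest venue instead of sampling.
    if any(haversine_km(home, v) > MAX_DISTANCE_KM for v in candidates):
        return candidates[0]
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when comparing simulation runs.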
In addition, we model interactions in naively constructed social networks, by linking each household to a list of up to $\mathcal{N}$ other households in the same super area. One of the households in this list is selected if ‘household visits’ is chosen as activity during a timestep. Residents will stay at home to receive the incoming visitor, who in turn may also bring their whole household with them according to a probability described by an external parameter. Comparison with national surveys suggests that setting the number of linked households $\mathcal{N}=3$ provides realistic movement profiles. While care home residents in
4.3. Modelling mobility: commuting patterns
Mobility is modelled in
To model commuting and rail travel in England we use data provided by the UK Department for Transport [34]. Large metropolitan areas are selected as the major transit nodes for the network. Commuting induces social mixing between many people who may not normally come into contact, and reflects the importance of transport as a mechanism for promoting the geographical spread of infection, supplementing the spread from individuals moving to a new location and infecting other individuals at that location.
To fill our origin–destination matrix we use information contained in the ONS database concerning the mode of commuting of individuals at the area (OA) level [35], to distribute commuting modes probabilistically. We define two modes of public transport, ‘external’ which defines those commuting in and out of metropolitan areas, and ‘internal’ which defines those commuting within these areas. Metropolitan areas are defined using data obtained from the ONS [36]. For the sake of computational efficiency, we model only the travel patterns of those working inside metropolitan areas, who in fact represent the overwhelming majority of public transport commuters. This includes commuters who live and work in the city, as well as those who are entering the metropolitan area from outside. The number of internal and external commuters by city in England is given in figure 7. The cities included are geographically spread across England thereby accounting for major commuting patterns in most regions modelled. In total, we explicitly model commuting into 13 out of a possible 109 cities in England, which accounts for 60% of all metropolitan commuters and 46% of all those using public transport to commute to work. Figure 8 shows maps of the residences of internal and external commuters in two cities in our model, where the inner section in white denotes the respective metropolitan areas. Specifically, from figure 8b, we can see that, given the large commute radius of cities like London (we observe a similarly large radius for Birmingham and several other cities), commuting can be a key driver for the interregional spread of infectious diseases.
Travelling within a metropolitan area, i.e. the internal commuting mode, is modelled as a self-connected loop; practically speaking, this means that internal commuters may in principle interact, irrespective of the actual movement inside the city. For external commuting, the travel into and out of the metropolitan area, we identify shared routes for commuters living in neighbouring areas and super areas. The number of possible routes into each city, and therefore the number of ways to divide regions around the cities, is informed by the approximate number of rail network lines into each city; currently this is set to eight in London and four for each of the other 12 cities [37].
We randomly partition people sharing the same commuting route into subgroups, ‘carriages’, which define the environment in which social interactions take place. The commuting timestep is run twice a day and in each run the travellers are randomly distributed into carriages. The number of people per carriage is determined by city-dependent data obtained from the UK Department for Transport [34]. More details on the specific algorithm for modelling commuting in
4.4. Social interaction frequencies and intensities
Social contact matrices [10,15] provide information about the age-dependent frequency and intensity of in-person contact in different social settings, an important ingredient to many epidemiological simulations. They measure the average daily number of conversational and physical contacts between individuals of different ages. This means that they are normalized to the size of the population in the respective age bins, but do not account for whether they can take part in such contacts. To use them within
Averaging over age ranges in different settings, we arrive at simplified social mixing matrices, ${\chi}_{si}^{\mathcal{L}}$, which will be comparable to the inputs from literature upon combination with the model results for the composition of social environments. Below we list our simplified social mixing matrices inferred from the literature, with $\mathcal{L}\in \{(H),(S),(W)\}$ (home, school, workplace), as well as the relative proportions, ${\varphi}_{si}^{\mathcal{L}}$, of physical contacts. The latter are relevant, since in line with standard approaches, closer physical contact in
For the household social mixing matrices, we define four categories: young children (K), young dependent adults of age 18 or more (Y) that still live with their parents, adults (A), and older adults (O) of age 65 and over. We use
Social contacts in schools identify teachers (T) and students (S); the latter are organized in year groups and further divided into classes of up to 40 students. In our age-averaging, we implicitly assume that the number and character of teacher–student contacts is independent of the age of the students. Student–student contacts are assumed to be most frequent within a class or year group, and fall off steeply with the age difference. This behaviour is captured by fitting a matrix with values for the age-diagonal elements and a fall-off per year of age difference by a factor of 3. Therefore we have
For the contacts at work, we do not take into account any age-dependence and, in the absence of data, do not model any sector-dependent variation of their number or intensity, thus
These social mixing matrices in
To validate these simplified matrices, we include them within
5. Infection modelling: spreading and health impact
The transmission of infection through social interactions described in the
Throughout this section and the rest of the paper, we will use two definitions of COVID-19 ‘cases’. The first is when we refer to cases in the model itself: here, a case of COVID-19 is an infected agent which may be symptomatic or asymptomatic. The second is when referring to cases in reality: here, a case is someone who has tested positive for COVID-19. Since the latter is subject to testing coverage, capacity and efficacy, we do not use these for fitting or validation purposes.
5.1. Infection transmission
—  the number, N_{i}, of infectious people i ∈ g present;  
—  the infectiousness of the infectors, i, at time t, I_{i}(t);  
—  the susceptibility, ψ_{s}, of the potential infectee, s;  
—  the exposure time interval, [t, t + Δt], during which the group, g, is at the same location;  
—  the number of possible contacts, ${\chi}_{si}^{(L)}$, and the proportion of physical contacts, ${\varphi}_{si}^{(L)}$, in location L, both taken from equations (4.3)–(4.6) in §4.4;  
—  and the overall intensity, β^{(L,g)}, of group contacts in location L. 
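A minimal sketch of how the ingredients listed above might be combined into an infection probability for a single timestep. The functional form is an assumption: a Poisson-type probability 1 − exp(−λ), with λ the product of the susceptibility, the overall contact intensity, the exposure time and a sum over infectors of contact number times infectiousness. The function name and argument layout are hypothetical, and the role of the physical-contact proportion is deliberately left out because its exact treatment is not fixed by the list alone.

```python
import math


def infection_probability(psi_s, infectors, beta, delta_t):
    """Probability that a susceptible person is infected during one timestep.

    Assumed form (not a transcription of the paper's equation):
        P = 1 - exp(-psi_s * beta * delta_t * sum_i chi_si * I_i)

    psi_s     -- susceptibility of the potential infectee
    infectors -- iterable of (chi_si, infectiousness_i) pairs, one per
                 infectious person present in the group
    beta      -- overall contact intensity for this location and group
    delta_t   -- exposure time; the time integral is approximated by a product
    """
    summed_infectiousness = sum(chi * inf for chi, inf in infectors)
    exponent = psi_s * beta * delta_t * summed_infectiousness
    return 1.0 - math.exp(-exponent)
```

With this form the probability is zero when no infectious person is present, grows with exposure time, and saturates at one, which matches the qualitative behaviour expected of a Poisson process.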
In the construction of an infection probability for a susceptible individual, s, we make a number of assumptions. First of all, we model the probability of being infected as a Poisson process. In keeping with the probabilistic process, the argument of the Poissonian is given by a sum over individual pairs of infectious individuals with the susceptible person, implying a simple superposition of individual infectiousness. The underlying individual transmission probabilities are written as the product of the susceptibility of the susceptible individual, the infectiousness of the infected person, and the contact intensity, all integrated over the time interval in which the interaction occurs. The integration over time ensures that the transmission probability increases with the time of exposure. We therefore arrive at the transmission probability, i.e. a probability for s to be infected as
Note that in the actual implementation, we approximate the integral over time with a simple product,
This leaves us to fix the last two ingredients in equation (5.2), the individual susceptibility, ψ_{s}, and the infectiousness, ${\mathcal{I}}_{i}(t)$. Contemporary peer-reviewed academic research on susceptibility to infection by the etiological agent with or without the onset of disease symptoms is sparse and inconsistent. Following some evidence, for example in [38] and [39], on transmission and susceptibility of children (using the UN classification), we fix ψ_{s} = 0.5 for children under the age of 12, and ψ_{s} = 1 for everybody else. The infectiousness of individuals, ${\mathcal{I}}_{i}$, changes with time, and it is not directly measurable. To model its behaviour, we use the temporal dependence of viral shedding as a proxy for infectiousness. Studies in the context of COVID-19 have shown that viral shedding peaks at or slightly before the onset of symptoms, and then begins to decrease [40]. In
5.2. Infection progression
When an individual is infected, they will experience different impacts on their health. Figure 11 presents the paths available in
1.  asymptomatic individuals, rate R_{I→A}(p), continue their life normally;  
2.  individuals with mild symptoms, rate R_{I→M}(p), usually continue their lives as normal, except if certain policies are activated;  
3.  individuals with severe but not lethal symptoms, rate R_{I→S}(p), stay at home until recovery;  
4.  individuals with severe symptoms who will eventually die in their residences, with rate R_{I→DR}(p);  
5.  individuals who are admitted to hospital but will recover, with rate R_{I→H}(p);  
6.  individuals who are ultimately admitted to ICU/ITU before recovering, with rate R_{I→ICU}(p);  
7.  individuals who are admitted to hospital and will die there, with rate R_{I→DH}(p) and  
8.  individuals who are admitted to ICU/ITU and die there, with rate R_{I→DICU}(p). 
The construction of reasonable progression paths, and their probabilistic distribution, relies critically on the knowledge of how many people have been infected, as well as the dependence on attributes such as age and sex. COVID-19 tests between February and May 2020 in the UK were mostly administered to people presenting symptoms or people who had been in close contact with confirmed cases in hospital, thereby biasing the results. We therefore need to infer the number of infections from other controlled studies, such as antibody tests. In [42], the seroprevalence, r_{sp}(p), of COVID-19 in the adult population in England was determined through a sample of more than 100 000 adults, showing a reduction in seroprevalence with increasing age. Because the seroprevalence is an estimate of all people who were infected up to the time of the test and, most importantly, survived, we need to correct for those who died of the disease up to this point. This turns out to be an important correction, especially in older age bins, owing to the non-negligible fraction of elderly people who died. We therefore add the age- and sex-dependent number of deaths, N_{D}(p), reported by the ONS [43], to the corresponding numbers inferred from the seroprevalence to arrive at the total number of cases, N_{tot}(p),
Health outcomes given a simulated infection are captured in R_{I→X}, where X is one of the eight trajectories listed in figure 11. The asymptomatic rate, R_{I→A}, and the mild case rate, R_{I→M}, are taken from a calibration done in [41] from [45,46]. To calculate the different hospitalization and fatality rates, we have used a series of datasets listed in table 2, all of them containing data until 13 July 2020, to be consistent with the considered seroprevalence values. In order to avoid possible irregularities in our results derived from the use of different data sources, we normalize all our death data to the ONS reported numbers of total deaths (51 443), hospital deaths (32 164) and residence deaths (19 279) [43], and then use more granular data to distribute deaths by age and sex for each place of death occurrence [50,51]. Likewise, the total number of hospital admissions is taken from [52], and distributed by age, sex and residence type also using [50,51]. The number of deaths in care homes reported in [53] is only reported by age until late June, so we assume that the distribution does not change until 13 July 2020. We also ensure that we correctly account for differences in reporting times. As a first step, we calculate the overall infection fatality rate (IFR) for the general population outside care homes (GP),
quantity  source 

population by age, sex and residence type  [47,48] 
seroprevalence in GP by age  [42] 
seroprevalence in CH by age  [44] 
deaths by place of occurrence and residence type  [43] 
deaths profile by age and sex  [43] 
deaths in CH profile by age and sex  [49] 
hospital deaths profile by age, sex  [50] 
hospital deaths in CH profile by age, sex  [51] 
ICU/ITU deaths profile by age, sex  [51] 
total hospital admissions  [52] 
hospital admissions profile by age, sex  [50] 
ICU/ITU admissions profile by age, sex  [51] 
hospital admissions in CH profile by age, sex  [51] 
The results of computing the individual infection outcome rates by age, sex and residence type are shown in figure 13. The most important visible difference is the disparity in the fatality rates between care home residents and the general population. This could reflect various factors, including, for example, a generally poorer health condition of the care home population, or differences in admission policies to hospitals. Consistent with the ONS data [53], most of the care home deaths occur within the care home residence itself, while the probability of being admitted to hospital decreases with age. Likewise, both for the general population and the care home population, people aged 55–70 years are the group most likely to be admitted to the ICU/ITU. Females are in general less likely to develop a severe COVID-19 infection, with fatality rates roughly equivalent to those of a male 5 years younger.
Once an infection outcome has been determined, the infected individual follows a symptoms trajectory composed of different stages. The time spent at each stage is sampled from different distributions derived from different data sources. In table 3, we list the different stages per trajectory by infection outcome, and the details on the various timings are listed in appendix D. In figure 14a, we show the probability density functions for the incubation time, and the time to die or recover in hospital.
trajectory  stages 


asymptomatic  I[β_{I}]  A[C_{14}]  R  
mild  I[β_{I}]  M[C_{20}]  R  
severe  I[β_{I}]  M[C_{20}]  S[C_{20}]  R  
death at home  I[β_{I}]  M[LN_{M}]  S[C_{3}]  D  
ward  I[β_{I}]  M[LN_{M}]  H[β_{H}]  M[C_{8}]  R  
death in ward  I[β_{I}]  M[LN_{M}]  H[β_{D}]  D  
ICU/ITU  I[β_{I}]  M[LN_{M}]  H[LN_{ICU}]  ICU[e_{ICU}]  H[e_{H}]  M[C_{3}]  R 
death in ICU/ITU  I[β_{I}]  M[LN_{M}]  H[LN_{ICU}]  ICU[e_{D}]  D 
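The stage sequences in table 3 lend themselves to a simple lookup structure. The sketch below transcribes the table into a dictionary and adds a few hypothetical helper functions. The bracketed timing labels are kept as opaque strings (for example `"LN_M"`), since the corresponding distributions are specified in appendix D and no attempt is made to reproduce them. All names are illustrative and not taken from the framework's code.

```python
# Each trajectory is an ordered list of (stage, timing_label) pairs.
# Terminal stages (R, D) carry no timing label.
TRAJECTORIES = {
    "asymptomatic": [("I", "beta_I"), ("A", "C_14"), ("R", None)],
    "mild": [("I", "beta_I"), ("M", "C_20"), ("R", None)],
    "severe": [("I", "beta_I"), ("M", "C_20"), ("S", "C_20"), ("R", None)],
    "death at home": [("I", "beta_I"), ("M", "LN_M"), ("S", "C_3"), ("D", None)],
    "ward": [("I", "beta_I"), ("M", "LN_M"), ("H", "beta_H"),
             ("M", "C_8"), ("R", None)],
    "death in ward": [("I", "beta_I"), ("M", "LN_M"), ("H", "beta_D"), ("D", None)],
    "ICU/ITU": [("I", "beta_I"), ("M", "LN_M"), ("H", "LN_ICU"),
                ("ICU", "e_ICU"), ("H", "e_H"), ("M", "C_3"), ("R", None)],
    "death in ICU/ITU": [("I", "beta_I"), ("M", "LN_M"), ("H", "LN_ICU"),
                         ("ICU", "e_D"), ("D", None)],
}


def stage_sequence(trajectory):
    """Ordered list of stage symbols for a trajectory."""
    return [stage for stage, _ in TRAJECTORIES[trajectory]]


def final_state(trajectory):
    """Terminal stage symbol of a trajectory, 'R' or 'D' in table 3."""
    return TRAJECTORIES[trajectory][-1][0]


def passes_through(trajectory, stage):
    """True if the trajectory visits the given stage at any point."""
    return stage in stage_sequence(trajectory)
```

Keeping the timing labels separate from the stage order means a sampler for the stage durations could be attached later without changing the trajectory definitions.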
5.3. Seeding infections
In the absence of sufficiently detailed knowledge of how epidemics arrive in a country, we seed infections using secondary information such as the number and regional distribution of observed cases. In the example of simulating the spread of COVID-19 in England, we use the number of COVID-19-related deaths recorded in hospitals to estimate initial infection numbers and their regional distribution. Accounting for the time delay between infection and possible death, and for the probability that admitted patients die, we have
The relatively large statistical fluctuations in the initial phase of an epidemic, and possibly differing time profiles across regions, translate into the need for a region-specific seeding. This difference is highlighted by contrasting the seeding for London, where we introduce initial infections over two days only (28–29 February 2020), with the northeast of England and Yorkshire, where we seeded infections for a week, 28 February–5 March 2020. We introduce the estimated number of daily cases in each of the regions until the following criterion is met,
6. Mitigation policies and strategies
Policies and interventions, often enacted by governing bodies, are introduced in an attempt to mitigate and control the spread of infectious diseases. In general, such policies are highly dependent on the type of infection and social norms in the affected population, and may include guidelines on how to change individual patterns of behaviour or the closure of certain venues where transmission is estimated to be highly likely. The modular nature of
6.1. Behavioural changes
There are a variety of changes in behavioural patterns that are designed to reduce the probability of viral transmission, ranging from simple social distancing, increased hygiene and mask wearing, to quarantining of infected individuals or those who have been in sufficiently close contact with them, and the shielding of vulnerable parts of the population. We model the impact of the former set of measures, social distancing, increased hygiene and mask wearing, through multiplicative reductions in the location-specific contact-intensity parameters, β^{(L,g)}, see figure 15 for an example. The impact of compliance with social distancing and other, similar measures can be recorded both nationally and sometimes even in specific locations. This allows us to calculate the reduction in the corresponding intensity parameters as follows:
We will now turn to discuss our choices for specific measures. There have been a variety of studies on the effectiveness of social distancing with respect to COVID-19 and other infectious diseases. A comprehensive systematic review and meta-analysis [55] suggested that the relative risk of infection decreases by approximately a factor of 2 per metre distance. In practice, however, the efficiency of social distancing is highly dependent on external factors, in terms of both physical and social environment. We therefore use this literature as a benchmark, assuming on average 1 m social distancing, E = 0.5, and fit the effects of social distancing to data where possible (see §6.3).
We simulate mask wearing according to equation (6.2), i.e. by multiplicatively reducing the β parameters in different locations. There is a significant body of literature on the effectiveness of mask wearing, including differences based on the material of the mask and the locations in which they are worn [55–57], as well as changes in efficiency due to reusing or washing them [58,59]. In general, we focus on the wearing of masks by non-healthcare workers in settings outside the home and estimate mask effectiveness, E, to be 50% [60], irrespective of the specific location. However, after adjustment for compliance, the actual reduction in the intensity parameter may be much lower than this, which leads us to believe that this represents a conservative estimate.
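To make the interplay of effectiveness and compliance concrete, the sketch below applies a multiplicative reduction to a contact-intensity parameter. The functional form, β′ = β · (1 − C_N · C_L · E), is an assumption chosen to be consistent with the description in the text (a multiplicative reduction scaled by national compliance C_N, location-specific compliance C_L and effectiveness E); it is not a transcription of equation (6.2). The function and argument names are hypothetical.

```python
def reduced_intensity(beta, effectiveness, national_compliance, location_compliance):
    """Contact intensity after a behavioural measure (assumed form).

    beta' = beta * (1 - C_N * C_L * E), with all three factors in [0, 1].
    """
    for name, value in (("effectiveness", effectiveness),
                        ("national_compliance", national_compliance),
                        ("location_compliance", location_compliance)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return beta * (1.0 - national_compliance * location_compliance * effectiveness)
```

Under this assumed form, a mask effectiveness of E = 0.5 halves the intensity only at full compliance; with 50% location-specific compliance the reduction is 25%, which illustrates why the realized reduction can be much lower than E.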
In
Given the additional danger infectious diseases may pose to the more vulnerable and elderly populations, various policies, usually referred to as ‘shielding’ can be introduced with an aim to protect these individuals. In
6.2. Closure of venues
Mitigation strategies that aim at reducing infection transmission through changes in individual behaviour may have to be further supplemented through partial or complete closure of certain parts of public life such as companies, transport, schools and universities.
Starting with the closure of companies,
School and university closure is handled similarly to the closure of companies in
In addition to the partial or complete closure of companies in some industry sectors and of schools or universities, government policies may also close or limit the number of people attending leisure venues, such as restaurants and pubs, cinemas, or similar. In
6.3. Policies in the UK
The population, interaction and disease layers of
To simulate, ex post, the spread of COVID-19 in England, we impose a set of policies restricting movement and attempting to reduce transmission. Table 4 lists the operational policy interventions enacted by the UK Government from the beginning of March 2020 to October 2020 in an effort to reduce the spread of SARS-CoV-2.
date (dd/mm/yyyy)  policy  implemented 

04/03/2020  encourage increased handwashing  
12/03/2020  case isolation at home  * 
16/03/2020  voluntary household quarantine  * 
16/03/2020  stop all non-essential travel  ** 
16/03/2020  stop all non-essential contact  ** 
16/03/2020  voluntary working from home  * 
16/03/2020  voluntary avoidance of leisure venues  * 
16/03/2020  encourage social distancing of entire population  * 
16/03/2020  shielding of over-70s  * 
20/03/2020  closure of schools and universities  * 
21/03/2020  closure of leisure venues  * 
21/03/2020  stopping of mass gatherings  ** 
23/03/2020  ‘stay at home’ messaging  ** 
11/05/2020  multiple trips outside are allowed in England only  
13/05/2020  encouraged to go back to work if they can while distancing  * 
01/06/2020  meeting in groups of up to 6 outside allowed  ** 
01/06/2020  shielding of over-70s relaxed  * 
01/06/2020  school reopening for Early Years and Year 6 students  * 
13/06/2020  ‘support bubbles’ allowed  
15/06/2020  school reopening for Year 10 and 12 students for face-to-face support  * 
04/07/2020  leisure venues allowed to reopen  * 
04/07/2020  household-to-household visits permitted along with overnight stays  * 
24/07/2020  mask wearing compulsory in grocery stores  * 
01/08/2020  shielding is paused  * 
01/08/2020  ‘Eat Out to Help Out’ scheme introduced  * 
31/08/2020  ‘Eat Out to Help Out’ scheme ends  * 
01/09/2020  schools and universities allowed to reopen  * 
01/09/2020  ‘Rule of 6’ introduced  
14/10/2020  tiered local lockdown system introduced  * 
In order to estimate the effects of social distancing on the epidemiological development of COVID-19, we implement multiple staggered social distancing steps during the first wave of the pandemic between 16 March and 4 July 2020 and then again going into September 2020 as schools and universities begin to fully reopen. We fit the national compliance, C^{(N)}, with social distancing between 24 March and 11 May 2020 in the range 20–100% when fitting the rest of the parameters (see §7). This is taken to be the harshest social distancing step, and the others are determined relative to this fit. The location-specific compliance, C^{(L)}, is set to 100% in all locations during fitting to avoid parameter degeneracy and is altered manually thereafter. No social distancing is assumed between household members. We derive the compliance with mask wearing from a YouGov survey [61], and we further stratify the results by social environment or location. Specifically, we assume complete (100%) compliance with mask wearing during commuting, 50% in care homes and no compliance in pubs, schools or in the household. Compliance with mask wearing in grocery stores is assumed to be at 50% before 24 July 2020, after which we assume complete compliance given the change in government regulations. Since we already assume low intensity parameters in hospitals due to the significant amount of personal protective equipment (PPE) being worn in these scenarios, we do not apply any additional mask wearing in these settings.
On 16 March 2020, the UK Government encouraged people with COVID-19 symptoms to quarantine in their household for 7 days and all those in their household to quarantine for 14 days from symptom onset. We assume that compliance with this measure varies with time as people become more aware of the dangers of COVID-19. Between 16 March and 23 March 2020 (i.e. the week leading up to the nationwide ‘lockdown’), we fit the compliance of symptomatic individuals with the quarantine policy to be between 5 and 45%, and the probability that the rest of the household of a symptomatic individual complies is set to the same fitted value. After ‘lockdown’ came into effect, the government tightened these rules so that people could leave the house only for essential trips and one form of exercise per day. To account for this, we increase the symptomatic and household compliance with quarantine to double their fitted value. In addition, the UK Government strongly suggested that people over the age of 70 shield from 16 March 2020. As in the case of quarantine, we assume people become more compliant with this policy over time, with compliance with the shielding policy for this age bracket increasing from 20% in the first week to 70% afterwards. One reason the compliance was set to only 70% even after lockdown is that people in this age bracket already have reduced mobility and interaction potential. A 70% compliance therefore still allows them a small chance to interact with others, e.g. in grocery stores, and any higher compliance figures would mean a complete and unrealistic decoupling of this critical population from any social interactions. The shielding policy runs until 1 August 2020, when the UK Government paused it.
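The time dependence of the quarantine and shielding compliances described above can be sketched as two step functions. This is an illustration only: the function names are hypothetical, and the treatment of the boundary dates (23 March counted as the first lockdown day, shielding switched off from 1 August) is an assumption.

```python
from datetime import date

POLICY_START = date(2020, 3, 16)     # quarantine and shielding advice issued
LOCKDOWN_START = date(2020, 3, 23)   # assumption: first day of 'lockdown' rules
SHIELDING_PAUSED = date(2020, 8, 1)


def quarantine_compliance(day, fitted):
    """Quarantine compliance of symptomatic people and their household.

    `fitted` is the pre-lockdown value, fitted within the 5-45% range.
    """
    if not 0.05 <= fitted <= 0.45:
        raise ValueError("fitted compliance is expected to lie in [0.05, 0.45]")
    if day < POLICY_START:
        return 0.0
    # Compliance doubles once lockdown begins (at most 0.9 for the fitted range).
    return fitted if day < LOCKDOWN_START else 2.0 * fitted


def shielding_compliance(day):
    """Shielding compliance for the over-70s: 20% in the first week, then 70%."""
    if day < POLICY_START or day >= SHIELDING_PAUSED:
        return 0.0
    return 0.2 if day < LOCKDOWN_START else 0.7


print(quarantine_compliance(date(2020, 4, 1), fitted=0.3))
print(shielding_compliance(date(2020, 3, 18)))
```

Because the fitted range is capped at 45%, doubling never pushes the compliance above 90%, so no clipping is needed in this sketch.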
To model the partial or complete closure of industry sectors, it is important to understand the descriptions of key workers provided by the UK Government [62], and match these up with the relevant fivedigit SIC codes [26]. This ultimately allows us to deduce the proportion of key workers in each sector and assign the corresponding key worker attribute probabilistically according to these proportions. In our simulation, we encode findings from the ONS [62], reporting that 33% of the total workforce were key workers in 2019 with 14% able to work from home. We therefore set the proportion of key workers, i.e. those who go to work each day, at 19% of the workforce. We use the same logic to also decide which workers are furloughed in
From 20 March 2020, all schools and universities in England were asked to close, with the exception that children of key workers could still attend school. To account for the partial school reopening of Early Years (nursery and reception age children) and Year 6 students on 1 June 2020, we open up these year groups in
On 16 March 2020, the UK Government encouraged people to avoid going to leisure venues such as bars and restaurants, although this rule was not imposed through the closure of such venues. However, on 21 March 2020, this closure took place. We model these policies first by reducing the probability that people leave the house from 16 March 2020 followed by the closure of all relevant leisure venues included in the simulation—cinemas, pubs and restaurants—from 21 March 2020. Visits to care homes are also halted from this time. Since many of these venues were permitted to reopen from 4 July 2020, we assumed all venues reopen at this point. Additionally, data collected by OpenTable suggests that restaurant attendance after that date saw a significant increase probably encouraged by the UK Government’s ‘Eat Out to Help Out’ scheme which we capture in
7. Discussion of model outputs
In this section, we finally highlight the ability of
In figure 18, we exhibit results for the number of daily deaths in hospital for regions of England and England itself. In addition, in figure 19, we show the same realizations for daily deaths in England stratified by age. The agreement with data is satisfying and while there are minor discrepancies for certain outputs, we would like to stress that all of these outputs are simultaneously fit by
Along with deaths in hospitals, there has been a non-negligible number of fatalities in care homes in England during this pandemic.
We would like to emphasize that the outputs shown here are illustrative of the capabilities of
All interactions resulting in infections are stored in full detail in the model’s output, enabling further expost analysis of the sociological nature of disease spread and outcomes for all individuals modelled in the simulation. A simple example of such an analysis is shown in figure 21 where locations of infections are compared for one of the realizations shown in figure 18. Remaining realizations manifest a similar hierarchy of infection locations demonstrating
8. Fitting via Bayesian emulation
We now discuss efficient calibration strategies which form a critical part of our ability to extract core insights from
We hence employ the Bayes linear emulation and history matching methodology [74–76], a widely applied uncertainty quantification approach designed to facilitate the exploration of large parameter spaces for expensive-to-evaluate models of deterministic or stochastic form. This approach centres around the concept of an emulator: a statistical construct that mimics the slow-to-evaluate scientific model in question, providing predictions of the model outputs with associated uncertainty, at as-yet-unevaluated input parameter settings. In contrast to the model, the emulator is extremely fast to evaluate: for example, in the case of
Initially, we identify a large set of input parameters to search over, primarily composed of interaction intensity parameters at the group level, along with associated broad ranges, as given in table 8. We then identify a set of particular model outputs to match to corresponding observed data. Here, we focus on hospital deaths (CPNS [72]) and total deaths (ONS) at well-spaced time points throughout the period of the first wave of the epidemic. We then construct Bayes linear emulators for each of the model outputs at each of the chosen time points. The emulators are trained using a set of
Due to the emulators’ speed, they are ideal for global parameter exploration. This is performed by constructing an implausibility measure that gives the distance between the emulator’s expected
For the
We can see that the Bayes linear emulation and history matching methodology facilitates the efficient exploration, development and calibration of the highly complex
9. Summary
In this paper, we introduced the new
The model is formulated and encoded in four distinct layers,
Studies where
Data accessibility
Map data are copyrighted by OpenStreetMap contributors and available from https://www.openstreetmap.org.
Authors' contributions
J.A.B., A.C., C.C.L., E.E., M.I.L., A.Q.B., A.S. and H.T. thank the STFCfunded Centre for Doctoral Training in DataIntensive Science^{9} for financial support.
Competing interests
We declare we have no competing interests.
Funding
F.K. gratefully acknowledges funding as Royal Society Wolfson Research fellow. I.V. gratefully acknowledges Wellcome funding (218261/Z/19/Z). This work used the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The equipment was funded by BEIS capital funding via STFC capital grants ST/K00042X/1, ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National eInfrastructure.
Acknowledgements
This work was undertaken as a contribution to the Rapid Assistance in Modelling the Pandemic (RAMP) initiative, coordinated by the Royal Society. We are indebted to a number of people who shared their insights into various aspects of the project with us. We would like to thank Sinclair Sutherland for his patience and support in using the ONS database of the census data; without his help it would have been near impossible for us to produce our virtual population. James Nightingale and Richard Hayes provided valuable insights into the construction of efficient algorithms in the initial phase of the project. We are grateful to Bryan Lawrence, Grenville Lister, Sadie Bartholomew and Valeriu Predoi from the National Centre for Atmospheric Science and the University of Reading for assistance in improving the computational performance of the model. We gratefully acknowledge the generous provision of computing time on the Hartree and JASMIN facilities. We would like to thank the GridPP team at Durham and Manchester for their support and computing time spent on their systems. We would also like to thank Michael Goldstein and T.J. McKinley for their statistical and epidemiological advice. Christina Pagel and Rebecca Shipley provided invaluable advice in producing this publication and looking for holes in our arguments. This paper made use of Python [86] and the following Python libraries: Matplotlib [87], Numpy [88], Pandas [89,90], Scipy [91], SciencePlots [92].
Appendix A. Algorithms
A.1. Constructing credible households
The ONS divides households into the following broad categories: single, couple, family, student, communal and other [20]. We populate the households in this ordering, giving preference to those types for which we have the most precise and unambiguous data.
We define and construct household types as follows:
1.  Single: These are households with a single person living in them. The census data differentiate single households occupied by an adult or an older adult (greater than or equal to 65 years old), and we fill the households accordingly.  
2.  Couple: These are households occupied by a couple without children. Again, the census differentiates between households with adults or older adults living in them. We preferentially fill these households with two people of different sex, with an age difference sampled from the corresponding UK distribution of age differences at the time of marriage [93] (see also figure 23a).  
3.  Family: These households are defined by the number of adults (singles or couples) and the number of children. A difficulty here is that the census data do not stratify beyond ‘two or more’ children. To compensate for this, we introduce a distribution to select the number of children in these households. To fill a family household, we allocate a female adult first. If there are no female adults available (because they have already been allocated somewhere else), we choose a male adult. In the case of families with two adults, we match the person with a partner, preferentially of different sex, with an age difference sampled from the same dataset we use for couples. The census data provide us with the number of dependent children for each OA (area), and we add a suitable number of children according to the age difference between the mother and the nth child as given by ONS data collected on birth characteristics [94] (see also figure 23b,c).  
4.  Students: From the census data, we know how many student households there are and how many students live in a given OA (area). We uniformly distribute students among their households, assuming a constant ratio of the number of students per household. Students are selected from the population aged between 18 and 25 years old.  
5.  Communal: We use census data on the number of people in an OA (area) living in a communal establishment, as well as the number of such establishments, such as care homes [21]. The communal establishments are filled last, after the types described above; their residents will be those who do not live in any of the other household types. As in the case of student households, we assume a constant ratio of the number of communal residents per establishment.  
6.  Other: This category encapsulates the uncertain household compositions given by ONS. These may include groups of adults living together, multifamily or multigenerational families. In a similar manner to the communal households, these are filled last with those people that have not yet been allocated. 
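As an illustration of one step in the list above, the following sketch fills student households under the stated assumption of a constant ratio of students per household, selecting students from unallocated people aged 18 to 25. The function name, the dictionary-based data layout and the round-robin allocation are hypothetical choices, not the model's actual implementation.

```python
import random


def fill_student_households(unallocated_ages, n_students, n_households, rng):
    """Distribute students evenly over the student households of one area.

    unallocated_ages: dict mapping person id -> age for people not yet housed.
    n_students, n_households: counts taken from the census for this area.
    """
    if n_households <= 0:
        raise ValueError("need at least one student household")
    # Students are drawn from the population aged between 18 and 25.
    eligible = [pid for pid, age in unallocated_ages.items() if 18 <= age <= 25]
    rng.shuffle(eligible)
    students = eligible[:n_students]
    # Round-robin allocation keeps household sizes within one of each other.
    households = [[] for _ in range(n_households)]
    for k, pid in enumerate(students):
        households[k % n_households].append(pid)
    return households


ages = {0: 17, 1: 18, 2: 19, 3: 21, 4: 22, 5: 24, 6: 25, 7: 30, 8: 20, 9: 45}
print(fill_student_households(ages, n_students=6, n_households=4, rng=random.Random(1)))
```

The same even-spreading logic would apply to communal establishments, which the text also fills at a constant ratio of residents per establishment.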
As a further test of our household populating algorithm against available data, we compare the
A.2. Schools
The procedure for assigning children and teachers to schools throughout England is specified in §3.3.
Following our algorithm, we arrive at a distribution of school sizes displayed in figure 25, which we see to be in reasonable agreement with the data. Similarly, figure 26 shows the full distribution of class sizes in
A.3. Workplaces
We use ONS data on industries and companies in England categorized according to 21 sectors following the Standard Industrial Classification (SIC) code convention (table 1) [26] as our framework for differentiating between different types of work.
Companies are initialized according to ONS data on company sizes and sectors at the MSOA (super area) level [27]. We use data on the geographical distribution of company sizes to fix the number of companies at the MSOA (super area) level and use the data on the distribution of sectors to probabilistically assign an industry sector to these companies at the same geographical level. Since the ONS provides information on company sizes by binned size ranges, we take the median size of each bin and assign this to each company. The largest bin is 1000+ employees, which we assume to be 1500. It should be noted that companies are not assigned a sector based on their size, but purely on their geography. This does not mean there is no correlation between company size and their sector in
Individuals are assigned a sector attribute probabilistically, following the distributions of sectors disaggregated by sex at the MSOA (super area) level [28]. We determine the MSOA (super area) in which they work according to the ONS commuting origin–destination matrix (or ‘flow’ data) [29] which provides information on the number of people by sex travelling from one MSOA (super area) to another for work. Finally, a matching is carried out between people who work in a certain MSOA (super area), and the companies available to them based on their respective sector attributes. In future work, we plan to use additional demographic attributes to assign individuals their sectors and companies.
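The company initialization described above can be sketched as follows. The size bins, sector labels and weights are hypothetical placeholders rather than the ONS values, and taking the integer midpoint of a bin as its 'median' is an assumption; only the 1500 stand-in for the open-ended 1000+ bin comes from the text.

```python
import random


def representative_size(lower, upper=None, open_ended_size=1500):
    """Return one size per bin: the (integer) midpoint, or 1500 for '1000+'."""
    if upper is None:
        return open_ended_size
    return (lower + upper) // 2


def build_companies(bin_counts, sector_weights, rng):
    """Create the companies of one super area.

    bin_counts: dict mapping (lower, upper) size bins -> number of companies.
    sector_weights: dict mapping sector label -> relative frequency in the area.
    Sectors are drawn independently of company size, purely from geography.
    """
    sectors = list(sector_weights)
    weights = [sector_weights[s] for s in sectors]
    companies = []
    for (lower, upper), count in bin_counts.items():
        size = representative_size(lower, upper)
        for _ in range(count):
            sector = rng.choices(sectors, weights=weights, k=1)[0]
            companies.append({"size": size, "sector": sector})
    return companies


# Hypothetical inputs for a single super area.
bins = {(10, 19): 3, (20, 49): 2, (1000, None): 1}
sector_mix = {"P": 0.2, "Q": 0.3, "G": 0.5}
for company in build_companies(bins, sector_mix, random.Random(0)):
    print(company)
```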
A.4. Commuting
The commuting structure in
The following procedure is used to determine the groups within which people have the chance to mix during a commute.
1.  For each city, we seed several additional nodes which act as ‘gateway stations’ outside the metropolitan area boundary. These serve as funnels into the city and determine the mixing of external commuters. In the case of London, we seed eight stations which are placed evenly around the boundary of the metropolitan area. For all other cities, we seed four evenly spaced stations north, south, east and west of the city boundaries. These figures are informed by the approximate number of train lines entering each city, and the proportional differences between the number of London public transport links and those of other cities [37].  
2.  We model the commuting of all people who travel by public transport into a city’s metropolitan area. We assign all external commuters to the nearest gateway station to where they live. During each commuting timestep in the simulation, people travelling through the same gateway station are randomly split into ‘carriages’ containing people with whom they have the potential to interact. Similarly, internal commuters are also split into carriages and able to interact with each other.  
3.  During a commute timestep, each carriage is assigned to be travelling at ‘peak time’ with an 80% probability.  
4.  The default number of people in an average carriage is fixed at 50. For each city, this number is adjusted in proportion to UK Department for Transport (DfT) data on overcrowding in trains [34]. These data are also disaggregated by peak and off-peak travel, which is used to further adjust the filling of carriages.  
5.  The commuting timestep is run twice a day in order to simulate commuting in each direction. 
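The carriage-filling steps listed above can be sketched as below for the commuters passing through one gateway station in one commuting timestep. The default of 50 people and the 80% peak probability come from the text; the peak and off-peak crowding factors are hypothetical stand-ins for the DfT-derived adjustments, and drawing the peak flag before filling each carriage is an assumed ordering.

```python
import random


def split_into_carriages(commuters, rng, base_capacity=50,
                         peak_factor=1.2, offpeak_factor=0.8,
                         peak_probability=0.8):
    """Randomly split commuters into carriages for one commuting timestep."""
    shuffled = list(commuters)
    rng.shuffle(shuffled)
    carriages = []
    start = 0
    while start < len(shuffled):
        # Each carriage travels at 'peak time' with 80% probability.
        is_peak = rng.random() < peak_probability
        factor = peak_factor if is_peak else offpeak_factor
        capacity = max(1, round(base_capacity * factor))
        carriages.append({"peak": is_peak,
                          "people": shuffled[start:start + capacity]})
        start += capacity
    return carriages


carriages = split_into_carriages(range(500), random.Random(0))
print(len(carriages), [len(c["people"]) for c in carriages])
```

Running the function twice per simulated day, once for each direction of travel, would reproduce the last step of the list.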
Appendix B. Timesteps
As mentioned in §4.1,
index  calendar time  allowed activities 

0  08.00–09.00  M, R, C 
1  09.00–17.00  M, P, L, R 
2  17.00–18.00  M, R, C 
3  18.00–21.00  M, L, R 
4  21.00–08.00  M, R 
index  calendar time  allowed activities 

0  08.00–12.00  M, R, L 
1  12.00–16.00  M, R, L 
2  16.00–20.00  M, R, L 
3  20.00–08.00  M, R 
When choosing the timesteps, we aimed for lengths that are reasonably close to the characteristic interaction time of the activities allowed in each timestep, without choosing so many timesteps as to over-fragment the simulation. For instance, the weekday timestep with index 1 (09.00–17.00) is 8 h, and matches the primary activities of ‘school’ and ‘work’, even though the ‘leisure’ activity (which is allowed for older adults who are not assigned a workplace) has a characteristic time of 3 h. Breaking this timestep in half would better match the leisure characteristic time (3 h), but would mean that all individuals in the simulation would be reassigned an activity for the second of the two timesteps. Since the vast majority would be reassigned to the same, required ‘primary activity’, this would cause needless computation.
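As a small consistency check on the two timestep tables above, the sketch below encodes them as plain data and verifies that each schedule covers a full 24 h day. The activity codes are copied verbatim from the tables. Reading the second table as the weekend schedule is an inference from the text's reference to a 'weekday' timestep with index 1, and the variable and function names are illustrative.

```python
# (start, end, allowed activity codes), copied from the tables above.
WEEKDAY = [
    ("08:00", "09:00", ("M", "R", "C")),
    ("09:00", "17:00", ("M", "P", "L", "R")),
    ("17:00", "18:00", ("M", "R", "C")),
    ("18:00", "21:00", ("M", "L", "R")),
    ("21:00", "08:00", ("M", "R")),
]
WEEKEND = [  # assumption: the second table is the weekend schedule
    ("08:00", "12:00", ("M", "R", "L")),
    ("12:00", "16:00", ("M", "R", "L")),
    ("16:00", "20:00", ("M", "R", "L")),
    ("20:00", "08:00", ("M", "R")),
]


def duration_hours(start, end):
    """Length of a timestep in whole hours, wrapping past midnight."""
    start_h = int(start.split(":")[0])
    end_h = int(end.split(":")[0])
    return (end_h - start_h) % 24


def durations(schedule):
    return [duration_hours(start, end) for start, end, _ in schedule]


print(durations(WEEKDAY), sum(durations(WEEKDAY)))
print(durations(WEEKEND), sum(durations(WEEKEND)))
```

Splitting the 8 h weekday step into two 4 h steps would add one more activity assignment per person per weekday, which is the extra computation the paragraph above refers to.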
Appendix C. Contact matrices
We use the contact matrices from the BBC Pandemic survey [15] and supplement them with the
To extract mixing matrices that are suitable for our context-specific simulation, we have to correct for the fact that the reported matrices average over the corresponding age bins in the UK population. For example, contacts between teachers and school children are normalized to the full UK population in the respective age bin instead of the number of teachers in schools that actually participate in the interaction. This necessitates rescaling to the number of people in the social context to arrive at corrected social interaction matrices ${\overline{\mathcal{M}}}_{ij}^{(H,W,S,O)}$. This correction step will be detailed in the relevant subsections below.
C.1. Social mixing at work
The matrices for the agedependent interaction frequency at the workplace show only a very mild correlation with age, typically favouring interactions of workers with a similar age by about a factor of 2. We will therefore not include age effects at the workplace into the matrices used in
C.2. Social mixing in schools
We decompose school populations into year groups labelled with indices i ∈ {1, 2, …, N} for a school with N year groups and denote teachers with T. Starting with the interaction of pupils in various year groups, an apparently large asymmetry emerges in the BBC dataset between the summed number of interactions of pupils with adults in the school and of adults with pupils. This, however, is easily explained by realizing that the number of interactions in a given context is normalized to the fraction of the population in a given age bin, irrespective of whether they can participate in the interaction or not. This means that the number of interactions between teachers and pupils has to be renormalized to the ratio of teachers in the adult population: about 500 000 teachers out of 36 300 000 adults, with about 216 000 working in primary and 208 000 working in secondary schools.
Summing the number of interactions of children in the age range of 5–17 with adults in the range 25–65 in schools, and assuming the latter are all teachers yields an average of 0.75 pupil–teacher interactions ($0.06=8\text{\%}$ of them physical) per day with very little dependence on the children’s age. Conversely, adults have about 0.2 ($0.02=10\text{\%}$ of them physical) interactions per day with children in schools, again, relatively independent of the age of the children. Normalizing this to the number of teachers in the population, we arrive at about 15 teacher–pupil interactions per day, which fits very well to approximate teacher–pupil ratios of 1 : 20–1 : 25.^{10} We therefore assume that the individual interaction frequency of one specific teacher–pupil pair is consistently described with 0.75/day. For interactions among adults in the school setting, we include the interaction of parents with teachers and of parents among themselves, thereby blurring the picture. We therefore assume that teachers inherit the daily contact frequencies from the workplace mixing above. Turning finally to the interactions amongst children, we see a very dominant correlation in age. In order to capture this, we assume that per year of agedifference the number of interactions among children in school, ${n}_{KK}^{(S)}$, will be reduced by a factor ξ. By fitting to the combination of BBC and
As a consequence, we obtain the following social interaction frequency matrix for individual pairings at schools:
C.3. Social mixing at home
In our model, we decompose the household population into four subgroups, namely children (K, ages 0–19), young adults (Y, 18–24), adults (A, 25–64) and older adults (O, 65+). We therefore arrive at a 4 × 4 matrix of corrected social interactions at home, ${\overline{\mathcal{M}}}_{ij}^{(H)}$, where the indices i, j ∈ {K, Y, A, O}. In the following, we will detail how we arrive at the various matrix elements. When correcting for the impact of social environment, i.e. the household compositions, we will ignore household compositions which are listed as ‘other’ in the ONS database, due to a lack of detailed information (see §3.2 for more details). When using these data, we denote by H_{OAYK} the number of households, in units of millions, with a composition of O older adults, A adults, Y independent children or young adults living at home and K children aged 0–19.
—  ${\overline{\mathcal{M}}}_{\mathrm{OO}}^{(\mathrm{H})}$: we ignore the case of care homes or other facilities with more than two residents. The average interaction frequency from the BBC data is then given by ${n}_{\mathrm{OO}}^{(H)}=0.78$ (0.44 physical) and 0.62 at weekends.^{11} With H_{2000} = 2.131 and H_{1000} = 3.294,^{12} we obtain $${\overline{\mathcal{M}}}_{OO}^{(H)}=0.78\times \frac{2{H}_{2000}+{H}_{1000}}{2{H}_{2000}}\approx 1.4. \tag{C 3}$$  
—  ${\overline{\mathcal{M}}}_{AA}^{(H)}$: the interaction frequency between adults aged 20–65 at home from the BBC data is given by ${n}_{AA}^{(H)}=1.2$ ($0.74=62\text{\%}$ of them physical), so that $${\overline{\mathcal{M}}}_{AA}^{(H)}=1.2\times \frac{\sum _{x,y}(2{H}_{02xy}+{H}_{01xy})}{\sum _{x,y}2{H}_{02xy}}\approx 1.34, \tag{C 4}$$ where $\sum _{x,y}{H}_{02xy}=8.751$ and $\sum _{x,y}{H}_{01xy}=7.644$.  
—  ${\overline{\mathcal{M}}}_{YY}^{(H)}$: the interaction frequency between young adults aged 18–26 at home from the BBC data is given by ${n}_{YY}^{(H)}=1.3$ ($0.4=34\text{\%}$ of them physical). There is no obvious household correction that we can apply, but the number of contacts is relatively close to the value of ${\overline{\mathcal{M}}}_{AA}^{(H)}=1.34$, so we will assume that young adults interact with each other with a frequency similar to that of adults: $${\overline{\mathcal{M}}}_{YY}^{(H)}={\overline{\mathcal{M}}}_{AA}^{(H)}. \tag{C 5}$$ It is worth noting that the age range for young adults is relatively narrow, and that there will be edge effects that may effectively increase the interaction frequency.  
—  ${\overline{\mathcal{M}}}_{YA}^{(H)}$ and ${\overline{\mathcal{M}}}_{AY}^{(H)}$: we have ${n}_{YA}^{(H)}\approx 0.7$ with a relatively steep decline with the age of the young adults, which we attribute to the fact that with increasing age young adults move out of their parents’ home. To obtain a better understanding of the situation, we look at the interaction of adults in the age range 40–65 with young adults, aged 18–24. From this, we arrive at an average of ${n}_{AY}^{(H)}=0.17$ ($0.07=40\text{\%}$ of them physical). To relate this to a corrected value, we must make an assumption concerning the number of young adults in the three age bins that still live with their parents, which we take as $75\text{\%}$, $50\text{\%}$ and $40\text{\%}$ for the three age bins. To correct the AY number, we assume that the majority of households with young adults living as non-dependent children with their parents is composed of households with one young adult. Therefore, $$\left.\begin{array}{rl} & {\overline{\mathcal{M}}}_{YA}^{(H)}=\dfrac{1}{3}\left[\dfrac{0.87}{0.75}+\dfrac{0.65}{0.5}+\dfrac{0.55}{0.4}\right]\approx 1.3 \\ \text{and}\quad & {\overline{\mathcal{M}}}_{AY}^{(H)}=0.17\cdot \dfrac{\sum _{x,y}(2{H}_{02xy}+{H}_{01xy})}{\sum _{y}(2{H}_{021y}+{H}_{011y})}\approx 1.47, \end{array}\right\} \tag{C 6}$$ where $\sum _{y}{H}_{021y}=1.514$ and $\sum _{y}{H}_{011y}=0.946$.  
—  ${\overline{\mathcal{M}}}_{KK}^{(H)}$: the average number of daily contacts at home between children aged 0–17 is ${n}_{KK}^{(H)}=0.47$ (79% of them physical). Assuming all children live as dependents with their parents, and demanding that households with ‘2 or more children’ (ONS classification) have, on average, 2.3 children to account for the UK reproduction rate, we arrive at $${\overline{\mathcal{M}}}_{KK}^{(H)}=0.87\cdot \frac{\sum _{x}\left[({H}_{02x1}+{H}_{01x1})+2.3({H}_{02x2}+{H}_{01x2})\right]}{\sum _{x}2.3({H}_{02x2}+{H}_{01x2})}\approx 1.2. \tag{C 7}$$  
—  ${\overline{\mathcal{M}}}_{KA}^{(H)}$ and ${\overline{\mathcal{M}}}_{AK}^{(H)}$: to account for contacts of children with adults, we use sliding age windows that depend on the age of the child, using the fact that parents are usually between 20 and 40 years older than their children. We then arrive at ${n}_{KA}^{(H)}=1.27$ (70% of them physical) and ${n}_{AK}^{(H)}=0.67$, the former with only a mild dependence on the age of the child, while the latter shows clear edge effects for the first and last bins of the adult age distribution. These numbers translate into $${\overline{\mathcal{M}}}_{KA}^{(H)}=1.27 \tag{C 8}$$ and $${\overline{\mathcal{M}}}_{AK}^{(H)}=0.67\cdot \frac{\sum _{x,y}(2{H}_{02xy}+{H}_{01xy})}{\sum _{x}\left[2({H}_{02x1}+{H}_{02x2})+({H}_{01x1}+{H}_{01x2})\right]}\approx 1.69. \tag{C 9}$$^{13} We will also assume that the interaction frequency and intensity of children and young adults living in the same household is determined by $${\overline{\mathcal{M}}}_{KY}^{(H)}={\overline{\mathcal{M}}}_{KA}^{(H)}\quad \text{and}\quad {\overline{\mathcal{M}}}_{YK}^{(H)}={\overline{\mathcal{M}}}_{AK}^{(H)}. \tag{C 10}$$  
— ${\overline{\mathcal{M}}}_{O,KYA}^{(H)}$ and ${\overline{\mathcal{M}}}_{KYA,O}^{(H)}$: we assume that interactions of children, young adults and adults with older adults at home have three different realizations:
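Returning to two of the corrections earlier in this list, the following sketch recomputes the older-adult element of equation (C 3) and the young-adult-to-adult element of equation (C 6), using only the household counts and interaction frequencies quoted in the text. The variable names are illustrative choices and not part of the model code.

```python
# Household counts in millions, as quoted in the text (H_OAYK notation).
H_2000 = 2.131  # households with two older adults and nobody else
H_1000 = 3.294  # households with one older adult and nobody else

# BBC interaction frequency between older adults at home.
n_OO = 0.78

# Only older adults living in pairs can meet another older adult at home,
# so the population-averaged frequency is rescaled by
# (all older adults) / (older adults living in pairs).
M_OO = n_OO * (2 * H_2000 + H_1000) / (2 * H_2000)

# Young adult -> adult: per-bin frequencies divided by the assumed share of
# young adults still living with their parents, averaged over the three bins.
n_YA_bins = [0.87, 0.65, 0.55]
share_at_home = [0.75, 0.50, 0.40]
M_YA = sum(n / s for n, s in zip(n_YA_bins, share_at_home)) / len(n_YA_bins)

print(f"M_OO = {M_OO:.2f}, M_YA = {M_YA:.2f}")
```

The two results are about 1.38 and 1.28, which round to the values of 1.4 and 1.3 quoted in equations (C 3) and (C 6).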

C.4. Social mixing in other venues
Social venues (pubs, cinemas and groceries) in
Hospitals have three subgroups: medical staff, ward patients and ICU/ITU patients. The social mixing matrix for hospitals (where the superscript M refers to ‘medical facility’) is
Social mixing in care homes considers three subgroups: workers, residents and visitors, with matrix
Finally, universities are modelled as having six groups to represent professors and five distinct groups of students (for the moment based only on age 19–23), with diagonal elements ${\overline{\mathcal{M}}}_{i=j}^{(U)}=2$ and off-diagonal elements ${\overline{\mathcal{M}}}_{i\ne j}^{(U)}=0.75$, and all ${\varphi}_{ij}^{(U)}=0.25$.
C.5. Deriving contact matrices from
We derive the contact matrices in figure 9 by simulating a week of pre-lockdown activity. For each person, in each subgroup i, in each venue, we choose the required N_{ij} people (with replacement) for all (non-empty) subgroups j in that venue (where N_{ij} is from the relevant social mixing matrix). We populate ‘raw’ contact matrices using these selected people. As these contacts are then unidirectional, we make the same corrections as in [15] to account for reciprocal contacts. In future work, we hope to produce contact matrices derived from self-consistent (reciprocal) networks of contacts within groups.
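The sampling step described above can be sketched as follows for a single pass over a set of venues. Several simplifications are assumptions of this sketch: people are represented only by their ages, N_{ij} is taken to be an integer, contacts are binned by 10-year age bands, a person may draw themselves within their own subgroup, and the reciprocity correction of [15] is not applied.

```python
import random
from collections import Counter


def age_bin(age, width=10):
    """Lower edge of the age band containing `age`."""
    return (age // width) * width


def raw_contact_matrix(venues, n_contacts, rng, width=10):
    """Mean number of sampled contacts per person, by age band.

    venues: list of dicts mapping subgroup label -> list of member ages.
    n_contacts: dict mapping (i, j) subgroup pairs -> integer N_ij.
    """
    contacts = Counter()  # (band of person, band of contact) -> count
    people = Counter()    # band -> number of people sampled from
    for venue in venues:
        for i, ages_i in venue.items():
            for age in ages_i:
                people[age_bin(age, width)] += 1
                for j, ages_j in venue.items():
                    k = n_contacts.get((i, j), 0)
                    if not ages_j or k == 0:
                        continue  # skip empty subgroups
                    # Draw N_ij contacts with replacement from subgroup j.
                    for other in rng.choices(ages_j, k=k):
                        contacts[(age_bin(age, width), age_bin(other, width))] += 1
    return {pair: count / people[pair[0]] for pair, count in contacts.items()}


school = {"teachers": [40, 45], "pupils": [10, 11, 12]}
mixing = {("teachers", "pupils"): 3, ("pupils", "teachers"): 1, ("pupils", "pupils"): 2}
print(raw_contact_matrix([school], mixing, random.Random(0)))
```

The resulting matrix is unidirectional in the sense used above, which is why a separate correction for reciprocal contacts is needed afterwards.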
Appendix D. Details on modelling health trajectories
For the times spent in different stages of disease progression, we use a variety of functions, namely intervals of constant length, scaled and shifted β functions, scaled lognormal distributions and exponential Weibull distributions, given by
name  function  source 

C_{T}  constant with time T  
β_{I}  β_{2.29,19.05,0.39,39.8}(t)  [40] 
LN_{M}  LN_{0.83,5.7}(t)  * 
β_{H}  β_{1.35,3.68,0.05,27.1}(t)  [96] 
β_{D}  β_{1.21,1.97,0.08,12.9}(t)  [96] 
LN_{ICU}  LN_{1.41,0.9}(t)  [97] 
e_{ICU}  e_{1.06,0.89,12}(t)  [97] 
e_{D}  e_{1.23,1,9.69}(t)  [97] 
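To make the table concrete, the sketch below draws one duration from each family of distributions. The reading of the subscripts is an assumption and is not stated in the table: the four β subscripts are taken as (a, b, shift, scale), the two LN subscripts as (log-space standard deviation, median), and the three e subscripts as (a, c, scale) of an exponentiated Weibull distribution with cumulative distribution function $F(t)=\left(1-e^{-(t/\mathrm{scale})^{c}}\right)^{a}$. All function names are hypothetical.

```python
import math
import random


def sample_scaled_beta(a, b, shift, scale, rng):
    """Beta(a, b) variable stretched to the interval [shift, shift + scale]."""
    return shift + scale * rng.betavariate(a, b)


def sample_lognormal(sigma, median, rng):
    """Lognormal with log-space standard deviation sigma and the given median."""
    return rng.lognormvariate(math.log(median), sigma)


def sample_exp_weibull(a, c, scale, rng):
    """Inverse-CDF draw from F(t) = (1 - exp(-(t / scale) ** c)) ** a."""
    u = rng.random()
    return scale * (-math.log1p(-(u ** (1.0 / a)))) ** (1.0 / c)


rng = random.Random(0)
t_beta_I = sample_scaled_beta(2.29, 19.05, 0.39, 39.8, rng)  # row beta_I
t_ln_M = sample_lognormal(0.83, 5.7, rng)                    # row LN_M
t_e_icu = sample_exp_weibull(1.06, 0.89, 12.0, rng)          # row e_ICU
```

Under these assumptions a β draw always lies between the shift and the shift plus the scale, which gives a quick sanity check on any alternative reading of the subscripts.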
Appendix E. Calibration via Bayes linear emulation and history matching
We now provide more details of the Bayes linear emulation and history matching process outlined in §8. To set up the history matching problem, we identify a large set of 18 input parameters to the
input parameter (x_{i})  type  range 

$\beta_{\mathrm{pub}}$  contact intensity  [0.02,0.6] 
$\beta_{\mathrm{grocery}}$  contact intensity  [0.02,0.6] 
$\beta_{\mathrm{cinema}}$  contact intensity  [0.02,0.6] 
$\beta_{\mathrm{university}}$  contact intensity  [0.02,0.6] 
$\beta_{\mathrm{city}\,\mathrm{transport}}$  contact intensity  [0.08,0.77] 
$\beta_{\mathrm{inter\text{-}city}\,\mathrm{transport}}$  contact intensity  [0.08,1.2] 
$\beta_{\mathrm{hospital}}$  contact intensity  [0.08,1.2] 
$\beta_{\mathrm{care}\,\mathrm{home}}$  contact intensity  [0.08,1.2] 
$\beta_{\mathrm{company}}$  contact intensity  [0.08,1.2] 
$\beta_{\mathrm{school}}$  contact intensity  [0.08,1.2] 
$\beta_{\mathrm{household}}$  contact intensity  [0.08,1.2] 
$\beta_{\mathrm{care}\,\mathrm{visits}}$  contact intensity  [0.1,8] 
$\beta_{\mathrm{household}\,\mathrm{visits}}$  contact intensity  [0.08,1.2] 
$\alpha_{\mathrm{physical}}$  physical contact factor  [1.8,3] 
$\alpha_{\mathrm{seed}\,\mathrm{strength}}$  seeding  [0.5,1.3] 
$M_{\mathrm{quarantine}\,\mathrm{household}\,\mathrm{compliance}}$  compliance  [0.034,0.26] 
$M_{\mathrm{social}\,\mathrm{distancing}\,\mathrm{beta}\,\mathrm{factor}}$  social distancing  [0.65,0.95] 
$M_{\mathrm{sd}4\,\mathrm{random}\,\mathrm{factor}\,\mathrm{all}}$  social distancing  [0.004,0.5] 
We note that the
We perform an initial space-filling set of n = 125 runs $D=(f({x}^{(1)}),f({x}^{(2)}),\dots ,f({x}^{(n)}))$ with the ${x}^{(i)}\in {\mathcal{X}}_{0}$ chosen using a maximin Latin hypercube design. The emulators are updated by the runs D using the Bayes linear update equations [75], and hence can give a prediction, with corresponding uncertainty, of the unobserved f(x) at a new, previously unevaluated input point x, in the form of the adjusted expectation E_{D}(f_{i}(x)) and the adjusted variance Var_{D}(f_{i}(x)), respectively. The emulators have to satisfy extensive diagnostics [75,98], an illustrative example of which is given in figure 28a, which shows the emulator prediction E_{D}(f_{i}(x)) for f_{i}(x) across several time points (the solid red line) and the prediction interval ${\mathrm{E}}_{D}({f}_{i}(x))\pm 3\sqrt{{\mathrm{Var}}_{D}({f}_{i}(x))}$ (the red dashed lines) along with the held-out run output f(x) (the blue line), which the emulator has not previously seen, showing excellent agreement between emulator and model. The emulator evaluation takes a fraction of a second, and mimics the
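The Bayes linear adjustment referred to above has the standard form $\mathrm{E}_D(f(x)) = \mathrm{E}(f(x)) + \mathrm{Cov}(f(x),D)\,\mathrm{Var}(D)^{-1}(D-\mathrm{E}(D))$ and $\mathrm{Var}_D(f(x)) = \mathrm{Var}(f(x)) - \mathrm{Cov}(f(x),D)\,\mathrm{Var}(D)^{-1}\mathrm{Cov}(D,f(x))$. The following is a minimal one-dimensional sketch of that update; the constant prior mean, the squared-exponential prior covariance, the small nugget and all function names are assumptions made for illustration and do not describe the emulator structure actually used here.

```python
import math


def sq_exp_cov(x1, x2, sigma2, theta):
    """Assumed prior covariance between f(x1) and f(x2)."""
    return sigma2 * math.exp(-(((x1 - x2) / theta) ** 2))


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y


def bayes_linear_adjust(x_new, xs, D, prior_mean, sigma2, theta, nugget=1e-8):
    """Return the adjusted expectation and variance of f(x_new) given runs D at xs."""
    V = [
        [sq_exp_cov(a, b, sigma2, theta) + (nugget if i == j else 0.0)
         for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    c = [sq_exp_cov(x_new, a, sigma2, theta) for a in xs]
    w = solve(V, c)  # w = Var(D)^{-1} Cov(D, f(x_new)); Var(D) is symmetric
    adj_exp = prior_mean + sum(wi * (di - prior_mean) for wi, di in zip(w, D))
    adj_var = sigma2 - sum(wi * ci for wi, ci in zip(w, c))
    return adj_exp, max(adj_var, 0.0)


xs = [0.0, 0.5, 1.0]
D = [x * x for x in xs]  # toy "model runs"
e_at_run, v_at_run = bayes_linear_adjust(0.5, xs, D, 0.4, 1.0, 0.5)
e_far, v_far = bayes_linear_adjust(10.0, xs, D, 0.4, 1.0, 0.5)
```

At an input that has already been run, the adjusted expectation reproduces the run output and the adjusted variance collapses towards zero; far from all runs, both revert to their prior values.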
By confronting the emulators with the observed data vector z corresponding to the outputs in f, and incorporating major sources of uncertainty (e.g. observation error, structural model discrepancy, stochasticity), we can rule out large parts of the input parameter space ${\mathcal{X}}_{0}$ as implausible. We do this using an implausibility measure, for which the univariate version I_{i}(x), is defined for each output as
There are various ways to combine implausibility measures for each of the individual outputs, the simplest being to maximize: I_{M}(x) = max_{i} I_{i}(x), that is to take the maximum implausibility across all outputs of interest, which is the measure chosen here, although we note that other more nuanced and/or robust versions are available, that capture more of the multivariate behaviour [75].
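For orientation, a univariate implausibility of the kind widely used in the history matching literature (e.g. [75]) divides the distance between the emulator expectation and the observation by the combined standard deviation of all acknowledged uncertainties. The sketch below assumes that standard form, with the emulator, model discrepancy and observation error variances added in the denominator; the exact measure used in this work may contain further terms, and the function names are hypothetical.

```python
import math


def implausibility(adj_exp, adj_var, z, var_discrepancy, var_obs):
    """Assumed standard form: |E_D(f_i(x)) - z_i| / sqrt(sum of variances)."""
    return abs(adj_exp - z) / math.sqrt(adj_var + var_discrepancy + var_obs)


def max_implausibility(outputs):
    """I_M(x) = max_i I_i(x) over an iterable of per-output tuples
    (adj_exp, adj_var, z, var_discrepancy, var_obs)."""
    return max(implausibility(*output) for output in outputs)


outputs_at_x = [
    (10.0, 1.0, 4.0, 2.0, 1.0),  # output 1: distance 6, total variance 4
    (5.0, 1.0, 5.0, 1.0, 1.0),   # output 2: emulator matches the data
]
I_M = max_implausibility(outputs_at_x)
```

Taking the maximum means that a single badly matched output is enough to rule out an input point, which is why the text describes it as the simplest choice.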
We now employ iterative history matching [75], a parameter search method that seeks to identify all parts of parameter space that would give rise to acceptable matches between model output and observational data. This proceeds at the jth iteration (or wave), by constructing emulators using the current set of runs, removing the implausible parts of the input space to define the new non-implausible region $\mathcal{X}_{j}=\{x\in \mathcal{X}_{0}\,:\,I_{M}(x)<c\}$, designing and performing a new space-filling set of runs across the reduced input space $\mathcal{X}_{j}$ and re-emulating, but now with a more accurate emulator defined only over the reduced region $\mathcal{X}_{j}$. For further discussion see [75,81,82], but it suffices to note that the iterative nature of history matching is key, as it allows later iteration emulators to become far more accurate as they are only employed over far smaller parts of the input space, and are hence informed by a much higher density of runs.
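The wave structure can be illustrated with a deliberately simplified toy example. Everything specific in this sketch is an assumption made for illustration: the one-dimensional simulator $f(x)=x^2$ on $\mathcal{X}_0=[0,3]$, the crude nearest-run stand-in for the emulator (not a Bayes linear emulator), the grid-based design in place of a Latin hypercube, and the cutoff c = 3.

```python
import math


def simulator(x):
    """Toy stand-in for an expensive model run."""
    return x * x


def emulate(x, runs, slope=6.0):
    """Crude stand-in emulator: output of the nearest run as expectation,
    with a variance that grows with the distance to that run."""
    x_near, f_near = min(runs, key=lambda run: abs(run[0] - x))
    return f_near, (slope * abs(x - x_near)) ** 2


def history_match(z, var_obs, var_disc, n_waves=3, n_runs=5, cutoff=3.0):
    """Return the final non-implausible grid points and the region size per wave."""
    region = [3.0 * k / 300 for k in range(301)]  # fine grid over X_0 = [0, 3]
    sizes = []
    for _wave in range(n_waves):
        if not region:
            break
        # Evenly spread design over the current non-implausible points.
        step = max(1, (len(region) - 1) // (n_runs - 1))
        design = region[::step][:n_runs]
        runs = [(x, simulator(x)) for x in design]
        kept = []
        for x in region:
            mean, var = emulate(x, runs)
            imp = abs(mean - z) / math.sqrt(var + var_obs + var_disc)
            if imp < cutoff:
                kept.append(x)
        region = kept
        sizes.append(len(region))
    return region, sizes


region, sizes = history_match(z=4.0, var_obs=0.01, var_disc=0.01)
```

Because each wave places its runs only inside the surviving region, the stand-in emulator is informed by an increasingly dense set of runs there, which is the mechanism the text identifies as key; the input x = 2, which reproduces the observation exactly, is never ruled out.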
The observed data for total deaths were obtained from the ONS, while the hospital deaths data are taken from CPNS, the Covid Patient Notification System [72]. For each output, corresponding to an element of f, the data were first smoothed slightly with a standard kernel smoother to reduce the day-to-day stochasticity. The observation error and model discrepancy variances for each output were each decomposed into multiplicative and additive components to represent possible systematic biases, in addition to a scaled $\sqrt{n}$ component for the observation error only, to model the noisy count process. For example, we have the decompositions $\sigma_{\epsilon_{i}}^{2}=\alpha_{\mathrm{mult},\epsilon_{i}}^{2}z_{i}^{2}+\gamma_{\mathrm{add},\epsilon_{i}}^{2}$, with $\alpha_{\mathrm{mult},\epsilon_{i}}=0.06$ and $\gamma_{\mathrm{add},\epsilon_{i}}^{2}=3/2$, and $\sigma_{e_{i}}^{2}=\alpha_{\mathrm{mult},e_{i}}^{2}z_{i}^{2}+\gamma_{\mathrm{add},e_{i}}^{2}+(\delta_{\mathrm{corr},e_{i}}\sqrt{z_{i}})^{2}$, with $\alpha_{\mathrm{mult},e_{i}}=0.06$ and $\gamma_{\mathrm{add},e_{i}}^{2}=3/2$ and $\delta_{\mathrm{corr},e_{i}}=0.25$ governed by the mitigation of the smoothing process.
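The two decompositions can be evaluated directly for a given smoothed observation $z_i$. The sketch below uses the parameter values quoted above; the function names are hypothetical, and $\sigma_{e_i}^2$ is read as the observation error variance because it is the only one carrying the $\sqrt{z_i}$ term.

```python
import math


def sigma2_epsilon(z, alpha_mult=0.06, gamma_add_sq=1.5):
    """Variance with multiplicative and additive components only."""
    return (alpha_mult * z) ** 2 + gamma_add_sq


def sigma2_e(z, alpha_mult=0.06, gamma_add_sq=1.5, delta_corr=0.25):
    """Variance with an extra count-noise component (delta_corr * sqrt(z))**2."""
    return (alpha_mult * z) ** 2 + gamma_add_sq + (delta_corr * math.sqrt(z)) ** 2


z = 100.0  # illustrative smoothed daily count, not a value from the data
var_eps = sigma2_epsilon(z)
var_e = sigma2_e(z)
```

For z = 100 the multiplicative term contributes 36, the additive term 1.5 and the count-noise term 6.25, so the multiplicative component dominates at this scale while the additive term matters mainly for small counts.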
As described in §8, we performed three waves of the history match with 125 runs each wave, finding that the emulators were of sufficient accuracy after the third wave. Figure 28b shows the progression of the runs from iterations 1, 2 and 3 used in the history matching process (in purple, green and red, respectively) for the daily hospital deaths in England output, with the data (original and smoothed) in black. We can see that the third iteration runs are vastly improved and surround the observed data. These allow accurate emulators to be constructed that can identify the region of input space of interest, which were used to construct figure 22, as discussed in §8.
Footnotes
1 Indeed, many models also feature some optimizing behaviour of individuals as artificial-intelligence-type actors against randomly drawn welfare functions, e.g. [2].
2 We will use the term ‘sex’ in regard to chromosomal differentiation throughout this paper rather than gender. At the time of writing a full classification of the impact of chromosomal sex versus gender identification on the epidemiology of COVID-19 is unavailable. Within our modelling framework, nested and non-nested identifiers can be constructed to map sex and gender should more granular statistical data be available.
3 A full open source code base and implementation examples are linked here: GitHub: https://github.com/IDASDurham/JUNE and PyPI: https://pypi.org/project/june/. The version used for this paper is v. 1.0.
4 A forthcoming publication will discuss the use of secondary data to further constrain the uncertainty in the household construction and the subsequent impact on simulating the spread of COVID-19.
5 Some NHS trusts share resources and exchange patients across regions in an ad hoc manner; however, this is not modelled explicitly.
6 These probabilities can be generalized to depend on any attributes of the individual given reliable data.
7 In reality, the β parameters are fitted in a location-specific way, irrespective of the group, i.e. a location of type L containing one group of people, g_{1}, and another location of the same type, but with a different group of people, g_{2}, (e.g. two pubs in different places) will have the same β.
8
10 In fact, for primary schools, the average class size is about 21 pupils, while for secondary schools it is about 16 pupils [25].
11 One may speculate to what extent this drop is a reflection of uncertainties in the data or a true ‘physical’ effect, for example due to visitors, travel or similar.
12 Here and in the following, the numbers of different household configurations are taken from [95].
References
1. Russell RE, Katz RA, Richgels K, Walsh DP, Grant E. 2017 A framework for modeling emerging diseases to inform management. Emerg. Infect. Dis. 23, 1–6. (doi:10.3201/eid2301.161452)
2. Brandon N, Dionisio KL, Isaacs K, Tornero-Velez R, Kapraun D, Setzer RW, Price PS. 2018 Simulating exposure-related behaviors using agent-based models embedded with needs-based artificial intelligence. J. Expo. Sci. Environ. Epidemiol. 30, 184–193. (doi:10.1038/s413700180052y)
3. Auchincloss AH, Gebreab SY, Mair C, Diez Roux AV. 2012 A review of spatial methods in epidemiology, 2000–2010. Annu. Rev. Public Health 33, 107–122. (doi:10.1146/annurevpublhealth031811124655)
4. El-Sayed AM, Scarborough P, Seemann L, Galea S. 2012 Social network analysis and agent-based modeling in social epidemiology. Epidemiol. Perspect. Innov. 9, 1. (doi:10.1186/1742557391)
5. Rockett RJ et al. 2020 Revealing COVID-19 transmission in Australia by SARS-CoV-2 genome sequencing and agent-based modeling. Nat. Med. 26, 1398–1404. (doi:10.1038/s4159102010007)
6. Ferguson NM, Cummings DAT, Fraser C, Cajka JC, Cooley PC, Burke DS. 2006 Strategies for mitigating an influenza pandemic. Nature 442, 448–452. (doi:10.1038/nature04795)
7. Chao DL, Halloran ME, Obenchain VJ, Longini IM. 2010 Flute, a publicly available stochastic influenza epidemic simulation model. PLoS Comput. Biol. 6, e1000656. (doi:10.1371/journal.pcbi.1000656)
8. Hunter E, Mac Namee B, Kelleher JD. 2017 A taxonomy for agent-based models in human infectious disease epidemiology. J. Artif. Soc. Soc. Simul. 20, 2. (doi:10.18564/jasss.3414)
9. Abar S, Theodoropoulos GK, Lemarinier P, O’Hare GMP. 2017 Agent based modelling and simulation tools: a review of the state-of-art software. Comput. Sci. Rev. 24, 13–33. (doi:10.1016/j.cosrev.2017.03.001)
10. Mossong J et al. 2008 Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 5, e74. (doi:10.1371/journal.pmed.0050074)
11. Williamson EJ et al. 2020 Factors associated with COVID-19-related death using opensafely. Nature 584, 430–436. (doi:10.1038/s4158602025214)
12. Zoellner C, Jennings R, Wiedmann M, Ivanek R. 2019 Enable: an agent-based model to understand listeria dynamics in food processing facilities. Sci. Rep. 9, 1–14. (doi:10.1038/s4159801836654z)
13. Lund AM, Gouripeddi R, Facelli JC. 2020 Stham: an agent based model for simulating human exposure across high resolution spatiotemporal domains. J. Expo. Sci. Environ. Epidemiol. 30, 459–468. (doi:10.1038/s4137002002164)
14. Mei S, Guan H, Wang Q. 2018 An overview on the convergence of high performance computing and big data processing. In 2018 IEEE 24th Int. Conf. on Parallel and Distributed Systems (ICPADS), pp. 1046–1051. IEEE.
15. Klepac P, Kucharski AJ, Conlan AJK, Kissler S, Tang M, Fry H, Gog JR. 2020 Contacts in context: large-scale setting-specific social mixing matrices from the BBC pandemic project. medRxiv. [Preprint] (doi:10.1101/2020.02.16.20023754)
16. Office for National Statistics. QS103EW (age by single year). See https://www.nomisweb.co.uk/census/2011/qs103ew.
17. Office for National Statistics. Sex by age. https://www.nomisweb.co.uk/census/2011/lc1117ew.
18. Office for National Statistics. DC2101EW (ethnic group by sex by age). See https://www.nomisweb.co.uk/census/2011/dc2101ew.
19. Ministry of Housing, Communities & Local Government. English indices of deprivation 2019. See https://www.gov.uk/government/statistics/englishindicesofdeprivation2019.
20. Office for National Statistics. KS105EW (household composition). https://www.nomisweb.co.uk/census/2011/ks105ew.
21. Office for National Statistics. KS405UK (communal establishment residents). See https://www.nomisweb.co.uk/census/2011/ks405uk.
22. Office for National Statistics. DC1104EW (residence type by sex by age). See https://www.nomisweb.co.uk/census/2011/dc1104ew.
23. Education and Skills Funding Agency. UK Register of Learning Providers. https://www.ukrlp.co.uk/.
24. Department for Transport. National travel survey 2014: Travel to school. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/476635/traveltoschool.pdf.
25. Department for Education. Class size and education in England evidence report. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/183364/DFERR169.pdf.
26. Office for National Statistics. UK SIC 2007. See https://www.ons.gov.uk/methodology/classificationsandstandards/ukstandardindustrialclassificationofeconomicactivities/uksic2007.
27. Office for National Statistics. 2011 UK Business Counts – enterprises by industry and employment size band. See https://www.nomisweb.co.uk/datasets/idbrent.
28. Office for National Statistics. KS605EWKS607EW (industry by sex). See https://www.nomisweb.co.uk/census/2011/ks605ew.
29. Office for National Statistics. WU01EW (location of usual residence and place of work by sex). See https://www.nomisweb.co.uk/census/2011/wu01ew.
30. Klepac P, Kissler S, Gog J. 2018 Contagion! the BBC four pandemic – the model behind the documentary. Epidemics 24, 49–59. (doi:10.1016/j.epidem.2018.03.003)
31. 2015 Gershuny and O. Sullivan: United Kingdom time use survey, 2014–2015. Technical Report SN: 8128, UK Data Service.
32. OpenStreetMap contributors. 2017 Planet dump retrieved from https://planet.osm.org. https://www.openstreetmap.org.
33. Age UK. Briefing: Health and care of older people in England 2019. See https://www.ageuk.org.uk/globalassets/ageuk/documents/reportsandpublications/reportsandbriefings/healthwellbeing/age_uk_briefing_state_of_health_and_care_of_older_people_july2019.pdf (accessed 14 December 2020).
34. UK Department for Transport. 2011 RAI0201 (city centre peak and all day arrivals and departures by rail on a typical autumn weekday, by city). See https://www.gov.uk/government/statisticaldatasets/rai02capacityandovercrowding.
35. Office for National Statistics. QS701EW (method of travel to work). See https://www.nomisweb.co.uk/census/2011/qs701ew.
36. Office for National Statistics. Output Area (2011) to Major Towns and Cities (December 2015) Lookup in England and Wales. See https://geoportal.statistics.gov.uk/datasets/78ff27e752e44c3194617017f3f15929, 2015.
37. National Rail. 2015 Maps of the national rail network of Great Britain. See https://www.nationalrail.co.uk/stations_destinations/maps.aspx.
38. Dong Y, Mo X, Hu Y, Qi X, Jiang F, Jiang Z, Tong S. 2020 Epidemiology of COVID-19 among children in China. Pediatrics 145. (doi:10.1542/peds.20200702)
39. Lee PI, Hu YL, Chen PY, Huang YC, Hsueh PR. 2020 Are children less susceptible to COVID-19? J. Microbiol. Immunol. Infect. 53, 371. (doi:10.1016/j.jmii.2020.02.011)
40. He X et al. 2020 Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat. Med. 26, 672–675. (doi:10.1038/s4159102008695)
41. Hinch R et al. 2020 OpenABM-Covid19 – an agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing. medRxiv. [Preprint] (doi:10.1101/2020.09.16.20195925)
42. Ward H et al. 2021 SARS-CoV-2 in England following first peak of the pandemic. Nat. Comms. 12, 905. (doi:10.1038/s4146702121237w)
43. Office for National Statistics. Deaths registered weekly in England and Wales, provisional. See https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/weeklyprovisionalfiguresondeathsregisteredinenglandandwales.
44. Department of Health & Social Care. Vivaldi 1: Covid-19 care homes study report. See https://www.gov.uk/government/publications/vivaldi1coronaviruscovid19carehomesstudyreport/vivaldi1covid19carehomesstudyreport#fn:1.
45. Pollán M et al. 2020 Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. Lancet 396, 535–544. (doi:10.1016/S01406736(20)314835)
46. Riccardo F et al. 2020 Epidemiological characteristics of Covid-19 cases and estimates of the reproductive numbers 1 month into the epidemic, Italy, 28 January to 31 March 2020. Euro Surveill. 25, 2000790. (doi:10.2807/15607917.ES.2020.25.49.2000790)
47. Office for National Statistics. Population estimates for the UK, England and Wales, Scotland and Northern Ireland: mid-2019. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/annualmidyearpopulationestimates/latest.
48. Office for National Statistics. Lc1105ew residence type by sex by age. See https://www.nomisweb.co.uk/census/2011/lc1105ew.
49. Office for National Statistics. Deaths involving Covid-19 in the care sector, England and Wales: deaths occurring up to 12 June 2020 and registered up to 20 June 2020 (provisional). See https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/deathsinvolvingcovid19inthecaresectorenglandandwales/deathsoccurringupto12june2020andregisteredupto20june2020provisional.
50. Scientific Advisory Group for Emergencies. Dynamic COCIN report to SAGE and NERVTAG – 30 June 2020. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/903395/S0612_Dynamic_COCIN_report_to_SAGE_and_NERVTAG.pdf.
51. Public Health England (PHE). Covid-19 hospitalisation in England surveillance system (CHESS) – daily reporting. See https://www.england.nhs.uk/coronavirus/wpcontent/uploads/sites/52/2020/03/phelettertotrustsredailycovid19hospitalsurveillance11march2020.pdf (accessed 14 December 2020).
52. NHS. Covid-19 situation reports. See https://digital.nhs.uk/aboutnhsdigital/corporateinformationanddocuments/directionsanddataprovisionnotices/dataprovisionnoticesdpns/covid19situationreports (accessed 14 December 2020).
53. Office for National Statistics. Deaths involving COVID-19 in the care sector, England and Wales. See https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/deathsinvolvingcovid19inthecaresectorenglandandwales/deathsoccurringupto12june2020andregisteredupto20june2020provisional#characteristicsofcarehomeresidentswhodiedfromcovid19.
54. Brazeau N et al. 2020 Covid-19 infection fatality ratio: estimates from seroprevalence. Report 34. Imperial College London. (doi:10.25561/83545)
55. Chu DK, Akl EA, Duda S, Solo K, Yaacoub S, Schünemann HJ. 2020 Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. Lancet 395, 1973–1987. (doi:10.1016/S01406736(20)311429)
56. Fischer EP, Fischer MC, Grass D, Henrion I, Warren WS, Westman E. 2020 Low-cost measurement of face mask efficacy for filtering expelled droplets during speech. Sci. Adv. 6, eabd3083. (doi:10.1126/sciadv.abd3083)
57. Howard J et al. 2021 An evidence review of face masks against COVID-19. PNAS 118, e2014564118. (doi:10.1073/pnas.2014564118)
58. Suen CY, Leung HH, Lam KW, Karen PH, Chan MY, Kwan JK. 2020 Feasibility of reusing surgical mask under different disinfection treatments. medRxiv. (doi:10.1101/2020.05.16.20102178)
59. Toomey EC et al. 2020 Extended use or reuse of single-use surgical masks and filtering facepiece respirators during the coronavirus disease 2019 (COVID-19) pandemic: a rapid systematic review. Infect. Control Hosp. Epidemiol. 8, 1–9. (doi:10.37473/fic/10.1017/ice.2020.1243)
60. Liang M, Gao L, Cheng C, Zhou Q, Uy JP, Heiner K, Sun C. 2020 Efficacy of face mask in preventing respiratory virus transmission: a systematic review and meta-analysis. Travel Med. Infect. Dis. 36, 101751. (doi:10.1016/j.tmaid.2020.101751)
61. YouGov. 2020 YouGov COVID-19 behaviour changes tracker: wearing a face mask when in public places. https://yougov.co.uk/topics/international/articlesreports/2020/03/17/personalmeasurestakenavoidcovid19.
62. Office for National Statistics. 2020 Coronavirus and key workers in the UK. https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/earningsandworkinghours/articles/coronavirusandkeyworkersintheuk/20200515.
63. Institute for Fiscal Studies. 2020 Sector shutdowns during the coronavirus crisis: which workers are most exposed? https://www.ifs.org.uk/publications/14791.
64. HM Revenue & Customs. 2020 HMRC coronavirus (COVID-19) statistics. https://www.gov.uk/government/collections/hmrccoronaviruscovid19statistics#CoronavirusJobRetentionSchemeManagementinformation.
65. YouGov. 2020 YouGov COVID-19 behaviour changes tracker: avoiding going to work. https://yougov.co.uk/topics/international/articlesreports/2020/03/17/personalmeasurestakenavoidcovid19.
66. Office for National Statistics. 2020 Coronavirus and the latest indicators for the UK economy and society: 29 October 2020. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronavirustheukeconomyandsocietyfasterindicators/29october2020.
67. UK Department for Education. 2020 Attendance in education and early years settings during the coronavirus (COVID-19) outbreak. https://www.gov.uk/government/collections/attendanceineducationandearlyyearssettingsduringthecoronaviruscovid19outbreak.
68. OpenTable. 2020 The state of the restaurant industry. https://www.opentable.com/stateofindustry.
69. Government of the United Kingdom. 2020 Get a discount with the eat out to help out scheme. See https://www.gov.uk/guidance/getadiscountwiththeeatouttohelpoutscheme.
70. Office for National Statistics. 2020 Coronavirus and the social impacts on Great Britain: 25 September 2020. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthandwellbeing/bulletins/coronavirusandthesocialimpactsongreatbritain/25september2020.
71. Bullock J et al. In preparation. June: a Bayesian uncertainty analysis.
72. Covid-19 patient notification system (CPNS) user guide. See https://www.england.nhs.uk/statistics/wpcontent/uploads/sites/2/2020/09/CPNSUserGuide20200831.pdf (accessed 14 December 2020).
73. IDAS Covid group. In preparation. Social differences in SarsCov2 infection spread (draft).
74. Craig PS, Goldstein M, Seheult AH, Smith JA. 1997 Pressure matching for hydrocarbon reservoirs: a case study in the use of Bayes linear strategies for large computer experiments (with discussion). In Case studies in Bayesian statistics (eds C. Gatsonis, J.S. Hodges, R.E. Kass, R. McCulloch, P. Rossi, N. D. Singpurwalla), vol. 3, pp. 36–93. New York, NY: Springer.
75. Vernon I, Goldstein M, Bower RG. 2010 Galaxy formation: a Bayesian uncertainty analysis. Bayesian Anal. 5, 619–670. (doi:10.1214/10ba524)
76. Andrianakis I, Vernon I, McCreesh N, McKinley TJ, Oakley JE, Nsubuga R, Goldstein M, White RG. 2015 Bayesian history matching of complex infectious disease models using emulation: a tutorial and a case study on HIV in Uganda. PLoS Comput. Biol. 11, e1003968. (doi:10.1371/journal.pcbi.1003968)
77. Andrianakis I, Vernon I, McCreesh N, McKinley TJ, Oakley JE, Nsubuga RN, Goldstein M, White RG. 2017 History matching of a complex epidemiological model of human immunodeficiency virus transmission by using variance emulation. J. R. Stat. Soc. C (Appl. Stat.) 66, 717–740. (doi:10.1111/rssc.12198)
78. Andrianakis I, McCreesh N, Vernon I, McKinley TJ, Oakley JE, Nsubuga R, Goldstein M, White RG. 2017 Efficient history matching of a high dimensional individual based HIV transmission model. SIAM/ASA J. Uncertain. Quantification 5, 694–719. (doi:10.1137/16M1093008)
79. McCreesh N et al. 2017 Universal test, treat, and keep: improving art retention is key in cost-effective HIV control in Uganda. BMC Infect. Dis. 17, 322. (doi:10.1186/s128790172420y)
80. McKinley TJ, Vernon I, Andrianakis I, McCreesh N, Oakley JE, Nsubuga RN, Goldstein M, White RG. 2018 Approximate Bayesian computation and simulation-based inference for complex stochastic epidemic models. Stat. Sci. 33, 4–18. (doi:10.1214/17STS618)
81. Vernon I, Liu J, Goldstein M, Rowe J, Topping J, Lindsey K. 2018 Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions. BMC Syst. Biol. 12, 1. (doi:10.1186/s1291801704843)
82. Vernon I, Goldstein M, Bower RG. 2010 Rejoinder for Galaxy formation: a Bayesian uncertainty analysis. Bayesian Anal. 5, 697–708.
83. O’Hagan A. 2006 Bayesian analysis of computer code outputs: a tutorial. Reliab. Eng. Syst. Saf. 91, 1290–1300. (doi:10.1016/j.ress.2005.11.025)
84. Marathe MV, Ramakrishnan N. 2013 Recent advances in computational epidemiology. IEEE Intell. Syst. 28, 96–101. (doi:10.1109/MIS.2013.114)
85. Aylett-Bullock J et al. 2021 Operational response simulation tool for epidemics within refugee and IDP settlements. medRxiv.
86. Van Rossum G, Drake FL. 2009 Python 3 reference manual. Scotts Valley, CA: CreateSpace.
87. Hunter JD. 2007 Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95. (doi:10.1109/MCSE.2007.55)
88. Harris CR et al. 2020 Array programming with NumPy. Nature 585, 357–362. (doi:10.1038/s4158602026492)
89. The pandas development team. pandas-dev/pandas: Pandas, February 2020.
90. McKinney W. 2010 Data structures for statistical computing in Python. In Proc. of the 9th Python in Science Conf. (eds S. van der Walt, J. Millman), pp. 56–61.
91. Virtanen P. 2020 SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272. (doi:10.1038/s4159201906862)
92.
93. Office for National Statistics. 2017 Marriages in England and Wales. See https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/marriagecohabitationandcivilpartnerships/datasets/marriagesinenglandandwales2013.
94. Office for National Statistics. 2017 Birth characteristics in England and Wales: 2017. https://www.ons.gov.uk/releases/birthcharacteristicsinenglandandwales2017.
95. Office for National Statistics. LC1109EW (household composition by age by sex). See https://www.nomisweb.co.uk/census/2011/lc1109ew.
96. Scientific Advisory Group for Emergencies. Dynamic COCIN report to SAGE and NERVTAG, 13 May 2020. See https://www.gov.uk/government/publications/dynamiccocinreporttosageandnervtag13may2020.
97. ICNARC. Icnarc report on Covid-19 in critical care 10 July 2020. See https://www.icnarc.org/OurAudit/Audits/Cmp/Reports.
98. Bastos TS, O’Hagan A. 2008 Diagnostics for Gaussian process emulators. Technometrics 51, 425–438. (doi:10.1198/TECH.2009.08019)
99. Kennedy MC, O’Hagan A. 2001 Bayesian calibration of computer models. J. R. Stat. Soc. B (Stat. Methodol.) 63, 425–464. (doi:10.1111/14679868.00294)