Temporal variation of human encounters and the number of locations in which they occur: a longitudinal study of Hong Kong residents

Patterns of social contact between individuals are important for the transmission of many pathogens and shaping patterns of immunity at the population scale. To refine our understanding of how human social behaviour may change over time, we conducted a longitudinal study of Hong Kong residents. We recorded the social contact patterns for 1450 individuals, up to four times each between May 2012 and September 2013. We found individuals made contact with an average of 12.5 people within 2.9 geographical locations, and spent an average estimated total duration of 9.1 h in contact with others during a day. Distributions of the number of contacts and locations in which contacts were made were not significantly different between study waves. Encounters were assortative by age, and the age mixing pattern was broadly consistent across study waves. Fitting regression models, we examined the association of contact rates (number of contacts, total duration of contact, number of locations) with covariates and calculated the inter- and intra-participant variation in contact rates. Participant age was significantly associated with the number of contacts made, the total duration of contact and the number of locations in which contact occurred, with children and parental-age adults having the highest rates of contact. The number of contacts and contact duration increased with the number of contact locations. Intra-individual variation in contact rate was consistently greater than inter-individual variation. Despite substantial individual-level variation, remarkable consistency was observed in contact mixing at the population scale. 
This suggests that aggregate measures of mixing behaviour derived from cross-sectional information may be appropriate for population-scale modelling purposes, and that if more detailed models of social interactions are required for improved public health modelling, further studies are needed to understand the social processes driving intra-individual variation.


List of Tables and Figures
Table S1. Contact diary recording tables.
Table S2. Mean average of contact rates stratified by study wave.
Table S3. Comparison of individuals' contact metrics between different study waves.
Table S4. Mean number of contacts, stratified by study wave, and age group of participant and contact.
Table S5. Mean number of contacts, stratified by study wave, age group of participant, and social setting.
Table S6. Mean number of contacts involving touch, stratified by study wave, and age group of participant and contact.
Table S7. Estimated fixed effects of the regression models shown in Figure 4 of the main text, excluding spline terms.
Table S8. Variance within the random effects of the regression models.
Table S9. Mean and standard deviation of reported individual and group contacts, for each study wave.
Table S10. Data dictionary for released data.
Figure S1. Study population demography.
Figure S2a. Individual-level distribution of the difference in number of contacts reported by participants between pairs of study waves.
Figure S2b. Individual-level distribution of the difference in duration of contacts reported by participants between pairs of study waves.
Figure S2c. Individual-level distribution of the difference in number of locations reported by participants between pairs of study waves.
Figure S3. Boxplots of total number of contacts made by age groups of participants stratified by study wave.
Figure S4. Correlations between age-mixing matrices from each study wave.
Figure S5. Force of infection estimates based on observed age-specific mixing patterns for each study wave.
Figure S6. Modelled contact rates for days reported by participants as being 'typical'.
Figure S7. Modelled number of contacts made in (A,B) home, (C,D) work or school, and (E,F) other social settings.
Figure S8. Sensitivity of fitted regression model to estimated contact durations.
Figure S9. Regression models exploring the relationship between the proportion of contacts involving touch and the average duration per contact and the total number of contacts reported.
Figure S10. Intra-participant variation in contact rates.
Figure S11. Plots examining how the coefficient of variation of different contact metrics changes as observations accumulate with each study wave.
Figure S12. Relationship between individual-level coefficient of variation (CoV) for number of contacts, contact duration and number of locations.
Figure S13. Distribution of number of contacts reported as individuals and group contacts, stratified by study wave.
Figure S14. Modelled number of contacts reported as (A,B) individuals and (C,D) groups.

Appendix A. Recording of contacts and locations.
Study researchers recorded each participant's contacts, and the characteristics of those contacts, during an interview with the participant, entering the information into the two tables shown in Table S1. Participants were prompted to recall the people they encountered as well as contact events. An event was defined as an encounter made with one or more other people in a particular location within a discrete time period. An encounter was defined as a face-to-face conversation between a participant and another person within 1 metre of each other, and/or skin-on-skin touch between the participant and another person (examples provided to participants included shaking or holding hands, or a kiss). Each participant's contacts were assigned a unique name or description by the interviewer. Thus, a participant could record encounters made in the same location, possibly with the same individuals, but at very different times of the day (for example, meeting the same group of commuters on the way to work and on the way back from work).
Interviews proceeded as follows. First, participants were asked to recall the different individuals or groups of individuals they encountered on the pre-assigned recording day, to populate the Person table (Table S1). Second, participants were asked to provide some basic information about each contact individual or group: the age range of the contact(s) (0-4, 5-19, 20-39, 40-64, 65 or older) and the typical frequency of encountering the contact(s) (regular contact: 4 or more days a week, 2-3 days a week, or once a week; non-regular contact: less than once a week, or met for the first time that day). For groups, participants were instructed to report the characteristics that would apply to the majority of individuals present within the group. Third, participants were asked to recall, for each individual/group encountered, the different locations in which they encountered that particular individual/group. Responses were used to populate the Contact Event table (Table S1). Characteristics of the contact event were recorded at this time: whether the encounter involved skin-on-skin touch; the social context in which the contact event occurred (the participant's home, work or school, travel, shopping, meet or others); and an estimate of the duration of the contact event (<10 minutes, 10-29 minutes, 30-59 minutes, 1 hour, 1 hour to 2 hours, 2 hours to 4 hours, or 4 hours or more).

Table S1 (Contact Event table). Fields recorded include:
• Start time at which the participant was at that location (for reference purposes during interview)
• A short description of the location (for reference purposes during interview)
• Setting of encounter event (home, work, school, shopping, restaurant, travel, leisure, other)
• Duration of the contact event

Figure S2a. Individual-level distribution of the difference in number of contacts reported by participants between pairs of study waves. Note that the absolute difference is binned by logarithmically spaced breaks.
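The two recording tables described in Appendix A can be pictured as two linked record types, one row per contact (individual or group) and one row per contact event. The sketch below is illustrative only: the field names, the `group_size` field and the `summarise` helper are assumptions made for this example and are not taken from the study's data dictionary. It shows how a contact met twice in the same location (such as the commuters in the example above) counts once towards the number of contacts and once towards the number of locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """One row of the Person table (field names are hypothetical)."""
    contact_id: str      # unique name or description assigned by the interviewer
    age_range: str       # "0-4", "5-19", "20-39", "40-64" or "65 or older"
    frequency: str       # e.g. "4 or more days a week", "met for the first time"
    group_size: int = 1  # assumption: 1 for an individual, >1 for a group


@dataclass(frozen=True)
class ContactEvent:
    """One row of the Contact Event table (field names are hypothetical)."""
    contact_id: str      # links back to Person.contact_id
    location: str        # short description of the location
    setting: str         # e.g. "home", "work or school", "travel"
    touch: bool          # whether the encounter involved skin-on-skin touch
    duration: str        # duration category, e.g. "10-29 minutes"


def summarise(people, events):
    """Return (number of contacts, number of distinct locations) for one diary day."""
    met = {e.contact_id for e in events}
    n_contacts = sum(p.group_size for p in people if p.contact_id in met)
    n_locations = len({e.location for e in events})
    return n_contacts, n_locations


# Made-up diary day: one household member and a group of three commuters
# met on the way to work and again on the way back.
people = [
    Person("mother", "40-64", "4 or more days a week"),
    Person("commuters", "20-39", "4 or more days a week", group_size=3),
]
events = [
    ContactEvent("mother", "flat", "home", True, "4 hours or more"),
    ContactEvent("commuters", "train", "travel", False, "10-29 minutes"),
    ContactEvent("commuters", "train", "travel", False, "10-29 minutes"),
]
n_contacts, n_locations = summarise(people, events)
```

Keeping events separate from people lets the same contact appear in several events without being double counted.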

Figure S2b. Individual-level distribution of the difference in duration of contacts reported by participants between pairs of study waves. Note that the absolute difference is binned by logarithmically spaced breaks.

Figure S2c. Individual-level distribution of the difference in number of locations reported by participants between pairs of study waves.

Figure S3. Boxplots of total number of contacts made by age groups of participants stratified by study wave. Box widths are indicative of the sample size. Red dots denote the mean number of contacts for each group. Note that the y-axis is plotted on a log scale, so participants making zero contacts are not represented. The number of zero-contact observations in each study wave is as follows: R1, 21; R2, 11; R3, 16; R4, 21.

Figure S4. Correlations between age-mixing matrices from each study wave. This matrix shows the Spearman correlation coefficients (and associated p-values) between pairs of wave-specific age-mixing matrices shown in Figure 4. R1 to R4 denote the four study waves.
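As a rough illustration of the comparison summarised in Figure S4, the sketch below computes a Spearman rank correlation between two age-mixing matrices by flattening each matrix and correlating the ranks of corresponding cells. Treating the matrices cell by cell is an assumption about how the comparison was made, the function names are hypothetical, and the matrices are made-up values rather than study data. The sketch does not compute p-values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_between_matrices(m1, m2):
    """Spearman correlation between corresponding cells of two matrices."""
    flat1 = [v for row in m1 for v in row]
    flat2 = [v for row in m2 for v in row]
    return pearson(average_ranks(flat1), average_ranks(flat2))


# Made-up 2x2 mixing matrices for two waves (mean contacts per day).
wave_a = [[4.0, 1.5], [1.2, 6.0]]
wave_b = [[3.8, 1.9], [1.0, 6.4]]
rho = spearman_between_matrices(wave_a, wave_b)
```

Because the correlation uses ranks, it measures whether the ordering of age-pair contact rates is preserved between waves, not whether the rates themselves are equal.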
Appendix D. Average number of total and touch contacts stratified by participant-contact age groups, setting, and study wave
Here we estimate the force of infection ($\lambda$) for each study wave using the observed age-mixing patterns as shown in Figure 3 of the main text. We assume that the population is entirely susceptible, except for initial infecteds, and that the transmission rate per contact is constant and independent of participant or contact age. The population size in each age class $i$ is $N_i$, based on Hong Kong Census information. We make two assumptions regarding the number of initial infectious individuals in each age class, $I_i$: (1) the number in each age category within the population is a fixed proportion, $I_i = p N_i$; (2) the number in each age category is the same across all age categories, $I_i = \frac{p}{n} \sum_j N_j$, where $n$ is the number of age classes, such that the total number of infectious individuals under each assumption is the same.
We define the force of infection for each age class $i$ as $\lambda_i = \beta \sum_j c_{ij} \frac{I_j}{N_j}$, where $c_{ij}$ is the observed average number of contacts made by age class $i$ with age class $j$ per day, and $\beta$ is the transmission rate per day given contact.
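The calculation can be sketched in a few lines. The sketch takes the force of infection on an age class to be the transmission rate multiplied by the sum, over contacted age classes, of the daily contact rate times the infectious fraction of the contacted class. This is an assumed reading of the definition given in this appendix. The function names are hypothetical, and the two-class contact matrix and population sizes are made-up values, not study data.

```python
def initial_infecteds(pop_sizes, proportion, equal_numbers=False):
    """Initial infectious individuals per age class under the two seeding assumptions."""
    total = proportion * sum(pop_sizes)
    if equal_numbers:
        # Assumption (2): the same number in every age class, same overall total.
        return [total / len(pop_sizes)] * len(pop_sizes)
    # Assumption (1): a fixed proportion of each age class.
    return [proportion * n for n in pop_sizes]


def force_of_infection(contacts, pop_sizes, infecteds, beta):
    """Per-class force of infection: beta * sum_j contacts[i][j] * I_j / N_j."""
    return [
        beta * sum(c * i / n for c, i, n in zip(row, infecteds, pop_sizes))
        for row in contacts
    ]


# Made-up two-class example (not study data).
pop_sizes = [1000, 3000]
contacts = [[5.0, 2.0], [1.0, 6.0]]  # contacts[i][j]: daily contacts of class i with class j
beta = 1e-4                          # transmission rate per day given contact
p = 1e-4                             # initial infectious proportion

seed_prop = initial_infecteds(pop_sizes, p)
seed_equal = initial_infecteds(pop_sizes, p, equal_numbers=True)
foi_prop = force_of_infection(contacts, pop_sizes, seed_prop, beta)
foi_equal = force_of_infection(contacts, pop_sizes, seed_equal, beta)
```

Under assumption (1) every class has the same infectious fraction, so the force of infection on a class reduces to the transmission rate times that fraction times the class's total daily contacts. Under assumption (2) smaller classes carry a larger infectious fraction, so contacts with them weigh more heavily.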
We present the age-class specific force of infection by study wave in Figures S5A and S5B below, and the total force of infection ($\sum_i \lambda_i$, summed over age classes $i$) by wave for each assumption regarding initial infecteds in Figure S5C. We used a transmission rate of $\beta = 1 \times 10^{-4}$ and an initial infectious proportion of $p = 1 \times 10^{-4}$, which corresponds to 697 initial infecteds in a population of 6,971,882. We excluded the 0 to 4 age class from our force of infection calculations (though we did include this age group as potential infectors) due to the low number of observations in our study for these ages.

Figure S5. Force of infection estimates based on observed age-specific mixing patterns for each study wave. Age-pair specific force of infection estimates assuming the same proportion (A) or number (B) of infecteds in each age group. (C) Average force of infection across age groups, weighted by census-derived population size of each age group, over each study wave.

Table S7. Estimated fixed effects of the regression models shown in Figure 4 of the main text, excluding spline terms.

Table S8. Variance within the random effects of the regression models. This table contains the variance associated with the random effect terms of the models presented in Figure 4 of the main text (section A, using all observations), and those from additional regression models which restricted the observations to those from participants reporting that their contact day was a 'typical' day (section B) or where the outcome was the number of contacts reported in a specific setting (section C).

Figure S6. Modelled contact rates for days reported by participants as being 'typical'. Here we show the percentage contribution to contact rate (number of contacts, duration of contact, number of locations) by the various covariates included in each model, relative to the contact rate predicted for a 50-year-old male from a household of size 1, on a Monday, with one contact location and during study wave 1. Models were fitted to data restricted to observations where participants reported their reporting day to be 'typical' and to participants for whom there were at least two observations. Outcome variables were number of contacts (A and B), the total duration of contact events (C and D), and the number of locations in which contact occurred (E and F).

Figure S7. Modelled number of contacts made in (A,B) home, (C,D) work or school, and (E,F) other social settings. Here, we show the percentage contribution to the number of contacts by covariates included in each model, relative to the contact rate predicted for a 50-year-old male from a household of size 1, on a Monday, with one contact location and during study wave 1. Models were fitted to data restricted to participants for whom there were at least two observations. Note that for the model of the number of contacts at home (A,B), we excluded number of locations as an explanatory variable and instead included household size as a variable.

Figure S9. Regression models exploring the relationship between the proportion of contacts involving touch and the average duration per contact and the total number of contacts reported. [...] average duration per contact as response variables. Models adjusted for age (spline), sex, day of the week, household size, and study wave. Raw observations are shown as points in both plots, though they are jittered in A for clarity.

Appendix G. Individual-level variation in contact rates.
Figure S10. Intra-participant variation in contact rates. Proportion of individuals (with two or more observations) who remain within a single contact rate quantile category across all waves, against the number of quantiles used, for (A) number of contacts and (B) contact duration. Bootstrap estimates for both observed data (red) and null model synthetic data (grey) are shown. Null 'synthetic' data were generated from our observed data by resampling the individual-level contact metrics for each study wave without replacement from the observations, essentially breaking the within-individual dependencies of our observed contact rates while preserving the distribution of rates within each wave. Here, we assign each participant's wave-specific contact rate to a quantile category. Category breaks were defined by finding the required number of quantiles from all observed contact rates for individuals participating for the first time. We excluded individuals for whom there was only a single (wave) observation. Lines represent bootstrapped 95% confidence intervals, which were generated through 500 resamples.

Figure S14. Modelled number of contacts reported as (A,B) individuals and (C,D) groups. Here, we show the percentage contribution to the number of contacts by covariates included in each model, relative to the contact rate predicted for a 50-year-old male from a household of size 1, on a Monday, with one contact location and during study wave 1. Regression analysis was performed as for the main text, apart from the new outcome variables.
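The quantile-persistence measure and null model described for Figure S10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the data layout (`{participant: {wave: rate}}`), the function names and the example values are hypothetical. Quantile breaks come from Python's `statistics.quantiles`, whose default interpolation may differ from the method used in the study, and the bootstrap confidence intervals are omitted.

```python
import bisect
import random
import statistics


def first_observation_rates(rates):
    """Contact rate from each participant's earliest wave."""
    return [obs[min(obs)] for obs in rates.values()]


def proportion_stable(rates, breaks):
    """Share of participants with 2+ observations whose rates all fall in one category."""
    repeat = [obs for obs in rates.values() if len(obs) >= 2]
    stable = sum(
        1
        for obs in repeat
        if len({bisect.bisect_right(breaks, r) for r in obs.values()}) == 1
    )
    return stable / len(repeat)


def shuffle_within_waves(rates, rng):
    """Null model: permute rates among the participants observed in each wave."""
    shuffled = {pid: {} for pid in rates}
    waves = sorted({w for obs in rates.values() for w in obs})
    for wave in waves:
        pids = [pid for pid in rates if wave in rates[pid]]
        values = [rates[pid][wave] for pid in pids]
        rng.shuffle(values)  # resampling without replacement is a permutation
        for pid, value in zip(pids, values):
            shuffled[pid][wave] = value
    return shuffled


# Made-up data: participant -> {wave number: number of contacts}.
rates = {
    "a": {1: 2, 2: 2},
    "b": {1: 10, 2: 10},
    "c": {1: 20, 2: 21},
    "d": {1: 5},          # single observation, excluded from the proportion
}
breaks = statistics.quantiles(first_observation_rates(rates), n=2)
observed = proportion_stable(rates, breaks)
null_rates = shuffle_within_waves(rates, random.Random(0))
null_value = proportion_stable(null_rates, breaks)
```

Comparing `observed` with the distribution of `null_value` over many shuffles indicates how much more often individuals stay in one category than would be expected if rates were unrelated across waves.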