Journal of The Royal Society Interface
Published:https://doi.org/10.1098/rsif.2017.0467

    Abstract

    Population growth is often ignored when quantifying gene expression levels across clonal cell populations. We develop a framework for obtaining the molecule number distributions in an exponentially growing cell population taking into account its age structure. In the presence of generation time variability, the average acquired across a population snapshot does not obey the average of a dividing cell over time, apparently contradicting ergodicity between single cells and the population. Instead, we show that the variation observed across snapshots with known cell age is captured by cell histories, a single-cell measure obtained from tracking an arbitrary cell of the population back to the ancestor from which it originated. The correspondence between cells of known age in a population with their histories represents an ergodic principle that provides a new interpretation of population snapshot data. We illustrate the principle using analytical solutions of stochastic gene expression models in cell populations with arbitrary generation time distributions. We further elucidate that the principle breaks down for biochemical reactions that are under selection, such as the expression of genes conveying antibiotic resistance, which gives rise to an experimental criterion with which to probe selection on gene expression fluctuations.

    1. Introduction

    Exploring the consequences of cell-to-cell variability is crucial to understand the functioning of endogenous and synthetic circuitry in living cells [14]. In clonal cell populations, differences between cells arise from several dynamical effects. An important source of heterogeneity is the intrinsic stochasticity in biochemical reactions [57]. Equally important, but often overlooked, is the substantial variability of individual cell division timings. In fact, two sister cells do not divide at the same time [810], which leads to more heterogeneous cell ages in clonal populations. Although both of these factors have been studied independently, their contributions cannot easily be separated when cells are growing and dividing [11,12]. Approaches that allow to investigate the interplay of these effects remain widely unexplored.

    Stochasticity in the levels of molecules per cell is commonly modelled using the stochastic simulation algorithm [5,13]. While this approach fares well for non-growing cells, it does not take into account the fact that molecule levels per cell need to double over the cell division cycle, at least on average. A number of studies therefore considered lineages that track the biochemical dynamics inside a single dividing cell over many generations [1419]. Such approaches are well equipped to model the statistics observed in mother machines [20,21], for example, or using cell tracking [22] that allow monitoring intracellular biochemistry in isolated cells over time. Modelling approaches that also account for the substantial variations observed in cell division timing are only currently being developed [23,24]. Generation times of single Escherichia coli [10] and budding yeast cells [25], for example, vary up to 40% and 30% from their respective means, and similar values have been observed in mammalian cells [26].

    On the other hand, population snapshots are commonly used to quantify heterogeneity across clonal cell populations. Such data are obtained from flow cytometry [27] or smFISH [28], for instance. An important source of heterogeneity in these datasets stems from the unknown cell-cycle positions [29]. Sorting cells by physiological features—such as using cell-cycle markers, DNA content or cell size as a proxy for cell-cycle stage—are used to reduce this uncertainty [27,30,31]. It has also been suggested that simultaneous measurements of cell age, i.e. the time interval since the last division, could allow monitoring the progression of cells through the cell cycle from fixed images [3033]. Presently, however, there exists no theoretical framework that addresses both cell-cycle variability and biochemical fluctuations measured across a growing cell population, and thus we lack the principles that allow us to establish such a correspondence.

    In applications, it is often assumed that the statistics observed over successive cell divisions of a single cell equals the average over a population with marked cell-cycle stages at a single point in time [34]. In statistical physics, such an assumption is referred to as an ergodic hypothesis, which once it is verified leads to an ergodic principle. Such principles certainly fare well for non-dividing cell populations, but it is less clear whether they also apply to growing populations, in particular, in the presence of fluctuating division times of single cells. While this relationship can be tested experimentally [35,36], we demonstrate that it is also amenable to theoretical investigation.

    In this article, we develop a framework to analyse the distribution of stochastic biochemical reactions across a growing cell population. We first note that the molecule distribution across a population snapshot sorted by cell ages disagrees with the statistics of single cells observed in isolation, similarly to what has been described for the statistics of cell-cycle durations [8,37,38]. We go on to show that a cell history, a single cell measure obtained from tree data describing typical lineages in a population [3943], agrees exactly with age-sorted snapshots of molecule numbers. The correspondence between histories and population snapshots thus reveals an ergodic principle relating the cell-cycle progression of single cells to the population. The principle gives important biological insights because it provides a new interpretation to population snapshot data.

    In the results, we investigate the differences of the statistics of isolated cell lineages and population snapshots. Section 2.1 develops a novel approach to model the stochastic biochemical dynamics in a growing cell population. We derive the governing equations for an age-sorted population and formulate the ergodic principle. In §2.2, we demonstrate this principle using explicit analytical solutions for stochastic gene expression in forward lineages and populations of growing and dividing cells. Our results are compared with stochastic simulations directly sampling the histories of cells in the population. Finally, in §2.3, we elucidate using experimental fluorescence data of an antibiotic-resistance gene that testing the principle allows us to discriminate whether a biochemical process is under selection.

    2. Results

    Several statistical measures can be used to quantify the levels of gene expression in single cells and populations. Distributions obtained across a cell population, such as those taken from static images, represent the final state of a growing population (figure 1a, shaded green cells). Cells in isolation, by contrast, can be modelled as random paths in the lineage tree, denoted as forward lineages (figure 1a, black line), that follow either of the two daughter cells with equal probability. Stochastic simulations show that the distributions of these two statistics disagree (figure 1b), and thus, apparently, the population dynamics violates the ergodic hypothesis. In the following analysis, we provide a novel ergodic principle relating how single cells progress through the cell cycle to the distribution of a growing cell population. To formulate the principle, we introduce histories (figure 1a, red line), a single-cell statistic that has the same distribution as a population that is sorted by cell ages (figure 1c,d). Such histories are obtained by choosing an arbitrary cell in the population and tracing its evolution back to the ancestor from which the whole population originated.

    Figure 1.

    Figure 1. Ergodic principle between single cells and the population. (a) Lineage tree of a clonal population leading to heterogeneous distribution of molecule numbers per cell (green colour). Two types of lineage statistics characterize the tree: (i) forward lineages (black line) originate from a common ancestor, end at an arbitrary cell in the population, and (ii) histories start from an arbitrarily chosen cell in the population and end at a common ancestor (red line). The conceptual difference between these measures are the probabilities with which these lineages are selected. (b) Statistics of simulated forward lineages do not agree with snapshot distributions (see box 1 and caption of figure 5 for simulation details). (c) Statistics observed across the population and the statistics after sorting cells by age, which allows inferring the cell-cycle progression. (d) The ergodic principle states that the statistics of age-sorted cells equals the distribution observed along cell histories but not along forward lineages.

    2.1. Biochemical dynamics in cell populations

    We model stochastic biochemical reactions in a population of growing and dividing cells. We assume that each cell contains a pool of interacting biochemical species X1, X2, …, XNS reacting via a network of R intracellular reactions of the form

    Display Formula
    where r = 1, …, R and ν±j,r are the stoichiometric coefficients. To model the effect of cell divisions, we associate to each cell an age τ measuring the time interval from cell birth. If cells divide with an age-dependent rate γ(τ), which is independent of the number of molecules x in the network under consideration, the division times τd of each cell are distributed according to
    Display Formula
    2.1
    Equivalently, for a given division-time distribution one obtains the corresponding rate function via
    Display Formula

    While the forward-lineage approach is summarized in appendix A, we here focus on the population dynamics. To this end, we associate with the state of the population the density of cells n(τ, x, t) with molecule count x and age between τ and τ + dτ at time t. This quantity represents the outcome of repeated snapshots of the population growth process, which averages out variations in the total number of cells (see appendix B for a detailed derivation). The mean number of cells in the population is then given by

    Display Formula
    2.2
    Since the probability for a cell to divide at age τ is given by γ(τ) dτ, the rate of change in the cell density obeys
    Display Formula
    2.3a
    where Inline Formula is the transition matrix for the biochemical reactions
    Display Formula
    2.3b
    wr is the propensity and (νr)i = ν+irνir is the stochiometric vector of the r-th reaction. Equation (2.3a) must be supplemented by a boundary condition accounting for cell divisions. Because the number of newborn cells equals twice the number of dividing cells, the condition reads
    Display Formula
    2.3c
    The division kernel B(x | x′) in equation (2.3c) models the inheritance of x daughter molecules from x′ mother molecules and is given by
    Display Formula
    2.4
    where B1(x | x′) and B1(x′ − x | x′) are the marginal distributions of inherited molecules for the two daughter cells (see appendix B for details). Note that the total molecule numbers are conserved during cell division. For example, when molecules are partitioned with equal probability between daughter cells, B1 and B are binomial [16,44].

    A simple algorithm that enables simulating the biochemical dynamics in the population exactly, which we will refer to as the First Division Algorithm, is given in box 1. Step 2 simulates the transitions due to biochemical reactions, equation (2.3b), while step 3 implements the boundary condition (2.3c) for cell divisions. The density of cells with molecule count x and age τ obtained from several snapshots then obeys equations (2.3).

    Box 1: First Division Algorithm to simulate a cell population up to time tf.

    • 1. Initialization. At time t = 0, initialize the cell population by assigning to each cell an age τi, a division time τd,i and molecule count xi.

    • 2. Biochemical reactions. Determine the next dividing cell from j = argmini (τd,iτi) and Δt = mini (τd,iτi). Advance the molecule numbers of each cell independently from age τi to τi + Δt using the Gillespie algorithm and advance time from t to t + Δt.

    • 3. Cell division. Replace the dividing cell by two newborn daughter cells of zero age. Assign to one of these a molecule number distributed according to B1(x | xj), depending on the mother's molecule count xj, and assign the remaining molecules to the other daughter. Assign to each daughter independently a division time distributed according to φ(τd).

    • 4. Repeat. Repeat from 2 until t = tf.

    While stochastic simulations are relatively easy to carry out, equations (2.3) are generally difficult to solve because they represent an integro-partial differential equation. To allow for analytical progress, we consider the long-term behaviour in which the population grows exponentially with rate λ,

    Display Formula
    2.5
    and identify Π(τ, x) as the joint distribution of cell ages and intracellular molecule counts in the population. We treat this distribution synonymously with the population snapshot. Using ansatz (2.5) in equation (2.3), we obtain
    Display Formula
    2.6
    Similarly, the corresponding boundary condition is given by equation (2.3c) but with replacing n(τ, x) by Π(τ, x).

    2.1.1. Age distribution in a population

    The age distribution measures the fraction of cells in the population that reach a given age. It is obtained from marginalizing the population distribution over the molecule numbers

    Display Formula
    2.7
    Summing equation (2.6) and solving (see appendix C for details), we find that the age distribution obeys
    Display Formula
    2.8
    which recovers the results in [8,9,45]. The integral in the above equation denotes the probability that a cell has not divided before reaching age τ.

    The population growth rate λ needs to be determined from the boundary condition (appendix C), which leads to the characteristic equation

    Display Formula
    2.9
    also known as the Euler–Lotka equation [8,9,45]. The largest root of this equation yields the population growth rate λ. For a discussion of the relation between the population growth rate and the mean division time, see [10,45].

    2.1.2. Molecule-number distribution in an age-sorted population

    Next, we consider the probability of observing x molecules in a cell of age τ. This conditional probability is given by the number of cells with age τ and molecule count x divided by the number of cells at that age

    Display Formula
    2.10
    It can be verified by plugging equation (2.10) into (2.6) that the distribution obeys the master equation
    Display Formula
    2.11a
    Similarly, inserting equation (2.10) with (2.5) into boundary condition (2.3c), we deduce that the distribution of inherited molecules obeys
    Display Formula
    2.11b
    We identify the density ρ with the ancestral division-time distribution [8,40]
    Display Formula
    2.11c
    describing the past division-times in a growing cell population. Taken together, equations (2.11) describe the distribution in an age-sorted population. Combining these equations with the age distribution given in the previous section they contain the full statistical information of the population distribution. An extension of this result to heritable division times is provided in appendix C.

    Since histories contain only ancestral cells, their division times are distributed according to the ancestral division-time distribution ρ, as has been pointed by Wakamoto et al. [40]. In figure 2a, we compare the division-time distributions in forward lineages and histories for variable division times. We observe that the distribution tails of φ are suppressed in cell histories due to exponential dependence on the growth rate λ. Specifically, we highlight the fact that cells with division times shorter than ln 2/λ are over-represented in histories compared with forward lineages, while cells with longer division times are over-represented.

    Figure 2.

    Figure 2. Generation time variability in forward lineages and histories. (a) Division times in forward lineages (dashed lines) following a gamma distribution with unit mean and coefficients of variation of 10% (red, scaled by factor Inline Formula), 50% (blue) and 100% (yellow dashed line). The corresponding division time distributions in cell histories are shown by solid lines. (b) The corresponding age distributions in forward lineages (dashed), histories (solid lines) and the population (light dashed). Note that the age-distribution in a history is the same as the population for the exponential distribution (CV 100%, yellow line), and the distributions in histories and forward lineages are very similar for small cell-cycle variability (CV 10%, red lines).

    2.1.3. Ergodic principle for cell populations

    We now explain how these quantities can be computed from population data. To this end, we use the approach presented by Nozoe et al. [42] enumerating all lineages of a given population tree. We denote the molecule numbers in lineage j by (x1,j(τ1,j), …, xD,j(τd, j)) with Dj completed cell divisions, where xi,j(τ) is the time-course from birth to division of molecule numbers of the ith cell in that lineage. We then evaluate the average of a function f(x) over a lineage of cells at a given age τ,

    Display Formula
    2.12
    where Inline Formula is the number of cells in the lineage that reach age τ before dividing.

    Because forward lineages track either one of the two daughter cells with equal probability, the probability of choosing lineage j from the tree is 2Dj, which decreases with the number of cell divisions [42]. Because the division times in forward lineages follow the distribution φ, we have

    Display Formula
    2.13
    The distribution under the integral is the lineage-probability described in appendix A.

    Next, we consider the distribution along histories that track an arbitrary cell from the population and trace back its evolution. The probability of choosing such a history is the same for every lineage and therefore equals the inverse number of cells N(t) in the population. The statistics of histories can thus be interpreted as a typical lineage. We note that equations (2.11) can be understood as describing the distribution along a lineage, similar to the forward-lineage approach develop in appendix A (cf. equations (A 1)), but by replacing the division-time distribution φ with the one of the ancestral population ρ. Since histories are composed entirely of ancestral cells, their division times follow the ancestral division-time distribution ρ. It hence follows that histories posses the same statistics as in the age-sorted population, that is

    Display Formula
    2.14
    The distribution Π(x | τ) is the lineage distribution given by the solution of equations (2.11) and equals the distribution in an age-sorted population, as we have shown in §2.1.2. We thus formulate the ergodic principle as the average of the histories of single cells in a population obtained over many cell divisions equals the average over an age-sorted cell population at every time point. A more intuitive way of stating the result is that the endpoints of all lineages have the same distribution Π(x | τ) as cells of τ in the population. Note that equations (2.13) and (2.14) are equal only for deterministic division times. In fact, most lineages Inline Formula are close to the history-average given by the right-hand side of equation (2.14), as we show in the following section through the use of examples (§2.2).

    Before proceeding, we investigate the age distribution that yields the fraction of cells reaching age τ in a history. Because division times in histories are distributed according to ρ and the probability that a cell has not divided before age τ is Inline Formula, we have

    Display Formula
    2.15
    where 〈τdρ is the mean of the distribution ρ, which arises as a normalizing constant (see appendix D). The result is notably different from the age-structure in a population Π(τ), compare equation (2.8). For example, the corresponding age distributions for gamma-distributed division times are compared in figure 2b.

    It thus follows that the average over a history of molecule numbers is not generally the same as the population average irrespective of age. The only exception to the rule are memoryless division times, i.e. constant division rates resulting in exponentially distributed division timings. In this case, equation (2.8) and (2.15) agree, but interestingly, the population distribution differs from the forward-lineage distribution.

    2.2. Analytical solutions for forward lineages, histories and snapshot distributions

    To obtain analytical solutions we make use of the generating function approach. For this purpose, we restrict ourselves to binomial partitioning of molecules

    Display Formula
    2.16
    The boundary condition (2.11b) for the probability generating function Inline Formula then simplifies to
    Display Formula
    2.17
    In the following, we demonstrate how explicit solutions for the lineage, histories and snapshot distributions can be obtained for simple examples of stochastic gene expression but arbitrary division-time distributions.

    2.2.1. Molecule synthesis and degradation

    As a first application, we consider a birth–death process. Molecules are synthesized at a constant rate k0 and are degraded via a first-order reaction with rate k1,

    Display Formula
    2.18
    To test whether differences between forward lineages and histories could be significant in a single population, we used the First Division Algorithm (box 1) to simulate a population tree corresponding to figure 1a. We then select histories by choosing cells at random from the population and tracing their evolution back through the population tree. Similarly, we choose forward lineages by following the first cell in the population over many cell cycles selecting either daughter cell with equal probability. A qualitative comparison of few representative time-courses shows that forward lineages exhibit fewer cell divisions but higher molecule counts than histories (figure 3a), consistent with the predicted age-distributions, which contain more old cells in forward lineages than in histories (figure 2b).
    Figure 3.

    Figure 3. Histories represent typical lineages of finite population trees. (a) Representative forward lineages (blue) and histories (red lines) from a single population for the birth–death process (2.18) with rates k0 = 5 and k1 = 0.1 and division times are exponentially distributed with unit rate. Simulated time-courses show molecule synthesis by biochemical reactions (step 1 in box 1) interrupted by sudden decreases in molecule number due to binomial partitioning at cell division (step 2 in box 1). (b) Smoothed frequency distributions of the mean number of inherited molecules computed for each lineage of 10 population trees with N = 100 (teal), 1000 (blue) and 10 000 cells (magenta) at the final time and rates k0 = 50 and k1 = 1. The distribution mode is centred about the mean of histories (solid grey line) but not about the means of forward lineages (dashed grey line). (c) The corresponding second moment E[x2 | 0] is shown. The result indicates that the statistics of most lineages is close to the one of histories.

    To quantify whether histories represent typical lineages in a finite population, we simulate 40 lineage trees up to a population size of N = 100, 1000 and 10 000. We then calculate mean and second moments for each individual lineage in the tree and evaluate their distributions across all lineages (figure 3b,c). As the population size increases the distribution of first and second moments becomes centred about the moments of the history distribution which are smaller than the corresponding ones in forward lineages. In particular, the moments of histories correspond to the distribution modes verifying that histories are typical lineages, even for finite populations.

    Next, we investigate how to analytically characterize the distributions of both processes. The generating function of the corresponding master equation obeys the evolution equation

    Display Formula
    2.19
    Because molecules are produced and degraded independently, their number x per cell can be separated into a sum of two independent contributions: (i) the amount of molecules newly produced and degraded up to age τ, and (ii) the amount of molecules inherited after cell division. Clearly, the first contribution is Poissonian and its mean is given by μ(τ) = (k0/k1)(1 − ek1τ). The second contribution needs to be determined from the boundary condition as we show in the following.

    The solution to equation (2.19) can be written as a product of generating functions of the aforementioned contributions

    Display Formula
    2.20
    where the first factor is the generating function of newly produced and degraded molecules until age τ. Using this solution in equation (2.17) we find the condition
    Display Formula
    2.21
    If cell divisions occur at deterministic time intervals, the integral is straightforward. In this case, the distribution of inherited molecules p0 is Poissonian with mean μp(τ) = ((1 − Inline Formula)/(2 − Inline Formula))(k0/k1) Inline Formula and depends on the division-time τd. This result has been obtained earlier by Schwabe et al. [29], and as a particular case by Johnston et al. [19].

    The solution turns out to be more involved in the presence of division-time variability. Certainly, equation (2.20) could be solved for particular choices of ρ(τd). Since, however, the division-time variability is difficult to characterize, in general, we aim to solve this equation for arbitrary distributions. To this end, we expand

    Display Formula
    2.22
    and write an equation for the series coefficients an. Plugging equation (2.22) into (2.20) and equating equal powers of z, we find
    Display Formula
    2.23
    where Inline Formula is the Laplace transform of the division-time distribution in equation (2.11c). From the above formula all coefficients can be computed recursively using a0 = 1. It follows using G0((z − 1) ek1τ) in equation (2.22) and differentiating at z = 1 that the probability of observing x molecules inherited after cell division is
    Display Formula
    2.24
    Note that the distribution depends on the cell age τ because the inherited molecules are degraded over the cell cycle. Because the number of molecules produced up to age τ is Poissonian, the total amount of molecules for a cell of age τ obeys
    Display Formula
    2.25
    The distributions of forward lineages are obtained by substituting φ for ρ in equation (2.23) (appendix A).

    In figure 4a, we show the resulting distributions of inherited molecules (τ = 0) for different levels of cell-cycle variability (figure 4a,i) for the cases of an age-sorted population (solid lines) and for the lineage statistics (dashed lines). Division times are assumed to be gamma-distributed, which fits cell-cycle variability observed in many bacteria [8,10]. The molecule number distributions are obtained using either ρ or φ in equation (2.23) and truncating equation (2.24) after the first 150 terms. In agreement with the theoretical predictions, lineage and histories distributions are similar for small division time variability (red lines) but become significantly different as variability increases (yellow lines). Also shown are the corresponding distribution for varying degradation rates (figure 4a,ii). We observe that forward lineages and history statistics are comparable for unstable molecules (fast degradation) but not for stable ones (slow degradation). This finding is in line with the intuition that rapid degradation averages out timing fluctuations.

    Figure 4.

    Figure 4. Analytical solutions for a birth–death process. (a)(i) Analytical distributions of inherited molecules in a population and in forward lineages are depicted for gamma-distributed division times with 10% (red), 50% (blue) and 100% (yellow line) variations about the mean (as shown in figure 2). Simulated distributions of inherited molecules in histories are shown as shaded areas. Lineages and histories agree only for small division times variability. Parameters are k0 = 50 and k1 = 1. (a)(ii) Analytical solutions for different degradation rates, k1 = 0.1 (teal), k1 = 1 (blue) and k1 = 10 (magenta). The distribution of inherited molecules in a population (τ = 0, solid lines) coincides with the simulated histories (shaded ares), which verifies the ergodic principle, but not with the lineage distribution (dashed lines). Histories and forward lineages agree well only for large degradation rates. Parameters are k0 = 50k1 and exponentially distributed division times with unit mean (CV[τ] = 1). (b) The corresponding distributions are shown for the mid-cell cycle τ = 0.5. (c) Simulated distributions estimated irrespectively of cell age across the population (shaded areas) agree well with analytical solutions (solid lines). Also shown are the theoretical distributions across forward lineages (dashed lines).

    To support the numerical results, we evaluate the mean number of molecules at birth Inline Formula Inline Formula. For fast degradation, the expression reduces to Inline Formula in both forward lineages and histories because it is independent of the division times. For slow degradation, however, we obtain E[x | 0] ∼ k0Eρ(τd) in histories and Efw[x | 0] ∼ k0Eφ(τd) in forward lineages. Since Eρ(τd) ≤ Eφ(τd) with equality for deterministic division times [10], we conclude that cells in histories and the population contain fewer molecules than in forward lineages. One can further show that the difference between population and forward lineages E[x | 0] − Efw[x | 0] = − k0ln 2 CV2φ[τ] + O(CV4φ) increases with the coefficient of variation of timing fluctuations.

    In figure 4b, we illustrate the corresponding distributions for τ = 0.5 corresponding to half of the mean generation time. We observe that both cell-cycle variation (figure 4b,i) and degradation (figure 4b,ii) mitigate the discrepancies between lineage and history statistics. Intuitively, this could also be concluded from the fact that equation (2.19), and similarly the distribution of molecules equation (2.11a), approaches a steady state independent of the number of inherited molecules for large τ. For all cases, we verified the ergodic principle by stochastic simulations using the First Division Algorithm (box 1) of 40 population trees from which we chose histories at random. The corresponding distributions are shown as shaded areas in figure 4a,b, which are in excellent agreement with our analytical solutions (solid lines).

    To investigate the effect of the overall distribution in the population irrespectively of cell age, we numerically integrate the analytical solution (2.25) over the age-distribution Π(τ) of the population given by equation (2.8), Inline Formula; the result is shown in figure 4c. The corresponding age distribution in a forward lineage is given in appendix D. We find that these predictions (solid lines) are in excellent agreement with the simulated distributions across populations (shaded areas) but not with the distributions across forward lineages (dashed lines).

    2.2.2. Expression of a stable protein

    Characteristic mRNA half-lives in E. coli are of the order of 5 min, while doubling times range from 20 min to several hours. Given the findings of the previous section, this suggests that generation time variability has little effect on mRNA distributions. Protein half-lives in living cells, however, occur on timescales much longer than the doubling time, meaning that many proteins are essentially stable and diluted mainly through cell divisions.

    To investigate protein variability in growing cell populations, we model the expression of such a protein including mRNA transcription via the reactions

    Display Formula
    2.26
    Using the First Division Algorithm (box 1), we simulate a population of dividing cells and collect representative forward lineages and histories (figure 5a). Given that empirical distributions of rescaled log-division times in E. coli are invariant and bell-shaped across conditions [46], we assume lognormal distributed division times. We observe that protein time-courses corresponding to forward lineages have higher protein levels and are noisier than histories.
    Figure 5.

    Figure 5. Lineage and history characterization of the expression of a stable protein. (a) Representative time courses of forward lineages (blue) and histories (red) of a stable protein collected from a single population tree. (b) Simulated distributions of inherited molecules in forward lineages (blue) and histories (red) are shown as shaded areas. The analytical solution for a population (solid red line) agrees with the simulated histories but not with the lineage verifying the ergodic principle. For comparison, we also show the analytical solution of a lineage (solid blue line). (c) Distributions obtained irrespectively of cell age from forward lineages (shaded blue) and across the population (shaded blue), both of which are in excellent agreement with the simulations. In all panels, the distribution of division time is lognormal with unit mean and s.d. 3 resulting in a growth rate of approximately 1.25. Parameters for the simulation of the full gene expression model (2.26) are k0 = 20, k1 = 20 and k2 = 100.

    Deriving analytical expressions for the distribution of proteins is more involved because the amount of produced protein generally depends on the amount of inherited mRNA. We use the simplifying assumption that the mRNA half-life is much shorter than the cell-cycle time, for which the reactions (2.26) reduce to a single reaction [47,48]

    Display Formula
    2.27
    with proteins being produced in stochastic bursts of size m with distribution νm. For the reactions (2.26), the burst size distribution νm is geometric with mean b = k2/k1 corresponding to the mean number of proteins produced per mRNA lifetime [48].

    The rate of change for the generating function of this process satisfies

    Display Formula
    2.28
    where Inline Formula. It can be verified using equation (2.28) and the boundary condition (2.17) that the exact generating function solution is
    Display Formula
    2.29
    where Inline Formula and Inline Formula is the Laplace transform of ρ. Marginalizing the above equation over the age distribution, we obtain the generating function for the protein number distribution in the population
    Display Formula
    2.30
    An explicit expression for the Laplace transform of the age-distribution Inline Formula is given in appendix D, equation (D 3).

    The distributions corresponding to the generating function, equation (2.29), are obtained numerically via the inverse transform

    Display Formula
    2.31
    Note that because the moment-generating function does not exist for the lognormal distribution, we computed the Laplace transform of the division-time distribution directly. In practise, it was sufficient to truncate the sum in K(z) after the first 10 terms. In figure 5b, we show the resulting distribution of inherited protein in a population (τ = 0, red line). We observe excellent agreement between the theoretical solution and the history statistics obtained from stochastic simulations of 40 population trees (shaded red area), which verifies the ergodic principle. Instead, the distribution obtained across forward cell lineages shows a broader distribution with significant distribution tails (blue line) due to larger division time variability, in agreement with simulations (shaded blue area). Similar to the previous example, cells in histories and the population inherit fewer molecules than in forward lineages. We also obtain good agreement with the snapshot distributions measured across the cell population shown in figure 5c (red line: theory, red shaded area: simulation) by instead using equation (2.30) in the transformation (2.31).

    2.3. Breakdown of the ergodic principle under selection

    So far, we considered traits that do not affect the cell division rate. To test the validity of the principle in the opposite case, we simulate the production of a biomolecule whose levels increase division rate. We find that the distributions in histories do not coincide with the age-sorted population (figure 6a). The history distribution exhibits significantly larger tails resulting in a higher mean number of molecules (21.6 in histories versus 12.4 in population). This breakdown suggests that one could in principle discriminate the factors that affect cell fitness.

    Figure 6.

    Figure 6. The ergodic principle breaks down in the presence of selection. (a) Simulated distributions of inherited molecules for a biomolecule that increases division rate. It is seen that histories and age-sorted distributions are different and thus the ergodic principle is violated. The ratio of these probabilities yields the reproductive value, ν(x | 0), which increases with the number of molecules in the presence of selection (inset). In the simulation molecules are produced with rate k0 and divisions occur after a period τd = 50K10/(K10 + x100), where k0 = 100, K = 10 and x0 is the molecule count of that cell at birth. (b) Statistics of tree data of an antibiotic-resistance gene in the absence of antibiotic taken from Nozoe et al. [42]. The history distribution agrees very well with the population distribution. The ergodic principle is obeyed and the reproductive value of cells is independent of the gene expression (inset). (c) By contrast, in the presence of the drug, the distributions of histories and the age-sorted population disagree. The increase of the reproductive value with fluorescence indicates selection on high protein levels (inset).

    To investigate this effect, we consider Fisher's reproductive value [49], denoted by ν, which counts the relative number of future offspring for a cell in a given phenotypic state. Interestingly, it can be defined as the ratio of the distribution of histories and the age-sorted population statistics [50]. For a given trait x observed in cells of age τ, we here use their conditional distributions that provide the following factorization

    Display Formula
    2.32
    The first factor is the reproductive value due expression of x, involving the distributions across the age-sorted population Π(x | τ) and across histories ρ(x | τ). The second factor, ν(τ) = Πh(τ)/Π(τ), denotes the contribution from other factors affecting the division rate and is given by the ratio of the age distribution of histories Πh(τ) and the age distribution of the population Π(τ), which are generally different [10,40]. According to the ergodic principle the first contribution equals 1, when x does not affect the division rate, and thus the reproductive value is solely determined by ‘other’ factors that are not explicitly known and modelled via a stochastic interdivision time. When x increases division rate, higher levels of x can be selected upon and the reproductive value ν(x | τ) increases with x (figure 6a, inset). Thus, when the molecule levels are selected upon, ancestral cells contain more molecules on average than cells of the present population (figure 6a).

    We investigate this dependence using recently acquired tree data of the antibiotic-resistance gene smR conveying resistance to the antibiotic streptomycin [42]. The protein is fused to a fluorescent reporter allowing direct observation. We obtain age-sorted distributions using measured total fluorescence after cell division (τ = 0), while lineage-weighted distributions are computed using equations (2.13) and (2.14). In the absence of antibiotic treatment, the distributions of new-born cells in the age-sorted, histories and forward lineages are essentially the same (figure 6b). In agreement with the ergodic principle, the reproductive value due to the expression of the protein is constant (figure 6b, inset). In the presence of sub-inhibitory antibiotic doses, the distribution in histories differ from the one acquired across the population (figure 6c) and, moreover, the reproductive value increases with fluorescence (inset), which indicates selection on higher expression levels. These results are in agreement with the known function of the gene and highlight that testing the proposed ergodic principle can be used to probe selection in cell populations.

    3. Discussion

    We presented a modelling approach for stochastic biochemical reactions in dividing cell populations. The theory enabled us to identify an ergodic principle between single cells and the population, which can be stated as the average of the histories of single cells in a population obtained over many cell divisions equals the average over an age-sorted cell population at any single time point. Cell histories capture the statistics of typical lineages in a population, which are obtained by choosing an arbitrary individual of a clonal population and tracking back its evolution to the ancestor from which the population originated.

    This principle provides an interpretation of population snapshot data because it identifies the sample paths that are associated with age-sorted snapshot data with typical cell histories. Histories thus allow us to characterize the progression of single cells in the population through the cell cycle. We emphasized that the statistical difference between forward lineages and histories arises from variable division times, cells dividing faster than the population growth rate being over-represented in histories as compared to forward lineages. These deviations are expected to be significant since generation times of single cells are highly variable in microbial cell populations.

    It is important to point out that the principle requires the cell age to be known. The exception to this rule is the case of exponential division times for which the time-average of histories equals the one across snapshots. In the general case, gating techniques as used in flow cytometry can be used to narrow the range of physiological differences and thus correspond to age-sorted distributions, which according to the principle agree with the histories of single cells. Mother machines, on the other hand, provide statistics of individual forward lineages assuming that cells divide symmetrically. Thus our findings suggest potential differences when studying the dynamics of gene expression in different experimental devices, even when cell-cycle positions are explicitly known.

    Different notions of ergodicity have been discussed in the literature. Rocco et al. [17], for instance, challenged the ergodic hypothesis between an ensemble of individual lineages and their time-averages. They found that the hypothesis only holds true for fast degrading molecules. We found a similar dependence for the ergodicity between snapshots of a growing population and individual histories, which is justified when studying the dynamics of short-lived molecules such as mRNAs. The majority of proteins in E. coli and yeast, however, have half-lives much longer than typical population doubling times [51,52], for which the present ergodic principle accounts for. We demonstrated this dependence using analytical solutions to lineage, history and population distributions of simple stochastic models of gene expression.

    Although we developed the approach for biochemical reactions, the general principle applies to heritable non-genetic traits, such as plasmid numbers or chromatin states. Specifically, it applies to any trait of interest that does not affect the cell division rate. A counterexample of such a trait is cell size to which the history analysis has recently been applied [43]. Here, we found the that the likelihood to find an ancestral cell with a positively selected trait value is higher than for cells of the present population. Thus testing the ergodic principle could be used to experimentally probe this effect on cell fitness for any trait of interest.

    Data accessibility

    This article has no additional data.

    Competing interests

    I declare I have no competing interests.

    Funding

    P.T. acknowledges a fellowship by The Royal Commission for the Exhibition of 1851 and support by the EPSRC Centre for Mathematics of Precision Healthcare (EP/N014529/1).

    Acknowledgements

    It is a pleasure to thank Diego Oyarzún and Vahid Shahrezaei for valuable feedback.

    Appendix A. Biochemical dynamics in forward lineages

    The lineage description tracks a single cell over many cell division cycles but retains information about only one daughter at each division. We consider the lineage-probability pD(x|τ) of observing x molecules in a cell of age τ after Dth division. In between divisions the distribution satisfies the chemical master equation

    Display Formula
    A 1a
    where the transition matrix is defined in equation (2.3b) of the main text. An important difference to the usual master equation is that it needs to be solved subject to both an initial condition p0(x | 0) and a boundary condition describing cell divisions. The latter is given by
    Display Formula
    A 1b
    where φ(τd) is the distribution of division times, B(x | x′) is the division kernel as defined in equation (2.4) of the main text, and the summation is over all possible molecular states x′. The stationary lineage-probability is obtained in the long-term limit p(x | τ) = limD → ∞pD(x | τ) after a large number of cell divisions, or equivalently, by letting pD(x | τ) = pD+1(x | τ) in equation (A 1). The boundary condition, equation (A 1b), couples the distribution of dividing to the one of newborns.

    Apendix B. Master equation description of snapshot probabilities and snapshot densities

    We here derive a stochastic description for the outcome of snapshots in a dividing cell population. The state of the system may be described by observations of molecule content and cellular age {τi, xi}i=N, where N is the present number of cells. Note that τi and xi but also N are random variables for a single snapshot [9]. When cells are indistinguishable, a snapshot is equivalently characterized by the empirical distribution function

    Display Formula
    B 1

    Generally, more than a single snapshot is available, for instance by collecting snapshots from different microcolonies. Aggregating snapshots averages out the dependence on the individual stochastic realizations of {τi, xi}i=N, and yields the snapshot density

    Display Formula
    B 2
    quantifying the frequency with which cells with molecule count x and age τ are observed in the population. In the following, we first derive a master equation description for the functional probabilities P[s, t] of observing a particular realization of the snapshot s(x, τ) at time t. Second, we perform the average over all possible snapshots to obtain the density n(x, τ). An alternative approach to the one presented here would extend the kinetic framework given in [53] to incorporate biochemical reactions.

    We consider two types of transitions: (i) cell division in which molecules are partitioned between the two daughter cells and (ii) biochemical reactions. The rate of cell division for a single cell of age τ′ is γ(τ′) and hence the propensity of cell divisions in the population is

    Display Formula
    B 3
    At division, the mother cell is replaced with two new cells of zero age. The snapshot, therefore, changes instantaneously by
    Display Formula
    B 4
    where τ′ and x′ are the respective age and molecule count of the mother cell, and x1 and x2 the molecule counts of the daughter cells.

    When a biochemical reaction with stoichiometry νr occurs, we replace the mother cell with molecule count x′ by a cell of the same age but x′ + νr molecules. The snapshot therefore changes as follows:

    Display Formula
    B 5

    For brevity of notation, we define the step-operator, which acts as Inline Formula on any functional f of the snapshot function s(x, τ). Combining the changes from divisions and R possible reactions, we obtain the change in the snapshot probability P[s, t], describing the probability of all possible snapshots, which obeys

    Display Formula
    where we assume that B(x1, x2 | x′) is the probability of partitioning x′ molecules into the proportions x1 and x2 of the daughter cells.

    The snapshot density n(x, τ, t) from repeated experiments is obtained as follows:

    Display Formula
    Thus multiplying equation (B 6) by s(x, τ), integrating over all possible snapshots, and using the definition of the total derivative, we obtain the master equation for the snapshot density
    Display Formula
    where Inline Formula and Inline Formula Inline Formula are the marginal distributions of the partition probability. If the molecule numbers are conserved during cell division, we must have B2(x | x′) = B1(x′ − x | x′).

    Integrating the above equation over a small interval Inline Formula containing the newborn cells and using n(x, ε) → n(x, 0), n(x, − ε) → 0 and definition (2.4) of the main text, we find that the second and third term on the right-hand side of the above equation give the boundary condition (2.3c) of the main text. The remaining terms describe the rate of change in between divisions and are equal to equation (2.3a) with (2.3b) of the main text.

    Appendix C. Extension to heritable interdivision times

    We here discuss an extension of the ergodic principle. We consider a general trait x and division times that are statistically dependent between mother daughter cells. Such correlations could arise, for instance, due to heritable factors such as cell size. To proceed we characterize the state of each cell by a cell age τ and a interdivision time τd. The latter is a random quantity that yields the age at which a cell will eventually divide. The distribution of cell ages at division must then satisfy

    Display Formula
    C 1
    for an appropriate choice of the division rate γ(s, τd).

    C.1. Forward lineages

    At stationarity, the division time distribution along a forward lineage can then be described by

    Display Formula
    C 2
    Similar to the case of independent division times, the lineage dynamics of the trait x is a function of cell age
    Display Formula
    C 3
    but does not depend on the future division time. The boundary condition describing cell divisions is
    Display Formula
    C 4
    Note that for generality we have included the case where the division kernel B(x | x′, τd) also depends on heritable factors effectively through its dependence on the age at division τd.

    C.2. Distribution of age and division times

    For the cell population, we describe the density of cells with age τ, future division time τd and the trait x, which follows the evolution equation

    Display Formula
    C 5a
    To account for cell divisions, we supplement the above equation by a boundary condition, which again replaces each mother cell by two daughter cells of zero age but with the trait x′ replaced by x after cell division. The latter can be expressed as
    Display Formula
    C 5b

    Summing equation (C 5a) over all possible trait values x and solving for Π(τ, τd), we find that the age-division-time distribution is

    Display Formula
    C 6
    where θ denotes the Heaviside function. The division times of newborn cells in the population follow Π(τd | 0), which is obtained from
    Display Formula
    C 7
    The constant Π(0) can be determined by integrating equations (C 5) over all possible values of τ, τd and x and combining the results to obtain
    Display Formula
    C 8

    Specifically, marginalizing equation (C 6) over the division times τd, the age distribution becomes

    Display Formula
    C 9
    Further, summing equation (C 5b) over all possible values of x and combing equations (C 1), (C 6) and (C 7) with the boundary condition (C 5b) gives an integral equation for the division time of newborn cells in the population
    Display Formula
    C 10
    Interestingly, when division time is inherited this distribution is different from the one in a forward lineage. The above equation has first been derived by Powell [8] to investigate the effect of mother–daughter correlations, albeit using a different approach. The population growth rate is implicitly defined through the normalizing condition
    Display Formula
    C 11
    In particular, we note that the distribution below the integral is the ancestral division time distribution
    Display Formula
    C 12
    This distribution counts all divisions that occurred in a stationary population up to a certain point in time. The distribution of ‘future’ division times is Π(τd)
    Display Formula
    C 13
    which satisfies the relation
    Display Formula
    C 14

    C.3. Distribution of a trait along histories

    Finally, we turn our attention to the distribution of molecule numbers for cells of a given age. To this end, it is important to note that the molecule numbers are non-anticipating in that they are statistically independent of division time τd but they only depend on cell age

    Display Formula
    C 15
    It is worth pointing out that this statement assumes the division times to be independent of the trait of interest. In effect, the distribution of molecules at a certain cell age evolves according to
    Display Formula
    C 16a
    The boundary condition can be derived by letting Π(τ, x, τd) = Π(x | τ)Π(τ, τd) in equation (C 5b), which gives
    Display Formula
    C 16b
    We observe that this is the same equation as obtained in a forward lineage but replacing the lineage distribution of division times with the ancestral distribution ρ(τd), equation (C 12). Because histories contain only ancestral cells, the above equation also determines the distribution in cell histories. In summary, we have shown that the distribution of a cellular trait along a history equals the distribution in a dividing cell population with inherited division times.

    Appendix D. Properties of the age distributions

    Consider an age distribution of the form

    Display Formula
    D 1
    which for λ = 0 coincides with the age distribution in forward lineages and for λ≠0 with the age distribution in a population. Its Laplace transform Inline Formula can be expressed as follows:
    Display Formula
    D 2
    where Inline Formula denotes the Laplace transform of the division-time distribution φ. For the population, it follows from Inline Formula and Inline Formula that Π(0) = 2λ. Hence, we obtain the Laplace transform
    Display Formula
    D 3
    For forward lineages, we use λ = 0 in equation (D 1) and the fact that
    Display Formula
    D 4
    equals the average division time. It then follows that Π(0) = 1/〈τ〉 and hence
    Display Formula
    D 5
    or, equivalently,
    Display Formula
    D 6

    Footnotes

    Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.