Only accessible information is useful: insights from gradient-mediated patterning

Information theory is gaining popularity as a tool to characterize the performance of biological systems. However, information is commonly quantified without reference to whether or how a system could extract and use it; as a result, information-theoretic quantities are easily misinterpreted. Here, we take the example of pattern-forming developmental systems, which are commonly structured as cascades of sequential gene expression steps. Such a multi-tiered structure appears to constitute sub-optimal use of the positional information provided by the input morphogen, because noise is added at each tier. However, one must distinguish between the total information in a morphogen and the information that can be usefully extracted and interpreted by downstream elements. We demonstrate that quantifying the information that is accessible to the system naturally explains the prevalence of multi-tiered network architectures as a consequence of the noise inherent to the control of gene expression. We support our argument with empirical observations from patterning along the major body axis of the fruit fly embryo. We use this example to highlight the limitations of the standard information-theoretic characterization of biological signalling, which are frequently de-emphasized, and illustrate how they can be resolved.

Mikhail Tikhonov, Shawn C. Little, Thomas Gregor

Information carried by a linear morphogen gradient
For a linear morphogen $\hat c$ spanning the range $[0, c_{\max}]$, with constant Gaussian noise $\sigma_0$, the information content is given by

$$ I(\hat c) = \ln \frac{c_{\max}}{\sqrt{2\pi e}\,\sigma_0}. $$

To show this, we apply the definition of the mutual information:

$$ I(\hat c) = H[P_c] - \left\langle H[P_{c|x}] \right\rangle_x . $$

Here $P_c$ is the probability distribution of $c$. In the small-noise approximation, $P_c$ is uniform between 0 and $c_{\max}$, just as $P_x$, the distribution of $x$, is uniform between 0 and 1. $P_{c|x}$ is the conditional distribution of the concentration $c$ given $x$, which is Gaussian of width $\sigma_0$. Angular brackets $\langle\cdot\rangle_x$ denote averaging over $x$, and $H[P]$ is the differential entropy of a probability distribution $P$:

$$ H[P] \equiv -\int P(z)\ln P(z)\,dz = -\langle \ln P \rangle_P . $$
Clearly, $H[P_c] = \ln c_{\max}$. The second term, for any $x$, is the differential entropy of a Gaussian distribution $P_{\sigma_0}$ of width $\sigma_0$,

$$ P_{\sigma_0}(z) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left(-\frac{z^2}{2\sigma_0^2}\right), $$

and therefore

$$ H[P_{\sigma_0}] = \frac{1}{2}\ln\left(2\pi e\,\sigma_0^2\right). $$

Putting this together, we find

$$ I(\hat c) = \ln c_{\max} - \frac{1}{2}\ln\left(2\pi e\,\sigma_0^2\right) = \ln\frac{c_{\max}}{\sqrt{2\pi e}\,\sigma_0}. $$

(To express this information in bits, one should use the base-2 logarithm instead of the natural logarithm.) This formula represents the information content of a morphogen profile seen as a continuous function. It is a good approximation of the positional information available to a discrete set of cells, provided their number $N$ is large enough that their discreteness can be neglected; this is assumed in the main text. If $N$ is too small, or if the noise magnitude $\sigma_0$ is too low, the information available to cells becomes limited by their number rather than by the noise of the morphogen. Indeed, $N$ cells can have no more than $\ln N$ nats ($\log_2 N$ bits) of positional information; this amount corresponds to each cell being able to determine its position with no error. Therefore, the continuous approximation made here is valid as long as

$$ N \gg e^{I}, $$

where $I$ is the information content calculated above (or $N \gg 2^{I}$ if information is expressed in bits). For the Drosophila system, in our region of interest we estimated the raw information content of Hb and Kr to be 2.6 and 2.7 bits, respectively. It follows that in this system the continuous assumption is valid only if the number of nuclei in the region of interest satisfies $N \gg 2^{2.7} \approx 6.5$. The actual number of nuclei is about 10 (see Fig. S2A), so this condition is only marginally satisfied. This means that the noise in the two morphogens is quite low. Suppose nuclei could accurately "measure" Hb and Kr at their position, with precision limited by the noise in these morphogens rather than by their own regulatory circuitry. Depending on the degree of correlation between the noise of these morphogens, they could then almost infer their position from Hb and Kr alone.
As shown in the main text, the considerably noisier Eve profile, which carries only 2.0 bits of raw information, is useful to the system precisely because nuclei can obtain only a noisy estimate of the true morphogen concentration.
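The arithmetic of this section can be checked with a short Python sketch. It evaluates the small-noise formula term by term: $H[P_c] = \ln c_{\max}$ minus the entropy of a Gaussian of width $\sigma_0$. It also computes the discreteness threshold $N \gg 2^{I}$ for $I$ in bits. The function names are illustrative and are not taken from the authors' scripts.

```python
import math

def info_linear_gradient(c_max, sigma0):
    """Small-noise information (in nats) of a linear gradient on [0, c_max] with constant Gaussian noise sigma0."""
    h_c = math.log(c_max)                                             # H[P_c]: P_c uniform on [0, c_max]
    h_c_given_x = 0.5 * math.log(2 * math.pi * math.e * sigma0 ** 2)  # Gaussian entropy, identical for every x
    return h_c - h_c_given_x                                          # = ln(c_max / (sqrt(2*pi*e) * sigma0))

def nats_to_bits(i_nats):
    """Convert from natural-log units to bits (base-2 logarithm)."""
    return i_nats / math.log(2)

def min_cells_for_continuum(i_bits):
    """N cells carry at most log2(N) bits, so the continuous treatment needs N >> 2**I."""
    return 2.0 ** i_bits

if __name__ == "__main__":
    i = info_linear_gradient(c_max=1.0, sigma0=0.01)
    print(f"I = {i:.3f} nats = {nats_to_bits(i):.3f} bits")
    print(round(min_cells_for_continuum(2.7), 1))   # 6.5
```

Halving $\sigma_0$ adds exactly $\ln 2$ nats (one bit). This is a quick sanity check that the logarithmic dependence on the noise is implemented correctly.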

Joint accessible information and the zigzag profile $\hat z$
The section "Multiple tiers can improve gradient interpretation even when raw information decreases" considers a zigzag-shaped profile $\hat z^{(\lambda)}$, with $\lambda$ a small integer. Repeatedly using the same output value at multiple positions reduces the amount of information carried by this profile alone. However, these positions are made distinguishable by the original morphogen $\hat c$, so considering the joint information carried by both profiles effectively brings us back to the case of $\hat c^{(\lambda)}$ considered previously.
To make this argument more precise, it is convenient to introduce a discrete "segment" variable $s$ taking integer values $s \in \{1, 2, \dots, \lambda\}$, which indicates, for each cell, the segment of the zigzag in which it is located.
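We now make two observations. First, in the small-noise approximation, having access to $\hat c$ is sufficient to accurately infer $\hat s$. Confusion can arise only within the $\varepsilon$-vicinity of the $\lambda - 1$ boundaries between segments, where $\varepsilon = \sqrt{\eta_0^2 + \sigma_0^2}/c_{\max}$. These vicinities cover only a fraction of order $\lambda\varepsilon \ll 1$ of the axis, so $\hat s$ is, to a good approximation, a deterministic function of $\hat c$. Second, we observe that the pair $\{\hat s, \hat z^{(\lambda)}\}$ is in one-to-one correspondence with $\hat c^{(\lambda)}$. Therefore, all statements about raw or accessible information in $\hat c^{(\lambda)}$ directly translate into statements about the pair $\{\hat s, \hat z^{(\lambda)}\}$. In particular, an extra tier of noisy amplification will increase the joint amount of accessible information in this pair under the same condition (7). To save space, the argument in the main text was phrased directly in terms of $\hat c$ instead of the discrete variable $\hat s$.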
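The segment construction can be made concrete with a minimal Python sketch, which rests on several assumptions made for illustration only:

- The zigzag sweeps the full range $[0, c_{\max}]$ in each of $\lambda$ equal segments, alternating direction so that it stays continuous.
- $\hat c^{(\lambda)}$ is taken to be the linear profile $\lambda c_{\max} x$; the main text may define these profiles differently.
- The helper names `zigzag`, `unfold` and `infer_segment` are hypothetical.
- `eps` plays the role of $\varepsilon$ (total noise divided by $c_{\max}$).

```python
import numpy as np

def zigzag(x, lam, c_max=1.0):
    """Hypothetical zigzag profile on x in [0, 1]: returns segment index s (1..lam) and value z."""
    s = np.minimum(np.floor(lam * x).astype(int) + 1, lam)   # segment containing x (x = 1 goes to segment lam)
    u = lam * x - (s - 1)                                    # position within the segment, in [0, 1]
    z = c_max * np.where(s % 2 == 1, u, 1.0 - u)             # odd segments rise, even segments fall
    return s, z

def unfold(s, z, c_max=1.0):
    """Recover the steep linear profile lam * c_max * x from the pair (s, z)."""
    u = np.where(s % 2 == 1, z / c_max, 1.0 - z / c_max)
    return c_max * ((s - 1) + u)

def infer_segment(c_noisy, lam, c_max=1.0):
    """Estimate s from a noisy reading of the shallow gradient c = c_max * x."""
    return np.clip(np.floor(lam * c_noisy / c_max).astype(int) + 1, 1, lam)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam, eps = 4, 0.01
    x = rng.uniform(0.0, 1.0, 200_000)
    s_true, z = zigzag(x, lam)
    c_noisy = x + eps * rng.standard_normal(x.size)           # c_max = 1, so eps is the noise in positional units
    print("fraction of misassigned segments:", np.mean(infer_segment(c_noisy, lam) != s_true))
    print("(s, z) recovers lam * x exactly:", np.allclose(unfold(s_true, z), lam * x))
```

With Gaussian noise, each boundary misassigns a fraction $2\varepsilon/\sqrt{2\pi}$ of the axis. The total is about $(\lambda-1)\,2\varepsilon/\sqrt{2\pi} \approx 0.024$ for $\lambda = 4$, $\varepsilon = 0.01$, so segment errors stay small, as argued above.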

Experimental procedures
Antibody staining was performed using procedures and antisera described in [8] and [9]. Confocal microscopy was performed at 12-bit resolution on a Leica SP5 with a 20x HC PL APO NA 0.7 immersion objective at 1.4x zoom, using pixels of size 135 × 135 nm covering an area of 554 × 554 µm. For each embryo, 17 image slices were obtained at a z interval of 4 µm, spanning approximately 50% of the embryo thickness. All data were collected in a single acquisition cycle using identical scanning parameters.

Estimating expression levels (image processing)
The immunostaining procedure described above yields confocal stacks of images in which pixel intensity corresponds to the recorded fluorescence level. Image processing was performed with custom scripts written for MATLAB (MathWorks, Inc.). Raw data and scripts reproducing Fig. 4 and the supplementary figures are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.n3s7d. Confocal stacks were converted into projected Hb, Kr and Eve images (such as those displayed in Fig. 4A) by taking the maximum projection of Gaussian-smoothed frames. The width of the averaging kernel (8 pixels, corresponding to approximately 1 µm) was smaller than the radius of the nuclei, so for pixels close to the nucleus center the averaging volume lay wholly within the nucleus. Smoothing frames prior to maximum projection ensured robustness against imaging noise. In each of N = 8 embryos, the locations of nuclei were identified manually. For each of the projected images (Hb, Kr and Eve), we recorded the highest intensity value within 5 pixels of each nucleus center as the fluorescence intensity in that nucleus. This 5-pixel "wiggle room" ensured robustness against registration errors across colour channels, as well as against errors in the manual selection of nucleus centers. The recorded intensity values were corrected for background autofluorescence by subtracting the mean intensity recorded in nuclei located in non-expressing regions of the embryo. The background-corrected fluorescence values reflect protein concentration up to a proportionality factor (the intensity of a single fluorophore). The fractional measurement noise in estimating relative concentrations can be estimated as the standard deviation of pixel intensity values within a nucleus on the projected map, relative to the expression value in that nucleus.
In their respective regions of expression, this standard deviation of Hb, Kr and Eve pixel intensity constituted ≈ 1% of the expression value. It was therefore negligible compared to the expression noise observed across nuclei (Fig. 4B). Tissue curvature and compression cause signal distortion artifacts at the edges of the imaged portion of the embryo. To avoid them, all analysis was restricted to nuclei in a low-distortion region, typically 20-25 nuclei wide, selected manually along the center line of the imaged embryo (Fig. S1).
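The projection and per-nucleus measurement steps can be sketched in Python with NumPy alone. The sketch rests on the following assumptions:

- The kernel width is treated as a Gaussian standard deviation `sigma`; the text does not say whether the 8 pixels are a standard deviation or a full width.
- Nucleus centres are given as (row, column) pixel coordinates.
- All helper names are hypothetical.
- Smoothing is separable with zero padding at the border, so frames must be larger than the kernel.

This is not the authors' MATLAB code.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalised 1D Gaussian kernel truncated at 3 standard deviations."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_frame(frame, sigma):
    """Separable Gaussian smoothing of one z-frame (zero padding at the border)."""
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, frame, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def project_stack(stack, sigma):
    """Maximum projection over z of Gaussian-smoothed frames; stack has shape (nz, ny, nx)."""
    return np.max(np.stack([smooth_frame(f, sigma) for f in stack]), axis=0)

def disc_mask(shape, centre, radius):
    """Boolean mask of pixels within `radius` of centre = (row, col)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return (yy - centre[0]) ** 2 + (xx - centre[1]) ** 2 <= radius ** 2

def nuclear_intensity(image, centres, wiggle=5):
    """Highest projected intensity within `wiggle` pixels of each manually selected centre."""
    return np.array([image[disc_mask(image.shape, c, wiggle)].max() for c in centres])

def background_corrected(values, background_values):
    """Subtract the mean intensity of nuclei in non-expressing regions (autofluorescence)."""
    return values - np.mean(background_values)

def fractional_measurement_noise(image, centre, radius):
    """Standard deviation of pixel intensities inside one nucleus, relative to their mean."""
    px = image[disc_mask(image.shape, centre, radius)]
    return px.std() / px.mean()

if __name__ == "__main__":
    yy, xx = np.mgrid[0:40, 0:40]
    blob = 100.0 * np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / (2 * 3.0 ** 2))  # one synthetic nucleus
    stack = np.stack([0.2 * blob, blob, 0.5 * blob]) + 1.0                    # 3 z-slices on a flat background
    proj = project_stack(stack, sigma=1.0)
    raw = nuclear_intensity(proj, [(20, 20), (5, 5)])       # expressing nucleus, background nucleus
    print(background_corrected(raw, raw[1:]))
    print(fractional_measurement_noise(proj, (20, 20), radius=2))
```

Smoothing each frame before taking the maximum keeps single bright noise pixels from dominating the projection, which is the robustness argument made in the text.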

Estimating expression noise (Fig. 4B)
Expression noise is defined as

$$ c_{\text{noise}} = c_{\text{recorded}} - c_{\text{expected}}, $$

where $c_{\text{recorded}}$ is the recorded fluorescence intensity (of Hb, Kr or Eve) and $c_{\text{expected}}$ is the expected value at that location. Measuring noise therefore requires a method for constructing $c_{\text{expected}}$. We use a method that we call "haltere-shaped filtering". To introduce and motivate this method, we begin by discussing two simpler alternatives and their limitations: binning by AP coordinate and neighbour averaging.

Binning by AP coordinate
Since gap gene expression is often said to be a function of position along the antero-posterior (AP) axis, one approach could be to define $c_{\text{expected}}$ as the average expression level in all nuclei with a similar AP coordinate. This approach, however, would yield strongly biased results due to the curvature of gene expression domains (Fig. S1). Nuclei that share an AP coordinate but differ in dorsal-ventral position sit at different positions relative to a curved stripe.
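The bias of AP binning can be reproduced with noise-free synthetic data in a Python sketch. It builds a stripe whose centre bends with the DV coordinate; the stripe shape, curvature and DV units are all hypothetical. It then defines $c_{\text{expected}}$ as the mean over nuclei in the same AP bin. The residuals vanish for a straight stripe but not for a curved one, even though no noise was added.

```python
import numpy as np

def curved_stripe(x_ap, y_dv, curvature, centre=0.42, width=0.01):
    """Hypothetical noise-free stripe whose AP position shifts quadratically with DV coordinate."""
    c0 = centre + curvature * y_dv ** 2
    return np.exp(-((x_ap - c0) ** 2) / (2 * width ** 2))

def ap_binning_residuals(x_ap, c, edges):
    """c_recorded - c_expected, with c_expected = mean of c over nuclei in the same AP bin."""
    n_bins = len(edges) - 1
    idx = np.clip(np.digitize(x_ap, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=c, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return c - (sums / np.maximum(counts, 1))[idx]

if __name__ == "__main__":
    edges = np.linspace(0.37, 0.47, 51)
    centres = 0.5 * (edges[:-1] + edges[1:])
    X, Y = np.meshgrid(centres, np.linspace(-1.0, 1.0, 41))   # 41 nuclei per AP bin, spread along DV
    x_ap, y_dv = X.ravel(), Y.ravel()
    for curvature in (0.0, 0.02):
        r = ap_binning_residuals(x_ap, curved_stripe(x_ap, y_dv, curvature), edges)
        print(f"curvature {curvature}: apparent RMS 'noise' = {np.sqrt(np.mean(r ** 2)):.3f}")
```

Any nonzero residual here is pure artifact: the profile is deterministic, so all of the apparent noise comes from stripe curvature.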

Neighbour averaging
A better approach is to construct $c_{\text{expected}}$ for each nucleus based on the expression levels observed in neighbouring nuclei. Since expression profiles are relatively smooth functions of location, the average of expression levels in the immediate neighbours of nucleus $i$ provides a reasonable expectation for $c_i$. This is a significant improvement over the naive AP-based method. However, simple averaging over neighbours provides an unbiased estimate only in regions where the profile shape is well approximated by a linear dependence. In all other cases, this estimate has a bias proportional to the convexity (second derivative) of the mean profile shape. This is particularly clear for the sharply varying profile of Eve (Fig. S2A). This bias can lead to a dangerous artifact, whereby sharply varying profiles would appear noisier; this would be unacceptable for our analysis of the Hb-Kr-Eve system. Fig. S2B shows the inferred $c_{\text{noise}}$ as a function of the AP coordinate. The severity of the bias of the neighbour-averaging method of estimating $c_{\text{expected}}$ is evident from the clear correlation between $c_{\text{noise}}$ and the average profile shape of Eve (i.e. of $c_{\text{recorded}}$).
Fig. S2. The simple neighbour-averaging method underestimates $c_{\text{expected}}$ in regions where the profile is concave, e.g. at the peaks of Eve stripes (nucleus X), and overestimates $c_{\text{expected}}$ where the profile is convex, e.g. in the Eve troughs (nucleus Y). A: Eve stripes 2 and 3. Nuclei X and Y are marked by smaller circles; the large circles encompass the neighbours over which averaging is performed. B: $c_{\text{noise}}$ as estimated using the neighbour-averaging method, shown as a function of AP coordinate. Black line: window average of $c_{\text{noise}}$ over 50 consecutive nuclei. This average should be close to zero for an unbiased estimate, but it exhibits a clear correlation with the Eve profile shape.
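The curvature dependence of the neighbour-averaging bias can be made explicit for a profile that varies only along AP. This minimal Python sketch assumes a perfect hexagonal lattice with spacing $h$: two neighbours sit directly dorsal and ventral, and four are offset by $\pm\sqrt{3}h/2$ along AP. A cosine profile stands in for Eve stripes, with a hypothetical period.

For $f = \cos kx$, the residual $c_{\text{recorded}} - c_{\text{expected}}$ equals exactly $\tfrac{2}{3}\left(1-\cos(k\sqrt{3}h/2)\right) f$. For small $h$ this is approximately $-\tfrac{h^2}{4} f''$, which is positive at peaks and negative in troughs.

```python
import numpy as np

def hex_neighbour_ap_offsets(h):
    """AP offsets of the six immediate neighbours on a hexagonal lattice of spacing h
    (two neighbours directly dorsal/ventral, four at +-sqrt(3)/2 * h along AP)."""
    d = np.sqrt(3.0) / 2.0 * h
    return np.array([0.0, 0.0, d, d, -d, -d])

def neighbour_average_residual(f, x, h):
    """c_recorded - c_expected, with c_expected = mean of f over the six neighbours of x."""
    neighbours = np.mean([f(x + o) for o in hex_neighbour_ap_offsets(h)], axis=0)
    return f(x) - neighbours

if __name__ == "__main__":
    k, h = 2 * np.pi / 0.08, 0.008           # hypothetical stripe period and lattice spacing (AP units)
    x = np.linspace(0.37, 0.47, 201)
    res = neighbour_average_residual(lambda u: np.cos(k * u), x, h)
    print(res[np.argmax(np.cos(k * x))] > 0)   # True: apparent noise is positive at stripe peaks
```

Only the AP offsets enter, because the profile is assumed to be constant along DV. The two DV neighbours therefore contribute no bias, which motivates the haltere-shaped filter.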

Haltere-shaped filtering
We now describe the procedure we used to construct $c_{\text{expected}}$ for our analysis. We begin by creating an "expression map", in which the value of every pixel of the projected image (such as depicted in Fig. S1) is replaced by the expression level $c_{\text{recorded}}$ measured in the nucleus closest to that pixel. The image is then filtered using the haltere-shaped filter depicted in Fig. S3A, and the pixel values at each nucleus after filtering define the values of $c_{\text{expected}}$.
This method combines the better qualities of the two approaches discussed above. On a perfectly regular hexagonal lattice, it would be equivalent to the neighbour-averaging method using only the immediate dorsal and ventral neighbours, but the procedure described here naturally deals with lattice imperfections. In fact, $c_{\text{noise}}$ in Fig. S2B was constructed using this exact procedure, but with the annulus-shaped filter depicted in Fig. S2A. Since the gradient of expression profiles is predominantly aligned with the AP axis, using a haltere-shaped filter greatly reduces the introduced bias (Fig. S3B). The residual systematic bias (Fig. S3B, solid black line) is significantly smaller in magnitude than the measured noise (root-mean-square scatter of the red data points). This confirms that our procedure eliminates most of the systematic errors due to the dorsal-ventral (DV) dependence of expression profiles. The residual deviations are therefore dominated by a DV-independent component of the noise.
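The expression-map and filtering steps can be sketched in Python. Every geometric choice here is an assumption made for illustration; the actual filter shape is the one shown in Fig. S3A:

- The haltere is modelled as two discs of radius 3 pixels, centred one lattice spacing dorsal and ventral of the nucleus.
- The annulus is a ring at one lattice spacing.
- The DV axis is the image $y$ axis.
- The lattice is perfectly hexagonal.

For a convex profile that depends only on AP position, the haltere residual vanishes while the annulus (neighbour-averaging) residual does not.

```python
import numpy as np

def hex_lattice(width, height, spacing):
    """(x, y) nucleus centres in pixels on a hexagonal lattice; DV neighbours share the same x."""
    dx = spacing * np.sqrt(3.0) / 2.0
    return np.array([(i * dx, j * spacing + (spacing / 2.0 if i % 2 else 0.0))
                     for i in range(int(width / dx) + 1)
                     for j in range(-1, int(height / spacing) + 2)])

def expression_map(shape, nuclei, values):
    """Replace every pixel by the value recorded in the nucleus closest to it."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (xx[..., None] - nuclei[:, 0]) ** 2 + (yy[..., None] - nuclei[:, 1]) ** 2
    return values[np.argmin(d2, axis=-1)]

def filtered_value(emap, centre, footprint):
    """Mean of the expression map over the filter footprint placed at centre = (x, y)."""
    yy, xx = np.mgrid[0:emap.shape[0], 0:emap.shape[1]]
    return emap[footprint(xx - centre[0], yy - centre[1])].mean()

def haltere(offset, radius):
    """Two discs centred `offset` pixels dorsal and ventral of the nucleus."""
    return lambda dx, dy: (np.hypot(dx, dy - offset) <= radius) | (np.hypot(dx, dy + offset) <= radius)

def annulus(r, half_width):
    """Ring of radius r around the nucleus (plain neighbour averaging)."""
    return lambda dx, dy: np.abs(np.hypot(dx, dy) - r) <= half_width

if __name__ == "__main__":
    shape, spacing = (80, 80), 10.0
    nuclei = hex_lattice(shape[1], shape[0], spacing)
    k = int(np.argmin(np.hypot(nuclei[:, 0] - 40.0, nuclei[:, 1] - 40.0)))   # a nucleus near the image centre
    values = 0.01 * (nuclei[:, 0] - nuclei[k, 0]) ** 2                       # convex, AP-only profile
    emap = expression_map(shape, nuclei, values)
    for name, fp in (("haltere", haltere(spacing, 3.0)), ("annulus", annulus(spacing, 3.0))):
        print(name, values[k] - filtered_value(emap, nuclei[k], fp))
```

Because the discs fall inside the Voronoi cells of the DV neighbours, the haltere averages nuclei at the same AP position and sees none of the AP curvature.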
One might expect that, for even higher accuracy, the orientation of the haltere filter could be set not by perpendicularity to the imaginary AP axis but by the isolines of the actual expression profile after sufficiently strong smoothing. In practice, however, such an approach is less robust because of its number of tunable parameters. Empirically, fixed-angle haltere filtering gave the lowest bias, as measured by the correlation between the average $c_{\text{noise}}$ in a region and the average $c_{\text{recorded}}$ in that same region.

Idealised profiles (Fig. 4C)
The expression profiles of long-body-axis patterning genes in Drosophila form a pattern that, to a good approximation, can be considered one-dimensional. However, as discussed above, due to the curvature of expression profiles, $x_{\text{AP}}$ is not the variable that best captures the variance. To estimate positional information in a gene expression pattern using data from single embryos, we therefore use the measured expression pattern shape and noise to construct what we call "idealised profiles". First, we plot the recorded expression values $c_{\text{recorded}}$ as a function of $x_{\text{AP}}$ and construct a smooth spline fit that captures the mean profile shape; we denote the result $\mu(x_{\text{AP}})$. Next, the same procedure is applied to expression noise, estimated as described above. The smooth spline fit to $c_{\text{noise}}^2$ as a function of $x_{\text{AP}}$ describes how the experimentally observed expression noise varies along the AP axis; we denote the corresponding root-mean-square deviation function $e(x_{\text{AP}})$. An expression pattern with mean $\mu(x_{\text{AP}})$ and independent Gaussian noise of magnitude $e(x_{\text{AP}})$ constitutes the "idealised profile" of a given patterning cue (see Fig. 4C).
Note that when calculating the average noise magnitude for a given AP coordinate, expression noise is calculated as described in the previous section, i.e. prior to binning by AP. The result is the average of the expression noise measured locally for all nuclei at a similar AP location, as opposed to the variance of expression among all nuclei at the same $x_{\text{AP}}$; the latter, as we described, suffers from artifacts. The procedure we described effectively straightens out expression stripes: the resulting profile has the same mean and noise magnitude as observed experimentally, but is, by construction, a function of a single variable. This approach contrasts with the procedure of [8], where embryos were imaged in cross-section and only dorsal or ventral "expression profiles" were used, i.e. expression levels were recorded along a particular AP line (from multiple embryos). Here, we use all nuclei observed on a slightly flattened surface of a single embryo, so the variation of expression profile shape with the dorsal-ventral coordinate becomes a major factor.
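A minimal Python sketch of the idealised-profile construction follows. It substitutes a least-squares polynomial (`numpy.polynomial.Polynomial.fit`) for the smoothing spline used in the text, so the degree `deg` is an assumption. It takes $c_{\text{noise}}$ as already computed locally (for example by haltere filtering), not by AP binning. The synthetic data below are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def idealised_profile(x_ap, c_recorded, c_noise, deg=4):
    """Return callables (mu, e): smooth fits of the mean profile and of the RMS noise versus x_AP.
    A polynomial fit stands in for the smoothing spline described in the text."""
    mu = Polynomial.fit(x_ap, c_recorded, deg)
    noise2 = Polynomial.fit(x_ap, c_noise ** 2, deg)        # fit to c_noise**2, then take the square root

    def e(x):
        return np.sqrt(np.clip(noise2(x), 0.0, None))       # clip keeps the RMS real if the fit dips below 0

    return mu, e

def sample_idealised(mu, e, x, rng):
    """One realisation of the idealised profile: mean mu(x) plus independent Gaussian noise of size e(x)."""
    return mu(x) + e(x) * rng.standard_normal(np.shape(x))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.37, 0.47, 20_000)                                   # hypothetical nucleus AP positions
    c_noise = (0.05 + 0.5 * (x - 0.37)) * rng.standard_normal(x.size)     # locally estimated noise
    c_recorded = 1.0 - 50.0 * (x - 0.42) ** 2 + c_noise
    mu, e = idealised_profile(x, c_recorded, c_noise)
    print(mu(0.42), e(0.42))
```

Fitting $c_{\text{noise}}^2$ rather than $|c_{\text{noise}}|$ makes $e$ a root-mean-square deviation, which is the quantity the text calls $e(x_{\text{AP}})$.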

Computing information content (Fig. 4D)
By definition, the information content (or mutual information) $I(c, x)$ of a profile $c(x)$ is the average reduction of the uncertainty of $c$ after $x$ becomes known:

$$ I(c, x) = S(c) - \left\langle S(c|x) \right\rangle_x . $$

Here the first term is the entropy of the full distribution of $c$, which we denote $P_c$, and $S(c|x)$ is the entropy of the conditional distribution $p(c|x)$. We write

$$ P_c(c) = \frac{1}{x_{\max} - x_{\min}} \int_{x_{\min}}^{x_{\max}} p(c|x)\, dx, \qquad \left\langle S(c|x) \right\rangle_x = \frac{1}{x_{\max} - x_{\min}} \int_{x_{\min}}^{x_{\max}} S(c|x)\, dx, $$

because the position $x$ is uniformly distributed between $x_{\min}$ and $x_{\max}$ (in our case, the range of AP positions is between $x_{\min} = 0.37$ and $x_{\max} = 0.47$). These formulas express the information content of a one-dimensional profile entirely in terms of the conditional probability function $p(c|x)$. For the idealised profile, at a given AP location $x_0$, the conditional distribution $p(c|x_0)$ is Gaussian with mean $\mu(x_0)$ and width $e(x_0)$. In particular, its entropy is known analytically: $S(c|x_0) = \tfrac{1}{2}\ln\left[2\pi\mathrm{e}\, e(x_0)^2\right]$, where $\mathrm{e}$ is Euler's number. We therefore compute $I(c, x)$ by performing the integrals numerically. We validated our code by computing the information content of simple profiles for which it can also be calculated analytically.
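The integrals above can be evaluated on a grid in Python. In this sketch, $P_c$ is the average of the Gaussian $p(c|x)$ over uniformly spaced $x$, and $\langle S(c|x)\rangle_x$ uses the analytic Gaussian entropy. The grid sizes and the ±8-width range of $c$ are arbitrary choices.

The usage check follows the validation described above. A linear profile with constant noise $\sigma_0$ should reproduce $\ln[c_{\max}/(\sqrt{2\pi e}\,\sigma_0)]$ up to edge corrections of order $\sigma_0/c_{\max}$, and a flat profile should carry no information.

```python
import numpy as np

def information_content(mu, e, x_min, x_max, n_x=1001, n_c=2001):
    """I(c, x) = S(c) - <S(c|x)>_x in nats, for x ~ Uniform(x_min, x_max) and c | x ~ Normal(mu(x), e(x))."""
    x = np.linspace(x_min, x_max, n_x)
    m, s = mu(x), e(x)
    c = np.linspace(np.min(m - 8 * s), np.max(m + 8 * s), n_c)
    dc = c[1] - c[0]
    z = (c[None, :] - m[:, None]) / s[:, None]
    p_c_given_x = np.exp(-0.5 * z ** 2) / (np.sqrt(2 * np.pi) * s[:, None])   # rows: x, columns: c
    p_c = p_c_given_x.mean(axis=0)                   # average over uniformly distributed x
    pos = p_c > 0
    s_c = -np.sum(p_c[pos] * np.log(p_c[pos])) * dc  # entropy of P_c (Riemann sum)
    s_c_given_x = np.mean(0.5 * np.log(2 * np.pi * np.e * s ** 2))   # analytic Gaussian entropy, averaged over x
    return s_c - s_c_given_x

if __name__ == "__main__":
    x_min, x_max, sigma0 = 0.37, 0.47, 0.01

    def linear(x):
        return (x - x_min) / (x_max - x_min)        # spans [0, 1] over the AP window

    def const_noise(x):
        return np.full_like(x, sigma0)

    numeric = information_content(linear, const_noise, x_min, x_max)
    analytic = np.log(1.0 / (np.sqrt(2 * np.pi * np.e) * sigma0))
    print(numeric / np.log(2), analytic / np.log(2))   # both in bits
```

Dividing by $\ln 2$ converts nats to bits, the unit used for the Hb, Kr and Eve estimates.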