Hosts of avian brood parasites have evolved egg signatures with elevated information content

Hosts of brood-parasitic birds must distinguish their own eggs from parasitic mimics, or pay the cost of mistakenly raising a foreign chick. Egg discrimination is easier when different host females of the same species each lay visually distinctive eggs (egg ‘signatures’), which helps to foil mimicry by parasites. Here, we ask whether brood parasitism is associated with lower levels of correlation between different egg traits in hosts, making individual host signatures more distinctive and informative. We used entropy as an index of the potential information content encoded by nine aspects of colour, pattern and luminance of eggs of different species in two African bird families (Cisticolidae parasitized by cuckoo finches Anomalospiza imberbis, and Ploceidae by diederik cuckoos Chrysococcyx caprius). Parasitized species showed consistently higher entropy in egg traits than did related, unparasitized species. Decomposing entropy into two variation components revealed that this was mainly driven by parasitized species having lower levels of correlation between different egg traits, rather than higher overall levels of variation in each individual egg trait. This suggests that irrespective of the constraints that might operate on individual egg traits, hosts can further improve their defensive ‘signatures’ by arranging suites of egg traits into unpredictable combinations.


(b) Variance-correlation decomposition
The variance–covariance matrix can be rewritten as Σ = SRS, a matrix product involving a diagonal matrix of marginal trait-level standard deviations, S = diag(σ_1, ..., σ_p), and the trait correlation matrix R (Equation 2-36, p. 59 in ref. [4]). A consequence of this is that the determinant of Σ can be factored as the product of two terms, one involving only the marginal trait variances and the other the trait correlations:

|Σ| = |S| |R| |S| = (∏_{i=1}^{p} σ_i^2) |R|

(Property 9, p. 206 in ref. [3]). This allows us to decompose entropy into the sum of three terms: a constant, a term representing total variation, and a term representing correlation. In particular,

H(Y) = 0.5 log (2πe)^p + 0.5 log |Σ| = 0.5 p log(2πe) + 0.5 ∑_{i=1}^{p} log σ_i^2 + 0.5 ∑_{i=1}^{p} log e_i,

where e_1, ..., e_p are the eigenvalues of R (so that |R| = ∏_i e_i). Only the latter two terms vary from species to species when p, the number of traits, does not vary, and all terms scale with the number of traits. Hence, for the purposes of our analysis, we ignore the constant term and scale by the number of traits, i.e. characterize variation in

Ĥ(Y) = (1/p) ∑_{i=1}^{p} log σ_i^2 + (1/p) ∑_{i=1}^{p} log e_i = Ĥ_Var(Y) + Ĥ_Cor(Y).

This standardized measure of entropy is the sum of the average logged trait variance, Ĥ_Var(Y), and the average of the logged eigenvalues of the correlation matrix, Ĥ_Cor(Y). A species that adapts by exploiting a greater range of variation in a given trait while leaving trait correlations fixed will increase entropy by increasing the first term (note that H(Y) and Ĥ(Y) are increasing functions of the marginal variance terms, σ_i^2), while one that adapts by reducing correlations among the traits, leaving variances fixed, will increase entropy by increasing the second term. In what follows, we show how reduced dependence among traits corresponds to increased entropy.
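This decomposition can be checked numerically. The sketch below uses an illustrative 3-trait covariance matrix (hypothetical values, not egg data) and verifies that the average logged variance plus the average logged correlation eigenvalue recovers (1/p) log |Σ|:

```python
import numpy as np

# Illustrative 3-trait covariance matrix (hypothetical values, not egg data).
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
p = Sigma.shape[0]

# Decompose Sigma = S R S into marginal variances and the correlation matrix.
s2 = np.diag(Sigma)                      # marginal variances sigma_i^2
R = Sigma / np.sqrt(np.outer(s2, s2))    # correlation matrix

# Standardized entropy components: average logged variance and
# average logged eigenvalue of R.
H_var = np.mean(np.log(s2))
H_cor = np.mean(np.log(np.linalg.eigvalsh(R)))

# Their sum equals (1/p) log|Sigma|, since |Sigma| = (prod sigma_i^2) |R|.
assert np.isclose(H_var + H_cor, np.log(np.linalg.det(Sigma)) / p)
```

Note that Ĥ_Cor is never positive, because the eigenvalues of a correlation matrix multiply to |R| ≤ 1; any correlation among traits therefore subtracts from the total entropy.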
First, note that the differential entropy of Y can be decomposed into the sum of a conditional and a marginal differential entropy,

H(Y) = H(Y_1 | Y_2, ..., Y_p) + H(Y_2, ..., Y_p),

the former relating to the conditional distribution of Y_1 given all other Y's and the latter relating to the marginal (to Y_1) distribution of the remaining Y's (Equation 9.33, p. 230 in ref. [1]). We will use this decomposition to show that differential entropy increases with a weakening of the correlation structure, in the form of reduced dependence of one trait (arbitrarily labelled Y_1) on the remaining traits. We assume that the distribution of the remaining traits, and hence H(Y_2, ..., Y_p), remains unchanged.
Because Y has a multivariate normal distribution, it follows that the conditional distribution of Y_1 given Y_2, ..., Y_p is normal with variance

σ^2_{1|2,...,p} = σ_1^2 − Σ_{1,>1} Σ_{>1,>1}^{-1} Σ_{>1,1},

where Σ_{1,>1} = (σ^2_{1,2}, ..., σ^2_{1,p}) is the vector of covariances between trait one and traits two through p, and Σ_{>1,>1} is the covariance matrix of the remaining traits (Result 4.6, p. 135 in ref. [4]). Hence, by Equation 1,

H(Y_1 | Y_2, ..., Y_p) = 0.5 log(2πe σ^2_{1|2,...,p}).

Note that the covariance, σ^2_{1,i}, between traits one and i can be expressed as σ_1 σ_i ρ_{1,i}, a product of marginal standard deviations and the correlation between traits one and i (Equation 2-33, p. 58 in ref. [4]).
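The chain-rule decomposition and the Gaussian conditional variance can be illustrated with a quick numerical check (a sketch using hypothetical covariance values; H below is the standard multivariate normal entropy formula):

```python
import numpy as np

# Hypothetical 3-trait covariance matrix; trait one vs traits two and three.
Sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])

def H(S):
    """Differential entropy of a multivariate normal with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

# Conditional variance of Y1 given the rest:
# sigma^2_{1|2..p} = sigma_1^2 - Sigma_{1,>1} Sigma_{>1,>1}^{-1} Sigma_{>1,1}
s1r = Sigma[0, 1:]
Srr = Sigma[1:, 1:]
cond_var = Sigma[0, 0] - s1r @ np.linalg.solve(Srr, s1r)
H_cond = 0.5 * np.log(2 * np.pi * np.e * cond_var)

# Chain rule: H(Y) = H(Y1 | Y2, Y3) + H(Y2, Y3)
assert np.isclose(H(Sigma), H_cond + H(Srr))
```

Conditioning on the other traits can only reduce trait one's variance (here cond_var < σ_1^2), which is what makes correlation costly for entropy.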
Suppose that some bird species have adapted by reducing the dependence of trait one on the remaining traits, such that the marginal variance of Y_1, σ_1^2, as well as the distribution of Y_{>1}, remain the same. In particular, suppose that the correlations between trait one and traits two through p are diminished by a constant factor 0 ≤ π < 1, so that

Σ^{adapt}_{1,>1} = (σ_1 σ_2 π ρ_{1,2}, ..., σ_1 σ_p π ρ_{1,p}) = π Σ_{1,>1}.
For such a species,

σ^2_{1|2,...,p, adapt} = σ_1^2 − π^2 Σ_{1,>1} Σ_{>1,>1}^{-1} Σ_{>1,1}.

Note that, since Σ_{>1,>1} is a positive definite covariance matrix, the quadratic form Σ_{1,>1} Σ_{>1,>1}^{-1} Σ_{>1,1} is strictly positive whenever trait one covaries with any of the remaining traits, so multiplying it by π^2 < 1 increases the conditional variance, i.e. σ^2_{1|2,...,p, adapt} > σ^2_{1|2,...,p}. Hence, the conditional differential entropy for the adapting species is greater than that of its non-adapting conspecifics, and therefore its differential entropy, H^{adapt}(Y), is greater as a result.
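This argument can be verified directly: shrinking trait one's covariances by a factor π, while holding the marginal variances and the joint distribution of the remaining traits fixed, raises the total entropy. A minimal sketch with hypothetical values:

```python
import numpy as np

def gauss_entropy(S):
    """Differential entropy of a multivariate normal with covariance S."""
    p = S.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** p * np.linalg.det(S))

# Hypothetical covariance: trait one correlated with traits two and three.
Sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])

# "Adapting" species: shrink trait one's covariances by pi, leaving the
# marginal variances and the distribution of (Y2, Y3) unchanged.
pi_factor = 0.5
Sigma_adapt = Sigma.copy()
Sigma_adapt[0, 1:] *= pi_factor
Sigma_adapt[1:, 0] *= pi_factor

assert np.allclose(np.diag(Sigma_adapt), np.diag(Sigma))   # variances fixed
assert gauss_entropy(Sigma_adapt) > gauss_entropy(Sigma)   # entropy rises
```

Because each marginal variance is unchanged, the gain comes entirely through the correlation component Ĥ_Cor, not through Ĥ_Var.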

(c) Linear constraints among phenotypic measures
Differential entropy is not defined when one or more constraints are present among the variables, as is the case here, where the cone catch measures are standardized to sum to one. This is because the variables do not have a proper density and the integral is not defined (Remark, p. 225 in ref. [1]). There is no variation in one dimension of the phenotype matrix when all variables that are co-standardized are included; it is akin to including a phenotype measurement that is constant. The solution is to restrict attention to the dimensions of the phenotype matrix that do vary. This can be accomplished here by removing any one of the cone catch variables.
In what follows, we show that the value of the total entropy Ĥ(Y) does not depend on which cone catch variable is removed, but that the balance between the variance and correlation components is sensitive to this choice. For this reason, we repeated our analysis with each of the four cone catch variables removed, in turn.
Denote the multivariate phenotype variable by Y, with elements containing the cone catch measures Y_LW, Y_MW, Y_SW, and Y_UV along with the remaining phenotype elements. Because the cone catches are standardized to sum to one,

Y_LW + Y_MW + Y_SW + Y_UV = 1,

variation in a given cone catch can be determined from the others by a simple linear relationship, e.g. Y_LW = 1 − Y_MW − Y_SW − Y_UV. In words, variation in any one of the standardized cone catches is reflected entirely in the remaining three.
Consider, for example, an analysis of the phenotype vector with the LW cone catch removed, Y^(−LW) = (Y_MW, Y_SW, Y_UV, ...). The vector with the MW cone catch removed instead can be obtained from Y^(−LW) by an affine transformation, Y^(−MW) = A^T Y^(−LW) + b, where the first column of A is (−1, −1, −1, 0, ..., 0)^T, its remaining columns are those of the identity matrix, and b = (1, 0, ..., 0)^T. This linear operation has the effect of replacing Y_MW, the first element of Y^(−LW), by Y_LW = 1 − Y_MW − Y_SW − Y_UV, while leaving its remaining elements unchanged.
If Y^(−LW) is multivariate normal with covariance matrix Σ, then Y^(−MW) is multivariate normal with covariance matrix A^T Σ A (Result 4.3, p. 132 in ref. [4]). As noted above, the entropy of a multivariate normal phenotype vector is determined by the determinant of its covariance matrix. Note that

|A^T Σ A| = |A^T| |Σ| |A| = |Σ|

(Property 9, p. 206 in ref. [3]), since |A| = −1. Hence the differential entropy estimated using Y^(−LW) is equal to that estimated using Y^(−MW), or when leaving out any one of the other standardized cone catch measures (Tables S2-S4).
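The invariance also holds exactly for sample covariances, since dropping a different column of a sum-to-one composition is the same affine map applied to the data. A sketch with simulated compositional stand-ins for the cone catches (illustrative only, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated compositional "cone catch" columns that sum to one per row
# (illustrative stand-ins for LW, MW, SW, UV), plus two free pattern traits.
raw = rng.gamma(shape=5.0, size=(500, 4))
cones = raw / raw.sum(axis=1, keepdims=True)
Y = np.hstack([cones, rng.normal(size=(500, 2))])

def log_det_cov(X):
    """Log-determinant of the sample covariance (the entropy-relevant term)."""
    sign, logdet = np.linalg.slogdet(np.cov(X, rowvar=False))
    return logdet

# Drop each "cone catch" column in turn; the log-determinant (and hence
# the differential entropy) is the same regardless of which one is removed.
logdets = [log_det_cov(np.delete(Y, i, axis=1)) for i in range(4)]
assert np.allclose(logdets, logdets[0])
```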
While entropy does not change, the relative contributions of the marginal variances and the correlations will in general (Tables S2-S4). To see this, note that any two choices of the phenotype vector will differ by only one element. In the above example, the LW component replaces the MW component; all other components remain the same. As a result, the value of Ĥ_Var(Y) changes by an amount [log(σ^2_LW) − log(σ^2_MW)]/p, where σ^2_LW and σ^2_MW are the marginal variances of the LW and MW cone catches, respectively. This quantity will be small when the two marginal variances are similar, and zero only when they are equal. This change is offset by a change in Ĥ_Cor(Y) of the same magnitude, but in the opposite direction.

Table S3: Results of linear models, with UV cone catch removed, relating entropy to parasitism status (see Table 1 legend for parasitism status definitions). In the phylogenetic generalised least squares (PGLS), for each model lambda differed significantly from one but not from zero, indicating little to no phylogenetic signal in the model residuals.

Table S4: Results of linear models, with MW cone catch removed, relating entropy to parasitism status (see Table 1 legend for parasitism status definitions). In the phylogenetic generalised least squares (PGLS), for each model lambda differed significantly from one but not from zero, indicating little to no phylogenetic signal in the model residuals. Removal of the MW cone reduces the contribution of brownish-green eggs, such as those of Eremomela icteropygialis.

Table S5: Results of linear models, with SW cone catch removed, relating entropy to parasitism status (see Table 1 legend for parasitism status definitions). In the phylogenetic generalised least squares (PGLS), for each model lambda differed significantly from one but not from zero, indicating little to no phylogenetic signal in the model residuals.
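The offsetting shift between the variance and correlation components described above can also be checked on simulated compositional data (illustrative stand-ins, not the study's cone catches; unequal gamma shape parameters are used so the marginal variances differ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated compositional "cone catches" (stand-ins for LW, MW, SW, UV),
# with unequal shapes so the marginal variances differ between columns.
raw = rng.gamma(shape=np.array([8.0, 3.0, 5.0, 4.0]), size=(500, 4))
cones = raw / raw.sum(axis=1, keepdims=True)

def components(X):
    """Average logged variance and average logged correlation eigenvalue."""
    Sigma = np.cov(X, rowvar=False)
    s2 = np.diag(Sigma)
    R = Sigma / np.sqrt(np.outer(s2, s2))
    return np.mean(np.log(s2)), np.mean(np.log(np.linalg.eigvalsh(R)))

v_noLW, c_noLW = components(cones[:, 1:])          # analysis without "LW"
v_noMW, c_noMW = components(cones[:, [0, 2, 3]])   # analysis without "MW"

# Totals agree, but the variance component shifts by
# [log(s2_LW) - log(s2_MW)] / p, offset exactly by the correlation component.
s2 = np.var(cones, axis=0, ddof=1)
shift = (np.log(s2[0]) - np.log(s2[1])) / 3
assert np.isclose(v_noLW + c_noLW, v_noMW + c_noMW)
assert np.isclose(v_noMW - v_noLW, shift)
assert np.isclose(c_noMW - c_noLW, -shift)
```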