Joint assessment of density correlations and fluctuations for analysing spatial tree patterns

Inferring the processes underlying the emergence of observed patterns is a key challenge in theoretical ecology. Much effort has been made in the past decades to collect extensive and detailed information about the spatial distribution of tropical rainforests, as demonstrated, e.g. in the 50 ha tropical forest plot on Barro Colorado Island, Panama. These kinds of plots have been crucial to shed light on diverse qualitative features, emerging both at the single-species or the community level, like the spatial aggregation or clustering at short scales. Here, we build on the progress made in the study of the density correlation functions applied to biological systems, focusing on the importance of accurately defining the borders of the set of trees, and removing the induced biases. We also pinpoint the importance of combining the study of correlations with the scale dependence of fluctuations in density, which are linked to the well-known empirical Taylor’s power law. Density correlations and fluctuations, in conjunction, provide a unique opportunity to interpret the behaviours and, possibly, to allow comparisons between data and models. We also study such quantities in models of spatial patterns and, in particular, we find that a spatially explicit neutral model generates patterns with many qualitative features in common with the empirical ones.

to compute the borders with α-shape we used a dedicated and efficient Matlab library, † and computed the covered area via Delaunay triangulation, which is also implemented in the same library.
The remaining problem is now the choice of the free parameter α. If the value of α is too large we get back the convex hull, while if it is too small, all the discs will penetrate into the distribution of points, breaking it up completely and producing a collection of small sets of points. Given these limiting cases, typically a good way of finding an appropriate value of α is to study the behaviour of the covered area (or, equivalently, of the density) as a function of α. Thus, when the concavities of different sizes are eliminated, a sudden jump of the area is expected, followed by a plateau which finally yields a good estimation α small enough but larger than the typical first-neighbour distance. As an example, in the inset of Fig. SI.2a we show the behaviour of the total covered area, estimated via Delaunay triangulation 1 , for a set of Poisson-distributed points in a square domain of side L = 20 where we removed two semicircles of radius r 1 = 3 and r 2 = 8. As clear from the plot, below a value of α ≈ r 1 the covered area converges to the analytical value, L 2 − π 2 (r 2 1 + r 2 2 )/4, provided the number of points is large enough. Taylor's law, using the borders as defined by the CH (red) or α−shapes for α = 2.5 (orange). Black dashed line stands for theoretical results, g(r) = 1 and δ r n r = n r 1/2 . Inset of (a): covered area versus α for different number of points included in the same set as main panel. Notice that the covered area A α converges to the analytical value (dashed line) for α < 3 when the number of points N is large enough (in this case N 10 4 ).
The main panel of Fig. SI.2a shows the effects of using the different boundaries on the computation of the pair correlation function (PCF), g(r). When using the convex hull, g(r) is found to be greater than 1 at small scales (r < 6) and converge to ≈ 1 at large scales. This is a spurious "clumping" effect due to an incorrect choice of borders. In contrast, when using α-shape, the expected result g(r) = 1 is found.
In Fig. SI.2b we show how biases induced by the incorrect identification of the borders impact on the evaluation of the density fluctuations, i.e. Taylor's law (TL). Without α-shapes, i.e. using the borders of the square, one would have found strong deviations of TL exponent from the Poisson value γ = 1/2. Indeed, when failing to exclude the empty area we observe a power law with exponent γ ≈ 1, which via Eq. (3.3) of main text would imply fluctuations that do not decay with the scale of observation. Instead, with the proper borders (i.e. removing the empty semicircles from the computation), and using the Hanisch method for avoiding border bias, we obtain the expected exponent γ = 1/2. The main reason for such spurious results is that empty cells are those which typically increase the fluctuations, so one should ensure, at each scale r, to include only the empty or semi-empty cells which really belong to the set of points. We conclude observing that at very small scales (gray shaded area in Fig. SI.2b) γ = 1/2 also with the wrong borders provided n r < 1, this is the trival regime discussed in Sec. 3 of the main text.

SI-1.1: Area and density estimation in BCI using the α-shapes
The dependence of covered area or density with α is not always as clear as in the inset of Fig. SI.2a. In actual BCI data many scales can be involved, and the step-like behaviour of the covered area is more an exception than a rule. This is particularly clear from Figs. SI.3a-b where we show, for H. prunifolius and T. panamensis the dependence on α of the fraction of empty area (A − A α )/A (where A and is the area of the rectangular BCI plot and A α that determined with the α-shape algorithm). We observe an almost continuous and monotonic decay with α until the CH limit, in which no concavities are identified. Figures SI.3c-d show that the same applies to the relative difference between the α-shape estimated density ρ α = N/A α and the naive estimation, ρ 0 = N/A.
H. prunifolius T. panamensis In the absence of clear plateaus, one is forced to use a more subjective criterion to fix the value of α. Consider for instance the difference in covered area for two different values of α for H. prunifolius in the insets of Figs. SI.3a,c. To choose α we proceeded as follows: first, we always required α to be significantly larger than the mean distance between nearest-neighbour trees, to avoid fragmentation of the system (otherwise the covered area would be unreasonably small, see Figs. SI.3a-b). Inter-particle distances are plotted with vertical bars in the main panels of Figs.SI.3a-b. Second, we have taken α small enough to assure the identification of empty areas in the spatial distribution of different species, which were always checked by visual inspection. Further, to be more systematic, we also selected α-values when the continuous decay of the estimated area changes its curvature.
As seen in the insets of Fig. SI.3, for H. prunifolius (see also Fig. 2 of main text) and T. panamensis such criterion provides a reasonable estimation of the covered area. In particular, for H. prunifolius our choice identified the empty region depicted in Fig. 2 of the main text (and in the inset of Fig. SI.3a) which actually corresponds to swampland region unfavorable to the establishement of this species 2 . Unfortunately, for other species we do not have additional information of this kind to check the validity of the choice of α. Comparing the insets of Figs. SI.3b,d corresponding to the area covered by T. panamensis for the same value of α but for the first and last census, one can appreciate how some of the holes present in census 1 were filled in census 8, meaning that this species experienced an expansion in between the two censuses, whose consequences for the PCF are discussed in the main text (in particular Fig. 4d).
In structure (and as time progress looks more and more homogeneous), unlike P. armata. This is reflected in the behavior of density correlations (Fig. 5 of main text) and fluctuations (Fig.SI.8 and SI.9).

SI-2: Density correlations and fluctuations at the single species level
As discussed in the main text, the most abundant species in Barro Colorado, while coherently showing evidence of clumping at small scales (g(r) > 1), clearly display very different behaviours at large scales. In the main text we mainly discussed three representative species. Here, we analyze those of Fig. SI.4, examining the effects of using α-shapes on both the PCF (Fig. SI.6) and Taylor's law (Fig. SI.7). Figure SI.6a shows the PCF computed using the rectangular plot borders. For all species we observe clumping at short distances, but at larger scales behaviours range from clear anticorrelation (g(r) < 1) at intermediate distances (see e.g. H. triandra), to species displaying a large scale plateaux at values either larger than 1 (e.g. F. occidentalis and S. simplex) or very near 1 (as e.g. H. prunifolius and G. intermedia). When using α-shapes to find the border, as shown in Fig. SI.6b, some of the species recover a plateaux to 1 (T. panamensis) and anticorrelations essentially disappear, but still a quite large variability in the large scale behavior persists for other species. The fact that anti-correlations disappear means that some voids have been removed (this is likely the case of H. triandra, see e.g. Fig. SI.4). As briefly discussed in the main text, this points out a delicate issue when using methods such as α-shapes to find borders: anticorrelations are not always spurious features of a tree distribution, so that when removing them one should be aware that one might be masking a genuine process responsible for them. On the other hand, in cases such as T. panamensis, which displays mild anticorrelation at large distances in Fig. SI.6a (see also main text Fig. 4d), the most likely explanation is that the species is experiencing an expansion (see also the insets of Fig. SI.3b-d).
Moving on to density fluctuations via Taylor's law (Sec. 3 of main text), we show in Fig. SI.7 how the root mean square deviations of number of trees in cells of size r, δ r n, changes with mean n r . We observed that apart from small inessential quantitative changes, the TL is not sensitive to using α-shapes or not. This is somehow surprising for H. prunifolius,f for which a large hole in the domain is present. As for the quantitative aspects, we found that besides the trivial convergence to γ = 1/2 below the interparticle distance (gray shaded area), at large scales the behavior is always anomalous (γ > 1/2) for all species, with γ varying in the range [0.73 : 0.85]. The effect of using α-shapes or not is to produce small changes in the exponent of the single species, but the range of values remains basically the same.
In Figs. SI.8 and SI.9, we show density fluctuations for the two contracting species P. cordulatum and P. armata, showing the effect of using α-shapes. We observe different behavior for the two species. For P. cordulatum we find strong dependence on the definition of the border, especially in census 4. In particular, γ seems to approach the value 1/2 indicative of a tendency of recovery of homogeneity. This seems to be confirmed by the behavior of the PCF shown in Fig.5a of the main text and with the visual impression from the top panels of Fig. SI.5, which show a qualitative change of the tree distribution in census 4 with respect to the previous two. For P. armata we found that γ changes from 0.9 to 0.85, but there is no tendency to approach a homogeneous process.

SI-3: Dual representation of the spatially explicit neutral model
The multispecies voter model has been simulated exploiting its duality with a system of coalescing random walkers with a killing rate [3][4][5] . The main advantage of this method is that it allows for very fast simulations to produce independent realization of sample patterns at the (non-equilibrium) stationary state virtually free from boundary effects such as that can be introduced by periodic boundary conditions (typically employed when simulating in the forward representation described in the main text).
The dual process is built as follows. At the beginning, each lattice site is filled with a random walker. The dual process proceeds backwards in time to reconstruct the ancestry of the species. At each discrete (backward) time step, with probability 1 − ν, a randomly chosen walker is moved to a different site (which can be outside the sampled domain, since the lattice is infinite though we only observe a finite portion), chosen according to a distribution which depends on the distance from the original site r -i.e. the dispersal kernel P(r r r), which we have chosen to be Gaussian centred in the original site and with standard deviation σ . If the landing site is occupied, the two walkers coalesce and one of them is removed, keeping trace of the coalescing partner. With complementary probability ν, the randomly chosen walker is killed. This corresponds to a speciation event in the forward in time description. The simulation proceeds until only one walker is present. Finally, having stored the whole tree of coalescences and knowing which walker was killed one can trace back the entire genealogical tree of a species up to the speciation event that originated it. Then, having labelled each walker in such a way to be able to identify its initial position, one can assign to each site of the lattice. Since the number of walkers decreases at each coalescence or killing event, the simulation time becomes faster as time proceeds. Of course, this procedure can be used only if one is interested in the static, long-term, properties of the model. Spatial patterns for different species in the MVM. Parameters: σ = 9, ν = 3.8 · 10 −6 and N i ∈ (142000 − 157000).

SI-3.1: Effects of α − shapes method
In the case of the above described spatially explicit neutral model we have a complete epistemic knowledge of the system and, in particular, we know that the (eco-)system is homogeneous in the sense that all points occupied by trees are equivalent a priori. This means that there are no reasons to consider borders besides the square where the process is simulated, which should be regarded as a sample of an infinite system. However, as clear from Fig. SI.11 voids are present due to the competition between species. Thus, for the sake of completeness and comparison with BCI, it is worth considering the effect of α-shapes also within the patterns generated with the neutral model.
In Fig. SI.12 we show the border of the spatial patterns for some selected species in a neutral environment together with the value of α we used. Regarding to the g(r), some differences between both cases (see Fig.SI.13a and Fig.SI.13b) can be observed. Firstly, the behavior at short scales is not affected by the α-shape method. That is, clumping behavior is completely robust against changes of the borders. For larger scales, the effect of α − shapes consists of mainly removing anticorrelations. However, we stress again, in this case there is no plausible reason -as also discussed above and in the main text-to remove such anticorrelations in the pattern generated by this model. For the sake of completeness, we also investigated the possible role of boundaries in the behavior of spatial density fluctuations, i.e. in the behavior of the Taylor's law. As shown in Fig.SI.14a and Fig.SI.14b, we have not found any bias on the value of the exponent γ when the α − shape method is employed. Thus, we can safely say that -for reasonable values of al pha, larger than the interparticle distance-large spatial fluctuations do not depend on the selection of the borders for the neutral model, at least for the most abundant species there represented. Figure SI.14. Spatial density fluctuations for the ten selected simulated species with the neutral model (a) without α − shapes and (b) employing the α − shape method with a predefined radius α = 30. There is no qualitatively change between both cases.