Estimating the relative importance of demic and cultural diffusion in the spread of the Neolithic in Scandinavia

Using a database of early farming sites in Scandinavia, we estimate that the spread rate of the Neolithic was in the range 0.44–0.66 km yr−1. This is substantially slower (by about 50%) than the rate in continental Europe. We interpret this result in the framework of a new mathematical model that includes horizontal cultural transmission (acculturation), vertical cultural transmission (interbreeding) and demic diffusion (reproduction and dispersal of farmers). To parametrize the model, we estimate reproduction rates of early farmers using archaeological data (sum-calibrated probabilities for the dates of early Neolithic Scandinavian sites) and use them in a wave-of-advance model for the first time. Comparing the model with the archaeological data, we find that the percentage of the spread rate due to cultural diffusion is below 50% (except for very extreme parameter values, and even for them it is below 54%). This strongly suggests that the spread of the Neolithic in Scandinavia was driven mainly by demic diffusion. This conclusion, obtained from archaeological data, agrees qualitatively with the implications of ancient genetic data, but the latter are yet too few in Scandinavia to produce any quantitative percentage for the spread rate due to cultural diffusion. We also find that, on average, fewer than eight hunter–gatherers were incorporated in the Neolithic communities by each group of 10 pioneering farmers, via horizontal and/or vertical cultural transmission.

. Example of distance computation using google maps. The origin (lower circle) is the site of Oxie and the end (upper circle) is the site of Skee. For some other final locations (instead of Skee), google maps includes sea travel for part of the route, but the distances from the coast are small (<100 km) and therefore reasonable if we take into account the navigation capabilities of early farmers (as implied, e.g., by the fact that the Neolithic arrived to the island of Cyprus, which is about 100 km from the mainland). Although the mathematical model in the main paper does not strictly include sea travel (so that an equation for the spread rate can be found), it is reasonable to assume that it will be approximately valid for so small distances over sea. Another option would be to avoid sea travel in the computation of all distances, but this would contradict the fact that the Neolithic arrived to Sweden across the Baltic sea.
In Fig. S2 we show the regression using these shortest-path distances. Thus Fig. S2 is the same as Fig. 2 in the main paper, but using shortest paths instead of great circles. Obviously, the spread rate implied by the regression fit in Fig. S2 should be faster than that by Fig. 2 in the main paper, because there we have used the shortest possible distance between two points on the Earth surface (great circle). The spread rate implied by the regression fit in Fig. S2 is 0.56-0.84 km/yr, which is similar to that in the main paper (0.44-0.66 km/yr). Indeed, replacing the horizontal rectangle in Figs. 3-7 (main paper) by the spread range from Fig. S2 (0.56-0.84 km/yr), it is very easy to see that the implied percentage for the cultural effect is below 50% in all cases except Fig. 7. As explained in the main paper (Sec. 5), the same happens for the range found in the main paper (0.44-0.66 km/yr). Therefore, we conclude that using our procedure to take into account the effects of landscape, vegetation, etc., does not change the conclusions in the main paper.
Finally we would like to stress that clearly cost distances (here defined as road distances by foot obtained using google maps) are more realistic estimations of length than great circles for individual travels, but we should keep in mind the spread rate of a wave of advance is a macroscopic quantity (i.e., it is valid at scales sufficiently large compared to that of the individual displacements, i.e. to the distances in the dispersal kernel). In other words, we should distinguish between the effects of large and small obstacles. On one hand, large obstacles (compared to the distances in the dispersal kernel) can deflect the wave of advance and thus lead to higher distances (and a faster observed spread rate) than great circles. On the other hand, small obstacles (leading, e.g., to a few curves in a road) will affect the length of an individual travel, but we think that it is not realistic to expect that they will have any noticeable effect on the spread rate of the wave of advance or its direction (note that the vector ∆ , ∆ appearing in Eq. (3.1) or (S5) depends on the initial and final locations of each displacement, but not on the detailed path). Therefore, in our opinion the validity of cost distances should not be overstated, and it is reasonable to consider that both great circles (main paper) and cost distances (this section) give reasonable bounds on the observed spread rate.

S2. Non-linear models
In the main paper ( Fig. 2), we have fitted a linear dependency ( = + ) to the oldest dates of Neolithic sites (variable ) versus their distances from a putative origin of spread (variable ). In this section, we consider more complicated (i.e., non-linear) dependencies, and justify that they are not necessary for our purposes.
The Pearson correlation coefficient, , is defined so that it square is given by [2] where the index = 1,2,3, … , identifies the site, is its observed date (defined as the oldest of all dates obtained by radiocarbon dating), is the average date over all sites ( = ∑ / ), and is the date estimated (by the model considered) for site . If the model produced for all sites exactly their observed dates ( = for all ), the numerator in Eq. (S1) would vanish and = 1. Otherwise, the better the agreement between the model and the data, the lower the value of the numerator in Eq. (S1) and the higher the value of . According to linear regression theory [2], 0 ≤ ≤ 1.
Let us first consider, instead of a linear model ( = + ), a second-order polynomial ( = + + ). This implies to include an additional parameter ( ) that will be fitted to obtain the best possible agreement between the model and the data. Thus the values obtained from the second-order model will tend to be closer to the data than for the linear model, i.e., the values of − will tend to be smaller in magnitude, and Eq. (S1) implies that the value of will be higher than for the linear model (except if = 0, but in this case the value of will remain the same because the polynomial becomes of first order, i.e. we would be dealing again with the linear model). Thus, considering a model with more parameters will never decrease the value of . Therefore, we cannot use the Pearson correlation coefficient to determine which model is better. Instead, according to statistical theory, the so-called adjusted correlation coefficient should be used. Its square is given by [ where is the number of parameters in the model considered. Note that is the same as the Pearson correlation coefficient if = 1, i.e. if the model has only one parameter (this happens, e.g., for the model = ). According to Eq. (S2), increasing the value of tends to decrease the value of , but obviously this also leads to a new fitted function , so according to Eq. (S1) the value of will also change. In general, it is justified to increase the value of (number of parameters in the model) if diminishes sufficiently to cause an increase in the value of , given by Eq. (S2) [2].
The results for the data considered in the main paper are as follows. For the linear model ( = + , Fig. 2 in the main paper), = 0.7705. For the second-order model ( = + + ), = 0.7739. The increase in is very small (below 1%). This suggests that there is no point in complicating the model by including the second-order term ( ). This can be confirmed by visual inspection of the two models or, more rigorously, by computation of the spread rates implied by them, as we explain in the next paragraph.
In Fig. S3 we show the regression fits according to the linear and second-order models. It is seen that both models yield essentially the same dependency. The deviation at distances above 1,200 km is likely due to the fact that there are only 5 dated sites in this range. Indeed, in the range 0-1,200 km the spread rate (inverse of the slope) implied by the second-order model (blue line in Fig. S3) is 0.41-0.70 km/yr. This is very similar to the range estimated from the linear model in the main paper, namely 0.44-0.66 km/yr with 95% confidence level (CL) or 0.40-0.70 km/yr with 99% CL (black line in Fig. S3 or Fig. 2 in the main paper). Therefore, we can be sure that all of the conclusions of the main paper would be the same if we used the second-order model instead of the linear one.
Because there is no substantial increase in , and also because the results would not change, there is no point in complicating the model by including a second-order term. This is why we have used the linear model in the main paper. Below we discuss two other non-linear models and arrive at the same conclusion. We have also considered a third-order polynomial model ( = + + + ). Again, the value of is almost the same as for the first and second-order models above. In fact = 0.7702, which implies that does not increase but decreases, so there is surely no improvement in this case (not even a negligible one).
Finally we consider an exponential model ( = ). We see in Fig. S4 that the behavior is again almost linear. In this case, = 0.7778, so the increase relative to the linear model ( = 0.7705) is again below 1%. Moreover, we can estimate the spread rate by computing the inverse of the slope of the exponential model (red line) in Fig. S4 in the distance range 0-1200 km/yr (as done above for the second-order model). The result is 0.41-0.68 km/yr, once more very similar to the range estimated from the linear model in the main paper (0.44-0.66 km/yr with 95% CL, or 0.40-0.70 km/yr with 99% CL). Therefore, we can be confident that all of the conclusions of the main paper would not change if we considered an exponential model rather than a linear one.
For all three non-linear models tested, complicating the linear model leads to negligible changes in the value of . Clearly, this indicates that there is no point in considering non-linear models. As explained above, this is also confirmed by the corresponding ranges of the spread rate, which are almost the same as for the linear model and would, therefore, lead to the same conclusions as those reported in the main paper.

S3. Justification of the origin of distances
In the main paper, we have used the site of Oxie (4,200 cal yr BC) as the origin of distances, because it is the oldest site in the database. Here we check statistically that Oxie is a reasonable origin, by comparing the corresponding value of the Pearson correlation coefficient to those using other origins. Alternatively, instead of the Pearson correlation coefficient , other measures could be used (e.g., Akaike's Information Criterion, AIC). However, simulated dispersals have shown than those indices are maximized for the same origin as provided that > 0.3. This is indeed our case, because in the main paper we have found > 0.7 assuming Oxie as origin. Therefore, for our purposes it is justified to use the Pearson correlation coefficient .
We defined a square grid with nodes separated 1º latitude and 1º longitude, covering the whole area in Fig. 1 in the main paper. We considered all nodes not located on the sea as possible origins of the Neolithic spread. For each origin, we computed the distance from it to all 70 sites in our database, and used these distances to find the linear regression of the dates of all sites versus these distances. In this way, we obtained the Pearson correlation coefficient for each origin or node. Figure S5 shows an interpolation of these values of . Similarly to previous work [3,4], we identify the most likely area for the location of the origin of the dispersal. This is the red area in Fig. S5, which has the highest values of for our database. As mentioned in the main paper, the site of Oxie is located near the coast of the southern tip of Sweden, so it falls within the red area in Fig. S5. Therefore, it is clearly reasonable to use Oxie as origin of distances. Alternatively, we could use as origin the only node within the red area in Fig. S5 that has a value of higher than Oxie, namely the point with coordinates (56N, 14E). Using all 70 sites, this would yield = 0.799. The difference in the value of relative to Oxie ( = 0.792) is very small (below 1%). Moreover, the spread range using the point (56N, 14E) as origin is 0.39-0.57 km/yr. This is very similar to that estimated using Oxie as origin in the main paper (0.42-0.62 km/yr, also with 70 sites), so we can be sure that the conclusions of our paper would not change (in fact they would be strengthened, because a slower observed speed corresponds to a lower cultural effect).
For completeness we mention that, if we preferred to use only 63 sites (instead of all 70 sites in the database), i.e., if we excluded the 7 sites in Denmark and Finland (as explained in the main paper), the results would be 0.40-0.60 km/yr ( = 0.786) for the origin at (56N, 14E), versus 0.44-0.66 km/yr for the origin at Oxie ( = 0.775) Again both ranges are so similar that the conclusions of our paper would be the same (in fact they would be strengthened if using the origin at (56N, 14E), for the reason given in the last sentence of the previous paragraph).

S4. Details on the estimation of the spread rate
The great-circle distance between two locations i and j as a function of their geographical coordinates (latitude φ and longitude λ) is given by the Haversine equation [5], where is the mean Earth radius ( =6371 km).
As in previous work [6], we consider time as the dependent variable because it has presumably more error than distance (see main paper, Sec. 2). Therefore, the spread rate (in km/yr) is obtained as the inverse of the slope of the time-space regression, i.e. where slope  is standard error of the slope. We have used Eqs. (S2) and (S3) to compute the 95% confidence-level (CL) interval for the spread rate, i.e., the range [8,9]   speed speed where t is Student's t-distribution for a 95% CL and − 2 degrees of freedom, and is the number of sites. In the present work ≥ 63 and therefore ≈ 2 [8,9].

S5. Details on the wave-of-advance model
The evolution of the population densities of farmers and hunter-gatherers is driven by Eqs.
where ∆ , ∆ is the dispersal kernel of population =N, P. The integrals in Eqs. (S5) cover the whole space because, in general, humans can move arbitrary distances and directions. However, below Eq. (S14) and in the main paper we consider specific dispersal kernels, so that in practice the distances and probabilities of motion are not arbitrary.
In Eqs. (S5) we have introduced
Horizontal transmission (acculturation) is given by . The additional parameter in Eq.
(S8) takes into account that a HG can have different probabilities to learn from other HGs than from farmers [12].

(S9)
This equation follows from the fact that cultural transmission theory [11,13] has shown by that the probability that a farmer mates a HG is proportional to the frequency of HGs, i.e. . Under random mating = 1, but members of two populations with substantially different cultures (farmers and HGs in our case) obviously do not mate at random. Multiplying this probability by the number of farmers, i.e., [ ( , , )], yields the number of HGs that become farmers due to interbreeding, i.e., Eq. (S9) [13].
We apply the usual mathematical approach (linearization method) [14] to derive the front speed by considering a region where the pioneering populations of farmers arrive but hunter-gatherers are still close to their carrying capacity, i.e. (S13) From Fig. 1 in the main paper, it is obvious that data are not detailed enough to measure different rates in different regions or directions. Therefore, there is no need to complicate the model with such features, and we can simply assume that all parameter values (and thus the speed) are uniform and isotropic. Then the easiest way to compute the speed is to do so along the -axis ( = 0), by considering a coordinate frame = − moving with the wave of advance (we have introduced the symbol s for the spread rate or front speed). In order to find the speed we have to assume, as usual [14], that the population density decreases exponentially at large distances, ( , , ) ∝ exp [− ] → 0 for → ∞ (with > 0). In this way, Eq. (S13) yields where we have applied polar coordinates (∆= ∆ + ∆ , = ∆ ∆ , and ∆ ∆ = Δ Δ ), so that the angle is defined so that ∆ = Δ cos .
When dispersal data are measured, they are grouped in histograms or kernels per unit length, (∆), defined as the probability to disperse into a ring of radius Δ and width Δ, divided by Δ. Therefore, if the data indicate that N-individuals (i.e., farmers) have probabilities to disperse at distances (j=1,2,...,M), we can write that where ( ) ( ) is the 1D Dirac delta centered at (i.e., a function that vanishes everywhere except at ∆= ). Obviously the total probability must be one, and (∆) is clearly a probability per unit length. In contrast, the kernel ∆ , ∆ used up to now is a probability per unit area (because in Eq. (S5) it is multiplied by ∆ ∆ , which has units of area). Therefore, where we have assumed the kernel is isotropic, ∆ , ∆ = (Δ) (because there are no ethnographic data detailed enough to detect anisotropies). Comparing the former two equations, we see that the dispersal probability per unit length (Δ) is related to that per unit area (Δ) as [14] (Δ) = 2 Δ (Δ) and thus Finally, according to linear stability analysis, the minimum speed is that of the front [15]. Therefore, we obtain from Eq. (S14) that the spread rate of the farming waves of advance driven by Eqs. can be considered as a measure of the joint intensity of horizontal and vertical cultural transmission because, without net reproduction ( = 0) neither dispersal (i.e., replacing the integrals by ( )), Eq. (S13) becomes simply in Eq. (S15) is the modified Bessel function of the first kind and order zero.
As mentioned above, dispersal data are usually reported using histograms, in which the quantity (which appears in Eq. (S15)) is the probability for farmers to disperse a distance , which is the mean distance of the corresponding histogram bin, and each bin is identified by the index = 1,2, … , .
Finally we note that the assumption that space is homogeneous, in the sense that humans can live everywhere on the landscape, is necessary to obtain Eq. (S15), because otherwise it would not be possible to perform the integrals in Eqs. (S14) analytically.