Technology networks: the autocatalytic origins of innovation

We analyse the autocatalytic structure of technological networks and evaluate its significance for the dynamics of innovation patenting. To this aim, we define a directed network of technological fields based on the International Patents Classification, in which a source node is connected to a receiver node via a link if patenting activity in the source field anticipates patents in the receiver field in the same region more frequently than we would expect at random. We show that the evolution of the technology network is compatible with the presence of a growing autocatalytic structure, i.e. a portion of the network in which technological fields mutually benefit from being connected to one another. We further show that technological fields in the core of the autocatalytic set display greater fitness, i.e. they tend to appear in a greater number of patents, thus suggesting the presence of positive spillovers as well as positive reinforcement. Finally, we observe that core shifts take place whereby different groups of technology fields alternate within the autocatalytic structure; this points to the importance of recombinant innovation taking place between close as well as distant fields of the hierarchical classification of technological fields.


LN, 0000-0001-7829-7340
system they populate. This has implied a change of perspective from linear narratives, in which the direction of causality connecting phenomena is unambiguous, towards processes of cumulative causation [1,2]. This relation is particularly evident in the study of biological systems, where models of pure resource competition are unable to explain the persistent variety of ecosystems. To this aim, it is useful to introduce the idea that some species in a heterogeneous population can serve as catalysts (or inhibitors) for the survival of other species. 
The interaction between prey and predators is one of the best-known examples of this kind of relation, but a similar mechanism has also been observed in settings like plant-pollinator interactions, opportunistic behaviour and symbiotic relations [3]. A particularly relevant feature of these ecological systems is the presence of autocatalytic sets (ACSs) [4]-self-sustaining subsystems, in which each species benefits directly or indirectly from its cohabitation with the others. The relevance of interactions in the above framework lends itself to a complex systems interpretation for which networks are a natural tool of analysis.

Introduction
The idea of catalytic interaction can be fruitfully extended to human systems and, in particular, to the realm of technological innovation. In this setting, interactions take place between technological fields whenever existing ideas are applied to new problems or used to bridge previously unrelated fields, thereby expanding the set of technological capabilities and spawning further innovation [5][6][7][8][9][10]. Consider, for instance, the development of lasers, which have opened the way to a number of innovations in different industries, ranging from telecommunication to data storage and health care. Also the combination of multiple technological fields can produce radical-possibly disruptive-innovations or technological 'convergence' [11][12][13]. An example of the former is the case of opto-electronics, since optical and electrical devices lie at the root of the technological framework for modern telecommunication systems; a notable example of convergence is instead provided by smart phones and electronic tablets, which combine functionalities that could previously be found only separately in computers, telephones and television sets. In general, combining knowledge from previously isolated domains has become extremely relevant in several innovation-oriented domains, such as academic research projects-which often involve scientific collaborations between groups with heterogeneous backgrounds [14]-and industrial endeavours-where R&D collaborations have become common practice [15][16][17] especially in sectors characterized by a quick pace of technological progress (e.g. biotechnology [18] and information technology [19]). For this reason, the network structure of both scientific [20][21][22] and industrial collaborations [23] has been studied in depth in the past.
This study investigates the autocatalytic structure of the network of interactions between technological fields extracted from patent data and shows how the tools of complex systems analysis are able to forecast the evolution and future relevance of individual fields based on their role in the interaction network. Following [24], we say that a technological field is catalytic for another one if the development in a region of innovations involving the former is positively associated with the future emergence of innovative expertise in the latter in the same region. We employ patents as a proxy for inventions, in line with an established body of research about the patterns of technological change, which has been pioneered by the scholarly [25,26] and institutional effort to tap into their potential to shed light on relevant open questions concerning the drivers of technological progress, the relative importance of technological domains, and the significance of technological proximity vis à vis technological variety for the emergence of radical and incremental innovation [27]. One of the decisive advantages of using patents as sources of data stems from the fact that exclusive commercial rights to an invention are granted to applicants provided that they publish a complete description of the patented invention allowing it to be replicated by others once the exclusive rights expire. In order to assess innovativeness, patent offices map the claimed novel features of each invention to the technological fields it impacts through a standard classification system and collect the above information into dedicated databases. This leads both to an extensive coverage of the innovation spectrum and a high degree of standardization that allows large-scale analysis.
The rest of the paper is organized as follows. Section 2 defines the technological network based on patent classification codes; §3 summarizes the features of autocatalytic networks; §4 reports the results of our study of the autocatalytic structure of the technology network, and §5 concludes.

Connecting regions and technological fields
Our analysis relies on the patent data contained in PATSTAT [28], a comprehensive database collecting information about applications filed at national and regional patent offices all around the world. PATSTAT contains several tables linking over 50 million patent applications to information such as the filing date of applications, the patent families they belong to, and their technological content as encoded by the International Patent Classification (IPC) codes assigned to the patent claims by patent office examiners. IPC codes define a hierarchical classification consisting of five levels (sections, classes, subclasses, groups, subgroups), which includes eight codes at the coarsest level (sections) and over 70 000 codes at the bottom of the classification tree.
We associate patents with the location of their assignees through ORBIS, a commercial database of firm-level data maintained by Bureau van Dijk, which collects the list of PATSTAT identifiers of the applications filed by companies that have been active in patenting at some point in time. We match the technological codes associated with patent families with the firm-level data so as to unambiguously localize firms geographically through their country of residence and their postal code. This allows us to construct a geographical matrix attributing IPC codes to regions through the patent portfolios and the geographical locations of the companies. We observe different regions worldwide across time to uncover the effects that innovation in a field produces on other fields within the same region. In order to build comparable regions across different countries, we need a spatial identification system. For European countries, we connect postal codes of patenting companies to the associated NUTS 3 regions of the standard European classification (corresponding e.g. to provinces in Italy and districts in the UK). Since extra-European countries are not included in the NUTS classification, we resort to national classifications when necessary and align them to achieve a broad overall accordance with the employed European classification and hence ensure the consistency of the geographical tree. Our data define a spatial hierarchy comprising over 3000 regions in 39 countries and aggregating information about the location of around 500 000 patenting firms.
As mentioned above, the basic units of observation for the construction of the data matrices (W(t)) are individual firms and the patents they own. In particular, we group patents into families and consider the latter as individual inventions because of the strong contiguity between documents they group together. In building the matrices, we assume that every family containing patents filed in year t counts as one unit and weighs accordingly within W(t). Moreover, we make the hypothesis that the technologies expressed within a patent family can be reasonably accounted for by considering the set of unique IPC codes they contain. This way, we avoid double-counting codes that appear in patent applications filed in the same year and belong to the same family. We evenly split the unit of weight attributed to each family that is active in a given year between all unique combinations of technology codes and regions it maps to. We thus define element W r,i (t) as the sum, for all families containing patents filed in year t, of the shares attributed to field i and region r.
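The weighting scheme just described can be sketched in a few lines. This is only an illustration under stated assumptions: the input layout (one pair of IPC-code list and region list per family), the function name `build_weights` and the toy codes and regions are hypothetical, and "all unique combinations of technology codes and regions" is read here as the Cartesian product of the unique codes and unique regions of a family.

```python
from collections import defaultdict


def build_weights(families):
    """Return {(region, field): weight} for the families filed in one year.

    `families` is a hypothetical input format: an iterable of
    (ipc_codes, regions) pairs, one pair per patent family.
    """
    W = defaultdict(float)
    for ipc_codes, regions in families:
        # Unique (region, field) pairs, so repeated codes are not double-counted.
        pairs = {(r, i) for r in set(regions) for i in set(ipc_codes)}
        if not pairs:
            continue
        share = 1.0 / len(pairs)  # each family carries one unit of weight
        for pair in pairs:
            W[pair] += share
    return dict(W)


# Toy data: two families filed in the same year (illustrative only).
families_1990 = [
    (["A61", "C07", "C07"], ["R1"]),   # duplicate code counted once
    (["H04"], ["R1", "R2"]),           # one field, two regions
]
W_1990 = build_weights(families_1990)
```

By construction, the weights of each family sum to one, so the total of all entries equals the number of families filed in that year.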
For the purpose of the analysis, we need to transform each W(t) into a presence-absence matrix. In line with the literature, we assign a value of 1 to a location-technology pair if the corresponding value W_{r,i}(t) is compatible with a measure of revealed advantage. This allows us to reduce noise and avoid overstating the relevance of technological fields in those regions in which they play only a marginal role. In particular, we use revealed comparative advantage [29] to produce a matrix M(t) in which M_{r,i}(t) is recorded as a presence (M_{r,i}(t) = 1) if in the corresponding W(t) we have that
\[ \frac{W_{r,i}(t) \big/ \sum_{j} W_{r,j}(t)}{\sum_{r'} W_{r',i}(t) \big/ \sum_{r',j} W_{r',j}(t)} \geq 1, \tag{2.1} \]
and an absence (M_{r,i}(t) = 0) otherwise.
Owing to time lags between patent filing and data publication by patent offices, the version of PATSTAT we employ (2014a) contains reliable data up to 2011, after which coverage falls sharply. For this reason, 2011 is the most recent year we include in the analysis. As for the left extreme of the time interval, coverage is not an issue, especially for the second half of the twentieth century, even though the number of patents filed decreases quickly going backwards. We stop at 1980, because it strikes a balance between having a long time interval for the analysis and dealing with yearly M(t) matrices that are not excessively sparse.
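As a rough illustration of the binarization step, the sketch below assumes the standard Balassa form of revealed comparative advantage (the share of field i in region r divided by the share of field i overall) with a threshold of 1. The function name `rca_presence` and the toy matrix are hypothetical.

```python
import numpy as np


def rca_presence(W):
    """Binarize a region-by-field weight matrix with a Balassa-type RCA >= 1 rule."""
    W = np.asarray(W, dtype=float)
    region_tot = W.sum(axis=1, keepdims=True)   # total weight of each region
    field_tot = W.sum(axis=0, keepdims=True)    # total weight of each field
    total = W.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        rca = (W / region_tot) / (field_tot / total)
    # Empty rows or columns produce nan/inf; treat them as absences.
    rca = np.nan_to_num(rca, nan=0.0, posinf=0.0)
    return (rca >= 1.0).astype(int)


# Toy example: two regions (rows) and two fields (columns).
W = np.array([[8.0, 2.0],
              [1.0, 9.0]])
M = rca_presence(W)
```

In this toy case each region is recorded as present only in the field where its share exceeds the overall share of that field, so marginal activity (the off-diagonal entries) is filtered out.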

Directed network between technological fields
The aim of this work is to measure the relationship between the patenting activity taking place within a geographical region (r) involving technological field i at time t and the patenting activity performed in r involving a possibly different field (j) at time t + δ. To this end, we count how often patents in field i are present at time t in regions that produce patents in field j at time t + δ. We discount for regional diversification d_r(t), i.e. the number of fields in which region r is active at time t, and for the ubiquity of different fields u_i(t), i.e. the number of regions in which each field is represented at time t, to establish a measure of the excess probability that innovation in a technological field precedes innovation in another field in the same place. Applying the procedure proposed in [24] to the conceptual framework proposed in [30][31][32], we obtain
\[ B_{i,j}(t,\delta) = \frac{1}{u_i(t)} \sum_{r} \frac{M_{r,i}(t)\, M_{r,j}(t+\delta)}{d_r(t+\delta)}, \tag{2.2} \]
where d_r(t) = Σ_j M_{r,j}(t) and u_i(t) = Σ_r M_{r,i}(t). B_{i,j}(t, δ) can be interpreted as the probability that a region which has a revealed comparative advantage at time t in the field i will also display a revealed comparative advantage at time t + δ in field j:
\[ B_{i,j}(t,\delta) = \sum_{r} \mathrm{probability}(j, t+\delta \mid r)\; \mathrm{probability}(r \mid i, t), \tag{2.3} \]
assuming that the information about the capabilities linking pairs of technological fields is fully captured by their co-occurrence within each region, i.e. that probability(j, t + δ | r, i, t) = probability(j, t + δ | r).
An equivalent way of interpreting equation (2.3) is illustrated in figure 1, which depicts the tripartite directed network connecting (i) technological field i at time t to (ii) regions, and regions to (iii) technological field j at time t + δ. In this framework, B_{i,j}(t, δ) is equivalent to the probability that a random walk on the network starting from technology i at time t reaches technology j by time t + δ. It is worth noting that the choice of the scale of analysis is not neutral since, like any socioeconomic process, innovation also has different characteristics depending on the resolution at which it is observed. 
However, equation (2.3) can be naturally applied to observe the system at different technological granularities. In what follows, we employ IPC classes for the analysis presented in the main text and later check the robustness of the results on the more disaggregated IPC subclasses (see appendix A). Classes and subclasses are two nested levels of the IPC hierarchy that divide the spectrum of technological fields associated with patents, respectively, into 121 and 640 fields.
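The random-walk reading of B(t, δ) can be sketched as a product of two normalized matrices. The snippet assumes a particular normalization, which should be checked against [24]: the first step divides by the ubiquity of the source field at time t, and the second step divides by the diversification of the region at time t + δ. The function name `transition_matrix` and the toy matrices are hypothetical.

```python
import numpy as np


def transition_matrix(M_t, M_next):
    """Two-step random walk: field i at t -> region r -> field j at t + delta.

    M_t and M_next are region-by-field presence matrices (0/1) for the two years.
    """
    M_t = np.asarray(M_t, dtype=float)
    M_next = np.asarray(M_next, dtype=float)
    u = M_t.sum(axis=0)        # ubiquity of each field at time t
    d = M_next.sum(axis=1)     # diversification of each region at time t + delta
    # Step 1: from field i, pick one of its u_i regions uniformly at random.
    step1 = np.divide(M_t, u, out=np.zeros_like(M_t), where=u > 0)
    # Step 2: from region r, pick one of its d_r fields uniformly at random.
    step2 = np.divide(M_next, d[:, None], out=np.zeros_like(M_next),
                      where=d[:, None] > 0)
    return step1.T @ step2     # field-by-field matrix B


# Toy example with two regions (rows) and two fields (columns).
M_1990 = [[1, 0],
          [1, 1]]
M_1995 = [[1, 1],
          [0, 1]]
B = transition_matrix(M_1990, M_1995)
```

With these toy inputs every row of `B` sums to one, which is what the probabilistic interpretation requires whenever each source field is present somewhere at time t and each of its regions is active at time t + δ.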

Assessing link significance
Matrix B(t, δ) of equation (2.2) represents a directed weighted network connecting all technologies. However, to decide whether a link between two technologies is statistically significant, a null model is required to account for the fact that some links could seem relevant owing to the properties of the graph without, however, being the product of any actual catalytic effect. For example, very advanced technologies can be developed only by a minority of regions around the world and, for this reason, the associated codes might often appear in the same region at different points in time, even though no catalytic relation connects them. Following [24], we use the bipartite configuration model [33] to test the empirical network against a randomly generated counterpart that displays, on average, the same degree distribution. Thus, in the random graphs we generate, each region has the same expected diversification in terms of technological codes as the empirical data and each technological code has the empirically observed expected ubiquity. This choice of null model implies that the degree of the nodes is the only information we extract from the empirical matrices to construct the null matrices. By generating a large number of null matrices with the same null model (in our case, 1000 null matrices for each pair of years), it is possible to establish the significance of each link between technologies. We compute a matrix P, in which element P_{i,j} represents the percentile of the null distribution associated with B_{i,j}. This allows us to define the statistical significance of each individual link.
Finally, we construct the unweighted directed adjacency matrix C that contains only the significant links in the network. A link from field i to field j is included in C if the corresponding P_{i,j} is larger than a fixed threshold, say 1 − p. However, if this comparison is performed separately for each pair of fields based solely on the significance of individual links, by definition, we expect a share p of false positive links to be retained and the probability of false positive links for the whole network to be higher than the desired level p. This is known as the multiple comparisons problem in the statistics literature [34]. A method to accurately control for the proportion of false positive links simultaneously for all pairwise link tests, and thus maintain the overall significance level p, is the false-discovery-rate procedure of Benjamini & Hochberg [35]. The network used in our analysis is constructed by applying this procedure and retaining the significant links in the network derived from matrix P that, according to the false-discovery-rate procedure, ensure an overall significance level p = 5%.
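The percentile step can be illustrated independently of how the null matrices are produced. The sketch below does not implement the bipartite configuration model; it simply assumes that an ensemble of null counterparts of B is already available as a stacked array, and the function name `null_percentiles` and the toy numbers are hypothetical.

```python
import numpy as np


def null_percentiles(B_emp, B_null):
    """Share of null matrices whose entry lies strictly below the empirical one.

    B_emp has shape (N, N); B_null has shape (n_null, N, N).
    """
    B_emp = np.asarray(B_emp, dtype=float)
    B_null = np.asarray(B_null, dtype=float)
    return (B_null < B_emp[None, :, :]).mean(axis=0)


# Toy ensemble of four constant null matrices (illustrative only).
B_emp = np.array([[0.5, 0.1],
                  [0.2, 0.9]])
B_null = np.stack([np.full((2, 2), v) for v in (0.0, 0.15, 0.3, 0.6)])
P = null_percentiles(B_emp, B_null)
```

An entry of `P` close to 1 means the empirical weight exceeds almost all of its null counterparts, which is the situation in which a link is a candidate for being significant.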
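The false-discovery-rate step can be sketched as follows. The snippet implements the standard Benjamini-Hochberg step-up rule on a matrix of link p-values; treating 1 − P_{i,j} as the p-value of link (i, j) is an assumption made here for illustration, and the function name and toy values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of the tests retained by the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)                       # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for rank k
    passed = flat[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()            # largest rank meeting its threshold
        keep[order[: k + 1]] = True                # retain all tests up to that rank
    return keep.reshape(p.shape)


# Toy p-values for a 2 x 2 network (assumed to be 1 - P for each link).
p_links = np.array([[0.001, 0.04],
                    [0.03, 0.5]])
C = benjamini_hochberg(p_links, alpha=0.05).astype(int)
```

In this toy case the links with p-values 0.03 and 0.04 would pass a naive link-by-link test at the 5% level but are discarded once the multiple comparisons are taken into account.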

Autocatalytic networks
The matrix C defined in §2.3 is a directed adjacency matrix representing the links between the N nodes that represent technological fields: a directed link from a technological field i (source) to another technological field j (receiver) is present (C_{i,j} = 1) if there is a significant signal suggesting that patenting activity in i promoted patenting activity in j in the same region. Within this representation, directed technological links either exist or not, and the technology network is completely specified by a binary adjacency matrix of size N × N with elements C_{i,j}. We take inspiration from Kauffman [1] and Jain & Krishna [4,36], who propose a model of network evolution for biological systems based on a catalytic interpretation of the relationship between species, and apply this interpretation to the case of interactions between technologies. To describe the fundamental analytical properties of catalytic systems, we use the simplest dynamical model
\[ \dot{y}_i = \sum_{j} C_{j,i}\, y_j, \tag{3.1} \]
where y_i is the intensity of activity in technological class i. In this simple example, activity in field i pushes the innovation activities in every field j such that C_{i,j} = 1. Notice that the class of models proposed in [36] is vast and the following results are still valid for more realistic models of the dynamic behaviour of the innovation system. Consider, for instance, an empty network (C_{i,j} = 0 ∀i, j). In this toy example, ẏ_i = 0 ∀i and any initial condition is an equilibrium. Let us now add a link from i to j. In this case, we have that ẏ_j = y_i, which implies y_j(t) = y_j(0) + y_i(0)t while all other nodes still experience a constant activity. When more links are present, we can have different polynomial behaviours for the activity in different technology classes. An interesting case arises when there is a direct or indirect reciprocal influence between i and j. 
In this case, i and j form an ACS, which, in terms of the technology network, implies that the more patents employ technology i, the more will also employ technology j, and vice versa. The catalytic cycle creates an exponential dynamic involving the innovation activities acting on fields in the set. This exponential behaviour is in line with the empirical observation of the innovation system, both in terms of patents and in terms of productivity growth. More generally, we can see the same exponential dynamics for any arbitrarily long cycle, i.e. for any closed path connecting a subset of the nodes in the network.
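The contrast between linear growth along a single link and exponential growth within a cycle can be checked numerically. The sketch below assumes the convention used in the text, where a link from i to j (C[i, j] = 1) feeds the growth of y_j, so the system matrix is the transpose of C; the solution is approximated with a truncated Taylor series of the matrix exponential, and the function name `propagate` is hypothetical.

```python
import numpy as np


def propagate(C, y0, t, terms=60):
    """Approximate y(t) for dy_j/dt = sum_i C[i, j] * y_i via a Taylor series."""
    A = np.asarray(C, dtype=float).T      # transpose: link i -> j feeds node j
    y0 = np.asarray(y0, dtype=float)
    result = y0.copy()
    term = y0.copy()
    for k in range(1, terms):
        term = (A @ term) * (t / k)       # k-th Taylor term: (A t)^k y0 / k!
        result = result + term
    return result


# A single link 0 -> 1: node 1 grows linearly, node 0 stays constant.
single_link = [[0, 1],
               [0, 0]]
y_link = propagate(single_link, [1.0, 1.0], t=3.0)

# A two-node cycle 0 -> 1 -> 0: both nodes grow exponentially.
cycle = [[0, 1],
         [1, 0]]
y_cycle = propagate(cycle, [1.0, 1.0], t=3.0)
```

For the single link the result matches y_j(t) = y_j(0) + y_i(0)t, while for the cycle with equal initial conditions both activities follow e^t.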
To this end, it is useful to introduce the notion of an ACS, which is defined as a 'subgraph, each of whose nodes has at least one incoming link from a node belonging to the same subgraph' [37, p. 7]. In what follows, any set of nodes connected through one or more cycles will be called the core of its corresponding ACS, which, in addition to the nodes in the core, also includes the periphery, i.e. the set of nodes that are catalysed by the core but have no outgoing links feeding into the closed path. Because of this configuration, peripheral nodes have a passive role in the ACS; however, they still benefit from the boost provided by their incoming links. Figure 2 depicts a simple network containing an ACS and highlights the relevant subsets in which its nodes can be partitioned.
By expressing equation (3.1) in its matrix form, ẏ = Cᵀy (the transpose appears because C_{i,j} = 1 denotes a link from i to j), it is easy to show that the presence of cycles is linked to the presence of positive eigenvalues of the adjacency matrix C, which has the same eigenvalues as Cᵀ. Since the adjacency matrix is non-negative, the Perron-Frobenius theorem guarantees the existence of a real eigenvalue that is no smaller than the modulus of any other eigenvalue and is called the Perron-Frobenius eigenvalue (PFe), λ_1. It is possible to prove [38,39] that a cycle exists in the graph if the PFe is greater than 0. A formal proof of the theorem is beyond the scope of this paper, but it can be shown that, if the PFe is greater than 0, any innovation activity corresponding to a positive element of the Perron-Frobenius eigenvector (PFE), y_1, experiences an exponential growth because ẏ_1 = λ_1 y_1. Indeed, the PFe is informative of the presence or absence of closed directed paths (loops) in the graph, and its corresponding eigenvector has non-zero elements corresponding to the nodes that belong to the ACS. A larger PFe indicates a faster exponential growth driven by a higher connectivity in the core of the ACS. Note that, more generally, a matrix can have more than one PFe (and corresponding PFE) if it has more than one ACS (two ACSs are distinct if there is no path connecting their cores).
In what follows, we show how this very simple model can both (i) give us a novel understanding of the innovation system as a process of cumulative causation, and (ii) identify the core technologies in the evolving technological landscape.
After generating the network of technology classes, we map the network searching for an autocatalytic structure as defined in §3. A unique ACS was present in the network between 1980 and 2010, the size of which increased over time. At the beginning of the period, multiple clusters were present that did not form an ACS because no closed path between them was present. By 1998, most clusters had been 'captured' by the ACS, which in 2010 spanned approximately half of the technology network. In particular, the core occupied the largest portion of the ACS and peripheral nodes were only a minority. The plots in figure 4 clearly show that the autocatalytic character of the network has become stronger over time and that its growth seems to have occurred in two phases: a first phase in which the periphery expanded (figure 4c) and a second one during which most of the technologies in the ACS transitioned to the core (figure 4c,d).
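A minimal numerical illustration of the link between the PFe, its eigenvector and ACS membership is given below. It assumes the convention that C[i, j] = 1 denotes a link from i to j, so the eigenvector is computed for the transpose of C; the function name `perron_frobenius`, the tolerance and the toy graph are hypothetical, and the graph is chosen to contain a single ACS.

```python
import numpy as np


def perron_frobenius(C):
    """Return the largest real eigenvalue of C and the matching eigenvector of C^T."""
    A = np.asarray(C, dtype=float).T          # link i -> j feeds node j
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # index of the Perron-Frobenius eigenvalue
    lam = float(eigvals[k].real)
    vec = np.abs(eigvecs[:, k].real)          # fix the arbitrary sign of the eigenvector
    return lam, vec


# Toy graph: 0 <-> 1 form a cycle (core), 1 -> 2 (periphery), node 3 is isolated.
C = np.zeros((4, 4), dtype=int)
C[0, 1] = 1
C[1, 0] = 1
C[1, 2] = 1

pfe, pfe_vector = perron_frobenius(C)
in_acs = pfe_vector > 1e-9                    # non-zero entries mark ACS members
```

Here the PFe equals 1 because the core is a simple two-node cycle, and the eigenvector is non-zero on the two core nodes and on the peripheral node they catalyse, but zero on the isolated node.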

Fitness
Having observed the emergence of an autocatalytic structure in the technology network of classes, we want to understand how this structure affects the fitness of the technological fields. In this article, we define the fitness of a technological field as the number of patent applications filed in a certain year that innovate in said field. The idea is that a higher patenting rate is indicative of a higher technological productivity, and a proxy of the 'innovativeness' of a technology class. By comparing class fitness between different parts of the network (namely the core, the ACS and the rest of the nodes) we test our hypothesis that technology classes inside the ACS display higher fitness than other classes. Figure 5 shows the time series of the fitness of the classes belonging to the core (red), to the whole ACS (blue), and to the rest of the technology network. Figure 5a reports the average fitness of nodes in each of the above-mentioned subsets. It clearly emerges that the nodes in the ACS, and especially those in the core, showed an increasing average fitness, while the fitness of the nodes in the rest of the network fluctuated around a constant value: it was two times smaller than the fitness of the nodes in the ACS at the beginning of the sample and 10 times smaller by 2011. The average fitness in the ACS and in its core remained largely of the same order of magnitude, although for nodes of the core it was almost always larger (with the exception of 1990 and 1996). These figures lend support to the hypothesis that technology classes belonging to the ACS benefited from an autocatalytic advantage. Figure 5b proxies an absolute measure of fitness by reporting the time series of the total number of patents in the ACS, its core and the rest of the network. The plot shows that there were many more patents outside the ACS until the early 1990s, when only a few nodes were part of the ACS (figure 3). 
However, as soon as the size of the ACS started increasing substantially, its absolute fitness quickly surpassed the total fitness of the rest of the network by a large measure. Although less meaningful than the average fitness time series, this figure is evidence of the transition experienced by the network of technological fields, with the emergence of a large autocatalytic structure around the end of the century. Our analysis suggests that technology classes belonging to the ACS are more innovative owing to the positive effect of catalytic links on knowledge flows. In particular, classes in the core of the ACS experience the positive feedback of self-reinforcing cyclical catalytic structures, which give rise to cumulative processes of innovation. Figure 5c reinforces the above intuition by showing the evolution of the distribution of total fitness between the core of the ACS, the periphery and the rest of the nodes. It shows that in the early 1980s, the autocatalytic structure was rather marginal in the network of technology classes, since it included only around 10% of the total number of patents. After 1985, an increasing trend started whereby the ACS gained weight in the network and became clearly predominant, to the point that it concentrated almost 80% of total fitness in the last decade of the sample. This was mirrored by a significantly more contained increase in the share of nodes included in the ACS, as shown in figure 5d. Considering that just above half of the classes in the network became part of the ACS in the same period, there is evidence of a strong correlation between the prominence of the ACS of the technology network and the intensity of patenting activity in the fields comprising it. The above is a strong indication of the cumulative causation process behind innovation as identified by patenting activity. 
If technology classes in the ACS grow while classes outside the ACS do not, then catalytic inter-linkages between classes are a relevant driver of innovation. Taking patenting intensity as a meaningful proxy for innovation, the empirical evidence presented in figure 5 suggests that the substantial growth of a technology class is linked to its connection to other classes 'feeding' into it with incoming links. This is to say that a flow of knowledge stemming from the source of the directed link  can provide new knowledge and thus the basis for a new patent in the target class. The fact that stronger growth for a technology class comes from belonging to the core shows that innovation is fostered not only by technology spillovers from other classes, but especially from membership of a cycle of mutually reinforcing spillovers.

Autocatalytic structure and database hierarchy
In this section, we turn to the hierarchical structure of the database to investigate its role in the autocatalytic structure of the technology network. As mentioned in §2.1, PATSTAT adopts the IPC, which has a tree-like structure consisting of eight sections at the root, which branch out into the progressively finer-grained classes, subclasses and so on. For the present analysis, we use classes and subclasses as nodes of the technology network, but we are also interested in understanding how the hierarchy induced by the IPC maps to the ACS. To this end, we employ the IPC sections:
- human necessities (A)
- performing operations; transporting (B)
- chemistry; metallurgy (C)
- textiles; paper (D)
- fixed constructions (E)
- mechanical engineering; lighting; heating; weapons; blasting (F)
- physics (G)
- electricity (H)
to cluster the nodes and see how sections map onto the structure of the technology network. In particular, we ask if the ACS discriminates sections and if significant links cut across section borders. These are not just questions of topological nature, because the answers shed light on whether recombinant innovation and, more specifically, the cumulative causation process of autocatalytic structures takes place mainly within classes or if, instead, it also involves broader connections between the coarse technological areas identified by IPC sections. Figure 6 reports the share of each IPC section within and outside the ACS thus showing how relevant each section is for the two subsets of nodes. The main result, displayed in figure 6a, is that the ACS is not at all static. At the beginning of the sample, the ACS consisted almost entirely of the classes contained in section C (chemistry; metallurgy), with the only exception of one node of section A (human necessities). 
Over time, a more variegated picture unfolded and, by 2011, the ACS spanned all sections but one-D (textiles; paper)-which remained outside the ACS consistently, only making a few sporadic appearances. A further observation suggested by figure 6 concerns the different composition of the ACS and the rest of the network; while the latter presents a quite uniform distribution of sections along the whole period 1980-2011, the ACS is characterized by less uniformity and a richer dynamic.  It is also possible to quantitatively assess the non-uniform distribution of sections in the growing autocatalytic structure of the technology network. To this end, we can imagine having an urn filled with marbles of different colours, each corresponding to a distinct section, in which every marble represents an IPC class. There is a different number of marbles of each colour in the urn and we want to extract as many marbles as there are classes in the ACS. The null hypothesis is that the sampling is random, meaning that marbles are picked blindly from the urn. The alternative hypothesis is that the sampling process is preferential and tends to privilege a specific subset of colours. We test the null hypothesis of random sampling against the alternative hypothesis of biased sampling from Fisher's non-central hypergeometric distribution [41] and find that the statistic relative to every year from 1980 to 2011 is above the critical value for a significance of 5% (see appendix B for further details). This indicates a significant bias in favour of some sections in terms of occupancy of the ACS over the period considered and shows that, though the expansion of the ACS has brought more sections in the ACS and its core, a non-uniform distribution remains.
It is also interesting to consider the share of nodes belonging to each section that are part of the ACS. For instance, figure 7 shows that one section, H (electricity), has been entirely contained in the core of the ACS in the more recent years of the sample, while the situation of section E (fixed constructions) is less clear, also owing to large fluctuations in the final decade. Sections F and G seem to display a somewhat growing trend in the share of classes they contributed to the ACS. The remaining sections maintained
Up to now, we have seen that the expansion of the ACS has brought more sections in the ACS and its core, though a non-uniform distribution has persisted, with some sections remaining almost absent from the ACS throughout the whole period. Moreover, we observe a shift from A (human necessities) and C (chemistry and metallurgy) towards H (electricity) as the most important sections of the ACS, suggesting that there has been a transition from chemistry and metallurgy in the 1980s to electricity in the twenty-first century as driving forces of innovation. The relevance of individual sections can be further clarified by looking at the adjacency matrix of the technology network displayed in figure 8a and the relative distribution of links within and between sections plotted in figure 8b. Note that the matrix of figure 8a is sorted according to a lexicographic ordering of the IPC class codes, so that nodes are grouped together based on the section they belong to. The presence of a block-diagonal structure would indicate that most links occur between classes of the same section. However, evidence of such a pattern appears to be mixed both in figure 8a, which depicts the adjacency matrix of the technology network for 1998 (the central year of the sample), and in figure 8b, which summarizes the evidence for the entire period. Note that sections in the technology network are not all equal in terms of within and between connectivity. 
For example, sections B and D seem to have mostly internal significant links, suggesting that the hierarchical structure of the IPC captures the extent of knowledge spillovers for technologies related to transporting, textiles and paper. On the other hand, sections C and A, and likewise sections G and H, appear to share many links that cut across section borders. This suggests that relevant spillovers can take place between 'distant' (and versatile) technologies and that the subsets of the autocatalytic structure play a relevant role in shaping the distribution of links. In other words, innovation, as measured by the production of patents, can potentially spawn as many connections between sections as it does within sections. The autocatalytic structure thus has at least comparable importance to that of sections in defining the boundaries of the drivers of technological progress.
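The comparison of within-section and between-section links can be sketched as follows. This is only an illustration under stated assumptions: the edge list is hypothetical (not the paper's network), the helper names are our own, and the section of a class is taken to be the leading letter of its IPC code, consistent with the lexicographic grouping described above.

```python
from collections import Counter


def section_of(ipc_class):
    """The IPC section is the leading letter of a class code, e.g. 'H04' -> 'H'."""
    return ipc_class[0]


def link_shares(edges):
    """Return the within-section share, the between-section share and the
    number of links for each (source section, receiver section) pair."""
    pair_counts = Counter((section_of(s), section_of(r)) for s, r in edges)
    total = sum(pair_counts.values())
    within = sum(n for (a, b), n in pair_counts.items() if a == b)
    return within / total, (total - within) / total, pair_counts


# Hypothetical directed links (source class, receiver class); assumed for illustration.
edges = [
    ("B60", "B62"), ("B62", "B60"), ("D01", "D03"),
    ("C07", "A61"), ("A61", "C12"), ("G06", "H04"),
    ("H04", "G06"), ("H01", "H03"),
]

within, between, pair_counts = link_shares(edges)
print(f"within-section share:  {within:.2f}")
print(f"between-section share: {between:.2f}")
for (src, dst), n in sorted(pair_counts.items()):
    print(f"{src} -> {dst}: {n}")
```

A within-section share close to one would correspond to a block-diagonal adjacency matrix, whereas comparable within and between shares correspond to the mixed pattern discussed in the text.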

Conclusion
This study is a first step towards uncovering the cumulative causation processes driving technological change by detecting an autocatalytic structure in patent databases. Our results can be summarized in three main points. First, the technology landscape described by the network of patent technology codes is characterized by a clear autocatalytic structure that has grown over the years to encompass most technology classes. Second, the classes that are involved in the ACS perform better in terms of innovativeness as measured by the rate of growth in the number of patents containing them. Finally, the autocatalytic structure of the technology network is evident and possibly stronger than the hierarchical structure of the database, since as many links connect classes from different sections as connect classes from the same section. This implies that recombinant innovation arising from interdisciplinary technological interactions is a stylized fact of technological change.
We believe that our approach based on detecting autocatalytic structures can be successfully extended in more fundamental ways. We owe at least part of the inspiration to study autocatalysis in technological systems to the work of [1,4,36], who have proposed autocatalytic networks as a model of self-organization for interlinked biological species. However, a distinctive characteristic of those models is the interplay of a fast dynamics driving species evolution and a slow dynamics 'reshuffling' the links of the network through species replacement or mutation. Instead, this work assumes that the equivalent of the fast dynamics acts on the population of technological codes and observes the changing network without providing a model for its evolution or its relation to the slow dynamics. A reason for this is that, while it is reasonable to assume that population and network evolution take place at very different time scales in biological networks, this seems less plausible in the domain of innovation, where intuition suggests that the success of individual technological fields and their mutual interactions might change at more similar speeds.
In principle, technological codes are a powerful device to explore not only the different time scales of the dynamics shaping technological progress but also the different scales of technological definition and categorization. For example, recent studies have observed that about 60% of new patents use novel combinations of codes [42] taken from the most recent version of the technological classification. Moreover, an interesting property of codes is the fact that the classifications they are drawn from are not static, but rather change over time to keep up with the pace of technological change. In fact, recombination of existing knowledge appears as a distinguishing feature of innovation, a stylized fact that can be directly observed through changes of the classification system [43,44]. We envisage two main avenues for future research stemming from our study: first, an empirical analysis of ACSs at different scales of technological classification, to uncover possible fractal structures; second, a modelling framework that can reproduce and explain the statistical features of the empirical network of technology classes. From a methodological viewpoint, it would be interesting to explore the same questions using fundamentally different data about patents, such as citation networks [45,46] or co-occurrences of technology codes within patent documents or families.
This article reports the initial phase of a broader research project aimed at empirically assessing and understanding cumulative causation in social systems [47]. Within the realm of technological change, the results of this line of research bear potentially relevant implications for technological investment strategies at the corporate and institutional level as well as for innovation policy. More specifically, understanding the role of individual technological fields in the evolution of the wider technology system can be of the utmost importance for designing policies that address the different challenges of present times, from economic development and inequality to energy, security and climate change.
Data accessibility. Data, codes and a detailed explanation of their structure can be accessed through the Dryad Digital Repository at http://dx.doi.org/10.5061/dryad.r21qp [48].
series of the average fitness and the total fitness of subclasses in the ACS, its core and the rest of the network. As with IPC classes (figure 5), we find a clear positive trend of fitness for the ACS and its core over time. In particular, the average fitness grew from about 1000 to about 2000 patents per subclass in the ACS, and the average fitness for the core increased even more, to about 3000 patents per subclass (figure 10a). At the same time, the average fitness in the rest of the network increased only slightly, reaching around 500 patents per subclass. The time series of total fitness (figure 10b) also shows a lower level of fitness for the ACS and its core compared to the rest of the network in the first two decades of the investigated time period. The rest of the network was later surpassed by the fast-growing ACS around the turn of the century. If we account for the fact that subclasses are still less numerous in the ACS (and we normalize by the total number of subclasses in a set), we obtain the picture presented by the average fitness in figure 10a. Figure 10c shows the relative share of the three subsets of the subclasses network in terms of absolute fitness (number of patents). This figure presents a similar trend to the fitness in the network of IPC classes (figure 5c): the ACS increased in relative terms, while the rest of the network shrank. This time, the periphery of the ACS maintained a sizeable share (approx. 10%). In the last year (2011), the aggregate share of the ACS is about 60% for subclasses, compared with about 80% in the network of classes.
However, considering that for subclasses the aggregate number of nodes in the various ACSs is roughly one-third of the population, the autocatalytic structure emerges as a dominant pattern also in the network of subclasses.
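For concreteness, the three fitness measures discussed here (average fitness, total fitness and relative share of total fitness) can be computed as in the following minimal sketch. The patent counts and the assignment of subclasses to the core, the periphery of the ACS and the rest of the network are hypothetical, as is the function name; only the definitions of the measures follow the text, with fitness taken as the number of patents in which a subclass appears.

```python
def fitness_summary(patents_by_node, membership):
    """Average fitness, total fitness and share of total fitness per subset."""
    totals, sizes = {}, {}
    for node, n_patents in patents_by_node.items():
        subset = membership[node]
        totals[subset] = totals.get(subset, 0) + n_patents
        sizes[subset] = sizes.get(subset, 0) + 1
    grand_total = sum(totals.values())
    return {
        subset: {
            "average": totals[subset] / sizes[subset],  # patents per subclass
            "total": totals[subset],                    # absolute fitness
            "share": totals[subset] / grand_total,      # relative share
        }
        for subset in totals
    }


# Hypothetical patent counts per IPC subclass in one year (assumed values).
patents_by_node = {
    "H04L": 3200, "G06F": 2800, "A61K": 1000,
    "B60R": 500, "D01F": 500, "E04B": 500,
    "F16H": 500, "B65D": 500, "A47B": 500,
}
# Hypothetical membership: the ACS is the union of its core and its periphery.
membership = {
    "H04L": "core", "G06F": "core", "A61K": "periphery",
    "B60R": "rest", "D01F": "rest", "E04B": "rest",
    "F16H": "rest", "B65D": "rest", "A47B": "rest",
}

summary = fitness_summary(patents_by_node, membership)
acs_share = summary["core"]["share"] + summary["periphery"]["share"]
for subset, values in summary.items():
    print(subset, values)
print(f"aggregate ACS share: {acs_share:.2f}")
```

In this toy example, one-third of the nodes belong to the ACS while the ACS accounts for a much larger share of total fitness, which is the kind of contrast the text uses to argue that the autocatalytic structure is a dominant pattern.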
The above evidence further confirms our hypothesis that autocatalytic structures foster innovation as measured by the number of filed patents. In fact, our hypothesis is also verified at the level of IPC subclasses: as for IPC classes, the finer structure defined by subclasses clearly shows that technology fields benefit from inclusion in the ACS. This implies that the self-reinforcing process of cumulative innovation, which gives rise to the quicker pace of technological progress that we observed in §4.2, is also detected in a more disaggregated representation of the technology network.
Black dots indicate that a link is present between two classes.
Comparing figure 12 with figure 8a clearly shows that the structure of the matrix is robust over time and that the point made in the main text concerning the structure of links in the technology network is generally valid. In particular, it can be seen that, over time, the number of significant links in the network has increased, while a significant share of these has consistently lain both within and across sections. This indicates that important knowledge flows consistently connect classes belonging to different sections, and that recombinant innovation is a stylized feature of technological change. Put differently, sections alone do not determine the distribution of links between classes, since between-section co-occurrences also appear to play a relevant role.