Sheaving—a universal construction for semantic compositionality

Semantic compositionality—the way that meanings of complex entities obtain from meanings of constituent entities and their structural relations—is supposed to explain certain concomitant cognitive capacities, such as systematicity. Yet, cognitive scientists are divided on mechanisms for compositionality: e.g. a language of thought on one side versus a geometry of thought on the other. Category theory is a field of (meta)mathematics invented to bridge formal divides. We focus on sheaving—a construction at the nexus of algebra and geometry/topology, alluding to an integrative view, to sketch out a category theory perspective on the semantics of compositionality. Sheaving is a universal construction for making inferences from local knowledge, where meaning is grounded by the underlying topological space. Three examples illustrate how topology conveys meaning, in terms of the inclusion relations between the open sets that constitute the space, though the topology is not regarded as the only source of semantic information. In this sense, category (sheaf) theory provides a general framework for semantic compositionality. This article is part of the theme issue ‘Towards mechanistic models of meaning composition’.


Introduction
The way that representations and their meanings for complex entities obtain from the representations and meanings for the constituent entities and their structural relations is called semantic compositionality. Some form of compositionality is supposed to explain concomitant cognitive capacities, such as the systematicity of language [1] and thought [2], i.e. where possessing certain cognitive capacities implies possessing certain other (structurally related) cognitive capacities (an equivalence relation on cognitive abilities [3]), such as under[…] mechanisms: a language of thought [2] on one side versus a geometry of thought [7] on the other, and their explanatory import [6]. The challenge is not just to explain how some form of compositionality accounts for properties such as systematicity, but why cognition is compositional in the first place [8].
Explaining the why versus how of systematicity was posed as a challenging problem for connectionist theories [4], and later shown to be also problematic for classical theory [6]. Problematically, while there are instances of compositionality that support a requisite systematicity property, there are also instances that do not support the same property. So, systematicity does not necessarily follow from core principles and assumptions of classical or connectionist theories. Auxiliary assumptions added to pick out just those instances of compositionality that support systematicity are ad hoc when they are unconnected to the theory's core principles and assumptions, cannot be confirmed independently of confirming the theory, and are motivated only by the need to fit the data, in which case, the theory fails to fully explain systematicity [6]. One recourse is to claim that the supposed counterexamples are not the 'canonical' forms of compositionality that classical theory takes as a core assumption [3]. Yet, it is unclear what characterizes canonicity, or why cognition is canonically compositional [9].
A category theory [10] approach to compositionality was introduced to address the why of systematicity [11]. Category theory is a field of (meta)mathematics invented to formally compare mathematical structures [12]. The core explanatory concept is universal construction, formalized as universal morphism, which is a way of comparing cognitive capacities modelled as compositions of maps. Such constructions are characterized by a universal mapping property [13]: in regard to a collection of systematically related cognitive capacities, each map modelling a member capacity is composed of the map shared by all members and a map that is unique to that capacity. Hence, a universal morphism identifies an equivalence class of systematically related cognitive capacities. Such constructions are the 'best' one can do within a certain (categorical) context: every construction in that context 'leads to' a universal construction, so necessarily obtains via a recursive process [9].
An explanation for semantic compositionality must ultimately connect to the physical (neural) system that supports cognition. Classical theory assumes that symbols are supported by a neural system that implements the equivalent of memory registers, i.e. the physical symbol system hypothesis [14]. Connectionist theory makes this link more directly as the representations that supposedly support semantic compositionality are instantiated as neural activity for a network of (abstract) neurons. A categorical approach must also make this kind of connection. To this end, the current work focuses on another universal morphism, called sheaving [15] or sheafification [16], to sketch out a category theory perspective on the semantics of compositionality. Sheaving is a construction at the nexus of algebra and geometry/topology, which alludes to an integrative view. This view starts with a (pre)sheaf to model cognitive representations as data attached to a topological space [17]. As we shall see, the underlying topological space gives meaning to the data in terms of the relations between the open sets that constitute the topology.
The presentation of this work is primarily informal to facilitate an intuitive understanding of the approach. Connections to formal details appear elsewhere [17], and deeper introductions to categories and sheaves appear in many textbooks on these topics [10,16,18,19]. We proceed with an example of a universal morphism that serves to illustrate the basic category theory concepts (§2) underlying the examples of sheaving given in the context of cognition (§3). This approach is discussed by comparison and contrast with classical notions of compositionality and possible neural mechanisms (§4). For convenience and to help ground concepts, some formal details appear in the appendix.

Categories and (universal) compositionality
We use playing cards as a running example of compositionality to bootstrap the needed category theory from the more familiar concepts of sets and functions. Each card has a rank (i.e. two, three, …, ten, jack, queen, king, ace) and a suit (i.e. spade, club, diamond, heart). For example, queen and heart constitute the queen of hearts. The ranks can be represented by the set of symbols Rank = {2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A}, the suits by the set of symbols Suit = {♠, ♣, ♦, ♥} and the cards by the Cartesian product of those sets: Card = Rank × Suit. For instance, the pair of symbols (Q, ♥) represents the queen of hearts. This product also comes with two functions that retrieve the rank and suit of each card: e.g. rk : (Q, ♥) ↦ Q and st : (Q, ♥) ↦ ♥. Accordingly, sets and functions provide a basic set-theoretic model of playing cards.
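As a minimal sketch of this set-theoretic model, the sets and the two retrieval functions can be written out directly. The Python names (RANK, SUIT, CARD, queen_of_hearts) are illustrative choices and not part of the original formalism.

```python
from itertools import product

# Symbols for ranks and suits, following the sets Rank and Suit in the text.
RANK = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUIT = ["♠", "♣", "♦", "♥"]

# Card is the Cartesian product Rank x Suit: every (rank, suit) pair.
CARD = set(product(RANK, SUIT))


def rk(card):
    """Projection that retrieves the rank of a card."""
    return card[0]


def st(card):
    """Projection that retrieves the suit of a card."""
    return card[1]


queen_of_hearts = ("Q", "♥")
```

Here rk(queen_of_hearts) retrieves the symbol Q and st(queen_of_hearts) retrieves the symbol ♥, as described above.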
Category theory starts with the formal concept of a category (definition A.1), which consists of a collection of entities, called objects, a collection of relations between objects, called morphisms, and an operation that takes two morphisms and returns a morphism, called composition. The archetypal category is Set (example A.2), the category of sets (objects) and functions (morphisms), with function composition as the composition operation (remark A.3). Hence, sets Rank, Suit and Card are objects and functions rk and st are morphisms in Set, constituting a categorical product (definition A.6), which is the Cartesian product for this category (example A.7). A deck of cards is modelled as a mapping of each face, signifying a playing card, to the corresponding symbol, e.g. a function card : Face → Card; Q♥ ↦ (Q, ♥). The mappings from faces to ranks and from faces to suits are given by compositions faceRank = rk ∘ card and faceSuit = st ∘ card, respectively: e.g. faceRank : Q♥ ↦ Q, which says that the rank of the card signified by the face Q♥ is Q (remark A.8). Thus, we have a category-theoretic model of the same playing cards concept.
Having introduced categories, we can now look at basic constructions and their relations. A functor (definition A.12) is a way of constructing, indexing, or identifying objects and morphisms. For example, the product functor (example A.14) constructs the set of cards from the sets of ranks and suits, i.e. P : (Rank, Suit) ↦ Rank × Suit, and a constant functor identifies the set of cards (i.e. the functor that sends every set and function, in Set, to the set of cards, Card, and its identity function, 1_Card). Two functors are related by a natural transformation (definition A.15), and the optimal (or most efficient) transformation pertains to a universal morphism (definition A.17). For example, the transformation from the set of cards to their ranks and suits is the universal morphism (Card, rs), where rs = {rk, st}. The transformation is efficient in that there are no more and no fewer mappings than needed to retrieve the rank and suit of every card.
royalsocietypublishing.org/journal/rstb Phil. Trans. R. Soc. B 375: 20190303
Note that universal morphisms are unique up to unique isomorphism (remark A.19). So, constituents need not be 'tokened' in the classical sense. A characteristic of classical compositionality is that the symbols representing constituents are tokened (inscribed, or written out) whenever the representation of their complex host is tokened [4]. The symbol pair representation of cards is an example of tokening: for instance, the symbols for queen, Q, and heart, ♥, are tokened whenever the symbol for queen of hearts, (Q, ♥), is tokened. In category theory, the product of two sets is conventionally given as the Cartesian product, but other products exist. For example, the cards can be represented as numbers, say from 1 to 52, provided the accompanying functions retrieve the requisite components. Being an isomorphic set is not sufficient, because one still needs the appropriate functions to recover the constituents; such isomorphisms are generally not unique (remark A.19).
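The point about numbering the cards from 1 to 52 can be illustrated with a short sketch. The particular numbering below (suits in blocks of 13) is a hypothetical choice made only for illustration, as are the names rk_num, st_num and mediate; any other numbering works so long as it comes with matching retrieval functions.

```python
RANK = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUIT = ["♠", "♣", "♦", "♥"]


def rk_num(n):
    """Rank of card number n (1..52) under the assumed numbering."""
    return RANK[(n - 1) % 13]


def st_num(n):
    """Suit of card number n (1..52) under the assumed numbering."""
    return SUIT[(n - 1) // 13]


def mediate(n):
    """The map u = <rk_num, st_num> from numbers to symbol pairs.

    It is the only map whose first and second components agree with
    rk_num and st_num, as required of a product.
    """
    return (rk_num(n), st_num(n))


# The numbers 1..52 reach all 52 distinct pairs, so mediate is a bijection.
pairs = {mediate(n) for n in range(1, 53)}
```

Under this numbering, no symbol for rank or suit is written out when a card number is written, yet both constituents remain recoverable through rk_num and st_num.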

Sheaving: bridging gaps in knowledge
Our categorical approach to semantic compositionality involves presheaves/sheaves (functors) and sheaving (natural transformation). A presheaf/sheaf (definitions A.20/A.21) models data attached to a topological space (definition A.4). A sheaf is a presheaf where the attached data are globally coherent, i.e. agree on overlapping regions. Pullbacks (definition A.9) express global coherency conditions (remark A.22). For Set, a pullback of f and g (example A.10) is a constrained product (remark A.11), which consists of only those pairs, (a, b), whose components map to a common value (property): f(a) = g(b). Hence, pullbacks pertain to non-local (global) properties. Sheaving is a universal morphism that constructs the 'nearest' sheaf from a given presheaf (remark A.23). This construction is likened to the natural join operation (example A.24) that extracts information from data stored locally in different tables of a relational database, say, the addresses of all people prescribed a particular medication, where contact and medical data are stored in separate tables. In this way, sheaving is a kind of relational inference: a way of bridging gaps in knowledge via meaning grounded in the underlying topological space.
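The natural join analogy can be made concrete with a small sketch. The table contents, attribute names and the helper natural_join are all hypothetical and chosen only to mirror the contact/medication scenario; rows are paired exactly when they agree on their shared attributes, which is the pullback condition f(a) = g(b).

```python
def natural_join(left, right):
    """Natural join of two tables given as lists of dict rows.

    A pair of rows is kept only when the rows agree on every attribute
    they share, i.e. when both project to a common value.
    """
    joined = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined


# Locally stored data: contact details and prescriptions in separate tables.
contacts = [
    {"name": "Ann", "address": "1 Elm St"},
    {"name": "Bob", "address": "2 Oak Ave"},
]
prescriptions = [
    {"name": "Ann", "medication": "drug_x"},
    {"name": "Bob", "medication": "drug_y"},
    {"name": "Cy", "medication": "drug_x"},
]

rows = natural_join(contacts, prescriptions)
addresses_for_x = [r["address"] for r in rows if r["medication"] == "drug_x"]
```

Neither table alone contains the addresses of people prescribed drug_x; that information is obtained only by combining rows that agree on the shared attribute.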
We give three examples of sheaving that pertain to cognition. The first example continues the introduction to category (sheaf) theory constructions via the familiar concept of playing cards. The second example involves visual feature binding [17] extended for triple conjunction search [20]. The third example involves a simple version of depth perception. Each example illustrates the different ways that meaning is conveyed by the relations between the open sets that constitute the topology.

(a) Playing cards
The playing cards example, introduced earlier, can be considered as a presheaf or sheaf on a topological space constituted by elements identifying the (feature) dimensions of rank and suit. For example, suppose the rank and suit dimensions are labelled as R and S, respectively. The set of dimension labels D = {R, S} together with the topology {∅, {R}, {S}, {R, S}} constitute a discrete topological space, which consists of all subsets of labels and their inclusion relations (example A.5). The values of each card constitute the data attached to that space. For example, the queen of hearts and two of spades are represented by the presheaf, F_Q2 : D^op → Set. In database terms, this presheaf can be regarded as a collection of tables whose attributes (headings) correspond to the open sets and rows correspond to the attached data, e.g. there is a two-column table whose attributes correspond to the open set {R, S} that has two rows: one row for the queen of hearts and one row for the two of spades (example A.26). In sheaf theory terms, F_Q2 sends each open set to the set of functions on that set, where each function maps the elements of the open set to the attached data, e.g. F_Q2 : {R, S} ↦ {c_QH, c_2S}, where c_QH : R ↦ Q, S ↦ ♥ and c_2S : R ↦ 2, S ↦ ♠. The inclusions given by the topology are preserved as restrictions on functions, e.g. {R} ⊆ {R, S} maps to the restriction f|_R : c_QH ↦ c_Q, c_2S ↦ c_2. Restriction corresponds to (database) projection of a table along the specified attribute(s).
Sheaving affords the systematic capacity to represent all cards (example A.27), but this capacity depends on the topology. To illustrate, suppose one knows the ranks and suits, i.e. there is a one-column table of 13 rows for ranks and a one-column table of four rows for suits. In this situation, sheaving simply constructs all pairwise combinations of ranks and suits, which is the sheaf F^+_card. Thus, we have a systematic capacity to represent all 52 cards. One can think of sheaving as a kind of completion, or limit process, that adds just enough rows to make a sheaf.
A contrasting scenario is where one knows some of the cards without knowing about constituents rank and suit: cards are understood as non-compositional entities. This situation is captured by the indiscrete topology (example A.5), i.e. {∅, D}. Sheaving, in this case, does not add any rows to the table containing just the known (non-compositional) cards. Hence, one does not necessarily have a systematic capacity to represent all cards. Completion is trivial (the presheaf is a sheaf) because the topology does not consist of any other (non-empty) open sets.
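The two scenarios above can be contrasted in a short sketch. It covers only these two special cases on the two-element space {R, S} and is not a general sheafification routine; the function names sheave_discrete and sheave_indiscrete are hypothetical.

```python
from itertools import product

RANK = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUIT = ["♠", "♣", "♦", "♥"]


def sheave_discrete(ranks, suits):
    """Rows over {R, S} for the discrete topology on {R, S}.

    The open sets {R} and {S} overlap only in the empty set, so every
    rank row is compatible with every suit row and each pair glues.
    """
    return set(product(ranks, suits))


def sheave_indiscrete(known_cards):
    """Rows over D = {R, S} for the indiscrete topology.

    There are no smaller non-empty open sets to glue from, so the
    table of known cards is returned unchanged.
    """
    return set(known_cards)


known = {("Q", "♥"), ("2", "♠")}

# Discrete case, starting from full tables of ranks and suits.
all_cards = sheave_discrete(RANK, SUIT)

# Discrete case, starting from the ranks and suits of the known cards only.
known_ranks = {card[0] for card in known}
known_suits = {card[1] for card in known}
completed = sheave_discrete(known_ranks, known_suits)

# Indiscrete case: nothing is added.
unchanged = sheave_indiscrete(known)
```

With the discrete topology, two known cards are completed to the four combinations of their ranks and suits, whereas with the indiscrete topology the same two cards remain the only rows.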
This difference between sheaving with respect to a discrete versus indiscrete topological space was used to model the difference between generalization and lack of generalization observed with participants trained on cue-target maps [17]. The participants who failed to generalize were regarded as having learned the mappings from cues to targets (pairs of letters to coloured shapes) as mappings of non-compositional entities.

(b) Visual feature binding
Visual feature binding concerns the capacity to identify, say, a red square and a blue triangle, as opposed to a red triangle and a blue square based on globally coherent spatial information (location). This process is modelled as the sheaving of colour and shape location maps to obtain a colour-shape conjunction map that corresponds to objects observed in the visual field as needed to perform visual search [17]. Here, we show how this example of sheaving extends straightforwardly to triple conjunction search [20], i.e. where the target of search is identifiable by a triple of features, such as colour, orientation and (spatial) frequency.
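A minimal sketch of this binding process follows, assuming each feature map is a set of (feature value, location) pairs and that features are bound when they share a location. The feature values, the integer locations and the helper name pullback are hypothetical; the second call extends the conjunction to a third feature in the same way.

```python
def pullback(left, right):
    """Pullback of two feature-location maps over their location projections.

    Both arguments are sets of tuples whose last component is a location.
    Only combinations that project to the same location are kept; the
    shared location is retained as the last component of the result.
    """
    return {
        a[:-1] + b[:-1] + (a[-1],)
        for a in left
        for b in right
        if a[-1] == b[-1]
    }


# Feature-location maps for colour, orientation and spatial frequency.
CL = {("red", 1), ("blue", 2)}
OL = {("vertical", 1), ("horizontal", 2)}
FL = {("low", 1), ("high", 2)}

colour_orientation = pullback(CL, OL)              # first pullback
triple_conjunction = pullback(colour_orientation, FL)  # second pullback
```

The result contains a red, vertical, low-frequency item at location 1 and a blue, horizontal, high-frequency item at location 2, but no red horizontal item, because those features never share a location.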
In terms of universal morphisms, sheaving involves pullbacks (remark A.22). For instance, the colour-orientation map obtains from the pullback of the projections of the colour-location (CL) and orientation-location (OL) maps onto location, p_2 : CL → L and p_2 : OL → L, which yields the colour-orientation map, denoted C ×_L O, and its projections. Thus, triple conjunction obtains from two pullbacks.
The topology in this example conveys a different (relational) meaning from the meanings conveyed by the discrete and indiscrete topologies. Each topology induces a corresponding order over the elements of the underlying space, called the specialization (pre)order (remark A.28): C ≤ L, O ≤ L, F ≤ L for the current example, which says that colour, orientation and frequency specialize location; conversely, location is a general (global) property of the data (object features) attached to the topological space. By contrast, the discrete topology in the cards example has the corresponding order R ≤ R, S ≤ S, which says that neither dimension is a specialization of the other. In other words, the dimensions are independent; sheaving is effectively a Cartesian product of the sets of values on those dimensions (example A.27).
The preorder corresponding to the indiscrete topology in the cards example has R ≤ S and S ≤ R, which says that the dimensions are specializations of each other, i.e. effectively the same dimension (remark A.28). Thus, topology plays a significant role in our approach to semantic compositionality.
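The specialization preorders mentioned above can be computed directly from a list of open sets, following remark A.28 (x ≤ y when x lies in the closure of y). The sketch below is illustrative only. The discrete and indiscrete topologies on {R, S} are as given in the text, whereas the topology used for the feature-binding example is an assumption: it is taken to be generated by the open sets {C, L}, {O, L} and {F, L}, which is one topology that yields the stated order.

```python
def closure(point, space, opens):
    """Closure of a point: intersection of all closed sets containing it."""
    result = set(space)
    for u in opens:
        closed = set(space) - set(u)  # complement of an open set
        if point in closed:
            result &= closed
    return result


def specialization_preorder(space, opens):
    """All pairs (x, y) with x <= y, i.e. x in the closure of y."""
    return {(x, y) for y in space for x in closure(y, space, opens)}


D = {"R", "S"}
discrete = [set(), {"R"}, {"S"}, {"R", "S"}]
indiscrete = [set(), {"R", "S"}]

# Assumed topology for the feature-binding example (not given in the text).
X = {"C", "O", "F", "L"}
features = [
    set(), {"L"}, {"C", "L"}, {"O", "L"}, {"F", "L"},
    {"C", "O", "L"}, {"C", "F", "L"}, {"O", "F", "L"}, {"C", "O", "F", "L"},
]

order_discrete = specialization_preorder(D, discrete)
order_indiscrete = specialization_preorder(D, indiscrete)
order_features = specialization_preorder(X, features)
```

In the discrete case only the reflexive pairs appear, in the indiscrete case R and S are each below the other, and under the assumed feature topology colour, orientation and frequency are each below location.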

(c) Depth perception
Binocular vision can be used to infer (triangulate) the location of a target object using lines of sight and relative eye positions. This computation can be achieved as an instance of sheaving, using simple geometry. Suppose the position of the target object is (x, y) ∈ P and the angles of the eyes (lines of sight) to the target are λ and ρ for the left and right eyes, respectively. Left and right lines of sight specify position as functions of distance from the eyes, l ∈ L and r ∈ R, parameterized by angle: left_λ : l ↦ l(cos λ, sin λ), and right_ρ : r ↦ r(cos ρ, sin ρ).
The position of the target is the intersection of the two lines of sight, which is the pullback of left_λ and right_ρ. This pullback is equivalent to the pullback of projections p_2 : LP → P and
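The triangulation can be sketched numerically as the point where the two lines of sight agree. The sketch assumes, hypothetically, that the left eye sits at (0, 0) and the right eye at (baseline, 0) in a common coordinate frame, with both angles measured from the x-axis in radians; these placements and the name triangulate are not part of the original description.

```python
import math


def triangulate(lam, rho, baseline=1.0):
    """Intersect the left and right lines of sight.

    Solves l*(cos lam, sin lam) = (baseline, 0) + r*(cos rho, sin rho)
    for the distance l along the left line, then returns the point.
    Returns None when the lines of sight are parallel.
    """
    det = math.sin(lam - rho)
    if abs(det) < 1e-12:
        return None
    l = -baseline * math.sin(rho) / det
    return (l * math.cos(lam), l * math.sin(lam))


# Angles for a target placed at (0.5, 1.0) with the eyes one unit apart.
lam = math.atan2(1.0, 0.5)         # left eye at (0, 0)
rho = math.atan2(1.0, 0.5 - 1.0)   # right eye at (1, 0)
target = triangulate(lam, rho)
```

Each line of sight alone leaves the distance to the target undetermined; only the pair of distances at which both lines give the same position fixes the target.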

Discussion
Semantic compositionality concerns the way that representations and the entities they stand in for correspond in some systematic, structurally consistent manner. Our sheaf theory approach regards this correspondence as data attached to a topological space (presheaf/sheaf), where the shape (topology) of the underlying space conveys meaning to the representations. Shape is determined by the open sets and its structure is preserved by restrictions of the data, either locally (presheaf), or in a systematic, globally coherent manner (sheaf). Systematicity is afforded by a universal construction (sheaving). Sheaving infers non-local information from locally sourced knowledge to construct the nearest sheaf by gluing together data that agree on the overlapping regions (global coherency). Three examples were given: (1) inferring the ranks and suits of every card, given ranks and suits of some cards, (2) inferring the binding of features to objects given the binding of features to locations and (3) inferring object location given binocular line of sight. In each case, local knowledge is extended (composed) to infer non-local information, and this form of compositionality depends on the topology.
Note that there are two senses in which sheaving spans a formal divide. There is a 'vertical' sense in that presheaves are maps that preserve spatial relations (inclusions) as algebraic relations (restrictions). We limited ourselves to the simplest case where attached data were sets. In general, other categories can be used, such as categories of partially ordered sets, or groups. And there is a 'horizontal' sense in that data attached to open sets are glued together to construct data attached to a larger open set. These two senses arise because functors are maps between categories, whereas natural transformations (sheavings) are maps between functors.
This sheaf theory approach can be compared/contrasted with classical approaches to compositionality. Classical compositionality, in comparison, says that representations of complex entities are given by representations of their constituent entities so that the semantic relations between constituents are preserved by syntactic relations between corresponding symbolic representations. Functors preserve structure. So, classical and categorical approaches are similar to the extent that classical structures are category-like. Classical theory assumes symbolic representations are instantiated on some physical system, e.g. memory registers (or, slots), hence classical systems are sometimes called physical symbol systems [14]. Given a set of registers, one can impose the discrete topological space, in which the instantiated symbols are data attached to that space, thus realizing a presheaf. In this way, classical compositionality can be seen as an instance of categorical compositionality. By contrast, however, functoriality is only one part of the categorical approach to compositionality presented here. Presheaves and sheaves are functors, but only presheaves that are sheaves satisfy the global coherency conditions. As noted elsewhere [17], pullbacks are reminiscent of symbolic connectionist models, LISA [21] and DORA [22]. The idea is that (relational) entities are represented via connections to corresponding neurons representing the constituent entities (fillers) and their roles in the relation based on shared semantic information represented by a common pool of neurons. Neurons representing related entities that have shared semantic features tend to bind together. Similarly, the pullback of morphisms f : A → C and g : B → C is a generalized intersection of A and B constrained by C. In terms of those models, objects A and B pertain to roles and fillers, C to semantic features, and the pullback object to relational binding. This correspondence is suggestive of a way to connect sheaving to neural network models. Neurons are topologically organized and their activities are the attached data.
The nature of sheaves depends on the nature of the data and the underlying topology. The examples of sheaves presented here are relatively simple. Sheaf theory has applications in other areas that may be adaptable to cognition. For example, a sheaf theory approach to sensor fusion [23] suggests applications to the psychology of perception. Human probability judgments that violate classical probability laws motivate quantum probability theory for cognition [24]. The close connection between sheaf theory and contextuality effects in quantum physics [25] suggests that our sheaving approach to semantic compositionality may also be applicable to quantum-like compositionality effects [17]. In these applications, the data are measurements, or probabilities [23,25].
One important direction for further work is modelling the development of the underlying topological space. Our examples illustrate how different topologies ground relational information differently. However, we have not considered how these topological spaces are obtained. Sheaf theory methods in applied topology [26] may be useful here, where the underlying topological space is inferred from data.
The importance of the underlying topology is another way that the sheaving approach goes beyond classical and artificial neural network approaches to compositionality. In this paper, we focused on the universal morphism aspect of sheaves and sheaving, because universal morphisms were argued to play a crucial role in explaining systematicity [9,11], which is a cognitive property motivating compositionality principles [8]. Yet, the topological aspect of sheaving is also crucial. Any set of registers or neurons can be given a topology. The deeper question is why one topology arises over another. Discrete and indiscrete topologies were asserted for an application of sheaving [17] because they are two extremes obtained from universal morphisms. So, their determination accords with the general universal construction principle [9,11]. Determination of other topologies will depend on other constraints. For instance, the physical (geometrical) relations between sensors ground triangulation of object location. This view of semantics differs from the classical view, which regards the computational (psychological) level as supported by, but independent of, the specific physical (implementational) level, just as a programming language is supported by, but independent of, a specific computer.
Topology captures order, and order is implicit even in the productive (recursive) aspects of cognition, e.g. level within a tree hierarchy. We have not dealt with productivity, as it purportedly implies recursion in language [27]. Category theory also provides general constructions for recursion [28], and these methods have been applied to some aspects of cognition [9]. Topology is not regarded as the only source of semantic information. So, in this sense, category (sheaf) theory provides a general framework for semantic compositionality.
Data accessibility. This article does not contain any additional data. Competing interests. I declare I have no competing interests. Funding. This work was supported by a Japanese Society for the Pro-

Appendix A. Basic theory
Conceptual introductions to the formal concepts provided in this appendix can be found in [23,29,30], see also in [17]. Deeper introductions to the category theory concepts can be found in [13,16,19] and sheaf theory concepts in [16,18]. Specific results are referenced where they appear in the appendix.

Example A.5 (Topological space). A topological space is a category of open sets (objects) and inclusions (morphisms): there is just one morphism
The discrete topology on X is the set of all subsets of X; the indiscrete topology on X is {∅, X}.
Definition A.6 (Product). A product of objects A and B, in a category C, is an object P (also written A × B) together with a pair of morphisms π_1 : P → A and π_2 : P → B such that for every object Z and morphisms f : Z → A and g : Z → B there exists a unique morphism u : Z → P such that f = π_1 ∘ u and g = π_2 ∘ u. Morphism u is also denoted ⟨f, g⟩, as it is uniquely given by f and g.
Remark A.8. The function u : Z → P need not be a one-to-one correspondence (bijection). For instance, the rules of a game may stipulate that certain cards are duplicated or withheld, so a deck may contain more or fewer than 52 cards, i.e. the map from faces to cards, card : Face → Card, is onto (surjection) or into (injection).
Definition A.9 (Pullback). A pullback of morphisms f : A → C and g : B → C, in a category C, is an object P (also written A ×_C B) together with a pair of morphisms π_1 : P → A and π_2 : P → B such that for every object Z and morphisms z_1 : Z → A and z_2 : Z → B there exists a unique morphism u : Z → P such that diagram
Remark A.11. A pullback is a (generalized) product constrained by f and g. A product of A and B is equivalently a pullback of f : A → 1 and g : B → 1, where 1 is terminal: an object such that for every object X, in C, there exists a unique morphism from X to 1. In Set, a terminal is any singleton set, hence f(a) = g(b) for all a ∈ A and b ∈ B. Thus, a product is effectively an 'unconstrained' pullback.
Definition A.17 (Universal morphism). A universal morphism from functor F : C → D to object Y in D is a pair (B, ψ) consisting of an object B in C and a morphism ψ : F(B) → Y in D such that for every object X in C and every morphism g : F(X) → Y in D there exists a unique morphism u : X → B in C such that g = ψ ∘ F(u).
Example A.18 (Products, pullbacks). A product of A and B is a universal morphism (A × B, π) from the diagonal functor, Δ, to the pair of objects (A, B), where π = (π_1, π_2). A pullback of morphisms f : A → C and g : B → C is a universal morphism (A ×_C B, π) from the (generalized) diagonal functor [10] to the pair of morphisms (f, g).
where the right diamond indicates the pullback of f|_{U∩V} and g|_{U∩V}; or equivalently, by the equalizer of pairs of morphisms
Remark A.28. A topological space, (X, T), induces a specialization preorder on the elements of the underlying set, X. Two elements x, y ∈ X are comparable, x ≤ y, if x is an element of the closure of y, i.e. the intersection of all closed sets containing y. If U is an open set of T, then the complement of U (i.e. the set of elements in X that are not in U) is a closed set. In the cards example, the indiscrete topology has closed sets ∅ and {R, S}. The closure of R and the closure of S are the same set, {R, S}. Hence, the preorder has R ≤ S and S ≤ R. Open sets specify closeness. Accordingly, the open set {R, S} says that R and S are close to each other, but not preferentially so, since there are no other open sets. The open sets of a discrete topology are also the closed sets. So, in the discrete case, R and S are not comparable, since R is not in the closure of S, i.e. {S}, and S is not in the closure of R, i.e. {R}. Note that an element is always comparable to itself, x ≤ x, because any topology T on X must contain X as an open set of T (by definition).
Figure 1. Relational tables for a presheaf (a-e) and its nearest sheaf (a-d,f).