Architecture and evolution of semantic networks in mathematics texts

Knowledge is a network of interconnected concepts. Yet, precisely how the topological structure of knowledge constrains its acquisition remains unknown, hampering the development of learning enhancement strategies. Here we study the topological structure of semantic networks reflecting mathematical concepts and their relations in college-level linear algebra texts. We hypothesize that these networks will exhibit structural order, reflecting the logical sequence of topics that ensures accessibility. We find that the networks exhibit strong core-periphery architecture, where a dense core of concepts presented early is complemented with a sparse periphery presented evenly throughout the exposition; the latter is composed of many small modules each reflecting more narrow domains. Using tools from applied topology, we find that the expositional evolution of the semantic networks produces and subsequently fills knowledge gaps, and that the density of these gaps tracks negatively with community ratings of each textbook. Broadly, our study lays the groundwork for future efforts developing optimal design principles for textbook exposition and teaching in a classroom setting.


Introduction
Knowledge has been distilled into formal representations for millennia [1]. Such efforts have sought to explain human reasoning and support artificial reasoning [2,3,4]. Semantic networks organize information by detailing concepts (nodes) and their relations (edges), which can be defined by inclusion in the same thesaurus entry, free word association data, or co-occurrence within a corpus of text [5,6]. Concept maps reflect information in a similar manner, but are drawn out by students and therefore can be used to evaluate comprehension [7,8,9,10] and identify topics that are most difficult to connect to other concepts [11]. With the capacity to construct semantic networks, concept maps, and similar formal representations of knowledge comes the challenge of distilling rules and mechanisms of knowledge formalization and acquisition.
Network science offers an appropriate conceptual language and useful mathematical toolset with which to meet this challenge [12]. In the parlance of network science, semantic networks of language tend to exhibit highly ordered architectures with strong local clustering, relatively short paths between any pair of nodes, and a few hubs, which are connected to an unexpectedly large number of other nodes [5,6]. Recent work using highly stylized laboratory experiments provides some preliminary evidence that network structure may play a role in how humans process information [13] and acquire knowledge [14,15,16]. Yet extending these findings to the real world has proven difficult and it remains unknown precisely how the network structure of knowledge in the form of science textbooks [17], science and mathematics topics on Wikipedia [18], and even formal scientific papers [19,20] determines or impacts the learnability of these content domains. Learnability aside, even efforts to study the network architecture alone suffer from the major limitation of considering the whole semantic network at once, rather than evaluating the network's dynamic structure as it unfurls over the course of presentation, exposition, or acquisition. The educational literature clearly supports the intuitive notion that the order in which topics are introduced can help or hinder learning at this level [21,22], but the notion has not been formalized in network approaches to learning in ecologically valid experimental settings.
Here we seek to address these limitations by studying semantic networks of mathematical concepts [23,24] in linear algebra textbooks. A common college-level course, the subject is rigorous and logical, being composed of concepts that relate to one another in a careful and curated manner. To begin, we seek to understand the latent structure of these inter-concept relations in textbooks, which present the knowledge in a thoughtfully ordered and comprehensive exposition. We proceed under the assumption that the textbook's semantic network reflects the latent structure of the corresponding knowledge. Using techniques from network science [12], we test the hypothesis that semantic networks exhibit structural order, reflecting the logical sequence of topics that ensures accessibility. Motivated by a recent report that language acquisition proceeds through an ordered progression through knowledge gaps [25], we use persistent homology [26,27,28,29,30] to track the growth and development of topological cavities in the semantic network. We predict that few knowledge gaps will exist; withholding connections between topics that have already been taught is unlikely to effectively convey knowledge. Finally, we compare the growth of semantic networks elicited from multiple texts, in terms of their different expositional structures and topic orderings. We hypothesize that the degree to which knowledge gaps are created and persist within texts may be related to the complexity or difficulty of a text, and to the knowledge it reflects. Broadly, our quantitative evaluation of the differing structures and expositional layouts of distinct textbooks provides a foundation for future work examining the effects of topic ordering and network architecture on classroom learning.

Results
We constructed semantic networks and expositional growing networks from 10 linear algebra textbooks (see Supplementary Methods). We first used a modified version of the RAKE algorithm [31] to identify significant phrases (Fig. 1, step 1), which we refer to collectively as the index list of concepts. We represent these concepts as nodes, and connect two nodes by an edge if their corresponding concepts co-occur within the same sentence (Fig. 1, step 2). To mimic the growth of a reader's knowledge network, we add nodes and edges as soon as they are mentioned in the book (Fig. 1, step 3). Across textbooks, node sets ranged in length from 146 to 453 (average 279.4) and edge densities ranged from 0.0748 to 0.204 (average 0.129). In what follows, we characterize the semantic network growth of all texts, and when useful we give examples from individual texts referred to by author last name.
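As an illustration of steps 2 and 3, the following minimal Python sketch builds a growing co-occurrence network from tokenized sentences. It is a simplified sketch and not the authors' implementation: the function names are hypothetical, concepts are restricted to single words (multi-word phrases would need phrase matching), and the filtration step is taken to be the sentence index.

```python
from itertools import combinations


def grow_network(sentences, concepts):
    """Record when each concept and each co-occurrence edge first appears.

    sentences: list of token lists (assumed already lemmatized and lowercased).
    concepts:  set of single-word concepts (a simplification of the index list).
    Returns (node_birth, edge_birth), mapping nodes and edges to the index of
    the sentence in which they are first seen.
    """
    node_birth = {}
    edge_birth = {}
    for t, sentence in enumerate(sentences):
        present = sorted(set(sentence) & concepts)
        for concept in present:
            node_birth.setdefault(concept, t)
        # Two concepts are linked the first time they share a sentence.
        for u, v in combinations(present, 2):
            edge_birth.setdefault(frozenset((u, v)), t)
    return node_birth, edge_birth


def edge_density(n_nodes, n_edges):
    """Fraction of possible undirected edges that are present."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Toy input, invented for illustration only.
toy_sentences = [
    ["matrix", "vector"],
    ["vector", "basis"],
    ["matrix", "basis", "determinant"],
]
toy_concepts = {"matrix", "vector", "basis", "determinant"}
toy_nodes, toy_edges = grow_network(toy_sentences, toy_concepts)
```

Sorting the birth dictionaries by value gives the order in which nodes and edges enter the growing network.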

Meso-scale structure of semantic networks
Mathematics as a field and linear algebra as a subject contain many fundamental topics and conceptual connections between those topics. Practitioners and authors can agree or disagree about which topics are fundamental, and which are more tangential, or less strongly linked to the rest. Within a network, this organizational scheme can manifest as core-periphery structure where fundamental concepts are densely connected to one another, while peripheral concepts connect to the core but not to one another (Fig. 2a). To assess this structure in a semantic network constructed from the whole text, we calculate the core-periphery statistic and compare statistic values to those obtained from two null models (Fig. 2c): (i) a random index null model, in which random words are used to generate the expositional network, and (ii) a continuous configuration model, in which the original network is rewired while maintaining node degree and strength. Generally, we observe that the true semantic networks obtained from the entire texts show greater core-periphery organization than the continuous configuration model, suggesting the presence of a strongly connected core of topics along with a set of sparsely connected periphery topics given the degree and strength distributions. Interestingly, we also observe that the true networks show less core-periphery organization than the random index model, indicating that the networks of math terms are more homogeneous than a network of randomly chosen words.
We next investigate the internal structure of the core and periphery. For the core, we find that across texts many similar words participate, including 'determinant', 'vector space', and 'matrix' as expected (see Supplementary Table S3). In contrast, we expect that the periphery contains terms more specific to a given book and its particular sub-topics. We therefore hypothesize that the periphery will display community structure (Fig. 2d). To test our hypothesis, we calculate the modularity of the periphery subnetwork, along with the relevant subnetworks of the random index and continuous configuration null models. We observe that the periphery of each semantic network generally exhibits a modular organization that is stronger than that of the continuous configuration model, but weaker than that of the random index model (Fig. 2e). Intuitively, while randomly chosen words may display strong modularity due to both semantic and syntactical relationships, mathematics phrases are used in a more modular fashion than expected, perhaps due to the nature of focusing on one general idea at a time in chapters and sections.

Expositional development of the large-scale structure
How does the identified network structure develop along a text's exposition? We find that the introduction of core nodes precedes the introduction of periphery nodes throughout the exposition (Fig. 3a). We quantify this observation by calculating the area between the core and periphery node introduction curves; high values indicate that the core appears much earlier than the periphery, and low values indicate that the core and periphery appear at a more equal rate. The areas range from 0.064 to 0.20 across texts, and a one-sample t-test rejects the null hypothesis that these values are drawn from a distribution with mean 0 (t = 8.65, p = 1.18 × 10^−5).
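The area statistic can be sketched in a few lines of Python. The normalization used here is an assumption: each introduction curve is taken to be the cumulative fraction of a group's nodes introduced by each filtration step, and the area is the mean difference between the two curves, so that values lie between −1 and 1. The function names are hypothetical.

```python
def introduction_curve(birth_times, n_steps):
    """Cumulative fraction of a group's nodes introduced by each step."""
    counts = [0] * n_steps
    for t in birth_times:
        counts[t] += 1
    curve, running = [], 0
    for c in counts:
        running += c
        curve.append(running / len(birth_times))
    return curve


def area_between(core_births, periphery_births, n_steps):
    """Mean gap between the core and periphery introduction curves.

    Positive values mean the core is introduced earlier than the periphery.
    """
    core = introduction_curve(core_births, n_steps)
    periphery = introduction_curve(periphery_births, n_steps)
    return sum(c - p for c, p in zip(core, periphery)) / n_steps


# Toy birth times (filtration step indices), invented for illustration.
toy_area = area_between([0, 0, 1], [1, 2, 3], n_steps=4)
```

If the two groups are introduced at the same rate, the curves coincide and the area is zero, which is the null value tested by the one-sample t-test.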
Next, we compare the areas obtained from the true texts to those expected in statistical null models. Notably, we find that the periphery is introduced earlier (relative to the core) than expected in the random index model (Fig. 3b). We observe no consistent trend across texts in comparison to a random sentence model, in which we use the original index list to build a growing graph from the texts after randomizing sentence order. While many texts show a marked discrepancy between the core and periphery development, others show a more even development. These differences across texts could reflect different expositional styles amongst different authors: some may choose to introduce core topics initially and save possibly extra tangents for later, while others may involve discussions of peripheral topics throughout the text for motivation. Additionally, we take a similar approach in examining the relative rate of introduction of edges connecting different types of groups within the core and periphery, and find that of all edge types, those connecting concepts within a single periphery community are introduced the most sporadically, with some communities being fully introduced early on in the text, and some being introduced later (see Supplementary Figure S2).

Expositional development of knowledge gaps
In studying core and periphery formation, we focused on densely connected areas in the growing networks; now we turn to a study of sparsely connected areas. Specifically, we seek to understand how voids or knowledge gaps might emerge and evolve throughout the exposition. Teaching strategies may intentionally leave open a connection or an area of the knowledge space in order to more intuitively reveal the connection later when a learner has more experience, or to provide the reader the opportunity to derive the connection on his/her/their own. A lack of connections between concepts can manifest as a topological gap in the network (Fig. 4a).
To detect gaps that form and evolve throughout the text, we compute the persistent homology [27,28,29,30] of the ordered set of networks composed of nodes and edges that exist at each point in the exposition; note, this ordered set of binary graphs is referred to as a filtration. We specifically detect gaps between connected components (dimension 0 homology > 1), cavities within rings of edges (dimension 1 homology > 0), or voids within polyhedra (dimension 2 homology > 0). We say that these so-called persistent cavities are born at the first instance of their appearance in the network, they live as long as the network grows and the topological void still persists, and they die when they are either connected to another previously disconnected component (in the case of dimension 0) or are tessellated by crossing edges (in the higher dimension cases). We invite the reader to refer to the Supplementary Methods for a more rigorous description of persistent homology in this application.
In order to detect emerging and evolving gaps throughout exposition, we compute the persistent homology of each text. The number of gaps of dimension n that are alive at a given point in the filtration, called the Betti curve, is denoted β_n. We see that the texts tend to generate a large number of components, as manifest by the initial β_0 peak, followed by a rise in β_1, and finally a slow and steady increase in β_2 (Fig. 4b).
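For dimension 0, the Betti curve simply counts connected components, which can be tracked with a union-find structure as the network grows. The sketch below is illustrative only and covers dimension 0; dimensions 1 and 2 require a full persistent homology computation, which is not shown. The function name and the step format are assumptions.

```python
def betti0_curve(steps):
    """Number of connected components after each filtration step.

    steps: list of (new_nodes, new_edges) pairs, where new_edges is a list of
    (u, v) tuples whose endpoints have already been added as nodes.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the component root, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = 0
    curve = []
    for new_nodes, new_edges in steps:
        for v in new_nodes:
            if v not in parent:
                parent[v] = v
                components += 1  # a new component is born
        for u, v in new_edges:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                components -= 1  # two components merge, so one dies
        curve.append(components)
    return curve


# Toy filtration, invented for illustration.
toy_curve = betti0_curve([
    (["a", "b"], []),
    (["c"], [("a", "b")]),
    ([], [("b", "c")]),
])
```

Each decrement corresponds to the death of one dimension-0 feature, that is, the right endpoint of one bar in the dimension-0 barcode.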
For each text, we summarize the life and death of each persistent gap in a barcode (Fig. 4c). Each bar represents a single persistent cavity; the left endpoint of the bar indicates the birth time of the persistent cavity, while the right endpoint indicates the death time. Across all texts, we see that although many persistent cavities are killed soon after birth, a non-trivial number of gaps in each of the three dimensions persist throughout many sentences, suggesting that long-lived gaps are a consistent hallmark of the growing text structure.
To further evaluate the substantiveness of the gap architecture, we compared the persistent homology of the text networks to two filtration-based null models. In the first null model, we use the introduction of concepts to order the complete network for the filtration. More precisely, this node-ordered null adds a node at the first mention of the concept, and also adds all of the connections that will ever exist from that node to previously acquired concepts. This model mimics the teaching strategy of introducing all connections of a new concept to anything previously taught. We find that the node-ordered model produces almost no persistent homology, in stark contrast to the original text (see Supplementary Figs. S5, S6). This result suggests that the true expositions consistently leave connections between already-learned concepts for later discussion. We use a second null model to determine whether the introduction of edges that will exist in any order might produce similar progressions of persistent cavities. We find that this random edge order model produces an order of magnitude more persistent cavities of dimension 1 and 2 than the original text (see Supplementary Figs. S5, S6). Broadly, the presence of a few long-lived cavities in the actual text is consistent with the notion that knowledge gaps exist but are introduced sparingly, and that introducing connections to all topics previously learned is not the strategy of these texts.
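Both null filtrations can be expressed as reorderings of the final edge set. In the sketch below, the node-ordered null assigns each edge the birth time of its later-introduced endpoint, which follows directly from the description above. The random edge order function simply shuffles the edge list; how nodes enter that filtration is not specified here, so that part is an assumption. Function names are hypothetical.

```python
import random


def node_ordered_edge_births(node_birth, edges):
    """Node-ordered null: an edge appears as soon as both endpoints exist."""
    return {
        frozenset((u, v)): max(node_birth[u], node_birth[v])
        for u, v in edges
    }


def random_edge_order(edges, seed=0):
    """Random edge order null: the final edges in a uniformly random order."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy network, invented for illustration.
toy_node_birth = {"matrix": 0, "vector": 0, "basis": 1}
toy_edges = [("matrix", "vector"), ("matrix", "basis"), ("vector", "basis")]
toy_null = node_ordered_edge_births(toy_node_birth, toy_edges)
```

Under the node-ordered null every new node arrives already attached to all of its eventual neighbours among the existing nodes, which is why this filtration leaves little room for cavities to form and persist.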
At this point we know that throughout the text the introduction of terms and connections forms and fills gaps as a reader progresses. However, we do not yet know if the number and longevity of persistent cavities is different than we would expect from any growing semantic network in the text or from a reordered text. In order to answer this question, we define the normalized average cycle lifetime in dimension n as the sum of all persistent cavity lifetimes normalized by the number of cavities and filtration length (similar to metrics defined in [33], see Methods for details). Then intuitively a large value of normalized average cycle lifetime suggests that many long-lived persistent gaps exist, while a small value suggests that few gaps ever form or those that do will die shortly after birth. We show the distributions of normalized average cycle lifetime values in dimension 0 in Fig. 4d for the random index and random sentence models, and the corresponding distributions for dimensions 1 and 2 in Supplementary Figs. S7, S8. For completeness, we also include the barcodes and Betti curves for each model in the Supplementary Figs. S3, S4, S5, S6. Strikingly, the original text expositions generally fall below both null models' expected normalized average cycle lifetimes. This observation suggests that the exposition proceeds in a manner that may intentionally avoid developing disconnected topics, or possibly connects new topics to others very quickly. In dimensions 1 and 2, texts' normalized average cycle lifetimes vary more in relation to their null models, with only a handful of texts showing lower values than the null models.
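A minimal Python sketch of the normalized average cycle lifetime, as described in words above, is given below. Two details are assumptions rather than statements of the method: cavities that never die are assigned a death time equal to the filtration length, and an empty barcode is given the value 0. The function name is hypothetical.

```python
def normalized_average_lifetime(bars, filtration_length):
    """Sum of bar lifetimes divided by (number of bars * filtration length).

    bars: list of (birth, death) pairs in one homology dimension; death is
    None for a cavity that is still alive at the end of the filtration.
    """
    if not bars or filtration_length <= 0:
        return 0.0
    total = 0
    for birth, death in bars:
        end = filtration_length if death is None else death
        total += end - birth
    return total / (len(bars) * filtration_length)


# Toy barcode, invented for illustration: lifetimes 2, 4, and 7 out of 10.
toy_value = normalized_average_lifetime([(0, 2), (1, 5), (3, None)], 10)
```

With this normalization the value lies between 0 and 1, so it can be compared across texts and null models whose filtrations have different lengths.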

Evolving structure and text properties
After characterizing the structural features of the growing text networks, we next ask if these features might relate to text rating. Perhaps some readers particularly enjoy a book that leaves open many gaps motivating future study, while others enjoy a book with a stronger core offering conceptual closure. To determine whether readers' preferences relate to network structure, we used text ratings from Goodreads (goodreads.com). We kept any text which had at least five ratings, which was the case for seven of the ten texts. We observe no significant correlation between average text rating and normalized average cycle lifetime across texts' sentence-based filtrations (Fig. 5a; see Supplementary Table S5 for all Spearman's correlation coefficients and p-values). We also consider a one-at-a-time (OAAT) filtration (see Supplementary Methods), which in addition to allowing for comparability in persistent homology across texts and null models, provides additional information not just about a text's knowledge gaps on the sentence scale, but furthermore its subsentence topological structure. Remarkably, we observe significant negative correlations between average rating and OAAT normalized average cycle lifetime in dimensions 0 (Spearman's correlation coefficient ρ = −0.857, p = 0.0137) and 2 (ρ = −0.893, p = 0.00681), as well as the mean cycle lifetime averaged over dimensions 0, 1, and 2 (ρ = −0.821, p = 0.0234) (Fig. 5b, see Supplementary Table S5 for all statistics). Intuitively, these results provide preliminary support for the notion that the extent of knowledge gaps in exposition influences the quality of a text as a learning tool. However, these results only account for seven of the ten books, due to lack of availability of ratings for the others; as such, further work will be necessary to confirm the reliability of these findings in larger samples.
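For reference, Spearman's correlation coefficient is the Pearson correlation of the ranks of the two variables. The sketch below is a standard-library illustration of that definition, with tied values given their average rank. It does not compute p-values, the function names are hypothetical, and the example inputs are toy numbers rather than the ratings or lifetimes reported here.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data, invented for illustration only.
toy_rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of the ratings and of the lifetimes, which suits a sample of seven texts with no assumed linear relationship.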
For a description of additional relationships between the texts' structural features and broader text characteristics, we refer the reader to the Supplement.

Discussion
Here we examined the structure and topological development of semantic networks of mathematical knowledge as extracted from linear algebra texts. Meso-scale structural analysis indicates that the semantic networks exhibit strong core-periphery structure, where a tightly knit group of concepts form a core, surrounded by sparsely-connected periphery concepts that are grouped into communities. Furthermore, these features appear to relate to the growth of the networks over the course of exposition; the cores of networks are built more quickly than the peripheries, and edges within each particular periphery community are introduced at varied times over the course of exposition. Using persistent homology, we extracted the knowledge gaps inherent in the exposition and found that the number of distinct connected components tends to decrease throughout the text, while topological cavities tend to increase. Finally, we examined possible relationships between the extent and persistence of knowledge gaps and other features of a text and its associated semantic network, providing motivation for future work examining the role of knowledge gaps in learning.

Structure and evolution of mesoscale features in semantic networks
The prevalence of core-periphery and community structures in the networks we examine is consistent with a hierarchical structuring of mathematical knowledge, in which there exists a set of foundational concepts (the core), which are necessary for the subsequent logical development of subsidiary (periphery) concepts, which themselves are hierarchically organized into related communities. The generic notion of hierarchical structure in mathematics has been discussed in the context of presenting a logical sequence of concepts in education [34]. Hierarchical structure has also been noted in Wikipedia topic networks, in which concepts tend to maintain several connections to the foundational concepts used in each article [18]. A hierarchical structure of mathematics knowledge is intuitive, particularly within a delimited area such as linear algebra: a set of foundational concepts, such as matrix, vector, and linearity, are used to motivate and develop the rest of the topics within the field, which, for the most part, all presuppose the concepts in the core. Naturally, this hierarchy will not be a simple dichotomy (core-periphery), but the subsidiary concepts should themselves fall hierarchically into different groups, which may differ across texts due to author interests and publisher goals.
The observed growth dynamics of core-periphery structure offers a coherent expositional model. Given that the set of core concepts are highly related, and thus plausibly represent the concepts providing the foundation for linear algebra, it seems reasonable to introduce these concepts early and to introduce periphery concepts, which presuppose the core, later. The importance of giving sufficient foundational context and prior knowledge in exposition is well-appreciated [35]. Further, the edge dynamics we observe are consistent with an expositional model in which topics are procedurally related to each other; the core concepts are introduced first, and used to introduce, at each point in the exposition, the communities that are being focused on; furthermore, subsidiary concepts that have already been introduced may then be used to give context for and develop further, separate subsidiary communities. Such an expository approach, in which connections are consistently introduced between that which has been learned, and that which is to be learned, has been demonstrated as useful in teaching mathematics proofs [36]. An extension of this motivating expository style is one that incorporates the historical context [37]. In future, it could prove fruitful to compare the expository structure of mathematics texts and the historical development of the results included in those texts.
It is worth noting that, while recent efforts address the dynamics of core-periphery [38,39] and community [40,41] structures in networks, comparatively little work has addressed the growth and emergence of these structures over time. The perspective of our work is therefore important; we consider there to be some a priori structure of mathematical knowledge, unbeknownst to us, which is elucidated and approximated by the various texts [42]. Thus, rather than examining the evolution of meso-scale features in the networks, we instead focus on how the eventual features, which we take to represent those present in the latent structure of the knowledge, happen to be created throughout exposition. Such methods dealing with the emergence of meso-scale features could prove useful in studies of learning. For example, how do the semantic networks of students' knowledge evolve as students are taught? Can that evolution be formally predicted by a generative network model built from the textbook used in the class?

Knowledge gaps in the exposition of mathematics texts
Using the tool of persistent homology, we examined the growth and persistence of knowledge gaps (colloquially), or topological cavities (formally), in the semantic networks of linear algebra texts. While this tool has been applied to other types of text, including Shakespeare's plays [43] and natural language [44], little is known about how knowledge gaps within an expositional text or growing semantic network may impact how that text or knowledge structure might be received or understood. Our hypothesis, motivated by the idea that a topologically complex structure with many gaps in knowledge might be more difficult to learn, was that effective exposition likely seeks to produce a smaller number of knowledge gaps, as the creation of a great number of topological cavities could prove confusing to a reader. Still, leaving a few gaps throughout exposition can add intrigue to the subject and allow for the reader to make connections themselves.
In the context of this discussion, it is interesting to contrast the features of a process that humans have arguably optimized for explicit learning with the features of a process that nature has arguably optimized for implicit learning [45]. As a token of the former, we consider textbook writing; as a token of the latter, we consider language acquisition in children [5,42]. Evidence suggests that knowledge gaps, detected as topological cavities, are a robust feature of language acquisition in toddlers and their prevalence is unaffected by maternal education or by the order in which words are learned [25]. One could speculate that this observed homogeneity in the early semantic feature network learning supports robust language acquisition, ensuring that children who are exposed to different sets of words at different times are still able to reach adult language proficiency. In contrast, when constructing an exposition for a textbook whose sole purpose is to take a set of students from naivety to sophistication in the same place and at the same time, such robustness is not needed and instead consistency, thoroughness, and comprehensiveness are required. The relative paucity of knowledge gaps in the textbooks we study here would be consistent with these distinctions in goals and environment. It could prove useful in future to more generically assess the robustness of growing networks to the order of nodes [46], particularly as a function of the implicitness or explicitness of the learning process.

Future directions
A clear open area for future work lies in understanding tradeoffs in ordered network structure. Here, we find four separate instances in which semantic networks of linear algebra textbooks appear to balance competing constraints. First, while core-ness and modularity are higher than expected in a continuous configuration null model, they are notably lower than expected in a random index null model. Second, while core nodes tend to be added more quickly than periphery nodes, the difference in speed is more stark in the random index model. Third, while some texts add core nodes faster than expected in the random sentence order null model, some texts add core nodes more slowly, suggesting that each text opts for a different expositional style. Fourth, while the barcodes of the true networks are relatively sparse compared to the random edge model, they still exhibit more persistent cycles than the most ordered models, the node-ordered and topological distance filtrations. Collectively, these results suggest that effective and useful exposition, while structured in nature, is not as strongly structured as it could be. It may be effective to purposefully introduce some gaps in knowledge by withholding topics to support productive failure [22] or by providing broad yet detailed motivation before introducing topics. Of course it is also possible that our observations reflect the nature of the structure of mathematics: perhaps mathematics simply does not have as strongly ordered a structure as we might observe in our null models. Future efforts could seek to better understand this tradeoff and its potential causes.

Materials and methods
Details regarding the materials and methods used in the performance of this work can be found in the Supplementary Materials.

Summary of Supplementary Material
In this supplementary document, we provide supplemental methods, followed by supplemental results. We conclude with additional discussion relevant to the findings in both the main and supplemental texts.

Supplementary Methods
In this work, all the tools and methods that we develop and use are designed to be broadly applicable to any expositional text. We therefore also provide Python code containing tools for extracting semantic networks of concepts from text as well as performing analysis on the resulting networks at https://github.com/nhchristianson/Math-text-semantic-networks.

Data Collection and Preprocessing
We collected a set of ten PDF files of linear algebra textbooks with publication dates ranging from 1967 to 2018 [47,48,49,50,51,52,53,54,55,56]. The set is relatively diverse, with some books focusing on theory and other books focusing on applications of linear algebra to real world problems. The set also includes two texts that were translated from a different language, and two texts that are made available online for free use. We converted the PDF files to text files with the tool at https://pdftotext.com/, and manually cleaned each text to remove everything except the main chapters, discarding any introduction or appendix chapters. After converting the text to the Unicode KD normal form (NFKD) and replacing hyphens with spaces, we used spaCy [57] to lemmatize all the words in each text, which reduces inflected forms of words to their dictionary form. We then used the Python Natural Language Toolkit (NLTK, Version 3.3 [58]) to tokenize the text into sentences and their component words, replacing any word with numerical characters in it with the character "#". We then removed all words not composed solely of letters and "#", converting the remaining words to all lowercase. Because embedded mathematics in the textbooks' expositions is imperfectly handled by the PDF-to-text conversion process, we implemented a final measure in an attempt to clean the text of artifacts such as the remnants of variables and equations. In particular, we first created a "stop list" (that is, a list of common stop words in English) by taking the Ranks NL Long stop word list (https://www.ranks.nl/stopwords) and removing all single-letter stop words from the list except "a", since such single character words likely reflect variables within the text.
Then, we applied a series of rules to determine whether any given word token was sufficiently variable-like to be converted to a "VAR" variable placeholder: any "#" placeholder was kept as is; then, any word without vowels was converted to "VAR"; then, any word of length at most two not present on our stop list was converted to "VAR"; and finally, any word of length 3 or 4 that did not spell check using the Enchant [59] spell-checker was converted to "VAR".
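The normalization step and the variable-detection rules can be sketched with the Python standard library alone. This is an illustrative sketch, not the code in the authors' repository: the function names are hypothetical, the vowel set is assumed to be "aeiou", and the spell-checker is passed in as a plain callable (`spell_ok`) instead of calling Enchant directly. Lemmatization with spaCy and tokenization with NLTK are not shown.

```python
import unicodedata

VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel


def normalize(text):
    """Convert text to Unicode NFKD and replace hyphens with spaces."""
    return unicodedata.normalize("NFKD", text).replace("-", " ")


def mask_numbers(token):
    """Replace any token that contains a digit with the placeholder '#'."""
    return "#" if any(ch.isdigit() for ch in token) else token


def to_placeholder(token, stop_list, spell_ok):
    """Apply the variable-detection rules in the order given in the text.

    token:     a lowercase token made only of letters, or the '#' placeholder.
    stop_list: set of stop words (single letters other than 'a' removed).
    spell_ok:  callable returning True if a word passes a spell check.
    """
    if token == "#":
        return token                      # numeric placeholders are kept
    if not VOWELS & set(token):
        return "VAR"                      # no vowels: likely a variable
    if len(token) <= 2 and token not in stop_list:
        return "VAR"                      # very short and not a stop word
    if len(token) in (3, 4) and not spell_ok(token):
        return "VAR"                      # short and not a dictionary word
    return token


# Toy stop list and dictionary, invented for illustration.
toy_stop_list = {"a", "of", "is"}
toy_dictionary = {"rank", "map", "span"}
toy_tokens = ["the", "rank", "of", "aij", "xy", "#"]
toy_cleaned = [
    to_placeholder(t, toy_stop_list, lambda w: w in toy_dictionary | {"the"})
    for t in toy_tokens
]
```

Passing the spell-checker as a callable keeps the rules testable without a dictionary installed; any checker exposing a word-level yes/no test could be supplied in its place.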

Concept Extraction
In order to construct a semantic network, it is first necessary to choose which concepts should comprise the nodes of that network. The aforementioned studies have generally considered all or most of the individual words in a text as the network nodes; we avoid this assumption so that we may consider, further than individual words, higher-level concepts that may be presented in multi-word phrases. Another choice of nodes could be the topics present in the index of a text, if an index is included. We also choose not to use this method, as we seek to determine and extract the concepts from the text's exposition via some more intrinsic metric of conceptual significance. This choice was motivated by an interest in examining the semantic networks of concepts that the text poses as significant, rather than simply those of concepts which the author deems significant. Thus, via this paradigm of intrinsic conceptual significance, we aim to emulate human readers in their assessment of the significance of concepts. In choosing a methodology of extracting concepts from the texts for use as the networks' nodes, we sought to find a method that would maximize the number of extracted mathematical concepts while minimizing the number of extracted words and phrases that are not mathematics related. We also sought a method that would be extensible to domains of knowledge and exposition aside from mathematics, so that our whole methodology can be extended to the analysis of general textual exposition.
The linguistics and natural language processing (NLP) literature provide a number of canonical statistical metrics for determining the significance of n-grams, or phrases comprised of n words, within text [60]. After exploring a number of both supervised and unsupervised keyphrase extraction methodologies, we settled on an unsupervised method based on the rapid automatic keyword extraction (RAKE) algorithm [31]. RAKE works as follows:
1. A provided set of stop words, phrase delimiters, and word delimiters are used to divide the document into a set of candidate keyphrases and their comprising keywords.
2. The frequency of keywords and their co-occurrence in different keyphrases is calculated, forming a co-occurrence graph.
3. The candidate keyphrases are ranked by a scoring function score(k), which typically ranks candidate keyphrases by certain properties of their comprising keywords.
4. Some threshold n is chosen, and the top n ranked candidate keyphrases are taken to be the extracted keyphrases.
In RAKE, the scoring function for a candidate keyphrase k is typically taken to be
\[ \mathrm{score}(k) = \sum_{i} \frac{\deg(k_i)}{\mathrm{freq}(k_i)}, \]
where deg(k_i) and freq(k_i) are the degree and frequency, respectively, of the ith keyword comprising the phrase k in the co-occurrence graph RAKE constructs. As such, RAKE poses that significant keyphrases are those whose component words co-occur with many other words, but do not occur very frequently. Because we wish to ensure that the scores of more plausibly mathematical words are higher, the keyphrase scoring function we use incorporates the term frequency-inverse document frequency ranking method [61], adding an additional term to the RAKE score function in order to account for how frequently a given keyphrase occurs in an external corpus. Specifically, we specify our phrase scoring function as
\[ \mathrm{score}(k) = \frac{1}{\mathrm{brown}(k) + 1} \sum_{i} \frac{\deg(k_i)}{\mathrm{freq}(k_i)}, \]
where brown(k) is the number of times that the entire keyphrase occurs in the Brown corpus [62] with the "Learned" category (comprised of scientific and other academic texts) removed. As such, we aim to penalize phrases that occur very frequently in regular text, as such phrases will likely not be as meaningful in a mathematical sense. We add 1 to the brown(k) term since not all phrases extracted from our texts by RAKE occur in the Brown corpus.
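To make the scoring concrete, the following sketch computes a RAKE-style score with a Brown-corpus penalty for a handful of toy candidates. It rests on two assumptions that are not confirmed by the text: word degree follows the common RAKE convention of summing the lengths of all candidate phrases a word appears in, and the Brown term enters as a division by brown(k) + 1. The function name `score_phrases` and the toy counts are hypothetical.

```python
from collections import defaultdict

def score_phrases(phrases, brown_counts):
    """Score candidate keyphrases (tuples of words).

    freq(w): number of candidate phrases (with repeats) containing w.
    deg(w):  summed length of those phrases (assumed RAKE convention).
    The summed deg/freq ratio is divided by brown(k) + 1 (assumed form).
    """
    freq = defaultdict(int)
    deg = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            deg[word] += len(phrase)
    scores = {}
    for phrase in phrases:
        rake_score = sum(deg[w] / freq[w] for w in phrase)
        scores[phrase] = rake_score / (brown_counts.get(phrase, 0) + 1)
    return scores

phrases = [("linear", "map"), ("linear", "independence"), ("map",)]
brown_counts = {("map",): 3}   # toy count, not taken from the Brown corpus
scores = score_phrases(phrases, brown_counts)
```

In this toy case "linear map" keeps its full score because it has no Brown count, whereas the single word "map" is divided by 4.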
In our code, we use the python-rake implementation of RAKE (https://github.com/fabianvf/python-rake); as a stop word list, we use the modified Ranks NL Long stop word list discussed in the previous section, from which we remove the word "value" (which plays an important role in linear algebra phrases such as "singular value decomposition"). We also add to this stop list our placeholder words "#", "VAR", and "-pron-" (the pronoun placeholder output by the spaCy lemmatizer), as well as certain words used extensively in mathematics exposition that do not convey mathematical content, in an effort to ensure that our keyphrases better reflect a set of meaningful mathematical concepts (see Supplementary Table S1). We also prune the candidate pool by specifying that keywords must be comprised of at least 3 characters and must occur at least 5 times within the text, and keyphrases can be no more than 4 words long. Given these specifications, RAKE generates a set of candidate keyphrases and their associated scores, which we modify with the addition of our extra Brown frequency term. We then clean the candidate keyphrases by removing any of the numerical, variable, or pronoun placeholders; after this cleaning, if there are any duplicate candidates, we give the keyphrase in question the highest score from all duplicates. We choose a threshold of one-half of the candidate keyphrases, since this threshold appears to include most phrases one might expect to reflect significant linear algebra concepts in each text; thus we take the top half of scored keyphrases to be the concept set for each text, which we shall henceforth refer to as the index list. This threshold of one-half is similar to thresholds used in other work, such as the threshold of one-third in RAKE [31] and TextRank [63]. However, no choice of threshold will perfectly include all relevant concepts and omit irrelevant words.

Table S1: examples, counterexample, text, texts, undergraduate, chapter, definition, notation, proof, exercise, result

Figure S1: A simple example of a semantic network comprised of linear algebra concepts. (a) The lack of connection between "square matrix" and "eigenvalues" or between "isomorphism" and "determinant" indicates the presence of a knowledge gap. (b) The knowledge gap is extinguished by the addition of the relationship between "isomorphism" and "determinant," thus ensuring that all concepts' neighbors are also neighbors themselves.
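The cleaning, de-duplication, and thresholding of candidate keyphrases described in this section can be sketched as follows. This is an illustrative outline only: the function name `build_index_list` is hypothetical, candidates that become empty after placeholder removal are assumed to be dropped, and the top half is assumed to be rounded up when the number of candidates is odd.

```python
import math

PLACEHOLDERS = {"#", "VAR", "-pron-"}

def build_index_list(scored_candidates, keep_fraction=0.5):
    """Strip placeholders, merge duplicates by maximum score, keep the top share."""
    best = {}
    for phrase, score in scored_candidates:
        words = [w for w in phrase.split() if w not in PLACEHOLDERS]
        if not words:
            continue                      # assumption: empty phrases are discarded
        cleaned = " ".join(words)
        best[cleaned] = max(score, best.get(cleaned, float("-inf")))
    # Rank by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(best, key=lambda p: (-best[p], p))
    n_keep = math.ceil(len(ranked) * keep_fraction)   # assumption: round up
    return ranked[:n_keep]

candidates = [
    ("linear VAR map", 2.0), ("linear map", 5.0), ("eigenvalue", 4.0),
    ("# VAR", 9.0), ("basis", 1.0), ("span", 0.5),
]
index_list = build_index_list(candidates)
```

Here "linear VAR map" collapses onto "linear map" and inherits the higher score of 5.0, and two of the four surviving candidates are retained.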

Network Construction
Once we have determined a set of concepts to use as the nodes of a text's semantic network, we then wish to form the semantic network of those concepts and their relationships, as provided by the text's exposition. Certain approaches to semantic network construction seek to determine not only whether two entities are related, but also the semantic nature of the relationship between the entities in question. Fig. S1 gives an example of such an annotated semantic network, in which each relation has a meaningful label. Such semantic parsing techniques to generate semantic networks have been applied to scientific texts in several cases [64,65], but they generally require involved syntactic parsing rules or data annotation. We did not use these approaches, as the messy nature of the text-converted mathematics textbooks (with embedded variables, formulas, and symbols sometimes interrupting sentences) likely would have interfered with effective inference of semantic relationships. Instead, we use a method of extracting concept relationships that is more resistant to such noise: co-occurrence frequency [66]. Co-occurrence is a notion specifying the degree to which words or phrases tend to occur near each other in either a text or a set of texts. Statistical metrics based on co-occurrence have been studied extensively in the field of computational linguistics as a measure of the semantic relatedness of words or phrases [67,68,69].
We construct our semantic networks of concepts by calculating co-occurrence of the concepts in each text. Because we are interested in relationships between concepts which are not purely linguistic in nature, and since many of our concepts are multiple-word phrases, we choose to calculate co-occurrence on the sentence level; this level of granularity will also ensure that phrases in the same sentence, yet separated by a string of math variables, will be inferred to be related. Thus, we proceed in constructing each text's semantic network by calculating the co-occurrence of its index set, deeming two concepts to co-occur, and thus be related, if they occur in the same sentence at some point in the text. We also assign to each edge between concepts an integer weight indicating the number of sentences in which the two concepts co-occur. This data yields an undirected weighted graph G = (V, E), where each node v ∈ V is a concept and each edge $e = (v_1, v_2) \in E$ represents a semantic relationship between concepts with an associated positive integer weight $w(e) \in \mathbb{Z}^{+}$ denoting the number of sentences in which the two concepts co-occur.
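A minimal sketch of sentence-level co-occurrence counting is given below. It assumes sentences are already cleaned, lowercase, whitespace-separated strings, and that a multi-word concept is present when its words appear consecutively; how nested concepts (for example "map" inside "linear map") should be handled is not specified in the text, and this sketch simply counts both. The names `contains_phrase` and `cooccurrence_network` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))

def cooccurrence_network(sentences, concepts):
    """Return {(u, v): weight}, where weight counts sentences containing both."""
    phrase_tokens = {c: c.split() for c in concepts}
    weights = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        present = sorted(c for c, p in phrase_tokens.items() if contains_phrase(tokens, p))
        for u, v in combinations(present, 2):   # undirected: store each pair once
            weights[(u, v)] += 1
    return weights

sentences = [
    "a linear map has a kernel",
    "the kernel of a linear map is a subspace",
    "every subspace has a basis",
]
concepts = ["linear map", "kernel", "subspace", "basis"]
weights = cooccurrence_network(sentences, concepts)
```

In this toy text, "kernel" and "linear map" share two sentences and therefore receive an edge of weight 2, while every other edge has weight 1.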
We are not merely interested in the total semantic network of each textbook, but in the development of the semantic networks over the course of exposition. Thus, for each text, we keep track of the first sentence in which each concept and each relationship (equivalently, each node and each edge) is introduced. If a text has N sentences, our methodology of extracting growing semantic networks gives us a sequence of N graphs $G_1 \to \cdots \to G_N$, where the kth graph $G_k$ includes all nodes and edges which have been introduced prior to or during the kth sentence of the text. In the context of algebraic topology, which will be employed throughout this study, such a sequence of nested objects is called a filtration. Henceforth, we will call this sequence the expositional filtration of a text. For this filtration we consider binarized graphs; that is, we disregard edge weights during the filtration and retain them only for the final semantic network, which we call the total network.
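One way to represent the expositional filtration is to record, for every node and edge, the index of the sentence in which it first appears; any graph in the sequence can then be recovered by thresholding those indices. The sketch below assumes the concepts present in each sentence have already been identified, and the names `first_appearances` and `graph_at` are hypothetical.

```python
from itertools import combinations

def first_appearances(concepts_per_sentence):
    """Map each node and each (binarized) edge to its first sentence, 1-indexed."""
    node_birth, edge_birth = {}, {}
    for k, present in enumerate(concepts_per_sentence, start=1):
        for concept in present:
            node_birth.setdefault(concept, k)
        for u, v in combinations(sorted(present), 2):
            edge_birth.setdefault((u, v), k)
    return node_birth, edge_birth

def graph_at(k, node_birth, edge_birth):
    """Nodes and edges introduced prior to or during sentence k."""
    nodes = {n for n, born in node_birth.items() if born <= k}
    edges = {e for e, born in edge_birth.items() if born <= k}
    return nodes, edges

per_sentence = [
    {"kernel", "linear map"},
    set(),
    {"kernel", "linear map", "subspace"},
    {"basis"},
]
node_birth, edge_birth = first_appearances(per_sentence)
nodes_2, edges_2 = graph_at(2, node_birth, edge_birth)
nodes_4, edges_4 = graph_at(4, node_birth, edge_birth)
```

Since birth indices never change, each graph in the sequence contains the previous one, which is the nesting property a filtration requires.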

Meso-Scale Structure of Networks
Complex networks may often exhibit meso-scale or global characteristics of structural order. For example, one may wish to determine whether the network exhibits community structure, in which densely connected communities of nodes exhibit sparse or weak inter-community connections [70]. In the context of semantic networks, such densely connected communities may represent strongly related concepts that reflect the existence of some higher-order enveloping concept or umbrella term. Another type of meso-scale structure which may be exhibited is core-periphery structure, which is characterized by a densely connected set of core nodes and a set of periphery nodes, which are sparsely connected amongst themselves, but are strongly connected to the core [71]. Such an organization of semantic networks is a plausible one in the context of mathematics, in which many different ideas may be developed from a smaller set of highly related concepts.
To detect community and core-periphery structure in the networks, we used the Brain Connectivity Toolbox for Python, version 0.5.0, which is based on the MATLAB Brain Connectivity Toolbox (BCT) [72]. To evaluate the presence of a core-periphery structure, the relevant function in the BCT seeks to assign a network's nodes to either the core or the periphery group, so as to maximize the core-ness quality function [73]:
\[ Q_C = \frac{1}{v_C} \left( \sum_{i \in C_c,\, j \in C_c} \left( w_{ij} - \gamma_C \bar{w} \right) - \sum_{i \in C_p,\, j \in C_p} \left( w_{ij} - \gamma_C \bar{w} \right) \right), \]
where $C_c$ and $C_p$ are the sets of nodes in the core and periphery, respectively, $w_{ij}$ is the weight of the edge from node i to node j (which will be 0 if the nodes are not connected by an edge), $\bar{w}$ is the average of all edge weights, where nonexistent edges with "zero weight" are also included in the average, $\gamma_C$ is a resolution parameter which controls the size of the core, which we set to 1, and $v_C$ is a normalization constant. In effect, in maximizing core-ness we seek to maximize the number and weight of intra-core connections, while minimizing the number and weight of intra-periphery connections.
To evaluate the presence of community structure in the networks, we use a Louvain-like locally greedy algorithm [74] to optimize the modularity quality function:
\[ Q_M = \frac{1}{v_M} \sum_{i, j \in C} \left( w_{ij} - \gamma_M \frac{s_i s_j}{v_M} \right) \delta_{ij}, \]
where C is the set of network nodes, $w_{ij}$ is the weight of the connection from node i to node j, $s_i$ and $s_j$ are the summed weights of edges connected to node i and node j, respectively, $\gamma_M$ is a resolution parameter controlling the size of communities which we set to 1, $v_M$ is a normalization constant, and $\delta_{ij}$ is the Kronecker delta function, which is 1 when node i and node j are in the same community and is 0 otherwise [73]. In effect, modularity maximization seeks to maximize the strength and number of connections within communities, yielding a partition of the network nodes into a set of densely connected communities with few inter-community connections.
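As an illustration of what the modularity quality function measures, the sketch below evaluates it for a fixed partition of a small weighted graph. It assumes the common weighted form in which each ordered node pair in the same community contributes its edge weight minus γ times the product of node strengths divided by the total strength, with the normalization constant taken to be the sum of all node strengths. It does not perform the Louvain-like optimization itself, and the function name `modularity` is hypothetical.

```python
from collections import defaultdict

def modularity(weights, community, gamma=1.0):
    """Weighted modularity of a given partition.

    weights:   {(u, v): w} with each undirected edge listed once.
    community: {node: community label}.
    """
    strength = defaultdict(float)
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
    total = sum(strength.values())        # assumed normalization: sum of strengths

    # Observed term: every intra-community edge counts for (u, v) and (v, u).
    observed = sum(2 * w for (u, v), w in weights.items()
                   if community[u] == community[v])

    # Expected term: sum over same-community pairs of s_i * s_j / total,
    # which equals (sum of strengths in the community) squared over total.
    per_community = defaultdict(float)
    for node, label in community.items():
        per_community[label] += strength[node]
    expected = gamma * sum(s * s for s in per_community.values()) / total

    return (observed - expected) / total

# Two triangles joined by a single bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
weights = {e: 1.0 for e in edges}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in two_groups}
q_split = modularity(weights, two_groups)
q_single = modularity(weights, one_group)
```

Splitting the graph at the bridge gives a clearly positive value, while placing all nodes in one community gives zero, as expected for this quality function.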

Persistent Homology
Beyond characterization of the local and meso-scale attributes of the total semantic network of the texts, we furthermore seek to evaluate structural and topological characteristics of the semantic networks as they are built over the course of the entire text. In particular, we wish to examine the extent to which "knowledge gaps" are created and persist in semantic networks throughout a text's exposition. To this end, we use a method with roots in the mathematics of algebraic topology called persistent homology which, in short, evaluates the creation and lifespan of topological "holes" in data over time, or in this case, over the course of exposition, thus allowing us to characterize and evaluate the presence of these gaps in knowledge. Here we give a brief, intuitive overview of how we calculate persistent homology for our expositional semantic networks; the particularly interested reader may refer to Refs. [27,26,28,29] for a rigorous overview of persistent homology and its computation for data analysis, as well as Ref. [75,76,77,30] for example uses of persistent homology in the context of complex networks.
Recall that a text's semantic network at a certain point in the exposition (a particular graph in the expositional filtration) is an undirected graph, where connections between nodes indicate that the concepts represented by those nodes have already co-occurred in a sentence. Given a binary undirected graph G = (V, E), we may construct an object called the clique complex X(G), which, for every natural number k, assigns to every all-to-all connected subgraph of G on (k + 1) vertices (also known as a (k + 1)-clique) a k-simplex, which may geometrically be represented as the convex hull of (k + 1) affinely independent points. For example, a 0-simplex is simply a single node, a 1-simplex is an edge, a 2-simplex is a filled-in triangle, and a 3-simplex is a filled-in tetrahedron. Intuitively speaking, this clique complex X(G) is a "filled-in" version of the graph G, where, for each k, we choose a distinct color and then we color in all (k + 1)-cliques in G to form k-simplices. Then, classical homology intuitively tells us, for each k, how many topological "holes" of dimension k are in the complex, or similarly, considering a single dimension's color, how many regions are not colored, but are enclosed by that color. In other words, homology detects cycles of k-simplices that surround a void. For example, a 1-cycle reflects a conventional cycle in a graph, just like the hole in a circle; and a 2-cycle reflects a cavity, like the hole in the center of a sphere. A 0-cycle is intuitively slightly different, in that 0-cycles refer to connected components of the graph, so that having more than one 0-cycle tells us that multiple disconnected components exist. In our work, we restrict our focus to these first three dimensions, since these are the most geometrically intuitive.
These cavities or holes are exactly the knowledge gaps which we seek in the semantic networks, as they reflect some closed cycle of (k + 1)-order connections between concepts surrounding a region where the concepts are not connected to that extent.
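For intuition, the number of holes in the clique complex of a very small graph can be computed directly from boundary-matrix ranks. The sketch below is a toy calculation over the two-element field, using brute-force clique enumeration that is only practical for a handful of nodes; it is not how the study computes homology, and all function names are hypothetical. The four concepts are those of the example in Figure S1.

```python
from itertools import combinations

def cliques(nodes, edges, size):
    """All fully connected node subsets of the given size, as sorted tuples."""
    linked = {frozenset(e) for e in edges}
    return [c for c in combinations(sorted(nodes), size)
            if all(frozenset(p) in linked for p in combinations(c, 2))]

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are stored as integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank

def boundary_rank(simplices, faces):
    """Rank of the boundary map sending each simplex to its codimension-1 faces."""
    position = {f: i for i, f in enumerate(faces)}
    rows = []
    for simplex in simplices:
        mask = 0
        for face in combinations(simplex, len(simplex) - 1):
            mask |= 1 << position[face]
        rows.append(mask)
    return rank_gf2(rows)

def betti_numbers(nodes, edges):
    """Betti numbers (b0, b1, b2) of the clique complex of a small graph."""
    simplices = [cliques(nodes, edges, size) for size in (1, 2, 3, 4)]
    ranks = [0] + [boundary_rank(simplices[k], simplices[k - 1]) for k in (1, 2, 3)]
    return tuple(len(simplices[k]) - ranks[k] - ranks[k + 1] for k in (0, 1, 2))

nodes = ["determinant", "eigenvalues", "isomorphism", "square matrix"]
cycle = [("square matrix", "isomorphism"), ("isomorphism", "eigenvalues"),
         ("eigenvalues", "determinant"), ("determinant", "square matrix")]
with_gap = betti_numbers(nodes, cycle)
gap_filled = betti_numbers(nodes, cycle + [("isomorphism", "determinant")])
```

The four-concept cycle has one connected component and one 1-dimensional hole; adding the "isomorphism" to "determinant" edge creates two filled triangles and the hole disappears.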
A useful extension of homology is that, in addition to being able to calculate the homology of a clique complex for each semantic network in the exposition so as to count the number of holes present at each step, we may also keep track of which topological cavities are being created and destroyed at each step in the exposition. Specifically, persistent homology allows for the computation of the homology for the sequence of clique complexes of our expositional filtration $X(G_1) \to \cdots \to X(G_N)$; this tool not only keeps track of the number of cavities of each dimension present at each expositional step, but it also tracks the persistence of each individual cavity over the course of the exposition, so we may identify individual knowledge gaps, when they were created, how long they persist, and when they are extinguished. Rigorously, the kth persistent homology of a graph filtration yields a (multi-)set of intervals called the barcode:
\[ \{ (b_i, d_i) \}_{i=1}^{m}, \]
where $b_i$ indicates the time of birth of the ith k-dimensional cavity, and $d_i$ indicates the time of death of that cavity (which may be ∞ if the cavity is still present in the total network, i.e., it never dies). Thus, the number of intervals, as well as their length (the difference between their death and birth times), indicate the number and persistence, respectively, of topological cavities during exposition.
Once we have computed the persistent homology of a text's expositional filtration for a given k, we may use several characteristics of the resultant persistence intervals to examine various aspects of the persistence of knowledge gaps in the semantic networks. In particular, we consider three metrics: the first is the number of k-cavities present in the final semantic network, which is given by the number of intervals for which $d_i = \infty$, meaning that they persist to infinity. Secondly, we examine the value of m, which gives the total number of k-cavities which were created, at some point, over the course of the exposition. Finally, we define a metric similar to one presented in Ref. [33], which we refer to as the normalized average cycle lifetime of dimension k:
\[ \bar{\ell}_k = \frac{1}{mN} \sum_{i=1}^{m} (d_i - b_i), \]
where N is the number of steps in the filtration, $b_i$ is the time of birth of the ith k-cavity, and $d_i$ is the time of death of the ith k-cavity, unless $d_i = \infty$, in which case we set $d_i = N + 1$, to distinguish these infinitely-persisting cavities from those that die at step N. Intuitively, this metric describes the extent to which an expositional filtration has cycles which are persistent; it is normalized by N, the length of the filtration, and m, the number of k-cycles introduced throughout the filtration, so as to be comparable across texts which might have different filtration lengths or total numbers of cycles introduced. The goal is to allow a formal comparison of how persistent k-cycles tend to be in different texts.
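The three barcode summaries can be computed directly from a list of (birth, death) pairs. The sketch below assumes the normalized average cycle lifetime is the summed interval length divided by both m and N, with infinite deaths replaced by N + 1 as described; returning 0 for an empty barcode is an additional assumption, and the function names are hypothetical.

```python
import math

def count_surviving(barcode):
    """Number of cavities still present in the final network (infinite death)."""
    return sum(1 for _, death in barcode if math.isinf(death))

def normalized_average_cycle_lifetime(barcode, n_steps):
    """Mean interval length divided by the filtration length n_steps."""
    if not barcode:
        return 0.0                        # assumption: no cycles gives lifetime 0
    total = 0.0
    for birth, death in barcode:
        if math.isinf(death):
            death = n_steps + 1           # infinite intervals are capped at N + 1
        total += death - birth
    return total / (len(barcode) * n_steps)

# Toy barcode for a filtration with N = 10 steps.
barcode = [(2, 5), (4, math.inf)]
n_created = len(barcode)
n_surviving = count_surviving(barcode)
nacl = normalized_average_cycle_lifetime(barcode, n_steps=10)
```

Here the finite interval contributes a lifetime of 3 and the infinite one a lifetime of 7, so the normalized average is (3 + 7) / (2 × 10) = 0.5.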
In our work, we use Ripser.py [78] due to its speed and efficiency for the computation of persistent homology for the true networks and the null models. Because Ripser.py cannot extract representative cycles, we use the Python package Dionysus (http://mrzv.org/software/dionysus2/) for extracting example cavities from the persistent homology.

Null Models
In order to determine to what extent the results we observe for meso-scale structure and topological dynamics in the semantic networks are significant, we employ two categories of null models: data-level null models, which randomize on the scale of the underlying text and index list, from which we may then extract semantic networks and expositional filtrations; and projected network-level null models, which randomize on the scale of the networks we extract for each text. Furthermore, while some of these models are particularly suited as null models for the structural metrics on the total network, as they yield a single, weighted network, others are more suitable as null models for the growing dynamics of the semantic networks, as they provide a null expositional filtration. For each null model, our null ensemble is comprised of 100 random instantiations; we present the resulting distributions of metrics alongside the data for our actual networks.
We begin with the data-level null models: for both the total network and the expositional filtration, we wish to determine the extent to which our results might simply be reflective of the topology one would expect from the growing "semantic network" generated by computing the co-occurrence of a random set of words in our texts. To this end, we employ a random index null model, in which we select a random set of index terms of equal size to the original index list, drawn without replacement from the set of words comprising each text (not including the augmented stop word list we used for RAKE extraction). We use this random index list as our set of "concepts" for calculating each text's co-occurrence, yielding both a final weighted network and an expositional filtration, allowing this null to be used both in the comparison of meso-scale structure and development, as well as of persistent homology. Note, however, that we may interpret the random index null model in a different way: since the random index set excludes any stop words, it must be comprised of meaningful words. Thus, the random index model can be viewed as conveying a semantic network; it is not the network that the book intends to convey, but it is a semantic network nonetheless, and may very well include some mathematically meaningful concepts.
We further seek to establish the extent to which our results on topological development of the networks are dependent on the order in which relationships are introduced within the texts. We therefore employ a random sentence order null model, in which for each text, we randomly permute that text's sentences, and use the original set of index terms to calculate co-occurrence. This null model yields the same total network, since the index set is the same and the same sentences are present, and thus the same sentence-level co-occurrences will occur; however, the filtration it yields will differ in the order of edge introduction, thus enabling us to study how the meso-scale and topological development of the network differs based on differing sentence order.
The remainder of our null models are projected network-level nulls. To evaluate the extent to which the results we observe for the core-periphery and community structure of the true networks would be expected from a random network with a similar joint distribution of node degrees and weights, we use the continuous configuration model [79]. This model is an extension of the configuration model for random graph generation, and seeks to preserve the expected degree of each node, as well as the expected strength of the node, where a node's strength is the sum of the weights of the edges it participates in. Specifically, if $d_u$ and $s_u$ give the degree and strength, respectively, of a node u, and $d_T$ and $s_T$ are the sum of all node degrees and strengths, respectively, then given some graph with node set [n], for any two nodes u, v ∈ [n], we define $d_{uv} = d_u d_v / d_T$ and $s_{uv} = s_u s_v / s_T$, as well as $\{P_{uv}\}$ as some family of probability distributions with mean one. Then to generate a graph using the continuous configuration model, we iterate through all possible pairs of nodes u, v, introducing an edge between u and v with probability $d_{uv}$; if an edge is introduced, then the edge is given weight $w_{uv} = (s_{uv} / d_{uv})\, \xi_{uv}$, where the normalized weight random variable $\xi_{uv} \sim P_{uv}$. For the sake of simplicity, we assume that all distributions $P_{uv}$ are identical, so that all $\xi_{uv} \overset{\text{iid}}{\sim} P$; we discuss our fitting of the distribution P for each network in the supplement.
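The generative procedure of the continuous configuration model can be sketched as below. This is a simplified illustration under stated assumptions: the edge probability is clipped at 1 when the product of degrees exceeds the total degree, the mean-one distribution P is passed in as a sampling function rather than fitted, and the function name `continuous_configuration_graph` is hypothetical.

```python
import random
from itertools import combinations

def continuous_configuration_graph(degree, strength, sample_xi, rng):
    """Sample a weighted undirected graph as {(u, v): weight}.

    degree, strength: {node: value} taken from the observed network.
    sample_xi(rng):   draws a normalized weight from a mean-one distribution.
    """
    d_total = sum(degree.values())
    s_total = sum(strength.values())
    edges = {}
    for u, v in combinations(sorted(degree), 2):
        d_uv = degree[u] * degree[v] / d_total
        s_uv = strength[u] * strength[v] / s_total
        if rng.random() < min(1.0, d_uv):          # assumption: clip at 1
            edges[(u, v)] = (s_uv / d_uv) * sample_xi(rng)
    return edges

degree = {"a": 1, "b": 2, "c": 1}
strength = {"a": 2.0, "b": 6.0, "c": 4.0}
rng = random.Random(0)
# Exponential with rate 1 is one example of a positive, mean-one distribution.
sampled = continuous_configuration_graph(degree, strength, lambda r: r.expovariate(1.0), rng)
```

Any positive distribution with mean one can be supplied for `sample_xi`; the exponential used here is only a placeholder for the fitted distribution.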
To examine how our results on persistent homology differ from a model of exposition in which connections are drawn completely at random (that is, with a filtration of the true total network that adds edges randomly), we employ the random edge null model. In this model, edges present in the true network are introduced in a random order, and nodes are introduced immediately preceding their first inclusion in an added edge. Next, to determine how our persistent homology results differ from a model of exposition in which concepts are iteratively introduced and connected to all already-introduced concepts, we examine a node-ordered filtration [25,46]. In this null model, nodes are added by order of introduction in the text; if multiple nodes were originally added in a single sentence, then those nodes will be added to the node-ordered model in a random order. After each node is added to the null, all edges between it and previously-added nodes that are present in the total network are added in a random order.
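Both network-level filtration nulls described here reduce to producing an ordered list of node and edge additions. The following sketch shows one possible representation, with each step stored as a ("node", name) or ("edge", pair) tuple; the function names are hypothetical, and the node order passed to the second function is assumed to have ties within a sentence already broken at random.

```python
import random

def random_edge_filtration(edges, rng):
    """Add edges in random order; each node appears just before its first edge."""
    order = list(edges)
    rng.shuffle(order)
    steps, seen = [], set()
    for u, v in order:
        for node in (u, v):
            if node not in seen:
                seen.add(node)
                steps.append(("node", node))
        steps.append(("edge", (u, v)))
    return steps

def node_ordered_filtration(node_order, edges, rng):
    """Add nodes in the given order, then their edges to earlier nodes, shuffled."""
    steps, seen = [], set()
    for node in node_order:
        steps.append(("node", node))
        new_edges = [e for e in edges if node in e and (set(e) - {node}) <= seen]
        rng.shuffle(new_edges)
        steps.extend(("edge", e) for e in new_edges)
        seen.add(node)
    return steps

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
rng = random.Random(0)
random_steps = random_edge_filtration(edges, rng)
ordered_steps = node_ordered_filtration(["a", "b", "c", "d"], edges, rng)
```

Both outputs contain the same four nodes and four edges as the total network and differ only in the order of introduction, which is what the persistent homology comparison is sensitive to.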
Finally, we must consider a caveat for these null models: in particular, while the original expositional filtration, the random index null, and the random sentence order null have some intrinsic sense of "time" of introduction due to the presence of the sentence structure of the text, the latter two null models do not, as they introduce nodes and edges one at a time. As such, in order to meaningfully compare persistence barcodes amongst all these models, we must "unfurl" the expositional filtrations of the real network and the random index networks. To this end, we introduce the one-at-a-time (OAAT) filtration process; this methodology takes a filtration in which multiple nodes and edges might be introduced in single sentences, such as the expositional filtration of a text, and transforms it so that only a single node or edge is added at each step in the filtration. Specifically, for each sentence, the OAAT process examines what nodes and edges are added to the network in that sentence; if multiple nodes are added, then they are added first, one at a time, in a random order; then edges are added, one at a time, in a random order. For our true expositional network, we compute 100 instantiations of this OAAT filtration in order to account for stochasticity in the random ordering (we do not do this for each random index or sentence order filtrations, since we already compute 100 distinct such graphs). With this method, we may examine the topological development that occurs not just over the course of the text with a sentence-level granularity, but also on a sub-sentence scale.
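The OAAT process can be sketched as a transformation from per-sentence additions to a flat list of single additions. The representation below, in which each sentence contributes a pair (new nodes, new edges) and each output step is a ("node", name) or ("edge", pair) tuple, is an assumption made for illustration, as is the function name `oaat_filtration`.

```python
import random

def oaat_filtration(additions_per_sentence, rng):
    """Unfurl a sentence-level filtration into one node or edge per step.

    Within each sentence, new nodes are added first in random order,
    followed by new edges in random order.
    """
    steps = []
    for new_nodes, new_edges in additions_per_sentence:
        nodes = sorted(new_nodes)         # sort first so a seeded rng is reproducible
        edges = sorted(new_edges)
        rng.shuffle(nodes)
        rng.shuffle(edges)
        steps.extend(("node", n) for n in nodes)
        steps.extend(("edge", e) for e in edges)
    return steps

additions = [
    ({"a", "b"}, {("a", "b")}),           # sentence 1 introduces two nodes and one edge
    (set(), set()),                        # sentence 2 introduces nothing
    ({"c"}, {("a", "c"), ("b", "c")}),    # sentence 3 introduces one node and two edges
]
steps = oaat_filtration(additions, random.Random(1))
```

Sentences that introduce nothing contribute no steps, which is why the OAAT step index no longer corresponds directly to position in the text.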
There are certain tradeoffs we make in using the OAAT filtration for our expositional filtrations. In particular, we lose the direct relationship of cavity persistence length to "time", or sentence duration throughout the text, since we instead simply introduce one node or edge at each "timestep" in the OAAT filtration. However, long-lived cycles should still tend to be long-lived, under the assumption that there is relatively consistent introduction of nodes and edges throughout the texts. Furthermore, this "unfurling" of the expositional filtration gives us the ability to do a head-to-head comparison of our latter two null models to the expositional filtrations. These two nulls have no built-in notion of time, and introduce a single node or edge at each step of their filtration; as such, putting our expositional filtrations on equal footing makes the qualitative and quantitative comparison of the persistent homologies of these filtrations more direct.

Estimating the normalized weight distributions for the continuous configuration model
The parametrization of the continuous configuration null model for weighted undirected graphs rests upon the choice of a family of probability distributions $P_{uv}$ that specifies the distribution of the possible "normed weight" values for each edge connecting nodes u and v in a network's node set. Specifically, where $d_u$ and $s_u$ are the degree and strength, respectively, of a node u, and $d_T$ and $s_T$ are the sum of degrees and strengths respectively over all nodes in a network, and $d_{uv} = d_u d_v / d_T$ and $s_{uv} = s_u s_v / s_T$ give a normalized view of to what extent two nodes are both high (or low) in degree or strength, then the continuous configuration model assumes that the weight of an edge between two nodes u and v, if such an edge exists, will be
\[ w_{uv} = \frac{s_{uv}}{d_{uv}}\, \xi_{uv}, \]
where $\xi_{uv} \sim P_{uv}$, some probability distribution on what we call the "normalized weight" of an edge. In our work, for the sake of simplicity, we make the assumption that all normalized weight distributions are the same distribution P. With this assumption, we may choose a parametrization of P and fit this distribution on the empirical normalized weights of all edges in a given network. In particular, if the empirical edge weights are given as $\hat{w}_{uv}$ for all (u, v) in the set of edges, then the empirical normalized weights are simply given by $\hat{w}_{uv}\, d_{uv} / s_{uv}$. Once we have the normalized weights, we may choose a parametrization. Because the normalized weights of a network are positive and not restricted to the integers, we attempted maximum likelihood fits of a number of continuous probability distributions with support on the positive real line on each of the networks' normalized weights. Specifically, we focused on long-tailed distributions: the Pareto, Log-normal, Lévy, Burr, Fisk, Log-gamma, Log-Laplace, and power-law distributions. We also calculated the Kolmogorov-Smirnov (K-S) statistic D of each best-fit distribution in order to determine how well the distribution fit the empirical normalized weight data.
Distributions were fit and K-S statistics were calculated in Python with the SciPy library, version 1.1.0. In all networks, the K-S statistic was quite low (D < 0.025) with p-values all significantly greater than 0.05, indicating good fit between the empirical and best-fit distributions, or insufficient evidence to reject the null hypothesis that the empirical normalized weight distribution and the best-fit distribution are identical. The best-fits and statistics for each text's network are reported in Table  S2.
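The empirical normalized weights that serve as input to these fits can be computed from the edge list alone. The sketch below shows only that preprocessing step, assuming the normalized weight of an edge is its observed weight multiplied by the degree term and divided by the strength term; the distribution fitting itself is not reproduced, and the function name `normalized_weights` is hypothetical.

```python
from collections import defaultdict

def normalized_weights(weights):
    """Return {(u, v): w_uv * d_uv / s_uv} for every observed edge.

    weights: {(u, v): w} with each undirected edge listed once.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (u, v), w in weights.items():
        degree[u] += 1
        degree[v] += 1
        strength[u] += w
        strength[v] += w
    d_total = sum(degree.values())
    s_total = sum(strength.values())
    result = {}
    for (u, v), w in weights.items():
        d_uv = degree[u] * degree[v] / d_total
        s_uv = strength[u] * strength[v] / s_total
        result[(u, v)] = w * d_uv / s_uv
    return result

# Toy path graph a-b-c with weights 2 and 4.
toy_weights = {("a", "b"): 2.0, ("b", "c"): 4.0}
xi = normalized_weights(toy_weights)
```

In this toy path graph both normalized weights equal 1, meaning each edge weight is exactly what the endpoint degrees and strengths alone would predict.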

Concepts that appear in more than half the semantic networks' cores
See Table S3.

Example concepts in the Axler periphery communities
See Table S4.

Development of the meso-scale core-periphery and community structures
Similar to our analysis of the development of each text's core and periphery, we further wish to examine the development of the community structure in the semantic networks through the addition of edges between particular groups over the course of exposition. Specifically, we consider four edge types: 'core-periphery' edges, or those connecting a core node with a periphery node; 'intra-core' edges, connecting two core nodes; 'inter-periphery' edges, connecting nodes in two different periphery communities; and 'intra-community' edges, connecting two nodes in the same periphery community. We examine the relative introduction of each group of edge types by calculating, at each point in the texts' expositions, what fraction of edges in a particular group have been introduced. We show in Figure S2 the mean ± 2 standard deviations of these group introduction curves across all texts; for the intra-community curves, we plot two examples: one of an early-introduced community, which attains a value near 1 reflecting near-completion relatively quickly, and one of a late-introduced community, which takes longer to be fully developed, and remains closer to 0 throughout much of the text.
Note that while the core-periphery, inter-community, and intra-core edge sets appear to be introduced steadily, showing little deviation from the diagonal y = x, which reflects constant introduction over time, the early and late intra-community examples shown have significant variability and deviate greatly from such constant introduction. We may quantify this behavior of deviation from constant introduction with the Kolmogorov-Smirnov (K-S) distance: in particular, for any of the edge group development curves c(·), we examine its K-S distance, or greatest vertical distance, to the line y = x on the interval (0, 1):
\[ D_{KS}(c) = \sup_{t \in (0, 1)} \left| c(t) - t \right|. \]
Note that we chose our early- and late-introduced communities in Figure S2 as those communities with the most positive and negative values of c(t) − t on the interval (0, 1), respectively. We plot the resulting K-S metrics for each edge group type across all texts and corresponding null models in Fig. S2b-e. We find relative consistency across texts in relatively low K-S values for the intra-core, core-periphery, and inter-community groups, and notably, in many cases it appears as though the actual texts exhibit lower K-S values, and thus more constancy in edge introduction in these groups, than the bulk of the random index and random sentence order graphs (Fig. S2b-d). Notably, we also observe that while many of the texts exhibit lower mean intra-community K-S values than the bulk of the random index networks, they also generally lie well above the distribution of values for the random sentence order null graphs. Thus, this pattern of findings suggests that while the true texts generally exhibit significant variability in when intra-community edges are introduced during the exposition, the reordering of sentences that occurs in the random sentence order model disrupts this variability, causing a community's edges to, on average, be introduced in a more distributed fashion over the course of the reordered 'exposition'.
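The K-S distance between an edge group's development curve and the diagonal can be computed exactly from the normalized introduction times of the group's edges. The sketch below assumes c(t) is the fraction of the group's edges introduced by normalized time t, so that it is a step function and the largest deviation from y = x occurs just before or at one of its jumps; the function name `ks_distance_from_diagonal` is hypothetical.

```python
def ks_distance_from_diagonal(times):
    """Greatest vertical distance between the step curve c(t) and the line y = t.

    times: normalized introduction times in [0, 1], one per edge in the group.
    """
    ordered = sorted(times)
    n = len(ordered)
    # At the i-th jump (1-indexed) the curve rises from (i - 1) / n to i / n,
    # so both sides of the jump are compared against the diagonal.
    return max(
        max(abs(i / n - t), abs((i - 1) / n - t))
        for i, t in enumerate(ordered, start=1)
    )

early_group = [0.1, 0.2, 0.3, 0.4]       # all edges introduced early in the text
steady_group = [0.25, 0.5, 0.75, 1.0]    # edges introduced at a constant rate
d_early = ks_distance_from_diagonal(early_group)
d_steady = ks_distance_from_diagonal(steady_group)
```

A group whose edges all appear early in the text reaches a fraction of 1 while the diagonal is still at 0.4, giving a distance of 0.6, whereas steadily introduced edges stay within one step of the diagonal.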
In turn, these findings suggest that the periphery communities extracted from the true networks do indeed reflect distinct groups of related concepts that are localized in their position in the text, as we might expect from a chapter focusing on a particular topic (Fig. S2e).

Barcodes and Betti curves for all texts and null models
For sentence-filtration barcodes and Betti curves, see Figs. S3 and S4.

Normalized average cycle lifetime for texts and all null ensembles
For normalized average cycle lifetimes of the sentence-granularity filtrations for the true texts, random index model, and random sentence order model, see Fig. S7. For the normalized average lifetimes of the OAAT filtrations for the true texts and all null models, see Fig. S8.

Extended correlation analysis
In the main text, we report results of a brief exploratory analysis assessing the relationship between structural features of exposition and community ratings of the textbooks from which the expositions are taken. Here, we provide the complete statistics for the Spearman and Pearson correlations between average rating on Goodreads and normalized average cycle lifetime (NACL) in Table S5. We furthermore examine additional correlations between text features, both structural and otherwise, in Fig. S9, with associated p-values in Fig. S10. Notably, while we observe correlations between average and dimension-2 OAAT NACL and both the number of sentences and the node count of each text, neither of the latter structural features is significantly correlated with the average text rating. Furthermore, though the number of ratings for each text is highly variable (Table S6), we find that it does not significantly correlate with text rating (Spearman ρ = 0.464, p = 0.294). Finally, we find that both dimension-0 and average OAAT NACL are negatively correlated with the frequency of the word "proof" in the texts' sentences (Spearman ρ = −0.782, …).

Figure S2: Community development curves across texts, and associated K-S distance between community development curve types and the line y = x across all texts and null ensembles. (a) Mean ± 2 standard deviations of community development curves (fraction of edges within a particular group present at a particular normalized time in the exposition) across all texts, (b) K-S distances for the core-core edge introduction curve, (c) K-S distances for the core-periphery edge introduction curve, (d) K-S distances for the periphery-periphery edge introduction curve, and (e) mean K-S distances across intra-community edge introduction curves.

Figure S3: Sentence-filtration barcodes and Betti curves for the first half of the texts. Each pair of rows shows an example barcode and Betti curves for a given text, with text results in the leftmost column and null models in the other columns.

Figure S4: Sentence-filtration barcodes and Betti curves for the second half of the texts. Each pair of rows shows an example barcode and Betti curves for a given text, with text results in the leftmost column and null models in the other columns.
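For readers who wish to reproduce this kind of rank correlation on their own measurements, the following is a minimal pure-Python sketch of Spearman's ρ, computed as the Pearson correlation of average ranks. It is not the analysis code of the study, it omits significance testing, and the `toy_rating` and `toy_nacl` values are made up for illustration only.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values, NOT the data from the study.
toy_rating = [4.2, 3.9, 4.0, 3.5, 4.4]
toy_nacl = [0.10, 0.20, 0.15, 0.30, 0.05]
rho = spearman(toy_rating, toy_nacl)
```

Because the toy ratings decrease monotonically as the toy NACL values increase, ρ here is −1; real data would generally give an intermediate value.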

Supplementary Discussion
Knowledge gaps in the exposition of mathematics texts
Topological cavities are detected as persistent cycles. Notably, we found that most cycles were eliminated before the end of each text. We saw that, while multiple individual connected components were introduced, everything was eventually connected into a single piece, and most components were connected to the primary connected component quite quickly, suggesting that the order in which edges are introduced throughout the text reduces the extent to which cavities are formed. However, the order of the expositions (that is, the extent to which cycles were not introduced and did not persist) did not appear to be maximal: the node-ordered filtration null model exhibited significantly sparser persistent homology than we observed in the texts (Fig. S8). This observation suggests a tradeoff between topological order and apparent learnability. While such neatly ordered expositions might minimize the extent to which knowledge gaps are created and persist, it is likely in the best interest of readable and enjoyable exposition not to follow this purely structural ordering, but instead to properly motivate concepts, give relationships where they seem natural and useful, and make the text generally more readable.
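The merging of connected components described above corresponds to dimension-0 persistent homology, which can be sketched with a union-find structure. The following is a minimal illustration under our own assumptions, not the pipeline used in the study: nodes carry a birth time, edges are given as (time, u, v) triples, and the concept names and times are hypothetical.

```python
def zero_dim_barcode(nodes, edges):
    """Dimension-0 bars (birth, death) for a growing graph.

    nodes: dict mapping node -> birth time; edges: list of (time, u, v).
    A death of None marks a component that never merges into an older one.
    """
    parent = {v: v for v in nodes}
    birth = dict(nodes)  # birth time of the component rooted at each node

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    bars = []
    for time, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle; no component dies
        # Elder rule: when two components merge, the younger one dies.
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        bars.append((birth[rv], time))
    roots = {find(v) for v in nodes}
    bars.extend((birth[r], None) for r in roots)
    return bars


# Hypothetical toy filtration: concepts and the sentence at which each appears.
toy_nodes = {"vector": 0, "matrix": 0, "basis": 1, "span": 1, "rank": 2}
toy_edges = [(0, "vector", "matrix"), (1, "basis", "span"),
             (2, "span", "vector"), (3, "rank", "matrix")]
toy_bars = zero_dim_barcode(toy_nodes, toy_edges)
```

In the toy filtration, the component containing "basis" and "span" is born at time 1 and dies at time 2 when it joins the primary component, leaving a single bar that never dies.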
Our correlation analysis of the barcode densities suggests some interesting directions for further study of the potential relationship between persistent homology of a growing semantic network and effective learnability. Specifically, while our study did not deal explicitly with differential learnability of texts or with how knowledge gaps might affect the learning process, we did observe several interesting relationships between the 0- and 2-dimensional barcode densities and textbook ratings. While these results are preliminary, they suggest that an interesting avenue for further study would be to examine the topology of growing semantic networks in the classroom setting. In particular, one could consider multiple networks: the network of the textbook being used, providing the "latent space" of the knowledge and the relationships between concepts; the teacher's network, as provided in class to the students through lessons; and finally, the students' networks, as they develop over time while the students learn the material. An analysis of the developmental and topological relationships between all three of these classes of semantic networks could yield interesting results on how knowledge structures are transferred from teacher and book to student, and could provide useful insight into effective structuring and expositional presentation of knowledge in a textbook format.

Figure S9: Spearman correlation matrix for text features, including sentence- and OAAT-normalized average cycle lifetime (NACL), core-ness and modularity statistics, core-periphery area, intra-community edge development K-S, word frequencies, average text ratings and number of ratings, and text length, node count, and edge density. "NACL d" refers to NACL in dimension d.

Figure S10: Spearman correlation p-values for text features, including sentence- and OAAT-normalized average cycle lifetime (NACL), core-ness and modularity statistics, core-periphery area, intra-community edge development K-S, word frequencies, average text ratings and number of ratings, and text length, node count, and edge density. "NACL d" refers to NACL in dimension d.
The remarkable effectiveness of the random index null model
Throughout our study, we have used the random index model as a null to examine how the results for the texts' actual semantic networks differ from what we might expect when simply calculating the co-occurrence networks and filtrations of a set of random words in a text. Notably, while most of our results have fallen at the extreme ends of the distribution of metric values we observe for the random index ensemble, almost all of our results, with the exception of the core-periphery curve difference area in particular, fall within the range of values given by the ensemble. The perspective that the null simply gives us a weighted network and filtration computed from the co-occurrence of a random set of words might be disheartening, as it could suggest that our results, rather than informing us about the meaningful structure of semantic networks of concepts elucidated by the books, instead simply reflect growing topologies that would be expected from any similar calculation of co-occurrence within a text. However, there is another lens through which we can view the random index null model: recalling that the random index sets are composed of words not found within the stop word list, we might consider each random index graph as a semantic network itself. Certainly, the semantic features extracted through co-occurrence might not reflect the content that is the primary focus of the text, since the random index set might include words that are not mathematically meaningful. Even so, it is likely that some mathematical words will make their way into the index set, and the remaining words are also meaningful in some way, since they are not stop words. Thus, the random index model may be viewed as an ensemble of semantic networks, each of which simply happens to have a different node set and as such extracts semantic information about the relationships between different words. In this frame of reference, the explanatory power of the random index model both makes sense and should be expected.
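To make the construction discussed above concrete, the following is a minimal sketch of a random index null: draw a set of non-stop words from the text and compute their sentence-level co-occurrence edges. It is an illustration under our own assumptions rather than the study's implementation; in particular, uniform sampling over the vocabulary, single-word tokens, the function names, and the toy sentences and stop word list are all hypothetical.

```python
import random
from itertools import combinations


def cooccurrence_edges(sentences, index_terms):
    """Map each co-occurring pair of index terms to the first sentence where both appear."""
    index_terms = set(index_terms)
    first_seen = {}
    for position, sentence in enumerate(sentences):
        present = sorted(index_terms & set(sentence))
        for pair in combinations(present, 2):
            first_seen.setdefault(pair, position)
    return first_seen


def random_index(sentences, stop_words, size, seed=None):
    """Draw `size` distinct non-stop words uniformly from the text's vocabulary (an assumption)."""
    vocabulary = sorted({w for s in sentences for w in s} - set(stop_words))
    return random.Random(seed).sample(vocabulary, size)


# Hypothetical tokenized sentences and stop word list.
toy_sentences = [
    ["a", "vector", "space", "has", "a", "basis"],
    ["the", "basis", "spans", "the", "space"],
    ["every", "matrix", "has", "a", "rank"],
]
toy_stop_words = {"a", "the", "has", "every"}

null_index = random_index(toy_sentences, toy_stop_words, size=3, seed=0)
null_edges = cooccurrence_edges(toy_sentences, null_index)
true_edges = cooccurrence_edges(toy_sentences, ["basis", "space", "rank"])
```

Repeating the draw with different seeds would give an ensemble of such networks, each with its own node set, against which a statistic of the true index network could be compared.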

Methodological Considerations
There are certain limitations inherent in our work that should be considered for future study. First, our text extraction methodology imperfectly converted PDFs to plaintext, leaving significant textual noise and artifacts of embedded math that required subsequent automated removal, and the remnants of which prevented perfect concept extraction and sentence-level co-occurrence calculation. Because textbook PDFs are easier to access than textbook source material, we spent significant time developing our text extraction approach to account for these circumstances so that our methodology could be widely applicable. However, future work could use the LaTeX source for textbooks in order to reduce noise. Second, the problem of concept extraction is ill-posed because "concept" is a subjective notion. We examined a number of both supervised and unsupervised keyphrase extraction algorithms, and our modified RAKE algorithm performed best in comparison to our intuitive expectations for linear algebra contexts. However, future work will be necessary to (a) determine what threshold should be used when extracting concepts from text, (b) clarify what should constitute a "concept" in a semantic network, and (c) examine hierarchically structured semantic networks to incorporate the subjectivity of concepts into the network structure, so that high-level concepts are distinguished from lower-level ones. Third, our network and filtration construction methodology is only one of many possible methodologies; because we chose to use co-occurrence to construct the networks, they are undirected and lack edge labels detailing the nature of each relationship. Fourth, the application of a clique complex to infer knowledge gaps in a growing network is one of many choices, and it assumes that any fully connected (k + 1)-clique should, in fact, reflect a filled k-simplex of knowledge.
However, a possible alternative could be to only add a k-simplex when such higher-order relationships are observed simultaneously, such as when three words co-occur in the same sentence. Finally, further research in a classroom setting should be able to provide insight into what types of knowledge gaps might have an effect on student learning, thus providing an answer as to how persistent homology should be computed on growing semantic networks.
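The difference between the clique complex and the co-occurrence-based alternative mentioned above can be illustrated for 2-simplices (triangles). The sketch below is a hypothetical illustration, not the study's code: it records when each triangle would be filled under the clique-complex rule (once its last edge appears) and under the alternative rule (only when all three concepts co-occur in one sentence). The function name and toy sentences are assumptions.

```python
from itertools import combinations


def filled_triangles(sentences):
    """Return (clique_birth, cooccurrence_birth), each mapping a concept triple
    to the sentence index at which the corresponding 2-simplex is added."""
    edge_birth = {}
    triple_birth = {}
    for position, concepts in enumerate(sentences):
        present = sorted(set(concepts))
        for pair in combinations(present, 2):
            edge_birth.setdefault(pair, position)
        for triple in combinations(present, 3):
            triple_birth.setdefault(triple, position)

    nodes = sorted({c for s in sentences for c in s})
    clique_birth = {}
    for triple in combinations(nodes, 3):
        sides = list(combinations(triple, 2))
        if all(side in edge_birth for side in sides):
            # Clique complex: the triangle is filled once its last edge appears.
            clique_birth[triple] = max(edge_birth[side] for side in sides)
    return clique_birth, triple_birth


# Hypothetical sentences, each reduced to the concepts it mentions.
toy_concept_sentences = [
    ["basis", "span"],
    ["span", "dimension"],
    ["basis", "dimension"],
    ["kernel", "rank", "nullity"],
]
clique_tri, cooc_tri = filled_triangles(toy_concept_sentences)
```

In this toy example the triple (basis, dimension, span) is filled at sentence 2 under the clique-complex rule but is never filled under the co-occurrence rule, so the alternative would report a cycle that persists to the end of the text.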