Significance and popularity in music production

Creative industries constantly strive for fame and popularity. Though highly desirable, popularity is not the only achievement artistic creations might ever acquire. Leaving a longstanding mark in the global production and influencing future works is an even more important achievement, usually acknowledged by experts and scholars. ‘Significant’ or ‘influential’ works are not always well known to the public or have sometimes been long forgotten by the vast majority. In this paper, we focus on the duality between what is successful and what is significant in the musical context. To this end, we consider a user-generated set of tags collected through an online music platform, whose evolving co-occurrence network mirrors the growing conceptual space underlying music production. We define a set of general metrics aiming at characterizing music albums throughout history, and their relationships with the overall musical production. We show how these metrics allow to classify albums according to their current popularity or their belonging to expert-made lists of important albums. In this way, we provide the scientific community and the public at large with quantitative tools to tell apart popular albums from culturally or aesthetically relevant artworks. The generality of the methodology presented here lends itself to be used in all those fields where innovation and creativity are in play.


Introduction
Starting from the beginning of the twentieth century popular music has acquired a great importance in modern culture. Technological progress allowed for a massive and worldwide diffusion of musical genres and styles, previously confined in relatively small contexts and milieux. This diffusion process gave birth to a global musical culture, i. e

Dataset
Our analysis is based on a dataset, we collected by crawling the Last.fm website in February 2015, with the dedicated API provided by Last.fm itself (the data can be accessed through [29]; details of the collection of the dataset can be found in the electronic supplementary material, section A). The dataset we collected stores information about music albums, released in a time frame going from 1950 to 2015. For each album, we stored information such as the number of times an album has been listened to through the Last.fm platform (denoted as Playcount in the following), the artists, the release year, the list of songs and all userassigned tags. Multiple releases of the same album have been excluded from the sample, keeping only the very first release. Tags are usually single words or short sentences that users may use to annotate albums to ease future retrieval. To each tag in each album is associated a score ranging from 1 to 100 indicating how frequently it has been adopted by users. For instance, a tag with a score of 100 is very significant for an album, while a score of 1 signals a less relevant tag (see the electronic supplementary material). To each album, we associate the date of its first release and we do not consider its subsequent editions. Therefore, when necessary, we correct the release date of an album by resorting to another music information archive, MusicBrainz (http://musicbrainz.org/) which contains the complete information on the release dates of all the editions of an album. Albums without information on their release date were filtered out of the dataset. The dataset was further filtered to correct for the presence of 'over-tagged' albums, namely tagged a lot by a small number of users. This results in a large number of tags with a boosted high score. As these tags are poorly validated by users, the classification of the corresponding album may be inaccurate. To solve this problem, we considered the correlations between the number of tags of an album and its playcount value, the latter being a proxy for the number of users who ever tagged the album. We used this information to identify 'outlier albums', which display more tags than other albums with similar playcount values, and reduced the number of their tags below a given threshold. More information about this tags-filtering process can be found in the electronic supplementary material, section A. At the end of this process, we remove from the sample all the albums without a correct release date or without tags, ending up eventually with a subsample of 163 829 time-ordered albums and 108 984 distinct tags.

Network of tags and topical representation
The tags associated with albums together with album release dates can be used to build a growing cooccurrence network of tags, by mutually linking tags assigned to the same album [13,30,31]. This network will be then adopted to define the properties of each album and its relation and contribution to the global music production. We start with an empty network at the first year of our albums set (1950). Every year a certain number of albums are introduced in the system according to their release dates. In this growing network G, an album is represented by a clique of tags that merges with the existing network through the addition of new tags, the creation of a new link between two already existing tags or the strengthening of a link as depicted in figure 1a. It is in this way, that an album dynamically modifies the network and contributes to the global music production, possibly affecting future creations. To each node of the network, i.e. each tag, a creation time-stamp y tag is associated as the release date of the oldest album in which the tag ever appeared. In this way, the emerging network is a directed one with each link going from the eldest node to the youngest in terms of creation times. Tags created in the same year will be connected through a bidirectional link. The total number of co-occurrences of a pair of tags until a certain year y defines the weight of the link e = (tag 1 , tag 2 ), w(e, y), where we will drop the dependence on y whenever the network at the final year y = 2015 will be considered. The final (at year y = 2015) network G exhibits a power-law distribution for the out-degrees and weights of links (figure 1b-d). On the other hand, in-degrees distribution is not as broad as the out-degrees one. Though the total number of links is large (11 801 650), about 90% of them has weight w(e) ≤ 2. Figure 1c reports the distribution of the absolute difference between the creation times of the pairs of tags connected in the network. Also, here one observes an heavy-tailed distribution pointing to the existence of albums with tags created in very different times.
We exploit the tag co-occurrence network to easily cluster tags according to a common topic through a community detection algorithm. A community of nodes (cluster of tags) is a subset of nodes more strongly connected among themselves than with the rest of the network. After removing the direction of edges, we apply the Louvain method of community detection [32]. In this way, each tag is assigned  represented by a clique of tags introduced in the network at the first release date of each album. In the above example, the new album contributes both to strengthen an existing link and to add the new 'rock' tag to the network. (b-d) Out-degree (red) and in-degree (blue) distributions, distribution of links' weights and distribution for the creation time differences between nodes linked in the network.
with its own topic according to the community it belongs to. We assume that all the tags within the same community bear a close semantic relation. Note that when considering the growth of the network at different time frames, the community structure might vary as it is likely that many communities would not yet exist at early times, while other communities would contain elements differing from those present at the final stage. However, we will assume that the topic of a tag is defined by the latest community structure referring to the year 2015. In this way, we are somehow looking at the music production from the present-day point of view. In the electronic supplementary material, figure B, we report the fraction of the total number of tags belonging to a subset of topics every year from 1950 to 2015.

Album features and impact metrics
We now introduce several metrics to characterize albums based on the tags co-occurrence network and the associations topics-tags. These metrics, summarized in table 1, are meant to quantify some of the features of the albums, their mutual relationships and their impact on the overall production. We indicate a generic album as a, its release date as y a , the set of its tags as T a and the set of the topics associated with its tags as L a . Similarly, the set of all the possible pairs of tags in T a , each one corresponding to an edge in the co-occurrence network, is indicated as E a . Given the co-occurrence network, F(l, y) indicates the total fraction of tags adopted for a given topic l until year y, while T(y) denotes the set of tags present in the network until the year y. Given the significant correlation between the playcount values and the number of tags of an album (see the electronic supplementary material, figure A), all the metrics or the quantities involved in their definition are 'intensive', meaning that they do not explicitly depend on the total number of tags. We will define in the following the adopted metrics, grouping them in three clusters according to the main information they carry (for a more detailed definition of these metrics, see the electronic supplementary material, section C). Our first set of metrics is related to the 'heterogeneity' of an album, i.e. how diversified are the tags describing it. It has been shown in other contexts [33] that being heterogeneous can be an advantage in terms of acquiring popularity, because the more diversified the content, the larger is the variety of public who might be interested.
Topical entropy E(a) is the entropy of the distribution of the tags T a associated with the album a over the set of topics L a , normalized in [0, 1]. A value close to 1 denotes an album less biased on a single topic. If topics represent musical styles, the album with an entropy close to 1 can be considered as a balanced mixture of many styles. Heterogeneity can also be relevant in time as witnessed by the large distribution of creation times as seen in figure 1d. For instance, classicrock and synthrock tags belong to the same topic rsos.royalsocietypublishing.org R. Soc. open log |L a | f (l) = fraction of tags in T a with label (topic) l mainstreamness but they have been introduced in very different times. Average time span TS(a) quantifies this kind of heterogeneity measuring the average creation time distances between the set of tags of an album, so that this metric is 0 if all the tags have been created in the same year. The average tag age A(a) is defined as the average relative age of all the tags associated with the album a, and it aims at quantifying the interplay between age of a tag and its impact or success.
The following set of metrics quantify the relationship of an album with the music production preceding it. The mainstreamness M(a) measures how different was the album with respect to the musical production before its release and it is defined as the distance between the probability distribution of the tags associated with the album and the global probability distribution of all tags associated with all the preceding albums. A value of M close to 0 indicates an album fully aligned with the previous production, while a value of 1 denotes strong originality.
Burstiness B(a) identifies whether an album has contributed to a significant increase in the production associated with a given topic. It is defined as the weighted average of the logarithmic derivative of the number of tags belonging to each topic in L a in the release year y a of an album (see the electronic supplementary material, figure C for a representation of the logarithmic derivative of the number of tags in some topics). A large value of B(a) means that the album was released in a year when one or many of its topics have experienced a fast relative growth, indicating that the album was created following an emerging trend. Finally, the novelty metric f new (a) is the fraction of tags in T a which are actually new at the time y a , which is a proxy for the innovations introduced by the album a.
Another interesting feature to look at concerns the notion of 'possible'. Instead of looking only at the relationship of an album with its preceding production, one can focus on the new possibilities opened by the introduction of an album and its long-term impact on the network. Following Kauffman [23], the adjacent possible of an album a can be defined as the set of all tags unlocked, i.e. made possible, by the release of the album and of its associated tags. To be more precise, we imagine first the complete cooccurrence graph of tags as seen by 2015. This includes all the tags (and their links) ever adopted by the albums released during the whole time span considered. At the beginning of the evolution, this graph has not been explored yet and all the tags are grey, as in figure 2.
As time goes by and new albums are released, tags are progressively coloured in green, meaning tags actually adopted for released albums (we shall refer to them as actualized). A third category of tags can also be identified at each time step, namely tags belonging to the adjacent possible space of a given actualized tag, with the following definition. Given an album a released at time y a and its set of tags T a , a tag belongs to the adjacent possible space of a if it belongs to the adjacent possible of any given tag tag ∈ T a . The adjacent possible space of a tag tag ∈ T a is composed by all the tags that: (i) are neighbours of tag in the global co-occurrence graph; (ii) have been not yet adopted by users at time y a , and (iii) for which all its predecessors (i.e. all the tags with in-links to them) have all been already adopted by users. With reference to figure 2, the tag Fusion enters in the adjacent possible space of rock because all its predecessors (trumpet and rock) have been already actualized. In other words, the adjacent possible space of an album quantifies the number of tags that are one step away from being introduced after its release. In more detail, we define Adj(a) of an album a as the average, over tag ∈ T a , of the fraction of nodes entering the adjacent possible space of tag during y a . This definition makes it easy to identify which part of the co-occurrence network is ready to be discovered by future music production, because all the ingredients necessary for its existence are already in place. The introduction of each new tag contributes to the expansion of this space of possibilities, because such a tag might be the last needed predecessor for other tags to become part of the adjacent possible space. A value of Adj(a) close to 1 indicates that almost every neighbouring tag of T a potentially triggered by the release of a is now very close to being adopted. On the other hand, a value of Adj(a) close to zero indicates that either the album features a set of tags lying in a part of the network already widely explored or that the album is too 'avant garde', because all the nearby part of the co-occurrence network cannot be adopted as the predecessors are not yet in place. In summary, the adjacent possible metric quantifies the short-term impact of an album, identifying how much of the potential adjacent possibilities are actually unlocked by the album itself. A metric similar to the one used here has already been introduced in [34] in the study of the cultural evolution of the production of movies, pointing out its relation with the cultural value of movies.
To quantify the long-term impact of an album on the subsequent production, we resort to an approach inspired to network resilience theory [35,36]. Given a certain tag ∈ T a , we remove such a tag from the network and, following its out-links, we iterate this removal process for its neighbouring tags until no further removal is possible. Links connecting removed tags are also removed, so that one ends up with a smaller network G . We now define uch(tag) as the fraction of G destroyed by the removal of tag. Uchronia uch(a) is obtained by averaging uch(tag) over all tag ∈ T a (weighted to account for how many new albums adopted tag after the release of album a) and it quantifies the impact of an album on future production.
Connected to Uchronia is Uchronia entropy, h uch (a), i.e. the variation of the entropy of the distribution of tags over the different topics, averaged over all tag ∈ T a . Uchronia entropy quantifies how much the rsos.royalsocietypublishing.org R. Soc. open removal of the tags of an album impact the topical distribution of tags. For example, a negative value of this quantity indicates that the removal has brought us to a more uniform distribution, destroying tags clustered around one or more larger topics. On the other hand, a positive value of h uch (a) indicates that the final distribution is more clustered, so that just one or few more topics gather all the remaining tags. Figure 3 shows the correlation matrix between all the metrics introduced above, the playcount values of each album and their age in 2015 (rows and columns within the dotted square). There are clear strong anti-correlations between the mainstreamness and the topical entropy of an album and between its two long-term impact metrics, Uchronia and Uchronia entropy. The interpretation of the latter case is quite trivial: destroying a large fraction of the network leads to a more entropic situation, where larger topics have been considerably reduced and the ensuing distribution is more uniform. The anti-correlation of the mainstreamness and the topical entropy expresses that small topical diversities are usually related to some specialized 'niche' albums, far from the mainstream production. Playcount values exhibit slight correlations (Spearman coefficients all significant but never larger than 0.36 in absolute values) with almost every metric except burstiness and average tag age. The small anti-correlation between playcount values and age (Spearman coefficient −0.14) indicates that being old does not imply a strong influence on the current success of an album, and this is probably due to the recommendation system of Last.fm, favouring novel albums. The relationship between all the metrics and playcount values are better shown in section C of the electronic supplementary material, file S1. The metrics introduced have also been compared with well-known network centrality metrics such as in-degree, out-degree, betweenness centrality and eigenvector centrality, corresponding to the rows and columns outside the dotted square in figure 3. These metrics are computed over the co-occurrence network whose nodes are tags, and we assigned to each album the value corresponding to the average over its tags. We can see that these metrics are highly correlated with each other, and feature the same correlation patterns with the other metrics introduced in this work, i.e. they are correlated or anti-correlated in the same way with the other metrics. Hence, they seem to carry similar information regarding the properties of the albums. Unsurprisingly, they exhibit larger (anti-)correlations with the Uchronia, Uchronia entropy and adjacent possible metrics, which can be regarded as network centrality metrics.

Popularity and significance
The popularity of a given album is strongly linked to its commercial success, connected, in its turn, to the number of people who have listened to it since its release date. Playcount values on Last.fm represent a good proxy to assess popularity. Through them we define as 'popular' albums with playcount values larger than the 75th percentile of the distribution (≈ 10 5 ) and as 'highly popular' albums with playcount values larger than the 90th percentile of the distribution (≈ 10 6 ). As already mentioned, popularity does not always coincide with significance. In order to go beyond popularity and quantify the level of significance of an album, we make use of expert-made lists of influential albums. In this way, we rely on experts' judgements about what is 'significant' or 'worth' and we can investigate whether those albums feature important differences with respect to albums in the popular and highly popular classes. We considered in particular the following lists, easily accessible from the Internet:   Metrics outside the dotted square are standard network centrality metrics (in-degree, out-degree, betweenness centrality and eigenvector centrality). All the positive and negative coefficients were found to be significant, with a p-value considerably smaller than 0.05.
-The National Recording Registry (NRR): a list of recordings that are considered culturally, historically, or aesthetically important or relevant to the American Culture (https://www.loc. gov/programs/national-recording-preservation-board/recording-registry/). We identified 230 entries in our dataset being also part of the NRR list. Table 2 shows a brief list of the five albums with largest playcount values for every list. Despite the presence of some overlaps, the lists are quite different from one another and reflect different viewpoints on music production. Figure 4 shows the distribution of the release dates and the playcount values of the albums for the four lists compared with the whole sample. Both the RS and the GHF lists have a bias towards albums produced between the 1960s and the 1980s, while the other two lists seem to be less biased in this sense and more uniform in time. On the other hand, the only list that seems not to be biased towards popular albums is the NRR list, followed by the GHF list showing just a weak bias. Hence, we decided to merge the RS and NME lists and the GHF and NRR lists. In this way, we obtained two larger lists indicated in the following with the names 'RS/NME' and 'GHF/NRR', the first one being more biased towards popularity than the second. These two categories will be then compared with the popular and highly popular ones. Summarizing, we will consider the following overlapping categories: -popular category: 41 053 albums with a playcount value larger than 10 5 .
-high-popular category: 7325 albums with a playcount value larger than 10 6 .
-RS/NME category: 619 albums considered important by the RSM and NME magazine.
-GHF/NRR category: 509 albums listed as within the GHF and the NRR. Figure 5 shows the behaviour of the metrics defined in the previous section as a function of time for these categories. The qualitative behaviour of the metrics restricted to one of the classes is usually quite similar to that of the whole sample of albums. However, it is evident that they usually display a higher degree of heterogeneity (topical entropy and average time span are higher with respect to the whole sample) and have a small mainstreamness. In the electronic supplementary material, section F, we also show the playcount values as a function of our metrics for the four defined categories. Again we find qualitatively rsos.royalsocietypublishing.org R. Soc. open              other systems [24,37]). Other metrics instead do not display clear monotonic trends. For instance, the adjacent possible metric related to the very next future stays more or less constant in time. Long-term impact metrics (uch(a) and h uch (a)) and the topical entropy E(a) feature instead a positive or negative peak around 1970 and 1980, indicating a time frame of great heterogeneity in which many influential tags have been introduced.

Predicting album category
From the plots in figure 5, it is not easy to assess whether the expert classifications are somehow different from the overall sample and what are the differences among one and another. In order to see whether our metrics are predictive either of success or of the belonging to the RS/NME or GHF/NRR categories, we performed a random forest classification [38] using the album metrics defined in previous sections as predictors. For each classification, we use a training set corresponding to the 80% of our data points, chosen at random without replacement. In order to have a good and unbiased classification, we ran the algorithm by using 1000 random trees, weighting each class according to their size in order to compensate for the high unbalance between them. After having trained the algorithm, we built a receiver-operating characteristic (ROC) curve based on false and true positives rates in the remaining sample. The Area Under the ROC curve (AuROC) is a quantification of how good our classifier is. An AuROC equal to 0.5 means that our classifier is basically random, while 1 indicates a perfect classification.
In every case, we find that our predictors are generally able to classify quite correctly the albums: we have an AuROC equal to 0.85 and 0.92 for the classification of albums with playcount values larger than 10 5 and 10 6 , respectively; an AuROC equal to 0.89 for the RS/NME list; AuROC equal to 0.81 for the GHF/NRR list. In order to test the significance of the predictions, in the electronic supplementary material, section D, we show the same prediction for a randomized case where both belonging to a category and the playcount have been randomly reassigned among all the albums being released in the same decade. The classification for this randomized case shows a significant decrease of the AuROC for each category, being in most cases very close to 0. 5.
The random forest algorithm (RFA) provides also information about which metrics are more relevant for the prediction. The RFA delivers a score by which the examined predictors can be ranked. These scores take into account the differences between the prediction errors obtained with the actual data, and a randomly reshuffled dataset [38]. In this way, we have information on what metrics have been mainly used to predict the belonging of an album to one of our categories and to eventually identify differences.
In figure 6, we show the ranking of the metrics used for the prediction of our categories. Most of the metrics introduced in this work depend on the particular community structure we considered. In all the results discussed above, we used the Louvain method for community detection [32]. In order to test the robustness of our predictions with respect to variations in identifying communities, we repeated all the measures when the community structure is identified by using the OSLOM algorithm [39]. In the electronic supplementary material, section E, we show that even with this different method we are able to predict belonging to each category with almost the same accuracy. Moreover, the most important metrics for the various predictions remain the same.

Popular category
The most important features for the popular category classification are the heterogeneity metrics (E(a), TS(a) and A(a)), followed by mainstreamness M(a) and Uchronia uch(a). Note that the popular category being the largest (25% of the sample can be considered as part of this category), it is also the one which is most difficult to classify as it contains albums with quite diverse features. In fact, the first three metrics in the ranking account for less than 2 5 of the total score and the score of every feature is comparable. Figure 7 shows the percentage of albums belonging to the popular category as a function of three pairs of the most relevant metrics: topical entropy E-average time span TS, topical entropy E-mainstreamness M and topical entropy E-Uchronia uch. In other words, we show the probability that an album belongs to the category popular conditioned on having a certain value for a couple of these three metrics. The distributions of these metrics and the ellipses of covariance at 1 and 2 standard deviations of the scatter plot of the metrics are also shown in order to highlight where the majority of the sample is clustered. The picture emerging from crossing topical entropy and average time span ( figure 7a) is what is to be expected from the relationship between heterogeneity and success. The probability that an album belongs to the popular category is enhanced by having both these metrics higher than usual and the peak of the probability lying outside the largest ellipse of covariance.  Figure 7b unveils another property, which will be common also for the other categories: the probability of belonging to the popular category is enhanced for small values of mainstreamness (indicating great similarity to the musical production at the release time of the album), as can be seen from the fact that there is a peak of probability in the bottom right corner of this panel. This is not surprising at least for the prediction of popularity (we also refer to the electronic supplementary material, figure G), as too original albums might be perceived as weird and then not acquire public favour. Finally, the comparison with Uchronia reveals that the probability of belonging to the popular category is higher when this value is just slightly larger than 0, indicating that albums in this category usually have an impact to some extent on the music production.

Highly popular category
For this category, two of the most relevant metrics are again related to heterogeneity, but the main predictor in this case is novelty, which roughly measures the degree of innovation of an album. Mainstreamness is again important, which is not surprising for the same reasons explained for the popular category. Note that, unlike the popular category, now the first three metrics account for more than 50% of the total score. Figure 8a shows the probability of belonging to this category as a function of the novelty and topical entropy metrics, similarly to what has been shown in figure 7 for the popular category. We can see that the vast majority of albums have a value of novelty which is equal to 0 (top histogram in this panel), but from the plot we can see that albums with a value larger than 0 are more likely to be part of the highly popular category. Moreover, we can observe in figure 8a,b that similar to the popular category, highly popular category albums have small values of mainstreamness and large values of topical entropy. Hence, with respect to the previous prediction, now heterogeneity and adherence to the style of the year of release of the album make the album more likely to be popular. However, in order to acquire vast popularity, an album should also introduce some innovations to the music scenario, suggesting that the public rewards innovative elements with most attention.

Rolling Stone/NME category
We have seen in figure 8 that this category is strongly related to popularity. In fact the main predictors are almost the same as for the highly popular category even though they are ranked in a slightly different way. However, mainstreamness and novelty seem to share a larger part of the total score. The fourth relevant metric that has changed with respect to the highly popular case is Uchronia, a long-term impact metric, though with a very small score. In fact, in figure 9 we can see that mainstream albums (small M) introducing a certain number of new tags (relatively high f new ) are more likely to be part of this category. A large topical entropy also enhances the probability (figure 9b), while in the case of Uchronia (figure 9c) we can see that the probability is enhanced when the value of this metric is slightly larger than 0, indicating the need for some sort of long-term impact in order to be considered important by music magazines. The similarity between the highly popular albums and those belonging to the RS/NME category is quite interesting considering that the RS/NME lists are expert-made. This fact might be the effect of the mutual influence loop between magazines and public opinion. From one side, magazines are biased towards what the vast public likes in order to maximize their sales, while at the same time the public is affected by influential magazines in their opinion of what is interesting in the music scene.

Grammy Hall of Fame/National Recording Registry category
The last category, obtained by merging Grammy Hall of Fame and NRR lists, is instead clearly not related to success and popularity. Besides mainstreamness, which is the second most important predictor, the other most important are in fact quite peculiar with respect to the other categories: burstiness, Uchronia entropy and average tag age. Considering burstiness and mainstreamness in figure 10, it is evident that small mainstreamness and large burstiness are two main features of the albums belonging to this category. The picture is quite surprising when burstiness and average tag age are crossed together (figure 10c). While it is clear that burstiness dominates the prediction as there are large conditional probabilities for large value of burstiness, average tag age must instead be kept small. Finally, the comparison with Uchronia entropy indicates that this long-term impact metric is important, so that the probability of belonging to the GHF/NRR category is higher when this metric is negative.
The criterion of belonging to this category is drastically different with respect to the other categories we have seen and definitely not related to the taste of the public. The fact that burstiness is the main predictor indicates that the albums present in the GHF/NRR lists are typically released whenever some topic was rapidly growing, so in other words these albums could have contributed to the emergence of some trend in music production. The small values of the average tag age instead indicate that the features of the albums were highly related to the particular historical period in which they were released. Finally, the negative values of Uchronia entropy indicate that such tags have a strong influence on the future albums. The result from mainstreamness again indicates, however, that though innovative, these albums are not distant from the musical trends of their times, indicating that also in terms of influence over musical production being too far with respect to what is generally appreciated might lead to negative results.
As a final remark, we must note that the adjacent possible metric was not among the most relevant ones in every prediction. This could either be due to a real irrelevance of this metric or to a correlation with other metrics which are effectively more important. In the electronic supplementary material, section G, we show the same plots shown before for the four categories in the case of the adjacent possible metric. These plots do not highlight a particular relevance of the adjacent possible space in any of these predictions. To shed some more light on the possible meaning of this metric, we show in the electronic supplementary material, figure I, that the probability of belonging to one of the categories considered above, conditioned on having a certain value of the adjacent possible metric, features a maximum before decreasing for large values of the metric itself. It seems thus that there exists an optimal stadium of exploration of the adjacent possible that is rewarded by every category. Too avant garde albums instead, which unlock a large part of the conceptual space, are not considered valuable either by the public or by music experts.

Conclusion
Popular music is a highly competitive world, where bands and musicians produce music albums and tracks rivalling each other in order to acquire public attention. Popularity and in general commercial success seem to be the driving forces of the system, so that the production of artworks has to cater to the public's taste. However, there are other possible levels of importance that only experts in the particular field might be able to recognize. In order to assess and characterize the presence of this new kind of significance, we developed some general metrics aimed at describing the features of musical albums, their relationships with each other and with the global music production and their impact on it. These metrics have been developed using a dynamical space of album features represented by user-defined tags collected on Last.fm in 2015. This space can be represented as a growing network of co-occurrence of tags, where albums are described by cliques of tags attached to the network in their first year of release. The metrics we defined were then used to predict the belonging of an album to four different categories, two of which were linked with popularity and two built using expert-made lists of important albums. Our metrics have been used as features in a random forest classification algorithm, showing a great degree of prediction accuracy (values of AuROC larger than 0.8 in every case). Moreover, the RFA gave us information on which metrics were more important for the predictions, allowing us to have some insights on how an album belonging to a category typically is. We have seen that popularity is linked with heterogeneity, so a popular album connects different musical features which might also belong to a different period of time. Another important characteristic, relevant also for the two expert-made categories, is the mainstreamness, i.e. being somehow aligned with the general style of its own period. In general, albums which could have been found 'too weird' did not acquire popularity.
Albums belonging to the first expert-made category, built using the albums list from the RS and NME magazines, showed a great similarity with highly popular albums. In this sense, it is possible to argue that the level of significance considered in those lists is indeed success, which is not too surprising considering that music magazines have also to cater to the general taste. Finally, albums in the last considered category, obtained by merging lists compiled by the GHF and the NRR, are instead quite different from those having acquired success and can be considered as excelling in significance. This category takes into account cultural, historical and aesthetic importance, and albums belonging to it usually have a high impact on the future evolution of the tags co-occurrence network. Most importantly, these albums were released in a period of fast growth of the musical area they contributed to. In this sense, such albums might have been seminal for important future trends and the high impact of their tags indicates that they had a strong influence on their own field. It is important to mention that a large fraction of albums in this category cannot be considered popular, so that their significance cannot be related to success.
We think the approach proposed in this paper to tell apart significance and popularity is sufficiently general to be applied to other realms of artistic expression along with all the areas of scientific production. The evaluation of the scientific impact of articles and the importance of a scholar, for instance, is nowadays represented by the h-index, computed over the distribution of the scientific citations of a given element [40], which has been widely criticized due to its biases and its lack of accuracy [41,42]. Interesting future developments of this work might include the investigation of solid criteria to decree