Don’t follow the leader: how ranking performance reduces meritocracy

In the name of meritocracy, modern economies devote increasing amounts of resources to quantifying and ranking the performance of individuals and organizations. Rankings send out powerful signals, which lead to identifying the actions of top performers as the ‘best practices’ that others should also adopt. However, several studies have shown that the imitation of best practices often leads to a drop in performance. So, should those lagging behind in a ranking imitate top performers or should they instead pursue a strategy of their own? I tackle this question by numerically simulating a stylized model of a society whose agents seek to climb a ranking either by imitating the actions of top performers or by randomly trying out different actions, i.e. via serendipity. The model gives rise to a rich phenomenology, showing that the imitation of top performers increases welfare overall, but at the cost of higher inequality. Indeed, the imitation of top performers turns out to be a self-defeating strategy that consolidates the early advantage of a few lucky—and not necessarily talented—winners, leading to a very unequal, homogenized and effectively non-meritocratic society. Conversely, serendipity favours meritocratic outcomes and prevents rankings from freezing.


Introduction
Modern advanced economies devote ever-increasing amounts of resources to quantifying and ranking the performance of individuals, companies and institutions. The rationale underpinning this trend is that of meritocracy: ranking performance encourages individuals to strive to be at the top, generating a virtuous cycle that rewards top performers and incentivizes others to improve.
In the workplace, this rationale is often implemented through public relative performance feedback (PRF) [4], which entails disclosing workers' productivity metrics in order to promote the diffusion of the practices adopted by top performers. Adopted by a large number of US corporations [5], PRF has measurably improved productivity in a variety of workplaces (e.g. hospitals [4]), and has led to temporary improvements in test results when applied to students in schools [6].
On the other hand, experimental research suggests that PRF may lead to more nuanced outcomes under certain incentive schemes. Studies have shown that PRF may backfire in situations where participants are compensated under tournament-like incentives akin to schemes that are in place in many firms, where the top-performing employees receive a bonus [7]. Indeed, the cognitive costs associated with a change of strategy to adopt best practices often lead to further improvement for a minority of already excellent performers, and to a deterioration in performance for the rest of the population [8].
Similar contradictions also arise at the aggregate level of organizations. For example, the academic performance of higher education institutions is now measured and ranked in a variety of ways based, e.g., on the ability to attract funding, student output, awards received, graduate employment, etc. [9]. Although all these indicators individually contribute to the quality and prestige of academic institutions, their aggregation into rankings has attracted considerable controversy and criticism as a driver of homogenization in higher education, as universities become more responsive to changes in the rankings themselves than to their broader social responsibilities [10,11].
In line with the above considerations, a growing body of literature suggests that the outcomes of ranking processes do not necessarily reflect the true value of the individuals or organizations being ranked [12]. Arguably, such a disconnect between value and ranking is the by-product of the interaction between three main factors: imitation, serendipity and reactivity.
As mentioned above in relation to PRF, the imitation of 'best practices' adopted by successful individuals can backfire and exacerbate inequalities in performance. In fact, the disconnect between value and success is a typical emergent property of collective decision systems, where individual decisions are not made independently [13,14]. Experimental studies have indeed demonstrated that the very same people or items can achieve markedly different levels of success in a ranking in situations where individuals can observe and imitate the choices made by others (e.g. [15] for a seminal example in an artificial cultural market). In such situations, the compound imitation of choices typically results in a very skewed visibility distribution, which in turn leads to a few dominant 'hits' ultimately capturing most of the attention, as is systematically the case for e.g. movies [16], web pages [17] and even scientific papers [18].
At the same time, serendipity is known to play an exceedingly important, yet often underplayed, role in determining success. Serendipity refers to positive developments that occur in an unplanned manner. Notable examples are scientific discoveries made in fortuitous ways, such as those of penicillin and X-rays. A number of studies have highlighted how random events can lead to the aforementioned disconnect between an individual's value (e.g. skills and intelligence) and her level of success. For example, a recent simulation-based study has shown how short-term success in an artificial society is most often achieved by the luckiest individuals rather than the most talented ones [19], with real-world examples of similar dynamics having been found, e.g., in financial markets [20,21], sports [22] and science [23].
In most cases, the tension between imitation and serendipity as different mechanisms to achieve success in a ranking is driven by reactivity, which refers to changes in behaviour due to the awareness of being observed (also referred to as the Hawthorne effect [24]). In this respect, quantifying and ranking performance is a self-defeating process in situations where individuals can partially manipulate the metrics according to which they are being ranked. This is encapsulated by the adage known as Goodhart's Law [25]: 'when a measure becomes a target, it ceases to be a good measure'. Examples of Goodhart's Law in action abound in many contexts. For example, surgeons in the UK reportedly try to avoid the most complex surgeries due to the introduction of public league tables reporting success rates [26]. Similarly, school systems based on standardized testing are known to be plagued by 'teaching to the test' practices, i.e. teachers devoting disproportionate amounts of time and resources to subjects known to be frequently assessed in tests, preventing pupils from receiving a broader education [27]. In recent years, academia has also been affected by similar practices due to the constantly increasing emphasis being placed on citation-based bibliometric indicators to quantify the impact of published research and rank researchers accordingly. Indeed, plenty of evidence relates such practices to the empirically observed increase in self-citation rates [28,29] and exchange of citations between co-authors [30]. In contexts where individuals or institutions are ranked, reactivity provides a strong incentive to imitate the actions of top performers. Yet, as mentioned above, this can easily backfire. So, what should those lagging behind in a ranking do to climb closer to the top?
In the following, I propose a stylized model to contrast imitation and serendipity as competing mechanisms in an artificial society whose agents are aware of being ranked based on their performance, and try to climb the ranking either by imitating the past actions of better ranked agents, or by trying their luck with the adoption of new actions that are presented to them at random. Within this simplified setting, I will seek to determine whether the adoption of best practices from top performers always outperforms luck, as one would intuitively expect. The model gives rise to a rather rich dynamic, which unveils a negative feedback loop between the likelihood of climbing a ranking and the attempt to do so through the imitation of top performers. Indeed, I will show that imitation is a largely self-defeating endeavour, which in most cases is vastly outperformed by serendipity.
The paper is organized as follows. In §2 I outline the model and provide some qualitative intuition on its functioning and its main results. Section 3 is then devoted to outlining such results in detail, while §4 concludes the paper with a discussion on its implications.

The model
Let us consider N agents who repeatedly select which action to play among M possibilities. Each action can in principle yield a pay-off up to a value π j ∈ [0, 1] (j = 1, …, M), but the agents' ability to reap the benefits of a particular action varies according to a matrix α ij ∈ [0, 1] (i = 1, …, N; j = 1, …, M), such that the pay-off that agent i receives when adopting action j reads

P ij = α ij π j . (2.1)

In the following, and throughout the rest of the paper, I will assume both the π j 's and the α ij 's to be independent random variables drawn from a uniform distribution over [0, 1].
The pay-offs π j in the above definition capture the intrinsic potential profitability of the available actions, whereas the factors α ij capture the idiosyncrasies associated with the agents' abilities to profit from them due to e.g. different skill sets. For example, in an academic context the pay-offs π j would quantify the overall potential for impact of a scientific field j, while α ij would quantify the ability of a specific researcher i to publish high-quality research in it.
Crucially, there can be situations such that P ij > P ik and π j < π k , i.e. cases in which an agent i is better off playing an action that is associated with a lower potential pay-off, but still yields a higher individual pay-off to her (if α ij ≫ α ik ). In the academic analogy used above, this would represent a researcher i whose individual potential is much better fulfilled in a less impactful field.
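This point can be checked with a toy numerical instance of equation (2.1); the numbers below are mine, chosen purely for illustration:

```python
# Hypothetical numbers illustrating equation (2.1): action k has the higher
# potential pay-off (pi_k > pi_j), yet agent i earns more from action j
# because her ability alpha_ij is much larger than alpha_ik.
pi_j, pi_k = 0.4, 0.9
alpha_ij, alpha_ik = 0.9, 0.2

P_ij = alpha_ij * pi_j    # individual pay-off from action j
P_ik = alpha_ik * pi_k    # individual pay-off from action k

# lower-potential action, yet higher individual pay-off
assert pi_j < pi_k and P_ij > P_ik
```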
At the beginning of time (t = 0) each agent starts out by playing a randomly selected action, but at any later time step (t = 1, …, T) she has the opportunity to change action depending on the pay-off she has received in the latest round. Let us denote as P (t) ij the pay-off agent i has received by playing action j at time t, and let us define the pay-offs an agent has accumulated over time as the agent's utility. This reads

u i (t) = Σ t′=0,…,t P (t′) i j t′ , (2.2)

and depends on the set of actions {j 0 , j 1 , …, j t } she has played at each round. At each time step t, the agents are ranked based on the utility they have accumulated up to that point, and use such ranking in order to decide whether to change their current action or not. Namely: (1) at each time step t each agent keeps playing the same action with probability equal to the last pay-off she has received, i.e. P (t−1) ij ; (2) if an agent changes action, then with probability q ∈ [0, 1] she copies the time t − 1 action of a randomly selected agent among those ranked better than her, while with probability 1 − q she picks a new action at random.
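The update rules above can be sketched in a short NumPy simulation. This is my own minimal implementation, not the author's released code; in particular, the behaviour of the top-ranked agent when drawn to imitate (she has no better-ranked peer) is my assumption, handled here by falling back to a random action:

```python
import numpy as np

def simulate(N=200, M=1000, T=500, q=0.5, seed=0):
    """One run of the stylized ranking model; returns final utilities u_i(T)."""
    rng = np.random.default_rng(seed)
    pi = rng.uniform(size=M)                  # intrinsic pay-offs pi_j
    alpha = rng.uniform(size=(N, M))          # abilities alpha_ij
    P = alpha * pi[None, :]                   # P_ij = alpha_ij * pi_j, eq. (2.1)

    action = rng.integers(M, size=N)          # t = 0: random initial actions
    utility = np.zeros(N)

    for _ in range(T):
        pay = P[np.arange(N), action]         # pay-off of each agent's action
        utility += pay
        order = np.argsort(-utility)          # agents sorted best-first
        pos = np.empty(N, dtype=int)
        pos[order] = np.arange(N)             # ranking position of each agent

        prev_action = action.copy()           # imitation targets t-1 actions
        switch = rng.uniform(size=N) > pay    # rule (1): keep with prob. = pay
        for i in np.flatnonzero(switch):
            if pos[i] > 0 and rng.uniform() < q:
                # rule (2a), imitation: copy a random better-ranked agent
                target = order[rng.integers(pos[i])]
                action[i] = prev_action[target]
            else:
                # rule (2b), serendipity: pick a uniformly random action
                action[i] = rng.integers(M)
    return utility

u = simulate(N=50, M=200, T=100, q=0.9, seed=1)
```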
The model's dynamic is sketched in figure 1. Point (1) above captures the agents' quest for actions that are profitable to themselves. Indeed, an agent will drop a potentially highly profitable action (π j ≈ 1) with high probability when her ability to benefit from it is low (α ij ≪ 1). Point (2) introduces the parameter q ∈ [0, 1], which quantifies to what extent the agents pay attention to the ranking and choose to imitate the actions of their most successful peers. When q is large, the majority of action changes will be aimed at imitating the actions of the most successful agents. Conversely, when q is small most agents will select a random action when switching to a new one.

Figure 2 provides some initial intuition of the results we can expect from the model. The panels illustrate agent trajectories in the space of possible actions for simulations of the model with N = 200, M = 1000, T = 500. The pay-offs π j and the matrix elements α ij are independent and identically distributed variables drawn from a uniform distribution over [0, 1]. From left to right, the panels correspond to q = 0.1 (a model where the agents' action selection process is largely random), q = 0.5 and q = 0.9 (a model where the agents' selection process is largely driven by the imitation of better ranked agents), respectively. The top (bottom) panels show the trajectories of the top (bottom) 10 agents according to the final ranking at time T.
As can be seen from the top panels, higher values of q lead to much higher stability in terms of the actions played by the highest ranked agents. Indeed, for q = 0.1 no particular pattern is clearly discernible, as the agents keep switching actions and do so mostly at random. On the other hand, for q = 0.9 the top 10 agents quickly lock in on the same action, and essentially keep playing it with very few interruptions. Furthermore, the ranking ensures that such interruptions are short-lived, as top-ranked agents only have a handful of peers to look up to in the ranking, and these are all playing the same action most of the time.
The bottom panels show that the above effect trickles down all the way to the bottom of the ranking. Indeed, it can be seen that for higher values of q even the lowest ranked agents tend to return to the action played by the highest ranked ones, albeit in a much more noisy fashion. All in all, these examples begin to highlight the presence of a clear feedback mechanism between the ranking and the level of attention the agents pay to it when switching actions: the higher the value of q, the more frequently all agents will turn to the ranking to make their decisions, regardless of their abilities. This mechanism dramatically narrows the diversity of choices made by top-ranked agents, which in turn further narrows the options of lower ranked agents when they turn to the ranking to decide which actions to adopt.
Before proceeding to detail the model's results, it is important to acknowledge its main limitations. As outlined above, the agents' decision-making is based on two very simple rules, which mimic real-world behaviour on an intuitive level but lack microfoundations. In addition, the model does not allow for adaptive behaviour, in that the agents lack memory and therefore cannot learn from their past actions, and the actions' pay-offs remain constant, regardless of the number of agents playing them and of their position in the ranking. Therefore, the model is to be interpreted as a stylized representation of much more complex dynamics. Nevertheless, as we will see in the following sections, the model's strength lies precisely in the simplicity of its assumptions, which allows meaningful comparisons to be drawn between the model's results and real-world outcomes.

Figure 1. Sketch of the model. At the beginning of a time step, agent i is playing action j, which awards a pay-off π j . With probability equal to the individual pay-off P ij = α ij π j (see equation (2.1)), the agent keeps playing the same action, and with probability 1 − P ij switches to a new action, which is determined either via imitation or via serendipity. With probability q, the agent adopts the action being played by a (randomly selected) better ranked agent, while with probability 1 − q the agent selects a new action at random.

royalsocietypublishing.org/journal/rsos R. Soc. open sci. 6: 191255

Results
I will now turn to exploring the model's results in greater detail. First, I will investigate how utility is generated and distributed across the agents.

Utility and inequality
At any given time step we can define the total utility of the agents simply as U(t) = Σ i=1,…,N u i (t), where the utility of each agent is defined as per equation (2.2). Figure 3a shows the total utility U(T) at the end of simulations with N = 200, M = 1000, T = 500. As can be seen, the total utility increases monotonically with the parameter q up to q ∼ 0.9, after which it declines slightly. At first glance, this would seem to suggest that the agents are overall better off when changing actions based on the imitation of better ranked individuals. Yet, the increase in total utility is not unequivocally positive. Figure 3b shows the Gini coefficient for the distribution of the agents' individual utility at the end of simulations. A very well-known measure of inequality in a society, the Gini coefficient is usually defined as

g(u(t)) = Σ i,k |u i (t) − u k (t)| / (2(N − 1) Σ i u i (t)).

By construction, the Gini coefficient ranges from 0 for a perfectly equal society (u i (t) = u k (t), ∀ i, k) to 1 for a completely unequal society where the entirety of the available utility is owned by a single agent (u i (t) = U(t) and u k (t) = 0, ∀ k ≠ i). As shown in figure 3b, the Gini coefficient g(U(T)) at the end of simulations increases monotonically with the parameter q, highlighting a steady increase in inequality.
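As a minimal sketch, the Gini coefficient of the agents' utilities can be computed directly from its pairwise-difference definition. The 2(N − 1)Σ u normalization below, which makes the single-owner case equal exactly 1, is my assumption, since the exact normalization used in the simulations is not reproduced here:

```python
import numpy as np

def gini(u):
    """Gini coefficient of a utility vector u, normalized so that
    perfect equality gives 0 and a single owner gives exactly 1."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    # sum of |u_i - u_k| over all ordered pairs (i, k)
    total_diff = np.abs(u[:, None] - u[None, :]).sum()
    return total_diff / (2.0 * (n - 1) * u.sum())

print(gini([1.0, 1.0, 1.0, 1.0]))  # 0.0: perfectly equal society
print(gini([1.0, 0.0, 0.0, 0.0]))  # 1.0: one agent owns everything
```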
Taken together, the two results above show that increased attention towards the ranking drives an overall increase in utility, although such utility gets increasingly concentrated in the hands of fewer agents. In principle, this does not rule out that those at the bottom of the ranking might still be better off in absolute terms, in which case higher inequality would be a more justifiable outcome. However, figure 3c shows that the total utility accumulated by the bottom 10% of the population eventually decreases for high values of q. Symmetrically, the total utility accumulated by the top 10% steadily increases with q.
In summary, the results presented in this section show that when imitation becomes the prevalent strategy, society as a whole becomes 'richer'. Yet, this is entirely driven by a much faster accumulation of utility in the higher layers of the ranking, whereas those at the bottom eventually accumulate less utility than they would if their actions were chosen at random.

Meritocracy and homogenization
The results of the previous section show that when the agents pay more attention to the ranking, the inequalities between them increase. Yet, such an outcome would surely look more acceptable if it somehow reflected an underlying meritocratic dynamic, according to which the 'best' agents are those who accumulate more utility. In order to verify whether this is indeed the case, I introduce two measures of the agents' intrinsic potential ability, which I refer to as fitness, based on the utility they would be able to extract if they selected actions based on two predetermined strategies. The first one is the average pay-off an agent would receive by playing all actions uniformly at random:

φ avg i = (1/M) Σ j=1,…,M P ij = (1/M) Σ j=1,…,M α ij π j .

It should be noted that the above does not correspond to the average pay-off an agent receives when q = 0, as even in that case agents preferentially play profitable actions and therefore do not sample the action space uniformly. The second fitness measure I consider is instead the highest pay-off an agent can extract by playing her most profitable action, i.e.

φ max i = P ij* , where j* is such that P ij* ≥ P ij , ∀ j ≠ j*.

The two above measures capture different aspects. The former considers agents with a more diversified portfolio of skills as the fittest, whereas the latter singles out those agents who excel at one specific action, regardless of their skills when playing other actions. Figure 4a shows how the agents' final utility correlates with each of the two fitness measures as a function of q. Two observations can be made. First, unless q is very low, utility systematically tends to correlate more with φ max than with φ avg , i.e. as soon as the ranking is relied upon to inform the agents' decisions, the agents who are rewarded the most tend to be those excelling in a single action (typically one associated with some of the highest pay-offs) rather than those who are consistently good at playing several actions.
Second, both correlations decrease for higher values of q. The correlation between utility and φ avg does so monotonically, whereas the correlation between utility and φ max decreases after reaching a maximum around q ≈ 0.3. Such a decrease signals that the more the agents pay attention to the ranking, the less the ranking reflects the agents' actual skills, substantially reducing meritocracy.
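Both fitness measures follow directly from the pay-off matrix. Since the correlation measure is not specified here, the hand-rolled Spearman rank correlation below is one reasonable choice of mine, not necessarily the one used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 1000
pi = rng.uniform(size=M)
alpha = rng.uniform(size=(N, M))
P = alpha * pi[None, :]               # P_ij = alpha_ij * pi_j, eq. (2.1)

phi_avg = P.mean(axis=1)              # mean pay-off under uniform play
phi_max = P.max(axis=1)               # pay-off of the single best action j*

def spearman(x, y):
    """Spearman rank correlation (ties have measure zero for uniform draws)."""
    rx = np.argsort(np.argsort(x))    # ranks of x
    ry = np.argsort(np.argsort(y))    # ranks of y
    return np.corrcoef(rx, ry)[0, 1]

# By construction, an agent's best action pays at least her average one:
assert (phi_max >= phi_avg).all()
```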
The dynamics induced by the ranking also have consequences on society's homogeneity in terms of the number of actions played by the agents. Let us denote the fraction of actions being adopted across the agent population at any given time as d(t) ¼ M À1 P M j¼1 1(p j , t), where the indicator function is such that 1(π j , t) = 1 if at least one agent is playing action j at time t and 1(π j , t) = 0 otherwise. Clearly, we have δ(t) ∈ [1/M; min (1, N/M )], where the lower bound corresponds to all agents playing the same action, while the upper bound is attained when each agent is playing a different action. Figure 4b shows that homogeneity increases with q by reporting the average value of δ(T ), i.e. the average fraction of actions being played at the end of a simulation. As it can be seen, when the ranking plays no role in the agents' decisions (q = 0), the agents already discard a very large fraction of the available actions. Intuitively, this naturally happens through the agents' random search for higher pay-offs: once an agent randomly 'stumbles upon' a highly rewarding action, she will keep playing it over multiple consecutive rounds with high probability. In contrast, with higher q the agents will increasingly tend to imitate the choices of their better ranked peers, ultimately shrinking the space of adopted actions to its bare minimum when q → 1.
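The diversity measure δ(t) amounts to counting the distinct actions currently in play; a minimal sketch:

```python
import numpy as np

def diversity(action, M):
    """delta(t): fraction of the M actions played by at least one agent."""
    return len(np.unique(action)) / M

print(diversity([7, 7, 7], M=10))  # 0.1: all agents on one action (bound 1/M)
print(diversity([0, 1, 2], M=3))   # 1.0: every action in play (upper bound)
```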

Ranking dynamics
How stable are the rankings produced by the model? In this section I address this question by studying changes in the agents' ranking position. Namely, I consider the fraction m i (t) of agents occupying a lower position than a certain agent i in the ranking at a given time t, i.e.

m i (t) = (1/N) Σ k≠i Θ(u i (t) − u k (t)), (3.4)

where Θ( · ) denotes Heaviside's step function (i.e. Θ(x) = 1 for x > 0, and Θ(x) = 0 otherwise), and quantify agent i's change in ranking position over a time interval as Δm i (t, t + Δt) = m i (t + Δt) − m i (t) [32]. The panels in figure 5 show the time evolution of the above quantity averaged over different simulations with N = 200 and M = 1000. In all four panels, changes in the ranking position as defined in equation (3.4) are computed over time lags of Δt = 100 time steps, and the different panels refer to snapshots taken at time steps t = 100, 200, 300, 400. The solid lines represent averages, whereas the shaded regions represent 90% confidence intervals, and each panel shows the results obtained for q = 0.1 and q = 0.9.
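Equation (3.4) and the lagged change Δm i can be sketched as follows (my implementation; the strict inequality mirrors Θ(x) = 1 only for x > 0):

```python
import numpy as np

def rank_fraction(u):
    """m_i(t): fraction of agents strictly below agent i, equation (3.4)."""
    u = np.asarray(u, dtype=float)
    # (u_i > u_k) implements Theta(u_i - u_k); the diagonal i = k is False
    return (u[:, None] > u[None, :]).sum(axis=1) / len(u)

def rank_change(u_then, u_now):
    """Delta m_i(t, t + dt) = m_i(t + dt) - m_i(t)."""
    return rank_fraction(u_now) - rank_fraction(u_then)

print(rank_fraction([3.0, 1.0, 2.0]))  # ~ [2/3, 0, 1/3]
```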
As can be seen, at the beginning of its time evolution the ranking changes substantially, regardless of the agents' preferences when switching actions. Indeed, the downward trend in the top-left panel of figure 5 shows that agents that happen to start at the top of the ranking typically lose ground during the early stages of the model's dynamics, whereas agents initially at the bottom tend to climb up. However, as the dynamics continue, the agents' preferences become increasingly important, leading to very different outcomes.
When random choices are prevalent (q = 0.1), the model still allows for considerable changes in the ranking in the long run. On average, the position of most agents in the ranking does not change dramatically, and agents at the top rapidly consolidate their position, but large fluctuations still take place in the central and bottom parts of the ranking, allowing agents with less utility to climb up. Conversely, when the agents mostly imitate better ranked agents (q = 0.9), the ranking essentially freezes.
The above result highlights once more the negative feedback between rankings and active efforts to climb them based on the imitation of actions adopted by those at the top. Once the model has produced some early 'winners', they will keep their position at the top of the ranking, and any effort made by lower ranked agents to beat them through imitation will only backfire. The only mitigation to this outcome is serendipity (low q), i.e. a random search for more profitable actions, which prevents lower ranked agents from becoming perpetual 'losers'.
As a final remark, it should be noted that all of the above results are robust with respect to changes in the model's parameter specifications. That is, the model's behaviour as a function of q is qualitatively unaffected by changes in the values of N and M.

Discussion
This paper puts forward a stylized framework to model the emergence of a divide between the fitness of an agent and her position in a ranking according to a measure of performance. This is done by simultaneously accounting for three well-documented mechanisms. First, the agents' attempts at climbing the ranking account for reactivity, i.e. the awareness of being observed and changing behaviour accordingly. Second, the imitation of 'best practices' and most successful strategies is encoded in the agents' imitation of the actions adopted by top-ranked peers. Third, serendipity partially determines the agents' chances when they search for more profitable actions. The combination of these three factors gives rise to a fairly rich phenomenology, which makes it possible to study the trade-off between imitation and serendipity as different strategies to climb a ranking. In a nutshell, such a trade-off can be summarized as follows. Attempting to climb a ranking by imitating the actions of those at the top is a self-defeating strategy that further consolidates the early advantage of a few lucky (and not necessarily talented) winners. Attempts based on serendipity, i.e. on a random search for more profitable actions, instead have a mitigating effect on these outcomes.
A number of considerations can be made on the above. First, it is interesting to notice how the model highlights the existence of a negative feedback loop between the attempt to 'enforce' meritocracy by means of a ranking process and the actual possibility of achieving it. The model's dynamic is such that it always creates some lucky winners, i.e. those agents who happen to stumble on a profitable action early on in the model's time evolution and keep playing it over several rounds with high probability. As shown in figure 4a, this is a general feature of the model, as there is always a weakly positive correlation between the success achieved by the agents and their overall fitness (echoing the findings of [19]). Yet, when decisions are largely driven by the ranking, most agents will seek to imitate the actions of the early lucky winners, with the only result of widening inequality (figure 3) and ultimately reducing their own chances of climbing the ranking (figure 5).
In this respect, it is interesting to observe that serendipity plays the role of a double-edged sword in the model. On the one hand, it hampers meritocracy by endowing a lucky minority of agents with an early, yet permanent, competitive advantage. On the other hand, when the parameter q is low, it partially restores meritocracy by favouring upward mobility in the ranking and a higher correlation between fitness and ranking outcomes. Put differently, luck will always generate some disconnect between intrinsic skills and measured performance, but attempts to overcome this by means of a ranking process will typically make things worse for those lagging behind.
From the perspective of society as a whole, the attention paid to rankings has a number of effects. As already mentioned, when the agents seek to climb the ranking through the imitation of others, they ultimately generate higher inequality, lower meritocracy and reduced chances of climbing a ranking (see [33] for similar findings in the context of financial wealth accumulation). Furthermore, the imitation mechanism drastically reduces the diversity of the actions played by the agents, resulting in an almost complete homogenization of society (figure 4b). Interestingly, this echoes evidence from financial markets, where analysts often prefer to imitate each other's forecasts rather than independently coming up with their own [34].
Finally, what lessons can be learned from the above model in the context of academic research and higher education? At the level of individual researchers, the model's results are in line with empirical evidence from publication data, which reveals that the first papers in a novel field, regardless of their content, often tend to attract citations at a higher rate than the papers following them [35]. In this respect, quoting [35], 'the scientist who wants to become famous is better off, by a wide margin, writing a modest paper in next year's hottest field than an outstanding paper in this year's'. Paraphrasing this in the context of the model, those who aim to become top-cited scientists in their field have much better chances of doing so by serendipitously pursuing their own research agendas rather than by imitating those of already well-established scientists.
At the broader level of institutions, instead, the model sheds some light on the causes of the increased homogenization of the higher education landscape, which indeed has been often associated with the ever-increasing emphasis placed on university rankings [10,11]. As shown in figure 4b, the more the agents' decisions are driven by the imitation of their top-ranked peers, the less the space of possible actions is explored, and all agents end up playing just a handful of actions, regardless of their profitability.
In this respect, as a data-rich and ranking-driven environment, academia represents the ideal laboratory in which to test the model's predictions in future work. Indeed, the analysis of citation data makes it possible to quantify the similarity of research outputs [36] and to follow individual career trajectories across a discipline's research space (e.g. [37]). These basic ingredients would make it possible to compare the impact achieved by 'trend-followers' who actively seek to publish in mainstream fields as opposed to serendipitous researchers who mostly follow their own interests. Similarly, aggregating such data would make it possible to quantify the performance of academic institutions in relation to their strategic behaviour.
In conclusion, this paper puts forward a framework to model the interplay between imitation and serendipity in situations where individuals or organizations are ranked (and aware of being ranked) based on some quantitative metric of performance. As mentioned above, the model is a deliberately stylized representation of the real-world dynamics of such situations and clearly has a number of limitations, which extensions of the present work will seek to overcome. Most importantly, future extensions will allow the agents to retain some memory of their previous choices and to learn from them, in order to adapt their preferences for imitation or serendipity accordingly. Also, it should be noted that the model's dynamics would not change significantly if studied on a network of interactions (as opposed to the well-mixed population considered here), because the imitation of actions would diffuse throughout the network regardless of its specific topology. In this respect, future extensions of the model should make it more consistent with the available evidence that network structures alone can determine ranking outcomes [38]. Nevertheless, the model's strength lies precisely in the clarity and simplicity of the assumptions made, and in the fact that these are enough to generate rich dynamics which qualitatively resemble real-world observations. Hopefully, the model presented in this paper will encourage reflection on the importance that we collectively place on rankings, and on the unintended consequences they may have for our societies.
Data accessibility. This is a simulation study; no external data have been used in this manuscript. The code to simulate the model described in the paper has been uploaded to GitHub and archived at: https://doi.org/10.5281/zenodo.3345835.

Competing interests. I declare I have no competing interests.
Funding. I acknowledge support from an EPSRC Early Career Fellowship in Digital Economy (grant no. EP/N006062/1).