Do pride and shame track the evaluative psychology of audiences? Preregistered replications of Sznycer et al. (2016, 2017)

Are pride and shame adaptations for promoting the benefits of being valued and limiting the costs of being devalued, respectively? Recent findings indicate that the intensities of anticipatory pride and shame regarding various potential acts and traits track the degree to which fellow community members value or disvalue those acts and traits. Thus, it is possible that pride and shame are engineered to activate in proportion to others' valuations. Here, we report the results of two preregistered replications of the original pride and shame reports (Sznycer et al. 2016 Proc. Natl Acad. Sci. USA 113, 2625–2630. (doi:10.1073/pnas.1514699113); Sznycer et al. 2017 Proc. Natl Acad. Sci. USA 114, 1874–1879. (doi:10.1073/pnas.1614389114)). We required the data to meet three criteria, including frequentist and Bayesian replication measures. Both replications met the three criteria. This new evidence invites a shifting of prior assumptions about pride and shame: these emotions are engineered to gain the benefits of being valued and avoid the costs of being devalued.


Introduction
Humans have a powerful need to belong [1][2][3] and, reciprocally, a powerful aversion to being devalued, excluded or ostracized [4][5][6]. Here, we consider a functional framework to test emotional components of this motivational disposition. More specifically, we test the emotions of pride and shame as systems engineered to promote others' positive valuations of the self and to avoid being devalued by others [7][8][9][10][11][12].

Prior evidence that pride and shame may track the valuations of audiences
A well-engineered pride system must mobilize not only reactively but also prospectively, in order to motivate the pursuit of socially valued actions that might increase others' valuations of the self [16,32]. In this way, prospective pride helps the individual decide which courses of action to take.
It has been hypothesized that the anticipatory feeling of pride is an internally generated prediction that signals the magnitude of audience valuation one would accrue if one took an action that others value [16]. A pride system that accurately forecasts and precisely tracks audience valuation allows the individual to avoid two types of costly errors: (i) under-activation of anticipatory pride, which would cause the individual to insufficiently pursue socially valued courses of action, and (ii) over-activation of anticipatory pride, which would cause the individual to over-pursue actions in excess of their actual return. This analysis suggests the existence of a feature: the pride system should (i) forecast the magnitude of valuation people in one's social ecology would express if one took a given act that they favour, and (ii) deliver an internal signal (anticipatory pride) whose intensity is proportional to it. Experiments conducted in 16 countries supported this prediction: the intensity of anticipatory pride in every country closely tracked the magnitude of valuation expressed by local audiences, in the absence of any communication between participants reporting their pride and audiences reporting their valuation regarding each of various potential acts and traits, such as generosity, trustworthiness and skills [16].
Analogous reasoning suggests that the anticipatory feeling of shame is an internal prediction of the degree to which local audiences would devalue the individual if she took an action that they disfavour, such as theft, sexual infidelity or stinginess ([24]; see [33][34][35]). By forecasting and tracking the precise magnitude of audience devaluation, the aversive signal of anticipated shame allows the individual to steer adaptively between a dangerous disregard of others' views, which would yield excessive devaluation, and an excessive timidity about one's possible disgraceful behaviour, which would yield insufficient personal payoffs. As predicted, shame closely tracked audience devaluation in three countries [24].

The present work
Here, we present the results of two preregistered replications addressing the following questions: does anticipatory pride track the magnitude of audience valuation [16]? And, does anticipatory shame track the magnitude of audience devaluation [24]?
The present work addresses two issues surrounding the replicability of the original pride and shame studies. First, here we perform preregistered confirmatory analyses. This allows us to validly conduct null hypothesis significance testing while controlling long-run error rates that otherwise would be inflated by undisclosed flexibility in data analysis [36,37]. If the replications are successful, that would make it less likely that the original findings were false positives or that the original effect sizes were misestimated due to undisclosed flexibility in data analysis. Second, the original and replication studies were conducted and analysed by different individuals. Therefore, a successful replication would reduce uncertainty in the original findings by arguing against experimenter error in the original study design, implementation or analysis.
Following best practices to design and implement replication studies [38,39], the first two authors collaborated with the third author (the lead author of the original studies). The first two authors conducted the studies and analysed the data, while the third author shared original materials and data, provided feedback about the accuracy of the study implementation and helped identify discrepancies with the original studies. This allowed us to implement replication studies that were closely aligned with the original studies and document any remaining differences.
The procedures were the same as in Sznycer et al. [16,24], with the following exceptions: (i) the original studies included a number of measures and stimuli testing other hypotheses, which were dropped from the replications; (ii) the replications were administered online, as the original studies were, but were conducted exclusively in the laboratory rather than in a mix of laboratory and non-laboratory settings; and (iii) participants were students completing the tasks as part of a course assignment rather than paid participants, as was the case in some of the original samples. Successful replications would suggest the original effects are robust to these modified procedures. Although there were no strong theoretical reasons for expecting these procedural differences to alter the original effects, the differences were preregistered, as they would be among the first factors to consider as moderators in explaining any failures to replicate.
The current studies also differed from the original research by using the same participants across studies. This difference could impact the results even though study order and condition assignment were random. If participants were first assigned to the audience devaluation condition in the Shame study and then received the pride condition in the other study, it is possible that those participants continued to adopt an audience perspective in the pride condition despite instructions to anticipate how much pride they would feel. Adopting an audience perspective in the pride condition would inflate correlations between pride and audience valuations. The same concern would apply to other condition pairs in which perspective shifts between the first and second conditions (e.g. receiving valuation before shame, pride before devaluation and shame before valuation). This potential problem was addressed with post hoc analyses that excluded data from the study that was administered second, eliminating the possibility of an audience devaluation (or valuation) condition priming an audience perspective in a subsequent pride (or shame) condition and inflating the correlations.
Since no individual measure of replication success is without limitation [40][41][42], we defined replication success as meeting three criteria: (i) a correlation between emotion and audience evaluations that is statistically significant (p < 0.05) and in the same direction as in the original study, (ii) an effect size that is different from zero and not different from the original effect size, and (iii) a replication Bayes factor [43] that exceeds 3,² which is considered at a minimum 'substantial evidence' in favour of the alternative hypothesis relative to the null hypothesis [44].
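Criterion (ii) can be made concrete with a short sketch. This section does not state which test was used to compare the effect sizes, so the example below assumes one common choice: a Fisher r-to-z confidence interval for the replication correlation and a Fisher z test of the difference between the two correlations. The function names and the input values (r = 0.80 and r = 0.75, each across 25 scenarios) are hypothetical and are not the reported results.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation r over n pairs."""
    centre = math.atanh(r)               # Fisher r-to-z transform
    half = z_crit / math.sqrt(n - 3)     # standard error of z is 1/sqrt(n - 3)
    return math.tanh(centre - half), math.tanh(centre + half)

def fisher_z_compare(r_orig, n_orig, r_rep, n_rep):
    """z statistic and two-sided p-value for the difference of two independent correlations."""
    diff = math.atanh(r_rep) - math.atanh(r_orig)
    se = math.sqrt(1.0 / (n_orig - 3) + 1.0 / (n_rep - 3))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return z, p

# Hypothetical inputs: item-level correlations across 25 scenarios in each study.
r_orig, r_rep, n_items = 0.80, 0.75, 25

ci_low, ci_high = fisher_ci(r_rep, n_items)
z, p = fisher_z_compare(r_orig, n_items, r_rep, n_items)

differs_from_zero = ci_low > 0 or ci_high < 0
differs_from_original = p < 0.05
criterion_ii_met = differs_from_zero and not differs_from_original
print(ci_low, ci_high, z, p, criterion_ii_met)
```

Here n is the number of scenarios rather than the number of participants, because the correlation is computed across scenario means.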

Participants
We recruited 87 participants. Six were removed from analyses for failing an attention check, leaving a final sample size of 81 participants (mean age = 20.8 years, s.d. = 4.55; 57 females). Participants were students from the University of Hawai'i at Mānoa who were enrolled in a research methods course [45]. Although students worked with the data as part of a course project after participating in the study, they were naive to hypotheses at the time of testing. Bootstrapping simulations on the Study 1 data from the US sample of Sznycer et al. [16] indicated that 95% power would be achieved with 10 participants. However, because participation was part of a course requirement, the stopping rule³ for the frequentist tests (Replication Criteria i and ii) was determined by the number of students enrolled in the course (n = 87). The actual sample, after exclusion criteria were applied (n = 81), was well in excess of the sample size that would produce 95% power.
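A bootstrapped power estimate of this kind can be sketched as follows. The details of the actual simulations are given in the OSF addendum and are not reproduced in this section, so everything below is an assumption made for illustration: the simulated stand-in for the original ratings, the resampling of raters with replacement within each condition, the split of five raters per condition, and the use of the two-tailed critical r for 25 scenarios (about 0.396 at df = 23).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def item_means(raters):
    """Mean rating per scenario; raters is a list of per-rater rating lists."""
    return [sum(col) / len(raters) for col in zip(*raters)]

def simulate_raters(true_means, n_raters, rng, noise_sd=1.5):
    """Hypothetical stand-in for an original data file (ratings on a 1-7 scale)."""
    return [[min(7, max(1, round(rng.gauss(m, noise_sd)))) for m in true_means]
            for _ in range(n_raters)]

def bootstrap_power(emotion, audience, n_per_condition, r_crit, rng, n_sims=500):
    """Share of resampled studies whose item-level r is positive and exceeds r_crit."""
    hits = 0
    for _ in range(n_sims):
        e = rng.choices(emotion, k=n_per_condition)    # resample raters with replacement
        a = rng.choices(audience, k=n_per_condition)
        if pearson(item_means(e), item_means(a)) > r_crit:
            hits += 1
    return hits / n_sims

rng = random.Random(1)
true_means = [2 + 4 * i / 24 for i in range(25)]       # 25 hypothetical scenarios
emotion = simulate_raters(true_means, 40, rng)
audience = simulate_raters(true_means, 40, rng)

R_CRIT_DF23 = 0.396   # two-tailed critical r at alpha = 0.05 for 25 scenarios
power = bootstrap_power(emotion, audience, 5, R_CRIT_DF23, rng)
print(power)
```

Because the correlation is computed across scenario means, even a few raters per condition can yield high power when scenarios differ reliably in how they are rated.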

Design
Study 1 tested whether the anticipated intensity of felt pride with respect to a given prospective act or trait that others positively value correlates with the degree of positive valuation attached to that act or trait by those in the social world of the individual. Participants rated 25 brief hypothetical scenarios in which someone's acts or traits might cause them to be viewed positively by others.
Participants were randomly assigned to an audience condition or a pride condition. In the audience condition, participants were asked to rate 25 scenarios involving another individual (e.g. 'Her children are healthier and taller than average for their age', 'She is ambitious'). Participants in the audience condition were asked to 'indicate how you would view [someone of your same sex and age] if they were in those situations,' on scales ranging from 1 (I wouldn't view her positively at all) to 7 (I'd view her very positively). These ratings provide event-specific measures of positive social valuation.
In the pride condition, a different set of participants were asked to 'indicate how much pride you would feel if you were in those situations' (i.e. in the 25 scenarios; e.g. 'Your children are healthier and taller than average for their age', 'You are ambitious'), on scales ranging from 1 (no pride at all) to 7 (a lot of pride). Except for the perspectival differences, the stimuli in the pride and audience conditions were identical. The scenarios were presented in randomized order in both conditions. To conduct the replications, we used original materials provided by the third author.

Procedure
Participants were tested in a computer laboratory. They participated in Study 1, Study 2 and a third unrelated study in random order. Participants entered their gender and age, were randomly assigned to one of the two conditions, and completed the task. The scenarios were gendered according to the participant's gender. Participants were given an attention check before completing the study.
¹ We report a different power analysis in the Method sections of Studies 1 and 2 to address a problem with the preregistered power analyses. See the addendum on OSF for details: https://osf.io/jymzk/.
² The Bayes factor quantifies the evidence in favour of one hypothesis relative to a second hypothesis. A Bayes factor of 3 represents odds of 3:1 in support of a hypothesis compared with another, competing hypothesis.
³ Due to an oversight, the stopping rules for Studies 1 and 2 were not preregistered. However, no analyses were conducted before the stopping rule was reached and no additional data were collected after.

Results
This article received results-blind in-principle acceptance (IPA). Following IPA, the accepted Stage 1 version of the manuscript, not including results and discussion, was preregistered on the OSF (https://osf.io/8r9ah). This preregistration was performed after data analysis. Materials, data and analyses are available on the OSF (https://osf.io/upg5w/).

Is the correlation between pride and audience valuation significantly different from zero and in the same direction as in the original study?
Yes. For each scenario, we calculated the mean pride ratings provided by participants in the pride condition, and the mean valuation ratings provided by participants in the audience condition. As in the original study, the pride means and the valuation means were positively correlated, r 23
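The item-level analysis described above can be sketched in a few lines: average the ratings within each condition, scenario by scenario, and then correlate the two vectors of scenario means. The ratings below are hypothetical toy data (three raters and four scenarios per condition) and the function names are illustrative. With the 25 scenarios of Study 1, the test of the correlation has 25 − 2 = 23 degrees of freedom.

```python
import math

def scenario_means(ratings):
    """Mean rating per scenario; rows are participants, columns are scenarios."""
    return [sum(col) / len(ratings) for col in zip(*ratings)]

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scenario means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n_items):
    """t statistic for testing r against zero, with n_items - 2 degrees of freedom."""
    return r * math.sqrt((n_items - 2) / (1.0 - r * r))

# Hypothetical ratings on the 1-7 scales; each condition has its own participants.
pride_ratings = [[6, 3, 5, 2],
                 [7, 4, 5, 1],
                 [5, 2, 6, 3]]
audience_ratings = [[5, 3, 6, 2],
                    [6, 2, 5, 2],
                    [7, 4, 4, 1]]

pride_means = scenario_means(pride_ratings)
valuation_means = scenario_means(audience_ratings)
r = pearson_r(pride_means, valuation_means)
t = t_statistic(r, len(pride_means))
print(pride_means, valuation_means, r, t)
```

The unit of analysis is the scenario, not the participant, which is why the two conditions can be correlated even though no participant provided both kinds of rating.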

Is the replication Bayes factor greater than 3 and in favour of the alternative hypothesis relative to the null hypothesis?
Yes. We complemented the previous two frequentist criteria by computing an evidence updating Bayes factor (EU-BF). The EU-BF indicates whether an effect is present in a replication experiment, factoring in the data from the original experiment [43]. The EU-BF is calculated as follows: $\mathrm{BF}_{10}(\text{replication} \mid \text{original}) = \mathrm{BF}_{10}(\text{original} + \text{replication}) / \mathrm{BF}_{10}(\text{original})$. We took two approaches to calculate a valid Bayesian measure of replication success (E.J. Wagenmakers and Alexander Ly, personal communication).⁴ First, we combined the 25 item pairs from the original and the replication studies as if they were separate items, yielding 50 item pairs, to calculate $\mathrm{BF}_{10}(\text{original} + \text{replication})$. Dividing this by $\mathrm{BF}_{10}(\text{original})$ yielded $\mathrm{BF}_{10}(\text{replication} \mid \text{original}) = 1.03 \times 10^{9}$, which exceeded a Bayes factor of 3. For the second approach, we generated a posterior distribution for ρ from the original study and used it as the prior for the replication study. The recalculated $\mathrm{BF}_{10}(\text{replication} \mid \text{original})$ was $1.19 \times 10^{8}$, which also exceeded a Bayes factor of 3. The two measures differ by almost a factor of 9 because the second approach approximates the first using a stretched β-distribution, which is less accurate for correlations towards the tails of the distribution (e.g. near ρ = 1), as in the current case. Both replication Bayes factors provide evidence consistent with the frequentist approaches, satisfying the third criterion for a successful replication.
⁴ An initial calculation of the replication Bayes factor that was invalid is described in the electronic supplementary material.
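The logic of the evidence updating Bayes factor can be illustrated numerically. The sketch below is not the calculation reported here: it assumes a Fisher-z normal approximation to the likelihood of a sample correlation, a uniform prior on ρ, and simple grid integration, and it multiplies the likelihoods of the two studies rather than pooling their items into one 50-pair correlation. The correlations used (0.80 and 0.75 across 25 scenarios) are hypothetical. Under these assumptions, dividing the combined Bayes factor by the original one gives the same number as using the original study's posterior as the prior for the replication.

```python
import math

POINTS = 20001
STEP = 2.0 / (POINTS + 1)
GRID = [-1.0 + STEP * (i + 1) for i in range(POINTS)]   # rho values strictly inside (-1, 1)
UNIFORM = [0.5] * POINTS                                 # uniform prior density on (-1, 1)

def likelihood(r, n, rho):
    """Fisher-z approximation to the likelihood of sample correlation r (n pairs) given rho."""
    se = 1.0 / math.sqrt(n - 3)
    dev = (math.atanh(r) - math.atanh(rho)) / se
    return math.exp(-0.5 * dev * dev) / (se * math.sqrt(2.0 * math.pi))

def joint_likelihood(studies, rho):
    """Product of the likelihoods of independent studies, each given as (r, n)."""
    value = 1.0
    for r, n in studies:
        value *= likelihood(r, n, rho)
    return value

def bf10(studies, prior=UNIFORM):
    """Bayes factor for rho != 0 (under the given prior) against rho = 0."""
    marginal = sum(joint_likelihood(studies, rho) * p for rho, p in zip(GRID, prior)) * STEP
    return marginal / joint_likelihood(studies, 0.0)

def posterior(study, prior=UNIFORM):
    """Posterior density for rho on the grid after observing one study."""
    unnorm = [likelihood(study[0], study[1], rho) * p for rho, p in zip(GRID, prior)]
    total = sum(unnorm) * STEP
    return [u / total for u in unnorm]

# Hypothetical item-level correlations, each across 25 scenarios.
original = (0.80, 25)
replication = (0.75, 25)

eu_ratio = bf10([original, replication]) / bf10([original])
eu_sequential = bf10([replication], prior=posterior(original))
print(eu_ratio, eu_sequential)
```

The agreement between the two routes in this sketch is a property of the shared likelihood and prior; it says nothing about the size of the discrepancy between the two reported values, which the text attributes to the stretched β-distribution approximation.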

Participants
The sample for Study 2 was the same as for Study 1. Participants were excluded from analyses if they failed the attention check from Study 1.⁵ Bootstrapping simulations on the Study 1 data from the US sample of Sznycer et al. [24] indicated that 95% power would be achieved with 14 participants. However, because participation was part of a course requirement, the stopping rule for the frequentist tests (Replication Criteria i and ii) was determined by the number of students enrolled in the course (n = 87). Therefore, the actual sample (n = 81 after removing inattentive participants) well exceeded the sample size that would produce 95% power.

Design
Study 2 tested whether the anticipated intensity of felt shame with respect to a prospective act or trait that others disvalue tracks the degree of devaluation expressed by local audiences regarding that act or trait. Participants rated 29 brief hypothetical scenarios in which someone's acts or traits might lead them to be viewed negatively. The scenarios featured situations in various evolutionarily relevant domains, including social exchange, parenting, mating, the violation of social norms, status and skills. Participants were randomly assigned to an audience condition or a shame condition. In the audience condition, participants were asked to rate 29 scenarios involving another individual (e.g. 'He hosts his extended family for a holiday meal, but he burns the food', 'He dropped out of school much earlier than others'). In this condition, participants were asked to 'indicate how you would view this person if they were in those situations'; they indicated their reactions using scales ranging from 1 (I wouldn't view him negatively at all) to 7 (I'd view him very negatively). These ratings provide event-specific measures of the degree to which the members of a given population would devalue the individual described in the scenarios.
In the shame condition, a different set of participants were asked to 'indicate how much shame you would feel if you were in those situations' (i.e. in each of the 29 scenarios; e.g. 'You host your extended family for a holiday meal, but you burn the food', 'You dropped out of school much earlier than others'), on scales ranging from 1 (no shame at all) to 7 (a lot of shame). In both conditions, the scenarios were presented in a randomized order. To conduct the replications, we used original materials provided by the third author.

Procedure
The procedures were the same as in Study 1, except here participants were randomly assigned to an audience condition or a shame condition, and no attention check was administered.

Is the replication Bayes factor greater than 3 and in favour of the alternative hypothesis relative to the null hypothesis?
Yes. We used the same two Bayesian approaches as in Study 1. In the first approach, we combined the 29 item pairs from the original and replication studies as if they were separate items (total = 58 item pairs) to calculate $\mathrm{BF}_{10}(\text{original} + \text{replication})$. Dividing this by $\mathrm{BF}_{10}(\text{original})$ yielded $\mathrm{BF}_{10}(\text{replication} \mid \text{original}) = 2.76 \times 10^{4}$, which exceeded a Bayes factor of 3. In the second approach, we generated a posterior distribution for ρ from the original study and used it as the prior for the replication study. The recalculated $\mathrm{BF}_{10}(\text{replication} \mid \text{original})$ was $4.55 \times 10^{4}$, which also exceeded a Bayes factor of 3. Both replication Bayes factors are consistent with the frequentist replication analyses, and together provide evidence for replication of the original results.

Discussion
Two preregistered replications provided confirmatory evidence in support of evolutionary-functional theories of shame and pride, as reported originally by Sznycer et al. ( [16,24]; see also [46]). The intensity of anticipatory pride felt with respect to a prospective action or trait closely tracks the magnitude of positive valuation audiences express with respect to that action or trait. Similarly, anticipatory shame closely tracks the audience devaluation.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 191922
Table 2. Ratings of devaluation and shame, by scenario. Note: displayed are means, with standard deviations in parentheses. Ns: shame: 42, devaluation: 39. The male versions of the shame and devaluation scenarios are presented before and after the slash, respectively. The female versions of the scenarios read 'men' (scenario # 1) and 'husband' (scenarios # 3, 6, 10, 11 and 23) instead of 'women' and 'wife'. Further, the female versions of the devaluation scenarios featured female pronouns. Otherwise, the male and female scenarios were identical. Scenarios are displayed from the highest to the lowest mean devaluation scores.
# scenario devaluation shame
3 at the wedding of an acquaintance, you are discovered cheating on your wife with a food server/at the wedding of an acquaintance, he is discovered cheating on his wife with a food server
There is no single, definitive measure of replication success, so we required the data to meet three different criteria to classify the replication as successful: (i) significant replication p-values, with effects in the same direction as in the original studies, (ii) replication effect sizes different from zero and not different from the original effect sizes, and (iii) replication Bayes factors greater than 3 in favour of the alternative hypothesis. In both studies, all three criteria of replication success were met. Because the data analysis plans were preregistered and neither the last author nor any of the other authors involved in the original studies participated in the data collection or analyses of the present replications, the original findings are unlikely to be false positives due to researcher degrees of freedom in data collection or analysis [47] or experimenter error. A recent large-scale replication study showed that replication success largely does not depend on the study sample [48]. Consistent with this, differences between the original and replication samples did not impact the results.
Even though the replication sample was drawn from the sixth most ethnically diverse campus in the US [49], probably making it less homogeneous than most of the samples from the original studies, both pride and shame effects replicated.
Although the evidence for replication is strong, there are potential limitations that remain unaddressed. First, in the pride and shame conditions, there were probably multiple scenarios that were not relatable to most participants (e.g. 'You finished first in a marathon' in Study 1; 'You were in an accident and your face was permanently disfigured' in Study 2). How does an individual generate a specific rating of anticipatory pride (or shame) regarding a situation she has never directly encountered? One possibility is that the individual generates those ratings by imagining how she (or others) would view someone else in that situation. But, if so, that would be equivalent to the task in the audience condition, and a high correlation between audience evaluations and 'pride' (or 'shame') ratings would be inevitable. Under this hypothesis, the ostensible pride ratings in fact reflect ratings of valuation, so the finding is only apparently about pride tracking valuation but in reality is about consensus in valuation. And, similarly, the shame-devaluation findings might simply reflect consensus in shame. It is possible, then, that the observed correlations are unduly inflated by uncommon scenarios that may induce an audience perspective in the pride or shame conditions. Future studies would alleviate these conceptual validity concerns by attempting to replicate the effects with hypothetical events more relatable to participants or with events actually experienced by participants. Participants may have treated the shame and pride conditions from an audience perspective for another reason. Participants completed one condition of Study 1, one condition of Study 2, and a third unrelated task, all in a random order. Consider a participant assigned first to the audience (valuation) condition of Study 1 and then to the shame condition of Study 2. 
It is possible that the initial assignment to the audience condition primed the participant to subsequently adopt an audience perspective when generating the shame ratings.⁶ If so, the correlation between those shame ratings and the devaluation ratings (provided by other participants) might have been artificially inflated. To test this priming alternative, we analysed only the data from the replication study that was administered first, excluding from analyses the data from the replication study that was administered subsequently. This eliminated any possibility of one study priming the other. Analyses based only on the data of the study presented first did not change the results (see Priming-Control Analyses in the electronic supplementary material). Thus, priming of the other perspective cannot explain the observed effects.
Second, none of the scenarios in either study were negatively worded. Given this, some types of response biases (e.g. acquiescence) may have the effect of artificially inflating the emotion-evaluation correlations. Future studies can profitably assess whether the links between shame and pride on the one hand and the evaluative psychology of audiences on the other generalize to measures other than self-reports.
Because of their association with undesirable outcomes such as aggression and dominance, it has been argued that shame and the hubristic facet of pride are maladaptive emotions ([50][51][52]; but see [11]). However, the present results suggest a different view of shame and pride. The fact that anticipatory shame tracks the devaluative psychology of audiences suggests that the shame system is designed to steer precisely between a reckless disregard of others' values and an excessive diffidence in the pursuit of personal payoffs. Notwithstanding shame's association with aggression, this emotion comprises multiple features that appear well designed to counter the threat of being devalued (see above). Moreover, even aggression may be a best response in situations where social benefits are no longer as abundantly provided because one is devalued and instead must be bargained for by threatening harm (see [8,12,27]). Similarly, the fact that anticipatory pride tracks audience valuation suggests that this emotion is naturally selected to avoid the insufficient pursuit of success on the one hand and the excessive pursuit, advertisement and entitlement over one's successes on the other.
Data accessibility. Data have been deposited in an external repository: https://osf.io/upg5w/.
Authors' contributions. A.S.C. implemented the replication studies and collected data; A.S.C. and R.C. analysed the data and drafted the manuscript; D.S. designed the original studies that were replicated and helped draft the manuscript; A.S.C. and D.S. gave the final approval for preregistration. All authors gave final approval for publication and agree to be held accountable for the work performed therein.