A Bayesian model-averaged meta-analysis of the power pose effect with informed and default priors

Carney, Cuddy, and Yap (2010) found that, compared to participants who adopted constrictive body postures, participants who adopted expansive body postures reported feeling more powerful, showed an increase in testosterone and a decrease in cortisol, and displayed an increased tolerance for risk. However, these power pose effects have recently come under considerable scrutiny. Here we present a Bayesian meta-analysis of six preregistered studies from this special issue, focusing on the effect of power posing on felt power. Our analysis improves on standard classical meta-analyses in several ways. First and foremost, we considered only preregistered studies, eliminating concerns about publication bias. Second, the Bayesian approach enables us to quantify evidence for both the alternative and the null hypothesis. Third, we use Bayesian model-averaging to account for the uncertainty with respect to the choice between a fixed-effect model and a random-effect model. Fourth, based on a literature review we obtained an empirically informed prior distribution for the between-study heterogeneity of effect sizes. This empirically informed prior can serve as a default choice not only for the investigation of the power pose effect, but for effects in the field of psychology more generally. For effect size, we considered a default and an informed prior. Our meta-analysis yields very strong evidence for an effect of power posing on felt power. However, when the analysis is restricted to participants unfamiliar with the effect, the meta-analysis yields evidence that is only moderate.


Introduction
Could adopting a powerful body posture make us more powerful? Carney, Cuddy, and Yap (2010) found that participants who adopted expansive, high-power body postures (Figure 1, top row) as opposed to constrictive, low-power body postures (Figure 1, bottom row) reported feeling more powerful and in charge, showed an increase in testosterone and a decrease in cortisol, and displayed an increased tolerance for risk. The power pose effect has attracted a lot of attention, partly due to the anticipated consequences for day-to-day life, which suggest that it might be possible to "fake it 'til you make it". However, this power pose effect has recently come under scrutiny. When Ranehill et al. (2015) attempted to replicate the effect, they found, similar to the original study, that adopting high-power poses increased participants' self-reported feelings of power; nevertheless, they did not find an effect on testosterone or cortisol, nor on behavioral measures such as risk taking. Carney, Cuddy, and Yap (2015) pointed out a number of methodological differences that they believe might have caused the diverging results. Recently, Garrison, Tang, and Schmeichel (2016) conducted a preregistered replication and extension of the power pose study, and they failed to identify an effect of power posing on risk-taking behavior. Furthermore, in contrast to Ranehill et al. (2015), these authors did not find evidence for a power pose effect on subjective feelings of power.
Here we present a meta-analysis of the effect of power posing on self-reported felt power, which was included as a dependent variable in six of the seven studies in this special issue.
Our analysis improves upon classical analyses in several ways. First, we consider only preregistered studies, which has the advantage that publication bias can be ruled out a priori (cf. the concept of a prospective meta-analysis in medicine). Second, the Bayesian approach enables us to quantify evidence for both the alternative hypothesis and the null hypothesis; note that this evidence can be seamlessly updated as future studies on the effect become available. Third, Bayesian model-averaging enables us to fully acknowledge uncertainty with respect to the choice of a fixed-effect or random-effect model. In the fixed-effect model, the effect is assumed to be identical across studies; in the random-effect model, the effect is assumed to vary across studies. Instead of adopting one model for inference and ignoring the other model entirely, we can weight the results of both models according to their posterior plausibilities. This yields a model-averaged measure of evidence and a model-averaged estimate for the meta-analytic effect size. Fourth, the Bayesian approach enables us to incorporate existing knowledge into our analysis (e.g., Rhodes, Turner, & Higgins, 2015). Based on an extensive literature review of meta-analyses in the field of psychology, we obtained an informed prior distribution for the between-study heterogeneity. This informed prior distribution can serve as an informed default not only for the investigation of the power pose effect in the present meta-analysis, but for the field of psychology more generally. For effect size we also consider an informed prior distribution based on knowledge about effect sizes in the field of psychology. As a robustness check with respect to the prior choice, we show that qualitatively similar results are obtained when we instead use a default prior for the effect size parameter.
The outline of this article is as follows: first, we explain the details of our analysis. Second, we present the results of an extensive literature review that allowed us to specify an informed prior distribution for the between-study heterogeneity. Third, we present the results of the model-averaged Bayesian meta-analysis for two different prior choices for effect size. Finally, we investigate whether the results change when only participants unaware of the power pose effect are included in the analysis.

Method
In our meta-analysis, we focused on the dependent variable felt power, which was measured in all replication studies in the present issue except for the study by Jackson et al., which was therefore not considered in the analysis. We investigated whether felt power was higher in the high-power condition than in the low-power condition.

Analysis of Individual Studies
When considering a single study, the power pose effect can be tested using a standard one-sided, independent-samples t-test. Hence, the first step in our analysis was to compute one-sided Bayesian t-tests (Rouder, Speckman, Sun, Morey, & Iverson, 2009; Ly, Verhagen, & Wagenmakers, 2016; Gronau, Ly, & Wagenmakers, 2017). This allowed us (1) to estimate for each study the posterior distribution of the standardized effect size that represents our beliefs about the effect size after having observed the data of that study and (2) to quantify the evidence that each study provides in favor of the hypothesis that the power pose effect is positive ($H_+$) versus the null hypothesis that the effect is zero ($H_0$).
To quantify the evidence that the data provide for or against $H_+$ we computed the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995), which is the predictive updating factor that quantifies how much the data have changed the relative plausibility of the competing models. The Bayes factor has an intuitive interpretation: when $\mathrm{BF}_{+0} = 10$ this indicates that the data are ten times more likely under $H_+$ than under $H_0$; when $\mathrm{BF}_{+0} = 1/5$ this indicates that the data are five times more likely under $H_0$ than under $H_+$.
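The role of the Bayes factor as an updating factor can be written out explicitly. The notation below is a sketch in standard form; the symbols $p(H_+)$ and $p(H_0)$ for the prior model probabilities are introduced here for illustration.

```latex
\mathrm{BF}_{+0} = \frac{p(\text{data} \mid H_+)}{p(\text{data} \mid H_0)},
\qquad
\underbrace{\frac{p(H_+ \mid \text{data})}{p(H_0 \mid \text{data})}}_{\text{posterior odds}}
= \mathrm{BF}_{+0} \times
\underbrace{\frac{p(H_+)}{p(H_0)}}_{\text{prior odds}}
```

For example, if both hypotheses are considered equally plausible a priori (prior odds of 1), then $\mathrm{BF}_{+0} = 10$ corresponds to a posterior probability for $H_+$ of $10/11 \approx 0.91$.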

Meta-Analysis
The next step in our analysis was to combine the studies with the help of a Bayesian meta-analysis (e.g., Marsman, Schönbrodt, Morey, Yao, Gelman, & Wagenmakers, 2017) to obtain an estimate of the overall effect size and to quantify the evidence for an effect that takes into account all studies simultaneously. In a classical meta-analysis the analyst has to make a choice between a fixed-effect and a random-effect model. A fixed-effect model makes the assumption that there is one underlying effect size so that the true effect in each study is identical; differences in the observed effect sizes are solely due to normally distributed sampling error. This can be formalized as follows: we assume that $y_i \sim \mathcal{N}(\delta_{\text{fixed}}, SE_i^2)$, where $y_i$, $i = 1, 2, \ldots, n$, denotes the observed effect size in the $i$-th of $n$ studies, $SE_i$ denotes the corresponding standard error, which is commonly assumed to be known, and $\delta_{\text{fixed}}$ corresponds to the common true effect size. In contrast, a random-effect model allows for idiosyncratic study effects, that is, we no longer impose the constraint that there exists one common true effect size for all studies. The random study effects are usually assumed to follow a normal distribution with a mean equal to the overall effect size that we are interested in and a standard deviation that corresponds to the between-study heterogeneity. Note that analogously to the fixed-effect model, the model still incorporates random sampling error so that the observed effect size for a given study is not necessarily identical to the true effect size for that study. These assumptions yield a model with a hierarchical structure which can be formalized as follows: let $\delta_{\text{random}}$ denote the mean of the normal distribution of the study effects (i.e., the quantity that we are interested in), $\tau$ denote the standard deviation of that normal distribution (i.e., between-study heterogeneity), and $\delta_i$ denote the true study effect for the $i$-th study. Then, $\delta_i \sim \mathcal{N}(\delta_{\text{random}}, \tau^2)$ and $y_i \mid \delta_i \sim \mathcal{N}(\delta_i, SE_i^2)$. The structure of the model allows one to analytically integrate out the random study effects so that the model can equivalently be written as $y_i \sim \mathcal{N}(\delta_{\text{random}}, \tau^2 + SE_i^2)$, which can be more convenient from a computational perspective.
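The two likelihoods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example data are hypothetical (they are not the data of the studies analyzed here), and the symbols `delta` and `tau` stand for the overall effect size and the between-study standard deviation. The random-effect version uses the marginal form in which the study effects have been integrated out, so setting `tau = 0` recovers the fixed-effect model.

```python
import math


def log_lik_fixed(y, se, delta):
    """Log-likelihood of the fixed-effect model: y_i ~ N(delta, SE_i^2)."""
    total = 0.0
    for yi, s in zip(y, se):
        var = s ** 2
        total += -0.5 * math.log(2 * math.pi * var) - (yi - delta) ** 2 / (2 * var)
    return total


def log_lik_random(y, se, delta, tau):
    """Log-likelihood of the marginal random-effect model:
    y_i ~ N(delta, tau^2 + SE_i^2), with the study effects integrated out."""
    total = 0.0
    for yi, s in zip(y, se):
        var = tau ** 2 + s ** 2
        total += -0.5 * math.log(2 * math.pi * var) - (yi - delta) ** 2 / (2 * var)
    return total


# Hypothetical observed effect sizes and standard errors (illustration only).
y = [0.10, 0.35, 0.22]
se = [0.15, 0.12, 0.20]

print(log_lik_fixed(y, se, delta=0.2))
print(log_lik_random(y, se, delta=0.2, tau=0.1))
```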

Bayesian Model-Averaging
The choice of a fixed-effect or random-effect model commonly relies on a test for heterogeneity or on a priori considerations. Final inference is then based on either the fixed-effect or random-effect model. When the number of studies is small, this choice may be difficult; and in certain cases, the choice may be consequential. The Bayesian approach, however, allows a compromise solution: instead of selecting either a fixed-effect or random-effect model, we can use Bayesian model-averaging (e.g., Haldane, 1932; Hoeting, Madigan, Raftery, & Volinsky, 1999) and retain all models for final inference. Conclusions are then based on a combination of all models where the results of each model are taken into account according to the model's plausibility in light of the observed data. Concretely, Bayesian model-averaging allows us to obtain a model-averaged estimate for the meta-analytic effect size (Sutton & Abrams, 2001) and to quantify the overall evidence for an effect that considers both the fixed-effect and random-effect model (Scheibehenne, Gronau, Jamil, & Wagenmakers, 2017).
With respect to hypothesis testing, for the current analysis we entertained four models of interest, shown in Table 1: (1) the fixed-effect model $H_+$; (2) the fixed-effect model $H_0$ (i.e., $\delta_{\text{fixed}} = 0$); (3) the random-effect model $H_+$; (4) the random-effect model $H_0$ (i.e., $\delta_{\text{random}} = 0$). The fixed-effect meta-analytic Bayes factor was obtained by comparing case (1) to case (2); the random-effect meta-analytic Bayes factor pitted case (3) against case (4). To compute the model-averaged Bayes factor, we contrasted the summed posterior model probabilities (i.e., the probability of a model given the data) for cases (1) and (3) against the summed posterior model probabilities for cases (2) and (4). This assumes that all four models are equally likely a priori, a common assumption in model-averaging scenarios. If the prior model probabilities were not identical, the ratio of the summed posterior model probabilities for cases (1) and (3) over (2) and (4) would need to be divided by the corresponding ratio of summed prior model probabilities.
With respect to parameter estimation, we computed a model-averaged effect size estimate based on the four model versions described above, except that we no longer imposed the constraint that the effect size has to be positive. In other words, consistent with standard practice, we imposed a directional constraint for testing but not for estimation (cf. Jeffreys, 1961, who also used different priors for estimation and testing). This reflects the fact that the estimation framework is generally more exploratory in nature, and this mindset is inconsistent with the use of hard boundaries. The combined estimate was obtained by combining the estimates of models (1) and (3), but without the order constraints, according to their posterior model probabilities. To conduct the model-averaged Bayesian meta-analysis, we used the R package metaBMA (Heck & Gronau, 2017) available from https://github.com/danheck/metaBMA.
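The computation of the model-averaged Bayes factor from the four models can be sketched as follows. This is a minimal illustration, not the metaBMA implementation: the function name, argument names, and the example marginal likelihoods are hypothetical. The function takes the marginal likelihoods of cases (1) to (4) and, optionally, unequal prior model probabilities, in which case the posterior odds are divided by the prior odds as described above.

```python
def model_averaged_bf(m_fixed_h1, m_fixed_h0, m_random_h1, m_random_h0,
                      prior_probs=(0.25, 0.25, 0.25, 0.25)):
    """Model-averaged Bayes factor for an effect.

    The first four arguments are the marginal likelihoods p(data | model)
    for cases (1) fixed H+, (2) fixed H0, (3) random H+, (4) random H0.
    prior_probs holds the prior model probabilities in the same order.
    """
    p1, p2, p3, p4 = prior_probs
    # Unnormalized posterior model probabilities; the shared normalizing
    # constant cancels in the ratio below.
    post_effect = p1 * m_fixed_h1 + p3 * m_random_h1
    post_null = p2 * m_fixed_h0 + p4 * m_random_h0
    posterior_odds = post_effect / post_null
    prior_odds = (p1 + p3) / (p2 + p4)
    return posterior_odds / prior_odds


# Hypothetical marginal likelihoods (illustration only).
bf_averaged = model_averaged_bf(0.040, 0.002, 0.010, 0.001)
print(round(bf_averaged, 2))
```

With equal prior model probabilities the prior odds equal 1, so the result reduces to the ratio of the summed marginal likelihoods.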
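The model-averaged estimate can be written as a mixture of the two unconstrained posterior distributions. The notation is introduced here for illustration: $H_1^{f}$ and $H_1^{r}$ denote the fixed-effect and random-effect models without the order constraint, $\delta$ denotes the meta-analytic effect size, and $y$ denotes the data. The weights below assume equal prior probabilities for the two models.

```latex
p(\delta \mid y) = w_f \, p(\delta \mid y, H_1^{f}) + w_r \, p(\delta \mid y, H_1^{r}),
\qquad
w_f = \frac{p(y \mid H_1^{f})}{p(y \mid H_1^{f}) + p(y \mid H_1^{r})},
\qquad
w_r = 1 - w_f
```

The weights $w_f$ and $w_r$ are the posterior model probabilities, so the model that predicted the data better contributes more to the combined posterior.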

Prior Distributions
In the Bayesian approach, model parameters are assigned prior distributions that reflect the knowledge, uncertainty, or beliefs for the parameters before seeing the data. Using Bayes' theorem, these prior distributions are then updated by the data to yield posterior distributions, which reflect the uncertainty for the parameters after the data have been observed. Consequently, in order to conduct our Bayesian analyses, prior distributions were required for all model parameters.
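The updating step can be illustrated with a simple grid approximation. The sketch below is not the method used by metaBMA; it assumes a single observed effect size with a known standard error, a normal likelihood, and, for illustration, a zero-centered Cauchy prior with scale 1/√2. The function names and the example numbers are hypothetical. The normalizing constant of Bayes' theorem is the marginal likelihood, which is also the quantity needed for a Bayes factor.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def cauchy_pdf(x, scale):
    return 1.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def grid_posterior(y, se, prior_pdf, lower=-5.0, upper=5.0, n=20001):
    """Return the grid, the posterior density on the grid, and p(y)."""
    step = (upper - lower) / (n - 1)
    grid = [lower + i * step for i in range(n)]
    # Bayes' theorem: posterior is proportional to prior times likelihood.
    unnormalized = [prior_pdf(d) * normal_pdf(y, d, se) for d in grid]
    marginal = sum(unnormalized) * step  # Riemann-sum approximation
    posterior = [u / marginal for u in unnormalized]
    return grid, posterior, marginal


# Hypothetical observed effect size and standard error (illustration only).
y_obs, se_obs = 0.30, 0.10
prior = lambda d: cauchy_pdf(d, scale=1 / math.sqrt(2))
grid, posterior, marginal = grid_posterior(y_obs, se_obs, prior)

# Two-sided Bayes factor against the point null delta = 0.
bf10 = marginal / normal_pdf(y_obs, 0.0, se_obs)
print(bf10)
```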
For the standardized effect size, we considered two different prior choices. First, we used what has now become the default choice in the field of psychology, that is, a zero-centered Cauchy distribution with scale parameter equal to $1/\sqrt{2}$ (Morey & Rouder, 2015). Second, we considered the informed prior distribution reported in Gronau et al. (2017): a t distribution with location 0.350, scale 0.102, and three degrees of freedom, which is displayed in Figure 2. This prior distribution was elicited from Dr. Oosterwijk, a social psychologist at the University of Amsterdam, for a reanalysis of the Registered Replication Report on the facial feedback hypothesis. We believe this prior distribution is generally plausible for a wide range of small-to-medium effects in social psychology (i.e., for effects whose presence needs to be ascertained by statistical analysis). One could elicit a "power pose prior", but we believe the resulting distribution would be highly similar to the Oosterwijk prior, and therefore yield highly similar inferences. Researchers interested in using a specific "power pose prior" are invited to explore this option using the R code provided online (https://osf.io/r2cds/).
For the one-sided hypothesis tests, the priors were truncated at zero, that is, the model encoded the a priori assumption that negative effect sizes are impossible. For estimating the effect size, however, we removed this truncation. The informed and default priors are depicted in Figure 2. The informed prior expresses the belief that the effect size is positive but most likely small to medium in size. The default prior on the other hand is more spread out (i.e., less informative) and it is centered on zero. Figure 2 also illustrates how the priors were truncated at zero for testing whereas for estimation, this truncation was removed. In addition to the prior distribution for the effect size, the Bayesian meta-analysis required a prior distribution for the between-study heterogeneity. Here we chose an informed prior distribution for the between-study standard deviation $\tau$. This informed prior was based on all available between-study heterogeneity estimates for mean-difference effect sizes in meta-analyses reported in Psychological Bulletin in the years 1990 to 2013 (van Erp, Verhagen, Grasman, & Wagenmakers, 2017, https://osf.io/preprints/psyarxiv/myu9c). The distribution of these 162 estimates is shown in Figure 3. Note that we have excluded between-study heterogeneity estimates that were exactly equal to zero, as the prior should reflect knowledge conditional on the assumption that the random-effect model is true; between-study heterogeneity estimates of exactly zero, however, suggest that the fixed-effect model was more appropriate. The distribution of the estimates in Figure 3 suggests that (1) the between-study standard deviations in the field of psychology range from 0 to 1 and (2) there are more small estimates than large ones. These two features are captured by an Inverse-Gamma(1, 0.15) distribution (depicted in Figure 3 as a solid line).
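The two effect size priors, and their versions truncated at zero for the one-sided tests, can be sketched as follows. This is an illustration only: the function names are hypothetical, and the closed-form distribution function used for the t prior is specific to three degrees of freedom. Truncation at zero amounts to setting the density to zero for negative values and dividing by the prior mass above zero.

```python
import math

DEFAULT_SCALE = 1 / math.sqrt(2)  # scale of the default Cauchy prior


def cauchy_pdf(x, scale=DEFAULT_SCALE):
    return 1.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def cauchy_cdf(x, scale=DEFAULT_SCALE):
    return 0.5 + math.atan(x / scale) / math.pi


def t_pdf(x, location, scale, df):
    """Density of a shifted and scaled Student t distribution."""
    z = (x - location) / scale
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + z * z / df) ** (-(df + 1) / 2) / scale


def t3_cdf(x, location, scale):
    """Closed-form CDF of a shifted and scaled t distribution with df = 3."""
    u = (x - location) / (scale * math.sqrt(3))
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def truncated_at_zero(pdf, cdf):
    """Return the density restricted to non-negative values and renormalized."""
    mass_above_zero = 1.0 - cdf(0.0)
    return lambda x: pdf(x) / mass_above_zero if x >= 0 else 0.0


# Informed prior: t distribution with location 0.350, scale 0.102, df = 3.
informed_pdf = lambda x: t_pdf(x, 0.350, 0.102, 3)
informed_cdf = lambda x: t3_cdf(x, 0.350, 0.102)

default_plus = truncated_at_zero(cauchy_pdf, cauchy_cdf)
informed_plus = truncated_at_zero(informed_pdf, informed_cdf)

print(default_plus(0.35), informed_plus(0.35))
```

Because the default prior is symmetric around zero, truncation simply doubles its density for positive values; the informed prior already places most of its mass above zero, so truncation changes it only slightly.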
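A short numerical look at the Inverse-Gamma(1, 0.15) prior may help to see which values of between-study heterogeneity it favors. The sketch assumes that the two parameters are the shape and the scale of the inverse-gamma distribution; this parameterization is an assumption made here, and the function names are hypothetical. For shape 1 the distribution function has the simple closed form exp(-scale / τ).

```python
import math


def inv_gamma_pdf(tau, shape=1.0, scale=0.15):
    """Inverse-gamma density (shape/scale parameterization, assumed here)."""
    if tau <= 0:
        return 0.0
    return (scale ** shape / math.gamma(shape)) * tau ** (-shape - 1) * math.exp(-scale / tau)


def inv_gamma_cdf_shape1(tau, scale=0.15):
    """CDF for the special case shape = 1: P(T <= tau) = exp(-scale / tau)."""
    return math.exp(-scale / tau) if tau > 0 else 0.0


mode = 0.15 / (1.0 + 1.0)                       # mode = scale / (shape + 1)
prob_above_one = 1.0 - inv_gamma_cdf_shape1(1.0)  # prior mass for tau > 1

print(mode, prob_above_one)
```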
Note, however, that this prior distribution does not completely rule out the possibility that between-study heterogeneity is larger than 1; the distribution merely assigns values larger than 1 a relatively small prior credibility. This inverse-gamma distribution resembles the one obtained when maximum-likelihood methods are used to fit an inverse-gamma distribution to the between-study heterogeneity estimates. However, in our opinion, the maximum-likelihood inverse-gamma distribution slightly overemphasizes small between-study heterogeneity values. In the appendix, we present the results obtained under two alternative prior choices for between-study heterogeneity: (1) the maximum-likelihood inverse-gamma distribution; and (2) a Beta(1, 2) prior distribution. The results are robust across all of these prior choices.
Having specified the models and prior distributions, we needed to compute the probability of the data given each model under consideration. This was achieved by integrating out the model parameters with respect to their prior distributions. For the models for which this was not possible analytically, we evaluated this quantity using numerical integration as implemented in the R package metaBMA (Heck & Gronau, 2017). R code for reproducing all analyses can be found on the Open Science Framework: https://osf.io/r2cds/.
Figure 4 displays the results of the Bayesian analysis using the default effect size prior for the studies as reported in this special issue. Note that most studies did not exclude participants who were familiar with the effect, for instance, from viewing the TED talk about power posing, which is currently the second most popular TED talk of all time (https://www.ted.com/playlists/171/the_most_popular_talks_of_all). This analysis is based on a total of 1071 participants. Below, we investigate how the results change when considering only those participants who indicated that they did not know the power pose effect.
The upper part of Figure 4 displays the results of the Bayesian t-tests. The left part of the figure displays for each study the median of the posterior distribution for the effect size (grey dots) and a 95% highest density interval (HDI; i.e., the shortest interval that captures 95% of the posterior mass). The right part of the figure shows the one-sided default Bayes factors in favor of $H_+$ and, for comparison, the (two-sided) p-values obtained from classical independent-samples t-tests. Based on the posterior distributions, it appears that there might be a positive effect. However, this is hard to assess since the 95% highest density intervals are relatively wide. All Bayes factors except one are between ⅓ and 3, indicating that there is not much evidence for $H_+$ or $H_0$. Hence, when considering the individual studies separately, we cannot draw strong conclusions about whether there is an effect or not.
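The definition of the highest density interval as the shortest interval containing 95% of the posterior mass translates directly into a simple procedure when posterior draws are available. The sketch below is a generic illustration, not the routine used to produce Figure 4; the function name and the simulated draws are hypothetical.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the sampled values."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


# Hypothetical posterior draws for an effect size (illustration only).
random.seed(1)
draws = [random.gauss(0.2, 0.1) for _ in range(20000)]
lo, hi = hdi(draws)
print(lo, hi)
```

For a symmetric, unimodal posterior the HDI coincides with the central credible interval; for skewed posteriors it shifts towards the region of highest density.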

Analysis of Reported Studies: Default Prior on Effect Size
Each study alone does not provide much evidence in favor of either hypothesis; however, a Bayesian meta-analysis allows us to obtain an impression of the overall evidence obtained when considering all studies simultaneously. The lower part of Figure 4 displays the result of the Bayesian meta-analysis using the default Cauchy prior with scale $1/\sqrt{2}$ for the meta-analytic effect size. The black diamonds display the median of the posterior distribution of the meta-analytic effect size for the fixed-effect, random-effect, and model-averaged analysis, and the lines correspond to the 95% highest density intervals. The model-averaged posterior distribution is obtained by combining the estimates of the fixed-effect and the random-effect model according to their plausibility in light of the data. The lower right part of Figure 4 shows the meta-analytic one-sided Bayes factors and, for the fixed-effect and the random-effect model, the two-sided p-value obtained by conducting classical meta-analyses. The meta-analytic fixed-effect Bayes factor equals $\mathrm{BF}_{+0} = 89.6$, indicating very strong evidence in favor of an effect of power posing on felt power. The meta-analytic random-effect Bayes factor is less extreme but still indicates evidence for an effect: $\mathrm{BF}_{+0} = 9.4$. The observed data support a fixed-effect model more than a random-effect model: the Bayes factor that compares case (1)
To sum up, the Bayesian meta-analytic results based on the default prior for the effect size provide very strong evidence in favor of the hypothesis that power posing leads to an increase in felt power.

Analysis of Reported Studies: Informed Prior on Effect Size
Next, we consider the results based on the informed t prior distribution for the effect size with location 0.350, scale 0.102, and three degrees of freedom (cf. Figure 2). The results are displayed in Figure 5. The effect size posterior distributions for the individual studies clearly show the influence of the informed prior distribution: the posteriors are narrower and slightly shifted towards the location of the informed prior. The individual study one-sided informed Bayes factors are larger than the default ones. This can be explained by interpreting the Bayes factor as an assessment tool of the predictive success of two competing hypotheses. The informed alternative hypothesis makes much riskier predictions than the default alternative hypothesis; however, these risky predictions are rewarded because the observed effect sizes fall within the range of values predicted by the informed hypothesis. Hence, since the predictions match the observed data, the informed hypothesis yields more evidence for the presence of the power pose effect as compared to an alternative hypothesis that specifies a default prior for the effect size. Nevertheless, only two of the study-specific Bayes factors provide moderate evidence for an effect, whereas the other four provide only anecdotal evidence for $H_+$ or $H_0$.
The informed meta-analytic fixed-effect Bayes factor is $\mathrm{BF}_{+0} = 191.8$, indicating extreme evidence in favor of an effect of power posing on felt power. The informed meta-analytic random-effect Bayes factor is less extreme but still indicates strong evidence for an effect: $\mathrm{BF}_{+0} = 20.7$. As for the default prior, the observed data support a fixed-effect model more than a random-effect model: the Bayes factor that compares case (1), fixed-effect $H_+$, to case (3), random-effect $H_+$, (not displayed) indicates that the data are 3.9 times more likely under the fixed-effect model than under the random-effect model. The informed meta-analytic model-averaged Bayes factor is equal to $\mathrm{BF}_{+0} = 71.4$, indicating very strong evidence in favor of an effect of power posing on felt power. The median of the model-averaged meta-analytic effect size is similar to the default one and is equal to 0.26 [95% HDI: 0.14, 0.37].
To sum up, the Bayesian meta-analytic results based on the informed prior for the effect size provide very strong evidence in favor of the hypothesis that power posing leads to an increase in felt power. The informed analysis yields more evidence for an effect as compared to the default analysis, indicating that the successful predictions of the informed hypothesis are rewarded.
Figure 5: Bayesian model-averaged meta-analysis using the informed t prior with location 0.350, scale 0.102, and three degrees of freedom for the standardized effect size (depicted in Figure 2A). The dots and diamonds correspond to the median of the posterior distribution for the effect size; the lines correspond to the 95% highest density intervals. The one-sided Bayes factors are displayed on the right, flanked by classical two-sided p-values.
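The model-averaged Bayes factor can be recovered, up to rounding, from the three reported Bayes factors when all four models are equally likely a priori. As a worked check, with notation introduced here for illustration, let $c$ denote the marginal likelihood of the random-effect $H_+$ model. The marginal likelihoods of the other models then follow from the reported values: $3.9c$ for fixed-effect $H_+$, $3.9c/191.8$ for fixed-effect $H_0$, and $c/20.7$ for random-effect $H_0$.

```latex
\mathrm{BF}_{+0}^{\text{averaged}}
= \frac{3.9c + c}{\dfrac{3.9c}{191.8} + \dfrac{c}{20.7}}
= \frac{4.9}{0.0203 + 0.0483}
\approx 71.4
```

The constant $c$ cancels, so only the ratios between the marginal likelihoods matter.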

Moderator Analysis: Knowledge of the Effect (Default Prior on Effect Size)
Next we investigate whether and how the results change when considering only participants who indicated that they were unaware of the power posing effect. Hence, participants who could guess the goal of the study or were familiar with the power pose TED talk were excluded in all studies under consideration, leaving a total of 809 participants. Figure 6 displays the results of the Bayesian analysis using the default effect size prior. Compared to Figure 4, the posterior distributions are shifted towards smaller values and the 95% highest density intervals are relatively wide (due to the reduced sample size). Three Bayes factors are between ⅓ and 3, indicating that there is little evidence for $H_+$ or $H_0$, one Bayes factor indicates moderate evidence for the alternative hypothesis, and two Bayes factors indicate moderate evidence for the null hypothesis. Hence, similar to the previous analysis, when considering the individual studies separately, we cannot draw strong conclusions about whether or not there is an effect.
The lower part of Figure 6 displays the result of the Bayesian meta-analysis using the default Cauchy prior with scale $1/\sqrt{2}$. The meta-analytic fixed-effect Bayes factor equals $\mathrm{BF}_{+0} = 4.4$, indicating moderate evidence in favor of an effect of power posing on felt power. The meta-analytic random-effect Bayes factor equals $\mathrm{BF}_{+0} = 1.6$, indicating only anecdotal evidence for the alternative hypothesis. The observed data support a fixed-effect model more than a random-effect model: the Bayes factor that compares case (1), fixed-effect $H_+$, to case (3), random-effect $H_+$, (not displayed) indicates that the data are 3.1 times more likely under the fixed-effect model than under the random-effect model. This is reflected in the model-averaged result: the meta-analytic model-averaged Bayes factor is equal to $\mathrm{BF}_{+0} = 3.1$, indicating moderate evidence in favor of an effect of power posing on felt power. The median of the model-averaged meta-analytic effect size is equal to 0.18 [95% HDI: 0.03, 0.33].
To sum up, when considering only participants who were unaware of the effect and using the default effect size prior, we obtain only moderate evidence for an effect of power posing on felt power. This is in contrast to the results of the previous analysis in which participants who were familiar with the effect were mostly not excluded.

Moderator Analysis: Knowledge of the Effect (Informed Prior on Effect Size)
Next we consider the results based on the informed t prior distribution for effect size with location 0.350, scale 0.102, and three degrees of freedom (depicted in Figure 2) when taking into account only participants unfamiliar with the effect. The results are displayed in Figure 7. As before, the effect size posterior distributions for the individual studies clearly show the influence of the informed prior distribution: the posteriors are narrower and slightly shifted towards the location of the informed prior. Again, the individual study one-sided informed Bayes factors are larger than the default ones. Nevertheless, only one Bayes factor provides moderate evidence for an effect, four provide anecdotal evidence for the alternative or the null hypothesis, and one provides moderate evidence for the null.
The informed meta-analytic fixed-effect Bayes factor equals $\mathrm{BF}_{+0} = 6.8$, indicating moderate evidence in favor of an effect of power posing on felt power. The informed meta-analytic random-effect Bayes factor is $\mathrm{BF}_{+0} = 2.6$, indicating anecdotal evidence for an effect. As for the default prior, the observed data support a fixed-effect model more than a random-effect model: the Bayes factor that compares case (1), fixed-effect $H_+$, to case (3), random-effect $H_+$, (not displayed) indicates that the data are 3.0 times more likely under the fixed-effect model than under the random-effect model. The informed meta-analytic model-averaged Bayes factor is equal to $\mathrm{BF}_{+0} = 4.9$, indicating moderate evidence in favor of an effect of power posing on felt power. The median of the model-averaged meta-analytic effect size is equal to 0.23 [95% HDI: 0.10, 0.36].
To sum up, when considering only participants who were unaware of the effect, the results were robust with respect to using the informed or the default prior for the effect size. In both analyses, we found only moderate evidence in favor of the hypothesis that power posing leads to an increase in felt power.

Discussion
Six preregistered studies in this special issue were subjected to a Bayesian meta-analysis of the effect of power posing on self-reported felt power. The Bayesian approach enabled us to fully acknowledge uncertainty with respect to the choice of a fixed-effect or a random-effect model, and allowed us to incorporate prior information about between-study heterogeneity and plausible effect sizes in the field of psychology. The informed prior distribution for between-study heterogeneity was based on an extensive literature review, and we believe it may serve as an informed default in the field of psychology more generally (cf. Rhodes et al., 2015, for a similar approach in medicine).
When considering the studies as reported (i.e., most studies did not exclude participants who were familiar with the effect), we obtained very strong evidence that adopting high-power poses increases subjective feelings of power; this was the case for both the analysis based on a default prior and an informed prior for the effect size. However, when considering only participants unfamiliar with the effect, we obtained only moderate evidence for an effect for both the default and informed effect size prior analysis. This suggests that knowledge of the effect might play a role with respect to the size of the effect of power posing on felt power, although a formal assessment of this possibility requires a different statistical analysis (e.g., Gelman & Stern, 2006; Nieuwenhuis, Forstmann, & Wagenmakers, 2011), the development of which is beyond the scope of this paper. Future studies might investigate this potential moderating effect and explore the extent to which the felt power effect is a demand characteristic. Note that the Bayesian approach allows us to seamlessly update the evidence as more studies become available (e.g., Scheibehenne et al., 2017).
Our meta-analysis focused on the effect of power posing on feelings of subjective power and did not consider behavioral or hormonal measures. Nevertheless, we would like to emphasize that given a set of preregistered studies that include the behavioral and hormonal measures of interest, our methodology can readily be applied to quantify evidence in a coherent Bayesian way for those measures as well.
Figure 8: Distribution of the non-zero between-study standard deviations from meta-analyses reported in Psychological Bulletin (1990-2013; van Erp et al., 2017). The informed Inverse-Gamma(1, 0.15) prior distribution is displayed on top as a solid line, the maximum-likelihood inverse-gamma distribution is depicted as a dashed line, and the Beta(1, 2) distribution is depicted as a dotted line. Figure available at http://tinyurl.com/k6yyz6b under CC license https://creativecommons.org/licenses/by/2.0/.
Table 2 displays the results for the reported data and Table 3 displays the results for the data of the subset of participants who were unfamiliar with the power pose effect: for all three prior choices for the between-study heterogeneity the results are highly similar.