Predictive physiological anticipation preceding seemingly unpredictable stimuli: An update of Mossbridge et al.'s meta-analysis

Background: This is an update of Mossbridge et al.'s meta-analysis related to the physiological anticipation preceding seemingly unpredictable stimuli. The overall effect size observed was 0.21; 95% Confidence Intervals: 0.13-0.29.
Methods: Eighteen new peer and non-peer reviewed studies completed from January 2008 to October 2017 were retrieved, describing a total of 26 experiments and 34 associated effect sizes.
Results: The overall weighted effect size, estimated with a frequentist multilevel random model, was 0.29; 95% Confidence Intervals: 0.19-0.38. The overall weighted effect size, estimated with a multilevel Bayesian model, was 0.29; 95% Credible Intervals: 0.18-0.39. Effect sizes of peer reviewed studies were slightly higher (0.38; Confidence Intervals: 0.27-0.48) than those of non-peer reviewed articles (0.22; Confidence Intervals: 0.05-0.39). The statistical estimation of publication bias using the Copas model suggests that the main findings are not contaminated by publication bias.
Conclusions: In summary, with this update, the main findings reported in Mossbridge et al.'s meta-analysis are confirmed.


Introduction
The human ability to predict future events has been crucial to our evolutionary development and proliferation, both at the level of the species and at the level of the individual. Our day-to-day survival is predicated on a successful marriage of experience (e.g., memory) and sensory processing (e.g., perceptual cues); for example, on a very humid, heavily overcast night, our perceptions and memories inform us that a thunderstorm is possible and that it might be wise to find shelter. Such behaviour is highly adaptive, as it fosters survival-based strategies, and is perfectly explicable in terms of current theories of biological causality. Now imagine that such prognosticating ability were possible without any sensory or other inferential cues. Such a seemingly inexplicable ability would certainly confer a survival advantage, if it existed. For millennia people have been reporting strange feelings of foreboding that later turned out to have significance. Over the last 36 years these phenomena have been scrutinized in the laboratory, in experiments where a subject's physiology is monitored before a randomly presented stimulus that is designed to evoke a significant post-stimulus response. Disturbingly, moments before the stimulus is presented there are murmurings of activity, as if the body is predicting moments ahead of time. This effect is termed presentiment or, more recently, Predictive Anticipatory Activity (Mossbridge et al., 2014). By 2012 a good number of these studies had been completed, and it was deemed worthwhile to conduct a meta-analysis of the literature extant at the time. Mossbridge, Tressoldi and Utts located 42 studies published from 1978 to 2010 that tested the presentiment hypothesis, of which 26 enabled a true comparison between pre- and post-stimulus epochs (Mossbridge et al., 2012); that is, the pre-stimulus physiological responses mirrored, even if to a lesser degree, the post-stimulus responses.
Here two paradigms were used: either a randomly ordered presentation of arousing vs. neutral stimuli, or guessing tasks in which the stimulus is the feedback about the participant's guess (correct vs. incorrect). In both of these approaches it is difficult to envision mundane strategies that might explain the anomalous pre-stimulus effects observed, and indeed Mossbridge et al. went to significant lengths to refute the leading candidate, expectancy effects, both in the 2012 meta-analysis and in post-review exchanges with sceptical psychologists and physiologists. Regardless of the paradigm, a broad range of physiological measures was employed, including skin conductance, heart rate, blood volume, respiration, electroencephalographic (EEG) activity, pupil dilation, blink rate, and/or blood oxygenation level dependent (BOLD) responses. These are recorded throughout the session, with a pre-determined anticipatory period of between 4 and 10 seconds in which any pre-stimulus effect is captured. The presentiment hypothesis calls for a difference between arousing and neutral pre-stimulus responses and this is calculated across sessions. Mossbridge et al. found substantive evidence in favour of a presentiment effect, amounting to over 6 sigma (extreme statistical significance). Additionally, they found evidence of presentiment effects from mainstream research programs, something that is becoming increasingly important as these effects become more widely known.
Because of the high profile of Mossbridge et al. (over 93,000 views as of January 2018), there have been a good number of replications in the few years since its publication. We located an additional 26 studies describing 34 effect sizes from a dozen laboratories. The most striking aspect of this fresh database is the sheer variation in experimental approaches, as researchers seek to tackle more process-oriented questions rather than continuing the proof-oriented work found in the earlier meta-analysis. Because expectancy effects have been put forward to explain at least some of the presentiment effect, it is noteworthy that several experiments in this fresh cohort of studies tackle this head on by analysing only the first trial of a run. These single-trial presentiment studies are expectancy free and are becoming more dominant in this research domain. Another interesting question probed in these new studies is the idea of utilizing pre-stimulus physiological activity to predict future events. This provides a second objective measure of the validity of the presentiment effect. Several studies utilize this approach and they are discussed later on. Additionally, we found increasing evidence of presentiment research piggybacking onto mainstream psychology programs, even informing aspects of the conventional research. Also of note, we found several PhD theses describing presentiment research and a greater geographical spread than in 2012, both evidence of the increasing attention such research is garnering. Lastly, we found increasing dialogue between presentiment researchers and physicists interested in retrocausality, the idea that effects can precede their cause. This is witnessed in the recent AAAS retrocausality symposium, in which several researchers participated and from which some papers made their way into this meta-analysis (Sheehan, 2017).

Methods
The whole procedure followed the APA Meta-Analysis Reporting Standards (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008), the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) for Protocols 2015 (Moher et al., 2015) and the reporting standards for literature searches and report inclusion (Atkinson et al., 2015). A completed PRISMA checklist can be found in Supplementary File 1.

Study eligibility criteria
Studies were included if they analysed psychophysiological or neurophysiological signals before the random presentation of any type of stimulus, e.g. pictures, sounds, etc. Randomization could be performed by using pseudo-random algorithms, e.g. those implemented in MATLAB or E-Prime®, or true random sources of random digits, e.g. TrueRNG.
It is important to point out that these eligibility criteria are different from those used by Mossbridge et al. Those authors selected only studies where the anticipatory signals mirrored the post-stimulus ones. In contrast, we included all studies that used anticipatory signals to predict future events, independently of the presence of post-stimulus physiological signals. For example, some authors, e.g. Mossbridge (2015), used heart rate variability to predict winning (i.e. $4) versus losing outcomes.
Our inclusion criteria are consequently more comprehensive than those used by Mossbridge et al.

Studies retrieval procedure
Both co-authors, who are experts in this type of investigation, searched for studies through Google Scholar and PubMed by using the keywords: "presentiment" OR "anticipation" OR "precognition". Furthermore, we emailed a request for the data of completed studies to all authors we knew were involved in this type of research. Although Mossbridge et al. included all studies available up to 2010, we also searched for studies that could have been missed in that meta-analysis. We searched for all completed studies, both peer reviewed and non-peer reviewed (e.g. Ph.D. dissertations), from January 2008 to October 2017.

Study selection
Study selection is illustrated in the flow diagram presented in Figure 1. Excluded records were studies where the psychophysiological variables were analysed only after, and not before, the stimulus presentations (Jin et al., 2013), and a study with an unusual procedure (Tressoldi et al., 2015), i.e. using heart rate feedback to inform a voluntary decision to predict random positive or negative events.
Records excluded after the screening were studies whose authors did not agree to share their data for different reasons (Baumgart et al., 2017; Modestino et al., 2011). Excluded studies revealed either statistically significant or trending evidence in support of the anticipation effect in most cases, thus reducing concerns about biased removal.
The references of the included studies are reported in Supplementary File 2.

Coding procedure
The two co-authors agreed on the following coding variables: authors; year of publication; participant selection (yes = selected according to specific criteria; no = selected without specific criteria); number of participants; number of trials; stimuli type; type of randomisation (pseudo or true random); psychophysiological signals, e.g. EEG, heart rate, etc.; anticipatory period; type of statistics; and value of statistics. They independently extracted these variables from the eligible studies. After the comparison, they discussed how to resolve the inter-coder differences. In the database we have added a note for each effect size, describing where in the original papers we extracted the corresponding statistics. The database, along with all 18 papers, is available from Tressoldi (2017). A summary of the selected studies, along with their corresponding effect sizes, variance and standard error, is reported in Table S1 in Supplementary File 3.

Moderator variables
Apart from the overall effect, we chose to examine one moderator variable: peer review (PeerRev, yes vs no), as a control of study quality. Given the low number of studies, no further moderator analyses were carried out.

Statistical methods
The standardized effect size d of each dependent variable was estimated from the descriptive statistics (means, standard deviations and number of participants) when available. In all other cases, it was estimated from the available summary statistics, e.g. paired t-test, Stouffer's Z, etc., by using Lakens' software (Lakens, 2013) and the function escalc() of the R package metafor (Viechtbauer, 2017).
All effect sizes were then converted into Hedges' g and the corresponding variance. In order to check the reliability of the results, a second analysis was carried out by using a multilevel approach as suggested by Assink & Wibbelink (2016), implemented with the metafor package (Viechtbauer, 2010) and reported in Table S2 in Supplementary File 3.
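The conversions described above can be sketched as follows. This is a minimal illustration, not the syntax used in this meta-analysis: it assumes a within-participant design, uses the common approximations d = t/√n (or Z/√n), the small-sample correction J = 1 − 3/(4·df − 1), and the approximate variance 1/n + d²/(2n). The function names are hypothetical and the numbers in the usage line are illustrative only.

```python
import math


def d_from_paired_t(t, n):
    """Standardized mean difference from a paired t statistic (assumed d = t / sqrt(n))."""
    return t / math.sqrt(n)


def d_from_stouffer_z(z, n):
    """Standardized effect size from a Z score (assumed d = Z / sqrt(n))."""
    return z / math.sqrt(n)


def hedges_g(d, n):
    """Apply the small-sample correction to d and return (g, variance of g).

    Assumes a single-group (paired) design with n participants,
    so the degrees of freedom are n - 1.
    """
    df = n - 1
    j = 1.0 - 3.0 / (4.0 * df - 1.0)        # correction factor J
    var_d = 1.0 / n + d ** 2 / (2.0 * n)    # approximate sampling variance of d
    return j * d, j ** 2 * var_d


# Illustrative values only: a paired t of 2.5 observed with 25 participants.
g, var_g = hedges_g(d_from_paired_t(2.5, 25), 25)
print(f"g = {g:.3f}, variance = {var_g:.4f}, SE = {math.sqrt(var_g):.3f}")
```

Dedicated tools such as escalc() handle more designs and exact variance formulae; the sketch is only meant to make the chain from a reported test statistic to g and its variance explicit.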
The Bayesian meta-analysis was implemented with the brms package (Bürkner, 2017).
A copy of the syntax is available here: https://doi.org/10.6084/m9.figshare.5661070.v1 (Tressoldi, 2017).
Although we are confident that our search activity reduced the problem of publication bias to a minimum, we also performed a statistical estimation by using the Copas selection model, which is recommended by Jin et al. (2015).

Frequentist multilevel random model
The forest plot is presented in Figure 2. The summary of the frequentist multilevel random model analysis is presented in Table 1 compared with the results obtained by Mossbridge et al., whereas the summary of the Bayesian multilevel random model meta-analysis is presented in Table 2.
A sensitivity analysis of the overall effect size did not reveal any change from rho = 0 to rho = 1, suggesting that the degree of correlation among the dependent effect sizes does not affect its magnitude.
Another sensitivity analysis was carried out excluding the Mossbridge and the Tressoldi studies, in order to check whether different authors could obtain similar results. The main results of this analysis, obtained with the same frequentist multilevel random model, are reported in Table 3.
Both the frequentist and the Bayesian analyses support the evidence of an overall main effect of approximately 0.29, and a small difference between the peer reviewed and non-peer reviewed studies. These findings are discussed further in the comparison with Mossbridge et al.
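For readers unfamiliar with random-effects pooling, the sketch below shows how a weighted overall effect size, its 95% confidence interval and the I² heterogeneity index are obtained from a set of effect sizes and variances. It is a deliberately simplified, single-level DerSimonian-Laird estimator, not the multilevel model fitted with metafor in this paper, and it ignores the dependency among effect sizes from the same experiment. The function name and the input values are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Single-level DerSimonian-Laird random-effects pooling (simplified sketch)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effect sizes with matching variances")

    w = [1.0 / v for v in variances]                       # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                          # between-study variance

    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # heterogeneity, in percent

    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical effect sizes (Hedges' g) and variances, for illustration only.
print(random_effects_dl([0.10, 0.45, 0.30, 0.60], [0.02, 0.03, 0.01, 0.05]))
```

A multilevel model additionally partitions the variance within and between experiments, which is why it was preferred here for effect sizes nested in the same study.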

Publication bias
The search method used and the small number of people interested in this research field make substantial publication bias unlikely from an empirical point of view.
Unfortunately, there is no consensus about which statistical tests of publication bias are most valid (Carter et al., 2017).
All the traditional tests, such as the Fail-Safe, the Trim-and-Fill and the Funnel Plot, have been criticized for their limitations (Jin et al., 2015; Rothstein, 2008). We hence applied the Copas selection model, which is recommended by Jin et al. (2015).

Discussion
Both the frequentist and the Bayesian analyses converged on similar results, making our findings quite robust. The overall effect size, 0.29, 95% CI = 0.18-0.39, overlaps with that reported in the original paper, 0.21, 95% CI = 0.13-0.29, even if the heterogeneity is substantially higher: I² = 80.5 vs 27.4.
The high level of heterogeneity is expected considering the variety of experimental protocols and the diversity of dependent variables, from heart rate to pupil dilation. Furthermore, as in the original paper, we did not find substantial differences between peer reviewed and non-peer reviewed papers.
We found very interesting evidence of presentiment distilled from the conventional post-stimulus psychological research of Jolij and Bierman, who have performed a long series of experiments using a face detection paradigm. Additionally, the work of Kittenis, which found pre-stimulus effects within a conventional research program, and the pre-registered single-trial work of Mossbridge represent important conceptual replications that counter arguments based on both questionable research practices and expectancy effects.
A promising development of this line of research is the development of paradigms that use software in real time to predict meaningful future outcomes before they occur.
The limitations of the present meta-analysis are similar to those of most meta-analyses that include non-pre-registered studies: the degrees of freedom in methodology and data analysis exercised in the course of such studies cannot be controlled, making them prone, for example, to the so-called "questionable research practices" (John et al., 2012).
Supplementary File 2 - List of references used in this analysis.
Supplementary File 3 - contains Table S1: Summary of the selected studies along with their corresponding effect sizes, variance and standard error.

Stephen Baumgart
Department of Psychology and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA

Addressing Major Criticisms
This is a controversial topic and careful consideration of objections is needed in a meta-analysis. Presentiment or Predictive Physiological Anticipation Studies (PAA) are typically criticized on these grounds (see, for example, Wagenmakers, Wetzels, Borsboom, Kievit, & van der Maas, 2015):
1. Physical impossibility
2. File-drawer effect
3. Biases due to multiple comparisons or p-hacking
As for the first criticism, discussion of physical plausibility is beyond the scope of this meta-analysis and is left to the discretion of the authors. Nevertheless, apparent violations of our intuitions of time are found at the quantum level, such as in the Wheeler Delayed-Choice experiment. It is not impossible that such effects may scale up to a macroscopic level in a not-yet-understood emergent process. I think the final two sentences of the introduction satisfy considerations of the physical impossibility objection and no changes are needed.
Though file-drawer effects are frequently cited as a serious concern, the results section adequately discusses this issue. However, expert review is needed for this area (my response to "Is the statistical analysis and its interpretation appropriate?" should really be a combination of "Partly" and "A qualified statistician is needed"). I agree with the first sentence of the "Publication Bias" subsection that publication bias is not that serious a concern because of the limited number of researchers and available funding.
By far the most serious concern is the third, that of multiple comparisons or p-hacking, which I do not believe is adequately addressed by either the discussion or conclusion sections. Two sentences in the conclusion are not sufficient to address this serious concern. I have included recommendations later in this review. I am aware the authors already know the following, but by doing multiple analyses and only reporting a sub-sample of them, a believer or supporter of a hypothesis could bias effect sizes up, while a skeptic or opponent could bias effect sizes down (and none of these biases is necessarily intentional or even conscious).
In the context of PAA, serious sources of p-hacking concern are establishing baselines for electrophysiological data, deciding time regions for analysis, and methodologies for rejecting bad data and artifacts. For some physiological measurements, the problem is even worse. In Electroencephalography (EEG) studies, for example, a researcher could either study event-related potentials (ERPs), the spectral power densities of various oscillations, or the phases of such oscillations, or a host of other possible analyses. Considering oscillations, the frequency range of an analysis can also be freely selected. Additionally, a researcher could select different bandpass filters to use or even which section of the head is included in the analysis. This is in addition to the concerns with artifact rejection, time region, and baselining already discussed. With so many free parameters, a non-preplanned study is practically useless as hard evidence for an effect unless the statistical significance of the effect is high enough that it becomes implausible that the effect in question can be generated by tweaking free parameters. Even if the statistical significance is high, the effect size is still untrustworthy because an analyst could be tweaking parameters in an effort to improve the analysis or fix problems but is only homing in on statistical fluctuations. These concerns are one reason why I refused to include exploratory EEG research from my own lab in this meta-analysis.
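To give a rough sense of scale for this concern, the arithmetic can be sketched as below. It assumes, purely for illustration, that each alternative analysis is an independent test at the same alpha level; real analysis variants of the same data are correlated, so the true inflation is smaller than this bound suggests. The function names are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive among k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / k


# Illustrative counts of analysis variants (baselines, time windows, filters, ...).
for k in (1, 5, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:2d} analyses: familywise error = {rate:.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, k):.4f}")
```

Under this simplifying assumption, twenty unplanned analysis variants would yield at least one nominally significant result in roughly two out of three null datasets, which is the motivation for pre-planning the analysis.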
The solution to the multiple analysis problem is to separate research into exploratory studies where adjustments can be made in analysis and pre-planned confirmatory studies. Some of the studies included in the meta-analysis are pre-planned confirmatory studies, which should be considered the only truly reliable results for estimates of effect size due to the concerns laid out in this review (even for confirmatory studies, mistakes by researchers could distort effect sizes but these mistakes may average out in the long run).
My recommended solutions for this paper are:
1. More discussion of the risks of p-hacking in biasing results in the discussion section
2. Separated analyses of pre-registered confirmatory studies and exploratory studies and discussion comparing the two
3. For exploratory studies in the study tables, include the experimenter expectation of whether the hypothesis will be verified (such as in Galak, LeBoeuf, Nelson, & Simmons, 2012)
4. Show whether multiple comparison corrections were made for exploratory studies
Exploratory studies are necessary for advancing the field. But a meta-analysis should not include them without major caveats due to potential distortions of the effect size.
I am aware the extra attention given to p-hacking risks in this research is without precedent in other fields, but the small effect sizes and the major implications of PAA research for our understanding of physics, psychology, and neuroscience may justify additional caution. My colleagues and I discuss this further in Schooler, Baumgart, & Franklin, 2018.

Other Comments
"The presentiment hypothesis calls for a difference between arousing and neutral pre-stimulus response and this is calculated across sessions" is not always true. For example, the hypothesis could also cover the difference between two different types of arousing stimulus (for example, auditory versus visual stimulus or two different types of visual stimulus).
Further discussion should be included for the observations mentioned in the second-to-last paragraph of the discussion; otherwise, it may be unclear why these studies are interesting as the paper asserts.

Are the conclusions drawn adequately supported by the results presented in the review? Partly
Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard; however, I have significant reservations, as outlined above.
Author Response 12 Jul 2018
Patrizio Tressoldi, Dipartimento di Psicologia Generale, Università di Padova, Italy
Thank you for your detailed and constructive comments.
Our replies to your main comments follow.
Though file-drawer effects are frequently cited as a serious concern, the results section adequately discusses this issue. However, expert review is needed for this area (my response to "Is the statistical analysis and its interpretation appropriate?" should really be a combination of "Partly" and "A qualified statistician is needed"). I agree with the first sentence of the "Publication Bias" subsection that publication bias is not that serious a concern because of the limited number of researchers and available funding.
Reply: We believe we have sufficient expertise in dealing with this problem. Furthermore, we consulted R.C.M. van Aert, who is an expert on this topic.
My recommended solutions for this paper are:
1. More discussion of the risks of p-hacking in biasing results in the discussion section
2. Separated analyses of pre-registered confirmatory studies and exploratory studies and discussion comparing the two
Reply: We have added a direct comparison between preregistered and non-preregistered studies; see Table 4 and the paragraph "Preregistered vs No-preregistered studies".
For exploratory studies in the study tables, include the experimenter expectation of whether the hypothesis will be verified (such as in Galak, LeBoeuf, Nelson, & Simmons, 2012)
Reply: Unfortunately, none of the studies examined this moderating variable, but our sensitivity analysis reported in Table 3 suggests that experimenter expectation did not considerably affect the overall results.
Show whether multiple comparison corrections were made for exploratory studies
Reply: Our choice to use multivariate analyses partly reduces the impact of this procedure.
"The presentiment hypothesis calls for a difference between arousing and neutral pre-stimulus response and this is calculated across sessions" is not always true. For example, the hypothesis could also cover the difference between two different types of arousing stimulus (for example, auditory versus visual stimulus or two different types of visual stimulus).
Reply: Revised as "The presentiment hypothesis calls for a difference between the pre-stimulus responses of the two stimulus categories."
Further discussion should be included for the observations mentioned in the second-to-last paragraph of the discussion; otherwise, it may be unclear why these studies are interesting as the paper asserts.
Reply: We expanded our conclusion as suggested.
Competing Interests: No competing interests were disclosed.

Introduction
P.2, Line 21: Not sure I would agree with 'body predicting moments ahead of time' as this suggests understanding - try 'reacting ahead of time' or simply 'physiological changes ahead…'
P.2, Para 2: The authors note that two paradigms were used, presentation of arousing/neutral stimuli or guessing tasks. Were any clear differences in PAA effects reported between these tasks? Also, given the 'broad range of physiological measures' used to assess such changes, were there any key differences here?
P.2, Para 2, final sentence: the 'evidence from mainstream research' - what specifically does this refer to? Behavioural effects? I.e. changes in accuracy and/or response times, and if so could do with a clear reference.
P.2, Para 3, line 9: 'forwarded' doesn't make sense. Do you mean 'proposed as a potential framework/theory'?
P.2, Para 3: Not sure I'd agree that using physiological markers to 'predict' future events is a 'second objective' measure. It is simply another way to view the same procedure.
P.2, Para 3: the vague references to 'presentiment piggybacking onto mainstream research' need clarifying and supporting with references.

Methods
P.2, Para 1: need to identify the acronym 'PRISMA' after it is outlined.
P.2, Para 3, line 3: change 'were' to 'where'.
P.2, Para 3, line 4: change 'Differently' to 'In addition,'. Also, what is the rationale for utilising a distinct eligibility criterion? It seems that prior research focused on testing for a pre-stim signal that would match the post-stim presentation. By not using this method you open yourself up to the criticism of widening the scope and also of looking for 'any physiological change' as opposed to one that would be specifically linked to the presentation of the target. The authors claim this is 'more comprehensive' but it could just as easily be seen as less conservative.
P.3, line 4: change to 'this type of investigation'.
Line 8: change 'investigations' to 'research'.
Line 8: The point about studies possibly 'missed' by Mossbridge et al is not clear. What makes you think any studies were 'missed' and why did you then include the same time period, i.e. from 2008 to 2010? If you are 'adding' to the data it would make sense to begin your inclusion time from 2010 unless you have evidence that some studies were 'missed'.
P.3, Para 4, line 1: change 'were studies were' to 'were studies where'.
P.3: is it possible to say a bit more about why some authors did not agree to share their data? It looks distinctly odd.
P.4, Para 6: sentence referring to 'Assink' doesn't make sense, unless you move the ref out of parenthesis and into the sentence.
P.4: Change 'The Bayesian' to 'A Bayesian'. And pull the sentence with syntax to the same paragraph.
P.4: Change 'Even if with our search activity we are quite….' to 'The robust search is likely to have reduced the probability of a publication bias occurring. Nevertheless, to test this a statistical estimation was conducted using the Copas selection model, as recommended by Jin et al'.

Results
Keep tense to past ie peer reviewed not review.
It doesn't make sense to compare data from the current review to Mossbridge et al 'if' both sets of data contain the same studies -as this would lead to obvious similarities etc. To an extent this seems to be addressed by the data in Table 3 but not made clearly -ie why not simply state that when X studies were excluded due to Y reasons the overall effect was still significant? I don't see the moderation results for PeerRev reported here?
The reported 'small difference between the peer reviewed and non-peer reviewed' is vague and unhelpful.

P.2, Line 21: Not sure I would agree with 'body predicting moments ahead of time' as this suggests understanding - try 'reacting ahead of time' or simply 'physiological changes ahead…'
Reply: We changed this to "physiological changes ahead of time".
P.2, Para 2: The authors note that two paradigms were used, presentation of arousing/neutral stimuli or guessing tasks. Were any clear differences in PAA effects reported between these tasks?

Reply: No
Also, given the 'broad range of physiological measures' used to assess such changes were there any key differences here?
Reply: No
P.2, Para 2, final sentence: the 'evidence from mainstream research' - what specifically does this refer to? Behavioural effects? I.e. changes in accuracy and/or response times, and if so could do with a clear reference.
Reply: Added reference
P.2, Para 3, line 9: 'forwarded' doesn't make sense. Do you mean 'proposed as a potential framework/theory'?
Reply: replaced with "proposed as a potential mechanism".
P.2, Para 3: Not sure I'd agree that using physiological markers to 'predict' future events is a 'second objective' measure. It is simply another way to view the same procedure.
Reply: changed as "another way.."
P.2, Para 3: the vague references to 'presentiment piggybacking onto mainstream research' need clarifying and supporting with references.

Reply: changed accordingly
Also, what is the rationale for utilising a distinct eligibility criterion? It seems that prior research focused on testing for a pre-stim signal that would match the post-stim presentation. By not using this method you open yourself up to the criticism of widening the scope and also of looking for 'any physiological change' as opposed to one that would be specifically linked to the presentation of the target. The authors claim this is 'more comprehensive' but it could just as easily be seen as less conservative.
Reply: We prefer the term more comprehensive because some experimental designs, e.g. hit guessing, don't allow a post-stimulus physiological measure. However, all experimental designs tied the differential anticipatory physiological activity to two different outcomes, e.g. hits or misses.
P.3, line 4: change to 'this type of investigation'. Line 8: change 'investigations' to 'research'.

Reply: fixed.
Line 8: The point about studies possibly 'missed' by Mossbridge et al is not clear. What makes you think any studies were 'missed' and why did you then include the same time period -ie from 2008 to 2010 -if you are 'adding' to the data it would make sense to begin your inclusion time from 2010 unless you have evidence that some studies were 'missed'?

Reply: fixed.
P.3: is it possible to say a bit more about why some authors did not agree to share their data? It looks distinctly odd.
Reply: the reasons for such decisions are confidential.
P.4, Para 6: sentence referring to 'Assink' doesn't make sense -unless you move the ref out of parenthesis and into the sentence.
Reply: fixed
P.4: Change 'The Bayesian' to 'A Bayesian'. And pull the sentence with syntax to the same paragraph.
Reply: fixed
P.4: Change 'Even if with our search activity we are quite….' to 'The robust search is likely to have reduced the probability of a publication bias occurring. Nevertheless, to test this a statistical estimation was conducted using the Copas selection model, as recommended by Jin et al'.

Results
Keep tense to past ie peer reviewed not review.

Reply: fixed
It doesn't make sense to compare data from the current review to Mossbridge et al 'if' both sets of data contain the same studies -as this would lead to obvious similarities etc. To an extent this seems to be addressed by the data in Table 3 but not made clearly -ie why not simply state that when X studies were excluded due to Y reasons the overall effect was still significant?