The accuracy and validity of self-reported social media use measures among adolescents

A growing number of studies have tried to assess the effects of social media on adolescents, who are among the most avid social media users. To establish the effects of social media use, we need accurate and valid instruments to measure adolescents’ time spent with these media. The aim of this preregistered study was to examine the accuracy and convergent validity of retrospective surveys and experience sampling method (ESM) surveys, by comparing adolescents’ responses to these self-report measures with their digital trace data. The sample consisted of 125 adolescents (48% girls; M age = 14.1) with Android smartphones. In both retrospective surveys and ESM, adolescents overestimated their time spent on social media. They more accurately estimated their time spent on platforms that are used in a less fragmented way (Instagram) than on platforms that are used in a more fragmented way (Snapchat). The between-person convergent validity of adolescents’ time estimates according to retrospective surveys and ESM reached the threshold for minimum acceptable convergent validity (r ranged from .55 to .65). The within-person convergent validity of adolescents’ ESM estimates of their time spent on social media was unacceptable (r = .32). The between- and within-person convergent validity of ESM estimates decreased over time (i.e., fatigue effect).


Since the introduction of social media, numerous studies have tried to assess how adolescents are affected by these media (for meta-analyses, see e.g., Huang, 2017; Liu & Baumeister, 2016; Liu et al., 2018), most with retrospective surveys. In these studies, adolescents were asked to estimate the time they spent on social media in general (e.g., in a typical week) or within a specific time frame (e.g., in the previous week). In addition, a small but increasing number of studies have used experience sampling methodology (ESM) to measure social media use (Griffioen et al., 2020). In such studies, respondents used their smartphones to estimate their time spent on social media in the previous hour or since the last notification (e.g., Beyens et al., 2020). When relying on such self-report measures to investigate the effects of social media, it is of utmost importance that these measures are accurate, so that they avoid systematic under- and overestimation (Scharkow, 2016). In addition, these instruments should be valid, so that they measure what they are intended to measure (Bryman, 2012; Flake & Fried, 2020). After all, inaccurate and invalid self-report measures of social media use can lead to inaccurate estimations of (social) media effects (Scharkow, 2016).
Recent technological advances allow researchers to determine the accuracy and convergent validity of self-reported estimates of time spent with social media by comparing these subjective estimates to more objective digital trace data (Stier et al., 2019). To our knowledge, four studies have used digital trace data to investigate the accuracy and convergent validity of retrospective survey measures of time spent with social media (Burke et al., 2010; Ernala et al., 2020; Junco, 2013; Sewall et al., 2020), and one to establish the accuracy of ESM measures (Deng et al., 2019). All these studies showed low accuracy of both types of self-report measures: The overestimation of daily time spent on social media measured through retrospective surveys ranged from 51 minutes to 256 minutes (Ernala et al., 2020), and, when measured through ESM, this overestimation was 142 minutes (Deng et al., 2019). In addition, these studies revealed substantial differences in the convergent validity of retrospective surveys of time spent on social media, as the associations of responses to retrospective surveys with digital trace data ranged from r = .24 among a sample of about 50,000 adolescents and adults (Ernala et al., 2020) to r = .87 among a sample of 45 college students (Junco, 2013). This indicates that not all studies found acceptable convergent validity, since a correlation of r = .50 between digital trace data and self-report measures of time spent on social media is considered the minimum acceptable level of convergent validity according to Carlson and Herdman (2012).
While these previous studies have considerably enhanced our understanding of the accuracy and convergent validity of different social media use measures, the literature can be extended in three ways. First, while most previous validation studies assessed the accuracy and convergent validity of time spent on social media either among adults (Burke et al., 2010; Deng et al., 2019; Junco, 2013; Sewall et al., 2020) or mixed samples including both adolescents and adults (Ernala et al., 2020), no study has solely focused on adolescents. A focus on adolescents is important, because adolescents are among the most avid users of social media platforms (van der Veer et al., 2020), and are particularly prone to their effects (Dienlin & Johannes, 2020). Since most adolescents use multiple social media platforms in functionally complementary ways (Waterloo et al., 2017), we investigated whether the accuracy and convergent validity of self-report measures of time spent on social media differ across the three most used social media platforms among Dutch adolescents: Instagram, WhatsApp, and Snapchat (van der Veer et al., 2020; van Driel et al., 2019).
The second extension of earlier literature that we provide is a systematic investigation of the accuracy and convergent validity of both retrospective survey and ESM measures. Several researchers have called for ESM studies to investigate the effects of social media use because they believe that ESM measures reduce recall bias, so that they provide more accurate and valid estimates of adolescents' time spent on social media than retrospective surveys (Dienlin & Johannes, 2020; Griffioen et al., 2020; Naab et al., 2018). However, this assumption has never been investigated, because previous studies have compared digital trace data with either ESM data (Deng et al., 2019) or retrospective survey estimates (e.g., Junco, 2013; Sewall et al., 2020). Therefore, the second aim of this study is to investigate and compare the accuracy and convergent validity of retrospective survey and ESM measures of adolescents' time spent on social media.
A third and final extension of the literature is that we incorporate a micro-longitudinal perspective on the accuracy and convergent validity of self-reported social media use measures. Existing studies have examined the between-person convergent validity, showing that people who reported spending more time on social media compared to other people according to retrospective surveys, also spent relatively more time on social media according to digital trace data (e.g., Junco, 2013; Sewall et al., 2020). However, a rapidly growing number of studies now employ longitudinal designs to study within-person changes in time spent on social media (e.g., Aalbers et al., 2019; Bayer et al., 2018; Boers et al., 2019). This raises the question whether within-person fluctuations in time spent on social media are consistent across different social media measurement methods. In the present study, we therefore not only examined the convergent validity based on how adolescents' average self-reported scores are related to their average digital trace scores (i.e., between-person convergent validity), but also based on how adolescents' momentary ESM scores co-fluctuate with their momentary digital trace scores (i.e., within-person convergent validity). In addition, because the accuracy and convergent validity of a person's self-reports may change over time (Naab et al., 2018), we also explored potential time effects of adolescents' responses to retrospective surveys and ESM. Specifically, we investigated two opposing hypotheses regarding such time effects: the learning effect hypothesis, which proposes more accuracy and higher convergent validity across time, and the fatigue effect hypothesis, which proposes less accuracy and lower convergent validity across time.

The Accuracy and Convergent Validity of Social Media Use Measures
Since validity studies of retrospective survey measures of time spent with social media are still scarce, studies that compared digital trace data with retrospective surveys of time spent with mobile phones or the internet are useful to inform our research questions and hypotheses. Over the past two decades, about two dozen studies have assessed the accuracy or convergent validity of retrospective surveys and ESM estimates by comparing these estimates with digital trace data of time spent on the phone and the internet among adults (e.g., Araujo et al., 2017; Jones-Jang et al., 2020; Jürgens et al., 2019; Scharkow, 2016) and eight studies among adolescents (e.g., Inyang et al., 2009; Marciano & Camerini, 2020; Mireku et al., 2018). These adult studies found that adults tend to overestimate the time spent on their phones and the internet, and they yielded correlations ranging from r = .23 (Araujo et al., 2017) to r = .79 (Funch et al., 1996) between retrospective surveys and digital trace methods. In addition, these studies pointed at the importance of distinguishing retrospective surveys of time spent on (social) media in a typical week from time spent in the previous week, as the former type of survey questions seemed to be more accurate than the latter (e.g., Araujo et al., 2017).
Compared to studies among adults, findings of validity studies among adolescents were even more inconclusive, with correlations of retrospective surveys of internet and phone use with digital trace data ranging from r = .10 (Inyang et al., 2009) to r = .77 (Goedhart et al., 2015). The accuracy of adolescents' survey reports of internet and phone use in these studies was low, with frequent overestimation of these uses (Aydin et al., 2011; Goedhart et al., 2018; Goedhart et al., 2015; Inyang et al., 2009; Marciano & Camerini, 2020; Mireku et al., 2018). Finally, adolescents were more likely to overestimate their time spent on the internet than adults (Jürgens et al., 2019; Scharkow, 2016).
One reason for the relatively low accuracy of retrospective survey estimates of adolescents' media use may be that these estimates may be prone to systematic biases that are particularly common in adolescence, such as recall bias (Marciano & Camerini, 2020). Another reason may be that popular social media among adolescents (e.g., Snapchat or WhatsApp) are typically used in rapid and fragmented ways throughout the day, rendering it difficult to correctly estimate social media use (Griffioen et al., 2020; Underwood et al., 2018). By keeping the time interval over which adolescents have to report short (e.g., 1 hour), recall bias can be reduced. Researchers therefore believe that ESM is likely to yield more accurate and valid estimates of time spent on social media than retrospective surveys (Naab et al., 2018; Underwood et al., 2018).
Despite multiple calls for ESM studies on social media effects, evidence of the accuracy and convergent validity of ESM estimates of media use is scarce. So far, only one study has investigated the accuracy of ESM estimates of time spent on social media in an adult sample, by making a comparison with digital trace data. This study found that adults overestimated their time on social media via ESM compared to digital trace methods (Deng et al., 2019). In contrast, a study that compared ESM and digital trace data of people's time spent on their phone showed that people underestimated their phone use since the last notification (Van Berkel et al., 2018). Studies that compared ESM with retrospective surveys have the potential to offer insight into how the accuracy and convergent validity of ESM and retrospective surveys relate to each other (Moreno et al., 2012; Naab et al., 2018). These studies showed that adults' ESM reports contained lower estimates of the duration of one social media episode (Naab et al., 2018) and daily internet use (Moreno et al., 2012) than retrospective surveys. In addition, these studies yielded correlations ranging from r = .08 to r = .54 between these ESM reports and retrospective survey measures (Moreno et al., 2012; Naab et al., 2018). To gain more insight into the accuracy and convergent validity of ESM reports of adolescents' time spent on social media, as well as retrospective survey estimates of time spent on social media in a typical week and in the previous week, we formulated the following research questions:
RQ1: How accurate are ESM and retrospective survey estimates in adolescence? In other words, compared to adolescents' digital trace data, what are the mean differences in the time adolescents spend using social media measured through retrospective surveys (RQ1a) and measured through ESM (RQ1b)?
RQ2: What is the (between-person) convergent validity of ESM and retrospective survey estimates in adolescence? In other words, how strongly are adolescents' retrospective survey, ESM, and digital trace data of the time they spend using social media correlated?

A Micro-Longitudinal Perspective on Accuracy and Convergent Validity
So far, most researchers have examined convergent validity by examining the correlation of self-reported measures with digital trace data across people at a single moment in time. Beyond investigating this between-person convergent validity, it is also important to examine within-person convergent validity. Several scholars have recently called for media effects research that disentangles within-person and between-person associations (Prinstein et al., 2020; Whitlock & Masur, 2019). Between-person associations and within-person associations focus on different types of research questions and each provide unique insights into social media use (Coyne et al., 2020; Valkenburg et al., 2021). Between-person associations inform us about how, compared to their peers, adolescents' average social media use is related to other constructs on an average level. Within-person associations inform us about how adolescents' momentary social media use co-fluctuates with their momentary scores on other constructs. To draw adequate conclusions from within-person effects of time spent on social media, it is essential that we obtain insight into the within-person convergent validity of social media use measures. Therefore, we investigated the following research question:
RQ3: What is the within-person convergent validity of the ESM estimates? In other words, how strongly are adolescents' ESM and digital trace data of the time they spend using social media correlated within adolescents across multiple moments in time?
A second important, timely, and as yet uninvestigated question related to the use of longitudinal research designs is how both the accuracy and convergent validity of retrospective survey and ESM estimates of the time adolescents spend on social media change over time. After all, if findings point at longitudinal trends in the data, it is important to preclude the possibility that these effects are due to changes in the accuracy and convergent validity of social media use measures across time. Naab et al. (2018) argued that self-reports in longitudinal studies depend on adolescents' ability and willingness to report on their social media use, which could be subject to change over the course of a study. There are two competing hypotheses about these possible changes in the accuracy and convergent validity of social media use measures. On the one hand, the learning effect hypothesis (Payne & Wenger, 1996) proposes that adolescents become more conscious of the time they spend on social media due to responding multiple times to the same question, so the accuracy and convergent validity of their estimates improve over time. Accordingly, we hypothesize:
H1 (learning effect hypothesis): The accuracy (H1a), between-person convergent validity (H1b), and within-person convergent validity (H1c) of adolescents' retrospective survey and ESM estimates of the time they spend using social media will increase over time.
On the other hand, the fatigue effect hypothesis (Reynolds et al., 2016; Savage & Waldman, 2008) assumes that adolescents get bored of repeatedly responding to the same question, which reduces the accuracy of social media measures over time. As an alternative hypothesis, we therefore expect that:
H2 (fatigue effect hypothesis): The accuracy (H2a), between-person convergent validity (H2b), and within-person convergent validity (H2c) of adolescents' retrospective survey and ESM estimates of the time they spend using social media will decrease over time.

Differences in Accuracy and Convergent Validity Between Platforms
Several studies suggest that the accuracy and convergent validity of self-reports vary across different social media platforms. For example, Naab et al. (2018) found larger mean differences between retrospective survey and ESM estimates of time spent with WhatsApp than of time spent with Facebook. In addition, studies that examined the convergent validity of retrospective surveys with either digital trace data (Burke et al., 2010; Ernala et al., 2020; Junco, 2013) or ESM data (Naab et al., 2018) reported a correlation of r = .08 for WhatsApp (Naab et al., 2018), correlations between r = .24 and r = .59 for Facebook (Burke et al., 2010; Ernala et al., 2020; Junco, 2013), and correlations of r = .52 and r = .87 for YouTube and Twitter (Junco, 2013; Naab et al., 2018). Together, these studies suggest that the accuracy and convergent validity of social media measures are stronger for platforms that are used in a less fragmented way (e.g., Instagram) than those that are used in a more fragmented way (e.g., WhatsApp and Snapchat). To explore these possible differences in accuracy and convergent validity of retrospective surveys and ESM measures between social media platforms, we investigated the following research question:
RQ4: Are there differences in accuracy and convergent validity of retrospective surveys and ESM between adolescents' time spent on Instagram, Snapchat, and WhatsApp? In other words, are there any differences between social media platforms in accuracy (RQ4a), between-person (RQ4b), and within-person (RQ4c) convergent validity of retrospective surveys and ESM?

Method

Participants
This preregistered study is part of a larger study (https://osf.io/327cx) that investigates adolescents' social media use and psychosocial functioning, using a measurement burst design.
Based on a priori power analyses for our overall project (see https://osf.io/tk8pw), we included 388 participants in the project. Of these 388 participants, 300 Instagram, WhatsApp, or Snapchat users also participated in the second three-week ESM study that was part of this larger project.
The second ESM wave started on 3 June 2020, which coincidentally was the day that the mandated school closures due to COVID-19 in the Netherlands ended after 2.5 months. As tracking software could only track Android phones, the potential sample of 300 adolescents was reduced to 171 Android users, of whom 131 (44%) provided active consent to track their app usage and had their app usage continuously tracked throughout the 21-day ESM period. Of these 131 participants, 125 (42%) also participated in the second ESM wave. The final sample of this study therefore consisted of 125 middle adolescents (Mage = 14.1 years, SDage = .72, 48% girls) of whom 98.5% identified themselves as Dutch. The educational levels of our sample were representative of the south of the Netherlands: 38.4% were enrolled in the prevocational secondary education track, 32.8% in the intermediate general secondary education track, and 28.8% in the academic preparatory education track.

Procedure
The study was approved by the Ethics Review Board of [masked for review] and was performed in accordance with the guidelines formulated by the Ethics Review Board. We recruited participants through a secondary school in the south of the Netherlands. At the start of the larger project, researchers informed the school, parents, and the participants of the aim and procedure of the study. Both parents and participants were informed that adolescents' responses would be treated confidentially and were asked to provide active consent. The present study relies on four parts of the larger project. A detailed timeline can be found on OSF (https://osf.io/fb945).
Part 1 contained a 30-minute pre-ESM survey administered via Qualtrics. Participants were asked to complete the survey online at home on a computer or tablet. This survey contained (amongst others) questions about demographic characteristics and adolescents' typical time spent on Instagram, WhatsApp and Snapchat. Participants received €5 upon completion of this survey.
Part 2 contained a three-week ESM study. Right after participants completed the pre-ESM survey via Qualtrics, they received online instructions about how to install the ESM application Ethica (Ethica Data, 2020a) on their own cell phone. They were asked, through the Ethica app, to indicate which social media platforms (i.e., Instagram [106 adolescents], WhatsApp [123 adolescents], and Snapchat [89 adolescents]) they used more than once per week. If they indicated that they used a platform more than once per week, we asked them to report on their use of that platform in all subsequent ESM assessments. If participants used any of these platforms less frequently, we asked questions about other platforms (i.e., YouTube, gaming) or activities so that each participant received the same number of questions in the ESM study.
The ESM study started 2 weeks after completion of the pre-ESM survey. The Ethica app installed on their smartphones was programmed to generate six notifications per day for a period of three weeks (i.e., a total of 126 ESM surveys; for more information see our notification scheme at https://osf.io/tbdjq). Each survey contained questions about participants' psychosocial functioning and their social media use and took about two minutes to complete. Participants received €0.30 for completing an ESM survey and €0.50 for completing the final ESM survey of the day. At the start of each day, participants who completed all 6 surveys on the previous day were entered into a lottery, in which four participants could win €25. Of the 15,744 surveys received (in total 15,750 were sent but 6 were not received due to unforeseen technical errors), participants (partially) completed 10,591 surveys (net compliance of 67%). On average, participants completed 84.73 ESM surveys (SD = 32.19; range 6-123; median = 96). More details on the procedure of the study can be found on OSF (https://osf.io/327cx).
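The survey counts in this paragraph can be cross-checked with simple arithmetic. The sketch below is not part of the original analysis; it only uses the figures stated in the text (125 participants, six notifications per day for 21 days, 6 surveys not received, 10,591 completed), and the variable names are illustrative.

```python
# Counts taken from the text; variable names are illustrative only.
n_participants = 125
surveys_per_participant = 6 * 21   # six notifications per day for three weeks
not_received = 6                   # lost to technical errors
completed = 10_591                 # (partially) completed surveys

sent = n_participants * surveys_per_participant
received = sent - not_received
net_compliance = completed / received          # share of received surveys completed
mean_completed = completed / n_participants    # average per participant

print(f"sent={sent}, received={received}")
print(f"net compliance={net_compliance:.2f}, mean completed={mean_completed:.2f}")
```

Under these figures, the compliance rate is computed relative to the surveys that were actually received rather than the surveys that were sent.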
Part 3 contained the surveys of retrospective social media use in the previous week. At the last ESM assessment of each week, participants received (through the Ethica app) three additional questions about the time they had spent with Instagram, WhatsApp, and Snapchat in the previous week. If participants did not respond to these weekly questions about their time spent on social media within one and a half hours, we sent a reminder and sent the questionnaire again on the subsequent days. Of the 125 participants with digital trace data, 6 completed two surveys (5%), 117 completed all three surveys (94%) and 2 (2%) did not complete any survey.
In Part 4, we asked participants to install the Ethica App Usage Stream (Ethica Data, 2020b) application on their phone, which tracked their app usage (i.e., type of app and duration of use) during the three-week ESM period. In addition to tracking participants' app usage, we also collected screen state data of these Android users through the Ethica app. This allowed us to check if participants' screens were turned on or off (and at what time).

Measures of Social Media Use
Retrospective surveys: Time spent on social media in a typical week. We measured typical weekly time spent on social media using direct estimates that assessed the frequency and duration of adolescents' social media use. First, adolescents were asked to indicate (by three separate questions) how many days (0-7 days) in a typical week they use Instagram, Snapchat, and WhatsApp on their phone. Next, if adolescents indicated that they used a platform more than one day per week, they were asked to indicate how many hours (0-24 hours) and minutes (0-59 minutes) on these days they used the respective platform on their phone (e.g., "On the days that you use Instagram, how much time do you approximately spend on Instagram via your phone?").
The variable time spent on social media in a typical week was calculated by multiplying the number of days on which adolescents typically use a specific platform by the total number of minutes they used these platforms on these days.
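As a minimal sketch of this calculation (not the authors' code), the weekly estimate for one platform can be written as days of use multiplied by the minutes of use on those days. The function name, argument names, and range checks are assumptions based on the response ranges described above.

```python
def weekly_minutes(days_per_week: int, hours_per_day: int, minutes_per_day: int) -> int:
    """Weekly minutes on one platform: days of use x minutes of use on those days.

    Hypothetical helper; the ranges mirror the response options in the text
    (0-7 days, 0-24 hours, 0-59 minutes).
    """
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    if not 0 <= minutes_per_day <= 59:
        raise ValueError("minutes_per_day must be between 0 and 59")
    return days_per_week * (hours_per_day * 60 + minutes_per_day)


# Example: Instagram on 5 days for 1 h 30 min, Snapchat on 7 days for 45 min.
instagram = weekly_minutes(5, 1, 30)
snapchat = weekly_minutes(7, 0, 45)
total = instagram + snapchat
print(instagram, snapchat, total)
```

Summing the per-platform values gives a total across platforms, in line with how total time spent on social media is described in the Statistical Analyses section.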
Retrospective surveys: Time spent on social media in the previous week. At the end of each of the three ESM weeks, we asked adolescents to report their time spent with Instagram, WhatsApp and Snapchat in the previous week. Again, we used direct estimates that assessed the frequency and duration of adolescents' social media use. First, we asked them how many days (0-7 days) in the previous week they used each of the three social media platforms. Next, if adolescents stated that they used a platform more than one day per week, they were asked to report the number of hours (0-24 hours) and minutes (0-59 minutes) they spent on these respective platforms in the previous week. The variable time spent on social media in the previous week was created by multiplying the number of days on which adolescents used a platform in the previous week by the number of minutes they used these platforms on these days. This resulted in three estimates, one for each ESM week, that were averaged.
Experience sampling method. Adolescents' ESM estimates of their social media use were obtained by three questions per ESM assessment, in which adolescents were asked to estimate the time spent using Instagram, WhatsApp, and Snapchat in the previous hour. Response options ranged from 0 to 60 minutes on a horizontal slider, with 1-minute intervals. For the between-person analyses, we calculated adolescents' average time spent using Instagram, WhatsApp, and Snapchat (in minutes per hour) by averaging the estimates for each week of the ESM and across the three-week ESM period. For the within-person analyses, we used adolescents' raw ESM estimates per social media platform.
Digital trace data. Adolescents' use of Instagram, WhatsApp, and Snapchat was tracked continuously during the three-week ESM period by the Ethica App Usage Stream application. Every five minutes, this application retrieved the Android log data on adolescents' personal devices. These data represented the foreground time of all applications, including Instagram, WhatsApp, and Snapchat, which could be defined as the usage of the applications when the adolescents' phone was unlocked. We also measured adolescents' screen state data throughout the day. These time-stamped data showed us when adolescents had their phone screen turned on or off. To control for the possibility that adolescents' phones still recorded app usage when apps were running in the background while their phone screen was turned off, we excluded records of app use when adolescents' screen was turned off (i.e., roughly 2% of app usage estimates). More details about the cleaning process of the digital trace data can be found on OSF (https://osf.io/jkre2).
We calculated between-person indices of adolescents' time spent on WhatsApp, Instagram, and Snapchat according to their digital trace data by aggregating their scores per week of the study as well as across the three-week study period. We calculated within-person indices of adolescents' time spent on social media according to their digital trace data by computing their total time spent on Instagram, WhatsApp, and Snapchat in the hours corresponding to their respective ESM schedules.
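The within-person index described above can be illustrated with a small sketch. This is not the authors' cleaning script (which is documented on OSF); the record layout, the rule that a record is dropped when it starts while the screen is off, and all function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def overlap_minutes(start, end, win_start, win_end):
    """Minutes of the interval [start, end) that fall inside [win_start, win_end)."""
    seconds = (min(end, win_end) - max(start, win_start)).total_seconds()
    return max(seconds, 0.0) / 60.0


def screen_was_on(timestamp, screen_on_intervals):
    """True if the timestamp falls inside any (on, off) screen interval."""
    return any(on <= timestamp < off for on, off in screen_on_intervals)


def minutes_in_previous_hour(records, screen_on_intervals, prompt_time, app):
    """Tracked minutes for one app in the hour before an ESM prompt.

    records: iterable of (app_name, start, end) foreground sessions.
    Assumption: a session is excluded when it starts while the screen is off.
    """
    window_start = prompt_time - timedelta(hours=1)
    total = 0.0
    for rec_app, start, end in records:
        if rec_app != app or not screen_was_on(start, screen_on_intervals):
            continue
        total += overlap_minutes(start, end, window_start, prompt_time)
    return total


day = datetime(2020, 6, 3)
at = lambda h, m: day.replace(hour=h, minute=m)
screen_on = [(at(14, 50), at(15, 6)), (at(15, 10), at(15, 40))]
records = [
    ("Instagram", at(14, 55), at(15, 5)),   # only 5 minutes fall inside the window
    ("Instagram", at(15, 15), at(15, 25)),  # 10 minutes, screen on
    ("Instagram", at(15, 50), at(15, 55)),  # screen off, excluded
    ("WhatsApp", at(15, 30), at(15, 35)),   # other app
]
print(minutes_in_previous_hour(records, screen_on, at(16, 0), "Instagram"))
```

Matching each tracked hour to the hour covered by the corresponding ESM question is what makes the momentary self-report and the trace value directly comparable.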

Statistical Analyses
The research questions and hypotheses were investigated using R (version 3.6.1; R Core Team, 2017) according to the preregistered analysis plan (https://osf.io/j8mzq). Unless indicated otherwise, we exactly followed the preregistration. We included all available data of each participant in our statistical analyses.
To assess the accuracy of the self-report measures of time spent on social media (RQ1), we compared adolescents' digital trace data with their reported average time spent on social media through retrospective surveys in a typical week, retrospective surveys of the previous week, and ESM. We investigated total time spent on social media by calculating the sum of adolescents' time spent on Instagram, WhatsApp, and Snapchat per measurement method. To determine the accuracy of each self-reported measure, we created three types of indices. First, we computed a difference score between the digital trace data and each self-report measure, by subtracting adolescents' respective digital trace data from their self-report estimates. Second, we calculated average overestimation, by subtracting self-reported estimates from the digital trace data, while setting scores from adolescents who correctly estimated or underestimated their time spent on social media to zero. Third, in a similar fashion, we calculated average underestimation by subtracting the digital trace data estimates from the self-reported estimates, while setting scores from adolescents who correctly estimated or overestimated their time spent on social media to 0. In the calculation of the average over/underestimation, we excluded adolescents with a score of zero.
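The three accuracy indices can be sketched as follows. This is an illustrative reading rather than the authors' script: the difference score is self-report minus digital trace, as stated above, while reporting over- and underestimation as positive magnitudes is an assumption made here for readability. Adolescents with a zero score are left out of each average, as described in the text.

```python
from statistics import mean


def accuracy_indices(self_report, trace):
    """Difference score plus average over- and underestimation.

    self_report, trace: equal-length lists of minutes, one pair per adolescent.
    Over- and underestimation are returned as positive magnitudes (assumption).
    """
    diffs = [s - t for s, t in zip(self_report, trace)]
    over = [d for d in diffs if d > 0]     # overestimators only
    under = [-d for d in diffs if d < 0]   # underestimators only
    return {
        "mean_difference": mean(diffs),
        "mean_overestimation": mean(over) if over else 0.0,
        "mean_underestimation": mean(under) if under else 0.0,
        "n_over": len(over),
        "n_under": len(under),
    }


# Hypothetical minutes for four adolescents.
result = accuracy_indices(self_report=[120, 60, 30, 45], trace=[90, 60, 50, 15])
print(result)
```

Separating the two directions matters because over- and underestimation can cancel each other out in the overall mean difference.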
The difference scores were used to examine possible between-platform differences in accuracy (RQ4a) and the learning (H1a) and fatigue (H2a) effect hypotheses. With regard to the platform differences, we compared the difference scores of Instagram, WhatsApp and Snapchat using paired sample t-tests through the function "t.test" of the stats package in R. Platforms with a smaller difference score between retrospective survey or ESM estimates and digital trace data were more accurate than platforms with a larger difference score. We also conducted a series of t-tests to examine the learning and fatigue effect hypotheses by comparing the difference scores of the first, second and third week of the ESM. Specifically, we defined support for a learning effect (H1a) when the accuracy significantly improved (i.e., difference score decreased) between (1) the first and second week of the ESM, (2) the second and third week of the ESM, or (3) the first and third week of the ESM. Likewise, the fatigue effect hypothesis (H2a) was supported when the accuracy decreased over time (i.e., difference score increased).
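For readers unfamiliar with the paired comparison used here, the test statistic can be sketched by hand. The code below is an illustration in Python rather than the R call used in the study, and the data are hypothetical; it computes only the t value and degrees of freedom, and the p-value would still need to be obtained from a t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(x, y)]          # within-person differences
    standard_error = stdev(d) / sqrt(len(d))   # sample SD of the differences
    return mean(d) / standard_error, len(d) - 1


# Hypothetical difference scores (minutes) of five adolescents for two platforms.
platform_a = [30, 10, 25, 40, 15]
platform_b = [20, 5, 30, 25, 10]
t_value, df = paired_t(platform_a, platform_b)
print(f"t({df}) = {t_value:.3f}")
```

The same function applies to the week-to-week comparisons, with the two lists holding the same adolescents' difference scores in two different ESM weeks.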
We also calculated Cohen's d to compare the magnitude of the accuracy of retrospective surveys with the accuracy of ESM estimates of adolescents' time spent on social media, since they were measured on different time scales (i.e., hours per week vs. minutes per hour).
To investigate the between-person convergent validity (RQ2), we calculated Pearson's bivariate correlations between self-report estimates and digital trace data using the "corr.test" function of the psych package in R (Revelle, 2020). We investigated the within-person convergent validity for ESM (RQ3) with the "statsBy" function of the psych package in R (Revelle, 2020). To explore if there were any platform differences in the strength of these between-person (RQ4b) and within-person (RQ4c) correlations, we conducted a test for comparison of dependent groups with nonoverlapping correlations, using the function "cocor.dep.groups.nonoverlap" of the cocor package in R (Diedenhofen & Musch, 2015). This function allowed us to test whether the correlations for Instagram, WhatsApp and Snapchat were significantly different from each other. We also used this package to test the learning effect (H1b/H1c) and fatigue effect (H2b/H2c) hypotheses regarding the convergent validity. We tested the learning effect hypotheses (H1b/H1c) by examining whether the convergent validity improved over time. Specifically, the learning effect hypothesis was supported when the strength of the correlations between self-reported estimates and digital trace data was significantly higher (1) in the second than in the first week of the ESM, (2) in the third than in the second week of the ESM, or (3) in the third than in the first week of the ESM. In contrast, the fatigue effect hypothesis (H2b/H2c) was supported when the strength of the correlations between self-reported estimates and digital trace data decreased over time.
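The distinction between between-person and within-person correlations can be made concrete with a short sketch. It is an illustration in Python with hypothetical data, not the R functions used in the study: the between-person correlation is computed on person means, and the within-person correlation on scores centered on each person's own mean and pooled across persons. Whether this matches the output of "statsBy" in every detail is not claimed here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def between_within(data):
    """data: {person_id: [(esm_minutes, trace_minutes), ...]} (hypothetical layout)."""
    mean_esm, mean_trace, dev_esm, dev_trace = [], [], [], []
    for observations in data.values():
        esm = [o[0] for o in observations]
        trace = [o[1] for o in observations]
        m_esm, m_trace = mean(esm), mean(trace)
        mean_esm.append(m_esm)
        mean_trace.append(m_trace)
        dev_esm.extend(v - m_esm for v in esm)        # person-mean centering
        dev_trace.extend(v - m_trace for v in trace)
    return pearson(mean_esm, mean_trace), pearson(dev_esm, dev_trace)


data = {
    "A": [(10, 5), (20, 10), (30, 15)],
    "B": [(40, 30), (50, 20), (60, 40)],
    "C": [(5, 2), (15, 4), (10, 6)],
}
r_between, r_within = between_within(data)
print(f"between = {r_between:.2f}, within = {r_within:.2f}")
```

In this toy data set the two coefficients differ clearly, which illustrates why a measure can rank people well on average and still track moment-to-moment fluctuations poorly.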
Like Parry et al. (2020), we followed the guidelines set by Carlson and Herdman (2012) when drawing conclusions about the convergent validity of the self-report measures of time spent on social media. In line with Carlson and Herdman (2012), we set the threshold for minimum acceptable convergent validity at r = .50.
We made two deviations from our preregistered plan. First, to reduce the chance of false positives (i.e., incorrectly rejecting the null hypothesis), alpha (α) levels were corrected for multiple comparisons by applying a Bonferroni correction. We reported the adjusted alpha levels in the respective tables and figures. Second, we checked the key variables for skewness and kurtosis. Since the distribution of our key variables was highly skewed to the right (see OSF for histograms, https://osf.io/svuy8), we followed Sewall et al. (2020) and Vanden Abeele et al. (2013) and analyzed our data based on the log-transformed variables. Moreover, to minimize the effect of outliers, we winsorized the most extreme values by replacing them with scores two standard deviations below or above the mean. Findings based on the untransformed unwinsorized variables are provided in an online OSF supplement (https://osf.io/239mt). The descriptive statistics and difference scores were based on the untransformed winsorized variables. Since the difference scores were normally distributed, they were not log-transformed.
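The two deviations can be sketched as follows. This is a hedged illustration rather than the procedure implemented in the analysis scripts: it assumes that the Bonferroni correction divides α = .05 by the number of tests, that the winsorizing bounds are computed from the mean and standard deviation of the raw scores (outliers included), and that the log transformation is log(x + 1) so that zero values remain defined. The function names and the toy values are hypothetical.

```python
from math import log
from statistics import mean, stdev


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after a Bonferroni correction."""
    return alpha / n_tests


def winsorize_2sd(values):
    """Replace scores beyond two SDs from the mean with the boundary value."""
    m, sd = mean(values), stdev(values)
    lower, upper = m - 2 * sd, m + 2 * sd
    return [min(max(v, lower), upper) for v in values]


def log_transform(values):
    """Assumed transformation: log(x + 1), which keeps zeros defined."""
    return [log(v + 1) for v in values]


# Hypothetical right-skewed hours per week with one extreme value.
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
winsorized = winsorize_2sd(hours)
transformed = log_transform(winsorized)

print(round(winsorized[-1], 2))  # the extreme score is pulled in to mean + 2 SD
print([round(bonferroni_alpha(0.05, k), 4) for k in (12, 24, 36)])
```

Dividing .05 by 12, 24, and 36 tests yields thresholds of roughly .004, .002, and .001.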

Sensitivity power analysis.
We conducted a sensitivity power analysis based on the 125 adolescents with digital trace data. This sensitivity power analysis showed that with an α (two-tailed) of .05 and power of .80, we could reliably detect correlations as small as r = .25, 95% CI [-.18, .18]. With a more stringent alpha level of α = .001 (two-tailed), we could detect a correlation of r = .36, 95% CI [-.29, .29]. In the preregistration of our paper (https://osf.io/j8mzq), we set the minimum acceptable convergent validity at r = .50 (based on Carlson & Herdman, 2012). Therefore, with 125 participants, our study had enough power to examine the convergent validity of our self-report measures.
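The logic of such a sensitivity analysis can be illustrated with the Fisher z approximation for correlations. The sketch below is an assumption about one possible way to obtain these quantities, not a description of the software used for the reported analysis; the function names are hypothetical. Under this approximation, the smallest detectable correlation for n = 125 rounds to .25 at α = .05 and to .36 at α = .001, and the critical values of r round to .18 and .29.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha, power):
    """Smallest correlation detectable with the given two-tailed alpha and power,
    using the Fisher z approximation (standard error = 1 / sqrt(n - 3))."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return tanh((z_alpha + z_power) / sqrt(n - 3))


def critical_r(n, alpha):
    """Absolute correlation at which a two-tailed test becomes significant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_alpha / sqrt(n - 3))


for a in (0.05, 0.001):
    print(a, round(min_detectable_r(125, a, 0.80), 2), round(critical_r(125, a), 2))
```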
Data and material availability. The preregistration of the hypotheses, design, sampling and analysis plan, and the analysis scripts used for this paper are available online on OSF (https://osf.io/sp3wf). The anonymous dataset will be published on Figshare upon publication.

The Accuracy of Social Media Use Measures
We explored if there were any differences between social media platforms in the accuracy of retrospective surveys and ESM (RQ4a; see Table 1) by comparing the difference scores between platforms. With regard to estimates of social media use in a typical week, no significant differences between platforms were found. However, for retrospective survey estimates of social media use in the previous week, we did find significant differences between platforms.
Adolescents' retrospective estimates of their time spent on Instagram (d = .19) and WhatsApp (d = .45) in the previous week were more accurate than their estimates of time spent on Snapchat (d = .66), as they yielded smaller differences with digital trace data. For ESM, adolescents were more accurate in estimating their time spent on Instagram (d = .67) than their time spent on WhatsApp (d = .96) and Snapchat (d = 1.23). Overall, these findings suggest that adolescents' retrospective survey and ESM estimates of their time spent on Instagram in the previous week were most accurate.

Fatigue versus learning effects in accuracy.
In order to test the learning (H1a) versus fatigue (H2a) effect hypotheses, we investigated whether adolescents' estimates of their social media use became less or more accurate over time (see Figure 1). For the accuracy of ESM and retrospective surveys of adolescents' total time spent on social media in the previous week, we did not find evidence for either a learning or fatigue effect. With regard to the fatigue effects per platform, we only found a fatigue effect for ESM estimates of Snapchat use (see Figure 1).

The Convergent Validity of Social Media Use Measures
To investigate the convergent validity of the self-report measures of time spent on social media, we calculated the between-person (RQ2) and within-person (RQ3) associations of the self-report estimates with digital trace data across the three-week ESM period (see Table 3).
Overall, the between-person convergent validity for all three types of measures ranged from r = .55 to r = .65. The within-person convergent validity of ESM was r = .32.
We also tested whether the between-person (RQ4b) and within-person (RQ4c) associations of retrospective surveys and ESM estimates with digital trace data differed between social media platforms. As Table 3 shows, the between-person convergent validity of retrospective surveys as well as ESM estimates was highest for Snapchat (r = .55 to r = .64).
Specific between-platform differences are presented in Table 3.

Fatigue versus learning effects in convergent validity.
To test our learning (H1b/H1c) and fatigue (H2b/H2c) effect hypotheses regarding the convergent validity of self-report measures, we examined whether the between-person and within-person associations of the self-report estimates with the digital trace data of time spent on social media became stronger or weaker over time (see Figure 2). With regard to the between-person convergent validity of the retrospective surveys, we did not find evidence for a learning or a fatigue effect. For the between- and within-person convergent validity of ESM, we found evidence for fatigue effects.
With regard to the fatigue effects per platform, we only found fatigue effects for Instagram (see Figure 2).

Sensitivity Analyses
As we were only able to track adolescents' app usage on their mobile phone, a preregistered sensitivity analysis was conducted to investigate whether the results would be affected after excluding adolescents who also used social media on other devices, such as their tablet computer or laptop. We therefore reran the main analyses while excluding adolescents who also used Instagram, WhatsApp, or Snapchat on other devices. This exclusion pertained to 15 adolescents for Instagram, 19 adolescents for WhatsApp, 13 adolescents for Snapchat, and 25 adolescents for total time spent on social media. Overall, the models in which these adolescents were excluded showed that adolescents' estimates were slightly more accurate (difference scores decreased by roughly 1 hour for retrospective surveys, with no difference for ESM; see online supplement 1 on OSF, https://osf.io/yp5vj) and had a somewhat stronger convergent validity (differences between the models in Pearson's r ranged from .02 to .03 for total time spent on social media; see online supplement 2 on OSF, https://osf.io/728qs).

Discussion
There is much academic and public debate about the effects of social media use on adolescents' psychosocial functioning. However, before we can establish whether (and how) adolescents are affected by their social media use, we need to know if the self-reports used to measure their time spent on social media are accurate (i.e., is there systematic under- or overestimation?) and valid (i.e., does the measurement method capture the construct it intends to measure?). In a sample of 125 adolescents with Android smartphones, we investigated the accuracy and convergent validity of their estimates of the time spent on social media according to retrospective surveys and ESM. We found that (1) adolescents overestimated their time spent on social media on all self-report measures; (2) their retrospective survey (previous week) and ESM estimates were more accurate for Instagram than for Snapchat and WhatsApp; (3) the between-person convergent validity of the various self-report measures reached the threshold for minimum acceptable convergent validity; (4) the within-person convergent validity for ESM was unacceptable; and (5) the convergent validity of the ESM estimates decreased over time (fatigue effect).

The Accuracy of Social Media Use Measures
Previous studies investigating the accuracy of time spent on social media for various self-report measures (i.e., retrospective surveys or ESM) found that adults overestimate their time spent on social media by about 6 hours per week. The current study extended these studies by investigating the accuracy of social media measures among adolescents. In line with findings from adult studies, we found that adolescents overestimated the time they spent on social media in both retrospective surveys (about 7 hours per week) and ESM (about 8 minutes per hour). Although on the higher end, this overestimation is comparable to the overestimation reported in adult samples, and disconfirms earlier findings that there is a difference between adolescents' and adults' accuracy in estimating their time spent on social media (Ernala et al., 2020). Our time estimates differed from those of Ernala et al. (2020) and Sewall et al. (2020) in that we asked adolescents first to estimate the number of days per week they used each social media platform, and second to estimate the number of hours and minutes they spent on each platform on these days, whereas Ernala et al. (2020) and Sewall et al. (2020) asked participants to estimate their average time spent on social media per day in the previous week. Asking participants to estimate their time spent on social media in a stepwise manner may reduce the cognitive load on adolescents, which may in turn enhance the accuracy of their estimates.
Since adolescents often use multiple social media platforms simultaneously (van Driel et al., 2019), it is important to investigate and compare the accuracy of time spent on different social media platforms. For retrospective survey (previous week) and ESM estimates, we found that adolescents were more accurate in estimating the time they spent on Instagram than on WhatsApp and Snapchat. These findings are in line with Naab et al. (2018), who found less discrepancy between retrospective survey and ESM time estimates for Facebook than for WhatsApp. These platform differences could be due to the fact that Instagram involves less fragmented use than WhatsApp and Snapchat. Adolescents typically use Instagram less frequently than Snapchat or WhatsApp, but when they use Instagram, they use it for a longer time (van Driel et al., 2019). Consequently, it may be easier for adolescents to accurately remember their time spent on Instagram than their time spent on WhatsApp and Snapchat.
The overall pattern of overestimation of the time spent on social media is especially important for descriptive studies designed to assess the average time spent on social media measured through self-reports. Such research needs to take into account that adolescents' reports of their average time spent on social media are typically higher than their objective digital trace data suggest. However, in social science, researchers are particularly interested in the correlation of time spent on social media with different outcomes (e.g., well-being). Consequently, the accuracy (i.e., differences in mean levels between self-report estimates and digital trace data) of these time estimations could be argued to be less important than their convergent validity (i.e., the association of self-report estimates with digital trace data) (Scharkow, 2016).

The Convergent Validity of Social Media Use Measures
We further extended previous adult studies by investigating the convergent validity of social media use measures among adolescents. Although the results differed among platforms, we found between-person associations around r = .60 between adolescents' total time spent on social media according to digital trace data with (a) their retrospective time spent in a typical week, (b) their retrospective time spent in the previous week, and (c) their ESM assessments of their time spent in the previous hour. This level of convergent validity of our self-report measures is in line with Junco (2013) and Sewall et al. (2020). The associations reach the threshold set by Carlson and Herdman (2012) for minimum acceptable convergent validity (r > .50). Our study yielded a higher convergent validity than the study of Ernala et al. (2020) (r = .24) and Burke et al. (2010) (r = .45), which used a different approach to measure adolescents' time spent on social media than the present study. Specifically, in the present study we measured adolescents' retrospective time spent on social media in a stepwise manner, whereas Ernala et al. (2020) and Burke et al.
Longitudinal research designs allow researchers to investigate the effects of social media on a between-person and a within-person level (e.g., Coyne et al., 2020;Valkenburg et al., 2021).
Whereas the effects of social media use on adolescents' psychosocial functioning used to be predominantly investigated on a between-person level, the number of studies that investigate the within-person effect of time spent on social media on psychosocial functioning is currently rapidly increasing (e.g., Beyens et al., 2020). To accurately interpret such within-person effects, assessing the convergent validity of social media use measures on a within-person level is essential. With regard to the within-person convergent validity, we found associations between adolescents' ESM estimates of time spent on social media and digital trace data around r = .30. These associations were considerably lower than the between-person correlation of ESM estimates with digital trace data (r = .55), and the within-person convergent validity could be considered unacceptable according to the criteria of Carlson and Herdman (2012).
An explanation for the difference in between- and within-person convergent validity of ESM measures of time spent with social media may be that adolescents sometimes use social media in a subconscious way (e.g., when waiting for a bus), making it difficult to remember their exact time spent on social media in the previous hour (Griffioen et al., 2020;Heitmayer & Lahlou, 2020). As a result, adolescents may underestimate their social media use at certain measurement occasions, overestimate it at other occasions, and correctly estimate it at yet other occasions. Such momentary differences in accuracy could result in unacceptable within-person convergent validity. However, under- and overestimations may cancel each other out when computing the average person-mean scores across occasions, resulting in a relatively high between-person convergent validity. The unacceptable within-person convergent validity of ESM estimates implies that researchers should consider including additional measurements (e.g., digital trace data) when focusing on within-person effects of time spent on social media.

Comparing the Accuracy and Convergent Validity of Social Media Use Measures
Although we expected that the accuracy and between-person convergent validity would be higher for adolescents' ESM estimates of their time spent on social media than for their retrospective surveys due to a reduction of recall bias (Naab et al., 2018;Underwood et al., 2018), we found that ESM estimates were less accurate than retrospective survey estimates, and that the convergent validity of retrospective surveys and ESM was more or less equivalent. When planning a study, researchers should therefore consider whether their research questions justify the potential physical burden (e.g., the high number of questionnaires) of an ESM design, given that our ESM estimates did not lead to a higher accuracy or convergent validity of adolescents' average time estimates across the three-week ESM period than our retrospective surveys did.
However, this does not imply that retrospective survey designs should be preferred over ESM designs by default. After all, certain research questions, such as the examination of dynamic processes over short(er) timeframes (i.e., to study within-person processes), can only be answered through digital trace or ESM data.
The accuracy and convergent validity of retrospective survey and ESM estimates of time spent on social media also differed between platforms. Notably, Snapchat yielded the lowest accuracy, but the highest (between-person) convergent validity. The strongly ephemeral nature of Snapchat might explain this discrepancy. Snapchat involves the sharing of photos and videos which expire after viewing (e.g., after 10 seconds). Since the photos or videos are often highly contextualized and made in-the-moment, adolescents might find it more difficult to remember the exact time they spent on Snapchat, resulting in systematic overestimation and a low accuracy among the majority of adolescents (Bayer et al., 2016;Vaterlaus et al., 2016). Although adolescents may be inaccurate in estimating their absolute amount of time spent on social media, their estimated time spent on social media may still be an accurate reflection of whether they spent relatively much or little time on Snapchat as compared to their peers, given that almost all adolescents systematically overestimated their time spent on social media. Consequently, adolescents who reported more time on Snapchat than their peers in retrospective surveys or ESM also spent more time on Snapchat according to digital trace data, resulting in a high between-person convergent validity.
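A small numerical illustration may clarify how low accuracy and high between-person convergent validity can coexist. In the hypothetical example below, every adolescent overestimates by five to seven hours per week, so the mean difference score is large, yet the ordering of adolescents is almost perfectly preserved and the correlation with the trace data remains high. The values are invented for illustration and are not taken from our data.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical hours per week on one platform for five adolescents.
trace = [2, 4, 6, 8, 10]
self_report = [7, 10, 11, 15, 16]  # each adolescent overestimates by 5 to 7 hours

mean_difference = mean(s - t for s, t in zip(self_report, trace))
r = pearson_r(self_report, trace)

# Low accuracy (large mean difference) combined with high convergent validity.
print(mean_difference, round(r, 2))
```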

Fatigue Versus Learning Effects of Accuracy and Convergent Validity
Our study found that the convergent validity of ESM estimates of time spent on social media decreased over time, supporting a fatigue effect. A possible explanation for this fatigue effect may lie in changes in the perceived burden of filling out ESM surveys among adolescents.
Answering the same questions six times per day may not burden adolescents in the first week of the study, but may do so in the second or final week of the study (Reynolds et al., 2016;Savage & Waldman, 2008), leading to more incorrect time estimations and, thus, lower convergent validity. When designing an ESM study, it is therefore important to reflect on the maximum duration of the study and the number of measurement occasions that are necessary to investigate the study's research questions or hypotheses (Eisele et al., 2020).

Strengths, Limitations, and Future Research
This study has several strengths. First, our intensive longitudinal design allowed us to disentangle the between-person and the within-person convergent validity of adolescents' time spent on social media according to ESM. Second, we were able to compare the accuracy and convergent validity of time spent on social media for three different platforms (i.e., Instagram, WhatsApp, and Snapchat). Third, our data allowed us to compare the accuracy and convergent validity of both adolescents' ESM time estimates and retrospective survey time estimates within a single study across a three-week time span.
This study also has some limitations. When digital trace methods gained prominence in media research, researchers initially viewed people's digital trace data as the 'gold standard' against which retrospective self-reports of phone, internet, and social media use could be evaluated (e.g., Araujo et al., 2017;Tokola et al., 2008). However, more recent studies suggest that digital trace methods also have their own limitations, most notably technical errors (e.g., crashes and bugs) and their erroneous tracing of social media apps running in the background (Deng et al., 2019). Another limitation is that the Ethica Data app we used to collect digital trace data only enables tracing of adolescents' app usage on Android phones and not on iPhones.
A final limitation of digital trace data is that many platforms also have a web-based application, which can be used without accessing the phone-based apps. These alternative access possibilities to platforms may endanger the accuracy and convergent validity of phone-based estimates of time spent on these platforms. Fortunately, in our study we were able to control for this possible confound with our sensitivity analyses, in which we compared the estimates of adolescents who used only the phone-based apps with those who used both the phone-based and web-based versions of the platforms. We found slightly more accurate and valid estimates for adolescents who only used social media via their phones, but these estimates were not strikingly different for adolescents who used both the phone- and web-based applications. Future studies should extend our study by comparing different sources of objective digital trace data with each other (e.g., iOS screen time data or social media archival data donations) (Boeschoten et al., 2020;Ohme et al., 2020).
In future studies, researchers should control for possible unacceptable convergent validity of measures of time spent on social media by including both self-report and digital trace measures in their study (as suggested by Carlson & Herdman, 2012). Including digital trace data is especially important when studies aim to investigate the within-person effects of social media use on adolescents. If including multiple measures is financially difficult or otherwise impossible, researchers may need to consider which measures fit best with their research questions. For instance, self-report data of time spent on social media are more in line with the theoretical notion of perceived time spent on social media than with actual time spent on social media. If researchers are more interested in perceived rather than actual use, self-report data are more in line with the goal of the study than digital trace data. More specifically, it has been argued that ESM is especially important to assess the cognitive and emotional aspects (e.g., how does someone interpret the valence of received feedback on social media) of adolescents' social media use (Griffioen et al., 2020). Finally, another important avenue for future research is to investigate how subjective (e.g., self-report measures) and objective data (e.g., digital trace data) agree or disagree in their prediction of various cognitive, affective, and behavioral outcomes of social media use (Carlson & Herdman, 2012).
In conclusion, our study found that adolescents overestimated their time spent on social media according to retrospective survey and ESM estimates. The between-person convergent validity reached the threshold for minimum acceptable convergent validity, whereas the within-person convergent validity of ESM was unacceptable. Although it has been suggested that ESM estimates are more accurate and valid than retrospective surveys to measure people's time spent on social media because it reduces people's recall bias (Naab et al., 2018;Underwood et al., 2018), we did not find evidence for this claim. Consequently, researchers should consider combining subjective ESM estimates with more objective digital trace data sources of time spent on social media to obtain a true understanding of how social media use affects adolescents' psychosocial functioning.

Table 1
The Accuracy of Retrospective Survey Estimates (Typical & Previous Week) and ESM Estimates of Time Spent Using Social Media
Note. RS = Retrospective surveys; ESM = Experience Sampling Methodology; d = Cohen's d; Mdiff = Average difference score between each self-report measure and digital trace data; Moverestimation = Average overestimation: difference scores of <= 0 are excluded; Munderestimation = Average underestimation: difference scores of >= 0 are excluded. Total Social Media is the sum score across the three platforms. Means are averaged across the three-week ESM wave. A difference score of 0 indicates perfect accuracy; a difference score higher than 0 indicates overestimation; a difference score lower than 0 indicates underestimation. Mean difference scores within columns that do not share the same superscript are significantly different between platforms in a t-test (p < .004; α corrected for 12 tests). Means and standard deviations represent the untransformed winsorized scores. T-tests and effect sizes are based on the log-transformed winsorized values.

Figure 1
Note. Total Social Media is the sum across the three platforms. Depicted means are the mean difference scores (i.e., difference score between each self-report measure and digital trace data) per week of the ESM period. A difference score of 0 indicates perfect accuracy; a difference score higher than 0 indicates overestimation; a difference score lower than 0 indicates underestimation. Per platform and self-report measure, means were calculated based on a subsample of adolescents who estimated their social media use during all three weeks of the study period. Means were calculated on the untransformed winsorized scores. * p < .002 in a t-test (α corrected for 24 tests).

Table 3
.57 *** .64 ***a .55 ***a .25 ***
Total Social Media .59 *** .65 *** .55 *** .32 ***
Note. RS = Retrospective surveys; ESM = Experience Sampling Methodology. Total Social Media is the sum across the three platforms. Correlations within columns that do not share an identical superscript are statistically different in a z-test (p < .004; α corrected for 12 tests). Between-person correlations are based on person-mean scores aggregated across the three-week ESM period. Correlations are based on the log-transformed winsorized values. * p < .05; ** p < .01; *** p < .001.

Figure 2
Note. Correlation between the self-report measure and digital trace data in the first, second, or third week of the study. Total Social Media is the sum across three platforms. Per platform and self-report measure, correlations with digital trace data were calculated based on a subsample of adolescents who estimated their social media use during all three weeks of the study period. Correlations are based on the log-transformed winsorized values. * p < .001 in a z-test (α corrected for 36 tests).