A win-win situation: Does familiarity with a social robot modulate feedback monitoring and learning?

Social species rely on the ability to modulate feedback-monitoring in social contexts to adjust their actions and obtain desired outcomes. When being awarded positive outcomes during a gambling task, feedback-monitoring is attenuated when strangers are rewarded, as less value is assigned to the awarded outcome. This difference in feedback-monitoring can be indexed by an event-related potential (ERP) component known as the Reward Positivity (RewP), whose amplitude is enhanced when receiving positive feedback. While the degree of familiarity influences the RewP, little is known about how the RewP and reinforcement learning are affected when gambling on behalf of familiar versus nonfamiliar agents, such as robots. This question becomes increasingly important given that robots may be used as teachers and/or social companions in the near future, with whom children and adults will interact for short or long periods of time. In the present study, we examined whether feedback-monitoring when gambling on behalf of oneself compared with a robot is impacted by whether participants have familiarized themselves with the robot before the task. We expected enhanced RewP amplitude for self versus other for those who did not familiarize with the robot and that self–other differences in the RewP would be attenuated for those who familiarized with the robot. Instead, we observed that the RewP was larger when familiarization with the robot occurred, which corresponded to overall worse learning outcomes. We additionally observed an enhanced P3 effect for the high-familiarity condition, which suggests an increased motivation to reward. These findings suggest that familiarization with robots may cause a positive motivational effect, which positively affects RewP amplitudes, but interferes with learning.


Impact statement
With an increase in the use of robots in our lives, it is important to understand how robots impact our cognition and behavior. The present study compares the dynamics of outcome monitoring when people gamble on behalf of themselves or on behalf of robots. We demonstrate that interacting with social robots can influence how individuals process rewards during a gambling task.

Introduction
As social creatures, a considerable part of our lives revolves around interactions with other humans. However, due to the increased availability of artificially intelligent agents in modern societies, our future social interactions will likely expand to nonhuman agents, such as virtual avatars or robots (Wiese et al., 2017). In fact, robots have already been implemented as social assistants for elderly care to increase emotional comfort (Birks et al., 2016; Tapus et al., 2007), in therapeutic settings with children with autism spectrum disorder to practice social-cognitive skills (Bekele et al., 2014; Warren et al., 2015), as well as in rehabilitation settings to improve sensorimotor skills (Basteris et al., 2014). Nonetheless, despite considerable progress in equipping artificial agents with social capabilities, they are still limited in their ability to interact with humans in a natural way (Wiese et al., 2017), and the public remains skeptical concerning the introduction of robot assistants to everyday life (Bartneck & Reichenbach, 2005). Specifically, the use of robots as teachers or companions for children has been controversial due to concerns about privacy, reduced interest in human-human interaction due to attachment to robot companions, and negative impacts on learning and social development (Sharkey, 2016). While these concerns should be taken seriously, there is a lack of empirical studies systematically examining the impact of robots on social-cognitive processing, development and/or wellbeing. Because robots will assume social roles in society and share our environments with us in the future, it is essential to understand how to design them so that they can engage in social interactions by activating relevant social schemes, behaviors, and emotional reactions without negatively impacting human-human interactions and social-cognitive development.
A fundamental part of human interactions revolves around monitoring the behavior of others, and adjusting our behaviors accordingly to ensure a successful exchange of knowledge, affiliation, and support (Insel & Fernald, 2004). In order to be able to adapt our behavior to ever-changing environments and learn from previous experiences, we heavily rely on feedback from others to be able to tell the difference between behavior that is appropriate and behavior that is not. Receiving positive feedback (e.g., smile) positively reinforces a given behavior and increases the likelihood that it will be shown again in the future; negative feedback (e.g., frown) negatively reinforces a given behavior and decreases the likelihood that it will be shown again in future interactions (Krigolson et al., 2009). This suggests that how we learn in social contexts is a consequence of how we process (i.e., feedback monitoring) and are reinforced by (i.e., reinforcement learning) feedback in the presence of others (Hobson & Inzlicht, 2016). While the effect of human presence on feedback processing is relatively well understood, feedback processing in the presence of robot agents has not been examined. This question is of particular importance given that robots are already used (and will be even more so in the future; Rahwan et al., 2019) in educational and therapeutic settings where a reduced response to feedback could have measurable negative effects on learning outcomes. To address this issue, we examined how feedback is processed in the presence of robot agents and, in particular, whether the degree of experience with a robot, based on prior interactions, modulates reward-related processes.
Given the importance of reward processing for engaging in satisfying social interactions (Chevallier et al., 2012), as well as learning through social reinforcement in the presence of others (Krigolson et al., 2009), it is surprising that the impact of familiarity on reward-related processes has not yet been examined with social robots. As a social species, we monitor positive and negative outcomes and/or feedback from others to calibrate our behaviors so that the likelihood for desired outcomes is optimized (Holroyd & Coles, 2002). Feedback-based adaptation can be examined electrophysiologically via event-related potential (ERP) components, such as the Reward Positivity (RewP; Holroyd et al., 2008), a component that is sensitive to rewards and appears to provide a reinforcement learning signal (Holroyd & Coles, 2002). The RewP peaks between 200 and 300 ms after the presentation of feedback and is believed to originate from the dorsal anterior cingulate cortex (dACC; Holroyd & Coles, 2002; Botvinick et al., 2004). One paradigm that has traditionally been used and adapted to examine feedback-monitoring is the gambling task (Gehring & Willoughby, 2002). In one variation, participants are shown two differently colored squares (of which one is associated with a higher chance of winning) and are asked over a consecutive sequence of trials to pick the square that is associated with a higher chance of winning, followed by positive ("Win") or negative ("Lose") feedback. Previous studies using the gambling task have shown that greater RewP amplitudes are observed on "Win" versus "Lose" trials, as well as when gambling for oneself ("Self") versus another person ("Other"; e.g., Hassall et al., 2016; Krigolson et al., 2013).
Specifically, when strangers are the recipients of winning outcomes, individuals experience attenuation of reward processing compared with when gambling for themselves, because they assign less value to the winning outcome (Hassall et al., 2016;Krigolson et al., 2013).
Interestingly, the brain's ability to monitor feedback in the presence of others is altered by the social context in which a task is performed, as well as the social identity of the other entity. For instance, feedback monitoring has been modulated when a person's performance has a direct impact on their partner's performance (Koban et al., 2012), when the interaction is cooperative versus competitive (Radke et al., 2011; Van Meel & Van Heijningen, 2010) or when the outcome of one partner's performance causes negative circumstances for the other partner, such as experiencing pain (Koban et al., 2013). Critically, previous studies have demonstrated that the mere presence of others during the delivery of rewarding feedback can modulate feedback-monitoring (Simon et al., 2014). The presence of a social in-group member compared with an out-group member can attenuate the self-other difference in reward processing (the motivation to win is higher when a similar other is present; Hobson & Inzlicht, 2016), and reward processing is enhanced when observing a friend versus a stranger perform a gambling task (Leng & Zhou, 2010). These findings indicate that the extent to which feedback is rewarding in social situations strongly depends on contextual factors related to the situation and/or the interaction partner.

Goal of study
Given the important role of feedback processing for learning during social interactions, as well as the need for robotic agents to be perceived as social interaction partners in the future, it is essential to understand to what extent humans employ feedback-monitoring processes during human-robot interaction. While reward processing in response to gambling outcomes seems to be influenced by the degree of familiarity between the player and the recipient of a reward in human-human interaction (Leng & Zhou, 2010), little is known about how the RewP is influenced in social contexts where humans familiarize themselves with social robots. In the present study, we used the gambling task to examine how participants process reward in the presence of a social robot and what role the degree of familiarity with the robot plays in reward processing. We used the socially evocative robot Cozmo, which allows participants to engage in social interaction games via a mobile app. During the interaction, Cozmo reacts by displaying positive/negative emotions when winning/losing against the participant. Cozmo was chosen for this experiment because it was designed to be engaging and to promote positive social interactions; it also is a commonly used robot platform when examining social cognition (e.g., empathy, emotion recognition, joint action, social learning, trust) in interactions with mechanistic agents (Chaudhury et al., 2020; Cross et al., 2019; Hinz et al., 2021; Lefkeli et al., 2020; Pelikan et al., 2020; Zhou & Tian, 2020). For a tutorial on how to use Cozmo as a research platform in everyday interactions, see Chaudhury et al. (2020).
Because familiarity with social agents has been associated with enhanced reward valuation, we hypothesized that participants who familiarized with Cozmo before the task would value rewards for themselves similarly to Cozmo (i.e., no self-other difference in RewP for the high familiarity group), whereas participants who did not familiarize with Cozmo would value rewards that affect Cozmo to a lesser extent compared to rewards that affect themselves (i.e., a self-other difference in RewP for the low familiarity group).

Methods and Materials
Participants
Forty participants were recruited from George Mason University's undergraduate population (mean age = 21.88, range = 18-55, 27 females) in exchange for course credit.
Five participants were removed from data analysis due to not following the task protocol (n = 3) or technical difficulties with the robot (n = 2). Two participants had corrupted behavioral data files and were excluded from the behavioral analysis, but not the ERP analysis. All subjects were right-handed, had normal or corrected-to-normal vision, and reported no known neurological deficits, drug intake, or color blindness. Subjects were pseudo-randomized into either the high-familiarity condition (n = 17) or the low-familiarity condition (n = 18). All data handling and collection were in accordance with the guidelines of George Mason University's ethics board. Based on prior work in ERP methods and feedback monitoring (Hobson & Inzlicht, 2016), we ran a power analysis using G*Power for a repeated-measures mixed ANOVA, assuming a medium effect size (f = 0.25), an alpha of 0.05, and power of 0.8. The power analysis suggested a total sample of 34 participants. All data and power analysis results were uploaded to OSF (EEG data were excluded due to file size constraints): osf.io/m685p/.

Apparatus
Participants interacted with Cozmo (Anki, CA), a small tank-style social robot. The Cozmo robot comes with three interaction cubes that a participant can use to interact with Cozmo (e.g., tapping or moving the cubes). All interactions with Cozmo were preprogrammed using the Cozmo mobile app. A picture of the robot can be found on the OSF page. The gambling task was programmed and presented using the MATLAB programming environment (The Mathworks, Natick, MA), functions from Psychtoolbox (Brainard, 1997), as well as other custom scripts and functions. Behavioral data were analyzed using R (version 3.6) with the lme4 package (Bates et al., 2014).

Procedure
After participants provided consent, the researcher administered the Snellen and Rosenbaum visual acuity tests, as well as the Ishihara color blindness test. Participants were then fitted with an EEG cap while they completed the questionnaires. Participants were given a cover story and informed that the experiment was a collaborative effort between the Psychology department and the Engineering department. They were told that the Engineering department was trying to decide which of its robots to upgrade and that this experiment would help inform that decision. The level of familiarity with the robot was manipulated through prior interaction, such that one group of participants was given the opportunity to interact with Cozmo for 20 minutes before engaging in the gambling task, whereas the other group of participants played another interactive game (the Simon game) for 20 minutes before engaging in the gambling task. After this playtime, Cozmo was placed underneath the computer screen for both groups (after being briefly introduced to the participants in the Simon group), where it sat still but made occasional eye blinks while participants completed the gambling task.
Participants were instructed that they would be gambling for themselves for a chance to win a gift card on half the blocks, while gambling for new upgrades for Cozmo (e.g., batteries, new tank, etc.) during the other half of the blocks. They also were instructed that the likelihoods for each block were completely independent. Participants engaged in the gambling task with an equal number of trials being "Self" (i.e., gamble for oneself) versus "Other" (i.e., gamble for Cozmo). At the mid-point of the experiment (i.e., after 16 blocks), participants completed the same interaction task again based on the condition that they were assigned (i.e., if they were in the high-familiarity condition, they interacted with Cozmo, whereas if they were in the low-familiarity condition, they played the Simon game) and then engaged in another 16 blocks of the gambling task. The RewP amplitudes were compared as an index of reward valuation.

Interaction Task
Participants were pseudo-randomized into either the high-familiarity condition or the low-familiarity condition. In the high-familiarity condition, participants interacted with Cozmo, a tank robot, and its cubes. The interaction task consisted of two pre-programmed games: Keep Away and Quick Tap. The objective of Keep Away was to hold one of Cozmo's cubes and push it towards Cozmo while Cozmo tried to tap the top of the cube. If participants pulled the cube away from Cozmo before it tapped the cube, the participant earned a point. If Cozmo tapped the cube first, it gained a point. The objective of Quick Tap was to play a color matching game with Cozmo in which one cube was placed in front of the participant and one cube was placed in front of Cozmo. Next, both cubes would light up with a color. If the two cubes' colors matched, participants were to tap the cube before Cozmo did (i.e., similar to a go trial in a go/no-go task). However, if the colors matched and the colors were red, then neither the participant nor Cozmo was to tap the cube (i.e., a no-go trial). After each trial, a point was given to whoever tapped their cube first (with the exception of the no-go trial, where the other player won the point if the cube was tapped on red). The order of the two tasks was counterbalanced between participants. Regardless of the outcome of the two tasks, participants were always told that their and Cozmo's scores were close, but that they won. In the low-familiarity condition, participants played the traditional Simon Says game using an electronic device placed in front of them. The device played a random series of tones and lights, and the participant had to imitate the same sequence by tapping the lights in the correct order. The Simon Says game was chosen due to its high similarity with the Quick Tap game with Cozmo.

Gambling task
Participants completed a gambling task on behalf of themselves and Cozmo. During the gambling task, participants were instructed to determine, by trial and error, which of two colored squares produced a "winning" outcome more often. Each trial started with a central fixation cross, which was presented for 500 ms. Two differently colored squares then appeared on either side of the fixation cross; the colors were randomly generated by the experiment program (i.e., to maximize the likelihood of presenting different color combinations for each block). One color had a winning probability of 60% (i.e., high winning probability), and the other color had a 10% winning probability (i.e., low winning probability). Although the color scheme was randomly selected for each block, the two squares always appeared in complementary colors (e.g., orange and blue), and the same color scheme was used over the course of the entire block. After the colored squares had been presented for 500 ms, the fixation cross changed color from black to grey to indicate to participants that they should choose the color that they thought was associated with the higher winning probability. The purpose of the color change was to encourage participants to contemplate the odds of the reward before responding. If the participant selected a square before the fixation cross changed color, that trial was removed from all analyses. The squares remained on the screen until participants responded with either the "2" key (with their left index finger) to select the color on the left or the "8" key (with their right index finger) to select the color on the right. Each colored square was equally likely to appear on the left or right side of the screen. After participants responded, feedback was presented for 1,000 ms to inform them about the gambling outcome (i.e., "win" or "lose").
The inter-trial interval was jittered between 400-600 ms (Fig. 1).
Participants completed 32 blocks of the gambling task, each consisting of 20 trials. Before the start of each block, a screen was presented to indicate whether participants were gambling on behalf of themselves ("Self") or on behalf of Cozmo ("Cozmo"). In essence, subjects completed 16 blocks in which they gambled for themselves and 16 blocks in which they gambled for Cozmo, which equates to 640 trials overall. The order of "Self" and "Cozmo" blocks was counterbalanced across participants.
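To make the trial structure concrete, the following Python sketch generates one simulated session with the parameters stated above (32 blocks of 20 trials, win probabilities of 60% and 10%, equiprobable side assignment, and a 400-600 ms jittered inter-trial interval). It is only an illustration and not the MATLAB/Psychtoolbox code used in the study: the function names, the random block order (a stand-in for counterbalancing), and the randomly guessing "participant" are all assumptions.

```python
import random

def make_block_schedule(n_blocks=32, seed=0):
    """Equal numbers of 'Self' and 'Cozmo' blocks in a shuffled order.
    Shuffling is an assumed stand-in for the study's counterbalancing."""
    rng = random.Random(seed)
    owners = ["Self"] * (n_blocks // 2) + ["Cozmo"] * (n_blocks // 2)
    rng.shuffle(owners)
    return owners

def simulate_feedback(choice_is_high, rng, p_high=0.60, p_low=0.10):
    """Draw 'win' or 'lose' according to the chosen square's win probability."""
    p_win = p_high if choice_is_high else p_low
    return "win" if rng.random() < p_win else "lose"

def simulate_block(owner, rng, n_trials=20):
    """Simulate one block with a hypothetical participant who guesses at random."""
    trials = []
    for trial in range(1, n_trials + 1):
        high_on_left = rng.random() < 0.5   # each side is equally likely
        chose_left = rng.random() < 0.5     # placeholder choice rule
        choice_is_high = chose_left == high_on_left
        trials.append({
            "owner": owner,
            "trial": trial,
            "choice_is_high": choice_is_high,
            "feedback": simulate_feedback(choice_is_high, rng),
            "iti_ms": rng.uniform(400, 600),  # jittered inter-trial interval
        })
    return trials

rng = random.Random(1)
schedule = make_block_schedule()
session = [t for owner in schedule for t in simulate_block(owner, rng)]
print(len(session))  # 32 blocks x 20 trials = 640
```

Replacing the placeholder choice rule with a learning rule would allow such a simulation to produce learning curves, but that goes beyond what the task description specifies.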

Electroencephalogram data recording and processing
The electroencephalogram (EEG) was recorded using a Neuroscan NuAmps amplifier and SCAN 4.01 software (Compumedics, North Carolina, USA). EEG data were collected from 32 scalp sites (extended 10-20 system) using Ag/AgCl electrodes mounted in an elastic cap. Ag/AgCl electrodes also were placed at the left supraorbital and suborbital sites, as well as the left and right outer canthal sites to monitor vertical and horizontal electro-oculographic (EOG) activity, respectively. All scalp electrodes were referenced to the left mastoid (A1) online and re-referenced to the average of the left and right (A2) mastoid offline. The in-cap ground electrode was positioned just anterior to electrode Fz. EEG data were collected at a sampling rate of 500 Hz and were filtered online using a 0.1-Hz high-pass filter and a 70-Hz low-pass filter. Impedance for all electrodes was maintained below 5 kilo-Ohms throughout the duration of the recording session.
EEG data were filtered offline using a 30-Hz low-pass filter and then subjected to independent components analysis (ICA) using Brain Vision Analyzer to identify and reject components corresponding to blinks and saccades. Data were then exported to EEGLAB (Delorme & Makeig, 2004), an EEG processing toolbox for MATLAB, for all remaining processing steps. Data were epoched from 200 ms before feedback presentation to 800 ms following feedback, then subjected to an automated amplitude rejection threshold of ±100 microvolts and a spectral rejection threshold of 50 dB (20-40 Hz bandwidth) using the pop_rejspec function to remove EMG-like activity. If more than 20% of trials were rejected for a given channel, then that channel was removed from the dataset. Channels that were removed were interpolated using spherical spline interpolation. Epochs were baseline corrected using a window spanning −200 ms to 0 ms relative to feedback presentation.
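The epoching, amplitude rejection, and baseline correction steps can be sketched for a single channel as follows. This is a simplified Python illustration rather than the EEGLAB pipeline used in the study: it omits filtering, ICA, spectral rejection, and channel interpolation, and the function names and the order of rejection before baseline correction (taken from the order of the description above) are assumptions.

```python
SFREQ = 500  # sampling rate in Hz, as reported for the recording

def ms_to_samples(ms, sfreq=SFREQ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * sfreq / 1000))

def extract_epoch(signal, event_idx, tmin_ms=-200, tmax_ms=800):
    """Cut one epoch around a feedback event; None if it runs off the recording."""
    start = event_idx + ms_to_samples(tmin_ms)
    stop = event_idx + ms_to_samples(tmax_ms)
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]

def exceeds_threshold(epoch, limit_uv=100.0):
    """True if any sample exceeds the +/-100 microvolt amplitude criterion."""
    return any(abs(v) > limit_uv for v in epoch)

def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the pre-feedback interval from every sample."""
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [v - baseline for v in epoch]

def clean_epochs(signal, event_indices):
    """Epoch, reject by amplitude, then baseline correct (-200 to 0 ms)."""
    n_baseline = ms_to_samples(200)
    kept = []
    for idx in event_indices:
        epoch = extract_epoch(signal, idx)
        if epoch is None or exceeds_threshold(epoch):
            continue
        kept.append(baseline_correct(epoch, n_baseline))
    return kept
```

At 500 Hz, each retained epoch contains 500 samples, of which the first 100 form the baseline interval.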

Questionnaires
Previous research has shown that motivational control is related to reward processes, which also is related to reward-associated ERPs (Santesso et al., 2011). Specifically, trait-based factors can influence if participants perceived outcomes as rewarding or punishing (Santesso et al., 2011). Therefore, we used the Gray's Sensitivity to Reward Questionnaire (SPSRQ; Torrubia et al., 2001) to ensure that any differences observed in reward processing were not related to trait-related differences. The SPSRQ also was used to ensure that the two familiarization conditions were similar in reward sensitivity.

Fig. 1 Gambling task trial sequence. Participants were to complete a gambling task on behalf of themselves or Cozmo. The gambling task presented two colors in each block with one color having a higher probability of winning (i.e., 60% chance of winning). After making a choice, feedback was presented regarding whether they won or lost. The ERPs were locked to the feedback.
In addition, a handedness questionnaire was administered to determine subjects' preferences in handedness when completing different activities and to ensure that subjects were right-handed (e.g., which hand do you use when you write, use a spoon, etc.).

Electrophysiological data
The RewP was characterized as a difference wave of feedback-locked EEG in which losses were subtracted from wins, which was computed separately for each experimental condition. Statistical comparisons were based on data derived from electrode FCz, where the RewP was found to be of maximal amplitude. Mean amplitudes were computed using a 60-ms window (270-330 ms) centered on the peak latency of the grand average difference wave. RewP amplitudes were statistically compared using a 2 x 2 repeated measures mixed ANOVA with Ownership (Self vs. Cozmo) as a within factor and Familiarity (High vs. Low) as a between factor. Post-hoc t-tests were used to follow up any significant interactions. We restricted our analysis to the difference wave between positive and negative feedback because prior studies suggest that the modulation in feedback-related negativity for positive and negative feedback is mainly driven by reward-related mental processes (Holroyd et al., 2008, 2011; Miltner et al., 1997).
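The difference-wave and mean-amplitude computation described above can be illustrated with a short Python sketch for one participant, one condition, and one electrode. The function names are hypothetical, and treating the 270-330 ms window as inclusive at both ends is an assumption; the study's own computation was carried out in MATLAB.

```python
def average_waveform(epochs):
    """Average a list of equal-length epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def difference_wave(win_epochs, loss_epochs):
    """Win-minus-loss difference wave from feedback-locked epochs."""
    win_avg = average_waveform(win_epochs)
    loss_avg = average_waveform(loss_epochs)
    return [w - l for w, l in zip(win_avg, loss_avg)]

def mean_amplitude(wave, times_ms, start_ms=270, end_ms=330):
    """Mean of the wave over an analysis window (inclusive bounds assumed)."""
    values = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    return sum(values) / len(values)

# Time axis for epochs from -200 ms to 798 ms at 500 Hz (2 ms per sample).
times_ms = [-200 + 2 * i for i in range(500)]
```

The same helper can be reused for the P3 by passing `start_ms=300, end_ms=480` and epochs from electrode Pz.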
We additionally examined the influence of familiarity and ownership on the P3 amplitude, which we derived from the Pz electrode, the site where the amplitude of the P3 was maximal in the grand average parent waveform. Although we chose to use the parent waveform for window selection, because a distinct P3 was not evident in the difference wave, mean amplitude values were collected for statistical comparison from the difference wave in a similar manner to the RewP. Statistical comparisons were based on a 180-ms window (300-480 ms). The larger P3 window is consistent with prior work that uses larger time windows for analyzing P3 amplitudes (Gajewski et al., 2008; Hilgard et al., 2014; Threadgill & Gable, 2020). P3 amplitudes were analyzed using a 2 x 2 repeated measures mixed ANOVA with Ownership (Self vs. Cozmo) as a within factor and Familiarity (High vs. Low) as a between factor.

Behavioral data
To examine participants' performance in the gambling task, we were interested in how quickly participants learned the outcome of the gambling task (i.e., picking the correct color; correct vs. incorrect). This metric, as opposed to actual performance, allowed us to identify how quickly participants were learning while removing any influence of chance that is due to the probabilistic nature of the gambling task. To do this, we constructed conditional growth curves, which tracked each individual's choice (i.e., a dichotomous variable) on each trial for each of the different conditions using log-log mixed linear models. The first step in the growth curve analysis was to examine how the most basic model (i.e., the unconditional model) tracked participants' individual performance. This model contains whether a participant chose the correct color (i.e., the high-probability color) on each trial, regressed onto a single predictor, which is the log-log function. This model ignores any variance that is due to the experimental procedure (i.e., it does not account for the Ownership and Familiarity dummy variables). Next, we created a conditional growth curve model by including the factors of interest in the model, as well as their interaction with the log-log function, to predict whether participants chose the high-probability color. Once the conditional growth model was constructed, we compared it to the unconditional growth model using a nested model comparison to test whether the more complex model accounted for more variance than the unconditional model (i.e., the more parsimonious model). Because nested model comparisons can favor complex models if they account for more variance in the data, we also examined fit indices of the models, which weigh variance explained against parsimony. We used the Bayesian Information Criterion (BIC), because it generally favors parsimony over variance explained (Konishi & Kitagawa, 2008).
Once a model of best fit was determined, we examined the individual predictors in that model to see if any of them predicted participants' learning performance. Specifically, we were interested in examining the interaction terms between the dummy variables (i.e., Ownership and Familiarity) and the log-log function, as they allow us to determine whether change over time differed depending on the factor. In essence, the growth curve model contained Ownership, Familiarity, the growth term, the 2-way interaction between the growth term and each dummy variable, and their 3-way interaction as regressors.
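As a brief illustration of the model comparison logic, the BIC is defined as k ln(n) − 2 ln(L), where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood; lower values indicate a better trade-off between fit and parsimony. The Python sketch below applies this definition to two models. The log-likelihoods and parameter counts are made-up placeholders rather than values from the study, the function names are hypothetical, and taking n as the number of trial-level observations is an assumption.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def prefer_by_bic(models, n_obs):
    """Return the name of the model with the lowest BIC and all BIC scores.
    `models` maps a model name to (log_likelihood, n_params)."""
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in models.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Placeholder values for illustration only (not results from the study).
candidates = {
    "unconditional": (-1000.0, 3),
    "conditional": (-950.0, 9),
}
best_model, bic_scores = prefer_by_bic(candidates, n_obs=1000)
print(best_model, bic_scores)
```

With these placeholder numbers, the gain in log-likelihood outweighs the penalty for the six additional parameters, so the conditional model is preferred; a smaller gain would tip the comparison toward the unconditional model.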

Questionnaire results
Results of the Welch two-sample t-tests did not indicate any differences between our two samples (i.e., high-familiarity vs. low-familiarity) in handedness (t(30.24) = −0.63, p = 0.53). There also were no significant differences in the BIS (t(33.9) = −1.11, p = 0.23), BAS reward (t(27.92) = −0.84, p = 0.4), BAS drive (t(29.64) = −0.85, p = 0.4), and BAS fun (t(31.91) = −0.89, p = 0.37), which indicates that the two samples were comparable in reward sensitivity. Cronbach's alpha showed acceptable reliability scores across all the reward sensitivity items (α = 0.89, 95% confidence interval (CI) [0.79, 0.93]). All degrees of freedom were adjusted for unequal variances in the two groups. Demographic data as well as a breakdown of the results of BIS-BAS by gender can be found in Table 1.
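The fractional degrees of freedom reported above follow from the Welch-Satterthwaite adjustment for unequal variances. The Python sketch below shows how the t statistic and adjusted degrees of freedom are obtained from two samples; the function name is hypothetical, and the p value is omitted because it requires the t distribution, which is not available in the Python standard library.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

When the two groups have equal sizes and equal variances, the adjusted degrees of freedom reduce to those of the classical two-sample t-test; otherwise they are smaller and generally not an integer.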

Behavioral results
The model of best fit revealed that the log-log function was a significant predictor (

Discussion
The goal of the present study was to examine the role of electrophysiological indices related to feedback processing associated with learning during social interactions with robots. We asked participants to either interact with Cozmo or perform a nonsocial task, then complete a learnable gambling paradigm in which they could gamble on behalf of themselves (i.e., Wins go to the participant) or Cozmo (i.e., Wins go to Cozmo). Because previous studies have proposed that social contexts have the ability to change feedback monitoring (e.g., feedback monitoring is altered when a social in-group member is present; Hobson & Inzlicht, 2016), thus influencing learning, we hypothesized that feedback-monitoring would be altered when subjects familiarize themselves with an animate robot, Cozmo, by interacting with it. Specifically, we hypothesized that participants who interacted with Cozmo would exhibit enhanced RewP amplitudes, an electrophysiological index associated with reward and reinforcement learning, when the outcome of the gambling task affected Cozmo compared with the group that did not interact with Cozmo. Additionally, we expected that participants who did not interact with Cozmo would show reduced RewP amplitudes when the outcome affected Cozmo in comparison to when the outcome affected themselves, as previous work has shown that outcomes affecting strangers were associated with inhibited feedback processing (Hassall et al., 2016). Prior work has suggested that motivational control is related to reward processes and that these processes can influence reward related ERPs (Santesso et al., 2011). Specifically, trait-based factors can influence how rewarding and punishing outcomes are perceived (Santesso et al., 2011). Therefore, we used the SPSRQ sensitivity to reward questionnaire to ensure that the differences that we find in RewP were due to our manipulation and not due to SPSRQ differences between familiarization conditions. 
The findings revealed that RewP amplitudes were enhanced for participants who interacted with the robot compared with subjects who did not. Additionally, contrary to our expectation, no differences were detected in RewP amplitudes when the outcome of the gambling task affected one's self or Cozmo for subjects who did not interact with Cozmo. Results of the behavioral data suggest that when participants familiarized themselves with Cozmo, they learned to discriminate which of the two response options (colored squares) would produce the "win" feedback at a slower rate compared with when they did not interact with Cozmo. Taken together, it is not surprising that slow learners (i.e., subjects who interacted with Cozmo) had larger RewP amplitudes, because they relied on the positive feedback signal to inform their following behavior due to the fact that the difference between the outcome of their behavior and the outcome of their desired behavior was large (Holroyd & Coles, 2002;Schultz, 2017), while fast learners (i.e., participants who did not interact with Cozmo) did not have to rely on the feedback as it contained information of little utility. This fast learning process diminished their physiological responses to the feedback stimulus as it carried less weight to inform future behavior. This interpretation is in line with Holroyd and Coles' (2002) theory of Reinforcement Learning (RL-ERN). The RL-ERN theory suggests that two types of prediction errors exist: a positive prediction error (i.e., outcomes that are better than our expected outcome) and a negative prediction error (i.e., outcomes that are worse than our expected outcome). It is believed that the RewP tracks this difference between our expectation and the actual outcome. While both prediction errors elicit a feedback response, fast learners have a diminished difference between their expectations and the actual outcome, thus resulting in smaller RewP amplitudes. 
Moreover, the reinforcement learning theory suggests that once fast learners learn a task, they place more weight on their own responses to inform their following behaviors as opposed to the feedback that they are provided (Holroyd & Coles, 2002;Krigolson et al., 2009).
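The prediction-error account above can be illustrated with a minimal sketch. It uses a simplified delta-rule update rather than the full model described by Holroyd and Coles (2002), and it is not a model fitted to the present data. The function name, the learning rates (0.5 and 0.1), and the number of trials are hypothetical values chosen for illustration only.

```python
def positive_prediction_errors(learning_rate, n_wins=20, initial_value=0.0):
    """Prediction error on each of n_wins consecutive 'win' outcomes.

    Simplified delta rule (an assumption for illustration):
        error = reward - expected_value
        expected_value += learning_rate * error
    """
    expected_value = initial_value
    errors = []
    for _ in range(n_wins):
        error = 1.0 - expected_value          # a win is coded as reward = 1.0
        errors.append(error)
        expected_value += learning_rate * error
    return errors


# Hypothetical learners: a higher learning rate stands in for "fast" learning.
fast = positive_prediction_errors(learning_rate=0.5)
slow = positive_prediction_errors(learning_rate=0.1)

mean_fast = sum(fast) / len(fast)
mean_slow = sum(slow) / len(slow)
print(f"mean positive prediction error, fast learner: {mean_fast:.3f}")
print(f"mean positive prediction error, slow learner: {mean_slow:.3f}")
```

Under these assumptions, the fast learner's expectation converges on the outcome within a few trials, so its average positive prediction error across the block is smaller than that of the slow learner. This mirrors the argument that feedback carries less information, and elicits a smaller RewP, once the task has been learned.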
We also observed that the P3 component was affected by our experimental manipulation in a manner that mimics the RewP. This raises the possibility that the observed effect for the latter could be attributed to component overlap. Although there is a parietal local maximum evident in the nonfamiliarity condition, it is distinct from the frontocentral maximum observed for the RewP. The P3 was quantified using an analysis window based on the parent waveforms, which suggests that P3 peak latency (354 ms) was ~50 ms later than that of the RewP (300 ms). Moreover, the topography of the P3 is distinctly parietal (in fact, for the low-familiarity condition, it was maximal at parietal and occipital sites). These findings suggest that the P3 is not the primary factor contributing to the differences observed for the RewP, although we cannot completely rule out this possibility. Nevertheless, the finding of a significant effect for the P3 has theoretical implications for the study (see below).

Fig. 2 Mean amplitudes of the RewP (at electrode FCz). The RewP is a difference wave between ERPs that are time-locked to the onset of Win / Loss feedback. The statistical analyses of RewP amplitudes were based on a time window of 270-330 ms, which was centered on the peak of the grand average waveform. The time-window is illustrated by the shaded region. The topographic plots (collapsing across Cozmo and Self gambling conditions) illustrate that the RewP is indeed maximal at electrode FCz. A graph of the parent waveforms can be found in the supplementary materials; see Figure S3. The raincloud plot illustrates the mean amplitudes, the individual data points as well as the distribution of the data. Error bars represent the standard error of the mean. Asterisks represent significance at the 0.05 level.

Fig. 3 Mean amplitudes of the P3 (at electrode Pz). Similar to the RewP, the P3 shown is a difference wave between ERPs that are time-locked to the onset of Win/Loss feedback. The statistical analyses were based on time-window of 300-480 ms and is illustrated by the shaded region. The topographic plots (collapsing across Cozmo and Self gambling conditions) illustrate that the P3 is indeed maximal at electrode Pz. A graph of the parent waveforms can be found in the supplementary materials; see Figure S4. Error bars represent the standard error of the mean. Asterisks represent significance at the 0.05 level.

Fig. 4 Full results of the conditional growth curve model. The conditional growth model revealed a significant difference in learning rates between the high-familiarity (i.e., left panel) and low-familiarity conditions (right panel). However, no differences in learning rates were detected when subjects gambled on behalf of themselves (i.e., gray) or Cozmo (i.e., yellow), regardless of the condition.
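The mean-amplitude measure described in the captions of Figs. 2 and 3 (a Win/Loss difference wave averaged over a fixed time window) can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used in the study. The waveforms are synthetic, the sampling grid is hypothetical, the function name is invented for this example, and the subtraction is assumed to be Win minus Loss. Only the 270-330 ms window is taken from the text.

```python
import numpy as np


def mean_difference_amplitude(times_ms, win_erp, loss_erp, window_ms):
    """Mean amplitude of the Win-minus-Loss difference wave in a time window.

    times_ms  : 1-D array of sample times relative to feedback onset (ms)
    win_erp   : 1-D array, average ERP after Win feedback (microvolts)
    loss_erp  : 1-D array, average ERP after Loss feedback (microvolts)
    window_ms : (start, end) of the analysis window, inclusive
    """
    difference_wave = win_erp - loss_erp          # assumed direction: Win - Loss
    start, end = window_ms
    in_window = (times_ms >= start) & (times_ms <= end)
    return float(difference_wave[in_window].mean())


# Synthetic single-electrode data (hypothetical values, for illustration only).
times_ms = np.arange(-200, 800, 4)                # one sample every 4 ms
bump = np.exp(-((times_ms - 300) / 40.0) ** 2)    # deflection peaking at 300 ms
win_erp = 5.0 * bump
loss_erp = 2.0 * bump

# RewP window reported in the text: 270-330 ms.
rewp = mean_difference_amplitude(times_ms, win_erp, loss_erp, (270, 330))
print(f"mean difference amplitude, 270-330 ms: {rewp:.2f} microvolts")
```

The same function could be applied with the 300-480 ms window to obtain a P3 measure. Because that window overlaps the RewP window between 300 and 330 ms, the two measures are not fully independent, which is one reason the possibility of component overlap is raised in the text.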
One possible reason why participants in the Cozmo group had enhanced RewP amplitudes is that social interactions can enhance intrinsic motivation (Tauer & Harackiewicz, 2004), which influences reward processing (Mace et al., 2017; Wilhelm et al., 2019). This claim is supported by the Optimizing Performance Through Intrinsic Motivation and Attentional Learning (OPTIMAL) theory of motor learning, which posits that motivational and attentional factors, which can be enhanced via social interactions and external factors, are capable of improving learning performance (Wilhelm et al., 2019; Wulf & Lewthwaite, 2016). In other words, participants who interacted with Cozmo may have been more intrinsically motivated to perform well on the task. Our finding of increased P3 amplitude in the high-familiarity condition is consistent with the suggestion that this condition was associated with increased motivation, given that a number of studies have linked the P3 to motivation for reward (Franken et al., 2011; Hughes et al., 2013; Yeung et al., 2004). However, this account does not explain why enhanced reward processing was associated with lower learning rates, and future work should investigate this question. Here we note that other work has suggested that an immersive and enriched environment can hinder performance. This hindrance is also associated with descriptive increases in the RewP (Lohse et al., 2020). This pattern is similar to what we observed and could be explained by participants who interacted with Cozmo having experienced social enrichment. Such enrichment could have increased arousal and positive affect (possibly partially explaining the increase in the RewP) while simultaneously reducing performance, as a result of the social component of the environment creating a distraction from the task.
The finding that, in the high-familiarity condition, an increase in amplitude of the P3 occurred in conjunction with an increase in the RewP suggests that generalized arousal (Luck, 2014;Rozenkrants & Polich, 2008) could explain the dissociation between RewP amplitude and behavioral performance. However, because the ERP waveforms did not diverge until relatively late in time, it remains possible that a more selective impact on reward processing associated with differing learning rates, as described above, is capable of explaining the relation between physiology and behavior. Similar research that has examined the use of social agents, avatars, and robots in learning environments has consistently found that these social agents can serve as distractors, which can hinder people's ability to learn (Kennedy et al., 2015;Momen et al., 2016;Yadollahi et al., 2018).
One remaining question is why participants did not value rewards more highly for themselves in comparison to Cozmo in the low-familiarity condition (i.e., no RewP or learning rate difference when gambling for Self vs. Cozmo in the no-interaction condition). One possible explanation is that the mere presence of an entity with somewhat social features might have been sufficient to increase participants' motivation to do well when gambling for the robot. This interpretation would be in line with previous studies on social facilitation showing that the mere presence of other human (Zajonc, 1965) or nonhuman (e.g., robots; Riether et al., 2012) entities positively impacts performance on easy tasks, but negatively impacts performance on difficult tasks, including gambling tasks (Lemoine & Roland-Lévy, 2017). In line with this interpretation, it has been suggested that the presence of a social agent could positively affect attentional processes to a similar extent as monetary rewards (Anderson, 2016), which could have been sufficiently motivating to equalize any differences in reward processing between Self and Cozmo conditions.
We acknowledge some limitations in the current study. While we discuss how interactions with a social agent can influence both performance and learning, others have suggested that a distinction should be made between the two processes (Cahill et al., 2001). In other words, while we make inferences about learning outcomes from participants' performance, we acknowledge that learning is a complex process that is affected by a wide array of factors. For example, the temporal dynamics of our manipulation could influence performance (i.e., does performance hold up after a long period of interaction with an artificial agent?). This point underscores the need to control for other variables in the experimental design. It also suggests that future studies should design experiments that are able to make claims about learning that occurs over long periods. Lastly, it is important to acknowledge that, as the interaction with Cozmo involved a competitive task, participants in the high-familiarity condition may have been predisposed to experience greater reward processing as a result of being motivated by the competitive nature of the engagement task. In line with this idea is work showing that the RewP is enhanced when tasks are performed in a social context (Wilhelm et al., 2019), even when the social context involves insults that promote anger in the participant (Threadgill & Gable, 2020). Threadgill and Gable (2020) posit that this could be due to the pleasurable feeling associated with revenge. In future work, it would be informative to include a noncompetitive or cooperative engagement task to determine whether tasks that induce emotions of differing valence impact reward processing.
These findings have several implications for both social cognition and the field of human-robot interaction (HRI). First, the findings of the present study add to the body of literature investigating the influence of social context on feedback monitoring. While research has shown that the RewP is modulated by nonsocial factors (e.g., magnitude of reward), social factors can also affect feedback monitoring. The findings also provide additional data that align with the reinforcement learning theory. For HRI, our data suggest that we are able to perform certain tasks on behalf of robots when we familiarize ourselves with them (i.e., no difference in feedback monitoring when gambling on behalf of others). The data also suggest that the use of social robots in learning settings may have a detrimental effect on learning. While using social robots may provide benefits in some settings (e.g., clinical settings), roboticists and technology adopters should be wary about including animate and social agents in learning environments, as they could hinder learning and performance.