Intersentential coreference expectations reflect mental models of events

Comprehenders’ perception of the world is mediated by the mental models they construct. During discourse processing, incoming information allows comprehenders to update their model of the events being described. At the same time, comprehenders use these models to generate expectations about who or what will be mentioned next. The temporal dynamics of this interdependence between language processing and mental event representation have been difficult to disentangle. The present visual world eye-tracking experiment measures listeners’ coreference expectations during an intersentential pause between a sentence about a transfer-of-possession event and a continuation mentioning either its Source or Goal. We found a temporally dispersed but sustained preference for fixating the Goal that was significantly greater when the event was described as completed rather than incomplete (passed versus was passing). This aligns with reported offline sensitivity to event structure, as conveyed via verb aspect, and provides new evidence that our mental model of an event leads to early and, crucially, proactive expectations about subsequent mention in the upcoming discourse.


Introduction
When we process discourse, we create mental models of the events being described (Johnson-Laird, 1983; Van Dijk & Kintsch, 1983). We use incoming linguistic information, together with our knowledge of the world, to incrementally update these models as the discourse progresses. Subtle differences in linguistic choices can have measurable effects on our mental representations of a situation. For example, events described with imperfective-marked verbs (e.g., Leah was passing the salt to Eve) are construed as ongoing, with attention distributed across event participants, whereas perfective aspect (Leah passed the salt to Eve) evokes an event as completed, bringing participants associated with its end state into the focus of attention (Madden & Ferretti, 2009; Magliano & Schleich, 2000; Moens & Steedman, 1988). In the domain of reference processing, this focus of attention on an event participant is linked to the probability of that participant being rementioned in the next sentence. An ambiguous pronoun following a perfective-marked transfer-of-possession event (e.g., Leah passed the salt to Eve. She…) is preferentially interpreted as referring to the Goal of the transfer event (Eve), who is now in possession of the transferred object, rather than the Source (Leah; Stevenson, Crawley, & Kleinman, 1994). Importantly, when the verb is marked with imperfective (was passing), this Goal preference decreases (Rohde, Kehler, & Elman, 2006; Kehler, Kertz, Rohde, & Elman, 2008), consistent with more equal focus on participants in a mental model of an ongoing event.
In this paper we investigate this interdependence between language processing and our mental representations of events, probing the timecourse over which our situation models inform our processing of reference. This question has been addressed extensively for events and situations described with Implicit Causality verbs (IC; Garvey & Caramazza, 1974; Hartshorne, 2014), where a long-standing debate centers on whether these referential biases emerge before, at, or after a pronoun in contexts involving inferences about the event's cause (see Koornneef, Dotlačil, van den Broek, & Sanders, 2016, for review). 1 One possibility is that these biases depend on the referential expression, such that a pronoun prompts comprehenders to consult their current situation model, and the distribution of attention to event participants in that model influences their referential choices. This could explain findings from story continuation experiments in which manipulations of IC status or grammatical aspect influenced participants' referential choices when they wrote completions for sentences starting with an ambiguous pronoun (Kehler et al., 2008). It is also compatible with findings from an ERP study by Ferretti, Rohde, Kehler and Crutchley (2009) This indicates comprehenders had more difficulty integrating the pronoun when its reference (forced by gender-marking) jarred with the status of that referent in their situation model (for similar effects on reading time in contexts with IC verbs, see van Berkum, 2006, and Sanders, 2013).
Another account of these findings, and the one advocated by Ferretti, Rohde and colleagues, postulates a proactive bias whereby comprehenders' situation models exert an influence on referential processing independent of the presence of a particular referential form. More specifically, comprehenders may continually draw on their current situation models to generate expectations about who or what is likely to be mentioned next in the upcoming discourse. If and when a pronoun is encountered, its interpretation is then in part a function of the expectancies built up prior to that point. Such proactive 'thinking ahead' characterizes coreference models like the Expectancy Hypothesis (Arnold, 2001) as well as a more recent Bayesian approach (Kehler et al., 2008); in both cases, properties of the discourse may directly or indirectly influence comprehenders' expectations about subsequent mention of a referent. This would be consistent with evidence of prediction in language processing that has accrued at other levels of linguistic representation, including morphophonology, syntax, and semantics (e.g., Federmeier, 2007; Kuperberg & Jaeger, 2016). 2 For example, when hearing The boy will eat the…, listeners look more at edible than inedible objects in a visual array, and this preference emerges before they hear the noun (…cake) (Altmann & Kamide, 1999). There is good evidence that listeners look at referents when they are named, or when there is syntactic and/or semantic information signaling that they are about to be named (Kamide, Scheepers, & Altmann, 2003; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995).
In all of these cases, the relevant cue (e.g., eat) can be pinpointed in the speech signal, and looking behavior predicted and examined in well-defined timeframes immediately following that cue.
2 While the extent of the role that prediction plays in language processing is under current debate (Huettig & Mani, 2016; Nieuwland et al., 2017), the contribution of a proactive component to human cognition is widely agreed upon (Bar, 2007).
Pinpointing the cue that focuses attention on referents in a mental model is less straightforward. Presumably the construction of event representations emerges from a complex combination of cues. This makes it challenging to detect effects of these models on gaze allocation because of their potential to be more widely distributed over time.
Indeed, visual world eye-tracking experiments that test for predictive looks contingent on IC bias have produced inconsistent results (Cozijn, Commandeur, Vonk, & Noordman, 2011; Itzhak & Baum, 2015; Pyykkönen & Järvikivi, 2010). Our study targets anticipatory coreference processing, but we use a cue whose effects emerge in a different way from those in IC studies. In IC experiments, the reported bias to a causally implicated referent is taken to reflect the assignment of thematic roles. We hold constant thematic role, and indeed all referent properties typically enumerated in the pronoun interpretation literature, while manipulating grammatical aspect on the verb. The assignment of Source and Goal roles does not vary, nor do their grammatical roles, their relative recency, their parallelism with a subsequent subject pronoun, etc. We manipulate nothing about the pragmatic status or morphosyntactic encoding of the referents at all (i.e., the types of referent properties typically implicated in next-mention expectancies and assessed in corpus studies; Arnold, 1998); rather, aspectual effects arise via a manipulation of the completed/ongoing nature of the event and the repercussions that has on the discourse and the comprehenders' model thereof. Note that an effect of aspect need not arise in all contexts. It is only when the status of an event is ongoing that it has the capacity to minimize the relevance of a protagonist associated with the end state. A coreference model for capturing such behavior would therefore depend on conditional activation of certain factors, particularly those for which the causal relationship with pronoun interpretation is indirect (see Kehler & Rohde, 2017). This makes comprehenders' potential use of this cue all the more impressive for real-time computation regarding upcoming next mention. Crucially, our study targets the anticipatory use of aspect, extending beyond the integration effects reported in prior coreference work on aspect.
To test for a potentially broadly distributed effect of aspect on listeners' next-mention expectations in transfer-of-possession contexts, we measure looks to event participants during an intersentential pause before the onset of a subsequent sentence.

Participants
Sixty-three University of Hawai'i students who identified as native speakers of English participated after giving informed consent. Data from 7 participants was excluded prior to analysis due to eyetracker calibration difficulty (n=4), non-normal vision or hearing (n=2), or noise interference (n=1). Data from 3 participants was excluded after data inspection, due to insufficient data points in the eye gaze record (see 2.3), leaving 53 participants (28 females, mean age 23) in the final analysis. The study was approved by the UH Human Studies Program.

Materials and procedure
Linguistic stimuli consisted of two-sentence discourses followed by a question, as in (1)

Data treatment and analysis
Eye gaze and mouse-click responses were recorded and exported through SMI Experiment Suite software. Mouse-click accuracy in experimental items was close to ceiling (98.4%) and thus not further analysed. Eye gaze data were classified as fixations, saccades and blinks using the software's default settings. Data were binned into 20-ms samples for further analysis. Trials with a very low proportion of fixations overall were removed (4.2% of data). 4 Participants with fewer than 15 (out of 20) experimental trials remaining after this procedure were excluded from further analysis (n=2). One additional participant was excluded for consistently fixating the Theme during the intersentential pause, thus rendering Source/Goal analysis impossible.
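The two-step exclusion procedure described above can be illustrated with a minimal sketch. The participant criterion (at least 15 of 20 experimental trials remaining) comes from the text; the trial-level cutoff `MIN_FIXATION_PROPORTION` and the record layout are hypothetical, since the exact threshold for a "very low proportion of fixations" is not given here.

```python
from collections import defaultdict

# Hypothetical cutoff: the text does not state the threshold used.
MIN_FIXATION_PROPORTION = 0.5
# From the text: participants need at least 15 of 20 experimental trials.
MIN_TRIALS_PER_PARTICIPANT = 15


def filter_trials(trials):
    """Apply trial-level, then participant-level, exclusion.

    `trials` is a list of dicts with the (assumed) keys
    'participant', 'trial', and 'fixation_proportion'.
    """
    # Step 1: drop trials with too few fixations overall.
    kept = [t for t in trials
            if t["fixation_proportion"] >= MIN_FIXATION_PROPORTION]

    # Step 2: count the surviving trials per participant.
    counts = defaultdict(int)
    for t in kept:
        counts[t["participant"]] += 1

    # Step 3: drop participants with too few surviving trials.
    return [t for t in kept
            if counts[t["participant"]] >= MIN_TRIALS_PER_PARTICIPANT]
```

Applying the participant criterion only after the trial criterion mirrors the order in the text, where participants are excluded based on the trials "remaining after this procedure".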
Our analysis tests whether grammatical aspect in the context sentence modulates listeners' looks to the Source versus the Goal of the transfer-of-possession event before they hear a pronoun. On the standard assumption that it takes approximately 200 ms to execute a ballistic eye movement in response to an acoustic stimulus (Matin, Shao, & Boff, 1993), our region of interest extends to 200 ms after the onset of the pronoun in the continuation. It begins 500 ms after the offset of the context sentence to allow for sentence wrap-up and for looks directly contingent on the naming of entities in that sentence to dissipate. In the absence of predictions for the effect of time within this region of interest ('Silence'), we calculated a 'GoalAdvantage' score for the entire region for each trial by subtracting the number of 20-ms bins with looks to Source from those with looks to Goal. 5 A GoalAdvantage score was also calculated for the region from 200 to 1,500 ms after pronoun onset ('Continuation') in order to examine the effect of the disambiguating pronoun and potential interactions with aspect. We used linear mixed-effects regression with maximal random effects structures (Barr, Levy, Scheepers, & Tily, 2013) to model the effects of Aspect (perfective/imperfective; contrast-coded), Reference (pronoun disambiguating to Source/Goal; contrast-coded), and Window (Silence/Continuation; treatment-coded, reference level = Silence) on GoalAdvantage. Figure 2 shows an overview of participants' fixations over the entire trial, collapsing over experimental manipulations. Visual inspection indicates a small but sustained bias to look at the Goal versus Source during the intersentential pause (and beyond), consistent with the known bias to remention Goals following transfer-of-possession events (Arnold, 2001; Stevenson et al., 1994).
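The GoalAdvantage measure defined above can be sketched as follows. The window boundaries (500 ms after context-sentence offset to 200 ms after pronoun onset for 'Silence'; 200 to 1,500 ms after pronoun onset for 'Continuation') and the 20-ms bin size follow the text. The function names, the `(bin_onset_ms, area_of_interest)` data layout, and the half-open treatment of window edges are assumptions made for illustration, not the authors' implementation.

```python
BIN_MS = 20  # bin size from the text


def silence_window(context_offset_ms, pronoun_onset_ms):
    """'Silence' region: 500 ms after context-sentence offset
    up to 200 ms after pronoun onset."""
    return context_offset_ms + 500, pronoun_onset_ms + 200


def continuation_window(pronoun_onset_ms):
    """'Continuation' region: 200 to 1,500 ms after pronoun onset."""
    return pronoun_onset_ms + 200, pronoun_onset_ms + 1500


def goal_advantage(bins, start_ms, end_ms):
    """Number of bins on the Goal minus number of bins on the Source.

    `bins` is a list of (bin_onset_ms, area_of_interest) pairs, where
    area_of_interest is 'Goal', 'Source', another label, or None.
    Bins are counted when start_ms <= onset < end_ms (an assumption;
    the text does not specify edge handling).
    """
    in_window = [aoi for onset, aoi in bins if start_ms <= onset < end_ms]
    return in_window.count("Goal") - in_window.count("Source")
```

A positive score indicates more time on the Goal than on the Source within the window, a negative score the reverse, and looks elsewhere (for example to the Theme) contribute nothing. The resulting per-trial scores would then serve as the dependent variable in the mixed-effects model.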

[INSERT FIGURE 2 ABOUT HERE]
Of key interest here is whether this bias is modulated by grammatical aspect. Figure 3 depicts looks to Goal and Source for each Aspect-by-Reference condition. Visual inspection suggests a Goal-bias following perfective but not imperfective aspect, in both distributed. Models fit to empirical logit data aggregated by participant and by item also met model assumptions well, and produced similar patterns of significance.
expectations, we analysed fixations aggregated over the largest meaningful time period for the hypothesis at stake. This 2,200-ms window shows a sustained preference for fixating the Goal over the Source, and this preference was greater following perfective than imperfective transfer-of-possession events. Crucially, it appeared well before the onset of the pronoun. The findings lend direct support to the claim that the focus of attention on particular event participants in comprehenders' mental models allows them to proactively create expectations about who or what is likely to be mentioned next in the upcoming discourse. We have shown that a subtle linguistic manipulation of the way an event is characterized can affect listeners' expectations about upcoming reference, that these expectations are not reliant on the encounter of a pronoun or cue regarding the upcoming sentence's role in the discourse (e.g., a connective like because for IC verbs), and that they can be captured with VWP methods despite the more tenuous nature of the link that we must assume between mental event representations and eye movements.
These findings align well with Van Berkum, Koornneef, Otten, and Nieuwland's (2007) 'immediate focusing' account of the IC phenomenon (see note 1), which holds that comprehenders "use the implicit causality cue in something like 'David praised Linda because…' proactively, and essentially predict, before the pronoun comes along, that the remainder of the sentence will tell us something about Linda" (p. 167, italics in the original). However, despite suggestive evidence from other experimental paradigms (ERP, eye-tracking while reading; Van Berkum et al., 2007; Koornneef & Van Berkum, 2006), VWP studies testing for proactive IC-driven predictions of upcoming discourse referents have been somewhat inconclusive (Cozijn et al., 2011; Itzhak & Baum, 2015; to event participants conditioned on IC bias in time segments including or immediately following the causal connector and the pronoun, effects preceding the connector and the pronoun have only been reported in one study (Pyykkönen & Järvikivi, 2010), and this effect has been questioned as a potential experimental artifact since pronouns always referred to bias-consistent antecedents in that study (Cozijn et al., 2011). By contrast, our VWP study shows evidence of proactive prediction of upcoming referents in the context of transfer-of-possession events and the manipulation of the portrayal of the event as completed or ongoing, in an experimental design that systematically crossed reference of the pronoun with aspect to eliminate any reinforcement of bias. It is possible that the greater length of our silence region provided a sufficient window for such expectations to emerge before the arrival of disambiguating material.
This study shows that the visual world paradigm can be used to capture discourse-level effects of this nature, with choices in design and analysis that reflect the more tenuous linking assumptions that must be made between incrementally updated mental event representations during discourse processing and allocation of eye gaze in a visual world. Exploring a wider variety of discourse contexts using this paradigm should be fruitful in future work aimed at understanding the dynamic mutual relationships between mental models of events and processing of discourse in real time.