When the tune shapes morphology: The origins of vocatives

Many languages use pitch to express pragmatic meaning (henceforth ‘tune’). This requires segmental carriers with rich harmonic structure and high periodic energy, making vowels the optimal carriers of the tune. Tunes can be phonetically impoverished when there is a shortage of vowels, endangering the recovery of their function. This biases sound systems towards the optimisation of tune transmission by processes such as the insertion of vowels. Vocative constructions—used to attract and maintain the addressee’s attention—are often characterised by specific tunes. Many languages additionally mark vocatives morphologically. In this article, we argue that one potential pathway for the emergence of vocative morphemes is the morphological re-analysis of tune-driven phonetic variation that helps to carry pitch patterns. Looking at a corpus of 101 languages, we compare vocatives to structural case markers in terms of their phonological make-up. We find that vocatives are often characterised by additional prosodic modulation (vowel lengthening, stress shift, tone change) and contain substantially fewer consonants, supporting our hypothesis that the acoustic properties of tunes interact with segmental features and can shape the emergence of morphological markers. This fits with the view that the efficient transmission of information is a driving force in the evolution of languages, but also highlights the importance of defining ‘information’ broadly to include pragmatic, social, and affectual components alongside propositional meaning.


Introduction
Language is a complex adaptive system shaped by its communicative functions (Beckner et al. 2009). In work that adopts this general idea as its starting point, communication is often operationalised as the transmission of propositional information (see e.g. recent information-theoretic approaches summarised in Gibson et al. 2019). This approach has led to significant progress, including important results about how the shape of words is influenced by their propositional content (e.g. Hall et al. 2016, 2018). However, productive as it is, this view of communication rests on a simplification (Foulkes et al. 2018). Language performs a careful balancing act across multiple competing functions that also include the transmission of pragmatic, affectual, and social information.
Our article therefore aims to present a fuller picture of the effects of communication on linguistic structure: we look at how morphological markers are shaped by the need to transmit pragmatic information through prosody. We present evidence from a large cross-linguistic survey suggesting that the need to carry pragmatically relevant prosodic manipulations is a key factor in determining the phonological shape of vocative markers.

Tune-driven phonetic variation
In recent years, there has been increasing recognition of the intricate interactions among different channels of speech. One specific example is the interaction between pitch as an expression of discourse function (the tune) and properties of speech that typically express propositional meaning, such as consonants and vowels (the text). Tune and text are necessarily linked: the tune is carried by the text.
Interactions between tune and text often manifest as systematic phonetic variation in vowels, which lend themselves particularly well to the transmission of tunes given their rich harmonic structure and high periodic energy. In a recent review of the literature, Roettger and Grice (2019) present examples from a variety of languages where pragmatically relevant pitch movements are accompanied by the lengthening of existing vowels (e.g. Frota 2002; Ladd 2008; Heston 2014), the insertion of non-lexical vowels (e.g. Dell and Elmedlaoui 2002; Cruz 2013; Roettger 2017), or the blocking of vowel deletion and devoicing (e.g. Dauer 1980; Andreeva and Koreman 2003; Kilbourn-Ceron and Sonderegger 2017). For example, experimental evidence from Bari Italian suggests that when a word occurs with a rising-falling pitch movement characteristic of a yes-no question, speakers systematically lengthen the final vowel (Grice, Savino, and Roettger 2019). They are also more likely to add non-lexical vowels after word-final consonants (Grice, Savino, and Roettger 2018). This effect is particularly pronounced when the pitch movement occurs on a part of the word that does not have sufficient segmental material to realise the pitch movement. Final syllables that end in a consonant constitute a prime example of such a position: they make communicatively relevant pitch movements hard to realise in production (Odé 2005; Rathcke 2013) and hard to retrieve in perception (Zhang 2004; Barnes et al. 2014).
The shortage of tune-bearing segmental material might impoverish the phonetic realisation of communicatively relevant tonal contours, potentially obscuring pragmatic contrasts. The resolution of such functional conflicts through systematic phonetic variation of the sort described above can eventually lead to diachronic re-analysis and the emergence of novel grammatical markers. For instance, Hellmuth (2020, in press) shows that speakers of Tunisian Arabic insert phrase-final vowels, but only in polar questions characterised by a pitch movement. Hellmuth speculates that this inserted vowel can be reinterpreted as a question particle, leading to recurring tune-driven segmental changes being reanalysed as grammatical devices.
This type of grammatical re-analysis, which we refer to as morphologisation following Fox (1995), hinges on two conditions. First, a communicative function such as a request for information must systematically co-occur with a specific pitch configuration in salient prosodic positions. Secondly, the articulatory realisation and/or perceptual retrieval of this pitch configuration must be facilitated by changes to the text such as vowel insertion. Learners might then reinterpret tune-driven changes as bona fide grammatical markers. This article will argue that vocative morphology exemplifies tune-text interactions of precisely this type.

Vocative constructions
Vocative constructions are used to call out to interlocutors or attract/maintain addressees' attention (Daniel and Spencer 2009). Vocatives are typically single constituent constructions. Many languages mark them grammatically through vocative affixes or particles. For example, in Hualapai, a Cochimi-Yuman language spoken in northwestern Arizona, a stem-final /-é/ is suffixed when speakers wish to attract the attention of someone near them or within their sight (Watahomigie et al. 2001). Similarly, in Lao, a Tai-Kadai language spoken in Laos, the postnominal vocative particle /?e+j/ is used to call out to someone (Enfield 2007; see https://osf.io/wzstm/ for further examples from our corpus). We know from languages with detailed prosodic descriptions that vocatives are often characterised by specific intonation contours. These tunes, referred to as vocative chants (Liberman 1975; Leben 1976), stylized falls (Ladd 1978), or chanted calls (Hayes and Lahiri 1991), usually consist of a rising pitch movement, followed by a sustained mid to high plateau (see Fig. 1 for an example).
Several authors speculate that the form and function of vocative chants are directly linked (Crystal 1969; Fox 1969; Ladd 1978). One relevant proposal is based on the fact that vocative chants are often used to bridge physical distances between interlocutors. The steady pitch component of the contour lends itself to maintaining a high volume for calling.
To realise such a steady (and usually high) pitch component, the segmental material in the vocative form must have sufficient periodicity and harmonic structure. Thus, we ask the following question: assuming that vocative tunes of a similar form are cross-linguistically widespread, could it be that the segmental make-up of vocative markers is optimised for carrying these tunes? As noted above, such optimisation could take place through the morphologisation of recurring tune-driven phonetic variation such as vowel insertion and lengthening. This would make vocatives a prime example of the tune driving the text (Roettger and Grice 2019).

Tune-driven vocatives?
While the phonological make-up of vocative markers has received little attention in the literature, Daniel and Spencer (2009) explicitly note that they are unusual from a phonological point of view: A remarkably robust observation is that the vocative doesn't seem to be marked by a consonant in any language (Daniel and Spencer 2009: 6).
A vocative marker in the form of a single vowel without consonants is exactly what we would predict under an approach where vocatives are optimised for carrying prosodic patterns. Vowels typically have richer harmonic structure and higher periodic energy than consonants, and therefore provide the perfect environment for transmitting pragmatic information.
To be clear, we are not suggesting that speakers deliberately choose a suitable vocalic variant over a consonantal alternative. As outlined above, we propose that tune-driven phonetic enhancements are sometimes promoted to the status of grammatical markers. This takes place through a process of morphologisation, similar to the phonologisation of gradient phonetic biases (Hyman 1976). Specifically, we suggest that the consonant-free vocative markers noted by Daniel and Spencer (2009) can emerge through the morphologisation of non-lexical vowel insertion in the context of vocative chants (cf. Roettger and Grice 2019).
Morphologisation could also lead to the emergence of categorical suprasegmental phenomena associated with vocatives. Such phenomena may develop (1) as a direct reflex of pragmatically conditioned prosodic manipulations such as specific pitch contours and stress shift, or (2) through prosodic changes that enhance the tune-carrying capacity of the text. To give a few examples of (1), many languages are described as showing primary stress shifts in vocative constructions. These include Nahuatl (Sullivan 1988), Tariana (Aikhenvald 2006), and Lavukaleve (Terrill 2011). Furthermore, tone languages such as Cantonese (Chen 2000) and Ngiti (Kutsch Lojenga 1994) are described as adding a final high tone to the word in a vocative construction. As for (2), Daniel and Spencer (2009) report that vocative constructions commonly exhibit prosodic processes 'sometimes violating the language's suprasegmental system' (e.g. stress shift, tone alternation, vowel lengthening, and consonant deletion; ibid: 4). For example, Chukchi, a Chukotko-Kamchatkan language spoken in Siberia, has several 'irregular' prosodic patterns associated with vocative constructions (Dunn 1999). Vocatives in Chukchi exhibit the insertion of non-lexical vowels, promotion of reduced vowels to full, lengthening of vowels, and diphthongisation, all of which increase the suitability of the text to carry the tune.
This article aims to test whether grammatical vocative forms are indeed optimised for the transmission of the tune. We follow recent work that has relied on statistical analysis of cross-linguistic data to address important theoretical questions relating to the origins of linguistic patterns, such as Everett (2018) and Wedel, Ussishkin, and King (2019). We conducted a large-scale survey of vocative markers across the languages of the world, reviewing several hundred grammars and identifying over a hundred languages with vocatives. As vocatives are often categorised as case markers, we compare the segmental make-up of vocatives to that of structural case markers (mainly accusatives and ergatives) in order to gauge the extent to which their tune-bearing properties are unique.
Importantly, 'tune-bearing properties' may be operationalised in many ways, such as the presence/absence of consonants, the presence/absence of vowels, vowel length, vowel quality, and so on. In the previous paragraphs, building on existing observations in the literature, we outlined one specific pathway to morphologisation that would promote vocatives that contain vowels but not consonants. Similar accounts could be developed for other tune-friendly segmental features: for instance, small gradient increases in vowel duration could eventually lead to a preference for phonologically long vowels in vocatives. Instead of arbitrarily focusing on a single aspect of segmental structure, we explore multiple possible ways of operationalising tune-friendliness. Below is a list of the predictions that we test as part of this study. We briefly justify each prediction by showing how it relates to the tune-bearing properties of vocatives and other case markers.
P1: Vocatives are accompanied by categorical prosodic modulations more often than other case forms. This prediction has already been outlined above, and is an important prerequisite to the rest of this investigation: the presence of morphologised prosodic manipulations would lend strong empirical support to the claim that vocatives are typically accompanied by vocative chant-type patterns.
P2: Vocatives disprefer consonants compared to other case forms. This is expected under the account outlined above, where vocatives may arise through the morphologisation of tune-bearing inserted vowels. Consonants are also characterised by decreased periodic energy and poorer harmonic structure, making them less suitable to carry prosodic modulations.
P3: When a consonant is present in a case marker, it is less likely to be word-final in vocatives than in other case forms. Suprasegmental phenomena such as stress assignment and lexical tones tend to be more sensitive to coda than onset consonants (e.g. stress placement is often dependent on the presence or number of coda consonants, but typically independent of onset consonants). This suggests that postvocalic consonants are more relevant to the transmission of prosodic information. We may therefore expect the preference against consonants in vocative markers to be stronger in final position.
P4: When a consonant is present in a case marker, obstruents, and especially voiceless obstruents, are dispreferred in vocatives compared to other case forms. Obstruents are suboptimal carriers of pitch patterns (e.g. lexical contour tones are often restricted to syllables with sonorant consonants in their coda; Gordon 2004). Voiceless obstruents are the least conducive to prosodic manipulations due to their lack of periodic vocal fold vibrations.
P5: Vocatives contain vowels more often than other case forms. This prediction follows straightforwardly from the fact that vowels are uniquely well-suited to carrying tunes.
P6: Vocatives contain long vowels more often than other case forms. Long vowels provide more support for prosodic manipulations than short vowels, and vowel lengthening is a cross-linguistically common tune-driven segmental modulation.
P7: Vocatives contain vowels with qualities that are different from those in other case forms. There are several possible ways in which these differences may manifest. Vocatives may show a preference for low vowels due to these being longer and more sonorous. Alternatively, tune-driven vowel insertion regularly results in a mid reduced vowel (e.g. Roettger 2017; Grice, Savino, and Roettger 2018). Thus, vocatives may contain more mid vowels if these vowels are the morphologised reflexes of inserted vocoids.
Our data provide strong evidence for two of these predictions (P1 and P2), and we do not find any reliable evidence for patterns that contradict the directional predictions above.
We must stress that our study is not confirmatory but exploratory in nature (for a discussion of these two modes of the hypothetico-deductive method, see Box 1976; Tukey 1977; De Groot 1956; see also Roettger, Winter, and Baayen 2019, for a recent discussion related to linguistic research). While most of our predictions were conceived of before collecting the corpus, some of them emerged during the data collection procedure. Moreover, certain phonological patterns related to vocatives were known to the authors from the literature (see e.g. the quote from Daniel and Spencer 2009 above). Finally, the sampling procedure for the corpus was not strictly random, as we attempted to gather as many grammars reporting on vocatives as possible while being limited by pragmatic considerations. Of course, these limitations are also present in most other typological studies of linguistic structure. To reflect the exploratory nature of our study, we discuss all predictions that we tested, including ones that were not supported by the data.

Methods
Our goal is to investigate whether vocatives are uniquely well-suited to carrying tunes. This can only be established by comparing them to other case forms across a diverse sample of languages. This involves constructing a cross-linguistic database of vocatives and other case markers (see Section 2.1); extracting segmental information about these markers (see Section 2.2); and using statistical modelling to evaluate differences in the phonological form of vocatives and other case markers (see Section 2.3). This section covers each of these steps in detail.

Database
Our database was created by manually searching for mentions of vocatives and other case forms in grammatical descriptions of different languages, and recording information about the case markers in a unified format. As a prerequisite to this process, we had to make decisions about (1) what would be counted as a vocative and (2) what other forms would be used as a comparison set.
We decided to include any non-zero marker that was explicitly labelled as a 'vocative' by the grammar writer. This includes not only vocative suffixes and prefixes, but also clitics, particles and suprasegmentally marked vocatives (primarily through vowel lengthening, tonal changes, and stress-shift). We did not include zero markers and markers with a vocative-like function that were not explicitly labelled as vocatives (e.g. 'calls', 'addresses').
As for (2), we recorded two different types of case forms marking core thematic roles: 'nominative-like' (nominative, absolutive, agentive, and subject case) and 'accusative-like' (accusative, ergative, objective, (direct) object, oblique, non-nominative, anti-agentive, relative and absolutive-modalis). Nominative-like case forms are marked by zero in more than half of all the languages in our sample, and zero-marked forms cannot be directly compared to explicitly marked vocatives. The remaining fifty-three non-zero nominative-like case forms are simply too small a sample to yield reliable estimates while also controlling for areal and genetic dependencies, and therefore we do not discuss them further. In contrast, accusative-like case forms are rarely zero-marked, and, unlike vocatives, there is no reason to assume that they come with characteristic prosodic manipulations. Therefore, they serve as an ideal comparison set to vocatives in testing our predictions. The rest of this article focuses on vocatives and accusative-like forms.
Our search procedure was as follows. The first author identified all languages that had information about vocatives in the Surrey Syncretism Database (Baerman, Brown, and Corbett 2002), and also manually searched through the indices and tables of contents of all descriptive grammars that were physically available at the University of British Columbia library stacks. The second author manually searched through the indices and tables of contents of all published grammars and doctoral dissertations that were digitally available to him including the grammar series from Mouton and LangSci Press, as well as materials from search engine results using the search terms 'grammar of'.
We included all languages whose grammars listed non-zero-marked vocatives. However, since some of the existing claims about vocatives (cf. Section 1) are based on samples that are skewed in the direction of Indo-European languages (e.g. sixteen out of thirty-three languages with vocatives in Daniel and Spencer 2009 are Indo-European), we avoided the grammars of a number of well-described Indo-European languages where vocatives are known to exist (e.g. Latin, Greek, Slavic languages). This allows us to test existing observations in an independent sample of languages.
Our final database contains 228 vocative forms and 137 accusative-like forms from 101 languages representing 46 different language families (counting isolates as separate families). When a language had more than one vocative or accusative-like form, an effort was made to record all of these forms.
We note that our data set included a non-negligible number of cases where vocatives were marked by segment substitution (8 out of 228 vocatives, e.g. replacing one vowel with another) or segment/syllable deletion (17 out of 228 vocatives). These forms are difficult to compare to accusative-like case forms, which are almost never marked by substitution or deletion. Therefore, we excluded them from our analysis. However, since the deletion of segmental material goes directly against the prediction that vocatives are optimised for carrying tunes, we return to these cases in the Discussion section.

Variables
The variables shown in Table 1 are used for subsetting the data where necessary, as predictor variables, as random effect grouping variables, and as outcome variables.

Statistical modelling
We use Bayesian mixed-effects logistic and multinomial models implemented using the brms package (Bürkner 2017; this package relies heavily on Stan, Carpenter et al. 2017) in R (R Core Team 2018) to test our main predictions. As noted above, we fit multiple models to test a range of possible predictions that all derive from the underlying assumption that prosody can drive segmental patterns. In some cases, we also fit the same model to different subsets of the data in order to ascertain the robustness of our conclusions under different conditions. All of these models have the same underlying structure: the outcome variable is either a binary or multinomial categorical variable, typically representing some aspect of segmental structure (see Table 1 for a full list of outcome variables); broad case is the only fixed effect predictor variable; and the models include random intercepts and random slopes over broad case by language family and macro area to control for genetic and areal dependencies. This can be summarised using the following brms model formula:
We decided not to include language itself as a random effect despite the fact that some of our languages are represented by more than one case form, and some case forms are represented by more than one form within a given language. The reasons for avoiding these random effects are as follows. First, the majority of languages have only one form per case category. Secondly, some languages only have vocatives but no accusative-like forms. Thirdly, our models have categorical outcome variables, which makes estimation difficult when there are only a small number of observations, and may lead to quasi-complete separation within random effect levels (e.g. Gelman et al. 2008). For a small number of key models that were strongly in support of our predictions, we also fit models with by-language random effects, and found their estimates and credible intervals (CrIs) to be nearly identical to those of our simpler models. These models are not reported in the main text, but are available as part of the accompanying OSF repository (https://osf.io/ejr8m/).
The analyses we present are all couched in a Bayesian framework. As part of the Bayesian model fitting process, we specify prior distributions that embody our expectations about the model parameters. Model fitting consists in updating these prior distributions (or priors for short) using evidence from the data to obtain a posterior distribution. The posterior distribution represents the probability of different combinations of parameter values given the priors, model, and data, and allows us to draw inferences relating to our predictions (see Vasishth et al. (2018) and Nalborczyk et al. (2019) for recent examples of Bayesian analyses within linguistics). We used so-called weakly informative (regularising) priors. These prior distributions are agnostic with respect to the hypotheses being tested (i.e. they assign the highest prior probability mass to parameter values that correspond to no difference between vocatives and accusative-like cases), but they require substantial evidence to move parameter values outside a sensible range. For instance, the priors placed on the key fixed effect predictor (broad case) in our logistic models are compatible with any state of affairs where the probability of a given feature in vocatives is no more than 500 times more or less likely than the probability of the same feature in accusative-like cases (e.g. 0.2% in vocatives and 99.98% in accusatives). However, very strong evidence would be required for the model to conclude that the difference in probability is more than 500-fold. This is a lenient prior in that it allows a broad range of parameter values; however, it can help to avoid runaway parameter estimates in cases where estimation is difficult (e.g. complete or quasi-complete separation in regression models with categorical outcomes; Gelman et al. 2008). All our prior choices were based on advice in Gelman (2019). The details of the priors are provided in the main analysis file in the accompanying OSF repository (https://osf.io/ejr8m/).
For all models, the focus is on estimated differences between vocatives and accusative-like cases. We present posterior means and 95% CrIs which tell us about the most likely size of this difference and delimit its range based on our priors, model, and data. These values are presented on a probability scale to make them easier to interpret. We also present raw proportions calculated manually. The model estimates and raw proportions are broadly in agreement in most cases, but they occasionally diverge, especially when the data do not provide robust evidence for a given hypothesis. We believe the model estimates provide a more reliable picture of the data as they control for dependencies within language families and macro-areas and tell us about the whole range of plausible parameter values, not just a point estimate.
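To make the model structure and the prior example in this section more concrete, the following Python sketch writes out the linear predictor described above (a fixed intercept and broad-case slope plus by-family and by-area deviations, all on the log-odds scale) and works through the extreme scenario quoted in the text. It is an illustration only: the function and variable names are hypothetical, it is not the authors' brms code, and it does not reproduce the actual priors, which are documented in the OSF repository.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predicted_probability(is_vocative, fixed, family_effect, area_effect):
    """Probability of the outcome for one case form (hypothetical helper).

    fixed, family_effect and area_effect are (intercept, slope) pairs on the
    log-odds scale; the family and area pairs are random deviations from the
    fixed effects. is_vocative is 1 for vocatives, 0 for accusative-like forms.
    """
    intercept = fixed[0] + family_effect[0] + area_effect[0]
    slope = fixed[1] + family_effect[1] + area_effect[1]
    return inv_logit(intercept + slope * is_vocative)

# The extreme scenario quoted in the text: 0.2% vs 99.98%.
p_voc, p_acc = 0.002, 0.9998
probability_ratio = p_acc / p_voc           # just under 500
slope_needed = logit(p_voc) - logit(p_acc)  # broad-case coefficient (log-odds)

# Plugging that coefficient back in, with no family or area deviations,
# recovers the 0.2% probability for vocatives.
recovered = predicted_probability(
    1, (logit(p_acc), slope_needed), (0.0, 0.0), (0.0, 0.0)
)
```

Under these assumptions, the 0.2% versus 99.98% scenario corresponds to a probability ratio just below 500 and to a broad-case coefficient of roughly -14.7 on the log-odds scale.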

Data and code
Analysis scripts and data tables are stored on the Open Science Framework website (osf.io) and are publicly available for educational, research, and non-profit purposes under appropriate attribution (CC-By Attribution 4.0 International License): https://osf.io/ejr8m/.

Results
In this section, we investigate each of the predictions outlined in the introduction (P1-P7). Our modelling strategy is the same throughout and is only described in detail in cases that are challenging or require a less conventional model setup.

Prosodic manipulations (P1)
In this section, we investigate the prediction that vocatives are more often accompanied by prosodic manipulations than are accusative-like case forms (P1). The raw data provide strikingly strong support for this prediction: while 30% of the 228 vocative forms were explicitly described as having some form of prosodic manipulation, none of the 137 accusative-like forms had any similar explicit notes (we also found little evidence of prosodic manipulations in the transcriptions of such forms). Of the vocatives that do show prosodic manipulations, 36% exhibit changes in stress or pitch accent, 41% changes in tone, and 34% vowel lengthening (the sum of these percentages is higher than 100% as some forms are accompanied by multiple prosodic changes). Note that the observed asymmetry is likely an underestimation. Prosodic modulations, especially non-lexical prosodic patterns, are a heavily underdocumented aspect of linguistic structure. It is likely that subtle prosodic modulations either have been overlooked or were not formally described in some of the descriptive work used to compile our data set.
To assess the statistical strength of this pattern, we fit a Bayesian mixed-effects logistic regression model to all vocative and accusative-like forms with prosodic manipulation as the outcome variable and the model structure outlined in Section 2.3. This proved to be a somewhat challenging task due to the absence of accusative-like case forms with prosodic manipulations, which created a case of quasi-complete separation. Quasi-complete separation arises when the outcome can be predicted with perfect accuracy for a subset of the data (Zorn 2005), and leads to a situation where model parameters are ill-defined. This tends to result in absurdly high parameter estimates and overly wide confidence intervals. We followed the advice in Gelman et al. (2008) and Gelman (2019) to curb these parameter estimates by using restrictive priors (Student's t distributions with degrees of freedom of 4 and a scale parameter of 2) on the fixed intercept and the fixed effect of broad case. Nonetheless, the resulting estimates ought to be taken with a pinch of salt due to the inherent uncertainty of model parameters in cases of separation.
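The effect of separation, and of a regularising prior, can be illustrated with a deliberately simplified sketch in Python. It assumes a single log-odds parameter for one group with 0 observed manipulations out of 137 forms, a Student's t prior with 4 degrees of freedom and scale 2 centred on zero, and a grid search instead of MCMC. None of this is the authors' actual model, which also includes a broad-case effect and random effects.

```python
import math

def log_likelihood(log_odds, successes, n):
    """Binomial log-likelihood (up to a constant) for one log-odds parameter."""
    log_p = -math.log1p(math.exp(-log_odds))  # log of inv_logit(log_odds)
    log_q = -math.log1p(math.exp(log_odds))   # log of 1 - inv_logit(log_odds)
    return successes * log_p + (n - successes) * log_q

def log_student_t(x, df=4.0, scale=2.0):
    """Log density of a zero-centred Student's t distribution."""
    z = x / scale
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1.0) / 2.0 * math.log1p(z * z / df))

grid = [i / 100.0 for i in range(-3000, 501)]  # log-odds from -30 to 5
successes, n = 0, 137

# Without a prior, the likelihood keeps improving as the log-odds decrease,
# so the best grid point is simply the lower edge of the grid.
mle = max(grid, key=lambda x: log_likelihood(x, successes, n))

# With the prior added, the log posterior has a finite maximum.
map_estimate = max(
    grid, key=lambda x: log_likelihood(x, successes, n) + log_student_t(x)
)
```

In this toy setting the unregularised estimate runs to the edge of whatever range is searched, while the regularised estimate stays finite and well inside the grid.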
By-family averages and model estimates for the probability of prosodic manipulations in vocatives versus accusative-like forms are shown in Fig. 2. The estimated probability of prosodic manipulations in accusative-like forms is 0.003 with a 95% CrI of [0.00,0.02]. In vocatives, the estimated probability is 0.32 with a 95% CrI of [0.11,0.58]. The estimated difference between the two is 0.32 with a 95% CrI of [0.11,0.57]. Note that a difference of 0 is well outside this interval: the data provide strong support for the prediction that vocatives are more likely to be accompanied by prosodic manipulations than accusative-like case forms.
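Summaries like these are computed from posterior draws rather than from point estimates. The Python sketch below shows the general recipe under stated assumptions: the draws are simulated from hypothetical normal distributions (they are not the paper's posterior), and `inv_logit` and `summarise` are illustrative helper names. Each draw is converted to the probability scale first, and the difference is then taken draw by draw.

```python
import math
import random

def inv_logit(x):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))

def summarise(draws, level=0.95):
    """Posterior mean and equal-tailed interval from a list of draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[int(round(tail * last))]
    upper = ordered[int(round((1.0 - tail) * last))]
    mean = sum(ordered) / len(ordered)
    return mean, lower, upper

random.seed(1)
# Hypothetical draws on the log-odds scale (NOT the paper's posterior).
intercept = [random.gauss(-5.0, 1.0) for _ in range(4000)]  # accusative-like
slope = [random.gauss(4.2, 1.0) for _ in range(4000)]       # broad-case effect

# Convert each draw to the probability scale, then difference draw by draw.
p_acc = [inv_logit(a) for a in intercept]
p_voc = [inv_logit(a + b) for a, b in zip(intercept, slope)]
diff = [v - a for v, a in zip(p_voc, p_acc)]

mean_diff, lower_diff, upper_diff = summarise(diff)
```

Because the difference is computed within each draw, its interval reflects the joint uncertainty in both parameters and need not match the difference between the two group-level intervals.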

Presence, position, and nature of consonants (P2-P4)
We now turn to the prediction that vocatives are less likely to contain consonants than accusative-like case forms (P2). We test this prediction on a data set that contains case forms with at least some segmental material; in other words, we exclude forms that are zero-marked or marked only by prosody. Similar to the previous case, the raw data provide strong support for our prediction: 72.9% of accusative-like case forms contain consonants as opposed to 41.7% of vocatives (see Fig. 3). Our statistical model is structured similarly to the one in the previous section, but with presence of consonants as the outcome. The estimates from this model are shown in Fig. 3. The estimated difference between vocatives and accusative-like forms is -0.45 with a 95% CrI of [-0.69,-0.18]. The data strongly support our hypothesis that vocatives are less likely to contain consonants than accusative-like case forms. Note that the model predictions are slightly more exaggerated than the difference in the raw data (-0.45 vs. -0.31, respectively). This is due to a few large language families that show less of a difference between the two case categories. These language families have an unduly large influence on the raw proportions due to the number of observations they contribute to the data set. Failing to control for dependencies within language families therefore leads to a smaller estimated effect size, as is the case for the raw proportions. Our mixed-effects model essentially treats these cases as outliers and assigns them a lower weight in calculating the fixed effect estimates, leading to a larger estimated effect size.
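The contrast between pooled proportions and family-aware estimates can be illustrated with invented counts. In the Python sketch below, one hypothetical large family shows a small difference and three hypothetical small families show a large one; the numbers are made up for illustration and are not taken from the database. Averaging per-family differences is only a crude stand-in for the partial pooling performed by a mixed-effects model, but it shows the same direction of effect.

```python
# family -> (forms with consonants, total forms) for each case category.
# All counts are invented for illustration.
vocatives = {"A": (20, 40), "B": (1, 5), "C": (1, 5), "D": (2, 5)}
accusatives = {"A": (24, 40), "B": (4, 5), "C": (4, 5), "D": (5, 5)}

def pooled_proportion(counts):
    """Proportion of forms with consonants, ignoring family membership."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits / total

def family_differences(voc, acc):
    """Vocative minus accusative-like proportion within each family."""
    return {
        fam: voc[fam][0] / voc[fam][1] - acc[fam][0] / acc[fam][1]
        for fam in voc
    }

pooled_diff = pooled_proportion(vocatives) - pooled_proportion(accusatives)
per_family = family_differences(vocatives, accusatives)
family_avg_diff = sum(per_family.values()) / len(per_family)
```

Here the large family pulls the pooled difference towards zero, whereas the family-level average gives each family equal weight and therefore yields a larger difference.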
Our prediction P2 is based on the morphologisation of short intrusive vocoids accompanying vocative chants. Based on cross-linguistic observations , this type of intrusion typically takes place at the end of the word, in which case the inserted material will likely be morphologised as a suffix. However, our data set also contains prefixes, and a variety of clitics and particles, which may have arisen through different diachronic pathways. In order to provide a more stringent test of our predictions, we refit our model to a subset of the data containing only suffixes (leaving 89 accusative-like forms and 114 vocatives). The estimated difference in the probability of consonants between vocatives and accusative-like case forms remains essentially the same: -0.48 with a 95% CrI of [-0.75,-0.16] (compare Fig. 3b and 3c).
Turning our focus to the subset of forms that contain consonants (86 accusative-like forms and 70 vocatives), we now test two further predictions: vocatives are more likely to contain non-final consonants than other case forms (P3) and vocatives are less likely to contain (voiceless) obstruents (P4). Let us start with P3: 34.9% of accusative-like forms with consonants have that consonant in final position. The corresponding figure for vocatives is almost the same: 35.7%. This is in line with our model estimates: a probability of 0.34 for accusative-like forms with a 95% CrI of [0.05,0.78], and a probability of 0.36 for vocatives with a 95% CrI of [0.03,0.69]. The estimated difference is -0.08 with a 95% CrI of [-0.55,0.34]. Based on the small size of the estimated difference and the width of the credible intervals, our data do not provide evidence that accusative-like forms and vocatives differ with respect to the typical position of existing consonants.
As for P4, the raw proportion of forms with obstruents is 43.6% in the accusative-like group and 54.0% among vocatives. A logistic model with obstruent as its outcome variable estimates the probabilities of obstruents as 0.37 for accusative-like forms with a 95% CrI of [...]. Given the wide credibility intervals, and the fact that 0 is well within the CrI around the estimated difference for both models, the data do not provide any evidence for differences in the distribution of obstruents or voiceless obstruents across accusative-like forms versus vocatives.
In summary, we found strong evidence that vocatives disprefer consonants in suffixes and also more generally in all types of morphological markers. However, we did not find any evidence that final consonants or (voiceless) obstruents are dispreferred in vocatives (over and above the general preference against consonants).

Presence and length of vowels (P5-P6)
In this section, we test the prediction that vocatives are more likely to contain vowels than accusative-like case forms (P5). Unsurprisingly, given that morphological markers typically constitute their own syllable, the raw proportion of forms with vowels is close to ceiling for both groups: 92% among accusative-like forms and 93% among vocatives. We fit a Bayesian mixed-effects logistic regression model to the data with presence of vowel as the outcome variable. The estimated probability of vowels in accusative-like forms is 0.95 with a 95% CrI of [0.82,0.997], while the corresponding probability for vocatives is 0.95 with a 95% CrI of [0.85,0.995]. The estimated difference is 0 with a 95% CrI of [-0.11,0.13]. In sum, there is no support for the prediction that vocatives are more likely to contain vowels. One simple explanation for this fact is that the presence of vowels is essentially non-negotiable for most morphological markers, and therefore cannot vary as a function of case type.
We also tested for differences in vowel length between accusative-like forms and vocatives (P6) in the subset of forms with at least one vowel (108 accusative-like forms and 157 vocatives). Note that vowels were only coded as long when they were explicitly marked as such by the grammar writers. Vowels lengthened through prosodic manipulations were coded as short. Since prosodic manipulations are only present in vocatives, this makes our assessment conservative. The raw proportions seem to support the prediction that vowels are more likely to be long in vocatives than in other case forms, with 15% long vowels in accusative-like forms and 31% long vowels in vocatives. However, the advantage of vocatives shrinks substantially when the same proportions are estimated using a logistic regression model with vowel length as the outcome variable. The estimate for accusative-like forms is only 0.07 with a 95% CrI of [0.003,0.29], while for vocatives it is 0.1 with a 95% CrI of [0.007,0.36]. The estimated difference is 0.04 with a 95% CrI of [-0.13,0.22]. Considering the wide CrIs and the close-to-zero estimate, our statistical model does not support the prediction that vowels are more likely to be long in vocatives than in other case forms. Similar to the results for prosodic modulations above, it is likely that these numbers underestimate the strength of this pattern across languages. Subtle lengthening effects, especially in phrase-final position, might have been overlooked by grammar writers or attributed to phonetic variation and therefore not coded explicitly.

Vowel quality (P7)
The final hypothesis that we test relates to vowel quality differences between vocatives and other case forms, with a specific focus on vowel height (P7). We presented two different possible predictions: vocatives may contain a higher proportion of low vowels (due to the higher sonority of low vowels) compared to other cases; or, alternatively, vocatives may contain a higher proportion of mid vowels (as morphologised intrusive vocoids are more likely to be non-peripheral; see note 2). In this section, we compare the proportion of low, mid and high vowels between accusative-like forms and vocatives. We fit a multinomial regression model to the data with the same structure as before but with three-levelled vowel height as the outcome variable. Though the parameterisation of the model is somewhat different from logistic models, our presentation will proceed along the same lines as before, relying on information derived from the posterior distribution. Thus, we present the estimated proportions of low, mid, and high vowels in accusative-like forms versus vocatives as well as the estimated difference between the two cases for each level of vowel height.
The raw proportions of different vowel heights in our data for accusative-like forms are 36% low, 19% mid and 45% high. For vocatives, the corresponding proportions are 30% low, 51% mid and 19% high. Based on these figures, there seems to be a higher proportion of mid vowels in vocatives, while the proportions of low and high vowels are slightly lower (see Fig. 4). The model estimates for the probabilities of low vowels are 0.39 [0.09,0.75] in accusative-like forms and 0.26 [0.05,0.58] in vocatives, with an estimated difference of -0.13 [-0.49,0.18], which means that we cannot reliably establish any differences between the two cases with respect to low vowels. For mid vowels, the estimates are 0.26 [0.05,0.59] for accusative-like forms, 0.55 [0.21,0.83] for vocatives and 0.29 [-0.03,0.58] for the difference between vocatives and accusative-like forms. Though the credibility interval around the estimated difference does include 0, 96% of the probability mass lies to its right, which means that the model provides some degree of evidence in favour of the hypothesis that vocatives are more likely to contain mid vowels. Finally, the model estimates for high vowels are 0.35 [0.13,0.62] for accusative-like forms, 0.19 [0.08,0.37] for vocatives and -0.16 [-0.41,0.08] for their difference, with 91% of the probability mass below 0 for this last estimate. This, again, provides some evidence in support of the prediction that peripheral (in this case, high) vowels are dispreferred in vocatives.
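Statements about the share of probability mass on one side of zero can be read as simple summaries of posterior draws. The following minimal Python sketch assumes that the posterior is available as paired draws of the two probabilities. The function name summarise_difference is introduced here for illustration, and the Beta-distributed draws are hypothetical stand-ins that do not reproduce the model used in this article.

```python
import random


def summarise_difference(draws_a, draws_b, level=0.95):
    """Summarise paired posterior draws of (b - a).

    Returns the mean, a rough equal-tailed interval taken from the sorted
    draws, and the share of draws above zero.
    """
    diffs = sorted(b - a for a, b in zip(draws_a, draws_b))
    n = len(diffs)
    lo_idx = int((1 - level) / 2 * n)
    hi_idx = min(n - 1, int((1 + level) / 2 * n))
    return {
        "mean": sum(diffs) / n,
        "interval": (diffs[lo_idx], diffs[hi_idx]),
        "mass_above_zero": sum(d > 0 for d in diffs) / n,
    }


# Hypothetical stand-in draws; real draws would come from the fitted model.
rng = random.Random(1)
acc_draws = [rng.betavariate(3, 8) for _ in range(4000)]  # accusative-like
voc_draws = [rng.betavariate(6, 5) for _ in range(4000)]  # vocative
summary = summarise_difference(acc_draws, voc_draws)
print(summary)
```

An interval that includes 0 and a large share of mass on one side of 0 are compatible: the interval describes the central 95% of the draws, while the share counts every draw on the relevant side.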
To summarise, we found no truly compelling evidence for different vowel height distributions in vocatives versus accusative-like forms. Our findings suggest that vocatives may be slightly more likely to contain mid vowels and slightly less likely to contain high vowels when compared to other case markers, though the evidence for these patterns is weak.

Summary of findings
We found strong evidence that vocatives are frequently accompanied by morphologised prosodic phenomena such as tonal changes, stress shift, and vowel lengthening, while accusative-like case forms did not show any such changes. We also identified a remarkably strong pattern whereby vocatives tend to avoid consonants in comparison with other case forms. This pattern holds even when the data set is restricted to suffixes only. We did not find any support for predictions relating to the phonetic quality or position of consonants in markers that contain them. The data further suggest that there is an overwhelming preference for markers of all types to contain at least one vowel. However, we found no differences in the strength of this preference between vocatives and other case forms. The data also did not provide any evidence for a preference for long vowels in vocatives. Finally, we found weak evidence that vocatives tend to appear with mid vowels, while avoiding high vowels.

The tune-driven evolution of vocatives
Let us first discuss the abundance of prosodic phenomena accompanying vocatives. They provide strong support for the assertion that vocative chants (pitch modulation, increased intensity, and vowel lengthening) are a common feature of vocatives. At this point, it will be useful to clarify our approach to the distinction between vocative markers and vocative chants. Implicit in our analysis is the assumption that the prosodic manipulations noted by the grammar writers have been morphologised as part of the vocative marker, and, although they historically originate in prosodic vocative chant-type patterns, they are now part of morphology. While this may be the correct interpretation for some of the languages, it is more than likely that some of the grammar writers who describe tone- or stress-related alternations are simply describing prosodic aspects of the vocative chant itself. In most cases, it is simply not possible to determine whether a given prosodic pattern is a historical consequence of the existence of vocative chants, or an inherent component of the vocative chant itself. Such a distinction may not even be meaningful in languages with obligatory vocative markers, where vocative chants and vocative markers may always appear together. In any case, this does not affect the main conclusion that we have drawn from the data, namely that vocative constructions are often accompanied by vocative chants.
The case of vowel lengthening deserves further discussion. Vowel lengthening accounts for 34% of all the prosodic manipulations in our sample, which corresponds to nearly 10% of all the vocative markers that we have recorded. Similar to stress shift and tonal changes, increased vowel length may simply be part of vocative chants. Indeed, based on the authors' experience with English, German, and Hungarian, vocative chants can certainly be accompanied by vowel lengthening in languages with no segmental vocative markers (see also the Spanish example in Fig. 1). However, lengthening can also be seen as a prosodic device for enhancing the tune-carrying capacity of the words used in a vocative construction. Cases where such gradient enhancements are morphologised as categorical vowel lengthening constitute clear examples of tune-driven segmental developments. Again, it is difficult to say with certainty which of the languages in our sample fit this description, but at least some of them likely do.
Based on our findings, the typical profile of segmentally marked vocatives is a single mid (or perhaps low) vowel. There is a large number of vocative suffixes in our sample that fit this segmental profile: for example, the Hindi plural vocative marker /-e/ (Kachru 2006; from Proto-Indo-European *-e; Fortson 2010); the Georgian postconsonantal vocative marker /-o/ (Baerman, Brown, and Corbett 2002); the Fore vocative /-o/ (Scott 1978); the Awtuw vocative /-ə/ (Feldman 1986); the Kickapoo vocative /-e/ (Voorhis 1974); the Ket vocative markers /-a/, /-o/ and /-ə/ (Georg 2007); and many more examples could be listed. The frequency of such markers in our sample fits well with the historical pathway outlined in the introduction, whereby vocative markers arise through the morphologisation of tune-bearing intrusive vocoids. In this case, tune-driven pressures manifest as the first step of a process of morphologisation. They appear as low-level gradient adjustments that later transform into categorical patterns through mechanisms that are independent of tune-carrying capacity (e.g. misperception; Ohala 1981, Blevins 2004). Borrowing terminology from the sound change literature, such tune-driven pressures exert themselves during the initiation phase of a change. The pattern then spreads through the speech community through other mechanisms (e.g. Milroy and Milroy 1985: 347-8).
The data set also contains a relatively large number of particles and clitics with segmental properties that are similar to the suffixes listed above: mid/low vowels without any consonants (or with highly sonorous consonants). Examples include the prenominal vocative particle /a/ in Eton (Van de Velde 2008); the prenominal vocative particle /e/ in Maori (Bauer, Parker, and Evans 1993); the postnominal particles /(j)o/ and /(j)e/ in Mani (Childs 2011); the Lezgian prenominal vocative particle /ja/ (Haspelmath 1993); the Amele postnominal vocative particles /e/ and /o/ (Roberts 1987); the Khalkha Mongolian postnominal clitics /aa, ee, oo, ɔɔ/ (Janhunen 2012); and the Mian postnominal clitic /o/ (Fedden 2011). Since clitics and especially particles are more independent of the stem than suffixes, it seems less likely that these markers originated as intrusive vocoids. While a move from free-floating grammatical elements towards suffixes is a well-attested grammaticalisation cline, the opposite pattern is rarely seen: segmental material that is strongly associated with the stem (such as intrusive vocoids) does not typically become independent of it (Hopper and Traugott 1993: 7). However, exceptions may exist, such as Hellmuth's (2020, in press) proposed re-analysis of phrase-final vowel insertion in polar questions as a particle in Tunisian Arabic.
This suggests that such markers may have arisen through a different pathway. One likely candidate is the grammaticalisation of so-called conative interjections as vocative markers. Conative interjections are typically 'aimed at getting someone's attention or they demand an action or response from someone' (Ameka 2006: 744). Conative interjections include English hey, oi, Hungarian hé /heː/ and Catalan ei. Although we have not been able to find any cross-linguistic studies of the phonology of conative interjections, they appear to fit the key patterns in our sample: they disprefer consonants (perhaps with the exception of /h/) and tend to contain mid vowels. Moreover, this class of words overlaps with vocatives in terms of its function and is under some of the same communicative pressures: it often carries a characteristic intonation contour and needs to be capable of being transmitted over longer-than-usual distances. A particle preceding the target noun is also in a prosodically prominent position enabling the realisation of a significant component of the vocative contour, which is commonly characterised by an initial rise in pitch. In words with initial consonants, this rise might be obscured to some extent. A preceding vowel may therefore assist in the realisation and perceptual retrieval of vocative contours. This enhancement of the tune-carrying capacity of vocative constructions might further increase the likelihood of conative interjections being grammaticalised as vocatives.
Thus, the grammaticalisation of conative interjections provides an alternative route through which some of the observed patterns in our sample may have emerged. In this case, the role of tune-driven pressures is to shape the interjections themselves, which then give rise to vocative markers through regular and well-attested pathways of grammaticalisation. Although we can only speculate about the specific tune-driven pressures that shape interjections, there are at least two plausible candidates. First, speakers may have some direct control over the phonological shape of interjections by choosing from multiple alternatives or by adjusting them in phonetically gradient ways. Secondly, conative interjections that are not heard by the listener due to aspects of their phonological shape (e.g. because they do not contain sufficient acoustic energy) may fail to be learnt/reproduced by the listener, which puts them at a selective disadvantage compared to acoustically more salient interjections (cf. Wedel 2006 for a similar account of the selective disadvantage experienced by ambiguous tokens of phonological categories).
We have outlined three potential pathways towards the emergence of vocatives that may account for various aspects of our data: (1) the morphologisation of prosodic manipulations directly associated with vocative chants; (2) the morphologisation of intrusive vocoids that arise out of a need to increase the 'tune-friendliness' of the stem; (3) the grammaticalisation of conative interjections as vocatives.
Let us now turn to the consonant-related predictions that our data failed to confirm: the predicted pressure against final consonants (P3) and against (voiceless) obstruents (P4). Our proposed diachronic pathways are agnostic with respect to these patterns. The first pathway can only account for suprasegmental features, while pathways (2) and (3) lead to markers without consonants. Since markers with consonants are likely to have different sources, the fact that we did not find evidence for these predictions has no bearing on the proposed pathways. This highlights a point that is rarely made but is vitally important to studies that focus on the optimisation of linguistic structure along some parameter (e.g. the drive towards efficient communication; Gibson et al. 2019): available patterns of linguistic variation constrain the pathways through which such optimisation can take place, and therefore also limit the types of optimisation that may occur. In the current case, optimisation in terms of the presence/absence of consonants is possible through pathways (2) and (3), while optimisation in terms of the quality or position of these consonants cannot emerge through the same pathways.
We also made two vowel-related predictions that the data did not confirm: we predicted that vocatives show a preference for vowels (P5), and that vowels are more likely to be phonologically long in vocatives (P6). We found that the baseline probability of vowels is close to ceiling both in vocatives and accusative-like forms. If there was an underlying difference between the two groups, it would likely be so small that it would require a much larger data set to detect reliably. As for P6, we did find a substantial number of cases where vocatives are accompanied by prosodic lengthening (cf. Section 3.1), but there is no evidence that vocative markers are more likely to contain phonologically long vowels. This analysis is conservative: had cases of prosodic lengthening been grouped with phonological length, our results would have aligned more closely with prediction P6.
In Section 2.1, we noted that some vocatives are formed by deleting segmental material, which arguably decreases the tune-bearing capacity of the stem (at least in cases where the deleted material includes highly sonorous segments). For instance, vocatives in Seediq are formed by deleting all segmental material except for the last syllable of the stem (e.g. the name Masaw becomes Saw; Adelaar and Himmelmann 2005). To understand these patterns, it is useful to note that vocatives are typically used with personal names and kinship terms, a fact that is often mentioned explicitly by grammar writers. Both names and kinship terms are often subject to processes of hypocoristic formation, which typically involve truncation (e.g. English: Thomas > Tom; Hungarian: testvér > tesó 'sibling'). This parallelism between truncated vocatives and hypocoristics is also noted by Daniel and Spencer (2009: 4): '[a] common process [of vocative formation] is phonological (i.e. not morphological) truncation (cf. the Russian 'new vocative'), similar to common types of hypocoristic formation.' We suggest that vocatives that involve segmental deletion are likely the result of a shift in the use of such hypocoristics towards a purely vocative function. This does not invalidate our general account, which predicts that some common pathways towards the emergence of vocatives involve tune-text interactions, but does not require all pathways to do so.
It is more than likely that many languages have zero-marked vocatives that are excluded from our data set due to our sampling method. In fact, zero-marking for vocatives may be more frequent than it is for accusative-like forms, which would seem inconsistent with our suggestion that vocatives are optimised for tune transmission. Note, however, that explicit marking does not necessarily increase tune-friendliness, as only sonorants are well-suited to carrying tunes; markers with obstruents may not contribute to tune-carrying capacity at all. Moreover, it is not clear to what extent we can classify zero-marked vocatives as morphological case forms, and therefore it may not even be meaningful to talk about such forms being phonologically optimised to carry tunes. Finally, the point about some but not all pathways favouring tune-text interactions stands here as well. It seems plausible that nominatives are a prominent source of vocative forms, and since nominatives are often zero-marked, this would yield a large proportion of zero-marked vocatives. Some of these forms may then be enriched with further phonological material through the pathways outlined above, while others may remain zero-marked. The presence of such zero-marked forms does not go against our claim that tune-text interactions play an important role in the emergence of morphologically marked vocatives.
Our findings align well with the literature on phonetic variability and segmental alternations across the world's languages. Languages commonly lengthen existing vowels, insert non-lexical vowels or suppress vowel deletion/devoicing in the presence of communicatively relevant pitch modulations. The phonetic forces that drive these alternations and explain their directionality are always present in the transmission of spoken language. However, our study goes further than merely demonstrating the presence of tune-text interactions. We show that, under the right circumstances, they can give rise to robust, systematic and categorical patterns by percolating up to phonology and morphology. To our knowledge, the present study of vocatives is the first demonstration of interactions between tune and text that have become part of the grammar.

Conclusion
This article contributes to our understanding of how language balances across multiple dimensions of meaning through a combination of prosodic and segmental devices. We found a systematic interaction between the acoustic requirements of pitch patterns used to convey pragmatic meaning and the segmental make-up of morphological markers. Specifically, our data suggest that the emergence of vocatives favours historical pathways that enhance the tune-bearing capacity of vocative constructions, resulting in a strong preference for vocalic markers. This implies that a comprehensive understanding of the evolution of linguistic systems must not be limited to propositional content (cf. Foulkes et al. 2018; Roettger and Grice 2019). There are other levels of meaning that language users express through often-neglected aspects of the speech signal such as prosody. These contribute significantly to the evolution of linguistic patterns and exert a visible pressure on other levels of linguistic organisation.
1. In the limited sample of non-zero-marked nominative forms available to us, the proportion of forms with consonants is 58.5%, that is, still substantially higher than in vocatives. This is a suggestive finding, but, due to the small size of this sample, we do not analyse it further, as it is unlikely to yield conclusive results.
2. We note that many of the mid vowels in our data were transcribed or spelt using the symbols ⟨e⟩ or ⟨o⟩. When taken at face value, these symbols denote front and back vowels, respectively, and are therefore more peripheral than a central schwa, which appears to be at odds with our prediction. However, since not all of our sources used IPA transcriptions, it is possible that these symbols sometimes do refer to central vowels. Moreover, in languages that lack a mid central vowel phoneme, the phonologised reflex of an intrusive schwa-like sound may, in fact, be a fully front or back mid vowel. Therefore, sounds denoted by ⟨e⟩ or ⟨o⟩ are plausible reflexes of an original mid central intrusive vocoid.