Here is a paper by Canaan Breiss and Bruce Hayes (I will refer to the paper as B&H). To briefly summarize B&H’s main empirical point: the paper shows that the choice of syntactic ‘structure’ (i.e., both the choice of terminals and their arrangement) is probabilistically biased towards avoiding phonotactically problematic sequences (e.g. a sequence of two consecutive sibilants), even when the sequence in question arises across a word boundary. It does so by focusing on a series of well-established phonological constraints (from work on word-level phonology), and showing that word bigrams whose juncture violates these constraints are underattested. This is shown to be the case in a variety of corpora, both written and spoken. Let’s refer to this as Evidence for Phonologically-Influenced Choice of Syntactic Structure, or EPICSS for short.
NB: I have a qualm with the veracity of B&H’s empirical investigation, but it’s not what I’d like to focus on in this post, so I’ll just register it and move on. The issue is that B&H is measuring what happens at “word boundaries” – i.e., where we put the spaces – and I don’t believe that where we put the spaces is a faithful representation of any significant non-orthographic grammatical property. (There are of course phonological words, and there are individual morphemes; but it is clear that B&H is referring to neither of these when it says “across word boundaries.”) Sampling the set of environments where spaces occur is therefore essentially guaranteed to grab what is, grammatically speaking, a mixed bag of environments, some of which are every bit as “word-internal” as the environments that word-level phonology was developed for in the first place (except that they’re written with a space). For example, in the history of English, ‘today’ used to be written as ‘to-day’, and, before that, as ‘to day’, and it’s not immediately clear that anything else about it changed over that time course except how it was written. It remains to be seen, then, how much of the “signal” in B&H’s result is simply driven by this: single phonological words that just happen to be written with a space in the middle.
Setting this empirical qualm aside, though, I’d like to focus on the theoretical point that this empirical investigation is leveraged to argue for. The claim is that EPICSS militates against fully “feed-forward” theories of grammar, where syntax is phonology-free (as well as semantics-free), and phonology does not enter the picture until after spellout, once syntax hands over the output of the syntactic derivation to the phonological component. Instead, the argument goes, such results favor a “parallel” grammatical architecture (of the kind advocated for by Jackendoff, among others), where phonology and syntax (and semantics, too) are intertwined, and share information bidirectionally. EPICSS, according to B&H, shows that phonological considerations play a role in even the most basic combinatorics of syntax. (B&H also compares phrase-internal word boundaries, where the effects are significant, to pairs of words that span a major phrase boundary, where they are not.)
B&H is not the first contribution to discuss EPICSS. It cites earlier work by Stephanie Shih, Kevin Ryan, Arto Anttila, and others, which adduces different sources of EPICSS. That work, however, is far more cautious than B&H in what conclusions it attempts to draw from EPICSS (see in particular the discussion in Anttila 2016), and for good reason. B&H’s architectural argument, I will argue, is based on a somewhat fundamental misunderstanding of what syntactic theory is a theory of.
Obviously, it is not god-given that one’s theory needs to be about any particular X, and so one is free to envision theories that are about whatever one wants; but criticizing the feed-forward theories that have been put forth in the literature does require an accurate understanding of what those theories purported to address, and what they did not purport to address. Certainly if one looks at Chomsky’s Aspects, which is given in B&H as the reference-of-record for the feed-forward model, one sees that syntactic theory was supposed to be a ‘machine’ that generated exactly and only the licit form-meaning pairings in a given language – or, to be somewhat anachronistic, exactly and only the licit <PF, LF> pairings. Arguably, no greater aspirations were attached to the theoretical characterization of the syntax of the adult speaker. (There were of course lofty aspirations aimed in other directions, e.g. language acquisition.)
Let’s assume, somewhat artificially, that for some LF q it turns out that there are multiple PFs p₁, p₂, … pₙ such that <p₁, q>, <p₂, q>, … <pₙ, q> are all deemed licit by the syntax of the language under consideration. Syntactic theory, as characterized in the previous paragraph, has nothing at all to say about what governs the choice between these different PFs. This, of course, is an abstraction; but abstractions are neither “right” nor “wrong” (or rather, they are always “wrong” in the sense of being at some nonzero distance from reality), and are to be judged only on whether they are fruitful. So, perhaps the real point of the literature I’m referring to is that its authors do not like the abstractions made in syntactic theory, and would prefer different ones. That would all be well and good, but it would not bear on feed-forward theories at all, given what those are supposed to be theories of. You don’t criticize Newton’s choice to treat objects as points in a frictionless vacuum on the grounds that it does not account for atmospheric resistance. He knew that going in, and so did everyone who read his work. (Nor, importantly, do you criticize it on the grounds that the factors that are involved in atmospheric resistance happen to also be forces and velocities and actions and reactions etc.) So the fact that syntactic theory does not account for the speaker’s choice among various licit <PF, LF> pairings is not a problem for syntactic theory. And that is so even if that choice, when finally carried out, also makes use of some linguistic properties.
Perhaps the point of B&H is slightly different, though, and can be phrased as follows: how could phonological factors possibly influence the choice between two syntactic structures S and S’ if syntax is encapsulated from phonology? If that is the question under consideration, then the misunderstanding is not about syntactic theory and its goals, but about modularization in general. Here, an example might be instructive: I remember seeing research indicating that speakers of English modulate their VOT (Voice Onset Time; roughly: how long you wait after releasing a voiceless stop before (re)commencing vibration of your vocal folds), based on their social identity, and their perception of the social identity of their interlocutors. Furthermore, a speaker’s VOT informs the hearer’s own perception of the speaker’s social identity. Suppose all this is true. Does this mean that the cognitive component that keeps track of social identity is not fully encapsulated from the component that governs VOT, and, instead, the two are part of one single giant parallel architecture (which, at that point, might as well encompass all of cognition)? Notice that here, too, the information flow would have to be bidirectional, since one’s own social identity informs one’s own VOT, while the VOT of a speaker informs the hearer’s perception of the speaker’s social identity.
An alternative view of the VOT example, which is far more attractive, I think, would be to say that the competence system sets certain boundaries (or perhaps a characteristic probability distribution) for what is an acceptable VOT for voiceless obstruents in the language in question. But within the bounds that the competence system has set, the option that is actually utilized in a given setting is something that is connected to all kinds of factors, both linguistic and extra-linguistic, and it can both inform and be informed by these other factors. So: phonology and phonetics belong to a cognitive module strictly distinct from the one that social identity belongs to; but the latter module can still interact with the choice made among the different candidates (in this case, candidates with different VOT values) that the competence system has deemed well-formed.
If you buy this logic – and gosh, I certainly hope you do, because on the opposite side lies a very steep slope towards mental globalism (i.e., no modules) – then you can probably already see the flaw in B&H’s argument. The fact that phonological factors can influence the choice among multiple well-formed syntactic structures is no more an argument against feed-forward models (or against the autonomy of syntax, really) than the VOT result is an argument against dissociating linguistic cognition from general social cognition.