Generative grammar, understood as a theory of the human capacity for linguistic cognition, is an explicitly modular theory. There are both empirical and conceptual reasons for this. I’ll get to the empirical ones later in this post; but on the conceptual front, very simple and straightforward considerations in the philosophy of science favor a modular approach over a globalist, associationist one. That is because a modular theory constitutes a particular edge case of the associationist theory; specifically, it is an edge case in which a particular set of connection weights happens to be zero, and furthermore, the set of those zero-weight connections carves out what is an informationally, representationally, and/or computationally distinct unit of cognitive machinery. But post-hoc analysis of the derived internal structures of associationist models is notoriously difficult to do (and only gets more difficult as the models scale up, as is the case with current Deep Learning models, for example). Therefore, if we take a globalist associationist model as our starting point, we may never discover that the modular approach was in fact correct. In contrast, if we start with a modular approach, we can investigate where and when this approach strains under the weight of the facts, and why.
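To make the "edge case" point concrete, here is a minimal sketch. Everything in it is an assumption of mine for illustration: a toy six-unit network, hypothetical weight values, and a helper I've called `modules`. Units 0-2 and 3-5 stand in for two candidate modules. When every cross-group weight is exactly zero, the network falls apart into independent components, and those components can be recovered from the weights alone.

```python
from collections import deque

def modules(weights):
    """Group units that are linked by nonzero weights (connected components)."""
    n = len(weights)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        group = []
        queue = deque([start])
        seen.add(start)
        while queue:
            i = queue.popleft()
            group.append(i)
            for j in range(n):
                # A connection counts if it is nonzero in either direction.
                if j not in seen and (weights[i][j] != 0 or weights[j][i] != 0):
                    seen.add(j)
                    queue.append(j)
        groups.append(sorted(group))
    return groups

# Hypothetical toy weights: units 0-2 and units 3-5 share no connections.
modular = [
    [0.0, 0.4, 0.1, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.7, 0.0, 0.0, 0.0],
    [0.2, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6, 0.2],
    [0.0, 0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.8, 0.3, 0.0],
]

# The same weights with one small nonzero cross-group connection added.
entangled = [row[:] for row in modular]
entangled[2][3] = 0.05

print(modules(modular))    # [[0, 1, 2], [3, 4, 5]]
print(modules(entangled))  # [[0, 1, 2, 3, 4, 5]]
```

The toy case is easy only because the zeros are exact and the matrix is tiny. Nothing here bears on how hard the same analysis is for large trained models, which is the difficulty noted above.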
A major recurring theme in the last 20‑30 years of linguistic research has therefore been to examine and re‑examine the boundaries between different grammatical modules, and to question where those boundaries are drawn and why. This has manifested itself in a number of different types of work, including: arguing that phenomenon P is properly construed as part of module M rather than module N; arguing that phenomenon Q belongs neither in module M nor in module N, but instead arises at the M‑N interface; and so forth. All of this, I believe, is a natural and necessary part of theorizing within a modular framework.
But one thing that I think is sometimes overlooked is the fact that, under Chomsky’s (1995 et seq.) Minimalist Program, the motivations and goals of modularization have undergone a subtle but significant shift. Consider: Chomsky’s “Strong Minimalist Thesis” (SMT) states that the only irreducibly syntactic operation or relation is Merge. Everything else that was once thought of as syntactic must be either a “third factor” effect (efficient computation, etc.), or, alternatively, an interface effect – the result of an interaction between syntax and systems of phonology and of semantics and pragmatics. Of course, the facts of language did not suddenly change upon arrival of the SMT on the scene. And that meant that the empirical burden previously borne by syntactic explanations had to, rather inescapably, be shifted onto these other two modes of explanation. Of equal importance: it was quickly recognized that the “third factor” explanations currently on offer are little more than speculative just-so stories.
This left precisely one mode of explanation available for the bulk of syntactic explananda: other modules (morpho-phonology on the one hand, and semantics/pragmatics on the other) and their interfaces with syntax. This is most clearly evidenced, I think, in the proliferation of proposals in which “postsyntax,” especially on the morpho-phonological side, is enriched with computational capabilities that look a heck of a lot like syntax itself. (See Bobaljik 2008, for a representative example.) Here, then, the goal of modularizing the grammar was no longer simple philosophy of science; it was, instead, a way of working towards a maximally “thin” syntax, in the hopes of vindicating the SMT, the central conjecture of minimalist theorizing. (This is what Marantz 1995: 380‑381 called “the end of syntax.” And I feel fine.)
Importantly, these different motivations for modularization are often in direct conflict. For example, the more prima facie syntactic work is offloaded to “postsyntax,” the more the latter starts to resemble syntax itself, and the less computationally and representationally distinct syntax and postsyntax become – undermining the original, general, philosophy-of-science kind of modularization. Let me be more explicit: if your “postsyntax” traffics in, e.g., copies, chains, the A vs. A‑bar distinction, etc., then you have done violence to the very computational and representational distinctions that are supposed to distinguish syntax from other modules, in the first place.
This conflict is not necessarily a bad thing, scientifically speaking, because it sets up a research question – namely, which modular approach (if any) is on the right track. But what must be acknowledged is that, because the conceptual considerations at play often stand in this kind of opposition, the question is one that is best adjudicated on empirical grounds.
One of the key outgrowths of the “thin syntax” approach, at least on the morphosyntax side, has been the move – in practice, if not in theory – from (1) to (2):
(1) A process or relation that informs both morpho-phonology and semantics must be situated in the syntactic module.
(2) A process or relation that does not inform both morpho-phonology and semantics should not be situated in the syntactic module.
A useful way to think about (1) vs. (2) is in terms of ‘sufficient’ vs. ‘necessary’ conditions. The statement in (1) says that informing both morpho-phonology and semantics is a sufficient condition for inclusion in the syntactic module. The statement in (2) elevates the same condition to a necessary one. This makes sense if your goal is a maximally “thin” syntax: adhering to (2) maximizes the set of things you can offload to other, non‑syntactic modules of grammar. But (2) makes little sense on the prior, more venerable approach to modularity.
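Spelled out as conditionals, the two statements look as follows. The predicate names are my own shorthand, not standard notation: I(p) means that process or relation p informs both morpho-phonology and semantics, and S(p) means that p is situated in the syntactic module.

```latex
\begin{align*}
\text{(1)}\quad & I(p) \rightarrow S(p)
  && \text{($I$ is sufficient for $S$)} \\
\text{(2)}\quad & \neg I(p) \rightarrow \neg S(p)
  \;\;\equiv\;\; S(p) \rightarrow I(p)
  && \text{($I$ is necessary for $S$)}
\end{align*}
```

On this rendering, (2) is the converse of (1), so neither follows from the other; adopting (2) is an additional commitment rather than a restatement of (1).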
It’s noteworthy in this context that Distributed Morphology, and much of the work that is loosely associated with it, took (2) and ran with it. Declension classes? They “must” be postsyntactic, because they are semantically inert. Morphological case (a.k.a. m‑case)? It “must” be postsyntactic (in this case, both because of (2) and because it purportedly informs no narrow-syntactic processes; more on that below). Person/number/noun-class agreement? Well, it depends on m‑case, so if the latter is postsyntactic, so must agreement be (Bobaljik 2008). And on and on it goes.
As a side note, early DM’s own position on roots stood in violation of both (2) and (1): the choice between √DOG and √CAT clearly affects both morpho-phonology and semantics, and yet early DM placed it entirely outside of syntax, positing that syntax itself had only one, completely undifferentiated root object. Unsurprisingly, this then required a separate line of communication between PF and LF to enforce the correlation between the choice of morpho-phonological content and the choice of encyclopedic content, a move that, as I’ve noted elsewhere, is straight-up incoherent. Thankfully, more data was then brought to bear making it clear that the early DM position on roots was not only theoretically untenable, but empirically untenable, as well.
But back to the difference between (1) and (2). I’d like to first note that on the syn-sem side, no one in their right mind would subscribe to (2), not even as a heuristic. For example, (2) applied to QR would yield the conclusion that QR is not syntactic (since it does not have any morpho-phonological footprint). As I have noted repeatedly on this blog, there is something of a dearth these days of people directly comparing the syntax-PF mapping to the syntax-LF mapping and vice versa, which gives rise to the mistaken impression among practitioners concentrating on one of the two that the other is somehow fundamentally different. (Call it “PF-/LF-exceptionalism,” if you will.) But for the same reasons noted above, favoring the pursuit of the more restrictive theory first, we should assume that the mappings are identical (up to the substance sitting on the other side of the mapping, i.e., morphological exponents vs. whatever the fundamental units of meaning are), unless and until the facts force us otherwise. And it turns out that there are surprisingly few facts of the relevant sort, I think.
More to the point, we can and should ask: how has adopting (2) as a heuristic / working hypothesis actually fared? I think the answer is: pretty badly! Head-movement, the Person Case Constraint (PCC), and the aforementioned processes of morphological case assignment and person/number/noun-class agreement, have all been claimed at some point over the past 30 years to be postsyntactic, often partially or wholly on the basis of (2). This has resulted in a “postsyntax” that needs access to things like the fine-grained details of syntactic structure (up to and including c‑command), to get the facts of the PCC right. It also needs access to the notion of copies/chains, and the distinction between A and A‑bar movement. That’s because m‑case, at least in a large subset of languages, is sensitive neither to the final position nor the base position of a noun phrase, but to its highest A‑position. And so your postsyntax has to “undo” A‑bar movement, but not undo A‑movement, prior to performing its m‑case computation. (You should be squirming right about now.) This handout is something of a clip show that goes through these phenomena one by one, and demonstrates why relegating them to the “postsyntax” was a bad idea. What we are left with are a whole bunch of hierarchy-sensitive phenomena – which cannot be relegated to “postsyntax” under any contentful definition of how “postsyntax” is different from “syntax” – and which also don’t lend themselves to semantic explanation. In other words, what we are left with is… the Autonomy of Syntax.
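The m‑case point above can be sketched in a few lines. This is a toy under stated assumptions, not anyone's actual analysis: the `Position` record, the position labels, and the convention that a chain is listed from base position to final landing site are all hypothetical choices of mine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    label: str  # illustrative name for the position, e.g. "Spec,TP"
    kind: str   # "A" or "A-bar"

def highest_a_position(chain):
    """Return the highest A-position in a chain, or None if there is none.

    The chain is assumed to be ordered from base position (first)
    to final landing site (last).
    """
    for position in reversed(chain):
        if position.kind == "A":
            return position
    return None

# Hypothetical chain: A-movement to Spec,TP followed by A-bar movement to Spec,CP.
chain = [
    Position("Spec,vP", "A"),
    Position("Spec,TP", "A"),
    Position("Spec,CP", "A-bar"),
]

print(highest_a_position(chain).label)  # Spec,TP
```

Note what the function needs in order to run at all: the full chain, and the A vs. A‑bar status of every link in it. A "postsyntax" that computes m‑case this way is consulting exactly the information that is supposed to be proprietary to syntax.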
Does this mean that, e.g., declension classes are therefore a syntactic property? No, of course not. But it means that the mere fact that they don’t inform semantics is not itself an argument for treating them as non-syntactic. What distinguishes declension classes, on the one hand, from m‑case, agreement, head-movement, and the PCC, on the other, is that the latter are all sensitive to things like hierarchical structure and/or the A vs. A‑bar distinction, and/or (in the case of m‑case and agreement) inform other syntactic processes, whereas declension classes do none of these things. In other words, the kind of modularity considerations that turn out to be relevant are the kind we started out with, not the more recent, minimalism-driven kind.
Put another way, hewing to the minimalism-inspired kind of modularity considerations – which, to reiterate, stand in direct conflict with the notion of modularity originally employed in generative linguistics – has led the theory astray in demonstrable and rather clear ways. Syntax, it turns out, is not only autonomous (as the original modularity considerations suggested), but decidedly nontrivial (contrary to what the “thin syntax” modularity of minimalism would suggest).
If you care about syntax, this should make you happy.
Note on “third factor” explanations: My favorite example, though far from the only one, is Chomsky using efficient computation to motivate the idea that syntactic operations happen as soon as possible (2001, Derivation by Phase), and then using the very same considerations to motivate the mutually-opposing position that all operations happen simultaneously at the phase level (2008, On Phases). These two positions are contradictory because, on the latter view, operations that could in principle happen during the building of the interior of a phase must instead wait until the phase head before they can apply, in marked opposition to the view that operations happen as soon as possible. I have no idea which of these is correct, if either is; but what is quite clear is that considerations of “efficient computation” are vague enough to support whatever conclusion one wants.