Chomskyan generative grammar correctly identifies that language has generative properties (discrete infinity), but it faces significant empirical challenges: corpus studies challenge the poverty of the stimulus argument, the proliferation of functional projections resembles Ptolemaic epicycles, and the framework struggles with cross-linguistic data such as long-distance reflexives in Mandarin, Icelandic, and Japanese. Dependency grammar and construction grammar offer alternative frameworks that align better with cognitive science, neuroscience, and actual child language acquisition, without requiring movement, empty categories, or universal grammar.
Deep Dive
Chomsky was wrong. They taught me a lie.
I nearly failed out of grad school.
People who know me already know this and the politics, not the science behind it.
But I guess now you do, too. But over a decade later, I now know that it was over protecting a lie. Let me explain.
Different linguistics programs have different ways of doing things. But the one that I attended at UPenn followed the relatively normal system of having mandatory coursework in the first few years; qualifying papers, where you write a paper for publication and it's graded by your professors; and language exams. In our case, that meant translating an academic paper from a language relevant to your research, sitting a traditional competence exam (although more on that later), or writing a paper about a "sufficiently exotic" language. Yeah, that's a direct quote. I'm told that when I translated a French research paper for my first exam, I translated more than anybody ever had. I thought you were supposed to translate the whole thing in an hour and that I'd failed. You couldn't study at Penn without taking a full year of generative syntax. Does it matter that my focus was initially game-theoretic pragmatics or that I ultimately wound up writing a dissertation on sociophonetics that uses geospatial statistical methods? No. Was there a course on introduction to linguistic typology and the kinds of broad questions relevant to the field? No.
Somehow the syntacticians got a stranglehold on the foundational training requirements, and linguistic typology was taught incidentally to learning syntax in a Chomskyan tradition.
And in the first semester, the syntax professor gave me a C on the midterm, which is basically the kiss of death. I won't tell you the story of how he told me nobody has ever come back from this, that I should just quit, and that I shouldn't have been admitted in the first place; or how I listened, then asked him simply what it would look like to come back from this, and extracted, painfully extracted, by staying consistently focused on message, what his actual standards were, and then met and exceeded them with the final in the course of like 10 days. The point of the story is that generative grammar in a Chomskyan approach, first X-bar theory and government and binding and later minimalism, was the screener, the gatekeeper, the barrier, the flaming hoop that you have to jump through. If you want to know what a real intro to linguistics looks like, the broad questions, typology, what the field actually does when it isn't busy gatekeeping, I'm making one. Link in the description. Consider it the orientation Penn forgot to give me. I should probably mention that Penn at the time had a reputation for failing students out, of course, after paying their stipend for a few years, because why burn $30,000 to $100,000 when you could waste the same amount of money tormenting a grad student for a few years? It was an absolute hazing.
So, in my time there, I saw at least four people either asked not to return or given a terminal master's degree, which sounds much more morbid than it is, as a parting gift. Some just walked off and never came back. Pretty sure one left after the language exam was just an antagonistic conversation in German, with a syntax professor grilling them about the double passive construction in German, in German. So I learned syntax. I learned the [__] out of it. I learned it so hard that I wrote a paper, included in conference proceedings, that prompted the professor who threatened to fail me out (I eventually got a gentleman's B) to come find me in the grad students' office I was working in and ask me if I wrote it myself and who had helped me.
I read all the books that I could find on X-bar theory and minimalism. I have feelings about the approaches in various textbooks, including some unpublished ones. I devoured Chomsky and syntax. I had my doubts. My adviser even referred to it as a Ptolemaic system, which, if you're an academic, is a sick burn. Things my wife said in casual conversation consistently broke the model that I'd learned. And a co-author of mine on the descriptive grammar of Black English just handed me a book that brought the whole house of cards down. By the way, scan the QR code or follow the link in the description if you want to know more about that project and sign up for updates. I'm completely rebuilding my idea of how the world works, like a freaking cult survivor. And today I'm going to share that with you.
This is not going to be criticizing Chomsky for political things like denying the Cambodian genocide or his anti-government writings while accepting money from federal defense grants. And it's not going to criticize him for social things like his horrifically embarrassing interview with Ali G.
>> How many words does you know? What is some of them?
>> Or even his epistolary correspondence with Jeffrey Epstein, where he brainstormed how to rehabilitate the latter's public image after the trafficking was widely known. He, Chomsky that is, is by all accounts one who doesn't research who he's talking to and who is credulous and eager to help to a fault. He's thinking about syntax the whole time. Anyway, I'm going to keep it linguistic. First, I'm going to explain what Chomsky got right and why he's so important. Then, I'll explain the challenges to his theory, including the niggling doubts I had even at the very beginning of grad school.
And finally, I'll explain the alternatives that I'm exploring and how they're the last nail in the coffin.
This is Language Jones.
First, let's start with how Chomsky got famous and what Chomsky got right. His rise to fame in linguistics in the 1960s coincided with the cognitive revolution he helped kick off. A huge influence was his scathing 1959 review of B. F. Skinner's Verbal Behavior, eviscerating Skinner's behaviorist approach to language learning, which more or less reduced human language to stimulus and response, like classical conditioning of dogs. He was developing his theory of transformational grammar at the time.
And despite not having been enrolled at Penn for four years at the time, in 1955 he submitted a thesis and was awarded the doctorate. It's astounding how different the times are. Anyway, he wrote Syntactic Structures in 1957 and rose to fame on the basis of that and his epic takedown of Skinner two years later. His approach has changed over the years. It's been about 70 years of theorizing and work, but there are a few key points that he got really right, and I think they're worth stating explicitly. First, those who criticize Chomsky often criticize the concept of generative grammar. That's not up for debate. We clearly have the ability to make use of a limited set of symbols or mental objects and create infinite novel utterances by combining them in new ways. Not only that, but the ways we combine them are constrained. There are grammatical and ungrammatical sentences.
In linguistics, ungrammatical doesn't mean socially stigmatized, like using a double negative. It means something that completely breaks your ability to communicate or parse the sentence.
Something like, "What do you like and broccoli?" Quick aside, if you find this stuff interesting and you want a proper grounding in what linguists mean by grammar, grammatical, and the dozen or so other terms I'm about to throw around, my intro to linguistics course is in the description. It'll make the rest of the video hit a little bit harder. It's worth flagging upfront, generativity itself isn't the controversy. The alternatives I'll be talking about, dependency grammar and construction grammar are also generative in this broad sense. They account for the discrete infinity of language too.
The word generative just got captured by one specific research program. So a lot of people think rejecting Chomsky means rejecting generativity, and it doesn't.
These frameworks just generate differently: by combining constructions, or by linking heads and dependents without phrase structure trees and without movement. Chomsky and his acolytes developed a very robust system for exploring how you can take a small number of pieces and a small number of conceptual rules and generate language.
Their goal was to describe a mental architecture that can give rise to all and only natural human languages. That is, it doesn't overgenerate or undergenerate.
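To make "a small number of pieces, a small number of rules, unbounded output" concrete, here is a minimal sketch in Python. The three rewrite rules and the tiny vocabulary are invented for illustration and are not anyone's actual grammar of English; the only point is that one recursive rule (a verb phrase that can contain a whole sentence) makes the set of sentences grow without limit, while word salad like "sleep kids" is never produced.

```python
import itertools

# Toy rewrite rules (hypothetical, for illustration only).
# VP can contain a whole S again, which is what makes the output unbounded.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["kids"], ["linguists"]],
    "VP": [["sleep"], ["think", "that", "S"]],
}


def expand(symbol, depth):
    """Yield every word list derivable from `symbol` within `depth` nested rule applications."""
    if symbol not in RULES:      # a plain word: nothing left to rewrite
        yield [symbol]
        return
    if depth == 0:               # rewrite budget used up
        return
    for rhs in RULES[symbol]:
        # Expand each symbol on the right-hand side, then combine the results.
        parts = [list(expand(s, depth - 1)) for s in rhs]
        for combo in itertools.product(*parts):
            yield [word for part in combo for word in part]


def sentences(depth):
    """All sentences derivable from S within the given depth budget."""
    return sorted(" ".join(words) for words in expand("S", depth))


if __name__ == "__main__":
    for depth in (2, 4, 6):
        print(depth, len(sentences(depth)))
```

Raising the depth budget keeps adding new sentences (2, then 6, then 14 here), and every one of them starts with a noun phrase. That is the "all and only" idea in miniature: the rules generate an open-ended set, and they also rule things out.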
They ended up pursuing an approach that uses graph theory. It treats words (we're not going to define that for now) as nodes in a directed acyclic graph.
Chomsky later imposed the condition that they're all binary branching, for elegance of the theory and parsimony.
Another quick aside that'll matter later: dependency grammar also uses directed graphs, but the nodes are words connected directly to other words with no intervening phrasal nodes. No NP, no VP, no IP, no CP, just words and the asymmetric relationships between them.
That's a big deal because it means that the entire scaffolding of phrases that Chomsky's theory rests on is, in dependency grammar, just not posited.
It's an ontological commitment Chomsky made that you don't actually have to make.
Tesnière was doing this in the 1950s, parallel to and independent of Chomsky.
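As a rough illustration of "words as nodes, nothing else," here is a small Python sketch of a dependency structure. The example sentence, the relation labels, and the helper names are all assumptions made for this sketch; real dependency frameworks differ on labels and on some attachment decisions (for instance, whether the noun or the determiner is the head). Words are used directly as node names, which only works because no word repeats in this sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: str       # the governing word
    dependent: str  # the word that depends on it
    label: str      # relation name (illustrative labels, not a standard tag set)


SENTENCE = ["the", "dog", "chased", "a", "cat"]

# One arc per non-root word; there are no NP or VP nodes anywhere.
ARCS = [
    Arc("chased", "dog", "subject"),
    Arc("chased", "cat", "object"),
    Arc("dog", "the", "determiner"),
    Arc("cat", "a", "determiner"),
]


def nodes(arcs):
    """Every node in the graph; in a dependency structure these are just the words."""
    return {a.head for a in arcs} | {a.dependent for a in arcs}


def root(arcs):
    """The one word that is a head but nobody's dependent."""
    (only_root,) = {a.head for a in arcs} - {a.dependent for a in arcs}
    return only_root


def path_to_root(word, arcs):
    """Follow head links upward; raises if the structure contains a cycle."""
    head_of = {a.dependent: a.head for a in arcs}
    path = [word]
    while path[-1] in head_of:
        nxt = head_of[path[-1]]
        if nxt in path:
            raise ValueError("cycle in dependency structure")
        path.append(nxt)
    return path
```

The set of nodes is exactly the set of words in the sentence, and a five-word sentence needs only four arcs. A phrase-structure tree for the same sentence would add extra nodes for each phrase on top of the words.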
Part of Chomsky's approach was to assume that there is a conceptual category called a phrase. So for instance, you might have a noun phrase that has a head, the actual noun, and arbitrarily many modifiers. The insight is that the whole thing acts like the head, taking on its category. So when I said "the whole thing," I could just as easily replace that with "it." "It acts like the head," and the sentence is perfectly grammatical because the categories match. Whereas if I tried to only use part of it, we've got problems. Dependency grammar accounts for headedness, too. In fact, more directly: the head in DG isn't an abstraction from a phrase. It just is the word that governs the dependents.
The substitutability I just described falls out for free because when it replaces the whole thing, it's just taking the same head position in the dependency structure. You don't need to posit a phrase that acts like its head.
The head is the structural anchor from the start.
Where it gets tricky is where Chomsky adds movement. The idea is that there's an underlying, base-generated mental form of a sentence structure, and other structures are derived by movement. So normal sentence structure in English is "I gave a chocolate to my wife." To simplify, the Chomskyan approach would be to say that the passive, "My wife was given a chocolate," is derived from the basic structure of the other sentence. There's a deep structure, that's a cognitive architecture, and a surface structure, that's the actual sentence we say or write.
Now here's where construction grammar starts looking really different. A construction grammarian would say "I gave a chocolate to my wife" and "My wife was given a chocolate" aren't derivations of each other at all. They're different constructions, each pairing a form with a meaning. The active and the passive aren't transformations. They're separate form-meaning pairings with overlapping but distinct semantics and discourse functions. The passive demotes the agent and promotes the patient to subject, but not in the actual derivation of the sentence. That just captures the social and semantic difference between the structures. The passive does work in discourse that the active doesn't. Adele Goldberg's work on argument structure constructions is the classic reference here. The point is there's no derivation, no deep structure, and crucially no need to explain why some movements work and others don't, because nothing is moving.
I don't want to get too deep into the weeds, but the basic idea is that in a Chomskyan framework, you have phrases and they can move. They must move. There are all sorts of interesting problems that you have to address, like what happens with something like "I ate a whole one and my wife half," or "I want him to stop" versus "I wish he would stop," or why you can't say things like "What do you want and broccoli?" or "John believes that himself is the best."
Ultimately, the proposed mental architecture relies on the concept of phrases, the concept that you can move phrases, and the concept that there may be invisible or unpronounced material in the tree. When you really dig into this theory, first of all, you spend a lot of time getting acquainted and being evangelized to.
Lots of discussions of how it's a theory, and theories not only can but maybe should precede empirical data, and how it's really all a research program and shouldn't be evaluated against the standards of empiricism just yet. And how Chomsky is really actually a lot like Galileo. No, really. And let's pause on those invisible pieces, because they're doing a lot. Empty categories, big PRO, little pro, traces, copies: all are theoretical entities posited because the theory needs them, not because there's direct evidence for them. Both dependency grammar and construction grammar largely dispense with them. In DG, if a word isn't pronounced, it's not in the dependency tree. In construction grammar, what a Chomskyan grammar analyzes via empty categories is often handled by constructional inheritance or by the semantics of the construction itself.
This is a real Occam's razor moment. Generative grammar (GG) posits a lot of unobservable theoretical machinery, and the alternatives ask whether we need any of it. But at the end of the day, there's a problem that even Chomsky acknowledges. When you posit that the child learning a language natively has to keep track of items that can move, and that they can move to some number of invisible landing places, but also can't move to them all because of other invisible items, this grammar is computationally unlearnable. The amount of evidence and computing power that you would need to figure it out is astronomical. Chomsky turns this into a plus and builds it into his takedown of Skinner. He argues that there is a poverty of the stimulus, that is, not enough verbal material to learn how language works, and that therefore the conceptual system is innate. We're hardwired with a small set of parameters. Question words can remain where they would be in the corresponding statement, or they can move to one of two places and no others, for instance. Then there's some number of principles that our natural innate bioprogram follows to help select a parameter setting. The poverty of the stimulus argument has been substantively challenged. Pullum and Scholz's 2002 paper, "Empirical assessment of stimulus poverty arguments," is the canonical takedown.
Corpus studies of child-directed speech have shown the input is much richer than Chomsky claimed. Usage-based linguists like Tomasello, Bybee, and Goldberg argue that children learn constructions from input through general cognitive mechanisms (pattern recognition, analogy, statistical learning, intention reading), none of which are language specific. And computational models, including but not limited to modern LLMs, have demonstrated that you can acquire a lot of grammatical structure from input without a universal grammar. You don't have to endorse LLMs as a cognitive model to note that they're an existence proof against the strong learnability argument. So, we have a few issues.
First, Chomsky's approach was taught to me as though it is the only approach and the only one that's even remotely scientifically valid. It's taught as though it solved a lot of problems and created a lot of new avenues of research, which in some ways it did, but not always as portrayed. For instance, one overly technical corner of research studied in a different framework was portrayed to me as though it had been solved in a Chomskyan framework just because it could be represented in that framework. This is the same as saying that a heliocentric and a geocentric universe are two ways of saying the same thing because you can add enough epicycles to Mercury's orbit to predict its apparent location. You can get the right answer, but they're not equivalent models, and one is actually demonstrably wrong. Philosophers of science call it a notational variant. A theory that can redescribe phenomena from another framework hasn't necessarily explained them. It's just translated them. This charge has been leveled at minimalism specifically: that it's gotten so abstract that almost any phenomenon can be captured by the right combination of features and merge operations, which makes it nearly impossible to falsify.
When everything can be expressed in your framework, your framework isn't saying very much.
Second, many of the problems it solves are problems it created. Once I started learning about dependency grammars and construction grammar and all the other approaches that exist, I kept having these questions like, "But how do they address raising versus control constructions?" That's a theoretical corner of syntax that's trying to explain sentences like "John seems to be hungry" and why you can't say "To be hungry seems by John," or sentences like "John tries to eat a sandwich" and why you can't say "To eat a sandwich is tried by John." And I would keep asking how they deal with raising and control and get answers like: they don't, because they don't have to, because the problem is one that only arises when you posit movement. It's raising. It's in the name. They created a problem and then sold the solution, like Listerine and halitosis. There's a name for this too.
Theory-internal problems. A lot of problems in syntax aren't problems about language. They're problems for a particular theory. Dependency grammar treats raising and control as differences in the dependency structure between the matrix verb and the embedded predicate. "Seem" takes its subject as a structural dependent without assigning it a semantic role. "Try" assigns the subject a semantic role directly. No movement, no PRO, no traces. The phenomenon is real. The problem is theory-internal.
Not only that, but the current trend in minimalism is to posit that verbs actually have two verb slots, creatively named big V and little v, to try to capture not just light verbs like "make a scene" or "do the dishes," but also all sorts of other phenomena. And it posits that the subject of a sentence originates in V and is raised, that is, moved out of that spot, as you construct the sentence. The sentence is begun in speech with the end already in mind, because you cannot compute a sentence in this framework without starting at the end, at least in a description of English, which wouldn't be a problem if they weren't also criticized for relying too heavily on English as universal.
And this is actually part of a broader pattern in minimalism, where increasing technical machinery is needed to handle phenomena that other frameworks just handle directly. The proliferation of functional projections (TP, AgrP, vP, ApplP, FocP, whatever it is), the whole cartographic approach, starts to look very much like epicycles. Each new phenomenon gets a new functional head. Construction grammar asks: what if these aren't separate projections but just properties of constructions? Dependency grammar asks: what if these aren't structural positions at all, but information-structural or semantic positions that don't need to be encoded in the syntactic tree?
So here's a problem. Well before I knew that that doesn't accord with neuroscience, I was aware that I know plenty of people, myself included, who might start a sentence without knowing exactly how it will end. The beginning constrains my subsequent choices. A lot of the linguists working on syntax are the particular flavor of ghost pepper neurospicy that they might actually believe that we all have formulated a full and complete thought before we speak. But that's another video for another day. And there's actual psycholinguistic literature here: Levelt's work on lexical access and speech production, Ferreira's work on good-enough parsing. This stuff has shown for decades that production is incremental.
Speakers really do start sentences without fully planning them. That empirical work is much more compatible with construction-based and dependency-based approaches that don't require a fully specified deep structure to exist before any words come out of your mouth. The Chomskyan GG folks rebut that their model is one that explains the relationships among structures in an utterance, but isn't attempting to be an explicit, exact definition of what goes on in the brain. Except that's exactly what they claim. If you're not modeling language in the brain, which you absolutely are when you talk about your language acquisition device in the brain, then what are we even doing here?
And honestly, the framework slides between those two claims in ways that immunize it from both kinds of evidence.
Performance data? That's just performance, not competence. Brain data? Well, neuroscience hasn't caught up to the theory yet. This is what philosophers of science call an unfalsifiable framework. Heads I win, tails you lose. Not to mention that empirical studies consistently challenge claims about what is happening in the brain and what is possible. Psycholinguistic research demonstrated that Chomsky's whole bit about anaphors, words like "himself," is just not supported empirically. So we moved on from government and binding when it became clear that the purportedly universal Principles A, B, and C, the rare, actually falsifiable claims, were false.
At least some of them. The cross-linguistic work was particularly damaging. Long-distance reflexives in Mandarin, Icelandic, and Japanese showed that Principle A, as originally stated, couldn't be universal.
Logophoric pronouns and the discourse sensitivity of binding more generally turned out to be empirically thorny.
Each fix made the theory more baroque and less predictive. The nail in the coffin for me was the book Syntax, a cognitive approach, which is an introduction to dependency grammar. Now, here's the thing. It's still directed acyclic graphs with dependency relationships, but they posit that you don't get "John ate the cake" and "The cake was eaten by John" by transforming one into the other. There's just two different structural things. And as I explore the literal decades of literature on dependency grammar and construction grammars that I was told either didn't exist, or was not important, or was dismissed out of hand, as I look into those, they align with both the rest of the science I was familiar with and the models of cognition that other fields have robust empirical evidence for, and, frankly, common sense. Let me make this concrete. In a dependency analysis of "John ate the cake," "ate" is the root, with "John" as its subject dependent and "cake" as its object dependent, with "the" dependent on "cake."
In "The cake was eaten by John," the structure is different. "Cake" is the subject dependent of the verbal complex, and "John" is a dependent inside the by-phrase. Two distinct dependency structures, both directly produced, neither derived from the other. The relationships between them, such as the fact that they share truth conditions, are semantic and pragmatic facts, not syntactic ones. And beyond that, constructions like "the more the merrier" suddenly aren't a problem.
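The two analyses just described can be written down side by side. The Python sketch below is only an illustration under stated assumptions: the relation labels are made up, "eaten" is treated as the head of the verbal complex with "was" as an auxiliary dependent (one common choice, not the only one), and the `LINKING` table is a hypothetical stand-in for the semantic facts that relate the two structures. Nothing in the code turns one structure into the other.

```python
# Each arc is (head, relation, dependent). Labels are illustrative assumptions.
ACTIVE = {
    ("ate", "subject", "John"),
    ("ate", "object", "cake"),
    ("cake", "determiner", "the"),
}

PASSIVE = {
    ("eaten", "subject", "cake"),
    ("eaten", "auxiliary", "was"),   # assumption: "eaten" heads the verbal complex
    ("eaten", "oblique", "by"),
    ("by", "complement", "John"),    # John sits inside the by-phrase
    ("cake", "determiner", "the"),
}

# Hypothetical linking table: for each construction, the path of relations
# from the root that leads to the word filling each semantic role.
LINKING = {
    "active": {"agent": ["subject"], "patient": ["object"]},
    "passive": {"agent": ["oblique", "complement"], "patient": ["subject"]},
}


def follow(arcs, start, relations):
    """Walk from `start` along the named relations and return the word reached."""
    word = start
    for rel in relations:
        (word,) = [d for h, r, d in arcs if h == word and r == rel]
    return word


def roles(arcs, root, voice):
    """Read the semantic roles off a structure using that construction's linking."""
    return {role: follow(arcs, root, path) for role, path in LINKING[voice].items()}
```

Calling `roles(ACTIVE, "ate", "active")` and `roles(PASSIVE, "eaten", "passive")` gives the same answer (John is the agent, the cake is the patient) even though the two sets of arcs are different. The shared meaning lives in the linking, not in a derivation.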
The dative alternation, the difference between "I gave my wife chocolate" and "I gave a chocolate to my wife," ceases to be a problem. They're communicating slightly different things based on information structure and the existing context. And I don't have to figure out how to make one into the other. Even things like construction grammar's ideas of inheritance and coercion are basically exactly what they sound like if you've ever coded a class in Python.
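To spell out the Python analogy for inheritance, here is a minimal sketch. The class names, the particular hierarchy, and the `agreement` method are assumptions chosen for illustration, not a claim about how any published construction grammar organizes its network. Each class pairs a form with a meaning, and more specific constructions inherit whatever they don't override.

```python
class Construction:
    """A form-meaning pairing. Subclasses override what is specific to them."""

    form = "(unspecified)"
    meaning = "(unspecified)"

    def describe(self):
        return f"{type(self).__name__}: {self.form} <-> {self.meaning}"


class SubjectPredicate(Construction):
    form = "Subj Verb ..."
    meaning = "something is predicated of Subj"

    def agreement(self):
        # Stated once here; every more specific construction gets it for free.
        return "verb agrees with Subj"


class Transitive(SubjectPredicate):
    form = "Subj Verb Obj"
    meaning = "Subj acts on Obj"


class Ditransitive(Transitive):
    form = "Subj Verb Obj1 Obj2"
    meaning = "Subj causes Obj1 to receive Obj2"
```

`Ditransitive().agreement()` works even though `Ditransitive` never mentions agreement, which is the sense in which a general construction passes its properties down to more specific ones.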
Take coercion. When "sneeze" appears in a caused-motion construction, "She sneezed the napkin off the table," the construction coerces the verb into a transitive caused-motion reading. In a movement-based grammar, this is mysterious. In construction grammar, it's the construction contributing meaning the verb doesn't have on its own, just like how a parent class in Python contributes methods to a child class. And "the more the merrier" isn't a weird exception to be banished to the periphery. It's a construction with slots, "the X-er, the Y-er," sitting on the same continuum as the transitive and the passive. They differ in degrees of productivity and abstractness, not in kind. It's constructions all the way down. Even the dative alternation turns out not to be a single alternation. Beth Levin and others have shown different verb classes pattern differently, and the choice between forms is conditioned by information structure, weight, animacy, and discourse status. Quick pause on animacy. That's how you can say things like "I slid her the book," but you can't say "I slid the door the book."
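Both points, coercion and the animacy condition, can be sketched in a few lines of Python. This is a toy under loud assumptions: the role names, the idea of listing which roles a verb brings "by itself," and the single animacy check are all simplifications invented for this sketch, and real accounts of the dative alternation weigh several factors at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    lemma: str
    own_roles: frozenset  # roles the verb brings by itself (toy inventory)


@dataclass(frozen=True)
class Noun:
    text: str
    animate: bool


# Roles of the caused-motion construction (illustrative names).
CAUSED_MOTION = ("causer", "theme", "path")

SNEEZE = Verb("sneeze", frozenset({"causer"}))
PUT = Verb("put", frozenset({"causer", "theme", "path"}))


def caused_motion(verb):
    """Split the construction's roles by who supplies them: the verb or the construction."""
    return {
        "from_verb": [r for r in CAUSED_MOTION if r in verb.own_roles],
        "from_construction": [r for r in CAUSED_MOTION if r not in verb.own_roles],
    }


def double_object(subject, verb, recipient, theme):
    """Toy double-object construction: the recipient slot only accepts animate nouns."""
    if not recipient.animate:
        return None  # the construction's condition on its slot is not met
    return f"{subject} {verb} {recipient.text} {theme}"
```

With "sneeze," the theme and the path come from the construction rather than the verb, which is the coercion story. And `double_object("I", "slid", Noun("the door", False), "the book")` returns `None` while the same call with an animate recipient succeeds, mirroring the "slid her the book" contrast.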
None of that falls out of a movement analysis. All of it makes sense if you start from constructions. So, here I am just over 5 years out from grad school, finally letting myself say out loud what my adviser was already not so subtly hinting at when he called it Ptolemaic. The thing that nearly ended my career, the thing that I was told was the science of language, the thing that gatekept an entire generation of linguists out of a PhD program. It's not the only game in town. It's not the best game in town. And it's not even, by the standards that we hold every other science to, an especially good game. There are decades of work in dependency grammar and construction grammar that I was told just didn't exist or didn't matter or had been subsumed by the Chomskyan framework. And none of that was true.
The frameworks I was kept away from align better with cognition, better with neuroscience, better with what we actually see kids doing when they learn language, and better with the cross-linguistic data. I wasn't bad at syntax. The syntax I was being taught was bad and they should feel bad.
>> Your music's bad and you should feel bad.
>> And the relief of saying that, I can't even describe it. If you've ever been deep in something where the explanations kept not quite working, where you kept having to be told the doubts that you were having were just because you didn't really fully understand it well enough yet, where the smart people around you kept gently suggesting maybe you weren't cut out for this, you know, a cult. And then one day you read the right book and the whole thing snaps and you realize that the doubts were the data, were the facts. Yeah, that. I'm not a cult survivor in any literal sense, but the cognitive shape of it is real, and I see it in other people who escaped other intellectual traditions that overpromised and underdelivered. You're allowed to leave. If any of this resonated, if you've had your own version of this experience in linguistics or somewhere else, I want to hear about it in the comments. Subscribe if you want more in this vein, because I have a lot more to say about syntax, about what gets taught as foundational versus what actually is, and about the politics of who gets to be a linguist.
And one more time, because it matters, scan the QR code or hit the link in description for the descriptive grammar of black English project. That's the work that broke the model for me. That's the kind of linguistics I want to do.
Come along. Until next time, happy learning.