Sunday, May 11, 2025
ICYMI: Mom's Day Edition (5/11)
Friday, May 9, 2025
The Failed Case for Super-NAEPery
At The74 (the nation's most uneven education coverage), Goldy Brown (Whitworth U and AEI/CERN) and Christos Makridis (Labor Economics and ASU) have a bold idea that involves putting fresh paint on a bad old idea--the national Big Standardized Test.
Their set-up is the usual noise about how the National Assessment of Educational Progress (NAEP) peaked around 2013, which is true if you believe that the rise that carries I-80 across the Bonneville Salt Flats is also a peak. They are more accurate when they say that "student outcomes" (aka "Big Standardized Test scores") have "largely stagnated" over recent decades.
[Image caption: Yep, it's a roller coaster]
Let me digress for just a moment to note the oddness of that idea of stagnation--as if test scores should keep rising like stock prices and property values. Each cohort of students should be smarter and better than the one before, a thing that would happen... why? What's the theory here? Each year's children will be genetically better than those that came before? That every teacher will significantly up her game with every passing year (because the students rotate out at a much higher rate than the teachers)? Schools get better at gaming the tests? If the expectation is that each successive group of students will score higher than the group before, what is supposed to cause that to happen? And how does it square with the people who think that education should be going "back" to something like "basics"? I mean, doesn't the vision of non-stagnating test scores include students who are all smarter and more knowledgeable than their parents?
Okay, digression over. The authors also point out that Dear Leader and his crew have "downsized" the staff that oversees the NAEP (while simultaneously insisting that NAEPing will continue normally)-- but they argue that the kneecapping will "create an opportunity to rethink the role this tool can play."
In particular, the Trump Administration could explore using the NAEP to promote greater transparency among schools, parents, and local communities, as well as to enhance academic rigor and ensure genuine accountability in a comparable way across schools and states. That would mean replacing a disparate collection of state tests with a single national assessment administered to every fourth and eighth grade student every year.
Yikes. I checked quickly to see if Brown and Makridis are over 15 years of age, because if so, they should remember pretty clearly that the feds have tried this exact thing before. Every state was supposed to measure their Common Core achievements by taking the same BS Test, except then that turned out to be two BS Tests (PARCC and SBA) but then those turned out to be expensive and not-very-good tests and states started dumping them, while folks from all ends of the spectrum noted that this sure looked like an illegal attempt to control curriculum from the federal level.
With national standards and national testing, supporters argued, we would be able to compare students from Utah and Ohio, as if that was something anyone actually wanted to do. As if in Utah parents were saying, "Nice report card, Pat, but what I really want to know is how your test scores compare to the test scores of some kid in Teaneck, New Jersey."
No, these guys have to remember those days, because they are well versed in all the same bad arguments made at the time.
Parents, educators, and state leaders agree that more information — not more bureaucracy — is needed to make informed decisions for their children and communities, as well as to foster greater competition. Making the NAEP a truly national assessment would provide this information in a consistent, credible, and actionable manner.
Right. Test scores would be great for unleashing free market forces in a free market, education-as-a-commodity choice system. Also, competition doesn't unleash anything useful in education. Also also, choice fans have mostly stopped using this talking point because it turns out charter and voucher schools don't actually do any better on BS Tests. Get up to date, guys-- today it's all "choice is a virtue in and of itself" and "parents should get to choose a school that matches their values."
The writers call for the NAEP to be cranked out every year instead of every other, and for every student instead of the current sampling. No sweat, they say, because every state already has stuff in place for their own state test.
But an annual universal NAEP would be great because it's a "consistent and academically rigorous measure of student performance." There's a huge amount of room to debate that, but it only sort of matters because the writers have fallen into the huge fallacy of NAEP and PISA and all the rest of these data-generating numbers. "If we had some good solid data," says the fallacy, "then we could really Get Shit Done." We would Really Know how students are doing, we would Really Know about how bad the state tests are, and we would Really Know where the issues in the system are.
It's an appealing notion, and it has never, ever worked. For one thing, nobody can even agree on what critical terms like "proficient" mean when it comes to NAEP. But more importantly, the solid data of NAEP never solves anything. Everyone grabs a slice, applies it to the policies they were busy pushing anyway, and NAEP solves nothing, illuminates nothing, settles nothing.
The writers also want to use the test illegally in a method now familiar to both political parties. Tie Title I funding to compliance with NAEP testing mandates and presto-- "States would have a stronger incentive to align their instructional practices with higher expectations." In other words, test + money = federal control of local curriculum. Not okay.
A national benchmark can support local autonomy while enabling cross-district comparisons that inform parents, educators, and policymakers alike.
Federal initiatives to improve student outcomes have historically produced mixed results.
Yes, and theater trips to see "Our American Cousin" have historically produced mixed results for Presidents. Of the whole list of "mixed" results, they include just the Obama era attempt to use test scores to drive teacher improvement (well, not "improvement" exactly, but teaching to the test in order to raise scores).
In an era of educational fragmentation, the NAEP stands out as a uniquely credible and underutilized tool. Repurposing it as the primary national assessment — administered annually to all 4th and 8th graders in states receiving Title I dollars — would promote transparency, reduce redundant testing, and align incentives around higher academic standards. This reform would offer a shared benchmark to evaluate progress across states and districts. At a time when parents, educators, and policymakers are calling for both accountability and flexibility, a restructured NAEP provides a rare opportunity to deliver both.
AI Is Bad At Grading Essays (Chapter #412,277)
What Pearson and its competitors do in the area of essay scoring is not a science. It's not even an art. It's a brutal reduction of thought to numbers. The principles of industrial production that gave us hot dogs now give us essay scores.
The main hurdles to computerized grading have not changed. Reducing essay characteristics to a score is difficult for a human, but a computer does not read or comprehend the essay in any usual understanding of the words. Everything the software does involves proxies for actual qualities of actual writing. This paper from 2013 still applies-- robograders still stink.
Perelman and his team were particularly adept at demonstrating this with BABEL (the Basic Automatic B. S. Essay Language Generator), a program that could generate convincing piles of nonsense which robograders consistently gave high scores. Sadly, it appears that BABEL is no longer online, but I've taken it out for a spin myself a few times-- the results always make robograders look incompetent (see here, here, here, and here).
The study of bad essay grading is deep. We have some classic studies of the bad formula essay. Paul Roberts' "How To Say Nothing in 500 Words" should be required reading in all ed programs. Way back in 2007, Inside Higher Ed ran this article about how an essay that included, among other beauties, reference to President Franklin Denelor Roosevelt was an SAT writing test winner. And I didn't find a link to the article, but in 2007 writing instructor Andy Jones took a recommendation letter, replaced every "the" with "chimpanzee," and scored a 6 out of 6 from the Criterion essay-scoring software at ETS. You can read the actual essay here. And as the classic piece from Jesse Lussenhop notes, part of robograding's problem is that it has adopted the failed procedures of grading-by-human-temps.
Like self-driving cars, robograding has been just around the corner for years. If you want to dive into my coverage here at the Institute, see here, here, here, here, here and here for starters. Bill Gates was predicting it two years ago, and just last year, an attempt was made to get ChatGPT involved which was not quite successful and very not cheap. Which is bad news because the "problem" that robograding is supposed to solve is the problem of having to hire humans to do the job. Test manufacturers have been trying to solve that problem for years (hence the practice of using undertrained minimum wage temps as essay graders).
That brings us up to the recent attempt by The Learning Agency. TLA is an outfit pushing "innovation." It (along with the Learning Agency Lab) was founded by Ulrich Boser in 2017, and they partner with the Gates Foundation, Schmidt Futures, Georgia State University, and the Center for American Progress, where Boser is a senior fellow. He has also been an advisor to the Gates Foundation, Hillary Clinton's Presidential Campaign, and the Charles Butt Foundation--so a fine list of reform-minded left-leaning outfits. Their team involves former government wonks, non-profit managers, comms people and a couple of Teach for America types. The Lab is more of the same; there are more "data scientists" in this outfit than actual teachers.
TLA is not new to the search for better robograding. The Lab was involved in a competition, jointly sponsored by Georgia State University, called The Feedback Prize. It was a coding competition run through Kaggle, in which competitors were asked to root through a database of just under 26K student argumentative essays that had been previously scored by "experts" as part of state standardized assessments between 2010 and 2020 (which raises a whole other set of issues, but let's skip that for now). The goal was to have your algorithm come close to the human scoring results, and the whole thing is highly technical.
Now TLA has dug through data again, to produce "Identifying Limitations and Bias in ChatGPT Essay Scores: Insights from Benchmark Data." They grabbed their 24,000 argumentative essay dataset and let ChatGPT do its thing so they could check for some issues.
Does ChatGPT show bias? A study just last year said yes, it does, which is always a (marketing) problem because tech is always sold with the idea that a machine is perfectly objective and not just, you know, filled with the biases of its programmers.
This particular study found bias that it deemed lacking in "practical significance," except when it didn't. Specifically, the exception was the difference between Asian/Pacific Islander and Black students, which underlines how Black students come in last in the robograding.
So yes, there's bias. But the other result is that ChatGPT just isn't very good at the job. At all. There's more statistical argle bargle here, but the bottom line is that ChatGPT gives pretty much everyone a gentleman's C. To ChatGPT, nobody is excellent and nobody is terrible, which makes perfect sense because ChatGPT is not qualified to determine anything except whether the string of words that the writer has created is, when compared to a million other strings of words, probable. ChatGPT cannot tell whether the writer has expressed a piercing insight, a common cliche, or a boneheaded error. ChatGPT does not read, does not understand.
Using ChatGPT to grade student essays is educational malpractice. It is using a yardstick to measure the weight of an elephant. It cannot do the job.
TLA ignores one other question, a question studiously ignored by everyone in the robograding world-- how is student performance affected when they know that their essay will not be read by an actual human being? How does one write like a real human being when your audience is mindless software? What will a student do when schools break the fundamental deal of writing--that it is an attempt to communicate an idea from the mind of one human to the mind of another?
This is one of the lasting toxic remnants of the modern reform movement--an emphasis on "output" and "product" that ignores input, process, and the fact that there are many ways to get a product-- particularly if that's all the people in charge care about.
"The computer has read your essay" is a lie. ChatGPT can scan your output as data (not as writing) and compare it to the larger data set (also not writing any more) and see if it lines up. Your best bet as a student is to aim for the same kind of slop that ChatGPT churns out thoughtlessly.
Add ChatGPT to the list of algorithmic software that can only do poorly a job it should not be asked to do at all.
Thursday, May 8, 2025
AI And Dead Writers
In a world-first, the bestselling novelist of all time offers you an unparalleled opportunity to learn the secrets behind her writing, in her own words. Made possible today by Agatha's family, an expert team of academics and cutting-edge audio and visual specialists, as if she were teaching you herself...
As if.
A team of academics combined or paraphrased statements from Christie’s archive to distill her advice about the writing process. They took care to preserve what they believed to be her intended meaning, with the aim of helping more of her fans interact with her work, and with fiction writing in general.
And if you haven't heard, yes, there's already a whole industry out there that will create a chatbot fake of your dead relative so you can pretend to talk to them. Except you aren't talking to them. And that isn't them, and it's not even a valid imitation of them.
What Do They Mean By "Gender Ideology"?
Gender ideology is the theory that the sex binary doesn’t capture the complexity of the human species, and that human individuals are properly described in terms of an “internal sense of gender” called “gender identity” that may be incongruent with their “sex assigned at birth.”
The plain truth: Gender ideology does not accommodate the reality of sex—the reproductive strategy of mammals including human beings. Sex, in this reckoning, is not an objective truth about men and women. We are not male or female by virtue of our body structure or the fact that our bodies are oriented around the production of sperm or eggs. Human beings are, in essence, psychological selves with internal senses of gender—like disembodied gendered souls. These “gender identities” are independent of, and can be incongruent with, the bodies that God gave us and that medicine has come to associate with “male” and “female.” These “sex” categories are mere conventions, says the gender ideologue, not facts.
Richards says that "gender acolytes" will "rarely speak so bluntly," because he is apparently certain that to just say what he just said would offend and upset people. I'm not so sure. It seems like a fairly good definition; what's less clear is why it's objectionable.
The standard right wing complaint seen here is one we've seen again and again-- there is One Objective Truth and we know it and people who disagree are stupid or nuts or evil. Learning to be in the world is all about learning the One Objective Truth about everything (as discovered by some dead white guys); "critical thinking" is about learning how to unfailingly arrive at the One Objective Truth. People who talk about different points of view are just trying to cause trouble. This has animated endless arguments since the Mayflower docked, and it is the foundational principle behind the classical school movement.
The other argument here is a less common one, but Richards seems to be arguing that human beings are just flesh and bone, and talk about psychological selves or souls is just silly. God gave us a body, but our psychological selves, our souls, come from... somewhere else? There's a more obvious flaw with this part of the argument-- God does in fact give people bodies that are intersex in a wide variety of ways (also, am I in trouble because I wear glasses and had cataract surgery to correct the eyes that God gave me at birth?). But I am fascinated to see the christianist Heritage Foundation arguing against the idea of souls and asserting that we are just meat sacks, and the nature of the meat sacks determines all that we are.
Ultimately Richards falls back on depending on "what you know to be true" in resisting the gender ideologues who try to tell you that human beings are varied and different, because "you know" they aren't. Nor does one have to search far to find the same right wing folks arguing that not only are there just two sexes, but the correct ways to be a real man or woman are limited to only a few choices (go get to making some babies, missy).
We're talking about fundamentally different ways to view the world, which is why these arguments inevitably land in schools and debates about whether students should be taught about how to navigate through a rich and complicated world or whether they should be taught that for every question, there is one Right and True answer and all others should be avoided and suppressed-- not even mentioned or acknowledged to exist. It is one thing to disagree with a point of view, and a whole other thing to insist that it not even be mentioned. Maybe, the hope goes, if we can commandeer education and teach only the One Objective Truth and suppress all the rest, children will grow up to see the world as we do. Good luck with that.
Sunday, May 4, 2025
ICYMI: Star Wars Edition (5/4)
Drawing a Line
Battle lines being drawn
Friday, May 2, 2025
"Religious Liberty" is the new "State's Rights"
Attorney Michael Farris, speaking on behalf of a Virginia church, said the IRS had investigated it for alleged violations of the Johnson Amendment, which requires churches to refrain from participating in political campaigns if they want to keep their tax-exempt status. Representatives from Liberty University and Grand Canyon University also claimed their institutions were unfairly fined because of their Christian worldview.
Additional allegations included the denial of religious exemptions to COVID-19 vaccine mandates for military personnel, biased treatment of Christian Foreign Service Officers, and efforts to suppress Christian expression in federal schools and agencies. Critics further accused the Biden administration of marginalizing Christian holidays while giving prominence to non-Christian observances, and of sidelining faith-based foster care providers.
Speakers also alleged that Christian federal employees were retaliated against for opposing DEI and LGBT-related policies that conflicted with their religious beliefs.
Recent Federal and State policies have undermined this right by targeting conscience protections, preventing parents from sending their children to religious schools, threatening funding and non-profit status for faith-based entities, and excluding religious groups from government programs.
"Conscience protections" is another favored construction, as in "my conscience tells me that I shouldn't treat Those People like people and how dare you infringe on my right to do that."
The modern rejoinder to someone claiming that the Civil War was not about slavery, but about states' rights, is to ask, "The state's right to do what?" The answer, of course, is "The state's right to perpetuate a system of enslavement."
When someone on the far right starts talking about religious liberty, the question is "The liberty to do what?" The answer is, "The liberty to enjoy a position of high privilege from which we can decide which people we think are worthy of civil rights." Or more simply, "The liberty to discriminate against others without consequence."
It all makes me sad because it is the worst testimony ever for the Christian faith. It's the kind of thing that makes my non-believing friends and relatives point and say, "See? Religious people are just as awful as anyone." There are actual Christians in the world, and they deserve better than this. There are people who daily wrestle with how to live out their faith in the world in challenging situations, and they deserve better than this. If your assertion is that you can't really, truly follow Christ unless you are freely enabled to treat certain people like shit, then you are talking about some Jesus that I don't remotely recognize. You are not talking about religious liberty; you're talking about toxic politics with some sort of faux Jesus fig leaf.