Thursday, February 11, 2016

Fordham Provides More Core Testing PR

The Fordham Institute is back with another "study" of circular reasoning and unexamined assumptions that concludes that reformster policy is awesome. 

The Thomas B. Fordham Institute is a right-tilted thinky tank that has been one of the most faithful and diligent promoters of the reformster agenda, from charters (they run some in Ohio) to the Common Core to the business of Big Standardized Testing.

In 2009, Fordham got an almost-a-million dollar grant from the Gates Foundation to "study" Common Core Standards, the same standards that Gates was working hard to promote. They concluded that the Core was swell. Since those days, Fordham's support team has traveled across the country, swooping into various state legislatures to explain the wisdom of reformster ideas.

This newest report fits right into that tradition.

Evaluating the Content and Quality of Next Generation Assessments is a big, 122-page monster of a report. But I'm not sure we need to dig down into the details, because once we understand that it's built on a cardboard foundation, we can realize that the details don't really matter.

The report is authored by Nancy Doorey and Morgan Polikoff. Doorey is the founder of her own consulting firm, and her reformy pedigree is excellent. She works as a study lead for Fordham, and she has worked with the head of Educational Testing Service to develop new testing goodies. She also wrote a nice report for SBA about how good the SBA tests were. Polikoff is a testing expert and professor at USC Rossier. He earned his PhD from UPenn in 2010 (BA at Urbana in 2006), and immediately raised his profile by working as a lead consultant on the Gates Measures of Effective Teaching project. He is in high demand as an expert on how to test and implement Common Core, and he has written a ton about it.

So they have some history with the materials being studied.

So what did the study set out to study? They picked the PARCC, SBA, ACT Aspire and Massachusetts MCAS to study. Polikoff sums it up in his Brookings piece about the report.

A key hope of these new tests is that they will overcome the weaknesses of the previous generation of state tests. Among these weaknesses were poor alignment with the standards they were designed to represent and low overall levels of cognitive demand (i.e., most items requiring simple recall or procedures, rather than deeper skills such as demonstrating understanding). There was widespread belief that these features of NCLB-era state tests sent teachers conflicting messages about what to teach, undermining the standards and leading to undesired instructional responses.

Or consider this blurb from the Fordham website:

Evaluating the Content and Quality of Next Generation Assessments examines previously unreleased items from three multi-state tests (ACT Aspire, PARCC, and Smarter Balanced) and one best-in-class state assessment, Massachusetts’ state exam (MCAS), to answer policymakers’ most pressing questions: Do these tests reflect strong content? Are they rigorous? What are their strengths and areas for improvement? No one has ever gotten under the hood of these tests and published an objective third-party review of their content, quality, and rigor. Until now.

So, two main questions-- are the new tests well-aligned to the Core, and do they serve as a clear "unambiguous" driver of curriculum and instruction?

We start from the very beginning with a host of unexamined assumptions. The notion that Polikoff and Doorey or the Fordham Institute are in any way objective third parties seems absurd, but it's not possible to objectively consider the questions because that would require us to unobjectively accept the premise that national or higher standards have anything to do with educational achievement, that the Core standards are in any way connected to college and career success, that a standardized test can measure any of the important parts of an education, and that having a Big Standardized Test drive instruction and curriculum is a good idea for any reason at all. These assumptions are at best highly debatable topics and at worst unsupportable baloney, but they are all accepted as givens before this study even begins.

And on top of them, another layer of assumption-- that having instruction and curriculum driven by a standardized test is somehow a good thing. That teaching to the test is really the way to go.

But what does the report actually say? You can look at the executive summary or the full report. I am only going to hit the highlights here.

The study was built around three questions:

Do the assessments place strong emphasis on the most important content for college and career readiness (CCR), as called for by the Common Core State Standards and other CCR standards? (Content)

Do they require all students to demonstrate the range of thinking skills, including higher-order skills, called for by those standards? (Depth)

What are the overall strengths and weaknesses of each assessment relative to the examined criteria for ELA/Literacy and mathematics? (Overall Strengths and Weaknesses)

The first question assumes that Common Core (and its generic replacements) actually includes anything that truly prepares students for college and career. The second question assumes that such standards include calls for higher-order thinking skills. And the third assumes that the examined criteria are legitimate measures of how weak or strong literacy and math instruction might be.

So we're on shaky ground already. Do things get better?

Well, the methodology involves using the CCSSO “Criteria for Procuring and Evaluating High-Quality Assessments.” So, here's what we're doing. We've got a new ruler from the emperor, and we want to make sure that it really measures twelve inches, a foot. We need something to check it against, some reference. So the emperor says, "Here, check it against this." And he hands us a ruler.

So who was selected for this objective study of the tests, and how were they selected?

We began by soliciting reviewer recommendations from each participating testing program and other sources, including content and assessment experts, individuals with experience in prior alignment studies, and several national and state organizations. 

That's right. They asked for reviewer recommendations from the test manufacturers. They picked up the phone and said, "Hey, do you know anybody who would be good to use on a study of whether or not your product is any good?"

So what were the findings?

Well, that's not really the question. The question is, what were they looking for? Once they broke down the definitions from CCSSO's measure of a high-quality test, what exactly were they looking for? Because here's the problem I have with a "study" like this. You can tell me that you are hunting for bear, but if you then tell me, "Yup, and we'll know we're seeing a bear when we spot its flowing white mane and its shiny horn growing in the middle of its forehead, galloping majestically on its noble hooves while pooping rainbows," then you are not actually hunting for bear.

I'm not going to report on every single criterion here-- a few will give you the idea of whether the report shows us a big old bear or a majestic, non-existent unicorn.

Do the tests place strong emphasis on the most important content etc?

When we break this down it means--

Do the tests require students to read closely and use evidence from texts to obtain and defend responses? 

The correct answer is no, because nothing resembling true close reading can be done on a short excerpt that is measured by close-ended responses that assume that all proper close readings of the text can only reach one "correct" conclusion. That is neither close reading nor critical thinking. And before we have that conversation, we need to have the one where we discuss whether or not close reading is, in fact, a "most important" skill for college and career success.

Do the tests require students to write narrative, expository, and persuasive/argumentation essays (across each grade band, if not in each grade) in which they use evidence from sources to support their claims?

Again, the answer is no. None of the tests do this. No decent standardized test of writing exists, and the more test manufacturers try to develop one, the further into the weeds they wander, like the version of a standardized writing test I've seen that involves taking an "evidence" paragraph and answering a prompt according to a method so precise that all "correct" answers will be essentially identical. If there is only one correct answer to your essay question, you are not assessing writing skills. Not to mention what bizarre sort of animal a narrative essay based on evidence must be.

Do the tests require students to demonstrate proficiency in the use of language, including academic vocabulary and language conventions, through tasks that mirror real-world activities?

None, again. Because nothing anywhere on a BS Test mirrors real-world activities. Not to mention how "demonstrate proficiency" ends up on a test (hint: it invariably looks like a multiple choice Pick the Right Word question).

Do the tests require students to demonstrate research skills, including the ability to analyze, synthesize, organize, and use information from sources?

Nope. Nope, nope, nope. We are talking about the skills involved in creating a real piece of research. We could be talking about the project my honors juniors complete in which they research a part of local history and we publish the results. Or you could be talking about a think tank putting together some experts in a field to do research and collecting it into a shiny 122-page report. But you are definitely not talking about something that can be squeezed into a twenty-minute standardized test section with all students trying to address the same "research" problem with nothing but the source material they're handed by the test. There are few-to-no research skills tested there.

How far in the weeds does this study get?

I look at the specific criteria for the "content" portion of our ELA measure, and I see nothing that a BS Test can actually provide, including the PARCC test for which I examined the sample version. But Fordham's study gives the PARCC a big fat E-for-excellent in this category.

The study "measures" other things, too.

Depth and complexity are supposed to be a thing. This turns out to be a call for higher-order thinking, as well as high quality texts on the test. We will, for the one-gazillionth time, skip over any discussion of whether you can be talking about true high quality, deeply complex texts when none of them are ever longer than a page. How exactly do we argue that tests will cover fully complex texts without ever including an entire short story or an entire novel?

But that's what we get when testing drives the bus-- we're not asking "What would be the best assortment of complex, rich, important texts to assess students on?" We are asking "What excerpts short enough to fit in the time frame of a standardized test will be good enough to get by?"

Higher-order responses. Well, we have to have "at least one" question where the student generates rather than selects an answer. At least one?! And we do not discuss the equally important question of how that open response will be scored and evaluated (because if it's by putting a narrow rubric in the hands of a minimum-wage temp, then the test has failed yet again).

There's also math. 

But I am not a math teacher, nor do I play one on television.

Oddly enough 

When you get down to the specific comparisons of details of the four tests, you may find useful info, like how often the test has "broken" items, or how often questions allow for more than one correct answer. I'm just not sure these incidentals are worth digging past all the rest. They are signs, however, that researchers really did spend time actually looking at things, which shouldn't seem like a big deal, but in a world where NCTQ can "study" teacher prep programs by looking at commencement fliers, it's actually kind of commendable that the researchers here really looked at what they were allegedly looking at.

What else?

There are recommendations and commendations and areas of improvement (everybody sucks-- surprise-- at assessing speaking and listening skills), but it doesn't really matter. The premises of this entire study are flawed, based on assumptions that are either unproven or disproven. Fordham has insisted they are loaded for bear, when they have, in fact, gone unicorn hunting.

The premises and assumptions of the study are false, hollow, wrong, take your pick. Once again, the people who are heavily invested in selling the material of reform have gotten together and concluded once again that they are correct, as proven by them, using their own measuring sticks and their own definitions. An awful lot of time and effort appears to have gone into this report, but I'm not sure what good it does anybody except the folks who live, eat and breathe Common Core PR and Big Standardized Testing promotion.

These are not stupid people, and this is not the kind of lazy, bogus "research" promulgated by groups like TNTP or NCTQ. But it assumes conclusions not in evidence and leaps to other conclusions that cannot be supported-- and all of these conclusions are suspiciously close to the same ideas that Fordham has been promoting all along. This is yet another study that is probably going to be passed around and will pick up some press-- PARCC and SBA in particular will likely cling to it like the last life preserver on the Titanic. I just don't think it proves what it wants to prove.

1 comment:

  1. So, it's as if Petrelli's personally conducted taste-test allowed him to conclude that his personal recipe for the ideal chocolate covered pickled herring sandwich produces the best tasting chocolate covered pickled herring sandwich. After which he reminds us how eating his sandwich will endow us with the super-power of our choice.