Sunday, September 9, 2018

Still Pushing the PARCC

These days Laura Slover is the Big Cheese at CenterPoint Education, a nonprofit organization that's pushing its own brand of ed reform. Not getting too deeply into this outfit at the moment, but it includes team members like Tony Bennett, who lost his education job in Florida over his misbehavior at his previous reform job in Indiana; and Emily Alvarez, one of several folks who migrated here from PARCC where she was working after leaving lobbyist work. The head of the board is Paul Pastorek, formerly reformy boss of Louisiana and now an ed reform "transformation" consultant. Advisors include folks from Teach Plus and KIPP, plus admittedly some actual teachers. And they are generously supported by the reform-friendly William and Flora Hewlett Foundation.

Look! It's a river in Egypt!

But up until last year, Slover was the CEO at PARCC, and as evidenced by her piece at FutureEd (and reprinted at Education Next), she is still pushing for the Big Standardized Test that the Common Core tried to launch. Her co-author, Lesley Muldoon, is a former PARCC founder who also migrated to CenterPoint. And their article reads mostly like ad copy for the PARCC, with so many details spun and stretched that the article begins to resemble a big sticky ball of cotton candy. Like Arne Duncan, Slover and Muldoon want to try rewriting history. Let's see how they did.

When the U.S. Department of Education awarded $350 million to two consortia of states in September 2010 to develop new assessments measuring performance of the Common Core State Standards, state commissioners of education called it a milestone in American education.

That is true for a number of reasons, not the least of which is that the government spent $350 million on a product and then allowed a private company to take ownership of it and profit from it.

"By working together, states can make greater—and faster—progress than we can if we go it alone," said Mitchell Chester...

Chester was the Massachusetts ed commissioner who headed up the PARCC board from 2010 to 2015. Slover once called him the "Johnny Appleseed of US education policy," a comparison she probably wouldn't have made if she'd known more about Johnny Appleseed (he was a Swedenborgian who believed that he would have a brace of virgins waiting for him in heaven-- in my town, we know our Johnny Appleseed lore).

I'll give Slover and Muldoon one thing-- they don't flinch from some facts. They note participation in the two state consortia dropped from 44 to 16. "The reasons for leaving vary," they write, "but the decrease in participation makes it easy for some to declare the program a failure." Well, yes. That's true. It's easy to declare the program a failure because it has failed. Its goal was to make all states (and schools within them) comparable because they would all be measured by the same instrument. That goal has not been achieved.

But in ed reform, as in all political endeavors, when you've failed, there's only one thing to do-- admit failure, listen to your critics, examine the cold hard facts of how you failed, reflect on what you've done, admit your mistakes, and do better next time. Ha! No, just kidding. The only thing to do is move the goalposts, and Slover and Muldoon have their backhoe all revved up and ready to go.

A closer look, however, suggests that Commissioner Chester’s optimism was not misplaced. Indeed, the testing landscape today is much improved. In many states, assessments have advanced considerably over the previous generation of assessments, which were generally regarded as narrowly focused, unengaging for students, and pegged at low levels of rigor that drove some educators to lower expectations for students.

This then is our new story. Common Core and the testing regimen that was attached to it have made the testing world better, an analysis that is rather like the flip side of the repeated promise over the last decade that a new generation of Really Great Tests was just around the corner.

So much baloney is needed to sell this story. Previous tests had "low levels of rigor that drove some educators to lower expectations for students"? First of all, I'm not going to attempt to count the number of qualifiers-- many, some-- that hedge every statement Slover and Muldoon make. Second, I dare you to find me five classroom teachers in the entire country who ever, ever said, "This single Big Standardized Test that they give at the end of the year isn't very rigorous, so I'm just going to slack off." I'm not surprised that Slover and Muldoon suggest otherwise-- part of the point of the BS Test has always been to create leverage so that the judgment of test manufacturers could override the judgment of classroom teachers, and the excuse for doing so was always that classroom teachers had lousy judgment and weren't trying very hard. "But if we hit them with this big, hard test at the end of the year, they'll have to do a good job of teaching what we want them to teach." The PARCC and SBA were always an insult to classroom teachers.

Today, many state assessments measure more ambitious content like critical thinking and writing...

No. No, they don't. They really, truly don't. Standardized tests, which by their very nature block out divergent and deep thinking, are incapable of measuring critical thinking. Heck, the mere fact that students must come up with an answer RIGHT NOW without chance to reflect, research, and just plain think, guarantees that they cannot "measure" critical thinking. Nor has a standardized test yet been invented that can do a decent job of assessing writing. We have taught our students to beat the writing test, and the tricks are-- restate the prompt, write a lot (even if it's repetitious), use some big words (even if you use them incorrectly), and never ever worry whether your content is correct or not.

But now these women who deeply believe in the PARCC's success, but who have gotten out of Dodge themselves, will give reflections on how the PARCC and SBA changed the testing landscape. Spoiler alert: they will not mention that the landscape has been changed by the billions of dollars now spent on BS Tests across the country.

One of the most important features of state tests today is their focus on college and career readiness. Unlike in the past, tests now measure a broad range of knowledge and skills that are essential to readiness and report students’ progress toward that goal. Tests of old, like the standards undergirding them, often fell short of measuring the most important knowledge and skills that are critical for being prepared for college and for work.

Three sentences, and only one is correct. Tests of old did fall short. Tests of new are not any better. First, we still have no idea what qualities are needed for college and career readiness. Nobody anywhere has a proven checklist of those qualities, particularly not a checklist that covers qualities common to every single major at every single college plus every single career option. And since we don't know what the qualities are, we certainly don't know how to test for them on a BS Test. So the first sentence in the above paragraph is false.

Second, the current BS Tests do not measure a "broad range of knowledge and skills." They cover reading and math. And not only do they cover a narrow range of disciplines, but they are deliberately designed not to cover knowledge. Reading tests are based on the (false) assumption that reading is a set of skills that exist independent of any prior knowledge. Despite claims to the contrary, test manufacturers still include questions that are essentially vocabulary questions. But are any of these tests covering knowledge of any content, like the plot of Hamlet or the invention of algebra or how to balance a checkbook? And once again, we have no clear idea of what knowledge and skills are "critical for being prepared for college and for work," so there's no way to include them on a test.

PARCC and Smarter Balanced set these advances in motion by establishing common performance levels for the assessments across the states in their consortia...

Do they? Because in practice it seems that states set their own performance levels-- in fact, that was one of the reasons many left the consortium. Slover and Muldoon cite several pieces of "research" throughout, but since they frequently turn to the Fordham Institute, and Fordham is well paid to promote Common Core and testing, I'm prepared to be unmoved by those citations.

The fact that these common performance levels are shared by multiple states means that for the first time at this scale, states are able to compare individual student results. 

But they aren't. 34 states aren't using consortium tests, so we're still comparing apples and oranges and mangos and hamburgers. They toss out an NCES study that shows... something? States have raised cut scores compared to the NAEP, which proves... what?

Taken together, this research is clear that the consortia assessments, particularly PARCC, set a higher standard for student proficiency and that most other states—whether administering a consortium test or not—raised the bar as well. These new, shared expectations of what students should know and be able to do reflect the expectations of the world of college and the workforce much more fully than did their predecessors.

And so, after almost a decade of this, where's the payoff? If these new expectations do reflect college and workforce preparation (and we should believe they do based on what, exactly-- what research helped you know and measure the unknowable and unmeasurable?) then where's the payoff? Where are the mobs of high school graduates now sailing through college because they are so ready? Where are the colleges saying, "We've just stopped offering remedial classes for freshmen because nobody needs them"? Where are the businessmen saying, "We're thriving because today's high school grads are so totally ready for us"? Even reformer Jay Greene has been pointing out that raising BS Test scores doesn't appear to reflect any reality in the actual world.

For many years, large-scale assessments have been a black box for educators, providing limited opportunities for them to participate in test development and little information on what's assessed, how it will be scored, and what to do with the results. While many states have historically had a representative set of teachers review test items, the consortia were able to foster a depth and breadth of educator engagement that set a new bar for the industry. Indeed, the consortia engaged thousands of classroom educators to review items and offer insights on development of key policies such as accessibility and accommodations and performance-level setting.

BS Tests have remained firmly sealed in the black box. PARCC has been aggressive in monitoring and tracking down students and teachers who violate the requirement for test secrecy. Teachers are not even supposed to look at the test, and when students or teachers leak even a general description of test items, PARCC has tracked them down. In 2016, when a set of items leaked, PARCC had Google take down every blog post that provided even a vague general description of the items (I know, because one of the posts taken down was mine). Nobody is ever supposed to discuss the contents of the test, ever. Teachers and students get test scores back, but they may never know exactly what questions were missed. None of this has to do with test quality; it is strictly to control costs for the test manufacturers. If items are never leaked, they can be recycled, because making an actual new test would cut into company profits.

As long as the top secret requirements for test contents are in place, claims of transparency are a joke. Allowing a small group of handpicked educators to "review items" does not change the fact that under the new testing regime, teachers have even less information about "what's assessed [and] how it will be scored." Nor can PARCC, which is in the business of selling testing and not actual teaching, offer useful advice about what to do with the scores, and since teachers aren't allowed to know where exactly the score came from, it remains a useless piece of data. The release of old items is of little use, and the claim that "engagement from teachers and administrators helped align the assessments with instructional practices effective teachers use in the classroom" is a fancy way of saying that some folks have figured out some effective test prep techniques. Just in case I haven't been clear on this before, test prep is not education.

The design of the assessments has also helped push the education field in important ways by sending signals about the critical knowledge and skills for students to master at each grade level.

It is not admirable to use testing as a backdoor method of taking control of curriculum, particularly because large-scale standardized testing has pushed curriculum in the direction of test prep.

Writing is a prime example: The consortia assessments include more extensive measurement of writing than most previous state assessments, and include a strong focus on evidence-based writing.

Writing is a prime example of how these tests have failed. The "evidence-based writing" questions are a grotesque parody of actual writing; these questions start with the assumption that everyone would respond to the prompt with the exact same paragraph, and instead of asking for actual writing, they require students to select from among pre-written sentences, or to choose which piece of "evidence" they are supposed to use. These writing tests require huge amounts of test prep, because they don't reflect anything that actual writers in the real world do-- they just reflect what test manufacturers are able to do (at low cost and maximum standardization) to pretend to test writing.

Slover and Muldoon try to back this up by offering that "we have heard from educators" that this has really helped with writing across the curriculum. But if we're going to talk "evidence-based", then saying "we have heard from educators" is glaringly weak evidence.

But keep your hand on your jaws, because more droppage is on the way. In talking about how PARCC and SBA helped pioneer the use of computers to deliver assessments, Slover and Muldoon offer this claim:

Technology-enhanced items allowed for measuring knowledge and skills that paper and pencil tests could not assess, typically deeper learning concepts; computer-delivered tests could also allow for more efficient test administration technology and improve access to the assessments for students with disabilities and English learners.

Chew on that. Computer tests can measure knowledge and skills that paper tests cannot. Really? Name one. Okay, there are actually several, all related to being able to operate a computer-- hence standardized tests in which a student's score hinges on their ability to deal with the software and hardware of the interface. Is the student comfortable with a mouse? Are they able to read selections through those tiny windows that only show a few lines of text at a time? Can they deal with scrolling within scrolling? These can all end up mattering, and the cure is simple-- more test prep on operating a computerized testing environment.

But if the suggestion here is that computers can test reading skills or math knowledge that paper cannot, I'm stumped as to what those skills and knowledge could actually be. What "deeper learning concepts" are only accessed by computer?

I will give them the improved access for some students. That one I believe.

Slover and Muldoon are also mostly correct to say that computer based tests can be scored faster and are cheaper than paper (a savings that may or may not be passed on to schools). But they also require tests that only ask the questions that a computer can score. This is why test manufacturers dream of software that can score writing samples, but despite their frequent claims, they have still failed to do so, and computer based tests still require questions with simple answers. Even then, students have to learn to think like the programmers. As a Study Island student once told me, "I know the answer. I just can't figure out how the program wants me to say it."

Slover and Muldoon find it "remarkable" that so many states have transitioned to online testing, sidestepping the more important question of whether or not that's a good thing. And of course they completely ignore the question of data mining and the security and uses of that data.

Above all, the experience of the consortia demonstrated that collective state action on complex work is doable. It can improve quality significantly, and it can leverage economies of scale to make better use of public dollars. Indeed, states that left the consortia to go it alone, ended up spending millions of dollars to develop their new tests from scratch.

Does it prove that kind of work is doable? Because at the moment, that work remains undone. States may have found that "going it alone" was expensive-- and yet, that didn't move any of them to say to the consortium, "We want to come back!" In fact, one of the things that didn't happen is for a state to switch teams-- nobody said, "We'd like to back out of PARCC so that we can join up with SBA." All of this would suggest that the vast majority of states found "collective state action" not very appealing or effective. The myth of "improved quality" sounds nice, but it's not an evidence-based statement; it's simply a piece of marketing fluff.

Slover and Muldoon claim there is more to do, and like Arne Duncan, they blame the failure to achieve certain Grand Goals on politics.

For example, concerns about testing time caused the PARCC states to move away from their initial bold vision of embedding the assessments into courses and distributing them throughout the year. This was an innovative design that would have more closely connected assessment to classrooms, but states ultimately determined it was too challenging to implement at scale.

Yes, that was a terrible idea that nobody wanted to pursue, in no small part because it boiled down to letting PARCC and SBA design your entire local scope and sequence. This is a bad idea for several reasons. One is that it involves essentially privatizing public schools and leaving local curriculum design in the hands of a test manufacturing business. And if you can get past that, there's the part where the tests aren't very good, and designing a course around the multiple bad tests over the course of the year yields a bad course. And finally-- who the heck thinks that more standardized tests would be a good idea? Slover and Muldoon seem truly oblivious to the degree to which testing has shortened the teaching year. And testing is not teaching. Test prep is not education. They say that "luckily" ESSA opens the door to this foolishness. States would be better off opening the door to a candygram from a land shark.

And Slover and Muldoon are still sad that all these individual tests get in the way of comparing test results across state lines. "Parents and policymakers" are supposedly sad about this, because all the time parents say, "Well, I can see Pat's results, but how does his school compare to one that's seven hundred miles away?" What about the NAEP?

In contrast, NAEP—which is administered once every two years to a sample of students in 4th, 8th, and 12th grades—serves as an important high-level barometer of student progress in the nation, but doesn’t provide information to school systems that can be used to inform academic programming, supports and interventions, or professional learning.

Holy shniekies where are my blood pressure pills!? The PARCC and SBA do not, do not, do not, DO NOT provide actionable data that can be used to "inform academic programming, supports and interventions, or professional learning." They just don't. They provide numbers that are the data equivalent of saying, "This kid did well, this kid did not so well, this kid did very well, this kid sucked, etc etc etc. And no, you may not know what exactly they did well or poorly." Lacking any real actionable data, schools have been reduced to trying various test prep programs, and that's it.

Slover and Muldoon are sad that opt outs in some states have made the data "not as useful" because they didn't reflect all the students. The data were never useful.

And-- oh, Lordy, here we go--

Finally, we learned that leaders taking on an ambitious reform agenda should not give short shrift to the communications and outreach required to build support for and understanding of the work—including building strong relationships with stakeholders and seeking to form coalitions of supporters. Reform leaders should not assume that good work on its own will win the day, especially if key stakeholders don’t know about or support it.

It's the last resort of every failed reformster-- "We didn't fail. The PR just wasn't good enough. People just didn't understand."

It is true that Common Core and the related testing regimen were rolled out like a steamroller, with an antagonistic attitude of "we know you public schools and public school teachers all suck and we're going to force you to shape up" that didn't help matters. But do Reformsters want to argue that this wasn't really their attitude and they were just faking it to motivate us? I mean, the "public ed sucks and is filled with bad teachers who must be forced to do their jobs as we see fit" was offensive, but I think it was at least honest.

And what pitch would have been better? "We'd like to roll out a battery of unproven tests, and we'd like to use them as a means of finding and punishing bad schools, and maybe bad teachers, too. And we'd like to take up a chunk of your 180 days of school to administer it. And we'd like to keep everything in the test a secret so that you never know exactly what your students messed up on. And the best predictor of how students will do on these tests will be their socio-economic background. And while we're at it, we'd like to tell you what you should be teaching, because any professional expertise you might have doesn't mean squat to us."

How exactly could that have been rephrased to better win hearts and minds?

Come on. You guys sought really hard to build coalitions of supporters by doing things like having guys like Bill Gates write huge checks to astroturf test advocacy and Common Core groups. You sold the national unions on it. Support didn't erode because people didn't understand what was going on. It was the exact opposite; the more people on the ground saw how this reformster idea played out, the less they liked it.

While some Reformsters like Jay Greene are looking at the evidence and honestly reflecting on how this all failed, Slover and Muldoon are saying "good work on its own" won't necessarily win the day. And while that may be true, it's also true (and any classroom teacher can tell you) that when something really doesn't succeed, you might want to question your assumptions about how good it was in the first place and not just start blaming politics and bad attitudes and everything except the crappiness of your idea.

That is not going to happen here. Slover and Muldoon will wrap up by saying, again, that "the quality of state testing has improved substantially in recent years," having provided no evidence that this is actually true. I don't know if Slover and Muldoon are cynical publicists for the cause or simply deep in denial, but it is long past time to stop trying to sell the PARCC as a success. I'll grant you this-- it has been part of a testing program that has successfully paved the way for Competency Based Education and Personalized [sic] Learning-- but that is nothing to be proud of. Better to just get off that river in Egypt.


  1. Dear Leslie and Laura,
    Please Google the, "Does this count?" rule.
    Best wishes

  2. I live in MD. We have CC and PARCC. Our Governor doesn't like PARCC because the parents are unhappy, so we will be dumping PARCC for a computer adaptive test that aligns better to our curriculum and takes less time. New Meridian will likely be the vendor.....guess who owns PARCC?.....New Meridian. Rebranding is all that we will get. The Governor knows it and he thinks we're all too stupid to figure this out.

  3. "You sold the national unions on it."

    Well, to be fair, it wasn't exactly a hard sell....