
Wednesday, March 2, 2016

Ace That Test? I Think Not.

The full court press for the Big Standardized Test is on, with all manner of spokespersons and PR initiatives trying to convince Americans to welcome the warm, loving embrace of standardized testing. Last week the Boston Globe brought us Yana Weinstein and Megan Smith, a pair of psychology assistant professors who have co-founded Learning Scientists, which appears to be mostly a blog that they've been running for about a month. And say what you like-- they do not appear to be slickly or heavily funded by the Usual Gang of Reformsters.

Their stated goals include lessening test anxiety and decreasing negative views of testing. And the reliably reformy Boston Globe gave them a chance to get their word out. The pair also blogged about material that did not make it through the Globe's edit.

The Testing Effect

Weinstein and Smith are fond of "the testing effect," a somewhat inexact term used to refer to the notion that recalling information helps people retain it. It always makes me want a name for whatever it is that makes some people believe that the only situation in which information is recalled is a test. Hell, it could be called the teaching effect, since we can get the same thing going by having students teach a concept to the rest of the class. Or the writing effect, or the discussion effect. There are many ways to have students sock information in place by recalling it; testing is neither the only nor the best way to go about it.

Things That Make the Learning Scientists Feel Bad 

From their blog, we learn that the LS team feels "awkward" when reading anti-testing writing, and they link to an example from Diane Ravitch. Awkward is an odd way to feel, really. But then, I think their example of a strong defense of testing is a little awkward. They wanted to quote a HuffPost pro-testing piece from Charles Coleman that, they say, addresses problems with the opt out movement "eloquently."

"To put it plainly: white parents from well-funded and highly performing areas are participating in petulant, poorly conceived protests that are ultimately affecting inner-city blacks at schools that need the funding and measures of accountability to ensure any hope of progress in performance." -- Charles F. Coleman Jr.

Ah. So opt outers are white, rich, whiny racists. That is certainly eloquent and well-reasoned support of testing. And let's throw in the counter-reality notion that testing helps poor schools, though after over a decade of test-driven accountability, you'd think supporters could rattle off a list of schools that A) nobody knew were underfunded and underresourced until testing and B) received a boost through extra money and resources after testing. Could it be that no such list actually exists?

Tests Cause Anxiety

The LS duo wants to decrease test anxiety by hammering students with testing all the time, so that it's no longer a big deal. I believe that's true, but not a good idea. Also, parents and teachers should stop saying bad things about the BS Tests, but just keep piling on the happy talk so that students can stop worrying and learn to love the test. All of this, of course, pre-supposes that the BS Tests are actually worthwhile and wonderful and that all the misgivings being expressed by professional educators and parents are-- what? An evil plot? Widespread confusion? The duo seems deeply committed to not admitting that test critics have any point at all. Fools, the lot of them.

Teaching to the Test

The idea that teaching to a test isn’t really teaching implies an almost astounding assumption that standardized tests are filled with meaningless, ill-thought-out questions on irrelevant or arbitrary information. This may be based on the myth that “teachers in the trenches” are being told what to teach by some “experts” who’ve probably never set foot in a “real” classroom.

Actually, it's neither "astounding" nor an "assumption," but, at least in the case of this "defiant" teacher (LS likes to use argument by adjective), my judgment of the test is based on looking at the actual test and using my professional judgment. It's a crappy test, with poorly-constructed questions that, as is generally the case with a standardized test, mostly test the student's ability to figure out what the test manufacturer wants the student to choose for an answer (and of course the fact that students are selecting answers rather than responding to open ended prompts further limits the usefulness of the BS Test).

But LS assert that tests are actually put together by testing experts and well-seasoned real teachers (and you can see the proof in a video put up by a testing manufacturer about how awesome that test manufacturer is, so totally legit). LS note that "defiant teachers" either "fail to realize" this or "choose to ignore" it. In other words, teachers are either dumb or mindlessly opposed to the truth.

Standardized Tests Are Biased

The team notes that bias is an issue with standardized tests, but it's "highly unlikely" that classroom teachers could do any better, so there. Their question-- if we can't trust a big board of experts to come up with an unbiased test, how can we believe that an individual wouldn't do even worse, and how would we hold them accountable?

That's a fair question, but it assumes some purposes for testing that are not in evidence. My classroom tests are there to see how my students have progressed with and grasped the material. I design those tests with my students in mind. I don't, as BS Tests often do, assume that "everybody knows about" the topic of the material, because I know exactly who "everybody" is in my classroom, so I can make choices accordingly. I can also select prompts and test material that hook directly into their culture and background.

In short, BS Testing bias enters largely because the test is designed to fit an imaginary Generic Student who actually represents the biases of the test manufacturers, while my assessments are designed to fit the very specific group of students in my room. BS Tests are one-size-fits-all. Mine are tailored to fit.

Reformsters may then say, "But if yours are tailored to fit, how can we use them to compare your students to students across the nation?" To which I say, "So what?" You'll need to convince me that there is an actual need to closely compare all students in the nation.

Tests Don't Provide Prompt Feedback

The duo actually agree that tests "have a lot of room for improvement." They even acknowledge that the feedback from the test is not only late, but generally vague and useless. But hey-- tests are going to be totes better when they are all online, an assertion that makes the astonishing assumption that there is no difference between a paper test and a computer test except how the students record their answers.

Big Finish

The wrap up is a final barrage of Wrong Things.

Standardized tests were created to track students’ progress and evaluate schools and teachers. 

Were they? Really? Is it even possible to create a single test that can actually be used for all those purposes? Because just about everyone on the planet not financially invested in the industry has pointed out that using test results to evaluate teachers via VAM-like methods is baloney. And tests need to be manufactured for a particular purpose-- not three or four entirely different ones. So I call shenanigans-- the tests were not created to do all three of those things.

Griping abounds about how these tests are measuring the wrong thing and in the wrong way; but what’s conspicuously absent is any suggestion for how to better measure the effect of education — i.e., learning — on a large scale.

A popular reformster fallacy. If you walk into my hospital room and say, "Well, your blood pressure is terrible, so we are going to chop off your feet," and then I say, "No, I don't want you to chop off my feet. I don't believe it will help, and I like my feet," your appropriate response is not, "Well, then, you'd better tell me what else you want me to chop off instead."

In other words, what is "conspicuously absent" is evidence that there is a need for or value in measuring the effects of education on a large scale. Why do we need to do that? If you want to upend the education system for that purpose, the burden is on you to prove that the purpose is valid and useful.

In the absence of direct measures of learning, we resort to measures of performance.

Since we can't actually measure what we want to measure, we'll measure something else as a proxy and talk about it as if it's the same thing. That is one of the major problems with BS Testing in a nutshell.

And the great thing is: measuring this learning actually causes it to grow. 

And weighing the pig makes it heavier. This is simply not true, "testing effect" notwithstanding.

PS

Via the blog, we know that they wanted to link to this post at Learning Spy, which has some interesting things to say about the difference between learning and performance, including this:

And students are skilled at mimicking what they think teachers want to see and hear. This mimicry might result in learning but often doesn’t.

That's a pretty good explanation of why BS Tests are of so little use-- they are about learning to mimic the behavior required by test manufacturers. But the critical difference between that mimicry on a test and in my classroom is that in my classroom, I can watch for when students are simply mimicking and adjust my instruction and assessment accordingly. A BS Test cannot make any such adjustments, and cannot tell the difference between mimicry and learning at all.

The duo notes that their post is "controversial," and it is in the sense that it's more pro-test baloney, but I suspect that much of their pushback is also a reaction to their barely-disguised disdain for classroom teachers who don't agree with them. They might also consider widening their tool selection ("when your only tool is a hammer, etc...") to include a broader range of approaches beyond the "testing effect." It's a nice trick, and it has its uses, but it's a lousy justification for high stakes BS Testing.

Tuesday, March 1, 2016

PTA Sells Out

Shannon Sevier, vice-president for advocacy of the National PTA, took to the Huffington Post this week to shill for the testing industry. It was not a particularly artful defense, with Sevier parroting most of the talking points put forth by test manufacturers and their hired government guns.

Sevier starts out by reminiscing about when her children took their Big Standardized Tests, and while there was fear and trepidation, she also claims to remember "the importance of the assessments in helping my children's teachers and school better support their success through data-driven planning and decision-making."

I'm a little fuzzy on what time frame we'd be talking about, because Sevier's LinkedIn profile seems to indicate that she was working in Europe from 2009-2014. Pre-2009 tests would be a different animal than the current crop. But even if she was commuting, or her children were here in the states, that line is a load of bull.

"Support their success through data-driven planning and decision-making" is fancy talk for "helped design more targeted test prep in order to make sure that test scores went up." No BS Tests help teachers teach. Not one of them. There is no useful educational feedback. There is no detailed educational breakdown of educational goals provided to teachers on a timely basis, and, in fact, in most cases no such feedback is possible because teachers are forbidden to know what questions and answers are on the test.

So, no, Ms. Sevier. That never happened anywhere except in the feverishly excited PR materials of test manufacturers.

Mass opt-out comes at a real cost to the goals of educational equity and individual student achievement while leaving the question of assessment quality unanswered.

Like most of Sevier's piece, this is fuzzier than a year-old gumball from under the bed. Exactly what are the costs to equity and individual student achievement? In what universe can we expect to find sad, unemployed men and women sitting in their van down by the river saying ruefully, "If only I had taken that big standardized test in school. Then my life would have turned out differently."


The consequences of non-participation in state assessments can have detrimental impacts on students and schools. Non-participation can result in a loss of funding, diminished resources and decreased interventions for students. Such ramifications would impact minorities and students with special needs disparately, thereby widening the achievement gap.

Did I mention that Sevier is a lawyer? This is some mighty fine word salad, but its Croutons of Truth are sad, soggy and sucky. While it is true that, theoretically, the capacity to withhold some funding from schools is there in the law, it has never happened, ever (though Sevier does point out that some schools in New York got a letter. A letter! Possibly even a strongly worded letter! Horrors!! Did it go on their permanent record??). The number of schools punished for low participation rates is zero, which is roughly the same number as the number of politicians willing to tell parents that their school is going to lose funding because they exercised their legal rights.

And when we talk about the "achievement gap," always remember that this is reformster-speak for "difference in test scores" and nobody has tied test scores to anything except test scores.

More to the point, while test advocates repeatedly insist that test results are an important way of getting needed assistance and support to struggling students in struggling schools, it has never worked that way. Low test scores don't target students for assistance-- they target schools for takeover, turnaround, or termination.

Sevier then segues into the National PTA's position, which is exactly like the administration's position-- that maybe there are too many tests, and we should totally get rid of redundant and unnecessary tests and look at keeping other tests out of the classroom as well, by which they mean every test other than the BS Tests. They agree that we should get rid of bad tests, "while protecting the vital role that good assessments play in measuring student progress so parents and educators have the best information to support teaching and learning, improve outcomes and ensure equity for all children."

But BS Tests don't provide "the best information." The best information is provided by teacher-created, day-to-day, formal and informal classroom assessments. Tests such as the PARCC, SBA, etc., do not provide any useful information except to measure how well students do on the PARCC, SBA, etc.-- and there is not a lick of evidence that good performance on the BS Tests is indicative of anything at all.

I'll give Sevier credit for stopping just short of the usual assertion that teachers and parents are all thick headed ninnimuggins who cannot tell how students are doing unless they have access to revelatory standardized test scores. But PTA's stalwart and unwavering support seems to be for some imaginary set of tests that don't exist. Their policy statement on testing, says Sevier, advocates for tests that (1) ensure appropriate development; (2) guarantee reliability and implementation of high quality assessments; (3) clearly articulate to parents the assessment and accountability system in place at their child's school and (4) bring schools and families together to use the data to support student growth and learning.

BS Tests like the PARCC don't actually do any of these things. What's even more notable about the PTA policies is that in its full version, it's pretty much a cut and paste of the Obama administration's dreadful Test Action Plan, which is in turn basically a marketing reboot for test manufacturers.

Did the PTA cave because they get a boatload of money from Bill Gates? Who knows. But what is clear is that when Sevier writes "National PTA strongly advocates for and continues to support increased inclusion of the parent voice in educational decision making at all levels," what she means is that parents should play nice, follow the government's rules, and count on policy makers to Do The Right Thing.

That's a foolish plan. Over a decade of reformy policy shows us that what reformsters want from parents, teachers and students is compliance, and that as long as they get that, they are happy to stay the course. The Opt Out movement arguably forced what little accommodation is marked by the Test Action Plan and ESSA's assertion of a parent's legal right to opt out. Cheerful obedience in hopes of a Seat at the Table has not accomplished jack, and the National PTA should be ashamed of itself for insisting that parents should stay home, submit their children to the tyranny of time-wasting testing, and just hope that Important People will spontaneously improve the tests. Instead, the National PTA should be joining the chorus of voices demanding that the whole premise of BS Testing should be questioned, challenged, and ultimately rejected so that students can get back to learning and teachers can get back to teaching.

Sevier and the PTA have failed on two levels. First, they have failed in insisting that quiet compliance is the way to get policymakers to tweak and improve test-driven education policies. Second, they have failed in refusing to challenge the very notion of re-organizing America's schools around standardized testing.


Tuesday, February 16, 2016

Leadership and Taking Risks

Nancy Flanagan had a great piece last week at EdWeek. "Defining Teacher Leadership" kicks off with her reaction to this handy meme:

[meme image]

She finds the first part is right on point. But the second part?

Most of the school leaders I encountered in 30 years in the classroom were good people, but the overwhelming majority were cautious rule-followers and cheerleaders for incremental change. The principals followed the superintendent's directives and the folks at Central Office looked to the state for guidance. Most recently, everyone has experienced the heavy hand of the feds--for standards, assessments and "aligned" materials. "Successful" leaders hit benchmarks set far from actual classrooms.

That sounds about right. As does this:

If I had waited for my school leaders to be risk-takers before feeling comfortable with change in my classroom, decades could have gone by.

I'm not sure we need school leaders who are risk takers; it's not the modeling that is most important. The biggest power that principals and superintendents have is not the power to demonstrate risk, but the power to define it.

School leaders get to decide two key aspects of risk-- what constitutes going outside the lines, and what possible consequences go with it. Principal A may run a school where getting caught with students up out of their seats in your classroom wins you a chance to stand in the principal's office while you're screamed at. Principal B may run a school where you can take students outside for an unscheduled sit on the lawn session and all that happens is you hear a, "Hey, shoot me an email before you do that the next time." Principal C, unfortunately, may run a school in which I'd better be on the scheduled scripted lesson at 10:36 on Tuesday, or there will be a letter in my file.

School leaders also get to decide how much they will protect their people. If you're teaching a controversial novel or running a project that may bring blowback from the community or from administrators at a higher level, will your principal help protect you from the heat, or throw you under the bus?

In other words, school leaders don't have to take risks -- they just have to create an environment where it is safe for teachers to take risks.

And teachers do share some responsibility in this risk-taking relationship. I have always had a pretty simple rule (like many rules, I figured it out by breaking it early in my career)-- if I'm about to do anything that could conceivably lead to my principal getting a phone call, I let him know what's going on, and why, and how, ahead of time. He can't support me if he doesn't know what I'm up to.

And of course, risk definition has been partially removed from local hands. Teachers now have personal ratings and school ratings and a host of other reformy accountability consequences riding on teacher choices. It makes leaders more risk averse, and that means clamping down on teacher risk taking as well. The last decade has not exactly fostered a risk-taking atmosphere.

The reformy movement has muddied the water on the other element of risk-- what, exactly, we are risking. Reformsters have tried to move us from, "Oh, no! That lesson didn't actually help my students master the concept I was teaching, meaning we lost a period of school and will have to try this again tomorrow" to "Oh no! We have low scores on a standardized test and must now lose money or be closed or fire somebody." Accountability and new standards and the Big Standardized Test have convinced too many administrators that teachers who take risks are now taking huge risks for enormous stakes and maybe we had all better just take it really, really easy and play it super, super safe and get back to those nice new test prep materials we just bought.

So I don't need my school leaders to model risk-taking for me. I just need them to provide me with a workplace where it's safe for me to try a few things and see if I can find interesting new paths for success. Which, ironically, is exactly what I am supposed to be providing for my students. If doing my teaching job is like changing a flat tire in the rain, I don't need an administrator who is changing another one of the tires on the car. I need someone who will make sure my tools are handy while they hold an umbrella over my head to keep the rain off me.


Thursday, February 11, 2016

Fordham Provides More Core Testing PR

The Fordham Institute is back with another "study" of circular reasoning and unexamined assumptions that concludes that reformster policy is awesome. 

The Thomas B. Fordham Institute is a right-tilted thinky tank that has been one of the most faithful and diligent promoters of the reformster agenda, from charters (they run some in Ohio) to the Common Core to the business of Big Standardized Testing.

In 2009, Fordham got an almost-a-million-dollar grant from the Gates Foundation to "study" Common Core Standards, the same standards that Gates was working hard to promote. They concluded that the Core was swell. Since those days, Fordham's support team has traveled across the country, swooping into various state legislatures to explain the wisdom of reformster ideas.

This newest report fits right into that tradition.

Evaluating the Content and Quality of Next Generation Assessments is a big, 122-page monster of a report. But I'm not sure we need to dig down into the details, because once we understand that it's built on a cardboard foundation, we can realize that the details don't really matter.

The report is authored by Nancy Doorey and Morgan Polikoff. Doorey is the founder of her own consulting firm, and her reformy pedigree is excellent. She works as a study lead for Fordham, and she has worked with the head of the Educational Testing Service to develop new testing goodies. She also wrote a nice report for SBA about how good the SBA tests were. Polikoff is a testing expert and professor at USC's Rossier School of Education. He earned his PhD from UPenn in 2010 (BA at Urbana in 2006), and immediately raised his profile by working as a lead consultant on the Gates Measures of Effective Teaching project. He is in high demand as an expert on how to test and implement the Common Core, and he has written a ton about it.

So they have some history with the materials being studied.

So what did the study set out to do? They picked the PARCC, SBA, ACT Aspire, and the Massachusetts MCAS to study. Polikoff sums it up in his Brookings piece about the report.

A key hope of these new tests is that they will overcome the weaknesses of the previous generation of state tests. Among these weaknesses were poor alignment with the standards they were designed to represent and low overall levels of cognitive demand (i.e., most items requiring simple recall or procedures, rather than deeper skills such as demonstrating understanding). There was widespread belief that these features of NCLB-era state tests sent teachers conflicting messages about what to teach, undermining the standards and leading to undesired instructional responses.

Or consider this blurb from the Fordham website:

Evaluating the Content and Quality of Next Generation Assessments examines previously unreleased items from three multi-state tests (ACT Aspire, PARCC, and Smarter Balanced) and one best-in-class state assessment, Massachusetts’ state exam (MCAS), to answer policymakers’ most pressing questions: Do these tests reflect strong content? Are they rigorous? What are their strengths and areas for improvement? No one has ever gotten under the hood of these tests and published an objective third-party review of their content, quality, and rigor. Until now.

So, two main questions-- are the new tests well-aligned to the Core, and do they serve as a clear "unambiguous" driver of curriculum and instruction?

We start from the very beginning with a host of unexamined assumptions. The notion that Polikoff and Doorey or the Fordham Institute are in any way objective third parties seems absurd, but it's not possible to objectively consider the questions anyway, because that would require us to unobjectively accept the premises that national or higher standards have anything to do with educational achievement, that the Core standards are in any way connected to college and career success, that a standardized test can measure any of the important parts of an education, and that having a Big Standardized Test drive instruction and curriculum is a good idea for any reason at all. These assumptions are at best highly debatable topics and at worst unsupportable baloney, but they are all accepted as givens before this study even begins.

And on top of them, another layer of assumption-- that having instruction and curriculum driven by a standardized test is somehow a good thing. That teaching to the test is really the way to go.

But what does the report actually say? You can look at the executive summary or the full report. I am only going to hit the highlights here.

The study was built around three questions:

Do the assessments place strong emphasis on the most important content for college and career readiness (CCR), as called for by the Common Core State Standards and other CCR standards? (Content)

Do they require all students to demonstrate the range of thinking skills, including higher-order skills, called for by those standards? (Depth)

What are the overall strengths and weaknesses of each assessment relative to the examined criteria for ELA/Literacy and mathematics? (Overall Strengths and Weaknesses)

The first question assumes that Common Core (and its generic replacements) actually includes anything that truly prepares students for college and career. The second question assumes that such standards include calls for higher-order thinking skills. And the third assumes that the examined criteria are legitimate measures of how weak or strong literacy and math instruction might be.

So we're on shaky ground already. Do things get better?

Well, the methodology involves using the CCSSO “Criteria for Procuring and Evaluating High-Quality Assessments.” So, here's what we're doing. We've got a new ruler from the emperor, and we want to make sure that it really measures twelve inches, a foot. We need something to check it against, some reference. So the emperor says, "Here, check it against this." And he hands us a ruler.

So who was selected for this objective study of the tests, and how were they selected?

We began by soliciting reviewer recommendations from each participating testing program and other sources, including content and assessment experts, individuals with experience in prior alignment studies, and several national and state organizations. 

That's right. They asked for reviewer recommendations from the test manufacturers. They picked up the phone and said, "Hey, do you know anybody who would be good to use on a study of whether or not your product is any good?"

So what were the findings?

Well, that's not really the question. The question is, what were they looking for? Once they broke down the definitions from CCSSO's measure of a high-quality test, what exactly were they looking for? Because here's the problem I have with a "study" like this. You can tell me that you are hunting for bear, but if you then tell me, "Yup, and we'll know we're seeing a bear when we spot its flowing white mane and its shiny horn growing in the middle of its forehead, galloping majestically on its noble hooves while pooping rainbows," then I know you aren't really looking for a bear at all.

I'm not going to report on every single criterion here-- a few will give you the idea of whether the report shows us a big old bear or a majestic, non-existent unicorn.

Do the tests place strong emphasis on the most important content etc?

When we break this down it means--

Do the tests require students to read closely and use evidence from texts to obtain and defend responses? 

The correct answer is no, because nothing resembling true close reading can be done on a short excerpt that is measured by close-ended responses that assume that all proper close readings of the text can only reach one "correct" conclusion. That is neither close reading nor critical thinking. And before we have that conversation, we need to have the one where we discuss whether or not close reading is, in fact, a "most important" skill for college and career success.

Do the tests require students to write narrative, expository, and persuasive/argumentation essays (across each grade band, if not in each grade) in which they use evidence from sources to support their claims?

Again, the answer is no. None of the tests do this. No decent standardized test of writing exists, and the more test manufacturers try to develop one, the further into the weeds they wander, like the version of a standardized writing test I've seen that involves taking an "evidence" paragraph and answering a prompt according to a method so precise that all "correct" answers will be essentially identical. If there is only one correct answer to your essay question, you are not assessing writing skills. Not to mention what bizarre sort of animal a narrative essay based on evidence must be.

Do the tests require students to demonstrate proficiency in the use of language, including academic vocabulary and language conventions, through tasks that mirror real-world activities?

None, again. Because nothing anywhere on a BS Test mirrors real-world activities. Not to mention how "demonstrate proficiency" ends up on a test (hint: it invariably looks like a multiple choice Pick the Right Word question).

Do the tests require students to demonstrate research skills, including the ability to analyze, synthesize, organize, and use information from sources?

Nope. Nope, nope, nope. We are talking about the skills involved in creating a real piece of research. We could be talking about the project my honors juniors complete in which they research a part of local history and we publish the results. Or you could be talking about a think tank putting together some experts in a field to do research and collecting it into a shiny 122-page report. But you are definitely not talking about something that can be squeezed into a twenty-minute standardized test section with all students trying to address the same "research" problem with nothing but the source material they're handed by the test. There is little to no research skill tested there.

How far in the weeds does this study get?

I look at the specific criteria for the "content" portion of our ELA measure, and I see nothing that a BS Test can actually provide, including the PARCC test for which I examined the sample version. But Fordham's study gives the PARCC a big fat E-for-excellent in this category.

The study "measures" other things, too.

Depth and complexity are supposed to be a thing. This turns out to be a call for higher-order thinking, as well as high quality texts on the test. We will, for the one-gazzillionth time, skip over any discussion of whether you can be talking about true high quality, deeply complex texts when none of them are ever longer than a page. How exactly do we argue that tests will cover fully complex texts without ever including an entire short story or an entire novel?

But that's what we get when testing drives the bus-- we're not asking "What would be the best assortment of complex, rich, important texts to assess students on?" We are asking "What excerpts short enough to fit in the time frame of a standardized test will be good enough to get by?"

Higher-order responses. Well, we have to have "at least one" question where the student generates rather than selects an answer. At least one?! And we do not discuss the equally important question of how that open response will be scored and evaluated (because if it's by putting a narrow rubric in the hands of a minimum-wage temp, then the test has failed yet again).

There's also math. 

But I am not a math teacher, nor do I play one on television.

Oddly enough 

When you get down to the specific comparisons of details of the four tests, you may find useful info, like how often the test has "broken" items, or how often questions allow for more than one correct answer. I'm just not sure these incidentals are worth digging past all the rest. They are signs, however, that researchers really did spend time actually looking at things, which shouldn't seem like a big deal, but in a world where NCTQ can "study" teacher prep programs by looking at commencement fliers, it's actually kind of commendable that the researchers here really looked at what they were allegedly looking at.

What else?

There are recommendations and commendations and areas of improvement (everybody sucks-- surprise-- at assessing speaking and listening skills), but it doesn't really matter. The premises of this entire study are flawed, based on assumptions that are either unproven or disproven. Fordham has insisted they are loaded for bear, when they have, in fact, gone unicorn hunting.

The premises and assumptions of the study are false, hollow, wrong, take your pick. Once again, the people who are heavily invested in selling the material of reform have gotten together and concluded once again that they are correct, as proven by them, using their own measuring sticks and their own definitions. An awful lot of time and effort appears to have gone into this report, but I'm not sure what good it does anybody except the folks who live, eat and breathe Common Core PR and Big Standardized Testing promotion.

These are not stupid people, and this is not the kind of lazy, bogus "research" promulgated by groups like TNTP or NCTQ. But it assumes conclusions not in evidence and leaps to other conclusions that cannot be supported-- and all of these conclusions are suspiciously close to the same ideas that Fordham has been promoting all along. This is yet another study that is probably going to be passed around and will pick up some press-- PARCC and SBA in particular will likely cling to it like the last life preserver on the Titanic. I just don't think it proves what it wants to prove.

Monday, February 8, 2016

CAP: The Promise of Testing

CAP is back with another one of its "reports." This one took four whole authors to produce, and it's entitled "Praise Joyous ESSA and Let a Thousand Tests Bloom." Ha! Kidding. The actual report is "Implementing the Every Student Succeeds Act: Toward a Coherent, Aligned Assessment System."

The report is sixty-some pages of highly-polished CAP-flavored reformster baloney, and I've read it so you don't have to, but be warned-- this journey will be neither short nor sweet. But we have to take it in one shot, so you can see the entirety of it, because there are large swaths of their argument that you probably agree with.

Who is CAP, again?

The Center for American Progress is billed as a left-leaning thinky tank, but it has also served as a holding tank for Clintonian beltway denizens. It was formed by John Podesta and run by him between his gigs as Bill Clinton's Chief of Staff and Hillary Clinton's campaign chairman, and has provided food and shelter to many Clinton staffers who didn't want to have to leave DC while waiting for their next shot at the Big Show.

CAP loves the whole privatizing charterfying profiteering common core cheering reformster agenda. In fact, CAP's deep and abiding love for the Common Core has burned brighter than a thousand stars and longer than even Jeb! Bush's willingness to keep saying that name. CAP has stymied me, taxing my ability to invent new versions of the headline "CAP says something stupid in support of Common Core" (see here, here, here, and here).

If the last fifteen years have seen the building of a revolving door, education-industrial complex on par with the military and food industries, then CAP is right in the center of that culture. They have never met an ed reform idea they didn't like or promote, and they are not afraid to manufacture slick, baloney-stuffed "reports" to push the corporate agenda.

So that's who produced this big chunk of goofiness.

Introduction

Like many other advocacy groups, CAP sees a golden opportunity in ESSA, and that golden opportunity is all about the testing.

States and districts must work together to seize this opportunity to design coherent, aligned assessment systems that are based on rigorous standards. These systems need to include the smart and strategic use of formative and interim tests that provide real-time feedback to inform instruction, as well as high-quality summative tests that measure critical thinking skills and student mastery of standards.

So how can states build on the research base and knowledge regarding high-quality assessments in order to design systems that do not just meet the requirements of federal law but actually drive student learning to a higher level—especially for students from marginalized communities?

And later, CAP says that this report "outlines a vision and provides specific recommendations to help federal, state and local leaders realize the promise of tests." The promise of tests? Not students, not education, not learning, not empowering communities to help their children grow into their best selves. Nope. The promise of tests. So, as is too often the case, we've skipped right past the question of "should we" and will proceed directly to "how," setting out once again to do a better job of more precisely hitting the absolutely wrong target. Yay.

History Lesson from Alternate Universe

CAP will now set the stage by hanging a backdrop of Things That Are Not True.

High-quality assessments play a critical role in student learning and school improvement. No, not really. Well, maybe, in the sense that "critical" is a pretty vague word.

High-quality tests can also show how well states, districts, and schools are doing in meeting the educational needs of all students. No. At least, not any allegedly high quality tests that currently exist.

CAP is willing to acknowledge that testing is "driving the agenda" and that's Not Good. They even acknowledge that despite their "research" showing that tests only take up 2% of school time, lots of folks have noticed that standardized testing has become the focus of too many schools.

CAP wants you to know that the ESSA has many cool, shiny features. It requires states to use broader measures and afford flexibility. CAP thinks ESSA might lead to less teacher evaluation emphasis on testing, maybe. There is money available for tweaking testing, including $$ for "innovation."

There's more history, like a history of tests. CAP equates the Socratic method with testing. They also cite the establishment of the Chinese testing that helped kick off centuries of conformity and non-innovation (read Yong Zhao's Who's Afraid of the Big Bad Dragon). We work our way through the present, skipping the parts where tests were useful for eugenics and Keeping the Lessers in Their Place.

Then we insert the usual Story of Accountability, beginning with 1983's Nation at Risk, which I always think is a bold choice since Nation at Risk predicted that the country would have collapsed by now, so maybe it's not such a great authority.

Then we move on to the "promise of the Common Core State Standards," and as usual, CAP is shameless in its willingness to recycle old baloney like "the Common Core Standards are comparable to the academic standards in the highest performing nations in the world" (this leads us, by a circuitous route, back to some Fordham Core promotional work) and in reference to the Core testing, "like the Common Core, these tests are more rigorous and of higher quality than what many previous states had before." It's a big word salad with baloney on top. CAP also lauds the imaginary "shifts in learning" which are supported by a footnote to the Common Core website, so you know it must be true.

The state of testing

CAP explains the three types of test (formative, interim and summative) and notes that federally mandated tests are summative, and are "used to give students, parents and educators a detailed picture of student progress toward meeting state standards over the past school year," and I wonder, do they giggle when they write this, or have they smacked themselves in the brain with the PR sledgehammer so many times that they just don't feel it any more? The current Big Standardized Tests of course don't provide a detailed picture of anything at all.

CAP also wants us to know about high-quality tests, which "measure critical thinking and problem-solving skills" and why don't we also say that they measure the number of unicorns grazing in the fields of rainbow cauliflower growing behind the school, because they do both equally well. But CAP wants us to know that "good assessments are also field tested and evaluated by experts," so suddenly many of the BS Tests aren't looking too good.

CAP acknowledges the anti-test movement, but goes on to say that despite the backlash, national polling data shows that people really love the tests. Why, polls by Education Next and Education Post both found signs of the testing love! This is as surprising as a poll commissioned by the National Mustard Manufacturers that discovers a widespread love for mustard-- Post and Next both unabashedly advocate for, push, and profit from the testing, reform and privatization industry. CAP also takes us on a tour of the many states that have tried to tweak the testing biz one way or another, and I would take you through those, but we still have pages and pages to go, my friends.

Methodology

CAP takes this moment to share their methodology, which appears to be that they held some focus groups, talked to some people, and checked in with some parents, some rich and some poor, according to CAP. How these people were located or selected is a mystery-- they could have been random strangers from the street or CAP family members. They also made sure to talk to some other thinky tank pro-reform profiteering groups like Achieve, the Education Trust, and the College Board. They describe their sample as a "wide variety of stakeholders and experts," and we will just have to take their word for it.

What did they find out? 

So what are some of the things discovered in this vaguely defined researchy sort of activity?

Parents want better tests.

Here we see a return of the classic "just misunderstood" story line; the value of tests needs to be "made more evident" to parents. The report quotes one parent as "not against standardized testing, because there is a need to understand on a national level whether our children are being educated and where different districts need to have extra resources and the like." Which is a great quote, and might be a useful purpose for testing, except that it doesn't work that way under current reformster programs. Instead of, "Hey, this school is clearly underfunded and undersupported," we hear cries of, "Hey, this school has low scores. We must rescue students from it with charters and maybe close it, too."

And while parents in the focus group seem to see global and large-scale uses for testing, they aren't getting much use out of them for their own children.

Teachers do not get the time and support they need

This section is shockingly frank, reporting teachers who got PD about the PARCC when it wasn't completed, and teachers who report essentially being told to get those test scores up, never mind the regular instruction. Shocking, huh? I wonder what created that sort of atmosphere. We will all just numbly skip over the issue of whether these reformsters ever listen to a single word that teachers say, because remember-- when you want to creatively disrupt and make over an entire field, it's important to disregard any noise from the trained, experienced practitioners in that field.

Communication with stakeholders is weak

Yes, it's the PR. Common Core and BS Tests are just misunderstood. If only people could be re-educated about the tests. Maybe at a nice camp somewhere. (Bonus sidebar lauds the PARCC for their clear and colorful report card, which uses nice graphics to tell parents far less useful information than could be gleaned from a five-minute phone call to your child's teacher.)

Fun sidenote: several parents reported that they got the most useful information about testing from the John Oliver show segment on tests. That was probably not the kind of info that CAP wanted to have spread.

The Test lacks value for individual students

And that, boys and girls, is how a bureaucrat translates "The students sense that the BS Tests are a bunch of time-wasting bullshit with no connection to their actual lives." In fact, some parents and teachers said they had the impression that BS Test scores aren't even used to influence instruction. It's true. Instruction is also not very influenced by reading the warts of a grey toad under a full moon.

End-of-year summatives are not aligned to instruction

Well, no, they aren't. And as long as your plan is built around a large-scale, one-size-fits-all BS test, they never will be.

Too much test prep is occurring

Well, duh. The BS Tests have high stakes. And while CAP wants to pretend that new BS Tests are just so high quality and awesome that test prep is a waste of everyone's time score-wise, most everybody's experience is the opposite. The most authentic assessment matches the instruction and the actual task being learned. Since reformsters have fixed it so that teachers cannot change the assessment, the only way to make the BS Tests a more authentic assessment is to change what we teach. As long as schools are locked into a statewide high stakes BS Test beyond their control, there will be test prep, and lots of it.

CAP found that test prep was more prevalent among the poorer students. Again, duh. Lower socio-economic status correlates pretty directly to standardized test results. Lower SES students are the ones who need the most extra help to get up to speed on the twisty mindset needed to play the "What does the test writer want me to say here" game.

Weak logistics and testing windows and nutsy bolty things

If the test must be given on a computer and there are only thirty computers in the building, there's a problem. I'm inclined to think the problem is that you are requiring the students to take the test on a computer. Also, CAP has concerns about timing of test results and test taking allowing for accurate measures and useful feedback. I'm prepared to reassure CAP that no matter when or how my students take the BS Test, it will not provide an accurate measure or useful feedback, so y'all can just relax.

So what does CAP think we should do about all this?

So here's what CAP thinks the state, district and school authorities can do "to improve the quality of assessments, address concerns about overtesting, and make assessments more valuable for students, parents, and teachers." And if you've been reading carefully, you can guess where this is going.

Here's what states should do

Develop rules for "robust" testing. Okay, CAP says "principles," but they mean rules. Write some state-level rules about what every test should look like. Yessirree, what I need in my classroom is some suit from the state capital to tell me how to create an assessment.

Conduct alignment pogroms. Okay, maybe that's not the word they used. But they suggest that states check all up and down the school systems and make sure that every single teacher is fully aligned to the standards (including curriculum and homework). Because thanks to the Ed-Secretary-neutering powers of ESSA, reformsters can now shoot for total instructional control of every school district without raising the Federal Overreach Alarm. Oh, and the alignment should run K-16, so don't think you're getting off so easy, Dr. College Professor.

Since districts may not have the time and resources to make sure that every single solitary assessment is aligned and high quality, states should be ready to lend a hand. Give them some money. Create all the tests and assignments for them, or, you know, just hire some willing corporation to do it.

Demand a quick turnaround on test results. Because that creates more "buy-in" at the local level. Also "a quick turnaround also creates more value, and educators and families can use the assessment results more readily in their decision-making." Oh, yeah-- everyone is just waiting on pins and needles so they can make decisions about Young Chris's future. But about that...

Increase the value of tests for parents, teachers and students. How could we do that? By making better tests! Ha! Just kidding. By offering rewards, like college credits for good performance. Or awards and prizes for high scores. Like stickers and ribbons? Yes, that will make the BS Tests so much more valuable.

Jump on the innovative assessment development grant bandwagon. And here comes the punchline:

If states move forward with performance-based or competency-based assessments, they should consider carefully whether their districts and educators have the capacity and time to create high-quality, valid, reliable, and comparable performance assessments. Instead of looking to dramatically change the content of assessments, states should consider how they can dramatically change the delivery of assessments. States should explore moving away from a single end-of-year test and toward the use of shorter, more frequent interim assessments that measure student learning throughout the year and can be combined into a single summative determination. 

Yes, all-testing, all the time. It solves all of our problems-- with one perfectly aligned system that constantly logs and records and data-crunches every canned assignment and delivers the assessments seamlessly through the computer, we can plug students in and monitor every educational step of every educational day.

Finally, states should step up their communication game with better, prettier and more explainier printouts from the uber-aligned 360 degree teaching machine system, so that parents will understand just how much their elder sibling loves them.

What should local districts do? 

Bend over and kiss their autonomy goodbye? Ha! Just kidding. CAP would never say that out loud.

Get rid of redundant tests, preferably not the ones that are created by favored vendors.

"Build local capacity to support teachers' understanding of assessment design and administration." God, sometimes I think these guys are morons, and sometimes I think they are evil geniuses. Doesn't "support" sound so much nicer than "re-educate" or "properly indoctrinate." Because I have my own pretty well-developed understanding of assessment design and administration, but if they knew it, I don't think CAP would support it.

"Create coherent systems of high-quality formative and interim assessments that are aligned with state standards." Buy your entire assessment system from a single vendor. One size will fit all.

"Better communicate with parents about tests. To build trust, districts should be more transparent around assessments. This includes posting testing calendars online, releasing sample items, and doing more to communicate about the assessments." You know what's an excellent way to build trust? Behave in a trustworthy manner. Just saying. Also, this is not transparency. Transparency would include things like, say, releasing all the test items so students and parents could see exactly where Young Pat was marked right or wrong.

Tackle logistics. Remember how hard it is for schools to test many students on few computers? Districts should tackle that. It's not clear if that should be, like, a clean ankle grab tackle or districts can go ahead and clothesline that logistic. But CAP does have concrete examples, like "Plan well in advance" with the goal of "minimizing disruption." Thanks, CAP. I bet no district leaders ever thought of planning in advance. I can't believe you gave dynamite advice like that away for free.

What should schools do?

Make testing less torturous. Let students go pee.

Hold an explain-the-test social night. Have principals announce open-office hours so that any parent can stop by at any time to chat about the tests, because I'm sure the principal's day is pretty wide open and flexible.

Tell teachers things so that when parents ask questions, the teachers know the answers.

Oh, and stop unnecessary test prep. Just keep the necessary test prep, which is as much as you need to keep your numbers up. But thanks for the tip-- lots of teachers were in their classroom saying, "This test prep is a total waste of time, but I'm going to do it anyway just for shits and giggles, because I certainly didn't have it in my mind to teach my students useful things."

I am pretty sure that the further from broad policy strokes and the closer to actual classroom issues they get, the dumber CAP becomes.

How about the feds?

Use Title I as a means of threatening states that don't do all the jobs we gave them above. Help all the states that want to build the next generation all-day all-testing regimes. Spread best practices about assessment, because man, if there's anything we have learned over the past fifteen years, it's that when you want good solid answers about how to teach and assess your students, the federal government is the place to turn.

And the final recommendation?

If you are still reading, God bless you, but we needed to travel this twisty road in one go to see where it led.

It is the reformsters' oldest and most favorite trick-- X is a clear and present problem, therefore you must accept Y as a solution, and I am going to sell X so well that you will forget to notice that I never explain how Y is any sort of solution.

Overtesting is a problem. Bad testing is a problem. Testing that yields up no useful results is a problem. Bad testing as an annual exercise in time-wasting futility is a problem. Testing driving instruction is a problem. CAP has given more ground on these issues than ever, but it appears to be a ju-jitsu move in hopes of converting all that anti-testing energy into support for Performance Based Education.

Don't like testing? Well, the solution is more testing. All the time. In a one-size-fits-all canned package of an education program. And here's the final huge irony. This is CAP wrapping up with a description of the long-term goal:

system leaders should develop a robust, coherent, and aligned system of standards and assessments that measures student progress toward meeting challenging state standards. This exam system should be deeply grounded in the standards as assessed by an end-of-year summative test. Formative and interim assessments administered throughout the year will routinely—at natural transition points in the instructional program, such as the end of a unit—assess student understanding and progress and provide the results to teachers, parents, and students in close to real time. This system will enable everyone involved in a student’s education to make adjustments where needed in order to support learning so that no student slips through the cracks.

You know who does this sort of thing well already? Good, trained, professional classroom teachers. We assess daily, wrap those results back into our plans for the next day, and adjust our instruction to the needs and issues of individual students. We don't give pointless tests that are redundant or disconnected. We wrap larger and more formal assessments in with the informal assessments, and we do it while maintaining instruction and looking after our students as if they were actual live human beings. And we do it all in a timely manner. Of course, we don't do the things that CAP considers most critical.

For this assessment system to be as useful as possible, alignment is key. All assessments—formative, interim, and summative—must align with academic standards. 

At the end of the day, CAP loves testing very much. But the thing they love even more is broadly adopted, all-knowing, all-controlling standards. One size fits all, selected by some Wiser Higher Authority who somehow knows what all human beings must know, and unhindered by those damn classroom teachers and their professional judgment, and all of it giving up a wondrous river of data, a nectar far more valuable than the vulnerable little humans from whom it was squeezed. Jam the standards in and drag the data out. That's CAP's coherent, aligned future.






Saturday, February 6, 2016

PA: Partial Testing Pause

This week Pennsylvania Governor Tom Wolf signed the bill that will delay using the Keystone Exams (our version of the Big Standardized Test for high school students) as a graduation requirement. Though we've been giving the test for a few years, it will now not become a grad requirement until 2019 (postponed from 2017). That's certainly not bad news, but there's no reason to put the party hats on just yet.

First, as (unfortunately) always, it's worth noting that this happens against the backdrop of our leaders' absolute inability to fulfill their most basic function-- as I type this, Pennsylvania is on its 221st day without a budget. We are right on track to have the governor preparing next year's budget while this year's budget is still not fully adopted. It is entirely possible that Harrisburg is populated by dopes.

Second, the idea is to have officials go back to the drawing board and come up with better ideas for BS Testing. This is akin to feeling great pain because you're hitting yourself in the head with a hammer and saying, "Hmm. Well, maybe if I turn the hammer sideways it won't hurt so much." It's akin to eating a terrible, vomit-inducing meal of liver and pineapple and rotted fish parts covered with chocolate sauce and saying, "Well, maybe if we put the chocolate sauce on first rather than last." This is about re-arranging deck chairs rather than examining the premise.

Third, while high school seniors will not be required by the state to pass the Keystones to graduate, the state still plans to use the Keystones to evaluate schools and teachers. So our professional fates are still tied to a BS Test that students have no reason to take seriously or care about. Great.

Fourth-- well, many DO have a reason to care about the test, because in anticipation of the state's BS Test grad requirement, many school districts have already made passing the Keystone a local graduation requirement. We do that in PA-- the state sets a grad requirement minimum, and local districts can require over and above that. So for many local students, the postponing of the Keystone grad requirement will make zero difference-- they still have to pass the test or an alternative assessment (known in my district as the Binder of Doom) in order to graduate.

So this is good news in the sense that it would be worse if the state had gone ahead with its original plan to require Keystones as exit tests right now. But it's bad news in the sense that we aren't really trying to fix anything or figure out what we really ought to be doing. And it's bad news because the decisions are still in the hands of the most expensive, most incompetent state government in the country.

Monday, January 11, 2016

Ranking Is Not Measuring

This point came up in passing a few days ago when I was reviewing some writing by Mark Garrison, but it is worth hammering home all by itself.

We have been told repeatedly that we need to take the Big Standardized Tests so that we can hold schools accountable and tell whether our teachers are succeeding or not. "Of course we need accountability systems," the policy makers say. "Don't you want to know how well we're doing?"

And then we rank schools and teachers and students. But ranking is not measuring.

Would you rather be operated on by a top-ranking surgeon or one who was the bottom of his class? What if the former is the top graduate of Bob's Backyard School of Surgical Stuff and the latter is the bottom of Harvard Medical School? Would you like homework help from the dumbest person in MENSA or the smartest person in a 6th grade remedial class? And does that prompt you to ask what we even mean by "dumb" or "smart"?

"But hey," you may reply. "If I'm going to rank people by a particular quality, I have to measure that quality, don't I?"

Of course not. You can find the tallest student in a classroom without measuring a single one of them. You can find the heaviest box of rocks with a balance scale that never tells you what any of them weighs. Ranking requires no actual measurement at all.
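
Here's a toy sketch of the point in Python-- purely illustrative, with names and heights invented for the example. The "secret" heights stand in for the actual students; the procedure itself only ever asks a yes-or-no question and never sees a number.

    # Invented data: the heights exist out in the world, but the
    # ranking procedure below never looks at them directly.
    _secret_heights = {"Pat": 160, "Chris": 172, "Sam": 155}

    def is_taller(a, b):
        # Stand a and b back to back: the answer is only yes or no.
        # No number ever leaves this function.
        return _secret_heights[a] > _secret_heights[b]

    def tallest(students):
        # Find the tallest student using comparisons alone--
        # no ruler, no tape measure, no recorded heights.
        champ = students[0]
        for s in students[1:]:
            if is_taller(s, champ):
                champ = s
        return champ

    print(tallest(["Pat", "Chris", "Sam"]))  # Chris

A full top-to-bottom ranking works the same way: sort with that comparator and you get an ordered list without a single height ever being measured or recorded.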

Not only that, but when we are forced to measure, ranking encourages us to do it badly. Many qualities or characteristics would best be described or measured with a many-dimensional matrix with a dozen different axes. But to rank-- we have to reduce complex multidimensional measurement to something that can be measured with a single-edged stick.

Who is most attractive-- Jon Hamm, Ryan Gosling, or George Clooney? It's an impossible question because it involves so many factors, from hair style to age to wry wit vs. full-on silliness, all piled on top of, "Attractive to whom, exactly?" We can reduce all of those factors and measure each one independently, and that might create some sort of qualitative measure of attractiveness, but it would be so complicated that we'd have to chart it on some sort of multi-matrix omni-dimensional graphy thing, and THAT would make it impossible to rank the three gentlemen. No, in order to rank them we would either have to settle on some single measurement that we use as a proxy for all the rest, or some bastard offspring created by mashing all the measures together. This results in a ranking that doesn't reflect any kind of real measurement of anything, ultimately both meaningless and unconvincing (the ladies of the George Clooney fan club will not change allegiance because some data-driven list contradicts what they already know in their hearts).

In fact, when we create the bastardized mashup measurement, we're really creating a completely new quality. We can call it the Handsomeness Quotient, but we might as well call it Shmerglishness.
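
A companion sketch, again in Python and again with every number invented out of thin air (which is rather the point): score the three gentlemen on a few independent axes, mash the axes together with weights, and watch the "ranking" flip depending on which equally arbitrary weights you pick.

    # Hypothetical scores on three independent axes of attractiveness.
    scores = {
        "Hamm":    {"hair": 9, "wit": 7, "silliness": 4},
        "Gosling": {"hair": 7, "wit": 6, "silliness": 9},
        "Clooney": {"hair": 7, "wit": 9, "silliness": 5},
    }

    def shmerglishness(person, weights):
        # Mash all the measures together into one number.
        return sum(weights[axis] * value
                   for axis, value in scores[person].items())

    def rank(weights):
        return sorted(scores, key=lambda p: shmerglishness(p, weights),
                      reverse=True)

    # Two equally arbitrary weightings, two different "rankings."
    print(rank({"hair": 3, "wit": 1, "silliness": 1}))
    # ['Hamm', 'Gosling', 'Clooney']
    print(rank({"hair": 1, "wit": 1, "silliness": 3}))
    # ['Gosling', 'Clooney', 'Hamm']

The composite number in the middle doesn't measure anything that exists in the world; it's an artifact of whatever weights got picked, and the ranking it produces changes whenever the weights do.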

So let's go back to "smart," a word that is both as universally used and as thoroughly vague as "good" or "stuff." Smartitude is a complex of factors, some of which exist not as qualities but as relationships between the smart-holder and the immediate environment (I'm pretty smart in a library, average under a car hood, and stupid on a basketball court). Measuring smart is complicated and difficult and multi-dimensional.

But then in the ed biz we're going to fold that quality into a broader domain that we'll call "student achievement," and now we are talking about describing the constellation of skills and abilities and aptitudes and knowledge for an individual human being-- and ranking that requires us to use a single-axis shmerglishness number.

We could go on and on about the many examples of how complex systems cannot be reduced to simple measures, but I want to go back and underline that main idea--

Ranking is not measuring. In fact, ranking often works directly against measuring. As long as our accountability systems focus on ranking students, teachers, and schools, they will not tell us how well the education system is actually working.