
Thursday, March 3, 2016

TOYT Shill For PARCC

PARCC is promoting two new radio spots that feature a couple of Teacher of the Year winners touting the wonderfulness of the PARCC.

The National Network of Teachers of the Year produced a "research report" last year that determined that the Big Standardized Tests are super-duper and much more better than the old state tests. Was the report legit? Weelll.....

The report was reviewed by three-- well, "experts" seems like the wrong word. Three guys. Joshua Starr was a noted superintendent in Maryland, where he developed a reputation as a high stakes testing opponent. He lost that job, and moved on to become the CEO of Phi Delta Kappa. Next, Joshua Parker was a compliance specialist with Baltimore Schools, a teacher of the year, and a current member of the reform-pushing PR operation, Education Post. And the third reviewer was Mike Petrilli, head of the Fordham Institute, a group dedicated to promoting testing, charters, etc.

The study was funded by Rockefeller Philanthropy Advisors, while the NNTOY sponsors list includes the Gates Foundation, Pearson, AIR, ETS and the College Board-- in other words, every major test manufacturer in the country that makes a hefty living on high stakes testing.

So the study's conclusion that tests like the PARCC and the SBAC are super-excellent is not exactly a shock or surprise, and neither can it be a surprise that one follow-up to the study is these two radio spots.

The teachers in the spots are Steve Elza, 2015 Illinois TOYT and applied tech (automotive trades) teacher, and Josh Parker, a-- hey! Wait a minute!! Is that? Why, yes-- it appears to be one of the reviewers of the original study. Some days I start to think that some folks don't really understand what "peer review" means when it comes to research.

Anyway, the spots. What do they say? Let's listen to Elza's spot first--

A narrator (with a fairly distinct speech impediment which-- okay, fine, but it's a little distracting at first) says that Illinois students took a new PARCC test. It was the first time tests were ever aligned with what teachers taught in the classroom! Really!! The first time ever, ever! Can you believe that? No, I can't, either. And some of the best teachers in the country did a study last year to compare PARCC to state tests. And now, 2015 Teacher of the Year, Steve Elza:

Every teacher who took part in the research came to the same conclusion-- PARCC is a test worth taking. The results more accurately measure students' learning progress and tells us if kids are truly learning or if they're just repeating memorized facts. Because PARCC is aligned to our academic standards, the best preparation for it is good classroom instruction. As a teacher, I no longer have to give my students test-taking strategies-- instead I can focus on making sure students develop strong, critical, and analytical thinking skills. Our students were not as prepared for the more rigorous coursework in college or even to start working right after high school.

Sigh. First, "truly learning" and "repeating memorized facts" are not the only two things that a test can measure, and any teacher who is not teaching test-taking strategies is not preparing her students for the test. I'm glad Elza is no longer working on test-taking strategies in auto shop, and I'm sure he's comfortable having his skills as a teacher of automotive tradecraft judged in part by his students' math and English standardized test scores. The claim that PARCC measures readiness for the working world is just bizarre. I look forward to PARCC claims that the test measures readiness for marriage, parenthood, and running for elected office.

The narrator returns to exclaim how helpful PARCC is, loaded with "valuable feedback" that will make sure everybody is ready for "success in school and life." Yes, PARCC remains the most magical test product ever manufactured.

So how about the other spot? Let's give a listen.


Okay, same narrator, same copy with Illinois switched out for Maryland. That makes sense. And now, teacher Josh Parker:

Every teacher who took part in the research came to the same-- hey, wait a minute!! They just had these two different teachers read from the same script! Someone (could it be the PARCC marketing department?) just put words in their mouths. Parker goes one extra mile-- right after "analytical thinking skills" he throws in "PARCC also pulled back the curtain on a long-unspoken truth" before the baloney about how students were unprepared for life. Also, Parker didn't think there was a comma after "strong."

One more sad piece of marketing for the PARCC as it slowly loses piece after piece of its market. It's unfortunate that the title Teacher of the Year has been dragged into this. The award should speak more to admirable classroom qualities than simply be a way to set up teachers to be celebrity spokespersons for the very corporations that have undercut the teaching profession.

Wednesday, March 2, 2016

Ace That Test? I Think Not.

The full court press for the Big Standardized Test is on, with all manner of spokespersons and PR initiatives trying to convince Americans to welcome the warm, loving embrace of standardized testing. Last week the Boston Globe brought us Yana Weinstein and Megan Smith, a pair of psychology assistant professors who have co-founded Learning Scientists, which appears to be mostly a blog that they've been running for about a month. And say what you like-- they do not appear to be slickly or heavily funded by the Usual Gang of Reformsters.



Their stated goals include lessening test anxiety and decreasing negative views of testing. And the reliably reformy Boston Globe gave them a chance to get their word out. The pair also blogged about material that did not make it through the Globe's edit.




The Testing Effect

Weinstein and Smith are fond of "the testing effect," a somewhat inexact term used to refer to the notion that recalling information helps people retain it. It always makes me want a name for whatever it is that makes some people believe that the only situation in which information is recalled is a test. Hell, it could be called the teaching effect, since we can get the same thing going by having students teach a concept to the rest of the class. Or the writing effect, or the discussion effect. There are many ways to have students sock information in place by recalling it; testing is neither the only nor the best way to go about it.

Things That Make the Learning Scientists Feel Bad 

From their blog, we learn that the LS team feels "awkward" when reading anti-testing writing, and they link to an example from Diane Ravitch. Awkward is an odd way to feel, really. But then, I think their example of a strong defense of testing is a little awkward. They wanted to quote a HuffPost pro-testing piece from Charles Coleman that, they say, addresses problems with the opt out movement "eloquently."

"To put it plainly: white parents from well-funded and highly performing areas are participating in petulant, poorly conceived protests that are ultimately affecting inner-city blacks at schools that need the funding and measures of accountability to ensure any hope of progress in performance." -- Charles F. Coleman Jr.

Ah. So opt outers are white, rich, whiny racists. That is certainly eloquent and well-reasoned support of testing. And let's throw in the counter-reality notion that testing helps poor schools, though after over a decade of test-driven accountability, you'd think supporters could rattle off a list of schools that A) nobody knew were underfunded and underresourced until testing and B) received a boost through extra money and resources after testing. Could it be that no such list actually exists?

Tests Cause Anxiety

The LS duo wants to decrease test anxiety by hammering students with testing all the time, so that it's no longer a big deal. I believe that would work, but it's not a good idea. Also, parents and teachers should stop saying bad things about the BS Tests and just keep piling on the happy talk so that students can stop worrying and learn to love the test. All of this, of course, pre-supposes that the BS Tests are actually worthwhile and wonderful, and that all the misgivings being expressed by professional educators and parents are-- what? An evil plot? Widespread confusion? The duo seem deeply committed to not admitting that test critics have any point at all. Fools, the lot of them.

Teaching to the Test

The idea that teaching to a test isn’t really teaching implies an almost astounding assumption that standardized tests are filled with meaningless, ill-thought-out questions on irrelevant or arbitrary information. This may be based on the myth that “teachers in the trenches” are being told what to teach by some “experts” who’ve probably never set foot in a “real” classroom.

Actually, it's neither "astounding" nor an "assumption," but, at least in the case of this "defiant" teacher (LS likes to use argument by adjective), my judgment of the test is based on looking at the actual test and using my professional judgment. It's a crappy test, with poorly-constructed questions that, as is generally the case with a standardized test, mostly test the student's ability to figure out what the test manufacturer wants the student to choose for an answer (and of course the fact that students are selecting answers rather than responding to open ended prompts further limits the usefulness of the BS Test).

But LS assert that tests are actually put together by testing experts and well-seasoned real teachers (and you can see the proof in a video put up by a testing manufacturer about how awesome that test manufacturer is, so totally legit). LS note that "defiant teachers" either "fail to realize" this or "choose to ignore" it. In other words, teachers are either dumb or mindlessly opposed to the truth.

Standardized Tests Are Biased

The team notes that bias is an issue with standardized tests, but it's "highly unlikely" that classroom teachers could do any better, so there. Their question-- if we can't trust a big board of experts to come up with an unbiased test, how can we believe that an individual wouldn't do even worse, and how would we hold them accountable?

That's a fair question, but it assumes some purposes for testing that are not in evidence. My classroom tests are there to see how my students have progressed with and grasped the material. I design those tests with my students in mind. I don't, as BS Tests often do, assume that "everybody knows about" the topic of the material, because I know exactly who "everybody" is in my classroom, so I can make choices accordingly. I can also select prompts and test material that hook directly into their culture and background.

In short, BS Testing bias enters largely because the test is designed to fit an imaginary Generic Student who actually represents the biases of the test manufacturers, while my assessments are designed to fit the very specific group of students in my room. BS Tests are one-size-fits-all. Mine are tailored to fit.

Reformsters may then say, "But if yours are tailored to fit, how can we use them to compare your students to students across the nation?" To which I say, "So what?" You'll need to convince me that there is an actual need to closely compare all students in the nation.

Tests Don't Provide Prompt Feedback

The duo actually agree that tests "have a lot of room for improvement." They even acknowledge that the feedback from the test is not only late, but generally vague and useless. But hey-- tests are going to be totes better when they are all online, an assertion that makes the astonishing assumption that there is no difference between a paper test and a computer test except how the students record their answers.

Big Finish

The wrap up is a final barrage of Wrong Things.

Standardized tests were created to track students’ progress and evaluate schools and teachers. 

Were they? Really? Is it even possible to create a single test that can actually be used for all those purposes? Because just about everyone on the planet not financially invested in the industry has pointed out that using test results to evaluate teachers via VAM-like methods is baloney. And tests need to be manufactured for a particular purpose-- not three or four entirely different ones. So I call shenanigans-- the tests were not created to do all three of those things.

Griping abounds about how these tests are measuring the wrong thing and in the wrong way; but what’s conspicuously absent is any suggestion for how to better measure the effect of education — i.e., learning — on a large scale.

A popular reformster fallacy. If you walk into my hospital room and say, "Well, your blood pressure is terrible, so we are going to chop off your feet," and then I say, "No, I don't want you to chop off my feet. I don't believe it will help, and I like my feet," your appropriate response is not, "Well, then, you'd better tell me what else you want me to chop off instead."

In other words, what is "conspicuously absent" is evidence that there is a need for or value in measuring the effects of education on a large scale. Why do we need to do that? If you want to upend the education system for that purpose, the burden is on you to prove that the purpose is valid and useful.

In the absence of direct measures of learning, we resort to measures of performance.

Since we can't actually measure what we want to measure, we'll measure something else as a proxy and talk about it as if it's the same thing. That is one of the major problems with BS Testing in a nutshell.

And the great thing is: measuring this learning actually causes it to grow. 

And weighing the pig makes it heavier. This is simply not true, "testing effect" notwithstanding.

PS

Via the blog, we know that they wanted to link to this post at Learning Spy, which has some interesting things to say about the difference between learning and performance, including this:

And students are skilled at mimicking what they think teachers want to see and hear. This mimicry might result in learning but often doesn’t.

That's a pretty good explanation of why BS Tests are of so little use-- they are about learning to mimic the behavior required by test manufacturers. But the critical difference between that mimicry on a test and in my classroom is that in my classroom, I can watch for when students are simply mimicking and adjust my instruction and assessment accordingly. A BS Test cannot make any such adjustments, and cannot tell the difference between mimicry and learning at all.

The duo notes that their post is "controversial," and it is in the sense that it's more pro-test baloney, but I suspect that much of their pushback is also a reaction to their barely-disguised disdain for classroom teachers who don't agree with them. They might also consider widening their tool selection ("when your only tool is a hammer, etc...") to include a broader range of approaches beyond the "testing effect." It's a nice trick, and it has its uses, but it's a lousy justification for high stakes BS Testing.

Thursday, February 11, 2016

Fordham Provides More Core Testing PR

The Fordham Institute is back with another "study" of circular reasoning and unexamined assumptions that concludes that reformster policy is awesome. 

The Thomas B. Fordham Institute is a right-tilted thinky tank that has been one of the most faithful and diligent promoters of the reformster agenda, from charters (they run some in Ohio) to the Common Core to the business of Big Standardized Testing.

In 2009, Fordham got an almost-a-million-dollar grant from the Gates Foundation to "study" the Common Core Standards, the same standards that Gates was working hard to promote. They concluded that the Core was swell. Since those days, Fordham's support team has traveled across the country, swooping into various state legislatures to explain the wisdom of reformster ideas.

This newest report fits right into that tradition.

Evaluating the Content and Quality of Next Generation Assessments is a big, 122-page monster of a report. But I'm not sure we need to dig down into the details, because once we understand that it's built on a cardboard foundation, we can realize that the details don't really matter.

The report is authored by Nancy Doorey and Morgan Polikoff. Doorey is the founder of her own consulting firm, and her reformy pedigree is excellent. She works as a study lead for Fordham, and she has worked with the head of the Educational Testing Service to develop new testing goodies. She also wrote a nice report for SBA about how good the SBA tests were. Polikoff is a testing expert and professor at USC's Rossier School of Education. He earned his PhD from UPenn in 2010 (BA at Urbana in 2006), and immediately raised his profile by working as a lead consultant on the Gates Measures of Effective Teaching project. He is in high demand as an expert on how to test and implement the Common Core, and he has written a ton about it.

So they have some history with the materials being studied.

So what did the study set out to study? They picked the PARCC, SBA, ACT Aspire, and the Massachusetts MCAS. Polikoff sums it up in his Brookings piece about the report.

A key hope of these new tests is that they will overcome the weaknesses of the previous generation of state tests. Among these weaknesses were poor alignment with the standards they were designed to represent and low overall levels of cognitive demand (i.e., most items requiring simple recall or procedures, rather than deeper skills such as demonstrating understanding). There was widespread belief that these features of NCLB-era state tests sent teachers conflicting messages about what to teach, undermining the standards and leading to undesired instructional responses.

Or consider this blurb from the Fordham website:

Evaluating the Content and Quality of Next Generation Assessments examines previously unreleased items from three multi-state tests (ACT Aspire, PARCC, and Smarter Balanced) and one best-in-class state assessment, Massachusetts’ state exam (MCAS), to answer policymakers’ most pressing questions: Do these tests reflect strong content? Are they rigorous? What are their strengths and areas for improvement? No one has ever gotten under the hood of these tests and published an objective third-party review of their content, quality, and rigor. Until now.

So, two main questions-- are the new tests well-aligned to the Core, and do they serve as a clear "unambiguous" driver of curriculum and instruction?

We start from the very beginning with a host of unexamined assumptions. The notion that Polikoff and Doorey or the Fordham Institute are in any way objective third parties seems absurd, and it's not possible to objectively consider the questions at all, because doing so requires us to accept, unexamined, the premises that national or higher standards have anything to do with educational achievement, that the Core standards are in any way connected to college and career success, that a standardized test can measure any of the important parts of an education, and that having a Big Standardized Test drive instruction and curriculum is a good idea for any reason at all. These assumptions are at best highly debatable topics and at worst unsupportable baloney, but they are all accepted as givens before this study even begins.

And on top of them, another layer of assumption-- that having instruction and curriculum driven by a standardized test is somehow a good thing. That teaching to the test is really the way to go.

But what does the report actually say? You can look at the executive summary or the full report. I am only going to hit the highlights here.

The study was built around three questions:

Do the assessments place strong emphasis on the most important content for college and career readiness (CCR), as called for by the Common Core State Standards and other CCR standards? (Content)

Do they require all students to demonstrate the range of thinking skills, including higher-order skills, called for by those standards? (Depth)

What are the overall strengths and weaknesses of each assessment relative to the examined criteria for ELA/Literacy and mathematics? (Overall Strengths and Weaknesses)

The first question assumes that the Common Core (and its generic replacements) actually includes anything that truly prepares students for college and career. The second question assumes that such standards include calls for higher-order thinking skills. And the third assumes that the examined criteria are legitimate measures of how weak or strong literacy and math instruction might be.

So we're on shaky ground already. Do things get better?

Well, the methodology involves using the CCSSO “Criteria for Procuring and Evaluating High-Quality Assessments.” So, here's what we're doing. We've got a new ruler from the emperor, and we want to make sure that it really measures twelve inches, a foot. We need something to check it against, some reference. So the emperor says, "Here, check it against this." And he hands us a ruler.

So who was selected for this objective study of the tests, and how were they selected?

We began by soliciting reviewer recommendations from each participating testing program and other sources, including content and assessment experts, individuals with experience in prior alignment studies, and several national and state organizations. 

That's right. They asked for reviewer recommendations from the test manufacturers. They picked up the phone and said, "Hey, do you know anybody who would be good to use on a study of whether or not your product is any good?"

So what were the findings?

Well, that's not really the question. The question is, what were they looking for? Once they broke down the definitions from CCSSO's measure of a high-quality test, what exactly were they hoping to spot? Because here's the problem I have with a "study" like this. You can tell me that you are hunting for bear, but if you then tell me, "Yup, and we'll know we're seeing a bear when we spot its flowing white mane and its shiny horn growing in the middle of its forehead, galloping majestically on its noble hooves while pooping rainbows," then I know you aren't really hunting bear at all.

I'm not going to report on every single criterion here-- a few will give you the idea of whether the report shows us a big old bear or a majestic, non-existent unicorn.

Do the tests place strong emphasis on the most important content etc?

When we break this down it means--

Do the tests require students to read closely and use evidence from texts to obtain and defend responses? 

The correct answer is no, because nothing resembling true close reading can be done on a short excerpt measured by close-ended responses that assume all proper close readings of the text can reach only one "correct" conclusion. That is neither close reading nor critical thinking. And before we have that conversation, we need to have the one where we discuss whether or not close reading is, in fact, a "most important" skill for college and career success.

Do the tests require students to write narrative, expository, and persuasive/argumentation essays (across each grade band, if not in each grade) in which they use evidence from sources to support their claims?

Again, the answer is no. None of the tests do this. No decent standardized test of writing exists, and the more test manufacturers try to develop one, the further into the weeds they wander, like the standardized writing assessment I've seen that involves taking an "evidence" paragraph and answering a prompt according to a method so precise that all "correct" answers will be essentially identical. If there is only one correct answer to your essay question, you are not assessing writing skills. Not to mention what bizarre sort of animal a narrative essay based on evidence must be.

Do the tests require students to demonstrate proficiency in the use of language, including academic vocabulary and language conventions, through tasks that mirror real-world activities?

None, again. Because nothing anywhere on a BS Test mirrors real-world activities. Not to mention how "demonstrate proficiency" ends up on a test (hint: it invariably looks like a multiple choice Pick the Right Word question).

Do the tests require students to demonstrate research skills, including the ability to analyze, synthesize, organize, and use information from sources?

Nope. Nope, nope, nope. We are talking about the skills involved in creating a real piece of research. We could be talking about the project my honors juniors complete in which they research a part of local history and we publish the results. Or you could be talking about a think tank putting together some experts in a field to do research and collecting it into a shiny 122-page report. But you are definitely not talking about something that can be squeezed into a twenty-minute standardized test section with all students trying to address the same "research" problem with nothing but the source material they're handed by the test. Little to no research skill is tested there.

How far in the weeds does this study get?

I look at the specific criteria for the "content" portion of our ELA measure, and I see nothing that a BS Test can actually provide, including the PARCC test for which I examined the sample version. But Fordham's study gives the PARCC a big fat E-for-excellent in this category.

The study "measures" other things, too.

Depth and complexity are supposed to be a thing. This turns out to be a call for higher-order thinking, as well as high quality texts on the test. We will, for the one-gazillionth time, skip over any discussion of whether you can be talking about true high quality, deeply complex texts when none of them are ever longer than a page. How exactly do we argue that tests will cover fully complex texts without ever including an entire short story or an entire novel?

But that's what we get when testing drives the bus-- we're not asking "What would be the best assortment of complex, rich, important texts to assess students on?" We are asking "What excerpts short enough to fit in the time frame of a standardized test will be good enough to get by?"

Higher-order responses. Well, we have to have "at least one" question where the student generates rather than selects an answer. At least one?! And we do not discuss the equally important question of how that open response will be scored and evaluated (because if it's by putting a narrow rubric in the hands of a minimum-wage temp, then the test has failed yet again).

There's also math. 

But I am not a math teacher, nor do I play one on television.

Oddly enough 

When you get down to the specific comparisons of details of the four tests, you may find useful info, like how often the test has "broken" items, or how often questions allow for more than one correct answer. I'm just not sure these incidentals are worth digging past all the rest. They are signs, however, that the researchers really did spend time actually looking at things, which shouldn't seem like a big deal, but in a world where NCTQ can "study" teacher prep programs by looking at commencement fliers, it's actually kind of commendable that the researchers here really looked at what they were allegedly looking at.

What else?

There are recommendations and commendations and areas of improvement (everybody sucks-- surprise-- at assessing speaking and listening skills), but it doesn't really matter. The premises of this entire study are flawed, based on assumptions that are either unproven or disproven. Fordham has insisted they are loaded for bear, when they have, in fact, gone unicorn hunting.

The premises and assumptions of the study are false, hollow, wrong, take your pick. Once again, the people who are heavily invested in selling the material of reform have gotten together and concluded that they are correct, as proven by them, using their own measuring sticks and their own definitions. An awful lot of time and effort appears to have gone into this report, but I'm not sure what good it does anybody except the folks who live, eat and breathe Common Core PR and Big Standardized Testing promotion.

These are not stupid people, and this is not the kind of lazy, bogus "research" promulgated by groups like TNTP or NCTQ. But it assumes conclusions not in evidence and leaps to other conclusions that cannot be supported-- and all of these conclusions are suspiciously close to the same ideas that Fordham has been promoting all along. This is yet another study that is probably going to be passed around and will pick up some press-- PARCC and SBA in particular will likely cling to it like the last life preserver on the Titanic. I just don't think it proves what it wants to prove.

Wednesday, February 10, 2016

College Board's Real Business

Here's the morning's promoted tweet from the College Board


That link takes you to the College Board page tagged with "Transformed Services for Smart Recruiting." Here you can find all sorts of useful headings like

Student Search Service (registered trademark)

Connect with students and meet recruitment goals using precise, deep data from the largest and richest database of college-bound students in the nation.

Enrollment Planning Service (trademark)

Achieve your enrollment goals with powerful data analysis tools that efficiently facilitate exploration of the student population and inform a smarter recruitment plan.

Segment Analysis Service (trademark)

Leverage sophisticated geographic, attitudinal and behavioral information to focus your enrollment efforts and achieve better yields from admission through graduation.

That last one, with its ability to leverage attitudinal and behavioral data-- how the heck do they do that? Exactly what is in the big fat College Board database?
 
There's a phone number for customers to call, and of course, "customers" does not mean "students and their families." It means all the nice people who keep the College Board in business by paying for the data that they've mined from their testing products. Those folks can click over to the College Board Search Support page to learn that every high school student who ever took a College Board test product (PSAT, SAT, AP exam, or any of the many new SAT products) is in the database.

I don't know that the data miners at the College Board are any more nefarious than those at Facebook or a television network. Though those at least give the datamined subjects a free "product" to play with-- the College Board manages to mine students for data and get them to pay for the privilege.

But so many people think of the College Board and its test products as some sort of public service or educational necessity. It would be useful if we could all remember who they really are, what they really do, and how they make their money.

Monday, February 8, 2016

CAP: The Promise of Testing

CAP is back with another one of its "reports." This one took four whole authors to produce, and it's entitled "Praise Joyous ESSA and Let a Thousand Tests Bloom." Ha! Kidding. The actual report is "Implementing the Every Student Succeeds Act: Toward a Coherent, Aligned Assessment System."

The report is sixty-some pages of highly-polished CAP-flavored reformster baloney, and I've read it so you don't have to, but be warned-- this journey will be neither short nor sweet. But we have to take it in one shot, so you can see the entirety of it, because there are large swaths of their argument that you probably agree with.

Who is CAP, again?

The Center for American Progress is billed as a left-leaning thinky tank, but it has also served as a holding tank for Clintonian beltway denizens. It was formed by John Podesta and run by him between his gigs as Bill Clinton's Chief of Staff and Hillary Clinton's campaign chairman, and has provided food and shelter to many Clinton staffers who didn't want to have to leave DC while waiting for their next shot at the Big Show.

CAP loves the whole privatizing charterfying profiteering common core cheering reformster agenda. In fact, CAP's deep and abiding love for the Common Core has burned brighter than a thousand stars and longer than even Jeb! Bush's willingness to keep saying that name. CAP has stymied me, taxing my ability to invent new versions of the headline "CAP says something stupid in support of Common Core" (see here, here, here, and here).

If the last fifteen years have seen the building of a revolving-door education-industrial complex on par with the military and food industries, then CAP is right in the center of that culture. They have never met an ed reform idea they didn't like or promote, and they are not afraid to manufacture slick, baloney-stuffed "reports" to push the corporate agenda.

So that's who produced this big chunk of goofiness.

Introduction

Like many other advocacy groups, CAP sees a golden opportunity in ESSA, and that golden opportunity is all about the testing.

States and districts must work together to seize this opportunity to design coherent, aligned assessment systems that are based on rigorous standards. These systems need to include the smart and strategic use of formative and interim tests that provide real-time feedback to inform instruction, as well as high-quality summative tests that measure critical thinking skills and student mastery of standards.

So how can states build on the research base and knowledge regarding high-quality assessments in order to design systems that do not just meet the requirements of federal law but actually drive student learning to a higher level—especially for students from marginalized communities?

And later, CAP says that this report "outlines a vision and provides specific recommendations to help federal, state and local leaders realize the promise of tests." The promise of tests? Not students, not education, not learning, not empowering communities to help their children grow into their best selves. Nope. The promise of tests. So, as is too often the case, we've skipped right past the question of "should we" and will proceed directly to "how," setting out once again to do a better job of more precisely hitting the absolutely wrong target. Yay.

History Lesson from Alternate Universe

CAP will now set the stage by hanging a backdrop of Things That Are Not True.

High-quality assessments play a critical role in student learning and school improvement. No, not really. Well, maybe, in the sense that "critical" is a pretty vague word.

High-quality tests can also show how well states, districts, and schools are doing in meeting the educational needs of all students. No. At least, not any allegedly high quality tests that currently exist.

CAP is willing to acknowledge that testing is "driving the agenda" and that's Not Good. They even acknowledge that despite their "research" showing that tests only take up 2% of school time, lots of folks have noticed that standardized testing has become the focus of too many schools.

CAP wants you to know that the ESSA has many cool, shiny features. It requires states to use broader measures and afford flexibility. CAP thinks ESSA might lead to less teacher evaluation emphasis on testing, maybe. There is money available for tweaking testing, including $$ for "innovation."

There's more history, like a history of tests. CAP equates the Socratic method with testing. They also cite the establishment of the Chinese testing that helped kick off centuries of conformity and non-innovation (read Yong Zhao's Who's Afraid of the Big Bad Dragon). We work our way through the present, skipping the parts where tests were useful for eugenics and Keeping the Lessers in Their Place.

Then we insert the usual Story of Accountability, beginning with 1983's Nation at Risk, which I always think is a bold choice, since Nation at Risk predicted that the country would have collapsed by now, so maybe it's not such a great authority.

Then we move on to the "promise of the Common Core State Standards," and as usual, CAP is shameless in its willingness to recycle old baloney like "the Common Core Standards are comparable to the academic standards in the highest performing nations in the world" (this leads us, by a circuitous route, back to some Fordham Core promotional work) and in reference to the Core testing, "like the Common Core, these tests are more rigorous and of higher quality than what many previous states had before." It's a big word salad with baloney on top. CAP also lauds the imaginary "shifts in learning" which are supported by a footnote to the Common Core website, so you know it must be true.

The state of testing

CAP explains the three types of test (formative, interim and summative) and notes that federally mandated tests are summative, and are "used to give students, parents and educators a detailed picture of student progress toward meeting state standards over the past school year," and I wonder, do they giggle when they write this, or have they smacked themselves in the brain with the PR sledgehammer so many times that they just don't feel it any more? The current Big Standardized Tests of course don't provide a detailed picture of anything at all.

CAP also wants us to know about high-quality tests, which "measure critical thinking and problem-solving skills" and why don't we also say that they measure the number of unicorns grazing in the fields of rainbow cauliflower growing behind the school, because they do both equally well. But CAP wants us to know that "good assessments are also field tested and evaluated by experts," so suddenly many of the BS Tests aren't looking too good.

CAP acknowledges the anti-test movement, but goes on to say that despite the backlash, national polling data shows that people really love the tests. Why, polls by Education Next and Education Post both found signs of the testing love! This is as surprising as a poll commissioned by the National Mustard Manufacturers that discovers a widespread love for mustard-- Post and Next both unabashedly advocate for, push, and profit from the testing, reform and privatization industry. CAP also takes us on a tour of the many states that have tried to tweak the testing biz one way or another, and I would take you through those, but we still have pages and pages to go, my friends.

Methodology

CAP takes this moment to share their methodology, which appears to be that they held some focus groups, talked to some people, and checked in with some parents, some rich and some poor, according to CAP. How these people were either located or selected is a mystery--they could have been random strangers from the street or CAP family members. They also made sure to talk to some other thinky tank pro-reform profiteering groups like Achieve, the Education Trust, and the College Board. They describe their sample as a "wide variety of stakeholders and experts," and we will just have to take their word for it.

What did they find out? 

So what are some of the things discovered in this vaguely defined researchy sort of activity?

Parents want better tests.

Here we see a return of the classic "just misunderstood" story line; the value of tests needs to be "made more evident" to parents. The report quotes one parent as "not against standardized testing, because there is a need to understand on a national level whether our children are being educated and where different districts need to have extra resources and the like." Which is a great quote, and might be a useful purpose for testing, except that it doesn't work that way under current reformster programs. Instead of, "Hey, this school is clearly underfunded and undersupported," we hear cries of, "Hey, this school has low scores. We must rescue students from it with charters and maybe close it, too."

And while parents in the focus group seem to see global and large-scale uses for testing, they aren't getting much use out of them for their own children.

Teachers do not get the time and support they need

This section is shockingly frank, reporting teachers who got PD about the PARCC when it wasn't completed, and teachers who report essentially being told to get those test scores up, never mind the regular instruction. Shocking, huh? I wonder what created that sort of atmosphere. We will all just numbly skip over the issue of whether these reformsters ever listen to a single word that teachers say, because remember-- when you want to creatively disrupt and make over an entire field, it's important to disregard any noise from the trained, experienced practitioners in that field.

Communication with stakeholders is weak

Yes, it's the PR. Common Core and BS Tests are just misunderstood. If only people could be re-educated about the tests. Maybe at a nice camp somewhere. (Bonus sidebar lauds the PARCC for their clear and colorful report card, which uses nice graphics to tell parents far less useful information than could be gleaned from a five-minute phone call to your child's teacher.)

Fun sidenote: several parents reported that they got the most useful information about testing from the John Oliver show segment on tests. That was probably not the kind of info that CAP wanted to have spread.

The Test lacks value for individual students

And that, boys and girls, is how a bureaucrat translates "The students sense that the BS Tests are a bunch of time-wasting bullshit with no connection to their actual lives." In fact, some parents and teachers said they had the impression that BS Test scores aren't even used to influence instruction. It's true. Instruction is also not very influenced by reading the warts of a grey toad under a full moon.

End-of-year summatives are not aligned to instruction

Well, no, they aren't. And as long as your plan is built around a large-scale, one-size-fits-all BS test, they never will be.

Too much test prep is occurring

Well, duh. The BS Tests have high stakes. And while CAP wants to pretend that new BS Tests are just so high quality and awesome that test prep is a waste of everyone's time score-wise, most everybody's experience is the opposite. The most authentic assessment matches the instruction and the actual task being learned. Since reformsters have fixed it so that teachers cannot change the assessment, the only way to make the BS Tests a more authentic assessment is to change what we teach. As long as schools are locked into a statewide high stakes BS Test beyond their control, there will be test prep, and lots of it.

CAP found that test prep was more prevalent among the poorer students. Again, duh. Lower socio-economic status correlates pretty directly to standardized test results. Lower SES students are the ones who need the most extra help to get up to speed on the twisty mindset needed to play the "What does the test writer want me to say here" game.

Weak logistics and testing windows and nutsy bolty things

If the test must be given on a computer and there are only thirty computers in the building, there's a problem. I'm inclined to think the problem is that you are requiring the students to take the test on a computer. Also, CAP has concerns about timing of test results and test taking allowing for accurate measures and useful feedback. I'm prepared to reassure CAP that no matter when or how my students take the BS Test, it will not provide an accurate measure or useful feedback, so y'all can just relax.

So what does CAP think we should do about all this?

So here's what CAP thinks the state, district and school authorities can do "to improve the quality of assessments, address concerns about overtesting, and make assessments more valuable for students, parents, and teachers." And if you've been reading carefully, you can guess where this is going.

Here's what states should do

Develop rules for "robust" testing. Okay, CAP says "principles," but they mean rules. Write some state-level rules about what every test should look like. Yessirree, what I need in my classroom is some suit from the state capital to tell me how to create an assessment.

Conduct alignment pogroms. Okay, maybe that's not the word they used. But they suggest that states check all up and down the school systems and make sure that every single teacher is fully aligned to the standards (including curriculum and homework). Because thanks to the Ed-Secretary-neutering powers of ESSA, reformsters can now shoot for total instructional control of every school district without raising the Federal Overreach Alarm. Oh, and the alignment should run K-16, so don't think you're getting off so easy, Dr. College Professor.

Since districts may not have the time and resources to make sure that every single solitary assessment is aligned and high quality, states should be ready to lend a hand. Give them some money. Create all the tests and assignments for them, or, you know, just hire some willing corporation to do it.

Demand a quick turnaround on test results. Because that creates more "buy-in" at the local level. Also "a quick turnaround also creates more value, and educators and families can use the assessment results more readily in their decision-making." Oh, yeah-- everyone is just waiting on pins and needles so they can make decisions about Young Chris's future. But about that...

Increase the value of tests for parents, teachers and students. How could we do that? By making better tests! Ha! Just kidding. By offering rewards, like college credits for good performance. Or awards and prizes for high scores. Like stickers and ribbons? Yes, that will make the BS Tests so much more valuable.

Jump on the innovative assessment development grant-band wagon. And here comes the punchline:

If states move forward with performance-based or competency-based assessments, they should consider carefully whether their districts and educators have the capacity and time to create high-quality, valid, reliable, and comparable performance assessments. Instead of looking to dramatically change the content of assessments, states should consider how they can dramatically change the delivery of assessments. States should explore moving away from a single end-of-year test and toward the use of shorter, more frequent interim assessments that measure student learning throughout the year and can be combined into a single summative determination. 

Yes, all-testing, all the time. It solves all of our problems-- with one perfectly aligned system that constantly logs and records and data-crunches every canned assignment and delivers the assessments seamlessly through the computer, we can plug students in and monitor every educational step of every educational day.

Finally, states should step up their communication game with better, prettier and more explainier printouts from the uber-aligned 360 degree teaching machine system, so that parents will understand just how much their elder sibling loves them.

What should local districts do? 

Bend over and kiss their autonomy goodbye? Ha! Just kidding. CAP would never say that out loud.

Get rid of redundant tests, preferably not the ones that are created by favored vendors.

"Build local capacity to support teachers' understanding of assessment design and administration." God, sometimes I think these guys are morons, and sometimes I think they are evil geniuses. Doesn't "support" sound so much nicer than "re-educate" or "properly indoctrinate." Because I have my own pretty well-developed understanding of assessment design and administration, but if they knew it, I don't think CAP would support it.

"Create coherent systems of high-quality formative and interim assessments that are aligned with state standards." Buy your entire assessment system from a single vendor. One size will fit all.

"Better communicate with parents about tests. To build trust, districts should be more transparent around assessments. This includes posting testing calendars online, releasing sample items, and doing more to communicate about the assessments." You know what's an excellent way to build trust? Behave in a trustworthy manner. Just saying. Also, this is not transparency. Transparency would include things like, say, releasing all the test items so students and parents could see exactly where Young Pat was marked right or wrong.

Tackle logistics. Remember how hard it is for schools to test many students on few computers? Districts should tackle that. It's not clear if that should be, like, a clean ankle-grab tackle or if districts can go ahead and clothesline that logistic. But CAP does have concrete examples, like "Plan well in advance" with the goal of "minimizing disruption." Thanks, CAP. I bet no district leaders ever thought of planning in advance. I can't believe you gave dynamite advice like that away for free.

What should schools do?

Make testing less torturous. Let students go pee.

Hold an explain-the-test social night. Have principals announce open-office hours so that any parent can stop by at any time to chat about the tests, because I'm sure the principal's day is pretty wide open and flexible.

Tell teachers things so that when parents ask questions, the teachers know the answers.

Oh, and stop unnecessary test prep. Just keep the necessary test prep, which is as much as you need to keep your numbers up. But thanks for the tip-- lots of teachers were in their classroom saying, "This test prep is a total waste of time, but I'm going to do it anyway just for shits and giggles, because I certainly didn't have it in my mind to teach my students useful things."

I am pretty sure that the further from broad policy strokes and the closer to actual classroom issues they get, the dumber CAP becomes.

How about the feds?

Use Title I as a means of threatening states that don't do all the jobs we gave them above. Help all the states that want to build the next generation all-day all-testing regimes. Spread best practices about assessment, because man, if there's anything we have learned over the past fifteen years, it's that when you want good solid answers about how to teach and assess your students, the federal government is the place to turn.

And the final recommendation?

If you are still reading, God bless you, but we needed to travel this twisty road in one go to see where it led.

It is the reformsters' oldest and most favorite trick-- X is a clear and present problem, therefore you must accept Y as a solution, and I am going to sell X so well that you will forget to notice that I never explain how Y is any sort of solution.

Overtesting is a problem. Bad testing is a problem. Testing that yields up no useful results is a problem. Bad testing as an annual exercise in time-wasting futility is a problem. Testing driving instruction is a problem. CAP has given more ground on these issues than ever, but it appears to be a ju-jitsu move in hopes of converting all that anti-testing energy into support for Performance Based Education.

Don't like testing? Well, the solution is more testing. All the time. In a one-size-fits-all canned package of an education program. And here's the final huge irony. This is CAP wrapping up with a description of the long-term goal:

system leaders should develop a robust, coherent, and aligned system of standards and assessments that measures student progress toward meeting challenging state standards. This exam system should be deeply grounded in the standards as assessed by an end-of-year summative test. Formative and interim assessments administered throughout the year will routinely—at natural transition points in the instructional program, such as the end of a unit—assess student understanding and progress and provide the results to teachers, parents, and students in close to real time. This system will enable everyone involved in a student’s education to make adjustments where needed in order to support learning so that no student slips through the cracks.

You know who does this sort of thing well already? Good, trained, professional classroom teachers. We assess daily, wrap those results back into our plans for the next day, and adjust our instruction to the needs and issues of individual students. We don't give pointless tests that are redundant or disconnected. We wrap larger and more formal assessments in with the informal assessments, and we do it while maintaining instruction and looking after our students as if they were actual live human beings. And we do it all in a timely manner. Of course, we don't do the things that CAP considers most critical.

For this assessment system to be as useful as possible, alignment is key. All assessments—formative, interim, and summative—must align with academic standards. 

At the end of the day, CAP loves testing very much. But the thing they love even more is broadly adopted, all-knowing, all-controlling standards. One size fits all, selected by some Wiser Higher Authority who somehow knows what all human beings must know, and unhindered by those damn classroom teachers and their professional judgment, and all of it giving up a wondrous river of data, a nectar far more valuable than the vulnerable little humans from whom it was squeezed. Jam the standards in and drag the data out. That's CAP's coherent, aligned future.

Wednesday, February 3, 2016

USED Supports Unicorn Testing (With an Irony Saddle)

Acting Pretend Secretary of Education John King has offered further guidance as a follow-up to last year's Testing Action Plan, and it provides a slightly clearer picture of the imaginary tests that the department wants to see.

Here are the characteristics of the Big Testing Unicorn that King wants to see:

Worth taking: By "worth taking," King means aligned to the actual classroom, and requiring "the same kind of complex work students do in an effective classroom and the real world, and provide timely, actionable feedback." There are several things to parse here, not the least of which is "timely, actionable feedback" for whom, and for what purpose? Is King's ideal test a formative assessment, and if so, is the implication that it shouldn't be used for actions such as grading at all?

"Worth taking" is one of those chummy phrases that sounds like it means something until you are pinned between the rubber and the road trying to figure out what it means exactly. In my own classroom, I certainly have standards for whether or not an assessment is worth giving, but that decision rests heavily on my particular students, the particular subject matter, and the particular place we are in our journey, all of which also connects to how heavily weighted the grade is and if, in fact, there will be a grade at all.

But King's vision of a test aligned to both classroom and the real world is a bit mysterious and not very helpful.

High quality: This means we hit the full range of standards and "elicits complex student demonstrations of knowledge" and is supposed to measure both achievement and growth. That is a huge challenge, since complex constellations of skills and knowledge are not always easily comparable to each other. Your basketball-playing child got better at foul shots and dribbling, but worse at passing and footwork. She scores more points but is worse at teamwork. Is she a better player or not?

Time-limited: "States and districts must determine how to best balance instructional time and the need for high-quality assessments by considering whether each assessment serves a unique, essential role in ensuring all students are learning."

So, wait. The purpose of an assessment is to ensure that all students learn? How exactly does a test ensure learning? It can measure it, somewhat. But ensure it?  Do you guys still not get that testing is not teaching?

This appears to say, "Don't let testing eat up too much instructional time." Sure. Of course, really good testing eats up almost no instructional time at all. On this point, the Competency Based Learning folks are correct.

Fair: The assessments are supposed to "provide fair measures of what all students, including students with disabilities and English learners, are learning." So this uber-test will accurately assess all levels of ability, from the very basement to the educational penthouse. King doesn't have any idea of how to do this, but he does throw the word "robust" in here.

Fully transparent to students and parents: King lists every form of transparency except the one that matters-- showing exact item-by-item results that include the question, the answers, and an explanation of why the test manufacturer believes their answer is the correct one. What King wants to make transparent is the testing PR-- reasons for the test, source of the mandate for the test, broad ungranulated reports of results, what parents can do even though we won't tell them exactly how their child's test went.

BS Tests currently provide almost no useful information, primarily because the testing system is organized around protecting the intellectual property rights of the test manufacturers. Until we address that, King's call for transparency is empty nonsense.

Just one of multiple measures: No single assessment should decide anything important. I look forward to the feds telling some states that they are not allowed to hold third graders back because of results on the BS reading test.

Tied to improved learning: "In a well-designed testing strategy, assessment outcomes should be used not only to identify what students know, but also to inform and guide additional teaching, supports, and interventions." No kidding. You know what my unattainable unicorn is? A world in which powerful amateurs don't make a big deal out of telling me what I already know as if they just discovered it themselves.

And your saddle of irony: Every working teacher reading this or the original letter has had exactly the same thought-- BS Tests like the PARCC and SBA and all the rest of them absolutely fail this list. The BS Tests don't measure the full range of standards, don't require complex, higher-order responses, suck up far too much time, cannot measure the full range of student ability, are supremely opaque, are given way too much weight as single measures, and are useless as tools for improving instruction. They are, in fact, not worth taking at all. Under this test action plan, they should be the first to go.

More swell ideas.

The letter comes with a five-page PS, ideas from the feds about how to improve your testing picture, or at least ways to score money from the department for that alleged purpose.

You could audit your state tests. You could come up with cool data-management systems, because bad, useless data is always magically transformed when you run it through computer systems. You might train teachers more in "assessment literacy," because we am dummies who need to learn how to squint at the ugly tests in order to see their beauty. You could increase transparency, but you won't. You could increase the reliability and validity of the tests-- or at least check and see if they have any at all to start with.

Or you could just take a whole bunch of testing materials and smack yourself over the head with them. Any of these seem like viable options for running your own personal state-level unicorn farm.

Thursday, December 3, 2015

The New ESEA and Content

There's a huge amount of discussion about how the New ESEA will affect policy and the flow of money, the new ways that privateers can grub for that money, and just how big a hash states will make out of education anyway, etc. etc.

But over at the Fordham blog, Robert Pondiscio has put a bit of focus where focus ought to be-- the new bill's effect on content.

Pondiscio is a reform fan who has always been willing to see what we see in the classroom-- that an emphasis on high stakes reading tests is destructive to the teaching of reading. I've made the same argument. The current theory about reading, embedded in both the Common Core and the Big Standardized Tests, is that reading is a set of free-floating skills unrelated to content, prior knowledge, or the engagement of the reader. The BS Tests have focused on short excerpts specifically chosen to be boring and weirdly obscure so as to guarantee that students will have no prior knowledge and will not find the excerpts interesting. All this because some reformsters believe that reading is a set of skills that has nothing to do with content, which is kind of like trying to imagine waves that exist independent of any matter through which they move. As Pondiscio puts it:

Years of treating reading as a discrete subject or a skill—teaching it and testing it that way—have arguably set reading achievement in reverse. You don’t build strong readers by teaching children to “find the main idea,” “make inferences,” and “compare and contrast.” You do it by fixing a child’s gaze on the world outside the classroom window.

It has been, and continues to be, a dumb and counterproductive way to approach reading. For one thing, it means that the best way for me to increase student achievement would be to never teach anything but daily three-paragraph excerpts from anything at all. Throwing out my anthology of American literature and replacing it with daily newspaper clippings would be an excellent way to get test scores up-- and a complete abdication of my responsibilities as a professional English teacher.

And professional English teachers know that. But for the past many years, we have also known that our school and professional ratings rest on those scores. So we have made compromises, or we have been commanded by state and/or local authorities to commit educational malpractice in the name of "student achievement" (the ongoing euphemism for "test scores").

This, more than anything else, is why the federal decoupling of teacher evaluation and school ratings from the BS Tests is good news.

Under the new ESEA, states will still have to test students annually, including in reading. But they have a lot more control over the way the results from those tests are turned into grades for schools. This could offer an opportunity to restore some sanity to schooling.

Exactly. States have the chance now to put an end to questions like, "Well, that's a lovely unit, but how will it prepare students for The Test?" It gives us the chance to get back to teaching students that reading (and writing and speaking and listening) are ways to engage with and unlock the wonders of the world.

Whether states will take the opportunity remains to be seen. But if they screw this up, they can no longer blame it on the feds. And if we sit in our schools and let them screw this up without raising a fuss in our respective state capitals, shame on us. The federal defanging of tests gives us the opportunity to put reading (and writing and listening and speaking) back in its rightful place, taught properly and properly used to empower student discovery of a million amazing things. No matter how I feel about the rest of the ESSA, I feel good about this.


Wednesday, November 25, 2015

PA: Testing Good News & Bad News

This week the Pennsylvania House of Representatives voted to postpone the use of the Keystone Exam (Pennsylvania's version of the Big Standardized Test required by the feds) as a graduation requirement. The plan had been to make the Class of 2017 pass the reading, math and biology exams in order to get a diploma. The House bill pushes that back to 2019.

The House measure joins a similar Senate bill passed last summer. The only significant difference between the bills is that the House bill adds a requirement to search for some tool more useful than the Keystones. The bills should be easy to fit together, and the governor is said to support the two-year pause, so the postponement is likely to become law. And that is both good news and bad news.

Good News 

The Keystone is a lousy test. It is so lousy that, as I was reminded in my recent Keystone Test Giver Training Slideshow, all Pennsylvania teachers are forbidden to see it, to look at it, to lay eyes on it, and, if we do somehow end up seeing any of the items, we are sworn to secrecy about it. But because I am a wild and crazy rebel, I have looked at the Keystone exam as well as the practice items released by the state, and in my professional opinion, it's a lousy test.

So it's a blessing that two more rounds of students will not have to pass the tests in order to graduate-- particularly as the feds bear down on their insistence that students with special needs be required to take the same test as everyone else, with no adaptations or modifications. The year the Keystones are made a graduation requirement is the year that many Pennsylvania students will fail to graduate, even though they have met all other requirements set by their school board and local district.

That will not be a good year.

Bad News

The tests will still be given, and they will still be used for other purposes. Those purposes include evaluating teachers, and evaluating schools.

Pennsylvania's School Performance Profile looks like it is based on a variety of measures (some of which are shaky enough-- PA schools get "points" for buying more of the College Board's AP products), but at least one research group has demonstrated that 90% of the SPP is based on test scores (one huge chunk for a VAMmy growth score and another for the level of the actual score).
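
To see how a composite built on a "variety of measures" can still be 90% test scores, consider a toy version-- the weights below are invented for illustration and are not the actual SPP formula:

```python
# Hypothetical weights for a composite school score. These numbers are
# invented for illustration; they are NOT the actual SPP formula.
components = {
    "achievement (from test scores)": 0.40,
    "growth (VAM-style, from test scores)": 0.50,
    "extras (AP purchases, etc.)": 0.10,
}

# Add up every component that is derived from test scores.
test_driven = sum(weight for name, weight in components.items()
                  if "test scores" in name)
print(f"Share of composite driven by test scores: {test_driven:.0%}")  # 90%
```

A formula can name a half-dozen "measures," but if the heavily weighted ones all trace back to the same BS Test, the composite is just a test score with extra steps.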

So we will continue to have schools and teachers evaluated based on the results of a frustrating and senseless test that students know they have absolutely no stake in, and which they know serves no purpose except to evaluate teachers and schools. Get in there and do your very best, students!

Bonus Round

Of course, some districts tried to deal with that issue of getting student skin in the game by also phasing in a pass-the-test requirement as a local thing. So now a whole bunch of students who have been hearing that they'll have to pass the Keystones to graduate-- they'll be hearing that the state took that requirement away, except then someone will have to tell them that their local district did NOT take the requirement away. This should open up some swell discussion.

So How Do We Feel?

State Board of Education Chairman Larry Wittig (a CPA who was appointed to the board by Tom Corbett) is sad, because he thinks the whole testing program is awesome and well-designed. Wittig's reaction is itself a mixed bag. On the one hand, he thinks that the testing system is "well-crafted" and beneficial to students, which is just silly, because the test is neither of those things. On the other hand, he also said this:

If I'm a teacher and in part my evaluation is based on the result of these tests and now the tests are meaningless, I'm going to have a problem with that.

And, well, yes. "No stakes for students, high stakes for schools and teachers" kind of sucks as an approach.

And really, is there anyone in Harrisburg who wants to articulate the reasoning behind, "We don't have faith in this test's ability to fairly measure student achievement, but we do have faith in its ability to measure teacher achievement"? No? No, there doesn't seem to be anybody trying to explain the inherent self-contradiction in this position.

Perhaps the House-sponsored search for a Better Tool will yield fabulous results. But in the meantime, we've already signed Data Recognition Corporation, Inc., to a five-to-eight-year contract to keep producing the Keystone, even if we don't know what we want to use the tests for.

The good part of this news is undeniable. Two more years of students who will not have to clear a pointless, poorly-constructed hurdle before they can get their diplomas. That's a win for those students.

But to postpone the test rather than obliterate it, to keep the test in place to club teachers and schools over the head, to signal that you don't really have an idea or a plan or a clue about what the test is for and why we're giving it-- those are all big losses for teachers, for education, and for all the students who have more than two years left in the system.


Thursday, November 19, 2015

More Evidence That Tests Measure SES

Want more proof, again, some more, of the connection between socio-economic status and standardized test results? Twitter follower Joseph Robertshaw pointed me at a pair of studies by Randy Hoover, PhD, at the Department of Teacher Education, Beeghly College of Education, Youngstown State University.

Hoover is now a professor emeritus, but he remains focused on the validity of standardized testing and the search for a valid and reliable accountability system. He now runs a website called the Teacher Advocate, and it's worth a look.

Hoover released two studies-- one in 2000, and one in 2007-- that looked at the validity of the Ohio Achievement Tests and the Ohio Graduate Test, and while there are no surprises here, you can add these to your file of scientific debunking of standardized testing. We're just going to look at the 2007 study, which was in part intended to check on the results of the 2000 study.

The bottom line of the earlier study appears right up front in the first paragraph of the 2007 paper:

The primary finding of this previous study was that student performance on the tests was most significantly (r = 0.80) affected by the non-school variables within the student social-economic living conditions. Indeed, the statistical significance of the predictive power of SES led to the inescapable conclusion that the tests had no academic accountability or validity whatsoever.

The 2007 study wanted to re-examine the findings, check the fairness and validity of the tests, and draw conclusions about what those findings meant to the Ohio School Report Card.

So what did Hoover find? Well, mostly that he was right the first time. He does take the time to offer a short lesson in statistical correlation analysis, which will be helpful if, like me, you are not a research scholar. Basically, the thing to remember is that a perfect correlation is 1.0 (or -1.0). So, getting punched in the nose correlates at about 1.0 with feeling pain.
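
If you want to see the arithmetic under the hood, here's a minimal sketch of how a Pearson r is computed-- my illustration with made-up numbers, not anything from Hoover's paper:

```python
# Minimal Pearson correlation from scratch (made-up data for illustration).
# r ranges from -1.0 (perfect inverse relationship) to 1.0 (perfect direct).

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Made-up example: a district SES measure vs. district test performance.
ses = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]
scores = [88, 80, 75, 70, 55, 50]
print(round(pearson_r(ses, scores), 2))  # ~0.99: nearly perfect correlation
```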

Hoover sets out to find the correlation between what he calls the students' "lived experience" and district-level performance; that correlation turns out to be 0.78. Which is high.

If you like scatterplot charts (calling Jersey Jazzman), then Hoover has some of those for you, all driving home the same point. For instance, here's one looking at the percent of economically disadvantaged students as a predictor of district performance.

That's an r value of -0.75, which means you can do a pretty good job of predicting how a district will do based on how few or many economically disadvantaged students there are.

Hoover crunched together three factors to create what he calls a Lived Experience Index that shows, in fact, a 0.78 r value. Like Chris Tienken, Hoover has shown that we can pretty well assign a school or district a rating based on their demographics and just skip the whole testing business entirely.

Hoover takes things a step further, and reverse-maths the results to a plot of results with his lived experience index factored out-- a sort of crude VAM sauce. He has a chart for those results, showing that there are poor schools performing well and rich schools performing poorly. Frankly, I think he's probably on shakier ground here, but it does support his conclusion that the Ohio school accountability system of the time was "grossly misleading at best and grossly unfair at worst," a system that "perpetuates the political fiction that poor children can't learn and teachers in schools with poor children can't teach."
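
For the curious, here's a rough sketch of what "factoring out" an index looks like in practice-- my illustration with made-up numbers, not Hoover's actual method or data. Fit a line predicting performance from the SES index, then treat each district's leftover residual as its "adjusted" performance:

```python
# Rough sketch of "factoring out" SES (illustration only, not Hoover's data):
# fit a least-squares line of performance vs. SES index, then read each
# district's residual (actual minus predicted) as its adjusted performance.

def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Made-up districts: (SES index, performance score)
districts = [(0.9, 85), (0.7, 82), (0.6, 70), (0.4, 66), (0.2, 58)]
xs, ys = zip(*districts)
m, b = fit_line(xs, ys)

for x, y in districts:
    residual = y - (m * x + b)  # positive: beating the SES-based prediction
    print(f"SES={x:.1f}  actual={y}  predicted={m * x + b:.1f}  residual={residual:+.1f}")
```

A district with a positive residual is "performing well for its demographics"-- which is exactly the poor-school-doing-well, rich-school-doing-poorly pattern Hoover's chart shows.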

That was back in 2007, so some of the landscape-- such as the Ohio school accountability system (well, public school accountability-- Ohio charters are apparently not accountable to anybody)-- has since changed, thanks to the many reformster advances of the past eight years.

But this research does stand as one more data point regarding standardized tests and their ability to measure SES far better than they measure anything else. 

Monday, November 16, 2015

USED Goes Open Source, Stabs Pearson in the Back for a Change

The United States Department of Education announced at the end of last month its new #GoOpen campaign, a program in support of using "openly licensed" aka open-source materials for schools. Word of this is only slowly leaking into the media, which is odd, because unless I'm missing something here, this is kind of huge. Open-source material does not have traditional copyright restrictions, and so can be shared by anybody and modified by anybody (to really drive that point home, I'll link to Wikipedia).

Is the USED just dropping hints that we are potentially reading too much into? I don't think so. Here's the second paragraph from the USED's own press release:

“In order to ensure that all students – no matter their zip code – have access to high-quality learning resources, we are encouraging districts and states to move away from traditional textbooks and toward freely accessible, openly-licensed materials,” U.S. Education Secretary Arne Duncan said. “Districts across the country are transforming learning by using materials that can be constantly updated and adjusted to meet students’ needs.”

Yeah, that message is pretty unambiguous-- stop buying your textbooks from Pearson and grab a nice online open-source free text instead.

And if that still seems ambiguous, here's something that isn't-- a proposed rules change for competitive grants. 

In plain English, the proposed rule "would require intellectual property created with Department of Education grant funding to be openly licensed to the public. This includes both software and instructional materials." The policy parallels similar policies in other government departments.

This represents such a change of direction for the department that I still suspect there's something about this I'm either not seeing or not understanding. We've operated so long under the theory that the way government gets things done is to hand a stack of money to a private company, allowing them both to profit and to maintain their corporate independence. You get federal funds to help you develop a cool new idea, then you turn around and market that cool idea to make yourself rich. That was old school. That was "unleashing the power of the free market."

But imagine if this new policy had been the rule for the last fifteen years. If any grant money had touched the development of Common Core, the standards would have been open source, free and editable to anyone in the country. If any grant money touched the development of the SBA and PARCC tests, they would be open and editable for every school in America. And if USED money was tracked as it trickled down through the states-- the mind reels. If, for instance, any federal grant money found its way to a charter school, all of that school's instructional ideas and educational materials would have become property of all US citizens.

As a classroom teacher, I find the idea of having the federal government confiscate all my work because federal grant money somehow touched my classroom-- well, that's kind of appalling. But I confess-- the image of Eva Moskowitz having to not only open her books but hand over all her proprietary materials to the feds is a little delicious.

Corporations no doubt know how to build firewalls that allow them to glom up federal money while protecting intellectual property. And those that don't may just stop taking federal money to fuel their innovation-- after all, what else is a Gates or a Walton foundation for?

And realistically speaking, this will not have a super-broad impact because it refers only to competitive grants, which account for about $3 billion of the $67 billion that the department throws around. 

So who knows if anything will actually come of this. Still, the prospect of the feds standing in front of a big rack of textbooks and software published by Pearson et al and declaring, "Stop! Don't waste your money on this stuff!" Well, that's just special.

And in case you're wondering if this will survive the transition coming up in a month, the USED also quotes the hilariously-titled John King:

“By requiring an open license, we will ensure that high-quality resources created through our public funds are shared with the public, thereby ensuring equal access for all teachers and students regardless of their location or background,” said John King, senior advisor delegated the duty of the Deputy Secretary of Education. “We are excited to join other federal agencies leading on this work to ensure that we are part of the solution to helping classrooms transition to next generation materials.”

The proposed change will be open for thirty days of comment as soon as it's published at the regulations site. In the meantime, we can ponder what curious conditions lead to fans of the free market declaring their love for just plain free. But hey-- we know they're serious because they wrote a hashtag for it.

Tuesday, September 8, 2015

AP, Please Do Your CCSS Homework

Sally Ho, who works the Nevada/Utah beat for the Associated Press, tried her hand at a Common Core Big Standardized Test Explainer. She needed to do a little more homework, including reading a few more things not written by Common Core Testing flacks.

She sets out with a good question. Last year students were supposed to be taking super-duper adaptive tests that would generate lots of super-duper data. But in many states the computerized on-line testing was a giant cluster farfegnugen, and in many other states there was an "unprecedented spread of refusals."

So what is the impact of the incomplete data?

Common Core History

Ho's research skills fail her here, and she goes with the Core's classic PR line: The Core is standards, not curriculum. Also, it was totally developed by governors and state school superintendents "with the input of teachers, experts and community members." It's pretty easy to locate the actual list of people in the room when the Core was written.

Ho locates the opposition to the Core strictly in the right wing, reacting to Obama's involvement and a perceived federal overreach. Granted, she's a Nevada-Utah reporter, but at this point it's not that hard to note the large number of people all across the political spectrum who have found reasons to dislike the Core.

What Happened Last Year?

Ho's general outline is accurate, though her generous use of passive voice (the Clark County School District "was crippled") lets the test manufacturers and states off the hook for their spectacular bollixing of the on-line testing. She also notes the widespread test refusal (go get 'em, New York).

She also dips into the history of incomplete data, noting Kansas in 2014 and Wyoming in 2010. She might have spared a sentence or two to note that nothing like this has happened before because nobody has tried data generation and collection on this scale before.

How Are Test Scores Usually Used?

States are required to test all students and use their scores to determine how the school systems are doing, which can affect funding. Some states use the data for a "ratings" system. A few are using it as a part of teacher evaluations. In the classroom, schools generally share the data with teachers who use it to guide curriculum decisions and measure individual students.

True-ish, true (particularly with air quotes around "ratings"), true-ish, and false. We can call it extra false because it's not possible to effectively do all of those things with a single test. Tests are designed for a particular purpose. Trying to use them for other purposes just produces junk data.


How Will Incomplete Scores Affect the Classroom?

Ho has a wry and understated answer to this question: "Direct impacts on the classroom are likely to be minimal." I think that's a safe prediction from an instructional standpoint, though she blithely slides past "most states aren't using it for teacher evaluations yet," which strikes me as blandly vague, considering we're talking about the use of junk data to decide individual teachers' fates.

Still, it's true that, since the test data never provided anything useful for the classroom, having less of a useless thing doesn't really interfere with anyone's teaching. And if there's a teacher out there saying, "But how shall I design my instruction without a full Big Standardized Test Data profile of my students," that teacher needs to get out of the profession.

Ho might also have addressed the issue that in most states the data, incomplete or otherwise, doesn't arrive before the start of the school year, anyway.


She also claims that everyone says that test scores don't make the final call on grade promotion, which will come as news to all those states that have a Third Grade Reading Test Retention policy.

Oh No She Didn't

Ho answers the question of "Why even bother to test?" with the hoariest of chestnuts, the Bathroom Scale Analogy-- "a school district trying to tackle chronic problems without standardized test scores can be like trying to diet without a scale." It is a dumb analogy. I have ranted about this before, so let me just quote me on this:

The bathroom scale image is brave, given the number of times folks in the resistance have pointed out that you do not change the weight of a pig by repeatedly measuring it. But I am wondering now-- why do I have to have scales or a mirror to lose weight? Will the weight loss occur if it is not caught in data? If a tree's weight falls in the forest but nobody measures it, does it shake a pound?

This could be an interesting new application of quantum physics, or it could be another inadvertent revelation about reformster (and economist) biases. Because I do not need a bathroom scale to lose weight. I don't even need a bathroom scale to know I'm losing weight-- I can see the difference in how my clothes fit, I can feel the easier step, the increase in energy. I only need a bathroom scale if I don't trust my own senses, or because I have somehow been required to prove to someone else that I have lost weight. Or if I believe that things are only real when Important People measure them. 


Ho tries to hedge her bets by going on to say that of course you need other data, but the basic analogy is still just bad.

What's Next? 

Studies looking at the validity of scores that states do have, which is kind of hilarious given that most of the BS Tests have never been proven valid in the first place.* So I guess states will try to find out if their partial unvalidated junk is as valid as a full truckload of unvalidated junk. That is almost as wacky as the next line:

For the next testing cycle, states say they don't expect problems.

Ho might want to check the files and see if the states expected problems this last time. You know, the time with all the unexpected problems. But Nevada has a new test manufacturer, Montana has no Plan B, and New York is leaning on parents. So everything should be awesome soon. And anyway, there's plenty of year left before it's time for the next puff piece on Common Core testing. Can I please request that AP reporters use that time to do some reading?

*I originally wrote that they have never been studied for validity; that's not true. Studies are out there. I and others remain unconvinced by them.