Saturday, September 5, 2015

FL: Testing Swamp

Since the 1920s boom in swampland sales, Florida has been a land of Things Too Good To Be True. No state in the union has smoothly pulled off the Big Standardized Testing piece of the Common Core-related Reformsterama, but Florida has brought a special "panache" and "je ne sais quoi le hell I'm doing" to the deployment of Le Gran Test.

Florida was on the testing bus early, rolling out the FCAT in 1998. The state was a member of the PARCC club until it became clear that all things Common Core were going to be liabilities for politicians. So Florida jumped ship and eventually awarded its testing contract to American Institutes for Research, a test manufacturer that has been a perennial runner-up in the testing contract beauty pageants.

According to various published reports, this may have led to the odd sidetrack of Florida buying into an alternative test that Utah had commissioned and then thrown out. But you won't find many Florida officials talking about the reported $5 million they paid for that test, perhaps because they've been too busy dealing with all the testing fallout. 

Le rolloutte was not tres bien. Reports came rolling in that the technological infrastructure could not manage the job, leading an editorial writer at the Tampa Bay Times to give the state an F for testing, noting that Buzzfeed had managed to let 41 million people vote on what color that damn dress was, with peak traffic of 670K viewers. Why, with months to prepare, could the testing computers not handle a smaller load, well known ahead of time?

Well, apparently the Tampa Bay writer chose a particularly apt analogy, because we have now arrived at a moment that is just as unclear as the color of the fabled dress.

See, the Florida legislature decided that the best way to quiet the din of criticism was a $600K study of the BS Test. They hired Alpine Testing Solutions, a company that specializes in "psychometric and test development services." The company was hard at it this summer, working to complete its study by the end of August. Because you definitely want to make sure your test is valid before you give it a second time.

But now the report has emerged and-- well, is it blue or gold or brown or what? There seems to be a bit of a contretemps on that point.

The State of Florida thinks that the report says the test is a thing of beauty and a joy forever. They point out things like how this year more tests were given than last year, and the test is totally safe from cyberattack. The big headline, though, is that despite a long, long list of issues, the test is still "valid in judging students' skills."

This is not exactly a home run. It's more, "We've talked to Mrs. Lincoln and we're pretty sure that she thought the play was swell."

You can read the report here. Well, you can try to read the report; it's written in fluent test manufacturer jargon, and would probably be easier to follow even in my pidgin French. You can go for the executive summary, but it's only shorter-- not clearer.

But digging out some specifics does not help the test's cause.

Some are simply practical, like the idea that the Utah items should be more Floridified. Because apparently around a third of the test questions are not even connected to Florida's standards. Without digging more deeply than I'm going to on a Saturday afternoon, it's hard to know just how bad that really is-- Utah and Florida are both states that faux-dropped the Common Core so that they could adopt some faux-local standards that are not too terribly different from CCSS. But there are other issues of deeper concern raised by the report.

With respect to student level decisions, the evidence for the paper and pencil delivered exams support the use of the FSA at the student level. For the CBT FSA, the FSA scores for some students will be suspect. Although the percentage of students in the aggregate may appear small, it still represents a significant number of students for whom critical decisions need to be made. Therefore, test scores should not be used as a sole determinant in decisions such as the prevention of advancement to the next grade, graduation eligibility, or placement into a remedial course.

So, the individual scores for students who took the computer version of the test shouldn't be used to make any decisions about the student, because it's entirely possible that they're wrong.

The interim passing scores were not established through a formal standard setting process and therefore do not represent a criterion-based measure of student knowledge and skills. The limitations regarding the meaning of these interim passing scores should be communicated to stakeholders.

We really need to talk about this more often. The BS Tests are not measuring students against any standard; they're just being used to stack rank students. Your child could only miss one question on the test, but if most of the other students miss zero questions, your child is still a failure. Well, assuming she actually missed the question.

The spring 2015 FSA administration was problematic. Problems were encountered on just about every aspect of the administration, from the initial training and preparation to the delivery of the tests themselves.

Test administration was a giant cluster-farfegnugen. Everything that could go wrong did.

If we take a step back and look at the larger picture, things don't look any better for this report.

First of all, while the state did hire an "independent third party" to examine its test, Alpine is only independent in the sense that it isn't directly involved in sales and marketing for this particular computer-based BS Test. But since they are in the industry, they are not going to ask some of the other necessary questions, like "Is it ever a good idea to try to give eight-year-olds a test on a computer?" The report is filled with language about how aspects of FSA fell "within industry standards." There was never going to be any question about whether or not industry standards are a big pile of baloney.

Nor could this independent third party fail to notice that the people paying them $600,000 were hoping for a particular answer. Nobody in the state capital wanted to hear about having wasted a gazillion dollars on a big pile of useless crap. There was always going to be a limit to just how much bad news Alpine could slip into the report.

Despite the unveiling of this $600K PR package, the hub-bub doesn't seem to be subsiding. A few counties are still making noise about getting away from the test, and there has been no great upswell of parents declaring, "Tres jolie! All of my concerns have been addressed, and I now welcome the FSA as a beloved and trusted friend." In fact, if we go back and look at the concerns that were being voiced last spring, we can't help but notice that they are largely unaddressed by the Alpine report. And the Tampa Bay Times still gives the test an F. It's safe to assume that a great many Floridians would still like to bid the test, "Bon voyage!" Even if education commissioner Pam Stewart declares the report "welcome news," it seems to be fairly unconvincing. Too bad, taxpayers who forked over $600K for nothing. C'est la vie.


  1. In the link to the concerns being voiced last spring, I like Lynne Rigby's suggestion that the legislature and department of ed all take the test and have their scores published.

  2. Roaring with laughter as I read each commendation in the executive summary. The writers seem to have gone out of their way to praise Florida's Department of Education. Flattery will gain them everything, I suppose, as some of their recommendations call for further study ... which they would be happy to provide, for a fee of course.