CURMUDGUCATION: Depth of Knowledge

Showing posts with label Depth of Knowledge. Show all posts

Monday, February 1, 2016

CCSS Flunks Complexity Test

The Winter 2016 issue of the AASA Journal of Scholarship and Practice includes an important piece of research by Dario Sforza, Eunyoung Kim, and Christopher Tienken, showing that when it comes to demanding complex thinking, the Common Core Standards are neither all that nor the bag of chips.

You may recognize Tienken's name-- the Seton Hall professor previously produced research showing that demographic data was sufficient to predict results on the Big Standardized Test. He's also featured in this video from 2014 that does a pretty good job of debunking the whole magical testing biz.

The researchers in this set out to test the oft-repeated claim that The Core replaces old lower order flat-brained standards with new requirements for lots of higher-order thinking. They did this by doing a content analysis of the standards themselves and doing the same analysis of New Jersey's pre-Core standards. They focused on 9-12 standards because they're more closely associated with the end result of education; I reckon it also allowed them to sidestep questions about developmental appropriateness.

The researchers used Webb's Depth of Knowledge framework to analyze standards, and to be honest and open here, I've met the Depth of Knowledge thing (twice, actually) and remain relatively unimpressed. But the DOK measures are widely loved and accepted by Common Coresters (I had my first DOK training from a Marzano-experienced pro from the Common Core Institute), so using DOK makes more sense than using some other measure that would allow Core fans to come back with, "Well, you just didn't use the right thing to measure stuff."

DOK divides everything up into four levels of complexity, and while there's a temptation to equate complexity and difficulty, they don't necessarily go together. ("Compare and contrast the Cat in the Hat and the Grinch" is complex but not difficult, while "Find all the references to sex in Joyce's Ulysses" is difficult but not complex.) The DOK levels, as I learned them, are

Level 1: Recall
Level 2: Use a skill
Level 3: Build an argument. Strategic thinking. Give evidence.
Level 4: Connect multiple dots to create a bigger picture.

Frankly, my experience is that the harder you look at DOK, the fuzzier it gets. But generally 3 and 4 are your higher order thinking levels.

The article is for a scholarly research journal, so there is a section about How We Got Here (mainstream started clamoring for students graduating with higher order smarterness skills so that we would not be conquered by Estonia). There's also a highly detailed explanation of methodology; all I'm going to say about that is that it looks solid to me. If you don't want to take my word for it, here's the link again-- go knock yourself out.

But the bottom line?

In the ELA standards, the complexity level is low. 72% of the ELA standards were rated as Level 1 or 2. That would include such classic low-level standards like "By the end of Grade 9, read and comprehend literature, including stories, dramas, and poems, in the grades 9–10 text complexity band proficiently, with scaffolding as needed at the high end of the range." Which is pretty clearly a call for straight-up comprehension and nothing else.

Level 3 was 26% of the standards. Level 4 was a whopping 2%, and examples of that include CCSS's notoriously vague research project standard:

Conduct short as well as more sustained research projects to answer a question (including a self-generated question) or solve a problem; narrow or broaden the inquiry when appropriate; synthesize multiple sources on the subject, demonstrating understanding of the subject under investigation

Also known as "one of those standards describing practices already followed by every competent English teacher in the country."

Math was even worse, with Level 1 and 2 accounting for a whopping 90% of the standards.

So if you want to argue that the standards are chock full of higher order thinkiness, you appear to have no legs upon which to perform your standardized happy dance.

But, hey. Maybe the pre-Core NJ standards were even worse, and CCSS, no matter how lame, are still a step up.

Sorry, no. Still no legs for you.

NJ ELA standards worked out as 66% Level 1 and 2, Level 3 with a 33%, and Level 4 a big 5%.

NJ math standards? Level 1 and 2 are 62% (and only 8% of that was Level 1). Level 3 was 28%, and Level 4 was 10%.

The researchers have arranged their data into a variety of charts and graphs, but no matter how you slice it, the Common Core pie has less high order filling than NJ's old standards. The bottom line here is that when Core fans talk about all the higher order thinking the Core has ushered into the classroom, they are wrong.

Wednesday, August 26, 2015

Content-Free Curriculum

Coming back to school (first student day yesterday, thanks) reminds me once again of the huge hole in the heart of current Core curriculum. The lack of content. The sad fact that there is no there there.

Yes, someone is going to pop up to say that the Core (both in its original incarnation and all the old wine in new skins versions that have promulgated throughout states where "OMGZ NOES! We has no Common Core!" versions of the Core still roam free) is NOT a curriculum, which is part of the trick. Because the Core really isn't a curriculum in a classic sense; the ELA standards are a sort of anti-curriculum in which teachers are forbidden to care about the content, and must only worry about teaching students to perform certain actions, certain tricks, on a test.

Content exists, and teachers a free to select what they will. Teach Romeo and Juliet or Heart of Darkness or Green Eggs and Ham-- we don't care because it doesn't matter because the literature, the content, has no purpose beyond a playing field on which to practice certain plays. I've accused the Core of treating literature like a bucket in which we carry the important part, the "skills" that the Core demands, but it's more accurate to call literature the paper cup-- disposable and replaceable. We just want you to be able to "find support" or "draw conclusions." About what just doesn't matter.

Back in the early days, we had folks arguing that CCSS called for rich content instruction, that it absolutely demanded a classroom filled with the classic canon. At the time I thought those folks were simply hallucinating, since CCSS makes no content demands at all (the closest it comes is the infamous appendix suggested readings list). But I've come to believe that those folks were reacting to the gap that they saw-- "Without rich content, this set of standards is crap, so apparently, by implication and necessity, this must call for rich content. Because otherwise it's crap."

I think the absence of content is also the origin of the "new kind of non-bubble non-memorizing test" talking point. The old school test their thinking of is the kind that asks you to pick the year the Magna Carta was signed, or identify the main characters of Hamlet. But the Big Standardized Tests cover absolutely no content at all. I could throw out all my literature basal texts and never teach a single item from the canon, a single work of literature all year, and still have my students prepared for the BS Test by studying test-taking techniques while reading an article from the newspaper and answering questions about it every single day.

This is also the secret of Depth of Knowledge instruction. It doesn't matter what you teach, as long as you use it to develop certain mental tricks.

Look at it this way:

A student could graduate from high school with top scores on the BS Test and have read nothing in high school except the daily newspaper. The student's teachers would be rated "proficient," the student's school would be high-achieving, and the student could proudly carry the Common Core BS Test "advanced" seal of approval, without that student ever having read a single classic work of literature or every having learned anything except how to perform certain tricks for answering certain questions when confronted with a text.

This is not a high standard. This is neither college nor career ready. Core supporters are going to say, "Well, the local school is free to-- and should-- fill in the blanks with classic literature and great reading." But the test-and-punish reformster system that we live on does not care a whit for content. If students cannot perform the proper tricks on the BS test, students, teachers and schools will be punished. If students cannot identify Huck Finn, MacBeth, James Baldwin, or Toni Morrison, it won't make a bit of difference to the system.

This tilting of the playing field does not just make the Core content neutral; it makes the Core content hostile. It dismisses the value of literature without so much as a conversation. If you are talking to a Core fan who insists otherwise, ask them this question-- Can I prepare my students to be proficient on the BS Test without reading a single important work of literature? The answer is "yes." If they say otherwise, they are lying.

Thursday, November 13, 2014

NCTQ: It's So Easy

The National Council on Teacher Quality is one of the leaders in the production of education-related nonsense that is somehow taken seriously. The offices of NCTQ may not produce much of anything that provides real substance, but somewhere in that cushy suite there must be the best turd-polishing machine ever built.

NCTQ has published a new "report" that "seeks" to "answer" two questions:

Are teacher candidates graded too easily, misleading them about their readiness to teach? Are teacher preparation programs providing sufficiently rigorous training, or does the approach to training drive higher grades?

And when I say "seeks to answer," what I mean is "tries to cobble together support for the answer they've already settled on." There is no indication anywhere that NCTQ actually wondered what the answers to these questions might be. No, it appears that they set out to prove that teacher candidates are graded too easily as they meander through their rigorless teacher programs.

Who are these guys?

Does the NCTQ moniker seem familiar? That's because these are the guys who evaluated everyone's education program and ran it as a big story in US News (motto "When it comes to sales, lists are better than news"). That evaluation list caused a lot of stir. A lot.

Funny story. I interviewed a college president from a local school who was steamed about that list in part because it slammed them for the low quality of a teacher program that they don't even have. Turns out a lot of people had problems with NCTQ methodology, which involved not actually talking to anybody at the schools, but collecting information less detailed than what you can get from your high school guidance counselor. You can read a cranky critique here and a more scholarly one here. Bottom line-- they get great media penetration with a report that has less substance than my hair.

NCTQ has tried to blunt some of the criticism with moves like adding a Teacher Advisory Group composed of Real Live Teachers. Also, NCTQ uses very pretty graphics in soothing colors.

The new report-- Easy A's (they even italicize the title, like a book title, because it's more hefty and important than a mere "article title")-- is billed as "the latest latest installment of the National Council on Teacher Quality’s Teacher Prep Review, a decade-old initiative examining the quality of the preparation of new teachers in the United States." This is supposed to be part of their "growing body of work designed to ensure that teacher preparation programs live up to the awesome responsibility they assume." For the moment, let's look at the "findings" in this "report."

And the bottom line is...

They studied about 500 schools; these schools are collectively responsible for about half the teacher degrees granted. Two "findings" here.

First, they find the majority of institutions studied (58%) grading standards are lower than for students in other majors on the same campus.

Second, they find a strong link "between high grades and a lack of rigorous coursework, with the primary cause being assignments that fail to develop the critical skills and knowledge every new teacher needs."

Wow!!

I know! My mind boggles at the huge amount of research involved here. This must have required an extensive study of each of the 500 institutions studied. I mean, we're talking about comparing the rigor of assignments in both education and non-education courses, so researchers must have had to dig through tens of thousands of college course assignments in addition to an extensive study of the grading standards of thousands of professors to be able to make these comparisons.

And then to do all the research and number crunching needed to establish a correlation between the rigor of assignments and grades achieved-- this would be a more complicated model than VAM, to tease out all those data,

Also, it's worth noting that NCTQ knows what critical skills and knowledge all new teachers need, which must have been a huge research project all by itself. I do hope they publish that one soon, because if we had such a list, it would certainly revolutionize teacher evaluation and training. In fact, if they've done all this research and know all these answers, why aren't they just traveling from college to college and saying, "Here-- this is what our proven research shows you should be teaching teacher trainees."

Never mind.

Apparently the minds at NCTQ boggled at that research challenge as well. So let's look at what they actually did.

For the first point-- the idea that teacher grads are the recipients of departmental grade inflation-- NCTQ looked at commencement brochures. They checked commencement brochures to see a) who graduated from a teaching program and b) who had an honors grad designation based on GPA.

It is not clear how many years are spanned. There were 500-ish schools studied, and footnotes in an appendix indicate that a total of 436 commencement brochures were discarded for insufficient data. Yet the executive summary says that 44% of all teacher grads in all 509 schools earned honors, while only 30% of all graduating students did. And in 214 schools, there's no real difference, and in 62 schools, teachers had fewer honors grads. How did they get such precise numbers? That is not explained.

There's some more detailed breakdown of methodologies of teasing the data, but that's the data they accumulated from graduation brochures, and the whole argument boils down to "Barely more teachers graduate with honors than do other majors." Oddly enough, this does not lead NCTQ to conclude, "Good news, America! Teachers are actually smarter than the average college grad." I guess it comes down to how you interpret the data. That you collected from graduation brochures.

But what about that lack of rigor?

Having somehow concluded that teacher programs are hotbeds of easy grades, NCTQ turns to the question of who let the rigor out. Once again, their methodology is itself as rigorous as a bowl of noodles left to boil for twelve hours.

Multiple theories as to why students in education majors might appear to excel so often were also examined (e.g., clinical coursework that lends itself to high grades, too many arts and crafts assignments, too much group work, particularly egregious grade inflation, better quality instruction, more female students who tend to get higher grades, opportunities to revise work, and higher caliber students), but none appears to explain these findings as directly as the nature of the assignments.

First of all, interesting list of theories there. Second of all-- "none appears to explain"?? How did you judge that appearance, exactly. Did we just put each theory on a post-it note, stick it to the wall, stand back and hold up our thumbs and squint to see which one looked likely? Because usually the way you reject theories in research is with research.

The Big NCTQ Thumb came to rest on "criterion-referenced" and "criterion-deficient" assignments. We'll come back to that in a moment-- it deserves its own subheading. Just one more note about te methodology here.

NCTQ got their hands on syllabi for 1,161 courses, "not just on teacher education but across an array of majors." Except-- wait. We're looking at 509 schools, so 1,161 works out to a hair over two courses per school, including schools with multiple education programs. Oh, no-- never mind. Here it is in the appendix-- we're only going to do this in depth analysis for seven schools. Plus at thirty-three other schools, we will look at just the education programs. Oh, and on this chart it shows that of the seven in-depth schools, we'll look at only teacher programs in two. So, "wide array" means six other majors at five of the 509 schools.

And yes-- the data comes from course syllabi. At seven schools. Not the course, professors, students-- just the syllabi. So our first assumption will be that these syllabi with their lists of course assignments will tell us everything we need to know about how rigorous the coursework is.

Creating the right hatchet for the job

"Criterion-referenced" is a thing, and it basically means an objective test. "Criterion-deficient," on the other hand, will actually win you a game of googlewhacking because apparently nobody uses the term except NCTQ. "Criterion deficiency" is a real thing, used it seems mostly in the non-teaching world to describe a test that fails to assess an important criterion (e.g. you want a secretary who can word process, but the job evaluation doesn't check for word processing). I bring this up only because now I have a great fancy word for discussing high stakes standardized tests-- they suffer from citerion deficiency.

But back to our story.

NCTQ cross-reference course syllabi with grade records posted publicly by registrars and open records requests (Schools DO that?! Seven years ago I wasn't allowed to know the grades of my own children for whom I was paying bills because of privacy laws!) NCTQ "applauds the commitment to transparency" of those schools willing to complete ignore student privacy concerns.

NCTQ looked at 7,500 assignments and ranked them as either CR or CD (that's my abbreviation--if they can be lazy researchers, I can be a lazy typist). Here are the criteria used to tell the difference:

An assignment is considered criterion-referenced when it is focused on a clearly circumscribed body of knowledge and the assignment is limited so that the instructor can compare students’ work on the same assignment.

Qualities that indicate an assignment is criterion-referenced include
* a limited scope
* evaluation based on objective criteria;
* students’ work products similar enough to allow comparison.

Qualities that indicate an assignment is criterion-deficient include
* an unlimited or very broad scope
* evaluation based on subjective criteria
* students’ work products that differ too much to be compared.

NCTQ provides a sample of each. A CR lesson would be one in which the student is to apply a specific tool to critique videotaped (quaint) lessons. This is good because everyone uses the same tool for the same lessons so that the instructor knows exactly what is going on. A CD assignment would be to teach something-- anything-- to the class using information from the chapter. This is bad because everybody teaches something different, uses different specific parts of the chapter, and teaches in different ways. This would be bad because there would be too many differences the instructor would be unable to determine who is best at the teaching of stuff.

How is this distinction a useless one? Let me count the ways.

1) It assumes that the purpose of an assignment is to allow comparison of one student to another. This would be different from, say, using assignment as a way to develop and synthesize learning, or to mark the growth and development of the individual student.

2) It assumes that all good teachers look exactly the same and can be directly compared. It imagines that difference is a Bad Thing that never intrudes in a proper classroom. This is bananas.

3) It assumes that the best assignments are close-ended assignments that have only one possible outcome. Even reformsters steeped in Depth of Knowledge and Rigorology tout the value of open-ended response tasks with a variety of correct answers without demanding that the many correct responses be ranked in order of most correct.

4) It appears to place the highest value on objective testing. If that is true for my teacher training, is it not true for my practice? In other words, is NCTQ suggesting that the best assessments for me to use with my students are true-false and multiple choice tests rather than any sort of project-based assessment. Because, no.

5) It assumes that all students in the future teacher's future classroom will also be one size fits all. When I ask my students to prepare oral presentations, should I require that they all do reports on Abraham Lincoln so that they can be more easily and accurately compared?

6) It discounts the truth that part of being a professional is being ready and able to exercise subjective judgment in an objective manner. In other words, free from personal prejudice, but open to the ways that the individual student's personality, skills, and history play into the work at hand (without excusing crappy work).

7) They are trying to make this distinction based on assignments listed in syllabi.

There's more

I did look for the appendix evaluating how well five weeks will prepare you for teaching if you come from a super-duper university, but that was one aspect of teacher training ease NCTQ did not address.

There are other appendices, examining ideas such as Why High Grades Are Bad (including, but not limited to, if grades are divorced from learning, would-be employers will find grades less useful). But I'm not going to plow through them because at the end of the day, this is a report in which some people collected some graduation brochures and course syllabi and close read their way to an indictment of all college teacher training programs.

It is not that these questions are without merit. Particularly nowadays, as teacher programs are increasingly desperate to hold onto enough paying customers to keep the ivy covered lights on, teacher training programs are undoubtedly under increasing pressure to manufacture success for their students, one way or another. Nor is it unheard of for co-operating teachers to look at student teachers and think, "Who the hell let this hapless kid get this far, and who the hell is going to actually give him a teaching certificate?"

The question of how well training programs are preparing teachers for real jobs in the real world ought to be asked and discussed regularly (more colleges and universities might consider the radical notion of consulting actual teachers on the matter). And as more reformster foolishness infects college ed departments, the problem of Useless Training only becomes worse. So this is absolutely a matter that needs to be examined and discussed, but the method should be something more rigorous than collecting some commencement brochures and course syllabi and sitting in an office making ill-supported guesses about what these random slips of paper mean.

And yet I feel in my bones that soon enough I'll be reading main stream media accounts of the important "findings" of this significant "research." Talk about your easy A.

Monday, October 6, 2014

Depth of Knowledge? You'll Need Hip Boots.

Have you met Webb's Depth of Knowledge in all its reformy goodness. I just spent a couple of blood pressure-elevating hours with it. Here's the scoop.

In Pennsylvania, our state department of education has Intermediate Units which are basically regional offices for the department. The IU's do some useful work, but they are also the mechanism by which the state pumps the Kool-Aid of the Week out into local districts.

Today my district hosted a pair of IU ladies today (IU reps are typically people who tried classroom teaching on for size and decided to move on to other things). As a courtesy, I'll refer to them as Bert and Ernie, because one was shorter are chirpier and the other has a taller frame and a lower voice. I've actually sat through DOK training before, but this was a bit clearer and direct (but not in a good way).

Why bother with DOK?

Bert and Ernie cleared this up right away. Here's what was written on one of the first slides in the presentation:

It's not fair to students if the first time they see a Depth of Knowledge 2 or 3 question is on a state test (PSSA or Keystone).

In other words, DOK is test prep.

Ernie showed us a pie chart breaking down the share of DOK 2 and 3 questions. She asked how we thought the state will assess DOK 4 questions? Someone went with the obvious "on the test" answer, and Ernie said no, that since DOK 4 questions take time, the Test "unfortunately" could not do that.

There was never any other reason. Bert and Ernie did not even attempt to pretend to make a case that attending to DOK would help students in life, aid their understand, or even improve their learning. This is test prep.

Where did it come from?

Webb (it's a person, not a piece of jargon) developed his DOK stuff in some sort of conjunction with CCSSO. Ernie read out what the initials stand for and then said without a trace of irony, as God is my witness, "They sound like real important people, so we should trust them." She did not mention their connection to the Common Core which, given the huge amount of CCSS love that was going to be thrown around, seems like an odd oversight. The presenters did show us a graphic reminding us that standards, curriculum, and assessments are tied together like the great circle of life. So there's that.

How does it work?

This turned out to be the Great White Whale of the morning. We watched two videos from the Teacher Channel that showed well-managed dog and pony shows in classrooms. Bert noted that she really liked how the students didn't react to or for the camera. You know how you get that? By having them spend lots of time in front of the cameras, say, rehearsing their stuff over and over.

The first grade class was pretty impressive, but it also only had ten children in it. One of my colleagues asked if the techniques can be used in classes with more than ten students (aka, classes in the real world) and that opened up an interesting side note. The duo noted that the key here is routine and expectations, and that you need to spend the first few weeks of school hammering in your classroom routines so that you could manage more work. One teacher in the crowd noted that this would be easier if all teachers had the same expectations (apparently we were all afraid to use the word "rules") and Ernie allowed as how having set expectations and routines from K through the upper grades would make all of this work much better. "Wouldn't it be lovely?" she said.

Because when you've got a system that doesn't work very well with real, live children, the solution is to regiment the children and put them in lockstep. If the system and the childron don't mesh well-- change the children.

Increasing rigor!

You might have thought this section would come with a definition of that illusive magical quality, but no. We still can't really explain what it is, but we know that we can increase rigor by ramping up content or task or both.

We had some examples, but that brought up another unsolved mystery of the day. "Explain where you live" (DOK 1) ramped its way up to "Explain why your city is better than these other cities" (DOK 3). One of my colleagues observed that this was not only a change in rigor, but a complete change of the task and content at hand. Bert hemmed and hawed and did that little I Will Talk To You Later But For Right Now Let's Agree To Ignore Your Point dance, and no answer ever appeared.

So if you are designing a lesson, "List the names of the planets" might be a DOK 1 question, but a good DOK 3 question for that same lesson might be "Compare and contrast Shakespeare's treatment of female characters in three of his tragedies."

Audience participation

Bert and Ernie lost most of the crowd pretty early on, and by the time we arrived at the audience participation portion (two hours later), the audience seemed to have largely checked out. This would have been an interesting time for them to demonstrate how to handle a class when your plan is bombing and your class is disengaged and checked out, but they went with Pretending Everything Is Going Swell.

The audience participation section highlighted just how squishy Depth of Knowledge is. Bert and Ernie consigned all vocabulary-related activities to Level 1, because "you know the definition or you don't." That's fairly representative of how test creators seem to think, but it is such a stunted version of language use, the mind reels. Yes, words have definitions. But there's a reason that centuries of poetry and song lyric that all basically mean, "I would like to have the sex with you," have impressed women far more than simply saying "I would like to have the sex with you."

There's a lot of this in DOK, a lot of just blithely saying, "Well, this is what was going on in the person's brain when they did this, so this is the level we'll assign this task."

DOK's big weakness

DOK is not total crap. There are some ideas in there that can lead to some useful thinking about thinking. And if you set it side by side with the venerable Bloom's, it can get your brain working in the same way that Bloom's used to.

But like all test prep activities, DOK does not set out to teach students any useful habits of mind. It is not intended to educate; it is intended to train students to respond to certain sorts of tasks in a particular manner. This is not about education and learning; this is about training and compliance. It's a useful window into the minds of the people who are writing test items for the Big Test, if you're concerned about your students' test scores. If you're interested in education, this may not be the best use of your morning.

Pages