Sunday, April 20, 2014

How To Do Real Teacher Evaluation

The fans of Reformy Stuff are not wrong about everything. For example, they are correct that the general state of teacher evaluation in this country was pretty useless. Their mistake was replacing Inertly Useless with Actively Destructive. The old system was a simple two-step process (1- check for teacher pulse; 2- award perfect score [edit--or, in some Bad Places To Work, award lousy score just because you want to]) while the new system is a little more involved (1- apply random groundless unproven mathematical gobbledygook to big bunch of bad data; 2- award randomly assigned bad score).

Years ago, frustrated with the old mostly-useless model and before the current looney tunes empire took hold, a friend and I had started to rough out an evaluation system. Let me sketch out the basics for you.

What Should a Good Teacher Eval System Do?

1) Provide clear expectations to the teacher. One of the wacky things about teaching is that everybody is sure that everybody knows what a teacher's exact job description is, and yet it invariably turns out that nobody agrees. In many districts, teachers enter their classrooms with no job description and no really clear idea of what is expected of them.

2) Provide useful feedback and remediation. That includes setting the stage for meaningful remediation if it's called for. Only a small percentage of new teachers will be awesome right out of the box or clearly hopeless. Most are waiting to be guided toward either excellence or despair, and most districts depend on a system that I like to call "Blind Luck." I swear there are teachers out there whose careers could have gone a completely different direction if they had just eaten lunch with a different set of veteran teachers in their first few years.

3) Provide the district with clear information on whether they need to retain, retrain or refrain from hiring permanently.

Assumptions in Building the Eval System

1) Precise, observable data is the enemy of real, useful information. In the hands of hard data overlords, traits like "maintains good communication with parents" end up being some numerical observable, such as "calls at least two parents every five days." Hard data fans like really precise measures, and so their data may be precise, but their conclusion is always wrong. Mr. McSwellteach may personally visit 150 parents a month or sing in a church choir with half of his total parental units. He may rely more on e-mail because that's what his students' parents prefer. He may have an absolutely uncanny sense of when to contact the parents and when to leave them alone. He may, in short, be a pretty awesome parent communicator, but since the metric focuses on one specific, concrete, observable, measurable piece of data out of a thousand possible factors, it completely misses the real information here.

2) People may not be able to explain a good teacher, but they generally know one when they see her.

3) The best way to correct for individual bias in a survey is to collect information from many, many individuals. And anyway...

4) You're trying to evaluate subjective qualities. This is like trying to evaluate husbands. Your husband from hell may be her perfect dreamboat. There are certainly some rough patterns of qualities that will emerge, clustered around a statistical strange attractor of some sort, but you will not be able to draw a box around a configuration and say, "Everything inside the line is good, and everything outside is bad." If that violates your world view or makes you uncomfortable, just suck it up and put on your Big Person Pants.

The Short Method for Real Teacher Evaluation

Hire a really good principal and let him do his job.

The Method Proposed for the Other 98% of Schools

The first step actually occurs before your district even gets started. This is where our consulting firm start-up was going to have to do some real work. Basically, you need a giant list, a huge constellation of teacher qualities arranged around some master categories such as Knowledge, Community Interaction, Classroom Management-- mostly the basic main qualities that we're familiar with. For each of the master qualities, a truckload of specifics, from "dresses up for work" to "enthusiastic with kids." Note that mostly these will not be specific enough for some of you-- it will be more "communicate well with parents" and not at all "makes two parent phone calls every four days." This massive menu of teacher qualities is where we start and launch into the following steps.

1) Pull together a large committee. It will include teachers, students, administrators, community members, parents, business folks-- as broad a representation of the stakeholders as you can gather up. And then, using any one of the many fine models for this kind of group work out there, your group is going to take the master list of traits and customize it.

2) Customizing will cover two factors-- what to include, and how to weight it. It's here where your folks will decide, for instance, that in your community dressing up to teach doesn't matter at all, and that being kind to students is twice as important as being funny. You'll work this out on two levels. First, you'll decide which micro-traits will contribute what percent of the score for the categories (e.g., "strict disciplinarian" will make up 4% of the "Classroom Management" category). Second, you'll work out the relative weight of the master categories. You'll do this on the school level-- Content Knowledge may be 50% of your expectation for secondary teachers but only 30% for elementary, whereas Parent Relationship might run the other way. And hey-- if you want "Prepares students thoroughly for standardized tests" to be a huge factor, you can go ahead and do that.

3) Congratulations. The process was long and hard and involved lots and lots and LOTS of discussion, some of it probably heated, but once you're done you have created a fairly detailed job description, a picture of what your stakeholders expect from a teacher in the district. Imagine, teachers, if on your first day someone had handed you a multi-page detailed and weighted list of the qualities and behaviors they expected you to display instead of a room key and a hearty "Good luck!"

4) Hey look! That big involved job description is also the evaluation form. All we have to do is give you a score for each line item, and we have already figured out how much weight that carries in your final evaluation.

5) Who fills out your eval form? Well, some of you won't like this, but our answer was "Everybody." Other teachers, current students, former students (we thought it important to keep alumni in the loop for decades after graduation), admins, parents, anybody we can think of.

The eval forms can be filled out with simple number scores, but we allow evaluators to add narrative if they wish. Will there be outliers-- cranky parent, jerk student, someone who just has an axe to grind? Sure-- but if our sample is large, a few outliers won't screw anybody up, and the same software that's going to crunch (and possibly collect) all this data can also be taught to toss out small left-field samples. We could probably even teach the program to block folks who are consistently mischief-makers.

We had never quite figured out weighting as it applied to this portion. I'm pretty sure the principal should carry more weight than Billy-Bob Schnoodleman in 5th grade art class, but we hadn't quite worked out that kink when we stopped working on this. Put it in the to-do pile.

6) Hey look! The evaluation results tell you exactly where your strengths and weaknesses are! And part of this process will involve establishing an in-school remediation work group-- folks who can mentor and help other teachers with particular weaknesses that match up with their particular strengths. There's a piece for deciding when someone is, well, hopeless, but our focus is on strengthening people. But we'll stop there before I start in on my plan for creating teaching schools that work like teaching hospitals for doctors.
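For the arithmetically inclined, here is a rough sketch of how the two-level weighting from step 2 and the outlier-tossing from step 5 might fit together. We never built this software, so everything here is hypothetical: the category names, the trait names, most of the weights (only the 4% for "strict disciplinarian" and the kind-counts-double-funny ratio come from the examples above), the 1-to-5 scale, and the choice of a trimmed mean as the way to "toss out small left-field samples." It also treats every evaluator's score as equal, since the rater-weighting question is still in the to-do pile.

```python
from statistics import mean

# Hypothetical weights a committee might settle on (illustrative only).
# Category weights sum to 1.0; trait weights sum to 1.0 within a category.
CATEGORY_WEIGHTS = {
    "Content Knowledge": 0.5,
    "Classroom Management": 0.3,
    "Parent Relationship": 0.2,
}
TRAIT_WEIGHTS = {
    "Content Knowledge": {"knows the subject": 0.7, "keeps current": 0.3},
    "Classroom Management": {
        "kind to students": 0.64,       # twice the weight of "funny"
        "funny": 0.32,
        "strict disciplinarian": 0.04,  # the 4% example from step 2
    },
    "Parent Relationship": {"communicates well with parents": 1.0},
}


def trimmed_mean(scores, trim_fraction=0.1):
    """Average the scores after dropping the lowest and highest
    trim_fraction of them (an assumed way to discard outliers)."""
    ordered = sorted(scores)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def evaluate(ratings, trim_fraction=0.1):
    """ratings maps each trait name to a list of 1-5 scores from many
    evaluators. Returns (per-category scores, overall score)."""
    category_scores = {}
    for category, traits in TRAIT_WEIGHTS.items():
        category_scores[category] = sum(
            weight * trimmed_mean(ratings[trait], trim_fraction)
            for trait, weight in traits.items()
        )
    overall = sum(
        CATEGORY_WEIGHTS[category] * score
        for category, score in category_scores.items()
    )
    return category_scores, overall
```

The per-category scores are what would feed the strengths-and-weaknesses conversation in step 6; the single overall number is just the weighted roll-up. Swapping in a different committee's sheet means changing the two dictionaries and nothing else.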

Why This Is Better Than What State and Federal Authorities Want To Do

1) The goal is to help people improve rather than firing our way to excellence. FOWTE creates an ugly atmosphere in a building, and it doesn't really help because the replacement hire is only going to require you to start from scratch again. I know some reformy types think we can churn and turn TFA-style forever, but those people are idiots. With emphasis on building strength, we not only get better teachers, but we automatically build the atmosphere of collegiality, support, and quality work that makes a school a better place.

2) The data comes from many many observations over much time, and not one forty-five minute squat and squint by just one guy. No evaluation system in the world can protect a teacher from an incompetent vindictive principal if he's the only guy who has a say.

3) The data does not come from a bad standardized test that measures little of value and is useless as a teacher measurement tool.

4) The system is transparent. Unlike VAM, which cannot be successfully explained and is apparently created by magic gnomes in a castle under the South Pole, this system is created and weighted in plain sight. Everybody knows what's going on.

5) The system reflects local values. What's the story we keep hearing with the current crop of test-based VAMified evals-- that Mrs. McWunderteach has gotten a terrible rating even though everybody knows that she's an awesome teacher. We should be tapping the source of that "everybody knows," not the Data Overlord system. Does this mean that Teacher Excellence will look different from district to district? Well, yes. Of course it will-- because IT DOES!!

National Standards

I am aware that this system does not really give us a model for teacher evaluation and excellence that scales to the national level. That's one of the reasons I like it. It's actually a bit of a compromise, because if every single district used it, they would still be able to talk to each other, but they would still be free to do what seemed best in their own district ("Oh, is that how you scaled and weighted Content Knowledge for elementary? Here's what our sheet ended up looking like").

Yes, an Excellent Teacher in Buford, Montana might be a different set of measures and paperwork than an Excellent Teacher in Nicetown, Tennessee-- but each of those districts would have what they believed to be examples of excellent teachers. What would be better than that?

At any rate, we had this system well past halfway done when the New Evaluations started to emerge. But it still represents my idea of how a useful, authentic teacher evaluation would work, and is definitely my answer to, "Well, if you don't want to use this awesome teacher evaluation system of VAM and test scores and Danielson rubrics, then what DO you want to do?" This. I want to do this.


  1. I don't know how you reach your first conclusion, "Check for pulse... etc." I've known a number of beginning teachers who were let go, and they fall into a few groups: 1. I don't know why they were let go, or 2. they decided that they would rather do something else, or 3. the principal didn't like them and decided to run them out. In L.A., number 3 is very common, and we have to acknowledge that since we aren't in the classroom with them, number 1 is almost always true.
    In all of the analysis such as yours that tries to make sense of teacher evaluations, one thing is perfectly clear and always unspoken: the decision comes down to one person, the principal, and the assumption is that he/she is endowed by his/her creator with a unique and infallible ability to make this decision based upon a checklist or two provided to him/her by a superior administrator someplace who is the recipient of an even more awesome and infallible ability to see into the hearts and minds of teachers and decide who is worthy and who is not.


    What makes a good teacher is hard to pin down but dedication to learning and connection with students are two important qualities. In L.A. schools are run like little seigneuries with the petty-despot principal in charge and teachers that are not on his good side are damned no matter what they do. Two things are certain; to continue to treat teachers as ignorant and stupid will guarantee that they will never do their best and that they'll do what is necessary to maintain their livelihoods; and to continue to treat administrators as endowed with a saint-like ability to run schools without meaningful teacher input will give our system nothing better than what we have now.
    Never in these discussions is the role and evaluation of a principal addressed. In L.A. it is career suicide to go above the principal's head for any request or concern including text books. The tribalism of the all powerful middle managers cannot be breached. If a teacher doesn't have the materials necessary to teach then it's just too bad and by the way the teacher can be gotten rid of for not having all of the materials.

  2. Very true about step 2 in the traditional set-up-- I think I'll edit to reflect that. I do hear you about the issues of the god-king principalship-- that's why this model intended to reduce principals to one voice in a very large crowd.

  3. I can't remember if I've shared this link with you or not. Apologies if I'm repeating myself, but if not, an interesting read:

  4. This comment has been removed by the author.

  5. What if in addition to somehow determining weights for different people's evaluations, some categories were limited to only certain "types" of people? For example, parents wouldn't have first-hand experience with a teacher's classroom style. They would only hear what their children have to say. If he or she scored aspects of the teacher's classroom style based only on what his or her child has to say, that child's opinion would essentially be represented multiple times, leading to skewed scores depending on the frequency of situations like this.

  6. The Evaluation system in our District can be skewed to fit the agenda or philosophy of the admin, who often put "favorites" on a pedestal. I saw it with my own eyes this past week, Nov/2019. My colleague across the hall was rated for 2019 as distinguished, with a whopping 2.53; however, that colleague is not certified in the area (English), has 2 rough years of classroom experience teaching English, and largely worked out of textbook packets. I can point out that the Learning Support certification does have a "Bridge" for ELA. For veterans in the field, the evaluation model is meaningless, favorites are touted, and the dirty deed goes largely unnoticed. My 30-year veteran score for a "knock your socks off" on The Importance of Being Earnest - 2.13. Ohhh yeah.