Wednesday, October 10, 2012

Measures for Teacher Evaluation

The MET Project Combines Three Measures for Teacher Evaluation

In this Education Next article, Harvard professor Thomas Kane describes the work of the Gates-funded Measures of Effective Teaching (MET) project, which, under his direction, is “searching for tools to save the world from perfunctory teacher evaluation.” MET’s 2012 report recommends that three tools be used to evaluate teachers – test-score gains, classroom observations, and student surveys – to compensate for the built-in weaknesses of each. Here is Kane’s analysis of their strengths and shortcomings:

• Test-score gains – Looking at a teacher’s track record of producing student-achievement gains is better than the other two measures at signaling whether a teacher will get similar gains in the future, especially if the same test is used. The correlation between a teacher’s value-added from one year to the next is .48 in math and .36 in English language arts. Interestingly, MET researchers found that gains on lower-level multiple-choice tests correlated well with gains on higher-level constructed-response tests and with students’ success in non-cognitive areas. But value-added analysis of test scores has significant weaknesses: only about one quarter of teachers work in grades with standardized ELA and math tests; the scores that are available don’t provide much help in improving classroom practices; and ELA scores are considerably less reliable than math scores.

• Classroom observations – The MET researchers hired and trained observers and studied the efficacy of six different rubrics for scoring 7,500 classroom videotapes. Observation of lessons did better than the other two measures at improving classroom practice, especially when the observers were well trained and honest in their feedback. But lesson evaluations have numerous disadvantages: their impact on student achievement is unproven; classroom creativity may be stifled if teachers feel they have to conform to one rubric’s definition of good teaching; there’s considerable variation in ratings from lesson to lesson and observer to observer; and getting several observations by several different observers, which MET considers essential to reliability, is expensive.

• Student surveys – MET researchers administered the Tripod survey to students in grades 4-9, making sure students trusted that their feedback was confidential. The questions, developed by Ron Ferguson of Harvard and his colleagues, ask students to rate their teachers on a 5-4-3-2-1 scale on specific, observable characteristics, for example:
- In this class, we learn to correct our mistakes.
- Our class stays busy and does not waste time.
- Everybody knows what they should be doing and learning in this class.
Data from the surveys showed that students see clear differences among teachers, and ratings of teachers were quite consistent across different groups of students (.66 correlation). Students’ evaluations of their teachers were a better predictor of ELA and math achievement gains than classroom observations, but not as robust as value-added test-score analysis. “Even if the typical student is less discerning than a trained adult,” says Kane, “the ability to average over many students (rather than one or two adults), and having students experience 180 days of instruction (rather than observe two or three lessons), obviously improves reliability.” Student surveys have the additional advantage of being quite inexpensive.

Kane uses the analogy of the way his 6-year-old son picks a team of superheroes with different strengths: the way to deal with the varying strengths and weaknesses of these three approaches to teacher evaluation is to use all three.
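Kane’s point about averaging can be made concrete with a small simulation (a hypothetical sketch, not MET’s actual analysis): if each rater’s score is the teacher’s true quality plus independent noise, the spread of a class-average estimate shrinks roughly with the square root of the number of raters – which is why averaging over a classroom of students can be more reliable than one or two adult observations.

```python
import random
import statistics

# Hypothetical sketch: each rating = true quality + independent noise.
# Averaging n noisy ratings shrinks the estimate's spread by ~sqrt(n).

def average_rating(true_quality: float, n_raters: int, noise_sd: float = 1.0) -> float:
    """Mean of n noisy ratings of the same teacher."""
    return statistics.mean(
        true_quality + random.gauss(0, noise_sd) for _ in range(n_raters)
    )

random.seed(0)
trials = 2000
# Spread of the estimate with 2 raters (like a pair of adult observers)
# versus 25 raters (like a class of surveyed students).
sd_2 = statistics.stdev(average_rating(0.0, 2) for _ in range(trials))
sd_25 = statistics.stdev(average_rating(0.0, 25) for _ in range(trials))
print(f"SD with 2 raters: {sd_2:.2f}; with 25 raters: {sd_25:.2f}")
```

The simulated spreads come out near 1/√2 ≈ 0.71 and 1/√25 = 0.20, illustrating why many student ratings can offset each student being individually less discerning.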
Combining test-score analysis, classroom observation, and student surveys produces evaluations that are less volatile and have greater predictive power. When predictive power is plotted against reliability (see the graph in the linked article below), combined ratings are significantly better than any single measure. The MET team weighted the three tools .758, .042, and .200 respectively. (Although classroom observations were given the least weight, the team hopes that feedback to teachers will end up being an important contributor to improved teaching and learning.) “The use of multiple measures not only spreads the risk but also provides opportunities to detect manipulation or gaming,” says Kane. It also allows administrators to take a closer look when results from the three tools don’t line up – for example, a teacher might use unconventional classroom methods that don’t score well on a rubric but still produce high student-achievement value-added.
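As a rough illustration (a hypothetical sketch with made-up example scores, not MET’s actual scoring code), the combined rating can be computed as a weighted average using the report’s weights of .758 for value-added, .042 for observations, and .200 for student surveys; the sketch assumes each component has first been standardized to a common scale:

```python
# Weights from the MET report cited in the article.
WEIGHTS = {
    "value_added": 0.758,      # test-score gains
    "observation": 0.042,      # classroom observation rubric
    "student_survey": 0.200,   # Tripod student survey
}

def composite_rating(scores: dict) -> float:
    """Weighted average of the three standardized measures."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Hypothetical teacher: strong value-added, average observation
# scores, above-average survey results (in standardized units).
example = {"value_added": 0.8, "observation": 0.0, "student_survey": 0.5}
print(round(composite_rating(example), 3))  # 0.706
```

Because the weights sum to 1, the composite stays on the same standardized scale as its inputs, and a mismatch between a high composite and a low single component is exactly the kind of discrepancy Kane suggests administrators investigate.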

“Capturing the Dimensions of Effective Teaching” by Thomas Kane in Education Next, Fall 2012 (Vol. 12, #4, pp. 34-41), http://educationnext.org/capturing-the-dimensions-of-effective-teaching

Stephen Anderson
