Incident 9: NY City School Teacher Evaluation Algorithm Contested

Description: An algorithm used to rate the effectiveness of school teachers in New York has resulted in thousands of disputes of its results.
Alleged: New York City Dept. of Education developed and deployed an AI system, which harmed Teachers.

Suggested citation format

Olsson, Catherine. (2012-02-25) Incident Number 9. In McGregor, S. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative.

Incident Stats

Incident ID
Report Count
Incident Date
Sean McGregor



CSET Taxonomy Classifications

Taxonomy Details

Full Description

A value-added measurement (VAM) based algorithm used to calculate the effectiveness of school teachers is being challenged for its apparent lack of accuracy and real-world relevance. In Rochester, New York, approximately 600 teachers are disputing the results, and in Syracuse between 400 and 500 are disputing the results. The VAM score can be used to give raises to teachers deemed "effective" or "highly effective", but can also be used to fire teachers who are given two "ineffective" ratings in a row. Teachers criticize the algorithm for including only Math and English in the evaluation (even for teachers of other subjects, as the algorithm only covers those two subjects), for using school averages to calculate a single student's expected average, and for predicting that high-grade-earning students will grow at literally impossible rates (to score higher than 100% on tests).

Short Description

An algorithm used to rate the effectiveness of school teachers in New York has resulted in thousands of disputes of its results.



Harm Distribution Basis

Other:School Teachers

Harm Type

Financial harm

AI System Description

Value-added analysis algorithm used to evaluate a school teacher's teaching effectiveness

System Developer

New York City Dept. of Education

Sector of Deployment


Relevant AI functions


AI Techniques

value-added measurements

AI Applications

data processing, data prediction


New York, United States of America

Named Entities

New York Department of Education, United Federation of Teachers, Sheri Lederman, Common Core, Governor Andrew Cuomo

Technology Purveyor

New York Department of Education

Beginning Date


Ending Date


Near Miss

Harm caused



Lives Lost


Data Inputs

School grades, student grades, predicted grades

Incident Reports

Stories & Grievances

A NYC Math Teacher Fights Back After Receiving an Unfair 'Unsatisfactory' Rating from a Principal

Edmond Farrell uses the Freedom of Information Law and Department of Education/Teacher regulations in his fight to change an unfair 'unsatisfactory' rating. His rights in the administrative proceedings were violated, Chancellor Klein never answered his appeal, and now a Verified Petition has been filed with Commissioner Mills on the grounds of teacher abuse, abuse of discretion, and age discrimination. This case may change the practice of dumping good teachers in New York State.

Parents throughout New York City and across America are wondering why good teachers disappear from their child's schools while terrible - and sometimes even abusive - teachers are allowed to stay. We at believe that there is a "new" criteria for teachers that no one wants to talk about, because it is illegal. The criteria for obtaining employment at the Department of Education is no longer quality in teaching or "loves children". Instead, you will get hired if:

(1) you are perceived as someone who will never ask questions or whistleblow what you see inside the secret hallways and offices of your school. In fact, you may have just moved to a new state because of crimes committed there, and you have a reason to maintain a school's secrecy.

(2) you support the implementation of curricula that is unproven, unknown, and worthless, such as TERC math, Everyday Mathematics, "Balanced Literacy", etc., all of which are currently imposed on teachers throughout the US. The US government and educational researchers still have not found statistical research to support these programs. We all must see newspaper reports about success as part of the "Armstrong Williams" syndrome, or payola in the media. (See "Armstrong Williams: Education Propaganda, Payola, or Whatever You Call it, is Still False Advertising and Political Misconduct")

This obviously "must keep confidential" criteria for teachers is harming our children and their future. They are not being given the skills and knowledge that they need to achieve and be successful in whatever they want to do because many kids drop out in frustration or are pushed out, suspended, arrested, and abused in order to leave. Principals do, we must not forget, want their school scores to go up rather than down, any way they can make this happen. Bonuses matter.

Superintendents and principals have the power to "observe" subordinates and judge their performance based upon their "observations". Often the observation reports may be biased or prejudicial, and the teacher has little recourse. In New York City, an award-winning math teacher at John Adams High School, Mr. Edmond Farrell, was given six unsatisfactory observation reports after 9 years of satisfactory and excellent ratings. What changed was the administration: a new Principal, Mrs. Grace Zwillenberg, was appointed to the school in 2003-2004. She does not like Ed, who is over the age of 50. She wanted him out, and used observation reports to get him out of the school. He believes this is wrong, and is determined to prove this in court if necessary.

In New York City observation reports have been designated in the courts as "opinions" (Elentuck v Green, 202 AD2d 425, 608 NYS2d 701, 702; 1994). Will Mr. Farrell's U-rating hold up in court? He has filed a Verified Petition with New York State Education Commissioner Richard Mills to find out.

Edmond Farrell, a math teacher in New York City who had the unfortunate experience of being, he believed, falsely accused of 'unsatisfactory' performance, fought back using references to the Freedom of Information Law and the Regulations of the Commissioner of Education. His letter, we believe, should be used as a model for anyone fighting an unfair assessment by someone higher up who is trying to get rid of/silence/harass him/her.

I should add that at the Office of Appeals and Reviews Hearing for Edmond Farrell on November 8, 2004, I and Mr. Norman Scott, both listed as witnesses in the letter that follows, were barred from being witnesses or entering the Hearing at all. OAR Director Mrs. Virginia Caputo created quite a scene screaming at me that I could not, under any circumstances, be part of the Hearing, because "only Board of Education employees may be witnesses". When I asked to see this regulation in writing, she became even more angry, and told me she did have them available. Her Assistant in the OAR is Mr. Greg Brooks, who told me the same thing in the same way. Mr. Norman Scott was also barred because he currently does not work for the DOE and is retired. The other three witnesses listed in the letter below were dismissed before the Hearing. Ms. Caputo made a 'new' regulation for Mr. Scott, namely that he DID work for the DOE but not at John Adams High School. Mr. Farrell's UFT representative, Ms. Ritter, who accompanied him into the hearing, was also a retired teacher but had

A NYC Math Teacher Fights Back After Receiving an Unfair 'Unsatisfactory' Rating from a Principal

The New York City Department of Education released today a list of individual ratings of thousands of the city's schoolteachers, a move that concludes a lengthy legal battle waged by the local teachers' union and media.

The Teacher Data Reports rate more than 12,000 teachers who taught fourth through eighth grade English or math between 2007 and 2010 based on value-added analysis. Value-added analysis calculates a teacher's effectiveness in improving student performance on standardized tests -- based on past test scores. The forecasted figure is compared to the student's actual scores, and the difference is considered the "value added," or subtracted, by the teachers.
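The comparison described above (forecast score versus actual score, with the difference attributed to the teacher) can be sketched in a few lines. This is only a simplified illustration: the function name `value_added` and the sample scores are hypothetical, and the forecasting step itself, which the real reports derive from past test scores, is assumed to have already happened.

```python
from statistics import mean


def value_added(predicted_scores, actual_scores):
    """Return the mean of (actual - predicted) across a teacher's students.

    Assumption: the predicted scores are already available; how they are
    forecast from past test results is not modeled here.
    """
    if len(predicted_scores) != len(actual_scores):
        raise ValueError("each student needs one predicted and one actual score")
    residuals = [a - p for p, a in zip(predicted_scores, actual_scores)]
    return mean(residuals)


# Hypothetical scores for three students (not taken from the Teacher Data Reports).
predicted = [650, 670, 700]
actual = [655, 668, 709]

# A positive result counts as value "added", a negative one as value "subtracted".
print(value_added(predicted, actual))
```

With only a handful of students per class, a single unusual score moves this average noticeably, which is one way to see why small samples produce wide margins of error.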

To some, the release means a step forward in using student data and improving transparency and accountability by giving parents access to information on teacher effectiveness. To others, it's a misguided over-reliance on incomplete or inaccurate data that publicly shames or praises educators, whether deserving or not.

In response, the union, the United Federation of Teachers, has launched a city-wide newspaper advertising campaign. The ad headlines, "This Is No Way To Rate A Teacher!" followed by a lengthy and complicated mathematical formula as well as a letter from UFT President Michael Mulgrew with a list of all the reasons he says the data reports are faulty and unreliable.

The ad will likely appear in the very publications being targeted for disseminating the Teacher Data Reports.

Cynthia Brown, vice president of education policy at the Center for American Progress, issued a statement Friday drawing on findings from a November CAP report. The study concluded that publicly naming teachers tied to the performance and projected performance of their students actually undermines efforts to improve public schools, making it much harder to implement teacher evaluation systems that actually work.

"While we support next-generation evaluation systems that include student achievement as a component, we believe the public release of value-added data on individual teachers is irresponsible," Brown said Friday. "In this case, less disclosure is more reform."

Amid the report-releasing frenzy, GothamSchools is one news organization that has stepped back from the crowd. It was one of the many news outlets that sought access to the Teacher Data Reports last year, but after internal deliberations, determined that they would not publish the raw database because "the data were flawed, that the public might easily be misled by the ratings, and that no amount of context would justify attaching teachers' names to the statistics."

The Times has publicly invited teachers to respond to their ratings, to be published side-by-side for readers to consider together: "If there were special circumstances that compromise the credibility of the numbers in particular cases, we want to know."

The reports were developed as a pilot program several years ago by then-Schools Chancellor Joel Klein as a part of the city's annual review of its teachers, and were later factored into tenure decisions. The ratings were intended for internal use and were not planned to be made public. Media organizations -- among them The Wall Street Journal, The New York Times and the New York Daily News -- sued for access to the data under the Freedom of Information Act. A court ruled in favor of the news organizations in August.

"When balancing the privacy interests at stake against the public interest in disclosure of the information ... we conclude that the requested reports should be disclosed," the court wrote, according to The Wall Street Journal. "Indeed, the reports concern information of a type that is of compelling interest to the public, namely, the proficiency of public employees in the performance of their job duties."

Because the New York teacher ratings are based on small amounts of data, there exist large margins of error. To add to that, the test scores the analyses are based on were determined by the state Department of Education to have been inflated because the exams had become predictable and easier to pass -- to the extent that students were told incorrectly they were proficient in certain subjects.

Teachers of students who took those tests, according to the Daily News, could find themselves penalized in their Teacher Data Report ratings for not teaching to the test. Conversely, those who narrowed their curricular focus to cater to the exam could be rewarded.

Other omissions and errors, like failure to verify class size and assignment for each teacher, will also skew results of the analysis.

New York high school teacher Stephen Lazar expressed on his blog -- and on a comment in The New York Times -- that he's disappointed by the decision of many publications to release the data. He points to the shortcomings of value-added systems, writing about how he spent six weeks teaching students how to do college-level research that likely cost his students 5-10 points on the Re

New York City Teacher Ratings: Teacher Data Reports Publicly Released Amid Controversy

Late last week and over the weekend, New York City newspapers, including the New York Times and Wall Street Journal, published the value-added scores (teacher data reports) for thousands of the city’s teachers. Prior to this release, I and others argued that the newspapers should present margins of error along with the estimates. To their credit, both papers did so.

In the Times’ version, for example, each individual teacher’s value-added score (converted to a percentile rank) is presented graphically, for math and reading, in both 2010 and over a teacher’s “career” (averaged across previous years), along with the margins of error. In addition, both papers provided descriptions and warnings about the imprecision in the results. So, while the decision to publish was still, in my personal view, a terrible mistake, the papers at least make a good faith attempt to highlight the imprecision.

That said, they also published data from the city that use teachers’ value-added scores to label them as one of five categories: low, below average, average, above average or high. The Times did this only at the school level (i.e., the percent of each school’s teachers that are “above average” or “high”), while the Journal actually labeled each individual teacher. Presumably, most people who view the databases, particularly the Journal's, will rely heavily on these categorical ratings, as they are easier to understand than percentile ranks surrounded by error margins. The inherent problems with these ratings are what I’d like to discuss, as they illustrate important concepts about estimation error and what can be done about it.

First, let’s quickly summarize the imprecision associated with the NYC value-added scores, using the raw datasets from the city. It has been heavily reported that the average confidence interval for these estimates – the range within which we can be confident the “true estimate” falls - is 35 percentile points in math and 53 in English Language Arts (ELA). But this oversimplifies the situation somewhat, as the overall average masks quite a bit of variation by data availability. Take a look at the graph below, which shows how the average confidence interval varies by the number of years of data available, which is really just a proxy for sample size (see the figure's notes).

When you’re looking at the single-year teacher estimates (in this case, for 2009-10), the average spread is a pretty striking 46 percentile points in math and 62 in ELA. Furthermore, even with five years of data, the intervals are still quite large – about 30 points in math and 48 in ELA. (There is, however, quite a bit of improvement with additional years. The ranges are reduced by around 25 percent in both subjects when you use five years of data compared with one.)

Now, opponents of value-added have expressed a great deal of outrage about these high levels of imprecision, and they are indeed extremely wide – which is one major reason why these estimates have absolutely no business being published in an online database. But, as I’ve discussed before, one major, frequently-ignored point about the error – whether in a newspaper or an evaluation system – is that the problem lies less with how much there is than how you go about addressing it.

It’s true that, even with multiple years of data, the estimates are still very imprecise. But, no matter how much data you have, if you pay attention to the error margins, you can, at least to some degree, use this information to ensure that you’re drawing defensible conclusions based on the available information. If you don’t, you can't.

This can be illustrated by taking a look at the categories that the city (and the Journal) uses to label teachers (or, in the case of the Times, schools).

Here’s how teachers are rated: low (0-4th percentile); below average (5-24); average (25-74); above average (75-94); and high (95-99).

To understand the rocky relationship between value-added margins of error and these categories, first take a look at the Times’ “sample graph” below.

This is supposed to be a sample of one teacher’s results. This particular hypothetical teacher’s value-added score was at the 50th percentile, with a margin of error of plus or minus roughly 30 percentile points. What this tells you is that we can have a high level of confidence that this teacher’s “true estimate” is somewhere between the 20th and 80th percentile (that’s the confidence interval for this teacher), although it is more likely to be closer to 50 than 20 or 80.

One shorthand way to see whether teachers' scores are, accounting for error, average, above average or below average is to see whether their confidence intervals overlap with the average (50th percentile, which is actually the median, but that's a semantic point in these data).
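The overlap check can be made concrete with a short sketch. The category boundaries are the ones listed above; the function names, the clamping of intervals to the 0-99 range, and the wording of the verdicts are assumptions made for illustration, not part of the city's or the newspapers' methodology.

```python
# Category boundaries as listed in the article (percentile ranks, inclusive).
CATEGORIES = [
    (0, 4, "low"),
    (5, 24, "below average"),
    (25, 74, "average"),
    (75, 94, "above average"),
    (95, 99, "high"),
]


def category(percentile):
    """Label a single percentile rank with its published category."""
    for low, high, label in CATEGORIES:
        if low <= percentile <= high:
            return label
    raise ValueError("percentile must be between 0 and 99")


def confidence_interval(score, margin):
    """Return (lower, upper), clamped to the 0-99 percentile range (an assumption)."""
    return max(0, score - margin), min(99, score + margin)


def categories_spanned(score, margin):
    """List every category that the confidence interval touches."""
    lower, upper = confidence_interval(score, margin)
    return [label for low, high, label in CATEGORIES if low <= upper and high >= lower]


def relative_to_median(score, margin, median=50):
    """Apply the shorthand: does the interval overlap the 50th percentile?"""
    lower, upper = confidence_interval(score, margin)
    if lower > median:
        return "above average"
    if upper < median:
        return "below average"
    return "not distinguishable from average"


# The sample teacher from the Times graph: 50th percentile, plus or minus 30.
print(category(50))                # label based on the point estimate alone
print(categories_spanned(50, 30))  # labels the interval is consistent with
print(relative_to_median(50, 30))
```

For the sample teacher, the point estimate lands in "average" while the 20-80 interval also reaches into "below average" and "above average", which illustrates why a single categorical label can overstate what the estimate supports.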

Let’s say we have a teacher with a value-added score at the 60th percentile, plus or minus 20 points, making for a confidence interval of 40-80. This crosses the average/median “border," si

Reign Of Error: The Publication Of Teacher Data Reports In New York City

In part 1 I demonstrated there was little correlation between how a teacher was rated in 2009 to how that same teacher was rated in 2010. So what can be more crazy than a teacher being rated highly effective one year and then highly ineffective the next? How about a teacher being rated highly effective and highly ineffective IN THE SAME YEAR.

I will show in this post how exactly that happened for hundreds of teachers in 2010. By looking at the data I noticed that of the 18,000 entries in 2010, about 6,000 were repeated names. This is because there are two ways that one teacher can get multiple value-added ratings for the same year.

The most common way this happens is when the teacher is teaching self-contained elementary in 3rd, 4th, or 5th grade. The students take the state test in math and in language arts and that teacher gets two different effectiveness ratings. So a teacher might, according to the formula, ‘add’ a lot of ‘value’ when it comes to math, but ‘add’ little ‘value’ (or even ‘subtract’ value) when it comes to language arts.

To those who don’t know a lot about education (yes, I’m talking to you ‘reformers’), it might seem reasonable that a teacher can do an excellent job in math and a poor job in language arts, and it should not be surprising if the two scores for that teacher do not correlate. But those who do know about teaching would expect the amount the students learn to correlate, since someone who is doing an excellent job teaching math is likely to be doing an excellent job teaching language arts: both jobs are set up by some common groundwork that benefits all learning in the class. The teacher has good classroom management. The teacher has helped her students to be self-motivated. The teacher has a relationship with the families. All these things increase the amount of learning of every subject taught. So even if an elementary teacher is a little stronger in one subject than another, it is more about the learning environment that the teacher created than anything else.

Looking through the data I noticed teachers, like a 5th grade teacher at P.S. 196 who scored 97 out of 100 in language arts and 2 out of 100 in math. This is with the same students in the same year! How can a teacher be so good and so bad at the same time? Any evaluation system in which this can happen is extremely flawed, of course, but I wanted to explore if this was a major outlier or if it was something quite common. I ran the numbers and the results shocked me (which is pretty hard to do). Here’s what I learned:

Out of 5,675 elementary school teachers, the average difference between the two scores was a whopping 22 points. One out of six teachers, or approximately 17%, had a difference of 40 or more points. One out of 25 teachers, which was 250 teachers altogether, had a difference of 60 or more points, and, believe it or not, 110 teachers, or about 2% (that’s one out of fifty!) had differences of 70 or more points. At the risk of seeming repetitive, let me repeat that this was the same teacher, the same year, with the same kids. Value-added was more inaccurate than I ever imagined.

I made a scatter plot of the 5,675 teachers. On the x-axis is that teacher’s language arts score for 2010. On the y-axis is that same teacher’s math score for 2010. There is almost no correlation.
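The kind of check described here (average gap between a teacher's two scores, share of teachers with large gaps, and the correlation behind the scatter plot) could be reproduced roughly as follows. This is a sketch, not the author's actual analysis: the function names are hypothetical, and apart from the 97/2 pair mentioned in the text, the sample pairs are invented for illustration.

```python
from math import sqrt


def gap_summary(pairs, thresholds=(40, 60, 70)):
    """Summarize absolute gaps between two same-year scores per teacher."""
    gaps = [abs(first - second) for first, second in pairs]
    count = len(gaps)
    summary = {"mean_gap": sum(gaps) / count}
    for threshold in thresholds:
        summary[f"share_gap_{threshold}_plus"] = sum(g >= threshold for g in gaps) / count
    return summary


def pearson_r(pairs):
    """Pearson correlation between the first and second score of each pair."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    count = len(pairs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# (language arts percentile, math percentile) per teacher.
# The first pair is the P.S. 196 example from the text; the rest are made up.
sample_pairs = [(97, 2), (50, 55), (30, 70), (80, 75)]

print(gap_summary(sample_pairs))
print(pearson_r(sample_pairs))
```

A correlation near zero across the full set of teachers is what "almost no correlation" in the scatter plot corresponds to numerically.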

For people who know education, this is shocking, but there are people who probably are not convinced by my explanation that these should be more correlated if the formulas truly measured learning. Some might think that this really just means that just like there are people who are better at math than language arts and vice versa, there are teachers who are better at teaching math than language arts and vice versa.

So I ran a different experiment for those who still aren’t convinced. There is another scenario where a teacher got multiple ratings in the same year. This is when a middle school math or language arts teacher teaches multiple grades in the same year. So, for example, there is a teacher at M.S. 35 who taught 6th grade and 7th grade math. As these scores are supposed to measure how well you advanced the kids that were in your class, regardless of their starting point, one would certainly expect a teacher to get approximately the same score on how well they taught 6th grade math and 7th grade math. Maybe you could argue that some teachers are much better at teaching language arts than math, but it would take a lot to try to convince someone that some teachers are much better at teaching 6th grade math than 7th grade math. But when I went to the data report for M.S. 35 I found that while this teacher scored 97 out of 100 for 6th grade math, she only scored a 6 out of 100 for 7th grade math.

Again, I investigated to see if this was just a bizarre outlier. It wasn’t. In fa

Analyzing Released NYC Value-Added Data Part 2

Teachers plan widespread appeals of 'unfair' evaluations

ALBANY—Hundreds of teachers in urban school districts plan to appeal performance evaluations that could be used as grounds for termination under a new statewide system for evaluating teachers.

Districts and unions pointed to students' low test scores on the new, more difficult state exams and technical glitches to explain the lower-than-expected ratings.


Union leaders estimated that 40 percent of teachers in Syracuse and Rochester got the two lowest grades, which will require their districts to develop individualized professional development plans to help them improve. More than 600 teachers in Rochester and between 400 and 500 teachers in Syracuse are planning to appeal or have already begun the appeals process, according to unions.

School districts reviewed teachers' scores for the state-assigned portion of the evaluations and were required to report the full scores to the Department of Education on Friday. Sixty percent of the evaluation is based on observations by administrators or peers, 20 percent is based on local assessments or other growth measures and 20 percent is based on state exams.
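The 60/20/20 split can be illustrated with a small calculation. This sketch assumes the percentages map onto a 100-point scale with 60, 20 and 20 points available per component; the component names and the sample points are hypothetical, and no rating cut-offs are modeled because the article does not give them.

```python
# Maximum points per component, assuming the stated percentages map onto
# a 100-point scale (an assumption, not stated in the article).
MAX_POINTS = {
    "observations": 60,    # observations by administrators or peers
    "local_measures": 20,  # local assessments or other growth measures
    "state_exams": 20,     # state exam portion
}


def composite_score(points_earned):
    """Add up the points earned in each component after validating them."""
    total = 0
    for component, maximum in MAX_POINTS.items():
        earned = points_earned[component]
        if not 0 <= earned <= maximum:
            raise ValueError(f"{component} must be between 0 and {maximum}")
        total += earned
    return total


# Hypothetical teacher: strong observations, no points on the state-exam portion.
teacher = {"observations": 57, "local_measures": 10, "state_exams": 0}
print(composite_score(teacher))
```

Under this weighting, a teacher who earns nearly every observation point can still lose up to 40 points on the two test-based components.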

Teachers are rated on a scale of "highly effective," "effective," "developing" and "ineffective." If a teacher gets two "ineffective" ratings in a row, he or she could be fired.

State education commissioner John King said this week the state will release the scores to the public “later this fall.”

Schools and unions claim that teachers are being unfairly penalized for the proficiency plunge on school exams from last school year, which were based on more difficult curriculum standards, called the Common Core. But many of the problems were technical in nature, such as situations in which teachers were held accountable for the test scores of students they never taught, school leaders said.

Education department officials were not available for an interview, despite several requests.

“The results of this process are a clear indictment of this process,” said Adam Urbanski, president of the Rochester Teachers Association. “If the commissioner and the Regents continue to turn a deaf ear to this unfair and ridiculous approach to teacher evaluation, I think that teachers will refuse to participate in contributing to their own demise. You may have mass insubordination on your hands.”

Urbanski said most of the 900 teachers in Rochester that were rated "developing" or "ineffective" received high scores in the category determined by observations, but few or no points for the state-exam portion.

Elementary and middle-school students took the first Common Core-aligned exams in April. Education officials predicted test scores would plummet, and they did: just 31 percent of students scored proficient or higher in math and language arts, and those scores were lower in urban districts.

In Rochester, only five percent of students were proficient.

Urbanski said teachers who had more students with disabilities or English-language learners fared worse under the evaluation system. The union's data was collected informally and is incomplete, but provides a picture of teachers' results in Monroe County and surrounding counties.

Chip Partner, spokesman for the Rochester district, said he could not discuss the scores until the state had released them to the public.

Kevin Ahern, president of the Syracuse Teachers Association, said the volume of appeals in his district is a result of the rushed implementation of the evaluations and Common Core.

But he's hoping the process will go more smoothly in its second year.

“As we go forward, we are working with the district to address many of the issues we have identified through this appeal process, and we are going to work with the district to, for lack of a better term, fix the scores of people whose appeals have merit, to the best of our ability,” he said. “But going forward, we're hoping to improve the process.”

Officials from the Syracuse school district were not immediately available for comment.

New York City teachers were not evaluated last year, because the Department of Education and the United Federation of Teachers missed a mid-January deadline for negotiating a state-approved plan, forfeiting an increase in state funding. The district's final plan wasn't adopted until June.

The evaluation system was a requirement of New York's $750 million Race To The Top award, a federal competitive grant the state secured in 2010. Gov. Andrew Cuomo has pushed the evaluations as a key strategy for improving student outcomes, which are mediocre even though New York spends more money per pupil than any other state.

Robert Lowry, deputy director of the state Council of School Superintendents, said administrators felt cramped for time during the implementation process but were grateful for the opportunities to engage with teachers about good instructional practice.

“A lot of [districts] encountered difficulties, and some of them are

Teachers plan widespread appeals of 'unfair' evaluations


A veteran teacher suing New York state education officials over the controversial method they used to evaluate her as “ineffective” is expected to go to New York Supreme Court in Albany this week for oral arguments in a case that could affect all public school teachers in the state and even beyond.

Sheri G. Lederman, a fourth-grade teacher in New York’s Great Neck public school district, is “highly regarded as an educator,” according to her district superintendent, Thomas Dolan, and has a “flawless record”. The standardized math and English Language Arts test scores of her students are consistently higher than the state average.

Yet her 2013-2014 evaluation, based in part on student standardized test scores, rated her as “ineffective.” How can a teacher known for excellence be rated “ineffective”? It happens — and not just in New York.

The evaluation method, known as value-added measurement (or modeling), purports to be able to predict through a complicated computer model how students with similar characteristics are supposed to perform on the exams — and how much growth they are supposed to show over time — and then rate teachers on how well their students measure up to the theoretical students. New York is just one of the many states where VAM is one of the chief components used to evaluate teachers.

Testing experts have for years been warning school reformers that efforts to evaluate teachers using VAM are not reliable or valid, but school reformers, including Education Secretary Arne Duncan and New York Gov. Andrew Cuomo, both Democrats, have embraced the method as a “data-driven” evaluation solution championed by some economists.

Lederman’s suit against state education officials — including John King, the former state education commissioner, who now is a top adviser to Duncan at the Education Department — challenges the rationality of the VAM model used to evaluate her and, by extension, other teachers in the state. The lawsuit alleges that the New York State Growth Measures “actually punishes excellence in education through a statistical black box which no rational educator or fact finder could see as fair, accurate or reliable.”

It also, in many aspects, defies comprehension. High-stakes tests are given only in math and English language arts, so reformers have decided that all teachers (and, sometimes, principals) in a school should be evaluated by reading and math scores. Sometimes, school test averages are factored into all teachers’ evaluations. Sometimes, a certain group of teachers are attached to either reading or math scores; social studies teachers, for example, are more often attached to English Language Arts scores, while science teachers are attached to math scores. An art teacher in New York City explained in this post how he was evaluated on math standardized test scores and saw his evaluation rating drop from “effective” to “developing.”

A teacher in Florida — which is another state that uses VAM — discovered that his top-scoring students actually hurt his evaluation. How? In Indian River County, an English Language Arts middle school teacher named Luke Flynt told his school board that through VAM formulas, each student is assigned a “predicted” score — based on past performance by that student and other students — on the state-mandated standardized test. If the student exceeds the predicted score, the teacher is credited with “adding value.” If the student does not do as well as the predicted score, the teacher is held responsible and that score counts negatively toward his/her evaluation. He said he had four students whose predicted scores were “literally impossible” because they were higher than the maximum number of points that can be earned on the exam. He said:

“One of my sixth-grade students had a predicted score of 286.34. However, the highest a sixth-grade student can earn is 283. The student did earn a 283, incidentally. Despite the fact that she earned a perfect score, she counted negatively toward my valuation because she was 3 points below predicted.”

Hard to believe, isn’t it?
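The arithmetic behind Flynt's example is easy to check. The sketch below uses the two figures he quotes (a predicted score of 286.34 and a maximum of 283); the function names are hypothetical, and treating the teacher's credit as the plain difference between actual and predicted score follows his description rather than any published formula.

```python
def impossible_prediction(predicted, max_score):
    """True when the predicted score exceeds the highest attainable score."""
    return predicted > max_score


def best_possible_residual(predicted, max_score):
    """Residual (actual - predicted) if the student earns the maximum score."""
    return max_score - predicted


PREDICTED = 286.34  # predicted score quoted by Flynt
MAX_SCORE = 283     # highest score a sixth-grade student can earn, per Flynt

print(impossible_prediction(PREDICTED, MAX_SCORE))
print(round(best_possible_residual(PREDICTED, MAX_SCORE), 2))
```

Because the best possible residual is negative, no outcome for this student, including a perfect score, could count in the teacher's favor.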

In 2012-13, 68.75 percent of Lederman’s New York students met or exceeded state standards in both English and math. She was labeled “effective” that year. In 2013-2014, her students’ test results were very similar, but she was rated “ineffective.” Dolan, the superintendent, said in an affidavit:

As superintendent of the GNPS, I have personally known Dr. Lederman for approximately 4 years. I have had the opportunity to meet with her personally. I have also reviewed her record of teaching, particularly the performance of her students on New York State assessment tests. I can personally attest that she is highly regarded as an educator by the administration of GNPS. Her classroom observations have consistently identified her as an exceptional educat

Master teacher suing New York state over ‘ineffective’ rating is going to court

Just over half of New York City teachers were evaluated in the 2015–16 school year, in part, by tests in subjects or of students they didn’t teach, according to data obtained by Chalkbeat through a public records request.

At 53 percent of city teachers, it’s a significant number, but substantially lower than in previous years, possibly thanks to a moratorium on using state tests that was instituted mid-year.

That figure also highlights a key tension in evaluating all teachers by student achievement, even teachers who work with young students or in subjects like physical education. Being judged by other teachers’ students or subjects has long annoyed some educators and relieved others, who otherwise might have had to administer additional tests.

Supporters say evaluating teachers by group measures — often school-wide scores on standardized tests — helps create a sense of shared mission in a school. But the approach could also push teachers away from working in struggling schools.

“The key point around school-wide measures is that this could serve as a strong disincentive for these teachers in non-tested grades and subjects to stay in lower-performing schools,” said Matthew Steinberg at the University of Pennsylvania, who has studied teacher evaluation systems.

Will Mantell, a spokesperson for the New York City Department of Education, defended the district’s approach.

“Selecting school-wide [or] grade-wide … measures may better measure educators’ practice and support professional development,” he said. “For example, it makes sense for a social studies teacher who emphasizes writing in her classroom to be evaluated partially on an assessment of students’ ELA skills.”

New York’s evaluation system has gone through a number of substantial changes since it was first codified in state law in 2012, part of a nationwide push to connect teacher performance to student test scores, spurred by federal incentives.

Student assessments have comprised anywhere from 40 percent of the evaluation to essentially 50 percent, under a matrix system pushed by Governor Andrew Cuomo in 2015. Most recently, New York stopped using grades 3-8 English and math state tests as part of the system, but teachers must continue to be judged based on some assessment.

States across the country have struggled to evaluate teachers in traditionally non-tested grades and subjects. New York City has created a number of exams — known as performance assessments — in non-tested areas and given schools significant flexibility in which measures are used to judge their teachers.

In the 2015-16 school year, 53 percent of teachers were evaluated by a group metric, meaning one not focused on their subject or students. In the two previous years, the number was much higher — around 85 percent. It’s not clear why there was a substantial drop, but a spokesperson for the city’s education department notes that 2015-16 was an “outlier” due to the moratorium on state tests, instituted mid-year.

In all three years, most teachers were also evaluated by at least one individualized measure targeted to teachers’ grade, subject and students.

Data for the most recent school year are not yet available.

It’s also not clear what percentage of a teacher’s rating was based on group measures, and Mantell said this “varies from teacher to teacher.”

The United Federation of Teachers has pushed to give schools more individual options, including the use of more “authentic” assessments, not based on multiple choice questions.

“Right now, we don’t have enough options, which is why our most recent agreement with the DOE seeks to build more authentic assessments for additional grades and subjects,” said Michael Mulgrew, president of the UFT in a statement.

Group measures offer an alternative to creating exams for each teacher in every grade and subject, which can lead to a proliferation of new tests, though in New York City teachers have often been judged by both group and individual metrics.

The challenge of evaluating teachers in traditionally untested areas is not unique to New York, and a number of states have embraced group or school-wide approaches. An analysis of 32 states, conducted by Steinberg, found that the average teacher in a non-tested grade or subject had about 7 percent of his or her evaluation based on school-wide achievement measures, though this averaged together substantial variation from place to place. Teachers in Tennessee and Florida have sued (unsuccessfully), arguing that it is unfair to evaluate them based on students they didn’t teach.

A more popular option, used in some districts in New York, has been student-learning objectives, in which teachers set goals for students often based on classroom exams. This approach has been praised for helping teachers set specific goals, but criticized as burdensome and easy to manipulate.

Research has found that using school-wide meas

New data show more than half of NYC teachers judged, in part, by test scores they don’t directly affect
