New York State Tests Are A “BAD SCALE”
“[We PS 321] teachers and administration are truly devastated by what a terrible test it was and how little it will tell us about our students… We have never seen an ELA exam that does a worse job of testing reading comprehension. There was inappropriate content, many highly ambiguous questions, and a focus on structure rather than meaning of passages. Our teachers and administrators feel that this test is an insult to the profession of teaching and that students’ scores on it will not correlate with their reading ability.” —PS 321 Principal Liz Phillips, regarding the 2014 NYS ELA exams
Standardized tests can be helpful tools when used properly. A good standardized test is a reliable, steady gauge. Think of a scale. A scale, to be effective, must weigh a pound the same way each time. A scale that tells you that you weigh 150 pounds one moment but a day later says you weigh 130 pounds or 170 pounds is not a useful scale.
Here is a snapshot of test scores on the NAEP math exam:
The NAEP is a well-established national test created by educators and administered since 1969. We don’t hear a lot about the NAEP because, frankly, it’s a pretty boring test. It doesn’t make kids cry. It’s not used to evaluate teachers. It doesn’t make our schools do loads of test prep.
Scores on the NAEP are fairly steady but have gone up slightly since 2003. The test is pretty reliable, and researchers can use the results to study how different populations (broken down by race, income, gender, region, state, etc.) perform.
Here are results of the NYS math test over the same period for comparison:
Notice the dramatic swings over time. In New York State, the “cut scores” for these tests are not based on some objective standard — they are determined after students take the test. One year, a 63% was needed to pass. Another year, students had to score 87% in order to pass.
The reason? Politics.
Notice how NYS test scores rose significantly starting around 2003. In 2001, the George W. Bush administration implemented No Child Left Behind, a federal law that did the following:
- Required all students in all schools to be proficient by 2014. (Educators agree that 100% proficiency is an impossible standard: it’s like demanding that everyone be above average.)
- No Child Left Behind did not define proficiency, so states had an incentive to lower their standards; if they didn’t, they risked losing federal funding or facing other punitive consequences.
- No Child Left Behind required states to test students annually in grades 3 through 8.
Under “No Child Left Behind,” New York State lowered proficiency standards in order to avoid the potential loss of federal funding. Test scores thus went up. But things changed in 2009.
In 2009, Bush’s policy was updated with Obama’s “Race to the Top.” Obama’s policy built on Bush’s. It recognized the problem with the 100% proficiency standard and allowed states to get a waiver on meeting that goal. But in order to receive that waiver, states needed to do the following:
- States must implement the Common Core Standards, funded by the Gates Foundation
- Teachers must be evaluated by standardized test scores
- States must favor policies encouraging school choice: charters, vouchers
- States must build data-tracking systems
Each of these bullet points has contributed to radical changes in schools, with many unintended consequences. But our point here is simply to show that test scores are the result of political wrangling. Once the pressure to show 100% proficiency was removed in New York State, the tests became harder and scores dropped a great deal. The low passing rates then became part of a media narrative about failing teachers and failing schools.
When politicians attach high stakes to standardized tests, those tests cease to be useful for educational purposes. This is known as Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”
- For a summary of how Goodhart’s Law plays out in school testing, see this editorial by a co-author of the National Academy of Sciences report debunking test-based accountability.