AI Pokes Another Hole In Standardized Testing

The stories were supposed to capture a new step forward in artificial intelligence. “A Breakthrough for A.I. Technology: Passing an 8th-Grade Science Test,” said the New York Times. “AI Aristo takes science test, emerges multiple-choice superstar,” said TechXPlore. Both stories were about Aristo (the name suggests a child version of Aristotle), a project of Paul Allen’s Allen Institute for Artificial Intelligence, whose own headline read, “How to tutor AI from an ‘F’ to an ‘A.’”

The occasion for all this excitement is Aristo’s conquest of a big standardized test: it answered a convincing 80% of questions correctly on the 12th-grade science test and 90% on the 8th-grade test. Four years ago, none of the programs that attempted this feat succeeded at all.

We see these occasional steps forward greeted with a certain amount of hyperbole. Last year the New York Post announced that computers were “beating humans” at reading comprehension, and the BBC once announced that an AI “had the IQ of a four-year-old child.” But the field still has a very long way to go, and as it tries to get there, it tells us something about the educational tasks we set for humans.

Wired perhaps best captured the issue in a story headlined “AI Can Pass Standardized Tests—But It Would Fail Preschool.” AIs still can’t answer open-ended questions, and Aristo was designed strictly to handle multiple choice, and only within certain parameters: it struggles with questions involving diagrams, charts, or hypotheticals. The program, as Melanie Mitchell puts it in Wired, lacks common sense. Multiple-choice questions tend to come with certain cues and “giveaways,” enough that Mitchell found she could just about pass the test by googling, making Aristo only marginally “smarter” than a search engine.

These articles all focus on the development, design, and pursuit of artificial intelligence, but I would rather look at what all this says about the standardized tests themselves.

Despite the Post headline, no piece of software actually “comprehends” reading, and Aristo is not ready to be a cybernetic scientist. Or as Mitchell puts it, in a quote I would have mounted on my classroom wall, “We must keep in mind that a high score on a particular data set does not always mean that a machine has actually learned the task its human programmers intended.”

In that quote, we could as easily replace “machine” with “student” and “human programmers” with “teachers.”

What these AI experiments keep proving, over and over, is that a test-taker does not have to possess any knowledge or understanding of the subject matter to be trained to succeed on the tests. The high-stakes tests that have been the foundation of the education accountability movement clearly do not measure what they purport to measure, as demonstrated by computer software that has zero “academic achievement” and yet scores well on the test.

If actual academic knowledge and understanding are not prerequisites for a good score on the test, then what does the big standardized test actually measure? And is there anything to be gained by pushing (and measuring) students to be more like software that doesn’t know much except how to figure out the correct answer on a multiple-choice test?


I spent 39 years as a high school English teacher, looking at how hot new reform policies affect the classroom.