Media critic Clay Shirky likes to say that “institutions tend to preserve the problem they were designed to solve.” In the ongoing debate about standardized testing, it might be useful to flip the usual formulation of the question, focus on what problem the standardized, multiple-choice test was designed to solve, and think about whether there are other solutions to that problem. If we can address that question, the next step–finding a better “institution” to solve the problem–might be much easier.
In the “How We Measure” chapter of Now You See It, I write about the man who, for all practical purposes, invented standardized testing. Frederick J. Kelly took a form of testing that was being discussed at the time and proposed standardizing it in his 1914 doctoral dissertation at Kansas State University. The Kansas Silent Reading test is a One Best Answer choice test or what is also known as a “bubble test.” Here’s the structure, and it will be familiar to anyone who has made it through almost any school system in the developed world, and in much of the undeveloped world, today: You choose the discrete, correct answer from four or five possibilities presented to you and record that answer by correctly penciling in a bubble. Everyone starts at the same time and finishes at the same time. Everyone answers alone. It is cheating to help someone else during the test. The test is taken silently, it is timed, and the questions are constructed so there is only one right answer. Any person with a correct answer key or, now, any machine programmed to “read” the right answer can mark the test and determine the score. If I live in Boston or Peoria or Biloxi–or, now, in Beijing, Helsinki, or Tokyo–I can take the same test, be graded on the same key, and my scores on the exact same test can instantly be compared with anyone else’s. That structure was invented in 1914 and pretty much persists to the present.
What is the problem that this structure (the persisting “institution” in Clay Shirky’s quote) solves? First, the bubble test solves the problem of scoring variability. The same answer is marked right or wrong depending not on the opinion or knowledge of the grader but on whether the answer key says it is the right answer. Everyone takes the same test. Everyone’s test is graded on the same key. In his 1914 dissertation, Kelly notes the widespread problem of teachers with radically different levels of training. How do I know my child is a “top student” when the person determining that excellence is herself not “top”? The item-response bubble test solves that problem by allowing test takers to be measured by a fixed standard. Who administers the test is irrelevant. All test graders use the same answer key. Once we have the scores, we can see patterns of excellence and failure instantly. One class scores brilliantly; another in a nearby town or even in the same school does poorly on exactly the same test. Why?
How we answer the “why?” question is not the problem the bubble test is designed to solve. The bubble test is designed to solve the problem of how we can even begin to talk about educational success or failure by giving us a fixed point of comparison, child to child, classroom to classroom, school to school, district to district, region to region, nation to nation. There are other metrics that do other things, but no system yet developed does a better job of solving this one problem of variability. If the aggregate test scores of my child’s school in Durham, North Carolina, are radically different than the test scores of the class my brother’s child attends in Denver, Colorado, I have to ask what is contributing to the difference. This is no longer about my child and how well my child is doing in school. Suddenly, with such standardization before us, I have to ask larger questions about the school, the teacher, the curriculum, the funding of the school, or the social conditions of the kids attending the school that contribute to either high or low scores when measured against the scores of kids taking the identical test. The variable of the test itself is the same. That gives me the grounds to ask what the cause of the variability is. If we ever want to replace standardized tests as an institution, we can’t just ask the “why” question; we have to have an institutionalized form of testing that provides for side-by-side comparisons.
Second, the multiple-choice test is efficient to administer and grade. Remember, Kelly created this test in the very important year of 1914. The immigrant population had tripled in the last decade. New school policies made secondary education not just college prep but a requirement for most kids progressing through the school system at the normal rate (i.e., they couldn’t drop out of school until they were 16 in most states). A World War had changed the labor force so that men were in Europe at the front, women were in factories, and there was an extreme teacher shortage. Not only did that exacerbate the variability problem (with lots of under-qualified people filling in to deal with the teacher shortage), but it also worsened the burden of grading. The bubble test allowed a small group of professional testers to create tests for a given field in a given grade. The test could be reproduced and given nationwide to school kids in that discipline (say, math) and grade (say, sixth grade). At low cost, anyone could be hired, even with quite minimal literacy skills and no teacher training, to put the answer key over the test page and check off the right answers without even reading the questions. Now, machines read the tests.
No wonder virtually all testers now use this form. The institution of the multiple-choice test does astonishingly well what it was created to do: it solves the two problems it was designed to solve, variability and ease.
Now, what was the test not intended to solve? Just about everything else. Kelly himself recognized the test only addressed “lower-order thinking.” He was aware it was a baseline only. Given the national crisis of over-populated public education and a teacher shortage caused by the War, he was also keenly aware that the test had most utility for quickly processing the test abilities of the masses–not the Harvard-bound student coming out of an elite prep school but what in 1914 were called the “lower orders.” It, in so many ways, was the perfect test for eliminating the variability of teacher standards by cheaply, rapidly and efficiently delivering the same testing scores for anyone–from the nation’s poorest students to its wealthiest, from its worst educated to its best.
Here are some problems the institution of the Kansas Silent Reading or bubble test was not intended to address: higher order thinking, associational thinking, problem solving, collaborative thinking, interdisciplinary thinking, complex analysis, the ability to apply learning to other problems, complexity and causality that do not have one right answer. Here are some other problems the institution of the Kansas Silent Reading or bubble test was not intended to address: creativity, imagination, originality.
And here are some other problems the bubble test was not intended to address: how to motivate learning for all children, how to motivate gifted kids for whom the tests are easy and boring, how to inspire kids who are brilliant but poor at test taking, how to inspire learning in bleak social situations where there seems to be no way out of poverty and where high test scores are justified as the (impossibly expensive) way to get to college, how to inspire better modes of complex “higher order” thinking in those kids who increasingly have to take test after test after test. And one more, especially extreme since 2002, when No Child Left Behind mandated that schools with “failing scores” are “failing schools” and should be closed down or privatized by 2014 (ironically, the centennial of the bubble test): how do you reward brilliant, inspired teaching when teachers are penalized for not delivering passing scores for “lower order thinking”?
And here is the issue that I pose in Now You See It, the one that keeps me up at night: how do you prepare kids for an increasingly indefinite, rapidly changing job world, in an era of high-speed technological change and global competitiveness, where what is required for success is (I’m quoting the first set of problems the bubble test is not intended to address): “intellectual dexterity, higher order thinking, associational thinking, problem solving, collaborative thinking, complex analysis, the ability to apply learning to other problems, complexity and causality that do not have one right answer”?
* * * *
- Until we recognize and can address in a better way the issue of variability and the issue of efficiency, we will not be able to get rid of the bubble test, invented in 1914 to solve those precise problems.
- Until we get rid of the bubble test, we will not be able to address any of the urgent issues of real learning and higher order thinking that students and teachers face in 2012.
In specific terms, it is ludicrous and, in some cases, perhaps even hypocritical and downright dishonest to think that we can solve all the additional learning problems simply by getting rid of the 2002 national policy of No Child Left Behind, which is based on end-of-grade standardized test scores, or the current “Race to the Top” variations, which, in some states, make the results kids obtain on tests the standard by which we measure teacher success. We won’t, not by getting rid of the test without finding an equally good but more flexible way of preventing variability and ensuring efficiency. On the other hand, until we have a substitute for the item-response testing institution that solves the problems of variability and efficiency, we don’t even have a chance of addressing the other profound learning issues the multiple-choice test was never intended to solve. That’s the challenge.
My hope is that by separating out some of the strands of the institutional problem, we can begin to find solutions. We need a far better way to address variability and efficiency. I’ve written about some of those ways in Now You See It, and I’ve been involved in the MacArthur Foundation’s Badges for Lifelong Learning Competition. By sponsoring twenty or thirty institutions working with developers to create badges for recognition, reputation, credit, accreditation, credentialing, and other forms of assessment, we hope to learn more about what is possible in the way of complex, nuanced, peer-driven, interactive evaluation that still addresses the two problems standardized multiple-choice testing solves: variability and efficiency. Once we can find a better way to solve variability and efficiency, then we can concentrate on the real purpose of learning. What would be amazing is if we could solve the problems of variability and efficiency with a peer-driven system that actually motivates and rewards real learning. What would be equally amazing is if we could find a system that solves variability and efficiency and, at the same time, supports learning communities (for informal learning), teachers (in the classroom), and workforce trainers (in the workplace) who strive for complex, ongoing, lifelong, connected collaborative learning. (Yes, I know that is a pipe dream, but if we don’t dream, we won’t be motivated to change.)
Right now, variability and efficiency have become ends, not means. That’s a disaster. But it is a disaster because the real purpose of learning is not the problem the institution of the bubble test was designed to solve. The bubble test solves the problems of variability and efficiency. The profound problem of education remains once the issue of variability and efficiency is solved. If we find a better solution to variability and efficiency than the bubble test, we can then concentrate on the real learning objective of school: how best to prepare our kids to thrive in the life that they will lead once they are no longer in school.