Testing, Testing, I

salvēte, amīcī et sodālēs! Today’s post, as you’ve probably guessed, is about testing in several senses of the word. Monday marks the end of our first reporting period in my face-to-face teaching world, so it’s an appropriate time to pause and give students (and teachers) an opportunity to see how well they’ve done with the important Knowledge, Skills, and Understandings presented in each class during those first 4 ½ weeks of school. For many of my colleagues, this means they “have to give a test” – or in some cases, several tests.

For some reason, we teachers often have love-hate relationships with tests … even with the tests we ourselves write and administer to our students. I think of a former colleague I’ll call Mrs. Y … as in, “Y R U doing this?” I once had a conversation with her at the photocopier that went something like this:

Mrs. Y:  I’m so disgusted and angry. [sadly, Mrs. Y often opened conversations like that.  She was clearly a happy, positive person who loved her life, wasn’t she? :-)]

Me:  What’s wrong? [sadly, I had not yet learned to avoid negative people who wanted to vent.  I also had not yet learned that some people genuinely want to be unhappy, and sometimes you should just get out of their way and let them.]

Mrs. Y:  Well, I have to give them a test today, and they’re all going to fail.

Me:   You know they’re all going to fail?

Mrs. Y:  Yes.

Me: So, why are you giving the test today?

Of course, I’ve also been like Mrs. Y a few times … that is, I’ve certainly given the occasional test for which I thought some of my students weren’t quite ready. Perhaps I just wanted a quiet, peaceful day, or I wanted to “send a message” to my students that they needed to do their work. But I think Mrs. Y genuinely felt that she “had to” give that test that day, even though she already knew (and admitted) that none of her students were prepared to do well on the test. I hope I haven’t ever done that!

Back before the advent of state-created tests for “core” high-school subjects, another former colleague, long retired, used to give three tests in a row on the last three days of school. First she gave a “nine-weeks test,” which included all the new concepts from the fourth grading period. The next day, without going over the answers or doing any additional work with her students, she gave a “semester exam,” which included all the concepts from the second half of the course. And then, of course, she gave a cumulative “final exam,” which included everything. When I asked her about the logic for this process, she claimed that her students “needed” the three tests in a row “to help them review what they learned.” Of course, they never saw their scores on the previous tests before they took the new ones, so I’m not sure how much help the tests actually provided; they did have the advantage of keeping her students quiet and busy at a time when they might otherwise have been a bit boisterous. Perhaps that was the real point of the three tests in a row?

Ironically, some recent research summarized in this New York Times article partly supported my former colleague’s commitment to testing in this way. It seems that practice tests actually do increase retention, at least of knowledge-level information, and it turns out that practice opportunities involving multiple skills and concepts work even better than those that focus only on a single skill or procedure. I think I owe that colleague (let’s call her Mrs. X) an apology for some of the uncharitable thoughts that crossed my mind two decades ago!

On the other hand, does a test always have to be a test? In other words, what is the proper place of large, written, individually-administered assessments in a given teaching-and-learning environment? I doubt that there’s a single right answer to that question – so much depends on the needs and preferences of the school, the teacher, the students, and their families, not to mention the structure of the class itself and of the academic discipline involved.

When I was a young teacher, I was a firm believer in “tests at the end of every chapter.” In the course of a reporting period like the one we just finished, when my Latin I students usually work with 4-6 chapters of their “Big Three” reading-method Latin textbook, I would have given three or four “big” tests, lots of smaller quizzes, and a “huge” end-of-reporting-period cumulative test. We also would have done a complicated test-correction procedure for each “big” test, and we’d start the new reporting period by repeating that procedure for the “huge” cumulative exam. That process worked well for my students for a long time – they especially liked the fact that the test wasn’t the end of the learning, and that they could actually learn from their mistakes and shortcomings through the correction process.

These days, I still give a couple of “big” tests each reporting period, and we still follow the correction process, which I can describe in more detail next week if you lectōrēs fidēlissimī are interested. I also give a midterm and a final examination, which are required by the school and the district. At the end of a reporting period like this one, though, I find it a lot more helpful to use an interactive and dynamic summative task rather than a static and written one for several reasons:

First, my students don’t do their best work when they’re overwhelmed and exhausted … and many of them are overwhelmed and exhausted at the end of the first reporting period. Their other teachers tend to “pile on” projects, tests, and other large tasks at the end of the reporting period, and many of them also have jobs, significant family responsibilities, and other non-school commitments that take a significant amount of time and energy. Why give a test for test’s sake that doesn’t accurately measure what they know and can do?

Second, at this early point in the course I’m as interested in the process my students use as I am in the final product. When they’re constructing Latin sentences, I want to know what they’re thinking about – are they choosing words randomly? Do they understand the connection between a given noun or verb form and its function? And when they’re reading for comprehension, I want to know how comfortable they are with the vocabulary of a passage, with the structure of a sentence, and with the relationship, say, between a question I’ve asked and the text where the answer can be found. A typical test will show me the product of students’ thoughts, but it won’t show me the process. I’d really like to be able to look into their heads as they’re producing the product … and I’ve finally found a way to do something like that. To keep this post from getting too long, I’ll tell you all about it on Monday!

quid respondētis, amīcī?

  • What do you think about testing and learning?
  • How did you respond to that article I mentioned earlier?
  • What’s the proper role of testing in your face-to-face teaching-and-learning world?

Tune in next time, when we’ll explore my alternative process and product in more detail. intereā, grātiās maximās omnibus iam legentibus et respondentibus.

Assessment and Testing Redux, Part I

salvēte, amīcī et sodālēs! If you’re reading this “live” – and if you’re located in the midwestern or southern parts of the United States – you’re probably grateful to have escaped the worst of the winter storm. For those in other parts of the world, and for those who will read this post in future years, I expect you have your own reasons to be grateful. In any case, I’m glad you’re continuing to read, to think, to comment, and to participate in our Joyful Learning Community.

In the next few days after I write this post, you should be able to see a complete Lectiō from Cursus Primus of the Tres Columnae project, and possibly as many as five. More Lectiōnēs will be coming soon, too. As you look at them – especially if you’ve had any experience at all with traditional schools and textbooks – you may have a number of questions! One that a lot of readers will have is, “Where are the tests?” Or perhaps, “What kind of test do you give for something like this?” Or even, perhaps, “How would you keep students from cheating on a test if they’re taking it online, without your direct supervision?” They’re all important questions, especially if you come from the perspective of school-based learning. And we’ll address them … but not in this post! 🙂

My goal today is to consider a hugely important distinction: the one between assessment and testing. Just this morning, on the Latinteach listserv, our colleague “obliquelywadling” made a really good point about what’s required, in a perfect world, for a test to be valid (that is, to measure what it says it’s going to measure) and reliable (that is, to give consistent results):

I once read that for a test to be valid and reliable, it must at the least not have its contents announced in advance, or its date, and there can be no consequences attached to the tests, which of course must themselves be nameless. If these conditions are met, it is easier to say with a straight face what it is that students have learned or not. When I have tried this, it was very humbling! Otherwise, one may be measuring the effects of wealth, private tutoring, test prep classes, panic, the Pygmalion effect etc.

Of course, obliquelywadling’s points are about tests, rather than assessment in general. What’s the difference? We’ll explore that today, but we might start by saying that all tests are (or at least should be) assessments, but not all assessments are tests. There are lots of other ways to assess besides a formal, pen-and-paper (or computer-administered) test.

And, of course, it’s possible to use a formal test for purposes other than assessment, or measuring learning. Some teachers, for example, use them to fill time on Fridays; others use them to punish students for “being bad” or “not listening to me” or “playing around and wasting our time yesterday.” And, of course, it’s possible to use a test for multiple purposes – for example, as a teacher, you may genuinely want to fill that hour on Friday, or for that matter, you may genuinely want to punish those naughty children for not paying attention! 🙂 But, at the same time, you may also genuinely want to know how your students are doing with the material you’ve been teaching recently.

And, of course, if a test is designed to be an assessment, it clearly needs to be valid (that is, to measure what it says it is measuring) and reliable (that is, to give similar results for different learners, and at different times that it’s given). Any assessment would need to meet these criteria if its results are to be useful! But both validity and reliability are big issues in the testing industry … and so is the bigger question of usefulness! Usefulness … for whom? In other words, who is the primary customer of test results, or of any other assessment data?

Let’s return for a moment to obliquelywadling’s valid and reliable test, which

  • is not announced to the learner in advance;
  • is not described to the learner in advance; and
  • has no consequences for the learner.

If you designed and gave such a test, would you even share the results with the learners? Or would such sharing invalidate future results, or make a future administration unreliable? Even if you did share the results, such a test is clearly not designed for the learner … though it will have an indirect benefit as the teacher uses its results to modify or confirm his or her teaching.

Just a brief rant here: What does it say about the “high-stakes” testing movement that its goals are so diametrically opposed to this definition of validity? And in a quest for reliability, experts in the field of test design often include “equating” questions, repeated from year to year or from one test to another. Of course, if the tests are publicly released (as teachers, parents, administrators, and legislators understandably desire – and sometimes insist on in a high-stakes world), reliability can be called into question because the equating questions no longer equate! OK, I’m done with my rant now. 🙂

But for assessments that aren’t tests, and especially for assessments that help learners measure their own progress, it’s possible to maintain both validity and reliability without secrecy. For example, in the world of Tres Columnae, a learner might choose to demonstrate proficiency with a particular set of vocabulary and morphological items by constructing a story. There are other, similar stories out there, and there’s a rubric that was used to assess them. The learner is welcome, even encouraged, to consult the other stories and the rubric. After all, even if you look at these, you’ll still have to make your own story … you’ll just have a better idea of what to do. If you copy an existing story verbatim, we’ll know. After all, we will have a catalog of them … and as I remind my face-to-face students, if you can Google it, I can Google it too! 🙂 But if you have ownership of your learning – and if you want to take ownership of the assessment process, too – you’re quite unlikely to copy someone else’s story in the first place.

quid respondētis, amīcissimī?

  • Who is the primary customer for assessment (and testing) results in your world?
  • Is that who you think should be the primary customer?
  • If not, what changes in assessment practices would need to happen so that your preferred primary customer did have primary ownership?
  • And how does your ideal assessment system compare with what we’re proposing for Tres Columnae?

I don’t want to dismiss testing completely … I’ve given tests for years to my face-to-face students, and I find the results helpful, both for me and for them. We’ll explore one critical purpose of testing (and, to a lesser degree, of all forms of assessment) in our next post. But I do want to find at least one Via Media, at least one “Third Alternative,” between a “high-stakes” approach that’s neither valid nor reliable, on the one hand, and a “touchy-feely” result that’s equally invalid and unreliable, on the other. Tune in next time for more about that, and about an overlooked purpose of testing that, in the beginning, was the reason for external assessments!