Assessment and Testing Redux, Part I

salvēte, amīcī et sodālēs! If you’re reading this “live” – and if you’re located in the midwestern or southern parts of the United States – you’re probably grateful to have escaped the worst of the winter storm. For those in other parts of the world, and for those who will read this post in future years, I expect you have your own reasons to be grateful. In any case, I’m glad you’re continuing to read, to think, to comment, and to participate in our Joyful Learning Community.

Within a few days of this post, you should be able to see a complete Lectiō from Cursus Primus of the Tres Columnae project, and possibly as many as five. More Lectiōnēs will be coming soon, too. As you look at them – especially if you’ve had any experience at all with traditional schools and textbooks – you may have a number of questions! One that a lot of readers will have is, “Where are the tests?” Or perhaps, “What kind of test do you give for something like this?” Or even, perhaps, “How would you keep students from cheating on a test if they’re taking it online, without your direct supervision?” They’re all important questions, especially if you come from the perspective of school-based learning. And we’ll address them … but not in this post! 🙂

My goal today is to consider a hugely important distinction: the one between assessment and testing. Just this morning, on the Latinteach listserv, our colleague “obliquelywadling” made a really good point about what’s required, in a perfect world, for a test to be valid (that is, to measure what it says it’s going to measure) and reliable (that is, to give consistent results):

I once read that for a test to be valid and reliable, it must at the least not have its contents announced in advance, or its date, and there can be no consequences attached to the tests, which of course must themselves be nameless. If these conditions are met, it is easier to say with a straight face what it is that students have learned or not. When I have tried this, it was very humbling! Otherwise, one may be measuring the effects of wealth, private tutoring, test prep classes, panic, the Pygmalion effect etc.

Of course, obliquelywadling’s points are about tests, rather than assessment in general. What’s the difference? We’ll explore that today, but we might start by saying that all tests are (or at least should be) assessments, but not all assessments are tests. There are lots of other ways to assess besides a formal, pen-and-paper (or computer-administered) test.

It’s also possible to use a formal test for purposes other than assessment, or measuring learning. Some teachers, for example, use tests to fill time on Fridays; others use them to punish students for “being bad” or “not listening to me” or “playing around and wasting our time yesterday.” And a single test can serve several purposes at once. As a teacher, you may genuinely want to fill that hour on Friday, or for that matter, you may genuinely want to punish those naughty children for not paying attention! 🙂 But, at the same time, you may also genuinely want to know how your students are doing with the material you’ve been teaching recently.

Certainly, if a test is designed to be an assessment, it needs to be valid (that is, to measure what it says it is measuring) and reliable (that is, to give consistent results for comparable learners and across different administrations). Any assessment would need to meet these criteria if its results are to be useful! But both validity and reliability are big issues in the testing industry … and so is the bigger question of usefulness! Usefulness … for whom? In other words, who is the primary customer of test results, or of any other assessment data?

Let’s return for a moment to obliquelywadling’s valid and reliable test, which

  • is not announced to the learner in advance;
  • is not described to the learner in advance; and
  • has no consequences for the learner.

If you designed and gave such a test, would you even share the results with the learners? Or would such sharing invalidate future results, or make a future administration unreliable? Even if you did share the results, such a test is clearly not designed for the learner … though it will have an indirect benefit as the teacher uses its results to modify or confirm his or her teaching.

Just a brief rant here: What does it say about the “high-stakes” testing movement that its goals are so diametrically opposed to this definition of validity? And in a quest for reliability, experts in the field of test design often include “equating” questions, repeated from year to year or from one test to another. Of course, if the tests are publicly released (as teachers, parents, administrators, and legislators understandably desire, and sometimes insist on in a high-stakes world), reliability can be called into question because the equating questions no longer equate! OK, I’m done with my rant now. 🙂

But for assessments that aren’t tests, and especially for assessments that help learners measure their own progress, it’s possible to maintain both validity and reliability without secrecy. For example, in the world of Tres Columnae, a learner might choose to demonstrate proficiency with a particular set of vocabulary and morphological items by constructing a story. There are other, similar stories out there, and there’s a rubric that was used to assess them. The learner is welcome, even encouraged, to consult the other stories and the rubric. After all, even if you look at these, you’ll still have to make your own story … you’ll just have a better idea of what to do. If you copy an existing story verbatim, we’ll know; we will have a catalog of them … and as I remind my face-to-face students, if you can Google it, I can Google it too! 🙂 But if you have ownership of your learning – and if you want to take ownership of the assessment process, too – you’re quite unlikely to copy someone else’s story in the first place.

quid respondētis, amīcissimī?

  • Who is the primary customer for assessment (and testing) results in your world?
  • Is that who you think should be the primary customer?
  • If not, what changes in assessment practices would need to happen so that your preferred primary customer did have primary ownership?
  • And how does your ideal assessment system compare with what we’re proposing for Tres Columnae?

I don’t want to dismiss testing completely … I’ve given tests for years to my face-to-face students, and I find the results helpful, both for me and for them. We’ll explore one critical purpose of testing (and, to a lesser degree, of all forms of assessment) in our next post. But I do want to find at least one Via Media, at least one “Third Alternative,” between a “high-stakes” approach that’s neither valid nor reliable, on the one hand, and a “touchy-feely” approach that’s equally invalid and unreliable, on the other. Tune in next time for more about that, and about an overlooked purpose of testing that, in the beginning, was the reason for external assessments!


  1. I love the attitude that obliquelywadling espouses regarding tests. I once had a colleague who did not announce any “quizzes” or “tests” to encourage students to be “semper paratus.” I, myself, thought about doing this once, but the counselors at my school discouraged me, insisting that they would resist such a practice at any level necessary. Because they didn’t do much else, I figured they would have the time AND would do such a thing, so I didn’t pursue it.

    Inspired by the tradition of The University of Georgia’s esteemed Latin professor, R. Robert Harris (and by the cogent persuasion of my Latin colleague, Mr. Stewart Tarvin), I no longer use the words “test” and “quiz.” We refer to such assessments (and all assessments for that matter) as “Chances To Shine.” Occasionally, we have to revisit the idea of “shining” vs. “suffering punishment,” but it has made a palpable difference in the attitude of my students. On their own, the students accumulate nits (the unit of luminance, equal to one candela per square meter), as well as stickers with images of stars wearing sunglasses. With my middle school students, I put on sunglasses a few days before we have a big one in order to anticipate their “brilliance” in a self-fulfilling way. I know, it sounds really hokey, and the kids occasionally mock the idea (which prompts the aforementioned review of the purpose of such an appellation), BUT approaching assessment from this angle has changed the attitude of nearly all of my students. They have come to understand, in Mr. Tarvin’s words, that I am not “out to get them,” but “out to get them to learn.”

    Chances To Shine come in sections, outlined by the 5 C’s and our state’s dissection of them, and are then recorded in our grading program, each under a different C. When a student gets a progress report, it shows his or her performance level in each of the five C’s (essentially), thus demonstrating the areas in which he or she shines brightest and, I guess, dullest. It then becomes incumbent upon the student to seek out assignments that will help him or her improve in that area, and specifically on the elements of that area in which he or she is the weakest.

    A pen-and-paper test is a CTS. So is a presentation. A quiz is too. One of my students called a final exam a “Chance To Nuclearly Radiate.” Regardless, the appellation focuses the students on demonstrating their understanding in a more positive way, one consistent with my ideas of ownership and lifelong learning.

    A few parents complained about my system because they didn’t understand it. Fortunately, I had accumulated enough “capital” that they trusted me nonetheless. Furthermore, I compared a class that I had taught before under the old system with a class under the new system, and the difference was remarkable. I’ll let you decide what the cause of this was. I will say, however, that my attitude towards instruction and evaluation changed because I saw it as my own “Chance To Shine” as well.

    Thank you for reading this comment, however tangentially related to the post it was. Just my thoughts during my break today.

    • Randy,
      I love the idea of a Chance to Shine! I also like the way you break down your grading system according to the 5 C’s, so that students can see their areas of strength and weakness and, having seen them, take ownership of the process to improve their areas of weakness by seeking out assignments related to them. What a great idea! Applause to you and to your colleague for developing such a system.

      Strange side question: does your colleague Mr. Tarvin have any relatives in Knoxville?

      • As to the side question: I doubt Mr. Tarvin has any relatives in Knoxville. His family members are diehard University of Alabama and University of Georgia fans who would likely excommunicate anyone who moved into the vicinity of Knoxville. 🙂

      • And in turn, the Tarvins I knew in Knoxville would probably excommunicate (or even execute) him and his relatives, since they were diehard Tennessee fans. 🙂 But I’m grateful to him, despite his taste in college football, for the Chance to Shine!

