Owning the Grade, Part II

salvēte iterum, amīcī et sodālēs! In today’s first post, I asked:

Is there a role for a single-number “grade” in a Joyful Learning Community? If so, how might this grade be calculated, and what factors might be used in its calculation? Would there be a “penalty for late work” or “points off for incorrect answers”?

In the end, much depends on what you, the community, want from the project. If you think a single-number grade would be valuable and helpful to you, we can provide you with one. If you don’t, we won’t.

In this post we’ll look at why I’m a bit suspicious of single-number grades, but we’ll also look at what I do to create a relatively defensible one for my face-to-face students. If you think Tres Columnae should feature single-number grades (or should provide an option for them), you may be interested in adapting this existing system.

First, though, my suspicions. The more I think about the issues involved, the more suspicious I become of a single-number “grade” as a measure of anything. I prefer to measure separate things separately, just as (for example) the doctor will tell you your heart rate, your blood pressure, and your cholesterol level after your physical exam, rather than putting them together with a mysterious formula and telling you your “health grade” is 81%!

After all, if that’s all you know, what can you do to improve your health? Should you be exercising more, eating less, or watching your salt intake? In the same way, that 83.125 or 78.2, for all its apparent scientific precision, tells poor X (our student in the previous post) nothing about what she needs to do to improve. “Study harder,” her teachers say, and “Do all of your homework.” But what does X need to study, and how does she need to study it?

Instead, the Tres Columnae assessment system will give specific feedback based on learners’ responses (correct or incorrect) to questions that measure a specific learning goal, and we’ll make the goals clear and accessible to the learners. For example, consider this assessment sequence from our most recent sample story. The relevant text says:

servī cubiculum Cnaeī intrant. “domine, quaesō, surge!” servī exclāmant. “hōra enim prīma est. nōnne hic est tibi prīmus lūdī diēs?”

extrā cubiculum Cnaeī sorōrēs audiunt et rīdent.

and the question is:

  1. quid faciunt servī?
    1. cubiculum intrant
    2. Cnaeum excitant
    3. Cnaeum laudant
    4. audiunt et rīdent

Correct answer: #1 – it’s right there in the story. This is a simple comprehension question … or is it?

#2 is sort-of correct, except that Cnaeus seems to be already awake (so they didn’t wake him up) and he certainly didn’t get out of bed (so that meaning for excitāre didn’t happen either). If you chose that one, you might get some pre-written feedback, or you might receive a little survey asking why you chose it. Was it that you were taking a more global view of the story, that you thought excitāre meant something different, or some other reason?

If you chose #3, you probably have an issue with the meaning of the word laudant. The system will probably suggest a vocabulary-review task to you right away.

If you chose #4, you may have a vocabulary issue; you may not have read very carefully (after all, that sentence does say sorōrēs audiunt et rīdent); or you may have a more general comprehension issue. We’ll need a pattern of responses before we can be sure, but you’ll probably see a survey somewhat like the one in response to #2.
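For readers who like to see such things spelled out, here is a minimal sketch in Python of how the answer-by-answer feedback described above might be stored. It is only an illustration under my own assumptions, not the actual Tres Columnae system: the names `Feedback`, `FEEDBACK`, and `respond` are hypothetical, and the diagnoses simply restate the preceding paragraphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """What one answer choice suggests, and what the learner sees next."""
    diagnosis: str
    follow_up: str


# Hypothetical table for the question "quid faciunt servī?"
# Keys are the answer numbers; the text comes from the discussion above.
FEEDBACK = {
    1: Feedback("correct", "continue"),
    2: Feedback("global reading, or a different sense of excitāre", "survey"),
    3: Feedback("meaning of laudant", "vocabulary review"),
    4: Feedback("vocabulary, careless reading, or general comprehension", "survey"),
}


def respond(choice: int) -> Feedback:
    """Look up the feedback for the answer the learner chose."""
    if choice not in FEEDBACK:
        raise ValueError(f"no such answer choice: {choice}")
    return FEEDBACK[choice]
```

The point of the table is that every distractor carries its own diagnosis, so a wrong answer produces a next step rather than just a lost point.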

Regardless of which incorrect answer you chose, the important thing is that you, the learner, see this mistake not as a bad thing you did but as an opportunity to improve. But what’s the best way for us to send that message to our learners consistently? Should we

  • only record correct responses?
  • record the number of attempts before a correct response occurs?
  • record both correct and incorrect responses?
  • record (and somehow celebrate) the number of incorrect responses?

At Tres Columnae, we don’t think we want to be in the grading business, but some of our customers will probably want us to. How should we best respond to those customers? If we do issue “grades,” how should we calculate them, and what factors should we consider? For example, do you want a 100-point system, with all its difficulties, or a rubric-based system? If a rubric, how many levels should there be, and what should be measured? Should the rubricated grades be reported individually, or should they somehow be combined into an “overall” number? And if they’re combined, how? quaesō, amīcī, mihi suādēte.

My personal preference would be for a clearly defined, 4- or 5-point rubric for each major skill; I’d want you, the learner, to assess yourself, and then I’d want to compare your assessment with mine (or that of whoever is your guide or editor as you progress through Tres Columnae). But I’m not sure how to combine the rubric scores into an “overall” score, nor am I exactly sure how to define or describe the levels of performance.

Perhaps it would be best if you, the learner, had some input into the design of the rubric that’s used to assess you; I’ve had good results with student input into rubrics with my face-to-face classes. Although I use a system that returns a 100-point “average grade” with my face-to-face students, I’m actually rather suspicious of such systems, as you know. 🙂

As you consider your answers, you may want to know something about that system, which I’ve developed over the years to give my face-to-face students as much ownership as possible of their learning. I explain the whole process in a lengthy document (which I can upload if you’re really interested), but it can be summarized fairly briefly:

  • Two things are measured:
    • Your knowledge, skill, and understanding of the material in the curriculum, and
    • Your hard work, effort, and improvement during the term
  • There are 3000 possible “work points” per term; each assignment is worth a different number of points.
  • By district policy, you must accumulate at least 2775 points (92.5% of the 3000) for an “A,” at least 2535 points (84.5%) for a “B,” at least 2295 points (76.5%) for a “C,” and at least 2055 points (68.5%) for a passing grade.
  • Most homework assignments (checked for completion only) are worth 10 points.
  • Vocabulary flashcards (yes, I know! But there are several options!) are worth 50 points.
  • Classwork assignments are worth 10-60 points depending on length. They can earn
    • full credit (if you do the whole thing and/or are actively involved the whole time);
    • half credit (if you are partially involved, but off task more than incidentally); or
    • no credit (if you don’t do them at all).
  • Quizzes are worth 20 points or a multiple of 20 – 1 point per correct response.
  • Extended projects are worth 50 points or a multiple of 50, and are assessed with a rubric. The rubricated score is converted to a number such that level 4 (the top) receives 100% credit and level 1 (below standard, but you did attempt the whole thing) receives 70%, a D- on the district scale.
  • Tests are worth 100 points or a multiple of 100, but there are more than 100 points possible, so you can “play to your strengths.”
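
The arithmetic in the summary above can be sketched in a few lines of Python. This is a rough illustration rather than the actual gradebook, and it rests on assumptions of mine that the summary does not state: the function names are hypothetical, the lowest passing band is labeled "D" and anything below it "F", and rubric levels 2 and 3 are placed at 80% and 90% by straight-line interpolation between the two levels the summary does give (level 1 at 70%, level 4 at 100%).

```python
TOTAL_POINTS = 3000

# Minimum work points for each grade, from the district cutoffs above.
# The "D" label for the lowest passing band is an assumption.
CUTOFFS = [(2775, "A"), (2535, "B"), (2295, "C"), (2055, "D")]

# Percent credit per rubric level. Levels 4 and 1 come from the summary;
# levels 3 and 2 are assumed (evenly spaced between 70% and 100%).
RUBRIC_PERCENT = {4: 100, 3: 90, 2: 80, 1: 70}


def letter_grade(points: int) -> str:
    """Convert accumulated work points into a letter grade."""
    if not 0 <= points <= TOTAL_POINTS:
        raise ValueError(f"points must be between 0 and {TOTAL_POINTS}")
    for minimum, letter in CUTOFFS:
        if points >= minimum:
            return letter
    return "F"  # assumed label for a non-passing total


def rubric_points(level: int, max_points: int) -> float:
    """Convert a rubric level (1-4) into work points for a project."""
    return max_points * RUBRIC_PERCENT[level] / 100
```

So a student with 2600 work points gets `letter_grade(2600)`, a "B", and a level-1 result on a 50-point project earns `rubric_points(1, 50)`, or 35 points.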

One obvious advantage, at least for most students, is that they do feel a sense of ownership and control over their grade, and they know what to do to “get the grade they want.” One obvious disadvantage is that so many different things are being measured in the single number.

Or at least that’s my opinion; what do you all think? et quid dē Tribus Columnīs? How could we – or should we – adapt such a model to this project? Or what should we do instead?

Tune in next time for your comments, some preliminary answers, and a preview of our next big focus: the relationship of Tres Columnae to “big-C” and “small-c” culture.

  1. I really appreciated your careful assessment here of what the wrong answers on the multiple choice question tell us – and most computers are great at giving specific feedback for different wrong answers on the quiz if you set it up that way! I tend to use tests on the idea that the questions are randomized and students take them multiple times, so I don’t reveal answers – but what you have demonstrated here is also a great model, where you construct your distractors very carefully so that the quiz can serve as a diagnostic tool with valuable information for both the students and the teachers.

    The place I’ve ended up with my quizzes is that they come from a random pool, students can take the quizzes multiple times, only their score is revealed on each attempt, and their grade is the average of their repeated attempts. I limit the attempts to 5 simply because I don’t think it is productive for a student to bang away at something more than that – and the reason they are tempted to do that is because they fixate on getting a perfect score, when I take the 80% approach to all my quizzing – if they get an 80% or better, they should feel confident to move on to the next activity. Moving on is a virtue in and of itself, and I’ve definitely found that some of my students who obsess about perfect scores need some help in learning how to say “good job!” and move on, short of a perfect score!

    I tend to treat grading as a kind of checklist of tasks accomplished and time invested. The main thing it seems to me is that students move through that list of tasks at different rates – which is why I have been so happy with the flexibility of the online environment. Juggling the range of student abilities in the classroom environment was something I never felt able to do very well, but online it’s not a problem at all. 🙂

    • Laura,
      I’m glad you liked the assessment of wrong answers. One thing that I personally find difficult with writing multiple-choice questions like this is that it’s so hard to develop good distractors! I know that’s an issue for the test-construction industry in general; of course, that’s one reason why high-stakes tests are so closely guarded, and why state Departments of Education are so reluctant to release old exams. I do like the idea of a large pool of questions, from which only a few appear on each quiz (and, of course, the system can remember which questions it’s used with you, the subscriber, in the past, so you don’t get the same questions over again). One could either set a limit on the number of attempts, as you do, or simply have a large but restricted pool of questions. Even if you’re a perfectionist, you couldn’t answer any more once all the questions had been used! 🙂

      I agree with your emphasis on the virtue of Moving On … having lived the life of a perfectionist (I’m in recovery from perfectionism these days!), it’s a painful and difficult one, and my fellow perfectionists need careful handling and much support as they learn to accept “good enough.” I think of a former student who was in tears over a 108% grade on a test (yes, she cried about a 108, because she’d made a “stupid mistake” and could have made a 110). I’ve seen her since and am happy to report that she’s doing a lot better. 🙂

      And I agree … students progress at different rates. I can make that work in a classroom, but it’s quite difficult if the rates are very different. On the other hand, it’s natural for an online environment. We’ve talked before, I believe, about how painful it is to watch bad face-to-face practices (like lockstep progression through content) be transferred, even more inappropriately, to an online environment. One secondary goal for Tres Columnae, beyond the obvious ones, is to model that type of flexibility – in other words, to be a model for good online curriculum design as well as good online instruction. Please help keep us honest in that regard!

