COMPARATIVE JUDGMENT AND ADAPTIVE COMPARATIVE JUDGMENT: DO THEY HAVE A FUTURE IN PEER ASSESSMENT?
Introduction
The ninety-year-old ‘Law of Comparative Judgment’ is the origin of one of the latest innovations in educational assessment. What is (adaptive) comparative judgment, and what can it contribute to peer assessment practices? We review some of the literature and comment on the reliability of this method and its potential as a tool for peer-to-peer assessment.
Nuria Lopez
Rubrics and guiding questions are the most common methods used to guide students engaging in peer review activities. Both tools are effective in helping students familiarize themselves with the assessment criteria and increase their capability to produce accurate evaluations of their peers’ work. However, rubrics and guiding questions seem to be of limited effectiveness in the assessment of higher-order thinking skills such as analysis, or constructs such as creativity and conceptual understanding. The quality of these skills can be reflected in a range of ways, which makes it difficult to create rubrics or questions that incorporate all possible manifestations of high quality in a performance or piece of work. In the last few years, Comparative Judgment (CJ), which consists of comparing two pieces of work and deciding which one is of better quality, has been gaining traction as a more appropriate assessment method. It is being used as an alternative to rubrics in the evaluation of assignments such as writing tasks, design and visual arts products, technology portfolios and conceptual understanding of mathematical ideas. CJ is an example of holistic assessment, as assessors provide an overall evaluative judgement of a “product”, based on their shared consensus of what constitutes quality in the specific type of product being assessed (Pollitt, 2011).
CJ originated in Louis L. Thurstone’s Law of Comparative Judgment (1927), which postulated that, when comparing two phenomena or products of the same category, a person is able to assign a “value” to each one of them and subsequently conclude which of the two is of better quality. The premise is that we are better at producing judgment when comparing two items than when evaluating one item in isolation; a repetition of this comparative evaluation with different pairs of items would result in a quality-related ranking of all the items assessed.
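As a rough illustration of how such repeated comparisons can be turned into a ranking, the sketch below fits a simple Bradley-Terry-style model, a pairwise-comparison model closely related to Thurstone's. The item names and judgments are invented, and real CJ platforms use more sophisticated scaling; this is only a minimal sketch of the underlying idea.

```python
from collections import defaultdict

def bradley_terry_ranking(comparisons, iterations=100):
    """comparisons: list of (winner, loser) tuples from individual judgments."""
    items = {item for pair in comparisons for item in pair}
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # number of judgments per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}   # initial quality estimates
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}

    # Items with higher estimated "quality" come first in the ranking.
    return sorted(items, key=lambda item: strength[item], reverse=True)

# Hypothetical judgments: each tuple means the first piece of work was judged better.
judgments = [
    ("script_A", "script_B"), ("script_A", "script_C"),
    ("script_B", "script_C"), ("script_A", "script_B"),
]
print(bradley_terry_ranking(judgments))   # -> ['script_A', 'script_B', 'script_C']
```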
Since the 1990s, Alastair Pollitt and other researchers have developed Thurstone’s ideas to drive the implementation of CJ in educational assessment. Technology has facilitated the process by making it possible to complete evaluations online. Furthermore, the introduction of a web-based adaptive system has increased the reliability of this method and consequently enhanced its potential applications in education. In Adaptive Comparative Judgment (ACJ), a first comparison is made between two randomly assigned items. The system then adapts the selection of the next items to be compared, so that subsequent comparisons are made between pairs of items that have previously been ranked similarly. Because most comparisons are made between pairs of items of very similar quality, the robustness of the method is increased and it becomes possible “to generate extremely reliable results for educational assessment” (Pollitt, 2012).
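The adaptive step can be pictured with the minimal sketch below, which simply proposes the not-yet-judged pair whose current quality estimates are closest. This is an illustrative simplification of the adaptive idea, not Pollitt's actual algorithm, and the portfolio names and estimates are invented.

```python
import itertools

def next_pair(strengths, already_compared):
    """strengths: dict mapping each item to its current quality estimate.
    already_compared: set of frozensets holding the pairs already judged."""
    candidates = [
        (abs(strengths[a] - strengths[b]), a, b)
        for a, b in itertools.combinations(strengths, 2)
        if frozenset((a, b)) not in already_compared
    ]
    if not candidates:
        return None   # every pair has already been judged at least once
    candidates.sort(key=lambda entry: entry[0])   # smallest quality gap first
    _, a, b = candidates[0]
    return a, b

# Hypothetical round: the estimates would come from the comparisons made so far.
estimates = {"portfolio_1": 0.45, "portfolio_2": 0.30, "portfolio_3": 0.25}
already_seen = {frozenset(("portfolio_1", "portfolio_2"))}
print(next_pair(estimates, already_seen))   # -> ('portfolio_2', 'portfolio_3')
```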
While ACJ was initially used mostly in teacher-led evaluations, in recent years it has also been tried in peer assessment exercises, opening new possibilities for students to assess and provide feedback to each other. Seery, Canty and Phelan (2012) implemented the ACJ method in a peer assessment exercise in two courses of a teacher education program at the University of Limerick (Ireland). 137 students in the Materials and Engineering Technology and Materials and Construction Technology courses completed an assignment that required them to use wood and metal decorative processing techniques to produce two artefacts: a pictorial scene of a strong emotion, and a flower reflecting that same emotion.
The researchers felt that, because of the many variables that could demonstrate excellence in design and technology, an assessment rubric would not be appropriate to evaluate design-related competencies in this assignment. For this reason, an assessment rubric was not provided to students. Instead, students were encouraged to identify the key factors in their design and produce for their peers (assessors) a portfolio with evidence of their capability and their design journey, through text, pictures or videos. Students could link these pieces of evidence to one of three criteria: having, growing or proving an idea, so that they could reflect on their design process and their peers could also understand their process more easily when assessing the portfolios.
Applying ACJ, students evaluated their peers’ portfolios in pairs, deciding in each case which one demonstrated better quality. A rank order of portfolios was produced after the comparisons were completed. The validity of the ACJ method was tested by checking if the elements identified as proof of quality in the student-generated ranking of portfolios coincided with the ones identified by the module leaders. The results showed a “significant correlation between the students’ ranking of portfolios and the module leader’s evaluation of capability” (Seery, Canty and Phelan, 2012). A further positive result of the study was the students’ satisfaction with being able to define the criteria that would be applicable to their designs through their portfolios, instead of being assessed by a predefined assessment rubric.
Jones and Alcock (2014) conducted a peer assessment exercise in a first-year undergraduate calculus course at Loughborough University (UK). Students used CJ to assess the conceptual understanding of limits, continuity and partial derivatives. The assignment question was given to students six days before the test took place so that they could prepare their responses. Students were also provided with a set of criteria to help them prepare their submission. After submitting their work online, students were presented with pairs of scripts produced by other students and asked to decide which script showed a better conceptual understanding. They completed twenty pair evaluations.
In order to test the validity of the CJ method, the peer assessment results were compared to the independent evaluations of a group of experts (postgraduate students and mathematics lecturers). The comparison between the experts’ and the students’ evaluations showed a higher than expected inter-rater reliability. The researchers also tested whether CJ could be an alternative to assessment criteria as a marking tool and whether it proved effective in assessing conceptual understanding. They concluded not only that CJ was a viable alternative to assessment criteria when assessing conceptual understanding in this particular assignment, but also that CJ worked efficiently as an assessment method precisely because the evaluation did not depend on applying specific assessment criteria. However, the researchers also acknowledged that the set of criteria given to students for preparation might have implicitly influenced some of the peer evaluations.
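To illustrate this kind of validity check, the short sketch below computes Spearman's rank correlation between a peer-generated ranking and an expert ranking. The script names and rank positions are invented, and the published studies use more detailed reliability analyses; this only shows the basic arithmetic of comparing two rankings.

```python
def spearman(rank_a, rank_b):
    """rank_a, rank_b: dicts mapping the same items to rank positions (no ties)."""
    n = len(rank_a)
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))   # classic Spearman formula

# Hypothetical rank positions (1 = best) from a peer CJ exercise and an expert panel.
peer_ranks = {"script_A": 1, "script_B": 2, "script_C": 3, "script_D": 4}
expert_ranks = {"script_A": 1, "script_B": 3, "script_C": 2, "script_D": 4}
print(round(spearman(peer_ranks, expert_ranks), 2))   # -> 0.8
```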
One year after Jones and Alcock’s study, Jones and Wheadon (2015) tested whether the results obtained in the undergraduate calculus course could be replicated to assess secondary school students’ understanding of fractions. In this study, the researchers specifically tested whether there were differences between the use of absolute and comparative judgement when students assessed the work of their peers. A total of 157 students from three different English schools (two urban schools in central England and one rural school in the south of England) assessed the work produced by students in another English school. Student assessors were divided into two groups; one used absolute judgement and the other used comparative judgement. Neither group used assessment criteria. The comparative judgement group completed their evaluation through the No More Marking website. For each pair of responses, they selected the one that, in their view, showed a better understanding of fractions. The absolute judgement group assessed responses individually by rating them from 0 to 100.
The evaluations from the comparative and the absolute groups were compared, and both were also contrasted with experts’ evaluations. The results showed high inter-rater reliability and validity for comparative judgement, but poor reliability and validity for absolute judgement. The researchers concluded that comparative judgement can be an effective method of assessment for open-ended assignments that can result in a wide range of responses that cannot be anticipated in rubrics. They explained that “there may be contexts in which the aims of a peer assessment exercise are best served using global judgements without criteria” (Jones and Wheadon, 2015).
Seery et al. (2016) tested the reliability of ACJ to peer assess graphical capability by comparing the results obtained with this method with the results obtained with criteria-based assessment. A total of 128 third-year undergraduate students in an Initial Technology Teacher Education program at the University of Limerick (Ireland) participated in this study. Students had to design and model a personal device or artefact that could enhance the quality of life for an elderly person. They were not provided with specific criteria for the preparation of their assignment, but they were asked to complete a portfolio with evidence of their graphical capability. Once the task was completed, students assessed their peers’ portfolios using ACJ first and a criteria-based assessment second. Some of the criteria used to assess the portfolios were: creativity (how innovative or creative the design solution was), stages (how well defined the stages of the design approach were) and functions (how appropriate the selected functions were for the stage in which they were used).
The results showed a very close correlation between the ACJ rank position of the portfolios and the average scores for each criterion. The researchers concluded that ACJ seems to have the “capacity to validly measure the construct of graphical capability” (Seery et al., 2016). They also identified additional variables that seemed to influence the holistic judgement and contribute to the ACJ ranking, which led them to posit that the criteria list was “omitting critical elements associated with the task”. This seems to confirm that holistic assessment might be more appropriate than criteria for the evaluation of graphical capability because it is a method that rewards capacity “despite the inherent difficulty in the explicit observation of criteria” (Seery et al., 2016).
One of the most recent studies on ACJ was conducted by Bartholomew, Strimel and Yoshikawa (2019). Unlike the studies previously mentioned, which focused primarily on reliability, this study also analyzed the potential of ACJ as a formative and feedback tool in peer assessment. A total of 130 students aged twelve to thirteen from a school in the Midwestern United States participated in the peer review of a design project. The task consisted of producing a travel brochure about one Southeast Asian country, designed specifically for one of five family profiles provided. The peer assessment exercise took place at the midpoint of the preparation period so that feedback could be incorporated into the final submission.
Students were divided into two groups: the experimental group used the ACJ tool CompareAssess to assess their peers’ brochures, and the control group participated in a peer-sharing activity where students discussed their work with their classmates. The students using ACJ produced a holistic assessment based on the assignment description and a rubric that mainly referred to the content the brochure should include (e.g. a summary of the history of the country mentioning colonization and independence). They also provided anonymous feedback that could clarify their holistic judgement and serve as suggestions for improvement for their peers. This feedback was thematically analyzed by the researchers, who used codes to classify it into specific categories such as visuals, layout or format. The exercise was repeated after the final submission of the task, this time without feedback being required, and the experimental group students completed a questionnaire about their experience with ACJ. The classroom teacher also used the aforementioned rubric to grade the work of all students.
This study revealed some promising results about ACJ as a formative tool. Students who participated in the ACJ assessment at the midpoint of the task performed significantly better than students who did not. Furthermore, 72.7% of the students responded that the feedback provided together with the holistic evaluation was helpful, and 89.1% said they had made changes based on the feedback they received. 87.3% of the students responded that the ACJ process had been “somewhat helpful” or “definitely helpful”. Students reported having benefited from “exposure to new products, ideas, and approaches, opportunities and learning related to providing and receiving feedback, and learning from the identification of positive and negative aspects of each item displayed” (Bartholomew, Strimel and Yoshikawa, 2019). However, it is interesting to note that the study also showed inconsistencies between what students reported as influential factors in their ACJ assessment and what the thematic analysis of their feedback reflected. Although students reported that alignment with the assignment criteria had been the most influential factor in their decisions during the ACJ process, the analysis of their feedback showed that most of their comments referred to aesthetics.
The results obtained in these studies are positive. They have confirmed the high reliability of the CJ and ACJ methods as well as their efficiency in assessing skills or constructs whose high quality can be manifested in a variety of ways and therefore cannot be easily anticipated in predefined rubrics or questions. The researchers point out that, for the specific assignments analyzed in these studies, assessment criteria would have been insufficient to capture all the possible evidence of quality, and therefore a holistic assessment was more appropriate. However, it must be noted that the use of holistic assessment does not necessarily mean that criteria are completely absent from the assessment process. In two of the studies, criteria are defined by the students themselves through their portfolios; in another, criteria are given to students for the preparation of the assignment; and in another, criteria are used to guide the provision of the holistic assessment. Therefore, it can be concluded that, instead of omitting assessment criteria completely, it is perhaps more useful to change how they are created (student-created instead of teacher-provided) and how they are used (as a guide instead of as a restrictive and prescriptive list of qualities to be found in the assessed item).
Technological tools such as RM Compare and No More Marking are facilitating the implementation of CJ and ACJ online and will surely contribute to their expansion. Professor Scott R. Bartholomew, from Purdue University (USA), describes the benefits of ACJ in this video: https://rmresults.com/digital-assessment-solutions/rmcompare. Daisy Christodoulou, Director of Education at No More Marking, is also a well-known advocate for CJ in the UK.
To return to the question that serves as the title of this article (do Comparative Judgment and Adaptive Comparative Judgment have a future in peer assessment?), the answer is “most likely”. CJ and ACJ can be expected to become increasingly visible in the field of educational assessment. They have the potential to improve the efficiency of peer-to-peer assessment across a wide range of tasks. The use of a holistic approach can also open the door to a more student-centered and flexible assessment process. Consequently, the development of CJ and ACJ in the coming years should prove compelling to educational professionals interested in innovative pedagogical practice.
References
Bartholomew, S.R., Strimel, G.J. and Yoshikawa, E. (2019) “Using adaptive comparative judgement for student formative feedback and learning during a middle school design project”, International Journal of Technology and Design Education, 29, 363-385
Jones, I. and Alcock, L. (2014) “Peer assessment without assessment criteria”, Studies in Higher Education, 39:10, 1774-1787
Jones, I. and Wheadon, C. (2015) “Peer assessment using comparative and absolute judgement”, Studies in Educational Evaluation, 47, 93-101
No More Marking https://www.nomoremarking.com/
Pollitt, A. (2011) “Comparative judgement for assessment”, International Journal of Technology and Design Education, 22, 157-170
Pollitt, A. (2012) “The method of adaptive comparative judgement”, Assessment in Education: Principles, Policy & Practice, 19:3, 281-300
RM Compare https://rmresults.com/digital-assessment-solutions/rmcompare
Seery, N., Canty, D. and Phelan, P. (2012) “The validity and value of peer assessment using adaptive comparative judgement in design driven practical education”, International Journal of Technology and Design Education, 22, 205-226
Seery, N., Buckley, J., Doyle, A. and Canty, D. (2016) “The validity and reliability of adaptive comparative judgements in the assessment of graphical capability”, Conference paper, 71st ASEE Engineering Design Graphics Division Midyear Conference, Daniel Webster College, New Hampshire
Thurstone, L.L. (1927) “A law of comparative judgment”, Psychological Review, 34, 273-286
Nuria Lopez
Nuria Lopez works as Research Assistant for Blended Learning at Copenhagen Business School. She has a PhD in English and twenty years’ experience teaching languages and academic writing in higher education. Her research focuses on pedagogical issues, and she is particularly interested in finding ways to link pedagogical research with classroom practices.