Comments on: PROOF POINTS: AI essay grading is already as ‘good as an overburdened’ teacher, but researchers say it needs more work
https://hechingerreport.org/proof-points-ai-essay-grading/
Covering Innovation & Inequality in Education

By: Matthew S. Johnson and Mo Zhang
https://hechingerreport.org/proof-points-ai-essay-grading/comment-page-1/#comment-69828
Mon, 10 Jun 2024 15:12:17 +0000

Dear Jill,
We appreciated your insightful article, “PROOF POINTS: AI essay grading is already as ‘good as an overburdened’ teacher, but researchers say it needs more work.” While we agree that the potential of AI to ease the grading burden and improve writing instruction is promising, a number of ethical issues must be addressed before we allow AI into our grading practices.

Your article mentions that AI-powered tools, specifically ChatGPT, are not yet accurate enough to be used on high-stakes tests or essays that would affect final grades.

We argue that accuracy is only one component of AI's overall suitability for grading. Accuracy is important in ensuring educational impact and integrity, but fairness, bias mitigation, transparency, and explainability are equally crucial. In fact, fairness is a foundational principle of responsible AI and a key standard of educational testing.

As scientists, we have both the power and the responsibility to draw attention not just to the promise of AI but also to its numerous potential biases. For example, language differences or cultural references in student writing could lead to biased scoring, disadvantaging certain groups. AI systems in education must identify, reduce, and eliminate biases to create an inclusive environment. This involves carefully selecting training data and ensuring the AI evaluates students of diverse backgrounds and abilities fairly.

In your article, you mention a study by Dr. Tamara Tate, a researcher at the University of California, which compared how ChatGPT stacked up against humans in scoring essays written by middle and high school students. At ETS Research Institute, we conducted an experiment using the same dataset as Dr. Tate to evaluate GPT-4o’s fairness across more than 12,000 essays. On average, the scores generated by ChatGPT were 0.9 points lower than human ratings and matched human scores exactly only 30% of the time. Notably, essays by Asian/Pacific Islander students received significantly lower scores from the AI than from human raters, revealing a bias that needs addressing.

Understanding how AI makes scoring decisions and why it disadvantages certain populations remains a significant challenge, even for its developers. For instance, we found that GPT-4o could predict the race/ethnicity of essay writers more accurately than it could score their essays. This suggests that the features it uses to predict race/ethnicity may also influence its scoring, contributing to fairness issues.

As we integrate AI into education, it is our collective responsibility to ensure these technologies are used ethically and effectively. Numerous agencies, such as NIST, UNESCO, and OECD, have published guidance on the responsible use of AI in education. At ETS Research Institute, we have synthesized these broad guidelines to develop principles for the responsible use of AI in assessments. Unique to educational testing, our principles include:
• Fairness and bias mitigation
• Privacy & security
• Transparency, explainability, and accountability
• Educational impact & integrity
• Continuous improvement

Only by prioritizing fairness over hype, integrity over cost-saving, and educational impact over convenience can we create a more inclusive, reliable, and effective educational environment that truly benefits all students and educators.

We hope our perspective contributes to the ongoing dialogue about AI’s role in education.

Warm regards,

Matthew S. Johnson and Mo Zhang
ETS Research Institute

Matt Johnson is a principal research director at ETS Research Institute and a leading author of ETS Principles on the Responsible Use of AI in Assessment (ETS Research Institute, 2024). His research focuses on statistical methods in education and psychology, with a primary focus on item response theory and related models.

Mo Zhang is a senior research scientist at ETS Research Institute. She specializes in writing research, automated scoring of constructed-response items, and performance-based assessment design and validation. She currently holds two U.S. patents and has published extensively in the field of educational measurement.

By: Deron Marvin
https://hechingerreport.org/proof-points-ai-essay-grading/comment-page-1/#comment-69246
Tue, 28 May 2024 17:31:14 +0000

RE: PROOF POINTS: AI essay grading is already as ‘good as an overburdened’ teacher, but researchers say it needs more work.

Schools and educators are being subjected to the latest contrivance that promises to reduce the most tedious aspects of being a teacher. Often, educators are swept into a tyranny of efficiency that tech companies repeatedly pitch to the unsuspecting. The latest? AI-enabled “personal teaching assistants,” which appear as add-ons for many school-wide information systems (including curriculum planning and reporting software and the like). These AI features claim they can eliminate or lessen the need to work through the following: writing student reports, collaborating on lesson planning, developing personalized learning for students, analyzing data, and creating assessments . . . and now, grading papers. These conveniences are the first step in the relentless march toward dehumanizing education.

At the very core of our educational endeavor is the necessity of highly trained teachers who retain the knowledge, skills, and talents to be effective. And, most importantly, teachers must possess deep human qualities: they must be passionately committed to the care and development of children. (Teachers spark inspiration, turn lives around, and affirm individual student needs, repeatedly.) To truly “know thy learner,” a teacher must be 100 percent engaged in their students’ learning growth. That undertaking carries a litany of responsibilities, which may include working through observational notes, grappling with a colleague over a student’s learning, listening intently to a student’s ideas, and writing assessments with an understanding of who the students are as learners. No facet of the teacher-student relationship ought to be reduced to rudimentary and impersonal forms. Ultimately, education is grounded principally in humanism.

Let us unpack what is happening now that we are allowing those AI-enabled components to be tested by teachers. The sales promotion, remember, is that AI will trim the burdensome aspects of teaching. For example, one company says we should use the “teaching assistant,” as it will save hours of having to write those arduous report card comments about students’ achievement. One even has a magic wand icon in its app that “generates a hyper-personalized comment” for progress reports. Teachers simply click on the icon to eliminate the need to gather their observations and grapple with writing a personal passage. Once the comment is written by AI, the teacher can then simply click a button to configure the appropriate voice and tone for the message: “firm” or “witty” or “serious,” to name just a few.

As a school leader, I am familiar with the onset of complaints from teachers around reporting time. I deflect those complaints unabashedly, knowing that the difficult process of reporting on learning results in a deeper understanding of each student. Before AI, teachers would not have fathomed having a friend or even a family member write their comments for them. Why, then, would we outsource the task to a bot, which is several steps removed from the teacher, who is accountable and responsible for personally knowing their learners?

The rollout of ChatGPT sure scared educators. The fear was that students would never learn to compose even a paragraph if they simply turned to ChatGPT to do it for them. It was a legitimate worry that was solved by ensuring teachers assigned work differently and, more importantly, knew their students as learners . . . and knew them well. Now a form of ChatGPT (in the shape of AI-enabled software) will allow teachers to avoid their own annoyances with writing; yes, it can even be used to write messages and letters to parents. As a parent, I would be quite disappointed to know that my child’s teacher couldn’t be bothered to write a personalized comment about my child’s learning. Or that those “witty” letters home did not represent the quiet and kind teacher I met at the beginning of the school year.

And how about the labor of curriculum planning? The former art of taking the adopted educational program and planning in teacher teams will no longer be necessary with AI-enabled curriculum planning software. It promises to streamline lesson planning, thereby taking the teacher, again, out of the learning equation. Curriculum planning is a collaborative affair and, when conducted in isolation or by someone (or something) else, can result in a school where teachers work in silos or, worse yet, have no allegiance to the lessons they are delivering. The more teachers allow AI to do their lesson planning, the more they will place unquestioning trust in its algorithmic output. In the long run, teachers will begin to trust AI over their colleagues and eventually even mistrust their own work.

Even these few “drudgeries” above are essential for strengthening teacher-student relationships. By the time you read this, educational software companies will have introduced a slew of additional AI-enabled accessories to relieve teachers of these onerous tasks, all of which will accelerate the divorce between teachers and students. I implore educational leaders to scrutinize the use of these AI-enabled components to ensure the human connection between teachers and students is not compromised. Innocent overuse of AI-enabled tools will only goad the designers to create more of these embellishments, which will ultimately wedge themselves firmly between students and teachers.
