Automated Essay Scoring and Revising Based on Open-Source Large Language Models
Yishen Song, Qianta Zhu, Huaibo Wang, and Qinhua Zheng
Abstract—Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack generalizability, which makes them hard to implement in daily teaching activities. Moreover, online sites offering AES and AER services charge high fees and raise security concerns when student content is uploaded. In light of these challenges and recognizing the advancements in large language models (LLMs), we aim to fill these research gaps by analyzing the performance of open-source LLMs on AES and AER tasks. Using a human-scored essay dataset (n = 600) collected in an online assessment, we implemented zero-shot, few-shot, and p-tuning AES methods based on the LLMs and conducted a human–machine consistency check. For AER with LLM support, we conducted a similarity test and a score difference test on the results. The human–machine consistency check shows that the performance of open-source LLMs with a 10B parameter size in the AES task is close to that of some deep-learning baseline models, and that it can be improved by integrating the comment with the score into the shot or by training continuous prompts. The similarity test and score difference test results show that open-source LLMs can effectively accomplish the AER task, improving the quality of the essays while ensuring that the revision results are similar to the original essays. This study reveals a practical path to assisting teachers with student essay scoring and revising using open-source LLMs in a cost-effective, time-efficient, and content-safe way.
Index Terms—Assessment, automated essay revising (AER), automated essay scoring (AES), generative artificial intelligence, open-source large language model (LLM).