Advancing knowledge for building secure, efficient, and usable applications, systems, and networks
Room 541 Ingersoll Hall Extension
2900 Bedford Avenue
Brooklyn, NY, 11210
The Laboratory for Modelling and Analytics of Software and Systems (MASS Lab) is Professor Hui Chen’s research group at the City University of New York. The group investigates a multi-pronged approach to engineering large and complex software and networked systems.
High-quality vulnerability patch data is essential for understanding vulnerabilities in software systems. Accurate patch data sheds light on the nature of vulnerabilities, their origins, and effective remediation strategies. However, current data collection efforts prioritize rapid release over quality, leading to patches that are incomplete or contain extraneous changes. In addition to supporting vulnerability analysis, high-quality patch data improves automatic vulnerability prediction models, which require reliable inputs to predict issues in new or existing code. In this paper, we explore using large language models (LLMs) to filter vulnerability data by identifying and removing low-quality instances. Trained on large textual corpora including source code, LLMs offer new opportunities to improve data accuracy. Our goal is to leverage LLMs for reasoning-based assessments of whether a code hunk fixes a described vulnerability. We evaluate several prompting strategies and find that Generated Knowledge Prompting, where the model first explains a hunk’s effect and then assesses whether it fixes the bug, is most effective across three LLMs. Applying this filtering to the BigVul dataset, we show a 7%–9% improvement in prediction precision for three popular vulnerability prediction models. Recall declines slightly (2%–8%) across models, likely reflecting the impact of the reduced dataset size.
@article{dil2025,
  title    = {Towards higher quality software vulnerability data using LLM-based patch filtering},
  author   = {Dil, Charlie and Chen, Hui and Damevski, Kostadin},
  journal  = {Journal of Systems and Software},
  pages    = {112581},
  year     = {2025},
  issn     = {0164-1212},
  doi      = {10.1016/j.jss.2025.112581},
  url      = {https://www.sciencedirect.com/science/article/pii/S016412122500250X},
  keywords = {Vulnerability patch quality, Automatic vulnerability prediction, Large language models},
  preprint = {preprint/llmcleanjss.pdf},
  note     = {In press},
}
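As a rough illustration of the two-step Generated Knowledge Prompting strategy described in the abstract above, the sketch below first asks the model to explain a hunk's effect and then asks it to judge whether that effect fixes the described vulnerability. The query_llm helper is a placeholder, not the paper's code; wire it to whichever LLM client you use.

# Minimal sketch of Generated Knowledge Prompting for patch filtering.
# `query_llm` is a hypothetical stand-in for an LLM API call.

def query_llm(prompt: str) -> str:
    """Send `prompt` to an LLM and return its text response.
    Replace with your provider of choice (hosted API, local model, etc.)."""
    raise NotImplementedError

def hunk_fixes_vulnerability(hunk: str, cve_description: str) -> bool:
    # Step 1 (generated knowledge): have the model explain the hunk's effect.
    explanation = query_llm(
        "Explain, in plain language, what the following code change does:\n"
        f"{hunk}"
    )
    # Step 2: given that explanation, assess whether the hunk fixes the bug.
    verdict = query_llm(
        "Vulnerability description:\n"
        f"{cve_description}\n\n"
        "Explanation of the code change:\n"
        f"{explanation}\n\n"
        "Does this change fix the described vulnerability? Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

In this scheme, hunks judged not to fix the described vulnerability would be filtered out of the patch dataset before training a vulnerability prediction model.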
This research-to-practice WIP paper describes the development and evaluation of a generative Large Language Model (gLLM)-based autograder for computer programming assignments. Manual grading is becoming increasingly unsustainable due to growing student enrollment and the demand for timely, high-quality feedback. To address these challenges, this study explores the use of automated grading tools to reduce instructors’ workload and improve scalability. The proposed autograder takes a “reverse-engineering” approach: it converts student code into structured natural language summaries, which are then compared against predefined grading rubrics. An evaluation is performed using an external dataset (the Menagerie dataset), which contains real student submissions graded by four human graders. The objective is to assess the alignment between grades assigned by the autograder and those assigned by human graders. Findings indicate that the autograder closely matches human grading when letter grades are considered, though it performs less accurately with fine-grained numerical scores. While not yet a complete substitute for human assessment, the autograder shows strong potential as a scalable, efficient tool for supporting grading in programming education.
@inproceedings{fie2025,
  author       = {Lewis, Kevin and Chen, Hui},
  title        = {{WIP}: How Effective Are LLM-Implemented Autograders for Programming Assignments Compared to Human Graders?},
  booktitle    = {Proceedings of the 2025 IEEE Frontiers in Education Conference (FIE)},
  pages        = {LC:1--LC:5},
  year         = {2025},
  organization = {IEEE},
  keywords     = {Grades, Grading Systems, Automated Grading, Student Assessment, Computing Skills},
  preprint     = {preprint/fie2025autograder.pdf},
  note         = {Accepted and to appear},
}
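The “reverse-engineering” grading flow described in the abstract above can be sketched roughly as follows: summarize the student's code in natural language, then score each rubric item against that summary. All names here are illustrative, not the paper's implementation, and query_llm is again a placeholder LLM call.

# Minimal sketch of rubric-based grading via code summarization.
# `query_llm` is a hypothetical stand-in for a gLLM API call.

def query_llm(prompt: str) -> str:
    """Send `prompt` to a gLLM and return its text response."""
    raise NotImplementedError

def grade_submission(student_code: str, rubric: list[str]) -> float:
    # Step 1: convert the code into a structured natural-language summary.
    summary = query_llm(
        "Summarize what the following program does, listing its inputs, "
        "outputs, and the behavior of each function:\n" + student_code
    )
    # Step 2: check each rubric item against the summary.
    earned = 0
    for item in rubric:
        answer = query_llm(
            f"Rubric item: {item}\n\nProgram summary:\n{summary}\n\n"
            "Does the program satisfy this rubric item? Answer YES or NO."
        )
        if answer.strip().upper().startswith("YES"):
            earned += 1
    # Return the fraction of rubric items satisfied.
    return earned / len(rubric)

A binary per-item verdict like this naturally yields coarse, letter-grade-like scores, which is consistent with the finding above that agreement with human graders is stronger at the letter-grade level than for fine-grained numerical scores.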