- Daniel Seifert¹,
- Lisa Jöckel¹,
- Adam Trendowicz¹,
- Marcus Ciolkowski²,
- Thorsten Honroth¹ &
- Andreas Jedlitschka¹
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15452)
Included in the following conference series: Product-Focused Software Process Improvement (PROFES)
Abstract
The use of large language models (LLMs) in software engineering is growing, especially for code, typically to generate code or to detect and fix quality problems. Because requirements are usually written in natural language, it seems promising to exploit the capabilities of LLMs to detect problems in requirements as well. We replicated an inspection experiment in which computer science students searched for defects in requirements documents using different reading techniques. In our replication, we used the LLM GPT-4-Turbo instead of students to determine how the model compares to human reviewers. Additionally, we considered GPT-3.5-Turbo, Nous-Hermes-2-Mixtral-8x7B-DPO, and Phi-3-medium-128k-instruct for one research question. We focused on single-prompt approaches and avoided more complex setups in order to mimic the original study design, in which students received all the material at once. Our study had two phases. First, we explored the general feasibility of using LLMs for requirements inspection on a practice document and examined different prompts. Second, we applied selected approaches to two requirements documents and compared them to each other and to human reviewers. The approaches vary in the reading technique (ad-hoc, perspective-based, checklist-based), the LLM, the instructions, and the material provided. We found that the LLMs (a) report only a limited number of deficits despite having enough tokens available; these deficits (b) vary little across prompts and (c) rarely match the sample solution.
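Conceptually, the single-prompt setup described in the abstract amounts to one chat-completion call per requirements document. The following Python sketch illustrates what such a call might look like for a perspective-based reading prompt using the OpenAI SDK; the model name, prompt wording, and file handling are illustrative assumptions, not the prompts or tooling actually used in the paper.

```python
# Minimal sketch (not the authors' actual prompts) of a single-prompt,
# perspective-based requirements inspection via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical perspective-based reading instruction (tester perspective).
PERSPECTIVE_PROMPT = (
    "You are a software tester reviewing the requirements document below. "
    "Read it from the tester's perspective and list every defect you find "
    "(ambiguities, omissions, inconsistencies), each with the affected "
    "requirement and a short justification."
)

def inspect(requirements_text: str, model: str = "gpt-4-turbo") -> str:
    """Send the whole document in a single prompt, mirroring the study's
    one-shot setup where reviewers received all material at once."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PERSPECTIVE_PROMPT},
            {"role": "user", "content": requirements_text},
        ],
        temperature=0,  # favor reproducible defect lists across runs
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # "requirements.txt" is a placeholder for one of the study's documents.
    with open("requirements.txt", encoding="utf-8") as f:
        print(inspect(f.read()))
```

Swapping the system prompt for an ad-hoc or checklist-based instruction, or the model name for one of the other LLMs listed above, would yield the other prompt variants compared in the study.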
Acknowledgments
Parts of this work have been funded by the German Federal Ministry of Education and Research (BMBF) in the project “DeepQuali” (grant no. 01IS23016D).
Author information
Authors and Affiliations
Fraunhofer Institute for Experimental Software Engineering IESE, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
Daniel Seifert, Lisa Jöckel, Adam Trendowicz, Thorsten Honroth & Andreas Jedlitschka
QAware GmbH, Aschauer Str. 30, 81549 München, Germany
Marcus Ciolkowski
Corresponding author
Correspondence to Daniel Seifert.
Editor information
Editors and Affiliations
University of Tartu, Tartu, Estonia
Dietmar Pfahl
Blekinge Institute of Technology, Karlskrona, Sweden
Javier Gonzalez Huerta
Leibniz Universität Hannover, Hannover, Germany
Jil Klünder
University of Tartu, Tartu, Estonia
Hina Anwar
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Seifert, D., Jöckel, L., Trendowicz, A., Ciolkowski, M., Honroth, T., Jedlitschka, A. (2025). Can Large Language Models (LLMs) Compete with Human Requirements Reviewers? – Replication of an Inspection Experiment on Requirements Documents. In: Pfahl, D., Gonzalez Huerta, J., Klünder, J., Anwar, H. (eds) Product-Focused Software Process Improvement. PROFES 2024. Lecture Notes in Computer Science, vol 15452. Springer, Cham. https://doi.org/10.1007/978-3-031-78386-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78385-2
Online ISBN: 978-3-031-78386-9
eBook Packages: Computer Science, Computer Science (R0)