Abstract
Software refactoring is an essential activity for improving the readability, maintainability, and reusability of software projects. To this end, a large number of automated or semi-automated approaches and tools have been proposed to locate poorly designed code, recommend refactoring solutions, and conduct specified refactorings. However, even with such tools, it remains challenging for developers to decide where and what kind of refactorings should be applied. Recent advances in deep learning, especially in large language models (LLMs), make it potentially feasible to refactor source code automatically with LLMs. However, it remains unclear how well LLMs perform compared to human experts in conducting refactorings automatically and accurately. To fill this gap, we conduct an empirical study to investigate the potential of LLMs in automated software refactoring, focusing on the identification of refactoring opportunities and the recommendation of refactoring solutions. We first construct a high-quality refactoring dataset comprising 180 real-world refactorings from 20 projects, and conduct the empirical study on this dataset. With the to-be-refactored Java documents as input, ChatGPT and Gemini identified only 28 and 7, respectively, of the 180 refactoring opportunities. The evaluation results suggest that the performance of LLMs in identifying refactoring opportunities is generally low and that the task remains an open problem. However, explaining the expected refactoring subcategories and narrowing the search space in the prompts substantially increased the success rate of ChatGPT from 15.6% to 86.7%. Concerning the recommendation of refactoring solutions, ChatGPT recommended 176 refactoring solutions for the 180 refactorings, and 63.6% of the recommended solutions were comparable to (or even better than) those constructed by human experts.
However, 13 of the 176 solutions suggested by ChatGPT and 9 of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors, which indicates the risk of LLM-based refactoring.
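The percentages in the abstract follow directly from the reported counts. The short sketch below recomputes them as a sanity check; the helper name `rate` and the dictionary layout are illustrative assumptions, and the two back-calculated counts at the end are inferred from the stated percentages rather than reported in the abstract.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


TOTAL_OPPORTUNITIES = 180  # refactorings in the dataset, per the abstract

# Opportunities identified from the to-be-refactored documents alone.
identified = {"ChatGPT": 28, "Gemini": 7}
identification_rates = {
    model: rate(count, TOTAL_OPPORTUNITIES) for model, count in identified.items()
}

# (unsafe solutions, total solutions suggested) for each model.
unsafe = {"ChatGPT": (13, 176), "Gemini": (9, 137)}
unsafe_rates = {model: rate(bad, total) for model, (bad, total) in unsafe.items()}

# Inferred, not reported: counts implied by the stated percentages.
implied_hits_with_better_prompts = round(0.867 * TOTAL_OPPORTUNITIES)
implied_comparable_solutions = round(0.636 * 176)

print(identification_rates)
print(unsafe_rates)
print(implied_hits_with_better_prompts, implied_comparable_solutions)
```

Under these figures, roughly 7% of the solutions suggested by either model were unsafe, which is why the abstract flags the risk despite the otherwise encouraging recommendation results.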
Data Availability
The replication package, including the tools and the data, is publicly available (Liu2024b).
References
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: GPT-4 technical report (2023). https://arxiv.org/abs/2303.08774
Alon, U., Brody, S., Levy, O., Yahav, E.: code2seq: generating sequences from structured representations of code. In: Proceedings of the 7th International Conference on Learning Representations (ICLR’19). OpenReview, New Orleans, LA, USA (2019)
Alizadeh, V., Kessentini, M.: Reducing interactive refactoring effort via clustering-based multi-objective search. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE’18), pp. 464–474. ACM, Montpellier (2018). https://doi.org/10.1145/3238147.3238217
Alizadeh, V., Kessentini, M., Mkaouer, M.W., Cinnéide, M.Ó., Ouni, A., Cai, Y.: An interactive and dynamic search-based approach to software refactoring recommendations. IEEE Trans. Softw. Eng. 46(9), 932–961 (2020). https://doi.org/10.1109/TSE.2018.2872711
Akın, F.K.: Awesome ChatGPT prompts (2024). https://github.com/f/awesome-chatgpt-prompts
AlOmar, E.A., Mkaouer, M.W., Ouni, A.: Behind the intent of extract method refactoring: a systematic literature review. IEEE Trans. Softw. Eng. (2024). https://doi.org/10.1109/TSE.2023.3345800
Alizadeh, V., Ouali, M.A., Kessentini, M., Chater, M.: RefBot: intelligent software refactoring bot. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19), pp. 823–834. IEEE, San Diego, CA, USA (2019). https://doi.org/10.1109/ASE.2019.00081
AlOmar, E.A., Venkatakrishnan, A., Mkaouer, M.W., Newman, C.D., Ouni, A.: How to refactor this code? An exploratory study on developer-ChatGPT refactoring conversations. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 202–206. IEEE (2024). https://doi.org/10.1145/3643991.3645081
Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. In: Proceedings of the ACM on Programming Languages 3(POPL), pp. 1–29 (2019). https://doi.org/10.1145/3291636
Baqais, A.A.B., Alshayeb, M.: Automatic software refactoring: a systematic literature review. Softw. Qual. J. 28(2), 459–502 (2020). https://doi.org/10.1007/s11219-019-09477-y
Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., et al.: A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (2023). arXiv:2302.04023
Bavota, G., De Lucia, A., Di Penta, M., Oliveto, R., Palomba, F.: An experimental investigation on the innate relationship between quality and refactoring. J. Syst. Softw. 107, 1–14 (2015). https://doi.org/10.1016/j.jss.2015.05.024
Bavota, G., De Lucia, A., Oliveto, R.: Identifying extract class refactoring opportunities using structural and semantic cohesion measures. J. Syst. Softw. 84(3), 397–414 (2011). https://doi.org/10.1016/j.jss.2010.11.918
Barbez, A., Khomh, F., Guéhéneuc, Y.-G.: Deep learning anti-patterns from code metrics history. In: Proceedings of the 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME’19), pp. 114–124. IEEE, Cleveland, OH, USA (2019). https://doi.org/10.1109/ICSME.2019.00021
Bavota, G., Oliveto, R., De Lucia, A., Antoniol, G., Guéhéneuc, Y.-G.: Playing with refactoring: identifying extract class opportunities through game theory. In: Proceedings of the 2010 IEEE International Conference on Software Maintenance (ICSM’10), pp. 1–5. IEEE (2010). https://doi.org/10.1109/ICSM.2010.5609739
Charalampidou, S., Ampatzoglou, A., Chatzigeorgiou, A., Gkortzis, A., Avgeriou, P.: Identifying extract method refactoring opportunities based on functional relevance. IEEE Trans. Softw. Eng. 43(10), 954–974 (2017). https://doi.org/10.1109/TSE.2016.2645572
Chouchen, M., Bessghaier, N., Begoug, M., Ouni, A., AlOmar, E.A., Mkaouer, M.W.: How do software developers use ChatGPT? An exploratory study on GitHub pull requests. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 212–216. IEEE (2024). https://doi.org/10.1145/3643991.3645084
Chang, S., Fosler-Lussier, E.: How to prompt LLMs for text-to-SQL: a study in zero-shot, single-domain, and cross-domain settings (2023). arXiv:2305.11853
Chen, T., Jiang, Y., Fan, F., Liu, B., Liu, H.: A position-aware approach to decomposing god classes. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE’24), pp. 129–140. IEEE (2024). https://doi.org/10.1145/3691620.3694992
Cui, D., Wang, S., Luo, Y., Li, X., Dai, J., Wang, L., Li, Q.: RMove: recommending move method refactoring opportunities using structural and semantic representations of code. In: Proceedings of the 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME’22), pp. 281–292. IEEE, Limassol, Cyprus (2022). https://doi.org/10.1109/ICSME55016.2022.00033
Dair.AI: Prompt Engineering Guide (2024). https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/guides/prompts-intro.md
Dawes, J.: Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. Int. J. Mark. Res. 50(1), 61–104 (2008). https://doi.org/10.1177/147078530805000106
Dilhara, M., Bellur, A., Bryksin, T., Dig, D.: Unprecedented code change automation: the fusion of LLMs and transformation by example. In: Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE’24), pp. 631–653. ACM, Porto de Galinhas, Brazil (2024). https://doi.org/10.1145/3643755
Desai, U., Bandyopadhyay, S., Tamilselvam, S.: Graph neural network to dilute outliers for refactoring monolith application. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI’21), vol. 35, pp. 72–80 (2021). https://doi.org/10.1609/aaai.v35i1.16079
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19), pp. 4171–4186. ACL, Minneapolis, MN, USA (2019). https://doi.org/10.18653/V1/N19-1423
Dig, D., Comertoglu, C., Marinov, D., Johnson, R.: Automated detection of refactorings in evolving components. In: Proceedings of the 20th European Conference on Object-Oriented Programming (ECOOP’06), pp. 404–428. Springer, Nantes, France (2006). https://doi.org/10.1007/11785477_24
Deo, S., Hinge, D., Chavan, O.S., Wang, Y.O., Mkaouer, M.W.: Analyzing developer-ChatGPT conversations for software refactoring: an exploratory study. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 207–211. IEEE (2024). https://doi.org/10.1145/3643991.3645082
Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945). https://doi.org/10.2307/1932409
DePalma, K., Miminoshvili, I., Henselder, C., Moss, K., AlOmar, E.A.: Exploring ChatGPT’s code refactoring capabilities: an empirical study. Expert Syst. Appl. 249, 1–26 (2024). https://doi.org/10.1016/j.eswa.2024.123602
Farmmamba: Hadoop HDFS (2024). https://issues.apache.org/jira/browse/HDFS-17322
Fontana, F.A., Caracciolo, A., Zanoni, M.: DPB: A benchmark for design pattern detection tools. In: Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR’12), Szeged, Hungary, pp. 235–244. IEEE (2012). https://doi.org/10.1109/CSMR.2012.32
Fulop, L.J., Ferenc, R., Gyimóthy, T.: Towards a benchmark for evaluating design pattern miner tools. In: Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR’08), pp. 143–152. IEEE, Athens, Greece (2008). https://doi.org/10.1109/CSMR.2008.4493309
Foster, S.R., Griswold, W.G., Lerner, S.: WitchDoctor: IDE support for real-time auto-completion of refactorings. In: Proceedings of the 34th International Conference on Software Engineering (ICSE’12), pp. 222–232. IEEE, Zurich, Switzerland (2012). https://doi.org/10.1109/ICSE.2012.6227191
Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971). https://doi.org/10.1037/h0031619
Falleri, J.-R., Morandat, F., Blanc, X., Martinez, M., Monperrus, M.: Fine-grained and accurate source code differencing. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE’14), pp. 313–324. ACM, Vasteras, Sweden (2014). https://doi.org/10.1145/2642937.2642982
Feitelson, D.G., Mizrahi, A., Noy, N., Shabat, A.B., Eliyahu, O., Sheffer, R.: How developers choose names. IEEE Trans. Softw. Eng. 48(1), 37–52 (2022). https://doi.org/10.1109/TSE.2020.2976920
Fowler, M.: Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston (1999)
Fokaefs, M., Tsantalis, N., Chatzigeorgiou, A.: JDeodorant: identification and removal of feature envy bad smells. In: Proceedings of the 23rd IEEE International Conference on Software Maintenance (ICSM’07), pp. 519–520. IEEE, Paris, France (2007). https://doi.org/10.1109/ICSM.2007.4362679
Fluri, B., Wursch, M., Pinzger, M., Gall, H.: Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng. 33(11), 725–743 (2007). https://doi.org/10.1109/TSE.2007.70731
Grund, F., Chowdhury, S.A., Bradley, N.C., Hall, B., Holmes, R.: CodeShovel: constructing method-level source code histories. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE’21), pp. 1510–1522. IEEE, Madrid, Spain (2021). https://doi.org/10.1109/ICSE43902.2021.00135
Guo, Q., Cao, J., Xie, X., Liu, S., Li, X., Chen, B., Peng, X.: Exploring the potential of ChatGPT in automated code refinement: an empirical study. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE’24), pp. 1–13. ACM, Lisbon, Portugal (2024). https://doi.org/10.1145/3597503.3623306
Ge, X., DuBose, Q.L., Murphy-Hill, E.: Reconciling manual and automatic refactoring. In: Proceedings of the 34th International Conference on Software Engineering (ICSE’12), pp. 211–221. IEEE, Zurich, Switzerland (2012). https://doi.org/10.1109/ICSE.2012.6227192
Grissom, R.J., Kim, J.J.: Effect Sizes for Research: A Broad Practical Approach. Lawrence Erlbaum Associates, Hillsdale, NJ (2005)
Google: Gemini Model (2024). https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
JetBrains: IntelliJ IDEA Refactoring Menu (2024). https://www.jetbrains.com/help/idea/refactoring-source-code.html
Kniesel, G., Binun, A.: Standing on the shoulders of giants—a data fusion approach to design pattern detection. In: Proceedings of the 17th IEEE International Conference on Program Comprehension (ICPC’09), pp. 208–217. IEEE, Vancouver, BC, Canada (2009). https://doi.org/10.1109/ICPC.2009.5090044
Kim, M., Gee, M., Loh, A., Rachatasumrit, N.: Ref-Finder: A refactoring reconstruction tool based on logic query templates. In: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’10), pp. 371–372. ACM, Santa Fe, NM, USA (2010). https://doi.org/10.1145/1882291.1882353
Kurbatova, Z., Veselov, I., Golubev, Y., Bryksin, T.: Recommendation of move method refactoring using path-based representation of code. In: Proceedings of the 42nd IEEE/ACM International Conference on Software Engineering Workshops (IWoR’20), pp. 315–322. ACM, Seoul, Republic of Korea (2020). https://doi.org/10.1145/3387940.3392191
Liu, H., Guo, X., Shao, W.: Monitor-based instant software refactoring. IEEE Trans. Softw. Eng. 39(8), 1112–1126 (2013). https://doi.org/10.1109/TSE.2013.4
Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He, H., Li, A., He, M., Liu, Z., et al.: Summary of ChatGPT-related research and perspective towards the future of large language models. Meta-Radiology 1(2), 1–14 (2023). https://doi.org/10.1016/j.metrad.2023.100017
Liu, B.: ReExtractor (2024). https://github.com/lyoubo/ReExtractor
Liu, B.: Replication Package (2024). https://github.com/bitselab/LLM4Refactoring
Liu, H., Jin, J., Xu, Z., Zou, Y., Bu, Y., Zhang, L.: Deep learning based code smell detection. IEEE Trans. Softw. Eng. 47(9), 1811–1837 (2021). https://doi.org/10.1109/TSE.2019.2936376
Liu, K., Kim, D., Bissyandé, T.F., Kim, T., Kim, K., Koyuncu, A., Kim, S., Le Traon, Y.: Learning to spot and refactor inconsistent method names. In: Proceedings of the 41st International Conference on Software Engineering (ICSE’19), pp. 1–12. IEEE, Montreal, QC, Canada (2019). https://doi.org/10.1109/ICSE.2019.00019
Liu, B., Liu, H., Li, G., Niu, N., Xu, Z., Wang, Y., Xia, Y., Zhang, Y., Jiang, Y.: Deep learning based feature envy detection boosted by real-world examples. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’23), pp. 908–920. ACM, San Francisco, CA, USA (2023). https://doi.org/10.1145/3611643.3616353
Liu, H., Liu, Q., Liu, Y., Wang, Z.: Identifying renaming opportunities by expanding conducted rename refactorings. IEEE Trans. Softw. Eng. 41(9), 887–900 (2015). https://doi.org/10.1109/TSE.2015.2427831
Liu, B., Liu, H., Niu, N., Zhang, Y., Li, G., Jiang, Y.: Automated software entity matching between successive versions. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE’23), pp. 1615–1627. IEEE, Luxembourg, Luxembourg (2023). https://doi.org/10.1109/ASE56229.2023.00132
Lacerda, G., Petrillo, F., Pimenta, M., Guéhéneuc, Y.-G.: Code smells and refactoring: a tertiary systematic review of challenges and observations. J. Syst. Softw. 167, 110610 (2020). https://doi.org/10.1016/j.jss.2020.110610
Liu, H., Wang, Y., Wei, Z., Xu, Y., Wang, J., Li, H., Ji, R.: RefBERT: a two-stage pre-trained framework for automatic rename refactoring. In: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’23), pp. 740–752. ACM, Seattle, WA, USA (2023). https://doi.org/10.1145/3597926.3598092
Murphy-Hill, E., Parnin, C., Black, A.P.: How we refactor, and how we know it. IEEE Trans. Softw. Eng. 38(1), 5–18 (2012). https://doi.org/10.1109/TSE.2011.41
Mkaouer, M.W., Kessentini, M., Cinnéide, M.Ó., Hayashi, S., Deb, K.: A robust multi-objective approach to balance severity and importance of refactoring opportunities. Empir. Softw. Eng. 22, 894–927 (2017). https://doi.org/10.1007/s10664-016-9426-8
Minna, F., Massacci, F., Tuma, K.: Analyzing and Mitigating (with LLMs) the Security Misconfigurations of Helm Charts from Artifact Hub (2024). https://arxiv.org/abs/2403.09537
Mu, F., Shi, L., Wang, S., Yu, Z., Zhang, B., Wang, C., Liu, S., Wang, Q.: ClarifyGPT: empowering LLM-based code generation with intention clarification (2023). https://arxiv.org/abs/2310.10996
Mens, T., Tourwé, T.: A survey of software refactoring. IEEE Trans. Softw. Eng. 30(2), 126–139 (2004). https://doi.org/10.1109/TSE.2004.1265817
Negara, S., Chen, N., Vakilian, M., Johnson, R.E., Dig, D.: A comparative study of manual and automated refactorings. In: Proceedings of the 27th European Conference on Object-Oriented Programming (ECOOP’13), pp. 552–576. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-39038-8_23
OpenAI: ChatGPT (2024). https://openai.com/index/chatgpt
OpenAI: GPT-4 Model (2024). https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
Pomian, D., Bellur, A., Dilhara, M., Kurbatova, Z., Bogomolov, E., Bryksin, T., Dig, D.: Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring (2024). arXiv:2401.15298
Pomian, D., Bellur, A., Dilhara, M., Kurbatova, Z., Bogomolov, E., Sokolov, A., Bryksin, T., Dig, D.: EM-Assist: safe automated extract method refactoring with LLMs. In: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE’24), pp. 582–586. ACM (2024). https://doi.org/10.1145/3663529.3663803
PMD: PMD (2024). https://github.com/pmd/pmd
Peruma, A., Mkaouer, M.W., Decker, M.J., Newman, C.D.: An empirical investigation of how and why developers rename identifiers. In: Proceedings of the 2nd International Workshop on Refactoring (IWoR’18), pp. 26–33. ACM (2018). https://doi.org/10.1145/3242163.3242169
Prete, K., Rachatasumrit, N., Sudan, N., Kim, M.: Template-based reconstruction of complex refactorings. In: Proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM’10), pp. 1–10. IEEE, Timisoara, Romania (2010). https://doi.org/10.1109/ICSM.2010.5609577
Peruma, A., Simmons, S., AlOmar, E.A., Newman, C.D., Mkaouer, M.W., Ouni, A.: How do I refactor this? An empirical study on refactoring trends and topics in Stack Overflow. Empir. Softw. Eng. 27(11), 1–43 (2022). https://doi.org/10.1007/s10664-021-10045-x
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(1), 5485–5551 (2020)
Silva, D., Silva, J.P., Santos, G., Terra, R., Valente, M.T.: RefDiff 2.0: A multi-language refactoring detection tool. IEEE Trans. Softw. Eng. 47(12), 2786–2802 (2020). https://doi.org/10.1109/TSE.2020.2968072
Shirafuji, A., Oda, Y., Suzuki, J., Morishita, M., Watanobe, Y.: Refactoring programs using large language models with few-shot examples. In: Proceedings of the 30th Asia-Pacific Software Engineering Conference (APSEC’23), pp. 151–160. IEEE (2023). https://doi.org/10.1109/APSEC60848.2023.00025
Silva, D., Tsantalis, N., Valente, M.T.: Why we refactor? Confessions of GitHub contributors. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’16), pp. 858–870. ACM, Seattle, WA, USA (2016). https://doi.org/10.1145/2950290.2950305
Silva, D., Valente, M.T.: RefDiff: detecting refactorings in version histories. In: Proceedings of the 14th International Conference on Mining Software Repositories (MSR’17), pp. 269–279. IEEE (2017). https://doi.org/10.1109/MSR.2017.14
Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., et al.: Gemini: a family of highly capable multimodal models (2023). arXiv:2312.11805
Tate, R.F.: Correlation between a discrete and a continuous variable, point-biserial correlation. Ann. Math. Stat. 25(3), 603–607 (1954). https://doi.org/10.1214/aoms/1177728730
Tsantalis, N., Chatzigeorgiou, A.: Identification of move method refactoring opportunities. IEEE Trans. Softw. Eng. 35(3), 347–367 (2009). https://doi.org/10.1109/TSE.2009.1
Tsantalis, N., Chatzigeorgiou, A.: Identification of extract method refactoring opportunities for the decomposition of methods. J. Syst. Softw. 84(10), 1757–1782 (2011). https://doi.org/10.1016/j.jss.2011.05.016
Tsantalis, N., Chaikalis, T., Chatzigeorgiou, A.: JDeodorant: Identification and removal of type-checking bad smells. In: Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR’08), pp. 329–331. IEEE, Athens, Greece (2008). https://doi.org/10.1109/CSMR.2008.4493342
Tsantalis, N., Ketkar, A., Dig, D.: RefactoringMiner 2.0. IEEE Trans. Softw. Eng. 48(3), 930–950 (2022). https://doi.org/10.1109/TSE.2020.3007722
Tourwé, T., Mens, T.: Identifying refactoring opportunities using logic meta programming. In: Proceedings of the 7th European Conference on Software Maintenance and Reengineering (CSMR’03), pp. 91–100. IEEE, Benevento, Italy (2003). https://doi.org/10.1109/CSMR.2003.1192416
Tsantalis, N., Mansouri, M., Eshkevari, L.M., Mazinanian, D., Dig, D.: Accurate and efficient refactoring detection in commit history. In: Proceedings of the 40th International Conference on Software Engineering (ICSE’18), pp. 483–494. ACM, Gothenburg, Sweden (2018). https://doi.org/10.1145/3180155.3180206
Tufano, R., Mastropaolo, A., Pepe, F., Dabić, O., Di Penta, M., Bavota, G.: Unveiling ChatGPT’s usage in open source projects: A mining-based study. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 571–583. IEEE (2024). https://doi.org/10.1145/3643991.3644918
Tufano, M., Pantiuchina, J., Watson, C., Bavota, G., Poshyvanyk, D.: On learning meaningful code changes via neural machine translation. In: Proceedings of the 41st International Conference on Software Engineering (ICSE’19), pp. 25–36. IEEE, Montreal, QC, Canada (2019). https://doi.org/10.1109/ICSE.2019.00021
Vitale, A., Piantadosi, V., Scalabrino, S., Oliveto, R.: Using deep learning to automatically improve code readability. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE’23), pp. 573–584. IEEE, Luxembourg, Luxembourg (2023). https://doi.org/10.1109/ASE56229.2023.00112
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT (2023). https://arxiv.org/abs/2302.11382
White, J., Hays, S., Fu, Q., Spencer-Smith, J., Schmidt, D.C.: ChatGPT prompt patterns for improving code quality, refactoring, requirements elicitation, and software design, pp. 71–108 (2024). https://doi.org/10.1007/978-3-031-55642-5_4
Wilcoxon, F.: Individual comparisons by ranking methods. Int. Biomet. Soc. 1(6), 80–83 (1945). https://doi.org/10.2307/3001968
Wu, Y., Li, Z., Zhang, J.M., Papadakis, M., Harman, M., Liu, Y.: Large language models in fault localisation (2023). https://arxiv.org/abs/2308.15276
Xing, Z., Stroulia, E.: The JDEvAn tool suite in support of object-oriented evolutionary development. In: Companion of the 30th International Conference on Software Engineering (ICSE Companion’08), pp. 951–952. ACM, Leipzig, Germany (2008). https://doi.org/10.1145/1370175.1370203
Xia, C.S., Zhang, L.: Keep the Conversation Going: Fixing 162 out of 337 bugs for \$0.42 each using ChatGPT (2023). arXiv:2304.00385
Yaron: TestMe (2024). https://github.com/wrdv/testme-idea
Yamashita, A., Moonen, L.: Do developers care about code smells? An exploratory survey. In: Proceedings of the 20th Working Conference on Reverse Engineering (WCRE’13), pp. 242–251. IEEE (2013). https://doi.org/10.1109/WCRE.2013.6671299
Zhang, J., Luo, J., Liang, J., Gong, L., Huang, Z.: An accurate identifier renaming prediction and suggestion approach. ACM Trans. Softw. Eng. Methodol. 32(6), 1–51 (2023). https://doi.org/10.1145/3603109
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (62232003 and 62172037), the China National Postdoctoral Program for Innovative Talents (BX20240008), and the CCF-Huawei Populus Grove Fund (CCF-HuaweiSE202411).
Author information
Authors and Affiliations
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Bo Liu, Yanjie Jiang, Yuxia Zhang & Hui Liu
Key Laboratory of High Confidence Software Technologies, Ministry of Education, School of Computer Science, Peking University, Beijing, 100871, China
Yanjie Jiang
Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, 45221, USA
Nan Niu
National Innovation Institute of Defense Technology, Beijing, 100071, China
Guangjie Li
Corresponding authors
Correspondence to Yanjie Jiang or Hui Liu.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, B., Jiang, Y., Zhang, Y. et al. Exploring the potential of general purpose LLMs in automated software refactoring: an empirical study. Autom Softw Eng 32, 26 (2025). https://doi.org/10.1007/s10515-025-00500-0
Received:
Accepted:
Published: