Exploring the potential of general purpose LLMs in automated software refactoring: an empirical study

Automated Software Engineering

Abstract

Software refactoring is an essential activity for improving the readability, maintainability, and reusability of software projects. To this end, a large number of automated or semi-automated approaches and tools have been proposed to locate poorly designed code, recommend refactoring solutions, and conduct specified refactorings. However, even with such tools, it remains challenging for developers to decide where and what kind of refactorings should be applied. Recent advances in deep learning, especially in large language models (LLMs), make it potentially feasible to refactor source code automatically with LLMs. However, it remains unclear how well LLMs perform compared to human experts in conducting refactorings automatically and accurately. To fill this gap, we conduct an empirical study to investigate the potential of LLMs in automated software refactoring, focusing on the identification of refactoring opportunities and the recommendation of refactoring solutions. We first construct a high-quality refactoring dataset comprising 180 real-world refactorings from 20 projects, and conduct the empirical study on this dataset. With the to-be-refactored Java documents as input, ChatGPT and Gemini identified only 28 and 7, respectively, of the 180 refactoring opportunities. These results suggest that the performance of LLMs in identifying refactoring opportunities is generally low and that the task remains an open problem. However, explaining the expected refactoring subcategories and narrowing the search space in the prompts substantially increased the success rate of ChatGPT from 15.6% to 86.7%. Concerning the recommendation of refactoring solutions, ChatGPT recommended 176 refactoring solutions for the 180 refactorings, and 63.6% of the recommended solutions were comparable to (or even better than) those constructed by human experts. However, 13 of the 176 solutions suggested by ChatGPT and 9 of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors, which indicates the risk of LLM-based refactoring.
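
The rates quoted in the abstract follow from its raw counts. As a rough check, the Python sketch below recomputes them. The helper name `percentage` and the variable names are illustrative, not taken from the paper, and the count derived from the 86.7% figure is only an approximation, since the abstract reports that result as a percentage alone.

```python
def percentage(hits: int, total: int) -> float:
    """Return hits/total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Size of the dataset reported in the abstract.
TOTAL_REFACTORINGS = 180

# Refactoring opportunities identified with only the Java documents as input.
chatgpt_identified = percentage(28, TOTAL_REFACTORINGS)
gemini_identified = percentage(7, TOTAL_REFACTORINGS)

# Share of recommended solutions that were unsafe.
chatgpt_unsafe = percentage(13, 176)
gemini_unsafe = percentage(9, 137)

# Assumption: approximate number of opportunities behind the 86.7% success
# rate; the abstract does not state this count directly.
chatgpt_refined_hits = round(0.867 * TOTAL_REFACTORINGS)

print(f"ChatGPT identification rate: {chatgpt_identified}%")
print(f"Gemini identification rate:  {gemini_identified}%")
print(f"ChatGPT unsafe solutions:    {chatgpt_unsafe}%")
print(f"Gemini unsafe solutions:     {gemini_unsafe}%")
print(f"Implied hits at 86.7%:       about {chatgpt_refined_hits}")
```

ChatGPT's 28 of 180 reproduces the 15.6% baseline stated in the abstract. Gemini's 7 of 180 works out to about 3.9%, and the unsafe shares come to roughly 7.4% for ChatGPT and 6.6% for Gemini.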

Data Availability

The replication package, including the tools and the data, is publicly available (Liu 2024b).

References

  • Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: GPT-4 technical report (2023). https://arxiv.org/abs/2303.08774

  • Alon, U., Brody, S., Levy, O., Yahav, E.: code2seq: generating sequences from structured representations of code. In: Proceedings of the 7th International Conference on Learning Representations (ICLR’19). OpenReview, New Orleans, LA, USA (2019)

  • Alizadeh, V., Kessentini, M.: Reducing interactive refactoring effort via clustering-based multi-objective search. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE’18), pp. 464–474. ACM, Montpellier (2018). https://doi.org/10.1145/3238147.3238217

  • Alizadeh, V., Kessentini, M., Mkaouer, M.W., Cinnéide, M.Ó., Ouni, A., Cai, Y.: An interactive and dynamic search-based approach to software refactoring recommendations. IEEE Trans. Softw. Eng. 46(9), 932–961 (2020). https://doi.org/10.1109/TSE.2018.2872711

  • Akın, F.K.: Awesome ChatGPT prompts (2024). https://github.com/f/awesome-chatgpt-prompts

  • AlOmar, E.A., Mkaouer, M.W., Ouni, A.: Behind the intent of extract method refactoring: a systematic literature review. IEEE Trans. Softw. Eng. (2024). https://doi.org/10.1109/TSE.2023.3345800

  • Alizadeh, V., Ouali, M.A., Kessentini, M., Chater, M.: RefBot: intelligent software refactoring bot. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19), pp. 823–834. IEEE, San Diego, CA, USA (2019). https://doi.org/10.1109/ASE.2019.00081

  • AlOmar, E.A., Venkatakrishnan, A., Mkaouer, M.W., Newman, C.D., Ouni, A.: How to refactor this code? An exploratory study on developer-ChatGPT refactoring conversations. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 202–206. IEEE (2024). https://doi.org/10.1145/3643991.3645081

  • Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. In: Proceedings of the ACM on Programming Languages 3(POPL), pp. 1–29 (2019). https://doi.org/10.1145/3291636

  • Baqais, A.A.B., Alshayeb, M.: Automatic software refactoring: a systematic literature review. Softw. Qual. J. 28(2), 459–502 (2020). https://doi.org/10.1007/s11219-019-09477-y

  • Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., et al.: A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (2023). arXiv:2302.04023

  • Bavota, G., De Lucia, A., Di Penta, M., Oliveto, R., Palomba, F.: An experimental investigation on the innate relationship between quality and refactoring. J. Syst. Softw. 107, 1–14 (2015). https://doi.org/10.1016/j.jss.2015.05.024

  • Bavota, G., De Lucia, A., Oliveto, R.: Identifying extract class refactoring opportunities using structural and semantic cohesion measures. J. Syst. Softw. 84(3), 397–414 (2011). https://doi.org/10.1016/j.jss.2010.11.918

  • Barbez, A., Khomh, F., Guéhéneuc, Y.-G.: Deep learning anti-patterns from code metrics history. In: Proceedings of the 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME’19), pp. 114–124. IEEE, Cleveland, OH, USA (2019). https://doi.org/10.1109/ICSME.2019.00021

  • Bavota, G., Oliveto, R., De Lucia, A., Antoniol, G., Guéhéneuc, Y.-G.: Playing with refactoring: identifying extract class opportunities through game theory. In: Proceedings of the 2010 IEEE International Conference on Software Maintenance (ICSM’10), pp. 1–5. IEEE (2010). https://doi.org/10.1109/ICSM.2010.5609739

  • Charalampidou, S., Ampatzoglou, A., Chatzigeorgiou, A., Gkortzis, A., Avgeriou, P.: Identifying extract method refactoring opportunities based on functional relevance. IEEE Trans. Softw. Eng. 43(10), 954–974 (2017). https://doi.org/10.1109/TSE.2016.2645572

  • Chouchen, M., Bessghaier, N., Begoug, M., Ouni, A., AlOmar, E.A., Mkaouer, M.W.: How do software developers use ChatGPT? An exploratory study on GitHub pull requests. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 212–216. IEEE (2024). https://doi.org/10.1145/3643991.3645084

  • Chang, S., Fosler-Lussier, E.: How to prompt LLMs for text-to-SQL: a study in zero-shot, single-domain, and cross-domain settings (2023). arXiv:2305.11853

  • Chen, T., Jiang, Y., Fan, F., Liu, B., Liu, H.: A position-aware approach to decomposing god classes. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE’24), pp. 129–140. IEEE (2024). https://doi.org/10.1145/3691620.3694992

  • Cui, D., Wang, S., Luo, Y., Li, X., Dai, J., Wang, L., Li, Q.: RMove: recommending move method refactoring opportunities using structural and semantic representations of code. In: Proceedings of the 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME’22), pp. 281–292. IEEE, Limassol, Cyprus (2022). https://doi.org/10.1109/ICSME55016.2022.00033

  • Dair.AI: Prompt Engineering Guide (2024). https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/guides/prompts-intro.md

  • Dawes, J.: Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. Int. J. Mark. Res. 50(1), 61–104 (2008). https://doi.org/10.1177/147078530805000106

  • Dilhara, M., Bellur, A., Bryksin, T., Dig, D.: Unprecedented code change automation: the fusion of LLMs and transformation by example. In: Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE’24), pp. 631–653. ACM, Porto de Galinhas, Brazil (2024). https://doi.org/10.1145/3643755

  • Desai, U., Bandyopadhyay, S., Tamilselvam, S.: Graph neural network to dilute outliers for refactoring monolith application. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI’21), vol. 35, pp. 72–80 (2021). https://doi.org/10.1609/aaai.v35i1.16079

  • Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19), pp. 4171–4186. ACL, Minneapolis, MN, USA (2019). https://doi.org/10.18653/V1/N19-1423

  • Dig, D., Comertoglu, C., Marinov, D., Johnson, R.: Automated detection of refactorings in evolving components. In: Proceedings of the 20th European Conference on Object-Oriented Programming (ECOOP’06), pp. 404–428. Springer, Nantes, France (2006). https://doi.org/10.1007/11785477_24

  • Deo, S., Hinge, D., Chavan, O.S., Wang, Y.O., Mkaouer, M.W.: Analyzing developer-ChatGPT conversations for software refactoring: an exploratory study. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 207–211. IEEE (2024). https://doi.org/10.1145/3643991.3645082

  • Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945). https://doi.org/10.2307/1932409

  • DePalma, K., Miminoshvili, I., Henselder, C., Moss, K., AlOmar, E.A.: Exploring ChatGPT’s code refactoring capabilities: an empirical study. Expert Syst. Appl. 249, 1–26 (2024). https://doi.org/10.1016/j.eswa.2024.123602

  • Farmmamba: Hadoop HDFS (2024). https://issues.apache.org/jira/browse/HDFS-17322

  • Fontana, F.A., Caracciolo, A., Zanoni, M.: DPB: A benchmark for design pattern detection tools. In: Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR’12), pp. 235–244. IEEE, Szeged, Hungary (2012). https://doi.org/10.1109/CSMR.2012.32

  • Fulop, L.J., Ferenc, R., Gyimóthy, T.: Towards a benchmark for evaluating design pattern miner tools. In: Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR’08), pp. 143–152. IEEE, Athens, Greece (2008). https://doi.org/10.1109/CSMR.2008.4493309

  • Foster, S.R., Griswold, W.G., Lerner, S.: WitchDoctor: IDE support for real-time auto-completion of refactorings. In: Proceedings of the 34th International Conference on Software Engineering (ICSE’12), pp. 222–232. IEEE, Zurich, Switzerland (2012). https://doi.org/10.1109/ICSE.2012.6227191

  • Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971). https://doi.org/10.1037/h0031619

  • Falleri, J.-R., Morandat, F., Blanc, X., Martinez, M., Monperrus, M.: Fine-grained and accurate source code differencing. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE’14), pp. 313–324. ACM, Vasteras, Sweden (2014). https://doi.org/10.1145/2642937.2642982

  • Feitelson, D.G., Mizrahi, A., Noy, N., Shabat, A.B., Eliyahu, O., Sheffer, R.: How developers choose names. IEEE Trans. Softw. Eng. 48(1), 37–52 (2022). https://doi.org/10.1109/TSE.2020.2976920

  • Fowler, M.: Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston (1999)

  • Fokaefs, M., Tsantalis, N., Chatzigeorgiou, A.: JDeodorant: identification and removal of feature envy bad smells. In: Proceedings of the 23rd IEEE International Conference on Software Maintenance (ICSM’07), pp. 519–520. IEEE, Paris, France (2007). https://doi.org/10.1109/ICSM.2007.4362679

  • Fluri, B., Wursch, M., Pinzger, M., Gall, H.: Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng. 33(11), 725–743 (2007). https://doi.org/10.1109/TSE.2007.70731

  • Grund, F., Chowdhury, S.A., Bradley, N.C., Hall, B., Holmes, R.: CodeShovel: constructing method-level source code histories. In: Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE’21), pp. 1510–1522. IEEE, Madrid, Spain (2021). https://doi.org/10.1109/ICSE43902.2021.00135

  • Guo, Q., Cao, J., Xie, X., Liu, S., Li, X., Chen, B., Peng, X.: Exploring the potential of ChatGPT in automated code refinement: an empirical study. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE’24), pp. 1–13. ACM, Lisbon, Portugal (2024). https://doi.org/10.1145/3597503.3623306

  • Ge, X., DuBose, Q.L., Murphy-Hill, E.: Reconciling manual and automatic refactoring. In: Proceedings of the 34th International Conference on Software Engineering (ICSE’12), pp. 211–221. IEEE, Zurich, Switzerland (2012). https://doi.org/10.1109/ICSE.2012.6227192

  • Grissom, R.J., Kim, J.J.: Effect Sizes for Research: A Broad Practical Approach. Lawrence Erlbaum Associates, Hillsdale, NJ (2005)

  • Google: Gemini Model (2024). https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models

  • JetBrains: IntelliJ IDEA Refactoring Menu (2024). https://www.jetbrains.com/help/idea/refactoring-source-code.html

  • Kniesel, G., Binun, A.: Standing on the shoulders of giants—a data fusion approach to design pattern detection. In: Proceedings of the 17th IEEE International Conference on Program Comprehension (ICPC’09), pp. 208–217. IEEE, Vancouver, BC, Canada (2009). https://doi.org/10.1109/ICPC.2009.5090044

  • Kim, M., Gee, M., Loh, A., Rachatasumrit, N.: Ref-Finder: A refactoring reconstruction tool based on logic query templates. In: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’10), pp. 371–372. ACM, Santa Fe, NM, USA (2010). https://doi.org/10.1145/1882291.1882353

  • Kurbatova, Z., Veselov, I., Golubev, Y., Bryksin, T.: Recommendation of move method refactoring using path-based representation of code. In: Proceedings of the 42nd IEEE/ACM International Conference on Software Engineering Workshops (IWoR’20), pp. 315–322. ACM, Seoul, Republic of Korea (2020). https://doi.org/10.1145/3387940.3392191

  • Liu, H., Guo, X., Shao, W.: Monitor-based instant software refactoring. IEEE Trans. Softw. Eng. 39(8), 1112–1126 (2013). https://doi.org/10.1109/TSE.2013.4

  • Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He, H., Li, A., He, M., Liu, Z., et al.: Summary of ChatGPT-related research and perspective towards the future of large language models. Meta-Radiology 1(2), 1–14 (2023). https://doi.org/10.1016/j.metrad.2023.100017

  • Liu, B.: ReExtractor (2024). https://github.com/lyoubo/ReExtractor

  • Liu, B.: Replication Package (2024). https://github.com/bitselab/LLM4Refactoring

  • Liu, H., Jin, J., Xu, Z., Zou, Y., Bu, Y., Zhang, L.: Deep learning based code smell detection. IEEE Trans. Softw. Eng. 47(9), 1811–1837 (2021). https://doi.org/10.1109/TSE.2019.2936376

  • Liu, K., Kim, D., Bissyandé, T.F., Kim, T., Kim, K., Koyuncu, A., Kim, S., Le Traon, Y.: Learning to spot and refactor inconsistent method names. In: Proceedings of the 41st International Conference on Software Engineering (ICSE’19), pp. 1–12. IEEE, Montreal, QC, Canada (2019). https://doi.org/10.1109/ICSE.2019.00019

  • Liu, B., Liu, H., Li, G., Niu, N., Xu, Z., Wang, Y., Xia, Y., Zhang, Y., Jiang, Y.: Deep learning based feature envy detection boosted by real-world examples. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’23), pp. 908–920. ACM, San Francisco, CA, USA (2023). https://doi.org/10.1145/3611643.3616353

  • Liu, H., Liu, Q., Liu, Y., Wang, Z.: Identifying renaming opportunities by expanding conducted rename refactorings. IEEE Trans. Softw. Eng. 41(9), 887–900 (2015). https://doi.org/10.1109/TSE.2015.2427831

  • Liu, B., Liu, H., Niu, N., Zhang, Y., Li, G., Jiang, Y.: Automated software entity matching between successive versions. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE’23), pp. 1615–1627. IEEE, Luxembourg, Luxembourg (2023). https://doi.org/10.1109/ASE56229.2023.00132

  • Lacerda, G., Petrillo, F., Pimenta, M., Guéhéneuc, Y.G.: Code smells and refactoring: a tertiary systematic review of challenges and observations. J. Syst. Softw. 167, 110610 (2020). https://doi.org/10.1016/j.jss.2020.110610

  • Liu, H., Wang, Y., Wei, Z., Xu, Y., Wang, J., Li, H., Ji, R.: RefBERT: a two-stage pre-trained framework for automatic rename refactoring. In: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’23), pp. 740–752. ACM, Seattle, WA, USA (2023). https://doi.org/10.1145/3597926.3598092

  • Murphy-Hill, E., Parnin, C., Black, A.P.: How we refactor, and how we know it. IEEE Trans. Softw. Eng. 38(1), 5–18 (2012). https://doi.org/10.1109/TSE.2011.41

  • Mkaouer, M.W., Kessentini, M., Cinnéide, M.Ó., Hayashi, S., Deb, K.: A robust multi-objective approach to balance severity and importance of refactoring opportunities. Empir. Softw. Eng. 22, 894–927 (2017). https://doi.org/10.1007/s10664-016-9426-8

  • Minna, F., Massacci, F., Tuma, K.: Analyzing and Mitigating (with LLMs) the Security Misconfigurations of Helm Charts from Artifact Hub (2024). https://arxiv.org/abs/2403.09537

  • Mu, F., Shi, L., Wang, S., Yu, Z., Zhang, B., Wang, C., Liu, S., Wang, Q.: ClarifyGPT: empowering LLM-based code generation with intention clarification (2023). https://arxiv.org/abs/2310.10996

  • Mens, T., Tourwé, T.: A survey of software refactoring. IEEE Trans. Softw. Eng. 30(2), 126–139 (2004). https://doi.org/10.1109/TSE.2004.1265817

  • Negara, S., Chen, N., Vakilian, M., Johnson, R.E., Dig, D.: A comparative study of manual and automated refactorings. In: Proceedings of the 27th European Conference on Object-Oriented Programming (ECOOP’13), pp. 552–576. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-39038-8_23

  • OpenAI: ChatGPT (2024). https://openai.com/index/chatgpt

  • OpenAI: GPT-4 Model (2024). https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo

  • Pomian, D., Bellur, A., Dilhara, M., Kurbatova, Z., Bogomolov, E., Bryksin, T., Dig, D.: Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring (2024). arXiv:2401.15298

  • Pomian, D., Bellur, A., Dilhara, M., Kurbatova, Z., Bogomolov, E., Sokolov, A., Bryksin, T., Dig, D.: EM-Assist: safe automated extract method refactoring with LLMs. In: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE’24), pp. 582–586. ACM (2024). https://doi.org/10.1145/3663529.3663803

  • PMD: PMD (2024). https://github.com/pmd/pmd

  • Peruma, A., Mkaouer, M.W., Decker, M.J., Newman, C.D.: An empirical investigation of how and why developers rename identifiers. In: Proceedings of the 2nd International Workshop on Refactoring (IWoR’18), pp. 26–33. ACM (2018). https://doi.org/10.1145/3242163.3242169

  • Prete, K., Rachatasumrit, N., Sudan, N., Kim, M.: Template-based reconstruction of complex refactorings. In: Proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM’10), pp. 1–10. IEEE, Timisoara, Romania (2010). https://doi.org/10.1109/ICSM.2010.5609577

  • Peruma, A., Simmons, S., AlOmar, E.A., Newman, C.D., Mkaouer, M.W., Ouni, A.: How do I refactor this? An empirical study on refactoring trends and topics in Stack Overflow. Empir. Softw. Eng. 27(11), 1–43 (2022). https://doi.org/10.1007/s10664-021-10045-x

  • Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(1), 5485–5551 (2020)

  • Silva, D., Silva, J.P., Santos, G., Terra, R., Valente, M.T.: RefDiff 2.0: A multi-language refactoring detection tool. IEEE Trans. Softw. Eng. 47(12), 2786–2802 (2020). https://doi.org/10.1109/TSE.2020.2968072

  • Shirafuji, A., Oda, Y., Suzuki, J., Morishita, M., Watanobe, Y.: Refactoring programs using large language models with few-shot examples. In: Proceedings of the 30th Asia-Pacific Software Engineering Conference (APSEC’23), pp. 151–160. IEEE (2023). https://doi.org/10.1109/APSEC60848.2023.00025

  • Silva, D., Tsantalis, N., Valente, M.T.: Why we refactor? Confessions of GitHub contributors. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’16), pp. 858–870. ACM, Seattle, WA, USA (2016). https://doi.org/10.1145/2950290.2950305

  • Silva, D., Valente, M.T.: RefDiff: detecting refactorings in version histories. In: Proceedings of the 14th International Conference on Mining Software Repositories (MSR’17), pp. 269–279. IEEE (2017). https://doi.org/10.1109/MSR.2017.14

  • Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., et al.: Gemini: a family of highly capable multimodal models (2023). arXiv:2312.11805

  • Tate, R.F.: Correlation between a discrete and a continuous variable, point-biserial correlation. Ann. Math. Stat. 25(3), 603–607 (1954). https://doi.org/10.1214/aoms/1177728730

  • Tsantalis, N., Chatzigeorgiou, A.: Identification of move method refactoring opportunities. IEEE Trans. Softw. Eng. 35(3), 347–367 (2009). https://doi.org/10.1109/TSE.2009.1

  • Tsantalis, N., Chatzigeorgiou, A.: Identification of extract method refactoring opportunities for the decomposition of methods. J. Syst. Softw. 84(10), 1757–1782 (2011). https://doi.org/10.1016/j.jss.2011.05.016

  • Tsantalis, N., Chaikalis, T., Chatzigeorgiou, A.: JDeodorant: Identification and removal of type-checking bad smells. In: Proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR’08), pp. 329–331. IEEE, Athens, Greece (2008). https://doi.org/10.1109/CSMR.2008.4493342

  • Tsantalis, N., Ketkar, A., Dig, D.: RefactoringMiner 2.0. IEEE Trans. Softw. Eng. 48(3), 930–950 (2022). https://doi.org/10.1109/TSE.2020.3007722

  • Tourwé, T., Mens, T.: Identifying refactoring opportunities using logic meta programming. In: Proceedings of the 7th European Conference on Software Maintenance and Reengineering (CSMR’03), pp. 91–100. IEEE, Benevento, Italy (2003). https://doi.org/10.1109/CSMR.2003.1192416

  • Tsantalis, N., Mansouri, M., Eshkevari, L.M., Mazinanian, D., Dig, D.: Accurate and efficient refactoring detection in commit history. In: Proceedings of the 40th International Conference on Software Engineering (ICSE’18), pp. 483–494. ACM, Gothenburg, Sweden (2018). https://doi.org/10.1145/3180155.3180206

  • Tufano, R., Mastropaolo, A., Pepe, F., Dabić, O., Di Penta, M., Bavota, G.: Unveiling ChatGPT’s usage in open source projects: A mining-based study. In: Proceedings of the 21st International Conference on Mining Software Repositories (MSR’24), pp. 571–583. IEEE (2024). https://doi.org/10.1145/3643991.3644918

  • Tufano, M., Pantiuchina, J., Watson, C., Bavota, G., Poshyvanyk, D.: On learning meaningful code changes via neural machine translation. In: Proceedings of the 41st International Conference on Software Engineering (ICSE’19), pp. 25–36. IEEE, Montreal, QC, Canada (2019). https://doi.org/10.1109/ICSE.2019.00021

  • Vitale, A., Piantadosi, V., Scalabrino, S., Oliveto, R.: Using deep learning to automatically improve code readability. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE’23), pp. 573–584. IEEE, Luxembourg, Luxembourg (2023). https://doi.org/10.1109/ASE56229.2023.00112

  • White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT (2023). https://arxiv.org/abs/2302.11382

  • White, J., Hays, S., Fu, Q., Spencer-Smith, J., Schmidt, D.C.: ChatGPT prompt patterns for improving code quality, refactoring, requirements elicitation, and software design, pp. 71–108 (2024). https://doi.org/10.1007/978-3-031-55642-5_4

  • Wilcoxon, F.: Individual comparisons by ranking methods. Int. Biomet. Soc. 1(6), 80–83 (1945). https://doi.org/10.2307/3001968

  • Wu, Y., Li, Z., Zhang, J.M., Papadakis, M., Harman, M., Liu, Y.: Large language models in fault localisation (2023). https://arxiv.org/abs/2308.15276

  • Xing, Z., Stroulia, E.: The JDEvAn tool suite in support of object-oriented evolutionary development. In: Companion of the 30th International Conference on Software Engineering (ICSE Companion’08), pp. 951–952. ACM, Leipzig, Germany (2008). https://doi.org/10.1145/1370175.1370203

  • Xia, C.S., Zhang, L.: Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT (2023). arXiv:2304.00385

  • Yaron: TestMe (2024). https://github.com/wrdv/testme-idea

  • Yamashita, A., Moonen, L.: Do developers care about code smells? An exploratory survey. In: Proceedings of the 20th Working Conference on Reverse Engineering (WCRE’13), pp. 242–251. IEEE (2013). https://doi.org/10.1109/WCRE.2013.6671299

  • Zhang, J., Luo, J., Liang, J., Gong, L., Huang, Z.: An accurate identifier renaming prediction and suggestion approach. ACM Trans. Softw. Eng. Methodol. 32(6), 1–51 (2023). https://doi.org/10.1145/3603109

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (62232003 and 62172037), the China National Postdoctoral Program for Innovative Talents (BX20240008), and the CCF-Huawei Populus Grove Fund (CCF-HuaweiSE202411).

Author information

Authors and Affiliations

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China

    Bo Liu, Yanjie Jiang, Yuxia Zhang & Hui Liu

  2. Key Laboratory of High Confidence Software Technologies, Ministry of Education, School of Computer Science, Peking University, Beijing, 100871, China

    Yanjie Jiang

  3. Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, 45221, USA

    Nan Niu

  4. National Innovation Institute of Defense Technology, Beijing, 100071, China

    Guangjie Li

Corresponding authors

Correspondence to Yanjie Jiang or Hui Liu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, B., Jiang, Y., Zhang, Y. et al. Exploring the potential of general purpose LLMs in automated software refactoring: an empirical study. Autom Softw Eng 32, 26 (2025). https://doi.org/10.1007/s10515-025-00500-0
