
Duplicate code

From Wikipedia, the free encyclopedia

Repeated fragment of computer source code

In computer programming, duplicate code is multiple occurrences of equivalent source code in a codebase. A duplicate code fragment is also known as a code clone, and the process of finding clones in source code is called clone detection. Duplicate code has multiple undesirable aspects.^[1]

Whether fragments are classified as duplicate can be subjective. Fragments that are very small – such as a single statement – are probably not classified as duplicate. Additionally, fragments that are not exactly the same text might be considered duplicate code if they match except for less important aspects such as whitespace, comments, and variable names. Even fragments that are only functionally equivalent may be classified as duplicate.
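As a minimal sketch of matching "except for whitespace", the C helper below compares two fragments while skipping whitespace characters. The names skip_space and same_ignoring_whitespace are hypothetical, and ignoring all whitespace is a simplifying assumption: it also merges adjacent tokens and changes string literals, which a comparison based on a tokenizer would avoid.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Advance past any whitespace characters. */
static const char *skip_space(const char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

/* Hypothetical helper: true if a and b are identical once all
   whitespace is ignored. Comments and identifiers are NOT normalized. */
bool same_ignoring_whitespace(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        a = skip_space(a);
        b = skip_space(b);
        if (*a != *b) {
            return false;   /* differing characters, or one fragment ended early */
        }
        if (*a == '\0') {
            return true;    /* both fragments ended together */
        }
        a++;
        b++;
    }
}
```

Under this sketch, "sum += a[i];" and "sum+=a[ i ] ;" compare as equal, while "sum += a[i];" and "sum -= a[i];" do not. Treating renamed variables as equal would need a further normalization step that is not shown here.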

Cost


Code that includes duplicate functionality is more difficult to maintain because, when it needs updating, there is a risk that only some of the duplicates will be updated, leaving the others as-is. When code with a vulnerability is duplicated, the vulnerability remains in the other copies even after it is fixed in one.^[2] Refactoring to eliminate duplicate code can improve many software metrics, such as lines of code, cyclomatic complexity, and coupling. This may lead to shorter compilation time, lower cognitive load, less human error, and fewer forgotten or overlooked pieces of code.

However, not all code duplication can be refactored.^[3] Clones may be the most effective solution if the programming language provides inadequate or overly complex abstractions, particularly if supported with user interface techniques such as simultaneous editing. Furthermore, the risk of breaking code when refactoring may outweigh the maintenance benefit.^[4] A study by Wagner, Abdulkhaleq, and Kaya concluded that although keeping duplicates in sync takes additional work, duplicated code did not cause significantly more faults than unduplicated code when the programmers involved were aware of the duplication.^[5]^[disputed]

Another cost is memory size, because each copy of duplicated code requires its own storage.

Emergence


Some practices that lead to duplicate code include:

Scrounging: Via copy and paste programming, a section of code is duplicated in the codebase instead of being factored into a reusable function.

Code snippets: Some development tools, such as LLMs,^[6] automate the process of inserting code snippets, which may be identical or functionally equivalent to code already in the codebase.

Coincidence: Similar code may be developed independently, although studies suggest that such code is typically not syntactically similar.^[7]

Generated: Automatically generated code may contain duplicates, but this may be done deliberately for runtime performance or ease of development.

Fixing


Figure: Example of fixing duplicate code by replacing it with a method call.

Duplicate code is most commonly eliminated by moving the code to a function and replacing each duplicate with a call to that function.

For example, the following code calculates the average of an array of integers.

    extern int array_a[4];
    extern int array_b[4];

    int sum_a = 0;
    for (int i = 0; i < 4; i++) {
        sum_a += array_a[i];
    }
    int average_a = sum_a / 4;

    int sum_b = 0;
    for (int i = 0; i < 4; i++) {
        sum_b += array_b[i];
    }
    int average_b = sum_b / 4;

The two loops can be rewritten as the function:

    int calc_average_of_four(int a[]) {
        int sum = 0;
        for (int i = 0; i < 4; i++) {
            sum += a[i];
        }
        return sum / 4;
    }

Using this function eliminates the duplicated code.

    extern int array_a[4];
    extern int array_b[4];

    int average_a = calc_average_of_four(array_a);
    int average_b = calc_average_of_four(array_b);

The compiler might inline the calls such that the resulting machine code is identical for both versions. If the function is not inlined, the additional overhead of the function calls makes the code take slightly longer to run.
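The extracted function above still hard-codes the array length. One possible generalization, sketched here under the assumption that callers can supply the length, takes it as a parameter so that arrays of other sizes do not need their own copy of the loop. The name calc_average and its signature are illustrative and not part of the original example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generalization of calc_average_of_four: the caller
   passes the element count n instead of relying on a fixed size. */
int calc_average(const int a[], size_t n)
{
    assert(a != NULL && n > 0);   /* an average of zero elements is undefined */

    long long sum = 0;            /* wider accumulator lowers the overflow risk */
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return (int)(sum / (long long)n);   /* integer division truncates toward zero */
}
```

With this version, a call such as calc_average(array_a, 4) would take the place of calc_average_of_four(array_a).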

Detecting


A number of algorithms have been proposed to detect duplicate code. For example:

Baker's algorithm^[8]
Rabin–Karp string search algorithm
Using abstract syntax trees^[9]
Visual clone detection^[10]
Count matrix clone detection^[11]^[12]
Locality-sensitive hashing
Anti-unification^[13]
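To illustrate one entry in the list, the following is a minimal sketch of Rabin–Karp string search in C, used here to look for a further occurrence of a known fragment. The function name rk_find, the constants RK_BASE and RK_MOD, and the choice to hash raw characters are all assumptions made for illustration. A practical clone detector would more likely hash normalized tokens or lines, and would index every window rather than search for one fragment at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RK_BASE 257u          /* illustrative hash base */
#define RK_MOD  1000000007u   /* illustrative prime modulus */

/* Returns the index of the first occurrence of pat in text at or
   after position start, or -1 if there is none. */
long rk_find(const char *text, const char *pat, size_t start)
{
    assert(text != NULL && pat != NULL);
    size_t n = strlen(text);
    size_t m = strlen(pat);
    if (m == 0 || start > n || n - start < m) {
        return -1;
    }

    uint64_t high = 1;        /* RK_BASE^(m-1) mod RK_MOD */
    for (size_t i = 1; i < m; i++) {
        high = (high * RK_BASE) % RK_MOD;
    }

    uint64_t hp = 0, ht = 0;  /* hashes of the pattern and the current window */
    for (size_t i = 0; i < m; i++) {
        hp = (hp * RK_BASE + (unsigned char)pat[i]) % RK_MOD;
        ht = (ht * RK_BASE + (unsigned char)text[start + i]) % RK_MOD;
    }

    for (size_t i = start; ; i++) {
        /* Equal hashes only suggest a match; memcmp rules out collisions. */
        if (hp == ht && memcmp(text + i, pat, m) == 0) {
            return (long)i;
        }
        if (i + m >= n) {
            return -1;        /* no further window fits in the text */
        }
        /* Roll the window: drop text[i], append text[i + m]. */
        uint64_t drop = ((unsigned char)text[i] * high) % RK_MOD;
        ht = (ht + RK_MOD - drop) % RK_MOD;
        ht = (ht * RK_BASE + (unsigned char)text[i + m]) % RK_MOD;
    }
}
```

Calling rk_find with start set just past a fragment's own position reports whether the same text appears again later in the source. The rolling update keeps the cost of moving the window constant, independent of the fragment length.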

References


[1] Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. Retrieved 2008-06-06.
[2] Li, Hongzhe; Kwon, Hyuckmin; Kwon, Jonghoon; Lee, Heejo (25 April 2016). "CLORIFI: software vulnerability discovery using code clone verification". Concurrency and Computation: Practice and Experience. 28 (6): 1900–1917. doi:10.1002/cpe.3532. S2CID 17363758.
[3] Arcelli Fontana, Francesca; Zanoni, Marco; Ranchetti, Andrea; Ranchetti, Davide (2013). "Software Clone Detection and Refactoring" (PDF). ISRN Software Engineering. 2013: 1–8. doi:10.1155/2013/129437.
[4] Kapser, C.; Godfrey, M.W. "'Cloning Considered Harmful' Considered Harmful". 13th Working Conference on Reverse Engineering (WCRE), pp. 19–28, Oct. 2006.
[5] Wagner, Stefan; Abdulkhaleq, Asim; Kaya, Kamer; Paar, Alexander (2016). "On the Relationship of Inconsistent Software Clones and Faults: An Empirical Study". 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). pp. 79–89. arXiv:1611.08005. doi:10.1109/SANER.2016.94. ISBN 978-1-5090-1855-0. S2CID 3154845.
[6] "Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation". https://arxiv.org/html/2504.12608v1
[7] Juergens, Elmar; Deissenboeck, Florian; Hummel, Benjamin. "Code similarities beyond copy & paste".
[8] Baker, Brenda S. "A Program for Identifying Duplicated Code". Computing Science and Statistics, 24: 49–57, 1992.
[9] Baxter, Ira D.; et al. "Clone Detection Using Abstract Syntax Trees".
[10] Rieger, Matthias; Ducasse, Stephane. "Visual Detection of Duplicated Code". Archived 2006-06-29 at the Wayback Machine.
[11] Yuan, Y.; Guo, Y. "CMCD: Count Matrix Based Code Clone Detection". 2011 18th Asia-Pacific Software Engineering Conference. IEEE, Dec. 2011, pp. 250–257.
[12] Chen, X.; Wang, A. Y.; Tempero, E. D. (2014). "A Replication and Reproduction of Code Clone Detection Studies". In ACSC (pp. 105–114).
[13] Bulychev, Peter; Minea, Marius. "Duplicate code detection using anti-unification". Proceedings of the Spring/Summer Young Researchers' Colloquium on Software Engineering. No. 2. Федеральное государственное бюджетное учреждение науки Институт системного программирования Российской академии наук, 2008.