Copy & Paste finder for structured text files.
tasleson/duplihere
Copy & Paste finder for source files or any structured utf-8 text files.
A number of different copy and paste detectors exist. Some examples include:
So why write another? I've wanted a simple tool, one that works like simian, but is open source and free for everyone. Thus this project was born. In general I think writing a lexer and tokenizing the source isn't needed. There is a ton of code that is very much copied and pasted verbatim. Developers are lazy, they don't change things :-)
duplihere - 0.9.0 - find duplicate text

usage: duplihere [-pj -l <number> -i <file name> -t <thread number>] -f <pattern or specific file>

Find duplicate lines of text in one or more text files.

The duplicated text can be at different levels of indention,
but otherwise needs to be identical.

More information: https://github.com/tasleson/duplihere

argument:                                   description
  -p, --print                               print duplicate text [default: false]
  -j, --json                                output JSON [default: false]
  -l, --lines <number>                      minimum number of duplicate lines [default: 6]
  -f, --file <pattern or specific file>     pattern or file eg. "**/*.[h|c]" recursive, "*.py", "file.ext", can repeat [required]
  -i, --ignore <file name>                  file containing hash values to ignore, one per line
  -t, --threads <thread number>             number of threads to utilize. Set to 0 to match #cpu cores [default: 4]
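To make the matching rule in the usage text concrete (text may differ in indentation but must otherwise be identical, and only runs of at least a minimum number of lines count), here is a minimal Python sketch. It is not duplihere's implementation. The function name `find_duplicate_windows` and the fixed-size sliding-window approach are assumptions chosen for illustration only.

```python
from collections import defaultdict


def find_duplicate_windows(lines, min_lines=6):
    """Return groups of 1-based starting line numbers whose next
    `min_lines` lines are identical once leading whitespace is removed.

    Illustrative only: hypothetical helper, not duplihere's algorithm.
    """
    # Ignore indentation by stripping leading whitespace from every line.
    normalized = [line.lstrip() for line in lines]

    windows = defaultdict(list)
    for start in range(len(normalized) - min_lines + 1):
        key = tuple(normalized[start:start + min_lines])
        if not any(key):
            # Skip windows made up entirely of blank lines.
            continue
        windows[key].append(start + 1)

    # Keep only windows that occur more than once.
    return [starts for starts in windows.values() if len(starts) > 1]


sample = [
    "def a():",
    "    x = 1",
    "    y = 2",
    "if True:",
    "        x = 1",
    "        y = 2",
]
print(find_duplicate_windows(sample, min_lines=2))  # [[2, 5]]
```

The two assignments at lines 2 and 5 sit at different indentation levels, yet they are reported as one duplicate group. This mirrors the behaviour the usage text describes, without claiming anything about how the tool hashes or merges matches internally.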
An example where we recurse into a directory for python files and also search a directory that contains python files ...
$ duplihere -l 10 -p -f '/home/user/somewhere/**/*.py' -f '/tmp/*.py'
An example showing JSON output (not finalized)
$ duplihere -f /home/tasleson/projects/linux/init/main.c -l 5 -j
{
  "num_lines": 5,
  "num_ignored": 0,
  "duplicates": [
    {
      "key": 11558319874972720381,
      "num_lines": 5,
      "files": [
        ["/home/tasleson/projects/linux/init/main.c", 830],
        ["/home/tasleson/projects/linux/init/main.c", 864]
      ]
    }
  ]
}
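Since the JSON output is meant for other programs to consume, a short Python sketch of reading it may help. It relies only on the fields visible in the sample above. The function name `summarize` is hypothetical, and treating the second element of each `files` entry as a starting line number is an assumption. The format is described as not finalized, so the field names may change.

```python
import json

# The sample report shown above, embedded so the sketch is self-contained.
SAMPLE = """
{"num_lines":5,"num_ignored":0,"duplicates":[
  {"key":11558319874972720381,"num_lines":5,"files":[
    ["/home/tasleson/projects/linux/init/main.c",830],
    ["/home/tasleson/projects/linux/init/main.c",864]]}]}
"""


def summarize(report_text):
    """Return one human-readable line per duplicate in a JSON report.

    Hypothetical helper: assumes each `files` entry is [path, line number].
    """
    report = json.loads(report_text)
    summary = []
    for dup in report["duplicates"]:
        locations = ", ".join(f"{path}:{line}" for path, line in dup["files"])
        summary.append(f'{dup["num_lines"]} lines (key {dup["key"]}): {locations}')
    return summary


for entry in summarize(SAMPLE):
    print(entry)
```

Python's `json` module parses the large `key` value as an exact integer. Consumers written in languages that read JSON numbers as 64-bit floats may need to handle that field differently.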
The tool has enough features and functionality for meaningful results. With the latest multi-thread support it's quite fast on big source trees. Current graph of memory and CPU consumption while examining the Linux kernel source tree for duplicates: run against the Linux 6.5 branch (~24M lines) using all available CPU cores. Chart generated with psrecord.