Copy & Paste finder for structured text files.


tasleson/duplihere


What

Copy & Paste finder for source files or any structured UTF-8 text files.

Why

A number of different copy and paste detectors exist. Some examples include:

So why write another? I've wanted a simple tool, one that works like simian but is open source and free for everyone. Thus this project was born. In general, I think writing a lexer and tokenizing the source isn't needed. There is a ton of code that is copied and pasted verbatim. Developers are lazy; they don't change things :-)

How

duplihere - 0.9.0 - find duplicate text

usage: duplihere [-pj -l <number> -i <file name> -t <thread number>] -f <pattern or specific file>

Find duplicate lines of text in one or more text files.

The duplicated text can be at different levels of indention,
but otherwise needs to be identical.

More information: https://github.com/tasleson/duplihere

argument:                                        description
    -p, --print                                  print duplicate text [default: false]
    -j, --json                                   output JSON [default: false]
    -l, --lines <number>                         minimum number of duplicate lines [default: 6]
    -f, --file <pattern or specific file>        pattern or file eg. "**/*.[h|c]" recursive, "*.py", "file.ext", can repeat [required]
    -i, --ignore <file name>                     file containing hash values to ignore, one per line
    -t, --threads <thread number>                number of threads to utilize. Set to 0 to match #cpu cores [default: 4]
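The matching rule in the usage text (at least --lines consecutive lines, identical apart from indentation) can be illustrated with a short Python sketch of the general technique. This is not duplihere's implementation: the function name find_duplicates, the input shape (a dict of path to file text), the use of SHA-1 and the choice to strip trailing as well as leading whitespace are all assumptions made for illustration. The sketch also does not merge overlapping windows into one longer match, which a real tool would likely do.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files, min_lines=6):
    """Return {window_hash: [(path, start_line), ...]} for every run of
    min_lines consecutive lines that occurs more than once.

    files maps a path to that file's full text (hypothetical input shape).
    """
    seen = defaultdict(list)
    for path, text in files.items():
        # Assumption: strip leading and trailing whitespace so a block
        # copied at a different indentation level still compares equal.
        lines = [line.strip() for line in text.splitlines()]
        # Slide a window of min_lines lines over the file and hash each one.
        for start in range(len(lines) - min_lines + 1):
            window = "\n".join(lines[start:start + min_lines])
            digest = hashlib.sha1(window.encode("utf-8")).hexdigest()
            seen[digest].append((path, start + 1))  # 1-based line number
    # Keep only windows seen at two or more locations (same file or not).
    return {h: locs for h, locs in seen.items() if len(locs) > 1}
```

For example, if a four-line function appears unindented at line 1 of a.py and indented inside a class starting at line 2 of b.py, find_duplicates(files, min_lines=4) reports one window at ("a.py", 1) and ("b.py", 2). No tokenizing is involved, which matches the "no lexer needed" reasoning above.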

An example that recurses into one directory looking for Python files and also checks the Python files in a second directory:

$ duplihere -l 10 -p -f '/home/user/somewhere/**/*.py' -f '/tmp/*.py'

An example showing JSON output (the format is not finalized):

$ duplihere -f /home/tasleson/projects/linux/init/main.c -l 5 -j
{
  "num_lines": 5,
  "num_ignored": 0,
  "duplicates": [
    {
      "key": 11558319874972720381,
      "num_lines": 5,
      "files": [
        [
          "/home/tasleson/projects/linux/init/main.c",
          830
        ],
        [
          "/home/tasleson/projects/linux/init/main.c",
          864
        ]
      ]
    }
  ]
}
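Because the JSON format is not finalized, any script that consumes it should be treated as provisional. The Python sketch below reads output shaped like the example above and lists each duplicate's locations. Two assumptions are labelled in the code: that the integer in each "files" entry is the starting line of the duplicate, and that the "key" values are the hashes the -i/--ignore file expects. The helper ignore_file_lines is hypothetical and simply writes those keys one per line.

```python
import json


def summarize(report_json):
    """Return (key, num_lines, [(path, line), ...]) for each reported duplicate."""
    report = json.loads(report_json)  # Python ints hold the 64-bit "key" exactly
    summary = []
    for dup in report["duplicates"]:
        # Assumption: each "files" entry is [path, starting line number].
        locations = [(path, line) for path, line in dup["files"]]
        summary.append((dup["key"], dup["num_lines"], locations))
    return summary


def ignore_file_lines(report_json):
    """Hypothetical helper: the reported keys, one per line.

    Assumption: these are the hash values the -i/--ignore file expects.
    """
    return "".join(f"{key}\n" for key, _, _ in summarize(report_json))
```

For instance, the standard output of duplihere -f '**/*.c' -l 5 -j could be passed to summarize() to print a report, or to ignore_file_lines() to record duplicates that have been reviewed and accepted.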

Status

The tool has enough features and functionality to produce meaningful results. With the latest multi-thread support, it is quite fast on big source trees. The chart below shows memory and CPU consumption while examining the Linux kernel source tree for duplicates, run against the Linux 6.5 branch (~24M lines) using all available CPU cores. The chart was generated with psrecord.

[Chart: memory and CPU consumption over the Linux 6.5 scan (image alt text: "threadripper")]
