mizchi/similarity
High-performance code similarity detection tools written in Rust. Detects duplicate functions and similar code patterns across your codebase in multiple programming languages.
Tool | Language | Status | Description |
---|---|---|---|
similarity-ts | TypeScript/JavaScript | ✅ Production Ready | Most mature and production-tested |
similarity-py | Python | Not production-tested yet | |
similarity-rs | Rust | Not production-tested yet | |
similarity-elixir | Elixir | 🧪 Experimental | Early development stage |
similarity-generic | Go, Java, C/C++, C#, Ruby | 🧪 Experimental | Early development stage |
similarity-md | Markdown | 🧪 Experimental | Early development stage |
- Zero configuration - works out of the box
- Multi-language support - TypeScript/JavaScript, Python, and Rust
- Fast & Accurate - AST-based comparison, not just text matching
- AI-friendly output - Easy to share with Claude, GPT-4, etc.
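To illustrate why AST-based comparison finds duplicates that text matching misses, here is a minimal Python sketch using the standard `ast` module. It is not how the similarity tools are implemented: they score partial similarity, while this sketch only checks whether two functions have identical structure once identifiers are ignored. The sample functions and the `structure` helper are hypothetical.

```python
import ast


class _Normalize(ast.NodeTransformer):
    """Replace every identifier with a placeholder so only structure remains."""

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # also normalize arguments and body
        return node

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Name(self, node):
        node.id = "_"
        return node


def structure(source: str) -> str:
    """Return a string describing the AST shape of `source`, names erased."""
    tree = _Normalize().visit(ast.parse(source))
    return ast.dump(tree)


# Two hypothetical functions: different text, same shape.
a = (
    "def calculate_total(items):\n"
    "    total = 0\n"
    "    for item in items:\n"
    "        total += item\n"
    "    return total\n"
)
b = (
    "def compute_sum(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)

same_text = a == b                             # plain text comparison
same_structure = structure(a) == structure(b)  # structural comparison
```

Here `same_text` is false while `same_structure` is true, which is the kind of duplicate a purely textual diff would not report.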
    cargo install similarity-ts

    # Scan current directory
    similarity-ts .

    # Scan specific files
    similarity-ts src/utils.ts src/helpers.ts

    # Show actual code
    similarity-ts . --print
Copy the output and use this prompt with Claude:
Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.
Example output:
    Duplicates in src/utils.ts:
    ────────────────────────────────────────────────────────────
      src/utils.ts:10-20 calculateTotal <-> src/helpers.ts:5-15 computeSum
      Similarity: 92.50%, Score: 9.2 points
The AI will analyze patterns and suggest refactoring strategies.
- AI Assistant Guide - Refactoring workflow and best practices
- similarity-ts - TypeScript/JavaScript similarity detection ✅ Most mature and production-tested
- similarity-py - Python similarity detection ⚠️ Not production-tested
- similarity-rs - Rust similarity detection ⚠️ Not production-tested
- similarity-elixir - Elixir similarity detection 🧪 Experimental
- similarity-generic - Generic similarity detection for Go, Java, C/C++, C#, Ruby 🧪 Experimental
- similarity-md - Markdown similarity detection 🧪 Experimental
    # Install from crates.io
    cargo install similarity-ts
    # Use the installed binary
    similarity-ts --help

    # Install from crates.io
    cargo install similarity-py
    # Use the installed binary
    similarity-py --help

    # Install from crates.io
    cargo install similarity-rs
    # Use the installed binary
    similarity-rs --help

    # Install from crates.io
    cargo install similarity-elixir
    # Use the installed binary
    similarity-elixir --help

    # Install from crates.io
    cargo install similarity-generic
    # Use the installed binary
    similarity-generic --language go main.go
    similarity-generic --language java Main.java

    # Clone the repository
    git clone https://github.com/mizchi/similarity.git
    cd similarity

    # Build all tools
    cargo build --release

    # Or install specific tool
    cargo install --path crates/similarity-ts
    cargo install --path crates/similarity-py
    cargo install --path crates/similarity-rs
- `--threshold` / `-t` - Similarity threshold (0.0-1.0, default: 0.85)
- `--min-lines` / `-m` - Minimum lines for functions (default: 3-5)
- `--min-tokens` - Minimum AST nodes for functions
- `--print` / `-p` - Print code in output
- `--cross-file` / `-c` - Enable cross-file comparison
- `--no-size-penalty` - Disable size difference penalty
    # Check for duplicate functions (default)
    similarity-ts ./src

    # Enable type checking (experimental)
    similarity-ts ./src --experimental-types

    # Check types only
    similarity-ts ./src --no-functions --experimental-types

    # Fast mode with bloom filter (default)
    similarity-ts ./src --no-fast  # disable

    # Check Python files
    similarity-py ./src

    # Include test files
    similarity-py . --extensions py,test.py

    # Check Rust files
    similarity-rs ./src

    # Skip test functions (test_ prefix or #[test])
    similarity-rs . --skip-test

    # Set minimum tokens (default: 30)
    similarity-rs . --min-tokens 50
The tool outputs in a VSCode-compatible format for easy navigation:
    Duplicates in src/utils.ts:
    ────────────────────────────────────────────────────────────
      src/utils.ts:10 | L10-15 similar-function: calculateSum
      src/utils.ts:20 | L20-25 similar-function: addNumbers
      Similarity: 85.00%, Priority: 8.5 (lines: 10)
Click on the file paths in VSCode's terminal to jump directly to the code.
Results are sorted by priority (lines × similarity) to help you focus on the most impactful duplications first.
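The priority ordering can be sketched in a few lines of Python. This is only an illustration of the "lines × similarity" rule stated above, not the tools' actual code; the `Match` class and all sample function names except `calculateSum` and `addNumbers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One reported pair of similar functions (hypothetical structure)."""
    left: str
    right: str
    lines: int          # size of the duplicated function in lines
    similarity: float   # 0.0-1.0

    @property
    def priority(self) -> float:
        # Priority as described above: lines multiplied by similarity.
        return self.lines * self.similarity


matches = [
    Match("calculateSum", "addNumbers", lines=10, similarity=0.85),
    Match("parseHeader", "readHeader", lines=4, similarity=0.98),
    Match("renderRow", "renderCell", lines=30, similarity=0.80),
]

# Highest priority first, so the most impactful duplication leads the report.
ranked = sorted(matches, key=lambda m: m.priority, reverse=True)
```

With these sample values the long, moderately similar pair ranks ahead of the short, nearly identical one, which is the point of weighting similarity by size.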
For AI assistants (like Claude, GPT-4, etc.) to help with code deduplication:
Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.
1. Run similarity detection: `similarity-ts . --threshold 0.8 --min-lines 10`
2. Share output with AI: Copy the similarity report to your AI assistant
3. AI analyzes patterns: The AI will identify common patterns and suggest refactoring strategies
4. Iterative refinement: Adjust threshold and options based on AI recommendations
This tool can be integrated into:
- Pre-commit hooks to prevent duplicate code
- CI/CD pipelines for code quality checks
- IDE extensions for real-time duplicate detection
- AI-powered code review workflows
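As one possible shape for a pre-commit or CI check, the Python sketch below runs `similarity-ts` and fails when the report contains any `Similarity:` line. It assumes nothing about the tool's exit code and relies only on the report format shown earlier; the function names, the zero-duplicates policy, and the 0.9 threshold are illustrative assumptions.

```python
import re
import subprocess

# Matches report lines such as "Similarity: 85.00%, Priority: 8.5 (lines: 10)".
SIMILARITY_LINE = re.compile(r"Similarity:\s*(\d+(?:\.\d+)?)%")


def similarities(report: str) -> list[float]:
    """Return every similarity percentage found in a report."""
    return [float(m.group(1)) for m in SIMILARITY_LINE.finditer(report)]


def gate(report: str, max_allowed: int = 0) -> int:
    """Return an exit code: 0 if the report is acceptable, 1 otherwise."""
    return 0 if len(similarities(report)) <= max_allowed else 1


def run(path: str = ".", threshold: str = "0.9") -> int:
    """Run similarity-ts (must be on PATH) and gate on its standard output."""
    result = subprocess.run(
        ["similarity-ts", path, "--threshold", threshold],
        capture_output=True,
        text=True,
        check=False,  # the tool's exit code is deliberately not relied upon
    )
    return gate(result.stdout)
```

A hook script could end with `sys.exit(run())`; raising `max_allowed` lets an existing codebase adopt the check gradually.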
The `--experimental-overlap` flag enables detection of partial code overlaps within and across functions:
    # Basic overlap detection
    similarity-ts ./src --experimental-overlap

    # With custom parameters
    similarity-ts ./src --experimental-overlap \
      --threshold 0.75 \
      --overlap-min-window 8 \
      --overlap-max-window 25 \
      --overlap-size-tolerance 0.25
Parameters:
- `--experimental-overlap`: Enable overlap detection mode
- `--overlap-min-window`: Minimum AST nodes to consider (default: 8)
- `--overlap-max-window`: Maximum AST nodes to consider (default: 25)
- `--overlap-size-tolerance`: Size variation tolerance (default: 0.25)
Use Cases:
- Finding copy-pasted code fragments within larger functions
- Detecting similar algorithmic patterns across different contexts
- Identifying refactoring opportunities for common code blocks
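The README does not spell out how the window bounds and the size tolerance interact. One plausible reading, offered purely as an assumption, is that two code fragments are only compared when both fall inside the window range and their sizes (in AST nodes) differ by no more than the tolerance. The Python sketch below encodes that reading with the documented defaults; the `comparable` function is hypothetical and may not match the tool's behavior.

```python
def comparable(
    size_a: int,
    size_b: int,
    min_window: int = 8,      # --overlap-min-window default
    max_window: int = 25,     # --overlap-max-window default
    tolerance: float = 0.25,  # --overlap-size-tolerance default
) -> bool:
    """Assumed pre-check: are two fragments worth comparing at all?

    Sizes are counts of AST nodes. This is an interpretation of the
    parameters, not the tool's documented algorithm.
    """
    in_range = (
        min_window <= size_a <= max_window
        and min_window <= size_b <= max_window
    )
    if not in_range:
        return False
    # Relative size difference, measured against the larger fragment.
    return abs(size_a - size_b) <= tolerance * max(size_a, size_b)
```

Under this reading, fragments of 10 and 12 nodes would be compared, while fragments of 8 and 20 nodes would be skipped because their sizes differ too much.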
- TypeScript: Type similarity detection (interfaces, type aliases)
- Python: Class and method detection, decorator support
- Rust: Test function filtering, impl block analysis
    # Find duplicate functions
    similarity-ts ./src --threshold 0.7 --print

    # Find similar types across files
    similarity-ts ./src --no-functions --experimental-types --cross-file --print

    # Comprehensive analysis
    similarity-ts ./src \
      --threshold 0.8 \
      --min-lines 10 \
      --cross-file \
      --extensions ts,tsx

    # Detect partial code overlaps (Experimental)
    similarity-ts ./src --experimental-overlap --threshold 0.75 --print

    # Find duplicate functions in Python project
    similarity-py ./src --threshold 0.85 --print

    # Check with custom settings
    similarity-py . \
      --min-lines 5 \
      --extensions py

    # Find duplicates excluding tests
    similarity-rs ./src --skip-test --print

    # Strict checking with high token count
    similarity-rs . \
      --min-tokens 50 \
      --threshold 0.9 \
      --skip-test
⚠️ EXPERIMENTAL: The generic language support is in early development and may have limitations or bugs.
The `similarity-generic` tool provides experimental support for additional languages using tree-sitter parsers:
- Go
- Java
- C
- C++
- C#
- Ruby
- Elixir
    # From crates.io (when available)
    cargo install similarity-generic

    # From source
    cargo install --path crates/similarity-generic

    # Detect Go duplicates
    similarity-generic --language go ./src

    # Detect Java duplicates
    similarity-generic --language java ./src

    # Detect C/C++ duplicates
    similarity-generic --language c ./src
    similarity-generic --language cpp ./src

    # Detect C# duplicates
    similarity-generic --language csharp ./src

    # Detect Ruby duplicates
    similarity-generic --language ruby ./src

    # Detect Elixir duplicates
    similarity-generic --language elixir ./src

    # Common options work the same way
    similarity-generic --language go ./src --threshold 0.8 --print
Language | File Extensions | Status |
---|---|---|
Go | .go | Experimental |
Java | .java | Experimental |
C | .c, .h | Experimental |
C++ | .cpp, .cc, .cxx, .hpp, .h | Experimental |
C# | .cs | Experimental |
Ruby | .rb | Experimental |
You can also provide custom language configurations:
    # Use custom config file
    similarity-generic --config ./my-language.json ./src

See `examples/configs/custom-language-template.json` for the configuration format.
- Performance is slower than specialized tools (similarity-ts, similarity-py, similarity-rs)
- Detection accuracy may vary by language
- Some language-specific features may not be fully supported
- Custom configurations require understanding of tree-sitter node types
For production use, prefer the specialized tools when available.
- Written in Rust for maximum performance
- Concurrent file processing
- Memory-efficient algorithms
- Language-specific optimizations:
- TypeScript/JavaScript: Fast mode with bloom filters (~4x faster)
- Python/Rust: Tree-sitter based parsing
- Intelligent filtering reduces unnecessary comparisons
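The idea behind a Bloom-filter-style fast mode can be sketched as a cheap pre-filter: summarize each function as a small bit signature and skip the expensive comparison when two signatures share too few bits. The Python sketch below uses a single hash per token and a 64-bit signature; these choices, the function names, and the 0.5 cutoff are assumptions for illustration and are not taken from the similarity-ts implementation.

```python
import hashlib

BITS = 64  # signature width (assumed for this sketch)


def signature(tokens: list[str]) -> int:
    """Fold a function's tokens (e.g. AST node kinds) into a bit signature."""
    sig = 0
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % BITS
        sig |= 1 << position  # set one bit per token
    return sig


def may_be_similar(sig_a: int, sig_b: int, min_overlap: float = 0.5) -> bool:
    """Cheap pre-check; only pairs that pass need the full comparison."""
    union = bin(sig_a | sig_b).count("1")
    if union == 0:
        return True  # two empty signatures: nothing to rule out
    shared = bin(sig_a & sig_b).count("1")
    return shared / union >= min_overlap
```

Because different tokens can map to the same bit, a pre-filter like this can let dissimilar pairs through, and with a strict cutoff it could also drop genuine matches; that trade-off is presumably why the README offers `--no-fast` to disable fast mode.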
MIT