mizchi/similarity

High-performance code similarity detection tools written in Rust. Detects duplicate functions and similar code patterns across your codebase in multiple programming languages.

Tool Maturity Status

Tool               | Language                 | Status          | Description
similarity-ts      | TypeScript/JavaScript    | Production Ready | Most mature and production-tested
similarity-py      | Python                   | ⚠️ Beta          | Not production-tested yet
similarity-rs      | Rust                     | ⚠️ Beta          | Not production-tested yet
similarity-elixir  | Elixir                   | 🧪 Experimental  | Early development stage
similarity-generic | Go, Java, C/C++, C#, Ruby | 🧪 Experimental  | Early development stage
similarity-md      | Markdown                 | 🧪 Experimental  | Early development stage

Features

  • Zero configuration - works out of the box
  • Multi-language support - TypeScript/JavaScript, Python, and Rust
  • Fast & Accurate - AST-based comparison, not just text matching
  • AI-friendly output - Easy to share with Claude, GPT-4, etc.
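
To make the "AST-based comparison, not just text matching" point concrete, here is a minimal sketch in Python. It is not the algorithm these tools implement: it uses Python's standard `ast` and `difflib` modules as stand-ins, and the helper names (`node_kinds`, `structural_similarity`, `text_similarity`) are made up for this illustration. Two functions that differ only in identifier names have the same tree shape even though their text differs.

```python
import ast
import difflib


def node_kinds(source: str) -> list[str]:
    """Return the AST node type names of `source`, ignoring identifiers and literals."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def structural_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] over node-kind sequences (illustrative metric only)."""
    return difflib.SequenceMatcher(None, node_kinds(a), node_kinds(b)).ratio()


def text_similarity(a: str, b: str) -> float:
    """Plain character-level ratio, shown for contrast."""
    return difflib.SequenceMatcher(None, a, b).ratio()


calculate_total = """
def calculate_total(items):
    total = 0
    for item in items:
        total += item.price
    return total
"""

compute_sum = """
def compute_sum(values):
    acc = 0
    for v in values:
        acc += v.price
    return acc
"""

# Same tree shape, different names: the structural ratio is 1.0,
# while the character-level ratio is lower.
print(structural_similarity(calculate_total, compute_sum))
print(text_similarity(calculate_total, compute_sum))
```

A real detector needs a more careful tree distance than a flat sequence of node kinds, but the contrast between the two ratios is the reason structural comparison catches renamed copies that text matching misses.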

Quick Start

1. Install (TypeScript/JavaScript)

cargo install similarity-ts

2. Detect duplicates

    # Scan current directory
    similarity-ts .

    # Scan specific files
    similarity-ts src/utils.ts src/helpers.ts

    # Show actual code
    similarity-ts . --print

3. Refactor with AI

Copy the output and use this prompt with Claude:

Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.

Example output:

    Duplicates in src/utils.ts:
    ────────────────────────────────────────────────────────────
      src/utils.ts:10-20 calculateTotal <-> src/helpers.ts:5-15 computeSum
      Similarity: 92.50%, Score: 9.2 points

The AI will analyze patterns and suggest refactoring strategies.

Documentation

Available Tools

Production Ready

  • similarity-ts - TypeScript/JavaScript similarity detection ✅ Most mature and production-tested

Beta (Not production-tested yet)

  • similarity-py - Python similarity detection ⚠️ Not production-tested
  • similarity-rs - Rust similarity detection ⚠️ Not production-tested

Experimental

  • similarity-elixir - Elixir similarity detection 🧪 Experimental
  • similarity-generic - Generic similarity detection for Go, Java, C/C++, C#, Ruby 🧪 Experimental
  • similarity-md - Markdown similarity detection 🧪 Experimental

Installation

TypeScript/JavaScript

    # Install from crates.io
    cargo install similarity-ts

    # Use the installed binary
    similarity-ts --help

Python

    # Install from crates.io
    cargo install similarity-py

    # Use the installed binary
    similarity-py --help

Rust

    # Install from crates.io
    cargo install similarity-rs

    # Use the installed binary
    similarity-rs --help

Elixir

    # Install from crates.io
    cargo install similarity-elixir

    # Use the installed binary
    similarity-elixir --help

Other Languages (Go, Java, C/C++, C#, Ruby)

    # Install from crates.io
    cargo install similarity-generic

    # Use the installed binary
    similarity-generic --language go main.go
    similarity-generic --language java Main.java

From source

    # Clone the repository
    git clone https://github.com/mizchi/similarity.git
    cd similarity

    # Build all tools
    cargo build --release

    # Or install a specific tool
    cargo install --path crates/similarity-ts
    cargo install --path crates/similarity-py
    cargo install --path crates/similarity-rs

Usage

Common Options (All Languages)

  • --threshold / -t - Similarity threshold (0.0-1.0, default: 0.85)
  • --min-lines / -m - Minimum lines for functions (default: 3-5)
  • --min-tokens - Minimum AST nodes for functions
  • --print / -p - Print code in output
  • --cross-file / -c - Enable cross-file comparison
  • --no-size-penalty - Disable size difference penalty

TypeScript/JavaScript Specific

    # Check for duplicate functions (default)
    similarity-ts ./src

    # Enable type checking (experimental)
    similarity-ts ./src --experimental-types

    # Check types only
    similarity-ts ./src --no-functions --experimental-types

    # Fast mode with bloom filter (default)
    similarity-ts ./src --no-fast  # disable

Python Specific

    # Check Python files
    similarity-py ./src

    # Include test files
    similarity-py . --extensions py,test.py

Rust Specific

    # Check Rust files
    similarity-rs ./src

    # Skip test functions (test_ prefix or #[test])
    similarity-rs . --skip-test

    # Set minimum tokens (default: 30)
    similarity-rs . --min-tokens 50

Output Format

The tool outputs in a VSCode-compatible format for easy navigation:

    Duplicates in src/utils.ts:
    ────────────────────────────────────────────────────────────
      src/utils.ts:10 | L10-15 similar-function: calculateSum
      src/utils.ts:20 | L20-25 similar-function: addNumbers
      Similarity: 85.00%, Priority: 8.5 (lines: 10)

Click on the file paths in VSCode's terminal to jump directly to the code.
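
If you want to post-process the report in a script, the sample above can be parsed with two regular expressions. This is a sketch that assumes the output looks exactly like the sample shown here; the real format may differ between tools and versions, and the names `Location`, `DuplicatePair`, and `parse_report` are hypothetical helpers, not part of similarity-ts.

```python
import re
from dataclasses import dataclass

# Patterns derived from the sample report above (an assumption, not a format spec).
LOCATION = re.compile(
    r"^\s*(?P<path>\S+):(?P<line>\d+) \| L(?P<start>\d+)-(?P<end>\d+) "
    r"similar-function: (?P<name>\S+)\s*$"
)
SUMMARY = re.compile(
    r"^\s*Similarity: (?P<sim>[\d.]+)%, Priority: (?P<prio>[\d.]+) "
    r"\(lines: (?P<lines>\d+)\)\s*$"
)


@dataclass
class Location:
    path: str
    start: int
    end: int
    name: str


@dataclass
class DuplicatePair:
    locations: list[Location]
    similarity: float  # fraction in [0, 1]
    priority: float
    lines: int


def parse_report(report: str) -> list[DuplicatePair]:
    """Group location lines with the summary line that follows them."""
    pairs: list[DuplicatePair] = []
    pending: list[Location] = []
    for line in report.splitlines():
        if m := LOCATION.match(line):
            pending.append(
                Location(m["path"], int(m["start"]), int(m["end"]), m["name"])
            )
        elif m := SUMMARY.match(line):
            pairs.append(
                DuplicatePair(
                    pending, float(m["sim"]) / 100, float(m["prio"]), int(m["lines"])
                )
            )
            pending = []
    return pairs


SAMPLE = """\
Duplicates in src/utils.ts:
  src/utils.ts:10 | L10-15 similar-function: calculateSum
  src/utils.ts:20 | L20-25 similar-function: addNumbers
  Similarity: 85.00%, Priority: 8.5 (lines: 10)
"""

for pair in parse_report(SAMPLE):
    names = " <-> ".join(loc.name for loc in pair.locations)
    print(f"{names}: similarity={pair.similarity:.2f}, priority={pair.priority}")
```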

Results are sorted by priority (lines × similarity) to help you focus on the most impactful duplications first.
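
As a worked example of the ordering, the sample report lists 10 lines at 85% similarity with a priority of 8.5, which matches lines × similarity with similarity taken as a fraction. The sketch below applies that formula; the second and third findings are invented for illustration only.

```python
def priority(lines: int, similarity: float) -> float:
    """Priority as described above: lines x similarity (similarity as a fraction)."""
    return lines * similarity


# (label, lines, similarity); only the first row comes from the sample report.
findings = [
    ("calculateSum <-> addNumbers", 10, 0.85),
    ("parseConfig <-> loadConfig", 40, 0.90),  # hypothetical
    ("isEmpty <-> isBlank", 4, 0.99),          # hypothetical
]

# Highest priority first: a long, fairly similar pair outranks a short,
# nearly identical one.
ranked = sorted(findings, key=lambda f: priority(f[1], f[2]), reverse=True)
for label, lines, similarity in ranked:
    print(f"{priority(lines, similarity):5.1f}  {label}")
```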

AI Integration

Prompt for Code Deduplication

For AI assistants (like Claude, GPT-4, etc.) to help with code deduplication:

Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.

Example Workflow with AI

  1. Run similarity detection:

    similarity-ts . --threshold 0.8 --min-lines 10
  2. Share output with AI: Copy the similarity report to your AI assistant

  3. AI analyzes patterns: The AI will identify common patterns and suggest refactoring strategies

  4. Iterative refinement: Adjust threshold and options based on AI recommendations

Integration with Development Tools

This tool can be integrated into:

  • Pre-commit hooks to prevent duplicate code
  • CI/CD pipelines for code quality checks
  • IDE extensions for real-time duplicate detection
  • AI-powered code review workflows
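
One way to wire the tool into a pre-commit hook or CI step is a small wrapper script. The sketch below is an assumption-heavy example: it uses only the `--threshold` and `--min-lines` flags listed under Common Options, and because this README does not say whether the exit code reflects findings, it counts `Similarity:` lines in the output instead. `build_command`, `count_findings`, and `gate` are hypothetical helper names.

```python
import subprocess
import sys


def build_command(path: str, threshold: float = 0.85, min_lines: int = 5) -> list[str]:
    """Assemble a similarity-ts invocation from flags listed under Common Options."""
    return [
        "similarity-ts", path,
        "--threshold", str(threshold),
        "--min-lines", str(min_lines),
    ]


def count_findings(report: str) -> int:
    """Count result blocks by their 'Similarity:' summary lines (assumed format)."""
    return sum(
        1 for line in report.splitlines() if line.strip().startswith("Similarity:")
    )


def gate(path: str, max_findings: int = 0) -> int:
    """Run the tool; return 0 if findings stay within budget, 1 otherwise."""
    result = subprocess.run(build_command(path), capture_output=True, text=True)
    findings = count_findings(result.stdout)
    if findings > max_findings:
        # Echo the report so the developer sees what to refactor.
        sys.stderr.write(result.stdout)
        return 1
    return 0
```

A hook script would end with `sys.exit(gate("./src"))`. Starting with a non-zero `max_findings` budget and lowering it over time avoids blocking every commit on an existing codebase.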

How It Works

Core Algorithm

  1. AST Parsing: Language-specific parsers convert code to ASTs

Overlap Detection (Experimental)

The --experimental-overlap flag enables detection of partial code overlaps within and across functions:

    # Basic overlap detection
    similarity-ts ./src --experimental-overlap

    # With custom parameters
    similarity-ts ./src --experimental-overlap \
      --threshold 0.75 \
      --overlap-min-window 8 \
      --overlap-max-window 25 \
      --overlap-size-tolerance 0.25

Parameters:

  • --experimental-overlap: Enable overlap detection mode
  • --overlap-min-window: Minimum AST nodes to consider (default: 8)
  • --overlap-max-window: Maximum AST nodes to consider (default: 25)
  • --overlap-size-tolerance: Size variation tolerance (default: 0.25)
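
To build intuition for these parameters, here is one possible reading of them as a sliding-window search. This is a guess at the semantics, not the tool's documented behavior: it assumes windows are runs of AST nodes between the minimum and maximum size, and that two windows are compared only when their relative size difference is within the tolerance. All function names are hypothetical.

```python
from typing import Iterator


def candidate_windows(
    node_count: int, min_window: int = 8, max_window: int = 25
) -> Iterator[tuple[int, int]]:
    """Yield (start, size) for every window that fits in a function of node_count nodes."""
    for size in range(min_window, max_window + 1):
        for start in range(0, node_count - size + 1):
            yield start, size


def sizes_compatible(a: int, b: int, tolerance: float = 0.25) -> bool:
    """True when the relative size difference is within tolerance (assumed meaning)."""
    return abs(a - b) / max(a, b) <= tolerance


# A 9-node function yields two 8-node windows and one 9-node window.
print(list(candidate_windows(9)))

# Under this reading, 8- and 10-node windows may be compared (20% apart),
# while 8- and 25-node windows may not (68% apart).
print(sizes_compatible(8, 10), sizes_compatible(8, 25))
```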

Use Cases:

  • Finding copy-pasted code fragments within larger functions
  • Detecting similar algorithmic patterns across different contexts
  • Identifying refactoring opportunities for common code blocks

Language-Specific Features

  • TypeScript: Type similarity detection (interfaces, type aliases)
  • Python: Class and method detection, decorator support
  • Rust: Test function filtering, impl block analysis

Examples

TypeScript/JavaScript

    # Find duplicate functions
    similarity-ts ./src --threshold 0.7 --print

    # Find similar types across files
    similarity-ts ./src --no-functions --experimental-types --cross-file --print

    # Comprehensive analysis
    similarity-ts ./src \
      --threshold 0.8 \
      --min-lines 10 \
      --cross-file \
      --extensions ts,tsx

    # Detect partial code overlaps (Experimental)
    similarity-ts ./src --experimental-overlap --threshold 0.75 --print

Python

    # Find duplicate functions in a Python project
    similarity-py ./src --threshold 0.85 --print

    # Check with custom settings
    similarity-py . \
      --min-lines 5 \
      --extensions py

Rust

    # Find duplicates excluding tests
    similarity-rs ./src --skip-test --print

    # Strict checking with high token count
    similarity-rs . \
      --min-tokens 50 \
      --threshold 0.9 \
      --skip-test

Experimental: Generic Language Support

⚠️ EXPERIMENTAL: The generic language support is in early development and may have limitations or bugs.

similarity-generic

The similarity-generic tool provides experimental support for additional languages using tree-sitter parsers:

  • Go
  • Java
  • C
  • C++
  • C#
  • Ruby
  • Elixir

Installation

    # From crates.io (when available)
    cargo install similarity-generic

    # From source
    cargo install --path crates/similarity-generic

Usage

    # Detect Go duplicates
    similarity-generic --language go ./src

    # Detect Java duplicates
    similarity-generic --language java ./src

    # Detect C/C++ duplicates
    similarity-generic --language c ./src
    similarity-generic --language cpp ./src

    # Detect C# duplicates
    similarity-generic --language csharp ./src

    # Detect Ruby duplicates
    similarity-generic --language ruby ./src

    # Detect Elixir duplicates
    similarity-generic --language elixir ./src

    # Common options work the same way
    similarity-generic --language go ./src --threshold 0.8 --print

Supported Languages

Language | File Extensions              | Status
Go       | .go                          | Experimental
Java     | .java                        | Experimental
C        | .c, .h                       | Experimental
C++      | .cpp, .cc, .cxx, .hpp, .h    | Experimental
C#       | .cs                          | Experimental
Ruby     | .rb                          | Experimental

Custom Language Configuration

You can also provide custom language configurations:

    # Use custom config file
    similarity-generic --config ./my-language.json ./src

See examples/configs/custom-language-template.json for configuration format.

Limitations

  • Performance is slower than specialized tools (similarity-ts, similarity-py, similarity-rs)
  • Detection accuracy may vary by language
  • Some language-specific features may not be fully supported
  • Custom configurations require understanding of tree-sitter node types

For production use, prefer the specialized tools when available.

Performance

  • Written in Rust for maximum performance
  • Concurrent file processing
  • Memory-efficient algorithms
  • Language-specific optimizations:
    • TypeScript/JavaScript: Fast mode with bloom filters (~4x faster)
    • Python/Rust: Tree-sitter based parsing
  • Intelligent filtering reduces unnecessary comparisons

License

MIT
