
Intsights/PyRepScan


A Git Repository Secrets Scanner written in Rust

License

MIT license


About The Project

PyRepScan is a Python library written in Rust. It uses git2-rs for repository parsing and traversal, regex for regex pattern matching, and crossbeam for concurrency. The library was written to combine high performance with Python bindings.

Built With

Performance

CPU

Library	Time	Peak Memory
PyRepScan	8.74s	1,149,152 kb
gitleaks	1118s	1,146,300 kb
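As a rough reading of the table above (plain arithmetic on the two reported rows for this one benchmark; the benchmark setup itself is not described here), the wall-clock ratio and the relative peak-memory difference are:

```latex
\frac{t_{\text{gitleaks}}}{t_{\text{PyRepScan}}} = \frac{1118\ \text{s}}{8.74\ \text{s}} \approx 127.9
\qquad
\frac{1{,}149{,}152 - 1{,}146{,}300}{1{,}146{,}300} = \frac{2{,}852}{1{,}146{,}300} \approx 0.25\%
```

In this benchmark, PyRepScan finished roughly 128 times faster while using about the same peak memory (about 0.25% more).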

Installation

pip3 install PyRepScan

Documentation

    class GitRepositoryScanner:
        def __init__(
            self,
        ) -> None

This class holds all the added rules for fast reuse.

    def add_content_rule(
        self,
        name: str,
        pattern: str,
        whitelist_patterns: typing.List[str],
        blacklist_patterns: typing.List[str],
    ) -> None

The add_content_rule function adds a new rule to an internal list of rules that can be reused against different repositories. The same name can be used for multiple rules, in which case several results may carry the same name. A content rule tests its regex pattern against the content of the files.

name - The name of the rule so it can be identified.
pattern - The regex pattern (Rust Regex syntax) to match against the content of the committed files.
whitelist_patterns - A list of regex patterns (Rust Regex syntax) matched against the content of the committed file to filter results in. A result passes if at least one of the patterns matches (the patterns are combined with OR).
blacklist_patterns - A list of regex patterns (Rust Regex syntax) matched against the content of the committed file to filter results out. A result is omitted if at least one of the patterns matches (the patterns are combined with OR).
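The whitelist/blacklist semantics above can be illustrated with a small Python sketch. It is not PyRepScan's implementation: it uses Python's re module as a stand-in for Rust Regex syntax (the two differ in some features), the helper name passes_filters is hypothetical, and it assumes that an empty whitelist lets every result through.

```python
import re
from typing import List


def passes_filters(text: str, whitelist: List[str], blacklist: List[str]) -> bool:
    """Hypothetical sketch of the documented OR semantics.

    Assumption: an empty whitelist lets everything through.
    Python's re stands in for Rust Regex syntax.
    """
    # Whitelist: at least one pattern must match (OR) for the result to pass.
    if whitelist and not any(re.search(p, text) for p in whitelist):
        return False
    # Blacklist: a single matching pattern (OR) is enough to omit the result.
    if any(re.search(p, text) for p in blacklist):
        return False
    return True


print(passes_filters("token=abc", [r"tok", r"AKIA"], []))         # True
print(passes_filters("token=example", [r"tok"], [r"example"]))  # False
```

In this sketch a result that matches both lists is omitted, since one blacklist hit is enough to drop it.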

    def add_file_path_rule(
        self,
        name: str,
        pattern: str,
    ) -> None

The add_file_path_rule function adds a new rule to an internal list of rules that can be reused against different repositories. The same name can be used for multiple rules, in which case several results may carry the same name. A file path rule tests its regex pattern against the file paths.

name - The name of the rule so it can be identified.
pattern - The regex pattern (Rust Regex syntax) to match against the file paths of the committed files.
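A minimal sketch of how file path rules relate to paths, reusing the two file path rules from the Usage section. It assumes an unanchored search (as Rust's Regex::is_match performs), uses Python's re as a stand-in for Rust Regex syntax, and the helper match_file_path_rules is hypothetical, not part of PyRepScan.

```python
import re
from typing import List, Tuple


def match_file_path_rules(file_path: str, rules: List[Tuple[str, str]]) -> List[str]:
    # Return the names of the (name, pattern) rules whose pattern matches the path.
    # Assumption: an unanchored search anywhere in the path, like Regex::is_match.
    return [name for name, pattern in rules if re.search(pattern, file_path)]


rules = [
    ("Second Rule", r".+\.pem"),
    ("Third Rule", r"(prod|dev|stage).+key"),
]
print(match_file_path_rules("prod/keys/server.pem", rules))  # ['Second Rule', 'Third Rule']
```

One path can trigger several rules, and rules sharing a name would produce repeated names in the output.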

    def add_file_extension_to_skip(
        self,
        file_extension: str,
    ) -> None

The add_file_extension_to_skip function adds a file extension to the filtering phase, reducing the number of inspected files and improving scan performance.

file_extension - A file extension, without a leading dot, to filter out from the scan.
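Because extensions are passed without a leading dot, a filter has to strip the dot before comparing. The sketch below shows one way to do that in Python; the helper should_skip_extension is hypothetical, and it assumes the extension is the text after the last dot and that the comparison is case-sensitive, which may differ from what PyRepScan does internally.

```python
import os
from typing import Set


def should_skip_extension(file_path: str, extensions_to_skip: Set[str]) -> bool:
    # os.path.splitext returns the extension with its leading dot (e.g. ".jpg"),
    # so strip it before comparing against dot-less extensions like "jpg".
    extension = os.path.splitext(file_path)[1].lstrip(".")
    return extension in extensions_to_skip


print(should_skip_extension("images/logo.jpg", {"bin", "jpg"}))  # True
print(should_skip_extension("src/main.rs", {"bin", "jpg"}))      # False
```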

    def add_file_path_to_skip(
        self,
        file_path: str,
    ) -> None

The add_file_path_to_skip function adds a file path substring to the filtering phase, reducing the number of inspected files and improving scan performance. Every file whose path contains the file_path substring is left out of the scan.

file_path - Any inspected file whose path contains this substring is not scanned. This parameter is free text, matched as a plain substring.
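The substring behaviour can be sketched in a few lines of Python. The helper should_skip_path is hypothetical and only illustrates a plain containment test as described above; it is not taken from the library.

```python
from typing import Iterable


def should_skip_path(file_path: str, paths_to_skip: Iterable[str]) -> bool:
    # Plain substring test: no regex or glob semantics are applied.
    return any(fragment in file_path for fragment in paths_to_skip)


skip = ["site-packages", "node_modules"]
print(should_skip_path("lib/python3.11/site-packages/requests/api.py", skip))  # True
print(should_skip_path("src/app.py", skip))                                    # False
```

Since the check is a raw substring match, a fragment such as site-packages also excludes unrelated files like docs/my-site-packages-notes.txt; longer, more specific fragments reduce that risk.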

    def scan(
        self,
        repository_path: str,
        branch_glob_pattern: typing.Optional[str],
        from_timestamp: typing.Optional[int],
    ) -> typing.List[typing.Dict[str, str]]

The scan function is the main function of the library. Calling it triggers a new scan and returns a list of matches. The scan is multithreaded and uses all available CPU cores. The results do not include the file content, only the regex matched group. To retrieve the full file content, take a result's file_oid value (result['file_oid']) and pass it to the get_file_content function.

repository_path - The git repository folder path.
branch_glob_pattern - A glob pattern that selects which branches to scan. If None is passed, defaults to *.
from_timestamp - A UTC timestamp (int); only commits created after this timestamp are included in the scan. If None is passed, defaults to 0.
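A from_timestamp value can be computed from a calendar date with the standard library. The sketch assumes the timestamp is in seconds since the Unix epoch (consistent with the default of 0) and that "after" means strictly later; both helpers, utc_timestamp and commits_after, are hypothetical illustrations rather than PyRepScan functions.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def utc_timestamp(year: int, month: int, day: int) -> int:
    # Seconds since the Unix epoch for midnight UTC on the given date.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def commits_after(commits: List[Tuple[str, int]], from_timestamp: int) -> List[str]:
    # Assumption: keep commits whose time is strictly later than from_timestamp.
    return [commit_id for commit_id, commit_time in commits if commit_time > from_timestamp]


print(utc_timestamp(2023, 1, 1))  # 1672531200
```

The result could then be passed to the scanner as from_timestamp=utc_timestamp(2023, 1, 1) to limit a scan to recent history.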

A sample result would look like this:

    {
        'rule_name': 'First Rule',
        'author_email': 'author@email.email',
        'author_name': 'Author Name',
        'commit_id': '1111111111111111111111111111111111111111',
        'commit_message': 'The commit message',
        'commit_time': '2020-01-01T00:00:00e',
        'file_path': 'full/file/path',
        'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
        'match': 'The matched group',
    }
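Because results are plain dictionaries with the keys shown above, they can be post-processed with ordinary Python. The helpers below (group_by_rule and unique_file_oids) are hypothetical and rely only on the rule_name and file_oid keys from the sample result. Several rules can match the same file, so the same file_oid may appear more than once; collecting unique oids avoids fetching the same content repeatedly with get_file_content.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_rule(results: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    # Rules may share a name, so grouping by 'rule_name' merges their hits.
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for result in results:
        grouped[result["rule_name"]].append(result)
    return dict(grouped)


def unique_file_oids(results: List[Dict[str, str]]) -> List[str]:
    # Deduplicate file oids while preserving first-seen order.
    return list(dict.fromkeys(result["file_oid"] for result in results))
```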

    def scan_from_url(
        self,
        url: str,
        repository_path: str,
        branch_glob_pattern: typing.Optional[str],
        from_timestamp: typing.Optional[int],
    ) -> typing.List[typing.Dict[str, str]]

Same as the scan function, but first clones the repository from the given URL into the provided repository path.

url - URL of a git repository.
repository_path - The path to clone the repository into.
branch_glob_pattern - A glob pattern that selects which branches to scan. If None is passed, defaults to *.
from_timestamp - A UTC timestamp (int); only commits created after this timestamp are included in the scan. If None is passed, defaults to 0.
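When a cloned repository is only needed for a single scan, one option is to clone it into a temporary directory that is removed afterwards. This is a usage sketch, not part of the library: scan_url_in_tempdir is a hypothetical helper, the scanner argument is expected to be a GitRepositoryScanner instance (any object with a scan_from_url method works), and it relies on the documented None defaults for the branch glob and timestamp.

```python
import os
import tempfile
from typing import Any, Dict, List


def scan_url_in_tempdir(scanner: Any, url: str) -> List[Dict[str, str]]:
    # Clone into a not-yet-existing subdirectory of a temporary directory;
    # the whole tree is deleted when the with-block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        return scanner.scan_from_url(
            url=url,
            repository_path=os.path.join(tmp_dir, "repository"),
            branch_glob_pattern=None,  # None falls back to the documented default '*'
            from_timestamp=None,  # None falls back to the documented default 0
        )
```

Since get_file_content reads from the repository on disk, any content lookups would have to happen inside the with-block, before the clone is deleted.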

    def get_file_content(
        self,
        repository_path: str,
        file_oid: str,
    ) -> bytes

The get_file_content function retrieves the content of a previously matched file. The full file content is omitted from the results to keep the results list small and the scan fast.

repository_path - The git repository folder path.
file_oid - A string holding the file oid. This value is found under the file_oid key of each result dictionary returned by the scan function.
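The sample result carries the matched text but no line number, so a caller that wants to point at the offending line has to search the bytes returned by get_file_content. A minimal sketch, assuming the file is UTF-8 text and the match does not span multiple lines; find_match_lines is a hypothetical helper.

```python
from typing import List


def find_match_lines(file_content: bytes, match: str) -> List[int]:
    # Decode as UTF-8, replacing undecodable bytes instead of raising.
    text = file_content.decode("utf-8", errors="replace")
    # Return 1-based numbers of the lines that contain the matched text.
    return [number for number, line in enumerate(text.splitlines(), start=1) if match in line]


print(find_match_lines(b"a\n-----BEGIN PRIVATE KEY-----\nb\n", "-----BEGIN PRIVATE KEY-----"))  # [2]
```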

Usage

    import pyrepscan

    grs = pyrepscan.GitRepositoryScanner()

    # Adds a specific rule, can be called multiple times or not at all
    grs.add_content_rule(
        name='First Rule',
        pattern=r'(-----BEGIN PRIVATE KEY-----)',
        whitelist_patterns=[],
        blacklist_patterns=[],
    )
    grs.add_file_path_rule(
        name='Second Rule',
        pattern=r'.+\.pem',
    )
    grs.add_file_path_rule(
        name='Third Rule',
        pattern=r'(prod|dev|stage).+key',
    )

    # Add file extensions to ignore during the search
    grs.add_file_extension_to_skip(
        file_extension='bin',
    )
    grs.add_file_extension_to_skip(
        file_extension='jpg',
    )

    # Add file paths to ignore during the search. Free text is allowed
    grs.add_file_path_to_skip(
        file_path='site-packages',
    )
    grs.add_file_path_to_skip(
        file_path='node_modules',
    )

    # Scans a repository
    results = grs.scan(
        repository_path='/repository/path',
        branch_glob_pattern='*',
    )

    # results is a list of dicts. Each dict is in the following format:
    # {
    #     'rule_name': 'First Rule',
    #     'author_email': 'author@email.email',
    #     'author_name': 'Author Name',
    #     'commit_id': '1111111111111111111111111111111111111111',
    #     'commit_message': 'The commit message',
    #     'commit_time': '2020-01-01T00:00:00e',
    #     'file_path': 'full/file/path',
    #     'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
    #     'match': 'The matched group',
    # }

    # Fetch the full content of the file identified by file_oid
    file_content = grs.get_file_content(
        repository_path='/repository/path',
        file_oid='47d2739ba2c34690248c8f91b84bb54e8936899a',
    )

    # file_content
    # b'binary data'

    # Creating a RulesManager directly
    rules_manager = pyrepscan.RulesManager()

    # For testing purposes, check your regex patterns using the check_pattern function
    rules_manager.check_pattern(
        content='some content1 to check, another content2 in the same line\nanother content3 in another line\n',
        pattern=r'(content\d)',
    )

    # The result is the list of captured matches:
    # [
    #     'content1',
    #     'content2',
    #     'content3',
    # ]
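For a quick sanity check outside PyRepScan, Python's re.findall gives the same captured list for the check_pattern example above, because it returns the captured group when a pattern has exactly one group. This is only a stand-in: Rust's regex crate does not support look-around or backreferences, so a pattern that works in Python may still be rejected by PyRepScan, and check_pattern remains the authoritative test.

```python
import re

content = (
    "some content1 to check, another content2 in the same line\n"
    "another content3 in another line\n"
)

# With exactly one capturing group, re.findall returns the group's text for each match.
captured = re.findall(r"(content\d)", content)
print(captured)  # ['content1', 'content2', 'content3']
```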

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/intsights/PyRepScan


Releases

24 releases; the latest is v0.12.0 (Aug 9, 2023).
