Setup scripts/es-sarif/ for MRVA -> SARIF -> Elasticsearch #963
Draft
data-douser wants to merge 8 commits into main from es-sarif
Commits
b6b071a scripts/es-sarif/ for MRVA->SARIF->Elasticsearch (data-douser)
8806b10 Add sarif-infer-versionControlProvenance.py (data-douser)
c182f91 Update index-sarif-results-in-elasticsearch.py (data-douser)
42e6b4a ES indexing should only use SARIF results (data-douser)
1018933 Improve logging in scripts/es-sarif/index-sarif-* (data-douser)
ec7b5f6 Add delay to es bulk indexing (data-douser)
bb72b34 Update index-sarif-results-in-elasticsearch.py (data-douser)
25763c5 Another update to fix repositoryUri sarif (data-douser)
scripts/es-sarif/.gitignore (181 changes: 181 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| # Byte-compiled / optimized / DLL files | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *$py.class | ||
| # C extensions | ||
| *.so | ||
| # Distribution / packaging | ||
| .Python | ||
| build/ | ||
| develop-eggs/ | ||
| dist/ | ||
| downloads/ | ||
| eggs/ | ||
| .eggs/ | ||
| lib/ | ||
| lib64/ | ||
| parts/ | ||
| sdist/ | ||
| var/ | ||
| wheels/ | ||
| share/python-wheels/ | ||
| *.egg-info/ | ||
| .installed.cfg | ||
| *.egg | ||
| MANIFEST | ||
| # PyInstaller | ||
| # Usually these files are written by a python script from a template | ||
| # before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
| *.manifest | ||
| *.spec | ||
| # Installer logs | ||
| pip-log.txt | ||
| pip-delete-this-directory.txt | ||
| # Unit test / coverage reports | ||
| htmlcov/ | ||
| .tox/ | ||
| .nox/ | ||
| .coverage | ||
| .coverage.* | ||
| .cache | ||
| nosetests.xml | ||
| coverage.xml | ||
| *.cover | ||
| *.py,cover | ||
| .hypothesis/ | ||
| .pytest_cache/ | ||
| cover/ | ||
| # Translations | ||
| *.mo | ||
| *.pot | ||
| # Django stuff: | ||
| *.log | ||
| local_settings.py | ||
| db.sqlite3 | ||
| db.sqlite3-journal | ||
| # Flask stuff: | ||
| instance/ | ||
| .webassets-cache | ||
| # Scrapy stuff: | ||
| .scrapy | ||
| # Sphinx documentation | ||
| docs/_build/ | ||
| # PyBuilder | ||
| .pybuilder/ | ||
| target/ | ||
| # Jupyter Notebook | ||
| .ipynb_checkpoints | ||
| # IPython | ||
| profile_default/ | ||
| ipython_config.py | ||
| # pyenv | ||
| # For a library or package, you might want to ignore these files since the code is | ||
| # intended to run in multiple environments; otherwise, check them in: | ||
| # .python-version | ||
| # pipenv | ||
| # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
| # However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
| # having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
| # install all needed dependencies. | ||
| #Pipfile.lock | ||
| # poetry | ||
| # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. | ||
| # This is especially recommended for binary packages to ensure reproducibility, and is more | ||
| # commonly ignored for libraries. | ||
| # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control | ||
| #poetry.lock | ||
| # pdm | ||
| # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. | ||
| #pdm.lock | ||
| # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it | ||
| # in version control. | ||
| # https://pdm.fming.dev/#use-with-ide | ||
| .pdm.toml | ||
| # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm | ||
| __pypackages__/ | ||
| # Celery stuff | ||
| celerybeat-schedule | ||
| celerybeat.pid | ||
| # SageMath parsed files | ||
| *.sage.py | ||
| # Environments | ||
| .env | ||
| .env.local | ||
| .env.*.local | ||
| .venv | ||
| env/ | ||
| venv/ | ||
| ENV/ | ||
| env.bak/ | ||
| venv.bak/ | ||
| # Spyder project settings | ||
| .spyderproject | ||
| .spyproject | ||
| # Rope project settings | ||
| .ropeproject | ||
| # mkdocs documentation | ||
| /site | ||
| # mypy | ||
| .mypy_cache/ | ||
| .dmypy.json | ||
| dmypy.json | ||
| # Pyre type checker | ||
| .pyre/ | ||
| # pytype static type analyzer | ||
| .pytype/ | ||
| # Cython debug symbols | ||
| cython_debug/ | ||
| # PyCharm | ||
| # JetBrains specific template is maintained in a separate JetBrains.gitignore that can | ||
| # be added to the global gitignore or merged into this project gitignore. For a PyCharm | ||
| # project, it is recommended to include/store project specific gitignore file(s) within | ||
| # the project root. | ||
| # https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore | ||
| .idea/ | ||
| # IDE specific files | ||
| .vscode/ | ||
| *.swp | ||
| *.swo | ||
| *~ | ||
| # OS specific files | ||
| .DS_Store | ||
| .DS_Store? | ||
| ._* | ||
| .Spotlight-V100 | ||
| .Trashes | ||
| ehthumbs.db | ||
| Thumbs.db | ||
| elastic-start-local/ | ||
| mrva/ |
scripts/es-sarif/Makefile (20 changes: 20 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| .PHONY: check compile format help install | ||
| check: | ||
| black --check --diff *.py | ||
| compile: | ||
| python3 -m py_compile *.py | ||
| format: | ||
| black *.py | ||
| help: | ||
| @echo "Available targets:" | ||
| @echo " install - Install dependencies from requirements.txt" | ||
| @echo " format - Format all Python files in this directory" | ||
| @echo " check - Check formatting without making changes" | ||
| install: | ||
| pip install -r requirements.txt | ||
scripts/es-sarif/activate.sh (18 changes: 18 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| #!/bin/bash | ||
| # Convenience script to activate the SARIF Elasticsearch Indexer environment | ||
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| VENV_DIR="$SCRIPT_DIR/.venv" | ||
| if [ ! -d "$VENV_DIR" ]; then | ||
| echo "Virtual environment not found. Run setup-venv.sh first." | ||
| exit 1 | ||
| fi | ||
| echo "Activating SARIF Elasticsearch Indexer environment..." | ||
| echo "Python version: $("$VENV_DIR/bin/python" --version)" | ||
| echo "To deactivate, run: deactivate" | ||
| echo | ||
| # Start a new shell with the virtual environment activated | ||
| exec bash --rcfile <(echo "source '$VENV_DIR/bin/activate'; PS1='(es-sarif) \u@\h:\w\$ '") |
scripts/es-sarif/index-sarif-results-in-elasticsearch.md (88 changes: 88 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| # SARIF Files Elasticsearch Indexer | ||
| This script creates a fresh Elasticsearch index and indexes SARIF 2.1.0 results from multiple SARIF files into it. | ||
| ## Requirements | ||
| - Python 3.11+ | ||
| - SARIF files conforming to version 2.1.0 specification (such as those produced by `gh mrva`) | ||
| - Running, reachable instances of Elasticsearch ("es") and Kibana (e.g. set up via `Quick Setup` below) | ||
| ## Usage | ||
| ```bash | ||
| python index-sarif-results-in-elasticsearch.py <sarif_files_list.txt> <elasticsearch_index_name> | ||
| ``` | ||
| ## Input File Format | ||
| The SARIF files list should be a plain text file with one relative file path per line: | ||
| ```text | ||
| output_misra-c-and-cpp-default_top-1000/solvespace/solvespace/solvespace_solvespace_18606.sarif | ||
| output_misra-c-and-cpp-default_top-1000/solvespace/solvespace/solvespace_solvespace_18607.sarif | ||
| # Comments starting with # are ignored | ||
| ``` | ||
| **Note**: Paths are resolved relative to the directory containing the list file. | ||
| ## Quick Setup | ||
| 1. **Set up Python environment:** | ||
| ```bash | ||
| ## Change to the directory that contains this document | ||
| cd scripts/es-sarif | ||
| bash setup-venv.sh | ||
| source .venv/bin/activate | ||
| ``` | ||
| 1. **Set up Elasticsearch and Kibana with Docker:** | ||
| ```bash | ||
| curl -fsSL https://elastic.co/start-local | sh | ||
| ``` | ||
| 1. **Run the indexer:** | ||
| ```bash | ||
| ## from the `scripts/es-sarif` directory | ||
| python index-sarif-results-in-elasticsearch.py mrva/sessions/sarif-files.txt codeql-coding-standards-misra-sarif | ||
| ``` | ||
| The `elastic-start-local` setup provides: | ||
| - Elasticsearch at `http://localhost:9200` | ||
| - Kibana at `http://localhost:5601` | ||
| - API key stored in `elastic-start-local/.env` as `ES_LOCAL_API_KEY` | ||
| ## Example Queries | ||
| Search for results reported at the `error` level: | ||
| ```json | ||
| GET /codeql-coding-standards-misra-sarif/_search | ||
| { | ||
| "query": { "term": { "level": "error" } } | ||
| } | ||
| ``` | ||
| Find results for a specific rule: | ||
| ```json | ||
| GET /codeql-coding-standards-misra-sarif/_search | ||
| { | ||
| "query": { "term": { "ruleId": "CERT-C-MSC30-C" } } | ||
| } | ||
| ``` | ||
| ## Managing Elasticsearch Services | ||
| Control the Docker services: | ||
| ```bash | ||
| cd elastic-start-local | ||
| ./start.sh # Start services | ||
| ./stop.sh # Stop services | ||
| ./uninstall.sh # Remove everything (deletes all data) | ||
| ``` |
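The "Input File Format" rules in the README above (one relative path per line, `#` comments ignored, paths resolved relative to the list file's directory) and the "SARIF results only" indexing could be sketched roughly as follows. This is a minimal illustration, not the implementation in `index-sarif-results-in-elasticsearch.py`, which is not visible in this diff view. The function names `read_sarif_list` and `iter_result_documents` and the `sarif_file` field are hypothetical. The `ruleId` and `level` fields are the ones used in the README's example queries.

```python
import json
from pathlib import Path
from typing import Iterator


def read_sarif_list(list_file: Path) -> list[Path]:
    """Return the SARIF paths named in list_file (hypothetical helper).

    Paths are resolved relative to the directory containing the list file.
    """
    base_dir = list_file.resolve().parent
    paths = []
    for raw_line in list_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and lines starting with '#', as the README describes.
        if not line or line.startswith("#"):
            continue
        paths.append(base_dir / line)
    return paths


def iter_result_documents(sarif_path: Path) -> Iterator[dict]:
    """Yield one flat document per SARIF result (hypothetical helper)."""
    sarif = json.loads(sarif_path.read_text(encoding="utf-8"))
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            yield {
                "ruleId": result.get("ruleId"),
                # Left as None when the result does not set a level explicitly.
                "level": result.get("level"),
                "message": result.get("message", {}).get("text"),
                # Assumed field name, recording which file the result came from.
                "sarif_file": str(sarif_path),
            }
```

Each yielded dictionary could then be passed to whatever bulk-indexing call the script uses. Keeping the documents flat is an assumption made here so that `term` queries on `ruleId` and `level`, like those in the README, have top-level fields to match.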