Add approximate matching capabilities to software vulnerability discovery
It is of the utmost importance to ensure that FOSS packages from public repositories have not been tampered with by malicious actors. This type of compromise is described as an open source "supply chain attack" and these have been increasing significantly. This project is building a new system (which is FOSS itself) to help verify the integrity of deployed code packages and validate their origin with external data sources, with the potential to mitigate attacks on open source packages supply chains such as: detecting if a package in use is matching verified code by matching source and binaries exactly and approximately. Or detecting abnormal code changes that may be signs of malicious modifications and possible attacks on a package.
The key components of this open code and data solution are a Package and File Fingerprints Database, a Code Similarity and Changes Detection Engine, utilities to detect possibly malicious changes in upstream projects, and integration in build system(s). While existing approaches may require a tight control of the whole code supply chain, the approach of this project is designed for practical usage with limited changes to a build and CI/CD pipeline.
This is the second phase of this ambitious project, the focus of which is to enable approximate matching between a database of FOSS packages resources and an actual FOSS package or other code. Moreover, various architectural improvements will be performed to support use at larger scale.
This project was funded through theNGI0 Entrust Fund, a fund established byNLnet with financial support from the European Commission'sNext Generation Internet programme, under the aegis ofDG Communications Networks, Content and Technology under grant agreement No 101069594.