
Disclosure: I was the Staff Engineering Manager for the npm CLI team between July 2019 & December 2022. I was a part of the GitHub acquisition of npm, Inc. in 2020. I left GitHub, for various reasons, in December.
As far as novel supply chain attacks go, this is a biggy & from here on out I'll be referring to it as "manifest confusion".
Before the node ecosystem became what it is today - aka. tens of millions of developers around the world creating over ~3.1 million packages that are downloaded 208 billion times a month - the number of people contributing to the corpus of software you trusted & downloaded was very small. A smaller community comes with more trust & even as the npm registry was being developed, most aspects were open source & freely available for contribution & code inspection. But, over time, as the ecosystem grew up, so did the policies & practices of organizations consuming from the corpus.
From the outset, the npm project also put a lot of trust in the client vs. the server side of the registry. Looking back now, it's clear that the practice of relying so heavily on a client to handle validation of data is riddled with issues, but that strategy also allowed the JavaScript tooling ecosystem to grow organically & participate in the shape of the data.
The npm Public Registry does not validate manifest information against the contents of the package tarball, relying instead on npm-compatible clients to interpret & enforce validation/consistency. In fact, as I researched this issue, it looks like the server has never done this validation (so you may want to call this a "feature").
Today, registry.npmjs.com lets users publish packages via a PUT request to the corresponding package URI (ex. https://registry.npmjs.com/-/<package-name>). This endpoint accepts a request body which looks something like this (note: after almost a decade & a half, this & all other registry APIs continue to be horribly undocumented):
```javascript
{
  _id: <pkg>,
  name: <pkg>,
  'dist-tags': { ... },
  versions: {
    '<version>': {
      _id: '<pkg>@<version>',
      name: '<pkg>',
      version: '<version>',
      dist: {
        integrity: '<tarball-sha512-hash>',
        shasum: '<tarball-sha1-hash>',
        tarball: ''
      }
      ...
    }
  },
  _attachments: {
    0: {
      content_type: 'application/octet-stream',
      data: '<tarball-base64-string>',
      length: '<tarball-length>'
    }
  }
}
```

The issue at hand is that the version metadata (aka. "manifest" data) is submitted independently from the attached tarball which houses the package's package.json. These two pieces of information are never validated against one another, which calls into question which one should be *the canonical source of truth* for data such as dependencies, scripts, license & more. As far as I can tell, the tarball is the only artifact that gets signed & has an integrity value that can be stored & verified offline (making the case for it to potentially be the proper source; yet, very surprisingly, the name & version fields in package.json can actually differ from those in the manifest, because they were never validated).
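Since the tarball is the one artifact with a verifiable integrity value, it's worth seeing what that check actually covers. Below is a minimal sketch using only Node's built-in crypto module. It assumes dist.integrity holds a single sha512 SRI entry & dist.shasum a hex SHA-1, as in the body above. tarballDigests & verifyTarball are hypothetical helper names:

```javascript
const crypto = require('node:crypto')

// Compute the values a manifest stores under `dist` for a given tarball
// (assumes a single `sha512-<base64>` SRI entry & a hex sha1 shasum).
function tarballDigests (tarball) {
  const sha512 = crypto.createHash('sha512').update(tarball).digest('base64')
  const sha1 = crypto.createHash('sha1').update(tarball).digest('hex')
  return { integrity: `sha512-${sha512}`, shasum: sha1 }
}

// Check a tarball Buffer against a manifest's `dist` block.
function verifyTarball (tarball, dist) {
  const actual = tarballDigests(tarball)
  return {
    integrityOk: actual.integrity === dist.integrity,
    shasumOk: actual.shasum === dist.shasum,
  }
}

// Usage: any Buffer works for illustration
const tarball = Buffer.from('not really a tarball')
const dist = tarballDigests(tarball)
console.log(verifyTarball(tarball, dist)) // { integrityOk: true, shasumOk: true }
```

A passing check only proves the tarball is the one the manifest points at. It says nothing about whether the manifest's dependencies, scripts or license match the package.json inside that tarball.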
1. Generate an auth token (ex. https://www.npmjs.com/settings/<your-username>/tokens/new - choose "Automation" for ease)
2. Create a new project (ex. mkdir test && cd test/ && npm init -y)
3. Install the helper libraries (ex. npm install ssri libnpmpack npm-registry-fetch)
4. Create a child package (ex. mkdir pkg && cd pkg/ && npm init -y)
5. Create a publish.js file in the project root with something like the following:

```javascript
;(async () => {
  // libs
  const ssri = require('ssri')
  const pack = require('libnpmpack')
  const fetch = require('npm-registry-fetch')

  // pack tarball & generate integrity (sha1 for `shasum`, sha512 for `integrity`)
  const tarball = await pack('./pkg/')
  const integrity = ssri.fromData(tarball, {
    algorithms: ['sha1', 'sha512'],
  })

  // craft manifest
  const name = '<pkg name>'
  const version = '<pkg version>'
  const manifest = {
    _id: name,
    name: name,
    'dist-tags': {
      latest: version,
    },
    versions: {
      [version]: {
        _id: `${name}@${version}`,
        name,
        version,
        dist: {
          integrity: integrity.sha512[0].toString(),
          shasum: integrity.sha1[0].hexDigest(),
          tarball: '',
        },
        // intentionally different from ./pkg/package.json
        scripts: {},
        dependencies: {},
      },
    },
    _attachments: {
      0: {
        content_type: 'application/octet-stream',
        data: tarball.toString('base64'),
        length: tarball.length,
      },
    },
  }

  // publish via PUT (awaited so a failed request isn't silently dropped)
  await fetch(name, {
    '//registry.npmjs.org/:_authToken': '<auth token>',
    method: 'PUT',
    body: manifest,
  })
})().catch((err) => {
  console.error(err)
  process.exitCode = 1
})
```

6. Modify the manifest keys as you wish (ex. I've stripped the scripts & dependencies in the above)
7. Publish (ex. node publish.js)
8. Visit https://registry.npmjs.com/<pkg>/ & https://www.npmjs.com/package/<pkg>/v/<version>?activeTab=explore to see the discrepancies
In the above example, the package was published with a different manifest than its corresponding package.json (ref. https://www.npmjs.com/darcyclarke-manifest-pkg & https://registry.npmjs.com/darcyclarke-manifest-pkg/).
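To make the discrepancy concrete, here's a minimal sketch of a check a consumer could run. It compares a version's registry manifest with the package.json extracted from its tarball. findManifestDiscrepancies & the list of compared fields are assumptions for illustration, not an existing npm API:

```javascript
const { isDeepStrictEqual } = require('node:util')

// Fields that tooling commonly reads from registry manifests
// (an illustrative list, not an exhaustive one).
const CHECKED_FIELDS = [
  'name', 'version', 'dependencies', 'optionalDependencies',
  'peerDependencies', 'scripts', 'license', 'bin',
]

// Hypothetical helper: list the fields whose values differ between a
// version's registry manifest & the package.json inside its tarball.
function findManifestDiscrepancies (manifest, pkgJson, fields = CHECKED_FIELDS) {
  return fields
    .filter((field) => !isDeepStrictEqual(manifest[field], pkgJson[field]))
    .map((field) => ({ field, manifest: manifest[field], tarball: pkgJson[field] }))
}

// Usage: scripts & dependencies stripped from the manifest, as in the example above
const manifest = { name: 'pkg', version: '1.0.0', scripts: {}, dependencies: {} }
const pkgJson = {
  name: 'pkg',
  version: '1.0.0',
  scripts: { install: 'node ./install.js' },
  dependencies: { 'some-dep': '^1.0.0' },
}
console.log(findManifestDiscrepancies(manifest, pkgJson).map((d) => d.field))
// [ 'dependencies', 'scripts' ]
```

A real checker would need some normalization first (ex. treating a missing dependencies field & {} as equal), since clients legitimately rewrite some fields before publishing.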
If you want an even easier way to reproduce this inconsistency, you can use the npm CLI today, as it actually mutates the manifest during npm publish when it sees a binding.gyp file in your project. This behaviour seems to have existed in the client since before my time on the team (i.e. <6.x or earlier) & is the cause of many bugs/confusion for consumers.
1. npm init -y
2. touch binding.gyp
3. npm publish
4. Notice that a "node-gyp rebuild" scripts.install entry was automatically added to the manifest but not to the actual tarball's package.json (ex. https://registry.npmjs.com/darcyclarke-binding & https://unpkg.com/[email protected]/package.json)

A real-world example/victim of this inconsistency is node-canvas.
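The binding.gyp behaviour described above boils down to a small normalization rule applied on the client before publishing. Below is a simplified sketch. It assumes, based on my understanding of the npm CLI's package.json normalization, that the rule fires for any root-level .gyp file only when there is no install or preinstall script & gypfile isn't false. applyGypfileDefault is a hypothetical name:

```javascript
// Sketch of the client-side rule that adds an install script to the
// *manifest* when a .gyp file is present (assumed behaviour, simplified).
// The tarball's package.json is never rewritten, so the two diverge.
function applyGypfileDefault (pkg, rootFiles) {
  const scripts = pkg.scripts || {}
  // An explicit install/preinstall script, or opting out, disables the rule
  if (scripts.install || scripts.preinstall || pkg.gypfile === false) {
    return pkg
  }
  const hasGyp = rootFiles.some((file) => file.endsWith('.gyp'))
  if (!hasGyp) return pkg
  // Return a copy so the caller's object (the "tarball" view) stays intact
  return {
    ...pkg,
    gypfile: true,
    scripts: { ...scripts, install: 'node-gyp rebuild' },
  }
}

// Usage (hypothetical package)
const pkgJson = { name: 'example-binding', version: '1.0.0', scripts: {} }
const manifest = applyGypfileDefault(pkgJson, ['package.json', 'binding.gyp'])
console.log(manifest.scripts.install) // node-gyp rebuild
console.log(pkgJson.scripts.install) // undefined
```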
There are several ways this bug actually impacts consumers/end-users:
Update: It was previously stated that Socket Security was susceptible to the manifest confusion issue. Since September 5, 2022, Socket has used the package.json file inside the tarball as the source of truth & should show accurate information for packages (ex. dependencies, licenses, scripts). When this blog was posted, the package page for darcyclarke0-manifest-pkg was incorrectly using an outdated data reference; this was quickly resolved by the team at Socket. Notably, the team at Socket is likely the first in this space to properly handle this problem.
This issue also affects all known, major JavaScript package managers in various ways, detailed below. Third-party registry implementations like JFrog's Artifactory seem to have also replicated this API design/issue, meaning that all clients of those private registry instances are exposed to the same inconsistency.
Notably, the various package managers & tooling have different scenarios in which they will use/reference either the package's registry manifest or the tarball's package.json (almost always as a mechanism to cache & increase performance of installations).
The key point here is that the ecosystem currently operates under the incorrect assumption that the manifest always contains the contents of the tarball's package.json. This is in large part because of the significant lack of registry API documentation, as well as various references in docs.npmjs.com to the registry storing the contents of package.json as the metadata - & nowhere does it mention that the client is responsible for ensuring consistency.
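One way for a tool to stop relying on that assumption is to build the metadata it acts on from the tarball itself. Below is a minimal sketch following the split recommended at the end of this post: everything *but* name & version comes from package.json. canonicalMetadata is a hypothetical helper & the inputs are made up for illustration:

```javascript
// Take name & version from the registry manifest, everything else from
// the tarball's package.json.
const MANIFEST_OWNED = ['name', 'version']

// Hypothetical helper: build the metadata a tool should act on.
function canonicalMetadata (manifest, pkgJson) {
  const result = {}
  for (const [key, value] of Object.entries(pkgJson)) {
    if (!MANIFEST_OWNED.includes(key)) result[key] = value
  }
  for (const key of MANIFEST_OWNED) {
    if (key in manifest) result[key] = manifest[key]
  }
  // `dist` (tarball URL & integrity) only exists in the registry manifest
  if (manifest.dist) result.dist = manifest.dist
  return result
}

// Usage: the manifest hides a dependency that the tarball declares
const manifest = {
  name: 'pkg',
  version: '1.0.0',
  dependencies: {},
  dist: { tarball: 'https://example.invalid/pkg-1.0.0.tgz' },
}
const pkgJson = { name: 'pkg', version: '1.0.0', dependencies: { 'some-dep': '^1.0.0' } }
const meta = canonicalMetadata(manifest, pkgJson)
console.log(meta.name, meta.dependencies) // pkg { 'some-dep': '^1.0.0' }
```

A tool would still want to flag, rather than silently ignore, a name or version mismatch between the two sources, since the tarball's name & version can differ from the manifest's.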
npm@6
- npx npm@6 install [email protected]
- hasInstallScript is undefined/false) (ref. https://registry.npmjs.org/darcyclarke-manifest-pkg/2.1.13 - code/package ref. https://github.com/npm/minify-registry-metadata/blob/main/lib/index.js)
- The package.json in node_modules/darcyclarke-manifest-pkg reflects the tarball entry
Because the package tarball gets cached in a global store, if the --prefer-offline config is used alongside --no-package-lock, the next time an install of that same package is run anywhere on the system, the dependencies hidden in its tarball may be installed.
npx npm@6 install [email protected]
npx npm@6 install --prefer-offline --no-package-lock
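The hasInstallScript flag mentioned in the npm@6 notes above is derived from the manifest's scripts, so it shares the same blind spot. Below is a minimal sketch. It assumes the flag is set whenever preinstall, install or postinstall is present, which is my reading of the linked minify-registry-metadata code. hasInstallScript & hiddenInstallScripts are hypothetical helper names:

```javascript
// Lifecycle scripts that run automatically on install
const INSTALL_SCRIPTS = ['preinstall', 'install', 'postinstall']

// Assumed rule for the abbreviated manifest's `hasInstallScript` flag
function hasInstallScript (pkg) {
  const scripts = pkg.scripts || {}
  return INSTALL_SCRIPTS.some((name) => typeof scripts[name] === 'string')
}

// Install scripts present in the tarball's package.json that the
// manifest never advertised
function hiddenInstallScripts (manifest, pkgJson) {
  const advertised = manifest.scripts || {}
  const actual = pkgJson.scripts || {}
  return INSTALL_SCRIPTS.filter((name) => actual[name] && !advertised[name])
}

// Usage (hypothetical package)
const manifest = { name: 'pkg', version: '1.0.0', scripts: {} }
const pkgJson = { name: 'pkg', version: '1.0.0', scripts: { postinstall: 'node ./setup.js' } }
console.log(hasInstallScript(manifest), hasInstallScript(pkgJson)) // false true
console.log(hiddenInstallScripts(manifest, pkgJson)) // [ 'postinstall' ]
```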
npm@9
Similar to npm@6, npm@9 will happily install the dependencies referenced inside a package's cached tarball package.json when using the --offline config.
Note: there seems to be a race condition where --offline may or may not pull from cache, resulting in intermittent results.
--offline configuration &/or by turning off network availability (ex. npm install --offline --no-package-lock)

yarn@1
Like npm@6 & npm@9, yarn@1 will run scripts that are inside the tarball but aren't referenced in the manifest & vice-versa.

version found in the tarball - exposing a potential downgrade attack vector
As we know by now, a tarball can define a different version than the manifest; in this case, yarn@1 will happily upgrade/downgrade & save the incorrect version back to the consuming project's package.json (potentially exposing consumers to a downgrade attack on subsequent installations).
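A client could refuse to record a tarball version that doesn't match the manifest it resolved. Below is a minimal sketch that classifies the mismatch. It only handles plain MAJOR.MINOR.PATCH strings; prereleases & build metadata would need a real semver library. All names are hypothetical:

```javascript
// Parse a plain "MAJOR.MINOR.PATCH" string (no prerelease/build metadata)
function parseVersion (v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v)
  if (!m) throw new Error(`unsupported version: ${v}`)
  return m.slice(1, 4).map(Number)
}

// -1 if a < b, 0 if equal, 1 if a > b
function compareVersions (a, b) {
  const pa = parseVersion(a)
  const pb = parseVersion(b)
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1
  }
  return 0
}

// Classify the tarball's version relative to the manifest's; anything
// other than 'ok' should stop a client from saving it to package.json
function checkTarballVersion (manifestVersion, tarballVersion) {
  const cmp = compareVersions(tarballVersion, manifestVersion)
  if (cmp === 0) return 'ok'
  return cmp < 0 ? 'downgrade' : 'upgrade'
}

console.log(checkTarballVersion('2.0.0', '1.0.0')) // downgrade
```

Treating an upgrade as a failure too is deliberate: either way, the version a user asked for isn't the one they got.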

pnpm@7
Like all the others, pnpm will run scripts that are inside the tarball but aren't referenced in the manifest & vice-versa.

There are potentially various CWE categorizations for this vulnerability. At the very least, if this issue were ever considered a "feature", then what we see here must be considered "Client-Side Enforcement of Server-Side Security" (i.e. CWE-602) - but I doubt that's the minimum scope applicable. I've broken down the various issues along with their corresponding CWE categorization below (code references have been provided in each case).
- npm CLI) to do work that should be done server-side; this is a perfect example
- npm); as noted below, they all have various issues because of this
- name, version, dependencies, license, scripts etc.) differ from the registry index they're associated with
- package.json & the package manifest

To my knowledge, GitHub was first made aware of this issue on, or around, November 4th, 2022. After doing independent research, I believed the potential impact/risk of this issue was actually far greater than originally understood & I submitted a HackerOne report with my findings on March 9. GitHub closed that ticket on March 21st & said they were dealing with the issue "internally". To my knowledge, they have not made any significant headway, nor have they made this issue public - instead, they've actually divested their position in npm as a product over the last 6 months & refused to follow up or provide insight into any remediation work.
GitHub is understandably in a tough spot. The fact that npmjs.com has functioned this way for over a decade means that the current state is pretty much codified & changing it is likely to break someone in a unique way. As mentioned before, the npm CLI itself relies on this behaviour & there are potentially other non-nefarious uses of this in the wild today.
- package.json
- Contact any known tooling author/maintainer who you know relies on the npm registry's manifest data & ensure they start using the package's contents for metadata when appropriate (i.e. everything *but* name & version).
- Start using a registry proxy which strictly enforces/validates consistency.
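To give a sense of what "strictly enforces/validates consistency" could mean for a proxy, here's a minimal sketch. It inspects a publish PUT body shaped like the one earlier in this post, decodes the base64 attachment, gunzips it & pulls package/package.json out with a deliberately tiny ustar reader. The package/ root folder, the compared fields & all function names are assumptions. A real proxy should use a complete tar implementation:

```javascript
const zlib = require('node:zlib')
const { isDeepStrictEqual } = require('node:util')

// Fields compared between the publish manifest & the tarball (illustrative)
const CHECKED_FIELDS = ['name', 'version', 'dependencies', 'scripts', 'license']

// Read a NUL-terminated string out of a fixed-width tar header field
function headerField (header, offset, length) {
  const raw = header.subarray(offset, offset + length)
  const end = raw.indexOf(0)
  return raw.subarray(0, end === -1 ? length : end).toString('utf8')
}

// Minimal ustar walk returning the parsed `package/package.json`
// (assumes npm's usual `package/` root & a short entry path)
function readPackageJson (tgz) {
  const tar = zlib.gunzipSync(tgz)
  let offset = 0
  while (offset + 512 <= tar.length) {
    const header = tar.subarray(offset, offset + 512)
    const name = headerField(header, 0, 100)
    if (!name) break // zero-filled block marks the end of the archive
    const prefix = headerField(header, 345, 155)
    const entryPath = prefix ? `${prefix}/${name}` : name
    const size = parseInt(headerField(header, 124, 12).trim() || '0', 8)
    const dataStart = offset + 512
    if (entryPath === 'package/package.json') {
      return JSON.parse(tar.subarray(dataStart, dataStart + size).toString('utf8'))
    }
    // entry data is padded to a multiple of 512 bytes
    offset = dataStart + Math.ceil(size / 512) * 512
  }
  throw new Error('package/package.json not found in tarball')
}

// Hypothetical proxy check; an empty result means "consistent"
function validatePublish (body) {
  const versions = Object.values(body.versions || {})
  const attachments = Object.values(body._attachments || {})
  if (versions.length !== 1 || attachments.length !== 1) {
    return ['expected exactly one version & one attachment']
  }
  const manifest = versions[0]
  const pkgJson = readPackageJson(Buffer.from(attachments[0].data, 'base64'))
  return CHECKED_FIELDS
    .filter((field) => !isDeepStrictEqual(manifest[field], pkgJson[field]))
    .map((field) => `"${field}" differs between manifest & tarball`)
}
```

A proxy this strict would also reject legitimate publishes from the npm CLI itself whenever it rewrites the manifest (ex. the binding.gyp case), which is part of why fixing this server-side is hard.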