nodejs/nodePublic

NotificationsYou must be signed in to change notification settings
Fork32.2k
Star112k

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching #52219

New issue

Closed

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching#52219

Labels

loadersIssues and PRs related to ES module loadersmoduleIssues and PRs related to the module subsystem.

Description

joyeecheung

opened

on Mar 26, 2024

Spinning off from#51977

Background

There has been wide-spread monkey-patching of the CJS loader in the ecosystem to customize the loading process of Node.js (e.g. utility packages that abstract over the patching and get depended on by other packages e.g.require-in-the-middle,pirates, or packages that do this on their own liketsx orts-node). This includes but is not limited to patchingModule.prototype._compile,Module._resolveFilename,Module.prototype.require,require.extensions etc. To avoid breaking them Node.js has to maintain the patchability of the CJS loader (even for the underscored methods on the prototype) and this leads to very convoluted code in the CJS loader and also spreads to the ESM loader. It also makes refactoring of the loaders for any readability or performance improvements difficult.

While the ecosystem can migrate to produce and run real ESM gradually, existing tools still have to maintain support for CJS output (either written as CJS, or transpiled from ESM) while that happens, and the maintenance of CJS loader is still important. If we just provide a universal API that works for both CJS and ESM loading customizations, tooling and their users can benefit from a more seamless migration path.

Why`module.register()` is not enough

The loader hooks (in the current form,module.register()) were created to address loading customization needs for the ESM loader, so they only work when the graph is handled by the ESM loader. For example it doesn't work when therequire() comes from a CJS root (which could be a transpiled result), or fromcreateRequire(import.meta.url). Addressing existing use cases in the CJS loader is not in the scope ofthe loaders effort, either (it was brought up before was dismissed ina previous PR too). Therefore tooling in the wild still have to maintain both monkey-patching-based hooks for CJS and loader hooks for ESM when they want to provide universal support. Their users either register both hooks, or (if they know what format is actually being run by Node.js) choose one of them.

Themodule.register() API currently forces the loader hooks to be run on a different worker thread even if the user hooks can do everything synchronously or need to mutate the context on the main thread. While this simplifies de-async-ing of loader code to some extent when they want to run asynchronous code in loader hooks, the value is lost once the loader needs to provide universal support for CJS graphs and has to maintain synchronous hooks for that too (and in that case, they could just spawn their own workers to de-async, e.g. what@babel/register does on top ofrequire() monkey-patching).

For tooling that only has CJS support viarequire() monkey-patching, if they want to add ESM support, this unconditional worker abstraction as the only way to customize ESM loading makes wiring existing customizations into ESM more complicated that it needs to be. The move to unconditional workers also lead to many issues that are still unaddressed:

Forcing loaders to be registered on a different worker thread and having to block the main thread for itmakes the hooks prone to accidental deadlocks
When users want to spawn worker or child processes, having no control over the de-async loader worker leads to usability issues likeESM loaders cannot be defined viaWorker optionexecArgv in v20 #47747 andLoaders that use childProcess.fork lead to endless recursion of processes #47615 and this just looks like a rabbit hole
Debugging a worker is harder and more confusing than just doing everything on the main thread: e.g.not being able to see console.log() in the hooks
It imposes an performance overhead from worker startup and messaging that users cannot opt out of, even if their code don't need to be run off-thread in anyway. They would also have to mutate the context on the main thread via messaging but not everything can cross the boundary.

For us maintainers, having to support this worker setup as the only way to customize module loading also adds maintenance burden to the already convoluted loaders. It is already difficult to get right in the ESM loader (e.g.having to doge infinite worker creation orhaving to figure out how to share the loader worker among user workers), let alone in the monkey-patchable CJS loader.

Proposal of a synchronous, in-thread, universal loader hooks API

As such I think we need something simpler thanmodule.register() that:

Allow users to customize module loading universally - both CJS and ESM, no matter how they are loaded.
Easy to migrate to from existingrequire() monkey-patching-based hooks.
Allow us to fully deprecaterequire() monkey-patching sooner.

This becomes more important now that we are on a path to supportrequire(esm) and want to help the ecosystem migrate to ESM by providing a path with backwards-compatibility and best-effort interop, instead of providing features that does not work in the existing CJS loader or goes against existing CJS usage patterns, making it difficult for people to migrate.

I propose that we just add a synchronous hooks API that work in both the CJS and the ESM loader as a replacement for the monkey-patchability ofrequire(). The API can be something like this - this is just a straw-person sketch combining existingmodule.register() APIs and APIs in npm packages like pirates and require-in-the-middle. The key is that we should keep it simple and just take synchronous methods directly, and apply them in-thread.:

// If users want to do something off-thread, they can spawn their own workers here.// Or import { addHooks } from 'node:module'require('module').addHooks({// Useful for transpiler support of new extensions, etc.resolve(specifier,context,nextResolve){ ...},// load() + resolve() should be enough to provide what pirates needload(specifier,context,nextLoad){// If users want to do things off-thread, e.g. to generate// or fetch the source code asynchronously, they could run it// in their own worker and block on it using their own// Atomics.wait().},// Should be enough to provide what require-in-the-middle needexport(specifier,exports,context,nextExport){ ...}// Other hooks that we find necessary to replace `require()` monkey-patching});

The main difference between this andmodule.register() is that hooks added viamodule.register() are run on a different worker unconditionally, whilemodule.addHooks() just keeps things simple and runs synchronous hooks synchronously in-thread. If users want to run asynchronous code in the synchronous hooks, they can spawn their own workers - this means technically they could just implement whatmodule.register() offers on top ofmodule.addHooks() themselves. Somodule.register() just serves as a convenience method for those who want to run the code off-thread and prefer to delegate the worker-atomics-wait handling to Node.js core.

In a graph involving real ESM,module.register() can work in conjunction tomodule.addHooks(), the hooks are applied in the same order that they are added in the main thread. In a pure CJS graph,module.register() continues to be unsupported, as what's has already been happening. Maybe someday someone would be interested in figuring out how to makemodule.register() work safely in the CJS loader, but I think the burden from the handling the unconditional workers is just not worth the effort, especially when users can and already do spawn their own workers more safely for this use case. IMO a simple alternative likemodule.addHooks() would be a more viable plan for universal module loading customization, and it gets us closer to deprecatingrequire() monkey-patching sooner.

Migration plan

Tooling in the wild can maintain just one set of synchronous customizations, and handle the migration path by changing how these customizations are wired into Node.js:

On older versions of Node.js that they still need to support, they continue to fall back to customization wiring via both paths: into CJS viarequire()-monkey patching, and into ESM viamodule.register(). This is unfortunately what they already do today if they want to provide universal module support.
On newer versions of Node.js, they can wire the same set of customizations into both CJS and ESM via just one path:module.addHooks(). There is no longer need to maintain two wiring for universal support of CJS and ESM. And, if they didn't support real ESM before, they get to implement ESM support relatively simply by just migrating fromrequire() monkey-patching tomodule.addHooks().
When they drop support of older versions that lackmodule.addHooks(), they can remove dependency onrequire() monkey patching completely.

For us, the migration plan looks like this:

Providemodule.addHooks() as a replacement forrequire() monkey-patching, and make it wired into bothrequire()/createRequire() from the CJS loader andimport/require inimported CJS from the ESM loader
Doc-deprecaterequire() monkey patching and actively encourage user-land packages that rely on patching to migrate. At the mean timerequire() monkey patching will still work to some extent in conjunction with the new loader hooks, so that packages have a graceful migration period.
When the dependencies onrequire() monkey patching drop enough in the ecosystem, start emitting runtime warnings when the internal properties ofModule are patched, and suggesting to usemodule.addHooks().
After that we will no longer maintain compatibility hacks forrequire() monkey patching to work. Packages who monkey-patchModule but don't manage to migrate might still work with newer versions of Node.js - until we do internal changes to the internal properties that they rely on. When we do that and break them, instead of further convoluting internals to make patching work, we'll suggest them to just usemodule.addHooks() on newer versions of Node.js.

Metadata

Assignees

No one assigned

Labels

loadersIssues and PRs related to ES module loadersmoduleIssues and PRs related to the module subsystem.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching #52219

Description

Background

Why`module.register()` is not enough

Proposal of a synchronous, in-thread, universal loader hooks API

Migration plan

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Movatterモバイル変換

Uh oh!

Proposal for a simple, universal module loader hooks API to replace require() monkey-patching #52219

Description

Background

Whymodule.register() is not enough

Proposal of a synchronous, in-thread, universal loader hooks API

Migration plan

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Why`module.register()` is not enough