Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.2k
Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes#49327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This fixes a potential segfault, among various other likely-relatedissues, which all occur because TLSSockets were not informed if theirunderlying stream was closed in many cases.This also significantly modifies an existing TLS test. With this changein place, that test no longer works, as it tries to mess with internalsto trigger a race, and those internals are now cleaned up earlier. Thistest has been simplified to a more general TLS shutdown test.
|
Currently the test fails only in a couple of environments (Mac & ASan)reliably, even though it passes reliably in all others. The failures aredue to emitted errors on the client H2 stream (not the focus of thistest). I'm hoping here that that's due to a timing different in thoseenvironments, and scheduling the shutdown like this instead willguarantee the order so that doesn't happen any more.
Thanks@aduh95! It seems this fails because the new segfault test throws a connection reset error on the client H2 stream. That's not a huge problem: it's the server that segfaults without this change, so the test is really interested in server behaviour here, not the client, and a handleable connection error event is far better than a segfault anyway. That said, it's weird that it's different for Mac & ASAN compared to the other cases, I wonder if it's just a timing issue? I've pushed a change to the tests, to see whether it's a timing problem, which moves the shutdown logic into timers startedafter the response is definitely fully received. Hopefully that will ensure the client session handles this more cleanly. The test still segfaults on Node 20.5.1 and passes reliably with this change (on my Linux machine). If this test still doesn't pass in the same cases, I think just ignoring client errors for this case is fine - it's an unusual shutdown regardless, and really we just want to check the server doesn't crash, so I don't want to get distracted too much by independent client issues (even if they might be worth investigating elsewhere later). |
Seems it was indeed a timing issue - all tests are now passing everywhere 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
lgtm
I can see two CI failures here:
Both seem very unlikely to be related to this change, so I'm going to assume this is all good and done, and those will presumably pass later on (maybe once that node's disk space is cleaned up?). Thanks for the review@mcollina! I think this is ready to merge but let me know if there's anything else I should do to make that happen. |
Commit Queue failed- Loading data for nodejs/node/pull/49327✔ Done loading data for nodejs/node/pull/49327----------------------------------- PR info ------------------------------------Title Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes (#49327) ⚠ Could not retrieve the email or name of the PR author's from user's GitHub profile!Branch pimterry:fix-tls-shutdown -> nodejs:mainLabels tls, author readyCommits 2 - tls: ensure TLS Sockets are closed if the underlying wrap closes - Tweak new H2 shutdown test towards reliable ordering & client shutdownCommitters 1 - Tim Perry PR-URL: https://github.com/nodejs/node/pull/49327Reviewed-By: Matteo Collina ------------------------------ Generated metadata ------------------------------PR-URL: https://github.com/nodejs/node/pull/49327Reviewed-By: Matteo Collina -------------------------------------------------------------------------------- ℹ This PR was created on Fri, 25 Aug 2023 16:11:30 GMT ✔ Approvals: 1 ✔ - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/49327#pullrequestreview-1600235070 ✘ This PR needs to wait 41 more hours to land (or 0 hours if there is one more approval) ✔ Last GitHub CI successful ℹ Last Full PR CI on 2023-08-30T19:32:01Z: https://ci.nodejs.org/job/node-test-pull-request/53650/- Querying data for job/node-test-pull-request/53650/ ✔ Last Jenkins CI successful-------------------------------------------------------------------------------- ✔ Aborted `git node land` session in /home/runner/work/node/node/.ncuhttps://github.com/nodejs/node/actions/runs/6030922076 |
Commit Queue failed- Loading data for nodejs/node/pull/49327✔ Done loading data for nodejs/node/pull/49327----------------------------------- PR info ------------------------------------Title Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes (#49327) ⚠ Could not retrieve the email or name of the PR author's from user's GitHub profile!Branch pimterry:fix-tls-shutdown -> nodejs:mainLabels tls, author ready, commit-queue-squashCommits 2 - tls: ensure TLS Sockets are closed if the underlying wrap closes - Tweak new H2 shutdown test towards reliable ordering & client shutdownCommitters 1 - Tim Perry PR-URL: https://github.com/nodejs/node/pull/49327Reviewed-By: Matteo Collina ------------------------------ Generated metadata ------------------------------PR-URL: https://github.com/nodejs/node/pull/49327Reviewed-By: Matteo Collina -------------------------------------------------------------------------------- ℹ This PR was created on Fri, 25 Aug 2023 16:11:30 GMT ✔ Approvals: 1 ✔ - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/49327#pullrequestreview-1600235070 ✘ This PR needs to wait 27 more hours to land (or 0 hours if there is one more approval) ✔ Last GitHub CI successful ℹ Last Full PR CI on 2023-08-30T23:03:30Z: https://ci.nodejs.org/job/node-test-pull-request/53650/- Querying data for job/node-test-pull-request/53650/ ✔ Last Jenkins CI successful-------------------------------------------------------------------------------- ✔ Aborted `git node land` session in /home/runner/work/node/node/.ncuhttps://github.com/nodejs/node/actions/runs/6037572193 |
Landed in048e0be |
lpinca commentedSep 9, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@pimterry the
I can reproduce the issue on macOS by running
|
lpinca commentedSep 9, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
The following patch seems to fix the issue diff --git a/test/parallel/test-tls-socket-close.js b/test/parallel/test-tls-socket-close.jsindex 667b291309..70af760d53 100644--- a/test/parallel/test-tls-socket-close.js+++ b/test/parallel/test-tls-socket-close.js@@ -44,9 +44,9 @@ function connectClient(server) { setImmediate(() => { assert.strictEqual(netSocket.destroyed, true);- assert.strictEqual(clientTlsSocket.destroyed, true); setImmediate(() => {+ assert.strictEqual(clientTlsSocket.destroyed, true); assert.strictEqual(serverTlsSocket.destroyed, true); tlsServer.close(); but I'm not sure if it invalidates some assumptions. |
Ah, sorry about that. I think that patch absolutely makes sense - there's no specific reason that it should be exactly one setImmediate to get to the destroyed state on the remote socket, it just needs to settle there eventually. I assume you're happy to PR that patch, but let me know if you'd prefer me to do it. |
Move the check for the destroyed state of the remote socket to the inner`setImmediate()`.Refs:nodejs#49327 (comment)
Done, thanks for the fast reply. |
This fixes a potential segfault, among various other likely-relatedissues, which all occur because TLSSockets were not informed if theirunderlying stream was closed in many cases.This also significantly modifies an existing TLS test. With this changein place, that test no longer works, as it tries to mess with internalsto trigger a race, and those internals are now cleaned up earlier. Thistest has been simplified to a more general TLS shutdown test.PR-URL:#49327Reviewed-By: Matteo Collina <matteo.collina@gmail.com>Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Move the check for the destroyed state of the remote socket to the inner`setImmediate()`.Refs:#49327 (comment)PR-URL:#49575Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Move the check for the destroyed state of the remote socket to the inner`setImmediate()`.Refs:#49327 (comment)PR-URL:#49575Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
This fixes a potential segfault, among various other likely-relatedissues, which all occur because TLSSockets were not informed if theirunderlying stream was closed in many cases.This also significantly modifies an existing TLS test. With this changein place, that test no longer works, as it tries to mess with internalsto trigger a race, and those internals are now cleaned up earlier. Thistest has been simplified to a more general TLS shutdown test.PR-URL:nodejs#49327Reviewed-By: Matteo Collina <matteo.collina@gmail.com>Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Move the check for the destroyed state of the remote socket to the inner`setImmediate()`.Refs:nodejs#49327 (comment)PR-URL:nodejs#49575Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
* chore: upgrade to Node.js v20* src: allow embedders to override NODE_MODULE_VERSIONnodejs/node#49279* src: fix missing trailing ,nodejs/node#46909* src,tools: initialize cppgcnodejs/node#45704* tools: allow passing absolute path of config.gypi in js2cnodejs/node#49162* tools: port js2c.py to C++nodejs/node#46997* doc,lib: disambiguate the old term, NativeModulenodejs/node#45673* chore: fixup Node.js BSSL tests*nodejs/node#49492*nodejs/node#44498* deps: upgrade to libuv 1.45.0nodejs/node#48078* deps: update V8 to 10.7nodejs/node#44741* test: use gcUntil() in test-v8-serialize-leaknodejs/node#49168* module: make CJS load from ESM loadernodejs/node#47999* src: make BuiltinLoader threadsafe and non-globalnodejs/node#45942* chore: address changes to CJS/ESM loading* module: make CJS load from ESM loader (nodejs/node#47999)* lib: improve esm resolve performance (nodejs/node#46652)* bootstrap: optimize modules loaded in the built-in snapshotnodejs/node#45849* test: mark test-runner-output as flakynodejs/node#49854* lib: lazy-load deps in modules/run_main.jsnodejs/node#45849* url: use private properties for brand checknodejs/node#46904* test: refactor `test-node-output-errors`nodejs/node#48992* assert: deprecate callTrackernodejs/node#47740* src: cast v8::Object::GetInternalField() return value to v8::Valuenodejs/node#48943* test: adapt test-v8-stats for V8 updatenodejs/node#45230* tls: ensure TLS Sockets are closed if the underlying wrapclosesnodejs/node#49327* test: deflake test-tls-socket-closenodejs/node#49575* net: fix crash due to simultaneous close/shutdown on JS Stream Socketsnodejs/node#49400* net: use asserts in JS Socket Stream to catch races in futurenodejs/node#49400* lib: fix BroadcastChannel initialization locationnodejs/node#46864* src: create BaseObject with node::Realmnodejs/node#44348* src: implement DataQueue and non-memory resident Blobnodejs/node#45258* sea: add support for V8 bytecode-only cachingnodejs/node#48191* chore: fixup patch indices* gyp: put filenames in variablesnodejs/node#46965* build: modify js2c.py into GN executable* fix: (WIP) handle string replacement of fs -> original-fs* [v20.x] backport vm-related memoryfixesnodejs/node#49874* src: make BuiltinLoader threadsafe and non-globalnodejs/node#45942* src: avoid copying string in fs_permissionnodejs/node#47746* look upon my works ye mightyand dispair* chore: patch cleanup* [api] Remove AllCan Read/Writehttps://chromium-review.googlesource.com/c/v8/v8/+/5006387* fix: missing include for NODE_EXTERN* chore: fixup patch indices* fix: fail properly when js2c fails in Node.js* build: fix js2c root_gen_dir* fix: lib/fs.js -> lib/original-fs.js* build: fix original-fs file xforms* fixup! module: make CJS load from ESM loader* build: get rid of CppHeap for now* build: add patch to prevent extra fs lookup on esm load* build: greatly simplify js2c modificationsMoves our original-fs modifications back into a super simple python script action, wires up the output of that action into our call to js2c* chore: update to handle moved internal/modules/helpers file* test: update @types/node test* feat: enable preventing cppgc heap creation* feat: optionally prevent calling V8::EnableWebAssemblyTrapHandler* fix: no cppgc initialization in the renderer* gyp: put filenames in variablesnodejs/node#46965* test: disable single executable tests* fix: nan tests failing on node headers missing file* tls,http2: send fatal alert on ALPN mismatchnodejs/node#44031* test: disable snapshot tests*nodejs/node#47887*nodejs/node#49684*nodejs/node#44193* build: use deps/v8 for v8/toolsNode.js hard depends on these in their builtins* test: fix edge snapshot stack tracesnodejs/node#49659* build: remove js2c //base dep* build: use electron_js2c_toolchain to build node_js2c* fix: don't create SafeSet outside packageResolveFixes failure in parallel/test-require-delete-array-iterator:=== release test-require-delete-array-iterator ===Path: parallel/test-require-delete-array-iteratornode:internal/per_context/primordials:426 constructor(i) { super(i); } // eslint-disable-line no-useless-constructor ^TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator)) at new Set (<anonymous>) at new SafeSet (node:internal/per_context/primordials:426:22)* fix: failing crashReporter tests on LinuxThese were failing because our change from node::InitializeNodeWithArgs tonode::InitializeOncePerProcess meant that we now inadvertently calledPlatformInit, which reset signal handling. This meant that our intentionalcrash function ElectronBindings::Crash no longer worked and the renderer processno longer crashed when process.crash() was called. We don't want to use Node.js'default signal handling in the renderer process, so we disable it by passingkNoDefaultSignalHandling to node::InitializeOncePerProcess.* build: only create cppgc heap on non-32 bit platforms* chore: clean up util:CompileAndCall* src: fix compatility with upcoming V8 12.1 APIsnodejs/node#50709* fix: use thread_local BuiltinLoader* chore: fixup v8 patch indices---------Co-authored-by: Keeley Hammond <vertedinde@electronjs.org>Co-authored-by: Samuel Attard <marshallofsound@electronjs.org>
* chore: upgrade to Node.js v20* src: allow embedders to override NODE_MODULE_VERSIONnodejs/node#49279* src: fix missing trailing ,nodejs/node#46909* src,tools: initialize cppgcnodejs/node#45704* tools: allow passing absolute path of config.gypi in js2cnodejs/node#49162* tools: port js2c.py to C++nodejs/node#46997* doc,lib: disambiguate the old term, NativeModulenodejs/node#45673* chore: fixup Node.js BSSL tests*nodejs/node#49492*nodejs/node#44498* deps: upgrade to libuv 1.45.0nodejs/node#48078* deps: update V8 to 10.7nodejs/node#44741* test: use gcUntil() in test-v8-serialize-leaknodejs/node#49168* module: make CJS load from ESM loadernodejs/node#47999* src: make BuiltinLoader threadsafe and non-globalnodejs/node#45942* chore: address changes to CJS/ESM loading* module: make CJS load from ESM loader (nodejs/node#47999)* lib: improve esm resolve performance (nodejs/node#46652)* bootstrap: optimize modules loaded in the built-in snapshotnodejs/node#45849* test: mark test-runner-output as flakynodejs/node#49854* lib: lazy-load deps in modules/run_main.jsnodejs/node#45849* url: use private properties for brand checknodejs/node#46904* test: refactor `test-node-output-errors`nodejs/node#48992* assert: deprecate callTrackernodejs/node#47740* src: cast v8::Object::GetInternalField() return value to v8::Valuenodejs/node#48943* test: adapt test-v8-stats for V8 updatenodejs/node#45230* tls: ensure TLS Sockets are closed if the underlying wrapclosesnodejs/node#49327* test: deflake test-tls-socket-closenodejs/node#49575* net: fix crash due to simultaneous close/shutdown on JS Stream Socketsnodejs/node#49400* net: use asserts in JS Socket Stream to catch races in futurenodejs/node#49400* lib: fix BroadcastChannel initialization locationnodejs/node#46864* src: create BaseObject with node::Realmnodejs/node#44348* src: implement DataQueue and non-memory resident Blobnodejs/node#45258* sea: add support for V8 bytecode-only cachingnodejs/node#48191* chore: fixup patch indices* gyp: put filenames in variablesnodejs/node#46965* build: modify js2c.py into GN executable* fix: (WIP) handle string replacement of fs -> original-fs* [v20.x] backport vm-related memoryfixesnodejs/node#49874* src: make BuiltinLoader threadsafe and non-globalnodejs/node#45942* src: avoid copying string in fs_permissionnodejs/node#47746* look upon my works ye mightyand dispair* chore: patch cleanup* [api] Remove AllCan Read/Writehttps://chromium-review.googlesource.com/c/v8/v8/+/5006387* fix: missing include for NODE_EXTERN* chore: fixup patch indices* fix: fail properly when js2c fails in Node.js* build: fix js2c root_gen_dir* fix: lib/fs.js -> lib/original-fs.js* build: fix original-fs file xforms* fixup! module: make CJS load from ESM loader* build: get rid of CppHeap for now* build: add patch to prevent extra fs lookup on esm load* build: greatly simplify js2c modificationsMoves our original-fs modifications back into a super simple python script action, wires up the output of that action into our call to js2c* chore: update to handle moved internal/modules/helpers file* test: update @types/node test* feat: enable preventing cppgc heap creation* feat: optionally prevent calling V8::EnableWebAssemblyTrapHandler* fix: no cppgc initialization in the renderer* gyp: put filenames in variablesnodejs/node#46965* test: disable single executable tests* fix: nan tests failing on node headers missing file* tls,http2: send fatal alert on ALPN mismatchnodejs/node#44031* test: disable snapshot tests*nodejs/node#47887*nodejs/node#49684*nodejs/node#44193* build: use deps/v8 for v8/toolsNode.js hard depends on these in their builtins* test: fix edge snapshot stack tracesnodejs/node#49659* build: remove js2c //base dep* build: use electron_js2c_toolchain to build node_js2c* fix: don't create SafeSet outside packageResolveFixes failure in parallel/test-require-delete-array-iterator:=== release test-require-delete-array-iterator ===Path: parallel/test-require-delete-array-iteratornode:internal/per_context/primordials:426 constructor(i) { super(i); } // eslint-disable-line no-useless-constructor ^TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator)) at new Set (<anonymous>) at new SafeSet (node:internal/per_context/primordials:426:22)* fix: failing crashReporter tests on LinuxThese were failing because our change from node::InitializeNodeWithArgs tonode::InitializeOncePerProcess meant that we now inadvertently calledPlatformInit, which reset signal handling. This meant that our intentionalcrash function ElectronBindings::Crash no longer worked and the renderer processno longer crashed when process.crash() was called. We don't want to use Node.js'default signal handling in the renderer process, so we disable it by passingkNoDefaultSignalHandling to node::InitializeOncePerProcess.* build: only create cppgc heap on non-32 bit platforms* chore: clean up util:CompileAndCall* src: fix compatility with upcoming V8 12.1 APIsnodejs/node#50709* fix: use thread_local BuiltinLoader* chore: fixup v8 patch indices---------Co-authored-by: Keeley Hammond <vertedinde@electronjs.org>Co-authored-by: Samuel Attard <marshallofsound@electronjs.org>
Uh oh!
There was an error while loading.Please reload this page.
This fixes a potential HTTP/2 segfault (#49307,#48567) caused by underlying TLS behaviour, and likely fixes various other apparently related issues (#46094,#35695).
I found this while investigating#48519, which this doesn'tdirectly fix, but definitely relates to, as the JS Stream Sockets there have all sorts of issues directly caused by TLS interactions after the JSS is closed.
The underlying problem here is that TLS sockets which are created by passing an existing socket (either
tlsServer.emit('connection', socket)
ortls.createConnection({ ..., socket })
) do not listen to whether the underlying socket was closed elsewhere. This is easy to demo:This prints:
With this change, the last line is instead
false false
.Clearly the TLS socket should not be readable or writeable 100ms after it's definitely disconnected. This happens even if something is reading data - the data just stops, and the affected TLS socket (the client in this case) never closes.
These TLSSockets do seem to shut down correctly if theremote side closes the connection, but in any other scenario, the TLS connection will remain happily active like this even when the underlying socket is gone, until something more serious interacts with its internals, which then explode (either segfaults in normal cases, or the various
finishWrite
errors in the issues above in JS Stream Socket cases).The first segfault is definitely caused by this case, and I've added a new test for that.
I'mfairly confident this is the underlying issue for the other cases too - note that both cases use the manual socket handling pattern (with HTTP/2, but implicitly on top of TLS) - but neither has a reliable repro to confirm that.
This change notably significantly modifies an existing TLS test too. Unfortunately with this change in place, that test no longer works at all, as it tries to mess with internals to trigger a race condition that crashed in some Node v7 versions, and those internals are now cleaned up before they can be messed with. This test has been simplified to a more general TLS shutdown test instead.
There is some history to this - very similar code did used to exist!
@addaleax and@ronag will likely be interested in this, since they were involved at various stages of that.
I think there's a good case here that something explicitly handling thisis necessary. Somehow, TLS sockets need to be informed if the underlying socket is closed. Note that this change as here isnot specific to StreamWraps, unless some past changes. AFAICT this is required here for all manually constructed TLS cases - the new test here just passes a net.Socket directly and it still crashes on all current Node versions.