- Notifications
You must be signed in to change notification settings - Fork1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Fix the stack overflow in the stack overflow handler on Windows!#1289
Conversation
When backtrace recording is enabled, the alternative stack isoverflowing before caml_raise_exception has a chance to restore theoriginal stack. Seen with VS 2015 and VS 2017, but dependent on thetarget machine (Windows 10 1703 consistently demonstrating, AppVeyor'sServer 2012 instances similarly but Windows 10 1511 and Azure WindowsServer 2016 both apparently fine - all with the same binary).
Glad to see this ! For information,#1200 also triggers this systematically. |
@nojb - excellent! I've been investigating so many things around it on so many systems and then suddenly realised cycling to work today that it was simply blowing the second stack!! |
Good - CI pass. /cc @ondrieu,@xavierleroy and anyone else familiar with the stack backtrace machinery to review the reasoning behind this change... |
A bigger alt stack cannot hurt, and 128 words (the current size) is small indeed. Let's try the bigger stack. |
#1070 has revealed a slight flaw in#938 in that the native code stack overflow test is failing. My investigations have determined that the most likely cause of this is that
win32_alt_stack
declared inbyterun/win32.c
is in certain circumstances not large enough.This GPR is arriving in two bits, just so that I can get more AppVeyor logs - the first commit being added alters the stack overflow test to record backtraces for the second run. On AppVeyor and on my Windows 10 1703 machines, this is sufficient to cause the native code executable to segfault. Interestingly, on Windows Server 2016 instances running on Azure and also on my Windows 10 1511 box, the same binary works successfully. I'm making slightly hand-wavy assertions that this is a mixture of ASLR and possibly code injecty things (Windows Defender et al) causing this. win32_alt_stack does not have a guard page, so I believe I'm correct that if it overruns, we're just looking at chaos - hence the strange fact that#1070 appears to have triggered it. On machines where it fails, it seems to fail with both msvc32 and msvc64 but not with mingw32 or mingw64. I've been testing VS 2015 and VS 2017, so I'm guessing it's just down to the different runtimes possibly needing a teensy bit more stack between the exception handler completing and
caml_raise_exception
restoring the old one.As far as I can tell, just doubling the size of
win32_alt_stack
is enough to mitigate this.