- Notifications
You must be signed in to change notification settings - Fork4.5k
Fix close in use certificate providers after doubleClose()
method call on wrapper object#7128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
…call on wrapper objectFix errors like:```[core][Channelgrpc#39 SubChannelgrpc#941] grpc: addrConn.createTransport failed to connect to ...Err: connection error: desc = "transport: authentication handshake failed:xds: fetching trusted roots from CertificateProvider failed: provider instance is closed"```
arvindbr8 commentedApr 17, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@bozaro -- Calling Close() feels more like a User error to me and wouldnt categorize this as a bug. I agree that the API could have been better by not providing a Close() method which can be called multiple times, which would cause incorrect accounting wrt That being said, inorder to fix this, I would rather makeProvider.Close agrpcsync.OnceFunc. That way we can cleanly assert that a reference can only decrement their reference when calling Close(). |
bozaro commentedApr 18, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I think that even if the double The situation when all neighboring uses must be written correctly for your code to work is very fragile. I have found at least two places through which double
I can also add a wrapper in |
@bozaro : Thank you for figuring this out and sending us a PR. There are a few options available to fix this issue:
Instead the approach that we like (which is quite similar to your approach) is as follows:
typesingleCloseWrappedProviderstruct {mu sync.MutexproviderProvider}func (s*singleCloseWrappedProvider)KeyMaterial(ctx context.Context) (*KeyMaterial,error) {s.mu.Lock()p:=s.providers.mu.Unlock()returnp.KeyMaterial(ctx)}func (s*singleCloseWrappedProvider)Close() {s.mu.Lock()s.provider.Close()s.provider=panickingProvider{}s.mu.Unlock()}
typepanickingProviderstruct{}func (panickingProvider)Close() {panic("Close called more than once on certificate provider")}func (panickingProvider)KeyMaterial(context.Context) (*KeyMaterial,error) {returnnil,errProviderClosed}
Please let me know your thoughts on this approach. |
385dd0d
tofd7af42
Compare…method call on wrapper object
fd7af42
to64fdac1
Compare
This approach has the opposite problem - you can use Provider after closing if it is used by someone else.
I don't think double closure is a problem that needs to be dealt with:
funcfoo()error {x,err:=os.Open(fileName)iferr!=nil {returnerr }deferx.Close()// ...iferr:=x.Close();err!=nil {returnerr }// ...returnnil}
I have corrected the PR, except that I do not panic on double |
@easwars, as a result, I'm trying to make the |
I agree that there are things like files and conns, where Go supports multiple calls to close. But there are things like channels, which panic when closed more than once. But making the double close lead to I agree that both approaches (whether we panic or not on double close) result in |
@easwars, I tried adding panic when closing twice:https://github.com/bozaro/grpc-go/pull/2/files (all unit tests green). But:
|
@dfawley for second set of eyes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
When using the server in production with xds:/// + mTLS, there is guaranteed (or very often) to be a double Close() call at the shutdown stage.
This is concerning, as it indicates there may be a problem with either the basic usage of the provider itself, or the ownership model surrounding the provider. Can you provide any more information about how this happens?
} | ||
// singleCloseWrappedProvider wraps a provider instance with a reference count to avoid double | ||
// close still in use provider. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Nit:
... with a reference count to properly handle multiple calls to Close.
w.mu.Lock() | ||
defer w.mu.Unlock() | ||
w.provider.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we do this call without the lock held?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I have replaced the use ofsync.RWMutex
withatomic.Pointer
…method call on wrapper object
…method call on wrapper object
eccee0e
to527266f
Compare…method call on wrapper object
The most common example of double First
Second
|
@bozaro : Thank you very much for the stack traces. I think this is what is happening:
Also, I see that we have defined this error sentinel grpc-go/credentials/credentials.go Line 125 ine22436a
I feel that the xDS credentials implementation should return this error in the case where the server handshake failed, and it ended up calling @dfawley |
Interesting sequence of events. It seems if we use TLS to wrap a connection, we need to allow for double-close (either in
|
Do I need to do anything with this PR (e.g. rebase)? |
Uh oh!
There was an error while loading.Please reload this page.
@dfawley : Based on our offline discussion, the changes here reflect mostly what we want. We can track the double close in the TLS code path separately (if required, and if we can do anything actionable there). This is waiting for your second review, apart from the minor nit that I have above. |
Uh oh!
There was an error while loading.Please reload this page.
24e9024
intogrpc:masterUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
wrappedProvider
add refcounter tocertprovider.Provider
, but there no guards to prevent multipleClose()
call after singleBuild()
call.The situation is complicated by:
certprovider.Provider
implementation don't panic/error on multipleClose()
calls.Fix errors like:
RELEASE NOTES:
xds: fetching trusted roots from CertificateProvider failed: provider instance is closed
error