Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix close in use certificate providers after doubleClose() method call on wrapper object#7128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
easwars merged 5 commits intogrpc:masterfrombozaro:fix-certificate-provider-close
May 29, 2024

Conversation

bozaro
Copy link
Contributor

@bozarobozaro commentedApr 16, 2024
edited
Loading

wrappedProvider add refcounter tocertprovider.Provider, but there no guards to prevent multipleClose() call after singleBuild() call.

The situation is complicated by:

  • the error does not appear where there was a double closure;
  • before the rotation of certificates, the problem may not manifest itself (that is, the effect is greatly delayed in time);
  • certprovider.Provider implementation don't panic/error on multipleClose() calls.

Fix errors like:

[core][Channel #39 SubChannel #941] grpc: addrConn.createTransport failed to connect to ...Err: connection error: desc = "transport: authentication handshake failed:xds: fetching trusted roots from CertificateProvider failed: provider instance is closed"

RELEASE NOTES:

  • Fixxds: fetching trusted roots from CertificateProvider failed: provider instance is closed error

…call on wrapper objectFix errors like:```[core][Channelgrpc#39 SubChannelgrpc#941] grpc: addrConn.createTransport failed to connect to ...Err: connection error: desc = "transport: authentication handshake failed:xds: fetching trusted roots from CertificateProvider failed: provider instance is closed"```
@aranjansaranjans added this to the1.64 Release milestoneApr 17, 2024
@arvindbr8
Copy link
Member

arvindbr8 commentedApr 17, 2024
edited
Loading

@bozaro -- Calling Close() feels more like a User error to me and wouldnt categorize this as a bug. I agree that the API could have been better by not providing a Close() method which can be called multiple times, which would cause incorrect accounting wrtwp.refCount.

That being said, inorder to fix this, I would rather makeProvider.Close agrpcsync.OnceFunc. That way we can cleanly assert that a reference can only decrement their reference when calling Close().

@bozaro
Copy link
ContributorAuthor

bozaro commentedApr 18, 2024
edited
Loading

Calling Close() feels more like a User error to me and wouldnt categorize this as a bug.

I think that even if the doubleClose() call is considered a user error, it is useful to do a check here to localize the consequences of it and not play the whack-a-mole game.

The situation when all neighboring uses must be written correctly for your code to work is very fragile.

I have found at least two places through which doubleClose() is possible:

That being said, inorder to fix this, I would rather makeProvider.Close agrpcsync.OnceFunc.

I can also add a wrapper ingrpcsync.OnceFunc, but this will not make the code simpler: I still need to check the closing status to return an error fromprovider.KeyMaterial of the closed wrapper.

@easwars
Copy link
Contributor

@bozaro : Thank you for figuring this out and sending us a PR.

There are a few options available to fix this issue:

  • Make breaking API changes to this package
    • Remove theClose method from theProvider interface
    • Change the signature of thestarter argument passed toNewBuildableConfig to return a tupleProvider, func(). The second return value is aclose function to be invoked when the provider is to be closed.
    • ChangeBuildableConfig.Build() to return three values:(Provider, func(), error). The second return value would the close function returned by the provider wrapped in agrpcsync.OnceFunc.
    • With this approach, the user of a provider will be able to close it how many ever times they want, and things will still continue to work cleanly. But this approach could hide potential bugs in the user's code that never show up in their tests, but only happen in production. Therefore we want to avoid this approach
  • The approach taken by you, where we return a separate per-instantiation wrapper that wraps the currentwrappedProvider.
    • With this approach, the user of a provider instance that has calledClose will start seeingerrProviderClosed on subsequent invocations ofKeyMaterial()
    • While this approach is a significant improvement over the existing code, this still will not help the user easily debug when and where they are double closing.

Instead the approach that we like (which is quite similar to your approach) is as follows:

  • Have a per-instantiation wrapper that wraps the currentwrappedProvider. Call it something else,wrappedProviderCloser is too close towrappedProvider. Maybe something likesingleCloseWrappedProvider orsingleRefWrappedProvider or something better.
typesingleCloseWrappedProviderstruct {mu       sync.MutexproviderProvider}func (s*singleCloseWrappedProvider)KeyMaterial(ctx context.Context) (*KeyMaterial,error) {s.mu.Lock()p:=s.providers.mu.Unlock()returnp.KeyMaterial(ctx)}func (s*singleCloseWrappedProvider)Close() {s.mu.Lock()s.provider.Close()s.provider=panickingProvider{}s.mu.Unlock()}
  • Have a provider implementation that panics in itsClose() method.
typepanickingProviderstruct{}func (panickingProvider)Close() {panic("Close called more than once on certificate provider")}func (panickingProvider)KeyMaterial(context.Context) (*KeyMaterial,error) {returnnil,errProviderClosed}
  • ChangeBuildableConfig.Build() to always return an instance ofsingleCloseWrappedProvider wrapping a newly createdwrappedProvider or an existing one (in which case, the ref count on it would have to be incremented, as is done today)

Please let me know your thoughts on this approach.

@bozarobozaroforce-pushed thefix-certificate-provider-close branch fromfd7af42 to64fdac1CompareMay 11, 2024 07:17
@bozaro
Copy link
ContributorAuthor

Make breaking API changes to this package...

This approach has the opposite problem - you can use Provider after closing if it is used by someone else.

The approach taken by you, where we return a separate per-instantiation wrapper that wraps the current wrappedProvider...

I don't think double closure is a problem that needs to be dealt with:

  1. Go often uses a pattern like:
funcfoo()error {x,err:=os.Open(fileName)iferr!=nil {returnerr   }deferx.Close()// ...iferr:=x.Close();err!=nil {returnerr   }// ...returnnil}
  1. Errors will be localized:
    • Close of one wrapper will not break neighboring ones;
    • An unexpected Close wrapper will break code that continues to use the same wrapper;
  2. The current Provider implementation ignores doubleClose calls.

Instead the approach that we like (which is quite similar to your approach) is as follows:

I have corrected the PR, except that I do not panic on doubleClose call.

@bozaro
Copy link
ContributorAuthor

@easwars, as a result, I'm trying to make thewrappedProvider have the same behavior as thecertprovider.Provider itself.

@easwars
Copy link
Contributor

I don't think double closure is a problem that needs to be dealt with:

I agree that there are things like files and conns, where Go supports multiple calls to close. But there are things like channels, which panic when closed more than once.

But making the double close lead topanic in our case, we can ensure that our users are notified right at the moment where they are doing something wrong (i.e closing the same provider more than once). And this also ensures that this can be caught in their testing.

I agree that both approaches (whether we panic or not on double close) result inKeyMaterial returningerrProviderClosed when a closed provider is used by the user. And this also possibly be caught in their testing.

@bozaro
Copy link
ContributorAuthor

@easwars, I tried adding panic when closing twice:https://github.com/bozaro/grpc-go/pull/2/files (all unit tests green).

But:

  • This is a breaking change.
  • I don't really understand what thispanic gives in this case from a practical point of view.
  • Unfortunately, I am not able to reproduce problems with doubleClose() in a test environment, only in production (requirexds:/// to reproduce issue).
  • When using the server in production withxds:/// + mTLS, there is guaranteed (or very often) to be a doubleClose() call at the shutdown stage.

@easwars
Copy link
Contributor

@dfawley for second set of eyes.

Copy link
Member

@dfawleydfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

When using the server in production with xds:/// + mTLS, there is guaranteed (or very often) to be a double Close() call at the shutdown stage.

This is concerning, as it indicates there may be a problem with either the basic usage of the provider itself, or the ownership model surrounding the provider. Can you provide any more information about how this happens?

}

// singleCloseWrappedProvider wraps a provider instance with a reference count to avoid double
// close still in use provider.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nit:

... with a reference count to properly handle multiple calls to Close.

w.mu.Lock()
defer w.mu.Unlock()

w.provider.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we do this call without the lock held?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I have replaced the use ofsync.RWMutex withatomic.Pointer

@bozarobozaroforce-pushed thefix-certificate-provider-close branch fromeccee0e to527266fCompareMay 17, 2024 05:09
@bozaro
Copy link
ContributorAuthor

Can you provide any more information about how this happens?

The most common example of doubleClose() (I don't think it's the only one):

FirstClose() call:

google.golang.org/grpc/xds/internal/server.(*connWrapper).Close(0x4008382d80)go/pkg/mod/google.golang.org/grpc@v1.63.2/xds/internal/server/conn_wrapper.go:162 +0x50crypto/tls.(*Conn).Close(0x40031f6e08?)GOROOT/src/crypto/tls/conn.go:1429 +0xf4google.golang.org/grpc/credentials/xds.(*credsImpl).ServerHandshake(0x40048b0030, {0x36bf288, 0x4008382d80})go/pkg/mod/google.golang.org/grpc@v1.63.2/credentials/xds/xds.go:250 +0x3f8google.golang.org/grpc/internal/transport.NewServerTransport({0x36bf288, 0x4008382d80}, 0x400170be58)go/pkg/mod/google.golang.org/grpc@v1.63.2/internal/transport/http2_server.go:146 +0x84google.golang.org/grpc.(*Server).newHTTP2Transport(0x40019f2a00, {0x36bf288, 0x4008382d80})go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:974 +0xfcgoogle.golang.org/grpc.(*Server).handleRawConn(0x40019f2a00, {0x4001db7f80, 0x9}, {0x36bf288, 0x4008382d80})go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:933 +0x94google.golang.org/grpc.(*Server).Serve.func3()go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:917 +0x64created by google.golang.org/grpc.(*Server).Serve in goroutine 429go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:916 +0x4f4

SecondClose() call:

google.golang.org/grpc/xds/internal/server.(*connWrapper).Close(0x4008382d80)go/pkg/mod/google.golang.org/grpc@v1.63.2/xds/internal/server/conn_wrapper.go:162 +0x50google.golang.org/grpc.(*Server).newHTTP2Transport(0x40019f2a00, {0x36bf288, 0x4008382d80})go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:986 +0x37cgoogle.golang.org/grpc.(*Server).handleRawConn(0x40019f2a00, {0x4001db7f80, 0x9}, {0x36bf288, 0x4008382d80})go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:933 +0x94google.golang.org/grpc.(*Server).Serve.func3()go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:917 +0x64created by google.golang.org/grpc.(*Server).Serve in goroutine 429go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:916 +0x4f4
easwars reacted with thumbs up emoji

@easwars
Copy link
Contributor

@bozaro : Thank you very much for the stack traces.

I think this is what is happening:

  • xDS enabled gRPC server gets a connection
  • This server uses a customnet.Listener which accepts the connection and returns aconnWrapper here:
    cw:=&connWrapper{Conn:conn,filterChain:fc,parent:l,urc:fc.UsableRouteConfiguration}
  • The gRPC server then spawns a goroutine to handle this connection here:
    s.handleRawConn(lis.Addr().String(),rawConn)
  • As part of handling the new connection, an HTTP/2 server transport is created here:
    st,err:=transport.NewServerTransport(c,config)
    • If the server transport creation fails and the returned error value is notcredentials.ErrConnDispatched, the rawConn is closed, which ends up callingClose on theconnWrapper (this will be the second close)
  • As part of creating the HTTP/2 server transport, the server TLS handshake is initiated here:
    conn,authInfo,err=config.Credentials.ServerHandshake(rawConn)
  • This calls toServerHandshake is handled by the xDS credentials implementation here:
    func (c*credsImpl)ServerHandshake(rawConn net.Conn) (net.Conn, credentials.AuthInfo,error) {
  • It wraps the rawConn in atls.Conn here:
    conn:=tls.Server(rawConn,cfg)
  • It then performs the TLS handshake by calling theHandshake method on thetls.Conn here:
    iferr:=conn.Handshake();err!=nil {
    • And if the handshake fails, then it callsClose on thetls.Conn, which ends up callingClose on the wrapped conn, which essentially lands inconnWrapper (this will be the first close)

Also, I see that we have defined this error sentinelErrConnDispatched here:

varErrConnDispatched=errors.New("credentials: rawConn is dispatched out of gRPC")
, but none of the credentials implementations are returning this error.

I feel that the xDS credentials implementation should return this error in the case where the server handshake failed, and it ended up callingClose on thetls.Conn.

@dfawley
Do you know some history behind this error sentinel? What do you think about returning this value from the xDS creds implementation in the above case? Thanks.

@dfawley
Copy link
Member

Dispatched is used to allow another protocol to handle the connection. That's not intended for this kind of use case.

Interesting sequence of events. It seems if we use TLS to wrap a connection, we need to allow for double-close (either inconnWrapper orcertprovider), because:

  1. we can't start requiring that if any customServerHandshake errors, that it closes the connection (as that would be a behavior change), and
  2. we probably shouldn't remove theconn.Close from our TLS handshaker, because not calling theClose method on atls.Conn is probably a bad idea (even if it doesn't leak anything today).

@dfawleydfawley assignedeaswars and unassigneddfawleyMay 20, 2024
@bozaro
Copy link
ContributorAuthor

Do I need to do anything with this PR (e.g. rebase)?

@easwars
Copy link
Contributor

@dfawley : Based on our offline discussion, the changes here reflect mostly what we want. We can track the double close in the TLS code path separately (if required, and if we can do anything actionable there).

This is waiting for your second review, apart from the minor nit that I have above.

@dfawleydfawley assignedeaswars and unassigneddfawleyMay 29, 2024
@easwarseaswars merged commit24e9024 intogrpc:masterMay 29, 2024
1 check passed
bozaro added a commit to bozaro/grpc-go that referenced this pull requestMay 30, 2024
@github-actionsgithub-actionsbot locked asresolvedand limited conversation to collaboratorsNov 26, 2024
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@easwarseaswarseaswars approved these changes

@dfawleydfawleydfawley approved these changes

Assignees

@easwarseaswars

Labels
Projects
None yet
Milestone
1.65 Release
Development

Successfully merging this pull request may close these issues.

5 participants
@bozaro@arvindbr8@easwars@dfawley@aranjans

[8]ページ先頭

©2009-2025 Movatter.jp