Trying to close #1287 and #783 by using dyn on ContainerType and dyn on create_type.
The bottleneck was container_type, based on cargo llvm-lines output for our org's backend:
  Lines                   Copies                Function name
  -----                   ------                -------------
  31446347                848007                (TOTAL)
  1384736 (4.4%, 4.4%)    4464 (0.5%, 0.5%)     async_graphql::resolver_utils::container::Fields::add_set::{{closure}}
  762755 (2.4%, 6.8%)     1730 (0.2%, 0.7%)     async_graphql::registry::Registry::create_type
  704576 (2.2%, 9.1%)     33159 (3.9%, 4.6%)    core::result::Result<T,E>::map_err
  601957 (1.9%, 11.0%)    1121 (0.1%, 4.8%)     async_graphql::resolver_utils::container::Fields::add_set
  411070 (1.3%, 12.3%)    1111 (0.1%, 4.9%)     async_graphql::resolver_utils::container::resolve_container_inner::{{closure}}
  407835 (1.3%, 13.6%)    795 (0.1%, 5.0%)      async_graphql::resolver_utils::list::resolve_list::{{closure}}
  383187 (1.2%, 14.8%)    1618 (0.2%, 5.2%)     <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_nex
I've created a reproduction by forking rust-graphql-pref and adding 100 controllers to get a clearer benchmark: rust-graphql-pref-100-controllers
and running:
  CARGO_PROFILE_RELEASE_LTO=fat cargo llvm-lines --release
Before:
  Lines                   Copies                Function name
  -----                   ------                -------------
  7753978                 167295                (TOTAL)
  447659 (5.8%, 5.8%)     790 (0.5%, 0.5%)      async_graphql::registry::Registry::create_type
  316899 (4.1%, 9.9%)     832 (0.5%, 1.0%)      async_graphql::resolver_utils::container::Fields::add_set::{{closure}}
  141296 (1.8%, 11.7%)    213 (0.1%, 1.1%)      async_graphql::resolver_utils::container::Fields::add_set
  127600 (1.6%, 13.3%)    200 (0.1%, 1.2%)      async_graphql::resolver_utils::list::resolve_list::{{closure}}
  122281 (1.6%, 14.9%)    4808 (2.9%, 4.1%)     core::result::Result<T,E>::map_err
  122203 (1.6%, 16.5%)    414 (0.2%, 4.3%)      <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
  112600 (1.5%, 17.9%)    200 (0.1%, 4.5%)      <async_graphql::types::merged_object::MergedObject<A,B
After applying this patch:
  Lines                   Copies                Function name
  -----                   ------                -------------
  7000605                 170073                (TOTAL)
  138200 (2.0%, 2.0%)     200 (0.1%, 0.1%)      async_graphql::resolver_utils::list::resolve_list::{{closure}}::resolve_list_inner::{{closure}}
  122281 (1.7%, 3.7%)     4808 (2.8%, 2.9%)     core::result::Result<T,E>::map_err
  122203 (1.7%, 5.5%)     414 (0.2%, 3.2%)      <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
  112600 (1.6%, 7.1%)     200 (0.1%, 3.3%)      <async_graphql::types::merged_object::MergedObject<A,B> as async_graphql::base::OutputTypeMarker>::create_type_info::{{closure}}
  106098 (1.5%, 8.6%)     414 (0.2%, 3.5%)      <futures_util::future::try_join_all::TryJoinAll<F> as core::future::future::Future>::poll
  97277 (1.4%, 10.0%)     3971 (2.3%, 5.9%)     alloc::boxed::Box<T>::new
  95064 (1.4%, 11.3%)     204 (0.1%, 6.0%)      async_graphql::resolver_utils::container::resolve_container_inner::{{closure}}
With the patch, the total number of lines generated for LLVM drops by about 750K (from 7,753,978 to 7,000,605).
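As a rough illustration of why switching to dyn can shrink the cargo llvm-lines total: a generic function is instantiated once per concrete type, while a function that takes a trait object is compiled once. The sketch below is not async-graphql code; the Container trait, the two controller structs, and both describe functions are hypothetical names made up for the example.

```rust
// Hypothetical stand-in for a trait such as ContainerType (not the real API).
trait Container {
    fn type_name(&self) -> &'static str;
    fn field_count(&self) -> usize;
}

// Two hypothetical "controllers"; a real schema may have hundreds.
struct UserController;
struct OrderController;

impl Container for UserController {
    fn type_name(&self) -> &'static str { "User" }
    fn field_count(&self) -> usize { 3 }
}

impl Container for OrderController {
    fn type_name(&self) -> &'static str { "Order" }
    fn field_count(&self) -> usize { 5 }
}

// Generic version: the compiler emits a separate copy of this body for
// every concrete `T` it is called with (monomorphization).
fn describe_generic<T: Container>(c: &T) -> String {
    format!("{} has {} fields", c.type_name(), c.field_count())
}

// Trait-object version: one compiled body; the method calls go through
// a vtable at runtime instead of being duplicated per type.
fn describe_dyn(c: &dyn Container) -> String {
    format!("{} has {} fields", c.type_name(), c.field_count())
}
```

Both functions return the same strings, but calling describe_generic with 100 different controller types would produce 100 instantiations, whereas describe_dyn stays at one. The trade-off is an indirect call per method.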
Then I compiled with -Zself-profile and ran summarize summarize your_prof_data before and after this patch.
Before:
+--------------------------------------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+
| Item                                                                     | Self time | % of total time | Time     | Item count | Cache hits | Blocked time |
+--------------------------------------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+
| evaluate_obligation ................................................... | 235.47s   | 19.629          | 235.81s  | 665649     | 1048674    | 288.85ms     |
| LLVM_module_optimize .................................................. | 149.95s   | 12.499          | 149.95s  | 17         | 0          | 0.00ns       |
| LLVM_lto_optimize ..................................................... | 104.60s   | 8.719           | 104.60s  | 16         | 0          | 0.00ns       |
| LLVM_module_codegen_emit_obj .......................................... | 73.40s    | 6.119           | 73.40s   | 17         | 0          | 0.00ns       |
| LLVM_passes ........................................................... | 47.90s    | 3.993           | 48.06s   | 1          | 0          | 0.00ns       |
| finish_ongoing_codegen ................................................ | 39.10s    | 3.259           | 39.10s   | 1          | 0          | 0.00ns       |
| codegen_module ........................................................ | 34.43s    | 2.870           | 36.26s   | 16         | 0          | 0.00ns       |
After:
+--------------------------------------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+
| Item                                                                     | Self time | % of total time | Time     | Item count | Cache hits | Blocked time |
+--------------------------------------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+
| evaluate_obligation ................................................... | 231.93s   | 20.581          | 232.27s  | 668701     | 1128934    | 175.54ms     |
| LLVM_module_optimize .................................................. | 130.91s   | 11.617          | 130.91s  | 17         | 0          | 0.00ns       |
| LLVM_lto_optimize ..................................................... | 89.99s    | 7.986           | 89.99s   | 16         | 0          | 0.00ns       |
| LLVM_module_codegen_emit_obj .......................................... | 65.29s    | 5.794           | 65.29s   | 17         | 0          | 0.00ns       |
| LLVM_passes ........................................................... | 38.23s    | 3.393           | 38.40s   | 1          | 0          | 0.00ns       |
| codegen_module ........................................................ | 32.41s    | 2.876           | 32.97s   | 16         | 0          | 0.00ns       |
| finish_ongoing_codegen ................................................ | 29.72s    | 2.637           | 29.72s   | 1          | 0          | 0.00ns       |
That is about 1 minute lower, and for larger codebases like my org's, the build went from 30-35 minutes to 22-25 minutes.
All of my findings are reproducible through my rust-graphql-pref fork.
dyn on ContainerType works only when the boxed-trait feature is enabled, not with the default features, because a trait method that returns impl Future is not dyn compatible.
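The dyn-compatibility restriction can be sketched with two small traits. This is a minimal illustration using only the standard library, not async-graphql's actual trait definitions: StaticResolver, BoxedResolver, Answer, resolve_all, and the tiny block_on helper are all hypothetical names, and std::future::ready stands in for what would normally be an async block.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Returns `impl Future`: each implementor has its own opaque future type,
// so this trait is NOT dyn compatible (`Box<dyn StaticResolver>` is rejected).
trait StaticResolver {
    fn resolve_static(&self) -> impl Future<Output = u32>;
}

// Returns a boxed future: the return type is one concrete type for all
// implementors, so `dyn BoxedResolver` is allowed.
trait BoxedResolver {
    fn resolve_boxed<'a>(&'a self) -> Pin<Box<dyn Future<Output = u32> + 'a>>;
}

struct Answer(u32);

impl StaticResolver for Answer {
    fn resolve_static(&self) -> impl Future<Output = u32> {
        std::future::ready(self.0) // an async block would work here too
    }
}

impl BoxedResolver for Answer {
    fn resolve_boxed<'a>(&'a self) -> Pin<Box<dyn Future<Output = u32> + 'a>> {
        Box::pin(std::future::ready(self.0))
    }
}

// Minimal executor for the example only: it busy-polls, so it is suitable
// just for futures that complete immediately. Real code would use a runtime.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Works only with the boxed flavour: one compiled body for every resolver.
fn resolve_all(resolvers: &[Box<dyn BoxedResolver>]) -> u32 {
    resolvers.iter().map(|r| block_on(r.resolve_boxed())).sum()
}
```

Replacing BoxedResolver with StaticResolver in resolve_all's signature fails to compile, which mirrors why the dyn path here depends on the boxed-trait feature. The cost of the boxed form is one heap allocation per call.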