Summary
Tables info can be queried faster by calling pg_total_relation_size only once per row, pg_total_relation_size is a volatile function, meaning that it would be called twice per row as volatile functions are not guaranteed the same result per table lookup, but that is not meaningful in this context
Additionally the group by expression can be removed entirely since only one row per oid can be produced by the join conditions
the speed up can be noticed specially in databases with large number of tables, to test this out I will set up a test database and measure the improvement
Set up the environment
We will create a postgres 12 database and populate it with 50000 tables with primary key
Using the following init.sql script:
-- init.sqlCREATETABLEIF NOT EXISTSpublic.client_types ( idSERIALPRIMARY KEY, nameVARCHAR(255));-- create multiple tables with foreign keys in publicDO $$DECLARE table_prefixTEXT :='client_catalog_'; number_of_tables_to_createINT :=50000;BEGIN-- create tables FOR iIN1..number_of_tables_to_create LOOP EXECUTE format(' CREATE TABLE IF NOT EXISTS public.%I ( LIKE public.client_types )' , table_prefix|| i::text); END LOOP;END $$;Let's start a postgres server that initializes with init.sql
❯ docker run --rm -p 15432:5432 -e POSTGRES_USER=example_user -e POSTGRES_PASSWORD=example_password -e POSTGRES_DB=example_db -v $PWD/init.sql:/docker-entrypoint-initdb.d/init.sql:ro --name pg_sql_bench postgres:12-alpine -c shared_buffers=512MB -c max_locks_per_transaction=1000
Validation
Using the two versions of the query:
current_query
-- current_query.sqlselectc.oid :: int8as"id!",nc.nspnameas schema,c.relnameas name,c.relkindas table_kind,c.relrowsecurityas rls_enabled,c.relforcerowsecurityas rls_forced, case whenc.relreplident='d' then'DEFAULT' whenc.relreplident='i' then'INDEX' whenc.relreplident='f' then'FULL' else'NOTHING' endas"replica_identity!", pg_total_relation_size(format('%I.%I',nc.nspname,c.relname)) :: int8as"bytes!", pg_size_pretty( pg_total_relation_size(format('%I.%I',nc.nspname,c.relname)) )as"size!", pg_stat_get_live_tuples(c.oid)as"live_rows_estimate!", pg_stat_get_dead_tuples(c.oid)as"dead_rows_estimate!", obj_description(c.oid)as commentfrom pg_namespace ncjoin pg_class connc.oid=c.relnamespacewherec.relkindin ('r','p','v','m')and not pg_is_other_temp_schema(nc.oid)and ( pg_has_role(c.relowner,'USAGE')or has_table_privilege(c.oid,'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER' )or has_any_column_privilege(c.oid,'SELECT, INSERT, UPDATE, REFERENCES') )group byc.oid,c.relname,c.relkind,c.relrowsecurity,c.relforcerowsecurity,c.relreplident,nc.nspname;new_query
-- new_query.sqlselectc.oid :: int8as"id!",nc.nspnameas schema,c.relnameas name,c.relkindas table_kind,c.relrowsecurityas rls_enabled,c.relforcerowsecurityas rls_forced, case whenc.relreplident='d' then'DEFAULT' whenc.relreplident='i' then'INDEX' whenc.relreplident='f' then'FULL' else'NOTHING' endas"replica_identity!", relation_size:: int8as"bytes!", pg_size_pretty(relation_size)as"size!", pg_stat_get_live_tuples(c.oid)as"live_rows_estimate!", pg_stat_get_dead_tuples(c.oid)as"dead_rows_estimate!", obj_description(c.oid)as commentfrom pg_namespace ncjoin pg_class connc.oid=c.relnamespacecross join lateral pg_total_relation_size(c.oid) relation_sizewherec.relkindin ('r','p','v','m')and not pg_is_other_temp_schema(nc.oid)and ( pg_has_role(c.relowner,'USAGE')or has_table_privilege(c.oid,'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER' )or has_any_column_privilege(c.oid,'SELECT, INSERT, UPDATE, REFERENCES') )The first step is to test that both queries produce the same results
❯ psql postgres://example_user:example_password@localhost:15432/example_db -t -A < current_query.sql | sort > current_query_results❯ psql postgres://example_user:example_password@localhost:15432/example_db -t -A < new_query.sql | sort > new_query_results❯ diff -s current_query_results new_query_resultsFiles current_query_results and new_query_results are identical
Performance analysis
Knowing both queries produce the same result lets check the performance
Prior to each hyperfine run we must restart the database to ensure queries run on equal context
❯ hyperfine 'psql postgres://example_user:example_password@localhost:15432/example_db -t -A < current_query.sql'Benchmark 1: psql postgres://example_user:example_password@localhost:15432/example_db -t -A < current_query.sql Time (mean ± σ): 1.670 s ± 0.379 s [User: 0.050 s, System: 0.041 s] Range (min … max): 1.542 s … 2.748 s 10 runs❯ hyperfine 'psql postgres://example_user:example_password@localhost:15432/example_db -t -A < new_query.sql'Benchmark 1: psql postgres://example_user:example_password@localhost:15432/example_db -t -A < new_query.sql Time (mean ± σ): 1.049 s ± 0.359 s [User: 0.046 s, System: 0.039 s] Range (min … max): 0.929 s … 2.071 s 10 runs
Mean and max time differ on 700ms
Summary
Tables info can be queried faster by calling pg_total_relation_size only once per row, pg_total_relation_size is a volatile function, meaning that it would be called twice per row as volatile functions are not guaranteed the same result per table lookup, but that is not meaningful in this context
Additionally the group by expression can be removed entirely since only one row per oid can be produced by the join conditions
the speed up can be noticed specially in databases with large number of tables, to test this out I will set up a test database and measure the improvement
Set up the environment
We will create a postgres 12 database and populate it with 50000 tables with primary key
Using the following init.sql script:
Let's start a postgres server that initializes with init.sql
Validation
Using the two versions of the query:
current_query
new_query
The first step is to test that both queries produce the same results
Performance analysis
Knowing both queries produce the same result lets check the performance
Prior to each hyperfine run we must restart the database to ensure queries run on equal context
Mean and max time differ on 700ms