|
| 1 | +Originally from:[tweet](https://twitter.com/samokhvalov/status/1730107298369171943),[LinkedIn post](). |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +#UUID v7 and partitioning (TimescaleDB) |
| 6 | + |
| 7 | +>I post a new PostgreSQL "howto" article every day. Join me in this |
| 8 | +>journey –[subscribe](https://twitter.com/samokhvalov/), provide feedback, share! |
| 9 | +
|
| 10 | +Ok, you've asked – here it is, a draft recipe to use UUIDv7 and partitioning (we'll |
| 11 | +use[@TimescaleDB](https://twitter.com/TimescaleDB)). It's not super elegant, might be not the best, and requires some |
| 12 | +effort to reach efficient plans (with partition pruning involved). If you have an alternative or improvement ideas in |
| 13 | +mind – let me know. |
| 14 | + |
| 15 | +We'll take the function by[@DanielVerite](https://twitter.com/DanielVerite) to generate UUIDv7 as basis: |
| 16 | + |
| 17 | +```sql |
| 18 | +create or replacefunctionuuid_generate_v7() returns uuid |
| 19 | +as $$ |
| 20 | +-- use random v4 uuid as starting point (which has the same variant we need) |
| 21 | +-- then overlay timestamp |
| 22 | +-- then set version 7 by flipping the 2 and 1 bit in the version 4 string |
| 23 | +select encode( |
| 24 | + set_bit( |
| 25 | + set_bit( |
| 26 | + overlay( |
| 27 | + uuid_send(gen_random_uuid()) |
| 28 | + placingsubstring(int8send(floor(extract(epochfrom clock_timestamp())*1000)::bigint)from3) |
| 29 | +from1 for6 |
| 30 | + ), |
| 31 | +52,1 |
| 32 | + ), |
| 33 | +53,1 |
| 34 | + ), |
| 35 | +'hex')::uuid; |
| 36 | +$$ language SQL volatile; |
| 37 | +``` |
| 38 | + |
| 39 | +##Helper functions, UUIDv7 <-> timestamptz |
| 40 | + |
| 41 | +Next, we'll create two functions: |
| 42 | + |
| 43 | +1.`ts_to_uuid_v7` – generate UUIDv7 based on any arbitrary`timestamptz` value, and |
| 44 | +2.`uuid_v7_to_ts` – extract`timestamptz` from the existing UUIDv7 value. |
| 45 | + |
| 46 | +Note that this approach is not what the authors of revised RFC4122 (that will likely be finalized soon) would encourage; |
| 47 | +see[this discussion and the words](https://postgresql.org/message-id/flat/C80B8FDB-8D9E-48A2-82A2-48863987A1B1%40yandex-team.ru#074a05d31c9ce38bee2f8c8097877485) |
| 48 | +by |
| 49 | +[@x4mmmmmm](https://twitter.com/x4mmmmmm): |
| 50 | + |
| 51 | +>... as far as I know, RFC discourages extracting timestamps from UUIDs. |
| 52 | +
|
| 53 | +Anyway, let's just do it: |
| 54 | + |
| 55 | +```sql |
| 56 | +create extension pgcrypto; |
| 57 | + |
| 58 | +create or replacefunctionts_to_uuid_v7(timestamptz) returns uuid |
| 59 | +as $$ |
| 60 | +select encode( |
| 61 | + set_bit( |
| 62 | + set_bit( |
| 63 | + overlay( |
| 64 | + uuid_send(gen_random_uuid()) |
| 65 | + placingsubstring(int8send(floor(extract(epochfrom $1)*1000)::bigint)from3) |
| 66 | +from1 for6 |
| 67 | + ), |
| 68 | +52,1 |
| 69 | + ), |
| 70 | +53,1 |
| 71 | + ), |
| 72 | +'hex')::uuid; |
| 73 | +$$ language SQL volatile; |
| 74 | + |
| 75 | +create or replacefunctionuuid_v7_to_ts(uuid_v7 uuid) returnstimestamptz |
| 76 | +as $$ |
| 77 | +select |
| 78 | + to_timestamp( |
| 79 | + ( |
| 80 | +'x'||substring( |
| 81 | + encode(uuid_send(uuid_v7),'hex') |
| 82 | +from1 for12 |
| 83 | + ) |
| 84 | + )::bit(48)::bigint/1000.0 |
| 85 | + )::timestamptz; |
| 86 | +$$ language sql; |
| 87 | +``` |
| 88 | + |
| 89 | +Checking the functions: |
| 90 | + |
| 91 | +```sql |
| 92 | +test=# select now(), ts_to_uuid_v7(now() - interval '1y'); |
| 93 | + now | ts_to_uuid_v7 |
| 94 | +-------------------------------+-------------------------------------- |
| 95 | +2023-11-3005:36:32.205093+00 | 0184c709-63cd-7bd1-99c3-a4773ab1e697 |
| 96 | +(1 row) |
| 97 | + |
| 98 | +test=# select uuid_v7_to_ts('0184c709-63cd-7bd1-99c3-a4773ab1e697'); |
| 99 | + uuid_v7_to_ts |
| 100 | +---------------------------- |
| 101 | +2022-11-3005:36:32.205+00 |
| 102 | +(1 row) |
| 103 | +``` |
| 104 | + |
| 105 | +Pretending that we haven't noticed the loss of microseconds, we continue. |
| 106 | + |
| 107 | +>🎯**TODO:** : |
| 108 | +>1) may it be the case when we need that precision? |
| 109 | +>2) timezones |
| 110 | +
|
| 111 | +##Hypertable |
| 112 | + |
| 113 | +Create a table, where we'll store ID as UUID, but additionally have a`timestamptz` column – this column will be used as |
| 114 | +partitioning key when we convert the table to partitioned table ("hypertable" in TimescaleDB's terminology): |
| 115 | + |
| 116 | +```sql |
| 117 | +createtablemy_table ( |
| 118 | + id uuidnot null |
| 119 | + default'00000000-0000-0000-0000-000000000000'::uuid, |
| 120 | + payloadtext, |
| 121 | + uuid_tstimestamptznot null default clock_timestamp()-- or now(), depending on goals |
| 122 | +); |
| 123 | +``` |
| 124 | + |
| 125 | +The default value`00000000-...00` for`id` is "fake" – it will always be replaced in trigger, based on the timestamp: |
| 126 | + |
| 127 | +```sql |
| 128 | +create or replacefunctiont_update_uuid() returns trigger |
| 129 | +as $$ |
| 130 | +begin |
| 131 | + ifnew.id isnullornew.id='00000000-0000-0000-0000-000000000000'::uuid then |
| 132 | +new.id := ts_to_uuid_v7(new.uuid_ts); |
| 133 | + end if; |
| 134 | + |
| 135 | + return new; |
| 136 | +end; |
| 137 | +$$ language plpgsql; |
| 138 | + |
| 139 | +createtriggert_update_uuid_before_insert_update |
| 140 | +before insertorupdateon my_table |
| 141 | +for each row execute function t_update_uuid(); |
| 142 | +``` |
| 143 | + |
| 144 | +Now, use TimescaleDB partitioning: |
| 145 | + |
| 146 | +```sql |
| 147 | +create extension timescaledb; |
| 148 | + |
| 149 | +select create_hypertable( |
| 150 | + relation :='my_table', |
| 151 | + time_column_name :='uuid_ts', |
| 152 | +-- !! very small interval is just for testing |
| 153 | + chunk_time_interval :='1 minute'::interval |
| 154 | +); |
| 155 | +``` |
| 156 | + |
| 157 | +##Test data - fill the chunks |
| 158 | + |
| 159 | +And now insert some test data – some rows for the "past" and some "current" rows: |
| 160 | + |
| 161 | +```sql |
| 162 | +insert into my_table(payload, uuid_ts) |
| 163 | +select random()::text, ts |
| 164 | +from generate_series( |
| 165 | +timestamptz'2000-01-01 00:01:00', |
| 166 | +timestamptz'2000-01-01 00:05:00', |
| 167 | + interval'5 second' |
| 168 | +)as ts; |
| 169 | + |
| 170 | +insert into my_table(payload) |
| 171 | +select random()::text |
| 172 | +from generate_series(1,10000); |
| 173 | + |
| 174 | +vacuum analyze my_table; |
| 175 | +``` |
| 176 | + |
| 177 | +Checking the structure of`my_table` in psql using`\d+` we now see that multiple partitions ("chunks") were created by |
| 178 | +TimescaleDB: |
| 179 | + |
| 180 | +```sql |
| 181 | +test=# \d+ my_table |
| 182 | +... |
| 183 | +Child tables:_timescaledb_internal._hyper_2_3_chunk, |
| 184 | +_timescaledb_internal._hyper_2_4_chunk, |
| 185 | +_timescaledb_internal._hyper_2_5_chunk, |
| 186 | +_timescaledb_internal._hyper_2_6_chunk, |
| 187 | +_timescaledb_internal._hyper_2_7_chunk, |
| 188 | +_timescaledb_internal._hyper_2_8_chunk, |
| 189 | +_timescaledb_internal._hyper_2_9_chunk |
| 190 | +``` |
| 191 | + |
| 192 | +##Test queries – partition pruning |
| 193 | + |
| 194 | +Now we just need to remember that`uuid_ts` should always participate in queries, to let planner deal with as few |
| 195 | +partitions as possible – but knowing the`id` values, we can always reconstruct the`uuid_ts` values, using |
| 196 | +`uuid_v7_to_ts()`. Note that I first disabled`seqscan` as the table`my_table` has too few rows, otherwise PostgreSQL |
| 197 | +may decide on preferring`seqscan` over index scan: |
| 198 | + |
| 199 | +```sql |
| 200 | +test# set enable_seqscan = off; |
| 201 | +SET |
| 202 | + |
| 203 | +test=# explain select * from my_table where uuid_ts = uuid_v7_to_ts('00dc6ad0-9660-7b92-a95e-1d7afdaae659'); |
| 204 | + QUERY PLAN |
| 205 | +-------------------------------------------------------------------------------------------------------------------------- |
| 206 | + Append (cost=0.14..8.16 rows=1 width=41) |
| 207 | +-> Index Scan using _hyper_5_11_chunk_my_table_uuid_ts_idxon _hyper_5_11_chunk (cost=0.14..8.15 rows=1 width=41) |
| 208 | + Index Cond: (uuid_ts='2000-01-01 00:01:00+00'::timestamp with time zone) |
| 209 | +(3 rows) |
| 210 | + |
| 211 | +test=# explain select * from my_table |
| 212 | +where uuid_ts>= uuid_v7_to_ts('018c1ecb-d3b7-75b1-add9-62878b5152c7') |
| 213 | +order by uuid_tsdesclimit10; |
| 214 | + QUERY PLAN |
| 215 | +----------------------------------------------------------------------------------------------------------------------------------- |
| 216 | +Limit (cost=0.29..1.17 rows=10 width=41) |
| 217 | +-> Custom Scan (ChunkAppend)on my_table (cost=0.29..11.49 rows=126 width=41) |
| 218 | + Order:my_table.uuid_tsDESC |
| 219 | +-> Index Scan using _hyper_5_16_chunk_my_table_uuid_ts_idxon _hyper_5_16_chunk (cost=0.29..11.49 rows=126 width=41) |
| 220 | + Index Cond: (uuid_ts>='2023-11-30 05:55:23.703+00'::timestamp with time zone) |
| 221 | +(5 rows) |
| 222 | +``` |
| 223 | + |
| 224 | +– partition pruning in play, although it will require certain effort to have it in various queries. But it works. |
| 225 | + |
| 226 | + |
| 227 | +-------- |
| 228 | + |
| 229 | +##Postscript |
| 230 | + |
| 231 | +Also read the following comment by[@jamessewell](https://twitter.com/jamessewell), originaly posted |
| 232 | +[here](https://twitter.com/jamessewell/status/1730125437903450129): |
| 233 | + |
| 234 | +>If update your`create_hypertable` call with: |
| 235 | +> |
| 236 | +>``` |
| 237 | +> time_column_name => 'id' |
| 238 | +> time_partitioning_func => 'uuid_v7_to_ts' |
| 239 | +> ``` |
| 240 | +> |
| 241 | +> Then you'll be able to drop the `uuid_ts` col and your trigger! |
| 242 | +> |
| 243 | +> ```sql |
| 244 | +> SELECT * FROM my_table WHERE id = '018c1ecb-d3b7-75b1-add9-62878b5152c7'; |
| 245 | +> ``` |
| 246 | +> |
| 247 | +> Will just work 🪄 |