|
| 1 | +AnyArray – 1-D anyarray functionality for PostgreSQL |
| 2 | +============================================== |
| 3 | + |
| 4 | +Introduction |
| 5 | +------------ |
| 6 | + |
| 7 | +AnyArray – is a PostgreSQL extension which implements 1-D anyarray |
| 8 | +functionality. |
| 9 | + |
| 10 | + |
| 11 | +Authors |
| 12 | +------- |
| 13 | + |
| 14 | +* Teodor Sigaev<teodor@sigaev.ru> , Postgres Professional, Moscow, Russia |
| 15 | +* Oleg Bartunov <o.bartunov@postgrespro.ru> , Postgres Professional, Moscow, Russia |
| 16 | + |
| 17 | +Availability |
| 18 | +------------ |
| 19 | + |
| 20 | +AnyArray is released as an extension and not available in default PostgreSQL |
| 21 | +installation. It is available from |
| 22 | +[github](https://github.com/postgrespro/anyarray) |
| 23 | +under the same license as |
| 24 | +[PostgreSQL](http://www.postgresql.org/about/licence/) |
| 25 | +and supports PostgreSQL 9.1+. |
| 26 | + |
| 27 | +Installation |
| 28 | +------------ |
| 29 | + |
| 30 | +Before build and install AnyArray you should ensure following: |
| 31 | + |
| 32 | +* PostgreSQL version is 9.1 or higher. |
| 33 | +* You have development package of PostgreSQL installed or you built |
| 34 | + PostgreSQL from source. |
| 35 | +* Your PATH variable is configured so that pg\_config command available. |
| 36 | + |
| 37 | +Typical installation procedure may look like this: |
| 38 | + |
| 39 | + $ git clonehttps://github.com/postgrespro/anyarray.git |
| 40 | + $ cd anyarray |
| 41 | + $ make USE_PGXS=1 |
| 42 | + $ sudo make USE_PGXS=1 install |
| 43 | + $ make USE_PGXS=1 installcheck |
| 44 | + $ psql DB -c "CREATE EXTENSION anyarray;" |
| 45 | + |
| 46 | +Usage |
| 47 | +----- |
| 48 | + |
| 49 | +###Anyarray functions |
| 50 | + |
| 51 | +|Function|Description|Examples| |
| 52 | +|--------|-----------|--------| |
| 53 | +|`anyset(int)` →`int[1]`||`anyset(1234)` →`ARRAY[1234]`| |
| 54 | +|`icount(anyarray)`→`int`|Returns the length of anyarray. icount() returns 0 for empty arrays.|`icount( '{1234234, 0234234}'::int[] )` →`2`| |
| 55 | +|`sort(anyarray [, 'asc'\|'desc'])`→`anyarray`|Returns the anyarray sorted in an ascending (default) or descending order.|`sort( '{1234234, -30, 0234234}'::int[],'desc')` →`{1234234, 234234, -30}`| |
| 56 | +|`sort_asc(anyarray)`→`anyarray`|Returns the anyarray sorted in an ascending order.|`sort_asc( '{1234234,-30, 0234234}'::int[])` →`{-30,234234,1234234}`| |
| 57 | +|`sort_desc(anyarray)`→`anyarray`|Returns the anyarray sorted in a descending order.|`sort( '{1234234, -30, 0234234}'::int[],'desc' )` →`{1234234,234234,-30}`| |
| 58 | +|`uniq(anyarray)`→`anyarray`|Returns anyarray where consequent repeating elements replaced by one element. If you need to remove all repeating elements in array, you can sort array and apply uniq() function.|`uniq( '{1234234, -30, -30, 0234234, -30}'::int[])` →`{1234234, -30, 234234, -30}` ,`uniq( sort_asc( '{1234234, -30, -30, 0234234, -30}'::int[] ) )` →`{-30,234234,1234234}`| |
| 59 | +|`uniq_d(anyarray)`→`anyarray`|Returns only consequent repeating elements. If you need to return all repeating elements, you can sort array and apply uniq_d() function|`uniq_d( '{1234234, -30, -30, 0234234, -30, 0234234}'::int[] )` →`{-30}`,`uniq_d( sort_asc('{1234234, -30,-30, 0234234, -30, 0234234}'::int[] ) )` →`{-30,234234}`| |
| 60 | +|`idx(anyarray, searchelement)`→`int`|Returns the position of the searchelement first occurance in the array|`idx( '{1234234,-30,-30,0234234,-30}'::int[], -30 )` →`2`| |
| 61 | +|`subarray(anyarray, start int, length int)`→`anyarray`|Returns the subarray from original array. If the start position value is negative, it is counted from the end of the original array (-1 means last element, -2 means element before last etc.)|`subarray( '{1234234, -30, -30, 0234234, -30}'::int[],2,3 )` →`{-30, -30, 234234}` ,`subarray( '{1234234, -30, -30, 0234234, -30}'::int[], -1, 1 )` →`{-30}`,`subarray( '{1234234, -30, -30, 0234234, -30}'::int[], 0, -1 )` →`{1234234, -30, -30, 234234}`| |
| 62 | + |
| 63 | + |
| 64 | +###Anyarray operators |
| 65 | + |
| 66 | +|Operator|Description|Examples| |
| 67 | +|--------|-----------|--------| |
| 68 | +|`#anyarray` →`int`|Returns the length of anyarray.|`#'{1234234,0234234}'::int[]` →`2`| |
| 69 | +|`anyarray + anyarray`→`anyarray`|Returns the union of arrays|`'{123,623,445}'::int[] + 1245` →`{123,623,445,1245}` ,`'{123,623,445}'::int[] + '{1245,87,445}'` →`{123,623,445,1245,87,445}`| |
| 70 | +|`anyarray - anyarray`→`anyarray`|Returns the substraction of left array and right array|`'{123,623,445}'::int[] - 623` →`{123,445}`,`'{123,623,445}'::int[] - '{1623,623}'::int[]``{123,445}`| |
| 71 | +|`anyarray \| anyarray`→`anyarray`|Returns the union of array, repeating elements are excluded from resulting array.|`'{123,623,445}'::int[]`\|`{1623,623}'::int[]` →`{123,445,623,1623}`| |
| 72 | +|`anyarray & anyarray`→`anyarray`|Returns arrays intersection.|`'{1,3,1}'::int[] & '{1,2}'` →`{1}`| |
| 73 | + |
| 74 | + |
| 75 | +###Anyarray operator class strategies |
| 76 | + |
| 77 | + |
| 78 | +|Operator|GIST and GIN Strategy num|Description| |
| 79 | +|--------|-------------------------|-----------| |
| 80 | +|`anyarray` &&`anyarray`|RTOverlapStrategyNumber 3|Overlapped| |
| 81 | +|`anyarray` =`anyarray`|RTSameStrategyNumber 6|Same| |
| 82 | +|`anyarray` @>`anyarray`|RTContainsStrategyNumber 7|Contains| |
| 83 | +|`anyarray` <@`anyarray`|RTContainedByStrategyNumber 8|Contained| |
| 84 | +|`anyarray` %`anyarray`|AnyAarraySimilarityStrategy 16|Similarity| |
| 85 | + |
| 86 | + |
| 87 | +###Similarity search options |
| 88 | + |
| 89 | +Set distance type for similarity search. |
| 90 | +``` |
| 91 | +SET anyarray.similarity_type=cosine; |
| 92 | +SET anyarray.similarity_type=jaccard; |
| 93 | +SET anyarray.similarity_type=overlap; |
| 94 | +``` |
| 95 | + |
| 96 | +Set threshold for similarity search. |
| 97 | +``` |
| 98 | +SET anyarray.similarity_threshold = 3; |
| 99 | +RESET anyarray.similarity_threshold; |
| 100 | +``` |
| 101 | + |
| 102 | +Examples |
| 103 | +------- |
| 104 | + |
| 105 | +Examples for INTEGER[] . |
| 106 | + |
| 107 | +###Create a table with sample data. |
| 108 | +``` |
| 109 | +SELECT t, ARRAY( |
| 110 | +SELECT v::int4 |
| 111 | +FROM generate_series(max(0, t - 10), t) as v |
| 112 | +) AS v |
| 113 | +INTO test_int4 |
| 114 | +FROM generate_series(1, 200) as t; |
| 115 | +``` |
| 116 | + |
| 117 | +###Similarity calculation. |
| 118 | + |
| 119 | +``` |
| 120 | +SET anyarray.similarity_type=cosine; |
| 121 | +SELECT t, similarity(v, '{10,9,8,7,6,5,4,3,2,1}') AS s FROM test_int4 |
| 122 | +WHERE v % '{10,9,8,7,6,5,4,3,2,1}' ORDER BY s DESC, t; |
| 123 | +SELECT t, similarity(v, '{50,49,8,7,6,5,4,3,2,1}') AS s FROM test_int4 |
| 124 | +WHERE v % '{50,49,8,7,6,5,4,3,2,1}' ORDER BY s DESC, t; |
| 125 | +
|
| 126 | +SET anyarray.similarity_type=jaccard; |
| 127 | +SELECT t, similarity(v, '{10,9,8,7,6,5,4,3,2,1}') AS s FROM test_int4 |
| 128 | +WHERE v % '{10,9,8,7,6,5,4,3,2,1}' ORDER BY s DESC, t; |
| 129 | +SELECT t, similarity(v, '{50,49,8,7,6,5,4,3,2,1}') AS s FROM test_int4 |
| 130 | +WHERE v % '{50,49,8,7,6,5,4,3,2,1}' ORDER BY s DESC, t; |
| 131 | +
|
| 132 | +SELECT t, v FROM test_int4 WHERE v && '{43,50}' ORDER BY t; |
| 133 | +SELECT t, v FROM test_int4 WHERE v @> '{43,50}' ORDER BY t; |
| 134 | +SELECT t, v FROM test_int4 WHERE v <@ '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 135 | +SELECT t, v FROM test_int4 WHERE v = '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 136 | +SET anyarray.similarity_type=cosine; |
| 137 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 138 | +SET anyarray.similarity_type=jaccard; |
| 139 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 140 | +SET anyarray.similarity_type=overlap; |
| 141 | +SET anyarray.similarity_threshold = 3; |
| 142 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 143 | +RESET anyarray.similarity_threshold; |
| 144 | +``` |
| 145 | + |
| 146 | +###Create GIST index. |
| 147 | + |
| 148 | +``` |
| 149 | +CREATE INDEX idx_test_int4 ON test_int4 USING gist (v _int4_aa_ops); |
| 150 | +
|
| 151 | +SET enable_seqscan=off; |
| 152 | +
|
| 153 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v && '{43,50}' ORDER BY t; |
| 154 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v @> '{43,50}' ORDER BY t; |
| 155 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v <@ '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 156 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v = '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 157 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 158 | +
|
| 159 | +SELECT t, v FROM test_int4 WHERE v && '{43,50}' ORDER BY t; |
| 160 | +SELECT t, v FROM test_int4 WHERE v @> '{43,50}' ORDER BY t; |
| 161 | +SELECT t, v FROM test_int4 WHERE v <@ '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 162 | +SELECT t, v FROM test_int4 WHERE v = '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 163 | +SET anyarray.similarity_type=cosine; |
| 164 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 165 | +SET anyarray.similarity_type=jaccard; |
| 166 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 167 | +SET anyarray.similarity_type=overlap; |
| 168 | +SET anyarray.similarity_threshold = 3; |
| 169 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 170 | +RESET anyarray.similarity_threshold; |
| 171 | +
|
| 172 | +DROP INDEX idx_test_int4; |
| 173 | +``` |
| 174 | + |
| 175 | +###Create GIN index. |
| 176 | + |
| 177 | +``` |
| 178 | +CREATE INDEX idx_test_int4 ON test_int4 USING gin (v _int4_aa_ops); |
| 179 | +
|
| 180 | +SET enable_seqscan=off; |
| 181 | +
|
| 182 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v && '{43,50}' ORDER BY t; |
| 183 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v @> '{43,50}' ORDER BY t; |
| 184 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v <@ '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 185 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v = '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 186 | +EXPLAIN (COSTS OFF) SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 187 | +
|
| 188 | +SELECT t, v FROM test_int4 WHERE v && '{43,50}' ORDER BY t; |
| 189 | +SELECT t, v FROM test_int4 WHERE v @> '{43,50}' ORDER BY t; |
| 190 | +SELECT t, v FROM test_int4 WHERE v <@ '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 191 | +SELECT t, v FROM test_int4 WHERE v = '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 192 | +SET anyarray.similarity_type=cosine; |
| 193 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 194 | +SET anyarray.similarity_type=jaccard; |
| 195 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 196 | +SET anyarray.similarity_type=overlap; |
| 197 | +SET anyarray.similarity_threshold = 3; |
| 198 | +SELECT t, v FROM test_int4 WHERE v % '{0,1,2,3,4,5,6,7,8,9,10}' ORDER BY t; |
| 199 | +RESET anyarray.similarity_threshold; |
| 200 | +``` |
| 201 | + |
| 202 | + |
| 203 | +Operator class names for all types implemented in anyarray. |
| 204 | +----------------------------------------------------------- |
| 205 | + |
| 206 | +|Type|GIST|GIN| |
| 207 | +|----|----|---| |
| 208 | +|bit|_bit_aa_ops|_bit_aa_ops| |
| 209 | +|bytea|_bytea_aa_ops|_bytea_aa_ops| |
| 210 | +|char|_char_aa_ops|_char_aa_ops| |
| 211 | +|cidr|_cidr_aa_ops|_cidr_aa_ops| |
| 212 | +|date|_date_aa_ops|_date_aa_ops| |
| 213 | +|float4|_float4_aa_ops|_float4_aa_ops| |
| 214 | +|float8|_float8_aa_ops|_float8_aa_ops| |
| 215 | +|inet|_inet_aa_ops|_inet_aa_ops| |
| 216 | +|int2|_int2_aa_ops|_int2_aa_ops| |
| 217 | +|int4|_int4_aa_ops|_int4_aa_ops| |
| 218 | +|int8|_int8_aa_ops|_int8_aa_ops| |
| 219 | +|interval|_interval_aa_ops|_interval_aa_ops| |
| 220 | +|macaddr|_macaddr_aa_ops|_macaddr_aa_ops| |
| 221 | +|money|_money_aa_ops|_money_aa_ops| |
| 222 | +|numeric|_numeric_aa_ops|_numeric_aa_ops| |
| 223 | +|oid|_oid_aa_ops|_oid_aa_ops| |
| 224 | +|text|_text_aa_ops|_text_aa_ops| |
| 225 | +|time|_time_aa_ops|_time_aa_ops| |
| 226 | +|timestamp|_timestamp_aa_ops|_timestamp_aa_ops| |
| 227 | +|timestamptz|_timestamptz_aa_ops|_timestamptz_aa_ops| |
| 228 | +|timetz|_timetz_aa_ops|_timetz_aa_ops| |
| 229 | +|varbit|_varbit_aa_ops|_varbit_aa_ops| |
| 230 | +|varchar|_varchar_aa_ops|_varchar_aa_ops| |