PostgreSQL looks very competitive with other mainstream databases on OLTP workloads (execution of a large number of simple queries). But on OLAP queries, which require processing large volumes of data, DBMSes oriented toward analytic query processing can provide an order of magnitude better speed. Let's investigate why this happens and whether we can do something to make PostgreSQL efficient for OLAP queries as well.
Profiling the execution of queries shows several main factors limiting Postgres performance:
- Tuple unpacking overhead (tuple_deform). To access column values, Postgres needs to deform the tuple. Values can be compressed, stored on some other page (TOAST), etc. Also, since column sizes can vary, to extract the N-th column we need to unpack the preceding N-1 columns. So deforming a tuple is quite an expensive operation, especially for tables with a large number of attributes. In some cases rearranging the columns in a table can significantly reduce query execution time. Another universal approach is to split the table in two: one small table with frequently accessed scalar columns and another with rarely used large columns. Certainly in this case we need to perform an extra join, but it can reduce the amount of fetched data several times. In queries like TPC-H Q6, tuple deforming takes about 40% of total query execution time.
- Interpretation overhead. The Postgres compiler and optimizer build a tree representing the query execution plan. The query executor then recursively invokes evaluation functions for the nodes of this tree. Implementations of some nodes also contain switches used to select the requested action. So the query plan is interpreted by the Postgres query executor rather than directly executed. Usually an interpreter is about 10 times slower than native code. This is why eliminating interpretation overhead can increase query speed several times, especially for queries with complex predicates where most of the time is spent in expression evaluation.
- Abstraction penalty. Support for abstract (user-defined) types and operations is one of the key features of Postgres. Its executor is able to deal not only with the built-in set of scalar types (like integer, real, ...) but with any types defined by the user (for example complex, point, ...). But the price of such flexibility is that each operation requires a function call. Instead of adding two integers directly, the Postgres executor invokes a function which performs the addition of two integers. Certainly in this case the function call overhead is much larger than the performed operation itself. Function call overhead is further increased because the Postgres function call convention requires passing parameter values through memory (rather than using a register call convention).
- Pull model overhead. The Postgres executor implements the classical Volcano-style query execution model: the pull model. Operand values are pulled by the operator. This simplifies the implementation of the executor and operators, but it has a negative impact on performance, because leaf nodes (fetching a tuple from a heap or index page) have to do a lot of extra work saving and restoring their context.
- MVCC overhead. Postgres provides multiversion concurrency control, which allows multiple transactions to work concurrently with the same record without blocking each other. It is good for frequently updated data (OLTP), but for read-only or append-only data in OLAP scenarios it just adds extra overhead: both space overhead (about 20 extra bytes per tuple) and CPU overhead (checking the visibility of each tuple).
There are many different ways of addressing these issues. For example, we can use a JIT (Just-In-Time) compiler to generate native code for the query, which eliminates interpretation overhead and increases heap deform speed. We can rewrite the executor from the pull to the push model. We can try to optimize the tuple format to make heap deforming more efficient. Or we can generate byte code for the query execution plan, whose interpretation is more efficient than recursive invocation of an evaluation function for each node because of better access locality. But all these approaches require significant rewriting of the Postgres executor, and some of them also require changes to the whole Postgres architecture.
But there is an approach which allows us to address most of these issues without radical changes to the executor: vector operations. It is explained in the next section.
A traditional query executor (like the Postgres executor) deals with a single row of data at each moment of time. If it has to evaluate the expression (x+y), it fetches the value of "x", then the value of "y", performs the "+" operation and returns the result value to the upper node. In contrast, a vectorized executor is able to process multiple values in one operation. In this case "x" and "y" represent not just a single scalar value, but a vector of values, and the result is also a vector of values. In the vector execution model, interpretation and function call overhead is divided by the size of the vector. The price of performing a function call is the same, but since the function processes N values instead of just one, this overhead becomes less critical.
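The difference can be illustrated with a toy sketch (plain Python, not VOPS code): in the row-at-a-time model the operator function is invoked once per value, while in the vectorized model it is invoked once per vector, so the invocation cost is amortized over N rows.

```python
# Toy illustration of amortizing per-call overhead with vectorized execution.

def scalar_add(x, y):
    # Row-at-a-time model: one operator invocation per row.
    return x + y

def vector_add(xs, ys):
    # Vectorized model: one operator invocation per vector of N rows.
    return [x + y for x, y in zip(xs, ys)]

xs = list(range(1000))
ys = list(range(1000))

# Scalar model: 1000 invocations of the "+" operator function.
scalar_result = [scalar_add(x, y) for x, y in zip(xs, ys)]

# Vector model: a single invocation processes all 1000 rows.
vector_result = vector_add(xs, ys)

assert scalar_result == vector_result
```

Both models produce the same result; the difference is only in how many times the (expensive) operator dispatch is paid.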
What is the optimal size for the vector? From the explanation above it is clear that the larger the vector is, the less per-row overhead we have. So we could form a vector from all values of the corresponding table attribute. This is the so-called vertical data model, or columnar store. Unlike the classical "horizontal" data model, where the unit of stored data is a row (tuple), here we have vertical columns. A columnar store has the following main advantages:
- Reduced size of fetched data: only the columns used in the query need to be fetched
- Better compression: storing all values of the same attribute together makes it possible to compress them much better and faster, for example using delta encoding
- Minimized interpretation overhead: each operation is performed not on a single value but on a set of values
- Use of CPU vector instructions (SIMD) to process the data
There are several DBMSes implementing the columnar store model; the most popular are Vertica and MonetDB. But actually, performing an operation on the whole column is not such a good idea. A table can be very large (OLAP queries tend to work with large data sets), so a vector can also be very big and may not even fit in memory. But even if it fits in memory, working with such large vectors prevents efficient utilization of the CPU caches (L1, L2, ...). Consider the expression (x+y)*(x-y). A vector executor performs the addition of the two vectors "x" and "y" and produces the result vector "r1". But by the time the last element of vector "r1" is produced, the first elements of "r1" have already been evicted from the CPU cache, as well as the first elements of the "x" and "y" vectors. So when we need to calculate (x-y) we once again have to load the data for "x" and "y" from slow memory into the fast cache. Then we produce "r2" and perform the multiplication of "r1" and "r2". But here we also first need to load the data for these vectors into the CPU cache.
So it is more efficient to split a column into relatively small chunks (or tiles; there is no single term for it accepted by everyone). Such a chunk is the unit of processing for a vectorized executor. The chunk size is chosen so as to keep all operands of vector operations in the cache even for complex expressions. Typical chunk sizes range from 100 to 1000 elements. So in the case of the (x+y)*(x-y) expression, we calculate it not for the whole column but only for 100 values at a time (assuming the chunk size is 100). Splitting columns into chunks in MonetDB/X100 and HyPer allows speedups of up to ten times.
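The chunk-at-a-time scheme can be sketched as follows (an illustrative Python sketch, not actual executor internals): all intermediate vectors for one chunk are produced and combined while the chunk is still cache-resident, before moving to the next chunk.

```python
# Chunk-at-a-time evaluation of (x + y) * (x - y).
CHUNK_SIZE = 100  # typical tile/chunk size is 100..1000 elements

def eval_chunked(x_col, y_col):
    result = []
    for start in range(0, len(x_col), CHUNK_SIZE):
        x = x_col[start:start + CHUNK_SIZE]
        y = y_col[start:start + CHUNK_SIZE]
        r1 = [a + b for a, b in zip(x, y)]      # x and y are still hot...
        r2 = [a - b for a, b in zip(x, y)]      # ...when we reuse them here
        result.extend(a * b for a, b in zip(r1, r2))  # r1, r2 also still hot
    return result

x_col = list(range(1000))
y_col = list(range(1000, 0, -1))
expected = [(a + b) * (a - b) for a, b in zip(x_col, y_col)]
assert eval_chunked(x_col, y_col) == expected
```

With whole-column vectors, `r1` would be fully materialized (and evicted from cache) before `r2` is even started; with small chunks every operand of the expression stays in cache for the duration of the chunk.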
Overview
There have been several attempts to integrate a columnar store into PostgreSQL. The best known is CStore FDW by CitusDB. It is implemented as a foreign data wrapper (FDW) and is efficient for queries fetching a relatively small fraction of columns. But it uses the standard Postgres row-based executor and so cannot take advantage of vector processing. There is an interesting project done by a CitusDB intern, which implements vector operations on top of CStore using executor hooks for some nodes. It reports a 4-6 times speedup for grand aggregates and a 3 times speedup for aggregation with group by.
Another project is IMCS: In-Memory Columnar Store. Here the columnar store is implemented in memory and is accessed using special functions. So you cannot use standard SQL queries to work with this storage; you have to rewrite them using IMCS functions. IMCS provides vector operations (using tiles) and parallel execution.
Both CStore and IMCS keep data outside Postgres. But what if we want to use vector operations for data kept in standard Postgres tables? Definitely, the best approach is to implement an alternative heap format. Or go even further: eliminate the notion of the heap altogether and treat the heap as just another access method, similar to other indexes.
But such radical changes require a deep redesign of the whole Postgres architecture. It is better first to estimate the possible advantages we can expect from the use of vector operations. A vector executor is widely discussed in the Postgres forums, but an efficient vector executor is not possible without underlying support at the storage layer. The advantages of vector processing will be annihilated if vectors are formed from attributes of rows extracted from existing Postgres heap pages.
The idea of the VOPS extension is to implement vector operations for tiles represented as special Postgres types. Tile types should be used as table column types instead of scalar types. For example, instead of `real` we should use `vops_float4`, which is a tile representing up to 64 values of the corresponding column. Why 64? There are several reasons for choosing this number:
- To provide efficient access to tiles, we need size_of_tile*size_of_attribute*number_of_attributes to be smaller than the page size. A typical record contains about 10 attributes, and the default size of a Postgres page is 8kb.
- 64 is the number of bits in a large word. We need to maintain a bitmask to mark null values. Certainly it is possible to store the bitmask in an array of arbitrary size, but manipulating a single 64-bit integer is more efficient.
- Due to the arguments above, to efficiently utilize the cache, the size of a tile should be in the range 100..1000.
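The null-bitmask argument above can be sketched as follows (illustrative Python, not VOPS internals): with a tile size of 64, the null mask of a whole tile fits in one 64-bit word, so setting, testing and combining null masks are single integer operations.

```python
# Null tracking for a 64-element tile with a single 64-bit bitmask.
TILE_SIZE = 64

def set_null(mask, i):
    # Mark element i of the tile as null.
    return mask | (1 << i)

def is_null(mask, i):
    # Test whether element i of the tile is null.
    return (mask >> i) & 1 == 1

mask = 0
mask = set_null(mask, 3)
mask = set_null(mask, 63)
assert is_null(mask, 3) and is_null(mask, 63)
assert not is_null(mask, 0)

# Propagating nulls through a binary operation (result is null where either
# operand is null) is a single bitwise OR of the two operand masks:
combined = set_null(0, 1) | set_null(0, 2)
assert is_null(combined, 1) and is_null(combined, 2)
```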
VOPS is implemented as Postgres extension. It doesn't change anything inPostgres executor and page format. It also doesn't setup any executorshooks or alter query execution plan. The main idea of this project wasto measure speedup which can be reached by using vector operation withexisted executor and heap manager. VOPS provides set of standardoperators for tile types, allowing to write SQL queries in the waysimilar with normal SQL queries. Right now vector operators can be usedinside predicates and aggregate expressions. Joins are not currentlysupported. Details of VOPS architecture are described below.
VOPS supports all basic Postgres numeric types: 1, 2, 4 and 8 byte integers and 4 and 8 byte floats. It also supports the `date` and `timestamp` types, but they use the same implementation as `int4` and `int8` correspondingly.
SQL type | C type | VOPS tile type |
---|---|---|
bool | bool | vops_bool |
"char" | char | vops_char |
int2 | int16 | vops_int2 |
int4 | int32 | vops_int4 |
int8 | int64 | vops_int8 |
float4 | float4 | vops_float4 |
float8 | float8 | vops_float8 |
date | DateADT | vops_date |
timestamp | Timestamp | vops_timestamp |
char(N), varchar(N) | text | vops_text |
VOPS doesn't support working with strings (char or varchar types), except for the case of a single character. If strings are used as identifiers, in most cases it is preferable to place them in some dictionary and use integer identifiers instead of the original strings.
VOPS provides implementations of all built-in SQL arithmetic operators for numeric types: `+ - / *`. Certainly it also implements all comparison operators: `= <> > >= < <=`. Operands of these operators can be either tiles or scalar constants: `x=y` or `x=1`.
The boolean operators `and`, `or`, `not` cannot be overloaded. This is why VOPS provides the operators `& | !` instead. Please notice that the precedence of these operators is different from that of the `and`, `or`, `not` operators. So you cannot write a predicate as `x=1 | x=2`: it will cause a syntax error. To solve this problem please use parentheses: `(x=1) | (x=2)`.
VOPS also provides an analog of the BETWEEN operator. In SQL the expression `(x BETWEEN a AND b)` is equivalent to `(x >= a AND x <= b)`. But since the AND operator cannot be overloaded, such a substitution will not work for VOPS tiles. This is why VOPS provides a special function for range checks. Unfortunately `BETWEEN` is a reserved keyword, so no function with this name can be defined. This is why the synonym `BETWIXT` is used.
Postgres requires a predicate expression to have boolean type. But the result of vector boolean operators is `vops_bool`, not `bool`. This is why the compiler doesn't allow its use in a predicate. The problem can be solved by introducing a special `filter` function. This function is given an arbitrary vector boolean expression and returns a normal boolean which ... is always true. So from the Postgres executor's point of view the predicate value is always true. But the `filter` function sets a `filter_mask` which is actually used in subsequent operators to determine the selected records. So a query in VOPS looks something like this:
select sum(price) from trades where filter(day >= '2017-01-01'::date);
Please notice one more difference from a normal query: we have to use an explicit cast of the string constant to the appropriate data type (the `date` type in this example). For the `betwixt` function it is not needed:
select sum(price) from trades where filter(betwixt(day, '2017-01-01', '2017-02-01'));
For the `char`, `int2` and `int4` types VOPS provides the concatenation operator `||`, which produces the doubled integer type: `(char || char) -> int2`, `(int2 || int2) -> int4`, `(int4 || int4) -> int8`. These can be used for grouping by several columns (see below).
Operator | Description |
---|---|
+ | Addition |
- | Binary subtraction or unary negation |
* | Multiplication |
/ | Division |
= | Equals |
<> | Not equals |
< | Less than |
<= | Less than or Equals |
> | Greater than |
>= | Greater than or equals |
& | Boolean AND |
\| | Boolean OR |
! | Boolean NOT |
betwixt(x,low,high) | Analog of BETWEEN |
is_null(x) | Analog of IS NULL |
is_not_null(x) | Analog of IS NOT NULL |
ifnull(x,subst) | Analog of COALESCE |
OLAP queries usually perform some kind of aggregation of large volumes of data. These include grand aggregates, which are calculated for the whole table, and aggregates with group by, which are calculated for each group. VOPS implements all standard SQL aggregates: `count, min, max, sum, avg, var_pop, var_samp, variance, stddev_pop, stddev_samp, stddev`. They can be used exactly in the same way as in normal SQL queries:
```sql
select
    sum(l_extendedprice*l_discount) as revenue
from
    vops_lineitem
where
    filter(betwixt(l_shipdate, '1996-01-01', '1997-01-01')
         & betwixt(l_discount, 0.08, 0.1)
         & (l_quantity < 24));
```
VOPS also provides a weighted average aggregate, `wavg`, which can be used to calculate the volume-weighted average price (VWAP):
select wavg(l_extendedprice,l_quantity) from vops_lineitem;
Using aggregation with group by is more complex. VOPS provides two functions for it: `map` and `reduce`. The work is actually done by `map(group_by_expression, aggregate_list, expr {,expr })`. VOPS implements aggregation using a hash table whose entries collect the aggregate states for all groups. The set-returning function `reduce` just iterates through the hash table constructed by `map`. The `reduce` function is needed because the result of an aggregate in Postgres cannot be a set. So an aggregate query with group by looks something like this:
```sql
select reduce(map(l_returnflag||l_linestatus, 'sum,sum,sum,sum,avg,avg,avg',
                  l_quantity,
                  l_extendedprice,
                  l_extendedprice*(1-l_discount),
                  l_extendedprice*(1-l_discount)*(1+l_tax),
                  l_quantity,
                  l_extendedprice,
                  l_discount))
from vops_lineitem where filter(l_shipdate <= '1998-12-01'::date);
```
Here we use the concatenation operator to perform grouping by two columns. Right now VOPS supports grouping only by integer types. Another serious restriction is that all aggregated expressions should have the same type, for example `vops_float4`. It is not possible to calculate aggregates for `vops_float4` and `vops_int8` columns in one call of the `map` function, because it accepts the aggregation arguments as a variadic array, so all elements of this array should have the same type.
The aggregate string in the `map` function should contain the list of requested aggregate functions, separated by commas. Standard lowercase names should be used: `count, sum, avg, min, max`. Count is executed for the particular column: `count(x)`. There is no need to explicitly specify `count(*)`, because the number of records in each group is returned by the `reduce` function in any case.
The `reduce` function returns a set of `vops_aggregates` type. It contains three components: the value of the group by expression, the number of records in the group, and an array of floats with the aggregate values. Please notice that the values of all aggregates, including `count` and `min`/`max`, are returned as floats.
```sql
create type vops_aggregates as(group_by int8, count int8, aggs float8[]);
create function reduce(bigint) returns setof vops_aggregates;
```
But there is a much simpler and more straightforward way of performing group aggregates using VOPS. We need to partition the table by the group by fields. In this case the grouping keys will be stored in the normal way and the other fields inside tiles. Now the Postgres executor will execute VOPS aggregates for each group:
```sql
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    countall() as count_order
from
    vops_lineitem_projection
where
    filter(l_shipdate <= '1998-12-01'::date)
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
```
In this example the `l_returnflag` and `l_linestatus` fields of the table vops_lineitem_projection have the `"char"` type, while all other used fields have tile types (`l_shipdate` has type `vops_date` and the other fields `vops_float4`). The query above is executed even faster than the query with `reduce(map(...))`. The main problem with this approach is that you have to create a projection for each combination of group by keys you want to use in queries.
VOPS provides limited support for Postgres window functions. It implements the `count, sum, min, max, avg` and `lag` functions. But unfortunately Postgres requires aggregates to have the same final type for the moving (window) and plain implementations. This is why VOPS has to define these aggregates under different names: `mcount, msum, mmin, mmax, mavg`.
There are also two important restrictions:
- Filtering, grouping and sorting can be done only by scalar (non-tile) attributes
- Only the `rows between unbounded preceding and current row` frame is supported (but there is a special version of `msum` which accepts an extra window size parameter)
Example of using window functions with VOPS:
```sql
select vops_unnest(t.*) from
   (select mcount(*) over w,
           mcount(x) over w,
           msum(x) over w,
           mavg(x) over w,
           mmin(x) over w,
           mmax(x) over w,
           x - lag(x) over w
    from v window w as (rows between unbounded preceding and current row)) t;
```
Analytic queries are usually performed on data for which no indexes are defined, and columnar store vector operations are most efficient in this case. But it is still possible to use indexes with VOPS.
Since each VOPS tile represents multiple values, an index can be used only for some preliminary, non-precise filtering of the data. This is similar to BRIN indexes. VOPS provides four functions, `first, last, high, low`, which can be used to obtain the high/low boundary of the values stored in a tile. The first two functions, `first` and `last`, should be used for sorted data sets. In this case the first value is the smallest value in the tile and the last value is the largest value in the tile. If the data is not sorted, then the `low` and `high` functions should be used; they are more expensive, because they need to inspect all tile values. Using these four functions it is possible to construct functional indexes on a VOPS table. A BRIN index seems to be the best choice for a VOPS table:
```sql
create index low_boundary on trades using brin(first(day));  -- trades table is ordered by day
create index high_boundary on trades using brin(last(day));  -- trades table is ordered by day
```
Now it is possible to use these indexes in a query. Please notice that we have to recheck the precise condition, because the index gives only an approximate result:
select sum(price) from trades where first(day) >= '2015-01-01' and last(day) <= '2016-01-01' and filter(betwixt(day, '2015-01-01', '2016-01-01'));
Now the most interesting question (with which maybe we should have started): how do we prepare data for VOPS queries? Who will combine the attribute values of several rows inside one VOPS tile, and how? It is done by the `populate` function provided by the VOPS extension.
First of all you need to create a table with columns having VOPS tile types. It can map all columns of the original table or just some most frequently used subset of them. This table can be treated as a projection of the original table (this concept of projections is taken from Vertica). A projection should include columns which are most frequently used together in queries.
Original table from TPC-H benchmark:
```sql
create table lineitem(
   l_orderkey integer,
   l_partkey integer,
   l_suppkey integer,
   l_linenumber integer,
   l_quantity real,
   l_extendedprice real,
   l_discount real,
   l_tax real,
   l_returnflag "char",
   l_linestatus "char",
   l_shipdate date,
   l_commitdate date,
   l_receiptdate date,
   l_shipinstruct char(25),
   l_shipmode char(10),
   l_comment char(44));
```
VOPS projection of this table:
```sql
create table vops_lineitem(
   l_shipdate vops_date not null,
   l_quantity vops_float4 not null,
   l_extendedprice vops_float4 not null,
   l_discount vops_float4 not null,
   l_tax vops_float4 not null,
   l_returnflag vops_char not null,
   l_linestatus vops_char not null);
```
The original table can be treated as write-optimized storage (WOS). If it has no indexes, then Postgres is able to provide very fast insertion speed, comparable with raw disk write speed. The projection in VOPS format can be treated as read-optimized storage (ROS), most efficient for the execution of OLAP queries.
Data can be transferred from the original table to the projection using the VOPS `populate` function:
```sql
create function populate(destination regclass,
                         source regclass,
                         predicate cstring default null,
                         sort cstring default null) returns bigint;
```
The first two mandatory arguments of this function specify the target and source tables. The optional predicate and sort clauses allow restricting the amount of imported data and enforcing a requested order. By specifying a predicate it is possible to update the VOPS table using only the most recently received records. This function returns the number of loaded records. Example of a populate function invocation:
select populate(destination := 'vops_lineitem'::regclass, source := 'lineitem'::regclass);
You can use the populated table in queries performing sequential scans. VOPS operators can speed up the filtering of records and the calculation of aggregates. Aggregation with group by requires use of the `reduce` + `map` functions. But as mentioned above in the section describing aggregates, it is possible to populate the table in such a way that the standard Postgres grouping algorithm will be used.
We need to choose partitioning keys and sort the original table by these keys. The combination of partitioning keys is expected to be NOT unique; otherwise tiles can only increase the used space and lead to performance degradation. But if there are a lot of duplicates, then "collapsing" them and storing the other fields in tiles will help to reduce space and speed up queries. Let's create the following projection of the `lineitem` table:
```sql
create table vops_lineitem_projection(
   l_shipdate vops_date not null,
   l_quantity vops_float4 not null,
   l_extendedprice vops_float4 not null,
   l_discount vops_float4 not null,
   l_tax vops_float4 not null,
   l_returnflag "char" not null,
   l_linestatus "char" not null);
```
As you can see, in this table the `l_returnflag` and `l_linestatus` fields are scalars, and the other fields are tiles. This projection can be populated using the following command:
select populate(destination := 'vops_lineitem_projection'::regclass, source := 'lineitem_projection'::regclass, sort := 'l_returnflag,l_linestatus');
Now we can create a normal index on the partitioning keys, define standard predicates for them, and use them in `group by` and `order by` clauses.
Sometimes it is not possible or not desirable to store two copies of the same dataset. VOPS allows loading data directly from a CSV file into a VOPS table with tiles, bypassing the creation of a normal (plain) table. It can be done using the `import` function:
select import(destination := 'vops_lineitem'::regclass, csv_path := '/mnt/data/lineitem.csv', separator := '|');
The `import` function is defined in this way:
```sql
create function import(destination regclass,
                       csv_path cstring,
                       separator cstring default ',',
                       skip integer default 0) returns bigint;
```
It accepts the name of the target VOPS table, the path to the CSV file, an optional separator (default is ',') and the number of lines in the CSV header (no header by default). The function returns the number of imported rows.
A query from a VOPS projection returns a set of tiles. The output function of a tile type is able to print the content of the tile. But in some cases it is preferable to transform the result into normal (horizontal) format, where each tuple represents one record. It can be done using the `vops_unnest` function:
```
postgres=# select vops_unnest(l.*) from vops_lineitem l where filter(l_shipdate <= '1998-12-01'::date) limit 3;
              vops_unnest
---------------------------------------
 (1996-03-13,17,33078.9,0.04,0.02,N,O)
 (1996-04-12,36,38306.2,0.09,0.06,N,O)
 (1996-01-29,8,15479.7,0.1,0.02,N,O)
(3 rows)
```
As mentioned in the previous section, the `vops_unnest` function can scatter records with VOPS types into normal records with scalar types. So it is possible to use these records in arbitrary SQL queries. But there are two problems with the `vops_unnest` function:
- It is not convenient to use. This function has no static knowledge about the format of the output record, and this is why the programmer has to specify it manually in order to decompose this record.
- The PostgreSQL optimizer has no knowledge of the result of the transformation performed by the `vops_unnest()` function. This is why it is not able to choose the optimal query execution plan for data retrieved from a VOPS table.
Fortunately Postgres provides a solution for both of these problems: foreign data wrappers (FDW). In our case the data is not really "foreign": it is stored inside our own database, but in an alternative (VOPS) format. The VOPS FDW allows us to "hide" the specifics of the VOPS format and run normal SQL queries on VOPS tables. The FDW allows the following:
- Extract data from a VOPS table in normal (horizontal) format so that it can be processed by the upper nodes in the query execution plan.
- Push down to VOPS the operations that can be efficiently executed using vectorized operations on VOPS types: filtering and aggregation.
- Provide statistics for the underlying table which can be used by the query optimizer.
So, by placing a VOPS projection under the FDW, we can efficiently perform sequential scan and aggregation queries as if they were explicitly written for the VOPS table, and at the same time be able to execute any other queries on this data, including joins, CTEs, ... The query can be written in standard SQL without the use of any VOPS-specific functions.
Below is an example of creating VOPS FDW and running some queries on it:
```sql
create foreign table lineitem_fdw (
   l_suppkey int4 not null,
   l_orderkey int4 not null,
   l_partkey int4 not null,
   l_shipdate date not null,
   l_quantity float4 not null,
   l_extendedprice float4 not null,
   l_discount float4 not null,
   l_tax float4 not null,
   l_returnflag "char" not null,
   l_linestatus "char" not null)
server vops_server options (table_name 'vops_lineitem');

explain select
    sum(l_extendedprice*l_discount) as revenue
from
    lineitem_fdw
where
    l_shipdate between '1996-01-01' and '1997-01-01'
    and l_discount between 0.08 and 0.1
    and l_quantity < 24;
                        QUERY PLAN
---------------------------------------------------------
 Foreign Scan  (cost=1903.26..1664020.23 rows=1 width=4)
(1 row)
-- Filter was pushed down to FDW

explain select
    n_name,
    count(*),
    sum(l_extendedprice * (1-l_discount)) as revenue
from
    customer_fdw join orders_fdw on c_custkey = o_custkey
    join lineitem_fdw on l_orderkey = o_orderkey
    join supplier_fdw on l_suppkey = s_suppkey
    join nation on c_nationkey = n_nationkey
    join region on n_regionkey = r_regionkey
where
    c_nationkey = s_nationkey
    and r_name = 'ASIA'
    and o_orderdate >= '1996-01-01'
    and o_orderdate < '1997-01-01'
group by
    n_name
order by
    revenue desc;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2337312.28..2337312.78 rows=200 width=48)
   Sort Key: (sum((lineitem_fdw.l_extendedprice * ('1'::double precision - lineitem_fdw.l_discount)))) DESC
   ->  GroupAggregate  (cost=2336881.54..2337304.64 rows=200 width=48)
         Group Key: nation.n_name
         ->  Sort  (cost=2336881.54..2336951.73 rows=28073 width=40)
               Sort Key: nation.n_name
               ->  Hash Join  (cost=396050.65..2334807.39 rows=28073 width=40)
                     Hash Cond: ((orders_fdw.o_custkey = customer_fdw.c_custkey) AND (nation.n_nationkey = customer_fdw.c_nationkey))
                     ->  Hash Join  (cost=335084.53..2247223.46 rows=701672 width=52)
                           Hash Cond: (lineitem_fdw.l_orderkey = orders_fdw.o_orderkey)
                           ->  Hash Join  (cost=2887.07..1786058.18 rows=4607421 width=52)
                                 Hash Cond: (lineitem_fdw.l_suppkey = supplier_fdw.s_suppkey)
                                 ->  Foreign Scan on lineitem_fdw  (cost=0.00..1512151.52 rows=59986176 width=16)
                                 ->  Hash  (cost=2790.80..2790.80 rows=7702 width=44)
                                       ->  Hash Join  (cost=40.97..2790.80 rows=7702 width=44)
                                             Hash Cond: (supplier_fdw.s_nationkey = nation.n_nationkey)
                                             ->  Foreign Scan on supplier_fdw  (cost=0.00..2174.64 rows=100032 width=8)
                                             ->  Hash  (cost=40.79..40.79 rows=15 width=36)
                                                   ->  Hash Join  (cost=20.05..40.79 rows=15 width=36)
                                                         Hash Cond: (nation.n_regionkey = region.r_regionkey)
                                                         ->  Seq Scan on nation  (cost=0.00..17.70 rows=770 width=40)
                                                         ->  Hash  (cost=20.00..20.00 rows=4 width=4)
                                                               ->  Seq Scan on region  (cost=0.00..20.00 rows=4 width=4)
                                                                     Filter: ((r_name)::text = 'ASIA'::text)
                           ->  Hash  (cost=294718.76..294718.76 rows=2284376 width=8)
                                 ->  Foreign Scan on orders_fdw  (cost=0.00..294718.76 rows=2284376 width=8)
                     ->  Hash  (cost=32605.64..32605.64 rows=1500032 width=8)
                           ->  Foreign Scan on customer_fdw  (cost=0.00..32605.64 rows=1500032 width=8)
-- filter on orders range is pushed to FDW
```
The previous section describes VOPS-specific types, operators, functions, ... Good news! You do not need to learn them. You can use normal SQL. Well, it is still the responsibility of the programmer or database administrator to create proper projections of the original table. These projections need to use tile types for some attributes (vops_float4, ...). Then you can query this table using standard SQL, and the query will be executed using vector operations!
How does it work? There is absolutely no magic here. There are four main components of the puzzle:
- User-defined types
- User-defined operators
- User-defined implicit type casts
- A post parse analyze hook which performs query transformation
So VOPS defines tile types and standard SQL operators for these types. Then it defines an implicit type cast from `vops_bool` (the result of a boolean operation with tiles) to the boolean type. Now the programmer does not have to wrap vectorized boolean operations in a `filter()` function call. And the final transformation is done by the post parse analyze hook defined by the VOPS extension. It replaces scalar boolean operations with vector boolean operations:
Original expression | Result of transformation |
---|---|
NOT filter(o1) | filter(vops_bool_not(o1)) |
filter(o1) AND filter(o2) | filter(vops_bool_and(o1, o2)) |
filter(o1) OR filter(o2) | filter(vops_bool_or(o1, o2)) |
Now there is no need to use the VOPS-specific `BETIXT` operator: the standard SQL `BETWEEN` operator will work (although `BETIXT` is still slightly more efficient, because it performs both comparisons in one function call). There are also no problems with operator precedence, so extra parentheses are not needed. If a query includes vectorized aggregates, then `count(*)` is transformed to `countall(*)`.
There is only one difference left between standard SQL and its vectorized extension: you still have to perform an explicit type cast when using a string literal. For example, `l_shipdate <= '1998-12-01'` will not work for an `l_shipdate` column with a tile type. Postgres has two overloaded versions of the `<=` operator which can be applied here:

- `vops_date <= vops_date`
- `vops_date <= date`

and it decides that it is better to convert the string to the tile type `vops_date`. In principle, it would be possible to provide such a conversion operator, but it is not a good idea: we would have to generate a dummy tile with all components equal to the specified constant and perform a (vector OP vector) operation instead of the more efficient (vector OP scalar).
There is one pitfall with the post parse analyze hook: it is installed in the extension's `_PG_init` function. But if the extension is not registered in the `shared_preload_libraries` list, it is loaded on demand when any function of the extension is first requested. Unfortunately, this happens *after* parse analyze is done. So the first time you execute a VOPS query, it will not be transformed, and you can get a wrong result. Either take this into account, or add `vops` to the `shared_preload_libraries` configuration string. The VOPS extension also provides a special function `vops_initialize()` which can be invoked to force initialization of the extension. After invoking this function, the extension will be loaded and all subsequent queries will be transformed normally and produce the expected results.
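In other words, either of the following avoids the pitfall (the first is a one-time server configuration, the second a per-session call):

```sql
-- postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'vops'

-- or, before the first VOPS query in a session:
select vops_initialize();
```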
VOPS provides some functions simplifying the creation and usage of projections. In the future this may be added to the SQL grammar, so that it becomes possible to write `CREATE PROJECTION xxx OF TABLE yyy(column1, column2,...) GROUP BY (column1, column2, ...)`. Right now it can be done using the function `create_projection(projection_name text, source_table regclass, vector_columns text[], scalar_columns text[] default null, order_by text default null)`. The first argument specifies the name of the projection, the second refers to an existing Postgres table, `vector_columns` is an array of column names which should be stored as VOPS tiles, `scalar_columns` is an array of grouping columns whose types are preserved, and the optional `order_by` parameter specifies the name of the ordering attribute (explained below). The `create_projection(PNAME,...)` function does the following:
- Creates the projection table with the specified name and attributes.
- Creates a `PNAME_refresh()` function which can be used to update the projection.
- Creates functional BRIN indexes for the `first()` and `last()` functions of the ordering attribute (if any).
- Creates a BRIN index on the grouping attributes (if any).
- Inserts information about the created projection into the `vops_projections` table. This table is used by the optimizer to automatically substitute the table with the projection.
The `order_by` attribute is one of the VOPS projection vector columns by which the data is sorted. Usually it is some kind of timestamp used in time series (for example, trade date). The presence of such a column in a projection allows the projection to be updated incrementally. The generated `PNAME_refresh()` function calls the `populate` method with corresponding values of the `predicate` and `sort` parameters, selecting from the original table only rows whose `order_by` column value is greater than the maximal value of this column in the projection. It assumes that `order_by` is unique, or at least that refresh is performed at a moment when there is a gap in the collected events. In addition to `order_by`, the sort list for `populate` includes all scalar (grouping) columns. This allows imported data to be efficiently grouped by the scalar columns and the VOPS tiles (vector columns) to be filled with data.
When the `order_by` attribute is specified, VOPS creates two functional BRIN indexes, on the `first()` and `last()` functions of this attribute. The presence of such indexes allows time slices to be selected efficiently. If the original query contains a predicate like `(trade_date between '01-01-2017' and '01-01-2018')`, then the VOPS projection substitution mechanism adds the conjuncts `(first(trade_date) >= '01-01-2017' and last(trade_date) <= '01-01-2018')`, which allow the Postgres optimizer to use the BRIN index to locate the affected pages.
In addition to the BRIN indexes for the `order_by` attribute, VOPS also creates a BRIN index on the grouping (scalar) columns. Such an index allows groups to be selected efficiently and index joins to be performed.
Like materialized views, VOPS projections are not updated automatically: it is the responsibility of the programmer to periodically refresh them. It is certainly possible to define a trigger or rule which automatically inserts data into the projection table when the original table is updated, but such an approach would be extremely inefficient and slow. To take advantage of vector processing, VOPS has to group data into tiles, which can only be done when there is a batch of data that can be grouped by the scalar attributes. If you insert records into the projection table one-by-one, then most VOPS tiles will contain just one element. The most convenient way is to use the generated `PNAME_refresh()` function. If the `order_by` attribute is specified, this function imports from the original table only the new data (not present in the projection).
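A typical refresh cycle, assuming a projection named `vops_lineitem` created by `create_projection` as in the examples below (the data file path is just an illustration):

```sql
-- Append a new batch of rows to the original table
copy lineitem from '/mnt/data/lineitem_new.tbl' delimiter '|' csv;

-- Re-run the generated refresh function; with an order_by attribute it
-- imports only rows newer than the maximum already in the projection
select vops_lineitem_refresh();
```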
The main advantage of the VOPS projection mechanism is that it can automatically substitute projections for queries on the original tables. The `vops.auto_substitute_projections` configuration parameter switches such substitution on. By default it is switched off, because a VOPS projection may not be synchronized with the original table, in which case a query on the projection may return a different result. Right now projections can be automatically substituted only if:

- The query doesn't contain joins.
- The query performs aggregation of vector (tile) columns.
- All other expressions in the target list and `ORDER BY`/`GROUP BY` clauses refer only to scalar attributes of the projection.
A projection can be removed using the `drop_projection(projection_name text)` function. It not only drops the corresponding table, but also removes information about it from the `vops_partitions` table and drops the generated refresh function.
Example of using projections:
```sql
create extension vops;

create table lineitem(
   l_orderkey integer,
   l_partkey integer,
   l_suppkey integer,
   l_linenumber integer,
   l_quantity real,
   l_extendedprice real,
   l_discount real,
   l_tax real,
   l_returnflag "char",
   l_linestatus "char",
   l_shipdate date,
   l_commitdate date,
   l_receiptdate date,
   l_shipinstruct char(25),
   l_shipmode char(10),
   l_comment char(44),
   l_dummy char(1));

select create_projection('vops_lineitem','lineitem',
    array['l_shipdate','l_quantity','l_extendedprice','l_discount','l_tax'],
    array['l_returnflag','l_linestatus']);

\timing

copy lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;

select vops_lineitem_refresh();

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= '1998-12-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

set vops.auto_substitute_projections TO on;

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= '1998-12-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
```
The most popular benchmark for OLAP is TPC-H. It contains 22 different queries. For VOPS we adapted only two of them, Q1 and Q6, which do not use joins. Most fragments of this code were already shown above, but here we collect them together:
```sql
-- Standard way of creating extension
create extension vops;

-- Original TPC-H table
create table lineitem(
   l_orderkey integer,
   l_partkey integer,
   l_suppkey integer,
   l_linenumber integer,
   l_quantity real,
   l_extendedprice real,
   l_discount real,
   l_tax real,
   l_returnflag "char",
   l_linestatus "char",
   l_shipdate date,
   l_commitdate date,
   l_receiptdate date,
   l_shipinstruct char(25),
   l_shipmode char(10),
   l_comment char(44),
   l_dummy char(1)); -- this column is needed because of the terminator after the last column in generated data

-- Import data
copy lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;

-- Create VOPS projection
create table vops_lineitem(
   l_shipdate vops_date not null,
   l_quantity vops_float4 not null,
   l_extendedprice vops_float4 not null,
   l_discount vops_float4 not null,
   l_tax vops_float4 not null,
   l_returnflag vops_char not null,
   l_linestatus vops_char not null);

-- Copy data to the projection table
select populate(destination := 'vops_lineitem'::regclass, source := 'lineitem'::regclass);

-- For honest comparison create the same projection without VOPS types
create table lineitem_projection as
(select l_shipdate, l_quantity, l_extendedprice, l_discount, l_tax,
        l_returnflag::"char", l_linestatus::"char"
 from lineitem);

-- Now create a mixed projection with partitioning keys:
create table vops_lineitem_projection(
   l_shipdate vops_date not null,
   l_quantity vops_float4 not null,
   l_extendedprice vops_float4 not null,
   l_discount vops_float4 not null,
   l_tax vops_float4 not null,
   l_returnflag "char" not null,
   l_linestatus "char" not null);

-- And populate it with data sorted by partitioning key:
select populate(destination := 'vops_lineitem_projection'::regclass,
                source := 'lineitem_projection'::regclass,
                sort := 'l_returnflag,l_linestatus');

-- Let's measure time
\timing

-- Original Q6 query performing filtering with calculation of grand aggregate
select sum(l_extendedprice*l_discount) as revenue
from lineitem
where l_shipdate between '1996-01-01' and '1997-01-01'
  and l_discount between 0.08 and 0.1
  and l_quantity < 24;

-- VOPS version of Q6 using VOPS specific operators
select sum(l_extendedprice*l_discount) as revenue
from vops_lineitem
where filter(betwixt(l_shipdate, '1996-01-01', '1997-01-01')
           & betwixt(l_discount, 0.08, 0.1)
           & (l_quantity < 24));

-- Yet another vectorized version of Q6, but now in standard SQL:
select sum(l_extendedprice*l_discount) as revenue
from vops_lineitem
where l_shipdate between '1996-01-01'::date and '1997-01-01'::date
  and l_discount between 0.08 and 0.1
  and l_quantity < 24;

-- Original version of Q1: filter + group by + aggregation
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= '1998-12-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

-- VOPS version of Q1, sorry - no final sorting
select reduce(map(l_returnflag||l_linestatus, 'sum,sum,sum,sum,avg,avg,avg',
    l_quantity,
    l_extendedprice,
    l_extendedprice*(1-l_discount),
    l_extendedprice*(1-l_discount)*(1+l_tax),
    l_quantity,
    l_extendedprice,
    l_discount))
from vops_lineitem
where filter(l_shipdate <= '1998-12-01'::date);

-- Mixed mode: let Postgres do the group by and calculate VOPS aggregates for each group
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from vops_lineitem_projection
where l_shipdate <= '1998-12-01'::date
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
```
Now the most interesting part: comparing performance on the original table with vector operations on the VOPS projection. All measurements were performed on a desktop with 16GB of RAM and a quad-core i7-4770 CPU @ 3.40GHz with hyper-threading enabled. The data set for the benchmark was generated by the dbgen utility included in the TPC-H benchmark. The scale factor is 10, which corresponds to a database of about 8GB. It fits completely in memory, so we are measuring the best query execution time for warm data. Postgres was configured with a shared buffer size of 8GB. For each query we measured the time of sequential execution and of parallel execution with 8 parallel workers.
| Query                                 | Sequential execution (msec) | Parallel execution (msec) |
|---------------------------------------|-----------------------------|---------------------------|
| Original Q1 for lineitem              | 38028                       | 10997                     |
| Original Q1 for lineitem_projection   | 33872                       | 9656                      |
| Vectorized Q1 for vops_lineitem       | 3372                        | 951                       |
| Mixed Q1 for vops_lineitem_projection | 1490                        | 396                       |
| Original Q6 for lineitem              | 16796                       | 4110                      |
| Original Q6 for lineitem_projection   | 4279                        | 1171                      |
| Vectorized Q6 for vops_lineitem       | 875                         | 284                       |
As the performance results show, VOPS can provide more than a 10x improvement in query speed. And this result is achieved without changing anything in the query planner or executor. It is better than any existing attempt to speed up execution of OLAP queries using JIT (4 times for Q1, 3 times for Q6): Speeding up query execution in PostgreSQL using LLVM JIT compiler.
Definitely, the VOPS extension is just a prototype whose main role is to demonstrate the potential of a vectorized executor. But I hope it can also be useful in practice to speed up execution of OLAP aggregation queries on existing databases. And in the future we should think about the best approach to integrating a vectorized executor into the Postgres core.
All sources of the VOPS project can be obtained from this GIT repository. A Chinese version of the documentation is available here. Please send any feedback, complaints, bug reports, and change requests to Konstantin Knizhnik.