| Structured Query Language Window functions | with clause |
The window functions discussed on this page are a special and very powerful extension to 'traditional' functions. They compute their result not on a single row but on a set of rows (similar to aggregate functions acting in correlation with a GROUP BY clause). This set of rows - and this is the crucial point - 'moves' or 'slides' over all rows, which are determined by the WHERE clause. This 'sliding window' is called a frame or - in terms of the official SQL standard - the 'window frame'.
In contrast to GROUP BY clauses, where only one output row per group exists, with window functions all rows of the result set retain their identity and are shown.
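The difference can be sketched with two queries. They assume the employee table that is created further down this page; the column alias avg_salary is chosen here only for illustration.

```sql
-- GROUP BY collapses each department into a single output row.
SELECT dep_name, AVG(salary) AS avg_salary
FROM   employee
GROUP BY dep_name;

-- The window function keeps every employee row and adds the
-- department average to each of them.
SELECT id, emp_name, dep_name, salary,
       AVG(salary) OVER (PARTITION BY dep_name) AS avg_salary
FROM   employee;
```

With the eleven sample rows, the first query returns four rows (one per department) and the second returns eleven.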
Window functions are listed between the two keywords SELECT and FROM at the same place where usual functions and columns are listed. They contain the keyword OVER.
    -- Window functions appear between the keywords SELECT and FROM
    SELECT ...,
           <window_function>,
           ...
    FROM   <tablename>
    ...
    ;

    -- They consist of three main parts:
    --   1. function type (which is the name of the function)
    --   2. keyword 'OVER'
    --   3. specification, which rows constitute the 'sliding window' (partition, order and frame)
    <window_function>      := <window_function_type> OVER ( <window_specification> )

    <window_function_type> := ROW_NUMBER() | RANK() | LEAD(<column>) | LAG(<column>) |
                              FIRST_VALUE(<column>) | LAST_VALUE(<column>) | NTH_VALUE(<column>, <n>) |
                              SUM(<column>) | MIN(<column>) | MAX(<column>) | AVG(<column>) | COUNT(<column>)
    <window_specification> := [ <window_partition> ] [ <window_order> ] [ <window_frame> ]
    <window_partition>     := PARTITION BY <column>
    <window_order>         := ORDER BY <column>
    <window_frame>         := see below

Concerning window functions, there are some similar concepts. To distinguish them from each other, an exact terminology is necessary. This terminology is introduced in the next eight steps, which also - roughly - reflect the order of execution. The goal of the first seven steps is the determination of the actual frame, and the eighth step acts on it.
1. The WHERE clause returns a certain number of rows. They constitute the result set.
2. The ORDER BY clause (syntactically behind the WHERE clause) re-orders the result set into a certain sequence.
3. The rows are passed in this sequence to the SELECT clause. The row that is actually given to the SELECT clause is called the current row.
4. The WINDOW PARTITION clause divides the result set into window partitions (we will use the shorter term partition, as in the context of our site there is no danger of confusion). If there is no WINDOW PARTITION clause, all rows of the result set constitute one partition. (These partitions are equivalent to groups created by the GROUP BY clause.) Partitions are distinct from each other: there is no overlapping, as every row of the result set belongs to one and only one partition.
5. The WINDOW ORDER clause orders the rows of each partition (this order may differ from that of the ORDER BY clause).
6. The WINDOW FRAME clause defines which rows of the actual partition belong to the actual window frame (we will use the shorter term frame). The clause defines one frame for every row of the result set. This is done by determining the lower and upper boundary of affected rows. In consequence, there are as many (mostly different) frames as there are rows in the result set. The upper and lower boundaries are newly determined with every row of the result set! Single rows may be part of more than one frame. The actual frame is the instantiation of the 'sliding window'. Its rows are ordered according to the WINDOW ORDER clause.
7. If there is no WINDOW FRAME clause, the rows of the actual partition constitute frames with the following default boundaries: the first row of the actual partition is their lower boundary and the current row is their upper boundary. If there is no WINDOW FRAME clause and no WINDOW ORDER clause, the upper boundary switches to the last row of the actual partition. Below we will explain how to change this default behavior.
8. The window function acts on the rows of the actual frame.

We use the following table to demonstrate window functions.
    CREATE TABLE employee (
      -- define columns (name / type / default value / column constraint)
      id        DECIMAL       PRIMARY KEY,
      emp_name  VARCHAR(20)   NOT NULL,
      dep_name  VARCHAR(20)   NOT NULL,
      salary    DECIMAL(7,2)  NOT NULL,
      age       DECIMAL(3,0)  NOT NULL,
      -- define table constraints (it's merely an example table)
      CONSTRAINT empoyee_uk UNIQUE (emp_name, dep_name)
    );

    INSERT INTO employee VALUES ( 1, 'Matthew',  'Management', 4500, 55);
    INSERT INTO employee VALUES ( 2, 'Olivia',   'Management', 4400, 61);
    INSERT INTO employee VALUES ( 3, 'Grace',    'Management', 4000, 42);
    INSERT INTO employee VALUES ( 4, 'Jim',      'Production', 3700, 35);
    INSERT INTO employee VALUES ( 5, 'Alice',    'Production', 3500, 24);
    INSERT INTO employee VALUES ( 6, 'Michael',  'Production', 3600, 28);
    INSERT INTO employee VALUES ( 7, 'Tom',      'Production', 3800, 35);
    INSERT INTO employee VALUES ( 8, 'Kevin',    'Production', 4000, 52);
    INSERT INTO employee VALUES ( 9, 'Elvis',    'Service',    4100, 40);
    INSERT INTO employee VALUES (10, 'Sophia',   'Sales',      4300, 36);
    INSERT INTO employee VALUES (11, 'Samantha', 'Sales',      4100, 38);
    COMMIT;
The example demonstrates how the boundaries 'slide' over the result set. Doing so, they create one frame after the next, one per row of the result set. These frames are part of partitions, the partitions are part of the result set, and the result set is part of the table.
    SELECT id,
           emp_name,
           dep_name,
           -- The functions FIRST_VALUE() and LAST_VALUE() explain themselves by their name. They act within the actual frame.
           FIRST_VALUE(id) OVER (PARTITION BY dep_name ORDER BY id) AS frame_first_row,
           LAST_VALUE(id)  OVER (PARTITION BY dep_name ORDER BY id) AS frame_last_row,
           COUNT(*)        OVER (PARTITION BY dep_name ORDER BY id) AS frame_count,
           -- The functions LAG() and LEAD() explain themselves by their name. They act within the actual partition.
           LAG(id)         OVER (PARTITION BY dep_name ORDER BY id) AS prev_row,
           LEAD(id)        OVER (PARTITION BY dep_name ORDER BY id) AS next_row
    FROM   employee;
    -- For simplification, we use the same PARTITION and ORDER definitions for all window functions.
    -- This is not necessary, you can use divergent definitions.
Please notice how the lower boundary (FRAME_FIRST_ROW) and the upper boundary (FRAME_LAST_ROW) change from row to row.
| ID | EMP_NAME | DEP_NAME | FRAME_FIRST_ROW | FRAME_LAST_ROW | FRAME_COUNT | PREV_ROW | NEXT_ROW |
|---|---|---|---|---|---|---|---|
| 1 | Matthew | Management | 1 | 1 | 1 | - | 2 |
| 2 | Olivia | Management | 1 | 2 | 2 | 1 | 3 |
| 3 | Grace | Management | 1 | 3 | 3 | 2 | - |
| 4 | Jim | Production | 4 | 4 | 1 | - | 5 |
| 5 | Alice | Production | 4 | 5 | 2 | 4 | 6 |
| 6 | Michael | Production | 4 | 6 | 3 | 5 | 7 |
| 7 | Tom | Production | 4 | 7 | 4 | 6 | 8 |
| 8 | Kevin | Production | 4 | 8 | 5 | 7 | - |
| 10 | Sophia | Sales | 10 | 10 | 1 | - | 11 |
| 11 | Samantha | Sales | 10 | 11 | 2 | 10 | - |
| 9 | Elvis | Service | 9 | 9 | 1 | - | - |
The query has no WHERE clause. Therefore all rows of the table are part of the result set. According to the WINDOW PARTITION clause, which is 'PARTITION BY dep_name', the result set is divided into the 4 partitions 'Management', 'Production', 'Sales' and 'Service'. The frames run within these partitions. As there is no WINDOW FRAME clause, the frames start at the first row of the actual partition and run up to the current row.
You can see that the actual number of rows within a frame (column FRAME_COUNT) grows from 1 up to the number of rows within the partition. When the partition switches to the next one, the number starts again with 1.
The columns PREV_ROW and NEXT_ROW show the ids of the previous and next row within the actual partition. As the first row has no predecessor, the NULL indicator is shown. This applies correspondingly to the last row and its successor.
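Because LAST_VALUE() acts on the frame, it returns the id of the current row under the default frame, as the FRAME_LAST_ROW column shows. A minimal sketch of how an explicit frame changes this, using the ROWS syntax explained later on this page (the alias partition_last_row is chosen here for illustration):

```sql
SELECT id, emp_name, dep_name,
       -- default frame: ends at the current row
       LAST_VALUE(id) OVER (PARTITION BY dep_name ORDER BY id) AS frame_last_row,
       -- explicit frame: always spans the whole partition
       LAST_VALUE(id) OVER (PARTITION BY dep_name ORDER BY id
                            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS partition_last_row
FROM   employee;
```

With the sample data, partition_last_row is 3 for every 'Management' row, 8 for 'Production', 11 for 'Sales' and 9 for 'Service', whereas frame_last_row repeats the id of the current row.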
We present some of the <window_function_type> functions and their meaning. The standard as well as most implementations include additional functions and overloaded variants.
| Signature | Scope | Meaning / Return Value |
|---|---|---|
| FIRST_VALUE(<column>) | Actual Frame | The column value of the first row within the frame. |
| LAST_VALUE(<column>) | Actual Frame | The column value of the last row within the frame. |
| LAG(<column>) | Actual Partition | The column value of the predecessor row (the row which is before the current row). |
| LAG(<column>, <n>) | Actual Partition | The column value of the n-th row before the current row. |
| LEAD(<column>) | Actual Partition | The column value of the successor row (the row which is after the current row). |
| LEAD(<column>, <n>) | Actual Partition | The column value of the n-th row after the current row. |
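| ROW_NUMBER() | Actual Partition | The sequential number of the current row within its partition, counted in the window order. The frame has no influence on it. |
| RANK() | Actual Partition | The rank of the current row within its partition. Rows with identical values in the specified order receive the same number, and the numbers that follow are skipped accordingly. |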
| NTH_VALUE(<column>, <n>) | Actual Frame | The column value of the n-th row within the frame. |
| SUM(<column>) MIN(<column>) MAX(<column>) AVG(<column>) COUNT(<column>) | Actual Frame | As usual. |
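ROW_NUMBER() and RANK() differ only when the window order contains ties. A small sketch, assuming the employee table from above, where two employees of the 'Production' department share the age 35 (the aliases row_no and age_rank are chosen for illustration):

```sql
SELECT id, emp_name, age,
       ROW_NUMBER() OVER (PARTITION BY dep_name ORDER BY age) AS row_no,
       RANK()       OVER (PARTITION BY dep_name ORDER BY age) AS age_rank
FROM   employee
WHERE  dep_name = 'Production';
```

Jim and Tom both receive the rank 3 and Kevin receives the rank 5, so the rank 4 is skipped. ROW_NUMBER() still numbers the five rows from 1 to 5; which of the two tied rows gets 3 and which gets 4 is not determined by the query.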
Here are some examples:
    SELECT id,
           emp_name,
           dep_name,
           ROW_NUMBER()           OVER (PARTITION BY dep_name ORDER BY id) AS row_number_in_frame,
           NTH_VALUE(emp_name, 2) OVER (PARTITION BY dep_name ORDER BY id) AS second_row_in_frame,
           LEAD(emp_name, 2)      OVER (PARTITION BY dep_name ORDER BY id) AS two_rows_ahead
    FROM   employee;
| ID | EMP_NAME | DEP_NAME | ROW_NUMBER_IN_FRAME | SECOND_ROW_IN_FRAME | TWO_ROWS_AHEAD |
|---|---|---|---|---|---|
| 1 | Matthew | Management | 1 | - | Grace |
| 2 | Olivia | Management | 2 | Olivia | - |
| 3 | Grace | Management | 3 | Olivia | - |
| 4 | Jim | Production | 1 | - | Michael |
| 5 | Alice | Production | 2 | Alice | Tom |
| 6 | Michael | Production | 3 | Alice | Kevin |
| 7 | Tom | Production | 4 | Alice | - |
| 8 | Kevin | Production | 5 | Alice | - |
| 10 | Sophia | Sales | 1 | - | - |
| 11 | Samantha | Sales | 2 | Samantha | - |
| 9 | Elvis | Service | 1 | - | - |
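The three window functions of this example show: the sequential number of the current row (ROW_NUMBER), the employee name of the second row within the frame (NTH_VALUE, which yields NULL as long as the frame consists of only one row), and the employee name two rows ahead (LEAD with an offset of 2, which yields NULL if the partition contains no such row).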
As shown in the above examples, the WINDOW PARTITION clause defines the partitions by using the keywords PARTITION BY, and the WINDOW ORDER clause defines the sequence of rows within the partition by using the keywords ORDER BY.
The frames are defined by the WINDOW FRAME clause, which optionally follows the WINDOW PARTITION clause and the WINDOW ORDER clause.
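With the exception of the LEAD() and LAG() functions and the numbering functions ROW_NUMBER() and RANK(), whose scope is the actual partition, window functions act on the actual frame. Therefore it is an elementary decision which rows shall constitute the frame. This is done by establishing the lower and upper boundary (in the sense of the WINDOW ORDER clause). All rows within these two bounds constitute the actual frame. Therefore the WINDOW FRAME clause consists mainly of the definition of the two boundaries - in one of four ways:
- Define a certain number of rows before and after the current row.
- Define a certain number of groups before and after the current row. Such groups are built from the distinct values of the preceding and following rows, in the same way as with SELECT DISTINCT ... or GROUP BY. The resulting frame covers all rows whose values fall into one of the groups. As every group may be built out of multiple rows (with the same value), the number of rows per frame is not constant.
- Define a range of values by fixing a distance below and above the value in the current row.
- Omit the clause and use the default boundaries.

In accordance with the first three strategies, there are three keywords 'ROWS', 'GROUPS' and 'RANGE', which lead to the different behavior.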
The WINDOW FRAME clause uses some keywords that specify positions within the ordered rows of a partition. The following sketch visualizes these rows and the related keywords.
    Rows in a partition and the related keywords

     -     <-- UNBOUNDED PRECEDING (first row)
     ...
     -     <-- 2 PRECEDING
     -     <-- 1 PRECEDING
     -     <-- CURRENT ROW
     -     <-- 1 FOLLOWING
     -     <-- 2 FOLLOWING
     ...
     -     <-- UNBOUNDED FOLLOWING (last row)
The term UNBOUNDED PRECEDING denotes the first row in a partition and UNBOUNDED FOLLOWING the last row. Counting from the CURRENT ROW there are <n> PRECEDING and <n> FOLLOWING rows. Obviously this PRECEDING/FOLLOWING terminology works only if there is a WINDOW ORDER clause, which creates an unambiguous sequence.
The (simplified) syntax of the WINDOW FRAME clause is:
    <window_frame> := [ROWS | GROUPS | RANGE] BETWEEN
                      [UNBOUNDED PRECEDING | <n> PRECEDING | CURRENT ROW]
                      AND
                      [UNBOUNDED FOLLOWING | <n> FOLLOWING | CURRENT ROW]
An example of a complete window function with its WINDOW FRAME clause is:
    ...
      SUM(salary) OVER (PARTITION BY dep_name ORDER BY salary
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as growing_sum,
    ...
In this case the WINDOW FRAME clause starts with the keyword 'ROWS'. It sets the lower boundary to the very first row of the partition and the upper boundary to the current row. This means that the series of frames grows from frame to frame by one additional row until all rows of the partition are handled. Afterward, the next partition starts with a 1-row-frame and repeats the growing.
The ROWS syntax defines a certain number of rows to process.
    SELECT id, dep_name, salary,
           SUM(salary) OVER (PARTITION BY dep_name ORDER BY salary
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_over_1or2or3_rows
    FROM   employee;
The example acts on a certain number of rows, namely the two rows before the current row (if existing within the partition) and the current row. There is no situation where more than three rows exist in one of the frames. The window function computes the sum of the salary over these at most three rows.
The sum is reset to zero with every new partition, which is the department in this case. This holds true also for the GROUPS and RANGE syntax.
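The difference between a growing frame and a frame of limited size can be sketched by placing both sums side by side. The query below assumes the employee table from above and restricts the output to one department to keep it short; the aliases growing_sum and sliding_sum are chosen for illustration.

```sql
SELECT id, salary,
       -- frame grows from the first row of the partition up to the current row
       SUM(salary) OVER (PARTITION BY dep_name ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS growing_sum,
       -- frame contains at most three rows
       SUM(salary) OVER (PARTITION BY dep_name ORDER BY salary
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sliding_sum
FROM   employee
WHERE  dep_name = 'Production'
ORDER BY salary;
-- Expected values (decimal places omitted):
-- id | salary | growing_sum | sliding_sum
--  5 |   3500 |        3500 |        3500
--  6 |   3600 |        7100 |        7100
--  4 |   3700 |       10800 |       10800
--  7 |   3800 |       14600 |       11100
--  8 |   4000 |       18600 |       11500
```

From the fourth row on, the two sums differ because the sliding frame drops the rows that lie more than two positions before the current row.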
The ROWS syntax is often used when one is interested in the average over a certain number of rows or in the distance between two rows.
The GROUPS syntax has semantics similar to the ROWS syntax - with one exception: rows with equal values within the column of the WINDOW ORDER clause count as 1 row. The GROUPS syntax counts the number of distinct values, not the number of rows.
    -- Hint: The syntax 'GROUPS' (Feature T620) is not supported by Oracle 11
    SELECT id, dep_name, salary,
           SUM(salary) OVER (PARTITION BY dep_name ORDER BY salary
                             GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_over_groups
    FROM   employee;
The example starts with the keyword GROUPS and defines a frame that covers up to three distinct values of the column 'salary': the previous value, the current value and the next value. Possibly more than three rows satisfy this criterion - in contrast to the equivalent ROWS strategy.
The GROUPS syntax is the appropriate strategy if one has a varying number of rows within the time period under review, e.g.: one has a varying number of measurement values per day and is interested in the average of the variance over a week or month.
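In the sample data no two employees of the same department share a salary, so the GROUPS query above returns the same sums as its ROWS counterpart would. The effect becomes visible with a column that contains ties. A sketch, assuming the employee table from above and an implementation that supports GROUPS (for example PostgreSQL 11 or SQLite 3.28 and later); the aliases cnt_rows and cnt_groups are illustrative:

```sql
SELECT id, emp_name, age,
       -- counts physical rows: the previous row and the current row
       COUNT(*) OVER (ORDER BY age
                      ROWS   BETWEEN 1 PRECEDING AND CURRENT ROW) AS cnt_rows,
       -- counts all rows of the previous age value and of the current age value
       COUNT(*) OVER (ORDER BY age
                      GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW) AS cnt_groups
FROM   employee
WHERE  dep_name = 'Production';
```

Ordered by age, the department holds the ages 24, 28, 35, 35 and 52. cnt_rows is 1 for the first row and 2 for all others. cnt_groups is 1, 2, 3, 3 and 3, because the two rows with age 35 form one group and are always counted together.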
At first glance, the RANGE syntax is similar to the ROWS and GROUPS syntax. But the semantics are very different! Numbers <n> given in this syntax do not specify any counter. They specify the distance from the value in the current row to the lower or upper boundary. Therefore the ORDER BY column shall be of type NUMERIC, DATE, or INTERVAL.
    SELECT id, dep_name, salary,
           SUM(salary) OVER (PARTITION BY dep_name ORDER BY salary
                             RANGE BETWEEN 100 PRECEDING AND 50 FOLLOWING) AS sum_over_range
    FROM   employee;
This definition leads to the sum over all rows whose salary lies between 100 below and 50 above the salary of the current row. In our example table, this criterion applies in some cases to more than 1 row.
Typical use cases for the RANGE strategy are situations where someone analyzes a wide numeric range and expects to meet only a few rows within this range, e.g.: a sparse matrix.
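As a sketch of the RANGE semantics, the following query counts for every employee how many employees are at most five years younger or older. It assumes the employee table from above; the alias similar_age is illustrative.

```sql
SELECT id, emp_name, age,
       -- the frame is defined by the VALUE of age, not by a number of rows
       COUNT(*) OVER (ORDER BY age
                      RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING) AS similar_age
FROM   employee
ORDER BY age;
```

For Olivia (age 61) the frame contains only her own row, because no other age lies between 56 and 66. For Samantha (age 38) it contains six rows, namely those with the ages 35, 35, 36, 38, 40 and 42. The number of rows per frame depends on the values and not on a row count.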
If the WINDOW FRAME clause is omitted, its default value is: 'RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW'. This leads to a range from the very first row of the partition up to the current row plus all rows with the same value as the current row - because the RANGE syntax applies.
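The effect of this default can be sketched by comparing it with an explicit ROWS frame on a column with ties. The query assumes the employee table from above, in which Jim and Tom share the age 35; the aliases sum_default and sum_rows are illustrative.

```sql
SELECT id, emp_name, age, salary,
       -- no frame clause: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       SUM(salary) OVER (ORDER BY age) AS sum_default,
       -- explicit ROWS frame: ends exactly at the current row
       SUM(salary) OVER (ORDER BY age
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows
FROM   employee;
```

In a standard-conforming implementation, sum_default is 14600 for both Jim and Tom, because each of their frames includes the other row of the same age (3500 + 3600 + 3700 + 3800). sum_rows differs between the two rows: the one that happens to be processed first gets a smaller sum, the other one gets 14600.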
If the WINDOW ORDER clause is omitted, the WINDOW FRAME clause is not allowed and all rows of the partition constitute the frame.
If the PARTITION BY clause is omitted, all rows of the result set constitute the one and only partition.
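When both clauses are omitted, every frame is the complete result set. A sketch, assuming the employee table from above, that relates every salary to the total of all salaries (the aliases total_salary and percent_of_total are illustrative):

```sql
SELECT id, emp_name, salary,
       -- empty window specification: one partition, one frame, all rows
       SUM(salary) OVER () AS total_salary,
       100 * salary / SUM(salary) OVER () AS percent_of_total
FROM   employee;
```

With the sample data, total_salary is 44000 in every row.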
Although the SQL standard 2003 and its successors define very clear rules concerning window functions, several implementations do not follow them. Some vendors implement only parts of the standard - which is their own responsibility -, but others seem to interpret the standard in a fanciful fashion.
As far as we know, the ROWS syntax conforms to the standard when it is implemented. But it seems that the RANGE syntax sometimes implements what the GROUPS syntax of the SQL standard requires. (Perhaps this is a misrepresentation, and only the publicly available descriptions of various implementations do not reflect the details.) So: be careful, test your system, and give us feedback on the discussion page.
Show id, emp_name, dep_name, salary and the average salary within the department.
    --
    -- To get the average of the department, every frame must be built by ALL rows of the department.
    --
    SELECT id, emp_name, dep_name, salary,
           AVG(salary) OVER (PARTITION BY dep_name ORDER BY dep_name
                             -- all rows of partition (=department)
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as avg_salary
    FROM   employee;

    --
    -- It's possible to omit the 'window order' clause. Thereby the frames include ALL rows of the actual partition.
    -- See: 'Defaults' above.
    --
    SELECT id, emp_name, dep_name, salary,
           AVG(salary) OVER (PARTITION BY dep_name) as avg_salary
    FROM   employee;

    --
    -- The following statement leads to different results as the frames are composed of a growing number of rows.
    --
    SELECT id, emp_name, dep_name, salary,
           AVG(salary) OVER (PARTITION BY dep_name ORDER BY salary) as avg_salary
    FROM   employee;

    --
    -- It's possible to sort the result set by arbitrary columns (test the emp_name, it's interesting)
    --
    SELECT id, emp_name, dep_name, salary,
           AVG(salary) OVER (PARTITION BY dep_name) as avg_salary
    FROM   employee
    ORDER BY dep_name, salary;
Do older persons earn more money than younger ones?
To give an answer, show id, emp_name, salary, age and the average salary of 3 (or 5) persons who are of a similar age.
    SELECT id, emp_name, salary, age,
           AVG(salary) OVER (ORDER BY age ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS mean_over_3,
           AVG(salary) OVER (ORDER BY age ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS mean_over_5
    FROM   employee;
    -- As there is no restriction to any other criterion than the age (department or something else), there is
    -- no need for any PARTITION definition. Averages are computed without any interruption.
Extend the above question and its solution to show the results within the four departments.
    SELECT id, emp_name, salary, age, dep_name,
           AVG(salary) OVER (PARTITION BY dep_name ORDER BY age ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS mean_over_3,
           AVG(salary) OVER (PARTITION BY dep_name ORDER BY age ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS mean_over_5
    FROM   employee;
    -- Averages are computed WITHIN departments.
Show id, emp_name, salary and the difference to the salary of the previous person (in ID-order).
    -- For mathematicians: this is a very first approximation of the first derivative.
    SELECT id, emp_name, salary,
           salary - LAG(salary) OVER (ORDER BY id) as diff_salary
    FROM   employee;

    -- And the difference of differences.
    -- Note: here the differences are computed as 'previous minus current',
    -- i.e. with the opposite sign of the query above.
    SELECT id, emp_name, salary,
           (LAG(salary) OVER (ORDER BY id) - salary) AS diff_salary_1,
           (LAG(salary) OVER (ORDER BY id) - salary) -
           (LAG(salary, 2) OVER (ORDER BY id) - LAG(salary) OVER (ORDER BY id)) AS diff_salary_2
    FROM   employee;
Show the 'surrounding' of a value: id and emp_name of all persons ordered by emp_name. Supplement each row with the two emp_names before and the two after the actual emp_name (in the usual alphabetical order).
    SELECT id,
           LAG(emp_name, 2)  OVER (ORDER BY emp_name) AS before_prev,
           LAG(emp_name)     OVER (ORDER BY emp_name) AS prev,
           emp_name                                   AS act,
           LEAD(emp_name)    OVER (ORDER BY emp_name) AS follower,
           LEAD(emp_name, 2) OVER (ORDER BY emp_name) AS behind_follower
    FROM   employee
    ORDER BY emp_name;