Aggregate functions
Functions for computing asingle result from a set of input values. Elasticsearch SQL supports aggregate functions only alongsidegrouping (implicit or explicit).
AVG(numeric_field)Input:
- numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description: Returns theAverage (arithmetic mean) of input values.
SELECT AVG(salary) AS avg FROM emp; avg---------------48248.55SELECT AVG(salary / 12.0) AS avg FROM emp; avg---------------4020.7125COUNT(expression)Input:
- a field name, wildcard (
*) or any numeric value. ForCOUNT(*)orCOUNT(<literal>), all values are considered, includingnullor missing ones. ForCOUNT(<field_name>),nullvalues are not considered.
Output: numeric value
Description: Returns the total number (count) of input values.
SELECT COUNT(*) AS count FROM emp; count---------------100COUNT(ALL field_name)Input:
- a field name. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output: numeric value
Description: Returns the total number (count) of allnon-null input values.COUNT(<field_name>) andCOUNT(ALL <field_name>) are equivalent.
SELECT COUNT(ALL last_name) AS count_all, COUNT(DISTINCT last_name) count_distinct FROM emp; count_all | count_distinct---------------+------------------100 |96SELECT COUNT(ALL CASE WHEN languages IS NULL THEN -1 ELSE languages END) AS count_all, COUNT(DISTINCT CASE WHEN languages IS NULL THEN -1 ELSE languages END) count_distinct FROM emp; count_all | count_distinct---------------+---------------100 |6COUNT(DISTINCT field_name)Input:
- a field name
Output: numeric value. If this field contains onlynull values, the function returnsnull. Otherwise, the function ignoresnull values in this field.
Description: Returns the total number ofdistinct non-null values in input values.
SELECT COUNT(DISTINCT hire_date) unique_hires, COUNT(hire_date) AS hires FROM emp; unique_hires | hires----------------+---------------99 |100SELECT COUNT(DISTINCT DATE_TRUNC('YEAR', hire_date)) unique_hires, COUNT(DATE_TRUNC('YEAR', hire_date)) AS hires FROM emp; unique_hires | hires---------------+---------------14 |100FIRST( field_name [, ordering_field_name])Input:
- target field for the aggregation
- optional field used for ordering
Output: same type as the input
Description: Returns the first non-null value (if such exists) of thefield_name input column sorted by theordering_field_name column. Ifordering_field_name is not provided, only thefield_name column is used for the sorting. E.g.:
| a | b |
|---|---|
| 100 | 1 |
| 200 | 1 |
| 1 | 2 |
| 2 | 2 |
| 10 | null |
| 20 | null |
| null | null |
SELECT FIRST(a) FROM twill result in:
| FIRST(a) |
|---|
| 1 |
and
SELECT FIRST(a, b) FROM twill result in:
| FIRST(a, b) |
|---|
| 100 |
SELECT FIRST(first_name) FROM emp; FIRST(first_name)--------------------AlejandroSELECT gender, FIRST(first_name) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST(first_name)------------+--------------------null | BerniF | AlejandroM | AmabileSELECT FIRST(first_name, birth_date) FROM emp; FIRST(first_name, birth_date)--------------------------------RemziSELECT gender, FIRST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST(first_name, birth_date)--------------+--------------------------------null | LillianF | SumantM | RemziFIRST_VALUE is a name alias and can be used instead ofFIRST, e.g.:
SELECT gender, FIRST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST_VALUE(first_name, birth_date)--------------+--------------------------------------null | LillianF | SumantM | RemziSELECT gender, FIRST_VALUE(SUBSTRING(first_name, 2, 6), birth_date) AS "first" FROM emp GROUP BY gender ORDER BY gender; gender | first---------------+---------------null |illianF |umantM |emziFIRST cannot be used in a HAVING clause.
FIRST cannot be used with columns of typetext unless the field is alsosaved as a keyword.
LAST( field_name [, ordering_field_name])Input:
- target field for the aggregation
- optional field used for ordering
Output: same type as the input
Description: It’s the inverse ofFIRST/FIRST_VALUE. Returns the last non-null value (if such exists) of thefield_name input column sorted descending by theordering_field_name column. Ifordering_field_name is not provided, only thefield_name column is used for the sorting. E.g.:
| a | b |
|---|---|
| 10 | 1 |
| 20 | 1 |
| 1 | 2 |
| 2 | 2 |
| 100 | null |
| 200 | null |
| null | null |
SELECT LAST(a) FROM twill result in:
| LAST(a) |
|---|
| 200 |
and
SELECT LAST(a, b) FROM twill result in:
| LAST(a, b) |
|---|
| 2 |
SELECT LAST(first_name) FROM emp; LAST(first_name)-------------------ZvonkoSELECT gender, LAST(first_name) FROM emp GROUP BY gender ORDER BY gender; gender | LAST(first_name)------------+-------------------null | PatricioF | XinglinM | ZvonkoSELECT LAST(first_name, birth_date) FROM emp; LAST(first_name, birth_date)-------------------------------HilariSELECT gender, LAST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | LAST(first_name, birth_date)-----------+-------------------------------null | EberhardtF | ValdiodioM | HilariLAST_VALUE is a name alias and can be used instead ofLAST, e.g.:
SELECT gender, LAST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | LAST_VALUE(first_name, birth_date)-----------+-------------------------------------null | EberhardtF | ValdiodioM | HilariSELECT gender, LAST_VALUE(SUBSTRING(first_name, 3, 8), birth_date) AS "last" FROM emp GROUP BY gender ORDER BY gender; gender | last---------------+---------------null |erhardtF |ldiodioM |lariLAST cannot be used inHAVING clause.
LAST cannot be used with columns of typetext unless the field is alsosaved as a keyword.
MAX(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output: same type as the input
Description: Returns the maximum value across input values in the fieldfield_name.
SELECT MAX(salary) AS max FROM emp; max---------------74999SELECT MAX(ABS(salary / -12.0)) AS max FROM emp; max-----------------6249.916666666667MAX on a field of typetext orkeyword is translated intoLAST/LAST_VALUE and therefore, it cannot be used inHAVING clause.
MIN(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output: same type as the input
Description: Returns the minimum value across input values in the fieldfield_name.
SELECT MIN(salary) AS min FROM emp; min---------------25324MIN on a field of typetext orkeyword is translated intoFIRST/FIRST_VALUE and therefore, it cannot be used inHAVING clause.
SUM(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:bigint for integer input,double for floating points
Description: Returns the sum of input values in the fieldfield_name.
SELECT SUM(salary) AS sum FROM emp; sum---------------4824855SELECT ROUND(SUM(salary / 12.0), 1) AS sum FROM emp; sum---------------402071.3KURTOSIS(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Quantify the shape of the distribution of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, KURTOSIS(salary) AS k FROM emp; min | max | k---------------+---------------+------------------25324 |74999 |2.0444718929142986KURTOSIS cannot be used on top of scalar functions or operators but only directly on a field. So, for example, the following is not allowed and an error is returned:
SELECT KURTOSIS(salary / 12.0), gender FROM emp GROUP BY genderMAD(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Measure the variability of the input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, AVG(salary) AS avg, MAD(salary) AS mad FROM emp; min | max | avg | mad---------------+---------------+---------------+---------------25324 |74999 |48248.55 |10096.5SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, AVG(salary/ 12.0) AS avg, MAD(salary / 12.0) AS mad FROM emp; min | max | avg | mad------------------+-----------------+---------------+-----------------2110.3333333333335|6249.916666666667|4020.7125 |841.3750000000002PERCENTILE( field_name, percentile[, method[, method_parameter]])Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field. - a numeric expression (must be a constant and not based on a field). If
null, the function returnsnull. - optional string literal for thepercentile algorithm. Possible values:
tdigestorhdr. Defaults totdigest. - optional numeric literal that configures thepercentile algorithm. Configures
compressionfortdigestornumber_of_significant_value_digitsforhdr. The default is the same as that of the backing algorithm.
Output:double numeric value
Description:
Returns the nthpercentile (represented bynumeric_exp parameter) of input values in the fieldfield_name.
SELECT languages, PERCENTILE(salary, 95) AS "95th" FROM emp GROUP BY languages; languages | 95th---------------+-----------------null |74482.41 |71122.82 |70271.43 |71926.04 |69352.155 |56371.0SELECT languages, PERCENTILE(salary / 12.0, 95) AS "95th" FROM emp GROUP BY languages; languages | 95th---------------+------------------null |6206.8666666666671 |5926.92 |5855.9499999999993 |5993.8333333333334 |5779.3458333333335 |4697.583333333333SELECT languages, PERCENTILE(salary, 97.3, 'tdigest', 100.0) AS "97.3_TDigest", PERCENTILE(salary, 97.3, 'hdr', 3) AS "97.3_HDR"FROM empGROUP BY languages; languages | 97.3_TDigest | 97.3_HDR---------------+-----------------+---------------null |74720.036 |74992.01 |72316.132 |73712.02 |71792.436 |69936.03 |73326.23999999999|74992.04 |71753.281 |74608.05 |61176.16000000001|56368.0PERCENTILE_RANK( field_name, value[, method[, method_parameter]])Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field. - a numeric expression (must be a constant and not based on a field). If
null, the function returnsnull. - optional string literal for thepercentile algorithm. Possible values:
tdigestorhdr. Defaults totdigest. - optional numeric literal that configures thepercentile algorithm. Configures
compressionfortdigestornumber_of_significant_value_digitsforhdr. The default is the same as that of the backing algorithm.
Output:double numeric value
Description:
Returns the nthpercentile rank (represented bynumeric_exp parameter) of input values in the fieldfield_name.
SELECT languages, PERCENTILE_RANK(salary, 65000) AS rank FROM emp GROUP BY languages; languages | rank---------------+-----------------null |73.657665699620621 |73.72916251577342 |88.880056070106433 |79.436626232958294 |85.704463896434935 |96.79075152940749SELECT languages, PERCENTILE_RANK(salary/12, 5000) AS rank FROM emp GROUP BY languages; languages | rank---------------+------------------null |66.912408759124091 |66.707667076670762 |84.132668950482713 |61.0529926256216844 |76.556464439900015 |94.00696864111498SELECT languages, ROUND(PERCENTILE_RANK(salary, 65000, 'tdigest', 100.0), 2) AS "rank_TDigest", ROUND(PERCENTILE_RANK(salary, 65000, 'hdr', 3), 2) AS "rank_HDR"FROM empGROUP BY languages; languages | rank_TDigest | rank_HDR---------------+---------------+---------------null |73.66 |80.01 |73.73 |73.332 |88.88 |89.473 |79.44 |76.474 |85.7 |83.335 |96.79 |95.24SKEWNESS(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Quantify the asymmetric distribution of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, SKEWNESS(salary) AS s FROM emp; min | max | s---------------+---------------+------------------25324 |74999 |0.2707722118423227SKEWNESS cannot be used on top of scalar functions but only directly on a field. So, for example, the following is not allowed and an error is returned:
SELECT SKEWNESS(ROUND(salary / 12.0, 2), gender FROM emp GROUP BY genderSTDDEV_POP(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Returns thepopulation standard deviation of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_POP(salary) AS stddev FROM emp; min | max | stddev---------------+---------------+------------------25324 |74999 |13765.125502787832SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_POP(salary / 12.0) AS stddev FROM emp; min | max | stddev------------------+-----------------+-----------------2110.3333333333335|6249.916666666667|1147.093791898986STDDEV_SAMP(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Returns thesample standard deviation of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_SAMP(salary) AS stddev FROM emp; min | max | stddev---------------+---------------+------------------25324 |74999 |13834.471662090747SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_SAMP(salary / 12.0) AS stddev FROM emp; min | max | stddev------------------+-----------------+-----------------2110.3333333333335|6249.916666666667|1152.872638507562SUM_OF_SQUARES(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Returns the sum of squares of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, SUM_OF_SQUARES(salary) AS sumsq FROM emp; min | max | sumsq---------------+---------------+----------------25324 |74999 |2.51740125721E11SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, SUM_OF_SQUARES(salary / 24.0) AS sumsq FROM emp; min | max | sumsq------------------+------------------+-------------------1055.1666666666667|3124.9583333333335|4.370488293767361E8VAR_POP(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Returns thepopulation variance of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_POP(salary) AS varpop FROM emp; min | max | varpop---------------+---------------+----------------25324 |74999 |1.894786801075E8SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_POP(salary / 24.0) AS varpop FROM emp; min | max | varpop------------------+------------------+------------------1055.1666666666667|3124.9583333333335|328956.04185329855VAR_SAMP(field_name)Input:
- a numeric field. If this field contains only
nullvalues, the function returnsnull. Otherwise, the function ignoresnullvalues in this field.
Output:double numeric value
Description:
Returns thesample variance of input values in the fieldfield_name.
SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_SAMP(salary) AS varsamp FROM emp; min | max | varsamp---------------+---------------+----------------25324 |74999 |1.913926061691E8SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_SAMP(salary / 24.0) AS varsamp FROM emp; min | max | varsamp------------------+------------------+----------------1055.1666666666667|3124.9583333333335|332278.830154847