- Categories:
Aggregate functions (Cardinality Estimation) ,Window functions
APPROX_COUNT_DISTINCT¶
Uses HyperLogLog to return an approximation of the distinct cardinality of the input (i.e.HLL(col1,col2,...) returns an approximation ofCOUNT(DISTINCTcol1,col2,...)).
For more information about HyperLogLog, seeEstimating the Number of Distinct Values.
- Aliases:
HLL.
- See also:
Syntax¶
Aggregate function
APPROX_COUNT_DISTINCT([DISTINCT]<expr1>[,...])APPROX_COUNT_DISTINCT(*)
Window function
APPROX_COUNT_DISTINCT([DISTINCT]<expr1>[,...])OVER([PARTITIONBY<expr2>])APPROX_COUNT_DISTINCT(*)OVER([PARTITIONBY<expr2>])
Arguments¶
expr1This is the expression for which you want to know the number of distinct values.
expr2This is the optional expression used to group rows into partitions.
*Returns an approximation of the total number of records, excluding records with NULL values.
When you pass a wildcard to the function, you can qualify the wildcard with the name or alias for the table.For example, to pass in all of the columns from the table named
mytable, specify the following:(mytable.*)
CopyYou can also use the ILIKE and EXCLUDE keywords for filtering:
ILIKE filters for column names that match the specified pattern. Only onepattern is allowed. For example:
(*ILIKE'col1%')
CopyEXCLUDE filters out column names that don’t match the specified column or columns. For example:
(*EXCLUDEcol1)(*EXCLUDE(col1,col2))
Copy
Qualifiers are valid when you use these keywords. The following example uses the ILIKE keyword tofilter for all of the columns that match the pattern
col1%in the tablemytable:(mytable.*ILIKE'col1%')
CopyThe ILIKE and EXCLUDE keywords can’t be combined in a single function call.
For this function, the ILIKE and EXCLUDE keywords are valid only in a SELECT list or GROUP BY clause.
For more information about the ILIKE and EXCLUDE keywords, see the “Parameters” section inSELECT.
Returns¶
The data type of the returned value is INTEGER.
Usage notes¶
Although the computation is an approximation, it is deterministic. When this function is called with the same inputdata, this function returns the same results.
For information about NULL values and aggregate functions, seeAggregate functions and NULL values.
When this function is called as a window function, it does not support:
An ORDER BY clause within the OVER clause.
Explicit window frames.
Examples¶
This example shows how to use APPROX_COUNT_DISTINCT and its alias HLL. This example callsbothCOUNT(DISTINCTi) andAPPROX_COUNT_DISTINCT(i) to emphasizethat the results of those two functions do not always match exactly.
The exact output of the following query might vary because APPROX_COUNT_DISTINCT returns an approximation, not an exact value.
SELECTCOUNT(i),COUNT(DISTINCTi),APPROX_COUNT_DISTINCT(i),HLL(i)FROMsequence_demo;
+----------+-------------------+--------------------------+--------+| COUNT(I) | COUNT(DISTINCT I) | APPROX_COUNT_DISTINCT(I) | HLL(I) ||----------+-------------------+--------------------------+--------|| 1024 | 1024 | 1007 | 1007 |+----------+-------------------+--------------------------+--------+