Commit be4b4dc
Omit null rows when applying the Haas-Stokes estimator for ndistinct.
Previously, we included null rows in the values of n and N that went into the formula, which amounts to considering null as a value in its own right; but the d and f1 values do not include nulls. This is inconsistent, and it contributes to significant underestimation of ndistinct when the column is mostly nulls. In any case stadistinct is defined as the number of distinct non-null values, so we should exclude nulls when doing this computation.

This is an aboriginal bug in our application of the Haas-Stokes formula, but we'll refrain from back-patching for fear of destabilizing plan choices in released branches.

While at it, make the code a bit more readable by omitting unnecessary casts and intermediate variables.

Observation and original patch by Tomas Vondra, adjusted to fix both uses of the formula by Alex Shulgin, cosmetic improvements by me.

1 parent 82c83b3, commit be4b4dc
1 file changed: 38 additions & 24 deletions
[Diff body not preserved. Two regions of the file changed: original lines 2072-2101 (new lines 2072-2108) and original lines 2425-2450 (new lines 2432-2464).]
0 commit comments