Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitb33b1fe

Browse files
committed
Update bloom filter README.
1 parent9dbf1c9 commitb33b1fe

File tree

1 file changed

+39
-15
lines changed

1 file changed

+39
-15
lines changed

‎src/data-structures/bloom-filter/README.md

Lines changed: 39 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1,34 +1,61 @@
11
#Bloom Filter
22

3-
A bloom filter is a data structure designed to
4-
test whether an element is present in a set. It
5-
is designed to be blazingly fast and use minimal
6-
memory at the cost of potential false positives.
3+
A bloom filter is a space-efficient probabilistic
4+
data structure designed to test whether an element
5+
is present in a set. It is designed to be blazingly
6+
fast and use minimal memory at the cost of potential
7+
false positives. False positive matches are possible,
8+
but false negatives are not – in other words, a query
9+
returns either "possibly in set" or "definitely not in set".
10+
11+
Bloom proposed the technique for applications where the
12+
amount of source data would require an impractically large
13+
amount of memory if "conventional" error-free hashing
14+
techniques were applied.
15+
16+
##Algorithm description
17+
18+
An empty Bloom filter is a bit array of`m` bits, all
19+
set to`0`. There must also be`k` different hash functions
20+
defined, each of which maps or hashes some set element to
21+
one of the`m` array positions, generating a uniform random
22+
distribution. Typically,`k` is a constant, much smaller
23+
than`m`, which is proportional to the number of elements
24+
to be added; the precise choice of`k` and the constant of
25+
proportionality of`m` are determined by the intended
26+
false positive rate of the filter.
27+
28+
Here is an example of a Bloom filter, representing the
29+
set`{x, y, z}`. The colored arrows show the positions
30+
in the bit array that each set element is mapped to. The
31+
element`w` is not in the set`{x, y, z}`, because it
32+
hashes to one bit-array position containing`0`. For
33+
this figure,`m = 18` and`k = 3`.
734

835
![Bloom Filter](https://upload.wikimedia.org/wikipedia/commons/a/ac/Bloom_filter.svg)
936

1037
##Operations
1138

1239
There are two main operations a bloom filter can
13-
perform:insertion andsearch. Search may result in
40+
perform:_insertion_ and_search_. Search may result in
1441
false positives. Deletion is not possible.
1542

1643
In other words, the filter can take in items. When
1744
we go to check if an item has previously been
1845
inserted, it can tell us either "no" or "maybe".
1946

20-
Both insertion and search are O(1) operations.
47+
Both insertion and search are`O(1)` operations.
2148

2249
##Making the filter
2350

2451
A bloom filter is created by allotting a certain size.
25-
In our example, we use 100 as a default length. All
52+
In our example, we use`100` as a default length. All
2653
locations are initialized to`false`.
2754

2855
###Insertion
2956

3057
During insertion, a number of hash functions,
31-
in our case3 hash functions, are used to create
58+
in our case`3` hash functions, are used to create
3259
hashes of the input. These hash functions output
3360
indexes. At every index received, we simply change
3461
the value in our bloom filter to`true`.
@@ -65,13 +92,13 @@ The formula to calculate probablity of a false positive is:
6592

6693
( 1 - e <sup>-kn/m</sup> ) <sup>k</sup>
6794

68-
k =# hash functions
95+
`k` =number of hash functions
6996

70-
m = size
97+
`m` = filter size
7198

72-
n =# items inserted
99+
`n` =number of items inserted
73100

74-
These variables,k, m, andn, should be picked based
101+
These variables,`k`,`m`, and`n`, should be picked based
75102
on how acceptable false positives are. If the values
76103
are picked and the resulting probability is too high,
77104
the values should be tweaked and the probability
@@ -92,9 +119,6 @@ but the cost is acceptable. It's ok if a user never sees
92119
a few articles as long as they have other, brand new ones
93120
to see every time they visit the site.
94121

95-
The popular blog site Medium does a version of this.
96-
Feel free to read[their article](https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff).
97-
98122
##References
99123

100124
-[Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter)

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp