









技术领域technical field
本发明涉及计算机信息表示与信息检索技术领域,特别涉及一种标签布谷鸟过滤器。The invention relates to the technical field of computer information representation and information retrieval, in particular to a tag cuckoo filter.
背景技术Background technique
成员隶属查询(Membership Query)是许多网络应用和分布式系统(例如:协同缓存、数据包处理、键值存储和重复数据删除)的关键方法之一,要求满足存储空间开销低、快速查询和增量更新等三个关键需求。目前成员隶属查询普遍采用布鲁姆过滤器(BloomFilter)、标准布鲁姆过滤器(Standard Bloom Filter)、计数布鲁姆过滤器(CountingBloom Filter)以及布谷鸟过滤器(Cuckoo Filter)等,但是布鲁姆过滤器(Bloom Filter)及其变型难以同时满足上述三个关键需求。例如,标准布鲁姆过滤器支持元素插入和查询操作,但是不支持元素删除操作。计数布鲁姆过滤器是一种支持删除操作的布鲁姆过滤器,但是其存储空间开销高。布谷鸟过滤器是一种支持删除操作的空间高效布鲁姆过滤器,显著减少计数布鲁姆过滤器的存储空间开销,甚至比标准布鲁姆过滤器的存储空间开销低。但是,现有布谷鸟过滤器存在每个数据成员的存储空间开销随元素个数动态变化的问题,其原因是布谷鸟过滤器的异或操作要求存储桶数必须为2的幂(即2b,b为指数),导致最坏情况的每个数据成员的存储空间开销增大2倍。Membership Query is one of the key methods for many network applications and distributed systems (such as collaborative caching, packet processing, key-value storage, and data deduplication), which requires low storage space overhead, fast query and increase. Three key requirements such as volume update. At present, member membership queries generally use Bloom Filter, Standard Bloom Filter, Counting Bloom Filter and Cuckoo Filter, etc. It is difficult for Bloom Filters and their variants to satisfy the above three key requirements at the same time. For example, standard Bloom filters support element insertion and query operations, but not element deletion. The count bloom filter is a bloom filter that supports delete operations, but it has a high storage space overhead. The cuckoo filter is a space-efficient bloom filter that supports delete operations, significantly reducing the storage space overhead of a counting bloom filter, even lower than that of a standard bloom filter. However, the existing cuckoo filter has the problem that the storage space overhead of each data member changes dynamically with the number of elements. The reason is that the XOR operation of the cuckoo filter requires that the number of buckets must be a power of 2 (that is, 2b , where b is an exponent), resulting in a 2x increase in the storage space overhead of each data member in the worst case.
因而现有技术还有待改进和提高。Therefore, the existing technology still needs to be improved and improved.
发明内容SUMMARY OF THE INVENTION
本发明要解决的技术问题在于,针对现有技术的不足,提供一种标签布谷鸟过滤器。The technical problem to be solved by the present invention is to provide a label cuckoo filter for the deficiencies of the prior art.
为了解决上述技术问题,本发明所采用的技术方案如下:In order to solve the above-mentioned technical problems, the technical scheme adopted in the present invention is as follows:
一种标签布谷鸟过滤器,其中,其包括布谷鸟哈希表,所述布谷鸟哈希表包括若干存储桶,每个数据成员对应两个标签指纹,两个标签指纹分别存储于两个存储桶内;当标签布谷鸟过滤器接收到数据成员管理操作时,基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,并基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作,其中,所述预设数据成员为所述数据成员管理操作对应的数据成员。A tag cuckoo filter, wherein, it includes a cuckoo hash table, the cuckoo hash table includes several storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are respectively stored in two storage buckets. In the bucket; when the tag cuckoo filter receives the data member management operation, it determines two candidate buckets and two tag fingerprints corresponding to the preset data member based on the XOR operation, and based on the determined two candidate buckets and The two tag fingerprints perform the data member management operation, wherein the preset data member is the data member corresponding to the data member management operation.
所述标签布谷鸟过滤器,其中,所述标签指纹包括标签以及标签指纹余数,所述预设数据成员对应的两个标签指纹的标签指纹余数相同。In the label cuckoo filter, wherein the label fingerprint includes a label and a label fingerprint remainder, and the label fingerprint remainders of the two label fingerprints corresponding to the preset data members are the same.
所述标签布谷鸟过滤器,其中,所述基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,具体包括:The label cuckoo filter, wherein the two candidate storage buckets and two label fingerprints corresponding to the preset data members are determined based on the XOR operation, specifically including:
确定所述预设数据成员对应的标签指纹余数及第一候选存储桶;determining the remainder of the tag fingerprint corresponding to the preset data member and the first candidate storage bucket;
基于所述标签指纹余数以及所述第一候选存储桶,采用异或操作确定所述预设数据成员对应到的第二候选存储桶;Based on the label fingerprint remainder and the first candidate storage bucket, an exclusive OR operation is used to determine the second candidate storage bucket corresponding to the preset data member;
根据所述第一候选存储桶确定预设数据成员对应的第一标签,以及根据第二候选存储桶确定预设数据成员对应的第二标签;Determine the first label corresponding to the preset data member according to the first candidate storage bucket, and determine the second label corresponding to the preset data member according to the second candidate storage bucket;
根据所述第一标签及所述标签指纹余数确定预设数据成员对应的第一标签指纹,以及根据所述第二标签及所述标签指纹余数确定所述预设数据成员对应的第二标签指纹;A first tag fingerprint corresponding to a preset data member is determined according to the first tag and the remainder of the tag fingerprint, and a second tag fingerprint corresponding to the preset data member is determined according to the second tag and the remainder of the tag fingerprint ;
根据所述存储桶数分别对第一候选存储桶和第二候选存储桶进行修正,并将修正后的第一候选存储桶和第二存储候选桶作为预设数据成员对应的第一候选存储桶和第二存储候选桶。Modify the first candidate storage bucket and the second candidate storage bucket respectively according to the number of storage buckets, and use the revised first candidate storage bucket and the second storage candidate bucket as the first candidate storage bucket corresponding to the preset data member and the second storage candidate bucket.
所述标签布谷鸟过滤器,其中,当所述数据成员管理操作为插入操作时,所述基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作具体包括:In the tag cuckoo filter, when the data member management operation is an insert operation, the execution of the data member management operation based on the determined two candidate storage buckets and the two tag fingerprints specifically includes:
检测所述第一候选存储桶和所述第二候选存储桶是否存在空闲存储位置;Detecting whether there are free storage locations in the first candidate storage bucket and the second candidate storage bucket;
若存在空闲存储位置,则将预设数据成员的候选标签指纹存储于该空闲存储位置,其中,所述候选标签指纹为该空闲存储位置所属候选存储桶对应的指纹标签。If there is an idle storage location, the candidate tag fingerprint of the preset data member is stored in the idle storage location, wherein the candidate tag fingerprint is the fingerprint tag corresponding to the candidate storage bucket to which the idle storage location belongs.
所述标签布谷鸟过滤器,其中,当所述数据成员管理操作为插入操作时,所述基于确定得到的两个候选存储桶以及两个标签指纹执行该数据成员管理操作包括:In the tag cuckoo filter, when the data member management operation is an insert operation, the performing the data member management operation based on the two determined candidate storage buckets and the two tag fingerprints includes:
若未存在空闲存储位置,则在所述第一候选存储桶和所述第二候选存储桶中选取一目标候选存储桶,将目标候选存储桶对应的标签指纹作为预设数据成员对应的标签指纹;If there is no free storage location, select a target candidate storage bucket from the first candidate storage bucket and the second candidate storage bucket, and use the tag fingerprint corresponding to the target candidate storage bucket as the tag fingerprint corresponding to the preset data member ;
在所述目标候选存储桶内选取一目标标签指纹,将所述预设数据成员对应的标签指纹存储于所述目标标签指纹对应的存储位置;Selecting a target tag fingerprint in the target candidate storage bucket, and storing the tag fingerprint corresponding to the preset data member in a storage location corresponding to the target tag fingerprint;
根据所述目标候选存储桶以及所述目标标签指纹,采用异或操作确定所述目标标签指纹对应的参考存储桶及参考标签指纹;According to the target candidate storage bucket and the target tag fingerprint, an exclusive OR operation is used to determine the reference storage bucket and the reference tag fingerprint corresponding to the target tag fingerprint;
若参考存储桶具有空闲存储位置,则将参考标签指纹存储于该空闲存储位置。If the reference bucket has a free storage location, the reference tag fingerprint is stored in the free storage location.
所述标签布谷鸟过滤器,其中,当所述数据成员管理操作为插入操作时,所述基于确定得到的两个候选存储桶以及两个标签指纹执行该数据成员管理操作包括:In the tag cuckoo filter, when the data member management operation is an insert operation, the performing the data member management operation based on the two determined candidate storage buckets and the two tag fingerprints includes:
若参考存储桶未具有空闲存储位置,在所述参考存储桶内选取一目标存储位置,将参考标签指纹存储于该目标存储位置;If the reference storage bucket does not have a free storage location, select a target storage location in the reference storage bucket, and store the reference tag fingerprint in the target storage location;
将该目标存储位置对应的标签指纹作为目标标签指纹,将该参考存储桶作为目标候选存储桶;The label fingerprint corresponding to the target storage location is used as the target label fingerprint, and the reference bucket is used as the target candidate bucket;
继续执行根据所述目标候选存储桶以及所述目标标签指纹,采用异或操作确定所述目标标签指纹对应的参考存储桶及参考标签指纹的步骤,直至参考存储桶具有空闲存储位置或执行次数达到预设次数阈值。Continue to perform the step of determining the reference bucket and the reference label fingerprint corresponding to the target label fingerprint by using the XOR operation according to the target candidate bucket and the target label fingerprint, until the reference bucket has an idle storage location or the number of executions reaches Preset count threshold.
所述标签布谷鸟过滤器,其中,所述根据所述目标候选存储桶以及所述目标标签指纹,采用异或操作确定所述目标标签指纹对应的参考存储桶及参考标签指纹具体包括:The tag cuckoo filter, wherein, according to the target candidate storage bucket and the target tag fingerprint, the use of an exclusive OR operation to determine the reference storage bucket and the reference tag fingerprint corresponding to the target tag fingerprint specifically includes:
获取所述目标标签指纹的目标标签,并基于所述目标标签更新所述目标候选存储桶;obtaining the target label of the target label fingerprint, and updating the target candidate storage bucket based on the target label;
基于更新后的目标候选存储桶以及目标标签指纹的标签指纹余数,采用异或操作确定目标标签指纹对应的对照存储桶;Based on the updated target candidate bucket and the label fingerprint remainder of the target label fingerprint, the XOR operation is used to determine the comparison bucket corresponding to the target label fingerprint;
基于所述对照存储桶以及目标标签指纹的标签指纹余数,确定所述目标标签指纹对应的参考标签指纹,并基于存储桶数对所述对照存储桶进行修正,以得到目标标签指纹对应的参考存储桶。Based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, the reference tag fingerprint corresponding to the target tag fingerprint is determined, and the comparison bucket is corrected based on the number of buckets to obtain the reference storage corresponding to the target tag fingerprint. bucket.
所述标签布谷鸟过滤器,其中,所述当所述数据成员管理操作为查询操作或删除操作时,所述基于确定得到的两个候选存储桶以及两个标签指纹执行该数据成员管理操作具体包括:The tag cuckoo filter, wherein, when the data member management operation is a query operation or a deletion operation, the data member management operation is performed based on the two candidate storage buckets and the two tag fingerprints determined to be obtained. Specifically: include:
在所述第一候选存储桶和所述第二候选存储桶中分别查找该预设数据成员对应的第一标签指纹以及第二标签指纹;Searching for the first tag fingerprint and the second tag fingerprint corresponding to the preset data member in the first candidate storage bucket and the second candidate storage bucket respectively;
若查找到该预设数据成员对应的第一标签指纹或第二标签指纹,则对该指纹执行该数据成员管理操作。If the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, the data member management operation is performed for the fingerprint.
所述标签布谷鸟过滤器,其中,所述对该指纹执行该数据成员管理操作具体包括:In the tag cuckoo filter, the performing the data member management operation on the fingerprint specifically includes:
当数据成员管理操作为查询操作时,则提示数据成员查询成功;When the data member management operation is a query operation, it will prompt that the data member query is successful;
当数据成员管理操作为删除操作时,则删除查找到的预设数据成员对应的标签指纹,其中,所述标签指纹为第一标签指纹或第二标签指纹。When the data member management operation is a deletion operation, the tag fingerprint corresponding to the found preset data member is deleted, wherein the tag fingerprint is the first tag fingerprint or the second tag fingerprint.
所述标签布谷鸟过滤器,其中,所述当所述数据成员管理操作为查询操作或删除操作时,所述基于确定得到的两个候选存储桶以及两个标签指纹执行该数据成员管理操作具体包括:The tag cuckoo filter, wherein, when the data member management operation is a query operation or a deletion operation, the data member management operation is performed based on the two candidate storage buckets and the two tag fingerprints determined to be obtained. Specifically: include:
若未查找到预设数据成员对应的第一标签指纹或第二标签指纹,则提示数据成员管理操作失败。If the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is not found, it will prompt that the data member management operation fails.
有益效果:与现有技术相比,本发明提供了一种标签布谷鸟过滤器,所述标签布谷鸟过滤器包括布谷鸟哈希表,所述布谷鸟哈希表包括存储桶数的存储桶,每个数据成员对应两个标签指纹,两个标签指纹分别存储于两个存储桶内;当标签布谷鸟过滤器接收到数据成员管理操作时,基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,并基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作。本发明通过分配为每个数据成员操作配置两个标签指纹以及两个存储桶时,采用基于标签指纹的异或操作确定数据成员对应的候选存储桶,不要求存储桶数必须为2的幂,从而减少每个数据成员的存储空间开销。Beneficial effects: Compared with the prior art, the present invention provides a tag cuckoo filter, the tag cuckoo filter includes a cuckoo hash table, and the cuckoo hash table includes a bucket number of buckets , each data member corresponds to two label fingerprints, and the two label fingerprints are stored in two storage buckets respectively; when the label cuckoo filter receives the data member management operation, it determines the two corresponding to the preset data member based on the XOR operation. There are two candidate buckets and two tag fingerprints, and the data member management operation is performed based on the determined two candidate buckets and two tag fingerprints. In the present invention, when two label fingerprints and two storage buckets are configured for each data member operation, the XOR operation based on the label fingerprint is used to determine the candidate storage buckets corresponding to the data members, and the number of storage buckets is not required to be a power of 2. Thereby reducing the storage space overhead of each data member.
附图说明Description of drawings
图1为本发明提供的标签布谷鸟过滤器的一个示例。FIG. 1 is an example of a tag cuckoo filter provided by the present invention.
图2为本发明提供的标识指纹的格式图。FIG. 2 is a format diagram of an identification fingerprint provided by the present invention.
图3为当存储桶数不为2的幂时在本发明提供的标签布谷鸟过滤器中插入数据成员的流程图。FIG. 3 is a flow chart of inserting data members in the tag cuckoo filter provided by the present invention when the number of buckets is not a power of 2.
图4为在本发明提供的标签布谷鸟过滤器中插入数据成员的一个示例。FIG. 4 is an example of inserting data members in the tag cuckoo filter provided by the present invention.
图5为在本发明提供的标签布谷鸟过滤器中插入数据成员的另一个示例。FIG. 5 is another example of inserting data members in the tag cuckoo filter provided by the present invention.
图6为当存储桶数为2的幂时在本发明提供的标签布谷鸟过滤器中插入数据成员的流程图。FIG. 6 is a flow chart of inserting data members in the tag cuckoo filter provided by the present invention when the number of buckets is a power of 2.
图7为在本发明提供的标签布谷鸟过滤器中查找数据成员的流程图。FIG. 7 is a flow chart of finding data members in the tag cuckoo filter provided by the present invention.
图8为在本发明提供的标签布谷鸟过滤器中查找数据成员的示例。FIG. 8 is an example of finding data members in the tag cuckoo filter provided by the present invention.
图9为在本发明提供的标签布谷鸟过滤器中删除数据成员的流程图。FIG. 9 is a flow chart of deleting data members in the tag cuckoo filter provided by the present invention.
图10为在本发明提供的标签布谷鸟过滤器中删除数据成员的示例。FIG. 10 is an example of deleting data members in the tag cuckoo filter provided by the present invention.
具体实施方式Detailed ways
本发明提供一种标签布谷鸟过滤器,为使本发明的目的、技术方案及效果更加清楚、明确,以下参照附图并举实施例对本发明进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本发明,并不用于限定本发明。The present invention provides a label cuckoo filter. In order to make the purpose, technical solution and effect of the present invention clearer and clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.
本技术领域技术人员可以理解,除非特意声明,这里使用的单数形式“一”、“一个”、“所述”和“该”也可包括复数形式。应该进一步理解的是,本发明的说明书中使用的措辞“包括”是指存在所述特征、整数、步骤、操作、元件和/或组件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元件、组件和/或它们的组。应该理解,当我们称元件被“连接”或“耦接”到另一元件时,它可以直接连接或耦接到其他元件,或者也可以存在中间元件。此外,这里使用的“连接”或“耦接”可以包括无线连接或无线耦接。这里使用的措辞“和/或”包括一个或更多个相关联的列出项的全部或任一单元和全部组合。It will be understood by those skilled in the art that the singular forms "a", "an", "the" and "the" as used herein can include the plural forms as well, unless expressly stated otherwise. It should be further understood that the word "comprising" used in the description of the present invention refers to the presence of stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, Integers, steps, operations, elements, components and/or groups thereof. It will be understood that when we refer to an element as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Furthermore, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combination of one or more of the associated listed items.
本技术领域技术人员可以理解,除非另外定义,这里使用的所有术语(包括技术术语和科学术语),具有与本发明所属领域中的普通技术人员的一般理解相同的意义。还应该理解的是,诸如通用字典中定义的那些术语,应该被理解为具有与现有技术的上下文中的意义一致的意义,并且除非像这里一样被特定定义,否则不会用理想化或过于正式的含义来解释。It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms, such as those defined in a general dictionary, should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as herein, should not be interpreted in idealistic or overly formal meaning to explain.
发明人经过研究发现,成员隶属查询(Membership Query)是许多网络应用和分布式系统(例如:协同缓存、数据包处理、键值存储和重复数据删除)的关键方法之一,要求满足存储空间开销低、快速查询和增量更新等三个关键需求。目前成员隶属查询普遍采用布鲁姆过滤器(Bloom Filter)、标准布鲁姆过滤器(Standard Bloom Filter)、计数布鲁姆过滤器(Counting Bloom Filter)和布谷鸟过滤器(Cuckoo Filter)等。The inventor found through research that Membership Query is one of the key methods for many network applications and distributed systems (such as collaborative caching, data packet processing, key-value storage, and data deduplication), which requires storage space overhead. Three key requirements are low, fast query and incremental update. Currently, member membership queries generally use Bloom Filter, Standard Bloom Filter, Counting Bloom Filter, and Cuckoo Filter.
标准布鲁姆过滤器采用n个比特(即位图)来表示一个集合的m个元素(Item),即每个插入的元素采用k个哈希函数映射到位图的k个比特,该k个比特值设置为1。每个查询的元素采用相同的k个哈希函数映射到位图的k个比特,检查该k个比特值是否全为1;如果全为1,表明该元素在集合中;否则,表明该元素不在集合中。标准布鲁姆过滤器是一种空间高效的随机化数据结构,查询的假阳性错误率低(False Positive Rate)(即查询结果表明元素在集合中,但是该元素实际上不在该集合中),但不产生假阴性错误(False Negatives)(即如果查询结果表明元素不在集合中,那么该元素一定不在该集合中)。标准布鲁姆过滤器支持元素插入和查询操作,但是不支持元素删除操作。The standard Bloom filter uses n bits (ie bitmap) to represent m elements (Item) of a set, that is, each inserted element uses k hash functions to map k bits of the bitmap, the k bits The value is set to 1. The elements of each query are mapped to k bits of the bitmap using the same k hash functions, and check whether the k bits are all 1; if all are 1, it indicates that the element is in the set; otherwise, it indicates that the element is not in the set in the collection. The standard Bloom filter is a space-efficient randomized data structure with a low False Positive Rate for queries (that is, the query results indicate that the element is in the set, but the element is not actually in the set), But no false negative errors (False Negatives) (ie if the query result indicates that the element is not in the set, then the element must not be in the set). Standard Bloom filters support element insertion and query operations, but not element deletion operations.
计数布鲁姆过滤器是一种支持删除操作的布鲁姆过滤器,即采用n个计数器(Counter)来表示一个集合的m个元素。当插入元素时,采用k个哈希函数映射元素到k个计数器,该k个计数器值自增1;当删除元素时,该k个计数器值自减1。当查询元素时,采用相同的k个哈希函数映射元素到k个计数器,检查该k个计数器值是否全大于1;如果全大于1,表明该元素在集合中;否则,表明该元素不在集合中。在实际应用中,计数器大小设置为4比特,可避免计数器溢出问题。因此,计数布鲁姆过滤器支持快速增量更新,但是其存储空间开销高,是标准布鲁姆过滤器的4倍。Counting Bloom filter is a Bloom filter that supports delete operation, that is, it uses n counters (Counter) to represent m elements of a collection. When an element is inserted, k hash functions are used to map the element to k counters, and the value of the k counters is incremented by 1; when an element is deleted, the value of the k counters is decremented by 1. When querying elements, use the same k hash functions to map the elements to k counters, and check whether the k counter values are all greater than 1; if all are greater than 1, it indicates that the element is in the set; otherwise, it indicates that the element is not in the set middle. In practical applications, the counter size is set to 4 bits to avoid the counter overflow problem. Therefore, the counting bloom filter supports fast incremental updates, but it has a high storage space overhead, which is 4 times that of the standard bloom filter.
布谷鸟过滤器是一种支持删除操作的空间高效布鲁姆过滤器,显著减少计数布鲁姆过滤器的存储空间开销,甚至比标准布鲁姆过滤器的存储空间开销低。布谷鸟过滤器采用布谷鸟哈希表(Cuckoo Hash Table)和基于异或操作(XOR)的候选存储桶索引值计算,即在每个元素哈希映射的两个候选存储桶(Bucket)中,插入或删除或查询该元素的一个指纹(Fingerprint),而不是该元素本身。但是,布谷鸟过滤器存在每个数据成员的存储空间开销随元素个数动态变化的问题,其原因是布谷鸟过滤器的异或操作要求存储桶数必须为2的幂(即2b,b为指数),导致最坏情况的每个数据成员的存储空间开销增大2倍。The cuckoo filter is a space-efficient bloom filter that supports delete operations, significantly reducing the storage space overhead of a counting bloom filter, even lower than that of a standard bloom filter. The cuckoo filter uses the Cuckoo Hash Table and the candidate bucket index value based on the exclusive OR operation (XOR) to calculate, that is, in the two candidate buckets (Buckets) of each element hash map, Insert or delete or query a Fingerprint of the element, not the element itself. However, the cuckoo filter has the problem that the storage space overhead of each data member changes dynamically with the number of elements. The reason is that the XOR operation of the cuckoo filter requires that the number of buckets must be a power of 2 (
由上所述可以知,目前成员隶属查询所采用的过滤器无法满足成员隶属查询的要求。由此,在本发明实施例一种标签布谷鸟过滤器,所述标签布谷鸟过滤器包括布谷鸟哈希表,所述布谷鸟哈希表包括若干存储桶,每个数据成员对应两个标签指纹,两个标签指纹分别存储于两个存储桶内;当标签布谷鸟过滤器接收到数据成员管理操作时,基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,并基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作。本发明通过分别为每个数据成员配置两个标签指纹以及两个存储桶,采用基于标签指纹的异或操作确定数据成员对应的候选存储桶,不要求存储桶数必须为2的幂,从而减少每个数据成员的存储空间开销。It can be seen from the above that the filters currently used for member membership query cannot meet the requirements of member membership query. Thus, in an embodiment of the present invention, a tag cuckoo filter includes a cuckoo hash table, the cuckoo hash table includes several storage buckets, and each data member corresponds to two tags Fingerprint, two label fingerprints are stored in two buckets respectively; when the label cuckoo filter receives the data member management operation, it determines two candidate buckets and two label fingerprints corresponding to the preset data member based on the XOR operation , and perform the data member management operation based on the determined two candidate buckets and the two tag fingerprints. The invention configures two label fingerprints and two storage buckets for each data member respectively, and adopts the XOR operation based on the label fingerprint to determine the candidate storage buckets corresponding to the data members, and does not require that the number of storage buckets must be a power of 2, thereby reducing the number of storage buckets. Storage overhead for each data member.
下面结合附图,通过对实施例的描述,对发明内容作进一步说明。In the following, the content of the invention will be further illustrated by describing the embodiments with reference to the accompanying drawings.
本实施提供了一种标签布谷鸟过滤器,如图1所示,所述标签布谷鸟过滤器包括布谷鸟哈希表,其中,所述布谷鸟哈希表为紧凑布谷鸟哈希表。所述标签布谷鸟过滤器为预设数据成员分配两个标签指纹,并分别将两个标签指纹分别映射至两个候选存储桶,并且在确定两个候选存储桶中的一个候选存储桶的索引后,可以基于异或操作确定另外一个候选存储桶的索引。可以理解的是,在获取到预设数据成员时,基于该预设数据成员可以计算得到预设数据成员对应的候选存储桶A的索引,再采用异或操作可以确定该预设数据成员对应的候选存储桶B的索引,以确定预设数据成员对应的两个候选存储桶。此外,所述布谷鸟哈希表对应的存储桶数不为2的幂,即存储桶数不为2b,b为指数,b可以为正整数。例如,存储桶数不为4,8,16等。当然,值得说明的是,当存储桶数m是2的幂时,本实施例中的标签布谷鸟过滤器退化为现有布谷鸟过滤器,标签指纹不包含标签且每个预设数据成员仅有一个指纹。This implementation provides a tag cuckoo filter. As shown in FIG. 1 , the tag cuckoo filter includes a cuckoo hash table, where the cuckoo hash table is a compact cuckoo hash table. The label cuckoo filter assigns two label fingerprints to preset data members, maps the two label fingerprints to two candidate storage buckets respectively, and determines the index of one candidate storage bucket in the two candidate storage buckets. After that, the index of another candidate bucket can be determined based on the XOR operation. It can be understood that, when the preset data member is acquired, the index of the candidate bucket A corresponding to the preset data member can be calculated based on the preset data member, and then the XOR operation can be used to determine the corresponding index of the preset data member. The index of the candidate bucket B to determine the two candidate buckets corresponding to the preset data members. In addition, the number of buckets corresponding to the cuckoo hash table is not a power of 2, that is, the number of buckets is not 2b , b is an exponent, and b may be a positive integer. For example, the number of buckets is not 4, 8, 16, etc. Of course, it is worth noting that when the number of buckets m is a power of 2, the tag cuckoo filter in this embodiment degenerates into the existing cuckoo filter, the tag fingerprint does not contain tags and each preset data member only has There is a fingerprint.
进一步,所述布谷鸟哈希表中的每个存储桶均包括指定数量的存储位置,其中,存储位置用于存储数据成员的标签指纹,并且每个存储位置存储一个数据成员的标签指纹。此外,所述指定数量可以根据实际需要确定。例如,所述指定数量为4等。当所述指定数量为4时,说明布谷鸟哈希表中的每个存储桶均包含四个存储位置,即每个存储桶可以存储四个数据成员的标签指纹。Further, each storage bucket in the cuckoo hash table includes a specified number of storage locations, wherein the storage locations are used to store tag fingerprints of data members, and each storage location stores a tag fingerprint of a data member. In addition, the specified number can be determined according to actual needs. For example, the specified number is 4 or the like. When the specified number is 4, it means that each bucket in the cuckoo hash table includes four storage locations, that is, each bucket can store tag fingerprints of four data members.
进一步,所述标签指纹包括标签(Tag)以及标签指纹余数(Remainder),所述标签用于表示标签指纹对应的候选存储桶索引与存储桶数(即存储桶的数量)的关系,所述指纹余数用于表示数据成员的指纹。在一个具体实现方式中,如图2所示,所述标签包含1个比特,表示存储该标签指纹的候选存储桶索引i是否小于总存储桶数m,如果i<m,指纹的标签值设置为0,否则,指纹的标签值设置为1。所述标签指纹余数包含标签指纹的剩余r个比特,每个数据成员对应的两个标签指纹的标签指纹余数相同,两个标签指纹对应的标签可以相同,也可以不同。Further, the tag fingerprint includes a tag (Tag) and a tag fingerprint remainder (Remainder), and the tag is used to represent the relationship between the candidate bucket index corresponding to the tag fingerprint and the number of buckets (that is, the number of buckets). The remainder is used to represent the fingerprint of the data member. In a specific implementation, as shown in Figure 2, the tag contains 1 bit, indicating whether the candidate bucket index i storing the tag fingerprint is less than the total number of buckets m, if i<m, the tag value of the fingerprint is set is 0, otherwise, the tag value of the fingerprint is set to 1. The label fingerprint remainder includes the remaining r bits of the label fingerprint, the label fingerprint remainders of the two label fingerprints corresponding to each data member are the same, and the labels corresponding to the two label fingerprints may be the same or different.
所述数据成员管理操作包括插入操作、查询操作以及删除操作中的一种或多种。在本实施例中,所述数据成员管理操作可以为插入操作、查询操作或者删除操作。可以理解的是,所述标签布谷鸟过滤器支持插入操作、查询操作以及删除操作。下面分别以实现方式的形成对插入操作、查询操作以及删除操作进行具体说明。The data member management operations include one or more of insert operations, query operations, and delete operations. In this embodiment, the data member management operation may be an insert operation, a query operation or a delete operation. It can be understood that, the tag cuckoo filter supports insertion operation, query operation and deletion operation. The insert operation, the query operation, and the delete operation are described in detail below in terms of the formation of the implementation manners.
在本实施例的一个实现方式中,所述数据成员管理操作为插入操作;当标签布谷鸟过滤器的存储桶数不为2的幂时,如图3所示,所述当标签布谷鸟过滤器接收到数据成员管理操作时,基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,并基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作具体包括:In an implementation manner of this embodiment, the data member management operation is an insert operation; when the number of buckets of the tag cuckoo filter is not a power of 2, as shown in FIG. 3 , when the tag cuckoo filter When the device receives the data member management operation, it determines two candidate buckets and two tag fingerprints corresponding to the preset data member based on the XOR operation, and executes the data based on the determined two candidate buckets and two tag fingerprints. Member management operations include:
A10、确定所述预设数据成员对应的标签指纹余数及第一候选存储桶;A10. Determine the label fingerprint remainder and the first candidate storage bucket corresponding to the preset data member;
A20、基于所述标签指纹余数以及所述第一候选存储桶,采用异或操作确定所述预设数据成员对应到的第二候选存储桶;A20. Based on the tag fingerprint remainder and the first candidate storage bucket, adopt an exclusive OR operation to determine the second candidate storage bucket corresponding to the preset data member;
A30、根据所述第一候选存储桶确定预设数据成员对应的第一标签,以及根据第二候选存储桶确定预设数据成员对应的第二标签;A30. Determine the first label corresponding to the preset data member according to the first candidate storage bucket, and determine the second label corresponding to the preset data member according to the second candidate storage bucket;
A40、根据所述第一标签及所述标签指纹余数确定预设数据成员对应的第一标签指纹,以及根据所述第二标签及所述标签指纹余数确定所述预设数据成员对应的第二标签指纹;A40. Determine a first tag fingerprint corresponding to a preset data member according to the first tag and the remainder of the tag fingerprint, and determine a second tag corresponding to the preset data member according to the second tag and the remainder of the tag fingerprint label fingerprint;
A50、根据所述存储桶数分别对第一候选存储桶和第二候选存储桶进行修正,并将修正后的第一候选存储桶和第二存储候选桶作为预设数据成员对应的第一候选存储桶和第二存储候选桶;A50. Modify the first candidate storage bucket and the second candidate storage bucket respectively according to the number of storage buckets, and use the corrected first candidate storage bucket and the second storage candidate bucket as the first candidate corresponding to the preset data member the bucket and the second storage candidate bucket;
A60、检测所述第一候选存储桶和所述第二候选存储桶是否存在空闲存储位置,若存在空闲存储位置,则执行步骤A70;若未存在空闲存储位置,则执行步骤A80;A60. Detect whether there is an idle storage location in the first candidate storage bucket and the second candidate storage bucket, if there is an idle storage location, perform step A70; if there is no idle storage location, perform step A80;
A70、将预设数据成员的候选标签指纹存储于该空闲存储位置,其中,所述候选标签指纹为该空闲存储位置所属候选存储桶对应的指纹标签;A70. Store the candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the fingerprint tag corresponding to the candidate storage bucket to which the free storage location belongs;
A80、在所述第一候选存储桶和所述第二候选存储桶中选取一目标候选存储桶,将目标候选存储桶对应的标签指纹作为预设数据成员对应的标签指纹;A80. Select a target candidate storage bucket from the first candidate storage bucket and the second candidate storage bucket, and use the tag fingerprint corresponding to the target candidate storage bucket as the tag fingerprint corresponding to the preset data member;
A90、在所述目标候选存储桶内选取一目标标签指纹,将所述预设数据成员对应的标签指纹存储于所述目标标签指纹对应的存储位置;A90, select a target tag fingerprint in the target candidate storage bucket, and store the tag fingerprint corresponding to the preset data member in the storage location corresponding to the target tag fingerprint;
A100、根据所述目标候选存储桶以及所述目标标签指纹,采用异或操作确定所述目标标签指纹对应的参考存储桶及参考标签指纹;若参考存储桶具有空闲存储位置,则执行步骤A110;若参考存储桶未具有空闲存储位置,则执行步骤A120;A100. According to the target candidate storage bucket and the target tag fingerprint, use an exclusive OR operation to determine the reference storage bucket and the reference tag fingerprint corresponding to the target tag fingerprint; if the reference storage bucket has an idle storage location, perform step A110; If the reference storage bucket does not have a free storage location, perform step A120;
A110、将参考标签指纹存储于该空闲存储位置;A110. Store the reference tag fingerprint in the free storage location;
A120、所述参考存储桶内选取一目标存储位置,将参考标签指纹存储于该目标存储位置;A120. Select a target storage location in the reference storage bucket, and store the reference tag fingerprint in the target storage location;
A130、将该目标存储位置对应的标签指纹作为目标标签指纹,将该参考存储桶作为目标候选存储桶;继续执行步骤A100,直至参考存储桶具有空闲存储位置或执行次数达到预设次数阈值。A130. Use the tag fingerprint corresponding to the target storage location as the target tag fingerprint, and use the reference storage bucket as the target candidate storage bucket; continue to perform step A100 until the reference storage bucket has an idle storage location or the execution times reaches a preset number of times threshold.
具体地,在所述步骤A10中,所述预设数据成员为所述插入操作对应的数据成员,在接收到所述插入操作时,获取到该预设数据成员。在获取到预设数据成员后,基于哈希函数G(x)计算该预设数据成员对应的标签指纹余数以及第一候选存储桶的索引h0(x);其中,所述标签指纹余数以及第一候选存储桶的索引的计算公式可以为:Specifically, in the step A10, the preset data member is a data member corresponding to the insert operation, and the preset data member is acquired when the insert operation is received. After acquiring the preset data member, calculate the remainder of the tag fingerprint corresponding to the preset data member and the index h0 (x) of the first candidate storage bucket based on the hash function G(x); wherein the remainder of the tag fingerprint and The calculation formula of the index of the first candidate bucket may be:
h0(x):rx=G(x) (1)h0 (x):rx =G(x) (1)
其中,rx为预设数据成员对应的标签指纹余数,h0(x)为第一候选存储桶的索引,x为预设数据成员,G(x)为预设数据成员x对应的哈希值,其中,所述哈希值中的低位数值为rx,高位数值为h0(x),“:”为数值连接符,所述h0(x)的取值范围为[0,...,M-1],其中,M为大于m的最小的2的幂,m为布谷鸟哈希表对应的存储桶数。例如,若m=6,则M=8,若m=14,则M=16。Among them, rx is the label fingerprint remainder corresponding to the preset data member, h0 (x) is the index of the first candidate bucket, x is the preset data member, and G(x) is the hash corresponding to the preset data member x value, wherein the low digit value in the hash value is rx , the high digit value is h0 (x), “:” is a numerical connector, and the value range of the h0 (x) is [0,. ..,M-1], where M is the smallest power of 2 greater than m, and m is the number of buckets corresponding to the cuckoo hash table. For example, if m=6, then M=8, and if m=14, then M=16.
进一步,在所述步骤A20中,在确定得到预设数据成员对应的标签指纹余数以及第一候选存储桶的索引后,可以采用异或操作确定所述预设数据成员对应的第二候选存储桶。其中,所述第二候选存储桶的索引h1(x)的计算公式可以为:Further, in the step A20, after it is determined that the remainder of the label fingerprint corresponding to the preset data member and the index of the first candidate storage bucket are obtained, an XOR operation can be used to determine the second candidate storage bucket corresponding to the preset data member. . Wherein, the calculation formula of the index h1 (x) of the second candidate storage bucket may be:
其中,rx为预设数据成员对应的标签指纹余数,h0(x)为第一候选存储桶的索引,x为预设数据成员,是异或操作符,“mod”是求模操作符,H(rx)为标签指纹余数对应的哈希值,哈希值H(rx)的取值范围为[0,...,M-1];h1(x)为第二候选存储桶的索引,h1(x)的取值范围为[0,...,M-1],其中,M为大于m的最小的2的幂,m为布谷鸟哈希表对应的存储桶数。Among them, rx is the label fingerprint remainder corresponding to the preset data member, h0 (x) is the index of the first candidate bucket, x is the preset data member, is the XOR operator, "mod" is the modulo operator, H(rx ) is the hash value corresponding to the remainder of the tag fingerprint, and the value range of the hash value H(rx ) is [0,..., M-1]; h1 (x) is the index of the second candidate bucket, and the value range of h1 (x) is [0,...,M-1], where M is the smallest value greater than m Power of 2, m is the number of buckets corresponding to the cuckoo hash table.
进一步,在所述步骤A30中,在获取到第一候选存储桶的索引和第二候选存储桶的索引后,可以根据第一候选存储桶和第二候选存储桶确定预设数据成员对应的第一标签和第二标签,其中,所述第一标签为根据第一候选存储桶确定,第二标签为根据第一候选存储桶确定。此外,由标签用于表示标签指纹对应的候选存储桶索引与存储桶数的关系可知,第一标签可以根据第一候选存储桶的索引与存储桶数m的对应关系确定,第二标签可以根据第二候选存储桶的索引与存储桶数m的对应关系确定。Further, in the step A30, after the index of the first candidate storage bucket and the index of the second candidate storage bucket are obtained, the first candidate storage bucket and the second candidate storage bucket may be determined according to the first candidate storage bucket corresponding to the preset data member. A label and a second label, wherein the first label is determined according to the first candidate storage bucket, and the second label is determined according to the first candidate storage bucket. In addition, from the relationship between the index of candidate buckets corresponding to the label fingerprint and the number of buckets, the first label can be determined according to the corresponding relationship between the index of the first candidate bucket and the number of buckets m, and the second label can be determined according to The corresponding relationship between the index of the second candidate bucket and the number of buckets m is determined.
在本实施例的一个具体实现方式中,可以分别将第一候选存储桶的索引h0(x)和第二候选存储桶的索引h1(x)与存储桶数m进行比较,若干第一候选存储桶的索引h0(x)大于或者等于存储桶数m,则第一标签为0,反之,若干第一候选存储桶的索引h0(x)小于存储桶数m,则第一标签为1,同理,若干第二候选存储桶的索引h1(x)大于或者等于存储桶数m,则第二标签为0,反之,若干第一候选存储桶的索引h1(x)小于存储桶数m,则第二标签为1。由此,所述第一标签的计算公式和第二标签的计算公式可以分别为:In a specific implementation manner of this embodiment, the index h0 (x) of the first candidate storage bucket and the index h1 (x) of the second candidate storage bucket may be compared with the number of buckets m, and a number of first candidate storage buckets may be compared. If the index h0 (x) of the candidate bucket is greater than or equal to the number of buckets m, then the first label is 0, otherwise, the index h0 (x) of several first candidate buckets is less than the number of buckets m, then the first label is 1. Similarly, if the index h1 (x) of several second candidate buckets is greater than or equal to the number of buckets m, then the second label is 0, otherwise, the index h1 (x) of several first candidate buckets is less than the number of buckets m, then the second label is 1. Thus, the first label The calculation formula of and the calculation formula of the second label can be respectively:
其中,为第一标签,为第二标签,h0(x)为第一候选存储桶的索引,h1(x)为第二候选存储桶的索引,rx为标签指纹余数。in, for the first label, is the second label, h0 (x) is the index of the first candidate bucket, h1 (x) is the index of the second candidate bucket, and rx is the remainder of the label fingerprint.
进一步,在所述步骤A50中,在确定第一标签和第二标签后,可以基于存储桶数分别对第一候选存储桶的索引和第二候选存储桶的索引进行修正,并将修正后的第一候选存储桶的索引和第二存储候选桶的索引作为预设数据成员对应的第一候选存储桶的索引和第二存储候选桶的索引,其中,修正后的第一候选存储桶的索引和修正后的第二候选存储桶的索引均在[0,...,m-1]。在本实施例的一个实现方式中,所述第一候选存储桶和第二存储候选桶的修正公式分别为:Further, in the step A50, after the first label and the second label are determined, the index of the first candidate storage bucket and the index of the second candidate storage bucket may be respectively revised based on the number of buckets, and the revised index The index of the first candidate storage bucket and the index of the second storage candidate bucket are taken as the index of the first candidate storage bucket and the index of the second storage candidate bucket corresponding to the preset data member, wherein the revised index of the first candidate storage bucket and the indices of the revised second candidate bucket are in [0,...,m-1]. In an implementation of this embodiment, the correction formulas of the first candidate storage bucket and the second storage candidate bucket are respectively:
其中,h0(x)为第一候选存储桶的索引,h1(x)为第二候选存储桶的索引,m为存储桶数,“mod”是求模操作符。Among them, h0 (x) is the index of the first candidate bucket, h1 (x) is the index of the second candidate bucket, m is the number of buckets, and "mod" is the modulo operator.
进一步,在所述步骤A60中,在获取到第一候选存储桶的索引和第二候选存储桶的索引后,可以根据第一候选存储桶的索引和第二候选存储桶的索引确定第一候选存储桶和第二候选存储桶。在确定第一候选存储桶和第二候选存储桶后,可以检测所述第一候选存储桶h0(x)和所述第二候选存储桶h1(x)是否存在空闲存储位置。其中,所述检测所述第一候选存储桶h0(x)和所述第二候选存储桶h1(x)是否存在空闲存储位置的过程可以为:依次搜索第一候选存储桶h0(x)和所述第二候选存储桶h1(x),搜索第一候选存储桶h0(x)和所述第二候选存储桶h1(x)中各存储位置的存储数值是否为预设数值,若搜索到存储数字为预设数值的存储位置,在将该存储位置判定为空闲存储位置。其中,预设数值表示该存储位置为空闲存储位置,未存储元素的指纹,所述预设数值为预先设置的,例如,0等。Further, in the step A60, after the index of the first candidate storage bucket and the index of the second candidate storage bucket are obtained, the first candidate storage bucket can be determined according to the index of the first candidate storage bucket and the index of the second candidate storage bucket bucket and the second candidate bucket. After the first candidate storage bucket and the second candidate storage bucket are determined, it may be detected whether the first candidate storage bucket h0 (x) and the second candidate storage bucket h1 (x) have free storage locations. Wherein, the process of detecting whether the first candidate storage bucket h0 (x) and the second candidate storage bucket h1 (x) have free storage locations may be: sequentially searching for the first candidate storage bucket h0 ( x) and the second candidate storage bucket h1 (x), search whether the storage value of each storage location in the first candidate storage bucket h0 (x) and the second candidate storage bucket h1 (x) is a predetermined value Set the value. If a storage location where the stored number is the preset value is searched, the storage location is determined as a free storage location. The preset value indicates that the storage location is a free storage location, and the fingerprint of the element is not stored, and the preset value is preset, for example, 0 and the like.
进一步,在所述步骤A70中,当存在空闲存储位置时,获取各空闲存储位置,并在获取到的所有空闲存储位置中选取一个空闲存储位置,并确定该空闲存储位置对应的候选存储桶,将候选存储桶对应的标签指纹存储于该空闲存储位置。其中,所述空闲存储位置可以为第一候选存储桶h0(x)中的存储位置,也可以为第二候选存储桶h1(x)中的存储位置,例如,当空闲存储位置为第一候选存储桶h0(x)中的存储位置时,将第一标签指纹存储于空闲存储位置;当空闲存储位置为第二候选存储桶h1(x)中的存储位置时,将第二标签指纹存储于空闲存储位置。此外,在将预设数据成员对应的标签指纹存储于该空闲存储位置后,完成该预设数据成员对应的插入操作。Further, in the step A70, when there is an idle storage location, each idle storage location is acquired, and an idle storage location is selected from all the acquired idle storage locations, and a candidate storage bucket corresponding to the idle storage location is determined, Store the tag fingerprint corresponding to the candidate bucket in the free storage location. The free storage location may be the storage location in the first candidate storage bucket h0 (x), or may be the storage location in the second candidate storage bucket h1 (x), for example, when the idle storage location is the first candidate storage bucket h 0 (x) When the storage location in a candidate storage bucket h0 (x), the first tag fingerprint is stored in the free storage location; when the idle storage location is the storage location in the second candidate storage bucket h1 (x), the second Tag fingerprints are stored in free memory locations. In addition, after the tag fingerprint corresponding to the preset data member is stored in the free storage location, the insertion operation corresponding to the preset data member is completed.
进一步,在所述步骤A80中,当未存在空闲存储位置时,在所述第一候选存储桶和所述第二候选存储桶中选取一目标候选存储桶(即当前存储桶),并将该目标候选存储桶对应的标签指纹作为选定标签指纹。例如,将第一候选存储桶作为目标候选存储桶时,第一标签指纹为选定标签指纹(即新标签指纹f);将第二候选存储桶作为目标候选存储桶时,第二标签指纹为选定标签指纹。此外,在选取到目标候选存储桶之前,设置数据成员移动次数为0。Further, in the step A80, when there is no free storage location, select a target candidate storage bucket (ie the current storage bucket) from the first candidate storage bucket and the second candidate storage bucket, and store the The label fingerprint corresponding to the target candidate bucket is used as the selected label fingerprint. For example, when the first candidate bucket is used as the target candidate bucket, the first label fingerprint is the selected label fingerprint (ie, the new label fingerprint f); when the second candidate bucket is used as the target candidate bucket, the second label fingerprint is Selected label fingerprint. In addition, before selecting the target candidate bucket, set the number of data member moves to 0.
进一步,在所述步骤A90中,在目标候选存储桶中随机选取一存储位置,将该存储位置存储的标签指纹作为目标标签指纹(即旧标签指纹),并将该预设数据成员对应的选定标签指纹存储于所述目标标签指纹对应的存储位置。此外,在目标候选存储桶中随机选取一存储位置之前,将移动次数自增1,并判断移动次数是否超过预设次数阈值,若移动次数超过预设次数阈值,则判定预设数据成员插入失败,若移动次数未超过预设次数阈值,则执行步骤A100。其中,所述预设次数阈值可以根据实际情况确定,例如,预设次数阈值为500等。Further, in the step A90, a storage location is randomly selected in the target candidate storage bucket, the label fingerprint stored in this storage location is used as the target label fingerprint (ie the old label fingerprint), and the corresponding selection of the preset data member is selected. The fixed tag fingerprint is stored in the storage location corresponding to the target tag fingerprint. In addition, before randomly selecting a storage location in the target candidate storage bucket, the number of moves is automatically incremented by 1, and it is determined whether the number of moves exceeds the preset number of times threshold. If the number of moves exceeds the preset number of times threshold, it is determined that the insertion of the preset data member failed , if the number of moves does not exceed the preset number of times threshold, step A100 is performed. The preset number of times threshold may be determined according to actual conditions, for example, the preset number of times threshold is 500 or the like.
进一步,在所述步骤A100中,在目标候选存储桶中选取到目标标签指纹后,采用异或操作确定所述目标标签指纹对应的参考存储桶(即将参考存储桶作为当前存储桶h)及参考标签指纹(即将参考标签指纹作为旧指纹g′)。在确定参考存储桶之前,需要根据目标标签指纹对应的目标标签,调整目标候选存储桶的索引。其中,如果目标标签的标签值为0,则目标候选存储桶的索引保持不变,如果目标标签的标签值为1,则目标候选存储桶的索引调整为目标候选存储桶的索引加m。例如,目标标签g为0,目标候选存储桶的索引为i,那么调整后的目标候选存储桶的索引仍然为i;目标标签g为1,目标候选存储桶的索引为i,那么调整后的目标候选存储桶的索引仍然为i+m。此外,在确定调整后的目标候选存储桶的索引后,根据目标标签指纹的目标标签指纹余数以及调整后的目标候选存储桶的索引,计算参考存储桶的索引,参考存储桶为所述目标标签指纹对应的数据成员对应的另一个候选存储桶。其中,参考存储桶的索引j的计算公式可以为:Further, in the step A100, after the target tag fingerprint is selected in the target candidate storage bucket, the XOR operation is used to determine the reference storage bucket (that is, the reference storage bucket is taken as the current storage bucket h) and the reference storage bucket corresponding to the target tag fingerprint. Tag fingerprint (that is, refer to the tag fingerprint as the old fingerprint g'). Before determining the reference bucket, the index of the target candidate bucket needs to be adjusted according to the target label corresponding to the target label fingerprint. Among them, if the tag value of the target tag is 0, the index of the target candidate bucket remains unchanged; if the tag value of the target tag is 1, the index of the target candidate bucket is adjusted to the index of the target candidate bucket plus m. For example, if the target label g is 0 and the index of the target candidate bucket is i, then the index of the adjusted target candidate bucket is still i; the target label g is 1, and the index of the target candidate bucket is i, then the adjusted The index of the target candidate bucket is still i+m. In addition, after determining the index of the adjusted target candidate bucket, calculate the index of the reference bucket according to the target label fingerprint remainder of the target label fingerprint and the adjusted index of the target candidate bucket, where the reference bucket is the target label Another candidate bucket corresponding to the data member corresponding to the fingerprint. Wherein, the calculation formula of the index j of the reference bucket can be:
其中,i为调整后的目标候选存储桶的索引,r为目标标签指纹余数,H(r)为目标标签指纹余数对应的哈希值,H(r)的取值范围为[0,...,M-1],是异或操作符,“mod”是求模操作符。Among them, i is the index of the adjusted target candidate bucket, r is the target label fingerprint remainder, H(r) is the hash value corresponding to the target label fingerprint remainder, and the value range of H(r) is [0, .. .,M-1], is the XOR operator, and "mod" is the modulo operator.
进一步,在参考存储桶后,根据目标标签指纹的目标标签指纹余数和参考存储桶的索引,计算参考存储桶对应的参考标签指纹。其中,参考标签指纹的计算过程可以为:检查参考存储桶的索引j是否小于m,如果j<m,参考标签指纹g′的标签值设置为0;否则参考标签指纹g′的标签值设置为1,计算公式如下:Further, after referring to the bucket, the reference label fingerprint corresponding to the reference bucket is calculated according to the target label fingerprint remainder of the target label fingerprint and the index of the reference bucket. The calculation process of the reference tag fingerprint may be: check whether the index j of the reference bucket is less than m, if j<m, the tag value of the reference tag fingerprint g' is set to 0; otherwise, the tag value of the reference tag fingerprint g' is set to 1. The calculation formula is as follows:
其中,g′为参考标签指纹,r为目标标签指纹余数,j为参考存储桶的索引,m为存储桶数。Among them, g' is the reference label fingerprint, r is the remainder of the target label fingerprint, j is the index of the reference bucket, and m is the number of buckets.
在获取到参考标签指纹的参考标签后,对所述考存储桶的索引j进行调整,调整后的考存储桶的索引j在范围[0,...,m-1]中,其中,考存储桶的索引j的调整公式为:After the reference label of the reference label fingerprint is obtained, the index j of the test bucket is adjusted, and the adjusted index j of the test bucket is in the range [0,...,m-1], where the test bucket is in the range [0,...,m-1]. The formula for adjusting the index j of the bucket is:
j=j mod m (8)j=j mod m (8)
其中,j为参考存储桶的索引,m为存储桶数,“mod”为是求模操作符。Among them, j is the index of the reference bucket, m is the number of buckets, and "mod" is the modulo operator.
进一步,在获取到参考存储桶后,检测参考存储桶是否均有具有空闲存储位置,若参考存储桶具有空闲存储位置,则执行步骤A110;若参考存储桶未具有空闲存储位置,则执行步骤A120。Further, after obtaining the reference storage buckets, it is detected whether all the reference storage buckets have free storage locations. If the reference storage buckets have idle storage locations, step A110 is performed; if the reference storage buckets do not have idle storage locations, step A120 is performed. .
进一步,在所述步骤A110中,将参考标签指纹存储于该空闲存储位置,已完成所述预设数据成员的插入操作。Further, in the step A110, the reference tag fingerprint is stored in the free storage location, and the insertion operation of the preset data member has been completed.
进一步,在所述步骤A120中,在所述参考存储桶内选取一目标存储位置,将参考标签指纹存储于该目标存储位置;将该目标存储位置对应的标签指纹作为参考标签指纹,将该参考存储桶作为目标候选存储桶,继续执行步骤A100直至参考存储桶具有空闲存储位置或移动次数超过预设次数阈值。此外,在所述参考存储桶内选取一目标存储位置,将参考标签指纹存储于该目标存储位置之前,将所述移动次数自增1,以统计预设成员元素的插入操作对应的成员移动次数,用于衡量预设成员元素对应的插入操作是否结束,避免插入操作进入死循环。Further, in the step A120, a target storage location is selected in the reference storage bucket, and the reference label fingerprint is stored in the target storage location; the label fingerprint corresponding to the target storage location is used as the reference label fingerprint, and the reference label fingerprint is used as the reference label fingerprint. Taking the bucket as the target candidate bucket, step A100 is continued until the reference bucket has a free storage location or the number of moves exceeds the preset number of times threshold. In addition, a target storage location is selected in the reference storage bucket, and before the reference tag fingerprint is stored in the target storage location, the number of movements is automatically incremented by 1 to count the number of member movements corresponding to the insertion operation of the preset member element. , which is used to measure whether the insertion operation corresponding to the preset member element has ended, so as to prevent the insertion operation from entering an infinite loop.
举例说明加减布谷鸟过滤器的元素插入方法:An example to illustrate the element insertion method of the addition and subtraction cuckoo filter:
示例1:如图4所示,当插入元素x时,首先,采用公式(1)和(2)计算元素x的标签指纹余数rx以及两个候选存储桶索引h0(x)=6和h1(x)=4;其次,采用公式(3)、(4)和(5)计算元素x的两个标签指纹和同时调整两个候选存储桶索引h0(x)和h1(x)在范围[0,…,4]中,因此调整后的候选存储桶索引为h0(x)=1和h1(x)=4;接着,搜索两个候选存储桶h0(x)=1和h1(x)=4,发现该两个候选存储桶均包含空闲存储位置;最后,随机选择一个空闲候选存储桶h0(x)=1,存储对应的标签指纹在该存储桶中。Example 1: As shown in Figure 4, when inserting element x, first, formulas (1) and (2) are used to calculate the label fingerprint remainder rx of element x and two candidate bucket indices h0 (x)=6 and h1 (x)=4; secondly, formulas (3), (4) and (5) are used to calculate the two label fingerprints of element x and At the same time, the two candidate bucket indices h0 (x) and h1 (x) are adjusted in the range [0,...,4], so the adjusted candidate bucket indices are h0 (x)=1 and h1 ( x)=4; then, search two candidate buckets h0 (x)=1 and h1 (x)=4, and find that both candidate buckets contain free storage locations; finally, randomly select an idle candidate storage Bucket h0 (x)=1, store the corresponding tag fingerprint in that bucket.
示例2:如图5所示,当插入元素y时,首先,采用公式(1)和(2)计算元素y的标签指纹余数ry以及两个候选存储桶索引h0(y)=3和h1(y)=6;其次,采用公式(3)、(4)和(5)计算元素y的两个标签指纹和同时调整两个候选存储桶索引h0(y)和h1(y)在范围[0,…,4]中,因此调整后的候选存储桶索引为h0(y)=3和h1(y)=1;接着,搜索两个候选存储桶索引h0(y)=3和h1(y)=1,发现该两个存储桶均不包含空闲存储位置,随机选择一个候选存储桶h1(y)=1,从该存储桶中移出旧指纹存储新指纹在该存储桶中;接着,基于旧指纹和当前存储桶索引h1(a)=1,假设标签指纹的标签值为1且余数为ra,采用公式(6)、(7)和(8)计算另一个候选存储桶索引h0(a)=4及其对应的标签指纹(即标签值设置为0,原因是h0(a)=4<m=5);最后,搜索候选存储桶h0(a)=4,发现该候选存储桶包含空闲存储位置,存储对应的标签指纹在该存储桶中。Example 2: As shown in Figure 5, when inserting element y, first, formulas (1) and (2) are used to calculate the label fingerprint remainder ry of element y and two candidate bucket indices h0 (y) = 3 and h1 (y)=6; secondly, formulas (3), (4) and (5) are used to calculate the two tag fingerprints of element y and At the same time, the two candidate bucket indices h0 (y) and h1 (y) are adjusted in the range [0,...,4], so the adjusted candidate bucket indices are h0 (y)=3 and h1 ( y)=1; then, search for two candidate bucket indices h0 (y)=3 and h1 (y)=1, find that neither of the two buckets contains free storage locations, and randomly select a candidate bucket h1 (y)=1, remove old fingerprints from this bucket store new fingerprint in that bucket; then, based on the old fingerprint and the current bucket index h1 (a) = 1, assuming the tag fingerprint The label value is 1 and the remainder isra , and formulas (6), (7) and (8) are used to calculate another candidate bucket index h0 (a)=4 and its corresponding label fingerprint (which is The tag value is set to 0, the reason is that h0 (a)=4<m=5); finally, the candidate bucket h0 (a)=4 is searched, and it is found that the candidate bucket contains a free storage location, and the corresponding tag fingerprint is stored in that bucket.
进一步,在本实施例的一个实现方式中,在接收到插入操作时,可以判断存储桶数m是否是2的幂。如果m是2的幂,标签布谷鸟过滤器退化为现有布谷鸟过滤器,如果m不是2的幂,则按照标签鸟过滤器执行插入操作的过程执行。其中,当标签布谷鸟过滤器退化为现有布谷鸟过滤器时(即当存储桶数为2的幂时),如图6所示,插入操作的实现过程可以为:Further, in an implementation manner of this embodiment, when an insert operation is received, it may be determined whether the number m of storage buckets is a power of 2. If m is a power of 2, the label cuckoo filter degenerates into the existing cuckoo filter, if m is not a power of 2, the insertion operation is performed according to the process of the label bird filter. Among them, when the tag cuckoo filter degenerates into the existing cuckoo filter (that is, when the number of buckets is a power of 2), as shown in Figure 6, the implementation process of the insertion operation can be as follows:
B10、计算元素x的指纹fx和第一候选存储桶索引h0(x);采用哈希函数G(x),计算元素x的指纹fx和候选存储桶索引h0(x),计算公式如下:B10. Calculate the fingerprint fx of the element x and the first candidate bucket index h0 (x); use the hash function G(x) to calculate the fingerprint fx of the element x and the candidate bucket index h0 (x), calculate The formula is as follows:
h0(x):fx=G(x) (9)h0 (x):fx =G(x) (9)
其中,哈希值G(x)中的低位数值为fx,高位数值为h0(x),“:”为数值连接符。第一候选存储桶索引h0(x)的范围为[0,...,M-1],其中,M是2的幂且m等于M。Among them, the low digit value in the hash value G(x) is fx , the high digit value is h0 (x), and ":" is a numerical connector. The first candidate bucket index h0 (x) is in the range [0, . . . , M-1], where M is a power of 2 and m is equal to M.
B20、计算元素x的第二候选存储桶索引h1(x)。基于元素x的指纹fx和第一候选存储桶索引h0(x),采用异或操作,计算第二候选存储桶索引h1(x),计算公式如下:B20. Calculate the second candidate bucket index h1 (x) of the element x. Based on the fingerprint fx of the element x and the first candidate bucket index h0 (x), the XOR operation is used to calculate the second candidate bucket index h1 (x), and the calculation formula is as follows:
其中,哈希值H(rx)的范围为[0,...,M-1],是异或操作符,“mod”是求模操作符,第二候选存储桶索引h1(x)的范围为[0,...,M-1],其中,M是2的幂且m等于M。Among them, the range of the hash value H(rx ) is [0,...,M-1], is the XOR operator, "mod" is the modulo operator, and the second candidate bucket index h1 (x) is in the range [0,...,M-1], where M is a power of 2 and m equal to M.
B30、判断两个候选存储桶h0(x)和h1(x)是否至少一个包含空闲存储位置;搜索每个候选存储桶h0(x)或h1(x),检查是否包含空闲存储位置;如果至少一个候选存储桶包含空闲存储位置,进入步骤B40;否则,进入步骤B50。B30. Determine whether at least one of the two candidate buckets h0 (x) and h1 (x) contains a free storage location; search each candidate bucket h0 (x) or h1 (x) to check whether it contains free storage location; if at least one candidate storage bucket contains a free storage location, go to step B40; otherwise, go to step B50.
B40、存储元素x的指纹fx在其中一个空闲候选存储桶h0(x)或h1(x)中,元素x插入结束。B40. The fingerprint fx of the storage element x is in one of the free candidate storage buckets h0 (x) or h1 (x), and the insertion of the element x ends.
B50、随机选择一个候选存储桶h0(x)或h1(x),设置新指纹f为指纹fx(即f=fx)。设置目标候选存储桶h为h0(x)或h1(x),并设置元素移动次数为0。B50. Randomly select a candidate storage bucket h0 (x) or h1 (x), and set the new fingerprint f as the fingerprint fx (ie, f=fx ). Set the target candidate bucket h to h0 (x) or h1 (x), and set the number of element moves to 0.
B60、元素移动次数自增1,判断元素移动次数是否超过500次;如果超过500次,进入步骤B70;否则,进入步骤B80。B60, the number of element moves is incremented by 1, and it is judged whether the number of element moves exceeds 500 times; if it exceeds 500 times, go to step B70; otherwise, go to step B80.
B70、元素x插入失败,元素插入结束。B70. The insertion of element x fails, and the element insertion ends.
B80、随机选择目标候选存储桶h的一个存储位置,移出该存储位置的目标指纹g,存储新指纹f在该存储位置。B80. Randomly select a storage location of the target candidate storage bucket h, remove the target fingerprint g from the storage location, and store the new fingerprint f in the storage location.
B90、计算目标指纹g的另一个参考选存储桶索引,设置参考选存储桶索引作为该目标候选存储桶h。其中,基于目标指纹g和目标候选存储桶索引i(即i=h),采用异或操作,计算参考选存储桶索引j,计算公式如下:B90. Calculate another reference bucket index of the target fingerprint g, and set the reference bucket index as the target candidate bucket h. Among them, based on the target fingerprint g and the target candidate bucket index i (that is, i=h), the XOR operation is used to calculate the reference selection bucket index j, and the calculation formula is as follows:
其中,哈希值H(g)的范围为[0,...,M-1],是异或操作符,“mod”是求模操作符;将参考储桶的索引j设置为目标候选存储桶索引h(即h=j)。Among them, the range of the hash value H(g) is [0,...,M-1], is the XOR operator, and "mod" is the modulo operator; set the index j of the reference bucket to the target candidate bucket index h (ie, h=j).
B100、判断目标候选存储桶h是否包含至少一个空闲存储位置。如果包含至少一个空闲存储位置,进入步骤B110;否则,进入步骤B120。B100. Determine whether the target candidate storage bucket h contains at least one free storage location. If it contains at least one free storage location, go to step B110; otherwise, go to step B120.
B110、存储目标指纹g在目标候选存储桶h中,元素插入结束。B110. Store the target fingerprint g in the target candidate bucket h, and the element insertion ends.
B120、跳转至步骤B60,递归移出其他旧指纹并插入新指纹,直至移动次数超过500次。B120. Jump to step B60, recursively remove other old fingerprints and insert new fingerprints until the number of moves exceeds 500 times.
在本实施例的一个实现方式中,所述数据成员管理操作为查询操作;如图7所示,所述当标签布谷鸟过滤器接收到数据成员管理操作时,基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,并基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作具体包括:In an implementation of this embodiment, the data member management operation is a query operation; as shown in FIG. 7 , when the tag cuckoo filter receives the data member management operation, the preset data is determined based on the XOR operation Two candidate buckets and two label fingerprints corresponding to the member, and performing the data member management operation based on the determined two candidate buckets and two label fingerprints specifically includes:
C10、确定所述预设数据成员对应的标签指纹余数及第一候选存储桶;C10. Determine the remainder of the label fingerprint corresponding to the preset data member and the first candidate storage bucket;
C20、基于所述标签指纹余数以及所述第一候选存储桶,采用异或操作确定所述预设数据成员对应到的第二候选存储桶;C20. Based on the remainder of the tag fingerprint and the first candidate storage bucket, adopt an exclusive OR operation to determine the second candidate storage bucket corresponding to the preset data member;
C30、根据所述第一候选存储桶确定预设数据成员对应的第一标签,以及根据第二候选存储桶确定预设数据成员对应的第二标签;C30. Determine the first label corresponding to the preset data member according to the first candidate storage bucket, and determine the second label corresponding to the preset data member according to the second candidate storage bucket;
C40、根据所述第一标签及所述标签指纹余数确定预设数据成员对应的第一标签指纹,以及根据所述第二标签及所述标签指纹余数确定所述预设数据成员对应的第二标签指纹;C40. Determine a first tag fingerprint corresponding to a preset data member according to the first tag and the tag fingerprint remainder, and determine a second tag corresponding to the preset data member according to the second tag and the tag fingerprint remainder label fingerprint;
C50、根据所述存储桶数分别对第一候选存储桶和第二候选存储桶进行修正,并将修正后的第一候选存储桶和第二存储候选桶作为预设数据成员对应的第一候选存储桶和第二存储候选桶;C50. Modify the first candidate storage bucket and the second candidate storage bucket respectively according to the number of storage buckets, and use the corrected first candidate storage bucket and the second storage candidate bucket as the first candidate corresponding to the preset data member the bucket and the second storage candidate bucket;
C60、在所述第一候选存储桶和所述第二候选存储桶中分别查找该预设数据成员对应的第一标签指纹以及第二标签指纹;C60. Search for the first tag fingerprint and the second tag fingerprint corresponding to the preset data member in the first candidate storage bucket and the second candidate storage bucket, respectively;
C70、若查找到该预设数据成员对应的第一标签指纹或第二标签指纹,则提示数据成员查询成功;C70. If the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, prompt the data member to query successfully;
C80、若未查找到预设数据成员对应的第一标签指纹或第二标签指纹,则提示数据成员查询失败。C80. If the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is not found, prompt that the data member query fails.
具体地,所述步骤C10-步骤C50的执行过程与步骤A10-A50的执行过程相同,这里就不在赘述,具体可以参照步骤A10-A50的说明。Specifically, the execution process of the steps C10-step C50 is the same as the execution process of the steps A10-A50, which will not be repeated here, and can refer to the description of the steps A10-A50 for details.
进一步,在获取到第一候选存储桶h0(x)和第二候选存储桶h1(x)后,搜索第一候选存储桶h0(x)是否匹配该预数据成员x的第一标签指纹以及第二候选存储桶h1(x)是否匹配该预数据成员x的第二标签指纹如果匹配第一标签指纹或第二标签指纹表明预设数据成员元素x在集合中,返回查询结果为真(True),预设数据成员查询结束;如果未存储匹配第一标签指纹或第二标签指纹表明预设数据成员元素x不在集合中,返回查询结果为假(False),元素查询结束。Further, after obtaining the first candidate storage bucket h0 (x) and the second candidate storage bucket h1 (x), search whether the first candidate storage bucket h0 (x) matches the first label of the pre-data member x fingerprint and whether the second candidate bucket h1 (x) matches the second tag fingerprint of the pre-data member x If matching the first tag fingerprint or second tag fingerprint Indicates that the preset data member element x is in the set, the returned query result is True, and the preset data member query ends; if the fingerprint matching the first tag is not stored or second tag fingerprint Indicates that the preset data member element x is not in the set, the returned query result is false (False), and the element query ends.
举例说明加减布谷鸟过滤器的元素查询方法:An example to illustrate the element query method of adding and subtracting cuckoo filters:
如图8所示,当查询元素x时,当查询元素x时,首先,采用公式(1)和(2)计算元素x的两个候选存储桶索引h0(x)=6和h1(x)=4;其次,采用公式(3)、(4)和(5)计算两个候选存储桶h0(x)和h1(x)对应的两个标签指纹和同时调整两个候选存储桶索引为h0(x)=1和h1(x)=4;接着,搜索两个候选存储桶h0(x)=1和h1(x)=4是否匹配对应的两个标签指纹和发现候选存储桶h0(x)=1存在匹配的标签指纹最后,查询表明元素x在集合中,返回查询结果为真。As shown in FIG. 8, when querying element x, first, two candidate bucket indices h0 (x)=6 and h1 ( x)=4; secondly, the two label fingerprints corresponding to the two candidate buckets h0 (x) and h1 (x) are calculated using formulas (3), (4) and (5) and Adjust the two candidate bucket indices to h0 (x)=1 and h1 (x)=4 at the same time; then, search whether the two candidate buckets h0 (x)=1 and h1 (x)=4 match The corresponding two tag fingerprints and It is found that there is a matching tag fingerprint in the candidate bucket h0 (x) = 1 Finally, the query indicates that element x is in the set, returning the query result as true.
在本实施例的一个实现方式中,在接收到插入操作时,可以判断存储桶数m是否是2的幂。如果m是2的幂,标签布谷鸟过滤器退化为现有布谷鸟过滤器,如果m不是2的幂,则按照标签鸟过滤器执行查询操作的过程执行。其中,当标签布谷鸟过滤器退化为现有布谷鸟过滤器时,如图7所示,所述查询操作的实现过程可以为:In an implementation manner of this embodiment, when an insert operation is received, it may be determined whether the number m of buckets is a power of 2. If m is a power of 2, the tag cuckoo filter degenerates into the existing cuckoo filter, if m is not a power of 2, the query operation process is performed according to the tag bird filter. Wherein, when the tag cuckoo filter degenerates into an existing cuckoo filter, as shown in FIG. 7 , the implementation process of the query operation may be:
D10、计算元素x的指纹fx和第一候选存储桶索引h0(x)。采用公式(9)计算元素x的指纹fx和候选存储桶索引h0(x)其中h0(x)的范围为[0,...,M-1],其中,M是2的幂且m等于M。D10. Calculate the fingerprint fx of the element x and the first candidate bucket index h0 (x). The fingerprint fx and the candidate bucket index h0 (x) of element x are calculated using formula (9) where h0 (x) is in the range [0,...,M-1], where M is a power of 2 And m is equal to M.
D20、计算元素x的第二个候选存储桶索引h1(x)。采用公式(10)计算第二候选存储桶索引h1(x),其中,h1(x)的范围为[0,...,M-1],M是2的幂且m等于M。D20. Calculate the second candidate bucket index h1 (x) of the element x. The second candidate bucket index h1 (x) is calculated using formula (10), where h1 (x) is in the range [0, . . . , M-1], M is a power of 2 and m is equal to M.
D30、搜索两个候选存储桶h0(x)和h1(x)是否匹配指纹fx;如果匹配指纹fx,进入步骤D40;否则,进入步骤D50。D30. Search whether the two candidate storage buckets h0 (x) and h1 (x) match the fingerprint fx ; if it matches the fingerprint fx , go to step D40; otherwise, go to step D50.
D40、表明元素x在集合中,返回查询结果为真,元素查询结束。D40. Indicates that the element x is in the set, the returned query result is true, and the element query ends.
D50、表明元素x不在集合中,返回查询结果为假,元素查询结束。D50. Indicates that the element x is not in the set, the returned query result is false, and the element query ends.
在本实施例的一个实现方式中,所述数据成员管理操作为删除操作;如图9所示,所述当标签布谷鸟过滤器接收到数据成员管理操作时,基于异或操作确定预设数据成员对应的两个候选存储桶以及两个标签指纹,并基于确定得到的两个候选存储桶以及两个标签指纹执行所述数据成员管理操作具体包括:In an implementation of this embodiment, the data member management operation is a delete operation; as shown in FIG. 9 , when the tag cuckoo filter receives the data member management operation, the preset data is determined based on the XOR operation Two candidate buckets and two label fingerprints corresponding to the member, and performing the data member management operation based on the determined two candidate buckets and two label fingerprints specifically includes:
E10、确定所述预设数据成员对应的标签指纹余数及第一候选存储桶;E10. Determine the label fingerprint remainder and the first candidate storage bucket corresponding to the preset data member;
E20、基于所述标签指纹余数以及所述第一候选存储桶,采用异或操作确定所述预设数据成员对应到的第二候选存储桶;E20. Based on the label fingerprint remainder and the first candidate storage bucket, adopt an exclusive OR operation to determine the second candidate storage bucket corresponding to the preset data member;
E30、根据所述第一候选存储桶确定预设数据成员对应的第一标签,以及根据第二候选存储桶确定预设数据成员对应的第二标签;E30. Determine the first label corresponding to the preset data member according to the first candidate storage bucket, and determine the second label corresponding to the preset data member according to the second candidate storage bucket;
E40、根据所述第一标签及所述标签指纹余数确定预设数据成员对应的第一标签指纹,以及根据所述第二标签及所述标签指纹余数确定所述预设数据成员对应的第二标签指纹;E40. Determine a first tag fingerprint corresponding to a preset data member according to the first tag and the tag fingerprint remainder, and determine a second tag corresponding to the preset data member according to the second tag and the tag fingerprint remainder label fingerprint;
E50、根据所述存储桶数分别对第一候选存储桶和第二候选存储桶进行修正,并将修正后的第一候选存储桶和第二存储候选桶作为预设数据成员对应的第一候选存储桶和第二存储候选桶;E50. Modify the first candidate storage bucket and the second candidate storage bucket respectively according to the number of storage buckets, and use the corrected first candidate storage bucket and the second storage candidate bucket as the first candidate corresponding to the preset data member the bucket and the second storage candidate bucket;
E60、在所述第一候选存储桶和所述第二候选存储桶中分别查找该预设数据成员对应的第一标签指纹以及第二标签指纹;E60. Search for the first tag fingerprint and the second tag fingerprint corresponding to the preset data member in the first candidate storage bucket and the second candidate storage bucket, respectively;
E70、若查找到该预设数据成员对应的第一标签指纹或第二标签指纹,则删除查找到的预设数据成员对应的标签指纹,其中,所述标签指纹为第一标签指纹或第二标签指纹;E70. If the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, delete the tag fingerprint corresponding to the found preset data member, wherein the tag fingerprint is the first tag fingerprint or the second tag fingerprint label fingerprint;
E80、若未查找到预设数据成员对应的第一标签指纹或第二标签指纹,则提示数据成员删除失败。E80. If the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is not found, prompt that the deletion of the data member fails.
具体地,所述步骤E10-步骤E50的执行过程与步骤A10-A50的执行过程相同,这里就不在赘述,具体可以参照步骤A10-A50的说明。Specifically, the execution process of the steps E10-step E50 is the same as the execution process of the steps A10-A50, which is not repeated here, and can refer to the description of the steps A10-A50 for details.
进一步,在获取到第一候选存储桶h0(x)和第二候选存储桶h1(x)后,搜索第一候选存储桶h0(x)是否匹配该预数据成员x的第一标签指纹以及第二候选存储桶h1(x)是否匹配该预数据成员x的第二标签指纹如果匹配第一标签指纹或第二标签指纹删除查找到的预设数据成员对应的标签指纹,数据成员删除结束;如果未存储匹配第一标签指纹或第二标签指纹表删除数据成员失败,元素删除结束。Further, after obtaining the first candidate storage bucket h0 (x) and the second candidate storage bucket h1 (x), search whether the first candidate storage bucket h0 (x) matches the first label of the pre-data member x fingerprint and whether the second candidate bucket h1 (x) matches the second tag fingerprint of the pre-data member x If matching the first tag fingerprint or second tag fingerprint Delete the tag fingerprint corresponding to the found preset data member, and the deletion of the data member ends; if the matching first tag fingerprint is not stored or second tag fingerprint Table deletion of data member failed, element deletion ended.
举例说明加减布谷鸟过滤器的元素删除方法:An example to illustrate the element deletion method of the addition and subtraction cuckoo filter:
如图10所示,当删除元素y时,首先,采用公式(1)和(2)计算元素y的两个候选存储桶索引h0(y)=3和h1(y)=6;其次,采用公式(3)、(4)和(5)计算两个候选存储桶h0(y)和h1(y)对应的两个标签指纹和同时调整两个候选存储桶索引为h0(y)=3和h1(y)=1;接着,搜索两个候选存储桶h0(y)=3和h1(y)=1是否匹配对应的两个标签指纹和假设两个标签指纹和相等(即),发现候选存储桶h0(y)=3和h1(y)=1均存在匹配的标签指纹和最后,随机选择一个候选存储桶h1(y)=1,从该存储桶中删除一个匹配标签指纹删除元素y成功。As shown in Figure 10, when element y is deleted, first, two candidate bucket indices h0 (y)=3 and h1 (y)=6 for element y are calculated using formulas (1) and (2); secondly , using formulas (3), (4) and (5) to calculate the two label fingerprints corresponding to the two candidate buckets h0 (y) and h1 (y) and Adjust the two candidate bucket indexes at the same time to h0 (y)=3 and h1 (y)=1; then, search whether the two candidate buckets h0 (y)=3 and h1 (y)=1 match The corresponding two tag fingerprints and Assume two tag fingerprints and equal (ie ), it is found that there are matching tag fingerprints for both candidate buckets h0 (y)=3 and h1 (y)=1 and Finally, a candidate bucket h1 (y) = 1 is randomly selected, and a matching label fingerprint is removed from this bucket The deletion of element y succeeded.
进一步,标签布谷鸟过滤器的元素删除方法可确保正确删除元素,不会产生假阴性错误。如果两个插入元素的指纹相同,加减布谷鸟过滤器插入该两个元素的两个指纹在过滤器中。如果删除上述两个元素的一个标签指纹,另一个标签指纹仍在过滤器中,因此不会产生假阴性错误,可能产生假阳性错误。例如,在图10中,当元素y删除后查询元素g,由于元素g的标签指纹和与元素y的标签指纹和相等(即且),发现标签指纹在候选存储桶h0(g)=3中,查询表明元素g在集合中,返回查询结果为真。当查询元素y时,由于标签指纹(即)仍在候选存储桶h0(y)=3中,查询表明元素y在集合中,返回查询结果为真;但是,由于元素y已删除成功,上述查询结果为一个假阳性错误。虽然如此,标签布谷鸟过滤器的元素删除方法不会增加假阳性错误率,其假阳性错误率低,并与标准布鲁姆过滤器、计数布鲁姆过滤器、布谷鸟过滤器等的假阳性错误率相同。Further, the element removal method of the tag cuckoo filter ensures that elements are removed correctly without false negative errors. If the fingerprints of two inserted elements are the same, the addition and subtraction of the cuckoo filter inserts the two fingerprints of the two elements in the filter. If one of the label fingerprints of the above two elements is removed, the other label fingerprint is still in the filter, so no false negative errors will be generated, false positive errors may be generated. For example, in Figure 10, when element y is deleted, element g is queried, due to the label fingerprint of element g and tag fingerprint with element y and equal (ie and ), found the tag fingerprint In candidate bucket h0 (g)=3, the query indicates that element g is in the set, and the query result is returned as true. When querying for element y, due to tag fingerprinting (which is ) is still in the candidate bucket h0 (y)=3, the query indicates that the element y is in the set, and the returned query result is true; however, since the element y has been successfully deleted, the above query result is a false positive error. Nonetheless, the element removal method of the label cuckoo filter does not increase the false positive error rate, its false positive error rate is low, and is comparable to the false positive error rate of the standard bloom filter, count bloom filter, cuckoo filter, etc. The positive error rate is the same.
在本实施例的一个实现方式中,在接收到插入操作时,可以判断存储桶数m是否是2的幂。如果m是2的幂,标签布谷鸟过滤器退化为现有布谷鸟过滤器,如果m不是2的幂,则按照标签鸟过滤器执行删除操作的过程执行。其中,当标签布谷鸟过滤器退化为现有布谷鸟过滤器时,如图9所示,所述删除操作的实现过程可以为:In an implementation manner of this embodiment, when an insert operation is received, it may be determined whether the number m of buckets is a power of 2. If m is a power of 2, the tag cuckoo filter degenerates into the existing cuckoo filter, if m is not a power of 2, it is performed according to the process of performing the deletion operation of the tag bird filter. Wherein, when the tag cuckoo filter degenerates into an existing cuckoo filter, as shown in FIG. 9 , the implementation process of the deletion operation may be:
F10、计算元素x的指纹fx和第一候选存储桶索引h0(x)。采用公式(9)计算元素x的指纹fx和候选存储桶索引h0(x),其中h0(x)的范围为h1(x)的范围为[0,...,M-1],其中,M是2的幂且m等于MF10. Calculate the fingerprint fx of the element x and the first candidate bucket index h0 (x). Formula (9) is used to calculate the fingerprint fx of elementx and the candidate bucket index h0 (x), where h0 (x) is in the range of h1 (x) is in the range [0,...,M-1 ], where M is a power of 2 and m is equal to M
F20、计算元素x的第二个候选存储桶索引h1(x)。采用公式(10)计算第二候选存储桶索引h1(x),其中,h1(x)的范围为h1(x)的范围为[0,...,M-1],M是2的幂且m等于M。F20. Calculate the second candidate bucket index h1 (x) of the element x. The second candidate bucket index h1 (x) is calculated using formula (10), where h1 (x) is in the range of h1 (x) is in the range [0,...,M-1], and M is power of 2 and m equals M.
F30、搜索两个候选存储桶h0(x)和h1(x)是否匹配指纹fx。如果匹配指纹fx,进入步骤DF0;否则,进入步骤F50。F30. Search whether the two candidate buckets h0 (x) and h1 (x) match the fingerprint fx . If it matches the fingerprint fx , go to step DF0; otherwise, go to step F50.
F40、表明元素x在集合中,则删除查找到的预设数据成员对应的标签指纹fx,元素删除结束。F40 , indicating that the element x is in the set, delete the tag fingerprint fx corresponding to the found preset data member, and the element deletion ends.
F50、表明元素x不在集合中,返回删除结果为假,元素删除结束。F50. Indicates that the element x is not in the set, and returns that the deletion result is false, and the element deletion ends.
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still be The technical solutions described in the foregoing embodiments are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010360757.7ACN111552693B (en) | 2020-04-30 | 2020-04-30 | Tag cuckoo filter |
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010360757.7ACN111552693B (en) | 2020-04-30 | 2020-04-30 | Tag cuckoo filter |
| Publication Number | Publication Date |
|---|---|
| CN111552693Atrue CN111552693A (en) | 2020-08-18 |
| CN111552693B CN111552693B (en) | 2023-04-07 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010360757.7AActiveCN111552693B (en) | 2020-04-30 | 2020-04-30 | Tag cuckoo filter |
| Country | Link |
|---|---|
| CN (1) | CN111552693B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112148928A (en)* | 2020-09-18 | 2020-12-29 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
| CN112632337A (en)* | 2020-12-28 | 2021-04-09 | 南方科技大学 | Element management method applied to firework filter and firework filter |
| CN113360516A (en)* | 2021-08-11 | 2021-09-07 | 成都信息工程大学 | Set member management method based on first-in first-out and minimum active number strategy |
| CN113535706A (en)* | 2021-08-03 | 2021-10-22 | 重庆赛渝深科技有限公司 | Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter |
| CN113641681A (en)* | 2021-10-13 | 2021-11-12 | 南京大数据集团有限公司 | Space self-adaptive mass data query method |
| CN114880400A (en)* | 2022-05-17 | 2022-08-09 | 咪咕数字传媒有限公司 | Data synchronization verification method and device, computing equipment and storage medium |
| CN116126928A (en)* | 2021-11-11 | 2023-05-16 | 中国科学院声学研究所 | An Information Finding System Based on Variable Fingerprint Cuckoo Filter |
| CN116467307A (en)* | 2023-03-29 | 2023-07-21 | 济南大学 | Design method and system for cuckoo filter for reducing false positive rate |
| JP2025523303A (en)* | 2023-06-15 | 2025-07-18 | 泉城省実験室 | Cuckoo filters and how to insert, query, and delete data |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101655861A (en)* | 2009-09-08 | 2010-02-24 | 中国科学院计算技术研究所 | Hashing method based on double-counting bloom filter and hashing device |
| CN105630955A (en)* | 2015-12-24 | 2016-06-01 | 华中科技大学 | Method for efficiently managing members of dynamic data set |
| CN107515901A (en)* | 2017-07-24 | 2017-12-26 | 中国科学院信息工程研究所 | A chained log storage structure and its hash index structure, data operation method, server, and medium |
| CN109815234A (en)* | 2018-12-29 | 2019-05-28 | 杭州中科先进技术研究院有限公司 | A kind of multiple cuckoo filter under streaming computing model |
| CN110046164A (en)* | 2019-04-16 | 2019-07-23 | 中国人民解放军国防科技大学 | Index independent grain distribution filter, consistency grain distribution filter and operation method |
| CN110222088A (en)* | 2019-05-20 | 2019-09-10 | 华中科技大学 | Data approximation set representation method and system based on insertion position selection |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101655861A (en)* | 2009-09-08 | 2010-02-24 | 中国科学院计算技术研究所 | Hashing method based on double-counting bloom filter and hashing device |
| CN105630955A (en)* | 2015-12-24 | 2016-06-01 | 华中科技大学 | Method for efficiently managing members of dynamic data set |
| CN107515901A (en)* | 2017-07-24 | 2017-12-26 | 中国科学院信息工程研究所 | A chained log storage structure and its hash index structure, data operation method, server, and medium |
| CN109815234A (en)* | 2018-12-29 | 2019-05-28 | 杭州中科先进技术研究院有限公司 | A kind of multiple cuckoo filter under streaming computing model |
| CN110046164A (en)* | 2019-04-16 | 2019-07-23 | 中国人民解放军国防科技大学 | Index independent grain distribution filter, consistency grain distribution filter and operation method |
| CN110222088A (en)* | 2019-05-20 | 2019-09-10 | 华中科技大学 | Data approximation set representation method and system based on insertion position selection |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112148928A (en)* | 2020-09-18 | 2020-12-29 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
| CN112148928B (en)* | 2020-09-18 | 2024-02-20 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
| CN112632337B (en)* | 2020-12-28 | 2023-12-22 | 南方科技大学 | Element management method applied to firework filter and firework filter |
| CN112632337A (en)* | 2020-12-28 | 2021-04-09 | 南方科技大学 | Element management method applied to firework filter and firework filter |
| CN113535706A (en)* | 2021-08-03 | 2021-10-22 | 重庆赛渝深科技有限公司 | Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter |
| CN113535706B (en)* | 2021-08-03 | 2023-05-23 | 佛山赛思禅科技有限公司 | Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter |
| CN113360516A (en)* | 2021-08-11 | 2021-09-07 | 成都信息工程大学 | Set member management method based on first-in first-out and minimum active number strategy |
| CN113360516B (en)* | 2021-08-11 | 2021-11-26 | 成都信息工程大学 | Collection member management method |
| CN113641681A (en)* | 2021-10-13 | 2021-11-12 | 南京大数据集团有限公司 | Space self-adaptive mass data query method |
| CN116126928A (en)* | 2021-11-11 | 2023-05-16 | 中国科学院声学研究所 | An Information Finding System Based on Variable Fingerprint Cuckoo Filter |
| CN114880400A (en)* | 2022-05-17 | 2022-08-09 | 咪咕数字传媒有限公司 | Data synchronization verification method and device, computing equipment and storage medium |
| CN116467307A (en)* | 2023-03-29 | 2023-07-21 | 济南大学 | Design method and system for cuckoo filter for reducing false positive rate |
| CN116467307B (en)* | 2023-03-29 | 2024-02-23 | 济南大学 | Design method and system for cuckoo filter for reducing false positive rate |
| JP2025523303A (en)* | 2023-06-15 | 2025-07-18 | 泉城省実験室 | Cuckoo filters and how to insert, query, and delete data |
| Publication number | Publication date |
|---|---|
| CN111552693B (en) | 2023-04-07 |
| Publication | Publication Date | Title |
|---|---|---|
| CN111552693B (en) | Tag cuckoo filter | |
| CN112148928B (en) | Cuckoo filter based on fingerprint family | |
| CN111552692B (en) | An additive and subtractive cuckoo filter | |
| CN111046034B (en) | Method and system for managing memory data and maintaining data in memory | |
| US12335297B2 (en) | Systems and methods for rapidly generating security ratings | |
| CN103064906B (en) | File management method and device | |
| US20130151562A1 (en) | Method of calculating feature-amount of digital sequence, and apparatus for calculating feature-amount of digital sequence | |
| US8190591B2 (en) | Bit string searching apparatus, searching method, and program | |
| US10013312B2 (en) | Method and system for a safe archiving of data | |
| CN111247518A (en) | Database sharding | |
| WO2020211236A1 (en) | Read-write conflict resolution method and apparatus employing b+ tree and storage medium | |
| CN109189759B (en) | Data reading method, data query method, device and equipment in KV storage system | |
| CN107729371A (en) | The data directory and querying method of block chain, device, equipment and storage medium | |
| WO2022241813A1 (en) | Graph database construction method and apparatus based on graph compression, and related component | |
| CN113468080A (en) | Caching method, system and related device for full flash metadata | |
| WO2011137684A1 (en) | Search method and device based on information records of embedded system | |
| CN112307266B (en) | Index model construction method and device | |
| CN118349578A (en) | Data query method, device, electronic equipment and storage medium | |
| EP4542411A1 (en) | Data storage method, data reading method, database system, and device and medium | |
| Vachaspati et al. | SIESTA: enhancing searches for optimal supertrees and species trees | |
| KR100859710B1 (en) | How to retrieve, store, and delete data using data structures to search for data | |
| CN115827623B (en) | Data filtering method and device | |
| CN116860700A (en) | Method, device, equipment and medium for processing metadata in distributed file system | |
| US11392569B2 (en) | Tree partitioning of the succinct trie | |
| KR101666758B1 (en) | Method for searching data using enhanced bloom filter |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |