Movatterモバイル変換


[0]ホーム

URL:


CN107766258B - Memory storage method and device and memory query method and device - Google Patents

Memory storage method and device and memory query method and device
Download PDF

Info

Publication number
CN107766258B
CN107766258BCN201710890458.2ACN201710890458ACN107766258BCN 107766258 BCN107766258 BCN 107766258BCN 201710890458 ACN201710890458 ACN 201710890458ACN 107766258 BCN107766258 BCN 107766258B
Authority
CN
China
Prior art keywords
hash bucket
data
stored
hash
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710890458.2A
Other languages
Chinese (zh)
Other versions
CN107766258A (en
Inventor
邱文一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike (Beijing) Data Technology Co.,Ltd.
Original Assignee
Enyike Beijing Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enyike Beijing Data Technology Co ltdfiledCriticalEnyike Beijing Data Technology Co ltd
Priority to CN201710890458.2ApriorityCriticalpatent/CN107766258B/en
Publication of CN107766258ApublicationCriticalpatent/CN107766258A/en
Application grantedgrantedCritical
Publication of CN107766258BpublicationCriticalpatent/CN107766258B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Images

Classifications

Landscapes

Abstract

The embodiment of the invention provides a memory storage method and device and a memory query method and device, wherein the method comprises the following steps: obtaining a hash bucket value corresponding to each data to be stored according to the keyword and the hash bucket function of each data to be stored, and determining the position of the hash bucket corresponding to each hash bucket value in the storage table according to each hash bucket value and the hash function; and storing the data to be stored in the hash bucket according to the position of the hash bucket in a storage table. By taking the hash bucket value as the Key value, when storing n records, only the Key value of the hash bucket value is needed, so that the memory is saved, and the real-time query of the memory is realized.

Description

Memory storage method and device and memory query method and device
Technical Field
The present invention relates to computer technologies, and in particular, to a memory storage method and apparatus, and a memory query method and apparatus.
Background
With the development of big Data and the sharing of information, DMP (Data-Management Platform) is widely used. DMP includes billions to billions of data, and its real-time query usually provides storage support by a memory-type database, and the memory consumption of such huge magnitude is about tens of T basically, and the cost is huge. There is a need for a storage database that can improve the performance of real-time queries (typically requiring less than 5 ms) while supporting large amounts of data with low memory.
At present, the general solution for mass memory storage is Redis cluster (an open source distributed memory Key-Value database) and Memcached (a high performance distributed memory object cache system), and a part of hot data is loaded in a memory through policies such as LRU (Least Recently Used) to provide a query, and cold data is put into a slower and cheaper storage device. The method essentially reduces the amount of data needing to be cached through service characteristics, and sacrifices the real-time property of low-frequency access data so as to achieve the aim of saving storage cost.
However, for data that cannot be predicted to be hot, the prior art cannot reduce the memory.
Disclosure of Invention
The embodiment of the invention provides a memory storage method and device and a memory query method and device, aiming at solving the problem that the prior art cannot realize reduction of memory storage.
In a first aspect, an embodiment of the present invention provides a memory storage method, including:
obtaining a hash bucket value corresponding to each data to be stored according to a keyword and a hash bucket function of each data to be stored, wherein the hash bucket function is a functional relation between the keyword and the hash bucket value;
determining the position of the hash bucket corresponding to each hash bucket value in the storage table according to each hash bucket value and the hash function;
and storing the data to be stored in the hash bucket according to the position of the hash bucket in a storage table, wherein the storage table comprises at least one hash bucket and the data to be stored in the hash bucket.
Optionally, the hash bucket further includes a mapping relationship between the keyword of the data to be stored and the data to be stored.
Optionally, the number of hash buckets included in the storage table is 2mAnd m is a positive integer greater than 0.
In a possible implementation manner of the first aspect, the obtaining, according to the keyword and the hash bucket function of each to-be-stored data, a hash bucket value corresponding to each to-be-stored data specifically includes:
according to the formula hash' (Ki) ═ MD5(Ki)&((2128)<<(127-m)), obtaining hash bucket value hash' (Ki) corresponding to each data to be stored;
the Ki is a keyword of data i to be stored, the MD5 is a one-way hashing algorithm, and m is the first m bits of the MD 5.
Optionally, different data to be stored in the hash bucket are arranged linearly.
Optionally, the number of data to be stored in each hash bucket conforms to a polynomial distribution.
In another possible implementation manner of the first aspect, the method further includes:
and acquiring the invalid data to be stored in each hash bucket, and deleting the invalid data to be stored.
In another possible implementation manner of the first aspect, the obtaining of the invalidated data to be stored in each hash bucket and deleting the invalidated data to be stored specifically includes:
judging whether the number of the data to be stored in each hash bucket is greater than or equal to a preset number;
if so, judging whether the initial storage time of each piece of data to be stored in each first hash bucket is greater than or equal to a preset time or not, wherein the first hash bucket is a hash bucket with the number of the pieces of data to be stored in each hash bucket greater than or equal to the preset number;
and if the initial storage time of the data to be stored in the first hash bucket is greater than or equal to a preset time, deleting the data to be stored from the first hash bucket.
In another possible implementation manner of the first aspect, the obtaining of the invalidated data to be stored in each hash bucket and deleting the invalidated data to be stored specifically includes:
judging whether the capacity utilization rate of each hash bucket is greater than or equal to a preset capacity utilization rate or not;
and if so, deleting target quantity of data to be stored from each second hash bucket according to the initial storage time of each data to be stored in each second hash bucket, wherein the target quantity is the quantity of the data to be stored in the second hash bucket exceeding the preset capacity utilization rate, and the second hash bucket is the hash bucket of which the capacity utilization rate is greater than or equal to the preset capacity utilization rate in each hash bucket.
In a second aspect, an embodiment of the present invention provides a memory query method, including:
obtaining a hash bucket value corresponding to data to be inquired according to a keyword and a hash bucket function of the data to be inquired, wherein the hash bucket function is a functional relation between the keyword and the hash bucket value;
determining the position of a hash bucket corresponding to the hash bucket value in a storage table according to the hash bucket value and a hash function, wherein the storage table comprises at least one hash bucket and data to be inquired stored in the hash bucket;
and reading the data to be inquired from the hash bucket according to the position of the hash bucket.
In a third aspect, an embodiment of the present invention provides a memory storage device, including:
the obtaining module is used for obtaining a hash bucket value corresponding to each data to be stored according to a keyword and a hash bucket function of each data to be stored, wherein the hash bucket function is a functional relation between the keyword and the hash bucket value;
the determining module is used for determining the position of the hash bucket corresponding to each hash bucket value in the storage table according to each hash bucket value and the hash function;
and the storage module is used for storing the data to be stored in the hash bucket according to the position of the hash bucket in a storage table, and the storage table comprises at least one hash bucket and the data to be stored in the hash bucket.
Optionally, the hash bucket further includes a mapping relationship between the keyword of the data to be stored and the data to be stored.
Optionally, the number of hash buckets included in the storage table is 2mAnd m is a positive integer greater than 0.
In a possible implementation manner of the second aspect, the obtaining, according to the keyword and the hash bucket function of each to-be-stored data, a hash bucket value corresponding to each to-be-stored data specifically includes:
the obtaining module is specifically configured to obtain a hash' (Ki) MD5(Ki) according to a formula&((2128)<<(127-m)), obtaining hash bucket value hash' (Ki) corresponding to each data to be stored;
the Ki is a keyword of data i to be stored, the MD5 is a one-way hashing algorithm, and m is the first m bits of the MD 5.
Optionally, different data to be stored in the hash bucket are arranged linearly.
Optionally, the number of data to be stored in each hash bucket conforms to a polynomial distribution.
In another possible implementation manner of the second aspect, the apparatus further includes a deletion module:
the obtaining module is further configured to obtain the invalidated data to be stored in each hash bucket,
and the deleting module is used for deleting the invalid data to be stored from the hash bucket.
In another possible implementation manner of the second aspect, the apparatus further includes a first determining module;
the first judging module is used for judging whether the number of the data to be stored in each hash bucket is greater than or equal to a preset number;
the first judging module is further configured to judge whether an initial storage time of each piece of data to be stored in each first hash bucket is greater than or equal to a preset time, where the number of the pieces of data to be stored in each hash bucket is greater than or equal to the preset number;
the deleting module is specifically configured to delete the data to be stored from the first hash bucket when the first determining module determines that the initial storage time of the data to be stored in the first hash bucket is greater than or equal to a preset time.
In another possible implementation manner of the second aspect, the apparatus further includes a second determining module;
the second judging module is used for judging whether the capacity utilization rate of each hash bucket is greater than or equal to a preset capacity utilization rate or not;
the deleting module is further specifically configured to delete a target number of pieces of data to be stored from each second hash bucket according to an initial storage time of each piece of data to be stored in each second hash bucket, where the target number is the number of pieces of data to be stored in the second hash bucket that exceed the preset capacity usage rate, and the second hash bucket is a hash bucket in which the capacity usage rate in each hash bucket is greater than or equal to the preset capacity usage rate.
In a fourth aspect, an embodiment of the present invention provides a memory query apparatus, including:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a hash bucket value corresponding to data to be inquired according to a keyword of the data to be inquired and a hash bucket function, and the hash bucket function is a functional relation between the keyword and the hash bucket value;
the determining module is used for determining the position of the hash bucket corresponding to the hash bucket value in a storage table according to the hash bucket value and a hash function; the storage table comprises at least one hash bucket and data to be inquired stored in the hash bucket;
and the reading module is used for reading the data to be inquired from the hash bucket according to the position of the hash bucket.
According to the memory storage method and device and the memory query method and device provided by the embodiment of the invention, the hash bucket value corresponding to each data to be stored is obtained according to the keyword and the hash bucket function of each data to be stored, and the position of the hash bucket corresponding to each hash bucket value in the memory storage table is determined according to each hash bucket value and the hash function; and storing the data to be stored in the hash buckets according to the positions of the hash buckets in a storage table. In this embodiment, the hash bucket value is used as the Key value, and when n records are stored, only the Key value of the hash bucket value is needed, so that the memory is saved, and the real-time query of the memory is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a memory storage method according to an embodiment of the present invention;
FIG. 2 is a prior art hash representation;
fig. 3 is a schematic diagram of a hash bucket storage structure according to the embodiment;
fig. 4 is a flowchart of a time-first invalidation policy in the memory storage method according to the second embodiment of the present invention;
fig. 4a is a flowchart of a capacity-first invalidation policy in the memory storage method according to the second embodiment of the present invention;
fig. 5 is a flowchart of a memory query and storage method according to an embodiment of the present invention;
fig. 6 is a structural diagram of a memory storage device according to an embodiment of the present invention;
fig. 7 is a structural diagram of a memory storage device according to a second embodiment of the present invention;
fig. 8 is a structural diagram of a memory storage device according to a third embodiment of the present invention;
fig. 9 is a structural diagram of a memory storage device according to a fourth embodiment of the present invention;
fig. 10 is a structural diagram of a memory query device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, the mainstream of the memory type database is Redis Cluster and Memcached (a high-performance distributed memory object cache system). Since Redis Cluster (Redis Cluster) supports distributed extension, it is widely used in the present situation. The embodiment of the invention is based on the current mainstream memory Key-value database Redis Cluster and Memcached, optimizes the data structure of the memory Key-value database Redis Cluster and Memcached, and can effectively save the memory.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart of a memory storage method according to an embodiment of the present invention. The execution body of the embodiment is a memory storage device. The embodiment relates to a specific process of storing data to be stored in a memory by a memory storage device. As shown in fig. 1, the method of this embodiment may include:
s101, obtaining a hash bucket value corresponding to each data to be stored according to the keyword and the hash bucket function of each data to be stored.
Since data needs to be located in o (1) time in the key-value database, a Hash table is needed to implement the underlying data structure. The Hash table uses a mapping function f: key- > address maps the key to the storage location of the record in the table, so that when the record is desired to be searched, the storage location of the record in the table can be calculated according to the key and the mapping relation. The mapping is called a Hash function and the memory location calculated by the Hash function and the key (which is the memory location in the table) is called the Hash address.
For example, when records such as K1 ═ V1, K2 ═ V2 … Kn ═ Vn and the like are stored, a conventional Hash table storage method is as shown in fig. 2, Kn is used as key in the Hash table, Vn is used as value in the Hash table, a storage location of Vn can be obtained according to the key and the Hash function, and Vn is stored in the value corresponding to the storage location. But for different keys, the Hash function may calculate the same address, so that a collision occurs. The Hash table must open up a large amount of Hash space in order to reduce the collision rate. Meanwhile, a large number of pointers are also needed to realize query of the jump table in order to solve the hash collision, so that the storage of the hash table consumes a large amount of memory, and is even larger than data which needs to be stored specifically.
According to the method, on the premise of storing the same amount of data, the consumption of the memory is saved by using fewer Key numbers. Specifically, the concept of the hash bucket is provided, the memory storage method of the embodiment can be used for hash bucket storage, one hash bucket corresponds to one Key, and for storing data with the same quantity, the number of keys used in the method is small, so that the memory is saved. In the method of this embodiment, when the stored data is larger, the more memory is saved.
Specifically, when data is stored, a hash bucket value corresponding to each piece of data to be stored needs to be calculated first, specifically, the hash bucket value of each piece of data to be stored is calculated according to a keyword and a hash bucket function of each piece of data to be stored. The hash bucket function is a functional relation between the keyword and the hash bucket value, and the hash bucket function can be set according to actual needs, which is not limited in this embodiment.
For example, there are four pieces of data to be stored: "zhangsan 15612234570", "lie four 17812234002", "wang five 18100042301" and "zhao six 13969004576", wherein the corresponding keywords are "zhangsan", "lie four", "wang five" and "zhao six", respectively, and the above keywords are brought into a hash bucket function to calculate the corresponding hash bucket values, which are assumed to be 10, 39, 10 and 40, respectively.
S102, determining the position of the hash bucket corresponding to each hash bucket value in the storage table according to each hash bucket value and the hash function.
Fig. 3 is a schematic diagram of a hash bucket storage structure in this embodiment, and as shown in fig. 3, the memory in this embodiment stores data in the form of a storage table, where the storage table includes at least one hash bucket. Therefore, after the hash bucket value corresponding to each data to be stored is calculated according to the above steps, the position of the hash bucket corresponding to each hash bucket value in the storage table also needs to be determined.
Specifically, in this embodiment, the Hash bucket value obtained through the above calculation is used as a key value in the Hash table, and the position of the Hash bucket corresponding to each Hash bucket value in the storage table is calculated according to the original Hash function in the system.
For example, the hash bucket value 10 corresponding to "zhangsan", the hash bucket value 39 corresponding to "liqi", the hash bucket value 10 corresponding to "wangwu", and the hash bucket value 40 corresponding to "zhao" are respectively substituted into the hash function, and the positions of the respective corresponding hash buckets in the storage table are obtained, which are assumed to be 2, 4, 2, and 5, respectively.
S103, storing the data to be stored in the hash bucket according to the position of the hash bucket in a storage table.
Specifically, the position of the hash bucket corresponding to each hash bucket value in the storage table is obtained according to the method, so that each hash bucket value can be positioned on the corresponding hash bucket, and then the data to be stored corresponding to the hash bucket value is stored in the corresponding hash bucket.
In this embodiment, when the hash bucket values of the keywords calculated according to the hash bucket function are the same, the positions of the hash buckets corresponding to the same hash bucket value in the storage table are also the same. At this time, as shown in fig. 3, the data to be stored with the same hash bucket value may be stored in the same hash bucket, for example, the hash bucket value hash '(k 3) of the key k3 is the same as the hash bucket value hash' (k1) of the key k1, at this time, the data to be stored V3 corresponding to hash '(k 3) and the data to be stored V1 corresponding to hash' (k1) may be stored in the same hash bucket 1, that is, a plurality of data to be stored may be stored in one hash bucket.
For example, continuing with the above example, assume that the hash bucket values corresponding to the keywords "zhang san," "li ye si," "wang wu," and "zhao liu" are respectively: 10. 39, 10, and 40, and the positions of the hash buckets corresponding to the hash bucket values 10, 39, 10, and 40 in the storage table are: 2. 4, 2 and 5. At this time, "zhangsan 15612234570" and "wangwu 18100042301" of the data to be stored corresponding to "zhangsan" and "wangwu" with the hash bucket value of 10 are stored in the same hash bucket with the position of 2 in the storage table. Meanwhile, the data to be stored "li four 17812234002" is stored separately in one hash bucket of position 4 in the storage table, and the data to be stored "zhao six 13969004576" is stored separately in one hash bucket of position 5 in the storage table.
The number of hash buckets included in the storage table is not limited in this embodiment, and is specifically adjusted according to actual needs.
In the method of this embodiment, when the database stores n records (i.e., data to be stored), n Key values need to be accommodated in the conventional hash table storage, and the hash bucket value of each record is calculated in this embodiment, and the hash bucket value is used as a Key value in the hash table storage.
According to the memory storage method provided by the embodiment of the invention, the hash bucket value corresponding to each data to be stored is obtained according to the keyword and the hash bucket function of each data to be stored, and the position of the hash bucket corresponding to each hash bucket value in the memory storage table is determined according to each hash bucket value and the hash function; and storing the data to be stored in the hash buckets according to the positions of the hash buckets in a storage table. In this embodiment, the hash bucket value is used as the Key value, and when n records are stored, only the Key value of the hash bucket value is needed, so that the memory is saved, and the real-time query of the memory is realized.
In a possible implementation manner of this embodiment, the number of hash buckets included in the storage table may be 2mAnd m is a positive integer greater than 0.
For example, the encoding space of the hash bucket value hash' is limited to 2^10, so after enough records are stored, only 2^10 hash bucket values exist in Redis, and the corresponding hash bucket is at most 2^10, so that the number of the hash buckets is limited to be below a constant, and the larger N is, the more the number of the saved hash buckets is, and the more the corresponding saved memory is. The number of the hash buckets is a certain number, namely 2^ m, for the same storage system, wherein m is a positive integer larger than 0.
In another possible implementation manner of this embodiment, the obtaining, by the S101, a hash bucket value corresponding to each piece of data to be stored according to the keyword and the hash bucket function of each piece of data to be stored specifically includes:
according to the formula hash' (Ki) ═ MD5(Ki)&((2128)<<(127-m)), obtaining hash bucket value hash' (Ki) corresponding to each data to be stored;
the Ki is a keyword of data i to be stored, the MD5 is a one-way hashing algorithm, and m is the first m bits of the MD 5.
In a possible implementation manner of the embodiment of the present invention, when a plurality of data to be stored can be stored in one hash bucket, in order to facilitate later query, a mapping relationship between a keyword of the data to be stored and the data to be stored is stored in the hash bucket.
For example, as shown in fig. 3, the hash bucket with position 1 in the storage table stores the mapping relationship of K1 ═ V1 and the mapping relationship of K3 ═ V3. Thus, when V1 corresponding to K1 needs to be queried, K1 is searched, the hash bucket function is used to calculate that the actually stored Key1 is hash' (K1), and then the hash function built in the system is used to locate the position in the storage table, so as to find the position pointed by Key1, that is, the hash bucket with the position of 1. Next, the hash bucket pointed to by position 1 is traversed, so that a record of K1 ═ Vn is found in the hash bucket, and an accurate query is completed.
Optionally, in this embodiment, when a plurality of data to be stored are stored in the hash bucket, different data to be stored in the hash bucket are linearly arranged, for example, as shown in fig. 3, K1 ═ V1 and K3 ═ V3 are linearly arranged. That is, the hash buckets connected by the hash bucket value in this embodiment are implemented by using hash maps built in the Redis, and the hash maps can be configured to be stored linearly. That is to say, the data to be stored (for example, K1 and K3) with the same hash bucket value are placed in the same hash map in a linear arrangement, so that the overhead of additional pointers and hash spaces is not increased, and the memory is further reduced. Further, when more data is to be stored, the more memory is saved compared to conventional storage.
Optionally, the number of the data to be stored in each hash bucket in the storage table of this embodiment conforms to polynomial distribution.
Since the Hash bucket storage implements a customized Hash bucket function, it is necessary to ensure that data stored in all Hash buckets are balanced, otherwise, the memory fragmentation rate of the entire system becomes high. Because the probability of the data to be stored entering each hash bucket is equal, the distribution of records inside the hash bucket conforms to a polynomial distribution:
Figure BDA0001421074600000101
in the formula P (X)k=nk) Represents the storage of n in a hash bucketkThe probability of a piece of data to be stored, k, represents the number of hash buckets. When the number of the hash buckets is larger, the phase difference of each hash bucket is very small as can be seen from a formula.
Therefore, the number of the data to be stored in each hash bucket in the storage table is distributed uniformly, so that the generation of storage fragments can be avoided, the memory storage is balanced, and the reliability of the memory storage is improved while the memory is saved.
Fig. 4 is a flowchart of a time-first invalidation policy in the memory storage method according to the second embodiment of the present invention, and fig. 4a is a flowchart of a capacity-first invalidation policy in the memory storage method according to the second embodiment of the present invention. On the basis of the above embodiment, the present embodiment further includes S200: acquiring the invalid data to be stored in each hash bucket, and deleting the invalid data to be stored to save the memory space, where the S200 may specifically include two strategies: the first is a time-first failure policy, and the second is a capacity-first failure policy.
As shown in fig. 4, the time-first invalidation policy may specifically include:
s201, judging whether the number of the data to be stored in each hash bucket is larger than or equal to a preset number.
And S202, if so, judging whether the initial storage time of each to-be-stored data in each first hash bucket is greater than or equal to a preset time, wherein the number of the to-be-stored data in each hash bucket is greater than or equal to the preset number.
S203, if the initial storage time of the data to be stored in the first hash bucket is greater than or equal to a preset time, deleting the data to be stored from the first hash bucket.
Specifically, during the process of writing the Redis, the number of pieces of data to be stored in each hash bucket is checked by using a certain proportion of write operations (configurable proportion), and whether the number of pieces of data to be stored in each hash bucket is greater than or equal to a preset number is judged. And if the number of the data to be stored in the hash bucket is greater than or equal to the preset number, initiating a elimination strategy aiming at the hash bucket. For convenience of description, the present embodiment refers to a hash bucket in which the number of pieces of data to be stored is greater than or equal to a preset number of pieces, as a first hash bucket.
In this embodiment, an expiration timestamp (i.e., a preset time) is written in the first byte of the value of the hashmap of redis, and the system may read the timestamp of each piece of data to be stored (i.e., the initial storage time of each piece of data to be stored) while scanning the data to be stored inside the first hash bucket. Then, whether the initial storage time of each piece of data to be stored in the first hash bucket is greater than or equal to a preset time is judged, if yes, the piece of data to be stored is determined to be overdue, the overdue data to be stored is deleted from the first hash bucket, and specifically, a hdel command in the redis can be directly called to delete the overdue data to be stored.
As shown in fig. 4a, the capacity-first failure policy may specifically include:
and S204, judging whether the capacity utilization rate of each hash bucket is greater than or equal to a preset capacity utilization rate.
And S205, if yes, deleting target quantity of data to be stored from each second hash bucket according to the initial storage time of each data to be stored in each second hash bucket, wherein the target quantity is the quantity of the data to be stored in the second hash bucket exceeding the preset capacity utilization rate, and the second hash bucket is a hash bucket with the capacity utilization rate larger than or equal to the preset capacity utilization rate in each hash bucket.
Specifically, data to be stored recorded in each hash bucket is checked by using write operation, whether the capacity utilization rate of each hash bucket is greater than or equal to a preset capacity utilization rate is judged, and if yes, a deletion strategy is started for the hash bucket. For convenience of illustration, the present embodiment refers to a hash bucket with a capacity usage rate greater than or equal to a preset capacity usage rate as a second hash bucket. Then, according to the initial storage time of each to-be-stored data in each second hash bucket, performing minimum heap (min heap) processing on each to-be-stored data in each second hash bucket according to the initial storage time, so that the data to be stored closest to the expired target quantity can be found, and the target quantity of the to-be-stored data is deleted from each second hash bucket. And the target number is equal to the number of the data to be stored in the second hash bucket exceeding the preset capacity utilization rate.
It should be noted that, one of the two effective policies may be selected as the failure policy of the data to be stored according to actual needs, optionally, the two effective policies may also be simultaneously selected as the failure policies of the data to be stored, and this embodiment does not limit this.
According to the memory storage method provided by the embodiment of the invention, whether the data to be stored in each hash bucket is effective or not is determined by two methods, namely a time-first failure strategy and a capacity-first failure strategy, and the failed data to be stored is deleted from the hash bucket, so that the memory is prevented from being occupied by the failed data, and the effective utilization rate of the memory is further improved.
Fig. 5 is a flowchart of a memory query and storage method according to an embodiment of the present invention. The execution main body of the embodiment is a device with a memory query function, and is hereinafter referred to as a memory query device. The present embodiment relates to a specific process of querying data from a memory by a memory querying device. As shown in fig. 5, the present embodiment may include:
s301, obtaining a hash bucket value corresponding to the data to be queried according to the keywords of the data to be queried and the hash bucket function.
Wherein the hash bucket function is a functional relation between the keyword and the hash bucket value.
It should be noted that the data to be queried in this embodiment is the data to be stored in the storage table in the foregoing embodiment.
S302, according to the hash bucket value and the hash function, determining the position of the hash bucket corresponding to the hash bucket value in a storage table.
The storage table comprises at least one hash bucket and data to be inquired stored in the hash bucket.
S303, reading the data to be inquired from the hash bucket according to the position of the hash bucket.
For example, as described with reference to the above embodiment, it is assumed that the hash bucket with position 2 in the storage table stores "zhangsan 15612234570" of the data to be queried, the hash bucket with position 4 stores "li four 17812234002" of the data to be queried, and the hash bucket with position 5 stores "zhao six 13969004576" of the data to be queried. When the telephone number of Zhang III needs to be inquired, determining the hash bucket value corresponding to the inquired data to be detected according to the keyword Zhang III and the hash bucket function, and assuming that the hash bucket value corresponding to the keyword Zhang III is determined to be 10. Next, according to the hash bucket value and the hash function, the position of the hash bucket corresponding to the hash bucket value in the storage table is determined, and it is assumed that the position of the hash bucket corresponding to the hash bucket value 10 in the storage table is 2. At this time, the data "15612234570" to be queried is read from the hash bucket with the position 2 in the storage table, and then the query is realized.
In a possible implementation manner of this embodiment, the hash bucket may include a mapping relationship between a keyword of the data to be queried and the data to be queried, and the step S303 may specifically include:
and reading the data to be queried corresponding to the keyword from the hash bucket according to the position of the hash bucket and the mapping relation between the keyword and the data to be queried.
For example, as described with reference to the example, it is assumed that the hash bucket with the storage location position of 2 stores data to be queried, namely "zhangsan 15612234570" and "wangwu 18100042301". When the telephone number of Zhang III needs to be inquired, the hash bucket value corresponding to Zhang III is obtained according to the method, the assumption is 10, and the position of the hash bucket corresponding to the hash bucket value 10 in the storage table is obtained, and the assumption is 2. At this time, two records are stored in the hash bucket with the storage table position of 2, so that the two records of the hash bucket are traversed to obtain the telephone number "15612234570" corresponding to the keyword "zhang san", and further, the query is realized.
According to the method, as the hash bucket values corresponding to the data to be stored are less, the hash bucket values and the hash buckets in the storage table are reduced, so that the data amount scanned in the query process is less, and the query efficiency is improved.
According to the memory query method provided by the embodiment of the invention, the hash bucket value corresponding to the data to be queried is obtained according to the keyword and the hash bucket function of the data to be queried, the position of the hash bucket corresponding to the hash bucket value in the storage table is determined according to the hash bucket value and the hash function, the data to be queried is read from the hash bucket according to the position of the hash bucket, and the rapid query of the data to be queried is further realized.
Fig. 6 is a structural diagram of a memory storage device according to an embodiment of the present invention. Thememory storage device 100 of the embodiment may be software, hardware or a combination of software and hardware. As shown in fig. 6, thememory storage device 100 of the present embodiment may include:
the obtainingmodule 110 is configured to obtain a hash bucket value corresponding to each piece of data to be stored according to a keyword of each piece of data to be stored and a hash bucket function, where the hash bucket function is a functional relation between the keyword and the hash bucket value.
The determiningmodule 120 is configured to determine, according to each hash bucket value and the hash function, a position of the hash bucket corresponding to each hash bucket value in the storage table.
Astorage module 130, configured to store the data to be stored in the hash bucket according to a position of the hash bucket in a storage table, where the storage table includes at least one hash bucket and the data to be stored in the hash bucket.
The memory storage device according to the embodiment of the present invention may implement the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
In a possible implementation manner of this embodiment, the hash bucket further includes a mapping relationship between a keyword of the data to be stored and the data to be stored.
Optionally, the number of hash buckets included in the storage table is 2mAnd m is a positive integer greater than 0.
In another possible implementation manner of this embodiment, the obtainingmodule 110 is specifically configured to obtain the value of MD5(Ki) according to the formula hash' (Ki)&((2128)<<(127-m)), obtaining hash bucket value hash' (Ki) corresponding to each data to be stored;
the Ki is a keyword of data i to be stored, the MD5 is a one-way hashing algorithm, and m is the first m bits of the MD 5.
Optionally, different data to be stored in the hash bucket are arranged linearly.
Optionally, the number of data to be stored in each hash bucket conforms to a polynomial distribution.
Fig. 7 is a structural diagram of a memory storage device according to a second embodiment of the present invention. On the basis of the foregoing embodiments, thememory storage device 100 of the present embodiment further includes a deletable module 140:
the obtainingmodule 110 is further configured to obtain the invalidated data to be stored in each hash bucket.
The deletingmodule 140 is configured to delete the invalidated data to be stored from the hash bucket.
The memory storage device according to the embodiment of the present invention may implement the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Fig. 8 is a structural diagram of a memory storage device according to a third embodiment of the present invention. On the basis of the foregoing embodiments, thememory storage device 100 of the present embodiment further includes a first determiningmodule 150;
the first determiningmodule 150 is configured to determine whether the number of pieces of data to be stored in each hash bucket is greater than or equal to a preset number;
the first determiningmodule 150 is further configured to determine whether an initial storage time of each to-be-stored data in each first hash bucket is greater than or equal to a preset time, where the first hash bucket is a hash bucket in which the number of to-be-stored data in each hash bucket is greater than or equal to a preset number;
the deletingmodule 140 is specifically configured to delete the data to be stored from the first hash bucket when the first determiningmodule 150 determines that the initial storage time of the data to be stored in the first hash bucket is greater than or equal to a preset time.
The memory storage device according to the embodiment of the present invention may implement the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Fig. 9 is a structural diagram of a memory storage device according to a fourth embodiment of the present invention. On the basis of the foregoing embodiment, thememory storage device 100 of the present embodiment further includes a second determiningmodule 160;
the second determiningmodule 160 is configured to determine whether the capacity utilization rate of each hash bucket is greater than or equal to a preset capacity utilization rate;
the deletingmodule 140 is further specifically configured to delete a target number of data to be stored from each second hash bucket according to an initial storage time of each data to be stored in each second hash bucket, where the target number is a number of data to be stored in the second hash bucket that exceeds the preset capacity usage rate, and the second hash bucket is a hash bucket in which the capacity usage rate in each hash bucket is greater than or equal to the preset capacity usage rate.
The memory storage device according to the embodiment of the present invention may implement the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Fig. 10 is a structural diagram of a memory query device according to an embodiment of the present invention. Thememory access device 200 of the present embodiment may be software, hardware or a combination of software and hardware. As shown in fig. 10, thememory query device 200 of the present embodiment may include:
an obtainingmodule 210, configured to obtain a hash bucket value corresponding to data to be queried according to a keyword of the data to be queried and a hash bucket function, where the hash bucket function is a functional relation between the keyword and the hash bucket value;
a determiningmodule 220, configured to determine, according to the hash bucket value and the hash function, a position of a hash bucket corresponding to the hash bucket value in a storage table; the storage table comprises at least one hash bucket and data to be inquired stored in the hash bucket;
areading module 230, configured to read the data to be queried from the hash bucket according to the location of the hash bucket.
The memory storage device according to the embodiment of the present invention may implement the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A memory storage method, comprising:
obtaining a hash bucket value corresponding to each data to be stored according to a keyword and a hash bucket function of each data to be stored, wherein the hash bucket function is a functional relation between the keyword and the hash bucket value;
determining the position of the hash bucket corresponding to each hash bucket value in the storage table according to each hash bucket value and the hash function;
storing the data to be stored in a storage table according to the position of the hash bucket in the hash bucket, wherein the storage table comprises at least one hash bucket and the data to be stored in the hash bucket; the number of the data to be stored in each hash bucket conforms to polynomial distribution;
acquiring invalid data to be stored in each hash bucket, and deleting the invalid data to be stored;
the acquiring the invalid data to be stored in each hash bucket and deleting the invalid data to be stored specifically includes:
judging whether the capacity utilization rate of each hash bucket is greater than or equal to a preset capacity utilization rate or not;
and if so, deleting target quantity of data to be stored from each second hash bucket according to the initial storage time of each data to be stored in each second hash bucket, wherein the target quantity is the quantity of the data to be stored in the second hash bucket exceeding the preset capacity utilization rate, and the second hash bucket is the hash bucket of which the capacity utilization rate is greater than or equal to the preset capacity utilization rate in each hash bucket.
2. The method of claim 1, wherein the hash bucket further comprises a mapping relationship between a key of the data to be stored and the data to be stored.
3. The method of claim 2, wherein the different data to be stored in the hash bucket are arranged linearly.
4. The method of claim 2, wherein the storage table comprises a number of hash buckets of 2mAnd m is a positive integer greater than 0.
5. The method according to claim 4, wherein the obtaining the hash bucket value corresponding to each data to be stored according to the key and the hash bucket function of each data to be stored specifically comprises:
according to the formula hash' (Ki) ═ MD5(Ki)&((2128) < (127-m)), obtaining a hash bucket value hash' (Ki) corresponding to each data to be stored;
the Ki is a keyword of data i to be stored, the MD5 is a one-way hashing algorithm, and m is the first m bits of the MD 5.
6. The method according to claim 1, wherein the obtaining of the invalidated data to be stored in each hash bucket and the deleting of the invalidated data to be stored further specifically comprises:
judging whether the number of the data to be stored in each hash bucket is greater than or equal to a preset number;
if so, judging whether the initial storage time of each piece of data to be stored in each first hash bucket is greater than or equal to a preset time or not, wherein the first hash bucket is a hash bucket with the number of the pieces of data to be stored in each hash bucket greater than or equal to the preset number;
and if the initial storage time of the data to be stored in the first hash bucket is greater than or equal to a preset time, deleting the data to be stored from the first hash bucket.
7. A memory query method, comprising:
obtaining a hash bucket value corresponding to data to be inquired according to a keyword and a hash bucket function of the data to be inquired, wherein the hash bucket function is a functional relation between the keyword and the hash bucket value;
determining the position of a hash bucket corresponding to the hash bucket value in a storage table according to the hash bucket value and a hash function, wherein the storage table comprises at least one hash bucket and data to be inquired stored in the hash bucket; the number of the data to be queried stored in each hash bucket accords with polynomial distribution;
and reading the data to be inquired from the hash bucket according to the position of the hash bucket.
8. A memory storage device, comprising:
the obtaining module is used for obtaining a hash bucket value corresponding to each data to be stored according to a keyword and a hash bucket function of each data to be stored, wherein the hash bucket function is a functional relation between the keyword and the hash bucket value;
the determining module is used for determining the position of the hash bucket corresponding to each hash bucket value in the storage table according to each hash bucket value and the hash function;
the storage module is used for storing the data to be stored in the hash bucket according to the position of the hash bucket in a storage table, and the storage table comprises at least one hash bucket and the data to be stored in the hash bucket; the number of the data to be stored in each hash bucket conforms to polynomial distribution;
the acquisition module is also used for acquiring the invalid data to be stored in each hash bucket;
the deleting module is used for deleting the invalid data to be stored from the hash bucket;
the second judgment module is used for judging whether the capacity utilization rate of each hash bucket is greater than or equal to the preset capacity utilization rate or not;
the deleting module is further specifically configured to delete a target number of pieces of data to be stored from each second hash bucket according to an initial storage time of each piece of data to be stored in each second hash bucket, where the target number is the number of pieces of data to be stored in the second hash bucket that exceed the preset capacity usage rate, and the second hash bucket is a hash bucket in which the capacity usage rate in each hash bucket is greater than or equal to the preset capacity usage rate.
9. A memory lookup apparatus, comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a hash bucket value corresponding to data to be inquired according to a keyword of the data to be inquired and a hash bucket function, and the hash bucket function is a functional relation between the keyword and the hash bucket value;
the determining module is used for determining the position of the hash bucket corresponding to the hash bucket value in a storage table according to the hash bucket value and a hash function; the storage table comprises at least one hash bucket and data to be inquired stored in the hash bucket; the number of the data to be queried stored in each hash bucket accords with polynomial distribution;
and the reading module is used for reading the data to be inquired from the hash bucket according to the position of the hash bucket.
CN201710890458.2A2017-09-272017-09-27Memory storage method and device and memory query method and deviceActiveCN107766258B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201710890458.2ACN107766258B (en)2017-09-272017-09-27Memory storage method and device and memory query method and device

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN201710890458.2ACN107766258B (en)2017-09-272017-09-27Memory storage method and device and memory query method and device

Publications (2)

Publication NumberPublication Date
CN107766258A CN107766258A (en)2018-03-06
CN107766258Btrue CN107766258B (en)2021-11-16

Family

ID=61267601

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201710890458.2AActiveCN107766258B (en)2017-09-272017-09-27Memory storage method and device and memory query method and device

Country Status (1)

CountryLink
CN (1)CN107766258B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN108829344A (en)*2018-05-242018-11-16北京百度网讯科技有限公司 Data storage method, device and storage medium
CN113672619B (en)*2021-08-172024-02-06天津南大通用数据技术股份有限公司Method for segmenting data according to hash rule to make data more uniform
CN114936087B (en)*2021-09-292023-06-02华为技术有限公司Method, device, system and related equipment for embedded vector prefetching
CN114020753A (en)*2021-11-292022-02-08北京天融信网络安全技术有限公司 A data interaction method and device
CN114625933B (en)*2022-03-242025-09-02阿里巴巴(中国)有限公司 Method, device, equipment, storage medium and product for searching character strings in memory
CN114791978A (en)*2022-04-192022-07-26中国电信股份有限公司News recommendation method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN102232219A (en)*2010-01-262011-11-02华为技术有限公司Method and device for storing and searching keyword
CN103905503A (en)*2012-12-272014-07-02中国移动通信集团公司Data storage method, data scheduling method, device and system
US9385957B1 (en)*2012-11-302016-07-05Netronome Systems, Inc.Flow key lookup involving multiple simultaneous cam operations to identify hash values in a hash bucket
US9397946B1 (en)*2013-11-052016-07-19Cisco Technology, Inc.Forwarding to clusters of service nodes

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US6052697A (en)*1996-12-232000-04-18Microsoft CorporationReorganization of collisions in a hash bucket of a hash table to improve system performance
CN102609509B (en)*2010-04-262015-09-30华为技术有限公司 Hash data processing method and device
CN104639570A (en)*2013-11-062015-05-20南京中兴新软件有限责任公司Resource object storage processing method and device
CN104077343B (en)*2013-12-262018-08-24国家计算机网络与信息安全管理中心A kind of Hash table failure elements delet method
CN106326475B (en)*2016-08-312019-12-27中国科学院信息工程研究所Efficient static hash table implementation method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN102232219A (en)*2010-01-262011-11-02华为技术有限公司Method and device for storing and searching keyword
US9385957B1 (en)*2012-11-302016-07-05Netronome Systems, Inc.Flow key lookup involving multiple simultaneous cam operations to identify hash values in a hash bucket
CN103905503A (en)*2012-12-272014-07-02中国移动通信集团公司Data storage method, data scheduling method, device and system
US9397946B1 (en)*2013-11-052016-07-19Cisco Technology, Inc.Forwarding to clusters of service nodes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
移动机器人视觉同时定位与地图构建关键算法研究;吴俊君;《中国博士学位论文全文数据库》;中国学术期刊(光盘版)电子杂志社;20150515(第5期);第55页*

Also Published As

Publication numberPublication date
CN107766258A (en)2018-03-06

Similar Documents

PublicationPublication DateTitle
CN107766258B (en)Memory storage method and device and memory query method and device
US11775524B2 (en)Cache for efficient record lookups in an LSM data structure
CN100468400C (en) A method and system for improving the speed of searching information
US11726712B2 (en)Memory system with write modes based on an internal state of a memory controller
US9104327B2 (en)Fast translation indicator to reduce secondary address table checks in a memory device
US8868926B2 (en)Cryptographic hash database
EP0851354B1 (en)Reorganization of collisions in a hash bucket of a hash table to improve system performance
KR102036769B1 (en)Data caching method, cache and computer system
US8122216B2 (en)Systems and methods for masking latency of memory reorganization work in a compressed memory system
US11580162B2 (en)Key value append
US7962700B2 (en)Systems and methods for reducing latency for accessing compressed memory using stratified compressed memory architectures and organization
US11226904B2 (en)Cache data location system
US8185692B2 (en)Unified cache structure that facilitates accessing translation table entries
CN103150395B (en)Directory path analysis method of solid state drive (SSD)-based file system
US10698834B2 (en)Memory system
CN104598394A (en)Data caching method and system capable of conducting dynamic distribution
CN113590506B (en)HMB table entry management method and solid state disk control system
CN104504076A (en)Method for implementing distributed caching with high concurrency and high space utilization rate
JP2018526740A (en) Data storage method and apparatus for mobile terminal
CN113157606B (en)Buffer realization method and device and data processing equipment
US20140115246A1 (en)Apparatus, system and method for managing empty blocks in a cache
US10169250B2 (en)Method and apparatus method and apparatus for controlling access to a hash-based disk
CN112286873A (en)Hash tree caching method and device
CN114356230B (en)Method and system for improving read performance of column storage engine
Geng et al.OmniCache: A Unified Cache for Efficient Query Handling in LSM-tree Based Key-Value Stores

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
TA01Transfer of patent application right

Effective date of registration:20201224

Address after:136a, 1st floor, D-1 building, Dongsheng Science Park, 66 xixiaokou Road, Haidian District, Beijing 100080 (Dongsheng area)

Applicant after:Enyike (Beijing) Data Technology Co.,Ltd.

Address before:Room 9014, 9 / F, building 3, yard 30, Shixing street, Shijingshan District, Beijing 100041

Applicant before:ADMASTER TECHNOLOGY (BEIJING) Co.,Ltd.

TA01Transfer of patent application right
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp