Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Existing distributed databases cannot store relational data and unstructured data well at the same time.
In video surveillance, a large amount of archive data and spatio-temporal data needs to be stored, and the distributed database provided herein allows such data to be stored and queried efficiently.
The distributed database provided by the application comprises a physical storage table and a logic management table. The physical storage table includes a plurality of time shards, which are divided by a time attribute. The time attribute may be a period of one day, a period of one week, or another length of time, which is not limited herein. Taking a one-day period as an example, if the distributed database holds the data of June 2019, it includes 30 time shards, from the time shard whose time attribute is June 1, 2019 to the time shard whose time attribute is June 30, 2019.
Each time shard comprises at least one physical shard, and typically a plurality of physical shards. The physical shards serve as the actual storage shards in which data is stored.
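As a minimal sketch of this layout, assuming a one-day time attribute and the same number of physical shards under every day, the time shards for June 2019 could be set up as follows. The function name build_layout, the ISO date labels, and the count of 4 physical shards are illustrative assumptions, not part of the described embodiments.

```python
from datetime import date, timedelta

def build_layout(start, end, shards_per_day):
    """Map each daily time shard to a list of empty physical shards."""
    layout = {}
    day = start
    while day <= end:
        # Hypothetical labelling: one time shard per calendar day,
        # each physical shard modelled as a plain list of records.
        layout[day.isoformat()] = [[] for _ in range(shards_per_day)]
        day += timedelta(days=1)
    return layout

layout = build_layout(date(2019, 6, 1), date(2019, 6, 30), shards_per_day=4)
print(len(layout))  # 30 time shards, one per day of June 2019
```

In practice the number of physical shards may differ between time shards; a fixed count is used here only to keep the sketch short.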
Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the data storage method of a distributed database according to the present invention. The data storage method of the distributed database includes the following steps:
S11, receiving the data to be stored.
The data to be stored is received. Specifically, the data to be stored may be archive data or spatio-temporal data.
S12, determining the physical storage path of the data to be stored according to the key value of the data to be stored.
The physical storage path under which the data is to be stored is then determined from the key value of the data to be stored.
S13, storing the data to be stored in the physical shard corresponding to the physical storage path under each time shard or under some of the time shards.
In one specific embodiment, the data to be stored is stored in the physical shard corresponding to the physical storage path under every time shard; in another specific embodiment, it is stored in the physical shard corresponding to the physical storage path under only some of the time shards.
In particular, the storage manner differs according to the type of the data to be stored.
Referring to fig. 2, fig. 2 is a schematic flow chart of a second embodiment of the data storage method of a distributed database according to the present invention, which describes the storage manner when the data to be stored is archive data. The data storage method of the distributed database of the present embodiment includes the following steps:
S11a, receiving the data to be stored.
In this embodiment, the data to be stored is archive data, and the key values of the archive data include an archive primary key value and attribute key values. Specifically, one representation of the archive data may be {"person_id": "1", "name": "John", "sex": 1}. In "person_id": "1", person_id is the archive primary key identifier and 1 is the archive primary key value. The attribute key values are attribute parameters, such as name and gender, which are independent of time. For example, in "name": "John" and "sex": 1, name and sex are attribute identifiers, and John and 1 are the respective attribute key values.
Suppose 1 is predefined as male and 0 as female. Then {"person_id": "1", "name": "John", "sex": 1} is archive data whose archive primary key value is 1, whose name is John, and whose gender is male.
S12a, determining the physical storage path of the data to be stored according to the key value of the data to be stored.
The archive physical storage path of the archive data is calculated from the archive primary key value. Specifically, the archive primary key value is processed with a consistent hash composite algorithm to obtain a routing value; the routing value is then divided by the number of physical shards in the time shard, and the remainder is taken as the physical storage path.
The number of physical shards in each time shard may be the same or different. For each time shard, the physical storage path is obtained by dividing the routing value by the number of physical shards in that time shard and taking the remainder.
In a specific embodiment, when multiple physical shards are established in a time shard, serial numbers are allocated to them in sequence, and these serial numbers serve as the physical storage paths of the physical shards. The remainder obtained by dividing the routing value by the number of physical shards in the time shard is then the serial number of the physical shard in which the data is to be stored.
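The routing step above can be sketched as follows. The text does not specify the consistent hash composite algorithm, so this sketch substitutes an MD5 digest purely as a deterministic stand-in; the function names route_value and physical_shard_index are likewise hypothetical.

```python
import hashlib

def route_value(key_value):
    """Turn a key value into an integer routing value.

    Assumption: MD5 stands in for the unspecified hash algorithm.
    """
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def physical_shard_index(key_value, shard_count):
    """Remainder of the routing value divided by the shard count."""
    if shard_count < 1:
        raise ValueError("a time shard needs at least one physical shard")
    return route_value(key_value) % shard_count

# The same key may map to different serial numbers in time shards
# that contain different numbers of physical shards.
print(physical_shard_index("1", 4), physical_shard_index("1", 7))
```

Because the remainder is always between 0 and the shard count minus 1, this sketch assumes serial numbers that start at 0.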
S13a, storing the data to be stored in the physical shard corresponding to the physical storage path under each time shard or under some of the time shards.
The archive data is stored in the physical shard corresponding to the archive physical storage path under each time shard.
That is, the data to be stored, namely the archive data, is stored in the physical shard with the corresponding serial number under each time shard.
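A minimal sketch of this archive storage step follows, assuming each time shard is modelled as a list of physical shards and each physical shard as a list of records. The MD5-based routing, the function names, and the two-day layout are illustrative assumptions only.

```python
import hashlib

def shard_index(key_value, shard_count):
    # Assumption: MD5 stands in for the unspecified hash algorithm.
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def store_archive(layout, record, primary_key="person_id"):
    """Write one copy of the archive record into every time shard."""
    for physical_shards in layout.values():
        # The shard count may differ per time shard, so the index
        # is recomputed for each one.
        idx = shard_index(str(record[primary_key]), len(physical_shards))
        physical_shards[idx].append(record)

layout = {
    "2019-06-01": [[] for _ in range(4)],
    "2019-06-02": [[] for _ in range(2)],
}
store_archive(layout, {"person_id": "1", "name": "John", "sex": 1})
```

After the call, each time shard holds exactly one copy of the record, in the physical shard selected by the routing value.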
Referring to fig. 3, fig. 3 is a schematic flow chart of a third embodiment of the data storage method of a distributed database according to the present invention, which describes the storage manner when the data to be stored is spatio-temporal data. The data storage method of the distributed database of the present embodiment includes the following steps:
S11b, receiving the data to be stored.
In this embodiment, the data to be stored is spatio-temporal data, and the key values of the spatio-temporal data include an association key value, a time distribution key value, and a spatial distribution key value. Specifically, one representation of the spatio-temporal data may be {"person_id": "1", "record_place": "X Road", "record_time": "2019-05-01 00:00:00"}. In "person_id": "1", person_id is the association key identifier and 1 is the association key value. In "record_time": "2019-05-01 00:00:00", record_time is the time distribution key identifier and 2019-05-01 00:00:00 is the time distribution key value. In "record_place": "X Road", record_place is the spatial distribution key identifier and X Road is the spatial distribution key value.
This spatio-temporal record therefore has an association key value of 1, a time distribution key value of 2019-05-01 00:00:00, and a spatial distribution key value of X Road. That is, it is a record for the day represented by the time 2019-05-01 00:00:00 at the location X Road.
S12b, determining the physical storage path of the data to be stored according to the key value of the data to be stored.
A time storage path of the spatio-temporal data is calculated from the time distribution key value.
Spatio-temporal data has a time attribute and must be stored in the corresponding time shard. Specifically, the time storage path is calculated from the time distribution key value: for spatio-temporal data whose time distribution key value is 2019-05-01 00:00:00, the corresponding time shard is the one whose time attribute is May 1, 2019.
In a specific embodiment, a serial number may be assigned to each time shard when it is established. Taking the time shards of May 2019 as an example, the shards for days 1 to 31 are numbered 1, 2, 3, ..., 31, respectively. When calculating the time storage path of the spatio-temporal data, the serial number of the time shard in which the data is to be stored can be obtained from the time distribution key value.
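For illustration, the time storage path could be derived from the time distribution key value as sketched below. The sketch assumes daily time shards, a timestamp format of YYYY-MM-DD HH:MM:SS, and a serial number equal to the day of the month; the function names are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of record_time

def time_shard_label(record_time):
    """Return the calendar day that identifies the daily time shard."""
    return datetime.strptime(record_time, TIME_FORMAT).date().isoformat()

def time_shard_serial(record_time):
    """Return the serial number of the time shard within its month."""
    return datetime.strptime(record_time, TIME_FORMAT).day

print(time_shard_label("2019-05-01 00:00:00"))   # 2019-05-01
print(time_shard_serial("2019-05-01 00:00:00"))  # 1
```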
A physical storage path of the spatio-temporal data is calculated from the association key value.
In a specific embodiment, the spatio-temporal data is associated with the archive data. Specifically, the spatio-temporal data is child data of the archive data, and the two are linked by the association key: the association key value of a spatio-temporal record is the archive primary key value of the archive data to which that record belongs.
The physical storage path of the spatio-temporal data can be calculated from the association key value in a manner similar to that used for the archive primary key value. The association key value is processed with a consistent hash composite algorithm to obtain a routing value; the routing value is then divided by the number of physical shards in the time shard, and the remainder is taken as the physical storage path.
Specifically, the time storage path and the physical storage path may be calculated simultaneously and then intersected, or the time storage path may be calculated first and the physical storage path then calculated under the time shard corresponding to the time storage path; this is not limited herein.
S13b, storing the data to be stored in the physical shard corresponding to the physical storage path under each time shard or under some of the time shards.
The spatio-temporal data is stored in the time shard corresponding to the time storage path and in the physical shard corresponding to the spatio-temporal physical storage path.
That is, the spatio-temporal data is stored in the time shard corresponding to the time storage path and, within that time shard, in the physical shard corresponding to the physical storage path.
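The two-step placement of a spatio-temporal record can be sketched as follows, computing the time shard first and then the physical shard under it. As before, MD5 is only a stand-in for the unspecified hash algorithm, and the function names and the in-memory layout are assumptions made for illustration.

```python
import hashlib
from datetime import datetime

def shard_index(key_value, shard_count):
    # Assumption: MD5 stands in for the unspecified hash algorithm.
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def store_spatiotemporal(layout, record):
    """Store the record in one physical shard of one time shard."""
    stamp = datetime.strptime(record["record_time"], "%Y-%m-%d %H:%M:%S")
    day = stamp.date().isoformat()          # time storage path
    physical_shards = layout[day]
    idx = shard_index(str(record["person_id"]), len(physical_shards))
    physical_shards[idx].append(record)     # physical storage path
    return day, idx

layout = {
    "2019-05-01": [[] for _ in range(4)],
    "2019-05-02": [[] for _ in range(4)],
}
visit = {"person_id": "1", "record_place": "X Road",
         "record_time": "2019-05-01 00:00:00"}
day, idx = store_spatiotemporal(layout, visit)
```

Because the association key value equals the archive primary key value, the same routing places a person's spatio-temporal records in the same physical shard as that person's archive data within a given time shard.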
In a specific embodiment, the distributed database provided by the present application further includes a logical parent table and a logical child table for performing logical management. The logical parent table includes an archive primary key definition value, and the logical child table includes an association key definition value and a time distribution definition value.
In one embodiment, if the data to be stored is archive data, the archive data is identified by means of the archive primary key definition value to obtain the archive primary key value.
If the data to be stored is spatio-temporal data, the spatio-temporal data is identified by means of the association key definition value to obtain the spatio-temporal association key value, and by means of the time distribution definition value to obtain the time distribution key value.
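One way to picture the logical tables is as small definitions that name which fields of a record act as keys. The sketch below is an assumption-laden illustration: the dictionary structure, the function name extract_key_values, and the rule that a record containing the time field is treated as spatio-temporal data are all hypothetical.

```python
# Hypothetical definition values held by the logical tables.
PARENT_TABLE = {"primary_key": "person_id"}
CHILD_TABLE = {"association_key": "person_id", "time_key": "record_time"}

def extract_key_values(record):
    """Identify the record type and pull out its key values."""
    if CHILD_TABLE["time_key"] in record:
        # Assumption: the presence of the time field marks child data.
        return {
            "kind": "spatio-temporal",
            "association_key_value": record[CHILD_TABLE["association_key"]],
            "time_key_value": record[CHILD_TABLE["time_key"]],
        }
    return {
        "kind": "archive",
        "primary_key_value": record[PARENT_TABLE["primary_key"]],
    }

archive_keys = extract_key_values({"person_id": "1", "name": "John", "sex": 1})
visit_keys = extract_key_values({"person_id": "1", "record_place": "X Road",
                                 "record_time": "2019-05-01 00:00:00"})
```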
In the foregoing embodiment, a distributed database and a storage method are provided in which time shards and physical shards are established and the respective key values of the archive data and the spatio-temporal data are used to distribute the data across them, thereby achieving fast storage of relational data.
The data storage method of the distributed database is generally realized by a data storage device of the distributed database, so the invention also provides the data storage device of the distributed database. Referring to fig. 4, fig. 4 is a schematic structural diagram of a data storage device of a distributed database according to an embodiment of the present invention. The data storage device 100 of the distributed database of the present embodiment includes a processor 11 and a memory 12; the memory 12 stores a computer program, and the processor 11 is configured to execute the computer program to implement the steps of the data storage method of the distributed database.
Referring to fig. 5, fig. 5 is a schematic flowchart of a first embodiment of a data query method for a distributed database according to the present invention, where the data query method for the distributed database includes the following steps:
Each time shard contains multiple physical shards. The physical shards store archive data and spatio-temporal data, where the archive data is the parent data of the spatio-temporal data and one piece of archive data may have a plurality of spatio-temporal records. The archive data comprises an archive primary key value and attribute key values; the spatio-temporal data comprises an association key value, a time distribution key value, and a spatial distribution key value, where the association key value is associated with the archive primary key value.
In particular embodiments, a physical shard may store both a person's archive data, such as gender and name, and that person's spatio-temporal data for a given day.
S21, acquiring the query condition.
A query condition is acquired. The query condition may be an archive query condition, a spatio-temporal query condition, or both.
S22, traversing the physical shards under each time shard or under some of the time shards according to the query condition.
The physical shards under all time shards, or under some of the time shards, are traversed according to the query condition, thereby obtaining the archive data and/or spatio-temporal data that match the query condition.
S23, outputting the archive data and/or spatio-temporal data matching the query condition as a query result.
After the traversal, the archive data and/or spatio-temporal data matching the query condition are output as the query result.
Referring to fig. 6, fig. 6 is a schematic flowchart of a second embodiment of the data query method for a distributed database according to the present invention, in which the query condition is an archive query condition. The data query method for the distributed database of the present embodiment includes the following steps:
S21a, acquiring the query condition.
In this embodiment, the query condition is an archive query condition, which includes at least one attribute condition. There may be one or two attribute conditions, for example name alone, gender alone, or name and gender together.
S22a, traversing the physical shards under each time shard or under some of the time shards according to the query condition.
Because the archive query condition is independent of time, the physical shards under every time shard are traversed according to the archive query condition, that is, the physical shards under all time shards are traversed.
The format of the archive query condition may be {"archive query condition": ["sex = 1"]}, in which the attribute condition is "sex = 1". The physical shards under all time shards are then traversed according to the query condition, thereby obtaining the archive data whose attribute key value sex is also 1.
S23a, outputting the archive data and/or spatio-temporal data matching the query condition as a query result.
The archive data whose attribute key values match the attribute condition is output. Taking the above archive query condition as an example, the archive data with the attribute key value sex equal to 1 is output, that is, all archive data whose gender is male is output as the query result.
In one embodiment, the child data of the archive data, i.e., the spatio-temporal data belonging to the archive data, may also be output as part of the query result.
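A minimal sketch of the archive query follows, again modelling each time shard as a list of physical shards. Since archive data is stored under every time shard, the sketch removes duplicates by primary key; that de-duplication, the rule used to tell archive records from spatio-temporal records, and the sample record for Mary are assumptions added for illustration.

```python
def query_archive(layout, attribute, value, primary_key="person_id"):
    """Traverse every physical shard of every time shard."""
    matches = {}
    for physical_shards in layout.values():
        for shard in physical_shards:
            for record in shard:
                # Assumption: archive records carry no record_time field.
                is_archive = "record_time" not in record
                if is_archive and record.get(attribute) == value:
                    # Keep one copy per archive primary key value.
                    matches[record[primary_key]] = record
    return list(matches.values())

john = {"person_id": "1", "name": "John", "sex": 1}
mary = {"person_id": "2", "name": "Mary", "sex": 0}  # hypothetical record
layout = {
    "2019-06-01": [[mary], [john]],
    "2019-06-02": [[mary], [john]],
}
males = query_archive(layout, "sex", 1)
```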
Referring to fig. 7, fig. 7 is a schematic flowchart of a third embodiment of the data query method for a distributed database according to the present invention, in which the query condition is a spatio-temporal query condition. The data query method for the distributed database of the present embodiment includes the following steps:
S21b, acquiring the query condition.
In this embodiment, the query condition is a spatio-temporal query condition, which includes a time distribution condition.
S22b, traversing the physical shards under each time shard or under some of the time shards according to the query condition.
In an embodiment, the format of the spatio-temporal query condition may be {"spatio-temporal query condition": ["record_time = 2019-05-01 00:00:00"]}, in which the time distribution condition is 2019-05-01 00:00:00. The time shard whose time attribute matches the time distribution condition is determined from the time distribution condition, in a manner similar to the storage method of the above embodiment, which is not repeated here. All physical shards under that time shard are then traversed.
S23b, outputting the archive data and/or spatio-temporal data matching the query condition as a query result.
The spatio-temporal data in the physical shards whose time distribution key values match the time distribution condition is output together with its archive data. Taking the above spatio-temporal query condition as an example, the spatio-temporal data with the time distribution key value record_time equal to 2019-05-01 00:00:00 is output.
In a specific embodiment, the archive data corresponding to the spatio-temporal data also needs to be output. Specifically, the archive data of the spatio-temporal data can be determined from the association between the association key value of the spatio-temporal data and the archive primary key value, and output as part of the query result.
In a specific embodiment, the spatio-temporal query condition further includes a spatial distribution condition, such as {"spatio-temporal query condition": ["record_time = 2019-05-01 00:00:00", "record_place = X Road"]}, in which the spatial distribution condition record_place is X Road. In this case, the output consists of the spatio-temporal data in the physical shards whose time distribution key value matches the time distribution condition and whose spatial distribution key value matches the spatial distribution condition, together with the corresponding archive data, which is determined from the association between the association key value and the archive primary key value. Taking the above condition as an example, the spatio-temporal data with record_time equal to 2019-05-01 00:00:00 and record_place equal to X Road, and the archive data corresponding to it, are output as the query result.
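The spatio-temporal query can be sketched as below: only the time shard selected by the time distribution condition is traversed, and each hit is paired with the archive record that shares its key. The sketch assumes daily time shards and that a person's archive data sits in the same physical shard as that person's spatio-temporal records; the function name and sample layout are hypothetical.

```python
from datetime import datetime

def query_spatiotemporal(layout, record_time, record_place=None):
    """Return (spatio-temporal record, archive record) pairs."""
    stamp = datetime.strptime(record_time, "%Y-%m-%d %H:%M:%S")
    day = stamp.date().isoformat()
    results = []
    for shard in layout.get(day, []):  # only the matching time shard
        # Assumption: archive records carry no record_time field.
        archives = {r["person_id"]: r for r in shard
                    if "record_time" not in r}
        for record in shard:
            if record.get("record_time") != record_time:
                continue
            if record_place is not None and record.get("record_place") != record_place:
                continue
            # Join on association key value = archive primary key value.
            results.append((record, archives.get(record["person_id"])))
    return results

john = {"person_id": "1", "name": "John", "sex": 1}
visit_x = {"person_id": "1", "record_place": "X Road",
           "record_time": "2019-05-01 00:00:00"}
visit_y = {"person_id": "3", "record_place": "Y Road",
           "record_time": "2019-05-01 00:00:00"}
layout = {"2019-05-01": [[john, visit_x], [visit_y]]}
hits = query_spatiotemporal(layout, "2019-05-01 00:00:00", "X Road")
```

In this sample layout no archive record exists for person 3, so a query without the spatial condition pairs that record with None.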
Referring to fig. 8, fig. 8 is a schematic flowchart of a fourth embodiment of the data query method for a distributed database according to the present invention, in which the query condition includes both an archive query condition and a spatio-temporal query condition. The data query method for the distributed database of the present embodiment includes the following steps:
S21c, acquiring the query condition.
In this embodiment, the query condition includes an archive query condition and a spatio-temporal query condition. The archive query condition comprises an attribute condition, and the spatio-temporal query condition comprises a time distribution condition.
S22c, traversing the physical shards under each time shard or under some of the time shards according to the query condition.
In a specific embodiment, the query condition may be {"archive query condition": ["sex = 1"], "spatio-temporal query condition": ["record_time = 2019-05-01 00:00:00"]}. The physical shards of all time shards are traversed according to the archive query condition to determine the archive data whose attribute key values match the attribute condition. It is then judged whether the number of successfully matched archive data records is greater than a preset threshold.
If the number is greater than the threshold, the time shard whose time attribute matches the time distribution condition is determined, the physical shards under that time shard are traversed, and the spatio-temporal data whose time distribution key values match the time distribution condition is determined within those physical shards.
If not, that is, if the number is smaller than the threshold, the time shard whose time attribute matches the time distribution condition is determined, and only the physical shards to which the archive data matching the attribute condition belongs are traversed under that time shard. In other words, the physical shards holding the matched archive data are traversed directly under the time shard corresponding to the time distribution condition.
S23c, outputting the archive data and/or spatio-temporal data matching the query condition as a query result.
If the number of archive data records is greater than the threshold, a matching operation is performed between the archive primary key values of the archive data obtained by traversal and the association key values of the spatio-temporal data, and the archive data whose archive primary key value matches an association key value, together with its spatio-temporal data, is output as the query result.
That is, the matching operation is performed on the archive primary key value of the archive data and the association key value of the spatio-temporal data, and the archive data corresponding to the association key value of the spatio-temporal data is thereby determined.
Taking the above query condition as an example, in an embodiment, the archive data whose sex is 1 is queried on all physical shards under the time shards, and the following result set is obtained: [{"id": "1", "person_id": "1", "name": "John", "sex": 1}; {"id": "1", "person_id": "2", "name": "Jonny", "sex": 1}].
The spatio-temporal data with record_time equal to 2019-05-01 00:00:00 is queried on the physical shards of the determined time shard, and the following result set is obtained: [{"id": "1", "person_id": "1", "record_place": "X Road", "record_time": "2019-05-01 00:00:00"}; {"id": "1", "person_id": "3", "record_place": "Y Road", "record_time": "2019-05-01 00:00:00"}].
The result set of the archive data and the result set of the spatio-temporal data are intersected. According to the association key value, the spatio-temporal record {"id": "1", "person_id": "1", "record_place": "X Road", "record_time": "2019-05-01 00:00:00"} is determined to correspond to the archive data {"id": "1", "person_id": "1", "name": "John", "sex": 1}. The archive data {"id": "1", "person_id": "1", "name": "John", "sex": 1} is therefore output as the query result. In this example, the spatio-temporal record {"id": "1", "person_id": "1", "record_place": "X Road", "record_time": "2019-05-01 00:00:00"} is not returned as the query result.
In "id": "1", the value 1 refers to the serial number of the physical shard to which the archive data or spatio-temporal data belongs. It is included here only for illustration; in a specific embodiment it can be calculated from the algorithm and the primary key or association key, and is not stored in the record.
If the number of archive data records is smaller than the threshold, the spatio-temporal data whose time distribution key values match the time distribution condition is determined within the traversed physical shards, and that spatio-temporal data and its archive data are output as the query result.
In a specific embodiment, the spatio-temporal query condition may further include a spatial distribution condition; the query manner is similar to that in the above embodiment and is not described herein again.
In the above embodiment, different traversal methods are adopted depending on the number of matching archive data records. For a large amount of data, the intersection approach increases query speed; setting a threshold therefore allows the better query manner to be chosen in each case, ensuring query efficiency.
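The threshold-based choice between the two traversal strategies can be sketched as follows. The sketch takes the archive matches as already collected, uses a simple modulo on the numeric key as a stand-in for the routing calculation, and returns (archive, spatio-temporal) pairs; all names and the sample data placement are assumptions for illustration.

```python
def combined_query(archive_matches, time_shard, record_time, threshold, shard_of):
    """Pick a traversal strategy from the number of archive matches."""
    wanted = {a["person_id"]: a for a in archive_matches}
    if len(archive_matches) > threshold:
        # Many matches: scan every physical shard under the time shard
        # and intersect on the key values.
        candidates = range(len(time_shard))
    else:
        # Few matches: visit only the physical shards that hold the
        # matched archive data.
        candidates = sorted({shard_of(a["person_id"], len(time_shard))
                             for a in archive_matches})
    hits = []
    for idx in candidates:
        for record in time_shard[idx]:
            if (record.get("record_time") == record_time
                    and record["person_id"] in wanted):
                hits.append((wanted[record["person_id"]], record))
    return hits

def shard_of(key, count):
    # Stand-in routing: numeric key modulo shard count (assumption).
    return int(key) % count

john = {"person_id": "1", "name": "John", "sex": 1}
jonny = {"person_id": "2", "name": "Jonny", "sex": 1}
visit_x = {"person_id": "1", "record_place": "X Road",
           "record_time": "2019-05-01 00:00:00"}
visit_y = {"person_id": "3", "record_place": "Y Road",
           "record_time": "2019-05-01 00:00:00"}
time_shard = [[jonny], [john, visit_x, visit_y]]

few = combined_query([john, jonny], time_shard, "2019-05-01 00:00:00", 10, shard_of)
many = combined_query([john, jonny], time_shard, "2019-05-01 00:00:00", 1, shard_of)
```

Both strategies yield the same pairs; they differ only in how many physical shards are visited.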
In an embodiment, the query condition may further include a paging condition and a sorting condition. The paging condition includes the number of archive data records per page, and the sorting condition includes the sorting manner applied to the archive primary key.
The format of the query condition may be {"archive query condition": ["sex = 1"], "spatio-temporal query condition": ["record_time = 2019-05-01 00:00:00", "record_place = X Road"], "paging condition": "3 records per page", "sorting condition": "descending order by id"}.
After the result set has been determined according to the archive query condition and the spatio-temporal query condition, the archive data in the query result is first sorted by archive ID in the specified sorting manner. That is, the archive data is sorted by the value of Z in "id": "Z", in this embodiment from largest to smallest.
Then, the sorted archive data is paged according to the number of records per page. Taking the paging condition above as an example, every three records form one page, so that each page displays that number of archive data records together with their spatio-temporal data.
In a specific embodiment, after each physical shard is queried, the size of the result set may be checked against the paging condition, for example the minimum of 3 records. If the requirement is met, the page may be displayed immediately; if not, paging and display are performed after the whole query has finished.
In particular embodiments, the paging condition may also be "paging condition": "3 records per page, fetch page 1", that is, it further specifies which page to display, such as only the first page.
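Sorting and paging of the result set can be sketched as follows. The sketch sorts on the archive primary key field person_id, interpreted as an integer, which is an assumption; the function name paginate and the sample records are likewise hypothetical.

```python
def paginate(records, page_size, page=1, key="person_id", descending=True):
    """Sort by the key field, then return the requested page."""
    ordered = sorted(records, key=lambda r: int(r[key]), reverse=descending)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]

# Five hypothetical archive records with primary key values 1 to 5.
records = [{"person_id": str(n), "sex": 1} for n in (3, 1, 5, 2, 4)]

first_page = paginate(records, page_size=3, page=1)
second_page = paginate(records, page_size=3, page=2)
print([r["person_id"] for r in first_page])   # ['5', '4', '3']
print([r["person_id"] for r in second_page])  # ['2', '1']
```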
In the foregoing embodiment, time shards and physical shards are provided, so that when the query condition includes both an archive query condition and a spatio-temporal query condition, the data can be queried independently under each condition. The required data can thus be reached quickly without traversing the data level by level, which improves query efficiency.
The data query method of the distributed database is generally realized by a data query device of the distributed database, so the invention also provides the data query device of the distributed database. Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a data query device for a distributed database according to the present invention. The data query device 200 of the distributed database of the present embodiment includes a processor 21 and a memory 22; the memory 22 stores a computer program, and the processor 21 is configured to execute the computer program to implement the steps of the data query method of the distributed database as described above.
In an embodiment, the data storage device 100 of the distributed database and the data query device 200 of the distributed database may be the same device, which is not limited herein.
As shown in fig. 10, the present application further provides a data storage apparatus 400 of a distributed database, where the data storage apparatus 400 of the distributed database includes a receiving module 41, a determining module 42, and a storing module 43. The receiving module 41 is configured to receive data to be stored; the determining module 42 is configured to determine a physical storage path of the data to be stored according to a key value of the data to be stored; the storing module 43 is configured to store the data to be stored in the physical shard corresponding to the physical storage path under each time shard or under some of the time shards. The specific steps have been described in the above embodiments and are not repeated here.
As shown in fig. 11, the present application further provides a data query apparatus 500 for a distributed database, where the data query apparatus 500 for a distributed database includes an obtaining module 51, a traversing module 52, and an outputting module 53. The obtaining module 51 is configured to obtain a query condition, the traversing module 52 is configured to traverse the physical shards under each time shard or under some of the time shards according to the query condition, and the outputting module 53 is configured to output the archive data and/or spatio-temporal data matching the query condition as a query result. The specific steps have been described in the above embodiments and are not repeated here.
As shown in fig. 12, the present application further provides a distributed database, where the distributed database includes a storage manager and a logic manager. The storage manager includes a plurality of time shards, which are divided by time attributes; each time shard includes a plurality of physical shards, which are divided by routing attributes. The physical shards, as the actual storage shards, may store archive data and spatio-temporal data. The archive data is the parent data of the spatio-temporal data, and the spatio-temporal data is the child data of the archive data; one piece of archive data can correspond to a plurality of spatio-temporal records, while one spatio-temporal record corresponds to only one piece of archive data. The logic manager is used for logical management of the storage manager and comprises a logical parent table and a logical child table. The logical parent table comprises an archive primary key definition value, which can identify archive data to obtain the archive primary key value. The logical child table includes an association key definition value and a time distribution definition value. The association key definition value can identify spatio-temporal data to obtain the spatio-temporal association key value, and the time distribution definition value can identify spatio-temporal data to obtain the time distribution key value.
The logic processes of the data storage method of the distributed database and the data query method of the distributed database are embodied as a computer program. If the computer program is sold or used as an independent software product, it can be stored in a computer storage medium, so the invention also provides a computer storage medium. Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer storage medium 300 according to an embodiment of the present invention, in which a computer program 31 is stored; the computer program is executed by a processor to implement the data storage method or the data query method of the distributed database described above.
The computer storage medium 300 may be a medium that can store a computer program, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or may be a server that stores the computer program; the server may send the stored computer program to another device for running or may run the stored computer program itself. From a physical point of view, the computer storage medium 300 may be a combination of a plurality of entities, for example a plurality of servers, a server plus a memory, or a memory plus a removable hard disk.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.