Movatterモバイル変換


[0]ホーム

URL:


CN113778318B - Data storage method and device - Google Patents

Data storage method and device
Download PDF

Info

Publication number
CN113778318B
CN113778318BCN202010575945.1ACN202010575945ACN113778318BCN 113778318 BCN113778318 BCN 113778318BCN 202010575945 ACN202010575945 ACN 202010575945ACN 113778318 BCN113778318 BCN 113778318B
Authority
CN
China
Prior art keywords
data
user
partition
archiving
layered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010575945.1A
Other languages
Chinese (zh)
Other versions
CN113778318A (en
Inventor
林艳
周德辉
文小东
史金昊
崔词茗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co LtdfiledCriticalBeijing Jingdong Century Trading Co Ltd
Priority to CN202010575945.1ApriorityCriticalpatent/CN113778318B/en
Publication of CN113778318ApublicationCriticalpatent/CN113778318A/en
Application grantedgrantedCritical
Publication of CN113778318BpublicationCriticalpatent/CN113778318B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Classifications

Landscapes

Abstract

The invention discloses a data storage method and device, and relates to the technical field of computers. One embodiment of the method comprises the steps of receiving user layered data obtained through a user layered computing task, and writing the user layered data into a preset thermal data set based on storage time and a model; according to the archiving task, archiving the user layered data meeting the preset period in the hot data set to a preset cold data set according to a dump model, and further obtaining the partition codes of the user layered data in the cold data set; and calling an archiving metadata writing service to acquire storage time, a model, archiving time and partition codes corresponding to the user layered data, and writing the storage time, the model, the archiving time and the partition codes into a data archiving metadata table of a database. Therefore, the embodiment of the invention can solve the problems of low efficiency and difficult management of the existing user layered data storage.

Description

Data storage method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data storage method and apparatus.
Background
The e-commerce platform needs to provide a digital user operation system for platform merchants to support user operation management of the platform. In order to make marketing activities and user operation strategies more targeted, a 4A user layering model is established, and users are divided into four layering models of cognition (Aware), attraction (Appeal), action (Act) and advocacy (Advocate) according to the degree of interaction behavior of the users with branded commodities.
Because the commodities are various, some commodities belong to popular explosion products, some commodities are quite popular, therefore, the difference of the user layering data quantity of different models is quite large, the difference of hundreds of pieces of user layering data and hundreds of pieces of user layering data can occur every day in user layering data, and the storage space occupied by the user layering data every day is calculated to be from hundreds of GB to hundreds of KB.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
The e-commerce platform builds a data warehouse using APACHE HIVE (APACHE HIVE, a large-scale data warehouse software that is an open source of the Apache software foundation) and its evolving versions as the base software. The Hive's underlying file storage system is HDFS, where each partition acts as a folder in HDFS when storing user-layered data, creating too many small files and being difficult to integrate. The NameNode of the HDFS will load all file meta-information into the memory, if there are too many small files, it will occupy a lot of memory space in the NameNode, resulting in its performance degradation and excessive pressure. In addition, when Hive executes tasks, if the small files are stored too much, more scanning tasks are generated, resources are wasted, and management is difficult.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a data storage method and device, which can solve the problems of low efficiency and difficult management of the existing user layered data storage.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data storage method including receiving user hierarchical data obtained through a user hierarchical computing task, writing the user hierarchical data into a preset thermal data set based on a storage time and a model;
according to the archiving task, archiving the user layered data meeting the preset period in the hot data set to a preset cold data set according to a dump model, and further obtaining the partition codes of the user layered data in the cold data set;
And calling an archiving metadata writing service to acquire storage time, a model, archiving time and partition codes corresponding to the user layered data, and writing the storage time, the model, the archiving time and the partition codes into a data archiving metadata table of a database.
Optionally, after writing into the data archiving metadata table of the database, the method further comprises:
According to the reading task of the user layered data, an archiving metadata reading service is called, the storage information of the target model in a preset time period is queried from a data archiving metadata table of the database based on the reading request, and the partition codes are returned to read the user layered data in the cold data set according to the partition codes.
Optionally, querying, based on the read request, stored information of the target model in a preset time period from a data archiving metadata table of the database, including:
judging whether storage information with a target model in a preset time period exists in a data archiving metadata table of a database based on a reading request, if so, extracting storage information with the largest version number, and returning to a corresponding partition code so as to read user layered data in a cold data set according to the partition code; if not, notifying that the user is not archived, and then reading the user layered data in the hot data set based on the read request.
Optionally, archiving the user layered data in the hot dataset satisfying the preset period to the preset cold dataset according to the dump model includes:
grouping user layered data meeting a preset period according to a model, and sorting the user layered data according to the retained data volume from large to small;
Acquiring the total user layering data quantity and the number of partitions which meet the preset period to obtain the average user layering data quantity of each partition;
Sequentially taking out a group of user layered data, and circularly executing the following processes for each group of user layered data until all user layered data are transferred to be stored:
judging whether the data quantity which is not transferred in the current group is larger than or equal to the average user layered data quantity, if so, judging whether a partition with the data quantity of 0 exists in the cold data set, if so, transferring the data based on the partition with the data quantity of 0, and if not, obtaining the partition with the largest residual space in the cold data set for transferring; if not, judging whether a partition with the difference value between the average user layered data quantity and the current data quantity is larger than or equal to the data quantity which is not transferred in the current group exists, if so, acquiring the smallest residual space in the partition for transferring, and if not, acquiring the partition with the largest residual space in the cold data set for transferring.
Optionally, when the transferring is performed, the method includes:
Acquiring layered data of the user to be restored according to the sequence from small storage time to large storage time in the group;
labeling each user layered data to be restored based on the partition code for the restoration in the cold data set;
and merging the marked user layered data to be restored into the file size stored by the HDFS block, and writing the file size into a cold data set HIVE table.
Optionally, writing the data archive metadata table of the database includes:
judging whether storage information with the same model coding and storage time exists in the current data archiving metadata table or not;
If yes, setting the storage information as invalid, and adding 1 to the version number in the storage information as the version number of the user layered data of the archive; if not, setting the version number of the user layered data of the archive to be 1.
Optionally, writing the user layered data into a preset thermal data set based on a storage time and a model, including:
partition encodings in the hot dataset are generated based on the storage time and the model to write the user-layered data to the hot dataset.
In addition, the invention also provides a data storage device, which comprises a first module, a second module and a third module, wherein the first module is used for receiving user layering data obtained through user layering calculation tasks, and writing the user layering data into a preset thermal data set based on storage time and a model;
The second module is used for archiving the user layered data meeting the preset period in the hot data set to a preset cold data set according to the archiving task and the unloading model, so as to obtain the partition codes of the user layered data in the cold data set;
and the third module is used for calling the archiving metadata writing service, acquiring the storage time, the model, the archiving time and the partition code corresponding to the user layered data, and writing the storage time, the model, the archiving time and the partition code into a data archiving metadata table of the database.
One embodiment of the above invention has the following advantages or benefits: because the user layered data is written into a preset thermal data set based on the storage time and the model; according to the archiving task, archiving the user layered data meeting the preset period in the hot data set to a preset cold data set according to a dump model, and further obtaining the partition codes of the user layered data in the cold data set; the method comprises the steps of calling an archiving metadata writing service, obtaining storage time, a model, archiving time and partition codes corresponding to user layered data, and writing the storage time, the model, the archiving time and the partition codes into a data archiving metadata table of a database, so that a scene which can adapt to various data processing is realized, small file fragments can be sorted when the use frequency of the user layered data is reduced, excessive memory of Namenode is not occupied, the pressure of the Namenode and a storage system is reduced, and the technical effects of reading and writing efficiency of user-defined user layered model data are maintained.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a data storage method according to one embodiment of the invention;
FIG. 2 is an architecture diagram of a data storage method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the main flow of hot dataset archiving in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main flow of writing a hot dataset into a cold dataset according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a main flow of user layered data reading according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the primary modules of a data storage device according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
fig. 8 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of a data storage method according to one embodiment of the present invention, as shown in FIG. 1, the data storage method includes:
Step S101, receiving user layered data obtained through a user layered computing task, and writing the user layered data into a preset thermal data set based on storage time and a model.
In an embodiment, the invention provides for a hot data set and a cold data set. Wherein the thermal data set is used to store recently produced user hierarchical data for use by subsequent computing tasks. The cold dataset stores user tiered data that is no longer frequently used. For example: when user layering data obtained by calculation of one user layering model are all generated in a defined period, and after all fixed statistical tasks are calculated, the user layering data can be considered to be not used frequently. Tables 1 and 2 below also respectively refer to the fields included in the hot data set calculated for the user's hierarchical calculation task and the fields included in the cold data set written for the archiving task.
TABLE 1
TABLE 2
Preferably, in the embodiment of the present invention, step S101 to step S103 are performed using a Spark distributed computing framework. Spark is a large-scale distributed data computing framework of Apache software foundation open source.
It should be noted that a preferred embodiment of the present invention employs partition encodings generated in the hot dataset based on the storage time and model to write the user-layered data into the hot dataset.
And step S102, archiving the user layered data meeting the preset period in the hot data set to a preset cold data set according to the archiving task and the transcoding model, so as to obtain the partition codes of the user layered data in the cold data set.
In some embodiments, according to the method, the process of archiving the user layered data meeting the preset period in the hot data set into the preset cold data set according to the dump model, the user layered data meeting the preset period can be grouped according to the model and ordered from large to small according to the amount of the stored data; and acquiring the total user layering data quantity and the number of the partitions which meet the preset period so as to obtain the average user layering data quantity of each partition.
Then, sequentially taking out a group of user layered data, and circularly executing the following processes for each group of user layered data until all user layered data are transferred to be stored:
judging whether the data quantity which is not transferred in the current group is larger than or equal to the average user layered data quantity, if so, judging whether a partition with the data quantity of 0 exists in the cold data set, if so, transferring the data based on the partition with the data quantity of 0, and if not, obtaining the partition with the largest residual space in the cold data set for transferring; if not, judging whether a partition with the difference value between the average user layered data quantity and the current data quantity is larger than or equal to the data quantity which is not transferred in the current group exists, if so, acquiring the smallest residual space in the partition for transferring, and if not, acquiring the partition with the largest residual space in the cold data set for transferring.
As a further embodiment, in the process of transferring the hot data set to the cold data set, the user layered data to be transferred may be obtained according to the order of the group storage time from small to large; then labeling each user layered data to be restored based on the partition code for the restoration in the cold data set; and merging the marked user layered data to be restored into the file size stored by the HDFS block, and writing the file size into a cold data set HIVE table. The HDFS is a distributed file storage system. HIVE is a data warehouse tool based on Hadoop that can map structured data files into a database table and provide complete sql query functions.
It can be seen that the present invention will partition the archived data in the order of the number of data stripes, model size, and storage time ordering. Therefore, the method and the device realize the arrangement of the models with small data volume according to the normal storage file size, ensure that the data with the same model and similar date under the same model are filed together as much as possible, and ensure the scanning efficiency when using a cold data set. Meanwhile, the data volume of each partition of the cold data set is ensured to be balanced as much as possible, the situation of serious data inclination can not be generated, and the performance bottleneck caused by uneven partition storage can not be generated when the data is scanned.
And step S103, invoking an archiving metadata writing service, acquiring storage time, a model, archiving time and partition codes corresponding to the user layered data, and writing the storage time, the model, the archiving time and the partition codes into a data archiving metadata table of a database.
In some embodiments, after writing the data in the data archiving metadata table of the database, the present invention can invoke the archiving metadata reading service according to the reading task of the user layered data, query the data archiving metadata table of the database for the storage information of the target model in the preset time period based on the reading request, and return the partition code to read the user layered data in the cold data set according to the partition code. Preferably, the database may be a relational database Mysql.
It should be noted that, after writing into the data archiving metadata table of the database, the archiving metadata reading service may also be invoked according to the reading task of the user layered data. Then, judging whether storage information with a target model in a preset time period exists in a data archiving metadata table of the database based on the reading request, if so, extracting storage information with the largest version number, and returning to a corresponding partition code so as to read user layered data in the cold data set according to the partition code; if not, notifying that the user is not archived, and then reading the user layered data in the hot data set based on the read request.
As yet other embodiments, during writing to the data archive metadata table of the database, the present invention may determine whether stored information having the same model encoding and storage time exists in the current data archive metadata table. Setting the stored information as invalid according to the judging result, and adding 1 to the version number in the stored information as the version number of the user layered data of the file; if not, setting the version number of the user layered data of the archive to be 1.
It can be seen that the invention manages and maintains the archived data through the read-write service, firstly, when the data is refreshed, the data integrity of the archived meta information storage is ensured, and whether the data is latest and valid is distinguished through the version number. Secondly, when the scene of the user layered data is needed to be used, the storage information of the related model user layered data can be obtained by directly calling the filing metadata reading service, and the storage position of the bottom layer data is not needed to be judged according to the service scene. Thirdly, the function can be expanded to the storage and positioning of other types of data, and the method has good expansibility.
It is worth to say that the invention can also store the user layered data in a partition according to the model, directly add the newly added user layered data to the relevant partition, and perform one-time file merging for a period of time. Because the data size of each model is not uniform, if a model with larger data size is encountered, the task efficiency of scanning data becomes low, and pressure is caused to downstream tasks; if the data volume of user layered data is too small, the problem of small files still exists. Secondly, because the operation flow of the HIVE on data updating is complex, the HIVE is directly added to the related partition, and the layered data of the user is difficult to maintain.
In summary, the data storage method of the invention realizes hierarchical data access of large-scale electric users, and on one hand, the invention divides data into a cold data set and a hot data set, so that the invention can adapt to various data processing scenes. When user-defined user layered data are frequently used, a hot data set is used, so that the efficiency of current model user layered data scanning is ensured, and meanwhile, the problem that data are difficult to manage due to too many HIVE partitions is solved. On the other hand, the invention solves the problem of small files caused by too many partitions, combines and sorts the user layered model data with small data volume, can sort small file fragments when the use frequency of the user layered data is reduced, does not occupy too much memory of Namenode, and reduces the pressure of the Namenode and a storage system. Furthermore, the invention reduces Namenode pressure and simultaneously maintains the reading and writing efficiency of the user-defined user layering model data.
Fig. 2 is a block diagram of a data storage method according to an embodiment of the present invention, in which user-layered data is obtained through a user-layered calculation task, and the user-layered data is written into a preset thermal data set based on a storage time and a model. According to the archiving task, archiving the user layered data meeting the preset period in the hot data set to the preset cold data set (namely writing the user layered data into the cold data set) according to the dump model, and obtaining the partition codes of the user layered data in the cold data set. And then, according to the archiving task, invoking archiving metadata writing service, acquiring storage time, a model, archiving time and partition codes corresponding to the user layered data, and writing the storage time, the model, the archiving time and the partition codes into a data archiving metadata table of the Mysql database. At the same time, user layered data that has been archived in the hot dataset is deleted according to the archiving task.
In addition, through the data calculation task of user layering, the archiving metadata reading service can be called to inquire the storage information of the target model in the preset time period from the data archiving metadata table of the Mysql database, and the partition code is returned to read the user layering data in the cold data set according to the partition code. If the partition code is not returned, the user layered data in the hot data set is indicated to be not written into the cold data set, and the user layered data in the hot data set is directly read.
It should be noted that the user-layered computing task, the archiving task, and the user-layered data computing task (i.e., the reading service) may be executed in the Spark engine.
FIG. 3 is a schematic diagram of the main flow of thermal dataset archiving, according to an embodiment of the present invention, including:
step one: a list of archived model-based and storage times is obtained.
Step two: user hierarchical data in the hot dataset is scanned.
Step three: judging whether user layered data corresponding to the archive list exists in the hot data set, if so, performing a step four, and if not, exiting the process.
Step four: and distributing the archive partition codes according to the dump model.
Step five: the cold data set is written in accordance with the partition encoding.
Step six: and judging whether all writing is successful, if so, performing a step seven, and if not, deleting the user layered data of the cold dataset filed and returning to the step five.
Step seven: an archive metadata write service is invoked to write metadata information for the cold data set being archived.
Step eight: and judging whether the meta information is successfully written, if so, deleting the user layered data in the working and heating data set filed at this time, and if not, returning to the step seven.
FIG. 4 is a schematic diagram of the main flow of writing a hot dataset into a cold dataset according to an embodiment of the invention, including:
Step one: and acquiring user layered data in the thermal data set according to a preset period.
Step two: grouping according to the models, and calculating the total number of each model.
Step three: the models are sorted from big to small according to the total number of the models, and the models are sorted from small to big according to the date.
Step four: and calculating the total number of the partitions, and confirming the number of the partitions, namely part_num, so as to obtain the average number of the partitions, namely row_num.
Step five: judging whether unallocated user layered data exists, if yes, proceeding to step six, otherwise, exiting the flow.
Step six: and (5) sequentially taking out a group of models from small to large, and initializing the total number of the models as the number left_count of the remaining unassigned number.
Step seven: judging whether left_count is larger than or equal to row_num, if yes, performing step eight, and if not, performing step nine.
Step eight: judging whether the subarea which is not allocated with the user layered data exists or not, if so, performing the step ten, and if not, performing the step nine.
Step nine: judging whether a partition with the row_num less the block_num being greater than or equal to the left_count exists, if yes, performing the step eleventh, and if not, performing the step twelfth.
Wherein the number of stored data stripes of the partition block_num.
Step ten: the partition to which the data has not been allocated is fetched, and then step thirteenth is performed.
Step eleven: and searching a model list of which the row_num is subtracted by the block_num and is larger than or equal to the left_count, obtaining a partition with the largest block_num, and then performing the step thirteen.
Step twelve: the partition with the largest row_num minus block_num is fetched, and then step thirteenth is performed.
Step thirteen: and sequentially taking out unallocated user hierarchical data, allocating the unallocated user hierarchical data to the taken out partition, and updating the block_num and the left_count.
Step fourteen: and judging whether the current model is distributed, if yes, returning to the step five, and if not, carrying out the step fifteen.
Fifteen steps: judging whether the block_num is larger than or equal to the row_num, if so, returning to the step seven, and if not, returning to the step thirteen.
FIG. 5 is a schematic diagram of a main flow of user layered data reading according to an embodiment of the present invention, including:
Step one: after the computing task based on the user layered data is started, a request is sent to the archiving metadata reading service to request the data storage information of the required user layered data.
The request may include the number of the user hierarchical model, the start time and the end time of the storage time.
Step two: and the archiving metadata reading service receives the request, judges whether the metadata of the request exists in the archiving metadata table, if yes, proceeds to step four, otherwise proceeds to step three.
Step three: and returning the position of the user layered data corresponding to the hot data set request, and performing the step six.
In an embodiment, the metadata of the user-hierarchical data in the hot dataset is archived in the data archive metadata table of the database at this time, but the user-hierarchical data in the hot dataset is not written into the cold dataset. That is, the metadata of the user layered data in the hot dataset may be archived to the database and the user layered data may be written to the cold dataset separately.
Step four: and judging whether the metadata storage information is not available, if yes, performing a step five, and if not, directly returning the metadata storage information and performing a step six.
Step five: and returning the latest effective storage information.
In an embodiment, the valid meta-information of the largest version number is taken back.
Step six: and analyzing the returned storage information to read the user layered data in the cold dataset.
Fig. 6 is a schematic diagram of main modules of a data storage device according to an embodiment of the present invention, and as shown in fig. 6, the data storage device 600 includes a first module 601, a second module 602, and a third module 603. The first module 601 receives user layered data obtained through a user layered computing task, and writes the user layered data into a preset thermal data set based on storage time and a model; the second module 602 files the user layered data meeting the preset period in the hot data set to a preset cold data set according to a file task and a dump model, so as to obtain partition codes of the user layered data in the cold data set; the third module 603 invokes an archive metadata writing service to obtain the storage time, the model, the archive time and the partition code corresponding to the user hierarchical data, and then writes the storage time, the model, the archive time and the partition code into a data archive metadata table of the database.
In some embodiments, after the third module 603 writes to the data archive metadata table of the database, the device further:
invokes an archive metadata reading service according to a reading task for the user layered data, queries the data archive metadata table of the database, based on the read request, for the storage information of the target model within a preset time period, and returns the partition codes so that the user layered data in the cold dataset can be read according to the partition codes.
In some embodiments, the third module 603 querying, based on the read request, the data archive metadata table of the database for the storage information of the target model within a preset time period includes:
determining, based on the read request, whether storage information for the target model within the preset time period exists in the data archive metadata table of the database; if so, extracting the storage information with the largest version number and returning the corresponding partition code so as to read the user layered data in the cold dataset according to the partition code; if not, notifying that the data has not been archived, and then reading the user layered data in the hot dataset based on the read request.
In some embodiments, the second module 602 archiving the user layered data in the hot dataset that meets the preset period into the preset cold dataset according to the dump model includes:
grouping the user layered data meeting the preset period by model, and sorting the groups by retained data volume from large to small;
acquiring the total amount of user layered data meeting the preset period and the number of partitions, to obtain the average amount of user layered data per partition;
taking out one group of user layered data at a time, and cyclically executing the following process for each group until all user layered data has been transferred for storage:
determining whether the untransferred data amount in the current group is greater than or equal to the average amount of user layered data; if so, determining whether a partition with a data amount of 0 exists in the cold dataset, transferring the data to that partition if it exists, and otherwise obtaining the partition with the largest remaining space in the cold dataset for the transfer; if not, determining whether a partition exists whose difference between the average amount of user layered data and its current data amount is greater than or equal to the untransferred data amount in the current group, taking the partition with the smallest remaining space among these for the transfer if such a partition exists, and otherwise obtaining the partition with the largest remaining space in the cold dataset for the transfer.
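The partition-selection loop above can be sketched as follows, under the simplifying assumption that each model group is placed as a whole; the function name, the volume units, and the dict-based bookkeeping are illustrative, not the patented implementation.

```python
def pack_groups(group_volumes, num_partitions):
    """Assign model groups to cold-dataset partitions, largest group first,
    keeping each partition close to the average volume."""
    total = sum(group_volumes.values())
    avg = total / num_partitions              # average volume per partition
    partitions = [0.0] * num_partitions       # current volume of each partition
    placement = {}

    # Groups sorted by retained data volume from large to small.
    for model, remaining in sorted(group_volumes.items(), key=lambda kv: -kv[1]):
        if remaining >= avg:
            # Prefer a partition holding no data yet; otherwise the one
            # with the largest remaining space (smallest current volume).
            empty = [i for i, v in enumerate(partitions) if v == 0]
            idx = empty[0] if empty else min(range(num_partitions),
                                             key=lambda i: partitions[i])
        else:
            # Partitions whose gap to the average can absorb the whole group.
            fits = [i for i, v in enumerate(partitions) if avg - v >= remaining]
            if fits:
                # Among those, the one with the smallest remaining space
                # (largest current volume).
                idx = max(fits, key=lambda i: partitions[i])
            else:
                idx = min(range(num_partitions), key=lambda i: partitions[i])
        partitions[idx] += remaining
        placement[model] = idx
    return placement, partitions

placement, volumes = pack_groups({"m1": 90, "m2": 40, "m3": 30, "m4": 20}, 3)
print(placement, volumes)
```

With three partitions and an average of 60, the 90-unit group claims an empty partition, and the smaller groups are then packed so that partitions fill toward, but not needlessly past, the average.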
In some embodiments, the second module 602, when performing the transfer:
acquires the user layered data to be transferred within the group in ascending order of storage time;
labels each piece of user layered data to be transferred with the partition code used for the transfer in the cold dataset;
and merges the labeled user layered data into files approximately the size of an HDFS block, writing them into the cold dataset HIVE table.
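A minimal sketch of this transfer step, assuming a 128 MB HDFS block size (a common default) and an illustrative record layout; the actual block size and HIVE table layout depend on the cluster configuration.

```python
HDFS_BLOCK = 128 * 1024 * 1024  # bytes; a common default, assumed here

def merge_for_transfer(records, partition_code, block_size=HDFS_BLOCK):
    """Label each record with its cold-dataset partition code and group the
    records, in ascending order of storage time, into files of roughly one
    HDFS block each."""
    files, current, size = [], [], 0
    for rec in sorted(records, key=lambda r: r["storage_time"]):
        rec = dict(rec, partition_code=partition_code)  # labeling step
        if current and size + rec["bytes"] > block_size:
            files.append(current)        # close the current merged file
            current, size = [], 0
        current.append(rec)
        size += rec["bytes"]
    if current:
        files.append(current)
    return files  # each inner list is written as one file of the HIVE table

recs = [{"storage_time": t, "bytes": 60 * 1024 * 1024} for t in (3, 1, 2)]
files = merge_for_transfer(recs, "cold_p_05")
print(len(files))  # two 60 MB records fit one block; the third opens a new file
```

Merging small archive files toward the HDFS block size avoids the small-files overhead on the NameNode and keeps cold-dataset scans efficient.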
In some embodiments, the third module 603 writing to the data archive metadata table of the database includes:
determining whether storage information with the same model code and storage time already exists in the current data archive metadata table;
if so, marking that storage information as invalid and using its version number plus 1 as the version number of the user layered data being archived; if not, setting the version number of the user layered data being archived to 1.
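The version-numbering rule can be sketched with an in-memory list standing in for the database table; the record fields and the `next_version` helper are assumptions for illustration.

```python
def next_version(meta_records, model_code, storage_time):
    """Invalidate existing records with the same model code and storage
    time, and return the version number for the newly archived data."""
    matches = [r for r in meta_records
               if r["model"] == model_code and r["storage_time"] == storage_time]
    for r in matches:
        r["valid"] = False              # old storage information becomes invalid
    if matches:
        return max(r["version"] for r in matches) + 1
    return 1                            # first archive of this model and time

table = [{"model": "m1", "storage_time": 20200601, "valid": True, "version": 1}]
print(next_version(table, "m1", 20200601))  # 2; the old record is now invalid
print(next_version(table, "m2", 20200601))  # 1; nothing to invalidate
```

This is what lets the read path of step five safely pick "the valid record with the largest version number": superseded archives remain in the table but are never returned.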
In some embodiments, the first module 601 writing the user layered data into the preset hot dataset based on the storage time and the model includes:
generating partition codes in the hot dataset based on the storage time and the model, so as to write the user layered data into the hot dataset.
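One illustrative way to derive such a partition code is to concatenate the two keys; the `dt=<date>/model=<code>` layout below mirrors a typical HIVE partition path and is an assumption, not the format used by the embodiment.

```python
def hot_partition_code(storage_time, model_code):
    """Combine the storage date and the model code into a partition code,
    in a HIVE-style 'key=value' path layout (assumed for illustration)."""
    return f"dt={storage_time}/model={model_code}"

print(hot_partition_code("20200622", "m1"))  # dt=20200622/model=m1
```

Keying the hot dataset by both storage time and model lets the archiving task later select exactly the partitions that have aged past the preset period, one model at a time.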
In addition, the data storage method and the data storage device of the present invention correspond to each other in implementation, so the repeated content will not be described again.
Fig. 7 illustrates an exemplary system architecture 700 to which the data storage method or data storage device of embodiments of the invention may be applied.
As shown in fig. 7, a system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 705 via the network 704 using the terminal devices 701, 702, 703 to receive or send messages or the like. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 701, 702, 703.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop computers, desktop computers, and the like.
The server 705 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping websites browsed by users of the terminal devices 701, 702, 703. The background management server may analyze and process received data such as a product information query request, and feed the processing result (e.g., target push information or product information, by way of example only) back to the terminal device.
It should be noted that the data storage method provided by the embodiments of the present invention is generally executed by the server 705, and accordingly, the data storage device is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the computer system 800. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 810 as needed so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 801.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example described as: a processor including a first module, a second module, and a third module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by one such apparatus, cause the apparatus to: receive user layered data obtained through a user layered computing task, and write the user layered data into a preset hot dataset based on the storage time and the model; archive, according to an archiving task, the user layered data in the hot dataset that meets a preset period into a preset cold dataset according to a dump model, thereby obtaining the partition codes of the user layered data in the cold dataset; and invoke an archive metadata writing service to obtain the storage time, model, archive time and partition code corresponding to the user layered data, and write them into the data archive metadata table of the database.
According to the technical scheme provided by the embodiment of the invention, the problems of low efficiency and difficult management of the existing user layered data storage can be solved.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

According to the archiving task, archiving the user layered data in the hot dataset that meets the preset period into the preset cold dataset according to the dump model comprises: grouping the user layered data meeting the preset period by model, and sorting the groups by retained data volume from large to small; acquiring the total amount of user layered data meeting the preset period and the number of partitions, to obtain the average amount of user layered data per partition; taking out one group of user layered data at a time, and cyclically executing the following process for each group until all user layered data has been transferred for storage: determining whether the untransferred data amount in the current group is greater than or equal to the average amount of user layered data; if so, determining whether a partition with a data amount of 0 exists in the cold dataset, transferring the data to that partition if it exists, and otherwise obtaining the partition with the largest remaining space in the cold dataset for the transfer; if not, determining whether a partition exists whose difference between the average amount of user layered data and its currently stored data amount is greater than or equal to the untransferred data amount in the current group, taking the partition with the smallest remaining space among these for the transfer if such a partition exists, and otherwise obtaining the partition with the largest remaining space in the cold dataset for the transfer;
The second module is configured to archive, according to an archiving task, the user layered data in the hot dataset that meets a preset period into a preset cold dataset according to a dump model, including: grouping the user layered data meeting the preset period by model, and sorting the groups by retained data volume from large to small; acquiring the total amount of user layered data meeting the preset period and the number of partitions, to obtain the average amount of user layered data per partition; taking out one group of user layered data at a time, and cyclically executing the following process for each group until all user layered data has been transferred for storage: determining whether the untransferred data amount in the current group is greater than or equal to the average amount of user layered data; if so, determining whether a partition with a data amount of 0 exists in the cold dataset, transferring the data to that partition if it exists, and otherwise obtaining the partition with the largest remaining space in the cold dataset for the transfer; if not, determining whether a partition exists whose difference between the average amount of user layered data and its currently stored data amount is greater than or equal to the untransferred data amount in the current group, taking the partition with the smallest remaining space among these for the transfer if such a partition exists, and otherwise obtaining the partition with the largest remaining space in the cold dataset for the transfer; thereby obtaining the partition codes of the user layered data in the cold dataset;
CN202010575945.1A | 2020-06-22 | 2020-06-22 | Data storage method and device | Active | CN113778318B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010575945.1A | 2020-06-22 | 2020-06-22 | Data storage method and device


Publications (2)

Publication Number | Publication Date
CN113778318A (en) | 2021-12-10
CN113778318B (en) | 2024-09-20

Family

ID=78835202

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010575945.1A (Active, CN113778318B (en)) | Data storage method and device | 2020-06-22 | 2020-06-22

Country Status (1)

Country | Link
CN | CN113778318B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114416699A (en)* | 2022-01-21 | 2022-04-29 | Beijing Ziroom Information Technology Co., Ltd. | A relational database data management method, device and electronic device

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN109726174A (en)* | 2018-12-28 | 2019-05-07 | Jiangsu Manyun Software Technology Co., Ltd. | Data archiving method, system, equipment and storage medium
DE102018129366A1 (en)* | 2018-11-21 | 2020-05-28 | Deepshore GmbH | System for processing and storing data requiring archiving

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
CN108509624A (en)* | 2018-04-08 | 2018-09-07 | Wuhan Douyu Network Technology Co., Ltd. | A database archiving and cleaning method and system, server and storage medium



Similar Documents

Publication | Title
CN109997126B (en) | Event driven extraction, transformation, and loading (ETL) processing
CN108629029B (en) | Data processing method and device applied to data warehouse
US9817877B2 | Optimizing data processing using dynamic schemas
CN107229718B (en) | Method and device for processing report data
CN107704202B (en) | Method and device for quickly reading and writing data
CN111753019B (en) | Data partitioning method and device applied to data warehouse
CN112597126B (en) | Data migration method and device
US9372880B2 | Reclamation of empty pages in database tables
CN111061680B (en) | A method and device for data retrieval
CN107729399B (en) | Data processing method and device
CN112835904A (en) | A data processing method and data processing device
CN112883009B (en) | Method and device for processing data
EP3767486B1 (en) | Multi-record index structure for key-value stores
CN112182138A (en) | Method and device for cataloging
CN111984686A (en) | Data processing method and device
CN114625695A (en) | Data processing method and device
CN113448957B (en) | A data query method and device
CN113778318B (en) | Data storage method and device
CN112783887A (en) | Data processing method and device based on data warehouse
CN119336731A (en) | Data storage method and device
CN113095778A (en) | Architecture for managing mass data in communication application through multiple mailboxes
CN115794876A (en) | Fragment processing method, device, equipment and storage medium for service data packet
CN113760600A (en) | Database backup method, database restoration method and related device
CN113760860A (en) | Data reading method and device
CN113704242A (en) | Data processing method and device

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
