Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
According to an embodiment of the present invention, a data storage method is provided. The embodiment can be executed in a platform layer, namely, in a data receiving and storing end. Fig. 1 is a flow chart of a data storage method according to an embodiment of the invention.
Step S110, data is collected from a target data source.
The target data source refers to a data source needing data acquisition. In other words, the target data source is the source platform of the data. Further, the target data source refers to upstream data. For example: a microblog data source, a WeChat data source, a question and answer data source and the like. The embodiment of the present invention does not limit the specific type of the target data source, and in principle, any data source that needs to perform data acquisition and storage may be used as the target data source in the embodiment of the present invention.
Step S120, according to the field information in the data, determining the field type mapped by the field information.
The field information refers to a field in the data. Further, the data may include a plurality of field information.
The field type refers to a category to which the field information belongs.
In the present embodiment, the field types are set according to the characteristics of the fields across different target data sources, for example, which fields the sources have in common (commonality) and which fields are particular to one source (individuality).
For example, the field types include general fields and specific fields. A general field is used to map fields that are the same among target data sources of different data structures, namely common fields. A specific field is used to map fields that differ among target data sources of different data structures, namely individual fields.
For another example, in addition to general fields and specific fields, the field types include an extension field. The extension field is used to map fields that are newly added to a target data source.
Further, before data is collected from the target data source, the method may further include: when the target data source is newly added, creating a database table corresponding to the target data source, wherein the database table comprises a plurality of storage spaces and each storage space is used for storing field information corresponding to one field type; and/or setting a mapping relation table according to the data structure of the target data source, wherein the mapping relation table records the field type to which each kind of field information in the target data source is mapped.
Determining the field type mapped by the field information includes: determining the field type mapped by the field information by querying the mapping relation table.
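As an illustration only, the mapping relation table can be pictured as one lookup structure per target data source. The following Python sketch shows just the query step; the source names (`weibo`, `wechat`) and the field names are hypothetical and are not taken from this embodiment.

```python
# Hypothetical mapping relation tables: one per target data source.
# Source names and field names are illustrative assumptions.
MAPPING_TABLES = {
    "weibo": {
        "id": "common",
        "like_count": "common",
        "hot_search_rank": "special",
    },
    "wechat": {
        "id": "common",
        "like_count": "common",
        "read_count": "special",
    },
}


def determine_field_type(source: str, field_name: str) -> str:
    """Query the mapping relation table of `source` for `field_name`."""
    return MAPPING_TABLES[source][field_name]


def map_record(source: str, record: dict) -> dict:
    """Return {field name: field type} for every field in one record."""
    return {name: determine_field_type(source, name) for name in record}
```

In this sketch a field missing from the table raises `KeyError`; how unmapped fields are actually handled is not specified at this point in the description.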
Step S130, storing the data into the storage space corresponding to the field type in the database table corresponding to the target data source.
In this embodiment, before the data is stored in the storage space corresponding to the field type, the field type corresponding to each piece of field information in the data may be determined, and the storage space in the database table corresponding to each field type may be determined; each piece of field information in the data is then stored in its corresponding storage space in the database table; and/or, when the data is stored in the storage space, if data is already stored in the storage space, the already stored data is overwritten with the newly collected data.
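The storing step, including the overwrite behavior, could be sketched as follows. This is a minimal in-memory stand-in rather than the embodiment's database: the nested-dictionary layout, the function name `store_record`, and the choice to overwrite field by field are all assumptions made for illustration.

```python
def store_record(table: dict, key: str, record: dict, mapping: dict) -> dict:
    """Store each field of `record` in the storage space of its field type.

    `table` is a hypothetical in-memory stand-in for a database table:
    {record key: {field type: {field name: value}}}.
    `mapping` plays the role of the mapping relation table.
    """
    # One storage space per field type found in the mapping relation table.
    row = table.setdefault(key, {ftype: {} for ftype in set(mapping.values())})
    for name, value in record.items():
        space = row[mapping[name]]
        space[name] = value  # overwrites any value already stored
    return row
```

Calling `store_record` twice with the same key keeps the row and replaces only the fields present in the second record.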
In this embodiment, the database includes a plurality of database tables whose storage structures are at least partially the same, wherein each database table is used for storing data collected from one target data source. The part of the plurality of database tables that has the same storage structure is divided into a plurality of storage spaces, and each storage space is used for storing field information corresponding to one field type. Further, the storage structures of the plurality of database tables may be entirely the same, namely: each database table is divided into a plurality of storage spaces, and each storage space is used for storing field information corresponding to one field type. In this way, regardless of whether the data structures of different target data sources are the same or different, the storage structures of the database tables corresponding to the different target data sources are the same.
For example: the database comprises a plurality of database tables, each database table corresponds to one target data source, and the same two field types are set for each of the database tables, wherein one field type is used for mapping general fields among the plurality of target data sources, and the other field type is used for mapping personalized fields among the plurality of target data sources. For instance, the "like count" is a general field of a microblog data source and a WeChat data source, and the "hot search ranking" is a personalized field of the microblog data source.
Further, the storage space corresponding to each field type is a field set, and each field set comprises a plurality of fields. When the data is stored in the storage space, each piece of field information in the data may be stored in the corresponding field in the storage space of the corresponding field type. For example, the field information "like count: 2000" is stored in the "like count" field in the storage space of the corresponding field type.
In this embodiment, setting different field types for the storage spaces in the database table provides a unified storage structure for data sources with different data structures, so a separate storage structure does not have to be set for each data source. This reduces the storage complexity of different target data sources, simplifies the data storage steps, and improves data storage efficiency. Target data sources with different data structures can reuse the warehousing logic of the invention, which facilitates unified warehousing of their data.
In this embodiment, target data sources, whether the same or different, are abstracted into the same storage structure. When a target data source is newly added, as long as its data is structured into the storage structure agreed upon by the invention, the platform layer can adapt to the newly added target data source without any modification. The data of each target data source can thus be unified, which reduces the storage complexity of each target data source as well as the workload and development cost.
The present embodiment may perform data acquisition and storage in a streaming manner for each target data source.
Based on the above embodiment, the following describes the steps of setting the mapping relationship. Fig. 2 is a flowchart illustrating a setting procedure of mapping relationships according to an embodiment of the present invention.
Step S210, when a target data source is newly added, a database table corresponding to the target data source is created.
In this embodiment, the storage structure of the database table corresponding to the target data source is the same as the storage structure of the database tables corresponding to other target data sources.
The database table includes a plurality of storage spaces, and each storage space is used for storing field information corresponding to one field type.
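One way to picture step S210 is a table-creation routine that gives every target data source the same columns. The sketch below uses Python's standard `sqlite3` module purely as an example backend. The column names (`record_id`, `common`, `special`, `dynamic`), the use of one text column per storage space, and the helper names are assumptions, not part of the embodiment.

```python
import sqlite3

# Assumed storage spaces: one column per field type.
STORAGE_SPACES = ("common", "special", "dynamic")


def create_source_table(conn: sqlite3.Connection, source: str) -> None:
    """Create the table for a newly added source with the shared structure."""
    # Table names cannot be bound as SQL parameters, so validate them first.
    if not source.isidentifier():
        raise ValueError(f"unsafe table name: {source!r}")
    columns = ", ".join(f"{space} TEXT" for space in STORAGE_SPACES)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {source} "
        f"(record_id TEXT PRIMARY KEY, {columns})"
    )


def table_columns(conn: sqlite3.Connection, source: str) -> list:
    """Return the column names of a source table."""
    if not source.isidentifier():
        raise ValueError(f"unsafe table name: {source!r}")
    # Each row of PRAGMA table_info is (cid, name, type, notnull, default, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({source})")]
```

Because every table is built from the same `STORAGE_SPACES` tuple, adding a source never requires designing a new schema in this sketch.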
Step S220, the data structure of the target data source is analyzed.
The data structure of the target data source is parsed to determine the fields involved in the target data source.
Step S230, setting a mapping relation table according to the data structure of the target data source.
The mapping relationship table records the field type to which each kind of field information in the target data source is mapped.
Specifically, the field types may include: a general field (common) and a specific field (special).
A general field is used to map fields that are the same between target data sources of different data structures. The storage space corresponding to the general field can store field information that is shared between different target data sources, for example: ID (identifier), mainContent, updateTime, etc. Further, the same field between target data sources of different data structures refers to a field that has the same meaning in those target data sources. For example, the "like count" and the "support count" have the same meaning.
Specific fields, also known as personalized fields, are used to map fields that differ between target data sources of different data structures. The storage space corresponding to the specific field can store field information that only the target data source holds. Further, the different fields between target data sources of different data structures refer to fields that are unique to one of those target data sources. For example, the microblog data source has a "hot search ranking" while other data sources do not, so the "hot search ranking" is a personalized field of the microblog data source.
Further, the field types may further include an extension field (dynamic). The extension field is used to map fields that are newly added to the target data source. Because the data content of a target data source does not necessarily remain unchanged, the extension field is set up in advance for fields that are later added to the target data source; when a target data source needs to add a new field, the data can be adapted without any modification.
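The role of the extension field can be illustrated with a small sketch in which any field absent from the mapping relation table is routed to the `dynamic` storage space. Detecting new fields by a lookup miss is an assumption made here; the description does not state how newly added fields are recognized. Function and field names are hypothetical.

```python
def classify_field(mapping: dict, field_name: str) -> str:
    """Return the mapped field type, or "dynamic" for an unmapped field."""
    return mapping.get(field_name, "dynamic")


def split_record(mapping: dict, record: dict) -> dict:
    """Group one record's fields by field type."""
    spaces = {"common": {}, "special": {}, "dynamic": {}}
    for name, value in record.items():
        spaces[classify_field(mapping, name)][name] = value
    return spaces
```

With this fallback, a source that starts emitting an extra field is still stored without changing the mapping relation table or the table structure.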
After a database table corresponding to a target data source is established, according to a data structure of the target data source, fields contained in the target data source can be determined, and then the field type to be mapped by each field of the target data source is determined; according to the field type to be mapped of each field of the target data source, the storage space corresponding to the field type in the database table can be determined, and then a mapping relation table can be set.
With the mapping relation table set, when data storage is executed, the field type to which a piece of field information is mapped can be looked up in the mapping relation table, and the storage space in the database table in which the field information should be stored is then determined.
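Steps S220 and S230 can be sketched as parsing one sample record and assigning a field type to each field found. The sketch assumes that a source's data structure can be read from a JSON sample and that the set of general fields is already known; both are illustrative assumptions, as are the function names.

```python
import json

# Hypothetical set of field names shared by the existing target data sources.
COMMON_FIELDS = {"ID", "mainContent", "updateTime"}


def parse_fields(sample_json: str) -> list:
    """Step S220 (sketch): list the top-level fields of one sample record."""
    return list(json.loads(sample_json))


def build_mapping_table(fields: list) -> dict:
    """Step S230 (sketch): map each field to a field type."""
    return {
        name: "common" if name in COMMON_FIELDS else "special"
        for name in fields
    }
```

Fields with the same meaning but different names would need an additional alias step before this classification, which is omitted here.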
In this embodiment, when a data source is newly added, a database table having the same storage structure as other target data sources may be directly created without setting a corresponding data structure for the newly added data source, and data storage of different target data sources may be implemented by the database table.
In this embodiment, for the problem that the data structures of different target data sources are different, a general field, a specific field, and an extended field are provided for all the target data sources, and are respectively used to store data with the same meaning of each target data source, specially-held data, and newly-added field data, so that the target data sources can be adapted without modifying a database table during data storage.
A data storage device is provided below. The data storage device may be disposed at a platform layer.
Fig. 3 is a block diagram of a data storage device according to an embodiment of the present invention.
The data storage device includes: an acquisition module 310, a determining module 320, and a storage module 330.
An acquisition module 310 for acquiring data from a target data source.
A determining module 320, configured to determine, according to field information in the data, a field type mapped by the field information.
A storage module 330, configured to store the data in a database table corresponding to the target data source into a storage space corresponding to the field type.
Wherein the device further comprises a setting module (not shown in the figures). The setting module is configured to, before the data is collected from the target data source: create a database table corresponding to the target data source when the target data source is newly added, wherein the database table comprises a plurality of storage spaces and each storage space is used for storing field information corresponding to one field type; and/or set a mapping relation table according to the data structure of the target data source, wherein the mapping relation table records the field type to which each kind of field information in the target data source is mapped.
Wherein the determining module 320 is configured to determine the field type mapped by the field information by querying the mapping relation table.
Wherein the determining module 320 is configured to: before the data is stored in a storage space corresponding to the field type, determining the field type corresponding to each field information in the data; determining a storage space in the database table respectively corresponding to each field type; the storage module 330 is configured to: respectively storing each field information in the data into corresponding storage space in the database table; and/or, when storing the data in the storage space, if data is already stored in the storage space, overwriting the already stored data with the acquired data.
The database comprises a plurality of database tables with at least partially same storage structures; wherein each database table is used for storing data collected from one target data source.
Wherein the field type includes: a general field and a specific field; the general field is used for mapping the same field between target data sources of different data structures; the specific field is used for mapping different fields between target data sources of different data structures.
Wherein the field type further includes: an extension field; the extension field is used for mapping a newly added field in the target data source.
The functions of the apparatus of the present invention have been described in the method embodiments shown in fig. 1 to fig. 2, so that reference may be made to the related descriptions in the foregoing embodiments for details in the description of the present embodiment, which are not repeated herein.
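For orientation, the division of work among the three modules might look like the following sketch. The class names mirror modules 310, 320 and 330, but the method names, the in-memory table, the record-by-record iteration, and the `key_field` parameter are hypothetical choices, not the device's actual interfaces.

```python
from typing import Iterable, Iterator


class AcquisitionModule:
    """Stand-in for module 310: yields records from a target data source."""

    def __init__(self, records: Iterable[dict]):
        self._records = records

    def acquire(self) -> Iterator[dict]:
        yield from self._records  # one record at a time, stream style


class DeterminingModule:
    """Stand-in for module 320: queries the mapping relation table."""

    def __init__(self, mapping: dict):
        self._mapping = mapping

    def field_type(self, field_name: str) -> str:
        return self._mapping[field_name]


class StorageModule:
    """Stand-in for module 330: keeps one storage space per field type."""

    def __init__(self):
        self.table = {}

    def store(self, key, field_type: str, name: str, value) -> None:
        # Later values overwrite earlier ones for the same key and field.
        self.table.setdefault(key, {}).setdefault(field_type, {})[name] = value


def run(acq, det, sto, key_field: str = "ID") -> None:
    """Pass every acquired field through determination and storage."""
    for record in acq.acquire():
        for name, value in record.items():
            sto.store(record[key_field], det.field_type(name), name, value)
```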
The data storage device comprises a processor and a memory. The acquisition module 310, the determining module 320, the storage module 330 and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. The kernel can be set to be one or more, and the data storage method is realized by adjusting the kernel parameters. Since the data storage method has already been described above, it is not described herein.
An embodiment of the present invention provides a storage medium (computer-readable storage medium) having a program stored thereon, the program implementing the data storage method when executed by a processor. Further, the storage medium stores one or more programs, which are executable by one or more processors to implement the data storage method. Since the data storage method has already been described above, it is not described herein.
The embodiment of the invention provides a processor, which is used for running a program, wherein the data storage method is executed when the program runs. Since the data storage method has already been described above, it is not described herein.
The embodiment of the invention provides a data storage device. Fig. 4 is a block diagram of a data storage device according to an embodiment of the present invention.
The data storage device 40 includes at least one processor 410, at least one memory 420 connected to the processor, and a bus 430; the processor 410 and the memory 420 communicate with each other through the bus 430; the processor 410 is operative to call program instructions in the memory 420 to perform the data storage method described above. Since the data storage method has already been described above, it is not described herein. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes the following method steps: collecting data from a target data source; determining the field type mapped by the field information according to the field information in the data; and storing the data into a storage space corresponding to the field type in a database table corresponding to the target data source.
Wherein, before the collecting data from the target data source, the method further comprises: when the target data source is newly added, creating a database table corresponding to the target data source, wherein the database table comprises a plurality of storage spaces and each storage space is used for storing field information corresponding to one field type; and/or setting a mapping relation table according to the data structure of the target data source, wherein the mapping relation table records the field type to which each kind of field information in the target data source is mapped.
Wherein the determining the field type mapped by the field information includes: determining the field type mapped by the field information by querying the mapping relation table.
Wherein, prior to storing the data in the storage space corresponding to the field type, the method further comprises: determining a field type corresponding to each field information in the data; determining a storage space in the database table respectively corresponding to each field type; the storing the data into a storage space corresponding to the field type in a database table includes: respectively storing each field information in the data into corresponding storage space in the database table; and/or, when storing the data in the storage space, if data is already stored in the storage space, overwriting the already stored data with the acquired data.
The database comprises a plurality of database tables with at least partially same storage structures; wherein each database table is used for storing data collected from one target data source.
Wherein the field type includes: a general field and a specific field; the general field is used for mapping the same field between target data sources of different data structures; the specific field is used for mapping different fields between target data sources of different data structures.
Wherein the field type further includes: an extension field; the extension field is used for mapping a newly added field in the target data source.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.