CN112817938A - General data service construction method and system based on data productization - Google Patents

General data service construction method and system based on data productization

Info

Publication number
CN112817938A
Authority
CN
China
Prior art keywords
data
real
time
product
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110118636.6A
Other languages
Chinese (zh)
Inventor
汪尚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yilaixin Technology Co ltd
Original Assignee
Beijing Yilaixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yilaixin Technology Co ltd
Priority to CN202110118636.6A
Publication of CN112817938A
Legal status: Pending


Abstract

Translated from Chinese

本发明提供了一种基于数据产品化的通用数据服务构建方法及系统,该方法通过依据业务场景对数据的需求定义通用的目标数据产品,设置为形成对应通用数据服务提供原始数据的数据源,进而综合考虑目标数据产品的实际业务需求制定通用数据服务对应的调取方式和运算策略。采用上述方案,能够克服现有数据服务形成技术中规范性不足以及操作繁琐的问题,基于通用目标数据产品结合设定的数据调取方式和运算策略,构建的对应数据服务可复用,且可靠性和灵活性更佳,便于高效地为各个业务场景提供有力支持。

Figure 202110118636

The invention provides a general data service construction method and system based on data productization. The method defines general target data products according to the data requirements of business scenarios, sets data sources that provide raw data for the corresponding general data services, and then, taking the actual business requirements of the target data products into account, formulates the calling mode and operation strategy of each general data service. This scheme overcomes the insufficient standardization and cumbersome operation of existing techniques for forming data services. Because each data service is built from a general target data product combined with a defined data calling mode and operation strategy, it is reusable, more reliable and more flexible, and can efficiently support a variety of business scenarios.


Description

General data service construction method and system based on data productization
Technical Field
The invention relates to the technical field of data management and application, in particular to a general data service construction method and system based on data productization.
Background
Data carries information that can strongly support technical development and optimization in many fields, so data management technology is being applied ever more widely. However, raw data acquired from a data source cannot be provided directly to an upper-layer platform or application. It must first be encapsulated and processed by a data service technology so that it meets the requirements of the various application construction scenarios.
At present, the mainstream approach is to build a dedicated data service, by means of SQL, data cubes and the like, for each scenario that arises when a user constructs an application. Data services of this kind are designed for a single application scenario and are not reusable, so data services must be formulated repeatedly for different application construction scenarios. In addition, there are no fixed rules or modes for building such data services; the technology is chosen by the user according to the actual scenario, different users may choose different technologies, standardization is poor, and reliability cannot be guaranteed.
Disclosure of Invention
In order to solve the above problems, the present invention provides a general data service construction method based on data productization. In one embodiment, the method includes:
a target product definition step, in which general target data products are defined according to the data requirements of business scenarios, the target data products comprising one or more of a real-time data product, a detail query product, a business theme product, a data set product and a data cube product, and each target data product corresponding to one or more business scenarios;
a data source determination step, in which a plurality of data sources are set to provide raw data for the general data services that form the target data products;
and a data service formulation step, in which the actual business requirements of the target data products and the data sources are considered together to formulate the data calling mode and operation strategy of the corresponding general data service.
Preferably, in one embodiment, in the data source determination step,
the data sources comprise a historical data source and an event data source, wherein the raw data of the historical data source is online analytical processing (OLAP) data of a data warehouse, and the raw data of the event data source is event data generated by a source business system.
In one embodiment, in the data source determination step, the raw data of the historical data source further includes online transaction data in a database.
Further, in one embodiment, the data service formulation step includes:
step A1, dividing the raw data, according to its real-time property, into real-time data, quasi-real-time data and non-real-time data;
step A2, analyzing, according to actual business requirements, the real-time property of the data object corresponding to each target data product, and determining the matching type of raw data;
and step A3, specifying the calling mode and the operation strategy of each target data product by combining the raw data type and the data source.
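Step A1 can be illustrated with a minimal Python sketch. The source does not state numeric limits for the three classes, so the thresholds `REAL_TIME_MAX_DELAY_S` and `QUASI_REAL_TIME_MAX_DELAY_S`, as well as the names `DataClass` and `classify_by_delay`, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class DataClass(Enum):
    REAL_TIME = "real-time"
    QUASI_REAL_TIME = "quasi-real-time"
    NON_REAL_TIME = "non-real-time"


# Hypothetical thresholds in seconds; the patent text gives no numeric limits.
REAL_TIME_MAX_DELAY_S = 1.0
QUASI_REAL_TIME_MAX_DELAY_S = 300.0


def classify_by_delay(max_acceptable_delay_s: float) -> DataClass:
    """Assign a data class from the delay a consumer can tolerate (step A1)."""
    if max_acceptable_delay_s < 0:
        raise ValueError("delay must be non-negative")
    if max_acceptable_delay_s <= REAL_TIME_MAX_DELAY_S:
        return DataClass.REAL_TIME
    if max_acceptable_delay_s <= QUASI_REAL_TIME_MAX_DELAY_S:
        return DataClass.QUASI_REAL_TIME
    return DataClass.NON_REAL_TIME
```

In a real deployment the boundaries would presumably be set per business scenario rather than as global constants.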
Specifically, in one embodiment, the real-time data product is oriented to scenarios with real-time analysis requirements, corresponds to a real-time data object, and is matched with real-time data and/or quasi-real-time data;
the detail query product is oriented to scenarios with large-scale detail data query requirements, corresponds to a non-real-time data object, is matched with non-real-time data, and requires data transcoding before data operation;
the business theme product is oriented to scenarios with visual report and relationship exploration analysis requirements, corresponds to a non-real-time data object, is matched with non-real-time data, and requires data association analysis before data operation;
the data set product is oriented to scenarios with temporary data requirements, corresponds to a non-real-time data object, and is matched with non-real-time data;
the data cube product is oriented to scenarios with large-data-volume report and visual exploration analysis requirements, corresponds to a non-real-time data object, is matched with non-real-time data, requires data association analysis before data operation, and pre-calculates aggregation tables from the associated data according to set rules.
Further, in one embodiment, step A3 includes:
for real-time and quasi-real-time source data, acquiring the corresponding raw data directly from the source layer of the data warehouse or the database; and/or
acquiring set source data from the source business system;
and for non-real-time source data, processing the data in the source business system and synchronizing it to a corresponding first storage layer at regular intervals by means of ETL, and then calling it.
In one embodiment, step A3 includes:
for real-time and quasi-real-time source data, computing on the acquired raw data with a stream computing engine, and backing up the computation results uniformly to a second storage layer for management and calling;
and for non-real-time source data, computing on the called raw data with a batch computing engine, and either transmitting the computation results directly to the corresponding target data product or storing them in the second storage layer.
In an optional embodiment, the general target data products further comprise an index cloud product and a tag cloud product, and step A2 further includes:
the index cloud product is oriented to scenarios that require describing the indexes of different business objects, corresponds to a non-real-time data object, and is matched with non-real-time data;
the tag cloud product is oriented to scenarios that require describing the basic information of indexes, corresponds to a non-real-time data object, and is matched with non-real-time data.
Based on the method described in any one or more of the above embodiments, the present invention also provides a storage medium storing program code that implements that method.
Based on other aspects of the method described in any one or more of the above embodiments, the present invention further provides a general data service construction system based on data productization, which performs that method.
Compared with the closest prior art, the invention has the following beneficial effects:
the invention provides a general data service construction method and system based on data productization. The method defines general target data products according to the data requirements of business scenarios, each defined target data product corresponds to the requirements of one or more business scenarios, and a data service system built on these target data products is reusable. This overcomes the defect of the prior art, in which a new data service has to be constructed from scratch for each current business scenario;
furthermore, the scheme of the invention starts from data sources with different real-time properties and from the requirements of different business scenarios, and integrates these factors to formulate a data calling mode and an operation strategy suited to the corresponding data service system, so that the reliability of the constructed data service is ensured as far as possible. The combination of data services corresponding to the various target data products gives users a comprehensive choice across business scenarios, with better portability and timeliness.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a general data service construction method based on data productization according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the framework of the general data service construction method based on data productization in an embodiment of the present invention;
FIG. 3 is an exemplary diagram of the real-time data product data service architecture of the general data service construction method according to an embodiment of the present invention;
FIG. 4 is an exemplary diagram of the detail query product data service architecture of the general data service construction method according to another embodiment of the present invention;
FIG. 5 is an exemplary diagram of the business theme product data service architecture of the general data service construction method according to an embodiment of the present invention;
FIG. 6 is an exemplary diagram of the data set product data service architecture of the general data service construction method according to an embodiment of the present invention;
FIG. 7 is an exemplary diagram of the data cube product data service architecture of the general data service construction method according to an embodiment of the present invention;
FIG. 8 is an exemplary diagram of the index cloud product data service architecture of the general data service construction method according to an embodiment of the present invention;
FIG. 9 is an exemplary diagram of the tag cloud product data service architecture of the general data service construction method according to an embodiment of the present invention.
Detailed Description
The following detailed description will be provided for the embodiments of the present invention with reference to the accompanying drawings and examples, so that the practitioner of the present invention can fully understand how to apply the technical means to solve the technical problems, achieve the technical effects, and implement the present invention according to the implementation procedures. It should be noted that, unless otherwise conflicting, the embodiments and features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are all within the scope of the present invention.
Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. The order of the operations may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
The computer equipment comprises user equipment and network equipment. The user equipment or the client includes but is not limited to a computer, a smart phone, a PDA, and the like; network devices include, but are not limited to, a single network server, a server group of multiple network servers, or a cloud based on cloud computing consisting of a large number of computers or network servers. The computer devices may operate individually to implement the present invention or may be networked and interoperate with other computer devices in the network to implement the present invention. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
The terms "first," "second," and the like may be used herein to describe various elements, but these elements should not be limited by these terms, which are used merely to distinguish one element from another. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. When an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Since the arrival of the data era, data has carried information that can strongly support technical development and optimization in many fields, so data management technology is being applied ever more widely. However, raw data acquired from a data source cannot be provided directly to an upper-layer platform or application. It must first be encapsulated and processed by a data service technology so that it meets the requirements of the various application construction scenarios.
At present, the mainstream approach is to build a dedicated data service, by means of SQL, data cubes and the like, for each scenario that arises when a user constructs an application. Data services of this kind are designed for a single application scenario and are not reusable, so separate data services must be formulated for the different application construction scenarios. In addition, there are no fixed rules or modes for building such data services; the technology is chosen by the user according to the actual scenario, different users may choose different technologies, standardization is poor, and reliability cannot be guaranteed.
To solve these problems, the invention provides a general data service construction method and system based on data productization. It defines flexible, reusable, standardized data products and provides standardized data services based on them, which helps improve the efficiency of data services and allows them to adapt to a variety of application scenarios; the reusability of the data products also helps realize the value of the data as fully as possible. The method classifies data into real-time/quasi-real-time data and offline data and combines a stream computing engine with a batch computing engine, so as to provide sub-services for various data products, such as the real-time data service, detail query service, business theme, data set, data cube, index cloud and tag cloud, from which a user can select to construct a data service. The data service requirements of users in different application scenarios can thus be met simply, flexibly and efficiently.
The detailed flow of the method according to an embodiment of the invention is described below with reference to the accompanying drawings. The steps shown in the flow charts can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order of steps is illustrated in the flow charts, in some cases the steps illustrated or described may be performed in an order different from that presented here.
Example one
FIG. 1 is a schematic flow chart of a general data service construction method based on data productization according to an embodiment of the present invention. As can be seen from FIG. 1, the method includes the following steps:
a target product definition step, in which general target data products are defined according to the data requirements of business scenarios, the target data products comprising one or more of a real-time data product, a detail query product, a business theme product, a data set product and a data cube product, and each target data product corresponding to one or more business scenarios;
a data source determination step, in which a plurality of data sources are set to provide raw data for the general data services that form the target data products;
and a data service formulation step, in which the actual business requirements of the target data products and the data sources are considered together to formulate the calling mode and operation strategy of the corresponding general data service.
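As a rough illustration of the first two steps, a target data product can be modelled as a small record that names its product type, the business scenarios it serves and the data sources that feed it. This is only a sketch: the class and field names (`ProductType`, `TargetDataProduct`, `scenarios`, `data_sources`) are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ProductType(Enum):
    REAL_TIME_DATA = "real-time data product"
    DETAIL_QUERY = "detail query product"
    BUSINESS_THEME = "business theme product"
    DATA_SET = "data set product"
    DATA_CUBE = "data cube product"


@dataclass
class TargetDataProduct:
    product_type: ProductType
    scenarios: List[str]                                    # one or more business scenarios
    data_sources: List[str] = field(default_factory=list)   # names of raw data sources

    def __post_init__(self) -> None:
        # Each target data product corresponds to one or more business scenarios.
        if not self.scenarios:
            raise ValueError("a target data product needs at least one business scenario")
```

For example, `TargetDataProduct(ProductType.DATA_CUBE, ["large-volume reports"], ["data_warehouse"])` would describe a data cube product fed by a warehouse source.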
In practical application, the target product definition step considers the common data requirements of different business scenarios and determines effective target data products in combination with the application effect of the data in each business scenario. For example, real-time monitoring data is convenient for users across different business scenarios and can further support data analysis and processing, so a corresponding general real-time data product is defined. It should be noted that, when general target data products are actually defined, not every common requirement yields a data product; the value of a candidate target data product is assessed in combination with the application effect of the data. Specifically, practitioners can analyze and calculate the value contribution of the data using any reasonable means in the field, which the invention does not specifically limit.
Further, for each target data product, the reliability of the raw data is crucial to the effectiveness and processing efficiency of the data service architecture. Therefore, in a preferred embodiment, in the data source determination step, the data sources include a historical data source and an event data source, wherein the raw data of the historical data source is online analytical processing (OLAP) data of a data warehouse, and the raw data of the event data source is event data generated by a source business system.
In practical applications, in one embodiment, in the data source determination step, the raw data of the historical data source further includes online transaction data in a database.
A data warehouse arises from the need to further mine data resources and support decision making once databases already exist in large numbers. It is not simply a large database: between the database and the data warehouse, the data moves from operational processing to analytical processing. A basic data warehouse is divided into a source layer, a history layer and a data model layer. The source layer acts as a data buffer; it generally holds data extracted from the source system with the same structure as in the source system, but its data is not identical to the raw data in the database.
In practical application, basic data cleaning is performed on the data of the source layer in the data warehouse. This cleaning mainly covers two aspects:
1. Data types. When a data warehouse is built, data may come from different types of databases such as Oracle, MySQL and SQL Server. Although these databases generally follow common data types, there are slight differences, and if the data types are not handled properly, the data that enters the data warehouse can differ from the data in the source system. For example, when a double column in MySQL is loaded into Hive, Hive's double type may not represent it well: a value such as 0.0001 stored as a double in Hive may be held in scientific notation. In that case the data type may need to be changed, for example to decimal. If such data types are not handled, the data obtained downstream will be wrong.
2. Obviously erroneous data, for example empty data or data containing special characters. When the source system is a transactional system, erroneous operations are inevitable, so there may be obviously dirty data such as empty records, and some fields may contain line feed characters. If left unprocessed, such data can become misaligned when it enters the data warehouse. Based on this analysis, acquiring the raw data from the source layer of the data warehouse keeps the probability of abnormal data largely under control and saves data cleaning time.
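The two cleaning aspects above can be sketched in Python. This is an illustrative stand-in, not the cleaning logic of any particular warehouse: the function name `clean_row` and the `decimal_fields` parameter are hypothetical, and Python's `Decimal` is used here only as an analogue of switching a column from double to decimal.

```python
from decimal import Decimal
from typing import Optional, Set


def clean_row(row: dict, decimal_fields: Set[str]) -> Optional[dict]:
    """Return a cleaned copy of row, or None when the row is entirely empty."""
    # Aspect 2: drop records that carry no data at all.
    if all(v is None or str(v).strip() == "" for v in row.values()):
        return None
    cleaned = {}
    for key, value in row.items():
        if isinstance(value, str):
            # Aspect 2: line feeds inside a field can misalign rows on load.
            value = value.replace("\r", " ").replace("\n", " ").strip()
        if key in decimal_fields and value not in (None, ""):
            # Aspect 1: keep numeric values as exact decimals instead of doubles.
            value = Decimal(str(value))
        cleaned[key] = value
    return cleaned
```

Calling `clean_row({"name": "a\nb", "rate": 0.0001}, {"rate"})` yields a row whose `name` has no line feed and whose `rate` is an exact decimal.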
Depending on the business scenario and its real-time requirements, a data source may be real-time, quasi-real-time, or without real-time requirements. For real-time/quasi-real-time scenarios, the data needs to be acquired from the source layer of the source business system. For non-real-time scenarios there is no such mandatory requirement: the data can be acquired directly from the source system according to actual conditions (rare in practice), or the data in the source system can be synchronized to a storage layer through ETL and then called (the common approach). In a real-time/quasi-real-time scenario the data source is updated in real time or near real time, whereas with ETL the data is updated at a predetermined frequency (usually T+1).
Based on the above considerations, further, in an embodiment, the data service formulation step includes:
step A1, dividing the raw data, according to its real-time property, into real-time data, quasi-real-time data and non-real-time data;
step A2, analyzing, according to actual business requirements, the real-time property of the data object corresponding to each target data product, and determining the matching type of raw data;
and step A3, specifying the calling mode and the operation strategy of each target data product by combining the raw data type and the data source.
In one embodiment, the real-time data product is oriented to scenarios with real-time analysis requirements, corresponds to a real-time data object, and is matched with real-time data and/or quasi-real-time data;
the detail query product is oriented to scenarios with large-scale detail data query requirements, corresponds to a non-real-time data object, is matched with non-real-time data, and requires data transcoding before data operation;
the business theme product is oriented to scenarios with visual report and relationship exploration analysis requirements, corresponds to a non-real-time data object, is matched with non-real-time data, and requires data association analysis before data operation;
the data set product is oriented to scenarios with temporary data requirements, corresponds to a non-real-time data object, and is matched with non-real-time data;
the data cube product is oriented to scenarios with large-data-volume report and visual exploration analysis requirements, corresponds to a non-real-time data object, is matched with non-real-time data, requires data association analysis before data operation, and pre-calculates aggregation tables from the associated data according to set rules.
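The matching described in step A2 amounts to a lookup table from product to data class and required pre-processing. The sketch below records it as a plain Python mapping; the keys, the step labels (`"transcode"`, `"association_analysis"`, `"precompute_aggregates"`) and the helper `pre_steps_for` are hypothetical names introduced for illustration.

```python
# Product profile table for step A2 (labels are illustrative, not from the source).
PRODUCT_PROFILE = {
    "real-time data": {
        "data_classes": ("real-time", "quasi-real-time"),
        "pre_steps": (),
    },
    "detail query": {
        "data_classes": ("non-real-time",),
        "pre_steps": ("transcode",),
    },
    "business theme": {
        "data_classes": ("non-real-time",),
        "pre_steps": ("association_analysis",),
    },
    "data set": {
        "data_classes": ("non-real-time",),
        "pre_steps": (),
    },
    "data cube": {
        "data_classes": ("non-real-time",),
        "pre_steps": ("association_analysis", "precompute_aggregates"),
    },
}


def pre_steps_for(product: str) -> tuple:
    """Return the pre-processing steps a product needs before data operation."""
    return PRODUCT_PROFILE[product]["pre_steps"]
```

Keeping the table as data rather than code makes it straightforward to add further products, such as the index cloud and tag cloud products of the optional embodiment.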
For real-time/quasi-real-time scenarios, a stream computing engine (e.g., Storm or Spark Streaming) needs to be deployed to perform real-time/quasi-real-time operations on the data in the data source. For the non-real-time data of non-real-time scenarios, such as offline data, a batch offline computing engine (e.g., Spark) needs to be deployed to perform batch operations on the data of the storage layer at regular intervals. Together, the two computing engines meet the different real-time requirements of the business.
Therefore, in one embodiment, step A3 includes:
for real-time and quasi-real-time source data, acquiring the corresponding raw data directly from the source layer of the data warehouse or the database; and/or
acquiring set source data from the source business system as raw data;
and for non-real-time source data, processing the data in the source business system and synchronizing it to a corresponding first storage layer at regular intervals by means of ETL, and then calling it.
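The calling-mode rule above can be written as a small dispatch function. This is a sketch under the assumption that the data class is passed as one of three strings; the function name `calling_modes` and the returned labels are hypothetical.

```python
from typing import List


def calling_modes(data_class: str) -> List[str]:
    """Return the admissible ways of obtaining raw data for a data class."""
    if data_class in ("real-time", "quasi-real-time"):
        # Either path, or both, may be used ("and/or" in the text).
        return [
            "read_source_layer_of_warehouse_or_database",
            "read_source_business_system",
        ]
    if data_class == "non-real-time":
        # Scheduled ETL sync into the first storage layer, then read from there.
        return ["etl_sync_to_first_storage_layer_then_read"]
    raise ValueError(f"unknown data class: {data_class!r}")
```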
Further, in one embodiment, step A3 includes:
for real-time and quasi-real-time source data, computing on the acquired raw data with a stream computing engine, and backing up the computation results uniformly to a second storage layer for management and calling;
and for non-real-time source data, computing on the called raw data with a batch computing engine, and either transmitting the computation results directly to the corresponding target data product or storing them in the second storage layer.
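The operation strategy can be sketched the same way: the data class selects the engine type and the places where results may go. The `ComputePlan` record and the `compute_plan` function are hypothetical names; the engine labels stand for whichever stream or batch engine is deployed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComputePlan:
    engine: str              # "stream" or "batch"
    sinks: Tuple[str, ...]   # admissible destinations for the computation results


def compute_plan(data_class: str) -> ComputePlan:
    """Select the engine type and result destinations for a data class."""
    if data_class in ("real-time", "quasi-real-time"):
        # Stream results are backed up uniformly to the second storage layer.
        return ComputePlan("stream", ("second_storage_layer",))
    if data_class == "non-real-time":
        # Batch results go straight to the product or to the second storage layer.
        return ComputePlan("batch", ("target_data_product", "second_storage_layer"))
    raise ValueError(f"unknown data class: {data_class!r}")
```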
Specifically, the real-time data product is oriented to scenarios with real-time analysis requirements, such as a large monitoring screen. Real-time data must be read directly from the source layer of the source database to provide the real-time data service, and the technical architecture is generally a Storm architecture. FIG. 3 shows an exemplary diagram of the real-time data product data service architecture of the data service construction method in an embodiment of the present invention: a stream computing engine (e.g., the Storm or Spark Streaming real-time/near-real-time computation framework) continuously obtains the latest data for computation and then stores the computation results in the second storage layer, for example in any tool capable of persistently storing data, such as MySQL or HBase, for the user to call.
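The flow just described (consume the latest events, update a running result, persist it for callers) can be imitated in plain Python. This is a toy stand-in and does not use Storm or Spark Streaming: `InMemoryStore` is a hypothetical substitute for the second storage layer, and `run_stream` is a hypothetical name for the processing loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class InMemoryStore:
    """Hypothetical stand-in for a persistent store such as MySQL or HBase."""

    def __init__(self) -> None:
        self.table: Dict[str, float] = {}

    def upsert(self, key: str, value: float) -> None:
        self.table[key] = value


def run_stream(events: Iterable[Tuple[str, float]], store: InMemoryStore) -> None:
    """Consume (key, amount) events one by one and keep running totals in the store."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
        # Persist after every event so callers always see the latest result.
        store.upsert(key, totals[key])
```

A monitoring screen would then read `store.table` (or its database equivalent) rather than the event stream itself.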
The detail query product is oriented to detail query requirements. The applicable scenario is one in which a user, having seen an aggregated result, drills down to query the corresponding detail data. The detail data is computed in batches by a batch computing engine such as Spark on the data of the source system or the storage layer, stored, and then provided for the user to query. In practice there are two cases: 1) key original information, such as several digits of a user's identification number, is converted and encoded in advance (the original sensitive field is re-encoded to generate a new field, for example by replacing several bytes with a set symbol); 2) the conversion is performed during the query, so that the aggregation table in the result shows encoded data (a fixed number of digits is encoded after aggregation). FIG. 4 shows an exemplary diagram of the detail query product data service architecture of the data service construction method according to an embodiment of the present invention.
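Case 1), replacing several characters of a sensitive field with a set symbol, can be sketched as follows. The function name `mask_digits`, its parameters and the sample value in the usage note are hypothetical; the source does not say which positions are masked.

```python
def mask_digits(value: str, start: int, length: int, symbol: str = "*") -> str:
    """Replace `length` characters of `value`, beginning at index `start`, with `symbol`."""
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    if start < 0 or length < 0 or start + length > len(value):
        raise ValueError("masked range falls outside the value")
    return value[:start] + symbol * length + value[start + length:]
```

For a made-up 18-character identifier, `mask_digits("110101199001011234", 6, 8)` keeps the first six and last four characters and masks the eight in between. The masked value would be written to a new field so that the original sensitive field is not exposed to queries.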
The business theme product is oriented to ad hoc report and visual exploration analysis requirements. A common usage is to integrate and logically compute the fields, derived indexes and the like required by a frequently used class of reports, forming a theme for data analysis in a particular scenario, such as a sales analysis theme. The theme is then solidified so that all business users who care about that scenario can quickly find the fields they need for analysis, which lowers the difficulty of data analysis. FIG. 5 shows an exemplary diagram of the business theme product data service architecture of the data service construction method in an embodiment of the present invention. To construct a business theme, the business object (the business subject to be analyzed) and the related business attributes of each requirement scenario need to be sorted out. Spark computes the corresponding data in the data storage layer in batches and then performs logical association, and the associated data is provided directly for user queries without being stored.
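The logical association behind a theme can be pictured as a join of a business object with its attributes plus a derived index. The sketch below builds a tiny sales analysis theme in plain Python rather than Spark; the table shapes, the field names (`order_id`, `product_id`, `unit_price`, `category`) and the function `build_sales_theme` are all hypothetical.

```python
from typing import Dict, List


def build_sales_theme(orders: List[dict], products: List[dict]) -> List[dict]:
    """Join orders to product attributes and add a derived revenue field."""
    by_id: Dict[str, dict] = {p["product_id"]: p for p in products}
    theme = []
    for order in orders:
        product = by_id.get(order["product_id"])
        if product is None:
            continue  # inner join: skip orders whose product is unknown
        theme.append({
            "order_id": order["order_id"],
            "category": product["category"],
            "quantity": order["quantity"],
            # Derived index computed once so business users need not rebuild it.
            "revenue": order["quantity"] * product["unit_price"],
        })
    return theme
```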
FIG. 6 shows an exemplary diagram of the data set product data service architecture of the data service construction method in an embodiment of the present invention. The data set product is a service oriented to temporary data requirements. Data products such as the real-time data product, the detail query product and the business theme product are relatively fixed once the data service has been constructed. For temporary and infrequent access requirements, the user can run fast, flexible queries with SQL through the data set product service provided by the system, which compensates for the limited flexibility of the other products.
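A minimal sketch of such a self-service query entry point is shown below, using Python's built-in `sqlite3` module as a stand-in for whatever storage the data set product actually queries. The function name `run_adhoc_query` and the read-only check are assumptions made for illustration.

```python
import sqlite3
from typing import List, Sequence, Tuple


def run_adhoc_query(conn: sqlite3.Connection, sql: str,
                    params: Sequence = ()) -> List[Tuple]:
    """Run a user-supplied, read-only SQL query and return all rows."""
    # Assumed policy: ad hoc users may only read, so anything but SELECT is refused.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql, params).fetchall()
```

The prefix check is deliberately simple; a production service would more likely rely on database permissions than on inspecting the SQL text.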
The data cube product serves report and visual exploratory analysis requirements over large data volumes, where the business requirements are relatively fixed and the query frequency is not high. FIG. 7 shows an example of the data cube product data service architecture of the data service construction method in an embodiment of the present invention. As shown in FIG. 7, all aggregation tables are computed in advance from the data source according to preset rules (indexes and calculation modes) derived from the business requirements, and the results are stored in a first storage area such as HBase. When the user queries, the computed results are read directly from the storage layer, which improves query speed.
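The idea of computing every aggregation table in advance and answering queries by lookup can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an in-memory dictionary stands in for the storage area (HBase in the example above), the only aggregation is a sum, and the names `precompute_cube`, `query_cube` and the sample sales rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def precompute_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, including the grand total."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            table = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                table[key] += row[measure]
            cube[dims] = dict(table)
    return cube


def query_cube(cube, dims, key):
    """Read a pre-computed aggregate; the raw rows are not scanned at query time."""
    # `dims` must be given in the same order as the `dimensions` list used to build the cube.
    return cube[tuple(dims)].get(tuple(key), 0.0)


# Hypothetical sample data.
sales = [
    {"region": "north", "product": "a", "amount": 10.0},
    {"region": "north", "product": "b", "amount": 5.0},
    {"region": "south", "product": "a", "amount": 7.0},
]
cube = precompute_cube(sales, ["region", "product"], "amount")
north_total = query_cube(cube, ["region"], ["north"])
```

The number of aggregation tables grows as 2^n in the number of dimensions, which is one reason such pre-computation tends to suit relatively fixed requirements like those described above.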
Further, in an optional embodiment, the general target data product further includes an index cloud product and a tag cloud product; on this basis, step A2 further includes:
the index cloud product is oriented to scenarios that require describing different business objects through indexes, corresponds to a non-real-time data object, and is matched with non-real-time data;
the tag cloud product is oriented to scenarios that require describing the basic information of indexes, corresponds to a non-real-time data object, and is matched with non-real-time data.
Further, FIG. 8 shows an example of the index cloud product data service architecture of the data service construction method in an embodiment of the present invention. The index cloud product describes the business objects in a business requirement, such as patients, devices and work orders, and provides a large number of indexes for customers to use according to actual business needs, for example a device defect occurrence rate derived from the frequency of device defects, or a sodium-potassium ratio derived from a patient's sodium and potassium levels. The logic of these indexes is fixed in advance. The index cloud is generated from a data source by a batch computing engine and stored in a storage layer, to be called by the user when needed.
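A catalogue of indexes whose logic is fixed in advance might look like the following Python sketch. The formulas are assumptions chosen for illustration (defects divided by inspections, sodium divided by potassium); the specification names the two indexes but does not define their exact calculation, and the field names are hypothetical.

```python
# Registry of derived indexes with pre-defined logic (formulas are assumed).
INDEX_DEFINITIONS = {
    "device_defect_rate": lambda record: record["defects"] / record["inspections"],
    "sodium_potassium_ratio": lambda record: record["sodium"] / record["potassium"],
}


def compute_indexes(record, names):
    """Evaluate the requested indexes for one business object record."""
    return {name: INDEX_DEFINITIONS[name](record) for name in names}


# Hypothetical business objects: one device and one patient.
device = {"defects": 3, "inspections": 12}
patient = {"sodium": 140.0, "potassium": 4.0}

device_indexes = compute_indexes(device, ["device_defect_rate"])
patient_indexes = compute_indexes(patient, ["sodium_potassium_ratio"])
```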
Specifically, FIG. 9 shows an example of the tag cloud product data service architecture of the data service construction method in an embodiment of the present invention. In the tag cloud product, a tag is lookup information for an index: it describes the basic information and purpose of the index, can be used to retrieve and query indexes, and can also support recommendation and understanding of indexes, helping customers find an applicable index quickly and accurately. Like the index cloud, the tag cloud is generated from a data source by a batch computing engine and stored in a storage layer, to be called by the user when needed.
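Retrieval of indexes by tag can be pictured as an inverted mapping from tags to index names. The Python sketch below is only one possible arrangement; the catalogue contents, the tag vocabulary and the function names are hypothetical.

```python
from collections import defaultdict


def build_tag_index(index_tags):
    """Invert {index name: [tags]} into {tag: {index names}}."""
    tag_index = defaultdict(set)
    for index_name, tags in index_tags.items():
        for tag in tags:
            tag_index[tag.lower()].add(index_name)
    return tag_index


def find_indexes(tag_index, *tags):
    """Return the names of indexes that carry every one of the given tags."""
    matches = [tag_index.get(tag.lower(), set()) for tag in tags]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


# Hypothetical catalogue of indexes and their descriptive tags.
catalogue = {
    "device_defect_rate": ["device", "quality"],
    "sodium_potassium_ratio": ["patient", "lab"],
    "work_order_backlog": ["device", "work order"],
}
tag_index = build_tag_index(catalogue)
device_indexes_by_tag = find_indexes(tag_index, "device")
```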
By combining these data products, data services suited to the user's various requirement scenarios can be constructed, and the construction of data services becomes more standardized and normalized, since each data product has its own application scenario. Data services built from the data products are also reusable: for example, a business theme can serve multiple data analyses in the same scenario, and the index cloud and tag cloud can be reused, with user activity further accumulating indexes and tags. Building on data products makes data services simple and easy to use, and the index cloud and tag cloud make data more convenient to use.
Based on the constructed general data service, a user can call the data service that matches a business requirement scenario. Specifically, the required data service is called through a service layer; according to the characteristics of each type of data product, the corresponding data source and computing engine generate the data product the user needs, from which an application is built to meet the business requirement.
Based on the above technical content, it can be understood that, compared with the existing data service model, the data service construction technology based on data productization provided by the present invention has the following advantages:
1. Each data product suits a corresponding scenario, and an appropriate data product can be selected for each scenario, so the data services constructed from the data products are more standardized and normalized.
2. Flexibility is better: a combination of multiple data products can suit all application scenarios and provide data services for any of them.
3. Data services constructed from data products are reusable, so a new data service does not need to be constructed for each new business scenario.
4. Using data products makes data services simpler and easier to construct; in particular, the index cloud gives users more normalized indexes, and the tag cloud helps users quickly understand and use the indexes.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
It should be noted that in other embodiments of the present invention, the method may also obtain a new data service construction method by combining one or some of the above embodiments.
It should be noted that, based on the method in any one or more embodiments of the present invention, the present invention further provides a storage medium, on which program code for implementing the method in any one or more embodiments of the present invention is stored, and when the program code is executed by an operating system, the program code can implement the method for building a generic data service based on data productization, as described above.
Example two
The method is described in detail in the embodiments disclosed in the present invention, and the method of the present invention can be implemented by using various forms of apparatuses or systems, so based on other aspects of the method described in any one or more embodiments, the present invention further provides a data productization-based generic data service building system, which is configured to execute the data productization-based generic data service building method described in any one or more embodiments. Specific examples are given below for a detailed description. Specifically, the system comprises:
the target product definition module is configured to define a universal target data product according to the requirement of the business scene on data, the target data product comprises one or more of a real-time data product, a detail query product, a business theme product, a data set product and a data cube product, and each target data product corresponds to one or more business scenes;
a data source determination module configured to set a plurality of data sources for providing raw data for a general data service forming the target data product;
and the data service formulation module is configured to comprehensively consider the actual business requirements of the target data product and the data source, so as to formulate the data calling mode and operation strategy corresponding to the general data service.
In one embodiment, the data source determination module is configured to set the data source to include a historical data source and an event data source, wherein raw data of the historical data source adopts online analysis processing data of a data warehouse, and raw data of the event data source adopts event data generated by a source business system.
Further, in one embodiment, the data source determination module is configured to set the raw data of the historical data source to also include online transaction data in a database.
Preferably, in an embodiment, the data service formulation module implements construction of a data service corresponding to a target data object by performing the following operations:
step A1, considering the real-time property of data, dividing the original data into real-time data, quasi-real-time data and non-real-time data;
step A2, analyzing the real-time performance of data objects corresponding to each target data product according to actual business requirements, and determining the type of matched original data;
and step A3, specifying the calling mode and the operation strategy corresponding to each target data product by combining the original data type and the data source.
Specifically, in one embodiment, the real-time data product is oriented to a scene of real-time analysis requirements, corresponds to a real-time data object, and is matched with real-time class data and/or quasi-real-time class data;
the detail query product is oriented to a scene of large-scale detail data query requirements, corresponds to a non-real-time data object, is matched with non-real-time data, and needs to perform data transcoding before data operation;
the business theme product is oriented to a visual report and a scene of a relation exploration analysis demand, corresponds to a non-real-time data object, is matched with non-real-time data, and needs to perform data correlation analysis before data operation;
the data set product is oriented to a scene with temporary data demand, corresponds to a non-real-time data object and is matched with non-real-time data;
the data cube product is oriented to a report with large data volume and a scene with visual exploration analysis requirements, corresponds to a non-real-time data object, is matched with non-real-time data, needs to perform data association analysis before data operation, and pre-calculates an aggregation table according to a set rule on the data after association analysis.
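The matching between target data products, data classes and preparatory steps listed above can be restated as a small lookup table. The Python sketch below only restates that list; the identifiers (`Timeliness`, `ProductProfile`, the product and step names) are hypothetical labels, not terms defined by the system.

```python
from dataclasses import dataclass
from enum import Enum


class Timeliness(Enum):
    REAL_TIME = "real-time"
    QUASI_REAL_TIME = "quasi-real-time"
    NON_REAL_TIME = "non-real-time"


@dataclass(frozen=True)
class ProductProfile:
    data_types: frozenset  # classes of raw data the product is matched with
    steps: tuple = ()      # extra processing named for the product in the list above


NON_RT = frozenset({Timeliness.NON_REAL_TIME})

PRODUCT_PROFILES = {
    "real_time_data": ProductProfile(
        frozenset({Timeliness.REAL_TIME, Timeliness.QUASI_REAL_TIME})
    ),
    "detail_query": ProductProfile(NON_RT, ("transcoding",)),
    "business_theme": ProductProfile(NON_RT, ("association_analysis",)),
    "data_set": ProductProfile(NON_RT),
    "data_cube": ProductProfile(
        NON_RT, ("association_analysis", "precompute_aggregation_tables")
    ),
}


def products_for(timeliness):
    """List the products that are matched with a given class of raw data."""
    return sorted(
        name
        for name, profile in PRODUCT_PROFILES.items()
        if timeliness in profile.data_types
    )
```

For example, `products_for(Timeliness.QUASI_REAL_TIME)` returns only the real-time data product, since the other four products are matched with non-real-time data.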
In one embodiment, the data service formulation module is configured to, for real-time class source data and quasi-real-time class source data, directly obtain the corresponding raw data from the source-aligned layer of the data warehouse or the database; and/or
acquire set source data from the source business system;
and, for non-real-time source data, process and synchronize data in the source business system to a corresponding first storage layer at scheduled times through ETL, and then call the data.
Further, in one embodiment, the data service formulation module is configured to, for the real-time source data and the quasi-real-time source data, process the acquired raw data with a stream computing engine, and uniformly back up the results to the second storage layer for management and calling;
and, for non-real-time source data, process the called raw data with a batch computing engine, and either transmit the results directly to the corresponding target data product or store them in the second storage layer.
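The retrieval and computation rules of the two preceding embodiments amount to a routing decision keyed on the class of source data. The following Python sketch restates them; the dictionary keys and string labels are hypothetical, and no particular stream or batch engine is implied.

```python
def plan_for(timeliness: str) -> dict:
    """Return the retrieval mode, engine type and result targets for a data class."""
    if timeliness in ("real-time", "quasi-real-time"):
        return {
            # Read directly from the warehouse/database source layer
            # and/or from the source business system.
            "retrieval": "direct_read",
            "engine": "stream",
            "result_targets": ("second_storage_layer",),
        }
    if timeliness == "non-real-time":
        return {
            # Scheduled ETL synchronises source data into the first storage layer.
            "retrieval": "scheduled_etl_to_first_storage_layer",
            "engine": "batch",
            # Results go straight to the data product or to the second storage layer.
            "result_targets": ("target_data_product", "second_storage_layer"),
        }
    raise ValueError(f"unknown data class: {timeliness!r}")


real_time_plan = plan_for("real-time")
batch_plan = plan_for("non-real-time")
```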
In an optional embodiment, the general target data product further comprises an index cloud product and a tag cloud product; step A2 further includes:
the index cloud product is oriented to scenarios that require describing different business objects through indexes, corresponds to a non-real-time data object, and is matched with non-real-time data;
the tag cloud product is oriented to scenarios that require describing the basic information of indexes, corresponds to a non-real-time data object, and is matched with non-real-time data.
In the general data service construction system based on data productization provided by the embodiment of the invention, each module or unit structure can be independently operated or operated in a combined mode according to actual data analysis and relation requirements, so that corresponding technical effects are realized.
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrase "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A general data service construction method based on data productization is characterized by comprising the following steps:
a target product definition step, wherein a universal target data product is defined according to the requirement of a service scene on data, the target data product comprises one or more of a real-time data product, a detail query product, a service theme product, a data set product and a data cube product, and each target data product corresponds to one or more service scenes;
a data source determination step, wherein a plurality of data sources are set for providing original data to the general data service forming the target data product;
and a data service formulation step, wherein the actual business requirements of the target data product and the data source are comprehensively considered to formulate a data calling mode and an operation strategy corresponding to the general data service.
2. The method according to claim 1, wherein, in the data source determining step,
the data source comprises a historical data source and an event data source, wherein the original data of the historical data source adopts the online analysis processing data of a data warehouse, and the original data of the event data source adopts the event data generated by a source service system.
3. The method of claim 2, wherein in the data source determining step, the raw data of the historical data source further comprises online transaction data in a database.
4. The method according to claim 1, wherein the data service formulation step comprises:
step A1, considering the real-time property of data, dividing the original data into real-time data, quasi-real-time data and non-real-time data;
step A2, analyzing the real-time performance of data objects corresponding to each target data product according to actual business requirements, and determining the type of matched original data;
and step A3, specifying the calling mode and the operation strategy corresponding to each target data product by combining the original data type and the data source.
5. The method of claim 1, wherein the real-time data product is oriented to a scenario of real-time analysis requirements, which corresponds to real-time data objects, matched to real-time class data and/or quasi-real-time class data;
the detail query product is oriented to a scene of large-scale detail data query requirements, corresponds to a non-real-time data object, is matched with non-real-time data, and needs to perform data transcoding before data operation;
the business theme product is oriented to a visual report and a scene of a relation exploration analysis demand, corresponds to a non-real-time data object, is matched with non-real-time data, and needs to perform data correlation analysis before data operation;
the data set product is oriented to a scene with temporary data demand, corresponds to a non-real-time data object and is matched with non-real-time data;
the data cube product is oriented to a report with large data volume and a scene with visual exploration analysis requirements, corresponds to a non-real-time data object, is matched with non-real-time data, needs to perform data association analysis before data operation, and pre-calculates an aggregation table according to a set rule on the data after association analysis.
6. The method according to claim 4, wherein in the step A3, the method comprises:
for real-time class source data and quasi-real-time class source data, directly acquiring corresponding original data from a source-aligned layer of the data warehouse or the database; and/or
acquiring set source data from the source service system;
and for non-real-time source data, processing and synchronizing data in the source service system to a corresponding first storage layer at scheduled times through ETL, and then calling the data.
7. The method according to claim 4, wherein in the step A3, the method comprises:
for real-time source data and quasi-real-time source data, a stream computing engine is adopted to process the acquired original data, and the results are uniformly backed up to a second storage layer for management and calling;
and for non-real-time source data, calculating the called original data by adopting a batch calculation engine, and directly transmitting the calculation result to a corresponding target data product or storing the calculation result to a second storage layer.
8. The method of claim 1, wherein the general target data product further comprises an index cloud product and a tag cloud product; step A2 further includes:
the index cloud product is oriented to scenarios that require describing different business objects through indexes, corresponds to a non-real-time data object, and is matched with non-real-time data;
the tag cloud product is oriented to scenarios that require describing the basic information of indexes, corresponds to a non-real-time data object, and is matched with non-real-time data.
9. A storage medium having program code stored thereon for implementing the method of any one of claims 1-8.
10. A generic data service construction system based on data productization, characterized in that the system performs the method according to any one of claims 1 to 8.
CN202110118636.6A | Priority date: 2021-01-28 | Filing date: 2021-01-28 | General data service construction method and system based on data productization | Pending | CN112817938A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110118636.6A, CN112817938A (en) | 2021-01-28 | 2021-01-28 | General data service construction method and system based on data productization

Publications (1)

Publication Number | Publication Date
CN112817938A (en) | 2021-05-18

Family

ID=75859920

Country Status (1)

Country | Link
CN (1) | CN112817938A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115391448A (en) * | 2022-10-08 | 2022-11-25 | 数兑科技(杭州)有限公司 | Intelligent counting method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20090037430A1 (en) * | 2007-08-03 | 2009-02-05 | Sybase, Inc. | Unwired enterprise platform
CN109840253A (en) * | 2019-01-10 | 2019-06-04 | 北京工业大学 | Enterprise-level big data platform framework
CN111008197A (en) * | 2019-11-20 | 2020-04-14 | 王锦志 | Data center design method for power marketing service system
CN112199430A (en) * | 2020-10-15 | 2021-01-08 | 苏州龙盈软件开发有限公司 | A business data processing system and method based on a data center

Legal Events

Date | Code | Title | Description
- | PB01 | Publication | -
- | SE01 | Entry into force of request for substantive examination | -
- | RJ01 | Rejection of invention patent application after publication | Application publication date: 2021-05-18
