CN114756469B

Info

Publication number: CN114756469B
Application number: CN202210434561.7A
Authority: CN
Inventors: 徐登峰
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2024-07-16
Anticipated expiration: 2042-04-24
Also published as: CN114756469A

Abstract

The embodiments of the application disclose a data relationship analysis method, a data relationship analysis device, and an electronic device. The method comprises the following steps: initiating a detection flow for a target link to be detected; in the detection flow, intercepting a call request that carries a detection identifier and is received by an application; in the process of responding to the call request, performing cache invalidation processing on the related read-cache requests, so that the data query is performed in a database associated with the application; intercepting the database query request and the returned result to obtain a database query record, and then storing the correspondence between the database query record and the link identifier of the target link; and determining the database and table information related to the target link, and the fields having a data lineage (blood-edge) relationship, by analyzing the plurality of database query records corresponding to the same link identifier. In this way, data lineage information can be obtained more efficiently and more completely.

Description

Data relationship analysis method and device and electronic equipment

Technical Field

The present application relates to the field of data analysis technologies, and in particular, to a data relationship analysis method, a data relationship analysis device, and an electronic device.

Background

In stress testing and other testing scenarios, data is required as support, but in actual testing the data is often missing or insufficiently rich. Because the real environment runs online and generates a large amount of data, one solution is to synchronize the relevant online data to a "shadow library" for stress testing, or to an offline database to enrich the offline test data. In this process, based on data security requirements, the data must be desensitized before it is synchronized from online to any other location.

The same link may involve multiple applications (e.g., the detail application, marketing-related applications, and member management applications in a commodity order link), and each application may involve multiple data tables in multiple databases. Different data tables in different databases may contain fields that have a data lineage relationship (also rendered "data blood-edge relationship": the relationship between related data found in the process of tracing data, or a relationship, similar to kinship in human society, formed between data over their life cycle from generation through processing and transfer to extinction). The data in such fields must be desensitized with the same desensitization rule when synchronized from online to offline; otherwise, the data may not be usable after being synchronized offline. However, because specific databases and data tables may belong to different domains and be maintained by different groups or teams (generally referred to as BUs), analysis is required to determine which fields in which data tables of which databases have a data lineage relationship.

One way is for experts in the multiple domains to manually output, based on experience, the data lineage relationships between the fields of the database tables. For example, some data tables in some databases may have been designed by a particular expert who knows the specific table logic and the lineage relationships between fields, so that expert can output the data lineage relationships. However, because multiple applications may be involved on the same link and each application includes multiple database tables, experts from multiple different BUs may need to coordinate to obtain complete data lineage information for the link, and the overall process is very labor-intensive for most links.

Alternatively, data lineage relationships can be analyzed from the logs maintained by the link tracking system (which record which applications are involved on a particular link, which services are invoked, which database tables and caches are used, etc.). However, in most cases data on a specific link is read through a cache rather than directly from the database, and the data in the cache is the result of processing, such as aggregation, applied to the results returned from multiple database tables. The calls made to specific database tables therefore cannot be known, and complete data lineage information on the specific link cannot be obtained.

Therefore, how to obtain data lineage information more efficiently and more completely has become a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application provides a data relationship analysis method, a data relationship analysis device, and an electronic device, which can obtain data lineage information more efficiently and more completely.

The application provides the following scheme:

a data relationship analysis method, comprising:

For a target link to be detected, initiating a detection flow of the target link by starting and executing the target link and carrying a detection identifier in a starting request; wherein the target link includes a plurality of applications thereon;

In the detection flow, intercepting the call request with the detection identifier received by the application;

In the process of responding to the call request, performing cache invalidation processing on the related read cache request so as to enable data query to be performed in a database associated with the application;

intercepting and processing the database query request and the returned result to acquire a database query record and then storing the corresponding relation between the database query record and the link identifier of the target link;

And determining, by analyzing a plurality of database query records corresponding to the same link identifier, the database and table information related to the target link, the data values having a data lineage relationship on the target link, and the fields where those data values are located.

A data relationship analysis apparatus comprising:

The detection flow starting unit is used for starting and executing the target link aiming at the target link to be detected, carrying a detection identifier in a starting request and initiating a detection flow of the target link; wherein the target link includes a plurality of applications thereon;

The request interception processing unit is used for intercepting the call request with the detection identifier received by the application in the detection flow;

the cache invalidation processing unit is used for performing cache invalidation processing on the related read cache request in the process of responding to the call request so as to perform data query in the database related to the application;

The data interception processing unit is used for intercepting and processing the database query request and the returned result so as to acquire the database query record and then store the corresponding relation between the database query record and the link identifier of the target link;

And the analysis unit is used for determining, by analyzing a plurality of database query records corresponding to the same link identifier, the database and table information related to the target link, the data values having a data lineage relationship on the target link, and the fields where those data values are located.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims.

An electronic device, comprising:

one or more processors; and

A memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding claims.

According to the specific embodiment provided by the application, the application discloses the following technical effects:

According to the embodiments of the application, link detection can be carried out while a specific link is being executed. During detection, the call request received by a specific application on the link can be intercepted, and in the process of responding to the call request, cache invalidation processing is performed on read-cache requests so that the data query is performed in the database associated with the application. By intercepting the database query request and the returned result, database query record information can then be obtained, which may include the query condition information, the data values returned by the database, the fields where the data values are located, and the database and table identification information. Further, the database query records corresponding to the same link identifier can be analyzed to determine the database and table information related to the target link, the data values having a data lineage relationship on the target link, and the fields where those data values are located. In this way, which databases and tables are associated with a link, and which fields in those databases and tables have data lineage relationships between them, can be determined in an automated manner; because expert experience is not required, accuracy and efficiency are improved. In addition, because data collection is performed while the link is actually executing, more complete information about the databases on the specific link can be obtained.

In order to avoid intruding on the code of the specific applications, a plug-in program can be provided. By deploying the plug-in program in the pre-release (prefire) environment of each application, the interception of call requests, cache breakdown, data interception, and other processes can be executed by the plug-in program. In addition, the plug-in program may be deployed on only some of the pre-release machines corresponding to the same application, with directional routing to those machines implemented by the plug-in program, so that excessive influence on normal traffic in the pre-release environment is avoided.

Of course, it is not necessary for any one product to practice the application to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a flow chart of a method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a probe initiation interface provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of an apparatus provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the application, fall within the scope of protection of the application.

To facilitate understanding of the technical solution provided by the embodiments of the present application, it should first be explained that data is synchronized from online to offline mainly for stress testing or other tests, and that the testing is performed in units of links, for example, stress testing of the order-placement link. However, as described in the Background section, the same link may involve a very large number of applications, each of which may in turn involve multiple data tables in multiple databases, multiple fields in each data table, and so on. To enable a link to be tested, on the one hand, the data in all databases and data tables that the link involves must be synchronized offline. On the other hand, part of the data (for example, data related to user privacy) must be desensitized while being synchronized from online to offline, and data values in different fields of different data tables may have a data lineage relationship; when desensitization is performed, the same desensitization rule must be applied to data values that have such a relationship.

For example, when database queries are performed under the same query condition during the execution of the same link, data value x in field m of data table A and data value y in field n of data table B may both be hit, in which case data value x and data value y are data values having a data lineage relationship. Such data values must be desensitized with the same desensitization rule when they are synchronized from online to offline for use by the test system. For example, suppose that data value x and data value y in the above example are both data related to a user's mobile phone number. If the desensitization rule applied to data value x replaces the last four digits with "×", the last four digits of data value y must also be replaced with "×"; if data value y instead used a rule that replaces the last eight digits with "×", an error might occur when such data is used in the test system.
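The phone-number example can be illustrated with a minimal Python sketch. The function name `mask_last`, the sample number, and the use of `*` as the mask character are assumptions made for illustration only; the application does not prescribe a particular masking implementation.

```python
def mask_last(value: str, n: int, mask_char: str = "*") -> str:
    """Replace the last n characters of value with mask_char (hypothetical rule)."""
    if n <= 0:
        return value
    n = min(n, len(value))
    return value[: len(value) - n] + mask_char * n


# Hypothetical phone number that appears in two places on the same link.
x = "13800001234"  # data table A, field m
y = "13800001234"  # data table B, field n

# Same rule on both fields: the desensitized values still match each other.
same_rule = (mask_last(x, 4), mask_last(y, 4))
values_still_match = same_rule[0] == same_rule[1]

# Different rules: the two values no longer agree after desensitization.
mixed_rule = (mask_last(x, 4), mask_last(y, 8))
values_diverge = mixed_rule[0] != mixed_rule[1]
```

With the same rule, a join or lookup between the two tables still works on the desensitized data; with mixed rules it does not, which is the kind of error the paragraph above describes.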

That is, in the embodiments of the present application, it is necessary to know, on the one hand, which data tables in which databases the same link involves and, on the other hand, which fields of those databases and data tables hold data values that have a data lineage relationship. Both are difficult to achieve. In some systems (such as a commodity information service system), one scenario often involves many front-end and middle-platform applications (tens or hundreds are common). If the relationships are sorted out manually, experts in multiple domains must output, based on experience, the database tables involved on the same link and the lineage relationships between data values, which is a huge workload. Sorting them out from the logs of the link tracking system is also problematic. On the one hand, the applications recorded for the same link in the link tracking system may be incomplete (for example, there may be hidden applications that the link tracking system cannot track), so the database tables identified are not comprehensive enough. On the other hand, to improve efficiency, an application may perform a large number of cache reads while actually running: after some data is obtained by accessing the database for the first time, it may be stored in the cache, and when the data is needed again it is read directly from the cache without accessing the database. Since the data stored in the cache is usually the result of aggregating data returned by multiple databases, a cache read does not reveal which data tables of which databases the data came from. The link tracking system therefore cannot record this information, and it cannot be determined which data values have a data lineage relationship.

In view of the above, the embodiments of the present application provide a corresponding solution for obtaining, with higher efficiency, more complete database table information on links and more complete data lineage information between data values. Specifically, the embodiments of the application intercept the reads made to the databases involved while a specific link is executed in the online data logic environment. Because an application invoked on a specific link issues a large number of cache read requests, "cache breakdown" processing may also be performed, that is, cache reads are disabled so that the relevant data is obtained by reading the database. In this way, more complete information about database calls during link execution can be obtained, and by analyzing this information it can be determined which database tables the target link involves and which data values have data lineage relationships.

To achieve the above purpose, in one mode, the code of the specific applications may be modified, that is, processing logic such as recognition of the detection identifier, interception of call requests, cache breakdown, and interception of database return results is added to the application code. In another mode, to avoid intruding on the code of the specific applications, a dedicated plug-in program may be provided and deployed in the applications involved in a specific link, so that the same processing logic is implemented by the plug-in program.

From the system architecture perspective, referring to FIG. 1, a data relationship analysis system may be provided in an embodiment of the present application. In an optional manner, a plug-in program may also be provided and deployed into the applications involved in a specific link. For example, in the example shown in FIG. 1, assume that a target link involves applications A, B, C, and D, where application A is the entry application of the link; during execution of the link, application A calls application B, application B calls application C, and application C calls application D. In the embodiment of the application, the plug-in program can be deployed in all four applications. After the detection process of the target link is started, the plug-in program intercepts the call request received by each application, and if a cache read is involved in responding to the call request, the plug-in program also performs cache breakdown so that the data request reaches the database directly. The plug-in program then obtains database query record information from the result returned by the database, and stores the correspondence between the database query record information and the link identifier of the current target link in the data relationship analysis system. After the detection process of the current target link is finished, the data relationship analysis system analyzes the information provided by the plug-in program to determine which databases and data tables the target link involves, which fields of which data tables in which databases hold data values that have a data lineage relationship, and so on. The link identifier can be obtained by the plug-in program by querying the link tracking system.
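As a rough illustration of the analysis side, the following Python sketch groups stored query records by link identifier and reports the tables involved together with candidate lineage fields. The record layout (`QueryRecord`), all database, table, and field names, and the matching heuristic (the same returned value appearing in more than one field) are assumptions for illustration; this passage does not specify how the analysis system matches values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class QueryRecord:
    """One intercepted database query result (hypothetical layout)."""
    link_id: str
    database: str
    table: str
    field: str
    value: str


def analyze(records: Sequence[QueryRecord], link_id: str):
    """Return (tables on the link, value -> fields that share that value)."""
    tables = set()
    locations: Dict[str, set] = defaultdict(set)
    for rec in records:
        if rec.link_id != link_id:
            continue  # only records stored under the same link identifier
        tables.add((rec.database, rec.table))
        locations[rec.value].add((rec.database, rec.table, rec.field))
    # Assumed heuristic: a value seen in two or more fields is a lineage candidate.
    lineage: Dict[str, List[Tuple[str, str, str]]] = {
        value: sorted(locs) for value, locs in locations.items() if len(locs) > 1
    }
    return sorted(tables), lineage


records = [
    QueryRecord("link-1", "db_order", "t_order", "buyer_phone", "13800001234"),
    QueryRecord("link-1", "db_member", "t_member", "mobile", "13800001234"),
    QueryRecord("link-1", "db_order", "t_order", "status", "PAID"),
    QueryRecord("link-2", "db_other", "t_other", "phone", "13800001234"),
]
tables, lineage = analyze(records, "link-1")
```

Here `tables` lists the two database tables touched by `link-1`, and `lineage` pairs `t_order.buyer_phone` with `t_member.mobile`, which would then need the same desensitization rule.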

The following describes in detail the specific implementation scheme provided by the embodiment of the present application.

Firstly, the embodiment of the application provides a method for acquiring data relationship information, referring to fig. 2, the method specifically may include:

S201: for a target link to be detected, initiating a detection flow of the target link by starting and executing the target link and carrying a detection identifier in a starting request; wherein the target link includes a plurality of applications thereon.

In the embodiment of the application, the databases and data tables involved in a specific link, and the data lineage relationships among the data values in the fields of those databases and tables, can be determined by probing the link while it is being executed.

Specifically, to improve the accuracy of the detection result, the detection flow of the target link may be initiated in an online data logic environment of the related applications, where the related applications are the applications, typically multiple, involved on the specific link. The online data logic environment may take various forms. For example, it may be the environment in which an application actually runs online; in this state the application uses the processing logic of the online database to provide services to users, and the link can be probed. Alternatively, in view of data security, in a preferred embodiment the link may be probed in the pre-release (prefire) environment of the related applications. After an application is developed, several running environments are usually provided for it, including, for example, a test environment, the pre-release environment, and the real environment after formal release. In the test environment, the application is tested offline, mainly using an offline database; the real environment after formal online release serves actual user access; and the pre-release environment is mainly used to verify the application. The pre-release environment usually uses the online data logic directly, that is, the same data logic as the formally released application, but a request initiated in the pre-release environment may use virtual parameters. For example, when the order-placement link is initiated in the pre-release environment, the user account may be a virtual account rather than the account of an actual consumer. Link detection in the embodiments of the application is therefore better suited to the pre-release environment. It should be noted that the pre-release environment of an application exists all the time, that is, even after the application has been formally released online.

In the pre-release (prefire) environment, an application is typically deployed on several machines. As described above, when call request interception, cache breakdown, and the other processes in the embodiments of the present application are implemented by a plug-in program, the plug-in program may be deployed on the machines where these applications run. To avoid excessive impact on other normal traffic in the pre-release environment, the plug-in program may be installed on only some of the machines; for example, if an application is deployed on 5 machines, the plug-in program may be deployed for that application on only one of them. The plug-in program can also implement a routing function: when an application needs to call a downstream application, the plug-in program routes the call request to a machine of the downstream application on which the plug-in program is deployed. This is described in detail later.
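The routing behavior can be sketched as follows. The `Machine` structure, the host addresses, and the choice to return `None` for ordinary traffic (meaning "leave routing to the normal mechanism") are assumptions for illustration, not details taken from the application.

```python
from typing import Optional, Sequence, TypedDict


class Machine(TypedDict):
    """A machine running the downstream application (hypothetical structure)."""
    host: str
    has_plugin: bool


def route_probe_call(machines: Sequence[Machine], probing: bool) -> Optional[str]:
    """Pick the host that should receive a call to the downstream application.

    Probing calls go to a machine with the plug-in program deployed.
    For ordinary calls this sketch returns None, i.e. it does not interfere
    with whatever routing the environment normally applies.
    """
    if not probing:
        return None
    for machine in machines:
        if machine["has_plugin"]:
            return machine["host"]
    raise LookupError("no machine of the downstream application has the plug-in deployed")


# Five machines, plug-in program deployed on only one of them.
pool = [
    {"host": "10.0.0.1", "has_plugin": False},
    {"host": "10.0.0.2", "has_plugin": False},
    {"host": "10.0.0.3", "has_plugin": True},
    {"host": "10.0.0.4", "has_plugin": False},
    {"host": "10.0.0.5", "has_plugin": False},
]
target = route_probe_call(pool, probing=True)
```

Only probing traffic is pinned to the instrumented machine, so the other four machines keep serving normal pre-release traffic unchanged.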

Link probing may be initiated in various ways. For example, in one mode, a data relationship analysis system is deployed, and the data acquired during link probing is aggregated to this system so that it can perform data lineage analysis on the database query records corresponding to the same link identifier. In this case, configuration information about the entry application/service and the parameters of the target link may be received through a start-up interface provided by the data relationship analysis system, which then starts execution of the target link and carries a detection identifier in the start request. For a particular target link, its entry application, that is, the starting point of the link, can be obtained in some way; in the example shown in FIG. 1, application A is the starting point of the link. In a specific implementation, one application typically has multiple services and one service may have multiple methods, so the starting point of a link may be specified down to a service or method in a particular application. Therefore, as shown in FIG. 3, the start-up interface provided by the data relationship analysis system may offer controls for entering the link's entry application, service, method, and so on (if the service address already contains the application information, a separate control for the application is not needed), as well as controls for entering specific parameters. After the entry application/service and method of a link are specified, execution of the link in the application pre-release (prefire) environment can be started by setting specific parameters. The start request carries a detection identifier so that downstream applications know that the current request is for link detection and handle it differently from an ordinary link.

That is, assuming that the order-placement link currently needs to be probed, the entry service, method, and other information of that link can be specified in the start-up interface provided by the data relationship analysis system and specific parameters can be set, after which probing of the link can be initiated. The data relationship analysis system automatically adds the detection identifier to the start request.
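A start request of this kind might be assembled as in the sketch below. The key name `probe_id`, the request layout, and the service and method names are hypothetical; the application only requires that the start request carry a detection identifier alongside the entry service, method, and parameters.

```python
from typing import Any, Dict, Mapping

PROBE_KEY = "probe_id"  # hypothetical name for the detection identifier


def build_start_request(
    service: str, method: str, params: Mapping[str, Any], probe_id: str
) -> Dict[str, Any]:
    """Assemble a start request for the link entry and attach the detection identifier."""
    return {
        "service": service,
        "method": method,
        "params": dict(params),
        # Carried alongside the business parameters so downstream applications
        # can tell a probing request from ordinary traffic.
        "attachments": {PROBE_KEY: probe_id},
    }


# Hypothetical entry point of an order-placement link, using a virtual account.
request = build_start_request(
    service="com.example.OrderService",
    method="createOrder",
    params={"buyer": "virtual-account-001", "item": "demo-item"},
    probe_id="probe-0001",
)
```

Keeping the identifier in a separate `attachments` map, rather than among the business parameters, is one way to leave the entry method's own signature untouched.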

In another mode, execution of the target link may be started while a target page provided by the entry application of the target link is displayed in a web browser, where a browser plug-in program installed in the web browser carries the detection identifier in the start request. That is, besides being initiated through the data relationship analysis system, probing of the target link may also be initiated directly in the relevant interface of the link's entry application. For example, if the entry application of a link is application A, probing of the link may be initiated on the machine where application A runs in the pre-release (prefire) environment. Specifically, assuming the link is the order-placement link, application A may be accessed through a browser and an order-placement operation initiated in the relevant page of application A. Because the detection flow must be distinguished from the normal flow, a detection identifier must be added to the start request when the link is started. To this end, a browser plug-in may be provided; once it is installed in the browser, it adds the detection identifier to the start request whenever an application is accessed and a link is started through the browser.

After a target link is started, each related application executes its calls according to the calling relationships preset in the link until the link finishes (no manual intervention is needed during this period).

To collect the relevant data as described above, specific code can be added to the related applications or, to avoid code intrusion, a plug-in program can be deployed in them. Either way, it is helpful to know in advance which applications a specific link involves. Because a link tracking system is usually deployed where multiple applications work together, this can be learned from the logs of the link tracking system, after which the applications are modified or the plug-in program is installed. Some hidden applications may not appear in the logs of the link tracking system, but they are discovered when the link is executed with the online data logic. Therefore, if after probing of a link is initiated it is found that an application has not yet implemented the probing logic in its code or has not yet had the plug-in program deployed, the current detection flow may be suspended and the user prompted to re-initiate probing of the link after the application has been modified or the plug-in program deployed.
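The suspend-and-prompt behavior can be sketched as a simple readiness check. The exception type `ProbeSuspended`, the function name, and the application names are hypothetical, and how the reached applications are discovered at run time is outside this sketch.

```python
from typing import Iterable, List, Set


class ProbeSuspended(Exception):
    """Raised when probing reaches an application that is not yet instrumented."""

    def __init__(self, missing: List[str]) -> None:
        super().__init__(
            "probing suspended; not yet instrumented: " + ", ".join(missing)
        )
        self.missing = missing


def check_link_ready(apps_reached: Iterable[str], instrumented: Set[str]) -> None:
    """Suspend probing if any application reached on the link lacks the probing logic."""
    missing = [app for app in apps_reached if app not in instrumented]
    if missing:
        raise ProbeSuspended(missing)


# Applications A to C were known from the link tracking logs and instrumented;
# application D is a hidden application discovered only while the link runs.
try:
    check_link_ready(["A", "B", "C", "D"], instrumented={"A", "B", "C"})
    prompt = ""
except ProbeSuspended as exc:
    prompt = str(exc)  # shown to the user, who re-initiates probing later
```

After application D is modified or has the plug-in program deployed, the same check passes and probing of the link can be initiated again.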

It should be noted that, ideally, only one probing process needs to be started for a target link. However, if some applications on the link have not completed code modification or plug-in deployment, probing may need to be initiated several times for the same link before it is complete. In addition, the databases and data tables involved may differ when the same link is started with different parameters; in that case, a more complete detection result for the link can be obtained by probing the link multiple times with different parameter types.

S202: in the detection flow, intercepting the call request with the detection identifier received by the application.

After a specific target link is started and a probe identifier is carried in a starting request, a probing process for the link is started. Then, in the process of calling each application on the link according to the set logic, a specific data collection process can be executed through code added in the application or a plug-in program deployed (for convenience of description, the description is presented below by taking a plug-in program implementation as an example).

Wherein each application on the link may perform the processing as in steps S202 to S204 in order to perform specific data collection. That is, after each application is called and a specific call request is received, a specific plug-in program can intercept the request and judge whether the request has a detection identifier, if no detection identifier, the request can be directly released and processed according to a common flow, and if the request has the detection identifier, the plug-in program takes over subsequent response work to the call request. Here, in a specific implementation, a switching function may be configured for a specific plug-in an application, and the plug-in may be controlled to be turned on or off by the switching function.
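The interception decision described above can be sketched as follows. The class `ProbePlugin`, the key name `probe_id`, and the two handler callables are hypothetical stand-ins; in practice the interception point depends on the service framework in use.

```python
from typing import Any, Callable, Dict

PROBE_KEY = "probe_id"  # hypothetical name for the detection identifier

Handler = Callable[[Dict[str, Any]], Any]


class ProbePlugin:
    """Decides whether a call request follows the normal flow or the probing flow."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # the on/off switch for the plug-in program

    def handle(self, request: Dict[str, Any], normal_flow: Handler, probe_flow: Handler) -> Any:
        attachments = request.get("attachments", {})
        if not self.enabled or PROBE_KEY not in attachments:
            # No detection identifier (or plug-in switched off): release the
            # request and let the application process it as usual.
            return normal_flow(request)
        # Detection identifier present: the plug-in takes over the response.
        return probe_flow(request)


plugin = ProbePlugin()
result = plugin.handle(
    {"attachments": {PROBE_KEY: "probe-0001"}},
    normal_flow=lambda req: "normal",
    probe_flow=lambda req: "probe",
)
```

With the switch turned off, every request takes the normal flow, so the plug-in program can stay deployed without affecting the application.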

It should be noted here that the call request received by a specific application typically comes from the upstream application in the link, and different applications may communicate through a pre-configured protocol to initiate specific call requests. For example, the protocol may be an HSF (High-speed Service Framework) protocol, where each application may register the services it provides in the HSF system, and other applications may call and consume the registered services through the protocol. In the embodiment of the application, service calls among the applications on one link are performed based on this protocol. Thus, in this case, the specific intercepted call request may be such an HSF request, and so on.

S203: and in the process of responding to the call request, performing cache invalidation processing on the related read cache request so as to enable data query to be performed in the database associated with the application.

After a call request is specifically intercepted and found to have a probe identification, the call request can be responded to by a specific plug-in. Among them, in a specific call request, the processing that needs to be executed by the current application may be divided into the following aspects: a database reading operation, a cache reading operation, a further call request related operation initiated to a downstream application, and the like. If the request of directly reading the database is involved, the request can be released, the returned result of the specific database can be intercepted, and of course, information such as query conditions in the specific request can also be recorded.

If a request to read the cache is involved, then in embodiments of the present application, cache breakdown processing may be performed, that is, a message indicating a cache miss is returned for the request in some manner, so that the request is diverted to a specific database. In practical applications, when an application accesses the cache and finds that the cached data is invalid (for example, the desired data is absent or has timed out), it automatically initiates a read request to the associated database. The embodiment of the application makes use of this characteristic: regardless of whether the cache is actually invalid, the plug-in program returns information indicating that the cache is invalid, so that the specific data is obtained directly by accessing the database, and complete information about which databases, data tables, fields and the like the current target link involves can be acquired.

In particular, the invalidation processing on the cache operation may be performed in multiple ways, and the invalidation mode may be selected according to the characteristics of different cache types. For example, if in the current cache system specific cache data is identified by a cache ID or the like, an invalid cache ID may be returned; if in the current cache system it is necessary to first determine whether the cache has timed out before obtaining the cached data, a timed-out status may be uniformly returned, and so on.
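The forced cache miss described above can be illustrated with a small cache-aside sketch. This is only an assumption-laden example: the class `ProbeAwareCache`, the `probing` flag and the `loader` callback (standing in for the database query) are hypothetical, and the choice not to write the cache during probing is an illustrative design decision rather than something stated in the embodiment.

```python
from typing import Callable, Dict, Optional


class ProbeAwareCache:
    """Cache-aside read path whose lookups can be forced to miss during probing."""

    def __init__(self, loader: Callable[[str], str]):
        self._store: Dict[str, str] = {}
        self._loader = loader  # stands in for the database query
        self.db_reads = 0

    def _lookup(self, key: str, probing: bool) -> Optional[str]:
        if probing:
            return None  # report a miss whether or not the entry exists
        return self._store.get(key)

    def get(self, key: str, probing: bool = False) -> str:
        value = self._lookup(key, probing)
        if value is None:
            value = self._loader(key)  # fall through to the database
            self.db_reads += 1
            if not probing:
                self._store[key] = value  # normal traffic keeps caching
        return value
```

With this sketch, a normal read of a cached key does not touch the database, while a probing read of the same key always does.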

Specifically, for a cache system of the Tair MDB (Message Driven Bean) type, the namespace parameter in all Tair-related get requests can be modified to a namespace set by the data relationship analysis system, for which the returned result is empty, so that the request automatically enters the database logic.

Or for a Guava class cache system, the database logic may be automatically triggered by modifying the outcome of whether the cache data has expired, i.e., letting all the cache data be expired. Of course, in specific implementation, the white list control may be further performed according to the application, and only in the white list, the Guava cache breakdown may be performed.

Alternatively, for a Tair 3.0 (Redis) type cache system, the parameter values of the get data request method may be modified so that the corresponding data cannot be found, and the specific request then goes to the database logic. In this case, whitelist control can be performed according to parameters such as the Redis address and Key, and Redis cache breakdown is performed only for entries in the whitelist.
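The three per-cache-type strategies above can be summarized in the following hedged sketch. It does not call any real Tair, Guava or Redis client API; `BreakdownPolicy`, `EMPTY_NAMESPACE`, the miss prefix, and the use of a key prefix for the Redis whitelist are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

EMPTY_NAMESPACE = "probe-empty-ns"  # hypothetical namespace known to hold no data


@dataclass(frozen=True)
class BreakdownPolicy:
    guava_apps: FrozenSet[str] = frozenset()                  # whitelisted applications
    redis_targets: FrozenSet[Tuple[str, str]] = frozenset()   # (address, key prefix)

    def tair_mdb_namespace(self, original: str, probing: bool) -> str:
        # Redirect the get request to a namespace that always returns empty.
        return EMPTY_NAMESPACE if probing else original

    def guava_expired(self, app: str, really_expired: bool, probing: bool) -> bool:
        # Report every entry as expired, but only for whitelisted applications.
        if probing and app in self.guava_apps:
            return True
        return really_expired

    def redis_key(self, address: str, key: str, probing: bool) -> str:
        # Rewrite the key so that no data can be found, only for whitelisted targets.
        hit = any(address == a and key.startswith(p) for a, p in self.redis_targets)
        if probing and hit:
            return "__probe_miss__:" + key
        return key
```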

It should be noted that, in the embodiment of the present application, cache breakdown is performed during link detection so that specific data requests uniformly go through the database logic, which inevitably affects the efficiency of data reading. However, on the one hand, since the purpose of link detection is mainly to obtain complete database table information on a link, time is not a primary consideration; on the other hand, since the number of link probes is limited, they account for only a small proportion of traffic, while requests in the normal flow continue to use the cache and are not affected by the plug-in program, especially when the plug-in program is deployed on only part of the machines. Therefore, the overall system performance is not excessively affected.

S204: intercepting the database query request and the returned result to obtain a database query record, and storing the correspondence between the database query record and the link identifier of the target link; the database query record includes: query condition information, the data value returned by the database, the field in which the data value is located, and library table identification information.

Because the cache breakdown processing is performed, the specific data request is completely sent to the database logic, and then the result returned by the specific database can be intercepted by plug-in programs and the like, so that the specific data query record is obtained. In the embodiment of the application, because the determination of the database table related to the link and the analysis of the data blood edge relationship are involved, the specific data query record may include: query condition information in the query request, and data values returned by the database, fields in which the data values reside, and library table identification information.

In this case, with respect to the query condition information, since the plug-in intercepts the call request received by the application, the query condition information can be obtained from the request related to the specific database access. For example, in the case where a database query is performed through SQL (Structured Query Language, a database language), a query condition may be specified in a Select statement, and the database returns the result matching the specified condition, so that the query condition value specified in the Select statement and the returned result may be intercepted to obtain the specific query condition information and the returned data value. In addition, the field in which the data value is located and the library table identification information (i.e., which data table of which database it is located in) may also be obtained.
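To make the shape of such a database query record concrete, the following sketch uses Python's standard `sqlite3` module as a stand-in database. The function `query_and_record`, the record keys, and the sample table are hypothetical; the database and table identifiers are passed in explicitly here, whereas in the embodiment they would come from the intercepted database layer.

```python
import sqlite3
from typing import Any, Dict, Sequence


def query_and_record(conn: sqlite3.Connection, database: str, table: str,
                     sql: str, params: Sequence[Any]) -> Dict[str, Any]:
    """Run a SELECT and return a record of its conditions and results."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]  # field names of the result
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return {
        "database": database,               # library identification
        "table": table,                     # table identification
        "sql": sql,
        "condition_values": list(params),   # query condition values
        "rows": rows,                       # returned data values keyed by field
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m (x TEXT, title TEXT)")
conn.execute("INSERT INTO m VALUES ('1001', 'tea')")
record = query_and_record(conn, "A", "m",
                          "SELECT x, title FROM m WHERE x = ?", ["1001"])
```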

In a specific implementation, for a distributed database, some application systems may use database middleware to encapsulate the database in order to solve related problems, so that the distributed database has a multi-layer structure. For example, the databases are grouped through a Group layer to realize functions such as read-write separation and primary/standby switching, while rule management, interpretation of SQL statements, and the like are performed through a Matrix layer. In this case, the interception of database-related data may be divided into two parts: one part intercepts the middle layer (for example, the Group layer) to obtain the database and table related data (including the database identifier, data table identifier, field identifier, etc.), and the other part intercepts the Matrix layer to obtain the SQL request data and the returned result (that is, the specific query condition and data value).

After the database query record information is obtained, because multiple database queries may be involved in one application, multiple different applications may be involved in the same link, and these applications may be distributed on different machines, it may be necessary to aggregate the obtained database query records into the data relationship analysis system for analysis. Thus, to help the data relationship analysis system determine which database query records are generated in the same link, the plug-in may also record the correspondence between a specific database query record and the link identifier. The link identifier may be obtained through a link tracking system: in some large-scale application systems, because the numbers of applications and links are large and multiple applications on each link need to cooperate with each other, a corresponding link tracking system is generally configured. After a link is started, the link tracking system automatically generates a link identifier for the link, so that the link identifier of the current target link can be obtained by querying the link tracking system.
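The correspondence between database query records and link identifiers can be pictured as a simple keyed store. The sketch below is an in-memory stand-in for the data relationship analysis system; `QueryRecordStore` and its methods are hypothetical names, and the link identifier is assumed to have already been obtained from the link tracking system.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


class QueryRecordStore:
    """Groups database query records by the link identifier they belong to."""

    def __init__(self) -> None:
        self._by_link: DefaultDict[str, List[Dict[str, Any]]] = defaultdict(list)

    def save(self, link_id: str, record: Dict[str, Any]) -> None:
        # Called by the plug-in of each application, possibly on different machines.
        self._by_link[link_id].append(record)

    def records_for(self, link_id: str) -> List[Dict[str, Any]]:
        # Returns a copy so that analysis cannot mutate the stored list.
        return list(self._by_link.get(link_id, []))
```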

Of course, in addition to the cache breakdown processing, a call to a downstream application may be involved. If, as described above, the link probing is performed in the pre-release environment of the application and the plug-in is deployed on only part of the machines where the same application is located, then when a call is made to the downstream application, it may first be queried on which machine of the downstream application the plug-in is deployed, and the specific call request may then be routed to that machine.

In particular, a configuration center may be provided in the data relationship analysis system, through which a plug-in program is configured for a particular application, for example, for a certain application, it may be specified on which machine the plug-in program is deployed, and the correspondence relationship between the application and the IP address or the like of the machine device in which the plug-in program is specifically deployed may be saved. When a plug-in program of a specific application needs to initiate a call to a downstream application, a query can be initiated to the configuration center by using an identifier of the downstream application as a parameter, and the configuration center can return information such as an IP address of a machine device configured with the plug-in program corresponding to the downstream application. Further, the plug-in of the current application may route the call request to the downstream application to the IP address.

In addition, when a call request is sent to a downstream application, a specific plug-in program can also add a detection identifier in the call request, so that the plug-in program in the downstream application processes according to a specific detection flow by identifying the detection identifier.
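The routing and identifier propagation described above may be sketched as follows. This is an illustrative assumption only: `ConfigCenter`, `build_downstream_call`, the header name, and the sample IP address are hypothetical, and a real deployment would route through the actual service framework rather than return a dictionary.

```python
from typing import Any, Dict, Optional

PROBE_HEADER = "x-probe-id"  # hypothetical name for the detection identifier


class ConfigCenter:
    """Maps an application identifier to the machine where its plug-in is deployed."""

    def __init__(self, plugin_hosts: Dict[str, str]):
        self._plugin_hosts = plugin_hosts

    def plugin_host(self, app_id: str) -> Optional[str]:
        return self._plugin_hosts.get(app_id)


def build_downstream_call(center: ConfigCenter, downstream_app: str,
                          payload: Any, probe_id: str) -> Dict[str, Any]:
    """Route the call to the plug-in machine and carry the detection identifier."""
    host = center.plugin_host(downstream_app)
    if host is None:
        raise LookupError(f"no plug-in machine configured for {downstream_app}")
    return {
        "target": host,
        "headers": {PROBE_HEADER: probe_id},  # lets the downstream plug-in take over
        "payload": payload,
    }
```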

S205: and determining the related library table information of the target link, the data value with the data blood-edge relationship on the target link and the field where the data blood-edge relationship exists by analyzing a plurality of database query records corresponding to the same link identifier.

When all applications on a link have completed their calls and the associated databases have returned data, one probe of the link is complete, although, as described above, the same link may need to be probed multiple times to collect complete data. After the probing of the link is completed, the specific data, that is, the related database query records, may be collected and then analyzed to determine the library table information related to the target link, as well as the data values having a data blood-edge relationship on the target link and the fields where they are located.

As described above, since multiple applications on the same link may be distributed on multiple different machine devices, the plug-in program deployed by each application may uniformly store the correspondence between the database query record and the link identifier in the data relationship analysis system, where the data is analyzed. When analyzing, the corresponding link identifier can be determined for the current target link, and then a plurality of database query records corresponding to the link identifier are taken out for analysis. In particular, during analysis, because the corresponding database, data table and other identifiers can be recorded in each database query record, the database and data table identifiers recorded in the database records corresponding to the same link identifier are summarized and de-duplicated to obtain which database tables the link specifically relates to. In addition, when the data blood-edge relationship analysis is performed, query conditions and returned data values respectively included in the plurality of database query records may be analyzed, where if a plurality of identical data values corresponding to the same query conditions exist in the plurality of database query records associated with the same link identifier, the plurality of identical data values are determined as data values having the data blood-edge relationship.
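The summarizing and de-duplication of library table identifiers described above can be sketched in a few lines. The record layout (keys `database` and `table`) and the function name `involved_tables` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def involved_tables(records: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Summarize and de-duplicate the (database, table) pairs touched by one link."""
    return sorted({(rec["database"], rec["table"]) for rec in records})


sample_records = [
    {"database": "A", "table": "m"},
    {"database": "B", "table": "n"},
    {"database": "A", "table": "m"},  # same table read twice on the link
]
tables = involved_tables(sample_records)
```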

In specific implementation, this can be carried out by the following steps: for the SQL information contained in the detected database query records, an Update SQL statement may first be converted into a Select statement; then the SQL results are analyzed, the table fields are aggregated by value, and the condition fields of the Select statements are traversed to obtain the table fields whose values are consistent with the values of the condition fields.

For example, suppose that in a plurality of database query records corresponding to the same link identifier, the commodity ID of the same commodity is used as the query condition, where in database A a data value of the x field in data table m hits the query condition, and in database B a data value of the y field in data table n also hits the query condition. A data blood-edge relationship then exists between the two data values, and it is accordingly determined that a data blood-edge relationship exists between the x field in data table m of database A and the y field in data table n of database B.
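The aggregation-by-value step and the commodity ID example above can be sketched as follows. This is a simplified illustration rather than the definitive analysis: the record layout, the function `lineage_pairs`, and the sample data are hypothetical, and Update-to-Select conversion is omitted.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Set, Tuple

Field = Tuple[str, str, str]  # (database, table, field)


def lineage_pairs(records: List[Dict[str, Any]]) -> Set[Tuple[Field, Field]]:
    """Pair up fields whose returned values equal a shared query condition value."""
    fields_by_value: Dict[Any, Set[Field]] = defaultdict(set)
    condition_values: Set[Any] = set()
    for rec in records:
        condition_values.update(rec["condition_values"])
        for row in rec["rows"]:
            for column, value in row.items():
                # Aggregate table fields according to the value they returned.
                fields_by_value[value].add((rec["database"], rec["table"], column))
    pairs: Set[Tuple[Field, Field]] = set()
    for value in condition_values:  # traverse the Select condition values
        fields = sorted(fields_by_value.get(value, ()))
        pairs.update(combinations(fields, 2))
    return pairs


records = [
    {"database": "A", "table": "m", "condition_values": ["1001"],
     "rows": [{"x": "1001", "title": "tea"}]},
    {"database": "B", "table": "n", "condition_values": ["1001"],
     "rows": [{"y": "1001", "stock": 5}]},
]
pairs = lineage_pairs(records)
```

Here the only pair produced links field x of table m in database A with field y of table n in database B, since both returned the commodity ID used as the query condition.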

In the process of analyzing the data blood-edge relationship as described above, filtering can be performed according to certain rules. For example, if the data value is of a numeric type and not greater than 10, or is of a boolean type, or is of a string type with a length of not more than 3, or if the field name contains status, type or the like, the data blood-edge relationship can be excluded.
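The filtering rules above may be expressed as a small predicate. The sketch assumes that "10 or less" refers to the numeric value and "3 or less" to the string length, which is one reading of the rules; the function name `is_noise` is hypothetical.

```python
def is_noise(field_name: str, value: object) -> bool:
    """Return True if a (field, value) pair should be excluded from lineage analysis."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return True
    # Assumption: small numbers (value <= 10) collide too easily to be meaningful.
    if isinstance(value, (int, float)) and value <= 10:
        return True
    # Assumption: very short strings (length <= 3) are excluded for the same reason.
    if isinstance(value, str) and len(value) <= 3:
        return True
    lowered = field_name.lower()
    return "status" in lowered or "type" in lowered
```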

After the analysis result is obtained, it may be displayed through the data relationship analysis system, for example, the list of databases and data tables associated with the current link, which fields in which databases and data tables have a data blood-edge relationship, and so on. In this way, the user can synchronize data from the online database to offline based on this information, for example, synchronizing the full data of the databases and data tables associated with the current link, and applying the same desensitization rule to the fields having a data blood-edge relationship when desensitizing data during synchronization. In addition, the derivation process of a specific data blood-edge relationship, including the query condition value associated with the specific data value, can be displayed to the user (the personnel who need to use the data blood-edge relationship for data synchronization and other processing), so that the user knows not only which fields have a data blood-edge relationship, but also why these fields have it.
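One way to picture "the same desensitization rule" for fields having a data blood-edge relationship is a deterministic masking function, so that equal values in related fields remain equal after masking. The embodiment does not specify the rule; keyed hashing with Python's standard `hmac` module is chosen here purely as an illustrative assumption, and `desensitize` and the sample key are hypothetical.

```python
import hashlib
import hmac


def desensitize(value: str, secret: bytes) -> str:
    """Deterministic keyed hash: equal inputs always yield equal masked outputs."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


secret = b"example-secret"               # hypothetical key
masked_x = desensitize("1001", secret)   # field x of table m in database A
masked_y = desensitize("1001", secret)   # field y of table n in database B
```

Because both fields are masked by the same rule, records in the synchronized offline data can still be associated through the masked values.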

In a specific implementation, the detection process reads the online database, and an infinite loop may occur during database reading due to defects of the plug-in program or code defects of the application itself, so infinite-loop prevention can be applied to the reading of the database. Specifically, the number of times the database is read in one call process may be recorded; for example, if a certain database is repeatedly read thousands of times and the data value returned each time is the same, an alarm may be sent, and the number of database reads may be limited in subsequent call processes, and so on. In addition, the alarm information and the like can be synchronized into a log system provided by a related cloud computing service platform for standardized monitoring.
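The infinite-loop prevention described above can be sketched as a counter keyed by call, database and returned result. The class `ReadLoopGuard`, its `limit` parameter, and the idea of identifying a result by a fingerprint string are hypothetical; the alarm is represented simply as a list entry.

```python
from collections import Counter
from typing import List, Tuple


class ReadLoopGuard:
    """Counts identical database reads within one call and blocks them past a limit."""

    def __init__(self, limit: int = 1000):
        self.limit = limit
        self._counts: Counter = Counter()
        self.alarms: List[Tuple[str, str, str]] = []

    def allow(self, call_id: str, database: str, result_fingerprint: str) -> bool:
        key = (call_id, database, result_fingerprint)
        self._counts[key] += 1
        if self._counts[key] > self.limit:
            self.alarms.append(key)  # stands in for sending an alarm
            return False             # further reads are limited
        return True
```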

The specific scheme provided by the embodiment of the application can be used in various specific application scenarios. For example, in the early stage of deployment of a community group purchase system, the offline links need to be opened up. At this time, the method provided by the embodiment of the application can be used to probe each link, so as to obtain, for specific links such as commodity creation and release, business details, and basic data, which databases and tables are involved and which fields are related, thereby assisting in opening up the related links offline.

In summary, with the embodiment of the application, link detection can be performed while a specific link is executed. During detection, the call requests received by specific applications in the link can be intercepted, and in the process of responding to the call requests, cache invalidation processing is performed on read cache requests so that data queries are performed in the databases associated with the applications. Database query record information can then be obtained by intercepting the database query requests and returned results, where the database query record information can include query condition information, the data value returned by the database, the field where the data value is located, and library table identification information. After the link detection is finished, the database query records corresponding to the same link identifier can be analyzed to determine the library table information related to the target link, as well as the data values having a data blood-edge relationship on the target link and the fields where they are located. In this way, it is possible to determine in an automated manner which databases and tables are associated with a link and which fields in those databases and tables have a data blood-edge relationship, and accuracy and efficiency are improved because expert experience is not required. In addition, because data collection is performed during the actual execution of the link, more complete database information on the specific link can be obtained.

It should be noted that the embodiment of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the solution described herein within the scope permitted by, and in compliance with, the applicable laws and regulations of the country concerned (for example, with the user's explicit consent and after the user has been effectively notified).

Corresponding to the foregoing method embodiment, the embodiment of the present application further provides an apparatus for acquiring data relationship information. Referring to fig. 4, the apparatus may specifically include:

A detection flow starting unit 401, configured to initiate a detection flow for a target link to be detected by starting to execute the target link and carrying a detection identifier in a starting request; wherein the target link includes a plurality of applications thereon;

a request interception processing unit 402, configured to intercept, in the probing flow, a call request with the probe identifier received by the application;

A cache invalidation processing unit 403, configured to perform cache invalidation processing on the related read cache request in a process of responding to the call request, so that data query is performed in the database associated with the application;

the data interception processing unit 404 is configured to intercept the database query request and the returned result, so as to obtain a database query record, and store a correspondence between the database query record and a link identifier of the target link; the database query record includes: inquiring condition information, a data value returned by the database, a field in which the data value is positioned and library table identification information;

And the analysis unit 405 is configured to determine, after the target link detection is finished, the library table information related to the target link, and a data value and a field where the data blood-edge relationship exists on the target link by analyzing a plurality of database query records corresponding to the same link identifier.

In a specific implementation, the detection flow starting unit may specifically be configured to:

and starting and executing the target link under the online data logic environment of the related application, and carrying a detection identifier in a starting request to initiate a detection flow of the target link.

Wherein the online data logic environment of the related application comprises: the pre-release environment of the related application, which is an environment for verifying the related application based on the online data logic.

In specific implementation, the corresponding relation can be stored in a data relation analysis system so as to analyze a plurality of database query records corresponding to the same link identifier through the data relation analysis system; at this time, the detection flow starting unit may specifically be configured to:

And receiving configuration information of the portal application/service and parameters of the target link through a starting interface provided by the data relationship analysis system, wherein the configuration information is used for starting and executing the target link and carrying a detection identifier in a starting request.

Or the detection flow starting unit may specifically be configured to:

And starting to execute the target link in a state that a target page provided by an entry application of the target link is displayed through a web browser, wherein a browser plug-in program is deployed in the web browser and is used for carrying the detection identification in the starting request.

In addition, a target plug-in program can be deployed in the plurality of applications included in the target link, so that the plug-in program executes the interception processing, performs the cache invalidation processing in the process of responding to the call request, intercepts the returned result of the database, and stores the correspondence between the database query record and the link identifier of the target link in the data relationship analysis system.

If the detection flow of the target link is initiated in the pre-release environment of the related application, the same application is deployed on a plurality of machine devices in the pre-release environment, and the target plug-in program may be deployed on only part of the machine devices;

at this time, the apparatus may further include:

A request routing unit, configured to, if a call to a downstream application is involved in the process of responding to the call request through the plug-in program, query the target machine device that is associated with the downstream application and on which the target plug-in program is deployed, and route the call request for the downstream application to the target machine device.

In addition, it may further include:

and the detection identifier adding unit is used for adding the detection identifier into the call request of the downstream application if the call of the downstream application is involved after the call request is intercepted.

Specifically, the cache invalidation processing unit may specifically be configured to:

and returning an invalid address when the cache address information is returned to the read cache request, so as to perform the cache invalidation processing.

Or the cache invalidation processing unit may specifically be configured to:

and returning cache overtime information when returning the cache state information to the read cache request so as to perform the cache invalidation processing.

In particular, the analysis unit may be specifically configured to:

and if a plurality of identical data values corresponding to the same query condition exist in the database query records associated with the same link identifier, determining the identical data values as the data values with the data blood-edge relationship.

In addition, the embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method of any one of the previous method embodiments.

And an electronic device comprising:

one or more processors; and

A memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.

Fig. 5 illustrates an architecture of an electronic device, which may include a processor 510, a video display adapter 511, a disk drive 512, an input/output interface 513, a network interface 514, and a memory 520, among others. The processor 510, the video display adapter 511, the disk drive 512, the input/output interface 513, the network interface 514, and the memory 520 may be communicatively coupled via a communication bus 530.

The processor 510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc. for executing related programs to implement the technical solution provided by the present application.

The memory 520 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage, dynamic storage, or the like. The memory 520 may store an operating system 521 for controlling the operation of the electronic device 500, and a Basic Input Output System (BIOS) for controlling the low-level operation of the electronic device 500. In addition, a web browser 523, a data storage management system 524, a data relationship analysis processing system 525, and the like may also be stored. The data relationship analysis processing system 525 may be an application program that implements the operations of the foregoing steps in the embodiments of the present application. In general, when the technical solution provided by the present application is implemented by software or firmware, the relevant program codes are stored in the memory 520 and invoked by the processor 510 for execution.

The input/output interface 513 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.

The network interface 514 is used to connect communication modules (not shown) to enable communication interactions of the device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).

Bus 530 includes a path to transfer information between components of the device (e.g., processor 510, video display adapter 511, disk drive 512, input/output interface 513, network interface 514, and memory 520).

It should be noted that although the above devices only show the processor 510, the video display adapter 511, the disk drive 512, the input/output interface 513, the network interface 514, the memory 520, the bus 530, etc., in the specific implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be appreciated by those skilled in the art that the apparatus may include only the components necessary to implement the present application, and not all of the components shown in the drawings.

From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.

In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, for a system or apparatus embodiment, since it is substantially similar to a method embodiment, the description is relatively brief, and reference may be made to the corresponding parts of the method embodiment. The system and apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.

The data relationship analysis method, the data relationship analysis device and the electronic equipment provided by the application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help understand the method and core idea of the application; meanwhile, those of ordinary skill in the art may make changes to the specific implementation and the scope of application in light of the idea of the application. In view of the foregoing, the content of this description should not be construed as limiting the application.

Claims

1. A method of data relationship analysis, comprising:

For a target link to be detected, initiating a detection flow of the target link by starting execution of the target link under an online data logic environment of related applications and carrying a detection identifier in a starting request; wherein the target link includes a plurality of applications thereon;

intercepting the database query request and the returned result to obtain a database query record, and then saving the correspondence between the database query record and the link identifier of the target link; the database query record includes: query condition information, a data value returned by the database, the field in which the data value is located, and database table identification information;
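As a minimal illustrative sketch only (it is not part of the claim language), the database query record described above and its correspondence with a link identifier could be modeled as follows. The names `QueryRecord` and `RecordStore`, and the sample values used with them, are hypothetical and do not come from the application.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class QueryRecord:
    """One intercepted database query (hypothetical structure)."""
    conditions: Dict[str, Any]  # query condition information
    values: Dict[str, Any]      # field name -> data value returned by the database
    table: str                  # database table identification information


class RecordStore:
    """Keeps the correspondence between link identifiers and query records."""

    def __init__(self) -> None:
        self._by_link: Dict[str, List[QueryRecord]] = defaultdict(list)

    def save(self, link_id: str, record: QueryRecord) -> None:
        self._by_link[link_id].append(record)

    def records_for(self, link_id: str) -> List[QueryRecord]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_link.get(link_id, []))
```

Grouping the records by link identifier is what would allow a later analysis step to examine all queries issued on one link together.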

2. The method of claim 1, wherein

the online data logic environment of the related applications comprises a pre-release (prefire) environment of the related applications, the pre-release environment being an environment for verifying the related applications based on online data logic.

3. The method of claim 1, wherein

the correspondence is stored in a data relationship analysis system, so that the plurality of database query records corresponding to the same link identifier are analyzed by the data relationship analysis system;

initiating the detection flow of the target link by starting execution of the target link and carrying a detection identifier in the start request comprises:

4. The method of claim 1, wherein

5. The method of claim 1, wherein

target plug-ins are deployed in the plurality of applications included in the target link, so that the plug-ins perform the interception processing, the cache invalidation processing in the course of responding to the call request, the interception of the result returned by the database, and the saving of the correspondence between the database query record and the link identifier of the target link to the data relationship analysis system.

6. The method of claim 5, wherein

the detection flow of the target link is initiated in a pre-release (prefire) environment of the related applications; in the pre-release environment, the same application is deployed on a plurality of machine devices, and the target plug-in is deployed on only some of the machine devices;

the method further comprises:

in the course of responding to the call request through the plug-in, if a call to a downstream application is involved, querying for a target machine device that is associated with the downstream application and has the target plug-in deployed, and routing the call request for the downstream application to the target machine device.
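For illustration only, the routing step described above might be sketched as a lookup over the machines of the downstream application, choosing one on which the plug-in is deployed. The function name `pick_target_machine`, the in-memory registries, and the addresses are assumptions made for this sketch; the application does not specify how the query is implemented.

```python
from typing import Dict, List, Optional, Set


def pick_target_machine(
    downstream_app: str,
    machines_by_app: Dict[str, List[str]],
    plugin_machines: Set[str],
) -> Optional[str]:
    """Return the first machine of the downstream app that has the plug-in, or None."""
    for machine in machines_by_app.get(downstream_app, []):
        if machine in plugin_machines:
            return machine
    return None


# Hypothetical registry: two machines run "inventory", only one has the plug-in.
machines = {"inventory": ["10.0.0.1", "10.0.0.2"], "pay": ["10.0.1.1"]}
plugins = {"10.0.0.2"}
target = pick_target_machine("inventory", machines, plugins)
```

Returning `None` when no plug-in machine exists leaves the fallback policy to the caller, since the claim does not describe that case.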

7. The method as recited in claim 1, further comprising:

after the call request is intercepted, if a call to a downstream application is involved, adding the detection identifier to the call request sent to the downstream application.
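A minimal sketch of this propagation, assuming the detection identifier travels as a request header: the header name `x-detection-id` and the function `propagate_detection_id` are hypothetical and are not taken from the application.

```python
from typing import Dict, Optional

DETECTION_KEY = "x-detection-id"  # hypothetical header name, not from the application


def propagate_detection_id(
    incoming: Dict[str, str],
    outgoing: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Copy the detection identifier from an intercepted request onto a downstream call."""
    result = dict(outgoing or {})
    if DETECTION_KEY in incoming:
        result[DETECTION_KEY] = incoming[DETECTION_KEY]
    return result
```

Requests that do not carry the identifier are passed through unchanged, so ordinary traffic would not be marked as part of a detection flow.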

8. The method according to any one of claims 1 to 7, wherein,

performing the cache invalidation processing on the related cache read request comprises:

9. The method according to any one of claims 1 to 7, wherein,

10. The method according to any one of claims 1 to 7, wherein,

determining, by analyzing the plurality of database query records corresponding to the same link identifier, the database table information related to the target link, and the data values having a data lineage relationship on the target link together with their fields, comprises:

11. A data relationship analysis apparatus, comprising:

a detection flow starting unit, configured to, for a target link to be detected, initiate a detection flow of the target link by starting execution of the target link in an online data logic environment of the related applications and carrying a detection identifier in the start request; wherein the target link includes a plurality of applications;

a data interception processing unit, configured to intercept the database query request and the returned result to obtain a database query record, and then save the correspondence between the database query record and the link identifier of the target link; the database query record includes: query condition information, a data value returned by the database, the field in which the data value is located, and database table identification information;

12. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 10.

13. An electronic device, comprising:

one or more processors; and

a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any one of claims 1 to 10.