Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are merely exemplary and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be separated, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the solution of the present application applies.
As shown in Fig. 1, system architecture 100 may include a terminal device 110, a network 120, and a server 130. The terminal device 110 may include various electronic devices such as a smart phone, a tablet computer, a notebook computer, and a desktop computer. The server 130 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The network 120 may be a communication medium of any connection type capable of providing a communication link between the terminal device 110 and the server 130, such as a wired or wireless communication link.
The system architecture in the embodiments of the present application may have any number of terminal devices, networks, and servers, as needed for implementation. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiments of the present application may be applied to the terminal device 110, to the server 130, or jointly to both; this is not particularly limited in this application.
For example, a user initiates a payment, transfer, or other service request to a financial institution such as a bank through the terminal device 110. After receiving the request over the network 120, the server 130 running at the financial institution may handle the corresponding service according to a specified service rule. Meanwhile, the server 130 can supervise the business data generated during service handling and perform risk prediction on it, so that abnormal transaction behavior is found in time.
Fig. 2 shows a block diagram of the architecture of a data processing system in an embodiment of the present application. The system may be implemented by an application for risk prediction installed on the terminal device 110 or the server 130.
As shown in Fig. 2, data processing system 200 may generally include a keyword matching module 210, a keyword analysis module 220, and a keyword application module 230. The keyword matching module 210 first extracts text information from the business information table 240 and obtains keyword data from the keyword database 250. After performing matching detection on the text information using the keyword data, it obtains a data list composed of the matching results and transmits the data list to the keyword analysis module 220. The keyword analysis module 220 may clean the data in the list in a linear or nonlinear mode and extract the main risk text information, i.e. the text information for which an abnormal risk exists. The keyword application module 230 may further process the main risk text information extracted by the keyword analysis module 220 according to actual application requirements, so as to obtain a data processing result for a specific application scenario. For example, a data report or warning information that flags business data with an abnormal risk may ultimately be generated.
In some embodiments of the present application, the data processing system 200 may be implemented by a machine learning model based on artificial intelligence.
Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Because research in this field involves natural language, i.e. the language people use every day, it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures, thereby continuously improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout every field of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
In some embodiments of the present application, the keyword database and the business information table may be stored in a decentralized and reliable manner using blockchain technology.
Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
The underlying blockchain platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring.
The user management module is responsible for managing the identity information of all blockchain participants, including public/private key generation (account management), key management, and maintenance of the correspondence between users' real identities and their blockchain addresses (authority management). When authorized, the user management module also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit).
The basic service module is deployed on all blockchain node devices. It verifies the validity of service requests and, once consensus has been reached on a valid request, records it in storage. For a new service request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it.
The smart contract module is responsible for contract registration, issuance, triggering, and execution. Developers can define contract logic in a programming language and publish it to the blockchain (contract registration); the contract is then triggered by keys or other events and executed according to the logic of its clauses. The module also provides functions for upgrading and canceling contracts.
The operation monitoring module is mainly responsible for deployment, configuration changes, contract settings, and cloud adaptation during product release, and for visual output of the real-time state during product operation, for example alarms, monitoring of network conditions, and monitoring of the health status of node devices.
The platform product service layer provides basic capabilities and an implementation framework for typical applications; on top of these basic capabilities, developers can add the characteristics of their own business to implement business logic on the blockchain. The application service layer provides blockchain-based application services for business participants to use.
The technical solutions of the data processing method, data processing apparatus, computer-readable medium, and electronic device provided in the present application are described in detail below with reference to specific embodiments.
Fig. 3 is a flowchart illustrating the steps of a data processing method in an embodiment of the present application. The method may be executed by a terminal device, by a server, or jointly by both. As shown in Fig. 3, the data processing method may mainly include the following steps S310 to S350.
Step S310: Acquire the service data to be processed, where the service data includes a service text and a service attribute associated with the service text, and the service attribute represents the data acquisition source and data acquisition time of the service data.
Step S320: Obtain keyword data used for anomaly detection on the service data, where the keyword data includes a keyword text and a keyword attribute associated with the keyword text, and the keyword attribute represents the keyword type and abnormal risk level of the keyword data.
Step S330: Perform matching detection between the keyword text and the service text to determine whether the service text contains the keyword text.
Step S340: If the keyword text is successfully matched with the service text, combine the service data and the keyword data to form matching data.
Step S350: Extract, from the matching data and according to the service attribute and the keyword attribute, target data for evaluating the abnormal risk of the business data.
In the data processing method provided by the embodiments of the present application, matching detection is performed on the business text in the business data using the keyword text, so business text with an abnormal risk can be identified quickly. After the business text and the corresponding keyword data are combined into matching data, the target data for evaluating the abnormal risk is extracted from the matching data using the business attribute and the keyword attribute. The method can improve both the efficiency of risk detection on business data and the accuracy of data identification.
Each method step in the data processing method of the embodiment of the present application is described in detail below with reference to a specific application scenario.
In step S310, service data to be processed is obtained, where the service data includes a service text and a service attribute associated with the service text, and the service attribute is used to indicate a data acquisition source and a data acquisition time of the service data.
In an embodiment of the present application, the business data to be processed may be transaction data obtained by screening and collecting information on a large number of transactions, or on suspicious transactions found during monitoring; specifically, it may include business information, user information, and other data generated when a user transacts financial business. The service text is text content generated during service handling, for example the name of the transaction and descriptive words explaining its content. The service attribute is attribute content associated with the service text and may include fields representing various attribute dimensions, such as the data acquisition source and the data acquisition time.
Taking a transfer transaction between two or more users as an example, the resulting transaction data may include the account numbers of the transaction parties, the transaction time, the transfer amount, the transfer remark, and other data. The transfer remark or service description is the service text, the related account number or account name is the service attribute representing the data source, and the related transaction time is the service attribute representing the data acquisition time.
In one embodiment of the present application, the service data may be a data table formed by combining the service text and the service attribute along specified dimensions. Fig. 4 is a schematic diagram of a table for storing service data in an application scenario according to an embodiment of the present application; this table is the service information table. As shown in Fig. 4, the first column is the serial number "id", the second column is the matched text content "word", the third column is the name of the client to whom the text information belongs, "whos", and the fourth column is the time at which the text information appeared.
In step S320, keyword data for performing anomaly detection on the service data is obtained, where the keyword data includes a keyword text and a keyword attribute associated with the keyword text, and the keyword attribute is used to indicate a keyword type and an anomaly risk level of the keyword data.
In an embodiment of the present application, the keyword data may be a subset of entries extracted from a keyword database for performing anomaly detection on the service data, and the keyword database may be built by collecting and screening historical service data.
The keyword data includes a keyword text and one or more keyword attributes associated with it; for each piece of keyword data, the keyword attributes represent the keyword type, the abnormal risk level, and other attribute information related to the abnormal risk of business data. The keyword types may include, for example, white text and abnormal text: white text indicates that the keyword text is normal text with no abnormal risk, and abnormal text indicates that the keyword text carries an abnormal risk of a specified type. The abnormal risk level indicates the degree of abnormal risk present in keyword texts of different keyword types. It may be an attribute text that measures the degree qualitatively, such as a high, medium, or low risk level, or an attribute value that measures it quantitatively, taking either continuous or piecewise (segmented) values; the larger the value, the higher the degree of abnormal risk.
In one embodiment of the present application, the keyword data may be a combination of a keyword text field, a keyword type field, and an abnormal risk level field. Taking a due diligence scenario as an example, the keyword types may include white text and abnormal text. For example, the keyword data may include field combinations such as [white text, XXX, a], [abnormal text, YYY, b], and [abnormal text, ZZZ, c], where the first field is the keyword type field, the second field is the keyword text field, and the third field is the abnormal risk level field.
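One way to hold such keyword data in code is a small immutable record, sketched below in Python. The class name `KeywordData`, its field names, and the concrete risk levels standing in for a, b, and c are all hypothetical; only the three-field layout and the white/abnormal text distinction come from the description above.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical record for one piece of keyword data; field names are illustrative.
@dataclass(frozen=True)
class KeywordData:
    keyword_type: str               # "white text" or "abnormal text"
    keyword_text: str               # string to look for in service texts
    risk_level: Union[str, float]   # qualitative ("high"/"medium"/"low") or numeric

    def is_abnormal(self) -> bool:
        """True when the keyword type marks an abnormal risk."""
        return self.keyword_type == "abnormal text"


# Field order follows [keyword type, keyword text, abnormal risk level].
# "XXX"/"YYY"/"ZZZ" are the placeholders from the text; the risk levels
# below are made-up stand-ins for a, b and c.
keywords = [
    KeywordData("white text", "XXX", 0),
    KeywordData("abnormal text", "YYY", "medium"),
    KeywordData("abnormal text", "ZZZ", 0.9),
]

abnormal = [k.keyword_text for k in keywords if k.is_abnormal()]
```

Allowing `risk_level` to be either a string or a number mirrors the qualitative and quantitative options for the abnormal risk level; a later quantization step can normalize both to numbers.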
Step S330: Perform matching detection between the keyword text and the service text to determine whether the service text contains the keyword text.
Matching detection between the keyword text and the service text may include: obtaining the text length of the keyword text; using that length as a sliding window to sample character strings from the service text; matching each sampled string against the keyword text in turn; and determining that the service text contains the keyword text when at least one sampled string in the service text matches the keyword text.
In one embodiment of the present application, in addition to determining through matching detection whether the service text contains the keyword text, the number of occurrences of the keyword text in the service text may be counted. The more often the keyword text occurs, the greater its influence on the anomaly detection result for the service text.
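The sliding-window match and the occurrence count can be sketched together as follows. The function name `match_keyword` is hypothetical, and counting overlapping windows is an assumption, because the text does not say whether overlapping occurrences count separately.

```python
from typing import Tuple


def match_keyword(service_text: str, keyword_text: str) -> Tuple[bool, int]:
    """Sliding-window match of keyword_text against service_text.

    The window width equals the keyword length. Returns (matched, frequency),
    where frequency counts every window position equal to the keyword,
    including overlapping positions.
    """
    width = len(keyword_text)
    if width == 0 or width > len(service_text):
        return False, 0
    frequency = 0
    for start in range(len(service_text) - width + 1):
        window = service_text[start:start + width]  # sampled character string
        if window == keyword_text:
            frequency += 1
    # Matched as soon as at least one sampled string equals the keyword text.
    return frequency > 0, frequency


print(match_keyword("today's weather is really good", "weather"))  # (True, 1)
```

Python's built-in `str.count` counts only non-overlapping occurrences, so it would return 2 rather than 3 for the keyword "aa" in "aaaa". Which behavior is appropriate for the frequency field is a design choice the text leaves open.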
Step S340: If the keyword text is successfully matched with the service text, combine the service data and the keyword data to form matching data.
For a successfully matched pair of keyword text and service text, the service data containing the service text and the keyword data containing the keyword text can be combined into corresponding matching data. In an embodiment of the present application, the matching data may be complete data obtained by combining all fields of the business data and the keyword data, or partial data obtained by extracting and combining selected fields from each.
Fig. 5 is a schematic diagram illustrating a process of performing match detection on keyword texts to obtain match data in an application scenario according to an embodiment of the present application.
As shown in Fig. 5, in one matching round, the keyword data 510 to be matched may be, for example, the field combination [weather, white text, 0], where the first field "weather" is the keyword text, the second field "white text" is the keyword type, and the third field "0" is the abnormal risk level.
Service texts are extracted one by one from the service information table 520 that stores the service data and are input, together with the keyword text, into the keyword matching module 530, which determines whether each service text contains a text string identical to the keyword text. For example, if the service text extracted in the current matching round is "today's weather is really good", it contains the keyword text "weather", so the match is determined to be successful. Conversely, if the service text contains no text string identical to the keyword text, the match is determined to have failed.
When the match succeeds, the keyword data 510 and the matched service data may be combined to obtain the matching data 540. For example, the service data is the field combination [today's weather is really good, Zhang San, 2020-1-2], where the first field "today's weather is really good" is the service text used in matching detection, the second field "Zhang San" is the customer name indicating the data acquisition source, and the third field "2020-1-2" is the text occurrence time indicating the data acquisition time, which may be, for example, the transaction time of a bank transfer. The matching data 540 obtained by combining this service data with the keyword data 510 may be a new field combination in which the fields are arranged in a specified dimension order, for example [weather, white text, 0, today's weather is really good, Zhang San, 2020-1-2].
When the match fails, the keyword data 510 may continue to be matched against the next service text extracted from the service information table 520. Once all service texts in the service information table 520 have been matched against the keyword data 510, a new set of keyword data may be selected and the next round of matching detection started.
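The matching rounds of Fig. 5 can be sketched as a nested loop that concatenates the keyword fields with the service fields whenever a match succeeds. The tuple layouts follow the Fig. 5 example, but the names `build_matching_data`, `KeywordRow`, and `ServiceRow` and the second service row are hypothetical. Python's `in` operator is used here for the containment test because it gives the same yes/no answer as the sliding-window comparison.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical tuple layouts following the Fig. 5 example:
#   keyword data: (keyword text, keyword type, abnormal risk level)
#   service data: (service text, customer name, text occurrence time)
KeywordRow = Tuple[str, str, int]
ServiceRow = Tuple[str, str, date]


def build_matching_data(keywords: List[KeywordRow],
                        services: List[ServiceRow]) -> List[tuple]:
    """Run one matching round per keyword and collect the matching data."""
    matches: List[tuple] = []
    for kw in keywords:            # select the next set of keyword data
        for svc in services:       # extract service texts one by one
            if kw[0] in svc[0]:
                # Success: keyword fields first, then service fields.
                matches.append(kw + svc)
            # Failure: move on to the next service text.
    return matches


services = [
    ("today's weather is really good", "Zhang San", date(2020, 1, 2)),
    ("rent for January", "Li Si", date(2020, 1, 3)),  # made-up row
]
matching = build_matching_data([("weather", "white text", 0)], services)
```

With these inputs, `matching` holds the single record `("weather", "white text", 0, "today's weather is really good", "Zhang San", date(2020, 1, 2))`, which has the same field order as matching data 540.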
After the matching detection is completed to obtain the matching data, the target data can be further extracted from the matching data so as to evaluate the abnormal risk of the business data according to the target data.
FIG. 6 is a flow chart illustrating steps of a method for extracting target data in one embodiment of the present application. As shown in fig. 6, on the basis of the above embodiment, extracting target data for evaluating an abnormal risk of business data from matching data according to business attributes and keyword attributes in step S350 may include steps S610 to S630 as follows.
Step S610: Quantize the matching data according to the service attribute and the keyword attribute to obtain quantized data corresponding to the matching data, where the quantized data is numerical data including one or more numerical fields.
Step S620: Sort the matching data according to the value of each numerical field in the quantized data to obtain an ordered data list.
Step S630: Select target data for evaluating the abnormal risk of the business data from the data list, following the order of the list.
In the embodiments of the present application, the matching data are sorted using multi-dimensional quantized data derived from the business attribute and the keyword attribute, so target data can be selected by comparing quantized values. This improves the efficiency of selecting target data and the reliability of abnormal risk evaluation.
The following describes each method step of extracting target data in detail with reference to specific embodiments.
In step S610, the matching data is quantized according to the service attribute and the keyword attribute to obtain quantized data corresponding to the matching data, where the quantized data is numerical data including one or more numerical fields.
In an embodiment of the present application, quantizing the matching data in step S610 may include: extracting from the matching data a first service field related to the service attribute, a second service field related to the keyword attribute, and a third service field related to statistical information about the matching data; quantizing the first, second, and third service fields according to the field type of each to obtain a numerical field for each; and combining the numerical fields to obtain the quantized data corresponding to the matching data.
In other words, the service fields in the matching data that relate to the service attribute and the keyword attribute, together with the statistical information of the matching data, are each quantized according to preset rules, and the resulting numerical fields are combined into the quantized data for that matching data.
In one embodiment of the application, the first business field includes a time field for indicating a time of collection of the business data, the second business field includes a rating field for indicating a level of an abnormal risk of the keyword data, and the third business field includes a frequency field for indicating a frequency of occurrence of the keyword text in the matching data.
For the first service field, quantization can be performed by dividing time into periods. For example, the distance between the acquisition time of the service data and the current time can be determined from the data acquisition time in the service attribute and then compared with a preset unit duration to obtain the quantized value: if the acquisition time is within 1 month of the current time, the quantized value is 1; if it is 1 to 2 months from the current time, the quantized value is 2; and so on.
For the second service field, if the abnormal risk level is numerical data expressed quantitatively, its own value may be directly selected as a quantized numerical field, or a value subjected to normalization processing or other processing may be used as a quantized numerical field. If the abnormal risk level is a text field represented qualitatively, the qualitative level can be mapped into a corresponding quantitative value according to a preset mapping relation. For example, if the anomaly risk level is qualitatively divided into three levels, i.e., high, medium, and low, the three levels may be mapped to quantized values of 2, 1, and 0, respectively.
For the third service field, the raw value of the occurrence frequency can be used directly as the quantized numerical field, or a normalized or otherwise processed value can be used instead.
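The three quantization rules can be sketched as small functions that produce the numerical fields (L, C, T) for one piece of matching data. Function names are hypothetical. Approximating one month as 30 days is an assumption, and the high/medium/low to 2/1/0 mapping is taken from the example above. The frequency is passed through unchanged, with no normalization.

```python
from datetime import date
from typing import Dict, Tuple, Union

# Qualitative-to-quantitative mapping from the high/medium/low example.
LEVEL_MAP: Dict[str, int] = {"high": 2, "medium": 1, "low": 0}


def quantize_time(collected: date, today: date, unit_days: int = 30) -> int:
    """Time-period quantization: 1 within the first unit, 2 within the second, ...

    One "month" is approximated as unit_days = 30 days (an assumption).
    """
    elapsed = (today - collected).days
    return elapsed // unit_days + 1


def quantize_level(level: Union[str, float]) -> float:
    """Map qualitative levels through LEVEL_MAP; pass numeric levels through."""
    if isinstance(level, str):
        return float(LEVEL_MAP[level])
    return float(level)


def quantize(level: Union[str, float], frequency: int,
             collected: date, today: date) -> Tuple[float, float, float]:
    """Return the numerical fields (L, C, T) for one piece of matching data."""
    return (quantize_level(level), float(frequency),
            float(quantize_time(collected, today)))
```

For example, `quantize("high", 3, date(2020, 1, 2), date(2020, 1, 20))` gives `(2.0, 3.0, 1.0)` because 18 days fall within the first 30-day period.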
In step S620, the matching data is sorted according to the value of each numerical field in the quantized data, so as to obtain an ordered data list.
In one embodiment of the present application, sorting the matching data may include: obtaining the quantization weight corresponding to each numerical field in the quantized data; computing a weighted sum of the numerical fields using the quantization weights to obtain the selection probability of the matching data; and sorting the matching data by selection probability.
For example, suppose the quantized data includes a first quantized field L corresponding to the abnormal risk level, a second quantized field C corresponding to the occurrence frequency of the keyword, and a third quantized field T corresponding to the data collection time. Weighting each quantized field by its quantization weight $\beta_1$, $\beta_2$, or $\beta_3$ and summing gives the selection probability P:
$$P = \beta_1 L + \beta_2 C + \beta_3 T$$
In an embodiment of the present application, the quantization weights may be fixed values preset from experience in processing historical business data, or dynamic values determined during the abnormal risk evaluation process.
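The weighted sum for P and the sort that follows can be sketched as below. The function names are hypothetical. The weights are arbitrary placeholders, not values from the source. Ranking in descending order of P is an assumption.

```python
from typing import List, Sequence


def selection_probability(fields: Sequence[float],
                          weights: Sequence[float]) -> float:
    """P = beta1*L + beta2*C + beta3*T for fields (L, C, T)."""
    if len(fields) != len(weights):
        raise ValueError("need exactly one weight per numerical field")
    return sum(w * x for w, x in zip(weights, fields))


def rank_by_probability(quantized: List[Sequence[float]],
                        weights: Sequence[float]) -> List[int]:
    """Indices of the matching data, highest selection probability first."""
    scores = [selection_probability(q, weights) for q in quantized]
    return sorted(range(len(quantized)), key=lambda i: scores[i], reverse=True)


# Placeholder weights (beta1, beta2, beta3); not values from the source.
weights = (0.5, 0.3, 0.2)
quantized = [(2.0, 3.0, 1.0), (0.0, 1.0, 2.0), (1.0, 5.0, 1.0)]
order = rank_by_probability(quantized, weights)
```

Here the scores are roughly 2.1, 0.7, and 2.2, so `order` is `[2, 0, 1]`. Under the time quantization above, T grows as data ages. A positive $\beta_3$ therefore favors older records, and a negative one favors recent records. The text leaves that sign to the choice of weights.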
In one embodiment of the present application, a method of obtaining quantization weights corresponding to respective numerical fields in quantized data may include: acquiring a quantized data sample obtained by quantizing historical service data, and acquiring a data tag obtained by evaluating abnormal risks of the historical service data; forming a training sample by the quantized data sample and the data label; quantization weights corresponding to respective numerical fields in the quantized data are predicted from the training samples.
Fig. 7 shows a data table of training samples in one embodiment of the present application. As shown in Fig. 7, the data table of training samples includes two parts: a data label 701 and a quantized data sample 702. The data label 701 indicates whether the historical business data was found to have an abnormal risk after abnormal risk evaluation. For example, a data label of 1 indicates that the data sample has an abnormal risk, and a data label of 0 indicates that it does not. The quantized data samples 702 are composed of quantized fields of multiple dimensions, for example a first quantized field L corresponding to the abnormal risk level, a second quantized field C corresponding to the occurrence frequency of keywords, and a third quantized field T corresponding to the data collection time.
In an embodiment of the present application, different quantization weight calculation methods may be selected according to the data amount of the training sample, for example, when the data amount of the training sample is greater than a set data amount threshold, a method of training a weight prediction model may be used to determine the quantization weight; and when the data quantity of the training sample is less than or equal to the set data quantity threshold value, the quantization weight can be determined by adopting a matrix operation method.
In one embodiment of the present application, a method of predicting quantization weights corresponding to respective numerical fields in quantized data from training samples may include: acquiring a weight prediction model for performing weight prediction on quantized data; performing iterative training on the weight prediction model according to the training samples; and predicting quantization weights corresponding to various numerical value fields in the quantized data through the trained weight prediction model. In an embodiment of the present application, the weight prediction model may be a linear model or a nonlinear model for performing a mapping process on input data to predict quantization weights. The weight prediction model can be trained by adopting a gradient descent method so as to improve the weight prediction effect.
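A minimal sketch of the gradient-descent route follows. It assumes a linear weight prediction model fitted by least squares on the mean squared error. The text allows linear or nonlinear models and does not fix the loss, so both the model choice and the loss are assumptions, and the name `fit_weights_gd` is hypothetical.

```python
from typing import List, Sequence


def fit_weights_gd(X: List[Sequence[float]], y: Sequence[float],
                   lr: float = 0.05, epochs: int = 5000) -> List[float]:
    """Fit a linear model y ~ X . beta by batch gradient descent.

    Each row of X is one quantized data sample (L, C, T); y holds the data
    labels (0 or 1 in Fig. 7). Minimizes the mean squared error.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for row, target in zip(X, y):
            error = sum(b * x for b, x in zip(beta, row)) - target
            for j in range(d):
                grad[j] += 2.0 * error * row[j] / n  # d(MSE)/d(beta_j)
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta
```

The fitted `beta` plays the role of the quantization weights $\beta_1$, $\beta_2$, $\beta_3$. The learning rate and epoch count are untuned defaults. Features on very different scales may need normalization first, or the step size must shrink for the iteration to converge.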
In one embodiment of the present application, predicting the quantization weights corresponding to the respective numerical fields in the quantized data from the training samples may include: forming a data sample matrix and a data label matrix from the quantized data samples and the data labels in the training samples, respectively, according to data type; and performing a matrix operation on the data sample matrix and the data label matrix to obtain a weight matrix, where the weight matrix includes the quantization weights corresponding to the numerical fields in the quantized data.
Taking the training samples shown in FIG. 7 as an example, the data labels 701 may constitute a data label matrix $Y = (1, \dots, 0)^{T}$, i.e., a column vector with one label per training sample.
The quantized data samples 702 may constitute a data sample matrix $X$, in which each row holds the quantized fields $(L, C, T)$ of one training sample.
The data label matrix $Y$ and the data sample matrix $X$ may be operated on with the following matrix formula to obtain the weight matrix $A$:
$$A = (X^{T}X)^{-1}X^{T}Y$$
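The formula above is the ordinary least-squares (normal equation) solution. A minimal pure-Python sketch follows, assuming X has one row per training sample and one column per numerical field and that $X^{T}X$ is invertible; the helper names are hypothetical. It solves $(X^{T}X)A = X^{T}Y$ by Gaussian elimination instead of forming the inverse explicitly, which yields the same A and is numerically more robust.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the weights are not uniquely determined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def normal_equation_weights(X: Matrix, Y: List[float]) -> List[float]:
    """A = (X^T X)^{-1} X^T Y, computed by solving (X^T X) A = X^T Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(xi * yi for xi, yi in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)
```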
In an embodiment of the present application, forming the data sample matrix and the data label matrix from the quantized data samples and the data labels in the training samples according to data type may include: screening the training samples to obtain a set number of related samples that are relevant to the service data; and forming the data sample matrix and the data label matrix from the quantized data samples and the data labels of the related samples, respectively, according to data type. Because the matrix operation consumes more computing resources as the volume of training samples grows, restricting it to a set number of related samples limits this cost.
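The application does not specify how data relevance is measured. Purely as an assumption, the sketch below treats relevance as the Euclidean distance between the quantized fields of the current service data and those of each training sample, and keeps the k closest samples; `select_related_samples` and the choice of distance are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def select_related_samples(
    query: Sequence[float],
    samples: List[Sequence[float]],
    labels: List[float],
    k: int,
) -> Tuple[List[Sequence[float]], List[float]]:
    """Keep the k training samples closest (Euclidean distance) to the query vector."""
    nearest = heapq.nsmallest(
        k,
        range(len(samples)),
        key=lambda i: math.dist(query, samples[i]),  # assumed relevance measure
    )
    return [samples[i] for i in nearest], [labels[i] for i in nearest]
```

The selected samples and labels can then be passed to the matrix operation in place of the full training set.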
In an embodiment of the present application, sorting the matching data according to the values of the numerical fields in the quantized data in step S620 may include: acquiring the sorting priority corresponding to each numerical field in the quantized data; and sorting the matching data according to the sorting priorities and the values of the numerical fields. For example, the matching data are first sorted by the numerical field with the highest sorting priority; matching data with equal values in that field are then sorted by the field with the second-highest sorting priority, and so on, until the numerical fields at all sorting priorities have been compared.
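Sorting by priority in this way is a lexicographic multi-key sort. A minimal sketch follows, assuming each piece of matching data is represented as a mapping from field name to value, that a larger value ranks higher, and that `priority` lists the field names from highest to lowest priority (all hypothetical names).

```python
from typing import Dict, List


def sort_by_priority(records: List[Dict[str, float]], priority: List[str]) -> List[Dict[str, float]]:
    """Sort descending by priority[0], breaking ties with priority[1], and so on."""
    # Negating each value turns Python's ascending tuple comparison into descending order.
    return sorted(records, key=lambda r: tuple(-r[field] for field in priority))
```

Because tuples are compared element by element, the second field is consulted only when the first field ties, which matches the tie-breaking described above.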
In an embodiment of the application, after the matching data are sorted according to the values of the numerical fields in the quantized data to obtain an ordered data list, hyperlink text related to the data source of the business data may be added to the data list, and a keyword cloud related to that data source may be displayed in response to a trigger operation on the hyperlink text. The keyword cloud allows the basic information of a data source (such as a specified transaction user) to be checked visually.
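The application does not describe how the keyword cloud is built. One plausible way to obtain its input, offered here only as an assumption, is to count how often each keyword was hit in texts from the selected data source; a word-cloud renderer would then size each keyword by its count. The function name and the (data source, hit keyword) input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def keyword_cloud_weights(hits: Iterable[Tuple[str, str]], source: str) -> Dict[str, int]:
    """Count how often each keyword was hit in texts from one data source.

    `hits` is assumed to be a sequence of (data_source, hit_keyword) pairs
    taken from the matching data.
    """
    return dict(Counter(keyword for src, keyword in hits if src == source))
```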
In step S630, target data for evaluating the abnormal risk of the business data is selected from the data list in the order of arrangement.
FIG. 8 is a schematic diagram of a data list interface in one embodiment of the present application. As shown in the figure, the data list includes fields for the related client, text content, hit keyword, selection probability, and keyword type. The rows are sorted from top to bottom in the order determined in step S620, so the text content at the top best represents the abnormal risk involved in the related business data. The related client field gives the client to which the text content belongs, i.e., the data source of the business data. Adding hyperlink text to the related client field lets the user check the client's basic information and understand more easily the context in which the text occurs. The hit keyword field shows which keywords the text matched, and the keyword type field shows the risk type to which the text relates.
In an embodiment of the application, after the target data for evaluating the abnormal risk of the business data are extracted from the matching data, an evaluation message can be generated automatically according to business requirements. Specifically, in response to a selection operation on a keyword type, the keyword texts corresponding to the selected keyword type are extracted from the target data, and the keyword texts are filled into a preset message template to generate an evaluation message for evaluating the abnormal risk of the business data. For example, when an auditor selects the risk type of a case, the top-ranked keywords of the same risk type and the same level can be selected according to that risk type, and a description of the related text can be generated automatically, which improves the efficiency of message generation.
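A minimal sketch of the template-filling step, using Python's standard `string.Template`. The template wording, the row fields `keyword` and `keyword_type`, and the function name are hypothetical; the application only states that keyword texts of the selected type are filled into a preset template.

```python
from string import Template
from typing import Dict, List

# Hypothetical preset message template; the actual wording is not specified.
MESSAGE_TEMPLATE = Template("Risk type: $risk_type. The following keywords were hit: $keywords.")


def generate_evaluation_message(target_data: List[Dict[str, str]], selected_type: str) -> str:
    """Extract keyword texts of the selected type and fill them into the template."""
    keywords: List[str] = []
    for row in target_data:
        # Keep first-seen order (the list is already ranked) and drop duplicates.
        if row["keyword_type"] == selected_type and row["keyword"] not in keywords:
            keywords.append(row["keyword"])
    return MESSAGE_TEMPLATE.substitute(risk_type=selected_type, keywords=", ".join(keywords))
```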
By using the keyword database, the data processing method can quickly match all text information and identify the risks in it faster, while ensuring comprehensive risk coverage. During routine manual auditing, this can greatly speed up due diligence without omitting related risks. In addition, the embodiments of the application directly output the texts that represent the main risks, so that the auditor can quickly analyze the nature of the case; this also avoids leaving part of the text unidentified when the volume of text information is very large, which helps ensure compliance to a certain extent.
Taking due diligence as an example application, the embodiments of the application can quickly complete the extraction and analysis of text information in due diligence, while ensuring the timeliness, risk concentration, and balance of risk severity of the main risk texts extracted.
When text information is retrieved manually, a single search takes about 10 s or more, and when the related text content is large, usually only the first 10% to 20% of the text information is extracted for analysis. According to the embodiments of the application, the speed can be increased by 80% or more, and 100% of the text information can be searched quickly, so that risks are covered more comprehensively.
In addition, the individual dimensions of a text cannot be accurately quantified during manual analysis. Because the embodiments of the application quantify these dimensions, the extracted risks are more objective, and compared with manual retrieval, the timeliness and other properties of the extracted text information can be effectively guaranteed.
The embodiments of the application can effectively improve the efficiency of auditing throughout the due diligence process, while increasing the degree of risk coverage and effectively ensuring the compliance of due diligence.
It should be noted that although the steps of the methods in this application are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in that particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be divided into multiple steps for execution.
The following describes apparatus embodiments of the present application, which may be used to perform the data processing method in the above-described embodiments of the present application. Fig. 9 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present application. As shown in Fig. 9, the data processing apparatus 900 includes: a service data obtaining module 910 configured to obtain service data to be processed, where the service data includes a service text and a service attribute associated with the service text, and the service attribute indicates the data acquisition source and data acquisition time of the service data; a keyword data obtaining module 920 configured to obtain keyword data for performing anomaly detection on the service data, where the keyword data includes a keyword text and a keyword attribute associated with the keyword text, and the keyword attribute indicates the keyword type and the abnormal risk level of the keyword data; a match detection module 930 configured to perform match detection on the keyword text and the service text to determine whether the service text contains the keyword text; a data combination module 940 configured to combine the service data and the keyword data into matching data if the keyword text is successfully matched with the service text; and a data extraction module 950 configured to extract, from the matching data, target data for evaluating the abnormal risk of the business data.
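How modules 910 to 940 cooperate to form matching data can be sketched as follows. This is an illustrative sketch only: the class and field names are hypothetical, and match detection is assumed to be plain substring containment, which the application does not mandate (for space-delimited languages a token-level match may be preferable, since a substring can match inside a longer word).

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ServiceData:
    text: str           # service text
    source: str         # data acquisition source
    collected_on: date  # data acquisition time


@dataclass
class KeywordData:
    text: str           # keyword text
    keyword_type: str   # keyword type
    risk_level: int     # abnormal risk level


@dataclass
class MatchingData:
    service: ServiceData
    keyword: KeywordData


def match_and_combine(services: List[ServiceData], keywords: List[KeywordData]) -> List[MatchingData]:
    """Pair each service record with every keyword whose text it contains."""
    return [
        MatchingData(service=s, keyword=k)
        for s in services
        for k in keywords
        if k.text in s.text  # assumed match detection: substring containment
    ]
```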
In some embodiments of the present application, based on the above technical solutions, the data extraction module 950 includes: a quantization processing module configured to perform quantization processing on the matching data according to the service attribute and the keyword attribute to obtain quantized data corresponding to the matching data, wherein the quantized data are numerical data comprising one or more numerical fields; a data sorting module configured to sort the matching data according to the values of the numerical fields in the quantized data to obtain an ordered data list; and a data selecting module configured to select, from the data list in the order of arrangement, target data for evaluating the abnormal risk of the business data.
In some embodiments of the present application, based on the above technical solutions, the quantization processing module includes: a field extraction module configured to extract, from the matching data, a first service field related to the service attribute, a second service field related to the keyword attribute, and a third service field related to statistical information of the matching data; a field quantization module configured to perform quantization processing on the first service field, the second service field, and the third service field according to the field type of each service field, to obtain a numerical field corresponding to each service field; and a field combination module configured to combine the numerical fields into quantized data corresponding to the matching data.
In some embodiments of the present application, based on the above technical solution, the first service field includes a time field for indicating a collection time of the service data, the second service field includes a level field for indicating an abnormal risk level of the keyword data, and the third service field includes a frequency field for indicating a frequency of occurrence of the keyword text in the matching data.
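A hedged sketch of how these three fields might be quantized into the numerical fields L, C, and T. The concrete mappings are assumptions made for illustration: the level field is taken directly as the numeric risk level, the frequency field as the number of non-overlapping occurrences of the keyword text in the service text, and the time field as the number of days between the collection time and a hypothetical reference date `today`.

```python
from datetime import date
from typing import Tuple


def quantize(
    service_text: str,
    collected_on: date,
    keyword_text: str,
    risk_level: int,
    today: date,
) -> Tuple[float, float, float]:
    """Return the numerical fields (L, C, T) for one piece of matching data."""
    level = float(risk_level)                            # L: abnormal risk level (assumed numeric)
    frequency = float(service_text.count(keyword_text))  # C: occurrences of the keyword text
    age_days = float((today - collected_on).days)        # T: days since collection (assumed)
    return level, frequency, age_days
```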
In some embodiments of the present application, based on the above technical solutions, the data sorting module includes: a weight obtaining module configured to obtain quantization weights corresponding to the respective numerical fields in the quantized data; a field weighting module configured to perform weighted summation on the numerical fields according to the quantization weights to obtain the selection probability of the matching data; and a probability sorting module configured to sort the matching data according to the selection probability.
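The weighting and sorting described here, together with selecting target data from the front of the sorted list, can be illustrated with the sketch below. It assumes the quantization weights have already been obtained (for example by either weight calculation method described above) and uses the raw weighted sum as the selection score; if a value in [0, 1] is required, passing the sum through a function such as the logistic sigmoid would be one option, which the application does not specify. The function name and the `top_k` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_and_select(
    quantized: List[Sequence[float]],
    weights: Sequence[float],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Score each matching-data row by a weighted sum of its fields and keep the top_k.

    Returns (row_index, score) pairs ordered from highest to lowest score.
    """
    scores = [
        (i, sum(w * x for w, x in zip(weights, fields)))
        for i, fields in enumerate(quantized)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```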
In some embodiments of the present application, based on the above technical solutions, the weight obtaining module includes: a sample acquisition module configured to acquire quantized data samples obtained by quantizing historical service data and to acquire data labels obtained by evaluating the abnormal risks of the historical service data; a sample combination module configured to combine the quantized data samples and the data labels into training samples; and a weight prediction module configured to predict, from the training samples, the quantization weights corresponding to the respective numerical fields in the quantized data.
In some embodiments of the present application, based on the above technical solutions, the weight prediction module includes: a model obtaining module configured to obtain a weight prediction model for performing weight prediction on the quantized data; an iterative training module configured to iteratively train the weight prediction model according to the training samples; a model prediction module configured to predict quantization weights corresponding to respective numerical fields in the quantized data through a trained weight prediction model.
In some embodiments of the present application, based on the above technical solutions, the weight prediction module includes: a matrix combination module configured to form a data sample matrix and a data label matrix from the quantized data samples and the data labels in the training samples, respectively, according to data type; and a matrix operation module configured to perform a matrix operation on the data sample matrix and the data label matrix to obtain a weight matrix, where the weight matrix includes the quantization weights corresponding to the respective numerical fields in the quantized data.
In some embodiments of the present application, based on the above technical solutions, the matrix combination module is configured to: screen the training samples to obtain a set number of related samples that are relevant to the service data; and form a data sample matrix and a data label matrix from the quantized data samples and the data labels in the related samples, respectively, according to data type.
In some embodiments of the present application, based on the above technical solutions, the data sorting module is configured to: acquire the sorting priority corresponding to each numerical field in the quantized data; and sort the matching data according to the sorting priorities and the values of the numerical fields.
In some embodiments of the present application, based on the above technical solutions, the data processing apparatus further includes: a hyperlink adding module configured to add, to the data list, hyperlink text related to the data source of the service data; and a keyword cloud presentation module configured to present a keyword cloud associated with the data source of the business data in response to a trigger operation on the hyperlink text.
In some embodiments of the present application, based on the above technical solutions, the data processing apparatus further includes: a text selection module configured to extract, in response to a selection operation on a keyword type, the keyword texts corresponding to the selected keyword type from the target data; and a message generation module configured to fill the keyword texts into a preset message template to generate an evaluation message for evaluating the abnormal risk of the business data.
The specific details of the data processing apparatus provided in each embodiment of the present application have been described in detail in the corresponding method embodiment, and are not described herein again.
Fig. 10 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the present application.
It should be noted that the computer system 1000 of the electronic device shown in Fig. 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for system operation. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a local area network card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage section 1008 as needed.
In particular, according to embodiments of the present application, the processes described in the method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising program code for performing the methods illustrated in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the CPU 1001, the various functions defined in the system of the present application are performed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among and embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by a combination of software and necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and includes several instructions that enable a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.