Disclosure of Invention
In view of the above, the present application provides a method, a system, a computer and a readable storage medium for predicting a risk merchant.
The embodiment of the application provides a risk merchant prediction method, which comprises the following steps:
acquiring shop information, wherein the shop information comprises product information, and the product information comprises product transaction data;
deriving a preset number of to-be-measured transaction characteristic variable sets based on the product transaction data;
screening out a primary transaction characteristic variable set based on a preset sample set and a distributed gradient enhancement library;
screening out a secondary transaction characteristic variable set based on the primary transaction characteristic variable set and the transaction characteristic variable set to be tested;
screening out an optimal distributed gradient enhancement library based on the secondary transaction characteristic variable set and the distributed gradient enhancement library;
and predicting a risk merchant based on the optimal distributed gradient enhancement library and the shop information.
Further, in the risk merchant prediction method, the preset sample set includes a positive sample set and a negative sample set, and the screening the first-level transaction characteristic variable set based on the preset sample set and the distributed gradient enhancement library includes:
based on a preset sample set and a distributed gradient enhancement library, obtaining an AUC value of each pair of positive and negative sample groups;
and screening the primary transaction characteristic variable set based on the AUC value and a preset threshold value.
Further, in the method for predicting a risk merchant, the transaction characteristic variable corresponds to a transaction characteristic, and the screening the secondary transaction characteristic variable set based on the primary transaction characteristic variable set and the to-be-measured transaction characteristic variable set includes:
and taking the to-be-measured transaction characteristic variable set which is the same as the transaction characteristic in the primary transaction characteristic variable set as the secondary transaction characteristic variable set.
Further, in the method for predicting a risk merchant, the screening the optimal distributed gradient enhancement library based on the secondary transaction characteristic variable set and the distributed gradient enhancement library includes:
obtaining a weight value of each secondary transaction characteristic variable set based on the secondary transaction characteristic variable set and a distributed gradient enhancement library;
deleting the secondary transaction characteristic variable set corresponding to the weight value lower than the preset weight value to obtain an optimal transaction characteristic variable set;
training the distributed gradient enhancement library based on the optimal transaction characteristic variable set to obtain an optimal distributed gradient enhancement library.
Further, in the risk merchant prediction method, the shop information includes shop registration information, and the product information further includes a product picture.
Further, in the risk merchant predicting method, the shop registration information includes a shop name, a control person name of the shop, and a business license of the shop; the predicting risk merchants based on the optimal distributed gradient enhancement library and the shop information comprises the following steps:
obtaining the similarity of each product picture based on the product picture;
based on the shop registration information, obtaining an actual meaning value of the shop name and shop information of all other shops associated with the controller;
and predicting the risk merchant through the optimal distributed gradient enhancement library based on the similarity, the actual meaning value and the shop information of the other shops.
Further, in the method for predicting a risk merchant, after the screening of the secondary transaction characteristic variable set and before the screening of the optimal distributed gradient enhancement library, the method further includes:
obtaining a three-level transaction characteristic variable set based on the two-level transaction characteristic variable set and a preset screening rule;
screening an optimal distributed gradient enhancement library based on the three-level transaction characteristic variable set and the distributed gradient enhancement library;
the preset screening rule comprises the following steps:
and deleting the preset transaction characteristic variable set in the secondary transaction characteristic variable set.
Another embodiment of the present application further provides a risk shop prediction system, including:
the acquisition unit is used for acquiring shop information, wherein the shop information comprises product information, and the product information comprises product transaction data;
the deriving unit is used for deriving a preset number of to-be-measured transaction characteristic variable sets based on the product transaction data;
the first screening unit is used for screening out a primary transaction characteristic variable set based on a preset sample set and a distributed gradient enhancement library;
the second screening unit is used for screening out a secondary transaction characteristic variable set based on the primary transaction characteristic variable set and the transaction characteristic variable set to be tested;
the third screening unit is used for screening out an optimal distributed gradient enhancement library based on the secondary transaction characteristic variable set and the distributed gradient enhancement library;
and the prediction unit is used for predicting the risk merchant based on the optimal distributed gradient enhancement library and the shop information.
Another embodiment of the present application also proposes a computer including a storage unit in which a computer program is stored, and a processing unit that executes the steps of the risk-merchant prediction method described above by calling the computer program stored in the storage unit.
Another embodiment of the present application also proposes a computer readable storage medium storing a computer program adapted to be loaded by a processor for performing the steps of the risk merchant prediction method as described above.
The embodiment of the application has the following beneficial effects:
the embodiment of the application provides a risk merchant prediction method, which comprises the steps of obtaining a primary transaction characteristic variable set through a preset sample set, carrying out derivation, optimization and refinement by utilizing the primary transaction characteristic variable set and the obtained shop information, carrying out layer-by-layer screening through a model to obtain an optimal model, and predicting the shop information through the optimal model. The method can continuously learn from newly acquired data to optimize the model, and can also improve the accuracy of the prediction result.
Detailed Description
The embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terms "comprises," "comprising," "including," or any other variation thereof, as may be used in various embodiments of the present application, are intended to indicate the presence of a specific feature, number, step, operation, element, component, or combination of the foregoing, and are not intended to exclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the application belong. The terms (such as those defined in commonly used dictionaries) will be interpreted as having a meaning that is the same as the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments of the application.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below and features of the embodiments may be combined with each other without conflict.
Generally, the risk prevention and control measures of current e-commerce platforms rely mainly on blacklists, expert experience models and similar identification methods. However, as customer groups extend into lower-tier markets and as merchant business models and illegal black and gray industry chains continue to expand, the existing measures and manpower can no longer meet current risk prevention and control needs. This is mainly reflected in the following three aspects:
First, obtaining suspected risk merchants from the public security department, the clearing association, data service providers and the like is inefficient and cannot meet the requirements of compliant platform business development.
Second, the existing expert experience model is summarized and refined from historical illegal merchants and relies mainly on past experience. For monitoring novel online gambling risk transactions, it lacks self-learning capability, lags noticeably, and has low accuracy.
Third, the behavior data of platform merchants is one-dimensional, the merchant fund chain cannot be fully utilized, a closed loop cannot be formed, and the needs of a continuously growing business scale cannot be met.
Therefore, in order to solve the above problems, the present application provides a method for predicting risk merchants, which can be applied in the field of financial technology and in other fields.
Referring to fig. 1, a flow chart of a method for predicting a risk merchant according to an embodiment of the application is shown. The risk merchant prediction method is exemplarily applied to a transaction platform; for example, the transaction platform can be an e-commerce platform in the field of financial technology.
In some embodiments, as shown in fig. 1, a method for risk merchant prediction includes:
S110, acquiring shop information, wherein the shop information comprises product information, and the product information comprises product transaction data.
Specifically, the shop information in the embodiment is derived from a plurality of electronic commerce platforms, public security departments, financial regulatory authorities, full-volume industrial and commercial enterprise information service providers, and the like.
S210, deriving a preset number of to-be-tested transaction characteristic variable sets based on the product transaction data.
Specifically, the product transaction data includes recharge records, withdrawal records, transfer information, and the like. Because each shop has relatively little data, more data needs to be derived from multiple dimensions and angles through feature engineering for subsequent screening.
Illustratively, the recharge records may be used to derive the recharge amount over the past 3 minutes, 7 minutes, 1 hour, 24 hours, 3 days, 7 days, 1 month, and the like. Likewise, the withdrawal records may be used to derive the withdrawal amount over the past 3 minutes, 7 minutes, 1 hour, 24 hours, 3 days, 7 days, and 1 month.
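Illustratively, the derivation described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the record layout (timestamp, kind, amount), the function name `derive_window_amounts`, and the approximation of "1 month" as 30 days are hypothetical and are not specified by the application.

```python
from datetime import datetime, timedelta

# Trailing windows mirroring the examples in the text.
# "1 month" is approximated as 30 days (assumption).
WINDOWS = {
    "3min": timedelta(minutes=3),
    "7min": timedelta(minutes=7),
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "3d": timedelta(days=3),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def derive_window_amounts(records, now, kind):
    """Sum the amounts of `kind` records inside each trailing window ending at `now`.

    records: iterable of (timestamp, kind, amount) tuples (hypothetical layout).
    Returns a dict such as {"recharge_amount_3min": 50.0, ...}.
    """
    records = list(records)  # allow generators to be scanned once per window
    features = {}
    for label, span in WINDOWS.items():
        start = now - span
        total = sum(
            amount
            for ts, rec_kind, amount in records
            if rec_kind == kind and start < ts <= now
        )
        features[f"{kind}_amount_{label}"] = total
    return features
```

Calling the function once with `kind="recharge"` and once with `kind="withdrawal"` yields the two families of derived variables mentioned in the example.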
S310, screening out a primary transaction characteristic variable set based on a preset sample set and a distributed gradient enhancement library.
Specifically, the distributed gradient enhancement library (eXtreme Gradient Boosting, XGBoost) is a scalable machine learning system for tree boosting. It implements machine learning algorithms under the Gradient Boosting framework, and can quickly and accurately solve many data science problems. These problems include: store sales prediction, high-energy physics event classification, web text classification, customer behavior prediction, motion detection, advertisement click-through rate prediction, malware classification, product classification, risk prediction, large-scale online course dropout rate prediction, and the like.
The application inputs a preset sample set into the XGBoost model for screening. The preset sample set is a set formed by preset samples, and the samples are transaction behavior data of merchants, such as transaction records, withdrawal records, transfer information and the like.
In some embodiments of the risk merchant prediction method, as shown in fig. 2, the preset sample set includes a positive sample set and a negative sample set, and the screening of the first-level transaction characteristic variable set based on the preset sample set and the distributed gradient enhancement library includes:
S311, obtaining the AUC value of each pair of positive and negative sample groups based on a preset sample set and a distributed gradient enhancement library.
Specifically, the preset sample set includes a positive sample set and a negative sample set, wherein a positive sample is transaction behavior data of a normal user, and a negative sample is transaction behavior data of an abnormal user (risk user). The model randomly extracts one sample from the positive sample set and one from the negative sample set to form a positive and negative sample group, and predicts each positive and negative sample group to obtain the corresponding AUC value.
The AUC value is a measure of the ranking ability of the model. Given a randomly chosen positive sample and a randomly chosen negative sample, the AUC is the probability that the model's predicted probability for the positive sample is larger than its predicted probability for the negative sample. The higher the AUC value, the better the ranking ability of the model; if the predicted probabilities of all positive samples are higher than those of all negative samples, the AUC of the model is 1.
S312, screening out a first-level transaction characteristic variable set based on the AUC value and a preset threshold value.
Specifically, since each positive and negative sample group corresponds to an AUC value, the groups with larger AUC values need to be screened out for subsequent reference. The sample groups whose AUC value is not smaller than the preset threshold value are taken as the primary transaction characteristic variable set.
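Illustratively, steps S311 and S312 can be sketched as follows. This is a minimal sketch, not the implementation of the application: the AUC is computed directly from its pairwise definition (ties counted as one half, which is a common convention and an assumption here), and the names `pairwise_auc`, `screen_by_auc` and the `groups` mapping are hypothetical. The scores are assumed to be the model's predicted probabilities.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC from its pairwise definition.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; a tie counts as half a win (assumption).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def screen_by_auc(groups, threshold):
    """Keep the names of groups whose AUC is not smaller than `threshold`.

    groups: dict mapping a group name to (positive scores, negative scores).
    """
    return [
        name
        for name, (pos, neg) in groups.items()
        if pairwise_auc(pos, neg) >= threshold
    ]
```

The quadratic pairwise loop is chosen for readability; for large sample sets a rank-based computation would normally be preferred.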
S410, screening out a secondary transaction characteristic variable set based on the primary transaction characteristic variable set and the transaction characteristic variable set to be tested.
In some embodiments of the method for predicting risk merchants, the transaction characteristic variable corresponds to a transaction characteristic, and the screening of the secondary transaction characteristic variable set based on the primary transaction characteristic variable set and the to-be-measured transaction characteristic variable set includes:
and taking the to-be-measured transaction characteristic variable set which is the same as the transaction characteristic in the primary transaction characteristic variable set as the secondary transaction characteristic variable set.
Specifically, the transaction characteristics corresponding to the primary transaction characteristic variable set have strong distinguishing capability on the positive sample and the negative sample, so that the transaction characteristics to be tested are screened by taking the transaction characteristics corresponding to the primary transaction characteristic variable set as a reference.
Exemplarily, suppose the transaction characteristics corresponding to the primary transaction characteristic variable set are the 1-hour recharge amount and the 1-hour withdrawal amount. Then all variables among the to-be-measured transaction characteristic variables that correspond to the 1-hour recharge amount and the 1-hour withdrawal amount are taken as the secondary transaction characteristic variable set. It will be appreciated that if, during a transaction, a large amount is recharged and a large amount is then withdrawn within a short period of time, the transaction is likely to be a risky transaction.
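Illustratively, the matching in step S410 can be sketched as follows. This is a minimal sketch under the assumption that each to-be-measured variable is labeled with the transaction characteristic it measures; the function name `select_secondary` and the variable names in the usage note are hypothetical.

```python
def select_secondary(candidates, primary_features):
    """Select the to-be-measured variables whose transaction characteristic
    also appears in the primary transaction characteristic variable set.

    candidates: dict mapping a variable name to the transaction
        characteristic it measures (hypothetical layout).
    primary_features: iterable of transaction characteristics retained
        by the primary screening.
    """
    keep = set(primary_features)
    return [name for name, feature in candidates.items() if feature in keep]
```

For instance, with primary characteristics `recharge_amount_1h` and `withdrawal_amount_1h`, only the candidate variables labeled with one of these two characteristics are retained, in their original order.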
S510, screening out an optimal distributed gradient enhancement library based on the secondary transaction characteristic variable set and the distributed gradient enhancement library.
In some embodiments of the method for predicting a risk merchant, as shown in fig. 3, screening an optimal distributed gradient enhancement library based on a secondary transaction characteristic variable set and the distributed gradient enhancement library includes:
S511, obtaining the weight value of each secondary transaction characteristic variable set based on the secondary transaction characteristic variable set and the distributed gradient enhancement library.
Specifically, the XGBoost model may output a weight value corresponding to each secondary transaction characteristic variable set. The weight value represents the importance of the variable: the larger the weight value, the more important the variable. Therefore, the more important variables need to be extracted.
S512, deleting the secondary transaction characteristic variable set corresponding to the weight value lower than the preset weight value to obtain an optimal transaction characteristic variable set.
Specifically, the variables with smaller weight values are deleted so that the variables with larger weight values are retained, and all the variables remaining after deletion are taken as the optimal transaction characteristic variable set.
S513, training the distributed gradient enhancement library based on the optimal transaction characteristic variable set to obtain an optimal distributed gradient enhancement library.
Specifically, after the optimal transaction characteristic variable set is obtained, the XGBoost model is trained with these variables to obtain an optimal XGBoost model (i.e., an optimal distributed gradient enhancement library). The optimal distributed gradient enhancement library is then used to predict subsequent data.
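Illustratively, the deletion in step S512 can be sketched as follows. This is a minimal sketch assuming the weight values are already available as a mapping from variable name to weight. Such a mapping could, for example, be obtained from the `Booster.get_score` method of the XGBoost Python package, but the application does not name the call, so that is an assumption. The function name `drop_low_weight` is hypothetical.

```python
def drop_low_weight(weights, min_weight):
    """Delete the variables whose weight is lower than `min_weight`.

    weights: dict mapping a variable name to its weight value.
    Returns a new dict containing only the retained variables, which
    correspond to the optimal transaction characteristic variable set.
    """
    return {name: w for name, w in weights.items() if w >= min_weight}
```

A variable whose weight equals the preset weight value is retained, since the text deletes only weights lower than the preset value. The retained variable names would then be used to select the training columns in step S513.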
In the method for predicting risk merchants of some embodiments, after screening out the secondary transaction characteristic variable set and before screening out the optimal distributed gradient enhancement library, the method further comprises:
and obtaining a three-level transaction characteristic variable set based on the two-level transaction characteristic variable set and a preset screening rule.
Screening out an optimal distributed gradient enhancement library based on the three-level transaction characteristic variable set and the distributed gradient enhancement library;
the preset screening rules comprise:
and deleting the preset transaction characteristic variable set in the secondary transaction characteristic variable set.
Specifically, in actual testing, the transaction characteristic variable set obtained after deleting the secondary transaction characteristic variable sets whose weight is lower than the preset weight is often still not optimal, because it must also be checked against the actual situation. For example, a commodity may be aimed at young people while the purchasers are found to be much older (60, 70, 80 years old, etc.), or the transaction amounts may be so large that they are seriously inconsistent with reality. Therefore, after weighting, the secondary transaction characteristic variable set needs to be screened again, for example by deleting characteristic variables that do not match the actual product.
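Illustratively, the preset screening rule can be sketched as follows. This is a minimal sketch assuming the rule is expressed as a preset collection of variable names to delete; how that collection is determined (for example, by checking variables against the actual product) is outside the sketch, and the names `apply_screening_rule` and `preset_to_delete` are hypothetical.

```python
def apply_screening_rule(secondary, preset_to_delete):
    """Delete the preset variables from the secondary set.

    secondary: list of variable names in the secondary transaction
        characteristic variable set.
    preset_to_delete: iterable of variable names fixed in advance by the rule.
    Returns the three-level transaction characteristic variable set,
    preserving the original order.
    """
    blocked = set(preset_to_delete)
    return [name for name in secondary if name not in blocked]
```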
S610, predicting risk merchants based on the optimal distributed gradient enhancement library and the shop information.
Specifically, the obtained optimal XGBoost model is utilized to predict the information of the follow-up shops.
In some embodiments of the risk merchant prediction method, the store information includes store registration information, and the product information further includes a product picture.
In some embodiments of the risk merchant prediction method, the merchant registration information includes a merchant name, a merchant's control person name, and a merchant's business license; predicting a risk merchant based on the optimal distributed gradient enhancement library and the merchant information, comprising:
and obtaining the similarity of each product picture based on the product pictures.
Specifically, all the product pictures of each shop are compared to obtain the similar pictures and their similarity. The transaction amount corresponding to pictures with high similarity can be used as one of the criteria for judging a risk merchant.
Exemplarily, suppose a shop includes products A, B and C. The picture of product A is compared in turn with the pictures of product B and product C, and the pictures of product A and product C are found to have the highest similarity, eighty percent; the similarity corresponding to product A and product C is therefore eighty percent.
Based on the store registration information, the actual meaning value of the store name of the store, and store information of all other stores associated with the controller are obtained.
Specifically, if the commodity name and the shop name in the shop registration information do not belong to the same category, or the shop name has no actual meaning (the actual meaning value indicates whether the name has an actual meaning, and its value range is 0 to 1), for example when the shop name is prefixed with the letters AAA, this can serve as one of the criteria for judging a risk merchant. The shop information of other shops associated with the controller is also one of the criteria. The shop information acquired for the other shops has the same content as the shop information acquired for the shop in question.
And predicting the risk merchant through the optimal distributed gradient enhancement library based on the similarity, the actual meaning value and the shop information of other shops.
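Illustratively, the picture comparison described above can be sketched as follows. The application does not state how similarity is computed, so this sketch makes its own assumptions: each product picture is assumed to be already represented by a numeric feature vector, and cosine similarity is used as the similarity measure. The function names `cosine` and `most_similar_pairs` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_pairs(pictures):
    """For each product, find the other product with the most similar picture.

    pictures: dict mapping a product name to its picture feature vector
        (hypothetical representation).
    Returns a dict mapping product -> (most similar product, similarity).
    """
    best = {}
    for a, b in combinations(pictures, 2):
        s = cosine(pictures[a], pictures[b])
        # Record the pair in both directions, keeping the highest similarity.
        for x, y in ((a, b), (b, a)):
            if x not in best or s > best[x][1]:
                best[x] = (y, s)
    return best
```

The resulting similarity values would then be supplied, together with the actual meaning value and the shop information of the other shops, as inputs to the optimal model.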
According to the risk shop prediction method, the primary transaction characteristic variable set is acquired through the preset sample set, the primary transaction characteristic variable set and the acquired shop information are utilized to conduct derivation, optimization and refinement, the model is screened layer by layer to obtain the optimal model, and the shop information is predicted through the optimal model. The method can continuously learn from newly acquired data to optimize the model, and can also improve the accuracy of the prediction result. Meanwhile, the method has the following advantages:
First, multiple data sources are integrated, including the blacklists of gambling-related and fraud-related entities provided by authorities such as the public security department, the national financial supervision administration and the clearing association, the merchant entry and exit records of the e-commerce platform, the business operation status of shops, and merchant financial behavior data. A knowledge graph is used to penetrate the multiple association relationships of customers, which enriches the existing risk monitoring dimensions and techniques and solves the current pain point that "know your customer" cannot be fully achieved.
Second, the problems of low operation efficiency and poor scalability can be solved, model errors are reduced, and model accuracy is improved.
Third, an AI algorithm model and a rule model are applied in combination, which enriches the means and modes of identifying online gambling risk groups, overcomes the defects of the existing rule model, and comprehensively improves the accuracy of identifying risk merchants.
Another embodiment of the present application also proposes a risk shop prediction system 700, as shown in fig. 4, the system 700 includes:
An acquisition unit 710, configured to acquire shop information, where the shop information includes product information, and the product information includes product transaction data.
A deriving unit 720, configured to derive a preset number of to-be-measured transaction characteristic variable sets based on the product transaction data.
The first screening unit 730 is configured to screen out a first-level transaction characteristic variable set based on a preset sample set and a distributed gradient enhancement library.
The second screening unit 740 is configured to screen out the secondary transaction characteristic variable set based on the primary transaction characteristic variable set and the to-be-tested transaction characteristic variable set.
A third screening unit 750, configured to screen out an optimal distributed gradient enhancement library based on the secondary transaction characteristic variable set and the distributed gradient enhancement library.
A prediction unit 760, configured to predict the risk merchant based on the optimal distributed gradient enhancement library and the shop information.
Another embodiment of the present application further provides a computer, including a storage unit and a processing unit, where the storage unit stores a computer program, and the processing unit executes the steps of the risk merchant prediction method by calling the computer program stored in the storage unit.
Another embodiment of the present application also proposes a computer readable storage medium storing a computer program adapted to be loaded by a processor to perform the steps of the above-mentioned risk merchant prediction method.
It will be appreciated that the method steps of the present embodiment correspond to the risk shop prediction method in the above embodiment, and that the options of the risk shop prediction method described above are equally applicable to the present embodiment, and will not be repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flow diagrams and block diagrams in the figures, which illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules or units in various embodiments of the application may be integrated together to form a single part, or the modules may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto; any variation or substitution that a person skilled in the art can readily conceive of shall fall within the scope of the present application.