Detailed Description
For a further understanding of the present application, the application is described in detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein merely illustrate the application and do not limit it. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.
As shown in Figs. 1-3, the bill payment system based on service configuration in this embodiment may specifically perform the following steps:
S101, acquiring a transaction risk level and a user credit score according to the business configuration parameters of the bill payment system.
The transaction amount, transaction time, and transaction object information are obtained from the transaction request. Based on historical transaction data and the user's credit records, a logistic regression model evaluates the transaction risk and outputs a transaction risk score. Different measures are taken according to the transaction risk level: if the transaction risk score is greater than or equal to a preset first risk threshold and less than a preset second risk threshold, the user is required to perform identity verification; if the transaction risk score is greater than or equal to the preset second risk threshold, the upper limit of the transaction amount is restricted. The evaluated transaction risk score, together with the transaction amount and the transaction object information, is then used as input to a random forest algorithm that calculates the user's credit score in real time, and the user's credit score is dynamically updated according to the calculation result. The transaction risk score and the user's credit score are fed into the business rule engine for bill payment, which dynamically adjusts system policies, including transaction limits and transaction permissions, according to preset rules. After the transaction is completed, the transaction result and user feedback are obtained and used as sample data to retrain the logistic regression model so that its performance is continuously optimized, and the business rules are checked periodically so that they remain matched to the risk control requirements.
Specifically, in the bill payment system, when a shopping transaction request for 500 yuan is received, the system first obtains the amount and time of the transaction and the identity information of the transaction object. Next, the system retrieves the user's transaction records for the last 6 months from the historical transaction database and finds that the user made 3 similar shopping transactions, for 480 yuan, 550 yuan, and 520 yuan respectively, and paid each on time. At the same time, the system learns from the user's credit profile that the user's credit score is 750 points (out of a maximum of 800 points), which places the user in the group with good credit. Based on this information, the system uses the logistic regression model to evaluate the risk of the transaction and calculates a risk score of 35, which is greater than or equal to the preset first risk threshold of 30 and less than the preset second risk threshold of 40, so the risk of the transaction is judged to be relatively low. Nevertheless, the system still requires the user to complete SMS verification code authentication to confirm the user's identity. After the identity authentication is passed, the system uses the random forest algorithm, taking as input 10 features of the user such as the transaction frequency over the last 6 months, the transaction amount, and the credit score, and calculates the user's credit score in real time, obtaining an updated credit score of 770 points. According to the preset business rules, the system dynamically adjusts the user's single-transaction limit to 800 yuan and approves the 500 yuan transaction request. After the transaction is completed, the system also obtains the user's 5-star rating of the shopping experience and incorporates it into the model training set as sample data.
Through periodic model retraining, the system optimizes the performance of the logistic regression risk assessment model and the random forest credit scoring model, and improves its risk prevention and control capability while ensuring service continuity.
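The threshold comparison in S101 can be illustrated with a minimal Python sketch. The threshold values 30 and 40 are the ones used in the example above; the function name and the returned action labels are hypothetical and serve only to make the branching explicit.

```python
# Thresholds from the worked example; the action labels are hypothetical names.
FIRST_RISK_THRESHOLD = 30
SECOND_RISK_THRESHOLD = 40


def risk_measure(risk_score, first=FIRST_RISK_THRESHOLD, second=SECOND_RISK_THRESHOLD):
    """Map a transaction risk score to the measure described in S101."""
    if risk_score >= second:
        # At or above the second threshold: restrict the transaction amount limit.
        return "limit_amount"
    if risk_score >= first:
        # Between the first and second thresholds: require identity verification.
        return "verify_identity"
    # Below the first threshold: no extra measure.
    return "proceed"


def within_limit(amount, single_transaction_limit):
    """Check a transaction amount against the (dynamically adjusted) limit."""
    return amount <= single_transaction_limit
```

With the figures of the example, `risk_measure(35)` selects identity verification, and `within_limit(500, 800)` holds for the 500 yuan request under the adjusted 800 yuan limit.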
S102, judging whether the transaction risk level and the user credit score trigger an insurance mechanism or not through a dynamic decision rule.
Acquiring transaction-related data from the transaction system, wherein the transaction-related data comprises the transaction amount, transaction time, transaction object, transaction type, and transaction place;
Preprocessing the transaction-related data, wherein the preprocessing comprises data cleaning, feature extraction, and normalization of the transaction-related data;
Inputting the preprocessed transaction-related data into a preset transaction risk assessment model, wherein the transaction risk assessment model adopts a random forest algorithm and outputs the transaction risk level corresponding to the transaction-related data;
Acquiring the user's historical transaction data and credit records from the user information system, wherein the historical transaction data and credit records comprise the user's past transaction frequency, transaction amount, transaction type, and credit default history;
Inputting the user's historical transaction data and credit records into a preset credit scoring model, wherein the credit scoring model adopts a logistic regression algorithm and outputs the credit score corresponding to the user;
Performing a weighted calculation on the transaction risk level and the credit score according to a preset risk control rule to obtain a comprehensive risk index;
If the comprehensive risk index exceeds a preset threshold, triggering the corresponding insurance mechanism, wherein the insurance mechanism comprises requiring additional identity verification, limiting the transaction amount, and delaying transaction execution.
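The weighted calculation and threshold check in the last two steps can be sketched as follows. The text does not disclose the weights or how a risk level is converted to a number, so the level-to-value mapping and the 0.6/0.4 weights below are purely illustrative assumptions; the sketch is not intended to reproduce any index value quoted in this description. The threshold of 50 and the maximum credit score of 800 are the values used in this embodiment's example.

```python
# Hypothetical numeric values for each risk level (not given in the text).
LEVEL_TO_RISK = {"low": 10.0, "medium": 50.0, "high": 90.0}


def composite_risk_index(risk_level, credit_score, max_credit=800.0,
                         w_risk=0.6, w_credit=0.4):
    """Weighted combination of transaction risk and credit risk (assumed weights)."""
    # A higher credit score means lower credit risk, scaled to 0-100.
    credit_risk = (1.0 - credit_score / max_credit) * 100.0
    return w_risk * LEVEL_TO_RISK[risk_level] + w_credit * credit_risk


def triggered_measures(index, threshold=50.0):
    """Return the insurance-mechanism measures if the index exceeds the threshold."""
    if index <= threshold:
        return []
    return ["additional_identity_verification",
            "limit_transaction_amount",
            "delay_execution"]
```

Under these assumed weights, a low-risk transaction from a user with a credit score of 750 stays well below the threshold and triggers no measure.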
Specifically, in the bill payment system of an e-commerce platform, when a shopping transaction request for 1000 yuan is received, the system first obtains the transaction data: the amount is 1000 yuan, the transaction time is 14:30 on May 10, 2023, the transaction object is a well-known brand clothing store, the transaction type is clothing consumption, and the transaction place is Haidian District, Beijing. The system then preprocesses the transaction-related data: abnormal values in the transaction amount and transaction time are removed through data cleaning, 5 key features such as transaction amount and transaction type are extracted, and the feature data is normalized using min-max normalization. Next, the system inputs the preprocessed transaction data into the transaction risk assessment model based on the random forest algorithm, and the model, by comprehensively analyzing features such as transaction amount and transaction type, outputs the risk level of the transaction as "low risk". Meanwhile, the system obtains from the user information system that, over the last year, the user's transaction frequency is 5 times per month, the monthly transaction amount is 3000 yuan, the main transaction type is clothing consumption, and there is no credit default record. The system inputs the user's historical transaction data and credit records into the credit scoring model based on the logistic regression algorithm, and the model calculates a credit score of 750 points (out of a maximum of 800 points). The system then performs a weighted calculation on the risk level "low risk" and the user credit score of 750 according to the preset risk control rule. The resulting comprehensive risk index is 20, which does not exceed the preset risk threshold of 50, so the transaction is judged to be normal, is approved, and the payment flow is completed.
Through the bill payment system, the e-commerce platform can monitor the risk condition of each transaction in real time, dynamically adjust the risk control strategy according to the credit condition of the user, and effectively prevent fraud risks while improving the payment efficiency.
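The min-max normalization mentioned in the example above rescales each numeric feature to the range 0 to 1 using x' = (x - min) / (max - min). A minimal sketch follows; the function name and the handling of a constant feature (all values mapped to 0) are assumptions, not details taken from the text.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] with min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: avoid division by zero (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, the amounts 200, 1000, and 600 are mapped to 0.0, 1.0, and 0.5.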
S103, training a transaction problem probability prediction model based on historical transaction data by adopting a machine learning algorithm.
Historical transaction data is obtained and preprocessed. Preprocessing includes deleting duplicate records, filling in missing values, and normalizing numerical features. Key features such as transaction amount, transaction frequency, and transaction time are extracted from the preprocessed data and combined with known problem-transaction labels to form a training data set. Based on the characteristics of the training data set, a random forest algorithm is selected to construct the transaction problem probability prediction model, because random forests can handle high-dimensional features, are insensitive to outliers, and generalize well. The training data is input into the random forest algorithm, and an initial model is obtained by adjusting the number of trees and the tree depth. Model performance is evaluated using five-fold cross-validation, with accuracy, precision, recall, and F1 value calculated on each validation set. Based on the evaluation results, grid search is used to optimize model parameters such as the number of trees and the maximum depth to improve predictive performance. The same preprocessing and feature extraction used for the training data are applied to new transaction data, and the extracted features are input into the optimized model to obtain the probability that the transaction is a problem transaction. The risk probability threshold is set to 0.7. If the predicted probability is greater than the threshold, the transaction is determined to be high risk. For high-risk transactions, the system automatically sends risk prompt information to the relevant operators and pauses subsequent processing of the transaction.
New transaction data and problem feedback are collected daily. When the amount of accumulated new data reaches 10% of the original training set, a model update is triggered. During the update, the new data is merged with the original training set, and the model training and optimization process is repeated. If the F1 value of the updated model improves by more than 1%, the existing model is replaced; otherwise, the original model is retained. Model predictions are manually reviewed periodically (e.g., monthly), and the feature extraction method or the risk probability threshold is adjusted according to the review feedback. If a new fraud pattern is found, it is incorporated into the feature engineering process to improve the model's ability to recognize new kinds of problem transactions.
Specifically, in the transaction risk monitoring system of an e-commerce platform, 5 million historical transaction records from the last year are first obtained from the transaction database. The data is then preprocessed: a data cleaning algorithm removes the 2% of records that are duplicates and the 5% of records that have missing values, and min-max normalization is applied to numerical features such as transaction amount and transaction time. Next, 10 key features such as transaction amount, transaction frequency, transaction time, and transaction object are extracted from the preprocessed data and combined with the corresponding labels indicating whether each transaction was a problem transaction, forming a training data set of 4.5 million records. Considering the high dimensionality of the data set and the possible presence of outliers, a random forest algorithm is selected to construct the transaction problem prediction model. With parameters such as 100 trees and a maximum depth of 20, the model is trained on the training data set to obtain an initial random forest model. To evaluate the model, five-fold cross-validation is used: the training set is randomly divided into 5 subsets, and in each round the model is trained on 4 of the subsets and validated on the remaining one. After the 5 rounds of cross-validation, the model's average accuracy on the validation sets is 95%, its precision is 91%, its recall is 89%, and its F1 value is 90%. To improve model performance, grid search is used to optimize parameters such as the number of trees and the maximum depth, finally yielding a random forest model with 150 trees and a maximum depth of 25.
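The five-fold splitting described above can be sketched in plain Python as follows. This is only an illustration of how the record indices are partitioned; the function name, the fixed random seed, and the round-robin assignment of shuffled indices to folds are assumptions, and model training itself is omitted.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division into subsets
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets
    for i in range(k):
        validation = folds[i]
        # Train on the other k-1 subsets.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each record serves as validation data in exactly one of the 5 rounds and as training data in the other 4.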
When new transaction data arrives, the system automatically applies the same preprocessing and feature extraction used for the training data, and then inputs the extracted features into the optimized random forest model for prediction. If the predicted probability that a transaction is a problem transaction is greater than 70%, the system determines that the transaction is high risk, automatically sends risk prompt information to the relevant personnel, and pauses the payment flow of the transaction. In addition, the system automatically collects, every day, newly generated transaction data and problem transaction data fed back by users. When the accumulated new data reaches 10% of the original training set, that is, 450,000 records, the system automatically triggers the model update flow, merges the new data into the original training set, and performs model training and optimization again. After one such update, the F1 value of the updated model on the validation set rose from 90% to 92%, an improvement of more than 1%, so the system automatically deployed the updated model as the new online prediction model. Meanwhile, each month the system samples 1000 transactions predicted as high risk for manual review, and dynamically adjusts the risk judgment threshold and the feature extraction method according to the review results, continuously improving the model's risk recognition capability.
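The update rules in this example reduce to a few small checks, sketched below. The F1 value is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The function names are hypothetical, and the reading of "improves by more than 1%" as an absolute gain of more than one percentage point is an assumption; the text does not say whether the gain is absolute or relative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def should_retrain(new_records, original_records, ratio_percent=10):
    """Trigger a model update once new data reaches 10% of the original training set."""
    # Integer arithmetic avoids floating-point error at the exact boundary.
    return new_records * 100 >= original_records * ratio_percent


def should_replace(old_f1, new_f1, min_gain=0.01):
    """Replace the online model only if F1 improves by more than min_gain (assumed absolute)."""
    return (new_f1 - old_f1) > min_gain
```

With the reported precision of 91% and recall of 89%, `f1_score` gives approximately 0.90, consistent with the stated F1 value; 450,000 new records against 4.5 million original records trigger retraining, and an F1 change from 0.90 to 0.92 passes the replacement check.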
S104, predicting the occurrence probability of the transaction problem by using the transaction problem probability prediction model aiming at the current batch processing bill.
For batch processing of bill transactions, the attributes of each transaction are first obtained from the transaction database, including the transaction object, transaction time, transaction amount, and recent transaction frequency. The obtained transaction attribute data is cleaned and standardized, which includes removing abnormal values, filling in missing values, and unifying data formats. The processed transaction attribute data is input into the pre-trained transaction problem probability prediction model, which was trained on historical transaction data and predicts the problem occurrence probability of each transaction. The problem occurrence probability threshold is set to 0.7, and the predicted probability of each transaction is compared with the threshold. Transactions exceeding the threshold are marked as high-risk transactions, and additional features of each such transaction are obtained, including the credit score of the transaction object, the difference between the transaction amount and the account's average transaction amount over the last 3 months, and whether the transaction time falls in an unusual period. All features of the high-risk transaction, including the initial attributes and the additional features, are input into a pre-trained gradient boosting decision tree model, which calculates a suspicion score for the high-risk transaction. The high-risk transactions are sorted in descending order of suspicion score, and the top 10% with the highest scores are selected as suspicious transactions requiring focused verification. For the selected suspicious transactions, an early warning report containing detailed transaction information and the suspicion score is generated.
The early warning report is pushed as a message to the responsible persons of the relevant business departments through the application program interface of the enterprise's internal instant messaging system. At the same time, a verification task is created in the pending task list of the transaction monitoring system and assigned to specific business personnel. After receiving the early warning information, the business personnel log in to the transaction monitoring system to view the details of the suspicious transactions. Following a preset verification flow, the business personnel contact the parties to the transaction, check the transaction background, and examine the related documents to verify the authenticity and compliance of the transaction. The verification results are recorded in the transaction monitoring system, including the transaction status (normal, abnormal, fraudulent, etc.), a description of the verification process, and processing suggestions. For transactions confirmed to be abnormal or fraudulent, the system automatically generates a processing work order and routes it to the risk control department for investigation and handling. Meanwhile, the related transaction information is fed back to the model training module and used to periodically update and optimize the transaction problem probability prediction model, improving the accuracy and adaptability of the model.
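The screening logic of S104 (threshold filter, descending sort, top 10%) can be sketched as follows. The dictionary keys `problem_probability` and `suspicion_score` are hypothetical field names, the suspicion scores are assumed to have been computed already by the gradient boosting model, and rounding the 10% cut up to a whole transaction is an assumption the text does not specify.

```python
def select_suspicious(transactions, prob_threshold=0.7, top_percent=10):
    """Return the top `top_percent` of high-risk transactions by suspicion score."""
    # Step 1: keep only transactions whose predicted probability exceeds the threshold.
    high_risk = [t for t in transactions if t["problem_probability"] > prob_threshold]
    # Step 2: sort the high-risk transactions by suspicion score, highest first.
    ranked = sorted(high_risk, key=lambda t: t["suspicion_score"], reverse=True)
    # Step 3: take the top 10%, rounded up, using integer ceiling division.
    keep = -(-len(ranked) * top_percent // 100)
    return ranked[:keep]
```

Transactions at or below the probability threshold never enter the ranking, regardless of any score they may carry.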
Specifically, in the transaction risk monitoring system of a cross-border e-commerce platform, the attribute data of each cross-border transaction is obtained in real time through the data interface of the transaction system, including the registered country of the transaction object, the time of the transaction, the transaction amount (in US dollars), and the number of transactions on the account in the last week. The registered country names of the transaction objects are then standardized using natural language processing, so that different languages, synonyms, and abbreviations are unified into standard English country names; the transaction times are converted across time zones and unified to Greenwich Mean Time; the transaction amounts are converted at the exchange rate and unified to US dollars; and the transaction frequency is checked for abnormal values, with extreme values more than 3 standard deviations from the mean being removed. The processed transaction attribute data is then input into a random forest model of 100 trees trained on 10 million historical transaction records, whose AUC (Area Under the ROC Curve) on the test set reaches 0.95, to predict the problem occurrence probability of each transaction. If the problem occurrence probability of a transaction exceeds the threshold of 0.7, the transaction is marked as high risk, and additional features are obtained through the query interface of the transaction system: the credit score of the transaction object (provided by a third-party credit agency, in the range 0-100), the difference between the transaction amount and the account's daily average transaction amount over the last 3 months, and whether the transaction time falls in the high-risk period from 23:00 local time to 5:00 the next day.
The 7 features of the high-risk transaction (4 initial attributes + 3 additional features) are then input into a gradient boosting decision tree model consisting of 500 trees with a maximum tree depth of 8, whose accuracy on the validation set is 92%, to calculate the suspicion score (range 0-100) of the high-risk transaction. All high-risk transactions are sorted in descending order of suspicion score, and the top 10% are taken as suspicious transactions (if there are fewer than 100 transactions, all of them are selected, and no more than 500 are selected); an early warning report is generated containing information such as the transaction serial number, the account number of the transaction object, the transaction amount, the transaction time, and the suspicion score. The early warning report is pushed as a rich-text message to the responsible person of the cross-border business department through the RESTful API of the internal IM system, with the report attached to the message as a CSV file. At the same time, the early warning report information is automatically written into the pending task table in the MySQL database of the transaction monitoring system, where a trigger automatically assigns the task to specific business personnel and sets its status to "to be verified". After receiving the early warning message through the IM system, the business personnel can click the link in the message to log in to the web page of the transaction monitoring system, find the corresponding early warning transaction in the pending task list, and view the transaction details. The page embeds the data interfaces of the enterprise's CRM system, ERP system, logistics system, and so on, so that the business personnel can retrieve the transaction object's historical transaction data, commodity stock status, logistics tracking number, and similar information with one click, and can phone or email the transaction object with one click, to complete verification efficiently.
After the verification is completed, the business personnel select the transaction status (normal/abnormal/fraudulent) on the page, enter a description of the verification result and processing suggestions, and click submit. If abnormal or fraudulent is selected, the system automatically creates a risk event record and pushes it to the risk control department for handling. At the same time, the actual status of the transaction and related information are sent in JSON format to the model training module through a Kafka real-time data stream. After receiving the feedback data, the model training module starts the feature engineering flow, redesigns the features, and merges the data into the model training set. When the accumulated feedback data reaches 100,000 records, model retraining starts automatically, the gradient boosting decision tree model is updated, and the updated model parameters are written to a configuration file and take effect in real time.
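Two of the preprocessing and feature steps in this example, removing values more than 3 standard deviations from the mean and flagging the 23:00 to 5:00 high-risk period, can be sketched as follows. The function names are hypothetical; using the population standard deviation and treating 23:00 as inclusive and 5:00 as exclusive are assumptions not stated in the text.

```python
from statistics import mean, pstdev


def remove_outliers(values, n_sigma=3.0):
    """Drop values lying more than n_sigma standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


def in_high_risk_period(local_hour):
    """True if the local hour (0-23) falls between 23:00 and 5:00 the next day."""
    return local_hour >= 23 or local_hour < 5
```

Note that with very few samples no value can lie 3 standard deviations from the mean, so the filter only has an effect on reasonably large batches.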
S105, triggering the insurance mechanism if the occurrence probability of the transaction problem exceeds a preset threshold, otherwise dynamically adjusting the batch processing quantity according to the bill amount and the credit score of the user.
The transaction problem occurrence probability is obtained and compared with a preset threshold. If the probability exceeds the preset threshold, the insurance mechanism is triggered to execute the corresponding risk prevention measures; if it does not, the subsequent transaction processing flow continues. The bill amount of the current transaction and the user's credit score are then obtained and used as input features for a pre-trained CART decision tree model, which predicts the batch processing quantity adjustment amplitude for the current transaction. The batch processing quantity is the number of transaction records contained in each batch when transaction data is processed in batches. The batch processing quantity adjustment amplitude predicted by the decision tree model and the currently set batch processing quantity are combined by weighted averaging, with the weights set according to historical data, to obtain a new batch processing quantity. The adjusted batch processing quantity is applied to the transaction processing flow, and batch processing is performed on the corresponding number of transaction records, improving transaction processing efficiency.
The change in the transaction problem occurrence probability is monitored continuously. When the probability rises, a grid search algorithm is used to tune the preset threshold and the hyperparameters of the CART decision tree model, and the threshold and the batch processing quantity adjustment strategy are updated dynamically to adapt to changes in transaction patterns and maintain a dynamic balance between risk prevention and transaction efficiency.
Specifically, in the transaction risk monitoring system of a cross-border e-commerce platform, the attribute data of each cross-border transaction is obtained in real time through the data interface of the transaction system, including the registered country of the transaction object, the time of the transaction, the transaction amount (in US dollars), and the number of transactions on the account in the last week. The registered country names of the transaction objects are then standardized using natural language processing, so that different languages, synonyms, and abbreviations are unified into standard English country names; the transaction times are converted across time zones and unified to Greenwich Mean Time; the transaction amounts are converted at the exchange rate and unified to US dollars; and the transaction frequency is checked for abnormal values, with extreme values more than 3 standard deviations from the mean being removed. The processed transaction attribute data is then input into a random forest model of 100 trees trained on 10 million historical transaction records, whose AUC on the test set reaches 0.95, to predict the problem occurrence probability of each transaction. If the problem occurrence probability of a transaction exceeds the preset threshold of 0.8, the insurance mechanism is triggered: the system automatically sends the related information of the transaction to the insurance company and applies to start the corresponding risk compensation process. If the problem occurrence probability does not exceed the threshold, processing of the subsequent transaction flow continues.
Meanwhile, the system takes the bill amount of the current transaction and the user's credit score as input features and uses a CART decision tree model trained on 50,000 historical records to predict the batch processing quantity adjustment amplitude for the current transaction. For example, if the currently set batch processing quantity is 50 transactions per batch and the adjustment amplitude predicted by the decision tree model is +10%, the adjusted value and the original batch processing quantity are combined by weighted averaging at a ratio of 7:3, giving a new batch processing quantity of 52 transactions per batch. The system continuously monitors the average transaction problem occurrence probability for each hour. When the probability rises by more than 15% compared with the previous hour, the grid search algorithm starts automatically and performs 10-fold cross-validation over parameter ranges such as a threshold of 0.7-0.9, a maximum decision tree depth of 3-8, and a minimum number of samples per leaf node of 10-100, to obtain the optimal combination of model hyperparameters. The preset threshold and the batch processing quantity adjustment strategy are then updated in real time to adapt dynamically to changes in transaction patterns.
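The batch-size arithmetic in this example can be reproduced with a short sketch. The text gives only the 7:3 ratio and the result of 52; assigning weight 7 to the current batch size (50) and weight 3 to the adjusted size (55), then rounding half up, is an assumption that happens to reproduce that result: (7 x 50 + 3 x 55) / 10 = 51.5, which rounds to 52. Treating the 15% rise as a relative increase over the previous hour is likewise an assumption, and all function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def adjust_batch_size(current, adjustment_pct, weight_current=7, weight_adjusted=3):
    """Blend the current batch size with the model-adjusted size (assumed 7:3 weighting)."""
    current_d = Decimal(current)
    # Apply the predicted adjustment amplitude, e.g. +10% turns 50 into 55.
    adjusted = current_d * (Decimal(100) + Decimal(adjustment_pct)) / Decimal(100)
    blended = (weight_current * current_d + weight_adjusted * adjusted) / (
        weight_current + weight_adjusted
    )
    # Round half up to a whole number of transactions per batch.
    return int(blended.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def probability_rose(previous_hour, current_hour, rise=0.15):
    """True if the hourly average probability rose by more than `rise` (relative, assumed)."""
    if previous_hour <= 0:
        return current_hour > 0
    return (current_hour - previous_hour) / previous_hour > rise
```

Decimal arithmetic is used so that the half-way case 51.5 is rounded deterministically rather than depending on binary floating-point representation.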
S106, dynamically routing the bill through an intelligent routing algorithm that comprehensively considers the transaction risk level, the user's credit score, and the payment guarantee degree.
The information of the bill to be processed is obtained, including the transaction amount, transaction type, and the user's historical behavior. The user's transaction records for the last half year are extracted from the database, and features such as transaction frequency and average transaction amount are calculated. A logistic regression model calculates a transaction risk score from the transaction frequency and average transaction amount features, and the risk score is compared with preset thresholds to determine the transaction risk level. The user's credit score is obtained from a credit scoring system, which calculates a score of 0-100 as a weighted average based on factors such as the user's payment history and credit card usage. The payment guarantee degree of the transaction is then determined by checking whether the transaction supports refunds and whether it is guaranteed. A transaction that supports refunds is assigned a value of 1, and one that does not is assigned 0. A guaranteed transaction is assigned a value of 1, and an unguaranteed transaction is assigned 0. The refund value and the guarantee value are added to obtain the payment guarantee degree, which ranges from 0 to 2. The transaction risk level, the user's credit score, and the payment guarantee degree are taken as input variables, and a decision model is constructed using a random forest algorithm. The output of the decision model is a probability distribution over the different processing channels, and the channel with the highest probability is selected as the optimal routing scheme. The bill is then allocated to the corresponding processing channel according to the routing scheme.
In a specific implementation, a processing channel mapping table is established that maps each channel ID output by the decision model to the API of an actual processing system. The corresponding API is called, and the bill information is transmitted to the target processing system. After the processing system receives the bill, it returns a processing result. The processing results are recorded in a database for subsequent model optimization and system evaluation. The processing results are analyzed periodically to evaluate the accuracy and efficiency of the routing scheme and provide a basis for adjusting the model parameters.
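The payment guarantee degree and the mapping-table dispatch described above can be sketched as follows. The channel IDs, the endpoint paths in `CHANNEL_ENDPOINTS`, and the injected `send` callable are all hypothetical placeholders; the text does not name the channels' APIs, and the channel probabilities are assumed to come from the already trained decision model.

```python
def payment_guarantee_degree(supports_refund, has_guarantee):
    """Sum of the refund flag (0/1) and the guarantee flag (0/1); range 0-2."""
    return int(bool(supports_refund)) + int(bool(has_guarantee))


# Hypothetical processing channel mapping table: channel ID -> API endpoint.
CHANNEL_ENDPOINTS = {
    "auto_pass": "/channels/auto-pass",
    "manual_review": "/channels/manual-review",
    "reject": "/channels/reject",
}


def choose_channel(channel_probabilities):
    """Select the channel with the highest predicted probability."""
    return max(channel_probabilities, key=channel_probabilities.get)


def route_bill(bill, channel_probabilities, send):
    """Dispatch the bill to the chosen channel via the caller-supplied `send` function."""
    channel_id = choose_channel(channel_probabilities)
    endpoint = CHANNEL_ENDPOINTS[channel_id]
    result = send(endpoint, bill)  # e.g. an HTTP call in a real deployment
    return channel_id, result
```

Passing the transport as a parameter keeps the routing decision separate from the call to the target processing system, so the returned result can be logged for later evaluation.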
Specifically, in the intelligent bill routing system of a cross-border e-commerce platform, the attribute information of each bill to be processed, including the transaction amount, the transaction type (such as commodity purchase or account top-up), and the user ID, is obtained in real time through a data interface with the transaction system. The system then extracts the user's transaction records for the last half year from the database and uses SQL statements to aggregate them by month, obtaining feature variables such as the user's monthly transaction frequency and monthly transaction amount. These feature variables are input into a logistic regression model trained on 100,000 historical records, which has an accuracy of 85% on the test set, to predict a risk score for each transaction in the range 0 to 100. The system defines three risk levels: low risk (0-30), medium risk (30-70), and high risk (70-100). Meanwhile, the system obtains the user's credit score from the credit scoring system through an API, with scores ranging from 350 to 950 points. In addition, the system determines the payment guarantee degree of the transaction: a transaction that supports refunds is assigned 1 and one that does not is assigned 0, a transaction with a third-party guarantee is assigned 1 and one without is assigned 0, and the two values are added to obtain the payment guarantee degree. Finally, the system uses a random forest model of 100 trees trained on 50,000 historical records, takes the transaction risk level, the user's credit score, and the payment guarantee degree as input variables, predicts the probability distribution over the different processing channels, and selects the channel with the highest probability as the optimal routing scheme.
For example, for a low-risk transaction with a transaction amount of $500, a user credit score of 650 points, and a payment guarantee degree of 1, the model calculates a 2% probability of assignment to the manual review channel, a 95% probability of automatic approval, and a 3% probability of rejection, so the transaction is automatically approved. According to the routing result, the system calls the corresponding API, transmits the bill information to the target processing system in JSON format, and waits for the processing result to be returned. Every hour, the system automatically analyzes the bill processing results of the previous hour and calculates indicators such as the actual approval rate and rejection rate of each channel. If these deviate from the expected values by more than 5%, model parameter adjustment is triggered automatically to adapt to changes in the bill distribution.
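The risk-level bands and the hourly deviation check can be sketched as follows. The text lists the bands as 0-30, 30-70, and 70-100 without saying which band a boundary score belongs to, so treating the lower bound as inclusive is an assumption; measuring the 5% deviation as an absolute difference in rates is also an assumption, as are the function names and the channel keys.

```python
def risk_level(score):
    """Map a 0-100 risk score to a level (lower bound inclusive, assumed)."""
    if score < 30:
        return "low"
    if score < 70:
        return "medium"
    return "high"


def channels_needing_adjustment(expected_rates, actual_rates, tolerance=0.05):
    """List channels whose actual hourly rate deviates from the expected rate by more than 5%."""
    return [
        channel
        for channel, expected in expected_rates.items()
        if abs(actual_rates.get(channel, 0.0) - expected) > tolerance
    ]
```

A non-empty result from `channels_needing_adjustment` would be the signal that triggers model parameter adjustment for the listed channels.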
It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from the spirit or scope of the embodiments of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is also intended to include such modifications and variations.