Technical Field
The present invention relates to a method, and in particular to a method for analyzing customer transaction behavior.
Background
With rapid economic growth, competition in every industry is becoming increasingly fierce. For financial institutions such as banks, which are closely tied to the economy, surviving in this competitive environment has become a primary concern. With the development of information technology, the competitive environment of enterprises has changed dramatically, and more and more enterprises are shifting from a product-centric business model to a customer-centric one. Enterprises are coming to realize that holding on to customers means holding on to performance: the more promptly they meet customer needs, the better they meet market demand and the more they stand out from their competitors. At present, most financial companies have established internal customer management systems and accumulated massive amounts of customer data. If this customer information can be effectively understood and used to classify customers accurately, enterprises can provide more targeted services to different customer groups and thereby raise their service level.
In the prior art, the RFM (recency, frequency, monetary) model is generally used for this analysis. However, it considers only the purchase time, the total number of purchases and the total transaction amount. This is not enough to classify customers accurately, so customer categories cannot be determined reliably and the enterprise lacks an accurate basis for its production and operation decisions.
Therefore, a new method is urgently needed to solve the above technical problems.
Summary of the Invention
In view of this, an object of the present invention is to provide a customer transaction behavior analysis method that accurately analyzes multiple attributes of customer transactions, effectively improves the accuracy of customer classification, and provides accurate data support for enterprise production and marketing.
The customer transaction behavior analysis method provided by the present invention comprises:
Collect customer transaction data and sort the transaction data by time;
Extract from the transaction data the attribute features that characterize purchase time, transaction frequency and transaction amount, and normalize the purchase time, transaction frequency and transaction amount features;
Mine the hidden (latent) attributes underlying the attribute features, obtain the scores of the attribute features on the hidden attributes, and determine the number of clusters from these scores;
Process the scores with a genetic algorithm to determine the final clustering result and obtain the customer classification.
Further, the attribute features include: the most recent purchase, the earliest purchase, the first quartile of the purchase interval, the median purchase interval, the third quartile of the purchase interval, the overall purchase frequency, the maximum monthly purchase frequency, the minimum monthly purchase frequency, the cumulative purchase amount and the average purchase amount.
Further, the purchase time, transaction frequency and transaction amount are normalized as follows:
$$R'_j=\frac{R_j-R_{\min}}{R_{\max}-R_{\min}},\qquad F'_j=\frac{F_j-F_{\min}}{F_{\max}-F_{\min}},\qquad M'_j=\frac{M_j-M_{\min}}{M_{\max}-M_{\min}}$$
where $R'_j$ is the normalized value of the j-th customer on the purchase time attribute R, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum values of the attribute feature over all customers on attribute R; $F'_j$ is the normalized value of the j-th customer on the purchase frequency attribute F, and $F_{\max}$ and $F_{\min}$ are the maximum and minimum values over all customers on attribute F; $M'_j$ is the normalized value of the j-th customer on the transaction amount attribute M, and $M_{\max}$ and $M_{\min}$ are the maximum and minimum values over all customers on attribute M.
Further, in factor analysis, any attribute feature is expressed as:
$$X_g=\theta_g+\alpha_{g1}f_1+\dots+\alpha_{gn}f_n+\varepsilon_g$$
where $f_1,\dots,f_n$ are the hidden attributes (f is short for factor), $\theta_g$ is the mean of the g-th attribute feature over the customers' transaction data, $\varepsilon_g$ is the specific factor, i.e. the part of the transaction data that cannot be decomposed into hidden attributes, and $\alpha_{g1},\dots,\alpha_{gn}$ are the weights (loadings) of the hidden attributes.
The scores of the attribute features are obtained with the factor analysis algorithm as a linear combination of the observed features:
$$\mathrm{factor}=\hat{B}\,X$$
where X is the customer's transaction behavior, factor denotes the hidden attributes of the attribute features, and $\hat{B}$ is the score-value matrix of the attribute features.
Further, the number of clusters is determined as follows:
A customer transaction similarity matrix is constructed:
$$S(i,j)=-\sum_{k=1}^{K}\left(\alpha_{ik}-\alpha_{jk}\right)^2$$
where K is the number of hidden attributes, $\alpha_{ik}$ is the score of the i-th customer on the k-th hidden attribute, and $\alpha_{jk}$ is the score of the j-th customer on the k-th hidden attribute.
Further, the cluster center to which each customer's transaction behavior belongs is determined as follows:
(1) First, the responsibility (attraction information) matrix is constructed:
$$r_{t+1}(i,h)=S(i,h)-\max_{h'\neq h}\left\{a_t(i,h')+S(i,h')\right\}$$
where $r_{t+1}(i,h)$ is the degree, in the next iteration, to which customer h is suited to be the cluster center of customer i; S is the similarity matrix; and $a_t(i,h')$ is the suitability, in the current iteration, of customer i choosing customer h' as its cluster center. In the first iteration the availability (attribution information) matrix $a_0$ is all zeros.
(2) Then the availability (attribution) matrix is constructed:
$$a_{t+1}(i,h)=\min\Big\{0,\;r_{t+1}(h,h)+\sum_{i'\notin\{i,h\}}\max\{0,\,r_{t+1}(i',h)\}\Big\},\qquad i\neq h$$
$$a_{t+1}(h,h)=\sum_{i'\neq h}\max\{0,\,r_{t+1}(i',h)\}$$
where $a_{t+1}(i,h)$ is the suitability, in the next iteration, of customer i choosing customer h as its cluster center; $r_{t+1}(i',h)$ is the suitability of customer h as the cluster center of customer i', as computed in step (1); and $a_{t+1}(h,h)$ is the suitability, in the next iteration, of customer h as its own cluster center. In the first iteration the responsibility matrix $r_0$ is all zeros.
Both matrices are recalculated and updated in every iteration. For customer i, when r(i,i)+a(i,i) exceeds the set threshold P, customer i is a cluster center.
Further, the responsibility $r_{t+1}(i,h)$ and the availability $a_{t+1}(i,h)$ are damped as follows:
$$r_{t+1}(i,h)=(1-\lambda)\,r_{t+1}(i,h)+\lambda\,r_t(i,h)$$
$$a_{t+1}(i,h)=(1-\lambda)\,a_{t+1}(i,h)+\lambda\,a_t(i,h)$$
where λ∈(0,1) is the damping coefficient, $r_t(i,h)$ is the previous value of the degree to which customer h is suited to be the cluster center of customer i, and $a_t(i,h)$ is the previous value of the suitability of customer i choosing customer h as its cluster center.
Further, the method also comprises the following steps:
Calculate the sum of squared errors (SSE) of the data points within the determined clusters:
$$SSE=\sum_{i=1}^{k}\sum_{x\in c_i}\operatorname{dist}(\eta_i,x)^2$$
where $c_i$ is the set of the i-th cluster, x is an element of $c_i$, $\eta_i$ is the centroid (mean) of $c_i$, k is the number of clusters, and dist is the distance function; the smaller the SSE, the better the clustering.
Calculate the average silhouette coefficient of the clustering:
$$M(i)=\frac{b(i)-d(i)}{\max\{d(i),\,b(i)\}}$$
where M(i) is the silhouette coefficient of customer i, whose mean over all customers is the average silhouette coefficient of the current clustering; d(i) is the average dissimilarity between customer i and the other points in the same cluster; and b(i) is the minimum, over all other clusters, of the average dissimilarity between customer i and the points of that cluster.
The quality of the current clustering result is judged by the average silhouette coefficient: the larger its value, the better the clustering result.
By combining the two evaluation criteria (a smaller SSE is better and a larger average silhouette coefficient is better), the number of clusters can be located accurately and the optimal number of clusters obtained.
Further, the genetic algorithm processing is performed as follows:
S1. Encode the cluster center data.
S2. Initial population: generate E initial individuals at random, each containing e cluster centers.
S3. Establish the fitness function f = between/(1 + D), where between is the inter-cluster distance and D is the intra-cluster distance.
S4. Calculate the fitness of each individual and take the individual with the largest fitness value as the best individual (f increases as clusters become better separated and more compact); select individuals with the roulette-wheel algorithm, and finally replace the worst individual with the best individual.
S5. Crossover: individuals exchange positions through a crossover operator for floating-point encoding:
$$X_A^{t+1}=a\,X_B^{t}+(1-a)\,X_A^{t},\qquad X_B^{t+1}=a\,X_A^{t}+(1-a)\,X_B^{t}$$
where a is a random number between 0 and 1, $X_A^{t+1}$ is individual A after crossover (for the next iteration), $X_A^{t}$ is individual A in the current iteration, $X_B^{t+1}$ is individual B after crossover (for the next iteration), and $X_B^{t}$ is individual B in the current iteration.
S6. Mutation: a uniform mutation operator is used; for each mutation point, a random value drawn from the value range of the corresponding gene replaces the original gene:
$$Y'=U_{\min}+\mu\,(U_{\max}-U_{\min})$$
where Y' is the mutated gene value, μ is a random number in (0,1), $U_{\min}$ is the minimum value of the current gene and $U_{\max}$ is its maximum value.
S7. Apply the K-means algorithm to the mutated population to compute new cluster centers, generate a new population from these centers and enter the next iteration, until the cluster centers no longer change; customers are then classified according to these cluster centers.
Beneficial effects of the present invention: the invention accurately analyzes multiple attributes of customer transactions, effectively improves the accuracy of customer classification, and provides accurate data support for enterprise production and marketing.
Description of the Drawings
The present invention is further described below with reference to the drawings and embodiments:
Fig. 1 is a flowchart of the present invention.
Fig. 2 shows the clustering index results for transaction behavior segmentation according to the present invention.
Fig. 3 shows the customer characteristic analysis results obtained with the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings:
The customer transaction behavior analysis method provided by the present invention comprises:
Collect customer transaction data and sort the transaction data by time;
Extract from the transaction data the attribute features that characterize purchase time, transaction frequency and transaction amount, and normalize the purchase time, transaction frequency and transaction amount features;
Mine the hidden (latent) attributes underlying the attribute features, obtain the scores of the attribute features on the hidden attributes, and determine the number of clusters from these scores;
Process the scores with a genetic algorithm to determine the final clustering result and obtain the customer classification. With this method, multiple attributes of customer transactions are analyzed accurately, the accuracy of customer classification is effectively improved, and accurate data support is provided for enterprise production and marketing.
In this embodiment, the attribute features include: the most recent purchase, the earliest purchase, the first quartile of the purchase interval, the median purchase interval, the third quartile of the purchase interval, the overall purchase frequency, the maximum monthly purchase frequency, the minimum monthly purchase frequency, the cumulative purchase amount and the average purchase amount. In this way the customer's transaction behavior is analyzed with multiple indicators, which improves the accuracy of the analysis result.
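The ten attribute features can be computed per customer roughly as in the following Python sketch. The method does not say how dates become numbers or how monthly frequencies are counted, so the choices here are assumptions:
- days are counted back from a hypothetical reference date `as_of`;
- quartiles use `statistics.quantiles` with the inclusive method;
- months with no purchase inside the customer's active span count as zero.

```python
from collections import Counter
from datetime import date
from statistics import quantiles


def month_index(d: date) -> int:
    """Running month number, so consecutive calendar months differ by 1."""
    return d.year * 12 + d.month - 1


def extract_features(txns, as_of: date):
    """Ten attribute features for one customer.

    txns: (purchase_date, amount) pairs already sorted by date.
    as_of: hypothetical reference date used to turn dates into day counts.
    """
    if not txns:
        raise ValueError("customer has no transactions")
    dates = [d for d, _ in txns]
    amounts = [a for _, a in txns]

    # Purchase intervals in days between consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if len(gaps) >= 2:
        q1, q2, q3 = quantiles(gaps, n=4, method="inclusive")  # q2 is the median
    elif gaps:
        q1 = q2 = q3 = float(gaps[0])
    else:
        q1 = q2 = q3 = 0.0

    # Purchases per calendar month, counting empty months inside the active span.
    per_month = Counter(month_index(d) for d in dates)
    span = range(month_index(dates[0]), month_index(dates[-1]) + 1)
    monthly = [per_month[m] for m in span]

    return {
        "recency_days": (as_of - dates[-1]).days,   # most recent purchase
        "earliest_days": (as_of - dates[0]).days,   # earliest purchase
        "gap_q1": q1,
        "gap_median": q2,
        "gap_q3": q3,
        "freq_overall": len(dates) / len(monthly),  # purchases per month
        "freq_month_max": max(monthly),
        "freq_month_min": min(monthly),
        "amount_total": sum(amounts),
        "amount_mean": sum(amounts) / len(amounts),
    }


example = extract_features(
    [(date(2017, 1, 5), 100.0), (date(2017, 1, 20), 50.0), (date(2017, 3, 1), 150.0)],
    as_of=date(2017, 3, 31),
)
```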
In this embodiment, the purchase time, transaction frequency and transaction amount are normalized as follows:
$$R'_j=\frac{R_j-R_{\min}}{R_{\max}-R_{\min}},\qquad F'_j=\frac{F_j-F_{\min}}{F_{\max}-F_{\min}},\qquad M'_j=\frac{M_j-M_{\min}}{M_{\max}-M_{\min}}$$
where $R'_j$ is the normalized value of the j-th customer on the purchase time attribute R, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum values of the attribute feature over all customers on attribute R; $F'_j$ is the normalized value of the j-th customer on the purchase frequency attribute F, and $F_{\max}$ and $F_{\min}$ are the maximum and minimum values over all customers on attribute F; $M'_j$ is the normalized value of the j-th customer on the transaction amount attribute M, and $M_{\max}$ and $M_{\min}$ are the maximum and minimum values over all customers on attribute M.
Take normalization on the purchase time attribute R as an example. To normalize the most recent purchase of the j-th customer, $R_j$ is that customer's most recent purchase time, $R_{\max}$ is the maximum of the most recent purchase over all customers, and $R_{\min}$ is the minimum. In this way all customer data are mapped into the range [0, 1], which facilitates analysis and processing and improves the accuracy of the analysis result.
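A minimal sketch of the min-max normalization above, applied column by column across customers. The method does not say what happens when every customer has the same value for a feature; mapping such a constant column to 0.0 here is an assumption that avoids dividing by zero.

```python
def min_max(values):
    """Min-max normalization of one attribute column across all customers."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: assumed to map to 0.0
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_customers(rows):
    """Normalize every feature column; rows holds one feature dict per customer."""
    keys = list(rows[0])
    cols = {k: min_max([row[k] for row in rows]) for k in keys}
    return [{k: cols[k][j] for k in keys} for j in range(len(rows))]
```

The smallest value maps to exactly 0 and the largest to exactly 1, so the output range is the closed interval [0, 1].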
In this embodiment, the scores of the attribute features are obtained with the factor analysis algorithm.
In factor analysis, an attribute feature is expressed as:
$$X_g=\theta_g+\alpha_{g1}f_1+\dots+\alpha_{gn}f_n+\varepsilon_g$$
where $f_1,\dots,f_n$ are the hidden attributes (f is short for factor), $\theta_g$ is the mean of the g-th attribute feature over the customers' transaction data, $\varepsilon_g$ is the specific factor, i.e. the part of the transaction data that cannot be decomposed into hidden attributes, and $\alpha_{g1},\dots,\alpha_{gn}$ are the weights (loadings) of the hidden attributes. The scores are then obtained as a linear combination of the observed features:
$$\mathrm{factor}=\hat{B}\,X$$
where X is the customer's transaction behavior, factor denotes the hidden attributes of the attribute features, and $\hat{B}$ is the score-value matrix of the attribute features. The customer's transaction data are thus converted into scores on the hidden attributes before further processing. This improves classification accuracy and also simplifies the computation. Note that, before factor analysis, it must be checked whether the customers' original data are suitable for factor analysis.
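The method does not state which extraction or scoring procedure its factor analysis uses. The NumPy sketch below substitutes one common choice: principal-component extraction without rotation, followed by regression (Thomson) factor scores. Treat it as an illustration of the step from X to factor scores, not as the patented procedure. It assumes every feature column varies (non-zero standard deviation), and it does not implement the data-suitability check mentioned above.

```python
import numpy as np


def factor_scores(X, n_factors):
    """Principal-factor extraction plus regression (Thomson) factor scores.

    X: (customers x features) array of normalized attribute features.
    Returns (scores, loadings): scores is customers x n_factors (one row of
    hidden-attribute scores per customer), loadings is features x n_factors.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each feature
    R = np.corrcoef(Z, rowvar=False)                   # feature correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]        # keep the largest n_factors
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    coef = np.linalg.pinv(R) @ loadings                # score coefficients R^-1 * loadings
    return Z @ coef, loadings


rng = np.random.default_rng(0)
demo_scores, demo_loadings = factor_scores(rng.random((50, 4)), n_factors=2)
```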
In this embodiment, the number of clusters is determined as follows:
A customer transaction similarity matrix is constructed:
$$S(i,j)=-\sum_{k=1}^{K}\left(\alpha_{ik}-\alpha_{jk}\right)^2$$
where K is the number of hidden attributes, $\alpha_{ik}$ is the score of the i-th customer on the k-th hidden attribute, and $\alpha_{jk}$ is the score of the j-th customer on the k-th hidden attribute.
The following example illustrates how the similarity matrix is formed when there are only two customers:
Customer i has three hidden attributes, with scores $\alpha_{i1}$, $\alpha_{i2}$ and $\alpha_{i3}$;
Customer j likewise has three hidden attributes, with scores $\alpha_{j1}$, $\alpha_{j2}$ and $\alpha_{j3}$; therefore:
$$S(i,j)=-\left\{(\alpha_{i1}-\alpha_{j1})^2+(\alpha_{i2}-\alpha_{j2})^2+(\alpha_{i3}-\alpha_{j3})^2\right\}$$
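The similarity is the negative squared Euclidean distance between two customers' hidden-attribute score vectors. The short Python sketch below builds the full matrix and reproduces the two-customer, three-attribute example.

```python
def similarity_matrix(scores):
    """S(i, j) = -sum_k (alpha_ik - alpha_jk)^2 for every pair of customers.

    scores: one list of hidden-attribute scores per customer.
    """
    n = len(scores)
    return [
        [-sum((a - b) ** 2 for a, b in zip(scores[i], scores[j])) for j in range(n)]
        for i in range(n)
    ]


S_demo = similarity_matrix([[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]])
```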
In this embodiment, the cluster center to which each customer's transaction behavior belongs is determined as follows:
The responsibility (attraction information) matrix and the availability (attribution) matrix are constructed and updated iteratively.
First, the responsibility matrix is constructed:
$$r_{t+1}(i,h)=S(i,h)-\max_{h'\neq h}\left\{a_t(i,h')+S(i,h')\right\}$$
where $r_{t+1}(i,h)$ is the degree, in the next iteration, to which customer h is suited to be the cluster center of customer i; S is the similarity matrix; and $a_t(i,h')$ is the suitability, in the current iteration, of customer i choosing customer h' as its cluster center. In the first iteration the availability matrix $a_0$ is all zeros.
Then the availability matrix is constructed:
$$a_{t+1}(i,h)=\min\Big\{0,\;r_{t+1}(h,h)+\sum_{i'\notin\{i,h\}}\max\{0,\,r_{t+1}(i',h)\}\Big\},\qquad i\neq h$$
$$a_{t+1}(h,h)=\sum_{i'\neq h}\max\{0,\,r_{t+1}(i',h)\}$$
where $a_{t+1}(i,h)$ is the suitability, in the next iteration, of customer i choosing customer h as its cluster center; $r_{t+1}(i',h)$ is the suitability of customer h as the cluster center of customer i', as computed in the preceding step; and $a_{t+1}(h,h)$ is the suitability, in the next iteration, of customer h as its own cluster center. In the first iteration the responsibility matrix $r_0$ is all zeros.
Both matrices are recalculated and updated in every iteration. For customer i, when r(i,i)+a(i,i) exceeds the set threshold P, customer i is a cluster center.
In addition, the responsibility $r_{t+1}(i,h)$ and the availability $a_{t+1}(i,h)$ are damped as follows:
$$r_{t+1}(i,h)=(1-\lambda)\,r_{t+1}(i,h)+\lambda\,r_t(i,h)$$
$$a_{t+1}(i,h)=(1-\lambda)\,a_{t+1}(i,h)+\lambda\,a_t(i,h)$$
where λ∈(0,1) is the damping coefficient, $r_t(i,h)$ is the previous value of the degree to which customer h is suited to be the cluster center of customer i, and $a_t(i,h)$ is the previous value of the suitability of customer i choosing customer h as its cluster center. This damping effectively prevents the values from oscillating, so that the preliminary clustering and the number of clusters can be determined accurately.
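The damped responsibility/availability iteration is the affinity propagation algorithm. The pure-Python sketch below follows the update rules above, with these assumptions:
- The diagonal S(i, i) (the "preference") is not specified in the method. Using the median off-diagonal similarity is borrowed from common practice.
- `threshold` stands in for P and defaults to 0.
- The loop runs a fixed number of iterations instead of testing for convergence.

```python
def affinity_propagation(S, damping=0.5, max_iter=200, threshold=0.0, preference=None):
    """Affinity propagation on a similarity matrix S (list of lists).

    damping is lambda; threshold plays the role of P. Returns
    (exemplar indices, label per customer = index of its exemplar).
    """
    n = len(S)
    S = [list(row) for row in S]  # work on a copy
    if preference is None:        # assumed: median off-diagonal similarity
        off_diag = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
        preference = off_diag[len(off_diag) // 2]
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities, r_0 = 0
    A = [[0.0] * n for _ in range(n)]  # availabilities, a_0 = 0

    for _ in range(max_iter):
        # r(i,h) = S(i,h) - max_{h' != h} (a(i,h') + S(i,h')), then damped.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for h in range(n):
                competitor = second if h == first else vals[first]
                R[i][h] = (1 - damping) * (S[i][h] - competitor) + damping * R[i][h]
        # a(i,h) from the freshly updated responsibilities, then damped.
        for h in range(n):
            pos = [max(0.0, R[ip][h]) for ip in range(n)]
            total = sum(pos) - pos[h]  # sum over i' != h
            for i in range(n):
                new = total if i == h else min(0.0, R[h][h] + total - pos[i])
                A[i][h] = (1 - damping) * new + damping * A[i][h]

    exemplars = [i for i in range(n) if R[i][i] + A[i][i] > threshold]
    if not exemplars:
        raise RuntimeError("no exemplar found; try a higher preference or more iterations")
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels


points = [0.0, 1.0, 2.0, 50.0, 51.0, 52.0]
S_points = [[-(p - q) ** 2 for q in points] for p in points]
exemplars_demo, labels_demo = affinity_propagation(S_points)
```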
In this embodiment, the method also comprises the following steps:
Calculate the sum of squared errors (SSE) of the data points within the determined clusters:
$$SSE=\sum_{i=1}^{k}\sum_{x\in c_i}\operatorname{dist}(\eta_i,x)^2$$
where $c_i$ is the set of data of the i-th cluster, x is an element of the i-th cluster, $\eta_i$ is the centroid (mean) of $c_i$, k is the number of clusters, and dist is the distance function; the smaller the SSE, the better the clustering.
Calculate the average silhouette coefficient of the clustering:
$$M(i)=\frac{b(i)-d(i)}{\max\{d(i),\,b(i)\}}$$
where M(i) is the silhouette coefficient of customer i, whose mean over all customers is the average silhouette coefficient of the current clustering; d(i) is the average dissimilarity between customer i and the other points in the same cluster; and b(i) is the minimum, over all other clusters, of the average dissimilarity between customer i and the points of that cluster.
The quality of the current clustering result is judged by the average silhouette coefficient: the larger its value, the better the clustering result.
By combining the two evaluation criteria (a smaller SSE is better and a larger average silhouette coefficient is better), the number of clusters can be located accurately and the optimal number of clusters obtained.
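Both evaluation criteria take only a few lines of Python. This sketch assumes Euclidean distance for dist and clusters given as lists of coordinate tuples. Giving a point in a singleton cluster a silhouette of 0 is a common convention, not something the method specifies.

```python
import math


def centroid(cluster):
    """Mean point of one cluster (eta_i in the SSE formula)."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(math.dist(x, centroid(c)) ** 2 for c in clusters for x in c)


def mean_silhouette(clusters):
    """Average silhouette over all points; needs at least two non-empty clusters."""
    values = []
    for ci, c in enumerate(clusters):
        for pi, x in enumerate(c):
            if len(c) == 1:
                values.append(0.0)  # convention for singleton clusters
                continue
            # d(i): mean distance to the other members of the same cluster.
            d = sum(math.dist(x, y) for qi, y in enumerate(c) if qi != pi) / (len(c) - 1)
            # b(i): smallest mean distance to the members of another cluster.
            b = min(
                sum(math.dist(x, y) for y in other) / len(other)
                for oi, other in enumerate(clusters) if oi != ci
            )
            denom = max(d, b)
            values.append((b - d) / denom if denom > 0 else 0.0)
    return sum(values) / len(values)


demo = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
```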
In this embodiment, the K-means-based genetic algorithm processing is performed as follows:
S1. Encode the cluster center data.
S2. Initial population: generate E initial individuals at random, each containing e cluster centers.
S3. Establish the fitness function f = between/(1 + D), where between is the inter-cluster distance and D is the intra-cluster distance.
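The method does not define how between and D are measured. The sketch below uses one hypothetical reading:
- between is the mean pairwise distance between the individual's cluster centers;
- D is the mean distance from each customer to its nearest center.

Under this reading, f grows as the clusters become better separated and tighter.

```python
import math
from itertools import combinations


def fitness(centers, points):
    """f = between / (1 + D) for one GA individual (a list of cluster centers).

    Hypothetical definitions: between = mean pairwise distance between centers,
    D = mean distance from each point to its nearest center.
    """
    D = sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)
    pairs = list(combinations(centers, 2))
    between = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return between / (1 + D)
```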
S4. Calculate the fitness of each individual and take the individual with the largest fitness value as the best individual (f increases as clusters become better separated and more compact); select individuals with the roulette-wheel algorithm, and finally replace the worst individual with the best individual.
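Roulette-wheel selection with elitism can be sketched with the standard library's `random.choices`. Its `weights` argument draws each individual with probability proportional to its fitness. The sketch assumes that a larger f marks a better individual (f grows with between-cluster separation). It also assumes all fitness values are non-negative with a positive sum, which `random.choices` requires.

```python
import random


def select_with_elitism(population, fitnesses, rng=random):
    """Roulette-wheel selection, then the worst pick is replaced by the best individual."""
    indices = list(range(len(population)))
    best = max(indices, key=fitnesses.__getitem__)
    # Roulette wheel: each draw picks individual i with probability f_i / sum(f).
    chosen = rng.choices(indices, weights=fitnesses, k=len(population))
    # Elitism: the lowest-fitness selected slot receives the best individual.
    worst_slot = min(range(len(chosen)), key=lambda s: fitnesses[chosen[s]])
    chosen[worst_slot] = best
    return [population[i] for i in chosen]
```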
S5. Crossover: individuals exchange positions through a crossover operator for floating-point encoding:
$$X_A^{t+1}=a\,X_B^{t}+(1-a)\,X_A^{t},\qquad X_B^{t+1}=a\,X_A^{t}+(1-a)\,X_B^{t}$$
where a is a random number between 0 and 1, $X_A^{t+1}$ is individual A after crossover (for the next iteration), $X_A^{t}$ is individual A in the current iteration, $X_B^{t+1}$ is individual B after crossover (for the next iteration), and $X_B^{t}$ is individual B in the current iteration.
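The following is a sketch of the floating-point crossover. It assumes each individual is a flat list of cluster-center coordinates and that the operator is the usual arithmetic crossover. With a drawn from [0, 1), the children are blends of the two parents. The special cases a = 1 and a = 0 reduce to a plain swap and to no change.

```python
import random


def arithmetic_crossover(parent_a, parent_b, a=None, rng=random):
    """Arithmetic crossover of two real-coded individuals.

    Each individual is a flat list of floats (concatenated cluster-center
    coordinates). a is drawn uniformly from [0, 1) unless given explicitly.
    """
    if a is None:
        a = rng.random()
    child_a = [a * xb + (1 - a) * xa for xa, xb in zip(parent_a, parent_b)]
    child_b = [a * xa + (1 - a) * xb for xa, xb in zip(parent_a, parent_b)]
    return child_a, child_b
```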
S6. Mutation: a uniform mutation operator is used; for each mutation point, a random value drawn from the value range of the corresponding gene replaces the original gene:
$$Y'=U_{\min}+\mu\,(U_{\max}-U_{\min})$$
where Y' is the mutated gene value, μ is a random number in (0,1), $U_{\min}$ is the minimum value of the current gene and $U_{\max}$ is its maximum value.
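Uniform mutation following Y' = U_min + μ(U_max - U_min) can be sketched as follows. The per-gene mutation probability `rate` is a hypothetical parameter that the method leaves unspecified. `random.random()` draws μ from [0, 1) rather than from the open interval (0, 1).

```python
import random


def uniform_mutation(individual, bounds, rate, rng=random):
    """Each gene mutates with probability `rate` (hypothetical parameter);
    a mutated gene becomes U_min + mu * (U_max - U_min)."""
    mutated = list(individual)
    for g, (u_min, u_max) in enumerate(bounds):
        if rng.random() < rate:
            mu = rng.random()  # uniform in [0, 1)
            mutated[g] = u_min + mu * (u_max - u_min)
    return mutated
```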
S7. Apply the K-means algorithm to the mutated population to compute new cluster centers, generate a new population from these centers and enter the next iteration, until the cluster centers no longer change; customers are then classified according to these cluster centers to obtain the final classification result.
In the K-means algorithm, the number of clusters and the initial cluster centers are taken as parameters, and W customers are divided into w clusters so that similarity within each cluster is high and similarity between clusters is low. The clustering is evaluated with the following squared-error criterion:
$$E=\sum_{i=1}^{w}\sum_{x\in c_i}\lVert x-\eta_i\rVert^2$$
where $c_i$ is the set of data of the i-th cluster, x is an element of the i-th cluster, and $\eta_i$ is the mean of the data of the i-th cluster.
The K-means algorithm adopts a greedy strategy and is solved by the following process:
(1) randomly initialize w customers as the initial cluster centers;
(2) assign each data point $x_i$ to its nearest cluster center;
(3) recompute each cluster center as the mean of the points assigned to it;
(4) repeat steps (2) and (3) until the cluster centers no longer change. This greedy strategy is prior art; only its general process is described here, as it is not the core improvement of this application, and it is not detailed further.
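For completeness, here is a sketch of the greedy K-means loop in steps (1) to (4). It takes the initial centers as an argument, so centers produced by the genetic algorithm can be passed in and refined. Euclidean distance is an assumption, as is keeping the old center when a cluster becomes empty.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means from given initial centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step (2): assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k])) for p in points]
        # Step (3): recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            )
        # Step (4): stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```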
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from their spirit and scope, and all such modifications shall fall within the scope of the claims of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711490136.5A | 2017-12-29 | 2017-12-29 | Client trading behavior analysis method |
| Publication Number | Publication Date |
|---|---|
| CN108230029A | 2018-06-29 |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109711484A (en)* | 2019-01-10 | 2019-05-03 | 哈步数据科技(上海)有限公司 | A kind of classification method and system of customer |
| CN110009417A (en)* | 2019-04-02 | 2019-07-12 | 深圳前海微众银行股份有限公司 | Target customer screening method, apparatus, device, and computer-readable storage medium |
| CN111563628A (en)* | 2020-05-09 | 2020-08-21 | 重庆锐云科技有限公司 | Real estate customer transaction time prediction method, device and storage medium |
| CN112070548A (en)* | 2020-09-11 | 2020-12-11 | 上海风秩科技有限公司 | User layering method, device, equipment and storage medium |
| CN112905863A (en)* | 2021-03-19 | 2021-06-04 | 青岛檬豆网络科技有限公司 | Automatic customer classification method based on K-Means clustering |
| CN114022283A (en)* | 2021-11-10 | 2022-02-08 | 北银金融科技有限责任公司 | Upstream and downstream data mining method based on bank transaction line enterprise |
| CN117493979A (en)* | 2023-12-29 | 2024-02-02 | 青岛智简尚达信息科技有限公司 | Customer classification method based on data processing |
| CN119624511A (en)* | 2025-02-17 | 2025-03-14 | 山东恒迈信息科技有限公司 | A customer behavior analysis method and system based on retail data |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102136123A (en)* | 2011-03-15 | 2011-07-27 | 中国工商银行股份有限公司 | Target bank customer recognition system |
| CN102254028A (en)* | 2011-07-22 | 2011-11-23 | 青岛理工大学 | A personalized product recommendation method and system integrating attribute and structure similarity |
| CN104778605A (en)* | 2015-04-09 | 2015-07-15 | 北京京东尚科信息技术有限公司 | Method and device for classifying E-commerce customers |
| US20160255969A1 (en)* | 2015-03-06 | 2016-09-08 | Wal-Mart Stores, Inc. | Shopping facility assistance systems, devices and methods pertaining to movement of a mobile retail product display |
| CN106127521A (en)* | 2016-03-23 | 2016-11-16 | 四川长虹电器股份有限公司 | A kind of information processing method and data handling system |
| CN106202216A (en)* | 2016-03-23 | 2016-12-07 | 四川长虹电器股份有限公司 | A kind of data processing method and data handling system |
| CN106355449A (en)* | 2016-08-31 | 2017-01-25 | 腾讯科技(深圳)有限公司 | User selecting method and device |
| CN106529968A (en)* | 2016-09-29 | 2017-03-22 | 深圳大学 | Customer classification method and system thereof based on transaction data |
| Title |
|---|
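| FREY et al.: "Clustering by passing messages between data points", Science* |
| 曾小青 (Zeng Xiaoqing) et al.: "A new multi-indicator customer segmentation method based on consumption data mining", Application Research of Computers (计算机应用研究)* |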
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109711484A (en)* | 2019-01-10 | 2019-05-03 | 哈步数据科技(上海)有限公司 | Customer classification method and system |
| CN110009417A (en)* | 2019-04-02 | 2019-07-12 | 深圳前海微众银行股份有限公司 | Target customer screening method, apparatus, device, and computer-readable storage medium |
| CN110009417B (en)* | 2019-04-02 | 2023-04-18 | 深圳前海微众银行股份有限公司 | Target customer screening method, device, equipment and computer readable storage medium |
| CN111563628A (en)* | 2020-05-09 | 2020-08-21 | 重庆锐云科技有限公司 | Real estate customer transaction time prediction method, device and storage medium |
| CN112070548A (en)* | 2020-09-11 | 2020-12-11 | 上海风秩科技有限公司 | User layering method, device, equipment and storage medium |
| CN112070548B (en)* | 2020-09-11 | 2024-02-20 | 上海秒针网络科技有限公司 | User layering method, device, equipment and storage medium |
| CN112905863A (en)* | 2021-03-19 | 2021-06-04 | 青岛檬豆网络科技有限公司 | Automatic customer classification method based on K-Means clustering |
| CN114022283A (en)* | 2021-11-10 | 2022-02-08 | 北银金融科技有限责任公司 | Upstream and downstream data mining method based on bank transaction line enterprise |
| CN117493979A (en)* | 2023-12-29 | 2024-02-02 | 青岛智简尚达信息科技有限公司 | Customer classification method based on data processing |
| CN119624511A (en)* | 2025-02-17 | 2025-03-14 | 山东恒迈信息科技有限公司 | A customer behavior analysis method and system based on retail data |
| Publication | Title |
|---|---|
| CN108230029A (en) | Client trading behavior analysis method |
| CN112070125A (en) | Prediction method for imbalanced data sets based on isolation forest learning |
| CN111612323B (en) | Electric power credit investigation evaluation method based on big data model |
| CN110619351B (en) | Cainiao Station site selection method based on improved k-means algorithm |
| CN104321794B (en) | System and method for determining the future commercial viability of an entity using multidimensional grading |
| CN118552303B (en) | Financial big data fusion analysis method |
| CN114444573A (en) | Power customer label generation method based on big data clustering technology |
| CN113554310A (en) | A dynamic evaluation model of enterprise credit based on smart contracts |
| CN116823496A (en) | Intelligent insurance risk assessment and pricing system based on artificial intelligence |
| CN117539920B (en) | Data query method and system based on real estate transaction multidimensional data |
| CN104850868A (en) | Customer segmentation method based on k-means and neural network clustering |
| CN113591947A (en) | Power data clustering method and device based on power consumption behaviors, and storage medium |
| CN114611976A (en) | Power consumer behavior profiling method, system and device |
| CN116883157A (en) | Small sample credit assessment method and system based on metric learning |
| CN109978023A (en) | Feature selection method and computer storage medium for high-dimensional big data analysis |
| CN118898498B (en) | Digital management method for electric power system commercial environment based on big data analysis |
| CN111221915B (en) | Online learning resource quality analysis method based on CWK-means |
| CN119172476B (en) | Method, system and device for intelligent sorting of outbound calls for non-performing assets based on machine learning |
| CN112348220A (en) | Credit risk assessment prediction method and system based on enterprise behavior pattern |
| CN114626940A (en) | Data analysis method and device and electronic equipment |
| CN114091961A (en) | Power enterprise supplier evaluation method based on semi-supervised SVM |
| CN115730254B (en) | Method and device for expanding modeling sample data labels |
| Zheng | Application of silence customer segmentation in securities industry based on fuzzy cluster algorithm |
| Yang et al. | Application research of K-means algorithm based on big data background |
| CN115169914A (en) | Customer behavior data analysis method and device |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180629 |