CN114861050A


Info

Publication number: CN114861050A
Application number: CN202210454110.XA
Authority: CN
Inventors: 边根庆; 李婷
Original assignee: Xian University of Architecture and Technology
Current assignee: Xian University of Architecture and Technology
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2022-08-05

Abstract


The invention discloses a neural-network-based feature fusion recommendation method and system in the field of intelligent product recommendation for new retail enterprises. Traditional algorithms recommend items based only on users' behavior toward those items. This invention instead fuses user features with item features and proposes a neural-network feature fusion recommendation model. Selected user and item features are encoded into feature vectors. A neural-network embedding layer and fully connected layers then extract deeper features and produce user and item feature representations. Finally, the two matrices are multiplied to predict the user's rating of the item. Compared with the traditional item-based collaborative filtering algorithm, the model shows clear advantages in MSE, RMSE and MAE.

Description


Technical Field

The invention belongs to the field of intelligent product recommendation for new retail enterprises. It relates in particular to a neural-network-based feature fusion recommendation method and system.

Background Art

Today, more and more people use network technology to obtain the knowledge and information they need. This has caused a surge in the amount of available information and the problem of "information overload". For users, information overload on the Internet often makes it hard to find the content they want. Search engines can alleviate the problem, but they require users to state fairly clear search requirements. When users have no specific need, recommender systems offer another route and make personalized service possible. A recommender system is more than an effective information-processing tool. It is also an intelligent search platform based on user interests, helping users discover items they may like and saving them search time. For enterprises, a recommender system can proactively present potentially interesting products to users. This improves information utilization and user satisfaction, which in turn increases user stickiness and business revenue. In short, recommender systems bring value to enterprises and convenience to users' work and life, and they influence the development of society as a whole to some extent. Recommendation is therefore an indispensable technology for the future intelligent society. It has great value in academic research, commerce and practical applications.

Traditional recommendation algorithms fall into three main categories: content-based recommendation (Content Based, CB), collaborative filtering (Collaborative Filtering, CF) and hybrid recommendation (Hybrid Recommendations). Collaborative filtering is further divided into item-based collaborative filtering (Item CF) and user-based collaborative filtering (User CF). Item CF and CB both recommend on the basis of item similarity, but they compute that similarity differently. Item CF uses users' historical preferences, while CB uses the characteristics of the items themselves. Because CB recommends directly from item feature information, it faces a feature-extraction problem, for two reasons. First, item descriptions are usually unstructured text, which makes feature extraction harder. Second, items carry few attributes of their own, which makes feature selection harder. CF avoids the feature-extraction problem by recommending from users' behavior toward items. However, it needs a fairly dense user-item matrix to produce accurate recommendations. In practice, each user has interacted with only a small number of items, so the user-item matrix is extremely sparse. This increases the computational difficulty and also makes the recommendations less accurate.

The above analysis shows that traditional recommendation algorithms suffer from difficult feature extraction and low recommendation accuracy. Meanwhile, expectations of personalized service keep rising, while users' behavioral characteristics are often hard to obtain. A key open issue for current recommendation technology is therefore how to mine useful information from large amounts of network data and make correct decisions from it.

Summary of the Invention

The purpose of the present invention is to overcome the above shortcomings of the prior art. It provides a neural-network-based feature fusion recommendation method and system that address three problems: difficult feature extraction, low recommendation accuracy, and a single way of using feature information.

To achieve the above purpose, the present invention adopts the following technical solutions:

The invention discloses a neural-network-based feature fusion recommendation method, comprising:

obtaining product feature and user feature information;

vectorizing the product feature and user feature information to form a product feature matrix and a user feature matrix;

inputting the product feature matrix and the user feature matrix into the neural-network feature fusion recommendation model to build a three-layer neural-network feature fusion recommendation model;

initializing the parameters of the three-layer feature fusion recommendation model and generating a recommendation list ranked by predicted score from high to low.
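The last step turns predicted scores into a ranked list. A minimal Python sketch, assuming the model's predictions for one user are already available as a dict from item id to predicted score. The function name `recommend`, the item ids and the choice to skip items the user has already rated are illustrative assumptions; the text does not specify them.

```python
def recommend(predicted, already_rated, top_n=10):
    """Return up to top_n item ids ordered by predicted score, highest first.

    predicted: dict mapping a (hypothetical) item id to its predicted score.
    already_rated: set of item ids to leave out of the list.
    """
    candidates = [(item, score) for item, score in predicted.items()
                  if item not in already_rated]
    # Sort by score, descending; Python's sort is stable, so ties keep dict order.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:top_n]]


predicted = {"p1": 3.2, "p2": 4.8, "p3": 1.1, "p4": 4.1}
print(recommend(predicted, already_rated={"p2"}, top_n=2))  # ['p4', 'p1']
```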

Preferably, the product features are the product code, category, name, secondary classification, applicable age, applicable body part, applicable skin type and user unit price. The user features are gender and skin type. Gender has two classes (male and female), and skin type has four classes (dry, oily, sensitive and combination).

Preferably, vectorizing the product features comprises:

1) selecting product features to form a product feature tag library;

2) vectorizing the text information in the product feature tag library by one-hot encoding;

3) normalizing the user unit price and applying embedding to obtain a deep feature matrix representation of the products.

Preferably, the product feature data are selected using four methods together: analysis of variance (ANOVA), the mutual information method, an embedded method based on classification and regression trees (CART), and a wrapper method based on recursive feature elimination with cross-validation (RFECV).

Preferably, the product features are selected using analysis of variance, the mutual information method, an embedded method based on classification and regression trees, or a wrapper method based on recursive feature elimination with cross-validation.
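Of the selection criteria listed above, mutual information is the simplest to show without a library. The following is a stdlib-only Python sketch, assuming a categorical feature and a categorical target given as equal-length lists. The function name, the target (a hypothetical "bought again" flag) and the toy data are illustrative only. ANOVA, CART importance and RFECV would normally come from a library such as scikit-learn and are not shown.

```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Mutual information I(X; Y), in nats, between two equal-length label sequences."""
    if not feature or len(feature) != len(target):
        raise ValueError("feature and target must be non-empty and of equal length")
    n = len(feature)
    joint = Counter(zip(feature, target))
    count_x = Counter(feature)
    count_y = Counter(target)
    mi = 0.0
    for (x, y), count_xy in joint.items():
        p_xy = count_xy / n
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), with probabilities taken from counts
        mi += p_xy * math.log(p_xy * n * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: was the product bought again (1) or not (0)?
skin_type = ["dry", "oily", "dry", "oily", "dry", "oily"]
category = ["A", "A", "B", "B", "A", "B"]
bought_again = [1, 0, 1, 0, 1, 0]
scores = {name: mutual_information(values, bought_again)
          for name, values in [("skin_type", skin_type), ("category", category)]}
print(sorted(scores, key=scores.get, reverse=True))  # ['skin_type', 'category']
```

Features with higher scores carry more information about the target and would be kept first.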

Preferably, in step 1, vectorizing the user features comprises:

a) selecting user features to form a user feature tag library;

b) vectorizing the text information in the user feature tag library by one-hot encoding;

c) normalizing the feature vectors and applying embedding to obtain a deep feature matrix representation of the users.

Preferably, the three-layer feature fusion recommendation model is structured as follows. The first layer is an embedding layer that converts the feature vectors into low-dimensional, dense feature representations. The second layer is a fully connected layer that concatenates all feature representations to obtain the user and product feature vectors. The third layer is a fully connected layer that takes the product and user features produced by the first two layers as input. It multiplies the two inputs as matrices to obtain one output value and regresses that value onto the true rating.

Preferably, constructing the three-layer feature fusion recommendation model comprises:

Step 1: vectorizing the product features and the user features to form a product feature matrix and a user feature matrix;

Step 2: using the product feature matrix and the user feature matrix as input to the first layer of the model, whose iterative training yields low-dimensional, dense embedding-layer feature representations;

Step 3: feeding the embedding-layer feature representations into the second, fully connected layer, where they are concatenated into fully connected layer feature vectors;

Step 4: feeding these feature vectors into the third, fully connected layer; iterative training and regression then yield the three-layer feature fusion recommendation model.

Preferably, evaluating the three-layer feature fusion recommendation model comprises:

S1: initializing the model parameters and generating a recommendation list ranked by predicted score from high to low;

S2: evaluating the accuracy of the model by the difference between the predicted ratings and the true ratings.

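Preferably, the accuracy of the feature fusion recommendation model is evaluated by MSE, RMSE and MAE:

$$\mathrm{MSE}=\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}$$

$$\mathrm{MAE}=\frac{1}{|T|}\sum_{(u,i)\in T}\left|r_{ui}-\hat{r}_{ui}\right|$$

where u denotes a user, i an item, T the set of user-item pairs in the test set, r_{ui} the actual rating of item i by user u, and \hat{r}_{ui} the rating predicted by the recommendation algorithm.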

A neural-network-based feature fusion recommendation system, comprising:

a feature information acquisition module, configured to obtain product feature and user feature information;

a vectorization module, configured to vectorize the product feature and user feature information to form a product feature matrix and a user feature matrix;

a model generation module, configured to input the product feature matrix and the user feature matrix into the neural-network feature fusion recommendation model to build a three-layer neural-network feature fusion recommendation model;

a parameter initialization module, configured to initialize the parameters of the three-layer feature fusion recommendation model and generate a recommendation list ranked by predicted score from high to low.

Compared with the prior art, the present invention has the following beneficial effects:

The disclosed method uses auxiliary information to extract distributed feature representations of users and items. It also exploits the strengths of neural networks in feature extraction and large-scale data analysis. From users' historical ratings of items, it learns a model of each user's interests and predicts the rating that user would give to an item not yet rated. The accuracy of such rating prediction is generally measured by mean squared error, root mean squared error and mean absolute error, which suit datasets that contain ratings. To address difficult feature extraction, the invention selects features using analysis of variance, mutual information, a classification-and-regression-tree embedded method and similar techniques. The selected features are one-hot encoded and fed into the embedding layer of the neural network. The embedded data have a lower dimension, and discrete sequences are mapped to continuous vectors. This makes it possible to find nearest neighbors in the embedding space and to mine relationships between variables. Measured by MSE, RMSE and MAE, the accuracy of the recommendation results is effectively improved.

The proposed feature fusion recommendation model is a three-layer neural network. The first layer is an embedding layer that converts the feature vectors into low-dimensional, dense feature representations. The second layer is a fully connected layer that concatenates all feature vectors to obtain the user and product feature vectors. The third layer is also fully connected and takes the product and user features produced by the first two layers as input. It multiplies the two inputs as matrices to obtain an output value, then regresses that value onto the true rating to minimize the loss. Over repeated iterations the model gradually reduces the error between predicted and actual values and produces more accurate predictions.

In the disclosed construction method, product and user features are first extracted by feature engineering and one-hot encoded. They are then fed into the neural-network embedding layer, which lowers their dimension and maps discrete sequences to continuous vectors. Vectorizing product features by one-hot encoding alone would make the product feature vectors extremely sparse. Neural networks handle such sparse vectors poorly, so the vectors must be processed further into lower-dimensional, dense representations. Embedding is a method that converts discrete variables into dense vectors. Training the product and user feature vectors further through the embedding layer reduces their dimension. It also makes it possible to find nearest neighbors in the embedding space and to mine relationships between variables. The fully connected layers map the learned feature representations to the label space of the samples. In other words, they integrate and refine the features so that they can be passed to the final classifier or regression function.

Furthermore, a machine cannot directly interpret identifiers such as words, phrases and characters, so these identifiers must be converted to numbers. The present invention uses one-hot encoding to convert product features to numerical form. One-hot encoding has two benefits. First, it expands a discrete feature into a finite n-dimensional space, so that inner products, distances and similar quantities between vectors can be computed and the output can be used for machine learning. Second, each dimension of a one-hot encoded feature can be treated as a continuous feature and normalized. In addition, the present invention normalizes numerical information such as price. This avoids extremely large or small values in the samples that would adversely affect subsequent computation.

Furthermore, to address difficult feature extraction, the present invention selects features using analysis of variance, mutual information, a classification-and-regression-tree embedded method and similar techniques. This reduces the loss of accuracy caused by poor feature selection.

Description of Drawings

Fig. 1 is the flowchart of the present invention;

Fig. 2 shows user feature information extraction;

Fig. 3 shows product feature information extraction;

Fig. 4 shows the feature fusion recommendation model;

Fig. 5 shows training of the feature fusion recommendation model measured by RMSE;

Fig. 6 shows training of the feature fusion recommendation model measured by MSE;

Fig. 7 shows training of the feature fusion recommendation model measured by MAE;

Fig. 8 compares the experimental results;

Fig. 9 is the use case diagram of the system;

Fig. 10 is the system architecture diagram.

Detailed Description of the Embodiments

To help those skilled in the art better understand the solution of the present invention, the technical solutions in its embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Note that the terms "first", "second" and the like in the description, claims and drawings of the present invention distinguish similar objects. They do not necessarily describe a particular order or sequence. Data so labeled are interchangeable where appropriate, so the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having" and any variants of them are intended to cover non-exclusive inclusion. A process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed. It may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.

The present invention is described in further detail below with reference to the accompanying drawings.

The neural-network-based feature fusion recommendation method disclosed by the present invention is as follows.

Referring to Fig. 1, the flowchart of the present invention, the method includes:

S1. Product feature vectorization

Product feature vectorization consists of feature extraction and feature vectorization. Fig. 2 shows the product feature vectorization process, which is described in detail as follows:

1) The present invention selects product feature data using four methods: analysis of variance, mutual information, a CART-based embedded method and an RFECV-based wrapper method. Eight features are finally selected: product code, category, name, secondary classification, applicable age, applicable body part, applicable skin type and user unit price. Together they form the product feature tag library.

2) A machine cannot take text identifiers as input directly, so the selected features must be converted to numbers before the feature fusion recommendation model is built. The present invention vectorizes the product text information by one-hot encoding. The product features that require numerical conversion are secondary classification, category, applicable age, applicable body part and applicable skin type:
- secondary classification has 15 classes;
- category has 37 classes;
- applicable body part has 11 classes: head, face, eyes, lips, hands, feet, trunk, hair, body hair, whole-body skin, nails and special parts;
- applicable age is divided into 5 intervals: under 18, 18 to 25, 25 to 30, 30 to 40, and over 40;
- applicable skin type has 5 classes: dry, neutral, oily, combination and sensitive.

3) After all feature information has been encoded, the user unit price must be normalized. This avoids extremely large or small values in the samples that would adversely affect subsequent computation. Normalization confines the data to a fixed range, such as [0, 1]. Common methods are linear normalization and standard-deviation normalization. Linear normalization, also called min-max normalization, is a linear transformation of the original data that maps the results into [0, 1], as shown in (1):

$$x'=\frac{x-\min(x)}{\max(x)-\min(x)} \tag{1}$$

Here min(x) and max(x) are the minimum and maximum of the original data set. This normalization is commonly used when the values are fairly concentrated. If the difference between min and max is large, the normalized result tends to be unstable, so in practice min and max are usually replaced by empirical constants. Standard-deviation normalization, also called Z-score standardization, expresses each original value as its deviation from the mean in units of the standard deviation. The transformed data have mean 0 and standard deviation 1. They follow a standard normal distribution only if the original data were normally distributed. The conversion function is:

$$x'=\frac{x-\mu}{\sigma} \tag{2}$$

Here μ is the mean and σ the standard deviation of the original data. After repeated testing, the present invention uses linear (min-max) normalization for product prices.
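The min-max and Z-score transformations described above can be sketched in Python as follows. The function names and sample prices are hypothetical. The optional `lo`/`hi` arguments stand in for the empirical constants mentioned above. Values outside those bounds are not clipped, so they map outside [0, 1].

```python
import statistics


def min_max(values, lo=None, hi=None):
    """Linear (min-max) normalization; lo/hi default to the data's own min and max."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0.0 for _ in values]  # constant input: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score standardization using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


prices = [19.9, 59.0, 128.0, 299.0]  # hypothetical unit prices
print(min_max(prices))
print(z_score(prices))
```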

S2. User feature vectorization

User feature vectorization consists of feature extraction and feature vectorization. Fig. 3 shows the user feature vectorization process, which is described in detail as follows:

1) Four user features are selected to form the user feature tag library: mobile phone number, gender, age and skin type.

2) The user text information is vectorized by one-hot encoding. The features that require numerical conversion are gender and skin type. Gender has two classes (male and female), and skin type has five classes (dry, neutral, oily, combination and sensitive).

3) After all feature information has been converted to numbers, user age is normalized. The present invention uses linear normalization for this.
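Step 2) above can be illustrated with a small one-hot encoder for the two categorical user features. This is a stdlib-only Python sketch. The English label spellings and the function names are hypothetical. The vocabularies follow the two gender classes and five skin-type classes listed in the text.

```python
GENDERS = ["male", "female"]
SKIN_TYPES = ["dry", "neutral", "oily", "combination", "sensitive"]


def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of value in vocabulary."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1  # list.index raises ValueError for unseen labels
    return vector


def encode_user(gender, skin_type):
    """Concatenate the one-hot blocks of the user's gender and skin type."""
    return one_hot(gender, GENDERS) + one_hot(skin_type, SKIN_TYPES)


print(encode_user("female", "oily"))  # [0, 1, 0, 0, 1, 0, 0]
```

Each field contributes exactly one 1, so the vector grows sparser as the number of classes grows. The 37 product categories are an example.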

S3. Embedding layer processing

1) Converting product codes to numbers by one-hot encoding alone would make the feature vectors extremely sparse. Neural networks handle such sparse vectors poorly, so the vectors must be processed further into lower-dimensional, dense feature vector representations. Embedding is a method that converts discrete variables into dense vectors. Training the product and user feature vectors further through the neural-network embedding layer reduces the dimension of the feature variables.

2) The embedding layer produces deep feature matrix representations of products and users.
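Multiplying a one-hot vector by an embedding matrix selects exactly one row of that matrix. An embedding layer is therefore a trainable lookup table that never needs to build the sparse vector. The stdlib-only Python sketch below checks this with random, untrained weights. The function names and the uniform initialization are assumptions for illustration.

```python
import random


def make_embedding(num_categories, dim, seed=0):
    """Create a num_categories x dim table of small random weights (learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_categories)]


def embed(index, table):
    """Dense route: the embedding of category `index` is simply row `index` of the table."""
    return table[index]


def one_hot_times_table(index, table):
    """Sparse route: multiply a one-hot row vector by the table, column by column."""
    one_hot = [1.0 if k == index else 0.0 for k in range(len(table))]
    return [sum(one_hot[k] * table[k][j] for k in range(len(table)))
            for j in range(len(table[0]))]


# Five skin types mapped to 16-dimensional dense vectors (16 as in the text).
table = make_embedding(num_categories=5, dim=16)
print(embed(2, table) == one_hot_times_table(2, table))  # True
```

During training, the gradient of a lookup reaches only the rows that were looked up. This is why embeddings remain practical for fields with many classes, such as product IDs or phone numbers.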

The input to the model is the embedding matrices of users and products, which represent the distinctive feature information of each user and product. Fig. 4 shows the neural network structure of the feature aggregation recommendation model.

1) First, the user feature embedding matrix and the product feature embedding matrix obtained after feature extraction are used as input to the feature aggregation recommendation model.
- The user feature embedding matrix covers the user's mobile phone number, gender, age and skin type. The phone-number embedding has size (N, 32), and the gender, age and skin-type embeddings each have size (N, 16).
- The product feature embedding matrix covers the product's category, name, secondary classification, applicable age, applicable body part, applicable skin type and user unit price. The product-ID embedding has size (N, 32), and the name, secondary classification, category, applicable age, applicable body part, applicable skin type and user unit price embeddings each have size (N, 16).

2) The feature vectors trained by the embedding layer are used as input to the second, fully connected layer. Inputs with dimension 32 (the user's phone number and the product ID) map to 32 outputs. All other user and product features have input dimension 16 and also map to 32 outputs.

3) In the third, fully connected layer, the input feature space has dimension 128 and the final fully connected layer has 200 neurons.
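A small forward-pass sketch can check the layer sizes above. This is a stdlib-only Python illustration, not the authors' code. Several details are assumptions that the text does not specify: ReLU after the per-field layers, tanh after the 200-unit layer, Gaussian initialization, and an inner product as the final "matrix multiplication". The sketch uses four hypothetical fields per side, so the concatenated vector is 4 × 32 = 128 wide, matching the stated input dimension of the third layer. No training happens here; the weights are random.

```python
import math
import random


def relu(z):
    return max(0.0, z)


class Dense:
    """Fully connected layer: y = activation(W x + b), weights drawn from N(0, 0.1^2)."""

    def __init__(self, n_in, n_out, rng, activation):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.activation = activation

    def __call__(self, x):
        return [self.activation(sum(wi * xi for wi, xi in zip(row, x)) + bias)
                for row, bias in zip(self.w, self.b)]


class Tower:
    """One side of the model: per-field embedding -> per-field FC(32) -> concat -> FC(200)."""

    def __init__(self, field_sizes, embed_dims, rng, fc_dim=32, out_dim=200):
        # One embedding table (num_classes x embed_dim) per categorical field.
        self.tables = [[[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n)]
                       for n, d in zip(field_sizes, embed_dims)]
        self.field_fc = [Dense(d, fc_dim, rng, relu) for d in embed_dims]
        self.top = Dense(fc_dim * len(field_sizes), out_dim, rng, math.tanh)

    def __call__(self, field_indices):
        parts = []
        for table, fc, idx in zip(self.tables, self.field_fc, field_indices):
            parts.extend(fc(table[idx]))  # embedding lookup, then field-level FC
        return self.top(parts)  # concatenated field outputs -> out_dim representation


def predict_rating(user_vec, item_vec):
    """Predicted rating: inner product of the user and product representations."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


rng = random.Random(42)
# Hypothetical user fields: phone-number id, gender, age bucket, skin type.
user_tower = Tower([1000, 2, 5, 5], [32, 16, 16, 16], rng)
# Hypothetical product fields: product id, category, applicable age, applicable skin type.
item_tower = Tower([500, 37, 5, 5], [32, 16, 16, 16], rng)
u = user_tower([17, 1, 2, 4])
v = item_tower([3, 10, 4, 2])
print(len(u), len(v), round(predict_rating(u, v), 4))
```

In a batched implementation, multiplying the (N, 200) user matrix by the transpose of an (M, 200) product matrix would give all N × M predicted scores at once. The per-pair inner product above is one entry of that product.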

S4. Model parameter initialization and training

The parameters are initialized as follows:
- number of training epochs: epochs = 10;
- learning rate: learning_rate = 0.0001;
- embedding dimension: embed_dim = 32;
- second fully connected layer dimension: embed_dim = 32;
- third fully connected layer: input dimension 128, with 200 neurons in the final fully connected layer;
- samples per training step: batch_size = 256.

Adam is used as the optimizer. The model is trained with MAE, MSE and RMSE as loss functions.
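The hyperparameters above can be collected in one place, together with the Adam update rule the optimizer applies. The following is a stdlib-only Python sketch. The `TrainConfig` and `Adam` names are hypothetical, and the values mirror the text. beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the usual Adam defaults, assumed here because the text does not give them. The gradient passed to `step` would come from whichever loss (MAE, MSE or RMSE) is used.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameters as listed in the text."""
    epochs: int = 10
    learning_rate: float = 0.0001
    embed_dim: int = 32
    fc_dim: int = 32
    top_in_dim: int = 128
    top_out_dim: int = 200
    batch_size: int = 256


class Adam:
    """Adam update for a flat list of parameters, with bias-corrected moment estimates."""

    def __init__(self, n_params, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, params, grads):
        """Return updated parameters given the current gradients."""
        self.t += 1
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


cfg = TrainConfig()
opt = Adam(n_params=1, lr=cfg.learning_rate)
# Gradient of (x - 3)^2 at x = 0 is -6; the first step moves x by about learning_rate toward 3.
params = opt.step([0.0], grads=[-6.0])
print(params)
```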

S5. Evaluate the model

Besides computing the accuracy of the recommendation results, a recommendation model can be evaluated by predicting users' ratings of products and measuring the difference between predicted and actual ratings. If users' historical ratings are available, the model can predict each user's rating of products the user has not yet rated. Whether to recommend a product to the user is then decided by the predicted rating. The accuracy of such rating-prediction models is generally evaluated with MSE, RMSE and MAE, which suit datasets that contain ratings. MSE, RMSE and MAE are computed by formulas (3), (4) and (5), respectively:

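$$\mathrm{MSE}=\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}\tag{3}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}\tag{4}$$

$$\mathrm{MAE}=\frac{1}{|T|}\sum_{(u,i)\in T}\left|r_{ui}-\hat{r}_{ui}\right|\tag{5}$$

where $u$ denotes a user and $i$ denotes an item. $T$ is the set of (user, item) pairs being evaluated. $r_{ui}$ is user $u$'s actual rating of item $i$, and $\hat{r}_{ui}$ is the rating predicted by the recommendation algorithm.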
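The three metrics translate directly into code. This sketch assumes the actual and predicted ratings are given as two equal-length lists, ordered by the same (user, item) pairs. The sample values are made up for illustration.

```python
import math


def mse(actual, predicted):
    """Mean squared error over paired ratings, formula (3)."""
    return sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, formula (4)."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error, formula (5)."""
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 1.0, 2.0]      # hypothetical true ratings
predicted = [2.5, 5.0, 2.0, 2.0]   # hypothetical model outputs
scores = (mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

RMSE is reported in the same units as the ratings. Because the errors are squared, it weights large errors more heavily than MAE does.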

First, in the "a retail enterprise" dataset used in the present invention, the number of times a user purchased a product is taken, after processing, as the user's rating of that product. This meets the requirements of the above metrics. Second, the neural-network-based feature fusion recommendation model proposed by the present invention predicts users' ratings of unknown products from their ratings of known products. MSE, RMSE and MAE are therefore used as the evaluation metrics in the experiments.
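The rating construction can be sketched as a count over sales records. The field names `user_id` and `product_id` are hypothetical. Counting records, rather than summing the quantity field, is an assumption about what "number of purchases" means.

```python
from collections import Counter


def purchase_count_ratings(sales_records):
    """Map each (user_id, product_id) pair to the number of purchase records for it.

    Assumption: one record per purchase, so the quantity field is ignored.
    """
    return Counter((rec["user_id"], rec["product_id"]) for rec in sales_records)


sales = [
    {"user_id": "u1", "product_id": "p1", "quantity": 2},
    {"user_id": "u1", "product_id": "p1", "quantity": 1},
    {"user_id": "u2", "product_id": "p1", "quantity": 1},
    {"user_id": "u1", "product_id": "p2", "quantity": 3},
]
ratings = purchase_count_ratings(sales)
```

If quantity should count instead, sum `rec["quantity"]` per pair in place of counting records.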

8. System application

The above research yields a neural-network-based feature fusion recommendation model that outperforms traditional algorithms. Next, an intelligent product recommendation system for new retail enterprises is built, and the above model is applied in it. The system is divided into five modules:

- background management module;
- data processing module;
- product display module;
- statistical recommendation module;
- personalized recommendation module.

The personalized recommendation module is the core module. Its main task is to help users discover information of value to them. This improves user satisfaction and increases the enterprise's profit, so that consumers and producers both benefit.

Traditional algorithms make recommendations based only on users' behavior toward items. Unlike them, the present invention fuses user and item features and proposes a neural-network-based feature fusion recommendation model. Selected user and item features are encoded into feature vectors. The neural network's embedding layer and fully connected layers then extract deeper features to produce feature representations of users and items. Finally, the two feature matrices are multiplied to predict users' ratings of items. Compared with the traditional item-based collaborative filtering algorithm, the model shows clear advantages in MSE, RMSE and MAE.

Verification experiment

To demonstrate the performance of the present invention, the method of the present invention was compared experimentally with traditional collaborative-filtering-based methods on a selected dataset. The dataset comes from a retail enterprise and consists of one year of the enterprise's sales records, user data and product data:

- The sales records are the users' product purchases over the past year. Their fields include user ID, product ID, quantity, amount and transaction date.
- The user data record users' basic attributes, including gender, birth year, date of birth, user level and skin type.
- The product data include product name, classification, series, category, applicable age, retail price, etc.

The key to the feature fusion model is feature extraction for products and members. To ensure the accuracy of the experiment, the product data and member data are cleaned by filtering out records whose selected features are all empty.

For member data:

- records in which gender, age and skin type are all empty are removed;
- where only the skin type is empty, it is filled with neutral skin type;
- where gender is empty, it is filled with female;
- where age is empty, it is filled with the mean age of all members.

This leaves 185,696 member records. For product data, a record is removed if four or more of the following product features are empty: name, secondary classification, category, applicable age, applicable body part, applicable skin type and member unit price. This leaves 882 product records.
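The cleaning rules can be sketched over in-memory records, with `None` marking an empty field. The sketch rests on these assumptions:

- The field names are hypothetical.
- Each fill rule is applied to its own field independently.
- The mean age is computed from the members that survive filtering.
- The product threshold is read as "four or more empty fields".

```python
MEMBER_FIELDS = ("gender", "age", "skin_type")
PRODUCT_FIELDS = ("name", "subcategory", "category", "age_range",
                  "body_part", "skin_type", "member_price")


def clean_members(members):
    """Drop members with no gender, age or skin type, then fill the remaining gaps."""
    kept = [m for m in members if any(m.get(f) is not None for f in MEMBER_FIELDS)]
    ages = [m["age"] for m in kept if m.get("age") is not None]
    mean_age = sum(ages) / len(ages) if ages else None  # assumption: mean over kept members
    cleaned = []
    for m in kept:
        m = dict(m)  # copy so the caller's record is not mutated
        if m.get("skin_type") is None:
            m["skin_type"] = "neutral"
        if m.get("gender") is None:
            m["gender"] = "female"
        if m.get("age") is None:
            m["age"] = mean_age
        cleaned.append(m)
    return cleaned


def clean_products(products, max_missing=3):
    """Keep products with at most max_missing empty fields, i.e. drop four or more."""
    return [p for p in products
            if sum(p.get(f) is None for f in PRODUCT_FIELDS) <= max_missing]


members = [
    {"id": 1, "gender": None, "age": None, "skin_type": None},
    {"id": 2, "gender": "male", "age": 30, "skin_type": None},
    {"id": 3, "gender": None, "age": None, "skin_type": "oily"},
    {"id": 4, "gender": "female", "age": 40, "skin_type": "dry"},
]
cleaned = clean_members(members)

full = dict.fromkeys(PRODUCT_FIELDS, "x")
p_three_missing = {**full, "name": None, "category": None, "body_part": None}
p_four_missing = {**p_three_missing, "member_price": None}
kept_products = clean_products([p_three_missing, p_four_missing])
```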

This experiment uses the same hyperparameters as step S4:

- number of training epochs: 10;
- learning rate learning_rate: 0.0001;
- embedding dimension embed_dim: 32;
- output dimension of the second (fully connected) layer: 32;
- third (fully connected) layer: input dimension 128, 200 output neurons;
- batch size batch_size: 256.

Adam is used as the optimizer, and the training data are randomly shuffled. The model is trained with MAE, MSE and RMSE, respectively, as the loss function. The experimental analysis is as follows:

Convergence analysis: the model was optimized over multiple iterations with each of the loss functions in turn, and the results are shown in Figures 5, 6 and 7. Figure 5 is the training curve of the neural network with Loss = RMSE, Figure 6 with Loss = MSE, and Figure 7 with Loss = MAE. The figures show that after about ten iterations the loss has approached zero. This indicates that the recommendation model converges quickly and with small error. To check for overfitting, the model was validated on the test set. Figure 8 compares the model fit: the red line is the actual rating and the black line is the predicted rating. The peaks largely coincide.

Accuracy analysis: besides MSE, RMSE and MAE, the proposed neural-network-based feature fusion recommendation model is compared with other recommendation algorithms to verify the accuracy of the method of the present invention. Table 1 compares the proposed model with the following algorithms on the "a retail enterprise" dataset:

- Slope One, an item-based collaborative filtering algorithm;
- a recommendation algorithm based on Factorization Machines (FM);
- a recommendation algorithm based on the deep-neural-network factorization machine model (Factorization-Machine based Neural Network, DeepFM).

Table 1 Experimental results

Table 1 shows that the neural-network-based feature fusion recommendation model has clear advantages in MSE, RMSE and MAE over all three baselines: the traditional item-based collaborative filtering algorithm, the factorization-machine-based recommendation algorithm, and the deep-neural-network factorization machine recommendation model.

The neural-network-based feature fusion recommendation model of the present invention fuses product features with user features and is trained as a neural network. Traditional algorithms predict ratings directly from the user-item matrix without fully mining the latent features of users or products, which limits recommendation accuracy. The present method addresses this problem. In addition, neural networks have a considerable advantage over traditional methods in feature extraction and feature representation. Finally, simulation experiments on the model were run with one year of a retail enterprise's sales data. The experimental analysis confirms the accuracy of the method, which was also compared in accuracy with the traditional collaborative-filtering-based method. The results show that the model has clear advantages in MSE, RMSE and MAE and achieves higher recommendation accuracy.

Development and application

Based on the neural-network-based feature fusion recommendation model proposed by the present invention, a product recommendation system for retail enterprises is designed and implemented. The system analyzes the user behavior generated on the e-commerce platform of "a retail enterprise". It finds associations between users, between products, and between users and products, in order to recommend products a user may be interested in. The system proceeds as follows:

1. It cleans the received user, product and transaction data of "a retail enterprise" and imports them to obtain the required data.
2. It performs feature processing on these data.
3. It completes the recommendation with the proposed neural-network-based feature fusion recommendation model.
4. It displays the results to the user in the front end.

1. Use case analysis

As shown in the system use case diagram in Figure 9, the system has three user roles: system administrator, e-commerce platform administrator and ordinary user.

The system administrator's use cases are user information management, product information management, transaction data management, rating data management and recommendation result management.

The e-commerce platform administrator's use cases are recommendation model maintenance, data cleaning and data import.

The ordinary user's use cases are personal information management, product browsing, and viewing statistical recommendation lists such as the best-seller list and the most-viewed products. The most important use case is viewing personalized recommendation results. Based on each user's purchase history, it recommends products that better match that user's characteristics and interests.

2. Architecture design

The system uses a B/S (browser/server) architecture with Django as the back-end development framework. Django is an open-source web framework that helps developers build websites faster and more easily. It uses a three-layer design:

- The template layer faces the user directly: it displays data and accepts user input.
- The view layer sits in the middle and handles data processing and transfer.
- The model layer creates and maintains the data models.

The system architecture is shown in Figure 10.

3. System module design

The main functional modules of the system are the background management module, data processing module, product display module, statistical recommendation module and personalized recommendation module.

1) Background management module

The background management module manages member data, product data and transaction data:

- Member data include the login account and password set at registration and basic personal information.
- Product data include basic product information. Administrators can update product data or view product details by filter criteria.
- Transaction data include members' purchase records. Administrators can view them by purchase date, member code, product code, etc.

2) Data processing module

The data processing module cleans and imports member data, product data and transaction data.

3) Product display module

Members can browse all products, filter them by applicable skin type and applicable age, and sort them by number of clicks. The product details page shows the product's basic information together with recommendations of products from the same series.
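The browsing behaviour can be sketched over in-memory product records. The field names (`skin_type`, `age_range`, `clicks`, `series`) and the exact-match filters are hypothetical. The deployed system would query its Django models instead.

```python
def browse(products, skin_type=None, age_range=None):
    """Filter by applicable skin type / age range (None = no filter), most-clicked first."""
    hits = [p for p in products
            if (skin_type is None or p["skin_type"] == skin_type)
            and (age_range is None or p["age_range"] == age_range)]
    return sorted(hits, key=lambda p: p["clicks"], reverse=True)


def same_series(products, product):
    """Other products from the same series, shown on the details page."""
    return [p for p in products if p["series"] == product["series"] and p["id"] != product["id"]]


products = [
    {"id": 1, "series": "S1", "skin_type": "dry", "age_range": "18-25", "clicks": 10},
    {"id": 2, "series": "S1", "skin_type": "oily", "age_range": "18-25", "clicks": 30},
    {"id": 3, "series": "S2", "skin_type": "dry", "age_range": "26-35", "clicks": 20},
    {"id": 4, "series": "S1", "skin_type": "dry", "age_range": "18-25", "clicks": 5},
]
```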

4) Statistical recommendation module

Statistical recommendation is based mainly on historical rating records. It computes the historically most popular products and the average rating of each product.
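Both statistics can be computed in a single pass over the rating records. This sketch assumes the records are (user, product, rating) tuples. It also assumes popularity means the number of ratings a product has received, with ties broken by the higher average rating. The function names are hypothetical.

```python
from collections import defaultdict


def rating_stats(ratings):
    """Return {product: (number of ratings, average rating)} from (user, product, rating) rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for _user, product, rating in ratings:
        totals[product][0] += 1
        totals[product][1] += rating
    return {p: (n, s / n) for p, (n, s) in totals.items()}


def top_popular(stats, k=10):
    """Products with the most ratings; ties broken by higher average rating."""
    return sorted(stats, key=lambda p: (stats[p][0], stats[p][1]), reverse=True)[:k]


rows = [("u1", "p1", 2), ("u2", "p1", 4), ("u1", "p2", 5),
        ("u3", "p3", 1), ("u2", "p3", 3), ("u4", "p3", 2)]
stats = rating_stats(rows)
```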

5) Personalized recommendation module

The personalized recommendation module is the core of the system and is responsible for recommending products to members. It covers data acquisition, member and product feature extraction, model construction, screening of recommendation candidates, and the final recommendation results.

The data produced by the data processing module are fed to the recommendation module. The module trains the recommendation model proposed by the present invention to obtain member and product feature vectors, and saves them. It also saves dictionaries of members and products for looking up information about a specific member or product, and uses dedicated algorithms to derive the recommended product set.

The system provides three recommendation schemes:

- recommending products of the same type;
- recommending the member's favorite products;
- recommending products liked by similar members.

Recommending products of the same type means recommending to a member products similar to those the member has purchased. The approach computes the cosine similarity between the products the current member has purchased and the other products, and takes the ten most similar products as recommendations. It adds random selection so that the recommendations can be adjusted to the member's actual situation.
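A sketch of the same-type scheme over saved product feature vectors follows. The text does not say how similarities to several purchased products are combined. Taking each candidate's maximum similarity to any purchased product is therefore an assumption. The optional random sampling from the top-k list (`n_pick`) is only one possible reading of the "random selection" step. All names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_products(item_vecs, purchased, top_k=10, n_pick=None, rng=None):
    """Rank unpurchased products by their best cosine similarity to any purchased one.

    item_vecs: {product_id: feature vector}. Keeps the top_k candidates; if n_pick is
    given, randomly samples that many of them (assumed form of the random selection).
    """
    scores = {
        pid: max(cosine(vec, item_vecs[q]) for q in purchased)
        for pid, vec in item_vecs.items() if pid not in purchased
    }
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    if n_pick is not None:
        rng = rng or random.Random()
        ranked = rng.sample(ranked, min(n_pick, len(ranked)))
    return ranked


item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
```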

Recommending the member's favorite products means recommending the products with the highest predicted ratings. The product of the member feature matrix and the product feature matrix gives the member's predicted ratings for unknown products, and the ten products with the highest predicted ratings are recommended.
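For a single member, one row of the member-by-product matrix product is just the inner product of that member's vector with each product vector. This sketch assumes that "unknown products" means products the member has not rated yet, and excludes rated ones. `top_predicted` is a hypothetical name.

```python
def top_predicted(user_vec, item_vecs, rated=(), k=10):
    """Score each unrated product by the inner product of feature vectors; return the top k."""
    scores = {pid: sum(u * v for u, v in zip(user_vec, vec))
              for pid, vec in item_vecs.items() if pid not in rated}
    return sorted(scores, key=scores.get, reverse=True)[:k]


user = [1.0, 2.0]
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
```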

Recommending products liked by similar members means recommending products purchased by members similar to the target member. The procedure has two steps:

1. The feature vectors of all members are extracted and dot-multiplied with all product feature vectors to obtain each member's ratings of the products. The ratings are sorted, and each member's single highest-rated product is stored in a dictionary.
2. Based on the member feature set, the cosine similarity between the target member and every other member is computed. After sorting, the ten most similar members are returned, and their stored highest-rated products form the recommended product set.
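The two-step procedure can be sketched as follows: build a dictionary of each member's top-scoring product, then collect those products for the k members closest to the target. Dropping duplicate products while keeping similarity order is an assumption. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def favourite_product_per_member(user_vecs, item_vecs):
    """Dictionary {member: highest-scoring product} using dot-product ratings."""
    return {uid: max(item_vecs, key=lambda pid: dot(uvec, item_vecs[pid]))
            for uid, uvec in user_vecs.items()}


def similar_member_recommendations(target, user_vecs, item_vecs, k=10):
    """Favourite products of the k members most similar (cosine) to target."""
    favourites = favourite_product_per_member(user_vecs, item_vecs)
    others = [uid for uid in user_vecs if uid != target]
    others.sort(key=lambda uid: cosine(user_vecs[target], user_vecs[uid]), reverse=True)
    recs = []
    for uid in others[:k]:
        if favourites[uid] not in recs:  # assumption: keep order, skip duplicates
            recs.append(favourites[uid])
    return recs


users = {"t": [1.0, 0.0], "m1": [0.9, 0.1], "m2": [0.0, 1.0], "m3": [0.8, 0.3]}
items = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
```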

A neural-network-based feature fusion recommendation system disclosed by the present invention includes:

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media that contain computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM and optical storage.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. Each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes instruction means. These instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solutions of the present invention. The present invention has been described in detail with reference to the above embodiments. Those of ordinary skill in the art should nevertheless understand that its specific embodiments may still be modified or equivalently replaced. Any such modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the protection scope of the claims of the present invention.

Claims


2. The neural-network-based feature fusion recommendation method according to claim 1, wherein the product features include the product code, category, name, secondary classification, applicable age, applicable body part, applicable skin type and user unit price; and the user features include gender and skin type, wherein gender has two classes, male and female, and skin type has four classes: dry, oily, sensitive and combination.

3. The neural-network-based feature fusion recommendation method according to claim 1, wherein the product feature vectorization comprises:

4. The neural-network-based feature fusion recommendation method according to claim 3, wherein product features are selected using an analysis-of-variance (ANOVA) method, a mutual information method, an embedded method based on classification and regression trees, or a wrapper method based on recursive feature elimination with cross-validation.

5. The neural-network-based feature fusion recommendation method according to claim 1, wherein the user feature vectorization comprises:

6. The neural-network-based feature fusion recommendation method according to claim 1, wherein, in the feature fusion recommendation model with a three-layer neural network: the first layer is an embedding layer that converts feature vectors into low-dimensional dense feature representations; the second layer is a fully connected layer that concatenates all feature representations to obtain the feature vectors of users and products; and the third layer is a fully connected layer that takes the product and user features obtained from the first two layers as inputs, multiplies the two inputs as matrices to obtain an output value, and regresses that value to the actual rating.

7. The neural-network-based feature fusion recommendation method according to claim 1, wherein constructing the feature fusion recommendation model with a three-layer neural network comprises:

8. The neural-network-based feature fusion recommendation method according to claim 1, wherein evaluating the feature fusion recommendation model with a three-layer neural network comprises:

9. The neural-network-based feature fusion recommendation method according to claim 8, wherein the accuracy of the feature fusion recommendation model is evaluated by MSE, RMSE and MAE:

$$\mathrm{MSE}=\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}$$

$$\mathrm{MAE}=\frac{1}{|T|}\sum_{(u,i)\in T}\left|r_{ui}-\hat{r}_{ui}\right|$$

where $u$ denotes a user and $i$ denotes an item. $T$ is the set of (user, item) pairs being evaluated. $r_{ui}$ is user $u$'s actual rating of item $i$, and $\hat{r}_{ui}$ is the rating predicted by the recommendation algorithm.

10. A neural-network-based feature fusion recommendation system, comprising: