CN117611243B


Info

Publication number: CN117611243B
Application number: CN202311726507.0A
Authority: CN
Inventors: 王烨卓; 梁婷婷
Original assignee: Ren Tuo Data Technology Shanghai Co ltd
Current assignee: Ren Tuo Data Technology Shanghai Co ltd
Priority date: 2023-12-14
Filing date: 2023-12-14
Publication date: 2024-08-06
Anticipated expiration: 2043-12-14
Also published as: CN117611243A; WO2025123691A1

Abstract

The invention discloses a method for quantitatively analyzing the interaction and sales indexes of content tags, and belongs to the field of content marketing. To address the difficulty and imprecision of creating marketing content, the invention quantifies the effect of each content tag on the interaction with marketing content and its contribution to product sales. The method comprises the following steps: obtaining a content database, tagging the content in it to obtain content tags, and adding product SPU tags to the content database; obtaining an e-commerce product database and adding product SPU tags to it; calculating the content interaction index and the product sales index of every content tag; and visually analyzing the two indexes. Because the two indexes quantify the influence of each content tag on content interaction and product sales, content tags can be compared quantitatively, which makes the creation of marketing content simple, feasible and accurate, and saves cost.

Description

Translated from Chinese

A method for quantitatively analyzing the interaction and sales indicators of content tags

Technical Field

The present invention belongs to the technical field of content marketing and, more specifically, relates to a method for quantitatively analyzing the interaction and sales indicators of content tags.

Background Art

Product marketing content on social media drives product sales on the e-commerce platforms embedded in social media sites. Marketing content itself is highly varied. To analyze its content elements quantitatively, marketing content can be broken down into structured content tags (content elements), and each content tag can then be analyzed on the interaction dimension of the content (including exposure, likes, reposts, comments and favorites). Both brand owners and content creators want quantitative evaluations of content tags on the content-interaction and product-sales dimensions to guide the creation of marketing content. However, quantifying the effect of the content tags of marketing content on product sales has long been a difficult problem in the content marketing industry.

Marketing content drives product sales both directly and indirectly. Some marketing content carries a direct link to an on-site product, so consumers can click through and buy; even then, some consumers do not buy immediately but may purchase on their own some time later. Other marketing content is "grass-planting" (种草, product-seeding) content that gives no product link at all, and in that case the effect of the content on consumers' purchase decisions is even harder to evaluate. In the platform back end, sales converted from content are generally available for content that carries product links, but only the platform holds this data and it is not public, so hardly any organization other than the platform can analyze industry-wide data. Moreover, brand owners and content creators usually cannot obtain a quantitative evaluation of the indirect effect of marketing content on consumer purchases.

Improvements addressing these problems have been proposed. For example, Chinese patent application No. CN202110787436.X, published on October 29, 2021, discloses a smart-bank multi-channel collaborative marketing system and method, comprising: collecting data from proprietary channels and marketing channels to obtain user information; analyzing the user information to obtain valid information; matching the valid information with bank products; and delivering product marketing content through each of the bank's own channels. The shortcoming of that patent is that its targeted delivery is not accurate enough and its matching precision needs improvement.

As another example, Chinese patent application No. CN202110319997.7, published on June 15, 2021, discloses a self-service video marketing management system, comprising: a web crawler module, which crawls the play counts, like counts and comment counts of marketing videos on the major video platforms based on preset marketing video feature parameters; an audience positioning module, which analyzes the account information of the users who liked or commented, identifies the audience characteristics that suit the marketing video, and builds an audience characteristic configuration model; a comment analysis module, which mines valuable comment data and processes and analyzes it; a video content improvement module, which generates suggestions for improving the video content from the results of the comment analysis; and a targeted delivery module, which delivers marketing videos according to the results of the audience positioning module. The shortcoming of that patent is that, although it can deliver marketing videos according to the positioning results, its accuracy is poor.

Summary of the Invention

1. Problems to be solved

To address the difficulty and imprecision of creating marketing content, the present invention provides a method for quantitatively analyzing the interaction and sales indicators of content tags. The invention uses a content interaction index and a product sales index to quantify the influence of each content tag on content interaction and product sales, so that content tags can be compared quantitatively. This makes the creation of marketing content simple, feasible and accurate, and saves labor and resource costs.

2. Technical solution

To solve the above problems, the present invention adopts the following technical solution.

A method for quantitatively analyzing the interaction and sales indicators of content tags comprises the following steps:

obtaining the content database of a social media platform, tagging the content in the content database to obtain content tags, and adding product SPU (Standard Product Unit) tags to the content database;

obtaining the e-commerce product database of the e-commerce platform within the social media platform, and adding product SPU tags to the product database;

calculating the content interaction index and the product sales index of every content tag;

visually analyzing the content interaction index and the product sales index.

Further, calculating the content interaction index comprises the following steps:

determining a product category in the content database and, for each content tag k, counting the number n of contents of that product category within a set time period, to obtain the content set C_k = {content 1, content 2, ..., content n};

calculating the interaction value E_i of each content in the content set, either as the sum of the four interaction counts:

E_i = likes + reposts + favorites + comments of content i;

or as any single one of them:

E_i = the likes, or the reposts, or the favorites, or the comments of content i;

calculating the content interaction index Y_k of content tag k, using either of two alternative formulas.
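The interaction calculation can be sketched as follows. This is a minimal illustration, not the patented formula: the text above does not spell out the two alternatives for Y_k, so the sketch ASSUMES they are the sum and the mean of E_i over the content set C_k. The `Content` class, its field names and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Content:
    """One content item with its four interaction counts (hypothetical fields)."""
    content_id: str
    likes: int
    reposts: int
    favorites: int
    comments: int


def interaction_value(c: Content) -> int:
    """E_i as the sum of the four interaction counts (first variant in the text)."""
    return c.likes + c.reposts + c.favorites + c.comments


def interaction_index(contents: list, average: bool = False) -> float:
    """Y_k for one tag.

    ASSUMPTION: Y_k is the sum of E_i over C_k, or the mean when average=True.
    """
    if not contents:
        return 0.0
    total = sum(interaction_value(c) for c in contents)
    return total / len(contents) if average else float(total)


# Toy content set C_k for a single tag k.
c_k = [
    Content("content_1", likes=10, reposts=2, favorites=3, comments=5),
    Content("content_2", likes=4, reposts=0, favorites=1, comments=1),
]
print(interaction_index(c_k), interaction_index(c_k, average=True))
```

A mean-based index makes tags with very different content counts comparable, whereas a sum-based index rewards tags that are simply used more often; which of the two is appropriate depends on the comparison being made.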

Further, calculating the product sales index comprises the following steps:

determining the m product SPUs corresponding to a single content i in the content set, to obtain the product SPU set P_i = {SPU₁, SPU₂, ..., SPU_m} corresponding to content i;

determining the sales of each product SPU on every date within the set time period;

determining, for a given date d, the sales contribution value of a single content i to a single product SPU_j. This value is computed from the following quantities: Sales(j, d), the sales of product SPU_j on date d; W_i, the weight of the contribution of content i to the sales of SPU_j on date d; and u, the total number of contents that mention SPU_j within the valid time window for date d;

determining the total contribution value of a single content i to the sales of the products P_i = {SPU₁, SPU₂, ..., SPU_m} mentioned in it; this calculation uses t, the configured number of days, counted from the publication date, during which a content is considered to affect product sales;

determining the cumulative value of the effect on product sales of the content set C_k corresponding to content tag k;

calculating the sales index X_k of content tag k in one of two alternative ways, one of which uses p, the deduplicated total number of product SPUs associated with the content set C_k of content tag k.
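The attribution steps can be illustrated with a small sketch. The exact formulas are not given in the text above, so everything below is an ASSUMPTION made for illustration: the sales of SPU_j on date d are split among the u contents whose window covers d in proportion to their weights W_i, a content's total is summed over its SPUs and its t-day window, and X_k is either the cumulative value or that value divided by p. The data, names and `T_DAYS` value are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical toy data: daily sales per (SPU, date) and per-content metadata.
sales = {("SPU1", date(2023, 2, 1)): 100.0, ("SPU1", date(2023, 2, 2)): 60.0}
contents = {
    "content_1": {"published": date(2023, 2, 1), "spus": {"SPU1"}, "weight": 1.0},
    "content_2": {"published": date(2023, 2, 2), "spus": {"SPU1"}, "weight": 1.0},
}
T_DAYS = 2  # t: window length in days, counted from the publication date


def active(cid, spu, d):
    """True if content cid mentions spu and date d falls inside its window."""
    c = contents[cid]
    return spu in c["spus"] and c["published"] <= d < c["published"] + timedelta(days=T_DAYS)


def contribution(cid, spu, d):
    """ASSUMED rule: Sales(j, d) * W_i / (sum of weights of the u active contents)."""
    if not active(cid, spu, d):
        return 0.0
    total_w = sum(c["weight"] for other, c in contents.items() if active(other, spu, d))
    return sales.get((spu, d), 0.0) * contents[cid]["weight"] / total_w


def content_total(cid):
    """Total contribution of one content over its SPUs and its t-day window."""
    c = contents[cid]
    days = [c["published"] + timedelta(days=n) for n in range(T_DAYS)]
    return sum(contribution(cid, spu, d) for spu in c["spus"] for d in days)


def sales_index(cids, per_spu=False):
    """X_k: cumulative value over C_k, optionally divided by p (deduplicated SPU count)."""
    s_k = sum(content_total(cid) for cid in cids)
    if not per_spu:
        return s_k
    p = len(set().union(*(contents[cid]["spus"] for cid in cids)))
    return s_k / p if p else 0.0


print(sales_index(["content_1", "content_2"]))
```

With these toy numbers, content_1 receives all sales of 2023-2-1 and half of 2023-2-2, while content_2 receives the other half of 2023-2-2; dates without sales data contribute nothing.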

Further, visually analyzing the content interaction index and the product sales index comprises the following steps:

establishing a two-dimensional coordinate system in which the sales index of each content tag is plotted on the X axis and its interaction index on the Y axis;

determining the mean sales index and the mean interaction index over all content tags, dividing the plane into four regions along these two means, and placing each content tag in one of the four regions according to its sales index and interaction index.
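The four-region split can be sketched in a few lines. The tag names and index values below are invented for illustration, and assigning a tag that lies exactly on a mean to the "high" side is an assumption, since the text does not say how boundary cases are handled.

```python
from statistics import mean


def quadrants(metrics):
    """Assign each tag to a region; metrics maps tag -> (X_k sales index, Y_k interaction index)."""
    x_mean = mean(x for x, _ in metrics.values())
    y_mean = mean(y for _, y in metrics.values())
    result = {}
    for tag, (x, y) in metrics.items():
        # Boundary values count as "high" (an assumption of this sketch).
        sales_side = "high-sales" if x >= x_mean else "low-sales"
        interaction_side = "high-interaction" if y >= y_mean else "low-interaction"
        result[tag] = f"{sales_side}/{interaction_side}"
    return result


# Hypothetical index values for four tags.
metrics = {
    "whitening": (160.0, 26.0),
    "moisturizing": (40.0, 30.0),
    "spot-lightening": (20.0, 4.0),
    "repair": (180.0, 4.0),
}
for tag, region in quadrants(metrics).items():
    print(tag, region)
```

Tags in the high-sales/high-interaction region are the natural first choice for new marketing content, while the mixed regions show tags that attract attention without selling, or sell without attracting much interaction.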

Further, tagging the content in the content database to obtain content tags comprises the following steps:

building a product knowledge graph;

building a content category database: obtaining a media content database and using the product knowledge graph to filter out of it the content data related to each product category, which forms the content category database;

extracting information from the content category database to build a category content tag tree, and tagging the data in the category database according to that tree. This step comprises the following sub-steps:

first, using the RaNER model to extract person entities, category entities, brand entities and product attribute entities from the content category database; then using a large language model, combined with an information-extraction prompt and a chain-of-thought summarization prompt, to perform semantic recognition on the category database and extract person entities, internet buzzword entities, user pain-point entities, product feature entities and applicability entities; and finally fusing the entity results to obtain the final entities;

converting the final entity words into word vectors with a text vectorization model; clustering the word vectors into several classes; then using the large language model to summarize the words in each class into one or more tags and, from the keyword types output by the large language model, building a tree-structured content tag tree together with the keywords of each tag;

tagging the content text in the category database according to the category content tag tree.

Further, tagging the content text comprises the following steps:

when tagging a content text, determining whether entity extraction has already been performed on it; if so, the extracted entity words become candidate tags; if not, entity extraction is performed first, and the tags corresponding to the recognized entity words are added to the candidate tag set;

in addition, matching the keywords and regular expressions associated with each tag in the tag tree against the content text, and adding every matched tag to the candidate tag set of the content text;

using a large language model with a discriminative prompt to judge every candidate tag selected for the content text, determining whether the candidate tag matches the meaning of the content text; if it matches, the candidate tag is confirmed, and if not, it is corrected.
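The candidate-then-verify flow can be sketched as below. The tag rules, the sample text and `toy_judge` are hypothetical; in particular, `toy_judge` is a trivial stand-in for the large language model with a discriminative prompt, and the sketch simply drops rejected candidates instead of correcting them, which is a simplification of the step described above.

```python
import re

# Hypothetical tag tree leaves with their keywords and regular expressions.
TAG_RULES = {
    "Efficacy-Whitening": {"keywords": ["whitening", "brightening"], "patterns": []},
    "Efficacy-Moisturizing": {"keywords": ["moisturizing"], "patterns": [r"hydrat\w+"]},
}


def candidate_tags(text, entity_tags=()):
    """Union of tags from earlier entity extraction and keyword/regex matches."""
    candidates = set(entity_tags)
    lowered = text.lower()
    for tag, rule in TAG_RULES.items():
        if any(k in lowered for k in rule["keywords"]):
            candidates.add(tag)
        elif any(re.search(p, lowered) for p in rule["patterns"]):
            candidates.add(tag)
    return candidates


def confirm_tags(text, candidates, judge):
    """Keep only the candidates that the judge accepts for this text."""
    return {tag for tag in candidates if judge(text, tag)}


def toy_judge(text, tag):
    """Stand-in for the LLM check: reject skin whitening when the text is about teeth."""
    return not (tag == "Efficacy-Whitening" and "teeth" in text.lower())


text = "This whitening toothpaste keeps teeth clean and gums hydrated"
print(confirm_tags(text, candidate_tags(text), toy_judge))
```

Keyword and regex matching keeps recall high at low cost, and the judging pass then removes matches that are lexically right but semantically wrong, which is the precision safeguard the text describes.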

Further, building the content category database comprises the following steps:

collecting content information from the social media platforms to form a media content database;

using the product knowledge graph to text-match the text information in the media content database and build a preliminary screening database of content for each product category;

converting the image content and the video content in the preliminary screening database into text content;

performing fine screening and classification on the preliminary screening database to determine whether the text content is related to the product category.

Further, the media content database stores only the original text description: for image-and-text content, the title, the body text and the image links; for video content, the title and the video link.

3. Beneficial effects

Compared with the prior art, the present invention has the following beneficial effects:

(1) The invention obtains the content database of a social media platform and the e-commerce product database within that platform, tags the content in the content database, calculates the content interaction index and the product sales index of each content tag with the help of the e-commerce product database, and then analyzes them visually. All data comes from what is publicly visible on social media pages and does not depend on tracking user behavior paths, which greatly broadens the range of potential users. The two indexes quantify the influence of content tags on content interaction and product sales, so content tags can be compared quantitatively: when a brand owner or content creator needs to create product marketing content, they can choose the tags that perform best in the category, which makes content creation simple, feasible and accurate and saves labor and resource costs;

(2) The invention builds a product knowledge graph and uses its entities and relationships to build the content category database from the media content database, so the content category database is built quickly and work efficiency improves. Once the category database is built, information is extracted from it to build the content tag tree. During extraction, the RaNER model identifies concrete entities, and a large language model, combined with an information-extraction prompt and a chain-of-thought summarization prompt, identifies abstract entities. Using different models for different entity types compensates for the inaccurate extraction and incomplete entity recall of a single model. Finally, the content text is tagged through the content tag tree. The overall process is simple, entity extraction is accurate, and tagging precision is therefore high; the process is also fast, which reduces labor and time costs;

(3) When tagging a content text, the invention first determines whether entity extraction has already been performed on it, which improves efficiency and saves time. Entity words that have already been extracted become candidate tags directly; otherwise entity extraction is performed first. The resulting candidate tag set is then checked to catch potential errors. This raises recall (fewer omissions) as far as possible while using the semantic understanding of the large language model to filter out entities that matched a keyword but are semantically wrong, and to filter out entity words that the large language model produced earlier but that do not appear in the original text, which improves precision;

(4) When building the content category database, the invention first screens the media content database to obtain a preliminary screening database, then converts the images and videos in it into text, and finally performs fine screening and classification to obtain the final category database. This pipeline of a fast, low-cost preliminary screening that narrows the data range, followed by fine screening, lowers cost and raises efficiency while maintaining overall accuracy. In addition, the media content database stores only the original text description, which greatly reduces storage costs.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of the present invention;

FIG. 2 is a schematic diagram of the calculation of the sales index of a content tag;

FIG. 3 is a schematic diagram of the calculation of the contribution value of content to product sales.

Detailed Description

The present invention is further described below with reference to specific embodiments and the accompanying drawings.

Refer to FIG. 1 and FIG. 2: FIG. 1 is a schematic flowchart of the present application, and FIG. 2 is a schematic diagram of the calculation of the sales index of a content tag.

In this embodiment, as shown in FIG. 1, a method for quantitatively analyzing the interaction and sales indicators of content tags comprises the following steps:

S1: Obtain the content database of the social media platform and tag the content in it to obtain content tags. The content database stores the content information; the interaction counts of each content by date (likes, comments, reposts and favorites); the product SPUs identified in each content; the content tags assigned to the content according to the structured tag tree; and the relationship between each product SPU and the content tags that describe it.

Obtain the e-commerce product database of the e-commerce platform within the social media platform. This is the product database of the social media platform's on-site e-commerce platform, in which brand, category and SPU (Standard Product Unit) have already been cleaned.

In one specific example, the content data in the content database is shown in Table 1; the interaction data of the content by date is shown in Table 2; and the product SPUs corresponding to the content, together with the content tags describing each SPU, are shown in Table 3:

Table 1. Content data in the content database

Table 2. Interaction data of the content by date

Date     | Content ID | Likes | Reposts | Comments | Favorites
2023-2-1 | Content 1  | …     | …       | …        | …
2023-2-2 | Content 1  | …     | …       | …        | …
2023-2-3 | Content 1  | …     | …       | …        | …

Table 3. Product SPUs corresponding to the content and the content tags describing each SPU

Content ID | SPU   | Content tag
Content 1  | SPU 1 | Efficacy-Whitening
Content 1  | SPU 1 | Efficacy-Spot lightening
Content 1  | SPU 2 | Efficacy-Moisturizing

As Tables 1 to 3 show, Content 1 mentions two products, SPU 1 and SPU 2. Two content tags, Efficacy-Whitening and Efficacy-Spot lightening, are identified in the description of SPU 1, and the content tag Efficacy-Moisturizing is identified in the description of SPU 2.

In one specific example, the e-commerce product database stores product information together with cleaned fields such as brand, category and SPU, and stores the sales volume and sales revenue of each product by date. One product ID corresponds to one product link on the e-commerce platform, and one SPU may correspond to several product links. The product data in the e-commerce product database is shown in Table 4, and the sales data of the products by date is shown in Table 5:

Table 4. Product data in the e-commerce product database

Table 5. Sales data of the products by date

Date     | Product ID | Sales volume | Sales revenue | Price
2023-2-1 | Product 1  | …            | …             | …
2023-2-2 | Product 1  | …            | …             | …
2023-2-3 | Product 1  | …            | …             | …

In a specific implementation, tagging the content in the content database according to the structured tag tree to obtain content tags comprises the following steps:

S11: Build a product knowledge graph.

In a specific implementation, the product knowledge graph can be built as follows: collect product information from the e-commerce platforms (platform, product ID, product name, specifications, product parameters and so on) to form a product database; use the RaNER model to perform entity recognition on the product text in that database, obtaining brand, category and product attribute entities and their related keywords; and use the co-occurrence of entities within products to build the graph. The product knowledge graph is structured as (entity, relationship, entity) triples.
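How co-occurrence turns into triples can be shown with a small sketch. It assumes entity recognition has already been done (the RaNER step is not reproduced here), uses invented product records, and labels every relationship `co_occurs_with`, which is a placeholder name rather than the relationship vocabulary of the actual graph.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of entity recognition: one dict of typed entities per product.
products = [
    {"brand": "BrandA", "category": "serum", "attribute": "whitening"},
    {"brand": "BrandA", "category": "serum", "attribute": "moisturizing"},
]


def build_triples(products):
    """Count (entity, relationship, entity) triples from entities that share a product."""
    counts = Counter()
    for entities in products:
        # Sort by entity type so each pair is always emitted in the same order.
        ordered = [entities[key] for key in sorted(entities)]
        for head, tail in combinations(ordered, 2):
            counts[(head, "co_occurs_with", tail)] += 1
    return counts


triples = build_triples(products)
for triple, count in triples.items():
    print(triple, count)
```

The counts give a simple edge weight: a brand and a category that co-occur in many products are more strongly linked than a pair seen only once.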

S12: Build a content category database: obtain a media content database and use the product knowledge graph to filter out of it the content data related to each product category, which forms the content category database.

Specifically, building the content category database comprises the following steps:

S121: Collect content information from the social media platforms to form a media content database. Content on social media mainly takes the form of image-and-text posts and videos. In this step the media content database stores only the original text description: for image-and-text content, the title, the body text and the image links; for video content, the title and the video link. Because no multimedia data such as images or videos is stored, storage costs are greatly reduced.

S122: Use the product knowledge graph to text-match the text information in the media content database and build a preliminary screening database for each product category. Content on social media covers topics in every field, related to products or not, so the content related to a given product category has to be filtered out. The media content database also holds a massive amount of data (more than one billion items), and besides text it contains video content that must be converted to text before further processing; converting video to text is computationally expensive and slow, and classifying massive data with AI models is costly as well. Text matching is therefore applied to the text information first to obtain the preliminary screening database: a fast, low-cost first pass that narrows the data range, which preserves efficiency and reduces cost.

In the preliminary screening stage, the keywords of the entities in the product knowledge graph, including brand, category and attribute entities, are matched against the text information in the media content database. For video content, this step matches only the video title. Content related to a given product category can thus be filtered quickly out of massive content data to build the preliminary screening database for that category. A preliminary screening database can be built for every target category the business needs, for example one for beauty and personal care.
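A minimal sketch of the preliminary screening follows. The keyword set, the record layout (`id`, `type`, `title`, `text`) and the sample items are assumptions made for illustration; plain substring matching is used here, and a production system over a billion records would need a faster multi-pattern matcher.

```python
# Hypothetical keywords taken from the knowledge graph entities of one category.
CATEGORY_KEYWORDS = {"skincare": {"serum", "moisturizer", "BrandA"}}


def prescreen(items, category):
    """Return the ids of items whose stored text mentions any category keyword."""
    keywords = [k.lower() for k in CATEGORY_KEYWORDS[category]]
    hits = []
    for item in items:
        if item["type"] == "video":
            text = item["title"]  # videos are matched on their title only
        else:
            text = item["title"] + " " + item.get("text", "")
        if any(k in text.lower() for k in keywords):
            hits.append(item["id"])
    return hits


items = [
    {"id": "c1", "type": "post", "title": "Morning routine", "text": "My favorite serum"},
    {"id": "c2", "type": "video", "title": "Travel vlog", "text": "serum appears only in speech"},
    {"id": "c3", "type": "video", "title": "BrandA moisturizer review"},
]
print(prescreen(items, "skincare"))
```

Item c2 is skipped because only its title is inspected at this stage, which illustrates the trade-off: the pass is cheap, but videos whose titles do not mention the category are missed.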

S123: Convert the image content and the video content in the preliminary screening database into text. In step S122 only the text information of image and video content, such as the title, was examined; to analyze the product-related text contained in the images and videos themselves, their content has to be converted into text.

Specifically, for image content, OCR is used to convert the text in the image into text content. For video content, both OCR and ASR are used. Before OCR, frames are sampled from the video at a fixed interval, for example one frame per second, which turns the video into a set of images; OCR converts the text in these images into text content, and the position and size of the text are used to filter out secondary background text as far as possible while keeping important text such as subtitles. ASR converts the speech in the video into text content, so that each video content item gains an (OCR text, ASR text) field, and the two texts are combined into the final text content of the video. The on-screen text and the speech each carry part of the linguistic information and each miss some of it, and OCR and ASR each have a certain error rate, so transcribing a video with both and using the results together allows a more complete analysis of the linguistic information in the video.
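The filtering and merging of OCR and ASR output can be sketched as follows. No real OCR or ASR library is called: the per-frame boxes (`text`, `y`, `h` in pixels) and the ASR transcript are assumed to have been produced already, and the thresholds `min_height_ratio` and `subtitle_band` are invented values showing one possible way to use text position and size.

```python
def keep_box(box, frame_h, min_height_ratio=0.04, subtitle_band=0.6):
    """Keep text that is large enough and sits in the lower part of the frame."""
    big_enough = box["h"] / frame_h >= min_height_ratio
    in_band = (box["y"] + box["h"] / 2) / frame_h >= subtitle_band
    return big_enough and in_band


def merge_video_text(frames, asr_text, frame_h):
    """Build the (OCR text, ASR text) pair for one video from sampled frames."""
    lines = []
    for boxes in frames:
        for box in boxes:
            # Skip a line identical to the previous one: subtitles span several frames.
            if keep_box(box, frame_h) and (not lines or lines[-1] != box["text"]):
                lines.append(box["text"])
    return {"ocr_text": " ".join(lines), "asr_text": asr_text}


# Hypothetical OCR boxes for three frames sampled one second apart (720 px high).
frames = [
    [{"text": "SALE", "y": 10, "h": 20}, {"text": "this serum is great", "y": 600, "h": 40}],
    [{"text": "this serum is great", "y": 600, "h": 40}],
    [{"text": "buy now", "y": 600, "h": 40}],
]
print(merge_video_text(frames, "this serum is great, buy it now", frame_h=720))
```

Keeping both fields rather than merging them into one string lets later steps cross-check the two transcripts, since each can contain errors the other does not.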

S124: Perform fine screening and classification on the preliminary screening database to determine whether the text content is related to the product category. Once the text of the image and video content in the preliminary screening database has been obtained in step S123, it must be further determined whether the content is related to the product data of the category. The preliminary screening uses fast keyword matching to narrow the data range within massive data and recall potentially relevant content, but some keywords are ambiguous: for example, the milk brand "光明" (Guangming) also matches the phrase "阳光明媚" ("bright and sunny"), which contains the same two characters. The category preliminary screening database therefore needs further fine screening and classification. The traditional approach is to annotate data manually and then train a supervised text classification model to determine whether the content text is related to a given product category. Because the number of product categories to be processed is large, annotating data manually and training a classification model for each is costly.

In step S124, the fine screening classification specifically includes: matching brand and category keywords against the data in the preliminary screening database. Content that matches both a brand entity and a category entity is determined to be related to the product category; such content does not need to be judged by a model, which increases speed and reduces computing power consumption. Content that matches only a brand keyword or only a category keyword has a higher likelihood of ambiguity, so a large language model with a prompt such as "Does the above text describe a product in the skin care category?" is used as a text classifier to assign the content text to the corresponding category. The construction of the content category database is then complete.
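The decision rule in S124 can be sketched as below. The function name `fine_screen` and the `llm_is_category` callable are hypothetical stand-ins for whatever LLM call is actually used; returning `False` when neither keyword type matches is also an assumption, since such content should not have passed the preliminary screening.

```python
from typing import Callable


def fine_screen(text: str,
                brand_keywords: list[str],
                category_keywords: list[str],
                llm_is_category: Callable[[str], bool]) -> bool:
    """Return True if the content text is judged relevant to the product category."""
    has_brand = any(k in text for k in brand_keywords)
    has_category = any(k in text for k in category_keywords)
    if has_brand and has_category:
        # Both matched: accept directly, no model call needed.
        return True
    if has_brand or has_category:
        # Only one matched: ambiguity is likely, so defer to the LLM classifier.
        return llm_is_category(text)
    return False  # assumption: no keyword match means not relevant


llm_calls: list[str] = []


def fake_llm(text: str) -> bool:
    """Stub classifier used only for this illustration."""
    llm_calls.append(text)
    return "早餐" in text


direct_hit = fine_screen("光明牛奶很好喝", ["光明"], ["牛奶"], fake_llm)
ambiguous = fine_screen("今天阳光明媚", ["光明"], ["牛奶"], fake_llm)
```

In the second call the brand keyword matches only as a substring of "阳光明媚", so the text is routed to the classifier instead of being accepted outright.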

S13: Extract information from the content category database to construct a category content tag tree, and tag the data in the category database according to the category content tag tree; this step specifically includes the following steps:

S131: First, use the RaNER model to extract person entities, category entities, brand entities, and product attribute entities from the category database; then use a large language model, combining an information-extraction prompt and a chain-of-thought summarization prompt, to perform semantic recognition on the category database and extract person entities, internet buzzword entities, user pain point entities, and product feature entities; finally, fuse the entity results to obtain the final entities;

In this step, RaNER extracts category information, brand information, and some product-attribute information fairly accurately, but its extraction is mediocre on semantically rich content text and its semantic understanding of long content text is insufficient; for example, it extracts fragments such as "沙漠大干" (literally "desert, very dry") and "为肌肤" ("for the skin"), whose meaning is unclear on their own. A large language model (LLM) is therefore also used for entity recognition, with several kinds of prompt. An LLM is an artificial intelligence model designed to understand and generate human language; trained on large amounts of text data, it can perform a wide range of NLP tasks, including text summarization, translation, sentiment analysis, and entity recognition. LLM extraction still has problems of its own, such as incomplete entity recall. This solution therefore applies several types of prompt to each piece of content text and fuses the results. Using several types of prompt together gives better tag-word extraction than a single prompt, which improves the accuracy of entity extraction and, in turn, of the subsequent construction of the content tag tree. The entity words extracted from the content text by the RaNER model and the LLM, together with the correspondence between content ids and entity word ids, are stored in a word extraction database table, so that the result can be reused when the content is later tagged according to the structured tag tree.
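The text does not say how the results of the different extractors are fused. One simple possibility, shown here purely as an assumption, is an order-preserving union of (entity word, entity type) pairs with case-insensitive de-duplication; `fuse_entities` and the sample outputs are hypothetical.

```python
def fuse_entities(*extractions: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge several extraction results, dropping duplicates and empty words."""
    seen: set[tuple[str, str]] = set()
    fused: list[tuple[str, str]] = []
    for extraction in extractions:
        for word, entity_type in extraction:
            cleaned = word.strip()
            key = (cleaned.lower(), entity_type)
            if cleaned and key not in seen:
                seen.add(key)
                fused.append((cleaned, entity_type))
    return fused


# Hypothetical outputs of the NER model and of two LLM prompts for one text.
ner_result = [("BrandX", "brand"), ("face cream", "category")]
extraction_prompt_result = [("dry skin", "user pain point"), ("brandx", "brand")]
summary_prompt_result = [("long-lasting hydration", "product feature"),
                         ("dry skin", "user pain point")]

final_entities = fuse_entities(ner_result, extraction_prompt_result,
                               summary_prompt_result)
```

A union favors recall; the later verification step (S1333) is what removes candidates that do not fit the text.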

S132: Take the final entities (the entity words and their types, plus the attribute words previously extracted from the products, such as entity words of the function and efficacy type) and convert the entity words into word vectors with a text vectorization model; then apply a clustering algorithm to obtain several classes of word vectors (word vectors with high similarity are grouped into one class). Clustering has limitations: after clustering on word vectors alone, words in the same class may still have different meanings. The semantic understanding ability of the large language model is therefore used next to summarize the words in each class into one or more tags, and the keyword types output by the large language model are used to build the tree-structured content tag tree together with the keywords of each tag;
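The clustering part of S132 can be illustrated with a small sketch. The source names neither the vectorization model nor the clustering algorithm, so the greedy cosine-threshold grouping, the 0.9 threshold, and the two-dimensional toy vectors below are stand-ins chosen only to keep the example self-contained.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_words(word_vectors: dict[str, list[float]],
                  threshold: float = 0.9) -> list[list[str]]:
    """Greedy grouping: join the first cluster whose representative is similar enough."""
    clusters: list[tuple[list[float], list[str]]] = []
    for word, vector in word_vectors.items():
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(word)
                break
        else:
            # No sufficiently similar cluster exists: start a new one.
            clusters.append((vector, [word]))
    return [members for _, members in clusters]


# Toy vectors; a real system would obtain these from a text vectorization model.
toy_vectors = {
    "moisturizing": [1.0, 0.1],
    "hydrating": [0.9, 0.2],
    "whitening": [0.1, 1.0],
}
groups = cluster_words(toy_vectors)
```

Each resulting group would then be passed to the large language model to be summarized into one or more tags, which is the step that handles words that cluster together but mean different things.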

S133: Tag the content text in the category database according to the category content tag tree. Tagging the content text in step S133 includes the following steps:

S1331: When tagging a content text, determine whether entity extraction has already been performed on it (that is, whether the RaNER model and the LLM have already been applied). If so, the tags corresponding to its entity words become candidate tags; if not, perform entity extraction (RaNER entity recognition, plus recognition by the LLM using the information-extraction prompt and the hierarchical-chain summarization prompt) and add the tags corresponding to the recognized entity words to the candidate tag set;

S1332: In addition, match the content text against the keywords or regular expressions associated with each tag in the tag tree, using keyword matching and regular expression matching; the matched tags are also added to the candidate tag set of the content text;

S1333: The candidate tag set obtained in the previous two steps may contain errors; tags identified through keywords, for example, may be semantically wrong. This step therefore uses the large language model with a discriminative prompt, for example "Determine whether the content text mentions the following content tags...", to judge all candidate tags selected for the content text and determine whether each candidate tag matches the meaning of the text; if it matches, the candidate tag is confirmed, and if not, it is corrected. In this way, while several methods are used to raise recall (fewer omissions), the semantic understanding ability of the LLM is used to filter out, as far as possible, entities that were matched by keywords but are semantically wrong, as well as entity words that the LLM may have produced in the earlier steps but that do not actually appear in the original text, so that precision and recall are improved together.
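Steps S1331 to S1333 can be sketched as a two-stage pipeline: collect candidates generously, then verify them. The names `candidate_tags`, `confirm_tags`, and `llm_judge` are hypothetical, the judge below is a stub, and rejected candidates are simply dropped here, whereas the text says they are corrected.

```python
import re
from typing import Callable


def candidate_tags(text: str,
                   extracted_entities: list[str],
                   entity_to_tag: dict[str, str],
                   tag_patterns: dict[str, list[str]]) -> set[str]:
    """S1331 + S1332: tags from extracted entities plus keyword/regex matches."""
    candidates = {entity_to_tag[e] for e in extracted_entities if e in entity_to_tag}
    for tag, patterns in tag_patterns.items():
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            candidates.add(tag)
    return candidates


def confirm_tags(text: str,
                 candidates: set[str],
                 llm_judge: Callable[[str, str], bool]) -> list[str]:
    """S1333: keep only the candidates that the judge accepts for this text."""
    return sorted(tag for tag in candidates if llm_judge(text, tag))


text = "Sunday routine: this serum keeps dry skin hydrated all day"
candidates = candidate_tags(
    text,
    extracted_entities=["dry skin"],
    entity_to_tag={"dry skin": "skin type/dry"},
    tag_patterns={
        "effect/moisturizing": [r"hydrat\w+"],
        "effect/sun protection": [r"sun"],  # deliberately loose: matches "Sunday"
    },
)


def fake_judge(content: str, tag: str) -> bool:
    """Stub standing in for the discriminative LLM prompt."""
    return tag != "effect/sun protection"


confirmed = confirm_tags(text, candidates, fake_judge)
```

The loose pattern produces a false candidate from "Sunday", which is exactly the kind of keyword-level error the verification step is meant to remove.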

In summary, the tagging of the content in the content database according to the structured tag tree in the present application proceeds as follows. A product knowledge graph is constructed, and its entities and relations are used to build the content category database from the media content database, so that the content category database is built quickly and work efficiency is improved. Once the content category database has been built, information is extracted from it to construct the content tag tree: the RaNER model identifies concrete entities, and the large language model, combined with the information-extraction prompt and the chain-of-thought summarization prompt, identifies abstract entities. Using different models for different types of entity compensates for the inaccurate extraction and incomplete entity recall that result from relying on a single model. Finally, the content text is tagged using the content tag tree. The overall process is simple, entity extraction accuracy is high, and the overall tagging precision is therefore high; the process is also efficient, reducing labor and time costs.

S2: Calculate the content interaction index and the product sales index of every content tag. The indexes are calculated on the data in the content database and the e-commerce product database from step S1, for a given product category and a given time period (start date Ds, end date De). Within a product category, for a content tag k, let X_k be the sales index of content tag k and Y_k be the interaction index of content tag k;

Specifically, calculating the content interaction index of a content tag in step S2 includes the following steps:

S21: Determine the product category in the content database, and count the number n of all contents of that category that carry content tag k within the set time period (the set time period can be chosen according to the actual situation; usually a start date and an end date are fixed, and the days between them form the set time period), obtaining the content set C_k = {content 1, content 2, ..., content n};

S22: Calculate the interaction value E_i of each content in the content set:

E_i = the number of likes, or the number of reposts, or the number of favorites, or the number of comments of each content; which of these is used as the interaction value can be chosen according to the actual situation;

S23: Calculate the content interaction index Y_k of content tag k:

or
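The two alternatives in S23 are most naturally read as the total and the per-content average of the interaction values E_i, each log-scaled, that is Y_k = log(ΣE_i) or Y_k = log(ΣE_i / n). This reading is an assumption inferred from the later remark that X_k and Y_k are log-processed; the function name `interaction_index`, the `average` switch, and the use of the natural logarithm are likewise choices made here.

```python
import math


def interaction_index(interaction_values: list[float], average: bool = False) -> float:
    """Assumed form of Y_k: log of the total (or mean) interaction value for tag k."""
    n = len(interaction_values)
    if n == 0:
        raise ValueError("content set C_k is empty")
    total = sum(interaction_values)
    value = total / n if average else total
    if value <= 0:
        raise ValueError("the logarithm needs a positive interaction total")
    return math.log(value)


# E_i for the two contents carrying tag k (for example, their like counts).
likes = [100, 900]
y_total = interaction_index(likes)
y_mean = interaction_index(likes, average=True)
```

The average variant keeps tags that simply appear in many contents from dominating tags whose individual contents perform well.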

Specifically, as shown in FIG. 2, calculating the product sales index of a content tag in step S2 includes the following steps:

S201: Determine the product category in the content database, and count the number n of all contents of that category that carry content tag k within the set time period, obtaining the content set C_k = {content 1, content 2, ..., content n};

S202: For each content i in the content set, supposing that content i mentions m product SPUs, determine the m product SPUs corresponding to content i, obtaining the product SPU set P_i = {SPU₁, SPU₂, ..., SPU_m} of content i;

S203: Determine the sales of a single product SPU on a date d within the set time period. For example: for content i, starting from its release date, the length of the time window over which content i influences the sales of product SPU_j is set to t = 14 days (this can be adjusted to 30 days or another length according to the actual situation). Taking 14 days as the example, the contribution values of content i to the sales of SPU_j over these 14 days are [S(j, 0), S(j, 1), ..., S(j, 13)]. For product SPU_j, the total sales on a date d within the time window is Sales(j, d), which can be obtained from the database with an SQL query (the sum of the sales of all product links corresponding to SPU_j on date d);

S204: Determine, for a date d, the contribution value of a single content to the sales of a single product SPU:

In step S204, the sales Sales(j, d) of SPU_j on date d are split among the contents i that mention SPU_j, giving the contribution value S_i(j, d) of content i to the sales of SPU_j on date d;

Suppose that on a date d there are u contents {content 1, content 2, ..., content u} that were all released within the preceding 14 days and that all mention product SPU_j. Let W_i be the weight of content i's influence on the sales of SPU_j on date d. W_i may take different values, for example W_i = the interaction volume of content i, or another value chosen according to the actual situation. This determines the sales contribution value of a single content i to SPU_j on date d.
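Read together with the weight definition above, the split in S204 can be written as a weight-proportional share. The expression below is a reconstruction from the surrounding description, not a formula quoted from the source; the summation index v over the u contents is introduced here for illustration.

```latex
S_i(j, d) = \mathrm{Sales}(j, d) \cdot \frac{W_i}{\sum_{v=1}^{u} W_v}
```

By construction, the shares of the u contents add up to Sales(j, d), so no sales are counted twice or lost on that date.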

To make the contribution value S_i(j, d) of content i to the sales of SPU_j on date d in step S204 easier to understand, consider the following example. As shown in FIG. 3, there are three contents: content 1 and content 3 mention products {SPU₁, SPU₂}, and content 2 mentions product {SPU₁}. Suppose SPU₁ has sales on day d9 (the sum of the sales of its product links) and SPU₂ has sales on day d10. The sales of each SPU on each date are split by weight among all contents whose time window covers that date, and the sales assigned to each content across all SPUs are then added up to give that content's contribution to product sales.
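The example can be worked through numerically. In the sketch below, the publication dates, the weights (interaction volumes), and the sales figures are invented for illustration, since the text gives none; `active_contents` and `split_sales` are hypothetical helper names, and the equal split used when all weights are zero is an assumption.

```python
from datetime import date


def active_contents(contents: list[dict], spu: str, day: date,
                    window_days: int = 14) -> list[dict]:
    """Contents that mention `spu` and whose influence window covers `day`."""
    return [
        c for c in contents
        if spu in c["spus"] and 0 <= (day - c["published"]).days < window_days
    ]


def split_sales(sales: float, contents: list[dict]) -> dict[str, float]:
    """Split one SPU's sales on one day across contents in proportion to weight."""
    if not contents:
        return {}
    total_weight = sum(c["weight"] for c in contents)
    if total_weight == 0:
        # Assumption: fall back to an equal split when no content has any weight.
        return {c["id"]: sales / len(contents) for c in contents}
    return {c["id"]: sales * c["weight"] / total_weight for c in contents}


contents = [
    {"id": "content1", "published": date(2024, 1, 1),
     "spus": {"SPU1", "SPU2"}, "weight": 100},
    {"id": "content2", "published": date(2024, 1, 3),
     "spus": {"SPU1"}, "weight": 300},
    {"id": "content3", "published": date(2024, 1, 5),
     "spus": {"SPU1", "SPU2"}, "weight": 600},
]
# SPU1 sells on day 9 and SPU2 on day 10, as in the example.
daily_sales = {("SPU1", date(2024, 1, 9)): 1000.0,
               ("SPU2", date(2024, 1, 10)): 700.0}

totals = {c["id"]: 0.0 for c in contents}
for (spu, day), sales in daily_sales.items():
    shares = split_sales(sales, active_contents(contents, spu, day))
    for content_id, share in shares.items():
        totals[content_id] += share
```

With these numbers, the SPU1 sales are shared by all three contents and the SPU2 sales only by content 1 and content 3, giving totals of 200, 300, and 1200.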

S205: Determine the total contribution value of a single content i to the sales of the products P_i = {SPU₁, SPU₂, ..., SPU_m} mentioned in it:

where t is the set length of the time window over which a content's influence on product sales is calculated after it is released; when t is 14, then
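Under the definitions in S202 to S204, the total in S205 is presumably the sum of the daily contributions over the m mentioned SPUs and the t days of the window. The form below is a reconstruction under that assumption; the symbol S_i for the total and the day offset d = 0, ..., t-1 counted from the release date are notation chosen here to match the list [S(j, 0), ..., S(j, 13)] in S203.

```latex
S_i = \sum_{j=1}^{m} \sum_{d=0}^{t-1} S_i(j, d),
\qquad \text{and for } t = 14: \quad
S_i = \sum_{j=1}^{m} \sum_{d=0}^{13} S_i(j, d)
```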

S206: Determine the cumulative value of the influence on product sales of the content set C_k corresponding to content tag k:

S207: Calculate the sales index X_k of the content tag: or where p is the deduplicated total number of product SPUs corresponding to the content set C_k of content tag k, which can be obtained from the database by statistics.

It should be noted that the sales index X_k and the interaction index Y_k of content tag k may also be left without log processing. However, because very large extreme values readily occur in big data, taking the logarithm gives a better visualization and makes visual analysis easier for the user.
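Steps S206 and S207 can be sketched as follows, assuming that the cumulative value is the sum of the per-content totals over C_k and that the two alternatives for X_k are log(S_k) and log(S_k / p). These forms are inferred from the description of p and from the remark on log processing, not quoted from the source; `sales_index`, its parameters, and the natural logarithm are choices made for this illustration.

```python
import math
from typing import Optional


def sales_index(content_totals: dict[str, float],
                distinct_spu_count: Optional[int] = None,
                apply_log: bool = True) -> float:
    """Assumed form of X_k for one tag.

    content_totals maps each content in C_k to its total sales contribution.
    If distinct_spu_count (p) is given, the cumulative value is divided by it.
    """
    cumulative = sum(content_totals.values())  # S206: cumulative value over C_k
    value = cumulative / distinct_spu_count if distinct_spu_count else cumulative
    if not apply_log:
        return value
    if value <= 0:
        raise ValueError("the logarithm needs a positive sales value")
    return math.log(value)


# Per-content totals for a tag whose contents mention 2 distinct SPUs.
content_totals = {"content1": 200.0, "content2": 300.0, "content3": 1200.0}
x_total = sales_index(content_totals)
x_per_spu = sales_index(content_totals, distinct_spu_count=2)
x_raw = sales_index(content_totals, apply_log=False)
```

Dividing by p compares tags on sales per mentioned product, which avoids favoring tags that are simply attached to many different products.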

S3: Perform visual analysis on the content interaction index and the product sales index; this specifically includes the following steps:

S31: Establish a two-dimensional coordinate system, with the sales index X_k of content tag k on the X-axis and the interaction index Y_k of content tag k on the Y-axis;

S32: Determine the mean of the sales index and the mean of the interaction index over all content tags, divide the plane into four areas using the two means as boundaries, and place each content tag in one of the four areas according to its sales index and interaction index. Specifically, once the two means have been calculated, a dividing line is drawn at the mean sales index on the X-axis and another at the mean interaction index on the Y-axis; the two perpendicular lines intersect to form four areas: the wait-and-see area (lower left), the opportunity area (upper left), the strong area (upper right), and the convinced area (lower right). The values of all content tags are then placed in these areas: in the strong area, both the sales index and the interaction index are above their respective means; in the opportunity area, the sales index is below its mean and the interaction index is above its mean; in the wait-and-see area, both indexes are below their respective means; in the convinced area, the sales index is above its mean and the interaction index is below its mean.
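The area assignment in S32 reduces to two comparisons per tag, as the sketch below shows. The function name `classify_tags` and the sample values are hypothetical, and treating a value exactly equal to the mean as "above" is an assumption, since the text does not cover ties.

```python
from statistics import mean


def classify_tags(indexes: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Map each tag's (X_k, Y_k) pair to one of the four areas."""
    x_mean = mean(x for x, _ in indexes.values())
    y_mean = mean(y for _, y in indexes.values())
    areas = {}
    for tag, (x, y) in indexes.items():
        high_sales = x >= x_mean        # assumption: ties count as above the mean
        high_interaction = y >= y_mean
        if high_sales and high_interaction:
            areas[tag] = "strong area"
        elif high_interaction:
            areas[tag] = "opportunity area"
        elif high_sales:
            areas[tag] = "convinced area"
        else:
            areas[tag] = "wait-and-see area"
    return areas


# (sales index X_k, interaction index Y_k) per tag; both means are 2.0 here.
indexes = {
    "tag A": (1.0, 1.0),
    "tag B": (1.0, 3.0),
    "tag C": (3.0, 3.0),
    "tag D": (3.0, 1.0),
}
areas = classify_tags(indexes)
```

The same (X_k, Y_k) pairs can be passed to any scatter-plot tool, with the two means drawn as the dividing lines.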

By calculating quantitative indexes of content tags for both content interaction and product sales, the present application helps customers find content tags that perform better in content interaction, content tags that perform better in product sales, or content tags that perform well in both; the quantitative comparison of content tags makes the creation of marketing content simple, feasible, and precise, saving labor and resource costs.

The examples described herein merely describe preferred embodiments of the present invention and do not limit its concept or scope. Without departing from the design concept of the present invention, all variations and improvements made to its technical solutions by those skilled in the art shall fall within the protection scope of the present invention.