Date	Content id	Number of likes	Number of forwards	Number of ratings	Number of collections
2023-2-1	Content 1	…	…	…	…
2023-2-2	Content 1	…	…	…	…
2023-2-3	Content 1	…	…	…	…

Table 3: Data table of the commodity SPUs corresponding to the content data and the content tags describing each commodity SPU

Content id	SPU	Content label
Content 1	SPU 1	Efficacy-whitening
Content 1	SPU 1	Efficacy-speckle reduction
Content 1	SPU 2	Efficacy-moisture retention

As can be seen from Tables 1 to 3, content 1 mentions two commodities, SPU 1 and SPU 2. Two content tags are identified in the description of SPU 1, namely efficacy-whitening and efficacy-speckle reduction, and one content label, efficacy-moisture retention, is identified in the description of SPU 2.

In a specific example, the e-commerce commodity database stores fields including commodity information and the cleaned brand, category, SPU and the like, and stores the sales volume and sales amount of each commodity by date. One commodity id corresponds to one commodity link on the e-commerce platform, and one SPU may correspond to a plurality of commodity links. The e-commerce commodity data in the e-commerce commodity database are shown in Table 4 below, and the sales data of the commodities by date are shown in Table 5 below.

Table 4: E-commerce commodity data table in the e-commerce commodity database

Table 5: Sales data table of commodity data by date

Date	Commodity id	Sales volume	Sales amount	Price
2023-2-1	Commodity 1	…	…	…
2023-2-2	Commodity 1	…	…	…
2023-2-3	Commodity 1	…	…	…

In one specific embodiment, labeling the content in the content database according to the structured label tree to obtain the content labels specifically includes the following steps:

S11: constructing a commodity knowledge graph;

In a specific embodiment, the commodity knowledge graph can be constructed as follows: commodity information (including the platform, commodity id, commodity name, specification, commodity parameters and the like) is collected from each e-commerce platform to form a commodity database; entity recognition is performed on the commodity text information in the commodity database using a RaNER model to obtain brand, category and commodity attribute entities and their related keywords; and the commodity knowledge graph is constructed from the co-occurrence relations of these entities within commodities. The commodity knowledge graph is structured as triples.
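As a minimal sketch of how co-occurrence triples could be derived, the following assumes the entity recognizer has already produced, for each commodity, a mapping from entity type to entity strings. The function name `build_triples`, the input layout and the relation labels (for example `brand-category`) are illustrative assumptions, not part of the described method.

```python
from itertools import combinations

def build_triples(products):
    """Build (head, relation, tail) triples from entities co-occurring in one commodity.

    `products` is a list of dicts mapping entity type -> list of entity strings
    (a hypothetical shape for the entity recognizer's output).
    """
    triples = set()
    for entities in products:
        typed = [(etype, name) for etype, names in entities.items() for name in names]
        # Pair every two entities of different types found in the same commodity.
        for (t1, n1), (t2, n2) in combinations(sorted(typed), 2):
            if t1 != t2:
                triples.add((n1, f"{t1}-{t2}", n2))
    return triples

sample = [{"brand": ["BrandA"], "category": ["face cream"], "attribute": ["whitening"]}]
graph = build_triples(sample)
```

Storing the triples in a set means a relation that recurs across many commodities is kept once; a real graph would probably also keep co-occurrence counts.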

S12: constructing a content category database; acquiring a media content database, screening out of the media content database, by means of the commodity knowledge graph, the content data related to the category of each commodity, and constructing the content category database;

Specifically, the construction of the content category database comprises the following steps:

S121: collecting information on each item of social media content to form a media content database. Content on social media mainly takes the form of image-text content and video. In this step, only the original text description information is stored in the media content database: for image-text content, the content title, the content text and the picture links are stored; for video content, the content title and the video link are stored. Multimedia data such as pictures and videos are not stored, which greatly reduces the storage cost;

S122: performing text matching on the text information in the media content database by using the commodity knowledge graph, and establishing a preliminary screening database of content for a commodity category. It should be noted that, since content on social media covers topics in many fields, both related and unrelated to commodities, the content related to commodities of a given category needs to be screened out. Meanwhile, the media content database contains massive data (more than 1 billion items); besides text, the content includes pictures and video, which can be processed further only after conversion to text, and converting video to text is costly and time-consuming, as is classifying massive data with an AI model. Therefore, text matching is first performed on the text information to obtain a preliminary screening database; this low-cost, rapid preliminary screening narrows the data range, which ensures efficiency and reduces cost;

In the preliminary screening stage, text matching is performed on the text information in the media content database using the keywords of the various entities in the commodity knowledge graph, including keywords of related entities such as brands, categories and attributes. For video content, this step matches only the video title within the text information, so that content data related to commodities of a given category can be screened out rapidly from massive content data and a preliminary screening database for that category established. A category preliminary screening database, such as one for the cosmetics and personal care category, may be established for every target category the business requires.
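The preliminary screening described above can be sketched as a plain substring match. The record fields (`type`, `title`, `body`) and the function name `prescreen` are assumptions made for illustration; the source does not specify a schema.

```python
def prescreen(contents, keywords):
    """Keep contents whose stored text contains at least one knowledge-graph keyword."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for item in contents:
        # Video items are matched on the title only; image-text items on title plus body.
        if item["type"] == "video":
            text = item["title"]
        else:
            text = item["title"] + " " + item.get("body", "")
        text = text.lower()
        matched = [k for k in lowered if k in text]
        if matched:
            hits.append({**item, "matched_keywords": matched})
    return hits

contents = [
    {"id": 1, "type": "video", "title": "My favorite whitening serum"},
    {"id": 2, "type": "image_text", "title": "Weekend hike", "body": "No products here"},
    {"id": 3, "type": "image_text", "title": "Review", "body": "BrandA face cream is great"},
]
screened = prescreen(contents, ["BrandA", "face cream", "whitening"])
```

At the scale of a billion items a multi-pattern matcher such as Aho-Corasick would likely replace the inner loop, but the selection logic stays the same.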

S123: converting the picture content and the video content in the preliminary screening database into text content. Because the preliminary screening database of step S122 contains only text information such as the titles of picture and video content, the content of the pictures and videos needs to be converted into text in order to further analyze the commodity-related text information they contain;

Specifically, for picture content, the characters in the picture are converted into text content using OCR technology. For video content, the video is converted into text content using OCR technology and ASR technology respectively. Before OCR processing, frames are extracted from the video at a fixed time interval, for example one frame per second, converting the video into a group of pictures; the characters in the pictures are converted into text content using OCR; and the position and size information of the characters is used to filter out, as far as possible, secondary text in the background while retaining important content text such as subtitles. The speech in the video is converted into text content using ASR, so that (OCR text, ASR text) fields are added to each piece of video content; the two text contents are combined to obtain the final text content converted from the video. Because the on-screen characters and the speech in a video each carry part of the language information and each lack some of it, and because OCR and ASR each have a certain error rate, converting a video into text with both OCR and ASR and using the results in combination helps to analyze the language information in the video more completely.
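The following sketch illustrates only the filtering and merging part of this step, assuming OCR and ASR have already been run by external tools. The box layout `(text, y_top, height)`, the thresholds `min_h_ratio` and `bottom_ratio`, and the rule that subtitles sit in the lower part of the frame are all illustrative assumptions; the source says only that position and size information is used.

```python
def keep_subtitle_boxes(boxes, frame_h, min_h_ratio=0.03, bottom_ratio=0.6):
    """Keep OCR boxes that are large enough and low enough in the frame to look like subtitles.

    `boxes` is a list of (text, y_top, height) tuples in pixels (assumed layout).
    """
    return [text for text, y_top, height in boxes
            if height / frame_h >= min_h_ratio and y_top / frame_h >= bottom_ratio]

def merge_video_text(frames, asr_text, frame_h):
    """Combine de-duplicated subtitle text from sampled frames with the ASR transcript."""
    lines = []
    for boxes in frames:
        for text in keep_subtitle_boxes(boxes, frame_h):
            # A subtitle usually persists over several sampled frames; keep it once.
            if not lines or lines[-1] != text:
                lines.append(text)
    ocr_text = " ".join(lines)
    return {"ocr_text": ocr_text, "asr_text": asr_text,
            "combined": (ocr_text + " " + asr_text).strip()}

frames = [
    [("Great cream", 900, 40), ("shop sign", 100, 10)],
    [("Great cream", 900, 40)],
    [("Buy now", 880, 40)],
]
result = merge_video_text(frames, "this cream is great", frame_h=1000)
```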

S124: performing fine screening classification on the preliminary screening database, and judging whether the text content is related to the commodity category. After the text content of the picture content and video content in the preliminary screening database is obtained in step S123, it is necessary to further determine whether the content is related to the commodity data of the category. The preliminary screening narrows massive data by fast keyword matching and so recalls potentially relevant content, but keywords can be ambiguous; for example, the keyword for the milk brand "Bright" may also match unrelated text describing sunny weather. Therefore, further fine screening of the category preliminary screening database is required. The traditional method is to label data manually, train a supervised text classification model, and judge whether the content text is related to a given commodity category. Because a large number of commodity categories need to be processed, retraining a classification model on manually annotated data for each one is costly.

In step S124, the fine screening classification specifically includes: after brand and category keywords are matched against the data in the preliminary screening database, content in which both a brand keyword and a category keyword are matched is judged to be related to the commodity category; no model judgment is needed in this case, which increases speed and reduces computing power consumption. For content in which only a brand keyword or only a category keyword is matched, the likelihood of ambiguity is high, so a text classifier is built using a large language model with a prompt such as "Does the above text describe a product of the skin care category?"; the content text is classified into the corresponding category, and the content category database is then built.
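The decision rule of the fine screening can be sketched as below. The large language model is represented by a caller-supplied function `llm_is_relevant`, which is a placeholder rather than a real API; rejecting content that matches neither a brand nor a category keyword is an assumption, since the source does not cover that case.

```python
def fine_screen(text, brand_keywords, category_keywords, llm_is_relevant):
    """Decide whether `text` belongs to the commodity category."""
    has_brand = any(k in text for k in brand_keywords)
    has_category = any(k in text for k in category_keywords)
    if has_brand and has_category:
        return True                   # both matched: accept without calling the model
    if has_brand or has_category:
        return llm_is_relevant(text)  # only one matched: ambiguous, ask the classifier
    return False                      # assumption: neither matched, treat as unrelated

calls = []
def fake_llm(text):
    """Stand-in for an LLM classifier; records calls and approves text mentioning skin."""
    calls.append(text)
    return "skin" in text

both = fine_screen("BrandA face cream review", ["BrandA"], ["face cream"], fake_llm)
only_brand = fine_screen("BrandA keeps my skin soft", ["BrandA"], ["face cream"], fake_llm)
neither = fine_screen("a sunny day at the beach", ["BrandA"], ["face cream"], fake_llm)
```

In this example the model is invoked once, for the brand-only text, which is the cost saving the step aims for.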

S13: performing information extraction on the content database to construct a content label tree, and labeling the data in the content database according to the content label tree; this specifically comprises the following steps:

S131: first, extracting person entities, product entities, brand entities and commodity attribute entities from the product database using a RaNER model; then performing semantic recognition on the category database using a large language model combined with information-extraction prompts and chain-of-thought summarization prompts, and extracting person entities, internet buzzword entities, user pain point entities and product feature entities; finally, fusing the entity results to obtain the final entities;

In this step, RaNER extracts category information, brand information and some commodity attribute information fairly accurately, but performs only moderately on semantically rich content text, for which its semantic understanding is insufficient; for example, it does not know the meaning of an expression such as "desert dry skin". Therefore, a large language model (LLM) is also used for entity recognition, with several kinds of prompt. An LLM is an artificial intelligence model that aims to understand and generate human language; such models are trained on large amounts of text data and can perform a wide range of NLP tasks, including entity recognition, text summarization, translation and sentiment analysis. LLM extraction still has some problems, such as insufficient entity recall. In this scheme, entity recognition is performed on a piece of content text with several types of prompt, and the results are fused. Using several types of prompt together gives better label word extraction than a single prompt, which further ensures the accuracy of entity extraction and hence of the subsequent construction of the content label tree. The entity words extracted from the content text by the RaNER model and the LLM, together with the correspondence between content ids and entity word ids, are stored in a word extraction database table, and the results can be reused later when the content is labeled according to the structured tag tree.
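A minimal sketch of the fusion step follows, assuming each extractor (the NER model and each prompt type) returns a list of `(entity_text, entity_type)` pairs. Fusing by case-insensitive de-duplicated union is an assumption; the source does not state the fusion rule.

```python
def fuse_entities(*results):
    """Union the entity lists from several extractors, keeping first-seen order."""
    seen, fused = set(), []
    for result in results:
        for text, etype in result:
            cleaned = text.strip()
            key = (cleaned.lower(), etype)
            if cleaned and key not in seen:
                seen.add(key)
                fused.append((cleaned, etype))
    return fused

ner_result = [("BrandA", "brand"), ("face cream", "category")]
extraction_prompt_result = [("desert dry skin", "pain_point"), ("branda", "brand")]
summary_prompt_result = [("desert dry skin", "pain_point"), ("fast absorbing", "feature")]
final_entities = fuse_entities(ner_result, extraction_prompt_result, summary_prompt_result)
```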

S132: converting the final entities (the entity words and their types, together with entity words of types such as functional efficacy among the attribute words extracted from commodities) into word vectors using a text vectorization model; then grouping the word vectors into a plurality of clusters using a clustering algorithm (word vectors with high similarity are gathered into one cluster). Because the clustering algorithm has certain limitations, words in the same cluster may still have different meanings after simple word-vector clustering; therefore, the semantic understanding capability of the large language model is used to summarize the words in each cluster into one or more labels, and the keyword types output by the large language model are used to construct a tree-structured content label tree and the keywords of each label;
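The clustering part of this step could look like the sketch below. The source does not name a clustering algorithm, so a simple greedy cosine-similarity grouping is swapped in for illustration; the threshold `0.9` and the two-dimensional toy vectors are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_words(vectors, threshold=0.9):
    """Greedy clustering: a word joins the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative_vector, member_words)
    for word, vec in vectors.items():
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((vec, [word]))
    return [members for _, members in clusters]

toy_vectors = {
    "whitening": [1.0, 0.0],
    "brightening": [0.98, 0.05],
    "moisturizing": [0.0, 1.0],
}
groups = cluster_words(toy_vectors)
```

Each resulting group would then be passed to the large language model to be summarized into one or more labels.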

S133: labeling the content text in the category database according to the category content label tree. Labeling the content text in step S133 includes the following steps:

S1331: when the content text is to be labeled, judging whether entity extraction has already been performed on it (that is, whether entities have been extracted using the RaNER model and the LLM); if so, the extracted entity words become candidate labels; if not, entity extraction is performed (including RaNER entity recognition and recognition by the LLM using information-extraction prompts and chain-of-thought summarization prompts), and the labels corresponding to the identified entity words are added to the candidate label set;

S1332: applying keyword matching and regular expression matching, using the keywords or regular expressions corresponding to all labels in the label tree, and adding the matched labels to the candidate label set of the content text;

S1333: the candidate label set obtained in the previous two steps may contain errors, and labels identified by keywords and the like may be semantically wrong. In this step, a large language model with a discriminative prompt judges all candidate labels screened for the content text; the prompt asks whether the content text carries each of the listed content labels, to determine whether each candidate label matches the meaning of the corresponding content text. If a candidate label matches, it is confirmed; if not, it is corrected. In this way, recall is improved by using several extraction modes (omissions are reduced), while the semantic understanding capability of the LLM is used to filter out, as far as possible, entities whose keywords match but whose meaning is wrong, as well as entity words that the LLM may have produced in the previous stage but that do not exist in the original text, so that both precision and recall are improved.
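The three labeling steps can be sketched as follows. The label tree is reduced to a flat mapping from label to regular expressions, and the discriminative LLM call is a caller-supplied function `llm_confirms`; both are illustrative assumptions. The sketch simply drops rejected candidates, whereas the source speaks of correcting them.

```python
import re

def candidate_tags(text, entity_tags, tag_patterns):
    """Collect candidates from earlier entity extraction plus keyword/regex matches."""
    candidates = set(entity_tags)
    for tag, patterns in tag_patterns.items():
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            candidates.add(tag)
    return candidates

def confirm_tags(text, candidates, llm_confirms):
    """Keep only candidates the discriminator accepts for this text."""
    return sorted(tag for tag in candidates if llm_confirms(text, tag))

tag_patterns = {
    "Efficacy-whitening": [r"whiten\w*", r"brighten\w*"],
    "Efficacy-moisture retention": [r"moistur\w*", r"hydrat\w*"],
}
text = "This cream did not whiten my skin, but it is very hydrating."
cands = candidate_tags(text, {"Efficacy-moisture retention"}, tag_patterns)

def fake_discriminator(text, tag):
    """Stand-in for the LLM judgment: rejects whitening because the text negates it."""
    return tag != "Efficacy-whitening"

final_tags = confirm_tags(text, cands, fake_discriminator)
```

The example shows why the verification step matters: the keyword "whiten" matches even though the text denies the effect.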

In summary, in the present application, the specific operation of labeling the content in the content database according to the structured label tree to obtain the content labels first constructs a commodity knowledge graph, and then uses the entities and relations in the commodity knowledge graph to construct the content category database from the media content database, so that the content category database is built quickly and working efficiency is effectively improved. Once the content category database is built, information extraction is performed on it to construct the content tag tree: the RaNER model is used to identify concrete entities, and a large language model combined with information-extraction prompts and chain-of-thought summarization prompts is used to identify abstract entities. Because different types of entities are extracted by different models, the problems of inaccurate extraction and incomplete entity recall that arise when a single model performs entity extraction are effectively solved. Finally, the content text is labeled by means of the content label tree. The whole process is simple, and the entity extraction accuracy is high, so the overall labeling accuracy is high; meanwhile, efficiency is high, and labor and time costs are reduced.

S2: calculating the content interaction index of each content tag and the commodity sales index of each content tag. When the indexes are calculated, a time period (a start date Ds and an end date De) is set, and the calculation is carried out on the data in the content database and the e-commerce commodity database of step S1 for a given commodity category. Within a commodity category, for a content tag k, let X_k be the sales index of content tag k and Y_k the interaction index of content tag k;

Specifically, calculating the content interaction index of the content tag in step S2 includes the following steps:

S21: determining the commodity category in the content database, and counting the number n of all contents carrying content label k within a set time period (the set time period can be chosen according to the actual situation; usually a start date and an end date are determined, and the days between them form the set time period) to obtain the content set C_k = {content 1, content 2, …, content n};

S22: calculating an interaction value E_i for each content i in the content set:

$$E_i = \text{likes}_i + \text{forwards}_i + \text{collections}_i + \text{ratings}_i$$

or

E_i is taken as any one of the number of likes, the number of forwards, the number of collections or the number of ratings of content i; the specific interaction value can be calculated according to the actual situation;

S23: calculating the content interaction index Y_k of content label k:

[two alternative formulas for Y_k, not reproduced in the source text]
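Because the expression for Y_k is not reproduced in this text, the following is only a sketch under stated assumptions: E_i is the sum of the four interaction counts, and Y_k is taken to be the mean of E_i over the n contents in C_k, optionally passed through a natural logarithm (the description later notes that the log processing may be omitted). The averaging, the log base and the field names are assumptions, not the source's formula.

```python
import math

def interaction_value(content):
    """E_i as the sum of the four interaction counts (first variant in S22)."""
    return (content["likes"] + content["forwards"]
            + content["collections"] + content["ratings"])

def interaction_index(contents, use_log=True):
    """Assumed form of Y_k: mean E_i over C_k, optionally log-transformed."""
    if not contents:
        raise ValueError("content set C_k is empty")
    mean_e = sum(interaction_value(c) for c in contents) / len(contents)
    # Assumption: natural log; a mean of zero would need separate handling.
    return math.log(mean_e) if use_log else mean_e

c_k = [
    {"likes": 5, "forwards": 2, "collections": 2, "ratings": 1},    # E = 10
    {"likes": 20, "forwards": 4, "collections": 3, "ratings": 3},   # E = 30
]
y_raw = interaction_index(c_k, use_log=False)
y_log = interaction_index(c_k)
```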

Specifically, as shown in fig. 2, the step S2 of calculating the commodity sales index of the content tag includes the following steps:

S201: determining the commodity category in the content database, and counting the number n of all contents carrying content label k in that commodity category within a set time period to obtain the content set C_k = {content 1, content 2, …, content n};

S202: for each content i in the content set, assuming that m commodity SPUs are mentioned in content i, determining the m commodity SPUs corresponding to content i to obtain the commodity SPU set P_i = {SPU₁, SPU₂, …, SPU_m} corresponding to content i;

S203: determining the sales of a single commodity SPU on a date d within the set time period. For illustration: for content i, counting from the release date of content i, the length of the time window in which content i influences the sales of commodity SPU_j is set to t = 14 days (30 days or another value may also be used; the time window can be set according to the actual situation). Taking 14 days as an example, the sales contributions of content i to commodity SPU_j over the 14 days are [S(j, 0), S(j, 1), …, S(j, 13)]. For commodity SPU_j, the total sales on a date d within the set time window is Sales(j, d); this value can be queried from the database with SQL (the sum, on date d, of the sales of all commodity links corresponding to SPU_j);
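The SQL lookup of Sales(j, d) might resemble the sketch below, which uses an in-memory SQLite database. The table names `commodity` and `daily_sales` and all column names are hypothetical; the source describes the data only at the level of Tables 4 and 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commodity (commodity_id TEXT PRIMARY KEY, spu TEXT);
CREATE TABLE daily_sales (sale_date TEXT, commodity_id TEXT, sales_amount REAL);
INSERT INTO commodity VALUES ('c1', 'SPU1'), ('c2', 'SPU1'), ('c3', 'SPU2');
INSERT INTO daily_sales VALUES
  ('2023-02-01', 'c1', 100), ('2023-02-01', 'c2', 50),
  ('2023-02-01', 'c3', 70),  ('2023-02-02', 'c1', 30);
""")

def spu_sales(conn, spu, date):
    """Sales(j, d): sum over all commodity links of one SPU on one date."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(s.sales_amount), 0)
        FROM daily_sales AS s
        JOIN commodity AS c ON c.commodity_id = s.commodity_id
        WHERE c.spu = ? AND s.sale_date = ?
        """,
        (spu, date),
    ).fetchone()
    return row[0]

sales_spu1 = spu_sales(conn, "SPU1", "2023-02-01")
sales_spu2_next_day = spu_sales(conn, "SPU2", "2023-02-02")
```

The join reflects the statement that one SPU may correspond to several commodity links, each with its own commodity id.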

S204: determining, for a date d, the sales contribution value of a single content to a single commodity SPU:

In step S204, the sales Sales(j, d) of SPU_j on a date d are split among the contents that mention SPU_j, to obtain the sales contribution value S_i(j, d) of content i to commodity SPU_j on date d;

Suppose that on a date d there are u contents in total, {content 1, content 2, …, content u}, all released within the preceding 14 days and all mentioning commodity SPU_j. Let W_i be the weight of content i's contribution to the sales of SPU_j on date d. Different values may be chosen for W_i; for example, W_i may be the interaction amount of content i, or another value chosen according to the actual situation. This determines the sales contribution value of a single content i to commodity SPU_j on date d.

To further facilitate understanding of the sales contribution value S_i(j, d) of content i to commodity SPU_j on date d in step S204, an example is given. As shown in FIG. 3, there are 3 pieces of content: content 1 and content 3 mention commodities SPU₁ and SPU₂, and content 2 mentions commodity SPU₁. Suppose commodity SPU₁ has sales on day d9 (the sum of the sales of its corresponding commodity links) and SPU₂ has sales on day d10. The sales of each commodity SPU on each date are divided, according to the weights, among all contents whose time window covers that day; then the sales of all commodity SPUs allocated to each piece of content are accumulated as that content's contribution value to commodity sales.
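One plausible reading of "divided according to the weights" is a proportional split, S_i(j, d) = Sales(j, d) · W_i / (W_1 + … + W_u). The sketch below implements that reading; the proportional form and the equal split used when all weights are zero are assumptions, since the formula itself is not reproduced in this text.

```python
def allocate_sales(sales_jd, weights):
    """Split Sales(j, d) among contents in proportion to their weights W_i.

    `weights` maps content id -> W_i for the contents that mention SPU_j and
    whose time window covers date d.
    """
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # Assumption: with no interaction at all, split equally.
        return {cid: sales_jd / len(weights) for cid in weights}
    return {cid: sales_jd * w / total for cid, w in weights.items()}

# Three contents mention the SPU on date d; W_i is each content's interaction amount.
shares = allocate_sales(300.0, {"content 1": 10, "content 2": 20, "content 3": 30})
```

The shares always add up to the day's sales, so no sales are counted twice across contents.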

S205: determining the total sales contribution value of a single content i to the commodities P_i = {SPU₁, SPU₂, …, SPU_m} it mentions:

wherein t is the length, in days, of the time window set for calculating the influence of the content on commodity sales after its release (for example, t = 14).

S206: determining the cumulative value of the influence on commodity sales of the content set C_k corresponding to content tag k:

S207: calculating the sales index X_k of the content tag:

[two alternative formulas for X_k, not reproduced in the source text]

wherein p is the total number of distinct (deduplicated) commodity SPUs corresponding to the content set C_k of content label k, which can be obtained by statistics from the database.
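Steps S205 to S207 can be sketched as below. Since the formulas are not reproduced in this text, the sketch assumes that a content's total is the sum of its S_i(j, d) over its SPUs and the days in its window, that the tag-level cumulative value is the sum of those totals over C_k, and that X_k is this cumulative value divided by p, optionally log-transformed. The division by p and the natural log are assumptions.

```python
import math

def content_total(contrib):
    """Assumed S205: sum of S_i(j, d) over all SPUs j and window days d of one content.

    `contrib` maps (spu, day_offset) -> contribution value.
    """
    return sum(contrib.values())

def sales_index(per_content_contribs, spus_per_content, use_log=True):
    """Assumed S206-S207: cumulative contribution of C_k normalized by p distinct SPUs."""
    s_k = sum(content_total(c) for c in per_content_contribs)
    p = len(set().union(*spus_per_content))  # deduplicated SPU count
    if p == 0:
        raise ValueError("no commodity SPU is associated with this content set")
    value = s_k / p
    return math.log(value) if use_log else value

contribs = [
    {("SPU1", 0): 50.0, ("SPU2", 1): 30.0},  # content 1
    {("SPU1", 0): 20.0},                     # content 2
]
spus = [{"SPU1", "SPU2"}, {"SPU1"}]
x_raw = sales_index(contribs, spus, use_log=False)
x_log = sales_index(contribs, spus)
```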

It should be noted that the log processing of the sales index X_k and the interaction index Y_k of content label k can be omitted; however, because very large extreme values readily appear in big data, a log transformation gives a better visual effect and makes visual analysis easier for the user.

S3: performing visual analysis on the content interaction index and the commodity sales index; this specifically comprises the following steps:

S31: establishing a two-dimensional coordinate system, with the sales index X_k corresponding to content label k as the X axis and the interaction index Y_k corresponding to content label k as the Y axis;

S32: determining the average value of the sales index and the average value of the interaction index over all content labels, dividing the coordinate plane into four areas by these two averages, and placing each content label into one of the four areas according to its sales index and interaction index. Specifically, after the average sales index and the average interaction index are calculated, a line is drawn through each average, one perpendicular to the X axis and one perpendicular to the Y axis; these two mutually perpendicular lines intersect to form four areas: a sightseeing area (lower left), an opportunity area (upper left), a strong area (upper right) and a trust area (lower right). The sales index and interaction index values of all content tags are then placed into the areas: in the strong area, both the sales index and the interaction index are above their respective averages; in the opportunity area, the sales index is below average and the interaction index is above average; in the sightseeing area, both indexes are below their respective averages; in the trust area, the sales index is above average and the interaction index is below average.
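The four-area assignment can be sketched directly from the definitions above. The function name, the input layout and the choice to treat a value exactly equal to the average as "above" are assumptions made for the example.

```python
def classify_tags(indices):
    """Assign each content tag to an area by comparing its indexes with the averages.

    `indices` maps tag -> (sales_index X_k, interaction_index Y_k).
    """
    mean_x = sum(x for x, _ in indices.values()) / len(indices)
    mean_y = sum(y for _, y in indices.values()) / len(indices)

    def area(x, y):
        # Assumption: a value equal to the average counts as above it.
        if x >= mean_x:
            return "strong area" if y >= mean_y else "trust area"
        return "opportunity area" if y >= mean_y else "sightseeing area"

    return {tag: area(x, y) for tag, (x, y) in indices.items()}

areas = classify_tags({
    "whitening": (4.0, 4.0),
    "moisturizing": (1.0, 4.0),
    "repair": (1.0, 1.0),
    "soothing": (4.0, 1.0),
})
```

Plotting the same pairs as a scatter chart with the two average lines drawn gives the visual form described in S31 and S32.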

By calculating quantitative indexes of content tags in terms of both content interaction and commodity sales, the present invention helps customers find content tags that perform well in content interaction, in commodity sales, or in both; and the quantitative comparison of content tags makes the creation of marketing content simple, feasible and accurate, saving labor and resource costs.

The examples of the present invention merely describe its preferred embodiments and are not intended to limit the spirit and scope of the present invention; those skilled in the art may make various changes and modifications to the technical solution of the present invention without departing from the spirit of the present invention.

Claims

1. An analysis method for quantitatively analyzing the interaction and sales indexes of content tags, characterized in that the method comprises the following steps:

acquiring a content database of a social media platform, labeling the content in the content database to obtain content labels, and adding commodity SPU (Standard Product Unit) labels to the content database;

acquiring an e-commerce commodity database of an e-commerce platform in the social media platform, and adding commodity SPU labels to the commodity data;

2. The method of claim 1, wherein calculating the content interaction index comprises the following steps:

determining the commodity category in the content database, and counting the number n of all contents carrying content label k in that commodity category within a set time period to obtain the content set C_k = {content 1, content 2, …, content n};

calculating the content interaction index Y_k of content label k:

[two alternative formulas for Y_k, not reproduced in the source text]

3. The method of claim 2, wherein calculating the commodity sales index comprises the following steps:

determining the m commodity SPUs corresponding to a single content in the content set to obtain the commodity SPU set P_i = {SPU₁, SPU₂, …, SPU_m} corresponding to that content;

determining the sales of a single commodity SPU on each date in the set time period;

determining, for a date d, the sales contribution value of a single content i to a single commodity SPU_j:

determining the total sales contribution value of content i to the commodities P_i = {SPU₁, SPU₂, …, SPU_m} it mentions:

wherein t is the number of days of the set time window, counted from the release date, in which the content influences commodity sales; determining the cumulative value of the influence on commodity sales of the content set C_k corresponding to content tag k:

4. The method of claim 1, wherein the visual analysis of the content interaction index and the commodity sales index specifically comprises the following steps:

5. The method of claim 1, wherein labeling the content in the content database to obtain the content labels specifically comprises the following steps:

constructing a commodity knowledge graph;

6. The method of claim 5, wherein labeling the content text comprises the following steps:

7. The method of claim 5, wherein the construction of the content category database specifically comprises the following steps:

8. The method of claim 7, wherein only the original text description information is stored in the media content database: for image-text content, the content title, the content text and the picture links are stored; for video content, the content title and the video link are stored.