


Technical Field
The present invention relates to the field of computer technology and, in particular, to an advertisement processing method, an advertisement processing apparatus, and an electronic device.
Background
At present, in online advertising campaigns, advertising platforms review advertisement content to some degree so that advertisements are delivered only when they do not violate platform rules. However, because advertisers are unfamiliar with the platform rules and the rules themselves are ambiguous, newly listed advertisements are often taken down by the platform for various violations, which causes losses.
Summary of the Invention
To solve the above problem, embodiments of the present invention aim to provide an advertisement processing method, apparatus, and electronic device.
In a first aspect, an embodiment of the present invention provides an advertisement processing method, including:
acquiring violating advertisements and normally delivered advertisements, obtaining model-training negative samples from the violating advertisements, and obtaining model-training positive samples from the normally delivered advertisements, wherein both the model-training negative samples and the model-training positive samples are advertisement texts;
training a text classification algorithm model with the model-training negative samples and the model-training positive samples to obtain an advertisement text classification model;
acquiring a brand name set and an advertisement text to be checked, and looking up the words of the advertisement text in the brand name set; when a brand name matching a word in the advertisement text is found in the brand name set, determining the advertisement text to be a violating advertisement text; wherein the brand name set includes brand names and multi-meaning words, a multi-meaning word being a word that has both a brand meaning and a non-brand meaning;
when a multi-meaning word matching a word in the advertisement text is found in the brand name set, processing the advertisement text with the advertisement text classification model to obtain a first probability value that the advertisement text is a violating advertisement text;
when the first probability value is greater than a probability threshold, determining the advertisement text to be a violating advertisement text.
In a second aspect, an embodiment of the present invention further provides an advertisement processing apparatus, including:
an acquisition module, configured to acquire violating advertisements and normally delivered advertisements, obtain model-training negative samples from the violating advertisements, and obtain model-training positive samples from the normally delivered advertisements, wherein both the model-training negative samples and the model-training positive samples are advertisement texts;
a training module, configured to train a text classification algorithm model with the model-training negative samples and the model-training positive samples to obtain an advertisement text classification model;
a detection module, configured to acquire a brand name set and an advertisement text to be checked, look up the words of the advertisement text in the brand name set, and, when a brand name matching a word in the advertisement text is found in the brand name set, determine the advertisement text to be a violating advertisement text; wherein the brand name set includes brand names and multi-meaning words, a multi-meaning word being a word that has both a brand meaning and a non-brand meaning;
a processing module, configured to, when a multi-meaning word matching a word in the advertisement text is found in the brand name set, process the advertisement text with the advertisement text classification model to obtain a first probability value that the advertisement text is a violating advertisement text;
a determination module, configured to determine the advertisement text to be a violating advertisement text when the first probability value is greater than a probability threshold.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides an electronic device including a memory, a processor, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the processor to perform the steps of the method of the first aspect.
In the solutions provided by the first to fourth aspects of the embodiments of the present invention, model-training negative samples are obtained from violating advertisements and model-training positive samples are obtained from normally delivered advertisements, and a text classification algorithm model is trained with the negative and positive samples to obtain an advertisement text classification model. A brand name set and an advertisement text to be checked are acquired, and the words of the advertisement text are looked up in the brand name set; when a brand name matching a word in the advertisement text is found, the advertisement text is determined to be a violating advertisement text. The brand name set includes brand names and multi-meaning words, a multi-meaning word being a word that has both a brand meaning and a non-brand meaning. When a multi-meaning word matching a word in the advertisement text is found in the brand name set, the advertisement text is processed with the advertisement text classification model to obtain a first probability value that the advertisement text is a violating advertisement text; when the first probability value is greater than a probability threshold, the advertisement text is determined to be a violating advertisement text. Compared with the related art, in which an advertiser cannot judge whether an advertisement violates the rules, the trained advertisement text classification model can judge whether an advertisement text violates the rules before the advertisement is listed. Violations are thus judged automatically, losses caused by advertisements being taken down for violations are avoided as far as possible, advertisements that have passed the check can pass the advertising platform's review smoothly, and advertisement delivery becomes more efficient.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an advertisement processing method provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of an advertisement processing apparatus provided by Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present invention.
Detailed Description
In the description of the present invention, it should be understood that orientation or positional terms such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", and "counterclockwise" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the apparatus or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore should not be construed as limiting the present invention.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality of" means two or more, unless expressly and specifically defined otherwise.
In the present invention, unless expressly specified and limited otherwise, terms such as "mounted", "coupled", "connected", and "fixed" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements. A person of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific situation.
At present, in online advertising campaigns, advertising platforms review advertisement content to some degree so that advertisements are delivered only when they do not violate platform rules. However, because advertisers are unfamiliar with the platform rules and the rules themselves are ambiguous, newly listed advertisements are often taken down by the platform for various violations, which causes losses.
On this basis, this embodiment proposes an advertisement processing method, apparatus, and electronic device. Model-training negative samples are obtained from violating advertisements and model-training positive samples are obtained from normally delivered advertisements, and a text classification algorithm model is trained with the negative and positive samples to obtain an advertisement text classification model. A brand name set and an advertisement text to be checked are acquired, and the words of the advertisement text are looked up in the brand name set; when a brand name matching a word in the advertisement text is found, the advertisement text is determined to be a violating advertisement text. The brand name set includes brand names and multi-meaning words, a multi-meaning word being a word that has both a brand meaning and a non-brand meaning. When a multi-meaning word matching a word in the advertisement text is found in the brand name set, the advertisement text is processed with the advertisement text classification model to obtain a first probability value that the advertisement text is a violating advertisement text; when the first probability value is greater than a probability threshold, the advertisement text is determined to be a violating advertisement text. The trained advertisement text classification model can thus judge whether an advertisement text violates the rules before the advertisement is listed, so that losses caused by advertisements being taken down for violations are avoided as far as possible, advertisements that have passed the check can pass the advertising platform's review smoothly, and advertisement delivery becomes more efficient.
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
This embodiment provides an advertisement processing method that is executed by a server used by the advertiser.
Referring to the flowchart of an advertisement processing method shown in FIG. 1, the advertisement processing method of this embodiment includes the following steps:
Step 100. Acquire violating advertisements and normally delivered advertisements, obtain model-training negative samples from the violating advertisements, and obtain model-training positive samples from the normally delivered advertisements, wherein both the model-training negative samples and the model-training positive samples are advertisement texts.
In step 100 above, both the violating advertisements and the normally delivered advertisements are stored in a database on the server.
Both the violating advertisements and the normally delivered advertisements are advertisements in text form.
To obtain the model-training negative samples from the violating advertisements, the following process may be performed: extract the title and body text of each violating advertisement, and concatenate the title and body text to obtain the advertisement text of the violating advertisement, which serves as a model-training negative sample.
Extracting the title and body text from a violating advertisement is prior art and is not described further here.
Concatenating the title and body text of a violating advertisement to obtain the advertisement text that serves as a model-training negative sample is prior art and is not described further here.
The process of obtaining the model-training positive samples from the normally delivered advertisements is similar to the above process of obtaining the model-training negative samples from the violating advertisements and is not described further here.
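The sample construction of step 100 can be sketched as follows. This is a minimal illustration only: the `Ad` record, the field names `title` and `body`, the single-space separator, and the label encoding (1 for violating, 0 for normally delivered) are assumptions made for the example and are not specified by this embodiment.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    # Hypothetical record for a text advertisement read from the database.
    title: str
    body: str


def to_sample_text(ad: Ad) -> str:
    """Concatenate the title and body text into one advertisement text."""
    return f"{ad.title.strip()} {ad.body.strip()}".strip()


def build_samples(violating, normal):
    """Return (text, label) pairs.

    Label 1 marks a violating advertisement (negative sample) and label 0 a
    normally delivered advertisement (positive sample); this encoding is an
    assumption of the sketch.
    """
    samples = [(to_sample_text(ad), 1) for ad in violating]
    samples += [(to_sample_text(ad), 0) for ad in normal]
    return samples
```

Both sample kinds go through the same concatenation, which mirrors the statement that the positive samples are obtained in a similar way to the negative samples.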
Step 102. Train a text classification algorithm model with the model-training negative samples and the model-training positive samples to obtain an advertisement text classification model.
In step 102 above, word segmentation may first be performed on the model-training negative samples and the model-training positive samples to obtain the words in the negative samples and the words in the positive samples, respectively. The text classification algorithm model is then trained with these words to obtain the advertisement text classification model.
The words include single words and phrases.
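As an illustration of step 102, the sketch below trains a small multinomial Naive Bayes classifier on segmented sample texts. The embodiment does not name a specific text classification algorithm, so Naive Bayes is an assumption chosen for brevity; the whitespace `tokenize` function is likewise only a stand-in for a real word segmenter, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Stand-in for a real word segmenter (assumption): lowercase whitespace split.
    return text.lower().split()


class NaiveBayesAdClassifier:
    """Multinomial Naive Bayes over word counts; label 1 = violating, 0 = normal."""

    def fit(self, samples):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        if not (self.doc_counts[0] and self.doc_counts[1]):
            raise ValueError("need at least one sample of each class")
        vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.vocab_size = len(vocab)
        return self

    def _log_score(self, tokens, label):
        total_docs = self.doc_counts[0] + self.doc_counts[1]
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for tok in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            count = self.word_counts[label][tok] + 1
            score += math.log(count / (total_words + self.vocab_size))
        return score

    def violation_probability(self, text):
        """Return P(violating | text) by normalizing the two class scores."""
        tokens = tokenize(text)
        s0 = self._log_score(tokens, 0)
        s1 = self._log_score(tokens, 1)
        m = max(s0, s1)
        e0, e1 = math.exp(s0 - m), math.exp(s1 - m)
        return e1 / (e0 + e1)
```

The value returned by `violation_probability` plays the role of the probability that an advertisement text is a violating advertisement text; any classifier that outputs such a probability could be substituted.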
Step 104. Acquire a brand name set and an advertisement text to be checked, and look up the words of the advertisement text in the brand name set; when a brand name matching a word in the advertisement text is found in the brand name set, determine the advertisement text to be a violating advertisement text; wherein the brand name set includes brand names and multi-meaning words, a multi-meaning word being a word that has both a brand meaning and a non-brand meaning.
In step 104 above, the brand names in the brand name set are collected from e-commerce platforms using web crawler technology.
The brand name set is stored on the server.
After the brand names are collected, staff annotate those that have both a brand meaning and a non-brand meaning and designate them as multi-meaning words.
After annotation by the staff, the brand name set therefore includes brand names and multi-meaning words. Multi-meaning words carry a multi-meaning flag; brand names do not.
In one implementation, the multi-meaning words include, but are not limited to, "apple" and "Patagonia".
The words in the advertisement text are obtained by performing word segmentation on the advertisement text. Word segmentation of text is prior art and is not described further here.
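The lookup of step 104 can be sketched as a dictionary whose values record the multi-meaning flag. The entries in `BRAND_NAME_SET`, the `Lookup` enum, and the rule that an unambiguous brand name takes precedence when both kinds of word appear in one text are assumptions made for this illustration.

```python
from enum import Enum


class Lookup(Enum):
    BRAND = "brand"          # at least one unambiguous brand name matched
    MULTI_MEANING = "multi"  # only multi-meaning words matched
    NONE = "none"            # nothing in the set matched


# Hypothetical brand name set: word -> True if it carries the multi-meaning flag.
BRAND_NAME_SET = {"nike": False, "apple": True, "patagonia": True}


def lookup_words(words, brand_set=BRAND_NAME_SET):
    """Classify the segmented words of an advertisement text against the set."""
    hits = [w.lower() for w in words if w.lower() in brand_set]
    # Assumption: an unambiguous brand name wins over any multi-meaning hit.
    if any(not brand_set[w] for w in hits):
        return Lookup.BRAND
    if hits:
        return Lookup.MULTI_MEANING
    return Lookup.NONE
```

A `Lookup.BRAND` result corresponds to determining the text to be violating directly, while `Lookup.MULTI_MEANING` corresponds to the case that is passed on to the advertisement text classification model.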
Step 106. When a multi-meaning word matching a word in the advertisement text is found in the brand name set, process the advertisement text with the advertisement text classification model to obtain a first probability value that the advertisement text is a violating advertisement text.
In step 106 above, processing the advertisement text with the advertisement text classification model to obtain the first probability value that the advertisement text is a violating advertisement text is prior art and is not described further here.
The probability threshold, against which the first probability value is compared, may be set to any value between 0.5 and 0.8; the individual values are not enumerated here.
Step 108. When the first probability value is greater than the probability threshold, determine the advertisement text to be a violating advertisement text.
After step 108 above has been performed, if no word belonging to the brand name set can be found in the advertisement text, the words of the advertisement text are input directly into the advertisement text classification model, and the model processes the advertisement text to obtain the probability that the advertisement text is a violating advertisement text.
When the probability that the advertisement text is a violating advertisement text is greater than the probability threshold, the advertisement text is determined to be a violating advertisement text.
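Steps 104 to 108, together with the case in which no word matches the brand name set, can be combined into one decision function. The sketch below is an illustration under assumptions: `violation_probability` is a hypothetical callable standing in for the trained advertisement text classification model, `brand_set` maps each word to its multi-meaning flag, and the default threshold of 0.5 is only an example value.

```python
def is_violating_text(words, brand_set, violation_probability, threshold=0.5):
    """Decide whether a segmented advertisement text is violating.

    brand_set maps a lowercase word to True if it is a multi-meaning word and
    to False if it is an unambiguous brand name (assumed representation).
    """
    words = [w.lower() for w in words]
    hits = [w for w in words if w in brand_set]
    # An unambiguous brand name is a violation without consulting the model.
    if any(not brand_set[w] for w in hits):
        return True
    # A multi-meaning hit, or no hit at all, is decided by the classifier.
    return violation_probability(words) > threshold
```

A usage example with a stub in place of the model:

```python
brand_set = {"nike": False, "apple": True}
stub = lambda words: 0.9 if "iphone" in words else 0.1
print(is_violating_text(["apple", "iphone"], brand_set, stub))
print(is_violating_text(["apple", "juice"], brand_set, stub))
```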
The above describes the process of determining whether an advertisement text is a violating advertisement text.
Further, to determine whether an advertisement image is violating, the advertisement processing method of this embodiment may also perform the following steps (1) to (11):
(1) when an advertisement image to be checked is acquired, extracting the text in the advertisement image using optical character recognition (OCR), and taking the extracted text as the text of the advertisement image;
(2) looking up the words in the text of the advertisement image in the brand name set, and, when a brand name matching a word in the text of the advertisement image is found in the brand name set, determining that the advertisement image contains violating words;
(3) when a multi-meaning word matching a word in the text of the advertisement image is found in the brand name set, processing the text of the advertisement image with the advertisement text classification model to obtain a violation probability value that the text of the advertisement image contains violating words;
(4) when the violation probability value is greater than the probability threshold, determining that the advertisement image contains violating words;
(5) when the advertisement text classification model determines that the advertisement image does not contain violating words, acquiring pictures bearing brand logos and the picture information of those pictures, the picture information of a picture bearing a brand logo including the brand name to which the brand logo in the picture belongs and the position information of the brand logo in the picture;
(6) training an object detection model with the pictures bearing brand logos and their picture information to obtain a brand logo detector;
(7) inputting the advertisement image that does not contain violating words into the brand logo detector for processing, to obtain a second probability value that the advertisement image contains a brand logo;
(8) acquiring violating images of the brand-style image type, violating images of the famous-person image type, violating images of the cartoon-character image type, and images from normally delivered advertisements, and training an image classification model with these images to obtain a picture classifier;
(9) inputting the advertisement image that does not contain violating words into the picture classifier for processing, to obtain the image type of the advertisement image and a third probability value;
(10) calculating the violation probability value of the advertisement image that does not contain violating words by the following formula:
S = 2*S1*S2/(S1+S2)
where S denotes the violation probability value, S1 denotes the second probability value, and S2 denotes the third probability value;
(11) when the calculated violation probability value is greater than the probability threshold, determining that the advertisement image that does not contain violating words is a violating advertisement.
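The formula in step (10) is the harmonic mean of the second and third probability values. The sketch below computes it and applies the comparison of step (11). The function names, the default threshold of 0.5, and the choice to return 0.0 when both inputs are zero (where the formula itself is undefined) are assumptions of this illustration.

```python
def combined_violation_score(s1: float, s2: float) -> float:
    """Return S = 2*S1*S2 / (S1 + S2), the harmonic mean of S1 and S2."""
    if s1 + s2 == 0:
        # The formula is undefined here; treating it as 0.0 is an assumption.
        return 0.0
    return 2 * s1 * s2 / (s1 + s2)


def is_violating_image(s1: float, s2: float, threshold: float = 0.5) -> bool:
    """Apply step (11): violating when the combined score exceeds the threshold."""
    return combined_violation_score(s1, s2) > threshold
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a high logo-detection probability combined with a low image-type probability yields a low combined score. For example, S1 = 0.9 and S2 = 0.1 give S = 0.18, whereas S1 = S2 = 0.9 give S = 0.9.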
In step (2) above, the words in the text of the advertisement image are obtained by performing word segmentation on the text of the advertisement image.
In step (5) above, the pictures bearing brand logos are found by staff through Internet searches. All of the pictures found are converted to the same resolution, and the converted pictures are presented to the staff, who determine the brand name to which the brand logo in each picture belongs and then mark the brand logo in the picture. The pictures with marked brand logos are then input into picture processing software running on the above server to obtain the position information of the brand logo in each picture.
The position information of the brand logo carried in a picture is the position information of the brand logo in the picture referred to above.
After the position information of the brand logo in a picture is obtained, the picture bearing the brand logo is associated with its picture information, namely the position information of the brand logo in the picture and the brand name to which the brand logo belongs, and the associated picture and picture information are stored on the server.
In step (6) above, the specific process of training the object detection model with the pictures bearing brand logos and their picture information to obtain the brand logo detector is prior art and is not described further here.
In one implementation, the object detection model runs on the server and may be a YoloV4 model.
In step (7) above, inputting the advertisement image that does not contain violating words into the brand logo detector yields, for each brand logo known to the object detection model, the probability that the brand logo in the advertisement image is that known brand logo. From these probability values the server selects the largest one as the second probability value that the advertisement image contains a brand logo.
The specific process by which the brand logo detector produces these per-logo probability values for the advertisement image that does not contain violating words is prior art and is not described further here.
Moreover, in addition to these probability values, the brand logo detector can also output the position information of the brand logo in the advertisement image that does not contain violating words.
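Selecting the second probability value from the detector output can be sketched as follows. The `Detection` record, its field names, the pixel box layout, and the choice to return 0.0 when the detector reports no logo are assumptions for this illustration; a real detector such as a YoloV4 model would have its own output format.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    # Hypothetical detector output for one candidate brand logo.
    brand: str
    probability: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels, an assumed layout


def second_probability(detections):
    """Return (probability, detection) for the most confident brand logo.

    Returns (0.0, None) when nothing was detected; this fallback is an
    assumption of the sketch.
    """
    if not detections:
        return 0.0, None
    best = max(detections, key=lambda d: d.probability)
    return best.probability, best
```

The returned detection also carries the position information of the brand logo, matching the statement that the detector can output the logo position alongside the probability values.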
在上述步骤(8)中,品牌款式图像类型的违规图像、著名人物图像类型的违规图像、漫画人物图像类型的违规图像以及正常投放广告中的图像都是从数据库中存储的违规广告的广告图像中得到的。In the above step (8), the illegal images of the brand style image type, the illegal images of the famous person image type, the illegal images of the cartoon character image type, and the images in the normal advertisement are all advertisement images of the illegal advertisement stored in the database. obtained in.
所述数据库中,包括:违规图像和正常投放广告中的图像。The database includes: illegal images and images in normal advertisements.
所述违规图像携带有图像类型,所述图像类型,包括:品牌款式图像类型、著名人物图像类型和漫画人物图像类型。The illegal images carry image types, and the image types include: brand style image types, famous character image types, and cartoon character image types.
To train the picture classifier on violating images of the brand-style, famous-person, and cartoon-character image types, image samples corresponding to each type of violating image must be selected from the images in normally delivered advertisements. For example, a violating image of the famous-person image type contains a celebrity portrait, so its corresponding image samples are images of ordinary people in normally delivered advertisements. For a violating image of the brand-style image type, the corresponding image samples are product images, taken from normally delivered advertisements, of generic products of the same product type as the product shown in the violating image.
The violating images of each type (brand style, famous person, cartoon character), together with their corresponding image samples, are then input into the image classification model to train it, yielding the picture classifier. The specific training process is prior art and is not described further here.
In one implementation, the image classification model may be an EfficientNet model.
In step (9) above, the image type of the advertisement image that contains no violating words is the image type with the highest probability among the image types known to the picture classifier, and that highest probability is taken as the third probability value.
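The selection described for step (9) can be sketched as follows. This is a minimal illustration only: the function name `pick_image_type`, the dictionary layout, and the type labels are assumptions made for the example and are not part of the embodiment.

```python
from typing import Dict, Tuple


def pick_image_type(type_probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the image type with the highest probability and that probability.

    `type_probs` maps each image type known to the picture classifier to the
    probability the classifier assigned to it (hypothetical layout).
    """
    if not type_probs:
        raise ValueError("the classifier returned no probabilities")
    best_type = max(type_probs, key=type_probs.get)
    # The highest probability plays the role of the third probability value.
    return best_type, type_probs[best_type]


# Hypothetical classifier output for one advertisement image.
probs = {"brand_style": 0.15, "famous_person": 0.70, "cartoon_character": 0.15}
image_type, third_probability = pick_image_type(probs)
```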
Optionally, in addition to judging whether an advertisement image is violating through steps (1) to (11) above, the advertisement processing method of this embodiment may also perform the following steps (1) to (4) to judge whether an advertisement video is violating:
(1) When an advertisement video to be detected is obtained, extract the video key frames from the advertisement video using a key frame extraction technique;
(2) Delete, from the extracted video key frames, the key frames located at the beginning of the advertisement video and the key frames located at the end, and then extract multiple video key frames to be detected from the remaining key frames at a preset time interval;
(3) Perform a violation judgment on each of the video key frames to be detected to obtain a violation judgment result for each key frame, where the violation judgment results include: a video key frame containing violating words, and a video key frame judged to be a violating advertisement;
(4) When the violation judgment results indicate that any of the video key frames contains violating words or is judged to be a violating advertisement, determine that the advertisement video is a violating advertisement.
In step (1) above, the specific process of extracting video key frames from the advertisement video with a key frame extraction technique is prior art and is not described further here. Each video key frame carries a timestamp.
In step (2) above, the key frames located at the beginning and at the end of the advertisement video are deleted from the extracted key frames according to the time indicated by the timestamp carried in each key frame.
The key frames remaining after the first and last key frames are deleted are sorted in ascending order of the time indicated by their timestamps, and multiple video key frames to be detected are extracted from the sorted key frames at the preset time interval.
In one implementation, the preset time interval may be set to any duration between 2 and 5 seconds.
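A minimal sketch of step (2) is given below. It rests on two assumptions that the embodiment does not state: exactly one key frame is dropped at each end of the video, and "at the preset time interval" is read as keeping key frames whose timestamps are at least one interval apart. The function name `select_key_frames` is hypothetical.

```python
from typing import List, Sequence


def select_key_frames(timestamps: Sequence[float], interval: float = 2.0) -> List[float]:
    """Pick the key-frame timestamps (in seconds) that will be checked.

    Assumption: one opening and one closing key frame are dropped, and the
    remaining frames are sampled so that kept frames are >= `interval` apart.
    """
    ordered = sorted(timestamps)   # ascending timestamp order
    middle = ordered[1:-1]         # drop the first and the last key frame
    selected: List[float] = []
    for ts in middle:
        if not selected or ts - selected[-1] >= interval:
            selected.append(ts)
    return selected


# Hypothetical key-frame timestamps extracted from one advertisement video.
frames_to_check = select_key_frames([0.0, 1.0, 2.0, 3.5, 5.0, 6.0, 9.0], interval=2.0)
```

With fewer than three key frames nothing remains after the first and last are dropped, so the function returns an empty list; how such short videos should be handled is left open here.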
In step (3) above, a video key frame can be treated as an advertisement image.
The process of judging each of the video key frames to be detected is similar to the process of judging whether an advertisement image is violating in steps (1) to (11) above, and is not described further here.
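The way per-frame results are combined in steps (3) and (4) can be sketched as follows, assuming each key frame has already been judged as an advertisement image. The `FrameResult` type and its field names are hypothetical and serve only to illustrate the rule that one violating key frame makes the whole video violating.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameResult:
    """Violation judgment result for one video key frame (hypothetical type)."""
    contains_violating_words: bool
    judged_violating_ad: bool


def video_is_violating(results: Iterable[FrameResult]) -> bool:
    """The video is violating if any checked key frame is violating."""
    return any(r.contains_violating_words or r.judged_violating_ad for r in results)


# Hypothetical results for three key frames; only the second one is violating.
results = [
    FrameResult(False, False),
    FrameResult(False, True),
    FrameResult(False, False),
]
verdict = video_is_violating(results)
```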
In summary, this embodiment provides an advertisement processing method. Model-training negative samples are obtained from violating advertisements and model-training positive samples from normally delivered advertisements, and a text classification algorithm model is trained on the negative and positive samples to obtain an advertisement text classification model. A brand name set and the advertisement text to be detected are obtained, and the words in the advertisement text are looked up in the brand name set. When a brand name matching a word in the advertisement text is found in the brand name set, the advertisement text is judged to be violating advertisement text. The brand name set includes brand names and multi-meaning words, where a multi-meaning word is a word that has both a brand meaning and a non-brand meaning. When a multi-meaning word matching a word in the advertisement text is found in the brand name set, the advertisement text is processed with the advertisement text classification model to obtain a first probability value that the advertisement text is violating advertisement text; when the first probability value is greater than the probability threshold, the advertisement text is judged to be violating advertisement text. Compared with the related art, in which advertisers cannot judge whether an advertisement is violating, the trained advertisement text classification model can judge whether advertisement text is violating before the advertisement is listed. Violations can therefore be judged automatically, the losses caused by an advertisement being taken down for a violation are avoided as far as possible, advertisements that have passed the violation judgment can pass the advertising platform's review smoothly, and advertisement delivery becomes more efficient.
Embodiment 2
This embodiment provides an advertisement processing apparatus for performing the advertisement processing method provided in Embodiment 1 above.
Referring to the schematic structural diagram of an advertisement processing apparatus shown in FIG. 2, the advertisement processing apparatus of this embodiment includes:
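an acquisition module 200, configured to obtain violating advertisements and normally delivered advertisements, obtain model-training negative samples from the violating advertisements, and obtain model-training positive samples from the normally delivered advertisements, where both the negative samples and the positive samples are advertisement text;
a training module 202, configured to train a text classification algorithm model on the model-training negative samples and the model-training positive samples to obtain an advertisement text classification model;
a detection module 204, configured to obtain a brand name set and the advertisement text to be detected, look up the words in the advertisement text in the brand name set, and, when a brand name matching a word in the advertisement text is found in the brand name set, judge the advertisement text to be violating advertisement text, where the brand name set includes brand names and multi-meaning words, and a multi-meaning word is a word that has both a brand meaning and a non-brand meaning;
a processing module 206, configured to, when a multi-meaning word matching a word in the advertisement text is found in the brand name set, process the advertisement text with the advertisement text classification model to obtain a first probability value that the advertisement text is violating advertisement text;
a judgment module 208, configured to judge the advertisement text to be violating advertisement text when the first probability value is greater than a probability threshold.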
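The cooperation of the detection, processing, and judgment modules can be sketched as a single function. This is an illustrative sketch under stated assumptions, not the apparatus itself: the brand name set is split into two Python sets so that unambiguous brand names and multi-meaning words can be told apart, the text is assumed to be already segmented into words, `classify` stands in for the trained advertisement text classification model, and the default threshold of 0.5 is a placeholder.

```python
from typing import Callable, Iterable, Set


def is_violating_text(
    words: Iterable[str],
    brand_names: Set[str],
    ambiguous_words: Set[str],
    classify: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Judge one advertisement text that has already been split into words."""
    words = list(words)
    # Detection module: any unambiguous brand name makes the text violating.
    if any(w in brand_names for w in words):
        return True
    # Processing module: a multi-meaning word defers to the classifier,
    # which returns the first probability value.
    if any(w in ambiguous_words for w in words):
        first_probability = classify(" ".join(words))
        # Judgment module: compare against the probability threshold.
        return first_probability > threshold
    return False


# Hypothetical sets and a stub classifier that always returns 0.9.
flagged = is_violating_text(
    ["apple", "phone"], {"acme"}, {"apple"}, classify=lambda text: 0.9
)
```

Passing the classifier as a callable keeps the lookup logic independent of whichever text classification model is actually trained.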
The advertisement processing apparatus of this embodiment further includes an advertisement image detection module.
The advertisement image detection module is specifically configured to:
when an advertisement image to be detected is obtained, extract the text in the advertisement image using optical character recognition (OCR) and take the extracted text as the text of the advertisement image;
look up the words in the text of the advertisement image in the brand name set, and, when a brand name matching a word in that text is found in the brand name set, determine that the advertisement image contains violating words;
when a multi-meaning word matching a word in the text of the advertisement image is found in the brand name set, process the text of the advertisement image with the advertisement text classification model to obtain a violation probability value that the text of the advertisement image contains violating words;
when the violation probability value is greater than the probability threshold, determine that the advertisement image contains violating words.
The advertisement image detection module is further specifically configured to:
when the advertisement text classification model determines that the advertisement image contains no violating words, obtain pictures bearing brand logos and the picture information of those pictures, where the picture information includes the brand name to which the brand logo in the picture belongs and the position of the brand logo in the picture;
train an object detection model on the pictures bearing brand logos and their picture information to obtain a brand logo detector;
input the advertisement image that contains no violating words into the brand logo detector for processing to obtain a second probability value that the advertisement image contains a brand logo;
obtain violating images of the brand-style image type, the famous-person image type, and the cartoon-character image type, as well as images from normally delivered advertisements, and train an image classification model on these images to obtain a picture classifier;
input the advertisement image that contains no violating words into the picture classifier for processing to obtain the image type of the advertisement image and a third probability value, where the image types include the brand-style image type, the famous-person image type, and the cartoon-character image type;
calculate the violation probability value of the advertisement image that contains no violating words by the following formula:
S = 2*S1*S2/(S1+S2)
where S denotes the violation probability value, S1 denotes the second probability value, and S2 denotes the third probability value;
when the calculated violation probability value is greater than the probability threshold, judge the advertisement image that contains no violating words to be a violating advertisement.
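The formula above is the harmonic mean of S1 and S2, so the combined value is high only when both the brand logo detector and the picture classifier report a high probability. A minimal sketch follows. The function names are hypothetical, the default threshold of 0.5 is a placeholder, and returning 0.0 when S1 + S2 is zero is an assumption added to avoid division by zero, since the formula itself is undefined in that case.

```python
def violation_score(s1: float, s2: float) -> float:
    """Combine the second (s1) and third (s2) probability values.

    Implements S = 2*S1*S2/(S1+S2). Returning 0.0 for s1 + s2 == 0 is an
    assumption made for this sketch.
    """
    if s1 + s2 == 0:
        return 0.0
    return 2 * s1 * s2 / (s1 + s2)


def image_is_violating(s1: float, s2: float, threshold: float = 0.5) -> bool:
    """Judge the image as violating when the combined value exceeds the threshold."""
    return violation_score(s1, s2) > threshold


# One confident signal alone is not enough: 0.9 and 0.1 combine to about 0.18.
low = violation_score(0.9, 0.1)
high = violation_score(0.9, 0.8)
```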
The advertisement processing apparatus of this embodiment further includes an advertisement video detection module. The advertisement video detection module is specifically configured to:
when an advertisement video to be detected is obtained, extract the video key frames from the advertisement video using a key frame extraction technique;
delete, from the extracted video key frames, the key frames located at the beginning of the advertisement video and the key frames located at the end, and then extract multiple video key frames to be detected from the remaining key frames at a preset time interval;
perform a violation judgment on each of the video key frames to be detected to obtain a violation judgment result for each key frame, where the violation judgment results include: a video key frame containing violating words, and a video key frame judged to be a violating advertisement;
when the violation judgment results indicate that any of the video key frames contains violating words or is judged to be a violating advertisement, determine that the advertisement video is a violating advertisement.
In summary, this embodiment provides an advertisement processing apparatus. Model-training negative samples are obtained from violating advertisements and model-training positive samples from normally delivered advertisements, and a text classification algorithm model is trained on the negative and positive samples to obtain an advertisement text classification model. A brand name set and the advertisement text to be detected are obtained, and the words in the advertisement text are looked up in the brand name set. When a brand name matching a word in the advertisement text is found in the brand name set, the advertisement text is judged to be violating advertisement text. The brand name set includes brand names and multi-meaning words, where a multi-meaning word is a word that has both a brand meaning and a non-brand meaning. When a multi-meaning word matching a word in the advertisement text is found in the brand name set, the advertisement text is processed with the advertisement text classification model to obtain a first probability value that the advertisement text is violating advertisement text; when the first probability value is greater than the probability threshold, the advertisement text is judged to be violating advertisement text. Compared with the related art, in which advertisers cannot judge whether an advertisement is violating, the trained advertisement text classification model can judge whether advertisement text is violating before the advertisement is listed. Violations can therefore be judged automatically, the losses caused by an advertisement being taken down for a violation are avoided as far as possible, advertisements that have passed the violation judgment can pass the advertising platform's review smoothly, and advertisement delivery becomes more efficient.
Embodiment 3
This embodiment provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the advertisement processing method described in Embodiment 1 above. For the specific implementation, see method Embodiment 1; details are not repeated here.
In addition, referring to the schematic structural diagram of an electronic device shown in FIG. 3, this embodiment also provides an electronic device that includes a bus 51, a processor 52, a transceiver 53, a bus interface 54, a memory 55, and a user interface 56.
In this embodiment, the electronic device further includes one or more programs that are stored in the memory 55 and can run on the processor 52, and that are configured to be executed by the processor to perform the following steps (1) to (5):
(1) Obtain violating advertisements and normally delivered advertisements, obtain model-training negative samples from the violating advertisements, and obtain model-training positive samples from the normally delivered advertisements, where both the negative samples and the positive samples are advertisement text;
(2) Train a text classification algorithm model on the model-training negative samples and the model-training positive samples to obtain an advertisement text classification model;
(3) Obtain a brand name set and the advertisement text to be detected, look up the words in the advertisement text in the brand name set, and, when a brand name matching a word in the advertisement text is found in the brand name set, judge the advertisement text to be violating advertisement text, where the brand name set includes brand names and multi-meaning words, and a multi-meaning word is a word that has both a brand meaning and a non-brand meaning;
(4) When a multi-meaning word matching a word in the advertisement text is found in the brand name set, process the advertisement text with the advertisement text classification model to obtain a first probability value that the advertisement text is violating advertisement text;
(5) When the first probability value is greater than a probability threshold, judge the advertisement text to be violating advertisement text.
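Steps (1), (2), and (4) can be illustrated with a toy classifier. The text above does not name the text classification algorithm, so the bag-of-words Naive Bayes model below is a stand-in chosen only because it fits in a few lines. Whitespace tokenization, the function names, and the sample texts are likewise assumptions for illustration.

```python
import math
from collections import Counter
from typing import List


def train(violating: List[str], normal: List[str]) -> dict:
    """Fit a tiny Naive Bayes model on violating (negative) and normal (positive) ad texts."""
    if not violating or not normal:
        raise ValueError("both sample sets must be non-empty")
    counts = {"violating": Counter(), "normal": Counter()}
    for text in violating:
        counts["violating"].update(text.lower().split())
    for text in normal:
        counts["normal"].update(text.lower().split())
    total_docs = len(violating) + len(normal)
    return {
        "counts": counts,
        "vocab_size": len(set(counts["violating"]) | set(counts["normal"])) or 1,
        "priors": {
            "violating": len(violating) / total_docs,
            "normal": len(normal) / total_docs,
        },
    }


def violation_probability(model: dict, text: str) -> float:
    """Return P(violating | text), playing the role of the first probability value."""
    log_scores = {}
    for label in ("violating", "normal"):
        counter = model["counts"][label]
        total = sum(counter.values())
        score = math.log(model["priors"][label])
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((counter[word] + 1) / (total + model["vocab_size"]))
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp["violating"] / (exp["violating"] + exp["normal"])


# Hypothetical training texts built around the multi-meaning word "apple".
model = train(
    violating=["buy apple phone cheap", "apple phone discount"],
    normal=["fresh apple juice", "sweet apple pie"],
)
p_phone = violation_probability(model, "apple phone")
p_juice = violation_probability(model, "apple juice")
```

In this toy data the word "apple" appears in both sample sets, so the surrounding words decide the outcome, which mirrors how the classification model is used to disambiguate multi-meaning words.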
The transceiver 53 is configured to receive and send data under the control of the processor 52.
In the bus architecture (represented by the bus 51), the bus 51 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by the processor 52 and memory represented by the memory 55. The bus 51 may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are not described further in this embodiment. The bus interface 54 provides an interface between the bus 51 and the transceiver 53. The transceiver 53 may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. For example, the transceiver 53 receives external data from other devices and sends data processed by the processor 52 to other devices. Depending on the nature of the computing system, a user interface 56 may also be provided, for example a keypad, display, speaker, microphone, or joystick.
The processor 52 is responsible for managing the bus 51 and for general processing, such as running a general-purpose operating system. The memory 55 may be used to store data that the processor 52 uses when performing operations.
Optionally, the processor 52 may be, but is not limited to, a central processing unit, a single-chip microcomputer, a microprocessor, or a programmable logic device.
It can be understood that the memory 55 in the embodiments of the present invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 55 of the systems and methods described in this embodiment is intended to include, but is not limited to, these and any other suitable types of memory.
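In some implementations, the memory 55 stores the following elements, executable modules or data structures, or a subset or an extended set of them: an operating system 551 and application programs 552.
The operating system 551 contains various system programs, such as a framework layer, a core library layer, and a driver layer, and is used to implement various basic services and to handle hardware-based tasks. The application programs 552 contain various applications, such as a media player and a browser, and are used to implement various application services. A program implementing the method of the embodiments of the present invention may be included in the application programs 552.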
In summary, this embodiment provides a computer-readable storage medium and an electronic device. Model-training negative samples are obtained from violating advertisements and model-training positive samples from normally delivered advertisements, and a text classification algorithm model is trained on the negative and positive samples to obtain an advertisement text classification model. A brand name set and the advertisement text to be detected are obtained, and the words in the advertisement text are looked up in the brand name set. When a brand name matching a word in the advertisement text is found in the brand name set, the advertisement text is judged to be violating advertisement text. The brand name set includes brand names and multi-meaning words, where a multi-meaning word is a word that has both a brand meaning and a non-brand meaning. When a multi-meaning word matching a word in the advertisement text is found in the brand name set, the advertisement text is processed with the advertisement text classification model to obtain a first probability value that the advertisement text is violating advertisement text; when the first probability value is greater than the probability threshold, the advertisement text is judged to be violating advertisement text. Compared with the related art, in which advertisers cannot judge whether an advertisement is violating, the trained advertisement text classification model can judge whether advertisement text is violating before the advertisement is listed. Violations can therefore be judged automatically, the losses caused by an advertisement being taken down for a violation are avoided as far as possible, advertisements that have passed the violation judgment can pass the advertising platform's review smoothly, and advertisement delivery becomes more efficient.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210334882.XA (CN114663152A) | 2022-03-31 | 2022-03-31 | An advertisement processing method, device and electronic device |
| Publication Number | Publication Date |
|---|---|
| CN114663152A | 2022-06-24 |
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202210334882.XA (CN114663152A) | Pending | 2022-03-31 | 2022-03-31 | An advertisement processing method, device and electronic device |
| Country | Link |
|---|---|
| CN (1) | CN114663152A (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115470398A (en)* | 2022-07-05 | 2022-12-13 | 飞书深诺数字科技(上海)股份有限公司 | A method, device, and electronic device for detecting adult illegal content in online advertisements |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110852231A (en)* | 2019-11-04 | 2020-02-28 | 云目未来科技(北京)有限公司 | Illegal video detection method and device and storage medium |
| CN112199948A (en)* | 2020-09-28 | 2021-01-08 | 中国互联网金融协会 | Text content identification and illegal advertisement identification method and device and electronic equipment |
| CN113221845A (en)* | 2021-06-07 | 2021-08-06 | 北京猎豹移动科技有限公司 | Advertisement auditing method, device, equipment and storage medium |
| CN114155529A (en)* | 2021-11-05 | 2022-03-08 | 深圳市标准技术研究院 | Illegal advertisement identification method combining character visual features and character content features |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |