CN114357242A - Training evaluation method based on recall model and device, equipment and storage medium - Google Patents

Training evaluation method based on recall model and device, equipment and storage medium

Info

Publication number
CN114357242A
Authority
CN
China
Prior art keywords
video
account
feature
target
recall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111575932.5A
Other languages
Chinese (zh)
Other versions
CN114357242B (en)
Inventor
戴威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111575932.5A (patent CN114357242B)
Publication of CN114357242A
Application granted
Publication of CN114357242B
Legal status: Active
Anticipated expiration


Abstract

The embodiments of the present application disclose a recall model-based training evaluation method and apparatus, an electronic device, and a storage medium, which can be applied to fields such as autonomous driving and intelligent transportation. The method comprises: storing account features and video features, extracted from training video samples during online training of the recall model, in an account feature library and a video feature library, respectively; obtaining a target account feature by offline sampling from the account feature library, and searching the video feature library, for the target account feature, for a video feature set with a related storage time; calculating the matching degree between the target account feature and each target video feature in the video feature set, and selecting a specified rank based on the matching degrees sorted in descending order; and calculating a recall rate based on the number of positive samples associated with the target video features corresponding to the specified rank and the positive sample data associated with the target video features in the video feature set. The scheme saves online machine resources and improves the effect of model parameter iteration.

Description

Translated from Chinese
Training evaluation method based on recall model and device, equipment and storage medium

Technical Field

The present application relates to the field of computer technology and, in particular, to a recall model-based training evaluation method and apparatus, an electronic device, a storage medium, and a program product.

Background

A recommendation system is a system by which an Internet platform automatically selects or matches items on the platform according to user interests and presents them to the user. A recommendation system typically includes a recall model, which selects from a candidate pool a subset that meets the target and the computing-power constraints.

To improve the training effect of a recall model, the recall model needs to be evaluated. At present, recall models are usually evaluated by online A/B testing, but online A/B testing occupies online resources, covers only a small scale, and requires a long observation period.

Summary of the Invention

To solve the above technical problems, the embodiments of the present application provide a recall model-based training evaluation method and apparatus, an electronic device, a storage medium, and a program product.

Other features and advantages of the present application will become apparent from the following detailed description, or will be learned in part through practice of the present application.

According to one aspect of the embodiments of the present application, a recall model-based training evaluation method is provided, comprising:

acquiring the account features and video features extracted from training video samples during online training of the recall model, and storing the account features and the video features in an account feature library and a video feature library, respectively;

obtaining a target account feature by offline sampling from the account feature library, and searching the video feature library, for the target account feature, for a video feature set with a related storage time;

calculating the matching degree between the target account feature and each target video feature contained in the video feature set, and selecting a specified rank based on the matching degrees sorted in descending order;

calculating a recall rate according to the number of positive samples associated with the target video features corresponding to the specified rank and the positive sample data associated with the target video features in the video feature set, the recall rate being used to evaluate the training effect of the recall model.
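The last two steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes the matching degree is an inner product (the text does not fix the measure) and that the recall rate is the number of positive samples within the specified rank divided by the number of positive samples in the searched video feature set. The names `dot`, `recall_at_k`, and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Matching degree, assumed here to be a plain inner product."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(
    account_feature: Vector,
    video_features: Dict[str, Vector],
    positive_ids: Set[str],
    k: int,
) -> float:
    """Share of the account's positive videos that rank within the top k."""
    # Score every target video feature against the sampled account feature.
    scored: List[Tuple[str, float]] = [
        (vid, dot(account_feature, feat)) for vid, feat in video_features.items()
    ]
    # Sort by matching degree, largest first, and keep the specified rank k.
    scored.sort(key=lambda item: item[1], reverse=True)
    top_k = {vid for vid, _ in scored[:k]}
    # Only positives present in the searched video feature set are counted.
    positives_in_set = positive_ids & set(video_features)
    if not positives_in_set:
        return 0.0
    return len(top_k & positives_in_set) / len(positives_in_set)


# Toy data: one sampled account feature and four candidate video features.
account = [1.0, 0.0]
videos = {
    "v1": [0.9, 0.1],
    "v2": [0.1, 0.9],
    "v3": [0.8, 0.3],
    "v4": [-0.5, 0.2],
}
positives = {"v1", "v2"}
print(recall_at_k(account, videos, positives, k=2))
```

Because the computation uses only stored features, it can run entirely offline; raising `k` or sampling more accounts changes the evaluation scale without touching the serving system.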

According to one aspect of the embodiments of the present application, a recall model-based training evaluation apparatus is provided, comprising:

a feature acquisition module, configured to acquire the account features and video features extracted from training video samples during online training of the recall model, and to store the account features and the video features in an account feature library and a video feature library, respectively;

an offline sampling module, configured to obtain a target account feature by offline sampling from the account feature library, and to search the video feature library, for the target account feature, for a video feature set with a related storage time;

a matching degree ranking module, configured to calculate the matching degree between the target account feature and each target video feature contained in the video feature set, and to select a specified rank based on the matching degrees sorted in descending order;

a recall rate calculation module, configured to calculate a recall rate according to the number of positive samples associated with the target video features corresponding to the specified rank and the positive sample data associated with the target video features in the video feature set, the recall rate being used to evaluate the training effect of the recall model.

According to one aspect of the embodiments of the present application, an electronic device is provided, comprising: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the electronic device to implement the recall model-based training evaluation method described above.

According to one aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which computer-readable instructions are stored; when executed by a processor of an electronic device, the computer-readable instructions cause the electronic device to perform the recall model-based training evaluation method described above.

According to one aspect of the embodiments of the present application, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the recall model-based training evaluation method described above.

In the technical solutions provided by the embodiments of the present application, the account features and video features extracted from the training video samples during online training of the recall model are acquired, and the recall model is evaluated based on them, so the evaluation process does not occupy online resources and online machine resources are saved. Moreover, evaluating the recall model during training improves the timeliness of evaluation and the efficiency of model parameter iteration.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic diagram of an implementation environment involved in the present application;

FIG. 2 is a flowchart of a recall model-based training evaluation method according to an exemplary embodiment of the present application;

FIG. 3 is a flowchart of step S130 of the embodiment shown in FIG. 2, according to an exemplary embodiment;

FIG. 4 is a flowchart of a recall model-based training evaluation method according to another exemplary embodiment of the present application;

FIG. 5 is a flowchart of the training and evaluation of a recall model according to an exemplary embodiment of the present application;

FIG. 6 is a schematic structural diagram of a recall model-based training evaluation apparatus according to an exemplary embodiment of the present application;

FIG. 7 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

Detailed Description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application, as detailed in the appended claims.

The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.

The flowcharts shown in the drawings are only illustrative; they need not include all contents and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps may be decomposed and others may be combined or partially combined, so the actual order of execution may change according to the actual situation.

It should also be noted that "a plurality of" in the present application means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

Before the technical solutions of the embodiments of the present application are introduced, the nouns and terms involved in the embodiments are explained; the following explanations apply to them.

Recommendation system: a system by which an Internet platform automatically selects or matches items on the platform according to user interests and presents them to the user. Because a user's most recent behavior expresses stronger interests or trends, recommendation systems now widely apply real-time techniques to improve timeliness: data is transmitted as streams, models are trained at minute-level intervals and then exported and deployed, and new models are updated at minute-level intervals for online inference, so that the system can respond promptly to, and reason about, the behavior of individual users and user groups. Owing to limits on the computing power of the recommendation system and on the latency of the online system, a recommendation system may adopt a funnel structure of recall, coarse ranking (optional), fine ranking, and strategy (mixed ranking).

Recall: selecting from the entire candidate pool a subset that meets the target and the computing-power constraints.

Online A/B test: before a new feature is fully launched, online traffic is split, and a small portion of the split traffic is used to test the new feature and evaluate its effect.
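The traffic split in an online A/B test can be illustrated with a short sketch. This is one common way to split traffic, assumed here for illustration only; the text does not describe how the split is made. The function `assign_bucket`, the experiment name, and the 10% share are hypothetical.

```python
import hashlib


def assign_bucket(account_id: str, experiment: str, treatment_share: float) -> str:
    """Deterministically map an account to 'treatment' or 'control'."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be within [0, 1]")
    digest = hashlib.sha256(f"{experiment}:{account_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a position in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "treatment" if position < treatment_share else "control"


# Route roughly 10% of accounts to the new feature; the rest stay on the old one.
accounts = [f"account-{i}" for i in range(10_000)]
buckets = [assign_bucket(a, "new-recall-model", 0.1) for a in accounts]
share = buckets.count("treatment") / len(buckets)
print(f"treatment share: {share:.3f}")
```

Hashing keeps each account in the same bucket across requests, but the treatment group still consumes online serving resources, which is the cost the offline evaluation described in this application avoids.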

Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system is a storage system that uses functions such as cluster applications, grid technology, and distributed file systems to bring together, through application software or application interfaces, a large number of storage devices of different types in a network (storage devices are also called storage nodes) so that they work cooperatively and jointly provide data storage and service access functions to the outside.

Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.

Machine learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

At present, a distributed cloud storage system stores data as follows. Logical volumes are created, and when each logical volume is created it is allocated physical storage space, which may consist of the disks of one or several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into many parts, each of which is an object; an object contains not only data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage location information of each object.

The distributed cloud storage system allocates physical storage space to a logical volume as follows: according to an estimate of the capacity of the objects to be stored in the logical volume (the estimate usually leaves a large margin over the capacity actually needed) and the RAID (Redundant Array of Independent Disks) group, the physical storage space is divided into stripes in advance, and one logical volume can be understood as one stripe, so that physical storage space is allocated to the logical volume.

To improve the training effect of a recall model, the recall model needs to be evaluated. At present, recall models are usually evaluated by online A/B testing. However, online A/B testing requires an online environment and occupies online machine resources such as memory and external storage; moreover, the traffic used for online testing is generally below 20%, so the scale is small. For this reason, the embodiments of the present application provide a recall model-based training evaluation method and apparatus, an electronic device, a storage medium, and a program product, such that the evaluation of the recall model does not occupy online resources and the evaluation scale can be adjusted adaptively.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an implementation environment involved in the present application. The implementation environment includes a recall model-based training evaluation apparatus 100, a recall model 200, and an online training apparatus 300.

The training evaluation apparatus 100 may be a server or another device. The server may be a server that provides various services: an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. No limitation is imposed here.

The online training apparatus 300 may also be a server or another device.

While the online training apparatus 300 trains the recall model 200 online, the recall model extracts account features and video features based on the training video samples. The training evaluation apparatus 100 can acquire the account features and video features extracted from the training video samples during online training of the recall model and store them in the account feature library and the video feature library, respectively; obtain a target account feature by offline sampling from the account feature library, and search the video feature library, for the target account feature, for a video feature set with a related storage time; calculate the matching degree between the target account feature and each target video feature contained in the video feature set, and select a specified rank based on the matching degrees sorted in descending order; and calculate a recall rate according to the number of positive samples associated with the target video features corresponding to the specified rank and the positive sample data associated with the target video features in the video feature set, the recall rate being used to evaluate the training effect of the recall model.

In this way, the recall model is evaluated using the account features and video features extracted from the training video samples during its online training, so the evaluation process does not occupy online resources and online machine resources are saved. Evaluating the recall model during training also improves the timeliness of evaluation and the efficiency of model parameter iteration. In addition, the data used in evaluation is the same as the data used in training, which reduces bias and improves evaluation accuracy.

The account model is a model created based on machine learning. The account feature library and the video feature library may be stored in a storage system, which may be based on cloud storage technology or may be another type of storage system.

Referring to FIG. 2, FIG. 2 is a flowchart of a recall model-based training evaluation method according to an exemplary embodiment of the present application. The method can be applied to the implementation environment shown in FIG. 1 and can be executed by the recall model-based training evaluation apparatus 100 in that environment.

Once the recall model satisfies certain conditions, it can be deployed online to serve user terminals, which include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, and vehicle-mounted terminals.

It should be noted that, in addition to the application scenarios mentioned above, the embodiments of the present application can be applied to various other scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, and assisted driving, and can be adjusted according to the specific scenario in practice. For example, in a cloud technology scenario, the recall model can be deployed in the cloud, and the account feature library and the video feature library can also be stored using cloud storage technology; in an intelligent transportation or assisted driving scenario, the recall model can be deployed on a vehicle-mounted terminal, a navigation terminal, or the like, for navigation, assisted driving, and so on.

As shown in FIG. 2, in an exemplary embodiment, the recall model-based training evaluation method may include steps S110 to S140, described in detail as follows:

Step S110: acquire the account features and video features extracted from training video samples during online training of the recall model, and store the account features and the video features in the account feature library and the video feature library, respectively.

It should be noted that, in this embodiment, the recall model is applied to a video recall scenario and is used to select from the candidate pool a subset that meets the target and the computing-power constraints. The recall model may be a model built on a DNN (Deep Neural Network) or on another machine learning network. The candidate pool can be set flexibly according to actual needs; for example, it may include, but is not limited to, the videos on a video platform.

The training video samples are the videos used to train the recall model; they may include positive samples and may also include negative samples. The training video samples may be determined from online real-time messages. For example, real-time messages are generated while a video platform is running, and account data can be obtained from them, including but not limited to account attribute information and account behavior information. Account attribute information includes but is not limited to the age and gender of the user to whom the account belongs; account behavior information includes but is not limited to clicking, watching, liking, commenting, and forwarding. Based on the account data, the account features of the account can be determined, and positive and negative samples can be constructed from the account features. Positive samples may include videos that the account has watched for at least a certain duration (for example, 10 minutes or 3 minutes), videos the account has forwarded, and videos the account has liked. Negative samples may include videos the account has not watched and videos the account has blocked; alternatively, videos may be selected at random from the candidate pool as negative samples.
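The sample construction described above can be sketched as follows. This is only an illustration under stated assumptions: the `Behavior` record, the `build_samples` function, the 180-second threshold (the 3-minute example from the text), and drawing the random negatives from videos the account has not interacted with are all hypothetical choices, not details given in the text.

```python
import random
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Behavior:
    """One account behavior record taken from a real-time message (hypothetical)."""
    video_id: str
    watch_seconds: float = 0.0
    liked: bool = False
    forwarded: bool = False
    blocked: bool = False


def build_samples(
    behaviors: List[Behavior],
    candidate_pool: List[str],
    min_watch_seconds: float = 180.0,
    num_random_negatives: int = 2,
    seed: int = 0,
) -> Tuple[Set[str], Set[str]]:
    """Return (positive video IDs, negative video IDs) for one account."""
    positives: Set[str] = set()
    negatives: Set[str] = set()
    seen: Set[str] = set()
    for b in behaviors:
        seen.add(b.video_id)
        if b.blocked:
            # Blocked videos are treated as negatives.
            negatives.add(b.video_id)
        elif b.watch_seconds >= min_watch_seconds or b.liked or b.forwarded:
            # Long watches, likes, and forwards are treated as positives.
            positives.add(b.video_id)
    # Add random negatives drawn from videos the account has not interacted with.
    unseen = sorted(v for v in candidate_pool if v not in seen)
    rng = random.Random(seed)
    negatives.update(rng.sample(unseen, min(num_random_negatives, len(unseen))))
    return positives, negatives


behaviors = [
    Behavior("v1", watch_seconds=600),
    Behavior("v2", liked=True),
    Behavior("v3", blocked=True),
    Behavior("v4", watch_seconds=5),
]
pool = [f"v{i}" for i in range(1, 9)]
positives, negatives = build_samples(behaviors, pool)
print(sorted(positives), sorted(negatives))
```

A short watch such as `v4` lands in neither set in this sketch; whether such weak signals should count as negatives is a design choice the text leaves open.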

账号特征库用于存储账号特征,其具体类型可以根据实际需要灵活设置,例如,在一个示例中,由于账号特征是实时更新的,因此,账号特征库的类型包括但不限于实时表,实时表是指内容实时进行更新的表文件,以满足召回模型在线上训练过程中实时提取得到的账号特征的需求。The account feature database is used to store account features, and its specific type can be flexibly set according to actual needs. For example, in an example, since account features are updated in real time, the types of account feature databases include but are not limited to real-time tables, real-time tables Refers to the table file whose content is updated in real time to meet the needs of the account features extracted in real time during the online training process of the recall model.

The video feature library stores the video features of the videos in the candidate pool, and its specific type can be set flexibly according to actual needs. In one example, because the candidate pool contains a large number of videos, typically on the order of millions to hundreds of millions, the type of the video feature library includes but is not limited to a distributed file system, in order to reduce storage pressure. In a distributed file system (DFS), the physical storage resources managed by the file system are not necessarily attached directly to the local node; they may be connected to the node over a computer network, or the file system may be a complete hierarchical file system formed by combining several different logical disk partitions or volumes. Such a system both reduces single-point storage pressure and supports storing video features by time point.

When the recall model is trained online, it extracts account features from the training video samples and performs feature extraction on the videos in the candidate pool to obtain the video feature of each video in the candidate pool.

During online training, the recall model can process the input training video samples associated with an account and extract the account features of that account. The recall model also processes the videos in the candidate pool and extracts the video feature of each video, so that videos corresponding to an account feature can later be recalled based on the account features and the video features. To evaluate the recall model, this embodiment obtains the account features and video features that the recall model extracts from the training video samples during online training, stores the account features in the account feature library, and stores the video features in the video feature library.

In some implementations, the candidate pool is continuously updated, so different versions of the candidate pool exist, and the recall model can perform feature extraction on the videos in each version. A recall usually retrieves qualifying videos from one particular version of the candidate pool. Therefore, when obtaining the video features produced during online training of the recall model, the video features can be obtained per candidate pool, that is, as the video features of the videos contained in that pool.

Step S120: obtain a target account feature by offline sampling from the account feature library, and, for the target account feature, search the video feature library for a video feature set related to the storage time.

The account feature library stores account features of different accounts obtained at different times, so the amount of data is large. Therefore, in this embodiment, offline sampling can be performed on the account feature library to obtain the target account feature; the offline sampling method can be set flexibly according to actual needs.

The videos in the candidate pool are continuously updated, so different versions of the candidate pool exist, and the recall model performs feature extraction on the videos in each version. Correspondingly, the video feature library stores the video features corresponding to the different versions of the candidate pool. For example, if the data in candidate pool 1 is updated at some moment, the updated pool is denoted candidate pool 2, and the video feature library stores both the video features corresponding to candidate pool 1 and those corresponding to candidate pool 2. The video features corresponding to a candidate pool are the video features of the videos contained in that pool.

The storage time of a video feature may be the time at which the video feature was stored in the video feature library, or the time at which the video feature was obtained. Likewise, the storage time of an account feature may be the time at which the account feature was stored in the account feature library, or the time at which the account feature was obtained.

In this embodiment, after the target account feature is obtained by offline sampling from the account feature library, the video features corresponding to the storage time of the target account feature are obtained from the video feature library to form the video feature set. For example, the video features whose storage time is earlier than the storage time of the target account feature may be obtained.

Step S130: calculate the matching degree between the target account feature and each target video feature contained in the video feature set, and select the specified ranking based on the matching degree values sorted in descending order.

The specified ranking can be set flexibly according to actual needs, for example, the top 100 or the top 500.

In this embodiment, the matching degree between the target account feature and each target video feature contained in the video feature set is calculated, and the target video features corresponding to the specified ranking are selected according to the matching degree values sorted in descending order.
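Selecting the specified ranking from a set of matching degrees can be sketched as below. The helper name `top_k` and the example scores are illustrative assumptions; `heapq.nlargest` is used here only because it avoids sorting the full candidate set when the ranking is much smaller than the set.

```python
import heapq


def top_k(scores, k):
    """Return the ids of the k highest-scoring videos, highest score first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [video_id for video_id, _ in best]


# Hypothetical matching degrees between one target account feature and five videos.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1, "e": 0.6}
print(top_k(scores, 3))  # ['a', 'b', 'e']
```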

Step S140: calculate the recall rate from the number of positive samples associated with the target video features corresponding to the specified ranking and the number of positive samples associated with the target video features in the video feature set; the recall rate is used to evaluate the training effect of the recall model.

The number of positive samples associated with the target video features corresponding to the specified ranking is the number of positive samples among the videos associated with those target video features. The positive sample data associated with the target video features in the video feature set is the number of positive samples among the videos associated with the target video features in the video feature set.

The recall rate may be the ratio of the number of positive samples within the specified ranking to the number of positive samples in the whole video feature set. In one example, suppose the video feature set contains the features of videos a, b, c, d, and e, the positive samples are videos a and c, and the specified ranking is 3. If the features sorted by matching degree in descending order are those of video a, video b, video e, video c, and video d, then the top 3 contain only one positive sample, video a. The number of positive samples associated with the target video features corresponding to the specified ranking is therefore 1 (video a), the number of positive samples associated with the target video features in the video feature set is 2 (videos a and c), and the recall rate is 1/2 = 0.5.
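The worked example above can be reproduced with a few lines of code. This is a minimal sketch: the function name `recall_at_k` is an assumption, as is the choice to return 0.0 when the video feature set contains no positive samples, a case the description does not address.

```python
def recall_at_k(ranked_video_ids, positive_ids, k):
    """Positives found in the top k divided by positives in the whole candidate set."""
    positives = set(positive_ids)
    total = len(positives & set(ranked_video_ids))
    if total == 0:
        return 0.0  # assumption: report 0 when the set holds no positives
    hits = sum(1 for video_id in ranked_video_ids[:k] if video_id in positives)
    return hits / total


# Videos sorted by matching degree in descending order; positives are a and c.
ranked = ["a", "b", "e", "c", "d"]
print(recall_at_k(ranked, {"a", "c"}, k=3))  # 0.5
```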

Because both the account features and the video features are obtained from the online training process of the recall model, they reflect the training effect of the recall model to a certain extent. The recall rate calculated from the number of positive samples associated with the target video features corresponding to the specified ranking and the number of positive samples associated with the target video features in the video feature set can therefore be used to evaluate the training effect of the recall model.

In this embodiment, the account features and video features that the recall model extracts from the training video samples during online training are obtained and stored in the account feature library and the video feature library, respectively. A target account feature is obtained by offline sampling from the account feature library, and the video feature library is searched for the video feature set related to the storage time of the target account feature. The matching degree between the target account feature and each target video feature contained in the video feature set is calculated, and the specified ranking is selected based on the matching degree values sorted in descending order. The recall rate is then calculated from the number of positive samples associated with the target video features corresponding to the specified ranking and the number of positive samples associated with the target video features in the video feature set, and is used to evaluate the training effect of the recall model. Because the evaluation runs offline, it does not occupy online resources, which saves online machine resources. Moreover, evaluating the recall model while it is being trained makes the evaluation more timely and improves the efficiency of iterating on the model parameters.

In an exemplary embodiment, because a time parameter is needed when determining the video feature set, the training evaluation method based on the recall model may further include: while storing the account features and the video features in the account feature library and the video feature library, respectively, also recording the acquisition time of each account feature in the account feature library and the acquisition time of each video feature in the video feature library. This makes it possible to later search the video feature library for the video feature set related to the acquisition time of the target account feature.

Referring to FIG. 3, FIG. 3 is a flowchart of step S130 of the embodiment shown in FIG. 2 in an exemplary embodiment, under the condition that the acquisition times of the account features and of the video features are recorded in the account feature library and the video feature library, respectively. As shown in FIG. 3, the process of obtaining the target account feature by offline sampling from the account feature library and searching the video feature library for the video feature set related to the storage time may include steps S131 to S132, described in detail as follows:

Step S131: sample the target account feature offline from the account feature library, and determine the acquisition time corresponding to the target account feature.

To select the video feature set corresponding to the target account feature, in this embodiment the acquisition time corresponding to the target account feature can be obtained while the target account feature is sampled offline from the account feature library.

Step S132: search the video feature library for the video features whose acquisition time is earlier than, and closest to, the acquisition time of the target account feature, and use the video features found as the target video features in the video feature set.

When the recall model recalls videos from the candidate pool based on an account feature, it can only recall from a candidate pool whose time is earlier than the generation time of that account feature. Therefore, to improve the accuracy of the offline evaluation, in this embodiment, after the target account feature is sampled offline from the account feature library and its acquisition time is determined, the video feature library is searched for the video features whose acquisition time is earlier than, and closest to, the acquisition time of the target account feature, and the video features found are used as the target video features in the video feature set. For example, suppose the acquisition time of the target account feature is 12:10:05, and the video feature library contains the video features corresponding to candidate pool 1, candidate pool 2, and candidate pool 3, acquired at 12:03:06, 12:06:06, and 12:20:06, respectively. The video features corresponding to candidate pool 2 are then used as the target video features in the video feature set.
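The time-based lookup in this example can be sketched with a binary search over the snapshot acquisition times. The function name `select_snapshot` and the pool labels are illustrative assumptions; the sketch also assumes the snapshots are kept sorted by acquisition time and that "earlier" means strictly earlier.

```python
from bisect import bisect_left
from datetime import time


def select_snapshot(snapshots, account_time):
    """snapshots: list of (acquisition_time, pool_name), sorted by acquisition_time.

    Return the pool acquired strictly before account_time and closest to it,
    or None when no snapshot is early enough.
    """
    times = [acquired for acquired, _ in snapshots]
    earlier = bisect_left(times, account_time)  # count of strictly earlier snapshots
    return snapshots[earlier - 1][1] if earlier > 0 else None


snapshots = [
    (time(12, 3, 6), "candidate_pool_1"),
    (time(12, 6, 6), "candidate_pool_2"),
    (time(12, 20, 6), "candidate_pool_3"),
]
print(select_snapshot(snapshots, time(12, 10, 5)))  # candidate_pool_2
```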

In some implementations, there are multiple target account features, and their acquisition times may differ. In that case, for each target account feature, the corresponding video feature set can be determined from the acquisition time of that target account feature; the recall rate for that target account feature is then calculated from the number of positive samples among the videos associated with the corresponding video feature set and the number of positive samples among the videos associated with the target video features in the specified ranking. The recall rates of the multiple target account features are then averaged to obtain the final recall rate. Alternatively, the earliest acquisition time among the multiple target account features can be determined and a single video feature set determined from it; then, for each target account feature, the recall rate is calculated from the number of positive samples among the videos associated with that video feature set and the number of positive samples among the videos associated with the target video features in the specified ranking, and the recall rates of the multiple target account features are averaged to obtain the final recall rate. Other processing methods are also possible and are not limited here.
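Averaging per-account recall rates, as in the first option above, might look like the following sketch. The function name `mean_recall` and the input layout are assumptions, and skipping accounts whose video feature set contains no positive samples is one possible policy that the description does not prescribe.

```python
from statistics import mean


def mean_recall(per_account, k):
    """per_account: list of (ranked_video_ids, positive_ids), one entry per account.

    Each ranked list is already sorted by matching degree in descending order.
    """
    recalls = []
    for ranked, positives in per_account:
        positives = set(positives)
        total = len(positives & set(ranked))
        if total == 0:
            continue  # assumption: skip accounts with no positives in their set
        hits = len(positives & set(ranked[:k]))
        recalls.append(hits / total)
    return mean(recalls) if recalls else 0.0


per_account = [
    (["a", "b", "e", "c", "d"], {"a", "c"}),  # recall 0.5 at k = 3
    (["x", "y", "z"], {"x"}),                 # recall 1.0 at k = 3
]
print(mean_recall(per_account, k=3))  # 0.75
```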

In this embodiment, the target account feature is sampled offline from the account feature library and its acquisition time is determined; the video feature library is searched for the video features whose acquisition time is earlier than, and closest to, the acquisition time of the target account feature; and the video features found are used as the target video features in the video feature set, which improves the accuracy of the evaluation.

In an exemplary embodiment, in step S130 of the embodiment shown in FIG. 2, the process of sampling the target account feature offline from the account feature library and determining its acquisition time may include: periodically sampling a specified number of target account features offline from the account feature library at a preset time interval, where the preset time interval is longer than the interval at which the recall model extracts account features and video features during online training.

The preset time interval is the interval at which the recall model is evaluated; that is, at regular intervals, target account features are obtained and the recall model is evaluated once based on them. The specific value of the preset time interval can be set flexibly according to actual needs, for example, 1 hour or 2 hours. The specified number is the number of target account features sampled in each time interval, and its specific value can also be set flexibly according to actual needs, for example, 100,000 or 50,000.

During online training, the recall model extracts account features from newly added training samples at regular intervals. Because the candidate pool is continuously updated, the recall model can also perform feature extraction on the videos contained in a new version of the candidate pool at regular intervals. For example, the recall model can extract account features at minute-level intervals and extract features of the videos in the candidate pool at minute-level intervals, where minute-level means that the time between two extractions is measured in minutes (less than 1 hour, for example, 1 minute or 2 minutes).

To avoid evaluating the recall model multiple times on the same account features and video features, in this embodiment the preset time interval is longer than the interval at which the recall model extracts account features and video features during online training.

In this embodiment, a specified number of target account features are periodically sampled offline from the account feature library at the preset time interval, so that the recall model is evaluated once per preset time interval. This keeps the evaluation close to real time and makes it easy to compare the recall model across different time periods.
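One way to draw a fixed-size sample for a single evaluation interval is sketched below. The record layout `(account_id, acquisition_time, feature_vector)`, the use of seconds for timestamps, and the function name `sample_target_accounts` are assumptions made for illustration; uniform random sampling is likewise only one of the sampling methods the description allows.

```python
import random


def sample_target_accounts(records, window_start, window_end, n, seed=None):
    """Uniformly sample up to n records acquired in [window_start, window_end)."""
    in_window = [r for r in records if window_start <= r[1] < window_end]
    rng = random.Random(seed)  # a seed makes an evaluation run reproducible
    return rng.sample(in_window, min(n, len(in_window)))


# Hypothetical library: one account feature per minute for two hours.
records = [(f"acct_{i}", i * 60.0, [0.0]) for i in range(120)]
batch = sample_target_accounts(records, 0.0, 3600.0, n=10, seed=0)
print(len(batch))  # 10
```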

In an exemplary embodiment, referring to FIG. 4, after the recall rate is calculated from the number of positive samples associated with the target video features corresponding to the specified ranking and the number of positive samples associated with the target video features in the video feature set, the training evaluation method based on the recall model may further include steps S210 to S220, described in detail as follows:

Step S210: obtain the recall rates calculated in different preset time intervals.

In this embodiment, the recall model is evaluated once per preset time interval to obtain a recall rate, so the recall rates calculated in different preset time intervals can be obtained. For example, the recall rates calculated in 3 different preset time intervals, or in 4 different preset time intervals, may be obtained.

Step S220: compare the values of the obtained recall rates, and select the recall model version corresponding to the largest recall rate as the best-trained recall model to be applied to the information recommendation system.

After the recall rates calculated in different preset time intervals are obtained, their values are compared to determine the largest one, that is, the maximum recall rate. Because a larger recall rate indicates a better-performing recall model, the version of the recall model corresponding to the maximum recall rate can be applied to the information recommendation system as the best-trained recall model.

In this embodiment, the recall rates calculated in different preset time intervals are obtained and compared, and the recall model version corresponding to the largest recall rate is selected as the best-trained recall model and applied to the information recommendation system, which improves the iteration efficiency of the recall model.
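Picking the model version with the largest recall rate reduces to an argmax over the stored values. In this sketch the version labels and recall values are made up for illustration, and `best_version` is a hypothetical helper name.

```python
def best_version(recall_by_version):
    """Return the model version whose recall rate is largest."""
    if not recall_by_version:
        raise ValueError("no recall rates to compare")
    return max(recall_by_version, key=recall_by_version.get)


# Hypothetical recall rates from three evaluation intervals.
recall_by_version = {"interval_1": 0.31, "interval_2": 0.42, "interval_3": 0.38}
print(best_version(recall_by_version))  # interval_2
```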

In an exemplary embodiment, in step S130 of the embodiment shown in FIG. 2, the process of calculating the matching degree between the target account feature and each target video feature contained in the video feature set may further include: performing a vector inner product operation on the target account feature and each target video feature, and using the result of the operation as the matching degree between the target account feature and the corresponding target video feature.

When the recall model extracts account features and video features, it can map the account feature data and the video feature data into the same space, yielding account features and video features in vector form. To determine the matching degree between the target account feature and a target video feature, in this embodiment a vector inner product operation can be performed on the target account feature and each target video feature, and the result is used as the matching degree between the target account feature and the corresponding target video feature.

In this embodiment, the matching degree between the target account feature and each target video feature is determined by a vector inner product operation, which can improve both calculation speed and accuracy.
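The inner-product matching degree can be illustrated in plain Python as follows. The three-dimensional vectors and video ids are made-up values; a production system would more likely use a vectorized library, which is not shown here.

```python
def inner_product(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(u, v))


# Hypothetical account and video features mapped into the same 3-dimensional space.
account_feature = [0.5, 1.0, -2.0]
video_features = {"v1": [1.0, 0.0, 0.0], "v2": [0.0, 2.0, 0.5]}

matching = {vid: inner_product(account_feature, vec) for vid, vec in video_features.items()}
print(matching)  # {'v1': 0.5, 'v2': 1.0}
```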

A specific application scenario of the embodiments of this application is described in detail below. Referring to FIG. 5, online training or evaluation of the recall model may include the following processes:

Real-time messages: a video platform generates real-time messages while it runs. To train the information recommendation model online, this embodiment obtains these real-time messages.

Real-time data processing: after the real-time messages are obtained, they are processed to extract user data, which includes but is not limited to user attribute information and user behavior information. User attribute information includes but is not limited to age and gender; user behavior information includes but is not limited to clicking, watching, liking, commenting, and forwarding.

Pulling and concatenating features: after the real-time messages are processed into user data, in this embodiment features can be extracted from the user data and concatenated.

Positive and negative sample construction: after the features are extracted and concatenated, positive and negative samples can be constructed from them. The construction method can be set flexibly according to actual needs; for example, positive samples may include videos the user watched for at least a certain duration and videos the user forwarded.

Offline sample center: the offline sample center can randomly select videos from the candidate pool as negative samples and supply those negative samples as input.

Online training of the recall model: after the positive and negative samples are constructed, the recall model can be trained online on them.

User-tower DNN extracts account features: the recall model contains a user-tower DNN module. During online training of the recall model, the user-tower DNN module performs feature extraction on the positive and negative samples to obtain account features in vector form. The user-tower DNN module can perform feature extraction on newly added positive and negative samples at regular intervals, and the interval can be at the minute level.

Video-tower DNN extracts video features: the recall model contains a video-tower DNN, which can perform feature extraction on the videos contained in the candidate pool to obtain video features in vector form. Because the videos in the candidate pool change dynamically, the video-tower DNN can perform feature extraction on the videos in each new version of the candidate pool at regular intervals, and the interval can be at the minute level.

Online index update: after the video features of the videos in the candidate pool are obtained, they can be imported into a search library such as Faiss, a library for clustering and similarity search open-sourced by the Facebook AI team.

Online service: the online service searches the search library for videos that match the account features.

Storing account features: after the user-tower DNN module performs feature extraction on the positive and negative samples and obtains the account features, in this embodiment the account features are stored in the account feature library and their acquisition time is recorded.

Storing video features: after the video-tower DNN performs feature extraction on the videos contained in the candidate pool and obtains the video features, the video features are stored in the video feature library, and the acquisition time of the video features corresponding to each version of the candidate pool is recorded.

Sampling and comparison: at every preset time interval, target account features are obtained by offline sampling from the account features in the account feature library that correspond to that interval, and the acquisition time of each target account feature is determined. The preset time interval and the sample size can be set arbitrarily; for example, with a preset time interval of 1 hour and a sample size of 100,000, 100,000 target account features are sampled every hour. Then, for each target account feature, the video feature library is searched for the video features of the candidate pool whose acquisition time is earlier than, and closest to, the acquisition time of that target account feature, and those video features are used as the video feature set of the target account feature. The matching degree between the target account feature and each target video feature contained in the corresponding video feature set is calculated, and the top K target video features are selected based on the matching degree values sorted in descending order. The value of K can be set flexibly according to actual needs; for example, it can be determined from how much the recall metric fluctuates in A/A experiments computed over multiple random samplings.

Calculating the recall rate: determine the number of positive samples among the videos corresponding to the top K target video features; determine the number of positive samples among the videos corresponding to the video feature set; and take the ratio of the former to the latter as the recall rate. The recall rates of all target account features selected within the time interval are then averaged to obtain the recall rate for that time interval.
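The ratio and the averaging described above can be written out as a short sketch. The function names and the handling of an account with no positive samples (its recall is taken as 0) are assumptions of this example, not details given in the present application.

```python
from typing import Iterable, List, Set


def recall_at_k(ranked_video_ids: List[str], positive_ids: Set[str], k: int) -> float:
    """Positives among the top-K videos divided by positives in the whole set.

    ranked_video_ids is the video feature set ordered by descending matching degree.
    """
    positives_in_set = sum(1 for vid in ranked_video_ids if vid in positive_ids)
    if positives_in_set == 0:
        return 0.0  # assumption: an account without positives contributes 0
    positives_in_top_k = sum(1 for vid in ranked_video_ids[:k] if vid in positive_ids)
    return positives_in_top_k / positives_in_set


def interval_recall(per_account_recalls: Iterable[float]) -> float:
    """Average the per-account recall rates of one time interval."""
    values = list(per_account_recalls)
    return sum(values) / len(values) if values else 0.0
```

For instance, with four ranked videos of which the first and last are positive samples, K = 2 gives a recall rate of 1/2 for that account.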

Storing the recall rate: after the recall rate of each time interval is obtained, the recall rate, the name of the recall model, the time interval, the number of target account features sampled within the time interval, and the training video samples are stored, for example in a table or in a distributed file system.
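A stored record of this kind might look like the sketch below, which appends one JSON line per time interval to a local file. The record layout, the field names, and the choice of a JSON-lines file in place of a table or distributed file system are all assumptions made for illustration; `training_samples_ref` is a hypothetical pointer to where the training video samples are kept.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class RecallRecord:
    """Hypothetical layout of one stored recall result."""

    model_name: str
    interval_start: str        # e.g. an ISO 8601 timestamp
    interval_end: str
    recall: float
    sampled_accounts: int      # number of target account features sampled
    training_samples_ref: str  # assumed reference to the training video samples


def append_record(path: Path, record: RecallRecord) -> None:
    """Append one record as a JSON line."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(record)) + "\n")


def load_records(path: Path) -> List[RecallRecord]:
    """Read back all stored records."""
    with path.open(encoding="utf-8") as handle:
        return [RecallRecord(**json.loads(line)) for line in handle if line.strip()]
```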

Model comparison: based on the stored recall rates, the recall rates of recall model versions corresponding to different time intervals, and of different recall models, may be compared, so that the best-performing recall model is selected to go online and iteration efficiency is improved. In addition, the recall rates of a recall model at different time intervals may be aggregated.
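The comparison and aggregation can be illustrated with a small sketch. The present application only states that recall rates may be aggregated; taking the arithmetic mean over time intervals is an assumption of this example, as are the function names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_recall(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the per-interval recall rates of each model (assumed aggregation)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for model_name, recall in records:
        grouped[model_name].append(recall)
    return {name: mean(values) for name, values in grouped.items()}


def best_model(records: Iterable[Tuple[str, float]]) -> str:
    """Return the model whose aggregated recall rate is highest."""
    averages = aggregate_recall(records)
    return max(averages, key=averages.get)
```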

Referring to FIG. 6, FIG. 6 is a block diagram of a training evaluation apparatus based on a recall model according to an exemplary embodiment of the present application. As shown in FIG. 6, the apparatus includes:

a feature acquisition module 610, configured to acquire the account features and video features extracted from training video samples during online training of the recall model, and to store the account features and the video features in an account feature library and a video feature library, respectively;

an offline sampling module 620, configured to obtain a target account feature by offline sampling from the account feature library, and, for the target account feature, to search the video feature library for a video feature set related to it by storage time;

a matching degree ranking module 630, configured to calculate the matching degree between the target account feature and each target video feature contained in the video feature set, and to select a specified rank based on the descending order of the matching degree values;

a recall rate calculation module 640, configured to calculate a recall rate from the number of positive samples associated with the target video features corresponding to the specified rank and the positive sample data associated with the target video features in the video feature set, the recall rate being used to evaluate the training effect of the recall model.

In another exemplary embodiment, the feature acquisition module 610 is further configured to, while storing the account features and the video features in the account feature library and the video feature library respectively, also record the acquisition time of the account features and the acquisition time of the video features in the account feature library and the video feature library, respectively.

In another exemplary embodiment, the offline sampling module 620 includes:

a feature sampling unit, configured to sample the target account feature offline from the account feature library and to determine the acquisition time corresponding to the target account feature;

a feature search unit, configured to search the video feature library for the video features whose acquisition time is earlier than, and closest to, the acquisition time of the target account feature, and to use the video features found as the target video features in the video feature set.
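The search performed by the feature search unit amounts to finding the latest entry that is strictly earlier than a given time. A minimal sketch using binary search over sorted acquisition times follows; the function name and the representation of times as floats are assumptions of this example.

```python
from bisect import bisect_left
from typing import List, Optional


def latest_snapshot_before(snapshot_times: List[float],
                           account_time: float) -> Optional[int]:
    """Index of the latest video-feature acquisition time strictly earlier
    than account_time, or None if no such entry exists.

    snapshot_times must be sorted in ascending order.
    """
    # bisect_left returns the first position whose time is >= account_time,
    # so the entry just before it is the closest strictly earlier one.
    position = bisect_left(snapshot_times, account_time)
    return position - 1 if position > 0 else None
```

With acquisition times 10, 20 and 30, an account feature acquired at time 25 maps to the entry acquired at time 20; one acquired at time 5 has no earlier entry.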

In another exemplary embodiment, the feature sampling unit is further configured to periodically sample a specified number of target account features offline from the account feature library at a preset time interval, where the preset time interval is longer than the interval at which account features and video features are extracted during online training of the recall model.
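Periodic sampling of this kind can be sketched by grouping account features into fixed-length time buckets and drawing a random sample from each bucket. The function name, the `(acquisition_time, account_id)` record shape, and the use of uniform random sampling without replacement are assumptions of this example.

```python
import random
from typing import Dict, List, Tuple

AccountRecord = Tuple[float, str]  # (acquisition_time_in_seconds, account_id)


def sample_per_interval(account_records: List[AccountRecord],
                        interval_seconds: float, sample_size: int,
                        seed: int = 0) -> Dict[int, List[AccountRecord]]:
    """Sample up to sample_size account records from each time interval."""
    rng = random.Random(seed)
    buckets: Dict[int, List[AccountRecord]] = {}
    for acquired_at, account_id in account_records:
        # Records whose times fall in the same interval share a bucket index.
        bucket = int(acquired_at // interval_seconds)
        buckets.setdefault(bucket, []).append((acquired_at, account_id))
    return {bucket: rng.sample(items, min(sample_size, len(items)))
            for bucket, items in sorted(buckets.items())}
```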

In another exemplary embodiment, the apparatus further includes:

a recall rate acquisition subunit, configured to acquire the recall rates calculated in different preset time intervals;

a recall rate comparison subunit, configured to compare the values of the acquired recall rates, and to select the recall model version corresponding to the highest recall rate as the best-trained recall model to be applied to the information recommendation system.

In another exemplary embodiment, the matching degree ranking module 630 is configured to perform a vector inner product operation on the target account feature and each target video feature, and to use the result of the operation as the matching degree between the target account feature and the corresponding target video feature.
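The inner-product matching and the descending-order selection can be sketched in plain Python as follows. The function names and the dictionary representation of the video feature set are assumptions of this example; a production system would more likely use a vectorized or approximate nearest-neighbor implementation.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Vector inner product used as the matching degree."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def top_k_videos(account_feature: Sequence[float],
                 video_features: Dict[str, Sequence[float]],
                 k: int) -> List[Tuple[str, float]]:
    """Return the K videos with the highest matching degree, best first."""
    scored = ((video_id, inner_product(account_feature, feature))
              for video_id, feature in video_features.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```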

It should be noted that the training evaluation apparatus based on a recall model provided in the above embodiments and the training evaluation method based on a recall model provided in the above embodiments belong to the same concept. The specific manner in which each module and unit performs its operations has been described in detail in the method embodiments and is not repeated here.

The embodiments of the present application further provide an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the methods provided in the above embodiments.

FIG. 7 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

It should be noted that the computer system 1600 of the electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 7, the computer system 1600 includes a central processing unit (CPU) 1601, which can perform various appropriate actions and processing, such as the methods described in the above embodiments, according to a program stored in a read-only memory (ROM) 1602 or a program loaded from a storage section 1608 into a random access memory (RAM) 1603. The RAM 1603 also stores the various programs and data required for system operation. The CPU 1601, the ROM 1602, and the RAM 1603 are connected to one another through a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.

The following components are connected to the I/O interface 1605: an input section 1606 including a keyboard, a mouse, and the like; an output section 1607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1608 including a hard disk and the like; and a communication section 1609 including a network interface card such as a local area network (LAN) card or a modem. The communication section 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1610 as needed, so that a computer program read from it can be installed into the storage section 1608 as needed.

In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present application include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1609, and/or installed from the removable medium 1611. When the computer program is executed by the central processing unit (CPU) 1601, the various functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying a computer-readable computer program. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless or wired media, or any suitable combination of the above.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.

Another aspect of the present application further provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor of an electronic device, cause the electronic device to implement the method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.

Another aspect of the present application further provides a computer program product or computer program that includes computer instructions which, when executed by a processor, implement the methods provided in the above embodiments. The computer instructions may be stored in a computer-readable storage medium; a processor of an electronic device may read the computer instructions from the computer-readable storage medium and execute them, so that the electronic device performs the methods provided in the above embodiments.

The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit its implementations. A person of ordinary skill in the art can readily make corresponding variations or modifications in accordance with the main concept and spirit of the present application; therefore, the scope of protection of the present application shall be that defined by the claims.

Claims (11)

CN202111575932.5A | 2021-12-20 | 2021-12-20 | Training evaluation method, device, equipment and storage medium based on recall model | Active | CN114357242B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111575932.5A (CN114357242B (en)) | 2021-12-20 | 2021-12-20 | Training evaluation method, device, equipment and storage medium based on recall model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111575932.5A (CN114357242B (en)) | 2021-12-20 | 2021-12-20 | Training evaluation method, device, equipment and storage medium based on recall model

Publications (2)

Publication Number | Publication Date
CN114357242A (en) | 2022-04-15
CN114357242B (en) | 2025-05-06

Family

ID=81100425

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111575932.5A (Active, CN114357242B (en)) | Training evaluation method, device, equipment and storage medium based on recall model | 2021-12-20 | 2021-12-20

Country Status (1)

Country | Link
CN (1) | CN114357242B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115186738A (en) * | 2022-06-20 | 2022-10-14 | 北京百度网讯科技有限公司 | Model training method, device and storage medium
CN115221353A (en) * | 2022-07-12 | 2022-10-21 | 腾讯科技(深圳)有限公司 | A video recommendation method, apparatus, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20180246867A1 (en) * | 2017-02-27 | 2018-08-30 | International Business Machines Corporation | Unified text analytics annotator development life cycle combining rule-based and machine learning based techniques
US20190163809A1 (en) * | 2017-11-30 | 2019-05-30 | Bby Solutions, Inc. | Streaming events analysis for search recall improvements
US20200082296A1 (en) * | 2018-09-06 | 2020-03-12 | Quickpath Analytics, Inc. | Real-time drift detection in machine learning systems and applications
CN112288092A (en) * | 2019-07-23 | 2021-01-29 | 百度时代网络技术(北京)有限公司 | Model evaluation method, model evaluation device, electronic device and storage medium
CN113377972A (en) * | 2020-03-09 | 2021-09-10 | 北京达佳互联信息技术有限公司 | Multimedia content recommendation method and device, computing equipment and storage medium

Also Published As

Publication number | Publication date
CN114357242B (en) | 2025-05-06

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
