Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, embodiments of the present application are described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application are briefly described:
masking: covering an area of a layout in order to select it. The selection may be forward (the covered area is the selected area) or reverse (the uncovered area is the selected area). Optionally, an area to be selected is selected after being covered with a colored translucent or opaque color area. Optionally, different types of regions, or the regions corresponding to different objects, are covered with color areas of different colors. Illustratively, when objects in an image are shown in the form of masks according to an image recognition result, and the recognized image includes an object A, an object B, and an object C, a red semitransparent mask is superimposed over the area of the recognized object A to indicate the position of object A in the image, a green semitransparent mask is superimposed over the area of the recognized object B to indicate the position of object B in the image, and a blue semitransparent mask is superimposed over the area of the recognized object C to indicate the position of object C in the image.
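Purely as a concrete illustration, the per-object colored overlay described above can be sketched in a few lines of Python. The `masks` input format, the object names, the color assignments, and the 0.5 opacity are assumptions of the sketch, not part of the embodiments.

```python
import numpy as np

# A minimal sketch: superimpose semi-transparent colored masks on an image.
# `image` is an (H, W, 3) uint8 array; `masks` maps an object name to a
# boolean (H, W) array marking that object's area.
def overlay_masks(image, masks, alpha=0.5):
    colors = {"object_a": (255, 0, 0),   # red mask for object A
              "object_b": (0, 255, 0),   # green mask for object B
              "object_c": (0, 0, 255)}   # blue mask for object C
    out = image.astype(np.float32)
    for name, mask in masks.items():
        color = np.asarray(colors[name], dtype=np.float32)
        # Blend the mask color into the covered pixels only.
        out[mask] = (1.0 - alpha) * out[mask] + alpha * color
    return out.astype(np.uint8)
```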
Illustratively, the application scenarios related to the present application at least include the following scenarios:
the server comprises a recommendation information implantation module, and the recommendation information corresponding to the product a is implanted into the target video through the server.
Firstly, the server acquires, from the target video, a target video frame on which position detection is to be performed. The recommendation information implantation module comprises an instance segmentation model, and image recognition is performed on the target video frame through the instance segmentation model to obtain object types and the areas corresponding to the object types in the target video frame. For example, recognizing the target video frame through the instance segmentation model yields: a desktop type in area b, a character type in area c and area d, a vase type in area e, and a cup type in area f. The position where the recommendation information needs to be implanted is then determined in area b corresponding to the desktop type according to the recognition result of the instance segmentation model, and the recommendation information is implanted into the scene segment corresponding to the target video frame.
Schematically, taking implanting recommendation information corresponding to milk into a target video as an example: after obtaining a three-dimensional model corresponding to the milk, the server takes a foreground image of the three-dimensional model as the recommendation information. After the server acquires the target video, the target video is divided into a plurality of video segments by scene. Illustratively, the video content of the target video includes the following scenes: a character eating, a conversation, walking, and a meeting, where the eating scene includes a front-shot segment and a side-shot segment. The video clip of the front-shot eating segment is obtained, the first key frame in the video clip is taken as the target video frame, and image recognition is performed on the target video frame to obtain a desktop area 1, a character area 2, a lunch box area 3, a water cup area 4, and an electronic device area 5. The implantation position of the recommendation information is determined based on the desktop area 1, and the foreground image of the three-dimensional milk model is implanted at the implantation position.
Schematically, fig. 1 is a schematic diagram of a recommendation information implantation effect provided by an exemplary embodiment of the present application. As shown in fig. 1, a target video frame 100 includes a character 110, a desktop 120, and a water cup 130. After the area corresponding to the desktop 120 is obtained by performing image recognition on the target video frame 100, recommendation information 140, that is, a three-dimensional model foreground image of milk, is implanted corresponding to the desktop 120.
Optionally, in the above exemplary application scenario, the recommendation information is described as recommendation information for milk; in actual operation, the recommendation information may also be recommendation information for products such as coffee, a vase, a mobile phone, or earphones, which is not limited in the embodiments of the present application.
Optionally, in the above exemplary application scenario, the recommendation information is illustrated as a foreground image of a three-dimensional model of milk; in actual operation, the recommendation information may also be implemented as a billboard of a product, texture information of the product mapped onto other objects, and the like, which is not limited in the embodiments of the present application.
It should be noted that the above application scenarios are only illustrative examples, and in actual operation, the application scenarios for determining the implantation position of the recommendation information through the recommendation information implantation module may all use the determination method for determining the implantation position of the recommendation information provided in the embodiment of the present application, which is not limited in the embodiment of the present application.
Fig. 2 is a flowchart of a method for determining a recommended information implantation position according to an exemplary embodiment of the present application, which is described by way of example as being applied to a computer device, where the computer device may be a server, as shown in fig. 2, and the method includes:
Step 201, acquiring a target video.
Optionally, the target video is a video to be implanted with recommendation information.
Optionally, the target video is a video stored in the server, or the target video is a video received by the server and sent by the terminal.
Optionally, the target video includes at least one of a television show, a movie, a short video, a video published on a social platform, and a Music Video (MV).
Step 202, a target video frame in the target video is obtained according to the scene change condition of the target video, wherein the target video frame is a video frame used for determining the implantation position of the recommendation information.
Optionally, the manner of acquiring the target video frame in the target video according to the scene change condition includes at least one of the following manners:
firstly, detecting a scene of a target video, carrying out video slicing on the target video according to a scene change condition to obtain a video clip corresponding to the scene change condition, and acquiring a key frame in the video clip as a target video frame;
optionally, the target video frame may be a first frame key frame in the video clip, or may be all key frames in the target video clip, which is not limited in this embodiment of the application. Optionally, the target video frame is used to determine an implantation position of the recommendation information in the video segment corresponding to the scene change condition.
Optionally, the scene change condition is determined according to a similarity between at least one group of video frames in the target video (a minimal code sketch of this similarity-based slicing is given after this list). Optionally, the scene change condition may be determined according to the similarity between two adjacent video frames in the target video, that is, each group of the at least one group of video frames includes two adjacent frames of the target video; or it may be determined according to the similarity between video frames separated by a preset number of frames, that is, each group of the at least one group of video frames includes two video frames separated by the preset number of frames. Optionally, when the similarity between a group of video frames is lower than a preset similarity, the video frame with the earlier timestamp in the group corresponds to the scene before the current scene, and the video frame with the later timestamp corresponds to the current scene. Illustratively, if the similarity between the ith video frame and the (i+1)th video frame in the target video is determined to be lower than the preset similarity, the ith video frame corresponds to the kth scene, that is, to the kth video clip, and the (i+1)th video frame corresponds to the (k+1)th scene, that is, to the (k+1)th video clip.
Optionally, in the video slicing process, the rule for splitting the video is implemented based on a shot segmentation algorithm.
Optionally, when video slicing is performed on the target video according to the scene change condition, schematically, the video slice obtained by the current slicing is the ith video slice; when the degree of scene change at the current video frame is large, the current video frame is used as the first video frame of the (i+1)th video slice, where i is a positive integer.
Illustratively, after detecting the scene of the target video, the target video is segmented into k video segments according to the scene change condition, and the first key frame of each video segment in the k video segments is used as the target video frame, that is, k target video frames in total, where k is a positive integer.
Secondly, detecting the scene of the target video, and taking video frames at which the degree of change in the scene change condition is large as target video frames.
Optionally, detection of the degree of scene change is continuously performed on consecutive video frames in the target video, and when the degree of scene change between two detected video frames is large, the video frame with the later timestamp is taken as the target video frame. The detection may be performed on two adjacent video frames, for example, detecting the degree of scene change between the ith video frame and the (i+1)th video frame; or the detection may be performed on two video frames separated by a preset number of frames, for example, when the preset number is n, detecting the degree of scene change between the ith video frame and the (i+n)th video frame.
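The following is a minimal sketch of the first manner above, assuming OpenCV is available: adjacent frames are compared with an HSV histogram correlation, a similarity below an illustrative threshold of 0.7 starts a new segment, and the first frame of each segment is taken as that segment's target video frame. The threshold, the histogram settings, and comparing every adjacent pair (rather than frames a preset number apart) are assumptions of the sketch.

```python
import cv2

# Slice a video into scene-based segments by comparing adjacent frames.
def slice_by_scene(video_path, threshold=0.7):
    cap = cv2.VideoCapture(video_path)
    segments, current, prev_hist = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:   # scene change: close the segment
                segments.append(current)
                current = []
        current.append(frame)
        prev_hist = hist
    if current:
        segments.append(current)
    cap.release()
    # The first frame of each segment serves as the target video frame.
    target_frames = [seg[0] for seg in segments]
    return segments, target_frames
```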
Step 203, performing image recognition on the target video frame to obtain mask information of the target video frame, where the mask information includes a first region of an object of the target type in the target video frame.
Optionally, the mask information is used to represent a region corresponding to the object in the target video frame, and optionally, the mask information includes regions corresponding to at least two types of objects in the target video frame, where the regions include a first region corresponding to the object of the target type in the target video frame.
Optionally, the server includes an instance segmentation model, where the instance segmentation model is a model obtained by training with sample images labeled with object classes. The target video frame is input into the instance segmentation model, the instance segmentation model recognizes the image content in the target video frame, and the model outputs the type of each object recognized from the target video frame and the area occupied by that object in the target video frame.
Optionally, the instance segmentation model is a Mask R-CNN model based on an instance segmentation algorithm.
Optionally, after the target video frame is recognized by the instance segmentation model, a mask result set for the target video frame may further be obtained, where the mask result set includes an object classification, an object region, and a confidence for each recognized object, and the confidence indicates the prediction accuracy of the recognition result (the object classification and the object region) for that object. Optionally, after the results whose confidence is lower than a confidence requirement are removed from the mask result set, the filtered mask result set is obtained as the mask information; that is, the object recognition results whose confidence meets the confidence requirement are retained as the mask information of the target video frame.
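As an illustrative sketch only, the recognition and confidence filtering can be exercised with the COCO-pretrained Mask R-CNN shipped in torchvision. The embodiments train their own instance segmentation model, so the pretrained weights, the `weights="DEFAULT"` argument (assuming torchvision 0.13 or later), and the 0.5 threshold (taken from the fig. 3 example) are all assumptions.

```python
import torch
import torchvision

# Illustrative only: a COCO-pretrained Mask R-CNN stands in for the
# instance segmentation model described above.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def mask_info(frame, confidence_requirement=0.5):
    # frame: float tensor of shape (3, H, W) with values in [0, 1]
    with torch.no_grad():
        result = model([frame])[0]
    keep = result["scores"] >= confidence_requirement  # confidence filtering
    return {
        "labels": result["labels"][keep],  # object classification
        "masks": result["masks"][keep],    # object region (soft masks, (N, 1, H, W))
        "scores": result["scores"][keep],  # confidence
    }
```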
Optionally, the target type includes at least one of a desktop type, a floor type, a sill type, and a counter type.
Step 204, determining the implantation position of the recommendation information in the target video frame based on the first area.
Optionally, when the first area is an area where at least one plane of a desktop, the ground, a windowsill, or a counter is located, the implantation position of the recommendation information is determined as a position at which another object placed on the plane is covered.
Optionally, a target sub-region communicating with the first region is determined from the mask information, and the position of the target sub-region is determined as the implantation position of the recommendation information in the target video frame.
Optionally, the target sub-region communicating with the first region refers to a sub-region surrounded or semi-surrounded by the first region. When the target sub-region is surrounded by the first region, the object corresponding to the target sub-region is placed on the plane corresponding to the first region (for example, a mobile phone placed on a desktop), and both the placement range and the display range of the object lie within the range of the first region. When the target sub-region is semi-surrounded by the first region, the object corresponding to the target sub-region is placed on the plane corresponding to the first region, but the display range of the object exceeds the range of the first region.
Optionally, when the position of the target sub-region is determined as the implantation position of the recommendation information, any point on a side or on the contour of the target sub-region may be used as the reference when the recommendation information is implanted. Illustratively, when the lowest side of the target sub-region in the target video frame is used as the reference, the region in which the lowest side of the recommendation information is aligned with the lowest side of the target sub-region is the implantation position of the recommendation information; when the lowest point of the target sub-region in the target video frame is used as the reference, the region in which the lowest point of the recommendation information is aligned with the lowest point of the target sub-region is the implantation position of the recommendation information.
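A minimal sketch of the lowest-point alignment, assuming boolean NumPy masks; horizontally centering the overlay on the sub-region is an added assumption, since the embodiments fix only the vertical reference.

```python
import numpy as np

# Align the bottom of the recommendation image with the lowest point of the
# target sub-region. `mask` is a boolean (H, W) array; returns the top-left
# corner at which an overlay of size (overlay_h, overlay_w) should be pasted.
def bottom_aligned_position(mask, overlay_h, overlay_w):
    ys, xs = np.nonzero(mask)
    bottom = int(ys.max())            # lowest point of the target sub-region
    center_x = int(xs.mean())
    top = bottom - overlay_h + 1      # bottom edges coincide
    left = center_x - overlay_w // 2
    return top, left
```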
Optionally, the target sub-region is a region corresponding to an object of a non-character type. Optionally, when there are a plurality of target sub-regions communicating with the first region in the target video frame, the target sub-region is selected in one of the following manners:
firstly, selecting the position of any target sub-region from the target sub-regions as an implantation position of recommendation information;
secondly, selecting a target sub-region with the smallest region area from the multiple target sub-regions, and taking the position of the target sub-region as an implantation position of the recommendation information;
thirdly, a central character is also identified in the target video frame, a target sub-region is determined according to the distances between the target sub-regions and the central character, and the position of that target sub-region is used as the implantation position of the recommendation information.
Referring to fig. 3, schematically, after image recognition is performed on an image 300, the object types, object regions, and confidences in the image 300 are obtained. With the confidence requirement set to 0.5, the image recognition results that do not meet the confidence requirement are removed, and the mask information that meets the confidence requirement is retained, including a desktop region 310 (confidence 0.990), a cup region 320 (confidence 0.933), a vase region 330 (confidence 0.855), a mobile phone region 340 (confidence 0.904), and a character region 350 (confidence 0.999). The cup region 320, the vase region 330, and the mobile phone region 340 are target sub-regions communicating with the desktop region 310. By the above method for selecting the target sub-region, the position of the cup region 320 is determined from the mask information as the implantation position of the recommendation information, and the lowest side of the cup region 320 is used as the lowest side of the implantation region 360 of the recommendation information.
In summary, according to the method for determining the implantation position of recommendation information provided in this embodiment, a target video frame is obtained from the target video, and image recognition is performed on the target video frame to obtain the mask information of the target video frame, so that the implantation position of the recommendation information in the target video frame is determined based on the first region corresponding to the object of the target type in the target video frame. This implements a process of automatically implanting recommendation information into the target video in a Video-In manner, reduces the workload of the implantation process, improves the implantation efficiency of the recommendation information, and saves time resources and human resources.
Referring to fig. 4, schematically, a process diagram of a method for determining a recommended information implantation position according to an exemplary embodiment of the present application is shown, as shown in fig. 4, the method for determining a recommended information implantation position is divided into three steps:
step one, acquiring a complete video material.
Step two, single-shot video slicing.
Optionally, the video material is subjected to video slicing according to a video scene, the video material is divided into video segments based on scene changes, and a target video frame is acquired from the video segments as a video frame for position detection.
Step three, determining the implantation position of the recommendation information through an advertisement space detection algorithm.
Optionally, the instances in the target video frame are first detected through a Mask R-CNN model based on instance segmentation, and the implantation position of the recommendation information is determined from the detection result.
In an optional embodiment, the target sub-region is determined according to the positional relationship between the candidate regions communicating with the first region and the central character. Fig. 5 is a flowchart of a method for determining a recommended information implantation position according to another exemplary embodiment of the present application, described by taking application of the method to a server as an example. As shown in fig. 5, the method includes:
step 501, obtaining a target video.
Optionally, the target video is a video to be implanted with recommendation information.
Optionally, the target video includes at least one of a television show, a movie, a short video, a video posted on a social platform, and an MV.
Step 502, obtaining a target video frame in the target video according to the scene change condition of the target video, where the target video frame is a video frame for determining the implantation position of the recommendation information.
Optionally, the scene change condition is determined according to the similarity between at least one group of video frames in the target video.
Optionally, the manner of acquiring the target video frame in the target video according to the scene change condition includes at least one of the following manners:
firstly, detecting a scene of a target video, carrying out video slicing on the target video according to a scene change condition to obtain a video clip corresponding to the scene change condition, and acquiring a key frame in the video clip as a target video frame;
and secondly, detecting the scene of the target video, and taking the video frame with larger change degree as the target video frame according to the change degree of the scene change condition.
Step 503, performing image recognition on the target video frame to obtain mask information of the target video frame, where the mask information includes a first region of an object of the target type in the target video frame.
Optionally, the mask information is used to indicate a corresponding region of the object in the target video frame. Optionally, the mask information includes regions corresponding to at least two types of objects in the target video frame, where the regions include a first region of the object of the target type in the target video frame.
Optionally, after the target video frame is recognized by the instance segmentation model, a mask result set for the target video frame may further be obtained, where the mask result set includes an object classification, an object region, and a confidence for each recognized object, and the confidence indicates the prediction accuracy of the recognition result (the object classification and the object region) for that object. Optionally, after the results whose confidence is lower than a confidence requirement are removed from the mask result set, the filtered mask result set is obtained as the mask information; that is, the object recognition results whose confidence meets the confidence requirement are retained as the mask information of the target video frame.
Optionally, the target type includes at least one of a desktop type, a floor type, a sill type, and a counter type.
Step 504, n candidate regions communicating with the first region are determined from the mask information, wherein n is a positive integer.
Optionally, a candidate region communicating with the first region refers to a sub-region surrounded or semi-surrounded by the first region. When the candidate region is surrounded by the first region, the object corresponding to the candidate region is placed on the plane corresponding to the first region, and its display range lies within the range of the first region (for example, a mobile phone lying flat on a desktop). When the candidate region is semi-surrounded by the first region, the object corresponding to the candidate region is placed on the plane corresponding to the first region, but its display range exceeds the range of the first region (for example, a water bottle standing on a desktop and shot from obliquely above, part of whose display range exceeds the range of the desktop). That is, the n candidate regions communicating with the first region are the display regions of n objects placed on the plane corresponding to the first region.
Optionally, taking the first area as an area corresponding to a desktop as an example, when n objects are placed on the desktop, the mask information includes n candidate areas communicating with the first area, such as: a mobile phone area, a water cup area, a vase area, etc.
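One way to realize the communication test, sketched here assuming boolean NumPy masks and SciPy: a candidate object mask that, after a slight dilation, overlaps the plane mask borders the plane and is treated as surrounded or semi-surrounded by it. The 3-iteration dilation is an illustrative choice, not part of the embodiments.

```python
import numpy as np
from scipy import ndimage

# A candidate object region "communicates" with the plane region if its
# slightly dilated mask overlaps the plane mask, i.e. the object's outline
# borders the plane (surrounded or semi-surrounded by the first region).
def communicates(candidate_mask, plane_mask, radius=3):
    dilated = ndimage.binary_dilation(candidate_mask, iterations=radius)
    return bool(np.logical_and(dilated, plane_mask).any())
```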
Optionally, when the target video frame includes at least two objects belonging to the target type, and the first region includes at least two candidate sub-regions corresponding to the objects of the target type, it is necessary to first filter the first region, and then determine n candidate regions communicating with the filtered first region from the mask information. Optionally, the filtering of the first region is implemented by reserving a candidate sub-region with the largest area of the at least two candidate sub-regions as the filtered first region, and deleting other candidate sub-regions.
Fig. 6 is a schematic diagram of a filtering process for the first area according to an exemplary embodiment of the present application. As shown in fig. 6, two masks corresponding to the desktop type are identified in an image 600: a candidate sub-region 610 and a candidate sub-region 620. The region areas of the two are compared, and the candidate sub-region 610, which has the larger area, is reserved as the filtered first region.
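A sketch of the largest-area filtering, again assuming boolean NumPy masks; the same max-area comparison also underlies the central-character selection in step 505 below.

```python
import numpy as np

# When several candidate sub-regions of the target type are detected
# (e.g. two desktop masks), keep the one with the largest area as the
# filtered first region. `candidate_subregions` is a list of boolean
# (H, W) masks.
def filter_first_region(candidate_subregions):
    areas = [int(mask.sum()) for mask in candidate_subregions]
    return candidate_subregions[int(np.argmax(areas))]
```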
Step 505, a second region corresponding to the central character is determined from the mask information.
Optionally, when the mask information includes only one region corresponding to the character type, that region is taken as the second region corresponding to the central character; when the mask information includes a plurality of character regions, m character regions corresponding to the character type are determined from the mask information, where m is a positive integer, and the character region with the largest region area among the m character regions is taken as the second region corresponding to the central character; that is, the character region with the largest region area is the region corresponding to the central character in the target video frame.
Schematically, fig. 7 is a schematic diagram of a central character determination process provided in an exemplary embodiment of the present application. As shown in fig. 7, a character region 710, a character region 720, and a character region 730 corresponding to the character type are identified in an image 700. By comparing the region areas of the character region 710, the character region 720, and the character region 730, the character region 710, which has the largest area among the three, is reserved as the second region corresponding to the central character.
Step 506, determining a distance between each of the n candidate regions and the second region.
Optionally, the distance between a candidate region and the second region may be determined by the distance between the leftmost edges of the regions, the distance between the rightmost edges, the distance between the topmost edges, the distance between the bottommost edges, or the distance between the center points of the regions, which is not limited in the embodiments of the present application.
Optionally, the distance between the candidate region and the second region may be calculated by using a Euclidean distance algorithm, a Mahalanobis distance algorithm, a cosine distance algorithm, a Hamming distance algorithm, a Manhattan distance algorithm, or the like, which is not limited in the embodiments of the present application.
Step 507, taking the region with the largest distance from the second region among the n candidate regions as the target sub-region.
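A sketch of steps 506 and 507 under two of the options above: the region reference point is taken to be the mask centroid (the center-point option; any of the edge references would work equally), and the distance is Euclidean.

```python
import numpy as np

# `candidate_masks` is a list of boolean (H, W) arrays for the n candidate
# regions; `character_mask` is the boolean mask of the second region.
def centroid(mask):
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def farthest_candidate(candidate_masks, character_mask):
    c = centroid(character_mask)
    distances = [np.linalg.norm(centroid(m) - c) for m in candidate_masks]
    return int(np.argmax(distances))  # index of the target sub-region
```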
Step 508, determining the position of the target sub-region as the implantation position of the recommendation information in the target video frame.
Optionally, the recommendation information is implanted in a manner that covers the target sub-region. Accordingly, the region area of the target sub-region multiplied by a preset multiple is taken as the target area at which the recommendation information is displayed in the target video frame; a target region corresponding to the recommendation information in the target video frame is determined according to the target area and the display shape of the recommendation information; and the position at which the target region covers the target sub-region is taken as the implantation position of the recommendation information in the target video frame.
Schematically, fig. 8 is a schematic diagram of a recommended information implantation position determination process according to an exemplary embodiment of the present application. As shown in fig. 8, after a first region 810 corresponding to a desktop is identified in an image 800, a candidate region 821, a candidate region 822, a candidate region 823, and a candidate region 824 communicating with the first region 810 are determined; a second region 830 corresponding to the central character is determined; the candidate region 823, which is farthest from the second region 830, is determined as the target sub-region; and an implantation region 840 whose region area is four times that of the candidate region 823 is disposed over the target sub-region.
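A sketch of the sizing step, assuming the display shape is specified by the recommendation image's aspect ratio (width divided by height); the 4x multiple mirrors the fig. 8 example, and expressing the display shape as an aspect ratio is an assumption of the sketch.

```python
import math

# The implantation region's area is the target sub-region's area times a
# preset multiple; its width and height follow the recommendation image's
# aspect ratio, so width * height == target_area and width / height == ratio.
def implantation_size(subregion_area, aspect_ratio, multiple=4.0):
    target_area = subregion_area * multiple
    width = math.sqrt(target_area * aspect_ratio)
    height = target_area / width
    return int(round(width)), int(round(height))
```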
In summary, according to the method for determining the implantation position of recommendation information provided in this embodiment, a target video frame is obtained from the target video, and image recognition is performed on the target video frame to obtain the mask information of the target video frame, so that the implantation position of the recommendation information in the target video frame is determined based on the first region corresponding to the object of the target type in the target video frame. This implements a process of automatically implanting recommendation information into the target video in a Video-In manner, reduces the workload of the implantation process, improves the implantation efficiency of the recommendation information, and saves time resources and human resources.
According to the method provided in this embodiment, the candidate region farthest from the central character is selected from the candidate regions communicating with the first region as the target sub-region, and the position of the target sub-region is used as the implantation position of the recommendation information, thereby avoiding the problem that implanted recommendation information occludes the central character and interferes with the normal display of the video content.
In an optional embodiment, the target video frame is obtained by slicing the video, and optionally, the mask information is obtained by confidence filtering. Fig. 9 is a flowchart of a method for determining a recommended information implantation position according to another exemplary embodiment of the present application. As shown in fig. 9, the method includes:
step 901, acquiring a target video.
Optionally, the target video is a video to be implanted with recommendation information.
Optionally, the target video includes at least one of a television show, a movie, a short video, a video posted on a social platform, and an MV.
Step 902, performing video slicing on the target video according to the scene change condition of the target video to obtain a video clip corresponding to the scene change condition.
Optionally, in the video slicing process, the rule for splitting the video is implemented based on a shot segmentation algorithm.
Optionally, when video slicing is performed on the target video according to the scene change condition, the target video is sliced according to the degree of scene change. Schematically, the video slice obtained by the current slicing is the ith video slice; when the degree of scene change at the current video frame is large, the current video frame is used as the first video frame of the (i+1)th video slice, where i is a positive integer.
Step 903, acquiring a first key frame in the video clip as a target video frame.
Step 904, performing image recognition on the target video frame to obtain a mask result set for the target video frame.
The mask result set includes an object classification, an object region, and a confidence for each recognized object, where the confidence indicates the prediction accuracy of the recognition result (the object classification and the object region) for that object.
Step 905, removing the results whose confidence is lower than the confidence requirement from the mask result set, and obtaining the filtered mask result set as the mask information of the target video frame.
Optionally, the mask information is used to indicate the regions corresponding to the objects in the target video frame, including the first region of the object of the target type in the target video frame.
Step 906, determining the implantation position of the recommendation information in the target video frame based on the first area.
Optionally, when the first area is an area where at least one plane of a desktop, the ground, a windowsill, or a counter is located, the implantation position of the recommendation information is determined as a position at which another object placed on the plane is covered.
Optionally, a target sub-region communicating with the first region is determined from the mask information, and the position of the target sub-region is determined as the implantation position of the recommendation information in the target video frame.
Optionally, the target sub-region in communication with the first region refers to a sub-region surrounded or semi-surrounded by the first region.
Optionally, when the position of the target sub-region is determined as the implantation position of the recommendation information, any point on a side or on the contour of the target sub-region may be used as the reference for the implantation position when the recommendation information is implanted.
Optionally, the target sub-region is a region corresponding to an object of a non-character type. Optionally, when there are a plurality of target sub-regions communicating with the first region in the target video frame, the target sub-region is selected in one of the following manners:
firstly, selecting the position of any target sub-region from the target sub-regions as an implantation position of recommendation information;
secondly, selecting a target sub-region with the smallest region area from the multiple target sub-regions, and taking the position of the target sub-region as an implantation position of the recommendation information;
thirdly, a central character is also identified in the target video frame, a target sub-region is determined according to the distances between the target sub-regions and the central character, and the position of that target sub-region is used as the implantation position of the recommendation information.
In summary, according to the method for determining the implantation position of recommendation information provided in this embodiment, a target video frame is obtained from the target video, and image recognition is performed on the target video frame to obtain the mask information of the target video frame, so that the implantation position of the recommendation information in the target video frame is determined based on the first region corresponding to the object of the target type in the target video frame. This implements a process of automatically implanting recommendation information into the target video in a Video-In manner, reduces the workload of the implantation process, improves the implantation efficiency of the recommendation information, and saves time resources and human resources.
According to the method provided in this embodiment, the results with lower confidence in the mask result set are filtered out, and the results whose confidence meets the confidence requirement are retained as the mask information, thereby ensuring the accuracy of the subsequent identification of the first region, the second region, and other regions, as well as the accuracy of the recommended information implantation position.
Fig. 10 is a block diagram of a device for determining a recommended information implantation position according to an exemplary embodiment of the present application, described by taking application of the device to a server as an example. As shown in fig. 10, the device includes: an obtaining module 1010, an identifying module 1020, and a determining module 1030;
an obtaining module 1010, configured to obtain a target video, where the target video is a video to be implanted with recommendation information;
the obtaining module 1010 is further configured to obtain a target video frame in the target video according to a scene change condition of the target video, where the target video frame is a video frame used for determining an implantation position of the recommendation information, and the scene change condition is determined according to a similarity between at least one group of video frames in the target video;
an identifying module 1020, configured to perform image recognition on the target video frame to obtain mask information of the target video frame, where the mask information includes the areas corresponding to at least two types of objects in the target video frame, including a first area corresponding to the object of the target type in the target video frame;
a determining module 1030, configured to determine, based on the first region, the implantation position of the recommendation information in the target video frame.
In an optional embodiment, the determining module 1030 is further configured to determine, from the mask information, a target sub-region communicating with the first region; and determine the position of the target sub-region as the implantation position of the recommendation information in the target video frame.
In an optional embodiment, the determining module 1030 is further configured to determine n candidate regions communicating with the first region from the mask information, where n is a positive integer; determine a second region corresponding to the central character from the mask information; determine the distance between each of the n candidate regions and the second region; and take the region with the largest distance from the second region among the n candidate regions as the target sub-region.
In an optional embodiment, the determining module 1030 is further configured to determine m character regions corresponding to the character type from the mask information, where m is a positive integer; and take the character region with the largest region area among the m character regions as the second region corresponding to the central character.
In an optional embodiment, the determining module 1030 is further configured to multiply the region area of the target sub-region by a preset multiple to obtain the target area at which the recommendation information is displayed in the target video frame; determine a target region corresponding to the recommendation information in the target video frame according to the target area and the display shape of the recommendation information; and take the position at which the target region covers the target sub-region as the implantation position of the recommendation information in the target video frame.
In an optional embodiment, the target video frame includes at least two objects belonging to the target type, and the first region includes at least two candidate sub-regions corresponding to the objects of the target type;
as shown in fig. 11, the apparatus further includes:
a filtering module 1040, configured to reserve the candidate sub-region with the largest area among the at least two candidate sub-regions as the filtered first region, and delete the other candidate sub-regions.
In an optional embodiment, the obtaining module 1010 is further configured to perform video slicing on the target video according to the scene change condition of the target video to obtain a video segment corresponding to the scene change condition; and acquire the first key frame in the video segment as the target video frame, where the target video frame is used for determining the implantation position of the recommendation information in the video segment.
In an optional embodiment, the identifying module 1020 is further configured to perform image recognition on the target video frame to obtain a mask result set for the target video frame, where the mask result set includes an object classification, an object region, and a confidence for each recognized object;
the device further comprises:
a filtering module 1040, configured to remove the results whose confidence is lower than the confidence requirement from the mask result set, and obtain the filtered mask result set as the mask information of the target video frame.
In an alternative embodiment, the target type includes at least one of a desktop type, a floor type, a sill type, and a counter type.
In summary, the device for determining the implantation position of recommendation information provided in this embodiment obtains a target video frame from the target video and performs image recognition on the target video frame to obtain the mask information of the target video frame, so that the implantation position of the recommendation information in the target video frame is determined based on the first region corresponding to the object of the target type in the target video frame. This implements a process of automatically implanting recommendation information into the target video in a Video-In manner, reduces the workload of the implantation process, improves the implantation efficiency of the recommendation information, and saves time resources and human resources.
It should be noted that: the device for determining the recommended information implantation position provided in the above embodiment is only illustrated by dividing the above functional modules, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the determination apparatus for the recommended information implantation position and the determination method for the recommended information implantation position provided in the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
The application also provides a computer device, which comprises a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to realize the determination method of the recommended information implantation position provided by the above method embodiments. It should be noted that the computer device may be a server as provided in fig. 12 below.
Referring to fig. 12, a schematic structural diagram of a server according to an exemplary embodiment of the present application is shown. Specifically: the server 1200 includes a Central Processing Unit (CPU) 1201, a system memory 1204 including a Random Access Memory (RAM) 1202 and a Read Only Memory (ROM) 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201. The server 1200 also includes a basic input/output system (I/O system) 1206 that facilitates the transfer of information between devices within the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and an input device 1209, such as a mouse or keyboard, for a user to input information. The display 1208 and the input device 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 coupled to the system bus 1205. The basic input/output system 1206 may also include the input/output controller 1210 for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or other types of output devices.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the server 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1204 and the mass storage device 1207 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 1201, the one or more programs containing instructions for implementing the above-described method for determining a recommended information implantation position, and the central processing unit 1201 executes the one or more programs to implement the method for determining a recommended information implantation position provided by the respective method embodiments described above.
The server 1200 may also operate through a remote computer connected to a network, such as the Internet, according to various embodiments of the present application. That is, the server 1200 may be connected to the network 1212 through a network interface unit 1211 coupled to the system bus 1205, or the network interface unit 1211 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs stored in the memory, where the one or more programs include instructions for performing the steps executed by the server in the method for determining the recommended information implantation position provided by the embodiments of the present application.
The embodiments of the present application also provide a computer device, which includes a memory and a processor, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the above method for determining the recommended information implantation position.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the computer-readable storage medium, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the above method for determining the recommended information implantation position.
The application also provides a computer program product, which when running on a computer, causes the computer to execute the method for determining the recommended information implantation position provided by the above method embodiments.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, which may be a computer readable storage medium contained in a memory of the above embodiments; or it may be a separate computer-readable storage medium not incorporated in the terminal. The computer readable storage medium has at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by the processor to implement the above-mentioned method for determining a recommended information implantation location.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.