Detailed Description
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described in the examples below do not represent all embodiments consistent with the application; they are merely examples of systems and methods consistent with aspects of the application as set forth in the claims.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first," "second," "third," and the like in the description and in the claims and in the above-described figures are used for distinguishing between similar or similar objects or entities and not necessarily for limiting a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the function associated with that element.
In the embodiment of the present application, the display device 200 generally refers to a device having a screen display and a data processing capability. For example, display device 200 includes, but is not limited to, a smart television, a mobile terminal, a computer, a monitor, an advertising screen, a wearable device, a virtual reality device, an augmented reality device, and the like.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control device according to some embodiments of the present application. As shown in fig. 1, a user may operate the display device 200 through a touch operation, the mobile terminal 300, and the control device 100. For example, the control device 100 may be a remote control, a stylus, a handle, or the like.
The mobile terminal 300 may serve as a control device for performing man-machine interaction between a user and the display device 200. The mobile terminal 300 may also be used as a communication device that establishes a communication connection with the display device 200 for data interaction. In some embodiments, a software application may be installed on both the mobile terminal 300 and the display device 200, and a connection may be established between them through a network communication protocol, so as to achieve one-to-one control operation and data communication. The audio/video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 to realize a synchronous display function.
As also shown in fig. 1, the display device 200 is in data communication with the server 400 via a variety of communication means. The display device 200 may be allowed to establish communication connections via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks.
The display device 200 may provide a broadcast receiving TV function, and may additionally provide an intelligent network TV function with computer support, including, but not limited to, a network TV, a smart TV, an Internet Protocol TV (IPTV), and the like.
Fig. 2 is a block diagram of a hardware configuration of the display device 200 of fig. 1 according to some embodiments of the present application.
In some embodiments, the display apparatus 200 may include at least one of a modem 210, a communication device 220, a detector 230, a device interface 240, a controller 250, a display 260, an audio output device 270, a memory, a power supply, and a user input interface.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, the detector 230 includes a light receiver for collecting a sensor of the intensity of ambient light, or the detector 230 includes an image collector such as a camera that may be used to collect external ambient scenes, user attributes or user interaction gestures, or the detector 230 includes a sound collector such as a microphone or the like for receiving external sounds.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a drive component that drives image display. The display 260 is used for receiving and displaying image signals output from the controller 250. For example, the display 260 may be used to display video content, image content, components of menu manipulation interfaces, user manipulation UI interfaces, and the like.
In some embodiments, the communication apparatus 220 is a component for communicating with an external device or server 400 according to various communication protocol types. The display apparatus 200 may be provided with a plurality of communication devices 220 according to the supported communication manner. For example, when the display apparatus 200 supports wireless network communication, the display apparatus 200 may be provided with a communication device 220 including a WiFi function. When the display apparatus 200 supports bluetooth connection communication, the display apparatus 200 needs to be provided with a communication device 220 including a bluetooth function.
The communication means 220 may communicatively connect the display device 200 with an external device or the server 400 by means of a wireless or wired connection. Wherein the wired connection may connect the display device 200 with an external device through a data line, an interface, etc. The wireless connection may then connect the display device 200 with an external device through a wireless signal or a wireless network. The display device 200 may directly establish a connection with an external device, or may indirectly establish a connection through a gateway, a route, a connection device, or the like.
In some embodiments, the controller 250 may include at least one of a central processor, a video processor, an audio processor, a graphic processor, a power supply processor, first to nth interfaces for input/output, and the controller 250 controls the operation of the display device and responds to the user's operation through various software control programs stored on the memory. The controller 250 controls the overall operation of the display apparatus 200.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
In some embodiments, a user may input user commands through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives the user input commands through the GUI.
In some embodiments, audio output device 270 may be a speaker local to display device 200 or an audio output device external to display device 200. For an external audio output device of the display device 200, the display device 200 may also be provided with an external audio output terminal, and the audio output device may be connected to the display device 200 through the external audio output terminal to output sound of the display device 200.
In some embodiments, user input interface 280 may be used to receive instructions from user input.
Fig. 3 is a block diagram of a hardware configuration of the control device in fig. 1 according to some embodiments of the present application. As shown in fig. 3, the control device 100 may include a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.
The control device 100 is configured to control the display device 200; it can receive an input operation instruction of a user and convert the operation instruction into an instruction that the display device 200 can recognize and respond to, serving as an intermediary for interaction between the user and the display device 200.
In some embodiments, the control device 100 may be a smart device. For example, the control apparatus 100 may install various applications for controlling the display apparatus 200 according to user's demands.
In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similarly to the control device 100 after installing an application that manipulates the display device 200.
The controller 110 includes a processor 112, a RAM (Random Access Memory) 113, a ROM (Read-Only Memory) 114, a communication interface 130, and a communication bus. The controller 110 is used to control the running and operation of the control device 100, as well as the communication cooperation among internal components and the external and internal data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display device 200 under the control of the controller 110. The communication interface 130 may include at least one of a WiFi chip 131, a Bluetooth module 132, an NFC (Near Field Communication) module 133, and other near field communication modules.
A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces.
In some embodiments, the control device 100 includes at least one of a communication interface 130 and an input-output interface 140. The control device 100 is configured with a communication interface 130, such as a WiFi, Bluetooth, or NFC module, and may send a user input instruction to the display device 200 encoded through the WiFi protocol, the Bluetooth protocol, or the NFC protocol.
A memory 190 for storing various operation programs, data and applications for driving and controlling the control device 100 under the control of the controller. The memory 190 may store various control signal instructions input by a user.
A power supply 180 for providing operating power support for the various elements of the control device 100 under the control of the controller.
To support user interactions, in some embodiments, the display device 200 may run an operating system. The operating system is a computer program for managing and controlling hardware resources and software resources in the display device 200. The operating system may provide a user interface (to control the display device), allow a user to interact with the display device 200, and support running various applications.
It should be noted that the operating system may be a native operating system based on a specific operating platform, a third-party operating system deeply customized on the basis of a specific operating platform, or an independent operating system specially developed for a display device.
The operating system may be divided into different modules or tiers depending on the functionality implemented.
For example, as shown in FIG. 4, in some embodiments, the system is divided into four layers, from top to bottom, an application layer (application), an application framework layer (Application Framework), a system library layer, and a kernel layer.
In some embodiments, the application layer is used to provide services and interfaces for applications so that the display device 200 can run applications and interact with users based on those applications. At least one application program may run in the application layer; it may be a window (Window) program of the operating system, a system setting program, a clock program, or an application program developed by a third-party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an application programming interface (Application Programming Interface, API) and a programming framework for the applications. The application framework layer includes a number of predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API interface, an application program can access resources in the system and obtain system services during execution.
As shown in fig. 4, in the embodiment of the present application, the application framework layer includes a view system (View System), managers (Managers), a content provider (Content Provider), and the like. The view system is used to design and implement the interfaces and interactions of applications and includes lists (Lists), grids (Grids), text boxes, buttons (Buttons), and the like. The managers include at least one of an activity manager (Activity Manager) for interacting with all activities running in the system, a location manager (Location Manager) for providing access to the system location service for system services or applications, a package manager (Package Manager) for retrieving various information about application packages currently installed on the device, a notification manager (Notification Manager) for controlling the display and removal of notification messages, and a window manager (Window Manager) for managing icons, windows, toolbars, wallpaper, and desktop components on the user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the individual applications as well as common navigation rollback functions, such as controlling the exit, opening, and fallback of applications. The window manager is used for managing all window programs, such as obtaining the size of the display screen, judging whether a status bar exists, locking the screen, capturing the screen, and controlling changes of the display window, for example, shrinking the display window, dithering the display, distorting the display, and the like.
In some embodiments, the system runtime layer may provide support for the framework layer, and when the framework layer is in use, the operating system may run instruction libraries, such as the C/C++ instruction library, contained in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a functional hierarchy between the hardware and software of the display device 200. The kernel layer can realize functions such as hardware abstraction, multitasking, and memory management. For example, as shown in FIG. 4, hardware drivers may be configured in the kernel layer, where the drivers included in the kernel layer may be at least one of an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB (Universal Serial Bus) driver, an HDMI (High Definition Multimedia Interface) driver, a sensor driver (such as a fingerprint sensor, a temperature sensor, or a pressure sensor), and a power driver.
It should be noted that the above examples are merely a simple division of functions of an operating system, and do not limit the specific form of the operating system of the display device 200 in the embodiment of the present application, and the number of levels and specific types of levels included in the operating system may be expressed in other forms according to factors such as the functions of the display device and the type of the operating system.
Along with the development of display equipment (such as smart televisions), in order to improve the interest of content display and improve the viewing experience of viewers, bullet screens are often displayed in a superimposed manner on the content displayed by the display. Typically, the bullet screens displayed by the display originate from real bullet screens, provided by the content provider, that were entered by viewing users of the displayed content. However, current display devices personalize the displayed bullet screens only on the basis of the user's logged-in account, so the accuracy of the presented bullet screens is low. Based on this, in some embodiments, a bullet screen display method is provided. The bullet screen display method can be realized by a display device.
In one exemplary embodiment, the bullet screen display method is described as being applied to a controller of a display apparatus, where the display apparatus includes a display, a communication device, and a controller.
The communication connection between the display and the controller can be a wired communication connection or a wireless communication connection.
The communication device is in communication connection with the audio and video acquisition device, and can be in wired communication connection or wireless communication connection.
As shown in fig. 5, the bullet screen display method applied to the controller in the display device may include the steps of:
s501, under the condition that a bullet screen triggering event is detected in the process of displaying the content by the display, receiving the audio data and the video data of the current scene where the display equipment is located, which are acquired by the audio and video acquisition device, and acquiring the image data of the current display content of the display.
The bullet screen is superimposed on the content displayed on the display to increase the interest of the displayed content and improve the viewing experience of the viewer, so the bullet screen is typically displayed during the content display process of the display.
In addition, the display of the bullet screen is typically triggered based on some triggering event, for example, the viewer sends a bullet screen display instruction to the display device by clicking a related button on the remote controller, the display plays a display content of a specific type (such as a television show, a movie, a variety program), related information such as "bullet screen" appearing in the communication information of the viewer in the current scene, specific content (such as a specific actor, a specific building, a specific episode) appearing in the content displayed by the display, and the like.
In an alternative embodiment, detecting a barrage trigger event may include any of the following (a minimal detection sketch follows the list below):
1. And determining that the barrage triggering event is detected under the condition that the display starts to display the content (such as the starting of the display equipment).
2. And determining that the bullet screen triggering event is detected under the condition that the preset content (such as preset building, preset actor, preset plot and the like) appears in the content displayed by the display.
3. And under the condition that the preset content (such as a barrage, a barrage opening, a barrage displaying and the like) exists in the audio data of the current scene, determining that the barrage triggering event is detected.
4. In the event that the display is detected to begin displaying a specified type of content (e.g., live broadcast, television show, movie, show, etc.), it is determined that a barrage trigger event is detected.
5. And determining that the bullet screen triggering event is detected under the condition that the bullet screen display instruction (such as a voice interaction instruction, an instruction sent through a remote controller and the like) is received.
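For illustration only, the following Python sketch shows one possible way a controller might check the five trigger conditions listed above; the keyword lists, content labels, and the DeviceState fields are hypothetical and not part of the embodiments described above.

```python
from dataclasses import dataclass

# Hypothetical keyword and content lists; real values would be configured per device.
BULLET_KEYWORDS = {"barrage", "open barrage", "display barrage"}
SPECIFIED_TYPES = {"live", "tv_series", "movie", "variety_show"}
PRESET_CONTENT = {"preset_building", "preset_actor", "preset_plot"}

@dataclass
class DeviceState:
    display_started: bool           # condition 1: display just started showing content
    recognized_content: set         # condition 2: labels recognized in the displayed content
    scene_speech: str               # condition 3: ASR text from the current scene audio
    content_type: str               # condition 4: type of the content being displayed
    barrage_command_received: bool  # condition 5: explicit barrage display instruction

def barrage_trigger_detected(state: DeviceState) -> bool:
    """Return True if any of the five trigger conditions holds."""
    if state.display_started:
        return True
    if state.recognized_content & PRESET_CONTENT:
        return True
    if any(kw in state.scene_speech.lower() for kw in BULLET_KEYWORDS):
        return True
    if state.content_type in SPECIFIED_TYPES:
        return True
    if state.barrage_command_received:
        return True
    return False
```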
Based on this, in the process of content display by the display, in the case where a bullet screen triggering event is detected, it may be determined that bullet screen display is triggered, and the bullet screen to be displayed needs to be determined further. In order to determine a personalized barrage for the viewer in the current scene of the display device, and thereby realize personalized barrage display for the viewer in the live scene, the audio data and the video data of the current scene of the display device, which are acquired by the audio and video acquisition device, may first be received, and the image data of the current display content of the display may be acquired.
In some embodiments, the controller may send a data acquisition request to the audio and video acquisition device, and the audio and video acquisition device responds to the data acquisition request and feeds back the acquired audio data and video data of the current scene where the display device is located to the controller, so that the controller may receive the audio data and video data of the current scene where the display device is located, which are acquired by the audio and video acquisition device.
Optionally, the audio and video acquisition device is a device integrating an audio acquisition function and a video acquisition function. Alternatively, the audio and video acquisition device comprises an independent audio acquisition device (such as a microphone or a microphone array) for audio acquisition and an independent video acquisition device (such as a camera) for video acquisition. Correspondingly, in step S501, acquiring the audio data and the video data of the current scene of the display device acquired by the audio and video acquisition device may include acquiring the audio data of the current scene of the display device acquired by the audio acquisition device and acquiring the video data of the current scene of the display device acquired by the video acquisition device.
Optionally, the audio and video acquisition device may be disposed on the display device, or may be independent of the display device. In an alternative embodiment, the audio and video acquisition device comprises an audio acquisition device and a video acquisition device, and the display device further comprises an audio acquisition device configured to acquire audio data and/or a video acquisition device configured to acquire video data. That is, in the case where the audio and video acquisition device includes an audio acquisition device and a video acquisition device that are independent of each other, the display device may be provided with only the audio acquisition device while the video acquisition device is external to the display device, the display device may be provided with only the video acquisition device while the audio acquisition device is external to the display device, or the display device may be provided with both the audio acquisition device and the video acquisition device.
Optionally, the audio data includes, but is not limited to, personnel communication information in the current scene, such as comment information, personnel conversation information, etc. about the content displayed on the display. For example, the audio data may further include environmental sounds of the current scene (such as wind sounds, audio information output by other electronic devices), voice interaction instruction information of people aiming at the display device in the current scene, and the like. The video data includes environmental data and personnel data of the current scene.
Optionally, the display device may have a screen capturing function, so that during the content display process of the display device, the screen capturing may be performed on the display device, and thus, each captured screen image is the image data of the current display content of the display device.
Optionally, an image acquisition device for acquiring an image of the screen of the display may be further provided, so that in the process of displaying the content of the display, the image acquisition device acquires an image of the screen of the display, so as to obtain image data of the current display content of the display. The image acquisition device can be arranged on the display equipment or can be independent of the display equipment. In an alternative embodiment, the display device further comprises an image acquisition means configured to acquire image data. Correspondingly, the step of acquiring the image data of the current display content of the display in S501 may include acquiring the image data of the current display content of the display acquired by the image acquisition device.
In order to ensure the pertinence and the real-time property of the finally displayed personalized barrage to the watching party in the current scene, the audio data and the video data are acquired in real time by the audio and video acquisition device, and the image data of the current display content of the display are also generated in real time. Based on the above, in the process of displaying the content by the display, under the condition that the bullet screen triggering event is detected, the controller of the display device can receive the audio data and the video data of the current scene where the display device is located, which are acquired by the audio and video acquisition device in real time, and acquire the image data of the current display content of the display, which is generated in real time.
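As a minimal sketch (assuming hypothetical collector and display interfaces that do not appear in the embodiments above), step S501 can be pictured as a request/response exchange followed by a screen capture:

```python
from typing import Tuple

class AVCollector:
    """Stand-in for the audio/video acquisition device interface (hypothetical API)."""
    def request_capture(self) -> Tuple[bytes, bytes]:
        # In a real device this would return the latest audio and video buffers.
        raise NotImplementedError

class Display:
    """Stand-in for the display; screenshot() models the built-in screen-capture function."""
    def screenshot(self) -> bytes:
        raise NotImplementedError

def collect_scene_data(collector: AVCollector, display: Display):
    """Step S501: gather real-time audio, video and current-display image data."""
    audio, video = collector.request_capture()  # data acquisition request + response
    image = display.screenshot()                # image data of the current display content
    return audio, video, image
```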
S502, generating scene description information of the current scene according to the audio data, the video data and the image data.
The scene description information of the current scene is used for describing the environmental characteristics and the character characteristics of the current scene, and the character characteristics comprise at least one of character watching content, character dialogue and character basic attributes.
After the audio data, the video data, and the image data are obtained, they can be processed jointly or individually using a multi-modal large model, domain-specific expert models, and the like, so as to obtain the scene description information of the current scene.
The scene description information is a detailed description of a specific situation or environment, used to determine elements such as what time, what place, what person, what is being done, what the needs or targets are, and how the surrounding environment is. It converts abstract scenes into intelligible, analyzable concrete descriptions through structured or semi-structured information.
Based on this, the scene description information of the current scene is used to describe the environmental features and character features of the current scene. The information describing the environmental features includes information describing characteristics of the environment in which the display device is located, for example, room type (such as a living room or a bedroom), furniture style, lighting brightness, room area, and the like. The information describing the character features includes at least one of information describing the viewing content of the characters, information describing the dialogue of the characters, and information describing the basic attributes of the characters. The information describing the viewing content of the characters includes information describing the content displayed on the display, such as the content type (e.g., games, movies, sports, shopping), changes in the content type, the persons and scenes in the content displayed on the display, search keywords input by the user, and the search result content displayed in response to those keywords. The information describing the dialogue of the characters includes information describing the dialogue speech of persons in the current scene, comment speech of persons in the current scene on the content displayed on the display, man-machine interaction speech between persons in the current scene and the display device, and the like. In some embodiments, besides the viewer, the persons in the current scene may further include persons passing through the current scene, who may converse with the viewer and comment on the content displayed on the display while passing through; the information describing the dialogue of the characters may therefore also include information describing the speech of persons passing through the current scene. The information describing the basic attributes of the characters includes information describing characteristics of the persons themselves in the current scene, such as their number, age, gender, wearing style, and emotion.
Optionally, for the above audio data, an ASR (Automatic Speech Recognition) model may be used to perform speech recognition to obtain speech information such as the evaluation speech of persons, the dialogue speech of persons, and the interaction speech between persons and the display device in the current scene. For the video data, recognition processing such as face recognition, age recognition, emotion recognition, behavior recognition, and environment recognition may be performed using CV (Computer Vision) models to obtain basic character attribute information (such as age, gender, emotion, number, and behavior) and environment attribute information (such as home furnishings, decoration style, brightness information, and scene area) of the current scene. For the above image data, models such as a CNN (Convolutional Neural Network) model and a TSN (Temporal Segment Network) model may be used to identify the content, persons, and the like in the image data to obtain viewing content information, for example, content type (such as games, movies, sports, shopping), fine-grained classification under a specific type (such as martial arts, romance, action, or period drama under the movie type), changes in content type, dwell time under each type, and the like. The scene description information of the current scene can then be comprehensively generated by combining the various kinds of identified information.
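For illustration only, the following sketch shows how the outputs of such single-modality recognizers might be merged into structured scene description information; the recognizer interfaces (asr_model, cv_model, content_model) and the dictionary keys are assumptions rather than a definitive implementation:

```python
def build_scene_description(audio: bytes, video: bytes, image: bytes,
                            asr_model, cv_model, content_model) -> dict:
    """Combine single-modality recognizers into structured scene description information.

    asr_model, cv_model and content_model are placeholders for an ASR model,
    a computer-vision model and a content-classification model respectively.
    """
    speech = asr_model.transcribe(audio)            # dialogue / comments / voice commands
    people, environment = cv_model.analyze(video)   # person attributes and environment attributes
    viewing = content_model.classify(image)         # content type, persons and scenes on screen

    return {
        "environment": environment,        # e.g. room type, lighting, furnishing style
        "person_attributes": people,       # e.g. number, age, gender, emotion
        "person_dialogue": speech,         # recognized speech in the current scene
        "viewing_content": viewing,        # e.g. genre, fine-grained category, dwell time
    }
```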
S503, extracting real-time personalized portraits of the viewers in the current scene from the scene description information.
The viewers in the current scene can be all viewers in the current scene, or can be part of viewers in the current scene.
Alternatively, when the viewer in the current scene is a part of the viewers in the current scene, the biological characteristics of the viewers in the current scene may be determined according to the audio data and/or the video data, so that the registered person in the current scene is determined as the viewer in the current scene by matching the biological characteristics of the viewers with the biological characteristics of the registered person.
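A minimal sketch of this biometric matching, assuming that face or voice embeddings have already been extracted from the audio/video data and that a cosine-similarity threshold of 0.7 is used (both assumptions, not values given in the text):

```python
import numpy as np

def match_registered_viewers(scene_embeddings: list[np.ndarray],
                             registered: dict[str, np.ndarray],
                             threshold: float = 0.7) -> list[str]:
    """Return registered person IDs whose biometric embedding matches someone in the scene.

    scene_embeddings: feature vectors extracted from the audio and/or video data.
    registered: mapping from registered person ID to stored embedding.
    threshold: cosine-similarity cutoff (an assumed value, tuned in practice).
    """
    matched = []
    for person_id, ref in registered.items():
        for emb in scene_embeddings:
            sim = float(np.dot(emb, ref) / (np.linalg.norm(emb) * np.linalg.norm(ref)))
            if sim >= threshold:
                matched.append(person_id)
                break
    return matched
```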
A personalized portrait (Personalized Portrait) refers to a profile summary that can accurately describe the characteristics, behaviors, preferences, needs, and the like of an individual, obtained by collecting and analyzing multi-dimensional data about the individual.
As described above, the audio data, the video data and the image data may be collected in real time in the current scene where the display device is located, so that the real-time personalized portrait of the viewer in the current scene may be understood as a real-time accurate description extracted to reflect the current state, behavior, preference, requirement, and other attributes of the viewer in the current scene based on the audio data, the video data and the image data collected in real time in the current scene. That is, the real-time personalized portrayal of the viewer in the current scene may accurately describe the personalized features of the viewer in the current scene in real-time.
Thus, after the scene description information of the current scene is generated according to the audio data, the video data and the image data, the real-time personalized portrait of the viewer in the current scene can be extracted from the scene description information.
The scene description information of the current scene may include multi-dimensional information for describing the environmental features and character features of the current scene. Optionally, the multi-dimensional information includes viewing person information (scene description information describing basic attributes of persons), viewing environment information (scene description information describing characteristics of the environment), viewing content information (scene description information describing the viewing content of persons), and viewing dialogue information (scene description information describing the dialogue of persons). Of course, the multi-dimensional information may also include information of other dimensions, which is not particularly limited; for example, device information of the display device, such as model, brand, and memory size, may also be extracted and included as part of the description information of the current scene.
Furthermore, the personalized information of four dimensions of the viewing personnel information, the viewing environment information, the viewing content information and the viewing dialogue information can be extracted from the scene description information, so that the real-time personalized portraits of the viewing parties in the current scene can be obtained.
For example, the number of viewers, the age of the viewers, the gender of the viewers, the wearing characteristics of the viewers (such as fashion, elegance, lovely, etc.), and the emotion of the viewers are extracted as personalized information in the viewer information dimension, the room category (such as living room, bedroom, etc.), the light brightness, the home style, and the scene area are extracted as personalized information in the viewing environment dimension, the viewing content, the search content of the viewers, the application program used by the viewers, the person displayed by the display focused by the viewers, and the object displayed by the display focused by the viewers are extracted as personalized information in the viewing content information dimension, and the discussion content of the viewers, the dialogue content between the viewers, and the man-machine interaction content of the viewers and the display device are extracted as personalized information in the viewing dialogue information dimension. The extracted personalized information in each dimension is used as a part of the real-time personalized portrait of the viewer in the current scene, so that the real-time personalized portrait of the viewer in the current scene is obtained.
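As a sketch of this grouping step (the dictionary keys follow the scene-description sketch above and are assumptions), the four portrait dimensions could be assembled as follows:

```python
def extract_realtime_portrait(scene: dict) -> dict:
    """Group scene description information into the four portrait dimensions."""
    return {
        "viewer_info": {                      # personalized info in the viewer dimension
            "count": scene["person_attributes"].get("count"),
            "ages": scene["person_attributes"].get("ages"),
            "genders": scene["person_attributes"].get("genders"),
            "emotions": scene["person_attributes"].get("emotions"),
            "wearing_style": scene["person_attributes"].get("wearing_style"),
        },
        "viewing_environment": {              # room type, lighting, furnishing style, area
            k: scene["environment"].get(k)
            for k in ("room_type", "brightness", "furnishing_style", "area")
        },
        "viewing_content": scene["viewing_content"],    # genre, focused persons/objects, searches
        "viewing_dialogue": scene["person_dialogue"],   # discussion, conversation, voice commands
    }
```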
S504, determining a personalized barrage according to the real-time personalized portraits of the viewers in the current scene.
The personalized barrage refers to barrage content which is dynamically generated or recommended for different users according to personal characteristics, preferences, behavior habits, real-time scenes and other data of the users.
As mentioned above, the real-time personalized portrait of the viewer in the current scene may describe the real-time personalized features of the viewer in the current scene, so the personalized barrage may be determined according to the real-time personalized portrait of the viewer in the current scene, so as to meet the barrage requirement of the viewer in the current scene and realize accurate barrage display for the viewer in the current scene.
Optionally, in the case that the content displayed by the display has bullet screen resources, a content tag may be added to each bullet screen resource according to the content of the bullet screen resource, so that the bullet screen resource with the content tag matched with the real-time personalized portrait of the viewer in the current scene may be determined as the personalized bullet screen.
Alternatively, a barrage whose content conforms to the real-time personalized portrait of the viewer in the current scene may be generated as the personalized barrage according to the information of each dimension represented by that portrait, for example, by means of AIGC (Artificial Intelligence Generated Content). In particular, when the content displayed by the display has no barrage resources or only a few barrage resources, a barrage whose content conforms to the real-time personalized portrait of the viewer in the current scene may be generated as the personalized barrage according to the information of each dimension represented by that portrait.
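For illustration only, the following sketch combines the two options above: screen tag-matched barrages from an existing pool and, if too few match, fall back to generating barrages from the portrait. The barrage-entry structure, the matching rule, and the generator interface are assumptions:

```python
def choose_personalized_barrage(portrait: dict, barrage_pool: list[dict],
                                generator=None, min_matches: int = 5) -> list[str]:
    """Pick existing barrages whose content tags match the portrait, or generate new ones.

    Each barrage_pool entry is assumed to look like {"text": str, "tags": set[str]}.
    generator is a placeholder for an AIGC text-generation model.
    """
    interests = set(portrait.get("viewing_content", {}).get("interest_tags", []))
    matched = [b["text"] for b in barrage_pool if b["tags"] & interests]
    if len(matched) >= min_matches or generator is None:
        return matched
    # Too few (or no) existing barrages: fall back to generating barrages from the portrait.
    prompt = f"Write short barrage comments matching this viewer profile: {portrait}"
    return matched + generator.generate(prompt)
```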
S505, controlling the display to display the personalized barrage in a superposition mode.
After the personalized barrage is determined, the personalized barrage can be controlled to be displayed on the display in the process of displaying the content, and then in the process of displaying the content on the display, a viewer in the current scene can view the content which is displayed by the display and is originally to be displayed, and can view the personalized barrage aiming at the viewer in the current scene, so that the viewing experience of the viewer in the current scene is improved.
In an optional embodiment, in the process of displaying content on the display, in the case that a bullet screen triggering event is detected, the above steps S501 to S505 may be executed periodically, so that a personalized bullet screen can be continuously displayed in a superimposed manner on the display according to a set period, thereby meeting the viewing requirements of the viewer in the current scene and improving the viewing experience of the viewer in the current scene.
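A minimal sketch of this periodic execution, assuming a hypothetical controller object exposing one method per step and an assumed 10-second period (the text does not specify a period):

```python
import time

def barrage_display_loop(controller, period_seconds: float = 10.0):
    """Periodically run steps S501-S505 while the display is showing content."""
    while controller.display_is_showing_content():
        if controller.barrage_trigger_detected():
            audio, video, image = controller.collect_scene_data()                   # S501
            scene = controller.generate_scene_description(audio, video, image)      # S502
            portrait = controller.extract_realtime_portrait(scene)                  # S503
            barrages = controller.determine_personalized_barrage(portrait)          # S504
            controller.overlay_barrages(barrages)                                   # S505
        time.sleep(period_seconds)
```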
In the above embodiment, in the process of displaying content on the display of the display device, the controller of the display device may, upon detecting a bullet screen triggering event, receive the audio data and the video data of the current scene where the display device is located as acquired by the audio and video acquisition device, and acquire the image data of the current display content of the display. Scene description information of the current scene is then generated according to the audio data, the video data, and the image data; the scene description information is used to describe the environmental features and character features of the current scene, and the character features include at least one of character viewing content, character dialogue, and character basic attributes, so that the real-time personalized portrait of the viewer in the current scene can be extracted from the scene description information. The personalized barrage can further be determined according to the real-time personalized portrait of the viewer in the current scene, and the display is controlled to display the determined personalized barrage in a superimposed manner. Thus, on the one hand, the audio data and the video data acquired by the audio and video acquisition device can reflect the environmental features of the current scene and the character features of the persons in it, so the real-time personalized portrait constructed from these data refines the personalization granularity based on the use environment of the display device and the characteristics of its users, making up for the fine-grained personalized features that cannot be perceived from a login account alone. On the other hand, the personalized barrage is determined according to the real-time personalized portrait, which is in turn determined from the audio data and the video data of the current scene and the current display content of the display, so the real-time performance and pertinence of the determined personalized barrage can be improved. Furthermore, the scheme expands the traditional login-account-based scope of personalization, realizes behavior perception across APPs (Applications) and signal sources without requiring any action from the user, and performs personalized bullet screen display for the viewer in the current scene on a per-display-device basis during content display, thereby further refining the personalization granularity of the bullet screen and improving the accuracy and pertinence of the displayed bullet screens.
On the basis of the above embodiment, in an exemplary embodiment, the generation of the scene description information in S502 is further refined. The generating manner of the scene description information may be that audio data, video data and image data are input into a multi-modal large model to obtain scene description information of the current scene.
In this embodiment, a multi-modal large model (such as a CLIP (Contrastive Language-Image Pre-training) model) is an artificial intelligent model capable of processing and understanding information of multiple different modalities at the same time, where the multiple different modalities may include text, image, audio, video, and so on. Therefore, the audio data, the video data and the image data can be input into the multi-mode large model to obtain the description information about the scene obtained by processing and identifying various data by the multi-mode large model, and the description information is used as the scene description information of the current scene.
The multi-mode large model can convert data of different modes into data features in the same semantic space through mode alignment. The semantic space is used for describing semantic relations and structural representations of information such as symbols, languages, images and sounds, and the like, and the abstract semantics are converted into a computable spatial structure by means of mathematical modeling, so that a computer can understand and process semantic association of human language or multi-modal data. Therefore, the multi-modal large model can be aligned through modes, so that data of different modes can be accurately mapped, associated and interacted, differences of the data of different modes in the representation form are eliminated, and cross-modal semantic consistency is established, so that cross-modal tasks are completed.
Based on this, in an alternative embodiment, the multi-modal large model may include a modal alignment module and a semantic recognition module, and the inputting the audio data, the video data and the image data into the multi-modal large model to obtain the scene description information of the current scene may include inputting the audio data, the video data and the image data into the modal alignment module to perform modal alignment to obtain data features of the audio data, the video data and the image data in the same semantic space, and inputting the data features into the semantic recognition module to perform semantic recognition to obtain the scene description information of the current scene.
In this embodiment, the modal alignment module in the multi-modal large model is configured to perform modal alignment on input data of different modalities, so as to convert the input data of different modalities into data features in the same semantic space. Therefore, the audio data, the video data, and the image data can be input into the modal alignment module of the multi-modal large model for modal alignment, and the modal alignment module can output the converted feature data of the audio data, the video data, and the image data in the same semantic space.
And the semantic recognition module of the multi-mode large model can recognize the deep semantic information implicit in the feature data obtained by the conversion so as to extract information such as meaning, emotion, scene, logic relation and the like expressed by the audio data, the video data and the image data and obtain scene description information of the current scene reflected by the audio data, the video data and the image data.
Optionally, the mode alignment module in the multi-mode large model may be trained by using audio data, video data and image data of a historical scene where the display device is located as input and using description information of the audio data, the video data and the image data of the historical scene where the display device is located in the same semantic space as a label, and the semantic recognition module of the multi-mode large model may be trained by using the audio data, the video data and the image data of the historical scene where the display device is located as input and using scene description information of the historical scene where the display device is located as a label. So far, in the case of inputting the audio data, video data and image data of the current scene to the multi-modal large model obtained by training, the scene description information of the current scene output by the multi-modal large model may be obtained.
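For illustration only, the following PyTorch sketch shows one possible minimal realization of a modal alignment module (per-modality projections into a shared semantic space) and a semantic recognition module (a small transformer fuser). A real multi-modal large model would use pretrained encoders and a language decoder; the dimensions and layer choices here are assumptions:

```python
import torch
import torch.nn as nn

class ModalAlignment(nn.Module):
    """Project audio, video and image features into one shared semantic space."""
    def __init__(self, audio_dim: int, video_dim: int, image_dim: int, shared_dim: int = 512):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.video_proj = nn.Linear(video_dim, shared_dim)
        self.image_proj = nn.Linear(image_dim, shared_dim)

    def forward(self, audio_feat, video_feat, image_feat):
        # Stack the three aligned modalities: (batch, 3, shared_dim)
        return torch.stack([
            self.audio_proj(audio_feat),
            self.video_proj(video_feat),
            self.image_proj(image_feat),
        ], dim=1)

class SemanticRecognition(nn.Module):
    """Fuse aligned features and map them to a scene-description representation."""
    def __init__(self, shared_dim: int = 512, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=shared_dim, nhead=8, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(shared_dim, shared_dim)  # fed to a text decoder in practice

    def forward(self, aligned):
        fused = self.fuser(aligned).mean(dim=1)  # pool over the three modalities
        return self.head(fused)
```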
In this embodiment, the scene description information of the current scene is generated by using the multi-mode large model, and the scene information represented by the data of different modes such as audio data, video data and image data can be fused, so that the limitation of a single mode is broken through, the generated scene information of the current scene is richer, more comprehensive and more accurate, the data of different modes can be mutually supplemented, and the robustness of the obtained scene description information of the current scene can be improved. And the accuracy and pertinence of the real-time personalized portrait of the viewer in the current scene can be further improved, and the accuracy and pertinence of the displayed barrage are further improved.
Based on the above embodiments, in an exemplary embodiment, the extraction of the real-time personalized image in S503 is further refined. The extraction mode of the real-time personalized portrait can be that historical operation data of a viewer in a current scene on the display device is extracted from local storage data of the display device, and the real-time personalized portrait of the viewer in the current scene is extracted from scene description information and the historical operation data.
In this embodiment, it may be understood that, in the process of displaying content on the display, a person in the current scene may perform various operations on the display device through various modes such as a voice command, an operation pointing remote controller, a touch screen operation, and the like. For example, switching of the displayed content by controlling the display through a voice instruction, opening of an application by operating the pointing remote controller, touch screen operation in the display to circle a facial image of an actor, movement of a cursor around a facial contour of the actor in the display by operating the pointing remote controller, and the like. Accordingly, in the locally stored data of the display device, historical operation data of the viewer to the display device in the current scene, for example, an operation log of the display device, may be recorded.
It will be appreciated that the historical operational data of the viewer with respect to the display device in the current scene may also reflect the personalized features of the viewer in the current scene to some extent, e.g., the historical operational data of frequently opening an application may reflect a higher degree of interest in the function of the application by the viewer in the current scene, while the historical operational data controlling movement of the facial contour of the actor in the display directed to the remote control cursor may reflect a higher degree of interest in the actor by the viewer in the current scene, etc. Therefore, the historical operation data of the viewer in the current scene on the display device can participate in the extraction process of the personalized image of the viewer in the current scene.
Based on the above, after generating the scene description information of the current scene, the historical operation data of the viewer in the current scene on the display device can be extracted from the local storage data of the display device, and then the real-time personalized portrait of the viewer in the current scene is extracted from the scene description information and the historical operation data.
Alternatively, viewing content characteristic information for describing viewing content of a person may be extracted from the above-described scene description information and history operation data, respectively, and the viewing content characteristic information extracted from the above-described scene description information may be adjusted using the viewing content characteristic information extracted from the history operation data, for example, weight adjustment for the degree of attention of different content types, increase or decrease of a person having a higher degree of attention, or the like, to thereby obtain adjusted viewing content characteristic information as an image of viewing content information dimension in a personalized image, and viewing person information, viewing environment information, and an image of viewing session information dimension may be extracted from the above-described scene description information. Thus, the real-time personalized portrait of the viewer in the current scene is obtained.
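As a sketch of the weight-adjustment idea (the blending factor and the structure of the operation log are assumptions), content-type interest extracted from the scene description could be corrected with the historical operation data as follows:

```python
def adjust_content_interest(realtime_interest: dict[str, float],
                            operation_log: list[dict],
                            history_weight: float = 0.3) -> dict[str, float]:
    """Blend real-time content-type interest with interest inferred from historical operations.

    realtime_interest: content type -> attention weight extracted from scene description.
    operation_log: locally stored operation records, each assumed to carry a "content_type".
    history_weight: assumed blending factor for the historical component.
    """
    counts: dict[str, int] = {}
    for record in operation_log:
        ctype = record.get("content_type")
        if ctype:
            counts[ctype] = counts.get(ctype, 0) + 1
    total = sum(counts.values()) or 1
    historical_interest = {k: v / total for k, v in counts.items()}

    merged = {}
    for ctype in set(realtime_interest) | set(historical_interest):
        merged[ctype] = ((1 - history_weight) * realtime_interest.get(ctype, 0.0)
                         + history_weight * historical_interest.get(ctype, 0.0))
    return merged
```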
In this embodiment, in the process of extracting the personalized image of the viewer in the current scene, the historical operation data of the viewer in the current scene on the display device is involved, so that the extracted personalized image of the viewer in the current scene can be enriched, and the accuracy and pertinence of the extracted personalized image of the viewer in the current scene can be further improved by correcting the scene description information of the current scene by the historical operation data of the viewer in the current scene on the display device.
It will be appreciated that the characteristics of the preferences, needs, etc. of a person may change over time, so may their personalized image for a viewer in the current scene. For example, for a home television viewing scene, the preference of a viewer for different actors, preference for different types of display content, and character, wearing style, furniture style, etc. of the viewer may all vary over time, so short-term variations and long-term stability of the personalized features need to be fully considered in determining the personalized image of the viewer.
The short-term changes in the personalized features of the viewer can be reflected by the real-time personalized portrait of the viewer, realizing real-time tracking of preferences that vary over time, while the long-term stability of the personalized features of the viewer can be reflected by the historical personalized portrait of the viewer. The personalized barrage for the viewer can therefore be determined comprehensively from both the real-time personalized portrait and the historical personalized portrait of the viewer.
Based on this, in an exemplary embodiment, the determination of the personalized barrage in S504 is further refined. Alternatively, as shown in fig. 6, the following steps may be included:
S601, performing portrait fusion on the historical personalized portraits and the real-time personalized portraits associated with the watching party to obtain the comprehensive personalized portraits.
Wherein the historical personalized portrayal associated with the viewer includes information describing the viewing content characteristics of the viewer over a preset historical period.
It will be appreciated that the historical personalisation portrayal is a set of user feature tags built by data analysis and algorithmic models based on behavioural data of the user over a period of time. It focuses on the historical behavior patterns of the user to describe the user's long-term stable interests, habits and needs.
Based on this, the historical personalized portraits associated with the viewers include information describing the features of the viewers' viewing content over a preset historical period. That is, the historical personalized portraits associated with the viewers may reflect the long-term stability of the viewers' preferences for viewing content over a preset historical period. Through historical personalized portraits associated with a viewer, content type features (such as games, movies, sports, shopping) of viewing content of interest to the viewer within a preset historical period, fine granularity classification features under specific types (such as martial arts, love, actions, years, etc. under movie types), character features of viewing content of interest (such as stars, actors, athletes, etc.), event features of viewing content of interest (such as entertainment events, etc.) can be extracted.
Optionally, the viewer-associated historical personalized portrayal may further comprise information describing the viewer's underlying character attributes over a preset history period. That is, the historical personalized portraits associated with the viewers may reflect the long-term stability of the viewers' wearing styles, etc. over a preset historical period. Optionally, the historical personalized portrayal associated with the viewer may further include information describing environmental characteristics of the environment in which the display device is located within a preset history period. Namely, the historical personalized portraits associated with the watching party can also reflect the long-term stability of furniture style, light brightness and the like of the environment where the display equipment is located in a preset historical period.
The specific length of the preset history period and the time relationship with the current time may be set according to an empirical value, a test value of multiple tests, a requirement of an actual application scenario, and the like, which is not limited specifically.
After the real-time personalized portraits of the viewers in the current scene are obtained, the historical personalized portraits related to the viewers in the current scene can be further obtained, so that the historical personalized portraits and the real-time personalized portraits are fused to obtain the comprehensive personalized portraits.
The comprehensive personalized image can have long-term characteristics of a viewer in a current scene taking display equipment as a unit and real-time characteristics in a real-time viewing state, so that error jitter caused by accidental situations of the real-time personalized image can be reduced, and the specificity of the real-time personalized image can be enriched.
Optionally, the historical personalized portraits and the real-time personalized portraits can be integrated, the conflict or complementary characteristics are processed by adopting weighted calculation and combining time attenuation factors, and the comprehensive personalized portraits with stability and timeliness are formed, so that the accurate depiction and dynamic adaptation of the multiple demands of the user are realized. For example, the historical personalized portrait and the real-time personalized portrait associated with the viewers in the current scene can comprise information with the same dimension, and different weights can be allocated to the information with the same dimension in the historical personalized portrait and the real-time personalized portrait, so that new information with the same dimension can be obtained by adding the weights of the information with the same dimension in the historical personalized portrait and the real-time personalized portrait, and then the new information with the same dimension is the information with the dimension in the comprehensive personalized portrait, and further the comprehensive personalized portrait is obtained. The weight may be set according to an empirical value, a test value of multiple tests, and a requirement of an actual application scenario, which is not particularly limited.
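A minimal sketch of such weighted fusion with a time-decay factor, assuming an exponential decay with a 30-day half-life and a base historical weight of 0.5 (both assumed values):

```python
def fuse_portraits(historical: dict[str, float], realtime: dict[str, float],
                   days_since_history: float, half_life_days: float = 30.0) -> dict[str, float]:
    """Fuse same-dimension portrait weights with a time-decay factor on the historical side.

    Older historical features contribute less, so real-time changes dominate when the
    history is stale, while a recent history stabilizes the fused portrait.
    """
    decay = 0.5 ** (days_since_history / half_life_days)
    w_hist = 0.5 * decay          # assumed base weight for the historical portrait
    w_real = 1.0 - w_hist
    fused = {}
    for key in set(historical) | set(realtime):
        fused[key] = w_hist * historical.get(key, 0.0) + w_real * realtime.get(key, 0.0)
    return fused
```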
In an alternative embodiment, the historical personalized portrait and the real-time personalized portrait associated with the viewer in the current scene can be used as inputs to a model, so that the model summarizes them into the comprehensive personalized portrait through specific model fine-tuning or prompt engineering.
A pre-trained model is obtained through self-supervised learning on a large amount of public data and has certain emergent capabilities, such as language understanding and logical reasoning. However, its effect on vertical-domain tasks may be poor, so the model needs to be further trained on vertical-domain data on the basis of the pre-trained model; such training is fine-tuning of the model, for example by the Low-Rank Adaptation (LoRA) method. In this embodiment, fine-tuning of the model refers to fine-tuning for vertical-domain tasks such as barrage generation, personalized portrait extraction, personalized portrait updating, and personalized portrait integration.
The term "prompt" refers to a method of guiding an artificial intelligence model (e.g., a large language model) to generate an expected response by designing and optimizing natural language instructions (i.e., "prompt words"). In this embodiment, the artificial intelligence model is guided, through natural language instructions, to generate a comprehensive personalized portrait that meets expectations based on the historical personalized portrait and the real-time personalized portrait associated with the viewer in the current scene.
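By way of illustration only, such a prompt might be assembled as follows; the wording of the instruction, the example portrait descriptions, and the model interface mentioned in the trailing comment are assumptions, not part of this embodiment.

def build_fusion_prompt(historical_desc: str, realtime_desc: str) -> str:
    """Assemble a natural-language instruction (prompt) asking a large
    language model to merge the two portraits into one comprehensive
    personalized portrait. The wording is illustrative only."""
    return (
        "You are given two descriptions of the same viewer.\n"
        f"Long-term historical portrait: {historical_desc}\n"
        f"Real-time portrait in the current scene: {realtime_desc}\n"
        "Merge them into a single comprehensive personalized portrait, "
        "keeping stable long-term preferences and adding current, "
        "scene-specific interests. Output a concise list of preference "
        "dimensions with weights."
    )

prompt = build_fusion_prompt(
    "prefers martial-arts movies and basketball; watches mostly at night",
    "two viewers present, relaxed mood, currently watching a comedy variety show",
)
# The prompt would then be sent to whatever large model the device uses,
# e.g. response = model.generate(prompt)  # hypothetical interface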
S602, when the content displayed by the display is a first type of content, screening a barrage matching the comprehensive personalized portrait from the existing barrages of the content displayed by the display as the personalized barrage.
In general, when a user watches videos on a video website or the like, the user can actively input barrages. Therefore, in the process of displaying content on the display, the displayed content may already have a large number of barrage resources input by other users during viewing, and when the personalized barrage is determined for the viewer in the current scene, the personalized barrage can be screened directly from the existing barrage resources according to the personalized portrait. Conversely, for displayed content that has no or few barrage resources, the personalized barrage cannot be screened directly from existing barrage resources and therefore needs to be generated according to the personalized portrait. Based on this, the content displayed on the display can be classified according to the number of barrage resources. The first type of content is content that has barrage resources meeting a quantity requirement, such as media asset content, provided by a content provider, for which barrage resource access is available. The second type of content is content that does not have barrage resources meeting the quantity requirement, such as live broadcast content, screen-cast content, content provided by a third-party APP, content input from an external signal source via HDMI, and new media asset content, provided by a content provider, for which barrage resource access is available but the barrage resources are few.
Wherein the first type of content is content having bullet screen resources that meet the quantity requirements.
As described above, in the case where the content displayed on the display is the first type of content, a barrage matching the comprehensive personalized portrait can be screened from the existing barrages of the content displayed on the display as the personalized barrage.
For example, a content tag may be added to each existing barrage according to the content of the barrage, so that a barrage whose content tag matches the comprehensive personalized portrait can be determined as the personalized barrage.
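By way of illustration only, this tag-based screening might be sketched as follows, assuming each existing barrage carries a text field and a set of content tags, and the comprehensive personalized portrait exposes a set of interest tags; the field names and threshold are illustrative.

def screen_barrages(existing_barrages: list[dict], portrait_tags: set[str],
                    min_overlap: int = 1) -> list[dict]:
    """Select personalized barrages from existing barrage resources.

    Each barrage is assumed to be a dict with a "text" field and a "tags"
    set describing its content; a barrage is kept when its tags overlap
    the portrait's interest tags. Field names are assumptions."""
    selected = []
    for barrage in existing_barrages:
        overlap = barrage["tags"] & portrait_tags
        if len(overlap) >= min_overlap:
            selected.append(barrage)
    return selected

personalized = screen_barrages(
    [{"text": "That dunk was unreal!", "tags": {"sports", "basketball"}},
     {"text": "The plot twist got me", "tags": {"movies", "suspense"}}],
    portrait_tags={"basketball", "sports"},
)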
S603, when the content displayed by the display is the second type of content, generating a barrage matched with the comprehensive personalized portrait as the personalized barrage.
Wherein the second category of content is content that does not have bullet screen resources that meet the quantity requirements.
Correspondingly, in the case where the content displayed by the display is the second type of content, a barrage matching the comprehensive personalized portrait is generated as the personalized barrage.
For example, a barrage whose content conforms to the comprehensive personalized portrait may be generated in an AIGC (AI-generated content) manner, based on the information of each dimension represented by the comprehensive personalized portrait, and used as the personalized barrage.
In this embodiment, the content displayed on the display is classified according to the number of existing barrage resources, and when the display displays different types of content, different manners are adopted to determine the personalized barrage. Therefore, for displayed content that has no barrage resource access, or that has barrage resource access but few barrage resources, a personalized barrage matching the comprehensive personalized portrait can still be generated, so that personalized barrage display for the viewer in the current scene is realized even when there is no barrage source or there are few barrages. Furthermore, determining the personalized barrage with the comprehensive personalized portrait can reduce the error jitter caused by the contingency of the real-time personalized portrait of the viewer in the current scene, enrich the specificity of the real-time personalized portrait, and further improve the accuracy and pertinence of the displayed barrage.
It will be appreciated that the historical personalized portrait associated with the viewer in the current scene varies over time and may be formed progressively from real-time personalized portraits. That is, the real-time personalized portrait of the viewer in the current scene may contribute to the historical personalized portrait associated with the viewer in a subsequent scene. Therefore, the real-time personalized portrait of the viewer in the current scene may be used to construct a new historical personalized portrait associated with that viewer, for use in generating the comprehensive personalized portrait in a subsequent scene.
Based on this, on the basis of the above embodiments, in an exemplary embodiment, the historical personalized portrait associated with the viewer in the current scene is the historical personalized portrait in a current portrait update period. As shown in fig. 7, the barrage display method may include the following steps:
S701, after the portrait update period preceding the current portrait update period ends, performing portrait fusion on the real-time personalized portraits extracted within a preset historical period before the current portrait update period to obtain the historical personalized portrait in the current portrait update period.
In order to fuse the historical personalized portrait of the viewer in the current scene with the real-time personalized portrait in the current portrait update period to obtain the comprehensive personalized portrait, the historical personalized portrait in the current portrait update period needs to be determined first.
Based on this, after the portrait update period preceding the current portrait update period ends, the real-time personalized portraits extracted within the preset historical period before the current portrait update period can be acquired and fused, and the resulting fused personalized portrait is used as the historical personalized portrait in the current portrait update period.
Correspondingly, after the current portrait update period ends, portrait fusion can be performed on the real-time personalized portraits extracted within the preset historical period before the next portrait update period to obtain the historical personalized portrait in that next portrait update period, so that personalized barrage display for the viewer is realized in the next portrait update period.
In an alternative embodiment, the preset historical period may include a plurality of portrait update periods as a plurality of historical portrait update periods. Thus, the real-time personalized portraits extracted in each historical portrait update period before the current portrait update period can be fused to obtain the historical personalized portrait in the current portrait update period. The number of real-time personalized portraits extracted in each portrait update period may be at least one.
Optionally, after the portrait update period preceding the current portrait update period ends, all real-time personalized portraits extracted in each historical portrait update period before the current portrait update period can be fused to obtain the historical personalized portrait in the current portrait update period.
Optionally, in the case where a plurality of real-time personalized portraits are extracted in each portrait update period, portrait fusion may be performed on only part of the real-time personalized portraits extracted in each historical portrait update period before the current portrait update period, so as to obtain the historical personalized portrait in the current portrait update period. For example, part of the real-time personalized portraits extracted in each historical portrait update period may be selected at random for portrait fusion. For another example, according to the frequency of use of the display device in different time periods of each historical portrait update period, the real-time personalized portraits extracted in the time periods with a higher frequency of use may be selected for portrait fusion, and so on.
In an alternative embodiment, the extracted real-time personalized portraits may be stored in a preset historical portrait database in each portrait update period, so that after a portrait update period ends, the real-time personalized portraits stored within the preset historical period before the next portrait update period are read from the historical portrait database and fused to obtain a new historical personalized portrait as the historical personalized portrait in that next portrait update period. For example, when each portrait update period is 1 day and the preset historical period is 30 days, the historical personalized portrait used each day is obtained by fusing the real-time personalized portraits extracted within the previous 30 days.
Optionally, when, for each portrait update period, the real-time personalized portraits extracted within the preset historical period are fused to determine the historical personalized portrait in the next portrait update period, the real-time personalized portraits extracted within the preset historical period can be weighted in a nonlinear manner, on a principle similar to that of an RLS (Recursive Least Squares) filter: the closer a historical portrait update period is to the current portrait update period, the higher the weight of the real-time personalized portraits extracted in that period. For example, when the preset historical period includes a plurality of historical portrait update periods, weights are first assigned to the information of the same dimension in the real-time personalized portraits extracted in one historical portrait update period, and the information of that dimension in these real-time personalized portraits is weighted and summed to obtain the fused personalized portrait corresponding to that historical portrait update period. A weight is then assigned to each historical portrait update period according to its distance in time from the current portrait update period, the weight being higher for historical portrait update periods closer to the current one. The information of the same dimension in the fused personalized portraits corresponding to the historical portrait update periods is weighted and summed to obtain new information of that dimension, which serves as the information of that dimension in the historical personalized portrait in the current portrait update period. The weights may be set according to empirical values, test values from multiple tests, requirements of the actual application scenario, and the like, which is not particularly limited.
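By way of illustration only, the period-by-period fusion with recency weighting might be sketched as follows; representing each real-time personalized portrait as a dictionary of dimension scores and using a geometric forgetting factor as a stand-in for the RLS-style weighting are assumptions made for the example.

def fuse_history(portraits_by_period: list[list[dict]],
                 period_decay: float = 0.8) -> dict:
    """Build the historical personalized portrait for the current update
    period from real-time portraits collected in earlier periods.

    portraits_by_period is ordered from oldest to newest; the newest
    period receives the largest weight, older periods are attenuated
    geometrically (a simple stand-in for an RLS-style forgetting factor).
    The structure and the decay value are illustrative assumptions."""
    fused: dict[str, float] = {}
    total_weight = 0.0
    n = len(portraits_by_period)
    for idx, period_portraits in enumerate(portraits_by_period):
        if not period_portraits:
            continue
        # Step 1: fuse the real-time portraits within one period
        # (equal weights here; any weighting scheme could be used).
        period_fused: dict[str, float] = {}
        for p in period_portraits:
            for dim, score in p.items():
                period_fused[dim] = period_fused.get(dim, 0.0) + score / len(period_portraits)
        # Step 2: weight the period itself; more recent periods weigh more.
        w = period_decay ** (n - 1 - idx)
        total_weight += w
        for dim, score in period_fused.items():
            fused[dim] = fused.get(dim, 0.0) + w * score
    return {dim: s / total_weight for dim, s in fused.items()} if total_weight else {}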
S702, under the condition that a bullet screen triggering event is detected in the process of displaying the content by the display, receiving the audio data and the video data of the current scene where the display equipment is located, which are acquired by the audio and video acquisition device, and acquiring the image data of the current display content of the display.
S703, generating scene description information of the current scene according to the audio data, the video data and the image data.
S704, extracting real-time personalized portraits of the viewers in the current scene from the scene description information.
The specific implementation manner of S702 to S704 is the same as the specific implementation manner of S501 to S503, and will not be described herein.
S705, performing portrait fusion on the historical personalized portraits and the real-time personalized portraits associated with the watching party to obtain the comprehensive personalized portraits.
S706, when the content displayed by the display is the first type of content, screening a barrage matching the comprehensive personalized portrait from the existing barrages of the content displayed by the display as the personalized barrage.
S707, when the content displayed by the display is the second type of content, generating a barrage matching the comprehensive personalized portrait as the personalized barrage.
The specific implementation of S705 to S707 is the same as the specific implementation of S601 to S603, and will not be described here again.
S708, controlling the display to display the personalized barrage in a superposition mode.
The specific implementation manner of S708 is the same as that of S505, and will not be described here again.
In this embodiment, the real-time personalized portraits extracted within the preset historical period before the current portrait update period are fused to obtain the historical personalized portrait in the current portrait update period. In this way, long-term personalized tracking is maintained while the change of personalization over time is still captured, so that the characteristics of the historical personalized portrait keep matching the viewer as the viewer changes over time, errors in the comprehensive personalized portrait caused by an overly fixed historical personalized portrait are avoided, the accuracy of the comprehensive personalized portrait is improved, and the accuracy and pertinence of the displayed barrage are further improved.
In general, when the viewer in the current scene watches the content displayed on the display, the viewer often wants to see barrages related to that content, for example, evaluation information, introduction information, or question-answering information about the displayed content. Therefore, in determining the personalized barrage, the content displayed by the display may be further considered in addition to the comprehensive personalized portrait of the viewer in the current scene.
Based on this, on the basis of the above embodiments, in an exemplary embodiment, as shown in fig. 8, the barrage display method may include the steps of:
S801, under the condition that a bullet screen triggering event is detected in the process of displaying the content by the display, receiving the audio data and the video data of the current scene where the display equipment is located, which are acquired by the audio and video acquisition device, and acquiring the image data of the current display content of the display.
S802, generating scene description information of a current scene according to the audio data, the video data and the image data.
S803, extracting real-time personalized portraits of the viewers in the current scene from the scene description information.
S804, carrying out portrait fusion on the historical personalized portraits and the real-time personalized portraits associated with the watching party to obtain the comprehensive personalized portraits.
The specific implementation manner of S801 to S804 is the same as the specific implementation manner of S702 to S705, and will not be described here again.
S805, acquiring content analysis information of the content displayed by the display.
Wherein the content analysis information includes information describing at least one of a character, a scene, and a plot in the content displayed on the display.
In the process of displaying the content by the display, the content displayed by the display can be analyzed to obtain content analysis information of the content displayed by the display.
Parsing the content displayed on the display may include performing various parsing operations, such as behavior recognition, object and scene classification, and event detection, on the displayed content, so that character behaviors, scene types, object types, and event information (such as when an event occurs and its causal relationships) in the target content can be obtained. Thus, the content analysis information of the content displayed on the display includes information describing at least one of a character, a scene, and a plot in the displayed content. The information describing the character included in the content analysis information may be information such as the number of characters, the gender of a character, relationships between characters, a character's language, and a character's behavior; the information describing the scene may be information such as the scene type, the scene location, and the scene name; and the information describing the plot may be information such as the plot type, when the plot occurs, the causes of the plot, and causal relationships within the plot.
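By way of illustration only, the content analysis information could be carried in a simple structure such as the following; the field names and example values are assumptions made for the example.

from dataclasses import dataclass, field

@dataclass
class ContentAnalysisInfo:
    """Illustrative container for content analysis information; the exact
    fields and types are assumptions, not mandated by this embodiment."""
    characters: list[dict] = field(default_factory=list)  # e.g. {"gender": ..., "behavior": ...}
    scene: dict = field(default_factory=dict)             # e.g. {"type": ..., "location": ...}
    plot: dict = field(default_factory=dict)               # e.g. {"type": ..., "causes": [...]}

info = ContentAnalysisInfo(
    characters=[{"gender": "female", "behavior": "singing"}],
    scene={"type": "concert", "location": "stadium"},
    plot={"type": "performance", "causes": ["season finale"]},
)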
Optionally, in the process of displaying content on the display, the displayed content may be cached according to a preset caching period, and after each caching period ends, the cached content is parsed to obtain the content analysis information. For example, when the cached content is parsed, the content cached within the current preset caching period may be parsed, all the currently cached target content may be parsed, or the content cached within a preset duration range ending at the current time may be parsed; any of these is reasonable.
In addition, as described above, the content displayed on the display may be one of a plurality of kinds of content, such as live broadcast content, screen-cast content, content provided by a third-party APP, content input from an external signal source via HDMI, and content provided by a content provider. Depending on the source of the displayed content, whether the complete content to which the currently displayed content belongs can be obtained during display differs. For example, in the case where the content displayed on the display is live broadcast content, screen-cast content, content provided by a third-party APP, or content input from an external signal source via HDMI, the complete content to which the displayed content belongs may not be obtainable during display, whereas in the case where the content displayed on the display is content provided by a content provider, the complete content to which the displayed content belongs can be obtained during display. Therefore, different manners can be adopted to acquire the content analysis information according to the source of the content displayed by the display.
Based on this, in an alternative embodiment, the obtaining of the content analysis information in S805 is further limited, and S805 may include the following steps:
1. When the content displayed by the display is live broadcast content, caching the content displayed by the display according to a preset caching period in the process of displaying the content, and parsing the cached content after each caching period ends to obtain the content analysis information.
As described above, for live broadcast content, the complete content of the live broadcast cannot be obtained during display; only the content already displayed can be obtained. Therefore, the content displayed on the display can be cached according to a preset caching period (such as 10 seconds) in the process of displaying the content, and after each caching period ends, the cached content is parsed to obtain the content analysis information (a minimal sketch of this cache-and-parse loop is given after case 2 below).
Optionally, for screen-cast content, content provided by a third-party APP, content input from an external signal source via HDMI, and other content for which the complete content to which the displayed content belongs cannot be obtained during display, the content displayed by the display can likewise be cached according to the preset caching period during display, and the cached content is parsed after each caching period ends to obtain the content analysis information.
Optionally, when the cached content is parsed, the content cached within the current preset caching period may be parsed, all the currently cached content may be parsed, or the content cached within a preset duration range ending at the current time may be parsed; any of these is reasonable.
In addition, the preset caching period may be set according to the specific content displayed by the display, the estimated total display duration, the viewing habits of the viewer, and the like, which is not particularly limited.
2. When the content displayed by the display is on-demand content, acquiring the content analysis information of the content displayed by the display from the server.
The on-demand content refers to display content provided by a content provider for selection by a user; in general, the complete content to which the on-demand content belongs can be obtained before or during its display, for example, television series, movies, and variety shows provided by video websites.
Accordingly, in the case where the content displayed by the display is on-demand content, the complete content to which the on-demand content belongs can be obtained during display. Therefore, although the display is not showing the complete content at this time, the complete content can be obtained directly, and the content analysis information of the displayed content can be obtained directly from the server.
Optionally, the complete content to which the on-demand content belongs may be obtained from the server, and then the obtained complete content is parsed to obtain content parsing information.
Optionally, the server may store in advance the content analysis information of the complete content to which the on-demand content belongs, so that the content analysis information can be obtained directly from the server without performing the parsing operation locally.
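By way of illustration only, the cache-and-parse loop mentioned in case 1 above might be sketched as follows; the segment source, the parsing callable, and the 10-second default period are placeholders for whatever the display device actually uses.

import time

def cache_and_parse(segment_source, parse_segments, cache_period_s: float = 10.0):
    """Cache live display content for a preset caching period, then parse
    the cached segments to produce content analysis information.

    segment_source is assumed to be an iterable of content segments
    (frames or short clips) and parse_segments a callable performing
    behavior recognition, scene classification, etc.; both are placeholders."""
    cache = []
    period_start = time.monotonic()
    for segment in segment_source:
        cache.append(segment)
        if time.monotonic() - period_start >= cache_period_s:
            # One caching period has ended: analyze what was cached.
            analysis_info = parse_segments(cache)
            yield analysis_info
            cache.clear()
            period_start = time.monotonic()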
S806, when the content displayed by the display is the first type of content, screening a barrage matching both the content analysis information and the comprehensive personalized portrait from the existing barrages of the content displayed by the display as the personalized barrage.
After the content analysis information is obtained, and given that, as described above, the first type of content is content that has barrage resources meeting the quantity requirement, when the content displayed by the display is the first type of content, a barrage matching both the content analysis information and the comprehensive personalized portrait can be screened directly from the existing barrages of the displayed content and used as the personalized barrage.
For example, a content tag may be added to each existing barrage according to the content of the barrage, so that a barrage whose content tag matches both the content analysis information and the comprehensive personalized portrait can be determined as the personalized barrage.
S807, when the content displayed on the display is the second type of content, generating a barrage matching both the content analysis information and the comprehensive personalized portrait as the personalized barrage.
Correspondingly, since the second type of content is content that does not have barrage resources meeting the quantity requirement, in the case where the content displayed by the display is the second type of content, a barrage matching both the content analysis information and the comprehensive personalized portrait can be generated as the personalized barrage.
For example, the content analysis information and the comprehensive personalized portrait can be fused to obtain fused information that describes the characteristics the barrage to be displayed needs to match, so that a barrage whose content conforms to both the content analysis information and the comprehensive personalized portrait is generated in an AIGC manner according to the barrage characteristics represented by the fused information, and used as the personalized barrage.
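By way of illustration only, fusing the content analysis information with the comprehensive personalized portrait into a single generation request might look as follows; the field names, the number of barrages requested, and the wording are assumptions, and the resulting prompt would be handed to whichever AIGC model the device uses.

def build_barrage_prompt(content_info: dict, portrait: dict, n: int = 3) -> str:
    """Compose a generation request that fuses content analysis information
    with the comprehensive personalized portrait; the result would be passed
    to an AIGC model. Field names and wording are illustrative."""
    top_interests = sorted(portrait, key=portrait.get, reverse=True)[:3]
    return (
        f"The screen currently shows: {content_info.get('scene', 'unknown scene')} "
        f"with characters {content_info.get('characters', [])} and plot "
        f"'{content_info.get('plot', '')}'.\n"
        f"The viewer cares most about: {', '.join(top_interests)}.\n"
        f"Write {n} short, natural barrage comments that match both the "
        "on-screen content and these interests."
    )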
S808, controlling the display to display the personalized barrage in a superposition mode.
The specific implementation manner of S808 is the same as that of S505, and will not be described herein.
In this embodiment, by acquiring the content analysis information of the target content, the content analysis information is involved in determining the personalized barrage. In this way, while the determined personalized barrage conforms to the comprehensive personalized portrait of the viewer in the current scene, its degree of matching with the content displayed by the display is also improved, so that the overall degree of matching of the personalized barrage with both the viewer and the displayed content in the current scene is improved. On the basis of further improving the accuracy and pertinence of the displayed barrage, the interest of the displayed barrage and its attraction to the viewer are increased, further improving the viewer's viewing experience.
It will be appreciated that, for a scene whose members are relatively fixed, the persons in the current scene where the display device is located are typically fixed during content display, so the determined personalized portraits, such as the real-time personalized portrait, the historical personalized portrait, and the comprehensive personalized portrait, are all directed at the fixed members of the scene; that is, personalized portrait determination and personalized barrage display are performed with the fixed members of the scene taken as a whole. For example, for a family, it is common to determine the personalized portrait by taking the family members as a whole and then perform personalized barrage display accordingly.
However, in some cases, during content display, there may be occasional persons in the current scene where the display device is located, or not all of the fixed members may be present. To prevent the accidental personalized portrait of an occasional person from affecting the real-time personalized portrait of the fixed members of the scene, and in turn affecting the historical personalized portrait of the fixed members, which would bias subsequent personalized barrage display for the fixed members and reduce its accuracy and pertinence, and in order to improve the accuracy and pertinence of the personalized barrage for the fixed members present in the current scene, the persons in the current scene who belong to the fixed members can be identified, so that personalized portrait determination and personalized barrage display are performed for those fixed members.
Based on this, on the basis of the above embodiments, in an exemplary embodiment, as shown in fig. 9, the barrage display method may include the steps of:
S901, under the condition that a bullet screen triggering event is detected in the process of displaying the content by the display, receiving the audio data and the video data of the current scene where the display equipment is located, which are acquired by the audio and video acquisition device, and acquiring the image data of the current display content of the display.
S902, generating scene description information of the current scene according to the audio data, the video data and the image data.
The specific implementation manner of S901-S902 is the same as the specific implementation manner of S501-S502, and will not be described herein.
S903, determining the biological characteristics of the person in the current scene according to the audio data and/or the video data.
Wherein the biological feature is a physiological attribute feature used to uniquely identify a person, such as voiceprint information or facial image information. Voiceprint information of each person in the current scene can be obtained by performing voiceprint recognition on the audio data, and facial image information of each person in the current scene can be obtained by performing facial image recognition on the video data.
Alternatively, voiceprint information of each person in the current scene may be determined from the audio data as a biometric feature of each person in the current scene.
Alternatively, face image information of each person in the current scene may be determined from the video data as the biometric feature of each person in the current scene.
Alternatively, voiceprint information of each person in the current scene may be determined according to the audio data, and face image information of each person in the current scene may be determined according to the video data, so that the voiceprint information and the face image information are used as biological features of each person in the current scene.
S904, matching the biological characteristics with the biological characteristics of the preset person to obtain a target person belonging to the preset person in the current scene.
According to the membership of the scene where the display device is located, preset persons who can serve as the viewer can be set in advance; for example, for a family, the family members can be set as the preset persons, and the biological features of the preset persons can be collected and stored in advance. Thus, the biological features of each person in the current scene can be matched against the biological features of the preset persons to obtain the target persons, in the current scene, who belong to the preset persons.
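By way of illustration only, the matching of biological features against the preset persons might be sketched as follows; representing voiceprints or facial features as embedding vectors and comparing them by cosine similarity is one common choice, not a requirement of this embodiment.

import numpy as np

def match_preset_persons(scene_features: dict[str, np.ndarray],
                         preset_features: dict[str, np.ndarray],
                         threshold: float = 0.8) -> list[str]:
    """Match biological features (e.g. voiceprint or face embeddings) of
    persons detected in the current scene against pre-stored features of
    preset persons, returning the preset persons found in the scene.

    The embedding representation and the threshold are assumptions."""
    targets = []
    for preset_name, preset_vec in preset_features.items():
        for _, scene_vec in scene_features.items():
            cos = float(np.dot(preset_vec, scene_vec) /
                        (np.linalg.norm(preset_vec) * np.linalg.norm(scene_vec)))
            if cos >= threshold:
                targets.append(preset_name)
                break
    return targets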
S905, determining the target person, or a person among the target persons who matches the content displayed on the display, as the viewer.
Alternatively, after the target person is determined, the target person may be determined directly as the viewer.
Optionally, after the target persons are determined, a person who is currently paying attention to the displayed content, or who pays more attention to the content displayed by the display, may be selected from the target persons according to specific information of the displayed content. Such a person is more likely to care whether the barrage displayed by the display matches his or her own preferences and needs, so personalized barrage display can be performed for the selected person; accordingly, the person among the target persons who matches the content displayed by the display can be determined as the viewer.
For example, the determined target character includes character a and character B, and character a is more focused on the variety program and character B is more focused on the news content, and the content displayed on the display is the variety program, character a may be determined as the viewer.
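By way of illustration only, the selection of the viewer among the target persons in the example above might be sketched as follows; the per-person preference maps are assumptions made for the example.

def select_viewer(target_persons: dict[str, dict], content_category: str) -> str:
    """Pick, among the target persons, the one whose preference for the
    currently displayed content category is highest (character A versus
    character B in the example above). Preference maps are illustrative."""
    return max(target_persons,
               key=lambda name: target_persons[name].get(content_category, 0.0))

viewer = select_viewer(
    {"character_A": {"variety": 0.9, "news": 0.1},
     "character_B": {"variety": 0.2, "news": 0.8}},
    content_category="variety",
)
# viewer == "character_A"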
S906, extracting real-time personalized portraits of the viewers in the current scene from the scene description information.
S907, determining a personalized barrage according to the real-time personalized portraits of the viewers in the current scene.
S908, controlling the display to display the personalized barrage in a superposition mode.
The specific implementation of S906-S908 is the same as the specific implementation of S503-S505, and will not be described here again.
Optionally, in this embodiment, a historical personalized portrait for all preset persons and a historical personalized portrait for each individual preset person may both exist in the current portrait update period. Accordingly, after the portrait update period preceding the current portrait update period ends, portrait fusion can be performed on the real-time personalized portraits extracted within the preset historical period before the current portrait update period, so as to obtain both the historical personalized portrait for all preset persons and the historical personalized portrait for each preset person in the current portrait update period.
In this embodiment, personalized barrage display can be performed for specific persons in the current scene, so that the granularity of barrage personalization is further refined and the accuracy and pertinence of the displayed barrage are improved.
Based on the above embodiments, in an exemplary embodiment, as shown in fig. 10, the bullet screen display method may include the steps of:
S1001, after the portrait update period preceding the current portrait update period ends, the controller performs portrait fusion on the real-time personalized portraits extracted within the preset historical period before the current portrait update period to obtain the historical personalized portrait in the current portrait update period.
S1002, an audio and video acquisition device acquires audio data and video data of a current scene where display equipment is located.
S1003, in the process of displaying the content by the display, the image acquisition device acquires image data of the content currently displayed by the display.
S1004, in the case where a bullet screen triggering event is detected, the controller sends data acquisition requests to the audio and video acquisition device and the image acquisition device respectively.
S1005, the audio and video acquisition device sends the audio data and the video data to the controller.
S1006, the image acquisition device acquires and sends image data to the controller.
S1007, the controller generates scene description information of the current scene from the audio data, the video data, and the image data.
S1008, the controller extracts the real-time personalized portrait of the viewer in the current scene from the scene description information.
S1009, the controller performs portrait fusion on the historical personalized portrait and the real-time personalized portrait associated with the viewer to obtain the comprehensive personalized portrait.
S1010, in the case where the content displayed by the display is live broadcast content, the controller caches the content displayed by the display according to the preset caching period and parses the cached content after each caching period ends to obtain the content analysis information.
S1011, in the case where the content displayed by the display is on-demand content, the controller acquires the content analysis information of the content displayed by the display from the server.
S1012, when the content displayed by the display is the first type of content, the controller screens a barrage matching both the content analysis information and the comprehensive personalized portrait from the existing barrages of the displayed content as the personalized barrage.
S1013, when the content displayed by the display is the second type of content, the controller generates a barrage matching both the content analysis information and the comprehensive personalized portrait as the personalized barrage.
S1014, the controller sends the personalized barrage to the display.
S1015, the display displays the personalized barrage in a superposition mode.
The specific implementation manner of S1001 to S1015 is the same as that of the foregoing embodiments, and will not be described herein.
Based on the same inventive concept, some embodiments also provide a barrage display apparatus for implementing the barrage display method related to the above. The implementation of the solution provided by the device is similar to that described in the above method, so the specific limitation of one or more embodiments of the barrage display device provided below may be referred to above for limitation of the barrage display method, and will not be repeated here.
In an exemplary embodiment, as shown in fig. 11, a barrage display apparatus is provided, which is applied to a display device and includes a data acquisition module 1110, an information generation module 1120, a portrait extraction module 1130, a barrage determination module 1140, and a barrage display module 1150.
The data acquisition module 1110 is configured to receive, when a bullet screen triggering event is detected during content display performed by a display of the display device, audio data and video data of a current scene where the display device is located, which are acquired by the audio and video acquisition device, and acquire image data of a current display content of the display;
The information generating module 1120 is configured to generate scene description information of a current scene according to the audio data, the video data and the image data, where the scene description information is used to describe an environmental feature and a character feature of the current scene;
The portrait extraction module 1130 is used for extracting a real-time personalized portrait of a viewer in the current scene from the scene description information;
The barrage determination module 1140 is configured to determine a personalized barrage according to the real-time personalized portrait of the viewer in the current scene;
The barrage display module 1150 is configured to control the display to display the personalized barrage in a superposition mode.
In an exemplary embodiment, the barrage determination module 1140 includes a portrait fusion unit, a first determination unit, and a second determination unit. The portrait fusion unit is configured to fuse the historical personalized portrait associated with the viewer with the real-time personalized portrait to obtain the comprehensive personalized portrait, where the historical personalized portrait includes information describing features of the viewer's viewing content within a preset historical period. The first determination unit is configured to, in the case where the content displayed on the display is a first type of content, screen a barrage matching the comprehensive personalized portrait from the existing barrages of the displayed content as the personalized barrage, where the first type of content is content that has barrage resources meeting a quantity requirement. The second determination unit is configured to, in the case where the content displayed on the display is a second type of content, generate a barrage matching the comprehensive personalized portrait as the personalized barrage, where the second type of content is content that does not have barrage resources meeting the quantity requirement.
In an exemplary embodiment, the historical personalized portrait associated with the viewer is the historical personalized portrait in the current portrait update period, and the barrage display apparatus further includes a portrait update module configured to, after the portrait update period preceding the current portrait update period ends, perform portrait fusion on the real-time personalized portraits extracted within the preset historical period before the current portrait update period to obtain the historical personalized portrait in the current portrait update period.
In an exemplary embodiment, the barrage display apparatus further includes an information determination module, an information matching module, and a person determination module. The information determination module is configured to determine, before the real-time personalized portrait of the viewer in the current scene is extracted from the scene description information, the biological features of the persons in the current scene according to the audio data and/or the video data, where a biological feature is a physiological attribute feature used to uniquely identify a person. The information matching module is configured to match the biological features against the biological features of preset persons to obtain the target persons, in the current scene, who belong to the preset persons. The person determination module is configured to determine the target person, or a person among the target persons who matches the content displayed by the display, as the viewer.
In an exemplary embodiment, the barrage display apparatus further includes an information acquisition module configured to acquire content analysis information of the content displayed by the display, where the content analysis information includes information describing at least one of a character, a scene, and a plot in the displayed content. The first determination unit is specifically configured to screen, from the existing barrages of the content displayed by the display, a barrage matching both the content analysis information and the comprehensive personalized portrait as the personalized barrage. The second determination unit is specifically configured to generate a barrage matching both the content analysis information and the comprehensive personalized portrait as the personalized barrage.
In an exemplary embodiment, the information acquisition module is specifically configured to, in the case where the content displayed on the display is live broadcast content, cache the content displayed by the display according to a preset caching period during content display and parse the cached content after each caching period ends to obtain the content analysis information, and, in the case where the content displayed on the display is on-demand content, acquire the content analysis information of the content displayed by the display from the server.
In an exemplary embodiment, the information generation module 1120 is specifically configured to input the audio data, the video data, and the image data into a multi-modal large model to obtain the scene description information of the current scene.
In an exemplary embodiment, the multi-modal large model includes a modality alignment module and a semantic recognition module, and the information generation module is specifically configured to input the audio data, the video data, and the image data into the modality alignment module for modality alignment to obtain data features of the audio data, the video data, and the image data in the same semantic space, and to input the data features into the semantic recognition module for semantic recognition to obtain the scene description information of the current scene.
In an exemplary embodiment, the portrait extraction module 1130 is specifically configured to extract historical operation data of the viewer in the current scene from data stored locally on the display device, and to extract the real-time personalized portrait of the viewer in the current scene from the scene description information and the historical operation data.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, displayed data, etc.) related to the present application are both information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may comprise the steps of the above-described method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile memory and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, an artificial intelligence (AI) processor, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the present application.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.