Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a video anti-shake method and apparatus according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having an Artificial Intelligence (AI) processing function, including but not limited to smart phones, portable computing devices (e.g., smart glasses, smart watches, etc.), autonomous vehicles, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster comprised of multiple servers, or the like.
The video anti-shake method provided by the embodiment of the present disclosure is generally executed by the terminal devices 101, 102, 103, and accordingly, the video anti-shake apparatus is generally disposed in the terminal devices 101, 102, 103. However, as those skilled in the art will readily understand, the video anti-shake method provided by the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the video anti-shake apparatus may also be disposed in the server 105, which is not particularly limited in the exemplary embodiment. For example, in an exemplary embodiment, the user may collect a sequence of video frames through an image sensor included in the terminal device 101, 102, 103 and upload the sequence to the server 105; after the server generates a sequence of anti-shake video frames through the video anti-shake method provided by the embodiment of the present disclosure, it transmits the anti-shake video frame sequence back to the terminal device 101, 102, 103, and so on.
In a related technical scheme, a first original image is obtained at a first moment; if the shake amplitude at a second moment is within a first shake range, portrait anti-shake processing is performed on the first original image to obtain a first image, and if the shake amplitude at the second moment is within a second shake range, background anti-shake processing is performed on the first original image to obtain the first image, from which a first frame of video is generated, wherein the difference between the second moment and the first moment is less than or equal to a first preset threshold. However, this scheme only performs global processing on the image and cannot balance the anti-shake effect of the portrait against that of the whole picture. Because the anti-shake algorithm re-renders the picture through rotation and warping of the image, under a good overall anti-shake effect the face region deforms to some extent along with the picture, so the face may suddenly shrink or become distorted; conversely, if the face region is protected so that it changes naturally and face distortion is avoided, the overall anti-shake effect of the picture is degraded.
Based on one or more problems in the related art, the present disclosure first proposes a video anti-shake method. The video anti-shake method in the exemplary embodiment of the present disclosure is specifically described below, taking execution of the method by a terminal device as an example.
Fig. 2 shows a flowchart of a video anti-shake method in the present exemplary embodiment, which may include the following steps S210 to S240:
In step S210, a video frame to be processed is acquired.
In an exemplary embodiment, the video frames refer to a temporally continuous sequence of images acquired by an image acquisition module. The video frames may be acquired in real time, for example, video frames captured in real time by a smart phone or a smart drone, or in non-real time, for example, video frames stored in a storage device, which is not limited in this exemplary embodiment.
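For illustration, a minimal sketch of non-real-time acquisition with OpenCV follows; the file path is hypothetical, and a camera index could replace it for real-time capture:

```python
import cv2

# Read a stored video into a temporally continuous sequence of frames.
cap = cv2.VideoCapture("input.mp4")  # hypothetical path; use cv2.VideoCapture(0) for a live camera
frames = []
while True:
    ok, frame = cap.read()
    if not ok:  # end of stream or read failure
        break
    frames.append(frame)
cap.release()
```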
In step S220, feature key points of the video frame are extracted.
In an exemplary embodiment, the feature key point refers to a point capable of characterizing an image feature of valid content in a video frame, for example, if the video frame includes a face image, the feature key point may be a face key point obtained in the video frame, and if the video frame includes a landscape image, the feature key point may be an environmental key point in the video frame, which is not limited in this exemplary embodiment.
Feature extraction may be performed on the entire video frame to determine feature key points in the video frame; for example, feature extraction may be performed on the video frame through the ORB (Oriented FAST and Rotated BRIEF) feature extraction algorithm, or through the LBP (Local Binary Patterns) feature extraction algorithm, which is not limited in this example embodiment.
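As a minimal sketch, whole-frame key points could be extracted with OpenCV's ORB implementation; the frame path and detector parameters below are illustrative, not values mandated by this disclosure:

```python
import cv2

frame = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
orb = cv2.ORB_create(nfeatures=500)  # cap the number of detected key points
keypoints, descriptors = orb.detectAndCompute(frame, None)
points = [kp.pt for kp in keypoints]  # (x, y) coordinates of the feature key points
```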
Those skilled in the art will readily understand that image segmentation may also be performed on the video frame to obtain different image regions corresponding to the video frame, and different feature extraction algorithms may be applied to the different regions to improve the accuracy of the feature key points. For example, a semantic segmentation algorithm may be used to segment a video frame containing a face image into a region including the face image and a region not including the face image; a face feature extraction algorithm then extracts feature key points from the former, and an environmental feature extraction algorithm extracts feature key points from the latter. Of course, the above is merely an illustrative example, and other ways of extracting feature key points of video frames should also fall within the scope of the present disclosure.
In step S230, the feature key points of consecutive frames are input into a video anti-shake model, and anti-shake displacement data corresponding to each feature key point is output.
In an exemplary embodiment, the video anti-shake model refers to a deep learning model capable of estimating anti-shake displacement for a video frame; for example, the video anti-shake model may be a transformation-matrix-based video stabilization network such as DUT or StabNet, which is not particularly limited in this exemplary embodiment.
The anti-shake displacement data may be data, output by the video anti-shake model, for smoothing the displacement of feature key points between video frames. For example, assume the content in the video frames shakes up and down, the coordinates of feature key point A at time T are (1, 4), and the coordinates of feature key point A at time T+1 are (1, 2); that is, feature key point A is displaced downward by a distance of 2 between the two frames. The coordinates of feature key point A in the two frames are input into the video anti-shake model, and the anti-shake displacement data of feature key point A is obtained as a distance of +1; that is, to smooth the inter-frame shake, the estimated coordinates of feature key point A at time T+1 should be (1, 3). Of course, this is only an exemplary illustration, and this exemplary embodiment is not limited thereto.
Specifically, the feature key points of consecutive frames may be input into the video anti-shake model to estimate the motion-smoothed anti-shake displacement data corresponding to the feature key points. For example, the feature key points in the video frames at times T-1, T, and T+1 may be input into the video anti-shake model, or the feature key points in the video frames at times T-2, T-1, T, T+1, and T+2 may be input. The window may be set according to the actual situation: if the computing capability of the terminal device is high, feature key points of more preceding and following frames may be input to further improve the anti-shake effect; if the computing capability is low, feature key points of at least one preceding and one following frame may be input to reduce the computing load and improve performance. This exemplary embodiment is not particularly limited in this respect.
In step S240, the video frame is re-rendered according to the anti-shake displacement data, so as to generate an anti-shake video frame.
In an exemplary embodiment, the re-rendering processing refers to a processing procedure of performing position adjustment on feature key points in a video frame according to anti-shake displacement data output by a video anti-shake model, and performing corresponding adjustment on other image contents.
Re-rendering the video frame according to the anti-shake displacement data effectively avoids the deformation of image content caused in the related art by applying anti-shake processing such as rotation and warping directly to the video frame, and improves the anti-shake display effect of the video frame.
The following describes steps S210 to S240 in detail.
In an exemplary embodiment, the extracting of the feature key points in the video frame may be implemented by the steps in fig. 3, and as shown in fig. 3, the extracting may specifically include:
step S310, carrying out image segmentation on the video frame, and determining a foreground image area and a background image area corresponding to the video frame;
step S320, extracting foreground characteristic key points in the foreground image area and extracting background characteristic key points in the background image area;
step S330, the foreground characteristic key points and the background characteristic key points are used as the characteristic key points of the video frame.
Image segmentation refers to the process of dividing an image into a plurality of specific regions with unique properties and extracting objects of interest; for example, image segmentation may be performed on a video frame by a region-based segmentation method or by an edge-based segmentation method, which is not particularly limited in this exemplary embodiment.
The foreground image area refers to the image area containing the target of interest in the video frame. For example, if the video frame contains a face image, the foreground image area obtained after image segmentation is the face image area; if the video frame contains a vehicle image, the foreground image area obtained after image segmentation is the vehicle image area. As will be readily understood by those skilled in the art, the background image area refers to the remaining image area of the video frame excluding the foreground image area, which will not be described again here.
The foreground feature key points refer to feature key points extracted in the foreground image region by a feature extraction algorithm suited to the specific image content. For example, if the foreground image region is a face image, a dense face alignment algorithm such as 3D Dense Face Alignment (3DDFA) may be used to extract features of the foreground image region to obtain the foreground feature key points, that is, the face feature key points; if the foreground image region is a vehicle image, a vehicle feature extraction algorithm such as Histogram of Oriented Gradients (HOG) feature extraction may be selected to obtain the foreground feature key points, that is, the vehicle feature key points, and so on.
Similarly, the background feature key points refer to feature key points extracted from the background image area by a feature extraction algorithm suited to the background image content; for example, the feature key points may be extracted from the background image area by an optical flow tracking algorithm such as Kanade-Lucas-Tomasi (KLT) to obtain the background feature key points.
By segmenting foreground and background image areas of a video frame and respectively extracting the features of the foreground image area and the background image area by adopting different feature extraction algorithms, the extracted feature key points can accurately represent the effective content of the corresponding image area, the problem of inaccurate expression of the feature key points caused by indiscriminate feature extraction of the whole content of the video frame is avoided, the accuracy of the feature key points on the effective content expression of the video frame is improved, and the anti-shake display effect of the video frame is further ensured.
In an exemplary embodiment, the foreground image region may include a portrait region, and correspondingly, the foreground feature key points may include face feature key points.
Specifically, a dense 3D face alignment algorithm may be used to extract face feature key points from the portrait region. First, a virtual three-dimensional face model may be constructed; for example, the three-dimensional face shape may be represented by a 3D Morphable Model (3DMM) of the face to obtain the virtual three-dimensional face model. Of course, the three-dimensional face shape may also be represented in other manners, which is not limited in this example embodiment. Then, the face image in the portrait region may be fitted to the virtual three-dimensional face model through a pre-trained Convolutional Neural Network (CNN) to determine the face feature key points in the portrait region. The task of the pre-trained convolutional neural network is to treat the labeling of face feature key points as a regression from image pixels to feature point positions.
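A hedged sketch of the fitting step follows: a small convolutional network regresses a 3DMM parameter vector from a cropped portrait region, from which landmark positions follow by evaluating the face model. The layer sizes and the 62-dimensional parameter vector are assumptions in the style of common 3DDFA implementations, not the disclosure's exact design:

```python
import torch
import torch.nn as nn

class FaceParamRegressor(nn.Module):
    """Regress 3DMM pose/shape/expression parameters from a face crop (assumed design)."""
    def __init__(self, num_params: int = 62):  # assumed parameter count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_params)

    def forward(self, face_crop: torch.Tensor) -> torch.Tensor:
        x = self.features(face_crop).flatten(1)
        return self.head(x)  # landmarks follow by evaluating the 3DMM with these parameters

model = FaceParamRegressor()
params = model(torch.randn(1, 3, 120, 120))  # dummy 120x120 face crop
```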
The face feature key points in the portrait region can be accurately extracted through the dense 3D face alignment algorithm; compared with a generic feature extraction approach, this effectively improves how well the face feature key points express the effective content of the portrait region, improving the accuracy of the face feature key points.
In an exemplary embodiment, the background feature key points can be extracted from the background image area by the KLT optical flow tracking algorithm. First, the brightness values of the pixels in the background image regions of adjacent video frames may be traversed, and motion vectors may then be determined from these brightness values. In general, because the brightness of a pixel is constant across adjacent video frames and neighboring pixels share the same motion, after traversing the brightness values in the background image regions of adjacent frames, pixels with matching brightness values in adjacent frames can be connected to obtain motion vectors, and the starting point (or end point) corresponding to each motion vector can be used as a background feature key point of the background image area.
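A minimal sketch of this step with OpenCV's KLT implementation follows; the corner-detection and pyramid parameters are illustrative defaults, and the image and mask paths are hypothetical:

```python
import cv2
import numpy as np

prev_gray = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)   # adjacent frames (hypothetical)
next_gray = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)
bg_mask = cv2.imread("bg_mask_t.png", cv2.IMREAD_GRAYSCALE)   # 255 inside the background area

# Select trackable points inside the background region only.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01,
                              minDistance=7, mask=bg_mask)
# Track them into the next frame under the brightness-constancy assumption.
nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                            winSize=(21, 21), maxLevel=3)
good = status.ravel() == 1
motion_vectors = nxt[good] - pts[good]            # per-point inter-frame displacement
background_keypoints = pts[good].reshape(-1, 2)   # starting points as background key points
```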
The background feature key points in the background image area can be accurately extracted through the KLT optical flow tracking algorithm; compared with a generic feature extraction approach, this effectively improves how well the background feature key points express the effective content of the background image area, improving the accuracy of the background feature key points.
In an exemplary embodiment, semantic segmentation may be performed on the video frame through a pre-trained Fully Convolutional Network (FCN) to determine the foreground image area and the background image area corresponding to the video frame.
Because only the foreground image area and the background image area need to be distinguished in this embodiment, and high precision is not required at the segmented image edges, the video frames can be rapidly binary-classified through the pre-trained fully convolutional network, which effectively improves segmentation efficiency and system performance.
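A hedged sketch of this segmentation step with a pre-trained FCN from torchvision follows; treating the 'person' class (index 15 in torchvision's segmentation labeling) as the portrait foreground is an assumption for illustration:

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def split_foreground_background(frame_rgb):
    """Return a boolean mask (True = foreground/portrait) for one video frame."""
    batch = preprocess(frame_rgb).unsqueeze(0)
    with torch.no_grad():
        logits = model(batch)["out"][0]  # (num_classes, H, W)
    return logits.argmax(0) == 15        # 'person' class index (assumed foreground)
```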
In an exemplary embodiment, the training of the video anti-shake model may be implemented through the steps in fig. 4, and as shown in fig. 4, the method specifically includes:
step S410, constructing an initial video anti-shake model, wherein the initial video anti-shake model comprises a target loss function, and the target loss function comprises a foreground loss function and a background loss function;
and step S420, performing unsupervised learning training on the initial video anti-shake model to obtain a trained video anti-shake model.
The target loss function is a loss function designed in advance for unsupervised learning and training of the video anti-shake model; with the target loss function, training of the initial video anti-shake model can be completed without creating additional labeled samples, which effectively improves the generalization capability of the video anti-shake model.
The foreground loss function is a loss function for performing smooth motion estimation on foreground characteristic key points in a foreground image region, and the background loss function is a loss function for performing smooth motion estimation on background characteristic key points in a background image region. The total loss function of the video anti-shake model can be formed by the foreground loss function and the background loss function.
For example, the foreground loss function may be represented by relation (1), here written as a smoothness term over the foreground feature key points of adjacent frames:

L_b = ||P_{t+1} - P_t||^2    (1)

wherein L_b may represent the foreground loss function, P_t may represent the foreground feature key points in the video frame at time t, and P_{t+1} may represent the foreground feature key points in the video frame at time t+1.

The background loss function can be represented by relation (2):

L_f = ||F_{t+1} - F_t||^2    (2)

wherein L_f may represent the background loss function, F_t may represent the background feature key points in the video frame at time t, and F_{t+1} may represent the background feature key points in the video frame at time t+1.

The objective loss function can be represented by relation (3):

L = (1 - λ)L_b + λL_f    (3)

where L may represent the objective loss function, L_b may represent the foreground loss function, L_f may represent the background loss function, and λ may represent a weight coefficient with a value range of 0 to 1; the foreground loss and the background loss are combined through λ to form the total loss L of the video anti-shake model.
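A hedged PyTorch sketch of the combined objective follows; writing L_b and L_f as mean squared inter-frame displacements of the stabilized key point tracks matches relations (1) and (2) as written above, and is an assumption rather than the disclosure's exact formulation:

```python
import torch

def anti_shake_loss(fg_pts: torch.Tensor, bg_pts: torch.Tensor,
                    lam: float = 0.5) -> torch.Tensor:
    """fg_pts/bg_pts: (T, N, 2) stabilized key point tracks over T frames."""
    l_b = ((fg_pts[1:] - fg_pts[:-1]) ** 2).mean()  # foreground smoothness, relation (1)
    l_f = ((bg_pts[1:] - bg_pts[:-1]) ** 2).mean()  # background smoothness, relation (2)
    return (1 - lam) * l_b + lam * l_f              # total loss, relation (3)

loss = anti_shake_loss(torch.randn(5, 68, 2), torch.randn(5, 300, 2), lam=0.3)
```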
Fig. 5 schematically illustrates a structural diagram of a video anti-shake model in an exemplary embodiment of the present disclosure.
Referring to fig. 5, after a video frame to be processed is acquired (assuming the duration is 0 to T-1), image segmentation is performed on the video frame to obtain a foreground image region and a background image region, and feature key point extraction is performed on the foreground image region and the background image region respectively using different feature extraction algorithms to obtain background feature key points 501 and foreground feature key points 502.
The background feature key points 501 and foreground feature key points 502 in at least three video frames (such as the video frames at times t-1, t, and t+1) are used as the input of the video anti-shake model 503, and the background feature key points 501 and the foreground feature key points 502 are respectively encoded using one-dimensional convolution to obtain background feature vectors 504 and foreground feature vectors 505.
The background feature vector 504 is encoded through multiple layers of one-dimensional convolution to obtain an intermediate coding result 506, and the foreground feature vector 505 is encoded through multiple layers of one-dimensional convolution to obtain an intermediate coding result 507; the intermediate coding result 506 and the intermediate coding result 507 are fused to obtain a feature vector S1, a feature vector S2, a feature vector S3, and a feature vector S4. For the formula by which the intermediate coding results are fused, reference may be made to legend 508, where λ may represent the weight coefficient for fusing the foreground and background features, and S may represent the feature vector after image feature encoding.
The feature vector S1, the feature vector S2, the feature vector S3, and the feature vector S4 are input into the multi-layer decoding network 509 for decoding, and finally the anti-shake displacement data 510 corresponding to the feature key points in the video frame at time t is output. Of course, fig. 5 is only a schematic illustration, and the exemplary embodiment is not limited thereto.
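A hedged PyTorch sketch of this encoder-fusion-decoder structure follows. Channel widths and depths are illustrative assumptions, and an equal number of foreground and background key points per window is assumed so the intermediate coding results can be fused as in legend 508:

```python
import torch
import torch.nn as nn

class VideoAntiShakeModel(nn.Module):
    def __init__(self, hidden: int = 64, lam: float = 0.5):
        super().__init__()
        self.lam = lam
        def encoder():  # multi-layer one-dimensional convolution encoder
            return nn.Sequential(
                nn.Conv1d(2, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv1d(hidden, hidden, 3, padding=1), nn.ReLU(),
            )
        self.bg_encoder, self.fg_encoder = encoder(), encoder()
        self.decoder = nn.Sequential(  # multi-layer decoding network
            nn.Conv1d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, 2, 3, padding=1),  # (dx, dy) per key point
        )

    def forward(self, bg_pts: torch.Tensor, fg_pts: torch.Tensor) -> torch.Tensor:
        # bg_pts/fg_pts: (batch, 2, N) key point coordinates over the frame window
        s = self.lam * self.fg_encoder(fg_pts) + (1 - self.lam) * self.bg_encoder(bg_pts)
        return self.decoder(s)  # anti-shake displacement data per key point

model = VideoAntiShakeModel()
disp = model(torch.randn(1, 2, 256), torch.randn(1, 2, 256))
```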
In an exemplary embodiment, displacement weight data of the image content around a feature key point may be determined according to the anti-shake displacement data; the feature key point may then be re-rendered through the anti-shake displacement data, and the surrounding image content re-rendered through the determined displacement weight data. For example, when the foreground image area is the portrait area and the obtained anti-shake displacement data of portrait feature key point A in the portrait area is 1, the displacement weight data of the image content near portrait feature key point A may be set to 100% in order to avoid deformation of the face area; that is, if the anti-shake displacement data of portrait feature key point A is 1, the anti-shake displacement data of the nearby image content is also 1, and its moving direction is consistent with that of portrait feature key point A. For the background image area, the displacement weight data of nearby image content may be set according to the distance from the background feature key point: the closer the image content is to the background feature key point, the larger its displacement weight data. If the anti-shake displacement data of a background feature key point is 3, the anti-shake displacement data of image content within a distance of 0-1 from the key point is 3, the anti-shake displacement data of image content within a distance of 1-2 is 2.7, and so on. Of course, the displacement weight data of the image content around the feature key points may be set in other manners and may be customized according to the actual situation; this is only an illustrative example and should not impose any special limitation on this example embodiment.
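A minimal numeric sketch of this weighting scheme follows; the linear falloff of 0.1 per unit of distance reproduces the 3 → 2.7 example above and is an illustrative assumption:

```python
import numpy as np

def weighted_displacement(pixel_xy, keypoint_xy, kp_disp, in_face_region: bool,
                          falloff: float = 0.1) -> float:
    """Return the anti-shake displacement applied to one piece of image content."""
    if in_face_region:
        return kp_disp  # weight 100%: move rigidly with the portrait key point
    dist = np.linalg.norm(np.asarray(pixel_xy, float) - np.asarray(keypoint_xy, float))
    weight = max(0.0, 1.0 - falloff * np.floor(dist))  # closer content, larger weight
    return kp_disp * weight

# Background content 1.5 units from a key point whose displacement is 3:
print(weighted_displacement((5.5, 0.0), (4.0, 0.0), 3.0, in_face_region=False))  # -> 2.7 (up to float rounding)
```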
Fig. 6 schematically illustrates a schematic diagram of implementing video frame anti-shake processing by anti-shake displacement data in an exemplary embodiment of the disclosure.
Referring to fig. 6, the feature key points A and B in the video frame 601 at time t-1 and the feature key points A and B in the video frame 602 at time t are input into the video anti-shake model to obtain the anti-shake displacement data 603 corresponding to the feature key points in the video frame at time t. The feature key points A and B in the video frame and the image content around them may be re-rendered according to the anti-shake displacement data 603 and the displacement weight data determined from it, so as to obtain the anti-shake video frame 604 at time t.
Optionally, the image content in the video frame may be divided into regions in the form of a grid, and the feature key points and the image content around them re-rendered in grid units, which effectively accelerates rendering and reduces deformation of the image content.
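A hedged sketch of grid-based re-rendering with OpenCV follows: per-cell displacements are upsampled to a dense flow field and the frame is warped with cv2.remap. The grid size and interpolation mode are illustrative choices, not mandated by the disclosure:

```python
import cv2
import numpy as np

def rerender(frame: np.ndarray, grid_disp: np.ndarray) -> np.ndarray:
    """frame: (H, W, 3) image; grid_disp: (gh, gw, 2) per-cell (dx, dy) displacements."""
    h, w = frame.shape[:2]
    # Upsample the coarse per-cell displacements to a dense per-pixel field.
    flow = cv2.resize(grid_disp.astype(np.float32), (w, h), interpolation=cv2.INTER_LINEAR)
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
    # Sample each output pixel from its displaced source position.
    map_x, map_y = xs - flow[..., 0], ys - flow[..., 1]
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)

stabilized = rerender(np.zeros((480, 640, 3), np.uint8), np.zeros((16, 16, 2)))
```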
Fig. 7 schematically illustrates a flow chart of anti-shaking of a video frame in an exemplary embodiment of the disclosure.
Referring to fig. 7, in step S710, an input video frame is acquired;
step S720, performing foreground and background segmentation on the video frame: performing semantic segmentation on the video frame through a fully convolutional network to obtain a foreground image area and a background image area;
step S730, detecting key points of the face features in the foreground image area through a dense 3D face alignment algorithm, and determining key points of the foreground features;
step S740, detecting the key points of the background features of the background image area through an optical flow tracking algorithm, and determining the key points of the background features;
step S750, inputting the feature key points (namely the foreground feature key points and the background feature key points) in at least three adjacent video frames (such as the video frames at times t-1, t, and t+1) into the video anti-shake model, and outputting the anti-shake displacement data corresponding to the feature key points in the video frame at time t through the video anti-shake model;
and step S760, performing re-rendering processing on the video frame through the anti-shake displacement data to obtain the anti-shake video frame.
In summary, in the exemplary embodiment, a video frame to be processed may be obtained, feature key points of the video frame extracted, the feature key points of consecutive frames input into a video anti-shake model, and the anti-shake displacement data corresponding to each feature key point output; finally, re-rendering processing may be performed on the picture content of different portions of the video frame according to the output anti-shake displacement data to generate the anti-shake video frame. On one hand, the motion of the portrait over the video sequence is tracked by extracting feature key points in the video frames, the anti-shake displacement data of the feature key points after anti-shake is output by the video anti-shake model, and different feature key points in the video frame are then adjusted to different degrees through the anti-shake displacement data; this avoids the partial image deformation caused in the related art by applying anti-shake processing to the video frame as a whole, and improves the display effect of the video frame after anti-shake. On the other hand, by means of deep learning, the effective content of the video frames, namely the feature key points, is fully utilized to train the video anti-shake model; the trained model is insensitive to environmental factors such as the degree of picture shake and the size of the portrait in the video frames, has strong robustness, and is suitable for portrait video anti-shake in various motion scenes.
It is noted that the above-mentioned figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 8, a video anti-shake apparatus 800 is further provided in the present example, and may include a video frame acquiring module 810, a feature key point extraction module 820, an anti-shake displacement data determination module 830, and a video frame anti-shake processing module 840. Wherein:
The video frame acquiring module 810 is configured to acquire a video frame to be processed;
the feature key point extraction module 820 is configured to extract feature key points of the video frames;
the anti-shake displacement data determination module 830 is configured to input the feature key points of consecutive multiple frames into a video anti-shake model, and output anti-shake displacement data corresponding to each feature key point;
the video frame anti-shake processing module 840 is configured to perform re-rendering processing on the video frame according to the anti-shake displacement data, and generate an anti-shake video frame.
In an exemplary embodiment, the feature key point extraction module 820 may be configured to:
performing image segmentation on the video frame, and determining a foreground image area and a background image area corresponding to the video frame;
extracting foreground characteristic key points in the foreground image area and extracting background characteristic key points in the background image area;
and taking the foreground characteristic key points and the background characteristic key points as the characteristic key points of the video frame.
In an exemplary embodiment, the foreground image region may include a portrait region, and the foreground feature key points may include face feature key points; the feature key point extraction module 820 may be configured to:
constructing a virtual three-dimensional face model;
fitting the portrait area and the virtual three-dimensional face model through a pre-trained convolutional neural network, and determining key points of face features in the portrait area.
In an exemplary embodiment, the feature key point extraction module 820 may be configured to:
traversing luminance values in the background image region of adjacent video frames;
and determining a motion vector according to the brightness value, and taking a starting point corresponding to the motion vector as a background feature key point of the background image area.
In an exemplary embodiment, the feature key point extraction module 820 may be configured to:
and performing semantic segmentation on the video frame through a pre-trained full convolution neural network, and determining a foreground image area and a background image area corresponding to the video frame.
In an exemplary embodiment, the video anti-shake apparatus 800 may further include a video anti-shake model training module, and the video anti-shake model training module may be configured to:
constructing an initial video anti-shake model, wherein the initial video anti-shake model comprises a target loss function, and the target loss function comprises a foreground loss function and a background loss function;
and carrying out unsupervised learning training on the initial video anti-shake model to obtain a trained video anti-shake model.
In an exemplary embodiment, the video frame anti-shake processing module 840 may be further configured to:
determining displacement weight data of image contents around the characteristic key points according to the anti-shake displacement data;
and re-rendering the characteristic key points through the anti-shake displacement data, and re-rendering the image content around the characteristic key points through the displacement weight data.
The specific details of each module in the above apparatus have been described in detail in the method section, and details that are not disclosed may refer to the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An exemplary embodiment of the present disclosure provides an electronic device for implementing the video anti-shake method, which may be a terminal device 101, 102, 103 or the server 105 in fig. 1. The electronic device comprises at least a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the video anti-shake method via execution of the executable instructions.
The following takes the electronic device 900 in fig. 9 as an example to illustrate the configuration of the electronic device in the present disclosure. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of the electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one memory unit 920, a bus 930 connecting different system components (including the memory unit 920 and the processing unit 910), and a display unit 940.
The storage unit 920 stores program codes that can be executed by the processing unit 910, so that the processing unit 910 performs the video anti-shake method in this specification.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
The storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 970 (e.g., sensor devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., routers, modems, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and sensor modules (e.g., gyroscope sensors, magnetic sensors, acceleration sensors, distance sensors, proximity light sensors, etc.).
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.