CN113191942A - Method for generating image, method for training human detection model, program, and device - Google Patents

Method for generating image, method for training human detection model, program, and device

Info

Publication number
CN113191942A
CN113191942A (application CN202110561037.1A)
Authority
CN
China
Prior art keywords
human
image
image block
original
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110561037.1A
Other languages
Chinese (zh)
Inventor
支蓉
郭子杰
张武强
王宝锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mercedes Benz Group AG
Original Assignee
Daimler AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daimler AG
Priority to CN202110561037.1A
Publication of CN113191942A
Legal status: Pending

Abstract

The present invention relates to the field of image generation and the field of person detection, and in particular to a method of generating an image, the method comprising the following steps: S1, providing an original image (10) containing a person; S2, cropping at least one original human image block (11) from the original image (10); S3, generating a synthetic human image block (21) based on each original human image block (11), wherein the synthetic human image block (21) has the background of the corresponding original human image block (11) and a human pose different from that in the corresponding original human image block (11); S4, replacing the corresponding original human image block (11) in the original image (10) with the synthetic human image block (21) to generate a synthetic image (20). The invention also relates to a method of training a human detection model (40), a computer program product, and an apparatus for processing images.

Description

Method for generating image, method for training human detection model, program, and device
Technical Field
The present invention relates to the field of image generation and the field of person detection, and in particular to a method of generating an image, a method of training a person detection model, and a corresponding computer program product and apparatus for processing an image.
Background
Computer-vision-based person detection can determine the position of persons by processing images or video captured by a camera. It has broad application prospects: for example, pedestrian detection is a key technology in driver assistance, automated driving, intelligent video surveillance, human behavior analysis, and similar applications. In recent years, machine learning has become widely used in computer vision and related fields, and machine-learning-based person detection is receiving increasing attention from both academia and industry.
The performance of a machine-learning-based person detection model depends not only on the quality of the model itself, but also on the quality and quantity of the training data. To ensure good performance, a large number of samples is usually required for training, and acquiring them consumes considerable manpower and material resources. Data augmentation is an effective way to reduce this acquisition cost: it can expand the number of training samples and improve the recognition accuracy of the person detection model.
For example, existing generative networks such as variational autoencoders (VAEs) and generative adversarial networks (GANs) can generate new samples based on a training data set with a limited number of training samples.
However, most current generation processes are random, and it is difficult for current generative models to accurately control the pattern of the target image while generating high-quality, high-definition images, so the generated images are not well suited as training samples for a person detection model.
The prior art therefore still has many shortcomings in image generation and in improving the recognition rate of person detection models.
Disclosure of Invention
It is an object of the invention to provide an improved method of generating images, an improved method of training a person detection model, and corresponding computer program products and apparatuses.
According to a first aspect of the present invention, there is provided a method of generating an image, wherein the method comprises the steps of:
S1, providing an original image containing a person;
S2, cropping at least one original human image block from the original image;
S3, generating a synthetic human image block based on each original human image block, wherein the synthetic human image block has the background of the corresponding original human image block and a human pose different from that in the corresponding original human image block;
S4, replacing the corresponding original human image block in the original image with the synthetic human image block to generate a synthetic image.
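The four steps above can be sketched as a small, self-contained loop. The sketch below is illustrative only: images are NumPy arrays, person blocks are given as boxes, and `generator` stands in for the person generator described later (all names are assumptions, not part of the patent):

```python
import numpy as np

def generate_synthetic_image(original, person_boxes, generator):
    """S1-S4: crop each original human image block, synthesize a re-posed
    version with the same background, and paste it back in place."""
    synthetic = original.copy()
    for (y0, y1, x0, x1) in person_boxes:
        block = original[y0:y1, x0:x1]       # S2: crop original human image block
        new_block = generator(block)         # S3: same background, new pose
        synthetic[y0:y1, x0:x1] = new_block  # S4: direct replacement
    return synthetic

# Identity stand-in: a real generator would change the person's pose.
identity_generator = lambda block: block

img = np.zeros((8, 8, 3), dtype=np.uint8)
out = generate_synthetic_image(img, [(1, 4, 2, 6)], identity_generator)
```

Because the synthetic block matches the original block's footprint, the paste-back in S4 is a plain in-place slice assignment and the rest of the image is untouched.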
According to the present invention, the newly generated synthetic human image block has the background of the original human image block; that is, it contains background information matching the environment information in the original image. The original human image block can therefore be replaced directly with the corresponding newly generated synthetic human image block, and no element of the resulting complete synthetic image other than the person is changed. In the synthetic image, the new synthetic human image block has a reasonable position and a reasonable size while conforming well to the environment information of the original image. Compared with the original images, these synthetic images have different human poses and can therefore provide more diverse pose and bounding box information.
According to an alternative embodiment of the invention, the synthetic human image block has the appearance of the person in the corresponding original human image block.
According to an alternative embodiment of the present invention, step S1 further includes providing target pose information, and in step S3 the synthetic human image block is generated based on each original human image block and according to the target human pose, such that the synthetic human image block has the target human pose represented by the target pose information.
According to an alternative embodiment of the invention, the target pose information is provided by a pose image comprising pose key points connected according to real human skeleton links. Alternatively, the target pose information is provided by a target-pose human image containing a person having the target human pose. Alternatively, the target pose information is provided by position data for a set of pose key points.
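As one hedged illustration of the first option, a pose image can be rasterized from a set of key points and a skeleton link table. The link table, the `(y, x)` coordinate convention, and the function name below are purely illustrative assumptions, not the patent's actual skeleton:

```python
import numpy as np

# Illustrative skeleton link table (pairs of key-point indices), not the
# real human skeleton links used by the patent.
SKELETON_LINKS = [(0, 1), (1, 2), (1, 3)]

def rasterize_pose(keypoints, shape):
    """Draw pose key points connected along skeleton links into a canvas."""
    canvas = np.zeros(shape, dtype=np.uint8)
    for a, b in SKELETON_LINKS:
        (ya, xa), (yb, xb) = keypoints[a], keypoints[b]
        n = max(abs(yb - ya), abs(xb - xa)) + 1
        for t in np.linspace(0.0, 1.0, n):   # naive line rasterization
            y = int(round(ya + t * (yb - ya)))
            x = int(round(xa + t * (xb - xa)))
            canvas[y, x] = 255
    return canvas

pose = rasterize_pose([(0, 0), (4, 4), (4, 0), (4, 8)], (10, 10))
```

A pose image of this kind can then be fed to the generator alongside the original human image block, as described below.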
According to an alternative embodiment of the invention, the target pose information has associated annotation information, and the synthetic human image block generated in step S3 has the annotation information associated with the corresponding target pose information, wherein the annotation information optionally includes human intention information and/or gesture information.
According to an alternative embodiment of the present invention, in step S3, the original human image block and the target pose information are input into a human generator that generates the synthetic human image block.
According to an alternative embodiment of the invention, the human generator is configured to perform the following steps:
S31, identifying at least one pose key point of the person in the original human image block;
S32, cutting out a plurality of foreground image patches and a plurality of background image patches from the original human image block based on the at least one pose key point;
S33, extracting at least one first feature vector from the plurality of foreground image patches and the plurality of background image patches;
S34, acquiring at least one second feature vector from the target pose information; and
S35, generating a synthetic human image block from the at least one first feature vector extracted in step S33 and the at least one second feature vector acquired in step S34.
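A rough sketch of steps S32–S33 under simplifying assumptions: foreground patches are fixed-size windows around each pose key point, the remaining pixels are pooled as background, and the "feature vector" is reduced to simple mean intensities. A real generator would use learned encoders; every name here is an assumption for illustration:

```python
import numpy as np

def cut_patches(block, keypoints, half=2):
    """S32: fixed windows around each key point become foreground patches;
    all remaining pixels are pooled as background."""
    h, w = block.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    foreground = []
    for (y, x) in keypoints:
        y0, y1 = max(0, y - half), min(h, y + half)
        x0, x1 = max(0, x - half), min(w, x + half)
        foreground.append(block[y0:y1, x0:x1])
        mask[y0:y1, x0:x1] = False
    return foreground, block[mask]

def first_feature_vector(foreground, background):
    """S33, drastically simplified: mean intensities as a 'feature vector'."""
    fg_mean = float(np.mean([p.mean() for p in foreground]))
    bg_mean = float(background.mean()) if background.size else 0.0
    return np.array([fg_mean, bg_mean])

block = np.arange(64, dtype=float).reshape(8, 8)
fg, bg = cut_patches(block, [(4, 4)])
feat = first_feature_vector(fg, bg)
```

In step S35, such a first feature vector would be combined with the second feature vector obtained from the target pose information to synthesize the new block.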
According to a second aspect of the present invention, there is provided a method of training a human detection model, wherein the method comprises the steps of: providing a training data set comprising synthetic images generated by the method of generating images according to the invention; and training the human detection model with the training data set, wherein the training data set optionally includes original images.
As described above, in a synthetic image the new synthetic human image block has both a reasonable position and a reasonable size, while conforming well to the surrounding environment information of the original image. Moreover, these synthetic images can provide more varied human poses and bounding box information. They are therefore particularly suitable for training the human detection model, allowing it to achieve a higher recognition rate.
According to a third aspect of the invention, there is provided a computer program product comprising computer program instructions, wherein the computer program instructions, when executed by one or more processors, enable the processor to perform a method of generating an image according to the invention or a method of training a human detection model according to the invention.
According to a fourth aspect of the present invention, there is provided an apparatus for processing an image, the apparatus comprising a processor and a computer-readable storage device communicatively connected to the processor, the computer-readable storage device having stored thereon a computer program for implementing the method of generating an image according to the present invention or the method of training a human detection model according to the present invention when the computer program is executed by the processor.
The invention achieves the following effects: the person in the generated synthetic image has a reasonable position and size, avoiding any mismatch between foreground and background information; and training the human detection model with the synthetic images improves its recognition rate.
Drawings
The principles, features and advantages of the present invention may be better understood by describing the invention in more detail below with reference to the accompanying drawings. The drawings comprise:
Fig. 1 is a schematic block diagram of an apparatus for processing an image according to an exemplary embodiment of the present invention;
Fig. 2 shows a flow diagram of a method of generating an image according to an exemplary embodiment of the invention;
Fig. 3 schematically illustrates an original image;
Fig. 4 schematically illustrates a method of generating an image according to an exemplary embodiment of the invention;
Fig. 5 schematically illustrates a process of generating a synthetic human image block according to an exemplary embodiment of the present invention; and
Fig. 6 schematically illustrates a method of training a human detection model according to an exemplary embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and exemplary embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of the invention.
Fig. 1 illustrates a schematic structural block diagram of an apparatus for processing an image according to an exemplary embodiment of the present invention. The apparatus comprises a processor 1 and a computer-readable storage device 2 communicatively connected to the processor 1. The computer-readable storage device 2 stores a computer program which, when executed by the processor 1, implements the method of generating an image or the method of training a human detection model explained in detail below.
According to an exemplary embodiment, a display device 3 is provided in communicative connection with the processor 1. By means of the display device 3, the user can view the original image 10 to be processed by the apparatus and the new synthetic image 20 generated by it.
According to an exemplary embodiment, an input device 4 is provided in communicative connection with the processor 1. By means of the input device 4, the user can select or input an original image 10 to be processed by the apparatus. The input device 4 may include, for example, a keyboard, a mouse, and/or a touch screen.
According to an exemplary embodiment, a camera device 5 is provided in communicative connection with the processor 1. By means of the camera device 5, the user can take a photograph containing a person as an original image 10 to be processed by the apparatus. In particular, the original image 10 contains not only the person but also surrounding environment information, such as the scene in which the person is located.
According to an exemplary embodiment, an original image set made up of a plurality of original images 10 is provided. The original image set may be stored in the computer-readable storage device 2 or in another storage device communicatively connected to the processor 1.
Fig. 2 shows a flowchart of a method of generating an image according to an exemplary embodiment of the present invention.
In step S1, the original image 10 containing the person is provided. Fig. 3 schematically shows an original image 10 containing a person and the scene in which the person is located. The original image 10 may be any image in the original image set mentioned above. It may be, for example, an image captured by the user via the camera device 5 or a frame containing a person captured from a video stream.
Then, in step S2, at least one original human image block 11 is cropped from the original image 10. As shown in fig. 4, two original human image blocks 11 can be cropped from the original image 10 shown in fig. 3. Each cropped original human image block 11 contains a complete person and a small amount of background.
In one exemplary embodiment, pose key points of the persons contained in the original image 10 are identified, and the original human image blocks 11 are cropped according to the identified pose key points, so that a single original human image block 11 contains a single whole person. For example, a person bounding box may be determined from the identified pose key points and expanded outward, e.g., by a factor of 1.5, to form the cropping box used to cut out the original human image block 11.
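The bounding-box expansion described above might look as follows in pure Python; the default image size, the keypoint format `(y, x)`, and the function name are illustrative assumptions:

```python
def expanded_crop_box(keypoints, scale=1.5, img_h=1080, img_w=1920):
    """Tight person bounding box from pose key points, expanded outward
    about its center (e.g. by the factor 1.5 mentioned in the text) and
    clipped to the image bounds."""
    ys = [y for y, _ in keypoints]
    xs = [x for _, x in keypoints]
    cy, cx = (min(ys) + max(ys)) / 2, (min(xs) + max(xs)) / 2
    half_h = (max(ys) - min(ys)) * scale / 2
    half_w = (max(xs) - min(xs)) * scale / 2
    y0, y1 = max(0, int(cy - half_h)), min(img_h, int(cy + half_h) + 1)
    x0, x1 = max(0, int(cx - half_w)), min(img_w, int(cx + half_w) + 1)
    return y0, y1, x0, x1

box = expanded_crop_box([(10, 10), (30, 20)])
```

Expanding about the box center keeps the person centered in the crop while admitting the small amount of surrounding background the method relies on.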
Next, in step S3, a synthetic human image block 21 is generated based on each original human image block 11, where the synthetic human image block 21 has the background of the corresponding original human image block 11 and a human pose different from that in the corresponding original human image block 11. The synthetic human image block 21 thus carries new pose and bounding box information, enriching the information in the full data set.
Optionally, the synthetic human image block 21 has the appearance of the person in the corresponding original human image block 11. In that case, the synthetic human image block 21 changes only the human pose of the original human image block 11, while maintaining its human appearance and background.
Illustratively, the synthetic human image block 21 may be generated by means of a human generator 30. As shown in fig. 4, the human generator 30 generates two corresponding synthetic human image blocks 21 based on the two original human image blocks 11.
Next, in step S4, the corresponding original human image block 11 in the original image 10 is replaced with the synthetic human image block 21 to generate the synthetic image 20. The newly generated synthetic human image block 21 already contains background information matching, at the pixel level, the background and environment information of the original image 10, so the original human image block 11 can be replaced directly with its newly generated counterpart. Thus, no element of the newly generated complete synthetic image 20 other than the person is changed. In the synthetic image 20, the new synthetic human image block 21 has both a reasonable position and a reasonable size, while conforming well to the surrounding environment information. These synthetic images 20 can provide a greater variety of human poses and bounding box information. Using the original image set together with the synthetic image set to train the human detection model 40 allows the model to achieve a higher recognition rate.
Fig. 5 illustrates a process of generating the synthetic human image block 21 according to an exemplary embodiment of the present invention. In this exemplary embodiment, step S1 further includes providing target pose information, and in step S3 the synthetic human image block 21 is generated based on each original human image block 11 and according to the target human pose, such that the synthetic human image block 21 has the target human pose represented by the target pose information.
The target pose information may be provided by a pose image containing pose key points connected according to real human skeleton links. The original human image block 11 and the pose image having the target pose are input into the human generator 30, which then outputs the synthetic human image block 21. The invention does not limit the source of the target pose: it may be the pose of another person in the original image set or the pose of a person in another data set.
Alternatively, the target pose information may be provided by a target-pose human image containing a person having the target human pose. This image may, but need not, be selected from the original image set.
Alternatively, the target pose information may be provided by position data for a set of pose key points. It should be understood that the present invention is not limited to the specific form of the target pose information.
In one exemplary embodiment, the target pose information has associated annotation information, and the synthetic human image block 21 generated in step S3 carries the annotation information associated with the corresponding target pose information. The annotation information includes, for example, human intention information and/or gesture information. The resulting synthetic images 20 then also carry associated annotation information, and such images are particularly advantageous for training a human intent recognizer, a human gesture detector, or the like, in order to enhance its performance.
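One possible way to carry annotation information through step S3, shown with an identity stand-in for the generator; the dictionary keys (`intent`, `gesture`) and all names are hypothetical:

```python
def synthesize_with_annotations(original_blocks, target_poses, generator):
    """S3 with annotations: every synthetic block inherits the annotation
    (e.g. intent or gesture) attached to the target pose that produced it."""
    samples = []
    for block in original_blocks:
        for pose in target_poses:
            samples.append({
                "image": generator(block, pose["keypoints"]),
                "annotation": {k: v for k, v in pose.items() if k != "keypoints"},
            })
    return samples

# Hypothetical annotated target poses and an identity stand-in generator.
target_poses = [
    {"keypoints": [(0, 0), (4, 4)], "intent": "crossing"},
    {"keypoints": [(0, 2), (4, 2)], "gesture": "waving"},
]
samples = synthesize_with_annotations(["block"], target_poses, lambda b, kp: b)
```

Because the annotation travels with the pose rather than the image, every synthetic sample is labeled for free, without manual re-annotation.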
In one exemplary embodiment, the human generator 30 is configured to perform the following steps:
S31, identifying at least one pose key point of the person in the original human image block 11;
S32, cutting out a plurality of foreground image patches and a plurality of background image patches from the original human image block 11 based on the at least one pose key point;
S33, extracting at least one first feature vector from the plurality of foreground image patches and the plurality of background image patches;
S34, acquiring at least one second feature vector from the target pose information; and
S35, generating a synthetic human image block 21 from the at least one first feature vector extracted in step S33 and the at least one second feature vector acquired in step S34.
The human generator 30 may also be provided as another type of generator, as long as it satisfies the functional requirement of controlling the appearance, pose and background of the generated human image. It should be understood that the present invention is not limited to a particular type of human generator 30.
Fig. 6 shows a schematic diagram of a method of training a human detection model 40 according to an exemplary embodiment of the invention. In this method, synthetic images 20 are first generated by the method of generating an image according to the present invention. Then, a training data set comprising the synthetic images 20 is provided; it may be stored in a computer-readable storage medium. Optionally, the training data set further comprises the original images 10. That is, the original images 10 and the synthetic images 20 may be used together to train the human detection model 40.
As described above, the persons in the synthetic images 20 have reasonable sizes and positions, and no mismatch between foreground and background information occurs. The synthetic images 20 can provide more varied human poses and bounding box information, making the training data set more informative. Training the human detection model 40 on the original image set together with the synthetic image set allows it to achieve a higher recognition rate.
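Assembling the training data set of the second aspect can be as simple as concatenating the synthetic and (optionally) original images; the shuffling and naming below are illustrative choices, not prescribed by the text:

```python
import random

def build_training_set(original_images, synthetic_images, shuffle=True, seed=0):
    """Training data for the human detection model: the synthetic images,
    optionally mixed with the originals for greater pose diversity."""
    data = list(synthetic_images) + list(original_images)
    if shuffle:
        random.Random(seed).shuffle(data)
    return data

train = build_training_set(["orig_1", "orig_2"], ["synth_1"], shuffle=False)
```

A seeded `random.Random` keeps the shuffle deterministic, which helps make training runs reproducible.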
Furthermore, the invention relates to a computer program product comprising computer program instructions which, when executed by one or more processors, cause the method of generating an image according to the invention or the method of training a human detection model according to the invention to be performed. The computer program instructions may be stored in a computer-readable storage medium. In the present invention, the computer-readable storage medium may include, for example, high-speed random access memory, and may also include non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Although specific embodiments of the invention have been described herein in detail, they have been presented for purposes of illustration only and are not to be construed as limiting the scope of the invention. Various substitutions, alterations, and modifications may be devised without departing from the spirit and scope of the present invention.

Claims (10)

CN202110561037.1A — 2021-05-20 — Method for generating image, method for training human detection model, program, and device — Pending — CN113191942A (en)

Priority Applications (1)

Application Number: CN202110561037.1A · Priority/Filing Date: 2021-05-20 · Title: Method for generating image, method for training human detection model, program, and device


Publications (1)

Publication Number: CN113191942A · Publication Date: 2021-07-30

Family

ID=76984668

Family Applications (1)

Application Number: CN202110561037.1A · Filing Date: 2021-05-20 · Status: Pending · Publication: CN113191942A (en)

Country Status (1)

Country: CN · CN113191942A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication Number · Priority Date · Publication Date · Assignee · Title
CN116051926B * — 2023-01-12 — 2024-04-16 — 北京百度网讯科技有限公司 — Training method of image recognition model, image recognition method and device



Legal Events

Date · Code · Title · Description
PB01 — Publication
WD01 — Invention patent application deemed withdrawn after publication (application publication date: 2021-07-30)
