Disclosure of Invention
For a light field camera, angular resolution and spatial resolution are inherently in conflict: increasing the angular resolution necessarily sacrifices spatial resolution, and increasing the spatial resolution necessarily reduces angular resolution. The invention provides a light field video imaging system based on a hybrid camera array, in which a global camera, a plurality of local high-definition cameras and a global light field camera simultaneously acquire, respectively, a global ordinary image, local high-definition images and a global light field image of the same wide field of view; these are then transformed, fused and super-resolved to obtain a wide-field high-definition light field image. The system thereby overcomes the above defect of light field cameras and can produce wide-field light field video at the billion-pixel level.
In order to achieve the above object, one embodiment of the present invention provides the following technical solutions:
a light field video imaging system based on a hybrid camera array comprises a control holder, an image processing unit, a storage unit and the hybrid camera array arranged on the control holder; the hybrid camera array comprises at least one global camera, at least one global light field camera, and a plurality of local cameras; the global camera, the global light field camera and the local cameras synchronously acquire a global low-resolution reference video, a global low-resolution light field video and a plurality of different local high-resolution videos of a predetermined scene respectively and store the videos in the storage unit; the image processing unit is connected with the storage unit and is used for carrying out image processing on each group of synchronized frame images of the global low-resolution reference video, the global low-resolution light field video and the plurality of local high-resolution videos so as to obtain a global high-resolution light field video; wherein a group of synchronized frame images includes a global reference image from the reference video, a global light field image from the light field video, and local images from the high-resolution videos, the image processing including the following steps S1 to S4:
S1, block matching: finding a corresponding block of each local image in the global reference image by adopting a block matching algorithm;
S2, transformation and registration: transforming each local image onto its corresponding block in the global reference image and performing image registration on each local image and its corresponding block to obtain a global high-resolution image of the predetermined scene;
S3, image super-resolution: up-sampling the low-resolution global light field image by a set magnification factor to obtain a sampled low-resolution light field image, performing a Fourier transform on the sampled low-resolution light field image to obtain a first spectrum image, and performing low-pass filtering on the first spectrum image; performing high-pass filtering on the global high-resolution image to obtain a second spectrum image;
S4, fusion recovery: linearly adding the low-pass-filtered first spectrum image and the second spectrum image and then performing an inverse Fourier transform to obtain a global high-resolution light field image.
The light field video imaging system provided by the invention uses a hybrid camera array consisting of a low-pixel global industrial camera, a low-pixel global light field camera and a plurality of high-definition local cameras to synchronously acquire a global low-pixel video, a global light field video and different local high-definition videos of a distant, wide field of view. The acquired videos are then processed by a series of image processing algorithms: the frames of the global low-pixel video and of the local high-definition videos are matched and transformed to obtain global high-resolution images; the frames of the global light field video are super-resolved with the aid of these global high-resolution images, which compensates for the low spatial resolution of the light field images and yields high-resolution light field images; and the high-resolution light field frames obtained by super-resolution are combined into a high-resolution light field video. This overcomes the technical contradiction that the spatial resolution and the angular resolution of light field video cannot both be satisfied at the same time, and if billion-pixel-level local cameras are used, wide-field billion-pixel-level light field video can be produced.
Another embodiment of the present invention provides a video processing method, configured to perform a fusion process on a synchronously acquired global low-resolution reference video and global low-resolution light field video of the same scene and a plurality of local high-resolution videos of the scene from different viewing angles, so as to obtain a global high-resolution light field video of the scene; the fusion process includes performing the following steps S1 to S4 on each group of synchronized frame images of the reference video, the light field video and the plurality of local high-resolution videos:
wherein a group of synchronized frame images includes a global reference image from the reference video, a global light field image from the light field video, and local images from the high-resolution videos; the steps S1 to S4 are:
S1, block matching: finding a corresponding block of each local image in the global reference image by adopting a block matching algorithm;
S2, transformation and registration: transforming each local image onto its corresponding block in the global reference image and registering it to obtain a global high-resolution image of the scene;
S3, image super-resolution: up-sampling the low-resolution global light field image by a set magnification factor to obtain a sampled low-resolution light field image, performing a Fourier transform on the sampled low-resolution light field image to obtain a first spectrum image, and performing low-pass filtering on the first spectrum image; performing high-pass filtering on the global high-resolution image to obtain a second spectrum image;
S4, fusion recovery: linearly adding the low-pass-filtered first spectrum image and the second spectrum image and then performing an inverse Fourier transform to obtain a global high-resolution light field image.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
A specific embodiment of the invention provides a light field video imaging system capable of generating billion-pixel light field video, which is realized on the basis of a hybrid camera array and a dedicated video image fusion algorithm and specifically comprises a control holder, an image processing unit, a storage unit and the hybrid camera array arranged on the control holder. The hybrid camera array comprises at least one global camera, at least one global light field camera, and a plurality of local cameras; the global camera, the global light field camera and the local cameras synchronously acquire a global low-resolution reference video, a global low-resolution light field video and a plurality of different local high-resolution videos of a predetermined scene respectively and store the videos in the storage unit. Referring to fig. 1, the image processing unit is connected to the storage unit and is configured to perform image processing on each group of synchronized frame images of the global low-resolution reference video, the global low-resolution light field video and the plurality of local high-resolution videos, so as to obtain a global high-resolution light field video; wherein a group of synchronized frame images includes a global reference image from the reference video, a global light field image from the light field video, and local images from the high-resolution videos. Referring to fig. 1, the image processing includes the following steps S1 to S4:
S1, block matching: finding a corresponding block of each local image in the global reference image by adopting a block matching algorithm;
S2, transformation and registration: transforming each local image onto its corresponding block in the global reference image and performing image registration on each local image and its corresponding block to obtain a global high-resolution image of the predetermined scene;
S3, image super-resolution: up-sampling the low-resolution global light field image by a set magnification factor to obtain a sampled low-resolution light field image, performing a Fourier transform on the sampled low-resolution light field image to obtain a first spectrum image, and performing low-pass filtering on the first spectrum image; performing high-pass filtering on the global high-resolution image to obtain a second spectrum image;
S4, fusion recovery: linearly adding the low-pass-filtered first spectrum image and the second spectrum image and then performing an inverse Fourier transform to obtain a global high-resolution light field image.
In the hybrid camera array, the global camera is an ordinary camera whose resolution is much lower than that of the local cameras; it is used to capture a global video of a wide field of view, which serves as the reference video in the subsequent video processing, while the global light field video acquired by the global light field camera has the traditional light-field characteristics of low spatial resolution and high angular resolution. As shown in fig. 2, the image 10 is one frame of the global reference image captured by the global camera, the local image 100' is a high-definition image of the local scene 100 in the image 10 captured by a local camera at the same time, and the image 1000' is an enlarged view of a local area in the local image 100', in which the words on the guideboard are clearly legible; this shows that the pixel count of the local image 100' is very high. The focal length of the local cameras is much larger than that of the global camera; in one embodiment, the focal length of a local camera is more than 8 times that of the global camera, with the focal length of the local camera being, for example, 32-200 mm and the focal length of the global camera being, for example, 4-25 mm. Preferably, the hybrid camera array comprises one global camera, one global light field camera and a series (e.g. more than 100) of ultra-high-pixel local cameras. The local cameras can be fixedly mounted on the control holder, or can be rotatably mounted on it.
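Purely as an illustration of the camera parameters described above, the following minimal Python sketch records the focal lengths from this embodiment and derives the up-sampling factor f_h/f_l used later in step S3. The field names, the chosen focal lengths within the stated ranges and the sensor resolutions are hypothetical, not values fixed by the invention.

```python
# Illustrative sketch only: records example camera parameters and the resulting
# magnification factor f_h/f_l used for up-sampling in step S3.
from dataclasses import dataclass

@dataclass
class CameraSpec:                 # hypothetical container, not part of the invention
    name: str
    focal_length_mm: float        # e.g. 4-25 mm (global camera), 32-200 mm (local cameras)
    resolution: tuple             # (width, height) in pixels, assumed values

global_cam = CameraSpec("global", focal_length_mm=8.0, resolution=(1920, 1080))
local_cam  = CameraSpec("local-01", focal_length_mm=135.0, resolution=(7680, 4320))

# The set magnification factor of step S3 is the focal-length ratio f_h / f_l.
magnification = local_cam.focal_length_mm / global_cam.focal_length_mm
assert magnification > 8, "this embodiment assumes the local focal length exceeds 8x the global one"
print(f"up-sampling factor f_h/f_l = {magnification:.1f}")
```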
The global camera, the global light field camera and the local cameras acquire images of the large scene at the same time by means of the cameras' synchronous acquisition module, and both the acquired image data and the intermediate image data generated by the image processing unit during image processing can be stored in the storage unit.
In a specific embodiment, the image processing unit performs the image processing frame by frame; that is, each run of the image processing algorithm processes a global reference image (captured by the global camera), a global light field image (captured by the global light field camera) and a plurality of different local images (captured respectively by the local cameras) taken at the same moment, to obtain the global high-resolution light field image for that moment. The image processing algorithm is executed on every group of synchronized frame images to obtain a sequence of consecutive global high-resolution light field frames, which are then combined into the global high-resolution light field video. In other words, through the image processing algorithm of the foregoing steps S1 to S4, the invention can fuse the synchronously acquired global low-resolution reference video, global low-resolution light field video and local high-resolution videos of the same scene from different perspectives into a global high-resolution light field video of the scene.
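Purely for illustration, the sketch below shows what this frame-by-frame driver loop might look like. The function `process_frame` is only a placeholder (plain up-sampling) standing in for steps S1 to S4, which are illustrated in the later examples; all names and array sizes are hypothetical.

```python
# Illustrative frame-wise driver loop; `process_frame` is a stand-in so the sketch
# runs end-to-end, not an implementation of the actual steps S1-S4.
import numpy as np

def process_frame(ref_img, lf_img, local_imgs, mag):
    # Placeholder for S1-S4: here it merely up-samples the light field frame.
    return np.kron(lf_img, np.ones((mag, mag), dtype=lf_img.dtype))

def fuse_video(ref_frames, lf_frames, local_frame_groups, mag=8):
    """One fused high-resolution light field frame per group of synchronized frames."""
    assert len(ref_frames) == len(lf_frames) == len(local_frame_groups)
    return [process_frame(r, lf, group, mag)
            for r, lf, group in zip(ref_frames, lf_frames, local_frame_groups)]

# toy usage: 3 synchronized frame groups of random data
refs    = [np.random.rand(64, 64) for _ in range(3)]
lfs     = [np.random.rand(32, 32) for _ in range(3)]
locals_ = [[np.random.rand(256, 256) for _ in range(4)] for _ in range(3)]
video = fuse_video(refs, lfs, locals_, mag=8)
print(len(video), video[0].shape)   # 3 fused frames, each 256x256
```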
In a preferred embodiment, referring to fig. 1, in step S1 a zero-mean normalized cross-correlation block matching algorithm (abbreviated as the "ZNCC algorithm") is used for block matching, preferably with two ZNCC iterations, to find the corresponding block (also referred to as the "corresponding region" or "reference block") of each local image in the global reference image, thereby obtaining the pixel matching relationship between each local image and the global reference image. As shown in fig. 3, after the ZNCC block matching process 001, the corresponding reference block 101 of the local image 101' in the global reference image 10 is found.
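As a minimal sketch of such block matching, the snippet below uses OpenCV's TM_CCOEFF_NORMED template-matching score, which is a zero-mean normalized cross-correlation. It assumes the high-resolution local image is first down-scaled by the focal-length ratio so that both images are at a comparable scale; the function name and this down-scaling step are illustrative, not prescribed by the invention.

```python
# Minimal ZNCC block-matching sketch (illustrative, single best match per local image).
import cv2
import numpy as np

def find_corresponding_block(global_ref_gray, local_gray, scale):
    """Return (x, y, w, h) of the block in the global reference image that best matches
    the down-scaled local image, plus the ZNCC score. Both inputs are grayscale."""
    # bring the local image to roughly the reference image's scale (scale ~ f_h/f_l)
    small = cv2.resize(local_gray, None, fx=1.0 / scale, fy=1.0 / scale,
                       interpolation=cv2.INTER_AREA)
    # TM_CCOEFF_NORMED is a zero-mean normalized cross-correlation response map
    response = cv2.matchTemplate(global_ref_gray, small, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(response)
    x, y = top_left
    h, w = small.shape
    return (x, y, w, h), score
```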
As shown in fig. 3, after the reference block of each local image has been found in the global reference image, processing continues along path 002 with transformation and registration, which is roughly divided into three steps: an overall (global) transformation, a mesh-based transformation, and a temporal and spatial smoothing transformation.
First, the overall transformation is carried out: each local image and its reference block are treated as a pair, and feature point extraction and matching are performed on each pair of local image and corresponding block by the ZNCC algorithm, so as to extract the corresponding (matched) feature point pairs in each local image and its corresponding block. In general, a certain number of feature point pairs can be extracted in one iteration; based on these feature point pairs, a homography matrix between the local image and its corresponding block can be preliminarily calculated, and based on the homography matrix an image transformation can be performed, i.e. the local image is transformed onto its corresponding block in the global reference image. The extraction of feature point pairs can be performed repeatedly: for example, a preliminary homography matrix is calculated after the first iterative extraction, more feature point pairs are extracted based on this preliminary homography matrix in the second iterative extraction, the homography matrix is then recalculated from the feature point pairs extracted in the two iterations for the next extraction, and so on. In a preferred embodiment of the present invention, two ZNCC iterations are performed to extract the feature point pairs and calculate the homography matrix, and the process of the two iterations can be represented by an iteration formula of the following form:
M̂ = argmax_{(p_l, p_r)} ZNCC(I_l(p_l), I_r(p_r)),  s.t.  ||p_r − π(H·p̃_l)|| < ε,  p_l ∈ [0, w] × [0, w]
wherein M̂ represents the set of matched point pairs between a local image and its corresponding block (characterizing the matching relationship between the two); I_l and I_r respectively represent a local image and its corresponding block in the global reference image; p_l and p_r are respectively a feature point of the local image I_l and the corresponding feature point of its corresponding block I_r, i.e. (p_l, p_r) is a feature point pair; ZNCC() represents the energy function computed between the local image and its corresponding block by the ZNCC algorithm; H represents the homography matrix, initialized to the identity matrix; p̃_l is the homogeneous coordinate of p_l, and π() represents the central projection and inverse normalization function; w is the size of the local image (the local image is a square, w being its side length); and ε is the search width. When feature point extraction and matching are performed in the first iteration, w and ε are both set to 256 and the H in the constraint (s.t. denotes the constraint) is the identity matrix; as shown in fig. 3, the first iteration 003 is performed with this identity matrix to establish a matching correspondence, the successfully matched feature points being displayed in the local image 101'a, i.e. a certain number of feature point pairs are extracted and a homography matrix H1 is calculated (which replaces H in the constraint for the second iteration). The second iteration 005 is then performed to improve the matching, with w and ε set to 128, extracting more feature point pairs and recalculating the homography matrix to obtain an updated homography matrix H2; the feature points successfully matched after the two iterations are displayed in the local image 101'b. In this preferred embodiment, the local image is transformed onto its corresponding reference block through the two iterations 003 and 005. The above ZNCC iterations are performed on all local images to extract their feature point pairs and calculate their respective homography matrices, so that all local images are transformed onto their corresponding reference blocks; this completes the overall transformation and yields a preliminary global transformation image.
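A hedged sketch of this two-pass, coarse-to-fine homography estimation is given below. The patent extracts point pairs with a ZNCC energy; here ORB features with RANSAC are used as a stand-in so the sketch stays short and runnable, and the two shrinking inlier thresholds only loosely mirror the reduction of w and ε from 256 to 128. None of this is the invention's reference implementation.

```python
# Hedged sketch: two-pass homography estimation with ORB + RANSAC as a stand-in
# for the ZNCC-based feature point extraction described in the text.
import cv2
import numpy as np

def estimate_homography_two_pass(local_img, ref_block):
    """Return a homography mapping the (grayscale) local image onto its reference block."""
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(local_img, None)
    k2, d2 = orb.detectAndCompute(ref_block, None)
    if d1 is None or d2 is None:
        return np.eye(3)                               # not enough texture to match
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if len(matches) < 4:
        return np.eye(3)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    H = np.eye(3)                                      # initialized to the identity matrix
    for thresh in (8.0, 4.0):                          # pass 1 (coarse), pass 2 (refined)
        H_new, mask = cv2.findHomography(src, dst, cv2.RANSAC, thresh)
        if H_new is not None:
            H = H_new
            inliers = mask.ravel().astype(bool)
            src, dst = src[inliers], dst[inliers]      # keep only the matched point pairs
    return H

# The local image is then warped onto its reference block, e.g.:
# warped = cv2.warpPerspective(local_img, H, ref_block.shape[1::-1])
```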
The global transformation image obtained by the above overall transformation has some regions with poor transformation quality where feature point pairs are lacking, especially near the image edges. A mesh-based transformation is therefore performed next: on the basis of the preliminary global transformation image obtained above, the feature point pairs extracted during the overall transformation are subjected to a mesh-based transformation using an ASAP (as-similar-as-possible) transformation framework (H3 in fig. 3 denotes the mesh-based transformation), and the result of the mesh transformation is then subjected to an optical-flow-based transformation (H4 in fig. 3 denotes the optical-flow-based transformation) to optimize the pixel matching relationship and obtain more reliable feature point pairs; the feature points successfully matched at this stage, together with the resulting optical flow, are displayed in the local image 101'c. The distortion introduced by the optical-flow transformation is balanced against the stability of the local image, the homography matrix is recalculated, and the mesh-based and optical-flow-based transformations are thereby completed, giving the transformation result 101'd. The transformed and registered local image 101'' then undergoes color calibration 012. The flow shown in fig. 3 is performed for every local image, so that all local images are transformed onto the reference image, thereby obtaining an optimized global transformation image.
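The snippet below is a hedged sketch of the optical-flow refinement applied after the mesh-based warp: the already-warped local image is pushed toward its reference block along a dense flow field. Farnebäck flow is used purely as an illustration of "transformation based on optical flow"; the patent does not prescribe a particular flow algorithm, and the parameter values are arbitrary.

```python
# Hedged sketch of optical-flow refinement of an already-warped local image.
# Expects 8-bit single-channel (grayscale) inputs.
import cv2
import numpy as np

def refine_with_optical_flow(warped_local_gray, ref_block_gray):
    """Warp `warped_local_gray` toward `ref_block_gray` along a dense flow field."""
    # the reference block is lower-resolution; bring it up to the warped image's size
    ref_up = cv2.resize(ref_block_gray, warped_local_gray.shape[1::-1],
                        interpolation=cv2.INTER_CUBIC)
    # dense flow from the reference toward the warped local image
    # (args: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags)
    flow = cv2.calcOpticalFlowFarneback(ref_up, warped_local_gray, None,
                                        0.5, 4, 21, 3, 5, 1.1, 0)
    h, w = warped_local_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # resample the warped local image so that it aligns with the reference geometry
    return cv2.remap(warped_local_gray, map_x, map_y, cv2.INTER_LINEAR)
```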
The optimized global transformation image is a high-resolution global image synthesized from the local images. To ensure the temporal stability of the overall video once all subsequent frames are combined, a temporal and spatial smoothing transformation can additionally be applied to the optimized global transformation image.
In a specific embodiment, the temporal and spatial smoothing transformation is performed by introducing a temporal stability constraint, the energy function of the smoothing transformation being
E(V) = λ_r·E_r(V) + λ_t·E_t(V) + λ_s·E_s(V)
where V represents the homography transformation parameterized by the vertices of the mesh; E_r(V) is the sum of the distances between each pair of feature points of the local images in the global transformation image and the global reference image; E_t(V) is the temporal stability constraint; E_s(V) is a spatial smoothing term defined on the spatial deformation between adjacent mesh vertices; and λ_r, λ_t and λ_s are all constants greater than 0; wherein:
α_{p_l} is the bilinear interpolation weight of p_l;
p_l' is the feature point in the temporal prior image corresponding to the feature point p_l in the local image; B is an indicator function used to check whether the pixel point p_l lies on a static background, B(p_l) = 0 indicating that p_l lies on a moving background; and S is the global transformation function between the local image and its temporal prior image.
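The exact expressions of the three energy terms are not reproduced in this text, so the sketch below only illustrates the structure of E(V) under assumed quadratic-style forms: E_r as the summed distance between matched feature point pairs after warping, E_t as the deviation of mesh vertices from their temporal prior counted on the static background (via the indicator B), and E_s as the deformation between neighbouring mesh vertices. These forms are assumptions for illustration, not the invention's definitions.

```python
# Heavily hedged sketch of evaluating E(V) = lr*Er + lt*Et + ls*Es; the term forms
# below are assumed for illustration only.
import numpy as np

def smoothing_energy(warped_pts, ref_pts, vert, vert_prior, static_mask, grid_shape,
                     lam_r=1.0, lam_t=1.0, lam_s=1.0):
    # E_r: sum of distances between matched feature point pairs after warping (assumed L2)
    e_r = np.sum(np.linalg.norm(warped_pts - ref_pts, axis=1))
    # E_t: deviation of mesh vertices from the temporal prior, counted only where the
    # static-background indicator B is 1 (static_mask plays the role of B)
    e_t = np.sum(static_mask * np.linalg.norm(vert - vert_prior, axis=1))
    # E_s: spatial deformation between neighbouring mesh vertices (assumed form)
    rows, cols = grid_shape
    v = vert.reshape(rows, cols, 2)
    e_s = (np.sum(np.linalg.norm(v[:, 1:] - v[:, :-1], axis=2))
           + np.sum(np.linalg.norm(v[1:, :] - v[:-1, :], axis=2)))
    return lam_r * e_r + lam_t * e_t + lam_s * e_s
```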
After the above series of transformations and registration, a global high-resolution image is obtained. Considering that the differing color responses and illumination of the local cameras may leave the local images within the global high-resolution image with inconsistent colors, color correction can be applied to each local image until it is consistent with the global reference image, so that the global high-resolution image has a uniform color style as a whole. In addition, the global high-resolution image can be further optimized by removing the overlapping portions between the transformed local images with a graph-cut method, so as to minimize the image registration error.
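A minimal color-correction sketch is shown below: it matches each channel's mean and standard deviation of the warped local image to those of its reference block. The patent only requires that the local images be made consistent with the global reference image; this statistics-matching approach is one simple way of doing so, not necessarily the correction used by the invention.

```python
# Minimal per-channel color correction by matching mean and standard deviation.
import numpy as np

def match_color_statistics(local_img, ref_block):
    """Shift each color channel of local_img toward the statistics of ref_block (uint8 RGB)."""
    local = local_img.astype(np.float32)
    ref = ref_block.astype(np.float32)
    out = np.empty_like(local)
    for c in range(local.shape[2]):                      # per color channel
        mu_l, sd_l = local[..., c].mean(), local[..., c].std() + 1e-6
        mu_r, sd_r = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (local[..., c] - mu_l) * (sd_r / sd_l) + mu_r
    return np.clip(out, 0, 255).astype(np.uint8)
```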
After the global high-resolution image has been obtained, image super-resolution needs to be applied to the global light field image to remedy its low spatial resolution. The specific method is as follows: the global light field image of low (spatial) resolution is up-sampled by the set magnification factor to obtain a sampled low-resolution light field image; a Fourier transform is performed on the sampled low-resolution light field image to obtain a first spectrum image, and low-pass filtering is applied to the first spectrum image; high-pass filtering is applied to the global high-resolution image to obtain a second spectrum image. The low-pass-filtered first spectrum image and the second spectrum image are then linearly added and an inverse Fourier transform is performed to obtain the global high-resolution light field image. The set magnification factor is f_h/f_l, where f_h and f_l are the focal lengths of the local camera and the global camera, respectively.
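A sketch of this spectral fusion (steps S3 and S4) on grayscale images is given below: the light field image is up-sampled onto the high-resolution grid (realizing the f_h/f_l magnification), its spectrum is low-pass filtered, the high frequencies of the global high-resolution image are added, and the sum is transformed back. The circular low/high-pass masks and the cut-off radius are illustrative choices, not values fixed by the patent.

```python
# Sketch of steps S3-S4: Fourier-domain fusion of the up-sampled light field image
# (low frequencies) with the global high-resolution image (high frequencies).
import numpy as np
import cv2

def spectral_fusion(lf_img, global_hr_img, cutoff_ratio=0.1):
    """Both inputs are 2D (grayscale) arrays; global_hr_img defines the output size."""
    h, w = global_hr_img.shape
    # S3: up-sample the low-resolution light field image by the set factor (to the HR grid)
    lf_up = cv2.resize(lf_img, (w, h), interpolation=cv2.INTER_CUBIC)

    # first spectrum image: FFT of the up-sampled light field image, low-pass filtered
    lf_spec = np.fft.fftshift(np.fft.fft2(lf_up))
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_pass = radius <= cutoff_ratio * min(h, w)        # illustrative circular cut-off
    # second spectrum image: FFT of the global high-resolution image, high-pass filtered
    hr_spec = np.fft.fftshift(np.fft.fft2(global_hr_img)) * (~low_pass)

    # S4: linearly add the two spectra and apply the inverse Fourier transform
    fused = lf_spec * low_pass + hr_spec
    return np.real(np.fft.ifft2(np.fft.ifftshift(fused)))
```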
The above series of image processing is performed on each group of synchronized frame images of the global reference video, the global light field video and all local videos, so that each group of synchronized frame images finally yields one corresponding frame of global high-resolution light field image. Finally, all these global high-resolution light field frames are synthesized into the desired global high-resolution light field video. By using gigapixel-level local cameras in the system to acquire gigapixel-level local images, a wide-field global gigapixel light field video can ultimately be obtained.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications may be made without departing from the spirit of the invention, and all such substitutions and modifications shall be considered to fall within the protection scope of the invention.