Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A panoramic image is a two-dimensional image with a length and a width, and therefore has edge areas. An edge area of a panoramic image may contain only partial information of a target object. When an existing detection model detects such an incomplete target object in an edge area, it can rely only on the partial information of the target object, so the detection accuracy is low and false detections may occur.

In order to solve the above problem, an embodiment of the present application provides a solution whose basic idea is as follows: in the process of performing target detection on a panoramic image, the current first panoramic image is rolled and translated in the transverse direction so that an incomplete target object in an edge area moves toward the middle and forms a complete target object, thereby obtaining a translated second panoramic image; then, according to the rolling translation relationship between the first panoramic image and the second panoramic image, a first target detection frame corresponding to the target detection frame in the edge area of the first panoramic image is determined from the target detection frames contained in the second panoramic image; and finally, the target object contained in the first panoramic image is determined according to the target detection frame in the non-edge area of the first panoramic image and the first target detection frame in the second panoramic image. This improves the accuracy of target detection for panoramic images and solves the problem that existing target detection methods for panoramic images are insufficiently accurate.

In the embodiments of the present application, the image direction of a panoramic image in which 360-degree panoramic information is presented is referred to as the transverse (horizontal) direction of the panoramic image. Whichever image direction presents the 360-degree panoramic information is taken as the "transverse direction" of the present embodiments, and it may be either the length direction or the width direction of the image. For example, if horizontal 360-degree panoramic information is to be captured, the camera may be rotated 360 degrees about the vertical direction to obtain a panoramic image, and the 360-degree panoramic information is presented along the length direction of that panoramic image; if vertical 360-degree panoramic information is to be captured, the camera may be rotated 360 degrees about a horizontal direction to obtain a panoramic image, and the 360-degree panoramic information is presented along the width direction of that panoramic image.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1a is a flowchart of an object detection method for a panoramic image according to an exemplary embodiment of the present application. As shown in Fig. 1a, the method comprises the following steps:
101. acquiring a first panoramic image, wherein the first panoramic image comprises a target object to be detected;
102. performing rolling translation on the first panoramic image along the transverse direction to obtain a second panoramic image;
103. performing target detection on the first panoramic image and the second panoramic image to obtain a target detection frame contained in the first panoramic image and a target detection frame contained in the second panoramic image;
104. according to the rolling translation relation between the first panoramic image and the second panoramic image, determining a first target detection frame corresponding to a target detection frame in the edge area of the first panoramic image from target detection frames contained in the second panoramic image;
105. and determining a target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image.
In the embodiments of the present application, target detection may be performed on a panoramic image by a device or an electronic apparatus having a target detection function. Taking target detection of the panoramic image by an electronic device as an example, the electronic device includes, but is not limited to, a desktop computer, a notebook computer, a tablet computer, a smart phone, a wearable device, a smart home device, and the like. The electronic device provided by the embodiments of the present application can acquire the panoramic image to be detected and implement target detection by using the method provided herein. In some embodiments, the electronic device may capture the panoramic image to be detected through an image capturing unit mounted on it; for example, the image capturing unit may be a camera, a video camera, or the like. Alternatively, the electronic device may itself be implemented as a panoramic image capturing device including an image capturing unit and a target detection unit. In other embodiments, the electronic device may also receive the panoramic image to be detected from an independently arranged image capturing device; for example, the independently arranged image capturing device may be a camera, a panoramic camera, or the like.
The panoramic image may be a panoramic image of a certain target physical space, the target physical space may be a physical room space, and a target object in the panoramic image may be various room structures such as a wall, a window, a ceiling, a floor, a door and the like in the physical room space, or may be various furniture or home appliances such as a bed, a table, a chair, a television, a green plant and the like in a room.
In this embodiment, a first panoramic image of the target physical space may be acquired by the electronic device, where the first panoramic image contains the target object to be detected. When the target object to be detected is located in the edge areas, in order to improve the accuracy of target detection, the two edge areas may be spliced together so that the target object becomes complete before it is detected. In a specific implementation, the first panoramic image is rolled and translated in the transverse direction until the parts of the target object to be detected in the edge areas of the first panoramic image are spliced into a complete target object, thereby obtaining a second panoramic image. The direction of translation is not limited here: the image may be rolled from the first edge area toward the second edge area, or from the second edge area toward the first edge area; in either case, image content originally located in a non-edge area is moved into the edge areas.
Since the process of performing rolling translation on the first panoramic image is essentially a process of shifting pixels, in an optional embodiment, step 102 of performing rolling translation on the first panoramic image in the transverse direction to obtain the second panoramic image can be implemented as follows: the first panoramic image is rolled and translated by N pixels along the transverse direction to obtain the second panoramic image, where N is a positive integer.

When a target object in the edge area is detected, the more comprehensive the available information of the target object, the higher the detection accuracy; that is, the more complete the target object, the higher the detection accuracy, and the accuracy is highest when the target object is completely spliced together. When the defined range of the edge area is small, the first panoramic image should be shifted in the transverse direction by at least the number of pixels that one edge area occupies in the transverse direction, that is, N should be equal to or greater than the number of pixels contained in the edge area in the transverse direction. For example, if the edge area is defined by 10 pixels in the transverse dimension, the translation should move the target object in the edge area entirely out of that 10-pixel edge area. It should be noted that when the defined range of the edge area is relatively large, the target object may not be moved out of the edge area completely.

As shown in Fig. 1b, the panoramic image has two edge areas in the transverse direction; part of the information of a target object located in the edge areas lies in the first edge area and the other part lies in the second edge area. The panoramic image is translated by a set number of pixels, either from the second edge area toward the first edge area or from the first edge area toward the second edge area. Taking a set translation of 20 pixels from the second edge area toward the first edge area as an example, the panoramic image shown in Fig. 1c is obtained after translation. At this point, both parts of the target object have been moved into the non-edge area and spliced into a complete target object, so that target detection can be performed on the complete target object, thereby improving the accuracy of target detection for the panoramic image.
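By way of illustration only (not part of the claimed method), assuming the panoramic image is held as a NumPy array of shape height × width × channels with the 360-degree information along the width, the rolling translation of step 102 amounts to a circular shift of pixel columns:

```python
import numpy as np

def roll_panorama(image: np.ndarray, n_pixels: int) -> np.ndarray:
    """Circularly shift a panoramic image by n_pixels along the
    transverse (width) axis; pixels shifted past one edge re-enter
    at the opposite edge, so no image content is lost."""
    return np.roll(image, shift=n_pixels, axis=1)

# Example: with 10-pixel edge areas, shift by at least 10 pixels;
# shifting by half the width moves both edge areas onto the image
# center line (the default discussed below).
first_panorama = np.zeros((512, 1024, 3), dtype=np.uint8)  # placeholder image
edge_width = 10
n_pixels = max(edge_width, first_panorama.shape[1] // 2)
second_panorama = roll_panorama(first_panorama, n_pixels)
```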
Further, it may be assumed by default that the edge areas are translated to the central area of the panoramic image, i.e. N is half the number of pixels contained in the first panoramic image in the transverse direction. In that case, determining, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region position in the second panoramic image of the target detection frame in the edge area of the first panoramic image can be implemented as follows: according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region close to the center line of the second panoramic image is determined as the region position, in the second panoramic image, of the target detection frame in the edge area of the first panoramic image.
In this embodiment, after the second panoramic image is obtained, a target detection model may be used to perform target detection on the first panoramic image and the second panoramic image respectively, so as to obtain the target detection frame contained in the first panoramic image and the target detection frame contained in the second panoramic image. The target detection model may be a model based on deep learning; for example, a one-stage or two-stage deep neural network model may be used to detect the panoramic images. The deep neural network model includes at least one of YOLO, RCNN, Fast-RCNN and SSD; any variation of these deep network models, or other deep neural network models achieving the same effect, may also be used for target detection on the panoramic image to be detected. By using a deep-learning-based model to perform target detection on the panoramic image, it can be determined whether the panoramic image to be detected contains candidate targets of preset categories, as well as the positions of the candidate targets in the panoramic image to be detected.
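For illustration, a detection pass with one of the model families named above might look as follows; the use of torchvision's Faster R-CNN (which requires a recent torchvision) and the 0.5 score threshold are assumptions of this sketch, not requirements of the embodiment:

```python
import numpy as np
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Load a generic pretrained detector; any of the models named above
# (YOLO, SSD, ...) could stand in its place.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

@torch.no_grad()
def detect(image_np: np.ndarray, score_threshold: float = 0.5):
    """Return (boxes, labels) above the score threshold for an
    HxWx3 uint8 image; boxes are (x1, y1, x2, y2) pixel coordinates."""
    prediction = model([to_tensor(image_np)])[0]
    keep = prediction["scores"] >= score_threshold
    return prediction["boxes"][keep], prediction["labels"][keep]
```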
Further, there are various ways to perform target detection on the first panoramic image and the second panoramic image to obtain the target detection frames contained in each. In an optional embodiment, the first panoramic image and the second panoramic image are input into the target detection model separately for target detection, so as to obtain the target detection frame contained in the first panoramic image and the target detection frame contained in the second panoramic image. In another optional embodiment, in order to obtain the target detection frames in the two images more conveniently and quickly, the first panoramic image and the second panoramic image may be spliced into one panoramic image, and the target detection frames contained in the spliced panoramic image are then determined using the target detection model. Specifically, this may be implemented as follows: the first panoramic image and the second panoramic image are spliced along the longitudinal direction to obtain a third panoramic image, which contains both the first panoramic image and the second panoramic image; the third panoramic image is then input into the target detection model for target detection to obtain the target detection frames contained in the third panoramic image, which include the target detection frame contained in the first panoramic image and the target detection frame contained in the second panoramic image.
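A sketch of this stitched-inference variant, assuming both panoramas share the same height and width and that the detector (passed in as the illustrative `detect_fn` parameter) returns tensors of (x1, y1, x2, y2) boxes and labels, as in the `detect` sketch above:

```python
import numpy as np

def detect_via_vertical_stitch(first, second, detect_fn):
    """Stack the two panoramas vertically (the third panoramic image),
    run the detector once, and split the resulting boxes back into the
    two images' own coordinate frames."""
    h = first.shape[0]
    third = np.concatenate([first, second], axis=0)
    boxes, labels = detect_fn(third)
    boxes_first, boxes_second = [], []
    for (x1, y1, x2, y2), label in zip(boxes.tolist(), labels.tolist()):
        if y2 <= h:            # box lies entirely inside the first panorama
            boxes_first.append(((x1, y1, x2, y2), label))
        elif y1 >= h:          # box lies entirely inside the second panorama
            boxes_second.append(((x1, y1 - h, x2, y2 - h), label))
        # A box with y1 < h < y2 straddles the stitching seam; it is a
        # false detection and is dropped (the filtering discussed below).
    return boxes_first, boxes_second
```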
After the target detection frame contained in the first panoramic image and the target detection frame contained in the second panoramic image are obtained based on the third panoramic image, the method continues with step 104: determining, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the first target detection frame corresponding to the target detection frame in the edge region of the first panoramic image from the target detection frames contained in the second panoramic image. However, when target detection is performed on the third panoramic image, false detections may occur. For example, in the embodiment of the present application, incompleteness is handled in the transverse dimension of the panoramic image while the longitudinal dimension is assumed complete by default; therefore, if a target detection frame appears at the splicing position of the first panoramic image and the second panoramic image, it indicates a false detection.
In order to avoid such false detections and improve the accuracy of target detection, in an alternative embodiment, step 104 may be implemented as follows. First, the target detection frames contained in the third panoramic image that span both the first panoramic image and the second panoramic image are filtered out, yielding the target detection frames contained in the first panoramic image and the target detection frames contained in the second panoramic image. Then, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region position, in the second panoramic image, of the target detection frame in the edge area of the first panoramic image is determined; that is, the number of pixels by which the second panoramic image was translated relative to the first panoramic image is obtained, and from it the corresponding region position in the second panoramic image is determined. Finally, the target detection frames contained in the second panoramic image that are located within that region position are taken as the first target detection frames; in other words, a first target detection frame is the target detection frame in the second panoramic image that corresponds to a target detection frame contained in the edge area of the first panoramic image. In the embodiments of the present application, the region position of a target detection frame in a panoramic image refers to the position of that target detection frame within the panoramic image.
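Under the roll relation, a pixel column x of the first panoramic image lands at column (x + N) mod W in the second panoramic image, so the first image's two edge strips map to a single band of width 2·edge_width centred at column N mod W (the center line when N = W/2). A hedged sketch of the selection, using the box centre as the membership test (one of several reasonable conventions):

```python
def select_first_target_boxes(boxes_second, width, n_pixels, edge_width):
    """From the second panorama's detections ((x1, y1, x2, y2), label)
    tuples), keep the boxes lying in the region position onto which
    the first panorama's edge area is mapped."""
    seam = n_pixels % width        # where the first image's seam lands
    selected = []
    for (x1, y1, x2, y2), label in boxes_second:
        centre = 0.5 * (x1 + x2)
        if seam - edge_width <= centre <= seam + edge_width:
            selected.append(((x1, y1, x2, y2), label))
    return selected
```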
Further, after the first target detection frame corresponding to the target detection frame in the edge region of the first panoramic image is determined from the target detection frames in the second panoramic image, the target object contained in the first panoramic image is determined on that basis; that is, the method continues with step 105: determining the target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image. In an alternative embodiment, step 105 may be implemented as follows: the target detection frames in the edge region of the first panoramic image are filtered out, and the target detection frames in the second panoramic image other than the first target detection frame are filtered out; then, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the first target detection frame in the second panoramic image is converted into the edge region of the first panoramic image; and the target object contained in the first panoramic image is then determined according to the converted target detection frame in the edge region of the first panoramic image and the target detection frames in the non-edge region of the first panoramic image.
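A sketch of this merging step under the same assumptions as above; the helper name and the convention for representing a wrapped box are illustrative:

```python
def merge_detections(boxes_first, first_target_boxes,
                     width, n_pixels, edge_width):
    """Step 105 sketch: keep the first panorama's non-edge boxes, map
    the first target detection frames back through the inverse roll,
    and return the union as the final detections."""
    final = []
    for (x1, y1, x2, y2), label in boxes_first:
        # keep only boxes lying entirely in the non-edge region
        if x1 >= edge_width and x2 <= width - edge_width:
            final.append(((x1, y1, x2, y2), label))
    for (x1, y1, x2, y2), label in first_target_boxes:
        # inverse translation; a result with x1 > x2 wraps around the
        # seam, i.e. the target straddles the first image's two edges
        final.append((((x1 - n_pixels) % width, y1,
                       (x2 - n_pixels) % width, y2), label))
    return final
```

Whether a wrapped box is kept as a single circular box or split back into two fragments is a representation choice the embodiment leaves open.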
In an optional embodiment of the present application, the first panoramic image may be a panoramic image corresponding to a room space. On this basis, target detection on the first panoramic image may also involve the detection of marking lines, for example door lines, wall lines formed where two wall surfaces intersect, floor lines formed where a wall surface intersects the floor, and ceiling lines formed where a wall surface intersects the ceiling. Therefore, when performing marking line detection on the first panoramic image corresponding to the room space, not only the lines themselves but also their categories need to be detected, which may be implemented as follows: the first panoramic image is input into a marking line prediction model with a semantic annotation function to predict marking line information, so as to obtain the position information and semantic information of the marking lines existing in the first panoramic image; the marking lines in the first panoramic image are then marked according to their position information and annotated with their semantic information.
Further, in an optional embodiment, the first panoramic image is input into a marking line prediction model with a semantic annotation function to predict marking line information, so as to obtain the position information and semantic information of the marking lines existing in the first panoramic image. That is, the marking line prediction model includes at least two branches: one branch is used for predicting the positions where marking lines exist, and the other branch is used for predicting the semantic information of the marking lines. The position information and semantic information of the marking lines existing in the first panoramic image are predicted in the manner shown in Fig. 1d: the first panoramic image is input into a feature extraction network in the marking line prediction model to extract image features, so as to obtain the image features of the first panoramic image, where the image features at least include the pixel features of the first panoramic image. Then, on the one hand, the image features are input into a marking line position prediction network in the marking line prediction model to predict the marking line positions, so as to obtain first marking line features and their position information, where the first marking line features can reflect image features of the marking lines, such as corner features, color features and gradient features, and the position information can reflect the pixel coordinates of the marking line positions and the like. On the other hand, the image features are input into a marking line semantic prediction network in the marking line prediction model to predict the marking line semantics, so as to obtain second marking line features and their semantic information, where the second marking line features can likewise reflect image features of the marking lines, and the semantic information can reflect the categories of the marking lines and the objects related to them, such as door lines, wall lines formed where two wall surfaces intersect, floor lines formed where a wall surface intersects the floor, and ceiling lines formed where a wall surface intersects the ceiling. Finally, the position information of the first marking line features and the semantic information of the second marking line features are mapped to each other to obtain the position information and semantic information of the marking lines existing in the first panoramic image.
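A minimal sketch of such a two-branch layout in PyTorch; the backbone depth, channel counts, per-pixel output format and the class list are all assumptions, since the embodiment does not fix a concrete architecture:

```python
import torch
import torch.nn as nn

class MarkingLinePredictionModel(nn.Module):
    """Illustrative two-branch layout: a shared feature extraction
    network, one branch predicting where marking lines are, and one
    predicting their semantic class."""

    CLASSES = ["door line", "wall line", "floor line", "ceiling line"]

    def __init__(self, channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(              # feature extraction network
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # position branch: per-pixel probability that a marking line is present
        self.position_head = nn.Conv2d(channels, 1, 1)
        # semantic branch: per-pixel class scores over the assumed classes
        self.semantic_head = nn.Conv2d(channels, len(self.CLASSES), 1)

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)             # image (pixel) features
        position = torch.sigmoid(self.position_head(features))
        semantics = self.semantic_head(features)
        return position, semantics
```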
Further, inputting the first panoramic image into the feature extraction network in the marking line prediction model to extract image features, so as to obtain the image features of the first panoramic image, can be implemented as follows: the first panoramic image is input into the feature extraction network, pixel features of the first panoramic image are extracted, and the pixel features corresponding to each pixel position in the transverse dimension are compressed along the longitudinal dimension to obtain one-dimensional features corresponding to the first panoramic image, where the one-dimensional features represent the wall corner points in a house and the ceiling lines and/or floor lines formed by extending the wall corner points in the transverse dimension. It should be noted that compressing to one-dimensional features extracts the transverse lines in the house. Given the wall corner points, the wall lines can additionally be extracted by combining the height of the house and extending along the longitudinal direction; however, since a wall line only ever separates two adjacent wall surfaces, its semantic information is relatively simple and requires less attention.
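The compression described above can be realised as a pooling step over the height axis; a short sketch (mean pooling is an assumed choice, the embodiment only requires some reduction along the longitudinal dimension):

```python
import torch

# Placeholder feature map of shape (batch, channels, height, width).
features = torch.randn(1, 64, 512, 1024)

# Compress along the longitudinal (height) dimension so that every
# transverse pixel position keeps a single feature vector.
one_dim_features = features.mean(dim=2)   # shape: (1, 64, 1024)
```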
In an alternative embodiment, there may be marking lines formed where physical surfaces of different types overlap, for example a marking line formed by a cabinet occluding a wall surface. For this case, mapping the position information of the first marking line features and the semantic information of the second marking line features to obtain the position information and semantic information of the marking lines existing in the first panoramic image can be implemented as follows: the first marking line features are matched with the second marking line features to form at least one marking line feature pair, each marking line feature pair including a first marking line feature and a second marking line feature; for each marking line feature pair, if the first marking line feature is the same as the second marking line feature, a marking line existing in the first panoramic image is determined according to the position information of the first marking line feature, and the semantic information of the second marking line feature is taken as the semantic information of that marking line; if the first marking line feature is different from the second marking line feature, the first marking line feature and its position information are supplemented according to the semantic information of the second marking line feature, a marking line existing in the first panoramic image is determined according to the supplemented position information of the first marking line feature, and the semantic information of the second marking line feature is taken as the semantic information of that marking line. In this embodiment, semantic analysis of the marking lines is added, and the marking lines in the panoramic image are identified by combining position information and semantic information, which effectively improves the accuracy of marking line prediction for the panoramic image.
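A schematic rendering of this pairing rule; how features are matched into pairs and how a position is "supplemented" are left open by the embodiment, so both the pair representation and the `complete_position` helper below are hypothetical:

```python
def fuse_marking_lines(pairs):
    """Each pair is (first, second): dicts holding the position-branch
    and semantic-branch results, whose 'feature' values are assumed
    comparable with ==."""
    def complete_position(position, semantics):
        # Hypothetical helper: a real system would extrapolate the
        # occluded part of the line (e.g. behind a cabinet) from its
        # semantic class; returned unchanged here for brevity.
        return position

    marking_lines = []
    for first, second in pairs:
        if first["feature"] == second["feature"]:
            position = first["position"]             # pair agrees
        else:                                        # supplement the position
            position = complete_position(first["position"],
                                         second["semantics"])
        marking_lines.append({"position": position,
                              "semantics": second["semantics"]})
    return marking_lines
```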
Further, since the object detection method for panoramic images in the present application can be used in online house-viewing and home-decoration scenarios, the method may further include the following steps: generating, according to the target objects existing in the first panoramic image and in combination with the marking lines existing in the first panoramic image and their semantic information, a flat floor plan or a three-dimensional panoramic view of the house corresponding to the first panoramic image; and/or generating, according to the target objects existing in the first panoramic image and in combination with the marking lines existing in the first panoramic image and their semantic information, a decoration scheme for the house space corresponding to the first panoramic image, and outputting the decoration scheme.
According to the technical solutions provided by the embodiments of the present application, in the process of performing target detection on a panoramic image, the current first panoramic image is rolled and translated in the transverse direction so that an incomplete target object in the edge area moves toward the middle and forms a complete target object, and a translated second panoramic image is obtained. Then, according to the rolling translation relationship between the first panoramic image and the second panoramic image, a first target detection frame corresponding to the target detection frame in the edge area of the first panoramic image is determined from the target detection frames contained in the second panoramic image, and the target object contained in the first panoramic image is determined according to the target detection frame in the non-edge area of the first panoramic image and the first target detection frame in the second panoramic image. This improves the accuracy of target detection for panoramic images and solves the problem that existing target detection methods for panoramic images are insufficiently accurate.
Fig. 2 is a diagram of an object detection apparatus for a panoramic image according to an exemplary embodiment of the present application. As shown in Fig. 2, the apparatus includes:
the acquiring module 21 is configured to acquire a first panoramic image, where the first panoramic image includes a target object to be detected;
the translation module 22 is configured to perform rolling translation on the first panoramic image along the transverse direction to obtain a second panoramic image;
a target detection module 23, configured to perform target detection on the first panoramic image and the second panoramic image to obtain a target detection frame included in the first panoramic image and a target detection frame included in the second panoramic image;
a determining module 24, configured to determine, according to a rolling translation relationship between the first panoramic image and the second panoramic image, a first target detection frame corresponding to a target detection frame in an edge region of the first panoramic image from target detection frames included in the second panoramic image;
the determining module 24 is further configured to determine a target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image.
Here, it should be noted that: the target detection apparatus for a panoramic image provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principles of the above modules or units may refer to the corresponding contents in the above method embodiments, and are not described herein again.
Fig. 3 is a schematic diagram of an electronic device provided in an exemplary embodiment of the present application. As shown in Fig. 3, the electronic device includes: a memory 30a and a processor 30b. The memory 30a is configured to store a computer program, and the processor 30b is coupled to the memory 30a and configured to execute the computer program to perform the following steps:
acquiring a first panoramic image, wherein the first panoramic image comprises a target object to be detected;
performing rolling translation on the first panoramic image along the transverse direction to obtain a second panoramic image;
performing target detection on the first panoramic image and the second panoramic image to obtain a target detection frame contained in the first panoramic image and a target detection frame contained in the second panoramic image;
according to the rolling translation relation between the first panoramic image and the second panoramic image, determining a first target detection frame corresponding to a target detection frame in the edge area of the first panoramic image from target detection frames contained in the second panoramic image;
and determining a target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image.
Further, when performing rolling translation on the first panoramic image in the transverse direction to obtain the second panoramic image, the processor 30b is specifically configured to: roll and translate the first panoramic image by N pixels in the transverse direction to obtain the second panoramic image, wherein N is a positive integer, and N is equal to or greater than the number of pixels contained in the edge area in the transverse direction.
Further, when performing target detection on the first panoramic image and the second panoramic image to obtain a target detection frame contained in the first panoramic image and a target detection frame contained in the second panoramic image, the processor 30b is specifically configured to: splice the first panoramic image and the second panoramic image along the longitudinal direction to obtain a third panoramic image, wherein the third panoramic image comprises the first panoramic image and the second panoramic image; and input the third panoramic image into the target detection model for target detection to obtain a target detection frame contained in the third panoramic image, wherein the target detection frame contained in the third panoramic image comprises the target detection frame contained in the first panoramic image and the target detection frame contained in the second panoramic image.
Further, when determining, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the first target detection frame corresponding to the target detection frame in the edge region of the first panoramic image from the target detection frames contained in the second panoramic image, the processor 30b is specifically configured to: filter out the target detection frames contained in the third panoramic image that span the first panoramic image and the second panoramic image, to obtain the target detection frames contained in the first panoramic image and the target detection frames contained in the second panoramic image; determine the region position, in the second panoramic image, of the target detection frame in the edge area of the first panoramic image according to the rolling translation relationship between the first panoramic image and the second panoramic image; and take the target detection frame contained in the second panoramic image and located within that region position as the first target detection frame.
Further, when determining the target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image, the processor 30b is specifically configured to: filter out the target detection frames in the edge region of the first panoramic image, and filter out the target detection frames in the second panoramic image other than the first target detection frame; convert the first target detection frame in the second panoramic image into the edge region of the first panoramic image according to the rolling translation relationship between the first panoramic image and the second panoramic image; and determine the target object contained in the first panoramic image according to the converted target detection frame in the edge region of the first panoramic image and the target detection frames in the non-edge region of the first panoramic image.
Further, N is half the number of pixels contained in the first panoramic image in the transverse direction; when determining, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region position in the second panoramic image of the target detection frame in the edge region of the first panoramic image, the processor 30b is specifically configured to: determine, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region close to the center line of the second panoramic image as the region position, in the second panoramic image, of the target detection frame in the edge region of the first panoramic image.
Further, the first panoramic image is a panoramic image corresponding to a room space, and accordingly the processor 30b is further configured to: input the first panoramic image into a marking line prediction model with a semantic annotation function to predict marking line information, so as to obtain position information and semantic information of the marking lines existing in the first panoramic image; and mark the marking lines in the first panoramic image according to their position information and annotate them with their semantic information.
Further, when inputting the first panoramic image into the marking line prediction model with the semantic annotation function to predict marking line information so as to obtain the position information and semantic information of the marking lines existing in the first panoramic image, the processor 30b is specifically configured to: input the first panoramic image into a feature extraction network in the marking line prediction model to extract image features, so as to obtain the image features of the first panoramic image; input the image features into a marking line position prediction network in the marking line prediction model to predict marking line positions, so as to obtain first marking line features and their position information; input the image features into a marking line semantic prediction network in the marking line prediction model to predict marking line semantics, so as to obtain second marking line features and their semantic information; and map the position information of the first marking line features and the semantic information of the second marking line features to obtain the position information and semantic information of the marking lines existing in the first panoramic image.
Further, when inputting the first panoramic image into the feature extraction network in the marking line prediction model to extract image features so as to obtain the image features of the first panoramic image, the processor 30b is specifically configured to: input the first panoramic image into the feature extraction network, extract pixel features of the first panoramic image, and compress the pixel features corresponding to each pixel position in the transverse dimension along the longitudinal dimension to obtain one-dimensional features corresponding to the first panoramic image, wherein the one-dimensional features represent the wall corner points in a house and the ceiling lines and/or floor lines formed by extending the wall corner points in the transverse dimension.
Further, when mapping the position information of the first marking line features and the semantic information of the second marking line features to obtain the position information and semantic information of the marking lines existing in the first panoramic image, the processor 30b is specifically configured to: match the first marking line features with the second marking line features to form at least one marking line feature pair, each marking line feature pair including a first marking line feature and a second marking line feature; for each marking line feature pair, if the first marking line feature is the same as the second marking line feature, determine a marking line existing in the first panoramic image according to the position information of the first marking line feature, and take the semantic information of the second marking line feature as the semantic information of that marking line; and if the first marking line feature is different from the second marking line feature, supplement the first marking line feature and its position information according to the semantic information of the second marking line feature, determine a marking line existing in the first panoramic image according to the supplemented position information of the first marking line feature, and take the semantic information of the second marking line feature as the semantic information of that marking line.
Further, the processor 30b is further configured to: generate, according to the target objects existing in the first panoramic image and in combination with the marking lines existing in the first panoramic image and their semantic information, a flat floor plan or a three-dimensional panoramic view of the house corresponding to the first panoramic image; and/or generate, according to the target objects existing in the first panoramic image and in combination with the marking lines existing in the first panoramic image and their semantic information, a decoration scheme for the house space corresponding to the first panoramic image, and output the decoration scheme.
Further, as shown in Fig. 3, the electronic device further includes: a display 30c, a communication component 30d, a power component 30e, an audio component 30f, and the like. Only some of the components are schematically shown in Fig. 3, which does not mean that the electronic device includes only the components shown in Fig. 3. The electronic device of this embodiment may be implemented as a desktop computer, a notebook computer, a smart phone, an IoT device, or another terminal device.
Here, it should be noted that: the electronic device provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing method embodiments, which is not described herein again.
An exemplary embodiment of the application also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a first panoramic image, wherein the first panoramic image comprises a target object to be detected;
performing rolling translation on the first panoramic image along the transverse direction to obtain a second panoramic image;
performing target detection on the first panoramic image and the second panoramic image to obtain a target detection frame contained in the first panoramic image and a target detection frame contained in the second panoramic image;
according to the rolling translation relation between the first panoramic image and the second panoramic image, determining a first target detection frame corresponding to a target detection frame in the edge area of the first panoramic image from target detection frames contained in the second panoramic image;
and determining a target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image.
Further, when performing rolling translation on the first panoramic image in the transverse direction to obtain the second panoramic image, the processor is specifically configured to: roll and translate the first panoramic image by N pixels in the transverse direction to obtain the second panoramic image, wherein N is a positive integer, and N is equal to or greater than the number of pixels contained in the edge area in the transverse direction.
Further, when performing target detection on the first panoramic image and the second panoramic image to obtain a target detection frame contained in the first panoramic image and a target detection frame contained in the second panoramic image, the processor is specifically configured to: splice the first panoramic image and the second panoramic image along the longitudinal direction to obtain a third panoramic image, wherein the third panoramic image comprises the first panoramic image and the second panoramic image; and input the third panoramic image into the target detection model for target detection to obtain the target detection frames contained in the third panoramic image, wherein the target detection frames contained in the third panoramic image comprise the target detection frame contained in the first panoramic image and the target detection frame contained in the second panoramic image.
Further, when determining, according to the rolling translation relationship between the first panoramic image and the second panoramic image, a first target detection frame corresponding to a target detection frame in the edge region of the first panoramic image from the target detection frames contained in the second panoramic image, the processor is specifically configured to: filter out the target detection frames contained in the third panoramic image that span the first panoramic image and the second panoramic image, to obtain the target detection frames contained in the first panoramic image and the target detection frames contained in the second panoramic image; determine the region position, in the second panoramic image, of the target detection frame in the edge area of the first panoramic image according to the rolling translation relationship between the first panoramic image and the second panoramic image; and take the target detection frame contained in the second panoramic image and located within that region position as the first target detection frame.
Further, when determining the target object contained in the first panoramic image according to the target detection frame in the non-edge region of the first panoramic image and the first target detection frame in the second panoramic image, the processor is specifically configured to: filter out the target detection frames in the edge region of the first panoramic image, and filter out the target detection frames in the second panoramic image other than the first target detection frame; convert the first target detection frame in the second panoramic image into the edge region of the first panoramic image according to the rolling translation relationship between the first panoramic image and the second panoramic image; and determine the target object contained in the first panoramic image according to the converted target detection frame in the edge region of the first panoramic image and the target detection frames in the non-edge region of the first panoramic image.
Further, N is half the number of pixels contained in the first panoramic image in the transverse direction; when determining, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region position in the second panoramic image of the target detection frame in the edge area of the first panoramic image, the processor is specifically configured to: determine, according to the rolling translation relationship between the first panoramic image and the second panoramic image, the region close to the center line of the second panoramic image as the region position, in the second panoramic image, of the target detection frame in the edge area of the first panoramic image.
Further, the first panoramic image is a panoramic image corresponding to a room space, and accordingly the processor is further configured to: input the first panoramic image into a marking line prediction model with a semantic annotation function to predict marking line information, so as to obtain position information and semantic information of the marking lines existing in the first panoramic image; and mark the marking lines in the first panoramic image according to their position information and annotate them with their semantic information.
Further, when inputting the first panoramic image into the marking line prediction model with the semantic annotation function to predict marking line information so as to obtain the position information and semantic information of the marking lines existing in the first panoramic image, the processor is specifically configured to: input the first panoramic image into a feature extraction network in the marking line prediction model to extract image features, so as to obtain the image features of the first panoramic image; input the image features into a marking line position prediction network in the marking line prediction model to predict marking line positions, so as to obtain first marking line features and their position information; input the image features into a marking line semantic prediction network in the marking line prediction model to predict marking line semantics, so as to obtain second marking line features and their semantic information; and map the position information of the first marking line features and the semantic information of the second marking line features to obtain the position information and semantic information of the marking lines existing in the first panoramic image.
Further, when inputting the first panoramic image into the feature extraction network in the marking line prediction model to extract image features so as to obtain the image features of the first panoramic image, the processor is specifically configured to: input the first panoramic image into the feature extraction network, extract pixel features of the first panoramic image, and compress the pixel features corresponding to each pixel position in the transverse dimension along the longitudinal dimension to obtain one-dimensional features corresponding to the first panoramic image, wherein the one-dimensional features represent the wall corner points in a house and the ceiling lines and/or floor lines formed by extending the wall corner points in the transverse dimension.
Further, when mapping the position information of the first marking line features and the semantic information of the second marking line features to obtain the position information and semantic information of the marking lines existing in the first panoramic image, the processor is specifically configured to: match the first marking line features with the second marking line features to form at least one marking line feature pair, each marking line feature pair including a first marking line feature and a second marking line feature; for each marking line feature pair, if the first marking line feature is the same as the second marking line feature, determine a marking line existing in the first panoramic image according to the position information of the first marking line feature, and take the semantic information of the second marking line feature as the semantic information of that marking line; and if the first marking line feature is different from the second marking line feature, supplement the first marking line feature and its position information according to the semantic information of the second marking line feature, determine a marking line existing in the first panoramic image according to the supplemented position information of the first marking line feature, and take the semantic information of the second marking line feature as the semantic information of that marking line.
Further, the processor is further configured to: generate, according to the target objects existing in the first panoramic image and in combination with the marking lines existing in the first panoramic image and their semantic information, a flat floor plan or a three-dimensional panoramic view of the house corresponding to the first panoramic image; and/or generate, according to the target objects existing in the first panoramic image and in combination with the marking lines existing in the first panoramic image and their semantic information, a decoration scheme for the house space corresponding to the first panoramic image, and output the decoration scheme.
Here, it should be noted that: the storage medium provided by the foregoing embodiment may implement the technical solutions described in the foregoing method embodiments, and details are not described here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.