The present disclosure is further described below in detail with reference to the accompanying drawings and the embodiments. It may be appreciated that the specific embodiments described herein are merely used for explaining the relevant invention, rather than limiting the invention. In addition, it should be noted that, for the ease of description, only the parts related to the relevant invention are shown in the accompanying drawings.
It should also be noted that the embodiments in the present disclosure and the features in the embodiments may be combined with each other on a non-conflict basis. The present disclosure will be described below in detail with reference to the accompanying drawings and in combination with the embodiments.
Fig. 1 illustrates an exemplary system architecture 100 in which a method for detecting a layout or an apparatus for detecting a layout according to an embodiment of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 104 and 105, a network 106, and servers 101, 102 and 103. The network 106 serves as a medium providing a communication link between the terminal devices 104 and 105 and the servers 101, 102 and 103. The network 106 may include various types of connections, for example, wired or wireless communication links, or optical fiber cables.
A user may use the terminal devices 104 and 105 to interact with the servers 101, 102 and 103 via the network 106 to receive or send messages. Various applications (e.g., an image processing application, a data processing application, an instant communication tool, social platform software, a search application, and a shopping application) may be installed on the terminal devices 104 and 105.
The terminal devices 104 and 105 may be hardware or software. When being the hardware, the terminal devices may be various electronic devices having a display screen and supporting communication with the servers, the electronic devices including, but not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, etc. When being the software, the terminal devices may be installed in the above listed electronic devices. The terminal devices may be implemented as a plurality of pieces of software or a plurality of software modules, or as a single piece of software or a single software module, which will not be specifically defined here.
The terminal devices 104 and 105 may acquire a current image via the network 106 or through local reading. The current image may include at least one component. The terminal devices 104 and 105 call an instance segmentation model to recognize the current image to obtain bounding box information and a category name of each component in the current image. Then, the terminal devices 104 and 105 mark the current image according to the obtained bounding box information of each component, thus obtaining a marked image including the bounding box information of each component. Finally, the terminal devices 104 and 105 input the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model. The long short-term memory network detection model processes the inputted content, and outputs a layout result of a component in the current image. The layout result may include a grouping result of the category name of each component in the current image. That is, components in the same layout are considered as being in the same grouping.
The servers 101, 102 and 103 may be servers providing various services, for example, backend servers receiving a request sent by a terminal device in communication with the servers 101, 102 and 103. The backend servers may perform processing such as receiving and analyzing on the request sent by the terminal device, and generate a processing result.
It should be noted that the servers may be hardware or software. When being the hardware, the servers may be various electronic devices providing various services for the terminal devices. When being the software, the servers may be implemented as a plurality of pieces of software or a plurality of software modules providing various services for the terminal devices, or as a single piece of software or a single software module providing various services for the terminal devices, which will not be specifically defined here.
It should be noted that the method for detecting a layout provided by the embodiments of the present disclosure may be performed by the terminal devices 104 and 105. Correspondingly, the apparatus for detecting a layout may be provided in the terminal devices 104 and 105.
It should be appreciated that the numbers of the terminal devices, the networks, and the servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided based on actual requirements.
Further referring to Fig. 2, Fig. 2 illustrates a flow 200 of an embodiment of a method for detecting a layout according to the present disclosure. The method for detecting a layout may include the following steps.
Step 210, using an instance segmentation model to recognize a current image including at least one component in response to acquiring the current image, to obtain bounding box information and a category name of each component.
In this step, an executing body (e.g., the terminal devices shown in Fig. 1) of the method for detecting a layout may acquire the current image from a server via a network, acquire the current image by performing an operation such as a screen capture operation on a currently displayed image, or acquire the current image from a local storage. Here, the current image may include the at least one component, and a component refers to a simple encapsulation of some data and some methods in the image. The current image may consist of a plurality of different components, and each component has a different attribute and plays a different role.
After acquiring the current image, the above executing body may call the instance segmentation model to process the current image, and recognize each component in the current image, to obtain the bounding box information and the category name of each component. The category names of the components in the current image may be outputted as a sequence; that is, a category name sequence of the components in the current image is outputted. Here, the bounding box information of a component is used to represent the size of the component as displayed using a border box, and generally includes the coordinates of each border box. The category name of a component is used to represent the type name of the component, and may include a customized type name, for example, menu, scroll view, tool bar, user info (information), tip box, noti (notification) box, keyboard, text list, text box, bgimg (for example, a background image), banner, other, view box, button, pop dialog, pop menu, item, first screen and progress bar.
Here, the above instance segmentation model, for example, a Mask R-CNN model, may be used to segment each instance in an image to obtain a border box and a type of each instance. A Mask R-CNN model trained on such images annotated with the bounding box information and the category name of each component may perform a corresponding pre-processing operation on the inputted current image to obtain a pre-processed image, and then input the pre-processed image into a pre-trained neural network to obtain a corresponding feature map. Next, the Mask R-CNN model sets a predetermined number of regions of interest for each point in the feature map to obtain a plurality of candidate regions of interest, and feeds the candidate regions of interest into an RPN (region proposal network) for a binary classification and a bounding box (BB) regression, to filter out a portion of the candidate regions. Then, the Mask R-CNN model performs an ROIAlign (Region of Interest Align) operation on the remaining regions of interest (i.e., first maps pixels of the original image to pixels of the feature map, and then maps the feature map to a fixed-size feature), and finally performs a classification (N-category classification), a BB regression and mask generation (an FCN operation on each region of interest) on these candidate regions, thus obtaining the bounding box information and the category name of each component in the current image.
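As an illustrative sketch (not part of the disclosure), such an inference step may be approximated with an off-the-shelf Mask R-CNN implementation, for example the one shipped with torchvision. The category names, weight file, image path and score threshold below are assumptions:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Hypothetical component categories; a real model would be fine-tuned on
# UI screenshots annotated with the customized category names listed above.
CATEGORY_NAMES = ["background", "menu", "tool bar", "text box", "button", "banner"]

model = maskrcnn_resnet50_fpn(num_classes=len(CATEGORY_NAMES))
model.load_state_dict(torch.load("layout_maskrcnn.pt"))  # hypothetical fine-tuned weights
model.eval()  # inference mode: the model returns detections instead of losses

image = to_tensor(Image.open("current_screen.png").convert("RGB"))
with torch.no_grad():
    detections = model([image])[0]  # one result dict per input image

components = []  # (bounding box, category name) pairs for each detected component
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score >= 0.5:  # assumed confidence threshold
        x1, y1, x2, y2 = box.tolist()  # corner coordinates of the border box
        components.append(((x1, y1, x2, y2), CATEGORY_NAMES[int(label)]))
```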
Step 220, generating a marked image corresponding to the current image based on the bounding box information of each component.
In this step, after recognizing the current image using the instance segmentation model to obtain the bounding box information and the category name of each component, the above executing body replaces each component in the current image with a bounding box according to the bounding box information of each component; that is, the executing body changes each component in the current image to the corresponding bounding box to obtain a marked image including each bounding box.
As an example, the above bounding box information is displayed using a border box. After obtaining the rectangular box of each component in the current image, the above executing body replaces each component with the corresponding rectangular box, and uses the image including the rectangular boxes as the marked image corresponding to the current image.
Step 230, inputting the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model, and outputting a layout result of the component in the current image.
In this step, after acquiring the bounding box information and the category name of each component in the current image and the marked image corresponding to the current image, the above executing body inputs the bounding box information and the category name of each component and the marked image into the long short-term memory network detection model. The long short-term memory network detection model processes the inputted content, and outputs the layout result of the component in the current image. Here, the layout result may include sorting and grouping results of each component in the current image. Since there is an association dependency relationship between the components in the current image, the long short-term memory network detection model may sort and group the components in the current image according to this relationship, thereby obtaining the sorting and grouping results of each component in the current image. The above executing body may sort and group the category names of the components in the above category name sequence, to obtain sorting and grouping results of the category names corresponding to the components. Category names in the same grouping may be distinguished in the same way, for example, by being put into the same brackets, while different groupings are distinguished using different brackets. Components in the same grouping belong to the same layout distribution, and there may be different alignments such as a left alignment, a vertical middle alignment, a right alignment, a top alignment, a horizontal center alignment, and a bottom alignment.
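A minimal sketch of this marking step, assuming the (box, category) pairs produced by the segmentation sketch above and using PIL (the canvas size and colors are arbitrary choices):

```python
from PIL import Image, ImageDraw

def make_marked_image(size, components):
    """Replace each component with its bounding box on a blank canvas.

    `components` is an assumed list of ((x1, y1, x2, y2), category_name)
    tuples, as produced by the instance segmentation step.
    """
    marked = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(marked)
    for (x1, y1, x2, y2), _category in components:
        draw.rectangle([x1, y1, x2, y2], outline="black", width=2)
    return marked

marked_image = make_marked_image((1280, 720), components)
marked_image.save("marked_screen.png")
```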
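As a hypothetical illustration of this bracket notation (the screen content and category names are invented for the example), a screen containing a banner, three menu entries and a button might yield:

```python
# Category name sequence output by the instance segmentation model:
categories = ["banner", "menu", "menu", "menu", "button"]
# Layout result: the same names, sorted and grouped by shared layout:
layout_result = [["banner"], ["menu", "menu", "menu"], ["button"]]
```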
Further referring to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for detecting a layout according to this embodiment. The method may be applied to the application scenario of Fig. 3. A television 301 captures the image currently displayed on its screen to obtain a current image. The television 301 recognizes the current image using an instance segmentation model, to obtain bounding box information and a category name of each component in the current image. Then, the television 301 generates a marked image corresponding to the current image according to the bounding box information of each component in the current image, and inputs the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model, to obtain an outputted layout result 302 of a component in the current image. Then, the television 301 acquires, from a local focus manager, component lists identical and similar to the layout result 302 of the current image according to the obtained layout result 302. When receiving voice from a user, the television 301 analyzes the received voice to ascertain whether the voice includes a command such as "Next" or "Previous." When it is ascertained that the voice includes such a command, according to a preset condition that switching within the same group is preferred over switching between different groups, the television 301 first performs a selection in the component list identical to the layout result 302 of the current image, and presents the selected image to the user through the screen. If the selection for the image in the component list identical to the layout result 302 of the current image is completed, the component list is switched to the similar list to perform a selection.
According to the method for detecting a layout provided in the above embodiment of the present disclosure, the current image including the at least one component is recognized using the instance segmentation model in response to acquiring the current image, to obtain the bounding box information and the category name of each component. Then, the marked image corresponding to the current image is generated based on the bounding box information of each component. Finally, the bounding box information and the category name of each component and the marked image are inputted into the long short-term memory network detection model, and the layout result of the component in the current image is outputted. Thus, the layout result of each component in the image is extracted. The grouping result of each component can be obtained based on the interdependence relationship between the components. Accordingly, the global information between the portions of the current image is taken into consideration, thereby improving the accuracy of the layout result.
Further referring to Fig. 4, Fig. 4 illustrates step 230 "inputting the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model, and outputting a layout result of the component in the current image" in Fig. 2, which may be implemented based on the following steps:
Step 410, inputting bounding box information and a category name of each component into an encoding model in a long short-term memory network detection model, and outputting text feature information corresponding to the each component.
In this step, after recognizing a current image using an instance segmentation model to obtain the bounding box information and the category name of each component, the above executing body first applies embedding coding to the bounding box information and the category name of each component to obtain vector information of the bounding box and the category, then inputs the obtained vector information of the bounding box and the category into the encoding model in the long short-term memory (LSTM) network detection model, and outputs the text feature information corresponding to each component.
Step 420, inputting a marked image into an image processing model in the long short-term memory network detection model, and outputting image feature information corresponding to the marked image.
In this step, after acquiring the marked image corresponding to the current image, the above executing body inputs the marked image into the image processing model in the long short-term memory network detection model to perform operations such as the convolution and full connection operations of a CNN, and outputs the image feature information corresponding to the marked image.
Step 430, inputting the text feature information and the image feature information into a decoding model in the long short-term memory network detection model, and outputting a layout result of a component in a current image.
In this step, after acquiring the text feature information and the image feature information, the above executing body fuses the text feature information and the image feature information to obtain fused feature information. Then, the fused feature information is inputted into the decoding model in the long short-term memory network detection model, and a softmax classification is performed. A layout result having the same length as the inputted category name sequence is outputted; that is, the layout result of the component in the current image is outputted.
In this implementation, the layout result is detected based on a long short-term memory network, which can model the mutual dependency between features. A grouping result of each component is ascertained based on the interdependence relationship between the features, and thus, the relationship between local portions is taken into consideration, thereby improving the accuracy of the grouping result.
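A minimal sketch of such an encoder/CNN/decoder arrangement in PyTorch (the dimensions, layer choices and per-component group-label formulation are assumptions; the disclosure does not fix a concrete architecture):

```python
import torch
import torch.nn as nn

class LayoutDetector(nn.Module):
    """LSTM encoder over component tokens plus a CNN over the marked image,
    fused and decoded into one group label per component."""

    def __init__(self, num_categories, num_groups, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.category_embed = nn.Embedding(num_categories, embed_dim)
        self.box_proj = nn.Linear(4, embed_dim)           # embed (x1, y1, x2, y2)
        self.encoder = nn.LSTM(embed_dim * 2, hidden_dim, batch_first=True)
        self.cnn = nn.Sequential(                         # image processing model
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden_dim),                    # full connection operation
        )
        self.decoder = nn.LSTM(hidden_dim * 2, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_groups)

    def forward(self, categories, boxes, marked_image):
        # categories: (B, T) int ids; boxes: (B, T, 4); marked_image: (B, 3, H, W)
        tokens = torch.cat([self.category_embed(categories),
                            self.box_proj(boxes)], dim=-1)
        text_feat, _ = self.encoder(tokens)               # text feature information
        img_feat = self.cnn(marked_image)                 # image feature information
        fused = torch.cat([text_feat,
                           img_feat.unsqueeze(1).expand_as(text_feat)], dim=-1)
        decoded, _ = self.decoder(fused)                  # decoding model
        return self.classifier(decoded).softmax(dim=-1)  # one label per component
```

The output is a sequence of group-membership distributions with the same length as the inputted category name sequence, matching the requirement above; treating grouping as per-component sequence labeling is one plausible reading of the disclosure, not the only one.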
In some alternative implementations of this embodiment, the above instance segmentation model is acquired based on the following steps:
In a first step, a sample image set is acquired.
Specifically, the process of training the above instance segmentation model is implemented in a server, and the server may locally read the sample image set or acquire the sample image set from the above executing body. The sample image set includes at least one sample image, and each sample image includes at least one component.
In a second step, a category name and bounding box information of each component in each sample image are ascertained and annotated.
Specifically, the server may annotate the category name of each component in each sample image according to a customized category name. Moreover, the server annotates the border box to which each component in each sample image belongs, to implement the annotation of the bounding box information of each component. Thus, the category name and the bounding box information that correspond to each component in each sample image are obtained.
In a third step, each sample image is used as an input, and the category name and the bounding box information of each component in each sample image are used as a desired output, to train and obtain the instance segmentation model.
Specifically, the server may acquire a Mask R-CNN network, use a sample image as an input, and use a category name and bounding box information of each component corresponding to the inputted sample image as a desired output to train the Mask R-CNN network to obtain the trained Mask R-CNN network as the instance segmentation model. The trained instance segmentation model may recognize an inputted image and output a category name and bounding box information of each component in the image.
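A hedged sketch of one training step for such a model using torchvision (the number of categories, dataset format and optimizer settings are assumptions):

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(num_classes=6)  # assumed number of component categories
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

def train_step(images, targets):
    """One training step on a batch of annotated sample images.

    `images`: list of (3, H, W) tensors; `targets`: list of dicts with
    'boxes' (N, 4), 'labels' (N,) and 'masks' (N, H, W) — the annotated
    bounding box information and category names of each sample image.
    """
    loss_dict = model(images, targets)  # in train mode the model returns losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```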
In this implementation, by training and obtaining the instance segmentation model, the accuracy and the recognition efficiency of the category name and the bounding box information of each component in the image may be improved.
Further referring to Fig. 5, Fig. 5 illustrates a flow 500 of another embodiment of the method for detecting a layout. The method for detecting a layout may further include the following steps.
Step 510, using an instance segmentation model to recognize a current image including at least one component in response to acquiring the current image, to obtain bounding box information and a category name of each component.
In this step, step 510 is the same as step 210 in the embodiment shown in Fig. 2, which will not be repeatedly described here.
Step 520, generating a marked image corresponding to the current image based on the bounding box information of each component.
In this step, step 520 is the same as step 220 in the embodiment shown in Fig. 2, which will not be repeatedly described here.
Step 530, inputting the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model, and outputting a layout result of the component in the current image.
In this step, step 530 is the same as step 230 in the embodiment shown in Fig. 2, which will not be repeatedly described here.
Step 540, performing a state similarity search on stored images based on the layout result of the component in the current image, to obtain a plurality of similar images.
In this step, after acquiring the layout result of the component in the current image (i.e., after acquiring a grouping result of the component in the current image), the above executing body performs a state search on the images stored in a local database. That is, the executing body performs the state similarity search on the stored images according to the layout result of the component in the current image, to find stored images whose component grouping results or layouts are similar to those of the current image, thereby obtaining a plurality of similar images having a state similar to that of the layout result of the component in the current image.
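One way such a state similarity search could be realized — a sketch assuming each stored image is indexed by its layout result, represented as a list of category-name groups, and using a sequence-similarity ratio with an assumed threshold:

```python
from difflib import SequenceMatcher

def layout_similarity(layout_a, layout_b):
    """Compare two grouping results, e.g. [["banner"], ["menu", "menu"]]."""
    flat_a = [name for group in layout_a for name in group]
    flat_b = [name for group in layout_b for name in group]
    return SequenceMatcher(None, flat_a, flat_b).ratio()

def find_similar_images(current_layout, stored, threshold=0.8):
    """`stored` maps image ids to layout results; the threshold is assumed."""
    return [image_id for image_id, layout in stored.items()
            if layout_similarity(current_layout, layout) >= threshold]
```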
Step 550, comparing the current image with each similar image to determine whether the current image and each similar image are in a given state.
In this step, after acquiring the plurality of similar images, the above executing body compares the current image with each similar image, respectively, and thus, a plurality of comparison results between the current image and each similar image may be obtained. A comparison result may include a similarity numerical value and the like. Then, the above executing body determines whether the current image and each similar image are in the given state according to the comparison results.
As an example, the above executing body acquires 3 similar images that are respectively an image 1, an image 2, and an image 3. The above executing body compares the current image with the image 1 to obtain a first similarity numerical value, then compares the current image with the image 2 to obtain a second similarity numerical value, and then compares the current image with the image 3 to obtain a third similarity numerical value. The above executing body determines whether the current image and each similar image are in the given state according to the first similarity numerical value, the second similarity numerical value and the third similarity numerical value.
In this embodiment, the search for the image is implemented through the layout result of the component in the current image, thereby implementing the state comparison between images; moreover, the accuracy of the state comparison between the images can be improved by relying on the layout result of the component in the current image.
In some alternative implementations of this embodiment, the above step 550 "comparing the current image with each similar image to determine whether the current image and each similar image are in a given state" may be implemented based on the following steps:
In a first step, an intersection-over-union between the layout result of the component in the current image and a layout result of a component in each similar image is calculated.
Specifically, after acquiring the plurality of similar images, the above executing body acquires the layout result of each similar image. Then, the above executing body ascertains the bounding box information of each component in the current image and the bounding box information of the component in each similar image, and calculates an intersection-over-union between the bounding box information of the component in the current image and that of the component in each similar image, respectively. Here, the intersection-over-union (IoU) is a concept used in target detection, which refers to the overlap ratio between a generated candidate bound and a ground truth bound, that is, the ratio between the intersection and the union of the candidate bound and the ground truth bound. In the ideal case, the two bounds overlap completely, that is, the ratio is 1. The above executing body may calculate an intersection-over-union between a border box of the component in the current image and a border box of a component in a similar image, to obtain the intersection-over-union between the layout result of the component in the current image and the layout result of the component in each similar image, respectively.
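For reference, a straightforward implementation of the intersection-over-union of two border boxes (boxes given as corner coordinates, matching the bounding box information used above):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0  # 1.0 means complete overlap
```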
In a second step, the pixels of the current image and the pixels of each similar image are compared to obtain a comparison result between the pixels of the current image and the pixels of each similar image.
Specifically, after acquiring the plurality of similar images, the above executing body acquires the pixels of each similar image. Then, the above executing body compares the pixels of the current image with the pixels of each similar image, respectively. The above executing body may calculate a difference value between the pixels of the current image and the pixels of each similar image, and use the difference value as the comparison result between the pixels of the current image and the pixels of each similar image.
In a third step, the intersection-over-union and the comparison result are compared with a preset threshold value to determine whether the current image and each similar image are in the given state.
Specifically, after acquiring the intersection-over-union between the layout result of the component in the current image and the layout result of the component in each similar image and the comparison result between the pixels of the current image and the pixels of each similar image, the above executing body compares the intersection-over-union and the comparison result that correspond to each similar image with the preset threshold value, respectively.
The above executing body may add the intersection-over-union and the comparison result that correspond to each similar image together, and compare the result of the addition with a preset threshold value, to determine whether the current image and each similar image are in the given state. The above executing body may determine whether the result of the addition is not less than the preset threshold value. If the result of the addition is not less than the preset threshold value, it is ascertained that the similar image corresponding to the result of the addition is in the same state as the current image. If the result of the addition is less than the preset threshold value, it is ascertained that the similar image corresponding to the result of the addition is in a state different from that of the current image.
The above executing body may also calculate the average value of the intersection-over-union and the comparison result that correspond to each similar image, respectively, and compare the average value with a preset threshold value, to determine whether the current image and each similar image are in the given state. The above executing body may determine whether the average value is not less than the preset threshold value. If the average value is not less than the preset threshold value, it is ascertained that the similar image corresponding to the average value is in the same state as the current image. If the average value is less than the preset threshold value, it is ascertained that the similar image corresponding to the average value is in a state different from that of the current image.
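A sketch of the threshold decision described above, using the averaging variant (the threshold value, and the assumption that the pixel comparison result has been normalized to a similarity in [0, 1], are both illustrative choices):

```python
def in_given_state(layout_iou, pixel_similarity, threshold=0.9):
    """Average the layout IoU and the pixel comparison result and compare
    the average with a preset threshold; not less than the threshold means
    the two images are ascertained to be in the same state."""
    return (layout_iou + pixel_similarity) / 2 >= threshold
```

The addition variant from the text would simply drop the division by two and use a correspondingly larger threshold.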
In this implementation, the intersection-over-union between the current image and the similar image and the comparison result between the pixels are calculated to determine whether the current image and the similar image are in the given state. The accuracy of the state comparison is improved based on the plurality of determinations.
Further referring to Fig. 5, the method for detecting a layout may further include the following steps.
Step 560, deleting the current image in response to ascertaining that the current image and the similar image are in the given state.
In this step, when the above executing body ascertains, by comparing the intersection-over-union and the comparison result with the preset threshold value, that the current image and the similar image are in the given state, this indicates that an image having the layout result of the current image is already stored in the local database; thus, the current image is deleted and is not stored.
Step 570, storing the current image in response to ascertaining that the current image and the similar image are in different states.
In this step, when the above executing body ascertains, by comparing the intersection-over-union and the comparison result with the preset threshold value, that the current image and the similar image are in different states, this indicates that an image having the layout result of the current image is not yet stored in the local database; thus, the current image is stored into the local database.
In this embodiment, the current image is deleted or stored through the result of the state comparison, thus implementing different processing operations on the current image, and improving the diversity of the processing for the current image and the applicability of the layout result.
Further referring to Fig. 6, Fig. 6 illustrates a flow 600 of another embodiment of the method for detecting a layout. The method for detecting a layout may further include the following steps.
Step 610, using an instance segmentation model to recognize a current image including at least one component in response to acquiring the current image, to obtain bounding box information and a category name of each component.
In this step, step 610 is the same as step 210 in the embodiment shown in Fig. 2, which will not be repeatedly described here.
Step 620, generating a marked image corresponding to the current image based on the bounding box information of each component.
In this step, step 620 is the same as step 220 in the embodiment shown in Fig. 2, which will not be repeatedly described here.
Step 630, inputting the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model, and outputting a layout result of the component in the current image.
In this step, step 630 is the same as step 230 in the embodiment shown in Fig. 2, which will not be repeatedly described here.
Step 640, performing, for each component having a given layout, an adjustment operation on the bounding box information of each component to obtain adjusted bounding box information corresponding to each component, in response to acquiring the layout result of the component in the current image.
In this step, the above executing body acquires the layout result of the component in the current image, and ascertains each layout (i.e., each grouping result) in the layout result, respectively. Each grouping includes the components having the given layout. For each component having the given layout, the adjustment operation is performed on the bounding box information of each component, to align the bounding boxes of the components and set their widths to the same numerical value, thus obtaining the adjusted bounding box information corresponding to each component.
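A minimal sketch of the adjustment operation for one grouping (left alignment and an averaged width are assumed here; the disclosure also mentions other alignments such as right, center, top and bottom):

```python
def adjust_group_boxes(boxes):
    """Align the bounding boxes of one layout group and give them a uniform width.

    `boxes` is a list of (x1, y1, x2, y2) tuples for the components sharing a
    layout; the output boxes share the leftmost edge and the average width.
    """
    left = min(x1 for x1, _, _, _ in boxes)  # align to the leftmost edge
    avg_width = sum(x2 - x1 for x1, _, x2, _ in boxes) / len(boxes)
    return [(left, y1, left + avg_width, y2) for _, y1, _, y2 in boxes]
```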
In this embodiment, the bounding box information of each component in the given layout is adjusted to obtain components that are aligned by group and have an average width, which improves the accuracy of the layout result. Thus, a more accurate state comparison between images may be implemented based on the adjusted layout result.
Further referring to Fig. 6, the method for detecting a layout may further include the following steps.
Step 650, using the current image, the category name of each component and the adjusted bounding box information as a new sample image set.
In this step, after acquiring the adjusted bounding box information of each component in the current image, the above executing body uses the current image, the category name of each component and the adjusted bounding box information as the new sample image set.
Step 660, training the instance segmentation model based on the new sample image set.
In this step, after using the current image, the category name of each component and the adjusted bounding box information as the new sample image set, the above executing body may further train the instance segmentation model based on the new sample image set, to obtain an updated instance segmentation model. The updated instance segmentation model may recognize a new image and output a category name sequence of each component in the new image, together with bounding box information of the components aligned by group and having an average width.
In this embodiment, the instance segmentation model is trained through the new sample image set, such that the updated instance segmentation model can output the bounding box information of the components aligned by group and having the average width, thereby improving the accuracy of the outputted bounding box information.
In some alternative implementations of this embodiment, the method for detecting a layout may further include: inputting the adjusted bounding box information corresponding to each component and the layout result into UI2Code to obtain a duplicate layout code, in response to acquiring the adjusted bounding box information corresponding to each component.
Specifically, after acquiring the adjusted bounding box information corresponding to each component, the above executing body inputs the adjusted bounding box information corresponding to each component and the layout result into UI2Code. UI2Code performs redetection, classification and extraction on the customized components in the inputted content, and then applies a method of connecting lines between components to ascertain a structured layout of each component, thereby obtaining the regions having a duplicate layout between the components. Then, the duplicate layout code is generated using the obtained regions having the duplicate layout between the components.
In this implementation, the adjusted bounding box information and the layout result are inputted into UI2Code, thus improving the accuracy of the duplicate layout code generated by UI2Code, and improving the reusability of the code.
Further referring to Fig. 7, as an implementation of the method shown in the above drawings, the present disclosure provides an embodiment of an apparatus for detecting a layout. The embodiment of the apparatus corresponds to the embodiment of the method shown in Fig. 2. The apparatus may be applied in various electronic devices.
As shown in Fig. 7, the apparatus 700 in this embodiment comprises: a recognizing module 710, a generating module 720 and an outputting module 730.
Here, the recognizing module 710 is configured to use an instance segmentation model to recognize a current image including at least one component in response to acquiring the current image, to obtain bounding box information and a category name of each component.
The generating module 720 is configured to generate a marked image corresponding to the current image based on the bounding box information of each component.
The outputting module 730 is configured to input the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model and output a layout result of the component in the current image, the layout result including sorting and grouping results of each component in the current image.
In some alternative implementations of this embodiment, the outputting module comprises: an encoding unit, configured to input the bounding box information and the category name of each component into an encoding model in the long short-term memory network detection model, and output text feature information corresponding to each component; an image processing unit, configured to input the marked image into an image processing model in the long short-term memory network detection model, and output image feature information corresponding to the marked image; and a decoding unit, configured to input the text feature information and the image feature information into a decoding model in the long short-term memory network detection model, and output the layout result of the component in the current image.
In some alternative implementations of this embodiment, the instance segmentation model is acquired by: acquiring a sample image set, wherein the sample image set includes at least one sample image, and each sample image includes at least one component; ascertaining and annotating a category name and bounding box information of each component in each sample image; and using each sample image as an input, and using the category name and the bounding box information of each component in each sample image as a desired output, to train and obtain the instance segmentation model.
In some alternative implementations of this embodiment, the apparatus further comprises: a searching module, configured to perform a state similarity search on stored images based on the layout result of the component in the current image, to obtain a plurality of similar images; and a determining module, configured to compare the current image with each similar image to determine whether the current image and each similar image are in a given state.
In some alternative implementations of this embodiment, the determining module comprises: a calculating unit, configured to calculate an intersection-over-union between the layout result of the component in the current image and a layout result of a component in each similar image; a comparing unit, configured to compare the pixels of the current image with the pixels of each similar image to obtain a comparison result between the pixels of the current image and the pixels of each similar image; and a determining unit, configured to compare the intersection-over-union and the comparison result with a preset threshold value to determine whether the current image and each similar image are in the given state.
In some alternative implementations of this embodiment, the apparatus further comprises: a deleting module, configured to delete the current image in response to ascertaining that the current image and the similar image are in the given state; and a storing module, configured to store the current image in response to ascertaining that the current image and the similar image are in different states.
In some alternative implementations of this embodiment, the apparatus further comprises: an adjusting module, configured to perform, for each component having a given layout, an adjustment operation on the bounding box information of each component to obtain adjusted bounding box information corresponding to each component, in response to acquiring the layout result of the component in the current image.
In some alternative implementations of this embodiment, the apparatus further comprises: an updating module, configured to use the current image, the category name of each component and the adjusted bounding box information as a new sample image set; and a training module, configured to train the instance segmentation model based on the new sample image set.
In some alternative implementations of this embodiment, the apparatus further comprises: a code generating module, configured to input the adjusted bounding box information corresponding to each component and the layout result into UI2Code to obtain a duplicate layout code, in response to acquiring the adjusted bounding box information corresponding to each component.
According to the apparatus for detecting a layout provided in the above embodiment of the present disclosure, the current image including the at least one component is recognized using the instance segmentation model in response to acquiring the current image, to obtain the bounding box information and the category name of each component. Then, the marked image corresponding to the current image is generated based on the bounding box information of each component. Finally, the bounding box information and the category name of each component and the marked image are inputted into the long short-term memory network detection model, and the layout result of the component in the current image is outputted. The layout result includes the sorting and grouping results of each component in the current image, thus implementing extraction of the layout result of each component in the image. The grouping result of each component can be obtained based on the interdependence relationship between components. Accordingly, the global information between the portions of the current image is taken into consideration, thereby improving the accuracy of the layout result.
It may be appreciated by those skilled in the art that the above apparatus further comprises some other well-known structures, for example, a processor and a storage device. In order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown in Fig. 7.
Referring to Fig. 8, Fig. 8 is a schematic structural diagram of an electronic device (e.g., the terminal device in Fig. 1) 800 adapted to implement embodiments of the present disclosure.
As shown in Fig. 8, the electronic device 800 may include a processing apparatus (e.g., a central processing unit and a graphics processing unit) 801, which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded into a random access memory (RAM) 803 from a storage apparatus 808. The RAM 803 also stores various programs and data required by operations of the electronic device 800. The processing apparatus 801, the ROM 802 and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following apparatuses are connected to the I/O interface 805: an input apparatus 806 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage apparatus 808 including, for example, a magnetic tape and a hard disk; and a communication apparatus 809. The communication apparatus 809 may allow the electronic device 800 to perform a wireless or wired communication with another device to exchange data. Although Fig. 8 shows the electronic device 800 having various apparatuses, it should be understood that it is not required to implement or possess all of the illustrated apparatuses. More or fewer apparatuses may be alternatively implemented or possessed. Each block shown in Fig. 8 may represent one apparatus, or may represent a plurality of apparatuses as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, including a computer program hosted on a computer readable medium, the computer program including program codes for performing the method as illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 809, may be installed from the storage apparatus 808, or may be installed from the ROM 802. The computer program, when executed by the processing apparatus 801, implements the above mentioned functionalities defined in the method according to the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. For example, the computer readable storage medium may be, but is not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. A more specific example of the computer readable storage medium may include, but is not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fiber, a portable compact disk read only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the above. In the embodiments of the present disclosure, the computer readable storage medium may be any physical medium containing or storing programs, which may be used by a command execution system, apparatus or device or incorporated thereto. In the embodiments of the present disclosure, the computer readable signal medium may include a data signal that is propagated in a baseband or as a part of a carrier wave, which carries computer readable program codes. Such a propagated data signal may be in various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer readable signal medium may also be any computer readable medium other than the computer readable storage medium. The computer readable medium may send, propagate or transmit programs for use by, or used in combination with, the command execution system, apparatus or device. The program codes contained on the computer readable medium may be transmitted with any suitable medium including, but not limited to, a wire, an optical cable, RF, or any suitable combination of the above.
The above computer readable medium may be the computer readable medium included in the above electronic device, or a stand-alone computer readable medium not assembled into the electronic device. The above computer readable medium carries one or more programs. The one or more programs, when executed by the electronic device, cause the electronic device to: acquire a current image including at least one component, and use an instance segmentation model to recognize the current image, to obtain bounding box information and a category name of each component; generate a marked image corresponding to the current image based on the bounding box information of each component; and input the bounding box information and the category name of each component and the marked image into a long short-term memory network detection model, and output a layout result of the component in the current image, the layout result including sorting and grouping results of each component in the current image.
A computer program code for executing the operations according to the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and further include general procedural programming languages such as the "C" language or similar programming languages. The program codes may be executed entirely on a user computer, executed partially on the user computer, executed as a standalone package, executed partially on the user computer and partially on a remote computer, or executed entirely on the remote computer or a server. When the remote computer is involved, the remote computer may be connected to the user computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or be connected to an external computer (e.g., connected through the Internet provided by an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate architectures, functions and operations that may be implemented according to the system, the method, and the computer program product of the various embodiments of the present disclosure. In this regard, each of the blocks in the flowcharts or block diagrams may represent a module, a program segment, or a code portion, the module, the program segment, or the code portion comprising one or more executable instructions for implementing specified logic functions. It should also be noted that, in some alternative implementations, the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, any two blocks presented in succession may be executed substantially in parallel, or they may sometimes be executed in a reverse sequence, depending on the function involved. It should also be noted that each block in the block diagrams and/or flowcharts as well as a combination of blocks in the block diagrams and/or flowcharts may be implemented using a dedicated hardware-based system executing specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present disclosure may be implemented by means of software or hardware. The described modules may also be provided in a processor. For example, the processor may be described as: a processor comprising a recognizing module, a generating module and an outputting module. Here, the names of these modules do not in some cases constitute a limitation to such modules themselves. For example, the recognizing module may alternatively be described as "a module for using an instance segmentation model to recognize a current image including at least one component in response to acquiring the current image, to obtain bounding box information and a category name of each component."
The above description is only an explanation for the preferred embodiments of the present disclosure and the applied technical principles. It should be appreciated by those skilled in the art that the inventive scope involved in the embodiments of the present disclosure is not limited to the technical solution formed by the particular combinations of the above technical features. At the same time, the inventive scope should also cover other technical solutions formed by any combinations of the above technical features or equivalent features thereof without departing from the concept of the invention, for example, technical solutions formed by replacing the features as disclosed in the present disclosure with (but not limited to) technical features with similar functions.