CN119380021A - Interactive object segmentation method based on bounding box input and gaze point assistance - Google Patents

Interactive object segmentation method based on bounding box input and gaze point assistance

Info

Publication number
CN119380021A
Authority
CN
China
Prior art keywords
input
net
gaze point
map
refinement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411461106.1A
Other languages
Chinese (zh)
Inventor
谢荣辉
史冉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202411461106.1A
Publication of CN119380021A
Legal status: Pending (Current)


Abstract


The present invention proposes an interactive object segmentation method based on bounding box input and gaze point assistance. The bounding box of the object to be segmented in image I is obtained and converted into a binary bounding box map B; at the same time, the gaze point map FM of the target image is obtained and the gaze information outside the input box is erased to obtain the processed gaze point map. Image I and the bounding box map B are input into the initial segmentation network Coarse U-Net to generate a coarse segmentation result MC and box-based multi-scale features. The similarity between the initial segmentation result MC and the processed gaze point map is calculated and used to adjust the gaze point map, yielding the adjusted gaze point map FM'. Image I, the adjusted gaze point map FM' and the coarse segmentation result MC are concatenated along the channel dimension and input into the refinement segmentation network Refinement U-Net to extract refined features, fusing the box-based features extracted by Coarse U-Net layer by layer in the process. The refined features are input into the decoder of the refinement network Refinement U-Net for decoding to obtain the final segmentation result M. The present invention improves segmentation quality.

Description

Interactive object segmentation method based on bounding box input and gaze point assistance
Technical Field
The invention belongs to the field of image segmentation, and particularly relates to an interactive object segmentation method based on bounding box input and gaze point assistance.
Background
Image segmentation is an important task in the field of computer vision. It aims to divide an image into different regions or objects and is widely used in many scenarios such as autonomous driving, medical imaging and image editing. Interactive object segmentation is an important research direction in image segmentation: it generates a binary segmentation mask of a target object by incorporating prompt information (such as clicks, box selection or scribbles). This interactive mode makes image segmentation more flexible and accurate, improves segmentation efficiency and result quality, and has significant practical value for image annotation.
Currently, common interaction modes mainly include clicks, lines and rectangular boxes. Click interaction is simple and easy to use, but the amount of information it provides is limited and multiple clicks are usually needed to correct the result. Line interaction offers finer control and can capture the detailed shape and structure of a target, but drawing lines or contours often requires more operations and time and places higher demands on user skill. Bounding box interaction is intuitive and simple, easy to operate, provides position and boundary information, and carries relatively rich prior knowledge.
Furthermore, target objects tend to attract the user's attention strongly before any interaction takes place: the user has already visually inspected the image. Some researchers therefore use gaze points as an interaction modality to assist the segmentation of target objects. However, accurately capturing gaze points requires expensive eye-tracking devices, and setting up and calibrating these devices for different users is time-consuming and inconvenient. An alternative strategy is to use a gaze prediction model to generate a gaze map that estimates the user's gaze area for a particular image. But the estimated gaze area may differ from the target object with which the user interacts: the estimated gaze area is based on a saliency assumption, i.e., purposeless free viewing, whereas a user performing interactive object segmentation has an explicit target object in mind even if that object is not salient. Thus, if an estimated gaze map is used instead of real gaze, the discrepancy caused by the estimated gaze map has to be overcome.
Disclosure of Invention
The invention aims to provide an interactive object segmentation method based on bounding box input and gaze point assistance, which performs interactive segmentation with an interaction form combining an input box and an estimated gaze point map.
The technical scheme for achieving the aim of the invention is an interactive object segmentation method based on bounding box input and gaze point assistance, comprising the following steps:
Step 1, obtaining the bounding box of the object to be segmented in image I and converting it into a binary bounding box map B; at the same time, obtaining the gaze point map FM of the target image and erasing the gaze point information outside the input box to obtain the processed gaze point map;
Step 2, inputting image I and the bounding box map B into the initial segmentation network Coarse U-Net to generate a coarse segmentation result MC and box-based multi-scale features;
Step 3, calculating the similarity between the initial segmentation result MC and the processed gaze point map, and adjusting the gaze point map accordingly to obtain the adjusted gaze point map FM';
Step 4, concatenating image I, the adjusted gaze point map FM' and the coarse segmentation result MC along the channel dimension, inputting them into the refinement segmentation network Refinement U-Net to extract refined features, and fusing the box-based features extracted by Coarse U-Net layer by layer in the process;
Step 5, inputting the refined features into the decoder of the refinement network Refinement U-Net for decoding to obtain the final segmentation result M.
Further, step 1, obtaining the bounding box of the object to be segmented in image I, converting it into a binary bounding box map B, obtaining the gaze point map FM of the target image and erasing the gaze point information outside the input box to obtain the processed gaze point map, is specifically as follows:
Step 1-1, labelling the object to be segmented in image I by drawing a frame, and recording the lower-left corner coordinates (xmin, ymin) and upper-right corner coordinates (xmax, ymax) of the rectangular frame;
Step 1-2, calculating the bounding box map B, whose pixels inside the rectangle are set to the foreground value and whose pixels outside the rectangle are set to 0;
Step 1-3, using the trained gaze point prediction model TranSalNet: image I is input and passed successively through TranSalNet's CNN encoder, Transformer encoder and CNN decoder to obtain the estimated gaze point map FM;
Step 1-4, setting all pixels outside the input box in the estimated gaze point map FM to 0, so as to erase the gaze point information outside the input box and obtain the processed gaze point map, i.e., multiplying FM pixel by pixel with the box map, where · denotes pixel-by-pixel multiplication.
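For illustration, a minimal PyTorch-style sketch of step 1 is given below. The helper names (`make_box_map`, `mask_fixation`), the coordinate convention and the use of a 0/1 mask rather than 0/255 values are assumptions made for readability; the patent only fixes the overall behaviour (binary box map, gaze information erased outside the box by pixel-wise multiplication).

```python
import torch

def make_box_map(h, w, box):
    """Binary bounding-box map B: 1 inside the user-drawn rectangle, 0 outside.
    `box` = (x_min, y_min, x_max, y_max) in pixel coordinates (assumed convention)."""
    x_min, y_min, x_max, y_max = box
    B = torch.zeros(1, h, w)
    B[:, y_min:y_max + 1, x_min:x_max + 1] = 1.0
    return B

def mask_fixation(FM, B):
    """Erase gaze information outside the input box by pixel-wise multiplication."""
    return FM * B

# usage sketch: I is a 3xHxW image tensor, FM a 1xHxW gaze map predicted by TranSalNet
H, W = 320, 320
B = make_box_map(H, W, box=(40, 60, 280, 300))   # hypothetical box coordinates
FM = torch.rand(1, H, W)                          # stand-in for the TranSalNet output
FM_masked = mask_fixation(FM, B)                  # processed gaze point map
```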
Further, step 2, inputting image I and the bounding box map B into the initial segmentation network Coarse U-Net to generate a coarse segmentation result MC and box-based multi-scale features, is specifically as follows:
Step 2-1, concatenating image I and the bounding box map B along the channel dimension to obtain the input tensor Input;
Step 2-2, inputting Input into the encoder part of Coarse U-Net for feature extraction: each encoder layer extracts features with a convolution block consisting of three 3×3 convolution layers and ReLU activation layers, with MaxPooling denoting the max pooling operation applied between layers; five features of different scales, i = 1, 2, ..., 5, are finally obtained;
Step 2-3, decoding the encoder features with the decoder: each decoder layer applies a convolution block consisting of three 3×3 convolution layers and ReLU activation layers to the concatenation along the channel dimension (Concat(·,·)) of the upsampled output of the previous decoder layer (Upsample) and the encoder feature of the corresponding scale; finally, a 3×3 convolution layer reduces the channel number to 1, and the preliminary segmentation result MC is obtained after a Sigmoid operation.
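The following sketch shows how a Coarse U-Net of the kind described above could be wired in PyTorch: five encoder levels of triple 3×3 conv + ReLU blocks with max pooling between them, a decoder that upsamples, concatenates the skip feature and applies another block, and a final 3×3 convolution plus Sigmoid. The channel widths, the bilinear upsampling mode and the class name `CoarseUNet` are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    # "each convolution block contains three 3x3 convolution layers and ReLU activation layers"
    layers = []
    for i in range(3):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class CoarseUNet(nn.Module):
    def __init__(self, in_ch=4, widths=(32, 64, 128, 256, 512)):  # assumed widths
        super().__init__()
        self.enc = nn.ModuleList()
        c = in_ch
        for w in widths:
            self.enc.append(conv_block(c, w))
            c = w
        self.dec = nn.ModuleList([conv_block(widths[i] + widths[i + 1], widths[i])
                                  for i in range(len(widths) - 1)])
        self.head = nn.Conv2d(widths[0], 1, 3, padding=1)

    def forward(self, x):
        feats = []                       # box-based multi-scale features, 5 scales
        for i, block in enumerate(self.enc):
            x = block(x)
            feats.append(x)
            if i < len(self.enc) - 1:
                x = F.max_pool2d(x, 2)   # MaxPooling between encoder levels
        d = feats[-1]
        for i in reversed(range(len(self.dec))):
            d = F.interpolate(d, scale_factor=2, mode='bilinear', align_corners=False)
            d = self.dec[i](torch.cat([d, feats[i]], dim=1))  # Concat along channels
        mc = torch.sigmoid(self.head(d))  # coarse segmentation result MC
        return mc, feats
```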
Further, step 3, calculating the similarity between the initial segmentation result MC and the processed gaze point map and adjusting the gaze point map accordingly to obtain the adjusted gaze point map FM', is specifically as follows:
Step 3-1, calculating the similarity α between the initial segmentation result MC and the processed gaze point map, where · and |·| denote pixel-by-pixel multiplication and summation, respectively;
Step 3-2, using α to perform a global adjustment of the processed gaze point map: every pixel value of the processed gaze point map is scaled by α to obtain FM'.
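A sketch of the gaze point map adjustment of step 3 is given below, under the assumption (consistent with the detailed description, which calls α an intersection-over-union) that the similarity is a soft IoU between the coarse mask and the masked gaze map, and that the adjustment scales every pixel by α. The exact formula in the patent figure is not reproduced here.

```python
import torch

def adjust_fixation(mc, fm_masked, eps=1e-6):
    """Gaze point map adjustment module (a sketch).

    alpha -- similarity between coarse result MC and the processed gaze map,
             implemented here as a soft IoU (an assumption based on the text).
    FM'   -- globally adjusted gaze map: every pixel is scaled by alpha, so an
             unreliable gaze map (alpha close to 0) is suppressed entirely.
    """
    inter = (mc * fm_masked).sum()
    union = mc.sum() + fm_masked.sum() - inter
    alpha = inter / (union + eps)
    return alpha * fm_masked, alpha
```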
Further, step 4, concatenating image I, the adjusted gaze point map FM' and the coarse segmentation result MC along the channel dimension, inputting them into the refinement segmentation network Refinement U-Net to extract refined features, and fusing the box-based features extracted by Coarse U-Net layer by layer in the process, is specifically as follows:
Step 4-1, concatenating the original image I, the adjusted gaze point map FM' and the coarse segmentation result MC along the channel dimension to obtain the input tensor Input2;
Step 4-2, inputting Input2 into the encoder of Refinement U-Net to extract features layer by layer, while fusing the features of the corresponding Coarse U-Net layer with the cross skip connection module: each convolution block of the Refinement U-Net encoder contains three 3×3 convolution layers and ReLU activation layers, the i-th encoder layer outputs the refined feature of that layer, and φr denotes two criss-cross attention operations applied to the connected features.
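The cross skip connection of step 4-2 can be sketched as follows: at each encoder level of Refinement U-Net, the current refined feature is concatenated with the Coarse U-Net feature of the same scale and passed through two criss-cross attention operations (φr). The `CrissCrossAttention`-style module is assumed to come from an existing CCNet-style implementation, and the 1×1 projection that restores the channel width is an added assumption, since the patent does not state how the doubled channel count is handled.

```python
import torch
import torch.nn as nn

class CrossSkipFusion(nn.Module):
    """Fuse a Refinement U-Net encoder feature with the Coarse U-Net feature
    of the corresponding layer (a sketch of the cross skip connection module)."""
    def __init__(self, channels, attention_cls):
        super().__init__()
        # attention_cls: a criss-cross attention module class (e.g. from a CCNet
        # implementation); treated here as an external dependency taking `channels`.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)  # assumed projection
        self.att1 = attention_cls(channels)
        self.att2 = attention_cls(channels)  # "two criss-cross attention operations"

    def forward(self, refine_feat, coarse_feat):
        x = torch.cat([refine_feat, coarse_feat], dim=1)  # connect along channels
        x = self.reduce(x)
        return self.att2(self.att1(x))
```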
Further, step 5, inputting the refined features into the decoder of the refinement network Refinement U-Net for decoding to obtain the final segmentation result M, is specifically as follows:
each decoder layer of Refinement U-Net applies a convolution block consisting of three 3×3 convolution layers and ReLU activation layers, with Upsample denoting the upsampling operation; finally, a 3×3 convolution layer reduces the channel number to 1, and the final segmentation result M is obtained after a Sigmoid operation.
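Putting the five steps together, the overall W-Net forward pass can be outlined as below. `coarse_net`, `refine_net`, `adjust_fixation` and the TranSalNet-predicted gaze map are placeholders for the components sketched above; this is a wiring sketch under those assumptions, not the patented implementation.

```python
import torch

def wnet_forward(image, box_map, fixation, coarse_net, refine_net):
    """image: 3xHxW tensor, box_map: 1xHxW binary box map, fixation: 1xHxW gaze map."""
    fm_masked = fixation * box_map                          # step 1: erase gaze outside the box
    inp = torch.cat([image, box_map], dim=0).unsqueeze(0)   # step 2: image + box map
    mc, coarse_feats = coarse_net(inp)                      #         coarse result + box-based features
    fm_adj, alpha = adjust_fixation(mc[0], fm_masked)       # step 3: gaze point map adjustment
    inp2 = torch.cat([image, fm_adj, mc[0]], dim=0).unsqueeze(0)
    m = refine_net(inp2, coarse_feats)                      # steps 4-5: refinement with cross skip fusion
    return m
```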
An interactive object segmentation system based on bounding box input and gaze point assistance, characterized in that it implements the interactive object segmentation method based on bounding box input and gaze point assistance described above, realizing interactive object segmentation based on bounding box input and gaze point assistance.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the bounding box input and gaze point assisted interactive object segmentation method when executing the computer program, implementing bounding box input and gaze point assisted interactive object segmentation.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the bounding box input and gaze point aided interactive object segmentation method described, enabling bounding box input and gaze point aided interactive object segmentation.
Compared with the prior art, the invention has the remarkable advantages that:
1) The invention uses an interaction mode combining the input box and the gaze point map. Implicit interaction information is fully exploited to assist segmentation, thereby improving segmentation quality.
2) The double U-Net segmentation network structure, the gaze point map adjustment module and the cross skip connection with self-attention mechanism provided by the invention can effectively alleviate the discrepancy problem caused by the estimated gaze point map and further improve segmentation quality.
3) The method can be used directly as a plug-in optimization tool to improve the segmentation quality of other interactive object segmentation models based on an input box. It allows any input-box-based interactive object segmentation model to conveniently benefit from the estimated gaze point map.
Drawings
FIG. 1 is a flow chart of an interactive image segmentation method according to the present invention.
Fig. 2 is a schematic structural diagram of an interactive image segmentation model according to the present invention.
Fig. 3 illustrates the effect of the gaze point map in an embodiment of the present invention. The first column shows the image with a green input box. The second column displays the estimated gaze point map of the entire image. The third column is the ground-truth segmentation result. The fourth and fifth columns show the segmentation results of W-Net without the gaze point map (w/o FM) and with the gaze point map (FM), respectively.
Fig. 4 shows an example of the Refinement U-Net effect in an embodiment of the present invention. The input box is marked in green, and the corresponding estimated gaze point map (Fixation) and ground-truth segmentation result (GT) are given. "Coarse U-Net", "Coarse U-Net+FM" and "W-Net" denote the segmentation results of Coarse U-Net, Coarse U-Net with the fixation map, and W-Net, respectively.
FIG. 5 is a graph of partial segmentation results for DGC, SAM, IOG and W-Net in an embodiment of the present invention.
Wherein the input box is marked green in all methods and the red dots represent additional input points employed in the IOG.
FIG. 6 is a graph of qualitative results of IOG and W-Net at different ratio input boxes in an embodiment of the present invention. The green box represents the input box and the red dot represents the additional input point employed by the IOG.
FIG. 7 is a graph showing comparison of the segmentation results of SAM and SAM+W-Net in the examples of the present invention. The green box represents the input box.
FIG. 8 is a graph comparing the segmentation results of IOG and IOG+W-Net in an embodiment of the invention. The green box represents the input box and the red dot represents the additional input point employed by the IOG.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1 and 2, the interactive object segmentation method based on bounding box input and gaze point assistance of the present invention comprises the following steps:
Step 1, acquiring the object to be segmented and its bounding box in the target image, and estimating the gaze point map of the target image. The method is as follows:
For a given target image, a rectangular frame is first drawn on the image to mark the target object. The information of the frame is then converted into a binary image in which the pixel values inside the frame are set to 255 (representing the target area) and the pixel values outside the frame are set to 0 (representing the background). Next, using the trained gaze point prediction model TranSalNet, the input image I passes successively through TranSalNet's CNN encoder, Transformer encoder and CNN decoder to obtain an estimated gaze point map. Since the estimated gaze point map is based on free viewing, in this step it is necessary to process the gaze point map and erase the gaze points outside the frame. Specifically, all pixels in the estimated gaze point map located outside the input box are set to 0 to avoid interference with subsequent processing, thereby ensuring that only gaze points within the target area are considered.
And 2, inputting the target image and the input box into an initial segmentation network Coarse U-Net to obtain a Coarse segmentation result and multi-scale characteristics based on the box. The method comprises the following steps:
The target image is combined with the input box information to form a four-channel tensor as input. In Coarse U-Net, the encoder is responsible for extracting multi-scale features of the input data at five scales (i = 1, 2, ..., 5), each corresponding to a feature representation at a particular scale. The decoder then fuses the multi-scale features extracted by the encoder through skip connections and generates an initial coarse segmentation result Mc.
And step 3, inputting the initial segmentation result and the gaze point diagram into a gaze point diagram adjustment module so as to calculate the similarity of the initial segmentation result and the gaze point diagram and adjust the gaze point diagram.
As previously described, there may be a discrepancy between the estimated gaze area and the user's real target object. Accordingly, a gaze point map adjustment module is developed to estimate the reliability of the gaze point map and adjust it accordingly. As described in step 1, only the range of the gaze point map inside the input box is considered; the correlation between it and the coarse segmentation result Mc is then measured.
Here, · and |·| denote pixel-wise multiplication and summation, respectively. In practice, this computes the intersection-over-union (IoU) between the processed gaze point map and Mc. A higher α means higher agreement between the gaze area in the estimated gaze map and the user's real target object, and vice versa. Thus, α is used to globally adjust the processed gaze point map.
It can be seen that when α is low, the processed gaze point map is suppressed, so that its influence is controlled to some extent. In the extreme case where α is zero, the gaze point map is set to zero entirely. This means that the gaze area is completely unrelated to the user's real target object, and the segmentation network should rely entirely on the information provided by the input box.
Step 4, concatenating image I, the adjusted gaze point map FM' and the coarse segmentation result MC along the channel dimension, inputting them into the refinement segmentation network Refinement U-Net, extracting refined features, and fusing the box-based features extracted by Coarse U-Net layer by layer in the process.
Specifically, first, the original image I, the adjusted gaze point map FM' and the coarse segmentation result MC are concatenated along the channel dimension to obtain the input tensor Input2;
Then Input2 is input into the encoder of Refinement U-Net to extract features layer by layer, while the features of the corresponding Coarse U-Net layer are fused using the cross skip connection module.
Here, each convolution block of the Refinement U-Net encoder contains three 3×3 convolution layers and ReLU activation layers, the i-th encoder layer outputs the refined feature of that layer, and φr denotes two criss-cross attention operations.
Although the estimated gaze point map has been globally adjusted by the preceding gaze point map adjustment module, the adverse effects it may cause have not been completely eliminated. The input box provides more reliable target-object interaction information than the estimated gaze point map. Thus, during feature extraction in the Refinement U-Net encoder, the features extracted by Coarse U-Net are also cross-skip connected with the current features at different levels, and these connected features are fused effectively by two criss-cross attention operations. In this way, the features extracted from the input box are further exploited to strengthen their dominant role, and the self-attention mechanism fuses the connected features so that the gaze point information is incorporated appropriately.
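For completeness, a compact sketch of a single criss-cross attention operation (the φr building block) is given below, in the spirit of CCNet: each position attends to all positions in its own row and column. This is a simplified re-implementation for illustration only, not the exact module used in the patent; the learnable 1×1 query/key/value projections, the channel reduction factor and the residual scale γ follow the common CCNet formulation and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Simplified criss-cross attention: every position attends to its row and column."""
    def __init__(self, channels, reduction=8):  # channels should be divisible by reduction
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, 1)
        self.k = nn.Conv2d(channels, channels // reduction, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # residual scale, starts at 0

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # affinities of each position (i, j) with its column (over h) and row (over w)
        e_h = torch.einsum('bcij,bclj->bijl', q, k)           # (b, h, w, h)
        e_w = torch.einsum('bcij,bcim->bijm', q, k)           # (b, h, w, w)
        attn = F.softmax(torch.cat([e_h, e_w], dim=-1), dim=-1)
        a_h, a_w = attn[..., :h], attn[..., h:]
        out = (torch.einsum('bijl,bclj->bcij', a_h, v) +       # aggregate along the column
               torch.einsum('bijm,bcim->bcij', a_w, v))        # aggregate along the row
        return self.gamma * out + x
```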
Step 5, decoding the fused features with the decoder of the refinement network Refinement U-Net to obtain the final segmentation result M. The method is as follows:
each decoder layer of Refinement U-Net applies a convolution block consisting of three 3×3 convolution layers and ReLU activation layers, with Upsample denoting the upsampling operation; finally, a 3×3 convolution layer reduces the channel number to 1, and the final segmentation result M is obtained after a Sigmoid operation.
Examples
In order to verify the effectiveness of the scheme of the invention, the following simulation experiments were performed.
The interactive object segmentation model W-Net based on bounding box input and gaze point assistance is implemented as follows:
Step 1: given the target image I of size 3×320×320 in Fig. 2, the bounding box of the object to be segmented in the image is acquired and converted into a binary bounding box map B of size 320×320. At the same time, the gaze point map of the target image is acquired and processed.
Step 1-1: label the object to be segmented in image I by drawing a frame, and record the lower-left corner coordinates (xmin, ymin) and upper-right corner coordinates (xmax, ymax) of the rectangular frame.
Step 1-2: calculate the bounding box map B, setting pixels inside the rectangle to the foreground value and pixels outside to 0.
Step 1-3: select the trained gaze point prediction model, input image I, and obtain the estimated gaze point map FM (of size 1×320×320) by inference.
Step 1-4: set all pixels outside the input box in the estimated gaze point map to 0, so as to erase the gaze point information outside the input box in FM and obtain the processed gaze point map, where · denotes pixel-by-pixel multiplication.
Step 2: generate the initial segmentation result. Image I and the bounding box map B are input into the initial segmentation network Coarse U-Net to generate a coarse segmentation result and box-based multi-scale features.
Step 2-1: concatenate image I and the bounding box map B along the channel dimension to obtain the input tensor Input of size 4×320×320.
Step 2-2: input Input into the encoder part of Coarse U-Net and extract features. Each convolution block of the encoder contains three 3×3 convolution layers and ReLU activation layers; five features of different scales are finally obtained.
Step 2-3: decode the features with the decoder. Each convolution block of the decoder contains three 3×3 convolution layers and ReLU activation layers, and Upsample denotes the upsampling operation. Finally, a 3×3 convolution layer reduces the channel number to 1, and the preliminary segmentation result MC is obtained after a Sigmoid operation.
Step 3: adjust the gaze point map. The initial segmentation result MC and the processed gaze point map are input into the gaze point map adjustment module, which calculates their similarity and adjusts the gaze point map accordingly. The specific steps are as follows:
Step 3-1: calculate the similarity α between the initial segmentation result MC and the processed gaze point map, where · and |·| denote pixel-wise multiplication and summation, respectively.
Step 3-2: use α to perform a global adjustment of each pixel value of the processed gaze point map.
Step 4: concatenate the image, the adjusted gaze point map and the coarse segmentation result along the channel dimension, input them into the refinement segmentation network Refinement U-Net, extract refined features, and fuse the box-based multi-scale features layer by layer through the cross skip connection module. The method is as follows:
Step 4-1: concatenate the original image, the adjusted gaze point map and the coarse segmentation result along the channel dimension to obtain the input tensor Input2 of size 4×320×320.
Step 4-2: input Input2 into the encoder of Refinement U-Net to extract features layer by layer, and fuse the features of the corresponding Coarse U-Net layer using the cross skip connection module. Each convolution block contains three 3×3 convolution layers and ReLU activation layers; φr denotes two criss-cross attention (Criss-Cross Attention) operations.
Step 5: decode the features with the decoder of Refinement U-Net. Each convolution block of the decoder contains three 3×3 convolution layers and ReLU activation layers, and Upsample denotes the upsampling operation. Finally, a 3×3 convolution layer reduces the channel number to 1, and the final segmentation result M is obtained after a Sigmoid operation.
In summary, the present invention exploits the complementarity between the input box and the estimated gaze point map of an object to improve input-box-based interactive object segmentation. The proposed W-Net framework ensures that segmentation is mainly guided by features extracted from the input box and assisted by auxiliary information extracted from the estimated gaze point map, improving segmentation accuracy.
1. Experimental setup
1.1. Data set
The training phase trains the W-Net network on the augmented Pascal VOC dataset, with 10582 and 1449 images for training and validation, respectively. In particular, each object in an image is treated as a training sample. The performance of the proposed method is evaluated in the test phase on the popular interactive object segmentation evaluation datasets GrabCut, Berkeley and DAVIS. The GrabCut dataset contains 50 images used to evaluate the performance of interactive segmentation models. The Berkeley dataset contains 96 images with a total of 100 masks of target objects for testing. The DAVIS dataset contains 50 videos, evaluated using 3440 frames containing the objects. The GrabCut dataset provides input boxes; for the Berkeley and DAVIS datasets, the ground-truth bounding boxes of the objects are used as input boxes.
1.2. Implementation details
All input tensors are resized to 320×320. First, Coarse U-Net is pre-trained for 50 epochs; then the parameters of Coarse U-Net are frozen and Refinement U-Net is trained for 50 epochs. Both networks are trained with binary cross-entropy loss. The learning rate is set to 10^-5. The batch size is set to 8 during Coarse U-Net training and to 2 during Refinement U-Net training. In all experiments, the Adam optimizer with β1 = 0.9 and β2 = 0.999 is used. The method is implemented with the PyTorch framework, and training and testing are performed on an NVIDIA GTX 3080Ti GPU.
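A minimal training-stage configuration matching the stated setup (binary cross-entropy loss, Adam with β1 = 0.9, β2 = 0.999, learning rate 1e-5, 50 epochs per stage, Coarse U-Net frozen while Refinement U-Net is trained) might look as follows; the dataset loader and model objects are placeholders.

```python
import torch
import torch.nn as nn

def train_stage(model, loader, epochs=50, lr=1e-5, frozen=None):
    """Train one stage of W-Net. `frozen` is an optional module whose parameters
    are frozen during this stage (e.g. the pre-trained Coarse U-Net)."""
    if frozen is not None:
        for p in frozen.parameters():
            p.requires_grad = False
    optim = torch.optim.Adam((p for p in model.parameters() if p.requires_grad),
                             lr=lr, betas=(0.9, 0.999))
    bce = nn.BCELoss()                    # outputs already pass through Sigmoid
    for _ in range(epochs):
        for inputs, target in loader:     # loader yields (network input, GT mask)
            pred = model(inputs)
            loss = bce(pred, target)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model

# stage 1: pre-train Coarse U-Net (batch size 8); stage 2: freeze it and train
# Refinement U-Net (batch size 2) -- the loaders are assumed to be built accordingly.
```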
1.3. Evaluation index
The error rate E is used as the evaluation index, i.e., the percentage of misclassified pixels within the input box, computed as follows:
where P and G denote the predicted segmentation mask and the ground-truth mask, respectively, S denotes the area of the input box, and |·| denotes the pixel-by-pixel summation operation. The error rate takes into account not only the errors but also the size of the input box. The same number of misclassified pixels in a loose input box, compared with a tight one, means better segmentation quality and a lower error rate, because a loose input box generally results in a more difficult segmentation task. A lower average error rate over a dataset indicates better overall performance of a segmentation method.
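A sketch of the error-rate metric under the natural reading of the definitions above (misclassified pixels counted inside the input box, divided by the box area S); the formula image from the patent is not reproduced in this text, so this is an interpretation rather than a verbatim transcription.

```python
import numpy as np

def error_rate(pred, gt, box_mask):
    """Percentage of misclassified pixels within the input box.

    pred, gt  -- binary masks (H, W) with values in {0, 1}
    box_mask  -- binary mask of the input box region; its sum is the box area S
    """
    wrong = (pred != gt) & (box_mask > 0)
    S = box_mask.sum()
    return 100.0 * wrong.sum() / S
```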
2. Ablation experiments
In this method, the core idea is to use the estimated gaze point map to assist input-box-based interactive object segmentation. Therefore, an ablation study is first conducted to investigate the effectiveness of the estimated gaze point map and of the various components in the proposed W-Net framework. The training procedure for the different ablation experiments is the same as in the previous experimental setup. Experimental results are evaluated on the GrabCut, Berkeley and DAVIS datasets.
2.1 Gaze point pattern analysis
For the estimated gaze point map, on the one hand, the sensitivity of the W-Net model to the different gaze point prediction models needs to be evaluated, and on the other hand, the validity of the estimated gaze point map should be studied by comparing the segmentation performance of the models with gaze point map input and without gaze point map input. The specific procedure and results are as follows.
2.1.1 Sensitivity of the framework to gaze point map prediction models
To evaluate the sensitivity of the W-Net model to different gaze point prediction models, two further gaze point prediction models, TempSAL and RINet, were selected in addition to the TranSalNet used in this method. They generate their respective gaze point maps, and W-Net is retrained and tested accordingly. The test results are shown in Table 1.
Table 1. Error rate (%) of W-Net using different gaze point prediction models
With TranSalNet, W-Net achieves the best performance on all datasets, while RINet provides slightly better assistance than TempSAL. Their behaviour reflects the respective influence of the gaze point prediction model in different situations. The three sets of results do not differ greatly, indicating that the W-Net model is not very sensitive to the particular gaze point prediction model, although a better-quality gaze point prediction model still helps refinement. The TranSalNet gaze point prediction model is used in all subsequent experiments.
2.1.2 Validity of gaze point map
To verify the benefit of the estimated gaze point map in the model, the estimated gaze point map is excluded from Refinement U-Net while the other components of the W-Net model are kept unchanged, i.e., the input box is the only interactive input.
Table 2. Error rate comparison with and without gaze point map input
As shown in Table 2, the model without the assistance of the estimated gaze point map exhibits a higher error rate. After the estimated gaze point map is added, segmentation performance improves markedly and the error rate drops. This shows that segmentation quality can be improved if the estimated gaze point map is used appropriately when refining with Refinement U-Net.
In addition, some segmentation results are shown in Fig. 3 to illustrate the assistance of the estimated gaze point map. It can be seen that the estimated gaze point map helps the model not only to recover missed object areas (e.g., the petals in the first row and the person's arms in the second row), but also to eliminate redundant, erroneously segmented areas (e.g., the tree trunks in the third row and the roofs in the fourth row).
2.2 Model structural analysis
In this section, ablation experiments are performed to verify the effectiveness of the components in the proposed framework, including Refinement U-Net, the gaze point map adjustment module, and the cross skip connection module with self-attention mechanism.
2.2.1 Effectiveness of Refinement U-Net
To demonstrate that Refinement U-Net can improve the coarse segmentation results of Coarse U-Net, two comparative experiments were performed. First, a plain Coarse U-Net was retrained, whose input consists of the image and the bounding box map; this serves as the baseline. In the second experiment, the estimated gaze point map was added as an additional channel to the Coarse U-Net input and this single U-Net was retrained again; this variant is Coarse U-Net+FM.
Table 3. Error rate (%) comparison for different combinations of the gaze point map, Coarse U-Net and Refinement U-Net
The experimental results in Table 3 show that the W-Net model using Refinement U-Net performs best. Meanwhile, the segmentation result of Coarse U-Net+FM is even worse than the baseline (Coarse U-Net). This means that simply integrating the estimated gaze point map, e.g., concatenating it at the input, does not improve segmentation quality; the discrepancy problem caused by the estimated gaze point map may even reduce it. Refinement U-Net, by contrast, constrains the estimated gaze point map, fully exploiting its assistance while overcoming its adverse effects as much as possible. Thus, Refinement U-Net refines the coarse segmentation results compared with the baseline.
The effect of Refinement U-Net in refining the coarse segmentation results is also illustrated by an example. In Fig. 4, if the estimated gaze point map with partial discrepancies is simply fed into Coarse U-Net, the segmentation result becomes worse (Coarse U-Net+FM). However, Refinement U-Net significantly improves the segmentation results by imposing constraints on the gaze point map.
2.2.2 Effectiveness of the gaze point map adjustment module
In the W-Net framework, the gaze point map adjustment module aims to explicitly adjust the estimated gaze point map, thereby alleviating its adverse effects to some extent. To verify its effectiveness, a comparison is made after removing the gaze point map adjustment module. This means that the estimated gaze point map restricted to the input box range is directly concatenated with the image and the coarse segmentation result of Coarse U-Net as the input of Refinement U-Net.
Table 4. Error rate (%) comparison with and without the gaze point map adjustment module
As shown in Table 4, explicit adjustment of the gaze point map is a necessary step to ensure segmentation quality. Specifically, the correlation between the coarse result output by Coarse U-Net and the estimated gaze point map is a reliable indicator for controlling the adverse effects of the estimated gaze point map.
2.2.3 Effectiveness of the cross skip connection module
In addition to the gaze point map adjustment module, a cross skip connection module with self-attention mechanism is developed in the Refinement U-Net encoder to enhance the dominance of the features extracted from the input box. To demonstrate the effectiveness of establishing cross skip connections between the corresponding layers of Coarse U-Net and Refinement U-Net, a comparative experiment was performed in which the cross skip connection with self-attention mechanism was removed from the W-Net framework (i.e., the two U-Nets run independently).
Table 5. Error rate (%) comparison with and without the cross skip connection module
As Table 5 shows, after adding the cross skip connection, W-Net strengthens the dominant role of the features extracted from the input box and fuses these features together, further alleviating the discrepancy problem in feature extraction and producing finer segmentation results. Tables 4 and 5 show that the gaze point map adjustment module and the cross skip connection module are both important and indispensable for Refinement U-Net to improve segmentation quality.
3. Comparative experiments
In addition to the ablation studies, comparisons were made with several state-of-the-art methods, namely DGC, SAM and IOG, on the GrabCut, Berkeley and DAVIS datasets. Meanwhile, W-Net is used as an optimization tool to refine SAM and IOG (i.e., on the basis of W-Net, the coarse segmentation result map input to Refinement U-Net is directly replaced by the segmentation result map of SAM/IOG, so that SAM/IOG and other methods can be optimized). These two combinations are SAM+W-Net and IOG+W-Net, respectively. In addition, the centre of each original input box is kept fixed and its size is enlarged by different ratios (1.1, 1.2, 1.3, 1.4). All methods were tested on these loose input boxes to evaluate their ability to handle different input box cases. Since SAM, as a generic segmentation model, only considers appropriate object sizes within a small range around the input box during training, it performs very poorly in the test when the input box is loose or the object is too large; therefore only the default output performance of SAM is reported for the cases it can handle. For IOG, one additional point in the object area is required besides the input box; this point is determined by its source code in the test, with reference to the ground-truth segmentation result.
Table 6. Error rates (%) of the different segmentation methods on the GrabCut, Berkeley and DAVIS datasets. The best results are indicated in bold.
The comparison results are shown in Table 6. In the single-method comparison, it can be seen that W-Net achieves better performance on the GrabCut dataset, while SAM outperforms the other methods on the Berkeley and DAVIS datasets when the input box is tight. This reflects the strong ability of SAM as a generic segmentation model.
Some segmentation examples are shown in Fig. 5. With the aid of the gaze point map, W-Net covers certain areas that other models ignore, such as the hammer in the second column and the human bodies in the third and last columns, while non-object areas, such as the tree trunks in the first column, can also be excluded.
In the case of loose input boxes, W-Net is worse than IOG. The main reason is that a loose input box reduces the performance of Coarse U-Net and worsens the adverse effect of the estimated gaze point map, so that the coarse result cannot be refined by Refinement U-Net. For IOG, its extra point explicitly indicating the object region plays an important role in mitigating the adverse effects caused by the loose input box. Nevertheless, as Fig. 6 shows for segmentation under input boxes of different scales, W-Net still shows its advantages; in some cases W-Net obtains stable segmentation results even as the input box size increases.
When W-Net is used as a segmentation optimization tool, the error rate of IOG is further reduced, and IOG+W-Net achieves the best performance in all cases. This shows that W-Net adds gaze point assistance on top of IOG and is an effective segmentation optimization tool. At the same time, it is also noted that SAM+W-Net reduces SAM performance in some cases where the input box is tight. The reason for these failure cases is that, although SAM achieves better overall quality, its variance in segmentation quality is larger than that of IOG. Some SAM segmentation results are of poor quality and cannot be used to control the gaze point map; they may even exacerbate its adverse effects, so that Refinement U-Net does not work well. It follows that W-Net also relies on the quality of the coarse segmentation results themselves to extract better features during refinement.
Figs. 7 and 8 illustrate the effects of SAM+W-Net and IOG+W-Net, respectively. In particular, the athlete in the first row and the woman in the last row of Fig. 7 demonstrate that the method can help SAM complete the segmentation of the target object area, while the building in the second row and the railing in the third row illustrate that the method can exclude erroneous areas from the SAM segmentation result. Likewise, in Fig. 8, the gaze point distribution provides richer target-object information than the limited number of clicks used in IOG; combining IOG and W-Net therefore optimizes the segmentation effect.
The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction between the combinations of these technical features, they should be considered within the scope of this description.
The above examples merely represent several embodiments of the present application, which are described in more detail but are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the protection scope of the application should be subject to the appended claims.

Claims (9)

Translated from Chinese

1. An interactive object segmentation method based on bounding box input and gaze point assistance, characterized in that the steps are as follows:
Step 1: obtain the bounding box of the object to be segmented in image I and convert it into a binary bounding box map B; at the same time, obtain the gaze point map FM of the target image and erase the gaze point information outside the input box to obtain the processed gaze point map;
Step 2: input image I and the bounding box map B into the initial segmentation network Coarse U-Net to generate the coarse segmentation result MC and the box-based multi-scale features;
Step 3: calculate the similarity between the initial segmentation result MC and the processed gaze point map, and use it to adjust the gaze point map, obtaining the adjusted gaze point map FM';
Step 4: concatenate image I, the adjusted gaze point map FM' and the coarse segmentation result MC along the channel dimension and input them into the refinement segmentation network Refinement U-Net to extract the refined features, fusing the box-based features extracted by Coarse U-Net layer by layer in this process;
Step 5: input the refined features into the decoder of the refinement network Refinement U-Net for decoding to obtain the final segmentation result M.

2. The interactive object segmentation method based on bounding box input and gaze point assistance according to claim 1, characterized in that step 1 is specifically as follows:
Step 1-1: mark the object to be segmented in image I by drawing a frame, and record the lower-left corner coordinates (xmin, ymin) and the upper-right corner coordinates (xmax, ymax) of the rectangular frame;
Step 1-2: calculate the bounding box map B;
Step 1-3: use the trained gaze point prediction model TranSalNet: input image I and pass it successively through TranSalNet's CNN encoder, Transformer encoder and CNN decoder to obtain the estimated gaze point map FM;
Step 1-4: set all pixels outside the input box in the estimated gaze point map FM to 0, so as to erase the gaze point information outside the input box in FM and obtain the processed gaze point map, where · denotes pixel-by-pixel multiplication.

3. The interactive object segmentation method based on bounding box input and gaze point assistance according to claim 1, characterized in that step 2 is specifically as follows:
Step 2-1: concatenate image I and the bounding box map B along the channel dimension to obtain the input tensor Input;
Step 2-2: input Input into the encoder part of Coarse U-Net for feature extraction, where each convolution block of the encoder contains three 3×3 convolution layers and ReLU activation layers and MaxPooling denotes the max pooling operation, finally obtaining five features of different scales (i = 1, 2, ..., 5);
Step 2-3: decode the features with the decoder, where each convolution block of the decoder contains three 3×3 convolution layers and ReLU activation layers, Concat(·,·) denotes the concatenation of tensors along the channel dimension and Upsample denotes the upsampling operation; finally, a 3×3 convolution layer reduces the channel number to 1, and the preliminary segmentation result MC is obtained after a Sigmoid operation.

4. The interactive object segmentation method based on bounding box input and gaze point assistance according to claim 1, characterized in that step 3 is specifically as follows:
Step 3-1: calculate the similarity α between the initial segmentation result MC and the processed gaze point map, where · and |·| denote pixel-by-pixel multiplication and summation, respectively;
Step 3-2: use α to perform a global adjustment of the processed gaze point map.

5. The interactive object segmentation method based on bounding box input and gaze point assistance according to claim 1, characterized in that step 4 is specifically as follows:
Step 4-1: concatenate the original image I, the adjusted gaze point map FM' and the coarse segmentation result MC along the channel dimension to obtain the input tensor Input2;
Step 4-2: input Input2 into the encoder of Refinement U-Net to extract features layer by layer while fusing the features of the corresponding Coarse U-Net layer with the cross skip connection module, where each convolution block of the Refinement U-Net encoder contains three 3×3 convolution layers and ReLU activation layers and φr denotes two criss-cross attention operations.

6. The interactive object segmentation method based on bounding box input and gaze point assistance according to claim 1, characterized in that step 5 is specifically as follows: each convolution block of the Refinement U-Net decoder contains three 3×3 convolution layers and ReLU activation layers and Upsample denotes the upsampling operation; finally, a 3×3 convolution layer reduces the channel number to 1, and the final segmentation result M is obtained after a Sigmoid operation.

7. An interactive object segmentation system based on bounding box input and gaze point assistance, characterized in that it implements the interactive object segmentation method based on bounding box input and gaze point assistance according to any one of claims 1-6, realizing interactive object segmentation based on bounding box input and gaze point assistance.

8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the interactive object segmentation method based on bounding box input and gaze point assistance according to any one of claims 1-6 is implemented, realizing interactive object segmentation based on bounding box input and gaze point assistance.

9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the interactive object segmentation method based on bounding box input and gaze point assistance according to any one of claims 1-6 is implemented, realizing interactive object segmentation based on bounding box input and gaze point assistance.
CN202411461106.1A (priority 2024-10-18, filed 2024-10-18): Interactive object segmentation method based on bounding box input and gaze point assistance. Status: Pending. Publication: CN119380021A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411461106.1A | 2024-10-18 | 2024-10-18 | Interactive object segmentation method based on bounding box input and gaze point assistance

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411461106.1A | 2024-10-18 | 2024-10-18 | Interactive object segmentation method based on bounding box input and gaze point assistance

Publications (1)

Publication Number | Publication Date
CN119380021A | 2025-01-28

Family

ID=94328465

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411461106.1A (Pending, CN119380021A (en)) | Interactive object segmentation method based on bounding box input and gaze point assistance | 2024-10-18 | 2024-10-18

Country Status (1)

Country | Link
CN | CN119380021A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10482603B1 (en)* | 2019-06-25 | 2019-11-19 | Artificial Intelligence, Ltd. | Medical image segmentation using an integrated edge guidance module and object segmentation network
US20210082118A1 (en)* | 2019-09-18 | 2021-03-18 | Adobe Inc. | Enhanced semantic segmentation of images
RU2742701C1 (en)* | 2020-06-18 | 2021-02-09 | Samsung Electronics Co., Ltd. | Method for interactive segmentation of object on image and electronic computing device for realizing said object
CN114693927A (en)* | 2022-03-25 | 2022-07-01 | Xi'an Jiaotong University | An interactive image segmentation method and system based on internal and external guidance
CN114820635A (en)* | 2022-04-21 | 2022-07-29 | Chongqing University of Technology | Polyp segmentation method combining attention U-shaped network and multi-scale feature fusion
CN116485815A (en)* | 2023-05-04 | 2023-07-25 | Central South University | Medical image segmentation method, device and medium based on dual-scale encoder network
CN117635621A (en)* | 2024-01-26 | 2024-03-01 | Southeast University | Dynamic vision-driven non-sensory interactive segmentation method for large models
CN118069729A (en)* | 2024-04-17 | 2024-05-24 | Heze Land Reserve Center | Method and system for visualizing homeland ecological restoration data based on GIS

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GONGYANG LI ET AL.: "Personal Fixations-Based Object Segmentation With Object Localization and Boundary Preservation", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》, vol. 30, 31 December 2021 (2021-12-31), pages 1461 - 1475, XP011837012, DOI: 10.1109/TIP.2020.3044440*
RENCHENG WU ET AL.: "A multi-scale interactive U-Net for pulmonary vessel segmentationmethod based on transfer learning", 《BIOMEDICAL SIGNAL PROCESSING AND CONTROL》, vol. 80, 18 November 2022 (2022-11-18), pages 1 - 12*
XIAOFEI ZHOU ET AL.: "Edge-Aware Multi-Level Interactive Network for Salient Object Detection of Strip Steel Surface Defects", 《DIGITAL OBJECT IDENTIFIER》, vol. 9, 10 November 2021 (2021-11-10), pages 1 - 12*
XIONG WEI ET AL.: "Sea-land semantic segmentation method for remote sensing images based on neural network", Computer Engineering and Applications, no. 15, 31 December 2020 (2020-12-31), pages 227-233*
LU ANQIN ET AL.: "Research on deep interactive image segmentation method incorporating extreme point features", Information and Communications, no. 06, 15 June 2020 (2020-06-15), pages 71-74*


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
