Disclosure of Invention
An object of the embodiments of the present application is to provide a method, apparatus, device, storage medium and program product for detecting a bladder focus, so as to alleviate the problem that existing detection approaches are prone to false detection or missed detection.
In a first aspect, an embodiment of the present application provides a method for detecting a bladder focus, the method including:
acquiring multi-frame bladder ultrasonic images;
identifying a target filling state of a bladder in each frame of bladder ultrasonic image;
and determining a corresponding detection network model according to the target filling state, and detecting the bladder focus through the determined detection network model to obtain a detection result, wherein each filling state corresponds to one detection network model.
In the above implementation, identifying the filling state of the bladder in each frame of bladder ultrasonic image provides important prior information for subsequent focus detection. The bladder is detected by a detection network model matched to its filling state, and the detection network model is dynamically adjusted according to the identified filling state, so that the model can focus on fine-grained detection of the focus and adapt to the changes in image features of the bladder across filling states. This effectively improves the accuracy of focus detection and reduces the probability of false detection or missed detection.
Optionally, the determining a corresponding detection network model according to the target filling state includes:
Determining the current filling state according to the target filling state of the bladder in the multi-frame bladder ultrasonic image;
and determining a corresponding detection network model according to the current filling state.
In the implementation process, the current filling state is determined by combining the multi-frame images, so that the problem that the classification result is inaccurate due to the influence of factors such as noise on a single-frame image can be reduced.
Optionally, the determining the current filling state according to the target filling state of the bladder in the multi-frame bladder ultrasonic image includes:
if the target filling states of the bladder in a first set number of bladder ultrasonic images are the same, determining that the current filling state is that target filling state.
In the above implementation, it is checked whether a set number of target filling states are the same; if so, the filling state is considered stable, which improves the accuracy and stability of the identification.
Optionally, the determining the current filling state according to the target filling state of the bladder in the multi-frame bladder ultrasonic image includes:
if the target filling states of the bladder in a second set number of bladder ultrasonic images are not all the same, determining that the current filling state is the target filling state that occurs most often.
In the implementation process, false detection and missing detection caused by misjudgment of single-frame images can be reduced through analysis of continuous multi-frame images.
Optionally, the determining the current filling state according to the target filling state of the bladder in the multi-frame bladder ultrasonic image includes:
if the target filling states of the bladder in a third set number of bladder ultrasonic images are not all the same, determining that the current filling state comprises each of the different target filling states.
In the above implementation, if the target filling states differ across many frames, the filling state is changing; detection can then be performed with the detection network models corresponding to each of the filling states, which improves the accuracy and stability of the detection result.
Optionally, the identifying the target filling state of the bladder in each frame of bladder ultrasound image includes:
Acquiring confidence degrees of various filling states of the bladder in each frame of bladder ultrasonic image identified by the classification network model, wherein the filling states comprise an unfilled state, a standard filling state and an overfilling state;
and determining the filling state with the maximum confidence as a target filling state.
In the above implementation, determining the target filling state from the confidences fixes the filling state of the bladder in each frame, which facilitates accurate selection of the detection network model.
Optionally, the determining a corresponding detection network model according to the target filling state includes:
Determining the current filling state according to the target filling state of the bladder in the multi-frame bladder ultrasonic image and the corresponding confidence level;
and determining a corresponding detection network model according to the current filling state.
In the above implementation, taking the confidence into account when determining the current filling state improves the accuracy and stability of the identification.
Optionally, the identifying the target filling state of the bladder in each frame of bladder ultrasound image includes:
identifying a target filling state of a bladder in each frame of bladder ultrasonic image through a classification network model, wherein the filling state comprises an unfilled state, a standard filling state and an overfilled state;
wherein the classification network model is an EfficientNet-B4 classification network and/or the detection network model is a YOLOv9 detection network. Combining the EfficientNet-B4 classification network with the YOLOv9 detection network enables accurate classification of the filling state and accurate detection of bladder lesions.
Optionally, the EfficientNet-B4 classification network is trained by freezing the shallow feature extractor and fine-tuning the network parameters of the deep layers. During training, the parameters of the shallow feature extractor can be frozen, that is, the weights of those layers are not updated, so that the pre-trained model's ability to extract basic features is retained and overtraining does not weaken the generalization of those features. Fine-tuning the parameters of the deep layers lets the high-level features of the network adapt to the specific requirements of medical images and improves the classification of bladder filling states.
Optionally, the Anchor size of the YOLOv9 detection network corresponding to the standard filling state is a standard size, the Anchor size of the YOLOv9 detection network corresponding to the unfilled state is reduced from the standard size by a set proportion, and the Anchor size of the YOLOv9 detection network corresponding to the overfilled state is enlarged from the standard size by a set proportion.
In the above implementation, when the bladder is unfilled, the focus may appear relatively small because the bladder is contracted; reducing the Anchor size of YOLOv9 by, for example, 50% allows the boundary of a small focus to be predicted more finely. When the bladder is overfilled, the focus may appear relatively large because the bladder is distended; enlarging the Anchor size of YOLOv9 by, for example, 50% fits the size of the focus better and improves detection accuracy.
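As a minimal sketch of the Anchor adjustment described above, the following Python snippet scales a set of standard Anchor sizes according to the filling state. The Anchor values, the state names and the function name `anchors_for_state` are hypothetical and chosen only for illustration; the 50% proportion is the example value given above.

```python
# Hypothetical standard Anchors as (width, height) in pixels; illustrative values only.
STANDARD_ANCHORS = [(32.0, 32.0), (64.0, 48.0), (128.0, 96.0)]

# Scale factor per filling state: 50% smaller when unfilled, 50% larger when overfilled.
ANCHOR_SCALE = {"unfilled": 0.5, "standard": 1.0, "overfilled": 1.5}


def anchors_for_state(state, base=STANDARD_ANCHORS):
    """Return the Anchor sizes to use for the given filling state."""
    scale = ANCHOR_SCALE[state]  # raises KeyError for an unknown state
    return [(w * scale, h * scale) for w, h in base]
```

For instance, `anchors_for_state("unfilled")` halves every width and height of the standard set, while `anchors_for_state("standard")` returns the standard sizes unchanged.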
In a second aspect, an embodiment of the present application provides a bladder focus detection apparatus, the apparatus comprising:
an image acquisition module, configured to acquire multi-frame bladder ultrasonic images;
a state detection module, configured to identify the target filling state of the bladder in each frame of bladder ultrasonic image;
and a focus detection module, configured to determine a corresponding detection network model according to the target filling state and detect the bladder focus through the determined detection network model to obtain a detection result, wherein each filling state corresponds to one detection network model.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing computer readable instructions which, when executed by the processor, perform the steps of the method as provided in the first aspect above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method as provided in the first aspect above.
In a fifth aspect, embodiments of the present application provide a computer program product comprising computer program instructions which, when read and run by a processor, perform the steps of the method as provided in the first aspect above.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
It should be noted that the terms "system" and "network" in the embodiments of the present application may be used interchangeably. "Plurality" means two or more, and may also be understood as "at least two". "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. The character "/", unless otherwise specified, generally indicates an "or" relationship between the associated objects.
It should be further noted that, in the present application, all acquisition of signals, information or data is performed in compliance with the applicable data protection laws and policies of the country concerned and with the authorization of the owner of the corresponding device.
The embodiment of the present application provides a bladder focus detection method. Identifying the filling state of the bladder in each frame of bladder ultrasonic image provides important prior information for subsequent focus detection. The bladder is detected by a detection network model matched to its filling state, and the detection network model is dynamically adjusted according to the identified filling state, so that the model can focus on fine-grained detection of the focus and adapt to the changes in image features of the bladder across filling states. This effectively improves the accuracy of focus detection and reduces the probability of false detection or missed detection.
Referring to FIG. 1, which is a flowchart of a method for detecting a bladder focus according to an embodiment of the present application, the method includes the following steps:
Step S110, acquiring multi-frame bladder ultrasonic images.
The multi-frame bladder ultrasound images may be video frames obtained by an ultrasound device. For example, the method of the present application may be performed by an electronic device (e.g., a terminal or a server) that receives the video frames acquired by the ultrasound device; during image acquisition the ultrasound device captures a video stream, for example at 30 frames per second. The multi-frame bladder ultrasound images may be the video frames obtained within each second, that is, the method may perform focus detection on the frames obtained every second, or video frames may be collected over a period of time and then subjected to focus detection.
In some embodiments, the resolution of the original video frames acquired by the ultrasound device may be high. To reduce the computational load, the electronic device may adaptively adjust the resolution after receiving an original bladder ultrasound image. For example, when the resolution of the original bladder ultrasound image exceeds a set value (for example, 720p; the specific value may be set according to actual requirements), the image may be reduced in size, for example to 640 x 640 using an edge-preserving downsampling algorithm, and subsequent detection is then performed on the processed image. Of course, the specific reduced size can be set flexibly according to actual requirements, so that the key details of the image are retained while the computational load is reduced.
The edge-preserving downsampling algorithm reduces the resolution of an image while retaining as much of the image edge information as possible. The edge information provides key features for subsequent bladder focus detection, which helps improve the accuracy of the subsequent detection results. In this way, the amount of computation is reduced while the key details and edge information in the image are retained.
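The resolution check described above can be sketched as follows. This only decides the size at which a frame should be processed; it does not implement the edge-preserving downsampling itself. Reading "exceeds 720p" as "frame height greater than 720 pixels" is an assumption of this sketch, and the function name `target_size` is hypothetical.

```python
def target_size(width, height, max_height=720, reduced=(640, 640)):
    """Return the (width, height) at which a frame should be processed.

    Frames taller than max_height are reduced to `reduced`; other frames
    keep their original size. Both defaults are the example values in the text.
    """
    if height > max_height:
        return reduced
    return (width, height)
```

With these defaults a 1920 x 1080 frame would be reduced to 640 x 640, while a 1280 x 720 frame would be left unchanged.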
Step S120, identifying the target filling state of the bladder in each frame of bladder ultrasonic image.
The filling state may include an unfilled state, a standard filling state, and an overfilled state, and the target filling state is one of these three filling states. When the target filling state of the bladder in an ultrasound image is identified, the output may be the type of filling state, or it may be a value for each of the filling states, which in the following embodiments is the confidence. The identification may be implemented with a suitable image processing algorithm, a machine learning model, or the like.
In some embodiments, the filling state is determined according to the morphology of the bladder, and may be determined by combining the bladder wall thickness, the volume, and morphological features, where the volume is calculated as V = 0.5 × long diameter × transverse diameter × anteroposterior diameter. For classification, the bladder is in the unfilled state when the contracted bladder wall is thicker than 3 mm and the volume is less than 100 ml; in the standard filling state when the bladder wall thickness is 1-3 mm; and in the overfilled state when the volume is more than 400 ml and the bladder wall is stretched and thinned to less than 1 mm.
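The volume formula and the thresholds above can be illustrated with a short Python sketch. It assumes the three diameters are given in centimetres, so that the result is in millilitres, and it follows one reading of the criteria: wall thicker than 3 mm with volume below 100 ml is unfilled, wall of 1-3 mm is standard, and wall thinner than 1 mm with volume above 400 ml is overfilled. The function names and the "indeterminate" fallback are hypothetical.

```python
def bladder_volume_ml(long_cm, transverse_cm, anteroposterior_cm):
    """Volume = 0.5 x long x transverse x anteroposterior diameter.

    Assumes diameters in centimetres, so the result is in millilitres.
    """
    return 0.5 * long_cm * transverse_cm * anteroposterior_cm


def filling_state(wall_mm, volume_ml):
    """Rule-based filling state from wall thickness (mm) and volume (ml)."""
    if wall_mm > 3 and volume_ml < 100:
        return "unfilled"
    if wall_mm < 1 and volume_ml > 400:
        return "overfilled"
    if 1 <= wall_mm <= 3:
        return "standard"
    # Combinations not covered by the stated criteria (assumption of this sketch).
    return "indeterminate"
```

As a worked example, diameters of 10 cm, 8 cm and 6 cm give 0.5 × 10 × 8 × 6 = 240 ml, which with a 2 mm wall falls in the standard filling state.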
It can be understood that in practical application, the filling states can be further distinguished according to practical requirements, and the method is not limited to the three filling states provided by the scheme.
In some embodiments, the target filling state of the bladder in each frame of bladder ultrasound image may be identified by a classification network model. For example, if the input is video and the multi-frame bladder ultrasound images are temporally consecutive, the frames may be input into the classification network model in the order received, and the classification network model identifies the filling state of the bladder in each input frame in turn. The classification network model extracts key features from the input bladder ultrasound image and classifies the image based on those features, that is, it assigns the image to the unfilled state, the standard filling state or the overfilled state.
Step S130, determining a corresponding detection network model according to the target filling state, and detecting the bladder focus through the determined detection network model to obtain a detection result.
After the target filling state of the bladder in each frame of bladder ultrasound image is determined, a corresponding detection network model may be determined based on the target filling state. Each filling state corresponds to one detection network model; for example, the unfilled state corresponds to detection network model 1, the standard filling state corresponds to detection network model 2, and the overfilled state corresponds to detection network model 3. This correspondence can be understood in two ways: either a separate detection network model is provided for each filling state, or a single basic detection network model is shared by the filling states while each filling state has its own set of network parameters. In the latter case, the network parameters of the basic detection network model are adjusted according to the filling state; for example, when the target filling state is determined to be the unfilled state, the parameters of the basic detection network model are set to the network parameters corresponding to the unfilled state.
In some embodiments, after the target filling state of the bladder in a frame of bladder ultrasound image is determined, that frame may be input into the corresponding detection network model for detection. For example, if the target filling state of the bladder in a certain frame is determined to be the unfilled state, the frame may then be input into detection network model 1, which performs bladder focus detection on the frame and outputs a detection result. The detection result may include information such as the size, position, shape and classification of the bladder focus, where the classification may be benign or malignant.
The detection network model extracts image features from the bladder ultrasound image and detects the bladder focus according to those features to obtain a detection result. The classification network model identifies the filling state of the bladder in each frame of bladder ultrasound image; once the corresponding detection network model is determined, the bladder focus in each frame is detected by that model and a detection result is obtained.
For ease of viewing, the detection results may be output and displayed so that the user can directly view the detection results of the bladder lesions in each frame of bladder ultrasound image.
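The per-frame flow of steps S120 and S130 can be sketched as a simple dispatch. The classifier and the detectors are passed in as plain callables because the actual models are outside the scope of this sketch; the function name `detect_frame` and the state keys are hypothetical.

```python
from typing import Callable, Dict, List


def detect_frame(frame,
                 classify: Callable[[object], str],
                 detectors: Dict[str, Callable[[object], List[dict]]]) -> List[dict]:
    """Identify the filling state of one frame, then run the matching detector.

    `classify` returns one of the keys of `detectors`, for example
    "unfilled", "standard" or "overfilled".
    """
    state = classify(frame)
    detector = detectors[state]  # one detection network model per filling state
    return detector(frame)
```

A caller would register one detector per filling state, for example `{"unfilled": model_1, "standard": model_2, "overfilled": model_3}`, where the three names stand for whatever callables wrap the detection network models.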
In the above implementation, the classification network model identifies the filling state of the bladder in each frame of bladder ultrasound image, which provides important prior information for subsequent focus detection. The bladder is detected by a detection network model matched to its filling state, and the detection network model is dynamically adjusted according to the identified filling state, so that the model can focus on fine-grained detection of the focus and adapt to the changes in image features of the bladder across filling states. This effectively improves the accuracy of focus detection and reduces the rate of false detection or missed detection.
On the basis of the above embodiment, when the classification network model identifies the filling state of the bladder in each frame of bladder ultrasound image, a single frame may be affected by noise, occlusion or other interference, making the classification result inaccurate. Therefore, the current filling state may be determined according to the target filling states of the bladder in multiple frames of bladder ultrasound images, and the corresponding detection network model is then determined according to the current filling state. This reduces the influence of such incidental factors and improves the stability of the classification.
The multi-frame bladder ultrasound image here may refer to the multi-frame bladder ultrasound image acquired in step S110 described above, or a part of the bladder ultrasound image therein. In this implementation, the current filling state may be determined in combination with the target filling state of the bladder in the multi-frame bladder ultrasound image.
In some embodiments, an image buffer may be provided in the electronic device to store the received multi-frame bladder ultrasound images, for example 15 frames, and the electronic device may perform image recognition and detection using 15 frames as a window. The electronic device may store 15 of the received frames in the image buffer in time order and identify the filling state of the bladder in each frame in time order. The frames may then be processed in units of a set number, for example 5 frames: the classification network model identifies the target filling state of the bladder in each of the 5 frames, yielding 5 target filling states, and the current filling state is determined from these 5 target filling states. For example, if the determined current filling state is the unfilled state, detection network model 1 corresponding to the unfilled state is selected as the detection network model; if the determined current filling state is the standard filling state, detection network model 2 corresponding to the standard filling state is selected.
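One possible way to organize the buffering described above is sketched below: frames are collected into a bounded buffer of 15 frames, and each full window is handed on as units of 5 frames. The function name `units_from_stream` is hypothetical, and dropping a trailing partial window is a simplification of this sketch rather than something the text specifies.

```python
from collections import deque


def units_from_stream(frames, window=15, unit=5):
    """Yield lists of `unit` consecutive frames, in arrival order.

    Frames are first collected into a buffer of `window` frames; each full
    window is then split into units. A trailing partial window is not yielded.
    """
    buffer = deque(maxlen=window)
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == window:
            items = list(buffer)
            for start in range(0, window, unit):
                yield items[start:start + unit]
            buffer.clear()
```

Each yielded unit can then be classified frame by frame and reduced to a single current filling state.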
In the implementation process, the current filling state is determined by combining the multi-frame images, so that the problem that the classification result is inaccurate due to the influence of factors such as noise on a single-frame image can be reduced.
Several ways of determining the current filling status in conjunction with the target filling status of the bladder in the multi-frame bladder ultrasound image are described in detail below.
(1) If the target filling states of the bladder in a first set number of bladder ultrasound images are the same, the current filling state is determined to be that target filling state.
For example, if the first set number is 5 frames (the specific value can be set flexibly according to actual requirements), the target filling states of the bladder in 5 consecutive frames of bladder ultrasound images are identified first. If the target filling states in these 5 frames are the same, for example all are the unfilled state, the system determines that the current filling state is the unfilled state and then selects detection network model 1 corresponding to the unfilled state. In this case, the 5 frames may each be input into detection network model 1 for detection, so as to obtain the bladder focus detection results for the 5 frames.
In the above implementation, it is checked whether a set number of target filling states are the same; if so, the filling state is considered stable, which improves the accuracy and stability of the identification.
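Mode (1) amounts to a unanimity check over the set number of frames. A minimal sketch, with the hypothetical function name `unanimous_state`:

```python
def unanimous_state(states):
    """Return the shared filling state if every frame agrees, otherwise None."""
    if states and all(s == states[0] for s in states):
        return states[0]
    return None
```

For five frames all identified as unfilled, the function returns the unfilled state; if any frame differs it returns `None`, signalling that another mode has to decide.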
(2) If the target filling states of the bladder in a second set number of bladder ultrasound images are not all the same, the current filling state is determined to be the target filling state that occurs most often.
The second set number may be the same as or different from the first set number. If the second set number is also 5 frames, it is determined whether the target filling states of the bladder in 5 consecutive frames of bladder ultrasound images differ; if so, the target filling state that occurs most often among the 5 is determined. For example, if 4 of the 5 target filling states are the unfilled state and 1 is the standard filling state, the current filling state is determined to be the unfilled state, and the detection network model corresponding to the unfilled state, namely detection network model 1, is selected.
After identifying the target filling state of the bladder in each frame of bladder ultrasound image, the classification network model may record the classification result in the image buffer. Once the electronic device has judged whether the target filling states of the bladder in the set number of consecutive frames are the same or different, the current filling state can be determined, and then the detection network model corresponding to the current filling state.
Continuing the above example, if the target filling states of the bladder in the first 5 consecutive frames (e.g., frames 1 to 5) are the same, then after these 5 frames are input into detection network model 1 for detection, the classification network model may go on to identify the target filling states of the bladder in the next 5 frames (e.g., frames 6 to 10), and the filling state is again locked and the corresponding detection network model determined according to the logic above. For example, if 3 of the target filling states in the next 5 frames are the standard filling state and 2 are the overfilled state, the filling state may be locked to the standard filling state; detection network model 2 corresponding to the standard filling state is then selected, and the 5 frames are each input into detection network model 2 for focus detection to obtain the detection results.
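Mode (2) is a majority vote over the per-frame target filling states. The sketch below uses `collections.Counter`; the function name `majority_state` is hypothetical, the input is assumed to be non-empty, and when two states tie the sketch simply returns one of them, since the rule above does not say how to break a tie.

```python
from collections import Counter


def majority_state(states):
    """Return the filling state that occurs most often in a non-empty list."""
    counts = Counter(states)
    state, _count = counts.most_common(1)[0]
    return state
```

With 3 standard and 2 overfilled frames, as in the example above, the vote locks the filling state to the standard filling state.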
In some other embodiments, the second set number may differ from the first set number. For example, the electronic device may first judge whether the first set number of target filling states are the same, for example whether the target filling states of the bladder in 5 consecutive frames are the same. If they are not, the current filling state is not determined yet. If the second set number is, for example, 15 frames, the target filling states of the bladder in the following 10 frames of bladder ultrasound images are identified to obtain 15 target filling states, and the filling state that occurs most often among the 15 is determined; for example, if the standard filling state occurs most often, the current filling state is determined to be the standard filling state. Of course, if there are 7 unfilled states, 7 standard filling states and 1 overfilled state, it may be determined that the current filling state includes both the unfilled state and the standard filling state, in which case the determined detection network models include those corresponding to the unfilled state and to the standard filling state.
During focus detection, the 15 frames of bladder ultrasound images may each be input into the corresponding detection network model. If multiple detection network models have been determined, each frame may be input into each of these detection network models, and the detection results of the models are finally integrated.
In the implementation process, false detection and missing detection caused by misjudgment of single-frame images can be reduced through analysis of continuous multi-frame images.
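The two-stage variant described above, with a first set number of 5 and a second set number of 15, can be sketched as follows. The function returns a set so that a tie, such as 7 unfilled and 7 standard, yields both states. The function name `current_states` and the use of `None` to mean "more frames are needed" are assumptions of this sketch.

```python
from collections import Counter


def current_states(states, first=5, second=15):
    """Return the set of current filling states, or None if more frames are needed.

    `states` holds the per-frame target filling states in time order.
    """
    head = states[:first]
    if len(head) == first and len(set(head)) == 1:
        return {head[0]}          # first window is unanimous
    if len(states) < second:
        return None               # wait until the second set number is reached
    counts = Counter(states[:second])
    best = max(counts.values())
    # Keep every state that reaches the highest count, so ties give several states.
    return {s for s, n in counts.items() if n == best}
```

When more than one state is returned, each frame would be passed to every corresponding detection network model and the results combined, as described above.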
(3) If the target filling states of the bladder in a third set number of bladder ultrasound images are not all the same, the current filling state is determined to include each of the different target filling states.
The third set number may be the same as or different from the second set number. Continuing the above example with a third set number of 5 frames: if the target filling states of the bladder in 5 consecutive frames of bladder ultrasound images are the same, the current filling state is determined according to mode (1); otherwise, it is determined according to mode (2) or mode (3). Under mode (3), if the target filling states in the 5 frames include the standard filling state and the overfilled state, the current filling state is determined to include the standard filling state and the overfilled state; if they include the unfilled state, the standard filling state and the overfilled state, the current filling state is determined to include all three. The detection network models determined subsequently are likewise those corresponding to each of the included filling states.
In the above implementation, if the target filling states differ across many frames, the filling state is changing; detection can then be performed with the detection network models corresponding to each of the filling states, which improves the accuracy and stability of the detection result.
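Mode (3) keeps every distinct target filling state and runs the detector for each of them. A minimal sketch, in which the detectors are plain callables and the function names `distinct_states` and `detect_with_all` are hypothetical:

```python
def distinct_states(states):
    """Return the distinct filling states in the order they were first seen."""
    return list(dict.fromkeys(states))


def detect_with_all(frame, states, detectors):
    """Run the detector of every distinct filling state on one frame.

    Returns a dict mapping each filling state to that detector's result;
    how the results are merged afterwards is left open here.
    """
    return {state: detectors[state](frame) for state in distinct_states(states)}
```

For a 5-frame unit containing only standard and overfilled frames, only the two corresponding detectors are run on each frame.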
On the basis of the above embodiment, when identifying the filling state, the classification network model may also output a confidence for each filling state, and the final target filling state is determined according to the confidences. That is, the confidences of the filling states of the bladder in each frame of bladder ultrasound image, as identified by the classification network model, are obtained, and the filling state with the highest confidence is determined as the target filling state of the bladder in that frame.
The confidence represents the probability of each filling state. For example, suppose the confidences of the three filling states of the bladder in a certain frame are 0.7 for the unfilled state, 0.2 for the standard filling state and 0.1 for the overfilled state. The confidence of the unfilled state is the highest, so the target filling state of the bladder in that frame is determined to be the unfilled state. The target filling state of the bladder in each frame can be determined in this way.
In the above implementation, determining the target filling state from the confidences fixes the filling state of the bladder in each frame, which facilitates accurate selection of the detection network model.
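Selecting the target filling state from the confidences is an argmax over the per-state values. A minimal sketch, assuming the classifier output is available as a dictionary from state name to confidence; the function name `target_state` is hypothetical.

```python
def target_state(confidences):
    """Return (state, confidence) for the filling state with the highest confidence."""
    state = max(confidences, key=confidences.get)
    return state, confidences[state]
```

Applied to the example above, `target_state({"unfilled": 0.7, "standard": 0.2, "overfilled": 0.1})` selects the unfilled state with confidence 0.7.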
Based on the above embodiment, in order to further improve the accuracy of classification and determine the accuracy of the detection network model, when determining the current filling state, the current filling state may be determined according to the target filling state of the bladder in the multi-frame bladder ultrasound image and the corresponding confidence level, and then the corresponding detection network model may be determined according to the current filling state.
The manner in which the confidence level is combined to determine is described in detail below.
(4) If the target filling states of the bladder in a fourth set number of bladder ultrasound images are the same and the corresponding confidences exceed the set threshold, the current filling state is determined to be that target filling state.
For example, if the target filling states of the bladder in 5 frames of bladder ultrasound images are the same, for example all are the unfilled state, and the confidence of each exceeds the set threshold (such as 0.85; the specific value can be set flexibly according to actual requirements), the current filling state can be determined to be the unfilled state. Alternatively, if the target filling states of the bladder in the 5 frames are the same, for example all are the unfilled state, and the confidences of more than half of the set number of frames exceed the set threshold, for example 3 of them, the current filling state can also be determined to be the unfilled state.
If not enough confidences of the unfilled state reach the set threshold, for example only 2 of them exceed it, the classification network model may continue to identify the target filling state and confidence of the bladder in the next 5 frames of bladder ultrasound images. This yields 10 target filling states, which are then judged together with their corresponding confidences. For example, if all 10 target filling states are the same, for example all are the unfilled state, and more than 5 of their confidences exceed the set threshold, the current filling state is determined to be the unfilled state; otherwise, the target filling state and confidence of the bladder in further frames of bladder ultrasound images continue to be detected.
Of course, if the target filling states in the first 5 frames or in the subsequent 10 frames are different, the current filling state may be determined according to mode (2) above or mode (5) or (6) below, for example by selecting the most numerous target filling state as the current filling state, or by selecting the target filling state whose confidence exceeds the set threshold as the current filling state.
(5) If the target filling states of the bladder in a fifth set number of bladder ultrasound images are different, the target filling states whose confidence exceeds the set threshold are counted, and the current filling state is determined to be those target filling states.
For example, suppose the target filling states of the bladder in 5 frames of bladder ultrasound images are different, for example they include the unfilled state and the standard filling state. The filling states whose confidence exceeds the set threshold (such as 0.85) may then be counted. If the confidence of the unfilled state exceeds 0.85 and the confidence of the standard filling state does not, the current filling state is determined to be the unfilled state. If the confidences of both the unfilled state and the standard filling state exceed 0.85, the current filling state is determined to include both the unfilled state and the standard filling state.
(6) If the target filling states of the bladder in a sixth set number of consecutive frames of bladder ultrasound images are different, and more than a preset number of the confidences are smaller than the set threshold, the current filling state is determined to include each of the different target filling states.
For example, if the target filling states of the bladder identified in 15 frames of bladder ultrasound images are different, the number of target filling states whose confidence is smaller than the set threshold (for example, 0.85) is counted. If this count exceeds the preset number (which may be half of the sixth set number or set according to actual requirements, for example 10), the filling state of the bladder is changing and cannot be determined unambiguously, and each of the filling states among the 15 target filling states may be used directly as the current filling state. If the 15 target filling states include the unfilled state and the standard filling state, the current filling state includes the unfilled state and the standard filling state; if the 15 target filling states include the unfilled state, the standard filling state and the overfilled state, the current filling state includes all three.
It is to be understood that the fourth set number, the fifth set number and the sixth set number may be set flexibly according to actual requirements. In a specific application, the electronic device may identify and detect the filling state using a window of 15 frames, with the set number set to 5 where the filling states are the same and to 15 where they are different. Thus the first set number and the fourth set number may be set to 5 and the remaining set numbers to 15. That is, if the filling states are the same, the current filling state can be determined from fewer frames of bladder ultrasound images, and if the filling states are different, it is determined from more frames, which reduces the influence of noise and improves the stability and accuracy of identification.
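Modes (4) and (5) can be sketched as two small functions. This is only an illustration under stated assumptions: the function names, the `require_all` flag (which switches between "every confidence exceeds the threshold" and "more than half exceed it") and the use of `None` to signal that further frames are needed are all hypothetical and do not appear in the present application.

```python
def mode_4(states, confs, threshold=0.85, require_all=True):
    """Mode (4): all frames agree on one target filling state.

    Returns that state if enough confidences exceed the threshold,
    otherwise None, meaning more frames should be examined.
    """
    if len(set(states)) != 1:
        return None  # states differ: mode (4) does not apply
    high = sum(c > threshold for c in confs)
    # "more than half" of n frames means at least n // 2 + 1 frames
    needed = len(states) if require_all else len(states) // 2 + 1
    return states[0] if high >= needed else None


def mode_5(states, confs, threshold=0.85):
    """Mode (5): frames disagree; keep each state that has at least one
    confidence above the threshold, in first-seen order."""
    kept = []
    for state, conf in zip(states, confs):
        if conf > threshold and state not in kept:
            kept.append(state)
    return kept


same = ["unfilled"] * 5
confs = [0.90, 0.90, 0.90, 0.80, 0.70]
strict = mode_4(same, confs)                       # not every frame is above 0.85
majority = mode_4(same, confs, require_all=False)  # 3 of 5 frames are above 0.85
```

With the values above, the strict variant asks for more frames while the majority variant accepts the unfilled state, matching the two cases described for mode (4).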
In the implementation process, the current filling state is determined by combining the confidence coefficient, so that the recognition accuracy and stability can be improved.
Based on the above embodiments, where the target filling state of the bladder in each frame of bladder ultrasound image is identified by the classification network model, in some embodiments the classification network model may be an EfficientNet-B4 classification network, and/or the detection network model may be a YOLOv9 detection network.
The EfficientNet-B4 classification network can efficiently extract multi-scale features through depthwise separable convolution and compound scaling, and has high classification accuracy.
The YOLOv9 detection network has high detection precision together with a significantly reduced parameter count and amount of computation. With this scheme, the classification network model and the detection network model can be deployed on the device side for real-time operation.
In some embodiments, the classification network model in the present solution may also be another model, such as a ResNet model, a DenseNet model, or another version of the EfficientNet family, and the detection network model may also be another model, such as a Faster R-CNN model, a RetinaNet model, or another version of the YOLO family.
In the implementation process, combining the EfficientNet-B4 classification network with the YOLOv9 detection network enables accurate classification of the filling state and accurate detection of the bladder focus.
On the basis of the above embodiment, the classification network model and the detection network model are obtained in advance by training on a large amount of data. To ensure that the classification network model retains a general image feature extraction capability, the EfficientNet-B4 classification network may be trained by freezing the shallow feature extractor and fine-tuning the network parameters of the deep network.
The shallow feature extractor refers to the first layers of the EfficientNet-B4 classification network, for example the first convolution layer and the MBConv blocks of the first stage, and is mainly responsible for extracting basic image features such as edges and textures. The deep network refers to the later layers of the EfficientNet-B4 classification network, which are mainly responsible for extracting high-level image features such as semantic information, and may include the MBConv blocks of the subsequent stages, the global average pooling layer, the fully connected layer and so on.
In addition, when the EfficientNet-B4 classification network is trained, optimizations such as multi-scale feature reinforcement and anatomical prior guidance may be introduced. For example, a dual-path attention mechanism may be embedded in MBConv blocks of the EfficientNet-B4 classification network (for example, the first MBConv block of each of stages 5 to 7 of EfficientNet-B4). One path is a spatial path, which captures large-range morphological features using dilated convolution; the other is a detail path, which extracts local texture using 1×1 convolution. The features of the two paths are finally fused using a learnable parameter. As the bladder goes from emptied (wall thickness > 3 mm) to overfilled (wall thickness < 1 mm), the wall folds change from complex to smooth. The dilated convolution enlarges the receptive field to 10 × 10 mm², so that the stretching deformation in the late filling stage can be captured completely, while the 1×1 convolution focuses on the residual micro-folds within a 3 × 3 mm² area.
Alternatively, an anatomical constraint module may be inserted after the first downsampling layer of the EfficientNet-B4 classification network. This module is a differentiable preprocessing layer used to encode medical prior knowledge into the feature space. Because the bladder wall thickness is recorded at the time of data acquisition, the module can convert the bladder wall thickness profile into a feature mask through differentiable rendering and fuse it, with spatial weighting, with the feature map of the first downsampling convolution.
In the training process, a large number of images may be acquired in advance. For example, during data acquisition, a high-frequency ultrasound probe may be used to acquire dynamic image sequences of transverse and longitudinal sections of the bladder. For each subject, several monitoring time points may be set, such as immediately after urination (emptied state), the natural filling period (graded measurement every 50 ml) and the capacity saturation period (> 400 ml, overfilled). A water-loading test may be used to control the filling rate (constant intake of 500 ml per 20 min), with the volume monitored continuously and synchronously, so that a full-cycle image record from emptying to filling is obtained. These images are input into the classification network model and the detection network model for training.
The classification network model and the detection network model may be trained separately and independently, or may be trained jointly; this is not limited here.
After a large number of bladder ultrasound images have been collected, the training samples may be expanded by preprocessing the images with operations such as rotation, flipping and noise injection. An affine transformation simulation technique may also be introduced: by designing an adjustable scaling matrix (scaling factor Δs ∈ [-0.3, 0.3]) and displacement parameters (Δx, Δy ∈ [-0.1W, 0.1H]), the images are preprocessed so that the dynamic process of bladder expansion and contraction is reproduced at the image-space level. In addition, a non-rigid deformation field may be generated using a thin plate spline algorithm to simulate the local stretching of the bladder wall in the overfilled state and the mucosal fold features in the emptied state, so that the model can learn features that remain invariant under continuous changes in organ morphology.
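The affine part of this augmentation can be illustrated by sampling a 2×3 affine matrix. The sketch below rests on several assumptions that are not stated in the text: the effective scale is taken as 1 + Δs, W and H are read as the image width and height so that Δx is bounded by 0.1·W and Δy by 0.1·H, and scaling is performed about the image center. The function names are hypothetical.

```python
import random


def sample_affine(width, height, rng):
    """Sample a 2x3 affine matrix that scales about the image center
    and then translates (assumed interpretation of the parameters)."""
    ds = rng.uniform(-0.3, 0.3)                   # scaling factor offset
    dx = rng.uniform(-0.1 * width, 0.1 * width)   # horizontal shift
    dy = rng.uniform(-0.1 * height, 0.1 * height) # vertical shift
    s = 1.0 + ds
    cx, cy = width / 2.0, height / 2.0
    # x' = s * (x - cx) + cx + dx, and likewise for y
    return [[s, 0.0, (1.0 - s) * cx + dx],
            [0.0, s, (1.0 - s) * cy + dy]]


def apply_affine(matrix, x, y):
    """Map one pixel coordinate through the 2x3 affine matrix."""
    (a, b, tx), (c, d, ty) = matrix
    return a * x + b * y + tx, c * x + d * y + ty


rng = random.Random(0)
m = sample_affine(640, 480, rng)
center_after = apply_affine(m, 320, 240)
```

Because scaling is done about the center, the image center moves only by the sampled shift, which keeps the bladder region inside the frame for the ranges given above. Resampling the pixel values with this matrix would be done by an image library and is outside the scope of this sketch.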
In the implementation process, the parameters of the shallow feature extractor can be frozen during training of the EfficientNet-B4 classification network, that is, the weights of these network layers are not updated. This preserves the pre-trained model's ability to extract basic features and avoids weakening the generalization of those features through over-training. By fine-tuning the parameters of the deep network, the high-level features of the network can be better adapted to the specific requirements of medical images, improving the classification of bladder filling states.
On the basis of the above embodiment, in order to enable the detection network model to adapt to the detection of foci of different sizes, the Anchor sizes of the YOLOv9 detection networks corresponding to different filling states may also be set differently. For example, the Anchor size of the YOLOv9 detection network corresponding to the standard filling state is the standard size, the Anchor size of the YOLOv9 detection network corresponding to the unfilled state is the standard size reduced by a set proportion, and the Anchor size of the YOLOv9 detection network corresponding to the overfilled state is the standard size enlarged by a set proportion.
The Anchor of the YOLOv9 detection network refers to an anchor box, that is, one of a preset group of bounding boxes used to represent the possible positions and sizes of target objects of different sizes and shapes in an image. Each anchor box corresponds to one predicted bounding box, and the model predicts the bounding box of the actual target object by adjusting the size and position of the anchor boxes.
The standard size may be understood as the Anchor size of the original YOLOv9 detection network. Reducing the standard size by 50% gives the Anchor size of the YOLOv9 detection network corresponding to the unfilled state, and enlarging the standard size by 50% gives the Anchor size of the YOLOv9 detection network corresponding to the overfilled state. Of course, the set proportion can be set flexibly according to actual requirements.
When small foci are detected, they are small relative to the whole image, and the standard size of a conventional Anchor may be too large, which makes it difficult for the model to match and localize small foci accurately and leads to missed or false detections. In the unfilled state, the focus may be relatively small because the bladder is contracted, and reducing the Anchor size of YOLOv9 by 50% allows the boundary of a small focus to be predicted more finely. Therefore, for the detection network model of the unfilled state, reducing the Anchor size allows the features and position of small foci to be captured accurately: the reduced anchor boxes fit the size range of small foci more closely and so fit small targets better on the feature map, which improves the sensitivity of small-focus detection.
Conversely, when large foci are detected, they are large relative to the whole image, and the standard size of a conventional Anchor may be too small, which makes it difficult for the model to match and localize large foci accurately. In the overfilled state, the focus may be relatively large because the bladder is distended, and enlarging the Anchor size of YOLOv9 by 50% allows the size of the focus to be fitted better and improves detection accuracy.
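The per-state Anchor adjustment described above can be sketched as a simple scaling of a base set of anchor boxes. The 50% reduction and enlargement come from the text; the table name `ANCHOR_SCALE`, the function name and the base anchor values are hypothetical placeholders and are not taken from any YOLOv9 configuration.

```python
# Scale factor per filling state: standard size, reduced by 50%, enlarged by 50%.
ANCHOR_SCALE = {"unfilled": 0.5, "standard": 1.0, "overfilled": 1.5}


def scale_anchors(base_anchors, state, scale_table=ANCHOR_SCALE):
    """Return the (width, height) anchor boxes for the given filling state."""
    k = scale_table[state]
    return [(w * k, h * k) for w, h in base_anchors]


# Placeholder base anchors in pixels (illustrative values only).
base = [(10, 20), (30, 60), (100, 80)]
small = scale_anchors(base, "unfilled")
large = scale_anchors(base, "overfilled")
```

Keeping the proportion in a table makes it straightforward to set a different proportion according to actual requirements, as the text allows.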
In the training of the YOLOv9 detection networks, since one YOLOv9 detection network is provided for each of the three filling states, the three networks may share the parameters of the first 10 layers of the backbone network to achieve knowledge transfer and reduce the amount of training. In addition, an auxiliary task of filling-state prediction may be incorporated into the loss function, that is, the YOLOv9 detection network may also identify the filling state, which enhances the sensitivity of the model to morphological features.
The detection method of the present embodiment will be described below with reference to fig. 2, by way of a specific example, in conjunction with the above embodiment.
The electronic device first acquires multiple frames of bladder ultrasound images and may then adjust their resolution, that is, standardize the resolution. The filling state in the adjusted images is identified by the classification network model. First, taking 5 frames as a unit, it is checked whether the filling states corresponding to 5 consecutive frames are the same. If so, the high-confidence mode is entered: the corresponding detection network model is determined directly according to the current filling state, and focus detection is then performed. If not, the check is extended to 15 frames, and it is determined whether the confidence of the filling state is lower than 0.85 in more than 10 of the 15 frames. If not, a vote is taken over the filling states corresponding to the 15 frames and the high-confidence mode is entered, that is, the corresponding detection network model is determined according to the current filling state and focus detection is then performed. If so, the transition state mode is entered: several detection network models are used for mixed detection, and the final detection result is obtained by weighted summation.
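The decision flow of this example can be sketched as follows. The 5-frame unit, the 15-frame window, the 0.85 threshold and the 10-frame count come from the text; the function name `choose_mode`, the mode labels and the assumption that at least 15 frames have already been classified are illustrative only.

```python
from collections import Counter


def choose_mode(states, confs, threshold=0.85, short=5, window=15, max_low=10):
    """Pick the detection mode from per-frame target filling states and
    their confidences (oldest frame first)."""
    # Step 1: 5 consecutive frames agree -> high-confidence mode.
    if len(set(states[:short])) == 1:
        return "high_confidence", [states[0]]

    # Step 2: extend to the 15-frame window and count low-confidence frames.
    win_states, win_confs = states[:window], confs[:window]
    low = sum(c < threshold for c in win_confs)
    if low > max_low:
        # Transition state mode: use every distinct state seen in the window.
        return "transition", list(dict.fromkeys(win_states))

    # Otherwise vote over the window and return to high-confidence mode.
    voted = Counter(win_states).most_common(1)[0][0]
    return "high_confidence", [voted]


states = ["unfilled", "standard"] * 7 + ["unfilled"]
mode, selected = choose_mode(states, [0.6] * 15)
```

In the usage lines, every frame has low confidence and the states alternate, so the transition state mode is chosen with both observed states selected for mixed detection.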
For example, in the transition state mode, if it is determined that the filling states corresponding to the 15 frames of images include all three filling states, three detection network models are used: detection network model 1 corresponding to the unfilled state, detection network model 2 corresponding to the standard filling state, and detection network model 3 corresponding to the overfilled state. During detection, a given frame of bladder ultrasound image is input into each of the three detection network models to obtain three detection results, which are then weighted and summed. The weights may be determined according to the confidences of the three filling states. For example, the classification network model predicts the confidences of the three filling states for each frame of image, and these confidences are averaged over the 15 frames to obtain the final confidences of the three filling states, for example unfilled state : standard filling state : overfilled state = 0.7 : 0.2 : 0.1. The weights are these confidences.
If detection network model 1 outputs detection result 1 with a weight of 0.7, detection network model 2 outputs detection result 2 with a weight of 0.2, and detection network model 3 outputs detection result 3 with a weight of 0.1, then the final detection result = detection result 1 × 0.7 + detection result 2 × 0.2 + detection result 3 × 0.1. Each detection result may include the position, size, shape and classification of the focus. During weighting, the detection data corresponding to foci with overlapping positions are weighted and summed, and the classification in a detection result may be represented by a classification probability so that it can be weighted conveniently. When mixed-model detection is used, the detection results can be integrated in this way and finally output to the user for display.
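The weighting step can be sketched for one focus whose positions overlap across the three models. This is an illustration under stated assumptions: each detection result is represented as a dictionary of numeric fields, the matching of overlapping foci is assumed to have been done already, and the function and field names are hypothetical.

```python
def average_confidences(per_frame):
    """Average the per-frame confidences of each filling state over a window."""
    n = len(per_frame)
    return {s: sum(frame[s] for frame in per_frame) / n for s in per_frame[0]}


def fuse(results, weights):
    """Weighted sum of the numeric fields of matched detection results.

    `results` maps a filling state to the detection output of the model
    for that state; `weights` maps a filling state to its weight.
    """
    fields = next(iter(results.values())).keys()
    return {f: sum(weights[s] * results[s][f] for s in results) for f in fields}


weights = {"unfilled": 0.7, "standard": 0.2, "overfilled": 0.1}
results = {
    "unfilled":   {"cx": 100.0, "cy": 80.0, "w": 12.0, "h": 10.0, "p_focus": 0.9},
    "standard":   {"cx": 104.0, "cy": 82.0, "w": 14.0, "h": 12.0, "p_focus": 0.8},
    "overfilled": {"cx": 110.0, "cy": 90.0, "w": 20.0, "h": 16.0, "p_focus": 0.5},
}
fused = fuse(results, weights)
```

Representing the classification as a probability (here the placeholder field `p_focus`) is what allows it to be combined in the same weighted sum as the position and size.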
Referring to fig. 3, fig. 3 is a block diagram of a bladder focus detection apparatus according to an embodiment of the present application, where the apparatus may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus corresponds to the embodiment of the method of fig. 1 described above, and is capable of performing the steps involved in the embodiment of the method of fig. 1, and specific functions of the apparatus may be referred to in the foregoing description, and detailed descriptions thereof are omitted herein as appropriate to avoid redundancy.
Optionally, the apparatus 200 includes:
an image acquisition module 210 for acquiring a plurality of frames of bladder ultrasound images;
a state detection module 220 for identifying a target filling state of the bladder in each frame of bladder ultrasound image;
the focus detection module 230 is configured to determine a corresponding detection network model according to the target filling states, and detect a bladder focus through the determined detection network model, so as to obtain a detection result, where each filling state corresponds to one detection network model.
Optionally, the focus detection module 230 is configured to determine a current filling state according to a target filling state of a bladder in the multi-frame bladder ultrasound image, and determine a corresponding detection network model according to the current filling state.
Optionally, the focus detection module 230 is configured to determine that the current filling state is the target filling state if the target filling state of the bladder in the first set number of consecutive frame bladder ultrasound images is the same.
Optionally, the focus detection module 230 is configured to determine that the current filling state is the most numerous target filling state if the target filling states of the bladder in a second set number of consecutive frames of bladder ultrasound images are different.
Optionally, the focus detection module 230 is configured to determine that the current filling status includes various different target filling statuses if the target filling statuses of the bladder in the bladder ultrasound images of the continuous frames reaching the third set number are different.
Optionally, the state detection module 220 is configured to obtain the confidences of the various filling states of the bladder in each frame of bladder ultrasound image identified by the classification network model, where the filling states include an unfilled state, a standard filling state and an overfilled state, and to determine the filling state with the largest confidence as the target filling state.
Optionally, the focus detection module 230 is configured to determine a current filling state according to a target filling state of a bladder in the multi-frame bladder ultrasound image and a corresponding confidence level, and determine a corresponding detection network model according to the current filling state.
Optionally, the state detection module 220 is configured to identify, by using the classification network model, a target filling state of the bladder in each frame of bladder ultrasound image, where the filling state includes an unfilled state, a standard filling state, and an overfilled state;
Wherein the classification network model is an EfficientNet-B4 classification network and/or the detection network model is a YOLOv9 detection network.
Optionally, the EfficientNet-B4 classification network is trained by freezing the shallow feature extractor and fine-tuning the network parameters of the deep network.
Optionally, the Anchor size of the YOLOv9 detection network corresponding to the standard filling state is a standard size, the Anchor size of the YOLOv9 detection network corresponding to the unfilled state is the standard size reduced by a set proportion, and the Anchor size of the YOLOv9 detection network corresponding to the overfilled state is the standard size enlarged by a set proportion.
It should be noted that, for convenience and brevity, a person skilled in the art will clearly understand that, for the specific working procedure of the apparatus described above, reference may be made to the corresponding procedure in the foregoing method embodiment, and the description will not be repeated here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device for performing a bladder focus detection method according to an embodiment of the present application, where the electronic device may include at least one processor 310, such as a CPU, at least one communication interface 320, at least one memory 330, and at least one communication bus 340. Wherein the communication bus 340 is used to enable connected communication between these components. The communication interface 320 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The memory 330 may be a high-speed RAM memory or a nonvolatile memory (non-volatile memory), such as at least one disk memory. Memory 330 may also optionally be at least one storage device located remotely from the aforementioned processor. The memory 330 has stored therein computer readable instructions which, when executed by the processor 310, perform the method process described above in fig. 1.
It will be appreciated that the configuration shown in fig. 4 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method process performed by an electronic device in the method embodiment shown in fig. 1.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the methods provided by the above-described method embodiments, for example, comprising:
Acquiring multi-frame bladder ultrasonic images;
identifying a target filling state of a bladder in each frame of bladder ultrasonic image;
And determining a corresponding detection network model according to the target filling state, and detecting the bladder focus through the determined detection network model to obtain a detection result, wherein each filling state corresponds to one detection network model.
In summary, the embodiments of the present application provide a method, an apparatus, a device, a storage medium and a program product for detecting a bladder focus. By identifying the filling state of the bladder in each frame of bladder ultrasound image, important prior information is provided for subsequent focus detection. Bladders in different filling states are detected by corresponding detection network models, and the detection network model is adjusted dynamically according to the identified filling state, so that it can concentrate on fine detection of the focus and adapt to the changes in image features of the bladder under different filling states, which effectively improves the accuracy of focus detection and reduces the probability of false or missed detection.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.