Detailed Description
Artificial Intelligence (AI) refers to theories, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence technology is a comprehensive discipline that involves a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like.
Specifically, the scheme provided in the embodiments of the present application relates to the machine learning field of artificial intelligence. Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and the like. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. In the present application, a large number of examples in an image are automatically segmented through a machine learning model, and the height information and the position offset of each example in the image are accurately obtained.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the application described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram of an example attribute information determining system in an image according to an embodiment of the present application, and as shown in fig. 1, the example attribute information determining system in an image may at least include a server 01 and a client 02.
Specifically, in this embodiment of the present disclosure, the server 01 may include an independently operating server, or a distributed server, or a server cluster composed of a plurality of servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform. The server 01 may comprise a network communication unit, a processor, a memory, and the like. Specifically, the server 01 may be configured to determine attribute information of an instance in an image.
Specifically, in the embodiment of the present disclosure, the client 02 may include physical devices such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, and a vehicle-mounted terminal, and may also include software running on the physical devices, such as web pages provided by some service providers to users and applications provided by the service providers to users. Specifically, the client 02 may be configured to display an image corresponding to each instance in the image to be detected.
The following describes a method for determining example attribute information in an image according to the present application. Fig. 2 is a schematic flowchart of a method for determining example attribute information in an image according to an embodiment of the present application. The present specification provides the method operation steps as described in the embodiments or the flowchart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order of execution. In practice, the system or server product may execute the steps sequentially or in parallel (for example, in a parallel processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 2, the method may include:
S201: acquiring an image to be detected, wherein the image to be detected includes a target number of examples.
In the embodiment of the description, the image to be detected may include a large number of examples, the target number may be greater than 2, and the examples in the image are similar in structure and belong to the same category. The examples may be dense and fine objects in the image to be detected, such as buildings or vehicles, and the target number may be set to be greater than a preset number. Because the structures of the examples in the image to be detected are similar, when a map is drawn according to the image to be detected, the examples in the image to be detected need to be segmented, and characteristics such as the size and the position of each example need to be obtained. When the example is an object having a certain height, for example a building, the height information and the base position information of each example also need to be acquired.
S203: performing downsampling processing on the image to be detected to obtain a shared feature.
In this embodiment of the present specification, the down-sampling processing on the image to be detected to obtain the shared feature may include:
S2031: extracting an edge texture feature set of the image to be detected;
In this specification, the edge texture feature set may include a plurality of edge texture features. The edge texture features are the bottom-layer features of the deep learning network, that is, the feature maps of the shallow layers of the network, and their visual representations are usually points, lines, surfaces, and corners. The edge texture feature set of the image to be detected can be extracted by the bottom convolution layers, that is, through multiple convolution and pooling operations.
S2033: determining an edge texture combination feature according to the edge texture feature set of the image to be detected;
In the embodiment of the present specification, deep learning is a process of gradual abstraction: the bottom-layer edge texture features are combined and abstracted into middle-layer features, that is, local features of the examples, and these are further abstracted into global features of the examples, that is, the edge texture combination features, on which example category learning is performed.
In the embodiment of the description, the edge texture features of the image to be detected are combined through the middle-layer and high-layer convolution layers to obtain the edge texture combination features; that is, the edge texture features can be fused through multiple convolution and pooling operations to obtain the edge texture combination features. For example, after the points, lines, surfaces and corners of the examples in the image to be detected are obtained, the local features of the examples in the image to be detected are determined through the middle-layer convolution layers, and the local features of the examples are then further convolved and pooled through the high-layer convolution layers to determine the overall features of the examples in the image to be detected, namely the edge texture combination features, so that example category learning can be performed.
S2035: carrying out normalized normal distribution processing on the edge texture combination features to obtain normalized features;
S2037: carrying out nonlinear mapping processing on the normalized features to obtain the shared feature.
In the embodiment of the present specification, the edge texture feature set of the image to be detected may be extracted through the bottom convolution layers; the edge texture features of the image to be detected are combined through the high-layer convolution layers to obtain the edge texture combination features; normalized normal distribution processing is carried out on the edge texture combination features through a normalization layer to obtain the normalized features; and nonlinear mapping processing is then carried out on the normalized features through an activation layer to obtain the shared feature.
In the embodiment of the present specification, the shared feature of the image to be detected can be obtained through downsampling processing, the shared feature can be used for performing semantic segmentation and instance analysis processing, and the position offset and the height information of each pixel in the image to be detected can also be determined through the shared feature.
In the embodiment of the present specification, the down-sampling reflects the layer-by-layer deepening of the deep learning network: the bottom-layer, middle-layer and high-layer information is obtained by performing operations such as convolution and pooling multiple times, and both convolution and pooling reduce the spatial size. In the down-sampling process illustrated by 04 in fig. 6, during the multiple convolution and pooling operations from the bottom layer through the middle layer to the top convolution layer, the deep features are gradually scaled down in size while the number of channels gradually increases; for example, the image is changed from 256 (length) × 256 (width) × 3 (number of channels) to 128 (length) × 128 (width) × 20 (number of channels), that is, the length and the width are reduced and the number of channels is increased.
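By way of illustration only, the following minimal sketch shows how repeated convolution, normalization, nonlinear mapping and pooling can turn a 256 × 256 × 3 input into a smaller, deeper shared feature map such as 128 × 128 × 20. The use of PyTorch, the module names and the channel counts are assumptions of this sketch and are not part of the claimed embodiments.

```python
import torch
import torch.nn as nn

class SharedFeatureExtractor(nn.Module):
    """Bottom/middle convolution layers that down-sample the image into a shared feature."""
    def __init__(self, in_channels=3, out_channels=20):
        super().__init__()
        # bottom layers: edge texture features (points, lines, corners)
        self.bottom = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),          # normalized normal distribution processing
            nn.ReLU(inplace=True),       # nonlinear mapping
            nn.MaxPool2d(2))             # 256x256 -> 128x128
        # middle/high layers: edge texture combination features
        self.middle = nn.Sequential(
            nn.Conv2d(16, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return self.middle(self.bottom(x))

shared = SharedFeatureExtractor()(torch.randn(1, 3, 256, 256))
print(shared.shape)  # torch.Size([1, 20, 128, 128])
```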
In a specific embodiment, the image to be detected may be a building image in which each example is a building and the image includes thousands of building examples; the features extracted by the bottom convolution layers, that is, the edge texture feature set of the image, are the edges, points, lines, and corners of the buildings; the features extracted by the middle-layer convolution layers are local features of a building, such as a roof feature, a base feature, and a height feature; and the features extracted by the high-layer convolution layers, namely the edge texture combination features, are the overall features of the building.
S205: carrying out position offset prediction processing on the shared feature to obtain the position offset of each pixel in the image to be detected.
Specifically, in the embodiment of the present specification, an example in the image to be detected may be an object having a certain height whose bottom contour is the same as its top contour; for example, the example may be a building whose base contour is the same as its roof contour. In an actual detection process, only the top contour of the example may be visible in the image to be detected, while the bottom contour of the example cannot be completely displayed; in this case, the bottom contour of the example needs to be determined from its top contour. The top contour and the bottom contour of an example in the image to be detected are usually offset from each other to a certain extent, so that, to obtain the bottom contour of an example in the image to be detected, not only the top contour size and the position information of each example but also the position offset of each example needs to be obtained. In the embodiment of the present specification, obtaining the position offset of each pixel in the image to be detected essentially means obtaining the position offset of each pixel in a target region of the image to be detected, where the target region is the top contour region of each example in the image to be detected, and the bottom (base) position of each example can be determined from the position offset of its top contour.
In this embodiment of the present specification, performing position offset prediction processing on the shared feature to obtain the position offset of each pixel in the image to be detected may include:
performing position offset prediction processing on the shared feature through a first regression branch network to obtain the position offset of each pixel in the image to be detected.
Specifically, in the embodiments of the present specification, the position offset of each pixel refers to an offset vector of the pixel in the x and y directions; for non-foreground pixels, the offset vector prediction is guided toward a value of 0. The first regression branch network performs feature fusion using a feature pyramid network (FPN) and uses an L2 least-squares loss as the guide function for the regression task.
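By way of illustration, a simplified sketch of such an offset-regression head is given below; PyTorch is assumed, the FPN fusion is reduced to a single up-sampling stage, and all names are hypothetical. The head outputs a two-channel map (x and y offsets) per pixel, and the loss guides non-foreground predictions toward 0 with an L2 (least-squares) penalty.

```python
import torch.nn as nn
import torch.nn.functional as F

class OffsetHead(nn.Module):
    """Regresses a per-pixel (dx, dy) offset vector from the shared feature."""
    def __init__(self, in_channels=20):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, 16, kernel_size=2, stride=2)  # restore resolution
        self.out = nn.Conv2d(16, 2, kernel_size=1)                               # channels: dx, dy

    def forward(self, shared):
        return self.out(F.relu(self.up(shared)))

def offset_loss(pred, target, foreground_mask):
    """L2 least-squares loss; the guide value for non-foreground pixels is 0."""
    return F.mse_loss(pred, target * foreground_mask)
```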
S207: performing height prediction processing on the shared feature to obtain the height information of each pixel in the image to be detected.
In this embodiment of the present description, performing height prediction processing on the shared feature to obtain the height information of each pixel in the image to be detected includes:
performing height prediction processing on the shared feature through a second regression branch network to obtain the height information of each pixel in the image to be detected.
Specifically, in the embodiment of the present specification, the height information corresponding to each instance in the image to be detected may be determined by obtaining the height information of each pixel in the image to be detected.
Specifically, in the embodiment of the present specification, when the examples are buildings, platforms such as mobile-phone map applications need not only the two-dimensional spatial position relationships of the background data but also a three-dimensional stereoscopic visualization map, so the spatial height of a building is also an essential element. The second regression branch network also adopts a feature pyramid network (FPN) for feature fusion; this branch network is designed as a regression learning network and uses an L1 absolute-value loss as the guide function for the regression task.
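Analogously, a minimal sketch of the height-regression branch is shown below (PyTorch assumed, names hypothetical); it predicts a single-channel height map from the shared feature and is guided by an absolute-value (L1) loss.

```python
import torch.nn as nn
import torch.nn.functional as F

class HeightHead(nn.Module):
    """Regresses a per-pixel height value from the shared feature."""
    def __init__(self, in_channels=20):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, 16, kernel_size=2, stride=2)
        self.out = nn.Conv2d(16, 1, kernel_size=1)   # single channel: height

    def forward(self, shared):
        return self.out(F.relu(self.up(shared)))

def height_loss(pred, target):
    return F.l1_loss(pred, target)                   # absolute-value (L1) guide function
```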
S209: determining the fusion feature of each pixel in the image to be detected according to the shared feature.
In an embodiment of the present specification, as shown in fig. 3, determining the fusion feature of each pixel in the image to be detected according to the shared feature may include:
S2091: performing semantic segmentation processing on the shared feature to obtain the semantic feature of each pixel in the image to be detected;
specifically, in this embodiment of the present specification, the semantic features may include a background feature and a foreground feature of the image to be detected.
Specifically, in this embodiment of the present specification, performing semantic segmentation processing on the shared feature to obtain the semantic feature of each pixel in the image to be detected may include:
performing semantic segmentation processing on the shared feature by adopting a connected-region regional clustering method to obtain the semantic feature of each pixel in the image to be detected, wherein the semantic feature is a background feature or a foreground feature.
In an embodiment of the present specification, after the step of performing semantic segmentation processing on the shared feature by using the connected-region regional clustering method to obtain the semantic feature of each pixel in the image to be detected, the method further includes:
determining a first mask of the background features and a second mask of the foreground features of the image to be detected.
In the embodiment of the present specification, the background features and the foreground features of the image to be detected may be determined by the connected-region regional clustering method, and masks corresponding to the foreground features and the background features are generated; for example, the mask value of the foreground features may be set to 1 and the mask value of the background features may be set to 0, so as to distinguish the foreground from the background.
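As a simple illustration, the two masks can be derived from a per-pixel foreground probability map as sketched below; NumPy is assumed, and the 0.5 decision threshold is an assumption of the sketch rather than a value taken from the embodiments.

```python
import numpy as np

def split_masks(foreground_prob, threshold=0.5):
    """foreground_prob: H x W array of per-pixel foreground probabilities."""
    second_mask = (foreground_prob >= threshold).astype(np.uint8)  # foreground pixels marked 1
    first_mask = (1 - second_mask).astype(np.uint8)                # complementary background mask
    return first_mask, second_mask
```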
In an embodiment of the present specification, performing semantic segmentation processing on the shared feature to obtain the semantic feature of each pixel in the image to be detected includes:
performing semantic segmentation processing on the shared feature through a semantic branch network to obtain the semantic feature of each pixel in the image to be detected.
In this embodiment of the present specification, the semantic branch network may include a plurality of up-sampling modules and may adopt a Feature Pyramid Network (FPN) strategy, so that deconvolution layer operations can be performed on the shared feature by the plurality of up-sampling modules; each up-sampling module performs a deconvolution operation, thereby enlarging the feature size and providing the feature information necessary for up-sampling and fusion at a higher layer. The input of each up-sampling module comes not only from the output features of the previous up-sampling module but also from the shared feature layer of the same size in the down-sampling processing; to better fuse the feature information, the two features are added in the module and a convolution operation is then performed to fuse the information, so that the image foreground and background are predicted and the semantic features are obtained.
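A minimal sketch of one such up-sampling module is given below (PyTorch assumed, names hypothetical): the module deconvolves its input to enlarge the size, adds the same-sized shared feature taken from the down-sampling path, and applies a convolution to fuse the information.

```python
import torch.nn as nn
import torch.nn.functional as F

class UpsampleModule(nn.Module):
    """One decoder stage: deconvolution, additive skip fusion, then a fusing convolution."""
    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_channels, skip_channels, kernel_size=2, stride=2)  # size x2
        self.fuse = nn.Conv2d(skip_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x, skip):
        x = F.relu(self.deconv(x))   # output of the previous up-sampling module, enlarged
        x = x + skip                 # same-sized shared feature layer from the down-sampling path
        return F.relu(self.fuse(x))  # convolution to fuse the two sources of information
```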
In the embodiments of the present specification, the luminance value of a binary image has only two states: black (0) and white (255). In practical applications, the analysis of many images is ultimately converted into the analysis of binary images, for example, the detection of the image foreground. The most important method for binary image analysis is connected-region labeling, which is the basis of all binary image analysis: by labeling the white pixels (targets) in the binary image, each individual connected region forms an identified block, and geometric parameters of these blocks, such as contours, circumscribed rectangles, centroids, and invariant moments, can then be obtained. In an image, the smallest unit is a pixel, and each pixel has 8 neighboring pixels around it; there are two common adjacency relations: 4-adjacency and 8-adjacency. The 4-adjacency involves 4 points in total, namely the points above, below, to the left and to the right; the 8-adjacency involves 8 points in total, including the diagonally located points. Visually, points that are connected to each other form one region, while points that are not connected form different regions, and the set of all points connected with each other is called a connected region. The connected-region regional clustering method can segment the foreground and the background of the image; when the examples are buildings, the foreground is the buildings.
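For illustration, the connected regions of a binary foreground mask can be labeled as sketched below; SciPy is assumed here, and the 3 × 3 structuring element selects 8-adjacency (a cross-shaped element would give 4-adjacency).

```python
import numpy as np
from scipy import ndimage

def connected_regions(binary_mask):
    """Label the 8-connected white (target) regions of a binary image."""
    structure = np.ones((3, 3), dtype=int)                         # 8-adjacency
    labels, num_regions = ndimage.label(binary_mask, structure=structure)
    return labels, num_regions                                     # 0 = background, 1..num_regions = blocks
```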
S2093: carrying out example analysis processing on the shared feature to obtain the example feature of each pixel in the image to be detected.
Specifically, in the embodiment of the present specification, the example feature may include a texture feature of each pixel in the image to be detected.
In the embodiment of the present specification, the texture feature may be characterized by a feature vector of a fixed dimension (e.g., 8 dimensions).
In this embodiment of the present specification, performing example analysis processing on the shared feature to obtain the example feature of each pixel in the image to be detected may include:
carrying out example analysis processing on the shared feature through an example branch network to obtain the example features.
Specifically, in this embodiment of the present specification, the example features may further include a texture feature and a spatial position feature of each pixel in the image to be detected.
In the embodiment of the present specification, a plurality of up-sampling modules may likewise perform deconvolution layer operations on the shared feature, and each up-sampling module performs a deconvolution operation, thereby enlarging the feature size and providing the feature information necessary for up-sampling and fusion at a higher layer. The input of each up-sampling module comes not only from the output features of the previous up-sampling module but also from the shared feature layer of the same size (area) in the down-sampling processing; to better fuse the feature information, the two features are added in the module and a convolution operation is then performed to fuse the information, so that the pixel features in the image are learned to obtain the example features. A clustering loss is used in the training process of the example branch network, and each pixel involved in the example loss calculation has a corresponding example label. The example features include the spatial position feature of each pixel in the image to be detected, and the spatial position feature of a pixel can be characterized by the two-dimensional spatial coordinates of the pixel. The spatial position features reflect the spatial region differences between pixels, which improves the feature similarity of adjacent pixels and increases the difference between pixels far away from each other, so that pixels of different examples can be prevented from being clustered together.
In the embodiment of the present specification, the texture feature of a pixel can be characterized by an eight-dimensional feature vector; in this case, the example feature is represented by a ten-dimensional feature vector, of which two dimensions are the spatial coordinates of the pixel. In this way, both the texture differences and the spatial region differences between different examples can be distinguished.
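As a sketch (NumPy assumed; the coordinate scaling factor is an assumption), the ten-dimensional instance feature of each pixel can be assembled by concatenating its eight-dimensional texture embedding with its two-dimensional spatial coordinates.

```python
import numpy as np

def build_instance_features(embedding, coord_weight=1.0):
    """embedding: H x W x 8 texture embedding -> H x W x 10 instance features."""
    h, w, _ = embedding.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys], axis=-1).astype(np.float32) * coord_weight  # spatial position feature
    return np.concatenate([embedding, coords], axis=-1)                     # 8 + 2 = 10 dimensions
```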
In this embodiment of the present specification, before the step of performing semantic segmentation processing on the shared feature through a semantic branch network to obtain the semantic feature of each pixel in the image to be detected, as shown in fig. 4, the method may further include:
S401: constructing a cross entropy loss function of a first network;
S403: constructing an intra-class cohesion loss function and an inter-class discrimination loss function of a second network;
S405: constructing a squared error loss function of a third network;
S407: constructing an absolute value loss function of a fourth network;
S409: determining the sum of the cross entropy loss function, the intra-class cohesion loss function, the inter-class discrimination loss function, the squared error loss function and the absolute value loss function as a comprehensive loss function;
S4011: respectively adjusting parameters of the first network, the second network, the third network and the fourth network to obtain a current first network, a current second network, a current third network and a current fourth network;
S4013: calculating the comprehensive loss value corresponding to the current first network, the current second network, the current third network and the current fourth network;
S4015: when the comprehensive loss value is smaller than a preset threshold value, determining the current first network as the semantic branch network, determining the current second network as the example branch network, determining the current third network as the first regression branch network, and determining the current fourth network as the second regression branch network.
In an embodiment of the present specification, the method may further include:
S4017: when the comprehensive loss value is greater than or equal to the preset threshold value, repeating the following step: respectively adjusting the parameters of the first network, the second network, the third network and the fourth network to obtain the current first network, the current second network, the current third network and the current fourth network.
In the embodiment of the present specification, the preset threshold may be set according to actual situations. In the training process of the semantic branch network, a semantic label is required to be labeled for each pixel in a training image, wherein the semantic label comprises a foreground label and a background label; in the training process of the example branch network, each pixel in the training image needs to be labeled with a feature label, which may include texture features and spatial location features.
In this specification embodiment, the cross-entropy loss function of the first network may be, for example, a standard binary cross-entropy loss:
$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log p_i + (1-y_i)\log(1-p_i)\right]$
where $p_i$ is the prediction probability, $y_i$ is the class label (0 or 1), and N is the number of features.
The intra-class cohesion loss function of the second network may, for example, be of the form
$L_{intra} = \frac{1}{CC}\sum_{c=1}^{CC}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\max\left(\lVert \mu_c - x_i\rVert - v,\ 0\right)\right]^2$
wherein CC is the number of instances in the training image, v is an intra-class penalty factor, $\mu_c$ is the mean value of the features within a certain class, $x_i$ is a certain pixel feature, and $N_c$ is the number of pixels in that class;
the inter-class discrimination loss function of the second network may, for example, be of the form
$L_{inter} = \frac{1}{CC(CC-1)}\sum_{c_a \neq c_b}\left[\max\left(d - \lVert \mu_{c_a} - \mu_{c_b}\rVert,\ 0\right)\right]^2$
wherein CC is the number of instances in the training image, d is an inter-class penalty factor, and $\mu_{c_a}$, $\mu_{c_b}$ are the mean values of the features within classes $c_a$ and $c_b$.
In this illustrative embodiment, the squared error loss function of the third network may, for example, be of the form
$L_{mse} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$
wherein $\hat{y}_i$ is the predicted value of the position offset of the i-th example, $y_i$ is the true value of the position offset of the i-th example, and n is the number of examples in the training image.
In this specification embodiment, the absolute value loss function of the fourth network may, for example, be of the form
$L_{abs} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$
wherein $\hat{y}_i$ is the height predicted value of the i-th example, $y_i$ is the height true value of the i-th example, and n is the number of examples in the training image.
In this embodiment of this specification, the first network, the second network, the third network, and the fourth network all belong to the same deep learning network, and the method of this embodiment may further include:
constructing a regularization loss function of the deep learning network.
Specifically, in the embodiment of the present specification, determining the sum of the cross entropy loss function, the intra-class cohesion loss function, the inter-class discrimination loss function, the squared error loss function, and the absolute value loss function as the comprehensive loss function may include:
determining the sum of the cross entropy loss function, the intra-class cohesion loss function, the inter-class discrimination loss function, the squared error loss function, the absolute value loss function and the regularization loss function as the comprehensive loss function.
In this embodiment of the present specification, the regularization loss function may be an L1 regularization function or an L2 regularization function, and when the comprehensive loss function is calculated, the regularization loss function is introduced, so that overfitting of a model corresponding to a network can be prevented, and the generalization capability of the model is improved.
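By way of illustration, the comprehensive loss can be assembled as the plain sum of the five task losses plus an L2 regularization term over the network weights, as sketched below; PyTorch tensors are assumed for the individual losses, and the regularization weight is an illustrative value, not one taken from the embodiments.

```python
def comprehensive_loss(ce, intra, inter, mse_offset, l1_height, model, reg_weight=1e-4):
    """Sum of the five task losses plus an L2 regularization term on the parameters.

    ce, intra, inter, mse_offset, l1_height are assumed to be scalar torch tensors,
    and model is assumed to be the torch.nn.Module holding all four branch networks.
    """
    reg = sum((p ** 2).sum() for p in model.parameters())   # L2 regularization of the deep learning network
    return ce + intra + inter + mse_offset + l1_height + reg_weight * reg
```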
In the embodiment of the present specification, the deep learning network may be a U-Net network, which is a classic fully convolutional network (that is, the network contains no fully connected operations). The input of the network is a picture whose edges have been mirrored. The left side of the network is a series of down-sampling operations consisting of convolution and max pooling, and this part is called the compression path. The compression path consists of 4 blocks; each block uses 3 valid convolutions and 1 max-pooling (Max Pooling) down-sampling, and the number of feature maps after each down-sampling is multiplied by 2. The right part of the network is the expansion path. Each block first doubles the feature size by deconvolution and halves the number of feature maps (the last layer is slightly different), and then merges the result with the feature map of the symmetric compression path on the left; because the feature maps of the left compression path and the right expansion path have different sizes, U-Net normalizes the sizes by cropping the feature map of the compression path to the same size as the feature map of the expansion path. The convolution operations of the expansion path still use valid convolutions.
S2095: fusing the semantic feature and the example feature of each pixel in the image to be detected to determine the fusion feature of each pixel in the image to be detected.
In an embodiment of the present specification, fusing the semantic feature and the example feature to determine the fusion feature of each pixel in the image to be detected may include:
S20951: fusing the first mask of the background features of the image to be detected with the texture features and the spatial position features of the pixels corresponding to the background in the image to be detected to obtain a first fusion result;
S20953: fusing the second mask of the foreground features of the image to be detected with the texture features and the spatial position features of the pixels corresponding to the foreground in the image to be detected to obtain a second fusion result;
S20955: determining the fusion feature of each pixel in the image to be detected according to the first fusion result and the second fusion result.
In the embodiment of the specification, a strategy combining connected-region regional clustering with spatial position feature fusion is adopted, so that, on the one hand, pixels in different regions are not clustered into the same category and, on the other hand, the clustering speed is improved.
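An illustrative sketch of this fusion is given below (NumPy assumed, names hypothetical): the foreground mask selects the pixels whose instance features are carried forward into clustering, while background pixels are kept apart so they can never join an instance.

```python
import numpy as np

def fuse_features(instance_features, foreground_mask):
    """instance_features: H x W x 10; foreground_mask: H x W with 1 for foreground pixels."""
    fg = foreground_mask.astype(bool)
    foreground_fusion = instance_features[fg]      # features of foreground pixels, to be clustered
    background_fusion = instance_features[~fg]     # background pixels, excluded from instance clustering
    coords = np.argwhere(fg)                       # pixel locations of the foreground features
    return foreground_fusion, background_fusion, coords
```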
S2011: determining a pixel set corresponding to each instance category according to the fusion feature of each pixel in the image to be detected.
In an embodiment of the present specification, determining, according to the fusion feature of each pixel in the image to be detected, a pixel set corresponding to each instance category includes:
S20111: determining the instance category of each pixel in the image to be detected according to the fusion feature of each pixel in the image to be detected;
S20113: determining the pixel set corresponding to each instance category through a density clustering algorithm.
In the embodiment of the present specification, density-based clustering methods perform clustering based on the density of the data set in its spatial distribution and do not require the number of clusters to be set in advance, so they are particularly suitable for clustering data sets whose content is unknown. Representative algorithms include DBSCAN and OPTICS. Taking DBSCAN as an example, the goal of DBSCAN is to find the largest set of density-connected objects; the classic Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is a density clustering algorithm based on high-density connected regions.
The basic algorithm flow of DBSCAN is as follows: starting from an arbitrary object P, all objects that are density-reachable from P are extracted through breadth-first search according to the distance threshold and the density parameters to obtain one cluster. If P is a core object, the reachable objects can be marked as the current class at once and the class is expanded from them. After a complete cluster is obtained, a new object is selected and the above process is repeated. If P is a boundary object, it is marked as noise and discarded.
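For example, the foreground pixels can be grouped into instances by running DBSCAN over their fused features, as sketched below; scikit-learn is assumed, and the eps and min_samples values are illustrative, not values taken from the embodiments.

```python
from sklearn.cluster import DBSCAN

def cluster_instances(foreground_fusion, eps=1.5, min_samples=20):
    """foreground_fusion: N x 10 array of fused features of the foreground pixels."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(foreground_fusion)
    return labels   # -1 marks noise; every other label is the pixel set of one instance category
```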
Specifically, in the embodiment of the present specification, an example polygon formed by a small number of points can be generated from each density-clustered example through a suitable vectorization algorithm, so the image corresponding to the example may be a polygonal structure. The triggering operation on the display interface may be a sliding, clicking, dragging or other operation performed by the user on the display interface; for example, the user may click "image preview" in the display interface, whereupon the image corresponding to each example in the image to be detected is constructed and displayed, with the examples in the image separated from each other.
In this embodiment, after the step of determining, by a density clustering algorithm, a set of pixels corresponding to each instance category, the method may further include:
sending a pixel set corresponding to each instance type to a terminal; and enabling the terminal to respond to the operation on the display interface, and constructing and displaying the image corresponding to each instance in the image to be detected.
Specifically, in the embodiment of the present specification, the examples in the image may be identified by using different colors, so that a user can distinguish different examples in the image conveniently; the terminal can comprise a map application program, and the map application program can respond to the operation on the display interface and construct and display the image corresponding to each instance in the image to be detected; therefore, the map information corresponding to the image to be detected is visually displayed to the user.
S2013: determining attribute information of the instance corresponding to each instance type according to the fusion features, the position offsets and the height information of the pixels in the pixel set corresponding to each instance type.
In this embodiment of the present specification, as shown in fig. 5, determining attribute information of the instance corresponding to each instance category according to the fusion feature, the position offset, and the height information of the pixels in the pixel set corresponding to each instance category includes:
S20131: determining the fusion feature of the instance corresponding to each instance category according to the fusion features of the pixels in the pixel set corresponding to each instance category.
S20133: determining the position offset of the instance corresponding to each instance category according to the position offsets of the pixels in the pixel set corresponding to each instance category.
In the embodiments of the present specification, the attribute information of the instance may include a fusion feature, a position offset, height information, and the like of the instance.
Specifically, in this embodiment of the present specification, determining, according to a position offset of a pixel in a pixel set corresponding to each instance type, a position offset of an instance corresponding to each instance type includes:
sorting the position offsets of the pixels in the pixel set corresponding to each instance type in ascending order;
and determining the median of the position offsets of the pixels as the position offset of the instance corresponding to each instance type.
S20135: determining the height information of the instance corresponding to each instance type according to the height information of the pixels in the pixel set corresponding to each instance type.
Specifically, in this embodiment of the present specification, determining the height information of the instance corresponding to each instance category according to the height information of the pixels in the pixel set corresponding to each instance category includes:
sorting the height information of the pixels in the pixel set corresponding to each instance category in ascending order;
and determining the median of the height information of the pixels as the height information of the instance corresponding to each instance category.
In the embodiment of the present specification, the median of the position offset in the pixel set corresponding to each instance may be used as the position offset of the instance, and the median of the height in the pixel set corresponding to each instance may be used as the height of the instance, so as to realize accurate prediction of the position offset and the height information of the instance.
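As an illustrative sketch (NumPy assumed), the per-instance offset and height can be taken as the medians over the pixels of the instance, and the base contour can then be obtained by shifting the roof contour by the median offset; the polygon shift is an illustration of how the base position can be derived from the top-contour offset described above.

```python
import numpy as np

def instance_attributes(pixel_offsets, pixel_heights, roof_polygon):
    """pixel_offsets: N x 2 (dx, dy) per pixel; pixel_heights: N values; roof_polygon: M x 2 vertices."""
    median_offset = np.median(pixel_offsets, axis=0)   # median of the sorted per-pixel offsets
    median_height = float(np.median(pixel_heights))    # median of the sorted per-pixel heights
    base_polygon = roof_polygon + median_offset        # base contour: roof contour shifted by the offset
    return median_offset, median_height, base_polygon
```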
In this embodiment of the present specification, after the step of determining the attribute information of the instance corresponding to each instance category, the method of this embodiment may further include:
constructing an image corresponding to each instance in the image to be detected according to the attribute information of the instance corresponding to each instance category.
In this embodiment of the present specification, after the step of determining attribute information of the corresponding instance of each instance category, the method of this embodiment may further include:
sending attribute information of the corresponding instance of each instance type to a terminal; and enabling the terminal to respond to the operation on the display interface, and constructing and displaying the image corresponding to each instance in the image to be detected.
Specifically, in the embodiment of the present specification, the examples in the image may be identified by using different colors, so that a user can distinguish different examples in the image conveniently; the terminal can comprise a map application program, and the map application program can respond to the operation on the display interface and construct and display the image corresponding to each instance in the image to be detected; therefore, the map information corresponding to the image to be detected is visually displayed to the user.
In a specific embodiment, the network framework corresponding to the method of the present application is divided into six parts, namely a feature extraction down-sampling part, a semantic feature extraction branch, an example feature extraction branch, a position offset prediction branch, a height prediction branch, and an example clustering part, as shown in fig. 6.
The model corresponding to the network framework is an example segmentation image determination model; in the application process, the image to be detected is directly input into the example segmentation image determination model, and the output example segmentation image can be obtained. Specifically, the downsampling processing network 04 first processes the image 03 to be detected to obtain the shared feature; the shared feature is then input into the example branch network 05, the semantic branch network 06, the first regression branch network 07 and the second regression branch network 08, respectively, to obtain an example feature map 09 and a semantic feature map 10; finally, an example pixel cluster map 11 is obtained according to the example feature map 09 and the semantic feature map 10, and the example segmentation image 12 is obtained according to the example pixel cluster map 11, the position offsets output by the first regression branch network 07 and the height information output by the second regression branch network 08.
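By way of illustration only, the overall inference flow of fig. 6 can be sketched as follows, using the hypothetical component names introduced in the earlier sketches; the clustering and median steps are summarized in comments rather than implemented here.

```python
def segment_instances(image, backbone, semantic_head, instance_head, offset_head, height_head):
    """End-to-end sketch: image 03 -> shared feature (04) -> four branches (05-08)."""
    shared = backbone(image)                  # downsampling processing network 04
    foreground_prob = semantic_head(shared)   # semantic branch network 06 -> semantic feature map 10
    embeddings = instance_head(shared)        # example branch network 05 -> example feature map 09
    offsets = offset_head(shared)             # first regression branch network 07
    heights = height_head(shared)             # second regression branch network 08
    # Downstream (not shown): mask the embeddings with the foreground, cluster the pixels per
    # instance (map 11), and take the median offset/height per cluster to build image 12.
    return foreground_prob, embeddings, offsets, heights
```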
In a specific embodiment, a schematic diagram of the prediction result for building roofs is shown in fig. 7, where the regions of the rectangular frames 13 in fig. 7 are the contours of the roofs, and a schematic diagram of the prediction result for building bases is shown in fig. 8, where the rectangular frames 14 in fig. 8 are the contours of the bases. When the method is used to test buildings in the top 50 cities nationwide, the roof prediction accuracy for conventional building regions reaches 97%, and the base prediction accuracy for conventional building regions reaches 95%. Compared with the existing manual labeling method, the efficiency of obtaining map data by the method of this embodiment is improved by a factor of 10.
According to the technical scheme provided by the embodiment of the specification, the embodiment of the specification performs downsampling processing on the image to be detected including the target number of examples to obtain shared characteristics, then respectively determines the position offset, the height information and the fusion characteristic information of each pixel in the image to be detected according to the shared characteristics, and finally determines the attribute information of each example; the method and the device realize accurate segmentation of the examples in the image, accurately predict the height and the position offset of each example and facilitate accurate drawing of the example map.
An embodiment of the present application further provides an apparatus for determining instance attribute information in an image, as shown in fig. 9, the apparatus includes:
an image to be detected acquisition module 910, configured to acquire an image to be detected, where the image to be detected includes instances of a target number;
a shared feature determining module 920, configured to perform downsampling on the image to be detected to obtain a shared feature;
a position offset determining module 930, configured to perform position offset prediction processing on the shared feature to obtain a position offset of each pixel in the image to be detected;
a height information determining module 940, configured to perform height prediction processing on the shared feature to obtain height information of each pixel in the image to be detected;
a fusion feature determining module 950, configured to determine a fusion feature of each pixel in the image to be detected according to the shared feature;
a pixel set determining module 960, configured to determine a pixel set corresponding to each instance type according to a fusion feature of each pixel in the image to be detected;
the attribute information determining module 970 is configured to determine attribute information of an instance corresponding to each instance type according to the fusion feature, the position offset, and the height information of the pixels in the pixel set corresponding to each instance type.
In some embodiments, the fused feature determination module may include:
the semantic feature determining unit of the pixel is used for performing semantic segmentation processing on the shared feature to obtain the semantic feature of each pixel in the image to be detected;
the pixel example feature determining unit is used for performing example analysis processing on the shared features to obtain example features of each pixel in the image to be detected;
and the fusion characteristic determining unit of the pixels is used for fusing the semantic characteristic and the example characteristic of each pixel in the image to be detected and determining the fusion characteristic of each pixel in the image to be detected.
In some embodiments, the apparatus may further comprise:
and the image construction module is used for constructing an image corresponding to each instance in the image to be detected according to the attribute information of the instance corresponding to each instance type.
In some embodiments, the attribute information determination module may include:
the instance fusion feature determining unit is used for determining the fusion feature of the instance corresponding to each instance type according to the fusion feature of the pixels in the pixel set corresponding to each instance type;
the example position offset determining unit is used for determining the position offset of the example corresponding to each example type according to the position offset of the pixels in the pixel set corresponding to each example type;
and the height information determining unit of the example is used for determining the height information of the example corresponding to each example type according to the height information of the pixels in the pixel set corresponding to each example type.
In some embodiments, the semantic feature determining unit of the pixel may include:
and the semantic feature determining subunit is used for performing semantic segmentation processing on the shared features through a semantic branch network to obtain the semantic features of each pixel in the image to be detected.
In some embodiments, the example feature determination unit of the pixel may include:
and the example feature determining subunit of the pixel is used for performing example analysis processing on the shared feature through an example branch network to obtain an example feature.
In some embodiments, the position offset determination module may include:
the position offset determining unit is used for performing position offset prediction processing on the shared feature through a first regression branch network to obtain the position offset of each pixel in the image to be detected.
In some embodiments, the height information determination module may include:
and the height information determining unit is used for performing height prediction processing on the shared characteristic through a second regression branch network to obtain the height information of each pixel in the image to be detected.
In some embodiments, the apparatus may further comprise:
the first function construction module is used for constructing a cross entropy loss function of the first network;
the second function building module is used for building an intra-class cohesion loss function and an inter-class discrimination loss function of the second network;
the third function construction module is used for constructing a square error loss function of a third network;
the fourth function construction module is used for constructing an absolute value loss function of a fourth network;
a comprehensive loss function determining module, configured to determine the sum of the cross entropy loss function, the intra-class cohesion loss function, the inter-class discrimination loss function, the squared error loss function, and the absolute value loss function as a comprehensive loss function;
a parameter adjusting module, configured to adjust parameters of the first network, the second network, the third network, and the fourth network, respectively, to obtain a current first network, a current second network, a current third network, and a current fourth network;
a comprehensive loss value calculating module, configured to calculate a comprehensive loss value corresponding to the current first network, the current second network, the current third network, and the current fourth network;
and the network determining module is used for determining the current first network as the semantic branch network, determining the current second network as the example branch network, determining the current third network as the first regression branch network and determining the current fourth network as the second regression branch network when the comprehensive loss value is smaller than a preset threshold value.
In some embodiments, the pixel set determination module may include:
the instance category determining unit is used for determining the instance category of each pixel in the image to be detected according to the fusion feature of each pixel in the image to be detected;
and the pixel set determining unit is used for determining the pixel set corresponding to each instance category through a density clustering algorithm.
The device embodiments and the method embodiments described above are based on the same inventive concept.
The embodiment of the application provides an apparatus for determining attribute information of an instance in an image, which includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the method for determining attribute information of an instance in an image provided by the above method embodiment.
Embodiments of the present application further provide a computer storage medium, where the storage medium may be disposed in a terminal to store at least one instruction or at least one program for implementing a method for determining example attribute information in an image in the method embodiments, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the method for determining example attribute information in an image provided in the method embodiments.
Alternatively, in the present specification embodiment, the storage medium may be located in at least one network server among a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program codes.
The memory of the embodiments of the present disclosure may be used to store software programs and modules, and the processor may execute various functional applications and data processing by operating the software programs and modules stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system, application programs needed by functions and the like; the storage data area may store data created according to use of the device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor access to the memory.
The embodiment of the method for determining the example attribute information in the image provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal, a server or a similar computing device. Taking the example of the application running on a server, fig. 10 is a hardware structure block diagram of the server of the method for determining the example attribute information in the image according to the embodiment of the present application. As shown in fig. 10, the server 1000 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1010 (the processor 1010 may include but is not limited to a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 1030 for storing data, and one or more storage media 1020 (e.g., one or more mass storage devices) for storing applications 1023 or data 1022. The memory 1030 and the storage media 1020 may be transient or persistent storage. The program stored in the storage medium 1020 may include one or more modules, and each module may include a series of instruction operations for the server. Still further, the central processor 1010 may be configured to communicate with the storage medium 1020 and execute a series of instruction operations in the storage medium 1020 on the server 1000. The server 1000 may also include one or more power supplies 1060, one or more wired or wireless network interfaces 1050, one or more input-output interfaces 1040, and/or one or more operating systems 1021, such as Windows Server (TM), Mac OS X (TM), Unix (TM), Linux (TM), FreeBSD (TM), and so forth.
The input-output interface 1040 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 1000. In one example, the input-output interface 1040 includes a network adapter (NIC) that may be connected to other network devices via a base station to communicate with the Internet. In one example, the input-output interface 1040 may be a Radio Frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
It will be understood by those skilled in the art that the structure shown in fig. 10 is merely illustrative and is not intended to limit the structure of the electronic device. For example, the server 1000 may also include more or fewer components than shown in fig. 10, or have a different configuration than shown in fig. 10.
The present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
According to the embodiment of the method, the device, the server or the storage medium for determining the example attribute information in the image, the shared feature is obtained by performing down-sampling processing on the image to be detected including the target number examples, then the position offset, the height information and the fusion feature information of each pixel in the image to be detected are respectively determined according to the shared feature, and finally the attribute information of each example is determined; the method and the device realize accurate segmentation of the examples in the image, accurately predict the height and the position offset of each example and facilitate accurate drawing of the example map.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, device, and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer storage medium, and the above storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.