CN118212461B - Model domain adaptation method, device, terminal and medium based on double-layer vision - Google Patents

Model domain adaptation method, device, terminal and medium based on double-layer vision

Info

Publication number
CN118212461B
Authority
CN
China
Prior art keywords
domain
model
layer
double
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410359260.1A
Other languages
Chinese (zh)
Other versions
CN118212461A (en)
Inventor
何志海
唐雨顺
陈烁硕
张毅
欧阳健
吴昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern University of Science and Technology
Priority to CN202410359260.1A
Publication of CN118212461A
Application granted
Publication of CN118212461B
Legal status: Active
Anticipated expiration

Abstract

The method comprises: acquiring a target-domain image and inputting it into a pre-trained source-domain model, to which a double-layer visual conditioning token has been added, the token being used to learn the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift; converting the target-domain image into image-patch vectors using an embedding layer of the source-domain model; inputting the patch vectors and the double-layer visual conditioning token into the encoder of the source-domain model, and updating the parameters of the token and of the model's normalization layers by backpropagation to obtain a target-domain model; and inputting the target-domain image into the target-domain model to obtain the corresponding image classification result. Because the invention updates the parameters of the double-layer visual conditioning token and the normalization layers by backpropagation in real time, the model is updated online during testing, which improves model-update efficiency in practical test scenarios.

Description

Model domain adaptation method, device, terminal and medium based on double-layer vision
Technical Field
The invention relates to the technical field of transfer learning, and in particular to a model domain adaptation method, device, terminal and medium based on double-layer vision.
Background
While deep neural networks have achieved great success in a variety of computer vision tasks, that success comes at the cost of large labeled training datasets and substantial computing resources. When the training and test samples come from different environments, performance tends to drop significantly; this is commonly referred to as the domain shift problem. Domain adaptation is a family of transfer-learning methods that aims to transfer a network model trained on a labeled dataset (the source domain) to a new unlabeled dataset (the target domain). By analyzing the stream of input samples at inference time, the source model can be adapted effectively during testing, alleviating the cross-domain performance degradation of deep neural networks.
Currently, unsupervised domain adaptation falls into two categories: source-available ("active") methods and source-free ("passive") methods. Source-available methods mainly minimize domain differences through statistical distribution matching or adversarial networks, but they must rely on the labeled source-domain data; in practical applications, the source-domain samples are often inaccessible due to data-privacy concerns or transmission difficulties. Source-free methods perform unsupervised transfer learning directly on the target domain without labeled source-domain data. However, existing source-free methods need access to the entire test set and multiple rounds of training; the model cannot be updated online in real time during testing, so update efficiency is low in practical test scenarios.
Accordingly, the prior art has drawbacks and needs to be improved and developed.
Disclosure of Invention
The technical problem addressed by the invention is, in view of the above defects of the prior art, to provide a model domain adaptation method, device, terminal and medium based on double-layer vision, aiming to solve the problems that prior-art source-free domain adaptation methods cannot update the model online in real time during testing and have low update efficiency in practical test scenarios.
The technical solution adopted to solve the above problem is as follows:
a model domain adaptation method based on double-layer vision, wherein the method comprises:
acquiring a target-domain image and inputting the target-domain image into a pre-trained source-domain model, wherein a double-layer visual conditioning token is added to the source-domain model, the token being used to learn the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift;
converting the target-domain image into image-patch vectors using an embedding layer of the source-domain model, inputting the patch vectors and the double-layer visual conditioning token into an encoder of the source-domain model, and updating the parameters of the token and of the normalization layers of the source-domain model by backpropagation to obtain a target-domain model;
and inputting the target-domain image into the target-domain model to obtain a corresponding image classification result.
In one implementation, the double-layer visual conditioning token comprises a domain-specific token for learning the long-term changes in domain-specific features and an instance-specific token for learning the local changes in instance-specific features caused by domain shift;
the domain-specific token is initialized with the class token pre-trained in the source-domain model, and the instance-specific token is initialized with a zero vector.
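As a minimal NumPy sketch of the initialization rule above (the function name and plain-array representation are illustrative choices, not from the patent itself):

```python
import numpy as np

def init_vct_tokens(pretrained_cls_token):
    """Initialize the double-layer visual conditioning token (VCT).

    The domain-specific token theta_L starts from the source model's
    pre-trained class token; the instance-specific token theta_S starts
    as a zero vector of the same shape.
    """
    theta_L = np.array(pretrained_cls_token, dtype=float).copy()
    theta_S = np.zeros_like(theta_L)
    return theta_L, theta_S
```

Copying (rather than aliasing) the class token keeps the pre-trained source weights untouched while the token copy is adapted.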
In one implementation, acquiring a target-domain image and inputting it into a pre-trained source-domain model comprises:
acquiring a target-domain dataset for model domain adaptation, the target-domain dataset comprising a plurality of target-domain images;
and inputting the target-domain images in the target-domain dataset into the pre-trained source-domain model in batches.
In one implementation, converting the target-domain image into image-patch vectors using the embedding layer of the source-domain model, inputting the patch vectors and the double-layer visual conditioning token into the encoder of the source-domain model, and updating the parameters of the token and of the normalization layers of the source-domain model by backpropagation to obtain the target-domain model comprises:
dividing each target-domain image of the current batch into a plurality of image patches using the embedding layer of the source-domain model, and converting each patch into a patch vector;
inputting the patch vectors, the domain-specific token and the instance-specific token into the encoder of the source-domain model, and using a preset information-entropy loss function to update, by backpropagation, the parameters of the domain-specific token, the instance-specific token and the normalization layers of the source-domain model to obtain the target-domain model.
In one implementation, the target-domain images of each batch are input into the pre-trained source-domain model simultaneously, and the source-domain model used for the current batch is the target-domain model updated by the previous batch.
In one implementation, gradient descent is used to update the parameters of the domain-specific token, the instance-specific token and the normalization layers of the source-domain model, and before the target-domain images of each batch are input, the parameters and gradients of the instance-specific token in the target-domain model obtained from the previous batch are reset to zero.
In one implementation, inputting the target-domain image into the target-domain model to obtain a corresponding image classification result comprises:
inputting all the target-domain images of the current batch into the target-domain model to obtain the image classification result corresponding to each target-domain image.
The invention further discloses a model domain adaptation device based on double-layer vision, wherein the device comprises:
an input module for acquiring a target-domain image and inputting it into a pre-trained source-domain model, wherein a double-layer visual conditioning token is added to the source-domain model, the token being used to learn the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift;
an updating module for converting the target-domain image into image-patch vectors using an embedding layer of the source-domain model, inputting the patch vectors and the double-layer visual conditioning token into an encoder of the source-domain model, and updating the parameters of the token and of the normalization layers of the source-domain model by backpropagation to obtain a target-domain model;
and a classification module for inputting the target-domain image into the target-domain model to obtain a corresponding image classification result.
The invention further discloses a terminal comprising a memory, a processor, and a double-layer-vision-based model domain adaptation program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the model domain adaptation method based on double-layer vision described above.
The invention further discloses a computer-readable storage medium storing a computer program which, when executed, implements the steps of the model domain adaptation method based on double-layer vision described above.
The invention provides a model domain adaptation method, device, terminal and medium based on double-layer vision. The method comprises: acquiring a target-domain image and inputting it into a pre-trained source-domain model to which a double-layer visual conditioning token has been added, the token learning the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift; converting the target-domain image into image-patch vectors using an embedding layer of the source-domain model; inputting the patch vectors and the token into the encoder of the source-domain model and updating the parameters of the token and of the model's normalization layers by backpropagation to obtain a target-domain model; and inputting the target-domain image into the target-domain model to obtain the corresponding image classification result. By adding the double-layer visual conditioning token to the source-domain model and updating its parameters together with the normalization layers by backpropagation in real time, the invention achieves real-time online model updating and improves model-update efficiency in practical test scenarios.
Drawings
FIG. 1 is a flowchart of a preferred embodiment of the model domain adaptation method based on double-layer vision in the present invention;
FIG. 2 is a logical schematic block diagram of a preferred embodiment of the model domain adaptation method based on double-layer vision of the present invention;
FIG. 3 is a functional block diagram of a preferred embodiment of the model domain adaptation device based on double-layer vision in accordance with the present invention;
FIG. 4 is a functional block diagram of a preferred embodiment of the terminal of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In transfer-learning image classification, performance often degrades severely when a model trained on a source domain is transferred to a target domain. Most existing domain adaptation methods require access to the source-domain training data, which is not feasible where data transmission is difficult or the data are privacy-sensitive. To address these data-transmission and privacy issues, source-free domain adaptation, which does not require access to source-domain data, has been proposed; however, current source-free methods still require access to the entire test set and multiple rounds of training. The embodiment of the application provides fully test-time adaptation: it only needs access to the real-time stream of test samples and can dynamically adapt the source model during testing. The key task of test-time adaptation is to learn domain-specific information and separate it from the input-sample features. In this work, a Vision Transformer (ViT) may be used as the backbone encoder for the image classification task. In ViT, a class token is trained to capture the semantic information of the source images. Owing to distribution shift, this pre-trained token may not generalize to target-domain images, because the semantic prior knowledge embedded in the class token is source-domain-specific and the pre-trained class token has no prior knowledge of the domain shift it encounters during testing.
The present invention finds that, during test-time adaptation, the class token of the first-layer encoder can be trained to learn domain-shift characteristics; the resulting token is referred to as the Visual Conditioning Token (VCT). Once the VCT is successfully learned, it can perform a conditioning operation on the input image to correct image perturbations caused by domain shift. As the domain-shift perturbations are removed from the image features, the network model becomes more robust to domain shift, significantly improving generalization performance. At the domain level, different distribution changes affect the overall model performance on the test dataset differently; furthermore, at the level of each individual image, the performance impact of the distribution shift varies from image to image. Therefore, to learn the visual conditioning token successfully, the invention proposes a two-layer learning method that characterizes both the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift.
The aim of the invention is to learn, through the double-layer Visual Conditioning Token, a fully test-time adaptive model: the class token of the first-layer encoder is trained to capture the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift, so as to counteract the drop in test-set accuracy caused by domain shift in transfer learning and to adapt the source model effectively during testing, thereby alleviating the cross-domain performance degradation of deep neural networks.
Referring to fig. 1, fig. 1 is a flowchart of the model domain adaptation method based on double-layer vision in the present invention. As shown in fig. 1, the method according to the embodiment of the invention comprises:
Step S100: acquiring a target-domain image and inputting it into a pre-trained source-domain model, wherein a double-layer visual conditioning token is added to the source-domain model, the token being used to learn the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift.
Specifically, because the class token can learn domain-specific information in a Transformer model, the invention uses the double-layer visual conditioning token to learn a fully test-time adaptive model, addressing the drop in test-set accuracy caused by domain shift in transfer learning. In Transformer-based image classification, the class token of the first-layer encoder can be trained, and the target-domain images are processed sequentially at inference time so that their domain-specific features are captured during test-time adaptation. The proposed two-layer learning method learns the long-term changes in domain-specific features while adapting to the local changes in instance-specific features, effectively adapting the source model during testing to alleviate cross-domain performance degradation.
In an embodiment of the application, the double-layer visual conditioning token comprises a domain-specific token for learning the long-term changes in domain-specific features and an instance-specific token for learning the local changes in instance-specific features caused by domain shift. The domain-specific token is initialized with the class token pre-trained in the source-domain model, and the instance-specific token is initialized with a zero vector.
Specifically, the source-domain model is first trained on the source domain and unsupervised domain adaptation is then performed on the target domain; any pre-trained Transformer model can serve as the source-domain model. The parameter $\theta_L$ of the domain-specific token is initialized with the pre-trained class token (the class token of the first-layer encoder), and the parameter $\theta_S$ of the instance-specific token is initialized with the zero vector. The update formulas are as follows:

$$\theta_L \leftarrow \theta_L - \eta_L \nabla_{\theta_L}\mathcal{L}(x;\theta_t), \qquad \theta_S \leftarrow \theta_S - \eta_S \nabla_{\theta_S}\mathcal{L}(x;\theta_t)$$

where $\eta_L$ denotes the learning rate of the domain-specific token, $\eta_S$ denotes the learning rate of the instance-specific token, $\nabla_{\theta_L}\mathcal{L}$ and $\nabla_{\theta_S}\mathcal{L}$ denote the gradients of the loss with respect to the two tokens, $\theta_t$ denotes the weight parameters of the whole model at time $t$, and $x$ denotes the input target-domain image.
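A hedged NumPy sketch of the gradient-descent update of the two tokens described above (computing the gradients themselves is left to the caller; the function name and default learning rates are illustrative):

```python
import numpy as np

def vct_update(theta_L, theta_S, grad_L, grad_S, eta_L=1e-3, eta_S=1e-3):
    """One gradient-descent step on the two VCT tokens.

    grad_L / grad_S are the gradients of the loss with respect to the
    domain-specific and instance-specific tokens at the current model
    weights; eta_L / eta_S are their respective learning rates.
    """
    theta_L = theta_L - eta_L * np.asarray(grad_L)
    theta_S = theta_S - eta_S * np.asarray(grad_S)
    return theta_L, theta_S
```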
In the embodiment of the present application, the step S100 specifically includes:
Step S110: acquiring a target-domain dataset for model domain adaptation, the target-domain dataset comprising a plurality of target-domain images;
Step S120: inputting the target-domain images in the target-domain dataset into the pre-trained source-domain model in batches.
Specifically, the method performs domain adaptation by accessing only the current batch of data at test-inference time. When the pre-trained Transformer model is migrated to a downstream test scenario, the pre-trained source-domain model receives its input in batches, so the domain-specific token, the instance-specific token and the normalization layers can be updated in real time to obtain the target-domain model, improving update efficiency. Existing technical solutions use the class token to learn neither the long-term changes in domain-specific features nor the local changes in instance-specific features caused by domain shift, and they can update the model only by accessing the entire target-domain data and training repeatedly, so their update efficiency is low.
As shown in fig. 1, the model domain adaptation method based on double-layer vision according to this embodiment further comprises:
Step S200: converting the target-domain image into image-patch vectors using an embedding layer of the source-domain model, inputting the patch vectors and the double-layer visual conditioning token into an encoder of the source-domain model, and updating the parameters of the token and of the normalization layers of the source-domain model by backpropagation to obtain the target-domain model.
By updating the double-layer visual conditioning token and the normalization layers of the source-domain model through backpropagation, the embodiment of the application significantly improves test-time domain adaptation performance over existing methods.
In the embodiment of the present application, the step S200 specifically includes:
Step S210: dividing each target-domain image of the current batch into a plurality of image patches using the embedding layer of the source-domain model, and converting each patch into a patch vector;
Step S220: inputting the patch vectors, the domain-specific token and the instance-specific token into the encoder of the source-domain model, and using a preset information-entropy loss function to update, by backpropagation, the parameters of the domain-specific token, the instance-specific token and the normalization layers of the source-domain model to obtain the target-domain model.
Referring to fig. 2, the target-domain image is divided into a fixed number of patches, each patch typically being a small square region. Each patch is converted to a vector representation by an embedding layer, typically a linear projection that maps the patch's pixel values to a fixed-dimensional vector. Position encodings are added to the embedded vector of each patch to capture its position in the original image. At the first-layer encoder, in addition to the patch embeddings, the invention adds the domain-specific token and the instance-specific token. The embedded and position-encoded patch vectors, together with the domain-specific and instance-specific tokens, are input into the Transformer encoder, which uses the self-attention mechanism to capture global relationships and dependencies between patches. A ViT model typically contains multiple Transformer encoder layers, each of which processes its input and passes it to the next layer, progressively extracting and integrating the image information. The output of the last Transformer encoder layer is fed into a multi-layer perceptron (MLP) for the classification task.
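The pipeline above (patch splitting, linear embedding, position encoding, and prepending the two tokens) can be sketched in NumPy as follows; the projection matrix and position encodings are stand-ins for the pre-trained ViT weights, and the function names are illustrative:

```python
import numpy as np

def patchify(img, patch):
    """Split an (H, W, C) image into non-overlapping flattened patches."""
    H, W, C = img.shape
    blocks = img.reshape(H // patch, patch, W // patch, patch, C)
    return blocks.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

def build_encoder_input(img, patch, w_embed, pos_embed, theta_L, theta_S):
    """Embed patches, add position encodings, prepend the two VCT tokens."""
    x = patchify(img, patch) @ w_embed + pos_embed   # (num_patches, dim)
    tokens = np.stack([theta_L, theta_S])            # (2, dim)
    return np.concatenate([tokens, x], axis=0)       # (2 + num_patches, dim)
```

The resulting sequence, with the two conditioning tokens at the front, is what the first-layer Transformer encoder would consume.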
After the target-domain images of the current batch are input, the method uses the information-entropy loss function of the output probability distribution to update, by backpropagation, the domain-specific token, the instance-specific token and the normalization layers (Layer Normalization layers) of the source-domain model, so the model is dynamically fine-tuned by accessing only the current batch of data, improving its cross-domain robustness.
The information-entropy loss function is as follows:

$$\mathcal{L}(B_j) = \frac{1}{|B_j|} \sum_{x \in B_j} \mathbb{1}\{E(f(x)) < E_0\}\, E(f(x))$$

where $E_0$ is an information-entropy threshold used to screen out excessively noisy samples, $E$ is the information-entropy function $E(p) = -\sum_c p_c \log p_c$ applied to the output probability distribution $f(x)$, and $B_j$ denotes a batch of test samples. When the current batch contains multiple images, the loss is thus the mean of the information-entropy losses of the individual images.
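A small NumPy sketch of this filtered, batch-averaged entropy objective (the indicator-style filtering follows the threshold description above; the exact filtering rule and function names are assumptions for illustration):

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of a (batch, classes) probability array."""
    p = np.asarray(probs, dtype=float)
    return -np.sum(p * np.log(p + eps), axis=-1)

def filtered_entropy_loss(probs, e0):
    """Mean entropy over the batch, keeping only samples below threshold e0.

    Samples whose predictive entropy exceeds e0 are screened out as too
    noisy and contribute nothing to the loss.
    """
    ent = entropy(probs)
    keep = ent < e0
    return float(ent[keep].mean()) if keep.any() else 0.0
```

Minimizing this loss sharpens the model's predictions on the reliable samples of the current batch while ignoring the noisy ones.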
In one embodiment of the application, the target-domain images of each batch are input into the pre-trained source-domain model simultaneously, and the source-domain model used for the current batch is the target-domain model updated by the previous batch.
Unlike existing domain adaptation methods, the invention needs neither access to source-domain data nor repeated iterative training over the entire target-domain test set; it performs domain adaptation by accessing only the current batch of data at test-inference time, dynamically fine-tuning the model and improving its cross-domain robustness.
The invention learns a fully test-time adaptive model through the double-layer visual conditioning token: the class token of the first-layer encoder is trained to capture the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift, counteracting the drop in test-set accuracy caused by domain shift in transfer learning and effectively adapting the source model during testing to alleviate cross-domain performance degradation.
In one embodiment of the application, gradient descent is used when updating the parameters of the domain-specific token, the instance-specific token and the normalization layers of the source-domain model, and before the target-domain images of each batch are input, the parameters and gradients of the instance-specific token in the target-domain model obtained from the previous batch are reset to zero.
Specifically, given a batch of test samples Bj during testing, gradient descent is used to update the parameter θL of the domain-specific token (Domain-specific Token), the parameter θS of the instance-specific token (Instance-specific Token), and the parameters of the normalization layers, and the prediction results for the current batch are output after the update. The parameter θS and its gradient are set to zero after each batch's predictions. That is, the domain-specific token and the normalization layers are updated cumulatively across batches, while the instance-specific token is reset to zero after each batch update, because the characteristics of different batches differ.
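The per-batch protocol just described (cumulative domain-level updates, instance token re-zeroed before every batch) can be sketched as follows; `grad_fn` stands in for backpropagation through the model and entropy loss, and the names are illustrative:

```python
import numpy as np

def adapt_over_batches(batches, theta_L, grad_fn, eta_L=0.1, eta_S=0.1):
    """Streaming test-time adaptation over successive batches.

    theta_L (domain-specific token) accumulates across batches;
    theta_S (instance-specific token) is reset to zero before each batch.
    grad_fn(batch, theta_L, theta_S) -> (grad_L, grad_S) supplies the
    gradients of the entropy loss with respect to the two tokens.
    """
    for batch in batches:
        theta_S = np.zeros_like(theta_L)     # zero parameters and gradient
        grad_L, grad_S = grad_fn(batch, theta_L, theta_S)
        theta_L = theta_L - eta_L * grad_L   # cumulative, long-term update
        theta_S = theta_S - eta_S * grad_S   # local, discarded next batch
    return theta_L
```

Only `theta_L` survives the loop, mirroring the claim that domain-level knowledge is carried forward while instance-level corrections are transient.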
As shown in fig. 1, the model domain adaptation method based on double-layer vision according to this embodiment further comprises:
Step S300: inputting the target-domain image into the target-domain model to obtain a corresponding image classification result.
In one embodiment of the application, step S300 specifically comprises inputting all the target-domain images of the current batch into the target-domain model to obtain the image classification result corresponding to each target-domain image.
Specifically, the method can update the source-domain model by accessing only one batch of data, achieving real-time updating without repeated iterative training over all the test data, which improves update efficiency.
In an embodiment, as shown in fig. 3, based on the above model domain adaptation method based on double-layer vision, the invention further provides a corresponding model domain adaptation device based on double-layer vision, comprising:
an input module 100 for acquiring a target-domain image and inputting it into a pre-trained source-domain model, wherein a double-layer visual conditioning token is added to the source-domain model, the token being used to learn the long-term changes in domain-specific features and the local changes in instance-specific features caused by domain shift;
an updating module 200 for converting the target-domain image into image-patch vectors using an embedding layer of the source-domain model, inputting the patch vectors and the double-layer visual conditioning token into an encoder of the source-domain model, and updating the parameters of the token and of the normalization layers of the source-domain model by backpropagation to obtain a target-domain model;
and a classification module 300 for inputting the target-domain image into the target-domain model to obtain a corresponding image classification result.
Fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal may include:
memory 501, processor 502, and a computer program stored on memory 501 and executable on processor 502.
The processor 502, when executing the program, implements the model domain adaptation method based on double-layer vision provided in the above embodiments.
Further, the terminal further includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
Memory 501 for storing a computer program executable on processor 502.
The memory 501 may include high-speed RAM memory and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 501, the processor 502 and the communication interface 503 are implemented independently, they may be connected to one another via a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one line is shown in the figures, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on a chip, the memory 501, the processor 502, and the communication interface 503 may perform communication with each other through internal interfaces.
The processor 502 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model-domain adaptation method based on dual-layer vision as above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially concurrent manner or in the reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (an electronic device) having one or more wires, a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program may be electronically captured, for example via optical scanning of the paper or other medium, and then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with appropriate combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the above-described method embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application; changes, modifications, substitutions, and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.
In summary, the method comprises: obtaining a target domain image and inputting it into a pre-trained source domain model, in which a double-layer visual condition mark has been added; the double-layer visual condition mark is used to learn long-term changes of domain-specific features and local, instance-specific changes of domain-shifted samples. The target domain image is converted into image block (patch) vectors by the embedding layer of the source domain model, and the patch vectors together with the double-layer visual condition mark are input into the encoder of the source domain model. The parameters of the double-layer visual condition mark and of the normalization layers of the source domain model are updated by back propagation to obtain a target domain model, and the target domain image is input into the target domain model to obtain the corresponding image classification result. Because only the double-layer visual condition mark and the normalization-layer parameters are updated in real time by back propagation, the model is updated online in real time, which improves model updating efficiency in actual test scenarios.
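The update loop summarized above can be sketched in a few lines. The tiny encoder below, the entropy-minimization objective, and all names (`TinyEncoder`, `adapt_step`) are illustrative assumptions for a ViT-style model, not the patent's actual implementation; the sketch only demonstrates the key idea that back propagation updates the added condition tokens and the normalization-layer parameters while every other weight stays frozen.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Minimal stand-in for a ViT-style source model with added condition tokens."""
    def __init__(self, dim=16, patch_dim=8, n_classes=3, n_cond=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)                         # patch embedding layer
        self.cond_tokens = nn.Parameter(torch.randn(1, n_cond, dim) * 0.02)  # condition tokens (assumed form)
        self.norm = nn.LayerNorm(dim)                                  # normalization layer, adapted online
        self.head = nn.Linear(dim, n_classes)                          # classifier head, kept frozen

    def forward(self, patches):                                        # patches: (B, n_patches, patch_dim)
        x = self.embed(patches)
        cond = self.cond_tokens.expand(x.size(0), -1, -1)
        x = torch.cat([cond, x], dim=1)                                # prepend condition tokens
        x = self.norm(x).mean(dim=1)                                   # pooled features
        return self.head(x)

def adapt_step(model, batch, lr=1e-2):
    """One online update: backpropagate an unsupervised loss (here entropy,
    an assumed choice) into the condition tokens and LayerNorm only."""
    for p in model.parameters():
        p.requires_grad_(False)                                        # freeze everything...
    trainable = [model.cond_tokens] + list(model.norm.parameters())
    for p in trainable:
        p.requires_grad_(True)                                         # ...except tokens + norm
    opt = torch.optim.SGD(trainable, lr=lr)
    probs = model(batch).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
    return entropy.item()
```

A deployment would call `adapt_step` once per incoming test batch and then reuse the same model for prediction; this is what makes the update online and lightweight, since only a small fraction of the parameters receive gradients.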
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (8)

Application: CN202410359260.1A | Priority date: 2024-03-27 | Filing date: 2024-03-27 | Status: Active | Granted publication: CN118212461B (en)
Title: Model field self-adaption method, device, terminal and medium based on double-layer vision

Priority Applications (1)

Application Number: CN202410359260.1A (granted as CN118212461B) | Priority Date: 2024-03-27 | Filing Date: 2024-03-27
Title: Model field self-adaption method, device, terminal and medium based on double-layer vision


Publications (2)

Publication Number | Publication Date
CN118212461A (en) | 2024-06-18
CN118212461B (en) | 2025-06-27

Family

ID=91451654

Family Applications (1)

Application Number: CN202410359260.1A | Status: Active | Granted publication: CN118212461B (en) | Priority Date: 2024-03-27 | Filing Date: 2024-03-27
Title: Model field self-adaption method, device, terminal and medium based on double-layer vision

Country Status (1)

Country | Link
CN (1) | CN118212461B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112861616A (en) * | 2020-12-31 | 2021-05-28 | University of Electronic Science and Technology of China | Passive field self-adaptive target detection method
CN117115567A (en) * | 2023-10-23 | 2023-11-24 | Southern University of Science and Technology | Domain generalization image classification method, system, terminal and medium based on feature adjustment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11574198B2 (en) * | 2019-12-12 | 2023-02-07 | Samsung Electronics Co., Ltd. | Apparatus and method with neural network implementation of domain adaptation
WO2021133458A1 (en) * | 2019-12-23 | 2021-07-01 | HRL Laboratories, LLC | Systems and methods for unsupervised continual learning
CN112183788B (en) * | 2020-11-30 | 2021-03-30 | South China University of Technology | Domain adaptive equipment operation detection system and method
US12067081B2 (en) * | 2021-08-10 | 2024-08-20 | Kwai Inc. | Transferable vision transformer for unsupervised domain adaptation


Also Published As

Publication number | Publication date
CN118212461A (en) | 2024-06-18

Similar Documents

Publication | Title
WO2021135254A1 (en) | License plate number recognition method and apparatus, electronic device, and storage medium
CN109840530A (en) | The method and apparatus of training multi-tag disaggregated model
CN113095370A (en) | Image recognition method and device, electronic equipment and storage medium
CN109840531A (en) | The method and apparatus of training multi-tag disaggregated model
JP2010157118A (en) | Pattern identification device and learning method for the same and computer program
CN116206174B (en) | Pseudo-label construction method, device, equipment and medium for model training
CN116363538B (en) | Bridge detection method and system based on unmanned aerial vehicle
WO2020250236A1 (en) | Understanding deep learning models
CN113822144A (en) | Target detection method and device, computer equipment and storage medium
CN116402122A (en) | Neural network training method and device, readable storage medium and chip
CN115860068A (en) | Task processing method, device, equipment and storage medium based on model quantification
CN117218477A (en) | Image recognition and model training method, device, equipment and storage medium
CN115170565A (en) | Image fraud detection method and device based on automatic neural network architecture search
CN118212461B (en) | Model field self-adaption method, device, terminal and medium based on double-layer vision
CN110135428A (en) | Image segmentation processing method and device
CN113435525A (en) | Classification network training method and device, computer equipment and storage medium
CN113378707A (en) | Object identification method and device
CN113254645A (en) | Text classification method and device, computer equipment and readable storage medium
CN118428407A (en) | Incremental target detection method, device, equipment and medium based on knowledge distillation
CN116523888B (en) | Pavement crack detection method, device, equipment and medium
CN119152255B (en) | A test-time adaptive method and device based on dual-path adversarial boosting denoising
CN113591771B (en) | Training method and equipment for object detection model of multi-scene distribution room
CN117218467A (en) | Model training method and related device
CN117853832A (en) | Image processing model training method, device, equipment and storage medium
KR20230138645A (en) | Vehicle number self-learning system based on unsupervised learning

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
