CN116152645A - Indoor scene visual recognition method and system integrating multiple characterization balance strategies - Google Patents

Indoor scene visual recognition method and system integrating multiple characterization balance strategies

Info

Publication number
CN116152645A
CN116152645A
Authority
CN
China
Prior art keywords
training
model
class
samples
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310157638.5A
Other languages
Chinese (zh)
Inventor
张宁
董乐
赵浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202310157638.5A
Publication of CN116152645A
Legal status: Pending

Abstract

The invention discloses an indoor scene visual recognition method and system that integrates multiple representation balancing strategies. The method includes: using a warmed-up model to compute the class center of each class in a long-tailed training set; constructing multiple training subsets with different feature distributions through different resampling strategies; training the warmed-up model on these subsets with a custom loss function until the loss converges, so that the model tends to learn features that are balanced across the training subsets, thereby handling the feature imbalance within each class of the training set; and applying a regularization term to the classifier to adjust the weight difference between head and tail classes, obtaining the trained model once the loss function has sufficiently converged and reducing the per-class weight imbalance on the classifier caused by uneven class sample counts in the training set. The invention simultaneously addresses the problems that class-level sample imbalance and the imbalance of non-class attributes within a class bring to model training.

Description

A Method and System for Indoor Scene Visual Recognition Integrating Multiple Representation Balancing Strategies

Technical Field

The invention relates to the fields of long-tailed visual recognition and balanced representation learning, and in particular to an indoor scene visual recognition method and system that integrates multiple representation balancing strategies.

Background Art

Long-tailed visual recognition is one of the most challenging and critical problems in computer vision, because any naturally collected dataset suffers, to a greater or lesser degree, from the imbalance of a long-tailed distribution; these hidden problems are often overlooked and introduce hard-to-explain effects into model training. Previous research has focused on the imbalance between classes, which has largely been handled manually in most computer vision tasks: most publicly released datasets undergo manual class balancing after natural collection, so unless one works specifically on long-tailed problems, most public datasets are class-balanced. This does not make the class rebalancing problem meaningless, since progress here reduces the need for manual class balancing after natural data collection and saves the time and labor that step requires. Beyond class balance, another family of problems has received neither attention nor a solution: long-tailed imbalance within a class. Common symptoms include: why do samples within the same class perform inconsistently and follow a long-tailed distribution, and why are some tail-class samples predicted, in visual recognition tasks, as head classes that share similar attributes?

In existing long-tailed visual recognition tasks, and especially in indoor scenes (such as classrooms, canteens, and shopping malls), the distribution of samples across object categories in indoor spaces follows a long-tailed distribution, so a model trained directly on the collected training set cannot reach a distribution close to that of a class-balanced test set over the same categories. High class richness of the samples means that potential confounders can be avoided; conversely, low richness makes the model more susceptible to confounders. The present method focuses on using long-tailed data collected in confined spaces such as indoor environments to train more robust and better-balanced features for visual recognition.

The long-tailed visual recognition task aims to improve the performance of a model trained on a given long-tailed training set under class-balanced evaluation. The most obvious confounder in a long-tailed dataset is the "class", so the "class" is deconfounded first. Our concern is how to use imbalanced data effectively, reduce data collection cost, and train a more balanced model. Rebalancing methods for "inter-class imbalance" fall broadly into four categories. The first is resampling on the training set, for example down-sampling head classes and up-sampling tail classes; this raises data-utilization problems, since down-sampling head classes leaves part of the data underused, while up-sampling tail classes makes the new sample distribution deviate from the original one. The second is reweighting, i.e. modifying the loss function during training; thanks to the flexibility and simplicity of loss computation, this method is applied to many tasks that require complex modeling. The third is transfer learning, which exploits the imbalance of the long-tailed distribution by first learning the head-class samples thoroughly and then transferring the learned knowledge to feature learning for the tail classes, for example using head-class distribution information to augment tail-class samples; such models are often complex. The fourth is model ensembling, which aims to improve head-class and tail-class performance simultaneously through multiple sub-models.

In addition, most existing work targets the inter-class imbalance problem; the usual remedy is to train a deliberately inter-class-imbalanced classifier that raises the confidence of tail classes and suppresses that of head classes, thereby "correcting" the model's tendency to predict tail-class samples as head classes and achieving balance between head and tail classes. However, even when the number of samples per class is balanced, samples within the same class can still be imbalanced, for example because their features are unevenly distributed, which degrades the model's recognition performance.

Summary of the Invention

The purpose of the present invention is to overcome the balanced-representation-learning problem in long-tailed training sets collected from indoor scenes, and in particular the previously overlooked intra-class imbalance problem, by providing an indoor scene visual recognition method and system that integrates multiple representation balancing strategies.

The purpose of the present invention is achieved through the following technical solutions:

In a first aspect, an indoor scene visual recognition method integrating multiple representation balancing strategies is provided, the method comprising:

S1. Sample a long-tailed training set.

S2. Warm up the model and define a custom loss function.

S3. Use the warmed-up model to compute the class center of each class in the long-tailed training set.

S4. Construct multiple training subsets with different feature distributions through different resampling strategies.

S5. Combined with the custom loss function, train the warmed-up model on the training subsets until the loss function converges, so that the model tends to learn features that are balanced across the training subsets.

S6. Apply a regularization term to the classifier of the model trained in step S5 to adjust the weight difference between head and tail classes; once the loss function has converged to a sufficient degree, the trained model is obtained.

S7. Use the trained model to perform visual recognition.

As a preferred option of the indoor scene visual recognition method integrating multiple representation balancing strategies, constructing multiple training subsets with different feature distributions through different resampling strategies includes:

applying a different resampling scheme to each class of the long-tailed training set to obtain multiple new small subsets, and then gathering all small subsets produced with the same resampling scheme into one set, yielding multiple large training subsets.

As a preferred option, the different resampling schemes include:

one scheme that assigns the same weight to every sample in each class for sampling, and another that assigns weights to the samples in each class according to the 80/20 rule for sampling.

As a preferred option, sampling the samples in each class with weights assigned according to the 80/20 rule includes:

up-sampling part of the samples in the current class until the twenty percent of samples with the lowest prediction confidence in that class make up eighty percent of the original number of samples in the set, while down-sampling the remaining samples of the current class to twenty percent of the original number.

As a preferred option, the up-sampling method is MixUp data augmentation.

As a preferred option, training the warmed-up model on the training subsets until the loss function converges includes:

periodically repeating step S4 to reconstruct the training subsets and training the model on the reconstructed training subsets.

As a preferred option, "periodically" means repeating once every 20 epochs.

As a preferred option, the class center of each class is updated before each repetition.

As a preferred option, step S6 includes:

randomly initializing the parameters of the model's classifier and periodically fine-tuning the classifier alone on the training subsets obtained by resampling.

In a second aspect, an indoor scene visual recognition system integrating multiple representation balancing strategies is provided, the system comprising:

a data acquisition module for sampling a long-tailed training set;

a model warm-up module for warming up the model and defining a custom loss function;

a class-center computation module that uses the warmed-up model to compute the class center of each class in the long-tailed training set;

a training-subset construction module that constructs multiple training subsets with different feature distributions through different resampling strategies;

an intra-class balance training module that, combined with the custom loss function, trains the warmed-up model on the training subsets until the loss function converges, so that the model tends to learn features that are balanced across the training subsets;

an inter-class balance training module that applies a regularization term to the classifier of the model obtained by the intra-class balance training module to adjust the weight difference between head and tail classes, and obtains the trained model once the loss function has converged to a sufficient degree; and

a recognition module that uses the trained model to perform visual recognition.

It should be further noted that the technical features corresponding to the above options may, provided they do not conflict, be combined with or substituted for one another to form new technical solutions.

Compared with the prior art, the beneficial effects of the present invention are:

(1) The present invention constructs multiple training subsets with different feature distributions through different resampling strategies and then, combined with a custom center loss function, trains the warmed-up model on these subsets until the loss function converges. This drives the model toward features that are balanced across the training subsets and toward unbiased intra-class representations, solving the previously neglected intra-class shift problem. At the same time, a regularization term is applied to the classifier to adjust the weight difference between head and tail classes; once the loss function has converged to a sufficient degree, the trained model is obtained, reducing the per-class weight imbalance on the classifier caused by uneven class sample counts in the training set. The invention thus simultaneously addresses the problems that class-level sample imbalance and the imbalance of non-class attributes within a class bring to model training. Building on previous inter-class balancing work, it further accounts for the effect that intra-class long tails of "non-class factors" such as background, pose, and viewpoint have on visual recognition results; the two balancing mechanisms complement each other and give the model better overall performance.

(2) The "intra-class long tail" explicitly identified and modeled by the present invention explains what earlier work did not: why performance within the same class follows a long-tailed distribution, and why some tail-class samples of a long-tailed training set are predicted as head classes that share similar "non-class attributes". It also improves on earlier work that traded accuracy against precision without improving both at once.

(3) The "intra-class shift" balancing method of the present invention is non-invasive and can be combined with many methods such as cRT, LWS, Balanced Softmax, and BBN. The method can be embedded directly into an existing long-tailed recognition model without altering the model's original structure, adding the ability to balance intra-class shift and achieving complementarity.

Brief Description of the Drawings

Fig. 1 is a flowchart of an indoor scene visual recognition method integrating multiple representation balancing strategies according to an embodiment of the present invention;

Fig. 2 illustrates the long-tailed distribution of samples across classes and the long-tailed distribution of attributes within a class according to an embodiment of the present invention;

Fig. 3 is a structural causal graph of the new way of modeling the long-tailed visual recognition problem according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the overall framework according to an embodiment of the present invention.

Detailed Description of Embodiments

The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

In addition, the technical features involved in the different embodiments of the present invention described below may be combined with one another provided they do not conflict.

In an exemplary embodiment, an indoor scene visual recognition method integrating multiple representation balancing strategies is provided; as shown in Fig. 1, the method comprises:

S1. Sample a long-tailed training set.

S2. Warm up the model and define a custom loss function.

S3. Use the warmed-up model to compute the class center of each class in the long-tailed training set.

S4. Construct multiple training subsets with different feature distributions through different resampling strategies.

S5. Combined with the custom loss function, train the warmed-up model on the training subsets until the loss function converges, so that the model tends to learn features that are balanced across the training subsets.

S6. Apply a regularization term to the classifier of the model trained in step S5 to adjust the weight difference between head and tail classes; once the loss function has converged to a sufficient degree, the trained model is obtained.

S7. Use the trained model to perform visual recognition.

Specifically, the success of existing long-tailed visual recognition balancing methods comes mainly from enlarging the confidence boundary of the tail classes to include more tail-class samples and thereby raise tail-class accuracy. This is in effect a trade-off between accuracy and precision, because the features that cause the confusion receive no attention and the model is never genuinely guided to ignore them. Accuracy is defined as

Accuracy = #CorrectPredictions / #AllSamples,

and precision is defined as

Precision = ( Σ over classes k of #CorrectPredictionsOfClass_k / #SamplesPredictionAsThisClass_k ) / #class,

where #AllSamples is the total number of image samples in the dataset and #CorrectPredictions is the number of image samples whose class the model predicts correctly; accuracy is simply the fraction of the n images whose class the model predicts correctly. #SamplesPredictionAsThisClass is the number of image samples the model predicts as a given class: if the model predicts 10 of 100 images as tiger and 12 as mouse, then #SamplesPredictionAsThisClass is 10 for the tiger class and 12 for the mouse class. Precision, also called the positive predictive value, is the fraction of the samples predicted as a given class that are predicted correctly; it is computed per class, summed, and divided by the number of classes. #class is the total number of classes in the dataset: if, among the 100 images above, the model predicts 10 as tiger but only 6 of them are actually tigers, the precision for the tiger class is 6/10.
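
As a concrete illustration of these two metrics (a minimal sketch, not part of the disclosure), the following Python computes accuracy and the per-class-averaged precision from predicted and true labels, using the tiger/mouse example from the text.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """#CorrectPredictions / #AllSamples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def macro_precision(y_true, y_pred, num_classes):
    """Average over classes of (#correct predictions of class k) / (#samples predicted as class k)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = []
    for k in range(num_classes):
        predicted_as_k = (y_pred == k)
        if predicted_as_k.sum() == 0:
            per_class.append(0.0)          # no sample predicted as class k
            continue
        correct_k = np.logical_and(predicted_as_k, y_true == k).sum()
        per_class.append(correct_k / predicted_as_k.sum())
    return float(np.mean(per_class))

# Toy example mirroring the text: 10 images predicted as "tiger" (class 0), 6 of them correct;
# 12 images predicted as "mouse" (class 1), 10 of them correct.
y_pred = [0] * 10 + [1] * 12
y_true = [0] * 6 + [1] * 4 + [1] * 10 + [0] * 2
print(accuracy(y_true, y_pred))            # 16/22 of the predictions are correct
print(macro_precision(y_true, y_pred, 2))  # mean of 6/10 and 10/12
```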

Because head and tail classes have different numbers of samples, the classifier's head-class and tail-class weights differ, which leads to different decision boundaries for head and tail classes and ultimately to an imbalance in how head and tail classes behave in the model. To address this, the present invention applies a regularization term to the model's classifier to reduce the difference between head-class and tail-class weights and alleviate the inter-class imbalance. Beyond that, a point neglected by previously popular methods is the imbalance within a class. The present invention constructs multiple training subsets with different feature distributions through different resampling strategies and then, combined with a custom center loss function, trains the warmed-up model on these subsets until the loss function converges, so that the model tends to learn features that are balanced across the training subsets and thus unbiased intra-class representations, solving the previously neglected intra-class shift problem. The invention thus simultaneously addresses the problems that class-level sample imbalance and the imbalance of non-class attributes within a class bring to model training; building on previous inter-class balancing work, it further accounts for the effect that intra-class long tails of "non-class factors" such as background, pose, and viewpoint have on visual recognition results, and the two balancing mechanisms complement each other, giving the model better overall performance.
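
The disclosure does not spell out the exact form of the regularization term applied to the classifier; one plausible and commonly used choice, shown in the sketch below purely as an assumption, is to penalize the spread of the per-class weight norms so that head- and tail-class weight vectors end up with comparable magnitudes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_balance_penalty(classifier: nn.Linear) -> torch.Tensor:
    """Penalize the variance of per-class weight norms, pushing head- and tail-class
    weight vectors toward comparable magnitudes (an assumed regularizer, not the patented one)."""
    norms = classifier.weight.norm(dim=1)          # one norm per class
    return ((norms - norms.mean()) ** 2).mean()

# Usage sketch with dummy data.
feature_dim, num_classes = 128, 10
classifier = nn.Linear(feature_dim, num_classes)
features = torch.randn(32, feature_dim)            # features produced by the backbone
labels = torch.randint(0, num_classes, (32,))

logits = classifier(features)
lam = 0.1                                           # regularization strength (assumed hyper-parameter)
loss = F.cross_entropy(logits, labels) + lam * weight_balance_penalty(classifier)
loss.backward()
```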

In one example, constructing multiple training subsets with different feature distributions through different resampling strategies includes:

applying a different resampling scheme to each class of the long-tailed training set to obtain multiple new small subsets, and then gathering all small subsets produced with the same resampling scheme into one set, yielding multiple large training subsets.

Specifically, a sample X can be represented as "class information" plus "a set of attributes". That is, a sample X is represented by two latent feature vectors, Zc and Za. Zc is the invariant class feature, which can be understood as the template or prototype of the class, while Za consists of attribute features whose distribution changes with the domain, such as texture, pose, background, and illumination. This gives the visual recognition problem on long-tailed datasets a new modeling formulation that explains both class bias and attribute bias. The structural causal model of this formulation is shown in Figs. 2 and 3: Zc is the class prototype, i.e. a given class Y has a corresponding class prototype Zc. Here Zc is defined as a binary vector containing the components of Y; for example, for Y = person, Zc is [head=1, torso=1, arms=1, legs=1, other=0]. This also accommodates finer-grained classes: for Y = minotaur, Zc can be [head=1, torso=1, arms=1, legs=1, horns=1, other=0], rather than an unrelated one-hot vector. Zc has a corresponding set of attributes Za; for example, "hair" may take values such as "long hair" or "short hair". The attributes Za are in turn affected by external, non-class noise ε. A concrete object image X is influenced jointly by the class template Zc and the corresponding attribute set Za.

The visual recognition task can first be viewed as estimating P(y|x); Zc is shared by all samples within a class, and differences in Za cause the differences in behavior between samples of the same class. Following Fig. 3, the model can be written as the factorization

p(y | x) = p(y | Zc, Za) = [p(Zc | y) / p(Zc)] · [p(Za | y, Zc) / p(Za | Zc)] · p(y),

whose factors, from left to right, are the class template, the intra-class attribute shift, and the inter-class shift; together these three cause the differing behavior of different samples. Consider first the differences within a class: although Zc is shared by the samples of a class, differences in Za still make some samples of the same class easier or harder to recognize. For example, green bananas are in the tail among bananas, so a green banana becomes a hard sample, as shown in Fig. 2. Beyond that, the intra-class attribute shift also explains why samples are misclassified: green is very common among loofahs, so a spurious correlation may arise between "green" and "gourd", giving a green banana a high probability of being classified as "loofah".

1) The performance difference between samples can be explained as follows. Because yellow is far more common than green among bananas,

p(Za=yellow | Y=banana, Zc=color) / p(Za=yellow | Zc=color) > p(Za=green | Y=banana, Zc=color) / p(Za=green | Zc=color),

and therefore

p(Y=banana | Zc=color, Za=green) < p(Y=banana | Zc=color, Za=yellow),

i.e. the green banana is the lower-confidence, harder sample.

2) That samples of one class form spurious associations with other classes (and are misclassified into them) can be explained as follows: if there are too many green loofah samples, then the attribute-shift ratio

p(Za=green | Y=loofah, Zc=color) / p(Za=green | Zc=color)

is far greater than 1, so the value of p(Y=loofah | Zc=color, Za=green) becomes very large. That is, if a banana sample happens to be green, it is very likely to be classified as loofah, and the fact that green bananas are themselves in the tail further aggravates this phenomenon.

The above fully explains how intra-class shift arises and how it affects the visual recognition task. To counter intra-class shift, the overall architecture of the present invention is shown in Fig. 4; with reference to Fig. 4, this embodiment describes the concrete visual recognition procedure:

First, a long-tailed training set {(x, y)} is sampled, where x is a sample and y is its class label; the sample images are resized to 112×112.

Then the backbone f(·; θ) (feature extractor) and the classifier g(·; ω) are warm-up trained with the cross-entropy loss,

θ, ω ∈ argmin_{θ,ω} Lcls(f(x; θ), y; ω),

where θ and ω are the learnable parameters of the backbone network and the classifier respectively and Lcls is the cross entropy; the warm-up lasts 60 epochs, the optimizer is SGD, and the batch size is 256.
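
A minimal PyTorch sketch of this warm-up stage (cross entropy, SGD, batch size 256, 60 epochs). The toy backbone and the random tensors standing in for the long-tailed training set are placeholders; the disclosure does not fix a particular architecture.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder long-tailed training set: 112x112 RGB images with integer class labels.
images = torch.randn(1024, 3, 112, 112)
labels = torch.randint(0, 20, (1024,))
loader = DataLoader(TensorDataset(images, labels), batch_size=256, shuffle=True)

# Backbone f(.; theta) and classifier g(.; omega); any CNN backbone could be substituted.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> 16-dim features
classifier = nn.Linear(16, 20)

optimizer = torch.optim.SGD(list(backbone.parameters()) + list(classifier.parameters()),
                            lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(60):                       # warm-up duration from the embodiment
    for x, y in loader:
        logits = classifier(backbone(x))
        loss = criterion(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```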

The class center {Cy} of each class is then computed with the model warmed up in the previous step.
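
One straightforward way to compute the class centers {Cy} with the warmed-up backbone is to average the features of all training samples of each class. The sketch below assumes this simple mean (the exact averaging used in the disclosure is given by its own formulas) and reuses the `backbone` and `loader` names from the warm-up sketch above.

```python
import torch

@torch.no_grad()
def compute_class_centers(backbone, loader, num_classes, feature_dim):
    """Mean feature vector per class over the whole training set (assumed form of the class center)."""
    sums = torch.zeros(num_classes, feature_dim)
    counts = torch.zeros(num_classes)
    for x, y in loader:
        feats = backbone(x)                    # f(x; theta)
        for k in range(num_classes):
            mask = (y == k)
            if mask.any():
                sums[k] += feats[mask].sum(dim=0)
                counts[k] += mask.sum()
    return sums / counts.clamp(min=1).unsqueeze(1)

# centers = compute_class_centers(backbone, loader, num_classes=20, feature_dim=16)
```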

After the warm-up, the model is trained on the training subsets; the training subsets are periodically rebuilt and the model is trained on the rebuilt subsets until the loss function converges, where "periodically" means once every 20 epochs. Specifically, two training subsets are constructed with different resampling strategies for the subsequent training, {(xe1, ye1)}, {(xe2, ye2)} = SubSetConstruct({(x, y)}, θ, ω), and the training objective is

θ, ω ∈ argmin_{θ,ω} Σ_{s∈ε} Σ_{i∈s} (Lcls + α·LIFL);

at the same time, the class center Cy of each class is updated before each repetition:

{Cy} ← MovingAverage({Cy}, {(f(xs1; θ), ys1)}, {(f(xs2; θ), ys2)}),

where the accompanying formulas of the original disclosure (rendered there as figures) define this moving-average update of the class centers and the center-based balancing loss term LIFL.
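
Those formulas are not recoverable from this text, so the sketch below shows only one plausible realization consistent with the surrounding description: class centers maintained as an exponential moving average of the subset features, and a cosine-distance center loss LIFL that pulls each sample's feature toward its class center so that features stay consistent across the differently resampled subsets. The momentum value, the cosine form, and the weighting alpha are assumptions, not the patented formulas.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_centers(centers, feats, labels, momentum=0.9):
    """Assumed MovingAverage update: exponential moving average of each class center
    from the batch features of the resampled subsets."""
    for k in labels.unique():
        mask = (labels == k)
        centers[k] = momentum * centers[k] + (1 - momentum) * feats[mask].mean(dim=0)
    return centers

def center_balancing_loss(feats, labels, centers):
    """Assumed form of L_IFL: pull each feature toward its class center in cosine distance."""
    c = centers[labels]                                   # center of each sample's class
    return (1 - F.cosine_similarity(feats, c, dim=1)).mean()

# Usage sketch with placeholder tensors (feats would come from f(x; theta) on a subset batch).
centers = torch.randn(20, 16)
feats = torch.randn(256, 16)
labels = torch.randint(0, 20, (256,))
logits = torch.randn(256, 20)                             # placeholder for classifier(f(x))
alpha = 0.5                                               # weighting alpha (assumed value)
loss = F.cross_entropy(logits, labels) + alpha * center_balancing_loss(feats, labels, centers)
centers = update_centers(centers, feats, labels)
```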

Finally, completing the above training procedure yields the balanced feature extractor f(·; θ). An additional 10 epochs of training then handle the inter-class imbalance: the feature-extractor parameters θ are frozen, the parameters ω of the linear classifier are randomly re-initialized, and for those 10 epochs the classifier alone is fine-tuned on a class-balanced training set obtained by resampling. The result is a balanced feature extractor f(·; θ) and classifier g(·; ω).
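
A sketch of this final inter-class balancing stage, under assumptions: the backbone is frozen, the linear classifier is randomly re-initialized, and it alone is trained for 10 epochs on a class-balanced loader. The inverse-frequency WeightedRandomSampler used here is one standard way to realize the "class-balanced training set obtained by resampling"; the disclosure does not mandate this particular sampler.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def class_balanced_loader(images, labels, num_classes, batch_size=256):
    """Sample each class with equal probability via inverse-frequency sample weights."""
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    weights = (1.0 / counts)[labels]
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(TensorDataset(images, labels), batch_size=batch_size, sampler=sampler)

# Placeholder data and models (a real run would reuse the warmed-up, balanced backbone).
images = torch.randn(1024, 3, 112, 112)
labels = torch.randint(0, 20, (1024,))
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(16, 20)                 # randomly re-initialized classifier g(.; omega)

for p in backbone.parameters():                # freeze theta
    p.requires_grad_(False)

optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()
loader = class_balanced_loader(images, labels, num_classes=20)

for epoch in range(10):                        # the additional 10 epochs of the embodiment
    for x, y in loader:
        with torch.no_grad():
            feats = backbone(x)
        loss = criterion(classifier(feats), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```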

In one example, the different resampling schemes include:

one scheme that assigns the same weight to every sample in each class for sampling, and another that assigns weights to the samples in each class according to the 80/20 rule for sampling.

Specifically, because there is no theoretical guarantee that the features the model learns from the training set can be disentangled, Zc and Za cannot be separated by simple, direct feature selection. The solution of the present invention is to construct two training subsets that guide the model to learn less of Za. We first obtain an empirical conclusion through experiments: the cosine similarity between each sample and its class center is inversely related to the rarity of its Za; that is, the rarer a sample's Za, the smaller the prediction logit the model assigns to that sample. By this conclusion, the prediction logit the model assigns to a sample can be used to locate that sample's Za within the long-tailed Za distribution of the sample's class. Next comes the resampling method used when constructing the training subsets. The two training subsets are distinguished by the different Za distributions of their samples, and from the empirical conclusion above the Za distribution can be represented by the prediction logits the model assigns, so the concrete subset-resampling strategy is as follows. After the warmed-up feature extractor and classifier have been obtained, the prediction confidence of every sample in the current training set can be computed; for a sample labeled k, the prediction confidence is P(Y=k | X in k). As noted above, a sample's prediction confidence can be used to represent where the current sample x lies in the Za distribution of its class y. The reason for constructing multiple training subsets is to use the subsets together with the loss function to guide the model away from learning the Za that causes attribute shift, and instead rely more on Zc as the basis for visual recognition. There are two concrete resampling schemes for the training subsets. The first assigns the same weight to every sample in a class, so that in the resulting subset the Za distribution within each class matches that of the original set. The second follows the 80/20 rule: each sample in class k is assigned a sampling weight of (1 − p(Y=k | Zc, Za))^β, where β is adjusted automatically so that the twenty percent of class-k samples with the lowest p(Y=k | Zc, Za) are up-sampled until they make up eighty percent of the class; in short, the weights the second scheme assigns to the samples of class k are the opposite of those of the first scheme. Each class of the original training set is then resampled with both schemes to obtain two new small subsets, and all small subsets produced with the same scheme are gathered together, finally yielding two large training subsets; this completes one construction of the training subsets.
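
The sketch below illustrates, under assumed interfaces, how the two per-class sampling-weight schemes could be computed from the per-sample prediction confidences: uniform weights for the first subset, and (1 − p)^β weights for the second, with β searched so that the twenty percent of lowest-confidence samples carry roughly eighty percent of the sampling mass. The grid search over β is an implementation choice, not part of the disclosure.

```python
import numpy as np

def uniform_weights(conf):
    """Scheme 1: every sample of the class gets the same sampling weight."""
    return np.full(len(conf), 1.0 / len(conf))

def pareto_weights(conf, target=0.8, betas=np.linspace(0.5, 20.0, 200)):
    """Scheme 2: weight each sample by (1 - p)^beta, choosing beta so that the 20%
    lowest-confidence samples receive about `target` of the total sampling mass."""
    conf = np.asarray(conf, dtype=float)
    n_low = max(1, int(round(0.2 * len(conf))))
    low_idx = np.argsort(conf)[:n_low]                 # lowest-confidence 20% of the class
    best_beta, best_gap = betas[0], np.inf
    for beta in betas:
        w = (1.0 - conf) ** beta
        w = w / w.sum()
        gap = abs(w[low_idx].sum() - target)
        if gap < best_gap:
            best_beta, best_gap = beta, gap
    w = (1.0 - conf) ** best_beta
    return w / w.sum()

# Per-sample confidences p(Y=k | x) for one class k, e.g. produced by the warmed-up model.
conf = np.random.beta(5, 2, size=100)
w1, w2 = uniform_weights(conf), pareto_weights(conf)
subset1 = np.random.choice(100, size=100, replace=True, p=w1)   # same Za distribution as original
subset2 = np.random.choice(100, size=100, replace=True, p=w2)   # rare-Za samples dominate
```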

Further, sampling the samples in each class with weights assigned according to the 80/20 rule includes:

up-sampling part of the samples in the current class until the twenty percent of samples with the lowest prediction confidence in that class make up eighty percent of the original number of samples in the set, while down-sampling the remaining samples of the current class to twenty percent of the original number. Specifically, because part of the samples must be up-sampled, MixUp is used here; MixUp is a data augmentation method commonly used for up-sampling. First, the twenty percent of samples in the current class with the lowest p(Y=k | Zc, Za) values are found, two of them are drawn at random, and a fusion ratio μ is drawn from [0, 1] following a beta distribution. Each pixel of the two randomly selected images is then blended, output = μ·image1 + (1 − μ)·image2, and the output is a new sample used for up-sampling. The newly generated sample keeps the shared class label of the two source images, forming a new pair {x, y}. The up-sampling within the current class continues until the twenty percent of samples with the lowest p(Y=k | Zc, Za) values make up eighty percent of the original number of samples in the set, while the remaining part of the original set is down-sampled to twenty percent of the original number; this completes the construction of the second training subset.
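
A minimal sketch of the MixUp-style up-sampling described above: two images drawn from the lowest-confidence twenty percent of the same class k are blended pixel-wise with a Beta-distributed ratio μ, and the new sample keeps the class label k. The Beta(1, 1) parameters are an assumption; the disclosure only states that μ follows a beta distribution on [0, 1].

```python
import numpy as np

def mixup_same_class(img1, img2, label_k, alpha=1.0):
    """Blend two images of the same class: output = mu*img1 + (1-mu)*img2; the label stays k."""
    mu = np.random.beta(alpha, alpha)          # fusion ratio in [0, 1]
    mixed = mu * img1 + (1.0 - mu) * img2
    return mixed, label_k

# Toy usage: two 112x112 RGB images drawn from the hardest 20% of class k = 3.
img1 = np.random.rand(112, 112, 3)
img2 = np.random.rand(112, 112, 3)
new_img, new_label = mixup_same_class(img1, img2, label_k=3)
```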

In another exemplary embodiment, an indoor scene visual recognition system integrating multiple representation balancing strategies is provided, the system comprising:

a data acquisition module for sampling a long-tailed training set;

a model warm-up module for warming up the model and defining a custom loss function;

a class-center computation module that uses the warmed-up model to compute the class center of each class in the long-tailed training set;

a training-subset construction module that constructs multiple training subsets with different feature distributions through different resampling strategies;

an intra-class balance training module that, combined with the custom loss function, trains the warmed-up model on the training subsets until the loss function converges, so that the model tends to learn features that are balanced across the training subsets;

an inter-class balance training module that applies a regularization term to the classifier of the model obtained by the intra-class balance training module to adjust the weight difference between head and tail classes, and obtains the trained model once the loss function has converged to a sufficient degree; and

a recognition module that uses the trained model to perform visual recognition.

Here the training-subset construction module constructs multiple training subsets with different feature distributions through different resampling strategies; the intra-class balance training module, using the training subsets output by the training-subset construction module together with the custom center loss function, trains the warmed-up model until the loss function converges, so that the model tends to learn features that are balanced across the training subsets and thus unbiased intra-class representations, solving the previously neglected intra-class shift problem. The inter-class balance training module applies a regularization term to the classifier to adjust the weight difference between head and tail classes; once the loss function has converged to a sufficient degree, the trained model is obtained, reducing the per-class weight imbalance on the classifier caused by uneven class sample counts in the training set. The system thus simultaneously addresses the problems that class-level sample imbalance and the imbalance of non-class attributes within a class bring to model training; building on previous inter-class balancing work, it further accounts for the effect that intra-class long tails of "non-class factors" such as background, pose, and viewpoint have on visual recognition results, and the two balancing mechanisms complement each other, giving the model better overall performance.

In another exemplary embodiment, the present invention provides a storage medium on which computer instructions are stored; when the computer instructions are run, the steps of the indoor scene visual recognition method integrating multiple representation balancing strategies are executed.

Based on this understanding, the technical solution of this embodiment, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored on a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In another exemplary embodiment, the present invention provides a terminal comprising a memory and a processor, the memory storing computer instructions executable on the processor; when the processor runs the computer instructions, it executes the steps of the indoor scene visual recognition method integrating multiple representation balancing strategies.

The processor may be a single-core or multi-core central processing unit, a specific integrated circuit, or one or more integrated circuits configured to implement the present invention.

Embodiments of the subject matter and the functional operations described in this specification may be implemented in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e. one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus.

The processes and logic flows described in this specification may be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as such special-purpose logic circuitry.

Processors suitable for the execution of a computer program include, for example, general and/or special-purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical discs; however, a computer need not have such devices. Moreover, a computer may be embedded in another device, for example a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.

Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features of particular embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features of a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

The above detailed description is a specific explanation of the present invention and should not be taken to limit the specific embodiments of the invention to these descriptions. For a person of ordinary skill in the art to which the present invention pertains, several simple deductions and substitutions may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An indoor scene visual recognition method integrating multiple characterization balance strategies is characterized by comprising the following steps:
S1, sampling to obtain a long tail training set;
S2, preheating a model and customizing a loss function;
S3, calculating a class center of each class in the long-tail training set by using the preheated model;
S4, constructing a plurality of training subsets with different feature distributions through different resampling strategies;
S5, training the preheated model by using the training subsets until the loss function converges by combining with the self-defined loss function, so that the model tends to learn the characteristics balanced among the training subsets;
S6, applying a regular term on the classifier of the model after training in the step S5 to adjust the weight difference of the head and tail classes, and obtaining the trained model after the loss function converges to a certain degree; and
S7, performing visual identification by using the trained model.
2. The method for indoor scene visual recognition incorporating multiple characterization balancing strategies according to claim 1, wherein the constructing a plurality of training subsets with different feature distributions by different resampling strategies comprises:
and respectively using different resampling modes for each category in the long-tail training set to obtain a plurality of new small subsets, and then combining all the small subsets using the same resampling mode to obtain a plurality of large training subsets.
3. The method for visual recognition of an indoor scene incorporating multiple characterization balancing strategies according to claim 2, wherein the different resampling modes comprise:
one is to sample the samples in each class with the same weight, and the other is to sample the samples in each class with weights assigned according to the 80/20 rule.
4. A method of indoor scene visual recognition incorporating multiple characterization balancing strategies according to claim 3, wherein sampling the samples in each class with weights assigned according to the 80/20 rule comprises:
upsampling a portion of the samples in the current class until the twenty percent of samples with the lowest prediction confidence in the class reach eighty percent of the number of samples in the original set, while another portion of the samples in the current class is down-sampled to twenty percent of the original set of samples.
5. The method for indoor scene visual recognition incorporating multiple characterization balancing strategies according to claim 4, wherein the upsampling method is MixUp data augmentation.
6. The method for indoor scene visual recognition incorporating multiple characterization balancing strategies according to claim 1, wherein training the preheated model using the training subset until the loss function converges comprises:
periodically repeating step S4 to reconstruct the training subset and training the model using the reconstructed training subset.
7. The method for visual recognition of an indoor scene incorporating multiple characterization balancing strategies according to claim 6, wherein the periodicity is repeated every 20 epochs.
8. The method for indoor scene visual identification incorporating multiple characterization balancing strategies of claim 6, wherein the class center of each class is updated prior to each repetition.
9. The method for visual recognition of an indoor scene incorporating multiple characterization balancing strategies according to claim 1, wherein the step S6 comprises:
the parameters of the model classifier are randomly initialized and the classifier is independently adjusted periodically by using a training subset obtained by resampling.
10. An indoor scene visual recognition system incorporating a plurality of characterization balancing strategies, the system comprising:
the data acquisition module is used for sampling to obtain a long-tail training set;
the model preheating module is used for preheating the model and customizing the loss function;
the class center calculating module calculates the class center of each class in the long tail training set by using the preheated model;
the training subset construction module is used for constructing a plurality of training subsets with different feature distributions through different resampling strategies;
the in-class balance training module is used for training the preheated model by combining the self-defined loss function until the loss function converges, so that the model tends to learn the characteristics balanced among the training subsets;
and the inter-class balance training module applies a regular term on the classifier of the model obtained by the intra-class balance training module to adjust the weight difference of the head and tail classes, and the trained model is obtained after the loss function converges to a certain degree.
And the recognition module is used for performing visual recognition by using the trained model.
CN202310157638.5A · Filed 2023-02-23 · Indoor scene visual recognition method and system integrating multiple characterization balance strategies · Pending · CN116152645A (en)

Priority Applications (1)

Application Number: CN202310157638.5A · Priority Date: 2023-02-23 · Filing Date: 2023-02-23 · Title: Indoor scene visual recognition method and system integrating multiple characterization balance strategies

Applications Claiming Priority (1)

Application Number: CN202310157638.5A · Priority Date: 2023-02-23 · Filing Date: 2023-02-23 · Title: Indoor scene visual recognition method and system integrating multiple characterization balance strategies

Publications (1)

Publication Number: CN116152645A · Publication Date: 2023-05-23

Family

ID=86373281

Family Applications (1)

Application Number: CN202310157638.5A (Pending) · Publication: CN116152645A (en) · Priority Date: 2023-02-23 · Filing Date: 2023-02-23 · Title: Indoor scene visual recognition method and system integrating multiple characterization balance strategies

Country Status (1)

Country: CN · Link: CN116152645A (en)


Cited By (3)

* Cited by examiner, † Cited by third party

CN116350227A (en)* · Priority 2023-05-31 · Published 2023-06-30 · 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) · Individualized detection method, system and storage medium for magnetoencephalography spike
CN116350227B (en)* · Priority 2023-05-31 · Published 2023-09-22 · 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) · Individualized detection method, system and storage medium for magnetoencephalography spike
CN118236071A (en)* · Priority 2024-04-17 · Published 2024-06-25 · 南京航空航天大学 · A self-supervised multi-label unbalanced electrocardiogram classification method based on frequency domain mixup augmentation and logit compensation

Similar Documents

Publication · Title
US11636283B2 (en) · Committed information rate variational autoencoders
Ji et al. · ColorFormer: Image colorization via color memory assisted hybrid-attention transformer
US20220222918A1 (en) · Image retrieval method and apparatus, storage medium, and device
CN111145116B (en) · An image sample augmentation method based on generative adversarial network in sea rainy weather
CN111798369B (en) · A face aging image synthesis method based on recurrent conditional generative adversarial network
Tingting et al. · Three-stage network for age estimation
US10248664B1 (en) · Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
CN111429340A (en) · Cyclic image translation method based on self-attention mechanism
CN108764281A (en) · A kind of image classification method learning across task depth network based on semi-supervised step certainly
CN116152645A (en) · Indoor scene visual recognition method and system integrating multiple characterization balance strategies
CN114565810B (en) · A model compression method and system based on data protection scenario
CN111445545B (en) · Text transfer mapping method and device, storage medium and electronic equipment
Tang et al. · Attribute-guided sketch generation
Ning et al. · Conditional generative adversarial networks based on the principle of homology continuity for face aging
CN116958324A (en) · Training method, device, equipment and storage medium of image generation model
CN112651940A (en) · Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
Liu et al. · Survey on GAN-based face hallucination with its model development
CN118196391A (en) · A small sample target detection method for unmanned boats based on self-distillation data enhancement
Guo et al. · Face illumination normalization based on generative adversarial network
CN115588220B (en) · Two-stage multi-scale adaptive low-resolution face recognition method and its application
Tang et al. · A deep map transfer learning method for face recognition in an unrestricted smart city environment
CN110197226B (en) · Unsupervised image translation method and system
Atkale et al. · Multi-scale feature fusion model followed by residual network for generation of face aging and de-aging
Pajot et al. · Unsupervised adversarial image inpainting
Sathya et al. · Adversarially trained variational auto-encoders with maximum mean discrepancy based regularization

Legal Events

Date · Code · Title · Description
PB01 · Publication
SE01 · Entry into force of request for substantive examination
