Technical Field
The invention relates to the field of cross-modal retrieval, and in particular to an image-text cross-modal retrieval algorithm based on multi-level semantics that combines deep learning with hashing.
Background Art
With the development of the mobile Internet and the spread of devices such as smartphones and digital cameras, multimedia data on the Internet is growing explosively. In information retrieval, this continuing growth of multimedia big data has created demand for cross-modal retrieval applications. However, mainstream search engines such as Baidu, Google and Bing currently return results of only a single modality. Moreover, as deep learning has achieved a series of breakthroughs in computer vision, natural language processing and other fields, combining multimedia big data with artificial intelligence is a common future trend for both fields. Exploring new cross-modal retrieval paradigms that bring together new technologies and new demands has therefore become one of the pressing challenges in information retrieval.
Traditional cross-modal retrieval usually relies on handcrafted features that depend on domain knowledge, and the "semantic gap" remains a difficult problem in this field. Applying deep learning to cross-modal retrieval has produced a large body of advanced research on feature learning and representation for bridging the "media gap" between heterogeneous data of different modalities. However, as multimedia data keeps growing, deep feature representations face storage and retrieval-efficiency challenges because of their high dimensionality, which makes them ill-suited to large-scale multimedia retrieval tasks. In addition, real-world data in cross-modal retrieval often carries multiple labels. Most existing solutions convert the problem into binary-relevance single-label learning problems, so the learned models cannot fully preserve the relationships among data in the original semantic space, which degrades the final retrieval results.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of the prior art by combining deep-learning-based feature representations with hashing. Considering both the binary similarity and the multi-level semantic similarity between image and text data, the method learns a mapping from data to hash codes through network training, thereby providing an image-text cross-modal retrieval method with higher retrieval accuracy.
To achieve the above object, the technical solution provided by the present invention is as follows:
The method is divided into three modules: a deep feature extraction module, a similarity matrix generation module, and a hash code learning module.
The deep feature extraction module uses deep neural networks to extract features from image and text data. It consists of two sub-networks, one extracting image features and the other extracting text features. Image features are extracted with the deep convolutional neural network CNN-F, which consists of 5 convolutional layers and 3 fully connected layers. For text, each document is first represented as a bag-of-words (BOW) vector; the text feature extraction network then applies a multi-layer perceptron (MLP) consisting of three fully connected layers to this vector to extract text features.
The similarity matrix generation module comprises binary similarity matrix generation and multi-level semantic similarity matrix generation, each of which produces one cross-modal similarity matrix. In the binary similarity matrix, the entry for image i and text j is 1 when they are similar and 0 when they are not. The multi-level semantic similarity matrix is computed from label co-occurrence relationships: the more similar labels the class-label sets of two samples share, the greater the similarity of the samples. It reaches its maximum value of 1 when the two label sets are identical and takes its minimum value of 0 when the labels in the two sets are completely different.
For the hash code learning module, an objective function is designed so that the learned hash codes preserve the semantic information in both the binary similarity matrix and the multi-level semantic similarity matrix. By optimizing this objective function, the network parameters are learned and the mapping from data to hash codes is obtained.
Compared with the prior art, the principles and advantages of this solution are as follows:
This solution combines deep learning with hashing. It overcomes the limited representational power of traditional handcrafted features, as well as the drawback that high-dimensional deep features are costly to store and compute with. By combining binary similarity with multi-level semantic similarity, it fully accounts for the complex similarity relationships between cross-modal data, so that the learned hash codes retain more semantic information and retrieval accuracy improves.
Brief Description of the Drawings
Fig. 1 is the overall framework diagram of the image-text cross-modal retrieval method based on the multi-level semantic deep hashing algorithm of the present invention.
Detailed Description of the Embodiments
The present invention is further described below with reference to specific examples:
Throughout the present invention, the image and text modalities are used as examples for discussion.
The present invention provides an image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm, Deep Multi-Level Semantic Hashing for Cross-modal Retrieval (DMSH), which comprises three modules: a deep feature extraction module, a similarity matrix generation module, and a hash code learning module, as shown in Fig. 1.
Table 1. Image feature extraction network structure
The deep feature extraction module uses deep neural networks to extract image and text features. Image features are extracted with the CNN-F deep convolutional neural network, configured as shown in Table 1. For text, each document is first modeled as a bag-of-words vector, from which a multi-layer perceptron of three fully connected layers extracts the text features; the network configuration is shown in Table 2.
In Table 1, conv1 uses a convolution stride of 4, and conv2 to conv5 all use a stride of 1. "pad" denotes padding: extra values (usually zeros) are added around the border of the input feature map before convolution, typically so that, at stride 1, the output has the same spatial size as the input. LRN denotes Local Response Normalization, which imitates the lateral inhibition of biological neurons: it creates competition among the activities of neighboring neurons, making relatively large responses larger while suppressing neurons with weaker responses, which improves the generalization ability of the model. MAX pooling takes the maximum value within each local window of the feature map; this shrinks the feature maps, which reduces the number of parameters in subsequent layers and helps prevent overfitting. Dropout regularization randomly drops a certain number of neurons during training to prevent the network from overfitting.
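The effect of stride and padding on feature-map size follows the standard formula floor((n + 2p - k) / s) + 1, which applies equally to pooling windows. The sketch below applies it to a conv1-like layer; the 224×224 input and 11×11 kernel are assumed values, since the contents of Table 1 are not reproduced in this copy, and the helper names are hypothetical.

```python
def conv_output_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

def same_padding(kernel):
    """Padding that keeps the spatial size unchanged for stride 1 and an odd kernel."""
    return (kernel - 1) // 2

# conv1-like layer: 224x224 input, 11x11 kernel, stride 4, no padding (assumed values)
print(conv_output_size(224, 11, stride=4))                      # 54
# a 3x3, stride-1 layer with "same" padding preserves the spatial size
print(conv_output_size(13, 3, stride=1, pad=same_padding(3)))   # 13
```

The second call shows why padding is usually described as size-preserving: it only holds for stride 1 with pad = (k - 1) / 2; at stride 4 the map shrinks regardless of padding.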
Table 2. Text feature extraction network
Here, the first hidden layer is a fully connected layer whose width equals the length of the input bag-of-words vector, the second hidden layer is a 4096-dimensional fully connected layer, and the third layer is a fully connected layer whose width equals the hash code length. The output of the network is the text feature vector.
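The text branch can be sketched end to end in plain Python: bag-of-words encoding followed by the three fully connected layers described above. Lowercase whitespace tokenization, raw term counts, ReLU on the hidden layers, a linear output layer and Gaussian initialization are all assumptions, since the text fixes only the layer widths; the demo uses 16 units instead of 4096 to stay small. All function names are hypothetical.

```python
import random
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def bag_of_words(document, vocabulary):
    """Raw term-count vector of length len(vocabulary); unknown tokens are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(document.lower().split()).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector

def init_layer(n_in, n_out, rng):
    """Weights as n_out rows of n_in Gaussian values; zero biases (assumed init)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def dense(x, weights, bias):
    """Fully connected layer: y_k = sum_i W[k][i] * x_i + b_k."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def build_text_mlp(bow_dim, hidden_dim, code_length, seed=0):
    """Widths from the text: bow_dim -> bow_dim -> hidden_dim (4096 in the text) -> code_length."""
    rng = random.Random(seed)
    return [init_layer(bow_dim, bow_dim, rng),
            init_layer(bow_dim, hidden_dim, rng),
            init_layer(hidden_dim, code_length, rng)]

def text_features(bow_vector, layers):
    """Forward pass; ReLU after the two hidden layers, linear output (assumed activations)."""
    h = bow_vector
    for depth, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if depth < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h

vocab = build_vocabulary(["a dog on the beach", "the dog and the cat"])
bow = bag_of_words("the dog the dog", vocab)   # [0, 0, 0, 0, 2, 0, 2]
layers = build_text_mlp(bow_dim=len(vocab), hidden_dim=16, code_length=8)
feature = text_features(bow, layers)           # 8 real values, one per hash bit
```

The output stays real-valued; binarization into a hash code happens later through the sign function.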
The similarity matrix generation module comprises binary similarity matrix generation and multi-level semantic similarity matrix generation, each of which produces one cross-modal similarity matrix. In the binary similarity matrix, the entry for image i and text j is 1 when they are similar and 0 when they are not. Similarity between data of different modalities is measured by class labels: image i and text j are considered similar if they share at least one common class label, and dissimilar otherwise.
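A minimal sketch of the binary similarity matrix, assuming "similar" means that the two label sets intersect; the helper name `binary_similarity` and the toy labels are hypothetical.

```python
def binary_similarity(image_labels, text_labels):
    """S[i][j] = 1 if image i and text j share at least one class label, else 0."""
    return [[1 if set(a) & set(b) else 0 for b in text_labels] for a in image_labels]

images = [{"dog", "beach"}, {"cat"}]
texts = [{"dog"}, {"car"}, {"cat", "dog"}]
print(binary_similarity(images, texts))   # [[1, 0, 1], [0, 0, 1]]
```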
For the multi-level semantic similarity matrix, a similarity computation method based on the co-occurrence of class labels is adopted; the generation procedure is described below.
For two class labels $t_i$ and $t_j$, the label similarity is defined as:
where $d(t_i, t_j)$ denotes the semantic distance between the two labels, defined as follows:
where the counts in the formula denote, respectively, the numbers of occurrences of $t_i$ and $t_j$ in the training set and the number of co-occurrences of $t_i$ and $t_j$; $N_c$ denotes the total number of labels in the training set.
By definition (2), $s(t_i, t_j) \in [0, 1]$, and the more often two labels co-occur, the greater their similarity. Based on the label similarity $s$, the similarity between samples can then be defined.
For two samples $D_m$ and $D_n$, the sample similarity is defined as:
where $t_m$ and $t_n$ denote the class-label sets of samples $D_m$ and $D_n$, and $|t_m|$ and $|t_n|$ denote the numbers of labels in $t_m$ and $t_n$; the resulting value is the hash label of the sample pair. By this definition, the more similar labels the two samples' label sets contain, the greater the sample similarity: it reaches its maximum value of 1 when $t_m$ and $t_n$ are identical, and takes its minimum value of 0 when every label in $t_m$ is dissimilar to every label in $t_n$. The multi-label semantic similarity matrix can therefore serve as supervisory information for hash code learning. Compared with the binary similarity matrix, it extends cross-modal similarity from the discrete set {0, 1} to the continuous interval [0, 1], retaining more of the rich semantic information implicit in the class labels of the data.
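The formulas for $s$, $d$ and the sample similarity are not preserved in this copy, so the sketch below is hypothetical throughout. It assumes a Normalized-Google-Distance-style semantic distance $d$, a label similarity $s = e^{-d}$ (0 when two labels never co-occur), $N_c$ read as the total count of label occurrences, and a sample similarity that averages the best label match in both directions. These choices reproduce the stated properties (values in [0, 1], 1 for identical label sets, 0 when no labels are related) but are not claimed to be the patent's exact definitions.

```python
import math
from collections import Counter
from itertools import combinations

def label_statistics(label_sets):
    """Occurrence counts, pairwise co-occurrence counts, and N_c (assumed: total occurrences)."""
    occurrences = Counter(t for labels in label_sets for t in set(labels))
    co_occurrences = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            co_occurrences[(a, b)] += 1
    return occurrences, co_occurrences, sum(occurrences.values())

def label_similarity(a, b, occ, co, n_c):
    """Hypothetical s(t_a, t_b) = exp(-d) with an NGD-style semantic distance d."""
    if a == b:
        return 1.0
    f_ab = co[tuple(sorted((a, b)))]
    if f_ab == 0:
        return 0.0  # labels never co-occur: infinite distance, zero similarity
    log_a, log_b = math.log(occ[a]), math.log(occ[b])
    d = (max(log_a, log_b) - math.log(f_ab)) / (math.log(n_c) - min(log_a, log_b))
    return math.exp(-d)

def sample_similarity(t_m, t_n, occ, co, n_c):
    """Hypothetical sample similarity: mean best-match label similarity in both directions."""
    forward = sum(max(label_similarity(a, b, occ, co, n_c) for b in t_n) for a in t_m)
    backward = sum(max(label_similarity(a, b, occ, co, n_c) for a in t_m) for b in t_n)
    return (forward + backward) / (len(t_m) + len(t_n))

def multilevel_similarity(image_labels, text_labels, stats):
    """Cross-modal similarity matrix with entries in [0, 1]."""
    return [[sample_similarity(a, b, *stats) for b in text_labels] for a in image_labels]

train_labels = [{"dog", "beach"}, {"dog", "cat"}, {"cat"}, {"car", "road"}]
stats = label_statistics(train_labels)
S = multilevel_similarity([{"dog"}, {"dog", "beach"}],
                          [{"dog"}, {"car"}, {"dog", "cat"}], stats)
```

In `S`, identical label sets score 1.0, unrelated sets score 0.0, and partially overlapping sets such as {dog, beach} versus {dog, cat} fall strictly between, which is the graded supervision the binary matrix cannot express.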
In the hash code learning module, let $F^{(g)}_{*i}$ denote the learned image feature of sample $D_i$, i.e., the output of the image feature extraction network, and let $F^{(x)}_{*j}$ denote the learned text feature of sample $D_j$, i.e., the output of the text feature extraction network. Each of the two deep networks has its own set of parameters.
To make the learned hash codes preserve the semantic information of the binary similarity matrix, a sigmoid cross-entropy loss function is used:
To ensure the stability of training and avoid numerical overflow, an equivalent form of Eq. (3-5) is used in implementation:
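The overflow-safe rewrite can be checked numerically. With $\theta$ as the logit of a pair and $s$ as its similarity label, the sigmoid cross-entropy $-[s\log\sigma(\theta) + (1-s)\log(1-\sigma(\theta))]$ simplifies to $\log(1+e^{\theta}) - s\theta$, which equals $\max(\theta, 0) - s\theta + \log(1+e^{-|\theta|})$; the second form never exponentiates a large positive number. This is a sketch of that standard identity, not the patent's own code.

```python
import math

def naive_pair_loss(theta, s):
    """log(1 + e^theta) - s * theta; math.exp overflows for large theta."""
    return math.log(1.0 + math.exp(theta)) - s * theta

def stable_pair_loss(theta, s):
    """Same value, rewritten as max(theta, 0) - s * theta + log(1 + e^{-|theta|})."""
    return max(theta, 0.0) - s * theta + math.log1p(math.exp(-abs(theta)))

print(abs(naive_pair_loss(3.0, 1) - stable_pair_loss(3.0, 1)) < 1e-12)   # True
print(stable_pair_loss(1000.0, 1))   # 0.0, while naive_pair_loss(1000.0, 1) raises OverflowError
```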
On top of the binary semantic loss above, a multi-level semantic loss is further introduced so that the learned model retains the richer semantic information contained in the multi-level semantic similarity matrix. The equivalent form of the sigmoid cross-entropy loss is used here as well:
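A sketch of the pairwise loss summed over all image-text pairs. The pair logit $\theta_{ij} = \frac{1}{2}\langle F^{(g)}_{*i}, F^{(x)}_{*j}\rangle$ is an assumption borrowed from common deep cross-modal hashing formulations, since the patent's formula is not preserved; features are stored one sample per row, and the toy values are made up. The same function serves the binary and the multi-level matrix, because sigmoid cross-entropy accepts targets anywhere in [0, 1].

```python
import math

def stable_pair_loss(theta, s):
    """Overflow-safe sigmoid cross-entropy for one pair with target s in [0, 1]."""
    return max(theta, 0.0) - s * theta + math.log1p(math.exp(-abs(theta)))

def pairwise_loss(img_feats, txt_feats, sim):
    """Sum over all (image i, text j) pairs; theta_ij = 0.5 * <f_i, g_j> is assumed."""
    total = 0.0
    for i, f in enumerate(img_feats):
        for j, g in enumerate(txt_feats):
            theta = 0.5 * sum(a * b for a, b in zip(f, g))
            total += stable_pair_loss(theta, sim[i][j])
    return total

img = [[1.0, -1.0], [0.5, 0.5]]
txt = [[1.0, -1.0], [-1.0, 1.0]]
s_binary = [[1, 0], [0, 0]]            # binary matrix: targets in {0, 1}
s_multi = [[1.0, 0.2], [0.6, 0.0]]     # multi-level matrix: targets in [0, 1]
loss_b = pairwise_loss(img, txt, s_binary)
loss_m = pairwise_loss(img, txt, s_multi)
```

The derivative of the pair loss with respect to $\theta$ is $\sigma(\theta) - s$, so a soft target such as 0.6 pulls $\sigma(\theta)$ toward 0.6 rather than toward 0 or 1.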
The complete form of the objective function is therefore:
where $F^{(g)}$ and $F^{(x)}$ denote the learned feature vectors of images and texts, respectively, which carry the semantic information of the similarity matrices; $C^{(g)}$ and $C^{(x)}$ denote the hash codes of images and texts, respectively, and $\operatorname{sign}(\cdot)$ denotes the sign function, defined as in Eq. (3-9). The semantic information in $F^{(g)}$ and $F^{(x)}$ is transferred to $C^{(g)}$ and $C^{(x)}$ through the sign function. $\|\cdot\|_F$ denotes the Frobenius norm, $E$ denotes a vector whose elements are all 1, and $\mu$, $\rho$, $\tau$ are hyperparameters.
$$C^{(g)} = \operatorname{sign}\left(F^{(g)}\right) \qquad (9)$$
$$C^{(x)} = \operatorname{sign}\left(F^{(x)}\right) \qquad (10)$$
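The regularization and balance terms, together with the sign function of Eqs. (9) and (10), can be sketched as follows. Assumptions: features are stored one sample per row (so $F E$ becomes a per-bit sum over samples), sign(0) is mapped to +1, the loss values are passed in precomputed, and the weights `w_multi`, `w_quant` and `w_balance` are hypothetical names; which of $\mu$, $\rho$, $\tau$ multiplies which term is not recoverable from this copy.

```python
def sign(values):
    """Elementwise sign into {-1, +1}; 0 is mapped to +1 (an assumed tie rule)."""
    return [[1 if v >= 0 else -1 for v in row] for row in values]

def quantization_term(codes, feats):
    """||C - F||_F^2, the squared Frobenius distance between codes and features."""
    return sum((c - f) ** 2 for c_row, f_row in zip(codes, feats) for c, f in zip(c_row, f_row))

def balance_term(feats):
    """||F E||^2 with E all ones: squared per-bit sums over all samples (rows)."""
    return sum(sum(row[k] for row in feats) ** 2 for k in range(len(feats[0])))

def objective(loss_binary, loss_multi, f_img, f_txt, w_multi=1.0, w_quant=1.0, w_balance=1.0):
    """Terms 1-4 combined; the weight names are hypothetical stand-ins for mu, rho, tau."""
    c_img, c_txt = sign(f_img), sign(f_txt)   # Eqs. (9) and (10)
    return (loss_binary
            + w_multi * loss_multi
            + w_quant * (quantization_term(c_img, f_img) + quantization_term(c_txt, f_txt))
            + w_balance * (balance_term(f_img) + balance_term(f_txt)))

f_img = [[0.9, -0.2], [-0.8, 0.3]]   # two samples, 2-bit features (made-up values)
f_txt = [[0.4, 0.5], [-0.1, -0.6]]
print(sign(f_img))                    # [[1, -1], [-1, 1]]
print(objective(0.5, 0.25, f_img, f_txt))
```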
The first two terms of the objective function are negative log-likelihoods of the cross-modal similarities. Optimizing them ensures that the larger the corresponding similarity-matrix entry, the more similar $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$ become, and the smaller the entry, the less similar they become. Optimizing terms 1 and 2 therefore ensures that the image and text features learned by the network preserve the cross-modal similarity of the original semantic space.
The third term of the objective function is a regularization term. Optimizing it yields the image and text hash codes $C^{(g)}$ and $C^{(x)}$ while preserving the similarity between the network features $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$. Since $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$ preserve the cross-modal similarity of the semantic space, the resulting hash codes preserve it as well.
Optimizing the fourth term of the objective function balances, for every bit of the final hash codes, the numbers of "1" and "-1" values over the whole training set, i.e., each bit position takes "1" for half of the samples and "-1" for the other half. This constraint maximizes the information carried by each bit of the hash code.
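Why balance maximizes information can be made concrete: a bit's entropy over the training set is maximal (1 bit) exactly when it takes +1 and -1 equally often, and it drops to 0 for a constant bit. The sketch below is a hypothetical diagnostic for measuring this on a set of codes; it is not part of the patent's training procedure.

```python
import math

def bit_entropy(bit_values):
    """Entropy (in bits) of one hash bit over a set of samples, values in {-1, +1}."""
    p = sum(1 for v in bit_values if v == 1) / len(bit_values)
    if p in (0.0, 1.0):
        return 0.0   # a constant bit carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def per_bit_entropy(codes):
    """Entropy of each bit position across all rows (samples)."""
    return [bit_entropy([row[k] for row in codes]) for k in range(len(codes[0]))]

codes = [[1, 1, -1], [-1, 1, -1], [1, 1, 1], [-1, 1, -1]]
print(per_bit_entropy(codes))   # bit 0 balanced, bit 1 constant, bit 2 skewed
```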
Experiments show that, during training, forcing the image and text of the same data point to take exactly the same hash code further improves network performance. Therefore, the constraint $C^{(g)} = C^{(x)} = C$ is added to the original objective function, and the final objective function is:
By optimizing this objective function, the network simultaneously learns the feature extraction parameters and the hash code representation; feature learning and hash code learning are thus unified in a single deep learning framework, achieving end-to-end learning.
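Under the constraint $C^{(g)} = C^{(x)} = C$, and assuming the regularization term takes the form $\|C - F^{(g)}\|_F^2 + \|C - F^{(x)}\|_F^2$, the best binary code for fixed features is elementwise $\operatorname{sign}(F^{(g)} + F^{(x)})$: per entry, $(c-a)^2 + (c-b)^2 = 2 - 2c(a+b) + a^2 + b^2$ for $c \in \{-1, 1\}$, which is smallest when $c$ has the sign of $a + b$. The sketch checks this against brute force. It covers only the code sub-problem for fixed features; how the patent schedules parameter and code updates is not stated, and the helper names are hypothetical.

```python
from itertools import product

def unified_codes(f_img, f_txt):
    """Elementwise sign(F_img + F_txt); a zero sum is mapped to +1 (assumed tie rule)."""
    return [[1 if a + b >= 0 else -1 for a, b in zip(ri, rt)] for ri, rt in zip(f_img, f_txt)]

def quantization_cost(codes, f_img, f_txt):
    """||C - F_img||_F^2 + ||C - F_txt||_F^2 for a shared code matrix C."""
    return sum((c - a) ** 2 + (c - b) ** 2
               for rc, ri, rt in zip(codes, f_img, f_txt)
               for c, a, b in zip(rc, ri, rt))

f_img = [[0.9, -0.2], [-0.8, 0.3]]   # made-up features: 2 samples x 2 bits
f_txt = [[0.4, 0.5], [-0.1, -0.6]]
best = unified_codes(f_img, f_txt)
# brute force over all 2^4 candidate code matrices
candidates = ([[a, b], [c, d]] for a, b, c, d in product((-1, 1), repeat=4))
brute = min(candidates, key=lambda C: quantization_cost(C, f_img, f_txt))
print(best)   # [[1, 1], [-1, -1]]
```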
In the testing and application phase, image or text data of any single modality can be fed to the trained network to generate its binary code vector, i.e., its hash code.
Specifically, the image modality $g_i$ of data point $D_i$ is fed to the network, and its hash code representation is generated by forward propagation through the network, computed as follows:
Similarly, for the text modality $x_j$ of data point $D_j$, forward propagation through the network generates the corresponding hash code:
The proposed DMSH retrieval model can therefore take a query in either modality, image or text, and return the top k most similar results from a database of the other modality. During retrieval, the distance between the hash code of the query and each hash code stored in the database is computed first, and the k nearest hash codes are then returned; the k data items they correspond to are the final retrieval results.
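A sketch of the retrieval step, assuming Hamming distance between ±1 codes (the text says only "distance") and ties broken by database index. For codes of length $K$, the Hamming distance equals $(K - \langle a, b\rangle)/2$, so it can be computed from an inner product. The feature values below are made-up stand-ins for network outputs, and the helper names are hypothetical.

```python
def to_code(features):
    """Binarize a real-valued network output with sign; 0 maps to +1 (assumed)."""
    return [1 if v >= 0 else -1 for v in features]

def hamming(a, b):
    """For +/-1 codes of length K: d = (K - <a, b>) / 2."""
    return (len(a) - sum(x * y for x, y in zip(a, b))) // 2

def top_k(query_code, database_codes, k):
    """Indices of the k nearest database codes; sorted() is stable, so ties keep index order."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:k]

query = to_code([0.7, -0.1, 0.3, -0.9])          # e.g. an image query (hypothetical output)
text_db = [to_code(v) for v in ([0.2, -0.5, 0.1, -0.3],
                                [-0.4, 0.6, -0.2, 0.8],
                                [0.5, 0.4, 0.3, -0.2])]
print(top_k(query, text_db, k=2))   # [0, 2]
```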
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810649234.7A | 2018-06-22 | 2018-06-22 | Image-text cross-modal retrieval based on a multi-level semantic deep hashing algorithm |
| Publication Number | Publication Date |
|---|---|
| CN110110122A | 2019-08-09 |
| Application Number | Status | Priority Date | Filing Date |
|---|---|---|---|
| CN201810649234.7A | Pending | 2018-06-22 | 2018-06-22 |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110597878A (en)* | 2019-09-16 | 2019-12-20 | 广东工业大学 | A cross-modal retrieval method, device, equipment and medium for multimodal data |
| CN110597878B (en)* | 2019-09-16 | 2023-09-15 | 广东工业大学 | A cross-modal retrieval method, device, equipment and medium for multi-modal data |
| CN112581477A (en)* | 2019-09-27 | 2021-03-30 | 京东方科技集团股份有限公司 | Image processing method, image matching method, device and storage medium |
| CN110765281A (en)* | 2019-11-04 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | A Multi-Semantic Deeply Supervised Cross-modal Hash Retrieval Method |
| CN111026887A (en)* | 2019-12-09 | 2020-04-17 | 武汉科技大学 | Cross-media retrieval method and system |
| CN111026887B (en)* | 2019-12-09 | 2023-05-23 | 武汉科技大学 | A method and system for cross-media retrieval |
| CN111125457A (en)* | 2019-12-13 | 2020-05-08 | 山东浪潮人工智能研究院有限公司 | A deep cross-modal hash retrieval method and device |
| CN110990597A (en)* | 2019-12-19 | 2020-04-10 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system and retrieval method based on text semantic mapping |
| CN110990597B (en)* | 2019-12-19 | 2022-11-25 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system and retrieval method based on text semantic mapping |
| CN111177421A (en)* | 2019-12-30 | 2020-05-19 | 论客科技(广州)有限公司 | Method and device for generating email historical event axis facing digital human |
| WO2021136318A1 (en)* | 2019-12-30 | 2021-07-08 | 论客科技(广州)有限公司 | Digital humanities-oriented email history eventline generating method and apparatus |
| CN111221993A (en)* | 2020-01-09 | 2020-06-02 | 山东建筑大学 | Visual media retrieval method based on depth binary detail perception hash |
| CN111221993B (en)* | 2020-01-09 | 2023-07-07 | 山东建筑大学 | Visual media retrieval method based on depth binary detail perception hash |
| CN111353076B (en)* | 2020-02-21 | 2023-10-10 | 华为云计算技术有限公司 | Methods for training cross-modal retrieval models, cross-modal retrieval methods and related devices |
| CN111353076A (en)* | 2020-02-21 | 2020-06-30 | 华为技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device |
| CN111368176A (en)* | 2020-03-02 | 2020-07-03 | 南京财经大学 | Cross-modal Hash retrieval method and system based on supervision semantic coupling consistency |
| CN111368176B (en)* | 2020-03-02 | 2023-08-18 | 南京财经大学 | Cross-modal hash retrieval method and system based on supervision semantic coupling consistency |
| CN111651660A (en)* | 2020-05-28 | 2020-09-11 | 拾音智能科技有限公司 | Method for cross-media retrieval of difficult samples |
| CN111651660B (en)* | 2020-05-28 | 2023-05-02 | 拾音智能科技有限公司 | Method for cross-media retrieval of difficult samples |
| CN111813967B (en)* | 2020-07-14 | 2024-01-30 | 中国科学技术信息研究所 | Retrieval method, retrieval device, computer equipment and storage medium |
| CN111813967A (en)* | 2020-07-14 | 2020-10-23 | 中国科学技术信息研究所 | Retrieval method, retrieval device, computer equipment and storage medium |
| CN111897909A (en)* | 2020-08-03 | 2020-11-06 | 兰州理工大学 | A method and system for ciphertext speech retrieval based on depth-aware hashing |
| CN111914156A (en)* | 2020-08-14 | 2020-11-10 | 中国科学院自动化研究所 | Adaptive label-aware graph convolutional network cross-modal retrieval method and system |
| CN111914156B (en)* | 2020-08-14 | 2023-01-20 | 中国科学院自动化研究所 | Cross-modal retrieval method and system for self-adaptive label perception graph convolution network |
| CN112035700A (en)* | 2020-08-31 | 2020-12-04 | 兰州理工大学 | Voice deep hash learning method and system based on CNN |
| CN112100413A (en)* | 2020-09-07 | 2020-12-18 | 济南浪潮高新科技投资发展有限公司 | A cross-modal hash retrieval method |
| CN112199520A (en)* | 2020-09-19 | 2021-01-08 | 复旦大学 | Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix |
| CN112199520B (en)* | 2020-09-19 | 2022-07-22 | 复旦大学 | Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix |
| CN112613451A (en)* | 2020-12-29 | 2021-04-06 | 民生科技有限责任公司 | Modeling method of cross-modal text picture retrieval model |
| CN113095415B (en)* | 2021-04-15 | 2022-06-14 | 齐鲁工业大学 | A cross-modal hashing method and system based on multimodal attention mechanism |
| CN113095415A (en)* | 2021-04-15 | 2021-07-09 | 齐鲁工业大学 | Cross-modal hashing method and system based on multi-modal attention mechanism |
| CN113157739B (en)* | 2021-04-23 | 2024-01-09 | 平安科技(深圳)有限公司 | Cross-modal retrieval method and device, electronic equipment and storage medium |
| CN113157739A (en)* | 2021-04-23 | 2021-07-23 | 平安科技(深圳)有限公司 | Cross-modal retrieval method and device, electronic equipment and storage medium |
| CN113270199B (en)* | 2021-04-30 | 2024-04-26 | 贵州师范大学 | Medical cross-modal multi-scale fusion class-guided hashing method and system |
| CN113270199A (en)* | 2021-04-30 | 2021-08-17 | 贵州师范大学 | Medical cross-modal multi-scale fusion class-guided hashing method and system |
| CN113342922A (en)* | 2021-06-17 | 2021-09-03 | 北京邮电大学 | Cross-modal retrieval method based on fine-grained self-supervision of labels |
| CN113177132A (en)* | 2021-06-30 | 2021-07-27 | 中国海洋大学 | Image retrieval method based on deep cross-modal hashing with a joint semantic matrix |
| CN113536067B (en)* | 2021-07-20 | 2024-01-05 | 南京邮电大学 | Cross-modal information retrieval method based on semantic fusion |
| CN113536067A (en)* | 2021-07-20 | 2021-10-22 | 南京邮电大学 | Cross-modal information retrieval method based on semantic fusion |
| CN113658683A (en)* | 2021-08-05 | 2021-11-16 | 重庆金山医疗技术研究院有限公司 | Disease diagnosis system and data recommendation method |
| CN113806580B (en)* | 2021-09-28 | 2023-10-20 | 西安电子科技大学 | Cross-modal hash retrieval method based on hierarchical semantic structure |
| CN113806580A (en)* | 2021-09-28 | 2021-12-17 | 西安电子科技大学 | A Cross-modal Hash Retrieval Method Based on Hierarchical Semantic Structure |
| CN113792207A (en)* | 2021-09-29 | 2021-12-14 | 嘉兴学院 | Cross-modal retrieval method based on multi-level feature representation alignment |
| CN113792207B (en)* | 2021-09-29 | 2023-11-17 | 嘉兴学院 | Cross-modal retrieval method based on multi-level feature representation alignment |
| CN114359930A (en)* | 2021-12-17 | 2022-04-15 | 华南理工大学 | Deep cross-modal hashing method based on fusion similarity |
| CN114359930B (en)* | 2021-12-17 | 2024-09-17 | 华南理工大学 | Deep cross-modal hashing method based on fusion similarity |
| CN114239730A (en)* | 2021-12-20 | 2022-03-25 | 华侨大学 | A Cross-modal Retrieval Method Based on Neighbor Ranking Relation |
| CN114239730B (en)* | 2021-12-20 | 2024-08-20 | 华侨大学 | Cross-modal retrieval method based on neighbor ordering relation |
| CN114780777A (en)* | 2022-04-06 | 2022-07-22 | 中国科学院上海高等研究院 | Cross-modal retrieval method and device, storage medium and terminal based on semantic enhancement |
| CN114780777B (en)* | 2022-04-06 | 2022-12-20 | 中国科学院上海高等研究院 | Cross-modal retrieval method and device, storage medium and terminal based on semantic enhancement |
| CN116955675B (en)* | 2023-09-21 | 2023-12-12 | 中国海洋大学 | Hash image retrieval method and network based on fine-grained similarity relation contrast learning |
| CN116955675A (en)* | 2023-09-21 | 2023-10-27 | 中国海洋大学 | Hash image retrieval method and network based on fine-grained similarity relationship contrastive learning |
| CN118839699A (en)* | 2024-07-12 | 2024-10-25 | 电子科技大学 | Weakly supervised cross-modal semantic consistency recovery method |
| CN118884530A (en)* | 2024-07-30 | 2024-11-01 | 北京交通大学 | A method for distinguishing earthquake response of ancient buildings based on OpenCV and perceptual hashing algorithm |
| CN118884530B (en)* | 2024-07-30 | 2025-05-16 | 北京交通大学 | Method for distinguishing the earthquake response of ancient buildings based on OpenCV and a perceptual hashing algorithm |
| Publication | Title |
|---|---|
| CN110110122A (en) | Image-text cross-modal retrieval based on a multi-layer semantic deep hash algorithm |
| Shu et al. | Specific class center guided deep hashing for cross-modal retrieval |
| Wu et al. | Weakly semi-supervised deep learning for multi-label image annotation |
| Yang et al. | Transfer learning for sequence tagging with hierarchical recurrent networks |
| Jiang et al. | A survey on artificial intelligence in Chinese sign language recognition |
| Wang et al. | Cross-modal retrieval: a systematic review of methods and future directions |
| Zhu et al. | Multi-modal deep analysis for multimedia |
| Su et al. | Semi-supervised knowledge distillation for cross-modal hashing |
| Wang et al. | Facilitating image search with a scalable and compact semantic mapping |
| Liu et al. | OMGH: Online manifold-guided hashing for flexible cross-modal retrieval |
| CN113672693B (en) | Tag recommendation method for online question answering platform based on knowledge graph and tag association |
| Li et al. | DAHP: Deep attention-guided hashing with pairwise labels |
| Wang et al. | Large-scale text classification using scope-based convolutional neural network: A deep learning approach |
| US11935278B1 (en) | Image labeling for artificial intelligence datasets |
| CN114168784A (en) | A Hierarchical Supervised Cross-modal Image and Text Retrieval Method |
| Lin et al. | Multi-modality weakly labeled sentiment learning based on explicit emotion signal for Chinese microblog |
| Zhang et al. | Learning multi-layer coarse-to-fine representations for large-scale image classification |
| Xu et al. | Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning |
| Wang et al. | Robust local metric learning via least square regression regularization for scene recognition |
| Liu et al. | Sparse autoencoder for social image understanding |
| Li et al. | Multimodal fusion with co-attention mechanism |
| Perdana et al. | Instance-based deep transfer learning on cross-domain image captioning |
| Wu et al. | Contrastive multi-bit collaborative learning for deep cross-modal hashing |
| Meng et al. | Concept-concept association information integration and multi-model collaboration for multimedia semantic concept detection |
| CN109255098B (en) | A matrix factorization hashing method based on reconstruction constraints |
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190809 |