CN110110122A - Image-text cross-modal retrieval based on a multi-level semantic deep hashing algorithm

Info

Publication number: CN110110122A
Application number: CN201810649234.7A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 冀振燕, 姚伟娜, 杨文韬, 皮怀雨
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University (the listed assignees may be inaccurate)
Legal status: Pending (an assumption, not a legal conclusion)
Prior art keywords: similarity, data, image, text, semantic

Abstract

The invention relates to an image-text cross-modal retrieval model that combines deep learning with hashing. Traditional deep-learning-based cross-modal hashing methods handle multi-label data by converting it directly into a single-label problem. To overcome this limitation, a deep cross-modal hashing algorithm based on multi-level semantics is proposed. The similarity between data items is defined through the co-occurrence relationships among their labels and is used as the supervisory information for network training. A loss function that jointly considers multi-level semantic similarity and binary similarity is designed to train the network, so that feature extraction and hash code learning are unified in one framework and learned end to end. The algorithm makes full use of the semantic correlation between data items and improves retrieval accuracy.

Description

Image-text cross-modal retrieval based on a multi-level semantic deep hashing algorithm

Technical field

The invention relates to the field of cross-modal retrieval, and in particular to an image-text cross-modal retrieval algorithm based on multi-level semantics that combines deep learning with hashing.

Background

With the development of the mobile Internet and the spread of smartphones, digital cameras and similar devices, multimedia data on the Internet is growing explosively. In information retrieval, this growth has created demand for cross-modal retrieval applications, yet mainstream search engines such as Baidu, Google and Bing return results in only one modality. Meanwhile, deep learning has achieved a series of breakthroughs in computer vision, natural language processing and other fields, and combining multimedia big data with artificial intelligence is a shared direction for both fields. Exploring new cross-modal retrieval models that combine these new technologies with the new demand has therefore become one of the pressing challenges in information retrieval.

Traditional cross-modal retrieval usually relies on handcrafted features that depend on domain knowledge, and the "semantic gap" remains a difficulty in this field. Applying deep learning to cross-modal retrieval has produced many advanced results in feature learning and representation that help bridge the "media gap" between heterogeneous data of different modalities. However, as multimedia data keeps growing, deep feature representations are high-dimensional and therefore costly to store and slow to search, which makes them unsuitable for large-scale multimedia retrieval. In addition, real data often carries multiple labels. Most existing methods convert the problem into a single-label learning problem with binary relevance, so the learned model cannot fully preserve the relationships the data had in the original semantic space, which degrades the final retrieval results.

Summary of the invention

The purpose of the invention is to overcome the deficiencies of the prior art. It combines deep-learning-based feature representation, considers both the binary similarity and the multi-level semantic similarity of image and text data, and applies hashing so that network training yields a mapping from data to hash codes, providing an image-text cross-modal retrieval method with higher retrieval accuracy.

To achieve this purpose, the technical solution provided by the invention is as follows.

The method is divided into three modules: a deep feature extraction module, a similarity matrix generation module, and a hash code learning module.

The deep feature extraction module uses deep neural networks to extract features from image and text data. It consists of two sub-networks, one for each modality: one deep neural network extracts image features and the other extracts text features. Image features are extracted with the deep convolutional neural network CNN-F, which consists of 5 convolutional layers and 3 fully connected layers. For text, the data is first modeled as a Bag-of-Words (BOW) vector. On top of this bag-of-words model, the text feature extraction network is a Multi-Layer Perceptron (MLP) made up of three fully connected layers.

The similarity matrix generation module produces a binary similarity matrix and a multi-level semantic similarity matrix, each of which is a cross-modal similarity matrix. In the binary similarity matrix, the entry for image i and text j is 1 when image i is similar to text j and 0 when it is not. The multi-level semantic similarity matrix is computed from label co-occurrence relationships, so that the more similar labels the label sets of two samples share, the greater the similarity of the samples. The entry reaches its maximum value of 1 when the two label sets are identical and its minimum value of 0 when the labels in the two sets are completely different.

For the hash code learning module, an objective function is designed so that the learned hash codes retain the semantic information in both the binary similarity matrix and the multi-level semantic similarity matrix. Optimizing this objective function learns the network parameters and yields the mapping from data to hash codes.

Compared with the prior art, the principle and advantages of this solution are as follows.

The solution combines deep learning with hashing. This overcomes both the limited representation power of traditional handcrafted features and the drawback that high-dimensional deep features are costly to store and compute with. By combining binary similarity with multi-level semantic similarity, it takes full account of the complex similarity relationships between cross-modal data, so that the learned hash codes retain more semantic information and retrieval accuracy improves.

Description of the drawings

Fig. 1 is the overall framework diagram of the image-text cross-modal retrieval based on the multi-level semantic deep hashing algorithm of the invention.

Detailed description

The invention is further described below with reference to a specific example.

Throughout the invention, the image and text modalities are used as the example for discussion.

The invention provides an image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm (Deep Multi-Level Semantic Hashing for Cross-modal Retrieval, DMSH). It comprises three modules: a deep feature extraction module, a similarity matrix generation module, and a hash code learning module, as shown in Fig. 1.

Table 1. Image feature extraction network structure

The deep feature extraction module uses deep neural networks to extract image and text features. Image features are extracted with the deep convolutional neural network CNN-F, whose configuration is shown in Table 1. For text, the data is first modeled as a bag-of-words vector. On top of the bag-of-words model, the text feature extraction network is a multi-layer perceptron made up of three fully connected layers, whose configuration is shown in Table 2.

In Table 1, the conv1 layer uses a convolution with stride 4, and the conv2 to conv5 layers all use stride 1. "pad" denotes padding, which determines how the kernel is applied at the image border. It usually means padding the edges of the image so that the output after convolution has the same size as the input. LRN denotes Local Response Normalization. It imitates the lateral inhibition mechanism of biological neurons by creating competition among the activities of neighboring neurons: relatively large responses become larger and neurons with smaller responses are suppressed, which improves the generalization ability of the model. Max pooling takes the maximum value within a window of a given size, which reduces the size of the feature maps, and hence the number of parameters in later layers, and helps prevent overfitting. Dropout regularization randomly discards a certain number of neurons during training, which also prevents the network from overfitting.
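The effect of stride and padding on feature-map size can be checked with the standard output-size formula. The sketch below is only an illustration: the input sizes and kernel sizes are hypothetical, since the layer table itself is not reproduced in this text.

```python
def conv_output_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial size of a convolution output along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


# Stride 1 with pad = (kernel - 1) // 2 keeps the size unchanged (odd kernels).
same = conv_output_size(size=27, kernel=5, stride=1, pad=2)

# Stride 4 without padding shrinks the feature map sharply.
strided = conv_output_size(size=224, kernel=11, stride=4, pad=0)

print(same, strided)  # 27 54
```

With stride 1, choosing the padding as half the kernel size (rounded down) is what keeps the output the same size as the input.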

Table 2. Text feature extraction network

The first hidden layer of the network is a fully connected layer with the same length as the input bag-of-words vector, the second hidden layer is a 4096-dimensional fully connected layer, and the third layer is a fully connected layer whose length equals the hash code length. The output of the network is the text feature vector.
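The layer sizes described above can be sketched as a small forward pass. This is a minimal illustration in plain Python, not the patent's implementation: the ReLU activations between layers, the random initialization, and the helper names (`build_text_net`, `text_features`) are assumptions, and the demo uses a narrow hidden layer instead of 4096 units to stay fast.

```python
import random


def dense(x, weights, bias):
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_text_net(vocab_size, code_length, hidden=4096, seed=0):
    """Layer widths: vocab_size -> vocab_size -> hidden -> code_length."""
    rng = random.Random(seed)
    sizes = [vocab_size, vocab_size, hidden, code_length]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def text_features(bow, layers):
    """Forward pass; ReLU between layers is an assumption, the output stays linear."""
    x = bow
    for index, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if index < len(layers) - 1:
            x = relu(x)
    return x


# Toy vocabulary of 8 words, 16-bit codes, and a narrow hidden layer for speed.
layers = build_text_net(vocab_size=8, code_length=16, hidden=32)
features = text_features([1, 0, 0, 1, 0, 1, 0, 0], layers)
print(len(features))  # 16
```

The output has one real value per hash bit, which is what allows a sign function to turn it into a binary code later.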

The similarity matrix generation module produces a binary similarity matrix and a multi-level semantic similarity matrix, each of which is a cross-modal similarity matrix. In the binary similarity matrix, the entry for image i and text j is 1 when image i is similar to text j and 0 when it is not. Similarity between data of different modalities is measured through class labels: if image i and text j have class labels in common, they are considered similar; otherwise they are considered dissimilar.
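A minimal sketch of this rule is shown below. It reads "class labels in common" as "at least one shared label", which is an interpretation of the text; the function name and the toy labels are hypothetical.

```python
def binary_similarity(image_labels, text_labels):
    """Entry [i][j] is 1 if image i and text j share at least one label, else 0."""
    return [[1 if img & txt else 0 for txt in text_labels] for img in image_labels]


image_labels = [{"sky", "sea"}, {"dog"}]
text_labels = [{"sea", "beach"}, {"cat"}, {"dog", "grass"}]

print(binary_similarity(image_labels, text_labels))
# [[1, 0, 0], [0, 0, 1]]
```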

The multi-level semantic similarity matrix is computed with a method based on the co-occurrence relationships of class labels. The generation procedure is as follows.

For two class labels $t_i$ and $t_j$, a label similarity $s(t_i, t_j)$ is defined in terms of their semantic distance $d(t_i, t_j)$.

The semantic distance $d(t_i, t_j)$ is computed from the number of times $t_i$ and $t_j$ each occur in the training set, the number of times $t_i$ and $t_j$ occur together, and $N_c$, the number of all labels in the training set.

By definition (2), $s(t_i, t_j) \in [0, 1]$, and the more often two labels co-occur, the greater their similarity. From the label similarity $s$, the similarity between samples can then be defined.
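The following sketch shows one way such a co-occurrence similarity could be computed. The exact formulas are not reproduced in this text, so the form used here is an assumption: a normalized co-occurrence distance $d$ built from log counts, mapped to a similarity by $s = e^{-d}$. It is also an assumption that $N_c$ counts all label occurrences. What the sketch does preserve from the description is that $s$ lies in $[0, 1]$ and grows with the co-occurrence count.

```python
import math
from itertools import combinations


def cooccurrence_counts(label_sets):
    """Count single-label occurrences and pairwise co-occurrences."""
    single, pair = {}, {}
    for labels in label_sets:
        for label in labels:
            single[label] = single.get(label, 0) + 1
        for a, b in combinations(sorted(labels), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair


def label_similarity(a, b, single, pair, total):
    """Assumed form s = exp(-d), with d a normalized co-occurrence distance."""
    if a == b:
        return 1.0
    joint = pair.get(tuple(sorted((a, b))), 0)
    if joint == 0:
        return 0.0  # never seen together: treated as infinitely distant
    log_a, log_b = math.log(single[a]), math.log(single[b])
    distance = (max(log_a, log_b) - math.log(joint)) / (math.log(total) - min(log_a, log_b))
    return math.exp(-distance)


train = [{"sky", "sea"}, {"sky", "sea"}, {"sky", "dog"}, {"dog"}]
single, pair = cooccurrence_counts(train)
total = sum(single.values())  # assumption: N_c counts all label occurrences

for a, b in [("sky", "sea"), ("sky", "dog"), ("sea", "dog")]:
    print(a, b, round(label_similarity(a, b, single, pair, total), 3))
```

In this toy training set, "sky" and "sea" co-occur twice and "sky" and "dog" once, so the first pair scores higher, while "sea" and "dog" never co-occur and score 0.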

For two samples $D_m$ and $D_n$, a sample similarity is defined, where $t_m$ and $t_n$ denote the class label sets of $D_m$ and $D_n$, and $|t_m|$ and $|t_n|$ denote the number of labels in $t_m$ and $t_n$. By this definition, the more similar labels the label sets of two samples share, the greater the similarity of the samples. The similarity reaches its maximum value of 1 when the two label sets $t_m$ and $t_n$ are identical, and its minimum value of 0 when every label in $t_m$ is dissimilar to every label in $t_n$. The multi-label semantic similarity matrix can therefore serve as the supervisory information for hash code learning. Compared with the binary similarity matrix, it extends cross-modal similarity from the discrete values {0, 1} to the continuous interval [0, 1], retaining more of the rich semantic information implicit in the class labels.
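The boundary behavior described above (1 for identical label sets, 0 when no labels are similar) can be reproduced with a simple aggregate of label similarities. The form below, a symmetric average of best-match similarities normalized by $|t_m| + |t_n|$, is an assumption chosen to satisfy those properties, because the patent's formula is not reproduced in this text. The `sim` callable and the toy similarity table are hypothetical.

```python
def sample_similarity(labels_m, labels_n, sim):
    """Average best-match label similarity in both directions (assumed form)."""
    if not labels_m or not labels_n:
        return 0.0
    forward = sum(max(sim(a, b) for b in labels_n) for a in labels_m)
    backward = sum(max(sim(a, b) for a in labels_m) for b in labels_n)
    return (forward + backward) / (len(labels_m) + len(labels_n))


# Toy label similarity: 1 for identical labels, a table lookup otherwise.
TABLE = {frozenset({"sky", "sea"}): 0.7}


def toy_sim(a, b):
    return 1.0 if a == b else TABLE.get(frozenset({a, b}), 0.0)


print(sample_similarity({"sky", "sea"}, {"sky", "sea"}, toy_sim))  # 1.0
print(sample_similarity({"dog"}, {"sea"}, toy_sim))                # 0.0
print(sample_similarity({"sky", "dog"}, {"sea"}, toy_sim))         # between 0 and 1
```

Unlike the binary rule, the third pair gets a graded score even though the two samples share no label, because "sky" and "sea" are related through co-occurrence.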

In the hash code learning module, $F^{(g)}_{*i}$ denotes the learned image feature of sample $D_i$, that is, the output of the image feature extraction network, and $F^{(x)}_{*j}$ denotes the learned text feature of sample $D_j$, that is, the output of the text feature extraction network. Each of the two deep networks has its own parameters.

So that the learned hash codes preserve the semantic information of the binary similarity matrix, a sigmoid cross-entropy loss function is adopted. To keep training stable and avoid numerical overflow, an equivalent form of this loss is used in the implementation.

Building on this binary semantic loss, a multi-level semantic loss is further introduced so that the learned model retains the richer semantic information contained in the multi-level semantic similarity matrix. The same equivalent form of the sigmoid cross-entropy loss is used for this term.
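The overflow concern can be illustrated with the usual numerically stable rewriting of sigmoid cross-entropy. In this sketch `theta` stands for the score compared against a similarity target `s`. How the patent derives that score from the features is given in a formula that is not reproduced in this text, so it is simply treated as an input here. The rewriting relies on the identity $\log(1 + e^{\theta}) = \max(\theta, 0) + \log(1 + e^{-|\theta|})$.

```python
import math


def naive_loss(theta: float, s: float) -> float:
    """Negative log-likelihood log(1 + e^theta) - s * theta; overflows for large theta."""
    return math.log(1.0 + math.exp(theta)) - s * theta


def stable_loss(theta: float, s: float) -> float:
    """Same value, rewritten so that the exponent is never positive."""
    return max(theta, 0.0) - s * theta + math.log1p(math.exp(-abs(theta)))


# Both forms agree for moderate scores, with binary or continuous targets.
print(naive_loss(2.5, 0.4), stable_loss(2.5, 0.4))

# The stable form stays finite where naive_loss(1000.0, 1.0) raises OverflowError.
print(stable_loss(1000.0, 1.0))
```

Because the target `s` only enters through the product `s * theta`, the same function works for the binary matrix (targets in {0, 1}) and for the multi-level matrix (targets in [0, 1]).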

The complete objective function is obtained by combining these terms. In it, $F^{(g)}$ and $F^{(x)}$ denote the learned features of images and texts, which carry the semantic information of the similarity matrices; $C^{(g)}$ and $C^{(x)}$ denote the hash codes of images and texts; and $\mathrm{sign}(\cdot)$ is the sign function, through which the semantic information in $F^{(g)}$ and $F^{(x)}$ is passed to $C^{(g)}$ and $C^{(x)}$, as in Equations (9) and (10). The norm used is the Frobenius norm, $E$ is a vector whose elements are all 1, and $\mu$, $\rho$, $\tau$ are hyperparameters.

$$C^{(g)} = \mathrm{sign}(F^{(g)}) \tag{9}$$

$$C^{(x)} = \mathrm{sign}(F^{(x)}) \tag{10}$$
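Equations (9) and (10) amount to an element-wise sign applied to the real-valued features. A minimal sketch follows; mapping an exact zero to +1 is an assumption, since the text does not say how zero is handled.

```python
def sign(values):
    """Element-wise sign; zero is mapped to +1 here (an assumption)."""
    return [1 if v >= 0 else -1 for v in values]


feature = [0.3, -1.2, 0.0, -0.01]
print(sign(feature))  # [1, -1, 1, -1]
```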

The first two terms of the objective function are negative log-likelihoods of the cross-modal similarities. Optimizing them ensures that the larger the similarity between image i and text j in the similarity matrix, the greater the similarity between $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$, and the smaller that similarity, the smaller the similarity between $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$. Optimizing terms 1 and 2 therefore guarantees that the image and text features learned by the network preserve the cross-modal similarity of the original semantic space.

The third term of the objective function is a regularization term. Optimizing it yields the hash codes $C^{(g)}$ and $C^{(x)}$ of images and texts while preserving the similarity between the features $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$ extracted by the network. Because $F^{(g)}_{*i}$ and $F^{(x)}_{*j}$ preserve the cross-modal similarity of the semantic space, the resulting hash codes preserve it as well.
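One common way to realize such a term is to penalize the squared Frobenius distance between the binary codes and the real-valued features they are derived from. The sketch below assumes that form. The patent's own expression for the third term is not reproduced in this text, and the function names are hypothetical.

```python
def sign_rows(features):
    """Binary codes from features: element-wise sign, with zero mapped to +1."""
    return [[1 if v >= 0 else -1 for v in row] for row in features]


def quantization_penalty(features, codes):
    """Squared Frobenius norm of (codes - features), summed over all entries."""
    return sum(
        (c - f) ** 2
        for f_row, c_row in zip(features, codes)
        for f, c in zip(f_row, c_row)
    )


features = [[0.9, -0.8], [0.1, -1.0]]
codes = sign_rows(features)
print(codes)                                  # [[1, -1], [1, -1]]
print(quantization_penalty(features, codes))  # approximately 0.86
```

The penalty is small when features already sit close to +1 or -1, so little information is lost when they are binarized.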

Optimizing the fourth term of the objective function keeps each bit of the final hash codes balanced over the whole training set: at any given bit position, "1" and "-1" each occur in half of the codes. This constraint maximizes the information carried by each bit of the hash code.
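Multiplying the feature matrix by the all-ones vector $E$ sums each bit over the training set, so a balance penalty can be sketched as the squared norm of those per-bit sums. This form is an assumption consistent with the description of $E$. The sketch stores one sample per row, which is a layout choice made here for readability.

```python
def balance_penalty(features):
    """Squared norm of the per-bit sums over all samples (rows = samples, columns = bits)."""
    bit_sums = [sum(column) for column in zip(*features)]
    return sum(total ** 2 for total in bit_sums)


balanced = [[1, -1], [-1, 1]]    # each bit is +1 once and -1 once
unbalanced = [[1, 1], [1, -1]]   # the first bit is always +1

print(balance_penalty(balanced))    # 0
print(balance_penalty(unbalanced))  # 4
```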

Experiments show that, during training, forcing the image and the text from the same data point to take exactly the same hash code improves the performance of the network. The constraint $C^{(g)} = C^{(x)} = C$ is therefore added to the original objective function, which gives the final objective function.

Optimizing this objective function lets the network learn the feature extraction parameters and the hash code representation at the same time. Feature learning and hash code learning are thus unified in one deep learning framework and learned end to end.

In the testing and application stage, any single-modality input, whether an image or a text, can be passed through the trained network to generate its binary code vector, that is, its hash code.

Specifically, the image modality $g_i$ of data point $D_i$ is fed into the network, and its hash code is generated by forward propagation followed by the sign function.

Similarly, for the text modality $x_j$ of data point $D_j$, forward propagation through the network followed by the sign function generates the corresponding hash code.

Given query data in either modality, image or text, the proposed DMSH retrieval model can therefore return the top k most similar results from a database of the other modality. During retrieval, the distance between the hash code of the query and each hash code stored in the database is computed first, and the k hash codes with the smallest distances are returned. The k data items they correspond to are the final retrieval results.
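The retrieval step can be sketched as a ranking by Hamming distance, the natural distance for binary codes. The function names and toy codes below are illustrative, and ties are broken by database order because Python's sort is stable.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve_top_k(query_code, database_codes, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:k]


# Hash code of an image query and the stored hash codes of four texts.
query = [1, -1, 1, 1]
database = [
    [1, -1, 1, 1],     # distance 0
    [-1, 1, -1, -1],   # distance 4
    [1, -1, -1, 1],    # distance 1
    [1, 1, 1, 1],      # distance 1
]

print(retrieve_top_k(query, database, k=2))  # [0, 2]
```

Because image and text codes live in the same Hamming space, the same routine serves image-to-text and text-to-image queries.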

Claims (5)

1. An image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm, characterized in that: the overall framework comprises three modules, namely a deep feature extraction module, a similarity matrix generation module, and a hash code learning module; two deep neural networks extract image and text features respectively, and feature learning and hash code learning are unified in one framework; multi-level semantic supervision information based on label co-occurrence guides the whole training process, so that the resulting binary codes not only preserve the basic similar/dissimilar relationships of the original sample space but also distinguish degrees of similarity between samples, preserving more of the high-level semantics between samples and improving retrieval accuracy; structurally, the network is trained under the constraint that "images and texts that are similar in the semantic space have similar hash codes in the Hamming space", and the hash codes are taken directly as the output of the network to achieve end-to-end learning, ensuring that the learned features suit the specific retrieval task.

2. The image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm according to claim 1, characterized in that: the overall framework consists of three parts, namely the deep feature extraction module, the similarity matrix generation module, and the hash code learning module; by mapping data in the original space to binary code vectors in the Hamming space composed of uniform "+1/-1" values, storage space is reduced and computational efficiency is improved.

3. The image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm according to claim 1, characterized in that: the deep feature extraction module uses a different deep neural network for image data and for text data to extract the semantic features of the two modalities, an improved CNN-F network for image data and a multi-layer perceptron network for text data.

4. The image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm according to claim 1, characterized in that: the similarity matrix generation module generates the binary similarity matrix according to whether data of different modalities have labels in common, and generates the multi-level semantic similarity matrix according to the degree of similarity between the labels of data of different modalities, retaining more of the semantic information provided by the labels.

5. The image-text cross-modal retrieval method based on a multi-level semantic deep hashing algorithm according to claim 1, characterized in that: the hash code learning module trains the network with an objective function designed to preserve both the binary similarity information and the multi-level semantic similarity information of the data in the original semantic space, learning the mapping from the feature space to the Hamming space.
Priority and family application (1)

Application number: CN201810649234.7A
Priority date: 2018-06-22
Filing date: 2018-06-22
Publication number: CN110110122A
Publication date: 2019-08-09
Status: Pending
Family ID: 67483310
Country status: CN (1): CN110110122A

Cited By (33)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN110597878A (en)*2019-09-162019-12-20广东工业大学 A cross-modal retrieval method, device, equipment and medium for multimodal data
CN110765281A (en)*2019-11-042020-02-07山东浪潮人工智能研究院有限公司 A Multi-Semantic Deeply Supervised Cross-modal Hash Retrieval Method
CN110990597A (en)*2019-12-192020-04-10中国电子科技集团公司信息科学研究院 Cross-modal data retrieval system and retrieval method based on text semantic mapping
CN111026887A (en)*2019-12-092020-04-17武汉科技大学Cross-media retrieval method and system
CN111125457A (en)*2019-12-132020-05-08山东浪潮人工智能研究院有限公司 A deep cross-modal hash retrieval method and device
CN111177421A (en)*2019-12-302020-05-19论客科技(广州)有限公司Method and device for generating email historical event axis facing digital human
CN111221993A (en)*2020-01-092020-06-02山东建筑大学Visual media retrieval method based on depth binary detail perception hash
CN111353076A (en)*2020-02-212020-06-30华为技术有限公司Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111368176A (en)*2020-03-022020-07-03南京财经大学Cross-modal Hash retrieval method and system based on supervision semantic coupling consistency
CN111651660A (en)*2020-05-282020-09-11拾音智能科技有限公司Method for cross-media retrieval of difficult samples
CN111813967A (en)*2020-07-142020-10-23中国科学技术信息研究所Retrieval method, retrieval device, computer equipment and storage medium
CN111897909A (en)*2020-08-032020-11-06兰州理工大学 A method and system for ciphertext speech retrieval based on depth-aware hashing
CN111914156A (en)*2020-08-142020-11-10中国科学院自动化研究所 Adaptive label-aware graph convolutional network cross-modal retrieval method and system
CN112035700A (en)*2020-08-312020-12-04兰州理工大学Voice deep hash learning method and system based on CNN
CN112100413A (en)*2020-09-072020-12-18济南浪潮高新科技投资发展有限公司 A cross-modal hash retrieval method
CN112199520A (en)*2020-09-192021-01-08复旦大学Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN112581477A (en)*2019-09-272021-03-30京东方科技集团股份有限公司Image processing method, image matching method, device and storage medium
CN112613451A (en)*2020-12-292021-04-06民生科技有限责任公司Modeling method of cross-modal text picture retrieval model
CN113095415A (en)*2021-04-152021-07-09齐鲁工业大学Cross-modal hashing method and system based on multi-modal attention mechanism
CN113157739A (en)*2021-04-232021-07-23平安科技(深圳)有限公司Cross-modal retrieval method and device, electronic equipment and storage medium
CN113177132A (en)*2021-06-302021-07-27中国海洋大学Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN113270199A (en)*2021-04-302021-08-17贵州师范大学Medical cross-modal multi-scale fusion class guidance hash method and system thereof
CN113342922A (en)*2021-06-172021-09-03北京邮电大学Cross-modal retrieval method based on fine-grained self-supervision of labels
CN113536067A (en)*2021-07-202021-10-22南京邮电大学Cross-modal information retrieval method based on semantic fusion
CN113658683A (en)*2021-08-052021-11-16重庆金山医疗技术研究院有限公司Disease diagnosis system and data recommendation method
CN113792207A (en)*2021-09-292021-12-14嘉兴学院Cross-modal retrieval method based on multi-level feature representation alignment
CN113806580A (en)*2021-09-282021-12-17西安电子科技大学 A Cross-modal Hash Retrieval Method Based on Hierarchical Semantic Structure
CN114239730A (en)*2021-12-202022-03-25华侨大学 A Cross-modal Retrieval Method Based on Neighbor Ranking Relation
CN114359930A (en)*2021-12-172022-04-15华南理工大学Depth cross-modal hashing method based on fusion similarity
CN114780777A (en)*2022-04-062022-07-22中国科学院上海高等研究院 Cross-modal retrieval method and device, storage medium and terminal based on semantic enhancement
CN116955675A (en)*2023-09-212023-10-27中国海洋大学 Hash image retrieval method and network based on fine-grained similarity relationship contrastive learning
CN118839699A (en)*2024-07-122024-10-25电子科技大学Weak-supervision cross-mode semantic consistency recovery method
CN118884530A (en)*2024-07-302024-11-01北京交通大学 A method for distinguishing earthquake response of ancient buildings based on OpenCV and perceptual hashing algorithm

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2004006128A2 (en) * | 2002-07-09 | 2004-01-15 | Koninklijke Philips Electronics N.V. | Method and apparatus for classification of a data object in a database
CN104166982A (en) * | 2014-06-30 | 2014-11-26 | 复旦大学 | Image optimization clustering method based on typical correlation analysis
CN104834748A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Image retrieval method utilizing deep semantic to rank hash codes
CN105760507A (en) * | 2016-02-23 | 2016-07-13 | 复旦大学 | Cross-modal subject correlation modeling method based on deep learning
CN107679580A (en) * | 2017-10-21 | 2018-02-09 | 桂林电子科技大学 | A kind of isomery shift image feeling polarities analysis method based on the potential association of multi-modal depth
CN107766555A (en) * | 2017-11-02 | 2018-03-06 | 电子科技大学 | Image search method based on the unsupervised type cross-module state Hash of soft-constraint
CN108170755A (en) * | 2017-12-22 | 2018-06-15 | 西安电子科技大学 | Cross-module state Hash search method based on triple depth network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
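Yue Cao et al.: "Deep Visual-Semantic Hashing for Cross-Modal Retrieval", 《KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining》 *
Zhenyan Ji et al.: "A Survey of Personalised Image Retrieval and Recommendation", 《Theoretical Computer Science (2017)》 *
姚伟娜: "Research on Image-Text Cross-Modal Retrieval Based on Deep Hashing Algorithms", 《China Master's Theses Full-text Database, Information Science and Technology》 *
张玉宏 et al.: "An Analysis of the Methodology of Deep Learning", 《Journal of Chongqing University of Technology (Social Science)》 *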

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110597878A (en) * | 2019-09-16 | 2019-12-20 | 广东工业大学 | A cross-modal retrieval method, device, equipment and medium for multimodal data
CN110597878B (en) * | 2019-09-16 | 2023-09-15 | 广东工业大学 | A cross-modal retrieval method, device, equipment and medium for multi-modal data
CN112581477A (en) * | 2019-09-27 | 2021-03-30 | 京东方科技集团股份有限公司 | Image processing method, image matching method, device and storage medium
CN110765281A (en) * | 2019-11-04 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | A Multi-Semantic Deeply Supervised Cross-modal Hash Retrieval Method
CN111026887A (en) * | 2019-12-09 | 2020-04-17 | 武汉科技大学 | Cross-media retrieval method and system
CN111026887B (en) * | 2019-12-09 | 2023-05-23 | 武汉科技大学 | A method and system for cross-media retrieval
CN111125457A (en) * | 2019-12-13 | 2020-05-08 | 山东浪潮人工智能研究院有限公司 | A deep cross-modal hash retrieval method and device
CN110990597A (en) * | 2019-12-19 | 2020-04-10 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system and retrieval method based on text semantic mapping
CN110990597B (en) * | 2019-12-19 | 2022-11-25 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system and retrieval method based on text semantic mapping
CN111177421A (en) * | 2019-12-30 | 2020-05-19 | 论客科技(广州)有限公司 | Method and device for generating email historical event axis facing digital human
WO2021136318A1 (en) * | 2019-12-30 | 2021-07-08 | 论客科技(广州)有限公司 | Digital humanities-oriented email history eventline generating method and apparatus
CN111221993A (en) * | 2020-01-09 | 2020-06-02 | 山东建筑大学 | Visual media retrieval method based on depth binary detail perception hash
CN111221993B (en) * | 2020-01-09 | 2023-07-07 | 山东建筑大学 | Visual media retrieval method based on depth binary detail perception hash
CN111353076B (en) * | 2020-02-21 | 2023-10-10 | 华为云计算技术有限公司 | Methods for training cross-modal retrieval models, cross-modal retrieval methods and related devices
CN111353076A (en) * | 2020-02-21 | 2020-06-30 | 华为技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111368176A (en) * | 2020-03-02 | 2020-07-03 | 南京财经大学 | Cross-modal Hash retrieval method and system based on supervision semantic coupling consistency
CN111368176B (en) * | 2020-03-02 | 2023-08-18 | 南京财经大学 | Cross-modal hash retrieval method and system based on supervision semantic coupling consistency
CN111651660A (en) * | 2020-05-28 | 2020-09-11 | 拾音智能科技有限公司 | Method for cross-media retrieval of difficult samples
CN111651660B (en) * | 2020-05-28 | 2023-05-02 | 拾音智能科技有限公司 | Method for cross-media retrieval of difficult samples
CN111813967B (en) * | 2020-07-14 | 2024-01-30 | 中国科学技术信息研究所 | Retrieval method, retrieval device, computer equipment and storage medium
CN111813967A (en) * | 2020-07-14 | 2020-10-23 | 中国科学技术信息研究所 | Retrieval method, retrieval device, computer equipment and storage medium
CN111897909A (en) * | 2020-08-03 | 2020-11-06 | 兰州理工大学 | A method and system for ciphertext speech retrieval based on depth-aware hashing
CN111914156A (en) * | 2020-08-14 | 2020-11-10 | 中国科学院自动化研究所 | Adaptive label-aware graph convolutional network cross-modal retrieval method and system
CN111914156B (en) * | 2020-08-14 | 2023-01-20 | 中国科学院自动化研究所 | Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
CN112035700A (en) * | 2020-08-31 | 2020-12-04 | 兰州理工大学 | Voice deep hash learning method and system based on CNN
CN112100413A (en) * | 2020-09-07 | 2020-12-18 | 济南浪潮高新科技投资发展有限公司 | A cross-modal hash retrieval method
CN112199520A (en) * | 2020-09-19 | 2021-01-08 | 复旦大学 | Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN112199520B (en) * | 2020-09-19 | 2022-07-22 | 复旦大学 | Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN112613451A (en) * | 2020-12-29 | 2021-04-06 | 民生科技有限责任公司 | Modeling method of cross-modal text picture retrieval model
CN113095415B (en) * | 2021-04-15 | 2022-06-14 | 齐鲁工业大学 | A cross-modal hashing method and system based on multimodal attention mechanism
CN113095415A (en) * | 2021-04-15 | 2021-07-09 | 齐鲁工业大学 | Cross-modal hashing method and system based on multi-modal attention mechanism
CN113157739B (en) * | 2021-04-23 | 2024-01-09 | 平安科技(深圳)有限公司 | Cross-modal retrieval method and device, electronic equipment and storage medium
CN113157739A (en) * | 2021-04-23 | 2021-07-23 | 平安科技(深圳)有限公司 | Cross-modal retrieval method and device, electronic equipment and storage medium
CN113270199B (en) * | 2021-04-30 | 2024-04-26 | 贵州师范大学 | Medical cross-mode multi-scale fusion class guide hash method and system thereof
CN113270199A (en) * | 2021-04-30 | 2021-08-17 | 贵州师范大学 | Medical cross-modal multi-scale fusion class guidance hash method and system thereof
CN113342922A (en) * | 2021-06-17 | 2021-09-03 | 北京邮电大学 | Cross-modal retrieval method based on fine-grained self-supervision of labels
CN113177132A (en) * | 2021-06-30 | 2021-07-27 | 中国海洋大学 | Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN113536067B (en) * | 2021-07-20 | 2024-01-05 | 南京邮电大学 | Cross-modal information retrieval method based on semantic fusion
CN113536067A (en) * | 2021-07-20 | 2021-10-22 | 南京邮电大学 | Cross-modal information retrieval method based on semantic fusion
CN113658683A (en) * | 2021-08-05 | 2021-11-16 | 重庆金山医疗技术研究院有限公司 | Disease diagnosis system and data recommendation method
CN113806580B (en) * | 2021-09-28 | 2023-10-20 | 西安电子科技大学 | Cross-modal hash retrieval method based on hierarchical semantic structure
CN113806580A (en) * | 2021-09-28 | 2021-12-17 | 西安电子科技大学 | A Cross-modal Hash Retrieval Method Based on Hierarchical Semantic Structure
CN113792207A (en) * | 2021-09-29 | 2021-12-14 | 嘉兴学院 | Cross-modal retrieval method based on multi-level feature representation alignment
CN113792207B (en) * | 2021-09-29 | 2023-11-17 | 嘉兴学院 | Cross-modal retrieval method based on multi-level feature representation alignment
CN114359930A (en) * | 2021-12-17 | 2022-04-15 | 华南理工大学 | Depth cross-modal hashing method based on fusion similarity
CN114359930B (en) * | 2021-12-17 | 2024-09-17 | 华南理工大学 | Depth cross-modal hash method based on fusion similarity
CN114239730A (en) * | 2021-12-20 | 2022-03-25 | 华侨大学 | A Cross-modal Retrieval Method Based on Neighbor Ranking Relation
CN114239730B (en) * | 2021-12-20 | 2024-08-20 | 华侨大学 | Cross-modal retrieval method based on neighbor ordering relation
CN114780777A (en) * | 2022-04-06 | 2022-07-22 | 中国科学院上海高等研究院 | Cross-modal retrieval method and device, storage medium and terminal based on semantic enhancement
CN114780777B (en) * | 2022-04-06 | 2022-12-20 | 中国科学院上海高等研究院 | Cross-modal retrieval method and device, storage medium and terminal based on semantic enhancement
CN116955675B (en) * | 2023-09-21 | 2023-12-12 | 中国海洋大学 | Hash image retrieval method and network based on fine-grained similarity relation contrast learning
CN116955675A (en) * | 2023-09-21 | 2023-10-27 | 中国海洋大学 | Hash image retrieval method and network based on fine-grained similarity relationship contrastive learning
CN118839699A (en) * | 2024-07-12 | 2024-10-25 | 电子科技大学 | Weak-supervision cross-mode semantic consistency recovery method
CN118884530A (en) * | 2024-07-30 | 2024-11-01 | 北京交通大学 | A method for distinguishing earthquake response of ancient buildings based on OpenCV and perceptual hashing algorithm
CN118884530B (en) * | 2024-07-30 | 2025-05-16 | 北京交通大学 | Ancient building earthquake response discrimination method of OpenCV and perceptual hash algorithm

Similar Documents

Publication | Publication Date | Title
CN110110122A (en) | | Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval
Shu et al. | | Specific class center guided deep hashing for cross-modal retrieval
Wu et al. | | Weakly semi-supervised deep learning for multi-label image annotation
Yang et al. | | Transfer learning for sequence tagging with hierarchical recurrent networks
Jiang et al. | | A survey on artificial intelligence in Chinese sign language recognition
Wang et al. | | Cross-modal retrieval: a systematic review of methods and future directions
Zhu et al. | | Multi-modal deep analysis for multimedia
Su et al. | | Semi-supervised knowledge distillation for cross-modal hashing
Wang et al. | | Facilitating image search with a scalable and compact semantic mapping
Liu et al. | | OMGH: Online manifold-guided hashing for flexible cross-modal retrieval
CN113672693B (en) | | Tag recommendation method for online question answering platform based on knowledge graph and tag association
Li et al. | | DAHP: Deep attention-guided hashing with pairwise labels
Wang et al. | | Large-scale text classification using scope-based convolutional neural network: A deep learning approach
US11935278B1 (en) | | Image labeling for artificial intelligence datasets
CN114168784A (en) | | A Hierarchical Supervised Cross-modal Image and Text Retrieval Method
Lin et al. | | Multi-modality weakly labeled sentiment learning based on explicit emotion signal for Chinese microblog
Zhang et al. | | Learning multi-layer coarse-to-fine representations for large-scale image classification
Xu et al. | | Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning
Wang et al. | | Robust local metric learning via least square regression regularization for scene recognition
Liu et al. | | Sparse autoencoder for social image understanding
Li et al. | | Multimodal fusion with co-attention mechanism
Perdana et al. | | Instance-based deep transfer learning on cross-domain image captioning
Wu et al. | | Contrastive multi-bit collaborative learning for deep cross-modal hashing
Meng et al. | | Concept-concept association information integration and multi-model collaboration for multimedia semantic concept detection
CN109255098B (en) | | A matrix factorization hashing method based on reconstruction constraints

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication |

Application publication date: 2019-08-09

