技术领域Technical field
本发明涉及图像处理领域,尤其涉及一种图像处理装置和方法。The present invention relates to the field of image processing, and in particular, to an image processing device and method.
背景技术Background art
用户在拍完照片后,为了展现更好的图像效果,会通过电脑中的PS软件或者手机中的修图软件对图像进行处理。After taking a photo, a user who wants a better-looking image typically processes it with PS software on a computer or photo editing software on a mobile phone.
但是,在使用电脑中PS软件或者手机中的修图软件对图像处理之前,用户需要学习掌握软件的使用方法,并且在掌握软件的使用方法后,需要手动输入指令来控制电脑或者手机进行修图操作。这种方式对于用户来说,既耗费时间,并且用户体验差。However, before using the PS software on a computer or the photo editing software on a mobile phone to process an image, the user needs to learn how to use the software and, having done so, must manually input instructions to control the computer or mobile phone to perform the retouching operations. This approach is time-consuming and gives the user a poor experience.
发明内容Summary of the invention
本发明实施例提供一种图像处理装置及方法,实现了输入语音即可对图像进行处理的功能,节省了用户在图像处理之前学习图像处理软件的时间,提高了用户体验。Embodiments of the present invention provide an image processing device and method, which realizes the function of processing images by inputting voice, saves users time in learning image processing software before image processing, and improves user experience.
第一方面,本发明实施例提供一种图像处理装置,包括:In a first aspect, an embodiment of the present invention provides an image processing device, including:
输入输出单元,用于输入语音信号和待处理图像;Input and output unit, used to input speech signals and images to be processed;
存储单元,用于存储所述语音信号和所述待处理图像;A storage unit used to store the voice signal and the image to be processed;
图像处理单元,用于将所述语音信号转换成图像处理指令和目标区域,所述目标区域为待处理图像的处理区域;并根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,并将所述待处理图像存储到所述存储单元中;An image processing unit, configured to convert the voice signal into an image processing instruction and a target area, where the target area is the processing area of the image to be processed; to process the target area according to the image processing instruction to obtain a processed image; and to store the image to be processed in the storage unit;
所述输入输出单元,还用于将所述处理后的图像输出。The input and output unit is also used to output the processed image.
在一种可行的实施例中,所述存储单元包括神经元存储单元和权值缓存单元,所述图像处理单元的神经网络运算单元包括神经网络运算子单元;In a feasible embodiment, the storage unit includes a neuron storage unit and a weight cache unit, and the neural network operation unit of the image processing unit includes a neural network operation subunit;
当所述神经元存储单元用于存储所述语音信号和所述待处理图像且所述权值缓存单元用于存储目标语音指令转换模型和目标图像处理模型时,所述神经网络运算子单元用于根据所述目标语音指令转换模型将所述语音信号转换成所述图像处理指令和所述目标区域;When the neuron storage unit is used to store the speech signal and the image to be processed, and the weight cache unit is used to store a target voice instruction conversion model and a target image processing model, the neural network operation subunit is used to convert the speech signal into the image processing instruction and the target area according to the target voice instruction conversion model;
所述神经网络运算子单元,还用于根据所述目标图像处理模型和所述图像处理指令对所述目标区域进行处理,以得到处理后的图像;The neural network operation subunit is also used to process the target area according to the target image processing model and the image processing instruction to obtain a processed image;
所述神经网络运算子单元,还用于将所述处理后的图像存储到所述神经元存储单元中。The neural network operation subunit is also used to store the processed image into the neuron storage unit.
在一种可行的实施例中,所述存储单元包括通用数据缓存单元,所述图像处理单元的神经网络运算单元包括通用运算子单元;In a feasible embodiment, the storage unit includes a general data cache unit, and the neural network operation unit of the image processing unit includes a general operation subunit;
当所述通用数据缓存单元用于存储所述语音信号和所述待处理图像时,所述通用运算子单元用于将所述语音信号转换成所述图像处理指令和所述目标区域;When the general data cache unit is used to store the voice signal and the image to be processed, the general operation subunit is used to convert the voice signal into the image processing instruction and the target area;
所述通用运算子单元,还用于根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像;The general operation subunit is also used to process the target area according to the image processing instruction to obtain a processed image;
所述通用运算子单元,还用于将所述处理后的图像存储到所述通用数据存储单元中。The general operation subunit is also used to store the processed image into the general data storage unit.
在一种可行的实施例中,所述神经网络运算子单元具体用于:In a feasible embodiment, the neural network operation subunit is specifically used to:
根据语音识别技术将所述语音信号转换成文本信息;Convert the speech signal into text information according to speech recognition technology;
根据自然语言处理技术和所述目标语音指令转换模型将所述文本信息转换成所述图像处理指令;Convert the text information into the image processing instructions according to natural language processing technology and the target voice instruction conversion model;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述神经网络运算子单元具体用于:In a feasible embodiment, the neural network operation subunit is specifically used to:
根据语音识别技术、语义理解技术和所述目标语音指令转换模型将所述语音信号转换成所述图像处理指令;Convert the speech signal into the image processing instruction according to speech recognition technology, semantic understanding technology and the target speech instruction conversion model;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述通用运算子单元具体用于:In a feasible embodiment, the general operation subunit is specifically used for:
根据语音识别技术将所述语音信号转换成文本信息;Convert the speech signal into text information according to speech recognition technology;
根据自然语言处理技术将所述文本信息转换成所述图像处理指令;Convert the text information into the image processing instructions according to natural language processing technology;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述通用运算子单元具体用于:In a feasible embodiment, the general operation subunit is specifically used for:
根据语音识别技术和语义理解技术将所述语音信号转换成所述图像处理指令;Convert the speech signal into the image processing instruction according to speech recognition technology and semantic understanding technology;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
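The conversion variants above share the same outline: recognized text is mapped to an image processing instruction, and the semantic region named in that instruction selects the target area. The following is a minimal Python sketch of that outline, not the claimed implementation. The keyword tables, the `ImageInstruction` type, and both function names are hypothetical; speech recognition is assumed to have already produced the text, and image recognition is assumed to have already labelled each pixel with a region name.

```python
from dataclasses import dataclass

# Hypothetical keyword tables standing in for a trained instruction conversion model.
OPERATIONS = {"blur": "blur", "brighten": "brighten", "sharpen": "sharpen"}
REGIONS = {"face": "face", "background": "background"}


@dataclass(frozen=True)
class ImageInstruction:
    operation: str  # what to do
    region: str     # semantic region the instruction refers to


def text_to_instruction(text: str) -> ImageInstruction:
    """Map recognized text to an instruction by simple keyword matching."""
    lowered = text.lower()
    operation = next((op for word, op in OPERATIONS.items() if word in lowered), None)
    # Assumption: with no region keyword, the instruction applies to the whole image.
    region = next((r for word, r in REGIONS.items() if word in lowered), "whole_image")
    if operation is None:
        raise ValueError(f"no known operation in: {text!r}")
    return ImageInstruction(operation, region)


def select_target_area(label_map, instruction):
    """Return the (row, col) pixels whose label matches the instruction's region."""
    return [
        (r, c)
        for r, row in enumerate(label_map)
        for c, label in enumerate(row)
        if instruction.region in ("whole_image", label)
    ]


instruction = text_to_instruction("Please blur the background")
labels = [["face", "background"],
          ["face", "background"]]
target_area = select_target_area(labels, instruction)
```

In this sketch the granularity of the semantic region is simply the label vocabulary of the pre-segmented image; a finer vocabulary (for example "eyes" instead of "face") would yield a smaller target area.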
在一种可行的实施例中,所述神经元存储单元用于存储所述目标区域和所述图像处理指令。In a feasible embodiment, the neuron storage unit is used to store the target area and the image processing instructions.
在一种可行的实施例中,所述通用数据缓存单元用于存储所述目标区域和所述图像处理指令。In a feasible embodiment, the general data cache unit is used to store the target area and the image processing instruction.
在一种可行的实施例中,所述神经网络运算子单元用于:In a feasible embodiment, the neural network operation subunit is used for:
在预设时间窗口内从所述神经元存储单元中获取M条图像处理指令;Obtain M image processing instructions from the neuron storage unit within the preset time window;
删除所述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,所述M为大于1的整数,所述N为小于所述M的整数;Delete image processing instructions with the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1, and N is an integer less than M;
根据所述N条图像处理指令和所述目标图像处理模型对所述目标区域进行处理,以得到处理后的图像。The target area is processed according to the N image processing instructions and the target image processing model to obtain a processed image.
在一种可行的实施例中,所述通用运算子单元用于:In a feasible embodiment, the general operation subunit is used for:
在预设时间窗口内从所述通用数据缓存单元中获取M条图像处理指令;Obtain M image processing instructions from the general data cache unit within a preset time window;
删除所述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,所述M为大于1的整数,所述N为小于所述M的整数;Delete image processing instructions with the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1, and N is an integer less than M;
根据所述N条图像处理指令对所述目标区域进行处理,以得到处理后的图像。The target area is processed according to the N image processing instructions to obtain a processed image.
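The time-window step above can be sketched as follows: collect the M instructions that fall inside the window, then drop those that repeat an earlier instruction's function so that N remain. This Python sketch makes two assumptions that the text does not state: two instructions have "the same function" when their operation and region match, and the earliest copy is the one kept. The `TimedInstruction` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedInstruction:
    timestamp: float  # seconds, when the instruction was stored
    operation: str
    region: str


def collect_and_deduplicate(stored, window_start, window_length):
    """Return (instructions in the window, the same list without repeats)."""
    window_end = window_start + window_length
    in_window = [i for i in stored if window_start <= i.timestamp < window_end]

    seen = set()
    unique = []
    for instr in in_window:
        key = (instr.operation, instr.region)  # assumed definition of "same function"
        if key not in seen:
            seen.add(key)
            unique.append(instr)  # keep the first occurrence only
    return in_window, unique


stored = [
    TimedInstruction(0.5, "blur", "background"),
    TimedInstruction(1.0, "brighten", "face"),
    TimedInstruction(1.5, "blur", "background"),  # repeats the first instruction
    TimedInstruction(9.0, "sharpen", "face"),     # outside the window
]
m_instructions, n_instructions = collect_and_deduplicate(stored, 0.0, 2.0)
```

With this sample data M is 3 and N is 2, which satisfies the stated condition that N is less than M; if the window contains no repeats, the two lists are equal.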
在一种可行的实施例中,所述神经网络运算子单元还用于:In a feasible embodiment, the neural network operation subunit is also used to:
对语音指令转换模型进行自适应训练,以得到所述目标语音指令转换模型。Perform adaptive training on the voice command conversion model to obtain the target voice command conversion model.
在一种可行的实施例中,所述神经网络运算子单元还用于:In a feasible embodiment, the neural network operation subunit is also used to:
根据所述语音指令转换模型将所述语音信号转换成预测指令;Convert the speech signal into a predicted instruction according to the voice command conversion model;
确定所述预测指令与其对应的指令集合的相关系数;Determine the correlation coefficient between the predicted instruction and its corresponding instruction set;
根据所述预测指令与其对应的指令集合的相关系数优化所述语音指令转换模型,以得到所述目标语音指令转换模型。The voice command conversion model is optimized according to the correlation coefficient between the predicted command and its corresponding command set to obtain the target voice command conversion model.
在一种可行的实施例中,所述神经网络运算子单元还用于:In a feasible embodiment, the neural network operation subunit is also used to:
对图像处理模型进行自适应训练,以得到所述目标图像处理模型。Perform adaptive training on the image processing model to obtain the target image processing model.
在一种可行的实施例中,所述神经网络运算子单元还用于:In a feasible embodiment, the neural network operation subunit is also used to:
根据所述图像处理模型对所述待处理图像进行处理,以得到预测图像;Process the image to be processed according to the image processing model to obtain a predicted image;
确定所述预测图像与其对应的目标图像的相关系数;Determine the correlation coefficient between the predicted image and its corresponding target image;
根据所述预测图像与其对应的目标图像的相关系数优化所述图像处理模型,以得到所述目标图像处理模型。The image processing model is optimized according to the correlation coefficient between the predicted image and its corresponding target image to obtain the target image processing model.
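As an illustration of the three training steps above, the sketch below scores each predicted image against its target image with a Pearson correlation coefficient and keeps the model parameter that scores highest. The text does not specify the model, the coefficient, or the optimizer, so everything here is an assumption: the "model" is a single hypothetical gamma exponent, the coefficient is Pearson's, and a grid search over candidate values stands in for whatever optimization is actually used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def apply_model(pixels, gamma):
    """Hypothetical one-parameter image processing model: gamma correction."""
    return [p ** gamma for p in pixels]


def adapt(pixels, target, candidate_gammas):
    """Pick the gamma whose predicted image correlates best with the target."""
    return max(candidate_gammas,
               key=lambda g: pearson(apply_model(pixels, g), target))


pixels = [0.1, 0.3, 0.5, 0.7, 0.9]      # image to be processed, values in [0, 1]
target = [p ** 2.0 for p in pixels]     # target image for this training sample
best_gamma = adapt(pixels, target, [0.5, 1.0, 2.0, 3.0])
```

A real model would have many parameters and would more likely be updated by gradient-based training; the grid search only shows how a correlation score can drive the choice between candidate models.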
在一种可行的实施例中,所述图像处理装置的图像处理单元还包括:In a feasible embodiment, the image processing unit of the image processing device further includes:
指令缓存单元,用于存储待执行的指令,该指令包括神经网络运算指令和通用运算指令;An instruction cache unit is used to store instructions to be executed, which include neural network operation instructions and general operation instructions;
指令处理单元,用于将所述神经网络运算指令传输至所述神经网络运算子单元,将所述通用运算指令传输至所述通用运算子单元。An instruction processing unit is used to transmit the neural network operation instructions to the neural network operation sub-unit, and transmit the general operation instructions to the general operation sub-unit.
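The routing performed by the instruction processing unit can be sketched in Python as follows. The sketch assumes that each cached instruction carries a tag naming its kind; the `InstructionKind` enum, the `Instruction` type, and the two queues standing in for the subunits are hypothetical names introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class InstructionKind(Enum):
    NEURAL_NETWORK = "neural_network"
    GENERAL = "general"


@dataclass(frozen=True)
class Instruction:
    kind: InstructionKind
    payload: str


def dispatch(instruction_cache):
    """Drain the cache, sending each instruction to the matching subunit queue."""
    nn_queue, general_queue = deque(), deque()
    while instruction_cache:
        instr = instruction_cache.popleft()  # preserve program order
        if instr.kind is InstructionKind.NEURAL_NETWORK:
            nn_queue.append(instr)
        else:
            general_queue.append(instr)
    return nn_queue, general_queue


cache = deque([
    Instruction(InstructionKind.GENERAL, "load image"),
    Instruction(InstructionKind.NEURAL_NETWORK, "run conversion model"),
    Instruction(InstructionKind.GENERAL, "store result"),
])
nn_queue, general_queue = dispatch(cache)
```

Each queue keeps the order in which its instructions were cached, so instructions of one kind are still executed in program order.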
第二方面,本发明实施例提供了一种图像处理方法,包括:In a second aspect, embodiments of the present invention provide an image processing method, including:
输入语音信号和待处理图像;Input voice signals and images to be processed;
存储所述语音信号和所述待处理图像;Store the voice signal and the image to be processed;
将所述语音信号转换成图像处理指令和目标区域,所述目标区域为待处理图像的处理区域;并根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,并将所述待处理图像存储到所述存储单元中;Convert the voice signal into an image processing instruction and a target area, where the target area is the processing area of the image to be processed; process the target area according to the image processing instruction to obtain a processed image; and store the image to be processed in the storage unit;
输出所述处理后的图像。Output the processed image.
在一种可行的实施例中,所述将所述语音信号转换成图像处理指令和目标区域,包括:In a feasible embodiment, converting the voice signal into image processing instructions and target areas includes:
根据语音识别技术将所述语音信号转换成文本信息;Convert the speech signal into text information according to speech recognition technology;
根据自然语言处理技术和目标语音指令转换模型将所述文本信息转换成所述图像处理指令;Convert the text information into the image processing instructions according to natural language processing technology and the target voice instruction conversion model;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述将所述语音信号转换成图像处理指令和目标区域,包括:In a feasible embodiment, converting the voice signal into image processing instructions and target areas includes:
根据语音识别技术、语义理解技术和目标语音指令转换模型将所述语音信号转换成所述图像处理指令;Convert the speech signal into the image processing instruction according to speech recognition technology, semantic understanding technology and target speech instruction conversion model;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述将所述语音信号转换成图像处理指令和目标区域,包括:In a feasible embodiment, converting the voice signal into image processing instructions and target areas includes:
根据语音识别技术将所述语音信号转换成文本信息;Convert the speech signal into text information according to speech recognition technology;
根据自然语言处理技术将所述文本信息转换成所述图像处理指令;Convert the text information into the image processing instructions according to natural language processing technology;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述将所述语音信号转换成图像处理指令和目标区域,包括:In a feasible embodiment, converting the voice signal into image processing instructions and target areas includes:
根据语音识别技术和语义理解技术将所述语音信号转换成所述图像处理指令;Convert the speech signal into the image processing instruction according to speech recognition technology and semantic understanding technology;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
在一种可行的实施例中,所述将所述语音信号转换成图像处理指令和目标区域之后,所述方法还包括:In a possible embodiment, after converting the voice signal into image processing instructions and target areas, the method further includes:
存储所述图像处理指令和所述目标区域。The image processing instructions and the target area are stored.
在一种可行的实施例中,所述根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,包括:In a feasible embodiment, processing the target area according to the image processing instruction to obtain a processed image includes:
在预设时间窗口内从所述神经元存储单元中获取M条图像处理指令;Obtain M image processing instructions from the neuron storage unit within the preset time window;
删除所述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,所述M为大于1的整数,所述N为小于所述M的整数;Delete image processing instructions with the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1, and N is an integer less than M;
根据所述N条图像处理指令和目标图像处理模型对所述目标区域进行处理,以得到处理后的图像。The target area is processed according to the N image processing instructions and the target image processing model to obtain a processed image.
在一种可行的实施例中,所述根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,包括:In a feasible embodiment, processing the target area according to the image processing instruction to obtain a processed image includes:
在预设时间窗口内从所述通用数据缓存单元中获取M条图像处理指令;Obtain M image processing instructions from the general data cache unit within a preset time window;
删除所述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,所述M为大于1的整数,所述N为小于所述M的整数;Delete image processing instructions with the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1, and N is an integer less than M;
根据所述N条图像处理指令对所述目标区域进行处理,以得到处理后的图像。The target area is processed according to the N image processing instructions to obtain a processed image.
在一种可行的实施例中,所述方法还包括:In a feasible embodiment, the method further includes:
对语音指令转换模型进行自适应训练,以得到所述目标语音指令转换模型。Perform adaptive training on the voice command conversion model to obtain the target voice command conversion model.
在一种可行的实施例中,所述对语音指令转换模型进行自适应训练,以得到所述目标语音指令转换模型,包括:In a feasible embodiment, the adaptive training of the voice command conversion model to obtain the target voice command conversion model includes:
根据所述语音指令转换模型将所述语音信号转换成预测指令;Convert the speech signal into a predicted instruction according to the voice command conversion model;
确定所述预测指令与其对应的指令集合的相关系数;Determine the correlation coefficient between the predicted instruction and its corresponding instruction set;
根据所述预测指令与其对应的指令集合的相关系数优化所述语音指令转换模型,以得到所述目标语音指令转换模型。The voice command conversion model is optimized according to the correlation coefficient between the predicted command and its corresponding command set to obtain the target voice command conversion model.
在一种可行的实施例中,所述方法还包括:In a feasible embodiment, the method further includes:
对图像处理模型进行自适应训练,以得到所述目标图像处理模型。Perform adaptive training on the image processing model to obtain the target image processing model.
在一种可行的实施例中,所述对图像处理模型进行自适应训练,包括:In a feasible embodiment, the adaptive training of the image processing model includes:
根据所述图像处理模型对所述待处理图像进行处理,以得到预测图像;Process the image to be processed according to the image processing model to obtain a predicted image;
确定所述预测图像与其对应的目标图像的相关系数;Determine the correlation coefficient between the predicted image and its corresponding target image;
根据所述预测图像与其对应的目标图像的相关系数优化所述图像处理模型,以得到所述目标图像处理模型。The image processing model is optimized according to the correlation coefficient between the predicted image and its corresponding target image to obtain the target image processing model.
第三方面,本发明实施例还提供了一种图像处理芯片,该芯片包括本发明实施例第一方面的所述图像处理装置。In a third aspect, an embodiment of the present invention further provides an image processing chip, which includes the image processing device of the first aspect of the embodiment of the present invention.
在一种可行的实施例中,上述芯片包括主芯片和协作芯片;In a feasible embodiment, the above-mentioned chip includes a main chip and a collaboration chip;
上述协作芯片包括本发明实施例第一方面的所述的装置,上述主芯片用于为上述协作芯片提供启动信号,控制待处理图像和图像处理指令传输至上述协作芯片。The above-mentioned cooperation chip includes the device described in the first aspect of the embodiment of the present invention. The above-mentioned main chip is used to provide a start signal to the above-mentioned cooperation chip and control the transmission of images to be processed and image processing instructions to the above-mentioned cooperation chip.
第四方面,本发明实施例提供了一种芯片封装结构,该芯片封装结构包括本发明实施例第三方面所述的图像处理芯片。In a fourth aspect, embodiments of the present invention provide a chip packaging structure. The chip packaging structure includes the image processing chip described in the third aspect of the embodiments of the present invention.
第五方面,本发明实施例提供了一种板卡,该板卡包括本发明实施例第四方面所述的芯片封装结构。In a fifth aspect, an embodiment of the present invention provides a board card, which includes the chip packaging structure described in the fourth aspect of the embodiment of the present invention.
第六方面,本发明实施例提供了一种电子设备,该电子设备包括本发明实施例的第五方面所述的板卡。In a sixth aspect, an embodiment of the present invention provides an electronic device, which includes the board card described in the fifth aspect of the embodiment of the present invention.
可以看出,在本发明实施例的方案中,输入输出单元输入语音信号和待处理图像;存储单元存储所述语音信号和所述待处理图像;图像处理单元将所述语音信号转换成图像处理指令和目标区域,所述目标区域为待处理图像的处理区域;并根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,并将所述待处理图像存储到所述存储单元中;所述输入输出单元将所述处理后的图像输出。与现有的图像处理技术相比,本发明通过语音进行图像处理,节省了用户在进行图像处理前学习图像处理软件的时间,提高了用户体验。It can be seen that, in the solution of the embodiments of the present invention, the input and output unit inputs the voice signal and the image to be processed; the storage unit stores the voice signal and the image to be processed; the image processing unit converts the voice signal into an image processing instruction and a target area, where the target area is the processing area of the image to be processed, processes the target area according to the image processing instruction to obtain a processed image, and stores the image to be processed in the storage unit; and the input and output unit outputs the processed image. Compared with existing image processing technology, the present invention performs image processing through voice, which saves the time users would otherwise spend learning image processing software and improves the user experience.
本发明的这些方面或其他方面在以下实施例的描述中会更加简明易懂。These and other aspects of the invention will be more clearly understood in the following description of the embodiments.
附图说明Description of the drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
图1为本发明实施例提供的一种图像处理装置的结构示意图;Figure 1 is a schematic structural diagram of an image processing device provided by an embodiment of the present invention;
图2为本发明实施例提供的另一种图像处理装置的局部结构示意图;Figure 2 is a partial structural schematic diagram of another image processing device provided by an embodiment of the present invention;
图3为本发明实施例提供的一种电子设备的结构示意图;Figure 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention;
图4为本发明实施例提供的一种图像处理方法的流程示意图。Figure 4 is a schematic flowchart of an image processing method provided by an embodiment of the present invention.
具体实施方式Detailed description
以下分别进行详细说明。Each is explained in detail below.
本发明的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", "third" and "fourth" in the description, claims and drawings of the present invention are used to distinguish different objects rather than to describe a specific sequence. Furthermore, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such processes, methods, products or devices.
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本发明的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
请参见图1,图1为本发明实施例提供的一种图像处理装置的结构示意图。如图1所示,该图像处理装置包括:Please refer to FIG. 1 , which is a schematic structural diagram of an image processing device provided by an embodiment of the present invention. As shown in Figure 1, the image processing device includes:
输入输出单元130,用于输入语音信号和待处理图像。The input and output unit 130 is used to input voice signals and images to be processed.
可选地,上述图像处理装置还包括噪声过滤器,上述输入输出单元130获取上述语音信号后,上述噪声过滤器对该语音信号进行降噪处理。Optionally, the above-mentioned image processing device further includes a noise filter. After the above-mentioned input and output unit 130 acquires the above-mentioned speech signal, the above-mentioned noise filter performs noise reduction processing on the speech signal.
可选地,该输入输出单元130可为语音传感器、麦克风、拾音器或者其他音频采集装置。Optionally, the input and output unit 130 may be a voice sensor, a microphone, a pickup, or another audio collection device.
具体的,上述输入输出单元130在获取上述语音信号时,还获取环境声音信号。上述噪声过滤器根据上述环境声音信号对上述语音信号进行降噪处理。该环境声音信号可以看成上述语音信号的噪声。Specifically, when acquiring the speech signal, the input and output unit 130 also acquires the environmental sound signal. The above-mentioned noise filter performs noise reduction processing on the above-mentioned voice signal according to the above-mentioned environmental sound signal. The environmental sound signal can be regarded as the noise of the above-mentioned speech signal.
进一步地,上述该输入输出单元130可包括对麦克风阵列,既可用于采集上述语音信号和上述环境声音信号,又实现了降噪处理。Further, the above-mentioned input and output unit 130 may include a pair of microphone arrays, which may be used to collect the above-mentioned voice signals and the above-mentioned environmental sound signals, and also implement noise reduction processing.
存储单元120,用于存储所述语音信号和所述待处理图像。The storage unit 120 is used to store the voice signal and the image to be processed.
图像处理单元110,用于将所述语音信号转换成图像处理指令和目标区域,所述目标区域为待处理图像的处理区域;并根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,并将所述待处理图像存储到所述存储单元中。The image processing unit 110 is configured to convert the voice signal into an image processing instruction and a target area, where the target area is the processing area of the image to be processed; to process the target area according to the image processing instruction to obtain a processed image; and to store the image to be processed in the storage unit.
可选地,所述存储单元120包括神经元存储单元121和权值缓存单元122,所述图像处理单元110的神经网络运算单元113包括神经网络运算子单元1131;Optionally, the storage unit 120 includes a neuron storage unit 121 and a weight cache unit 122, and the neural network operation unit 113 of the image processing unit 110 includes a neural network operation subunit 1131;
当所述神经元存储单元121用于存储所述语音信号和所述待处理图像且所述权值缓存单元122用于存储目标语音指令转换模型和目标图像处理模型时,所述神经网络运算子单元1131用于根据所述目标语音指令转换模型将所述语音信号转换成所述图像处理指令和所述目标区域;When the neuron storage unit 121 is used to store the speech signal and the image to be processed, and the weight cache unit 122 is used to store the target voice instruction conversion model and the target image processing model, the neural network operation subunit 1131 is used to convert the speech signal into the image processing instruction and the target area according to the target voice instruction conversion model;
所述神经网络运算子单元1131,还用于根据所述目标图像处理模型和所述图像处理指令对所述目标区域进行处理,以得到处理后的图像;The neural network operation subunit 1131 is also used to process the target area according to the target image processing model and the image processing instruction to obtain a processed image;
所述神经网络运算子单元1131,还用于将所述处理后的图像存储到所述神经元存储单元中。The neural network operation subunit 1131 is also used to store the processed image into the neuron storage unit.
进一步地,所述神经网络运算子单元1131具体用于:Further, the neural network operation subunit 1131 is specifically used to:
根据语音识别技术将所述语音信号转换成文本信息;Convert the speech signal into text information according to speech recognition technology;
根据自然语言处理技术和所述目标语音指令转换模型将所述文本信息转换成所述图像处理指令;Convert the text information into the image processing instructions according to natural language processing technology and the target voice instruction conversion model;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
进一步地,所述神经网络运算子单元1131具体用于:Further, the neural network operation subunit 1131 is specifically used to:
根据语音识别技术、语义理解技术和所述目标语音指令转换模型将所述语音信号转换成所述图像处理指令;Convert the speech signal into the image processing instruction according to speech recognition technology, semantic understanding technology and the target speech instruction conversion model;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
进一步地,所述神经元存储单元121用于存储所述目标区域和所述图像处理指令。Further, the neuron storage unit 121 is used to store the target area and the image processing instructions.
具体地,所述神经网络运算子单元1131用于:Specifically, the neural network operation subunit 1131 is used for:
在预设时间窗口内从所述神经元存储单元中获取M条图像处理指令;Obtain M image processing instructions from the neuron storage unit within the preset time window;
删除所述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,所述M为大于1的整数,所述N为小于所述M的整数;Delete image processing instructions with the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1, and N is an integer less than M;
根据所述N条图像处理指令和所述目标图像处理模型对所述目标区域进行处理,以得到处理后的图像。The target area is processed according to the N image processing instructions and the target image processing model to obtain a processed image.
具体地,当上述存储单元120的神经元存储单元121存储上述语音信号和上述待处理图像,且上述权值缓存单元122存储上述目标语音指令转换模型时,上述神经网络运算子单元1131根据语音识别技术将所述语音信号转换成文本信息,根据自然语言处理技术和上述目标语音指令转换模型将上述文本信息转换成图像处理指令,并根据该图像处理指令中的语义区域的粒度和图像识别技术对上述待处理图像进行区域划分,以获取上述目标区域;或者,Specifically, when the neuron storage unit 121 of the storage unit 120 stores the speech signal and the image to be processed, and the weight cache unit 122 stores the target voice instruction conversion model, the neural network operation subunit 1131 converts the speech signal into text information according to speech recognition technology, converts the text information into an image processing instruction according to natural language processing technology and the target voice instruction conversion model, and divides the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology to obtain the target area; or,
上述神经网络运算子单元1131根据语音识别技术、语义理解技术和上述目标语音指令转换模型将上述语音信号转换成图像处理指令,并根据上述图像处理指令中的语义区域的粒度和图像识别技术对上述待处理图像进行区域划分,以获取上述目标区域。The neural network operation subunit 1131 converts the speech signal into an image processing instruction according to speech recognition technology, semantic understanding technology and the target voice instruction conversion model, and divides the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology to obtain the target area.
进一步地,上述神经网络运算子单元1131将上述图像处理指令和上述目标区域存储到上述神经元存储单元121中。上述神经网络运算子单元1131从上述权值缓存单元122中获取上述目标语音指令转换模型,并在预设时间窗口内从上述神经元存储单元121中获取M条图像处理指令和目标区域,并删除上述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令。上述神经网络运算子单元1131根据上述N条图像处理指令和上述目标图像处理模型对上述目标区域进行处理,以得到处理后的图像。Further, the neural network operation subunit 1131 stores the image processing instruction and the target area in the neuron storage unit 121. The neural network operation subunit 1131 obtains the target voice instruction conversion model from the weight cache unit 122, obtains M image processing instructions and the target area from the neuron storage unit 121 within a preset time window, and deletes image processing instructions with the same function from the M image processing instructions to obtain N image processing instructions. The neural network operation subunit 1131 then processes the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
可选地,所述存储单元包括通用数据缓存单元,所述图像处理单元的神经网络运算单元包括通用运算子单元;Optionally, the storage unit includes a general data cache unit, and the neural network operation unit of the image processing unit includes a general operation subunit;
当所述通用数据缓存单元用于存储所述语音信号和所述待处理图像时,所述通用运算子单元用于将所述语音信号转换成所述图像处理指令和所述目标区域;When the general data cache unit is used to store the voice signal and the image to be processed, the general operation subunit is used to convert the voice signal into the image processing instruction and the target area;
所述通用运算子单元,还用于根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像;The general operation subunit is also used to process the target area according to the image processing instruction to obtain a processed image;
所述通用运算子单元,还用于将所述处理后的图像存储到所述通用数据存储单元中。The general operation subunit is also used to store the processed image into the general data storage unit.
进一步地,所述通用运算子单元具体用于:Further, the general operation subunit is specifically used for:
根据语音识别技术将所述语音信号转换成文本信息;Convert the speech signal into text information according to speech recognition technology;
根据自然语言处理技术将所述文本信息转换成所述图像处理指令;Convert the text information into the image processing instructions according to natural language processing technology;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
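The text-to-instruction step above can be illustrated with a minimal sketch. It assumes the speech signal has already been transcribed to text, and it replaces the natural language processing model with a plain keyword lookup; the vocabularies `OPERATIONS` and `REGIONS`, the function name `text_to_instruction` and the instruction layout are all hypothetical and are not taken from the specification.

```python
from typing import Dict, Optional

# Hypothetical vocabularies; a real system would use a trained NLP model.
OPERATIONS = {"blur": "BLUR", "brighten": "BRIGHTEN", "rotate": "ROTATE"}
REGIONS = {"face": "FACE", "background": "BACKGROUND", "red": "RED_COLOR"}


def text_to_instruction(text: str) -> Optional[Dict[str, str]]:
    """Map recognized text to an instruction naming an operation and a semantic region."""
    words = text.lower().split()
    # The first known operation keyword selects the operation.
    operation = next((OPERATIONS[w] for w in words if w in OPERATIONS), None)
    if operation is None:
        return None  # no image processing command recognized
    # The first known region keyword sets the granularity; default to the whole image.
    region = next((REGIONS[w] for w in words if w in REGIONS), "WHOLE_IMAGE")
    return {"operation": operation, "region": region}
```

The `region` field plays the role of the semantic region whose granularity later drives the region division of the image to be processed.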
进一步地,所述通用运算子单元具体用于:Further, the general operation subunit is specifically used for:
根据语音识别技术和语义理解技术将所述语音信号转换成所述图像处理指令;Convert the speech signal into the image processing instruction according to speech recognition technology and semantic understanding technology;
根据所述图像处理指令中的语义区域的粒度和图像识别技术对所述待处理图像进行区域划分,获取所述目标区域。The image to be processed is divided into regions according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and the target region is obtained.
进一步地,所述通用数据缓存单元用于存储所述目标区域和所述图像处理指令。Further, the general data cache unit is used to store the target area and the image processing instructions.
具体地,所述通用运算子单元用于:Specifically, the general operation subunit is used for:
在预设时间窗口内从所述通用数据缓存单元中获取M条图像处理指令;Obtain M image processing instructions from the general data cache unit within a preset time window;
删除所述M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,所述M为大于1的整数,所述N为小于所述M的整数;Delete image processing instructions with the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1, and N is an integer less than M;
根据所述N条图像处理指令对所述目标区域进行处理,以得到处理后的图像。The target area is processed according to the N image processing instructions to obtain a processed image.
具体地,当上述存储单元120的通用数据缓存单元123存储上述语音信号和上述待处理图像时,上述通用运算子单元1132根据语音识别技术将上述语音信号转换成文本信息,根据自然语言处理技术将上述文本信息转换成图像处理指令,并根据上述图像处理指令的语义区域的粒度和图像识别技术对上述待处理图像进行区域划分,以获取上述目标区域;或者,Specifically, when the general data cache unit 123 of the storage unit 120 stores the speech signal and the image to be processed, the general operation subunit 1132 converts the speech signal into text information according to speech recognition technology, converts the text information into an image processing instruction according to natural language processing technology, and divides the image to be processed into regions according to the granularity of the semantic region of the image processing instruction and image recognition technology, so as to obtain the target area; or,
上述通用运算子单元1132根据语音识别技术和语义理解技术将上述语音信号转换成上述图像处理指令,并根据该图像处理指令中的语义区域的粒度和图像识别技术对上述待处理图像进行区域划分,以获取上述目标区域。The above-mentioned general operation sub-unit 1132 converts the above-mentioned speech signal into the above-mentioned image processing instruction according to speech recognition technology and semantic understanding technology, and divides the above-mentioned image to be processed according to the granularity of the semantic area in the image processing instruction and the image recognition technology, to obtain the above target area.
进一步地,上述通用运算子单元1132将上述图像处理指令和上述目标区域存储到上述通用数据缓存单元123中。上述通用运算子单元1132从上述通用数据缓存单元中获取上述目标区域和在预设时间窗口内从上述通用数据缓存单元中获取M条图像处理指令,删除该M条图像处理指令中功能相同的图像处理指令,得到N条图像处理指令,并根据该N条图像处理指令对上述目标区域进行处理,以得到处理后的图像。Further, the general operation subunit 1132 stores the image processing instruction and the target area in the general data cache unit 123. The general operation subunit 1132 obtains the target area from the general data cache unit, obtains M image processing instructions from the general data cache unit within a preset time window, deletes the image processing instructions that have the same function among the M image processing instructions to obtain N image processing instructions, and processes the target area according to the N image processing instructions to obtain a processed image.
具体地,上述预设时间窗口可以理解成预设时长。在预设时长内上述神经网络运算子单元1131从上述神经元存储单元121中获取M条图像处理指令或者上述通用运算子单元1132从上述通用数据缓存单元中获取M条图像处理指令后,上述神经网络运算子单元1131或者上述通用运算子单元1132对上述M条图像处理指令进行两两比较,将该M条图像处理指令中功能相同的指令删除,得到N条图像处理指令。上述神经网络运算子单元1131或者上述通用运算子单元1132根据上述N条处理指令和上述目标图像处理模型对上述待处理图像进行处理。Specifically, the preset time window can be understood as a preset duration. After the neural network operation subunit 1131 obtains M image processing instructions from the neuron storage unit 121, or the general operation subunit 1132 obtains M image processing instructions from the general data cache unit, within the preset duration, the neural network operation subunit 1131 or the general operation subunit 1132 compares the M image processing instructions pairwise and deletes the instructions that have the same function among them, to obtain N image processing instructions. The neural network operation subunit 1131 or the general operation subunit 1132 then processes the image to be processed according to the N processing instructions and the target image processing model.
举例说明,上述神经网络运算子单元1131或者上述通用运算子单元1132对上述M条图像处理指令进行两两比较。当图像处理指令A和图像处理指令B一样时,上述神经网络运算子单元1131或者上述通用运算子单元1132删除上述图像处理指令A和B中开销最大的一条;当图像处理指令A和图像处理指令B不一样时,上述神经网络运算子单元1131或者上述通用运算子单元1132获取上述图像处理指令A和上述图像处理指令B的相似系数。当该相似系数大于相似阈值时,确定上述图像处理指令A和上述图像处理指令B功能相同,上述神经网络运算子单元1131或者上述通用运算子单元1132删除上述图像处理指令A和B中开销最大的一条;当上述相似系数小于上述相似阈值时,上述神经网络运算子单元1131或者上述通用运算子单元1132确定上述图像处理指令A和B的功能不同。该图像处理指令A和B为上述M条处理指令中的任意两条。For example, the neural network operation subunit 1131 or the general operation subunit 1132 compares the M image processing instructions pairwise. When image processing instruction A and image processing instruction B are identical, the neural network operation subunit 1131 or the general operation subunit 1132 deletes whichever of A and B has the higher overhead. When A and B are not identical, the neural network operation subunit 1131 or the general operation subunit 1132 obtains the similarity coefficient of A and B. When the similarity coefficient is greater than the similarity threshold, A and B are determined to have the same function, and the neural network operation subunit 1131 or the general operation subunit 1132 deletes whichever of A and B has the higher overhead; when the similarity coefficient is less than the similarity threshold, the neural network operation subunit 1131 or the general operation subunit 1132 determines that A and B have different functions. The image processing instructions A and B are any two of the M processing instructions.
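The pairwise comparison above can be sketched as follows. This is only an illustration under stated assumptions: the specification does not define the similarity coefficient or the overhead measure, so the sketch uses a Jaccard index over the words of a hypothetical textual form of each instruction and a hypothetical numeric `cost` field. It also compares each instruction only against the instructions kept so far, which is a simplification of comparing all pairs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instruction:
    text: str    # hypothetical textual form, e.g. "blur face radius 5"
    cost: float  # hypothetical execution overhead


def similarity(a: Instruction, b: Instruction) -> float:
    """Assumed similarity coefficient: Jaccard index over the instruction words."""
    words_a, words_b = set(a.text.split()), set(b.text.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def deduplicate(instructions: List[Instruction], threshold: float = 0.5) -> List[Instruction]:
    """Reduce M instructions to N by dropping the costlier of each same-function pair."""
    kept: List[Instruction] = []
    for candidate in instructions:
        match = None
        for index, existing in enumerate(kept):
            # Identical instructions, or similarity above the threshold,
            # are treated as having the same function.
            if candidate == existing or similarity(candidate, existing) > threshold:
                match = index
                break
        if match is None:
            kept.append(candidate)
        elif candidate.cost < kept[match].cost:
            kept[match] = candidate  # keep the cheaper of the two
    return kept


if __name__ == "__main__":
    window = [
        Instruction("blur face radius 5", 3.0),
        Instruction("blur face radius 5", 3.0),
        Instruction("blur face radius 7", 1.0),
        Instruction("rotate image 90", 2.0),
    ]
    for instruction in deduplicate(window):
        print(instruction)
```

With the default threshold, the three blur instructions collapse into the cheapest one and the rotate instruction is kept, so M = 4 becomes N = 2.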
所述输入输出单元130,还用于将所述处理后的图像输出。The input and output unit 130 is also used to output the processed image.
其中,上述图像处理单元根据上述语音信号对上述待处理图像进行处理,得到处理后的图像后,通过上述输入输出单元将该处理后的图像输出。Wherein, the image processing unit processes the image to be processed according to the speech signal, and after obtaining the processed image, outputs the processed image through the input and output unit.
举例说明上述语义区域,假设上述图像处理装置根据语音信号确定对上述目标区域为人脸区域时,则上述语义区域为上述待处理图像中的人脸区域,上述图像处理装置以人脸为粒度,获取上述待处理图像中的多个人脸区域;当上述目标区域为背景,上述图像处理装置将上述待处理图像划分为背景区域和非背景区域;当上述目标区域为红颜色区域时,上述图像处理装置将上述待处理图像按照颜色划分为不同颜色的区域。To illustrate the semantic region: if the image processing device determines from the speech signal that the target area is a face region, the semantic region is the face region in the image to be processed, and the image processing device takes the face as the granularity and obtains the multiple face regions in the image to be processed; when the target area is the background, the image processing device divides the image to be processed into a background region and a non-background region; when the target area is a red region, the image processing device divides the image to be processed into regions of different colors according to color.
具体地,本发明中使用的语音识别技术包括但不限于采用人工神经网络(Artificial Neural Network,ANN)、隐马尔科夫模型(Hidden Markov Model,HMM)等模型,上述第一语音识别单元可根据上述语音识别技术处理上述语音信号;上述自然语言处理技术包括但不限于利用统计机器学习、ANN等方法,上述语义理解单元可根据上述自然语言处理技术提取出语义信息;上述图像识别技术包括但不限于利用基于边缘检测的方法、阈值分割方法、区域生长与分水岭算法、灰度积分投影曲线分析、模板匹配、可变形模板、Hough变换、Snake算子、基于Gabor小波变换的弹性图匹配技术、主动形状模型和主动外观模型等方法等算法,上述图像识别单元可根据上述图像识别技术将上述待处理图像分割成不同的区域。Specifically, the speech recognition technology used in the present invention includes but is not limited to models such as the artificial neural network (ANN) and the hidden Markov model (HMM), and the first speech recognition unit can process the speech signal according to the speech recognition technology; the natural language processing technology includes but is not limited to methods such as statistical machine learning and ANN, and the semantic understanding unit can extract semantic information according to the natural language processing technology; the image recognition technology includes but is not limited to algorithms such as edge-detection-based methods, threshold segmentation, region growing and watershed algorithms, gray-level integral projection curve analysis, template matching, deformable templates, the Hough transform, the Snake operator, elastic graph matching based on the Gabor wavelet transform, active shape models and active appearance models, and the image recognition unit can segment the image to be processed into different regions according to the image recognition technology.
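The color case in the example above can be sketched as below. This is a toy illustration, not the segmentation method of the specification: it assumes an image given as rows of RGB tuples, a small hypothetical `PALETTE`, and nearest-palette-color labeling by squared distance.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical reference colors used to partition the image by color.
PALETTE: Dict[str, Pixel] = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_color(pixel: Pixel) -> str:
    """Return the name of the palette color closest to the pixel (squared RGB distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(pixel, PALETTE[name])),
    )


def label_by_color(image: List[List[Pixel]]) -> List[List[str]]:
    """Divide the image into color regions by labeling every pixel."""
    return [[nearest_color(pixel) for pixel in row] for row in image]


def target_mask(labels: List[List[str]], color: str) -> List[List[bool]]:
    """Select the target area: True where the pixel belongs to the requested color region."""
    return [[label == color for label in row] for row in labels]
```

Calling `target_mask(label_by_color(image), "red")` yields a boolean mask of the red region, which stands in for the target area when the granularity is color.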
可选地,在上述输入输出单元130获取上述语音信号和上述待处理图像之前,上述神经网络运算子单元1131对语音指令转换模型进行自适应训练,以得到上述目标语音指令转换模型。Optionally, before the input and output unit 130 acquires the speech signal and the image to be processed, the neural network operation subunit 1131 performs adaptive training on the speech instruction conversion model to obtain the target speech instruction conversion model.
其中,上述神经网络运算子单元1131对语音指令转换模型进行自适应训练是离线进行的或者是在线进行的。Among them, the adaptive training of the voice command conversion model by the above-mentioned neural network operation sub-unit 1131 is performed offline or online.
具体地,上述对语音指令转换模型进行自适应训练是离线进行的具体是上述神经网络运算子单元1131在其硬件的基础上对上述语音指令转换模型进行自适应训练,以得到目标语音指令转换模型;上述对语音指令转换模型进行自适应训练是在线进行的具体是一个不同于神经网络运算子单元1131的云端服务器对上述语音指令转换模型进行自适应训练,以得到目标语音指令转换模型。上述神经网络运算子单元1131在需要使用上述目标语音指令转换模型时,该神经网络运算子单元1131从上述云端服务器中获取该目标语音指令转换模型。Specifically, the above-mentioned adaptive training of the voice command conversion model is performed offline. Specifically, the above-mentioned neural network operation sub-unit 1131 performs adaptive training on the above-mentioned voice command conversion model based on its hardware to obtain the target voice command conversion model. ; The above-mentioned adaptive training of the voice command conversion model is performed online. Specifically, a cloud server different from the neural network operation subunit 1131 performs adaptive training on the above-mentioned voice command conversion model to obtain the target voice command conversion model. When the above-mentioned neural network operation sub-unit 1131 needs to use the above-mentioned target voice command conversion model, the above-mentioned neural network operation sub-unit 1131 obtains the target voice command conversion model from the above-mentioned cloud server.
可选地,上述神经网络运算子单元1131对语音指令转换模型进行自适应训练是有监督的或者是无监督的。Optionally, the adaptive training of the voice instruction conversion model by the neural network operation subunit 1131 is supervised or unsupervised.
具体地,上述对上述语音指令转换模型进行自适应训练是有监督的具体为:Specifically, the above-mentioned adaptive training of the above-mentioned voice command conversion model is supervised, specifically:
上述神经网络运算子单元1131根据语音指令转换模型将上述语音信号换成预测指令;然后确定上述预测指令与其对应的指令集合的相关系数,该指令集合为人工根据语音信号得到的指令的集合;上述神经网络运算子单元1131根据所述预测指令与其对应的指令集合的相关系数优化所述语音指令转换模型,以得到所述目标语音指令转换模型。The neural network operation subunit 1131 converts the speech signal into a predicted instruction according to the voice instruction conversion model; it then determines the correlation coefficient between the predicted instruction and its corresponding instruction set, where the instruction set is a set of instructions obtained manually from the speech signal; the neural network operation subunit 1131 optimizes the voice instruction conversion model according to the correlation coefficient between the predicted instruction and its corresponding instruction set, to obtain the target voice instruction conversion model.
举例说明,上述对语音指令转换模型进行自适应训练是有监督的具体包括:上述神经网络运算子单元1131获取一段包含相关命令的语音信号,如改变图像的颜色、旋转图片等。每种命令对应一个指令集合。对用于自适应训练的输入的语音信号来说,对应的指令集合是已知的,上述神经网络运算子单元1131以这些语音信号作为语音指令转换模型的输入数据,获取输出后的预测指令。上述神经网络运算子单元1131计算上述预测指令与其对应的指令集合的相关系数,并根据该相关系数自适应地更新上述语音指令转换模型中的参数(如权值、偏置等等),以提高上述语音指令转换模型的性能,进而得到上述目标语音指令转换模型。For example, supervised adaptive training of the voice instruction conversion model specifically includes the following: the neural network operation subunit 1131 obtains a speech signal containing relevant commands, such as changing the color of an image or rotating a picture. Each command corresponds to one instruction set. For the input speech signals used for adaptive training, the corresponding instruction sets are known, and the neural network operation subunit 1131 uses these speech signals as input data of the voice instruction conversion model to obtain the predicted instructions that it outputs. The neural network operation subunit 1131 calculates the correlation coefficient between each predicted instruction and its corresponding instruction set, and adaptively updates the parameters (such as weights and biases) of the voice instruction conversion model according to that correlation coefficient, so as to improve the performance of the voice instruction conversion model and thereby obtain the target voice instruction conversion model.
具体地,针对上述图像处理单元110,其输入和输出均为图像。上述图像103可以通过包括但不限定于ANN和传统计算机视觉方法对上述待处理图像进行的处理包括但不局限于:美体(例如美腿,隆胸),换脸、美化脸,换物体(猫换狗、斑马变马,苹果换桔子等),换背景(后面的森林换成田野),去遮挡(例如人脸遮住了一个眼睛,重新把眼睛重构出来),风格转换(一秒钟变梵高画风),位姿转换(例如站着变坐着,正脸变侧脸)、非油画变油画、更换图像背景的颜色和更换图像中物体所处的季节背景。Specifically, for the image processing unit 110, both the input and the output are images. The processing that can be applied to the image to be processed, by methods including but not limited to ANN and traditional computer vision methods, includes but is not limited to: body beautification (for example, leg beautification or breast augmentation), face swapping, face beautification, object replacement (a cat replaced by a dog, a zebra changed into a horse, an apple replaced by an orange, and so on), background replacement (the forest behind replaced by a field), occlusion removal (for example, reconstructing an eye that is occluded in the face), style transfer (changing to a Van Gogh style in one second), pose conversion (for example, standing changed to sitting, or a frontal face changed to a profile), converting a non-oil painting into an oil painting, changing the color of the image background, and changing the seasonal background of the objects in the image.
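A minimal sketch of the supervised loop described above follows. The specification states only that the parameters are updated according to the correlation coefficient; this sketch assumes a tiny linear model, uses a plain gradient step on the squared error as the update rule, and reports the Pearson correlation between prediction and known target as the monitored quantity. The feature vectors standing in for speech signals and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def predict(weights: List[List[float]], bias: List[float], features: Sequence[float]) -> List[float]:
    """Linear stand-in for the conversion model: one output score per instruction slot."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def train(samples: List[Tuple[List[float], List[float]]], epochs: int = 300, lr: float = 0.1):
    """Update weights and biases sample by sample; return the model and mean correlation."""
    n_out, n_in = len(samples[0][1]), len(samples[0][0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        for features, target in samples:
            prediction = predict(weights, bias, features)
            for i, (p, t) in enumerate(zip(prediction, target)):
                error = p - t  # assumed update signal: squared-error gradient
                for j, f in enumerate(features):
                    weights[i][j] -= lr * error * f
                bias[i] -= lr * error
    mean_corr = sum(pearson(predict(weights, bias, f), t) for f, t in samples) / len(samples)
    return weights, bias, mean_corr
```

Each training sample pairs an input with its known target vector, mirroring the known instruction set for each training speech signal; a correlation close to 1 indicates that the predictions match the labeled targets.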
可选地,在上述神经网络运算子单元1131接收上述语音信号之前,该神经网络运算子单元1131对图像处理模型进行自适应训练,以得到上述目标图像处理模型。Optionally, before the neural network operation sub-unit 1131 receives the speech signal, the neural network operation sub-unit 1131 performs adaptive training on the image processing model to obtain the target image processing model.
其中,上述神经网络运算子单元1131对图像处理模型进行自适应训练是离线进行的或者是在线进行的。Among them, the adaptive training of the image processing model by the above-mentioned neural network operation sub-unit 1131 is performed offline or online.
具体地,上述对图像处理模型进行自适应训练是离线进行的具体是上述神经网络运算子单元1131在其硬件的基础上对上述图像处理模型进行自适应训练,以得到目标图像处理模型;上述对图像处理模型进行自适应训练是在线进行的具体是一个不同于上述神经网络运算子单元1131的云端服务器对上述图像处理模型进行自适应训练,以得到目标图像处理模型。上述神经网络运算子单元1131在需要使用上述目标图像处理模型时,该神经网络运算子单元1131从上述云端服务器中获取该目标图像处理模型。Specifically, offline adaptive training of the image processing model means that the neural network operation subunit 1131 adaptively trains the image processing model on its own hardware to obtain the target image processing model; online adaptive training of the image processing model means that a cloud server distinct from the neural network operation subunit 1131 adaptively trains the image processing model to obtain the target image processing model. When the neural network operation subunit 1131 needs to use the target image processing model, it obtains the target image processing model from the cloud server.
可选地,上述神经网络运算子单元1131对图像处理模型进行自适应训练是有监督的或者是无监督的。Optionally, the adaptive training of the image processing model by the neural network operation subunit 1131 is supervised or unsupervised.
具体地,上述神经网络运算子单元1131对上述图像处理模型进行自适应训练是有监督的具体为:Specifically, the adaptive training of the above-mentioned image processing model by the above-mentioned neural network operation sub-unit 1131 is supervised, specifically as follows:
上述神经网络运算子单元1131根据图像处理模型将上述语音信号换成预测图像;然后确定上述预测图像与其对应的目标图像的相关系数,该目标图像为人工根据语音信号对待处理图像进行处理得到的图像;上述神经网络运算子单元1131根据所述预测图像与其对应的目标图像的相关系数优化所述图像处理模型,以得到所述目标图像处理模型。The neural network operation subunit 1131 converts the speech signal into a predicted image according to the image processing model; it then determines the correlation coefficient between the predicted image and its corresponding target image, where the target image is an image obtained by manually processing the image to be processed according to the speech signal; the neural network operation subunit 1131 optimizes the image processing model according to the correlation coefficient between the predicted image and its corresponding target image, to obtain the target image processing model.
举例说明,上述对图像处理模型进行自适应训练是有监督的具体包括:上述神经网络运算子单元1131获取一段包含相关命令的语音信号,如改变图像的颜色、旋转图片等。每种命令对应一张目标图像。对用于自适应训练的输入的语音信号来说,对应的目标图像是已知的,上述神经网络运算子单元1131以这些语音信号作为图像处理模型的输入数据,获取输出后的预测图像。上述神经网络运算子单元1131计算上述预测图像与其对应的目标图像的相关系数,并根据该相关系数自适应地更新上述图像处理模型中的参数(如权值、偏置等等),以提高上述图像处理模型的性能,进而得到上述目标图像处理模型。For example, supervised adaptive training of the image processing model specifically includes the following: the neural network operation subunit 1131 obtains a speech signal containing relevant commands, such as changing the color of an image or rotating a picture. Each command corresponds to one target image. For the input speech signals used for adaptive training, the corresponding target images are known, and the neural network operation subunit 1131 uses these speech signals as input data of the image processing model to obtain the predicted images that it outputs. The neural network operation subunit 1131 calculates the correlation coefficient between each predicted image and its corresponding target image, and adaptively updates the parameters (such as weights and biases) of the image processing model according to that correlation coefficient, so as to improve the performance of the image processing model and thereby obtain the target image processing model.
其中,上述图像处理装置的图像处理单元110还包括:Wherein, the image processing unit 110 of the above image processing device also includes:
指令缓存单元111,用于存储待执行的指令,该指令包括神经网络运算指令和通用运算指令;An instruction cache unit 111, used to store instructions to be executed, the instructions including neural network operation instructions and general operation instructions;
指令处理单元112,用于将所述神经网络运算指令传输至所述神经网络运算子单元,将所述通用运算指令传输至所述通用运算子单元。The instruction processing unit 112 is configured to transmit the neural network operation instructions to the neural network operation sub-unit, and transmit the general operation instructions to the general operation sub-unit.
需要说明的是,上述图像处理装置的神经网络运算单元113中神经网络运算子单元1131在进行图像处理操作、对上述图像处理模型和上述语音指令转换模型进行自适应训练过程中,上述指令处理单元112从上述指令缓存单元111中获取神经网络运算指令并传输至上述神经网络运算子单元1131,以驱动该神经网络运算子单元1131。上述通用运算子单元1132在进行图像处理操作过程中,上述指令处理单元112从上述指令缓存单元111中获取通用运算指令并传输至上述通用运算子单元1132,以驱动该通用运算子单元1132。It should be noted that while the neural network operation subunit 1131 in the neural network operation unit 113 of the image processing device performs image processing operations and adaptively trains the image processing model and the voice instruction conversion model, the instruction processing unit 112 obtains neural network operation instructions from the instruction cache unit 111 and transmits them to the neural network operation subunit 1131 to drive it. While the general operation subunit 1132 performs image processing operations, the instruction processing unit 112 obtains general operation instructions from the instruction cache unit 111 and transmits them to the general operation subunit 1132 to drive it.
在本实施例中,上述图像处理装置是以单元的形式来呈现。这里的“单元”可以指特定应用集成电路(application-specific integrated circuit,ASIC),执行一个或多个软件或固件程序的处理器和存储器,集成逻辑电路,和/或其他可以提供上述功能的器件。In this embodiment, the image processing device is presented in the form of units. A "unit" here may refer to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above functions.
可以看出,在本发明实施例的方案中,输入输出单元输入语音信号和待处理图像;存储单元存储所述语音信号和所述待处理图像;图像处理单元将所述语音信号转换成图像处理指令和目标区域,所述目标区域为待处理图像的处理区域;并根据所述图像处理指令对所述目标区域进行处理,以得到处理后的图像,并将所述处理后的图像存储到所述存储单元中;所述输入输出单元将所述处理后的图像输出。与现有的图像处理技术相比,本发明通过语音进行图像处理,节省了用户在进行图像处理前学习图像处理软件的时间,提高了用户体验。It can be seen that in the solution of the embodiment of the present invention, the input and output unit inputs the voice signal and the image to be processed; the storage unit stores the voice signal and the image to be processed; the image processing unit converts the voice signal into an image processing instruction and a target area, the target area being the processing area of the image to be processed, processes the target area according to the image processing instruction to obtain a processed image, and stores the processed image in the storage unit; and the input and output unit outputs the processed image. Compared with existing image processing technology, the present invention performs image processing through voice, which saves the time users would spend learning image processing software before performing image processing and improves user experience.
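The routing role of the instruction processing unit described above can be sketched in a few lines. This is an illustrative model only: the `Kind` enumeration, the tuple layout of a cached instruction and the function name `dispatch` are hypothetical, and the two returned lists merely stand in for the neural network operation subunit and the general operation subunit.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Kind(Enum):
    NEURAL = "neural_network"
    GENERAL = "general"


def dispatch(instruction_cache: Iterable[Tuple[Kind, str]]) -> Tuple[List[str], List[str]]:
    """Route each cached instruction to the subunit that matches its kind."""
    to_neural_subunit: List[str] = []
    to_general_subunit: List[str] = []
    for kind, payload in instruction_cache:
        if kind is Kind.NEURAL:
            to_neural_subunit.append(payload)
        elif kind is Kind.GENERAL:
            to_general_subunit.append(payload)
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")
    return to_neural_subunit, to_general_subunit
```

Program order is preserved within each subunit's list, which matches the sequential fetching described in the text.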
参见图2,图2为本发明实施例提供的另一种图像处理装置的结构框架示意图。如图2所示,该芯片包括:Referring to Figure 2, Figure 2 is a schematic structural frame diagram of another image processing device provided by an embodiment of the present invention. As shown in Figure 2, the chip includes:
图像处理单元210、存储单元220、输入输出单元230。Image processing unit 210, storage unit 220, input and output unit 230.
其中,上述图像处理单元210包括:Wherein, the above-mentioned image processing unit 210 includes:
指令缓存单元211,用于存储待执行的指令,该指令包括神经网络运算指令和通用运算指令。The instruction cache unit 211 is used to store instructions to be executed, which include neural network operation instructions and general operation instructions.
在一种实施方式中,上述指令缓存单元211可以是重排序缓存。In one implementation, the instruction cache unit 211 may be a reorder cache.
指令处理单元212、用于从指令缓存单元获取神经网络运算指令或通用运算指令,并对该指令进行处理并提供给上述神经网络运算单元213。其中,上述指令处理单元212包括:The instruction processing unit 212 is configured to obtain neural network operation instructions or general operation instructions from the instruction cache unit, process the instructions, and provide them to the neural network operation unit 213. Among them, the above-mentioned instruction processing unit 212 includes:
取指模块214,用于从指令缓存单元中获取指令;The instruction fetch module 214 is used to obtain instructions from the instruction cache unit;
译码模块215,用于对获取的指令进行译码;Decoding module 215, used to decode the obtained instructions;
指令队列模块216,用于对译码后的指令进行顺序存储。The instruction queue module 216 is used to sequentially store decoded instructions.
标量寄存模块217,用于存储上述指令对应的操作码和操作数,包括神经网络运算指令对应的神经网络运算操作码和操作数、以及通用运算指令对应的通用运算操作码和操作数。The scalar register module 217 is used to store the operation codes and operands corresponding to the above instructions, including the neural network operation opcodes and operands corresponding to the neural network operation instructions, and the general operation opcodes and operands corresponding to the general operation instructions.
处理依赖关系模块218,用于对上述指令处理单元212发来的指令及其对应的操作码和操作数进行判断,判断该指令与前一指令是否访问相同的数据,若是,将该指令存储在存储队列单元219中,待前一指令执行完毕后,将存储队列单元中的该指令提供给上述神经网络运算单元213;否则,直接将该指令提供给上述神经网络运算单元213。The dependency processing module 218 is used to examine an instruction sent by the instruction processing unit 212 together with its corresponding opcode and operands, and to determine whether the instruction accesses the same data as the previous instruction; if so, the instruction is stored in the storage queue unit 219 and, after the previous instruction has finished executing, the instruction in the storage queue unit is provided to the neural network operation unit 213; otherwise, the instruction is provided directly to the neural network operation unit 213.
存储队列单元219,用于在指令访问存储单元时,存储访问同一存储空间的连续两条指令。The storage queue unit 219 is used to store two consecutive instructions accessing the same storage space when the instruction accesses the storage unit.
具体地,为了保证上述连续两条指令执行结果的正确性,当前指令如果被检测到与之前指令的数据存在依赖关系,该连续两条指令必须在上述存储队列单元219内等待至依赖关系被消除,才可将该连续两条指令提供给上述神经网络运算单元。Specifically, to ensure that the two consecutive instructions produce correct results, if the current instruction is detected to have a data dependency on the previous instruction, the two consecutive instructions must wait in the storage queue unit 219 until the dependency is eliminated; only then can they be provided to the neural network operation unit.
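The dependency check and store queue described above can be modeled with a short sketch. It assumes that "accessing the same data" means overlapping address ranges, tracks only the most recently issued instruction as the previous one, and additionally holds back any instruction that arrives while others are already waiting so that program order is preserved. The class and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class MemInstruction:
    name: str
    start: int  # first address accessed
    size: int   # number of addresses accessed


def overlaps(a: MemInstruction, b: MemInstruction) -> bool:
    """True when the two instructions touch at least one common address."""
    return a.start < b.start + b.size and b.start < a.start + a.size


class DependencyChecker:
    def __init__(self) -> None:
        self.previous: Optional[MemInstruction] = None      # issued, not yet finished
        self.store_queue: Deque[MemInstruction] = deque()   # instructions held back

    def submit(self, instruction: MemInstruction) -> bool:
        """Return True if issued directly, False if placed in the store queue."""
        depends = self.previous is not None and overlaps(instruction, self.previous)
        if depends or self.store_queue:
            self.store_queue.append(instruction)
            return False
        self.previous = instruction
        return True

    def complete(self) -> Optional[MemInstruction]:
        """Mark the previous instruction finished and release the next waiting one, if any."""
        self.previous = None
        if self.store_queue:
            self.previous = self.store_queue.popleft()
            return self.previous
        return None
```

A call to `complete()` corresponds to the previous instruction finishing execution, at which point the dependency is eliminated and the oldest waiting instruction is handed to the operation unit.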
神经网络运算单元213,用于处理指令处理模块或者存储队列单元传输过来的指令。The neural network computing unit 213 is used to process instructions transmitted from the instruction processing module or the storage queue unit.
存储单元220包括神经元缓存单元221和权值缓存单元222,神经网络数据模型存储于上述神经元缓存单元221和权值缓存单元222中。The storage unit 220 includes a neuron cache unit 221 and a weight cache unit 222. The neural network data model is stored in the neuron cache unit 221 and the weight cache unit 222.
输入输出单元230,用于输入语音信号,并输出图像处理指令。The input and output unit 230 is used to input voice signals and output image processing instructions.
在一个实施方式中,存储单元220可以是高速暂存存储器,输入输出单元230可以是IO直接内存存取模块。In one embodiment, the storage unit 220 may be a scratchpad memory, and the input and output unit 230 may be an IO direct memory access module.
具体地,上述图像处理装置的神经网络运算子单元将语音信号转换为图像处理指令具体包括:Specifically, the neural network operation subunit of the above-mentioned image processing device converts the speech signal into an image processing instruction, which specifically includes:
步骤A、取指令模块214从指令缓存单元211取出一条用于语音识别的神经网络运算指令,并将运算指令送往译码模块215。Step A. The instruction fetching module 214 fetches a neural network operation instruction for speech recognition from the instruction cache unit 211 and sends the operation instruction to the decoding module 215.
步骤B、译码模块215对运算指令译码,并将译码后的指令送往指令队列单元216。Step B. The decoding module 215 decodes the operation instruction and sends the decoded instruction to the instruction queue unit 216.
步骤C、从标量寄存模块217中获取所述指令对应的神经网络运算操作码和神经网络运算操作数。Step C. The neural network operation opcode and neural network operation operands corresponding to the instruction are obtained from the scalar register module 217.
步骤D、指令被送往处理依赖关系模块218;该处理依赖关系模块218对指令对应的操作码和操作数进行判断,判断指令与之前尚未执行完的指令在数据上是否存在依赖关系,如果不存在,将所述指令直接送往神经网络运算单元213;如果存在,则指令需要在存储队列单元219中等待,直至其与之前尚未执行完的指令在数据上不再存在依赖关系,然后将所述指令送往神经网络运算单元213。Step D. The instruction is sent to the dependency processing module 218, which examines the opcode and operands corresponding to the instruction and determines whether the instruction has a data dependency on an earlier instruction that has not finished executing. If there is no dependency, the instruction is sent directly to the neural network operation unit 213; if there is, the instruction waits in the storage queue unit 219 until it no longer has a data dependency on the unfinished earlier instruction, and is then sent to the neural network operation unit 213.
步骤E、神经网络运算子单元2131根据指令对应的操作码和操作数确定所需数据的地址和大小,从存储单元220取出所需数据,包括语音指令转换模型数据等。Step E. The neural network operation subunit 2131 determines the address and size of the required data according to the operation code and operand corresponding to the instruction, and retrieves the required data from the storage unit 220, including voice command conversion model data, etc.
步骤F、神经网络运算子单元2131执行所述指令对应的神经网络运算,完成相应处理,得到图像处理指令,并将图像处理指令写回存储单元220的神经元存储单元221。Step F. The neural network operation subunit 2131 executes the neural network operation corresponding to the instruction, completes the corresponding processing, obtains the image processing instruction, and writes the image processing instruction back to the neuron storage unit 221 of the storage unit 220.
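Steps A to F above can be traced with a toy model. Everything in it is a simplifying assumption: instructions are strings of the form "KIND REGISTER", the scalar register file maps a register name to an (opcode, address, size) triple, storage is a flat list, and execution is strictly sequential, so the dependency check of Step D always passes. The function name `run_steps` and all data layouts are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple


def run_steps(
    instruction_cache: Sequence[str],
    scalar_registers: Dict[str, Tuple[str, int, int]],
    storage: List[float],
    operations: Dict[str, Callable[[List[float]], float]],
) -> List[Tuple[str, float]]:
    """Walk each instruction through fetch, decode, queue, operand lookup and execute."""
    instruction_queue = deque()
    for raw in instruction_cache:                        # Step A: fetch from the cache
        kind, register = raw.split()                     # Step B: decode
        instruction_queue.append((kind, register))       #         and queue in order
    written_back: List[Tuple[str, float]] = []
    while instruction_queue:
        kind, register = instruction_queue.popleft()
        opcode, address, size = scalar_registers[register]  # Step C: opcode and operands
        # Step D: sequential execution here, so the previous instruction has
        # always finished and no instruction needs to wait in a store queue.
        operands = storage[address:address + size]       # Step E: load the required data
        result = operations[opcode](operands)            # Step F: execute the operation
        written_back.append((kind, result))              #         and write the result back
    return written_back
```

The returned list stands in for the results written back to the neuron storage unit or the general data cache unit, depending on the instruction kind.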
具体地,上述图像处理装置的通用运算子单元将语音信号转换为图像处理指令具体包括:Specifically, the general operation subunit of the above-mentioned image processing device converts the voice signal into an image processing instruction, which specifically includes:
步骤A’、取指令模块214从指令缓存单元211取出一条用于语音识别的通用运算指令,并将运算指令送往译码模块215。Step A’, the instruction fetching module 214 fetches a general operation instruction for speech recognition from the instruction cache unit 211, and sends the operation instruction to the decoding module 215.
步骤B’、译码模块215对运算指令译码,并将译码后的指令送往指令队列单元216。Step B’, the decoding module 215 decodes the operation instructions and sends the decoded instructions to the instruction queue unit 216.
步骤C’、从标量寄存模块217中获取所述指令对应的通用运算操作码和通用运算操作数。Step C’: Obtain the general operation operation code and general operation operand corresponding to the instruction from the scalar register module 217.
步骤D’、指令被送往处理依赖关系模块218;该处理依赖关系模块218对指令对应的操作码和操作数进行判断,判断指令与之前尚未执行完的指令在数据上是否存在依赖关系,如果不存在,将所述指令直接送往神经网络运算单元213;如果存在,则指令需要在存储队列单元219中等待,直至其与之前尚未执行完的指令在数据上不再存在依赖关系,然后将所述指令送往神经网络运算单元213。Step D’. The instruction is sent to the dependency processing module 218, which examines the opcode and operands corresponding to the instruction and determines whether the instruction has a data dependency on an earlier instruction that has not finished executing. If there is no dependency, the instruction is sent directly to the neural network operation unit 213; if there is, the instruction waits in the storage queue unit 219 until it no longer has a data dependency on the unfinished earlier instruction, and is then sent to the neural network operation unit 213.
步骤E’、通用运算子单元2132根据指令对应的操作码和操作数确定所需数据的地址和大小,从存储单元220取出所需数据,包括语音指令转换模型数据等。Step E', the general operation subunit 2132 determines the address and size of the required data according to the operation code and operand corresponding to the instruction, and retrieves the required data from the storage unit 220, including voice command conversion model data, etc.
步骤F’、通用运算子单元2132执行所述指令对应的通用运算,完成相应处理,得到图像处理指令,并将图像处理指令写回存储单元220的通用数据缓存单元223。Step F’, the general operation sub-unit 2132 executes the general operation corresponding to the instruction, completes the corresponding processing, obtains the image processing instruction, and writes the image processing instruction back to the general data cache unit 223 of the storage unit 220.
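The dependency check of Step D’ can be illustrated with a minimal Python sketch. It is not the patented hardware logic: the `Instruction` record, the address-range overlap test, and the rule that an issued instruction finishes by the next cycle are all assumptions made for illustration. Only the idea comes from the text, namely that an instruction waits in a storage queue while it has a data dependency on an unfinished earlier instruction.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    reads: tuple   # (start_address, size) of the data the instruction reads
    writes: tuple  # (start_address, size) of the data the instruction writes


def overlaps(a, b):
    """Return True if two (start, size) address ranges share an address."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def depends_on(new, earlier):
    """Return True if `new` has a data dependency on the unfinished `earlier`."""
    return (overlaps(new.reads, earlier.writes)      # read after write
            or overlaps(new.writes, earlier.writes)  # write after write
            or overlaps(new.writes, earlier.reads))  # write after read


def schedule(instructions):
    """Group instructions into issue cycles, in program order.

    Assumption: an instruction issued in one cycle has finished by the next.
    """
    storage_queue = deque(instructions)
    cycles = []
    while storage_queue:
        issued = []
        # Issue from the head of the queue until an instruction depends on
        # one that was issued in this cycle and is therefore still running.
        while storage_queue and not any(
                depends_on(storage_queue[0], e) for e in issued):
            issued.append(storage_queue.popleft())
        cycles.append([i.name for i in issued])
    return cycles


# Hypothetical three-instruction program with made-up addresses.
program = [
    Instruction("load_model", reads=(100, 4), writes=(0, 4)),
    Instruction("recognize", reads=(0, 4), writes=(8, 4)),  # needs load_model
    Instruction("load_image", reads=(200, 4), writes=(300, 4)),
]
print(schedule(program))
```

In this sketch `recognize` reads the range that `load_model` writes, so it is held back for one cycle, while the independent `load_image` issues together with it afterwards.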
It should be noted that, during image processing, the specific operation of the neural network operation subunit 2131 and the general operation subunit 2132 of the neural network operation unit 213, of the neuron storage unit 221, the weight cache unit 222, and the general data cache unit 223 of the storage unit 220, and of the input/output unit 230 is as described for the embodiment shown in FIG. 1 and is not repeated here.
It should also be noted that the storage unit 220 is the on-chip cache unit of the image processing device shown in FIG. 2.
Optionally, the image processing device may be a data processing device, a robot, a computer, a tablet, a smart terminal, a mobile phone, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, or a wearable device.
In a feasible embodiment, an image processing chip includes the image processing device shown in FIG. 1.
The chip includes a main chip and a cooperation chip.
The cooperation chip includes the device described in the first aspect of the embodiments of the present invention. The main chip provides a start signal to the cooperation chip and controls the transmission of the image to be processed and the image processing instruction to the cooperation chip.
Optionally, the image processing chip may be used in a camera, a mobile phone, a computer, a notebook, a tablet, or another image processing device.
In a feasible embodiment, an embodiment of the present invention provides a chip packaging structure that includes the image processing chip.
In a feasible embodiment, an embodiment of the present invention provides a board card that includes the chip packaging structure.
In a feasible embodiment, an embodiment of the present invention provides an electronic device that includes the board card.
In a feasible embodiment, an embodiment of the present invention provides another electronic device, which includes the board card, an interactive interface, a control unit, and a voice collector.
As shown in FIG. 3, the voice collector receives voice and passes the voice and the image to be processed as input data to the image processing chip inside the board card.
Optionally, the image processing chip may be an artificial neural network processing chip.
Preferably, the voice collector is a microphone or a multi-array microphone.
The chip inside the board card is the same as in the embodiments shown in FIG. 1 and FIG. 2. It produces the corresponding output data (that is, the processed image) and transmits it to the interactive interface.
The interactive interface receives the output data of the chip (which can be regarded as an artificial neural network processor) and converts it into feedback information in a suitable form for display to the user.
The control unit receives the user's operations or commands and controls the operation of the entire image processing device.
Optionally, the electronic device may be a data processing device, a robot, a computer, a tablet, a smart terminal, a mobile phone, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, or a wearable device.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present invention. As shown in FIG. 4, the method includes:
S401. The image processing device inputs a voice signal and an image to be processed.
S402. The image processing device stores the voice signal and the image to be processed.
S403. The image processing device converts the voice signal into an image processing instruction and a target area, the target area being the processing area of the image to be processed; processes the target area according to the image processing instruction to obtain a processed image; and stores the image to be processed in the storage unit.
In a feasible embodiment, converting the voice signal into an image processing instruction and a target area includes:
converting the voice signal into text information using speech recognition technology;
converting the text information into the image processing instruction using natural language processing technology and a target voice instruction conversion model;
dividing the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology, to obtain the target area.
In a feasible embodiment, converting the voice signal into an image processing instruction and a target area according to the target voice instruction conversion model includes:
converting the voice signal into the image processing instruction using speech recognition technology, semantic understanding technology, and the target voice instruction conversion model;
dividing the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology, to obtain the target area.
In a feasible embodiment, converting the voice signal into an image processing instruction and a target area includes:
converting the voice signal into the image processing instruction using speech recognition technology, semantic understanding technology, and a target voice instruction conversion model;
dividing the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology, to obtain the target area.
In a feasible embodiment, converting the voice signal into an image processing instruction and a target area includes:
converting the voice signal into text information using speech recognition technology;
converting the text information into the image processing instruction using natural language processing technology;
dividing the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology, to obtain the target area.
In a feasible embodiment, converting the voice signal into an image processing instruction and a target area includes:
converting the voice signal into the image processing instruction using speech recognition technology and semantic understanding technology;
dividing the image to be processed into regions according to the granularity of the semantic region in the image processing instruction and image recognition technology, to obtain the target area.
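The last two stages of the conversion embodiments above (text to image processing instruction, then instruction to target area) might be sketched as follows. This is a toy stand-in, not the patent's method: keyword lookup replaces the natural language processing and voice instruction conversion model, a dictionary of pre-labeled bounding boxes replaces image recognition, and every name in the vocabulary tables is hypothetical. Speech recognition is assumed to have already produced the text.

```python
# Hypothetical vocabularies; the source does not define any of these names.
OPERATIONS = {"blur": "BLUR", "brighten": "BRIGHTEN", "sharpen": "SHARPEN"}
REGIONS = {"face": "face", "background": "background", "sky": "sky"}


def text_to_instruction(text):
    """Map recognized text to an instruction with an operation and a region."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    operation = next((OPERATIONS[w] for w in words if w in OPERATIONS), None)
    if operation is None:
        raise ValueError("no known image processing operation in: " + text)
    # If no semantic region is named, fall back to the whole image.
    region = next((REGIONS[w] for w in words if w in REGIONS), "whole_image")
    return {"operation": operation, "semantic_region": region}


def select_target_area(instruction, labeled_regions):
    """Pick the bounding box (x, y, width, height) of the requested region.

    `labeled_regions` stands in for the output of image recognition and
    maps a region label to its bounding box. Returns None when the
    requested region was not found in the image.
    """
    return labeled_regions.get(instruction["semantic_region"])


# Example with made-up recognition output for a 640x480 image.
regions = {
    "whole_image": (0, 0, 640, 480),
    "face": (260, 120, 120, 150),
    "background": (0, 0, 640, 480),
}
instruction = text_to_instruction("Please blur the background.")
target_area = select_target_area(instruction, regions)
```

Here the semantic region named in the instruction decides how finely the image is divided: a request about the face selects a small box, while a request with no region falls back to the whole image.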
In a feasible embodiment, after converting the voice signal into an image processing instruction and a target area, the method further includes:
storing the image processing instruction and the target area.
In a feasible embodiment, processing the target area according to the image processing instruction to obtain a processed image includes:
obtaining M image processing instructions from the neuron storage unit within a preset time window;
deleting the image processing instructions that have the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1 and N is an integer less than M;
processing the target area according to the N image processing instructions and a target image processing model to obtain the processed image.
In a feasible embodiment, processing the target area according to the image processing instruction to obtain a processed image includes:
obtaining M image processing instructions from the general data cache unit within a preset time window;
deleting the image processing instructions that have the same function among the M image processing instructions to obtain N image processing instructions, where M is an integer greater than 1 and N is an integer less than M;
processing the target area according to the N image processing instructions to obtain the processed image.
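The reduction from M to N instructions within a time window can be sketched as below. The source does not say how "same function" is decided or which duplicate survives, so this sketch assumes that two instructions are the same when their operation and region match, and that the first occurrence is kept. The instruction fields and timestamps are hypothetical.

```python
def collect_window(timestamped, start, length):
    """Return the instructions whose timestamp lies in [start, start + length)."""
    return [instr for t, instr in timestamped if start <= t < start + length]


def deduplicate(instructions):
    """Keep one instruction per function, in order of first appearance."""
    seen = set()
    kept = []
    for instr in instructions:
        key = (instr["operation"], instr["region"])  # assumed notion of "function"
        if key not in seen:
            seen.add(key)
            kept.append(instr)
    return kept


# Hypothetical stream of (timestamp in seconds, instruction) pairs.
stream = [
    (0.1, {"operation": "BLUR", "region": "background"}),
    (0.4, {"operation": "BRIGHTEN", "region": "face"}),
    (0.7, {"operation": "BLUR", "region": "background"}),  # repeats the first
    (2.5, {"operation": "SHARPEN", "region": "face"}),     # outside the window
]
window = collect_window(stream, start=0.0, length=1.0)  # M = 3 instructions
unique = deduplicate(window)                            # N = 2 instructions
```

In general this procedure yields N less than or equal to M. The case N less than M stated in the text corresponds to a window that contains at least one repeated instruction, as in the example.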
S404. The image processing device outputs the processed image.
In a feasible embodiment, the method further includes:
performing adaptive training on a voice instruction conversion model to obtain the target voice instruction conversion model.
In a feasible embodiment, the adaptive training of the voice instruction conversion model is performed offline or online.
In a feasible embodiment, the adaptive training of the voice instruction conversion model is supervised or unsupervised.
In a feasible embodiment, performing adaptive training on the voice instruction conversion model to obtain the target voice instruction conversion model includes:
converting the voice signal into a predicted instruction according to the voice instruction conversion model;
determining the correlation coefficient between the predicted instruction and its corresponding instruction set;
optimizing the voice instruction conversion model according to the correlation coefficient between the predicted instruction and its corresponding instruction set, to obtain the target voice instruction conversion model.
In a feasible embodiment, the method further includes:
performing adaptive training on an image processing model to obtain the target image processing model.
In a feasible embodiment, the adaptive training of the image processing model is performed offline or online.
In a feasible embodiment, the adaptive training of the image processing model is supervised or unsupervised.
In a feasible embodiment, performing adaptive training on the image processing model includes:
processing the image to be processed according to the image processing model to obtain a predicted image;
determining the correlation coefficient between the predicted image and its corresponding target image;
optimizing the image processing model according to the correlation coefficient between the predicted image and its corresponding target image, to obtain the target image processing model.
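The three training steps for the image processing model (predict, measure correlation, optimize) can be sketched in a few lines. The source does not name a particular correlation coefficient or optimizer, so this sketch assumes the Pearson correlation coefficient over pixel values and a plain grid search. The one-parameter gamma-curve "model" and the pixel data are hypothetical and exist only to make the loop concrete.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long pixel sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # correlation is undefined for a constant image
    return cov / (std_x * std_y)


def apply_model(pixels, gamma):
    """Toy image processing model: a gamma curve on pixels in [0, 1]."""
    return [p ** gamma for p in pixels]


def train(pixels, target, candidates):
    """Pick the model parameter whose prediction correlates best with the target."""
    return max(candidates,
               key=lambda g: pearson(apply_model(pixels, g), target))


# Hypothetical supervised pair: the target image was produced with gamma = 2.
image = [0.0, 0.25, 0.5, 0.75, 1.0]
target_image = [p ** 2 for p in image]
best_gamma = train(image, target_image, candidates=[0.5, 1.0, 2.0, 3.0])
```

A gamma curve is used because the Pearson coefficient ignores uniform scaling and offset, so a pure brightness gain could not be learned from correlation alone. A real model would have many parameters and would typically be optimized by gradient-based training rather than grid search.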
It should be noted that, for the specific implementation of each step of the method shown in FIG. 4, reference may be made to the specific implementation of the image processing device described above, which is not repeated here.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
| Publication Number | Publication Date |
|---|---|
| CN109785843A (en) | 2019-05-21 |
| CN109785843B (en) | 2024-03-26 |

| Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN201711121244.5A | Active | CN109785843B (en) | 2017-09-29 | 2017-11-14 | Image processing apparatus and method |