CN118333171B

Info

Publication number: CN118333171B
Application number: CN202410725531.0A
Authority: CN
Inventors: 向鸣; 陈凤; 李康荣; 刘思健; 赵政桥; 黄海; 陈捷; 陈景东
Original assignee: South China Sea Survey Technology Center, State Oceanic Administration (South China Sea Marine Buoy Center); Northwestern Polytechnical University
Current assignee: South China Sea Survey Technology Center, State Oceanic Administration (South China Sea Marine Buoy Center); Northwestern Polytechnical University
Priority date: 2024-06-06
Filing date: 2024-06-06
Publication date: 2024-10-11
Anticipated expiration: 2044-06-06
Also published as: CN118333171A

Abstract

The invention relates to the technical field of marine ecological environment monitoring and discloses a penguin monitoring method and device based on video and passive acoustics. The method uses several groups of acquisition devices to simultaneously acquire audio data and video data of a target area; analyzes the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin; determines the pixel coordinates of the vocalizing penguin in the video data from those relative spatial coordinates and the video data; extracts a video clip of the vocalizing penguin from the video data according to its pixel coordinates; and extracts several behaviors of the vocalizing penguin from the video clip, labels the sound clips associated with each behavior, and constructs a communication behavior knowledge graph of the vocalizing penguin. The invention improves the comprehensiveness and effectiveness of data acquisition, improves the accuracy of penguin positioning, and deepens research on how penguins communicate.

Description

Translated from Chinese

A penguin monitoring method and device based on video and passive acoustics

Technical Field

The present invention relates to the technical field of marine ecological environment monitoring, and in particular to a penguin monitoring method and device based on video and passive acoustics.

Background Art

Assessing changes and diversity in the behavioral patterns of biological populations is a classic problem in ecological research. Monitoring and studying changes in animal behavior also helps answer basic questions in the field of animal communication, such as the modality and function of communication signals. Antarctic penguins are monitored mainly from a distance, from the air by drone or by satellite remote sensing. Because the natural environment of Antarctica is harsh and penguin habitats lie far from their foraging sites, collecting audio and video data of penguins at close range over long periods is difficult. With the establishment of many Antarctic penguin aquariums in recent decades, much data collection on Antarctic penguins has moved from the wild to indoor settings. Indoors, however, space is limited and many penguins of different species are mixed together. When a penguin calls, it is hard to tell from video and microphones which penguin made the call, so the acquired audio signals cannot be classified effectively and correctly.

Current means of penguin observation include high-resolution satellite remote sensing images, aerial drone photography, fixed-point camouflaged cameras, wearable observation equipment, and remote-controlled robots, and each has shortcomings. High-resolution satellite images are delayed, carry no sound, show only the view from directly above, and render each penguin as a tiny, blurred object. Aerial drones can stay in the air only briefly and cannot support long-term monitoring. Fixed-point camouflaged cameras are easily discovered or even destroyed by penguins, cannot cover the penguins' entire range of activity, and cannot capture data from multiple angles. Wearable observation equipment causes some harm to the penguins that wear it and gives no control over the viewing angle. Remote-controlled robots can collect close-range data from only one penguin at a time and require manual operation, so operating time is limited; their endurance cannot support long-term observation, and approaching penguins directly also disturbs or threatens their daily activities.

Summary of the Invention

The present invention provides a penguin monitoring method and device based on video and passive acoustics, which improves the comprehensiveness and effectiveness of data collection, improves the accuracy of penguin positioning, and deepens research on how penguins communicate.

To solve the above technical problems, the present invention provides a penguin monitoring method based on video and passive acoustics, comprising:

using several groups of acquisition devices to simultaneously acquire audio data and video data of a target area, wherein each group of acquisition devices includes a video collector and an audio collector;

analyzing the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin;

determining the pixel coordinates of the vocalizing penguin in the video data according to the relative spatial coordinates of the vocalizing penguin and the video data;

extracting a video clip of the vocalizing penguin from the video data according to the pixel coordinates of the vocalizing penguin;

extracting several behaviors of the vocalizing penguin from the video clip, labeling the sound clips associated with each behavior, and constructing a communication behavior knowledge graph of the vocalizing penguin.

Further, using several groups of acquisition devices to simultaneously acquire audio data and video data of the target area specifically comprises:

using a long-distance directional pickup as the sound collector;

using an ultra-high-definition surveillance camera as the video collector;

combining the sound collector and the video collector to form an acquisition device;

fixing several groups of acquisition devices on brackets according to preset device position information, and acquiring audio data and video data of the target area in real time through those groups of acquisition devices.

Further, analyzing the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin specifically comprises:

obtaining the relative spatial position of each audio collector and the time differences with which the audio collectors receive the sound signal of the vocalizing penguin;

calculating the distance between each audio collector and the vocalizing penguin from the speed of sound in standard air and the time differences;

calculating the relative spatial coordinates of the vocalizing penguin from the relative spatial positions of the audio collectors and the distances between the audio collectors and the vocalizing penguin.

Further, extracting a video clip of the vocalizing penguin from the video data according to the pixel coordinates of the vocalizing penguin specifically comprises:

using a tracking algorithm to follow the movement trajectory of the vocalizing penguin in the video data, starting from its pixel coordinates, and extracting the video clip of the vocalizing penguin along that trajectory.

Further, extracting several behaviors of the vocalizing penguin from the video clip specifically comprises:

in the video clip of the vocalizing penguin, extracting a skeleton representation of the penguin with a posture recognition algorithm and extracting its motion features with a motion estimation algorithm;

performing semantic analysis on the skeleton representation and on the motion features with a preset multimodal large model to obtain several behaviors of the vocalizing penguin.

Further, labeling the sound clips associated with each penguin behavior specifically comprises:

clustering the behaviors of the vocalizing penguin with a clustering algorithm, analyzing the types of behavior present, and obtaining the category of each behavior;

determining the sound clip corresponding to each behavior from the video clip corresponding to that behavior;

labeling the sound clip corresponding to each behavior with the category of that behavior.

Further, after analyzing the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin, the method further comprises:

extracting the sound clip of the vocalizing penguin from the audio data according to its relative spatial coordinates;

identifying the sound clip of the vocalizing penguin with a preset penguin species detection and classification model to obtain the species of the vocalizing penguin.

Further, constructing the communication behavior knowledge graph of the vocalizing penguin specifically comprises:

analyzing the sound clips corresponding to the penguin behaviors, associating the sound signals of the vocalizing penguin with its behaviors, and constructing a communication behavior knowledge graph for the species to which the vocalizing penguin belongs.
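The text does not specify how the knowledge graph is stored. As a minimal sketch, assuming each observation has already been reduced to a (species, call label, behavior) triple, the association can be held as a weighted adjacency map. The function name `build_graph`, the triple layout, and the example labels are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_graph(observations):
    """Count how often each (species, call label) node co-occurs with a behavior.

    observations: iterable of (species, call_label, behavior) triples.
    Returns {(species, call_label): {behavior: count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for species, call_label, behavior in observations:
        # Each co-occurrence strengthens the edge between call and behavior.
        graph[(species, call_label)][behavior] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {node: dict(edges) for node, edges in graph.items()}


# Hypothetical observations for illustration only.
observations = [
    ("king", "call_A", "head shaking"),
    ("king", "call_A", "head shaking"),
    ("king", "call_B", "bowing"),
]
graph = build_graph(observations)
```

Edge counts give a rough measure of how consistently a call type accompanies a behavior within one species; a dedicated graph store could replace the dictionary if richer queries are needed.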

Further, after using several groups of acquisition devices to simultaneously acquire audio data and video data of the target area, the method further comprises:

preprocessing the acquired audio data by beamforming, noise reduction, and dereverberation;

detecting the preprocessed audio data according to its spectral characteristics, and removing from the audio data the sound signals detected as background noise.

The present invention provides a penguin monitoring method based on video and passive acoustics. Purpose-designed acquisition devices collect audio data and video data of the target area over long periods and from multiple angles. A pickup array calculates the relative spatial coordinates of the vocalizing penguin from the time differences between the received audio signals, so that the corresponding audio signals can be labeled and classified correctly. The relative spatial coordinates are then converted and combined with the video data to obtain the pixel coordinates of the vocalizing penguin, and a video clip of that penguin is extracted from the video data to establish what it was doing while vocalizing. By recognizing and semantically analyzing penguin behavior, the method associates the sound signals of the vocalizing penguin with its behavior and infers the targets and patterns of penguin communication. The invention collects penguin data over long periods and from multiple angles without disturbing the penguins' daily activities, improves the comprehensiveness and effectiveness of data collection, improves the accuracy of penguin positioning, and deepens research on how penguins communicate.

Accordingly, the present invention provides a penguin monitoring device based on video and passive acoustics, comprising an acquisition module, a positioning module, a coordinate conversion module, a clip extraction module, and an association module.

The acquisition module is used to simultaneously acquire audio data and video data of the target area with several groups of acquisition devices, wherein each group of acquisition devices includes a video collector and an audio collector.

The positioning module is used to analyze the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin.

The coordinate conversion module is used to determine the pixel coordinates of the vocalizing penguin in the video data according to the relative spatial coordinates of the vocalizing penguin and the video data.

The clip extraction module is used to extract a video clip of the vocalizing penguin from the video data according to the pixel coordinates of the vocalizing penguin.

The association module is used to extract several behaviors of the vocalizing penguin from the video clip, label the sound clips associated with each behavior, and construct a communication behavior knowledge graph of the vocalizing penguin.

The present invention provides a penguin monitoring device based on video and passive acoustics, built on the combination of the modules above. Purpose-designed acquisition devices collect audio data and video data of the target area over long periods and from multiple angles. A pickup array calculates the relative spatial coordinates of the vocalizing penguin from the time differences between the received audio signals, so that the corresponding audio signals can be labeled and classified correctly. The relative spatial coordinates are then converted and combined with the video data to obtain the pixel coordinates of the vocalizing penguin, and a video clip of that penguin is extracted from the video data to establish what it was doing while vocalizing. By recognizing and semantically analyzing penguin behavior, the device associates the sound signals of the vocalizing penguin with its behavior and infers the targets and patterns of penguin communication. The invention collects penguin data over long periods and from multiple angles without disturbing the penguins' daily activities, improves the comprehensiveness and effectiveness of data collection, improves the accuracy of penguin positioning, and deepens research on how penguins communicate.

Brief Description of the Drawings

FIG. 1 is a flow chart of an embodiment of the penguin monitoring method based on video and passive acoustics provided by the present invention;

FIG. 2 is a schematic diagram of an embodiment of the method for deploying the acquisition devices of the present invention;

FIG. 3 is a schematic diagram of an embodiment of the coordinate conversion method provided by the present invention;

FIG. 4 is a schematic structural diagram of an embodiment of the penguin monitoring device based on video and passive acoustics provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative work fall within the scope of protection of the present invention.

The flowcharts shown in the drawings are examples only. They need not include every item and operation/step, and the steps need not be executed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may change according to circumstances.

Some embodiments of the present invention are described in detail below with reference to the drawings. Where no conflict arises, the following embodiments and the features within them may be combined with one another.

Embodiment 1

FIG. 1 is a flow chart of an embodiment of the penguin monitoring method based on video and passive acoustics provided by the present invention. The method is applied in a penguin aquarium and comprises steps 101 to 105, as follows:

Step 101: Use several groups of acquisition devices to simultaneously acquire audio data and video data of a target area, wherein each group of acquisition devices includes a video collector and an audio collector.

Further, in the first embodiment of the present invention, using several groups of acquisition devices to simultaneously acquire audio data and video data of the target area is specifically as follows:

In the first embodiment of the present invention, an ultra-high-definition surveillance camera is used. A single network cable and a matching PoE switch provide both long-term power and real-time video transmission, so that the camera can capture large volumes of footage of all penguins in the target area over long periods and supply training, validation, and test samples for the subsequent detection and recognition of penguin behavior, species, and so on. A long-distance directional microphone-array pickup is likewise powered and connected through a single network cable and a matching PoE switch, so that it can capture large volumes of audio from all penguins in the target area over long periods.

As an example of the first embodiment of the present invention, FIG. 2 is a schematic diagram of an embodiment of the method for deploying the acquisition devices. To acquire video and audio of the target area synchronously, and to capture penguin audio and video signals from different positions and angles at the same time, several audio-video recording arrays collect data simultaneously. Each array element consists of one ultra-high-definition surveillance camera and one long-distance directional microphone-array pickup, fixed in relative position by a bracket so that the position of a target relative to the devices can be solved. The arrays provide synchronized audio and video of the penguins at the target location in the aquarium. The pickup array localizes the sound source, and the located vocalizing penguin is then matched against the species and behavior of the penguin at the corresponding position in the panoramic video, which provides data support for subsequent research on penguin vocal communication and for recognition processing.

Further, in the first embodiment of the present invention, after using several groups of acquisition devices to simultaneously acquire audio data and video data of the target area, the method further comprises:

In the first embodiment of the present invention, the penguin enclosure has strong reverberation and background noise, with noise sources that include refrigeration equipment and visitors, so the audio data must be preprocessed after acquisition. First, the microphone-array data collected by the long-distance directional pickups undergoes signal processing such as beamforming, noise reduction, and dereverberation to improve the quality of the acoustic signal. Then, audio frames containing the calls of all penguin species and the background noise of the enclosure are taken from the audio stream collected by the directional pickup array and are classified and labeled. Based on spectral characteristics, the frames are separated into background noise and biological sounds, yielding a penguin sound data set and a background sound data set. The penguin sound data set is used in the subsequent monitoring and analysis.
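The text names beamforming as a preprocessing step without specifying the beamformer. The sketch below shows the simplest variant, delay-and-sum, under the assumption that the per-channel delays are already known as whole numbers of samples. The function name and the toy signals are hypothetical, and a practical system would likely use fractional delays and a more robust design.

```python
def delay_and_sum(channels, delays):
    """Align channels by integer sample delays and average them.

    channels: list of equal-length lists of samples, one list per microphone.
    delays:   delays[i] is the number of samples by which channel i lags
              the reference channel for the chosen look direction.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        count = 0
        for samples, delay in zip(channels, delays):
            idx = t + delay  # read ahead in the lagging channel to undo its lag
            if 0 <= idx < n:
                acc += samples[idx]
                count += 1
        # Average only over the channels that have a valid sample here.
        out.append(acc / count if count else 0.0)
    return out


# Toy example: the same pulse reaches the second microphone 2 samples later.
ch0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
ch1 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
aligned = delay_and_sum([ch0, ch1], [0, 2])
```

After alignment the two copies of the pulse add coherently at the reference time, while uncorrelated noise on the channels would average down.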

Step 102: Analyze the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin.

Further, in the first embodiment of the present invention, analyzing the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin is specifically as follows:

In the first embodiment of the present invention, the relative spatial position of each audio collector is obtained first. Then, from the time differences with which the audio collectors receive the signal from the same sound source, the distance differences are calculated using the speed of sound in standard air (340 m/s). From these, the distance from each audio collector to the sound source is calculated, and finally the relative spatial coordinates of the sound source are obtained.
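The localization step can be sketched as a small search problem. The example below is an illustration rather than the patent's solver, which is not specified: it assumes a 2-D floor plan, noise-free time differences measured relative to the first collector, and a coarse-to-fine grid search. The function name `locate_source` and its parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value quoted in the text


def locate_source(mics, tdoas, bounds, passes=6, grid=20):
    """Estimate a 2-D source position from time differences of arrival.

    mics:   list of (x, y) collector positions in metres.
    tdoas:  tdoas[i] = arrival time at mics[i] minus arrival time at mics[0].
    bounds: (xmin, xmax, ymin, ymax) search area in metres.
    """
    # Time difference times sound speed gives the range difference.
    range_diffs = [SPEED_OF_SOUND * t for t in tdoas]

    def cost(x, y):
        d0 = math.dist((x, y), mics[0])
        return sum((math.dist((x, y), m) - d0 - rd) ** 2
                   for m, rd in zip(mics, range_diffs))

    xmin, xmax, ymin, ymax = bounds
    best = None
    for _ in range(passes):
        sx = (xmax - xmin) / grid
        sy = (ymax - ymin) / grid
        candidates = [(xmin + i * sx, ymin + j * sy)
                      for i in range(grid + 1) for j in range(grid + 1)]
        best = min(candidates, key=lambda p: cost(*p))
        # Shrink the search window to one grid step around the best point.
        xmin, xmax = best[0] - sx, best[0] + sx
        ymin, ymax = best[1] - sy, best[1] + sy
    return best


# Four collectors at the corners of a hypothetical 10 m x 8 m enclosure.
mics = [(0.0, 0.0), (10.0, 0.0), (0.0, 8.0), (10.0, 8.0)]
true_source = (3.0, 5.2)
d_ref = math.dist(true_source, mics[0])
tdoas = [(math.dist(true_source, m) - d_ref) / SPEED_OF_SOUND for m in mics]
estimate = locate_source(mics, tdoas, (0.0, 10.0, 0.0, 8.0))
```

With the estimated position in hand, the distance from each collector to the source follows directly from `math.dist`. With measured, noisy time differences a least-squares solver would be a more usual choice than a grid search.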

Further, in the first embodiment of the present invention, after analyzing the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the vocalizing penguin, the method further comprises:

In the first embodiment of the present invention, once the relative spatial coordinates of the vocalizing penguin have been obtained, the corresponding sound clip is extracted from the audio data for subsequent analysis. Feeding this sound clip into the penguin species detection and classification model identifies the species of the vocalizing penguin. Running species identification separately on the sound clips captured by several audio collectors can improve recognition accuracy.
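The text says that classifying the clips from several collectors separately can improve accuracy, but it does not say how the separate results are combined. One simple possibility, shown here purely as an assumption, is a majority vote over the per-collector labels. The function name and species labels are hypothetical.

```python
from collections import Counter


def fuse_species_votes(predictions):
    """Combine per-collector species labels by majority vote.

    predictions: list of species labels, one per audio collector.
    Returns (winning_species, share_of_votes).
    """
    if not predictions:
        raise ValueError("no predictions to fuse")
    counts = Counter(predictions)
    species, votes = counts.most_common(1)[0]
    return species, votes / len(predictions)


# Hypothetical labels from three collectors for the same call.
species, share = fuse_species_votes(["king", "gentoo", "king"])
```

The vote share can serve as a rough confidence value; if the classifier also outputs probabilities, averaging them would be an alternative.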

Step 103: Determine the pixel coordinates of the vocalizing penguin in the video data according to the relative spatial coordinates of the vocalizing penguin and the video data.

As an example of the first embodiment of the present invention, mapping the relative spatial coordinates of the vocalizing penguin to the corresponding position in the video data associates the penguin's sound signal with its video signal. Specifically, the relative spatial coordinates of the vocalizing penguin are expressed in the world coordinate system, whereas positions in the video data are expressed in the pixel coordinate system. The two systems can be related through the camera coordinate system and the image coordinate system using the parameters of the video collector itself (camera placement angle, position, focal length, pixel dimensions, and so on), so that the pixel coordinates (u, v) in the video can be derived from the world coordinates (x, y, z). FIG. 3 is a schematic diagram of an embodiment of the coordinate conversion method provided by the present invention. The formula for converting world coordinates to camera coordinates is:

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T
$$

where $(X_c, Y_c, Z_c)$ are the camera coordinates, $(X_w, Y_w, Z_w)$ are the world coordinates, and $R$ and $T$ are the extrinsic parameters of the camera: the rotation matrix $R$ (3×3) describes the camera's placement angle, and the three-element translation vector $T$ describes the camera's placement height.

The formula for converting camera coordinates to image coordinates is:

$$
x = f\,\frac{X_c}{Z_c}, \qquad y = f\,\frac{Y_c}{Z_c}
$$

where $f$ is the focal length of the camera, $(X_c, Y_c, Z_c)$ are the camera coordinates, and $(x, y)$ are the image coordinates.

Image coordinates and pixel coordinates lie in the same plane. The origin of the image coordinates is at the center of the frame, while the origin of the pixel coordinates is at the upper-left corner, so converting from image coordinates to pixel coordinates involves two things: a translation of the coordinate values and a conversion from physical distance to pixel count. The formula for converting image coordinates to pixel coordinates is:

$$
u = \frac{x}{dx} + u_0, \qquad v = \frac{y}{dy} + v_0
$$

where $dx$ and $dy$ are the physical sizes of one pixel along the x and y axes, and $(u_0, v_0)$ are the pixel coordinates of the image center point O, which can be calculated from the image size. For example, if the image size is 1920×1080, then $u_0 = 960$ and $v_0 = 540$. If the camera frame is set to 80×60 cm, then $dx = 80/1920 = 1/24$ cm/pixel and $dy = 60/1080 = 1/18$ cm/pixel.
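The three conversions chain into one function. The sketch below assumes the standard pinhole relations (camera = R·world + T, image = f·camera/Z, pixel = image/pixel size + center offset) and ignores lens distortion, which the text does not discuss. The function name, argument names, and the example rotation, translation, and focal length are hypothetical; the image size and pixel sizes are the ones from the example above.

```python
def world_to_pixel(pw, R, T, f, dx, dy, u0, v0):
    """Project a world point to pixel coordinates with a pinhole model.

    pw:     (Xw, Yw, Zw) world coordinates.
    R:      3x3 rotation matrix as nested lists (extrinsic).
    T:      three-element translation vector (extrinsic).
    f:      focal length, in the same length unit as dx and dy.
    dx, dy: physical size of one pixel along x and y.
    u0, v0: pixel coordinates of the image center.
    """
    # World -> camera: rotate, then translate.
    xc, yc, zc = (sum(R[i][j] * pw[j] for j in range(3)) + T[i]
                  for i in range(3))
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera -> image plane: perspective division.
    x = f * xc / zc
    y = f * yc / zc
    # Image plane -> pixels: scale by pixel size, shift origin to top-left.
    return x / dx + u0, y / dy + v0


# Illustrative values: camera axes aligned with the world axes, no offset.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]
u, v = world_to_pixel((1.0, 0.5, 2.0), R, T, f=2.0,
                      dx=1 / 24, dy=1 / 18, u0=960, v0=540)
```

In practice R, T, and f would come from camera calibration rather than being written by hand, and a point on the optical axis maps to the image center (960, 540).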

步骤104：根据发声企鹅的像素坐标，在所述视频数据截取所述发声企鹅的视频片段。Step 104: According to the pixel coordinates of the sounding penguin, a video segment of the sounding penguin is intercepted from the video data.

进一步地，在本发明第一实施例中，根据发声企鹅的像素坐标，在所述视频数据截取所述发声企鹅的视频片段，具体为：Further, in the first embodiment of the present invention, according to the pixel coordinates of the sounding penguin, a video clip of the sounding penguin is intercepted from the video data, specifically:

In the first embodiment of the present invention, once the pixel coordinates of the sounding penguin in the video data have been determined, the transformer-assisted tracking algorithm TrSiam can be used to track the penguin's movement trajectory in the video data, enabling continuous extraction of the relevant penguin behavior data. Specifically, the sounding penguin identified by the penguin species detection and classification model is matched with the sounding penguin in the video data, and TrSiam is then used to track the movement trajectory of the sounding penguin in the video data. The video is cropped along the trajectory produced by the tracking algorithm, using a crop window that is based on the bounding box obtained by detecting the target penguin with the YOLO algorithm and expanded outward to twice its width and height. For example, if the pixel coordinates of the four corners of the YOLO detection box for sounding penguin A are (x1, y1), (x2, y1), (x1, y2), (x2, y2), the pixel coordinates of the four corners of the crop window after expansion are (x1-(x2-x1)/2, y1-(y2-y1)/2), (x2+(x2-x1)/2, y1-(y2-y1)/2), (x1-(x2-x1)/2, y2+(y2-y1)/2), (x2+(x2-x1)/2, y2+(y2-y1)/2). With the pixel coordinates of the crop window, OpenCV can be used to crop the required target picture from the video data. Each clip lasts at least 5 seconds and continues until the target disappears from the video or stops moving, and the clip timestamps in the original video are recorded.
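The expansion rule in the example above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: the function name `expand_box`, the optional frame-size arguments, and the clamping of the window to the frame bounds are assumptions added here, since the text does not state how a window that extends past the frame edge is handled.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def expand_box(box: Box,
               frame_w: Optional[int] = None,
               frame_h: Optional[int] = None) -> Box:
    """Return a crop window centered on `box` with twice its width and height."""
    x1, y1, x2, y2 = box
    half_w = (x2 - x1) // 2  # each side moves outward by half the box width
    half_h = (y2 - y1) // 2  # integer division: odd sizes are rounded down
    ex1, ey1 = x1 - half_w, y1 - half_h
    ex2, ey2 = x2 + half_w, y2 + half_h
    # Assumption (not in the source): clamp the window to the frame if its size is known.
    if frame_w is not None:
        ex1, ex2 = max(0, ex1), min(frame_w, ex2)
    if frame_h is not None:
        ey1, ey2 = max(0, ey1), min(frame_h, ey2)
    return ex1, ey1, ex2, ey2


print(expand_box((100, 100, 200, 300)))            # (50, 0, 250, 400)
print(expand_box((100, 100, 200, 300), 240, 360))  # (50, 0, 240, 360)
```

Because OpenCV represents a frame as a NumPy array indexed by row and then column, the resulting window would typically be applied by slicing, for example `frame[ey1:ey2, ex1:ex2]`.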

Step 105: extracting several penguin behaviors of the sounding penguin from the video clip of the sounding penguin, labeling the sound clips associated with each penguin behavior, and constructing a communication behavior knowledge graph of the sounding penguin.

Further, in the first embodiment of the present invention, extracting several penguin behaviors of the sounding penguin from the video clip of the sounding penguin is specifically as follows:

In the first embodiment of the present invention, a penguin may hold a certain posture or perform a certain action while vocalizing. Therefore, by extracting the skeleton representation and action features of the sounding penguin from the extracted video clip, the behavior of the sounding penguin at the time of vocalization can be identified. Interpreting the target behavior in the video with a multimodal neural network can generate descriptive text; for example, the explanatory text "the penguin is shaking its head" is generated for the behavior of the sounding penguin in the video, helping visitors or users observe the penguin's behavior.

Further, in the first embodiment of the present invention, labeling the sound clips associated with each penguin behavior is specifically as follows:

In the first embodiment of the present invention, after posture and action analysis is performed on the video clips, cluster analysis groups clips with the same semantics into one category. The original video is then clipped at full frame according to the recorded clip timestamps and saved, and it is labeled, together with the associated audio data, with the category of the determined posture or action. In this way, the data receive a quick and efficient preliminary classification by penguin behavior, rapidly providing a large amount of data for research on classifying different penguin calls.
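The labeling step described above might be sketched as follows, assuming that audio and video share one clock (the acquisition is described as simultaneous) and that each audio clip takes the behavior category of the video clip it overlaps most in time. The class names, the overlap rule, and the category name `bowing` are hypothetical; the text does not specify how clips are matched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoClip:
    start: float   # seconds in the original video
    end: float
    behavior: str  # category assigned by posture/action clustering


@dataclass
class AudioClip:
    start: float   # seconds on the same clock as the video (assumption)
    end: float
    label: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_audio(audio: List[AudioClip], video: List[VideoClip]) -> List[AudioClip]:
    """Give each audio clip the behavior of the video clip it overlaps most."""
    for clip in audio:
        best, best_overlap = None, 0.0
        for v in video:
            shared = overlap(clip.start, clip.end, v.start, v.end)
            if shared > best_overlap:
                best, best_overlap = v, shared
        clip.label = best.behavior if best is not None else None
    return audio


video = [VideoClip(0.0, 6.0, "head_shaking"), VideoClip(10.0, 17.0, "bowing")]
audio = [AudioClip(1.0, 2.5), AudioClip(5.5, 11.0), AudioClip(20.0, 21.0)]
print([c.label for c in label_audio(audio, video)])
# ['head_shaking', 'bowing', None]
```

An audio clip that overlaps no video clip stays unlabeled, which leaves such calls available for manual review rather than forcing a category onto them.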

Further, in the first embodiment of the present invention, constructing the communication behavior knowledge graph of the sounding penguin is specifically as follows:

In the first embodiment of the present invention, the sound signal of the sounding penguin is associated with the video signal of its behaviors, such as communication movements, to construct a knowledge graph of penguin communication and behavior. Combined with the species of the sounding penguin, the penguin's communication partners and patterns can be inferred, further strengthening research on how penguins communicate.
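One simple way to picture such a knowledge graph is as a set of (subject, relation, object) triples linking each penguin to its species, its calls, and the behaviors that accompany those calls. The sketch below is illustrative only: the relation names, the observation fields, and the sample identifiers are all hypothetical, and the text does not prescribe a graph schema or storage format.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_graph(observations: List[Dict[str, str]]) -> Set[Triple]:
    """Turn labeled observations into knowledge-graph triples."""
    triples: Set[Triple] = set()
    for obs in observations:
        penguin, call = obs["penguin"], obs["call"]
        triples.add((penguin, "is_species", obs["species"]))
        triples.add((penguin, "emits", call))
        triples.add((call, "co_occurs_with", obs["behavior"]))
    return triples


def objects(triples: Set[Triple], subject: str, relation: str) -> List[str]:
    """Return all objects linked to `subject` by `relation`, sorted."""
    return sorted(o for s, r, o in triples if s == subject and r == relation)


observations = [
    {"penguin": "penguin_A", "species": "king",
     "call": "call_001", "behavior": "head_shaking"},
    {"penguin": "penguin_A", "species": "king",
     "call": "call_002", "behavior": "bowing"},
]
graph = build_graph(observations)
print(len(graph))                             # 5
print(objects(graph, "penguin_A", "emits"))   # ['call_001', 'call_002']
print(objects(graph, "call_001", "co_occurs_with"))  # ['head_shaking']
```

Storing the triples in a set removes duplicates such as the repeated species fact, so the graph grows only when a new association is observed.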

In summary, the first embodiment of the present invention provides a penguin monitoring method based on video and passive acoustics. Purpose-designed acquisition equipment collects audio data and video data of the target area over long periods and from multiple angles. A microphone array is used to calculate the relative spatial coordinates of the sounding penguin from the time differences between the received audio signals, and the relevant audio signals are correctly labeled and classified. The relative spatial coordinates of the sounding penguin are converted and combined with the video data to obtain the pixel coordinates of the sounding penguin, so that a video clip of the sounding penguin can be extracted from the video data to confirm what the penguin is doing while vocalizing. By recognizing and semantically analyzing the penguin behavior, the sound signal of the sounding penguin is associated with the penguin behavior, and the penguin's communication partners and patterns are inferred. The present invention collects penguin data over long periods and from multiple angles without disturbing the penguins' daily activities, improves the comprehensiveness and effectiveness of data collection, improves the accuracy of penguin positioning, and further deepens research on how penguins communicate.

Embodiment 2

Referring to FIG. 4, which is a schematic structural diagram of an embodiment of the penguin monitoring device based on video and passive acoustics provided by the present invention, the device includes an acquisition module 201, a positioning module 202, a coordinate conversion module 203, a clipping module 204 and an association module 205;

The acquisition module 201 is used to simultaneously acquire audio data and video data of the target area with several groups of acquisition equipment, wherein each group of acquisition equipment includes a video collector and an audio collector;

The positioning module 202 is used to analyze the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the sounding penguin;

The coordinate conversion module 203 is used to determine the pixel coordinates of the sounding penguin in the video data according to the relative spatial coordinates of the sounding penguin and the video data;

The clipping module 204 is used to extract a video clip of the sounding penguin from the video data according to the pixel coordinates of the sounding penguin;

The association module 205 is used to extract several penguin behaviors of the sounding penguin from the video clip of the sounding penguin, label the sound clips associated with each penguin behavior, and construct a communication behavior knowledge graph of the sounding penguin.
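The data flow between the five modules can be pictured as a simple chain in which each module consumes the output of the previous one. The sketch below only illustrates that wiring under assumptions: the class `PenguinMonitor`, its field names, and the stub functions are hypothetical placeholders, not the device's actual interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class PenguinMonitor:
    acquire: Callable[[], Tuple[Any, Any]]  # module 201: returns (audio, video)
    locate: Callable[[Any], Any]            # module 202: audio -> relative spatial coordinates
    to_pixels: Callable[[Any, Any], Any]    # module 203: (coordinates, video) -> pixel coordinates
    clip: Callable[[Any, Any], Any]         # module 204: (pixel coordinates, video) -> video clip
    associate: Callable[[Any, Any], Any]    # module 205: (video clip, audio) -> knowledge graph

    def run(self) -> Any:
        """Pass the data through modules 201 to 205 in order."""
        audio, video = self.acquire()
        coords = self.locate(audio)
        pixels = self.to_pixels(coords, video)
        segment = self.clip(pixels, video)
        return self.associate(segment, audio)


# Stub modules standing in for the real processing (placeholders only).
monitor = PenguinMonitor(
    acquire=lambda: ("audio", "video"),
    locate=lambda audio: (1.0, 2.0, 0.0),
    to_pixels=lambda coords, video: (320, 240),
    clip=lambda pixels, video: {"center": pixels, "source": video},
    associate=lambda segment, audio: {"clip": segment, "audio": audio},
)
print(monitor.run())
# {'clip': {'center': (320, 240), 'source': 'video'}, 'audio': 'audio'}
```

Passing the modules in as callables keeps each stage replaceable, which mirrors the modular structure of the device without committing to any particular algorithm inside a module.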

Further, in the second embodiment of the present invention, simultaneously acquiring audio data and video data of the target area with several groups of acquisition equipment is specifically as follows:

Further, in the second embodiment of the present invention, analyzing the audio data according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the sounding penguin is specifically as follows:

Further, in the second embodiment of the present invention, extracting a video clip of the sounding penguin from the video data according to the pixel coordinates of the sounding penguin is specifically as follows:

Further, in the second embodiment of the present invention, extracting several penguin behaviors of the sounding penguin from the video clip of the sounding penguin is specifically as follows:

Further, in the second embodiment of the present invention, labeling the sound clips associated with each penguin behavior is specifically as follows:

Further, in the second embodiment of the present invention, after the audio data is analyzed according to the relative spatial positions of the audio collectors to obtain the relative spatial coordinates of the sounding penguin, the method further includes:

Further, in the second embodiment of the present invention, constructing the communication behavior knowledge graph of the sounding penguin is specifically as follows:

Further, in the second embodiment of the present invention, after the audio data and video data of the target area are simultaneously acquired with several groups of acquisition equipment, the method further includes:

In summary, the second embodiment of the present invention provides a penguin monitoring device based on video and passive acoustics, built on the close cooperation of its modules. Purpose-designed acquisition equipment collects audio data and video data of the target area over long periods and from multiple angles. A microphone array is used to calculate the relative spatial coordinates of the sounding penguin from the time differences between the received audio signals, and the relevant audio signals are correctly labeled and classified. The relative spatial coordinates of the sounding penguin are converted and combined with the video data to obtain the pixel coordinates of the sounding penguin, so that a video clip of the sounding penguin can be extracted from the video data to confirm what the penguin is doing while vocalizing. By recognizing and semantically analyzing the penguin behavior, the sound signal of the sounding penguin is associated with the penguin behavior, and the penguin's communication partners and patterns are inferred. The present invention collects penguin data over long periods and from multiple angles without disturbing the penguins' daily activities, improves the comprehensiveness and effectiveness of data collection, improves the accuracy of penguin positioning, and further deepens research on how penguins communicate.

The specific embodiments described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. It is particularly pointed out that, for those skilled in the art, any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.