Disclosure of Invention
The invention provides a method and a device for monitoring with a smart speaker, so as to improve monitoring efficiency.
The technical solution of the invention is implemented as follows:
A method for monitoring with a smart speaker comprises the following steps:
for each type of sound that may occur in the monitored environment, collecting with the smart speaker, according to all possible positions of that sound's source in the monitored environment relative to the smart speaker, the sound signal emitted by the source at each possible position;
for the collected sound signal of each type of sound at each possible position, extracting from it the acoustic features of the sound and the azimuth features of the sound source;
inputting the acoustic features and azimuth features of all collected sound signals of all types into a preset training model for training, to obtain a sound recognition model that identifies the sound type and the sound-source azimuth;
collecting sound signals of the monitored environment in real time with the smart speaker, extracting the acoustic features of the sound and the azimuth features of the sound source from the collected signals in real time, and inputting the extracted acoustic and azimuth features into the sound recognition model in real time;
and determining whether the monitored scene is abnormal according to the sound type and the sound-source azimuth output by the sound recognition model.
The acoustic features of the sound include:
a short-time amplitude zero-crossing rate, a short-time average energy, and Mel-frequency cepstral coefficients (MFCC);
or: the above three, plus one or any combination of: wavelet packet decomposition coefficients; the energy, amplitude, or power of the pitch sub-band; adjacent-band feature vectors; and linear predictive coding cepstral coefficients (LPCC).
The azimuth features of the sound source include an interaural time difference (ITD) and an interaural intensity difference (IID);
or: ITD and IID, plus one or both of an interaural level difference (ILD) and an interaural phase difference (IPD).
After the smart speaker collects the sound signal of the monitored environment in real time, and before the acoustic features of the sound and the azimuth features of the sound source are extracted from the collected signal in real time, the method further includes:
judging whether the frequency of the collected sound signal is greater than a preset frequency threshold; if so, performing the step of extracting the acoustic features of the sound and the azimuth features of the sound source from the collected signal in real time; if not, discarding the collected signal.
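The frequency gate above can be sketched as follows. The zero-crossing frequency estimate and the 50 Hz threshold are illustrative assumptions, not the method prescribed by the text.

```python
import math

def estimate_frequency(samples, sample_rate):
    """Rough dominant-frequency estimate: one full cycle crosses zero twice."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

def should_process(samples, sample_rate, freq_threshold_hz=50.0):
    """True if the captured signal passes the gate and should be processed."""
    return estimate_frequency(samples, sample_rate) > freq_threshold_hz

# A 440 Hz tone sampled at 8 kHz passes the gate; a silent buffer is discarded.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
```

Signals that fail the gate would simply be dropped before feature extraction.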
The training model is a deep neural network model or a long short-term memory neural network model.
The monitored scene is an indoor scene, and the sound types in it comprise one or any combination of: rain, running tap water, door opening, sneezing, coughing, and alarm sounds;
the position range of each area of the indoor scene relative to the smart speaker is predetermined;
the determining whether the monitored scene is abnormal according to the sound type and sound-source azimuth output by the sound recognition model comprises:
determining the area of the indoor scene in which the sound source is located, according to the sound-source azimuth output by the model and the position range of each area relative to the smart speaker; judging from the sound type whether an equipment facility in that area needs to be controlled; and if so, sending a corresponding control instruction to the control device of that equipment facility.
The judging, according to the sound type, whether an equipment facility in the area needs to be controlled comprises one or more of the following:
if the sound type is rain, judging whether the window in the area is closed, and if not, sending a closing instruction to the window's control device;
if the sound type is running tap water, sending a closing instruction to the control device of the tap in the area;
if the sound type is a door opening and the sound-source azimuth indicates the entrance door, judging from a preset home-return time range whether the current time is the user's home-return time; if not, determining that an abnormal door opening has occurred, and starting recording and/or a monitoring camera;
if the sound type is sneezing, judging whether the temperature of the area is below a preset comfortable temperature; if so and the air conditioner is on, asking the user whether to raise its temperature; if so and the air conditioner is off, asking the user whether to turn it on; and sending the corresponding control instruction to the air conditioner's control device according to the user's feedback;
if the sound type is coughing, judging whether the air quality of the area is below a preset standard, and if so, asking the user whether to start an air purifier;
if the sound type is an alarm, judging whether an electrical appliance is present in the area, and if so, sending the user alarm information for that appliance.
The invention further provides a device for monitoring with a smart speaker, the device comprising:
a sound collection module, configured to collect sound signals of the monitored environment in real time with the smart speaker, extract the acoustic features of the sound and the azimuth features of the sound source from the collected signals in real time, and input the extracted features into the sound recognition model in real time;
a sound recognition module, configured to compute on the input acoustic and azimuth features according to the sound recognition model and output the recognition result: the sound type and the sound-source azimuth; the sound recognition model being obtained as follows: for each type of sound that may occur in the monitored environment, collecting with the smart speaker, according to all possible positions of that sound's source in the monitored environment relative to the smart speaker, the sound signal emitted by the source at each possible position; extracting from each collected signal the acoustic features of the sound and the azimuth features of the source; and inputting the acoustic and azimuth features of all collected signals of all types into a preset training model for training, to obtain a sound recognition model that identifies the sound type and the sound-source azimuth;
and an abnormality judgment module, configured to determine whether the monitored scene is abnormal according to the output of the sound recognition module.
The monitored scene is an indoor scene, and the sound types in it comprise one or any combination of: rain, running tap water, door opening, sneezing, coughing, and alarm sounds;
the abnormality judgment module is further configured to predetermine the position range of each area of the indoor scene relative to the smart speaker; to determine, from the sound-source azimuth output by the sound recognition model and those position ranges, the area in which the sound source is located; to judge from the sound type whether an equipment facility in that area needs to be controlled; and if so, to send a corresponding control instruction to the control device of that equipment facility.
The abnormality judgment module judges, according to the sound type, whether an equipment facility in the area needs to be controlled by one or more of the following:
if the sound type is rain, judging whether the window in the area is closed, and if not, sending a closing instruction to the window's control device;
if the sound type is running tap water, sending a closing instruction to the control device of the tap in the area;
if the sound type is a door opening and the sound-source azimuth indicates the entrance door, judging from a preset home-return time range whether the current time is the user's home-return time; if not, determining that an abnormal door opening has occurred, and starting recording and/or a monitoring camera;
if the sound type is sneezing, judging whether the temperature of the area is below a preset comfortable temperature; if so and the air conditioner is on, asking the user whether to raise its temperature; if so and the air conditioner is off, asking the user whether to turn it on; and sending the corresponding control instruction to the air conditioner's control device according to the user's feedback;
if the sound type is coughing, judging whether the air quality of the area is below a preset standard, and if so, asking the user whether to start an air purifier;
if the sound type is an alarm, judging whether an electrical appliance is present in the area, and if so, sending the user alarm information for that appliance.
The invention automatically identifies the type and azimuth of environmental sounds, improving monitoring efficiency.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of a method for monitoring by using an intelligent speaker according to an embodiment of the present invention, which includes the following specific steps:
Step 101: for each type of sound that may occur in the monitored environment, collect with the smart speaker, according to all possible positions of that sound's source in the monitored environment relative to the smart speaker, the sound signal emitted by the source at each possible position.
Step 102: for the collected sound signal of each type of sound at each possible position, extract from it the acoustic features of the sound and the azimuth features of the sound source.
Step 103: input the acoustic and azimuth features of all collected sound signals of all types into a preset training model for training, to obtain a sound recognition model that identifies the sound type and the sound-source azimuth.
Step 104: collect sound signals of the monitored environment in real time with the smart speaker, extract the acoustic features of the sound and the azimuth features of the sound source from the collected signals in real time, and input the extracted features into the sound recognition model in real time.
Step 105: determine whether the monitored scene is abnormal according to the sound type and the sound-source azimuth output by the sound recognition model.
Fig. 2 is a flowchart of a method for obtaining the sound recognition model according to an embodiment of the present invention, which includes the following specific steps:
Step 201: predetermine the types of all sounds that may occur in the monitored environment, and all possible positions, relative to the smart speaker, of the source of each type of sound.
Step 202: for each type of sound that may occur in the monitored environment, collect with the smart speaker, according to all possible positions of that sound's source in the monitored environment relative to the smart speaker, the sound signal emitted by the source at each possible position.
For example, for each type of sound, the position of its source relative to the smart speaker can be represented as (r, θ), where r is the distance between the source and the smart speaker, and θ is the angle between the incoming sound wave and the perpendicular bisector of the smart speaker's microphone pair, as shown in fig. 3.
In practice, each type of sound can be prerecorded for a certain duration; then, according to all possible relative positions of that sound's source and the smart speaker in the monitored environment and the speaker's current position, the recording is played back at each possible relative position while the smart speaker collects the played signal. Each possible relative position may be determined as follows:
for example, express the range of all possible positions of a given sound's source relative to the smart speaker as r1 ≤ r ≤ r2, θ1 ≤ θ ≤ θ2; set the step of r to dr and the step of θ to dθ; with dr and dθ as steps, determine each position within that range; then play the recorded signal at each position while the smart speaker collects it.
Through step 202, the smart speaker finally acquires the sound signal of each type of sound at each of its positions in the monitored scene relative to the speaker.
Step 203: for the sound signal of each type of sound at each possible position collected by the smart speaker, extract from it the acoustic features of the sound and the azimuth features of the sound source.
The extracted acoustic features of the sound at least include the short-time amplitude zero-crossing rate, the short-time average energy, and Mel-frequency cepstral coefficients (MFCC), and may further include one or any combination of: wavelet packet decomposition (WPD) coefficients; the energy, amplitude, or power of the pitch sub-band; adjacent-band feature vectors; and linear predictive coding cepstral coefficients (LPCC).
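The two simplest of the listed acoustic features can be sketched in a few lines of plain Python on one frame; MFCC, WPD, and LPCC extraction needs a DSP library and is omitted here.

```python
def short_time_features(frame):
    """Short-time zero-crossing rate and short-time average energy of a frame."""
    n = len(frame)
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = sum(1 for a, b in zip(frame, frame[1:])
              if (a < 0) != (b < 0)) / (n - 1)
    # Mean squared amplitude of the frame.
    energy = sum(s * s for s in frame) / n
    return zcr, energy
```

A fully alternating frame such as [1, -1, 1, -1] has a zero-crossing rate of 1 and unit average energy.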
The extracted azimuth features of the sound source at least include the interaural time difference (ITD) and the interaural intensity difference (IID), and may further include the interaural level difference (ILD) and/or the interaural phase difference (IPD).
When extracting the azimuth features, the sound signal is first decomposed into time-frequency (T-F) units; each T-F unit corresponds to one time frame of one channel of a filter bank, and the ITD, IID, ILD, and IPD are then computed per T-F unit.
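A minimal free-field sketch of estimating ITD and IID from a stereo frame follows; a real implementation works per T-F unit after a gammatone-style filter bank, whereas this broadband version is purely illustrative.

```python
import math

def itd_iid(left, right, sample_rate, max_lag=8):
    """ITD (s) as the cross-correlation peak lag; IID (dB) as energy ratio.
    A negative lag means the left channel leads the right one."""
    def corr(lag):
        pairs = zip(left[lag:], right) if lag >= 0 else zip(left, right[-lag:])
        return sum(a * b for a, b in pairs)
    best_lag = max(range(-max_lag, max_lag + 1), key=corr)
    e_left = sum(s * s for s in left)
    e_right = sum(s * s for s in right)
    return best_lag / sample_rate, 10 * math.log10(e_left / e_right)
```

Delaying one channel by a few samples shifts the correlation peak by exactly that lag, which is what the ITD estimate reads off.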
As shown in fig. 3, in practice the two microphones of the smart speaker can be abstracted as point "ears" at the two ends of a sphere of radius a, with the distance between the sound source and the center of the sphere being r. For a sound wave with incidence angle θ, accounting for propagation around the spherical surface of the head, the interaural time difference is expressed as:
ITD(θ) = (a / c)(θ + sin θ)    (1)
where c is the speed of sound. From equation (1), ITD(θ) is an odd function of θ. When θ is in the interval (0, π/2], the ITD is positive and monotonically increasing in θ; when θ is in the interval [-π/2, 0), the ITD is negative and monotonically increasing in θ. The ITD is therefore an important basis for sound-source localization.
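Equation (1) can be evaluated directly. The head radius a = 0.0875 m and speed of sound c = 343 m/s below are typical illustrative values, not values taken from the text.

```python
import math

def itd_model(theta, a=0.0875, c=343.0):
    """Model ITD in seconds for incidence angle theta (radians), eq. (1)."""
    return a * (theta + math.sin(theta)) / c

# itd_model is odd in theta: positive and increasing on (0, pi/2],
# negative on [-pi/2, 0), and zero on the perpendicular bisector.
```

With these values the maximum ITD at θ = π/2 is roughly 0.66 ms, in line with typical binaural measurements.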
The IID characterizes the difference in intensity between the sound signals received by the left and right ears, and is likewise an important cue for sound separation based on source localization. When the sound source deviates from the perpendicular bisector of the line connecting the two ears, the sound waves travel different paths to the two ears and therefore arrive with different attenuations.
Because the ITD and IID of the same sound type differ across azimuths, they can be used to distinguish the azimuth of the sound source.
Step 204: input all the acoustic and azimuth features of the collected sound signals of every type into a preset training model for training, to obtain the sound recognition model.
The training model may be a deep neural network (DNN) or a long short-term memory (LSTM) neural network model. For example, the LSTM neural network model used may be structured so that each sub-band LSTM classifier comprises an input layer, two hidden layers of 256 neurons each, and an output layer; the acoustic and azimuth features are fed to the input layer.
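The gating structure of such a model can be illustrated with a single scalar LSTM cell step; the actual model stacks two 256-unit layers of vector-valued cells, and the scalar weights in `w` below are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """Advance the hidden state h and cell state c by one time step."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g          # keep part of the old state, add new
    h = o * math.tanh(c)            # expose a gated view of the cell state
    return h, c
```

The cell state lets the model carry information across time frames, which is why LSTMs suit frame-sequenced audio features.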
The inputs of the sound recognition model are the acoustic features of the sound and the azimuth features of the sound source; its outputs are the sound type and the sound-source azimuth.
Fig. 4 is a flowchart of a monitoring method using the sound recognition model according to an embodiment of the present invention, which includes the following specific steps:
step 401: the intelligent sound box collects sound signals in a monitoring scene in real time.
In practice, a lower frequency limit may be set for each type of sound; if the frequency of a captured signal is below that limit, the signal is considered invalid and filtered out directly.
Step 402: extract the acoustic features of the sound and the azimuth features of the sound source from the signals collected by the smart speaker in real time, and input the extracted features into the sound recognition model in real time.
Step 403: the sound recognition model outputs the recognition result in real time from the input acoustic and azimuth features: the sound type and the sound-source azimuth.
Step 404: judge whether the monitored scene is abnormal according to the sound type and the sound-source azimuth, and if so, perform exception handling.
The invention can be used in any monitoring scene that requires it; the most common is an indoor scene.
The following gives a specific procedure for applying the invention in an indoor monitoring scenario.
In indoor scenes, common sound types include rain, running tap water, door opening, sneezing, coughing, alarms, and the like.
First, the process of determining the sound recognition model is given:
Step 01: determine the position range of each type of sound relative to the smart speaker, according to the speaker's indoor position and the indoor layout.
Step 02: for each type of sound, determine each collection position from its position range relative to the smart speaker (r1 ≤ r ≤ r2, θ1 ≤ θ ≤ θ2) and the collection distance step dr and angle step dθ, and collect with the smart speaker the sound signal emitted at each collection position.
Step 03: extract the acoustic features of the sound and the azimuth features of the sound source from the signals collected for all types and all positions.
Step 04: input all extracted acoustic and azimuth features into the training model for training, with the expected outputs being the sound type and the sound-source azimuth; the sound recognition model is obtained after training.
Then, the sound recognition and exception-handling process is given:
Step 01: collect in advance the position ranges of the various indoor areas.
For example, a home environment typically includes a living room, bedroom, kitchen, bathroom, balcony, and similar areas. The position range of each area is collected first, represented in the same way as the position of each sound type relative to the smart speaker, i.e., by position relative to the speaker, which facilitates the subsequent exception handling.
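As a hypothetical sketch of this region bookkeeping, each area's position range can be stored as (r, θ) intervals relative to the smart speaker, and a recognized source azimuth matched against them; the room names and ranges below are invented for illustration.

```python
ROOM_RANGES = {
    "kitchen": {"r": (1.0, 3.0), "theta": (-1.2, -0.4)},
    "balcony": {"r": (3.0, 6.0), "theta": (0.4, 1.2)},
}

def locate_room(r, theta, room_ranges=ROOM_RANGES):
    """Return the room whose (r, theta) range contains the source, or None."""
    for room, rng in room_ranges.items():
        r_lo, r_hi = rng["r"]
        t_lo, t_hi = rng["theta"]
        if r_lo <= r <= r_hi and t_lo <= theta <= t_hi:
            return room
    return None
```

A source outside every recorded range simply yields no room, which the exception-handling logic can treat as "location unknown".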
Step 02: the intelligent sound box collects sound signals in real time.
Step 03: and extracting acoustic features of the sound and the azimuth features of the sound source from the sound signals in real time, and inputting the extracted acoustic features and the azimuth features into the sound recognition model in real time.
Step 04: the sound recognition model outputs recognition results in real time according to the input acoustic features and the orientation features: the type of sound and the orientation of the sound source.
Step 05: and judging whether an abnormality occurs according to the type of the sound and the direction of the sound source, and if so, performing abnormality processing.
Specifically, for example:
(1) If the sound type is rain, determine which room the rain comes from according to the sound-source azimuth and the position ranges of the indoor areas, determine the rain level according to the rain volume, and send the rain level and the room to the user terminal (e.g., a mobile phone). If the room has a window that is not closed and the window has a control device, send a closing instruction to that control device. Then mask the rain-sound recognition result for a preset period, e.g., four hours.
(2) If the sound type is running tap water, and the azimuth and area ranges indicate that it comes from the kitchen, send the sound type and room to the user's mobile phone; if the tap has a control device, send it a closing instruction.
(3) If the sound type is a door opening, judge from the user-set home-return time range whether the current time falls within it. If not, determine that an abnormality has occurred: start the recording device, start other monitoring devices (e.g., a camera) and inform them of the direction of the door-opening sound so that they monitor that direction, and send abnormal-door-opening information to the user's mobile phone. If the user then returns a "normal" notification, turn the recording and monitoring devices off.
(4) If the sound type is sneezing and the azimuth and area ranges indicate the bedroom, monitor the sneezing frequency. When it reaches a preset first threshold, judge whether the bedroom temperature is within the preset comfortable range; if not, send the user's mobile phone a query asking whether to turn on the bedroom air conditioner or adjust its temperature, and send the corresponding control instruction to the air conditioner's control device according to the user's feedback.
(5) If the sound type is coughing and the azimuth and area ranges indicate the living room, monitor the coughing frequency. When it reaches a preset second threshold, judge whether the living room's air quality is below the preset standard; if so, send the user's mobile phone a query asking whether to start the air purifier.
(6) If the sound type is an alarm, determine the room it comes from according to the azimuth and area ranges, and send the alarm and its room to the user's mobile phone.
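The six cases above can be sketched as a single dispatch function; the sound-type labels, room names, and state keys are hypothetical.

```python
def decide_action(sound_type, room, state):
    """Map a recognized (sound type, room) pair plus house state to an action."""
    if sound_type == "rain" and not state.get("window_closed", True):
        return ("close_window", room)
    if sound_type == "running_water":
        return ("close_tap", room)
    if sound_type == "door_open" and room == "entrance" \
            and not state.get("within_return_time", False):
        return ("start_recording_and_camera", room)
    if sound_type == "sneeze" and state.get("below_comfort_temp", False):
        return ("query_user_about_air_conditioner", room)
    if sound_type == "cough" and state.get("poor_air_quality", False):
        return ("query_user_about_air_purifier", room)
    if sound_type == "alarm" and state.get("has_appliance", False):
        return ("notify_user_appliance_alarm", room)
    return ("no_action", room)
```

Keeping the decision in one pure function makes each rule easy to test independently of the recognition pipeline.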
For example, when the filter of an air purifier needs replacing, the purifier sounds an alarm periodically; if the user is not at home, that alarm would otherwise go unheard.
Therefore, when the method is applied in an indoor environment, it can be combined with smart-home control functions to control indoor facilities. Even without smart-home control indoors, the monitoring scheme of the invention is unaffected.
The following gives examples of specific applications of the invention in indoor environments:
Example one
Step 01: and setting a monitoring mode.
The user sets the smart speaker to monitoring mode and adds the following environmental sounds to its wake-up sounds: rain, running tap water, door opening, sneezing, coughing, and alarm sounds.
Step 02: the intelligent sound box starts to collect the environmental sound and extracts the acoustic characteristics and the orientation characteristics of the environmental sound in real time.
The acoustic features include: WPD coefficients, pitch sub-band energy, short-time zero-crossing rate, short-time average energy, amplitude or power, adjacent-band feature vectors, LPCC, and MFCC.
The orientation features include: ITD, IID, ILD and IPD.
Step 03: input the acoustic and azimuth features extracted in step 02 into the sound recognition model, which outputs the recognized sound type (rain) and the sound-source azimuth. Match the azimuth against the pre-recorded position ranges of the indoor areas to determine that the source is the balcony, and judge from the rain volume, against the volume range preset for each level, that the rain is level 2.
For example, two rain levels may be preset: level 1 when the rain volume is above a preset volume threshold, and level 2 otherwise; the user needs to be notified only at level 1.
Step 04: according to the preset rule on which rain levels require notifying the user, level-2 rain does not, so the user is not notified.
Step 05: returning to the step 02, and continuing monitoring.
Thereafter, if the rain volume is detected to rise to level 1, and the room the rain comes from has a window with a control device that is not closed, the control device is notified to close the window, and the rain level and direction, together with the window control result (window closed), are sent to the user terminal.
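The two-level rain rule of this example can be sketched as follows; the 60 dB threshold is an illustrative assumption.

```python
def rain_level(volume_db, threshold_db=60.0):
    """Level 1 (user must be notified) above the threshold, else level 2."""
    return 1 if volume_db > threshold_db else 2
```

Only a level-1 result would trigger the notification and window-closing path described above.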
Example two
Step 01: and setting a monitoring mode.
The user sets the smart speaker to monitoring mode and adds the following environmental sounds to its wake-up sounds: rain, running tap water, door opening, sneezing, coughing, and alarm sounds.
Step 02: the intelligent sound box starts to collect the environmental sound and extracts the acoustic characteristics and the orientation characteristics of the environmental sound in real time.
Step 03: input the acoustic and azimuth features extracted in step 02 into the sound recognition model, which outputs the recognized sound type (door opening) and the sound-source azimuth; matching the azimuth against the pre-recorded position ranges of the indoor areas determines that the source is the entrance door.
Step 04: judge from the preset home-return time range whether the current time is the user's home-return time. If not, determine that an abnormal door opening has occurred: start recording, start other monitoring devices (e.g., a camera), inform them of the direction to monitor (the direction of the entrance door), and send abnormal-door-opening information to the user terminal. If the user terminal returns "normal", stop recording and turn off the other monitoring devices.
Step 05: returning to the step 02, and continuing monitoring.
Example three
Step 01: and setting a monitoring mode.
The user sets the smart speaker to monitoring mode and adds the following environmental sounds to its wake-up sounds: rain, running tap water, door opening, sneezing, coughing, and alarm sounds.
Step 02: the intelligent sound box starts to collect the environmental sound and extracts the acoustic characteristics and the orientation characteristics of the environmental sound in real time.
Step 03: input the acoustic and azimuth features extracted in step 02 into the sound recognition model, which outputs the recognized sound type (sneezing) and the sound-source azimuth; the room the source comes from is determined by matching the azimuth against the pre-recorded position ranges of the indoor areas.
Step 04: count the sneezes. If they exceed 5, judge whether the current indoor temperature is below the preset comfortable temperature; if it is, judge whether the air conditioner is on. If on, ask the user whether to raise its temperature; if off, ask the user whether to turn it on. Perform the relevant operation according to the user's feedback.
Step 05: returning to the step 02, and continuing monitoring.
Example four
Step 01: and setting a monitoring mode.
The user sets the smart speaker to monitoring mode and adds the following environmental sounds to its wake-up sounds: rain, running tap water, door opening, sneezing, coughing, and alarm sounds.
Step 02: the intelligent sound box starts to collect the environmental sound and extracts the acoustic characteristics and the orientation characteristics of the environmental sound in real time.
Step 03: inputting the acoustic features and the orientation features extracted in step 02 into the sound recognition model; the sound recognition model outputs the recognized sound type (alarm sound) and the sound source direction, and the room from which the sound comes is determined by matching the sound source direction against the pre-recorded position ranges of the respective indoor areas.
Step 04: upon the alarm sound, looking up the household appliances in that room from the existing household-appliance list, sending the user a message that a household appliance has raised an alarm, and then masking further alarm sounds from that room for a preset time period, for example 4 hours.
Step 05: returning to the step 02, and continuing monitoring.
Fig. 5 is a schematic structural diagram of a device for monitoring by using an intelligent sound box according to an embodiment of the present invention. The device mainly includes: a sound collection module 51, a sound recognition module 52 and an abnormality determination module 53, wherein:
the sound collection module 51 is configured to collect sound signals in the monitoring environment in real time by using the smart speaker, extract acoustic features of the sound and orientation features of the sound source from the collected sound signals in real time, and input the extracted acoustic features and orientation features to the sound recognition module 52 in real time.
The sound recognition module 52 is configured to compute on the input acoustic features and orientation features using the sound recognition model and output a recognition result: the type of the sound and the direction of the sound source. The sound recognition model is obtained as follows: for each type of sound that may appear in the monitoring environment, according to all possible positions of a sound source of that type in the monitoring environment relative to the smart speaker, the smart speaker is used to collect the sound signals emitted by that sound source at each possible position; for the collected sound signals of each type of sound at each possible position, the acoustic features of the sound and the orientation features of the sound source are respectively extracted from the sound signals; and the acoustic features and orientation features of all collected sound signals of all types are input into a preset training model for training, yielding a sound recognition model that identifies both the sound type and the sound source direction.
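The training and recognition flow described above can be sketched as a joint classifier over concatenated acoustic and orientation feature vectors. Feature extraction is stubbed out with hand-made values, and a nearest-neighbour rule stands in for the unspecified training model; all names and numbers below are illustrative assumptions:

```python
import math

def train(samples):
    """Sketch of the (unspecified) training model: store each concatenated
    acoustic + orientation feature vector with its joint label.  A real
    system would fit e.g. a neural network here."""
    return [(af + of, label) for af, of, label in samples]

def recognize(model, acoustic_feats, orientation_feats):
    """Return the (sound_type, direction) label of the nearest stored
    training example, i.e. joint recognition in a single pass."""
    query = acoustic_feats + orientation_feats
    return min(model, key=lambda entry: math.dist(entry[0], query))[1]

# Toy corpus: each sound class recorded at each candidate position,
# with hand-made feature values (illustrative assumptions only).
corpus = [
    ([0.1, 0.9], [1.0, 0.0], ("rain", "window")),
    ([0.8, 0.2], [0.0, 1.0], ("door_open", "entrance")),
]
model = train(corpus)
```

Training once over every class at every candidate position is what lets the deployed model emit the sound type and the sound source direction together, instead of running two separate recognitions.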
And the abnormality determination module 53 is configured to determine whether an abnormality occurs in the monitoring scene according to the output result of the sound recognition module 52.
In practical applications, the monitoring scene may be an indoor scene, and the sound types in the monitoring scene may include one or any combination of the following: rain sound, tap running-water sound, door-opening sound, sneezing sound, coughing sound and alarm sound;
moreover, the abnormality determination module 53 may be further configured to predetermine the position range of each area in the indoor scene relative to the smart speaker; to determine, from the sound source direction output by the sound recognition model and those position ranges, the area in the indoor scene where the sound source is located; to judge, according to the type of the sound, whether equipment or facilities in that area need to be controlled; and, if so, to send a corresponding control instruction to the control device of the corresponding equipment or facility.
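The matching of a recognized sound-source direction against pre-recorded room position ranges can be sketched as a simple lookup over azimuth intervals; the room layout and degree values below are illustrative assumptions:

```python
# Pre-recorded azimuth ranges (degrees, relative to the smart speaker)
# for each indoor area -- the layout below is an illustrative assumption.
ROOM_RANGES = {
    "living_room": (0, 120),
    "kitchen": (120, 200),
    "bedroom": (200, 360),
}

def locate_room(azimuth_deg, ranges=ROOM_RANGES):
    """Map a recognized sound-source azimuth to the room whose
    pre-recorded range contains it; None if no range matches."""
    azimuth_deg %= 360
    for room, (lo, hi) in ranges.items():
        if lo <= azimuth_deg < hi:
            return room
    return None
```

In a real device the ranges would be recorded during setup, e.g. by having the user make a sound from each room while the speaker measures the arrival direction.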
In practical applications, the judging by the abnormality determination module 53 of whether the equipment or facilities in the area need to be controlled according to the type of the sound may include:
if the type of the sound is rain sound, judging whether the window in the area is closed, and if not, sending a closing instruction to a control device of the window; or comprises the following steps:
if the sound type is tap running water sound, sending a closing instruction to a control device of the tap in the area; or comprises the following steps:
if the type of the sound is door-opening sound and the sound source is determined to come from the entrance door according to the direction of the sound source, judging whether the current time falls within a preset home-returning time range of the user; if not, determining that abnormal door opening has occurred, and starting recording and/or a monitoring camera; or comprises the following steps:
if the sound type is sneezing sound, judging whether the temperature of the area is lower than a preset comfortable temperature; if so, judging whether the air conditioner is on; if the air conditioner is on, asking the user whether to raise the air conditioner temperature, and if it is off, asking the user whether to turn it on; and sending a corresponding control instruction to the control device of the air conditioner according to the user's feedback; or comprises the following steps:
if the sound type is cough sound, judging whether the air quality of the area is lower than a preset standard, and if so, inquiring whether a user needs to start an air purifier; or comprises the following steps:
if the type of the sound is an alarm sound, judging whether electric equipment exists in the area, and if so, sending alarm information for that electric equipment to the user.
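The per-sound-type branching above can be sketched as a dispatch table from sound type to handler; the handler names, state keys and instruction tuples are assumptions for illustration, not a specified device interface:

```python
def on_rain(room, state):
    # Close the window only if it is not already closed.
    if not state.get(("window_closed", room), True):
        return ("close_window", room)
    return None

def on_tap_water(room, state):
    # A running tap is always shut off.
    return ("close_tap", room)

def on_alarm(room, state):
    # Notify the user only if an electric appliance is known in the room.
    if state.get(("appliance_in", room)):
        return ("notify_user_alarm", room)
    return None

# Sound type -> handler; unlisted types produce no control instruction.
HANDLERS = {"rain": on_rain, "tap_water": on_tap_water, "alarm": on_alarm}

def dispatch(sound_type, room, state):
    handler = HANDLERS.get(sound_type)
    return handler(room, state) if handler else None
```

Each returned tuple would then be forwarded to the control device of the corresponding equipment or facility, while `None` means no control action is needed.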
The invention has the following beneficial technical effects:
firstly, the type of the environmental sound and the direction of the sound source can be identified at one time, and multiple times of identification are not needed, so that the identification time is saved;
secondly, images do not need to be shot, and privacy is protected;
the sensor is not needed, the indoor environment can be monitored in the indoor environment without the sensor or in the indoor environment of the smart home, and the monitoring cost is reduced;
and fourthly, different exception handling processes can be set according to the identified environment sound type and the sound source position, and user experience is improved.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.