CN112885350A - Control method and device of network conference, electronic equipment and storage medium - Google Patents

Control method and device of network conference, electronic equipment and storage medium

Info

Publication number
CN112885350A
Authority
CN
China
Prior art keywords
audio data
mute function
network conference
function
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110213134.1A
Other languages
Chinese (zh)
Inventor
刘俊启
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110213134.1A
Publication of CN112885350A
Legal status: Pending

Abstract

Translated from Chinese

[Figure 202110213134]

The present disclosure provides a control method for a network conference, relating to the field of computer technology and in particular to the fields of artificial intelligence and speech recognition. The specific implementation scheme is: acquiring audio data while the network conference program is running; recognizing voice commands from the audio data; and controlling the mute function of the network conference program according to the recognized voice commands. The present disclosure also provides a control device for a network conference, an electronic device, and a storage medium.

Description

Control method and device of network conference, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly to artificial intelligence and speech recognition technology. More specifically, the present disclosure provides a method and apparatus for controlling a web conference, an electronic device, and a storage medium.
Background
Network conferences are becoming more and more common in people's lives. The users participating in a network conference are in different environments with different background sounds, so the overall background sound of the conference can be noisy.
At present, background sound from the environment where the user is located can be kept out of the conference by turning on the mute function, which preserves the audio quality of the network conference. However, turning the mute function on and off must be done manually according to the user's actual needs, and manually toggling the mute function is costly, which reduces the communication efficiency of the network conference.
Disclosure of Invention
The disclosure provides a method, a device, equipment and a storage medium for controlling a network conference.
According to a first aspect, there is provided a method of controlling a web conference, the method comprising: acquiring audio data while the network conference program is running; recognizing a voice instruction from the audio data; and controlling the mute function of the network conference program according to the recognized voice instruction.
According to a second aspect, there is provided a control apparatus for a web conference, the apparatus comprising: an acquisition module for acquiring audio data while the network conference program is running; a first recognition module for recognizing the voice instruction from the audio data; and a control module for controlling the mute function of the network conference program according to the recognized voice instruction.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided in accordance with the present disclosure.
According to a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1A is a schematic diagram of an exemplary system architecture to which the control method and apparatus of web conferencing may be applied, according to one embodiment of the present disclosure;
fig. 1B is an exemplary scenario diagram to which the control method and apparatus of a web conference may be applied, according to one embodiment of the present disclosure;
fig. 2 is a flowchart of a control method of a web conference according to one embodiment of the present disclosure;
fig. 3 is a flowchart of a control method of a web conference according to another embodiment of the present disclosure;
fig. 4 is a flowchart of a control method of a web conference according to another embodiment of the present disclosure;
FIG. 5 is a flow diagram of a method of identifying a source of audio data according to one embodiment of the present disclosure;
fig. 6 is a block diagram of a control device of a web conference according to one embodiment of the present disclosure;
fig. 7 is a block diagram of an electronic device of a method of controlling a web conference according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the continuous development of computer and internet technologies, networks provide more convenient communication modes for people, and application scenes such as instant messaging, online office work, online learning and the like are more and more common in people's lives. Web conferencing is a common form of implementing online office and online learning.
In the process of a network conference (which may be an online meeting or online teaching), a plurality of users participate; the users are in different environments, the background sound is unpredictable, and the background sound of the whole conference or classroom can be loud. Solutions that provide a mute function for online conferences have been proposed in the related art. Specifically, a control for turning the mute function on and off is arranged on the running interface of the web conference, and the user can turn the mute function on or off by clicking the control. When the mute function is on, the sound of the environment where the local user is located is not transmitted to the remote users, and the remote users cannot hear the local user. When the mute function is off, the sound of the environment where the local user is located is transmitted to the remote users, and the remote users can hear the local user. The mute function for a certain user or a certain group of users of the network conference can also be turned on or off by an administrator; for example, the administrator mutes part of a user group during the network conference.
In a network conference (which may be an online meeting or online teaching), a user may choose to mute when not speaking and needs to unmute when speaking, but may forget to unmute before speaking; after discovering that the mute function was not turned off, the user has to turn it off first and then repeat the speech, which is cumbersome.
During a network conference (which may be an online meeting or online teaching), the network conference also serves as an instant messaging tool, and the user may communicate with multiple groups, for example, carrying out voice communication with group A while sending and receiving files with group B.
During a network conference (which may be an online meeting or online teaching), a user may also open several applications at the same time, and the network conference program may run in the background without affecting the user's use of other applications in the foreground to send and receive messages or transfer text.
In the above application scenarios, a user has multiple task requirements and must adjust the mute function at any time according to those requirements, which may cause the following problems: (1) the user speaks while the mute function is on, so the speech is not heard by the remote users and has to be repeated; (2) if the user is communicating with several groups, each time the user wants to toggle the mute function the user must first click to switch to the conversation stream used for voice communication, which has a high operation cost; (3) when the user is using another application, the user must first switch to the network conference program and then perform the mute operation; (4) when the device (mobile phone or computer) has locked its screen, the user must first unlock the device, then switch to the network conference program, and then perform the mute operation, which has a high operation cost.
Fig. 1A is a schematic diagram of an exemplary system architecture to which a control method and apparatus for a web conference may be applied, according to one embodiment of the present disclosure. It should be noted that fig. 1A is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1A, the system architecture 100 according to this embodiment may include a plurality of terminal devices 101, a network 102, and a server 103. Network 102 is the medium used to provide a communication link between terminal device 101 and server 103. Network 102 may include various connection types, such as wired and/or wireless communication links, and so forth.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various messaging client applications, such as a web browser application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (to name a few), may be installed on terminal device 101.
The terminal device 101 may be various electronic devices having a display screen, including, but not limited to, a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
The server 103 may be a server providing various services, such as a background management server (for example only) providing support for web conference requests initiated by users using the terminal devices 101. The background management server may analyze and otherwise process the received data such as the user request, and feed back a processing result (for example, information or data obtained or generated according to the user request) to the terminal device.
For example, any one of the plurality of terminal apparatuses 101 initiates a web conference, transmits a web conference request to the server 103, and transmits a request to invite the remaining terminal apparatuses 101 to join the web conference. The server 103 creates a conference and forwards the request to invite to join the network conference to the remaining terminal apparatuses 101. After the remaining terminal apparatuses 101 join the network conference, each terminal apparatus 101 may transmit a local text, voice, or video message to the remaining terminal apparatuses 101 (remote terminals) through the server 103 and receive text, voice, or video messages from the remote terminals through the server 103.
It should be noted that the method for controlling the web conference provided by the embodiment of the present disclosure may generally be executed by the terminal device 101. Accordingly, the control device for the network conference provided by the embodiment of the present disclosure may generally be disposed in the terminal device 101.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1A are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for the implementation.
Fig. 1B is an exemplary scenario diagram to which the control method and apparatus of a web conference may be applied according to one embodiment of the present disclosure.
As shown in fig. 1B, an exemplary scenario according to this embodiment may include terminal device 110, which may run a web conference program; when running, the web conference program may display web conference interface 111. In the left half of the web conference interface 111, the user who is speaking or the document being explained, etc., may be displayed. In the right half of the web conference interface 111, the group currently in the web conference may be displayed; the group may include, for example, user A, user B, and user C. In the right half of the web conference interface 111, controls for operating the web conference are also displayed, such as a control for turning the video function on and off, a control for turning the mute function on and off, a control for uploading a file, a control for sending a message, and the like.
Illustratively, the terminal device 110 is a device held by user A, and during participation in the network conference user A can decide whether to let the remote users (user B and user C) hear his or her voice by clicking the control for the mute function. For example, when user A clicks the control to turn the mute function on while not speaking, users B and C do not hear user A. When user A clicks the control again to turn the mute function off when speaking, users B and C can hear user A.
Fig. 2 is a flowchart of a control method of a web conference according to one embodiment of the present disclosure.
As shown in fig. 2, the method 200 for controlling a web conference may include operations S210 to S230.
According to an embodiment of the present disclosure, operations S210 to S230 may be performed by the local electronic device during the network conference, and the users participating in the network conference may include a user of the local electronic device (simply referred to as a local user) and a user of the remote electronic device (simply referred to as a remote user).
In operation S210, audio data is acquired during the operation of the network conference program.
According to embodiments of the present disclosure, the audio data may be sound data in the environment in which the local electronic device is located. When the user of the local electronic device is not speaking, the sound data of that environment may include background sounds such as system noise and natural sounds; when the user of the local electronic device is speaking, the sound data may include both the background sounds and the user's voice. Audio data may be acquired using an audio sensor, such as a microphone of the local electronic device.
In operation S220, a voice instruction is recognized from audio data.
According to the embodiment of the disclosure, the voice instruction in the audio data can be recognized using speech recognition technology. The recognized voice instruction is matched against the preset voice instructions, and if it is consistent with a preset voice instruction, the corresponding control function is executed.
According to an embodiment of the present disclosure, the preset voice instruction may be a pre-configured voice instruction for turning the mute function on or off. For example, the voice instruction for turning off the mute function may be "I want to speak, turn off mute", and the voice instruction for turning on the mute function may be "I'll mute first, you continue". Besides full sentences spoken by the user, the preset voice instruction may be configured in other forms, such as a keyword, a number, or a letter; for example, the voice instruction for turning off the mute function may be configured as "off", "1", or "a", and the voice instruction for turning on the mute function may be configured as "on", "2", or "b". The preset voice instruction is also not limited to sounds produced by the user's voice and may be sounds produced by the user with other sound-generating tools. The above are merely examples and may be configured according to actual needs.
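As a rough illustration of the matching step described above, the following Python sketch compares recognized text against a table of preset commands. The command strings, dictionary, and function name are illustrative assumptions made for this sketch and are not specified by the present disclosure.

```python
# Hypothetical mapping from preset voice commands to mute-control actions.
# The exact phrases are configurable examples, not fixed by the disclosure.
PRESET_COMMANDS = {
    "i want to speak, turn off mute": "unmute",   # first preset instruction
    "i'll mute first, you continue": "mute",      # second preset instruction
    "off": "unmute", "1": "unmute", "a": "unmute",
    "on": "mute", "2": "mute", "b": "mute",
}

def match_voice_command(recognized_text: str):
    """Return 'mute', 'unmute', or None when the text matches no preset command."""
    normalized = recognized_text.strip().lower()
    return PRESET_COMMANDS.get(normalized)
```

For example, `match_voice_command("I want to speak, turn off mute")` would return `"unmute"`, while ordinary speech that matches no preset command returns `None`.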
In operation S230, the mute function of the network conference program is controlled according to the recognized voice instruction.
According to the embodiment of the disclosure, if the voice instruction in the audio data is consistent with the preset voice instruction for turning off the mute function, and the mute function of the current network conference is on, the mute function of the network conference is turned off. If the voice instruction in the audio data is consistent with the preset voice instruction for turning on the mute function, and the mute function of the current network conference is off, the mute function of the network conference is turned on.
Illustratively, if the voice instruction in the audio data is recognized as "I want to speak, turn off mute", which is the same as the preset voice instruction for turning off the mute function, and the mute function of the current network conference is on, the mute function of the network conference is turned off. After the mute function of the network conference is turned off, a prompt message may be generated to indicate that the mute function has been turned off. The prompt may take the form of a sound, such as a beep, or the form of a message, such as a notification showing "Mute has been turned off". The prompt lets the user quickly know that the mute function is off, so the local user can start speaking and the speech will be heard by the remote users.
If the voice instruction in the audio data is recognized as "I'll mute first, you continue", which is the same as the preset voice instruction for turning on the mute function, and the mute function of the current network conference is off, the mute function of the network conference is turned on. After the mute function of the network conference is turned on, a prompt message may be generated to indicate that the mute function has been turned on. The prompt may take the form of a sound, such as a click, or the form of a message, such as a notification showing "Mute has been turned on". The prompt lets the user quickly know that the mute function is on, and anything the local user says will not be heard by the remote users.
According to an embodiment of the present disclosure, audio data is acquired while the web conference program is running, a voice instruction is recognized from the audio data, and the mute function of the web conference program is controlled according to the recognized voice instruction. The voice instruction can thus be used to turn the mute function of the web conference program on and off, which makes switching the state of the mute function more convenient and improves the communication efficiency of the network conference.
Fig. 3 is a flowchart of a control method of a web conference according to another embodiment of the present disclosure.
As shown in fig. 3, the method for controlling the web conference may include operations S310 to S360.
In operation S310, with the mute function of the network conference program turned on, audio data is acquired and a first voice instruction is recognized.
According to the embodiment of the disclosure, while the mute function of the network conference program is on, audio sensors such as the microphone of the local electronic device can keep working and acquire audio data in real time. The collected audio data is recognized in real time using speech recognition technology, and the recognized voice instruction is taken as the first voice instruction.
In operation S320, it is determined whether the recognized first voice instruction is a first preset instruction, if so, operation S330 is performed, otherwise, operation S310 is returned to.
According to an embodiment of the present disclosure, the first preset instruction may be an instruction for turning off the mute function, for example "I want to speak, turn off mute". It is determined whether the first voice instruction is the first preset instruction; if so, operation S330 is performed, otherwise the flow returns to operation S310 to continue acquiring audio data in real time with the microphone, with the mute function still on, and performing speech recognition.
In operation S330, the mute function is turned off, and a first prompt message is generated to indicate that the mute function has been turned off.
According to the embodiment of the disclosure, when the first voice instruction is the first preset instruction, the mute function of the network conference is automatically turned off, and the first prompt message is generated to indicate that the mute function has been turned off, so that the user can quickly know that the mute function is off; the local user can start speaking, and the speech will be heard by the remote users.
In operation S340, with the mute function of the network conference program turned off, new audio data is acquired and a second voice instruction is recognized.
According to the embodiment of the disclosure, while the mute function of the network conference program is off, audio sensors such as the microphone of the local electronic device acquire new audio data in real time; the collected audio data is recognized in real time using speech recognition technology, and the second voice instruction is recognized from the newly acquired audio data.
In operation S350, it is determined whether the second voice command is a second preset command, if so, operation S360 is performed, otherwise, operation S340 is returned to.
According to an embodiment of the present disclosure, the second preset instruction may be an instruction for turning on the mute function, for example "I'll mute first, you continue". It is determined whether the second voice instruction is the second preset instruction; if so, operation S360 is performed, otherwise the flow returns to operation S340 to continue acquiring new audio data in real time with the microphone, with the mute function still off, and performing speech recognition.
In operation S360, the mute function is turned on, and a second prompt message is generated to indicate that the mute function has been turned on.
According to the embodiment of the disclosure, when the second voice instruction is the second preset instruction, the mute function of the network conference is automatically turned on, and the second prompt message is generated to indicate that the mute function has been turned on, so that the user can quickly know that the mute function is on and that anything the local user says will not be heard by the remote users. The flow then returns to operation S310 to continue acquiring audio data in real time with the microphone, with the mute function on, and performing speech recognition. A minimal sketch of this loop is given below.
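The following Python sketch corresponds to the loop of Fig. 3, assuming the conference client exposes `capture_audio()`, `speech_to_text()`, `set_mute()`, and `notify()` helpers; these names are assumptions made for illustration and are not an API defined by the present disclosure.

```python
# Illustrative loop for operations S310-S360: while muted, listen for the
# first preset instruction; while unmuted, listen for the second.
FIRST_PRESET = "i want to speak, turn off mute"    # turns the mute function off
SECOND_PRESET = "i'll mute first, you continue"    # turns the mute function on

def mute_control_loop(capture_audio, speech_to_text, set_mute, notify, muted=True):
    while True:
        text = speech_to_text(capture_audio()).strip().lower()
        if muted and text == FIRST_PRESET:           # S320 -> S330
            set_mute(False)
            notify("Mute has been turned off")       # first prompt message
            muted = False
        elif not muted and text == SECOND_PRESET:    # S350 -> S360
            set_mute(True)
            notify("Mute has been turned on")        # second prompt message
            muted = True
```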
Fig. 4 is a flowchart of a control method of a web conference according to another embodiment of the present disclosure.
As shown in fig. 4, the method for controlling the web conference may include operations S410 to S440.
In operation S410, audio data is acquired during the operation of the network conference program.
According to the embodiment of the disclosure, the audio data is audio data in the environment where the local electronic device is located, and the audio data can be acquired by using an audio sensor such as a microphone of the local electronic device.
In operation S420, it is determined whether the source of the audio data is human voice; if so, operation S430 is performed, otherwise the flow returns to operation S410.
According to the embodiment of the disclosure, the collected audio data is analyzed in real time to determine whether its source is human voice, so that whether the user is speaking can be judged quickly. Specifically, if the source is human voice, this indicates that the user is speaking, and operation S430 is performed. If the source is not human voice, this indicates that the user is not speaking, and the flow returns to S410 to continue acquiring audio data in real time with the microphone.
In operation S430, a voice instruction is recognized from audio data.
In operation S440, the mute function of the network conference program is controlled according to the recognized voice instruction.
According to the embodiment of the disclosure, if the voice instruction in the audio data is consistent with the preset voice instruction for turning off the mute function, and the mute function of the current network conference is on, the mute function of the network conference is turned off. If the voice instruction in the audio data is consistent with the preset voice instruction for turning on the mute function, and the mute function of the current network conference is off, the mute function of the network conference is turned on.
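The gating step of Fig. 4 can be sketched as a thin wrapper that only runs command recognition when the audio is classified as human voice. Here `is_human_voice`, `speech_to_text`, and `handle_command` are placeholder callables assumed for illustration; a possible `is_human_voice` classifier is sketched after the description of Fig. 5 below.

```python
# Sketch of operations S410-S440: skip recognition unless the source is human voice.
def process_audio_chunk(chunk, is_human_voice, speech_to_text, handle_command):
    if not is_human_voice(chunk):       # S420: not speech, keep listening
        return
    text = speech_to_text(chunk)        # S430: recognize a voice instruction
    handle_command(text)                # S440: control the mute function
```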
Fig. 5 is a flowchart of a method of identifying a source of audio data according to one embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S5421 to S5422.
In operation S5421, spectral features of audio data are extracted from at least a portion of the audio data.
According to the embodiment of the disclosure, if the sound is produced by a user, continuous audio data will follow once the sound intensity is detected; spectral features can be extracted from this continuous audio data, and whether its source is human voice can be identified from those spectral features.
In operation S5422, a source of audio data is identified based on spectral features using a speech recognition model.
According to the embodiment of the disclosure, whether the source of the audio data is human voice can be recognized based on the spectral features of the audio data using a speech recognition model. The speech recognition model can be obtained by training a neural network model. The training data may include spectral features of human voices, spectral features of animal sounds, spectral features of natural sounds, and the like; the spectral features of human voices are used as positive samples, and the spectral features of animal sounds, natural sounds, and the like are used as negative samples. The neural network model is trained with the positive and negative samples, and the trained model is used as the speech recognition model. Given the spectral features of input audio data, the speech recognition model can recognize whether the source of the audio data is human voice.
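As an illustrative sketch of this source-identification step, the following assumes MFCCs as the spectral features and a small scikit-learn MLP as a stand-in for the neural-network model; the disclosure does not name specific features, libraries, or architectures, so these choices are assumptions.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def spectral_features(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Extract a fixed-length spectral feature vector (mean MFCCs) from an audio clip."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=20)
    return mfcc.mean(axis=1)

def train_voice_classifier(positive_clips, negative_clips, sample_rate):
    """Train on human-voice clips (positive) and animal/natural-sound clips (negative)."""
    features = np.stack([spectral_features(clip, sample_rate)
                         for clip in positive_clips + negative_clips])
    labels = np.array([1] * len(positive_clips) + [0] * len(negative_clips))
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    return model.fit(features, labels)

def is_human_voice(model, audio: np.ndarray, sample_rate: int) -> bool:
    """Classify whether the source of an audio clip is human voice."""
    return bool(model.predict(spectral_features(audio, sample_rate).reshape(1, -1))[0])
```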
Fig. 6 is a block diagram of a control device of a web conference according to one embodiment of the present disclosure.
As shown in fig. 6, the control device 600 of the web conference may include an acquisition module 601, a first recognition module 602, and a control module 603.
The acquisition module 601 is used for acquiring audio data while the network conference program is running.
The first recognition module 602 is configured to recognize a voice instruction from the audio data.
The control module 603 is configured to control the mute function of the network conference program according to the recognized voice instruction.
According to an embodiment of the present disclosure, the control module 603 comprises a first control unit.
The first control unit is used for turning off the mute function of the network conference when the mute function is on and the recognized voice instruction includes a first instruction.
According to an embodiment of the present disclosure, the control device 600 of the web conference further includes a first generation module.
The first generation module is used for generating, after the first control unit turns off the mute function, prompt information indicating that the mute function has been turned off.
According to an embodiment of the present disclosure, the control module 603 comprises a second control unit.
The second control unit is used for turning on the mute function when the mute function of the network conference is off and the recognized voice instruction includes a second instruction.
According to an embodiment of the present disclosure, the control device 600 of the web conference further includes a second generation module.
The second generation module is used for generating, after the second control unit turns on the mute function, prompt information indicating that the mute function has been turned on.
According to an embodiment of the present disclosure, the control device 600 of the web conference further comprises a second recognition module.
The second recognition module is used for recognizing the source of the audio data before the first recognition module recognizes the voice instruction from the audio data, wherein the first recognition module is executed under the condition that the source of the audio data is human voice.
According to an embodiment of the present disclosure, the second recognition module includes an extraction unit and a recognition unit.
The extraction unit is used for extracting the spectral characteristics of the audio data from at least one part of the audio data;
the recognition unit is configured to recognize a source of the audio data based on the spectral feature using a speech recognition model.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an exampleelectronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the control method of the network conference. For example, in some embodiments, the method of controlling a web conference may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the control method of the network conference described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the control method of the web conference.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

Translated from Chinese

1. A control method for a network conference, comprising:
acquiring audio data while a network conference program is running;
recognizing a voice command from the audio data; and
controlling a mute function of the network conference program according to the recognized voice command.

2. The method according to claim 1, wherein controlling the mute function of the network conference program according to the recognized voice command comprises:
turning off the mute function when the mute function of the network conference is on and the recognized voice command includes a first command.

3. The method according to claim 2, further comprising, after turning off the mute function:
generating prompt information for prompting that the mute function has been turned off.

4. The method according to claim 1, wherein controlling the mute function of the network conference program according to the recognized voice command comprises:
turning on the mute function when the mute function of the network conference is off and the recognized voice command includes a second command.

5. The method according to claim 4, further comprising, after turning on the mute function:
generating prompt information for prompting that the mute function has been turned on.

6. The method according to claim 1, further comprising, before recognizing the voice command from the audio data:
identifying the source of the audio data;
wherein the operation of recognizing the voice command from the audio data is performed when the source of the audio data is human voice.

7. The method according to claim 6, wherein identifying the source of the audio data comprises:
extracting spectral features of the audio data from at least a portion of the audio data; and
identifying the source of the audio data based on the spectral features using a speech recognition model.

8. A control device for a network conference, comprising:
an acquisition module for acquiring audio data while a network conference program is running;
a first recognition module for recognizing a voice command from the audio data; and
a control module for controlling a mute function of the network conference program according to the recognized voice command.

9. The device according to claim 8, wherein the control module comprises:
a first control unit for turning off the mute function when the mute function of the network conference is on and the recognized voice command includes a first command.

10. The device according to claim 9, further comprising:
a first generation module for generating, after the first control unit turns off the mute function, prompt information for prompting that the mute function has been turned off.

11. The device according to claim 8, wherein the control module comprises:
a second control unit for turning on the mute function when the mute function of the network conference is off and the recognized voice command includes a second command.

12. The device according to claim 11, further comprising:
a second generation module for generating, after the second control unit turns on the mute function, prompt information for prompting that the mute function has been turned on.

13. The device according to claim 8, further comprising:
a second recognition module for identifying the source of the audio data before the first recognition module recognizes the voice command from the audio data;
wherein the first recognition module is executed when the source of the audio data is human voice.

14. The device according to claim 13, wherein the second recognition module comprises:
an extraction unit for extracting spectral features of the audio data from at least a portion of the audio data; and
a recognition unit for identifying the source of the audio data based on the spectral features using a speech recognition model.

15. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.

16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method of any one of claims 1 to 7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202110213134.1A | Priority date 2021-02-25 | Filing date 2021-02-25 | Control method and device of network conference, electronic equipment and storage medium | Pending | CN112885350A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110213134.1A (CN112885350A (en)) | 2021-02-25 | 2021-02-25 | Control method and device of network conference, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110213134.1A (CN112885350A (en)) | 2021-02-25 | 2021-02-25 | Control method and device of network conference, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN112885350A | 2021-06-01

Family

Family ID: 76054505

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110213134.1A (Pending, CN112885350A (en)) | Control method and device of network conference, electronic equipment and storage medium | 2021-02-25 | 2021-02-25

Country Status (1)

Country | Link
CN (1) | CN112885350A (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103824559A (en)* | 2012-11-19 | 2014-05-28 | 国际商业机器公司 | Interleaving voice commands for electronic meetings
CN108023805A (en)* | 2016-10-31 | 2018-05-11 | 阿里巴巴集团控股有限公司 | The collocation method and device of interaction authority
CN108111701A (en)* | 2016-11-24 | 2018-06-01 | 北京中创视讯科技有限公司 | Silence processing method and device
CN106454539A (en)* | 2016-11-29 | 2017-02-22 | 武汉斗鱼网络科技有限公司 | Bullet screen forbidding system and bullet screen forbidding method for live video websites
US20180358034A1 (en)* | 2017-06-09 | 2018-12-13 | International Business Machines Corporation | Active speaker detection in electronic meetings
CN110139152A (en)* | 2019-05-20 | 2019-08-16 | 北京字节跳动网络技术有限公司 | Prohibit speech method, apparatus, electronic equipment and computer readable storage medium
CN112040166A (en)* | 2019-06-04 | 2020-12-04 | 中兴通讯股份有限公司 | Conference control realization method, device and server
CN110380875A (en)* | 2019-06-14 | 2019-10-25 | 平安科技(深圳)有限公司 | Group communication method, apparatus, computer equipment and storage medium
CN110349597A (en)* | 2019-07-03 | 2019-10-18 | 山东师范大学 | A kind of speech detection method and device
CN111028852A (en)* | 2019-11-06 | 2020-04-17 | 杭州哲信信息技术有限公司 | Noise removing method in intelligent calling system based on CNN
CN111343410A (en)* | 2020-02-14 | 2020-06-26 | 北京字节跳动网络技术有限公司 | Mute prompt method and device, electronic equipment and storage medium
CN111402922A (en)* | 2020-03-06 | 2020-07-10 | 武汉轻工大学 | Audio signal classification method, device, equipment and storage medium based on small samples
CN112153223A (en)* | 2020-10-23 | 2020-12-29 | 北京蓦然认知科技有限公司 | Method for voice assistant to recognize and execute called user instruction and voice assistant
CN112397073A (en)* | 2020-11-04 | 2021-02-23 | 北京三快在线科技有限公司 | Audio data processing method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115312057A (en)* | 2022-08-15 | 2022-11-08 | 腾讯科技(深圳)有限公司 | Conference interaction method and device, computer equipment and storage medium

Similar Documents

Publication | Title
US11811973B2 (en) | Computer-programmed telephone-enabled devices for processing and managing numerous simultaneous voice conversations conducted by an individual over a computer network and computer methods of implementing thereof
CN106462573B (en) | In-call translation
US9652113B1 (en) | Managing multiple overlapped or missed meetings
KR20240007888A (en) | Electronic device and method for communicating with chatbot
EP3050051B1 (en) | In-call virtual assistants
US9294616B2 (en) | Identifying a contact based on a voice communication session
US9507774B2 (en) | Systems, method and program product for speech translation
CN114691857A (en) | Real-time generation of summaries and next actions for a user from interaction records in a natural language
EP3797413A1 (en) | Use of voice recognition to generate a transcript of conversation(s)
US11699360B2 (en) | Automated real time interpreter service
US20230403174A1 (en) | Intelligent virtual event assistant
CN112000781A (en) | Information processing method, device, electronic device and storage medium in user dialogue
CN113678153A (en) | Context-aware real-time meeting audio transcription
US20190019067A1 (en) | Multimedia conferencing system for determining participant engagement
US11909784B2 (en) | Automated actions in a conferencing service
CN113284500B (en) | Audio processing method, device, electronic equipment and storage medium
CN111681650A (en) | Intelligent conference control method and device
US20170286755A1 (en) | Facebot
CN116569197A (en) | User promotion in collaboration sessions
CN110865789A (en) | Method and system for intelligently starting microphone based on voice recognition
CN112885350A (en) | Control method and device of network conference, electronic equipment and storage medium
CN112969000A (en) | Control method and device of network conference, electronic equipment and storage medium
US20210327416A1 (en) | Voice data capture
CN113012679A (en) | Method, apparatus and medium for broadcasting message by voice
US8775163B1 (en) | Selectable silent mode for real-time audio communication system

Legal Events

Code | Event
PB01 | Publication (application publication date: 2021-06-01)
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication

