CN112669838A


Info

Publication number: CN112669838A
Application number: CN202011495657.1A
Authority: CN
Inventors: 彭媛; 操灿; 方律
Original assignee: Hefei Feier Intelligent Technology Co ltd
Current assignee: Hefei Feier Intelligent Technology Co ltd
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2021-04-16

Abstract

The present invention provides an audio playback method for a smart speaker, comprising: A: the smart speaker receives a wake-up voice and switches to a wake-up state; in the wake-up state, if the smart speaker is playing audio, its output volume is reduced; B: the smart speaker receives a play control voice, extracts a play keyword from it, and generates a play request; C: an audio streaming server is accessed based on the play request, and the corresponding audio stream data is obtained from it in sequence; D: the smart speaker plays the received audio stream data. The advantages of the invention are: the playback volume is controlled via the wake-up voice and lowered proactively so that the user's subsequent voice commands are easier to pick up, which removes the interference of the speaker's own playback vibration with receiving control commands; different servers are accessed through a unified port, so copyright restrictions on any single service do not prevent the user from listening, which improves satisfaction; and only keywords need to be detected, with erroneous data filtered out at the audio streaming server level.

Description

Smart speaker audio playback method and device, electronic device, and storage medium

Technical Field

The invention relates to the technical field of smart speakers, and in particular to a smart speaker audio playback method and device, an electronic device, and a storage medium.

Background

A smart speaker is a relatively new kind of intelligent electronic product with voice input and far-field voice pickup. Small MEMS microphones are mounted in an array on the PCB (printed circuit board) inside the speaker so that it can capture speech accurately from all angles and directions. In actual use, however, sound from the speaker's own driver sets the whole enclosure, including the internal PCB, vibrating. The invention patent application with publication number CN107134286A discloses a wireless audio playback method based on voice interaction, a music player, and a storage medium: an intelligent music player receives the user's control voice and plays audio through a high-power speaker connected over wireless communication. Separating the receiving device from the playback device reduces the effect of the played audio on voice command recognition, but this approach does not address how to eliminate the effect of PCB vibration on the received voice signal when the device itself plays audio. As a result, the low-power audio player can serve only as a controller and cannot be used on its own as a player, which limits the device's usefulness.

Disclosure of Invention

The technical problem addressed by the invention is to provide a smart speaker audio playback method that eliminates the effect of audio playback on voice reception.

The invention solves this technical problem through the following technical solution: an audio playback method for a smart speaker, comprising the following steps:

Step A: the smart speaker receives a wake-up voice and switches to a wake-up state; in the wake-up state, if the smart speaker is playing audio, its output volume is reduced;

Step B: the smart speaker receives a play control voice, extracts a play keyword from it, and generates a play request;

Step C: an audio streaming server is accessed based on the play request, and the corresponding audio stream data is obtained from it in sequence;

Step D: the smart speaker plays the received audio stream data.

In this method, the smart speaker is woken by the wake-up voice. If it is playing audio, it proactively lowers the playback volume once woken so that the user's subsequent voice commands are easier to receive. This improves recognition accuracy and removes the interference of the speaker's own playback vibration with receiving control commands, so the smart speaker can serve effectively as both controller and player, improving the user experience. The control command is parsed on the smart speaker into a play request and the corresponding server is accessed, so different audio streaming servers can be reached depending on the audio the user asks for. This overcomes the problem that a single audio streaming server may not meet user needs because of copyright or similar restrictions, and because everything is controlled through a unified port, the system is convenient to use and the user experience is better.

Preferably, the content of the wake-up voice is the identification name of the smart speaker, and the wake-up voice switches the smart speaker from a dormant state to the wake-up state; if the user performs no operation and gives no valid voice command within a preset time period, the smart speaker enters the dormant state.

Preferably, in step B, after receiving the play control voice, the smart speaker parses it to obtain the corresponding text information, parses the text information to obtain the play keyword, and obtains the corresponding audio streaming server address from the play keyword; the play request includes the play keyword, the audio streaming server address, and a play order.
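The play request described above carries exactly three fields. A minimal Python sketch of such a request follows; the class and field names, the JSON wire format, and the placeholder server URL are assumptions for illustration, since the text specifies only which fields the request contains.

```python
import json
from dataclasses import dataclass
from enum import Enum


class PlayOrder(Enum):
    """The four play orders named in the text (identifiers are hypothetical)."""
    SEQUENTIAL = "sequential"
    RANDOM = "random"
    LOOP = "loop"
    SINGLE_TRACK = "single_track"


@dataclass(frozen=True)
class PlayRequest:
    keyword: str          # parsed play keyword, e.g. "play-Zhou Jielun-pop music"
    server_address: str   # audio streaming server resolved from the keyword
    order: PlayOrder      # order in which the server should return results

    def to_json(self) -> str:
        # Assumed wire format: a flat JSON object with the enum sent by value.
        return json.dumps(
            {
                "keyword": self.keyword,
                "server_address": self.server_address,
                "order": self.order.value,
            },
            ensure_ascii=False,
        )


request = PlayRequest("play-Zhou Jielun-pop music",
                      "https://music.example.com/api",  # placeholder address
                      PlayOrder.RANDOM)
print(request.to_json())
```

A frozen dataclass keeps the request immutable once generated, which suits a message that is built once and then sent.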

Preferably, the play order is one of sequential play, random play, loop play, and single-track play; when the text information does not specify a play order, tracks are played in order of total play count, in order of the user's preference, or randomly.
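The three default orderings used when no play order is spoken can be sketched as follows. This is an illustration under assumptions: the strategy names are hypothetical, `total_plays` is assumed to be an overall play count per track, and `user_plays` is assumed to be the user's own count taken from historical play data.

```python
import random
from typing import Dict, List, Optional, Sequence


def default_queue(tracks: Sequence[str], strategy: str,
                  total_plays: Dict[str, int], user_plays: Dict[str, int],
                  rng: Optional[random.Random] = None) -> List[str]:
    """Order search results when the user did not say how to play them.

    total_plays: overall play count per track (assumed to come from the server).
    user_plays:  this user's own play count per track (from historical play data).
    """
    if strategy == "total_plays":
        # Most-played tracks first; sorted() is stable, so ties keep server order.
        return sorted(tracks, key=lambda t: total_plays.get(t, 0), reverse=True)
    if strategy == "user_preference":
        # A higher personal play count means a higher preference.
        return sorted(tracks, key=lambda t: user_plays.get(t, 0), reverse=True)
    if strategy == "random":
        queue = list(tracks)
        (rng or random.Random()).shuffle(queue)
        return queue
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Passing a seeded `random.Random` makes the random strategy reproducible in tests while leaving the production default unseeded.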

Preferably, the play keyword is parsed by matching the text information against a preset play keyword library, which includes one or a combination of: audio stream file name, singer, songwriter, category, region, year, and gender.

Preferably, the smart speaker uses a preset mapping between play keywords and audio streaming server addresses to obtain the server address corresponding to the play keyword. While playing audio stream data, the smart speaker also caches the next audio stream data to be played from the audio streaming server, and plays the cached data once the current audio stream data finishes.

Preferably, if no play keyword can be parsed from the text information, play keywords are generated randomly from the user's historical play data, or generated one after another in descending order of play count in that historical data.

Preferably, before the smart speaker receives the play control voice, the method further comprises a networking step in which the smart speaker communicates with a user terminal: the user terminal connects to the smart speaker over Bluetooth and selects a Wi-Fi network to join; the user enters an account on the user terminal, or the smart speaker device's own account is used, to log in to the audio streaming server; and the account's historical play data is obtained. The historical play data includes the audio stream information corresponding to keywords in the play keyword library, the address of the audio streaming server, and the play count of each audio stream.

Preferably, if the smart speaker receives an operation control voice after step A, it parses the operation control request to obtain a control keyword and executes the corresponding command; the control keywords include pause, start, previous, next, volume up, and volume down.

Preferably, the smart speaker is a Bluetooth speaker with two or more loudspeakers.

The invention further provides a smart speaker audio playback method, comprising the following steps:

Step i: the user speaks the smart speaker's wake-up voice;

Step ii: the smart speaker receives the wake-up voice, switches to the wake-up state, and issues a prompt; in the wake-up state, if the smart speaker is playing audio, its output volume is reduced;

Step iii: after the prompt, the user speaks the play control voice that the smart speaker should execute;

Step iv: the smart speaker receives the play control voice, extracts a play keyword from it, resolves the audio streaming server address from the play keyword, and generates a play request;

Step v: the smart speaker accesses the corresponding audio streaming server based on the play request;

Step vi: the audio streaming server responds to the play request and returns the audio stream data it found to the smart speaker one item at a time, following the play order in the play request;

Step vii: the smart speaker plays the received audio stream data.

The invention further provides a smart speaker audio playback device, comprising:

A wake-up module: used for receiving the wake-up voice and switching to the wake-up state; in the wake-up state, if the smart speaker is playing audio, the output volume of the smart speaker is reduced;

A play request generation module: used for receiving the play control voice and parsing it to generate a play request;

A play request sending module: used for sending the play request to the audio streaming server obtained by parsing;

An audio stream data receiving module: used for receiving the audio stream data returned by the audio streaming server;

A playback module: used for playing the audio stream data through the smart speaker.

The invention further provides an electronic device comprising a memory and a processor, wherein the memory stores one or more computer instructions that, when executed by the processor, implement the playback method described above.

The invention further provides a readable storage medium storing computer instructions that, when executed by a processor, implement the audio playback method described above.

The smart speaker audio playback method, device, electronic device, and storage medium of the invention have the following advantages. The smart speaker is woken by the wake-up voice; if it is playing audio, it proactively lowers the playback volume once woken so that the user's subsequent voice commands are easier to receive, which improves recognition accuracy and removes the interference of the speaker's own playback vibration with receiving control commands, so the smart speaker can serve effectively as both controller and player. The control command is parsed on the smart speaker into a play request and the corresponding server is accessed, so different audio streaming servers can be reached depending on the audio the user asks for; this overcomes the copyright and similar limitations of any single audio streaming server, and control through a unified port makes the system convenient to use. Play data can be kept both in the account on each audio streaming server and centrally on the smart speaker, so different music players can be managed and controlled in one place, which is very convenient for users. The user's historical data from different audio streaming servers can also be fused to analyze user preferences better and interpret user commands more intelligently, further improving satisfaction.

Because the play request is derived from play keywords extracted from the play control voice, only the keywords need to be detected, which lowers the difficulty of speech recognition. The corresponding audio streaming server is then accessed to obtain data, erroneous data is filtered out at the audio streaming server, and audio is played based on the data the server returns, improving the user experience.

Drawings

Fig. 1 is a flowchart of a smart speaker audio playback method according to an embodiment of the present invention;

Fig. 2 is a flowchart of generating a play request in the smart speaker audio playback method according to an embodiment of the present invention;

Fig. 3 is a flowchart of a smart speaker audio playback method according to an embodiment of the present invention;

Fig. 4 is a flowchart of the audio streaming server's operation in the smart speaker audio playback method according to an embodiment of the present invention;

Fig. 5 is a flowchart of parsing a control keyword in the smart speaker audio playback method according to an embodiment of the present invention;

Fig. 6 is a flowchart of a smart speaker audio playback method according to an embodiment of the present invention;

Fig. 7 is a block diagram of a smart speaker audio playback device according to the third embodiment of the present invention;

Fig. 8 is a block diagram of the play request generation module of the smart speaker audio playback device according to the third embodiment of the present invention;

Fig. 9 is a structural diagram of a smart speaker audio playback device according to the third embodiment of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Example one

As shown in fig. 1, the present embodiment provides an audio playing method for a smart sound box, including the following steps:

In this embodiment, the smart speaker is woken by the wake-up voice. If it is playing audio, it proactively lowers the playback volume after waking so that the user's subsequent voice commands are easier to receive. This improves recognition accuracy and removes the interference of the speaker's own playback vibration with receiving control commands, so the smart speaker can serve effectively as both controller and player, improving the user experience.

The content of the wake-up voice is the identification name of the smart speaker and can be defined by the user. The wake-up voice switches the smart speaker from the dormant state to the wake-up state, in which it can receive the user's control voice. If, while the smart speaker is in the wake-up state, the user performs no operation and gives no valid voice command within a preset time period, the smart speaker automatically returns to the dormant state. To distinguish it from most personal names and nicknames, the wake-up voice is generally a phrase of four or more characters.
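The wake-up, volume-ducking, and idle-timeout behaviour just described can be sketched as a small state machine. This is a hypothetical illustration, not the patented firmware: the wake word, the 10-second timeout, the ducked volume of 20, and restoring the earlier volume on returning to dormant are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class WakeController:
    """Wake/dormant handling with volume ducking (illustrative sketch)."""
    wake_word: str
    idle_timeout_s: float = 10.0  # the "preset time period"; value is assumed
    duck_volume: int = 20         # reduced volume while awake; value is assumed
    volume: int = 60
    playing: bool = False
    awake: bool = False
    _saved_volume: int = 60
    _ducked: bool = False
    _last_activity: float = 0.0

    def on_utterance(self, text: str, now: float) -> None:
        if not self.awake:
            # Only the wake-up voice is acted on while dormant.
            if text.strip().lower() == self.wake_word.lower():
                self.awake = True
                self._last_activity = now
                if self.playing:
                    # Duck the output so the microphones hear the next command.
                    self._saved_volume = self.volume
                    self.volume = min(self.volume, self.duck_volume)
                    self._ducked = True
            return
        # Simplification: any utterance while awake counts as activity,
        # whereas the text speaks of a *valid* voice command.
        self._last_activity = now

    def tick(self, now: float) -> None:
        """Call periodically; goes dormant once the idle timeout elapses."""
        if self.awake and now - self._last_activity >= self.idle_timeout_s:
            self.awake = False
            if self._ducked:
                # Assumption: restore the pre-wake volume on going dormant.
                self.volume = self._saved_volume
                self._ducked = False


speaker = WakeController("hello speaker", playing=True)  # hypothetical wake word
speaker.on_utterance("hello speaker", now=0.0)
```

Passing `now` explicitly instead of reading a clock keeps the logic deterministic and easy to test.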

Referring to Fig. 2, after receiving the play control voice, the smart speaker sends it to the server for parsing and obtains the corresponding text information, extracts the play keyword from the text information, and then obtains the corresponding audio streaming server address from the play keyword. The play request includes the play keyword, the audio streaming server address, and a play order.

Referring to Fig. 3, the play keyword is parsed by matching the text information against a preset play keyword library, which includes one or a combination of: audio stream file name, singer, songwriter, genre, region, year, and gender.

For example, if the user says "I want to listen to Zhou Jielun's (Jay Chou's) songs", the play keyword is "play-Zhou Jielun"; if the user says "Guo Degang", the parsed play keyword is "play-Guo Degang". Play keywords can be refined further to reduce search noise: "Zhou Jielun" is parsed as "play-Zhou Jielun-pop music", and "Guo Degang" as "play-Guo Degang-crosstalk". Key information is thus extracted quickly; when the user speaks in a roundabout way or uses habitual filler phrases, unrecognizable phrases are filtered out automatically, and the subsequent steps are executed only on the parsed play keywords.
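The keyword-library matching and refinement in this example can be sketched with plain substring matching. This is a minimal illustration under assumptions: the two-entry library, its name-to-category layout, and substring search as the matching method are hypothetical stand-ins, since the text does not specify the matching algorithm, and a deployed system would match the Chinese text directly.

```python
from typing import Dict, Optional

# Hypothetical play keyword library: recognized name -> category used for refinement.
# A real library would also hold file names, songwriters, regions, years, etc.
KEYWORD_LIBRARY: Dict[str, str] = {
    "Zhou Jielun": "pop music",
    "Guo Degang": "crosstalk",
}


def extract_play_keyword(text: str,
                         library: Dict[str, str] = KEYWORD_LIBRARY,
                         refine: bool = True) -> Optional[str]:
    """Return a keyword such as "play-Zhou Jielun-pop music", or None.

    Words that match nothing in the library (fillers, verbal habits) are
    simply ignored, which gives the filtering effect the text describes.
    """
    lowered = text.lower()
    for name, category in library.items():
        if name.lower() in lowered:
            return f"play-{name}-{category}" if refine else f"play-{name}"
    return None  # nothing recognized; the caller falls back to play history


print(extract_play_keyword("Um, I want to listen to Zhou Jielun's songs"))
```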

The smart speaker then uses the preset mapping between play keywords and audio streaming server addresses to obtain the audio streaming server address corresponding to the play keyword.

For example, for the play keywords "play-Zhou Jielun-pop music" and "play-Guo Degang-crosstalk", the smart speaker, based on the classification of the audio streams, searches for "Zhou Jielun" on one or more audio streaming servers such as QQ Music or NetEase Cloud Music, and for the crosstalk category it accesses Himalaya (Ximalaya) or another audio streaming server. When choosing the type of audio streaming server, the content specialization of each server and information such as copyright can be taken into account: for example, if the copyright of Zhou Jielun's music belongs to QQ Music, then when the parsed play keyword is "play-Zhou Jielun-pop music", the smart speaker selects the QQ Music server address.
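The routing rule in this example (a known copyright holder wins, otherwise route by content category) can be sketched with two lookup tables. All provider keys and URLs below are hypothetical placeholders rather than real service endpoints, and splitting on "-" assumes names contain no hyphen.

```python
from typing import Dict, Optional

# Hypothetical routing tables; provider keys and URLs are placeholders.
COPYRIGHT_HOLDER: Dict[str, str] = {"Zhou Jielun": "qq_music"}
CATEGORY_PROVIDER: Dict[str, str] = {
    "pop music": "netease_cloud_music",
    "crosstalk": "ximalaya",
}
SERVER_ADDRESS: Dict[str, str] = {
    "qq_music": "https://qqmusic.example.com/api",
    "netease_cloud_music": "https://netease.example.com/api",
    "ximalaya": "https://ximalaya.example.com/api",
}


def resolve_server(keyword: str) -> Optional[str]:
    """Map "play-<name>[-<category>]" to an audio streaming server address."""
    parts = keyword.split("-")
    if len(parts) < 2 or parts[0] != "play":
        return None
    name = parts[1]
    category = parts[2] if len(parts) > 2 else ""
    # A known copyright holder takes priority; otherwise route by category.
    provider = COPYRIGHT_HOLDER.get(name) or CATEGORY_PROVIDER.get(category)
    return SERVER_ADDRESS.get(provider) if provider else None
```

Keeping copyright ownership in its own table means a licensing change only touches one entry, without altering the category routing.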

The play order is one of sequential play, random play, loop play, and single-track play. The smart speaker first tries to extract the play order from the user's play control voice: if the user says "play Zhou Jielun's songs at random", the parsed play keyword is "play-Zhou Jielun-QQ Music-random". When the text information does not specify a play order, tracks are played in order of total play count, in order of user preference, or randomly. User preference is obtained by ranking all audio stream data the user has listened to on the different audio streaming servers in descending order of the user's play count: the more often a track has been played, the higher the user's preference for it.
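Extracting a spoken play order and appending it to the keyword, as in the "random" example above, can be sketched as a cue-phrase lookup. The English cue phrases and order identifiers are hypothetical; a deployed system would match the Chinese phrasing of the request.

```python
from typing import Dict, Optional

# Hypothetical cue phrases for the four play orders.
ORDER_CUES: Dict[str, str] = {
    "random": "random",
    "shuffle": "random",
    "loop": "loop",
    "repeat this one": "single_track",
    "in order": "sequential",
}


def extract_play_order(text: str) -> Optional[str]:
    lowered = text.lower()
    for cue, order in ORDER_CUES.items():
        if cue in lowered:
            return order
    return None  # no cue: fall back to a default ordering


def with_play_order(keyword: str, text: str) -> str:
    """Append the spoken play order, if any, to an already-routed keyword."""
    order = extract_play_order(text)
    return f"{keyword}-{order}" if order else keyword
```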

If no play keyword can be parsed from the text information, play keywords are generated randomly from the user's historical play data, or generated one after another in descending order of play count. For example, if the user says "I want to listen to a song", no specific song, artist, or genre can be found, so the songs the user has played most often are played in turn, based on the user's historical preferences.
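The fallback for an utterance like "I want to listen to a song" can be sketched as follows, assuming (hypothetically) that the historical play data is available as a mapping from play keyword to the user's play count.

```python
import random
from typing import Dict, List, Optional


def fallback_keywords(parsed: Optional[str], history: Dict[str, int],
                      randomize: bool = False,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Keywords to play, falling back to history when parsing failed.

    history maps a play keyword to the user's play count (assumed shape of
    the historical play data).
    """
    if parsed is not None:
        return [parsed]                     # parsing succeeded; use it as-is
    if randomize:
        keys = list(history)
        (rng or random.Random()).shuffle(keys)
        return keys
    # Most-played first, e.g. for "I want to listen to a song".
    return sorted(history, key=lambda k: history[k], reverse=True)
```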

In this embodiment, the play keyword is parsed on the smart speaker, and the audio streaming server address and play order are then determined. The appropriate music provider can therefore be selected automatically according to music copyright and the search performed on that provider's audio streaming server, which solves the problem of a single server lacking the copyright to some audio stream data and meets the user's needs. Operation is simple: the user controls everything through one unified interface, the smart speaker. Besides being kept in the account on each audio streaming server, all play data can also be stored centrally on the smart speaker, so different music players can be managed and controlled in one place, which is very convenient for users and improves satisfaction.

Step C: an audio streaming server is accessed based on the play request, and the corresponding audio stream data is obtained from it in sequence. The smart speaker accesses the corresponding audio streaming server based on the parsed play request and searches it; the search results are returned to the smart speaker, and when there are several results they are returned one after another in the parsed play order.

For the audio streaming server, referring to Fig. 4, the working method is as follows: the audio streaming server responds to the play request and sends the corresponding audio stream data to the smart speaker, where the play request includes at least the audio streaming server address, the play keyword, and the play order. Once connected, the audio streaming server searches for the audio stream data matching the play keyword and sends the results to the smart speaker one after another according to the play order in the play request.
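The server-side behaviour (search by keyword, then send results one at a time in the requested order) can be sketched as a generator. The in-memory catalogue is a hypothetical stand-in for the server's search backend, and treating "single-track play" as returning only the first result is an assumed reading of that term.

```python
import itertools
import random
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory index standing in for the server's search backend.
CATALOGUE: Dict[str, List[str]] = {
    "play-Zhou Jielun-pop music": ["track-1", "track-2", "track-3"],
}


def serve_play_request(request: Dict[str, str],
                       catalogue: Dict[str, List[str]] = CATALOGUE,
                       rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield matching tracks one at a time in the requested play order."""
    results = list(catalogue.get(request["keyword"], []))
    order = request.get("order", "sequential")
    if order == "random":
        (rng or random.Random()).shuffle(results)
    elif order == "single_track":
        results = results[:1]  # assumed reading of "single-track play"
    if order == "loop" and results:
        yield from itertools.cycle(results)  # endless; the client stops it
    else:
        yield from results
```

Yielding tracks lazily mirrors the text's "sends the results one after another": the speaker pulls the next item only when it needs it.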

While the smart speaker plays the retrieved audio stream data, it also caches the next audio stream data to be played from the audio streaming server, and plays the cached data once the current audio stream data finishes. If the smart speaker receives another play control voice before the current audio stream data finishes, the steps are executed again based on the latest play control voice.
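The one-ahead caching and the "latest command wins" rule can be sketched as a player with a single cache slot. The `fetch` callable is a hypothetical download function, and this sketch fetches synchronously; a real device would download the next stream concurrently while the current one plays.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Tuple


class PrefetchingPlayer:
    """Plays one stream while keeping the next one in a one-slot cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                      # hypothetical download function
        self._pending: Deque[str] = deque()      # tracks not yet downloaded
        self._cache: Optional[Tuple[str, bytes]] = None
        self.played: List[str] = []

    def start(self, tracks: Iterable[str]) -> None:
        """New play request: drop the old queue and any cached stream."""
        self._pending = deque(tracks)
        self._cache = None
        self._prefetch()

    def _prefetch(self) -> None:
        if self._cache is None and self._pending:
            track = self._pending.popleft()
            self._cache = (track, self._fetch(track))

    def play_next(self) -> Optional[str]:
        """Take the cached stream, start caching the following one, and 'play'."""
        if self._cache is None:
            return None
        track, _data = self._cache   # a real device would decode _data here
        self._cache = None
        self._prefetch()             # fetch the next stream during playback
        self.played.append(track)
        return track
```

Calling `start` again models a new play control voice arriving mid-playback: the previously cached stream is discarded and caching restarts for the new task.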

Step D: the smart speaker plays the audio stream data. To improve sound quality and obtain richer treble and bass, the smart speaker is a Bluetooth speaker with two or more loudspeakers.

Further, referring to Fig. 5, if the smart speaker receives a play control voice in step B and parses a control keyword from it, the corresponding control operation is executed; the control keywords include pause, start, previous, next, volume up, and volume down.
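Dispatching the six control keywords can be sketched as a lookup table of handlers. The toy state, the volume step of 10, clamping to 0 to 100, and wrap-around for previous/next are assumptions; the text names only the keywords themselves.

```python
class PlaybackControl:
    """Applies the six control keywords to a toy playback state."""

    def __init__(self, playlist_len: int, volume: int = 50, step: int = 10) -> None:
        self.playlist_len = playlist_len
        self.volume = volume
        self.step = step            # volume change per command; assumed value
        self.paused = False
        self.index = 0
        self._actions = {
            "pause": self._pause,
            "start": self._start,
            "previous": self._previous,
            "next": self._next,
            "volume up": self._volume_up,
            "volume down": self._volume_down,
        }

    def handle(self, keyword: str) -> bool:
        """Run a control keyword; False means it was not one (e.g. a play request)."""
        action = self._actions.get(keyword.strip().lower())
        if action is None:
            return False
        action()
        return True

    def _pause(self) -> None:
        self.paused = True

    def _start(self) -> None:
        self.paused = False

    def _previous(self) -> None:
        self.index = (self.index - 1) % self.playlist_len  # wraps around (assumed)

    def _next(self) -> None:
        self.index = (self.index + 1) % self.playlist_len

    def _volume_up(self) -> None:
        self.volume = min(100, self.volume + self.step)

    def _volume_down(self) -> None:
        self.volume = max(0, self.volume - self.step)
```

Returning `False` for an unrecognized phrase lets the caller hand it on to play-keyword parsing instead.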

If the smart speaker is woken while playing, then in step B the subsequent steps are executed according to the new play control voice: the audio streaming server is accessed to obtain new audio stream data, which is played; the audio stream data cached by the previous task is discarded, and the audio stream data to be played next is cached for the current task.

Referring to Fig. 6, the playback method above requires network access to the audio streaming servers, so before receiving a play control voice the smart speaker also performs a networking step with a user terminal: the user terminal connects to the smart speaker over Bluetooth and is used to select a Wi-Fi network; an account is entered on the user terminal, or the smart speaker device's own account is used, to log in to the audio streaming server and obtain the account's historical play data. The historical play data includes the audio stream information corresponding to keywords in the play keyword library, the address of the audio streaming server, and the play count of each audio stream. In this way the user's historical data from different audio streaming servers can be centralized on the smart speaker, making unified control convenient.
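Centralizing play history from several audio streaming servers on the speaker amounts to summing play counts per track while remembering where each track lives. A minimal sketch follows, assuming (hypothetically) that each server's history arrives as a mapping from track to play count.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def merge_histories(per_server: Dict[str, Dict[str, int]]) -> List[Tuple[str, int, List[str]]]:
    """Fuse per-server play counts into one ranking held on the speaker.

    per_server maps a server (hypothetical identifiers) to {track: play count}.
    Returns (track, total plays, servers that hold it), most played first.
    """
    totals: Counter = Counter()
    sources: Dict[str, Set[str]] = defaultdict(set)
    for server, history in per_server.items():
        for track, plays in history.items():
            totals[track] += plays
            sources[track].add(server)
    return [(track, n, sorted(sources[track])) for track, n in totals.most_common()]
```

Keeping the source servers alongside each total lets a preference-ranked track be routed back to a server that can actually stream it.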

Example two

The audio playback method provided by this embodiment comprises the following steps:

Step i: the user speaks the smart speaker's wake-up voice;

Step vii: the smart speaker plays the received audio stream data.

Example three

Referring to Fig. 7, based on the audio playback method above, this embodiment further provides a smart speaker audio playback device, comprising:

A wake-up module: used for receiving the wake-up voice and switching to the wake-up state; in the wake-up state, if the smart speaker is playing audio, the output volume of the smart speaker is reduced;

Referring to Fig. 8, the play request generation module comprises:

A voice receiving and processing unit: used for receiving voice information from the user's environment and applying noise reduction and echo cancellation to it; echo cancellation removes the background sound played by the smart speaker itself, which further improves recognition of control commands and the user experience;

An offline speech recognition unit: used for performing offline speech recognition on the voice information processed by the voice receiving and processing unit; the scope of offline recognition covers the playback control keywords pause, start, previous, next, volume up, and volume down;

A voice transmission and parsing unit: used for sending the processed voice information to the server through the smart speaker's voice cloud interface, where the server parses the text information corresponding to the play control voice and, from that text, the play keyword, the audio streaming server address, and the audio play order;

A play request generation unit: used for combining the play keyword, the audio streaming server address, and the audio play order into a play request.

The wake-up module may obtain the wake-up voice based on the voice receiving and processing unit of the play request generation module.
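The text does not say how the voice receiving and processing unit performs echo cancellation. One common approach, swapped in here purely as an illustration, is an adaptive filter driven by the signal the speaker is playing; the sketch below uses normalized LMS (NLMS) in pure Python. The tap count and step size are assumed values, and a production canceller would also need double-talk detection and an optimized DSP implementation.

```python
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-6) -> List[float]:
    """Remove an adaptive estimate of the speaker's own playback from the mic signal.

    mic: microphone samples (user speech plus echo of the playback)
    ref: the samples the speaker is playing, aligned with mic
    Returns the residual, i.e. the mic signal with the estimated echo subtracted.
    """
    w = [0.0] * taps        # FIR estimate of the loudspeaker-to-mic echo path
    recent = [0.0] * taps   # last `taps` reference samples, newest first
    out: List[float] = []
    for d, x in zip(mic, ref):
        recent = [x] + recent[:-1]
        echo_est = sum(wi * ri for wi, ri in zip(w, recent))
        e = d - echo_est                          # residual sample
        norm = eps + sum(r * r for r in recent)   # input power normalization
        w = [wi + mu * e * ri / norm for wi, ri in zip(w, recent)]
        out.append(e)
    return out
```

Normalizing the update by the reference power keeps the adaptation speed roughly independent of playback volume, which matters here because the volume changes when the speaker ducks on wake-up.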

Referring to Fig. 9, the smart speaker audio playback device further comprises:

A networking module: used for connecting to a Wi-Fi network;

A feedback module: used for obtaining and feeding back the user's historical data on the smart speaker and on each audio streaming server;

A buffer module: used for storing audio stream data in a buffer.

Example four

This embodiment further provides an electronic device comprising a memory and a processor, the memory storing one or more computer instructions that are executed by the processor to perform a method comprising:

Step D: the smart speaker plays the received audio stream data.

Example five

This embodiment further provides a readable storage medium storing computer instructions that, when executed by a processor, perform the following method:

Step D: the smart speaker plays the received audio stream data.

The above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to these embodiments, those of ordinary skill in the art should understand that the technical solutions described in them may still be modified, or some of their technical features replaced with equivalents, and such modifications or replacements do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.