Detailed Description
For the purpose of making the objects and embodiments of the present invention more apparent, exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings, in which exemplary embodiments of the present invention are illustrated. It is apparent that the described exemplary embodiments are only some, rather than all, of the embodiments of the present invention.
It should be noted that the brief description of terminology in the present invention is provided only to facilitate understanding of the embodiments described below and is not intended to limit the embodiments of the present invention. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms first, second, third and the like in the description, in the claims, and in the above-described figures are used for distinguishing between similar or like objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and any variations thereof herein are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
In the related art, a user may trigger a voice interaction function of an electronic device through a preset wake-up word corresponding to the electronic device. For example, after the user speaks a preset wake-up word such as "small-focus" or "hello, small-focus", the electronic device responds to the wake-up audio corresponding to the preset wake-up word and triggers the voice interaction function.
After the voice interaction function is triggered, the electronic device may output a prompt voice to inform the user that the voice interaction function has been triggered and wait for the user to input a further voice instruction; after receiving the voice instruction input by the user, the electronic device responds to the voice instruction and performs the corresponding operation. For example, after the voice interaction function is triggered, the electronic device may output prompt audio such as "please speak, I am listening" or "I am listening", and after hearing the prompt audio the user may input a voice instruction such as "please play a video" or "please play a song", so that the electronic device, upon receiving the voice instruction, performs the corresponding operation such as playing the video or the song.
It can be understood that the wake-up module of an existing voice interaction function is limited by the computing resources of the electronic device and generally cannot achieve an ideal wake-up effect, that is, false wake-up occurs. For example, if the preset wake-up word of the electronic device is "small-focus" and a user chats beside the electronic device, and the chat contains "small-focus" or words similar to the preset wake-up word, the electronic device may identify that content as wake-up audio; the electronic device may also identify environmental noise as wake-up audio. The voice interaction function is thereby triggered by mistake, resulting in a poor user experience.
Therefore, in the related art, information about false wake-up audio is collected and reported so that the cause of the false wake-up can be analyzed and determined based on that information, and the voice recognition function of the electronic device can be improved. However, collecting information about false wake-up audio usually depends on approaches such as active reporting by users, customer service inquiries, and laboratory tests, so the amount of information obtained about false wake-up audio is usually small and the information usually lags, making the statistics on false wake-up audio inefficient and adversely affecting the improvement of the voice recognition function. Therefore, the related art for determining and reporting false wake-up audio needs further improvement.
The method for determining the false wake-up audio provided by the invention is described in detail below with reference to the accompanying drawings.
The electronic device provided in the embodiment of the invention may have various implementation forms, for example, may be a mobile terminal, a tablet computer, a notebook computer, a television, an electronic whiteboard (electronic bulletin board), an electronic desktop (electronic table), and the like, and the specific form of the electronic device is not limited in the embodiment of the invention.
Fig. 1 shows an interaction schematic diagram of an electronic device and a control device according to an embodiment of the present invention. As shown in fig. 1, a user may operate the electronic device 200 through the mobile terminal 300 or the control device 100. The control device 100 may be a remote controller, and the remote controller and the electronic device 200 may communicate through an infrared protocol or a Bluetooth protocol, or the remote controller may control the electronic device 200 in another wireless or wired manner.
The user may control the electronic device 200 by inputting user instructions through keys on a remote control, voice input, a control panel, etc. For example, the user may control the electronic device 200 to switch the displayed page through up-down keys on the remote controller, control the video played by the electronic device 200 to play or pause through play pause keys, and input a voice command through a voice input key to control the electronic device 200 to perform a corresponding operation.
In some embodiments, the user may also control the electronic device 200 using a mobile terminal, a tablet computer, a computer, a notebook computer, and other smart devices. For example, a user may control the electronic device 200 through an application installed on the smart device that, by configuration, may provide the user with various controls in an intuitive user interface on a screen associated with the smart device.
In some embodiments, the mobile terminal 300 may implement connection communication with a software application installed on the electronic device 200 through a network communication protocol for the purpose of one-to-one control operation and data communication. For example, a control command protocol may be established between the mobile terminal 300 and the electronic device 200, a remote control keyboard may be synchronized with the mobile terminal 300, a function of controlling the electronic device 200 may be implemented by controlling a user interface on the mobile terminal 300, or a function of transmitting content displayed on the mobile terminal 300 to the electronic device 200, and a synchronous display may be implemented.
As shown in fig. 1, the electronic device 200 and the server 400 may communicate data via a variety of communication means, which may allow the electronic device 200 to be communicatively coupled via a local area network (Local Area Network, LAN), a wireless local area network (Wireless Local Area Network, WLAN), and other networks. The server 400 may provide various content and interactions to the electronic device 200. For example, the electronic device 200 may receive software program updates or access a remotely stored digital media library by sending and receiving messages and Electronic Program Guide (EPG) interactions. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.
The electronic device 200 may be a liquid crystal display, an Organic Light-Emitting Diode (OLED) display, a projection electronic device, a smart terminal, such as a mobile phone, a tablet computer, a smart television, a laser projection device, an electronic desktop (electronic table), etc. The specific electronic device type, size, resolution, etc. are not limited.
Fig. 2 shows a block diagram of a configuration of the control device 100 in an exemplary embodiment of the present invention. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control device 100 may receive an operation instruction input by a user and convert the operation instruction into an instruction that the electronic device 200 can recognize and respond to, thereby mediating interaction between the user and the electronic device 200.
Taking an electronic device as an example, fig. 3 shows a hardware configuration block diagram of an electronic device 200 according to an embodiment of the present invention. As shown in fig. 3, the electronic device 200 includes: a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, and at least one of a memory, a power supply, and a user interface.
The modem 210 may receive broadcast television signals through a wired or wireless reception manner and demodulate an audio/video signal, such as an EPG data signal, from a plurality of wireless or wired broadcast television signals. The detector 230 may be used to collect signals of the external environment or interaction with the outside.
In some embodiments, the frequency point demodulated by the modem 210 is controlled by the controller 250, and the controller 250 may issue a control signal according to the user selection, so that the modem 210 responds to the television signal frequency selected by the user and demodulates the television signal carried at that frequency.
Broadcast television signals may be classified into terrestrial broadcast signals, cable broadcast signals, satellite broadcast signals, internet broadcast signals, and the like according to the broadcasting system of the television signal; into digital modulation signals, analog modulation signals, and the like according to the type of modulation; and into digital signals, analog signals, and the like according to the type of signal.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
In some embodiments, communicator 220 may be a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator 220 may include at least one of a Wi-Fi chip, a Bluetooth communication protocol chip, a wired Ethernet communication protocol chip or other network communication protocol chip or a near field communication protocol chip, and an infrared receiver.
In some embodiments, the detector 230 may be used to collect signals of or interact with the external environment, may include an optical receiver and a temperature sensor, etc.
The light receiver is a sensor that may be used to acquire the intensity of ambient light, so that display parameters and the like can be adaptively adjusted according to the ambient light intensity. The temperature sensor may be used to sense the ambient temperature, so that the electronic device 200 can adaptively adjust the display color temperature of the image; for example, the image displayed by the electronic device 200 may be adjusted toward a colder color temperature when the ambient temperature is high, or toward a warmer color temperature when the ambient temperature is low.
In some embodiments, the detector 230 may further include an image collector, such as a camera, a video camera, etc., which may be used to collect external environmental scenes, collect attributes of a user or interact with a user, adaptively change display parameters, and recognize a user gesture to realize an interaction function with the user.
In some embodiments, the detector 230 may also include a sound collector, such as a microphone, that may be used to receive the user's voice. For example, it may receive a voice signal containing a control instruction from the user for controlling the electronic device 200, or collect ambient sound for identifying the ambient scene type, so that the electronic device 200 can adapt to ambient noise.
In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of the following interfaces: a high-definition multimedia interface (High Definition Multimedia Interface, HDMI), an analog or data high-definition component input interface, a composite video input interface, a universal serial bus (Universal Serial Bus, USB) input interface, an RGB port, and the like, or a composite input/output interface formed by a plurality of the above interfaces.
As shown in fig. 3, the controller 250 may include at least one of a central processor, a video processor, an audio processor, a graphic processor, a random access Memory (Random Access Memory, RAM), a Read-Only Memory (ROM), and a first interface to an nth interface for input/output. Wherein the communication bus connects the various components.
In some embodiments, the controller 250 may control the operation of the electronic device and respond to user operations through various software control programs stored on an external memory. For example, a user may input a user command through a graphical user interface (Graphic User Interface, GUI) displayed on the display 260, the user input interface receives the user input command through the graphical user interface, or the user may input the user command by inputting a specific sound or gesture, the user input interface recognizes the sound or gesture through the sensor, and receives the user input command.
A "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a user-acceptable form. A commonly used presentation form of a user interface is a graphical user interface, which refers to a user interface related to computer operations that is displayed in a graphical manner. The control can comprise at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget (short for Widget) and other visual interface elements.
In some embodiments, RAM may be used to store temporary data for the operating system or other on-the-fly programs; ROM may be used to store instructions for various system starts, for example, may be used to store instructions for a basic input output system, referred to as a basic input output system (Basic Input Output System, BIOS) start. ROM can be used to complete the power-on self-test of the system, the initialization of each functional module in the system, the driving program of the basic input/output of the system and the booting of the operating system.
In some embodiments, upon receipt of the power-on signal, the electronic device 200 power begins to boot, and the central processor runs system boot instructions in ROM, copying temporary data of the operating system stored in memory into RAM for booting or running the operating system. When the starting of the operating system is completed, the CPU copies the temporary data of various application programs in the memory into the RAM, and then the temporary data are convenient for starting or running the various application programs.
In some embodiments, the central processor may be configured to execute operating system and application instructions stored in memory, and to execute various applications, data, and content in accordance with various interactive instructions received from external inputs, to ultimately display and play various audio-visual content.
In some example embodiments, the central processor may include a plurality of processors. The plurality of processors may include one main processor and one or more sub-processors. The main processor is configured to perform some operations of the electronic device 200 in a pre-power-up mode and/or to display a picture in a normal mode. The one or more sub-processors are configured to perform operations in a standby mode or the like.
In some embodiments, the video processor may be configured to receive an external video signal and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, transparency setting, and image composition in accordance with the standard codec protocol of the input signal, to obtain a signal that can be directly displayed or played on the electronic device 200.
In some embodiments, the video processor may include a demultiplexing module, a video decoding module, an image compositing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module is configured to demultiplex an input audio/video data stream, such as an MPEG-2 (Moving Picture Experts Group-2) stream, into a video signal, an audio signal, and the like. The video decoding module is configured to process the demultiplexed video signal, including decoding, scaling, transparency setting, and the like.
The image composition module, such as an image synthesizer, is configured to superimpose and mix the GUI signal that is input by the user or generated by a graphics generator with the scaled video image, to generate an image signal for display. The frame rate conversion module is configured to convert the frame rate of the input video, for example converting a 60 Hz frame rate into a 120 Hz or 240 Hz frame rate, usually by frame interpolation. The display formatting module is configured to convert the frame-rate-converted video into a video output signal conforming to the display format, such as an RGB data signal.
In some embodiments, the audio processor may be configured to receive an external audio signal, decompress and decode the audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain a sound signal that may be played in a speaker.
In some embodiments, the video processor may comprise one or more chips. The audio processor may also comprise one or more chips. Meanwhile, the video processor and the audio processor may be a single chip, or may be integrated with the controller in one or more chips.
In some embodiments, the interface for input/output may be used for audio output, that is, to receive, under the control of the controller 250, the sound signal output by the audio processor and output it to a sound output device such as a speaker. In addition to the speaker carried by the electronic device 200 itself, the sound signal may also be output to a sound output terminal of an external device, for example an external sound interface or an earphone interface. The audio output may also include a near field communication module in the communication interface, for example a Bluetooth module for outputting sound through a speaker connected via Bluetooth.
In some embodiments, the graphics processor may be used to generate various graphical objects, such as icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor may include an arithmetic unit that performs operations by receiving the various interactive instructions input by the user and displays various objects according to their display attributes, and a renderer that renders the objects produced by the arithmetic unit for display on a display.
In some embodiments, the graphics processor and the video processor may be integrated or may be configured separately. An integrated configuration may perform processing of the graphics signals output to the display, while a separate configuration may perform different functions respectively, such as a Graphics Processing Unit (GPU) + Frame Rate Conversion (FRC) architecture.
The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the display 260 may be used to display a user interface, such as may be used to display an interface corresponding to an electronic device, for example, the display interface may be a channel search interface in an electronic device, or may also be a display interface of some application program, etc.
In some embodiments, the display 260 may be used to receive audio and video signals output by the audio processor and video processor, display video content and images, play audio of the video content, and display components of a menu manipulation interface.
In some embodiments, the display 260 may be used to present a user-operated UI interface generated in the electronic device 200 and used to control the electronic device 200.
In some embodiments, the electronic device 200 may transmit and receive control signals and data signals to and from the control device 100 or a content providing device through the communicator 220.
In some embodiments, the memory may include storage of various software modules for driving the electronic device 200. Such as: various software modules stored in the first memory, including: at least one of a basic module, a detection module, a communication module, a display control module, a browser module, various service modules and the like.
The basic module is a bottom-layer software module for signal communication among the hardware of the electronic device 200 and for sending processing and control signals to upper-layer modules. The detection module is configured to collect various information from sensors or the user input interface and perform digital-to-analog conversion and analysis management.
The display control module may be used to control the display to display image content and to play information such as multimedia image content and UI interfaces. The communication module may be used for control and data communication with external devices. The browser module may be used to perform data communication with browsing servers. The service module is used to provide various services and application programs. Meanwhile, the memory may also store images of various items in various user interfaces, visual effect patterns of a focus object, and the like, as well as received external data and user data.
In some embodiments, the user interface may be used to receive a signal from the control device 100, for example an infrared control signal transmitted by an infrared remote controller.
The power supply may supply power to the electronic device 200 through power input from an external power source under the control of the controller 250.
In some embodiments, the electronic device 200 may receive a query instruction entered by a user through the communicator 220. For example, when communicator 220 is a touch component, the touch component may together with display 260 form a touch screen. On the touch screen, a user can input different control instructions through touch operation, for example, the user can input touch instructions such as clicking, sliding, long pressing, double clicking and the like, and different touch instructions can represent different control functions.
To implement the different touch actions, the touch assembly may generate different electrical signals when the user inputs the different touch actions, and transmit the generated electrical signals to the controller 250. The controller 250 may perform feature extraction on the received electrical signal to determine a control function to be performed by the user based on the extracted features.
For example, when a user inputs a click touch action at a search location in the display interface, the touch component will sense the touch action to generate an electrical signal. After receiving the electrical signal, the controller 250 may determine the duration of the level corresponding to the touch action in the electrical signal, and recognize that the user inputs the click command when the duration is less than the preset time threshold. The controller 250 then extracts the location features generated by the electrical signals to determine the touch location. When the touch position is within the search position range, it is determined that the user has input a click touch instruction at the search position. Then, the controller 250 may start a media search function and receive a search instruction input by the user, such as a search keyword, a voice search instruction, etc.
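For illustration only, the click-recognition logic described above may be sketched as follows; the function, field names, and the specific time threshold are assumptions introduced for this example and do not limit the embodiments.

```python
# Illustrative sketch of the click recognition described above.
# Field names and the time threshold are assumptions, not a fixed implementation.

CLICK_MAX_DURATION_MS = 200          # preset time threshold for a click

def recognize_touch(event, search_region):
    """Return the control function implied by a touch electrical signal."""
    # Duration of the level corresponding to the touch action
    duration_ms = event["level_end_ms"] - event["level_start_ms"]
    if duration_ms >= CLICK_MAX_DURATION_MS:
        return None                              # not a click (e.g. a long press)

    # Location feature extracted from the electrical signal
    x, y = event["x"], event["y"]
    left, top, right, bottom = search_region
    if left <= x <= right and top <= y <= bottom:
        return "start_media_search"              # click within the search position
    return "click_outside_search"
```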
In some embodiments, the user may trigger the query operation through a specific gesture operation on the touch screen, for example, when the user performs two continuous double-click operations on the display interface, the controller 250 may determine an interval time between two continuous double-clicks, and when the interval time is less than a preset time threshold, recognize that the user inputs the continuous double-click operation, and determine that the user triggers the media resource search operation.
In some embodiments, a user may enter voice instructions on a touch screen via a touch operation, such as a user may trigger a voice query operation on display 260 via a voice-triggered gesture.
In some embodiments, the communicator 220 may also be an external control component, such as a mouse, remote control, or the like, which may establish a communication connection with an electronic device. When the user performs different control operations on the external control component, the external control component may generate different control signals in response to the control operations of the user and transmit the generated control signals to the controller 250. The controller 250 may perform feature extraction on the received control signal to determine a control function to be performed by the user according to the extracted features.
For example, when a user clicks the left mouse button at any position in the channel display interface through the external control component, the external control component senses the control action and generates a control signal. After receiving the control signal, the controller 250 may determine, according to the control signal, the dwell time of the action at that position, and identify that a click command has been input by the user through the external control component when the dwell time is less than a preset time threshold. The click command is used to trigger the input function of the query instruction or to switch the media resource page in the current scenario.
For another example, when the user presses a voice key on the remote control, the remote control may initiate a voice entry function, and during the process of the user entering a voice command, the remote control may synchronize the voice command to the display 260, at which time the display 260 may display a voice entry identifier to indicate that the user is entering a voice command.
In some embodiments, the communicator 220 may also be a control component coupled to the display 260; for example, when the display 260 is used with a desktop computer, the control component may be a keyboard coupled to the display. The user can input different control instructions through the keyboard, such as a media information switching instruction, a query instruction, and the like.
Illustratively, the user may input a click command, a voice command, etc. through the corresponding shortcut key. For example, the user may trigger the sliding operation by selecting the "Tab" key and the direction key, that is, when the user selects the "Tab" key and the direction key on the keyboard at the same time, the controller 250 may receive the key signal, determine that the user triggers the operation of performing the switching operation in the direction corresponding to the direction key, and then, the controller 250 may control to turn or scroll the display interface in the media presentation page to display the corresponding media options.
Correspondingly, the user can also input voice instructions through corresponding shortcut keys. For example, when the user selects the "Ctrl" key and the "V" key, the controller 250 may receive a key signal to determine that the user triggers a voice search operation, and then the controller 250 may receive a voice command input by the user and control the display 260 to perform a corresponding operation, such as displaying a query result page corresponding to the voice command, according to the voice command.
In order to facilitate the detailed description of the method for determining the false wake-up audio provided by the embodiment of the present invention, fig. 4 shows a flowchart of a method for determining the false wake-up audio provided by the embodiment of the present invention, and the method may be applied to the electronic device 200 shown in fig. 1.
Among other things, the electronic device 200 may include a communicator 220 and a controller 250 coupled with the communicator 220.
In some embodiments, the communicator 220 may be configured to receive wake-up audio and audio to be detected, and the controller 250 may perform a wake-up operation in response to the wake-up audio and control the communicator 220 to receive the audio to be detected.
According to the method for determining false wake-up audio provided by the embodiment of the present invention, whether a target history detection text matching the currently acquired text to be detected exists can be determined based on a history detection text database; if the target history detection text exists and the target access times of the target history detection text are greater than a target preset access threshold, context information of the audio to be detected is further detected; and if the context audio corresponding to the audio to be detected includes preset instruction audio, or the user triggers a preset operation on the electronic device after the audio to be detected is acquired, the wake-up audio can be determined to be false wake-up audio.
By applying the technical solution of the present invention, after the electronic device acquires the wake-up audio and the audio to be detected at the current moment, whether the wake-up audio is false wake-up audio can be determined based on the historical statistical data (i.e., the history detection text database) and the audio to be detected. This reduces the resource consumption of the electronic device when identifying false wake-up audio, allows sufficient false wake-up data to be acquired, and ensures the timeliness and validity of the false wake-up data.
As shown in fig. 4, the controller 250 is configured to perform the following steps S410 to S440:
S410: in response to the wake-up audio, a wake-up operation is performed and audio to be detected is received.
In some embodiments, the electronic device may store information related to when the user uses the voice interaction function of the electronic device and send the information to a server that establishes a communication connection with the electronic device. The related information when the user uses the voice interaction function may include wake-up audio information, detection audio information, link log information, electronic device information, user portrait information, and the like.
The electronic device may record a voice interaction between the user and the electronic device as a session and determine identification information (also referred to as an identification number, or sessionid) of the session; the wake-up audio information, detection audio information, and link log information corresponding to the same session carry the same identification information.
The contents included in the wake-up audio information, the detection audio information, the link log information, the electronic device information, and the user portrait information may be shown in table 1:
TABLE 1
Illustratively, as shown in Table 1, the wake-up audio information may include identification information sessionid corresponding to the session, a number of the wake-up audio, a time stamp (e.g., a time at which the wake-up audio was received by the electronic device), a storage location of the wake-up audio in the electronic device, and so forth. Based on the identification information sessionid in the wake-up audio information and the storage location of the wake-up audio in the electronic device, the corresponding wake-up audio may be obtained.
The detected audio information may include identification information sessionid corresponding to the session, a detected audio number, a time stamp (e.g., the time the detected audio was received by the electronic device), a storage location of the detected audio in the electronic device, and so forth. Based on the identification information sessionid in the detected audio information and the storage location of the detected audio in the electronic device, corresponding detected audio may be obtained. The detected audio includes audio to be detected and historical detected audio.
Here, detected audio refers to audio acquired after the electronic device receives and responds to the wake-up audio. It will be appreciated that, since the wake-up audio may be correct wake-up audio or false wake-up audio, the detected audio may be instruction audio (for example, an instruction input by the user to the electronic device, such as playing a video or querying a message) or non-instruction audio (for example, audio played by the electronic device or another device, content spoken by the user that is unrelated to the voice interaction, ambient noise, and the like). The audio to be detected is the detected audio acquired at the current time, and the historical detected audio is detected audio acquired before the current time.
The link log information may include identification information sessionid corresponding to the session, an encoding of the electronic device, a time stamp (e.g., time the electronic device received wake-up audio or detected audio), detected text, semantic recognition results for the detected text, event encoding, event description, and foreground application information, etc.
The detection text is a text recognition result obtained after the voice recognition processing is carried out on the detection audio.
The events include instructions received by the electronic device during a period of power-on operation or operations triggered by a user, such as the television programs watched and the applications opened by the user during each period, the time periods during which the user uses the voice interaction function, and the related operations performed by the user during voice interaction with the electronic device. User portrait information regarding the voice interaction function may be generated based on the event descriptions, such as the high-frequency scenarios in which the user uses the voice interaction function, the high-frequency time periods in which the user uses the voice interaction function, and the service area in which the user uses the voice interaction function.
Wherein the foreground application information includes information about applications opened by the user or television programs watched during each period, for example, the user watched morning news at 8 am; or in the process of voice interaction, a certain media resource playing application is opened in the electronic equipment, and the media resource played by the application is a certain television play, and the like.
The device basic information may include a uuid corresponding to the electronic device, a model number of the electronic device, a chip type of the electronic device, version information of the electronic device, and the like.
The user information may include uuid corresponding to the electronic device, user portrait information (or user behavior preferences), and so forth. Wherein the user profile information may be generated based on the link log information.
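For illustration only, the per-session records described above (and summarized in Table 1) may be organized along the lines of the following sketch; all field names are assumptions and do not limit the embodiments.

```python
# Illustrative sketch of the per-session records described above and in Table 1.
# All field names are assumptions; the actual storage format is not specified here.
from dataclasses import dataclass

@dataclass
class WakeupAudioInfo:
    sessionid: str           # identification information of the session
    number: int              # number of the wake-up audio
    timestamp: float         # time the wake-up audio was received
    storage_path: str        # storage location of the wake-up audio on the device

@dataclass
class DetectedAudioInfo:
    sessionid: str
    number: int
    timestamp: float
    storage_path: str        # storage location of the detected audio on the device

@dataclass
class LinkLogEntry:
    sessionid: str
    device_code: str         # encoding of the electronic device
    timestamp: float
    detected_text: str       # speech recognition result of the detected audio
    semantics: str           # semantic recognition result for the detected text
    event_code: str
    event_description: str
    foreground_app: str      # foreground application information
```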
In some embodiments, as shown in fig. 5, the controller 250 is further configured to perform the following steps S510-S520:
S510: and acquiring a plurality of history detection texts corresponding to the time periods of the plurality of users in at least one period.
In some embodiments, the electronic device may build a history detection text database based on history detection text (e.g., history detection text in the link log information described above). The history detection texts are obtained by performing voice recognition processing on the history detection audios.
In some examples, link log information of other electronic devices corresponding to N days (i.e., one or more periods) may be obtained, and preferred history detection texts whose access times are greater than a preset threshold may be selected from the history detection texts in each day's link log information and stored in a data table (e.g., a preferred result list). N is a positive integer greater than or equal to 1, and the specific value of N is not limited in the embodiment of the present invention; it may be, for example, 1 day, 4 days, or 7 days. The link log information corresponding to the other electronic devices may be obtained by the electronic device from a server, which is not limited in the embodiment of the present invention.
Illustratively, take the history detection texts in Tables 2 and 3 as an example. Table 2 shows the history detection texts corresponding to the historical detected audio received by a plurality of electronic devices on February 3, 2024; Table 3 shows the history detection texts corresponding to the historical detected audio received by a plurality of electronic devices on February 4, 2024; and Table 4 is the preferred result list.
TABLE 2

| No. | History detection text | Number of accesses | Date |
|---|---|---|---|
| 1 | Cartoon | 23030 | 2024-02-03 |
| 2 | Movie | 21566 | 2024-02-03 |
| 3 | Play | 19500 | 2024-02-03 |
| 4 | Weather forecast | 18500 | 2024-02-03 |
| 5 | The spring festival brings our reporter to the mind | 15700 | 2024-02-03 |
| 6 | Shutdown | 14005 | 2024-02-03 |
TABLE 3

| No. | History detection text | Number of accesses | Date |
|---|---|---|---|
| 1 | Play | 24130 | 2024-02-04 |
| 2 | Cartoon | 22208 | 2024-02-04 |
| 3 | Shutdown | 18705 | 2024-02-04 |
| 4 | Gala evening | 15983 | 2024-02-04 |
| 5 | Movie | 12050 | 2024-02-04 |
| 6 | Music | 10028 | 2024-02-04 |
TABLE 4

| No. | History detection text | Number of accesses | Date |
|---|---|---|---|
| 1 | Cartoon | 23030 | 2024-02-03 |
| 2 | Movie | 21566 | 2024-02-03 |
| 3 | Play | 19500 | 2024-02-03 |
| 4 | Weather forecast | 18500 | 2024-02-03 |
| 5 | The spring festival brings our reporter to the mind | 15700 | 2024-02-03 |
| 6 | Shutdown | 14005 | 2024-02-03 |
| 7 | Play | 24130 | 2024-02-04 |
| 8 | Cartoon | 22208 | 2024-02-04 |
| 9 | Shutdown | 18705 | 2024-02-04 |
| 10 | Gala evening | 15983 | 2024-02-04 |
Taking a preset threshold of 14000 as an example, the access times of the history detection texts numbered 1 to 6 in Table 2 are all greater than the preset threshold of 14000, so the history detection texts numbered 1 to 6 can be used as the preferred history detection texts for that day and stored in the preferred result list shown in Table 4. The access times of the history detection texts numbered 1 to 4 in Table 3 are greater than the preset threshold of 14000, so the history detection texts numbered 1 to 4 may be selected as the preferred history detection texts for that day and stored in the preferred result list shown in Table 4.
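For illustration only, the per-day filtering into the preferred result list may be sketched as follows; the tuple layout of the input rows and the helper name are assumptions introduced for this example.

```python
# Sketch: select the per-day preferred history detection texts whose access times
# exceed the preset threshold and append them to the preferred result list.
# The (text, access_times, date) row layout is an assumption for illustration.

PRESET_THRESHOLD = 14000

def build_preferred_result_list(daily_tables):
    """daily_tables: list of per-day lists of (text, access_times, date) rows."""
    preferred = []
    for day_rows in daily_tables:
        for text, access_times, date in day_rows:
            if access_times > PRESET_THRESHOLD:
                preferred.append({"text": text,
                                  "access_times": access_times,
                                  "date": date})
    return preferred

# Rows of Tables 2 and 3 above:
day1 = [("Cartoon", 23030, "2024-02-03"), ("Movie", 21566, "2024-02-03"),
        ("Play", 19500, "2024-02-03"), ("Weather forecast", 18500, "2024-02-03"),
        ("The spring festival brings our reporter to the mind", 15700, "2024-02-03"),
        ("Shutdown", 14005, "2024-02-03")]
day2 = [("Play", 24130, "2024-02-04"), ("Cartoon", 22208, "2024-02-04"),
        ("Shutdown", 18705, "2024-02-04"), ("Gala evening", 15983, "2024-02-04"),
        ("Movie", 12050, "2024-02-04"), ("Music", 10028, "2024-02-04")]
print(len(build_preferred_result_list([day1, day2])))    # 10 entries, as in Table 4
```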
S520: a history detection text database is determined based on a plurality of history detection texts corresponding to each time period.
In some embodiments, after obtaining the preferred result list, the electronic device may determine, for each time period, the number of accesses of the plurality of history detection texts corresponding to each day in the preferred result list, so as to determine the history detection text database. The length of each time period may be 1 minute, 3 minutes, 5 minutes, or the like; the embodiment of the present invention does not limit the specific length of each time period.
In some examples, a hot interval in which the user uses the electronic device with high frequency, for example 11:00 to 13:00 or 17:00 to 23:00, may be selected, and the number of accesses of the plurality of history detection texts corresponding to each day may be obtained for each time period within the hot interval.
Illustratively, Table 5 is one possible history detection text database, taking as an example the number of accesses of a plurality of history detection texts in each time period within the above hot interval on February 3, 2024. Here, "whether high frequency" indicates whether the history detection text occurs at high frequency in a certain time period relative to the total number of accesses of that day (i.e., whether its number of accesses is greater than or equal to a preset high-frequency access threshold).
For example, if the preset high-frequency access threshold is 0.5% of the total number of accesses of the day, then for the time period 11:50-11:51 the number of accesses of the history detection text "Play" is 603. Compared with the number of accesses of all the history detection texts of that day (for example, 23030+21566+19500+18500+15700+14005=112301), the number of accesses in this time period is greater than the preset high-frequency access threshold (112301 × 0.5% ≈ 561). Therefore, the history detection text "Play" is a text that occurs at high frequency in the time period 11:50-11:51.
TABLE 5
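For illustration only, the high-frequency determination described above may be sketched as follows; the 0.5% ratio follows the example, and the function name is an assumption.

```python
# Sketch: decide whether a history detection text occurs at high frequency in a
# given time period, relative to the total number of accesses of that day.

HIGH_FREQ_RATIO = 0.005    # preset high-frequency access threshold: 0.5% of the day

def is_high_frequency(access_times_in_period, total_access_times_of_day):
    threshold = total_access_times_of_day * HIGH_FREQ_RATIO
    return access_times_in_period >= threshold

total_day = 23030 + 21566 + 19500 + 18500 + 15700 + 14005    # 112301
print(is_high_frequency(603, total_day))    # True: 603 > 112301 * 0.5% ≈ 561
```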
In some embodiments, after the history detection text database is determined, the preset access threshold corresponding to each time period may be determined based on the number of accesses of the plurality of history detection texts corresponding to that time period within at least one period (i.e., within the N days). For example, the mean and variance of the text access times corresponding to each time period may first be determined for each day; the preset access threshold corresponding to each time period is then determined based on the means and variances corresponding to that time period over the N days.
For example, for the time period 17:31-17:32 on February 3, 2024, the text access times corresponding to this time period are 1188 in total and involve two history detection texts, "Play" and "Cartoon". The mean and variance of the text access times corresponding to this time period on February 3, 2024 can be determined based on the access times and the number of history detection texts, and the means and variances corresponding to this time period on other days can be obtained in the same way.
Based on the means and variances of the text access times corresponding to this time period over the N days (for example, 7 days), the preset access threshold corresponding to this time period can be obtained; for example, the mean of the 7 daily means and the mean of the 7 daily variances may be taken, and the preset access threshold determined from them. The preset access thresholds corresponding to the other time periods can be obtained in the same way and are not described in detail here.
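For illustration only, one possible way to compute the preset access threshold from the per-day means and variances is sketched below; combining them as the mean plus the square root of the averaged variance is only an assumption, since the embodiment does not fix the exact combination.

```python
# Sketch: derive the preset access threshold for one time period from the per-day
# means and variances over N days. Using mean + sqrt(averaged variance) is only
# one possible combination and is an assumption made for this example.
from math import sqrt
from statistics import mean, pvariance

def preset_access_threshold(daily_access_times_per_text):
    """daily_access_times_per_text: one list per day, each containing the access
    times of the individual history detection texts in this time period."""
    daily_means = [mean(day) for day in daily_access_times_per_text]
    daily_vars = [pvariance(day) for day in daily_access_times_per_text]
    avg_mean = mean(daily_means)       # mean of the N daily means
    avg_var = mean(daily_vars)         # mean of the N daily variances
    return avg_mean + sqrt(avg_var)

# Example: time period 17:31-17:32, two texts ("Play", "Cartoon") per day, 7 days.
seven_days = [[603, 585], [590, 610], [620, 575], [600, 600],
              [640, 560], [580, 615], [605, 595]]
print(round(preset_access_threshold(seven_days)))
```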
In some embodiments, after determining the history detection text database and the preset access threshold corresponding to each time period, when the electronic device receives the wake-up audio at the current time, the electronic device may respond to the wake-up audio and perform a wake-up operation. For example, the electronic device may output relevant alert audio to the user after receiving the wake-up audio to alert the user that the voice interactive function has been turned on and wait to receive voice instructions from the user.
It will be appreciated that the wake-up audio may be correct wake-up audio, or may be false wake-up audio such as audio played by the electronic device or another device, content spoken by the user that is unrelated to the voice interaction, or ambient noise.
Referring to the voice interaction scene diagrams shown in fig. 6A-6D, take the preset wake-up word of the electronic device 200 being "small-focus" as an example. As shown in fig. 6A, if the wake-up audio 601 input by the user is "small-focus", the electronic device 200 may respond to the wake-up audio and output a related prompt; for example, a prompt control of "Hello, I am listening" may be displayed in the user interface 602 of the electronic device 200, or a voice message of "Hello, I am listening" may be output. As shown in fig. 6B, if the chat content of the user, or the audio being played by the electronic device or another device, contains the preset wake-up word "small-focus" (e.g., the false wake-up audio 604), the electronic device 200 may still respond to that wake-up audio.
S420: and performing voice recognition processing on the audio to be detected to obtain a text to be detected corresponding to the audio to be detected.
In some embodiments, if the electronic device receives a piece of audio (i.e., audio to be detected) after performing the wake-up operation, the audio to be detected may be subjected to a voice recognition process, and a text to be detected corresponding to the audio to be detected is obtained.
It will be appreciated that, if the electronic device receives correct wake-up audio, the audio to be detected is more likely to be a voice instruction input by the user, such as playing a certain video, playing a certain television program, powering off, or pausing. If the electronic device receives false wake-up audio, the audio to be detected is more likely to be chat audio of the user, audio being played by the electronic device or another device, and the like, that is, audio unrelated to the voice interaction process.
In some embodiments, the electronic device may search the history detection text database for whether there is a target history detection text that matches the text to be detected. If there is a target history detection text matching the above-mentioned text to be detected, S430 is performed. The history detection text database comprises a plurality of history detection texts, and the plurality of history detection texts comprise target history detection texts.
The condition that the text to be detected matches the target history detection text may include: the text to be detected exactly matches the target history detection text, or the matching degree between the text to be detected and the target history detection text is greater than a certain matching-degree threshold.
The method for determining whether the text to be detected matches a history detection text in the history detection text database may use a similarity algorithm, a similarity matching model, or the like. For example, the similarity matching model may include a Cross-encoder model, a Bi-encoder model, a Late Interaction model, an Attention-based Aggregator model, and the like; the similarity algorithm may include the Jaccard similarity coefficient, the Euclidean distance, the Manhattan distance, the cosine of the included angle, the Tanimoto coefficient, and the like.
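For illustration only, a character-level Jaccard match, one of the similarity algorithms listed above, may be sketched as follows; the 0.8 matching-degree threshold is an assumption.

```python
# Sketch: match the text to be detected against the history detection text
# database using the Jaccard similarity coefficient. The 0.8 matching-degree
# threshold is an assumption; an exact match yields a similarity of 1.0.

MATCH_THRESHOLD = 0.8

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a), set(b)              # character-level sets
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def find_target_history_text(text_to_detect, history_texts):
    """Return the best-matching target history detection text, or None."""
    best, best_score = None, 0.0
    for h in history_texts:
        score = jaccard(text_to_detect, h)
        if score > best_score:
            best, best_score = h, score
    return best if best_score >= MATCH_THRESHOLD else None
```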
S430: if the history detection text database contains a target history detection text matched with the text to be detected, and the target access times of the target history detection text are larger than a target preset access threshold, detecting context information of the audio to be detected.
In some embodiments, if there is a target history detection text in the history detection text database that matches the text to be detected, it may then be determined whether the target access times of the target history detection text are greater than the target preset access threshold. If the target access times of the target history detection text are greater than the target preset access threshold, the wake-up audio corresponding to the audio to be detected can be preliminarily determined to be false wake-up audio, and context information detection is then performed on the audio to be detected to further determine whether the wake-up audio is false wake-up audio.
In some embodiments, the target time period and the target access number corresponding to the target history detection text may be determined based on the history detection text database, and whether the target access number is greater than the target preset access threshold may be determined according to the target preset access threshold corresponding to the target time period.
Taking the text to be detected as "spring festival bringing our reporter" as an example, it can be determined that the target history detection text matched with the text to be detected is "spring festival bringing our reporter" in the history detection text database shown in the above table 5; the target time period corresponding to the target history detection text is 21:23-21:24, and the target access times are 300. If the target preset access threshold corresponding to the target time period is 21:23-21:24 and 205 times, the target access times are larger than the target preset access threshold.
In some embodiments, in the case where there is a target history detection text that matches the text to be detected, it may be determined whether the target access number of the target history detection text is less than a preset high-frequency threshold in addition to determining whether the target access number of the target history detection text is greater than a target preset access threshold.
If the access times of the detected text are larger than a preset high-frequency threshold value, the text corresponding to the instruction audio, which indicates that the high probability of the detected text is correct, is indicated; if the access times of the detected text are smaller than the preset high-frequency threshold value, the detected text is possibly the text acquired by waking up the audio by mistake.
And primarily determining wake-up audio corresponding to the text to be detected (audio to be detected) of which the target access times are larger than a target preset access threshold and smaller than a preset high-frequency threshold as false wake-up audio, namely primarily determining wake-up audio which is easier to cause false wake-up and wake-up audio corresponding to non-instruction audio as false wake-up audio.
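For illustration only, the preliminary determination combining the two checks may be sketched as follows; the database layout and the example threshold values are assumptions.

```python
# Sketch of the preliminary determination: the target access times must be greater
# than the target preset access threshold of the target time period and less than
# the preset high-frequency threshold. The database layout is an assumption.

def is_preliminary_false_wakeup(target_text, time_period, db,
                                preset_access_thresholds, high_freq_threshold):
    """db: {time_period: {history_text: access_times}};
    preset_access_thresholds: {time_period: threshold}."""
    target_access_times = db.get(time_period, {}).get(target_text, 0)
    target_threshold = preset_access_thresholds.get(time_period, float("inf"))
    return target_threshold < target_access_times < high_freq_threshold

# Values follow the example above (access times 300, preset threshold 205);
# the high-frequency threshold of 561 is illustrative.
db = {"21:23-21:24": {"the spring festival brings our reporter to the mind": 300}}
thresholds = {"21:23-21:24": 205}
print(is_preliminary_false_wakeup(
    "the spring festival brings our reporter to the mind",
    "21:23-21:24", db, thresholds, high_freq_threshold=561))    # True
```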
It will be appreciated that, as shown in fig. 6C, if the user inputs correct wake-up audio, the electronic device will with high probability receive further instruction audio 603 input by the user (for example, "play XXX"), that is, the audio to be detected is instruction audio. The electronic device then outputs a related prompt (e.g., "OK, playing it for you right away") in the user interface 602 to inform the user that the instruction audio has been successfully responded to, and performs the related function indicated by the instruction audio.
As shown in fig. 6D, if the wake-up audio is a false wake-up audio (e.g., the false wake-up audio 604 in fig. 6B), and the electronic device 200 responds to the audio to be detected, the electronic device 200 may output a related prompt message to inform the user that the currently acquired audio to be detected does not include a valid instruction; after learning the prompt information output by the electronic device, the user typically exits the voice interaction function through a voice command 605 (e.g., exit the voice interaction function) or the control device 100.
That is, in the case where the audio to be detected is a non-instruction audio, the context information corresponding to the audio to be detected may include that the user exits the voice interaction function through a voice instruction (i.e., a preset instruction audio), or triggers an operation of exiting the voice interaction function of the electronic device through a control device or the like (i.e., a preset operation), or the like.
S440: if the context audio corresponding to the audio to be detected comprises preset instruction audio or a user triggers preset operation on the electronic equipment after the audio to be detected is acquired, the wake-up audio is determined to be false wake-up audio.
In some embodiments, if the context audio corresponding to the audio to be detected includes the preset instruction audio, or the user triggers the preset operation on the electronic device after the audio to be detected is acquired, this indicates that the user does not currently need to use the voice interaction function. It can therefore be inferred that the voice interaction function was most likely woken up by mistake, so the previously received wake-up audio is false wake-up audio, that is, the wake-up audio is determined to be false wake-up audio.
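For illustration only, the context check of step S440 may be sketched as follows; the way context audio and user operations are represented, as well as the example values, are assumptions.

```python
# Sketch of step S440: the wake-up audio is determined to be false wake-up audio
# if the context audio contains preset instruction audio (e.g., "exit the voice
# interaction function") or the user triggers a preset operation (e.g., exiting
# through the control device). The event representation is an assumption.

PRESET_EXIT_INSTRUCTIONS = {"exit the voice interaction function"}
PRESET_EXIT_OPERATIONS = {"exit_via_control_device", "close_voice_ui"}

def is_false_wakeup(context_instruction_texts, user_operations):
    has_exit_instruction = any(t in PRESET_EXIT_INSTRUCTIONS
                               for t in context_instruction_texts)
    has_exit_operation = any(op in PRESET_EXIT_OPERATIONS
                             for op in user_operations)
    return has_exit_instruction or has_exit_operation
```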
In some embodiments, after determining wake-up audio to be false wake-up audio, the electronic device may store the identification information sessionid corresponding to each false wake-up audio. In the subsequent process of reporting false wake-up data, the electronic device may obtain, from the link log information according to the stored identification information, the false wake-up audio and the detection texts corresponding to the false wake-up audio. In addition, at least one piece of dimension information corresponding to each false wake-up audio may be obtained from the device basic information and the user information; the dimension information is used to determine the cause of the false wake-up audio and may include the region and language in which the false wake-up occurred, the electronic device model, the chip type, version information, and the like.
Then, the electronic device may send the false wake-up data (including the false wake-up audio, the detection texts corresponding to the false wake-up audio, and the at least one piece of dimension information corresponding to each false wake-up audio) to a terminal device, so that the terminal device or related personnel can determine the cause of the false wake-up audio based on the false wake-up audio, the corresponding texts to be detected, and the dimension information. Moreover, the false wake-up data can be used as negative samples for training the voice interaction model of the electronic device, which can effectively reduce the probability of false wake-up of the electronic device and improve the user experience.
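For illustration only, the reported false wake-up data might be assembled as follows; the field names are assumptions and no specific reporting interface is implied.

```python
# Sketch: assemble the false wake-up data to be reported in batches. Field names
# are assumptions for illustration; no specific reporting API is implied.

def build_false_wakeup_report(sessionids, link_logs, device_info, user_info):
    """Collect, for each stored sessionid, the false wake-up audio reference,
    its detection text, and the dimension information."""
    report = []
    for sid in sessionids:
        log = link_logs[sid]
        report.append({
            "sessionid": sid,
            "wakeup_audio_path": log["wakeup_audio_path"],
            "detected_text": log["detected_text"],
            "dimensions": {                       # used to analyze the cause
                "region": user_info.get("region"),
                "language": user_info.get("language"),
                "device_model": device_info.get("model"),
                "chip_type": device_info.get("chip_type"),
                "version": device_info.get("version"),
            },
        })
    return report
```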
By applying the technical solution of the present invention, after the electronic device acquires the wake-up audio and the audio to be detected at the current moment, whether the wake-up audio is false wake-up audio can be determined in real time based on the historical statistical data (i.e., the history detection text database) and the audio to be detected, which reduces the resource consumption of the electronic device when identifying false wake-up audio and increases the amount of false wake-up data acquired. After the false wake-up audio is determined, the false wake-up data (for example, the false wake-up audio, the detection texts corresponding to the false wake-up audio, and the at least one piece of dimension information corresponding to each false wake-up audio) can be reported in batches to the terminal device or the server, which ensures the timeliness and validity of the false wake-up data, saves a large number of individual reports of false wake-up problems, and allows developers to discover and handle false wake-up problems of the electronic device in a timely manner.
An embodiment of the present invention further provides an apparatus for determining false wake-up audio. Referring to fig. 7, the apparatus 700 for determining false wake-up audio may be applied to an electronic device and may include: a receiving module 710, a processing module 720, a detecting module 730, and a determining module 740.
The receiving module 710 is configured to receive wake-up audio and audio to be detected.
The processing module 720 is configured to perform voice recognition processing on the audio to be detected, so as to obtain a text to be detected corresponding to the audio to be detected.
The detecting module 730 is configured to perform context information detection on the audio to be detected if the history detection text database contains a target history detection text matching the text to be detected and the target access times of the target history detection text are greater than a target preset access threshold.
The history detection text database comprises a plurality of history detection texts, and the plurality of history detection texts include the target history detection text.
The determining module 740 is configured to determine the wake-up audio to be false wake-up audio if the context audio corresponding to the audio to be detected includes preset instruction audio, or if the user triggers a preset operation on the electronic device after the audio to be detected is acquired.
As shown in fig. 7, the apparatus 700 for determining false wake-up audio may further include an acquisition module 750.
In some embodiments, the acquisition module 750 is configured to acquire, for a plurality of users, a plurality of history detection audios corresponding to each time period in at least one period; the processing module 720 is further configured to perform voice recognition processing on the plurality of history detection audios to obtain a plurality of history detection texts corresponding to the plurality of history detection audios; and the determining module 740 is further configured to determine the history detection text database based on the plurality of history detection texts corresponding to each time period.
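By way of illustration only, building such a per-time-period history detection text database could look like the following sketch; the record format and the period granularity are assumptions.

```python
# Illustrative construction of the history detection text database: recognized
# history detection texts are grouped by the time period in which they occurred,
# and the number of occurrences (access times) of each text is counted.
from collections import Counter, defaultdict

def build_history_text_database(history_records, period_hours=1):
    """history_records: iterable of (hour_of_day, history_detection_text) pairs
    gathered from a plurality of users over at least one period.

    Returns {time_period: Counter mapping text -> access times}.
    """
    database = defaultdict(Counter)
    for hour, text in history_records:
        time_period = hour // period_hours  # group hours into fixed time periods
        database[time_period][text] += 1
    return dict(database)
```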
In some embodiments, the determining module 740 is further configured to determine a preset access threshold corresponding to each time period based on the access times of the plurality of history detection texts corresponding to each time period in the at least one period in the history detection text database, where the target preset access threshold is the preset access threshold corresponding to a target time period among the time periods.
In some embodiments, the determining module 740 is specifically configured to: determine an access times mean and an access times variance corresponding to each time period based on the access times of the plurality of history detection texts corresponding to that time period; and determine the preset access threshold corresponding to each time period based on the access times mean and the access times variance corresponding to that time period.
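One plausible (but not mandated) realization of such a threshold is the mean of the access times plus a multiple of their standard deviation, as sketched below; the coefficient k is an assumption.

```python
# Per-period threshold from the access-times mean and variance; the combination
# mean + k * sqrt(variance) is one assumed choice consistent with the description.
import statistics

def preset_access_threshold(access_times, k=2.0):
    """access_times: access times of the history detection texts in one time period."""
    mean = statistics.mean(access_times)
    variance = statistics.pvariance(access_times)
    return mean + k * variance ** 0.5

def thresholds_per_period(database, k=2.0):
    """database: {time_period: Counter(text -> access times)} as in the sketch above."""
    return {
        time_period: preset_access_threshold(list(counts.values()), k)
        for time_period, counts in database.items()
    }
```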
In some embodiments, the determining module 740 is further configured to determine whether the target access times of the target history detection text are greater than the target preset access threshold. Specifically, the determining module 740 is configured to: determine, based on the history detection text database, the target time period and the target access times corresponding to the target history detection text; and determine whether the target access times are greater than the target preset access threshold according to the target preset access threshold corresponding to the target time period.
In some embodiments, the detecting module 730 is specifically configured to perform context information detection on the audio to be detected if the target access times are greater than the target preset access threshold and less than a preset high-frequency access threshold.
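Taken together, these checks might be sketched as follows; the lookup of the target time period and the handling of the high-frequency threshold are assumptions made for illustration.

```python
# Sketch of the gating logic: look up the target access times and the threshold
# of the corresponding time period, then run context detection only when the
# access times fall strictly between the target preset access threshold and the
# preset high-frequency access threshold.
def needs_context_detection(target_text, database, thresholds, target_time_period,
                            high_frequency_threshold):
    access_times = database.get(target_time_period, {}).get(target_text, 0)
    target_threshold = thresholds.get(target_time_period, float("inf"))
    return target_threshold < access_times < high_frequency_threshold
```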
As shown in fig. 7, the apparatus 700 for determining false wake-up audio may further include a storage module 760 and a sending module 770.
In some embodiments, the storage module 760 is configured to store a plurality of pieces of identification information corresponding to a plurality of false wake-up audios; the acquisition module 750 is further configured to acquire, according to the plurality of pieces of identification information, the plurality of false wake-up audios and the plurality of texts to be detected corresponding to them, and to acquire at least one piece of dimension information corresponding to each false wake-up audio, where the dimension information is used to determine the cause of the false wake-up audio; and the sending module 770 is configured to send the plurality of false wake-up audios, the plurality of texts to be detected corresponding to them, and the at least one piece of dimension information corresponding to each false wake-up audio to the terminal device, so that the terminal device determines the cause of the false wake-up audio based on this data.
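For orientation only, the composition of the apparatus 700 can be outlined as a simple container of the modules named above; the constructor signature and attribute names are assumptions, not the claimed structure.

```python
# Schematic outline of the apparatus 700 as a composition of the modules above;
# the attribute names mirror the module reference numerals for readability.
class FalseWakeUpAudioDeterminationApparatus:
    def __init__(self, receiver, processor, detector, determiner,
                 acquirer, storage, sender):
        self.receiving_module = receiver      # 710: receives wake-up audio and audio to be detected
        self.processing_module = processor    # 720: speech recognition -> text to be detected
        self.detecting_module = detector      # 730: context information detection
        self.determining_module = determiner  # 740: decides whether the wake-up audio is false
        self.acquisition_module = acquirer    # 750: acquires history audio and false wake-up data
        self.storage_module = storage         # 760: stores identification information
        self.sending_module = sender          # 770: reports false wake-up data to the terminal device
```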
Correspondingly, the specific details of each part of the apparatus for determining false wake-up audio have already been described in detail in the embodiment of the electronic device; for details not disclosed here, reference may be made to that embodiment, and they are not repeated.
An embodiment of the present invention further provides a computer-readable storage medium storing at least one executable instruction which, when run on an electronic device/apparatus for determining false wake-up audio, causes the electronic device/apparatus for determining false wake-up audio to perform the method for determining false wake-up audio in any of the above method embodiments.
The executable instruction may be specifically configured to cause the electronic device/apparatus for determining false wake-up audio to perform the above-described method for determining false wake-up audio.
In this embodiment, the computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. In addition, embodiments of the present invention are not directed to any particular programming language.
In the description provided herein, numerous specific details are set forth. It will be appreciated, however, that embodiments of the invention may be practiced without these specific details. Similarly, in the above description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. The claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from those of the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and they may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components, except where at least some of such features and/or processes or elements are mutually exclusive.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.