CN111105777B


Info

Publication number: CN111105777B
Application number: CN201811253059.6A
Authority: CN
Inventors: 叶建隆; 董民; 陶伟成
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-10-25
Filing date: 2018-10-25
Publication date: 2023-10-31
Anticipated expiration: 2038-10-25
Also published as: CN111105777A

Abstract

The application provides a voice data acquisition method and device, a voice data playing method and device, a key package updating method and device and a non-transitory computer readable storage medium. The voice data acquisition method comprises the following steps: storing received voice data into an exclusive memory in a trusted execution environment, wherein the exclusive memory is the memory exclusive to the trusted execution environment; in the trusted execution environment, the voice data in the exclusive memory is packaged into a voice data packet by utilizing a preset key packet, and the voice data packet is written into a shared memory; and in an untrusted execution environment independent of the trusted execution environment, reading the voice data packet from the shared memory and sending the voice data packet to a server, wherein the shared memory is a memory shared by the trusted execution environment and the untrusted execution environment.

Description

Voice data acquisition and playing method and device, key package updating method and device and storage medium

Technical Field

The present application relates to the field of multimedia device control, and in particular, to a method and apparatus for collecting voice data, a method and apparatus for playing voice data, a method and apparatus for updating a key package, and a non-transitory computer readable storage medium.

Background

With the development of artificial intelligence and embedded technology, intelligent voice technology based on man-machine interaction is increasingly widely used, and is likely to become the next portal of the internet in the future.

Voice data generally contains a user's private information and is therefore sensitive data, for example the user's dialogue data, voiceprint data, and key data. Not only does the content of the data contain the user's private information, but the parameter characteristics of the data can also reveal the user's identity. Protecting the security of voice data is therefore particularly important.

In general, a user's voice data is collected and preprocessed on the collection device side and is transmitted over a network to a server side (for example, the cloud) for processing. After analyzing and processing the user's voice data, the server returns response voice data to the device side to be played to the user. Security protection is therefore particularly important across the whole chain in which voice data is used (including collection, preprocessing, transmission, etc.): if the voice data is illegally obtained or tampered with by an attacker, the user may suffer serious losses.

Disclosure of Invention

The application provides a voice data acquisition method and device, a voice data playing method and device, a key package updating method and device and a non-transitory computer readable storage medium.

According to a first aspect of the present application, there is provided a voice data acquisition method, comprising:

storing received voice data into an exclusive memory in a trusted execution environment, wherein the exclusive memory is the memory exclusive to the trusted execution environment;

in the trusted execution environment, the voice data in the exclusive memory is packaged into a voice data packet by utilizing a preset key packet, and the voice data packet is written into a shared memory;

and in an untrusted execution environment independent of the trusted execution environment, reading the voice data packet from the shared memory and sending the voice data packet to a server, wherein the shared memory is a memory shared by the trusted execution environment and the untrusted execution environment.

According to a second aspect of the present application, there is provided a method for playing voice data, including:

writing the received voice response packet into a shared memory in an untrusted execution environment;

reading the voice response packet from the shared memory and loading the voice response packet into an exclusive memory in a trusted execution environment independent of the untrusted execution environment, wherein the shared memory is a memory shared by the trusted execution environment and the untrusted execution environment, and the exclusive memory is a memory exclusive to the trusted execution environment;

in the trusted execution environment, verifying the validity of the voice response packet by using a preset key packet;

and in the trusted execution environment, voice data in legal voice response packets are acquired.

According to a third aspect of the present application, there is provided a method of updating a key package, comprising:

in a trusted execution environment, triggering a key pack update request based on a key refreshing time interval in a current key pack and a preset secure clock;

in the trusted execution environment, based on the key package update request, the current key package is packaged into an update request package, and the update request package is written into a shared memory;

reading the update request packet from the shared memory in an untrusted execution environment independent of the trusted execution environment, and sending the update request packet to a server, wherein the shared memory is a memory shared by the trusted execution environment and the untrusted execution environment;

writing the received key update response packet into the shared memory in the untrusted execution environment;

in the trusted execution environment, reading the key update response packet from the shared memory, and loading the key update response packet into an exclusive memory, wherein the exclusive memory is the memory exclusive to the trusted execution environment;

in the trusted execution environment, verifying the validity of the key update response package by using the current key package;

and in the trusted execution environment, replacing the current key package with an update key package in a legal key update response package.

According to a fourth aspect of the present application there is provided an apparatus comprising:

a processor;

a memory for storing one or more programs;

the one or more programs, when executed by the processor, cause the processor to perform any of the methods described above.

According to a fifth aspect of the present application there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to implement any of the methods as described above.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 shows a flow chart of a method of collecting speech data according to one embodiment of the application;

FIG. 2 shows a flow chart of a method of collecting speech data according to another embodiment of the application;

FIG. 3 shows a flow chart of a method of collecting speech data according to another embodiment of the application;

FIG. 4 is a flow chart of encapsulating voice data in exclusive memory into voice data packets using a preset key package in a trusted execution environment according to one embodiment of the present application;

FIG. 5 is a flowchart of encapsulating voice data in exclusive memory into voice data packets using a preset key package in a trusted execution environment according to another embodiment of the present application;

FIG. 6 is a flowchart of encapsulating voice data in exclusive memory into voice data packets using a preset key package in a trusted execution environment according to another embodiment of the present application;

FIG. 7 is a flowchart illustrating a method of playing voice data according to an embodiment of the present application;

FIG. 8 illustrates a flow chart for verifying the legitimacy of a voice response package using a preset key package in a trusted execution environment according to one embodiment of the present application;

FIG. 9 is a flow chart illustrating verifying the legitimacy of a voice response package using a preset key package in a trusted execution environment according to another embodiment of the present application;

FIG. 10 is a flow chart of acquiring and playing voice data in a legitimate voice response package in a trusted execution environment in accordance with one embodiment of the application;

FIG. 11 shows a flow chart of a method of updating a key package in accordance with one embodiment of the application;

FIG. 12 shows an application scenario diagram of a voice data collection method and a playback method according to the present application.

Detailed Description

Embodiments of the present application are described in detail below with reference to the accompanying drawings. It should be noted that the following description is merely exemplary and is not intended to limit the present application. Furthermore, in the following description, the same reference numerals will be used to designate the same or similar components in different drawings. The various features of the various embodiments described below may be combined with one another to form other embodiments within the scope of the application.

To address the problems in the prior art, according to the present application, a trusted execution environment (TEE: Trusted Execution Environment) and an untrusted execution environment (REE: Rich Execution Environment), which are independent of each other, may run on the device side that collects the user's voice data (e.g., a smart speaker, a smart mobile terminal, a smart set-top box with a voice interaction function, or a smart television). In general, the TEE may also be referred to as the Secure World, and the REE as the Normal World. In the TEE, the user's original voice data can be collected and preprocessed; the processed voice data is then passed to the REE through the memory shared by the TEE and the REE, and the REE sends it to the server. Because the TEE is a secure operating environment that is harder to attack, collecting and preprocessing the original voice data in the TEE protects that data, while the relatively independent REE acts as a transfer hub that forwards the processed voice data to the server. After receiving the user's voice data, the server side can obtain corresponding feedback voice data through analysis and processing, for example TTS (Text to Speech) voice data synthesized through natural language processing. The server side sends the feedback voice data to the REE, the REE passes it to the TEE through the shared memory, and the TEE processes the feedback voice data and can play it. Therefore, the voice data, whether the user's original voice data or the voice data fed back by the server side, is relatively secure throughout acquisition, processing and transmission.

Fig. 1 shows a flowchart of a method of collecting voice data according to an embodiment of the present application. As shown in fig. 1, the method 100 may include steps S110 to S130. In step S110, the received voice data is stored in the exclusive memory in the trusted execution environment. The exclusive memory is a secure memory, and may be a secure memory set aside specifically for the trusted execution environment on the voice data acquisition device side. The exclusive memory is used only by the trusted execution environment and cannot be accessed by any entity in the untrusted execution environment, so it has higher security and is harder to attack. Storing the user's received original voice data in the exclusive memory therefore protects the collected original voice data. The user's original voice data may be voice data collected by an input device such as a microphone on the collection device side, for example voice data produced during human-computer interaction, and its usual format is pulse code modulation (PCM: Pulse Code Modulation). It will be appreciated that the user's original voice data may also be obtained by a secure audio capture (Secure Audio Capturer) device controlled by the trusted execution environment.

In step S120, in the trusted execution environment, the voice data in the exclusive memory is encapsulated into a voice data packet by using a preset key packet, and the voice data packet is written into the shared memory. The key package may be preset at the voice data collection device, and in the trusted execution environment, the voice data stored in the exclusive memory in step S110 may be encapsulated by using the key package to generate a voice data packet. The specific packaging operation will be described in detail below. The generated voice data packet is written into the shared memory of the acquisition equipment end. The shared memory is a memory which is arranged at the acquisition equipment end and can be used by both the trusted execution environment and the untrusted execution environment, and is used for sharing and transmitting data in the trusted execution environment and the untrusted execution environment. Shared memory may be referred to as non-secure memory as opposed to exclusive memory, which is referred to as secure memory.

In step S130, in the untrusted execution environment, the voice data packet is read from the shared memory and sent to the server. As described above, the untrusted execution environment is independent of the trusted execution environment, and the shared memory is a memory that can be used by both for sharing and transferring data between the two. Therefore, the voice data in the shared memory can be fetched and sent to the server side in the non-trusted execution environment.

Therefore, the original voice data of the user can be collected and preprocessed in the trusted execution environment with high relative safety. Then, through the shared memory, the processed voice data can be obtained by the non-trusted execution environment and forwarded to the server side. Therefore, the original voice data is protected in the trusted execution environment and cannot be illegally acquired, and in the untrusted execution environment, because the voice data is packaged by the key package, even if the voice data package is illegally acquired, the original voice data cannot be easily acquired and/or tampered, so that the safety of the voice data is improved, and the authenticity of the voice data received by the server side is ensured.
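The division of labor in steps S110 to S130 can be illustrated with a minimal Python sketch. It is only a toy model: the `Device` class, the dictionary-based memories and the `seal` and `send` callables are hypothetical names introduced here, and ordinary Python cannot enforce the hardware isolation that a real trusted execution environment provides.

```python
class Device:
    """Toy model of one device: two memory regions as plain dicts."""

    def __init__(self):
        self.exclusive = {}  # TEE-only by convention in this sketch
        self.shared = {}     # readable and writable by both TEE and REE


def tee_store_voice(dev, pcm):
    """S110: keep the raw voice data in exclusive memory."""
    dev.exclusive["voice"] = pcm


def tee_package_voice(dev, seal):
    """S120: encapsulate inside the TEE, then publish only the packet."""
    dev.shared["voice_packet"] = seal(dev.exclusive["voice"])


def ree_forward(dev, send):
    """S130: the REE reads the packet from shared memory and sends it."""
    send(dev.shared.pop("voice_packet"))


if __name__ == "__main__":
    dev = Device()
    outbox = []
    tee_store_voice(dev, b"\x01\x02\x03\x04")
    # Placeholder transformation only; it is not real encapsulation.
    tee_package_voice(dev, seal=lambda raw: bytes(b ^ 0x5A for b in raw))
    ree_forward(dev, outbox.append)
    print(outbox)
```

In this sketch the REE-side function never reads `dev.exclusive`, which mirrors the rule that only the encapsulated packet crosses into shared memory.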

Fig. 2 shows a flowchart of a method of collecting voice data according to another embodiment of the present application. As shown in fig. 2, the method 100 may further include steps S140 and S150 in addition to steps S110 to S130. For the sake of brevity, only the differences of the embodiment shown in fig. 2 from fig. 1 will be described below, and a detailed description of the same will be omitted.

In step S140, in the untrusted execution environment, the key package pre-stored in the read-only memory partition is read, and the key package is written into the shared memory. The read-only storage partition can be preset at the voice data acquisition equipment end and can be mapped to the read-only partition of the file system, wherein the stored data cannot be modified or deleted, and even if the machine is restarted or the image is updated online, the stored data cannot be modified or deleted. When the voice data acquisition equipment leaves the factory, the unique key package of the equipment can be prestored in a read-only storage partition in the voice data acquisition equipment. The key package may be distributed by a key distribution center, and may contain a key for encryption (e.g., 16 bytes in length), a key for signing (e.g., 16 bytes in length), and key additional information (e.g., key ID). The content in the key package is used for encrypting, signing and the like the voice data in the trusted execution environment. In step S140, the key package may be fetched from the read-only memory partition of the collection device by the untrusted execution environment and written into the shared memory.

Subsequently, in step S150, in the trusted execution environment, the key package is read from the shared memory and loaded into the exclusive memory. So that voice data can be encrypted and signed with the key package inside the trusted execution environment, the key package is taken out of the shared memory and loaded into the exclusive memory in the trusted execution environment, which protects the subsequent operations.

Fig. 3 shows a flowchart of a method of collecting voice data according to another embodiment of the present application. As shown in fig. 3, the method 100 may further include step S160 in addition to steps S110 to S150. For the sake of brevity, only the differences between the embodiment shown in fig. 3 and fig. 2 will be described below, and a detailed description of the same will be omitted.

In step S160, in the trusted execution environment, the key pack loaded into the exclusive memory is decrypted and signature verified by using the encryption key and the signature key preset in the trusted execution environment, so as to obtain the encryption key, the signature key and the key ID in the key pack. In order to ensure the security of the key package, when the voice data collection device leaves the factory, the key package may be encrypted and signed and then stored in the read-only storage partition of the collection device, and the keys used in the encryption and signature operation (different from the encryption key and the signature key used for packaging the voice data in the key package) at this time may be set in the trusted execution environment, so that in the subsequent operation, the trusted execution environment decrypts and verifies the signature of the key package from the shared memory, that is, the operation in step S160. For example, at the time of shipping of the collection device, the device manufacturer may sign and encrypt the keybag using the signing key and the encryption key in the secure encryption module in the trusted execution environment in the device. Alternatively, the preset encryption key and signature key may be set in a code in the trusted execution environment when the voice data collection device leaves the factory.

Therefore, the encryption key, the signature key and the key ID contained in the key package can be obtained in the trusted execution environment and used for packaging operation on voice data, so that the leakage of the data in the key package is avoided.
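As an illustration of steps S140 to S160, the sketch below models a key package with a 16-byte encryption key, a 16-byte signing key and a key ID, and checks a signature when the package is loaded in the TEE. The byte layout, the use of HMAC-SHA256 as the "signature", and the names `KeyPackage`, `wrap_for_storage` and `tee_load` are assumptions made for this example; the decryption half of S160 is deliberately left out to keep the sketch short.

```python
import hashlib
import hmac
from dataclasses import dataclass

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


@dataclass(frozen=True)
class KeyPackage:
    enc_key: bytes   # 16-byte key used for encryption
    sign_key: bytes  # 16-byte key used for signing
    key_id: str      # additional information identifying the package

    def to_bytes(self):
        # Assumed layout: enc_key || sign_key || UTF-8 key ID
        return self.enc_key + self.sign_key + self.key_id.encode("utf-8")

    @classmethod
    def from_bytes(cls, blob):
        if len(blob) < 32:
            raise ValueError("key package too short")
        return cls(blob[:16], blob[16:32], blob[32:].decode("utf-8"))


def wrap_for_storage(pkg, device_sign_key):
    """Factory side: append a tag made with the key preset in the TEE."""
    body = pkg.to_bytes()
    return body + hmac.new(device_sign_key, body, hashlib.sha256).digest()


def tee_load(stored, device_sign_key):
    """S160 (signature half): verify the tag, then parse the package."""
    body, tag = stored[:-TAG_LEN], stored[-TAG_LEN:]
    expected = hmac.new(device_sign_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("key package signature check failed")
    return KeyPackage.from_bytes(body)
```

A package altered while it sits in the read-only partition or in shared memory fails the tag comparison, so `tee_load` raises instead of returning keys.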

Fig. 4 is a flowchart of encapsulating voice data in exclusive memory into voice data packets using a preset key package in a trusted execution environment according to an embodiment of the present application. As shown in fig. 4, the above step S120 may include sub-steps S121 and S122.

In sub-step S121, the voice data is compressed in the trusted execution environment. Compressing the voice data improves the efficiency of transmission to the server side and saves bandwidth. The voice data may be compressed into a format such as MP3, M4A, or Ogg.

In sub-step S122, in the trusted execution environment, the compressed voice data is signed and encrypted using the encryption key and the signing key in the key package to generate a voice data packet. After the voice data is compressed, the encryption key and the signing key obtained from the key package in step S160 may be used in the trusted execution environment to generate a signature for the voice data and to encrypt it, so that the resulting voice data packet is encapsulated with the encryption key and the signing key in the key package.
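Sub-steps S121 and S122 can be sketched as follows. Everything cryptographic here is a stand-in chosen so that the example runs with the Python standard library alone: `zlib` replaces the audio codecs named above, HMAC-SHA256 plays the role of the signature, and `_keystream` is a toy keystream that is not a vetted cipher and reuses the same stream for every packet. The ordering (encrypt, then sign the ciphertext) is likewise an assumption, since the text does not fix one.

```python
import hashlib
import hmac
import zlib

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def _keystream(key, n):
    """Stand-in keystream (HMAC-SHA256 over a counter); NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def _xor(data, key):
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def seal_voice(pcm, enc_key, sign_key):
    compressed = zlib.compress(pcm)         # S121: compress
    ciphertext = _xor(compressed, enc_key)  # S122: encrypt (stand-in)
    tag = hmac.new(sign_key, ciphertext, hashlib.sha256).digest()  # S122: sign
    return ciphertext + tag


def open_voice(packet, enc_key, sign_key):
    """Receiver side: verify the tag, decrypt, then decompress."""
    ciphertext, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(sign_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("voice data packet failed signature check")
    return zlib.decompress(_xor(ciphertext, enc_key))
```

A production design would use an authenticated cipher with a fresh nonce per packet; the sketch only shows where compression, encryption and signing sit relative to one another.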

Fig. 5 is a flowchart of encapsulating voice data in exclusive memory into voice data packets using a preset key package in a trusted execution environment according to another embodiment of the present application. As shown in fig. 5, the above-described step S120 may further include a substep S123 in addition to the substeps S121 and S122. For the sake of brevity, only the differences between the embodiment shown in fig. 5 and fig. 4 will be described below, and a detailed description of the same will be omitted.

In sub-step S123, a key ID is added to the voice data packet in the trusted execution environment. In step S160 described above, in addition to the encryption key and the signature key in the key package, the key ID in the key package is also obtained; the key ID identifies the key package unique to the acquisition device. Therefore, when the server receives the voice data packet, it can determine from the key ID which key package the packet corresponds to, and can process the received voice data packet using that key package.

Fig. 6 is a flowchart of encapsulating voice data in exclusive memory into voice data packets using a preset key package in a trusted execution environment according to another embodiment of the present application. As shown in fig. 6, the above-described step S120 may further include a substep S124 in addition to the substeps S121 and S122. For the sake of brevity, only the differences between the embodiment shown in fig. 6 and fig. 4 will be described below, and a detailed description of the same will be omitted.

In sub-step S124, a random number is added to the compressed voice data in the trusted execution environment. The random number is randomly generated in the trusted execution environment and added to the voice data packet, and the server side should include the same random number in the voice response packet it sends back so that it can be verified. If the voice response packet received from the server does not contain the random number, interception or tampering may have occurred during transmission of the voice data, and the voice response packet may be unsafe.
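The random-number check in sub-step S124 amounts to a challenge that the server must echo. A minimal sketch, assuming a 16-byte random number placed at the front of the payload in both directions (the size, the position and the function names are assumptions of this example):

```python
import hmac
import secrets

NONCE_LEN = 16  # assumed size of the random number, in bytes


def tee_attach_nonce(compressed):
    """Generate a fresh random number and prepend it to the payload."""
    nonce = secrets.token_bytes(NONCE_LEN)
    return nonce, nonce + compressed


def tee_check_response_nonce(expected_nonce, response_body):
    """Reject a response that does not echo the random number sent earlier."""
    returned = response_body[:NONCE_LEN]
    if not hmac.compare_digest(returned, expected_nonce):
        raise ValueError("response does not carry the expected random number")
    return response_body[NONCE_LEN:]
```

The TEE keeps the returned `nonce` in exclusive memory until the matching response arrives, then discards it so that an old response cannot be replayed.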

Through the above description, the voice data of the user can be collected and preprocessed at the voice data collection equipment end and sent to the server end, the whole collection and processing process is carried out in a safe environment, and when the voice data is transmitted to an external open environment, the voice data is signed and encrypted, so that the safety of the voice data is ensured.

After receiving the voice data packet, the server side can determine from the key ID which key package the voice data packet corresponds to, and can then decrypt the voice data packet and verify its signature using the encryption key and the signature key in that key package to obtain the voice data in the voice data packet. The server side may generate response voice data, e.g., synthesized TTS voice data, by processing and analyzing the voice data. After compressing the response voice data, the server side can sign and encrypt it, together with the random number from the received voice data packet, using the encryption key and the signature key in the key package to generate a voice response packet. The key ID may be added to the voice response packet, which is then sent to the voice data acquisition device.

Fig. 7 shows a flowchart of a method of playing voice data according to an embodiment of the present application. As shown in fig. 7, the method 200 may include steps S210 to S240. In step S210, in the untrusted execution environment, the received voice response packet is written into the shared memory. As described above, the shared memory is a memory set at the collection device side and usable by both the trusted execution environment and the untrusted execution environment, for sharing and transferring data in the trusted execution environment and the untrusted execution environment. After the device used by the user receives the voice response packet, it is first written into the shared memory by the untrusted execution environment.

In step S220, in the trusted execution environment, the voice response packet is read from the shared memory and loaded into the exclusive memory. As described above, the exclusive memory is a secure memory set by the device end used by the user and dedicated to the trusted execution environment, and the exclusive memory is only used by the trusted execution environment and is not accessed by any entity in the untrusted execution environment, so that the device end has higher security and is not easy to attack. Therefore, in the step, the voice response packet is taken out from the shared memory by using the trusted execution environment and is loaded into the exclusive memory, so that the voice response packet can be conveniently processed in the trusted execution environment, and the safety of voice data is improved.

In step S230, in the trusted execution environment, the validity of the voice response packet is verified by using the preset key package. As described above, the key package may be preset at the voice data collection device side, and the validity of the voice response package may be verified in the trusted execution environment by using the content in the key package. The specific authentication process will be described in detail below.

In step S240, in the trusted execution environment, voice data in a legal voice response packet is acquired. The voice data obtained in this step may be played in a trusted execution environment or in an untrusted execution environment according to actual needs or settings. Therefore, the voice response packet from the server can be verified in a trusted execution environment, and the verified voice data is played, so that the safety and the reliability of the voice response packet are ensured.

FIG. 8 illustrates a flow chart for verifying the legitimacy of a voice response package using a preset key package in a trusted execution environment according to one embodiment of the present application. As shown in fig. 8, the above step S230 may include sub-steps S231 and S232. In sub-step S231, in the trusted execution environment, a key package corresponding to the key ID of the voice response package is looked up from the key ID. As described above, the server side may add the key ID of the key package to the voice response package when generating the voice response package. The key ID may be known when the device used by the user receives the voice response packet. In the trusted execution environment of the user equipment side, the corresponding key package can be confirmed through the key ID.

Subsequently, in sub-step S232, the voice response package is decrypted and signature authenticated in the trusted execution environment using the encryption key and the signature key in the key package. After the corresponding key package is known, the voice response package generated by the server side can be decrypted and signed and authenticated by utilizing the encryption key and the signature key in the key package in the trusted execution environment. Therefore, the validity of the voice response packet can be verified through the key packet preset by the user equipment. And the whole verification process is completed in a trusted execution environment, so that the security of voice data and key data is effectively protected.
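Sub-steps S231 and S232 can be sketched as a lookup followed by a signature check. The packet layout (a one-byte key ID length, the key ID, the body, then an HMAC-SHA256 tag), the `sign_keys` mapping and both function names are assumptions of this example, and the decryption half of S232 is omitted for brevity.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def build_response(key_id, body, sign_key):
    """Server side: key ID header, body, then a tag over both."""
    kid = key_id.encode("utf-8")
    header = bytes([len(kid)]) + kid  # key IDs longer than 255 bytes are rejected
    tag = hmac.new(sign_key, header + body, hashlib.sha256).digest()
    return header + body + tag


def tee_verify_response(packet, sign_keys):
    """S231: find the key package by key ID. S232: verify the signature."""
    if len(packet) < 1 + TAG_LEN:
        raise ValueError("voice response packet too short")
    kid_len = packet[0]
    key_id = packet[1:1 + kid_len].decode("utf-8")
    sign_key = sign_keys.get(key_id)
    if sign_key is None:
        raise KeyError("no key package for key ID " + repr(key_id))
    signed, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(sign_key, signed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("voice response packet failed signature check")
    return signed[1 + kid_len:]
```

Because the tag covers the key ID header as well as the body, swapping in a different key ID invalidates the packet rather than redirecting the lookup.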

Fig. 9 is a flowchart for verifying validity of a voice response package using a preset key package in a trusted execution environment according to another embodiment of the present application. As shown in fig. 9, the above-described step S230 may further include a sub-step S233 in addition to the sub-steps S231 and S232. For the sake of brevity, only the differences between the embodiment shown in fig. 9 and fig. 8 will be described below, and a detailed description of the same will be omitted.

FIG. 10 is a flow chart illustrating the retrieval and playback of voice data in a legitimate voice response package in a trusted execution environment in accordance with one embodiment of the application. As shown in fig. 10, the above step S240 may include sub-steps S241 and S242. In sub-step S241, compressed speech data is extracted from legitimate speech response packets in a trusted execution environment. As described above, in order to improve transmission efficiency and save bandwidth, voice data in a voice response packet from a server side may be data that has been compressed. Thus, when the voice response package is validated, compressed voice data may be extracted from the voice response package in a trusted execution environment.

Subsequently, in sub-step S242, the compressed speech data is decompressed in a trusted execution environment. Therefore, the decompression of the voice data from the server side is performed under a trusted execution environment, so that the security of the voice data is protected.

In the above description, the key package is unique to each device, so if the key package of one device is intercepted or accidentally leaked, the security of the key packages on other devices is not affected. The key package in a device may be a device-unique key package that the device manufacturer requests from the key distribution center using the device's UUID (Universally Unique Identifier, i.e., the device-unique ID); the key distribution center may ensure that both the key ID and the key information in the key package corresponding to the device's UUID are unique. However, security can be further improved if the key package can be updated dynamically during use.

FIG. 11 shows a flowchart of a method of updating a key package in accordance with one embodiment of the present application. As shown in fig. 11, the method 300 may include steps S310 to S370. In step S310, in the trusted execution environment, a key package update request is triggered based on a key refresh time interval in the current key package and a preset secure clock. As described above, the key package set on the acquisition device side may include additional information, which may include a key refresh time interval in addition to the key ID. Thus, after the key package is decrypted and its signature is verified in the trusted execution environment, the key refresh time interval can be obtained from the additional information. Furthermore, according to the present embodiment, a secure clock may also run in the trusted execution environment; the untrusted execution environment can neither access nor modify it. Thus, the trusted execution environment may trigger a key package update request based on the secure clock and the key refresh time interval.
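The trigger in step S310 is an elapsed-time comparison against the secure clock. A minimal sketch, in which the `KeyRefreshTimer` class is a hypothetical name and `time.monotonic` merely stands in for a secure clock that the untrusted execution environment cannot adjust:

```python
import time


class KeyRefreshTimer:
    """Decides when a key package update request should be triggered."""

    def __init__(self, refresh_interval_s, secure_clock=time.monotonic):
        self._interval = refresh_interval_s  # taken from the key package
        self._clock = secure_clock           # stand-in for the TEE secure clock
        self._last_refresh = secure_clock()

    def update_due(self):
        """True once the refresh interval has elapsed on the secure clock."""
        return self._clock() - self._last_refresh >= self._interval

    def mark_refreshed(self):
        """Call after the current key package has been replaced."""
        self._last_refresh = self._clock()
```

Injecting the clock as a parameter keeps the decision logic testable and makes explicit that the time source must come from inside the trusted execution environment.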

In step S320, in the trusted execution environment, based on the key package update request, the current key package is encapsulated into an update request package, and the update request package is written into the shared memory. As described above, a key package (i.e., a current key package) may be preset at the voice data collection device side, and the key package may be encapsulated in a trusted execution environment to generate an update request package. The specific packaging operation will be described in detail below. The generated update request packet is written into the shared memory of the acquisition equipment end.

According to one embodiment, step S320 may further include: in the trusted execution environment, adding a random number to the current key package. As described above, the random number is randomly generated in the trusted execution environment and may be added to the current key package, and the server side should include the same random number in the returned key update response packet for verification. If the key update response packet received from the server does not contain the random number, the key update response packet may be considered unsafe, since it may have been intercepted or tampered with during transmission.

According to one embodiment, step S320 may include: in the trusted execution environment, generating a signature for the current key package using the signing key in the current key package, and encrypting the result using the encryption key in the current key package, to generate the update request packet. That is, the current key package may be encapsulated using its own encryption key and signing key to generate the update request packet.

According to one embodiment, step S320 may further include: in a trusted execution environment, a key ID preset in a current key package is added to an update request package. Therefore, when the server side receives the update request packet, the server side can know which key packet corresponds to the update request packet through the key ID, and the server side can process the received update request packet by utilizing the corresponding key packet.
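
The encapsulation options of step S320 (random number, signature, encryption, key ID) can be combined in one sketch. The application does not name specific algorithms, so everything concrete here is an assumption: HMAC-SHA256 plays the role of the signature, `toy_cipher` is an insecure stand-in for a real cipher, and the names `KeyPackage` and `build_update_request`, the field lengths, and the byte layout are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPackage:
    key_id: bytes    # fixed-length key ID (16 bytes assumed)
    enc_key: bytes   # encryption key
    sign_key: bytes  # signing key

    def serialize(self) -> bytes:
        return self.key_id + self.enc_key + self.sign_key


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in for a real cipher (NOT secure): XOR with a SHA-256 keystream.

    Applying it twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_update_request(current: KeyPackage) -> tuple[bytes, bytes]:
    """Return (update request packet, random number) for the current key package."""
    nonce = secrets.token_bytes(16)                 # random number generated in the TEE
    body = current.serialize() + nonce              # current key package + random number
    signature = hmac.new(current.sign_key, body, hashlib.sha256).digest()
    ciphertext = toy_cipher(current.enc_key, body + signature)
    return ciphertext + current.key_id, nonce       # key ID appended in the clear
```

The key ID stays outside the encrypted part so that the server can select the matching key package before it decrypts anything; the caller keeps the returned random number to check the later response.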

After receiving the update request packet, the server side can determine from the key ID which key package the update request packet corresponds to, and can then use the encryption key and the signing key in that key package to decrypt the update request packet and verify its signature, thereby obtaining the current key package carried in the update request packet. The server side may randomly generate a new key package. Using the encryption key and the signing key in the old key package (i.e., the current key package), the server side may sign and encrypt the new key package together with the random number from the received update request packet to generate a key update response packet. The server side may also add a new key ID corresponding to the new key package to the key update response packet. Then, the key ID of the old key package may be added to the key update response packet, so that the acquisition device can find the current key package to be updated after receiving the response, and the key update response packet is sent to the voice data acquisition device.

In step S340, the received key update response packet is written into the shared memory in the untrusted execution environment. After the device used by the user receives the key update response packet, it is first written into the shared memory by the untrusted execution environment.

In step S350, in the trusted execution environment, the key update response packet is read from the shared memory and loaded into the exclusive memory. As described above, the exclusive memory is a secure memory set by the device end used by the user and dedicated to the trusted execution environment, and the exclusive memory is only used by the trusted execution environment and is not accessed by any entity in the untrusted execution environment, so that the device end has higher security and is not easy to attack. Therefore, in the step, the key update response packet is taken out of the shared memory and loaded into the exclusive memory by utilizing the trusted execution environment, so that the subsequent processing in the trusted execution environment is facilitated, and the security of updating the key packet is improved.

In step S360, in the trusted execution environment, the validity of the key update response package is verified with the current key package.

According to one embodiment, in the trusted execution environment, the key update response packet may be decrypted and its signature verified using the encryption key and the signing key in the current key package, thereby verifying the validity of the key update response packet. After the key update response packet is obtained, the key update response packet generated by the server side can be decrypted and signature-verified in the trusted execution environment using the encryption key and the signing key in the current key package (the old key package). Thus, the validity of the key update response packet can be verified by means of the current key package. The whole verification process is completed in the trusted execution environment, so that the security of the key data is effectively protected.

According to one embodiment, in the trusted execution environment, the validity of the key update response packet is checked using the random number described above. As described above, when the user device side sends the update request packet to the server side, the random number used for verification is included in the update request packet, and the server side includes the same random number when generating the key update response packet. Therefore, the validity of the key update response packet can be verified by checking whether it contains the random number. If the random number is contained in the key update response packet, the packet is legitimate. Otherwise, the packet is illegitimate, and the key data therein may have been intercepted or tampered with.
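
The random number check can be sketched as below. This is an illustrative Python fragment, not the claimed implementation: the class name `PendingNonce` is hypothetical, and treating the random number as single-use and comparing it in constant time are design assumptions that the text does not spell out.

```python
import hmac
from typing import Optional


class PendingNonce:
    """Holds the random number of the outstanding update request (TEE side)."""

    def __init__(self) -> None:
        self._nonce: Optional[bytes] = None

    def remember(self, nonce: bytes) -> None:
        """Record the random number that was placed in the update request packet."""
        self._nonce = nonce

    def check(self, received: bytes) -> bool:
        """Accept a response only if it echoes the outstanding random number."""
        if self._nonce is None:
            return False                       # no request outstanding
        if not hmac.compare_digest(self._nonce, received):
            return False                       # wrong or missing random number
        self._nonce = None                     # single use: a replayed response fails
        return True
```

Clearing the stored value only on success means that a forged response does not cancel a genuine pending update, while a second copy of a valid response is rejected.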

In step S370, in the trusted execution environment, the current key package is replaced with the update key package in the legitimate key update response package. Therefore, for the key update response packet from the server side, verification can be performed under a trusted execution environment, and the updated key packet passing the verification is used as a new key packet in the device, so that the update of the key packet is realized, and the safety and the reliability of the updated key packet are ensured.

FIG. 12 shows an application scenario diagram of a voice data collection method and a playing method according to the present application. As shown in FIG. 12, the application scenario includes a device manufacturer 410, a voice server 420, and a voice device 430. The device manufacturer 410 is the manufacturer that produces the voice device 430. The voice device 430 is a terminal device that performs man-machine voice interaction with a user, for example, a smart speaker, a smart mobile terminal, a smart set-top box with a voice interaction function, or a smart television. The voice server 420 is a server that provides background computing and voice services: it receives voice data sent from the voice device 430 over a network and, after analysis, synthesizes TTS voice data and sends it back to the voice device 430, which plays it for the user.

The voice device 430 has a user space and a kernel space, and the untrusted execution environment REE and the trusted execution environment TEE run independently of each other in the voice device 430. The untrusted execution environment REE includes an untrusted voice application and a normal OS (Operating System); the untrusted voice application is in the user space, the normal OS is in the kernel space, and an encrypted key package is stored in a read-only partition of the file system of the normal OS. The trusted execution environment TEE includes a trusted voice application and a secure OS; the trusted voice application is in the user space, the secure OS is in the kernel space, and the secure OS includes a secure encryption module and a secure audio module.

Before the voice device 430 leaves the factory, the device manufacturer 410 may request the unique key package of the device from the key distribution center using the UUID of the device. The key package may contain an encryption key, a signing key, and additional information, which may include information such as a key ID and a key refresh time interval. The key distribution center ensures that the key ID and the key information in the key package corresponding to the UUID of each device are unique. The device manufacturer 410 may sign the key package with a signing key in the secure encryption module of the voice device 430 and encrypt it with an encryption key in the secure encryption module. The encrypted key package may be stored in a read-only memory partition of the voice device 430.

When the voice device 430 performs man-machine voice interaction with the user, the non-trusted voice application may read the encrypted key package from the read-only memory partition of the file system of the normal OS, and transfer the encrypted key package to the trusted voice application through the shared memory (not shown in the figure) of the trusted execution environment TEE and the non-trusted execution environment REE. The trusted voice application may decrypt and verify the signature of the encrypted key package by using an encryption key and a signature key preset in the secure encryption module, and the decrypted key package may be stored in a secure memory (not shown in the figure) exclusive to the trusted execution environment. The encryption key, the signature key, and additional information (e.g., key ID, key refresh time interval, etc.) in the key package can be obtained by parsing the signature-authenticated key package. It should be noted that the encryption key and the signing key in the key package obtained in this operation are different from the encryption key and the signing key preset in the secure encryption module.

When the user speaks, the trusted voice application of the voice device 430 obtains the original voice data using the secure audio module and stores it in the secure memory of the trusted execution environment. To improve subsequent transmission efficiency and save bandwidth, the original voice data often needs to be compressed in the trusted execution environment; the compression format may be mp3, m4a, ogg, etc. In the trusted execution environment, additional information may be appended to the tail of the compressed voice data, which is then aligned to 16 bytes and packed into a request data packet. The additional information may include a key ID, a random number, and signature data. The key ID enables the voice server 420 to identify and obtain the corresponding key package. The random number is randomly generated in the trusted execution environment; to prevent replay attacks, the TTS voice data returned by the voice server 420 should also contain the random number for verification. The signature data ensures the integrity of the voice data during transmission. Then, the request data packet is encrypted using the encryption key in the key package, and the key ID is added at the tail of the request data packet, this key ID being the same as the key ID in the additional information. The request data packet thus generated is passed to the untrusted voice application through the shared memory and then forwarded to the voice server 420.
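
The tail layout and the 16-byte alignment described above can be sketched as follows. The text only states that additional information is appended and the result is aligned to 16 bytes; the PKCS#7-style padding, the fixed field lengths, the field order, and the function names are assumptions made for this illustration, and encryption is left out.

```python
BLOCK = 16        # alignment stated in the description
KEY_ID_LEN = 16   # assumed field lengths
NONCE_LEN = 16
SIG_LEN = 32


def pad16(data: bytes) -> bytes:
    """PKCS#7-style padding to a multiple of 16 bytes (padding scheme is an assumption)."""
    n = BLOCK - len(data) % BLOCK   # 1..16; a full block is added when already aligned
    return data + bytes([n]) * n


def unpad16(data: bytes) -> bytes:
    """Remove the padding added by pad16, rejecting malformed input."""
    if not data or len(data) % BLOCK:
        raise ValueError("length is not a positive multiple of 16")
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


def pack_request_body(voice: bytes, key_id: bytes, nonce: bytes, signature: bytes) -> bytes:
    """Layout before encryption: voice data | key ID | random number | signature | padding."""
    if (len(key_id), len(nonce), len(signature)) != (KEY_ID_LEN, NONCE_LEN, SIG_LEN):
        raise ValueError("unexpected field length")
    return pad16(voice + key_id + nonce + signature)


def unpack_request_body(body: bytes) -> tuple[bytes, bytes, bytes, bytes]:
    """Split an aligned body back into (voice data, key ID, random number, signature)."""
    raw = unpad16(body)
    tail = KEY_ID_LEN + NONCE_LEN + SIG_LEN
    if len(raw) < tail:
        raise ValueError("body too short")
    voice, extra = raw[:-tail], raw[-tail:]
    return (voice, extra[:KEY_ID_LEN],
            extra[KEY_ID_LEN:KEY_ID_LEN + NONCE_LEN], extra[-SIG_LEN:])
```

Placing fixed-length fields at the tail lets the receiver peel them off without a length prefix for the variable-length voice data.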

After receiving the request data packet, the voice server 420 can determine which key package it corresponds to according to the key ID therein. After decrypting the request data packet and verifying its signature, the voice server 420 may obtain the voice data (and the random number) in the request data packet. The voice server 420 may generate response voice data, e.g., synthesized TTS voice data, by processing and analyzing the voice data. After compressing the response voice data, the voice server 420 adds additional information and performs 16-byte alignment to package the response voice data into a response data packet, where the additional information includes the key ID, the random number, and signature data. Then, the voice server 420 encrypts the response data packet, adds the key ID to the tail, and sends the encrypted response data packet to the voice device 430.

After receiving the response data packet, the untrusted voice application of the voice device 430 forwards it to the trusted voice application through the shared memory. The trusted voice application loads the response data packet into the secure memory and finds the corresponding key package through the key ID therein. The trusted voice application decrypts the response data packet and verifies its signature using the encryption key and the signing key of the key package, and checks whether the response data packet contains the random number from the previously sent request data packet. After the signature verification and the random number check both pass, the trusted voice application extracts the compressed TTS voice data from the decrypted response data packet and decompresses it to obtain the TTS voice data. The trusted voice application then plays the TTS voice data using the secure audio module.

When the voice device 430 updates the key package, the trusted voice application first uses a secure clock (not shown in the figure) set in the trusted execution environment TEE to trigger the key update request periodically. The trigger interval can be defined according to the security required by the voice interaction: the shorter the time interval, the higher the security. As described above, when the device manufacturer 410 requests the key package, the additional information in the key package includes the key refresh time interval.

Based on the key update request, the trusted voice application uses the signing key, the encryption key and the additional information in the currently used key package to append additional information to the tail of the currently used key package, performs 16-byte alignment, and packages the result into a key update request packet, where the appended additional information includes the key ID, a random number and signature data. Then, the trusted voice application encrypts the key update request packet using the encryption key in the current key package, adds the key ID at the tail, and hands the generated key update request packet to the untrusted voice application through the shared memory. The untrusted voice application sends the key update request packet to the voice server 420 over the network; the content of the key update request packet is opaque to the untrusted voice application.

After receiving the key update request packet, the voice server 420 first determines which key package it corresponds to according to the key ID therein. After decrypting the key update request packet and verifying its signature, the voice server 420 obtains the random number therein. Subsequently, the voice server 420 randomly generates a new key package, appends additional information to the tail of the new key package, and performs 16-byte alignment to package it into a key update response packet, where the additional information includes a key ID (the newly generated key ID), the random number, and signature data. Then, the voice server 420 encrypts the key update response packet, adds a key ID to the tail (the key ID of the original key package, identical to the key ID in the key update request packet), and sends the encrypted key update response packet to the voice device 430. The signing and the encryption use the signing key and the encryption key in the original key package.
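
The server-side bookkeeping in this exchange (select the key package by the trailing key ID, then generate a new key package at random) can be sketched as below. `KeyRegistry`, `KeyPackage`, the 16-byte key ID, and the 32-byte key lengths are hypothetical choices for illustration; signing, encryption and packet assembly are omitted.

```python
import secrets
from dataclasses import dataclass
from typing import Dict

KEY_ID_LEN = 16  # assumed length of the key ID appended to every packet


@dataclass(frozen=True)
class KeyPackage:
    key_id: bytes
    enc_key: bytes
    sign_key: bytes


class KeyRegistry:
    """Server-side store of key packages, indexed by key ID."""

    def __init__(self) -> None:
        self._by_id: Dict[bytes, KeyPackage] = {}

    def add(self, pkg: KeyPackage) -> None:
        self._by_id[pkg.key_id] = pkg

    def lookup(self, packet: bytes) -> KeyPackage:
        """Find the key package named by the key ID at the tail of a packet."""
        key_id = packet[-KEY_ID_LEN:]
        try:
            return self._by_id[key_id]
        except KeyError:
            raise LookupError("unknown key ID") from None

    def rotate(self, old: KeyPackage) -> KeyPackage:
        """Generate a fresh key package and register it alongside the old one."""
        new = KeyPackage(
            key_id=secrets.token_bytes(KEY_ID_LEN),
            enc_key=secrets.token_bytes(32),
            sign_key=secrets.token_bytes(32),
        )
        self.add(new)
        return new
```

The sketch keeps the old key package registered after rotation, because the description does not say when the server retires it; the response itself is still protected with the old keys, as stated above.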

After receiving the key update response packet through the network, the untrusted voice application of the voice device 430 transfers the key update response packet to the trusted voice application through the shared memory. The trusted voice application decrypts the key update response packet and verifies its signature using the current key package (the old key package), and checks whether the key update response packet contains the random number of the previously issued key update request packet. After the signature verification and the random number check both pass, the trusted voice application replaces the old key package currently in use with the new key package in the key update response packet.

Those skilled in the art will appreciate that the inventive aspects may be implemented as a system, method, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to as a "circuit," "module," or "system." Furthermore, the application can take the form of a computer program product embedded in any tangible expression medium having computer-usable program code embodied in the medium.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an apparatus including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While the above description includes many specific arrangements and parameters, it is noted that these specific arrangements and parameters are merely illustrative of one embodiment of the application. This should not be taken as limiting the scope of the application. Those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the application. The scope of the application should, therefore, be construed in accordance with the appended claims.

Claims

1. A voice data acquisition method is applied to voice interaction equipment and comprises the following steps:

in the trusted execution environment, the voice data in the exclusive memory is packaged into a voice data packet by utilizing a preset key packet, and the voice data packet is written into a shared memory; the key package is stored in a read-only memory partition after being encrypted and signed when the voice interaction equipment leaves the factory; the key package is unique to the voice interaction device;

reading the voice data packet from the shared memory in an untrusted execution environment independent of the trusted execution environment, and sending the voice data packet to a server, wherein the shared memory is a memory shared by the trusted execution environment and the untrusted execution environment;

during the use of the key package, the key package is dynamically updated, and the updating process comprises the following steps: in the trusted execution environment, adding a random number to a current key package based on the key package update request, packaging the current key package into an update request package, and writing the update request package into a shared memory; the random number is used for verifying the received key update response packet;

reading the update request packet from the shared memory and sending the update request packet to a server in an untrusted execution environment independent of the trusted execution environment;

in the trusted execution environment, reading the key update response packet from the shared memory, and loading the key update response packet into the exclusive memory;

2. The method of claim 1, further comprising:

in the non-trusted execution environment, reading the key pack pre-stored in a read-only memory partition, and writing the key pack into the shared memory;

and in the trusted execution environment, reading the key pack from the shared memory and loading the key pack into the exclusive memory.

3. The method of claim 2, further comprising:

and in the trusted execution environment, decrypting and signature verification are carried out on the key package loaded into the exclusive memory by utilizing an encryption key and a signature key preset in the trusted execution environment so as to acquire the encryption key, the signature key and the key ID in the key package.

4. The method of claim 3, wherein in the trusted execution environment, encapsulating the voice data in the exclusive memory into voice data packets using a predetermined key pack, and writing the voice data packets into a shared memory comprises:

compressing the voice data in the trusted execution environment;

in the trusted execution environment, a signature is generated for the compressed voice data by using the encryption key and the signature key in the key package, and encryption is carried out to generate the voice data package.

5. The method of claim 4, wherein in the trusted execution environment, encapsulating the voice data in the exclusive memory into voice data packets using a predetermined key pack, and writing the voice data packets into a shared memory further comprises:

in the trusted execution environment, the key ID is added to the voice data packet.

6. The method of claim 4, wherein in the trusted execution environment, encapsulating the voice data in the exclusive memory into voice data packets using a predetermined key pack, and writing the voice data packets into a shared memory further comprises:

in the trusted execution environment, a random number is added to the compressed voice data.

7. A playing method of voice data is applied to voice interaction equipment and comprises the following steps:

in the trusted execution environment, voice data in legal voice response packets are obtained;

the key package is stored in a read-only memory partition after being encrypted and signed when the voice interaction equipment leaves the factory; the key package is unique to the voice interaction device;

8. The method of claim 7, wherein verifying the legitimacy of the voice response package with a preset key package in the trusted execution environment comprises:

searching a key package corresponding to the key ID according to the key ID of the voice response package in the trusted execution environment;

and in the trusted execution environment, decrypting and signature authentication are carried out on the voice response package by utilizing the encryption key and the signature key in the key package.

9. The method of claim 8, wherein verifying the legitimacy of the voice response package with a preset key package in the trusted execution environment further comprises:

and in the trusted execution environment, verifying the validity of the voice response packet by using a preset random number.

10. The method of claim 7, wherein in the trusted execution environment, obtaining voice data in a legitimate voice response package comprises:

extracting compressed voice data from legal voice response packets in the trusted execution environment;

in the trusted execution environment, decompressing the compressed voice data.

11. A method for updating a key package is applied to a device side and comprises the following steps:

in a trusted execution environment, triggering a key pack update request based on a key refreshing time interval in a current key pack and a preset secure clock; the current key package is stored in a read-only memory partition after being encrypted and signed when the equipment end leaves the factory; the key package is unique to the device side;

during the use of the key package, the key package is dynamically updated, and the updating process comprises the following steps: in the trusted execution environment, adding a random number to the current key package based on the key package update request, packaging the current key package into an update request package, and writing the update request package into a shared memory; the random number is used for verifying the received key update response packet;

12. The method of claim 11, wherein in the trusted execution environment, encapsulating the current keybag into an update request bag based on the keybag update request, and writing the update request bag to shared memory comprises:

and in the trusted execution environment, generating a signature for the current key package by using the encryption key and the signature key in the current key package and encrypting the signature to generate the update request package.

13. The method of claim 12, wherein in the trusted execution environment, encapsulating the current keybag into an update request bag based on the keybag update request, and writing the update request bag to shared memory further comprises:

and in the trusted execution environment, adding a key ID preset in the current key package to the update request package.

14. The method of claim 12, wherein verifying the legitimacy of the keyupdate response package with the current keypackage in the trusted execution environment comprises:

and in the trusted execution environment, decrypting and signature authenticating the key update response package by utilizing the encryption key and the signature key in the current key package.

15. The method of claim 11, wherein verifying, in the trusted execution environment, the legitimacy of the key update response package with the current key package further comprises:

and in the trusted execution environment, verifying the validity of the key update response packet by using the random number.

16. An apparatus, comprising:

a processor;

a memory for storing one or more programs;

the one or more programs, when executed by the processor, cause the processor to perform the method of any of claims 1-15.

17. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to implement the method of any of claims 1-15.