Disclosure of Invention
Aiming at the technical problems existing in the prior art, embodiments of the present application provide a virtual digital person driving method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a virtual digital person driving method, including:
acquiring audio data, and obtaining weights of a plurality of first expression bases in a first expression base system based on the audio data;
obtaining weights of a plurality of second expression bases in a second expression base system based on a mapping relationship between the first expression bases in the first expression base system and the second expression bases in the second expression base system and on the weights of the plurality of first expression bases, and driving the virtual digital person using the weights of the plurality of second expression bases, wherein the second expression base system is the ARKit expression base system, and the number of first expression bases in the first expression base system is greater than the number of second expression bases in the second expression base system.
In a second aspect, an embodiment of the present application further provides a virtual digital person driving apparatus, including:
a computing unit, configured to acquire audio data and obtain weights of a plurality of first expression bases in a first expression base system based on the audio data;
a driving unit, configured to obtain weights of a plurality of second expression bases in a second expression base system based on a mapping relationship between the first expression bases in the first expression base system and the second expression bases in the second expression base system and on the weights of the plurality of first expression bases, and to drive the virtual digital person using the weights of the plurality of second expression bases, wherein the second expression base system is the ARKit expression base system, and the number of first expression bases in the first expression base system is greater than the number of second expression bases in the second expression base system.
In a third aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the virtual digital person driving method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the virtual digital person driving method as described in the first aspect.
In this scheme, the weights of the plurality of first expression bases in the first expression base system are obtained based on the audio data, and these weights are then converted into the weights of the plurality of second expression bases in the second expression base system. Because the weights of the second expression bases are derived through the finer-grained first expression base system rather than obtained from the audio data directly, the accuracy of the resulting second-expression-base weights can be improved, the accuracy of the virtual digital person's facial driving can be improved, and the facial expression is more natural.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
Referring to fig. 1, a flow chart of a virtual digital person driving method according to an embodiment of the present application is shown, where the virtual digital person driving method includes:
S10, acquiring audio data, and obtaining weights of a plurality of first expression bases in a first expression base system based on the audio data;
in this embodiment, it should be noted that the audio data may be original audio data or audio data converted from text. The first expression base system is another set of expression base systems that is distinct from the second expression base system (i.e., ARKit expression base systems) and that contains a number of first expression bases that is greater than 52 (i.e., the number of ARKit expression bases). That is, the first expression base system more finely divides the facial expression than the ARKit expression base system. 62 first expression bases of the first expression base system are shown in fig. 2. When the weights of the plurality of first expression groups in the first expression group system are obtained based on the audio data, the weights of the plurality of first expression groups in the first expression group system can be obtained based on the audio data, or the weights of one part of the first expression groups in the first expression group system can be obtained based on the audio data, and then the weights of the other part of the first expression groups in the first expression group system can be deduced based on the weights of the one part of the first expression groups in the first expression group system, wherein the weights of the plurality of first expression groups in the first expression group system, the weights of the one part of the first expression groups in the first expression group system and the weights of the other part of the first expression groups in the first expression group system can be generated through a pre-trained prediction model. For example, 37 first expression groups marked in fig. 2 may be the aforementioned part of first expression groups, 25 first expression groups not marked in fig. 2 may be the aforementioned another part of first expression groups, and after calculating weights of the 37 first expression groups marked in fig. 2, weights of the 25 first expression groups not marked in fig. 
2 may be calculated based on the weights of the 37 first expression groups marked in fig. 2, and weights of the 37 first expression groups marked in fig. 2 may be recalculated or not recalculated.
S11, obtaining weights of a plurality of second expression bases in a second expression base system based on a mapping relationship between the first expression bases in the first expression base system and the second expression bases in the second expression base system and on the weights of the plurality of first expression bases, and driving the virtual digital person using the weights of the plurality of second expression bases, wherein the second expression base system is the ARKit expression base system, and the number of first expression bases in the first expression base system is greater than the number of second expression bases in the second expression base system.
In fig. 2, the ARKit expression base tongueOut (sticking the tongue out) in the last line has no corresponding first expression base, so its weight does not need to be calculated and may simply be set to 0. Except for the last line, each line of data represents the mapping relationship between at least one first expression base and one ARKit expression base: after the weights of the plurality of first expression bases are calculated, the weights of the plurality of ARKit expression bases are calculated based on these mapping relationships, and the weights of the ARKit expression bases are then used to drive the virtual digital person. For example, the four first expression bases mouth_fuel_dl (upper-left lip down), mouth_fuel_dr (upper-right lip down), mouth_fuel_ul (lower-left lip up), and mouth_fuel_ur (lower-right lip up) map to the ARKit expression base mouthFunnel (mouth slightly open with both lips funneled outward), so the weight of mouthFunnel is calculated from the weights of mouth_fuel_dl, mouth_fuel_dr, mouth_fuel_ul, and mouth_fuel_ur. In this way, the weights of all ARKit expression bases in fig. 2 can be obtained and used to drive the virtual digital person.
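The mapping structure described above can be sketched as a small lookup table. This is a hypothetical illustration: `ARKIT_MAPPING` and `map_weights` are names invented here, only two of the 52 ARKit entries are shown, and a plain average stands in for the weighted combination the text defines with the formula for γ_Q; the full table would come from fig. 2.

```python
# Hypothetical sketch of the first-to-second expression-base mapping.
# Only two ARKit entries are shown; tongueOut has no mapped first base.
ARKIT_MAPPING = {
    "mouthFunnel": ["mouth_fuel_dl", "mouth_fuel_dr",
                    "mouth_fuel_ul", "mouth_fuel_ur"],
    "tongueOut": [],  # no corresponding first expression base -> weight 0
}

def map_weights(first_weights, mapping):
    """Convert first-expression-base weights to ARKit-base weights.

    first_weights: dict mapping first-base name -> weight.
    A plain average is used here as a stand-in for the weighted sum
    of mapped first-base weights described in the text.
    """
    arkit_weights = {}
    for arkit_base, first_bases in mapping.items():
        if not first_bases:
            arkit_weights[arkit_base] = 0.0  # e.g. tongueOut
        else:
            ws = [first_weights[b] for b in first_bases]
            arkit_weights[arkit_base] = sum(ws) / len(ws)
    return arkit_weights
```

Keeping the mapping as data rather than code makes it easy to swap in the full 52-entry table from fig. 2 without changing the conversion logic.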
According to the virtual digital person driving method provided by the embodiment of the application, the weights of the plurality of first expression bases in the first expression base system are obtained based on the audio data and then converted into the weights of the plurality of second expression bases in the second expression base system. This improves the accuracy of the obtained second-expression-base weights, improves the accuracy of the virtual digital person's facial driving, and makes the facial expression more natural.
Based on the foregoing method embodiment, the obtaining of the weights of the plurality of second expression bases in the second expression base system, based on the mapping relationship between the first expression bases in the first expression base system and the second expression bases in the second expression base system and on the weights of the plurality of first expression bases, may include:
calculating the weight γ_Q of the second expression base Q, where the calculation formula is: γ_Q = Σ_{i=1}^{N} (a_i × γ_Pi);
wherein N represents the number of first expression bases mapped to the second expression base Q, γ_Pi represents the weight of the i-th first expression base Pi mapped to the second expression base Q, and a_i represents the weight coefficient corresponding to γ_Pi; a_i is determined from the weights of the first expression bases mapped to the second expression base Q.
Based on the foregoing method embodiment, the calculation formula of a_i may be: a_i = λ_i / (Σ_{i=1}^{N} λ_i);
wherein λ_i = (1/(√(2π) × δ)) × exp(-(γ_Pi - μ)² / (2 × δ)), μ is the mean of the weights of the first expression bases mapped to the second expression base Q, and δ is the variance of the weights of the first expression bases mapped to the second expression base Q.
On the basis of the foregoing method embodiment, the driving of the virtual digital person with the weights of the plurality of second expression bases may include:
optimizing the weight γ_Q of the second expression base Q, where the optimization formula is: γ_QQ = 1/(1 + exp(-5 × γ_Q)) - 0.5;
wherein γ_QQ is the optimized weight of the second expression base Q;
and driving the virtual digital person using the optimized weight of the second expression base Q.
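The optimization step is a shifted sigmoid and is straightforward to implement; a minimal sketch follows (the function name is an assumption). It maps γ_Q = 0 exactly to 0 and increases monotonically, compressing large weights into a smooth S-curve.

```python
import math

def optimize_weight(gamma_q):
    """Remap a second-expression-base weight with the optimization formula
    gamma_QQ = 1 / (1 + exp(-5 * gamma_Q)) - 0.5.
    Maps 0 to 0 and is monotonically increasing.
    """
    return 1.0 / (1.0 + math.exp(-5.0 * gamma_q)) - 0.5
```

Each ARKit-base weight would be passed through this remapping before being sent to the renderer that animates the virtual digital person.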
Referring to fig. 3, a schematic structural diagram of a virtual digital person driving apparatus according to an embodiment of the present application is shown, where the apparatus includes:
a computing unit 30, configured to acquire audio data and obtain weights of a plurality of first expression bases in a first expression base system based on the audio data;
a driving unit 31, configured to obtain weights of a plurality of second expression bases in a second expression base system based on a mapping relationship between the first expression bases in the first expression base system and the second expression bases in the second expression base system and on the weights of the plurality of first expression bases, and to drive the virtual digital person using the weights of the plurality of second expression bases, wherein the second expression base system is the ARKit expression base system, and the number of first expression bases in the first expression base system is greater than the number of second expression bases in the second expression base system.
According to the virtual digital person driving apparatus provided by the embodiment of the application, the weights of the plurality of first expression bases in the first expression base system are obtained based on the audio data and then converted into the weights of the plurality of second expression bases in the second expression base system. This improves the accuracy of the obtained second-expression-base weights, improves the accuracy of the virtual digital person's facial driving, and makes the facial expression more natural.
On the basis of the foregoing device embodiments, the driving unit may specifically be configured to:
calculating the weight γ_Q of the second expression base Q, where the calculation formula is: γ_Q = Σ_{i=1}^{N} (a_i × γ_Pi);
wherein N represents the number of first expression bases mapped to the second expression base Q, γ_Pi represents the weight of the i-th first expression base Pi mapped to the second expression base Q, and a_i represents the weight coefficient corresponding to γ_Pi; a_i is determined from the weights of the first expression bases mapped to the second expression base Q.
Based on the foregoing apparatus embodiment, the calculation formula of a_i may be: a_i = λ_i / (Σ_{i=1}^{N} λ_i);
wherein λ_i = (1/(√(2π) × δ)) × exp(-(γ_Pi - μ)² / (2 × δ)), μ is the mean of the weights of the first expression bases mapped to the second expression base Q, and δ is the variance of the weights of the first expression bases mapped to the second expression base Q.
The implementation process of the virtual digital person driving apparatus provided by the embodiment of the application is consistent with that of the virtual digital person driving method described above, and it achieves the same effects; details are not repeated here.
As shown in fig. 4, an electronic device provided in an embodiment of the present application includes: a processor 40, a memory 41 and a bus 42, said memory 41 storing machine readable instructions executable by said processor 40, said processor 40 and said memory 41 communicating via the bus 42 when the electronic device is running, said processor 40 executing said machine readable instructions to perform the steps of a virtual digital person driving method as described above.
Specifically, the memory 41 and the processor 40 may be general-purpose memory and a general-purpose processor, which are not specifically limited here; the virtual digital person driving method described above can be performed when the processor 40 runs a computer program stored in the memory 41.
Corresponding to the above-mentioned virtual digital person driving method, the embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, performs the steps of the above-mentioned virtual digital person driving method.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily appreciate variations or alternatives within the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.