CN115237785B - Test method, electronic device and storage medium of speech synthesis system - Google Patents

Test method, electronic device and storage medium of speech synthesis system

Info

Publication number: CN115237785B
Application number: CN202210906435.7A
Authority: CN (China)
Prior art keywords: test, synthesis system, target, test case, information
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN115237785A (en)
Inventors: 孙奥兰, 王健宗
Current assignee: Ping An Technology Shenzhen Co Ltd
Original assignee: Ping An Technology Shenzhen Co Ltd
Events: application filed by Ping An Technology Shenzhen Co Ltd; priority to CN202210906435.7A; publication of CN115237785A; application granted; publication of CN115237785B; anticipated expiration

Abstract

The embodiments relate to the technical field of artificial intelligence, and in particular to a test method of a speech synthesis system, an electronic device, and a storage medium. The test method comprises: obtaining system attribute data of the speech synthesis system to be tested from an application end; screening a preliminary test template from a preset first test database according to source information; filling the preliminary test template according to version type information to obtain target test data; screening test scheme data from a preset second test database according to the target test data; obtaining initial test cases according to the test scheme data; screening the initial test cases according to preset screening conditions to obtain target test cases; and testing the speech synthesis system according to the target test cases. The embodiments of the application can accelerate the test progress of the speech synthesis system and improve its test efficiency.

Description

Test method of speech synthesis system, electronic device and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method for testing a speech synthesis system, an electronic device, and a storage medium.
Background
Software testing refers to the process of verifying and validating software products, including staged products.
In the related art, a test plan and a test scheme must be manually formulated for the software product to be tested. As the functions to be tested increase, testing the software product in this manner slows down its test progress.
Disclosure of Invention
The main purpose of the disclosed embodiments is to provide a test method, an electronic device and a storage medium for a speech synthesis system, which can accelerate the test progress of the speech synthesis system and improve the test efficiency of the speech synthesis system.
To achieve the above object, a first aspect of an embodiment of the present disclosure provides a method for testing a speech synthesis system, including:
Acquiring system attribute data of a voice synthesis system to be tested from an application end, wherein the voice synthesis system runs on the application end, and the system attribute data comprises source information of the application end and version type information of the voice synthesis system;
Screening a preliminary test template from a preset first test database according to the source information;
Filling the preliminary test template according to the version type information to obtain target test data;
screening test scheme data from a second preset test database according to the target test data;
obtaining an initial test case according to the test scheme data;
screening the initial test cases according to preset screening conditions to obtain target test cases;
and testing the voice synthesis system according to the target test case.
In some embodiments, the test plan data includes a test type, the test type including a functional test;
The obtaining the initial test case according to the test scheme data comprises the following steps:
If the test type is the functional test, acquiring test scene information of the voice synthesis system;
And screening the initial test case from a preset test case database according to the test scene information, or inputting the test scene information into a preset text generation model for text generation to obtain the initial test case.
In some embodiments, the screening conditions include a match pass result, and the test plan data includes test requirement information;
the initial test case is screened according to preset screening conditions to obtain a target test case, which comprises the following steps:
Acquiring test title information of the initial test case;
Obtaining the test requirement information according to the test scheme data;
comparing the test requirement information with the test title information;
If the test requirement information is matched with the test title information, obtaining a matching passing result;
And taking the initial test case as the target test case according to the matching passing result.
In some embodiments, the screening conditions include an alignment pass result;
the initial test case is screened according to preset screening conditions to obtain a target test case, which comprises the following steps:
acquiring first test audio generated by the voice synthesis system according to the initial test case;
Inputting the first test audio to a preset target voice recognition model for recognition to obtain a test text;
Comparing the test text with the initial test case;
If the test text is consistent with the initial test case, obtaining the comparison passing result;
And taking the initial test case as the target test case according to the comparison passing result.
In some embodiments, the system attribute data further comprises acoustic training parameters;
before the initial test case is screened according to the preset screening conditions to obtain the target test case, the test method further comprises the steps of constructing the target voice recognition model, and specifically comprises the following steps:
Obtaining language knowledge according to a preset language database, wherein the language knowledge comprises acoustic knowledge, phonological knowledge and language priori knowledge;
inputting the acoustic knowledge and the phonological knowledge into an original acoustic model for training to obtain a preliminary acoustic model;
Inputting the language priori knowledge into an original language model for training to obtain a preliminary language model;
Constructing a preliminary voice recognition model according to the preliminary acoustic model and the preliminary language model;
Acquiring voice information, and acquiring voice parameters according to the voice information, wherein the voice parameters are matched with the acoustic training parameters;
And matching the voice parameters with the preliminary voice recognition model to obtain the target voice recognition model.
In some embodiments, the test method further comprises:
if the version type information represents iteration type, acquiring a history test case of the voice synthesis system;
obtaining an actual test case according to the historical test case and the target test case;
and testing the voice synthesis system according to the actual test case.
In some embodiments, the test method further comprises:
Acquiring second test audio generated by the voice synthesis system according to the target test case;
Analyzing the audio content of the second test audio to obtain an analysis result, wherein the analysis result comprises an error result representing that the voice synthesis system has a functional defect;
And generating prompt information for improving the prompt function according to the error result.
In some embodiments, the preliminary test template includes a fill field;
and performing filling operation on the preliminary test template according to the version type information to obtain target test data, wherein the filling operation comprises the following steps:
if the version type information represents iteration type, acquiring historical test data of the voice synthesis system;
Acquiring field attributes of the filling field, wherein the field attributes comprise reusable types;
taking the filling field with the field attribute of the multiplexing type as a field to be processed;
and filling the field to be processed according to the historical test data to obtain the target test data.
To achieve the above object, a second aspect of the embodiments of the present disclosure proposes an electronic device, including:
at least one memory;
At least one processor;
At least one computer program;
The computer program is stored in the memory, and the processor executes the at least one computer program to implement:
A method of testing a speech synthesis system as claimed in any one of the first aspects.
To achieve the above object, a third aspect of the embodiments of the present disclosure proposes a computer-readable storage medium storing computer-executable instructions for causing a computer to execute:
A method of testing a speech synthesis system as claimed in any one of the first aspects.
According to the test method of the speech synthesis system provided by the embodiment of the application, the preliminary test template and the test scheme data can be screened from the first test database and the second test database according to the system attribute data of the speech synthesis system to be tested, so that the target test case used for testing is obtained according to the test scheme data and the screening conditions. The test method can therefore automatically acquire the preliminary test template (i.e., the test plan) and the test scheme data (i.e., the test scheme), and automatically test the speech synthesis system on this basis, avoiding the manual formulation of the test plan and test scheme in the related art. As a result, the test method provided by the embodiment of the application can accelerate the test progress and improve the test efficiency.
Drawings
FIG. 1 is a flow chart of a test method of a speech synthesis system according to an embodiment of the application;
FIG. 2 is another flow chart of a test method of a speech synthesis system according to an embodiment of the present application;
FIG. 3 is another flow chart of a test method of a speech synthesis system according to an embodiment of the present application;
FIG. 4 is another flow chart of a test method of the speech synthesis system according to the embodiment of the application;
FIG. 5 is another flow chart of a test method of a speech synthesis system according to an embodiment of the present application;
FIG. 6 is another flow chart of a test method of a speech synthesis system according to an embodiment of the present application;
FIG. 7 is another flow chart of a test method of a speech synthesis system according to an embodiment of the present application;
FIG. 8 is another flow chart of a test method of a speech synthesis system according to an embodiment of the present application;
FIG. 9 is a block diagram of a test apparatus of a speech synthesis system according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
First, several terms involved in the present application are explained:
Artificial intelligence (AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that can react in a manner similar to human intelligence, covering robotics, speech recognition, image recognition, natural language processing, expert systems, and the like. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Natural language processing (NLP) is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often referred to as computational linguistics. NLP processes, understands, and applies human languages (e.g., Chinese, English, etc.), and includes parsing, semantic analysis, discourse understanding, and the like. It is commonly used in machine translation, recognition of handwritten and printed characters, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, public opinion analysis and opinion mining, and related technical fields, and it involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, linguistic research related to language computation, and the like.
Speech synthesis, also known as Text-to-Speech (TTS), is a technique that converts text information, either generated by the computer itself or input externally, into intelligible, fluent speech output. Speech synthesis technology is mainly divided into a language analysis part and an acoustic system part, also called the front-end part and the back-end part. The acoustic system part generates the corresponding audio according to the linguistic specification provided by the language analysis part, thereby realizing the sounding function. Specifically, the language analysis part includes text input, text structure and language judgment, text normalization, text-to-phoneme conversion, prosody prediction, and the like. Text structure and language judgment determines the language of the input text to be synthesized, for example whether it is Chinese, English, Japanese, etc., then segments the whole text into single sentences according to the grammar rules of the corresponding language, and passes the segmented sentences to the subsequent processing modules. Text normalization normalizes all contents in the text to be synthesized; for example, Arabic numerals or letters in the text need to be converted into characters according to set rules to facilitate the subsequent phonetic transcription work. Text-to-phoneme conversion determines the pronunciation of the text to be synthesized; for example, in Chinese speech synthesis the text is mainly annotated with pinyin, so the text needs to be converted into the corresponding pinyin. When characters are polyphones, their specific reading and tone are determined through word segmentation, part-of-speech and syntactic analysis, and the like. Prosody prediction predicts the prosody of the text, i.e., determines where the synthesized speech needs to pause and for how long, which words need to be stressed, which need to be read lightly, etc., so that the synthesized speech rises and falls naturally and imitates a human voice more realistically.
The acoustic system part mainly has three technical implementations: waveform concatenation speech synthesis, parametric speech synthesis, and end-to-end speech synthesis. Waveform concatenation speech synthesis splices syllables from an existing library to realize the speech synthesis function. Parametric speech synthesis models the spectral characteristic parameters of existing recordings through data-driven methods, constructs a mapping from text sequences to speech features, and generates a parametric synthesizer. End-to-end speech synthesis learns through a neural network and, with an intermediate black-box part, outputs synthesized audio directly from the input text or phonetic characters.
Speech recognition (Automatic Speech Recognition, ASR) is used to convert the lexical content of human speech into computer-readable input, such as keys, binary codes, or character sequences. A speech recognition system comprises four parts: signal processing and feature extraction, an acoustic model, a language model, and decoding search. Signal processing and feature extraction takes the audio signal as input, enhances the speech by removing noise and channel distortion, converts the signal from the time domain to the frequency domain, and extracts suitable representative feature vectors for the acoustic model. The acoustic model integrates knowledge of acoustics and phonology, takes the features produced by the feature extraction part as input, and generates an acoustic model score for the variable-length feature sequence. The language model learns the correlations between words from a training corpus in order to estimate the likelihood of a hypothesized word sequence, which is also called the language model score. Therefore, if prior knowledge of the corresponding domain or of the task is available, the language model score can be effectively improved.
Prior knowledge is knowledge available before experience. It can be used to adjust and optimize the parameter range of a model, or to constrain the model, so as to improve the accuracy of the model's output data. For example, when a license plate is recognized with an image recognition model, the prior knowledge includes the aspect ratio of the license plate, its background color, the color and font of the characters, the number of character lines, the character spacing distribution, the character set to which each character belongs, and the like.
Knowledge mining refers to acquiring information such as entities, new entity links, new association rules and the like from data. The main techniques include linking and disambiguation of entities, knowledge rule mining, knowledge graph representation learning, and the like. Wherein, entity linking and disambiguation is content mining of knowledge; the knowledge rule mining is structure mining, and the representation learning is mining after mapping the knowledge graph to a vector space.
Phonetics studies pronunciation mechanisms, the acoustic characteristics of speech, the rules of sound change in speaking, and so on. Its objects of study include vowels, consonants, tones, accents, rhythm, pitch variation, prosody, and the like.
Linguistics is the study of human language, covering language structure, word formation, syntax, semantics, and the like.
Software testing is the process of auditing or comparing actual output with expected output, and is used to identify the correctness, integrity, security, quality, etc. of the software to be tested. By development stage, software testing can be categorized into unit testing, integration testing, and system testing. Unit testing, also called module testing, targets the smallest unit of software design. Its purpose is to examine whether each program unit correctly fulfills the function, performance, interface, and design-constraint requirements in the detailed design specification, and to discover errors that may exist inside each module. Unit tests are designed from the internal structure of the program, and multiple modules can be unit-tested independently in parallel. Integration testing, also called assembly testing, is usually performed on the basis of unit testing and is a sequential, incremental testing process over all program modules. Integration testing verifies the interface relationships among program units or components, gradually integrating them into a component or overall system that meets the requirements of the outline design. System testing checks, under a real running environment, whether the complete program system can be correctly configured and connected with the system (including hardware, peripherals, network, system software, support platform, etc.) and finally meets the design requirements.
A test case is a description of the testing task for a specific software product; it embodies the test scheme, method, technique, and strategy. A test case is a set of test inputs, execution conditions, and expected results formulated for a particular purpose, used to verify that the software product to be tested meets a specific software requirement. A test case mainly contains four parts: the case title, the precondition, the test steps, and the expected result. The case title mainly describes which function is being tested, the precondition is the condition that must be satisfied before executing the case, the test steps describe the operation steps of the test case, and the expected result is the result that satisfies the expected requirement.
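As a concrete illustration of the four contents listed above, the following minimal sketch models a test case as a data structure; the field names and example values are hypothetical and are not prescribed by this application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TestCase:
    """Minimal model of a test case: title, precondition, steps, expected result."""
    title: str                       # which function the case tests
    precondition: str                # condition that must hold before execution
    steps: List[str] = field(default_factory=list)  # operation steps of the test
    expected_result: str = ""        # the result that satisfies the requirement

# Example instance for a speech synthesis functional test
case = TestCase(
    title="text correctly synthesized into audio",
    precondition="speech synthesis system is deployed on the application end",
    steps=["input the Chinese text to the application end", "collect the synthesized audio"],
    expected_result="the audio content matches the input text",
)
```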
A test plan is document data at the organizational management level. It specifies and constrains the scope, organization, resources, and principles of the whole software testing process, makes task assignments and time schedules for each stage of the testing process, and provides assessment, risk analysis, and management requirements for each task.
A test scheme is a further refinement and definition of the test plan and is document data at the technical level. The test scheme describes the characteristics of the software product to be tested, the test methods, the planning of the test environment, the design and selection of test tools, the design method of the test cases, the design of the test code, and the like.
Functional testing (Functional testing), also known as behavioral testing, tests the characteristics and operational behavior of the software product to be tested to determine that it meets design requirements based on the characteristics, operational descriptions, and user context of the software product to be tested.
Random testing (Ad-hoc testing) re-tests the important functions of the software product to be tested; such spot checks of the product's functions and performance are an effective way to guarantee the completeness of test coverage.
Interface testing tests the interfaces between the components of a system. It is mainly used to detect the interaction points between external systems and the software product to be tested, and between subsystems within the software product to be tested. The key points of the test are to check the exchange, transfer, and control management of data, the logical dependency relationships between systems, etc.
Performance testing refers to the process of simulating normal and peak load access to a software product to be tested by an automatic testing tool or a code means so as to observe whether each performance index of the software product to be tested is qualified. The performance index includes response time, throughput, resource utilization, error rate, etc.
Security testing, which is the process of verifying the security services of the software products to be tested and identifying potential security flaws, includes user management and access control testing, communication and data encryption testing, data backup and recovery testing, and the like.
Regression testing refers to re-testing the software product after an old version has been modified to obtain a new version, in order to confirm that the modification has neither introduced new errors nor caused errors in other, unmodified functions.
In the related art, a test plan and a test scheme must be manually formulated for the software product to be tested. As the functions to be tested increase, testing the software product in this manner slows down its test progress.
Based on the above, the embodiment of the application provides a test method, electronic equipment and storage medium of a voice synthesis system, which can automatically test the voice synthesis system, thereby accelerating the test progress to a certain extent and further improving the test efficiency of the voice synthesis system.
The embodiment of the application provides a test method of a voice synthesis system, electronic equipment and a storage medium, and specifically, the test method of the voice synthesis system in the embodiment of the application is described firstly by describing the following embodiment.
The embodiment of the application can acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The embodiment of the application provides a test method of a voice synthesis system, relates to the technical field of artificial intelligence, and particularly relates to the technical field of software test. The test method of the voice synthesis system provided by the embodiment of the application can be applied to the terminal, can be applied to the server side, and can also be software running in the terminal or the server side. In some embodiments, the terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart watch, or the like, the server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and basic cloud computing services such as big data and an artificial intelligence platform, and the software may be an application that implements a test method of a speech synthesis system, or the like, but is not limited to the above form.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In a first aspect, referring to fig. 1, an embodiment of the present application provides a test method of a speech synthesis system, where the test method includes, but is not limited to, steps S110 to S170.
S110, acquiring system attribute data of a voice synthesis system to be tested from an application end, wherein the voice synthesis system is operated at the application end, and the system attribute data comprises source information of the application end and version type information of the voice synthesis system;
It can be understood that the speech synthesis system to be tested is the software product to be tested in the embodiment of the present application, and may be deployed on an application end such as a terminal device APP end or a WEB page end. The system attribute data of the speech synthesis system is acquired to determine its basic attribute information. The system attribute data includes source information representing the application end on which the speech synthesis system is deployed and version type information of the speech synthesis system. For example, version 1.0 indicates that the speech synthesis system is developed for the first time, while version 2.0 indicates iterative development. It will be appreciated that versions 1.0 and 2.0 are merely exemplary; an iterative development version such as 1.1 may also be set according to actual needs, which is not specifically limited in the embodiment of the present application.
It will be appreciated that the terminal device may be a mobile terminal device or a non-mobile terminal device. The mobile terminal device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal device, a wearable device, an ultra mobile personal computer, a netbook, a personal digital assistant, a CPE, a UFI (wireless hotspot device), or the like. The non-mobile terminal device may be a personal computer, a television, a teller machine, a self-service machine, or the like. The embodiment of the present application is not particularly limited.
S120, screening out a preliminary test template from a preset first test database according to source information;
It will be appreciated that when the software product to be tested is loaded on different application ends, the emphasis on performing the software test will be different. For example, when the application terminal is a terminal device APP terminal, the testing key points comprise a terminal device model matching test, a system compatibility test, a random test, a performance test and the like, and when the application terminal is a WEB page WEB terminal, the testing key points comprise a browser compatibility test, a website availability test, a concurrent performance test and the like.
Specifically, a first test database is pre-built according to historical software test data. The first test database comprises a plurality of test templates and source information corresponding to each test template. For example, when the source information includes a terminal device APP end and a WEB page WEB end, the first test database includes a test template corresponding to the terminal device APP end and a test template corresponding to the WEB page WEB end. And acquiring a test template corresponding to the application end loaded by the voice synthesis system from the first test database according to the acquired source information, and taking the test template as a preliminary test template. It will be appreciated that the test templates are used to characterize test plans in software testing.
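As an illustrative, non-limiting sketch of step S120, the first test database could be represented as a simple mapping from source information to test templates; all names and contents below are hypothetical and not prescribed by this application.

```python
# Hypothetical first test database: source information -> test template (test plan outline)
FIRST_TEST_DATABASE = {
    "APP": {"focus": ["device model matching", "system compatibility", "random test", "performance test"]},
    "WEB": {"focus": ["browser compatibility", "website availability", "concurrent performance"]},
}

def screen_preliminary_template(source_info: str) -> dict:
    """Step S120: pick the test template that corresponds to the application end."""
    try:
        return FIRST_TEST_DATABASE[source_info]
    except KeyError:
        raise ValueError(f"no test template registered for source {source_info!r}")

# preliminary_template = screen_preliminary_template("APP")
```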
S130, filling the preliminary test template according to version type information to obtain target test data;
It is understood that the preliminary test template obtained according to step S120 is only a generic template corresponding to the source information. Therefore, in order to match with the current speech synthesis system to be tested, the preliminary test template needs to be filled with content according to specific information of the speech synthesis system.
Specifically, whether the speech synthesis system to be tested is developed for the first time or iteratively is determined according to the version type information. When the speech synthesis system is determined to be developed for the first time, the filling information can be obtained by means such as user input or automatic identification of development project files; when it is determined to be developed iteratively, the filling information can also draw on the information of the previous iteration version. The preliminary test template is filled according to the filling information to obtain the target test data, as sketched below.
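A minimal sketch of step S130 under the assumption that the template and the filling information are plain dictionaries; the function and parameter names are illustrative only.

```python
from typing import Optional

def fill_template(template: dict, version_type: str,
                  fill_info: dict, history_data: Optional[dict] = None) -> dict:
    """Step S130: fill the preliminary test template to obtain the target test data."""
    target_test_data = dict(template)           # start from the generic template
    target_test_data.update(fill_info)          # user input / parsed development project files
    if version_type == "iteration" and history_data:
        # an iterative version may also reuse information from the previous iteration
        for key, value in history_data.items():
            target_test_data.setdefault(key, value)
    return target_test_data
```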
S140, screening test scheme data from a second preset test database according to target test data;
It will be appreciated that the test plan data is used to characterize a test plan in a software test, the test plan including a test type of the test, the test type including a functional test, a random test, an interface test, a performance test, a security test, a regression test, and the like. As can be seen from the above description, the test plan is technical layer document data corresponding to the test plan, so that the test plan data corresponding to the target test data can be selected from the second test database constructed in advance according to the target test data. For example, the target test data indicates that the speech synthesis system to be tested only involves expansion or contraction of the cluster, and at this time, the corresponding requirement verification can be completed only by performing performance test on the speech synthesis system. Thus, a test plan (i.e., test plan data) including the performance test is screened from the second test database based on the target test data. Or when the target test data indicate that the speech synthesis system to be tested is developed iteratively, screening test scheme data comprising a safety test, a random test and a regression test from the second test database according to the target test data.
S150, obtaining an initial test case according to the test scheme data;
It can be appreciated that the test scheme is used to describe a design method of the test case, so that an initial test case matched with the speech synthesis system to be tested can be obtained according to the test scheme data. For example, when the screened test scheme data only includes a functional test, only an initial test case for performing the functional test is acquired or constructed to accelerate the test progress of the speech synthesis system.
S160, screening the initial test cases according to preset screening conditions to obtain target test cases;
It can be understood that, according to step S150, a plurality of initial test cases are obtained or constructed, and in order to ensure that the test cases loaded on the speech synthesis system to be tested meet the test requirements, match the functions of the speech synthesis system, and the like, the plurality of initial test cases are also required to be screened. Specifically, according to preset screening conditions of the test requirements, functions and the like of the voice synthesis system to be tested, taking an initial test case meeting the screening conditions as a target test case.
S170, testing the voice synthesis system according to the target test case.
It can be understood that the target test case obtained by screening is used as the input data set of the speech synthesis system, so as to realize the software test of the speech synthesis system. Specifically, taking a functional test as an example, suppose the function of the speech synthesis system to be tested is to convert Chinese text into Chinese speech. In this case, the target test case is in a text format containing Chinese content; the target test case is input to the application end on which the speech synthesis system is deployed, and the output data of the application end is compared with the target test case, thereby realizing the functional test of the speech synthesis system.
It may be understood that the test method of the speech synthesis system provided by the embodiment of the present application may be any one of a unit test, an integration test, and a system test of the speech synthesis system, which is not specifically limited in the embodiment of the present application. In addition, during the testing of the speech synthesis system, the test method provided by the embodiment of the present application can automatically generate test report data according to the generated target test data, the test scheme data, the target test cases, the test results obtained when testing according to the target test cases, and the like. For example, a test report template is preset, and a filling operation is performed on the test report template according to the target test data, the test scheme data, the target test cases, the test results, and the like, so as to obtain the test report data.
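For the automatic test report mentioned above, a sketch under the assumption that the report template is a plain text template with named placeholders; the template contents and names are illustrative assumptions.

```python
from string import Template

# Hypothetical test report template with named placeholders
REPORT_TEMPLATE = Template(
    "Target test data: $target_test_data\n"
    "Test scheme: $test_scheme\n"
    "Number of target test cases: $case_count\n"
    "Test result: $result\n"
)

def build_test_report(target_test_data, test_scheme, target_cases, result) -> str:
    """Fill the report template with the data produced during the test."""
    return REPORT_TEMPLATE.substitute(
        target_test_data=target_test_data,
        test_scheme=test_scheme,
        case_count=len(target_cases),
        result=result,
    )
```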
According to the test method of the speech synthesis system provided by the embodiment of the application, the preliminary test template and the test scheme data can be screened from the first test database and the second test database according to the system attribute data of the speech synthesis system to be tested, so that the target test case used for testing is obtained according to the test scheme data and the screening conditions. The test method can therefore automatically acquire the preliminary test template (i.e., the test plan) and the test scheme data (i.e., the test scheme), and automatically test the speech synthesis system on this basis, avoiding the manual formulation of the test plan and test scheme in the related art. As a result, the test method provided by the embodiment of the application can accelerate the test progress and improve the test efficiency.
Referring to FIG. 2, in some embodiments, the test plan data includes test types, including functional tests. Step S150 includes, but is not limited to, sub-steps S210 through S220.
S210, if the test type is a functional test, acquiring test scene information of a voice synthesis system;
It will be appreciated that the test scheme data is used to characterize a test scheme in a software test; the test scheme includes the test type, and the test type includes a functional test, a random test, an interface test, a performance test, a security test, a regression test, and the like. When the test type of the speech synthesis system to be tested is determined to be a functional test according to the test scheme data, the test scene information of the speech synthesis system is obtained by means such as user input or automatic identification of development project files. When the speech synthesis system to be tested is applied to a specific scene, the test scene information corresponds to that scene; for example, when the speech synthesis system is applied to an insurance scene, the test scene information includes insurance-related keyword information such as insurance recommendation, insurance introduction, and insurance purchase. When the speech synthesis system to be tested is applied to a non-specific scene, the test scene information includes keyword information for a plurality of different scenes.
It can be understood that the test scenario information may be in an excel file format, that is, a plurality of keyword information is written into the excel file, and the plurality of keyword information is set in different rows in the excel file, so that the keyword information is conveniently invoked later.
S220, screening out an initial test case from a preset test case database according to the test scene information, or inputting the test scene information into a preset text generation model to generate a text, so as to obtain the initial test case.
It can be appreciated that the embodiment of the application provides two methods of obtaining the initial test case according to the test scene information. In the first method, a test case database is built in advance, which includes keyword information corresponding to different scenes and the text corresponding to each piece of keyword information. For example, the keyword "insurance recommendation" may correspond to a text such as "Hello, what kind of insurance would you like to buy? Do you need accident coverage or illness coverage?". Several texts are screened from the test case database according to the keyword information in the excel file and used as initial test cases. It is understood that a keyword may correspond to one text or to a plurality of texts, which is not specifically limited in the embodiment of the present application.
Secondly, a text generation model capable of generating random text according to the keyword information is built in advance, the keyword information in an excel file is used as input data of the text generation model, and output data of the text generation model is used as an initial test case. It is to be understood that the text generation model may be a GAN network model or other network models, which is not specifically limited in this embodiment of the present application.
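Both routes described above can be sketched as follows; reading the keywords with pandas and treating the text generation model as a plain callable are assumptions made for illustration and are not specified in this application.

```python
import pandas as pd

# Hypothetical test case database: scenario keyword -> candidate texts
TEST_CASE_DATABASE = {
    "insurance recommendation": [
        "Hello, what kind of insurance would you like to buy? Do you need accident or illness coverage?"
    ],
}

def load_keywords(excel_path: str) -> list:
    """Read scenario keywords written one per row in the excel file."""
    frame = pd.read_excel(excel_path, header=None)
    return frame.iloc[:, 0].dropna().astype(str).tolist()

def initial_cases_from_database(keywords: list) -> list:
    """Route 1: screen texts out of the pre-built test case database."""
    cases = []
    for keyword in keywords:
        cases.extend(TEST_CASE_DATABASE.get(keyword, []))
    return cases

def initial_cases_from_generator(keywords: list, text_generator) -> list:
    """Route 2: feed keywords to a pre-trained text generation model (e.g. a GAN)."""
    return [text_generator(keyword) for keyword in keywords]
```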
Referring to FIG. 3, in some embodiments, the screening criteria include a match pass result and the test plan data includes test requirement information. Step S160 includes, but is not limited to, sub-steps S310 through S350.
S310, acquiring test title information of an initial test case;
It will be appreciated that the initial test case includes a case title (i.e., the test title information) that describes which function is tested. Specifically, when the initial test case is acquired according to the first method described above, the test case database includes the text corresponding to each piece of keyword information and the test title information corresponding to that text, for example titles such as "text correctly synthesized into audio" and "text with added spaces synthesized into the specified audio". The text under "text correctly synthesized into audio" is used to test the text synthesis function of the speech synthesis system, while the text under "text with added spaces synthesized into the specified audio" contains special characters such as spaces and is used to test the speech synthesis system's ability to recognize special characters. It will be appreciated that one piece of test title information may correspond to a plurality of texts; for example, a plurality of texts may be provided for testing the special-character recognition function of the speech synthesis system.
When the initial test case is obtained according to the second method, whether the initial test case includes special characters can be determined using technologies such as OCR (Optical Character Recognition), so that the test title information corresponding to the initial test case can be determined.
S320, obtaining test requirement information according to the test scheme data;
it can be understood that the test plan data includes test requirement information of the speech synthesis system to be tested, and the test requirement information is used for describing the functional points to be tested of the speech synthesis system. For example, the test requirement information includes correctly synthesizing audio, recognizing special characters, and the like.
S330, comparing the test requirement information with the test title information;
It can be understood that the test requirement information is compared with the plurality of test header information in a traversing manner so as to determine whether the test header information corresponding to the test requirement information exists or not, and further determine whether all the plurality of initial test cases cover the functional points to be tested or not.
S340, if the test requirement information is matched with the test title information, a matching passing result is obtained;
It can be understood that the test requirement information is matched and compared with each piece of test title information, and a matching result is obtained. The matching result includes a matching pass result and a matching fail result. It will be appreciated that one piece of test requirement information may match several pieces of test title information; for example, when the test requirement information indicates recognizing special characters, it may match both the title "text with added spaces synthesized into the specified audio" and the title "text with an added hash character (#) synthesized into the specified audio".
S350, taking the initial test case as a target test case according to the matching passing result.
It can be understood that the initial test case with the matching result being the matching passing result is taken as the target test case, and the software test is performed on the voice synthesis system to be tested according to the target test case.
It can be understood that, in order to ensure that the test requirement information is covered completely, when the test requirement information is not matched with the test requirement information in the plurality of test header information, a new test case is obtained according to the test requirement information, and the new test case is used as a target test case, so as to ensure the comprehensiveness of the target test case.
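Steps S310 to S350, together with the completeness check just described, might look like the following sketch; the simple substring match stands in for whatever matching rule is actually used, and the case structure is an assumption.

```python
def select_target_cases(initial_cases: list, requirements: list) -> list:
    """Keep initial cases whose test title matches a requirement; cover uncovered requirements."""
    target_cases, covered = [], set()
    for case in initial_cases:                      # each case carries a "title" and a "text"
        for requirement in requirements:
            if requirement in case["title"]:        # matching pass result
                target_cases.append(case)
                covered.add(requirement)
                break
    for requirement in requirements:
        if requirement not in covered:
            # a new case has to be obtained for every requirement no title matched
            target_cases.append({"title": requirement, "text": f"<new case for: {requirement}>"})
    return target_cases
```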
Referring to fig. 4, in some embodiments, the screening conditions include an alignment pass result. Step S160 includes, but is not limited to, sub-steps S410 through S450.
S410, acquiring first test audio generated by a voice synthesis system according to an initial test case;
S420, inputting a first test audio to a preset target voice recognition model for recognition to obtain a test text;
s430, comparing the test text with the initial test case;
s440, if the test text is consistent with the initial test case, a comparison passing result is obtained;
S450, taking the initial test case as a target test case according to the comparison passing result.
It can be appreciated that in some embodiments, the correctness of the initial test case needs to be determined, that is, the initial test case is used as input data of the speech synthesis system to be tested, and whether the initial test case can be correctly identified and processed by the speech synthesis system is determined according to the output data and the standard data of the speech synthesis system.
Specifically, in step S410 of some embodiments, the initial test case is used as input data of the speech synthesis system to be tested, and the first test audio generated by the speech synthesis system according to the initial test case is obtained.
In step S420 of some embodiments, in order to avoid the impact on the test progress and test accuracy of the speech synthesis system that would result from manually comparing the output data with preset data, the embodiment of the present application pre-constructs a target speech recognition model. The output data of the speech synthesis system to be tested is used as the input data of the target speech recognition model, and the output data of the target speech recognition model is used as the data to be compared (i.e., the test text).
In step S430 of some embodiments, output data (i.e., test text) of the target speech recognition model is compared with standard data (i.e., initial test cases) to determine whether the initial test cases can be correctly recognized by the speech synthesis system.
In step S440 of some embodiments, according to the comparison process of step S430, a corresponding comparison result is generated, where the comparison result includes a comparison passing result and a comparison failing result. If the output data of the target voice recognition model is consistent with the standard data, a comparison passing result for indicating that the corresponding initial test case can be recognized and processed by the voice synthesis system is generated, namely the initial test case is correct. At this time, the initial test case is used as a target test case, so that a real software test is performed on the speech synthesis system according to the target test case in a subsequent operation. It can be understood that "consistent" indicates that the error between the output data of the target speech recognition model and the standard data is within a preset range, and the specific value of the preset range can be adaptively adjusted according to actual needs, which is not particularly limited in the embodiment of the present application.
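Steps S410 to S450 can be sketched as below; the TTS and ASR interfaces and the tolerance measure (a character-level dissimilarity computed with difflib) are illustrative assumptions, not the method prescribed by this application.

```python
import difflib

def character_error(reference: str, hypothesis: str) -> float:
    """Rough dissimilarity between the initial case text and the recognized test text (0 = identical)."""
    return 1.0 - difflib.SequenceMatcher(None, reference, hypothesis).ratio()

def passes_comparison(initial_case: str, tts_system, asr_model, tolerance: float = 0.05) -> bool:
    """S410-S440: synthesize, recognize, and compare within a preset error range."""
    first_test_audio = tts_system.synthesize(initial_case)   # assumed TTS interface
    test_text = asr_model.transcribe(first_test_audio)       # assumed ASR interface
    return character_error(initial_case, test_text) <= tolerance

# target_cases = [case for case in initial_cases if passes_comparison(case, tts, asr)]
```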
Referring to fig. 5, in some embodiments, the system attribute data further includes acoustic training parameters. Before step S420, the test method of the speech synthesis system provided by the embodiment of the present application further includes constructing the target speech recognition model, which specifically includes, but is not limited to, steps S510 to S570.
S510, acquiring language knowledge according to a preset language database, wherein the language knowledge comprises acoustic knowledge, phonological knowledge and language priori knowledge;
It can be understood that speech information and language information of each scene and application field are acquired in advance, and a language database is constructed from them. Signal processing operations and knowledge mining operations are performed on the information in the language database to obtain language knowledge such as acoustic knowledge, phonological knowledge, and language prior knowledge. The speech information includes information related to speech, and the language information includes information related to language. The acoustic knowledge includes knowledge of pause distribution, speech emotion, etc.; the phonological knowledge includes knowledge of phonemes, syllables, etc.; and the language prior knowledge includes dictionary knowledge, grammar knowledge, syntax knowledge, etc. related to the field (or scene).
S520, inputting the acoustic knowledge and the phonological knowledge into an original acoustic model for training to obtain a preliminary acoustic model;
It can be appreciated that the acoustic knowledge and phonological knowledge obtained in the above step are used as training parameters of the original acoustic model, so as to train a plurality of preliminary acoustic models corresponding to different fields or application scenes, and/or a plurality of preliminary acoustic models corresponding to the same scene but trained with different training parameters.
S530, inputting the language priori knowledge into an original language model for training to obtain a preliminary language model;
It can be understood that the language prior knowledge obtained in the above step is used as training parameters of the original language model, so as to train a plurality of preliminary language models corresponding to different fields or application scenes, and/or a plurality of preliminary language models corresponding to the same scene but trained with different training parameters.
S540, constructing a preliminary voice recognition model according to the preliminary acoustic model and the preliminary language model;
It can be appreciated that the plurality of preliminary acoustic models are respectively matched with the preliminary language models applied to the related field (or scene) to construct a plurality of preliminary speech recognition models.
S550, acquiring voice information from a target object;
It will be appreciated that in an actual deployment, the speech synthesis system to be tested will be applied to a specific field or a specific scenario. Therefore, the acoustic training parameters of the speech synthesis system are parameters related to the field (or scene), for example, when the speech synthesis system is deployed on the shopping platform APP of the terminal device, the acoustic training parameters include a Linear Prediction Coefficient (LPC) of the target object (i.e., a speaker such as a customer service), a Mel Frequency Cepstrum Coefficient (MFCC), and other speech characteristic parameters having individual attribute identification, so that the output data of the speech synthesis system matches with the tone, pitch, and the like of the target object of the shopping platform.
It can be appreciated that, in order to ensure the accuracy of the comparison of the test text and the initial test case, the target speech recognition model should be a preliminary speech recognition model with a higher recognition capability for the sound of the target object. Thus, the speech information of the target object is acquired, for example, the audio of the target object is acquired, which is used to represent arbitrary content.
S560, extracting characteristic parameters of the voice information according to the acoustic training parameters to obtain voice parameters;
It will be appreciated that the parameter type of the acoustic training parameters is determined, and characteristic parameter extraction is performed on the voice information according to that parameter type, so as to obtain voice parameters of the same type. For example, when the acoustic training parameters include linear prediction coefficients (LPC), mel-frequency cepstral coefficients (MFCC) and the like, voice parameters such as LPC and MFCC are extracted from the voice information.
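A minimal sketch of this extraction step (S560) is shown below; it assumes the librosa library and a WAV recording of the target object, and the file path and parameter orders are illustrative only.

```python
import librosa
import numpy as np

def extract_voice_parameters(audio_path: str, n_mfcc: int = 13, lpc_order: int = 16) -> dict:
    """Extract LPC and MFCC voice parameters matching the acoustic training parameter types."""
    y, sr = librosa.load(audio_path, sr=None)                 # keep the original sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape: (n_mfcc, frames)
    lpc = librosa.lpc(y, order=lpc_order)                     # LPC coefficients of the utterance
    return {
        "mfcc": np.mean(mfcc, axis=1),  # average over frames for a compact representation
        "lpc": lpc,
    }

# Example usage (hypothetical path):
# params = extract_voice_parameters("target_object_sample.wav")
```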
S570, screening out a target voice recognition model from the preliminary voice recognition models according to the voice parameters.
It can be understood that the voice parameters obtained in the above steps are matched against the corresponding parameters of the plurality of preliminary speech recognition models, so as to obtain the preliminary speech recognition model with the highest matching probability. The preliminary speech recognition model with the highest matching probability is taken as the target speech recognition model.
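One possible way to realize the screening in step S570 is to score each preliminary model by the similarity between its reference voice parameters and the newly extracted ones; the cosine-similarity criterion and the `reference_mfcc` field below are assumptions for illustration, not the patented matching method.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_target_model(preliminary_models: list, voice_params: dict):
    """Return the preliminary speech recognition model whose reference MFCC best matches the target object."""
    best_model, best_score = None, -np.inf
    for model in preliminary_models:
        # each model is assumed to carry the MFCC vector of the speaker it was built for
        score = cosine_similarity(model["reference_mfcc"], voice_params["mfcc"])
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```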
Referring to fig. 6, in some embodiments, the test method of the speech synthesis system provided by the embodiment of the present application further includes, but is not limited to, steps S610 to S630.
S610, if the version type information represents iteration type, acquiring a history test case of the voice synthesis system;
It can be understood that when the speech synthesis system is determined to be developed iteratively according to the system attribute data of the speech synthesis system, a history test case corresponding to a history development version of the speech synthesis system is obtained.
S620, obtaining an actual test case according to the historical test case and the target test case;
It can be appreciated that, when the speech synthesis system is developed iteratively, it should be determined whether the requirement cases are repeated, i.e., whether the target test case duplicates a history test case. When it is determined that the target test case duplicates a history test case, a regression test is performed on the speech synthesis system according to the target test case, that is, the target test case is taken as the actual test case to retest the speech synthesis system. When it is determined that the target test case does not duplicate any history test case, the target test case corresponds to a new requirement of the current development version of the speech synthesis system; therefore, the correctness of the target test case needs to be judged, the target test case that passes the correctness judgment is taken as the actual test case, and the target test case is archived as a history test case for the next iteration version of the speech synthesis system. It can be understood that the method for judging the correctness of the target test case is the same as the method for judging the correctness of the initial test case, which is not repeated in the embodiments of the present application.
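The duplication check and case routing described above (S620) could be sketched as follows; the representation of cases as hashable values and the `is_correct` hook are assumptions made only for illustration.

```python
def build_actual_test_cases(target_cases, history_cases, is_correct):
    """Split target cases into regression cases (duplicates) and new cases that pass the correctness check."""
    history_set = set(history_cases)
    actual_cases, new_history = [], list(history_cases)
    for case in target_cases:
        if case in history_set:
            actual_cases.append(case)   # duplicate -> reuse directly for regression testing
        elif is_correct(case):
            actual_cases.append(case)   # new requirement that passed the correctness judgment
            new_history.append(case)    # archive for the next iteration version
    return actual_cases, new_history
```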
S630, testing the voice synthesis system according to the actual test case.
It can be understood that the actual test case obtained according to step S620 is used as input data of the speech synthesis system, so as to implement a software test on the speech synthesis system.
Referring to fig. 7, in some embodiments, the test method of the speech synthesis system provided by the embodiment of the application further includes, but is not limited to, steps S710 to S730.
S710, obtaining second test audio generated by the voice synthesis system according to the target test case;
It can be understood that the target test case is used as input data of the speech synthesis system to be tested, so as to determine, according to the output data (i.e., the second test audio) of the speech synthesis system, whether the target test case is a contradictory case, that is, whether the target test case contradicts an existing required function of the speech synthesis system. When the audio content of the second test audio corresponds to the target test case, the target test case is not a contradictory case; when the audio content of the second test audio is a warning, an error report or similar content, the target test case is a contradictory case.
For example, the existing required function of the speech synthesis system for the hash character (#) is set to be splicing; that is, when the text corresponding to the target test case is "speech A#speech B", if the second test audio output by the speech synthesis system is the spliced audio of speech A and speech B (i.e., "speech A#speech B" is output), it is determined that the target test case is not a contradictory case. When the text corresponding to the target test case is "17#302" representing a house number, the speech synthesis system outputs second test audio such as "splicing error" or "splicing failure" because the case contradicts the current splicing function of the speech synthesis system, and at this time the target test case is judged to be a contradictory case.
S720, analyzing the audio content of the second test audio to obtain an analysis result, wherein the analysis result comprises an error result representing that the voice synthesis system has a functional defect;
It can be understood that the audio content of the second test audio is parsed by means of speech recognition or the like, so as to obtain a parsed result including a correct result and an error result. A correct result is obtained when the audio content of the second test audio corresponds to the target test case, and an error result is obtained when the audio content of the second test audio is "splicing error", "splicing failure" or the like.
S730, generating prompt information for prompting improvement of the function according to the error result.
It will be appreciated that when an erroneous result is obtained, it is indicated that the target test case currently input to the speech synthesis system contradicts the functional requirements of the speech synthesis system. Therefore, in order to perfect the functional requirement of the speech synthesis system, corresponding prompt information is generated to prompt the user to perfect the function.
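Steps S710–S730 could be realized along the lines of the sketch below; the `transcribe` function stands in for any speech recognition back end, and the error keywords and message format are illustrative assumptions.

```python
ERROR_KEYWORDS = ("splicing error", "splicing failure", "warning")

def analyze_second_test_audio(audio_path: str, target_text: str, transcribe) -> dict:
    """Transcribe the second test audio and decide whether the target test case is contradictory."""
    transcript = transcribe(audio_path)  # e.g. a wrapper around an ASR engine
    if any(keyword in transcript.lower() for keyword in ERROR_KEYWORDS):
        return {
            "result": "error",
            "prompt": f"Function defect detected for case '{target_text}': "
                      f"please extend the required function (output was '{transcript}').",
        }
    return {"result": "correct", "prompt": None}
```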
Referring to fig. 8, in some embodiments, the preliminary test template includes a fill field, and step S130 includes, but is not limited to, sub-steps S810 through S840.
S810, if the version type information represents iteration type, acquiring historical test data of a voice synthesis system;
It will be appreciated that, when the speech synthesis system is determined to be developed iteratively according to the system attribute data of the speech synthesis system, historical test data of a historical development version of the speech synthesis system is obtained. It is understood that the historical test data includes historical test data on which the filling operation has been completed (i.e., historical test plans).
S820, acquiring field attributes of the filling fields, wherein the field attributes include a reusable type;
It will be appreciated that the preliminary test template includes a plurality of filling fields, and each filling field has a field attribute of either the reusable type or the non-reusable type. The reusable type indicates that the content corresponding to the filling field is general content of the speech synthesis system (such as the content corresponding to filling fields like background and purpose). Therefore, the content corresponding to such a filling field can be reused in the preliminary test templates of speech synthesis systems of different iteration versions to obtain the corresponding target test data. The content corresponding to a non-reusable filling field is only applicable to the speech synthesis system of the corresponding iteration version, such as the content corresponding to filling fields like the version number and the newly added required functions.
S830, taking a filling field whose field attribute is the reusable type as a field to be processed;
Specifically, identifiers of different types can be set for the filling fields in the preliminary test template, and the field attribute of the corresponding filling field is determined by recognizing the type of the identifier. Alternatively, the text content of the filling field is recognized through OCR (Optical Character Recognition) or a similar technique, and the recognized text content is compared with the text content in a preset reusable database, thereby determining the field attribute of the corresponding filling field. It will be appreciated that the above methods of determining the field attribute of a filling field are merely exemplary, and the embodiments of the present application are not limited in this regard. When the field attribute of a filling field is determined to be the reusable type, the filling field is taken as a field to be processed.
S840, filling the field to be processed according to the historical test data to obtain target test data.
It can be appreciated that, when the field attribute of a filling field is the reusable type, it indicates that the preliminary test template can be filled according to the content of the corresponding filling field in the historical test data. Therefore, the filling field corresponding to the field to be processed is searched from the historical test data, and the content corresponding to that filling field is associated with the field to be processed, that is, the field to be processed is filled with that content; the preliminary test template on which all filling operations have been completed is taken as the target test data.
It can be understood that, when the field attribute of a filling field is the non-reusable type, the content corresponding to the filling field can be obtained by means of user input, automatic recognition of a development project file and the like, which is not particularly limited in the embodiments of the present application.
It will be appreciated that completing all filling operations includes filling the content corresponding to all reusable filling fields and filling the content corresponding to all non-reusable filling fields.
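A compact sketch of sub-steps S810–S840 is given below, assuming the template and the historical test data are represented as dictionaries keyed by field name; the field names and the `is_reusable` marker are hypothetical.

```python
def fill_preliminary_template(template: dict, history_data: dict, manual_inputs: dict) -> dict:
    """Fill reusable fields from historical test data and non-reusable fields from user/project input."""
    target_test_data = {}
    for field_name, field in template.items():
        if field.get("is_reusable"):
            # reusable field: reuse the content of the same field in the historical test data
            target_test_data[field_name] = history_data.get(field_name)
        else:
            # non-reusable field: e.g. version number or newly added required functions
            target_test_data[field_name] = manual_inputs.get(field_name)
    return target_test_data

# Example usage with hypothetical field names:
# template = {"background": {"is_reusable": True}, "version_number": {"is_reusable": False}}
# target = fill_preliminary_template(template, {"background": "TTS test background"}, {"version_number": "v2.1"})
```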
According to the test method of the speech synthesis system provided by the embodiments of the present application, target test data is generated automatically through the preset first test database and the filling operation, and test scheme data is searched automatically through the preset second test database. This avoids formulating the target test data and the test scheme data manually as in the related art, thereby accelerating the test progress of the speech synthesis system to a certain extent and improving the test efficiency of the speech synthesis system. In addition, the accuracy of the test of the speech synthesis system is ensured by judging the overall coverage, correctness, repeatability and contradiction of the test cases.
It can be understood that, in the test method of the speech synthesis system provided by the embodiments of the present application, the test risk can also be judged by methods such as a decision tree classification model. Specifically, the test risks are classified into three categories: high, medium and low; when the current test operation is determined to be a high-risk operation according to the decision tree classification model, a corresponding warning prompt signal can be generated. The decision factors of the decision tree classification model include the judgment results of the overall coverage, correctness, repeatability and contradiction of the test cases, the random selection result in the random test, the pressure and load in the performance test, and the like, which are not particularly limited in the embodiments of the present application.
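As a hedged illustration of such a risk classifier, the snippet below trains a scikit-learn decision tree on a handful of made-up decision factors; the feature encoding, risk labels and training samples are entirely hypothetical and serve only to show the shape of the approach.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical decision factors per test operation:
# [coverage_ok, correctness_ok, repeated_case, contradictory_case, load_level]
X_train = [
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 2],
    [0, 1, 0, 1, 3],
    [0, 0, 1, 1, 3],
]
y_train = ["low", "medium", "high", "high"]  # risk labels for the samples above

risk_model = DecisionTreeClassifier(max_depth=3, random_state=0)
risk_model.fit(X_train, y_train)

current_operation = [[0, 1, 0, 1, 3]]
risk = risk_model.predict(current_operation)[0]
if risk == "high":
    print("Warning: current test operation is classified as high risk.")
```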
Referring to fig. 9, in some embodiments, the present application further provides a test apparatus of a speech synthesis system, where the test apparatus includes:
the system attribute data acquisition module 910 is configured to acquire system attribute data of a to-be-tested speech synthesis system from an application end, where the speech synthesis system is operated at the application end, and the system attribute data includes source information of the application end and version type information of the speech synthesis system;
the target test data acquisition module 920 is configured to screen a preliminary test template from a preset first test database according to the source information, and fill the preliminary test template according to the version type information to obtain target test data;
the test scheme data acquisition module 930 is configured to screen test scheme data from a preset second test database according to the target test data;
the test case generation module 940 is configured to obtain an initial test case according to the test scheme data, and screen the initial test case according to preset screening conditions to obtain a target test case;
and the test module 950 is used for testing the speech synthesis system according to the target test case.
It can be seen that the content of the above embodiments of the test method of the speech synthesis system is applicable to the embodiments of the test apparatus of the speech synthesis system; the functions specifically implemented by the embodiments of the test apparatus are the same as those of the embodiments of the test method, and the beneficial effects achieved are also the same as those achieved by the embodiments of the test method.
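Purely as an illustration of how the modules 910–950 might be composed in code (the class, method and attribute names below are invented for this sketch and are not part of the disclosed apparatus):

```python
class SpeechSynthesisSystemTester:
    """Illustrative composition of the test apparatus modules described above."""

    def __init__(self, first_test_db, second_test_db):
        self.first_test_db = first_test_db    # preset first test database (templates)
        self.second_test_db = second_test_db  # preset second test database (test schemes)

    def run(self, application_end, speech_synthesis_system, screening_condition):
        attrs = application_end.get_system_attribute_data()               # module 910
        template = self.first_test_db.select_template(attrs["source"])    # module 920
        target_test_data = template.fill(attrs["version_type"])           # module 920
        scheme = self.second_test_db.select_scheme(target_test_data)      # module 930
        initial_cases = scheme.generate_cases()                           # module 940
        target_cases = [c for c in initial_cases if screening_condition(c)]
        return [speech_synthesis_system.synthesize(c) for c in target_cases]  # module 950
```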
The embodiment of the application also provides electronic equipment, which comprises:
at least one memory;
At least one processor;
at least one program;
The program is stored in the memory, and the processor executes the at least one program to implement the test method of the speech synthesis system according to the present application. The electronic device may be any intelligent terminal, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a vehicle-mounted computer and the like.
Referring to fig. 10, fig. 10 illustrates a hardware structure of an electronic device of another embodiment, the electronic device including:
The processor 1010 may be implemented by a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used for executing related programs so as to implement the technical solutions provided by the embodiments of the present application;
The memory 1020 may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present disclosure are implemented by software or firmware, the relevant program codes are stored in the memory 1020, and the processor 1010 invokes them to execute the test method of the speech synthesis system according to the embodiments of the present disclosure;
An input/output interface 1030 for implementing information input and output;
The communication interface 1040 is configured to implement communication interaction between this device and other devices, and the communication may be implemented in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi or Bluetooth);
A bus 1050 that transfers information between the various components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040);
wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The embodiment of the application also provides a storage medium which is a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions for causing a computer to execute the test method of the voice synthesis system.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "and/or" is used to describe an association relationship of an associated object, and indicates that three relationships may exist, for example, "a and/or B" may indicate that only a exists, only B exists, and three cases of a and B exist simultaneously, where a and B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one of a, b or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The storage medium includes various media capable of storing programs, such as a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (9)


Priority Applications (1)

Application Number: CN202210906435.7A (CN115237785B) — Priority Date: 2022-07-29 — Filing Date: 2022-07-29 — Title: Test method, electronic device and storage medium of speech synthesis system

Publications (2)

CN115237785A — Publication Date: 2022-10-25
CN115237785B (granted) — Publication Date: 2025-05-30

Family ID: 83677211

Family Applications (1)

Application Number: CN202210906435.7A (CN115237785B) — Status: Active — Priority Date: 2022-07-29 — Filing Date: 2022-07-29 — Title: Test method, electronic device and storage medium of speech synthesis system

Country Status (1)

Country: CN — Link: CN115237785B (en)



Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
