Disclosure of Invention
The application aims to provide a service execution method, which can effectively improve the accuracy of a game model and further improve the accuracy of a game service execution result; it is another object of the present application to provide a service execution apparatus, system and computer-readable storage medium, which also have the above-mentioned advantageous effects.
In order to solve the foregoing technical problem, in a first aspect, the present application provides a service execution method, including:
performing self-game by using an original game model to obtain a first game result;
backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
optimizing the original game model by using the game samples to obtain an optimized game model;
carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model;
judging whether the current model optimization meets a preset optimization condition; if not, returning to the step of performing self-game by using the original game model to obtain a first game result for iterative optimization, until the current model optimization meets the preset optimization condition, so as to obtain an optimal game model;
and executing a target game service by using the optimal game model.
Preferably, the self-gaming by using the original gaming model to obtain the first gaming result includes:
acquiring current game data;
processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and determining a maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
Preferably, after obtaining the game sample corresponding to the second game result, the method further includes:
judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
optimizing the original game model by using the game samples to obtain an optimized game model, wherein the optimizing comprises the following steps:
and optimizing the original game model by using the preset number of game samples to obtain the optimized game model.
Preferably, the determining whether the current model optimization meets a preset optimization condition includes:
counting the optimization times of the current model;
and judging whether the current model optimization times reach preset times or not.
In a second aspect, the present application further provides a service execution apparatus, including:
the initial game module is used for carrying out self-game by utilizing the original game model to obtain a first game result;
the backtracking game module is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module is used for carrying out model confrontation on the original game model and the optimized game model and reserving a game model with successful confrontation as the original game model;
the iterative optimization module is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to carry out self game and obtain a first game result for iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module is used for executing the target game service by utilizing the optimal game model.
Preferably, the initial game module comprises:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
Preferably, the service execution apparatus further includes:
the sample counting module is used for judging whether the number of the game samples reaches a preset number or not after the game samples corresponding to the second game result are obtained; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module is specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
Preferably, the iterative optimization module is specifically configured to count the number of times of current model optimization; judging whether the current model optimization times reach preset times or not; if not, returning to the step of utilizing the original game model to carry out self game and obtain the first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition to obtain the optimal game model.
In a third aspect, the present application further discloses a service execution system, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of any of the service execution methods described above.
In a fourth aspect, the present application also discloses a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of any of the service execution methods as described above.
The service execution method comprises the steps of utilizing an original game model to carry out a self-game to obtain a first game result; backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result; optimizing the original game model by using the game samples to obtain an optimized game model; carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model; judging whether the current model optimization meets preset optimization conditions, if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result for iterative optimization until the current model optimization meets the preset optimization conditions to obtain an optimal game model; and executing the target game service by utilizing the optimal game model.
Therefore, the service execution method provided by the present application performs self-game with an existing supervised-learning game model, corrects the game decisions according to the game results during play, and generates corresponding game samples from the corrected decisions for continued training of the supervised-learning game model. The game level of the supervised-learning game model is thereby gradually improved through the optimized game samples, the model accuracy is ensured, and the accuracy of the game service execution result is further improved.
The service execution device, the service execution system, and the computer-readable storage medium provided by the present application all have the above beneficial effects, and are not described herein again.
Detailed Description
The core of the application is to provide a service execution method, which can effectively improve the accuracy of a game model and further improve the accuracy of a game service execution result; another core of the present application is to provide a service execution apparatus, a system and a computer-readable storage medium, which also have the above-mentioned advantages.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of a service execution method provided in the present application, including:
S101: Performing self-game by using an original game model to obtain a first game result;
This step is intended to perform self-game using the original game model to obtain the corresponding game result, i.e., the above-mentioned first game result. The original game model is an existing supervised-learning game strategy model. A self-game platform is built with the original game model to simulate a game, and each game participant uses the original game model to make decisions and complete the game, thereby obtaining the first game result.
As a preferred embodiment, the self-gaming using the original gaming model to obtain the first gaming result may include: acquiring current game data; processing the current game data by using an original game model to obtain each legal game action and a probability value corresponding to each legal game action; and determining the maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is ended to obtain a first game result.
This preferred embodiment provides a specific method for obtaining the first game result. The current game data, i.e., the situation data of the current game, is acquired and processed by the original game model to obtain the legal game actions and the probability value corresponding to each legal game action; the legal game action corresponding to the maximum probability value is then executed, and the game proceeds in this way until it is finished, whereby the first game result is obtained. For example, in a card game, for one party participating in the game, the original game model can be used to process the current game data, such as the information on cards already played, the party's own hand information, the hidden card information, and the like, to obtain the legal card-playing actions and their corresponding probability values; each player then executes the legal card-playing action corresponding to the maximum probability value, so that the players play cards in turn using the original game model until the game is finished and the game result is obtained.
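For illustration only (this sketch is not part of the claimed method), the greedy self-play described in this preferred embodiment might be written as follows in Python; the `env` and `policy_model` interfaces, including `current_state`, `legal_actions`, `predict`, `step`, and `result`, are hypothetical placeholders rather than APIs defined by the present application:

```python
import numpy as np

def play_out(env, policy_model):
    """Greedily play the game to the end from the environment's current state."""
    trajectory = []                                         # (state, chosen action) pairs
    while not env.is_finished():
        state = env.current_state()                         # current game data (situation data)
        legal_actions = env.legal_actions()                 # all legal game actions
        probs = policy_model.predict(state, legal_actions)  # one probability value per legal action
        action = legal_actions[int(np.argmax(probs))]       # the maximum-probability legal action
        trajectory.append((state, action))
        env.step(action)                                    # execute the chosen action
    return env.result(), trajectory                         # first game result and the moves played

def self_play(env, policy_model):
    """S101: one full self-game with the original game model."""
    env.reset()
    return play_out(env, policy_model)
```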
S102: backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
This step aims to obtain, through a backtracking game, a second game result opposite to the first game result, and to obtain the game samples corresponding to the second game result. Assuming the game is played by a participant A and a participant B, and the first game result of the self-game indicates that participant A defeats participant B, then the second game result, opposite to the first, indicates that participant B defeats participant A, which can be realized by the backtracking game. Specifically, backtracking starts from the losing side and moves up to an earlier game decision point; at that decision point, a legal game action different from the one previously played is selected and executed, and the original game model is used to continue the game until it is finished. If none of the legal game actions at that decision point can change the game result, backtracking continues further upwards until a second game result opposite to the first game result is obtained. Through repeated backtracking, the process continues until no further improving legal game actions can be found within the specified number of backtracking layers and backtracking times; at that point, a corresponding game sample that can change the game result is obtained for each improving action, i.e., the optimized game samples.
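A minimal sketch of this backtracking procedure, continuing the hypothetical interfaces of the previous sketch, is given below; the `restore`, `legal_actions_at`, `loser`, and `player_to_move` helpers are assumptions introduced for illustration, and the default layer/backtrack limits merely mirror the example values used later in the detailed embodiment (8 layers, 400 backtracks per game):

```python
def backtrack_optimized_samples(env, policy_model, trajectory, first_result,
                                max_layers=8, max_backtracks=400):
    """S102: backtrack from the losing side's decision points and look for legal
    actions that reverse the game result; each improving action yields one sample."""
    loser = env.loser(first_result)                        # the side that lost the first game
    losing_points = [(s, a) for s, a in trajectory if env.player_to_move(s) == loser]
    samples, backtracks = [], 0
    # Walk upwards, starting from the losing side's last decision point.
    for layer, (state, played_action) in enumerate(reversed(losing_points), start=1):
        if layer > max_layers or backtracks >= max_backtracks:
            break
        for action in env.legal_actions_at(state):
            if action == played_action:
                continue                                   # only actions different from the original one
            backtracks += 1
            env.restore(state)                             # rewind the game to this decision point
            env.step(action)                               # force the alternative legal action
            result, _ = play_out(env, policy_model)        # finish the game with the original model
            if result != first_result:                     # the opposite (second) game result reached
                samples.append((state, action))            # record the improving action as a sample
                break                                      # move on to the next higher decision point
    return samples
```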
S103: optimizing the original game model by using the game sample to obtain an optimized game model;
This step aims to realize model optimization: the optimized game samples are used to optimize the original game model to obtain the corresponding optimized game model. The model optimization process may follow the prior art and is not described in detail herein.
As a preferred embodiment, after obtaining the game sample corresponding to the second game result, the method may further include: judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to perform self-game to obtain a first game result until the number of the game samples reaches a preset number; the optimizing the original game model by using the game samples to obtain an optimized game model may include: and optimizing the original game model by using a preset number of game samples to obtain an optimized game model.
In order to effectively ensure the optimization effect and improve the performance of the optimized model, the number of optimized game samples can be preset, so that the original game model is optimized with a certain number of optimized game samples. Therefore, after the game samples corresponding to the second game result are obtained in S102, the number of game samples is first counted and compared with the preset number; if the preset number has not been reached, the process returns to S101 to repeat the self-game and the backtracking game until the preset number of optimized game samples is obtained. In S103, the original game model can then be optimized with the preset number of game samples to obtain an optimized game model with higher performance. It can be understood that the value of the preset number does not affect the implementation of the technical solution; it is set by a technician according to the actual situation and is not limited in the present application.
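As a hedged sketch of the sample-count check in this preferred embodiment, the outer collection loop could look as follows (reusing the hypothetical `self_play` and `backtrack_optimized_samples` sketches above; `preset_number` is an assumed configuration value):

```python
def collect_optimized_samples(env, policy_model, preset_number):
    """Repeat self-game (S101) and backtracking (S102) until the preset number of samples exists."""
    sample_container = []
    while len(sample_container) < preset_number:           # judge whether the preset number is reached
        first_result, trajectory = self_play(env, policy_model)
        sample_container.extend(
            backtrack_optimized_samples(env, policy_model, trajectory, first_result))
    return sample_container                                 # used in S103 to optimize the original model
```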
S104: carrying out model confrontation on the original game model and the optimized game model, and reserving the game model with successful confrontation as the original game model;
S105: Judging whether the current model optimization meets preset optimization conditions; if not, returning to S101 for iterative optimization, and if so, executing S106;
the method aims to realize model confrontation, namely an original game model and an optimized game model are subjected to confrontation game, the game model with successful confrontation is reserved, and the game model with successful confrontation is set as a new original game model, so that the game model with optimal performance can be obtained by performing cycle iterative training according to iteration conditions. The iteration condition is a preset condition for judging whether the model needs to be continuously subjected to iteration training or not, namely the preset optimization condition is not unique in type, can be the maximum times of the iteration training, can also be a condition that certain model parameters reach certain standard values and the like, and is not limited in the application.
As a preferred embodiment, the above determining whether the current model optimization satisfies the preset optimization condition may include: counting the optimization times of the current model; and judging whether the optimization times of the current model reach preset times or not.
This preferred embodiment provides a specific type of preset optimization condition, namely a preset maximum number of iterative training rounds, i.e., the preset number of times. After each model confrontation, the number of optimizations of the current model is counted and compared with the preset number of times; if the preset number has not been reached, the iterative training continues until the number of optimizations of the current model reaches the preset number of times, whereby the optimal game model is obtained. The specific value of the preset number of times does not affect the implementation of the technical solution; it may be set by a technician according to the actual situation and is not limited in the present application.
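Putting S101 to S106 together under the preset-number-of-times condition of this preferred embodiment, a non-limiting sketch of the overall iteration could look as follows; `train` and `confront` stand in for the model optimization and model confrontation steps and are illustrative assumptions, not APIs defined by the present application:

```python
def optimize_game_model(env, original_model, preset_number, preset_times, train, confront):
    """Iteratively optimize the game model until the preset number of optimizations is reached."""
    optimization_times = 0
    while optimization_times < preset_times:                # S105: preset optimization condition
        samples = collect_optimized_samples(env, original_model, preset_number)  # S101-S102
        optimized_model = train(original_model, samples)    # S103: model optimization
        winner = confront(original_model, optimized_model, env)  # S104: model confrontation
        original_model = winner                              # keep the winning model as the original
        optimization_times += 1                              # count the current model optimizations
    return original_model                                    # S106: the optimal game model
```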
S106: and taking the original game model as an optimal game model, and executing the target game service by using the optimal game model.
The step aims to realize the execution of the game service, namely when the target game service is obtained, the optimal game model is directly called to carry out the game, and the corresponding game service execution result can be obtained. The target game service is the received game service to be executed.
It should be noted that S101 to S105 above constitute the training process of the optimal game model. In an actual game service execution process, the model training process only needs to be executed once, and the model is then called directly whenever a game service is received again. In addition, correction and optimization of the optimal game model may be continued according to the game service execution results, so as to obtain a game model with better performance.
Therefore, the service execution method provided by the present application performs self-game with an existing supervised-learning game model, corrects the game decisions according to the game results during play, and generates corresponding game samples from the corrected decisions for continued training of the supervised-learning game model. The game level of the supervised-learning game model is thereby gradually improved through the optimized game samples, the model accuracy is ensured, and the accuracy of the game service execution result is further improved.
On the basis of the above embodiments, the embodiment of the present application provides a more specific service execution method, taking the fight-the-landlord (Dou Dizhu) card game as an example. The specific implementation flow is as follows:
(1) Self-game simulation
Based on an existing supervised-learning card-playing strategy model p_θ for fight-the-landlord, a self-game platform is built to simulate games. Each player uses p_θ to make decisions: the situation data of the current game state s is input into the model, which outputs the probability distribution p_θ(a|s) over all legal actions in the current game state, and each player picks the legal action with the highest probability to play, until the game ends.
(2) Backtracking to improve decisions and generate optimized samples
Referring to fig. 2, fig. 2 is a flowchart of a game sample optimization method provided in the present application. Based on the above self-game process, backtracking starts from the losing player and moves up to that player's previous decision point; at a decision point with multiple card-playing options, a card-playing action different from the one previously played is picked, and p_θ is used to continue simulating the game from that step until the game is finished. If none of the card-playing actions at the decision point can change the game result, backtracking continues further upwards and different actions are selected for simulation until the game result is changed, and the improving action is recorded. The above backtracking process is then repeated from the newly losing side until no further improvements can be found within the specified number of backtracking layers (set to 8 layers) and the maximum number of backtracks (i.e., the number of iterative backtracks within a single game, set to 400), at which point the backtracking of this game ends. Finally, based on the improving actions, new optimized training samples are generated for each improved step of the game and stored in the sample container M.
(3) Generating a number of game samples
Based on p_θ, multiple games are played so that the sample size in the sample container reaches the set number (set to 500,000 samples for the first training, and to 2,500 samples in the subsequent iterative training process).
(4) Model training
Based on the optimized self-game samples, training of p_θ is continued; after the number of model training iterations reaches the specified number (set to 1000), a new supervised-learning game strategy model p'_θ is obtained. This constitutes one strategy-model iteration.
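Purely as an illustration of step (4), assuming the strategy model is a PyTorch classifier over card-playing actions and the optimized samples are (state tensor, improved action index) pairs, the continued training might be sketched as follows; the cross-entropy objective, the learning rate, the batch size, and the interfaces are assumptions, not details specified by the present application:

```python
import copy
import random
import torch
import torch.nn.functional as F

def train_strategy_model(p_theta, samples, lr=1e-3, num_updates=1000, batch_size=256):
    """Continue training on the optimized self-game samples to obtain p'_theta,
    leaving the original p_theta untouched for the later model confrontation."""
    p_theta_prime = copy.deepcopy(p_theta)                # train a copy so p_theta is kept for evaluation
    optimizer = torch.optim.Adam(p_theta_prime.parameters(), lr=lr)
    p_theta_prime.train()
    for _ in range(num_updates):                          # the specified number of training steps (1000)
        batch = random.sample(samples, min(batch_size, len(samples)))
        states = torch.stack([state for state, _ in batch])
        actions = torch.tensor([action for _, action in batch])
        logits = p_theta_prime(states)                    # scores over the card-playing action space
        loss = F.cross_entropy(logits, actions)           # fit the improved, result-reversing actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return p_theta_prime                                  # one strategy-model iteration
```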
(5) Model evaluation
The supervised-learning game strategy models p_θ and p'_θ play confrontation games against each other, the winning model is determined as the new p_θ, and the new p_θ is used to continue the self-game simulation, repeating the above steps.
The above processes are performed in a loop, as shown in fig. 3 (fig. 3 is a flowchart of a game model optimization method provided by the present application), until the performance of the supervised-learning game strategy model no longer improves, at which point the optimal game model is obtained.
The model evaluation can be realized as follows: 1000 deals of fight-the-landlord games (the dealt hands and the bottom cards of each deal are known) are selected as a fixed test deal library, and these same 1000 deals are used for every model evaluation. Since fight-the-landlord has three roles, on each deal the two strategy models are combined according to the roles and the hands to play 6 confrontation games, giving 6000 games in total; the win rates of the two strategy models are recorded, as shown in fig. 4 (fig. 4 is a trend chart of the confrontation results of the game models provided by the present application).
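The evaluation protocol above (a fixed library of 1000 deals, 6 role combinations per deal, 6000 games in total) could be sketched as follows; the `play_match` and deal-library interfaces are illustrative assumptions, and the 6 combinations are assumed here to be all assignments of the two models to the three roles in which neither model takes every seat:

```python
from itertools import product

def evaluate_models(model_a, model_b, deal_library, play_match):
    """Fixed-deal evaluation of two strategy models on a fight-the-landlord test library."""
    models = {"a": model_a, "b": model_b}
    wins = {"a": 0, "b": 0}
    games = 0
    for deal in deal_library:                         # e.g. the 1000 fixed test deals
        for seats in product("ab", repeat=3):         # (landlord, peasant 1, peasant 2) assignment
            if len(set(seats)) < 2:
                continue                              # drop the 2 all-same assignments, leaving 6
            winner = play_match(deal, seats, models)  # returns "a" or "b" for the winning model
            wins[winner] += 1
            games += 1                                # 6 games per deal, 6000 games in total
    return {name: count / games for name, count in wins.items()}   # win rate of each model
```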
Therefore, the service execution method provided by the embodiment of the present application performs self-game with an existing supervised-learning game model, corrects the game decisions according to the game results during play, and generates corresponding game samples from the corrected decisions for continued training of the supervised-learning game model. The game level of the supervised-learning game model is thereby gradually improved through the optimized game samples, the model accuracy is ensured, and the accuracy of the game service execution result is further improved.
To solve the above technical problem, the present application further provides a service execution device, please refer to fig. 5, where fig. 5 is a schematic structural diagram of the service execution device provided in the present application, and the schematic structural diagram includes:
the initial game module 1 is used for carrying out self-game by utilizing an original game model to obtain a first game result;
the backtracking game module 2 is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module 3 is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module 4 is used for carrying out model confrontation on the original game model and the optimized game model and reserving a game model with successful confrontation as the original game model;
the iterative optimization module 5 is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to carry out self-game and obtain a first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module 6 is used for executing the target game service by utilizing the optimal game model.
Therefore, the service execution device provided by the embodiment of the present application performs self-game with an existing supervised-learning game model, corrects the game decisions according to the game results during play, and generates corresponding game samples from the corrected decisions for continued training of the supervised-learning game model. The game level of the supervised-learning game model is thereby gradually improved through the optimized game samples, the model accuracy is ensured, and the accuracy of the game service execution result is further improved.
As a preferred embodiment, the above-mentioned initial game module 1 may comprise:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
As a preferred embodiment, the service execution device may further include a sample statistics module, configured to determine, after the game samples corresponding to the second game result are obtained, whether the number of the game samples reaches a preset number; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module 3 may be specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
As a preferred embodiment, the iterative optimization module 5 may be specifically configured to count the number of times of current model optimization; judge whether the current model optimization times reach the preset times; and, if not, return to the step of performing self-game by using the original game model to obtain the first game result for iterative optimization, until the current model optimization meets the preset optimization condition, so as to obtain the optimal game model.
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above technical problem, the present application further provides a service execution system, please refer to fig. 6, where fig. 6 is a schematic structural diagram of the service execution system provided in the present application, and the service execution system may include:
a memory 10 for storing a computer program;
the processor 20, when executing the computer program, may implement the steps of any of the service execution methods described above.
For the introduction of the system provided by the present application, please refer to the above method embodiment, which is not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the steps of any one of the service execution methods described above.
The computer-readable storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.