Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region. For example, the information such as the input text related to the application is acquired under the condition of full authorization.
It should be understood that, although the terms first, second, etc. may be used in this disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first parameter may also be referred to as a second parameter, and similarly, a second parameter may also be referred to as a first parameter, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
FIG. 1 illustrates a block diagram of a computer system provided in accordance with an exemplary embodiment of the present application. The computer system 100 includes a first terminal 110, a server 120, and a second terminal 130.
The first terminal 110 is installed with and runs a client 111 supporting a virtual environment, and the client 111 may be a multiplayer online battle program. When the first terminal runs the client 111, a user interface of the client 111 is displayed on a screen of the first terminal 110. The client 111 may be any one of a battle royale shooting game, a virtual reality (Virtual Reality, VR) application, an augmented reality (Augmented Reality, AR) program, a three-dimensional map program, a virtual reality game, an augmented reality game, a first-person shooting game (First-Person Shooting Game, FPS), a third-person shooting game (Third-Person Shooting Game, TPS), a multiplayer online battle arena game (Multiplayer Online Battle Arena, MOBA), and a strategy game (SLG). In the present embodiment, the client 111 is exemplified as an FPS game. The first terminal 110 is a terminal used by the first user 112, and the first user 112 uses the first terminal 110 to control a first virtual object located in the virtual environment to perform activities, where the first virtual object may be referred to as a virtual object of the first user 112. The activities of the first virtual object include, but are not limited to: at least one of moving, jumping, teleporting, releasing skills, using props, adjusting body posture, crawling, walking, running, riding, flying, driving, picking up, shooting, attacking, and throwing. Illustratively, the first virtual object is a first virtual character, such as a simulated character or a cartoon character.
The second terminal 130 is installed with and runs a client 131 supporting a virtual environment, and the client 131 may be a multiplayer online battle program. When the second terminal 130 runs the client 131, a user interface of the client 131 is displayed on a screen of the second terminal 130. The client may be any one of a battle royale game, a VR application, an AR program, a three-dimensional map program, a virtual reality game, an augmented reality game, an FPS, a TPS, a MOBA, and an SLG; in this embodiment, the client is exemplified as a MOBA game. The second terminal 130 is a terminal used by the second user 132, and the second user 132 uses the second terminal 130 to control a second virtual object located in the virtual environment to perform activities, and the second virtual object may be referred to as a virtual object of the second user 132. Illustratively, the second virtual object is a second virtual character, such as a simulated character or a cartoon character.
Optionally, the first virtual object and the second virtual object are in the same virtual environment. Optionally, the first virtual object and the second virtual object may belong to the same camp, the same team, the same organization, have a friend relationship, or have temporary communication rights. Alternatively, the first virtual object and the second virtual object may belong to different camps, different teams, different organizations, or have hostile relationships.
Alternatively, the clients installed on the first terminal 110 and the second terminal 130 are the same, or the clients installed on the two terminals are the same type of client on different operating system platforms (Android or iOS). The first terminal 110 may refer broadly to one of a plurality of terminals and the second terminal 130 may refer broadly to another of the plurality of terminals, the present embodiment being illustrated with only the first terminal 110 and the second terminal 130. The device types of the first terminal 110 and the second terminal 130 are the same or different, and the device types include: at least one of a smart phone, a tablet computer, an e-book reader, an MP3 player, an MP4 player, a laptop portable computer, and a desktop computer.
Only two terminals are shown in fig. 1, but in different embodiments there are a plurality of other terminals 140 that can access the server 120. Optionally, there are one or more terminals 140 corresponding to the developer, a development and editing platform for supporting the client of the virtual environment is installed on the terminal 140, the developer can edit and update the client on the terminal 140, and transmit the updated client installation package to the server 120 through a wired or wireless network, and the first terminal 110 and the second terminal 130 can download the client installation package from the server 120 to implement the update of the client.
The first terminal 110, the second terminal 130, and the other terminals 140 are connected to the server 120 through a wireless network or a wired network.
Server 120 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The server 120 is used to provide background services for clients supporting a three-dimensional virtual environment. Optionally, the server 120 takes on primary computing work and the terminal takes on secondary computing work; alternatively, the server 120 takes on secondary computing work and the terminal takes on primary computing work; alternatively, a distributed computing architecture is used for collaborative computing between the server 120 and the terminals.
In one illustrative example, server 120 includes a processor 122, a user account database 123, a combat service module 124, and a user-oriented Input/Output Interface (I/O Interface) 125. The processor 122 is configured to load instructions stored in the server 120 and to process data in the user account database 123 and the combat service module 124; the user account database 123 is configured to store data of the user accounts used by the first terminal 110, the second terminal 130, and the other terminals 140, such as an avatar of the user account, a nickname of the user account, a combat power index of the user account, and the service region in which the user account is located; the combat service module 124 is configured to provide a plurality of combat rooms for users to fight in, such as a 1V1 battle, a 3V3 battle, a 5V5 battle, etc.; the user-oriented I/O interface 125 is used to establish communication and exchange data with the first terminal 110 and/or the second terminal 130 via a wireless network or a wired network.
The method provided in the present application may be applied, but is not limited, to at least one of the following scenarios: a virtual reality application, a three-dimensional map program, a first-person shooting game (FPS), a third-person shooting game (TPS), a multiplayer online battle arena (MOBA) game, a multiplayer gunfight survival game, etc. The following embodiments are illustrated with application in games.
Fig. 2 shows a schematic diagram of an image generation method according to an exemplary embodiment of the present application.
The first interface 310 is a building interface of the virtual environment, and the building interface includes candidate virtual models for building the virtual environment, and the candidate virtual models include a cone model 311, a dodecahedron model 312 and a cube model 313; it will be appreciated that a greater variety of candidate virtual models may also be provided in the build interface.
Illustratively, the sandbox region 315 in the first interface 310 is used for building a virtual environment of interest, and the virtual environment of interest is built by creating virtual models in the sandbox region; illustratively, a cube-shaped virtual model 315a is created in the sandbox region 315 in the figure. Illustratively, the reference map control 316 on the first interface provides a functional entry for generating a predicted image, and the second interface 320 is displayed in response to a click operation on the reference map control 316.
An input portal 322 in the second interface 320 is used to obtain input text; the input text entered at the input portal 322 is acquired in response to a click operation on the confirmation control 324, and, in response to the click operation on the confirmation control 324, three predicted images are displayed, including a first predicted image 325, a second predicted image 326, and a third predicted image 327; the first predicted image 325 has a lower image complexity than the second predicted image 326, and the second predicted image 326 has a lower image complexity than the third predicted image 327.
In response to a click operation on the second predicted image 326, the third interface 330 is displayed. In the third interface 330, the second predicted image 326 is displayed superimposed on the building interface of the virtual environment. By displaying the predicted image and the building interface at the same time, the predicted image remains visible while the candidate virtual models provided by the building interface are used to build the virtual environment of interest, providing an effective reference for building the virtual environment of interest.
Fig. 3 shows a flowchart of an image generation method provided by an exemplary embodiment of the present application. The method is applied to the terminal for illustration, and comprises the following steps:
step 510: displaying a building interface of the virtual environment;
Illustratively, the building interface includes candidate virtual models for building a virtual environment; the building interface is an interface for setting up a virtual environment, at least one candidate virtual model is provided in the building interface, and the virtual environment is built by at least one of moving the position of a candidate virtual model, scaling the size of a candidate virtual model, arranging the relative positions between at least two candidate virtual models, and the like. The virtual environment is an environment in which a control account logged in on the current terminal performs virtual activities, and the virtual environment includes virtual articles built from at least one candidate virtual model; it can be understood that the virtual environment may include one or more virtual articles.
The candidate virtual model provided in the construction interface may be a three-dimensional shape model such as a sphere, a prism, a pyramid, a cylinder, a cone, or a model of virtual articles such as a virtual house, a virtual vehicle, and a virtual tree, which are constructed in advance, which is not limited in the present application.
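To make the building interface concrete, the following is a minimal sketch of a data structure for candidate virtual models placed in the virtual environment; the class names, fields and values are hypothetical and are given only for illustration, not as a limitation of the embodiments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PlacedModel:
    """One candidate virtual model placed in the building interface (hypothetical structure)."""
    model_type: str                       # e.g. "cube", "cone", "dodecahedron"
    position: Tuple[float, float, float]  # position of the model in the virtual environment
    scale: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    color: str = "#FFFFFF"                # fill color of the outer surface

@dataclass
class VirtualEnvironment:
    """A virtual environment assembled from placed candidate virtual models."""
    models: List[PlacedModel] = field(default_factory=list)

    def add(self, model: PlacedModel) -> None:
        self.models.append(model)

# Example: a simple virtual article built from a cube body and a cone roof.
env = VirtualEnvironment()
env.add(PlacedModel("cube", position=(0.0, 0.0, 0.0), scale=(2.0, 2.0, 2.0)))
env.add(PlacedModel("cone", position=(0.0, 2.0, 0.0)))
```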
Step 520: acquiring an input text;
Illustratively, the input text is prompt text for building the virtual environment of interest; illustratively, the input text is description text for the appearance of the virtual environment of interest to be built. As described above, the virtual environment includes at least one virtual article, and the input text is used to describe the appearance of the at least one virtual article included in the virtual environment of interest; that is, the way the virtual environment is to be built is prompted through the appearance of the virtual environment.
For example, the input text may describe a virtual article by at least one of color, texture, size, and type, and the type information of the virtual article may be described by an item name in the real world. For example, where the virtual environment of interest includes a virtual castle, the input text may describe it by the real-world item name, such as "castle building". For example, the input text may include style information of the virtual article, such as the style of a virtual building being Gothic architecture, or the color of a virtual building being that of a garden-style building.
For example, the input text may be used to describe the appearance of one or more virtual items, and in the case where the input text describes the appearance of a plurality of virtual items, the input text may describe the relative positional relationship between the plurality of virtual items.
Step 530: displaying a predicted image in response to the input text being acquired;
illustratively, the predicted image is a preview image of the virtual environment of interest to be built using at least one candidate virtual model, and the predicted image is used to simulate a view of the virtual environment of interest; illustratively, the predicted image is an observation picture of the virtual environment of interest predicted from the input text. The predicted image includes the virtual articles described by the input text, and the appearance of a virtual article in the predicted image is the same as its description in the input text.
Illustratively, the predicted image includes a view of at least one candidate virtual model provided by the building interface for building the virtual environment. The candidate virtual model in the predicted image may be, for example, a virtual article or a sub-portion of a virtual article. For example, the predicted image includes an observation picture of a cylindrical model, which is a sub-portion of a virtual vehicle, such as a wheel portion of the virtual vehicle; the predicted image includes an observation picture of a rectangular-pyramid model, which is a virtual tree model in the virtual environment.
Illustratively, the predicted image is a picture for simulating an observation in the virtual environment of interest by the virtual camera model. For example, virtual items in a virtual environment of interest are simulated by entering text.
In summary, according to the method provided by this embodiment, by generating a predicted image including at least one candidate virtual model according to the input text, an effective reference is provided for building the virtual environment of interest; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved.
Fig. 4 shows a flowchart of an image generation method provided by an exemplary embodiment of the present application. Taking the application of the method in the terminal as an example, i.e. in the embodiment shown in fig. 3, step 530 may be implemented as step 530a:
step 530a: in response to obtaining the input text, displaying a first predicted image and a second predicted image;
illustratively, the predicted image includes a first predicted image and a second predicted image; the first predicted image and the second predicted image each include an observation of at least one candidate virtual model.
Illustratively, the first predicted image has a lower image complexity than the second predicted image; for example, the number of virtual items in the first predicted image is less than the second predicted image.
In one example, a first virtual item is included in a first predicted image and a second virtual item is included in a second predicted image; the first virtual article and the second virtual article belong to the same article type, e.g., the first virtual article and the second virtual article each belong to at least one of a virtual building, a virtual plant, a virtual vehicle, a virtual table and a chair.
Illustratively, the first virtual article is stitched from at least one candidate virtual model; illustratively, stitching candidate virtual models means that there is a connection point, a connection line or a connection surface between two candidate virtual models, and the two candidate virtual models combined into a whole are determined as the first virtual article. Similarly, the second virtual article is stitched from at least one candidate virtual model.
Illustratively, the first virtual item is stitched from a first number of candidate virtual models, and the second virtual item is stitched from a second number of candidate virtual models; the first number and the second number are both positive integers, the first number being smaller than the second number. When the candidate virtual models provided by the building interface are used for building, the number a of candidate virtual models corresponding to the first virtual item in the first predicted image is smaller than the number b of candidate virtual models corresponding to the second virtual item in the second predicted image; that is, the first virtual item in the first predicted image is less complex to build than the second virtual item in the second predicted image.
By way of example, by displaying the first predicted image and the second predicted image, observation pictures of the virtual environment of interest with different degrees of building difficulty are displayed; multiple types of predicted images are provided along the dimension of the difficulty of building the virtual environment of interest.
Fig. 5 illustrates a schematic diagram of a predicted image provided by an exemplary embodiment of the present application. Nine sub-pictures are included in fig. 5, wherein the first sub-picture a, the second sub-picture b, and the third sub-picture c are predicted pictures predicted based on the same input text; illustratively, the input text is: "panorama of Castle building, castle has pointed roof"; it can be seen that the image complexity of the first sub-image a to the third sub-image c gradually increases.
Similarly, the fourth sub-picture d, the fifth sub-picture e and the sixth sub-picture f are predicted images predicted based on the same input text; illustratively, the input text is: "panorama, blue tone of a house building under moonlight, cloud around building"; it can be seen that the image complexity of the fourth sub-image d to the sixth sub-image f gradually increases. The seventh sub-image g, the eighth sub-image h and the ninth sub-image i are predicted images predicted based on the same input text; illustratively, the input text is: "a multi-story ancient building sitting on hillside with steps leading to the gates of the ancient building from under the hillside, rich plants such as turf, cherry tree"; it can be seen that the image complexity of the seventh sub-image g to the ninth sub-image i gradually increases.
In summary, according to the method provided by this embodiment, by generating a predicted image including at least one candidate virtual model according to the input text, an effective reference is provided for building the virtual environment of interest; the predicted image includes a first predicted image and a second predicted image, which simulate the virtual environment of interest at different degrees of complexity; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved.
Next, the display mode of the predicted image will be described.
In various embodiments of the present application, the display mode of the predicted image includes at least any one of the following three implementations.
The predicted image is displayed on the construction interface in a superimposed manner in a floating window manner;
the predicted image and the build interface are displayed in a split screen manner;
the predicted image is displayed in a mapped manner on at least one virtual model in the build interface.
By displaying the predicted image and the building interface at the same time, the predicted image remains visible while the candidate virtual models provided by the building interface are used to build the virtual environment of interest, providing an effective reference for building the virtual environment of interest.
In one implementation, the predicted image is superimposed on the building interface in a floating window, which preserves the display size of the building interface; displaying the predicted image in a floating window also allows its display position to be moved on the building interface, avoiding occlusion of the created virtual models used for building the virtual environment of interest.
In another implementation, the predicted image and the building interface are displayed in a split-screen manner, achieving side-by-side display of the predicted image and the building interface; the two display portions have no overlapping area, which avoids the problem that the created virtual models cannot be viewed completely because a floating window occludes the building interface.
In another implementation, the predicted image is displayed in a mapped manner on at least one virtual model in the building interface. A virtual model mapped with the predicted image is created in the building interface; by controlling the virtual model to rotate and move, the predicted image can be viewed in an immersive manner during the building process, providing a reference for building the virtual environment of interest. By way of example, FIG. 6 shows a schematic diagram of a predicted image and a building interface provided by an exemplary embodiment of the present application. The predicted image is displayed in a mapped manner on a first virtual model 612 created in the building interface, the first virtual model 612 being a virtual billboard model. The building interface also includes a second virtual model 614, and the second virtual model 614 is a building outer-wall model obtained by stitching a plurality of cube models.
In summary, according to the method provided by this embodiment, by generating a predicted image including at least one candidate virtual model according to the input text, an effective reference is provided for building the virtual environment of interest; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved; the predicted image is displayed at the same time as the candidate virtual models provided by the building interface are used to build the virtual environment of interest, providing an effective reference for building the virtual environment of interest.
In an alternative implementation, step 530 in the embodiment illustrated in fig. 3 can be implemented as the following steps:
in response to obtaining the input text, invoking an image prediction model, predicting a predicted image from the input text, and displaying the predicted image;
illustratively, the image prediction model is used to generate an image from the entered prompt text. The image prediction model may be an artificial neural network (Artificial Neural Network, ANN) model or a function calculation model based on statistical calculation, and the image prediction model includes at least one of a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), a deep residual network (Deep Residual Network, ResNet), and a transformer network (Transformer Network). Illustratively, the image prediction model is a model that has the ability to generate an image from prompt text.
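As one possible, non-limiting realization, the image prediction model may be backed by an off-the-shelf text-to-image pipeline; the sketch below uses the open-source diffusers library with an example model identifier, both of which are assumptions made for illustration rather than part of the embodiments.

```python
# Minimal sketch: invoking a text-to-image model as the image prediction model.
from diffusers import StableDiffusionPipeline

def predict_image(input_text: str, image_style_text: str = ""):
    # The model identifier is only an example; any model able to generate
    # an image from prompt text could serve as the image prediction model.
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    prompt = f"{input_text}, {image_style_text}" if image_style_text else input_text
    # The pipeline returns a list of PIL images; the first is the predicted image.
    return pipe(prompt).images[0]

# predicted = predict_image("panorama of a castle building, the castle has a pointed roof")
```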
Further, fig. 7 shows a flowchart of an image generating method according to an exemplary embodiment of the present application. The method is applied to the terminal and is exemplified by the following steps 510, 520, 525, 532 and 534; for the description of steps 510, 520 reference may be made to the corresponding embodiment of fig. 3 above; hereinafter, steps 525, 532, 534 in this embodiment will be described.
Step 525: acquiring an image style text;
by way of example, the image style text is prompt text describing the style type of the predicted image; the style of the predicted image is used to indicate the overall style characteristics of the virtual environment built from the candidate virtual models, such as a cartoon style, an iridescent color scheme, a soft-light illumination condition, and the like, and is used by the image prediction model when generating the predicted image.
The image style text is used as an input parameter of the image prediction model to describe the overall style characteristics of the virtual environment of interest simulated by the predicted image. The image style text may be preset, or may be extracted from the overall style characteristics of an existing virtual environment; further, the existing virtual environment is a virtual environment built from the candidate virtual models included in the building interface. Taking the case where the virtual environment is a level-clearing type game level as an example, the image style text may be extracted from a screenshot of the virtual environment corresponding to that level.
Step 532: in response to acquiring the input text and the image style text, determining the input text and the image style text as input parameters of the image prediction model;
Illustratively, the input text and the image style text are determined as input parameters of the image prediction model, and the image style text reinforces the input text within these input parameters; this ensures that the predicted image generated by the image prediction model has stable overall style characteristics, so that the predicted image and the existing virtual environment have the same characteristics in color scheme, illumination condition and image style, which helps improve how realistically the predicted image simulates an observation of the virtual environment of interest and provides an effective reference for building the virtual environment of interest.
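The following is a minimal sketch of how the two texts might be combined into one set of input parameters; the helper name and the exact combination rule are assumptions made for illustration only.

```python
def build_input_parameters(input_text: str, image_style_text: str) -> str:
    """Combine the input text and the image style text into a single prompt
    for the image prediction model (hypothetical helper)."""
    # The style text is appended so that it reinforces, rather than replaces,
    # the appearance description supplied by the user.
    return f"{input_text}, {image_style_text}" if image_style_text else input_text

prompt = build_input_parameters(
    "panorama of a castle building, the castle has a pointed roof",
    "cartoon style, three-dimensional image, soft light (1.5), soft filling color (1.3)",
)
```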
Step 534: calling an image prediction model, predicting according to input parameters to obtain a predicted image, and displaying the predicted image;
illustratively, the image prediction model is used for generating an image according to the input prompt text; specifically, the input parameters are reference information of a predicted image generated by the image prediction model, and the input parameters are processed based on network parameters in the image prediction model to obtain the predicted image.
Illustratively, the image prediction model is a model that has the ability to generate an image from prompt text, and, illustratively, the predicted image generated by the image prediction model has the same overall style characteristics as an observation picture of the existing virtual environment.
In summary, according to the method provided by this embodiment, a predicted image including at least one candidate virtual model is generated from the input text by invoking the image prediction model, providing an effective reference for building the virtual environment of interest; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved.
Next, the training process of the image prediction model in the above two embodiments will be further described.
In an alternative implementation, the method further comprises the step of training the image prediction model:
acquiring a sample image and descriptive text of the sample image;
illustratively, the sample image is a picture obtained by observing the sample environment; specifically, the sample environment is a virtual environment built from at least one candidate virtual model included in the building interface. In one example, the sample environment is a virtual environment in which a virtual character plays a level-clearing type level, and the sample image may be screenshot information of that level. The description text is used to describe the virtual articles deployed in the sample environment, and the description text may be information obtained through manual annotation, or information on the appearance of the virtual articles obtained by performing image recognition on the sample image.
Training a generic image prediction model based on the sample image and the description text to obtain the image prediction model;
illustratively, the description text is input into the generic image prediction model to obtain a generic predicted image; the generic predicted image is an image predicted by the generic image prediction model, which has the ability to generate an image from prompt text. Backward propagation training is performed on the generic image prediction model according to the difference between the generic predicted image and the sample image, and the model parameters of the generic image prediction model are adjusted, so that the image prediction model is obtained through training.
Illustratively, the generic image prediction model is a model that has the ability to generate an image from prompt text. Illustratively, in training the generic image prediction model, the model parameters of the generic image prediction model are adjusted to reduce the difference between the generic predicted image and the sample image.
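A minimal training-loop sketch follows, assuming the generic image prediction model is wrapped as a differentiable PyTorch module that maps description text to an image tensor; the module interface, loss choice and hyperparameters are assumptions made for illustration and may differ from the actual training procedure.

```python
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, dataset, epochs: int = 10, lr: float = 1e-5) -> nn.Module:
    """Adjust the generic image prediction model so that its output approaches the sample images."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # difference between the generic predicted image and the sample image
    for _ in range(epochs):
        for description_text, sample_image in dataset:
            predicted_image = model(description_text)     # generic predicted image
            loss = loss_fn(predicted_image, sample_image)  # measure the difference
            optimizer.zero_grad()
            loss.backward()                                # backward propagation of the difference
            optimizer.step()                               # adjust the model parameters
    return model
```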
The image prediction model is obtained by training on the basis of the generic image prediction model. By adjusting the model parameters of the generic image prediction model, it is ensured that the trained image prediction model generates images with the same building characteristics as the sample images, that the predicted image generated by the image prediction model includes an observation picture of at least one virtual model, and that, in the simulated picture observing the virtual environment of interest, the virtual environment of interest is built from at least one candidate virtual model.
Illustratively, the generic image prediction model and the image prediction model have the same network model structure, but their corresponding network model parameters are typically different. Illustratively, in the case where this embodiment is combined with the embodiment shown in fig. 4, the first predicted image corresponds to a first image prediction model, the second predicted image corresponds to a second image prediction model, and the network parameters of the first image prediction model and the second image prediction model are different. Further, in the training process, training is performed on the basis of the generic image prediction model to obtain the first image prediction model; similarly, the generic image prediction model is trained to obtain the second image prediction model; a first training duration for obtaining the first image prediction model is smaller than a second training duration for obtaining the second image prediction model.
In summary, according to the method provided by this embodiment, a predicted image including at least one candidate virtual model is generated from the input text by invoking the image prediction model, providing an effective reference for building the virtual environment of interest; the image prediction model is obtained by training the generic image prediction model, which improves the prediction accuracy of the observation picture of the virtual environment of interest; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved.
In another alternative implementation, the method further comprises the step of training the image prediction model:
acquiring a sample model and descriptive text of the sample model;
the sample model may be a candidate virtual model provided in the building interface, or may be a model built from a plurality of candidate virtual models, which is not limited in this embodiment. The description text is used to describe appearance information of the sample model, and the description text may be information obtained through manual annotation, or information on the appearance of the sample model obtained by performing image recognition on an image of the sample model.
Training the three-dimensional model prediction network based on the sample model and the description text to obtain a trained three-dimensional model prediction network;
illustratively, inputting the description text into a three-dimensional model prediction network to obtain a prediction model; the three-dimensional model prediction network is used for generating a three-dimensional virtual model according to the descriptive text. And according to the difference between the sample model and the prediction model, performing backward propagation training on the three-dimensional model prediction network, and adjusting model parameters of the three-dimensional model prediction network to obtain the trained three-dimensional model prediction network.
In an alternative implementation, training the three-dimensional model predictive network may be divided into a plurality of training phases; two training phases are illustrated by way of example.
In one implementation, the outer surface of the sample model is a three-dimensional structure obtained by splicing triangular patches; in a first training phase, the number of triangular patches of the outer surface of the sample model is less than a number threshold, and in a second training phase, the number of triangular patches of the outer surface of the sample model is greater than the number threshold. The training times of the first training stage are preset, and the second training stage is entered after the preset training times are completed. Through the two training stages, the prediction capability of the three-dimensional model prediction network under different complexity degrees of the three-dimensional structure is improved.
In another implementation, in the first training phase, the descriptive text includes only descriptive information of the three-dimensional structure of the sample model; in the second training stage, the descriptive text also includes descriptive information of other appearance attributes such as color, texture, etc. of the sample model. Through two training stages, firstly, the structure prediction capability of the three-dimensional model is trained, and then, the capability of illumination, color filling and the like on the three-dimensional model is trained.
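The staged training can be sketched as a sample-selection rule; the sketch below follows the first implementation (staging by the number of triangular patches on the sample model's outer surface), with the threshold, stage length and attribute names being illustrative assumptions only. The second implementation would instead stage the richness of the description text.

```python
PATCH_THRESHOLD = 500      # hypothetical number threshold on triangular patches
STAGE_ONE_STEPS = 10_000   # preset number of training iterations for the first training phase

def select_training_samples(sample_models, step: int):
    """Return the sample models used at the current training step (hypothetical helper)."""
    if step < STAGE_ONE_STEPS:
        # First training phase: low-complexity sample models (few triangular patches).
        return [m for m in sample_models if m.num_patches < PATCH_THRESHOLD]
    # Second training phase: high-complexity sample models (many triangular patches).
    return [m for m in sample_models if m.num_patches >= PATCH_THRESHOLD]
```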
The image prediction model includes the trained three-dimensional model prediction network; when the image prediction model is used, the predicted image generated by invoking the image prediction model is an observation image of a predicted virtual environment, and the predicted virtual environment is a virtual environment built from the three-dimensional models generated by the trained three-dimensional model prediction network according to the input text.
Next, the image style text will be further described.
In an alternative implementation, step 525 in the embodiment shown in fig. 7 can be implemented as the following sub-steps:
substep 25a: acquiring a sample image and a description text of the sample image;
illustratively, the sample image is a picture obtained by observing the sample environment; specifically, the sample environment is a virtual environment built from at least one candidate virtual model included in the building interface. The description text is used to describe the virtual articles deployed in the sample environment, and the description text may be information obtained through manual annotation, or information on the appearance of the virtual articles obtained by performing image recognition on the sample image.
Substep 25b: carrying out feature extraction processing on the image style of the sample image to obtain sample style features;
For example, the feature extraction processing may determine, as a sample style feature, a feature such as the degree of color similarity of the virtual articles in one sample image, the illumination condition of the sample image, or the like; a feature shared by a plurality of sample images may also be determined as a sample style feature. For example, two sample images both have the following sample style feature: an architectural style feature in which short virtual buildings are located in the middle area of the sample image and tall virtual buildings are located in the left and right side areas of the sample image. As another example, two sample images both have the following sample style feature: a color style feature in which the color saturation is below a first threshold.
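A minimal sketch of such feature extraction processing follows, using simple image statistics (mean saturation as a proxy for the color style, mean brightness as a proxy for the illumination condition); the statistics chosen and the threshold are assumptions made for illustration.

```python
from PIL import Image
import numpy as np

def extract_style_features(image_path: str) -> dict:
    """Compute simple sample style features from one sample image."""
    hsv = np.asarray(Image.open(image_path).convert("HSV"), dtype=np.float32) / 255.0
    return {
        "mean_saturation": float(hsv[..., 1].mean()),  # low value suggests a soft, desaturated palette
        "mean_brightness": float(hsv[..., 2].mean()),  # rough proxy for the illumination condition
    }

def share_low_saturation_style(image_paths, first_threshold: float = 0.5) -> bool:
    """Check whether all sample images share the color style feature of saturation below a threshold."""
    return all(extract_style_features(p)["mean_saturation"] < first_threshold for p in image_paths)
```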
Substep 25c: invoking the generic image prediction model to perform prediction processing on the description text and an initial style text to obtain a style predicted image;
illustratively, the initial style text includes style descriptors and weight information; the style descriptors are natural-language words describing the sample style features, and the initial style text may be preset or determined according to the sample style features.
For example, the description text and the initial style text are used as input parameters of the generic image prediction model for prediction, thereby obtaining the style predicted image; the style predicted image is a predicted image having the same overall style characteristics as the sample image. Illustratively, the weight information is used to indicate the degree to which a style descriptor constrains the image prediction model; illustratively, the weight information is normalized information that floats around a value of 1, indicating the degree of importance of the style descriptor.
Substep 25d: taking reduction of the difference between the predicted style features of the style predicted image and the sample style features as a correction target, correcting the weight information in the initial style text to obtain the image style text;
illustratively, the difference between the predicted style features and the sample style features indicates a difference between the overall style characteristics of the style predicted image and the overall style characteristics of the sample image; this difference arises because the initial style text does not accurately represent the extracted sample style features; illustratively, it is due to the deviation between describing the sample style features in natural language and the generic image prediction model's understanding of that natural language.
By way of example, by correcting the weight information in the initial style text, the degree of influence of the style descriptors in the initial style text on the generic image prediction model is adjusted, so that the image style text is obtained and the overall style characteristics of the style predicted image are ensured to be similar to those of the sample image.
In one example, the resulting image style text is: cartoon style, three-dimensional image, soft light (1.5), colorful flash, soft filling color (1.3). Illustratively, the numerical information in brackets is weight information of style descriptors.
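The weighted style descriptors in this example can be parsed into (descriptor, weight) pairs as sketched below; an omitted weight defaults to 1. The parsing helper is hypothetical, and how a given image prediction model consumes the weights is model-specific.

```python
import re

def parse_style_text(style_text: str):
    """Split the image style text into (style descriptor, weight) pairs."""
    pairs = []
    for token in style_text.split(","):
        token = token.strip()
        match = re.match(r"^(.*?)\s*\(([\d.]+)\)$", token)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
        else:
            pairs.append((token, 1.0))  # weight defaults to 1 when not written out
    return pairs

pairs = parse_style_text(
    "cartoon style, three-dimensional image, soft light (1.5), colorful flash, soft filling color (1.3)"
)
# -> [("cartoon style", 1.0), ("three-dimensional image", 1.0), ("soft light", 1.5),
#     ("colorful flash", 1.0), ("soft filling color", 1.3)]
```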
In summary, according to the method provided by this embodiment, a predicted image including at least one candidate virtual model is generated from the input text by invoking the image prediction model, providing an effective reference for building the virtual environment of interest; obtaining the image style text ensures that the predicted image and the sample image have the same overall style characteristics; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved.
Fig. 8 shows a flowchart of an image generation method provided by an exemplary embodiment of the present application. The method comprises the following steps:
step 702: acquiring a sample image and a description text;
illustratively, the sample image is a picture obtained by observing the sample environment; in one example, the sample environment is a virtual environment in which a virtual character plays a level-clearing type level, and the sample image may be screenshot information of that level. Illustratively, the description text is used to describe the virtual articles deployed in the sample environment.
Step 704: calling the generic image prediction model to predict from the description text;
illustratively, the description text is input into the generic image prediction model, and the generic image prediction model is called to perform prediction on the description text.
Step 706: outputting a generic predicted image;
illustratively, the generic image prediction model outputs the generic predicted image in response to the generic image prediction model being called to predict from the description text.
Step 708: manually labeling the overall style difference between the sample image and the generic predicted image;
illustratively, the overall style is used to indicate the color similarity of the virtual article, the illumination condition of the sample image, and the like.
Step 710: judging whether the model training requirement is met or not;
by way of example, it is determined whether the manually labeled overall style difference of the generic predicted image meets the model training requirements, which are typically preset.
Illustratively, step 712 is performed if the model training requirements are met; in the event that the model training requirements are not met, step 714 is performed.
For example, the model training requirement may also be a required number of training iterations of the model. Illustratively, consider the input text: an outdoor dining area, a sunshade cloth above a dining table, a lawn and trees near the dining table, and a car.
The predicted image generated at the use stage by a first image prediction model obtained through a times of model training is shown as a first image 631 in fig. 9; the predicted image generated at the use stage by a second image prediction model obtained through b times of model training is shown as a second image 632 in fig. 9; the predicted image generated at the use stage by a third image prediction model obtained through c times of model training is shown as a third image 633 in fig. 9.
Illustratively, a is greater than b, b is greater than c, and a, b, and c are positive integers. It can be seen that, owing to the number of times the models are trained, the complexity of the first image 631 through the third image 633 increases gradually. Specifically, the first image prediction model is trained a times, and the first image 631 it generates is close to the screenshot information of the level-clearing type level, with each virtual article in the first image 631 built from the candidate virtual models provided by the building interface. As the number of training times decreases, the constraint of the screenshot information of the level-clearing type level on the model gradually weakens in the second image 632 and the third image 633, and the degree of association between the generated virtual articles and the candidate virtual models provided by the building interface decreases.
It can be seen that the main body of the virtual vehicle in the first image 631 is a cube: the vehicle is obtained by stitching a cube as the vehicle body and cylinders as the wheels, and only a few vehicle details need to be added at the front of the vehicle; the virtual tree is likewise obtained by stitching a cube and a sphere. In the second image 632, the virtual vehicle has a rounded vehicle appearance, the virtual plants are of different types, and some plants have rich leaf detail features. Further, in the third image 633, the virtual vehicle has rich vehicle details such as headlights, an air-intake grille, and a vehicle waistline, and, in addition to the virtual plants having leaf features, the virtual lawn has detail features.
Step 712: training to obtain an image prediction model;
illustratively, the invoked generic image prediction model is determined as the image prediction model.
Step 714: adjusting network parameters of the generic image prediction model;
illustratively, the network parameters of the called generic image prediction model are adjusted, backward error propagation training is performed, and step 704 is performed after step 714.
In summary, according to the method provided by this embodiment, the image prediction model is obtained by training the generic image prediction model, which improves the prediction accuracy of the observation picture of the virtual environment of interest; the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the virtual models provided by the building interface, repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during the building process is avoided, and the human-computer interaction efficiency is improved.
Next, the process of building the virtual environment of interest will be described by three embodiments.
Fig. 10 shows a flowchart of an image generation method provided by an exemplary embodiment of the present application. The method is applied to the terminal for illustration, that is, based on the embodiment shown in fig. 3, the method further includes step 542 and step 543:
step 542: a color selector displaying the first virtual model;
illustratively, the first virtual model is a virtual model created in the building interface, and the first virtual model is used to build the virtual environment of interest. Illustratively, the first virtual model and at least one candidate virtual model provided in the building interface are of the same kind.
Illustratively, the color selector of the first virtual model is provided with at least one recommended color for recommending a fill of the outer surface of the first virtual model in the building interface. Illustratively, the recommended color is a color in the predicted image; specifically, the recommended color is the color at a color sampling point of interest in the predicted image. The position of the color sampling point of interest belongs to a first region of interest, and the first region of interest may be a region surrounded by lines in the predicted image, such as a polygon surrounded by lines of a first color, where the first color and the fill color of the polygon may be different; or it may be a color patch region, such as an oval patch of a third color.
Illustratively, the shape of the first region of interest is the same as the shape of the first virtual model under at least one viewing angle. Taking the example that the first virtual model is a cube, the observation shape of the cube is rectangular under the front view angle, and the corresponding first interest area is also a rectangle. The three-dimensional structure of the cube can be seen from the observation shape of the cube under the other view angle, the observation shape is a hexagon, three vertexes which are not adjacent to each other on the hexagon correspond to the starting points of three line segments which are not parallel to each other, and the three line segments are intersected in the hexagon. The corresponding first region of interest is also a hexagon. It will be appreciated that the degree of darkness of the colors of the different regions in the hexagon may be different, subject to the three-dimensional structure.
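A minimal sketch of deriving a recommended color from the predicted image is given below, assuming the color selector simply reads the pixel at a color sampling point of interest inside the first region of interest; the file name, coordinates and helper name are hypothetical.

```python
from PIL import Image

def recommended_color(predicted_image_path: str, sample_point) -> str:
    """Return the color at the color sampling point of interest as a hex string."""
    image = Image.open(predicted_image_path).convert("RGB")
    r, g, b = image.getpixel(sample_point)
    return f"#{r:02X}{g:02X}{b:02X}"

# e.g. a point inside the first region of interest of the predicted image
# color = recommended_color("predicted.png", (120, 80))
```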
Step 543: in response to the filling operation, displaying a first recommended color selected by the filling operation on at least one outer surface of the first virtual model;
illustratively, the fill operation is a select operation for any one of the recommended colors on the color selector; in case the color selector is provided with one of the recommended colors, the filling operation may be a triggering operation for building a first virtual model in the interface; for example, long press the first virtual model, double click on the first virtual model. In the case where the color selector is provided with a plurality of recommended colors, the filling operation may be a trigger operation for any one of the recommended colors; for example, clicking on the first recommended color, dragging the first recommended color onto the first virtual model.
The color selector may be displayed around the periphery of the first virtual model in a bubble manner, or may be displayed on the building interface in a pop-up card manner.
Illustratively, at least one of the outer surfaces corresponds to a first region of interest. The outer surface filled with the first recommended color is an outer surface that can be observed in the predicted image. Further, the filling operation is a drag operation for the first recommended color, and the first recommended color is displayed on the outer surface corresponding to the end position of the drag operation.
In summary, according to the method provided by this embodiment, by generating a predicted image including at least one candidate virtual model according to the input text, an effective reference is provided for building the virtual environment of interest; by recommending the fill colors from the predicted image on the color selector, an effective reference is provided at the building stage of the virtual environment of interest, and the human-computer interaction efficiency is improved.
Fig. 11 shows a flowchart of an image generation method provided by an exemplary embodiment of the present application. The method is applied to the terminal for illustration, namely, on the basis of the embodiment shown in fig. 3, the method further comprises the steps 544 and 545:
step 544: displaying at least one recommendation model;
Illustratively, the recommendation model is a candidate virtual model included in the build interface, and the recommendation model is displayed for prompting creation of the recommendation model in the build interface for building the virtual environment of interest.
Illustratively, the observed shape of the recommendation model at at least one viewing angle is the same as the shape of a second region of interest in the predicted image; the second region of interest may be a region surrounded by lines in the predicted image, or may be a color patch region, which is not limited in the present application. For a detailed description of the second region of interest, reference may be made to the first region of interest above.
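A minimal sketch of such shape-based recommendation follows, reducing the match to comparing the number of silhouette vertices of each candidate virtual model's observed shape with that of the second region of interest; the mapping and the heuristic are illustrative assumptions only.

```python
# Observed silhouette vertex counts of candidate virtual models at a front viewing angle (illustrative).
OBSERVED_SHAPES = {
    "cube": 4,      # a cube is observed as a rectangle from the front
    "cone": 3,      # a cone is observed as a triangle from the front
    "cylinder": 4,  # a cylinder is observed as a rectangle from the front
}

def recommend_models(region_vertex_count: int):
    """Return candidate virtual models whose observed shape matches the second region of interest."""
    return [name for name, count in OBSERVED_SHAPES.items() if count == region_vertex_count]

# recommend_models(3) -> ["cone"]
```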
Step 545: in response to a model selection operation for a first recommendation model of the recommendation models, creating the first recommendation model in the build interface;
illustratively, the first recommendation model is used for building the virtual environment of interest, the virtual environment of interest includes the first recommendation model, and the first recommendation model is used for building a virtual article in the predicted image or a sub-portion of that virtual article.
By way of example, the setting of parameters such as the position, the size and the color fill of the first recommendation model is not limited, and the virtual environment of interest can be built by adjusting at least one of these parameters.
In summary, according to the method provided by this embodiment, by generating a predicted image including at least one candidate virtual model according to the input text, an effective reference is provided for building the virtual environment of interest; by displaying at least one recommendation model, an effective reference is provided for the selection of candidate virtual models at the building stage of the virtual environment of interest, and the human-computer interaction efficiency is improved.
Fig. 12 shows a flowchart of an image generation method provided by an exemplary embodiment of the present application. The method is applied to the terminal for illustration, namely, on the basis of the embodiment shown in fig. 3, the method further comprises the steps 546 and 547:
step 546: displaying a recommended position of the first model relative to the second model;
illustratively, where the first model is located at the recommended location, the first model and the second model form an interesting model set; the set of interest models may be one or more virtual items in the predicted image, but is not excluded from being a sub-portion of one virtual item.
For example, the first model is a conical model, the second model is a cylindrical model, and the interest model group formed by the first model and the second model at the recommended position corresponds to a virtual castle; in this case, the interest model group is one virtual article in the predicted image. In another example, the first model is a table lamp model, the second model is a table model, and the interest model group formed by the first model and the second model at the recommended position corresponds to a virtual table lamp and a virtual table in the predicted image, where the virtual table lamp is placed on the virtual table; in this case, the interest model group is a plurality of virtual articles in the predicted image.
Illustratively, the observed shape of the interest model group at at least one viewing angle is the same as the shape of the third region of interest in the predicted image; the third region of interest may be a region enclosed by lines in the predicted image, or may be a color block region, which is not limited in the present application. For a detailed description of the third region of interest, reference is made to the first region of interest above.
Step 547: in response to a position selection operation, moving the first model to the recommended position in the building interface;
Illustratively, the position selection operation for the first model may be a trigger operation on the recommended position or a drag operation on the first model; the operation mode of the position selection operation is not limited in the present application.
For example, when the position selection operation is a drag operation on the first model and the distance between the drag position and the recommended position is less than a distance threshold, the position of the first model is displayed as the recommended position; that is, the first model is snapped onto the recommended position, so that the recommended position can be selected quickly. Illustratively, the distance threshold is typically a preset empirical value.
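A minimal sketch of this snapping behavior is given below; the threshold value and the function name are illustrative assumptions only.

```python
import math

def snap_to_recommended(drag_pos, recommended_pos, distance_threshold=0.5):
    """Return the position to display for the first model during a drag.

    If the drag position is within `distance_threshold` of the recommended position,
    the model is snapped onto the recommended position; otherwise the drag position
    is kept. The threshold value is an illustrative preset, as noted above.
    """
    dx = drag_pos[0] - recommended_pos[0]
    dy = drag_pos[1] - recommended_pos[1]
    dz = drag_pos[2] - recommended_pos[2]
    if math.sqrt(dx * dx + dy * dy + dz * dz) < distance_threshold:
        return recommended_pos
    return drag_pos

# Usage: a drag ending 0.3 units away snaps; one ending 2 units away does not.
print(snap_to_recommended((1.3, 0.0, 2.0), (1.0, 0.0, 2.0)))  # (1.0, 0.0, 2.0)
print(snap_to_recommended((3.0, 0.0, 2.0), (1.0, 0.0, 2.0)))  # (3.0, 0.0, 2.0)
```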
In summary, according to the method provided by this embodiment, a predicted image including at least one candidate virtual model is generated according to the input text, which provides an effective reference for building the virtual environment of interest; and by displaying the recommended position of the first model relative to the second model, an effective reference is provided for determining the position of the first model in the virtual environment of interest during building, which improves human-computer interaction efficiency.
FIG. 13 is a flowchart of an image generation method provided by an exemplary embodiment of the present application. The method is described as being applied to a terminal by way of illustration; that is, step 520 in the embodiment shown in FIG. 3 may be implemented as step 522, step 524, and step 526:
Step 522: in response to a trigger operation on a language template, displaying a first text corresponding to the language template;
Illustratively, the first text includes an initial description word of a virtual article in the virtual environment of interest on at least one appearance attribute; the appearance attributes include at least one of color, material, texture, three-dimensional structure, size, and fill pattern, each of which is described below.
Color: may be a single color or a combination of colors, and is used to indicate the map color of the virtual article.
Material: used to indicate that the virtual article simulates the material of a real object, such as a metal material, a glass material, or a rubber material, through parameter information such as reflectivity and transparency.
Texture: used to indicate that the virtual article simulates the uneven texture of a real object by means such as mapping.
Three-dimensional structure: used to indicate structural information of the virtual article in three-dimensional space, for example, at least one of three-dimensional shapes such as a sphere, an ellipsoid, a cylinder, a cone, a prism, and a pyramid, or a three-dimensional structure composed by combining them. The three-dimensional structure may also be, for example, a three-dimensional structure whose outer surfaces are spliced from triangles.
Size: information describing the dimensions of the virtual article, indicating the length, width, height, and the like of the virtual article in three-dimensional space.
Fill pattern: used to indicate the pattern on the outer-surface map of the virtual article.
Step 524: in response to an editing operation on the first text, displaying the first text being changed into a second text;
Illustratively, the second text is used to modify the initial description word in the first text; the second text replaces a description word of the first text on at least one appearance attribute, but does not delete or add any appearance-attribute dimension included in the first text.
Step 526: in response to a confirmation operation for the second text, determining the second text as the input text;
By way of example, description information of the virtual environment of interest is obtained through the editing operation; the second text carries appearance attributes of at least one dimension, so that the virtual environment of interest is described from different dimensions.
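By way of a non-limiting illustration, steps 522 to 526 may be sketched as follows; the template wording, attribute slots, and helper names are assumptions for illustration only.

```python
# A minimal sketch of the language-template flow in steps 522-526.
LANGUAGE_TEMPLATE = (
    "Build a {color} {material} {three_dimensional_structure} of {size} size "
    "with a {fill_pattern} pattern."
)

def first_text_from_template(initial_words: dict) -> str:
    """Step 522: fill the language template with initial description words."""
    return LANGUAGE_TEMPLATE.format(**initial_words)

def edit_first_text(first_text: str, replacements: dict) -> str:
    """Step 524: replace description words on existing appearance attributes only;
    no attribute dimension is added or deleted."""
    second_text = first_text
    for old_word, new_word in replacements.items():
        second_text = second_text.replace(old_word, new_word)
    return second_text

# Step 526: on confirmation, the second text becomes the input text.
initial = {
    "color": "red", "material": "wooden", "three_dimensional_structure": "cabin",
    "size": "small", "fill_pattern": "plank",
}
first_text = first_text_from_template(initial)
input_text = edit_first_text(first_text, {"red": "blue", "small": "large"})
print(input_text)
```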
In summary, according to the method provided by this embodiment, a predicted image including at least one candidate virtual model is generated according to the input text, which provides an effective reference for building the virtual environment of interest; the first text guides the acquisition of appearance attributes in a plurality of dimensions, which enriches the dimensions in which the input text describes the virtual environment of interest; and because the predicted image includes an observation picture of at least one candidate virtual model, a relation is established between the predicted image and the candidate virtual models provided by the building interface, which avoids repeated adjustment of building positions caused by the lack of a reference picture of the virtual environment of interest during building, and improves human-computer interaction efficiency.
It will be appreciated by those skilled in the art that the above embodiments may be implemented independently, or the above embodiments may be combined freely to form new embodiments to implement the image generating method of the present application.
FIG. 14 is a block diagram showing the structure of an image generating apparatus according to an exemplary embodiment of the present application.
The device comprises:
the display module 810 is configured to display a building interface of the virtual environment, where the building interface includes at least one candidate virtual model for building the virtual environment;
an obtaining module 820, configured to obtain an input text, where the input text is a prompt text for building a virtual environment of interest, and the input text is used to describe an appearance of at least one virtual article included in the virtual environment of interest;
the display module 810 is further configured to display, in response to acquiring the input text, a predicted image, where the predicted image is a preview image of the virtual environment of interest constructed using the at least one candidate virtual model, and the predicted image includes a view of at least one candidate virtual model.
In an alternative implementation of the present embodiment, the predicted image includes a first predicted image and a second predicted image; the display module 810 is further configured to:
in response to obtaining the input text, displaying the first predicted image and the second predicted image;
the first virtual articles in the first predicted image are spliced by a first number of candidate virtual models, and the second virtual articles in the second predicted image are spliced by a second number of candidate virtual models; the first virtual article and the second virtual article belong to the same article type, the first number and the second number are positive integers, and the first number is smaller than the second number.
In an optional implementation of this embodiment, the predicted image is superimposed on the building interface in the form of a floating window;
or, the predicted image and the building interface are displayed in a split-screen mode;
or, the predicted image is displayed as a map on at least one virtual model in the building interface.
In an optional implementation of this embodiment, the display module 810 is further configured to:
in response to acquiring the input text, invoking the image prediction model to predict the predicted image according to the input text, and displaying the predicted image, where the image prediction model is used to generate an image according to an input prompt text.
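As a hedged illustration only: this embodiment does not prescribe any concrete image prediction model. The sketch below assumes the open-source Hugging Face diffusers text-to-image pipeline as a stand-in; the model identifier and the hardware assumption (a CUDA-capable device) are illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline

def predict_image(input_text: str):
    """Invoke the (assumed) image prediction model and return the predicted image."""
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumes a CUDA-capable device is available
    return pipe(input_text).images[0]  # a PIL image to display over the building interface

# Usage (illustrative prompt):
# predicted = predict_image("a small blue wooden cabin built from cube and wedge blocks")
# predicted.save("predicted.png")
```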
In an optional design of the present application, the obtaining module 820 is further configured to obtain a sample image and a description text of the sample image, where the sample image is a picture for observing a sample environment, and the sample environment is a virtual environment built based on the at least one candidate virtual model included in the building interface;
the apparatus further comprises a processing module 830:
the processing module 830 is configured to train an initial image prediction model based on the sample image and the description text to obtain the image prediction model, where the initial image prediction model is a model having the ability to generate an image according to a prompt text.
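A toy sketch of this training step is given below; the tiny bag-of-words network stands in for a real initial image prediction model, and the vocabulary, tensor sizes, loss, and training data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

VOCAB = ["red", "blue", "cube", "cone", "castle", "table", "lamp"]

def encode_text(text: str) -> torch.Tensor:
    """Bag-of-words encoding of a description text (illustrative only)."""
    words = text.lower().split()
    return torch.tensor([float(w in words) for w in VOCAB])

class ToyImagePredictor(nn.Module):
    """Maps a text encoding to a tiny 3x8x8 'image' tensor (stand-in for a real model)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(len(VOCAB), 64), nn.ReLU(), nn.Linear(64, 3 * 8 * 8))

    def forward(self, text_vec: torch.Tensor) -> torch.Tensor:
        return self.net(text_vec).view(-1, 3, 8, 8)

# Sample-environment screenshots paired with their description texts (random stand-ins here).
pairs = [("a red cube castle", torch.rand(1, 3, 8, 8)),
         ("a blue table lamp on a table", torch.rand(1, 3, 8, 8))]

model = ToyImagePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    for text, sample_image in pairs:
        pred = model(encode_text(text).unsqueeze(0))
        loss = loss_fn(pred, sample_image)   # fit the prediction to the sample image
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
print(float(loss))
```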
In an optional implementation of this embodiment, the obtaining module 820 is further configured to:
acquiring an image style text, wherein the image style text is a prompt text describing style types of predicted images;
the display module 810 is further configured to:
in response to acquiring the input text and the image style text, determining the input text and the image style text as input parameters of the image prediction model;
calling the image prediction model to perform prediction according to the input parameters to obtain the predicted image, where the image prediction model is used to generate an image according to an input prompt text.
In an optional implementation of this embodiment, the obtaining module 820 is further configured to:
acquiring a sample image and a description text of the sample image, where the sample image is a picture obtained by observing a sample environment, and the sample environment is a virtual environment built based on the at least one candidate virtual model included in the building interface;
performing feature extraction processing on the image style of the sample image to obtain sample style features;
invoking an initial image prediction model to perform prediction processing on the description text and an initial style text to obtain a style prediction image, where the initial style text includes a style description word and weight information;
and correcting the weight information in the initial style text, with the difference between the predicted style feature of the style prediction image and the sample style feature as a correction target, to obtain the image style text.
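A hedged toy sketch of this weight-correction loop follows; real style-feature extraction and image prediction would come from the models described above, and are replaced here by illustrative stand-ins so that only the correction of the weight information is visible. All names and values are assumptions.

```python
import numpy as np

def style_features(image: np.ndarray) -> np.ndarray:
    """Stand-in style feature: per-channel mean color (a real system would use a deeper feature)."""
    return image.reshape(-1, 3).mean(axis=0)

def predict_style_image(description: str, style_word: str, weight: float) -> np.ndarray:
    """Stand-in for the initial image prediction model: the style weight blends a base
    rendering toward a style color (purely illustrative)."""
    base = np.full((8, 8, 3), 0.5)
    style_color = np.array([0.9, 0.7, 0.2]) if style_word == "cartoon" else np.array([0.2, 0.2, 0.2])
    return (1.0 - weight) * base + weight * style_color

def correct_style_weight(description, style_word, sample_image, candidate_weights):
    """Pick the weight whose predicted style features are closest to the sample style features."""
    target = style_features(sample_image)
    def feature_gap(w):
        return np.linalg.norm(style_features(predict_style_image(description, style_word, w)) - target)
    best = min(candidate_weights, key=feature_gap)
    return f"({style_word}:{best:.2f})"  # image style text carrying the corrected weight information

sample = np.ones((8, 8, 3)) * np.array([0.78, 0.62, 0.41])  # sample screenshot of a built environment
print(correct_style_weight("a castle of blocks", "cartoon", sample, np.linspace(0, 1, 21)))
```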
In an optional implementation of this embodiment, the display module 810 is further configured to:
display a color selector of a first virtual model, where the color selector is provided with at least one recommended color, the recommended color is the color of an interest color-taking point in the predicted image, the position of the interest color-taking point belongs to a first region of interest, and the shape of the first region of interest is the same as the observed shape of the first virtual model at at least one viewing angle;
in response to a filling operation, displaying a first recommended color selected by the filling operation on at least one external surface of the first virtual model, the at least one external surface corresponding to the first region of interest.
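By way of a non-limiting illustration, the recommended colors may be obtained by sampling colors inside the first region of interest of the predicted image; the "most frequent quantized colors" heuristic and all names below are assumptions, not part of the claimed apparatus.

```python
import numpy as np
from collections import Counter

def recommended_colors(predicted_image: np.ndarray, region_mask: np.ndarray, k: int = 3):
    """Return up to k recommended colors sampled from pixels inside the region of interest."""
    pixels = predicted_image[region_mask]            # (N, 3) colors at interest color-taking points
    quantized = (pixels // 32) * 32                  # coarse quantization groups near-identical colors
    counts = Counter(map(tuple, quantized.tolist()))
    return [color for color, _ in counts.most_common(k)]

# Toy usage: an 8x8 predicted image whose region of interest is mostly brick red.
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[2:6, 2:6] = (178, 34, 34)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
print(recommended_colors(image, mask))  # [(160, 32, 32)] after quantization
```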
In an optional implementation of this embodiment, the display module 810 is further configured to:
displaying at least one recommendation model, where the recommendation model is a candidate virtual model included in the building interface, and the observed shape of the recommendation model at at least one viewing angle is the same as the shape of a second region of interest in the predicted image;
and in response to a model selection operation for a first recommendation model among the recommendation models, creating the first recommendation model in the building interface, where the first recommendation model is used for building the virtual environment of interest.
In an optional implementation of this embodiment, the display module 810 is further configured to:
displaying a recommended position of a first model relative to a second model, where the first model and the second model form an interest model group at the recommended position, and the observed shape of the interest model group at at least one viewing angle is the same as the shape of a third region of interest in the predicted image;
and in response to a position selection operation for the first model, moving the first model to the recommended position in the building interface.
In an optional implementation of this embodiment, the obtaining module 820 is further configured to:
in response to a trigger operation on a language template, displaying a first text corresponding to the language template, where the first text includes an initial description word of a virtual article in the virtual environment of interest on at least one appearance attribute;
in response to an editing operation on the first text, displaying the first text being changed into a second text, where the second text is used to modify the initial description word in the first text;
In response to a confirmation operation for the second text, the second text is determined to be the input text.
It should be noted that, when the apparatus provided in the foregoing embodiment performs its functions, the division into the foregoing functional modules is merely an example; in practical applications, the foregoing functions may be allocated to different functional modules according to actual needs, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
With respect to the apparatus in the above embodiments, the specific manner in which the respective modules perform the operations has been described in detail in the embodiments regarding the method; the technical effects achieved by the execution of the operations by the respective modules are the same as those in the embodiments related to the method, and will not be described in detail herein.
The embodiment of the application also provides a computer device, which comprises: a processor and a memory, the memory storing a computer program; the processor is configured to execute the computer program in the memory to implement the image generating method provided in each method embodiment.
FIG. 15 is a block diagram illustrating a structure of a terminal according to an exemplary embodiment of the present application. The terminal 1900 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1900 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, or the like.
Generally, the terminal 1900 includes: a processor 1901 and a memory 1902. The processor 1901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1901 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1901 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1901 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1902 may include one or more computer-readable storage media, which may be non-transitory. Memory 1902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1902 is used to store at least one instruction for execution by processor 1901 to implement the image generation methods provided by the method embodiments of the present application.
In some embodiments, terminal 1900 may optionally further include: a peripheral interface 1903 and at least one peripheral. The processor 1901, memory 1902, and peripheral interface 1903 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 1903 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1904, a touch display screen 1905, a camera assembly 1906, audio circuitry 1907, and a power supply 1908.
Peripheral interface 1903 may be used to connect at least one Input/Output (I/O) related peripheral to processor 1901 and memory 1902. In some embodiments, processor 1901, memory 1902, and peripheral interface 1903 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 1901, memory 1902, and peripheral interface 1903 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The radio frequency circuit 1904 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 1904 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1904 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1904 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1904 may also include NFC (Near Field Communication) related circuits, which is not limited in the present application.
The touch display 1905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display 1905 also has the capability to collect touch signals at or above its surface. The touch signal may be input as a control signal to the processor 1901 for processing. At this point, the touch display 1905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one touch display 1905, disposed on the front panel of the terminal 1900; in other embodiments, there may be at least two touch displays 1905, each disposed on a different surface of the terminal 1900 or in a folded design; in still other embodiments, the touch display 1905 may be a flexible display disposed on a curved surface or a folded surface of the terminal 1900. Moreover, the touch display 1905 may also be arranged in a non-rectangular irregular pattern, that is, an irregularly-shaped screen. The touch display 1905 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1906 is used to capture images or video. Optionally, the camera assembly 1906 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1907 may include a microphone and a speaker. The microphone is used to collect sound waves of a user and the environment, convert the sound waves into electrical signals, and input them to the processor 1901 for processing, or input them to the radio frequency circuit 1904 to realize voice communication. For purposes of stereo acquisition or noise reduction, there may be multiple microphones, each disposed at a different location on the terminal 1900. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1901 or the radio frequency circuit 1904 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert the electrical signal not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1907 may also include a headphone jack.
A power supply 1908 is used to power the various components in terminal 1900. The power supply 1908 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 1908 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1900 also includes one or more sensors 1909. The one or more sensors 1909 include, but are not limited to: acceleration sensor 1910, gyro sensor 1911, pressure sensor 1912, optical sensor 1913, and proximity sensor 1914.
Acceleration sensor 1910 may detect the magnitude of acceleration on three coordinate axes of a coordinate system established with terminal 1900. For example, the acceleration sensor 1910 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1901 may control the touch display 1905 to display a user interface in a landscape view or a portrait view based on gravitational acceleration signals acquired by the acceleration sensor 1910. Acceleration sensor 1910 may also be used for the acquisition of motion data of a game or user. The gyro sensor 1911 may detect a body direction and a rotation angle of the terminal 1900, and the gyro sensor 1911 may collect a 3D motion of the user on the terminal 1900 in cooperation with the acceleration sensor 1910. The processor 1901 may implement the following functions based on data collected by the gyro sensor 1911: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
Pressure sensor 1912 may be disposed on a side border of terminal 1900 and/or below touch display 1905. When the pressure sensor 1912 is disposed on the side frame of the terminal 1900, a grip signal of the terminal 1900 by the user may be detected, and the processor 1901 may perform left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 1912. When the pressure sensor 1912 is disposed at the lower layer of the touch display screen 1905, the processor 1901 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 1905. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1913 is configured to collect the ambient light intensity. In one embodiment, the processor 1901 may control the display brightness of the touch display 1905 based on the ambient light intensity collected by the optical sensor 1913. Specifically, when the ambient light intensity is high, the display brightness of the touch display 1905 is increased; when the ambient light intensity is low, the display brightness of the touch display 1905 is decreased. In another embodiment, the processor 1901 may also dynamically adjust the shooting parameters of the camera assembly 1906 based on the ambient light intensity collected by the optical sensor 1913.
A proximity sensor 1914, also referred to as a distance sensor, is typically disposed on the front panel of terminal 1900. The proximity sensor 1914 serves to collect a distance between a user and the front of the terminal 1900. In one embodiment, when the proximity sensor 1914 detects a gradual decrease in the distance between the user and the front face of the terminal 1900, the processor 1901 controls the touch display 1905 to switch from the bright screen state to the off screen state; when the proximity sensor 1914 detects that the distance between the user and the front surface of the terminal 1900 gradually increases, the processor 1901 controls the touch display 1905 to switch from the off-screen state to the on-screen state.
It will be appreciated by those skilled in the art that the above-described structure is not limiting of terminal 1900 and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a chip is also provided, the chip comprising programmable logic circuits and/or program instructions for implementing the image generation method of the above aspect when the chip is run on a computer device.
In an exemplary embodiment, a computer program product is also provided. The computer program product includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them to implement the image generation method provided by the above method embodiments.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the image generation method provided by the above-described method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application; the scope of the present application is defined by the appended claims.