CN119939004A - Web page process automation method, device, equipment, storage medium and program product - Google Patents

Web page process automation method, device, equipment, storage medium and program product

Info

Publication number
CN119939004A
Authority
CN
China
Prior art keywords
web page
webpage
large model
operation step
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510031776.8A
Other languages
Chinese (zh)
Inventor
付兵兰
廖汉伟
李丹霞
何慧敏
蔡亚妮
杨晓锋
刘雅莲
胡菲
刘春林
彭伟军
陈国�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Information Technology Co Ltd
Priority to CN202510031776.8A
Publication of CN119939004A
Legal status: Pending


Abstract

The application discloses a web page process automation method, device, equipment, storage medium and program product, relating to the technical field of artificial intelligence. The web page process automation method comprises: receiving a business requirement description input by a user; performing step disassembly on the business requirement description to obtain a web page operation flow comprising a plurality of operation step instructions; and, based on a pre-acquired web page document object model and the operation step instructions, positioning web page elements through a large model to obtain a web page element path, and executing the web page operation flow through the web page element path. The application realizes web page process automation based on the large model and the web page document object model, addresses the low web page element positioning efficiency of existing web page process automation methods, and improves that efficiency.

Description

Webpage process automation method, device, equipment, storage medium and program product
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for automating a web page process.
Background
RPA (Robotic Process Automation) technology automates the operation of programs and systems by simulating a user's operations on a computer, freeing users from monotonous, repetitive work and improving working efficiency.
In existing web page process automation methods, web page components are identified and web page elements are positioned by a reinforcement learning agent. This requires a large amount of time to label data and a large amount of resources to train the model. In addition, too many web page elements are handed to the agent, which makes positioning harder. As a result, web page element positioning efficiency is low.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present application and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The application mainly aims to provide a webpage process automation method, a device, equipment, a storage medium and a program product, and aims to solve the technical problem that the existing webpage process automation method is low in webpage element positioning efficiency.
In order to achieve the above object, the present application provides a web page process automation method, which includes:
Receiving a service demand description input by a user;
step disassembly is carried out on the business requirement description to obtain a webpage operation flow, wherein the webpage operation flow comprises a plurality of operation step instructions;
and based on a pre-acquired webpage document object model and the operation step instructions, positioning webpage elements through a large model to obtain a webpage element path, and executing the webpage operation flow through the webpage element path.
In an embodiment, the step of disassembling the service requirement description to obtain the web page operation flow includes:
acquiring full component information;
Generating a first disassembly prompt word based on the service demand description and the full component information;
inputting the first disassembly prompt word into the large model, performing step disassembly through the large model, and outputting a webpage operation flow.
In an embodiment, the full component information includes one or more of a component name, a component parameter, a component relationship, and a component function description, the operation step instruction includes one or more of a step name, a step instruction parameter, and a step function description, and the operation step instruction is used to operate a web page element of the front-end web page, where the web page element includes one or more of a text box, a button, a drop-down box, and a check box.
In an embodiment, before the step of obtaining the web page element path by performing web page element positioning through the large model based on the pre-obtained web page document object model and the plurality of operation step instructions, the method further includes:
Checking the operation step instructions based on a preset checking rule;
When operation step instructions fail the verification, generating operation semantic vectors based on the operation step instructions that failed the verification;
Searching through a preset component vector database according to the operation semantic vector to obtain target component data;
And generating a second disassembly prompt word based on the target component data and the first disassembly prompt word, inputting the second disassembly prompt word into the large model, performing step disassembly through the large model, and outputting the web page operation flow again.
In an embodiment, the step of positioning the web page element through a preset large model based on the pre-acquired web page document object model and the plurality of operation step instructions to obtain a web page element path includes:
generating a positioning prompt word based on the webpage document object model and a plurality of operation step instructions;
and inputting the positioning prompt word into the large model, positioning the webpage element through the large model, and outputting a webpage element path.
In an embodiment, the step of inputting the positioning prompt word into the large model, positioning the webpage element through the large model, and outputting the webpage element path includes:
Inputting the positioning prompt word into the large model for the following processing:
Determining webpage elements to be operated according to the positioning prompt words;
Reading nodes corresponding to the webpage elements in the webpage document object model, and setting the nodes corresponding to the webpage elements as current nodes;
recursively calling a parent node of the current node in the webpage document object model, and setting the parent node of the current node as the current node until the current node does not have the parent node, so as to obtain a recursively called node path;
And outputting a webpage element path according to the node path.
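The parent-walking procedure above can be illustrated deterministically. The following is a minimal sketch only, not the application's implementation: in the application the large model performs the positioning, whereas here the walk is plain Python over a hypothetical page snippet, using a loop rather than literal recursion. The names `build_parent_map`, `node_path`, `to_xpath` and `PAGE` are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    # ElementTree nodes carry no parent pointer, so record child -> parent once.
    return {child: parent for parent in root.iter() for child in parent}


def node_path(node, parents):
    """Walk from the node up to the root, then reverse into root-to-node order."""
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = parents.get(current)  # None once the root has been reached
    path.reverse()
    return path


def to_xpath(path, parents):
    """Render a node path as an XPath-like string with 1-based sibling indexes."""
    parts = []
    for element in path:
        parent = parents.get(element)
        if parent is None:
            parts.append(element.tag)
        else:
            same_tag = [child for child in parent if child.tag == element.tag]
            parts.append(f"{element.tag}[{same_tag.index(element) + 1}]")
    return "/" + "/".join(parts)


# Hypothetical, well-formed page snippet used only for this sketch.
PAGE = (
    "<html><body><div>"
    '<input id="user"/>'
    '<button id="cancel">Cancel</button>'
    '<button id="loginButton">Log in</button>'
    "</div></body></html>"
)

root = ET.fromstring(PAGE)
parents = build_parent_map(root)
target = root.find(".//button[@id='loginButton']")
xpath = to_xpath(node_path(target, parents), parents)
print(xpath)
```

The walk stops when the current node has no parent, which matches the termination condition stated above; reversing the collected nodes yields the root-to-element order that a path expression needs.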
In addition, in order to achieve the above object, the present application also provides a web page process automation device, which includes:
the receiving module is used for receiving the business requirement description input by the user;
the disassembly module is used for carrying out step disassembly on the business requirement description to obtain a webpage operation flow, and the webpage operation flow comprises a plurality of operation step instructions;
and the positioning module is used for positioning the webpage elements through the large model based on the pre-acquired webpage document object model and the operation step instructions to obtain a webpage element path, and executing the webpage operation flow through the webpage element path.
In addition, in order to achieve the aim, the application also provides a webpage process automation device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is configured to realize the steps of the webpage process automation method.
In addition, to achieve the above object, the present application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the web page flow automation method as described above.
Furthermore, to achieve the above object, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a web page flow automation method as described above.
The application provides a web page process automation method. The method first receives a business requirement description input by a user, which determines the web page process automation requirement. It then performs step disassembly on the business requirement description to obtain a web page operation flow comprising a plurality of operation step instructions, so that the user requirement is converted into executable web page step instructions. Finally, based on a pre-acquired web page document object model and the operation step instructions obtained by step disassembly, it uses a large model to position web page elements efficiently, obtains a web page element path, and automatically executes the web page operation flow through that path. The method realizes web page process automation based on the large model and the web page document object model, addresses the low web page element positioning efficiency of existing web page process automation methods, and improves that efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a flow chart of a first embodiment of an automated web page flow method according to the present application;
FIG. 2 is a flow chart of a second embodiment of the web page flow automation method according to the present application;
FIG. 3 is a schematic diagram of a third embodiment of an automated web page process according to the present application;
FIG. 4 is a flowchart of a third embodiment of the web page process automation method according to the present application;
FIG. 5 is a schematic flow chart of service requirement dismantling provided by a third embodiment of the automatic webpage flow chart method of the present application;
FIG. 6 is a flowchart of a third embodiment of the method for automatically generating a web page flow according to the present application;
FIG. 7 is a flowchart illustrating a web page element positioning process according to a third embodiment of the present application;
FIG. 8 is a flowchart illustrating a web page process according to a third embodiment of the present application;
FIG. 9 is a schematic block diagram of an embodiment of an automatic web page process device;
FIG. 10 is a schematic diagram of the device structure of the hardware operating environment involved in the web page process automation method according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the technical solution of the present application and are not intended to limit the present application.
For a better understanding of the technical solution of the present application, the following detailed description will be given with reference to the drawings and the specific embodiments.
The main solution of the embodiment of the application is that the business requirement description input by a user is received, the business requirement description is disassembled to obtain a webpage operation flow, the webpage operation flow comprises a plurality of operation step instructions, webpage element positioning is carried out through a large model based on a pre-acquired webpage document object model and the operation step instructions to obtain a webpage element path, and the webpage operation flow is executed through the webpage element path.
In the prior art, web page process automation is generally implemented by training a reinforcement learning model to identify UI (User Interface) components and web page elements. For the reinforcement learning agent to identify the front-end component types (e.g., buttons, text boxes, drop-down boxes) corresponding to various web page elements, learn to align instruction key-value pairs with operable leaf nodes, and execute actions automatically, data sets of DOM (Document Object Model) descriptions of UI components such as buttons, drop-down boxes and check boxes must be collected from different web pages and labeled. This approach requires a great deal of time to label data and a great deal of resources to train the model.
Secondly, handing all operable nodes (leaf nodes) of the web page document object model to the agent gives it too many page elements and makes page element positioning harder. Moreover, natural-language instructions are parsed into instruction key-value pairs rather than task execution steps. The parsing is not fine-grained enough and does not match the design steps of existing web page process automation, which hinders reuse of existing RPA components and of users' existing understanding.
The application relates to the technical terms:
RPA (Robotic Process Automation, robotic flow automation), a digital technique that allows automating the operation of programs and systems by mimicking the operation of a user on a computer.
Large model: a machine learning model with large-scale parameters and a complex computational structure, typically built from deep neural networks with billions of parameters or more, capable of handling more complex tasks and data.
The page element recognition technology, namely a technology for automatically recognizing and understanding various elements on a webpage or an application program interface, can realize various complex automation tasks such as webpage crawlers, UI (user interface) automation tests, data extraction and the like by combining with technologies such as HTML (hypertext markup language) analysis, element positioning, data grabbing, cleaning and the like.
Plug-in is a program written by an application program interface conforming to a certain specification. It can only run under a program-specified system platform (possibly supporting multiple platforms simultaneously) and cannot run separately from the specified platform.
The DOM (document object model) is a programming interface for representing and manipulating HTML or XML documents, representing the document as a structured tree structure, each node representing an object, such as an element, attribute, or text, in the document.
Based on the semantic understanding and reasoning capability of the large model, the application can automatically disassemble the user's business requirement and convert the natural language input by the user into a plurality of executable steps, which lowers the threshold for RPA flow arrangement and shortens the time it takes. Existing RPA components are reused to execute the operation steps, and the resulting steps basically coincide with existing RPA design steps, which helps reuse existing RPA components and users' existing understanding. In addition, given page elements and the page DOM structure are used as input, a specific page element is positioned from them, and the page element together with the operation step instruction is handed to an executor engine for execution. Page elements can thus be positioned efficiently and accurately across different web page interfaces and application scenarios, without being affected by factors such as web page layout, colors and styles.
It should be noted that, the execution body of the embodiment may be a computing service device with functions of data processing, network communication and program running, such as a tablet computer, a personal computer, a mobile phone, or an electronic device, a web page flow automation device, or the like capable of implementing the above functions. The present embodiment and the following embodiments will be described below by taking a web page flow automation device as an example.
Based on this, an embodiment of the present application provides a web page process automation method, and referring to fig. 1, fig. 1 is a flow diagram of a first embodiment of the web page process automation method of the present application.
In this embodiment, the web page process automation method includes steps S10 to S30:
step S10, receiving a service demand description input by a user;
It should be noted that the business requirement description is the user's specific description of the business requirement for the automation task and is the starting point of the automation flow design. The user may enter it through a web plug-in, a graphical user interface, the command line, or by uploading a requirement document. The business requirement description can include the task content the user wants to execute, the target web pages, the expected results and the like.
It can be understood that, in order to define the user requirements, a foundation is provided for the subsequent automated process design, and the specific business requirement description of the user on the automated process of the webpage is received, so that the specific task that the user wants to be automated is understood, and the automated process is ensured to meet the actual business requirements of the user.
Step S20, carrying out step disassembly on the business requirement description to obtain a webpage operation flow, wherein the webpage operation flow comprises a plurality of operation step instructions;
It should be noted that, the web page operation flow refers to a series of step sets for performing an automation operation in a web page, and is used for guiding the automation flow to execute. The operation step instruction refers to a specific operation instruction for guiding an automation tool to perform a specific action in a web page, such as "click form button", "input data", "click submit button", etc.
It will be appreciated that, to convert the user's high-level requirement into specific operation steps that an automation tool can execute, the business requirement description input by the user is broken down into a web page operation flow containing a plurality of executable operation step instructions. This gives the automation tool explicit operational guidance and ensures that the flow is executable.
Step S30, based on the pre-acquired webpage document object model and the operation step instructions, positioning webpage elements through the large model to obtain a webpage element path, and executing the webpage operation flow through the webpage element path.
It should be noted that the web page document object model is a tree representation of the web page structure used to locate and operate web page elements; a complete web page document object model may include the web page elements, comments, JavaScript, styles and other content of the whole page. A web page element path points to the specific position of a web page element in the DOM tree and describes how to reach that element from the root node of the web page document object model. Such paths allow accurate interaction with elements on the page and guide an automation tool to find and operate specific web page elements. For example, CSS selectors (e.g., #loginButton) or XPath expressions (e.g., /html/body/div/button[1]) are used to locate elements.
Additionally, it should be noted that, when the web page element cannot be located directly through the DOM tree or the text description, the elements such as the buttons, the input boxes, etc. in the page can be found through the image recognition technology, and the elements are further accurately located based on the visual content of the web page.
It can be understood that, to realize automatic web page interaction, the web page element path is located using the web page document object model and the operation step instructions, the web page elements are operated through the web page element path, and the web page operation flow is executed automatically. By combining the large model with the web page document object model, the system can accurately identify target elements in complex web pages and carry out the automated web page flow.
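To make the notion of a web page element path concrete, the sketch below resolves two path forms against a small DOM tree: a positional path and an id-based lookup that plays the role of the CSS selector #loginButton. It is an illustration under stated assumptions, not the application's executor: the page snippet is hypothetical, and Python's `xml.etree.ElementTree` supports only a limited XPath subset, so the absolute path /html/body/div/button[1] is written relative to the root `<html>` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical page snippet; real pages are HTML and may need an HTML parser.
PAGE = (
    "<html><body><div>"
    '<button id="loginButton">Log in</button>'
    '<button id="cancel">Cancel</button>'
    "</div></body></html>"
)

root = ET.fromstring(PAGE)  # root is the <html> element

# Positional path, equivalent to /html/body/div/button[1] taken from the root.
by_xpath = root.find("./body/div/button[1]")

# Attribute lookup, standing in for the CSS selector #loginButton.
by_id = root.find(".//*[@id='loginButton']")

print(by_xpath.text, by_xpath is by_id)
```

Both lookups return the same node here. A positional path breaks when sibling order changes, while an id-based path does not, which is one reason an automation tool may prefer stable attributes when they exist.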
In a possible implementation manner, the step of disassembling the service requirement description to obtain the web page operation flow includes:
step S201, acquiring full component information;
it should be noted that the full component information refers to all possible operable component information, and the components generally include interactive elements such as buttons, text boxes, links, drop-down boxes, check boxes, and the like, and their attributes (such as IDs, class names, locations, tag text, and the like) can be obtained by collecting different component library data.
It can be understood that the components used in different webpages are different, so that in order to ensure that the system can comprehensively understand the components of different webpages and provide rich operation information for the disassembly of the subsequent steps, all the webpage component information is collected, and therefore, when the subsequent steps are disassembled, the webpage elements corresponding to the component operation can be identified according to the total component information and an operation flow can be generated.
Step S202, generating a first disassembly prompt word based on the service demand description and the full component information;
It should be noted that the disassembly prompt word is a descriptive question or instruction intended to guide the large model in disassembling the user's requirement description into steps. It converts the user's business requirement into a task prompt that the machine can understand and disassemble. For example, when the user enters "I want to log in," the system may generate the disassembly prompt "the user wants to fill in the user name and password on the login page and click the login button".
It can be understood that, to combine the user requirement with the web page component information, the business requirement description provided by the user is merged with the acquired full component information to generate a disassembly prompt word that guides the large model's answer. This converts the abstract business requirement into an operation prompt directly associated with web page components and provides detailed guidance for the subsequent automatic disassembly.
Step S203, inputting the first disassembly prompt word into the large model, performing step disassembly through the large model, and outputting a web page operation flow.
It should be noted that the large model may be a trained deep learning model, which can understand the specific meaning of a task from the input prompt words and generate a series of operation steps.
It can be understood that the first disassembly prompt word is used as input, the first disassembly prompt word is further processed and analyzed by utilizing the reasoning capability of the large model, the requirement of the user is intelligently disassembled into the webpage operation flow, and the accuracy and the high efficiency of the execution of the webpage operation flow are ensured.
In this embodiment, based on the business requirement input by the user, the natural language processing capability of the large model is used to extract the key actions, targets and context information in the requirement and to understand the specific operation and relevant elements corresponding to each step. The natural language requirement input by the user can therefore be converted into machine-executable operations, which raises the level of automation. By relying on the language understanding and logical reasoning capability of the large model, the method can handle static web pages, adapt to page structure changes, and deal flexibly with different types of web pages.
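Steps S201 to S203 can be pictured as assembling a prompt from the requirement and the component catalogue. The sketch below is a hedged illustration: the prompt wording, the function `build_disassembly_prompt`, the component names `input_text` and `click_element`, and the requested JSON answer format are all assumptions, since the application does not specify them. No model is called; only the prompt text is built.

```python
def build_disassembly_prompt(requirement, components):
    """Combine a business requirement with component information into one prompt."""
    component_lines = "\n".join(
        f"- {c['name']}({', '.join(c['params'])}): {c['description']}"
        for c in components
    )
    return (
        "You are an RPA flow planner.\n"
        "Available components:\n"
        f"{component_lines}\n"
        f"Business requirement: {requirement}\n"
        "Break the requirement into ordered steps. Answer as a JSON list of "
        "objects with keys step_name, parameters and description, using only "
        "the components listed above."
    )


# Hypothetical full component information (names and fields are illustrative).
COMPONENTS = [
    {"name": "input_text", "params": ["target", "value"],
     "description": "type text into a text box"},
    {"name": "click_element", "params": ["target"],
     "description": "click a button or link"},
]

first_prompt = build_disassembly_prompt("I want to log in", COMPONENTS)
print(first_prompt)
```

Listing the components inside the prompt constrains the model to steps that an existing RPA component can execute, which is the reuse property the embodiment emphasizes.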
In a possible implementation, the full component information includes one or more of a component name, a component parameter, a component relationship, and a component function description, the operation step instruction includes one or more of a step name, a step instruction parameter, and a step function description, and the operation step instruction is used to operate a web page element of the front-end web page, where the web page element includes one or more of a text box, a button, a drop-down box, and a check box.
It should be noted that the component name is the basic identifier of a web page element, typically the name of an HTML tag. For example, a button corresponds to <button>, a text box corresponds to <input>, a drop-down box corresponds to <select>, a check box corresponds to <input type="checkbox">, etc. Component parameters are the specific attributes describing a component that determine its behavior and manner of display. Component relationships are the associations or relative locations between a component and other components, which are important for determining the order and steps of operations; e.g., a button may be located within a form, or a text box may be part of a login form. Component function descriptions describe a component's role or function; e.g., the function description of a "submit" button may be "submit form data" and that of a text box may be "enter user name" or "enter password".
The step name describes the function or purpose of the step, e.g. "enter user name", "click on login button", etc. The step command parameters give specific operation parameters, for example, the "user name input" step may include a parameter "user name", i.e., a component name corresponding to the input box, and the "button click" step may include a parameter "submit_button", i.e., a component name of the button. Step function description details the role of this operation step, for example, the function description of "enter user name" may be "enter a valid user name in a user name text box", and the function description of "click on login button" may be "click on login button submit form".
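The fields listed above map naturally onto two small record types. The following sketch is one possible representation, assuming Python 3.9 or later; the class names `ComponentInfo` and `OperationStep`, the field names, and the sample login flow are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentInfo:
    """One entry of the full component information."""
    name: str                                            # component name, e.g. an HTML tag
    parameters: list[str] = field(default_factory=list)  # component parameters
    relations: list[str] = field(default_factory=list)   # relationships to other components
    description: str = ""                                # component function description


@dataclass
class OperationStep:
    """One operation step instruction of a web page operation flow."""
    step_name: str
    parameters: dict[str, str] = field(default_factory=dict)
    description: str = ""


# Illustrative login flow built from the examples in the text above.
flow = [
    OperationStep("enter user name", {"target": "user name"},
                  "enter a valid user name in the user name text box"),
    OperationStep("enter password", {"target": "password"},
                  "enter the password in the password text box"),
    OperationStep("click login button", {"target": "submit_button"},
                  "click the login button to submit the form"),
]

login_button = ComponentInfo(
    name="button",
    parameters=["id", "type"],
    relations=["inside the login form"],
    description="submit form data",
)

print([step.step_name for step in flow])
```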
In a possible implementation manner, before the step of obtaining the web page element path by performing web page element positioning through the large model based on the pre-obtained web page document object model and the plurality of operation step instructions, the method further includes:
step S301, checking the operation step instructions based on a preset checking rule;
It should be noted that the checking rule is a preset rule for judging whether an operation step instruction is valid; only valid operation step instructions allow the web page operation flow to be executed completely. The rule may include requirements at the syntax, semantic and logic levels, such as format checking, syntax checking, semantic checking and operation sequence checking. Syntax checking means checking whether the structure of an operation step instruction meets requirements, such as whether the instruction format is correct and the parameters are complete. Semantic checking means checking whether the logic of the operation step instruction is correct, such as whether it refers to a valid component or contains an impossible operation (e.g., clicking an invisible button or entering invalid data). Operation sequence checking means ensuring that the order of the operation steps is reasonable, so that consecutive steps are logically coherent.
It can be understood that, in order to ensure that the instructions of the operation steps are valid and executable, several operation step instructions output by the large model are checked to ensure that the operation step instructions conform to preset checking rules, so as to ensure that the operation step instructions are valid and executable, avoid that the wrong instructions enter the subsequent procedure, ensure the correctness and consistency of the instructions, and improve the reliability of the whole system.
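A rule-based checker of the kind described for step S301 might look like the sketch below. The concrete rules are assumptions chosen for illustration: the component table `KNOWN_COMPONENTS`, the step dictionary layout, and the ordering rule (a click on a target whose name starts with "submit" must follow at least one text input) are hypothetical, since the application leaves the preset rules open.

```python
# Hypothetical component table: component name -> required parameter names.
KNOWN_COMPONENTS = {
    "input_text": {"target", "value"},
    "click_element": {"target"},
}


def check_steps(steps):
    """Return a list of (step index, reason) pairs for steps that fail a rule."""
    errors = []
    seen_input = False
    for i, step in enumerate(steps):
        name = step.get("step_name")
        params = step.get("parameters")
        # Format check: both fields must be present with the expected types.
        if not isinstance(name, str) or not isinstance(params, dict):
            errors.append((i, "format: step_name or parameters missing"))
            continue
        # Semantic check: the step must refer to a known component.
        if name not in KNOWN_COMPONENTS:
            errors.append((i, f"semantic: unknown component {name!r}"))
            continue
        # Syntax check: every required parameter must be supplied.
        missing = KNOWN_COMPONENTS[name] - params.keys()
        if missing:
            errors.append((i, f"syntax: missing parameters {sorted(missing)}"))
            continue
        # Sequence check (illustrative rule): no submit click before any input.
        if name == "input_text":
            seen_input = True
        elif params["target"].startswith("submit") and not seen_input:
            errors.append((i, "order: submit clicked before any input"))
    return errors


good_flow = [
    {"step_name": "input_text", "parameters": {"target": "username", "value": "alice"}},
    {"step_name": "click_element", "parameters": {"target": "submit_button"}},
]
bad_flow = [
    {"step_name": "hover_menu", "parameters": {}},
    {"step_name": "input_text", "parameters": {"target": "username"}},
]

good_errors = check_steps(good_flow)
bad_errors = check_steps(bad_flow)
print(good_errors, bad_errors)
```

Returning the index and reason for each failure, rather than a single pass or fail flag, keeps the failed instructions available for the vectorization step that follows.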
Step S302, when the operation step instructions are not verified, generating operation semantic vectors based on the operation step instructions which are not verified;
It can be understood that when some operation step instructions fail the verification, the potential problems in them need to be understood. The failed operation step instructions are semantically analyzed and corresponding operation semantic vectors are generated; these vectors not only reflect the content of the operation steps but may also capture potential error information or ambiguity.
Step S303, searching through a preset component vector database according to the operation semantic vector to obtain target component data;
It should be noted that the attributes of each web page component (such as ID, component name, type, location) may be encoded into vector form, so that the component vector reflects the component's function, type, relationship with other elements and so on. The component vector database is a vectorized database containing all operable components in the web page, with the component data (e.g., location, type, status) stored in vector form.
It will be appreciated that by converting the operation steps into semantic vectors, the system may use these vectors to retrieve in a pre-established component vector database, find target component data associated with the target operation, so that subsequent operations can be performed on the target component accurately, ensuring that the identified component is the component desired by the user, thereby avoiding misoperations.
Illustratively, assume that a step instruction is "click the login button" but the button is absent or invisible because the page layout has changed. The semantics of this step may be converted into a vector and marked as "button unavailable" or "target element not found", and the semantic vector may then be matched against the component vector database to find a component associated with "login".
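The retrieval in step S303 can be sketched as a nearest-neighbour search over component vectors. The sketch below is only illustrative: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the component names and descriptions in `COMPONENTS` are hypothetical, not taken from the application.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical component records; a real system would load these from the RPA platform.
COMPONENTS = [
    {"name": "click_element", "description": "click a button or link on the page"},
    {"name": "input_text", "description": "type text into a text box"},
    {"name": "select_option", "description": "choose an option from a drop-down box"},
]


def retrieve(query: str, top_n: int = 1) -> list:
    """Return the top_n components whose description is most similar to the query."""
    q = embed(query)
    ranked = sorted(
        COMPONENTS,
        key=lambda c: cosine(q, embed(c["description"])),
        reverse=True,
    )
    return ranked[:top_n]


print(retrieve("click the login button")[0]["name"])
```

In practice the vectors would be computed once and stored in the component vector database rather than recomputed on every query.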
Step S304, based on the target assembly data and the first disassembly prompt word, generating a second disassembly prompt word, inputting the second disassembly prompt word into the large model, performing step disassembly through the large model, and outputting the webpage operation flow again.
It can be understood that according to the target assembly data and the first disassembly prompt word generated for the first time, a more optimized second disassembly prompt word is generated, the generated second disassembly prompt word is input into the large model, the step of describing the user service requirement is disassembled again, and finally, a webpage operation flow which is more accurate and accords with the actual webpage environment is output.
In this embodiment, by combining the large model with retrieval-augmented generation, the operation step instructions are verified, matched against retrieved component data and optimized. This ensures that the operation steps are executed accurately and that the automated operation adapts to complex web page structures and element changes according to the real-time state of the web page, thereby raising the intelligence level of web page process automation.
The embodiment provides a webpage process automation method, which comprises the steps of firstly receiving business requirement description input by a user to determine webpage process automation requirements, carrying out step disassembly on the business requirement description to obtain a webpage operation process comprising a plurality of operation step instructions so as to convert the user requirements into a plurality of executable webpage step instructions, and carrying out high-efficiency webpage element positioning by utilizing a large model to position according to a pre-acquired webpage document object model and the operation step instructions obtained by step disassembly to obtain a webpage element path and automatically executing the webpage operation process through the webpage element path. The method realizes webpage process automation based on the large model and the webpage document object model, solves the problem of low webpage element positioning efficiency of the existing webpage process automation method, and improves the webpage element positioning efficiency of the webpage process automation.
In the second embodiment of the present application, the same or similar content as that of the first embodiment may be referred to the description above, and will not be repeated. On this basis, please refer to fig. 2, fig. 2 is a flowchart illustrating a second embodiment of the web page process automation method of the present application.
In this embodiment, the step of performing positioning of the web page element through the preset large model based on the pre-acquired web page document object model and the plurality of operation step instructions to obtain the web page element path includes:
step S305, generating a positioning prompt word based on the webpage document object model and a plurality of operation step instructions;
It will be appreciated that a positioning prompt word is generated from the web page document object model and the operation step instructions and is input into the large model to locate web page elements, so that elements can be located quickly when the web page flow is executed.
Step S306, inputting the positioning prompt word into the large model, positioning the webpage element through the large model, and outputting a webpage element path.
It should be noted that, the positioning prompt word is input into the large model, the positioning of the webpage element is performed by utilizing the natural language understanding and reasoning capability of the large model, the position of the webpage element is deduced from the positioning prompt word, and finally the path of the webpage element corresponding to the operation step instructions in the webpage document object model is output.
In a possible implementation manner, the step of inputting the positioning prompt word into the large model, positioning the webpage element through the large model, and outputting the webpage element path includes:
Inputting the positioning prompt word into the large model for the following processing:
step S3061, determining webpage elements needing to be operated according to the positioning prompt words;
it will be appreciated that after the positioning prompt is entered into the large model, the large model analyzes the entered positioning prompt, understands its semantics and determines the elements of the web page that need to be operated, e.g., clicking the login button will point to the button element on the page that has login functionality.
Step S3062, reading nodes corresponding to the webpage elements in the webpage document object model, and setting the nodes corresponding to the webpage elements as current nodes;
It should be noted that once the target element is identified, the large model will find the node of the element from the DOM tree structure of the web page document object model, where the node contains all the information of the element, such as tag type, attribute, child node, and so on.
It can be understood that the node of the web page element to be operated is read from the web page document object model and set as the current node. It serves as the starting point of the recursive traversal, in which parent nodes are looked up recursively to build the node path of the web page element.
Step S3063, recursively calling a parent node of the current node in the webpage document object model, and setting the parent node of the current node as the current node until the current node does not have the parent node, so as to obtain a recursively called node path;
it should be noted that the node path uniquely identifies the position of the web page element in the DOM tree. The recursively invoked node paths eventually form a complete path by taking the label of the current node and its attributes as part of the path in each recursion.
It can be understood that the current node is taken as the starting point of the recursive lookup. The DOM tree is traversed upward from the current node by repeatedly obtaining the parent of the current node, and the recursion stops when the root node of the DOM tree is reached. This builds a complete path of consecutive parent-child relationships from the target element node to the root node, so the node path of a web page element can be tracked and constructed accurately even in complex web pages.
Step S3064, outputting a webpage element path according to the node path.
It should be noted that the web page element path may be a hierarchical description of DOM nodes, such as html > body > div > form > button[id="submit"], and may be output as a string or in a structured format (e.g., JSON).
It can be appreciated that after the path from the target element node to the root node is obtained through a recursive method, the path is converted into a readable web page element path and output, and the target element can be uniquely identified, so that a user or an automation tool can accurately find the target element executed by the web page operation flow by using the path.
Illustratively, assuming that the positioning prompt word is "click submit button", the large model recognizes that the "submit button" corresponds to a <button> element and determines the target operation based on the prompt word. The large model then reads the <button> element node from the DOM and recursively looks up its parent nodes until the root node is reached. Assuming the resulting path is html > body > div > form > button[id="submit"], the path accurately locates the "submit button" in the page and is output as the final web page element path.
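Steps S3062 to S3064 can be sketched as a recursion over parent links. The following is a minimal sketch under the assumption that each node stores a reference to its parent; the `Node` class and the helper names `segment` and `node_path` are hypothetical and only serve to reproduce the example path above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical DOM node holding a tag, its attributes and a parent link."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def segment(node: Node) -> str:
    """Render one path segment: the tag, plus the id attribute when present."""
    if "id" in node.attrs:
        return f'{node.tag}[id="{node.attrs["id"]}"]'
    return node.tag


def node_path(node: Node) -> list:
    """Recurse to the parent until no parent exists, then append this node."""
    if node.parent is None:
        return [segment(node)]
    return node_path(node.parent) + [segment(node)]


html = Node("html")
body = Node("body", parent=html)
div = Node("div", parent=body)
form = Node("form", parent=div)
button = Node("button", {"id": "submit"}, parent=form)

print(" > ".join(node_path(button)))
# html > body > div > form > button[id="submit"]
```

The recursion terminates at the root because the root is the only node whose parent is absent, which matches the stopping condition of step S3063.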
In the embodiment, the DOM structure is traversed by combining the reasoning capability of the large model in a recursive call mode, and the path from the target element to the root node is constructed, so that the accurate positioning of the webpage element is realized, the webpage element can be accurately identified and positioned according to the positioning prompt word, and the complete DOM path of the webpage element is constructed.
In the embodiment, by combining the characteristics of natural language processing, large model reasoning capability and webpage DOM structure, the high-efficiency and flexible webpage element positioning is realized, positioning prompt words can be automatically generated from operation step instructions, the paths of the webpage elements are deduced through a large model, and finally, automatic webpage operation and test are realized, and the accuracy, robustness and adaptability of an automatic system are improved.
In the third embodiment of the present application, the same or similar content as that of the first and second embodiments may be referred to the description above, and will not be repeated.
For example, in order to facilitate understanding of the implementation flow of the web page flow automation method obtained by combining the first embodiment and the second embodiment, please refer to fig. 3, fig. 3 is a schematic diagram of the architecture of the third embodiment of the web page flow automation method of the present application, which may specifically include three layers:
the first layer is a presentation layer and is interacted with a user mainly through a webpage plug-in;
the second layer is a business logic layer and mainly comprises three parts of business disassembly, tool calling and task execution;
the third layer is a large model layer and mainly comprises a large model, an Agent and a vector database.
Fig. 4 is a flow chart of a third embodiment of the web page flow automation method according to the present application. The user-oriented presentation layer is realized in the form of a webpage plug-in, and mainly comprises the following steps:
1. Acquiring the natural-language business requirement description received by the web page plug-in
After the web page plug-in is started, a text input box is displayed on the front-end web page, and the user can enter natural language in the text input box to describe the business requirement to be fulfilled. For example, the natural language input by the user is "Search Baidu for the Huawei Mate 60 mobile phone". The natural language text entered through the web page plug-in is passed to the step disassembly module of the business logic layer for processing.
2. Breaking down business requirements into operation step instructions
Fig. 5 is a schematic flow chart of service requirement disassembly provided in a third embodiment of the web page flow automation method of the present application. The method mainly comprises the steps of carrying out semantic understanding on input natural language business demands through a large language model and realizing step disassembly, so that operation step instructions are output, and the output operation step instructions need to pass through various verification checks until the operation step instructions corresponding to the business demands completely meet the conditions. The specific processing flow comprises the following steps:
(1) Intention verification
After the natural-language business requirement description input by the user through the web page plug-in is received, intention recognition screening is performed first. Business requirements that violate laws and regulations, social ethics and the like are rejected and processing exits directly, and corresponding prompt information is given to the user so that the user can input a legitimate business requirement again.
(2) Large model generation operation step instruction
The large model generates an operation step instruction according to the prompt word, wherein key information contained in the prompt word comprises full component information and history generation memory information. The history generation memory information is mainly prompting information that the operation step instruction generated in the history is not in accordance with the requirements, and the problem of the operation step instruction generated before the large model is described, so that the large model can generate a result in accordance with the requirements next time.
The number of repetitions for generating operation step instructions is configurable. Within the configured number of attempts, the operation step instructions are regenerated in a loop until all conditions are met, and the instructions that satisfy all conditions are output. The generated operation step instructions are output in JSON format and include information such as the step name, step parameters and step function description.
In addition, the accuracy with which the large model disassembles business requirements is closely related to its temperature parameter. The temperature is the sampling temperature of the large model, with a value range of [0.0, 1.0]; the larger the value, the more diverse the model's answers. In this embodiment, the number of cycles needed before the disassembly result meets the conditions can be counted for different temperature values, and the temperature value is chosen accordingly to improve the accuracy of step disassembly.
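One way to choose the temperature from such counts is sketched below. It assumes, as an interpretation rather than a statement of the source, that the preferred temperature is the one with the lowest average cycle count; `run_once` is a hypothetical callable that runs one disassembly at a given temperature and returns the number of cycles it needed.

```python
from statistics import mean
from typing import Callable


def pick_temperature(
    run_once: Callable[[float], int],
    candidates: list,
    trials: int = 5,
) -> float:
    """Return the candidate temperature with the lowest mean cycle count."""
    for t in candidates:
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"temperature {t} is outside [0.0, 1.0]")
    average = {t: mean(run_once(t) for _ in range(trials)) for t in candidates}
    return min(average, key=average.get)


# Stub measurement for illustration only: pretend 0.2 converges fastest.
best = pick_temperature(lambda t: 1 if t == 0.2 else 3, [0.0, 0.2, 0.8])
print(best)
```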
The large model is utilized to automatically complete the disassembly of the service requirements, the disassembled output is an operation step instruction, the granularity of the disassembly is consistent with that of the existing RPA assembly, the existing RPA assembly can be better reused, and meanwhile, a user can easily understand the generated operation step instruction.
(3) Checking operation step instruction output by large model
The constraints on the model output are described in detail in the prompt words of the large model so that the large model outputs operation step instructions as required. However, the output of the large model sometimes does not fully meet the requirements, so it needs to be verified, and only operation step instructions that fully pass verification are treated as valid. The verification mainly covers the following aspects:
(i) Format check: the output operation step instructions must be in JSON format, and the fields in the JSON must meet the preset field requirements;
(ii) Operation step instruction check: first, whether the step name is in the known component set (the prompt word explicitly requires that step names use component names); second, whether the output step instruction parameters meet the requirements of the component, where required parameters must not be missing, otherwise the operation step cannot be executed correctly; and finally, whether the relationships between components meet the requirements, for example that two particular components must not appear consecutively.
When the component name check fails, the correct component name is retrieved through RAG (Retrieval-Augmented Generation) technology and provided to the large model so that it regenerates the operation step instructions. Retrieval-augmented generation retrieves relevant information from a proprietary vertical-domain database, merges it into a prompt template, and has the large model generate the answer to the question. It is an effective way to address the knowledge limitations, hallucination and data security problems of general-purpose foundation models. The component data of each RPA system is proprietary vertical-domain data, so it can be supplied to the large model effectively through RAG.
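The two groups of checks can be sketched as a single validation function. The field names (`step_name`, `params`, `description`), the component registry and the forbidden pair below are all hypothetical; the source only states that such fields, a known component set and component relationships exist.

```python
import json

# Hypothetical registry: component name -> required parameter names.
COMPONENTS = {
    "open_url": ["url"],
    "input_text": ["target", "text"],
    "click_element": ["target"],
}
# Hypothetical relationship rule: these components must not appear back to back.
FORBIDDEN_PAIRS = {("open_url", "open_url")}
REQUIRED_FIELDS = ("step_name", "params", "description")


def check_steps(raw: str) -> list:
    """Return a list of error messages; an empty list means the output passed."""
    try:
        steps = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(steps, list):
        return ["top-level JSON value must be a list of steps"]
    errors = []
    previous = None
    for i, step in enumerate(steps):
        missing = [] if isinstance(step, dict) else list(REQUIRED_FIELDS)
        if isinstance(step, dict):
            missing = [f for f in REQUIRED_FIELDS if f not in step]
        if missing:
            errors.append(f"step {i}: missing fields {missing}")
            previous = None
            continue
        name = step["step_name"]
        params = step["params"] if isinstance(step["params"], dict) else {}
        if name not in COMPONENTS:
            errors.append(f"step {i}: unknown component '{name}'")
        else:
            absent = [p for p in COMPONENTS[name] if p not in params]
            if absent:
                errors.append(f"step {i}: missing required parameters {absent}")
        if (previous, name) in FORBIDDEN_PAIRS:
            errors.append(f"step {i}: '{name}' must not follow '{previous}'")
        previous = name
    return errors
```

Returning messages instead of raising lets the caller append them to the next prompt as error prompt information.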
As shown in fig. 6, fig. 6 is a schematic flow chart of search enhancement generation provided by a third embodiment of the web page flow automation method of the present application, and a main processing flow of the search enhancement generation includes:
First, the component data is converted into semantic vectors by an embedding model and stored in a vector database. When the component name check fails, the user or the system constructs a question, which is converted into a semantic vector by the embedding model and used to search the vector database. The top_n retrieved records are submitted to the large model together with the prompt word, and the large model finally gives the correct component name and operation step instructions. The value of top_n can be configured by the user.
If verification fails, an explicit error prompt or a positive example is provided and fed back into the large model. This gives the large model context memory, so that, in a manner similar to a multi-round dialogue, it can generate operation step instructions that meet the requirements and fulfil the business requirement.
(4) Outputting the instruction of operation step
The operation step instructions that meet the requirements and were generated by the large model are passed to the web page element acquisition module, and the user's input and the generated result are persisted to a database. If the user inputs the same business requirement again, the result can be read from the database first, which reduces the number of calls to the large model and improves the efficiency of disassembling operation step instructions.
Taking "Search Baidu for the Huawei Mate 60 mobile phone" as an example, the disassembly process is as follows:
a. After the user inputs "Search Baidu for the Huawei Mate 60 mobile phone" on the interface, intention recognition is performed on the sentence. The recognition result is compliant, so the step disassembly logic is entered, and through prompt engineering the large model outputs the disassembled steps in the specified format.
b. After the disassembled steps output by the large model are obtained, the component names, component parameters, component relationships and so on in the steps are checked. If verification fails, error prompt information is generated and input into the large model together with the history memory through the prompt word. In this way, disassembled steps that meet the requirements are generated. The following is an example of a generated operation step instruction:
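The persistence and reuse described here can be sketched with a small lookup table. The sketch assumes an SQLite database and an exact-match lookup on the requirement text; the class name `StepCache`, the table layout and the `generate` callable are hypothetical.

```python
import json
import sqlite3
from typing import Callable


class StepCache:
    """Stores generated step instructions keyed by the requirement text."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "requirement TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def get_or_generate(self, requirement: str, generate: Callable[[str], list]) -> list:
        """Return cached steps, calling generate only on a cache miss."""
        row = self.db.execute(
            "SELECT result FROM steps WHERE requirement = ?", (requirement,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        steps = generate(requirement)
        self.db.execute(
            "INSERT INTO steps (requirement, result) VALUES (?, ?)",
            (requirement, json.dumps(steps)),
        )
        self.db.commit()
        return steps


calls = []


def fake_generate(requirement: str) -> list:
    calls.append(requirement)  # records how often the (stub) model is invoked
    return [{"step_name": "open_url"}]


cache = StepCache()
first = cache.get_or_generate("search for a phone", fake_generate)
second = cache.get_or_generate("search for a phone", fake_generate)
print(len(calls))
```

The second call is served from the database, so the stub generator runs only once.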
This step relies on the large model and prompt engineering. Its technical characteristic is that the results generated by the large model are verified and the information about unmet requirements is added to the prompt word, which gives the large model context memory so that operation step instructions meeting the requirements are generated in a loop. The disassembly is thus completed automatically, which lowers the threshold of RPA flow orchestration and shortens the time it takes.
3. Locating web page elements
In the previous step, operation step instructions that meet the requirements were generated. To execute each operation step instruction automatically, the web page element that the instruction operates on must first be located. In this embodiment, the web page elements corresponding to the operation step instructions are simple web page elements, such as text boxes, buttons, drop-down boxes and check boxes in the web page.
In the embodiment, through semantic understanding and reasoning capability of a large model, firstly, the webpage element corresponding to the operation step instruction is found, and then the XPath path of the webpage element is obtained in a recursion calling mode, so that the positioning of the webpage element is easily realized.
As shown in fig. 7, fig. 7 is a schematic flow chart of positioning web page elements according to a third embodiment of the present application, where the specific flow of positioning web page elements includes:
(1) Acquiring document object model of webpage
The document object model of the webpage can be obtained through the URL address of the webpage. For example, the document object model of the current web page can be obtained through Selenium or browser plug-in technology.
(2) Document object model for processing web pages
After the original web page document object model is obtained through the URL address, much of its content is of no use for locating web page elements, such as comments, JavaScript and styles; this content is removed. The processed web page document object model is placed in a cache and read directly from the cache the next time it is needed, which improves the efficiency of obtaining the web page document object model.
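The cleaning step can be sketched with the standard-library HTML parser. This is a minimal sketch that drops comments and the contents of script and style elements while re-emitting everything else; the class name `DomCleaner` and function `clean_dom` are hypothetical, and caching is left out.

```python
from html.parser import HTMLParser


class DomCleaner(HTMLParser):
    """Re-emits markup while skipping comments, <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are emitted once, without a separate end tag.
        if tag not in self.SKIP and not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    # handle_comment is intentionally not overridden, so comments are dropped.


def clean_dom(html: str) -> str:
    cleaner = DomCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

Inline `style` attributes and event-handler attributes are kept by this sketch; removing them as well would shorten the prompt further.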
(3) Outputting webpage elements to be operated by large model
The large model locates the web page element to be operated according to the prompt word, which mainly contains the processed web page document object model, the given web page elements and historical error information. The processed web page document object model still retains the complete web page elements and their structure. The given web page elements are obtained by first determining the type of web page element, which is preconfigured according to the operation step instruction. For example, if the operation step instruction is to click a submit button, the submit button mainly appears in two forms, <input type="submit"> and <button type="submit">; other forms can be added through configuration. The specific web page elements of that type are then extracted from the web page document object model. The historical error information contains the errors from previous attempts to locate the web page element; it serves as the context memory of the large model and tells it what was wrong with earlier results, so that it can output the correct result next time.
Given the complete document object model and the error prompt information, the large model only needs to use its semantic understanding and reasoning capability to pick, from the given web page elements, the one that the operation step instruction needs to operate. The number of positioning cycles is configurable: within the configured number of attempts, the large model is called in a loop to locate the web page element, and the loop exits once the corresponding web page element is found, at which point a web page element satisfying the conditions is considered to have been found.
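Extracting the given web page elements by configured type can be sketched as follows. The configuration table `ELEMENT_TYPES` and the names `CandidateFinder` and `given_elements` are hypothetical; only the two submit-button forms come from the text above.

```python
from html.parser import HTMLParser

# Hypothetical configuration: element type -> list of (tag, required attributes).
ELEMENT_TYPES = {
    "submit_button": [
        ("input", {"type": "submit"}),
        ("button", {"type": "submit"}),
    ],
}


class CandidateFinder(HTMLParser):
    """Collects the start tags that match any configured pattern."""

    def __init__(self, patterns) -> None:
        super().__init__()
        self.patterns = patterns
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for want_tag, want_attrs in self.patterns:
            if tag == want_tag and all(
                attr_map.get(key) == value for key, value in want_attrs.items()
            ):
                self.found.append(self.get_starttag_text())
                break


def given_elements(dom: str, element_type: str) -> list:
    """Return the raw start tags of all elements of the configured type."""
    finder = CandidateFinder(ELEMENT_TYPES[element_type])
    finder.feed(dom)
    finder.close()
    return finder.found


dom = (
    '<form><input type="text" name="wd">'
    '<input type="submit" id="su" value="Search">'
    '<button type="button">Help</button></form>'
)
print(given_elements(dom, "submit_button"))
```

Adding another form of submit button then only requires one more entry in the configuration table.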
(4) Checking the output result
The operation object of each operation step instruction is known to be among the given web page elements provided to the large model, so the web page element output by the large model must be one of the given web page elements. If the web page element output by the large model is not among the given web page elements, corresponding error prompt information is produced and the large model locates the web page element again until one satisfying the condition is found.
(5) XPath for obtaining webpage element
The specific web page element located in the previous step is combined with the complete web page document object model, and the XPath path of the web page element is obtained by recursive calls. The document object model is represented as a tree structure, called the DOM tree. Starting from the located web page element, its parent node is obtained, and the recursion continues for as long as a parent node exists; when no parent node remains, the recursion ends and the complete XPath path of the web page element is obtained.
Taking "Search Baidu for the Huawei Mate 60 mobile phone" as an example, the operation step instructions were generated in the previous step, and each step is executed as follows. First, the source code of the current web page is obtained through a browser plug-in, and JavaScript, CSS, comments and the like are removed from it. The processed web page source code and the web page elements that may be operated in the current step are then passed to the large model together, and through prompt engineering the large model outputs the exact web page element to be operated in this step. Finally, the XPath path of the web page element is obtained from the complete web page source code and the web page element by recursive calls. Once the web page element has been located in this way, the component can be invoked to perform the current step.
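The recursive construction of the XPath path can be sketched with `xml.etree.ElementTree`. The sketch assumes the page has already been converted to well-formed markup, which real HTML often is not, and it builds a positional XPath; the helper name `xpath_of` and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def xpath_of(element, parents) -> str:
    """Recurse through parent links; stop when the element has no parent."""
    parent = parents.get(element)
    if parent is None:
        return "/" + element.tag
    step = element.tag
    same_tag = [child for child in parent if child.tag == element.tag]
    if len(same_tag) > 1:
        # XPath positions are 1-based and counted among same-tag siblings.
        step += f"[{same_tag.index(element) + 1}]"
    return xpath_of(parent, parents) + "/" + step


root = ET.fromstring(
    "<html><body><div><form>"
    '<input type="text" name="wd"/>'
    '<input type="submit" id="su"/>'
    "</form></div></body></html>"
)
# ElementTree has no parent pointers, so a child -> parent map is built once.
parents = {child: parent for parent in root.iter() for child in parent}
target = root.find(".//input[@id='su']")

print(xpath_of(target, parents))
# /html/body/div/form/input[2]
```

The resulting path can be checked by evaluating it against the same tree, for example with `root.find("./body/div/form/input[2]")`.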
Through the powerful semantic understanding and reasoning capability of the large model, web page document model objects, given web page elements and historical error prompt information are input, the web page elements are accurately and efficiently positioned in a similar multi-round dialogue mode, and each operation step instruction is automatically completed through an executor engine.
4. Automatically executing operation step instructions
The automatic execution module of the operation step instruction is responsible for automatic execution of the operation step instruction and is mainly completed by calling corresponding components through an executor engine. The components are repeatedly executable logic units packaged in the RPA, and the components for operating the webpage elements need to transmit the webpage elements, parameters and operation step instructions of the operation.
As shown in fig. 8, fig. 8 is a schematic flow chart of web page flow execution provided in a third embodiment of the web page flow automation method of the present application, where the specific flow of web page flow execution includes:
First, the input parameters are checked, for example whether they are legal and whether the required parameters are present. After the parameters pass the check, the corresponding component is called to execute the step, and the execution result is returned when execution finishes so that the next step can be executed. In this way, the operation step instruction obtained by disassembling the natural language and the XPath path of the located web page element are used to call the corresponding component automatically and complete the operation step. Every operation step instruction follows the same execution logic, and all operation step instructions are executed in a loop, which fulfils the user's business requirement.
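The execution logic can be sketched as a loop that checks parameters and dispatches to a component. The component functions below only return strings instead of driving a browser, and the registry layout and field names (`step_name`, `xpath`, `params`) are hypothetical.

```python
def click(xpath: str, params: dict) -> str:
    """Stub component: a real one would click the element in the browser."""
    return f"clicked {xpath}"


def input_text(xpath: str, params: dict) -> str:
    """Stub component: a real one would type into the element."""
    return f"typed {params['text']!r} into {xpath}"


# Hypothetical registry: component name -> (callable, required parameter names).
COMPONENTS = {
    "click_element": (click, []),
    "input_text": (input_text, ["text"]),
}


def execute_flow(steps: list) -> list:
    """Check each step's parameters, run its component, collect the results."""
    results = []
    for index, step in enumerate(steps):
        entry = COMPONENTS.get(step["step_name"])
        if entry is None:
            raise ValueError(f"step {index}: unknown component {step['step_name']!r}")
        run, required = entry
        missing = [name for name in required if name not in step["params"]]
        if missing:
            raise ValueError(f"step {index}: missing parameters {missing}")
        results.append(run(step["xpath"], step["params"]))
    return results


flow = [
    {"step_name": "input_text", "xpath": "/html/body/form/input[1]", "params": {"text": "Mate 60"}},
    {"step_name": "click_element", "xpath": "/html/body/form/input[2]", "params": {}},
]
print(execute_flow(flow))
```

Stopping at the first invalid step is a choice of this sketch; an executor could equally record the failure and return it as the execution result.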
In this embodiment, the service requirement described in the natural language is automatically converted into a specific operation step instruction, and then the corresponding operation action is automatically executed according to the operation step instruction. And simultaneously, accurately finding out the webpage elements which need to be operated by the operation step instruction by utilizing the semantic understanding and reasoning capability of the large language model, and acquiring the XPath paths of the webpage elements so as to realize the automatic completion of the operation step instruction. The method is not influenced by factors such as page style, layout and the like, and has good generalization capability.
It should be noted that the foregoing examples are only for understanding the present application, and are not meant to limit the method for automating the web page process of the present application, and more forms of simple transformation based on the technical concept are all within the scope of the present application.
The application also provides a webpage process automation device, please refer to fig. 9, which comprises:
a receiving module 10, configured to receive a service requirement description input by a user;
The disassembling module 20 is configured to disassemble the service requirement description to obtain a web page operation flow, where the web page operation flow includes a plurality of operation step instructions;
the positioning module 30 is configured to perform positioning of a web page element through a large model based on a pre-acquired web page document object model and the plurality of operation step instructions, obtain a web page element path, and execute the web page operation flow through the web page element path.
Optionally, the disassembling module 20 is further configured to:
acquiring full component information;
Generating a first disassembly prompt word based on the service demand description and the full component information;
inputting the first disassembly prompt word into the large model, performing step disassembly through the large model, and outputting a webpage operation flow.
Optionally, the full component information includes one or more of a component name, a component parameter, a component relationship, and a component function description, the operation step instruction includes one or more of a step name, a step instruction parameter, and a step function description, and the operation step instruction is used to operate a web page element of the front-end web page, and the web page element includes one or more of a text box, a button, a drop-down box, and a check box.
Optionally, the disassembling module 20 is further configured to:
Checking the operation step instructions based on a preset checking rule;
When the operation step instructions fail to pass the verification, generating operation semantic vectors based on the operation step instructions fail to pass the verification;
Searching through a preset component vector database according to the operation semantic vector to obtain target component data;
And generating a second disassembly prompt word based on the target assembly data and the first disassembly prompt word, inputting the second disassembly prompt word into the large model, performing step disassembly through the large model, and outputting the webpage operation flow again.
Optionally, the positioning module 30 is further configured to:
generating a positioning prompt word based on the webpage document object model and a plurality of operation step instructions;
and inputting the positioning prompt word into the large model, positioning the webpage element through the large model, and outputting a webpage element path.
Optionally, the positioning module 30 is further configured to:
Inputting the positioning prompt word into the large model for the following processing:
Determining webpage elements to be operated according to the positioning prompt words;
Reading nodes corresponding to the webpage elements in the webpage document object model, and setting the nodes corresponding to the webpage elements as current nodes;
recursively calling a parent node of the current node in the webpage document object model, and setting the parent node of the current node as the current node until the current node does not have the parent node, so as to obtain a recursively called node path;
And outputting a webpage element path according to the node path.
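The parent-node recursion described above can be illustrated deterministically. In the application this processing is attributed to the large model; the sketch below only shows the traversal logic itself. The `Node` class, the XPath-like output format, and the sample tree are assumptions, not part of the original text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Minimal DOM-like node with a parent pointer (hypothetical structure)."""
    tag: str
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)

    def add(self, tag: str) -> Node:
        child = Node(tag, parent=self)
        self.children.append(child)
        return child


def node_path(node: Node) -> list[Node]:
    """Recurse through parent nodes until a node has no parent; return the path root first."""
    if node.parent is None:
        return [node]
    return node_path(node.parent) + [node]


def element_path(node: Node) -> str:
    """Render the node path as an XPath-like string with 1-based sibling indexes."""
    parts = []
    for n in node_path(node):
        if n.parent is None:
            parts.append(n.tag)
            continue
        same_tag = [c for c in n.parent.children if c.tag == n.tag]
        index = next(i for i, c in enumerate(same_tag, start=1) if c is n)
        parts.append(f"{n.tag}[{index}]")
    return "/" + "/".join(parts)


# Small hypothetical page: a form with two inputs and a button.
html = Node("html")
body = html.add("body")
form = body.add("form")
user = form.add("input")
password = form.add("input")
button = form.add("button")

print(element_path(password))  # /html/body[1]/form[1]/input[2]
print(element_path(button))    # /html/body[1]/form[1]/button[1]
```

Indexing siblings by tag keeps the path unambiguous when several elements of the same type share a parent, which is why the second `input` resolves to `input[2]`.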
The webpage process automation device provided by the application adopts the webpage process automation method of the above embodiment, and can solve the technical problem of low webpage element positioning efficiency in existing webpage process automation methods. Compared with the prior art, the beneficial effects of the webpage process automation device provided by the application are the same as those of the webpage process automation method provided by the above embodiment, and the other technical features of the webpage process automation device are the same as the features disclosed in the method embodiment, and are not repeated herein.
The application provides webpage process automation equipment, which comprises at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the webpage process automation method in the first embodiment.
Referring now to FIG. 10, a block diagram of a web page process automation device suitable for use in implementing embodiments of the present application is shown. The web page flow automation device in the embodiment of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The web page flow automation device shown in FIG. 10 is only an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in FIG. 10, the web page flow automation device may include a processing device 1001 (e.g., a central processing unit, a graphics processor, etc.), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the web page flow automation device. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus 1005. In general, the following devices may be connected to the I/O interface 1006: an input device 1007 including, for example, a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, etc.; an output device 1008 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 1003 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1009. The communication device 1009 may allow the web page flow automation device to communicate wirelessly or by wire with other devices to exchange data. Although the figure shows a web page flow automation device with various components, it is to be understood that not all of the illustrated components are required to be implemented or provided. More or fewer components may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through a communication device, or installed from the storage device 1003, or installed from the ROM 1002. The above-described functions defined in the method of the disclosed embodiment of the application are performed when the computer program is executed by the processing device 1001.
The webpage process automation equipment provided by the application adopts the webpage process automation method in the embodiment, and can solve the technical problem of low webpage element positioning efficiency of the existing webpage process automation method. Compared with the prior art, the beneficial effects of the webpage process automation equipment provided by the application are the same as those of the webpage process automation method provided by the embodiment, and other technical features of the webpage process automation equipment are the same as those disclosed by the method of the previous embodiment, and are not repeated herein.
It is to be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely a specific embodiment of the present application, and the scope of protection of the present application is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive of falls within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.
The present application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon for performing the web page flow automation method in the above-described embodiments.
The computer-readable storage medium provided by the present application may be, for example, a USB flash drive, and may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to electrical wire, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable storage medium may be included in the web page process automation device or may exist alone without being incorporated in the web page process automation device.
The computer readable storage medium is loaded with one or more programs, and when the one or more programs are executed by the webpage process automation equipment, the webpage process automation equipment receives business requirement description input by a user, performs step disassembly on the business requirement description to obtain a webpage operation process, wherein the webpage operation process comprises a plurality of operation step instructions, performs webpage element positioning through a large model based on a pre-acquired webpage document object model and the operation step instructions to obtain a webpage element path, and executes the webpage operation process through the webpage element path.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The readable storage medium provided by the application is a computer readable storage medium, and the computer readable storage medium stores computer readable program instructions (namely computer programs) for executing the webpage process automation method, so that the technical problem of low webpage element positioning efficiency in the existing webpage process automation method can be solved. Compared with the prior art, the beneficial effects of the computer readable storage medium provided by the application are the same as those of the webpage process automation method provided by the above embodiment, and are not described in detail herein.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a web page flow automation method as described above.
The computer program product provided by the application can solve the technical problem of low positioning efficiency of the webpage elements in the existing webpage process automation method. Compared with the prior art, the beneficial effects of the computer program product provided by the application are the same as those of the webpage process automation method provided by the embodiment, and are not repeated here.
The foregoing description is only a partial embodiment of the present application, and is not intended to limit the scope of the present application, and all the equivalent structural changes made by the description and the accompanying drawings under the technical concept of the present application, or the direct/indirect application in other related technical fields are included in the scope of the present application.

Claims (10)

Translated from Chinese

1. A webpage process automation method, characterized in that the method comprises:
receiving a business requirement description input by a user;
decomposing the business requirement description into steps to obtain a webpage operation process, wherein the webpage operation process includes a plurality of operation step instructions;
locating web page elements through a large model, based on a pre-acquired web page document object model and the plurality of operation step instructions, to obtain a web page element path, and executing the webpage operation process through the web page element path.

2. The method according to claim 1, characterized in that the step of decomposing the business requirement description into steps to obtain a webpage operation process comprises:
acquiring full component information;
generating a first disassembly prompt word based on the business requirement description and the full component information;
inputting the first disassembly prompt word into the large model, performing step disassembly through the large model, and outputting the webpage operation process.

3. The method according to claim 2, characterized in that the full component information includes one or more of a component name, component parameters, a component relationship, and a component function description; the operation step instructions include one or more of a step name, step instruction parameters, and a step function description; the operation step instructions are used to operate the web page elements of the front-end web page; and the web page elements include one or more of text boxes, buttons, drop-down boxes, and check boxes.

4. The method according to claim 2, characterized in that before the step of locating web page elements through the large model, based on the pre-acquired web page document object model and the plurality of operation step instructions, to obtain the web page element path, the method further comprises:
verifying the plurality of operation step instructions based on preset verification rules;
when the plurality of operation step instructions fail the verification, generating an operation semantic vector based on the operation step instructions that failed the verification;
searching a preset component vector database according to the operation semantic vector to obtain target component data;
generating a second disassembly prompt word based on the target component data and the first disassembly prompt word, so as to input the second disassembly prompt word into the large model, perform step disassembly through the large model, and output the webpage operation process again.

5. The method according to claim 1, characterized in that the step of locating web page elements through a preset large model, based on the pre-acquired web page document object model and the plurality of operation step instructions, to obtain the web page element path comprises:
generating a positioning prompt word based on the web page document object model and the plurality of operation step instructions;
inputting the positioning prompt word into the large model, locating the web page elements through the large model, and outputting the web page element path.

6. The method according to claim 5, characterized in that the step of inputting the positioning prompt word into the large model, locating the web page elements through the large model, and outputting the web page element path comprises:
inputting the positioning prompt word into the large model for the following processing:
determining the web page element to be operated according to the positioning prompt word;
reading the node corresponding to the web page element in the web page document object model, and setting the node corresponding to the web page element as the current node;
recursively calling the parent node of the current node in the web page document object model, and setting the parent node of the current node as the current node, until the current node has no parent node, to obtain the node path of the recursive call;
outputting the web page element path according to the node path.

7. A webpage process automation device, characterized in that the device comprises:
a receiving module, used to receive a business requirement description input by a user;
a disassembly module, used to decompose the business requirement description into steps to obtain a webpage operation process, wherein the webpage operation process includes a plurality of operation step instructions;
a positioning module, used to locate web page elements through a large model, based on a pre-acquired web page document object model and the plurality of operation step instructions, to obtain a web page element path, and to execute the webpage operation process through the web page element path.

8. A web page process automation device, characterized in that the device comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program is configured to implement the steps of the web page process automation method according to any one of claims 1 to 6.

9. A storage medium, characterized in that the storage medium is a computer-readable storage medium, a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the web page process automation method according to any one of claims 1 to 6 are implemented.

10. A computer program product, characterized in that the computer program product comprises a computer program, and when the computer program is executed by a processor, the steps of the web page process automation method according to any one of claims 1 to 6 are implemented.
CN202510031776.8A | 2025-01-08 | 2025-01-08 | Web page process automation method, device, equipment, storage medium and program product | Pending | CN119939004A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510031776.8A, CN119939004A (en) | 2025-01-08 | 2025-01-08 | Web page process automation method, device, equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510031776.8A, CN119939004A (en) | 2025-01-08 | 2025-01-08 | Web page process automation method, device, equipment, storage medium and program product

Publications (1)

Publication Number | Publication Date
CN119939004A (en) | 2025-05-06

Family

ID=95542387

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510031776.8A (Pending, CN119939004A (en)) | Web page process automation method, device, equipment, storage medium and program product | 2025-01-08 | 2025-01-08

Country Status (1)

Country | Link
CN (1) | CN119939004A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120144727A (en)* | 2025-05-15 | 2025-06-13 | 杭州新中大科技股份有限公司 | A large model question answering optimization method, device, equipment and storage medium
CN120144727B (en)* | 2025-05-15 | 2025-07-29 | 杭州新中大科技股份有限公司 | Large model question-answer optimization method, device, equipment and storage medium
CN120256703A (en)* | 2025-06-05 | 2025-07-04 | 上海稀宇科技有限公司 | A method and device for generating page parameters of a web page
CN120338723B (en)* | 2025-06-18 | 2025-09-12 | 人谷科技(北京)有限责任公司 | Method and system for arranging order-receiving payment business process based on visualization

Similar Documents

Publication | Publication Date | Title
JP7253593B2 (en) | Training method and device for semantic analysis model, electronic device and storage medium
CN112149399B (en) | Table information extraction method, device, equipment and medium based on RPA and AI
US20200125482A1 (en) | Code examples sandbox
CN119939004A (en) | Web page process automation method, device, equipment, storage medium and program product
US20220414463A1 (en) | Automated troubleshooter
Pereira et al. | A mobile app for teaching formal languages and automata
US20200327201A1 (en) | Provision of natural language response to business process query
CN114676705B (en) | Dialogue relation processing method, computer and readable storage medium
CN119440491A (en) | Programming assistant framework, programming assistance method, equipment, medium and product
CN119202167A (en) | Policy question answering method, device, apparatus and computer program product
Hossain et al. | Natural language–Based conceptual modelling frameworks: state of the art and future opportunities
CN120029604A (en) | Training data generation method, device, equipment and medium
CN119226464A (en) | Scenario question answering method, device, equipment, storage medium and program product
CN119396377A (en) | Method, device, medium and product for assisting in writing code
CN119130373A (en) | Method, device, equipment and storage medium for verifying business processes based on large models
CN119005166A (en) | Knowledge distillation method, apparatus, device and storage medium
Li et al. | A novel approach for rapid development based on chatgpt and prompt engineering
CN110727428B (en) | Method and device for converting service logic layer codes and electronic equipment
CN119179467B (en) | Artificial intelligence aided programming construction method
CN120045171B (en) | Code generation method and device
US20250251916A1 (en) | Query resolution using codebase analysis and generative artificial intelligence models
CN120407012A (en) | Method, system, terminal and medium for automatically generating code comments
CN119248239A (en) | Code generation method, device, equipment and medium
Rahman Farabi | Large scale empirical study on front-end software development technology from community QA sites
CN120578574A (en) | Code debugging method and related equipment

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
