CN107340999A

Info

Publication number: CN107340999A
Application number: CN201710013071.9A
Authority: CN
Inventors: 黄培红; 汪湛清
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2017-01-09
Filing date: 2017-01-09
Publication date: 2017-11-10

Abstract

Translated from Chinese

The invention discloses a method and system for software production automation based on natural language understanding, comprising: (1) establishing in advance a construction model for a natural language understanding library and building a lexical-semantic knowledge base; (2) establishing a natural language understanding system in which, for descriptive information about the software process, the machine first parses and understands the text and builds the grammatical structure corresponding to the sentence semantics; (3) establishing an understanding-and-deduction model in which, on the basis of the grammatical-semantic structure, the machine performs deduction, repeatedly extracting intractable elements, searching for rules or semantic values, and inserting them to rewrite the intractable elements, this process repeating until no intractable elements remain, at which point a solution string has been formed; (4) establishing a formalization model that organizes the solution strings produced by these solving processes, extracts the operational semantics from the solution strings, formalizes them, and outputs the corresponding set of software code lines. The invention can effectively improve software development speed and software reliability.

Description

Software automation method and system, and method for building a natural language understanding library

[Technical Field] The present invention relates to a method for automating software production, and in particular to a software automation method and a software automation system in which a requirements analysis described in natural language is directly understood, deduced, and formalized into software, and to a method for substantively understanding natural language and building a natural language understanding library.

[Background Art] Programming is tedious mental labor. Software automation lets machines program automatically in place of humans, which improves efficiency and frees up mental effort. At present, however, software automation remains semi-automatic with manual intervention, and the degree of automation in software production is still low. Specifically: first, software automation must begin by converting a software requirements description into a functional specification. Because moving from informal to formal is very difficult, converting informal software requirements into a formal specification still has to be done by hand; this is one of the hard problems of software production automation. Most current systems are inefficient and error-prone in going from a requirements specification to a functional specification. Second, both component-based software factories and Model-Driven Architecture (MDA) are built on formal specifications, and deductive synthesis on a formal basis has so far been automated only at small scale; going from a functional specification to a design specification remains difficult to automate.

[Summary of the Invention]

The invention provides a method and system for automating software production, so that a software project or application can be realized. According to one aspect of the invention, a software automation method is provided that derives software directly from natural-language descriptive information through understanding, deduction, and formalization. This differs from description in a quasi-natural language: a quasi-natural language is a restricted natural language and belongs to formal methods, whereas the present method works directly from a requirements analysis described in natural language. Once what the program should do, and how the task would be carried out by hand, has been described, the method can automatically generate the required software. The descriptive information includes operating procedures, operating specifications, and knowledge descriptions; the knowledge descriptions in the descriptive text may also be placed separately. The method can reduce labor intensity and improve programming efficiency.

According to one aspect of the invention, a software automation system is provided which, first, increases the accuracy of execution according to the descriptive information and so improves the correctness of requirements handling; second, reduces machine errors that are caused by human factors and are hard for people to notice, improving programming correctness; and third, improves programming efficiency and reduces the labor of manual programming.

According to one aspect of the invention, a method is provided for building a natural language understanding library, that is, a semantic knowledge base (common-sense base). The object of the invention is achieved by the following technical solution, a software automation method comprising:

A. Understanding (machine). The descriptive information is understood; this is referred to as script understanding. Script understanding includes word understanding, sentence understanding, sentence-group understanding, and semantic representation. Lexical analysis is carried out on the basis of understanding so that the word analysis is reasonable, and syntactic analysis is likewise carried out through understanding so that the syntactic analysis is reasonable. Through lexical and syntactic analysis, the perceptual semantic structure of the text, which is the target of semantic understanding, is finally built in preparation for the subsequent steps.

B. Deduction (machine). An intractable element is an element that has not yet been resolved: an element that cannot be found among the known conditions is an intractable element. On the basis of the semantic representation of the descriptive information from step A, the transaction is extracted and interpreted, and intractable elements are judged and rewritten, so that the transaction is progressively expanded and the problem is solved by deduction. Specifically, the transaction (program title) is first identified through semantics; the transaction is the problem string. The machine analyzes the problem string to find and capture intractable elements. If a captured element matches a known condition, it is marked as a solvable element; otherwise, the text is searched for a rule or what-value for the element, that is, for its semantics or a causal string, and the intractable element is rewritten. The problem string is then re-analyzed to see whether any intractable elements remain. This capture-and-rewrite process continues until no intractable elements remain, yielding the final problem string, also called the reasoning string or solution string. The replacement strategy also includes using heuristic knowledge to narrow the solution paths and avoid a combinatorial explosion of solution paths.

C. Formalization (machine). Formalization is the process of organizing and symbolizing the reasoning string so as to work out the program flow from it. Organizing the reasoning string includes variable identification and the conversion and translation of operators, functions, and the like. Variable identification recognizes as variables those pre-variable words whose what-value semantics carry the feature of pointing to a machine value, so that variables are formalized according to the semantic features of concepts. Operator conversion and translation converts operators directly by understanding their semantics; it is a one-to-one mapping between natural language and the programming language, mediated by perceptual semantic features. Through variable identification and the conversion and translation of operators and the like, the operational semantics in the reasoning string are extracted to form a set of program code lines.
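The capture-and-rewrite loop of step B can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `deduce`, `known`, and `rules` are hypothetical, elements are modeled as plain strings, and the `max_rounds` guard against cyclic rules is an added assumption. The heuristic pruning mentioned in the text is not modeled.

```python
from typing import Dict, List, Set


def deduce(problem: List[str], known: Set[str],
           rules: Dict[str, List[str]], max_rounds: int = 100) -> List[str]:
    """Rewrite intractable elements until the problem string has none left."""
    current = list(problem)
    for _ in range(max_rounds):
        rewritten: List[str] = []
        changed = False
        for element in current:
            if element in known:
                # Solvable element: it matches a known condition.
                rewritten.append(element)
            elif element in rules:
                # Intractable element: replace it with the rule found for it.
                rewritten.extend(rules[element])
                changed = True
            else:
                raise LookupError(f"no rule or what-value for {element!r}")
        current = rewritten
        if not changed:
            return current  # the solution string
    raise RuntimeError("deduction did not converge (cyclic rules?)")


known = {"price", "times", "quantity", "plus", "shipping"}
rules = {
    "order_total": ["subtotal", "plus", "shipping"],
    "subtotal": ["price", "times", "quantity"],
}
solution = deduce(["order_total"], known, rules)
```

Here the title `order_total` is the initial problem string; two rewriting rounds reduce it to elements that all appear among the known conditions.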

Preferably, before step A the method further includes step A': collecting formatted or unformatted lexical information and generating the lexical-semantic knowledge base. Step A' includes:

A'1. Determining whether the word-definition method is a formatted method. If so, step A'2 is performed: formatted lexical information is collected and the lexical-semantic knowledge base is generated. Otherwise, step A'3 is performed: unformatted lexical information is collected and the lexical-semantic knowledge base is generated.

Step A'2 includes:

A'21. Determining the lexical points to be collected automatically and assigning a word ID to each.

A'22. Collecting lexical information records for those lexical points, obtaining from the formatting markers the lexical-semantic information and related knowledge corresponding to each lexical point, and assigning a semantic point ID to each lexical-semantic point contained in the lexical information.

A'23. Storing the lexical information as quintuples; the quintuple for any lexical point contains the word ID (word), perceptual semantics, semantic point ID, examples, and related knowledge.

Step A'3 includes:

A'31. Performing lexical analysis and grammatical analysis on the lexical information selected for collection.

A'32. Identifying the words contained in the formatted lexical information points to obtain lexical points.

A'33. Selecting from the meaning-description records the semantic point information that contains the word.

A'34. Obtaining refined semantic point information by understanding the examples (related knowledge), and extracting the semantics of the related knowledge.

A'35. Storing the lexical information as quintuples; the quintuple for any lexical point contains the word ID (word), perceptual semantics, semantic point ID, examples, and knowledge.

Before lexical information is stored as quintuples, lexical homing and semantic homing are performed. Lexical homing queries the common-sense base to see whether the word already exists; if it does, the semantic similarities and differences are determined and the definition is then added consistently. Semantic homing queries the semantic part of the common-sense base to see whether the semantics already exist, and adds semantic links consistently. Knowledge homing queries the common-sense base to see whether the knowledge already exists, and adds the knowledge semantics consistently. Semantics and knowledge are both stored in structured form; the semantics are perceptual semantics.
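One way to picture the quintuple store and the homing checks is the sketch below. It is an assumption-laden illustration: the class and field names are hypothetical, perceptual semantics are reduced to a plain string, and "same semantics" is decided by exact string equality, which is far cruder than the similarity judgment the text describes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Quintuple:
    word_id: int
    perceptual_semantics: str
    semantic_point_id: int
    examples: List[str] = field(default_factory=list)
    knowledge: List[str] = field(default_factory=list)


def _merge(target: List[str], items: Iterable[str]) -> None:
    """Append only the items that are not stored yet."""
    for item in items:
        if item not in target:
            target.append(item)


class KnowledgeBase:
    def __init__(self) -> None:
        self.word_ids: Dict[str, int] = {}
        self.entries: List[Quintuple] = []

    def add(self, word: str, semantics: str,
            examples: Iterable[str] = (),
            knowledge: Iterable[str] = ()) -> Quintuple:
        # Lexical homing: reuse the word ID if the word already exists.
        word_id = self.word_ids.setdefault(word, len(self.word_ids) + 1)
        # Semantic homing: look for an existing meaning of this word.
        for entry in self.entries:
            if (entry.word_id == word_id
                    and entry.perceptual_semantics == semantics):
                # Knowledge homing: add only what is new.
                _merge(entry.examples, examples)
                _merge(entry.knowledge, knowledge)
                return entry
        entry = Quintuple(word_id, semantics, len(self.entries) + 1)
        _merge(entry.examples, examples)
        _merge(entry.knowledge, knowledge)
        self.entries.append(entry)
        return entry


kb = KnowledgeBase()
first = kb.add("bank", "financial institution", examples=["deposit money in the bank"])
second = kb.add("bank", "side of a river")
again = kb.add("bank", "financial institution", knowledge=["banks hold accounts"])
```

Adding a second meaning of an existing word reuses its word ID but gets a new semantic point ID, while re-adding an existing meaning only extends its examples and knowledge.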

[Description of the Drawings]

Fig. 1 is a block diagram of the structure of an existing software automation system, which is described below with reference to Fig. 1. The existing software automation system includes an understanding module 101, a deduction module 102, and a formalization module 103. The understanding module 101 receives the descriptive information and, according to its what-why values (the perceptual semantic set), segments it into lexical and grammatical structures together with the corresponding understanding result, which is the corresponding set of semantic structures. The deduction module 102 receives the understanding result from the natural language understanding system (understanding module 101) and performs deductive reasoning on it to form an expanded problem string, also called a problem-solving process string. The formalization module 103 receives the problem-solving process string from the deduction module 102, formalizes it, and finally outputs a program sequence.

In the construction stage 105 of the knowledge base 104 of the existing software automation system, word templates must be mined from the input pairs of new word definitions to build a lexical-semantic template library, which the understanding unit 101 queries to obtain semantic patterns. The word definitions in the knowledge base (the lexical-semantic template library, or lexical-semantic library) include semantic patterns, knowledge patterns, and so on. Serial analysis techniques may be used to process the input new word definitions to obtain the set of perceptual patterns corresponding to a word, and the input lexical knowledge may be parsed to obtain the knowledge related to the word. The knowledge stored in the knowledge base 104 consists of the links between word semantics that correspond to the word's functional patterns. The knowledge base 104 represents knowledge and word definitions using perceptual semantic sets from the field of artificial intelligence, and its construction requires a combination of manual and automatic work. Building and maintaining the knowledge base of the existing software automation system is costly: business personnel must summarize the word corpus and knowledge, and operation and maintenance require new words and new knowledge to be added continually. As the knowledge base (common-sense base) keeps growing, the deduction and understanding modules take longer and longer to compute and efficiency drops, so the existing software automation system still needs further improvement.
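The data flow between modules 101, 102, and 103 can be sketched as a simple function composition. Everything below is a hypothetical stand-in: `build_pipeline` and the three toy functions are illustrative names, and each toy stage is deliberately trivial so that only the hand-off between modules is shown.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def build_pipeline(understand: Callable[[str], List[str]],
                   deduce: Stage, formalize: Stage) -> Callable[[str], List[str]]:
    """Chain the three modules: text -> structures -> solution -> program."""
    def run(description: str) -> List[str]:
        structures = understand(description)  # understanding module 101
        solution = deduce(structures)         # deduction module 102
        return formalize(solution)            # formalization module 103
    return run


def toy_understand(text: str) -> List[str]:
    # Stand-in for lexical and syntactic analysis: whitespace tokens.
    return text.lower().split()


def toy_deduce(elements: List[str]) -> List[str]:
    # Stand-in for deduction: one hard-coded rewrite rule.
    out: List[str] = []
    for element in elements:
        out.extend(["price", "times", "quantity"] if element == "total" else [element])
    return out


def toy_formalize(solution: List[str]) -> List[str]:
    # Stand-in for formalization: map operator words to symbols.
    symbols = {"times": "*"}
    return [" ".join(symbols.get(word, word) for word in solution)]


run = build_pipeline(toy_understand, toy_deduce, toy_formalize)
program = run("Total")
```

Keeping each module behind a plain callable mirrors the block diagram: any stage can be swapped without touching the other two.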

[Claims]

1. A software automation method and system, characterized in that the method or system comprises: step A, performing understanding processing on the received descriptive information to obtain its grammatical-semantic structure, the descriptive information comprising the grammatical-semantic structures of multiple sentences, i.e., a set of grammatical-semantic structures, this being the understanding module (natural language understanding system); step B, extracting and determining intractable elements from the set of grammatical-semantic structures of the descriptive information, the intractable elements being the initial problem string; step C, expanding the problem string using the deduction formulas for the intractable elements until the problem string contains no intractable elements, steps B and C being the deduction module; and step D, organizing an action sequence from the final problem string, formalizing the action sequence, and outputting the set of program statement sequences corresponding to the descriptive information, this being the formalization module.

2. The method or system according to claim 1, characterized in that, before step A, it further comprises step A': collecting formatted or unformatted lexical information records and, after structuring them in advance, generating the lexical-semantic knowledge base (common-sense base); this is the method and the construction module for building the natural language understanding library. Step A' comprises: A'1, determining whether the word-definition method is a formatted method; if so, performing step A'2, collecting formatted lexical information and generating the lexical-semantic knowledge base; otherwise, performing step A'3, collecting unformatted lexical information and generating the lexical-semantic knowledge base.

3. The method according to claim 2, characterized in that collecting formatted lexical information and generating the lexical-semantic knowledge base in step A'2 comprises: A'21, determining the lexical points to be collected automatically and assigning a word ID to each; A'22, collecting lexical information records for those lexical points, obtaining from the formatting markers the lexical-semantic information and related knowledge corresponding to each lexical point, and assigning a semantic point ID to each lexical meaning contained in the lexical information; and A'23, storing the lexical information as quintuples, the quintuple for any lexical point containing the word ID (word), perceptual semantics, semantic point ID, examples, and related knowledge.

4. The method according to claim 2, characterized in that collecting unformatted lexical information records and generating the lexical-semantic knowledge base in step A'3 comprises: A'31, determining the lexical points to be collected automatically and assigning a word ID to each; A'32, applying formatting steps such as lexical analysis and grammatical analysis to the lexical information selected for collection; A'33, identifying the words contained in the formatted, automatically collected lexical information points to obtain lexical points, determining the semantics and corresponding knowledge of each word, and assigning a semantic point ID; A'34, selecting the semantic point information contained in the meaning-description records and, through understanding, outputting the corresponding set of perceptual semantic structures; A'35, obtaining refined semantic point information by understanding the examples (knowledge); and A'36, storing the determined lexical information as quintuples, the quintuple for any lexical point containing the word ID (word), perceptual semantics, semantic point ID, examples, and related structured knowledge, the collection of lexical point information stored as quintuples constituting the system's lexical-semantic knowledge base (natural language semantic understanding library).

5. The method according to claims 2-4, characterized in that the semantics and knowledge in steps A'2 and A'3 are stored in structured form. In step A'35, refined semantic point information is obtained by understanding the examples; this information includes part-of-speech perception and grounds the perceptual semantic set of the word definition.

6. The method and system according to claim 4 and claim 1, characterized in that the word understanding in step A'32 and step A comprises matching against the lexicon: the received descriptive information is matched against the lexicon piece by piece, the semantics corresponding to each word are judged through understanding on the basis of the what-why understanding effect and perceptual-set understanding, and the understanding result is output to the syntactic analysis unit; this is the word understanding unit of the understanding module. The syntactic analysis in step A'32 and step A comprises matching the ordered word set against the knowledge or semantic templates of the knowledge base; the semantics of the knowledge corresponding to a word collocation are judged through understanding on the basis of the what-why understanding effect and perceptual-set understanding, that is, understanding imposes a why-factor constraint at the semantic level (a truth-value constraint at the semantic level) on the collocation; this is the syntactic analysis unit of the understanding module. The lexical segmentation (lexical analysis) is adjusted according to the understanding-based syntactic analysis; this is the lexical analysis adjustment unit of the understanding module. For the purpose of understanding, the semantics and semantic knowledge are applied to understanding-based lexical and syntactic analysis, and the reasonableness of the lexical and syntactic analysis is determined according to the what-why understanding effect of the semantics.

7. The method and system according to claims 1-6, characterized in that steps A'36 and A'23 store the lexical information as quintuples, and that lexical homing and semantic homing are performed before storage. Lexical homing comprises querying the common-sense base to see whether the word already exists and, if it does, determining the semantic similarities and differences and then adding the definition consistently. Semantic homing comprises querying the semantic part of the common-sense base to see whether the semantics already exist, and adding semantic links consistently. Knowledge homing comprises querying the common-sense base to see whether the knowledge already exists, and adding the knowledge semantics consistently. The semantics and knowledge in steps A'23 and A'36 are stored in structured form; the semantics are perceptual semantics.

8. The method and system according to any one of claims 1-7, characterized in that the deduction in step C comprises inserting semantics or knowledge to rewrite and replace an intractable element, whereupon the element's intractability flag is removed; this is the intractable element rewriting unit. An element found not to be among the known conditions is an intractable element; this is the intractable element discovery unit of the deduction module. The replacement strategy also includes using heuristic knowledge to narrow the solution paths and avoid a combinatorial explosion of solution paths.

9. The method and system according to any one of claims 1-8, characterized in that the formalization in step D comprises: D1, identifying and formalizing the pre-variable words of the problem string according to the machine-value-pointing feature in the semantics of each word, this being the pre-variable word formalization unit of the formalization module; D2, according to the correspondence between words in the solution string and programming-language terms with the same semantics, and according to the rules of the programming language, identifying the semantic features of language components such as operators and translating them into the corresponding programming-language terms, this being the translation unit, the result output after formalization constituting the operational semantics of the descriptive information, this being the formalized output unit; and D3, arranging the variables and organizing the statement sequence of the expanded problem string to output the set of program statement sequences, this being the program flow organization unit.

10. The method and system according to claims 1-9, characterized in that step A and step A'32 constitute the natural language understanding method, and the understanding module and the construction module constitute the natural language understanding system, which comprises: (1) a lexical-semantic definition set (dictionary or common-sense base) for storing words, their corresponding semantics, and related knowledge; and (2) a lexical-syntactic segmenter for the descriptive information (lexical-syntactic analysis unit), which sends the descriptive information to the lexical-semantic definition set for matching and, by mapping to the semantics of the related knowledge, forms the set of understood grammatical-semantic structures while also extracting the syntactic information.
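Steps D1 and D2 of claim 9 (variable identification and operator translation) can be illustrated with the sketch below. It is only a rough approximation under stated assumptions: the `OPERATORS` table, the `value_bearing` set standing in for words with the machine-value-pointing feature, and the function names are all hypothetical, and the output is a single Python-style statement rather than an organized statement sequence (D3).

```python
import re
from typing import Dict, List, Set

# Hypothetical one-to-one table from operator words to programming-language symbols.
OPERATORS: Dict[str, str] = {
    "equals": "=",
    "plus": "+",
    "minus": "-",
    "times": "*",
    "divided by": "/",
}


def to_identifier(phrase: str) -> str:
    """Turn a natural-language phrase into a snake_case variable name."""
    return re.sub(r"\W+", "_", phrase.strip().lower()).strip("_")


def formalize(solution: List[str], value_bearing: Set[str]) -> str:
    """Translate a solution string into one program statement."""
    parts: List[str] = []
    for element in solution:
        if element in OPERATORS:
            # D2: translate the operator word into its symbol.
            parts.append(OPERATORS[element])
        elif element in value_bearing:
            # D1: a pre-variable word becomes a variable.
            parts.append(to_identifier(element))
        else:
            raise ValueError(f"cannot formalize {element!r}")
    return " ".join(parts)


statement = formalize(
    ["order total", "equals", "unit price", "times", "quantity"],
    value_bearing={"order total", "unit price", "quantity"},
)
```

With these inputs the solution string becomes the statement `order_total = unit_price * quantity`.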

Claims

1.一种软件自动化方法与系统，其特征在于，该方法或系统包括步骤A、将接收的描述信息进行理解处理，获得描述信息的语法语义结构；该描述信息包括多个句子的语法语义结构，即语法语义结构集；此为理解模块(自然语言理解系统)；B、根据描述信息的语法语义结构集提取确定难解性元素；难解性元素即初始问题串。C、利用难解性元素推演公式展开问题串，直到该问题串没有难解性元素为止，BC为推演模块；D、根据最终的问题串整理出动作序列，并对该动作序列形式化，输出与该描述信息对应的程序语句序列集，此为形式化模块。1. A software automation method and system, characterized in that the method or system includes step A, understanding and processing the received description information to obtain the grammatical-semantic structure of the descriptive information; the descriptive information includes the grammatical-semantic structure of a plurality of sentences , that is, the grammatical-semantic structure set; this is the comprehension module (natural language understanding system); B. According to the grammatical-semantic structure set of the description information, the difficult element is extracted and determined; the difficult element is the initial question string. C. Use the deduction formula of difficult elements to expand the problem string until the problem string has no difficult elements, BC is the deduction module; D. According to the final problem string, sort out the action sequence, formalize the action sequence, and output A set of program statement sequences corresponding to the description information, which is a formalized module.

2.根据权利要求1所述的方法或系统，其特征在于，所述步骤A之前进一步包括A’、采集格式化或非格式化的词汇信息记录，预先实施结构化处理后，生成词汇语义知识库(常识库)，此为构建自然语言理解库的方法与构建模块。所述步骤A’包括A’1、判断所述词汇定义方法是否是格式化方法，如果是，则执行步骤Α’2、采集格式化的词汇信息，生成词汇语义知识库，否则，执行步骤A’3、采集非格式化的词汇信息，生成词汇语义知识库。2. The method or system according to claim 1, characterized in that before the step A, it further includes A', collecting formatted or unformatted vocabulary information records, and generating vocabulary semantic knowledge after pre-structuring library (common sense library), which is the method and building block for building a natural language understanding library. The step A' includes A'1, judging whether the vocabulary definition method is a formatting method, and if so, performing step A'2, collecting formatted vocabulary information, and generating a vocabulary semantic knowledge base, otherwise, performing step A '3. Collect unformatted lexical information and generate a lexical semantic knowledge base.

3.根据权利要求2所述的方法，其特征在于，所述步骤A’2所述采集格式化的词汇信息，生成词汇语义知识库，包括A’21、确定需自动采集的词汇点，为所述需采集的词汇点分配词汇ID；A’22、根据所述需自动采集的词汇点，对词汇信息记录进行采集，根据格式化标志获得与所述需自动采集的词汇点对应的词汇语义信息与相关知识，为所述词汇信息包含的词汇语义分配语义点ID；A’23、将所述词汇信息以五元组的形式进行存储；任一所述词汇点的五元组包含词语ID(词汇)、知觉语义、语义点ID、例子及相关知识。3. method according to claim 2, it is characterized in that, the vocabulary information of the described collection format of described step A'2, generate vocabulary semantics knowledge base, comprise A'21, determine the vocabulary point that needs automatic collection, for The vocabulary points to be collected are assigned a vocabulary ID; A'22. According to the vocabulary points to be automatically collected, collect the vocabulary information records, and obtain the semantic meaning of the words corresponding to the vocabulary points to be automatically collected according to the format flag Information and related knowledge, assigning a semantic point ID for the lexical semantics contained in the lexical information; A'23, storing the lexical information in the form of a quintuple; any quintuple of the lexical point contains a word ID (vocabulary), perceptual semantics, semantic point ID, examples and related knowledge.

4. The method according to claim 2, characterized in that collecting unformatted vocabulary information records and generating the lexical semantic knowledge base in step A'3 comprises: A'31, determining the vocabulary points to be collected automatically and assigning a vocabulary ID to each; A'32, applying formatting steps such as lexical analysis and syntactic analysis to the vocabulary information selected for collection; A'33, identifying the words contained in the formatted, automatically collected vocabulary information to obtain the vocabulary points, determining the semantics of each word and the corresponding knowledge, and assigning word semantic point IDs; A'34, selecting the semantic point information contained in the meaning description records and, through understanding, outputting the corresponding set of perceptual semantic structures; A'35, obtaining refined semantic point information by understanding examples (knowledge); A'36, storing the determined vocabulary information as quintuples, where the quintuple of any vocabulary point contains the word ID (vocabulary), perceptual semantics, semantic point ID, examples, and related structured knowledge; the set of vocabulary point information stored as quintuples constitutes the system's lexical semantic knowledge base (natural language semantic understanding library).

5. The method according to claims 2-4, characterized in that the semantics and knowledge of steps A'2 and A'3 are both stored in structured form. Step A'35 obtains refined semantic point information by understanding examples; this information includes part-of-speech perception and makes the perceptual semantic set of the vocabulary definition concrete.

6. The method and system according to claims 4 and 1, characterized in that the word understanding of step A'32 and step A comprises matching against the vocabulary base: the received descriptive information is matched against the vocabulary base item by item and, on the basis of the what-why comprehension effect and perceptual-set understanding, the semantics corresponding to each word are understood and determined, and the result is output to the syntactic analysis unit; this is the word (vocabulary) understanding unit of the understanding module. The syntactic analysis of step A'32 and step A comprises matching the ordered word set against the knowledge or semantic templates of the knowledge base; the corresponding knowledge semantics are understood on the basis of the what-why comprehension effect and perceptual-set understanding, so that the knowledge semantics of each word collocation are understood and determined, that is, the collocation is subjected through understanding to a why-factor constraint at the semantic level (a truth-value constraint at the semantic level); this is the syntactic analysis unit of the understanding module. The lexical segmentation (lexical analysis) is adjusted according to the understanding-based syntactic analysis; this is the lexical analysis adjustment unit of the understanding module. For the purpose of understanding, the semantics and semantic knowledge are applied to understanding-based lexical analysis and syntactic analysis, and the soundness of the lexical and syntactic analysis is determined according to the what-why comprehension effect of the semantics.
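The interplay between lexical segmentation and the semantic constraint on collocations can be sketched in Python as follows. The claim does not define how the what-why comprehension effect is computed, so this sketch swaps in a plain lookup table of permitted adjacent word pairs as a stand-in; the function names and the toy vocabulary are hypothetical.

```python
def segmentations(text, vocab):
    """Yield every way of splitting text into words found in vocab."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            for rest in segmentations(text[end:], vocab):
                yield [head] + rest


def understand(text, vocab, allowed_pairs):
    """Return the first segmentation whose adjacent word pairs are all permitted.

    allowed_pairs is an assumed stand-in for the semantic-level constraint;
    a rejected candidate corresponds to the segmentation being adjusted.
    """
    for words in segmentations(text, vocab):
        if all(pair in allowed_pairs for pair in zip(words, words[1:])):
            return words
    return None
```

For example, with the vocabulary `{"read", "the", "sis", "thesis"}` and the single permitted pair `("read", "thesis")`, the input `"readthesis"` is first split as `read / the / sis`, which fails the constraint, and is then re-split as `read / thesis`.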

9. The method and system according to any one of claims 1-8, characterized in that the formalization of step D comprises: D1, identifying and formalizing the pre-variable words of the problem string according to the machine-value-pointing feature in the semantics corresponding to each word; this is the pre-variable word formalization unit of the formalization module; D2, according to the correspondence of semantic identity between words in the solution string and programming language terms, and according to the programming language rules, identifying the semantic features of language components such as operator symbols and translating them into the corresponding programming language terms; this is the translation unit; the result output after formalization constitutes the operational semantics of the descriptive information; this is the formalized output unit; D3, organizing the expanded problem string through variable arrangement and statement sequencing, and outputting the set of program statement sequences; this is the program flow organization unit.
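A toy Python sketch of steps D1 to D3 follows. It assumes that a solution string is already a list of words, that variable words are supplied as a set, and that operator words map one-to-one onto symbols; the `OPERATOR_TERMS` table, the function names, and the handling of numeric literals are hypothetical and are not taken from the claims.

```python
# Hypothetical mapping from operator words to programming language symbols.
OPERATOR_TERMS = {"assign": "=", "plus": "+", "minus": "-", "times": "*"}


def formalize(solution_string, variable_words):
    """Turn one solution string (a list of words) into one statement."""
    tokens = []
    for word in solution_string:
        if word in variable_words:        # D1: word refers to a machine value
            tokens.append(word)
        elif word in OPERATOR_TERMS:      # D2: translate operator word
            tokens.append(OPERATOR_TERMS[word])
        elif word.isdigit():              # numeric literal (assumed handling)
            tokens.append(word)
        else:
            raise ValueError(f"unresolved element: {word!r}")
    return " ".join(tokens)


def organize(solution_strings, variable_words):
    """D3: emit the statements in order as a program statement sequence."""
    return [formalize(s, variable_words) for s in solution_strings]
```

For instance, `organize([["total", "assign", "price", "times", "quantity"]], {"total", "price", "quantity"})` returns `["total = price * quantity"]`. A word that is neither a variable, an operator, nor a literal raises an error, mirroring the requirement that no intractable elements remain before formalization.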

10. The method and system according to claims 1-9, characterized in that step A and step A'32 constitute the natural language understanding method, and the understanding module and the construction module constitute the natural language understanding system, which comprises: (1) a lexical semantic definition set (dictionary or common-sense base) for storing words, their corresponding semantics, and related knowledge; (2) a lexical-syntactic segmenter for descriptive information (lexical-syntactic analysis unit), which sends the descriptive information to the lexical semantic definition set for matching and, through correspondence with the relevant knowledge semantics, forms the understood set of syntactic-semantic structures while also extracting the syntactic information.