CN119536743B


Info

Publication number: CN119536743B
Application number: CN202510099630.7A
Authority: CN
Inventors: 刘敏; 姜凯; 赵鑫鑫; 薛海军
Original assignee: Shandong Inspur Science Research Institute Co Ltd
Current assignee: Shandong Inspur Science Research Institute Co Ltd
Priority date: 2025-01-22
Filing date: 2025-01-22
Publication date: 2025-07-08
Anticipated expiration: 2045-01-22
Also published as: CN119536743A

Abstract

The application discloses a method, device, and medium for compiling multi-language mixed code, and relates to the field of computer technology. The method comprises: scanning a multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages, and determining a language identifier for each code fragment; determining a corresponding parsing rule according to the language identifier, and performing semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the code fragments; converting the code fragments according to the semantic association graph to determine an intermediate form; and determining the operating environment of a target platform, and converting the intermediate form, according to that operating environment, into a code format corresponding to the target platform. The method offers high accuracy, strong flexibility, and effective optimization, and supports cross-language programming and cross-platform deployment.

Description

Compiling method, device and medium for multi-language mixed code

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a medium for compiling a multilingual hybrid code.

Background

In the present digital age, the complexity of software systems is increasing at an unprecedented rate. To address increasingly complex business needs and technical challenges, multi-language hybrid programming has become the norm in software development. However, most existing compiler technology is still designed and developed around a single programming language, and falls short when faced with multi-language mixed code.

During compilation, a traditional compiler performs static code analysis and conversion only according to the grammar rules and semantic specifications of the languages it supports, and lacks a deep understanding of the complex semantic relations that arise when multiple languages interact. As a result, when processing multi-language mixed code, the compiler has difficulty accurately capturing deep semantic associations between code fragments in different languages, such as cross-language function calls, consistency maintenance of shared data, and the cooperation between control flows and data structures in different languages.

This lack of semantic understanding causes a series of serious problems. First, in terms of compiling efficiency, because multi-language mixed code cannot be optimized as a whole, a compiler often compiles the code fragments of each language in isolation and inefficiently, generating a large amount of redundant intermediate code and unnecessary runtime overhead, so that compilation takes a long time and seriously affects development efficiency. Second, in terms of optimization strategy, existing compilers also have difficulty formulating effective optimization measures tailored to the characteristics of multi-language mixed code.

Disclosure of Invention

In order to solve the above problems, the application provides a method for compiling multi-language mixed code, comprising: scanning a multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages, and determining a language identifier for each code fragment; determining a corresponding parsing rule according to the language identifier, and performing semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the code fragments; converting the code fragments according to the semantic association graph to determine an intermediate form; and determining the operating environment of a target platform, and converting the intermediate form, according to that operating environment, into a code format corresponding to the target platform.

In one example, scanning the multi-language mixed code file line by line specifically comprises: determining feature fields corresponding to the plurality of languages; scanning the code file line by line according to the feature fields to determine whether the code file contains a feature field; if the code file contains a feature field, determining the code fragment according to that feature field; and determining the corresponding language identifier according to the feature field, and marking the code fragment with the language identifier.

In one example, determining the corresponding parsing rule according to the language identifier and performing semantic analysis on the code fragments according to the parsing rule specifically comprises: determining the corresponding grammar according to the language identifier, and determining the corresponding parsing rule according to the grammar; and constructing a model according to the parsing rule to determine a semantic analysis model, wherein the functions of the semantic analysis model comprise variable declaration and use analysis, function call analysis, data type inference, control flow analysis, and analysis of interfaces and dependencies between code fragments.

In one example, determining the semantic association graph among the plurality of code fragments specifically comprises: performing semantic analysis on the plurality of code fragments through the semantic analysis model to determine the function call relations among them; determining semantic associations according to the function call relations; determining a data sharing mechanism and an interface specification among the plurality of code fragments; and determining the semantic association graph according to the semantic associations, the data sharing mechanism, and the interface specification, so that the semantic association graph shows the interaction paths and the dependency network among the plurality of code fragments.

In one example, the code segments are converted according to the semantic association graph to determine an intermediate form, and the method specifically comprises the steps of determining grammar structure information of the code segments according to the semantic association graph, determining semantic relations among the code segments, and performing mapping conversion according to the grammar structure and the semantic relations to determine the intermediate form.

In one example, the method further includes determining the frequency of use and the life cycle of a variable in the code fragment, and determining a register corresponding to the code fragment based on the frequency of use and the life cycle.

In one example, the method further includes determining an object reference mechanism corresponding to the multiple languages, and laying out the code fragments according to the object reference mechanism.

In one example, after determining the language identifier corresponding to the code fragment, the method further includes preprocessing the code fragment, where the preprocessing includes removing comments and processing whitespace characters.

In another aspect, the application further provides a device for compiling multi-language mixed code, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the device to: scan a multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages, and determine a language identifier for each code fragment; determine a corresponding parsing rule according to the language identifier, and perform semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the code fragments; convert the code fragments according to the semantic association graph to determine an intermediate form; and determine the operating environment of a target platform, and convert the intermediate form, according to that operating environment, into a code format corresponding to the target platform.

In another aspect, the application also provides a nonvolatile computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to: scan a multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages, and determine a language identifier for each code fragment; determine a corresponding parsing rule according to the language identifier, and perform semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the code fragments; convert the code fragments according to the semantic association graph to determine an intermediate form; and determine the operating environment of a target platform, and convert the intermediate form, according to that operating environment, into a code format corresponding to the target platform.

The method identifies multi-language code fragments through line-by-line scanning and determines the language identifier of each fragment, which lays the foundation for subsequent processing. This avoids parsing errors caused by language confusion and improves compiling accuracy and efficiency. By selecting the parsing rule that matches each language identifier, the method can analyze the semantic associations among code fragments in depth and construct a detailed semantic association graph. This not only helps in understanding the complex interactions within the code, but also supports subsequent optimization and conversion. Converting the multi-language code fragments into a unified intermediate form integrates the grammar and semantic information of the different languages and creates favorable conditions for cross-language code optimization and cross-platform deployment. The application takes the operating environment of the target platform into account, so that the intermediate form can be converted into the code format corresponding to the target platform; this enhances the flexibility and applicability of the compiling method across platforms and environments. Through semantic awareness and a unified compiling flow, multi-language code is processed cooperatively during compilation, which reduces unnecessary compiling steps and resource consumption and improves compiling efficiency. Based on semantic analysis, errors can be detected at the compiling stage, and cross-language semantic association analysis can reveal some deep logic errors, which improves the quality of the multi-language mixed code. Optimization of the intermediate representation, by means such as cross-language function inlining, shared data access optimization, and instruction scheduling optimization, exploits the performance potential of the multi-language mixed code and improves the execution speed and resource utilization of the program.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a flow chart of a compiling method for multi-language mixed codes according to an embodiment of the application;

FIG. 2 is a schematic diagram of a compiling device for multi-language mixed code according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.

Existing compiler technology is inadequate for multi-language hybrid programming scenarios and has become a key bottleneck restricting improvements in the development efficiency, quality, and performance of software systems. An innovative method and system that can effectively process multi-language mixed code and compile it accurately based on semantics is urgently needed to meet increasingly complex modern software development requirements.

As shown in fig. 1, in order to solve the above problem, a method for compiling multi-language mixed code according to an embodiment of the present application includes:

S101, scanning the multi-language mixed code file line by line to identify code segments corresponding to a plurality of languages, and determining language identifications corresponding to the code segments.

The core of multi-language mixed code is the cooperation of different programming languages at the semantic level and at the runtime environment level. From a semantic perspective, it is critical to identify and match functionally and logically equivalent expressions in different languages, so that they can be integrated into the overall logic flow of the program and work together. For example, a function call in Python and a function call in C++ both aim, at the semantic level, to execute a specific block of code to accomplish a task, and such calls can be passed between the languages through interface definition and translation mechanisms. From the perspective of the runtime environment, the nature of multi-language mixed code is reflected in how the runtime systems of the different languages share and coordinate the resources provided by the operating system, such as memory and processor time. Although the code fragments of different languages have their own layout and management mechanisms in memory, means such as shared libraries and memory-mapped files allow data and control flow to pass smoothly between code in different languages, so that together they implement all functions of the software system.

Code preprocessing and language identification marking: when processing a multi-language mixed code file, the system scans the input line by line to identify and distinguish code fragments in different languages, and attaches a corresponding language identification tag to each identified fragment. Preliminary code preprocessing is also performed in this process, including removing comments and simplifying whitespace characters, in order to simplify subsequent analysis steps while preserving the structural information and semantic cues of the code.
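As a non-limiting illustration, the preprocessing described above might be sketched in Python as follows. The function name `preprocess` and the comment markers it recognizes are assumptions made for this sketch and are not specified by the application; the comment removal is deliberately naive and does not handle comment markers inside string literals or multi-line comments.

```python
import re

def preprocess(line, lang):
    """Drop a trailing line comment and collapse runs of whitespace.

    Hypothetical helper: only single-line comments are handled, and a
    comment marker inside a string literal would be cut by mistake.
    """
    marker = "//" if lang == "C++" else "#"
    code = line.split(marker, 1)[0]
    # Remember leading indentation before normalizing whitespace.
    indent = code[: len(code) - len(code.lstrip())]
    body = re.sub(r"\s+", " ", code.strip())
    # Indentation is structural in Python, so it is kept only there.
    # Brackets, semicolons and other structural symbols are untouched.
    return (indent if lang == "Python" else "") + body
```

Keeping the Python indentation while discarding it for C++ reflects the requirement that structural information survive preprocessing.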

In one embodiment, in the code preprocessing and language identification phase, a hybrid file containing C++ and Python code is first scanned line by line. When a line is detected that starts with the ".cpp" extension or contains a C++-specific grammar keyword, such as "class" or "template", the system determines that the segment is a C++ code fragment and attaches a "C++" identification tag. When a line conforming to the syntax features of Python is encountered, such as a line that relies on indentation or contains keywords such as "def" or "import", the segment is determined to be a Python code fragment and a "Python" identification tag is attached. In this process, comments in the code are removed and consecutive whitespace characters are merged or uniformly formatted, but structural symbols in the code, such as brackets and semicolons, are preserved so that the integrity of the logical structure of the code is not affected.
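As a non-limiting illustration, the line-by-line scan of this embodiment might be sketched in Python as follows. The `FEATURES` table and the function `identify_fragments` are hypothetical names; the feature fields go beyond the few keywords named above (for example `#include` and `from` are assumptions), and a line without any feature field is simply assumed to continue the current fragment.

```python
import re

# Hypothetical feature fields per language. Note that "class" is also a
# Python keyword, so a real implementation would need a finer test; here
# the C++ pattern is checked first and wins.
FEATURES = {
    "C++": re.compile(r"^\s*(#include\b|class\b|template\b|namespace\b)"),
    "Python": re.compile(r"^\s*(def\b|import\b|from\b)"),
}

def identify_fragments(source):
    """Scan line by line and return a list of (language tag, lines) fragments."""
    fragments = []
    current = None
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no language information
        detected = next(
            (lang for lang, pat in FEATURES.items() if pat.search(line)), None
        )
        lang = detected or current  # no feature field: stay in current fragment
        if lang is None:
            continue  # nothing identified yet
        if fragments and fragments[-1][0] == lang:
            fragments[-1][1].append(line)
        else:
            fragments.append((lang, [line]))
        current = lang
    return fragments
```

Each returned tuple pairs a language identification tag with the lines of one code fragment, which is the form later stages would consume.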

S102, determining a corresponding parsing rule according to the language identifier, and performing semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the plurality of code fragments.

To handle a plurality of common programming languages, a dedicated semantic analysis unit is designed for each language. Each unit builds a semantic analysis model from the grammar rules, semantic specifications, standard library function definitions, and other domain knowledge of the corresponding language. These models can perform deep semantic analysis of code fragments in that language, including analysis of variable declaration and use, analysis of function calls, inference of data types, analysis of control flow, and detailed analysis of the interfaces and dependencies between code fragments.

After the semantic analysis of the code fragments of each language is completed, a semantic association graph between the code fragments of different languages is constructed according to the function call relationships, the data sharing mechanisms, and the language-specific interaction protocols in the code, for example the cross-language interface standards defined in some multi-language frameworks. The graph reveals the semantic interaction paths and the dependency network among all parts of the multi-language mixed code, and provides comprehensive and detailed semantic information for the subsequent compiling and optimization steps.

In one embodiment, in the process of constructing the semantic analysis units, a semantic model capable of deeply analyzing the complex syntax structures of C++, including classes, templates, and overloaded functions, is constructed based on the standard syntax rules of C++. Specifically, when a variable declaration in the C++ code is analyzed, the model infers the type and life cycle of the variable from details such as its type modifiers and scope qualifiers. During function call analysis, complex cases such as function overloading and default parameters are handled. For Python, a semantic analysis model is likewise constructed based on its dynamic type system and its indentation-based grammar rules. In data type inference, the model determines the type of a variable dynamically according to the assignment operations on that variable. When a function call is processed, the model handles the various parameter-passing modes in Python, such as positional parameters, keyword parameters, and variable-length parameters.
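As a non-limiting illustration, type inference from assignment operations for Python fragments might be sketched with the standard `ast` module as follows. The function `infer_types` is a hypothetical name, and the sketch is restricted to top-level assignments of literal values; a full semantic analysis model would cover far more cases.

```python
import ast

def infer_types(source):
    """Map each variable to the type name of its latest literal assignment.

    Only top-level statements of the form `name = <literal>` are handled;
    anything else is ignored in this sketch.
    """
    types = {}
    for stmt in ast.parse(source).body:  # statements in source order
        if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    types[target.id] = type(stmt.value.value).__name__
    return types
```

Because a later assignment overwrites an earlier entry, the result tracks the dynamically changing type of a variable, as the embodiment describes.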

In one embodiment, in the cross-language semantic association and integration stage, when a C++ function is called from Python code, the call statement in Python and the corresponding function definition in C++ need to be analyzed so that the semantic association between the two is accurately established. For example, if the Python code contains a call such as lib.add_numbers(5, 3), it needs to be associated directly with the add_numbers function in C++; likewise, a call such as lib.multiply_numbers(2, 4) needs to be associated with the multiply_numbers function in C++. In addition, for data shared between C++ and Python, such as global_variable, the access patterns and sharing mechanisms in the two languages need to be analyzed in depth, and this information is integrated into the semantic association graph to ensure the accuracy and efficiency of cross-language data interaction.
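As a non-limiting illustration, associating Python call sites with C++ function definitions might be sketched as follows. The helper names `cpp_functions` and `link_calls`, the handle name `lib`, and the regular expression used to find C++ definitions are assumptions made for this sketch; the regular expression is naive and would also match constructs such as `if (...) {`.

```python
import ast
import re

def cpp_functions(cpp_source):
    """Collect names that look like C++ function definitions (naive regex)."""
    return set(re.findall(r"\b(\w+)\s*\([^)]*\)\s*\{", cpp_source))

def link_calls(py_source, cpp_names, handle="lib"):
    """Return (Python call site, C++ function) edges for calls on `handle`."""
    edges = []
    for node in ast.walk(ast.parse(py_source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == handle
            and node.func.attr in cpp_names
        ):
            name = node.func.attr
            edges.append((f"Python:{handle}.{name}", f"C++:{name}"))
    return sorted(edges)
```

The resulting edges are one possible form of the function call relations that would be integrated into the semantic association graph; calls whose target is not found among the C++ definitions are left unlinked.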

S103, converting the plurality of code segments according to the semantic association graph so as to determine an intermediate form.

Based on the constructed semantic association graph, the multi-language mixed code is converted into a unified intermediate representation form. The intermediate representation not only comprehensively retains the grammar structure information of codes, but also deeply merges the semantic relation of cross languages. After the intermediate representation is successfully generated, a series of optimization operations are performed according to semantic information therein. For example, for frequently called functions in cross-language, inline optimization is implemented to improve efficiency, for shared data access, optimization is performed to reduce unnecessary data transmission overhead, and at the same time, fine instruction scheduling optimization is performed according to the execution flow and semantic logic of codes, so as to further improve overall performance.

In one embodiment, in the intermediate representation generation and optimization stage, a unified intermediate representation is designed that employs a graph-based data structure to map the syntactic structure and semantic relationships of the C++ and Python code. In this graph, nodes represent core elements such as variables, functions, and code blocks, while edges represent data dependencies, control flow associations, cross-language function call relationships, and the like. In the optimization procedure, Python functions that are frequently called from C++, especially those that are relatively compact, are analyzed and identified. An inlining strategy is applied to these functions: they are integrated directly into the part of the intermediate representation that corresponds to the C++ code, which reduces the overhead of function calls. At the same time, access operations on shared data are optimized. When multiple redundant data transfers are detected, the data access path is optimized, and the number of data transfers is reduced by introducing an intermediate buffer mechanism or by directly using memory mapping, thereby improving the overall performance of the system.
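As a non-limiting illustration, a graph-based intermediate representation and the selection of inlining candidates might be sketched as follows. The class names `Node` and `Graph`, the `size` measure, and the thresholds `min_calls` and `max_size` are assumptions made for this sketch; the application does not specify concrete data structures or threshold values.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    lang: str       # language identification tag, e.g. "C++" or "Python"
    kind: str       # "function", "variable", or "block"
    size: int = 0   # hypothetical size measure, e.g. statement count

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, dst, kind, count)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_edge(self, src, dst, kind, count=1):
        self.edges.append((src, dst, kind, count))

    def inline_candidates(self, min_calls=10, max_size=5):
        """Cross-language callees that are called often and are small."""
        result = []
        for src, dst, kind, count in self.edges:
            callee = self.nodes[dst]
            cross = self.nodes[src].lang != callee.lang
            if kind == "call" and cross and count >= min_calls and callee.size <= max_size:
                result.append((src, dst))
        return result
```

Storing the call count on the edge and the size on the callee node keeps the inlining decision a purely local query on the graph.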

S104, determining the operating environment of the target platform, and converting the intermediate form, according to the operating environment, into a code format corresponding to the target platform.

According to the specific requirements of the instruction set architecture and the runtime environment of the target platform, the optimized intermediate representation is converted into target machine code or a code format that can be directly executed by the target platform. During conversion, the specific runtime characteristics of the multi-language mixed code on the target platform are taken into account, and register allocation, memory layout adjustment, adaptation of the exception handling mechanism, and the like are performed accordingly, so that the generated target code runs efficiently and stably on the target platform.

In one embodiment, if the target platform is an operating system based on the x86 architecture, the optimized intermediate representation is accurately converted to x86 machine code. When the register allocation is carried out, the use frequency of variables and the life cycle of the variables are considered, so that the general registers and the special purpose registers are scientifically and reasonably allocated. For the adjustment of the memory layout, the storage requirements of the C++ and Python codes in the memory are comprehensively analyzed, for example, the layout mode of the objects in the C++ and the reference mechanism of the objects in the Python are comprehensively analyzed, and the optimization layout is carried out on the basis so as to improve the memory access efficiency. Meanwhile, the requirements of an exception handling mechanism of a target platform are strictly followed, and corresponding processing code fragments are added aiming at cross-language exception conditions possibly occurring. This measure aims to ensure that the program exhibits a high degree of stability and reliability during operation.
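As a non-limiting illustration, register allocation that considers both the use frequency and the life cycle of variables might be sketched as follows. The function `allocate_registers`, the representation of a life cycle as a `(start, end)` interval, and the greedy strategy are assumptions made for this sketch; the application does not specify a particular allocation algorithm.

```python
def allocate_registers(variables, registers):
    """Greedy allocation sketch.

    variables: {name: (start, end, uses)} where (start, end) is the
    variable's live interval and `uses` its use count.
    Returns {name: register name or "spill"}.
    """
    assigned = {reg: [] for reg in registers}  # register -> live intervals
    result = {}
    # The most frequently used variables get first pick of registers.
    ordered = sorted(variables.items(), key=lambda kv: -kv[1][2])
    for name, (start, end, uses) in ordered:
        for reg in registers:
            # A register is free if no interval already in it overlaps.
            if all(end < s or start > e for s, e in assigned[reg]):
                assigned[reg].append((start, end))
                result[name] = reg
                break
        else:
            result[name] = "spill"  # no register free: keep in memory
    return result
```

In this sketch a rarely used variable whose life cycle overlaps those of hotter variables is the one left in memory, while a variable whose life cycle begins after another ends can reuse the same register.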

As shown in fig. 2, the embodiment of the present application further provides a compiling device for a multilingual mixed code, including:

at least one processor, and

A memory communicatively coupled to the at least one processor, wherein,

The memory stores instructions executable by the at least one processor to enable a multi-language hybrid code compiling device to perform:

scanning the multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages, and determining language identifiers corresponding to the code fragments;

determining a corresponding parsing rule according to the language identifier, and performing semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the plurality of code fragments;

converting the plurality of code fragments according to the semantic association graph to determine an intermediate form; and

determining the operating environment of the target platform, and converting the intermediate form, according to the operating environment, into a code format corresponding to the target platform.

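The embodiment of the application also provides a nonvolatile computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to:

scan the multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages, and determine language identifiers corresponding to the code fragments;

determine a corresponding parsing rule according to the language identifier, and perform semantic analysis on the code fragments according to the parsing rule to determine a semantic association graph among the plurality of code fragments;

convert the plurality of code fragments according to the semantic association graph to determine an intermediate form; and

determine the operating environment of the target platform, and convert the intermediate form, according to the operating environment, into a code format corresponding to the target platform.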

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs to "integrate" a digital system onto a PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained by merely logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of controllers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

The embodiments of the present application are described in a progressive manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments.

The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not repeated here.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

Translated from Chinese

1. A method for compiling multi-language mixed code, characterized by comprising:

scanning a multi-language mixed code file line by line to identify code snippets corresponding to multiple languages, and determining the language identifier corresponding to each code snippet;

determining a corresponding parsing rule according to the language identifier, and performing semantic parsing on the code snippets according to the parsing rule to determine a semantic association graph between the plurality of code snippets;

converting the plurality of code snippets according to the semantic association graph to determine an intermediate form;

determining the operating environment of a target platform, and converting the intermediate form according to the operating environment into a code format corresponding to the target platform;

wherein determining the corresponding parsing rule according to the language identifier, and performing semantic parsing on the code snippets according to the parsing rule, specifically comprises:

determining a corresponding grammar according to the language identifier, so as to determine the corresponding parsing rule according to the grammar;

building a model according to the parsing rule to determine a semantic parsing model, wherein the functions of the semantic parsing model include variable declaration and usage analysis, function call resolution, data type inference, control flow analysis, and analysis of the interfaces and dependencies between code segments;

wherein determining the semantic association graph between the plurality of code snippets specifically comprises:

performing semantic parsing on the plurality of code snippets by means of the semantic parsing model to determine the function call relationships between the plurality of code snippets, and determining semantic associations according to the function call relationships;

determining a data sharing mechanism and an interface specification between the plurality of code snippets, and determining the semantic association graph according to the semantic associations, the data sharing mechanism, and the interface specification, so that the interaction paths and the dependency network between the plurality of code snippets are displayed through the semantic association graph;

determining a semantic model based on standard C++ syntax, the semantic model covering classes, templates, and overloaded functions; when parsing variable declarations in C++ code, the semantic model infers the type and lifetime of a variable from its type modifiers and scope qualifiers; when resolving function calls, it handles function overloading and default parameters;

for Python, constructing a semantic analysis model based on the dynamic type system and the indentation syntax rules; in the data type inference step, the semantic analysis model determines a type from the assignment operations on a variable; when processing function calls, it determines how function parameters are passed in Python;

in the intermediate representation generation and optimization stage, determining an intermediate representation that adopts a graph-based data structure to map the syntactic structure and semantic relationships of the C++ and Python code; the nodes of the graph represent variables, functions, and code blocks; the edges of the graph represent data dependencies, control flow associations, and cross-language function call relationships; in the optimization process, the Python functions called from C++ are analyzed and identified, and an inlining strategy is applied to these functions to merge them into the part of the intermediate representation corresponding to the C++ code.
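The graph-based intermediate representation and the inlining step recited in claim 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Node` and `IRGraph` classes, the string-based operation lists, the edge kinds `"data"` and `"call"`, and the `inline_python_calls` helper are hypothetical, and the sketch ignores argument binding and return values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str        # "variable", "function", or "block"
    language: str    # "cpp" or "python"
    body: list = field(default_factory=list)  # simplified operation list (assumption)


@dataclass
class IRGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source, target, kind) triples

    def add_node(self, name, kind, language, body=()):
        self.nodes[name] = Node(name, kind, language, list(body))

    def add_edge(self, source, target, kind):
        self.edges.append((source, target, kind))


def inline_python_calls(graph):
    """Splice the body of each Python function called from C++ into its caller."""
    inlined = []
    for source, target, kind in list(graph.edges):
        if kind != "call":
            continue
        caller, callee = graph.nodes[source], graph.nodes[target]
        if caller.language == "cpp" and callee.language == "python":
            call_site = f"call {target}"
            new_body = []
            for op in caller.body:
                # Replace the call site with the callee's operations.
                new_body.extend(callee.body if op == call_site else [op])
            caller.body = new_body
            # The cross-language call edge disappears once the call is inlined.
            graph.edges.remove((source, target, kind))
            inlined.append((source, target))
    return inlined


graph = IRGraph()
graph.add_node("radius", "variable", "cpp")
graph.add_node("main", "function", "cpp", ["load radius", "call area", "store result"])
graph.add_node("area", "function", "python", ["mul radius radius", "mul pi"])
graph.add_edge("radius", "main", "data")   # data dependency
graph.add_edge("main", "area", "call")     # cross-language function call
inlined = inline_python_calls(graph)
```

After the call, the body of `main` holds the operations of `area` in place of the call site, and only the data-dependency edge remains in the graph.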

2. The method according to claim 1, characterized in that scanning the multi-language mixed code file line by line specifically comprises:

determining the characteristic fields corresponding to the multiple languages, and scanning the code file line by line according to the characteristic fields to determine whether the code file contains the characteristic fields;

if the code file contains a characteristic field, determining the code snippet according to the characteristic field, determining the corresponding language identifier according to the characteristic field, and marking the code snippet according to the language identifier.
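As a rough illustration of the line-by-line scan in claim 2, the sketch below assumes that each language's characteristic fields can be expressed as a regular expression. The patterns in `FEATURE_FIELDS`, the `Fragment` class, and the `scan` function are hypothetical and far simpler than what a real scanner would need; the claims do not specify the actual characteristic fields.

```python
import re
from dataclasses import dataclass, field

# Hypothetical characteristic fields; not specified by the claims.
FEATURE_FIELDS = {
    "cpp": re.compile(r"^\s*(#include\b|int\s+main\s*\(|std::)"),
    "python": re.compile(r"^\s*(def\s+\w+\s*\(|import\s+\w+)"),
}


@dataclass
class Fragment:
    language: str      # language identifier used to mark the fragment
    start_line: int    # 1-based line number where the fragment begins
    lines: list = field(default_factory=list)


def scan(source):
    """Scan line by line and group lines into language-tagged fragments."""
    fragments = []
    current = None
    for number, line in enumerate(source.splitlines(), start=1):
        detected = next(
            (lang for lang, pattern in FEATURE_FIELDS.items() if pattern.search(line)),
            None,
        )
        # A characteristic field of a different language opens a new fragment.
        if detected and (current is None or detected != current.language):
            current = Fragment(detected, number)
            fragments.append(current)
        # Lines without a characteristic field stay with the current fragment;
        # lines seen before any fragment is opened are skipped in this sketch.
        if current is not None:
            current.lines.append(line)
    return fragments


mixed = "#include <vector>\nint main() { return 0; }\ndef area(r):\n    return 3.14 * r * r\n"
fragments = scan(mixed)
```

Here the first two lines are marked as one C++ fragment and the last two as one Python fragment.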

3. The method according to claim 1, characterized in that converting the plurality of code snippets according to the semantic association graph to determine the intermediate form specifically comprises:

determining the syntactic structure information of the code snippets according to the semantic association graph, and determining the semantic relationships between the plurality of code snippets;

performing a mapping conversion according to the syntactic structure and the semantic relationships to determine the intermediate form.

4. The method according to claim 1, characterized in that the method further comprises:

determining the usage frequency and declaration period of the variables in the code snippet, and determining the register corresponding to the code snippet according to the usage frequency and the declaration period.
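One possible reading of claim 4 is sketched below. The heuristic is an assumption, not the patent's method: the "declaration period" is taken here to mean the span of lines between a variable's first and last occurrence, variables are ranked by usage count with the shorter span breaking ties, and the hypothetical `assign_registers` helper hands out registers in that order. Variables left without a register simply do not appear in the result. It is not a full register allocator and performs no interference analysis.

```python
import re
from collections import Counter


def assign_registers(lines, variables, registers):
    """Assign registers to the most frequently used variables (illustrative heuristic)."""
    uses = Counter()
    first_seen, last_seen = {}, {}
    for number, line in enumerate(lines):
        for token in re.findall(r"[A-Za-z_]\w*", line):
            if token in variables:
                uses[token] += 1
                first_seen.setdefault(token, number)
                last_seen[token] = number

    def span(name):
        # Number of lines from the first to the last occurrence (assumed period).
        return last_seen[name] - first_seen[name] + 1

    # Higher usage first; shorter span, then name, break ties deterministically.
    ranked = sorted(uses, key=lambda name: (-uses[name], span(name), name))
    return dict(zip(ranked, registers))


snippet = [
    "i = 0",
    "total = 0",
    "total = total + i",
    "i = i + 1",
    "limit = 10",
]
allocation = assign_registers(snippet, {"i", "total", "limit"}, ["r0", "r1"])
```

With two registers available, `i` (four uses) and `total` (three uses) each receive one, and `limit` is left in memory.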

5. The method according to claim 1, characterized in that the method further comprises:

determining the object reference mechanisms corresponding to the multiple languages, and laying out the code snippets according to the object reference mechanisms.

6. The method according to claim 1, characterized in that, after determining the language identifier corresponding to the code snippet, the method further comprises:

preprocessing the code snippet, wherein the preprocessing includes removing comments and processing whitespace characters.
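The preprocessing in claim 6 might look like the sketch below, under stated assumptions: only single-line comments are handled (`#` for Python, `//` for C++), block comments and triple-quoted strings are not, and "processing whitespace" is taken to mean trimming trailing whitespace and dropping blank lines while keeping leading indentation, which is significant in Python. The names `COMMENT_MARKERS`, `strip_line_comment`, and `preprocess` are hypothetical.

```python
# Hypothetical mapping from language identifier to single-line comment marker.
COMMENT_MARKERS = {"python": "#", "cpp": "//"}


def strip_line_comment(line, marker):
    """Cut the line at the first comment marker that is outside a string literal."""
    quote = None
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":      # skip the escaped character inside a string
                i += 2
                continue
            if ch == quote:     # closing quote ends the string literal
                quote = None
        elif ch in "\"'":
            quote = ch          # opening quote starts a string literal
        elif line.startswith(marker, i):
            return line[:i]
        i += 1
    return line


def preprocess(source, language):
    """Remove line comments, trailing whitespace, and blank lines."""
    marker = COMMENT_MARKERS[language]
    cleaned = []
    for line in source.splitlines():
        line = strip_line_comment(line, marker).rstrip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


python_clean = preprocess("def f():  # doc\n\n    return '#'  \n", "python")
cpp_clean = preprocess("int x = 1; // count\n\t\n", "cpp")
```

The `'#'` inside the Python string literal is preserved because the scanner tracks whether it is inside quotes.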

7. A device for compiling multi-language mixed code, characterized by comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the device for compiling multi-language mixed code to perform:

determining the operating environment of a target platform, and converting the intermediate form according to the operating environment into a code format corresponding to the target platform.

8. A non-volatile computer storage medium storing computer-executable instructions, characterized in that the computer-executable instructions are configured to: