Disclosure of Invention
To solve the above problems, the application provides a method for compiling multi-language mixed code, comprising: scanning a multi-language mixed code file line by line to identify code segments corresponding to a plurality of languages and to determine the language identification corresponding to each code segment; determining corresponding analysis rules according to the language identifications, and performing semantic analysis on the code segments according to the analysis rules to determine a semantic association graph among the code segments; converting the code segments according to the semantic association graph to determine an intermediate form; and determining the running environment of a target platform, and converting the intermediate form, according to the running environment, into a code format corresponding to the target platform.
In one example, scanning the multi-language mixed code file line by line specifically comprises: determining the characteristic fields corresponding to each of the multiple languages; scanning the code file line by line to determine whether a characteristic field is contained in the code file; if a characteristic field is contained in the code file, determining the code segment according to the characteristic field; determining the corresponding language identification according to the characteristic field; and marking the code segment with the language identification.
In one example, corresponding analysis rules are determined according to the language identification, and semantic analysis is performed on the code segments according to the analysis rules, specifically comprising determining corresponding grammar according to the language identification, determining the corresponding analysis rules according to the grammar, and performing model construction according to the analysis rules to determine a semantic analysis model, wherein functions of the semantic analysis model comprise variable declaration and use analysis, function call analysis, data type deduction, control flow analysis, and interface and dependency analysis between code segments.
In one example, determining the semantic association graph among a plurality of code segments specifically comprises: performing semantic analysis on the plurality of code segments through the semantic analysis model to determine the function call relations among the code segments; determining semantic associations according to the function call relations; determining the data sharing mechanism and the interface specification among the code segments; and determining the semantic association graph according to the semantic associations, the data sharing mechanism and the interface specification, so that the graph displays the interaction paths and the dependency network among the code segments.
In one example, the code segments are converted according to the semantic association graph to determine an intermediate form, and the method specifically comprises the steps of determining grammar structure information of the code segments according to the semantic association graph, determining semantic relations among the code segments, and performing mapping conversion according to the grammar structure and the semantic relations to determine the intermediate form.
In one example, the method further includes determining the frequency of use and the life cycle of a variable in the code segment, and determining a register corresponding to the code segment based on the frequency of use and the life cycle.
In one example, the method further includes determining an object reference mechanism corresponding to the multiple languages, and laying out the code fragments according to the object reference mechanism.
In one example, after determining the language identification corresponding to the code segment, the method further includes preprocessing the code segment, where the preprocessing includes removing comments and processing whitespace characters.
In another aspect, the application further provides a device for compiling multi-language mixed code, comprising at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the device to: scan a multi-language mixed code file line by line to identify code fragments corresponding to a plurality of languages and determine the language identification corresponding to each code fragment; determine corresponding analysis rules according to the language identifications, and perform semantic analysis on the code fragments according to the analysis rules to determine a semantic association graph among the code fragments; convert the code fragments according to the semantic association graph to determine an intermediate form; and determine the running environment of a target platform, and convert the intermediate form, according to the running environment, into a code format corresponding to the target platform.
In another aspect, the application also provides a nonvolatile computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to: scan a multi-language mixed code file line by line to identify code segments corresponding to a plurality of languages and determine the language identification corresponding to each code segment; determine corresponding analysis rules according to the language identifications, and perform semantic analysis on the code segments according to the analysis rules to determine a semantic association graph among the code segments; convert the code segments according to the semantic association graph to determine an intermediate form; and determine the running environment of a target platform, and convert the intermediate form, according to the running environment, into a code format corresponding to the target platform.
By scanning line by line, the method and the device identify the multi-language code fragments and accurately determine their language identifications, laying the foundation for subsequent processing. This avoids analysis errors caused by language confusion and improves compiling accuracy and efficiency. Because the analysis rules used for semantic analysis are selected according to the language identification, the semantic associations among the code fragments can be analyzed in depth and a detailed semantic association graph can be constructed. The graph not only helps in understanding the complex interactions within the code, but also supports subsequent optimization and conversion. Converting the multi-language code fragments into a unified intermediate form integrates the grammar and semantic information of the different languages and creates favorable conditions for cross-language code optimization and cross-platform deployment. The application takes the running environment of the target platform into account, so that the intermediate form can be converted into the code format corresponding to the target platform; this enhances the flexibility and applicability of the compiling method, which can be applied to a variety of platforms and environments. The method thus offers high accuracy, strong flexibility, and good optimization results, and supports cross-language programming and cross-platform deployment. Through semantic awareness and a unified compiling flow, the multi-language code is processed cooperatively during compilation, which reduces unnecessary compiling steps and resource consumption and improves compiling efficiency.
Based on semantic analysis, errors can be accurately detected at the compiling stage, and cross-language semantic association analysis can uncover some deep logic errors, which improves the quality of the multi-language mixed code. Optimizations on the intermediate representation, such as cross-language function inlining, shared data access optimization, and instruction scheduling optimization, exploit the performance potential of the multi-language mixed code and improve the execution speed and resource utilization of the program.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Existing compiler technology is insufficient for multi-language hybrid programming scenarios, and this has become a key bottleneck limiting the development efficiency, quality and performance of software systems. A method and system that can effectively process multi-language mixed code and compile it accurately based on semantics is therefore needed to meet the increasingly complex requirements of modern software development.
As shown in FIG. 1, in order to solve the above problem, a method for compiling multi-language mixed code according to an embodiment of the present application includes:
S101, scanning the multi-language mixed code file line by line to identify code segments corresponding to a plurality of languages, and determining language identifications corresponding to the code segments.
The core of multi-language mixed code is the cooperation of different programming languages at the semantic level and at the runtime environment level. From the semantic perspective, it is critical to identify and match functionally and logically equivalent expressions in the different languages, so that they can be integrated into the overall logic flow of the program and work together. For example, a function call in Python and a function call in C++ are both intended, at the semantic level, to execute a specific block of code to accomplish a task, and such calls can be passed between the languages through interface definition and translation mechanisms. From the perspective of the runtime environment, the nature of multi-language mixed code is reflected in how the runtime systems of the different languages share and coordinate the resources provided by the operating system, such as memory and processor time. Although the code segments of different languages have their own layout and management mechanisms in memory, data and control flow can be passed among them by means such as shared libraries and memory-mapped files, so that together they realize all functions of the software system.
Code preprocessing and language identification marking: when processing a multi-language mixed code file, the system scans the input line by line to identify and distinguish code fragments in different languages, and attaches a corresponding language identification tag to each identified code fragment. Preliminary code preprocessing is also performed in this process, including removing comments and simplifying whitespace characters, in order to simplify subsequent analysis steps while preserving the structural information and semantic cues of the code.
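The preprocessing step can be illustrated with a minimal sketch. It assumes single-line comment markers only ("//" for C++ and "#" for Python) and does not handle markers that appear inside string literals; the names preprocess_line, preprocess and COMMENT_MARKERS are hypothetical and are not taken from the application. Leading indentation is kept because it is structural in Python.

```python
import re

# Hypothetical mapping from language identification to its line-comment marker.
COMMENT_MARKERS = {"C++": "//", "Python": "#"}


def preprocess_line(line: str, comment_marker: str) -> str:
    """Strip a trailing line comment and collapse interior whitespace.

    Assumption: comment markers inside string literals are not handled.
    """
    code = line.split(comment_marker, 1)[0].rstrip()
    indent_len = len(code) - len(code.lstrip())
    indent, body = code[:indent_len], code[indent_len:]
    # Collapse runs of whitespace inside the statement; brackets and
    # semicolons are left untouched.
    return indent + re.sub(r"\s+", " ", body)


def preprocess(lines, language):
    """Preprocess all lines of one tagged fragment, dropping empty results."""
    marker = COMMENT_MARKERS[language]
    cleaned = (preprocess_line(line, marker) for line in lines)
    return [line for line in cleaned if line.strip()]
```

A production implementation would need a tokenizer per language so that block comments and string literals are treated correctly.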
In one embodiment, in the code preprocessing and language identification phase, a mixed file containing C++ and Python code is first scanned line by line. When a line is detected that starts with the ".cpp" extension or contains a C++-specific grammar keyword, such as "class" or "template", the system determines that the segment is a C++ code fragment and attaches a "C++" identification tag. When a line is encountered that conforms to the syntax features of Python, such as the indentation-dependent format or the keywords "def" and "import", the segment is determined to be a Python code fragment and a "Python" identification tag is attached. In this process, comments in the code are removed and consecutive whitespace characters are merged or uniformly formatted, but structural symbols in the code, such as brackets and semicolons, are preserved so that the integrity of the logical structure of the code is not affected.
S102, determining a corresponding analysis rule according to the language identification, and carrying out semantic analysis on the code segments according to the analysis rule so as to determine a semantic association graph among a plurality of code segments.
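As an illustration of the line-by-line scan, the following sketch tags consecutive lines with a language identification. The regular expressions in FEATURES are hypothetical characteristic fields chosen for this example, and the rule that a line without any characteristic field inherits the language of the preceding fragment is an assumption, not something the embodiment specifies.

```python
import re
from dataclasses import dataclass, field

# Hypothetical characteristic fields; a real system would use a richer set.
FEATURES = {
    "C++": re.compile(r"^\s*(class\s+\w+\s*\{|template\s*<|#include\b)"),
    "Python": re.compile(r"^\s*(def\s+\w+\s*\(|import\s+\w+|from\s+\w+\s+import\b)"),
}


@dataclass
class Fragment:
    language: str                      # language identification tag
    lines: list = field(default_factory=list)


def scan(source: str) -> list:
    """Split a mixed source text into language-tagged fragments."""
    fragments = []
    current = None
    for line in source.splitlines():
        detected = next(
            (lang for lang, pattern in FEATURES.items() if pattern.search(line)),
            None,
        )
        # Assumption: lines with no characteristic field stay in the current fragment.
        lang = detected or (current.language if current else None)
        if lang is None:
            continue  # no characteristic field seen yet
        if current is None or lang != current.language:
            current = Fragment(lang)
            fragments.append(current)
        current.lines.append(line)
    return fragments
```

For a file that begins with an "#include" line followed by a C++ function and then an "import" line followed by a Python function, scan returns one "C++" fragment and one "Python" fragment.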
To handle a plurality of common programming languages, a dedicated semantic analysis unit is designed for each language. Each unit builds a semantic analysis model according to the grammar rules, semantic specifications, standard library function definitions and other knowledge of the corresponding language. These models can perform in-depth semantic analysis on code fragments of their language, including analysis of variable declarations and uses, analysis of function calls, deduction of data types, control flow analysis, and detailed analysis of the interfaces and dependency relations among code fragments.
After the semantic analysis of the code segments of each language is completed, a semantic association graph among the code segments of the different languages is constructed according to the function call relations, the data sharing mechanisms and the language-specific interaction protocols in the code, for example the cross-language interface standards defined in some multi-language frameworks. The graph reveals the semantic interaction paths and the dependency network among all parts of the multi-language mixed code, and provides comprehensive and detailed semantic information for the subsequent compiling and optimizing steps.
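One way to select an analysis unit from the language identification is a dispatch table, sketched below. The registry ANALYZERS, the decorator register and the two toy analyzers are hypothetical; they only extract function names and stand in for the much richer semantic analysis models described above.

```python
import re
from typing import Callable, Dict, List

Analyzer = Callable[[List[str]], dict]
ANALYZERS: Dict[str, Analyzer] = {}   # language identification -> analysis unit


def register(language: str):
    """Register an analysis unit for one language identification."""
    def wrap(fn: Analyzer) -> Analyzer:
        ANALYZERS[language] = fn
        return fn
    return wrap


@register("Python")
def analyze_python(lines):
    # Toy rule: a line starting with "def " declares a function.
    names = [l.split("(")[0].split()[1] for l in lines if l.lstrip().startswith("def ")]
    return {"language": "Python", "functions": names}


@register("C++")
def analyze_cpp(lines):
    # Toy rule: "<type> name(args) {" declares a function.
    pattern = re.compile(r"\s*\w[\w\s\*&:<>]*?\b(\w+)\s*\([^)]*\)\s*\{")
    names = [m.group(1) for m in map(pattern.match, lines) if m]
    return {"language": "C++", "functions": names}


def analyze(language: str, lines: List[str]) -> dict:
    """Pick the analysis unit that matches the language identification."""
    if language not in ANALYZERS:
        raise ValueError(f"no analysis rules registered for {language!r}")
    return ANALYZERS[language](lines)
```

Keeping the units behind a common call signature means that supporting another language only requires registering one more analyzer.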
In one embodiment, when constructing the semantic analysis units, a semantic model capable of analyzing the complex syntax structures of C++, including classes, templates and overloaded functions, is constructed based on the standard syntax rules of C++. Specifically, when analyzing a variable declaration in C++ code, the model deduces the type and life cycle of the variable from details such as its type modifiers and scope qualifiers. When analyzing function calls, it handles complex cases such as function overloading and default parameters. For Python, a semantic analysis model is likewise constructed based on its dynamic type system and its indentation-based grammar rules. In data type deduction, the model determines the type of a variable dynamically from the assignment operations on that variable. When processing function calls, it handles the various parameter passing modes of Python, such as positional parameters, keyword parameters and variable-length parameters.
In one embodiment, in the cross-language semantic association and integration stage, when Python code calls a function in C++, the call statement in Python and the corresponding function definition in C++ need to be analyzed so that the semantic association between the two can be established accurately. For example, if the Python code contains a call such as lib.add_numbers(5, 3), it needs to be associated directly with the add_numbers function in C++; likewise, a call such as lib.multiply_numbers(2, 4) needs to be associated with the multiply_numbers function in C++. In addition, for data shared between C++ and Python, such as global_variable, the access patterns and sharing mechanisms in the two languages need to be analyzed and integrated into the semantic association graph to ensure the accuracy and efficiency of cross-language data interaction.
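For the Python side, deducing a variable's type from its assignments can be sketched with the standard ast module. This is a deliberately narrow illustration: it only considers top-level assignments whose right-hand side is a literal, and the function name infer_types is hypothetical.

```python
import ast


def infer_types(source: str) -> dict:
    """Map each name to the type of the literal most recently assigned to it.

    Assumption: only top-level assignments of literal values are considered.
    """
    types = {}
    for stmt in ast.parse(source).body:          # statements in source order
        if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    # Later assignments overwrite earlier ones, reflecting
                    # Python's dynamic typing.
                    types[target.id] = type(stmt.value.value).__name__
    return types
```

For the source text "x = 1", "y = 'a'", "x = 2.5" (one statement per line), the result records x as float and y as str, because the last assignment to x determines its current type.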
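The association between the Python calls and the C++ definitions can be sketched as follows. The regular expressions are simplifying assumptions (the C++ pattern treats any "name(args) {" as a definition, and only calls through an object named lib are considered), and the edge tuple format is hypothetical.

```python
import re

# Assumption: a C++ function definition looks like "name(args) {".
CPP_DEF = re.compile(r"\b(\w+)\s*\([^)]*\)\s*\{")
# Assumption: cross-language calls go through an object named "lib".
PY_CALL = re.compile(r"\blib\.(\w+)\s*\(")


def build_associations(cpp_source: str, py_source: str):
    """Return (edges, unresolved) linking Python calls to C++ definitions."""
    defined = set(CPP_DEF.findall(cpp_source))
    edges, unresolved = [], []
    for name in PY_CALL.findall(py_source):
        if name in defined:
            edges.append(("python:lib." + name, "calls", "cpp:" + name))
        else:
            unresolved.append(name)   # call with no matching C++ definition
    return edges, unresolved
```

Reporting unresolved names separately shows how a cross-language association step could surface a call to a function that was never defined, which is the kind of error the semantic analysis is meant to catch at compile time.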
S103, converting the plurality of code segments according to the semantic association graph so as to determine an intermediate form.
Based on the constructed semantic association graph, the multi-language mixed code is converted into a unified intermediate representation. The intermediate representation not only retains the grammar structure information of the code, but also incorporates the cross-language semantic relations. After the intermediate representation is generated, a series of optimization operations are performed according to the semantic information it contains. For example, functions that are frequently called across languages are inlined to improve efficiency, shared data access is optimized to reduce unnecessary data transmission overhead, and instruction scheduling is optimized according to the execution flow and semantic logic of the code to further improve overall performance.
In one embodiment, in the intermediate representation generation and optimization stage, a unified intermediate representation is designed that uses a graph-based data structure to map the syntactic structure and semantic relations of the C++ and Python code. In this graph, nodes represent core elements such as variables, functions and code blocks, while edges represent data dependencies, control flow associations, cross-language function call relations and the like. In the optimization stage, Python functions that are frequently called from C++, especially relatively small ones, are identified. These functions are inlined, that is, integrated directly into the part of the intermediate representation corresponding to the C++ code, which reduces the overhead of function calls. Access to shared data is optimized as well: when multiple redundant data transmissions are detected, the data access path is optimized, and the number of transmissions is reduced by introducing an intermediate buffer mechanism or by using memory mapping directly, thereby improving the overall performance of the system.
S104, determining the running environment of the target platform, and converting the intermediate form, according to the running environment, into a code format corresponding to the target platform.
According to the specific requirements of the instruction set architecture and the runtime environment of the target platform, the optimized intermediate representation is converted into target machine code or another code format that the target platform can execute directly. In the conversion process, the operating characteristics of the multi-language mixed code on the target platform are taken into account, and register allocation, memory layout adjustment, adaptation of the exception handling mechanism and the like are performed accordingly, so that the generated target code runs efficiently and stably on the target platform.
In one embodiment, if the target platform is an operating system based on the x86 architecture, the optimized intermediate representation is converted to x86 machine code. During register allocation, the use frequency and the life cycle of each variable are considered so that the general-purpose registers and the special-purpose registers are allocated reasonably. For memory layout adjustment, the storage requirements of the C++ and Python code in memory, for example the layout of objects in C++ and the object reference mechanism in Python, are analyzed, and the layout is optimized on that basis to improve memory access efficiency. Meanwhile, the requirements of the exception handling mechanism of the target platform are followed, and handling code is added for cross-language exceptions that may occur, so that the program runs stably and reliably.
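A graph-based intermediate representation of this kind, together with the selection of inlining candidates, might be sketched as follows. The class IRGraph, the "size" attribute (an abstract measure of function length) and the thresholds min_calls and max_size are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IRGraph:
    nodes: dict = field(default_factory=dict)   # name -> {"kind", "language", "size"}
    edges: list = field(default_factory=list)   # (source, target, kind)

    def add_node(self, name, kind, language, size=0):
        self.nodes[name] = {"kind": kind, "language": language, "size": size}

    def add_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))

    def inline_candidates(self, min_calls=2, max_size=5):
        """Small functions called at least min_calls times across languages."""
        calls = Counter(
            dst
            for src, dst, kind in self.edges
            if kind == "call"
            and self.nodes[src]["language"] != self.nodes[dst]["language"]
        )
        return sorted(
            name
            for name, count in calls.items()
            if count >= min_calls and self.nodes[name]["size"] <= max_size
        )
```

Because language is stored on every node, the same traversal that finds cross-language calls could also be used to locate shared-data edges that cross a language boundary.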
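Register allocation driven by use frequency and life cycle can be illustrated with a simple priority scheme. This is a sketch under stated assumptions, not the allocator of the embodiment: each variable has one live interval given by hypothetical instruction indices start and end, priority is the number of uses per instruction of lifetime, and a variable that fits in no register is spilled to memory.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    uses: int    # how often the variable is read or written
    start: int   # first instruction index at which it is live
    end: int     # last instruction index at which it is live


def allocate(variables, registers):
    """Assign registers to the most heavily used variables first."""
    assignment = {}
    occupied = {reg: [] for reg in registers}   # register -> live intervals
    by_priority = sorted(
        variables, key=lambda v: (-v.uses / (v.end - v.start + 1), v.name)
    )
    for v in by_priority:
        for reg in registers:
            # A register is usable if the lifetime overlaps none already in it.
            if all(v.end < s or v.start > e for s, e in occupied[reg]):
                occupied[reg].append((v.start, v.end))
                assignment[v.name] = reg
                break
        else:
            assignment[v.name] = "spill"
    return assignment
```

With two registers, a short-lived and heavily used variable gets a register first, a variable whose lifetime starts after it ends can reuse the same register, and a rarely used variable that is live throughout is the one spilled.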
As shown in FIG. 2, the embodiment of the present application further provides a compiling device for multi-language mixed code, including:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the multi-language mixed code compiling device to perform:
scanning the multi-language mixed code file line by line to identify code segments corresponding to a plurality of languages and determining language identifiers corresponding to the code segments;
determining a corresponding analysis rule according to the language identification, and carrying out semantic analysis on the code segments according to the analysis rule so as to determine a semantic association graph among a plurality of code segments;
converting the plurality of code segments according to the semantic association graph to determine an intermediate form;
determining the running environment of the target platform, and converting the intermediate form, according to the running environment, into a code format corresponding to the target platform.
The embodiment of the application also provides a nonvolatile computer storage medium, which stores computer executable instructions, wherein the computer executable instructions are configured to:
scanning the multi-language mixed code file line by line to identify code segments corresponding to a plurality of languages and determining language identifiers corresponding to the code segments;
determining a corresponding analysis rule according to the language identification, and carrying out semantic analysis on the code segments according to the analysis rule so as to determine a semantic association graph among a plurality of code segments;
converting the plurality of code segments according to the semantic association graph to determine an intermediate form;
determining the running environment of the target platform, and converting the intermediate form, according to the running environment, into a code format corresponding to the target platform.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely performing a little logic programming of the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320 microcontrollers; a memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for achieving the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
The embodiments of the present application are described in a progressive manner; for parts that are the same or similar, the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments.
The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.