WO2022235265A1 - Debug channel for communication between a processor and an external debug host - Google Patents

Debug channel for communication between a processor and an external debug host

Info

Publication number
WO2022235265A1
Authority
WO
WIPO (PCT)
Prior art keywords
debug
instruction
processor
processing core
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2021/030960
Other languages
French (fr)
Inventor
Jian Wei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zeku Inc
Original Assignee
Zeku Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zeku Inc
Priority to PCT/US2021/030960
Publication of WO2022235265A1

Abstract

According to one aspect of the present disclosure, a system-on-a-chip (SoC) is disclosed. The SoC may include, for example, at least one processor comprising an instruction cache and at least one processing core and at least one debug unit configured to facilitate communication between an external debug host and the at least one processor via a debug channel. The debug channel may comprise a portion of an instruction cache miss channel. The at least one processing core may be configured to halt operations associated with regular mode signal processing based on a debug mode trigger. The at least one processing core may be further configured to obtain at least one debug instruction from the at least one debug unit via the instruction cache miss channel. The at least one processing core may be further configured to process the at least one debug instruction.

Description

DEBUG CHANNEL FOR COMMUNICATION BETWEEN A PROCESSOR
AND AN EXTERNAL DEBUG HOST
BACKGROUND
[0001] Embodiments of the present disclosure relate to processors and operations thereof.
[0002] Hardware debug circuitry may be useful in various scenarios to perform debugging of a processor and to identify defects in its operations. A processor core may be configured to halt the running of its regular mode hardware threads (also referred to as “harts”) and to switch to debug mode in response to certain trigger conditions (e.g., breakpoints, watchpoints, etc.). Debug mode is a special mode used only when the core halts harts for external debugging. While in debug mode, a debug module located on the same system-on-chip (SoC) (sometimes referred to as a “platform”) as the processor may read instructions from an instruction transfer register, which is controlled by an external debugger connected to the debug module via a debug port on the platform that includes, e.g., a debug transport module (DTM) and/or a debug module interface (DMI). Depending on the choice of instruction, the debugger may be able to view and modify the contents of the processor registers and view and modify the contents of memories connected to the processor. Thus, the debugger may effectively debug software programs and diagnose other defects such as those in the design of the processor itself.
SUMMARY
[0003] Embodiments of processors and operations thereof are disclosed herein.
[0004] According to one aspect of the present disclosure, an SoC is disclosed. The SoC may include, for example, at least one processor comprising an instruction cache and at least one processing core and at least one debug unit configured to facilitate communication between an external debug host and the at least one processor via a debug channel. The debug channel may comprise a portion of an instruction cache miss channel. The at least one processing core may be configured to halt operations associated with regular mode signal processing based on a debug mode trigger. The at least one processing core may be further configured to obtain at least one debug instruction from the at least one debug unit via the instruction cache miss channel. The at least one processing core may be further configured to process the at least one debug instruction.
[0005] According to another aspect of the present disclosure, a processor is disclosed. The processor may include an instruction cache and at least one processing core. The processing core may be coupled to the instruction cache and at least one debug unit of an associated SoC. The at least one processing core may be connected to the at least one debug unit via a debug channel that comprises a portion of an instruction cache miss channel. The at least one processing core may be configured to halt operations associated with regular mode signal processing based on a debug mode trigger. The at least one processing core may be configured to receive at least one debug instruction from the at least one debug unit via the instruction cache miss channel. The at least one processing core may be configured to process the at least one debug instruction.
[0006] According to another aspect of the present disclosure, a method of debugging with an SoC is disclosed. The method may include communicating, using at least one debug unit, with an external debug host. The method may also include halting, using at least one processor, operations associated with regular mode signal processing based on a debug mode trigger. The method may also include receiving, at a processing core of the processor, at least one debug instruction from the at least one debug unit. The at least one debug instruction may be received via an instruction cache miss channel. The instruction cache miss channel may be a portion of a debug channel between the at least one debug unit and the at least one processing core. The method may also include processing, using the processing core, the at least one debug instruction.
[0007] According to yet another aspect of the present disclosure, a method of debugging with a processor is disclosed. The method may include halting operations associated with regular mode signal processing based on a debug mode trigger. The method may include receiving at least one debug instruction from at least one debug unit via an instruction cache miss channel. The method may include processing the at least one debug instruction.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present disclosure and, together with the description, further serve to explain the principles of the present disclosure and to enable a person skilled in the pertinent art to make and use the present disclosure.
[0009] FIG. 1 illustrates a block diagram of an exemplary system having a system-on-a-chip (SoC), according to some embodiments of the present disclosure.
[0010] FIG. 2A illustrates a first detailed block diagram of an exemplary debug system, according to some embodiments of the present disclosure.
[0011] FIG. 2B illustrates a block diagram of an expanded view of the exemplary debug system of FIG. 2A, according to some embodiments of the present disclosure.
[0012] FIG. 3 illustrates a flow chart of an exemplary method for debugging, according to some embodiments of the present disclosure.
[0013] FIG. 4 illustrates a conventional debug system.
DETAILED DESCRIPTION
[0014] Although specific configurations and arrangements are discussed, it should be understood that this is done for illustrative purposes only. A person skilled in the pertinent art will recognize that other configurations and arrangements can be used without departing from the spirit and scope of the present disclosure. It will be apparent to a person skilled in the pertinent art that the present disclosure can also be employed in a variety of other applications.
[0015] It is noted that references in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” “some embodiments,” “certain embodiments,” etc., indicate that one or more embodiments described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of a person skilled in the pertinent art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
[0016] In general, terminology may be understood at least in part from usage in context. For example, the term “one or more” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the terms “based on,” “based upon,” and terms with similar meaning may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
[0017] Various aspects of the present disclosure will now be described with reference to various apparatus and methods. These apparatus and methods will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, modules, units, components, circuits, steps, operations, processes, algorithms, etc. (collectively referred to as “elements”). These elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software depends upon the particular application and design constraints imposed on the overall system.
[0018] In computer programming and software development, debugging is the process of finding and resolving defects or problems that prevent the correct operation of computer programs, software, or systems. Many programming languages and software development tools also offer programs to aid in debugging, known as debuggers. A debugger is a computer program that assists in the detection and correction of errors in other computer programs (also referred to as “target programs”). The main use of a debugger is to run the target program under controlled conditions that permit a programmer to track the target program’s operations while in progress and monitor changes in computer resources, such as memory areas or registers used by the target program or the computer’s operating system, that may indicate malfunctioning instructions being run by a processor core. Typical debugging facilities include the ability to run or halt the target program at specific points, display the contents of memory, central processing unit (CPU) registers, or storage devices (e.g., disk drives), and modify memory or register contents to enter selected test data that might be a cause of faulty program execution.
[0019] One example debugger is GDB, which is the GNU Project debugger. GDB enables a user to see “inside” a target program being executed on another device. For example, GDB may enable a user to see which lines of code and/or instructions are running on a core of a separate device or enable a user to identify what a target program was doing at the moment it crashed. More specifically, GDB may perform the following functions to enable a user to identify a bug associated with a target program in real-time: 1) enable a user to run the target program and specify trigger conditions (e.g., breakpoints) that may affect the behavior of the target program; 2) halt the target program when a trigger condition is met; 3) enable a user to examine what happened in the lines of code when the target program is halted; and 4) augment portions of the target program, such that the user may experiment with correcting the effects of one bug and go on to learn about another. A conventional debugging system is described below in connection with FIG. 4.
[0020] FIG. 4 illustrates a block diagram of an example debug system 400 that uses an instruction set architecture (ISA). The ISA used by debug system 400 may include, e.g., a reduced instruction set computer (RISC)-V architecture, a complex instruction set computer (CISC) architecture, a very long instruction word (VLIW) architecture, a long instruction word (LIW) architecture, an explicitly parallel instruction computing (EPIC) architecture, a minimal instruction set computer (MISC) architecture, or a one instruction set computer (OISC) architecture, just to name a few.
[0021] As seen in FIG. 4, debug system 400 may include, e.g., an external debug host 410 (e.g., laptop computer, tablet, personal computer (PC), smartphone, etc.) and a platform 461, e.g., such as a multi-core processor on a system-on-chip (SoC). Debug host 410 may include, among other things, a debugger 412 and a debug translator 414. Debugger 412 may include, e.g., a GNU debugger (also referred to as “GDB”), which is a portable debugger that runs on “Unix-like” systems and can be implemented for many programming languages, e.g., including Ada, C, C++, Objective-C, Free Pascal, Fortran, and/or Go, just to name a few. Debugger 412 may enable a user to access information related to instructions and/or functions processed/performed by hardware thread(s) 452 (also referred to as “harts”), which may be running on one or more cores 450 of platform 461. Aside from the aforementioned core(s) 450 and their associated hardware thread(s) 452, platform 461 may also include, e.g., one or more of at least one debug transport module (DTM) 432, a debug module interface (DMI) 434, at least one debug module 440, a system bus 460, and/or a program buffer 470. Platform 461 may be located on a device (e.g., laptop computer, tablet, personal computer (PC), smartphone, etc.) separate from debug host 410 and communicate with debug host 410 using a debug port, e.g., such as DTM 432.
[0022] Debug module 440 may act as a slave to a bus, e.g., DMI 434, where the master of the bus is the DTM 432. DMI 434 can be a trivial bus with one master and one slave, or use a more full-featured bus like TileLink or the advanced microcontroller bus architecture (AMBA) Advanced Peripheral Bus. DMI 434 may use a certain number of address bits, e.g., between 7 and 32 address bits. DMI 434 may support certain read and write operations. The bottom of the address space may be used for debug module 440. Debug module 440 may be controlled by debugger 412 via register accesses to its DMI address space. Debug module 440 may implement multiple serial ports, e.g., up to 8 serial ports. The serial ports of debug module 440 may support basic flow control and full duplex data transfer between a component and debugger 412, essentially allowing DTM 432 to communicate with debugger 412. DTM 432 may be based around a normal Joint Test Action Group (JTAG) Test Access Port (TAP). The JTAG TAP allows access to arbitrary JTAG registers by first selecting one of the JTAG registers using the JTAG instruction register (IR), and then accessing it through the JTAG data register (DR).
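The two-step JTAG access described above (select a register through the IR, then access it through the DR) can be illustrated with a toy model. This is a minimal sketch, not a description of any real TAP controller: the class `JtagTap`, its method names, and the IR codes used in the usage note are hypothetical, and the bit-serial shifting and TAP state machine of real JTAG are deliberately omitted.

```python
class JtagTap:
    """Toy model of a JTAG TAP: the IR selects a data register, the DR accesses it.

    Hypothetical illustration only; real TAPs shift bits serially through a
    state machine, which is not modeled here.
    """

    def __init__(self, registers):
        # registers: mapping of IR code -> current contents of that data register
        self.registers = dict(registers)
        self.ir = None  # no data register selected yet

    def write_ir(self, ir_code):
        """Select a data register by loading its code into the instruction register."""
        if ir_code not in self.registers:
            raise KeyError(f"no data register for IR code {ir_code:#x}")
        self.ir = ir_code

    def scan_dr(self, new_value):
        """Shift a new value into the selected data register.

        Returns the value that was shifted out, i.e. the previous contents.
        """
        if self.ir is None:
            raise RuntimeError("select a register through the IR first")
        old_value = self.registers[self.ir]
        self.registers[self.ir] = new_value
        return old_value
```

As a usage example, `tap = JtagTap({0x01: 0xDEADBEEF, 0x11: 0})` followed by `tap.write_ir(0x11)` and `tap.scan_dr(0x1234)` selects the second register and replaces its contents while returning the old value; the codes `0x01` and `0x11` are placeholders chosen for illustration.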
[0023] Each core 450 on platform 461 may run one or more hardware thread(s) 452 (also referred to as “hart(s)”), which may include a set of functions and/or instructions. Moreover, each hart may have an associated debug mode trigger module 456 that may identify when a debug trigger condition (e.g., a breakpoint) is met for that hart. A user may set the breakpoints used by debug mode trigger module 456 to identify trigger conditions using debugger 412. Examples of breakpoints include, e.g., “line-of-code” (e.g., an exact region of code), “conditional line-of-code” (e.g., an exact region of code, but only when some other condition is true), “document object model (DOM)” (e.g., code that changes or removes a specific DOM node or its children), “XMLHttpRequest (XHR)” (e.g., when an XHR uniform resource locator (URL) contains a string pattern), “event listener” (e.g., code that runs after an event), “exception” (e.g., line of code that is throwing a caught or uncaught exception), and/or “function” (e.g., whenever a specific function is called).
[0024] Each time a breakpoint is reached, debug mode trigger module 456 switches core 450 to debug mode and halts the associated hart. Moreover, debug mode trigger module 456 may send a signal indicating the breakpoint and the halted hart to debug module 440, which may in turn send a signal to debugger 412 via DMI 434 and DTM 432. Debugger 412 may then access the hart with the associated breakpoint via debug module 440 to determine any of the following: 1) what statement or expression did the program crash on, if a core dump occurred; 2) if an error occurred while executing a function, what line of the program included the call to that function, and what are the parameters; 3) what are the values of program variables at a particular point during the execution of the program; and/or 4) what is the result of a particular expression in a program, just to name a few.
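The halt-on-breakpoint behavior of paragraph [0024] can be sketched as a small simulation. The names `Hart`, `DebugTriggerModule`, and `run` are hypothetical, a "program" is reduced to a list of instruction strings, and a breakpoint is reduced to a program-counter value; real trigger hardware compares addresses and data in parallel with execution.

```python
class Hart:
    """Hypothetical hardware thread: a program, a program counter, a halted flag."""

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.halted = False


class DebugTriggerModule:
    """Watches a hart and halts it when the program counter hits a breakpoint."""

    def __init__(self, breakpoints):
        self.breakpoints = set(breakpoints)
        self.reported = []  # breakpoints reported toward the debugger

    def check(self, hart):
        """Halt the hart and record the event if its pc matches a breakpoint."""
        if hart.pc in self.breakpoints:
            hart.halted = True
            self.reported.append(hart.pc)
            return True
        return False


def run(hart, trigger):
    """Execute instructions until the program ends or the trigger halts the hart."""
    executed = []
    while hart.pc < len(hart.program) and not hart.halted:
        if trigger.check(hart):
            break  # the instruction at the breakpoint is not executed
        executed.append(hart.program[hart.pc])
        hart.pc += 1
    return executed
```

With a four-instruction program and a breakpoint at index 2, `run` executes the first two instructions and leaves the hart halted with its program counter at the breakpoint, which is the state a debugger would then inspect.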
[0025] For example, to begin, debugger 412 may send debug instructions to debug translator 414 (also referred to as an “open on-chip debugger (openOCD)”). OpenOCD may provide on-chip programming and debugging support with a layered architecture of JTAG TAP support. More specifically, openOCD may be configured to support any of the following functions: 1) Xilinx Serial Vector Format ((X)SVF) playback to facilitate automated boundary scan and field-programmable gate array (FPGA)/Complex Programmable Logic Device (CPLD) programming; 2) debug target program support (e.g., Advanced RISC Machine (ARM), microprocessor without interlocked pipelined stages (MIPS), etc.) such as single-stepping, breakpoints, Gprof profiling, etc.; 3) flash chip drivers (e.g., common flash memory interface (CFI), NOT-AND (NAND), internal flash); and/or 4) embedded tool command language (Tcl) interpreting for scripting, just to name a few. When debugger 412 includes GDB, debugger 412 may enable openOCD to function as a “remote target program” for source-level debugging of embedded systems at platform 461 using the GNU GDB program or other similar programs. Debug translator 414 may communicate with debug transport hardware 420, e.g., such as a JTAG adapter, which may be configured to connect with platform 461 via DTM 432 and/or DMI 434.
[0026] Functionally, DTM 432, DMI 434, and debug module(s) 440 provide a channel through which debugger 412 gains access to the harts running on any core 450 in debug mode or, under certain conditions, regular mode. For example, debug module(s) 440 may communicate with core(s) 450 using reset/halt control module 442, abstract commands module 444, and bus access module 446. Debug module 440 may enable the debugger 412 to halt any hart running on any core 450 by sending reset/halt control signaling and abstract commands to cores 450. Debugger 412 may execute abstract commands by sending them to debug module 440, which implements a translation interface between abstract debug operations sent from debugger 412 and their specific implementation on a hart, e.g., using abstract commands module 444. Debug module 440 may support the following operations: 1) providing debugger 412 information related to the implementation of platform 461 and/or core(s) 450; 2) enabling any individual hart to be halted and resumed; 3) providing status on which harts are halted to debugger 412; 4) providing debugger 412 read and write access to a halted hart’s general-purpose registers (GPRs); 5) enabling debugger 412 to access system memory; and 6) providing an interface between debugger 412 and the core(s) 450 that forces the hart to execute arbitrary instructions sent from debugger 412. Debug module 440 may also access system bus 460 and program buffer 470. In certain implementations, abstract commands module 444 of debug module 440 may provide debugger 412 with access to GPRs.
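A few of the debug module operations listed above (halting and resuming a hart, reporting which harts are halted, and GPR access on a halted hart) can be sketched as follows. `HartState` and `DebugModule` are hypothetical names, the register count of 32 is an assumption made only for the example, and the sketch ignores the DMI register interface through which a real debugger would issue these operations.

```python
class HartState:
    """Hypothetical per-hart state visible to the debug module."""

    def __init__(self, hart_id, num_gprs=32):  # 32 registers is an assumption
        self.hart_id = hart_id
        self.halted = False
        self.gprs = [0] * num_gprs


class DebugModule:
    """Toy debug module: halt/resume control, halt status, and GPR access."""

    def __init__(self, harts):
        self.harts = {hart.hart_id: hart for hart in harts}

    def halt(self, hart_id):
        self.harts[hart_id].halted = True

    def resume(self, hart_id):
        self.harts[hart_id].halted = False

    def halted_harts(self):
        """Report which harts are currently halted, in ascending id order."""
        return sorted(h.hart_id for h in self.harts.values() if h.halted)

    def _require_halted(self, hart_id):
        # Register access is only allowed while the hart is halted.
        hart = self.harts[hart_id]
        if not hart.halted:
            raise RuntimeError(f"hart {hart_id} must be halted first")
        return hart

    def read_gpr(self, hart_id, index):
        return self._require_halted(hart_id).gprs[index]

    def write_gpr(self, hart_id, index, value):
        self._require_halted(hart_id).gprs[index] = value
```

The design choice worth noting is that register access is refused for a running hart, mirroring the text's restriction of GPR access to halted harts.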
[0027] Program buffer 470 may allow the debugger 412 to execute arbitrary instructions on the hart, which may allow access to additional hart states, and/or access memory. In certain implementations, core(s) 450 may implement control units, which may be configured to convert abstract commands to other signaling (e.g., instructions, direct memory access (DMA) commands, etc.). In certain other implementations, debug module(s) 440 may generate instructions for the program buffer 470 based on abstract commands. Bus access module 446 may provide debug module(s) 440 with direct access to a memory that is external to core 450.
[0028] While the debug module 440 of conventional debug system 400 provides a certain amount of utility during debugging, such as those operations mentioned above, debug module 440 occupies an undesirable amount of space on platform 461 (e.g., a system-on-chip (SoC)). Moreover, debug module 440 is expensive to manufacture due to, among other things, its large size. Still further, debug module 440 impedes the regular functionality of the micro-controller of core 450, which may reduce the performance of the device in which platform 461 is located by introducing significant timing delays in timing-critical paths performed during the regular mode.
[0029] Thus, there is an unmet need for a channel through which debugger 412 can implement debugging operations at core 450 that has a smaller silicon footprint and limits the disruption to regular mode operations performed by the micro-controller of core 450 as compared with using debug module 440.
[0030] To overcome the challenges mentioned above in connection with the sizeable and costly debug module of conventional ISAs, the present disclosure provides a debug channel from the DTM/DMI to the core using the instruction cache miss channel such that any debug instructions are run by the core itself. When the core switches to debug mode, an instruction cache miss signal may be input to the processor by the DTM/DMI such that the processor stalls operations as if a regular mode instruction was missed and it waits for an instruction from an external memory via the instruction cache miss channel. Rather than an instruction being input from external memory, a debug instruction is input via the debug channel to the core. Then, the core, rather than a debug module, runs the debug instruction, thereby enabling information associated with the debug instruction to be displayed to the user on the device that includes the core and without disrupting regular mode operations that are time sensitive. In other words, the debug module of FIG. 4 may be omitted entirely in the platform described herein, thereby reducing cost and increasing the available silicon area on the SoC.
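The mechanism of paragraph [0030] can be sketched as a behavioral simulation: in regular mode an instruction cache miss is filled from external memory, while in debug mode the same miss path stalls the core until the debug unit supplies a debug instruction, which the core then runs itself. All names here (`DebugUnit`, `Core`, `miss_channel_fill`, `step`) are hypothetical, instructions are plain strings, and "running" an instruction is reduced to logging it; this is an illustration of the idea under those assumptions, not the disclosed hardware.

```python
from collections import deque


class DebugUnit:
    """Hypothetical stand-in for the DTM/DMI end of the debug channel."""

    def __init__(self):
        self.pending = deque()  # debug instructions sent by the external host

    def push(self, instruction):
        self.pending.append(instruction)

    def pop(self):
        """Return the next debug instruction, or None if none is waiting."""
        return self.pending.popleft() if self.pending else None


class Core:
    """Toy core whose instruction cache miss channel doubles as a debug channel."""

    def __init__(self, memory, debug_unit):
        self.memory = memory  # address -> instruction, models external memory
        self.icache = {}
        self.debug_unit = debug_unit
        self.debug_mode = False
        self.pc = 0
        self.executed = []

    def miss_channel_fill(self):
        """Deliver one instruction over the miss channel, or None to stall."""
        if self.debug_mode:
            return self.debug_unit.pop()  # debug instruction instead of memory
        return self.memory[self.pc]

    def step(self):
        """Fetch and run one instruction; return None if the core stalls."""
        if not self.debug_mode and self.pc in self.icache:
            instruction = self.icache[self.pc]  # regular-mode cache hit
        else:
            instruction = self.miss_channel_fill()
            if instruction is None:
                return None  # stalled, waiting on the miss channel
            if not self.debug_mode:
                self.icache[self.pc] = instruction  # only regular fills are cached
        self.executed.append(instruction)
        if not self.debug_mode:
            self.pc += 1  # debug instructions leave the regular pc untouched
        return instruction
```

In this sketch the debug instruction never enters the instruction cache and never advances the program counter, so regular mode resumes from where it stopped once `debug_mode` is cleared. Whether the actual design behaves this way is not stated in the text and is an assumption of the example.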
[0031] FIG. 1 illustrates a block diagram of an exemplary system 100 having an SoC 102, according to some embodiments of the present disclosure. System 100 may include SoC 102 having a processor 108 and a primary memory 110, a bus 104, and a secondary memory 106. System 100 may be applied or integrated into various systems and apparatus capable of high-speed data processing, such as computers and wireless communication devices. For example, system 100 may be part of a mobile phone, a desktop computer, a laptop computer, a tablet, a vehicle computer, a gaming console, a printer, a positioning device, a wearable electronic device, a smart sensor, a virtual reality (VR) device, an augmented reality (AR) device, or any other suitable electronic devices having high-speed data processing capability. Using a wireless communication device as an example, SoC 102 may serve as an application processor (AP) and/or a baseband processor (BP) that imports data and instructions from secondary memory 106, executes instructions to perform various mathematical and logical calculations on the data, and exports the calculation results for further processing and transmission over cellular networks.
[0032] As shown in FIG. 1, secondary memory 106 may be located outside SoC 102 and operatively coupled to SoC 102 through bus 104. Secondary memory 106 may receive and store data of different types from various sources via communication channels (e.g., bus 104). For example, secondary memory 106 may receive and store digital imaging data captured by a camera of the wireless communication device, voice data transmitted via cellular networks, such as a phone call from another user, or text data input by the user of the system through an interactive input device, such as a touch panel, a keyboard, or the like. Secondary memory 106 may also receive and store computer instructions to be loaded to processor 108 for data processing. Such instructions may be in the form of an instruction set, which contains discrete instructions that teach the microprocessor or other functional components of the microcontroller chip to perform one or more of the following types of operations — data handling and memory operations, arithmetic and logic operations, control flow operations, coprocessor operations, etc. Secondary memory 106 may be provided as a standalone component in or attached to the apparatus, such as a hard drive, a Flash drive, a solid-state drive (SSD), or the like. Other types of memory compatible with the present disclosure may also be conceived. It is understood that secondary memory 106 may not be the only component capable of storing data and instructions. Primary memory 110 may also store data and instructions and, unlike secondary memory 106, may have direct access to processor 108. Secondary memory 106 may be a non-volatile memory, which can keep the stored data even though power is lost. In contrast, primary memory 110 may be volatile memory, and the data may be lost once the power is lost. Because of this difference in structure and design, each type of memory may have its own dedicated use within the system.
[0033] Data between secondary memory 106 and SoC 102 may be transmitted via bus 104. Bus 104 functions as a highway that allows data to move between various nodes, e.g., memory, microprocessor, transceiver, user interface, or other sub-components in system 100, according to some embodiments. Bus 104 can be serial or parallel. Bus 104 can also be implemented by hardware (such as electrical wires, optical fiber, etc.). It is understood that bus 104 can have sufficient bandwidth for storing and loading a large amount of data (e.g., vectors) between secondary memory 106 and primary memory 110 without delay to the data processing by processor 108.
[0034] SoC designs may integrate one or more components for computation and processing on an integrated-circuit (IC) substrate. Such SoC designs may also be referred to as “platforms.” For applications where chip size matters, such as smartphones and wearable gadgets, SoC design is an ideal design choice because of its compact area. It further has the advantage of small power consumption. In some embodiments, as shown in FIG. 1, one or more processors 108 and primary memory 110 are integrated into SoC 102. It is understood that in some examples, primary memory 110 and processor 108 may not be integrated on the same chip, but instead on separate chips.
[0035] Processor 108 may include any suitable specialized processor including, but not limited to, CPU, graphics processing unit (GPU), digital signal processor (DSP), tensor processing unit (TPU), vision processing unit (VPU), neural processing unit (NPU), synergistic processing unit (SPU), physics processing unit (PPU), and image signal processor (ISP). Processor 108 may also include a microcontroller unit (MCU), which can handle a specific operation in an embedded system. In some embodiments in which system 100 is used in wireless communications, each MCU handles a specific operation of a mobile device, for example, communications other than cellular communication (e.g., Bluetooth communication, Wi-Fi communication, frequency modulation (FM) radio, etc.), power management, display drive, positioning and navigation, touch screen, camera, etc. Processor 108 may operate in regular mode or debug mode. In some embodiments, the MCU may stall regular mode operations of processor 108 while in debug mode.
[0036] As shown in FIG. 1, processor 108 may include one or more processing cores 112 (a.k.a. “cores”), a register array 114, and a control module 116. As used herein, an SoC 102 with processor 108 with multiple processing cores 112 may be referred to as a “platform.” In some embodiments, processing core 112 may include one or more functional units that perform various data operations. For example, processing core 112 may include an arithmetic logic unit (ALU) that performs arithmetic and bitwise operations on data (also known as “operands”), such as addition, subtraction, increment, decrement, AND, OR, Exclusive-OR, etc. Processing core 112 may also include a floating-point unit (FPU) that performs similar arithmetic operations but on a type of operands (e.g., floating-point numbers) different from those operated on by the ALU (e.g., binary numbers). The operations may be addition, subtraction, multiplication, etc. Another way of categorizing the functional units may be based on whether the data processed by the functional unit is a scalar or a vector. For example, processing cores 112 may include scalar function units (SFUs) for handling scalar operations and vector function units (VFUs) for handling vector operations. It is understood that in case that processor 108 includes multiple processing cores 112, each processing core 112 may carry out regular mode data and instruction operations in serial or in parallel. This multi-core processor design can effectively enhance the processing speed of processor 108 and multiply its performance. In some embodiments, processor 108 may be a CPU with a vector co-processor (a vector engine) that can handle both scalar operations and vector operations. Each processing core 112 may include a debug mode trigger module 156 configured to identify breakpoints and/or watchpoints. When a breakpoint and/or watchpoint occurs, debug mode trigger module 156 may send a signal to the MCU that switches the processor 108 and/or one or more processing core(s) 112 to debug mode and halts regular mode operations so that debugging may occur using an external debug host, as described in additional detail below in connection with FIGs. 2A and 2B.
[0037] Register array 114 (also referred to as an “instruction cache”) may be operatively coupled to processing core 112 and primary memory 110 and include multiple sets of registers for various purposes. Because of its architecture design and proximity to processing core 112, register array 114 allows processor 108 to access data, execute instructions, and transfer computation results faster than primary memory 110, according to some embodiments. In some embodiments, register array 114 includes a plurality of physical registers fabricated on SoC 102, such as fast static random-access memory (RAM) having multiple transistors and multiple dedicated read and write ports for high-speed processing and simultaneous read and/or write operations, thus distinguishing it from primary memory 110 and secondary memory 106 (such as a dynamic random-access memory (DRAM), a hard drive, or the like). The register size may be measured by the number of bits a register can hold (e.g., 4 bits, 8 bits, 16 bits, 32 bits, 64 bits, 128 bits, 256 bits, 512 bits, etc.). In some embodiments, register array 114 serves as an intermediary memory placed between primary memory 110 and processing core 112. For example, register array 114 may hold frequently used programs or processing tools so that access time to these data can be reduced, thus increasing the processing speed of processor 108 while also reducing power consumption of SoC 102. In another example, register array 114 may store data being operated on by processing core 112, thus reducing delay in accessing the data from primary memory 110. Registers of this type are known as data registers. Another type is address registers, which may hold addresses and may be used by instructions for indirect access of primary memory 110. There are also status registers that decide whether a certain instruction should be executed, such as the control and status register (CSR). In some embodiments, at least part of register array 114 is implemented by one or more physical register files (PRFs) within processor 108.
[0038] Control module 116 may be operatively coupled to primary memory 110 and processing core 112. Control module 116 may be implemented by circuits fabricated on the same semiconductor chip as processing core 112. Control module 116 may serve a role similar to a command tower. For example, control module 116 may retrieve and decode various computer instructions from primary memory 110 for processing core 112 and instruct processing core 112 which processes to carry out on operands loaded from primary memory 110. Computer instructions (also referred to herein as “hardware threads” and “harts”) may be in the form of a computer instruction set. Different computer instructions may have a different impact on the performance of processor 108. For example, instructions from a RISC are generally simpler than those from a CISC and thus may be used to achieve fewer cycles per instruction, therefore reducing the processing time by processor 108. Examples of processes carried out by processor 108 include setting a register to a fixed value, copying data from a memory location to a register, copying data between registers, adding, subtracting, multiplying, and dividing, comparing values stored on two different registers, etc. In some embodiments, control module 116 may further include an instruction decoder (not shown) that decodes the computer instructions into instructions readable by other components on processor 108, such as processing core 112. The decoded instructions may be subsequently provided to processing core 112.
[0039] It is understood that additional components, although not shown in FIG. 1, may be included in SoC 102 as well, such as components that interface with an external debug host and form a debug channel to processor 108 and/or processing core(s) 112, as described below in detail with respect to FIGs. 2A and 2B.
[0040] FIGs. 2A and 2B illustrate an embodiment of an exemplary debug system 200 organized according to a particular microarchitecture, according to some embodiments of the disclosure. In some embodiments, debug system 200 is configured to implement the RISC-V instruction set architecture (ISA), although other embodiments may implement other suitable ISAs. For example, debug system 200 may be additionally and/or alternatively configured to implement one or more of, e.g., CISC architecture, VLIW architecture, LIW architecture, EPIC architecture, MISC architecture, or OISC architecture. As illustrated in FIGs. 2A and 2B, debug system 200 may include, e.g., SoC 102, processor 108, processing core(s) 112, debug transport circuit 220, and debug host 210. As shown in FIG. 2A, SoC 102 may also include DTM 232 and DMI 234 that form a debug channel 290 to processing core 112 without a debug module. Rather than use a debug module to run debug instructions to access registers or memory associated with processing core 112, processing core 112 runs the debug instructions directly, thereby reducing the footprint of debug components on SoC 102.
[0041] Each processing core 112 may include circuits configured to perform various aspects of instruction execution. In particular, processing core 112 includes a fetch circuit 211 coupled to an aligner circuit 222, which is in turn coupled to a decoder circuit 213. Decoder circuit 213 is coupled to a number of instruction execution circuits, including first and second integer execution circuits, respectively denoted IEX0 224 and IEX1 215, along with load/store circuit 216, multiplier circuit 217, and divider circuit 218. Additionally, processor 108 includes a memory processing unit 221, an instruction cache 230, and a bus interface unit 250 that connects processor 108 to DTM 232, DMI 234, and secondary memory 106.
[0042] While operating in regular mode, processing core 112 may be configured to fetch instructions and necessary data, execute instructions, and write results either locally (e.g., to a register file) or into a memory subsystem. In particular, fetch circuit 211 may be configured to initiate this process by retrieving instructions for execution. In various embodiments, fetch circuit 211 may be configured to implement program counter logic and branch prediction circuitry in order to track the flow of program execution and attempt to predict the outcome of conditional branches in order to speculatively fetch branch targets. For example, fetch circuit 211 may implement a “gshare”-style branch predictor in which a table of branch direction predictors is used in combination with a branch target buffer (i.e., a cache of branch target addresses) along with the current program counter and an indicator of global branch history to generate a predicted address from which to fetch instructions.
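By way of illustration only, the gshare-style prediction described above can be sketched in software. The sketch below is not the disclosed circuit: the class name `GsharePredictor`, the table size, the use of 2-bit saturating counters, and the dictionary standing in for the branch target buffer are all assumptions made for the example.

```python
class GsharePredictor:
    """Toy gshare predictor (illustrative assumption, not the disclosed design).

    A table of 2-bit saturating counters is indexed by the program counter
    XORed with a global branch history register.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # 1 = weakly not-taken
        self.history = 0                         # global branch history bits
        self.btb = {}                            # branch PC -> last taken target

    def _index(self, pc):
        # Drop the lowest PC bit (instructions are at least 2-byte aligned),
        # then mix in the global history.
        return ((pc >> 1) ^ self.history) & self.mask

    def predict(self, pc, fallthrough):
        """Return the predicted next fetch address for the branch at pc."""
        taken = self.counters[self._index(pc)] >= 2
        target = self.btb.get(pc)
        if taken and target is not None:
            return target
        return fallthrough

    def update(self, pc, taken, target):
        """Train the predictor with the resolved branch outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

In this sketch, a branch that is repeatedly taken eventually saturates both the history register and the counter it selects, after which `predict` returns the cached target instead of the fall-through address.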
[0043] The fetch address generated by fetch circuit 211 may be directed to instruction cache 230. In some embodiments, instruction cache 230 may be implemented as a pipelined, banked, set-associative cache that is accessed by performing an index lookup and a tag comparison to verify that the fetch address is in fact present in instruction cache 230. In the event of a cache miss, instruction cache 230 may send a miss signal to processing core 112 to stall operations until the missing instruction can be retrieved. Then, processing core 112 may send the fetch address to bus interface unit 250 so that the missing instruction can be retrieved from secondary memory 106 via instruction cache miss channel 291 that connects to processing core 112.
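The index lookup and tag comparison described above can be illustrated with a minimal software model. This is a sketch under stated assumptions: the names `SetAssociativeICache` and `fetch`, the geometry parameters, and the least-recently-used replacement policy are hypothetical, and pipelining and banking are not modeled.

```python
class SetAssociativeICache:
    """Toy set-associative cache that tracks tags only (illustrative)."""

    def __init__(self, num_sets=64, ways=4, line_bytes=32):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def _split(self, addr):
        """Split a fetch address into (set index, tag)."""
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def lookup(self, addr):
        """Index lookup plus tag comparison; True on a hit."""
        index, tag = self._split(addr)
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
            tags.append(tag)  # mark as most recently used
            return True
        return False

    def fill(self, addr):
        """Install the line containing addr, evicting the LRU way if full."""
        index, tag = self._split(addr)
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
        elif len(tags) == self.ways:
            tags.pop(0)  # assumed LRU replacement
        tags.append(tag)


def fetch(cache, addr):
    """Return "hit" or "miss"; a miss stands in for a miss-channel refill."""
    if cache.lookup(addr):
        return "hit"
    cache.fill(addr)
    return "miss"
```

In this model, a "miss" result corresponds to the point at which the core would stall and the fetch address would be forwarded over the instruction cache miss channel.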
[0044] In some ISAs, instructions may have variable lengths. For example, the RISC-V ISA defines a set of 32-bit instructions as well as 16-bit “compressed” variants of a subset of the 32-bit instructions. Accordingly, in some embodiments, aligner circuit 222 may be configured to identify instruction boundaries within the fetch stream and extract the corresponding instructions for further processing. For example, aligner circuit 222 may be configured to identify RISC-V 16-bit compressed instructions and convert them to their uncompressed 32-bit variants for downstream processing, which may simplify later processing relative to preserving the compressed instructions in their native format. Decoder circuit 213 may be configured to receive fetched instructions from aligner circuit 222 and decode them to determine how further processing should proceed within processing core 112. For example, decoder circuit 213 may examine the operand fields of instructions to determine instruction dependencies that may dictate when an instruction is ready to execute. If an instruction requires a result that is not yet available, decoder circuit 213 may delay its execution (and possibly the execution of upstream instructions) until its dependencies are satisfied. In some embodiments, decoder circuit 213 may attempt to group multiple instructions for concurrent execution. To reduce the complexity of this task, some embodiments of decoder circuit 213 may limit the number of instructions issued for concurrent execution. For example, although processing core 112 includes multiple execution units that could, in theory, operate concurrently, these execution units may be grouped such that only two instructions are issued per cycle by decoder circuit 213. In other embodiments, however, such limitations may not apply.
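As a non-limiting illustration of how instruction boundaries may be identified in a mixed 16-bit/32-bit RISC-V fetch stream, the sketch below relies on the RISC-V encoding rule that an instruction whose two lowest bits are not both set is a 16-bit compressed instruction. The function name `split_instructions` is hypothetical, the sketch assumes only 16-bit and 32-bit encodings are present, and it does not perform the expansion of compressed instructions to their 32-bit variants.

```python
import struct


def split_instructions(code: bytes):
    """Split a little-endian RISC-V byte stream into instructions.

    Returns a list of (offset, length_in_bytes, raw_value) tuples.
    Assumes only 16-bit and 32-bit encodings occur in the stream.
    """
    out = []
    offset = 0
    while offset + 2 <= len(code):
        (low,) = struct.unpack_from("<H", code, offset)
        if low & 0b11 != 0b11:
            # Lowest two bits are not 11: 16-bit compressed instruction.
            out.append((offset, 2, low))
            offset += 2
        else:
            if offset + 4 > len(code):
                break  # truncated 32-bit instruction at the end of the stream
            (word,) = struct.unpack_from("<I", code, offset)
            out.append((offset, 4, word))
            offset += 4
    return out
```

For example, a stream containing a compressed no-op (0x0001), a 32-bit no-op (0x00000013), and a compressed breakpoint (0x9002) splits into three entries with lengths 2, 4, and 2 bytes.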
[0045] In some embodiments, decoder circuit 213 may implement additional operations. For example, decoder circuit 213 may detect synchronization attributes of particular instructions (e.g., instructions that may have special execution timing requirements relative to other instructions in order to ensure correct execution) and appropriately stall or freeze the execution pipeline in order to enforce those attributes. In some instances, decoder circuit 213 may also include a register file configured to implement the architected registers defined by the ISA and/or control/status registers defined by the ISA or the particular processor implementation, although these features may alternatively be implemented elsewhere within processing core 112.
[0046] Once processed by decoder circuit 213, instructions may then be issued to the appropriate execution circuit for execution. In the illustrated embodiment, processing core 112 includes two integer execution circuits IEX0 224 and IEX1 215, each of which may implement circuitry for executing arithmetic, logical, and shift instructions defined by the ISA. In the illustrated embodiment, IEX0 224 and IEX1 215 are each configured to implement two arithmetic/logic units (ALUs), for a total of four ALUs. In addition to the integer execution circuits, load/store circuit 216 may be configured to execute load and store instructions defined by the ISA.
[0047] Multiplier circuit 217 may be configured to implement integer multiplication instructions defined by the ISA. Divider circuit 218 may be configured to implement integer division instructions defined by the ISA. While multiplier circuit 217 may be pipelined, integer division is typically a complex, long-latency operation. Accordingly, in the illustrated embodiment, divider circuit 218 is implemented as a non-pipelined circuit and instructions dependent on the results of an integer division instruction will stall until the division is complete. It is noted that while floating-point arithmetic is not explicitly discussed above, embodiments of processing core 112 may include execution circuits that support such operations.
[0048] As shown in FIG. 2A, processor 108 may include a memory processing unit (MPU) 221 interposed between processing core 112 and other elements of the memory hierarchy, such as instruction cache 230 and bus interface unit 250. In some embodiments, MPU 221 may include circuitry that supports the load/store pipeline, such as buffers and queues. For example, once load/store circuit 216 computes a memory address (or, in some cases, once fetch circuit 211 computes a fetch address), in some embodiments, a memory access may be enqueued within MPU 221 while awaiting downstream processing. Similarly, MPU 221 may implement a store buffer that is configured to hold post-commit store instructions (e.g., store instructions that have been completed and are intended to modify programmer-visible state) until they can be written to the memory subsystem via bus interface unit 250. It is noted that in other embodiments, some or all of the features of MPU 221 may be implemented elsewhere within processor 108, such as within load/store circuit 216, for example. Additionally, in some embodiments, MPU 221 may implement protection features that, for example, enforce a privilege model or otherwise restrict access to defined addresses or regions of the memory address space, which may improve the stability and security of code execution. In embodiments of processor 108 that support virtual memory addressing, MPU 221 may additionally include circuitry related to address translation, such as translation lookaside buffers (TLBs).
[0049] Bus interface unit (BIU) 250 may be configured to interface the processor 108 with other devices, such as secondary memory 106, input/output devices (for example, DMI 234/DTM 232), or other peripherals. External devices may either be on the same chip as SoC 102 or off-chip. In some embodiments, BIU 250 may interface with external devices according to a version of the AMBA standard, such as the Advanced High-Performance Bus (AHB) bus protocol introduced in the AMBA 2 specification. Any other suitable bus architecture or protocol may be employed, however. BIU 250 may include circuits such as load and store queues configured to store pending load and store instructions, as well as state machines or other circuits configured to implement the appropriate bus transaction logic.
[0050] When debug mode is triggered by debug mode trigger module 156, instruction cache 230 may send a miss signal to processing core 112. Moreover, DTM 232 may act as an interface between processor 108 and debug host 210 and enable debug instructions to be delivered to and processed by processing core 112, which is expecting a missing instruction to be received via instruction cache miss channel 291. Processing core 112 may process the debug instructions received via a debug channel 290 that overlaps in part with the instruction cache miss channel 291 as it would a regular instruction that is received via the instruction cache miss channel 291. As seen in FIG. 2A, SoC 102 does not include a debug module as in FIG. 4. This is because processing core 112 may perform the same or similar functions as the debug module in FIG. 4. For example, when processing core 112 enters debug mode, a signal may be sent from the debug mode trigger module 156 to debugger 212 via debug channel 290. Then debug instructions may be processed by processing core 112 rather than the conventional debug module used in known approaches.
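The following toy model illustrates, in software, the idea that in debug mode every fetch is treated as a miss and the miss channel delivers instructions supplied by the debug unit rather than by memory. It is a behavioral sketch only: the class `ToyCore`, its fields, and the placeholder instruction strings are hypothetical and do not describe the actual hardware signals.

```python
from collections import deque


class ToyCore:
    """Toy core whose miss channel doubles as a debug channel (illustrative)."""

    def __init__(self, memory):
        self.memory = memory        # address -> instruction (regular program)
        self.icache = {}            # address -> cached instruction
        self.debug_mode = False
        self.debug_queue = deque()  # instructions queued by the debug unit
        self.pc = 0
        self.executed = []

    def _miss_channel(self, addr):
        """Return what the miss channel delivers for this fetch, or None."""
        if self.debug_mode:
            # In debug mode the refill comes from the debug unit.
            return self.debug_queue.popleft() if self.debug_queue else None
        return self.memory[addr]

    def step(self):
        """Fetch and execute one instruction; None means the core is stalled."""
        if not self.debug_mode and self.pc in self.icache:
            insn = self.icache[self.pc]
        else:
            # Debug mode forces a miss; regular mode misses on a cold cache.
            insn = self._miss_channel(self.pc)
            if insn is None:
                return None
            if not self.debug_mode:
                self.icache[self.pc] = insn
        self.executed.append(insn)
        if not self.debug_mode:
            self.pc += 4  # regular program flow is frozen in debug mode
        return insn
```

In this model, the core handles a debug instruction through the same fetch path as a regular refill, which mirrors why a separate debug module that executes debug instructions is not needed in the described arrangement.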
[0051] Similar to the debug host of FIG. 4, debug host 210 of debug system 200 may include, for example, debugger 212 and debug translator 214, which may each be implemented using the same or similar structures and/or functionalities as the counterpart components described above in detail in connection with FIG. 4. For example, debugger 212 may include, e.g., a GDB debugger. Debugger 212 may enable a user to access information related to instructions and/or functions processed/performed by harts running on processing core 112 by sending the debug instructions directly to processing core 112 through debug channel 290.
[0052] Although not shown in FIG. 2A, each processing core 112 may run one or more harts, which may include a set of functions and/or instructions. Moreover, each hart may have an associated debug mode trigger module 156 that may identify when a debug trigger condition (e.g., a breakpoint, watchpoint, etc.) is met for that hart. A user may use debugger 212 to set the breakpoints and/or watchpoints used by debug mode trigger module 156.
[0053] Each time a breakpoint or watchpoint occurs, debug mode trigger module 156 switches processor 108 to debug mode and halts the hart(s) running on the various processing cores 112. Moreover, debug mode trigger module 156 may send a signal indicating the breakpoint and/or watchpoint and the halted hart to debugger 212 via debug channel 290 of SoC 102. Debugger 212 may then access the hart with the associated breakpoint directly via processing core 112 (rather than a debug module) to determine any of the following: 1) what statement or expression the program crashed on, if a core dump occurred; 2) if an error occurred while executing a function, what line of the program included the call to that function, and what the parameters are; 3) what the values of program variables are at a particular point during the execution of the program; and/or 4) what the result of a particular expression in a program is, just to name a few.
[0054] For example, to begin, debugger 212 may send debug instructions to debug translator 214 (e.g., OpenOCD). OpenOCD may provide on-chip programming and debugging support with a layered architecture of JTAG TAP support. More specifically, OpenOCD may be configured to support any of the following functions: 1) (X)SVF playback to facilitate automated boundary scan and FPGA/CPLD programming; 2) debug target program support (e.g., ARM, MIPS, etc.) such as single-stepping, breakpoints, gprof profiling, etc.; 3) flash chip drivers (e.g., CFI, NAND, internal flash); and/or 4) embedded TCL interpreting for scripting, just to name a few. When debugger 212 includes GDB, debugger 212 may enable OpenOCD to function as a “remote target program” for source-level debugging of embedded systems at SoC 102 using the GNU GDB program or other similar programs. Debug translator 214 may communicate with debug transport circuit 220, such as a JTAG adapter, which may be configured to connect with SoC 102 via DTM 232 and/or DMI 234.
[0055] Functionally, DTM 232, DMI 234, BIU 250, and the portion of the instruction cache miss channel 291 between BIU 250 and processing core 112 provide a debug channel 290 through which debugger 212 gains access to the harts running on any processing core 112 while in debug mode or, under certain conditions, regular mode. Debugger 212 may execute abstract commands by sending them to processing core 112 via debug channel 290, which implements a translation interface between abstract debug operations sent from debugger 212 and their specific implementation on a hart. While in debug mode, processing core 112 may support the following operations: 1) providing debugger 212 information related to the implementation of SoC 102, processor 108, and/or processing core 112; 2) enabling any individual hart to be halted and resumed; 3) providing status on which harts are halted to debugger 212; 4) providing debugger 212 read and write access to a halted hart’s GPRs; and 5) enabling debugger 212 to access system memory.
[0056] Thus, by implementing debug channel 290, the debug module of conventional debug systems may be omitted in part or entirely from debug system 200 of FIGs. 2A and 2B, thereby freeing space within the SoC 102 and reducing its cost to manufacture.
[0057] FIG. 3 illustrates a flow chart of an exemplary method 300 for debugging using an apparatus, according to some embodiments of the present disclosure. Examples of the apparatus that can perform operations of method 300 include, for example, SoC 102, processor 108, and/or processing core 112 depicted in FIGs. 1, 2A, and 2B or any other suitable apparatus disclosed herein. It is understood that the operations shown in method 300 are not exhaustive and that other operations can be performed as well before, after, or between any of the illustrated operations. Further, some of the operations may be performed simultaneously, or in a different order than shown in FIG. 3.
[0058] Referring to FIG. 3, at 302, the apparatus may communicate with an external debug host via a debug channel located within the apparatus. The debug channel may include a portion of an instruction cache miss channel. For example, referring to FIG. 2A, communication between debug host 210 and processor 108/processing core 112 may be facilitated through debug channel 290. The portion of the instruction cache miss channel 291 between BIU 250 and processing core 112 may be a portion of debug channel 290 such that when debug mode is entered and instruction cache 230 sends a miss signal, processing core 112 processes a debug instruction as if it were a regular mode instruction received via instruction cache miss channel 291.
[0059] At 304, the apparatus may halt operations associated with regular mode signal processing based on a debug mode trigger. For example, referring to FIG. 2A, each time a breakpoint or watchpoint occurs, debug mode trigger module 156 switches processor 108 to debug mode and halts the hart(s) running on the various processing cores 112.
[0060] At 306, the apparatus may receive at least one debug instruction from the at least one debug unit via the instruction cache miss channel. For example, referring to FIG. 2A, processing core 112 may receive the debug instructions via a debug channel 290 that overlaps in part with the instruction cache miss channel 291.
[0061] At 308, the apparatus may process the at least one debug instruction. For example, referring to FIG. 2A, while in debug mode, processing core 112 may support the following operations by processing a debug instruction: 1) providing debugger 212 information related to the implementation of SoC 102, processor 108, and/or processing core 112; 2) enabling any individual hart to be halted and resumed; 3) providing status on which harts are halted to debugger 212; 4) providing debugger 212 read and write access to a halted hart’s GPRs; and 5) enabling debugger 212 to access system memory.
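The halting at 304 and the receiving and processing at 306 and 308 can be sketched as a small software model covering the debug operations listed above. The sketch is illustrative only: `ToyHart`, `ToyDebugTarget`, and the dictionary-based command format are hypothetical stand-ins, not the encoding of debug instructions used on debug channel 290.

```python
class ToyHart:
    """Minimal hart state: a halted flag and 32 general-purpose registers."""

    def __init__(self):
        self.halted = False
        self.gprs = [0] * 32


class ToyDebugTarget:
    """Toy target that processes debug commands itself (illustrative)."""

    def __init__(self, num_harts=2):
        self.harts = [ToyHart() for _ in range(num_harts)]
        self.memory = {}

    def on_trigger(self, hart_id):
        """Step 304: a breakpoint or watchpoint halts the affected hart."""
        self.harts[hart_id].halted = True

    def process(self, cmd):
        """Steps 306 and 308: receive one debug command and process it."""
        op = cmd["op"]
        if op == "status":
            return [i for i, h in enumerate(self.harts) if h.halted]
        if op == "read_mem":
            return self.memory.get(cmd["addr"], 0)
        if op == "write_mem":
            self.memory[cmd["addr"]] = cmd["value"]
            return None
        hart = self.harts[cmd["hart"]]
        if op == "halt":
            hart.halted = True
            return None
        if op == "resume":
            hart.halted = False
            return None
        if op in ("read_gpr", "write_gpr"):
            if not hart.halted:
                raise RuntimeError("GPR access requires a halted hart")
            if op == "read_gpr":
                return hart.gprs[cmd["reg"]]
            hart.gprs[cmd["reg"]] = cmd["value"]
            return None
        raise ValueError("unknown debug command: " + op)
```

A debugger-side script could, for instance, call `on_trigger(1)`, issue a `status` command to learn that hart 1 is halted, read or write its registers, and then issue `resume`.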
[0062] In various aspects of the present disclosure, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or encoded as instructions or code on a non-transitory computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computing device, such as system 100 in FIG. 1. By way of example, and not limitation, such computer-readable media can include RAM, read-only memory (ROM), electrically erasable programmable ROM (EEPROM), compact disc-ROM (CD-ROM) or other optical disk storage, hard disk drive (HDD), such as magnetic disk storage or other magnetic storage devices, flash drive, SSD, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a processing system, such as a mobile device or a computer. Disk and disc, as used herein, includes CD, laser disc, optical disc, digital versatile disc (DVD), and floppy disk, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0063] According to one aspect of the present disclosure, an SoC is disclosed. The SoC may include, for example, at least one processor comprising an instruction cache and at least one processing core and at least one debug unit configured to facilitate communication between an external debug host and the at least one processor via a debug channel. The debug channel may comprise a portion of an instruction cache miss channel. The at least one processing core may be configured to halt operations associated with regular mode signal processing based on a debug mode trigger. The at least one processing core may be further configured to obtain at least one debug instruction from the at least one debug unit via the instruction cache miss channel. The at least one processing core may be further configured to process the at least one debug instruction.
[0064] In some embodiments, the at least one debug unit may include one or more of a DMI or a DTM.
[0065] In some embodiments, the SoC may further comprise a debug mode trigger unit. In some embodiments, the debug mode trigger unit may be configured to identify a debug mode trigger. In some embodiments, the debug mode trigger unit may be configured to switch the at least one processor from regular mode to debug mode when the debug mode trigger is identified.
[0066] In some embodiments, the instruction cache may be configured to send at least one miss signal to the at least one processing core while in debug mode.
[0067] In some embodiments, the at least one debug unit is further configured to receive the at least one debug instruction from the external debug host.
[0068] In some embodiments, the at least one processing core may be further configured to output debug information associated with the at least one debug instruction to a display.
[0069] In some embodiments, the at least one processor may be associated with one or more of a RISC-V architecture, a CISC architecture, a VLIW architecture, a LIW architecture, an EPIC architecture, a MISC architecture, or an OISC architecture.
[0070] In some embodiments, the at least one debug unit may be associated with the RISC-V architecture.
[0071] According to another aspect of the present disclosure, a processor is disclosed. The processor may include an instruction cache and at least one processing core. The processing core may be coupled to the instruction cache and at least one debug unit of an associated SoC. The at least one processing core may be connected to the at least one debug unit via a debug channel that comprises a portion of an instruction cache miss channel. The at least one processing core may be configured to halt operations associated with regular mode signal processing based on a debug mode trigger. The at least one processing core may be configured to receive at least one debug instruction from the at least one debug unit via the instruction cache miss channel. The at least one processing core may be configured to process the at least one debug instruction.
[0072] In some embodiments, the at least one debug unit may include one or more of a DMI or a DTM.
[0073] In some embodiments, the processor may include a debug mode trigger unit. In some embodiments, the debug mode trigger unit may be configured to identify a debug mode trigger. In some embodiments, the debug mode trigger unit may be configured to switch from regular mode to debug mode when the debug mode trigger is identified.
[0074] In some embodiments, the instruction cache may be configured to send at least one miss signal to the at least one processing core while in debug mode.
[0075] In some embodiments, the at least one processing core may be further configured to send debug information to the at least one debug unit via the debug channel for communication with an external debug host.
[0076] In some embodiments, the at least one processing core may be further configured to output debug information associated with the at least one debug instruction to a display.
[0077] In some embodiments, the at least one processing core may be associated with one or more of a RISC-V architecture, a CISC architecture, a VLIW architecture, a LIW architecture, an EPIC architecture, a MISC architecture, or an OISC architecture.
[0078] According to another aspect of the present disclosure, a method of an SoC is disclosed. The method may include communicating, using at least one debug unit, with an external debug host. The method may also include halting, using at least one processor, operations associated with regular mode signal processing based on a debug mode trigger. The method may also include receiving, at a processing core of the processor, at least one debug instruction from the at least one debug unit. The at least one debug instruction may be received via an instruction cache miss channel. The instruction cache miss channel may be a portion of a debug channel between the at least one debug unit and the at least one processing core. The method may also include processing, using the processing core, the at least one debug instruction.
[0079] In some embodiments, the at least one debug unit may include one or more of a DMI or a DTM.
[0080] In some embodiments, the method may include identifying, using a debug mode trigger unit, the debug mode trigger. In some embodiments, the method may include switching, using the debug mode trigger unit, the at least one processor from regular mode to debug mode when the debug mode trigger is identified.
[0081] In some embodiments, the method may include receiving, at the processing core, at least one miss signal from the instruction cache during debug mode. In some embodiments, the method may include receiving, at the at least one debug unit, the at least one debug instruction from the external debug host. In some embodiments, the method may include sending the at least one debug instruction to the at least one processor via the debug channel. In some embodiments, the method may include generating, using the processing core, debug information associated with the at least one debug instruction. In some embodiments, the method may include outputting the debug information to a display.
[0082] According to yet another aspect of the present disclosure, a method of a processor is disclosed. The method may include halting operations associated with regular mode signal processing based on a debug mode trigger. The method may include receiving at least one debug instruction from at least one debug unit via an instruction cache miss channel. The method may include processing the at least one debug instruction.
[0083] The foregoing description of the specific embodiments will reveal the general nature of the present disclosure that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present disclosure. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
[0084] Embodiments of the present disclosure have been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
[0085] The Summary and Abstract sections may set forth one or more but not all exemplary embodiments of the present disclosure as contemplated by the inventor(s), and thus, are not intended to limit the present disclosure and the appended claims in any way.
[0086] Various functional blocks, modules, and steps are disclosed above. The particular arrangements provided are illustrative and without limitation. Accordingly, the functional blocks, modules, and steps may be re-ordered or combined in different ways than in the examples provided above. Likewise, certain embodiments include only a subset of the functional blocks, modules, and steps, and any such subset is permitted.
[0087] The breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

WHAT IS CLAIMED IS:
1. A system-on-chip (SoC), comprising: at least one processor comprising an instruction cache and at least one processing core; and at least one debug unit configured to facilitate communication between an external debug host and the at least one processor via a debug channel that comprises a portion of an instruction cache miss channel, wherein the at least one processing core is configured to: halt operations associated with regular mode signal processing based on a debug mode trigger; obtain at least one debug instruction from the at least one debug unit via the instruction cache miss channel; and process the at least one debug instruction.
2. The SoC of claim 1, wherein the at least one debug unit includes one or more of a debug module interface (DMI) or a debug transport module (DTM).
3. The SoC of claim 2, further comprising a debug mode trigger unit configured to: identify a debug mode trigger; and switch the at least one processor from regular mode to debug mode when the debug mode trigger is identified.
4. The SoC of claim 3, wherein the instruction cache is configured to: send at least one miss signal to the at least one processing core while in debug mode.
5. The SoC of claim 1, wherein the at least one debug unit is further configured to: receive the at least one debug instruction from the external debug host.
6. The SoC of claim 1, wherein the at least one processing core is further configured to: output debug information associated with the at least one debug instruction to a display.
7. The SoC of claim 1, wherein the at least one processor is associated with one or more of a reduced instruction set computer (RISC)-V architecture, a complex instruction set computer (CISC) architecture, a very long instruction word (VLIW) architecture, a long instruction word (LIW) architecture, an explicitly parallel instruction computing (EPIC) architecture, a minimal instruction set computer (MISC) architecture, or a one instruction set computer (OISC) architecture.
8. The SoC of claim 7, wherein the at least one debug unit is associated with the RISC-V architecture.
9. A processor, comprising: an instruction cache; and at least one processing core connected to the instruction cache and at least one debug unit of a system-on-chip (SoC), the at least one processing core being connected to the at least one debug unit via a debug channel that comprises a portion of an instruction cache miss channel, wherein the at least one processing core is configured to: halt operations associated with regular mode signal processing based on a debug mode trigger; receive at least one debug instruction from the at least one debug unit via the instruction cache miss channel; and process the at least one debug instruction.
10. The processor of claim 9, wherein the at least one debug unit includes one or more of a debug module interface (DMI) or a debug transport module (DTM).
11. The processor of claim 10, further comprising a debug mode trigger unit configured to: identify a debug mode trigger; and switch from regular mode to debug mode when the debug mode trigger is identified.
12. The processor of claim 11, wherein the instruction cache is configured to: send at least one miss signal to the at least one processing core while in debug mode.
13. The processor of claim 9, wherein the at least one processing core is further configured to: send debug information to the at least one debug unit via the debug channel for communication with an external debug host.
14. The processor of claim 9, wherein the at least one processing core is further configured to: output debug information associated with the at least one debug instruction to a display.
15. The processor of claim 9, wherein the at least one processing core is associated with one or more of a reduced instruction set computer (RISC)-V architecture, a complex instruction set computer (CISC) architecture, a very long instruction word (VLIW) architecture, a long instruction word (LIW) architecture, an explicitly parallel instruction computing (EPIC) architecture, a minimal instruction set computer (MISC) architecture, or a one instruction set computer (OISC) architecture.
16. A method of debugging with a system-on-chip (SoC), comprising: communicating, using at least one debug unit, with an external debug host; halting, using at least one processor, operations associated with regular mode signal processing based on a debug mode trigger; receiving, at a processing core of the processor, at least one debug instruction from the at least one debug unit, the at least one debug instruction being received via an instruction cache miss channel, and the instruction cache miss channel being a portion of a debug channel between the at least one debug unit and the at least one processing core; and processing, using the processing core, the at least one debug instruction.
17. The method of claim 16, wherein the at least one debug unit includes one or more of a debug module interface (DMI) or a debug transport module (DTM).
18. The method of claim 17, further comprising: identifying, using a debug mode trigger unit, the debug mode trigger; and switching, using the debug mode trigger unit, the at least one processor from regular mode to debug mode when the debug mode trigger is identified.
19. The method of claim 16, further comprising: receiving, at the processing core, at least one miss signal from the instruction cache during debug mode; receiving, at the at least one debug unit, the at least one debug instruction from the external debug host; sending the at least one debug instruction to the at least one processor via the debug channel; generating, using the processing core, debug information associated with the at least one debug instruction; and outputting the debug information to a display.
20. A method of debugging with a processor, comprising: halting operations associated with regular mode signal processing based on a debug mode trigger; receiving at least one debug instruction from at least one debug unit via an instruction cache miss channel; and processing the at least one debug instruction.
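The flow recited in the claims above (debug mode trigger, instruction cache reporting misses while in debug mode, debug instructions delivered to the core over the instruction cache miss channel) can be illustrated with a small behavioral sketch. This is only a toy model under stated assumptions, not the implementation disclosed in the application: the class names (DebugUnit, InstructionCache, Core), the tuple instruction encoding, and the "addi" and "read_reg" operations are all hypothetical and chosen for illustration.

```python
from collections import deque


class DebugUnit:
    """Hypothetical debug unit sitting between the external host and the core."""

    def __init__(self):
        self.pending = deque()  # debug instructions received from the host
        self.results = []       # debug information sent back by the core

    def receive_from_host(self, instruction):
        self.pending.append(instruction)

    def serve_miss(self):
        # Answer a miss request with the next debug instruction, if any.
        return self.pending.popleft() if self.pending else None


class InstructionCache:
    def __init__(self, lines):
        self.lines = dict(lines)  # address -> instruction

    def lookup(self, address, debug_mode):
        # Assumption: while in debug mode every lookup is reported as a miss.
        if debug_mode:
            return None
        return self.lines.get(address)


class Core:
    def __init__(self, icache, debug_unit, program):
        self.icache = icache
        self.debug_unit = debug_unit
        self.program = program  # backing memory for regular-mode misses
        self.regs = {"x1": 0}
        self.pc = 0
        self.debug_mode = False

    def trigger_debug(self):
        # Debug mode trigger: regular-mode processing halts from here on.
        self.debug_mode = True

    def step(self):
        instruction = self.icache.lookup(self.pc, self.debug_mode)
        if instruction is None:
            # Miss channel: refilled by the debug unit in debug mode,
            # by program memory otherwise.
            if self.debug_mode:
                instruction = self.debug_unit.serve_miss()
            else:
                instruction = self.program.get(self.pc)
        if instruction is None:
            return  # nothing to execute; the core stalls
        self.execute(instruction)

    def execute(self, instruction):
        op, arg = instruction
        if op == "addi":        # toy regular-mode instruction: x1 += arg
            self.regs["x1"] += arg
            self.pc += 1
        elif op == "read_reg":  # toy debug instruction: report a register value
            self.debug_unit.results.append((arg, self.regs[arg]))


def demo():
    debug_unit = DebugUnit()
    program = {0: ("addi", 5), 1: ("addi", 7)}
    core = Core(InstructionCache({0: program[0]}), debug_unit, program)
    core.step()           # regular mode, cache hit
    core.trigger_debug()  # halt regular processing
    debug_unit.receive_from_host(("read_reg", "x1"))
    core.step()           # debug instruction arrives over the miss channel
    core.step()           # no pending debug instruction: the core stalls
    return core.pc, core.regs["x1"], debug_unit.results
```

In this sketch the program counter does not advance once debug mode is entered, so the second regular-mode instruction is never executed, while the host's register read is still serviced through the same fetch path the core uses for ordinary cache misses.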
PCT/US2021/030960 | 2021-05-05 | 2021-05-05 | Debug channel for communication between a processor and an external debug host | Ceased | WO2022235265A1 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/US2021/030960 / WO2022235265A1 (en) | 2021-05-05 | 2021-05-05 | Debug channel for communication between a processor and an external debug host

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/US2021/030960 / WO2022235265A1 (en) | 2021-05-05 | 2021-05-05 | Debug channel for communication between a processor and an external debug host

Publications (1)

Publication Number | Publication Date
WO2022235265A1 (en) | 2022-11-10

Family

ID=83931985

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
PCT/US2021/030960 | Ceased | WO2022235265A1 (en) | 2021-05-05 | 2021-05-05 | Debug channel for communication between a processor and an external debug host

Country Status (1)

Country | Link
WO (1) | WO2022235265A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116737475A (en) * | 2023-05-29 | 2023-09-12 | 中国第一汽车股份有限公司 | Chip diagnosis method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6321329B1 (en) * | 1999-05-19 | 2001-11-20 | Arm Limited | Executing debug instructions
US20080313442A1 (en) * | 2007-06-13 | 2008-12-18 | Jian Wei | Debugging techniques for a programmable integrated circuit
US20170068610A1 (en) * | 2015-09-04 | 2017-03-09 | International Business Machines Corporation | Debugger display of vector register contents after compiler optimizations for vector instructions
US20200210301A1 (en) * | 2018-12-31 | 2020-07-02 | Texas Instruments Incorporated | Debug for multi-threaded processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HERDT VLADIMIR; GROBE DANIEL; JENTZSCH EYCK; DRECHSLER ROLF: "Efficient Cross-Level Testing for Processor Verification: A RISC-V Case-Study", 2020 FORUM FOR SPECIFICATION AND DESIGN LANGUAGES (FDL), IEEE, 15 September 2020 (2020-09-15), pages 1-7, XP033846917, DOI: 10.1109/FDL50818.2020.9232941 *

Legal Events

DateCodeTitleDescription
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21939949; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | EP: PCT application non-entry in European phase | Ref document number: 21939949; Country of ref document: EP; Kind code of ref document: A1
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.06.2024)
