WO2006017874A2

Movatterモバイル変換

Info

Publication number: WO2006017874A2
Application number: PCT/AT2005/000326
Authority: WO
Inventors: Martin SCHÖBERL
Original assignee: Schoeberl Martin
Priority date: 2004-08-17
Filing date: 2005-08-12
Publication date: 2006-02-23
Also published as: WO2006017874A3; AT505203A5; AT500858A4; AT500858B1; AT500858B8

Abstract

For real-time systems, the knowledge of the maximum execution time (worst case execution time (WCET)) is of fundamental importance. The influence of the instruction cache upon this WCET in conventional designs is difficult to predict and leads to pessimistic WCET values. For this reason, the invention provides an instruction cache (103) for a processor (100), which enables a more precise predictability of the real-time behavior. The instruction cache is loaded with complete functions and combines all cache misses in the event of a function call-up and function return.

Description

Translated fromGerman

Instruction Cache für EchtzeitsystemeInstruction cache for real-time systems

Die Erfindung bezieht sich auf einen echtzeitfähigen Instruction Cache.The invention relates to a real-time instruction cache.

In Echtzeitsystemen ist die Korrektheit eines Programms nur gegeben, wenn neben der algorithmischen Korrektheit auch zeitliche Bedingungen eingehalten werden. Um diese zeitlichen Bedingungen einhalten zu können muss die maximale Ausfiihrungszeit von Programmen bekannt sein. Diese Werte sind die Basis jeder 'schedulability' Analyse.In real-time systems, the correctness of a program is given only if, in addition to the algorithmic correctness, temporal conditions are adhered to. In order to be able to meet these temporal conditions, the maximum execution time of programs must be known. These values are the basis of every 'schedulability' analysis.

Die maximale Ausfuhrungszeit von Programmen muss durch Analyse der Program¬ me und der dazu notwendigen Modellierung des Systems erfolgen. Eine Messung der Ausführungszeit ist nicht möglich, da nicht sichergestellt werden kann, dass alle Kombi¬ nationen von Ausführungspfaden durchlaufen wurden.The maximum execution time of programs must be done by analyzing the programs and the necessary modeling of the system. A measurement of the execution time is not possible since it can not be ensured that all combinations of execution paths have been run through.

Cache Speicher sind ein wichtiger Bestandteil von Prozessoren um den Geschwin¬ digkeitsunterschied zwischen Hauptspeicher und Prozessor auszugleichen. Die bekann¬ ten Cache Architekturen sind jedoch für die durchschnittliche Performanz und nicht für vorhersagbare Performanz optimiert. Dies führt zu schwer vorhersagbaren bzw. sehr pes¬ simistischen WCET Werten. In (Proc. of the IEEE, 91(7): 1038-1054, JuI. 2003) werden Caches von realen Prozessoren analysiert. Architekturmerkmale führen dazu, dass nur 1/2 bzw. 1/4 des vorhandenen Cache Speicher modelliert werden können.Cache memory is an important part of processors to compensate for the Geschwin¬ digkeitsunterschied between main memory and processor. However, the well-known cache architectures are optimized for the average performance and not for predictable performance. This leads to difficult to predict or very pes¬ simistic WCET values. In (Proc. Of the IEEE, 91 (7): 1038-1054, June 2003) caches of real processors are analyzed. Architectural features mean that only 1/2 or 1/4 of the existing cache memory can be modeled.

Ein Ansatz zur Lösung dieser Problematik besteht aus der Teilung des Instruciton Caches in einen Block für allgemeine Programme und einen Block für echtzeit rele¬ vanten Code (z.B.: EP 0 529 217 Al oder US 5,913,224). Der Echtzeitcode wird vor der Ausführung in einen Cacheblock geladen und dieser Block gesperrt. Dieser Block enthält dann während der kompletten Laufzeit den echtzeit relevanten Programmteil. Die¬ se Lösung ist jedoch sehr inflexibel und beschränkt die maximale Größe der Echtzeitpro¬ gramme auf die Cachegröße.One approach to solving this problem consists of dividing the instruciton cache into a block for general programs and a block for real-time rele vant code (for example: EP 0 529 217 A1 or US 5,913,224). The real-time code is loaded into a cache block before execution and this block is disabled. This block then contains the real-time relevant program part during the entire runtime. However, this solution is very inflexible and limits the maximum size of the real-time programs to the cache size.

Der Erfindung liegt die Aufgabe zugrunde einen Instruction Cache zu gestalten, des¬ sen Echtzeitverhalten genauer modelliert werden kann ohne die Programmgröße einzu¬ schränken.The object of the invention is to design an instruction cache whose real-time behavior can be modeled more accurately without restricting the program size.

Die Aufgabe wird dadurch gelöst, dass komplette Funktionen im Instruction Cache gespeichert werden. Das Laden des Instruction Caches erfolgt nur, wenn notwendig, bei einem Funktionsaufruf bzw. bei einer Funktionsrückkehr.The task is solved by storing complete functions in the instruction cache. The instruction cache is loaded only when necessary during a function call or during a function return.

Da sichergestellt ist, dass eine Funktion bei der Ausführung komplett im Instruction Cache geladen ist, fallen keine Cache bedingten Wartezeiten während der Ausführung der Funktion an. Der Cache muss daher nur bei Funktionsaufruf und Funktionsrückkehr in der WCET Analyse berücksichtigt werden. Die Entscheidung ob ein 'cach hit' oder 'cache miss' vorliegt ist nur vom Aufrufbaum der Funktionen bestimmt und nicht von den Adressen der einzelnen Instruktionen.Since it is ensured that a function is completely loaded in the instruction cache during execution, there are no cache-related waiting times during the execution of the function. The cache must therefore only be taken into account in the WCET analysis when the function is called and the function returns. The decision whether a 'cach hit' or 'cache miss' exists is determined only by the call tree of the functions and not by the addresses of the individual instructions.

Funktionen werden nur relativ adressiert. D.h. es sind innerhalb der Funktion nur re¬ lative Sprünge möglich. Diese Bedingung ist z.B. in dem Zwischenkode der Sprache Java erfüllt. Daher eignet sich dieser Instruction Cache sehr gut für einen echtzeitfähigen Java Prozessor. Java ist aber nur als Beispiel für die Anwendung dieses Instruction Caches zu verstehen. Auch andere Programmiersprachen, wie z.B.: C, lassen sich auf eine Weise Übersetzen, die nur relative Sprünge innerhalb von Funktionen enthält.Functions are only addressed relatively. That within the function only relative jumps are possible. This condition is e.g. met in the intermediate code of the Java language. Therefore, this instruction cache is very well suited for a real-time capable Java processor. Java is only to be understood as an example for the application of this instruction cache. Other programming languages, such as C, can also be compiled in a way that contains only relative jumps within functions.

Durch die relative Adressierung ist es während der Funktionsausführung irrelevant an welcher Cacheposition die Funktion beginnt. Der Program Counter 102 muss nur beim Funktionsaufruf mit der Startadresse im Cache geladen werden.Due to the relative addressing, it does not matter at which cache position the function starts during the function execution. The program counter 102 only has to be loaded in the cache with the start address when the function is called.

Um mehr als nur eine Funktion im Instruction Cache halten zu können wird dieser in Blöcke eingeteilt. Eine Funktion kann sich über mehrere zusammenhängende Blöcke erstrecken. Wobei ein Zusammenhang auch vom letzen Block zum ersten Block besteht, da der Program Counter auf die Cacheadressierung begrenzt ist und es dadurch zu einem automatisch korrekten Über- bzw. Unterlauf kommt.To keep more than one function in the instruction cache, it is divided into blocks. A function can span several contiguous blocks. Whereby a connection also exists from the last block to the first block, since the program counter is limited to the cache addressing and this leads to an automatically correct overflow or underflow.

Durch die relative Adressierung können der Programm Counter 102 und die zughöri¬ gen Busse und Multiplexer einfacher, da kleiner, realisiert werden. Auch die Adressüber¬ setzung für die Implementierung eines virtuellen Speichers ist nur mehr beim Laden einer Funktion notwendig.As a result of the relative addressing, the program counter 102 and the associated buses and multiplexers can be implemented more simply because they are smaller. Also the address translation for the implementation of a virtual memory is only necessary when loading a function.

Die Feststellung eines 'cach hit' ist nur beim Funktionsaufruf bzw. bei der Rückkehr notwendig und wird durch Lesen des Block RAM 105 gelöst. Das in konventionellen Caches notwendig 'tag RAM', das bei jedem Cachezugriff gelesen werden und mit der Adresse verglichen werden muss, kann dadurch entfallen. Der Zugriff auf das 'tag RAM' und der Adressenvergleich liegen normalerweise im kritischen Pfad der Hardware und bestimmen dadurch die minimale Zugriffszeit auf den Cache. Ohne Vergleich bei jedem Zugriff, wie in dieser Erfindung, kann die Zugriffszeit auf den Cache bei gleicher Tech¬ nologie verringert werden.The determination of a 'cach hit' is necessary only at the function call or at the return and is solved by reading the block RAM 105. The "tag RAM" required in conventional caches, which must be read with every cache access and must be compared with the address, can thus be dispensed with. The access to the 'tag RAM' and the address comparison are usually in the critical path of the hardware and thereby determine the minimum access time to the cache. Without comparison on each access, as in this invention, the access time to the cache can be reduced with the same technology.

Das Laden kompletter Funktionen, und damit größerer Blöcke als bei einem kon¬ ventionellen Cache, wirkt sich auch positiv bei Verwendung von dynamischen Speichern für den Hauptspeicher 201 aus. Diese Speichertechnologie zeichnet sich dadurch aus, dass das erste Wort erst nach einer beträchtlichen Verzögerung verfügbar ist, jedoch die folgenden nach kürzerer Zeit. Diese Initialverzögerung wirkt sich bei größeren Blocken weniger aus, als bei kleinen Blöcken.The loading of complete functions, and thus larger blocks than in the case of a conventional cache, also has a positive effect when using dynamic memories for the main memory 201. This memory technology is characterized by the fact that the first word is available only after a considerable delay, but the following is available in a shorter time. This initial delay has less effect on larger blocks than on small blocks.

In Fig. 1 wird die Architektur eines Prozessors dargestellt, der den Erfindungsgegen- stand enthält. Fig. 2 zeigt exemplarisch die Belegung der Cache Blöcke bei der Ausführung des Programmfragments in Fig. 3.FIG. 1 shows the architecture of a processor which contains the subject matter of the invention. 2 shows by way of example the assignment of the cache blocks during execution of the program fragment in FIG. 3.

Der Instruction Cache 103 liegt zwischen dem Prozessorkern 101 und dem Bus Inter¬ face 104. Instruktionen werden über den Bus 112 vom Instruction Cache 103 geholt. Der Instruction Cache 103 wird über den Program Counter 102 adressiert. Da dieser nur den Cache adressiert muss dieser und die zugeörigen Busse 110 und 111 log2(Cachegröße) Bits breit sein.The instruction cache 103 is located between the processor core 101 and the bus interface 104. Instructions are fetched via the bus 112 from the instruction cache 103. The instruction cache 103 is addressed via the program counter 102. Since this only addresses the cache, this and the associated buses 110 and 111 must be log2 (cache size) bits wide.

Der Instruction Cache 103 wird vom Bus Interface 104 aus dem Hauptspeicher 201 mit kompletten Funktionen gefüllt. Die Busse 113 und 114 sind die Adress- bzw. Da¬ tenbusse zwischen dem Bus Interface 104 und dem Instruction Cache 103. Über den Adressbus 117 und dem Datenbus 118 werden Lade- und Speicheranforderungen des Pro¬ zessorkerns 101 abgewickelt.The instruction cache 103 is filled by the bus interface 104 from the main memory 201 with complete functions. The buses 113 and 114 are the address or data buses between the bus interface 104 and the instruction cache 103. Load and memory requirements of the processor core 101 are handled via the address bus 117 and the data bus 118.

Das Bus Interface 104 wickelt den Datenaustausch und das Laden des Instruction Caches 103 mit dem Hauptspeicher 201 über den Adressbus 210 und dem Datenbus 211 ab. Da das Laden des Instruction Caches 103 nur bei einem Funktionsaufruf oder einer Rückkehr aus einer Funktion passiert, kommt es zu keinen Konflikten mit den Lade- und Speicheranforderungen des Prozessorkerns 101.The bus interface 104 handles the data exchange and loading of the instruction cache 103 with the main memory 201 via the address bus 210 and the data bus 211. Since the loading of the instruction cache 103 occurs only upon a function call or a return from a function, there are no conflicts with the load and memory requirements of the processor core 101.

Der Block RAM 105 dient dem Prozessor zur Speicherung welche Blöcke des In¬ struction Caches 103 von welchen Funktionen belegt sind. Er wird über den Adressbus 116 und den Datenbus 115 angesprochen.The block RAM 105 is used by the processor to store which blocks of the instruction cache 103 are occupied by which functions. It is addressed via the address bus 116 and the data bus 115.

Fig. 2 zeigt die Belegung von Cache Blöcken während der Ausführung des in Fig. 3 skizzierte Programms. Die Anzahl der Blöcke und die Strategie welche Blöcke ersetzt werden ist nur exemplarisch. Die Ersetzungsstrategie kann komplexer als bei herkömm¬ lichen Instruction Caches ausfallen, da die Entscheidung seltener (nur beim Laden einer kompletten Funktion) anfällt. Die Belegung der Blöcke wird in Block RAM 105 gespei¬ chert. Dieser muss gelesen werden um festzustellen ob ein 'cach hit' vorliegt und ge¬ schrieben werden, wenn eine Funktion neu in den Instruction Cache geladen wird.FIG. 2 shows the occupancy of cache blocks during the execution of the program outlined in FIG. The number of blocks and the strategy which blocks are replaced is only an example. The replacement strategy can be more complex than with conventional instruction caches because the decision is less likely (only when loading a complete function). The assignment of the blocks is stored in block RAM 105. This must be read to determine if a 'cache hit' exists and is written when a function is newly loaded into the instruction cache.

Das Beispiel in Fig. 2 besteht aus 4 Funktionen, wobei die Funktionen A() und D() klein genug sind um in einen Block zu passen. Funktonen B() und C() sind größer und belegen zwei Blöcke. 301 zeigt den Zustand nach dem Aufruf der Funktion A(). Der erste Block ist belegt, die restlichen drei sind frei. Der Aufruf der Funktion B() innerhalb von A() führt zur Belegung wie in 302 gezeigt. Es ist nur mehr ein Block frei. Die Funktion C(), die von B() aufgerufen wird benötigt jedoch zwei Blöcke. Wie in 303 gezeigt wird C() in Block 4 und Block 1 geladen, wodurch Funktion A() nicht mehr im Cache ist.The example in Figure 2 consists of 4 functions, where the functions A () and D () are small enough to fit in a block. Functions B () and C () are larger and occupy two blocks. 301 shows the state after the call of the function A (). The first block is occupied, the remaining three are free. Calling function B () within A () results in occupancy as shown in 302. There is only one more block left. The C () function called by B (), however, requires two blocks. As shown in 303, C () is loaded in block 4 and block 1, whereby function A () is no longer in the cache.

Die Adressierung der Funktion C() über das Cacheende (Block 4) zum Cacheanfang (Block 1) geschieht implizit durch die Begrenzung vom Program Counter 102 auf die Cachegröße. Die Addition bzw. Subtraktion über die Cachgrenze hinaus ergibt implizit den korreten Überlauf bzw. Unterlauf des Program Counters 102.The addressing of the function C () via the cache end (block 4) to the beginning of the cache (block 1) is implicitly effected by the limitation of the program counter 102 to the cache Cache size. The addition or subtraction beyond the cache boundary implicitly results in the correct overflow or underflow of the program counter 102.

Bei der Rückkehr von Funktion C() zur Funktion B() ist kein Laden des Caches not¬ wendig, da Sich Funktion B() zu diesem Zeitpunkt noch im Cache befindet. Der Aufruf von Funktion D() führt zur Belegung wie in 304 gezeigt. Obwohl D() nur einen Block belegt und damit einen Teil B() verdrängt, ist Block 3 als unbelegt markiert. Dies ist Not¬ wendig, da nur komplette Funktionen gültig sind.When function C () is returned to function B (), no loading of the cache is necessary since function B () is still in the cache at this time. Calling function D () results in occupancy as shown in 304. Although D () occupies only one block and thus displaces part B (), block 3 is marked as empty. This is necessary because only complete functions are valid.

Die Entscheidung ob D() Funktion B() oder Funktion C() aus dem Cache verdrängt ist abhängig von der Ersatzstrategie. In diesem Beispiel wird jeweils der nächste Block nach einer geladenen Funktion als Startblock für eine neue zu ladende Funktion verwen¬ det. Dies ist aber nur eine Möglichkeit von vielen (z.B.: 'last recently used' oder 'best fit'). Ebenfalls ist die Einteilung in vier Blöcken nur exemplarisch zur Vereinfachung der Illustration.The decision whether D () function B () or function C () is flushed out of the cache depends on the replacement strategy. In this example, in each case the next block after a loaded function is used as start block for a new function to be loaded. However, this is only one option of many (eg 'last recently used' or 'best fit'). Also, the division into four blocks is only an example to simplify the illustration.

Claims

Translated fromGerman

Patentansprüche claims

1. Verfahren zum Betrieb eines Instruction Caches, dadurch gekennzeichnet, dass kom¬ plette Funktionen gespeichert werden, die nur bei einem Funktionsaufruf oder ei¬ nem Rücksprung aus einer Funktion geladen werden.1. A method for operating an instruction cache, characterized in that Kom¬ plete functions are stored, which are loaded only in a function call or ei¬ nem return from a function.

2. Verfahren nach Anspruch 1, dadurch gekennzeichnet, dass Funktionen innerhalb des Caches nur relativ adressiert werden und dadurch mehrere Funktionen jeweils in aufeinanderfolgenden Blöcken gehalten werden können.2. The method according to claim 1, characterized in that functions within the cache are only relatively addressed and thereby a plurality of functions can be kept in successive blocks.

3. Verfahren nach Anspruch 1 oder 2, dadurch gekennzeichnet, dass das "tag memory" durch einen "Block RAM" (105) ersetzt ist, welches nur bei Funktionsaufruf bzw. Funktionsrückkehr vom Prozessorkern (101) gelesen bzw. geschrieben wird.3. The method of claim 1 or 2, characterized in that the "tag memory" is replaced by a "block RAM" (105), which is read or written only when function call or function return from the processor core (101).

4. Instruciton Cache, dadurch gekennzeichnet, dass er nach einem der Verfahren nach Ansprüchen 1-3 betrieben wird.4. Instruciton cache, characterized in that it is operated according to one of the methods of claims 1-3.