Technical Field
The present invention relates to the optimization of string comparison instructions in a central processing unit (CPU).
Background
Instruction set design for a central processing unit must trade off hardware cost against execution efficiency. Providing an efficient instruction set in a central processing unit with low-cost hardware is an important topic in this technical field. Take the string comparison instructions of the x86 instruction set as an example: these instructions accelerate string operations on XML text, storage verification, and similar tasks, substantially speeding up text processing and thereby improving the performance of office applications. In a conventional central processing unit, however, a string comparison instruction must be decoded into hundreds of micro-operations (μops), and the execution unit must execute all of them to complete the instruction, so execution efficiency is low.
Summary of the Invention
A central processing unit implemented according to one embodiment of the present invention supports a string comparison instruction and includes a string buffer, a shift register, a plurality of comparators, and a logic operation unit. The string buffer stores a plurality of data units of a first string in a plurality of buffer cells. The shift register shifts a plurality of data units of a loaded second string step by step among its own buffer cells. The comparators compare the contents of the buffer cells of the string buffer with the contents of the buffer cells of the shift register. Based on the outputs of the comparators, the logic operation unit computes the comparison result of the first string and the second string according to the comparison mode required by the string comparison instruction.
A method for operating a central processing unit according to one embodiment of the present invention executes a string comparison instruction on the central processing unit and includes: storing a plurality of data units of a first string in a plurality of buffer cells of a string buffer of the central processing unit; shifting a plurality of data units of a second string step by step among a plurality of buffer cells of a shift register of the central processing unit; comparing the contents of the buffer cells of the string buffer with the contents of the buffer cells of the shift register; and, based on the result of the comparison, computing the comparison result of the first string and the second string according to the comparison mode required by the string comparison instruction.
The shift register design allows the central processing unit to complete a string comparison instruction in a small number of clock cycles.
Specific embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
FIG. 1 illustrates a central processing unit 100 implemented according to one embodiment of the present invention;
FIG. 2 is a flowchart showing that, with the microarchitecture of the central processing unit of the present invention, a string comparison instruction can be converted into three micro-operations performed in sequence: a string valid-length identification micro-operation 202, a comparison micro-operation 204, and a comparison-result generation micro-operation 206; and
FIG. 3 illustrates the contents of the shift register 104 in sequence.
Detailed Description
The following description lists various embodiments of the present invention. It introduces the basic concepts of the invention and is not intended to limit its scope. The actual scope of the invention is defined by the claims.
FIG. 1 illustrates a central processing unit 100 implemented according to one embodiment of the present invention. It executes a string comparison instruction and includes a string buffer 102, a shift register 104, a plurality of multiplexers Mux1 to Mux7, a plurality of comparators cmp0 to cmp7, a zero comparator Zero_cmp, and a logic operation unit 106.
The string buffer 102 stores a plurality of data units of the first string SRC1 in a plurality of buffer cells (numbered 0 to 7 in FIG. 1, although the invention is not limited to eight data units). The shift register 104 accepts two kinds of input: a zero value ZERO and the second string SRC2. The data units of whatever is loaded into the shift register 104 (numbered 0 to 7 in FIG. 1, again not limited to eight) are shifted step by step among the buffer cells of the shift register 104. The multiplexers Mux1 to Mux7 are coupled between the shift register 104 and the comparators cmp0 to cmp7 and switch between a first mode and a second mode. In the first mode, the comparators cmp0 to cmp7 compare the content of each buffer cell of the string buffer 102 with the content of the lowest-order buffer cell of the shift register 104 (the cell storing the data unit labeled '0' in FIG. 1). In the second mode, the comparators cmp0 to cmp7 compare the content of each buffer cell of the string buffer 102 one-to-one with the content of the corresponding buffer cell of the shift register 104.
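The two multiplexer modes can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware itself: the function names `shift_side_operands` and `compare` are hypothetical, data units are modeled as byte values, and the comparators are modeled as plain equality checks. Because cell 0 feeds cmp0 in both modes, only seven multiplexers (Mux1 to Mux7) are needed.

```python
from typing import List


def shift_side_operands(shift_reg: List[int], first_mode: bool) -> List[int]:
    """Operand that each comparator cmp_i receives from the shift register side."""
    if first_mode:
        # First mode: every comparator sees the lowest-order cell (index 0).
        return [shift_reg[0]] * len(shift_reg)
    # Second mode: comparator i sees cell i (one-to-one).
    return list(shift_reg)


def compare(string_buf: List[int], shift_reg: List[int], first_mode: bool) -> List[bool]:
    """Equality outputs of cmp0..cmp7 for one clock cycle (hypothetical model)."""
    operands = shift_side_operands(shift_reg, first_mode)
    return [a == b for a, b in zip(string_buf, operands)]


src1 = list(b"abcdefgh")   # contents of the string buffer (SRC1)
src2 = list(b"cbxdxxxh")   # contents of the shift register (SRC2)
first = compare(src1, src2, first_mode=True)    # all cells of SRC1 versus 'c'
second = compare(src1, src2, first_mode=False)  # cell i of SRC1 versus cell i of SRC2
```

In the first mode only position 2 of `first` is true, because only the 'c' in SRC1 equals the lowest-order cell of the shift register; in the second mode `second` is true at positions 1, 3, and 7, where the two strings agree.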
The comparison mode required by the string comparison instruction may be Equal Any (element-wise equality comparison), Ranges (element-wise range comparison), Equal Ordered (substring comparison), or Equal Each (all-equal comparison), each of which performs a different comparison between the first string SRC1 and the second string SRC2. To support these modes, the string buffer 102 buffers the first string SRC1 and the shift register 104 is loaded with the second string SRC2.
The first mode of the multiplexers Mux1 to Mux7 serves the Equal Any and Ranges comparison modes. Over multiple clock cycles, the comparators cmp0 to cmp7 repeatedly compare the contents of the buffer cells of the string buffer 102 (data units '0' to '7' of the first string SRC1) with the content of the lowest-order buffer cell of the shift register 104. Over eight consecutive clock cycles, the lowest-order buffer cell of the shift register 104 holds, in order, data units '0', '1', '2', '3', '4', '5', '6', and '7' of the second string SRC2. When the string comparison instruction requires the Equal Any mode, the logic operation unit 106 uses the outputs of the comparators cmp0 to cmp7 to compute, for any data unit ('0' to '7') of the second string SRC2, the result of comparing it with any data unit ('0' to '7') of the first string SRC1. When the instruction requires the Ranges mode, the logic operation unit 106 uses the outputs of the comparators cmp0 to cmp7 to determine how each data unit ('0' to '7') of the second string SRC2 lies relative to the ranges bounded by the data units ('0' to '7') of the first string SRC1. For example, it checks whether data unit '0' of the second string SRC2 falls within the range between data units '0' and '1' of the first string SRC1. If so, a hit signal is output; if not, it checks whether the data unit falls within the range between data units '1' and '2' of the first string SRC1, and so on until a matching range is found or all ranges have been checked. The shift register 104 then shifts, data unit '1' of the second string SRC2 is compared in the same way, and so on until data unit '7' of the second string SRC2 has been compared.
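The Equal Any and Ranges flows described above can be sketched in software as follows. This is a minimal model under several assumptions that the text does not fix: the names `shift`, `equal_any`, and `ranges` are hypothetical, cells vacated by a shift are filled with zero, range bounds are treated as inclusive, ranges are formed from adjacent SRC1 units (i, i+1) as the example in the text suggests, and validity masking by string length is left out.

```python
from typing import List


def shift(reg: List[int]) -> List[int]:
    """Move every data unit one cell toward the lowest-order cell (zero fill assumed)."""
    return reg[1:] + [0]


def equal_any(src1: List[int], src2: List[int]) -> List[bool]:
    """One hit flag per SRC2 data unit: does it equal any SRC1 data unit?"""
    reg = list(src2)
    hits = []
    for _ in range(len(src2)):            # one clock cycle per SRC2 data unit
        low = reg[0]                      # first mux mode: all comparators see cell 0
        hits.append(any(c == low for c in src1))
        reg = shift(reg)
    return hits


def ranges(src1: List[int], src2: List[int]) -> List[bool]:
    """One hit flag per SRC2 data unit: does it fall within any adjacent SRC1 range?"""
    reg = list(src2)
    hits = []
    for _ in range(len(src2)):
        low = reg[0]
        ge = [low >= c for c in src1]     # unit is at or above each SRC1 unit
        le = [low <= c for c in src1]     # unit is at or below each SRC1 unit
        hits.append(any(ge[i] and le[i + 1] for i in range(len(src1) - 1)))
        reg = shift(reg)
    return hits


any_hits = equal_any(list(b"aeiou"), list(b"hello"))
range_hits = ranges(list(b"azAZ"), list(b"q7Kz"))
```

Here `any_hits` flags the 'e' and 'o' of "hello", and `range_hits` flags 'q', 'K', and 'z' but not '7', since '7' lies in none of the ranges a..z, z..A, or A..Z.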
The second mode of the multiplexers Mux1 to Mux7 serves the Equal Each and Equal Ordered comparison modes. In a single clock cycle, the comparators cmp0 to cmp7 compare the contents of the buffer cells of the string buffer 102 (data units '0' to '7' of the first string SRC1) one-to-one with the contents of the buffer cells of the shift register 104 (data units '0' to '7' of the second string SRC2). That is, data unit '0' of SRC1 is compared with data unit '0' of SRC2, data unit '1' of SRC1 with data unit '1' of SRC2, and so on up to data unit '7' of SRC1 with data unit '7' of SRC2. When the instruction requires the Equal Each mode, the logic operation unit 106 uses the outputs of the comparators cmp0 to cmp7 to compute whether the first string SRC1 and the second string SRC2 are equal at each corresponding position. When the instruction requires the Equal Ordered mode, the logic operation unit 106 uses the outputs of the comparators cmp0 to cmp7 to identify the substring shared by SRC1 and SRC2. Specifically, in the first clock cycle the comparators compare the contents of SRC1 (data units '0' to '7') one-to-one with the SRC2 contents in the shift register 104 (data units '0' to '7') to check for a hit. In the second clock cycle the shift register 104 shifts right, and the comparators compare SRC1 (data units '0' to '7') one-to-one with the SRC2 contents now in the shift register 104 (data units '1' to '7'). This continues until the eighth clock cycle, in which the comparators compare SRC1 (data units '0' to '7') with the SRC2 content remaining in the shift register 104 (only data unit '7').
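A software sketch of the Equal Each and Equal Ordered flows may help here. It is illustrative only and rests on assumptions: the names `equal_each` and `equal_ordered` are hypothetical, the valid lengths `len1` and `len2` are taken as already known, vacated cells are zero-filled on each shift, and a hit at a given offset is reported only when the whole of SRC1 fits inside the valid part of SRC2 (the text does not specify how partial matches at the end are treated).

```python
from typing import List


def equal_each(src1: List[int], src2: List[int]) -> List[bool]:
    """Single cycle, second mux mode: position-by-position equality."""
    return [a == b for a, b in zip(src1, src2)]


def equal_ordered(src1: List[int], len1: int, src2: List[int], len2: int) -> List[bool]:
    """One flag per shift offset k: does SRC1 occur in SRC2 starting at offset k?"""
    reg = list(src2)
    hits = []
    for k in range(len(src2)):                       # one clock cycle per offset
        cmp_out = [a == b for a, b in zip(src1, reg)]  # cmp0..cmp7, one-to-one
        fits = k + len1 <= len2                      # assumption: no partial matches
        hits.append(fits and all(cmp_out[:len1]))
        reg = reg[1:] + [0]                          # shift toward the lowest-order cell
    return hits


needle = list(b"lo".ljust(8, b"\0"))        # SRC1, valid length 2
haystack = list(b"hellolo".ljust(8, b"\0"))  # SRC2, valid length 7
each_hits = equal_each(list(b"hello"), list(b"help!"))
ordered_hits = equal_ordered(needle, 2, haystack, 7)
```

With these inputs, `ordered_hits` is true at offsets 3 and 5, the two places where "lo" occurs in "hellolo", and `each_hits` is true for the first three positions only.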
In summary, the logic operation unit 106 computes the comparison result of the first string SRC1 and the second string SRC2 from the outputs of the comparators cmp0 to cmp7, according to the comparison mode required by the string comparison instruction.
When a string comparison instruction is executed, the valid lengths of its operands (the first string SRC1 and the second string SRC2 being compared) must be identified first so that the output of the logic operation unit 106 can be processed accordingly; outputs corresponding to data units at invalid positions are simply discarded. The prior art requires additional zero comparators for this purpose, which raises hardware cost. In one embodiment of the present invention, the comparators cmp0 to cmp7 used to execute the string comparison instruction are reused to identify the valid length of the first string SRC1 or the second string SRC2. Specifically, the second mode of the multiplexers Mux1 to Mux7 can also be reused to identify the valid length of SRC1. To do so, the string buffer 102 buffers the first string SRC1 and the shift register 104 is loaded with the zero value. In a single clock cycle, the comparators cmp0 to cmp7 compare the contents of the buffer cells of the string buffer 102 (data units '0' to '7' of SRC1) one-to-one with eight data units that are all zero, and the logic operation unit 106 identifies the valid length of SRC1 from the result. For the valid length of the second string SRC2, the embodiment of FIG. 1 uses the zero comparator Zero_cmp to compare SRC2 with the zero value, and the logic operation unit 106 identifies the valid length of SRC2 from the output of Zero_cmp. The valid lengths of SRC1 and SRC2 can be obtained in the same clock cycle, during the initialization stage of the string comparison instruction. In other embodiments, the comparators cmp0 to cmp7 are instead reused to identify the valid length of SRC2. To do so, the shift register 104 buffers the second string SRC2 and the string buffer 102 is loaded with the zero value. In a single clock cycle, the comparators cmp0 to cmp7 compare the contents of the buffer cells of the shift register 104 (which does not shift at this time) one-to-one with eight all-zero data units, so the logic operation unit 106 identifies the valid length of SRC2 in one clock cycle. In this embodiment, the valid length of SRC1 is identified in the same clock cycle by another zero comparator that compares SRC1 with the zero value.
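The valid-length identification described above amounts to comparing every cell with zero and locating the first match. The following sketch models that step; the name `valid_length` is hypothetical, and it assumes a zero-terminated string in which the first zero data unit marks the end of the valid data.

```python
from typing import List


def valid_length(cells: List[int]) -> int:
    """Valid length of a zero-terminated string held in the buffer cells."""
    zeros = [0] * len(cells)                         # other comparator input: all zero
    is_zero = [a == b for a, b in zip(cells, zeros)]  # cmp0..cmp7, second mux mode
    # The position of the first zero cell is the valid length; with no zero
    # cell, the whole buffer is valid.
    return is_zero.index(True) if True in is_zero else len(cells)


len_abc = valid_length(list(b"abc".ljust(8, b"\0")))
len_full = valid_length(list(b"abcdefgh"))
len_empty = valid_length([0] * 8)
```

For the three inputs shown, the function yields 3, 8, and 0 respectively.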
The design of the central processing unit 100 shown in FIG. 1 is a hardware architecture within the execution unit of a central processing unit, in particular, for example, within a single instruction, multiple data (SIMD) execution unit that executes string comparison instructions. In the prior art, the large number of comparisons required by a string comparison instruction (including the comparisons with zero used to identify string valid lengths and the pairwise comparisons among the multiple basic units of the two strings) are performed by reusing the comparators of the integer unit (IU) of the execution unit, which occupies considerable hardware resources. Moreover, one string comparison instruction must be decoded into as many as hundreds of micro-operations (μops) that take many clock cycles to complete; these micro-operations separately implement valid-length identification, buffering, comparison, shifting, and comparison-result generation. With the execution unit microarchitecture described above, the string comparison instruction can instead be implemented with only three micro-operations: a string valid-length identification μop, a comparison μop, and a comparison-result generation μop (described in detail with FIG. 2 below). This occupies fewer hardware resources and shortens the execution time of the string comparison instruction. The execution unit architecture is common to all comparison modes required by the string comparison instruction (Equal Any, Ranges, Equal Each, and Equal Ordered) and also identifies the valid lengths of the first string SRC1 and the second string SRC2 within a clock cycle. Compared with conventional string comparison instructions that do not involve a pipeline design, this pipelined design significantly speeds up execution of the string comparison instruction. In addition, the size of the data unit can be chosen flexibly, for example 1 byte, 2 bytes (1 character), 4 bytes (2 characters), 8 bytes (4 characters), 16 bytes (8 characters), and so on, depending on the trade-off between pipeline complexity and clock cycle consumption. The string comparison instruction is, for example, one of the string comparison instructions newly added in the SSE4.2 instruction set, such as the SSE PCMPxSTRx instructions, which include PCMPESTRI, PCMPESTRM, PCMPISTRI, and PCMPISTRM, string comparison instructions corresponding to different comparison modes.
FIG. 2 is a flowchart showing that, with the microarchitecture of the central processing unit of the present invention, a string comparison instruction can be converted into three CPU micro-operations (μops) performed in sequence: the string valid-length identification micro-operation 202, the comparison micro-operation 204, and the comparison-result generation micro-operation 206. The identification of the valid lengths of the first string SRC1 and the second string SRC2 within a clock cycle, as described above, corresponds to micro-operation 202; at that time the multiplexers Mux1 to Mux7 operate in the second mode and the shift register 104 is loaded with the zero value (or, alternatively, the string buffer 102 is loaded with the zero value). When the comparison micro-operation 204 and the comparison-result generation micro-operation 206 execute, the shift register 104 is loaded with the second string SRC2 (or, in the alternative, the string buffer 102 is loaded with the first string SRC1), and the multiplexers Mux1 to Mux7 operate in the first or the second mode depending on the comparison mode required by the string comparison instruction. The operation of the logic operation unit 106 corresponds to the comparison-result generation micro-operation 206.
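The three-step flow of FIG. 2 can be mimicked in software as three functions called in sequence. This is a rough sketch rather than the claimed microarchitecture: the function names, the mode strings `"equal_any"` and `"equal_each"`, the `pad` helper, and the rule that outputs at invalid positions are reported as False are all assumptions made for illustration, and only two of the four comparison modes are modeled.

```python
from typing import List, Tuple


def pad(s: bytes, n: int = 8) -> List[int]:
    """Hypothetical helper: zero-pad a byte string to n data units."""
    return list(s.ljust(n, b"\0"))


def uop_202_lengths(src1: List[int], src2: List[int]) -> Tuple[int, int]:
    """Step 202: valid lengths, found by comparing every cell with zero."""
    def first_zero(cells: List[int]) -> int:
        return next((i for i, c in enumerate(cells) if c == 0), len(cells))
    return first_zero(src1), first_zero(src2)


def uop_204_compare(src1: List[int], src2: List[int], mode: str) -> List[List[bool]]:
    """Step 204: raw comparator outputs, one row per clock cycle."""
    reg = list(src2)
    rows = []
    cycles = 1 if mode == "equal_each" else len(src2)
    for _ in range(cycles):
        if mode == "equal_any":                      # first mux mode
            rows.append([c == reg[0] for c in src1])
            reg = reg[1:] + [0]                      # shift, zero fill assumed
        else:                                        # second mux mode
            rows.append([a == b for a, b in zip(src1, reg)])
    return rows


def uop_206_result(rows: List[List[bool]], len1: int, len2: int, mode: str) -> List[bool]:
    """Step 206: combine comparator outputs, discarding invalid positions."""
    if mode == "equal_any":
        return [k < len2 and any(row[:len1]) for k, row in enumerate(rows)]
    row = rows[0]
    return [i < len1 and i < len2 and row[i] for i in range(len(row))]


def string_compare(a: bytes, b: bytes, mode: str) -> List[bool]:
    src1, src2 = pad(a), pad(b)
    len1, len2 = uop_202_lengths(src1, src2)
    rows = uop_204_compare(src1, src2, mode)
    return uop_206_result(rows, len1, len2, mode)


any_result = string_compare(b"aeiou", b"hello", "equal_any")
each_result = string_compare(b"hello", b"help!", "equal_each")
```

The masking in `uop_206_result` is what keeps the zero padding of the two operands from producing spurious hits, which is why the valid lengths are identified before the comparison results are generated.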
FIG. 3 illustrates the contents of the shift register 104 in sequence. When the string valid-length identification micro-operation 202 executes (label 302), the shift register 104 stores the zero value. When the comparison micro-operation 204 executes (labels 304 to 318), data units '0' to '7' of the second string SRC2 loaded into the shift register 104 are shifted sequentially over eight clock cycles, so the lowest-order buffer cell of the shift register 104 holds data units '0', '1', '2', '3', '4', '5', '6', and '7' in turn.
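The sequence of register states in FIG. 3 can be reproduced with a short generator. The sketch below is illustrative only: the name `snapshots` is hypothetical, data units are represented by their labels '0' to '7', the marker 'Z' stands for a zero-valued cell, and filling vacated cells with zero during shifting is an assumption.

```python
from typing import Iterator, List


def snapshots(src2: List[str], zero: str = "Z") -> Iterator[List[str]]:
    """Yield the shift register contents for state 302 and then states 304 to 318."""
    n = len(src2)
    yield [zero] * n              # 302: zero value loaded for length identification
    reg = list(src2)              # 304: SRC2 loaded for the comparison step
    for _ in range(n):
        yield list(reg)
        reg = reg[1:] + [zero]    # shift one cell toward the lowest-order cell


states = list(snapshots([str(i) for i in range(8)]))
lowest_cells = [s[0] for s in states[1:]]   # lowest-order cell in each compare cycle
```

`states` holds nine snapshots (one all-zero state followed by eight compare cycles), and `lowest_cells` lists the labels '0' through '7' in order.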
Another embodiment of the present invention provides a method for operating a central processing unit. With reference to FIG. 1, the method executes a string comparison instruction on the central processing unit 100 and includes: storing a plurality of data units ('0' to '7') of the first string SRC1 in a plurality of buffer cells of the string buffer 102 of the central processing unit 100; shifting a plurality of data units ('0' to '7') of the second string SRC2 step by step among a plurality of buffer cells of the shift register 104 of the central processing unit 100; comparing the contents of the buffer cells of the string buffer 102 with the contents of the buffer cells of the shift register 104; and, based on the result of the comparison, computing the comparison result of the first string SRC1 and the second string SRC2 according to the comparison mode required by the string comparison instruction.
In other embodiments, the multiplexers Mux1 to Mux7 are not used. The microarchitecture is dedicated either to the Equal Any or Ranges comparison modes required by the string comparison instruction, or to the Equal Each or Equal Ordered comparison modes.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610485700.3A (CN106201440B) | 2016-06-28 | 2016-06-28 | The central processing unit of character string comparison optimization and its operating method |
| TW105138801A (TWI616812B) | 2016-06-28 | 2016-11-25 | Central processing unit and operating method therefor with improved string comparisom instruction |
| Publication Number | Publication Date |
|---|---|
| CN106201440A | 2016-12-07 |
| CN106201440B | 2018-10-23 |
| Country | Link |
|---|---|
| CN (1) | CN106201440B (en) |
| TW (1) | TWI616812B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5295250A (en)* | 1990-02-26 | 1994-03-15 | Nec Corporation | Microprocessor having barrel shifter and direct path for directly rewriting output data of barrel shifter to its input |
| CN101251791A (en)* | 2006-09-22 | 2008-08-27 | 英特尔公司 | Instructions and logic for processing text strings |
| CN102736888A (en)* | 2012-07-02 | 2012-10-17 | 江汉大学 | Data retrieval circuit being synchronous with data stream |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5193167A (en)* | 1990-06-29 | 1993-03-09 | Digital Equipment Corporation | Ensuring data integrity by locked-load and conditional-store operations in a multiprocessor system |
| TWI244033B (en)* | 2003-11-26 | 2005-11-21 | Sunplus Technology Co Ltd | Processor capable of cross-boundary alignment of a plurality of register data and method of the same |
| US9588762B2 (en)* | 2012-03-15 | 2017-03-07 | International Business Machines Corporation | Vector find element not equal instruction |
| Publication number | Publication date |
|---|---|
| TW201800935A (en) | 2018-01-01 |
| CN106201440A (en) | 2016-12-07 |
| TWI616812B (en) | 2018-03-01 |
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP01 | Change in the name or title of a patent holder | Address after: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203. Patentee after: Shanghai Zhaoxin Semiconductor Co.,Ltd. Address before: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203. Patentee before: VIA ALLIANCE SEMICONDUCTOR Co.,Ltd. |