
Parallel processing (DSP implementation)

From Wikipedia, the free encyclopedia

In digital signal processing (DSP), parallel processing is a technique that duplicates function units so that different tasks (signals) are processed simultaneously.[1] The same processing can therefore be performed on different signals by the corresponding duplicated function units. Because of this, a parallel DSP design often produces multiple outputs per clock cycle, resulting in higher throughput than a non-parallel design.

Conceptual example


Consider a function unit ($F_0$) and three tasks ($T_0$, $T_1$, and $T_2$). The time required for the function unit $F_0$ to process these tasks is $t_0$, $t_1$, and $t_2$, respectively. If the three tasks are processed in sequential order, the total time required to complete them is $t_0 + t_1 + t_2$.


However, if we make two additional copies of the function unit, so that each task has its own unit, the aggregate time is reduced to $\max(t_0, t_1, t_2)$, which is smaller than the sequential total.
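As a minimal sketch of this timing argument, the sequential and duplicated cases reduce to a sum and a maximum. The task durations and function names below are made-up illustrations, not values from the article.

```python
# Minimal sketch: total time to process three tasks on one function unit
# (sequential) versus on three duplicated units (parallel).
# Task times are hypothetical example values.

def sequential_time(task_times):
    """A single unit F0 processes the tasks one after another."""
    return sum(task_times)


def parallel_time(task_times):
    """One duplicated unit per task; all tasks start together, so the
    batch finishes when the slowest task finishes."""
    return max(task_times)


if __name__ == "__main__":
    t = [4.0, 7.0, 5.0]  # hypothetical t0, t1, t2 (arbitrary time units)
    print("sequential:", sequential_time(t))  # 16.0
    print("parallel:  ", parallel_time(t))    # 7.0
```

The saving comes entirely from hardware duplication: the parallel case needs three units instead of one.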


Versus pipelining


Mechanism:

  • Parallel: duplicated function units working in parallel
    • Each task is processed entirely by a different function unit.
  • Pipelining: different function units working in parallel
    • Each task is split into a sequence of sub-tasks, which are handled by specialized and different function units.

Objective:

  • Pipelining reduces the critical path, which can either increase the sample speed or reduce power consumption at the same speed, yielding higher performance per watt.
  • Parallel processing computes multiple outputs in parallel within one clock period. Therefore, the effective sample speed is increased by the level of parallelism.
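The difference in objectives can be illustrated with an idealized calculation. The sketch below assumes a design whose critical path is `t_crit`, that pipelining splits it into `m` perfectly balanced stages with no register overhead, and that an `l`-parallel design keeps the original critical path as its clock period. These simplifications and the function names are illustrative assumptions, not part of the article.

```python
# Idealized sketch: (clock period, sample period) of pipelined vs parallel
# designs. Assumes perfectly balanced pipeline stages and no register
# (latch) overhead; function names are hypothetical.

def pipelined_periods(t_crit, m):
    """m-stage pipeline: the critical path shrinks to t_crit / m and one
    sample is produced per clock, so T_sample == T_clock."""
    t_clock = t_crit / m
    return t_clock, t_clock


def parallel_periods(t_crit, l):
    """l-parallel design: the critical path (and clock period) is unchanged,
    but l outputs are produced per clock, so T_sample = T_clock / l."""
    t_clock = t_crit
    return t_clock, t_clock / l


if __name__ == "__main__":
    t_crit = 12.0  # hypothetical critical path, e.g. in ns
    print("3-stage pipeline (T_clock, T_sample):", pipelined_periods(t_crit, 3))
    print("3-parallel       (T_clock, T_sample):", parallel_periods(t_crit, 3))
```

Under these assumptions both designs reach the same effective sample period, but only the pipelined one needs a faster clock; the parallel one instead needs three copies of the hardware.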

When both parallel processing and pipelining can be applied, it is generally better to choose parallel processing, for the following reasons:

  • Pipelining usually causes I/O bottlenecks
  • Parallel processing can also be used to reduce power consumption by operating with slower clocks
  • A hybrid of pipelining and parallel processing further increases the speed of the architecture

Parallel FIR filters


Consider a 3-tap FIR filter:[2]

$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$

which is shown in the following figure.
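For reference, the filter can be sketched in its ordinary sequential form, computing one output per iteration. Zero initial conditions (samples before $x(0)$ taken as 0) and the name `fir3` are assumptions of this sketch.

```python
# Minimal sketch of the 3-tap FIR filter y(n) = a*x(n) + b*x(n-1) + c*x(n-2),
# computed one output per iteration (the sequential, non-parallel form).
# Samples before x(0) are assumed to be zero.

def fir3(x, a, b, c):
    y = []
    x1 = x2 = 0.0  # delay line holding x(n-1) and x(n-2)
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn  # shift the delay line by one sample
    return y
```

Feeding in a unit impulse returns the coefficients themselves, e.g. `fir3([1, 0, 0, 0], 2, 3, 4)` gives `[2.0, 3.0, 4, 0]`.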

Assume that the computation time of a multiplication unit is $T_m$ and that of an addition unit is $T_a$. The sample period is then bounded by

$$T_{\text{sample}} \geq T_m + 2T_a$$

Parallelizing the filter results in the architecture shown as follows. The sample period now becomes

$$T_{\text{sample}} \geq \frac{T_{\text{clock}}}{N} = \frac{T_m + 2T_a}{3}$$

where $N$ is the number of copies (here $N = 3$).

Note that in a parallel system $T_{\text{sample}} \neq T_{\text{clock}}$, whereas $T_{\text{sample}} = T_{\text{clock}}$ holds in a pipelined system.
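A behavioral sketch of the 3-parallel version may help. Each loop iteration stands for one clock cycle in which three inputs are consumed and three outputs $y(3k)$, $y(3k+1)$, $y(3k+2)$ are produced by three copies of the multiply-add datapath; the block equations follow directly from the filter equation $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$. The function names, the zero initial conditions, and the requirement that the input length be a multiple of 3 are assumptions of this sketch.

```python
# Behavioral sketch of a 3-parallel (block) 3-tap FIR filter.
# Each loop iteration models one clock cycle: three inputs in, three outputs
# out. Assumes len(x) is a multiple of 3 and zero samples before x(0).

def fir3_parallel3(x, a, b, c):
    if len(x) % 3:
        raise ValueError("input length must be a multiple of 3")
    y = []
    xm1 = xm2 = 0.0  # block-delayed samples x(3k-1) and x(3k-2)
    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]  # x(3k), x(3k+1), x(3k+2)
        y0 = a * x0 + b * xm1 + c * xm2  # y(3k)
        y1 = a * x1 + b * x0 + c * xm1   # y(3k+1)
        y2 = a * x2 + b * x1 + c * x0    # y(3k+2)
        y.extend([y0, y1, y2])
        xm1, xm2 = x2, x1  # become x(3k+2) and x(3k+1) for the next block
    return y


def sample_periods(t_m, t_a, n=3):
    """Clock period set by the critical path T_m + 2*T_a, and the effective
    sample period T_clock / n of the n-parallel design."""
    t_clock = t_m + 2 * t_a
    return t_clock, t_clock / n
```

For example, with the hypothetical values $T_m = 10$ ns and $T_a = 4$ ns, `sample_periods(10.0, 4.0)` gives a clock period of 18 ns and an effective sample period of 6 ns.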

Parallel 1st-order IIR filters


Consider the transfer function of a 1st-order IIR filter formulated as

$$H(z) = \frac{z^{-1}}{1 - a z^{-1}}$$

where $|a| < 1$ for stability; such a filter has only one pole, located at $z = a$.

The corresponding recursive representation is

$$y(n+1) = a\,y(n) + u(n)$$
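The recursion can be sketched directly as a loop that produces one output per iteration, as a non-parallel implementation would per clock. The initial state $y(0)$ is an assumed parameter (zero by default), and `iir1` is a hypothetical name.

```python
# Minimal sketch of the first-order recursion y(n+1) = a*y(n) + u(n),
# evaluated one sample per iteration. y0 is the assumed initial state y(0).

def iir1(u, a, y0=0.0):
    y = [y0]
    for un in u:
        y.append(a * y[-1] + un)  # y(n+1) from y(n) and u(n)
    return y  # y(0) ... y(len(u))
```

With `a = 0.5` and a unit impulse input, the output decays geometrically: `iir1([1, 0, 0], 0.5)` returns `[0.0, 1.0, 0.5, 0.25]`.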

Consider the design of a 4-parallel architecture ($N = 4$). In such a parallel system, each delay element represents a block delay, and the clock period is four times the sample period.

Iterating the recursion four times and then substituting $n = 4k$ gives

$$y(n+4) = a^4 y(n) + a^3 u(n) + a^2 u(n+1) + a\,u(n+2) + u(n+3)$$
$$\rightarrow y(4k+4) = a^4 y(4k) + a^3 u(4k) + a^2 u(4k+1) + a\,u(4k+2) + u(4k+3)$$

The corresponding architecture is shown as follows.

The resultant parallel design has the following properties.

  • The pole of the original filter is at $z = a$, while the pole of the parallel system is at $z = a^4$, which is closer to the origin (for example, $a = 0.9$ moves to $a^4 = 0.6561$).
  • This pole movement improves the robustness of the system to round-off noise.
  • The hardware complexity of this architecture is $N \times N$ multiply-add operations.

This quadratic increase in hardware complexity can be reduced by exploiting concurrency and incremental computation to avoid repeated calculations.
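The look-ahead derivation can be sketched for a general block size. Iterating $y(n+1) = a\,y(n) + u(n)$ $j$ times gives $y(Nk+j) = a^{j} y(Nk) + \sum_{i=0}^{j-1} a^{j-1-i} u(Nk+i)$ for $j = 1, \dots, N$; with $N = j = 4$ this is the $y(4k+4)$ equation of the 4-parallel design. In the behavioral model below, each outer iteration stands for one clock cycle that consumes $N$ inputs and produces $N$ outputs, and only $y(Nk+N)$ is fed back through the block delay. The names, the zero default initial state, and the requirement that the input length be a multiple of $N$ are illustrative assumptions.

```python
# Behavioral sketch of an N-parallel (block) first-order IIR filter using the
# look-ahead form of y(n+1) = a*y(n) + u(n):
#   y(N*k + j) = a**j * y(N*k) + sum_{i=0}^{j-1} a**(j-1-i) * u(N*k + i)
# Each outer iteration models one clock cycle. Names are hypothetical.

def iir1_parallel(u, a, n_par=4, y0=0.0):
    if len(u) % n_par:
        raise ValueError("input length must be a multiple of n_par")
    y = [y0]
    state = y0  # y(N*k), held in the block-delay register
    for k in range(0, len(u), n_par):
        block = u[k:k + n_par]  # u(N*k) ... u(N*k + N - 1)
        for j in range(1, n_par + 1):
            acc = a ** j * state
            for i in range(j):
                acc += a ** (j - 1 - i) * block[i]
            y.append(acc)  # y(N*k + j)
        state = y[-1]  # y(N*k + N) is fed back for the next block
    return y  # y(0) ... y(len(u))
```

Comparing the output with a plain sequential loop over the same input is a convenient check that the look-ahead equations were transcribed correctly.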

Parallel processing for low power


Another advantage of parallel processing is that it can reduce the power consumption of a system by allowing a lower supply voltage.

The power consumption of a conventional (sequential) CMOS circuit is

$$P_{\text{seq}} = C_{\text{total}} \cdot V_0^2 \cdot f$$

where $C_{\text{total}}$ is the total capacitance of the CMOS circuit, $V_0$ is the supply voltage, and $f$ is the clock frequency.

For a parallel version, the charging capacitance remains the same, but the total capacitance increases by a factor of $N$.

To maintain the same sample rate, the clock period of the $N$-parallel circuit is increased to $N$ times the propagation delay of the original circuit.

This lengthens the available charging time by a factor of $N$, so the supply voltage can be reduced to $\beta V_0$, where $\beta < 1$.

Therefore, the power consumption of the N-parallel system can be formulated as

$$P_{\text{para}} = (N C_{\text{total}}) \cdot (\beta V_0)^2 \cdot \frac{f}{N} = \beta^2 P_{\text{seq}}$$

where $\beta$ can be computed from

$$N(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
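$\beta$ can be found numerically. Expanding the equation gives the quadratic $N V_0^2 \beta^2 - \left(2 N V_0 V_t + (V_0 - V_t)^2\right)\beta + N V_t^2 = 0$, whose larger root is the physically meaningful one because it keeps $\beta V_0$ above the threshold voltage $V_t$. The sketch below also checks the result against the square-law propagation-delay approximation $T_{pd} \propto V/(V - V_t)^2$ with fixed charging capacitance. That delay model is an assumption of this sketch, under which the equation above corresponds to an $N$ times longer charging time. Function names and the example voltages are hypothetical.

```python
import math


def solve_beta(n, v0, vt):
    """Larger root of N*V0^2*b^2 - (2*N*V0*Vt + (V0-Vt)^2)*b + N*Vt^2 = 0,
    i.e. the solution of N*(b*V0 - Vt)^2 = b*(V0 - Vt)^2 with b*V0 > Vt."""
    qa = n * v0 ** 2
    qb = -(2 * n * v0 * vt + (v0 - vt) ** 2)
    qc = n * vt ** 2
    disc = qb ** 2 - 4 * qa * qc  # positive for V0 > Vt >= 0
    return (-qb + math.sqrt(disc)) / (2 * qa)


def prop_delay(v, vt, c_over_k=1.0):
    """Square-law delay approximation T_pd = (C/k) * V / (V - Vt)^2
    (an assumption of this sketch; C/k is normalized to 1 by default)."""
    return c_over_k * v / (v - vt) ** 2


def power_ratio(n, v0, vt):
    """P_para / P_seq = beta^2 for an N-parallel design at supply beta*V0."""
    return solve_beta(n, v0, vt) ** 2


if __name__ == "__main__":
    n, v0, vt = 3, 5.0, 0.6  # hypothetical example values
    beta = solve_beta(n, v0, vt)
    print("beta =", beta)  # about 0.467
    print("P_para / P_seq =", power_ratio(n, v0, vt))  # about 0.218
    # Sanity check: the lowered supply gives an N times longer delay.
    print(math.isclose(prop_delay(beta * v0, vt), n * prop_delay(v0, vt)))
```

With these example values, $\beta \approx 0.47$ and $P_{\text{para}} \approx 0.22\,P_{\text{seq}}$. For $N = 1$ the solver returns $\beta = 1$, as expected when no parallelism is used.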

References

  1. K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.
  2. Slides for VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999 (ISBN 0-471-24186-5): http://people.ece.umn.edu/~parhi/publications/books/
Retrieved from "https://en.wikipedia.org/w/index.php?title=Parallel_processing_(DSP_implementation)&oldid=1202019986"