### A study on Wallace tree multiplier

Priyanka Mishra<sup>1,</sup> Seema Nayak<sup>2</sup>

Department of Electronics and Communication Engineering <sup>1,2</sup> IIMT Group of Colleges, Greator Noida (India)

### ABSTRACT

The need for low-power VLSI system arises from two main forces. First, with the steady growth of operating frequency and processing capacity per chip, large currents have to be delivered and the heat due to large power consumption must be removed by proper cooling techniques. Second, battery life in portable electronic devices is limited. Low power design directly leads to prolonged operation time in these portable devices. The principle of Wallace tree multiplication involves algorithm by grouping the partial products into sets of 3.

Keywords: Arithmetic And Logic Unit, Wallace Tree Multiplier, Fast Fourier Transform (FFT),

### **I INTRODUCTION**

Multipliers are one the most important component of many systems. In high speed digital signal processing (DSP) and image processing multiplier play an vital role. In image processing fast Fourier transform (FFT) is one of the most important transform often used. A computational process of fast Fourier transform requires large number of multiplication and addition operation. The execution of these algorithms requires dedicated MAC and Arithmetic and Logic Unit (ALU) architectures. Multipliers and adders are the key element of these arithmetic units as they lie in the critical path.

With the recent advances in technology, many researchers have tried to implement increasingly efficient multiplier. They aim at offering low power consumption, high speed and reduced delay. Multiplication is a fundamental operation in most signal processing algorithms. Multipliers have large area, long latency and consume considerable power. Therefore low-power multiplier design has been an important part in low- power VLSI system design. It is an easily hardware implementable and efficient methodology, that multiplies two integers, proposed by an Australian Computer Scientist Chris Wallace. For unsigned multiplication, up to n shifted copies of the multiplicand are added to form the result. The entire procedure is carried out into three steps: partial product (PP) generation, partial product grouping & reduction, and final addition.

There has been extensive work on low-power multipliers at technology, physical, circuit and logic levels. A system's performance is generally determined by the performance of the multiplier because the multiplier is generally the slowest element in the system. Furthermore, it is generally the most area consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints so that improving speed results mostly in larger areas. As a result, a whole spectrum of multipliers with different area- speed constraints has been designed with fully parallel.

Fully Parallel Multipliers at one end of the spectrum and fully serial multipliers at the other end. In between are digit serial multipliers where single digits consisting of several bits are operated on. These multipliers have moderate performance in both speed and area. However, existing digit serial multipliers have been plagued by complicated switching systems and/or irregularities in design. Radix 2<sup>n</sup> multipliers which operate on digits in a parallel fashion instead of bits bring the pipelining to the digit level and avoid most of' the above problems. These structures are iterative and modular. The pipelining done at the digit level brings the benefit of constant operation speed irrespective of the size of the multiplier. The clock speed is only determined by the digit size which is already fixed before the design is implemented. Multiplication consists of three steps: generation of partial products or (PPG), reduction of partial products (PPR), and finally carry-propagate addition (CPA).In general there are sequential and combinational multiplier implementations. We only consider combinational case here because the scale of integration now is large enough to accept parallel multiplier implementations in digital VLSI systems. One such multiplier is Standard Wallace Multiplier (SWM). SWM is fully parallel version of the multiplier, the carry save adders used in SWM are conventional full adders whose carries are not connected, so that three inputs are taken and two words are out. SWM also uses half adders in reduction phase. The basic principle of Wallace tree multiplication is ,for an n x n multiplication there are n2 partial products that have to be summed. The 1st step in the algorithm involves grouping the partial products into sets of 3. It is

considerably faster than a simple array multiplier because its height is logarithmic in word size, not linear. However, in addition to the large number of adders required, the Wallace tree's wiring is much less regular and more complicated. As a result, Wallace trees are often avoided by designers, while design complexity is a concern to them.

Wallace tree styles use a log-depth tree network for reduction. Faster, but irregular, they trade ease of layout for speed. Wallace tree styles are generally avoided for low power applications, since excess of wiring is likely to consume extra power. While subsequently faster than Carry-save structure for large bit multipliers, the Wallace tree multiplier has the disadvantage of being very irregular, which complicates the task of coming with an efficient layout. The Wallace tree multiplier is a high speed multiplier. The summing of the partial product bits in parallel using a tree of carry-save adders became generally known as the "Wallace Tree".

#### **II MATERIAL AND METHOD**

Digital signal processors are designed in such a way that it has a feature of DSP algorithm in real time. The basic building blocks of DSP are multiplier, arithmetic logic unit and multiply and accumulate unit. For real time applications the key issues in the design of computational building blocks of digital signal processor are speed and accuracy [1]. Multiplication is essential for computers, process controllers, microprocessors, digital signal processing and graphic engines. In DSP processor, multiplication is done by multiplier. So, system performance is determined by the performance of the multiplier. Hence various multiplier architectures have been proposed to increase the performance of the multiplier

There are different factors that drive for high performance electronic system design in terms of low power dissipation and high speed [2-3]. With the recent advances in technology, many researchers have worked on the design of increasingly more efficient multipliers. Main aim is to offer higher speed and lower power dissipation.

This makes them compatible for various complex and portable VLSI circuit implementations. The block diagram of multiplier architecture is shown in Fig.1. The multiplier architecture consists of three parts: generation of partial product, addition of partial product and final addition.



#### Fig.1.Block diagram of multiplier architecture

Parallel multipliers are the high speed multiplier. There are so many multiplier architectures that are available for designing parallel multiplier. Wallace multiplier is one of them. A Wallace tree is an efficient hardware implementation of a digital circuit. It can multiply two integers. A multiplier based on Wallace-tree structure is called Wallace multiplier. It is substantially faster than other multiplier [1].

The multiplier is composed of half adder, full adder and AND gate. The designing of half and full adder is designed with CMOS technology because of the various advantages of the circuit. It provides high input impedance. The input signal is driving electrodes with a layer of insulation (the metal oxide) between them and what they are controlling. This gives them a small amount of capacitance, but virtually infinite resistance. The current into or out of CMOS input held at one level is just leakage, usually 1 nano Ampere or less.CMOS logic takes very little power when held in a fixed state. The current consumption comes from switching as those capacitors are charged and discharged. Even then, it has good speed to power ratio compared to other logic types.CMOS gates are very simple. The basic gate is an inverter, which is only two transistors. This together with the low power consumption means it lends itself well to dense integration. Or conversely, you get a lot of logic for the size, cost and power. Figure 2 and 3 shows the designing of half and full adder with the CMOS technology.



#### Figure 2:Designing of half adder with CMOS technology



Fig 3: Designing of full adder with CMOS technology

The figure4, given below shows the example of 4x4 Wallace tree multiplier that speed up the multiplier by decreasing the number of stages, grouping the partial product at an interval of three line and then adding.Otherwise, pass it to the next stage.Finally, the remaining two lines, calculation is done by using fast adder and ripple carry adder.



Fig.4. 4x4 Wallace tree multiplier with partial product and various stages





#### 2.1 Various stages in Wallace tree multiplier

Numbers are binary integers, It is "long hand" multiplication.Products (each in blue square) are the result of simple AND gate.All products are done simultaneously.Requires N square AND gates. To add up the columns, add up three rows at a time.The result for each set of three row is a set of two rows.Each resulting set of two rows has a row for the sum and a row for the carry out.Odd rows are left alone.Repeat the processThere are two set of three rowsResult is two set of two rows.Gray boxes indicates that summation bits have been moved down to the carry out row.Repeat the processThis time, there is only one set of three rows, plus an extra row to carry down.Result is three rows, two from the set of three plus the one carried down.All adders are done in parallel.Repeat the process one last time.Remaining three rows become two rows.In this example, stage 2 had 4 steps, and four full adder delays.The five LSBs have already been calculated.**Final Addition**,Final result is calculated by adding the final two rows.The result is Wallace tree multiplier.

#### **III CONCLUSION**

It can be concluded that Wallace Multiplier is superior in all respect like speed, delay, area, complexity, power consumption. However Array Multiplier requires more power consumption and gives optimum number of components required, but delay for this multiplier is larger. Hence for low power requirement and for less delay requirement Wallace tree multiplier is suggested. This gives efficient algorithms or formulae for multiplication which increase the speed of devices.Wallace multiplier is an efficient parallel multiplier.If the multiplication is of NxN bits, then it produces N square partial product.Wallace tree multiplier or Wallace multiplier is the most popular multiplier among the existing multipliers. Wallace multiplier is also known for its fast speed and low power consumption.

In multimedia and communication systems, FIR filters, digital signal processor, microprocessors etc. Many current DSP applications are targeted at portable, battery-operated systems, so that power dissipation becomes one of the primary design constraints. DSP, Image processing architectures and microprocessors. Fast Fourier Transform (FFT), Discrete and in Wavelet Transform (DWT) and auto-correlation.

#### REFERENCES

- Wallace, C. S., "A Suggestion for a Fast Multiplier", IEEE Transactions on Computers, vol. 13, pp. 14-17, 1964.
- [2] Y. Harata; Y. Nakamura; H. Nagase; M. Takigawa; N. Takagi,"A highspeed multiplier using a redundant binary adder tree", IEEE Journal of Solid-State Circuits, vol. 22, pp. 28-34,1987.
- [3] P. G. McCrea; W. S. Matheson,"Design of high-speed fully serial tree multiplier", IEEE Proceedings E -Computers and Digital Techniques, vol. 128, pp. 13-20, 1981.
- [4] G.Goto; T.Sato; M.Nakajima; T.Sukemura,"A 54×54 regularly structured tree multiplier",IEEE Journal of Solid-State Circuits,vol. 27,pp. 1229-1236,1992.

- [5] J. Fadavi-Ardekani," M\*N Booth encoded multiplier generator using optimized Wallace trees", IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Volume:1, Issue: 2), PP. 120-125, 1993.
- [6] B. Parhami; S. Kawahito; M. Ishida; T. Nakamura; M. Kameyama; T. Higuchi," Comments on "High-speed area-efficient multiplier design using multiple-valued current-mode circuits""IEEE Transactions on Computers, vol. 45, pp. 637-639, 1996.
- [7] J. B. Kuo; K. W. Su; J. H. Lou," 1.5V BiCMOS dynamic multiplier using Wallace tree reduction architecture", Electronics Letters, vol. 29, pp. 2097-2098, 1993.
- [8] S. Kawahito; M. Ishida; T. Nakamura; M. Kameyama; T. Higuchi," High-speed area-efficient multiplier design using multiple-valued current-mode circuits", IEEE Transactions on Computers, vol. 43, pp. 34-42, 1994.
- [9] J. B. Kuo; K. W. Su; J. H. Lou," A BiCMOS dynamic multiplier using Wallace tree reduction architecture and 1.5 V full-swing BiCMOS dynamic logic circuit", Proceedings of IEEE International Symposium on Circuits and Systems - ISCAS '94,vol. 4,pp. 323-326,1994.
- [10] V. G. Oklobdzija; D. Villeger,"Improving multiplier design by using improved column compression treeand optimized final adder in CMOS technology",IEEE Transactions on Very Large Scale Integration (VLSI) Systems,vol. 3,pp. 292-301,1995.
- [11] Shen-Fu Hsiao; Ming-Roun Jiang; Jia-Sien Yeh," Design of high-speed low-power 3-2 counter and 4-2 compressor for fastmultipliers", Electronics Letters, vol.34, pp. 341-343, 1998.
- [12] Wen-Chang Yeh; Chein-Wei Jen," High-speed Booth encoded parallel multiplier design", IEEE Transactions on Computers, vol. 49, pp.692-701, 2000.
- [13] N. Itoh; Y. Naemura; H. Makino; Y. Nakase; T. Yoshihara; Y. Horiba, "A 600-MHz 54×54-bit multiplier with rectangular-styled Wallace tree", IEEE Journal of Solid-State Circuits,vol. 36,pp. 249-257,2001.
- [14] Vinoth, C., Bhaaskaran, V. S. K., Brindha, B., Sakthikumaran, S., Kavinilavu, V., Bhaskar, B., Kanagasabapathy, M., and Sharath, B., "A Novel Low Power and High Speed Wallace Tree Multiplier for RISC Processor", IEEE 3rd International Conference on Electronics Computer Technology, Kanyakumari, pp. 330 – 334, 2011.
- [15] Dubey, S., and Rao, M. J., "A High Speed and Area Efficient Booth Recoded Wallace Tree Multiplier for fast Arithmetic Circuits", IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics, Hyderabad, pp. 220 – 223, 2012.