Distributed arithmetic design

Distributed arithmetic design

Introduction

Distributed arithmetic is an important algorithm for DSP applications. It is based on a bit level rearrangement of the MAC "multiply accumulate" operation to replace it with set of addition and shifting operations.

Theoretical derivation

Download the ps file or the pdf file for the theoretical derivation

Practical Derivation

This algorithm is based on the scaling accumulation algorithm described in [1]. This accumulator takes one parallel and one serial inputs. The parallel input in DA algorithm is considered to be a constant. Several Scaling accumulator units can be used in parallel to do the MAC operation for many terms. The next figer shows two serial inputs (A,B) that are multiplied by the constants (C1,C2). These constants and the AND operation with input bit can be considered as product terms.

The output of this circuit is A*C1 + B*C2

If we moved the adder-shifter circuits at the end of the circuit and using single circuit we eleminate large number of adders = N number of inputs to only one. The constants with the AND operators and partial sums can represent product terms that can have predefined equations. These equations can be implemented using ROM where its contents are defined by the constants and its address the inputs bits (A,B in our example)

Since only one bit of each input goes to the ROM address The rom contents is defined as follows assuming we have three inputs:

Addr (000) => 0

Addr (001) => C0

Addr (010) => C1

Addr (011) => C0+C1

Addr (100) => C2

Addr (101) => C0+C2

Addr (110) => C1+C2

Addr (111) => C0+C1+C2

This algorithm can be implemented in many different approaches depending on he system constraints. For more information refer to the links below.

The Implementation I used is shown in the next figure where the first block is a 2D shifter that is connected to the address lines of the Distributed Arithmetic Look-up Table (DALUT). The result is added serially to produce filter's result.

Implementation

The VHDL code is available on the opencores CVS under arith_cores modules.

The design can take N number of tabs and M number of bits for data and constants. These parameters can be changed in DApkg.vhd.
The current design has 4 tabs and 8 bits data size.
all numbers are unsigned
The output become valid after M clocks and it is indicated by valid signal.
Overflow signal indicates an overflow in the result if it exceeds M the bits.
a pipeline is used between the da lut result and the serial adder to speed up the design.
The input is sampled on the Mth clock.
There is a software written in both C and perl to generate the system parameters
Check the JBits implementation of the core which is suitable for run time reconfiguration

Source files:

da.vhd : defines the serial addition and input buffers.
dalut.vhd : defines the DA look up table.
DApkg.vhd : defines some system constants, parameters and components
Mempkg.vhd : defines some procedures
da_tb.vhd: is the system test bench,
dapkg_gen.c: is a C program that generates Dapkg.vhd in order to customize all system parameters
da_pkg_fract.c: dapkg_gen modified to accept negative and fractional coefficients by James Edgar
dapkg_gen.pl: is a perl program that generates Dapkg.vhd in order to customize all system parameters

Advantages

It reduces the logic needed to implement MAC operator to only adders and shifters
It can be implemented on LUT "Look-up-table" based FPGAs without wasting resources
It has several implementation based on several trade-offs
Most FIR filters have constant operators so using DA algorithm they can be implemented with very custom circuits.

Resources and references

Last update: 30 September 2000