Efficiency of A Self-Timed Ripple Adder

Feodor
Vainstein, Ph.D.

Professor of Computer Engineering

GTREP & ECE

Georgia Institute of Technology

6001 Chatham Center Dr., Suite 350

Savannah, GA 31405, USA

Tel: 912.790.3440, Fax: 912.651.7279

Email: fvainste@gtrep.gatech.edu,
vainstei@ece.gatech.edu** **

Lev B. Levitin, Ph.D.

Distinguished Professor of Engineering Science

Department of Electrical and Computer Engineering

Boston University

Boston, Massachusetts, >USA

Tel: 617.353.4607

Fax: 617.353.6440

Email: levitin@bu.edu

**Abstract**

An expression for the average carry propagation delay in a ripple carry adder is obtained which is exact up to terms of the order . The case of several adders working in parallel is also considered.

**Introduction **

Self-timed circuit techniques were first investigated in the 1950s by Huffman and Muller, and continued in the 1960s by Miller. However, clocked circuits were more economical in their use of components and were universally adopted. The advent of very large scale integration in the 1980s changed the balance and the availability of millions of devices on a chip means that clock skew is a more serious problem and also that the cost of components is less of an obstacle. This sounds like a panacea to designers of today's large-scale circuits but it requires a totally new methodology for design. Various problems of self-timed design discussed in the following publications [1-16]

A synchronous self-timed system consists of a number of functional units equipped by completion detectors. Initially, with a new clock pulse, completion its are cleared by the completion detectors circuitry when the operation being performed by units is completed. An AND gate is used to signal to the clock generator that all units completed their operations and a new clock pulse can be issued.

With this approach, the throughput of a system becomes considerably higher compared with the systems using traditional approach, when the rate of clock pulses is determined by the longest possible propagation delay in the functional units. In this paper we evaluate the potentials in speeding up for the case when we have only one functional unit that is a ripple carry adder. A more general case of m ripple carry adders operating in parallel is also considered.

An
estimate of the average carry propagation delay was first obtained in [17]. A
tighter upper bound was obtained in [18]. This paper presents a much simpler
analytical approach that yield an expression for the average delay which is
asymptotically exact with the remainder of _{ } is the number of full
adders in the ripple adder.

By definition, the propagation delay in a combinational circuit is the duration of the time interval between the moment when all the input signals are stabilized and the moment when all output signals are stabilized.

Propagation delay depends on both the present inputs and on the inputs immediately proceeding to the present ones. Often the term “propagation delay of a circuit” is used to denote the upper bound of propagation delays for all possible inputs.

Denote
by _{ }the maximum propagation delay in one full adder. It is clear
that in the worst case the propagation delay of n-bit carry ripple adder is
equal to _{ }. This value is used to determine the clock frequency of a
conventional (not self-timed) system. In this paper we will show that in the
average case the propagation delay is much smaller than _{ }. In fact it is equal to _{ }, where _{ } is a constant
approximately equal to 0.668.

It follows from this result that a 64-bit ripple carry adder can work on average almost 9 times faster if the self-timed approach is used.

**Stabilization
time determined by stoppers and runners **

Consider a ripple carry adder that consists of n full adders. We assume that the input bits

_{ } are independent random
variables taking on values of 0 and 1 with equal probabilities. At time _{ }, the output carry for some full adders will stabilize. In
particular, if _{ } then _{ }=1 in _{ } and _{ } will not change
thereafter unless a new set of inputs is applied.

Similarly
c then _{ }=0 in _{ } and _{ } will not change
thereafter unless a new set of inputs is applied.

Hence
if, for example, for every _{ } _{ }, then the carry propagation delay will be less than or equal
to _{ } and all the outputs of
the adder will be stable at time _{
}.

In fact, the actual
time depends on the values of carries that appeared as a result of computation
with inputs immediately preceding the present ones. In the most unfavorable
case, however, the carry propagation delay is determined by the maximum number
of consecutive bits of the input for which _{
}. Let us call input values _{
} “stoppers”, and values
_{
} “runners”. Obviously, for completely random
inputs, stoppers and runners occur independently with equal probabilities.
Denote stoppers by 0 and runners by 1. Then the total stabilization time _{
}, where _{
} is the random variable that is the maximum
length of a run of consecutive 1’s in a random sequence of 0’s and 1’s.** **

**The
average propagation delay in a ripple carry adder****. **

As known in previous section, the propagation delay in a ripple carry n-bit adder is a linear function of the maximum length of a run of 1’s in a random sequence of zeros and ones of length n, where the bits take on values of 0 and 1 with equal probabilities.

Thus the problem is reduced to the following: find the distribution of the maximum length of a run of 1’s in a random sequence of zeros and ones.

Denote
by_{ }the number of such sequences where the maximum length of a
run of 1’s is smaller _{ }.

The
cumulative distribution function (cdf) of the maximum length _{ } of a run is given by

_{ } (1)

It
is easy to see that _{ }satisfies the following recurrence equation:

_{ } (2)

with the boundary condition

_{ } (3)

It
is well known that for large _{ }the solution of equation (2) is asymptotically equal to

_{ } (4)

where
_{ } is the largest real
root of the characteristic equation:

_{ } (5)

Substituting into the bounding condition (3), we obtain:

_{ } (6)

Hence,

_{ } (7)

The
largest root of (5) can be expressed asymptotically for _{ }, which gives a good approximation of _{ }:

_{ } (8)

Hence, asymptotically,

_{ } (9)

Thus,
from (1) and (9) the cdf of _{ } is

_{ } (10)

The
expected value _{ } is given by

_{ } (11)

For
large _{ }, the function _{ } changes very rapidly
from 0 to 1 in a narrow region. Therefore, a good estimate of is given by _{ }, where _{ } is the root of the
equation:

_{ } (12)

Solving (12), we obtain:

_{ } (13)

Note
that for large _{ },

_{ } (14)_{ }

Expression
(14) allows us to obtain an accurate evaluation of _{ } using (11).

_{ } (15)

where
_{ } is a constant _{ }

Finally, we obtain that the average propagation delay in a ripple carry adder is approximately

_{ } (16)

Let
us consider now another version of the problem. Suppose, we have _{ }such _{ }-bit adders working in parallel. Then the delay is equal to
the maximum delay among all _{ }adders. The latter is proportional to the maximum length _{ }of the runs of _{ } in all m adders. The
cumulative distribution function of _{ } is

_{ } (22)

As
a result, the expected value of _{ } is:

_{
}
(23)

**Conclusion
**The results obtained above
give the exact value of the average carry propagation delay up to terms of
order

It is interesting
to compare the results obtained under the assumption random input statistics
with empirical data. There exists experimental evidence [19] that data arithmetic
operations carry propagation chains can be very long, exceeding _{
}. One possible explanation of this effect is that very often in such
computations a small number (that has a long run of 0’s starting with the
leftmost digit) is subtracted from another small number, which creates a long
sequence of runners. If this is the case the situation can be corrected by
special means with small hardware overhead.

**References **

- Simon
Moore, Ross Anderson, Paul Cunningham, Robert Mullins, George Taylor,
*Improving Smart Card Security using Self-timed Circuits*, Eighth International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2002 - Simon
Moore, George Taylor, Robert Mullins, Peter Robinson,
*Point to Point GALS Iterconnect*, Eighth International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2002 - S.W.
Moore,
*Protecting Consumer Security Devices --- The Next 10 Years*, published by Springer in LNCS 2140, Cannes, September 2001. - G
S Taylor, S W Moore, P Robinson,
*An on-chip dynamically recalibrated delay line for embedded self-timed systems*, Sixth International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2000. - S
W Moore, R Anderson, M Kuhn,
*Self-timed Technology to Reduce Smartcard Fraud*, ACiD-WG Workshop, Grenoble, 2000. - S.W.
Moore, G.S. Taylor, P.A. Cunningham, R.D. Mullins and P.Robinson,
*Self Calibrating Clocks for Globally Asynchronous Locally Synchronous Systems*, International Conference on Computer Design, Austin Texas, September 2000. - G
S Taylor, S W Moore, P
Robinson,
*Frequency Locked Loops: Adding Some Asynchronous Circuits to Synchronous Circuits,*7th UK Async. Forum, 1999 - P
Cunningham, G S Taylor, P Robinson, S W Moore,
*Cheating Safely in Asynchronous Design*, 7th UK Async. Forum, 1999 - S
P Wilcox,
*Synthesis of Asynchronous Circuits,*Technical Report 468, Computer Laboratory, University of Cambridge, 1999 - S
P Wilcox, G S Taylor, S W Moore, P Robinson,
*Designing a 16-bit ALU for an Embedded Processor,*6th UK Async. Forum, 1999 - S
W Moore, P Robinson,
*Rapid Prototyping of Self-timed Circuits*, ICCD, 1998. - S
W Moore, P Robinson, S Wilcox,
*Engineering Arbiters for FPGAs*, 4th UK Async. Forum, 1998. - S
W Moore, P Robinson,
*Geometry Planning for Self-Timed ASIC Design*, ACiD-WG Workshop, Turin, 1998. - S
P Wilcox, S W Moore, P Robinson,
*Flow Table Synthesis*, ACiD-WG Workshop, Turin, 1998. - S
W Moore, P Robinson, S Wilcox,
*RED Flip-Flops and Hybrid Completion Detection*, ACiD-WG Workshop, University of Groningen technical report CSN 9602, 1997. - S
W Moore, P Robinson, S Wilcox,
*Rotary Pipeline Processors*, Special issue of IEE Computers and Digital Techniques on Asynchronous Processors, 1996. - 17. A. W. Burks, H. H. Goldstine, and J. von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument”, in Collected Works of John von Neumann”, A. H. Taub, ed., vol.5, New York, Macmillan, 1963, pp.34-79
- 18. B. E. Briley, ;Some new results on average worst case carry’, IEEE Trans. on Computers, vol C-22, No 5, May 1973, pp. 459-463.
- 19. K. Y. Yun, P. A. Beerel, V. Vakilotjar, A. E. Dooply, J. Arceo, “The design and verification of a high-performance low-control-overhead asynchronous differential equation solver”, IEEE Conference, 1997, pp. 140-153