Latency Optimized Asynchronous Early Output Ripple Carry Adder Based on Delay-insensitive Dual-rail Data Encoding
Asynchronous circuits that employ delay-insensitive codes for data representation (i.e. encoding) and follow a 4-phase return-to-zero handshaking protocol are generally robust. Depending on whether a single delay-insensitive code or multiple delay-insensitive codes are used for data encoding, the encoding scheme is called homogeneous or heterogeneous delay-insensitive data encoding, respectively. This article proposes a new latency-optimized early output asynchronous ripple carry adder (RCA) that utilizes single-bit asynchronous full adders (SAFAs) and dual-bit asynchronous full adders (DAFAs), which incorporate redundant logic, are based on the delay-insensitive dual-rail code (i.e. homogeneous data encoding), and follow 4-phase return-to-zero handshaking. Among various RCA, carry lookahead adder (CLA), and carry select adder (CSLA) designs that are based on homogeneous or heterogeneous delay-insensitive data encodings and correspond to the weak-indication or the early output timing model, the proposed early output asynchronous RCA incorporating SAFAs and DAFAs with redundant logic is found to yield reduced latency for a dual-operand addition operation. In particular, for a 32-bit asynchronous RCA, utilizing 15 stages of DAFAs and 2 stages of SAFAs leads to reduced latency. The theoretical worst-case latencies of the different asynchronous adders were calculated by taking into account the typical gate delays of a 32/28nm CMOS digital cell library, and are compared with their estimated practical worst-case latencies. The theoretical and practical worst-case latencies show a close correlation.
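As an illustration of the dual-rail code and the 4-phase return-to-zero protocol mentioned above, the following is a minimal behavioural sketch in Python. It assumes the conventional dual-rail mapping, in which a bit is carried on two wires (rail1, rail0), with (1, 0) denoting logic 1, (0, 1) denoting logic 0, (0, 0) serving as the spacer, and (1, 1) being illegal. All function names are hypothetical and are not taken from the article; the sketch does not model gate delays or acknowledgement signals.

```python
# Behavioural sketch of dual-rail encoding with return-to-zero spacers.
# Assumed convention: (rail1, rail0); names are illustrative only.

SPACER = (0, 0)          # all-zero spacer separating successive data
VALID = {(1, 0), (0, 1)} # the two valid dual-rail code words


def encode_bit(bit):
    """Map a binary bit to its dual-rail pair (rail1, rail0)."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    return (1, 0) if bit == 1 else (0, 1)


def decode_bit(rails):
    """Return 0 or 1 for valid data, None for the spacer."""
    if rails == (1, 0):
        return 1
    if rails == (0, 1):
        return 0
    if rails == SPACER:
        return None
    raise ValueError("(1, 1) is not a legal dual-rail code word")


def encode_word(value, width):
    """Encode an integer as a list of dual-rail pairs, LSB first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def is_complete(word):
    """True when every bit position carries valid data (no spacer)."""
    return all(rails in VALID for rails in word)


def rtz_sequence(values, width):
    """Alternate data and spacer words, as in return-to-zero signalling."""
    for value in values:
        yield encode_word(value, width)
        yield [SPACER] * width
```

For example, `encode_word(5, 3)` gives the three pairs for bits 1, 0, 1 (LSB first), and `rtz_sequence([5, 2], 3)` yields four words: data, spacer, data, spacer.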
The proposed early output 32-bit asynchronous RCA, which contains 2 stages of SAFAs in the least significant positions and 15 stages of DAFAs in the more significant positions, achieves the following latency reductions over its architectural counterparts for a similar adder size: i) 35.3% reduction in latency over a weak-indication section-carry based CLA (SCBCLA), ii) 30.5% reduction in latency over a weak-indication hybrid SCBCLA-RCA, iii) 20.2% reduction in latency over an early output recursive CLA (RCLA), iv) 18.7% reduction in latency over an early output hybrid RCLA-RCA, and v) 13% reduction in latency over an early output CSLA that features an optimum 8-8-8-8 uniform input partition.
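The stage arrangement described above (2 single-bit stages followed by 15 dual-bit stages, covering 2 + 15 × 2 = 32 bits) can be sketched functionally as follows. This is only an arithmetic model of how the carry ripples through the stages, under the assumption that each SAFA adds one bit pair and each DAFA adds two bit pairs with a single carry-in and carry-out. It does not represent the dual-rail gate-level logic, the redundant logic, or the early output timing behaviour of the proposed design, and the function names are hypothetical.

```python
# Functional sketch of a ripple carry adder built from 1-bit and 2-bit stages.
# Names and structure are illustrative assumptions, not the article's circuit.

def stage_plan(width=32, n_safa=2, n_dafa=15):
    """Return (kind, lsb_position, bits) per stage, least significant first."""
    if n_safa + 2 * n_dafa != width:
        raise ValueError("stages do not cover the adder width")
    plan = [("SAFA", i, 1) for i in range(n_safa)]
    plan += [("DAFA", n_safa + 2 * k, 2) for k in range(n_dafa)]
    return plan


def ripple_add(a, b, carry_in=0, width=32, n_safa=2, n_dafa=15):
    """Add two unsigned operands stage by stage; return (sum, carry_out)."""
    carry = carry_in
    total = 0
    for _kind, lsb, bits in stage_plan(width, n_safa, n_dafa):
        mask = (1 << bits) - 1
        # Each stage adds its slice of both operands plus the incoming carry.
        partial = ((a >> lsb) & mask) + ((b >> lsb) & mask) + carry
        total |= (partial & mask) << lsb
        carry = partial >> bits  # carry rippled to the next stage
    return total, carry
```

With the default arguments, `stage_plan()` contains 17 stages, and `ripple_add(a, b)` returns the 32-bit sum together with the final carry-out, so the carry traverses 17 stages in the worst case rather than the 32 stages of an RCA composed solely of single-bit adders.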