## **Clock Sharing Double Edge Triggered Flip Flop**

<sup>1</sup>C.N. Marimuthu, <sup>2</sup>Riboy Cheriyan and <sup>3</sup>P. Thangaraj
<sup>1</sup>Department of ECE, Maharaja Engineering College, Avinashi
<sup>2</sup>Maharaja Engineering College, 2nd ME-Applied Electronics, Avinashi
<sup>3</sup>Department of CT, Kongu Engineering College, Perundurai

**Abstract:** In very large scale integration (VLSI), low power design is necessary to keep pace with Moore's law and to produce consumer electronics with longer battery back-up and less weight. For a sequential circuit, most of the power consumption is due to the clock signal, which accounts for 30-60% of the total power dissipation in a system. The clock system, which consists of the clock distribution network and timing elements, is one of the most power consuming components in a VLSI system. As a result, reducing the power consumed by flip-flops will have a great effect on the total power consumed. Voltage scaling is the most effective way to decrease power consumption, since power is proportional to the square of the voltage. However, voltage scaling is associated with threshold voltage scaling, which can cause the leakage to increase exponentially. Besides supply voltage scaling, half of the power on the clock distribution network can be saved by using double-edge clocking. The power of the clocking system = $P_{\text{clock distribution network}} + P_{\text{flip-flop}}$. Halving the clock frequency halves the power consumed on the clock distribution network.

Key words: CMOS, clock branch sharing, low power, double edge triggered

### INTRODUCTION

For a sequential circuit, more than 50% of total power consumption is due to the clock signal applied. The clock system, which consists of the clock distribution network and timing elements (flip-flops and latches), is one of the most power consuming components in a VLSI system (Kawaguchi and Sakurai, 1998). As a result, reducing the power consumed by flip-flops will have a significant effect on the total power consumed. Voltage scaling is the most effective way to decrease power consumption, since power is proportional to the square of the voltage (Chandrakasan et al., 2001). However, voltage scaling is associated with threshold voltage scaling, which can cause the leakage to increase exponentially. Besides supply voltage scaling, half of the power on the clock distribution network can be saved by using double-edge clocking (Kawaguchi and Sakurai, 1998). The power of the clocking system = $P_{\text{clock distribution network}} + P_{\text{flip-flop}}$. Cutting the frequency of the clock by one half will halve the power consumption on the clock distribution network.
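The frequency argument above can be illustrated with the standard switching-power relation $P = \alpha C V_{DD}^2 f$. The following is a minimal sketch, not taken from the paper: the capacitance and frequency values are hypothetical and chosen only to show that halving the clock frequency at a fixed supply voltage halves the clock-network power.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power of a node: P = alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical clock-network numbers, chosen only for illustration.
C_CLK = 10e-12     # total clock-network capacitance in farads (assumed)
VDD = 1.8          # supply voltage in volts
F_SINGLE = 250e6   # clock frequency with single-edge triggering (assumed)

# The clock toggles every cycle, so its activity factor is taken as 1.
p_single = dynamic_power(1.0, C_CLK, VDD, F_SINGLE)

# Double-edge triggering keeps the same data rate at half the clock frequency.
p_double = dynamic_power(1.0, C_CLK, VDD, F_SINGLE / 2)

ratio = p_double / p_single
print(f"clock-network power ratio (double-edge / single-edge): {ratio:.2f}")
```

Only the clock distribution term scales this way; the flip-flop term depends on the particular DEFF circuit, which is why the number of clocked transistors inside the flip-flop still matters.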

In this study we propose a novel clock branch sharing topology to ensure efficient implementation of double-edge clock triggering in an implicit pulsed environment and to overcome the main problem of existing implicit pulsed flip-flops, namely the large clock load. This sharing concept is similar to the single transistor clocked FF and another clock branch sharing flip-flop. This clock-sharing scheme reduces the number of clocked transistors. The reduction of the number of clocked transistors reduces the switching activity, thus decreasing the power consumption. In addition, the conditional discharge technique and the split path technique are used to reduce redundant switching activity at internal nodes and to reduce the short circuit power consumption, respectively. This circuit is designed for 180 nm process technology by using level one model parameters.

### TECHNIQUES FOR IMPLEMENTING DOUBLE EDGE TRIGGERED FLIP-FLOPS

Prior double-edge triggered flip-flop (DEFF) designs fall into 3 groups: conventional DEFF, explicit pulsed DEFF and implicit pulsed DEFF. A DEFF design generally uses more clocked transistors than a single-edge triggered flip-flop (SEFF) design. However, the DEFF design should not increase the clock load too much. The DEFF design should aim at saving energy both on the distribution network and in the flip-flops. It is preferable to reduce a circuit's clock load by minimizing the number of clocked transistors (Weste and Harris, 2004). Furthermore, circuits with reduced switching activity are preferable. Low swing capability is very helpful to further reduce the voltage on the clock distribution network for power saving.

Conventional double-edge triggered flip-flop: The conventional DEFF is a master-slave double-edge flip-flop made up of 2 stages, one master and one slave. It is a hard-edged flip-flop characterized by a positive setup time, which causes large D-to-Q delays. The general scheme is shown in Fig. 1.

The conventional way of designing DEFFs is to duplicate the latch part of the single edge flip-flop so that input data is sampled at both clock edges. This approximately doubles the area and also increases the load on the data and the clock inputs, which affects performance.

One example of the conventional DE flip-flop (Chung *et al.*, 2002) is shown in Fig. 2. The left branch samples data when clk = 1 and the right branch samples data when clkb = 1. The data path is duplicated.
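The duplicated data path can be illustrated with a purely behavioral model. This is a sketch under stated assumptions, not the transistor-level circuit of Fig. 2: the class name `ConventionalDEFF`, the two latch variables and the output multiplexer are illustrative constructs, and timing (setup, hold, delay) is ignored.

```python
class ConventionalDEFF:
    """Behavioral model: two level-sensitive latches plus an output mux."""

    def __init__(self):
        self.latch_hi = 0  # transparent while clk == 1
        self.latch_lo = 0  # transparent while clk == 0 (clkb == 1)

    def step(self, clk, d):
        """Apply one sample of (clk, d) and return the output Q."""
        if clk == 1:
            self.latch_hi = d
        else:
            self.latch_lo = d
        # The mux selects the latch that is currently opaque (holding),
        # so Q only changes when the clock changes level.
        return self.latch_lo if clk == 1 else self.latch_hi


# D changes while the clock is stable; Q follows only at the next clock edge.
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
ff = ConventionalDEFF()
q_trace = [ff.step(clk, d) for clk, d in samples]
print(q_trace)
```

In this model Q updates on both the rising and the falling clock transition, at the cost of two storage elements that both load the data and clock inputs.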

Flip-flops with explicit pulse generator schemes: The master-slave FF has the hard edge property. Pulsed flip-flops allow cycle stealing and are skew tolerant (Tschanz et al., 2001). Explicit DEFFs use a pulse generator outside the latching part, so the data latch part does not need duplication. A general scheme is shown in Fig. 3. The double-edge pulse generator can be built as an XOR using a floating inverter, an XOR using pass transistors, or an XOR using transmission gates. The latching part could be transmission gate (TG) based.
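The XOR-based pulse generation described above can be sketched on a sampled clock waveform. This is an idealized illustration, not a model of any specific circuit in the paper: the helper names `delayed` and `xor_pulse` are hypothetical, and the delay is expressed in samples rather than gate delays.

```python
def delayed(signal, delay, initial=0):
    """Return the sampled signal shifted right by `delay` samples."""
    return [initial] * delay + signal[:len(signal) - delay]


def xor_pulse(clk, delay):
    """The pulse is high wherever the clock differs from its delayed copy."""
    clk_d = delayed(clk, delay, initial=clk[0])
    return [a ^ b for a, b in zip(clk, clk_d)]


# Two rising edges (samples 4 and 12) and one falling edge (sample 8).
clk = [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4
pulse = xor_pulse(clk, 1)
print(pulse)
```

A pulse appears at every clock transition, rising or falling, and its width equals the chosen delay, which is what lets a single latch sample data on both edges.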

The schematic diagram of the explicit-pulsed dual-edge triggered static hybrid flip-flop (ep-DEFF) is shown in Fig. 4. This design achieves a transparency window through an explicitly generated pulse. The pulse generator is based on TG-based XOR logic. The design has a simple structure on the critical path, so it may have less capacitive load on the critical path.

Flip-flops with implicit pulse-generator schemes: Implicit pulsed DE flip-flops use 2 series devices embedded in the logic branch receiving a clock and a delayed clock, respectively (Kim and Kang, 2002; Nedovic and Oklobdzija, 2005). A general scheme is shown in Fig. 5. The latching part could be TSPC-SPLIT or TSPC.

**Symmetric Pulse Generator Flip-Flop (SPGFF):** The SPGFF is shown in Fig. 6. This design achieves dual-edge triggering with 2 symmetric stages. Each stage responds to one particular transition of the clock; hence the name symmetric pulse generator flip-flop (Nedovic and Oklobdzija, 2005).

Two stages X and Y of the flip-flop, shown in Fig. 6, work in opposite phases of the clock; when the clock is high, node Y is precharged and node X holds the value captured at the rising edge; when the clock is low, node X is precharged and Y holds the value captured at the falling edge. SPGFF needs 5 clock phases to ensure a correct sampling window. The critical path of the SPGFF is to sample the D-Q 1->0 transitions at the CLK rising edge. If during the previous CLK1 rising edge, D = 1 and Y is discharged to 0, then D drops to 0; afterwards when CLK rises, CLK1 falls and begins to charge Y. Mp4 outputs a 1 to the NAND. At this point, the NAND has both X = 1 and Y = 1 as inputs. Following that, the NAND's output drops to 0 for a total of 3 gate delays (INV1, P4 and NAND).

Fig. 1: General Scheme for conventional dual edge flip flop

Fig. 2: Conventional dual edge flip flop

Fig. 3: General scheme of explicit pulsed DEFF

Fig. 4: Dual edge Static Hybrid Flip Flop (explicit DSFF)

Fig. 5: General Scheme of implicit pulsed DEFF

Fig. 6: Symmetric Pulse generator flip flop

Since SPGFF has 2 symmetric stages, it creates a separate internal node on each stage in the critical path. In addition, redundant switching exists in these nodes. When the input has a low switching probability, for example if D stays at 1, nodes X and Y are repeatedly charged and discharged; the associated nodes X' and Y' (inverter outputs of X (P9 and N11) and Y (P10 and N12)) switch accordingly. These switching events consume power but do no useful work; hence, they are redundant switching activities. This increases the overall power consumption since there are 4 redundant nodes. Due to the dynamic nature of each stage, if D changes from 1 to 0 after evaluation begins, neither internal node X nor Y can be pulled up; therefore, this D 1->0 transition will not be evaluated during the current clock cycle.

Fig. 7: Double edge conditional Precharge flip flop

Glitches exist at the output; because of this, caution must be taken when driving the next logic gate to avoid noise propagation.

**Double-Edge Conditional Precharge Flip-Flop (DECPFF):** The DECPFF includes an implementation of the conditional precharge technique (Nedovic and Oklobdzija, 2005). The schematic diagram of DECPFF is shown in Fig. 7. Signal Q is used as a feedback signal to control precharging to reduce redundant switching activity. It uses the clocked branch separating/duplicating scheme. The nMOS clocked transistors of the first branch have the same structure as the nMOS clocked transistors of the second branch. Both branches of the nMOS clocked transistors receive exactly the same clocks (CLK, CK and CKD). However, the 2 clock branches work separately. Since it has a complex clocking structure and a large number of transistors that switch with the clock, the benefit of reducing redundant switching activity is somewhat offset by the large clocking power.

While SPGFF has a total of 16 clocked transistors (including those in the pulse generator and those embedded in the logic), DECPFF has 21 clocked transistors; its total number of transistors is 33, one more than SPGFF. The complex structure and the large number of clocked transistors increase the clock load and power consumption. In terms of implementing double-edge clocking, SPGFF uses 5 (21 - 16) fewer clocked transistors than DECPFF and is thus more efficient.

### PROPOSED DE CLOCK BRANCH SHARING IMPLICIT PULSED FLIP-FLOP (CBS\_ip)

Conventional DEFFs duplicate the area and the load on the inputs. Explicit pulsed DEFFs use external clock pulse generators, which increase the power. In addition, explicit pulsed DEFFs cannot work with dynamic logic. SPGFF uses implicit pulsing; however, it has 4 internal redundant switching nodes. Unlike SPGFF, DECPFF eliminates the redundant switching activity; however, its number of clocked transistors reaches 21 and its clock branch duplicating structure is complex.

To ensure efficient implementation of double-edge clock triggering in an implicit pulsed environment and to overcome the main problem of previous implicit pulsed flip-flops, namely the large clock load, a novel clock branch sharing topology is proposed.

The sharing concept is similar to the single transistor clocked FF (Zhao et al., 2002) and another clock branch sharing flip-flop. In this new clock branch sharing scheme (Fig. 8), the 2 groups of clocked branches in the previous clock branch separating scheme (DECPFF, Fig. 7) are merged; (N1, N3) and (N2, N4) are shared by the first stage and second stage (in the dotted circle). Note that a split path (node X does not drive nMOS N6 of the second stage, which is in the output discharging path) is used to ensure correct functioning after merging. The advantage of this sharing concept is that it reduces the number of transistors required to implement the clocking branch of the double-edge triggered implicit-pulsed flip-flop. Recall that clocked transistors have a 100% activity factor and consume a large amount of power. Reducing the number of clocked transistors is an efficient way to decrease the power (Weste and Harris, 2004). Using pseudo-nMOS (always-ON pMOS P1) in CBS\_ip takes advantage of the fact that D and Qb have inverse polarity as a result of the conditional discharge technique. The discharging path only stays ON for a short while, yielding only a little short circuit current. An inverter is placed after Q, providing protection from direct noise coupling.

The double edge triggering operation of the flip-flop (Fig. 8) is as follows. Qfdbk is used to control N7. When CLK rises, CLKB will stay HIGH for a short interval of time equal to one inverter delay. During this period, the clocked branch (N1 and N3) turns on and the flip-flop will be in the evaluation period. Note that the other clocked branch (N2 and N4) is disconnected. When CLK falls, CLKB will rise and CLKB\_delay will stay HIGH for one inverter delay period, during which the transistors N2 and N4 are both on and the flip-flop is in the evaluation mode. The first stage in the design is responsible for capturing 0->1 input transitions of D.

The internal node X will discharge, causing the outputs Q and Qb to be HIGH and LOW, respectively, and N7 is turned off by Qfdbk = 0. If the input D stays at 1, the first stage is disconnected from ground in the later evaluations, preventing node X from experiencing redundant switching activity. The second stage, on the other hand, is responsible for capturing the 1->0 input transitions. In this case, the falling transition of the input will cause the pull down network of the second stage to be ON, thus forcing the output nodes Q and Qb to be 0 and 1, respectively.
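The effect of the conditional discharge feedback on node X can be illustrated with an edge-level behavioral abstraction. This is a sketch under stated assumptions rather than a circuit simulation: the function name `simulate_cbs` is hypothetical, Qfdbk is assumed to be the complement of Q, and one list entry stands for the value of D seen during one evaluation window (rising or falling clock edge).

```python
def simulate_cbs(d_at_edges, conditional=True):
    """Return (q_trace, x_discharges) over a sequence of clock edges."""
    q = 0
    x_discharges = 0
    q_trace = []
    for d in d_at_edges:
        qfdbk = 1 - q  # assumed: the feedback signal is the complement of Q
        # With conditional discharge, N7 conducts only while Qfdbk is high.
        n7_on = (qfdbk == 1) if conditional else True
        if d == 1 and n7_on:
            x_discharges += 1  # node X is pulled low in the evaluation window
            q = 1              # first stage captures the 0->1 transition
        elif d == 0 and q == 1:
            q = 0              # second stage captures the 1->0 transition
        q_trace.append(q)
    return q_trace, x_discharges


d_stream = [1, 1, 1, 1, 0, 0, 1, 1]
q_cond, x_cond = simulate_cbs(d_stream, conditional=True)
q_plain, x_plain = simulate_cbs(d_stream, conditional=False)
print(q_cond, x_cond, x_plain)
```

Both variants produce the same output sequence, but with the feedback enabled node X discharges only when D makes a 0->1 transition, whereas without it X discharges at every edge where D is high.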

Using a split path in CBS\_ip (P2 is driven by X, N6 by Y, respectively), the capacitance on node X is much smaller than that on node Q, which causes a significant difference in propagation delay through the FF. The reason for this is that node X only drives one device, P2. To further reduce latency, clocked inverters I1 (N8 and P3) and I2 (N9 and P4) are placed to drive bottom clocked transistors N1 and N2, respectively. Before the clock rising/falling edge, the output of I1/I2 turns on N1/N2, respectively; thus, the internal nodes A and B are discharged to ground before evaluation, which reduces the discharge time. Though the first stage has 4 stacked transistors, the above methods (split path and moving the early signals near GND) help to reduce the high stack's negative effect on delay.

Fig. 8: Proposed CBS\_ip flip flop

Using the conditional discharge technique, Qfdbk turns off N7 in 2 gate delays, so we need not use a 3-inverter delay in the clock pulse window. The one inverter window width is sufficient for node X to discharge to ground. This allows N1 and N2 to stay ON longer after the clock rising/falling edge, respectively, before being turned off by the nMOS in I1 and I2, thus, enlarging the pulse width.
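The one-inverter-delay evaluation windows described above can be sketched on a sampled clock. This is an idealized illustration under stated assumptions: CLKB is modeled as CLK inverted and delayed by one inverter delay, CLKB\_delay as CLKB inverted and delayed once more, and the helper names `delay_line`, `invert` and `evaluation_windows` are hypothetical.

```python
def delay_line(signal, delay):
    """Shift a sampled signal right by `delay` samples, holding its first value."""
    return [signal[0]] * delay + signal[:len(signal) - delay]


def invert(signal):
    """Logical inversion of a sampled 0/1 signal."""
    return [1 - s for s in signal]


def evaluation_windows(clk, inv_delay):
    """Return the samples where each shared clock branch conducts."""
    clkb = invert(delay_line(clk, inv_delay))         # one inverter after CLK
    clkb_delay = invert(delay_line(clkb, inv_delay))  # one inverter after CLKB
    rise = [a & b for a, b in zip(clk, clkb)]         # N1 and N3 both on
    fall = [a & b for a, b in zip(clkb, clkb_delay)]  # N2 and N4 both on
    return rise, fall


# CLK rises at sample 6 and falls at sample 12; one inverter delay = 2 samples.
clk = [0] * 6 + [1] * 6 + [0] * 6
rise, fall = evaluation_windows(clk, 2)
rise_idx = [i for i, v in enumerate(rise) if v]
fall_idx = [i for i, v in enumerate(fall) if v]
print(rise_idx, fall_idx)
```

In this model each clock transition opens exactly one window whose width equals one inverter delay, and the two windows never overlap, so the two branches can share the same stages.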

A properly sized always-ON pMOS P1 enables a constant charging path, which reduces the effect of charge sharing. P1, N1, N2 and N3 should be properly sized to ensure a correct noise margin; the value of VOL should be small. In summary, the clock-sharing scheme reduces the number of clocked transistors. The reduction of the number of clocked transistors reduces the switching activity, decreasing the power consumption.

### RESULTS AND DISCUSSION

The simulation results were obtained from XSPICE simulation of the transistor models in 180 nm CMOS technology at room temperature with a supply voltage of 1.8 V. Each design is simulated using the circuit at the layout level. The load capacitance at Q is 0.3 pF (CBS\_ip and ep-DEFF have their load at Qb). An additional capacitance of 3 fF is placed after the clock driver. The clock frequency used was 125 MHz.

Figure 9 shows the various waveforms of the CBS\_ip DEFF at 180 nm process technology. Table 1 presents the power and delay comparison between SPGFF, ep-DEFF and the newly proposed CBS\_ip.

In summary, the clock-sharing scheme reduces the number of clocked transistors. The reduction of the number of clocked transistors reduces the switching activity, decreasing the power usage. Also, the pseudo-nMOS replaces the pMOS clocking scheme. In addition, the conditional discharge technique and split path technique are used to reduce redundant switching activity at internal nodes and to reduce the short circuit power consumption, respectively.

Table 1: Power and delay comparison between SPGFF, ep-DEFF and the newly proposed CBS\_ip

| Type  | ep-DEFF  | SPGFF    | CBS\_ip |
|-------|----------|----------|---------|
| Power | 89 µW    | 10 µW    | 7.19 µW |
| Delay | 7.707 ns | 10.11 ns | 8.45 ns |



Fig. 9: CBS\_ip waveforms of D, Clk, Qb and Qfdbk (x-axis: time, 8.33 ns/div; y-axis: voltage; reference: ground)

### IMPLEMENTATION AND COMPARISON

The functional implementation can be done using VHDL and the functional verification can be done using ModelSim. Each design is simulated using the circuit at the layout level. In deep submicron technology, delay strongly depends on the internal gate capacitance, parasitic capacitance and wiring capacitance. Further, the capacitance affects the dynamic switching power and the short circuit power as well.

Delay is measured from data D to output Q (except for CBS\_ip and ep-DEFF, where delay is measured from D to $Q_b$). Delay is the sum of the setup time and the clock-to-Q (CQ) delay. Minimum D-to-Q delay is an appropriate metric for flip-flops because it reflects the correlations between D-to-Clock delay, Clock-to-Q delay and the D-to-Q delay. The D-to-Q delay is obtained by sweeping the LOW-to-HIGH and HIGH-to-LOW data transition times with respect to the clock edge; the minimum data-to-output delay, corresponding to the optimum setup time, is recorded. Since both clock edges are used to sample data in a DEFF, 4 cases of DQ delay are checked: the HIGH-to-LOW data transition is swept with respect to the clock rising edge and falling edge, respectively, and the LOW-to-HIGH data transition is swept in the same way. The worst case DQ delay is recorded. SPGFF suffers from large power consumption because of the large number of nodes switching with the clock: whereas CMOS logic has a typical activity factor of about 0.1, the clocks have an activity factor of 1. Further, there are 4 nodes (X, Y, X' and Y') switching redundantly at each clock rising edge and falling edge when D remains 1, without doing useful work. It also has a glitch at the output.
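The measurement procedure above (minimum D-to-Q per sweep, then the worst of the four cases) can be sketched as follows. The sweep data is entirely hypothetical and in picoseconds; it is not simulation output from this study, and the names `min_d_to_q`, `sweeps`, `per_case` and `worst_case` are illustrative.

```python
def min_d_to_q(sweep):
    """sweep: list of (setup_time, clk_to_q) pairs; return the minimum of their sum."""
    return min(setup + cq for setup, cq in sweep)


# HYPOTHETICAL sweep points in picoseconds, one list per (data transition, clock edge).
# As the setup time shrinks, the clock-to-Q delay grows, so the sum has a minimum.
sweeps = {
    ("high_to_low", "rising"):  [(300, 500), (200, 520), (100, 600), (50, 800)],
    ("high_to_low", "falling"): [(300, 480), (200, 500), (100, 570), (50, 760)],
    ("low_to_high", "rising"):  [(300, 520), (200, 540), (100, 640), (50, 900)],
    ("low_to_high", "falling"): [(300, 510), (200, 530), (100, 610), (50, 820)],
}

# Minimum D-to-Q delay for each of the 4 cases (optimum setup time per case).
per_case = {case: min_d_to_q(points) for case, points in sweeps.items()}

# The reported figure is the worst (largest) of the four per-case minima.
worst_case = max(per_case, key=per_case.get)
worst_delay = per_case[worst_case]
print(worst_case, worst_delay)
```

Taking the minimum within each sweep and the maximum across the sweeps gives a single delay figure that holds for either data polarity on either clock edge.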

In view of power of all the designs, the newly proposed CBS\_ip has the lowest power consumption. The low power consumption is due to 4 main factors. First, it has a clock branch sharing topology, where fewer transistors are clocked, which efficiently reduces the clock load. Second, the conditional discharge technique employed in the latch eliminates the redundant switching activity. Third, the split path technique reduces the short circuit current in the second stage. Fourth, an implicit pulse generator scheme with one inverter delay is used which further reduces power consumption.

In view of the power-delay product (PDP), CBS\_ip is comparable to ep-DEFF and better than SPGFF. However, ep-DEFF has the drawback of an exposed input diffusion, which is subject to noise and raises a ratio concern. Standard cell latches are usually built with buffered inputs rather than exposed diffusion nodes (Weste and Harris, 2004). If an inverter is added at the input to avoid the exposed input diffusion, the PDP of ep-DEFF will degrade. In addition, ep-DEFF uses an explicit pulse generator, so it cannot be used with dynamic logic, and it cannot work with a low swing clock.
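For reference, the power-delay product can be computed directly from the figures printed in Table 1. This is a minimal sketch that only multiplies the tabulated power and delay; the dictionary names are illustrative, and the result depends entirely on those printed values.

```python
# (power in microwatts, delay in nanoseconds), copied from Table 1.
table1 = {
    "ep-DEFF": (89.0, 7.707),
    "SPGFF": (10.0, 10.11),
    "CBS_ip": (7.19, 8.45),
}

# 1 uW * 1 ns = 1e-6 W * 1e-9 s = 1e-15 J, so the product is in femtojoules.
pdp_fj = {name: p_uw * d_ns for name, (p_uw, d_ns) in table1.items()}

for name, pdp in sorted(pdp_fj.items(), key=lambda item: item[1]):
    print(f"{name:8s} PDP = {pdp:8.2f} fJ")

lowest_pdp = min(pdp_fj, key=pdp_fj.get)
```

With the tabulated values, the products are roughly 60.8 fJ for CBS\_ip, 101.1 fJ for SPGFF and 685.9 fJ for ep-DEFF.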

### CONCLUSION

In this study, we surveyed the double-edge clocking flip-flops and classified them into 3 groups. Conventional DEFFs duplicate the latching component, thereby doubling the area and increasing the input loads. The explicit DE pulsed flip-flops have an external pulse generator, so they have higher power consumption. The newly proposed CBS\_ip uses a clock branch sharing scheme to sample the clock transitions, which efficiently reduces the number of clocked transistors. It employs the conditional discharge technique and the split path technique to reduce the redundant switching activity and short circuit current, respectively. The CBS\_ip flip-flop has the fewest clocked transistors and the lowest power; hence, it is suitable for use in high-performance and low-power environments.

### REFERENCES

- Chandrakasan, A., W. Bowhill and F. Fox, 2001. Design of High-Performance Microprocessor Circuits. 1st Edn. Piscataway, NJ: IEEE.
- Chung, W., T. Lo and M. Sachdev, 2002. A comparative analysis of low power low-voltage dual-edge-triggered flip-flops. IEEE. Trans. Very Large Scale Integr. (VLSI) Sys., 10 (6): 913-918.
- Kawaguchi, H. and T. Sakurai, 1998. A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% power reduction. IEEE. J. Solid-State Circuits, 6 (33): 807-811.
- Kim, C.L. and S. Kang, 2002. A low-swing clock double edge-triggered flip-flop. IEEE. J. Solid-State Circuits, 5 (37): 648-652.
- Nedovic, N. and V.G. Oklobdzija, 2005. Dual-edge triggered storage elements and clocking strategy for low-power systems. IEEE. Trans. Very Large Scale Integr. (VLSI) Sys., 5 (13): 577-590.
- Rabaey, J., A. Chandrakasan and B. Nikolic, 2003. Digital Integrated Circuits. 2nd Edn. Englewood Cliffs, NJ: Prentice-Hall.
- Tschanz, J., S. Narendra, Z. Chen, S. Borkar, M. Sachdev and V. De, 2001. Comparative delay and energy of single edge-triggered and dual edge triggered pulsed flip-flops for high-performance microprocessors. In: Proc. ISLPED, pp. 207-212.
- Weste, N. and D. Harris, 2004. CMOS VLSI Design. Reading, MA: Addison Wesley.
- Zhao, P., T. Darwish and M. Bayoumi, 2002. Low power and high speed explicit-pulsed flip-flops. In: Proc. 45th IEEE Int. Midw. Symp. Circuits Sys. Conf., pp. 477-480.