

Standard-cell-based design methodology for high-performance support chips

IBM Journal of Research and Development,  Jul-Sep 1997  by Kick, B,  Baur, U,  Koehl, J,  Ludwig, T,  Pflueger, T

We describe the methodology used for the design of a set of CMOS support chips used in the IBM S/390(R) Parallel Enterprise Server Generations 3 and 4. The logic design is based on functional units, and the majority of the logic is implemented by standard cell elements placed and routed flat, using timing-driven techniques. Custom library elements are used wherever needed for performance reasons. Using this approach, a density has been achieved that is comparable to those of contemporary custom designs, combined with very attractive turnaround times.

Introduction

Custom design is the dominant design style for high-performance processors. It offers the advantage of full control over the size and the location of each transistor for performance tuning, but requires considerable effort to implement because of the complexity of a complete transistor-level design. This complexity creates the need to introduce additional hierarchies, usually leading to a "floorplanning" approach.

A standard cell design approach (Figure 1) makes it possible to globally apply advanced optimization algorithms, which reduce the manual effort required and improve the quality of the synthesized logic during layout. The use of basic standard cell elements reduces complexity to the extent that a complete chip design can be handled flat by layout and test generation tools, removing the need for artificial floorplan boundaries. Our approach uses a small number of custom logic macros and custom memory arrays wherever a standard cell solution is not competitive. The major part of the combinational logic, however, is implemented in standard cells.

Design entry, synthesis, and simulation are performed on the basis of functional units. There is no need to optimize logic partitioning on the basis of timing, layout, and test considerations. Flat, timing-driven placement and routing without floorplan boundaries minimize interconnection delay in critical paths. This, coupled with in-place logic optimization, achieves a post-layout cycle time no more than 15% above the zero-net estimate, that is, the cycle time estimated with interconnection delay neglected.
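
The 15% figure can be read as a simple ratio between two cycle-time numbers. The following is a minimal sketch of that arithmetic, not part of the IBM tool flow; the function names and the example cycle times are hypothetical and chosen only for illustration.

```python
def interconnect_overhead(zero_net_cycle_ns: float, post_layout_cycle_ns: float) -> float:
    """Fractional increase of the post-layout cycle time over the zero-net estimate."""
    if zero_net_cycle_ns <= 0:
        raise ValueError("zero-net cycle time must be positive")
    return (post_layout_cycle_ns - zero_net_cycle_ns) / zero_net_cycle_ns


def meets_target(zero_net_cycle_ns: float, post_layout_cycle_ns: float,
                 limit: float = 0.15) -> bool:
    """True if the post-layout cycle time stays within `limit` of the zero-net estimate."""
    return interconnect_overhead(zero_net_cycle_ns, post_layout_cycle_ns) <= limit


if __name__ == "__main__":
    # Hypothetical numbers: 10.0 ns zero-net estimate, 11.2 ns after layout.
    print(f"overhead: {interconnect_overhead(10.0, 11.2):.1%}")
    print("within 15% target:", meets_target(10.0, 11.2))
```

With these assumed values the overhead is 12%, which would satisfy the stated bound; a 12.0 ns post-layout result would not.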

The testing methodology we have used consists of design for test (DFT) to ensure high test coverage, and test pattern generation to enable testing, analysis, and debugging of chips in manufacturing. The key goals are fast turnaround time and high-quality testing.

Test data generation, circuit and logic design, and timing verification are performed with proprietary IBM tools [1-4]. The tools for layout optimization were developed at the Institute for Discrete Mathematics at Bonn, Germany, in close cooperation with the IBM Laboratory in Boeblingen, Germany. This cooperation has minimized the lead time required to incorporate combinatorial optimization research results into our production tools. The approach has been used successfully on a set of CMOS chips which, together with processor and cache, are the heart of the S/390* Parallel Enterprise Server Generations 3 and 4.

System overview

The chip set (Figure 2) consists of processor chips (PU0-PUB), cache chips (L2), and a set of support chips (Clock, MBA, BSN, STC). A tightly coupled S/390 multiprocessor system with up to twelve processors and 16 GB of physical main memory can be designed with this chip set. The clock chip (Clock) provides the clocks, self-test, and power-on control logic, and the interface with the service element for all chips in the system. The memory bus adapter (MBA) chips are direct-memory-access (DMA) controllers that are the interface between the asynchronous, byte-serial I/O buses and the 16-byte-wide system bus. The bus-switching network (BSN) chips hold shared level-3 caches and bus arbiters that control the concurrent access of PUs, MBAs, and system-wide memory. The storage controller (STC) chips are DRAM controllers, supporting transparent refresh, interleaving, and multibit error detection and repair. More details can be found in [5].

Technology and design of custom elements

Technology

The CMOS process [6, 7] used on the chip set was developed by the IBM Microelectronics Division. The technology provides six layers of metallization: one layer for internal circuit wiring only, and four layers for wiring in a 1.8-μm wiring pitch. The last metallization layer is used primarily for wiring redistribution to the chip I/O pads. The technology parameters are shown in Table 1.

Library and chip image

The standard cell library we used provides a set of logic gates, latches, and I/O cells which fit into 3.5 million placement locations and are interconnected through horizontal and vertical wiring tracks defined by the chip image. The I/O cells can be placed anywhere among the 3.5 million legal locations. After chip placement and routing, the unused cell locations are filled with nonpersonalized gate array elements to provide an engineering change capability with metallization changes only.
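
The fill step described above can be pictured as a pass over the placement grid after placement and routing. The sketch below is illustrative only and is not the production tool: it assumes a tiny grid, treats every cell as occupying exactly one location (real cells may span several), and uses hypothetical names such as `GA_FILL` and `fill_unused_locations`.

```python
from typing import Dict, List, Tuple

FILLER = "GA_FILL"  # hypothetical name for a nonpersonalized gate array element


def fill_unused_locations(rows: int, cols: int,
                          placed: Dict[Tuple[int, int], str]) -> List[List[str]]:
    """Return a rows x cols grid in which every location not used by a placed
    cell holds a filler element (simplification: one cell per location)."""
    # Start with filler everywhere, then overwrite the occupied locations.
    grid = [[FILLER] * cols for _ in range(rows)]
    for (row, col), cell_name in placed.items():
        if not (0 <= row < rows and 0 <= col < cols):
            raise ValueError(f"cell {cell_name!r} placed outside the chip image")
        grid[row][col] = cell_name
    return grid


if __name__ == "__main__":
    # Hypothetical placement of two cells on a 2 x 3 image.
    placement = {(0, 0): "NAND2", (1, 2): "LATCH"}
    for row in fill_unused_locations(2, 3, placement):
        print(row)
```

Because the filler elements already sit in the unused locations, a later engineering change can in principle be wired up by altering metallization only, which is the capability the text describes.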

Custom circuit design

The base standard cell library provides simple logic gates, but a small set of custom logic macros and custom SRAM macros was required for the special needs of the S/390 in order to improve cycle time and density. The custom implementation of the macros gives the circuit designer the freedom to use special circuit design techniques such as dynamic and double-pass circuits [8] to improve the propagation delay.